![Jiawei Zhao Profile](https://pbs.twimg.com/profile_images/1815995995022581761/VauzPgBj_x96.jpg)
Jiawei Zhao
@jiawzhao
Followers: 788
Following: 159
Statuses: 74
Research Scientist at @AIatMeta (FAIR), PhD @Caltech, Fmr Research Intern @nvidia
Joined February 2013
RT @zechunliu: Our ParetoQ is substantially better than previous work on ternary LLMs, such as the 1-bit era paper.
RT @KaiyuYang4: 🚀 Excited to share our position paper: "Formal Mathematical Reasoning: A New Frontier in AI"! 🔗 LL…
RT @TongPetersb: This project really changed how I think about multimodal models and LLMs. I used to believe that multimodal (visual) predi…
RT @BeidiChen: 🐷 MagicPig was developed during our efforts to create challenging reasoning tasks that showcase the true potential of long-c…
RT @AnimaAnandkumar: Excited to present our work on Tensor-GaLore at the Optimization for Machine Learning Workshop #NeurIPS2024! We prese…
RT @drjingjing2026: 1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feelin…
RT @Xinyu2ML: 📢 Announcing our new PEFT family S²FT @NeurIPS2024 ❗️❗️❗️ 😀 Join us at our poster presentation in West Ballroom A-D, Booth #7…
RT @KyriectionZhang: ❓ How much optimizer-state memory do we need for LLM training? 🧐 Almost zero. 📢 Introducing APOLLO! 🚀 A revolutio…
RT @ZongyiLiCaltech: #NeurIPS I am on the 2024-25 job market seeking faculty positions and postdocs! My goal is to advance AI for scientifi…
RT @BeidiChen: 🥳We're recruiting PhD students at CMU for Fall 2025! If you are interested in machine-learning algorithms and systems (🔑Keyw…
Thank you @AnimaAnandkumar for your kind words and support through the years! I also appreciate the help from all my friends and colleagues throughout my PhD journey!
Congratulations @jiawzhao on an excellent PhD defense! Jiawei has been a pioneer in hardware-efficient training. When he started his PhD, everyone was focusing on inference efficiency and training runs were small; Jiawei took the bold step of pursuing training efficiency. Slides:

Jiawei's latest work, GaLore, has led to a huge reduction (>80%) in the memory requirements for pretraining LLMs. It is based on a principled approach of projecting the gradient onto a low-rank subspace while keeping the weights full rank.

Jiawei also worked on reduced-precision training in the logarithmic number system (LNS). LNS is extreme quantization: instead of a standard floating-point format with an exponent and a mantissa, LNS has only the exponent part. Jiawei proposed a multiplicative update method to train directly in LNS. This will impact training on smaller edge devices, which is not yet mainstream today.

Jiawei first started out working on signSGD: surprisingly, communicating just the signs of gradients in distributed training is sufficient for accuracy while significantly reducing bandwidth requirements. A variant of it (the Lion optimizer) is now used for LLMs.

Jiawei has also done in-depth work on initializing neural networks with just zeros and ones instead of random initialization, showing that this can improve reproducibility while preserving accuracy.

Congratulations @jiawzhao
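The quoted thread above summarizes GaLore's core idea: project the gradient onto a low-rank subspace and keep the optimizer state there, while the weights themselves stay full rank. The sketch below is a minimal, simplified illustration of that idea, not the actual GaLore implementation (the real method works with Adam states and refreshes the projection only periodically); names like `project_gradient` and `galore_style_update` are hypothetical.

```python
import torch

def project_gradient(grad: torch.Tensor, rank: int):
    """Project a 2-D gradient onto its top-`rank` left singular subspace."""
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                # projection basis, shape (m, rank)
    return P, P.T @ grad           # compact gradient, shape (rank, n)

def galore_style_update(weight, grad, state, rank=4, lr=1e-2, beta=0.9):
    """One momentum-SGD step whose optimizer state lives only in the rank-r subspace."""
    P, g_lowrank = project_gradient(grad, rank)
    m = state.get("momentum")
    if m is None or m.shape != g_lowrank.shape:
        m = torch.zeros_like(g_lowrank)          # state is (rank, n), not (m, n)
    m = beta * m + (1.0 - beta) * g_lowrank
    state["momentum"] = m
    weight -= lr * (P @ m)                       # project the update back to full size
    return weight, state

# Toy usage: a single 64 x 32 weight matrix with a random gradient.
w, g = torch.randn(64, 32), torch.randn(64, 32)
w, st = galore_style_update(w, g, state={}, rank=4)
```

The memory saving comes from the optimizer state: with rank r much smaller than min(m, n), the momentum (and, in full GaLore, the Adam second moment) is stored as an r x n matrix instead of m x n, while the weight update itself is still applied at full rank.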
RT @tydsh: We are hiring a postdoc working on AI4Math with @KristinLauter and @KaiyuYang4! Welcome to apply using the following link: https…