![Sharan Narang Profile](https://pbs.twimg.com/profile_images/1511211075894620162/HC2yAjH0_x96.jpg)
Sharan Narang
@sharan0909
Followers
2K
Following
462
Statuses
198
LLMs and AI Research (Llama 2 & 3 lead) @Meta | ex @Google (PaLM lead, T5), ex @Baidu (Deep Speech 2, Sparse Neural Networks), ex @Nvidia
San Francisco, CA
Joined May 2011
RT @Drweloveu: Thread of horrific accidents caught on camera 💔😞 (Don't open if you are soft hearted)
0
3K
0
It's also great to see that Scale AI evaluated our model and showed that it is competitive with all closed-source models on coding, math, and instruction following. Full thread:
1/ Meta just released Llama 3.1 405B! @scale_AI partnered deeply with @Meta on this release:
🥇 SEAL Evaluations: Based on our evals, 🥇 on IF, 🥈 on Math, #4 on Coding
💼 Enterprise partnership for custom Llama models
🤖 Data Foundry partnership on RLHF & SFT
👇
1
0
4
The 405B model is competitive with state-of-the-art models, as shown in the benchmark and human evaluation results. For an overview of technical details, check out this thread by @astonzhangAZ:
Our Llama 3.1 405B is now openly available! After a year of dedicated effort, from project planning to launch reviews, we are thrilled to open-source the Llama 3 herd of models and share our findings through the paper:
🔹 Llama 3.1 405B, continuously trained with a 128K context length following pre-training with an 8K context length, supports multilinguality and tool usage. It offers performance comparable to leading language models, such as GPT-4, across a range of tasks.
🔹 Compared to previous Llama models, we have enhanced the preprocessing and curation pipelines for pre-training data, as well as the quality assurance and filtering methods for post-training data.
🔹 Pre-training 405B on 15.6T tokens (3.8x10^25 FLOPs) was a significant challenge. We optimized our entire training stack and used over 16K H100 GPUs.
🔹 To support large-scale production inference for the 405B model, we quantized from 16-bit (BF16) to 8-bit (FP8), reducing compute requirements and enabling the model to run on a single server node.
🔹 We leveraged the 405B model to improve the post-training quality of our 70B and 8B models.
🔹 In post-training, we refined chat models with multiple rounds of alignment involving supervised fine-tuning (SFT), rejection sampling, and direct preference optimization. We generated most SFT examples using synthetic data.
🔹 We integrated image, video, and speech capabilities into Llama 3 using a compositional approach, enabling models to recognize images and videos and support interaction via speech. These capabilities are still under development and not yet ready for release.
🔹 We've updated our license to allow developers to use outputs from Llama models to enhance other models.
There is nothing more rewarding than working at the forefront of AI development alongside some of the brightest minds in the field and publishing our research transparently. I'm excited about the innovations our open-source models enable and the potential of the future herd of Llamas!
1
0
3
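The compute figure in the thread above lines up with the standard back-of-the-envelope estimate of roughly 6 FLOPs per parameter per training token. A minimal sanity check under that approximation (the paper's exact accounting may differ):

```python
# Back-of-the-envelope check of the quoted pre-training compute,
# using the common ~6 * params * tokens FLOPs approximation.
params = 405e9     # 405B parameters (from the thread above)
tokens = 15.6e12   # 15.6T training tokens (from the thread above)

flops = 6 * params * tokens
print(f"estimated pre-training compute: {flops:.2e} FLOPs")
# -> roughly 3.8e+25 FLOPs, matching the figure quoted in the thread
```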
It's been an incredible journey from Llama 2 to Llama 3.1 in the past year! The 405B model, in particular, has been a great learning experience with many highs and (some) lows. Very proud of the entire pre-training and post-training teams that delivered this amazing model! 🧵
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet.
Today we're releasing a collection of new Llama 3.1 models including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context window and improved support for 8 languages, among other improvements.
Llama 3.1 405B rivals leading closed-source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.
The models are available to download now directly from Meta or @huggingface. With today's release the ecosystem is also ready to go with 25+ partners rolling out our latest models, including @awscloud, @nvidia, @databricks, @groqinc, @dell, @azure and @googlecloud ready on day one.
More details in the full announcement ➡️
Download Llama 3.1 models ➡️
With these releases we're setting the stage for unprecedented new opportunities and we can't wait to see the innovation our newest models will unlock across all levels of the AI community.
3
3
37
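The announcement above points to downloads from Meta or @huggingface. As a rough sketch of what running one of the released models can look like with the Hugging Face transformers library (assuming the meta-llama/Llama-3.1-8B-Instruct repo id, license acceptance on the Hub, and a recent transformers version with chat-style pipeline inputs), not an official quickstart:

```python
# Rough sketch: generate text with a released Llama 3.1 model via transformers.
# Assumes `pip install transformers accelerate torch`, access to the gated
# meta-llama/Llama-3.1-8B-Instruct repo, and enough GPU memory for an 8B model.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed Hub repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "In two sentences, what is new in Llama 3.1?"},
]
out = pipe(messages, max_new_tokens=128)
# With chat-style input, generated_text holds the conversation; the last
# message is the assistant reply.
print(out[0]["generated_text"][-1]["content"])
```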
@giffmana @_jasonwei @giffmana you are the ML Twitter corrector we need, correcting everyone from Turing Award winners to researchers! Thanks for the public service :)
1
0
8
@stanislavfort It really comes down to goals. Our goal was to train a large-scale LLM at high quality, which required de-risking infra, data, and ML changes. All of these require compute. We did run some scaling experiments, but nothing as systematic as the Chinchilla work.
0
0
1
@deliprao Actually, the Chinchilla paper helped the field significantly. Before that, the original scaling laws paper recommended putting most additional compute into model size rather than dataset size. Chinchilla showed that model size and training tokens should be scaled at roughly the same rate, which is the correct compute-optimal formulation.
0
1
11
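For context on the reply above: with the common approximation C ≈ 6ND (compute ≈ 6 × parameters × training tokens), Kaplan-style scaling laws put most extra compute into model size, whereas Chinchilla found that parameters and tokens should grow at roughly the same rate, often summarized as roughly 20 training tokens per parameter. A small sketch of that rule of thumb (the 6ND estimate and the 20x ratio are approximations from those papers, not exact prescriptions):

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Approximate compute-optimal sizing per the Chinchilla rule of thumb.

    Uses C ~ 6 * N * D with D ~ tokens_per_param * N, which gives
    N ~ sqrt(C / (6 * tokens_per_param)) and D ~ tokens_per_param * N.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: the ~3.8e25 FLOP budget quoted for Llama 3.1 405B pre-training.
n, d = chinchilla_optimal(3.8e25)
print(f"Chinchilla rule of thumb: ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
# Llama 3.1 trained a 405B model on 15.6T tokens (~38 tokens per parameter),
# i.e. well beyond the ~20 tokens-per-parameter rule of thumb.
```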