Sharan Narang Profile
Sharan Narang

@sharan0909

Followers: 2K
Following: 462
Statuses: 198

LLMs and AI Research (Llama 2 & 3 lead) @Meta | ex @Google (PaLM lead, T5), ex @Baidu (Deep Speech 2, Sparse Neural Networks), ex @Nvidia

San Francisco, CA
Joined May 2011
@sharan0909
Sharan Narang
2 years
We released the Llama 2 model as a foundational and chat model. I'm sharing some results from the paper in this thread. Paper: 1/n
1
5
84
@sharan0909
Sharan Narang
2 months
@_arohan_ @AIatMeta Welcome to the team! Excited to work together :)
1
0
8
@sharan0909
Sharan Narang
2 months
RT @Drweloveu: Thread of horrific accidents caught on camera 💔😞 (Don't open if you are soft hearted)
0
3K
0
@sharan0909
Sharan Narang
5 months
@hwchung27 Really nice work! Congrats
0
0
6
@sharan0909
Sharan Narang
5 months
@arvind_io @AIatMeta @OpenAI Welcome to Meta! Great to work together again :)
1
0
3
@sharan0909
Sharan Narang
7 months
Llama 4: 🚀
0
1
12
@sharan0909
Sharan Narang
7 months
@karpathy @pdeyhim As promised, the research paper is out:
0
0
1
@sharan0909
Sharan Narang
7 months
As promised a while back, we've published a research paper with all the findings. Onward to Llama 4🦙🦙🦙🦙🚀
@sharan0909
Sharan Narang
10 months
@karpathy @pdeyhim It’s going to be a paper, not a tech report 😊
1
2
7
@sharan0909
Sharan Narang
7 months
It's also great to see that Scale AI evaluated our model and showed that it is competitive with all closed-source models on coding, math, and instruction following. Full thread:
@alexandr_wang
Alexandr Wang
7 months
1/ Meta just released Llama 3.1 405B! @scale_AI partnered deeply with @Meta on this release:
🥇 SEAL Evaluations: Based on our evals, 🥇 on IF, 🥈 on Math, #4 on Coding
💼 Enterprise partnership for custom Llama models
🤖 Data Foundry partnership on RLHF & SFT
👇
1
0
4
@sharan0909
Sharan Narang
7 months
The 405B model is competitive with state-of-the-art models, as shown in the benchmark and human evaluation results. For an overview of the technical details, check out this thread by @astonzhangAZ:
@astonzhangAZ
Aston Zhang
7 months
Our Llama 3.1 405B is now openly available! After a year of dedicated effort, from project planning to launch reviews, we are thrilled to open-source the Llama 3 herd of models and share our findings through the paper:
🔹 Llama 3.1 405B, continuously trained with a 128K context length following pre-training with an 8K context length, supports multilinguality and tool usage. It offers performance comparable to leading language models, such as GPT-4, across a range of tasks.
🔹 Compared to previous Llama models, we have enhanced the preprocessing and curation pipelines for pre-training data, as well as the quality assurance and filtering methods for post-training data.
🔹 Pre-training 405B on 15.6T tokens (3.8x10^25 FLOPs) was a significant challenge. We optimized our entire training stack and used over 16K H100 GPUs.
🔹 To support large-scale production inference for the 405B model, we quantized from 16-bit (BF16) to 8-bit (FP8), reducing compute requirements and enabling the model to run on a single server node.
🔹 We leveraged the 405B model to improve the post-training quality of our 70B and 8B models.
🔹 In post-training, we refined chat models with multiple rounds of alignment involving supervised fine-tuning (SFT), rejection sampling, and direct preference optimization. We generate most SFT examples using synthetic data.
🔹 We integrated image, video, and speech capabilities into Llama 3 using a compositional approach, enabling models to recognize images and videos and support interaction via speech. They are under development and not yet ready for release.
🔹 We've updated our license to allow developers to use outputs from Llama models to enhance other models.
There is nothing more rewarding than working at the forefront of AI development alongside some of the brightest minds in the field and publishing our research transparently. I'm excited about the innovations our open-source models enable and the potential of the future herd of Llamas!
1
0
3
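For context on the quoted thread: the 3.8x10^25 FLOPs figure is consistent with the common C ≈ 6·N·D estimate of dense-transformer training compute. A minimal back-of-the-envelope check in Python (my own illustration, not part of the thread):

n_params = 405e9      # Llama 3.1 405B parameters
n_tokens = 15.6e12    # pre-training tokens quoted above
flops = 6 * n_params * n_tokens   # standard approximation for dense transformers
print(f"{flops:.2e} FLOPs")       # ~3.79e+25, matching the quoted 3.8x10^25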
@sharan0909
Sharan Narang
7 months
It's been an incredible journey from Llama 2 to Llama 3.1 in the past year! The 405B model, in particular, has been a great learning experience with many highs and (some) lows. Very proud of the entire pretraining and post-training teams that delivered this amazing model! 🧵
@AIatMeta
AI at Meta
7 months
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet.
Today we're releasing a collection of new Llama 3.1 models including our long awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context window and improved support for 8 languages among other improvements. Llama 3.1 405B rivals leading closed source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.
The models are available to download now directly from Meta or @huggingface. With today's release the ecosystem is also ready to go with 25+ partners rolling out our latest models — including @awscloud, @nvidia, @databricks, @groqinc, @dell, @azure and @googlecloud ready on day one.
More details in the full announcement ➡️
Download Llama 3.1 models ➡️
With these releases we're setting the stage for unprecedented new opportunities and we can't wait to see the innovation our newest models will unlock across all levels of the AI community.
3
3
37
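The released checkpoints are distributed via Meta and Hugging Face. A minimal usage sketch with the Hugging Face transformers library, assuming a gated meta-llama repository id (the id shown is illustrative and access terms may differ):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative, gated repo; exact name may differ
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Summarize the Llama 3.1 release in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))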
@sharan0909
Sharan Narang
7 months
@ml_perception (2) using a lot of FLOPs efficiently :)
0
0
3
@sharan0909
Sharan Narang
8 months
@_arohan_ Sounds like a question for one of the frontier models instead of X ;)
0
0
1
@sharan0909
Sharan Narang
9 months
@giffmana @_jasonwei @giffmana you are the ML twitter corrector we need, correcting everyone from Turing award winners to researchers! Thanks for the public service :)
1
0
8
@sharan0909
Sharan Narang
10 months
@stanislavfort It really comes down to goals. Our goal was to train a large-scale LLM at high quality, which required derisking infra, data, and ML changes. All of these require compute. We did run some scaling experiments, but nothing as systematic as the Chinchilla work.
0
0
1
@sharan0909
Sharan Narang
10 months
@deliprao Actually, the Chinchilla paper helped the field significantly. Before that, the original scaling laws paper recommended growing model size much faster than dataset size; Chinchilla showed that compute-optimal training scales the two at roughly the same rate.
0
1
11
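To make the Chinchilla result concrete: at the compute-optimal point it prescribes roughly 20 training tokens per parameter, i.e. parameters and data grow at the same rate with compute. A small sketch under the usual C ≈ 6·N·D assumption (my own illustration, not from the tweet):

import math

def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    # C ≈ 6 * N * D with D ≈ 20 * N  =>  N ≈ sqrt(C / 120)
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    return n_params, tokens_per_param * n_params

n, d = chinchilla_optimal(3.8e25)   # the Llama 3.1 405B training budget quoted above
print(f"compute-optimal: ~{n/1e9:.0f}B params on ~{d/1e12:.1f}T tokens")
# Prints roughly 563B params on 11.3T tokens; Llama 3.1 instead used 405B params
# on 15.6T tokens (~38 tokens per parameter), i.e. it trained well past the
# Chinchilla-optimal token count for its size.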
@sharan0909
Sharan Narang
10 months
LLMs keep improving with more data! This also means we need better benchmarks.
@ml_perception
Mike Lewis
10 months
I'm seeing a lot of questions about the limit of how good you can make a small LLM. tldr; benchmarks saturate, models don't. LLMs will improve logarithmically forever with enough good data.
0
0
15