![Sharan Narang Profile](https://pbs.twimg.com/profile_images/1511211075894620162/HC2yAjH0_x96.jpg)
Sharan Narang
@sharan0909
Followers
2K
Following
462
Statuses
198
LLMs and AI Research (Llama 2 & 3 lead) @Meta | ex @Google (PaLM lead, T5), ex @Baidu (Deep Speech 2, Sparse Neural Networks), ex @Nvidia
San Francisco, CA
Joined May 2011
RT @Drweloveu: Thread of horrific accidents caught on camera 💔😞 (Don't open if you are soft hearted)
0
3K
0
It's also great to see that Scale AI evaluated our model and showed that it is competitive with all closed-source models on coding, math, and instruction following. Full thread:
1/ Meta just released Llama 3.1 405B! @scale_AI partnered deeply with @Meta on this release:
🥇 SEAL Evaluations: Based on our evals, 🥇 on IF, 🥈 on Math, #4 on Coding
💼 Enterprise partnership for custom Llama models
🤖 Data Foundry partnership on RLHF & SFT
👇
1
0
4
The 405B model is competitive with state-of-the-art models, as shown in the benchmark and human evaluation results. For an overview of technical details, check out this thread by @astonzhangAZ:
Our Llama 3.1 405B is now openly available! After a year of dedicated effort, from project planning to launch reviews, we are thrilled to open-source the Llama 3 herd of models and share our findings through the paper:
🔹 Llama 3.1 405B, continuously trained with a 128K context length following pre-training with an 8K context length, supports multilinguality and tool usage. It offers performance comparable to leading language models, such as GPT-4, across a range of tasks.
🔹 Compared to previous Llama models, we have enhanced the preprocessing and curation pipelines for pre-training data, as well as the quality assurance and filtering methods for post-training data.
🔹 Pre-training 405B on 15.6T tokens (3.8x10^25 FLOPs) was a significant challenge. We optimized our entire training stack and used over 16K H100 GPUs.
🔹 To support large-scale production inference for the 405B model, we quantized from 16-bit (BF16) to 8-bit (FP8), reducing compute requirements and enabling the model to run on a single server node.
🔹 We leveraged the 405B model to improve the post-training quality of our 70B and 8B models.
🔹 In post-training, we refined chat models with multiple rounds of alignment involving supervised fine-tuning (SFT), rejection sampling, and direct preference optimization. We generated most SFT examples using synthetic data.
🔹 We integrated image, video, and speech capabilities into Llama 3 using a compositional approach, enabling models to recognize images and videos and support interaction via speech. These capabilities are still under development and not yet ready for release.
🔹 We've updated our license to allow developers to use outputs from Llama models to enhance other models.
There is nothing more rewarding than working at the forefront of AI development alongside some of the brightest minds in the field and publishing our research transparently. I'm excited about the innovations our open-source models enable and the potential of the future herd of Llamas!
1
0
3
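The compute figure in the thread above lines up with the standard back-of-the-envelope estimate of roughly 6 FLOPs per parameter per training token. A minimal sanity check under that approximation (the paper's exact accounting may differ):

```python
# Back-of-the-envelope check of the quoted pre-training compute,
# using the common ~6 * params * tokens FLOPs approximation.
params = 405e9     # 405B parameters (from the thread above)
tokens = 15.6e12   # 15.6T training tokens (from the thread above)

flops = 6 * params * tokens
print(f"estimated pre-training compute: {flops:.2e} FLOPs")
# -> roughly 3.8e+25 FLOPs, matching the figure quoted in the thread
```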
It's been an incredible journey from Llama 2 to Llama 3.1 in the past year! The 405B model, in particular, has been a great learning experience with many highs and (some) lows. Very proud of the entire pre-training and post-training teams that delivered this amazing model! 🧵
Starting today, open source is leading the way. Introducing Llama 3.1: Our most capable models yet.
Today we're releasing a collection of new Llama 3.1 models including our long-awaited 405B. These models deliver improved reasoning capabilities, a larger 128K token context window and improved support for 8 languages, among other improvements.
Llama 3.1 405B rivals leading closed-source models on state-of-the-art capabilities across a range of tasks in general knowledge, steerability, math, tool use and multilingual translation.
The models are available to download now directly from Meta or @huggingface. With today's release the ecosystem is also ready to go with 25+ partners rolling out our latest models, including @awscloud, @nvidia, @databricks, @groqinc, @dell, @azure and @googlecloud ready on day one.
More details in the full announcement ➡️
Download Llama 3.1 models ➡️
With these releases we're setting the stage for unprecedented new opportunities and we can't wait to see the innovation our newest models will unlock across all levels of the AI community.
3
3
37
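The announcement above points to downloads from Meta or @huggingface. As a rough sketch of what running one of the released models can look like with the Hugging Face transformers library (assuming the meta-llama/Llama-3.1-8B-Instruct repo id, license acceptance on the Hub, and a recent transformers version with chat-style pipeline inputs), not an official quickstart:

```python
# Rough sketch: generate text with a released Llama 3.1 model via transformers.
# Assumes `pip install transformers accelerate torch`, access to the gated
# meta-llama/Llama-3.1-8B-Instruct repo, and enough GPU memory for an 8B model.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed Hub repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "In two sentences, what is new in Llama 3.1?"},
]
out = pipe(messages, max_new_tokens=128)
# With chat-style input, generated_text holds the conversation; the last
# message is the assistant reply.
print(out[0]["generated_text"][-1]["content"])
```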
@giffmana @_jasonwei @giffmana you are the ML Twitter corrector we need, correcting everyone from Turing Award winners to researchers! Thanks for the public service :)
1
0
8
@stanislavfort It really comes down to goals. Our goal was to train a large-scale LLM at high quality, which required de-risking infra, data, and ML changes. All of these require compute. We did run some scaling experiments, but nothing as systematic as the Chinchilla work.
0
0
1
@deliprao Actually, the Chinchilla paper helped the field significantly. Before that, the original scaling laws paper recommended putting most additional compute into model size rather than dataset size. Chinchilla showed that model size and training tokens should be scaled at roughly the same rate, which is the correct compute-optimal formulation.
0
1
11
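For context on the reply above: with the common approximation C ≈ 6ND (compute ≈ 6 × parameters × training tokens), Kaplan-style scaling laws put most extra compute into model size, whereas Chinchilla found that parameters and tokens should grow at roughly the same rate, often summarized as roughly 20 training tokens per parameter. A small sketch of that rule of thumb (the 6ND estimate and the 20x ratio are approximations from those papers, not exact prescriptions):

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Approximate compute-optimal sizing per the Chinchilla rule of thumb.

    Uses C ~ 6 * N * D with D ~ tokens_per_param * N, which gives
    N ~ sqrt(C / (6 * tokens_per_param)) and D ~ tokens_per_param * N.
    """
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: the ~3.8e25 FLOP budget quoted for Llama 3.1 405B pre-training.
n, d = chinchilla_optimal(3.8e25)
print(f"Chinchilla rule of thumb: ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
# Llama 3.1 trained a 405B model on 15.6T tokens (~38 tokens per parameter),
# i.e. well beyond the ~20 tokens-per-parameter rule of thumb.
```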