🚀 New paper from our Llama team @AIatMeta! We discuss "cross capabilities" and the "Law of the Weakest Link" in large language models (LLMs):
🔹 Cross capabilities: the intersection of multiple distinct capabilities across different types of expertise necessary to address complex, real-world tasks
Our Llama 3.1 405B is now openly available! After a year of dedicated effort, from project planning to launch reviews, we are thrilled to open-source the Llama 3 herd of models and share our findings through the paper:
🔹Llama 3.1 405B, continually trained with a 128K context window
Llama 3 has been my focus since joining the Llama team last summer. Together, we've been tackling challenges across pre-training and human data, pre-training scaling, long context, post-training, and evaluations. It's been a rigorous yet thrilling journey:
🔹Our largest models
"Imagine learning a textbook with no figures."
Multimodal chain-of-thought (Multimodal-CoT) in Language Models
- Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA
- Less than 1B params (so you can train it more easily)
- Code & model released
[1/6]
Thrilled that our 'Dive into Deep Learning' book, now published by Cambridge University Press, is the Top New Release on Amazon!
To ensure accessibility and affordability, we, the authors, have waived our royalties. Plus, it's always available for free at
Cheer AI up with "let's think step by step"? More plz
Let’s think not just step by step, but also one by one
We can use more cheers & diversity to SAVE huge manual efforts in chain of thought prompt design, matching or even exceeding performance of manual design on GPT-3
[1/7]
🚀 Exciting internship opportunity! Join the Llama team @AIatMeta and help redefine what's possible with large language models, from pre-training to post-training. Be part of our 2025 research internship and help shape the future of LLMs. Feel free to email or DM me 📩
🚀 Exciting internship opportunity! Our GenAI @Meta team is diving into various facets of #llama, from data to pretraining & finetuning. Passionate about large language models? Join us for a 2024 research internship. Feel free to email or DM me 📩
Don't assign the SAME parameter-efficient fine-tuning strategy to DIFFERENT layers
New tips (see the sketch below):
- Group layers in a SPINDLE pattern (e.g., 4-8-8-4 layers)
- Allocate params to layers uniformly
- Tune all groups
- Adjust tuning strategies for different groups
@AmazonScience @stanfordnlp [1/4]
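A minimal sketch of the idea, not the paper's actual code: split a 24-layer model into a 4-8-8-4 spindle, give every layer the same trainable-parameter budget, and let each group use a different PEFT strategy. All names here (`spindle_groups`, `STRATEGY_BY_GROUP`) are hypothetical.

```python
# Hypothetical sketch: spindle-grouped PEFT assignment (names are illustrative).
from typing import Dict, List

def spindle_groups(num_layers: int, pattern=(4, 8, 8, 4)) -> List[List[int]]:
    """Split layer indices into consecutive groups, e.g. 4-8-8-4 for 24 layers."""
    assert sum(pattern) == num_layers, "pattern must cover all layers"
    groups, start = [], 0
    for size in pattern:
        groups.append(list(range(start, start + size)))
        start += size
    return groups

# One PEFT strategy per group; adjusted per group rather than one-size-fits-all.
STRATEGY_BY_GROUP: Dict[int, str] = {0: "adapter", 1: "prefix", 2: "lora", 3: "bitfit"}

def assign_strategies(num_layers: int = 24, budget_per_layer: int = 10_000):
    plan = {}
    for g, layers in enumerate(spindle_groups(num_layers)):
        for layer in layers:
            # Uniform parameter budget across layers; strategy varies by group.
            plan[layer] = (STRATEGY_BY_GROUP[g], budget_per_layer)
    return plan

print(assign_strategies())  # {0: ('adapter', 10000), ..., 23: ('bitfit', 10000)}
```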
#ICML Long Oral! @AmazonScience
Out-of-Distribution (OOD) Detection in Long-Tailed Recognition
📉 Existing OOD detection fails when training data is long-tail distributed
📈 Ours: SOTA on long-tailed ImageNet
Paper:
Code:
1/
Although our book is free online, many readers have been requesting hard copies for tired eyes
So excited to announce:
✅ English publication agreement with @CambridgeUP was signed
✅ Chinese 2nd edition was sent to print
Both in @PyTorch
If your prompt tuning can't converge easily, make it semi-parametric.
🆕Memory prompt: input-adaptive, yet no memory-prompt tuning needed
✅Full fine-tuning on 31 tasks -> zero-shot generalization
✅Parameter-efficient fine-tuning on GLUE -> task transferability on SuperGLUE
[1/4]
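A toy sketch of what an input-adaptive memory prompt could look like; this is my reading of the idea, not the paper's implementation, and every module and dimension name below is an assumption. The point it illustrates: keep a small bank of prompt vectors and mix them per input via attention, so the soft prompt adapts to each input.

```python
import torch
import torch.nn as nn

class MemoryPrompt(nn.Module):
    """Toy input-adaptive soft prompt: attention over a fixed memory bank (hypothetical)."""
    def __init__(self, d_model: int = 768, memory_size: int = 16, prompt_len: int = 8):
        super().__init__()
        # Bank of reusable prompt "memories": (memory_size, prompt_len, d_model).
        self.memory = nn.Parameter(torch.randn(memory_size, prompt_len, d_model) * 0.02)
        self.query = nn.Linear(d_model, d_model)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, d_model); summarize the input by mean pooling.
        pooled = self.query(input_embeds.mean(dim=1))       # (batch, d_model)
        keys = self.memory.mean(dim=1)                      # (memory_size, d_model)
        weights = torch.softmax(pooled @ keys.T, dim=-1)    # (batch, memory_size)
        # Input-adaptive prompt: convex mix of memory entries.
        prompt = torch.einsum("bm,mld->bld", weights, self.memory)
        return torch.cat([prompt, input_embeds], dim=1)     # prepend the soft prompt

x = torch.randn(2, 10, 768)            # fake token embeddings
print(MemoryPrompt()(x).shape)         # torch.Size([2, 18, 768])
```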
Thanks @AIatMeta for having me on the Llama for Developers podcast! Tokenizers play a crucial role in LLMs, impacting data handling, pre-training, post-training, and inference:
🔹With a larger vocabulary, domain-specific words are more likely to be single tokens, preserving
New video out today! Understanding the Meta Llama 3 Tokenizer with @astonzhangAZ from the Llama research team. This conversation dives into the change from SentencePiece to Tiktoken and what that enables for our latest models.
Watch the full video ➡️
"Perform a task on smart phones? Train an agent using screenshots"
You Only Look at Screens: Multimodal Chain-of-Action Agents
- New state of the art: 90% action type prediction accuracy and 74% overall action success rate on AITW
- No OCR or API calls
- Code released
[1/5]
It’s here! Meet Llama 3, our latest generation of models that is setting a new standard for state-of-the-art performance and efficiency for openly available LLMs.
Key highlights
• 8B and 70B parameter openly available pre-trained and fine-tuned models.
• Trained on more
"Imagine learning a textbook with no figures."
Multimodal chain-of-thought (Multimodal-CoT) in Language Models
- Outperform GPT-3.5 by 16% (75%->91%) and surpass human performance on ScienceQA
- Less than 1B params (so you can train more easily)
- Code & model released
[1/6]
Thanks Lex! We're thrilled that Llama, as an open-source model, supports developers. With the upgrades in this Llama 3 release, we can't wait to see how video podcasts help developers quickly build amazing things together.
Nice to hear that our "Beyond Fully-Connected Layers with Quaternions" paper received an Outstanding Paper Award at this year's ICLR! Thanks teammates, reviewers, and committee members @iclr_conf
D2L is the first deep learning textbook that offers a #JAX implementation + it's FREE (as far as we know).
Get started with "import jax" at
Thank @gollum_here from @AmazonScience! (2/5)
Riveting read! High-stakes decision-making demands reliable evaluations. Understanding evaluation failure modes is crucial. Post-training evaluations, which are closer to human interactions, pose more challenges than pre-training ones.
New blog post where I discuss what makes a language model evaluation successful, and the "seven sins" that hinder an eval from gaining traction in the community:
Had fun presenting this at Stanford's NLP Seminar yesterday!
@adiletech Our new tokenizer will process more text with fewer tokens. Conservatively, it improves token efficiency by 15% compared to the Llama 2 tokenizer. Grouped query attention counterbalances the slight increase in inference cost due to the vocab increase
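To see what "token efficiency" means in practice, here's a small illustration. These are not the Llama tokenizers themselves (they aren't in tiktoken's public registry); comparing a ~50K-vocab encoding with a ~100K-vocab encoding merely stands in for the smaller-vs-larger-vocab trade-off discussed above.

```python
import tiktoken  # pip install tiktoken

# Illustration only: gpt2 (~50K vocab) vs cl100k_base (~100K vocab).
small = tiktoken.get_encoding("gpt2")
large = tiktoken.get_encoding("cl100k_base")

text = "Grouped query attention counterbalances the inference cost of a larger vocabulary."
n_small, n_large = len(small.encode(text)), len(large.encode(text))
saving = 1 - n_large / n_small
print(f"{n_small} tokens -> {n_large} tokens ({saving:.0%} fewer)")
```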
While ConvNeXt (2022) further sparks the debate between CNNs and transformers for vision, we added a CNN design section covering ResNeXt-ification (@sainingxie et al.), RegNets (Radosavovic et al.), and ConvNeXt (@liuzhuang1234 et al.). Preview at:
Multimodal-CoT *under 1 billion parameters* outperforms the previous state-of-the-art LLM (GPT-3.5) by 16% (75.17%→91.68%) and even surpasses human performance on the ScienceQA benchmark.
[5/6]
A heartfelt thank you to my incredibly talented co-authors @zacharylipton, @mli65, @smolix from @AmazonScience, and the amazing 700+ open-source contributors. Your dedication and expertise have been instrumental in making our book a valuable resource for everyone!
Existing studies related to CoT reasoning are largely isolated in the language modality.
Given inputs in different modalities, Multimodal-CoT decomposes multi-step problems into intermediate reasoning steps (rationale) and then infers the answer.
[2/6]
v1.0-alpha🎉
📚New topics on transformers for vision and large-scale pretraining, CNN design, generalization, etc
⚡️New API inspired by @PyTorchLightnin to streamline the presentation
🦥"Lazy" layers to simplify @PyTorch code
🈶zh/pt/tr/vi/ko/ja ongoing
“Everything is a special case of a Gaussian process.” Gaussian processes and deep neural networks are highly complementary and can be combined to great effect:
Thank @andrewgwils from @nyuniversity! (4/5)
The superior performance of manual chain-of-thought (Manual-CoT) prompting over a cheer (Zero-Shot-CoT) hinges on hand-crafting demos one by one
We can save such manual effort by using the "Let's think step by step" prompt to generate reasoning chains for demos one by one
[2/7]
#ICML spotlight talk! @AmazonScience
Removing ✂️Batch Normalization (BN)✂️ Boosts📈 Adversarial Training (AT)
💪 AT makes models robust.
🤒 However, BN struggles to model different statistics of clean and AT samples.
💡 Just remove BN layers in AT!
1/
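A minimal sketch of the "just remove BN" recipe in PyTorch: swap every BatchNorm2d in a standard ResNet for an identity, so no layer mixes clean and adversarial batch statistics. The paper's exact training recipe may differ; treat this as an assumption-laden illustration (the BN-free network would then be trained from scratch with adversarial training).

```python
import torch.nn as nn
from torchvision.models import resnet18

def remove_batchnorm(module: nn.Module) -> nn.Module:
    """Recursively replace BatchNorm2d with Identity (no batch statistics)."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.Identity())
        else:
            remove_batchnorm(child)
    return module

# Build a BN-free backbone, then adversarially train it *from scratch*;
# the point is that no layer has to model both clean and AT statistics.
model = remove_batchnorm(resnet18(num_classes=10))
assert not any(isinstance(m, nn.BatchNorm2d) for m in model.modules())
```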
Tired of setting hyperparameters in a trial-and-error manner? You may wish to check out the systematic hyperparameter optimization approach:
Thank @kleiaaro, Matthias Seeger, and @cedapprox from @AmazonScience! (5/5)
With GPT-3 on ten public tasks of arithmetic/commonsense/symbolic reasoning, Auto-CoT consistently matches or exceeds the performance of Manual-CoT that requires manual designs. This indicates that LLMs can perform CoT reasoning by automatically constructing demonstrations.
[6/7]
However, simple solutions fail. E.g., given a test question of a dataset, retrieving similar questions and invoking Zero-Shot-CoT to generate reasoning chains will FAIL.
Why? Large language models (LLMs) are not perfect: there can be mistakes in reasoning chains
[3/7] ⬇️
Diversity mitigates misleading by similarity
How Auto-CoT fixes it:
1. Partition questions of a given dataset into clusters
2. Select a representative question from each cluster and generate its reasoning chain using Zero-Shot-CoT with heuristics
No gradients, just cheers
[5/7]
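A condensed sketch of the Auto-CoT recipe described above, assuming sentence-transformers and scikit-learn for embeddings/clustering, with a generic `llm()` placeholder standing in for the actual model API:

```python
# Sketch of Auto-CoT demo construction; `llm` is a placeholder for your model call.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def build_demos(questions, k=8, llm=lambda p: "<reasoning chain>"):
    # 1. Partition questions into k clusters by embedding similarity.
    embedder = SentenceTransformer("all-MiniLM-L6-v2")
    emb = embedder.encode(questions)
    km = KMeans(n_clusters=k, n_init=10).fit(emb)
    demos = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        # 2. Pick the question closest to the cluster centroid as representative.
        rep = idx[np.argmin(np.linalg.norm(emb[idx] - km.cluster_centers_[c], axis=1))]
        q = questions[rep]
        # Generate its rationale with the Zero-Shot-CoT "cheer"; the paper also
        # applies simple heuristics (e.g., length limits) to filter weak chains.
        chain = llm(f"Q: {q}\nA: Let's think step by step.")
        demos.append(f"Q: {q}\nA: Let's think step by step. {chain}")
    return "\n\n".join(demos)  # prepend to the test question at inference time
```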
Why Multimodal-CoT?
Vision input is a wealth of information that was not fully exploited for CoT reasoning in the past.
See an example where the baseline (without using vision features) fails to predict the right answer because it is misled by hallucinated rationales.
[3/6]
@HoskinsAllen @AmazonScience @zhangzhuosheng @mli65 @karypis @smolix
Because our goal is to enable CoT reasoning on multimodal benchmarks, rather than proposing yet another CoT method for language-only benchmarks. On the same multimodal benchmark, we compared with GPT-3.5 w/CoT (Lu et al. 2022a) in Table 4. Takeaway: vision input can't be ignored for CoT
How Multimodal-CoT works:
(i) Feed the model with language and vision inputs to generate rationales
(ii) Append the original language input with this generated rationale. Feed the updated language input with the original vision input to the model to infer the answer
[4/6]
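The two-stage procedure above, in rough Python form. The fused language+vision model is only stubbed out here; `StubModel` and its `generate` signature are illustrative assumptions, not the released code.

```python
class StubModel:
    """Stand-in for a fused language+vision encoder-decoder."""
    def generate(self, text, vision, target):
        return f"<{target}>"

def multimodal_cot(model, question, image_features):
    # Stage (i): generate a rationale from language + vision inputs.
    rationale = model.generate(text=question, vision=image_features, target="rationale")
    # Stage (ii): append the rationale to the original language input and,
    # together with the same vision input, infer the final answer.
    answer = model.generate(
        text=f"{question}\nRationale: {rationale}", vision=image_features, target="answer"
    )
    return rationale, answer

print(multimodal_cot(StubModel(), "Which force is acting?", image_features=None))
```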
After similar questions to a test question are retrieved, wrong demos caused by Zero-Shot-CoT may mislead the same LLM to reason similarly with a wrong answer for the test question
---Misleading by Similarity
See how calculating "the rest" is mistaken as "the total" in the figure
[4/7]
We have re-organized Chapter: NLP pretraining () and Chapter: NLP applications (), and added sections of BERT (model, data, pretraining, fine-tuning, application) and natural language inference (data, model).
Presenting our embarrassingly simple joint data augmentation for vision-language representation learning: MixGen.
TL;DR: Just interpolate the two images and concatenate two text sequences to generate a new image-text pair.
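The whole augmentation fits in a few lines. A sketch under the assumption that images are float tensors of the same shape and λ = 0.5, the simplest setting:

```python
import torch

def mixgen(img1: torch.Tensor, img2: torch.Tensor,
           text1: str, text2: str, lam: float = 0.5):
    """MixGen-style joint augmentation sketch: interpolate images, concat texts."""
    new_img = lam * img1 + (1 - lam) * img2   # pixel-space interpolation
    new_text = text1 + " " + text2            # simple text concatenation
    return new_img, new_text

img, txt = mixgen(torch.rand(3, 224, 224), torch.rand(3, 224, 224),
                  "a dog on grass", "a red bicycle")
print(img.shape, "|", txt)
```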
Dive into Deep Learning v0.15.0 is released! More chapters are in PyTorch and TensorFlow. More chapters are close to v1.0. Adopted at 140 universities: if you use D2L to teach, apply for free computing resources by Nov 22. Check out
@ThomasViehmann Nice explanation! Nonparametric methods tend to have a level of complexity that grows as data increases. @D2L_ai recently added a new section on Generalization in DL, which can be previewed at
@zacharylipton I probably know why you are asking this question. On Page 1 of : "In this paper, we will focus on an efficient deep neural network ... in conjunction with the famous “we need to go deeper” internet meme [1]."
A stop sign got recognized as a speed-limit sign due to backdoor samples?
Hard to unlearn backdoor behavior from the network?
Why unlearn? We can *bait and trap* backdoors in a small subnetwork, then *replace* it
#NeurIPS2022
Our #NeurIPS2022 paper proposes a decrease-and-conquer strategy for backdoor defense.
The major finding: A simple image reconstruction loss can successfully trap backdoors into a small subnetwork while preserving the rest of the network largely uncontaminated (see Fig. 2).
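My loose reading of the trap-and-replace idea in code form; all module names and the λ weighting are assumptions, not the paper's implementation. During training on possibly poisoned data, an auxiliary reconstruction head pressures a shallow "trap" stem to absorb the backdoor; afterwards the stem is replaced and re-trained on a small clean set.

```python
import torch
import torch.nn as nn

# Hypothetical decomposition: shallow "trap" stem + main classifier head.
stem = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())        # bait subnetwork
classifier = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 32, 10))  # kept network
decoder = nn.Conv2d(32, 3, 3, padding=1)  # auxiliary image-reconstruction head

ce, mse, lam = nn.CrossEntropyLoss(), nn.MSELoss(), 1.0

def training_loss(x, y):
    feats = stem(x)
    # The reconstruction loss ties the stem to low-level image content, which is
    # where trigger patterns live -- so the backdoor gets "trapped" in the stem.
    return ce(classifier(feats), y) + lam * mse(decoder(feats), x)

x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
print(training_loss(x, y).item())
# Stage 2 (not shown): discard the poisoned stem, re-initialize it, and
# fine-tune on a small clean dataset while keeping the rest of the network.
```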
The best way to understand deep learning is learning by doing. Dive into Deep Learning () is an interactive deep learning book on Jupyter notebooks, using the NumPy interface. Check it out:
@zwhe99 The increased vocab size has a diminishing influence as model size increases. Grouped query attention helps lower the inference cost. Given the token-efficiency benefit and the benchmark eval boost, it's worth the upgrade.
@teortaxesTex You are referring to Llama 3, right? Increasing the vocab size slightly lowers training wps and inference efficiency, but a more efficient tokenizer gives you more text for the same number of tokens. GQA also counterbalances the inference cost.
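For readers unfamiliar with grouped query attention: it shares one key/value head across several query heads, shrinking KV projections and the KV cache, which is where the inference savings come from. A self-contained PyTorch sketch; the dimensions are illustrative, not Llama's actual config.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """Minimal GQA: n_kv_heads < n_heads; each KV head serves a group of Q heads."""
    def __init__(self, d_model=512, n_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.wq = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        # Fewer KV projections => smaller KV cache at inference time.
        self.wk = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wv = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.wo = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.wq(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.wk(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.wv(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so it covers its whole group of query heads.
        rep = self.n_heads // self.n_kv_heads
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        out = F.scaled_dot_product_attention(q, k, v)  # (b, n_heads, t, head_dim)
        return self.wo(out.transpose(1, 2).reshape(b, t, -1))

print(GroupedQueryAttention()(torch.randn(2, 16, 512)).shape)  # (2, 16, 512)
```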
AITW is a large-scale benchmark for UI control, which contains natural language instructions, screenshots, and actions. There are 30K unique instructions covering multi-step tasks such as app operation, web search, and web shopping.
Auto-UI obtains new overall SOTA.
[4/5]
To make machine learning more accessible, we worked with the SageMaker Studio Lab team to allow product users to benefit from open-source contributions to . Our team @awscloud is hiring research interns and full-time (email to az at )
In the parameter-efficient fine-tuning setting, we pretrain SPT on GLUE datasets and evaluate its fine-tuning performance on SuperGLUE datasets:
- SPT has better task transferability, even better than full fine-tuning: SPT is cheaper & better
[3/4]
Why not use OCR or icon detectors to parse screenshots into text?
- Undesirable lengthy inputs to language models
Why not invoke internal APIs?
- Often inaccessible in third-party Apps
Our Auto-UI is a multimodal approach that *directly* interacts with the interface.
[2/5]
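To make "directly interacts with the interface" concrete, here is a hypothetical shape for the predicted actions; the real AITW/Auto-UI action space differs in its details, so this is purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class UIAction:
    """Hypothetical structured action predicted from a screenshot + instruction."""
    action_type: str                                    # e.g. "click", "scroll", "type"
    touch_point: Optional[Tuple[float, float]] = None   # normalized (x, y) on screen
    text: Optional[str] = None                          # payload for "type" actions

# The agent maps (instruction, screenshot, action history) -> UIAction directly,
# with no OCR pass and no app-internal API calls in between.
action = UIAction(action_type="click", touch_point=(0.42, 0.87))
print(action)
```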
Dive into Deep Learning: An interactive deep learning book with code, math, and discussions, based on the NumPy interface. I really like the format of the textbook!
We fine-tune full LMs with SPT (semi-parametric prompt tuning) on 31 different tasks from 8 different domains and evaluate zero-shot generalization on 9 held-out datasets under 5 NLP task categories.
Zero-shot generalization can scale up.
[2/4]