1/ Can Large Language Models (LLMs) truly reason, or are they just sophisticated pattern matchers? In our latest preprint, we explore this key question through a large-scale study of both open-source models like Llama, Phi, Gemma, and Mistral, and leading closed models, including the
Our team at Apple is looking for interns to work on Continual/Lifelong/Transfer Learning, Multi-Modal Large Models, ML Efficiency, and the like. The position is available as early as next month, and the duration is >6 months. Feel free to send your resume to m_farajtabar
@apple
My team at #Apple is looking for interns to work on Large Language Models (#LLM), especially on efficient "inference" and training. Please email your CV, highlighting related research or code, to m_farajtabarATappleDOTcom. The ideal candidate must:
13/ Overall, we found no evidence of formal reasoning in language models, including open-source models like #Llama, #Phi, #Gemma, and #Mistral, and leading closed models, including the recent #OpenAI #GPT-4o and #o1-series. Their behavior is better explained by sophisticated
12/ Understanding LLMs' true reasoning capabilities is crucial for deploying them in real-world scenarios where accuracy and consistency are non-negotiable, especially in #AI_safety, #alignment, #education, #health_care, and #decision_making systems. Our findings emphasize the
1) be available for a long internship (both spring and summer), 2) have work authorization in the US and ideally be able to move to Seattle, 3) have related research artifacts, e.g., papers at NLP and ML conferences, and 4) have hands-on experience with PyTorch/JAX.
8/ This begs the question: do these models truly understand mathematical concepts? Introducing #GSM_NoOp! We add a single clause that seems relevant but doesn't contribute to the overall reasoning (hence "no-op"). Check out what happens next!
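The trick can be shown in a few lines. This is a toy illustration of the no-op idea; the question below is invented for this sketch, not taken from the actual GSM-NoOp set:

```python
# Start from a simple word problem, then insert a clause that sounds
# relevant but changes no quantity in the computation.
base = "Sam has 5 apples and buys 3 more. How many apples does Sam have?"
noop_clause = "Two of the apples are slightly smaller than average."
noop = base.replace("How many", noop_clause + " How many")

answer = 5 + 3  # still 8: the inserted clause is irrelevant to the arithmetic
# A model that answers 6 (subtracting the "smaller" apples) is matching a
# discount-like pattern from training data rather than reasoning.
```

The correct answer is identical for both variants; only the surface text differs.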
3/ Introducing GSM-Symbolic, our new tool to test the limits of LLMs in mathematical reasoning. We create symbolic templates from the #GSM8K test set, enabling the generation of numerous instances and the design of controllable experiments. We generate 50 unique GSM-Symbolic
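A minimal sketch of the symbolic-template idea. The template, names, and number ranges here are invented for illustration, not from the actual GSM-Symbolic release:

```python
import random

# One underlying problem, many surface forms: names and numbers vary,
# but the ground-truth answer is computed from the same symbolic recipe.
TEMPLATE = "{name} has {x} apples and buys {y} more. How many apples does {name} have?"

def instantiate(seed):
    """Fill the template with a fresh name and numbers; return (question, answer)."""
    rng = random.Random(seed)
    name = rng.choice(["Sophia", "Liam", "Ava", "Noah"])
    x, y = rng.randint(2, 20), rng.randint(2, 20)
    return TEMPLATE.format(name=name, x=x, y=y), x + y

# 50 instances of the same template, each a controllable test case
instances = [instantiate(seed) for seed in range(50)]
```

Because every instance carries its computed answer, accuracy can be measured across the whole set, exposing variance that a single fixed test set hides.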
9/ #Result 4: A massive performance drop! All models, including the o1 models, show significant declines. While it'll be interesting to see how grade-school students perform on similar datasets, I doubt the drop would be this severe.
5) be enrolled in a PhD program, ideally close to graduation.
Apologies in advance if I cannot respond to all the inquiries, but rest assured I'll read all the emails and take them into consideration. I will only follow up with the candidates who best fit our projects.
5/ #Result 2: The fragility of supposed LLM reasoning. LLMs remain sensitive to changes in proper names (e.g., people, foods, objects), and even more so when numbers are altered. Would a grade-school student's math test score vary by ~10% if we only changed the names?
2/ When OpenAI released GSM8K ~3 years ago, GPT-3 (175B) scored 35% on the GSM8K test. Today, models with ~3B parameters are surpassing 85%, and larger ones are hitting >95%. But has model 'reasoning' really improved? How much of this is genuine #logical/#symbolic reasoning? vs.
🚀 Excited to share our latest research on efficient large language model (LLM) inference with limited memory. We're tackling the challenge of running LLMs beyond the usual assumption that the entire model fits into DRAM! #LLM #AI
Thanks @_akhaliq for covering our work!
Apple announces LLM in a flash: Efficient Large Language Model Inference with Limited Memory
paper page:
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their
I'll be at #neurips2023 from Thursday to Saturday. Looking forward to meeting old and new friends. Besides that, I'll be around our 4 posters! Happy to chat about Large Language Models (#LLM) efficient inference and training, #Multimodal and #CLIP models, and #Continual learning. /0
Our new paper 'Weight Subcloning' proposes a method for initialization and faster training of #transformer models. This approach transfers knowledge from large pretrained models to smaller versions by directly copying weights, after sorting and shuffling. 1/2
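The copy-after-sorting step can be sketched in a few lines. This toy version scores neurons by weight-row magnitude, which is illustrative only, not the paper's exact procedure:

```python
import random

# A "pretrained" layer with 32 output neurons of 16 weights each.
random.seed(0)
big_weights = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]

def subclone(weights, n_keep):
    # Score each output neuron by the squared L2 norm of its weight row,
    # sort descending, and keep the top rows as the small model's init.
    scored = sorted(weights, key=lambda row: sum(w * w for w in row), reverse=True)
    return scored[:n_keep]

small_weights = subclone(big_weights, 8)  # 32-neuron layer -> 8-neuron init
```

The smaller model then starts from informative weights instead of a random init, which is where the faster-training benefit comes from.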
7/ #Result 3: As questions increase in difficulty (M1 → Symbolic → P1 → P2), not only does performance drop, but variance also rises, making models increasingly unreliable.
11/ ...but even o1-preview makes the same silly mistakes, like this one. Either it doesn't understand what 'now' is, or it doesn't understand what 'last year' is, or, more likely, its training data contains this inflation pattern and it's following it again.
4/ #Result 1: Current accuracies on GSM8K are not reliable! We observe LARGE performance variation: Llama 8B scores anywhere between 70% and 80%, Phi-3 scores between 75% and 90%, and so on. For most models, the average performance on GSM-Symbolic is lower than on GSM8K.
10/ #Result 5: Can scaling data, models, or compute fundamentally solve this? We don't think so! #OpenAI's #o1-series performs better but still suffers from slight performance variations. #o1_preview shows significant improvements, but...
Q: How do we keep foundation models up to date with the latest data?
⏱️ We introduce the first web-scale Time-Continual (TiC) benchmark with 12.7B timestamped image-text pairs for continual training of VLMs, and demonstrate the efficacy of a simple replay method.
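The replay idea is simple enough to sketch. The batch size and mix ratio below are illustrative, not the paper's settings:

```python
import random

# When training on the newest time slice, mix in examples sampled from
# earlier slices so the model doesn't forget older data distributions.
def make_batch(new_data, replay_buffer, batch_size=8, replay_frac=0.5, seed=0):
    rng = random.Random(seed)
    n_replay = int(batch_size * replay_frac) if replay_buffer else 0
    batch = rng.sample(new_data, batch_size - n_replay)  # fresh examples
    batch += rng.sample(replay_buffer, n_replay)         # replayed examples
    return batch

batch = make_batch([f"2024-{i}" for i in range(100)],
                   [f"2021-{i}" for i in range(100)])
```

With an empty buffer (the first time slice), the batch is all-new data; afterwards, half of each batch revisits the past.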
I asked #Gemini to find the #English equivalent of a #Persian phrase, and it spat out a #Russian term in the middle of #generation. A very interesting #bug! Perhaps it couldn't recall the Persian equivalent of "option" in that context & used its Russian knowledge ;-)
6/ What if we adjust question difficulty? We introduce 3 new variants of GSM-Symbolic to study model behavior: removing one clause (GSM-M1), adding one clause (GSM-P1), or adding two clauses (GSM-P2).
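The three difficulty knobs amount to dropping or appending clauses. Here is a toy sketch (the clauses are invented, not from the benchmark):

```python
# GSM-M1 removes one reasoning clause from the base question,
# GSM-P1 adds one, and GSM-P2 adds two.
clauses = [
    "Sam has 5 apples.",
    "He buys 3 more.",
    "Then he gives away 2.",                        # last clause of the base
    "Then he buys twice as many as he gave away.",  # extra clause for P1
    "Finally he eats 1.",                           # second extra for P2
]
m1   = " ".join(clauses[:2])  # one reasoning step fewer than the base
base = " ".join(clauses[:3])
p1   = " ".join(clauses[:4])  # one step more
p2   = " ".join(clauses)      # two steps more
```

Each added clause adds one more arithmetic step, so the chain of reasoning required grows linearly from M1 to P2.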
I'm not attending #ICLR2024, but here are two of my papers from our team at #Apple:
1) ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Wed 8 May
2) TiC-CLIP: Continual Training of CLIP Models
Fri 10 May
Indeed, we've shown that there is no significant performance gap when you use ReLU in Llama or Falcon, after only a few epochs of fine-tuning (and also with from-scratch training), and any remaining gap can easily be bridged with extra training (scaling laws!).
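Why bother swapping the activation at all? ReLU emits exact zeros for negative pre-activations, while SiLU (the gate in SwiGLU) almost never does, and only exact zeros let you skip the matching down-projection rows at inference. A minimal sketch:

```python
import math

def silu(x):
    # SiLU(x) = x * sigmoid(x): small but nonzero for negative inputs
    return x / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

pre_acts = [-2.0, -0.5, -0.1, 0.3, 3.0]
relu_zeros = sum(1 for x in pre_acts if relu(x) == 0.0)
silu_zeros = sum(1 for x in pre_acts if silu(x) == 0.0)
# relu_zeros == 3 while silu_zeros == 0: only ReLU yields exact zeros,
# i.e., neurons whose weights never need to be touched for this token.
```

That exact-zero "dead zone" is what the sparsity-aware inference methods below exploit.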
They are hinting at that, sure.
But they're testing on OPT, as in most of those Hype-Aware Quantization papers.
Why?
OPT's FF layers use ReLU. It sacrifices perplexity but makes activations sparse.
I'm skeptical it'll work for SwiGLU in LLaMA… without retraining.
(paper: MoEfication)
With all the sparsity-aware, context-based memory-loading papers coming out (PowerInfer getting 11x and Apple getting 25x speedup on GPU), ReLU's dead zone is turning out to be important.
Llama-class models (SwiGLU) might not have much longevity after all once all the Metal work
Our company is embarking on an ambitious journey to develop Artificial General Intelligence (AGI). With a $1 billion investment, we are positioning ourselves to unlock the future of AI. Here’s how we will allocate that funding and drive this groundbreaking initiative forward.
1.
@sirbayes
If you're referring to Llama's, it's Llama 3 8B, which is quite an advanced model and has presumably been trained with lots of similarly crafted data; still, a 10% deviation is too much for me. For the older models it's more damning (a table in the appendix has all the numbers). I may
@farajtabar
@Chaay
...see it and optimize it. Even Milgrom had a paper in this area. The interesting examples that come to my mind are these:
As you mentioned, for economists who know reinforcement learning and Markov processes, the field is wide open.
You got the citations, but do the ablations
You'd lack solutions without convolutions
Denoise yourself prof, you owe me & Geoff!
Little bro Bengio, you're just a soph, did I cough?
My GPU wows, you just hide behind eyebrows
I self-supervise, you capsize 🧠🧠🧠🧠🧠🧠
#torched
@farajtabar
@Chaay
...with a lot of buzz is using AI for market design or for changing and improving its rules. In ordinary applications, AI is used to predict behavior, estimate the hidden state of economic agents, or optimize an objective function for an agent or a system. But treating the market itself as an intelligent agent
@farajtabar
@Chaay
For research work, I agree that investing in 3 and 4 offers more of a competitive advantage. But every now and then, to get an excellent, presentable result or to speed up model evaluation, you'll inevitably have to get your hands dirty with 1 and 2.
The replies the others wrote were thorough and comprehensive, so I won't get into that, but one fun, hard, and
4) SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding () on Friday in UniReps: Unifying Representations in Neural Models Workshop (arxiv: )
@iofu728
@main_horse
@_akhaliq
Indeed, with a similar motivation, we looked at the sparsity of Llama and Falcon and saw that their activation function can be replaced by ReLU, conditioned on a small amount of fine-tuning (or even from-scratch training), without affecting performance.
.@HDNeverFalls
@Chaay
@farajtabar
I'm genuine. That account is fake. Don't look at the number of his followers. They are all fake: bots, trolls, reformists, stability islanders, Arzeshi, Barandaz, zero-sum-game enthusiasts (:-P), etc., etc.
1) ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models () on Saturday at Efficient Natural Language and Speech Processing workshop (arxiv: )
@boazbaraktcs
Thanks, Boaz, for the comment. I think prompting can help a bit, or even more than a bit (like how CoT helps), especially on harder problems like GSM-P1 or -P2, but at the end of the day one can come up with harder variants (-Pn) or distractions (No-Op) that have not been seen in
🙏 All these fun images were generated by #DALL_E. Please check out our paper for more details and more serious images: ! Kudos to my co-authors and other colleagues for the fantastic cross-functional (#AIML, #SW, #HW) collaboration!
Hey guys,
I'm gonna present LLM in a Flash at ACL 2024. Hit me up if you're in Bangkok.
Updates from previous version:
- Llama 2 results
- Some results on Apple GPUs (Metal)
- Speculative decoding
- Memory Latency Tradeoff
- Impact of longer generation
3) CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement () on Friday in UniReps: Unifying Representations in Neural Models Workshop (arxiv: )
💡 Key Insight: Store model parameters in higher-capacity flash memory and load them selectively into DRAM during inference. This avoids needing to fit the entire model in DRAM. Our method optimizes data management, reducing data transfer and enhancing memory usage efficiency.
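One way to picture this idea (the class and eviction policy below are a hypothetical simulation, not the paper's actual system): treat flash as the full weight store and DRAM as a small cache of recently used rows, and count how many flash reads an access pattern actually triggers.

```python
class RowCache:
    """Simulated DRAM cache over a flash-resident weight matrix."""

    def __init__(self, flash_store, capacity):
        self.flash = flash_store   # dict: row_id -> row weights ("flash")
        self.capacity = capacity   # max rows resident in "DRAM"
        self.dram = {}             # currently loaded rows
        self.transfers = 0         # flash -> DRAM loads performed

    def get(self, row_id):
        if row_id not in self.dram:
            if len(self.dram) >= self.capacity:
                # evict the oldest loaded row (simple FIFO policy)
                self.dram.pop(next(iter(self.dram)))
            self.dram[row_id] = self.flash[row_id]
            self.transfers += 1
        return self.dram[row_id]

flash = {i: [float(i)] * 4 for i in range(100)}
cache = RowCache(flash, capacity=10)
for row in [1, 2, 3, 1, 2]:   # repeated rows hit DRAM, not flash
    cache.get(row)
# cache.transfers == 3: only the first touch of each row reads flash
```

Because activation sparsity makes the hot set of rows small and stable, most lookups hit DRAM, which is exactly why reducing flash-to-DRAM transfers pays off.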
@farajtabar
@a_ghasemi
Hi there :) Looks interesting.
I assume you already have word embeddings for Persian text. If you don't want to start completely from scratch, you can use them and train a small seq2seq model with those embeddings on the poems. That may work better than a sent2vec model from scratch?