muhtasham

@Muhtasham9

Followers: 1,359
Following: 849
Media: 232
Statuses: 1,628

In my pre-training years

Latent Space
Joined March 2020
Pinned Tweet
@Muhtasham9
muhtasham
1 year
w boss
Tweet media one
3
1
68
@Muhtasham9
muhtasham
6 months
A short thread about changes in the transformer architecture since 2017. Reading articles about LLMs, you can see phrases like “we use a standard transformer architecture.” But what does "standard" mean, and have there been changes since the original article? (1/6)
Tweet media one
@Muhtasham9
muhtasham
2 years
Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding to the Lindy effect: the idea that the older something is, the longer it's likely to be around in the future.
0
2
13
7
138
887
@Muhtasham9
muhtasham
9 months
Evaluating abstractive summarization remains an open area for further improvement. If you've ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this
Tweet media one
9
33
261
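For context on what such a tool automates: scoring batches of generated summaries against references. A minimal stand-in sketch using ROUGE via Hugging Face's evaluate library (my assumption for illustration; the actual weekend hack is not shown in the tweet):

    # pip install evaluate rouge_score
    import evaluate

    rouge = evaluate.load("rouge")
    preds = ["the cat sat on the mat"]        # model summaries
    refs = ["a cat was sitting on the mat"]   # reference summaries
    print(rouge.compute(predictions=preds, references=refs))  # rouge1/rouge2/rougeL F-scores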
@Muhtasham9
muhtasham
2 years
Excited to announce the most up-to-date and CPU-friendly BERT, trained on the most recent snapshot of the internet. Took a day and 8x A100s to train. 🤗 The model is open-source and I hope the community can benefit from it. It was created…
1
41
238
@Muhtasham9
muhtasham
2 years
Meta: Multi-tasking while reading about Multi-task NLP models
Tweet media one
3
10
130
@Muhtasham9
muhtasham
7 months
StarCoder2 running on M2 8GB
1
7
91
@Muhtasham9
muhtasham
7 months
DeepMind folks can now steal weights behind APIs: “We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix.” Who wants to do the same for GPT-4?
7
5
79
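The attack's core trick: API logits are W @ h, so a stack of logit vectors has numerical rank equal to the hidden dimension. A toy numpy sketch with synthetic data (sizes invented; not the paper's code):

    import numpy as np

    vocab, hidden, n_queries = 1000, 64, 256
    W = np.random.randn(vocab, hidden)        # stand-in for the output projection
    H = np.random.randn(hidden, n_queries)    # hidden states from many prompts
    logits = W @ H                            # what a full-logits API would expose

    s = np.linalg.svd(logits, compute_uv=False)
    print(int((s > 1e-8 * s[0]).sum()))       # singular values collapse past the rank -> 64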
@Muhtasham9
muhtasham
2 years
@_jasonwei @arankomatsuzaki Might contain a lot of subtle issues, see the Clever Hans effect, which is always hard to debug. The law of leaky abstractions in action, as my supervisor says
2
5
71
@Muhtasham9
muhtasham
1 year
@Muhtasham9
muhtasham
1 year
🇺🇸US: Innovate then try to regulate 🇪🇺EU: Regulate then try to innovate
5
17
60
1
2
65
@Muhtasham9
muhtasham
1 year
🇺🇸US: Innovate then try to regulate 🇪🇺EU: Regulate then try to innovate
5
17
60
@Muhtasham9
muhtasham
7 months
The 🤗 MLX community is amazing. Quantized StarCoder2 model variants available here:
Small guide on running and training StarCoder2 locally:
pip install -U mlx-lm
To run inference on a quantized model:
python -m mlx_lm.generate --model
@BigCodeProject
BigCode
7 months
Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open!
Tweet media one
13
192
675
2
13
56
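A runnable sketch of the same workflow via mlx-lm's Python API (the repo id below is a guess; substitute any mlx-community StarCoder2 quant):

    # pip install -U mlx-lm
    from mlx_lm import load, generate

    # hypothetical repo id; check the mlx-community hub for the real one
    model, tokenizer = load("mlx-community/starcoder2-3b-4bit")
    print(generate(model, tokenizer, prompt="def fibonacci(n):", max_tokens=128))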
@Muhtasham9
muhtasham
8 months
Happy to show Pod-Helper: ⚡️ Lightning-speed transcription with Whisper 🔧 Built-in audio repair with good old Roberta 🧊 Checks your content's vibe effortlessly See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
2
4
35
@Muhtasham9
muhtasham
2 years
@tszzl Here is the PDF by @amasad
1
1
34
@Muhtasham9
muhtasham
1 year
If you missed out on the @full_stack_dl LLM bootcamp, don't worry! I've written a blog post about it. I hope you find my post informative and enjoyable to read, just as I enjoyed attending the bootcamp.
0
10
33
@Muhtasham9
muhtasham
8 months
🚀Now supports real-time streaming
@Muhtasham9
muhtasham
8 months
Happy to show Pod-Helper: ⚡️ Lightning-speed transcription with Whisper 🔧 Built-in audio repair with good old Roberta 🧊 Checks your content's vibe effortlessly See demo below running on TensorRT-LLM #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
2
4
35
2
7
31
@Muhtasham9
muhtasham
2 years
Let's see how different LMs multiply matrices / think 💭 using this Space. GPT-J-6B, I see what you did there 👀 Built using amazing @Gradio Blocks 🧱 APIs; you can also use the new @huggingface 🤗 Community Tab to make suggestions and collaborate
Tweet media one
@arankomatsuzaki
Aran Komatsuzaki
2 years
Large Language Models are Zero-Shot Reasoners Simply adding “Let’s think step by step” before each answer increases the accuracy on MultiArith from 17.7% to 78.7% and GSM8K from 10.4% to 40.7% with GPT-3.
Tweet media one
59
566
2K
2
11
28
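The quoted result is pure prompt surgery, nothing else. A minimal sketch, assuming a hypothetical complete() callable that wraps any LLM API:

    def zero_shot_cot(question: str, complete) -> str:
        # Stage 1: elicit reasoning with the magic phrase
        reasoning = complete(f"Q: {question}\nA: Let's think step by step.")
        # Stage 2: extract the final answer from the generated reasoning
        return complete(f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
                        "Therefore, the answer is")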
@Muhtasham9
muhtasham
2 years
Ultimate comeback
Tweet media one
0
4
31
@Muhtasham9
muhtasham
6 months
Using the language model (i.e. decoder-only) LLaMA-2 as an example, let's look at the major architectural improvements in LLMs: — Post-LayerNorm → Pre-LayerNorm (). This makes convergence more stable. Now the process goes in such a way that the
1
0
27
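Schematically, the two block orderings look like this (a PyTorch-style sketch for illustration; LLaMA additionally swaps LayerNorm for RMSNorm):

    import torch
    import torch.nn as nn

    def post_ln_block(x, sublayer, norm):   # original 2017 Transformer
        return norm(x + sublayer(x))        # normalize after the residual add

    def pre_ln_block(x, sublayer, norm):    # LLaMA-style
        return x + sublayer(norm(x))        # normalize the sublayer input

    x = torch.randn(2, 8, 16)
    y = pre_ln_block(x, nn.Linear(16, 16), nn.LayerNorm(16))  # same shapes, calmer gradients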
@Muhtasham9
muhtasham
1 year
📢 Just published: How traditional systems concepts like Branch Prediction & Virtual Memory Paging shape today's Large Language Models ( #LLMs ). LLMs = CPUs of early computing? Feedback welcome! 🔗
0
3
28
@Muhtasham9
muhtasham
6 months
— Absolute position embeddings → RoPE (). The idea is to rotate the token embeddings by an angle that depends on the position. And it works well. The method also opened up a number of modifications that extend the context to very large
1
0
23
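The rotation itself fits in a few lines. An illustrative sketch (not the LLaMA kernel): each pair of embedding dimensions is rotated by an angle proportional to the token's position:

    import torch

    def rope(x):                                             # x: (seq_len, dim), dim even
        seq, dim = x.shape
        pos = torch.arange(seq).float()[:, None]             # (seq, 1)
        freq = 10000 ** (-torch.arange(0, dim, 2).float() / dim)
        ang = pos * freq                                     # (seq, dim/2) angles
        x1, x2 = x[:, 0::2], x[:, 1::2]                      # dimension pairs
        out = torch.stack([x1 * ang.cos() - x2 * ang.sin(),  # 2D rotation per pair
                           x1 * ang.sin() + x2 * ang.cos()], dim=-1)
        return out.reshape(seq, dim)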
@Muhtasham9
muhtasham
2 years
Your car gathers a shocking amount of data about you, which you don’t get to see, and the manufacturer sells that to third parties, who use it in ways that are counter to your interests.
0
19
28
@Muhtasham9
muhtasham
7 months
"Flops are cheap, bandwidth is adding more pins, and latency is physics. Deal with it. "
1
6
23
@Muhtasham9
muhtasham
11 months
@vboykis He deployed on Friday
0
1
26
@Muhtasham9
muhtasham
6 months
— ReLU activation → SwiGLU (). Gated Linear Units (the family of methods to which SwiGLU belongs) add an element-wise multiplication of two matrices, one of which has passed through a sigmoid and thus controls the intensity of the signal
1
0
21
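As a sketch (note SwiGLU's gate actually goes through SiLU/swish, i.e. x·sigmoid(x); dimensions here are illustrative):

    import torch
    import torch.nn.functional as F

    def swiglu_ffn(x, w_gate, w_up, w_down):
        return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down   # gate * value, then project down

    d, h = 16, 43    # LLaMA-style FFNs use h ≈ (8/3)·d to keep the parameter count comparable
    w_gate, w_up, w_down = torch.randn(d, h), torch.randn(d, h), torch.randn(h, d)
    y = swiglu_ffn(torch.randn(2, d), w_gate, w_up, w_down)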
@Muhtasham9
muhtasham
2 years
When your model is training and you see live footage of forward and back prop via @weights_biases
0
4
21
@Muhtasham9
muhtasham
1 year
@CisLmu researcher distilling the latest paper on instruction tuning
Tweet media one
1
4
20
@Muhtasham9
muhtasham
6 months
Attention modifications (), for example using one K-V pair of matrices per group of Q matrices at once. This improvement mainly affects the optimization of inference. But there are also a huge number of methods aimed at reducing the quadratic
2
2
19
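The K-V sharing scheme (grouped-query attention) in a toy sketch, single batch, illustrative shapes:

    import torch

    def gqa(q, k, v, group):                  # q: (n_q, seq, d), k/v: (n_kv, seq, d)
        k = k.repeat_interleave(group, dim=0) # each K/V head serves `group` query heads
        v = v.repeat_interleave(group, dim=0)
        att = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return att @ v

    q, k, v = torch.randn(8, 32, 16), torch.randn(2, 32, 16), torch.randn(2, 32, 16)
    out = gqa(q, k, v, group=4)               # 8 query heads share 2 KV heads -> 4x smaller KV cache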
@Muhtasham9
muhtasham
6 months
Except it's called AI engineering now. Come to the @aiDotEngineer conf to learn more
@vboykis
vicki
6 months
2013 — 2023: you were hired to do machine learning but do data engineering 2023 — : you were hired to do machine learning but do web dev
20
35
762
3
2
21
@Muhtasham9
muhtasham
1 year
Burning some GPUs after the first @LangChainAI meetup in Munich
Tweet media one
1
3
18
@Muhtasham9
muhtasham
4 years
New SOTA on BCI SSVEP spellers. Our new DNN achieves impressive information transfer rates (ITR) with only 0.4 seconds of stimulation: 265.23 bits/min on the benchmark dataset and 196.59 bits/min on the BETA dataset. Paper: Code: #bci #ssvep
Tweet media one
3
1
14
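For reference, bits/min figures like these come from the standard Wolpaw ITR formula; a small computation with illustrative inputs (not the paper's exact protocol):

    from math import log2

    def itr_bits_per_min(n_targets: int, p: float, t_sec: float) -> float:
        # bits per selection, then scaled to selections per minute
        bits = log2(n_targets) + p * log2(p) + (1 - p) * log2((1 - p) / (n_targets - 1))
        return bits * 60.0 / t_sec

    print(itr_bits_per_min(40, 0.9, 0.9))  # 40-target speller, 90% accuracy, 0.9 s per selection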
@Muhtasham9
muhtasham
2 years
the amount of details one can get from @weights_biases is absolutely electric 💥
Tweet media one
0
2
16
@Muhtasham9
muhtasham
1 year
It all started with the GPT-2 moment, but only last week we trained an internal model and it did well, and fine-tuning made it 50% better. @amasad
Tweet media one
1
3
17
@Muhtasham9
muhtasham
1 year
Thanks for putting this together @nathanbenaich and @NotionHQ
Tweet media one
1
3
17
@Muhtasham9
muhtasham
1 year
Full house 🦜 @full_stack_dl
Tweet media one
0
1
16
@Muhtasham9
muhtasham
7 months
MLX weights below
@_lewtun
Lewis Tunstall
7 months
Happy to share the latest Zephyr recipe based on @Google 's Gemma 7B 🔷🔶! Outperforms Gemma 7B Instruct on MT Bench & AGIEval, showing the potential of RLAIF to align this series of base models 💪 🧑‍🍳 I hope this recipe enables the community to create many more fine-tunes!
Tweet media one
3
40
162
0
3
14
@Muhtasham9
muhtasham
6 months
“there's a graveyard of ideas around attention” @TrentonBricken
0
3
13
@Muhtasham9
muhtasham
10 months
@lvwerra Yay, congrats! Also got recently promoted to Sr Random Seed Engineer
1
0
15
@Muhtasham9
muhtasham
2 years
“The thing that determines whether you’re the product isn’t whether you’re paying for the product: it’s whether market power and regulatory forbearance allow the company to get away with selling you.” —  @doctorow
1
9
14
@Muhtasham9
muhtasham
1 year
@saahil addressing the industry's challenges in scaling MLOps in multimodal settings
Tweet media one
0
3
12
@Muhtasham9
muhtasham
7 months
Spotted GPT-5 in the wild
Tweet media one
0
1
14
@Muhtasham9
muhtasham
8 months
@swyx Shameless plug but this would make it easier to compare
@Muhtasham9
muhtasham
9 months
Evaluating abstractive summarization remains an open area for further improvement. If you've ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan 's post on this topic, I hacked something together over the weekend to streamline this
Tweet media one
9
33
261
1
0
12
@Muhtasham9
muhtasham
6 months
Machine learning is low-precision linear algebra. While developing the TPU, Google cut the mantissa down from 23 bits to 7 bits and invented bf16. Fast forward to now: we have 1.58-bit LLMs
@simonw
Simon Willison
6 months
Huh, I missed this earlier this month: Microsoft Research used a similar trick for their "1.58-bit" LLM BitNet
4
2
40
0
0
11
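Concretely, "cutting the mantissa" means bf16 is just fp32 with the low 16 bits zeroed out (8 exponent bits kept, 7 mantissa bits left). A tiny sketch:

    import struct

    def to_bf16(x: float) -> float:
        bits = struct.unpack(">I", struct.pack(">f", x))[0]   # fp32 bit pattern
        return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

    print(to_bf16(3.14159265))   # 3.140625: same range as fp32, far coarser precision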
@Muhtasham9
muhtasham
2 years
Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding to the Lindy effect: the idea that the older something is, the longer it's likely to be around in the future.
0
2
13
@Muhtasham9
muhtasham
9 months
Supporting local compute pfp by @evanjconrad
Tweet media one
3
0
13
@Muhtasham9
muhtasham
5 months
#iclr folks, come by, we have pizza
Tweet media one
1
1
13
@Muhtasham9
muhtasham
11 months
Top recommendation: a beautifully written, in-depth explanation of these concepts, which I failed to do in my initial blog. High-quality tokens; future LLMs can boost their reasoning and get a sense of humor from @charles_irl if this blog ends up in their dataset
@charles_irl
Charles 🎉 Frye
11 months
PagedAttention, Virtual Context, Speculative Decoding, Register Tokens: the last year has seen many ideas from systems programming applied to LLMs. Not many folks live in that intersection, so I wrote an explainer post to make them a bit more accessible!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
18
286
1K
1
3
10
@Muhtasham9
muhtasham
10 months
What's the bottleneck of your GPU floor? @anyscalecompute meetup
Tweet media one
0
1
12
@Muhtasham9
muhtasham
7 months
Uncle jokes followed by the biggest GPU, heck yeah #NVIDIA #GTC24
Tweet media one
1
2
11
@Muhtasham9
muhtasham
5 months
PSA if you need GPUs for your research: hit these companies up, they have compute grants @PrimeIntellect @dstackai @fal . @fal especially if you work on diffusion models
@giffmana
Lucas Beyer (bl16)
5 months
Does your *university* nlp/vision/ml lab have more or less than 64 A100 and 100+ other GPUs?
22
3
32
0
2
11
@Muhtasham9
muhtasham
7 months
It's here
@NVIDIAAIDev
NVIDIA AI Developer
7 months
Accelerate your coding tasks, from code completion to code summarization with StarCoder2, the latest state-of-the-art, open code #LLM built by @HuggingFace , @ServiceNow , and NVIDIA. Learn more 👉
1
36
126
1
0
10
@Muhtasham9
muhtasham
2 years
Tweet media one
2
1
11
@Muhtasham9
muhtasham
2 years
Reminder: Join the amazing Transformers lecture by @giffmana tomorrow
@MunichNlp
Munich🥨NLP
2 years
🥨NEW EVENT🥨 Transformers in all their glorious detail: @GoogleAI Brain Team scientist Lucas Beyer @giffmana will explain the currently most dominant deep learning architecture for natural language processing in an exclusive event with @MunichNlp . Details below👇
Tweet media one
1
3
11
0
4
9
@Muhtasham9
muhtasham
8 months
Will try to feed 10M tokens over the weekend
Tweet media one
1
0
8
@Muhtasham9
muhtasham
11 months
Sharing a @huggingface collection of old models, from RoBERTa all the way to GPT-2, pre-trained and fine-tuned on the Tajik language. Stay tuned for more to come: mistral-7b, llama2-7b, and others on the way
1
0
9
@Muhtasham9
muhtasham
1 year
iCoffe Pro Max
Tweet media one
2
1
10
@Muhtasham9
muhtasham
5 months
"Flops are cheap, bandwidth is adding more pins, and latency is physics. Deal with it."
@karpathy
Andrej Karpathy
5 months
@vrushankdes Great read! My experience is that you’re fighting physics but also the nvidia compiler and the stack overall, and even after pulling *a lot* of tricks we still can’t achieve more than ~80-90% mem bw on many kernels that you’d naively think should be ~100. And the rabbit hole
2
1
39
0
1
10
@Muhtasham9
muhtasham
1 year
Transformers everywhere…
Tweet media one
0
0
8
@Muhtasham9
muhtasham
7 months
Great tune! Smooth run on M2 8GB:
python -m mlx_lm.generate --model mlx-community/OpenCodeInterpreter-SC2-3B-4bit --prompt "Write a quick sort in C++" --temp 0.0 --colorize
@xiangyue96
Xiang Yue
7 months
🌟 Big thanks for making StarCoder 2 open-source! 🚀 We've swiftly finetuned it on our Code-Feedback instruction dataset, the dataset behind OpenCodeInterpreter. 📈 HumanEval Scores are boosted ~30%. 3B Model: from 31.7 to 67.1! 7B Model: from 35.4 to 75.6! 🛠️ CodeFeedback has
Tweet media one
42
64
264
0
3
9
@Muhtasham9
muhtasham
6 months
is this the company motto? smh @EMostaque stay strong king
Tweet media one
@amasad
Amjad Masad
6 months
Corporate AI drama is accelerating faster than AI itself.
Tweet media one
39
86
1K
0
0
8
@Muhtasham9
muhtasham
7 months
Tweet media one
0
0
7
@Muhtasham9
muhtasham
9 months
Patterns from the CIDR database conference: Stanford - turns out databases are actually LLMs and every problem is an ML problem. Berkeley - let me solve some NP-hard-ish algorithmic problem using LP and other techniques that might find application 50 years later. CMU - let me
0
2
8
@Muhtasham9
muhtasham
1 year
💫StarCoder, which was released today by @BigCodeProject , is a prime example of open source outcompeting Big Tech. Shout out to @lvwerra @harmdevries77 @Thom_Wolf @huggingface @ServiceNowRSRCH
@dylan522p
Dylan Patel
1 year
Google "We Have No Moat, And Neither Does OpenAI" Leaked Internal Google Document Claims Open Source AI Will Outcompete Google and OpenAI This is the opinion of one Googler, we do not agree, simply sharing. $GOOGL $MSFT $META $AI $NVDA $AMZN $AAPL
31
122
685
0
0
8
@Muhtasham9
muhtasham
7 months
🟩
@Muhtasham9
muhtasham
1 year
w boss
Tweet media one
3
1
68
0
0
8
@Muhtasham9
muhtasham
1 year
@vboykis Also rich
0
0
0
@Muhtasham9
muhtasham
2 years
Three things everyone should know about Vision Transformers by @MetaAI Summary thread 🧵
1
1
7
@Muhtasham9
muhtasham
7 months
Image and prompt by yours truly. @marksaroufim 's teaching style is like a casual conversation with a senior engineer on your team
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC
Tweet media one
1
20
105
1
1
7
@Muhtasham9
muhtasham
1 year
@MattNiessner @synthesiaIO Forget AutoGPT, AutoProf is the real deal
0
0
8
@Muhtasham9
muhtasham
7 months
Super model, MLX weights below
@abacaj
anton
7 months
Release phi-2-super. Fine tuned over phi-2 and aligned with cDPO. MT-bench of 7.1875, surpassing many larger models. Humaneval score 60.98%, Humaneval-Plus 54.88%
Tweet media one
45
60
554
0
2
7
@Muhtasham9
muhtasham
2 years
Nice here. But have you ever been to @TU_Muenchen ?
Tweet media one
1
0
7
@Muhtasham9
muhtasham
6 months
@ClementDelangue Yeah your runway should be enough to do this
1
0
7
@Muhtasham9
muhtasham
1 year
“LLMs are not databases, they are not up to date; think of them as a reasoning engine, and some sort of retriever will solve the issue of up-to-date knowledge” @sama
Tweet media one
0
0
2
@Muhtasham9
muhtasham
1 year
Based @ykilcher at @tum .ai summit
Tweet media one
2
0
7
@Muhtasham9
muhtasham
1 year
Kinda like this emoji 🌉 but with crescent 🌙
Tweet media one
0
0
6
@Muhtasham9
muhtasham
5 months
I want to be in sf so badly this summer
4
0
8
@Muhtasham9
muhtasham
1 year
Lot of wisdom from @kagglingdieter
Tweet media one
2
0
7
@Muhtasham9
muhtasham
6 months
@jtvhk bruhh they should just outsource to @sfcompute
0
0
8
@Muhtasham9
muhtasham
1 year
How to get rich from LLMs 🤑 This made my day @full_stack_dl
Tweet media one
0
0
8
@Muhtasham9
muhtasham
2 years
Beating OpenAI's Whisper large-v2 with a fine-tuned *medium* model: from 85.8 WER down to 23.1 WER. Special thanks to @LambdaAPI and the @huggingface team, especially @sanchitgandhi99 and @reach_vb
0
0
8
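For readers wondering what those numbers mean: WER is word-level edit distance divided by reference length. A sketch using the jiwer package (example strings invented):

    # pip install jiwer
    import jiwer

    ref = "salom chi khel shumo naghz"      # made-up reference transcript
    hyp = "salom chi khel shumo"            # hypothesis missing one word
    print(jiwer.wer(ref, hyp) * 100)        # 20.0 -> one deletion out of five words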
@Muhtasham9
muhtasham
7 months
Took some time off web-sockets
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
8
@Muhtasham9
muhtasham
1 year
TIL: @lexfridman hails from Buston, Tajikistan 🇹🇯 When our paths cross, I'll be ready with a friendly, "What's up, homie?"
1
1
6
@Muhtasham9
muhtasham
5 months
sneaking into libraries w @oliverpfaffel
Tweet media one
Tweet media two
0
1
7
@Muhtasham9
muhtasham
1 year
bf16 >> fp16: more numerically stable in practice
0
0
6
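A two-line demonstration of why, in torch: fp16's 5-bit exponent overflows where bf16's 8-bit (fp32-range) exponent does not:

    import torch

    x = torch.tensor(70000.0)
    print(x.to(torch.float16))    # inf  (fp16 max is ~65504)
    print(x.to(torch.bfloat16))   # 70144.0, coarse but finite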
@Muhtasham9
muhtasham
11 months
@amasad @perplexity_ai @googlecloud Damn time to switch all dev to iPad with Replit Core
0
0
2
@Muhtasham9
muhtasham
2 years
@bradneuberg @tszzl @amasad Should be from the Facebook IPO, so around 2012
1
0
7
@Muhtasham9
muhtasham
1 year
Thanks @dk21 and @jefrankle for this amazing session, can’t wait for upcoming sessions
@weights_biases
Weights & Biases
1 year
We are LIVE🎉 Tune in for Lesson 3 of the Training & Fine-Tuning LLMs Course with @MosaicML 📚 You will learn data scaling laws to construct custom datasets, & dive deep into data curation, ethics, storage, & streaming best practices. Stream now🔗
0
2
6
0
1
7
@Muhtasham9
muhtasham
4 months
Roasting coffee beans and GPUs
Tweet media one
0
0
7
@Muhtasham9
muhtasham
8 months
Germany is probably the only country where you get invited to dinner by a VC and the day after get asked to PayPal the amount. Or maybe the recession is hitting everyone hard
1
0
6
@Muhtasham9
muhtasham
2 years
Benedikt sharing the learnings from 5 data science competitions for recommender systems he did over the last 3 years.
Tweet media one
0
0
6
@Muhtasham9
muhtasham
2 years
With the swarm of users experimenting with @bing Chat, aka Sydney, I feel vibes similar to the “OMG LaMDA is sentient” guy. Again, many things can be said, but before folks start posting Terminator images, let me leave this here …
Tweet media one
1
1
7
@Muhtasham9
muhtasham
2 years
Found the famous book's cover-page star while hiking today @aureliengeron
Tweet media one
0
0
7