Umar Jamil

@hkproj

Followers
9K
Following
2K
Statuses
471

I used to keep GPUs hot🔥, now I make them go brrrr🔥🔥🔥 @Get_Writer Join the best AI community on Discord: https://t.co/zYH1DlgdbW Opinions my own

Milan, Lombardy
Joined February 2018
@hkproj
Umar Jamil
3 months
In this video, I'll be deriving and coding Flash Attention from scratch. No prior knowledge of CUDA or Triton is required. Link to the video:

All the code will be written in Python with Triton, but no prior knowledge of Triton is required. I'll also explain the CUDA programming model from zero.

I'll explore the following topics:
* Review of Multi-Head Attention
* Safe Softmax
* Online Softmax (with proof!)
* Introduction to GPUs and the CUDA programming model
* Tensor layouts: row-major layout, stride, reshape, transpose
* Block Matrix Multiplication
* Introduction to Triton
* Forward pass of Flash Attention in Triton
* How Autograd works
* What are derivatives, gradients, and Jacobians
* Jacobian of the Matrix Multiplication operation
* Jacobian of the Softmax operation
* Backwards pass of Flash Attention in Triton
* Triton tricks: Software pipelining

If you find this video useful, consider subscribing to my channel and sharing the video within your network of friends and colleagues.

#flashattention #triton #cuda #tutorial #python #attention #transformers #deeplearning
40
281
2K
@hkproj
Umar Jamil
11 hours
It’s Saturday, time for self reflection. What did you build this week?
25
2
79
@hkproj
Umar Jamil
15 hours
@elliotarledge I’ll explain it in my next video. Stay tuned
0
0
25
@hkproj
Umar Jamil
15 hours
Last year I made a video on Reinforcement Learning from Human Feedback, deriving the PPO loss from first principles. Thanks to DeepSeek R1 it became very popular.
2
27
338
@hkproj
Umar Jamil
18 hours
@mrsiipa @Grad62304977 Part 1: Part 2: Learnt a lot from these two posts
1
3
84
@hkproj
Umar Jamil
19 hours
@Grad62304977 @mrsiipa Yeah, that’s why they hired Noam back. He did an amazing job scaling inference at CharacterAI. I shared the two blog posts from CharacterAI that describe the arch changes to reduce the cost of the KV Cache.
1
0
40
@hkproj
Umar Jamil
22 hours
RT @NVIDIAHPCDev: 🌟We just learned there's a 100 Day #CUDA Challenge happening. Launched by @hkproj -- there's now more than 60 coders w…
0
9
0
@hkproj
Umar Jamil
22 hours
@NVIDIAHPCDev Thank you for spreading the word!! Join us now! Link in bio
0
0
19
@hkproj
Umar Jamil
1 day
@marksaroufim @jxmnop I think it’s also about talent and docs: early stable software -> more developers using it -> companies buy expensive HW that employees can actually use -> more senior devs in the market on said HW… In AI you don’t have 6 months to train devs on a new HW, market is too fast.
1
1
82
@hkproj
Umar Jamil
2 days
What's a good shopping agent? My brother spent 3 days looking for a laptop to buy on Amazon. It should work like this: given technical specs and a budget, find all laptops that ship within 1 week to my home address.
2
1
30
@hkproj
Umar Jamil
2 days
@sagar_agi007 Square always better than triangle — I’m waiting.
1
0
1
@hkproj
Umar Jamil
2 days
OpenAI & co competing with “OpenAI wrapper” startups is the biggest confirmation that VCs must fund startups to build apps on top of models they themselves train, host and improve. It is also the biggest confirmation that we need the open source models more than ever before.
2
1
61
@hkproj
Umar Jamil
2 days
@nrehiew_ Waiting for the “aha moment is all you need” paper
0
0
8
@hkproj
Umar Jamil
3 days
Tech people worried about losing their job to AI: most of the world economy is run by employees whose only job is to read emails, fill in an Excel file and sometimes scan/print documents. Chill and keep building. The future is bright.
6
32
463
@hkproj
Umar Jamil
3 days
@pientropy Compute your daily cost to the company and divide it by two: that's how much the company saves when you use it for research. In Europe it depends on where you live/work, but in the US the average tech worker who needs to do research would definitely save the company money by having it.
0
0
2
@hkproj
Umar Jamil
3 days
Community won. I’ll describe the pipeline parallelism method they used in DeepSeek V3, starting from early pipeline parallelism designs from 2018. Of course from first principles, with pen and paper. My daughter Sofia may cry in the background from time to time.
@hkproj
Umar Jamil
4 days
If the community wants, let’s talk about pipeline parallelism in my next video. Community votes, I deliver.
2
6
288
@hkproj
Umar Jamil
3 days
It’s incredible how the PDF format has been a major blocker and cost for the LLM ecosystem. It’s like using stone tablets instead of paper for writing and then being unable to OCR them.
13
6
146