![Umar Jamil Profile](https://pbs.twimg.com/profile_images/1711253980469248000/JYDEruz-_x96.jpg)
Umar Jamil
@hkproj
Followers: 9K · Following: 2K · Statuses: 471
I used to keep GPUs hot🔥, now I make them go brrrr🔥🔥🔥 @Get_Writer Join the best AI community on Discord: https://t.co/zYH1DlgdbW Opinions my own
Milan, Lombardy
Joined February 2018
In this video, I'll be deriving and coding Flash Attention from scratch. All the code will be written in Python with Triton, but no prior knowledge of CUDA or Triton is required; I'll also explain the CUDA programming model from zero. Link to the video:

I'll explore the following topics:

* Review of Multi-Head Attention
* Safe Softmax
* Online Softmax (with proof!)
* Introduction to GPUs and the CUDA programming model
* Tensor layouts: row-major layout, stride, reshape, transpose
* Block Matrix Multiplication
* Introduction to Triton
* Forward pass of Flash Attention in Triton
* How Autograd works
* What are derivatives, gradients, and Jacobians
* Jacobian of the Matrix Multiplication operation
* Jacobian of the Softmax operation
* Backward pass of Flash Attention in Triton
* Triton tricks: software pipelining

If you find this video useful, consider subscribing to my channel and sharing the video within your network of friends and colleagues. #flashattention #triton #cuda #tutorial #python #attention #transformers #deeplearning
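The online softmax mentioned in the topic list can be sketched in a few lines of plain Python; this is a minimal single-pass version for illustration (not the Triton kernel from the video), tracking a running maximum and rescaling the running sum each time a new maximum appears:

```python
import math

def online_softmax(scores):
    """Single-pass softmax: keep a running max and a running sum of
    exponentials, rescaling the sum whenever the max is updated."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in scores:
        new_max = max(running_max, x)
        # Rescale the accumulated sum to the new maximum, then add
        # the current term. exp(-inf) is 0.0, so the first step is safe.
        running_sum = running_sum * math.exp(running_max - new_max) \
                      + math.exp(x - new_max)
        running_max = new_max
    # Final normalization uses the global max and sum found in one pass.
    return [math.exp(x - running_max) / running_sum for x in scores]
```

This recovers exactly the "safe" (max-subtracted) softmax while reading the input only once, which is the property Flash Attention exploits to fuse softmax with the attention matmuls.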
@Grad62304977 @mrsiipa Yeah, that’s why they hired Noam back. He did an amazing job scaling inference at CharacterAI. I shared the two blog posts from CharacterAI that describe the arch changes they made to reduce the cost of the KV Cache.
RT @NVIDIAHPCDev: 🌟We just learned there's a 100 Day #CUDA Challenge happening. Launched by @hkproj -- there's now more than 60 coders w…
@marksaroufim @jxmnop I think it’s also about talent and docs: early stable software -> more developers using it -> companies buy expensive HW that employees can actually use -> more senior devs on the market for said HW… In AI you don’t have 6 months to train devs on new HW; the market moves too fast.
@pientropy Compute your daily cost to the company, divide it by two: that’s roughly how much using it for research saves. In Europe it depends on where you live/work, but in the US the average tech worker who needs to do research would definitely save the company money by having it.
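The back-of-the-envelope rule above can be written out explicitly; this is just a sketch of the tweet's heuristic, and the 250-workday year and the example salary figure are my own assumptions:

```python
def daily_research_value(total_annual_cost, workdays_per_year=250):
    """Heuristic from the tweet: half of an employee's daily cost is
    what the company 'saves' when a purchase lets them do research."""
    # Daily cost of the employee to the company.
    daily_cost = total_annual_cost / workdays_per_year
    # Divide by two, per the rule of thumb.
    return daily_cost / 2

# Example: a fully loaded annual cost of $250k works out to
# $1,000/workday, so ~$500/day of research value under this rule.
value = daily_research_value(250_000)
```

The point is only scale: for a senior US tech salary, the heuristic prices a research-enabling tool at hundreds of dollars per day, which dwarfs most hardware or software costs.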
Community won. I’ll describe the pipeline parallelism method used in DeepSeek V3, starting from early pipeline parallelism designs from 2018. Of course from first principles, with pen and paper. My daughter Sofia may cry in the background from time to time.
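One first-principles quantity those early 2018-era designs (GPipe-style schedules) are built around is the pipeline "bubble": the fraction of time devices sit idle while the pipeline fills and drains. A minimal sketch of that standard formula, assuming a naive fill-drain schedule with `p` stages and `m` microbatches:

```python
def pipeline_bubble_fraction(num_stages, num_microbatches):
    """Idle fraction of a naive GPipe-style schedule.

    With p stages and m microbatches, the schedule spans m + p - 1
    time slots per device, of which p - 1 are fill/drain bubble slots:
    bubble = (p - 1) / (m + p - 1).
    """
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)
```

The formula makes the design pressure obvious: with few microbatches the bubble dominates (4 stages, 1 microbatch is 75% idle), and growing `m` shrinks it, which is what later schedules (1F1B and the DeepSeek V3 variant) improve on.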
If the community wants, let’s talk about pipeline parallelism in my next video. Community votes, I deliver.