Pavel Surmenok @surmenok profile

Pavel Surmenok

@surmenok

Followers

2K

Following

75K

Media

91

Statuses

6K

Autopilot / AI at Tesla

Redwood City, CA

Joined July 2009

Pavel Surmenok

@surmenok

2 months

FSD V13: point-to-point self-driving without touching steering wheel or pedals. A large deep neural network trained with a large dataset end-to-end: photons in, controls out.

ΛI DRIVR

@AIDRIVR

2 months

FSD 13 leaves parking lot (+ awkward interaction with other driver). the smoothness is absolutely INSANE. it also saw the Model 3 backing up before I did, I was wondering why it wasn’t moving lol

5

13

275

Pavel Surmenok

@surmenok

2 months

@realGeorgeHotz Why would I buy a KIA?

7

1

200

Pavel Surmenok

@surmenok

10 months

@Austen One thing to check: does it run Windows or Linux?

8

0

167

Pavel Surmenok

@surmenok

1 year

Once upon a time, I interviewed a seasoned ML engineer and asked him, “What do you think about batch norm?” He looked at me with eyes full of painful memories and laughed. Then I knew that he was an expert.

7

2

152

Pavel Surmenok

@surmenok

10 months

@jeremyphoward Autoregressive vs. diffusion makes more sense.

3

0

150

Pavel Surmenok

@surmenok

8 months

@dividendology One is sum of all savings, another is growth rate. 2nd image as sum of savings would look more like this. Still noisy, but not as much.

2

5

143

Pavel Surmenok

@surmenok

1 month

@_apoorvnandan That’s literally step 1 of the @karpathy playbook.

7

0

144

Pavel Surmenok

@surmenok

10 months

@SergioRocks The question is false. AI is a tool. Should judge impact and quality of work.

1

0

133

Pavel Surmenok

@surmenok

5 months

@tamaybes Can you please retrain the model to make sure there are no issues with upload.

0

116

Pavel Surmenok

@surmenok

4 months

@saurabh_shah2 The story is wild. @JeffDean just wanted to save bandwidth and chopped off the lowest 16 bits of fp32.

Jeff Dean

@JeffDean

1 year

@keveman @giffmana This is roughly right. Basically wanted to send fewer bytes over the network for our distributed neural network training system, and easiest way on a CPU was to lop off the low 16 bits of mantissa, and fill with 0s on other side. Turns out it was fine for training.

1

2

106
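The truncation described in the quoted post can be sketched in a few lines. This is a minimal illustration using only the standard library, not the code of the original training system; the function names are hypothetical. It keeps the upper 16 bits of the float32 encoding (sign, 8 exponent bits, top 7 mantissa bits) and fills the dropped bits with zeros on the receiving side.

```python
import struct


def truncate_to_bf16_bits(x: float) -> int:
    """Return the upper 16 bits of the float32 encoding of x."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16  # drop the low 16 mantissa bits


def expand_from_bf16_bits(bits16: int) -> float:
    """Rebuild a float32 by filling the dropped low 16 bits with zeros."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


def roundtrip(x: float) -> float:
    """Simulate sending x over the wire as 2 bytes instead of 4."""
    return expand_from_bf16_bits(truncate_to_bf16_bits(x))
```

Values whose mantissa fits in 7 bits survive the round trip unchanged; for other values the relative error stays below 2^-7 because the exponent bits are untouched.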

Pavel Surmenok

@surmenok

6 months

@jxmnop How about converting it to a (N, 2) numpy array and storing as npz (compressed)?

1

0

95
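The suggestion above might look like the following sketch. The sample pairs and the `int32` dtype are assumptions for illustration only (the original data is not shown in the thread), and an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np


def save_pairs(pairs, dtype=np.int32) -> bytes:
    """Pack a list of integer pairs into an (N, 2) array and compress it as npz."""
    arr = np.asarray(pairs, dtype=dtype)
    if arr.ndim != 2 or arr.shape[1] != 2:
        raise ValueError("expected a sequence of pairs")
    buffer = io.BytesIO()
    np.savez_compressed(buffer, pairs=arr)  # zip container with deflate compression
    return buffer.getvalue()


def load_pairs(blob: bytes) -> np.ndarray:
    """Read the (N, 2) array back from the compressed npz bytes."""
    with np.load(io.BytesIO(blob)) as data:
        return data["pairs"]


# Hypothetical sample data, not taken from the thread.
sample = [(1, 2), (3, 4), (5, 6)]
restored = load_pairs(save_pairs(sample))
```

With a real file, passing a path such as `"pairs.npz"` to `np.savez_compressed` and `np.load` works the same way. A fixed-width integer dtype costs 8 bytes per pair with `int32` before compression, which is the figure to compare against the text encoding.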

Pavel Surmenok

@surmenok

6 months

@cremieuxrecueil What surprised me: even men from Denmark have quite high proportion of 18%.

10

0

89

Pavel Surmenok

@surmenok

7 months

@seldo Impact on me (in the valley): I wanted to order a drink to pick up in Starbucks on the way to work, the app showed a message that early order is not available. No other issues so far.

2

0

85

Pavel Surmenok

@surmenok

6 months

@theorizur Have you tried to work at startups, or at orgs that move fast (e.g. pretty much any of Elon’s companies)?

7

0

80

Pavel Surmenok

@surmenok

1 year

@ID_AA_Carmack Not unlike Windows which had all kinds of patches for bugs in 3rd party apps. “On beta versions of Windows 95, SimCity wasn’t working in testing. Microsoft tracked down the bug and added specific code to Windows 95 that looks for SimCity. If it finds SimCity running, it runs the.

3

7

82

Pavel Surmenok

@surmenok

8 months

@alfred_twu Isn’t it odd not counting San Diego as west?

6

0

73

Pavel Surmenok

@surmenok

5 months

@_brianpotter

0

73

Pavel Surmenok

@surmenok

2 months

A large model trained on enough data learns sophisticated behaviors. The network just wants to learn, give it more compute and data.

Tesla

@Tesla

2 months

FSD Supervised 13.2 reverses to exit parking spot blocked by delivery truck, then waits for oncoming traffic to clear before proceeding. This all happens implicitly within the model, which is trained on extensive data of similar real-world scenarios.

1

4

70

Pavel Surmenok

@surmenok

1 year

@mualphaxi @Stanford It might be worth finding the authors of the posters and making a clear, permanent, searchable record of their actions.

4

0

66

Pavel Surmenok

@surmenok

2 months

@anammostarac “I believe the most qualified person should get the job”.

1

0

61

Pavel Surmenok

@surmenok

3 months

@nearcyan You can go to any DMV office in California, doesn’t have to be SF. On the website, they used to show waiting time for people without appointment. I was able to find an office without a large queue and went there.

2

0

61

Pavel Surmenok

@surmenok

5 months

Almost every startup at YC Demo Day is building with LLMs. Huge change compared even with the previous Demo Day.

6

57

Pavel Surmenok

@surmenok

1 month

@finbarrtimbers Kaplan et al. found that (for pretraining) the learning rate schedule is irrelevant as long as the LR summed over all training steps is large enough and the schedule includes a warmup period and a decay to a near-vanishing value at the end.

2

4

56
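One schedule with the two properties named above (a warmup period and a decay to a near-vanishing value) is linear warmup followed by cosine decay. The sketch below is only an illustration: the cosine shape, the peak value, and the step counts are assumptions, not details from the post or the paper.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float = 3e-4,
               warmup_steps: int = 100, final_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to final_lr at total_steps."""
    if step < warmup_steps:
        # Ramp up linearly; the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


def summed_lr(total_steps: int, **kwargs) -> float:
    """The quantity the post refers to: the LR summed over all training steps."""
    return sum(lr_at_step(s, total_steps, **kwargs) for s in range(total_steps))
```

Comparing `summed_lr` across candidate schedules gives a quick way to check whether two schedules spend a similar total learning-rate budget.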

Pavel Surmenok

@surmenok

2 years

@stylewarning I’d start by writing tests. Then see if it’s well modularized or a ball of spaghetti, and attempt to refactor in the latter case.

1

0

49

Pavel Surmenok

@surmenok

11 months

It’s Monday. Time to build.

4

1

52

Pavel Surmenok

@surmenok

6 months

@jxmnop Must be small integers if it takes less than 10 bytes to encode a pair in text.

3

0

49

Pavel Surmenok

@surmenok

1 month

@_apoorvnandan @karpathy Ok, actually step 1 is “become one with the data”. But verifying that the loss of a randomly initialized network is correct is the first thing to do before starting training.

1

50
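A common form of this check, assuming a softmax classifier whose initial outputs are close to uniform, is to compare the first measured loss against ln(C) for C classes. The sketch below uses plain Python and hypothetical function names; in a real project the measured loss would come from the framework's own loss function.

```python
import math


def cross_entropy(logits, target: int) -> float:
    """Cross-entropy of one example, computed with a numerically stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def expected_initial_loss(num_classes: int) -> float:
    """Loss of a model that assigns probability 1/num_classes to every class."""
    return math.log(num_classes)


def init_loss_looks_sane(measured: float, num_classes: int, rel_tol: float = 0.1) -> bool:
    """Flag a large gap between the measured initial loss and ln(num_classes)."""
    expected = expected_initial_loss(num_classes)
    return abs(measured - expected) <= rel_tol * expected
```

A measured initial loss far from ln(C) usually points to a bug in the labels, the loss, or the final-layer initialization. The 10% tolerance is an arbitrary choice for illustration.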

Pavel Surmenok

@surmenok

3 months

@pmarca I came to work today, the parking lot is packed, engineers are working, no foosball table in sight. Occasional humanoid robots here and there.

2

48

Pavel Surmenok

@surmenok

1 year

In the Arena today. Trying stuff. Some will work, some won’t. Always learning.

5

0

46

Pavel Surmenok

@surmenok

11 days

One step closer to large-scale unsupervised FSD.

Tesla AI

@Tesla_AI

11 days

Teslas now drive themselves from their birthplace at the factory to their designated loading dock lanes without human intervention. One step closer to large-scale unsupervised FSD

1

0

42

Pavel Surmenok

@surmenok

1 year

Now the general public will learn about the mighty Q-learning algorithm.

Eric Jang

@ericjang11

1 year

reading in between the lines, is Q* the fabled breakthrough in AlphaStar-style search + LLM that so many big labs are trying to get working? Many research projects in GPT-4 self-verification + search have not yielded really strong performance improvements, so I'd be quite.

0

40

Pavel Surmenok

@surmenok

9 months

@Noahpinion Implication is that logical thinking is a right-wing thing.

3

0

37

Pavel Surmenok

@surmenok

13 days

@Austen Wait what? Their software engineers make less money than my nanny?

0

40

Pavel Surmenok

@surmenok

9 months

@juliepoptart Share officer name and badge number, public should know.

0

34

Pavel Surmenok

@surmenok

3 months

Saturday morning. Good time to check how my training jobs are doing. GPUs go brrrr.

4

0

39

Pavel Surmenok

@surmenok

1 year

@pronounced_kyle Log scale for y axis might help to see trends better.

2

0

37

Pavel Surmenok

@surmenok

3 months

@emollick Interesting. I don’t see it, other than less of Yann LeCun’s toxic posts lately.

3

0

37

Pavel Surmenok

@surmenok

19 days

@jxmnop The episode with Elon was far from the 3rd. Here is a full episode list, starting from the first with Max Tegmark. The episode with Elon is #18. Still impressive though.

0

35

Pavel Surmenok

@surmenok

2 years

@Tendar Maybe that is how he transmits a coded message, in Morse code?

0

32

Pavel Surmenok

@surmenok

9 months

@jmrphy Yes. It didn’t happen 2 years ago, now happens all the time. Big regression, sadly.

2

0

33

Pavel Surmenok

@surmenok

5 months

@hyhieu226 @PyTorch @MalekiSaeed Links for AsyncTP, for those who want to dig deeper.

0

4

36

Pavel Surmenok

@surmenok

10 months

That’s a lot!

3

1

34

Pavel Surmenok

@surmenok

5 months

@GarrisonLovely Reading the article, it looks more like Honduras became a nightmare for Honduras and wants to ruin it by walking back the deal they previously agreed on. They deserve to be bankrupted if that’s the case.

2

0

32

Pavel Surmenok

@surmenok

3 years

@Carnage4Life @BeanstalkFarms So it’s not stolen then, the protocol worked as designed. Fascinating.

0

29

Pavel Surmenok

@surmenok

4 months

@EugeneVinitsky Torrent.

0

29

Pavel Surmenok

@surmenok

1 month

Coffee from @perplexity_ai. Smart juice for a curious mind.

3

0

29

Pavel Surmenok

@surmenok

4 months

@hankgreen Hard to believe it was 100%.

7

0

27

Pavel Surmenok

@surmenok

3 months

@igorsushko What’s your beef with Joe Rogan?

17

0

29

Pavel Surmenok

@surmenok

2 years

@debarghya_das Maybe it was easier to immigrate to the US back then?

2

0

27

Pavel Surmenok

@surmenok

6 months

@nathanbenaich He refers to Noam Shazeer’s LinkedIn profile. Legend.

1

29

Pavel Surmenok

@surmenok

8 months

@GergelyOrosz @t3dotgg @ThePrimeagen I don’t joke about bus factor. I’m very serious about bus factor.

0

28

Pavel Surmenok

@surmenok

11 months

@Tsla99T That’s the first thing I checked this morning! Keeping GPUs busy.

2

0

29

Pavel Surmenok

@surmenok

24 days

@VicVijayakumar 95% of the people on the call laughed.

1

0

28

Pavel Surmenok

@surmenok

2 months

@PaulSkallas Not sure why emphasizing “able bodied”. Delivery saves a ton of time.

0

28

Pavel Surmenok

@surmenok

3 months

@Oilfield_Rando When I was 7yo, a gypsy stole my bicycle. I still remember it.

0

27

Pavel Surmenok

@surmenok

12 days

@GergelyOrosz

0

1

28

Pavel Surmenok

@surmenok

24 days

@VicVijayakumar That’s great, they still remember your name.

0

28

Pavel Surmenok

@surmenok

1 month

@Crypto_uWu @growing_daniel H-1B is a temporary worker visa, issued for 3 years, and can be renewed for 3 more years. After 6 years they have to get out of the country (unless they apply for a green card or some other visa type). They can’t bring family except a spouse and kids under 21yo.

2

1

27

Pavel Surmenok

@surmenok

1 year

@Austen Torrent is the ultimate weapon of a free man.

1

0

26

Pavel Surmenok

@surmenok

4 months

@OfficialLoganK @Wharton @emollick Congrats, Logan!

0

26

Pavel Surmenok

@surmenok

9 months

@emollick Link to paper

0

3

24

Pavel Surmenok

@surmenok

2 months

@garrytan It will get even better!

1

24

Pavel Surmenok

@surmenok

4 months

@yishan The best time to start was 8 years ago. The next best time to start is now.

0

24

Pavel Surmenok

@surmenok

2 months

@_jasonwei Will they publish the video?

0

23

Pavel Surmenok

@surmenok

12 days

I had a TODO to buy more NVDA. Seeing -16% this morning felt like a Christmas gift.

2

0

22

Pavel Surmenok

@surmenok

1 year

@patio11 I’ve heard exactly the same from a barber around Thanksgiving. He also said that if he goes on vacation, his regular customers will find another barber and his business will suffer long term.

0

1

21

Pavel Surmenok

@surmenok

2 months

@swyx @JeffDean @latentspacepod Could you please share a link to the video.

0

23

Pavel Surmenok

@surmenok

1 year

OpenAI board members cleaned up their social media profiles: Tasha McCauley closed off Twitter, Helen Toner and Adam D'Angelo don't mention OpenAI on LinkedIn.

1

3

23

Pavel Surmenok

@surmenok

7 months

@legen_eth Someone should start an ETF following her trades.

6

0

21

Pavel Surmenok

@surmenok

8 months

@naderi_yeganeh Did you come up with these equations manually?

3

0

21

Pavel Surmenok

@surmenok

9 months

@peterrhague Honestly I thought it’s your real photo, AI augmented. That’s odd that some people are mad about EVs. EVs are great.

8

0

22

Pavel Surmenok

@surmenok

1 month

@growing_daniel @Crypto_uWu They can’t. They cannot even immigrate on that visa; it’s a non-immigrant visa by definition.

Pavel Surmenok

@surmenok

1 month

@Crypto_uWu @growing_daniel H-1B is a temporary worker visa, issued for 3 years, and can be renewed for 3 more years. After 6 years they have to get out of the country (unless they apply for a green card or some other visa type). They can’t bring family except a spouse and kids under 21yo.

7

0

20

Pavel Surmenok

@surmenok

4 months

@LChoshen @DeqingFu @robinomial @jacobandreas Link to the paper on Arxiv:

1

3

21

Pavel Surmenok

@surmenok

2 months

@dkrajendra Rare elements are not rare; that’s a misnomer. They are everywhere in the Earth’s crust. Processing these metals is not environmentally friendly (much pollution), so we outsource it whenever possible.

0

21

Pavel Surmenok

@surmenok

1 year

Next gen Tesla Bot. The future is already here! Great job @_milankovac_ and the team!

Tesla Optimus

@Tesla_Optimus

1 year

There’s a new bot in town 🤖 Check this out (until the very end)!

0

21

Pavel Surmenok

@surmenok

7 years

1080Ti is still economically better than Titan V if you run CNNs.

1

7

18

Pavel Surmenok

@surmenok

1 year

@karpathy The problem with comments is that they get out of sync with code. The best code is self-documenting. Comments should not explain what the code is doing, but may explain why, e.g. the reasons for an unconventional usage of something as a workaround for a bug somewhere.

3

0

18

Pavel Surmenok

@surmenok

1 month

@giffmana Surprised that Mark lost so much ground. Probably because no big releases in the last few months. Recency bias.

2

0

18

Pavel Surmenok

@surmenok

1 month

@twobitidiot @theallinpod @rabois Try @BG2Pod; people on the street say that it has vibes of the early All-in pod. I enjoy it: information dense, no bullshit.

0

19

Pavel Surmenok

@surmenok

1 year

- What is Occam's razor?
- Well, the simplest explanation is that there is a guy named Occam and it is his razor.

3

0

19

Pavel Surmenok

@surmenok

6 months

One man’s prior is another man’s posterior.

1

18

Pavel Surmenok

@surmenok

2 years

@finbarrtimbers GPU utilization is a bad metric in practice. GPU utilization can be 100% while the GPU does nothing but wait for e.g. NCCL communication from other ranks. GPU power consumption is more informative.

3

0

17

Pavel Surmenok

@surmenok

2 months

@abacaj Locality of compute and openness of the model are orthogonal concepts. One can run Llama on a cluster.

0

19

Pavel Surmenok

@surmenok

1 year

I wish Google would publish a thorough postmortem to explain what went wrong with aligning their chatbot and how they are going to fix it. Curious how much of it is explicit instructions in the system prompt vs. RLHF.

3

0

18

Pavel Surmenok

@surmenok

6 months

@KareemRifai Like elections in Russia in 2011 when pro-Putin party won, and votes in one region (as displayed on TV) summed up to 146%.

1

0

18

Pavel Surmenok

@surmenok

3 months

@thegautamkamath Is this an ICLR issue or one rogue reviewer?

3

0

19

Pavel Surmenok

@surmenok

2 years

@RazRazcle Link to the paper:

1

2

19

Pavel Surmenok

@surmenok

1 year

Ever tried. Ever failed. No matter. Try again. Fail again. Fail better.

2

0

18

Pavel Surmenok

@surmenok

2 months

Eval of LLM systems is conceptually similar to eval of other machine learning models. Look at predictions (ideally on distribution of inputs from your real users), identify patterns of errors, cluster/categorize errors, develop evals for each error cluster.

Hamel Husain

@HamelHusain

2 months

I started doing office hours on LLM evals and met with 8+ founders in the last 3 weeks. Common questions:
- Which components of our app do we start evaluating (RAG, tool calls, etc.)?
- What metrics should I use?
- Where should I spend my time?
All have the same solution.

0

1

18

Pavel Surmenok

@surmenok

2 months

Thank you @chazman. V13 is 🔥

Chuck Cook

@chazman

2 months

I don't do posts like this very often. just read it please. Since I have been home after my redeye flying all night from PHX. My @Cybertruck and Model Y had received Supervised FSD v13.2.1 while parked, over the air cellular (OTA) for free. I got in my Cybertruck dead tired.

1

18

Pavel Surmenok

@surmenok

6 months

@_xjdr Link to the paper:

3

1

17

Pavel Surmenok

@surmenok

10 months

@shanselman @markrussinovich Never look at desktop, always maximize windows.

1

0

16

Pavel Surmenok

@surmenok

8 months

A story about a black SFFD firefighter assaulting his Asian colleague. The department tried to cover it up, the victim was fired, the assaulter kept his job. So much dysfunction in SF public services.

Diane Yap

@RealDianeYap

8 months

Black privilege in SF: Black firefighter looks up Asian coworker’s address, shows up at his house and tries to beat him to death with a wrench. Asian firefighter gets fired for cooperating with police. Black firefighter keeps his job, never missing a paycheck.

1

0

16

Pavel Surmenok

@surmenok

10 months

@nikitabier I’ve owned a house for less than two years, and it’s relatively new and recently renovated, but I already have phone numbers for good repairmen for all kinds of things.

0

15

Pavel Surmenok

@surmenok

12 days

More goodies from DeepSeek. Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation.

AK

@_akhaliq

12 days

deepseek just dropped some new models. people are still getting used to R1. Janus-Pro is a novel autoregressive framework that unifies multimodal understanding and generation. It addresses the limitations of previous approaches by decoupling visual encoding into separate

3

4

17

Pavel Surmenok

@surmenok

2 months

@YunTaTsai1 Trump’s willingness to go to the long form podcasts is respectable. You can feel what kind of person he is, what points are important for him. Listen for a couple hours and you can make a better informed decision whether to hire him.

1

0

17

Pavel Surmenok

@surmenok

3 months

@sirbayes Interesting. I never heard of the other meaning of inference. Prediction seems a bit off. Prediction is about the future. For example, you can predict where a pedestrian will be 1 second from now. But detecting where they are now is not prediction. I hesitate to use the word.

2

0

16

Pavel Surmenok

@surmenok

7 months

@srush_nlp We should normalize pseudonyms and links to arbitrary webpages. Democratizing science.

0

16

Pavel Surmenok

@surmenok

11 months

Sometimes the model answer is wrong. Sometimes the model answer is correct but we just don’t like the result.

2

0

16

Pavel Surmenok

@surmenok

1 year

@pronounced_kyle Show me training loss going to 0.

2

0

15