Yuchen Jin
@Yuchenj_UW
Followers: 30K
Following: 41K
Statuses: 8K
Co-founder & CTO @hyperbolic_labs 🧑‍🍳 fun AI systems. Previously at OctoAI (acquired by @nvidia) building @ApacheTVM, PhD @uwcse 🤖
another Galaxy
Joined November 2016
Outperform GPT-3 with @karpathy's llm.c using just 1/3 of the training tokens ✨

Another day has passed, and I trained GPT-2 (124M) with llm.c for 150B tokens, achieving 35.5% accuracy on HellaSwag. This surpasses the 33.7% accuracy the GPT-3 paper reports after training on 300B tokens. It matched the paper's 33.7% score at only ~95B tokens, i.e. with less than 1/3 of the training tokens used in the GPT-3 paper.

The key reasons:
(1) I tripled the max learning rate, which sped up training; more details in my last tweet. (A sketch of the schedule is included below this tweet.)
(2) I trained the model on @huggingface's FineWeb dataset, which is described as "cleaned and deduplicated English web data from CommonCrawl". The GPT-3 paper, published 4 years ago, also trained primarily on filtered and deduplicated CommonCrawl data, and it discussed its data-cleaning methods. The improvement might come from the better quality of web data available over the past 4 years, or from Hugging Face's better data-cleaning methods.
Apparently today is the 4-year anniversary of GPT-3! Which I am accidentally celebrating by re-training the smallest model in the miniseries right now :). HellaSwag 33.7 (Appendix H); I almost reached this a few steps ago (though this is only 45% of the training done).

I remember when the GPT-3 paper came out quite clearly, because I had to interrupt work and go out for a walk. The realization hit me that an important property of the field had flipped. In ~2011, progress in AI felt constrained primarily by algorithms. We needed better ideas, better modeling, better approaches to make further progress. If you offered me a 10X bigger computer, I'm not sure what I would have even used it for.

The GPT-3 paper showed that there was this thing that would just become better on a large variety of practical tasks, if you only trained a bigger one. Better algorithms become a bonus, not a necessity, for progress in AGI. Possibly not forever and going forward, but at least locally and for the time being, in a very practical sense. Today, if you gave me a 10X bigger computer, I would know exactly what to do with it, and then I'd ask for more.

It's this property of AI that also gets to the heart of why NVIDIA is a 2.8T company today. I'm not sure how others experienced it, but the realization convincingly clicked for me with GPT-3, 4 years ago.
25 replies · 106 reposts · 910 likes
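To make the "tripled max learning rate" concrete: below is a minimal Python sketch of the warmup-plus-cosine learning-rate schedule used in GPT-2/GPT-3-style training, with the max LR scaled 3x. The 6e-4 base value is the GPT-3 paper's setting for its ~125M model; the warmup length, total step count, and decay-to-10% floor are illustrative assumptions, not the actual llm.c run settings.

```python
# Sketch of a warmup + cosine-decay LR schedule (GPT-2/GPT-3 style).
# Values below are illustrative assumptions, not the actual llm.c run config.
import math

base_max_lr = 6e-4          # GPT-3 paper's learning rate for the ~125M model
max_lr = 3 * base_max_lr    # the "tripled" max learning rate from the tweet
min_lr = max_lr * 0.1       # assumption: decay to 10% of max
warmup_steps = 700          # assumption, for illustration only
total_steps = 286_000       # assumption: ~150B tokens at ~0.5M tokens per step

def lr_at(step: int) -> float:
    """Linear warmup to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes 1 -> 0
    return min_lr + coeff * (max_lr - min_lr)

# Example: LR at the start, at the end of warmup, and at the final step.
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Scaling max_lr scales the whole schedule proportionally; the speedup claim above rests on the observation that, at this model size, training remains stable at the higher rate.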
@yuejiedeli Whisper is actually an impressive speech recognition model, but it's from 2022, so many people who got into AI recently have never heard of it
1 reply · 0 reposts · 7 likes
@Chiragjoshi_12 dude, I really hope so. DeepSeek R1 is almost at o1's level; I don't know why they don't open-source it
1 reply · 0 reposts · 16 likes