Yuchen Jin

@Yuchenj_UW

Followers: 30K · Following: 41K · Statuses: 8K

Co-founder & CTO @hyperbolic_labs 🧑‍🍳 fun AI systems. Previously at OctoAI (acquired by @nvidia) building @ApacheTVM, PhD @uwcse 🤖

another Galaxy
Joined November 2016
@Yuchenj_UW
Yuchen Jin
8 months
Outperform GPT-3 with @karpathy's llm.c using just 1/3 of the training tokens ✨ Another day has passed, and I trained GPT-2 (124M) with llm.c for 150B tokens, achieving 35.5% accuracy on HellaSwag. This surpasses the GPT-3 paper's 33.7% accuracy, which was trained for 300B tokens. The model matched the original paper's 33.7% score at only ~95B tokens, using less than 1/3 of the training tokens of the GPT-3 paper. Key reasons are: (1) I tripled the max learning rate, which sped up training, more details in my last tweet: (2) I trained the model on @huggingface's FineWeb dataset, which is described as "cleaned and deduplicated English web data from CommonCrawl". The GPT-3 paper, published 4 years ago, also trained primarily on filtered and deduplicated CommonCrawl data, and the paper discussed its data cleaning methods. The improvement might be due to better-quality web data becoming available over the past 4 years, or to Hugging Face's better data cleaning methods.
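The token-budget comparison in this tweet is simple arithmetic; the sketch below just restates it using only the figures quoted above (300B tokens for the GPT-3 paper run, ~95B tokens to match 33.7%, 150B tokens for the full llm.c run). The constant and function names are illustrative, not part of llm.c.

```python
# Token budgets quoted in the tweet, in billions of tokens.
GPT3_PAPER_TOKENS_B = 300   # GPT-3 paper's training budget for its smallest model
LLMC_MATCH_TOKENS_B = 95    # approx. point where the llm.c run matched 33.7% HellaSwag
LLMC_TOTAL_TOKENS_B = 150   # full llm.c run

def token_fraction(used_b: float, reference_b: float) -> float:
    """Fraction of the reference token budget that was consumed."""
    return used_b / reference_b

match_frac = token_fraction(LLMC_MATCH_TOKENS_B, GPT3_PAPER_TOKENS_B)
total_frac = token_fraction(LLMC_TOTAL_TOKENS_B, GPT3_PAPER_TOKENS_B)

print(f"matched 33.7% at {match_frac:.3f} of GPT-3's budget")
print(f"full run used {total_frac:.2f} of GPT-3's budget")
print(f"HellaSwag gain at end: {35.5 - 33.7:.1f} points")
```

Since 95/300 ≈ 0.317, matching the paper's score took slightly less than one third of its token budget, and even the full 150B-token run used half of it.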
@karpathy
Andrej Karpathy
9 months
Apparently today is the 4th year anniversary of GPT-3! Which I am accidentally celebrating by re-training the smallest model in the miniseries right now :). HellaSwag 33.7 (Appendix H) almost reached this a few steps ago (though this is only 45% of the training done). I remember when the GPT-3 paper came out quite clearly because I had to interrupt work and go out for a walk. The realization hit me that an important property of the field flipped. In ~2011, progress in AI felt constrained primarily by algorithms. We needed better ideas, better modeling, better approaches to make further progress. If you offered me a 10X bigger computer, I'm not sure what I would have even used it for. GPT-3 paper showed that there was this thing that would just become better on a large variety of practical tasks, if you only trained a bigger one. Better algorithms become a bonus, not a necessity for progress in AGI. Possibly not forever and going forward, but at least locally and for the time being, in a very practical sense. Today, if you gave me a 10X bigger computer I would know exactly what to do with it, and then I'd ask for more. It's this property of AI that also gets to the heart of why NVIDIA is a 2.8T company today. I'm not sure how others experienced it, but the realization convincingly clicked for me with GPT-3, 4 years ago.
25
106
910
@Yuchenj_UW
Yuchen Jin
20 hours
@PalmerLuckey You founded Oculus VR at 19, impressive
0
0
34
@Yuchenj_UW
Yuchen Jin
23 hours
@elonmusk everyone who judges people by age is subtard
6
2
92
@Yuchenj_UW
Yuchen Jin
1 day
@deedydas insane it happened
0
0
2
@Yuchenj_UW
Yuchen Jin
1 day
@ClementDelangue Nice! openai only has 4.7k 💔
2
0
16
@Yuchenj_UW
Yuchen Jin
1 day
@Laz4rz let's see if we have some intern project!
0
0
2
@Yuchenj_UW
Yuchen Jin
1 day
@elliotarledge @freeCodeCamp congrats Elliot!
1
0
1
@Yuchenj_UW
Yuchen Jin
1 day
@99aico Incredible acceleration time we are living in
0
0
2
@Yuchenj_UW
Yuchen Jin
2 days
0
0
1
@Yuchenj_UW
Yuchen Jin
2 days
@yuejiedeli Whisper is actually an impressive speech recognition model, but it's from 2022, so many people who got into AI recently have never heard of it
1
0
7
@Yuchenj_UW
Yuchen Jin
2 days
@actualhog we will onboard more so people can rent them on our GPU marketplace!
0
0
2
@Yuchenj_UW
Yuchen Jin
2 days
@actualhog 8xH200s, we host it too:
1
0
4
@Yuchenj_UW
Yuchen Jin
2 days
@yihyunCS "because it's unsafe"
1
0
3
@Yuchenj_UW
Yuchen Jin
2 days
@pvdbzy118306 really? do you have examples?
1
0
3
@Yuchenj_UW
Yuchen Jin
2 days
@yihyunCS they will open source o1 after 2 years when deepseek is at r10...
1
0
10
@Yuchenj_UW
Yuchen Jin
2 days
@xwang_lk tomorrow belongs to open source, I bet
0
0
10
@Yuchenj_UW
Yuchen Jin
2 days
@BlasianHokage @teortaxesTex love to hear your CoT 🤗
1
0
4
@Yuchenj_UW
Yuchen Jin
2 days
@BlasianHokage @teortaxesTex oh man... I don't buy that it's because of safety
2
0
19
@Yuchenj_UW
Yuchen Jin
2 days
@Chiragjoshi_12 dude, I really hope so. deepseek r1 is almost o1, I don't know why they don't open source it
1
0
16
@Yuchenj_UW
Yuchen Jin
2 days
@ai_for_success and i unfollowed openai and followed deepseek lol
1
0
22
@Yuchenj_UW
Yuchen Jin
2 days
0
0
23