Tianzhe Chu

@TianzheC

Followers
183
Following
147
Statuses
55

Now @hkudatascience. Previously @ShanghaiTechUni, visited @UCBerkeley.

Berkeley, CA
Joined September 2022
@TianzheC
Tianzhe Chu
14 days
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently than SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈
3
31
156
@TianzheC
Tianzhe Chu
2 days
@robertarail Nice work! We are still trying to formulate some theory stuff lol
0
0
2
@TianzheC
Tianzhe Chu
8 days
a pumping elephant paper
0
0
0
@TianzheC
Tianzhe Chu
8 days
RT @ScuderiaFerrari: Welcoming Zhou Guanyu back to the Ferrari family as he joins Antonio Giovinazzi as our official reserve driver! Zhou s…
0
3K
0
@TianzheC
Tianzhe Chu
11 days
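RT @oran_ge: The conclusion of this Google paper is very clear: "SFT Memorizes, RL Generalizes." In short: supervised fine-tuning (SFT) is like showing students a large number of worked examples and answers. Students learn by imitating the examples. Reinforcement learning (RL) is like having students solve problems themselves, rewarding correct answers and penalizing wrong ones. Students, through…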
0
167
0
@TianzheC
Tianzhe Chu
13 days
RT @YugeTen: ✨New blog post✨: my attempt as a vision researcher at finally understanding RLHF -- a deep dive into PPO & DeepSeek's GRPO! N…
0
168
0
@TianzheC
Tianzhe Chu
13 days
@simon_zhai 🥳🥳🥳
0
0
0
@TianzheC
Tianzhe Chu
13 days
@rosinality Thanks for your interest! Verification does improve OOD performance. But to clarify, the baseline in Fig. 10 is the initial checkpoint's accuracy with {1,3,5,10} verification. Hence the direct contribution of verification has been counteracted.
0
0
1
@TianzheC
Tianzhe Chu
14 days
RT @simon_zhai: [1/n] @deepseek_ai R1 has shown the power of RL without SFT. But what does RL learn differently than SFT? We have an answer…
0
67
0
@TianzheC
Tianzhe Chu
14 days
I posted twice since Twitter finds our research so sexy that it labeled the tweet as sensitive content…😅
@TianzheC
Tianzhe Chu
14 days
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently than SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈
1
0
2
@TianzheC
Tianzhe Chu
14 days
Thanks to great collaborators @simon_zhai @jihanyang13 @TongPetersb and advisors @sainingxie Dale Schuurmans @quocleix @svlevine @YiMaTweets. 🎇Happy New Year!📷🐍
0
0
2