Tianzhe Chu @TianzheC profile

Tianzhe Chu

@TianzheC

Followers: 183

Following: 147

Statuses: 55

Now @hkudatascience. Previously @ShanghaiTechUni, visited @UCBerkeley.

Berkeley, CA

Joined September 2022

Tianzhe Chu

@TianzheC

14 days

[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently than SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈

3

31

156

Tianzhe Chu

@TianzheC

2 days

@robertarail Nice work! We are still trying to formulate some theory stuff lol

0

2

Tianzhe Chu

@TianzheC

8 days

a pumping elephant paper

0

Tianzhe Chu

@TianzheC

8 days

RT @ScuderiaFerrari: Welcoming Zhou Guanyu back to the Ferrari family as he joins Antonio Giovinazzi as our official reserve driver! Zhou s…

0

3K

0

Tianzhe Chu

@TianzheC

11 days

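RT @oran_ge: The conclusion of this Google paper is very clear: "SFT Memorizes, RL Generalizes." In short: supervised fine-tuning (SFT) is like showing a student a large number of worked examples and answers. The student learns by imitating the examples. Reinforcement learning (RL) is like letting the student solve problems on their own, rewarding correct answers and penalizing wrong ones. The student learns through…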

0

167

0

Tianzhe Chu

@TianzheC

13 days

RT @YugeTen: ✨New blog post✨: my attempt as a vision researcher at finally understanding RLHF -- a deep dive into PPO & DeepSeek's GRPO! N…

0

168

0

Tianzhe Chu

@TianzheC

13 days

@simon_zhai 🥳🥳🥳

0

Tianzhe Chu

@TianzheC

13 days

@rosinality Thanks for your interest! Verification does improve OOD performance. But to clarify, the baseline in Fig. 10 is the initial checkpoint’s accuracy with {1,3,5,10} verification. Hence the direct contribution of verification has been cancelled out.

0

1

Tianzhe Chu

@TianzheC

14 days

RT @simon_zhai: [1/n] @deepseek_ai R1 has shown the power of RL without SFT. But what does RL learn differently than SFT? We have an answer…

0

67

0

Tianzhe Chu

@TianzheC

14 days

I posted twice since Twitter finds our research so sexy that it labeled the tweet as sensitive content…😅

Tianzhe Chu

@TianzheC

14 days

[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently than SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈

1

0

2

Tianzhe Chu

@TianzheC

14 days

Thanks to great collaborators @simon_zhai @jihanyang13 @TongPetersb and advisors @sainingxie Dale Schuurmans @quocleix @svlevine @YiMaTweets. 🎇Happy New Year!📷🐍

0

2