![Tianzhe Chu Profile](https://pbs.twimg.com/profile_images/1711326692428726272/wwywQ68n_x96.jpg)
Tianzhe Chu
@TianzheC
Followers: 183
Following: 147
Statuses: 55
Now @hkudatascience. Previously @ShanghaiTechUni; visited @UCBerkeley.
Berkeley, CA
Joined September 2022
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently from SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈
3
31
156
RT @ScuderiaFerrari: Welcoming Zhou Guanyu back to the Ferrari family as he joins Antonio Giovinazzi as our official reserve driver! Zhou s…
0
3K
0
@rosinality Thanks for your interest! Verification does improve OOD performance. But to clarify, the baseline in Fig. 10 is the initial checkpoint’s accuracy with {1,3,5,10} verification. Hence the direct contribution of verification has been canceled out.
0
0
1
RT @simon_zhai: [1/n] @deepseek_ai R1 has shown the power of RL without SFT. But what does RL learn differently from SFT? We have an answer…
0
67
0
I posted twice because Twitter found our research so sexy that it labeled the tweet as sensitive content…😅
[1/n] 🧐@deepseek_ai #DeepSeekR1 has shown the power of RL without SFT. But what does RL learn differently from SFT? Our answer is: 📉SFT Memorizes, RL Generalizes.📈
1
0
2
Thanks to great collaborators @simon_zhai @jihanyang13 @TongPetersb and advisors @sainingxie Dale Schuurmans @quocleix @svlevine @YiMaTweets. 🎇Happy New Year!📷🐍
0
0
2