![Kunhao Zheng Profile](https://pbs.twimg.com/profile_images/1396783604907270145/3a9LYQCn_x96.jpg)
Kunhao Zheng
@KunhaoZ
Followers: 279 · Following: 271 · Statuses: 107
École Polytechnique X18, SJTU. Now in the amazing FAIR CodeGen @AIatMeta. Alumni: @Huggingface, Sea AI Lab, intern @openai
Joined January 2019
This would not have been possible without the amazing teamwork and support from @feeelix_feng @ArielKwiatkowsk @KempeLab @YaqiDuanPKU and @syhw !
0 · 0 · 4
RT @feeelix_feng: You think on-policy sampling gives the best reward models? Think again! 🔥 Our finding: Even with on-policy data, reward m…
0 · 39 · 0
@shawnup Each problem comes with public and private tests. Yeah, this is to make sure that during training the code is actually correct, not just hacking the public tests with a bunch of if-else statements. For sure, we don't expose any private test-case information to the model when evaluating on the valid/test sets.
0 · 0 · 0
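The public/private split described above can be sketched as follows. This is a minimal illustrative toy, not code from the actual training setup; the names (`Problem`, `train_reward`, `eval_score`) and the `solve(x)` convention are assumptions for the example. Training reward sees only the public tests, while evaluation also checks the held-out private tests, so hard-coding public cases with if-else earns no credit.

```python
# Hypothetical sketch of training on public tests vs. evaluating on
# public AND private tests. All names here are illustrative.

from dataclasses import dataclass

@dataclass
class Problem:
    public_tests: list   # (input, expected) pairs visible during training
    private_tests: list  # held-out pairs used only at valid/test time

def passes(code: str, tests) -> bool:
    """Exec `code` defining solve(x) and check every (input, expected) pair.

    A real system would run this in a sandbox; exec here is just for the toy.
    """
    namespace = {}
    exec(code, namespace)
    solve = namespace["solve"]
    return all(solve(x) == y for x, y in tests)

def train_reward(code: str, problem: Problem) -> float:
    # During training, reward is computed on public tests only.
    return 1.0 if passes(code, problem.public_tests) else 0.0

def eval_score(code: str, problem: Problem) -> float:
    # At valid/test time a solution must pass BOTH splits, so a solution
    # that just pattern-matches the public cases scores zero.
    ok = passes(code, problem.public_tests) and passes(code, problem.private_tests)
    return 1.0 if ok else 0.0
```

For example, a solution that hard-codes the public inputs gets full training reward but fails `eval_score` on the private split.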
@willccbb Some detours on offline methods like DPO. Also, codegen ppl (the single-turn codegen guys) and agent ppl (the SWE-Bench guys) are quite separated and didn't notice the role of multi-turn codegen until very recently, not to mention bringing it to train time.
0 · 0 · 14
I’ll be at #NeurIPS2024! Let’s chat about code generation, reasoning and RL, and life of course!
1 · 2 · 31
@srush_nlp @justintchiu iirc AlphaZero has a learned value function (not exactly a verifier, since the value function is tied to a policy, whereas a verifier should be independent of it).
0 · 0 · 2
@srush_nlp Not really. ExIt and AlphaGo Zero were described around the same time. ExIt uses MCTS as the policy improvement operator, but I think it's not restricted to that: you can train on any policy-in, policy-out operator that improves performance.
0 · 3 · 8
Work done with amazing @DecugisJuliette (joint first author), @jnsgehring, @TacoCohen, Benjamin Negrevergne, @syhw.
0 · 2 · 5