![Teng Xiao Profile](https://pbs.twimg.com/profile_images/1767746341218721792/NCoOAtDW_x96.jpg)
Teng Xiao
@TengX6
Followers
72
Following
634
Statuses
20
PhD student at @penn_state. Machine Learning and Reinforcement Learning
USA
Joined September 2019
In our EMNLP 2024 paper, "How to Leverage Demonstration Data in Alignment for Large Language Models? A Self-Imitation Learning Perspective", we propose GSIL, an imitation learning (IL) approach that eliminates the complex adversarial training typically required in standard IL. GSIL enables lightweight and efficient alignment for large language models, demonstrating the potential of imitation learning for enhancing LLMs.
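A minimal sketch of what an adversarial-training-free, self-imitation-style objective on demonstration data could look like, assuming a DPO-like implicit-reward parameterization (beta times the policy/reference log-ratio). The function names and the logistic surrogate below are illustrative assumptions, not the exact GSIL loss from the paper.

```python
import torch
import torch.nn.functional as F

def sequence_logprob(logits, labels, mask):
    # Sum of per-token log-probabilities over the response tokens only.
    logps = torch.log_softmax(logits, dim=-1)
    token_logps = torch.gather(logps, -1, labels.unsqueeze(-1)).squeeze(-1)
    return (token_logps * mask).sum(-1)

def self_imitation_loss(policy_logits, ref_logits, labels, mask, beta=0.1):
    """Hypothetical offline self-imitation objective on demonstrations.

    Idea: treat the demonstration response as the behavior to imitate and
    push up its implicit reward, beta * (log pi - log pi_ref), through a
    simple logistic surrogate, so no discriminator / adversarial inner
    loop is needed. Illustrative only, not the paper's implementation.
    """
    policy_logp = sequence_logprob(policy_logits, labels, mask)
    ref_logp = sequence_logprob(ref_logits, labels, mask)
    implicit_reward = beta * (policy_logp - ref_logp)
    return -F.logsigmoid(implicit_reward).mean()
```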
Imitation via Reinforcement Learning (IvRL, IRL, RFT, ...) is not just eating the whole cake but baking a massive new cake! 🎂 🍒 Late to the party on o1, RFT, etc., but here are some thoughts:
- Kudos to the team at OAI on further RL-based products and even exposing RL-tuning APIs.
- We have some related work at #NeurIPS2024. Come visit! 📜
- Flexibly learning to imitate via RL will enable progress far beyond direct BC/SFT, and I expect we'll see a lot more.
- Imitating exact text is limited, but with arbitrary traces/tool calls this paradigm is increasingly powerful.
- Recent work is strongly reminiscent of apprenticeship learning (Abbeel & Ng, 2004). @NeurIPSConf @GoogleDeepMind
0
0
1
🚀 Excited to share that our new paper, Cal-DPO, focused on LLM alignment, has been accepted to #NeurIPS2024! 🎉 We empirically and theoretically demonstrate that substantial improvements over DPO can be achieved by calibrating implicit rewards to align with absolute reward scales. Check it out here:
5
1
7
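One way to read the Cal-DPO tweet above is that vanilla DPO only fits the *difference* between the implicit rewards of the chosen and rejected responses, so their absolute values can drift; adding a term that anchors each implicit reward to an absolute target scale is one form of calibration. The sketch below illustrates that idea; the parameter names (`calib_weight`, `target_scale`) and the quadratic calibration term are assumptions for illustration, not Cal-DPO's exact objective.

```python
import torch
import torch.nn.functional as F

def calibrated_dpo_style_loss(policy_chosen_logp, policy_rejected_logp,
                              ref_chosen_logp, ref_rejected_logp,
                              beta=0.1, calib_weight=1.0, target_scale=1.0):
    """Hypothetical calibrated-DPO-style loss (illustrative sketch)."""
    # Implicit rewards under the DPO parameterization.
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)

    # Standard DPO term: maximize the margin between chosen and rejected.
    dpo_term = -F.logsigmoid(reward_chosen - reward_rejected)

    # Calibration term: pull rewards toward an absolute scale rather than
    # letting both drift together (only their gap is constrained by DPO).
    calib_term = (reward_chosen - target_scale) ** 2 \
               + (reward_rejected + target_scale) ** 2

    return (dpo_term + calib_weight * calib_term).mean()
```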
@adamjfisch @GoogleDeepMind @GoogleResearch Thanks for your explanation. I had missed the notation pi*_{r_tgt}.
0
0
1
RT @chenshi51326099: Thrilled to share our latest paper on understanding the factual behavior of LLMs from a mechanism interpretability vie…
0
25
0
@aahmadian_ @fentpot Hi Arash, thanks for your interesting work. Do you plan to release the code?
0
0
0