Teng Xiao

@TengX6

Followers
72
Following
634
Statuses
20

PhD student at @penn_state. Machine Learning and Reinforcement Learning

USA
Joined September 2019
@TengX6
Teng Xiao
2 months
In our EMNLP2024 paper, "How to Leverage Demonstration Data in Alignment for Large Language Models? A Self-Imitation Learning Perspective", we also propose GSIL, an imitation learning (IL) approach that eliminates the need for the complex adversarial training typically required in standard IL. GSIL enables lightweight and efficient alignment for large language models, demonstrating the potential of imitation learning in enhancing LLMs.
@m_wulfmeier
Markus Wulfmeier
2 months
Imitation via Reinforcement Learning (IvRL, IRL, RFT, ...) is not just eating the whole cake but baking a massive, new cake! 🎂 🍒 Late to the party on o1, RFT, etc, but here are some thoughts: - kudos to the team at OAI on further RL-based products and even exposing RL-tuning APIs. - we have some related work at #NeurIPS2024. Come visit! 📜 - flexibly learning to imitate via RL will enable progress far beyond direct BC/SFT and I expect we’ll see a lot more. - imitating exact text is limited, but with arbitrary traces/tool-calls this paradigm is increasingly powerful. - recent work strongly reminds me of apprenticeship learning (Abbeel & Ng, 2004) @NeurIPSConf @GoogleDeepMind
0
0
1
@TengX6
Teng Xiao
2 months
🚀 Excited to share that our new paper, Cal-DPO, focused on LLM alignment, has been accepted to #NeurIPS2024! 🎉 We empirically and theoretically demonstrate that substantial improvements over DPO can be achieved by calibrating implicit rewards to align with absolute reward scales. Check it out here:
5
1
7
@TengX6
Teng Xiao
3 months
Extensive experiments show that GSIL consistently and significantly outperforms baselines on many challenging benchmarks, including coding, mathematical reasoning, and instruction following. Code will be publicly available at
0
0
1
@TengX6
Teng Xiao
3 months
GSIL eliminates the need for the complex adversarial training used in standard imitation learning, achieving lightweight and efficient fine-tuning for large language models. In addition, GSIL provides a unified view of alignment with demonstration data.
0
0
0
@TengX6
Teng Xiao
9 months
@adamjfisch @GoogleDeepMind @GoogleResearch Thanks for your explanation. I missed the notation of pi*_{r_tgt}.
0
0
1
@TengX6
Teng Xiao
9 months
1
0
0
@TengX6
Teng Xiao
9 months
@winglian Thank you. I noticed the choose_reaward is decreasing. Have you noticed this?
0
0
0
@TengX6
Teng Xiao
9 months
@winglian Thank you very much for your reply! Did you use this version?
1
0
0
@TengX6
Teng Xiao
11 months
@MLMazda Congrats
0
0
1
@TengX6
Teng Xiao
11 months
Experiments show 3M-Diffusion excels in creating diverse, novel 2D molecular graphs semantically aligned with text prompts. This joint work was conducted by Huaisheng Zhu (@huaiszhu), Teng Xiao (@TengX6), and Vasant Honavar (@vhonavar). Discover more results in our paper. (2/n)
0
2
4
@TengX6
Teng Xiao
11 months
RT @chenshi51326099: Thrilled to share our latest paper on understanding the factual behavior of LLMs from a mechanism interpretability vie…
0
25
0
@TengX6
Teng Xiao
1 year
@aahmadian_ @fentpot Hi Arash, thanks for your interesting work. Do you plan to release the code?
0
0
0
@TengX6
Teng Xiao
1 year
@_robertkirk Very interesting work. Have you released the code for this? Thanks
0
0
0