Teng Xiao @TengX6 profile

Teng Xiao

@TengX6

Followers

72

Following

634

Statuses

20

PhD student at @penn_state. Machine Learning and Reinforcement Learning

USA

Joined September 2019

Teng Xiao

@TengX6

2 months

In our EMNLP2024 paper, "How to Leverage Demonstration Data in Alignment for Large Language Models? A Self-Imitation Learning Perspective", we also propose GSIL, an imitation learning (IL) approach that eliminates the need for the complex adversarial training typically required in standard IL. GSIL enables lightweight and efficient alignment for large language models, demonstrating the potential of imitation learning for enhancing LLMs.

Markus Wulfmeier

@m_wulfmeier

2 months

Imitation via Reinforcement Learning (IvRL, IRL, RFT, ...) is not just eating the whole cake but baking a massive, new cake! 🎂 🍒 Late to the party on o1, RFT, etc, but here are some thoughts: - kudos to the team at OAI on further RL-based products and even exposing RL-tuning APIs. - we have some related work at #NeurIPS2024. Come visit! 📜 - flexibly learning to imitate via RL will enable progress far beyond direct BC/SFT and I expect we’ll see a lot more. - imitating exact text is limited, but with arbitrary traces/tool-calls this paradigm is increasingly powerful. - recent work strongly reminds of apprenticeship learning (Abbeel&Ng, 2004) @NeurIPSConf @GoogleDeepMind

0

1

Teng Xiao

@TengX6

2 months

🚀 Excited to share that our new paper, Cal-DPO, focused on LLM alignment, has been accepted to #NeurIPS2024! 🎉 We empirically and theoretically demonstrate that substantial improvements over DPO can be achieved by calibrating implicit rewards to align with absolute reward scales. Check it out here:

5

1

7

Teng Xiao

@TengX6

3 months

Extensive experiments show that GSIL consistently and significantly outperforms baselines on many challenging benchmarks, including coding, mathematical reasoning, and instruction following. Code will be publicly available at

0

1

Teng Xiao

@TengX6

3 months

GSIL eliminates the need for complex adversarial training in standard imitation learning, achieving lightweight and efficient fine-tuning for large language models. In addition, GSIL enables a unified view for alignment with demonstration data.

0

Teng Xiao

@TengX6

9 months

@adamjfisch @GoogleDeepMind @GoogleResearch Thanks for your explanation. I missed the notation of pi*_{r_tgt}.

0

1

Teng Xiao

@TengX6

9 months

@adamjfisch @GoogleDeepMind @GoogleResearch Sorry! I mean Eq. (29)

1

0

Teng Xiao

@TengX6

9 months

@winglian Thank you. I noticed the chosen_reward is decreasing. Have you noticed this?

0

Teng Xiao

@TengX6

9 months

@winglian Thank you very much for your reply! Did you use this version?

1

0

Teng Xiao

@TengX6

11 months

@MLMazda Congrats

0

1

Teng Xiao

@TengX6

11 months

Experiments show 3M-Diffusion excels in creating diverse, novel 2D molecular graphs semantically aligned with text prompts. This joint work was conducted by Huaisheng Zhu (@huaiszhu), Teng Xiao (@TengX6), and Vasant Honavar (@vhonavar). Discover more results in our paper. (2/n)

0

2

4

Teng Xiao

@TengX6

11 months

RT @chenshi51326099: Thrilled to share our latest paper on understanding the factual behavior of LLMs from a mechanism interpretability vie…

0

25

0

Teng Xiao

@TengX6

1 year

@aahmadian_ @fentpot Hi Arash, thanks for your interesting work. Do you plan to release the code?

0

Teng Xiao

@TengX6

1 year

@_robertkirk Very interesting work. Have you released the code for this? Thanks.

0