![Kaixuan Ji Profile](https://pbs.twimg.com/profile_images/1748779260935069696/hEH4ltjl_x96.jpg)
Kaixuan Ji
@Kaixuan_Ji_19
Followers: 660 · Following: 49 · Statuses: 29
Ph.D. student in CS at UCLA, B.E. from Tsinghua University. Interested in machine learning, especially RL theory and LLMs.
Joined October 2023
Thanks a lot to my amazing co-authors Guanlin Liu, Renjie Zheng, Zheng Wu, Chen Dun, @QuanquanGu and Lin Yan.
@vwxyzjn @QuanquanGu Thank you for your interest and appreciation! The figures below show the learning curves of the actor (Q-function) loss and the critic (V-function) loss.
RT @QuanquanGu: Check out our work, Direct Q Optimization (DQO), which is the ‘true RL’ version of RLHF. Let’s make RLHF true RL again! Pa…
RT @QuanquanGu: What defines concurrent work? In the ML theory community, concurrent works are typically recognized when similar results ar…
RT @QuanquanGu: We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀 ⭐ code: 🤗…
Come and check our poster at 4:30 pm today (May 8) at #254 Halle B!
Very excited to share that our paper was accepted to #ICLR2024! We designed the first horizon-free algorithm for linear mixture MDPs with adversarial rewards. We also proved the intrinsic hardness of adversarial linear MDPs. Check our paper at !
RT @Zixin_Wen: A fundamental question about neural networks is how do they learn to associate input features based on positional informatio…
RT @RuiqiZhang0614: What’s the role of the MLP layer in a transformer block? It’s intuitive to think that the MLP component helps reduce th…
RT @HuizhuoY: 🔥We have released the models of SPIN-Diffusion at @huggingface: UCLA-AGI/SPIN-Diffusion-iter3 and made a Demo at https://t.co…
RT @HuizhuoY: 🚀🌈Thrilled to introduce the newest member of the SPIN family: SPIN-Diffusion! 🌀💫 As a self-play approach to fine-tuning diffu…
👉 Experiments: We integrate active queries into DPO and propose ADPO for aligning LLMs with human preferences. We trained zephyr-7b-sft with both DPO and our ADPO. Our experiments show that ADPO outperforms DPO on the Open LLM Benchmark while querying only half as many human preferences! [4/4]
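The exact active-query rule is defined in the ADPO paper; the snippet below is only a minimal sketch of the general idea of querying preferences selectively on top of DPO. The margin-based uncertainty measure, the `threshold` value, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def implicit_reward_margin(policy_logps, ref_logps):
    """Gap between the DPO implicit rewards of the two responses.

    policy_logps / ref_logps: (batch, 2) summed log-probs of
    (response_a, response_b) under the policy and the frozen reference model.
    """
    rewards = policy_logps - ref_logps              # implicit reward, up to beta
    return (rewards[:, 0] - rewards[:, 1]).abs()

def select_active_queries(policy_logps, ref_logps, threshold=0.5):
    """Ask for a human preference label only on uncertain pairs
    (small implicit-reward margin); skip the rest to save queries."""
    return implicit_reward_margin(policy_logps, ref_logps) < threshold

def dpo_loss(policy_logps, ref_logps, preferred_first, beta=0.1):
    """Standard DPO loss on the pairs that were actually labeled."""
    logits = beta * ((policy_logps[:, 0] - ref_logps[:, 0])
                     - (policy_logps[:, 1] - ref_logps[:, 1]))
    signs = preferred_first.float() * 2 - 1         # +1 if response_a preferred, else -1
    return -F.logsigmoid(signs * logits).mean()
```

In a training loop, only the pairs flagged by `select_active_queries` would be sent out for labeling and fed to `dpo_loss`, which is how the preference-query budget gets cut relative to plain DPO under this sketch.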