Kaixuan Ji Profile
Kaixuan Ji

@Kaixuan_Ji_19

Followers
660
Following
49
Statuses
29

Ph.D. student in CS at UCLA, B.E. from Tsinghua University. Interested in machine learning, especially RL Theory and LLM.

Joined October 2023
@Kaixuan_Ji_19
Kaixuan Ji
2 months
Thanks a lot to my amazing co-authors Guanlin Liu, Renjie Zheng, Zheng Wu, Chen Dun, @QuanquanGu and Lin Yan.
1
1
7
@Kaixuan_Ji_19
Kaixuan Ji
2 months
@vwxyzjn @QuanquanGu Thank you for your interest and appreciation! The figures below show the learning curves of the actor (Q-function) loss and the critic (V-function) loss.
Tweet media one
Tweet media two
1
0
3
@Kaixuan_Ji_19
Kaixuan Ji
2 months
RT @QuanquanGu: Check out our work, Direct Q Optimization (DQO), which is the ‘true RL’ version of RLHF. Let’s make RLHF true RL again! Pa…
0
39
0
@Kaixuan_Ji_19
Kaixuan Ji
2 months
RT @QuanquanGu: What defines concurrent work? In the ML theory community, concurrent works are typically recognized when similar results ar…
0
3
0
@Kaixuan_Ji_19
Kaixuan Ji
8 months
RT @QuanquanGu: We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀 ⭐ code: 🤗…
0
70
0
@Kaixuan_Ji_19
Kaixuan Ji
9 months
Come and check out our poster at 4:30 pm today (May 8) at #254 Halle B!
@Kaixuan_Ji_19
Kaixuan Ji
1 year
Very excited to share our paper accepted to #ICLR2024! We designed the first horizon-free algorithm for linear mixture MDPs with adversarial reward. We also proved the intrinsic hardness of adversarial linear MDPs. Check our paper at !
Tweet media one
0
1
11
@Kaixuan_Ji_19
Kaixuan Ji
1 year
RT @Zixin_Wen: A fundamental question about neural networks is how do they learn to associate input features based on positional informatio…
0
2
0
@Kaixuan_Ji_19
Kaixuan Ji
1 year
RT @RuiqiZhang0614: What’s the role of the MLP layer in a transformer block? It’s intuitive to think that the MLP component helps reduce th…
0
5
0
@Kaixuan_Ji_19
Kaixuan Ji
1 year
RT @HuizhuoY: 🔥We have released the models of SPIN-Diffusion at @huggingface: UCLA-AGI/SPIN-Diffusion-iter3 and made a Demo at https://t.co…
0
37
0
@Kaixuan_Ji_19
Kaixuan Ji
1 year
@EMostaque @QuanquanGu Thank you so much for your attention to our work SPIN-Diffusion!
0
0
1
@Kaixuan_Ji_19
Kaixuan Ji
1 year
RT @HuizhuoY: 🚀🌈Thrilled to introduce the newest member of the SPIN family: SPIN-Diffusion! 🌀💫 As a self-play approach to fine-tuning diffu…
0
26
0
@Kaixuan_Ji_19
Kaixuan Ji
1 year
👉 Experiments: We integrate active queries into DPO and propose ADPO for aligning LLMs with human preferences. We trained zephyr-7b-sft using both DPO and our ADPO. Our experiments reveal that ADPO outperforms DPO on the Open LLM Benchmark while querying only half as many human preferences! [4/4]
Tweet media one
0
0
7