![Kaixuan Ji Profile](https://pbs.twimg.com/profile_images/1748779260935069696/hEH4ltjl_x96.jpg)
Kaixuan Ji
@Kaixuan_Ji_19
Followers: 660 · Following: 49 · Statuses: 29
Ph.D. student in CS at UCLA, B.E. from Tsinghua University. Interested in machine learning, especially RL theory and LLMs.
Joined October 2023
Thanks a lot to my amazing co-authors Guanlin Liu, Renjie Zheng, Zheng Wu, Chen Dun, @QuanquanGu and Lin Yan.
@vwxyzjn @QuanquanGu Thank you for your interest and appreciation! The figures below show the learning curves of the actor (Q-function) loss and the critic (V-function) loss.
RT @QuanquanGu: Check out our work, Direct Q Optimization (DQO), which is the ‘true RL’ version of RLHF. Let’s make RLHF true RL again! Pa…
RT @QuanquanGu: What defines concurrent work? In the ML theory community, concurrent works are typically recognized when similar results ar…
RT @QuanquanGu: We've open-sourced the code and models for Self-Play Preference Optimization (SPPO)! 🚀🚀🚀 ⭐ code: 🤗…
Come and check our poster at 4:30 pm today (May 8) at #254 Halle B!
Very excited to share that our paper was accepted to #ICLR2024! We designed the first horizon-free algorithm for linear mixture MDPs with adversarial rewards. We also proved the intrinsic hardness of adversarial linear MDPs. Check our paper at !
RT @Zixin_Wen: A fundamental question about neural networks is how do they learn to associate input features based on positional informatio…
RT @RuiqiZhang0614: What’s the role of the MLP layer in a transformer block? It’s intuitive to think that the MLP component helps reduce th…
RT @HuizhuoY: 🔥We have released the models of SPIN-Diffusion at @huggingface: UCLA-AGI/SPIN-Diffusion-iter3 and made a Demo at https://t.co…
RT @HuizhuoY: 🚀🌈Thrilled to introduce the newest member of the SPIN family: SPIN-Diffusion! 🌀💫 As a self-play approach to fine-tuning diffu…
👉 Experiments: We integrate active queries into DPO and propose ADPO for aligning LLMs with human preferences. We trained zephyr-7b-sft with both DPO and our ADPO. Our experiments show that ADPO outperforms DPO on the Open LLM Benchmark while querying only half as many human preferences! [4/4]
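The exact active-query rule is defined in the ADPO paper; the snippet below is only a minimal sketch of the general idea of querying preferences selectively on top of DPO. The margin-based uncertainty measure, the `threshold` value, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def implicit_reward_margin(policy_logps, ref_logps):
    """Gap between the DPO implicit rewards of the two responses.

    policy_logps / ref_logps: (batch, 2) summed log-probs of
    (response_a, response_b) under the policy and the frozen reference model.
    """
    rewards = policy_logps - ref_logps              # implicit reward, up to beta
    return (rewards[:, 0] - rewards[:, 1]).abs()

def select_active_queries(policy_logps, ref_logps, threshold=0.5):
    """Ask for a human preference label only on uncertain pairs
    (small implicit-reward margin); skip the rest to save queries."""
    return implicit_reward_margin(policy_logps, ref_logps) < threshold

def dpo_loss(policy_logps, ref_logps, preferred_first, beta=0.1):
    """Standard DPO loss on the pairs that were actually labeled."""
    logits = beta * ((policy_logps[:, 0] - ref_logps[:, 0])
                     - (policy_logps[:, 1] - ref_logps[:, 1]))
    signs = preferred_first.float() * 2 - 1         # +1 if response_a preferred, else -1
    return -F.logsigmoid(signs * logits).mean()
```

In a training loop, only the pairs flagged by `select_active_queries` would be sent out for labeling and fed to `dpo_loss`, which is how the preference-query budget gets cut relative to plain DPO under this sketch.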