Seohong Park Profile
Seohong Park

@seohong_park

Followers
2K
Following
1K
Statuses
332

Reinforcement learning | CS Ph.D. student @berkeley_ai

Joined January 2022
@seohong_park
Seohong Park
8 days
Excited to introduce flow Q-learning (FQL)! FQL is a *simple* and scalable data-driven RL method that trains an expressive policy with flow matching. Paper: Project page: Thread ↓
14
141
780
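(Editor's note: a toy sketch of the flow-matching recipe behind a policy like the one the tweet describes. This is an illustration under my own assumptions, not the authors' released code; `flow_matching_target` and `euler_sample` are hypothetical names. With the linear interpolation path, the velocity-network regression target is simply `a - x0`, and sampling is Euler integration of the learned field.)

```python
import numpy as np

def flow_matching_target(x0, a, t):
    """Construct a flow-matching training pair on the linear path.

    x0: noise sample, a: data point (e.g., a dataset action), t in [0, 1].
    Returns the interpolated point x_t and the conditional velocity
    target (a - x0) that the velocity network would regress onto.
    """
    x_t = (1.0 - t) * x0 + t * a
    v_target = a - x0
    return x_t, v_target

def euler_sample(v_fn, x0, n_steps=10):
    """Draw a sample by Euler-integrating a velocity field from t=0 to t=1."""
    x, dt = x0, 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * v_fn(x, t)
    return x

# Sanity check with a single data point a: the exact marginal velocity is
# (a - x) / (1 - t), and integrating it transports any x0 to a.
a, x0 = 2.0, -1.0
sample = euler_sample(lambda x, t: (a - x) / (1.0 - t), x0, n_steps=10)
```

Because the interpolation path is linear in `t`, Euler integration of the exact field is itself exact here; with a learned network the sample would only approximately match the data distribution.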
@seohong_park
Seohong Park
2 days
RT @_oleh: Does off-policy value-based RL scale? In LLMs, larger scale predictably improves performance. Value-based RL learns from arbitra…
0
31
0
@seohong_park
Seohong Park
5 days
@SuperKk1998 Yep, I think that'd be an option in that case!
0
0
2
@seohong_park
Seohong Park
5 days
@jchencxh Thanks James!
0
0
1
@seohong_park
Seohong Park
6 days
RT @aviral_kumar2: 🚨Current scalable RL algos train a policy w/o value func, which is limiting with learning in open-ended, non-stationary,…
0
51
0
@seohong_park
Seohong Park
7 days
@fangchenliu_ haha thanks Fangchen!
1
0
1
@seohong_park
Seohong Park
7 days
@HongweiYi2 Thanks for the question! I believe FQL's one-step guidance as a general principle can be applied to any diffusion or flow model to guide it to maximize a learned or known function (e.g., rewards, Q values, preference models, classifiers, etc.).
1
0
2
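(Editor's note: a generic illustration of the guidance principle described in the reply above, under my own assumptions; `guide_sample` and its arguments are hypothetical, and this is a stand-in for guidance in general rather than FQL's actual mechanism. The idea sketched: take a base sample from a generative policy and nudge it toward higher values of a known or learned scalar function.)

```python
import numpy as np

def guide_sample(base_action, score_grad, step=0.1, n_steps=50):
    """Nudge a base sample (e.g., from a flow or diffusion policy) toward
    higher values of a scalar objective via gradient ascent.

    score_grad: gradient of the objective (a reward model, Q-function,
    preference model, or classifier log-probability would all fit here).
    """
    a = np.asarray(base_action, dtype=float)
    for _ in range(n_steps):
        a = a + step * score_grad(a)
    return a

# Toy objective: score(a) = -(a - 1)^2, maximized at a = 1.
guided = guide_sample(0.0, lambda a: -2.0 * (a - 1.0))
```

With this step size the iterate contracts toward the maximizer geometrically; in practice the objective's gradient would come from autodiff through a learned model.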
@seohong_park
Seohong Park
7 days
@YouJiacheng Yep, that'd be an informative ablation to add!
1
0
1
@seohong_park
Seohong Park
7 days
@JesseFarebro Thanks Jesse! We haven't tried advanced variants of flow matching (b/c one of the main goals is to keep the method as simple as possible), but I also think there's a lot of room for improvement in incorporating more advanced flow matching or distillation techniques!
0
0
1
@seohong_park
Seohong Park
7 days
@chanwoopark20 Thanks Chanwoo!
0
0
2
@seohong_park
Seohong Park
8 days
RT @qiyang_li: Flow policies are expressive but slow and hard to finetune against a Q-function due to their iterative nature. Our idea…
0
3
0
@seohong_park
Seohong Park
8 days
@Stone_Tao haha thanks Stone!
0
0
1
@seohong_park
Seohong Park
8 days
@or_rivlin I agree, thanks! 🙂
0
0
1
@seohong_park
Seohong Park
8 days
@younggyoseo Thanks a lot, Younggyo!
0
0
1
@seohong_park
Seohong Park
8 days
RT @younggyoseo: This is a really well-written, nice paper on using flow matching for RL, you should check this!
0
4
0
@seohong_park
Seohong Park
8 days
RT @svlevine: We came up with a really simple way to train flow matching (diffusion) policies with offline RL! Flow Q-learning from @seohon
0
49
0
@seohong_park
Seohong Park
8 days
We've open-sourced our implementation, which we tried to make as clean as possible. Check out our paper and website for more details! Paper: Project page: Code: w/ @qiyang_li @svlevine
1
1
24