![Seohong Park Profile](https://pbs.twimg.com/profile_images/1486937111752507393/FK8cwqh1_x96.jpg)
Seohong Park
@seohong_park
Followers: 2K · Following: 1K · Statuses: 332
Reinforcement learning | CS Ph.D. student @berkeley_ai
Joined January 2022
RT @_oleh: Does off-policy value-based RL scale? In LLMs, larger scale predictably improves performance. Value-based RL learns from arbitra…
RT @aviral_kumar2: 🚨Current scalable RL algos train a policy w/o value func, which is limiting with learning in open-ended, non-stationary,…
@HongweiYi2 Thanks for the question! I believe FQL's one-step guidance as a general principle can be applied to any diffusion or flow model to guide it to maximize a learned or known function (e.g., rewards, Q values, preference models, classifiers, etc.).
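For readers wondering what this could look like concretely, here is a minimal, hypothetical sketch of the idea (plain PyTorch; the names `flow_policy`, `onestep_policy`, `q_fn`, and `ACTION_DIM` are placeholders of mine, not FQL's actual API): distill an iterative flow/diffusion sampler into a one-step policy while nudging its outputs to maximize a learned scorer such as a Q-function.

```python
# Hypothetical sketch of one-step guidance; not the official FQL implementation.
import torch

ACTION_DIM = 6  # placeholder action dimensionality for this toy sketch

def flow_sample(flow_policy, obs, num_steps=10):
    """Sample by Euler-integrating the flow policy's velocity field (slow, iterative)."""
    a = torch.randn(obs.shape[0], ACTION_DIM)          # start from Gaussian noise
    for k in range(num_steps):
        t = torch.full((obs.shape[0], 1), k / num_steps)
        a = a + flow_policy(obs, a, t) / num_steps      # one Euler step
    return a

def one_step_guidance_loss(onestep_policy, flow_policy, q_fn, obs, alpha=1.0):
    """Train a one-step policy to imitate the flow policy while maximizing a learned scorer."""
    noise = torch.randn(obs.shape[0], ACTION_DIM)
    a_student = onestep_policy(obs, noise)              # single forward pass at inference
    with torch.no_grad():
        a_teacher = flow_sample(flow_policy, obs)       # iterative teacher sample
    distill = ((a_student - a_teacher) ** 2).mean()     # stay close to the flow policy
    guidance = -q_fn(obs, a_student).mean()             # ascend the scorer (rewards, Q, classifier, ...)
    return distill + alpha * guidance
```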
@JesseFarebro Thanks Jesse! We haven't tried advanced variants of flow matching (b/c one of the main goals is to keep the method as simple as possible), but I also think there's a lot of room for improvement in incorporating more advanced flow matching or distillation techniques!
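For context, the "vanilla" recipe being kept simple here is conditional flow matching: regress a velocity network onto the constant velocity of a straight-line interpolant between noise and data. A hedged sketch (the name `velocity_net` is a placeholder of mine; this is not necessarily the paper's exact objective):

```python
# Generic conditional flow-matching (behavioral-cloning) loss, shown for illustration only.
import torch

def flow_matching_bc_loss(velocity_net, obs, actions):
    noise = torch.randn_like(actions)         # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1)       # random time in [0, 1]
    x_t = (1 - t) * noise + t * actions       # straight-line interpolant
    target_v = actions - noise                # its constant velocity
    pred_v = velocity_net(obs, x_t, t)
    return ((pred_v - target_v) ** 2).mean()  # regress onto the target velocity
```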
RT @qiyang_li: Flow policies are expressive but slow and hard to finetune against a Q-function due to their iterative nature. Our idea…
RT @younggyoseo: This is a really well-written, nice paper on using flow matching for RL, you should check this!
We've open-sourced our implementation, which we tried to make as clean as possible. Check out our paper and website for more details! Paper: Project page: Code: w/ @qiyang_li @svlevine