![Seohong Park Profile](https://pbs.twimg.com/profile_images/1486937111752507393/FK8cwqh1_x96.jpg)
Seohong Park
@seohong_park
Followers: 2K · Following: 1K · Statuses: 332
Reinforcement learning | CS Ph.D. student @berkeley_ai
Joined January 2022
RT @_oleh: Does off-policy value-based RL scale? In LLMs, larger scale predictably improves performance. Value-based RL learns from arbitra…
RT @aviral_kumar2: 🚨Current scalable RL algos train a policy w/o value func, which is limiting with learning in open-ended, non-stationary,…
@HongweiYi2 Thanks for the question! I believe FQL's one-step guidance as a general principle can be applied to any diffusion or flow model to guide it to maximize a learned or known function (e.g., rewards, Q values, preference models, classifiers, etc.).
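For readers wondering what this could look like concretely, here is a minimal, hypothetical sketch of the idea (plain PyTorch; the names `flow_policy`, `onestep_policy`, `q_fn`, and `ACTION_DIM` are placeholders of mine, not FQL's actual API): distill an iterative flow/diffusion sampler into a one-step policy while nudging its outputs to maximize a learned scorer such as a Q-function.

```python
# Hypothetical sketch of one-step guidance; not the official FQL implementation.
import torch

ACTION_DIM = 6  # placeholder action dimensionality for this toy sketch

def flow_sample(flow_policy, obs, num_steps=10):
    """Sample by Euler-integrating the flow policy's velocity field (slow, iterative)."""
    a = torch.randn(obs.shape[0], ACTION_DIM)          # start from Gaussian noise
    for k in range(num_steps):
        t = torch.full((obs.shape[0], 1), k / num_steps)
        a = a + flow_policy(obs, a, t) / num_steps      # one Euler step
    return a

def one_step_guidance_loss(onestep_policy, flow_policy, q_fn, obs, alpha=1.0):
    """Train a one-step policy to imitate the flow policy while maximizing a learned scorer."""
    noise = torch.randn(obs.shape[0], ACTION_DIM)
    a_student = onestep_policy(obs, noise)              # single forward pass at inference
    with torch.no_grad():
        a_teacher = flow_sample(flow_policy, obs)       # iterative teacher sample
    distill = ((a_student - a_teacher) ** 2).mean()     # stay close to the flow policy
    guidance = -q_fn(obs, a_student).mean()             # ascend the scorer (rewards, Q, classifier, ...)
    return distill + alpha * guidance
```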
@JesseFarebro Thanks Jesse! We haven't tried advanced variants of flow matching (b/c one of the main goals is to keep the method as simple as possible), but I also think there's a lot of room for improvement in incorporating more advanced flow matching or distillation techniques!
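For context, the "vanilla" recipe being kept simple here is conditional flow matching: regress a velocity network onto the constant velocity of a straight-line interpolant between noise and data. A hedged sketch (the name `velocity_net` is a placeholder of mine; this is not necessarily the paper's exact objective):

```python
# Generic conditional flow-matching (behavioral-cloning) loss, shown for illustration only.
import torch

def flow_matching_bc_loss(velocity_net, obs, actions):
    noise = torch.randn_like(actions)         # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1)       # random time in [0, 1]
    x_t = (1 - t) * noise + t * actions       # straight-line interpolant
    target_v = actions - noise                # its constant velocity
    pred_v = velocity_net(obs, x_t, t)
    return ((pred_v - target_v) ** 2).mean()  # regress onto the target velocity
```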
RT @qiyang_li: Flow policies are expressive but slow and hard to finetune against a Q-function due to their iterative nature. Our idea…
RT @younggyoseo: This is a really well-written, nice paper on using flow matching for RL, you should check this!
We've open-sourced our implementation, which we tried to make as clean as possible. Check out our paper and website for more details! Paper: Project page: Code: w/ @qiyang_li @svlevine