Adam Fisch Profile
Adam Fisch

@adamjfisch

Followers: 1K
Following: 488
Statuses: 291

Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.

Joined August 2017
@adamjfisch
Adam Fisch
9 months
Excited to share new work from @GoogleDeepMind / @GoogleResearch on “Robust Preference Optimization through Reward Model Distillation”.
[image attached]
3
41
259
@adamjfisch
Adam Fisch
3 months
RT @stats_stephen: Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers usin…
0
6
0
@adamjfisch
Adam Fisch
3 months
@ml_angelopoulos Awesome stuff!
0
0
1
@adamjfisch
Adam Fisch
3 months
RT @ml_angelopoulos: 🚨 New Textbook on Conformal Prediction 🚨 “The goal of this book is to teach the reader about…
0
90
0
@adamjfisch
Adam Fisch
4 months
@raymin0223 @GoogleDeepMind Thank you for all the great work, Sangmin!
0
0
1
@adamjfisch
Adam Fisch
4 months
RT @aviral_kumar2: This work was led by the amazing @setlur_amrith during his internship at Google Research. With @nagpalchirag, @adamjfisch…
0
2
0
@adamjfisch
Adam Fisch
4 months
RT @aviral_kumar2: 🚨New paper led by @setlur_amrith on process rewards for reasoning! Our PRMs that model specific notion of "progress" re…
0
18
0
@adamjfisch
Adam Fisch
4 months
RT @setlur_amrith: 🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x…
0
41
0
@adamjfisch
Adam Fisch
8 months
@GoogleDeepMind @GoogleResearch @ml_angelopoulos Check out the paper for more details! Fun work done together with a great team: @maynez_joshua, @rhofour, @bhuwandhingra, @amirgloberson, and @professorwcohen.
0
0
3
@adamjfisch
Adam Fisch
8 months
RT @raymin0223: 🚨Check out our new paper, Block Transformer! We propose an efficient architecture with Global-to-Local language modeling.…
0
30
0
@adamjfisch
Adam Fisch
9 months
@amritsinghbedi3 @SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch Yes, the issue is shifted to the RM now. But the point we make is that these can be easier to explicitly regularize/clip/train multiple/etc. w/o having to derive a new obj for each variant. Also, as it’s real-valued, the RM won’t give p(y_1 > y_2 | x) = 0 for any (y_1, y_2, x).
1
0
1
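A minimal sketch of the last point in the reply above, assuming the standard Bradley-Terry parameterization (the bradley_terry_prob helper and the example scores are hypothetical): a real-valued reward model induces p(y_1 > y_2 | x) through a sigmoid of the reward difference, which stays strictly between 0 and 1 for any finite scores.

```python
import math

def bradley_terry_prob(r1: float, r2: float) -> float:
    """Bradley-Terry preference probability p(y1 > y2 | x) = sigmoid(r1 - r2)."""
    return 1.0 / (1.0 + math.exp(-(r1 - r2)))

# Hypothetical real-valued reward scores for two candidate responses to the same prompt.
r_y1, r_y2 = 2.3, -0.7

p = bradley_terry_prob(r_y1, r_y2)
print(f"p(y1 > y2 | x) = {p:.4f}")  # strictly inside (0, 1) for any finite rewards
assert 0.0 < p < 1.0
```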
@adamjfisch
Adam Fisch
9 months
@amritsinghbedi3 @SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch This is motivated by situations where the Bradley-Terry MLE is degenerate, which happens in most real datasets where we see 1, or just a few, sampled prefs for any given pair. We focus on binary prefs; depends, but prob. true for most practical annotation schemes with randomness.
1
0
1
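A short illustration of the degeneracy mentioned above, under the standard Bradley-Terry likelihood (a sketch, not the paper's derivation): with a single unanimous preference observed for a pair, the likelihood increases monotonically in the reward gap, so no finite maximizer exists.

```latex
% One observed preference y_1 \succ y_2 for a pair, with reward gap
% \Delta = r(x, y_1) - r(x, y_2). The Bradley-Terry likelihood of that
% single observation is \sigma(\Delta), which is strictly increasing in
% \Delta and approaches 1 only in the limit, so the unregularized MLE
% pushes the reward gap (and the fitted preference probability) to the
% boundary: no finite maximizer exists.
\[
  \sigma(\Delta) = \frac{1}{1 + e^{-\Delta}},
  \qquad
  \sup_{\Delta \in \mathbb{R}} \log \sigma(\Delta) = 0
  \quad \text{is approached only as } \Delta \to \infty .
\]
```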
@adamjfisch
Adam Fisch
9 months
@SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch They can be created in any way; they don’t need to follow any particular rules or forms. In the paper, we used an ensemble of RMs with different inductive biases (not all of which may be correct choices). See also for some recent RM ensemble construction techniques.
1
0
0
@adamjfisch
Adam Fisch
9 months
@TengX6 @GoogleDeepMind @GoogleResearch pi*_{r_tgt} just denotes the policy that maximizes the exp. reward according to r_tgt, subj. to the KL penalty. That is, pi*_{r_tgt} = argmax_pi E_pi(y|x) [ r_tgt(x,y) ] - beta * KL(pi || pi_ref). (29) plugs r_tgt written in terms of pi*_{r_tgt} back into the main objective.
1
0
0
@adamjfisch
Adam Fisch
9 months
RT @TechnionLive: 🎉Exciting news! Three young faculty members from @TechnionLive have been awarded the 2024 Krill Prize for Excellence in S…
0
10
0