Adam Fisch Profile
Adam Fisch

@adamjfisch

Followers: 1K
Following: 488
Statuses: 291

Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.

Joined August 2017
@adamjfisch
Adam Fisch
9 months
Excited to share new work from @GoogleDeepMind / @GoogleResearch on “Robust Preference Optimization through Reward Model Distillation”.
[image attached]
3
41
259
@adamjfisch
Adam Fisch
3 months
RT @stats_stephen: Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers usin…
0
6
0
@adamjfisch
Adam Fisch
3 months
@ml_angelopoulos Awesome stuff!
0
0
1
@adamjfisch
Adam Fisch
3 months
RT @ml_angelopoulos: 🚨 New Textbook on Conformal Prediction 🚨 “The goal of this book is to teach the reader about…
0
90
0
@adamjfisch
Adam Fisch
4 months
@raymin0223 @GoogleDeepMind Thank you for all the great work, Sangmin!
0
0
1
@adamjfisch
Adam Fisch
4 months
RT @aviral_kumar2: This work was led by the amazing @setlur_amrith during his internship at Google Research. With @nagpalchirag, @adamjfisch…
0
2
0
@adamjfisch
Adam Fisch
4 months
RT @aviral_kumar2: 🚨New paper led by @setlur_amrith on process rewards for reasoning! Our PRMs that model specific notion of "progress" re…
0
18
0
@adamjfisch
Adam Fisch
4 months
RT @setlur_amrith: 🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x…
0
41
0
@adamjfisch
Adam Fisch
8 months
@GoogleDeepMind @GoogleResearch @ml_angelopoulos Check out the paper for more details! Fun work done together with a great team: @maynez_joshua, @rhofour, @bhuwandhingra, @amirgloberson, and @professorwcohen.
0
0
3
@adamjfisch
Adam Fisch
8 months
RT @raymin0223: 🚨Check out our new paper, Block Transformer! We propose an efficient architecture with Global-to-Local language modeling.…
0
30
0
@adamjfisch
Adam Fisch
9 months
@amritsinghbedi3 @SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch Yes, the issue is shifted to the RM now. But the point we make is that these can be easier to explicitly regularize/clip/train multiple/etc. w/o having to derive a new obj for each variant. Also, as it’s real-valued, the RM won’t give p(y_1 > y_2 | x) = 0 for any (y_1, y_2, x).
1
0
1
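A minimal sketch of the last point in the reply above, assuming the standard Bradley-Terry parameterization (the bradley_terry_prob helper and the example scores are hypothetical): a real-valued reward model induces p(y_1 > y_2 | x) through a sigmoid of the reward difference, which stays strictly between 0 and 1 for any finite scores.

```python
import math

def bradley_terry_prob(r1: float, r2: float) -> float:
    """Bradley-Terry preference probability p(y1 > y2 | x) = sigmoid(r1 - r2)."""
    return 1.0 / (1.0 + math.exp(-(r1 - r2)))

# Hypothetical real-valued reward scores for two candidate responses to the same prompt.
r_y1, r_y2 = 2.3, -0.7

p = bradley_terry_prob(r_y1, r_y2)
print(f"p(y1 > y2 | x) = {p:.4f}")  # strictly inside (0, 1) for any finite rewards
assert 0.0 < p < 1.0
```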
@adamjfisch
Adam Fisch
9 months
@amritsinghbedi3 @SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch This is motivated by situations where the Bradley-Terry MLE is degenerate, which happens in most real datasets where we see 1, or just a few, sampled prefs for any given pair. We focus on binary prefs; depends, but prob. true for most practical annotation schemes with randomness.
1
0
1
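A short illustration of the degeneracy mentioned above, under the standard Bradley-Terry likelihood (a sketch, not the paper's derivation): with a single unanimous preference observed for a pair, the likelihood increases monotonically in the reward gap, so no finite maximizer exists.

```latex
% One observed preference y_1 \succ y_2 for a pair, with reward gap
% \Delta = r(x, y_1) - r(x, y_2). The Bradley-Terry likelihood of that
% single observation is \sigma(\Delta), which is strictly increasing in
% \Delta and approaches 1 only in the limit, so the unregularized MLE
% pushes the reward gap (and the fitted preference probability) to the
% boundary: no finite maximizer exists.
\[
  \sigma(\Delta) = \frac{1}{1 + e^{-\Delta}},
  \qquad
  \sup_{\Delta \in \mathbb{R}} \log \sigma(\Delta) = 0
  \quad \text{is approached only as } \Delta \to \infty .
\]
```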
@adamjfisch
Adam Fisch
9 months
@SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch They can be created in any way; they don’t need to follow any particular rules or forms. In the paper, we used an ensemble of RMs with different inductive biases (not all of which may be correct choices). See also for some recent RM ensemble construction techniques.
1
0
0
@adamjfisch
Adam Fisch
9 months
@TengX6 @GoogleDeepMind @GoogleResearch pi*_{r_tgt} just denotes the policy that maximizes the exp. reward according to r_tgt, subj. to the KL penalty. That is, pi*_{r_tgt} = argmax_pi E_pi(y|x) [ r_tgt(x,y) ] - beta * KL(pi || pi_ref). (29) plugs r_tgt written in terms of pi*_{r_tgt} back into the main objective.
1
0
0
@adamjfisch
Adam Fisch
9 months
RT @TechnionLive: 🎉Exciting news! Three young faculty members from @TechnionLive have been awarded the 2024 Krill Prize for Excellence in S…
0
10
0