![Adam Fisch Profile](https://pbs.twimg.com/profile_images/990470781363568640/WpwJ54Kk_x96.jpg)
Adam Fisch
@adamjfisch
Followers: 1K
Following: 488
Statuses: 291
Research Scientist @ Google DeepMind | Formerly: PhD @ MIT EECS.
Joined August 2017
Excited to share new work from @GoogleDeepMind / @GoogleResearch on “Robust Preference Optimization through Reward Model Distillation”.
RT @stats_stephen: Important topic, but this is more of a quick-start guide. For cutting-edge research on LLM evals, see these papers usin…
RT @ml_angelopoulos: 🚨 New Textbook on Conformal Prediction 🚨 “The goal of this book is to teach the reader about…
RT @aviral_kumar2: This work was led by the amazing @setlur_amrith during his internship at Google Research. With @nagpalchirag, @adamjfisc…
RT @aviral_kumar2: 🚨New paper led by @setlur_amrith on process rewards for reasoning! Our PRMs that model specific notion of "progress" re…
RT @setlur_amrith: 🚨 Exciting new results with dense process reward models (PRMs) for reasoning. Our PRMs scale ✅ search compute by 1.5-5x…
@GoogleDeepMind @GoogleResearch @ml_angelopoulos Check out the paper for more details! Fun work done together with a great team: @maynez_joshua, @rhofour, @bhuwandhingra, @amirgloberson, and @professorwcohen.
RT @raymin0223: 🚨Check out our new paper, Block Transformer! We propose an efficient architecture with Global-to-Local language modeling.…
@amritsinghbedi3 @SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch Yes, the issue is shifted to the RM now. But the point we make is that RMs can be easier to explicitly regularize/clip/train multiple of/etc. without having to derive a new objective for each variant. Also, since the RM is real-valued, it won't give p(y_1 > y_2 | x) = 0 for any (y_1, y_2, x).
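For illustration only (a minimal sketch of the kind of post-hoc control described above; the function names and constants are my own assumptions, not the paper's code), here is how a real-valued RM's scores can be clipped or shrunk toward a reference without changing the preference-learning objective:

```python
import numpy as np

# Hypothetical sketch: once preferences are distilled into a real-valued reward
# model r(x, y), its scores can be post-processed directly (clipped, shrunk toward
# a reference, etc.) without deriving a new preference objective for each variant.

def clipped_reward(scores, low=-4.0, high=4.0):
    """Clip raw RM scores to a bounded range before policy optimization."""
    return np.clip(scores, low, high)

def shrunk_reward(scores, ref_scores, alpha=0.5):
    """Regularize RM scores by shrinking them toward a reference model's scores."""
    return alpha * scores + (1.0 - alpha) * ref_scores

# Example: raw RM scores for a batch of candidate responses.
raw = np.array([7.3, -0.2, 1.5, -9.1])
ref = np.zeros_like(raw)
print(clipped_reward(raw))       # bounded rewards
print(shrunk_reward(raw, ref))   # rewards pulled toward the reference
```

Because the post-processed scores stay finite, the implied preference probability sigmoid(r_1 - r_2) never reaches exactly 0 or 1, consistent with the point above.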
@amritsinghbedi3 @SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch This is motivated by situations where the Bradley-Terry MLE is degenerate, which happens in most real datasets, where we see only one, or just a few, sampled preferences for any given pair. We focus on binary preferences; it depends, but this is probably true for most practical annotation schemes with randomness.
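A toy numerical check of that degeneracy (my own illustration, not from the paper): with a single observed preference y_1 over y_2, the Bradley-Terry log-likelihood log sigmoid(r_1 - r_2) is strictly increasing in the reward gap, so the MLE pushes the gap to infinity and the fitted probability to exactly 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One observed comparison: y1 preferred over y2 (and never the reverse).
# Bradley-Terry log-likelihood as a function of the reward gap d = r1 - r2.
for d in [0.0, 2.0, 5.0, 10.0, 20.0]:
    ll = np.log(sigmoid(d))
    print(f"gap={d:5.1f}  log-lik={ll:.6f}  p(y1 > y2)={sigmoid(d):.6f}")

# The log-likelihood keeps increasing as the gap grows, so the MLE is unbounded:
# the fitted preference probability collapses to exactly 1 for the observed pair
# (and 0 for the reverse), which is the degenerate behavior described above.
```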
@SOURADIPCHAKR18 @GoogleDeepMind @GoogleResearch They can be created in any way; they don't need to follow any particular rules or forms. In the paper, we used an ensemble of RMs with different inductive biases (not all of which may be correct choices). See also the linked work for some recent RM ensemble construction techniques.
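As a rough sketch of what combining such an ensemble can look like (the combination rules below are common choices I'm assuming for illustration, not necessarily the paper's):

```python
import numpy as np

# Hypothetical sketch: collapse scores from K reward models trained with different
# inductive biases into one target reward per response.
ensemble_scores = np.array([
    [1.2, 0.8, 1.5],   # RM 1
    [0.9, 1.1, 0.4],   # RM 2
    [1.4, 0.2, 1.0],   # RM 3
])  # shape: (K models, N responses)

mean_reward = ensemble_scores.mean(axis=0)                               # average ensemble
pessimistic = ensemble_scores.mean(axis=0) - ensemble_scores.std(axis=0) # mean minus std

print("mean:       ", mean_reward)
print("pessimistic:", pessimistic)
```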
@TengX6 @GoogleDeepMind @GoogleResearch pi*_{r_tgt} just denotes the policy that maximizes the expected reward according to r_tgt, subject to the KL penalty. That is, pi*_{r_tgt} = argmax_pi E_{y ~ pi(y|x)}[ r_tgt(x, y) ] - beta * KL(pi || pi_ref). Eq. (29) plugs r_tgt, written in terms of pi*_{r_tgt}, back into the main objective.
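For reference, the standard KL-regularized result being used here (a generic derivation, not tied to the paper's Eq. (29) notation; Z(x) denotes the partition function):

```latex
% Closed form of the KL-regularized maximizer:
\pi^*_{r_{\mathrm{tgt}}}(y \mid x)
  = \frac{1}{Z(x)} \, \pi_{\mathrm{ref}}(y \mid x)
    \exp\!\left(\tfrac{1}{\beta} r_{\mathrm{tgt}}(x, y)\right),
\qquad
Z(x) = \sum_{y} \pi_{\mathrm{ref}}(y \mid x)
       \exp\!\left(\tfrac{1}{\beta} r_{\mathrm{tgt}}(x, y)\right).

% Inverting gives the reward in terms of the optimal policy, which is what gets
% plugged back into the main objective:
r_{\mathrm{tgt}}(x, y)
  = \beta \log \frac{\pi^*_{r_{\mathrm{tgt}}}(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
    + \beta \log Z(x).
```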
RT @TechnionLive: 🎉Exciting news! Three young faculty members from @TechnionLive have been awarded the 2024 Krill Prize for Excellence in S…