![Muhammad Khalifa Profile](https://pbs.twimg.com/profile_images/1656179221356748801/LjIyA7R-_x96.jpg)
Muhammad Khalifa
@MKhalifaaaa
Followers: 515 · Following: 1K · Statuses: 289
Ph.D. student at @Umich, previously @cohere, @allenai and @AmazonScience, and @NaverLabsEurope. Reasoning with LLMs, attribution and other stuff.
Ann Arbor, Michigan
Joined March 2019
RT @mbalunovic: We finally have an answer to the debate over whether LLMs generalize to new math problems or they merely memorized the answ…
RT @xiangyue96: Demystifying Long CoT Reasoning in LLMs Reasoning models like R1 / O1 / O3 have gained massive atte…
@prajdabre1 Great work. It was cool to see that different methods tend to converge to similar performance at large scales, which kinda motivated our recent work on linear merge optimization
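The "linear merge optimization" mentioned above can be illustrated with a minimal sketch: a linear merge is just a weighted average of checkpoint parameters, and merge optimization searches over those weights. The checkpoint names, dict layout, and weights below are hypothetical toy values, not the method from the referenced work.

```python
# Toy sketch of linear model merging: merged params = sum_i w_i * theta_i.
# Real checkpoints would be tensors (e.g. a PyTorch state_dict); flat
# Python lists are used here purely for illustration.
def linear_merge(checkpoints, weights):
    """Merge flat parameter dicts as a weighted average."""
    assert abs(sum(weights) - 1.0) < 1e-8, "merge weights should sum to 1"
    merged = {}
    for name in checkpoints[0]:
        merged[name] = [
            sum(w * ckpt[name][j] for w, ckpt in zip(weights, checkpoints))
            for j in range(len(checkpoints[0][name]))
        ]
    return merged

# Two hypothetical checkpoints of a single two-parameter layer.
ckpt_a = {"layer.weight": [1.0, 3.0]}
ckpt_b = {"layer.weight": [3.0, 1.0]}
merged = linear_merge([ckpt_a, ckpt_b], [0.5, 0.5])
print(merged["layer.weight"])  # [2.0, 2.0]
```

A merge optimizer would then treat the weight vector `[0.5, 0.5]` as the variable to tune against some validation objective.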
@sh_reya I disagree; being hellbent on something can often be a temporary/precarious stance that can easily change with circumstances, mood, etc. You shouldn't base a long-term commitment such as doing a PhD on the feeling of "I must do a PhD no matter what people say"
Thinking models like o3 will keep getting better and better but ONLY at easily verifiable skills like code and math. Reward modeling for non-verifiable tasks that involve creativity, imagination, or social interaction will remain a challenge
Thoughts about o3: I'll skip the obvious part (extraordinary reasoning, FrontierMath is insanely hard, etc.). I think the essence of o3 is about *relaxing a single-point RL super intelligence* to cover more points in the space of useful problems.

The world of AI is no stranger to RL achieving god-level stunts. AlphaGo was a super intelligence: it beat the world champion in Go, well above 99.999% of regular players. AlphaStar was a super intelligence: it bested some of the greatest e-sport champion teams at StarCraft. Boston Dynamics e-Atlas was a super intelligence: it performs perfect backflips, while most human brains don't know how to send such sophisticated control signals to their limbs.

A similar statement can be made for AIME, SWE-Bench, and FrontierMath: they are like Go, requiring exceptional domain expertise above 99.99....% of average people. o3 is a super intelligence when operating in these domains.

The key difference is that AlphaGo uses RL to optimize for a simple, almost trivially defined reward function: winning the game gives 1, losing gives 0. Learning reward functions for sophisticated math and software engineering is much harder. o3 made a breakthrough in solving the reward problem for the domains that OpenAI prioritizes. It is no longer an RL specialist for a single-point task, but an RL specialist for a bigger set of useful tasks.

Yet o3's reward engineering could not cover ALL distributions of human cognition. This is why we are still cursed by Moravec's paradox: o3 can wow the Fields Medalists, but still fail to solve some 5-yr-old puzzles like the one below. I am not at all surprised by this cognitive dissonance, just as we wouldn't expect AlphaGo to win poker games.

Huge milestone. Clear roadmap. More to do.
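The reward-function contrast drawn above can be sketched in a few lines: the AlphaGo-style win/loss signal and exact-match math grading are trivially verifiable, while creative or social tasks have no ground truth to check against. All function names below are hypothetical stand-ins, not any lab's actual reward code.

```python
# Verifiable rewards: a terminal game outcome, or an answer checked
# against a known reference. Both are cheap, exact, and gameable only
# by actually solving the task.
def game_reward(won: bool) -> float:
    # AlphaGo-style terminal reward: 1 for a win, 0 for a loss.
    return 1.0 if won else 0.0

def math_reward(model_answer: str, reference: str) -> float:
    # Verifiable-domain reward: exact match against a known answer.
    return 1.0 if model_answer.strip() == reference.strip() else 0.0

def creative_reward(text: str) -> float:
    # Non-verifiable domain: no reference to match, so a learned reward
    # model (stubbed here) would have to supply the score.
    raise NotImplementedError("requires a learned reward model")

print(game_reward(True), math_reward("42", " 42"))  # 1.0 1.0
```

The asymmetry between the first two functions and the third is exactly the point of the tweet above: the hard part is not the RL, it is writing (or learning) `creative_reward`.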
RT @_lewtun: We outperform Llama 70B with Llama 3B on hard math by scaling test-time compute 🔥 How? By combining step-wise reward models w…
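One common way to spend test-time compute, as in the retweeted result, is best-of-N selection: sample several candidate solutions and keep the one a reward model scores highest. The scorer below is a toy stand-in (it just prefers shorter candidates), not the step-wise reward model from the referenced work.

```python
# Best-of-N selection: generate N candidates, score each with a reward
# model, return the argmax. Here the "reward model" is a toy heuristic.
def toy_reward_model(candidate: str) -> float:
    # Hypothetical scorer standing in for a learned (e.g. step-wise)
    # reward model; it simply favors shorter strings.
    return -len(candidate)

def best_of_n(candidates, reward_fn):
    # Pick the highest-scoring candidate among the N samples.
    return max(candidates, key=reward_fn)

candidates = [
    "a long rambling solution",
    "short proof",
    "medium answer here",
]
print(best_of_n(candidates, toy_reward_model))  # short proof
```

Scaling N trades more inference compute for a better chance that at least one sampled candidate is correct, which is how a 3B model can close the gap to a much larger one on hard math.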
@vishaal_urao Interesting work! Unfortunately, I’m not at NeurIPS, but would be happy to chat regardless!
RT @CohereForAI: In our latest work, we ask “Can model merging help with task tradeoffs over models obtained from different training runs”?…
RT @mgalle: WAIT! Don't throw away that checkpoint you don't like. Give it a second life by ♻️ it. Who knows, it might be like our _waste…
Check out our paper here for more results and analysis! Work done with amazing people: @mgalle @yichern_tan @aahmadian_ @tomhosking @ahmetustun89 @tomsherborne @honglaklee @LuWang__
@_jasonwei It boils down to the level of abstraction. Yes, papers are more polished, but they're also written at a higher level of abstraction compared to the other formats you mentioned. Naturally, as you get closer to the details, things become messier.