![Ido Ben-Shaul Profile](https://pbs.twimg.com/profile_images/1667832184131006465/WsQhbXHw_x96.jpg)
Ido Ben-Shaul
@ml_norms
Followers: 1K · Following: 6K · Statuses: 3K
Ido, 28, everything I find interesting in Math/ML/AI. AA-I Technologies 🦾 (Hit me up!) PhD in Appl Math @TelAvivUni.
Joined September 2018
@ylecun has been a hero of mine for more than a decade. His work introduced me to so many fields: ConvNets, EBMs, SSL, World Models, and many more. It's an honor to present our paper at #NeurIPS2023, with the best researchers in the world. I'm so thankful and inspired🙏
4 · 0 · 31
RT @imtiazprio: Indeed, it is that simple! The wiggliness induced by each layer allows NNs to approximate non-linear functions. More layers…
0 · 36 · 0
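The claim in this retweet, that stacking ReLU layers gives the network more linear pieces to bend around a smooth target, is easy to check numerically. Below is a minimal sketch (my own illustration, not code from the thread): a small ReLU MLP is fit to sin(x) at several depths, and the fit typically tightens as depth grows. `make_mlp` and the hyperparameters are arbitrary choices for the demo.

```python
# Minimal sketch (my illustration, not code from the thread): a ReLU MLP is
# piecewise linear, and stacking layers multiplies the number of linear
# pieces, which is what lets it bend around a smooth target like sin(x).
import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.linspace(-3.14, 3.14, 256).unsqueeze(1)  # inputs on roughly [-pi, pi]
y = torch.sin(x)                                    # smooth, non-linear target

def make_mlp(depth: int, width: int = 32) -> nn.Sequential:
    """Stack `depth` hidden ReLU layers; each layer adds new linear regions."""
    layers, d_in = [], 1
    for _ in range(depth):
        layers += [nn.Linear(d_in, width), nn.ReLU()]
        d_in = width
    layers.append(nn.Linear(d_in, 1))
    return nn.Sequential(*layers)

for depth in (1, 2, 4):
    model = make_mlp(depth)
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(2000):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    print(f"depth={depth}  final MSE={loss.item():.5f}")
```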
RT @DimitrisPapail: o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving and easy-to-hard generaliz…
0 · 61 · 0
RT @DimitrisPapail: o3 can't multiply 10 digit numbers, but here is the acc of a 14m transformer that teaches itself how to do it, with ite…
0 · 62 · 0
RT @natolambert: Costa's just trying to make GRPO go brrr with no bugs and we're ending up with way better performance than the Tülu models…
0 · 17 · 0
RT @JeffDean: I'm delighted to have joined my good friend and colleague @NoamShazeer for a 2+hour conversation with @dwarkesh_sp about a wi…
0 · 186 · 0
RT @randall_balestr: Given a pretrained model, spline theory tells you how to alter its curvature by changing a single interpretable parame…
0 · 42 · 0
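The thread above is truncated, so the exact mechanism isn't quoted here. One concrete way a single scalar can control a pretrained network's curvature, in the spirit of the spline view, is to swap every ReLU for a smooth surrogate such as softplus with a sharpness parameter beta (ReLU is recovered as beta grows). The sketch below is only that assumption, not the paper's method; `smooth_relus` is a hypothetical helper.

```python
# Hedged sketch (the mechanism is assumed, not quoted from the thread):
# one scalar, beta, controls how sharply every activation bends by replacing
# each nn.ReLU in a pretrained model with nn.Softplus(beta).
import torch
import torch.nn as nn

def smooth_relus(model: nn.Module, beta: float) -> None:
    """Replace every nn.ReLU in `model` (recursively) with nn.Softplus(beta)."""
    for name, child in model.named_children():
        if isinstance(child, nn.ReLU):
            setattr(model, name, nn.Softplus(beta=beta))
        else:
            smooth_relus(child, beta)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
smooth_relus(model, beta=2.0)  # smaller beta -> softer kinks in the mapping
print(model)
```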
RT @Yuchenj_UW: This is wild - UC Berkeley shows that a tiny 1.5B model beats o1-preview on math by RL! They applied simple RL to Deepseek…
0 · 365 · 0
RT @TaubenfeldAmir: New Preprint 🎉 LLM self-assessment unlocks efficient decoding ✅ Our Confidence-Informed Self-Consistency (CISC) metho…
0 · 19 · 0
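From the (truncated) announcement, CISC reads as a weighted variant of self-consistency: sample several answers, ask the model for a confidence score on each, and let that score weight the vote, so fewer samples are needed for the same accuracy. The sketch below reflects that reading only, not the paper's implementation; `sample_answer_with_confidence` is a hypothetical stand-in for one LLM call.

```python
# Sketch of confidence-weighted self-consistency under my reading of the
# announcement; the sampler is a hypothetical stand-in for one LLM call
# returning (answer, self-reported confidence).
import random
from collections import defaultdict
from typing import Callable, Tuple

def cisc_vote(sample: Callable[[], Tuple[str, float]], n_samples: int = 8) -> str:
    """Confidence-weighted majority vote over n sampled answers."""
    scores = defaultdict(float)
    for _ in range(n_samples):
        answer, confidence = sample()
        scores[answer] += confidence  # plain self-consistency would add 1.0 here
    return max(scores, key=scores.get)

# Toy usage with a fake sampler standing in for the model.
fake_sampler = lambda: random.choice([("42", 0.9), ("42", 0.8), ("41", 0.3)])
print(cisc_vote(fake_sampler))
```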
RT @pranavn1008: Announcing Matryoshka Quantization! A single Transformer can now be served at any integer precision!! In addition, our (sl…
0 · 82 · 0
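My reading of the "Matryoshka" idea from this announcement: keep one high-precision integer copy of the weights and obtain lower precisions by keeping only the most significant bits, so a single checkpoint can serve int8, int4, and int2. The sketch below only demonstrates that nested bit structure on a random matrix; how the model is actually trained or co-optimized across precisions is in the paper, and `msb_slice` is an illustrative helper, not their API.

```python
# Sketch of the nested ("Matryoshka") bit structure only, under my reading of
# the announcement; not the paper's method or API.
import torch

def msb_slice(q8: torch.Tensor, bits: int) -> torch.Tensor:
    """Keep the top `bits` most-significant bits of unsigned 8-bit codes."""
    return (q8.to(torch.int32) >> (8 - bits)).to(torch.uint8)

w = torch.randn(4, 4)

# Plain uniform quantization of w to unsigned 8-bit codes.
lo, scale = w.min(), (w.max() - w.min()) / 255.0
q8 = torch.clamp(torch.round((w - lo) / scale), 0, 255).to(torch.uint8)

for bits in (8, 4, 2):
    q = msb_slice(q8, bits)
    # A code at `bits` precision covers 2**(8 - bits) of the original 8-bit steps.
    w_hat = q.to(torch.float32) * scale * (2 ** (8 - bits)) + lo
    print(f"{bits}-bit reconstruction error: {(w - w_hat).abs().mean().item():.4f}")
```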
RT @iScienceLuvr: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach. We study a novel language model architect…
0 · 180 · 0
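As far as the truncated abstract goes, the recurrent-depth idea is to scale test-time compute by re-applying a weight-tied block to the latent state more times, rather than by emitting more tokens. Below is a schematic sketch under that assumption; `RecurrentDepthLM`, the block choice, and the sizes are placeholders, not the paper's architecture.

```python
# Schematic sketch of recurrent depth (names and sizes are placeholders, not
# the paper's code): the same weight-tied block is applied to the latent
# state k times, so test-time compute scales with k, not with token count.
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, d_model: int = 64, vocab: int = 1000):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.core = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens: torch.Tensor, num_iters: int) -> torch.Tensor:
        h = self.embed(tokens)
        for _ in range(num_iters):  # more iterations = more "latent reasoning"
            h = self.core(h)        # weight-tied: the same block every step
        return self.head(h)

model = RecurrentDepthLM()
tokens = torch.randint(0, 1000, (1, 16))
for k in (1, 4, 16):
    print(k, model(tokens, num_iters=k).shape)
```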
RT @feeelix_feng: You think on-policy sampling gives the best reward models? Think again! 🔥 Our finding: Even with on-policy data, reward m…
0 · 39 · 0
RT @SFResearch: ⚡ Meet BOLT: A novel approach to develop long chain-of-thought reasoning in LLMs without relying on knowledge distillation…
0 · 30 · 0
RT @xiangyue96: Demystifying Long CoT Reasoning in LLMs. Reasoning models like R1 / O1 / O3 have gained massive atte…
0 · 192 · 0
RT @BachFrancis: An inspirational talk by Michael Jordan: a refreshing, deep, and forward-looking vision for AI beyond LLMs.
0 · 48 · 0