![Katie Kang Profile](https://pbs.twimg.com/profile_images/1667638828251889664/4YfynWUz_x96.jpg)
Katie Kang (@katie_kang_) · 2K Followers · 2K Following · 90 Statuses
The DeepSeek R1 recipe seems so simple. I've been wondering what's changed from previous RL+reasoning efforts, and found this thread insightful
With R1, a lot of people have been asking "how come we didn't discover this 2 years ago?" Well... 2 years ago, I spent 6 months working exactly on this (PG / PPO for math+gsm8k), but my results were nowhere near as good. Here's my take on what blocked me and what's changed: 🧵
RT @aviral_kumar2: 🚨 We are organizing an ICLR workshop on self-improving foundation models w/o human supervision at ICLR 2025 in Singapore…
RT @sea_snell: Can we predict emergent capabilities in GPT-N+1 using only GPT-N model checkpoints, which have random performance on the ta…
RT @j_foerst: Learnability for the win! This is one of the lessons that transfers from Curriculum methods in RL directly to LLM training.
@xordrew @setlur_amrith @its_dibya @JacobSteinhardt @svlevine @aviral_kumar2 Potentially because the model has less incentive to change if it has already achieved low loss, though I think there's some prior work that shows it can happen sometimes if you train for a really long time, e.g. in
@BlancheMinerva Omg thank you!! Will definitely reach out if we end up pursuing this direction
RT @aviral_kumar2: Check out @katie_kang_'s work on understanding memorization vs learning in reasoning! By probing LLMs in training, we…
@chaochunh It's an indicator variable in MaskedAcc, so 1 if perp > p (not memorized) and 0 if perp < p (memorized)
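To make the indicator concrete, here is a minimal sketch of how the masking described in this reply could be computed, assuming per-example correctness scores, per-example perplexities, and a threshold p; the function name and array inputs are hypothetical illustrations, not code from the paper.

```python
import numpy as np

def masked_accuracy(correct, perplexity, p):
    """Sketch of a MaskedAcc-style metric (hypothetical implementation).

    The indicator is 1 when an example's perplexity exceeds the threshold p
    (treated as not yet memorized) and 0 otherwise (treated as memorized).
    Accuracy is then averaged only over the not-yet-memorized examples.
    """
    correct = np.asarray(correct, dtype=float)        # per-example 0/1 correctness
    perplexity = np.asarray(perplexity, dtype=float)  # per-example perplexity on the train example
    not_memorized = perplexity > p                    # the indicator variable from the reply above
    if not_memorized.sum() == 0:
        return float("nan")  # everything is below the threshold; nothing left to average
    return float((correct * not_memorized).sum() / not_memorized.sum())
```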
@VarunGodbole @setlur_amrith @its_dibya @JacobSteinhardt @svlevine @aviral_kumar2 With a suboptimal training setup (e.g. poorly chosen hyperparameters), models can sometimes directly memorize examples that they would otherwise have learned more generalizably under a better setup. So there's a tradeoff between direct memorization and generalizable learning before memorization
@VarunGodbole @setlur_amrith @its_dibya @JacobSteinhardt @svlevine @aviral_kumar2 Once models learn to generate diverse + correct CoTs for a training example, they tend to retain the ability to generate robust predictions, regardless of whether they memorize the example later in training
@ahatamiz1 Thanks! We haven't, but it would definitely be interesting to better understand the relationship between architecture and generalization
@chaochunh 1) yes! in calculating pre-mem acc, we mask out accuracy when perplexity is too low 2) it makes the scale of numbers slightly easier to work with
@BlancheMinerva Thanks! It would definitely be interesting to study the learning dynamics of pretraining as well
RT @avisingh599: Exciting couple of days for reasoning research: Procedural Knowledge in Pretraining Drives Reasoning in Large Language Mo…
RT @svlevine: An intriguing new result from @katie_kang_: after training long enough, LLMs will reproduce training examples exactly (not su…