![optionally prohibited oss raid Profile](https://pbs.twimg.com/profile_images/1880854547473580032/Bzs2upnL_x96.jpg)
optionally prohibited oss raid
@nobf16measures
Followers: 1
Following: 399
Statuses: 20
Joined December 2024
@kalomaze i spent too little time thinking about what you did here; i originally thought that the kind of error decrease per layer you've shown could be analogous to the model thinking, but it's not straightforward to mine and measure activations against concrete outputs... that i know of
1
0
2
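One way to make "measure activations against concrete outputs" concrete is a logit-lens-style probe: project each layer's hidden state through the model's own final norm and unembedding, then score it against the tokens the model actually emits. A minimal sketch, assuming a HuggingFace GPT-2-style causal LM; the per-layer cross-entropy here is an illustration of that idea, not the measurement from the original thread.

```python
# Logit-lens sketch: per-layer prediction error against the model's concrete
# next tokens. Assumes a GPT-2-style HF model; an illustration only.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tok = AutoTokenizer.from_pretrained("gpt2")

ids = tok("The capital of France is Paris.", return_tensors="pt").input_ids

with torch.no_grad():
    out = model(ids, output_hidden_states=True)

targets = ids[:, 1:]  # the tokens the model is actually asked to predict
for layer, h in enumerate(out.hidden_states):
    # Project the intermediate residual stream through the final norm + unembedding.
    # (The last entry already has ln_f applied, so it gets normalized twice here --
    # close enough for a rough per-layer curve.)
    logits = model.lm_head(model.transformer.ln_f(h))[:, :-1]
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    print(f"layer {layer:2d}: cross-entropy {ce.item():.3f}")
```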
@main_horse @cloneofsimo i mean by definition it is; a directional overhead is reducing the time to find such reverse-fact data, and the disagreements are better framed as "how to achieve that"
0
0
0
@doomslide or maybe they were on the wrong epistemic branch already. somehow i doubt the whalebros have read pc's iterative distillation
0
0
0
@kellerjordan0 inb4 "what does it converge to"
congratulations to @NousResearch for breaking new ground in shoddy evals. lower training loss during the first 3.3% of training doesn't mean your (gradient-compressing) optimizer is better -- the question is what does it converge to
0
0
2
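The point can be shown with a toy run: the optimizer that is ahead after the first ~3.3% of steps is not necessarily ahead at convergence, so the comparison has to be made at the end. Everything below (model, data, optimizers, step counts) is made up for the sketch; it is not the setup being criticized above.

```python
# Toy comparison: loss at ~3.3% of training vs loss at the end, for two
# optimizers on the same small regression task. Illustration only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(512, 32)
y = torch.randn(512, 1)

def run(opt_name, steps=3000):
    model = torch.nn.Sequential(
        torch.nn.Linear(32, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
    )
    opt = {
        "sgd": torch.optim.SGD(model.parameters(), lr=0.3),
        "adam": torch.optim.Adam(model.parameters(), lr=1e-3),
    }[opt_name]
    early = None
    for step in range(1, steps + 1):
        loss = F.mse_loss(model(X), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step == int(0.033 * steps):  # snapshot at ~3.3% of training
            early = loss.item()
    return early, loss.item()

for name in ("sgd", "adam"):
    early, final = run(name)
    print(f"{name:5s} loss @3.3%: {early:.4f}   loss @end: {final:.4f}")
```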
@Dorialexander what a jumpscare scrolling down here... never would i expect soumith to join the battle 🥲
0
0
0
@_xjdr comparison to gemini deep research? i've only used that one, but i'll take your word for it and switch to the oai one
0
0
0