![Nils Eckstein Profile](https://pbs.twimg.com/profile_images/1831957462297067520/ZbEFIY3z_x96.jpg)
Nils Eckstein (@BobQubit)
Followers: 351 · Following: 3K · Statuses: 404
AI, Art & Immortality. ML @ https://t.co/4uDpBT0kue, HHMI Janelia | Physics @ ETHZ.
Zürich · Joined November 2013
It seems to me that there just isn’t much incentive for compression in LLMs (sure, you have some low-norm regularizers, but it’s not clear to me how this translates to representation compression/MDL of the content after decoding), which seems required for, or equivalent to, finding novel abstractions. Model scale is actually counterproductive here (assuming the classical transformer architecture). This is also clear from the many papers showing failures/inefficiencies in the algorithms transformers learn from raw data (e.g., the edge-of-chaos paper). Many directions here are under-explored imo.
I still haven't heard a good answer to this question, on or off the podcast. AI researchers often tell me, "Don't worry about it, scale solves this." But what is the rebuttal to someone who argues that this indicates a fundamental limitation?
0 · 0 · 0
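The compression/MDL framing in the tweet above can be made concrete with a toy two-part code: total description length = bits to describe the model plus bits to encode the data under that model. A model that captures a regularity (an "abstraction") pays some model bits but saves many data bits. This is a minimal sketch under illustrative assumptions (a Bernoulli model and made-up bit costs, not anything from the thread):

```python
import math

def two_part_mdl(data, p, model_bits):
    """Two-part code length: L(model) + L(data | model), in bits.

    data       -- sequence of 0/1 symbols
    p          -- model's probability of symbol 1 (the "learned abstraction")
    model_bits -- assumed cost of describing the model itself
    """
    nll_bits = 0.0
    for x in data:
        q = p if x == 1 else 1.0 - p
        nll_bits += -math.log2(q)  # Shannon code length of this symbol
    return model_bits + nll_bits

data = [1] * 90 + [0] * 10

# A model that captures the 90/10 regularity compresses the data,
# even after paying (hypothetical) 16 bits to describe itself...
good = two_part_mdl(data, p=0.9, model_bits=16)

# ...while a uniform model spends a full bit per symbol.
uniform = two_part_mdl(data, p=0.5, model_bits=1)

print(good, uniform)
```

The point of the sketch is only the trade-off: weight-norm penalties act on parameters, whereas MDL-style compression is scored on how cheaply the model's representation encodes the content.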
RT @HHMIJanelia: 📢 Submissions are now open for the CellMap Segmentation Challenge. Build the best method for segmenting cellular organel…
0 · 50 · 0
RT @JohanWinn: 🚀 We are looking for Image Data Scientists to join our mission to advance connectomics towards whole brain scale for humans…
0 · 35 · 0
@doomslide Vision still doesn’t actually work & we are lacking strong base models that can do efficient RL for everything but math and coding.
0 · 0 · 1
@doomslide RL scaling is pure compute though. Not even data constrained anymore, feels like this isn’t factored in properly.
0 · 0 · 2
RT @Miles_Brundage: Stargate + related efforts could help the US stay ahead of China, but China will still have their own superintelligence…
0 · 37 · 0
@VictorTaelin In case it helps, my unfiltered thoughts were: amazing stuff, but I can’t integrate it into my current research without significant work, so the risk/reward ratio is off. I have to choose between replicating r1 and playing around with this. Gotta de-risk it imo, too much going on.
0 · 0 · 0
RT @Dorialexander: My main takeaway from the Deepseek paper is not scientific but organizational: we need a European industrial plan in AI…
0 · 51 · 0
@somewheresy Sounds similar to the general problem of not deviating too far from the base distribution, which post-training (e.g. RLHF) addresses by penalizing divergence from it. So you could try regularizing against a reference output/model from the base.
0 · 0 · 0
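The "stay close to the base distribution" idea in the reply above is usually implemented as a KL penalty, as in the standard RLHF objective r − β·KL(π ‖ π_ref). A minimal sketch for discrete distributions (the distributions, reward, and β value below are made up for illustration):

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def regularized_objective(reward, policy, reference, beta=0.1):
    """Reward minus a KL penalty that keeps `policy` near `reference`,
    the same shape as the RLHF objective r - beta * KL(pi || pi_ref)."""
    return reward - beta * kl_divergence(policy, reference)

base = [0.25, 0.25, 0.25, 0.25]  # reference / base-model output distribution
near = [0.30, 0.25, 0.25, 0.20]  # small deviation -> small penalty
far  = [0.90, 0.05, 0.03, 0.02]  # large deviation -> large penalty

print(regularized_objective(1.0, near, base))
print(regularized_objective(1.0, far, base))
```

With equal reward, the policy that stays closer to the reference scores higher; β trades off reward against drift from the base.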
@cloneofsimo Is this really a good way to look at this? Solving a trivial QA task in token-embedding space also gets you ~zero human accuracy.
0 · 0 · 1
@CRSegerie WDYM? Have you used Llama 3? There is no shot that model can do any of the things you are afraid of. This kind of fear-mongering with zero technical backup is pretty destructive, not good.
0 · 0 · 1