Nils Eckstein

@BobQubit

Followers: 351
Following: 3K
Statuses: 404

AI, Art & Immortality. ML @ https://t.co/4uDpBT0kue, HHMI Janelia | Physics @ ETHZ.

Zürich
Joined November 2013
@BobQubit
Nils Eckstein
5 hours
It seems to me that there just isn't a lot of incentive for compression in LLMs (sure, you have some low-norm regularizers, but it's not clear to me how that translates to representation compression / MDL of the content after decoding), which seems required for, or equivalent to, finding novel abstractions. Model scale is actually counterproductive here (assuming the classical transformer architecture). This is also clear from the many papers showing failures/inefficiencies in the algorithms transformers learn from raw data (e.g., the edge-of-chaos paper). Many directions here are under-explored imo.
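A minimal sketch of the distinction above, assuming a PyTorch-style training loss; the function and tensor names are hypothetical, and the activation-sparsity term is only a crude stand-in for representation compression/MDL, not a method described in the tweet:

```python
import torch

def regularized_loss(model, hidden_states, task_loss,
                     weight_decay=1e-2, repr_coef=1e-4):
    # Standard low-norm regularizer: penalizes parameter norms,
    # but says nothing about how compressed the representations are.
    param_penalty = sum(p.pow(2).sum() for p in model.parameters())
    # Hypothetical representation-level penalty: L1 sparsity of hidden
    # activations as a rough proxy for compressing the decoded content.
    repr_penalty = hidden_states.abs().mean()
    return task_loss + weight_decay * param_penalty + repr_coef * repr_penalty
```

The first term is what standard training setups typically include; anything like the second is rarely part of ordinary pretraining objectives.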
@dwarkesh_sp
Dwarkesh Patel
1 year
I still haven't heard a good answer to this question, on or off the podcast. AI researchers often tell me, "Don't worry about it, scale solves this." But what is the rebuttal to someone who argues that this indicates a fundamental limitation?
[image]
@BobQubit
Nils Eckstein
2 days
RT @HHMIJanelia: 📢 Submissions are now open for the CellMap Segmentation Challenge. Build the best method for segmenting cellular organel…
@BobQubit
Nils Eckstein
3 days
RT @JohanWinn: 🚀 We are looking for Image Data Scientists to join our mission to advance connectomics towards whole brain scale for humans…
@BobQubit
Nils Eckstein
4 days
@cppape Good stuff, congrats!
@BobQubit
Nils Eckstein
7 days
If anyone had bothered to read the extensive prior work on unsupervised disentangled representation learning, this crime could have been avoided.
@kzSlider
KZ is in London
8 days
Damn, triple-homicide in one day. SAEs really taking a beating recently
[image]
@BobQubit
Nils Eckstein
11 days
@doomslide Vision still doesn’t actually work & we are lacking strong base models that can do efficient RL for everything but math and coding.
@BobQubit
Nils Eckstein
13 days
@doomslide RL scaling is pure compute though. Not even data-constrained anymore; feels like this isn't factored in properly.
@BobQubit
Nils Eckstein
17 days
RT @Miles_Brundage: Stargate + related efforts could help the US stay ahead of China, but China will still have their own superintelligence…
@BobQubit
Nils Eckstein
17 days
@VictorTaelin In case it helps, my unfiltered thoughts were: amazing stuff, but I can't integrate it into my current research without significant work, so the risk/reward ratio is off. I'd have to choose between replicating R1 and playing around with this. Gotta de-risk it imo, too much going on.
@BobQubit
Nils Eckstein
19 days
RT @Dorialexander: My main takeaway from the DeepSeek paper is not scientific but organizational: we need a European industrial plan in AI…
@BobQubit
Nils Eckstein
1 month
@somewheresy Sounds similar to the general problem of not diverging too far from the base distribution, which post-training (e.g., RLHF) addresses by staying close to the base model. So you could try regularizing against a reference from a base output/model.
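A minimal sketch of that kind of reference regularizer, assuming next-token logits from a tuned model and a frozen base model (PyTorch-style; the names and the coefficient are hypothetical, not taken from the reply):

```python
import torch.nn.functional as F

def kl_to_base(policy_logits, base_logits, beta=0.1):
    # Penalize divergence of the tuned model's next-token distribution
    # from the frozen base model's, as in RLHF-style post-training.
    policy_logp = F.log_softmax(policy_logits, dim=-1)
    base_logp = F.log_softmax(base_logits, dim=-1)
    kl = (policy_logp.exp() * (policy_logp - base_logp)).sum(dim=-1)
    return beta * kl.mean()  # added to the main training loss
```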
@BobQubit
Nils Eckstein
2 months
@JustinLin610 A QwQ paper? 🙏
@BobQubit
Nils Eckstein
2 months
@GarrettPetersen Would recommend at least 1B parameters for realistic wailing.
@BobQubit
Nils Eckstein
2 months
@cloneofsimo Is this really a good way to look at this? Solving a trivial QA in token embedding space also gets you ~zero human accuracy.
@BobQubit
Nils Eckstein
2 months
"as we know it" carries all the weight here.
@_jasonwei
Jason Wei
2 months
Y'all heard it from the man himself
[image]
@BobQubit
Nils Eckstein
2 months
@far__el Some don’t, e.g. those who recycle this take in perpetuity.
@BobQubit
Nils Eckstein
2 months
@CRSegerie WDYM? Have you used Llama 3? There is no shot that model can do any of the things you are afraid of. This kind of fear-mongering with zero technical backup is pretty destructive, not good.
@BobQubit
Nils Eckstein
2 months
@signulll The inverse may actually be true. Long-term strategic thinking and managing large groups of agents that have short-term planning capabilities is plausibly the highest-impact position in the new world.