![aliastasis Profile](https://pbs.twimg.com/profile_images/1562826732708810752/uHhwREv3_x96.jpg)
aliastasis
@aliastasis
Followers
21
Following
133
Statuses
68
Working on LLMs/VLMs in my spare time. Making money with climate tech. Currently getting a Master's in CS @ LMU
München, Bayern
Joined January 2018
@jonasgeiping @Teknium1 @flowersslop @tomgoldsteincs But it's not the same memory cost, or is it? 32 recurrence steps don't increase the memory requirement, if I understood it correctly, but 32 CoT tokens do increase it.
1
0
0
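A rough way to see the difference (a back-of-the-envelope sketch; the layer/head counts, hidden size, and dtype below are illustrative assumptions, not numbers from the paper): every extra CoT token adds key/value entries to the per-layer KV cache, while a latent recurrence step reuses the same hidden state, so its footprint stays roughly constant.

```python
# Illustrative KV-cache arithmetic; all model dimensions are assumptions, not from the paper.
layers, heads, head_dim = 32, 32, 128
bytes_per_value = 2  # fp16/bf16

def kv_cache_bytes(seq_len: int) -> int:
    # keys + values for every layer and head, one vector per cached token
    return 2 * layers * heads * head_dim * bytes_per_value * seq_len

base = kv_cache_bytes(50)            # current sequence of 50 tokens
with_cot = kv_cache_bytes(50 + 32)   # 32 extra CoT tokens extend the cache
with_recurrence = base               # 32 recurrence steps reuse the same latent state

print(f"base cache:      {base / 2**20:.1f} MiB")
print(f"+32 CoT tokens:  {with_cot / 2**20:.1f} MiB")
print(f"+32 recurrences: {with_recurrence / 2**20:.1f} MiB (unchanged)")
```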
I don't think so (but I only skimmed the paper); it looks like they pass the "latent state" multiple times through the recurrent part. So when the current sequence has 50 tokens, they pass the latent state back as input and the sequence length stays the same, so to speak (but correct me if I'm wrong), until they generate the next token after the recurrence has finished. To me this seems more like an improvement of the Transformer/GPT architecture than "test-time scaling", as I don't think it will scale the same way as R1/Ox test-time scaling (in the figure, the accuracy increase flattens out). But one could combine them, I guess.
0
0
4
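To make the "pass the latent state back through the recurrent part" idea concrete, here is a minimal sketch of depth recurrence under my own assumptions (module choices, shapes, and names are illustrative, not the paper's actual architecture):

```python
import torch
import torch.nn as nn

# Minimal sketch of latent-state recurrence; shapes and modules are assumed for illustration.
d_model, n_steps = 512, 32

prelude = nn.Linear(d_model, d_model)        # stand-in for the input/embedding stack
recurrent_block = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
head = nn.Linear(d_model, 50_000)            # stand-in for the LM head / vocab projection

x = torch.randn(1, 50, d_model)              # current sequence of 50 token embeddings

# Depth recurrence: the latent state is fed back through the same block n_steps times.
# The sequence length never grows, so the per-token memory footprint stays fixed.
h = prelude(x)
for _ in range(n_steps):
    h = recurrent_block(h)

# Only after the recurrence has finished is the next token decoded.
next_token_logits = head(h[:, -1])
```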
Probably because it's heavily trained on problems to solve, so the "mode" is that it always assumes you will provide an actual task/question when your prefix is one that indicates you are going to state the task afterwards (so "I have a question for you" indicates/implies that a question will follow, etc.).
0
0
2
@kimmonismus So it's basically not important whether these models ever really reach the generalisation power of humans, as we can just generate enough densely sampled tasks to cover the "software engineering" distribution.
0
0
2
@kimmonismus Also here, as Yoshua Bengio stated recently: when a company reaches AGI, it probably won't release it or talk about it. It will use it to build companies that compete with the rest of the world…
0
0
0
@kimmonismus I mean, the thing with infrastructure is that it costs enormous amounts of money. And the ones who get the money are the ones who have proven that they can build SOTA models with it. I think we need the same here, just to prove that the investment is worth it here in Germany/the EU.
0
0
1
And yes, I know 150 H100/A100 (it's actually 120 H100 and 20 A100, so 140 in total, not 150; I looked it up again just now) are not that much, but I guess one first needs to "prove" (to get large amounts of funding) that a team is able to eventually compete with OAI, and for that it could help (by first training smaller models as a POC).
0
0
0
@rasbt But yeah, it would be interesting to see how alternatives to these "dominant" tokens would perform.
0
0
0
@kimmonismus I think the idea of somehow coming together as a community is great. I miss that here too and would be in!
1
0
4
Data protection aspects are, I think, also a big point (if not mentioned already), or rather the legal uncertainty regarding training data and also synthetic data from pretrained LLMs (what if these were trained, among other things, on data that one would not have been allowed to use so easily under German law -> is one then also not allowed to generate synthetic data with these models and use it commercially?). But that probably falls under the bureaucracy category anyway.
0
0
3