Taco Cohen
@TacoCohen
Followers: 25K · Following: 6K · Statuses: 1K
Deep learner at FAIR. Into codegen, RL, equivariance, generative models. Spent time at Qualcomm, Scyfer (acquired), UvA, DeepMind, OpenAI.
Joined March 2013
@CohenSamuel13 @andrew_n_carr @AustinZHenley Great to hear that! It took me 8 years to write (2013–2021) 😂😂
Replies: 0 · Retweets: 0 · Likes: 3
RT @lugaricano: A great post summarizing the (so far, quite positive) evidence on the impact of AI on jobs. In contrast to @DAcemogluMIT's…
Replies: 0 · Retweets: 76 · Likes: 0
RT @davidad: Claude Shannon did not *merely* invent relative entropy in the 1940s— by 1951 he got all the way up to the idea of an autoregr…
Replies: 0 · Retweets: 415 · Likes: 0
RT @aramHmarkosyan: We're excited to open-source LeanUniverse! A package that simplifies building consistent #Lean4 training datasets from…
Replies: 0 · Retweets: 52 · Likes: 0
@teortaxesTex I like it. The arrows convey a general sense of upgoingness, which seems like a good thing
Replies: 0 · Retweets: 0 · Likes: 3
RT @wellecks: Excited about our new ICLR workshop on AI + Verification! In the age of increasingly capable models, trusting outputs and g…
Replies: 0 · Retweets: 11 · Likes: 0
Does equivariance matter at scale? ... When the Twitter discourse gets so tiring that you actually go out and collect EVIDENCE :D

There has been a lot of discussion over the years about whether one should build symmetries into the architecture to get better data efficiency, or whether it's better to just do data augmentation and let the model learn the symmetries. In my own experiments (and in other papers that have looked at this), equivariance always outperformed data augmentation by a large margin (on problems with exact symmetries), and data augmentation never managed to learn the symmetries accurately. That is perhaps not surprising: in typical setups the number of epochs is limited, so each data point is only augmented a few times.

Still, many "scale is all you need" folks believe one should prefer data augmentation (or no bias at all) because eventually, with enough compute and data scale, the more general and scalable method will win (The Bitter Lesson). But is data augmentation really more scalable? Scalability means how fast a method improves with data and compute, and for how long it keeps improving. That is exactly what equivariant nets are good at! We use transformers rather than N-grams for language because they are more data efficient, more scalable, and better adapted to that problem domain. Paraphrasing Ilya Sutskever: scale is not all you need; it matters what you scale.

In this latest work we decided to study the scaling behavior of equivariant networks empirically. As Johann explains in the thread below, we confirmed that equivariant networks are more data efficient. Interestingly, we were also able to confirm the intuition that, in principle, the network can learn the symmetry as well: when data augmentation is applied at sufficient scale, you get the same sample-efficiency benefits as equivariance.
HOWEVER: you need a huge number of epochs (which people don't use in practice), making equivariant networks more efficient and scalable in terms of training compute. So equivariant networks let you get the statistical benefits without paying the computational cost.

The takeaway for me is that if you are working on a problem with exact symmetries, and are working on it because it is intrinsically important (climate, materials science / chemistry, molecular biology, etc.) rather than as a stepping stone to a more general problem (where the inductive bias could fail), then equivariant nets are still a good candidate in the age of scaling laws.

Awesome work @johannbrehmer @pimdehaan Sönke Behrends!
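The contrast in the thread above can be sketched in a few lines of NumPy (a toy illustration, not the paper's method; all names here are hypothetical): building a symmetry into a layer by averaging over the group gives *exact* invariance by construction, whereas data augmentation only encourages it statistically over many epochs.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "layer": a random linear map applied to a flattened 8x8 image.
W = rng.standard_normal((16, 64))

def plain_features(img):
    # Ordinary layer with no built-in symmetry; augmentation would have
    # to teach it rotation invariance over many epochs of training.
    return W @ img.ravel()

def c4_invariant_features(img):
    # Build in C4 (90-degree rotation) invariance by group averaging:
    # average the layer's output over all four rotations of the input.
    # Rotating the input just permutes the four terms, so the mean is
    # exactly unchanged -- no training required.
    return np.mean([plain_features(np.rot90(img, k)) for k in range(4)], axis=0)

img = rng.standard_normal((8, 8))
rot = np.rot90(img)

print(np.allclose(plain_features(img), plain_features(rot)))                  # False
print(np.allclose(c4_invariant_features(img), c4_invariant_features(rot)))    # True
```

Note the compute trade-off the tweet describes shows up even here: the group-averaged layer costs |G| = 4 forward passes, but that price is paid per example rather than through the huge number of augmentation epochs needed to approximate the same symmetry.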
Replies: 0 · Retweets: 0 · Likes: 3
RT @sirbayes: I am happy to announce that the first draft of my RL tutorial is now available.
Replies: 0 · Retweets: 760 · Likes: 0
RT @nabeelqu: Things like this detract from the credibility of AI safety work, IMO -- it sounds spicy ("o1 tried to escape!!!") but when yo…
Replies: 0 · Retweets: 91 · Likes: 0
RT @NousResearch: Nous Research announces the pre-training of a 15B parameter language model over the internet, using Nous DisTrO and heter…
Replies: 0 · Retweets: 358 · Likes: 0
RT @benediktstroebl: Can inference scaling make weaker models competitive with stronger ones? Our new paper on the limits of resampling wit…
Replies: 0 · Retweets: 15 · Likes: 0
@bronzeagepapi Trying to learn gauge theory from physicists, I realized they actually don't know what they're talking about. Only when I started reading the work of mathematicians trying to clean up the mess did it start to make some sense
Replies: 1 · Retweets: 0 · Likes: 9