Taco Cohen

@TacoCohen

Followers: 25K · Following: 6K · Statuses: 1K

Deep learner at FAIR. Into codegen, RL, equivariance, generative models. Spent time at Qualcomm, Scyfer (acquired), UvA, DeepMind, OpenAI.

Joined March 2013
@TacoCohen
Taco Cohen
2 days
@CohenSamuel13 @andrew_n_carr @AustinZHenley Great to hear that! It took me 8 years to write (2013 — 2021) 😂😂
0
0
3
@TacoCohen
Taco Cohen
18 days
TAM may determine what a reasonable level of investment is, but given the investment/compute you have, you just want to build the best model possible. You don’t downscale your compute because of more efficient methods; you just do experiments that were inaccessible with less efficient ones
0
0
3
@TacoCohen
Taco Cohen
23 days
RT @lugaricano: A great post summarizing the (so far, quite positive) evidence on the impact of AI on jobs. In contrast to @DAcemogluMIT's…
0
76
0
@TacoCohen
Taco Cohen
27 days
RT @davidad: Claude Shannon did not *merely* invent relative entropy in the 1940s— by 1951 he got all the way up to the idea of an autoregr…
0
415
0
@TacoCohen
Taco Cohen
29 days
RT @aramHmarkosyan: We're excited to open-source LeanUniverse! A package that simplifies building consistent #Lean4 training datasets from…
0
52
0
@TacoCohen
Taco Cohen
1 month
@teortaxesTex I like it. The arrows convey a general sense of upgoingness, which seems like a good thing
0
0
3
@TacoCohen
Taco Cohen
1 month
RT @wellecks: Excited about our new ICLR workshop on AI + Verification! In the age of increasingly capable models, trusting outputs and g…
0
11
0
@TacoCohen
Taco Cohen
1 month
if you're not reading about the latest breakthroughs in inference-time compute in business insider, you're ngmi
0
0
40
@TacoCohen
Taco Cohen
2 months
@PetarV_93 @johannbrehmer @pimdehaan Thanks Petar, that's great to hear!
0
0
4
@TacoCohen
Taco Cohen
3 months
Does equivariance matter at scale? ... When the twitter discourse gets so tiring that you actually go out and collect EVIDENCE :D

There has been a lot of discussion over the years about whether one should build symmetries into the architecture to get better data efficiency, or whether it's better to just do data augmentation and learn the symmetries. In my own experiments (and in other papers that have looked at this), equivariance always outperformed data augmentation by a large margin (in problems with exact symmetries), and data augmentation never managed to accurately learn the symmetries. That is perhaps not surprising, given that in typical setups the number of epochs is limited, so each data point is only augmented a few times.

Still, many "scale is all you need" folks believe that one should prefer data augmentation (or no bias at all) because eventually, with enough compute / data scale, the more general and scalable method will win (The Bitter Lesson). However, is data augmentation really more scalable? Scalability means how fast a method improves with data and compute scale, and for how long it keeps improving. This is exactly what equivariant nets are good at! We use transformers, not n-grams, for language, because they are more data efficient / scalable / better adapted to that problem domain. Paraphrasing Ilya Sutskever: scale is not all you need; it matters what you scale.

In this latest work we decided to study the scaling behavior of equivariant networks empirically. As Johann explains in the thread below, we confirmed that equivariant networks are more data efficient. Interestingly, we were also able to confirm the intuition that, in principle, the network should be able to learn the symmetry as well: when data augmentation is applied at sufficient scale, you get the same sample efficiency benefits as equivariance. HOWEVER: you need to do a huge number of epochs (which people don't do in practice), making equivariant networks more efficient / scalable in terms of training compute. So equivariant networks allow you to get the statistical benefits without paying the computational cost.

The takeaway for me is that if you are working on a problem with exact symmetries, and are working on it because it is intrinsically important (climate, materials science / chemistry, molecular biology, etc.) rather than as a stepping stone to a more general problem (where the inductive bias could fail), then equivariant nets are still a good candidate in the age of scaling laws.

Awesome work @johannbrehmer @pimdehaan Sönke Behrends!
0
0
3
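[Editor's note: a minimal sketch of the distinction the post above draws, not code from the paper; the C4 group, the 8x8 toy inputs, and all function names are illustrative assumptions. A map made equivariant by construction satisfies f(g·x) = g·f(x) exactly for every input, whereas an unconstrained map only approximates this if data augmentation manages to teach it the symmetry.]

```python
# Hedged sketch (not from the paper): exact equivariance via group averaging
# (symmetrization) over C4 (90-degree rotations), versus an unconstrained map
# that data augmentation would have to train to be (approximately) equivariant.
import numpy as np

rng = np.random.default_rng(0)


def rot90(x, k):
    """Action of the cyclic group C4 on a square 'image': rotate by k * 90 degrees."""
    return np.rot90(x, k)


def symmetrize(f):
    """Return f_eq(x) = (1/4) * sum_k g_k^{-1} f(g_k x), which is C4-equivariant
    for any map f (here f stands in for a single network layer)."""
    def f_eq(x):
        return sum(rot90(f(rot90(x, k)), -k) for k in range(4)) / 4.0
    return f_eq


# An arbitrary (non-equivariant) linear layer on 8x8 inputs.
W = rng.normal(size=(64, 64))


def f(x):
    return (W @ x.ravel()).reshape(8, 8)


f_eq = symmetrize(f)
x = rng.normal(size=(8, 8))

# Built-in equivariance holds exactly, for every input, with no training:
print(np.allclose(f_eq(rot90(x, 1)), rot90(f_eq(x), 1)))  # True
# The unconstrained layer is not equivariant; augmentation would have to learn
# this property from many rotated copies of the data.
print(np.allclose(f(rot90(x, 1)), rot90(f(x), 1)))         # False
```

[Group averaging is only one way to build in equivariance; practical equivariant nets use weight sharing / steerable kernels instead, but the sketch shows why the constraint holds exactly rather than having to be learned.]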
@TacoCohen
Taco Cohen
2 months
@KostasPenn Good times!
0
0
2
@TacoCohen
Taco Cohen
2 months
RT @sirbayes: I am happy to announce that the first draft of my RL tutorial is now available.
0
760
0
@TacoCohen
Taco Cohen
2 months
On my way to NeurIPS! Looking forward to meeting old friends and making new ones. LMK if you're into codegen and RL and want to chat!
4
1
80
@TacoCohen
Taco Cohen
2 months
RT @nabeelqu: Things like this detract from the credibility of AI safety work, IMO -- it sounds spicy ("o1 tried to escape!!!") but when yo…
0
91
0
@TacoCohen
Taco Cohen
2 months
RT @swyx: babe wake up new lilian weng drop
0
15
0
@TacoCohen
Taco Cohen
2 months
RT @NousResearch: Nous Research announces the pre-training of a 15B parameter language model over the internet, using Nous DisTrO and heter…
0
358
0
@TacoCohen
Taco Cohen
2 months
RT @benediktstroebl: Can inference scaling make weaker models competitive with stronger ones? Our new paper on the limits of resampling wit…
0
15
0
@TacoCohen
Taco Cohen
3 months
@bronzeagepapi Trying to learn gauge theory from physicists, I realized they actually don’t know what they’re talking about. Only when I started reading the work of mathematicians trying to clean up the mess did it start to make some sense
1
0
9