Taco Cohen
@TacoCohen
Followers: 25K · Following: 6K · Statuses: 1K
Deep learner at FAIR. Into codegen, RL, equivariance, generative models. Spent time at Qualcomm, Scyfer (acquired), UvA, DeepMind, OpenAI.
Joined March 2013
@CohenSamuel13 @andrew_n_carr @AustinZHenley Great to hear that! It took me 8 years to write (2013–2021) 😂😂
Replies: 0 · Retweets: 0 · Likes: 3
RT @lugaricano: A great post summarizing the (so far, quite positive) evidence on the impact of AI on jobs. In contrast to @DAcemogluMIT's…
Replies: 0 · Retweets: 76 · Likes: 0
RT @davidad: Claude Shannon did not *merely* invent relative entropy in the 1940s— by 1951 he got all the way up to the idea of an autoregr…
Replies: 0 · Retweets: 415 · Likes: 0
RT @aramHmarkosyan: We're excited to open-source LeanUniverse! A package that simplifies building consistent #Lean4 training datasets from…
Replies: 0 · Retweets: 52 · Likes: 0
@teortaxesTex I like it. The arrows convey a general sense of upgoingness, which seems like a good thing
Replies: 0 · Retweets: 0 · Likes: 3
RT @wellecks: Excited about our new ICLR workshop on AI + Verification! In the age of increasingly capable models, trusting outputs and g…
Replies: 0 · Retweets: 11 · Likes: 0
Does equivariance matter at scale? ... When the Twitter discourse gets so tiring that you actually go out and collect EVIDENCE :D

There has been a lot of discussion over the years about whether one should build symmetries into the architecture to get better data efficiency, or whether it's better to just do data augmentation and let the model learn the symmetries. In my own experiments (and in other papers that have looked at this), equivariance always outperformed data augmentation by a large margin (on problems with exact symmetries), and data augmentation never managed to learn the symmetries accurately. That is perhaps not surprising: in typical setups the number of epochs is limited, so each data point is only augmented a few times.

Still, many "scale is all you need" folks believe one should prefer data augmentation (or no bias at all) because eventually, with enough compute and data scale, the more general and scalable method will win (The Bitter Lesson). But is data augmentation really more scalable? Scalability means how fast a method improves with data and compute, and for how long it keeps improving. That is exactly what equivariant nets are good at! We use transformers rather than N-grams for language because they are more data efficient, more scalable, and better adapted to that problem domain. Paraphrasing Ilya Sutskever: scale is not all you need; it matters what you scale.

In this latest work we decided to study the scaling behavior of equivariant networks empirically. As Johann explains in the thread below, we confirmed that equivariant networks are more data efficient. Interestingly, we were also able to confirm the intuition that, in principle, the network can learn the symmetry as well: when data augmentation is applied at sufficient scale, you get the same sample-efficiency benefits as equivariance.
HOWEVER: you need a huge number of epochs (which people don't use in practice), making equivariant networks more efficient and scalable in terms of training compute. So equivariant networks let you get the statistical benefits without paying the computational cost.

The takeaway for me is that if you are working on a problem with exact symmetries, and are working on it because it is intrinsically important (climate, materials science / chemistry, molecular biology, etc.) rather than as a stepping stone to a more general problem (where the inductive bias could fail), then equivariant nets are still a good candidate in the age of scaling laws.

Awesome work @johannbrehmer @pimdehaan Sönke Behrends!
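The contrast in the thread above can be sketched in a few lines of NumPy (a toy illustration, not the paper's method; all names here are hypothetical): building a symmetry into a layer by averaging over the group gives *exact* invariance by construction, whereas data augmentation only encourages it statistically over many epochs.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "layer": a random linear map applied to a flattened 8x8 image.
W = rng.standard_normal((16, 64))

def plain_features(img):
    # Ordinary layer with no built-in symmetry; augmentation would have
    # to teach it rotation invariance over many epochs of training.
    return W @ img.ravel()

def c4_invariant_features(img):
    # Build in C4 (90-degree rotation) invariance by group averaging:
    # average the layer's output over all four rotations of the input.
    # Rotating the input just permutes the four terms, so the mean is
    # exactly unchanged -- no training required.
    return np.mean([plain_features(np.rot90(img, k)) for k in range(4)], axis=0)

img = rng.standard_normal((8, 8))
rot = np.rot90(img)

print(np.allclose(plain_features(img), plain_features(rot)))                  # False
print(np.allclose(c4_invariant_features(img), c4_invariant_features(rot)))    # True
```

Note the compute trade-off the tweet describes shows up even here: the group-averaged layer costs |G| = 4 forward passes, but that price is paid per example rather than through the huge number of augmentation epochs needed to approximate the same symmetry.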
Replies: 0 · Retweets: 0 · Likes: 3
RT @sirbayes: I am happy to announce that the first draft of my RL tutorial is now available.
Replies: 0 · Retweets: 760 · Likes: 0
RT @nabeelqu: Things like this detract from the credibility of AI safety work, IMO -- it sounds spicy ("o1 tried to escape!!!") but when yo…
Replies: 0 · Retweets: 91 · Likes: 0
RT @NousResearch: Nous Research announces the pre-training of a 15B parameter language model over the internet, using Nous DisTrO and heter…
Replies: 0 · Retweets: 358 · Likes: 0
RT @benediktstroebl: Can inference scaling make weaker models competitive with stronger ones? Our new paper on the limits of resampling wit…
Replies: 0 · Retweets: 15 · Likes: 0
@bronzeagepapi Trying to learn gauge theory from physicists, I realized they actually don't know what they're talking about. Only when I started reading the work of mathematicians trying to clean up the mess did it start to make some sense
Replies: 1 · Retweets: 0 · Likes: 9