![Alan Jeffares Profile](https://pbs.twimg.com/profile_images/1765021599923671041/exrJpSLU_x96.jpg)
Alan Jeffares
@Jeffaresalan
Followers: 1K · Following: 1K · Statuses: 235
Multiplying matrices @Cambridge_Uni & @MSFTResearch | PhD student in Machine Learning | Previously MSc @ucl & BSc @ucddublin
London, UK
Joined May 2012
There are many things we don’t understand about deep learning. Our new NeurIPS paper (w/ @AliciaCurth) makes the mistake of trying to tackle too many of them 😅 A simplified model of deep learning describes double descent, grokking, gradient boosting & linear mode connectivity🧵
I will be presenting this work today at 16:30 in East Hall, poster 2408! Drop by if you are interested in NTK, double descent, grokking, gradient boosting, or weight averaging 😅
Lucas is an incredible mentor; I cannot recommend applying for this internship highly enough!
Join Microsoft Research's Deep Learning team in Redmond as a Summer 2025 intern! 🎓 Apply at 📍 I'll be at #NeurIPS2024 next week - let's connect and chat! Please help us share this post in your networks : ) #DeepLearning #Internship #MSR
RT @AliciaCurth: Part 2: Why do boosted trees outperform deep learning on tabular data?? @Jeffaresalan & I suspected that answers are obfus…
I’m excited at the prospect of an alternative research social network where my feed won’t be dominated by Musk, bots, and porn. @alanjeffares Bonus: I can finally correct my handle to the right order 😅
@itsstock @AliciaCurth Ah yes, working on the tidy-up now. It will definitely be released before the conference. Apologies for the lag 😅
If you are looking for a tweet thread that is potentially longer than our actual NeurIPS paper (but also probably clearer?), check this out…
From double descent to grokking, deep learning sometimes works in unpredictable ways… or does it? For NeurIPS, @Jeffaresalan & I explored if & how statistics + smart linearisation can help us better understand & predict numerous odd deep learning phenomena, and learned a lot… 🧵 1/n
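For context on the "smart linearisation" mentioned above: this usually refers to a first-order Taylor expansion of the network in its parameters around initialisation, the approximation underlying the neural tangent kernel. The sketch below is my own minimal illustration under that assumption, not code from the paper; the tiny MLP, its sizes, and the helper names are made up for the example.

```python
# Minimal sketch (illustrative only) of linearising a network around its
# initial weights: f_lin(w, x) = f(w0, x) + J_w f(w0, x) @ (w - w0).
import jax
import jax.numpy as jnp

def init_params(key, d_in=10, d_hidden=32):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
        "W2": jax.random.normal(k2, (d_hidden, 1)) / jnp.sqrt(d_hidden),
    }

def mlp(params, x):
    # Simple two-layer network with a ReLU nonlinearity.
    h = jax.nn.relu(x @ params["W1"])
    return (h @ params["W2"]).squeeze(-1)

def linearised_mlp(params, params0, x):
    # First-order Taylor expansion of `mlp` in the parameters around params0:
    # f(params0, x) plus a Jacobian-vector product with (params - params0).
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    f0, jvp = jax.jvp(lambda p: mlp(p, x), (params0,), (delta,))
    return f0 + jvp

key = jax.random.PRNGKey(0)
params0 = init_params(key)
x = jax.random.normal(jax.random.PRNGKey(1), (5, 10))

# At initialisation the linearised model and the full model agree exactly;
# they diverge as training moves the parameters away from params0.
print(jnp.allclose(mlp(params0, x), linearised_mlp(params0, params0, x)))
```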
RT @AliciaCurth: From double descent to grokking, deep learning sometimes works in unpredictable ways.. or does it? For NeurIPS,@Jeffaresa…
@JFPuget @_jason_today @3rp3l This is a great find actually, thank you! It's still quite distinct from the model soups method, but it is the earliest case of merging neural network models that I'm aware of. Also, so cool to think back to the days when a neural network consisted of 4 hidden neurons!
@arishabh8 @AliciaCurth Yes, absolutely. We have some documentation and cleanup to get around to, but it will definitely be released before the conference!
@JFPuget @_jason_today @3rp3l I am obviously not going to be convinced by this. Checkpoint merging is not what model soups does, so this doesn't provide evidence that model souping has been "known for so many years on Kaggle". I am happy to leave the conversation there.
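To make the distinction in this exchange concrete: "model soups" (Wortsman et al., 2022) averages the weights of several models fine-tuned independently from a shared starting point, whereas checkpoint merging averages snapshots from a single training run. The snippet below is only my illustrative sketch of a uniform soup; the toy parameter dicts stand in for real fine-tuned networks.

```python
# Toy illustration (not from the thread) of a uniform model soup:
# average corresponding weight tensors across independently fine-tuned models.
import jax
import jax.numpy as jnp

def uniform_soup(param_dicts):
    # Leaf-wise mean over the parameter pytrees of the individual models.
    return jax.tree_util.tree_map(
        lambda *ws: jnp.mean(jnp.stack(ws), axis=0), *param_dicts
    )

# Three hypothetical fine-tuning runs from a shared initialisation.
runs = [
    {"W": jnp.ones((2, 2)) * i, "b": jnp.zeros(2) + i} for i in (1.0, 2.0, 3.0)
]
soup = uniform_soup(runs)
print(soup["W"])  # each entry is the mean across the three runs (here 2.0)
```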
@JFPuget @_jason_today @3rp3l Not only was it just an off-hand comment in a tweet, but it also didn't even claim to apply the same algorithm as model soups. There is (so far) no evidence of model soups being applied prior to its publication.
@_jason_today @3rp3l @JFPuget Fair enough. I'm just pushing back against the original tweet's unsubstantiated trope that a specific method was already used for years.