![Alan Jeffares Profile](https://pbs.twimg.com/profile_images/1765021599923671041/exrJpSLU_x96.jpg)
Alan Jeffares
@Jeffaresalan
Followers: 1K · Following: 1K · Statuses: 235
Multiplying matrices @Cambridge_Uni & @MSFTResearch | PhD student in Machine Learning | Previously MSc @ucl & BSc @ucddublin
London, UK
Joined May 2012
There are many things we don’t understand about deep learning. Our new NeurIPS paper (w/ @AliciaCurth) makes the mistake of trying to tackle too many of them 😅 A simplified model of deep learning describes double descent, grokking, gradient boosting & linear mode connectivity🧵
I will be presenting this work today at 16:30 in East Hall, poster 2408! Drop by if you are interested in NTK, double descent, grokking, gradient boosting, or weight averaging 😅
Lucas is an incredible mentor; I cannot recommend applying for this internship highly enough!
Join Microsoft Research's Deep Learning team in Redmond as a Summer 2025 intern! 🎓 Apply at 📍 I'll be at #NeurIPS2024 next week - let's connect and chat! Please help us share this post in your networks : ) #DeepLearning #Internship #MSR
RT @AliciaCurth: Part 2: Why do boosted trees outperform deep learning on tabular data?? @Jeffaresalan & I suspected that answers are obfus…
I’m excited at the prospect of an alternative research social network where my feed won’t be dominated by Musk, bots, and porn. @alanjeffares Bonus: I can finally correct my handle to the right order 😅
@itsstock @AliciaCurth Ah yes, working on the tidy-up now. It will definitely be released before the conference. Apologies for the lag 😅
If you are looking for a tweet thread that is potentially longer than our actual NeurIPS paper (but also probably clearer?), check this out…
From double descent to grokking, deep learning sometimes works in unpredictable ways… or does it? For NeurIPS, @Jeffaresalan & I explored if & how statistics + smart linearisation can help us better understand & predict numerous odd deep learning phenomena, and learned a lot… 🧵 1/n
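For context on the "smart linearisation" mentioned above: this usually refers to a first-order Taylor expansion of the network in its parameters around initialisation, the approximation underlying the neural tangent kernel. The sketch below is my own minimal illustration under that assumption, not code from the paper; the tiny MLP, its sizes, and the helper names are made up for the example.

```python
# Minimal sketch (illustrative only) of linearising a network around its
# initial weights: f_lin(w, x) = f(w0, x) + J_w f(w0, x) @ (w - w0).
import jax
import jax.numpy as jnp

def init_params(key, d_in=10, d_hidden=32):
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
        "W2": jax.random.normal(k2, (d_hidden, 1)) / jnp.sqrt(d_hidden),
    }

def mlp(params, x):
    # Simple two-layer network with a ReLU nonlinearity.
    h = jax.nn.relu(x @ params["W1"])
    return (h @ params["W2"]).squeeze(-1)

def linearised_mlp(params, params0, x):
    # First-order Taylor expansion of `mlp` in the parameters around params0:
    # f(params0, x) plus a Jacobian-vector product with (params - params0).
    delta = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    f0, jvp = jax.jvp(lambda p: mlp(p, x), (params0,), (delta,))
    return f0 + jvp

key = jax.random.PRNGKey(0)
params0 = init_params(key)
x = jax.random.normal(jax.random.PRNGKey(1), (5, 10))

# At initialisation the linearised model and the full model agree exactly;
# they diverge as training moves the parameters away from params0.
print(jnp.allclose(mlp(params0, x), linearised_mlp(params0, params0, x)))
```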
RT @AliciaCurth: From double descent to grokking, deep learning sometimes works in unpredictable ways.. or does it? For NeurIPS,@Jeffaresa…
@JFPuget @_jason_today @3rp3l This is a great find actually, thank you! It's still quite distinct from the model soups method, but it is the earliest case of merging neural network models that I'm aware of. Also, so cool to think back to the days when a neural network consisted of 4 hidden neurons!
@arishabh8 @AliciaCurth Yes, absolutely. We have some documentation and cleanup to get around to, but it will definitely be released before the conference!
@JFPuget @_jason_today @3rp3l I am obviously not going to be convinced by this. Checkpoint merging is not what model soups does, so this doesn't provide evidence that model souping has been "known for so many years on Kaggle". I am happy to leave the conversation there.
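To make the distinction in this exchange concrete: "model soups" (Wortsman et al., 2022) averages the weights of several models fine-tuned independently from a shared starting point, whereas checkpoint merging averages snapshots from a single training run. The snippet below is only my illustrative sketch of a uniform soup; the toy parameter dicts stand in for real fine-tuned networks.

```python
# Toy illustration (not from the thread) of a uniform model soup:
# average corresponding weight tensors across independently fine-tuned models.
import jax
import jax.numpy as jnp

def uniform_soup(param_dicts):
    # Leaf-wise mean over the parameter pytrees of the individual models.
    return jax.tree_util.tree_map(
        lambda *ws: jnp.mean(jnp.stack(ws), axis=0), *param_dicts
    )

# Three hypothetical fine-tuning runs from a shared initialisation.
runs = [
    {"W": jnp.ones((2, 2)) * i, "b": jnp.zeros(2) + i} for i in (1.0, 2.0, 3.0)
]
soup = uniform_soup(runs)
print(soup["W"])  # each entry is the mean across the three runs (here 2.0)
```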
@JFPuget @_jason_today @3rp3l Not only was it just an off-hand comment in a tweet, but it also didn't even claim to apply the same algorithm as model soups. There is (so far) no evidence of model soups being applied prior to its publication.
@_jason_today @3rp3l @JFPuget Fair enough. I'm just pushing back against the original tweet's unsubstantiated trope that a specific method was already used for years.