Alicia Curth

@AliciaCurth

Followers: 4,139 · Following: 518 · Media: 92 · Statuses: 273

Senior Researcher Machine Learning @MSFTResearch, Statistician at ❤️. In search of statistical intuition for modern ML & simple explanations for complex things 👀

Joined September 2020
Pinned Tweet
@AliciaCurth
Alicia Curth
8 months
Why do Random Forests perform so well off-the-shelf & appear essentially immune to overfitting?!? I’ve found the text-book answer “it’s just variance reduction 🤷🏼‍♀️” to be a bit too unspecific, so in our new pre-print , @Jeffaresalan & I investigate..🕵🏼‍♀️ 1/n
Tweet media one
15
225
1K
@AliciaCurth
Alicia Curth
22 days
When Double Descent & Benign Overfitting became a thing, I was a master's student in statistics — and so confused. I couldn't reconcile what I had literally just learned about bias-variance & co with modern ML. Here's what I wish someone had told me then: 1/n
Tweet media one
18
183
1K
@AliciaCurth
Alicia Curth
9 months
I spent the first 2.5 years of my PhD on the question “What makes individualised treatment effect estimation an interesting Machine Learning problem (and how do we best solve it)?”. Super excited that a review of lots of things we learned along the way was accepted into… 1/8
Tweet media one
10
112
733
@AliciaCurth
Alicia Curth
1 year
Super excited to finally share @Jeffaresalan & my #NeurIPS2023 Oral: 🥳 — a slightly unconventional paper leading to a surprising and (shockingly) simple resolution of the tension between statistical intuition & double descent! 1/3
Tweet media one
12
116
620
@AliciaCurth
Alicia Curth
8 months
Part 2: So why DO Random Forests work?! On this, I’ll have to disagree with Elements of Statistical Learning (my first time ever 💔) EoSL says the success of forests should be understood as a consequence of variance reduction *alone*, but I think that’s not a good intuition 1/n
Tweet media one
@AliciaCurth
Alicia Curth
8 months
Why do Random Forests perform so well off-the-shelf & appear essentially immune to overfitting?!? I’ve found the text-book answer “it’s just variance reduction 🤷🏼‍♀️” to be a bit too unspecific, so in our new pre-print , @Jeffaresalan & I investigate..🕵🏼‍♀️ 1/n
Tweet media one
15
225
1K
6
77
540
@AliciaCurth
Alicia Curth
1 year
Getting into conference mode like…. 🌺 #ICML2023
Tweet media one
4
7
222
@AliciaCurth
Alicia Curth
10 months
oh and P.S. I don’t think we will ever be able to top the level of creativity it took to come up with our 3D poster. Most successful arts&crafts project @Jeffaresalan or I have ever been involved in 🙏🏻
@AliciaCurth
Alicia Curth
10 months
Having started my PhD in the gathertown era, it’s bittersweet to realise that the most rewarding PhD moments happen in-person at conferences. I’ve had an incredible week putting faces to names & it’s been a surreal experience presenting our work to so many of them. I❤️NeurIPS!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
3
158
2
12
174
@AliciaCurth
Alicia Curth
10 months
Having started my PhD in the gathertown era, it’s bittersweet to realise that the most rewarding PhD moments happen in-person at conferences. I’ve had an incredible week putting faces to names & it’s been a surreal experience presenting our work to so many of them. I❤️NeurIPS!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
3
158
@AliciaCurth
Alicia Curth
1 year
Every StatML intro class covers complexity-error U-curves, so @Jeffaresalan & I asked ourselves whether the info from these classes is enough to explain double descent too? Our #NeurIPS23 paper does a roundtrip of The Elements of Statistical Learning and answers “Yes”! Long🧵1/n
Tweet media one
4
23
156
@AliciaCurth
Alicia Curth
2 years
I complain a lot about the general quality of ML reviews (offline), so I try to do better when I review myself. I was a reviewer for two conferences during 2022 (ICML22, AISTATS23), and now received a top reviewer award for both 🥳 Excited to see that this effort pays off!😊
Tweet media one
4
2
155
@AliciaCurth
Alicia Curth
18 days
Addendum — if you take away just one thing from this thread, it should be: Machine learning isn’t some kind of magic that defies the laws of statistics! I believe fundamental concepts from classical statistics will (probably) be “all we need” to understand modern ML!! BUT… 1/3
@AliciaCurth
Alicia Curth
22 days
When Double Descent & Benign Overfitting became a thing, I was a master's student in statistics — and so confused. I couldn't reconcile what I had literally just learned about bias-variance & co with modern ML. Here's what I wish someone had told me then: 1/n
Tweet media one
18
183
1K
2
11
140
@AliciaCurth
Alicia Curth
8 months
Economists seem to LOVE synthetic control methods, so during my MSR internship with @javiergonzh we wanted to understand whether we could use them for survival analyses (v prevalent in medicine) too? Delighted that our answer (“It’s complicated!”) was accepted @Conf_CLeaR … 1/n
Tweet media one
3
29
140
@AliciaCurth
Alicia Curth
9 months
Receiving feedback on paper drafts as a German be like: 😭 (thx @Jeffaresalan )
Tweet media one
Tweet media two
Tweet media three
Tweet media four
7
2
127
@AliciaCurth
Alicia Curth
8 months
P.S.: Exciting for me as an ex-econometrician, this project also meant I finally got to learn what's behind all that ✨ synthetic control magic ✨🕵🏼‍♀️ My lukewarm (?) take: no magic, just some linearity assumptions* doing v heavy lifting in the background 🫢
Tweet media one
@AliciaCurth
Alicia Curth
8 months
Economists seem to LOVE synthetic control methods, so during my MSR internship with @javiergonzh we wanted to understand whether we could use them for survival analyses (v prevalent in medicine) too? Delighted that our answer (“It’s complicated!”) was accepted @Conf_CLeaR … 1/n
Tweet media one
3
29
140
3
18
117
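One way to see the "just linearity" point: the classic synthetic control construction is a convex combination of control units fitted to the treated unit's pre-treatment path. Below is a minimal sketch with simulated data and simplex-constrained least squares via scipy — an illustration of the general idea, not the estimator from the paper above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
T_pre, T_post, n_controls = 20, 10, 8

# Simulated outcome paths: rows = time periods, columns = control units.
Y_controls = rng.normal(size=(T_pre + T_post, n_controls)).cumsum(axis=0)
true_w = rng.dirichlet(np.ones(n_controls))                     # hypothetical "true" mixing weights
Y_treated = Y_controls @ true_w + rng.normal(scale=0.1, size=T_pre + T_post)

def synthetic_control_weights(y_pre, X_pre):
    """Simplex-constrained least squares: min ||y_pre - X_pre w||^2 s.t. w >= 0, sum(w) = 1."""
    n = X_pre.shape[1]
    objective = lambda w: np.sum((y_pre - X_pre @ w) ** 2)
    constraint = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
    res = minimize(objective, np.full(n, 1.0 / n), method="SLSQP",
                   bounds=[(0.0, 1.0)] * n, constraints=[constraint])
    return res.x

w = synthetic_control_weights(Y_treated[:T_pre], Y_controls[:T_pre])

# The post-period "counterfactual" is just this fixed convex combination of controls:
# the linearity (weighted-average) assumption is doing all the work.
counterfactual_post = Y_controls[T_pre:] @ w
print("fitted weights:", np.round(w, 2))
```

Whether such a fixed convex combination gives a meaningful counterfactual for non-linear summaries such as survival curves is exactly where the "heavy lifting" shows up.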
@AliciaCurth
Alicia Curth
9 months
If you missed us in New Orleans but wanted to hear @Jeffaresalan & myself talk about (literal and figurative) U-turns on double descent, it seems that NeurIPS has made all recordings of Orals publicly available!🥳 Find us at minute 35:15 in this recording:
Tweet media one
2
10
110
@AliciaCurth
Alicia Curth
3 months
Excited to be back at #icml !☀️Find me floating around or come chat to me & @Jeffaresalan about our integrated attempt at understanding deep double descent, grokking, linear mode connectivity & differences between gradient boosting and neural nets on Friday at the HiLD workshop!🤓
Tweet media one
0
7
105
@AliciaCurth
Alicia Curth
1 year
I am going to Honolulu and I’m bringing … 3 posters!!!🤯🥳🌺 beyond excited & happy that lots of hard work paid off — but also feeling very lucky to have had great coauthors 🤗 as well as the most engaged set of reviewers *and* ACs I’ve seen so far! See you in July @icmlconf ☀️
@MihaelaVDS
Mihaela van der Schaar
1 year
Incredibly proud of what my students have achieved with our contributions for #ICML2023 ! We will present a range of our intensive work on causal deep learning, clinical trials, treatment effect estimation, synthetic data and deep learning for tabular data:
Tweet media one
4
5
46
2
3
77
@AliciaCurth
Alicia Curth
11 months
brb just quickly recharging the batteries en route to #NeurIPS2023 to get ready for the highlight of my academic year 🙆🏼‍♀️☀️ Next up: beyond excited to present our work on double descent with @Jeffaresalan as an oral in the first conference session on Tuesday! See you 🔜 NOLA 😎
Tweet media one
0
1
71
@AliciaCurth
Alicia Curth
1 year
Been sitting on this for a while now, but we are almost camera-ready so I can finally share: started a new research thread w/ @Jeffaresalan earlier this year!! Our joint paper goes down a surprising rabbit hole & got rewarded with a NeurIPS Oral!🤯🥳 (Paper dropping next week🔥⏳)
Tweet media one
2
4
67
@AliciaCurth
Alicia Curth
11 months
It’s finally time: tomorrow @Jeffaresalan & I will be presenting our #NeurIPS2023 paper on a surprisingly simple resolution to double descent in Oral session 1D at 10:30am in room R06-09 (level 2) 🥳 Beware: it’s a little trek to get to the room (upstairs), don’t miss it 😉
@AliciaCurth
Alicia Curth
1 year
Super excited to finally share @Jeffaresalan & my #NeurIPS2023 Oral: 🥳 — a slightly unconventional paper leading to a surprising and (shockingly) simple resolution of the tension between statistical intuition & double descent! 1/3
Tweet media one
12
116
620
1
5
53
@AliciaCurth
Alicia Curth
7 months
Another year, another amazing @Conf_CLeaR !! Had only one complaint last year (the weather on the conference hike…) and even that was perfectly arranged this time☀️ Personal takeaway: small, focused ML conferences are so so great — esp for PhD students & for finding community!
Tweet media one
Tweet media two
Tweet media three
@AliciaCurth
Alicia Curth
2 years
Had the absolute best time at @CLeaR_2022 in Tübingen the last few days! From great talks & papers to great people, great organisation & great food, this conference had everything I could have hoped for 😍 (except for maybe great weather… ) Really can’t wait for #CLeaR24 🤓
Tweet media one
Tweet media two
Tweet media three
1
2
39
1
4
53
@AliciaCurth
Alicia Curth
22 days
I’ve spent the last 1.5 years working with the amazing @Jeffaresalan on understanding modern ML phenomena, questioning everything we know about statistics in the process. The above is probably one of my biggest yet simplest takeaways! More here: 19/19
3
6
54
@AliciaCurth
Alicia Curth
10 months
2023’s biggest PhD highlights were def the conferences for me, finally being able to attend in person does make such a difference 🙌🏻 Personal top moments from ICML & NeurIPS below (slightly different vibes)
Tweet media one
Tweet media two
0
0
48
@AliciaCurth
Alicia Curth
1 year
Never have I ever been this relaxed while reading conference reviews 🙆🏼‍♀️
Tweet media one
0
0
45
@AliciaCurth
Alicia Curth
4 years
Beyond excited to share that the first paper of my PhD with @MihaelaVDS , on estimating conditional average treatment effects using meta-learners and neural nets, was recently accepted for publication at #AISTATS2021 ! Paper: Code:
2
2
43
@AliciaCurth
Alicia Curth
8 months
In other news: just interrupting the usual stats/ML coverage to share completion of my final @Cambridge_Uni bucketlist item — being part of @clarehall_cam ’s first ever women’s crew to win blades in Lent Bumps last week 😱💪🏻 is that Cam telling me it’s time to graduate soon…?🤔
Tweet media one
Tweet media two
2
0
41
@AliciaCurth
Alicia Curth
9 months
Fun fact: when @Jeffaresalan & I fell down the double descent rabbit hole, we were actually looking into another question entirely. Why do simple ensembles continue to work so well in practice?! We learned a lot about Random Forests on the way & have now come full circle: ⬇️🚨👀
@StatMLPapers
Stat.ML Papers
9 months
Why do Random Forests Work? Understanding Tree Ensembles as Self-Regularizing Adaptive Smoothers
0
32
145
0
1
43
@AliciaCurth
Alicia Curth
2 years
Delighted ☀️ to be in Valencia this week to present our paper on heterogeneous treatment effect estimation in the presence of competing risks 🙌🏻😎 I’m extra excited because I FINALLY get to attend @aistats_conf in person-it’s where my first PhD paper was published back in 2021 🤓
@AliciaCurth
Alicia Curth
2 years
Excited to share the next chapter in my saga on heterogeneous treatment effect estimation (aka my PhD) — to be presented at @aistats_conf in April — which features some interesting new characters: competing events! () 1/n
1
0
32
0
2
40
@AliciaCurth
Alicia Curth
2 years
Had the absolute best time at @CLeaR_2022 in Tübingen the last few days! From great talks & papers to great people, great organisation & great food, this conference had everything I could have hoped for 😍 (except for maybe great weather… ) Really can’t wait for #CLeaR24 🤓
Tweet media one
Tweet media two
Tweet media three
1
2
39
@AliciaCurth
Alicia Curth
3 years
Super excited to share that I’ve not only had my first ever #NeurIPS paper accepted, but also my second (joint with C. Lee) and third (led by @QianZhaozhi ) 🥳🤯 Finished my first year with @MihaelaVDS on a high note!🥳
@MihaelaVDS
Mihaela van der Schaar
3 years
I'm still processing our #NeurIPS2021 results—12 papers accepted! All I can say is THANK YOU to our superstar lab members for your brilliance and dedication. So proud of you all! Details here:
Tweet media one
9
10
220
1
1
38
@AliciaCurth
Alicia Curth
22 days
Turns out, there's quite a simple explanation no one talks about: The intuitions on bias-variance tradeoff & overfitting I was taught apply to in-sample prediction (where only outputs are resampled at test time), while modern ML wants generalization to new inputs - a crucial change! 2/n
2
0
38
@AliciaCurth
Alicia Curth
1 year
Long 🧵to follow soon, for now check out the paper here: ! We learned A LOT about statistics, ML & their history on the way — really hope that people will enjoy reading this paper even half as much as we did writing it! 🤓 3/3
1
1
37
@AliciaCurth
Alicia Curth
2 years
After two years of gathertown, the day has finally come: it’s time for the first in-person presentation & poster of my PhD 🥳 I’ll be presenting at 5:35pm (Room 318) with poster session 6:30-8:30pm —come by if you’d like to chat about imputation (or anything else)!🤓 #ICML2022
@MihaelaVDS
Mihaela van der Schaar
2 years
Second in line at #ICML2022 are Daniel Jarrett, @BCebere , Tennison Liu, @AliciaCurth & I: HyperImpute, a generalised iterative imputation framework. Missing data is a big problem and here, we present THE state-of-the-art tool that can help solving it! 2/2
Tweet media one
0
1
11
4
4
38
@AliciaCurth
Alicia Curth
2 years
I have spent *tons* of time in the last couple of years with @MihaelaVDS trying to find good benchmarks to evaluate (heterogeneous) causal effect estimators — and am still not really satisfied with what we’ve got (see e.g. our NeurIPS21 critique ) 🥲 …
@WvanAmsterdam
Wouter van Amsterdam
2 years
Interesting take. The comparison with NNs breaks here: with NNs we can easily empirically verify performance (e.g. ImageNet) What's the ImageNet of causal inference? Maybe going forward we should accept simulations as (supporting) 'proof' instead of just theorems / formal proofs
4
0
18
3
3
36
@AliciaCurth
Alicia Curth
3 months
I put as much effort into reviewing as I would like reviewers to put into my own papers — I think more reviewers should give that a try 😅 Be the change you wish to see in the world, right? 😉 (also, if you need an incentive: it might give you a free conference registration!)
@icmlconf
ICML Conference
3 months
Congratulations to best reviewer awards
Tweet media one
6
11
120
1
0
37
@AliciaCurth
Alicia Curth
3 years
What drives the relative empirical performance of ML algorithms for CATE estimation? Sometimes it's simply the choice of benchmark dataset! With @MihaelaVDS , I wrote about this for the #ICML2021 Workshop on Neglected Assumptions in Causal Inference happening tomorrow.
Tweet media one
1
6
36
@AliciaCurth
Alicia Curth
3 months
Want to hear about the next stop on our journey into understanding modern deep learning phenomena? Come find @Jeffaresalan & myself in the poster sessions at 10:00 and 15:30 at the workshop on high-dimensional learning dynamics @icmlconf in Straus 2 tomorrow! 🙌🏻 #ICML2024
@AliciaCurth
Alicia Curth
3 months
Excited to be back at #icml !☀️Find me floating around or come chat to me & @Jeffaresalan about our integrated attempt at understanding deep double descent, grokking, linear mode connectivity & differences between gradient boosting and neural nets on Friday at the HiLD workshop!🤓
Tweet media one
0
7
105
3
1
35
@AliciaCurth
Alicia Curth
1 year
#ICML2023 camera-ready ✔️✔️✔️ #ICML2023 travel-ready ⏳🔜🌺
Tweet media one
0
0
33
@AliciaCurth
Alicia Curth
2 years
Excited to share the next chapter in my saga on heterogeneous treatment effect estimation (aka my PhD) — to be presented at @aistats_conf in April — which features some interesting new characters: competing events! () 1/n
@MihaelaVDS
Mihaela van der Schaar
2 years
What makes estimating heterogeneous treatment effects from survival data *in the presence of competing events* challenging? We study this new & important problem, and theoretically analyse & empirically illustrate when & how competing events affect ML here
0
1
7
1
0
32
@AliciaCurth
Alicia Curth
22 days
In addition to what we wrote about double descent in , this is thus another reason why double descent does not contradict the bias-variance tradeoff: the bias-variance tradeoff holds in-sample, while double descent *exclusively* appears out-of-sample! 8/n
Tweet media one
1
1
31
@AliciaCurth
Alicia Curth
22 days
When I think bias-variance tradeoff, I think about k-nearest neighbor estimators. I was taught that variance increases with complexity while bias decreases with complexity (here: lower k) because for in-sample prediction the 1-NN estimator (the example itself) has zero bias. 3/n
Tweet media one
1
0
30
@AliciaCurth
Alicia Curth
22 days
Well, turns out this intuition is actually NOT always true for out-of-sample prediction (ie generalization) 🤯 Because for a new input there is no perfect training match, the bias of the most complex estimator in this class (the 1-NN estimator) is NOT necessarily the lowest 4/n
Tweet media one
1
0
28
@AliciaCurth
Alicia Curth
18 days
In out-of-sample settings, even for v simple models: 1. there isn't always a tradeoff between bias & var 2. bias can sometimes get worse with increased complexity 3. overfitting can also be a consequence of bias (not only var) - and this is crucial for understanding modern ML! 3/3
0
0
28
@AliciaCurth
Alicia Curth
10 months
If you’ve ever wanted to hear me rave about statistics for an hour, I’ve got a belated Christmas present for you: 🎁 Had a great time chatting to @AleksanderMolak about causality, double descent, stats and my journey into ML research from econometrics! 🤓
@AleksanderMolak
Aleksander Molak (CausalPython.io)
10 months
A causal journey from Amsterdam to Cambridge and from potential outcomes to DAGs and back. A new premiere today! 1/n #causality #causalAI #causaltwitter #machinelearning #neurips
1
0
12
0
1
27
@AliciaCurth
Alicia Curth
9 months
…Clinical Pharmacology & Therapeutic’s special issue on Machine Learning 🥳 Link: , with big thanks to my amazing coauthors Richard Peck, Eoin McKinney, @weatheralljim75 & @MihaelaVDS 🙌🏻 To briefly answer the question in the top-level tweet, .. 2/8
2
1
25
@AliciaCurth
Alicia Curth
8 months
We make use of the adaptive nearest neighbor interpretation of trees & forests (eg ) bc that makes them much easier to reason about: trees are simply smoothers with learned weights! We show that this view makes their behaviour intuitive to understand… 2/n
Tweet media one
3
1
25
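To see what "trees are just smoothers with learned weights" looks like in code, here is a hedged sketch using an sklearn forest with leaf co-membership weights — a toy setup of my own (bootstrap switched off so the identity is exact), not the paper's experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=300)

# bootstrap=False so each tree's leaf value is the mean over *all* training points in that
# leaf, which makes the smoother identity below exact.
rf = RandomForestRegressor(n_estimators=100, max_features=1, bootstrap=False,
                           random_state=0).fit(X, y)

def forest_smoother_weights(rf, X_train, x_query):
    """Weights w_i(x) such that the forest prediction equals sum_i w_i(x) * y_i:
    each tree spreads weight uniformly over the training points sharing the query's
    leaf, and the forest averages these per-tree weight vectors."""
    leaves_train = rf.apply(X_train)                 # shape (n_train, n_trees)
    leaves_query = rf.apply(np.atleast_2d(x_query))  # shape (1, n_trees)
    w = np.zeros(len(X_train))
    for t in range(leaves_train.shape[1]):
        same_leaf = leaves_train[:, t] == leaves_query[0, t]
        w[same_leaf] += 1.0 / same_leaf.sum() / leaves_train.shape[1]
    return w

x0 = rng.uniform(-3, 3, size=3)
w = forest_smoother_weights(rf, X, x0)
print(np.allclose(w @ y, rf.predict(x0[None, :])[0]))  # prediction = weighted average of train labels
```

Turning bootstrap back on changes the leaf values to bootstrap-sample means, so the identity then only holds with the corresponding bootstrap weights.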
@AliciaCurth
Alicia Curth
1 year
Super excited to be speaking about our work on using machine learning for discovering & understanding treatment effect heterogeneity at @AIClubBioMed this week 🤓
@AIClubBioMed
Cambridge AI Club for Biomedicine
1 year
Next event alert! Join us on Thur 4th May to explore this month's theme: "Machine Learning for Clinical Decision Making". We have two exciting talks from Alicia Curth @AliciaCurth and Vincent Jeanselme @JeanselmeV , followed by pizza. See you there 👀 @TheMilnerInst @CRUK_CI
Tweet media one
1
9
16
1
0
25
@AliciaCurth
Alicia Curth
11 months
Had the absolute pleasure of learning from the amazing @dennisfrauen about sensitivity analysis when he visited us in Cambridge over the summer to work on this paper! 🤓 Go check it out here: 🙌🏻
@stfeuerriegel
Stefan Feuerriegel
11 months
🚨New preprint: A Neural Framework for Generalized Causal Sensitivity Analysis 👉 We propose NeuralCSA: a #neural framework for generalized #causal #sensitivity #analysis /🧵
1
6
17
1
1
25
@AliciaCurth
Alicia Curth
22 days
Here’s a simulation example of this: while bias behaves monotonically as expected in-sample, it doesn’t out of sample! Indeed, for k<10, *there is no bias variance tradeoff* out-of-sample: both bias and variance prefer estimators with lower complexity in this region! 5/n
Tweet media one
1
0
25
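To make the thread's point reproducible, here is a minimal sketch of this kind of simulation (a made-up 1-D data-generating process and sklearn's KNeighborsRegressor, not the exact setup behind the figure): resample only the training outcomes many times, then compare bias² and variance of k-NN at the training inputs (in-sample) versus at fresh inputs (out-of-sample).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)                      # assumed ground-truth function (toy choice)
n, sigma, reps = 50, 0.5, 500
X_train = np.sort(rng.uniform(-2, 2, n))[:, None]
X_new = rng.uniform(-2, 2, 200)[:, None]         # fresh inputs, never seen during training

def bias2_and_variance(k, X_eval):
    """Monte Carlo bias^2 and variance of k-NN at X_eval, resampling outcomes only."""
    preds = np.empty((reps, len(X_eval)))
    for r in range(reps):
        y = f(X_train[:, 0]) + rng.normal(scale=sigma, size=n)
        preds[r] = KNeighborsRegressor(n_neighbors=k).fit(X_train, y).predict(X_eval)
    bias2 = np.mean((preds.mean(axis=0) - f(X_eval[:, 0])) ** 2)
    variance = preds.var(axis=0).mean()
    return bias2, variance

for k in [1, 3, 5, 10, 20, 40]:
    b_in, v_in = bias2_and_variance(k, X_train)   # in-sample: same inputs, resampled outputs
    b_out, v_out = bias2_and_variance(k, X_new)   # out-of-sample: new inputs
    print(f"k={k:2d} | in-sample bias^2={b_in:.3f} var={v_in:.3f} "
          f"| out-of-sample bias^2={b_out:.3f} var={v_out:.3f}")

# In-sample, 1-NN has (essentially) zero bias and variance ~sigma^2, and bias grows with k;
# out of sample, bias need not be smallest at k=1 nor change monotonically with complexity.
```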
@AliciaCurth
Alicia Curth
1 year
Aloha #icml2023 , I'm excited for a big day full of posters!🌺 If you're interested to chat about all things treatment effects, come at 11am to discuss model selection with me ( #415 ) & informative sampling with @ToonVDSchueren ( #514 ), and at 2pm for adaptive trials ( #415 )!
@AliciaCurth
Alicia Curth
1 year
Getting into conference mode like…. 🌺 #ICML2023
Tweet media one
4
7
222
0
2
24
@AliciaCurth
Alicia Curth
22 days
In conclusion, I thus think we need to rethink how intuitions around bias-variance tradeoffs and overfitting are taught — in particular, more precision in vocabulary may be needed when we talk about these things to make clear when intuitions are likely to apply and why! 16/n
Tweet media one
1
0
24
@AliciaCurth
Alicia Curth
4 years
First tweet, big news: just graduated @UniofOxford with a MSc in Statistical Science, received a prize by @OxfordStats for overall performance on the MSc AND joined the amazing @MihaelaVDS as a PhD student at @FacultyMaths ! 2020 certainly saved all the good things for the end 🥳
Tweet media one
3
2
23
@AliciaCurth
Alicia Curth
1 year
Investigating the double descent phenomenon outside of deep learning, we went down 2 rabbit holes so deep they led us i) to completely deconstruct non-deep double descent (incl. linear regressions) & ii) back in time to the 90s when smoothers were the SOTA of stats! 🤯 2/3
1
0
23
@AliciaCurth
Alicia Curth
11 months
Thanks to @GoogleDeepMind for ruining my pre-NeurIPS holiday with the tech bros, apparently #gemini is all they need @Jeffaresalan @JonathanICrabbe
Tweet media one
0
2
23
@AliciaCurth
Alicia Curth
22 days
So this note is what I would have needed while I was in my master's to see that things aren't as different & difficult as they may seem. I think topics as fundamental as bias-variance tradeoff & overfitting should remain accessible to any grad student in statistics or ML! 🤓 18/n
1
0
23
@AliciaCurth
Alicia Curth
22 days
How does this relate to understanding double descent and benign overfitting? First, this makes clear that it isn't (only) interpolation or overparametrization or modern ML breaking classical statistical intuitions — the move from in-sample to out-of-sample preds is crucial too! 6/n
Tweet media one
1
0
23
@AliciaCurth
Alicia Curth
3 years
Interested in our recent work on treatment effect heterogeneity?🤓Come chat during today’s #NeurIPS poster session at 4:30pm GMT/8:30am PST 🥳 (I’ll be bringing my Time-Turner to try being at multiple posters simultaneously 🪄🧙)
@MihaelaVDS
Mihaela van der Schaar
3 years
#NeurIPS2021 is off to a busy start! A total of 4 poster sessions later today for papers by lab members @AliciaCurth , Changhee Lee, @QianZhaozhi , Yao Zhang, and @IoanaBica95 . Definitely worth a look for anyone interested in treatment effects! More info:
Tweet media one
0
2
22
0
3
22
@AliciaCurth
Alicia Curth
1 year
Next Monday (June 12, 4pm BST), our lab is hosting an inspiration exchange where I'll be presenting lots of our newest work on ML for personalized treatment effect estimation with @MihaelaVDS ! 🤓🥳 More info about attending online: 1/3
1
2
23
@AliciaCurth
Alicia Curth
2 years
I wrote CATENets back in the first year of my PhD (& it was the first “real deep learning” I ever did!), mainly to have a tool to understand & benchmark lots of existing (&new) methods using fair, comparable, implementations, so super excited that it actually (still) gets used 🥳
@AleksanderMolak
Aleksander Molak (CausalPython.io)
2 years
⭕ CATENets () Developed by @AliciaCurth - a researcher at van der Schaar Lab - the package offers a unique set of deep-learning based CATE estimators. From original architectures designed by Curth and van der Schaar (SNet, FlexTENet) to... 🧵 (7/n)
1
0
2
0
0
21
@AliciaCurth
Alicia Curth
22 days
This is also my answer to last week’s big discussion on overfitting (I’m a little late😅): whether we should still worry about overfitting today really depends on which setting one is interested in (do train inputs reappear?), and on how models interpolate the training data! 15/n
@srush_nlp
Sasha Rush
1 month
What's the right answer in my Deep Learning class when anxious students say: "Doesn't that lead to overfitting!"
127
22
622
1
0
20
@AliciaCurth
Alicia Curth
11 months
I had a wonderful day yesterday chatting to @AleksanderMolak about causality, machine learning research and life more generally 😍 really cannot wait to see the final product of his visit, stay tuned 👀👀
@AleksanderMolak
Aleksander Molak (CausalPython.io)
11 months
Yesterday, I visited the Center for Mathematical Sciences at @Cambridge_Uni to talk with @AliciaCurth . 1/2 #CausalBanditsPodcast
Tweet media one
1
1
20
0
0
19
@AliciaCurth
Alicia Curth
9 months
Concluding thought: For real progress on many of these questions, I think what the CATE ML literature is really missing is good & realistic benchmark datasets that exhibit actual complexities of real-world data to evaluate how well our methods are *actually* doing… 8/8
Tweet media one
0
1
19
@AliciaCurth
Alicia Curth
22 days
NB: this may have been obvious to some, but it really wasn't for me. To be honest, for a couple of years I thought I probably wouldn't ever understand the modern stuff — I'm not a learning theorist and whenever I tried reading papers on the topic, explanations went over my head. 17/n
1
0
19
@AliciaCurth
Alicia Curth
22 days
Also, the focus on in-sample prediction in statistics is probably a reason for the historical absence of such phenomena in the literature: it's easy to see that they CANNOT occur in in-sample settings, as all interpolating models (indep of size) make the same predictions in-sample! 7/n
Tweet media one
1
0
19
@AliciaCurth
Alicia Curth
8 months
We argue that bias-in-mean is NOT a useful notion of bias when comparing trees & forests. Why? Maybe surprisingly, the expected predictor of the class of trees is NOT necessarily itself a member of the class of trees (cf. below) 🤯 thus, we maybe shouldn't compare expectations! 3/n
Tweet media one
2
0
17
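A toy illustration of why that can happen (depth-1 trees on 1-D data, my own simulation rather than the paper's): average the stumps fitted on many resampled datasets and you get a predictor that no single stump can represent.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 40)[:, None]
f = lambda x: (x > 0.5).astype(float)            # toy step-function signal
grid = np.linspace(0, 1, 401)[:, None]

# Approximate the *expected* depth-1 tree ("stump") by averaging over resampled datasets.
n_rep = 200
avg_pred = np.zeros(len(grid))
for _ in range(n_rep):
    y = f(X[:, 0]) + rng.normal(scale=0.5, size=len(X))
    stump = DecisionTreeRegressor(max_depth=1).fit(X, y)
    avg_pred += stump.predict(grid) / n_rep

# Any single stump takes at most 2 distinct values on the grid; its average over datasets
# takes many more, so the expected predictor is not itself a member of the class of stumps.
print("distinct values, single stump:  ", len(np.unique(stump.predict(grid))))
print("distinct values, averaged stump:", len(np.unique(np.round(avg_pred, 6))))
```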
@AliciaCurth
Alicia Curth
22 days
What about benign overfitting? Well, I’d say to understand that we first need to be a little more precise about vocabulary. Quite literally, overfitting cannot be benign as the term itself implies that performance suffers. Instead, ask: When can *interpolation* be benign? 9/n
Tweet media one
1
0
18
@AliciaCurth
Alicia Curth
22 days
I was taught that interpolation causes overfitting as a consequence of noise in outcomes. Turns out, this intuition is probably once more a relic of in-sample prediction! For in-sample preds, interpolation indeed simply CANNOT be benign. Here, it’s all about variance! 10/n
Tweet media one
1
0
18
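One way to make the in-sample claim concrete, under a standard additive-noise setup (my notation, not necessarily the thread's): with training outcome $y_i = f(x_i) + \varepsilon_i$, an independently resampled outcome $y_i^* = f(x_i) + \varepsilon_i^*$ at the same input, and noise variance $\sigma^2$, any interpolator has $\hat f(x_i) = y_i$, so

$$
\mathbb{E}\big[(\hat f(x_i) - y_i^*)^2\big] = \mathbb{E}\big[(\varepsilon_i - \varepsilon_i^*)^2\big] = 2\sigma^2 ,
$$

which is twice the irreducible error $\sigma^2$ achieved by predicting $f(x_i)$: in-sample, interpolation always pays the full noise variance on top of the irreducible error and so cannot be benign whenever $\sigma^2 > 0$.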
@AliciaCurth
Alicia Curth
2 years
I myself might be at home with big #NeurIPS2022 -FOMO but fortunately the amazing @IoanaBica95 & @JonathanICrabbe are in New Orleans to present & discuss our work on benchmarking treatment effect estimators 🥳— if you’re interested, you can catch them at todays poster session!🤓
@IoanaBica95
Ioana Bica
2 years
Today at #NeurIPS2022 in the Datasets and Benchmarks Track, we’ll be presenting our work on “Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability” With: @JonathanICrabbe , @AliciaCurth , @MihaelaVDS 🚩Hall J, poster #1014 - 11a.m. to 1p.m.
Tweet media one
1
4
39
0
0
17
@AliciaCurth
Alicia Curth
3 years
Want to chat some more about ML for heterogeneous treatment effect estimation?🤓 Come join me at today‘s #NeurIPS datasets & benchmarks Poster session at 4:30pm GMT to discuss better benchmarking for CATE estimation 🥳🤗
@MihaelaVDS
Mihaela van der Schaar
3 years
Today's our lab's busiest day at #NeurIPS2021 ! A real variety of papers on show, with topics ranging from data imputation to fairness in synthetic data, understanding/empowering decision-making, benchmarking for treatment effects, and more! Details here:
Tweet media one
0
3
14
0
1
16
@AliciaCurth
Alicia Curth
8 months
Also, some practical takeaways: 1) Not all smoothing is good (as always, you can also overdo it). 2) The effect of hyperparameters seems to be VERY different on in- and out-of-sample performance. 3) More trees probably never hurts though. 👀 👈🏻 10/10
Tweet media one
2
1
16
@AliciaCurth
Alicia Curth
8 months
To summarise what we’ve discovered so far: forests are sometimes much smoother than trees when making predictions! But WHY does this mean that forests perform & generalize better than trees?? That’s in part 2 of the paper — which I’ll discuss tomorrow 🤓 Stay tuned…👀
Tweet media one
3
0
16
@AliciaCurth
Alicia Curth
2 years
Interested to learn more about our work on heterogeneous treatment effect estimation & competing risks (or anything else)?🤓Then stop by for a chat at our @aistats_conf poster at 16:30 (spot 79) today 🥳☀️
@MihaelaVDS
Mihaela van der Schaar
2 years
In the same session, we have @AliciaCurth & I theoretically analyse & empirically illustrate when and how competing risks play a role in using generic machine learning prediction models for the estimation of heterogeneous treatment effects.
1
0
3
0
0
13
@AliciaCurth
Alicia Curth
22 days
A classical k-NN estimator cannot do that: a 1-NN estimator ALWAYS uses exactly 1 neighbor. Many modern ML methods, however, implicitly do something different. Eg we showed recently that random forests can be 1-NN estimators at train time but k-NN estimators at test time… 13/n
@AliciaCurth
Alicia Curth
8 months
It makes it easy to see that differences between predictions of trees and forests appear when & where problems are *underspecified*: eg individual interpolating trees will behave like 1-NN estimators everywhere, but ensembles thereof may act as k-NN estimators at test-time! 3/n
Tweet media one
1
1
15
1
0
15
@AliciaCurth
Alicia Curth
8 months
This relates to older computer science perspectives on the success of ensembles (): lost to the stats literature, they argue that ensembles can reduce both *model variability* AND *representational bias* given fixed data! That means there are at least.. 5/n
Tweet media one
1
1
14
@AliciaCurth
Alicia Curth
8 months
It makes it easy to see that differences between predictions of trees and forests appear when & where problems are *underspecified*: eg individual interpolating trees will behave like 1-NN estimators everywhere, but ensembles thereof may act as k-NN estimators at test-time! 3/n
Tweet media one
1
1
15
@AliciaCurth
Alicia Curth
2 years
Ready for Round 2!🤓 Today I’ll be presenting our new line of work on adaptive clinical trials at the ReALML workshop (Room 309), with spotlight talk after 11:40 and poster session at 17:05! 🥳 Stop by to discuss how to make clinical trials more efficient using ideas from ML! 🙌🏻
@MihaelaVDS
Mihaela van der Schaar
2 years
Next up will be @AliciaCurth , Alihan Hüyük & I with our contributions to the Adaptive Experimental Design & Active Learning in the Real World workshop (). Adaptively identifying good patient populations & good arms! This might transform clinical trials.2/2
0
0
1
1
1
13
@AliciaCurth
Alicia Curth
8 months
TLDR? Forests improve upon trees bc they issue predictions that are smoother functions of the training outcomes. This reduces both the effect of outcome noise AND enriches the available hypothesis space, esp for out-of-sample predictions. Maybe EoSL needs a little update 😉 9/n
1
1
13
@AliciaCurth
Alicia Curth
22 days
Indeed, we show in for linear regs, forests & boosting that if we distinguish train and test complexity using effective param measures, you find that benignly interpolating models make less complex preds at test than train time. (Neural nets out soon!) 14/n
Tweet media one
1
0
13
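A hedged sketch of what such a train/test comparison can look like: an interpolating sklearn forest and a simple "effective number of neighbours" proxy, 1 / sum_i w_i(x)^2, built from leaf co-membership weights. This is an illustration of the idea, not necessarily the exact effective-parameter measure used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n, d = 200, 5
X_train = rng.uniform(-3, 3, size=(n, d))
y_train = np.sin(X_train[:, 0]) + rng.normal(scale=0.3, size=n)
X_test = rng.uniform(-3, 3, size=(n, d))

# Fully grown, non-bootstrapped forest with heavy feature subsampling: it interpolates the
# training data, but different trees pick different neighbours for new inputs.
rf = RandomForestRegressor(n_estimators=200, max_features=1, bootstrap=False,
                           random_state=0).fit(X_train, y_train)

def effective_k(rf, X_train, X_query):
    """Per-prediction smoothness proxy 1 / sum_i w_i(x)^2, where w_i(x) are the forest's
    leaf-co-membership smoother weights over training points (averaged over trees).
    Larger values = predictions average more neighbours = smoother / less complex."""
    leaves_train = rf.apply(X_train)              # (n_train, n_trees)
    leaves_query = rf.apply(X_query)              # (n_query, n_trees)
    n_trees = leaves_train.shape[1]
    out = np.empty(len(X_query))
    for j, leaf_ids in enumerate(leaves_query):
        w = np.zeros(len(X_train))
        for t in range(n_trees):
            same_leaf = leaves_train[:, t] == leaf_ids[t]
            w[same_leaf] += 1.0 / same_leaf.sum() / n_trees
        out[j] = 1.0 / np.sum(w ** 2)
    return out

print("mean effective k at train inputs:", effective_k(rf, X_train, X_train).mean())  # ~1: 1-NN-like
print("mean effective k at test inputs: ", effective_k(rf, X_train, X_test).mean())   # >1: smoother
```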
@AliciaCurth
Alicia Curth
8 months
Their conclusion that variance reduction alone makes forests better than trees is based on the fact that trees & forests have the same expectation and hence the same bias-in-mean. By the classical bias-var decomp of the MSE, all gain must thus come from the variance. True - BUT 2/n
Tweet media one
1
1
12
@AliciaCurth
Alicia Curth
8 months
Instead, we argue that a natural candidate for evaluating bias is the performance of the best-in-class predictor — which can & will differ between trees and forests! Conceptually, the class of forests interpolates between all possible tree predictions and is thus much richer! 4/n
Tweet media one
1
0
11
@AliciaCurth
Alicia Curth
1 year
If you’re looking for more entertaining content regarding our joint work on double descent (and more), my amazing coauthor @Jeffaresalan has got you covered…
@Jeffaresalan
Alan Jeffares
1 year
Myself & Alicia wrote a NeurIPS Oral (🤯) where we tried to wrap our heads around double descent and tl;dr:
Tweet media one
0
11
60
0
0
12
@AliciaCurth
Alicia Curth
9 months
Finally, note that we usually study simple binary CATE static settings in the ML literature, but real problems have so many more layers of complexity, e.g. censoring, informative sampling, more complex treatment types & temporal structures. Shouldn’t we handle all jointly? 7/8
Tweet media one
1
1
10
@AliciaCurth
Alicia Curth
8 months
(iii) more smooth when more randomness is used in tree construction! We also show that the train-test difference in the level of smoothing is not limited to interpolating trees. BUT “spiked-smooth” behavior does appear more pronounced the more overfitted individual trees are. 6/n
Tweet media one
1
1
11
@AliciaCurth
Alicia Curth
8 months
This is a nice intuition & relates to what Wyner et al () conjectured to be the “spiked-smooth” behaviour & driver of success of interpolating forests. But could we use our smoother-setup to somehow *quantify* whether this intuition is actually correct? 4/n
1
0
11
@AliciaCurth
Alicia Curth
8 months
… so check out the paper () to learn more, and to see some experiments illustrating these theoretical arguments on real data! I had a wonderful time working with my amazing coauthors at MSR on this & learned a lot on the way! 🤓🤗 11/11
0
1
11
@AliciaCurth
Alicia Curth
9 months
First, forecasting under intervention on treatments ofc requires strong identifiability assumptions that are a data problem, not a learning problem. Ie garbage in, garbage out (ML cannot do magic…) — BUT ML might be able to help make assumptions more likely to hold. 4/8
Tweet media one
1
1
10
@AliciaCurth
Alicia Curth
22 days
Things change for new inputs: here models can be both more and less overfitted than when in-sample prediction is of interest. Overfitting can be worse out of sample than in-sample because an interpolating model with zero bias in-sample can have substantial bias out-of-sample 11/n
Tweet media one
1
0
11
@AliciaCurth
Alicia Curth
22 days
But some interpolating models CAN also be less overfit out-of-sample than in-sample. This is what happens eg in the second descent in double descent, and intuitively is a consequence of models that can behave differently around new examples than around training examples:… 12/n
2
0
10
@AliciaCurth
Alicia Curth
8 months
We also demonstrate that the addition of bootstrapping to the random forest procedure has an additional smoothing effect — this time on both test- AND train-time predictions!! Bootstrapping thus really makes a difference when in-sample predictions are of interest! 8/n
Tweet media one
1
1
10
@AliciaCurth
Alicia Curth
9 months
I personally think there are really 3+1 main features of the treatment effect estimation problem that make it a fascinating & non-standard ML problem, which each gets its own section in our review and a quick discussion below! … 3/8
Tweet media one
1
1
9
@AliciaCurth
Alicia Curth
8 months
Even if time was linear, using SCs to construct estimates of summaries of the control distr. beyond the mean (eg survival curves) will thus usually be biased bc SCs underestimate the number of events occurring in the tails! Ie the shapes of survival curves of SCs will be off 9/n
Tweet media one
1
0
9
@AliciaCurth
Alicia Curth
8 months
Yes! We make use of our smoother-based effective parameter measure from and show that interpolating forests are indeed (i) more smooth when issuing predictions on unseen test inputs than on train inputs, (ii) more smooth than individual trees and … 5/n
Tweet media one
1
1
10
@AliciaCurth
Alicia Curth
9 months
Third, and I’ve personally found this the most unique aspect of CATE estimation, the true label of interest — the difference between POs Y(1)-Y(0) — is never observed, so it’s really not obvious how to design (&choose between) methods that are well-targeted at estimating it! 6/8
Tweet media one
2
1
8
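To make the "label is never observed" point concrete, here is a minimal two-model (T-learner-style) sketch on simulated data, where the true CATE is known only because we simulated it — my own toy setup, not one of the estimators from the review.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
tau = np.where(X[:, 0] > 0, 1.0, 0.0)              # true CATE, known only because we simulate
mu0 = X[:, 1]                                      # baseline (control) outcome surface
W = rng.binomial(1, 0.5, size=n)                   # randomised treatment assignment
Y = mu0 + W * tau + rng.normal(scale=1.0, size=n)  # only ONE potential outcome per unit is observed

# Two-model plug-in estimate of the CATE: fit an outcome model per arm and difference the fits.
m1 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[W == 1], Y[W == 1])
m0 = RandomForestRegressor(n_estimators=200, random_state=0).fit(X[W == 0], Y[W == 0])
cate_hat = m1.predict(X) - m0.predict(X)

# The fundamental problem: this evaluation is possible only in simulation -- on real data
# Y(1)-Y(0) is never observed, so designing and selecting CATE estimators needs other criteria.
print("RMSE vs the (unobservable) true CATE:", np.sqrt(np.mean((cate_hat - tau) ** 2)))
```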
@AliciaCurth
Alicia Curth
8 months
We also observe that the train-test effective parameter gap grows if we use more randomness in tree construction. Our findings are thus in line with the “randomisation as regularisation” viewpoint of Mentch & Zhou (), discussed at length in our paper! 9/n
Tweet media one
1
0
8
@AliciaCurth
Alicia Curth
2 years
… so I still stand by the conclusions of our benchmark paper written almost 2 years ago now (see below). Yet, despite all the time that has passed, I still haven’t found a (ideally real-data-inspired) benchmark for this setting that actually makes me happy on all fronts 🥲
Tweet media one
2
0
8
@AliciaCurth
Alicia Curth
3 years
Super happy to share that our paper on benchmarking practices in CATE estimation (), written in collaboration with D. Svensson & @weatheralljim75 from AZ, also just got accepted to the new NeurIPS21 datasets & benchmarks track 🥳
@MihaelaVDS
Mihaela van der Schaar
3 years
⚠️2 more papers accepted to #NeurIPS2021 , for a grand total of 14 for our lab! Congratulations to lead authors @AliciaCurth and @AlexJChan , whose papers have just been accepted to the Datasets and Benchmarks Track! Updated announcement here:
Tweet media one
0
1
33
1
0
9
@AliciaCurth
Alicia Curth
9 months
Second, (observed) treatment assignment biases often lead to covariate shifts between treatment groups — this is why domain adaptation methods have become so popular in this literature to improve potential outcome (PO) predictions. 5/8
Tweet media one
1
1
8
@AliciaCurth
Alicia Curth
8 months
We can now also show that, as conjectured by Wyner et al (2017), a train-test difference in the level of smoothing used when issuing predictions also appears in boosting! Like standard ensembles, boosted ensembles can be more smooth at test than at training time! 7/n
Tweet media one
1
1
8
@AliciaCurth
Alicia Curth
2 years
@TacoCohen @ShalitUri @WvanAmsterdam We wrote about this at NeurIPS21 in the special case of CATE estimation () and came to the conclusion that good & varied simulations are indeed a way to go—but that reporting also needs to be more transparent in how some DGPs favor some models inherently.
1
0
7