Senior Researcher, Machine Learning
@MSFTResearch
, Statistician at ❤️. In search of statistical intuition for modern ML & simple explanations for complex things 👀
Why do Random Forests perform so well off-the-shelf & appear essentially immune to overfitting?!?
I’ve found the textbook answer “it’s just variance reduction 🤷🏼♀️” to be a bit too unspecific, so in our new pre-print,
@Jeffaresalan
& I investigate… 🕵🏼♀️ 1/n
When Double Descent & Benign
Overfitting became a thing, I was a master's student in statistics — and so confused. I couldn't reconcile what I had literally just learned about bias-variance & co with modern ML
Here's what I wish someone had told me then: 1/n
I spent the first 2.5 years of my PhD on the question “What makes individualised treatment effect estimation an interesting Machine Learning problem (and how do we best solve it)?”. Super excited that a review of lots of things we learned along the way was accepted into… 1/8
Super excited to finally share
@Jeffaresalan
& my
#NeurIPS2023
Oral: 🥳 — a slightly unconventional paper leading to a surprising and (shockingly) simple resolution of the tension between statistical intuition & double descent! 1/3
Part 2: So why DO Random Forests work?! On this, I’ll have to disagree with Elements of Statistical Learning (my first time ever 💔)
EoSL says the success of forests should be understood as a consequence of variance reduction *alone*, but I think that’s not a good intuition 1/n
oh and P.S. I don’t think we will ever be able to top the level of creativity it took to come up with our 3D poster. Most successful arts&crafts project
@Jeffaresalan
or I have ever been involved in 🙏🏻
Having started my PhD in the gathertown era, it’s bittersweet to realise that the most rewarding PhD moments happen in-person at conferences. I’ve had an incredible week putting faces to names & it’s been a surreal experience presenting our work to so many of them. I❤️NeurIPS!
Every StatML intro class covers complexity-error U-curves, so
@Jeffaresalan
& I asked ourselves whether the info from these classes is enough to explain double descent too? Our
#NeurIPS23
paper does a roundtrip of The Elements of Statistical Learning and answers “Yes”! Long🧵1/n
I complain a lot about the general quality of ML reviews (offline), so I try to do better when I review myself. I was a reviewer for two conferences during 2022 (ICML22, AISTATS23), and now received a top reviewer award for both 🥳 Excited to see that this effort pays off!😊
Addendum — if you take away just one thing from this thread, it should be:
Machine learning isn’t some kind of magic that defies the laws of statistics! I believe fundamental concepts from classical statistics will (probably) be “all we need” to understand modern ML!!
BUT… 1/3
Economists seem to LOVE synthetic control methods, so during my MSR internship with
@javiergonzh
we wanted to understand whether we could use them for survival analyses (v prevalent in medicine) too? Delighted that our answer (“It’s complicated!”) was accepted
@Conf_CLeaR
… 1/n
P.S.: Excitingly for me as an ex-econometrician, this project also meant I finally got to learn what's behind all that ✨ synthetic control magic ✨🕵🏼♀️
My lukewarm (?) take: no magic, just some linearity assumptions* doing v heavy lifting in the background 🫢
If you missed us in New Orleans but wanted to hear
@Jeffaresalan
& myself talk about (literal and figurative) U-turns on double descent, it seems that NeurIPS has made all recordings of Orals publicly available!🥳 Find us at minute 35:15 in this recording:
Excited to be back at
#icml
!☀️Find me floating around or come chat to me &
@Jeffaresalan
about our integrated attempt at understanding deep double descent, grokking, linear mode connectivity & differences between gradient boosting and neural nets on Friday at the HiLD workshop!🤓
I am going to Honolulu and I’m bringing … 3 posters!!!🤯🥳🌺 beyond excited & happy that lots of hard work paid off — but also feeling very lucky to have had great coauthors 🤗 as well as the most engaged set of reviewers *and* ACs I’ve seen so far! See you in July
@icmlconf
☀️
Incredibly proud of what my students have achieved with our contributions for
#ICML2023
! We will present a range of our intensive work on causal deep learning, clinical trials, treatment effect estimation, synthetic data and deep learning for tabular data:
brb just quickly recharging the batteries en route to
#NeurIPS2023
to get ready for the highlight of my academic year 🙆🏼♀️☀️ Next up: beyond excited to present our work on double descent with
@Jeffaresalan
as an oral in the first conference session on Tuesday! See you 🔜 NOLA 😎
Been sitting on this for a while now, but we are almost camera-ready so I can finally share: started a new research thread w/
@Jeffaresalan
earlier this year!! Our joint paper goes down a surprising rabbit hole & got rewarded with a NeurIPS Oral!🤯🥳 (Paper dropping next week🔥⏳)
It’s finally time: tomorrow
@Jeffaresalan
& I will be presenting our
#NeurIPS2023
paper on a surprisingly simple resolution to double descent in Oral session 1D at 10:30am in room R06-09 (level 2) 🥳 Beware: it’s a little trek to get to the room (upstairs), don’t miss it 😉
Another year, another amazing
@Conf_CLeaR
!! Had only one complaint last year (the weather on the conference hike…) and even that was perfectly arranged this time☀️
Personal takeaway: small, focused ML conferences are so so great — esp for PhD students & for finding community!
Had the absolute best time at
@CLeaR_2022
in Tübingen the last few days! From great talks & papers to great people, great organisation & great food, this conference had everything I could have hoped for 😍 (except for maybe great weather… ) Really can’t wait for
#CLeaR24
🤓
I’ve spent the last 1.5 years working with the amazing
@Jeffaresalan
on understanding modern ML phenomena, questioning everything we know about statistics in the process. The above is probably one of my biggest yet simplest takeaways!
More here:
19/19
2023’s biggest PhD highlights were def the conferences for me, finally being able to attend in person does make such a difference 🙌🏻
Personal top moments from ICML & NeurIPS below (slightly different vibes)
Beyond excited to share that the first paper of my PhD with
@MihaelaVDS
, on estimating conditional average treatment effects using meta-learners and neural nets, was recently accepted for publication at
#AISTATS2021
!
Paper:
Code:
In other news: just interrupting the usual stats/ML coverage to share completion of my final
@Cambridge_Uni
bucketlist item — being part of
@clarehall_cam
’s first ever women’s crew to win blades in Lent Bumps last week 😱💪🏻 is that Cam telling me it’s time to graduate soon…?🤔
Fun fact: when
@Jeffaresalan
& I fell down the double descent rabbit hole, we were actually looking into another question entirely. Why do simple ensembles continue to work so well in practice?! We learned a lot about Random Forests on the way & have now come full circle: ⬇️🚨👀
Delighted ☀️ to be in Valencia this week to present our paper on heterogeneous treatment effect estimation in the presence of competing risks 🙌🏻😎 I’m extra excited because I FINALLY get to attend
@aistats_conf
in person: it's where my first PhD paper was published back in 2021 🤓
Excited to share the next chapter in my saga on heterogeneous treatment effect estimation (aka my PhD) — to be presented at
@aistats_conf
in April — which features some interesting new characters: competing events! () 1/n
Super excited to share that I’ve not only had my first ever
#NeurIPS
paper accepted, but also my second (joint with C. Lee) and third (led by
@QianZhaozhi
) 🥳🤯 Finished my first year with
@MihaelaVDS
on a high note!🥳
I'm still processing our
#NeurIPS2021
results—12 papers accepted! All I can say is THANK YOU to our superstar lab members for your brilliance and dedication. So proud of you all! Details here:
Turns out, there's quite a simple explanation no one talks about: the intuitions on bias-variance tradeoff & overfitting I was taught apply to in-sample prediction (where only outputs are resampled at test time), while modern ML wants generalization to new inputs, a crucial change! 2/n
Long 🧵to follow soon, for now check out the paper here: ! We learned A LOT about statistics, ML & their history on the way — really hope that people will enjoy reading this paper even half as much as we did writing it! 🤓 3/3
After two years of gathertown, the day has finally come: it’s time for the first in-person presentation & poster of my PhD 🥳 I’ll be presenting at 5:35pm (Room 318) with poster session 6:30-8:30pm —come by if you’d like to chat about imputation (or anything else)!🤓
#ICML2022
Second in line at
#ICML2022
are Daniel Jarrett,
@BCebere
, Tennison Liu,
@AliciaCurth
& I: HyperImpute, a generalised iterative imputation framework. Missing data is a big problem, and here we present THE state-of-the-art tool that can help solve it! 2/2
I have spent *tons* of time in the last couple of years with
@MihaelaVDS
trying to find good benchmarks to evaluate (heterogeneous) causal effect estimators — and am still not really satisfied with what we’ve got (see e.g. our NeurIPS21 critique ) 🥲 …
Interesting take. The comparison with NNs breaks here: with NNs we can easily empirically verify performance (e.g. ImageNet)
What's the ImageNet of causal inference? Maybe going forward we should accept simulations as (supporting) 'proof' instead of just theorems / formal proofs
I put as much effort into reviewing as how I would like my own papers to be reviewed — I think more reviewers should give that a try 😅 Be the change you wish to see in the world right? 😉 (also, if you need an incentive: it might give you a free conference registration!)
What drives the relative empirical performance of ML algorithms for CATE estimation?
Sometimes it's simply the choice of benchmark dataset! With
@MihaelaVDS
, I wrote about this for the
#ICML2021
Workshop on Neglected Assumptions in Causal Inference happening tomorrow.
Want to hear about the next stop on our journey into understanding modern deep learning phenomena? Come find
@Jeffaresalan
& myself in the poster sessions at 10:00 and 15:30 at the workshop on high-dimensional learning dynamics
@icmlconf
in Straus 2 tomorrow! 🙌🏻
#ICML2024
What makes estimating heterogeneous treatment effects from survival data *in the presence of competing events* challenging? We study this new & important problem, and theoretically analyse & empirically illustrate when & how competing events affect ML here
In addition to what we wrote about double descent in , this is thus another reason why double descent does not contradict the bias-variance tradeoff: the bias-variance tradeoff holds in-sample, while double descent *exclusively* appears out-of-sample! 8/n
When I think bias-variance tradeoff, I think about k-nearest neighbor estimators. I was taught that variance increases with complexity while bias decreases with complexity (here: lower k), because for in-sample prediction the 1-NN estimator (the example itself) has zero bias. 3/n
Well, turns out this intuition is actually NOT always true for out-of-sample prediction (ie generalization) 🤯
Because for a new input there is no perfect training match, the bias of the most complex estimator in this class (the 1-NN estimator) is NOT necessarily the lowest 4/n
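A minimal sketch of this point (toy data and a quadratic truth of my own choosing, not from the thread): in-sample, 1-NN interpolates and has zero error, but out of sample even its *expected* prediction is biased, because the nearest neighbour is not the query point itself.

```python
import random

random.seed(0)

def f(x):
    # hypothetical ground-truth regression function (illustration only)
    return x * x

# noisy training data on a grid
X = [i / 10 for i in range(11)]
Y = [f(x) + random.gauss(0, 0.1) for x in X]

def one_nn(x_new):
    # 1-NN estimator: copy the label of the closest training input
    i = min(range(len(X)), key=lambda j: abs(X[j] - x_new))
    return Y[i]

# In-sample: every training input is its own nearest neighbour,
# so 1-NN interpolates -> zero training error, zero in-sample bias.
in_sample_err = max(abs(one_nn(x) - y) for x, y in zip(X, Y))  # 0.0

# Out-of-sample: a new input has no exact training match, so even the
# *expected* 1-NN prediction f(x_nn) differs from the truth f(x_new).
x_new = 0.55
x_nn = min(X, key=lambda x: abs(x - x_new))
bias_at_new = abs(f(x_nn) - f(x_new))  # > 0: 1-NN is NOT unbiased here
```

Nothing here depends on the noise: the out-of-sample gap comes purely from evaluating the truth at the neighbour instead of the query.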
In out-of-sample settings, even for v simple models:
1. there isn't always a tradeoff between bias & variance
2. bias can sometimes get worse with increased complexity
3. overfitting can also be a consequence of bias (not only var)
- and this is crucial for understanding modern ML! 3/3
If you’ve ever wanted to hear me rave about statistics for an hour, I’ve got a belated Christmas present for you: 🎁
Had a great time chatting to
@AleksanderMolak
about causality, double descent, stats and my journey into ML research from econometrics! 🤓
…Clinical Pharmacology & Therapeutics' special issue on Machine Learning 🥳
Link: , with big thanks to my amazing coauthors Richard Peck, Eoin McKinney,
@weatheralljim75
&
@MihaelaVDS
🙌🏻
To briefly answer the question in the top-level tweet… 2/8
We make use of the adaptive nearest neighbor interpretation of trees & forests (eg ) bc that makes them much easier to reason about: trees are simply smoothers with learned weights! We show that this view makes their behaviour intuitive to understand… 2/n
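To make the smoother view concrete, a tiny hand-rolled sketch (toy data; the split threshold stands in for one a tree learner would actually fit): a depth-1 tree's prediction is exactly a weighted average of training outcomes, with uniform weights inside the leaf the query lands in.

```python
# Toy data; the split threshold t stands in for a learned one.
X = [0.1, 0.2, 0.4, 0.7, 0.9]
Y = [1.0, 1.2, 0.8, 2.0, 2.2]
t = 0.5

def stump_weights(x_new):
    # w_i(x) = 1{x_i lands in the same leaf as x} / (size of that leaf)
    leaf = [i for i in range(len(X)) if (X[i] < t) == (x_new < t)]
    return [1.0 / len(leaf) if i in leaf else 0.0 for i in range(len(X))]

def stump_predict(x_new):
    # the tree is a linear smoother: prediction = sum_i w_i(x) * y_i
    return sum(w * y for w, y in zip(stump_weights(x_new), Y))

print(stump_predict(0.3))   # mean of the left leaf: (1.0 + 1.2 + 0.8) / 3
```

Deeper trees only change the weights w_i(x), never the fact that the prediction is a weighted average of the y_i; that is what makes trees & forests easy to reason about as smoothers.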
Super excited to be speaking about our work on using machine learning for discovering & understanding treatment effect heterogeneity at
@AIClubBioMed
this week 🤓
Next event alert! Join us on Thur 4th May to explore this month's theme: "Machine Learning for Clinical Decision Making". We have two exciting talks from Alicia Curth
@AliciaCurth
and Vincent Jeanselme
@JeanselmeV
, followed by pizza. See you there 👀
@TheMilnerInst
@CRUK_CI
Had the absolute pleasure of learning from the amazing
@dennisfrauen
about sensitivity analysis when he visited us in Cambridge over the summer to work on this paper! 🤓 Go check it out here: 🙌🏻
Here’s a simulation example of this: while bias behaves monotonically as expected in-sample, it doesn’t out of sample! Indeed, for k<10, *there is no bias-variance tradeoff* out-of-sample: both bias and variance prefer estimators with lower complexity in this region! 5/n
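A back-of-the-envelope version of such a simulation (my own toy setup, not the paper's experiment): the squared bias of the noise-averaged k-NN prediction is exactly zero in-sample at k=1 and grows with k, but is positive out of sample even at k=1.

```python
import random

random.seed(0)

def f(x):
    # hypothetical truth (illustration only)
    return x * x

X = sorted(random.random() for _ in range(50))        # training inputs

def knn_expected_pred(x_new, k):
    # Taking expectations over the output noise leaves the average of
    # f at the k nearest training inputs.
    nearest = sorted(X, key=lambda x: abs(x - x_new))[:k]
    return sum(f(x) for x in nearest) / k

def avg_sq_bias(points, k):
    return sum((knn_expected_pred(x, k) - f(x)) ** 2 for x in points) / len(points)

X_new = [random.random() for _ in range(50)]          # fresh test inputs

print(avg_sq_bias(X, 1))          # 0.0: in-sample, 1-NN is exactly unbiased
print(avg_sq_bias(X, 10) > 0)     # in-sample bias appears as k grows
print(avg_sq_bias(X_new, 1) > 0)  # out of sample, even 1-NN is biased
```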
Aloha
#icml2023
, I'm excited for a big day full of posters!🌺 If you're interested to chat about all things treatment effects, come at 11am to discuss model selection with me (
#415
) & informative sampling with
@ToonVDSchueren
(
#514
), and at 2pm for adaptive trials (
#415
)!
In conclusion, I thus think we need to rethink how intuitions around bias-variance tradeoffs and overfitting are taught — in particular, more precision in vocabulary may be needed when we talk about these things to make clear when intuitions are likely to apply and why! 16/n
First tweet, big news: just graduated
@UniofOxford
with a MSc in Statistical Science, received a prize by
@OxfordStats
for overall performance on the MSc AND joined the amazing
@MihaelaVDS
as a PhD student at
@FacultyMaths
! 2020 certainly saved all the good things for the end 🥳
Investigating the double descent phenomenon outside of deep learning, we went down 2 rabbit holes so deep they led us i) to completely deconstruct non-deep double descent (incl. linear regressions) & ii) back in time to the 90s when smoothers were the SOTA of stats! 🤯 2/3
So this note is what I would have needed during my master's to see that things aren't as different & difficult as they may seem.
I think topics as fundamental as bias-variance tradeoff & overfitting should remain accessible to any grad student in statistics or ML!🤓18/n
How does this relate to understanding double descent and benign overfitting? First, this makes clear that it isn't (only) interpolation or overparametrization or modern ML breaking classical statistical intuitions — the move from in-sample to out-of-sample preds is crucial too! 6/n
Interested in our recent work on treatment effect heterogeneity?🤓Come chat during today’s
#NeurIPS
poster session at 4:30pm GMT/8:30am PST 🥳 (I’ll be bringing my Time-Turner to try being at multiple posters simultaneously 🪄🧙)
#NeurIPS2021
is off to a busy start! A total of 4 poster sessions later today for papers by lab members
@AliciaCurth
, Changhee Lee,
@QianZhaozhi
, Yao Zhang, and
@IoanaBica95
. Definitely worth a look for anyone interested in treatment effects! More info:
Next Monday (June 12, 4pm BST), our lab is hosting an inspiration exchange where I'll be presenting lots of our newest work on ML for personalized treatment effect estimation with
@MihaelaVDS
! 🤓🥳 More info about attending online:
1/3
I wrote CATENets back in the first year of my PhD (& it was the first “real deep learning” I ever did!), mainly to have a tool to understand & benchmark lots of existing (&new) methods using fair, comparable, implementations, so super excited that it actually (still) gets used 🥳
⭕ CATENets ()
Developed by
@AliciaCurth
- a researcher at van der Schaar Lab - the package offers a unique set of deep-learning based CATE estimators. From original architectures designed by Curth and van der Schaar (SNet, FlexTENet) to...
🧵 (7/n)
This is also my answer to last week’s big discussion on overfitting (I’m a little late😅): whether we should still worry about overfitting today really depends on which setting one is interested in (do train inputs reappear?), and on how models interpolate the training data! 15/n
I had a wonderful day yesterday chatting to
@AleksanderMolak
about causality, machine learning research and life more generally 😍 really cannot wait to see the final product of his visit, stay tuned 👀👀
Concluding thought: For real progress on many of these questions, I think what the CATE ML literature is really missing is good & realistic benchmark datasets that exhibit actual complexities of real-world data to evaluate how well our methods are *actually* doing… 8/8
NB: this may have been obvious to some, but it really wasn't for me. To be honest, for a couple of years I thought I probably wouldn't ever understand the modern stuff — I'm not a learning theorist, and whenever I tried reading papers on the topic, explanations went over my head. 17/n
Also, the focus on in-sample prediction in statistics is probably a reason for the historical absence of such phenomena in the literature: it's easy to see that they CANNOT occur in in-sample settings, as all interpolating models (indep. of size) make the same predictions in-sample! 7/n
We argue that bias-in-mean is NOT a useful notion of bias when comparing trees & forests. Why?
Maybe surprisingly, the expected predictor of the class of trees is NOT necessarily itself a member of the class of trees (cf. below) 🤯 Thus, we maybe shouldn't compare expectations! 3/n
What about benign overfitting? Well, I’d say to understand that we first need to be a little more precise about vocabulary. Quite literally, overfitting cannot be benign as the term itself implies that performance suffers.
Instead, ask: When can *interpolation* be benign? 9/n
I was taught that interpolation causes overfitting as a consequence of noise in outcomes. Turns out, this intuition is probably once more a relic of in-sample prediction!
For in-sample preds, interpolation indeed simply CANNOT be benign. Here, it’s all about variance! 10/n
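A quick Monte Carlo check of the in-sample claim (constant truth and arbitrary noise level, purely illustrative): under the fixed-design protocol, where the same inputs get freshly resampled outputs, an interpolating fit pays the memorised training noise on top of the irreducible noise, roughly doubling the MSE.

```python
import random

random.seed(0)
sigma, n, trials = 0.3, 20, 2000

def f(x):
    return 0.0   # constant truth keeps the arithmetic transparent

X = [i / n for i in range(n)]
# An interpolating fit simply predicts the noisy training outputs:
y_train = [f(x) + random.gauss(0, sigma) for x in X]

interp_mse = opt_mse = 0.0
for _ in range(trials):
    # fixed design: SAME inputs, freshly drawn noisy outputs
    y_new = [f(x) + random.gauss(0, sigma) for x in X]
    interp_mse += sum((yt - yn) ** 2 for yt, yn in zip(y_train, y_new)) / n
    opt_mse += sum((f(x) - yn) ** 2 for x, yn in zip(X, y_new)) / n
interp_mse /= trials
opt_mse /= trials

# interp_mse ~ 2 * sigma**2 (memorised noise + fresh noise),
# opt_mse   ~     sigma**2 (irreducible): interpolation in-sample is
# pure extra variance, never benign.
```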
I myself might be at home with big
#NeurIPS2022
-FOMO but fortunately the amazing
@IoanaBica95
&
@JonathanICrabbe
are in New Orleans to present & discuss our work on benchmarking treatment effect estimators 🥳— if you’re interested, you can catch them at todays poster session!🤓
Today at
#NeurIPS2022
in the Datasets and Benchmarks Track, we’ll be presenting our work on “Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability”
With:
@JonathanICrabbe
,
@AliciaCurth
,
@MihaelaVDS
🚩Hall J, poster
#1014
- 11a.m. to 1p.m.
Want to chat some more about ML for heterogeneous treatment effect estimation?🤓 Come join me at today‘s
#NeurIPS
datasets & benchmarks Poster session at 4:30pm GMT to discuss better benchmarking for CATE estimation 🥳🤗
Today's our lab's busiest day at
#NeurIPS2021
! A real variety of papers on show, with topics ranging from data imputation to fairness in synthetic data, understanding/empowering decision-making, benchmarking for treatment effects, and more! Details here:
Also, some practical takeaways:
1) Not all smoothing is good (as always, you can also overdo it).
2) The effect of hyperparameters seems to be VERY different on in- and out-of-sample performance.
3) More trees probably never hurts though.
👀 👈🏻
10/10
To summarise what we’ve discovered so far: forests are sometimes much smoother than trees when making predictions! But WHY does this mean that forests perform & generalize better than trees??
That’s in part 2 of the paper — which I’ll discuss tomorrow 🤓 Stay tuned…👀
Interested to learn more about our work on heterogeneous treatment effect estimation & competing risks (or anything else)?🤓Then stop by for a chat at our
@aistats_conf
poster at 16:30 (spot 79) today 🥳☀️
In the same session,
@AliciaCurth
& I theoretically analyse & empirically illustrate when and how competing risks play a role in using generic machine learning prediction models for the estimation of heterogeneous treatment effects.
A classical k-NN estimator cannot do that: a 1-NN estimator ALWAYS uses exactly 1 neighbor.
Many modern ML methods, however, implicitly do something different. Eg we showed recently that random forests can be 1-NN estimators at train time but k-NN estimators at test time… 13/n
It makes it easy to see that differences between predictions of trees and forests appear when & where problems are *underspecified*: e.g. individual interpolating trees will behave like 1-NN estimators everywhere, but ensembles thereof may act as k-NN estimators at test time! 3/n
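A two-point sketch of this underspecification (entirely made-up toy trees, not from the paper): each tree isolates every training point in its own leaf, so both interpolate, yet at a new input they disagree and the ensemble averages several training outcomes.

```python
# Two interpolating "trees" on 2-D inputs: tree A splits on feature 0,
# tree B on feature 1. Each leaf holds one training point, so both trees
# interpolate the training data (1-NN-like at train time).
X = [(0.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0]

def tree_a(x):
    return Y[0] if x[0] < 0.5 else Y[1]

def tree_b(x):
    return Y[0] if x[1] < 0.5 else Y[1]

def forest(x):
    return (tree_a(x) + tree_b(x)) / 2

train_preds = [forest(x) for x in X]    # [0.0, 1.0]: still interpolating

# At a new input off the diagonal the trees disagree, and the forest
# averages BOTH training outcomes: 2-NN-like behaviour at test time only.
test_pred = forest((1.0, 0.0))          # 0.5
```

At the training inputs the two trees agree exactly, so the averaging only ever kicks in where the data leave the problem underspecified.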
This relates to older computer science perspectives on the success of ensembles (): lost to the stats literature, they argue that ensembles can reduce both *model variability* AND *representational bias* given fixed data! That means there are at least… 5/n
Ready for Round 2!🤓 Today I’ll be presenting our new line of work on adaptive clinical trials at the ReALML workshop (Room 309), with spotlight talk after 11:40 and poster session at 17:05! 🥳 Stop by to discuss how to make clinical trials more efficient using ideas from ML! 🙌🏻
Next up will be
@AliciaCurth
, Alihan Hüyük & I with our contributions to the Adaptive Experimental Design & Active Learning in the Real World workshop (). Adaptively identifying good patient populations & good arms! This might transform clinical trials.2/2
TLDR? Forests improve upon trees bc they issue predictions that are smoother functions of the training outcomes. This reduces both the effect of outcome noise AND enriches the available hypothesis space, esp for out-of-sample predictions.
Maybe EoSL needs a little update 😉 9/n
Indeed, we show in for linear regs, forests & boosting that if we distinguish train and test complexity using effective param measures, you find that benignly interpolating models make less complex preds at test than train time. (Neural nets out soon!)14/n
Their conclusion that variance reduction alone makes forests better than trees is based on the fact that trees & forests have the same expectation and hence the same bias-in-mean. By the classical bias-var decomp of the MSE, all gain must thus come from the variance. True - BUT 2/n
Instead, we argue that a natural candidate for evaluating bias is the performance of the best-in-class predictor — which can&will differ between trees and forests!
Conceptually, the class of forests interpolates between all possible tree predictions and is thus much richer! 4/n
If you’re looking for more entertaining content regarding our joint work on double descent (and more), my amazing coauthor
@Jeffaresalan
has got you covered…
Finally, note that we usually study simple binary CATE static settings in the ML literature, but real problems have so many more layers of complexity, e.g. censoring, informative sampling, more complex treatment types & temporal structures. Shouldn’t we handle all jointly? 7/8
(iii) more smooth when more randomness is used in tree construction! We also show that the train-test difference in the level of smoothing is not limited to interpolating trees. BUT “spiked-smooth” behavior does appear more pronounced the more overfitted individual trees are. 6/n
This is a nice intuition & relates to what Wyner et al () conjectured to be the “spiked-smooth” behaviour & driver of success of interpolating forests. But could we use our smoother-setup to somehow *quantify* whether this intuition is actually correct? 4/n
… so check out the paper () to learn more, and to see some experiments illustrating these theoretical arguments on real data! I had a wonderful time working with my amazing coauthors at MSR on this & learned a lot on the way! 🤓🤗 11/11
First, forecasting under intervention on treatments ofc requires strong identifiability assumptions that are a data problem, not a learning problem.
Ie garbage in, garbage out (ML cannot do magic…) — BUT ML might be able to help make assumptions more likely to hold. 4/8
Things change for new inputs: here models can be both more and less overfitted than when in-sample prediction is of interest.
Overfitting can be worse out of sample than in-sample because an interpolating model with zero bias in-sample can have substantial bias out-of-sample 11/n
But some interpolating models CAN also be less overfit out-of-sample than in-sample. This is what happens eg in the second descent in double descent, and intuitively is a consequence of models that can behave differently around new examples than around training examples:… 12/n
We also demonstrate that the addition of bootstrapping to the random forest procedure has an additional smoothing effect — this time on both test- AND train-time predictions!! Bootstrapping thus really makes a difference when in-sample predictions are of interest! 8/n
I personally think there are really 3+1 main features of the treatment effect estimation problem that make it a fascinating & non-standard ML problem, which each gets its own section in our review and a quick discussion below! … 3/8
Even if time was linear, using SCs to construct estimates of summaries of the control distr. beyond the mean (eg survival curves) will thus usually be biased bc SCs underestimate the number of events occurring in the tails! Ie the shapes of survival curves of SCs will be off 9/n
Yes! We make use of our smoother-based effective parameter measure from and show that interpolating forests are indeed (i) more smooth when issuing predictions on unseen test inputs than on train inputs, (ii) more smooth than individual trees and … 5/n
Third, and I’ve personally found this the most unique aspect of CATE estimation, the true label of interest — the difference between POs Y(1)-Y(0) — is never observed, so it’s really not obvious how to design (&choose between) methods that are well-targeted at estimating it! 6/8
We also observe that the train-test effective parameter gap grows if we use more randomness in tree construction. Our findings are thus in line with the “randomisation as regularisation” viewpoint of Mentch & Zhou (), discussed at length in our paper! 9/n
… so I am still with the conclusions of our benchmark paper written almost 2 years ago now (see below). Yet, despite all this time passed, I still haven't found an (ideally real-data-inspired) benchmark for this setting that actually makes me happy on all fronts 🥲
Super happy to share that our paper on benchmarking practices in CATE estimation (), written in collaboration with D. Svensson &
@weatheralljim75
from AZ, also just got accepted to the new NeurIPS21 datasets & benchmarks track 🥳
⚠️2 more papers accepted to
#NeurIPS2021
, for a grand total of 14 for our lab! Congratulations to lead authors
@AliciaCurth
and
@AlexJChan
, whose papers have just been accepted to the Datasets and Benchmarks Track! Updated announcement here:
Second, (observed) treatment assignment biases often lead to covariate shifts between treatment groups — this is why domain adaptation methods have become so popular in this literature to improve potential outcome (PO) predictions. 5/8
We can now also show that, as conjectured by Wyner et al (2017), a train-test difference in the level of smoothing used when issuing predictions also appears in boosting! Like standard ensembles, boosted ensembles can be more smooth at test than at training time! 7/n
@TacoCohen
@ShalitUri
@WvanAmsterdam
We wrote about this at NeurIPS21 in the special case of CATE estimation () and came to the conclusion that good & varied simulations are indeed a way to go—but that reporting also needs to be more transparent in how some DGPs favor some models inherently.