Senior Researcher, Machine Learning
@MSFTResearch
, Statistician at ❤️. In search of statistical intuition for modern ML & simple explanations for complex things 👀
Why do Random Forests perform so well off-the-shelf & appear essentially immune to overfitting?!?
I’ve found the textbook answer “it’s just variance reduction 🤷🏼♀️” to be a bit too unspecific, so in our new pre-print,
@Jeffaresalan
& I investigate… 🕵🏼♀️ 1/n
When Double Descent & Benign
Overfitting became a thing, I was a master's student in statistics — and so confused. I couldn't reconcile what I had literally just learned about bias-variance & co with modern ML
Here's what I wish someone had told me then: 1/n
I spent the first 2.5 years of my PhD on the question “What makes individualised treatment effect estimation an interesting Machine Learning problem (and how do we best solve it)?”. Super excited that a review of lots of things we learned along the way was accepted into… 1/8
Super excited to finally share
@Jeffaresalan
& my
#NeurIPS2023
Oral: 🥳 — a slightly unconventional paper leading to a surprising and (shockingly) simple resolution of the tension between statistical intuition & double descent! 1/3
Part 2: So why DO Random Forests work?! On this, I’ll have to disagree with Elements of Statistical Learning (my first time ever 💔)
EoSL says the success of forests should be understood as a consequence of variance reduction *alone*, but I think that’s not a good intuition 1/n
oh and P.S. I don’t think we will ever be able to top the level of creativity it took to come up with our 3D poster. Most successful arts&crafts project
@Jeffaresalan
or I have ever been involved in 🙏🏻
Having started my PhD in the gathertown era, it’s bittersweet to realise that the most rewarding PhD moments happen in-person at conferences. I’ve had an incredible week putting faces to names & it’s been a surreal experience presenting our work to so many of them. I❤️NeurIPS!
Every StatML intro class covers complexity-error U-curves, so
@Jeffaresalan
& I asked ourselves whether the info from these classes is enough to explain double descent too? Our
#NeurIPS23
paper does a roundtrip of The Elements of Statistical Learning and answers “Yes”! Long🧵1/n
I complain a lot about the general quality of ML reviews (offline), so I try to do better when I review myself. I was a reviewer for two conferences during 2022 (ICML22, AISTATS23), and now received a top reviewer award for both 🥳 Excited to see that this effort pays off!😊
Addendum — if you take away just one thing from this thread, it should be:
Machine learning isn’t some kind of magic that defies the laws of statistics! I believe fundamental concepts from classical statistics will (probably) be “all we need” to understand modern ML!!
BUT… 1/3
Economists seem to LOVE synthetic control methods, so during my MSR internship with
@javiergonzh
we wanted to understand whether we could use them for survival analyses (v prevalent in medicine) too? Delighted that our answer (“It’s complicated!”) was accepted
@Conf_CLeaR
… 1/n
P.S.: Excitingly for me as an ex-econometrician, this project also meant I finally got to learn what's behind all that ✨ synthetic control magic ✨🕵🏼♀️
My lukewarm (?) take: no magic, just some linearity assumptions* doing v heavy lifting in the background 🫢
If you missed us in New Orleans but wanted to hear
@Jeffaresalan
& myself talk about (literal and figurative) U-turns on double descent, it seems that NeurIPS has made all recordings of Orals publicly available!🥳 Find us at minute 35:15 in this recording:
Excited to be back at
#icml
!☀️Find me floating around or come chat to me &
@Jeffaresalan
about our integrated attempt at understanding deep double descent, grokking, linear mode connectivity & differences between gradient boosting and neural nets on Friday at the HiLD workshop!🤓
I am going to Honolulu and I’m bringing … 3 posters!!!🤯🥳🌺 beyond excited & happy that lots of hard work paid off — but also feeling very lucky to have had great coauthors 🤗 as well as the most engaged set of reviewers *and* ACs I’ve seen so far! See you in July
@icmlconf
☀️
Incredibly proud of what my students have achieved with our contributions for
#ICML2023
! We will present a range of our intensive work on causal deep learning, clinical trials, treatment effect estimation, synthetic data and deep learning for tabular data:
brb just quickly recharging the batteries en route to
#NeurIPS2023
to get ready for the highlight of my academic year 🙆🏼♀️☀️ Next up: beyond excited to present our work on double descent with
@Jeffaresalan
as an oral in the first conference session on Tuesday! See you 🔜 NOLA 😎
Been sitting on this for a while now, but we are almost camera-ready so I can finally share: started a new research thread w/
@Jeffaresalan
earlier this year!! Our joint paper goes down a surprising rabbit hole & got rewarded with a NeurIPS Oral!🤯🥳 (Paper dropping next week🔥⏳)
It’s finally time: tomorrow
@Jeffaresalan
& I will be presenting our
#NeurIPS2023
paper on a surprisingly simple resolution to double descent in Oral session 1D at 10:30am in room R06-09 (level 2) 🥳 Beware: it’s a little trek to get to the room (upstairs), don’t miss it 😉
Another year, another amazing
@Conf_CLeaR
!! Had only one complaint last year (the weather on the conference hike…) and even that was perfectly arranged this time☀️
Personal takeaway: small, focused ML conferences are so so great — esp for PhD students & for finding community!
Had the absolute best time at
@CLeaR_2022
in Tübingen the last few days! From great talks & papers to great people, great organisation & great food, this conference had everything I could have hoped for 😍 (except for maybe great weather… ) Really can’t wait for
#CLeaR24
🤓
I’ve spent the last 1.5 years working with the amazing
@Jeffaresalan
on understanding modern ML phenomena, questioning everything we know about statistics in the process. The above is probably one of my biggest yet simplest takeaways!
More here:
19/19
2023’s biggest PhD highlights were def the conferences for me, finally being able to attend in person does make such a difference 🙌🏻
Personal top moments from ICML & NeurIPS below (slightly different vibes)
Beyond excited to share that the first paper of my PhD with
@MihaelaVDS
, on estimating conditional average treatment effects using meta-learners and neural nets, was recently accepted for publication at
#AISTATS2021
!
Paper:
Code:
In other news: just interrupting the usual stats/ML coverage to share completion of my final
@Cambridge_Uni
bucketlist item — being part of
@clarehall_cam
’s first ever women’s crew to win blades in Lent Bumps last week 😱💪🏻 is that Cam telling me it’s time to graduate soon…?🤔
Fun fact: when
@Jeffaresalan
& I fell down the double descent rabbit hole, we were actually looking into another question entirely. Why do simple ensembles continue to work so well in practice?! We learned a lot about Random Forests on the way & have now come full circle: ⬇️🚨👀
Delighted ☀️ to be in Valencia this week to present our paper on heterogeneous treatment effect estimation in the presence of competing risks 🙌🏻😎 I’m extra excited because I FINALLY get to attend
@aistats_conf
in person: it's where my first PhD paper was published back in 2021 🤓
Excited to share the next chapter in my saga on heterogeneous treatment effect estimation (aka my PhD) — to be presented at
@aistats_conf
in April — which features some interesting new characters: competing events! () 1/n
Super excited to share that I’ve not only had my first ever
#NeurIPS
paper accepted, but also my second (joint with C. Lee) and third (led by
@QianZhaozhi
) 🥳🤯 Finished my first year with
@MihaelaVDS
on a high note!🥳
I'm still processing our
#NeurIPS2021
results—12 papers accepted! All I can say is THANK YOU to our superstar lab members for your brilliance and dedication. So proud of you all! Details here:
Turns out, there's quite a simple explanation no one talks about: the intuitions on bias-variance tradeoff & overfitting I was taught apply to in-sample prediction (where only outputs are resampled at test time), while modern ML wants generalization to new inputs, a crucial change! 2/n
Long 🧵to follow soon, for now check out the paper here: ! We learned A LOT about statistics, ML & their history on the way — really hope that people will enjoy reading this paper even half as much as we did writing it! 🤓 3/3
After two years of gathertown, the day has finally come: it’s time for the first in-person presentation & poster of my PhD 🥳 I’ll be presenting at 5:35pm (Room 318) with poster session 6:30-8:30pm —come by if you’d like to chat about imputation (or anything else)!🤓
#ICML2022
Second in line at
#ICML2022
are Daniel Jarrett,
@BCebere
, Tennison Liu,
@AliciaCurth
& I: HyperImpute, a generalised iterative imputation framework. Missing data is a big problem, and here we present THE state-of-the-art tool that can help solve it! 2/2
I have spent *tons* of time in the last couple of years with
@MihaelaVDS
trying to find good benchmarks to evaluate (heterogeneous) causal effect estimators — and am still not really satisfied with what we’ve got (see e.g. our NeurIPS21 critique ) 🥲 …
Interesting take. The comparison with NNs breaks here: with NNs we can easily empirically verify performance (e.g. ImageNet)
What's the ImageNet of causal inference? Maybe going forward we should accept simulations as (supporting) 'proof' instead of just theorems / formal proofs
I put as much effort into reviewing as how I would like my own papers to be reviewed — I think more reviewers should give that a try 😅 Be the change you wish to see in the world right? 😉 (also, if you need an incentive: it might give you a free conference registration!)
What drives the relative empirical performance of ML algorithms for CATE estimation?
Sometimes it's simply the choice of benchmark dataset! With
@MihaelaVDS
, I wrote about this for the
#ICML2021
Workshop on Neglected Assumptions in Causal Inference happening tomorrow.
Want to hear about the next stop on our journey into understanding modern deep learning phenomena? Come find
@Jeffaresalan
& myself in the poster sessions at 10:00 and 15:30 at the workshop on high-dimensional learning dynamics
@icmlconf
in Straus 2 tomorrow! 🙌🏻
#ICML2024
What makes estimating heterogeneous treatment effects from survival data *in the presence of competing events* challenging? We study this new & important problem, and theoretically analyse & empirically illustrate when & how competing events affect ML here
In addition to what we wrote about double descent in , this is thus another reason why double descent does not contradict the bias-variance tradeoff: the bias-variance tradeoff holds in-sample, while double descent *exclusively* appears out-of-sample! 8/n
When I think bias-variance tradeoff, I think about k-nearest neighbor estimators. I was taught that variance increases with complexity while bias decreases with complexity (here: lower k), because for in-sample prediction the 1-NN estimator (the example itself) has zero bias. 3/n
Well, turns out this intuition is actually NOT always true for out-of-sample prediction (ie generalization) 🤯
Because for a new input there is no perfect training match, the bias of the most complex estimator in this class (the 1-NN estimator) is NOT necessarily the lowest 4/n
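A minimal sketch of this point (toy data and a quadratic truth of my own choosing, not from the thread): in-sample, 1-NN interpolates and has zero error, but out of sample even its *expected* prediction is biased, because the nearest neighbour is not the query point itself.

```python
import random

random.seed(0)

def f(x):
    # hypothetical ground-truth regression function (illustration only)
    return x * x

# noisy training data on a grid
X = [i / 10 for i in range(11)]
Y = [f(x) + random.gauss(0, 0.1) for x in X]

def one_nn(x_new):
    # 1-NN estimator: copy the label of the closest training input
    i = min(range(len(X)), key=lambda j: abs(X[j] - x_new))
    return Y[i]

# In-sample: every training input is its own nearest neighbour,
# so 1-NN interpolates -> zero training error, zero in-sample bias.
in_sample_err = max(abs(one_nn(x) - y) for x, y in zip(X, Y))  # 0.0

# Out-of-sample: a new input has no exact training match, so even the
# *expected* 1-NN prediction f(x_nn) differs from the truth f(x_new).
x_new = 0.55
x_nn = min(X, key=lambda x: abs(x - x_new))
bias_at_new = abs(f(x_nn) - f(x_new))  # > 0: 1-NN is NOT unbiased here
```

Nothing here depends on the noise: the out-of-sample gap comes purely from evaluating the truth at the neighbour instead of the query.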
In out-of-sample settings, even for v simple models:
1. there isn't always a tradeoff between bias & variance
2. bias can sometimes get worse with increased complexity
3. overfitting can also be a consequence of bias (not only var)
- and this is crucial for understanding modern ML! 3/3
If you’ve ever wanted to hear me rave about statistics for an hour, I’ve got a belated Christmas present for you: 🎁
Had a great time chatting to
@AleksanderMolak
about causality, double descent, stats and my journey into ML research from econometrics! 🤓
…Clinical Pharmacology & Therapeutics' special issue on Machine Learning 🥳
Link: , with big thanks to my amazing coauthors Richard Peck, Eoin McKinney,
@weatheralljim75
&
@MihaelaVDS
🙌🏻
To briefly answer the question in the top-level tweet… 2/8
We make use of the adaptive nearest neighbor interpretation of trees & forests (eg ) bc that makes them much easier to reason about: trees are simply smoothers with learned weights! We show that this view makes their behaviour intuitive to understand… 2/n
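To make the smoother view concrete, a tiny hand-rolled sketch (toy data; the split threshold stands in for one a tree learner would actually fit): a depth-1 tree's prediction is exactly a weighted average of training outcomes, with uniform weights inside the leaf the query lands in.

```python
# Toy data; the split threshold t stands in for a learned one.
X = [0.1, 0.2, 0.4, 0.7, 0.9]
Y = [1.0, 1.2, 0.8, 2.0, 2.2]
t = 0.5

def stump_weights(x_new):
    # w_i(x) = 1{x_i lands in the same leaf as x} / (size of that leaf)
    leaf = [i for i in range(len(X)) if (X[i] < t) == (x_new < t)]
    return [1.0 / len(leaf) if i in leaf else 0.0 for i in range(len(X))]

def stump_predict(x_new):
    # the tree is a linear smoother: prediction = sum_i w_i(x) * y_i
    return sum(w * y for w, y in zip(stump_weights(x_new), Y))

print(stump_predict(0.3))   # mean of the left leaf: (1.0 + 1.2 + 0.8) / 3
```

Deeper trees only change the weights w_i(x), never the fact that the prediction is a weighted average of the y_i; that is what makes trees & forests easy to reason about as smoothers.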
Super excited to be speaking about our work on using machine learning for discovering & understanding treatment effect heterogeneity at
@AIClubBioMed
this week 🤓
Next event alert! Join us on Thur 4th May to explore this month's theme: "Machine Learning for Clinical Decision Making". We have two exciting talks from Alicia Curth
@AliciaCurth
and Vincent Jeanselme
@JeanselmeV
, followed by pizza. See you there 👀
@TheMilnerInst
@CRUK_CI
Had the absolute pleasure of learning from the amazing
@dennisfrauen
about sensitivity analysis when he visited us in Cambridge over the summer to work on this paper! 🤓 Go check it out here: 🙌🏻
Here’s a simulation example of this: while bias behaves monotonically as expected in-sample, it doesn’t out of sample! Indeed, for k<10, *there is no bias-variance tradeoff* out-of-sample: both bias and variance prefer estimators with lower complexity in this region! 5/n
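A back-of-the-envelope version of such a simulation (my own toy setup, not the paper's experiment): the squared bias of the noise-averaged k-NN prediction is exactly zero in-sample at k=1 and grows with k, but is positive out of sample even at k=1.

```python
import random

random.seed(0)

def f(x):
    # hypothetical truth (illustration only)
    return x * x

X = sorted(random.random() for _ in range(50))        # training inputs

def knn_expected_pred(x_new, k):
    # Taking expectations over the output noise leaves the average of
    # f at the k nearest training inputs.
    nearest = sorted(X, key=lambda x: abs(x - x_new))[:k]
    return sum(f(x) for x in nearest) / k

def avg_sq_bias(points, k):
    return sum((knn_expected_pred(x, k) - f(x)) ** 2 for x in points) / len(points)

X_new = [random.random() for _ in range(50)]          # fresh test inputs

print(avg_sq_bias(X, 1))          # 0.0: in-sample, 1-NN is exactly unbiased
print(avg_sq_bias(X, 10) > 0)     # in-sample bias appears as k grows
print(avg_sq_bias(X_new, 1) > 0)  # out of sample, even 1-NN is biased
```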
Aloha
#icml2023
, I'm excited for a big day full of posters!🌺 If you're interested to chat about all things treatment effects, come at 11am to discuss model selection with me (
#415
) & informative sampling with
@ToonVDSchueren
(
#514
), and at 2pm for adaptive trials (
#415
)!
In conclusion, I thus think we need to rethink how intuitions around bias-variance tradeoffs and overfitting are taught — in particular, more precision in vocabulary may be needed when we talk about these things to make clear when intuitions are likely to apply and why! 16/n
First tweet, big news: just graduated
@UniofOxford
with a MSc in Statistical Science, received a prize by
@OxfordStats
for overall performance on the MSc AND joined the amazing
@MihaelaVDS
as a PhD student at
@FacultyMaths
! 2020 certainly saved all the good things for the end 🥳
Investigating the double descent phenomenon outside of deep learning, we went down 2 rabbit holes so deep they led us i) to completely deconstruct non-deep double descent (incl. linear regressions) & ii) back in time to the 90s when smoothers were the SOTA of stats! 🤯 2/3
So this note is what I would have needed during my master's to see that things aren't as different & difficult as they may seem.
I think topics as fundamental as bias-variance tradeoff & overfitting should remain accessible to any grad student in statistics or ML!🤓18/n
How does this relate to understanding double descent and benign overfitting? First, this makes clear that it isn't (only) interpolation or overparametrization or modern ML breaking classical statistical intuitions — the move from in-sample to out-of-sample preds is crucial too! 6/n
Interested in our recent work on treatment effect heterogeneity?🤓Come chat during today’s
#NeurIPS
poster session at 4:30pm GMT/8:30am PST 🥳 (I’ll be bringing my Time-Turner to try being at multiple posters simultaneously 🪄🧙)
#NeurIPS2021
is off to a busy start! A total of 4 poster sessions later today for papers by lab members
@AliciaCurth
, Changhee Lee,
@QianZhaozhi
, Yao Zhang, and
@IoanaBica95
. Definitely worth a look for anyone interested in treatment effects! More info:
Next Monday (June 12, 4pm BST), our lab is hosting an inspiration exchange where I'll be presenting lots of our newest work on ML for personalized treatment effect estimation with
@MihaelaVDS
! 🤓🥳 More info about attending online:
1/3
I wrote CATENets back in the first year of my PhD (& it was the first “real deep learning” I ever did!), mainly to have a tool to understand & benchmark lots of existing (&new) methods using fair, comparable, implementations, so super excited that it actually (still) gets used 🥳
⭕ CATENets ()
Developed by
@AliciaCurth
- a researcher at van der Schaar Lab - the package offers a unique set of deep-learning based CATE estimators. From original architectures designed by Curth and van der Schaar (SNet, FlexTENet) to...
🧵 (7/n)
This is also my answer to last week’s big discussion on overfitting (I’m a little late😅): whether we should still worry about overfitting today really depends on which setting one is interested in (do train inputs reappear?), and on how models interpolate the training data! 15/n
I had a wonderful day yesterday chatting to
@AleksanderMolak
about causality, machine learning research and life more generally 😍 really cannot wait to see the final product of his visit, stay tuned 👀👀
Concluding thought: For real progress on many of these questions, I think what the CATE ML literature is really missing is good & realistic benchmark datasets that exhibit actual complexities of real-world data to evaluate how well our methods are *actually* doing… 8/8
NB: this may have been obvious to some, but it really wasn't for me. To be honest, for a couple of years I thought I probably wouldn't ever understand the modern stuff — I'm not a learning theorist, and whenever I tried reading papers on the topic, explanations went over my head. 17/n
Also, the focus on in-sample prediction in statistics is probably a reason for the historical absence of such phenomena in the literature: it's easy to see that they CANNOT occur in in-sample settings, as all interpolating models (indep. of size) make the same predictions in-sample! 7/n
We argue that bias-in-mean is NOT a useful notion of bias when comparing trees & forests. Why?
Maybe surprisingly, the expected predictor of the class of trees is NOT necessarily itself a member of the class of trees (cf. below) 🤯 Thus, we maybe shouldn't compare expectations! 3/n
What about benign overfitting? Well, I’d say to understand that we first need to be a little more precise about vocabulary. Quite literally, overfitting cannot be benign as the term itself implies that performance suffers.
Instead, ask: When can *interpolation* be benign? 9/n
I was taught that interpolation causes overfitting as a consequence of noise in outcomes. Turns out, this intuition is probably once more a relic of in-sample prediction!
For in-sample preds, interpolation indeed simply CANNOT be benign. Here, it’s all about variance! 10/n
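A quick Monte Carlo check of the in-sample claim (constant truth and arbitrary noise level, purely illustrative): under the fixed-design protocol, where the same inputs get freshly resampled outputs, an interpolating fit pays the memorised training noise on top of the irreducible noise, roughly doubling the MSE.

```python
import random

random.seed(0)
sigma, n, trials = 0.3, 20, 2000

def f(x):
    return 0.0   # constant truth keeps the arithmetic transparent

X = [i / n for i in range(n)]
# An interpolating fit simply predicts the noisy training outputs:
y_train = [f(x) + random.gauss(0, sigma) for x in X]

interp_mse = opt_mse = 0.0
for _ in range(trials):
    # fixed design: SAME inputs, freshly drawn noisy outputs
    y_new = [f(x) + random.gauss(0, sigma) for x in X]
    interp_mse += sum((yt - yn) ** 2 for yt, yn in zip(y_train, y_new)) / n
    opt_mse += sum((f(x) - yn) ** 2 for x, yn in zip(X, y_new)) / n
interp_mse /= trials
opt_mse /= trials

# interp_mse ~ 2 * sigma**2 (memorised noise + fresh noise),
# opt_mse   ~     sigma**2 (irreducible): interpolation in-sample is
# pure extra variance, never benign.
```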
I myself might be at home with big
#NeurIPS2022
-FOMO but fortunately the amazing
@IoanaBica95
&
@JonathanICrabbe
are in New Orleans to present & discuss our work on benchmarking treatment effect estimators 🥳— if you’re interested, you can catch them at todays poster session!🤓
Today at
#NeurIPS2022
in the Datasets and Benchmarks Track, we’ll be presenting our work on “Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability”
With:
@JonathanICrabbe
,
@AliciaCurth
,
@MihaelaVDS
🚩Hall J, poster
#1014
- 11a.m. to 1p.m.
Want to chat some more about ML for heterogeneous treatment effect estimation?🤓 Come join me at today‘s
#NeurIPS
datasets & benchmarks Poster session at 4:30pm GMT to discuss better benchmarking for CATE estimation 🥳🤗
Today's our lab's busiest day at
#NeurIPS2021
! A real variety of papers on show, with topics ranging from data imputation to fairness in synthetic data, understanding/empowering decision-making, benchmarking for treatment effects, and more! Details here:
Also, some practical takeaways:
1) Not all smoothing is good (as always, you can also overdo it).
2) The effect of hyperparameters seems to be VERY different on in- and out-of-sample performance.
3) More trees probably never hurts though.
👀 👈🏻
10/10
To summarise what we’ve discovered so far: forests are sometimes much smoother than trees when making predictions! But WHY does this mean that forests perform & generalize better than trees??
That’s in part 2 of the paper — which I’ll discuss tomorrow 🤓 Stay tuned…👀
Interested to learn more about our work on heterogeneous treatment effect estimation & competing risks (or anything else)?🤓Then stop by for a chat at our
@aistats_conf
poster at 16:30 (spot 79) today 🥳☀️
In the same session,
@AliciaCurth
& I theoretically analyse & empirically illustrate when and how competing risks play a role in using generic machine learning prediction models for the estimation of heterogeneous treatment effects.
A classical k-NN estimator cannot do that: a 1-NN estimator ALWAYS uses exactly 1 neighbor.
Many modern ML methods, however, implicitly do something different. Eg we showed recently that random forests can be 1-NN estimators at train time but k-NN estimators at test time… 13/n
It makes it easy to see that differences between predictions of trees and forests appear when & where problems are *underspecified*: e.g. individual interpolating trees will behave like 1-NN estimators everywhere, but ensembles thereof may act as k-NN estimators at test time! 3/n
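A two-point sketch of this underspecification (entirely made-up toy trees, not from the paper): each tree isolates every training point in its own leaf, so both interpolate, yet at a new input they disagree and the ensemble averages several training outcomes.

```python
# Two interpolating "trees" on 2-D inputs: tree A splits on feature 0,
# tree B on feature 1. Each leaf holds one training point, so both trees
# interpolate the training data (1-NN-like at train time).
X = [(0.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0]

def tree_a(x):
    return Y[0] if x[0] < 0.5 else Y[1]

def tree_b(x):
    return Y[0] if x[1] < 0.5 else Y[1]

def forest(x):
    return (tree_a(x) + tree_b(x)) / 2

train_preds = [forest(x) for x in X]    # [0.0, 1.0]: still interpolating

# At a new input off the diagonal the trees disagree, and the forest
# averages BOTH training outcomes: 2-NN-like behaviour at test time only.
test_pred = forest((1.0, 0.0))          # 0.5
```

At the training inputs the two trees agree exactly, so the averaging only ever kicks in where the data leave the problem underspecified.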
This relates to older computer science perspectives on the success of ensembles (): lost to the stats literature, they argue that ensembles can reduce both *model variability* AND *representational bias* given fixed data! That means there are at least… 5/n
Ready for Round 2!🤓 Today I’ll be presenting our new line of work on adaptive clinical trials at the ReALML workshop (Room 309), with spotlight talk after 11:40 and poster session at 17:05! 🥳 Stop by to discuss how to make clinical trials more efficient using ideas from ML! 🙌🏻
Next up will be
@AliciaCurth
, Alihan Hüyük & I with our contributions to the Adaptive Experimental Design & Active Learning in the Real World workshop (). Adaptively identifying good patient populations & good arms! This might transform clinical trials.2/2
TLDR? Forests improve upon trees bc they issue predictions that are smoother functions of the training outcomes. This reduces both the effect of outcome noise AND enriches the available hypothesis space, esp for out-of-sample predictions.
Maybe EoSL needs a little update 😉 9/n
Indeed, we show in for linear regs, forests & boosting that if we distinguish train and test complexity using effective param measures, you find that benignly interpolating models make less complex preds at test than train time. (Neural nets out soon!)14/n
Their conclusion that variance reduction alone makes forests better than trees is based on the fact that trees & forests have the same expectation and hence the same bias-in-mean. By the classical bias-var decomp of the MSE, all gain must thus come from the variance. True - BUT 2/n
Instead, we argue that a natural candidate for evaluating bias is the performance of the best-in-class predictor — which can&will differ between trees and forests!
Conceptually, the class of forests interpolates between all possible tree predictions and is thus much richer! 4/n
If you’re looking for more entertaining content regarding our joint work on double descent (and more), my amazing coauthor
@Jeffaresalan
has got you covered…
Finally, note that we usually study simple binary CATE static settings in the ML literature, but real problems have so many more layers of complexity, e.g. censoring, informative sampling, more complex treatment types & temporal structures. Shouldn’t we handle all jointly? 7/8
(iii) more smooth when more randomness is used in tree construction! We also show that the train-test difference in the level of smoothing is not limited to interpolating trees. BUT “spiked-smooth” behavior does appear more pronounced the more overfitted individual trees are. 6/n
This is a nice intuition & relates to what Wyner et al () conjectured to be the “spiked-smooth” behaviour & driver of success of interpolating forests. But could we use our smoother-setup to somehow *quantify* whether this intuition is actually correct? 4/n
… so check out the paper () to learn more, and to see some experiments illustrating these theoretical arguments on real data! I had a wonderful time working with my amazing coauthors at MSR on this & learned a lot on the way! 🤓🤗 11/11
First, forecasting under intervention on treatments ofc requires strong identifiability assumptions that are a data problem, not a learning problem.
Ie garbage in, garbage out (ML cannot do magic…) — BUT ML might be able to help make assumptions more likely to hold. 4/8
Things change for new inputs: here models can be both more and less overfitted than when in-sample prediction is of interest.
Overfitting can be worse out of sample than in-sample because an interpolating model with zero bias in-sample can have substantial bias out-of-sample 11/n
But some interpolating models CAN also be less overfit out-of-sample than in-sample. This is what happens eg in the second descent in double descent, and intuitively is a consequence of models that can behave differently around new examples than around training examples:… 12/n
We also demonstrate that the addition of bootstrapping to the random forest procedure has an additional smoothing effect — this time on both test- AND train-time predictions!! Bootstrapping thus really makes a difference when in-sample predictions are of interest! 8/n
I personally think there are really 3+1 main features of the treatment effect estimation problem that make it a fascinating & non-standard ML problem, which each gets its own section in our review and a quick discussion below! … 3/8
Even if time was linear, using SCs to construct estimates of summaries of the control distr. beyond the mean (eg survival curves) will thus usually be biased bc SCs underestimate the number of events occurring in the tails! Ie the shapes of survival curves of SCs will be off 9/n
Yes! We make use of our smoother-based effective parameter measure from and show that interpolating forests are indeed (i) more smooth when issuing predictions on unseen test inputs than on train inputs, (ii) more smooth than individual trees and … 5/n
Third, and I’ve personally found this the most unique aspect of CATE estimation, the true label of interest — the difference between POs Y(1)-Y(0) — is never observed, so it’s really not obvious how to design (&choose between) methods that are well-targeted at estimating it! 6/8
We also observe that the train-test effective parameter gap grows if we use more randomness in tree construction. Our findings are thus in line with the “randomisation as regularisation” viewpoint of Mentch & Zhou (), discussed at length in our paper! 9/n
… so I am still with the conclusions of our benchmark paper written almost 2 years ago now (see below). Yet, despite all this time passed, I still haven't found an (ideally real-data-inspired) benchmark for this setting that actually makes me happy on all fronts 🥲
Super happy to share that our paper on benchmarking practices in CATE estimation (), written in collaboration with D. Svensson &
@weatheralljim75
from AZ, also just got accepted to the new NeurIPS21 datasets & benchmarks track 🥳
⚠️2 more papers accepted to
#NeurIPS2021
, for a grand total of 14 for our lab! Congratulations to lead authors
@AliciaCurth
and
@AlexJChan
, whose papers have just been accepted to the Datasets and Benchmarks Track! Updated announcement here:
Second, (observed) treatment assignment biases often lead to covariate shifts between treatment groups — this is why domain adaptation methods have become so popular in this literature to improve potential outcome (PO) predictions. 5/8
We can now also show that, as conjectured by Wyner et al (2017), a train-test difference in the level of smoothing used when issuing predictions also appears in boosting! Like standard ensembles, boosted ensembles can be more smooth at test than at training time! 7/n
@TacoCohen
@ShalitUri
@WvanAmsterdam
We wrote about this at NeurIPS21 in the special case of CATE estimation () and came to the conclusion that good & varied simulations are indeed a way to go—but that reporting also needs to be more transparent in how some DGPs favor some models inherently.