Michael Oberst

@MichaelOberst

Followers: 1,748
Following: 963
Media: 13
Statuses: 222

Incoming Assistant Professor of Computer Science at @JohnsHopkins , postdoc at @CarnegieMellon . PhD from @MIT_CSAIL . Reliable ML & Causality for Healthcare.

Cambridge, MA
Joined August 2011
Pinned Tweet
@MichaelOberst
Michael Oberst
9 months
I'm recruiting PhD students for my lab at Johns Hopkins! Please apply if you're interested in reliable ML / causal inference for decision-making in healthcare. See my website () for more info. Deadline 12/15. Retweets welcome :)
Tweet media one
8
114
351
@MichaelOberst
Michael Oberst
1 year
It's official! I'll be joining @JohnsHopkins as an Assistant Professor of Computer Science in summer 2024 - in the interim I'll be a postdoc at @CarnegieMellon in the Machine Learning Department working with @zacharylipton . Excited for the next chapter!
Tweet media one
33
18
492
@MichaelOberst
Michael Oberst
3 years
When can we learn predictive models that are robust to shifts in unobserved variables? With co-authors @nikolajthams , Jonas Peters, and @david_sontag , we tackle this question in our recent ICML paper. Paper: Video: [1/n]
Tweet media one
2
32
107
@MichaelOberst
Michael Oberst
2 years
How should you evaluate the worst-case performance of your model under distribution shift, with only data from the training distribution? Preprint with @david_sontag , @nikolajthams , at SCIS (Fri) and PODS (Sat) workshops at #ICML2022 Paper: (1/6)
Tweet media one
4
26
86
@MichaelOberst
Michael Oberst
4 years
When doing causal inference (e.g., comparing effectiveness of drugs, or evaluating a treatment policy) how do you characterize the population to whom your conclusions apply? Belated thread on our AISTATS-20 paper [1/5] Paper Video
2
22
62
@MichaelOberst
Michael Oberst
5 years
I'll be presenting "Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models" at 2:20pm today in the Grand Ballroom at #ICML2019 @icmlconf . Please also stop by our poster (72) at 6:30pm! This is work with my advisor @david_sontag .
Tweet media one
1
7
47
@MichaelOberst
Michael Oberst
2 years
Just got my badge at #ICML2022 , it’s been a while! Excited to reconnect with old friends and meet new ones - DM me if you’re interested in chatting about causality and/or distributional robustness!
Tweet media one
0
1
38
@MichaelOberst
Michael Oberst
9 months
I'll be at NeurIPS next week - excited to see old friends and meet new ones with similar research interests (reliable ML/causal inference/ML for healthcare). Hit me up (DM or email) if you'll be at NeurIPS and would like to chat! I'm also recruiting PhD students for Fall 2024.
@MichaelOberst
Michael Oberst
9 months
I'm recruiting PhD students for my lab at Johns Hopkins! Please apply if you're interested in reliable ML / causal inference for decision-making in healthcare. See my website () for more info. Deadline 12/15. Retweets welcome :)
Tweet media one
8
114
351
2
1
34
@MichaelOberst
Michael Oberst
8 months
Worth reading, esp. for PhD students stressed about # of pubs. Refreshing to have a take from someone who isn't "on the other side", as Elan puts it. Since Elan is on the job market, I should note he really puts this into practice - I have deep respect/admiration for his work.
@ElanRosenfeld
Elan Rosenfeld
8 months
Another round of "publication incentives are messed up". He's right, of course, but the people who say this are almost always those who publish lots and are no longer beholden to the game. This is something I've thought a lot about. As a senior PhD student, I have no... 1/n
1
10
134
1
1
30
@MichaelOberst
Michael Oberst
2 years
I'm at #NeurIPS2022 ! Presenting (w/ @david_sontag ) 1⃣ Evaluating Robustness to Dataset Shift via Parametric Robustness Sets (Tue 4pm, #313 ) 2⃣ Falsification before Extrapolation in Causal Effect Estimation (Thu 11am, #813 ) 🧵(1/3)
1
6
27
@MichaelOberst
Michael Oberst
5 years
The Whova app schedule at #ICML2019 does not appear to be robust to adversarial words like “lunch”...
Tweet media one
1
2
23
@MichaelOberst
Michael Oberst
2 years
Come check out the poster in the Spurious Correlations, Invariance, and Stability (SCIS) workshop at #ICML2022 ! Come chat 1:30pm-2:45pm today (Fri), in room 340, and again from 5:45pm-7pm.
@MichaelOberst
Michael Oberst
2 years
How should you evaluate the worst-case performance of your model under distribution shift, with only data from the training distribution? Preprint with @david_sontag , @nikolajthams , at SCIS (Fri) and PODS (Sat) workshops at #ICML2022 Paper: (1/6)
Tweet media one
4
26
86
0
3
20
@MichaelOberst
Michael Oberst
2 years
2⃣ How should you use experimental data in causal inference, if it *excludes* the population you care about? Idea: Uncover bias in observational data that *does* cover the relevant population (w/ @zeshanmh @rg_shih @david_sontag ) 📅 Poster: Thu 11am, #813
Tweet media one
0
5
13
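The falsification idea in the tweet above can be illustrated with a minimal sketch: an observational estimator is "falsified" if, on the subgroup the experimental data covers, its estimate differs significantly from the RCT benchmark. This is a toy illustration of the idea, not the paper's actual estimator; all names and inputs here are hypothetical summary statistics.

```python
import numpy as np

def falsification_screen(obs_estimates, obs_ses, rct_estimate, rct_se, z=1.96):
    """Flag observational estimators inconsistent with an RCT benchmark.

    A simple z-test per estimator: falsified if its estimate on the
    RCT-covered subgroup differs significantly from the RCT estimate.
    """
    keep = []
    for est, se in zip(obs_estimates, obs_ses):
        stat = abs(est - rct_estimate) / np.sqrt(se**2 + rct_se**2)
        keep.append(bool(stat < z))  # True = consistent with the RCT, not falsified
    return keep

# Example: the second observational estimator disagrees sharply with the RCT
result = falsification_screen([1.0, 5.0], [0.5, 0.5], rct_estimate=1.2, rct_se=0.5)
```

Unfalsified estimators can then be used to extrapolate to the population the RCT excludes.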
@MichaelOberst
Michael Oberst
2 years
If you're at #NeurIPS2022 , come check out this work today from 11am-1pm at poster #813 !
@MichaelOberst
Michael Oberst
2 years
2⃣ How should you use experimental data in causal inference, if it *excludes* the population you care about? Idea: Uncover bias in observational data that *does* cover the relevant population (w/ @zeshanmh @rg_shih @david_sontag ) 📅 Poster: Thu 11am, #813
Tweet media one
0
5
13
0
1
13
@MichaelOberst
Michael Oberst
2 years
For anyone looking at PhD programs in machine learning, having a great advisor is (by far) the most important consideration. Fredrik is both brilliant and kind - don't sleep on the opportunity to work with him!
@frejohk
Fredrik Johansson
2 years
We have a PhD student position open in my group—deadline Sunday this week (Oct 2)!
4
26
69
1
0
11
@MichaelOberst
Michael Oberst
5 years
After a long day of travel delays, finally arrived in Vancouver for #NeurIPS2019 !
0
0
10
@MichaelOberst
Michael Oberst
2 years
1⃣ How sensitive is your prediction model to changes in the causal data-generating process? We give a method for worst-case evaluation under a flexible class of user-specified changes. (w/ @nikolajthams @david_sontag ) 📅 Poster: Tue (today!) 4pm, #313
Tweet media one
1
4
9
@MichaelOberst
Michael Oberst
2 years
Very cool idea from colleagues at MIT, with a lot of useful applications, check it out!
@andrew_ilyas
Andrew Ilyas
2 years
Come hear about work on datamodels () at ICML *tomorrow* in the Deep Learning/Optimization track (Rm 309)! The presentation is at 4:50 with a poster session at 6:30. Joint work with @smsampark @logan_engstrom @gpoleclerc @aleks_madry
1
19
63
0
0
7
@MichaelOberst
Michael Oberst
3 years
We prove that one noisy proxy is *not* sufficient to recover the distributional robustness guarantees of Anchor Regression, but that these guarantees can be recovered with two noisy proxies (and we give an algorithm for doing so). [5/n]
1
0
7
@MichaelOberst
Michael Oberst
2 years
Well deserved!
@irenetrampoline
Irene Chen
2 years
Life update! Starting summer 2023, I'll be an Assistant Professor in UC Berkeley + UCSF's new Computational Precision Health program () with a joint appt in Berkeley EECS. Starting Aug 2022, I'll be a Postdoc at Microsoft Research New England
Tweet media one
100
34
2K
0
0
6
@MichaelOberst
Michael Oberst
4 years
No research today, taking part in #ShutDownSTEM (see thread). If you're on Twitter today, try searching #BlackintheIvory and letting some of the stories sink in. #ShutDownAcademia #StrikeforBlackLives
@JohnCUrschel
John Urschel
4 years
This Wednesday, June 10th, in conjunction with #ParticlesForJustice , I am taking part in #ShutDownSTEM , a one day event for African Americans in STEM to rest, and for others to educate themselves and reflect. #BlackLivesMatter #BlackandSTEM 1/20
9
158
396
0
0
6
@MichaelOberst
Michael Oberst
3 years
In many real-world domains (e.g., medicine), important confounding factors are not observed, like socioeconomic status. If these factors change between train and test (e.g., moving to a new hospital) predictive performance may suffer. [2/n]
1
0
5
@MichaelOberst
Michael Oberst
3 years
We tackle this problem for the case of linear causal models, building on prior work (Anchor Regression) that assumes the relevant factors are observed during training. We instead assume that only noisy proxies are available. [4/n]
1
0
5
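For context on the baseline this thread builds on: Anchor Regression (Rothenhäusler et al.) trades off in-distribution fit against robustness to shifts in the anchor variables. A minimal numpy sketch of the observed-anchor case follows; this is the prior work's estimator as commonly stated, not the paper's two-proxy method, and the data here is synthetic.

```python
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Anchor Regression sketch: minimize
    ||(I - P_A)(Y - Xb)||^2 + gamma * ||P_A(Y - Xb)||^2,
    where P_A projects onto the column span of the anchors A.
    Equivalent to OLS on data transformed by W = I + (sqrt(gamma) - 1) P_A.
    """
    P = A @ np.linalg.pinv(A)                        # projection onto span(A)
    W = np.eye(len(Y)) + (np.sqrt(gamma) - 1.0) * P
    b, *_ = np.linalg.lstsq(W @ X, W @ Y, rcond=None)
    return b

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 1))                        # observed anchor
X = A @ np.array([[1.5]]) + rng.normal(size=(200, 2))
Y = X @ np.array([2.0, -1.0]) + rng.normal(size=200)

b_ols = anchor_regression(X, Y, A, gamma=1.0)        # gamma = 1 recovers OLS
b_rob = anchor_regression(X, Y, A, gamma=10.0)       # larger gamma: more anchor-robust
```

The thread's contribution concerns what happens when A is not observed and only noisy proxies are available.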
@MichaelOberst
Michael Oberst
5 years
Submit to the ML for Health (ML4H) Workshop at NeurIPS 2019! This year's theme: What makes ML in Medicine different? Deadline is Friday Sept 13th.
0
1
5
@MichaelOberst
Michael Oberst
3 years
We also extend Anchor Regression to more targeted robustness sets (encoding prior knowledge about feasible shifts), and show that guarantees can also be recovered in this setting with two proxies. [6/n]
1
0
5
@MichaelOberst
Michael Oberst
2 years
@zacharylipton Sounds related to under-reporting in causal inference (when you observe A = 0, could be that no treatment was given, or that you just didn't record it). @suchisaria @royjamesadams have something on this.
1
0
5
@MichaelOberst
Michael Oberst
4 years
@david_sontag Shout-out to @tompollard and the team at Physionet for helping make this dataset release happen!
0
0
4
@MichaelOberst
Michael Oberst
7 months
@rasbt @andersonbcdefg FWIW, giving credit where it's due, it does look like the former (Liu et al. 2024) cite the latter (Mitchell et al. 2023) and discuss distinctions b/w the papers in the related work, while acknowledging the equation is largely the same (Sec 7, second para of Liu et al.)
0
0
4
@MichaelOberst
Michael Oberst
3 years
Come to our talk on Thursday at 11:40pm ET to hear more! [end]
Tweet media one
0
0
4
@MichaelOberst
Michael Oberst
6 years
@irenetrampoline presenting “Why is my Classifier Discriminatory” at #NeurIPS18 . Practical ways to decompose and reduce unfairness (in terms of accuracy), esp relevant in healthcare. @frejohk @david_sontag
Tweet media one
Tweet media two
Tweet media three
0
0
4
@MichaelOberst
Michael Oberst
3 years
So how do we learn models with robust *predictive* performance under these shifts? We take a distributional robustness viewpoint, minimizing a worst-case loss under changes to the distribution of unobserved factors. [3/n]
1
0
3
@MichaelOberst
Michael Oberst
8 months
@cyrilzakka @HiesingerLab Sounds like a cool initiative! I noticed that the "view on GitHub" link takes you to an empty repo () - is there a plan to develop this as an open-source project? Or a place to find more documentation about the details?
1
0
1
@MichaelOberst
Michael Oberst
4 years
We give an algorithm (using Boolean rule sets) to characterize this population, simple enough to be published alongside a study. That way, clinicians can incorporate this knowledge as they read about your results! [3/5]
1
0
3
@MichaelOberst
Michael Oberst
4 years
And of course, shout-out to wonderful co-authors @frejohk , Dennis Wei, Tian Gao, @bratogram , @david_sontag , and @krvarshney . [5/5]
0
0
3
@MichaelOberst
Michael Oberst
4 years
@rahulgk is an inspiring scientist, and an even better friend / roommate! Glad to still have him around in Cambridge for the next year ;)
@rahulgk
Rahul G. Krishnan
4 years
I graduated with my PhD from @MITEECS . In Fall 2021, I'll be an assistant professor jointly in CS ( @UofTCompSci ) and Medicine ( @uoftmedicine ) & a member of the Vector Institute ( @VectorInst ). For the next year, I'm excited to be a researcher at MSR New England ( @MSRNE ).
33
12
506
0
0
2
@MichaelOberst
Michael Oberst
4 years
Consider evaluation of a new treatment policy - estimates of policy value will only apply to the types of patients who are both (a) well-represented in the study, and (b) are observed being treated according to the proposed policy. [2/5]
1
0
2
@MichaelOberst
Michael Oberst
9 months
@jivatneet Thank you Jivat! Hope all is well at Berkeley :)
0
0
2
@MichaelOberst
Michael Oberst
4 years
Check out the full paper for more details - the supplement includes recommendations on hyper-parameter tuning, backed up by experiments, as well as further experiments on characterizing overlap in policy evaluation. [4/5]
1
0
2
@MichaelOberst
Michael Oberst
2 years
@CasualBrady @emrek @ehudkar You can think of OverRule as learning a set of "inclusion criteria" that describe the population with reasonable propensity scores. E.g., in a dataset of post-surgical opioid prescriptions, the rules highlight that overlap does not hold for C-section surgeries (Fig. 4)
1
0
2
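A toy, one-dimensional stand-in for what OverRule does with Boolean rule sets: estimate where propensity scores are bounded away from 0 and 1, then describe that region with an interpretable rule. Here the "rule set" is a single interval on one feature, purely for illustration; the setup is synthetic and not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
# Treatment is nearly deterministic for large |x|, so overlap fails in the tails
ps = 1.0 / (1.0 + np.exp(-3.0 * x))        # propensity score P(T = 1 | x)
overlap = (ps > 0.1) & (ps < 0.9)          # region with reasonable propensities

# Describe the overlap region with a single interval rule "lo <= x <= hi",
# a one-feature stand-in for OverRule's Boolean rule sets
lo, hi = x[overlap].min(), x[overlap].max()
rule = (x >= lo) & (x <= hi)
accuracy = np.mean(rule == overlap)
```

The learned rule ("x between lo and hi") is simple enough to report alongside a study, which is the point of the tweet above.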
@MichaelOberst
Michael Oberst
2 years
@CasualBrady @emrek @ehudkar General goal is to (a) report results on a population where overlap holds, but also (b) make it clear to the reader (of your study) what that population is. Dropping samples with extreme propensity scores helps with (a) but not (b), which is where OverRule comes in...[1/2]
1
1
1
@MichaelOberst
Michael Oberst
2 years
@NickSpies13 @david_sontag @nikolajthams great question! First, a nice thing about this approach is that we only need to model aspects of the *changing* distributions; e.g., in the lab test example where that is the only shift, one could have a much larger feature space w/o significantly changing the difficulty
1
0
1
@MichaelOberst
Michael Oberst
2 years
We focus on estimating the worst-case loss over a particular robustness set of distributions, defined by parametric "shift functions" that alter one or more conditional distributions. (2/6)
1
0
1
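The "shift function" idea can be sketched with a toy example, assuming the simplest possible case: a single Gaussian variable A whose mean shifts from 0 to delta, with losses reweighted by the implied likelihood ratio. This illustrates the robustness-set concept only; it is not the paper's estimator, and the variable names are hypothetical.

```python
import numpy as np

def worst_case_loss(losses, a, deltas):
    """Worst-case expected loss under parametric mean shifts N(0,1) -> N(delta,1).

    Reweights training losses by the density ratio exp(delta*a - delta^2/2)
    and maximizes over the user-specified robustness set of shifts.
    """
    worst = -np.inf
    for d in deltas:
        w = np.exp(d * a - d**2 / 2.0)      # N(d,1) / N(0,1) likelihood ratio
        worst = max(worst, float(np.mean(w * losses)))
    return worst

rng = np.random.default_rng(0)
a = rng.normal(size=5000)
losses = (a > 0).astype(float)              # model does worse when A is large
wc = worst_case_loss(losses, a, deltas=[0.0, 0.5, 1.0])
```

With only the null shift (delta = 0) this reduces to the ordinary training loss; larger shifts upweight the regions where the model fails.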
@MichaelOberst
Michael Oberst
4 years
@SanjatKanjilal @Dxgnosis In terms of introductory books, I highly recommend Introduction to Statistical Learning () and Pearl's causal inference primer ()!
1
0
1
@MichaelOberst
Michael Oberst
5 years
See this thread for a quick explainer:
@david_sontag
David Sontag
5 years
@MichaelOberst and I tackle the following question in our upcoming ICML 2019 paper, motivated by our lab's research of ML in healthcare: How do you build trust in a new policy learned by reinforcement learning from observational data?
Tweet media one
1
14
58
1
0
1
@MichaelOberst
Michael Oberst
4 years
@AndrewLBeam @graduatedescent @srush_nlp My understanding, if helpful: This usage of "nonparametric" is for *discrete* X, Y. Linear model with all interactions is the same as empirical mean E[Y | X = x] for every obs. value of X. For continuous X, can't do this, hence the need for some assumptions (e.g., smoothness)
2
0
1
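The claim in the tweet above (a saturated linear model on discrete X equals the empirical conditional mean) is easy to verify numerically. A small synthetic check with two binary features:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 2))       # two binary features -> 4 cells
Y = 1.0 * X[:, 0] + 2.0 * X[:, 1] + 3.0 * X[:, 0] * X[:, 1] + rng.normal(size=500)

# Saturated design: intercept, x1, x2, and the interaction x1*x2
D = np.column_stack([np.ones(500), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
preds = D @ beta
```

Within every cell, the fitted value equals the empirical mean E[Y | X = x], which is why "nonparametric" is literal in the discrete case.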
@MichaelOberst
Michael Oberst
4 years
@lycanduo @WvanAmsterdam @ShalitUri @frejohk @rahulgk @harvineet_singh @angelamczhou @nathankallus Basically yes; Say you observe X', Z', and you've posited some f_z(X, eps_Z) such that P(Z = z' | X = x') = P(f_z(x', eps_Z) = z'), you can now get a posterior distribution p(eps_Z | X', Z'), and you simulate forward with that
0
0
1
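The abduction-then-forward-simulation recipe described above can be sketched for the Gumbel-Max SCM from the ICML 2019 paper mentioned earlier in the feed. This sketch uses rejection sampling as a simple (and inefficient) stand-in for exact posterior inference over the noise; the function and its arguments are illustrative, not the paper's code.

```python
import numpy as np

def gumbel_max_counterfactual(logp_obs, logp_new, z_obs, n=2000, rng=None):
    """Counterfactual distribution under a Gumbel-Max SCM.

    Model: Z = argmax_z (log p(z) + G_z), with G_z i.i.d. Gumbel(0, 1).
    Abduction by rejection: keep Gumbel draws consistent with the observed
    outcome z_obs under logp_obs, then re-apply them under logp_new.
    """
    rng = rng or np.random.default_rng(0)
    cf = []
    while len(cf) < n:
        g = rng.gumbel(size=len(logp_obs))
        if np.argmax(logp_obs + g) == z_obs:      # abduction: posterior over noise
            cf.append(int(np.argmax(logp_new + g)))  # action + prediction
    return np.bincount(cf, minlength=len(logp_new)) / n

logp = np.log(np.array([0.6, 0.3, 0.1]))
logp_cf = np.log(np.array([0.1, 0.3, 0.6]))
dist = gumbel_max_counterfactual(logp, logp_cf, z_obs=0)
```

Because the same noise draw is reused, counterfactuals under unchanged logits reproduce the observed outcome exactly (the "counterfactual stability" property the paper is built around).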
@MichaelOberst
Michael Oberst
8 months
@_scott_fleming_ Thank you Scott! That’s very kind of you to say :)
0
0
0
@MichaelOberst
Michael Oberst
5 years
@tianweisheng No worries, I just thought it was a fun bit of irony given the topic =)
0
0
1
@MichaelOberst
Michael Oberst
2 years
This allows users to constrain shifts to be plausible: In a toy healthcare example, the unconstrained (and unrealistic) worst-case shift = "only order tests for healthy patients". We can avoid this failure mode by constraining the manifold of the shift. (3/6)
1
0
1
@MichaelOberst
Michael Oberst
2 years
@NickSpies13 @david_sontag @nikolajthams this is implicit in the image application, where our approach does *not* need to model the distribution of images given attributes, which we assume is fixed... the GAN is not part of the method, just used to get a "ground truth" evaluation of performance under a new distribution
1
0
1
@MichaelOberst
Michael Oberst
4 years
@graduatedescent @AndrewLBeam @srush_nlp Generally agree, in this context I translate "nonparametric" as "most flexible possible model for the conditional distribution", at which point it's very literal in the discrete case 😉, and closer to how the term is used in e.g., nonparametric regression with continuous vars
0
0
1
@MichaelOberst
Michael Oberst
4 years
@lycanduo @WvanAmsterdam @ShalitUri @frejohk @rahulgk @harvineet_singh @angelamczhou @nathankallus Graphically, you can always replace e.g., X -> Y with X -> Y <- eps_y, such that Y = f_y(X, eps_y). Basically turn every observed variable into a function, and add jointly independent noise terms that give you all the obs variation
2
0
1
@MichaelOberst
Michael Oberst
2 years
Missed our poster at the Spurious Correlation workshop? You're in luck! We're presenting the poster again at the Principles of Distribution Shift (PODS) workshop at #ICML2022 Poster Session today (Sat) from 11:50am-12:30pm, in Ballroom 3
@MichaelOberst
Michael Oberst
2 years
How should you evaluate the worst-case performance of your model under distribution shift, with only data from the training distribution? Preprint with @david_sontag , @nikolajthams , at SCIS (Fri) and PODS (Sat) workshops at #ICML2022 Paper: (1/6)
Tweet media one
4
26
86
0
0
1
@MichaelOberst
Michael Oberst
4 years
@tw_killian Thanks Taylor! Happy to chat about it anytime
0
0
1
@MichaelOberst
Michael Oberst
2 years
@NickSpies13 @david_sontag @nikolajthams second, this also extends to only needing "partial" causal knowledge to get a causal interpretation: We only need to know the causal parents of variables where the mechanism is changing.
0
0
1
@MichaelOberst
Michael Oberst
2 years
We illustrate using a computer vision task w/GAN-generated images of faces, where shifts occur in the distribution of attributes (e.g., eyeglasses, smiling). Our method reveals sensitivity to changes in potentially problematic associations (e.g., women wearing lipstick). (5/6)
1
0
1
@MichaelOberst
Michael Oberst
4 years
@lycanduo @WvanAmsterdam @ShalitUri @frejohk @rahulgk @harvineet_singh @angelamczhou @nathankallus ofc, how you infer eps_y after observing X, Y, depends on the chosen function f_y and the distribution of eps_y. Could be many choices that yield same obs/int. distribution, but different counterfactual distributions, see e.g., Section 3.1 of
0
0
1