Michael Oberst

@MichaelOberst

Followers: 1,748
Following: 963
Media: 13
Statuses: 222

Incoming Assistant Professor of Computer Science at @JohnsHopkins , postdoc at @CarnegieMellon . PhD from @MIT_CSAIL . Reliable ML & Causality for Healthcare.

Cambridge, MA
Joined August 2011
Pinned Tweet
@MichaelOberst
Michael Oberst
9 months
I'm recruiting PhD students for my lab at Johns Hopkins! Please apply if you're interested in reliable ML / causal inference for decision-making in healthcare. See my website () for more info. Deadline 12/15. Retweets welcome :)
Tweet media one
8
114
351
@MichaelOberst
Michael Oberst
1 year
It's official! I'll be joining @JohnsHopkins as an Assistant Professor of Computer Science in summer 2024 - in the interim I'll be a postdoc at @CarnegieMellon in the Machine Learning Department working with @zacharylipton . Excited for the next chapter!
Tweet media one
33
18
492
@MichaelOberst
Michael Oberst
3 years
When can we learn predictive models that are robust to shifts in unobserved variables? With co-authors @nikolajthams , Jonas Peters, and @david_sontag , we tackle this question in our recent ICML paper. Paper: Video: [1/n]
Tweet media one
2
32
107
@MichaelOberst
Michael Oberst
2 years
How should you evaluate the worst-case performance of your model under distribution shift, with only data from the training distribution? Preprint with @david_sontag , @nikolajthams , at SCIS (Fri) and PODS (Sat) workshops at #ICML2022 Paper: (1/6)
Tweet media one
4
26
86
@MichaelOberst
Michael Oberst
4 years
When doing causal inference (e.g., comparing effectiveness of drugs, or evaluating a treatment policy) how do you characterize the population to whom your conclusions apply? Belated thread on our AISTATS-20 paper [1/5] Paper Video
2
22
62
@MichaelOberst
Michael Oberst
5 years
I'll be presenting "Counterfactual Off-Policy Evaluation with Gumbel-Max Structural Causal Models" at 2:20pm today in the Grand Ballroom at #ICML2019 @icmlconf . Please also stop by our poster (72) at 6:30pm! This is work with my advisor @david_sontag .
Tweet media one
1
7
47
@MichaelOberst
Michael Oberst
2 years
Just got my badge at #ICML2022 , it’s been a while! Excited to reconnect with old friends and meet new ones - DM me if you’re interested in chatting about causality and/or distributional robustness!
Tweet media one
0
1
38
@MichaelOberst
Michael Oberst
9 months
I'll be at NeurIPS next week - excited to see old friends and meet new ones with similar research interests (reliable ML/causal inference/ML for healthcare). Hit me up (DM or email) if you'll be at NeurIPS and would like to chat! I'm also recruiting PhD students for Fall 2024.
@MichaelOberst
Michael Oberst
9 months
I'm recruiting PhD students for my lab at Johns Hopkins! Please apply if you're interested in reliable ML / causal inference for decision-making in healthcare. See my website () for more info. Deadline 12/15. Retweets welcome :)
Tweet media one
8
114
351
2
1
34
@MichaelOberst
Michael Oberst
8 months
Worth reading, esp. for PhD students stressed about # of pubs. Refreshing to have a take from someone who isn't "on the other side", as Elan puts it. Since Elan is on the job market, I should note he really puts this into practice - I have deep respect/admiration for his work.
@ElanRosenfeld
Elan Rosenfeld
8 months
Another round of "publication incentives are messed up". He's right, of course, but the people who say this are almost always those who publish lots and are no longer beholden to the game. This is something I've thought a lot about. As a senior PhD student, I have no... 1/n
1
10
134
1
1
30
@MichaelOberst
Michael Oberst
2 years
I'm at #NeurIPS2022 ! Presenting (w/ @david_sontag ) 1⃣ Evaluating Robustness to Dataset Shift via Parametric Robustness Sets (Tue 4pm, #313 ) 2⃣ Falsification before Extrapolation in Causal Effect Estimation (Thu 11am, #813 ) 🧵(1/3)
1
6
27
@MichaelOberst
Michael Oberst
5 years
The Whova app schedule at #ICML2019 does not appear to be robust to adversarial words like “lunch”...
Tweet media one
1
2
23
@MichaelOberst
Michael Oberst
2 years
Come check out the poster in the Spurious Correlations, Invariance, and Stability (SCIS) workshop at #ICML2022 ! Come chat 1:30pm-2:45pm today (Fri), in room 340, and again from 5:45pm-7pm.
@MichaelOberst
Michael Oberst
2 years
How should you evaluate the worst-case performance of your model under distribution shift, with only data from the training distribution? Preprint with @david_sontag , @nikolajthams , at SCIS (Fri) and PODS (Sat) workshops at #ICML2022 Paper: (1/6)
Tweet media one
4
26
86
0
3
20
@MichaelOberst
Michael Oberst
2 years
2⃣ How should you use experimental data in causal inference, if it *excludes* the population you care about? Idea: Uncover bias in observational data that *does* cover the relevant population (w/ @zeshanmh @rg_shih @david_sontag ) 📅 Poster: Thu 11am, #813
Tweet media one
0
5
13
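The falsification idea in the tweet above can be illustrated with a minimal sketch: an observational estimator is "falsified" if, on the subgroup the experimental data covers, its estimate differs significantly from the RCT benchmark. This is a toy illustration of the idea, not the paper's actual estimator; all names and inputs here are hypothetical summary statistics.

```python
import numpy as np

def falsification_screen(obs_estimates, obs_ses, rct_estimate, rct_se, z=1.96):
    """Flag observational estimators inconsistent with an RCT benchmark.

    A simple z-test per estimator: falsified if its estimate on the
    RCT-covered subgroup differs significantly from the RCT estimate.
    """
    keep = []
    for est, se in zip(obs_estimates, obs_ses):
        stat = abs(est - rct_estimate) / np.sqrt(se**2 + rct_se**2)
        keep.append(bool(stat < z))  # True = consistent with the RCT, not falsified
    return keep

# Example: the second observational estimator disagrees sharply with the RCT
result = falsification_screen([1.0, 5.0], [0.5, 0.5], rct_estimate=1.2, rct_se=0.5)
```

Unfalsified estimators can then be used to extrapolate to the population the RCT excludes.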
@MichaelOberst
Michael Oberst
2 years
If you're at #NeurIPS2022 , come check out this work today from 11am-1pm at poster #813 !
@MichaelOberst
Michael Oberst
2 years
2⃣ How should you use experimental data in causal inference, if it *excludes* the population you care about? Idea: Uncover bias in observational data that *does* cover the relevant population (w/ @zeshanmh @rg_shih @david_sontag ) 📅 Poster: Thu 11am, #813
Tweet media one
0
5
13
0
1
13
@MichaelOberst
Michael Oberst
2 years
For anyone looking at PhD programs in machine learning, having a great advisor is (by far) the most important consideration. Fredrik is both brilliant and kind - don't sleep on the opportunity to work with him!
@frejohk
Fredrik Johansson
2 years
We have a PhD student position open in my group—deadline Sunday this week (Oct 2)!
4
26
69
1
0
11
@MichaelOberst
Michael Oberst
5 years
After a long day of travel delays, finally arrived in Vancouver for #NeurIPS2019 !
0
0
10
@MichaelOberst
Michael Oberst
2 years
1⃣ How sensitive is your prediction model to changes in the causal data-generating process? We give a method for worst-case evaluation under a flexible class of user-specified changes. (w/ @nikolajthams @david_sontag ) 📅 Poster: Tue (today!) 4pm, #313
Tweet media one
1
4
9
@MichaelOberst
Michael Oberst
2 years
Very cool idea from colleagues at MIT, with a lot of useful applications, check it out!
@andrew_ilyas
Andrew Ilyas
2 years
Come hear about work on datamodels () at ICML *tomorrow* in the Deep Learning/Optimization track (Rm 309)! The presentation is at 4:50 with a poster session at 6:30. Joint work with @smsampark @logan_engstrom @gpoleclerc @aleks_madry
1
19
63
0
0
7
@MichaelOberst
Michael Oberst
3 years
We prove that one noisy proxy is *not* sufficient to recover the distributional robustness guarantees of Anchor Regression, but that these guarantees can be recovered with two noisy proxies (and we give an algorithm for doing so). [5/n]
1
0
7
@MichaelOberst
Michael Oberst
2 years
Well deserved!
@irenetrampoline
Irene Chen
2 years
Life update! Starting summer 2023, I'll be an Assistant Professor in UC Berkeley + UCSF's new Computational Precision Health program () with a joint appt in Berkeley EECS. Starting Aug 2022, I'll be a Postdoc at Microsoft Research New England
Tweet media one
100
34
2K
0
0
6
@MichaelOberst
Michael Oberst
4 years
No research today, taking part in #ShutDownSTEM (see thread). If you're on Twitter today, try searching #BlackintheIvory and letting some of the stories sink in. #ShutDownAcademia #StrikeforBlackLives
@JohnCUrschel
John Urschel
4 years
This Wednesday, June 10th, in conjunction with #ParticlesForJustice , I am taking part in #ShutDownSTEM , a one day event for African Americans in STEM to rest, and for others to educate themselves and reflect. #BlackLivesMatter #BlackandSTEM 1/20
9
158
396
0
0
6
@MichaelOberst
Michael Oberst
3 years
In many real-world domains (e.g., medicine), important confounding factors are not observed, like socioeconomic status. If these factors change between train and test (e.g., moving to a new hospital) predictive performance may suffer. [2/n]
1
0
5
@MichaelOberst
Michael Oberst
3 years
We tackle this problem for the case of linear causal models, building on prior work (Anchor Regression) that assumes the relevant factors are observed during training. We instead assume that only noisy proxies are available. [4/n]
1
0
5
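For context on the baseline this thread builds on: Anchor Regression (Rothenhäusler et al.) trades off in-distribution fit against robustness to shifts in the anchor variables. A minimal numpy sketch of the observed-anchor case follows; this is the prior work's estimator as commonly stated, not the paper's two-proxy method, and the data here is synthetic.

```python
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Anchor Regression sketch: minimize
    ||(I - P_A)(Y - Xb)||^2 + gamma * ||P_A(Y - Xb)||^2,
    where P_A projects onto the column span of the anchors A.
    Equivalent to OLS on data transformed by W = I + (sqrt(gamma) - 1) P_A.
    """
    P = A @ np.linalg.pinv(A)                        # projection onto span(A)
    W = np.eye(len(Y)) + (np.sqrt(gamma) - 1.0) * P
    b, *_ = np.linalg.lstsq(W @ X, W @ Y, rcond=None)
    return b

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 1))                        # observed anchor
X = A @ np.array([[1.5]]) + rng.normal(size=(200, 2))
Y = X @ np.array([2.0, -1.0]) + rng.normal(size=200)

b_ols = anchor_regression(X, Y, A, gamma=1.0)        # gamma = 1 recovers OLS
b_rob = anchor_regression(X, Y, A, gamma=10.0)       # larger gamma: more anchor-robust
```

The thread's contribution concerns what happens when A is not observed and only noisy proxies are available.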
@MichaelOberst
Michael Oberst
5 years
Submit to the ML for Health (ML4H) Workshop at NeurIPS 2019! This year's theme: What makes ML in Medicine different? Deadline is Friday Sept 13th.
0
1
5
@MichaelOberst
Michael Oberst
3 years
We also extend Anchor Regression to more targeted robustness sets (encoding prior knowledge about feasible shifts), and show that guarantees can also be recovered in this setting with two proxies. [6/n]
1
0
5
@MichaelOberst
Michael Oberst
2 years
@zacharylipton Sounds related to under-reporting in causal inference (when you observe A = 0, could be that no treatment was given, or that you just didn't record it). @suchisaria @royjamesadams have something on this.
1
0
5
@MichaelOberst
Michael Oberst
4 years
@david_sontag Shout-out to @tompollard and the team at Physionet for helping make this dataset release happen!
0
0
4
@MichaelOberst
Michael Oberst
7 months
@rasbt @andersonbcdefg FWIW, giving credit where it's due, it does look like the former (Liu et al. 2024) cite the latter (Mitchell et al. 2023) and discuss distinctions b/w the papers in the related work, while acknowledging the equation is largely the same (Sec 7, second para of Liu et al.)
0
0
4
@MichaelOberst
Michael Oberst
3 years
Come to our talk on Thursday at 11:40pm ET to hear more! [end]
Tweet media one
0
0
4
@MichaelOberst
Michael Oberst
6 years
@irenetrampoline presenting “Why is my Classifier Discriminatory” at #NeurIPS18 . Practical ways to decompose and reduce unfairness (in terms of accuracy), esp relevant in healthcare. @frejohk @david_sontag
Tweet media one
Tweet media two
Tweet media three
0
0
4
@MichaelOberst
Michael Oberst
3 years
So how do we learn models with robust *predictive* performance under these shifts? We take a distributional robustness viewpoint, minimizing a worst-case loss under changes to the distribution of unobserved factors. [3/n]
1
0
3
@MichaelOberst
Michael Oberst
8 months
@cyrilzakka @HiesingerLab Sounds like a cool initiative! I noticed that the "view on GitHub" link takes you to an empty repo () - is there a plan to develop this as an open-source project? Or a place to find more documentation about the details?
1
0
1
@MichaelOberst
Michael Oberst
4 years
We give an algorithm (using Boolean rule sets) to characterize this population, simple enough to be published alongside a study. That way, clinicians can incorporate this knowledge as they read about your results! [3/5]
1
0
3
@MichaelOberst
Michael Oberst
4 years
And of course, shout-out to wonderful co-authors @frejohk , Dennis Wei, Tian Gao, @bratogram , @david_sontag , and @krvarshney . [5/5]
0
0
3
@MichaelOberst
Michael Oberst
4 years
@rahulgk is an inspiring scientist, and an even better friend / roommate! Glad to still have him around in Cambridge for the next year ;)
@rahulgk
Rahul G. Krishnan
4 years
I graduated with my PhD from @MITEECS . In Fall 2021, I'll be an assistant professor jointly in CS ( @UofTCompSci ) and Medicine ( @uoftmedicine ) & a member of the Vector Institute ( @VectorInst ). For the next year, I'm excited to be a researcher at MSR New England ( @MSRNE ).
33
12
506
0
0
2
@MichaelOberst
Michael Oberst
4 years
Consider evaluation of a new treatment policy - estimates of policy value will only apply to the types of patients who are both (a) well-represented in the study, and (b) are observed being treated according to the proposed policy. [2/5]
1
0
2
@MichaelOberst
Michael Oberst
9 months
@jivatneet Thank you Jivat! Hope all is well at Berkeley :)
0
0
2
@MichaelOberst
Michael Oberst
4 years
Check out the full paper for more details - the supplement includes recommendations on hyper-parameter tuning, backed up by experiments, as well as further experiments on characterizing overlap in policy evaluation. [4/5]
1
0
2
@MichaelOberst
Michael Oberst
2 years
@CasualBrady @emrek @ehudkar You can think of OverRule as learning a set of "inclusion criteria" that describe the population with reasonable propensity scores. E.g., in a dataset of post-surgical opioid prescriptions, the rules highlight that overlap does not hold for C-section surgeries (Fig. 4)
1
0
2
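A toy, one-dimensional stand-in for what OverRule does with Boolean rule sets: estimate where propensity scores are bounded away from 0 and 1, then describe that region with an interpretable rule. Here the "rule set" is a single interval on one feature, purely for illustration; the setup is synthetic and not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
# Treatment is nearly deterministic for large |x|, so overlap fails in the tails
ps = 1.0 / (1.0 + np.exp(-3.0 * x))        # propensity score P(T = 1 | x)
overlap = (ps > 0.1) & (ps < 0.9)          # region with reasonable propensities

# Describe the overlap region with a single interval rule "lo <= x <= hi",
# a one-feature stand-in for OverRule's Boolean rule sets
lo, hi = x[overlap].min(), x[overlap].max()
rule = (x >= lo) & (x <= hi)
accuracy = np.mean(rule == overlap)
```

The learned rule ("x between lo and hi") is simple enough to report alongside a study, which is the point of the tweet above.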
@MichaelOberst
Michael Oberst
2 years
@CasualBrady @emrek @ehudkar General goal is to (a) report results on a population where overlap holds, but also (b) make it clear to the reader (of your study) what that population is. Dropping samples with extreme propensity scores helps with (a) but not (b), which is where OverRule comes in...[1/2]
1
1
1
@MichaelOberst
Michael Oberst
2 years
@NickSpies13 @david_sontag @nikolajthams great question! First, a nice thing about this approach is that we only need to model aspects of the *changing* distributions; e.g., in the lab test example where that is the only shift, one could have a much larger feature space w/o significantly changing the difficulty
1
0
1
@MichaelOberst
Michael Oberst
2 years
We focus on estimating the worst-case loss over a particular robustness set of distributions, defined by parametric "shift functions" that alter one or more conditional distributions. (2/6)
1
0
1
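The "shift function" idea can be sketched with a toy example, assuming the simplest possible case: a single Gaussian variable A whose mean shifts from 0 to delta, with losses reweighted by the implied likelihood ratio. This illustrates the robustness-set concept only; it is not the paper's estimator, and the variable names are hypothetical.

```python
import numpy as np

def worst_case_loss(losses, a, deltas):
    """Worst-case expected loss under parametric mean shifts N(0,1) -> N(delta,1).

    Reweights training losses by the density ratio exp(delta*a - delta^2/2)
    and maximizes over the user-specified robustness set of shifts.
    """
    worst = -np.inf
    for d in deltas:
        w = np.exp(d * a - d**2 / 2.0)      # N(d,1) / N(0,1) likelihood ratio
        worst = max(worst, float(np.mean(w * losses)))
    return worst

rng = np.random.default_rng(0)
a = rng.normal(size=5000)
losses = (a > 0).astype(float)              # model does worse when A is large
wc = worst_case_loss(losses, a, deltas=[0.0, 0.5, 1.0])
```

With only the null shift (delta = 0) this reduces to the ordinary training loss; larger shifts upweight the regions where the model fails.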
@MichaelOberst
Michael Oberst
4 years
@SanjatKanjilal @Dxgnosis In terms of introductory books, I highly recommend Introduction to Statistical Learning () and Pearl's causal inference primer ()!
1
0
1
@MichaelOberst
Michael Oberst
5 years
See this thread for a quick explainer:
@david_sontag
David Sontag
5 years
@MichaelOberst and I tackle the following question in our upcoming ICML 2019 paper, motivated by our lab's research of ML in healthcare: How do you build trust in a new policy learned by reinforcement learning from observational data?
Tweet media one
1
14
58
1
0
1
@MichaelOberst
Michael Oberst
4 years
@AndrewLBeam @graduatedescent @srush_nlp My understanding, if helpful: This usage of "nonparametric" is for *discrete* X, Y. Linear model with all interactions is the same as empirical mean E[Y | X = x] for every obs. value of X. For continuous X, can't do this, hence the need for some assumptions (e.g., smoothness)
2
0
1
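The claim in the tweet above (a saturated linear model on discrete X equals the empirical conditional mean) is easy to verify numerically. A small synthetic check with two binary features:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 2))       # two binary features -> 4 cells
Y = 1.0 * X[:, 0] + 2.0 * X[:, 1] + 3.0 * X[:, 0] * X[:, 1] + rng.normal(size=500)

# Saturated design: intercept, x1, x2, and the interaction x1*x2
D = np.column_stack([np.ones(500), X[:, 0], X[:, 1], X[:, 0] * X[:, 1]])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
preds = D @ beta
```

Within every cell, the fitted value equals the empirical mean E[Y | X = x], which is why "nonparametric" is literal in the discrete case.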
@MichaelOberst
Michael Oberst
4 years
@lycanduo @WvanAmsterdam @ShalitUri @frejohk @rahulgk @harvineet_singh @angelamczhou @nathankallus Basically yes; Say you observe X', Z', and you've posited some f_z(X, eps_Z) such that P(Z = z' | X = x') = P(f_z(x', eps_Z) = z'), you can now get a posterior distribution p(eps_Z | X', Z'), and you simulate forward with that
0
0
1
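The abduction-then-forward-simulation recipe described above can be sketched for the Gumbel-Max SCM from the ICML 2019 paper mentioned earlier in the feed. This sketch uses rejection sampling as a simple (and inefficient) stand-in for exact posterior inference over the noise; the function and its arguments are illustrative, not the paper's code.

```python
import numpy as np

def gumbel_max_counterfactual(logp_obs, logp_new, z_obs, n=2000, rng=None):
    """Counterfactual distribution under a Gumbel-Max SCM.

    Model: Z = argmax_z (log p(z) + G_z), with G_z i.i.d. Gumbel(0, 1).
    Abduction by rejection: keep Gumbel draws consistent with the observed
    outcome z_obs under logp_obs, then re-apply them under logp_new.
    """
    rng = rng or np.random.default_rng(0)
    cf = []
    while len(cf) < n:
        g = rng.gumbel(size=len(logp_obs))
        if np.argmax(logp_obs + g) == z_obs:      # abduction: posterior over noise
            cf.append(int(np.argmax(logp_new + g)))  # action + prediction
    return np.bincount(cf, minlength=len(logp_new)) / n

logp = np.log(np.array([0.6, 0.3, 0.1]))
logp_cf = np.log(np.array([0.1, 0.3, 0.6]))
dist = gumbel_max_counterfactual(logp, logp_cf, z_obs=0)
```

Because the same noise draw is reused, counterfactuals under unchanged logits reproduce the observed outcome exactly (the "counterfactual stability" property the paper is built around).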
@MichaelOberst
Michael Oberst
8 months
@_scott_fleming_ Thank you Scott! That’s very kind of you to say :)
0
0
0
@MichaelOberst
Michael Oberst
5 years
@tianweisheng No worries, I just thought it was a fun bit of irony given the topic =)
0
0
1
@MichaelOberst
Michael Oberst
2 years
This allows users to constrain shifts to be plausible: In a toy healthcare example, the unconstrained (and unrealistic) worst-case shift = "only order tests for healthy patients". We can avoid this failure mode by constraining the manifold of the shift. (3/6)
1
0
1
@MichaelOberst
Michael Oberst
2 years
@NickSpies13 @david_sontag @nikolajthams this is implicit in the image application, where our approach does *not* need to model the distribution of images given attributes, which we assume is fixed... the GAN is not part of the method, just used to get a "ground truth" evaluation of performance under a new distribution
1
0
1
@MichaelOberst
Michael Oberst
4 years
@graduatedescent @AndrewLBeam @srush_nlp Generally agree, in this context I translate "nonparametric" as "most flexible possible model for the conditional distribution", at which point it's very literal in the discrete case 😉, and closer to how the term is used in e.g., nonparametric regression with continuous vars
0
0
1
@MichaelOberst
Michael Oberst
4 years
@lycanduo @WvanAmsterdam @ShalitUri @frejohk @rahulgk @harvineet_singh @angelamczhou @nathankallus Graphically, you can always replace e.g., X -> Y with X -> Y <- eps_y, such that Y = f_y(X, eps_y). Basically turn every observed variable into a function, and add jointly independent noise terms that give you all the obs variation
2
0
1
@MichaelOberst
Michael Oberst
2 years
Missed our poster at the Spurious Correlation workshop? You're in luck! We're presenting the poster again at the Principles of Distribution Shift (PODS) workshop at #ICML2022 Poster Session today (Sat) from 11:50am-12:30pm, in Ballroom 3
@MichaelOberst
Michael Oberst
2 years
How should you evaluate the worst-case performance of your model under distribution shift, with only data from the training distribution? Preprint with @david_sontag , @nikolajthams , at SCIS (Fri) and PODS (Sat) workshops at #ICML2022 Paper: (1/6)
Tweet media one
4
26
86
0
0
1
@MichaelOberst
Michael Oberst
4 years
@tw_killian Thanks Taylor! Happy to chat about it anytime
0
0
1
@MichaelOberst
Michael Oberst
2 years
@NickSpies13 @david_sontag @nikolajthams second, this also extends to only needing "partial" causal knowledge to get a causal interpretation: We only need to know the causal parents of variables where the mechanism is changing.
0
0
1
@MichaelOberst
Michael Oberst
2 years
We illustrate using a computer vision task w/GAN-generated images of faces, where shifts occur in the distribution of attributes (e.g., eyeglasses, smiling). Our method reveals sensitivity to changes in potentially problematic associations (e.g., women wearing lipstick). (5/6)
1
0
1
@MichaelOberst
Michael Oberst
4 years
@lycanduo @WvanAmsterdam @ShalitUri @frejohk @rahulgk @harvineet_singh @angelamczhou @nathankallus ofc, how you infer eps_y after observing X, Y, depends on the chosen function f_y and the distribution of eps_y. Could be many choices that yield same obs/int. distribution, but different counterfactual distributions, see e.g., Section 3.1 of
0
0
1