Noah Greifer @noah_greifer profile

Noah Greifer

@noah_greifer

Followers: 4K
Following: 2K
Media: 42
Statuses: 2K

Statistical consultant and programmer at @Harvard @IQSS | Maintainer of the #Rstats packages 'cobalt', 'MatchIt', and 'WeightIt' (and several others) | he/him

Cambridge, MA

Joined January 2020

Noah Greifer

@noah_greifer

2 years

I finally made a personal website! Check it out below. I'll be posting blogs about statistics and statistical programming.

5

7

105

Noah Greifer

@noah_greifer

5 years

If you're spending more than an hour trying to figure out something potentially simple in R, just message me. Worst case I'll say I can't help at the moment or don't know the answer. Best case I solve your problem in minutes.

28

121

1K

Noah Greifer

@noah_greifer

2 years

Though I don't see it as really relevant to my professional life, which is all I post about here, I am gay 🏳️‍🌈. Much of my work has been inspired by that of other queer scientists, and hopefully I can inspire yet others. #NationalComingOutDay

6

7

541

Noah Greifer

@noah_greifer

3 years

Hot take: all your logistic regressions should be bias-corrected (i.e., Firth). In #Rstats, this is as simple as adding `method = brglm2::brglmFit` to your call to glm(). Nothing else needs to change.

21

45

412
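
A minimal sketch of what the tweet above describes, assuming the {brglm2} package is installed. The toy data frame `d` and its columns `x` and `y` are hypothetical and were chosen so that the outcome is perfectly separated, which is where the contrast with ordinary maximum likelihood is easiest to see.

```r
# Toy data (hypothetical): y is perfectly separated by x,
# so ordinary maximum likelihood estimates diverge.
d <- data.frame(
  x = c(-3, -2, -1, 1, 2, 3),
  y = c(0, 0, 0, 1, 1, 1)
)

# Standard logistic regression; warnings about fitted probabilities
# of 0 or 1 are expected with separated data, so they are suppressed.
fit_ml <- suppressWarnings(glm(y ~ x, family = binomial, data = d))

# The same call with only the fitting method swapped, as in the tweet.
fit_br <- glm(y ~ x, family = binomial, data = d,
              method = brglm2::brglmFit)

coef(fit_ml)  # slope is very large in magnitude
coef(fit_br)  # slope is finite and much smaller
```

Everything downstream of the fit (`summary()`, `predict()`, and so on) is used the same way as with an ordinary `glm` object.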

Noah Greifer

@noah_greifer

10 months

This is one of my favorite posts on CrossValidated because it clearly describes what each assumption in linear regression means and what it is necessary and sufficient for. A great antidote to people thinking residuals have to be normally distributed.

6

51

366

Noah Greifer

@noah_greifer

3 years

Sometimes I think statistical methods should be accessible to all and I want to devote my life to making them comprehensible and easy to implement, and sometimes I think only like 20 people in the world should be allowed to run a propensity score analysis.

8

16

351

Noah Greifer

@noah_greifer

4 months

Sometimes it pisses me off how well bootstrapping works. I tend to prefer analytic solutions for speed and to remove Monte Carlo error, but there is no denying the effectiveness and generality of the bootstrap for most statistical problems I encounter. It can feel like cheating.

15

21

344

Noah Greifer

@noah_greifer

5 years

Defending my dissertation tomorrow(!) and feeling pretty shitty about it. Rereading my document and realizing how long, boring, and riddled with small errors it is. Any words of advice/encouragement/comfort (here or DM) would be appreciated.

55

4

227

Noah Greifer

@noah_greifer

3 years

Understanding regression really well is so empowering.

6

13

236

Noah Greifer

@noah_greifer

2 years

New Rosenbaum & Rubin dropped:
My favorite line: "The statistician who adjusts for observed covariates in an ornate and obscure way does no service, particularly if ornate obscurity erects barriers to success in the step from association to causation."

4

47

216

Noah Greifer

@noah_greifer

3 years

A way to instantly make your presentations better:
Every time you show a graph, explain what its axes mean and what a given point corresponds to.

9

20

186

Noah Greifer

@noah_greifer

2 years

Some surprising #causalinference facts:
- A confounder doesn't have to cause the treatment
- A confounder doesn't have to cause the outcome
- A variable can cause the treatment and outcome and not be a confounder
See my answer on CV for an explanation:

8

27

186

Noah Greifer

@noah_greifer

11 months

I'm so excited to announce that my #Rstats package `WeightIt` for estimating balancing weights (e.g., IPTW) has received a major new update that is now on CRAN. I wrote a blog post about the biggest new features: #causaltwitter #epitwitter

1

39

178

Noah Greifer

@noah_greifer

6 months

Working on an R package no one asked for that no one is going to use. Why? I just think it's neat.

5

164

Noah Greifer

@noah_greifer

4 years

I'm excited to announce that I have accepted a position at the @Harvard @IQSS as a data science specialist! I'm sad to leave my postdoc with the amazing @Lizstuartdc early, but I could not pass up this incredible opportunity. A big thank you to @kinggary for seeing my potential.

13

2

148

Noah Greifer

@noah_greifer

6 months

My new blog post, An Odds Ratio Paradox, in which I introduce the paradox and don't solve it:

5

18

141

Noah Greifer

@noah_greifer

2 years

#Rstats {MatchIt} v4.5.0 is out!
Big changes:
- New matching method: generalized full matching (`method = "quick"`)
- New unified framework for estimating effects after matching using {marginaleffects}
Read the rest here:
(1/8)

5

24

121

Noah Greifer

@noah_greifer

3 years

Why Do We Do Matching vs. Regression To Adjust for Confounding? A Tale.

4

26

122

Noah Greifer

@noah_greifer

5 years

Hey everyone, I passed :) There's a new PhD in town. Thank you so much for your support in all this! I'm so glad I had a whole community rally behind me and send their love and encouragement. It really meant a lot.

Noah Greifer

@noah_greifer

5 years

Defending my dissertation tomorrow(!) and feeling pretty shitty about it. Rereading my document and realizing how long, boring, and riddled with small errors it is. Any words of advice/encouragement/comfort (here or DM) would be appreciated.

9

1

109

Noah Greifer

@noah_greifer

4 months

As we wait for the wisdom to drop, I remind everyone that {WeightIt} is the only #Rstats package (I think!) that correctly computes standard errors for IPWRA and can replicate teffects ipwra in Stata, but supports more estimators, estimands, and treatment types.

Jeffrey Wooldridge

@jmwooldridge

4 months

Tomorrow I'll tweet about why, of all the treatment effect estimators available when treatment is unconfounded, I prefer IPWRA: inverse probability weighted regression adjustment.

3

12

114

Noah Greifer

@noah_greifer

2 years

Does anyone know of guides to help medical researchers choose among odds ratios, risk ratios, risk differences, NNT, etc.? Or between marginal and conditional effects? I end up having to write the same long email explaining these choices each time I consult. #epitwitter.

19

16

105

Noah Greifer

@noah_greifer

2 years

Have you ever wanted a FWB?
Of course I'm talking about the fractional weighted bootstrap! (aka the Bayesian bootstrap). My new #Rstats package {fwb} implements the FWB and acts as a drop-in for {boot}. Check out the website: So what is the FWB?

5

26

104

Noah Greifer

@noah_greifer

11 months

New blog post! I explain M-estimation (and show you how to do it!), demonstrate how logistic regression is inextricably linked to covariate balance, and reveal the genius of CBPS and overlap weights. #causalinference #econtwitter #epitwitter #Rstats

6

25

105

Noah Greifer

@noah_greifer

5 years

You do learn best by struggling through it yourself, but sometimes you just want it done. And I think there can be a difference between experiential learning and needless floundering.

2

1

99

Noah Greifer

@noah_greifer

4 months

Those who use linear regression, why don't you use flexible models, like GAMs, splines, or locally weighted linear models? (This is a query, not a judgment; I don't use them either.)

23

9

93

Noah Greifer

@noah_greifer

2 years

@nik_tzoumas @statsepi @EpiEllie @PWGTennant @RWJE_BA @f2harrell @stephensenn @tmorris_mrc

7

15

91

Noah Greifer

@noah_greifer

2 years

#Rstats {clarify} is out! It uses simulation-based inference to compute interpretable quantities from regression models, such as average marginal effects and predictions at representative values, similar to {Zelig} and Clarify for Stata.

Gary King

@kinggary

2 years

Did you use Clarify for Stata but would like to do it in R? Or maybe you used {Zelig} for R (before it retired)? Welcome to our new R package {clarify}: Software for Interpreting and Presenting Statistical Results

1

12

89

Noah Greifer

@noah_greifer

5 months

I think I need to stop gatekeeping this paper, which has recently become one of my favorites: "Assumption Lean Regression" by Berk et al. (2023), and its more technical cousin here by Buja et al. (2019):

Matthew B Jané

@MatthewBJane

5 months

LR parameter estimates need no assumption of normality or linearity between variables (see Gauss-Markov theorem). The conditional normality assumption is needed for analytic SEs and test-statistics if we model the residuals as normal, but it can be any distribution.

2

12

88

Noah Greifer

@noah_greifer

4 years

My first publication with @Lizstuartdc, the dream finally came true :)

3

0

76

Noah Greifer

@noah_greifer

2 years

I'm really trying to figure out survival analysis and am seeking recommendations for learning materials, which can be of any form, ideally oriented towards junior biostats PhD students, i.e., getting into the weeds of estimation and inference for basic methods. Thanks!

21

8

76

Noah Greifer

@noah_greifer

2 years

New blog post! On how matching is a nonparametric method of estimating propensity scores, and matching weights are propensity score weights. #causaltwitter #epitwitter #EconTwitter

2

18

78

Noah Greifer

@noah_greifer

5 years

To be clear, I'm not offering an unlimited free consulting service. I enjoy helping and I love #Rstats, but I'm afraid this tweet is taking on a life of its own 😅

2

1

71

Noah Greifer

@noah_greifer

4 years

The moment I live for and the reason I enjoy volunteering my time to help others with R :)

0

1

75

Noah Greifer

@noah_greifer

10 months

To ask "does M mediate the effect of A on Y?" is to ask "what is the indirect effect of A on Y through M?" That is, we have to convert a substantive question into one with a specific estimand, the indirect effect. 🧵

1

11

72

Noah Greifer

@noah_greifer

2 years

#causalinference question: Does fitting a hurdle/zero-inflated model yield biased effects because you are conditioning on a post-trt collider (i.e., membership in the zero class)? In particular, is the coefficient on the count part uninterpretable as causal, even in an RCT?

9

13

64

Noah Greifer

@noah_greifer

2 years

I get a lot of positive feedback on @FarhadPishgar and my #Rstats package {MatchThem} for matching and weighting with multiply imputed data. My newest blog post demonstrates how to integrate it with {marginaleffects} and {clarify} to estimate tx effects:

0

11

73

Noah Greifer

@noah_greifer

11 months

Random #Rstats fact:
binomial()$linkinv(x)
is faster than and yields the same value as
plogis(x)
both of which are equal to (and faster than)
(1 + exp(-x))^-1
also known as "expit" or "inverse logit".

6

4

68
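
A short sketch checking the equivalence stated in the tweet above, using base R only. The grid `x` is an arbitrary choice for illustration, and the speed claim is not benchmarked here since timings depend on the machine and R version.

```r
# Arbitrary grid of log-odds values (illustrative choice)
x <- seq(-5, 5, by = 0.5)

# Three ways to compute the inverse logit ("expit") of x
p_linkinv <- binomial()$linkinv(x)
p_plogis  <- plogis(x)
p_manual  <- (1 + exp(-x))^-1

# Compare the results up to floating-point tolerance
all.equal(p_linkinv, p_plogis)
all.equal(p_plogis, p_manual)
```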

Noah Greifer

@noah_greifer

9 months

#Rstats {WeightIt} v1.1.0 is released! Updates here: Summary below:

1

12

66

Noah Greifer

@noah_greifer

2 years

New MatchIt update coming soon 👀

4

3

61

Noah Greifer

@noah_greifer

2 years

Genetic matching is uniformly superior to nearest-neighbor matching (if you have the patience for it!). In my newest blog post, I explain everything you could want to know about genetic matching, including how to program it yourself!

2

15

65

Noah Greifer

@noah_greifer

2 years

A big update to my #Rstats package {WeightIt} (0.14.0). New features:
- Energy balancing for continuous treatments using methodology by @jared_huling
- A new vignette on estimating effects after weighting using @VincentAB's {marginaleffects}
All changes:

1

12

61

Noah Greifer

@noah_greifer

11 months

A huge new update is coming to #Rstats `WeightIt`, with two new features not available elsewhere in R. One is a weighting method (old in the literature but new to R) and the other will help with effect estimation. Any guesses as to what they are?

3

2

59

Noah Greifer

@noah_greifer

1 year

If you are an RStudio + Dropbox user and notice that DB is constantly syncing when you have RS open, using a lot of CPU, I have a solution for you, with much credit to @openai ChatGPT for helping me with the solution, which requires using Terminal because DB sucks. 🧵

5

8

62

Noah Greifer

@noah_greifer

2 years

I haven't been asked to review papers in a while, which makes me fear that bad research using propensity scores is getting through. I am available to review application papers that use PS and applied methodological papers on PS (e.g., simulation studies).

2

8

53

Noah Greifer

@noah_greifer

5 months

1) Read the documentation. 2) Don't do mediation. Okay that knocks out about 90% of the questions I get. Now let me implement this obscure estimator that even its inventor won't use.

2

0

53

Noah Greifer

@noah_greifer

4 years

In #Rstats, if you printed something to the console but forgot to save it as an object and you want to use the output without re-running the functions, the output is stored in the .Last.value variable and can be saved from there. @RLangTip

4

10

53

Noah Greifer

@noah_greifer

10 months

There is a kind of "equity" study that seems popular in medical research where you adjust for all mediators between group membership (e.g., race) and an outcome to claim that a disparity exists. The direct effect of group is interpreted as the magnitude of the disparity.

3

11

57

Noah Greifer

@noah_greifer

3 years

If you've ever thought the point of centering variables in regression was to reduce collinearity, get that out of your head immediately! The *sole* point of centering is to change the interpretation of coefficients in the model.

5

7

50
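
One way to see the point of the tweet above is to fit the same interaction model with raw and with mean-centered predictors. This is a sketch on simulated data; the variable names and the data-generating values are made up for illustration.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
n <- 200
x <- rnorm(n, mean = 10)
z <- rnorm(n, mean = 5)
y <- 1 + 0.5 * x - 0.3 * z + 0.2 * x * z + rnorm(n)

# Mean-centered copies of the predictors
xc <- x - mean(x)
zc <- z - mean(z)

fit_raw      <- lm(y ~ x * z)
fit_centered <- lm(y ~ xc * zc)

# Both fits are the same model: the fitted values agree
all.equal(fitted(fit_raw), fitted(fit_centered))

# The interaction coefficient is unchanged by centering
coef(fit_raw)[["x:z"]]
coef(fit_centered)[["xc:zc"]]

# The lower-order coefficient answers a different question:
# slope of x when z = 0 (raw) versus slope of x when z = mean(z) (centered)
coef(fit_raw)[["x"]]
coef(fit_centered)[["xc"]]
```

The centered slope equals the raw slope plus the interaction coefficient times `mean(z)`, so centering only moves the point at which the conditional slope is evaluated.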

Noah Greifer

@noah_greifer

5 years

Did not expect to triple my follower count overnight. You've all made a terrible mistake; I'm incredibly boring (on Twitter).

1

0

52

Noah Greifer

@noah_greifer

4 years

The new #Rstats MatchIt V4 is finally out! So many new features, fixes, and improvements! Pages and pages of documentation! A new website and logo! #CausalTwitter #EconTwitter #epitwitter @kinggary @Lizstuartdc

Gary King

@kinggary

4 years

New version of "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference" with new features, website, more.

1

17

52

Noah Greifer

@noah_greifer

4 years

It pains me to see people manually program "logit" functions in #rstats when R already has these built-ins:
qlogis() = "logit"; probability to log odds
plogis() = "inverse logit"; log odds to probability

8

14

50
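
A brief illustration of the two built-ins named in the tweet above, compared against the hand-written logit they replace. The probabilities in `p` are arbitrary example values.

```r
# Arbitrary example probabilities
p <- c(0.1, 0.25, 0.5, 0.9)

# qlogis(): probability -> log odds ("logit")
log_odds <- qlogis(p)
all.equal(log_odds, log(p / (1 - p)))  # matches the manual formula

# plogis(): log odds -> probability ("inverse logit")
p_back <- plogis(log_odds)
all.equal(p_back, p)                   # round trip recovers p
```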

Noah Greifer

@noah_greifer

3 years

I'm sorry, but I have to once again tweet about this absolutely incredible paper. Every page describes a new discovery or connection. Seriously one of the most ambitious and illuminating #causalinference papers I've ever read. And extremely clear, too.

1

8

48

Noah Greifer

@noah_greifer

3 months

@MatthewBJane I like mclogit::mblogit(), which performs fast multinomial logistic regression with optional random effects. Supported by {marginaleffects}. Make sure not to use mclogit() unless you know what you're doing! (It fits a different model.)

2

3

51

Noah Greifer

@noah_greifer

2 years

Any good papers on alternatives to hazard ratios for quantifying treatment effects on survival outcomes? Ideally review papers aimed at an applied audience. #epitwitter #causaltwitter

10

8

52

Noah Greifer

@noah_greifer

2 years

2000 followers! Thank you so much for giving me a platform to talk about statistics :) I know my feed is pretty dry but hopefully I've improved some people's lives with my online presence.

3

2

49

Noah Greifer

@noah_greifer

10 months

The #rstats `mediation` package is great, but people need to understand that it's not a general-purpose mediation package. It implements one specific method of mediation that requires parameters (like the "treated" and "control" values) to be set in a specific way.

8

9

50

Noah Greifer

@noah_greifer

11 months

@PhDemetri I'm so flattered, thank you so much!!! I only accept payment in the form of validation, recognition, and exposure at the moment, so this tweet was payment enough :)

4

0

47

Noah Greifer

@noah_greifer

3 years

@kareem_carr This relies on the idea that coefficients in multiple regression have meaningful interpretations. But they don't. That's what the table 2 fallacy is all about. If you want average marginal effects, you can get those from machine learning models.

5

2

48

Noah Greifer

@noah_greifer

2 years

One of you had a nice blog post about why we should never do mediation analysis. Please help me find it (or submit your own helpful posts/articles).

9

6

46

Noah Greifer

@noah_greifer

4 years

ATE, ATT, ATO... how do you choose? Different methods target different estimands, yielding effects with different interpretations. How do these estimands differ, and which one is right for you? @Lizstuartdc and I explore that in our new article: >>

2

9

43

Noah Greifer

@noah_greifer

3 years

I loved being a guest on @quantitudepod and talking about my favorite topic, propensity scores! Thanks P & G :)

quantitudethepodcast

@quantitudepod

3 years

S3E27: Propensity Scores — I Meant To Do That! P & G hang out with @noah_greifer, Institute for Quantitative Social Sciences at Harvard University, to discuss propensity scores: what they are, how we get them, and how they can strengthen causal inference.

3

2

44

Noah Greifer

@noah_greifer

2 years

"A Violin Plot".or."A Kernel Density Plot and Then the Exact Same Kernel Density Plot but Upside Down This Time".or."What If a Kernel Density Plot, but Twice?".or."A Georgia O'Keeffe Painting but Ugly and Made of Data".or.

3

2

39

Noah Greifer

@noah_greifer

2 years

Have any of you used for writing a manuscript? I saw it recommended by a journal I was submitting to and it looks really cool. Curious if anyone has used it in practice and whether you would recommend it.

4

9

42

Noah Greifer

@noah_greifer

1 year

The problem with conditioning on a post-treatment variable (CPTV) isn't (just) that you are conditioning on a collider; it's that you are conditioning on a mediator. Even if it weren't a collider, CPTV would still change the interpretation of the treatment effect estimate.

1

8

41

Noah Greifer

@noah_greifer

3 years

All models are wrong except my cross-fit SuperLearner with GAM, GBM, random forests, HAL, and BART as candidate libraries.

1

40

Noah Greifer

@noah_greifer

2 years

Really frustrated with a paper I'm reviewing. I recommended rejection with a litany of complaints, editor gave them a resubmit, the resubmission doesn't fix any of my concerns. We're on round 3 and the paper still sucks.

10

0

42

Noah Greifer

@noah_greifer

3 years

How can you further adjust for propensity scores after matching? My answer here: Short answer: g-computation in the matched sample, made possible by @VincentAB's {marginaleffects} package.

1

3

43

Noah Greifer

@noah_greifer

2 years

I currently have a blog post up about performing subgroup/moderation analysis after propensity score matching in R. I hope you find it useful!

1

4

42

Noah Greifer

@noah_greifer

9 months

This is a must-read for anyone interested in causal inference methods, especially if TMLE or DML are opaque to you. Well written as always @ildiazm. Your clarity, rigor, and expertise are inspiring.

Lars van der Laan

@LarsvanderLaan3

9 months

What are the differences between one-step estimation, Double ML, and Targeted ML? This commentary (@ildiazm) and blog post (@mark_vdlaan) provide an overview of the history of machine learning in semiparametrics.

1

4

42

Noah Greifer

@noah_greifer

2 years

*me clutching onto my ornate and obscure covariate adjustment methods*

1

2

36

Noah Greifer

@noah_greifer

2 years

MatchIt has been updated to 4.5.3 with some critical bug fixes, in particular with k:1 matching with replacement. Please update MatchIt and re-run your analyses if you used this method using version 4.5.1 or 4.5.2 (i.e., between the end of Feb and now).

1

5

40

Noah Greifer

@noah_greifer

1 year

Today I hit 30k reputation points on CrossValidated. Thanks to everyone who has found my answers useful and upvoted or shared them!

4

0

33

Noah Greifer

@noah_greifer

3 years

Come join me at Harvard! The Data Science Services team at @IQSS is hiring a statistical consultant position. Details here: This is essentially the same position I'm in, so I'm happy to answer any questions about it.

8

26

37

Noah Greifer

@noah_greifer

2 years

I'll be honest. I have no idea what Quarto is and I'm too afraid to ask.

2

0

37

Noah Greifer

@noah_greifer

3 years

Can't wait to show you :) #Rstats

1

2

32

Noah Greifer

@noah_greifer

2 years

The observation was due to the misspecification of the variance in both models. Using robust SEs (with quasipoisson) revealed the hypothesized pattern. Lesson: quasipoisson + robust SEs over Poisson/NB.

Solomon Kurz

@SolomonKurz

2 years

Presuming a clean RCT, is the ANCOVA model better than the ANOVA model when using the Poisson likelihood? After working with a real data set and doing a little simulation, it seems like the Poisson ANCOVA doesn't boost power for the beta coefficient, or for the ATE. Citations?

4

3

35

Noah Greifer

@noah_greifer

2 years

One of the biggest updates to {MatchIt} will be coming in a few months. I know I just updated it, but I'm adding a huge new feature that will take it to the next level.

3

1

32

Noah Greifer

@noah_greifer

4 months

Hit 4k followers today :) Thank you for the support everyone! How can I incorporate that into my H-index? 🥴

3

0

35

Noah Greifer

@noah_greifer

2 years

@SolomonKurz I don't know if it's *the* way to go, but it's a way to go. I almost always use random forest imputation. It's available in {mice}.

0

1

34

Noah Greifer

@noah_greifer

3 years

This is a great post about the plausibility of the utility of covariate adjustment methods (e.g., OLS, PSM) in economics. Every field needs a paper like this instead of endless simulation studies comparing methods. #causalinference @dmckenzie001

3

9

33

Noah Greifer

@noah_greifer

2 years

Another day, another problem solved by @VincentAB's {marginaleffects} #Rstats package.

2

34

Noah Greifer

@noah_greifer

3 years

#Rstats #ggplot2 tip:
To "zoom in" on a plot, you MUST use coord_cartesian() and not lims() or scale_x_continuous()!
coord_cartesian() changes the plot area
lims() or scale_x_continuous() discard data points!
See below for an example:

1

8

27

Noah Greifer

@noah_greifer

1 year

I hear it and I know

0

2

30

Noah Greifer

@noah_greifer

1 year

Thank you everyone for 3k followers! This is easily the largest platform I've ever had and I'm honored so many of you are interested in what I have to say!

0

2

28

Noah Greifer

@noah_greifer

6 months

Call for some statistics help: How can I compare nested models that are not *symbolically* nested when using robust SEs? LR/score test doesn't account for robust SEs, and the usual Wald test needs symbolic nesting. Details here:

6

4

29

Noah Greifer

@noah_greifer

10 months

@CausalHuber convincingly demonstrated that this approach is flawed in economics research because one conditions on colliders, allowing bias to remain in the direct effect and therefore distorting the disparity estimate.

1

3

28

Noah Greifer

@noah_greifer

3 years

I just hit 20k points on CrossValidated, the StackExchange statistics help site. I've been on the site for 5 years and 8 months, have answered 651 questions, and have reached ~360,000 people.

3

1

26

Noah Greifer

@noah_greifer

2 years

If you've ever been curious about entropy balancing, you might benefit from reading my answer to this question about it on CrossValidated. #causaltwitter

3

4

27

Noah Greifer

@noah_greifer

2 years

I have read so many great threads like this, putting names to all my experiences, that say "This is ADHD". The biggest barriers to my success are things people often label as symptoms of untreated ADHD. Chronic procrastination, hyperfocus, inability to do certain tasks, ...

2

1

27

Noah Greifer

@noah_greifer

3 years

First day at Harvard today 😬

2

0

26

Noah Greifer

@noah_greifer

1 year

So, the correct answer is D (1)! Why? 1 < 2 evaluates to a length 1 logical vector, which means only the first element of the second argument is returned. The length of the first argument determines the length of the output.

Noah Greifer

@noah_greifer

1 year

#Rstats quiz:
a <- 1:4
b <- ifelse(1 < 2, a, 0)
print(b)
What should you expect?

2

27
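
The quiz and its answer can be reproduced in base R. The sketch below also shows two alternatives for contrast; the names `b_all` and `b_vec` are introduced here for illustration and are not part of the original quiz.

```r
a <- 1:4

# The test `1 < 2` has length 1, so ifelse() returns a length-1 result:
# only the first element of `a` is used.
b <- ifelse(1 < 2, a, 0)
print(b)

# A plain if/else returns the whole object instead
b_all <- if (1 < 2) a else 0
print(b_all)

# A test of length 4 gives an element-wise result of length 4
b_vec <- ifelse(a < 3, a, 0)
print(b_vec)
```

When the condition is a single TRUE or FALSE and a whole vector is wanted back, `if (...) ... else ...` is usually the safer construct; `ifelse()` is meant for element-wise selection.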

Noah Greifer

@noah_greifer

1 year

@PhDemetri They talk past each other because they are saying two different things. But that means you can learn two different things and reconcile them yourself. Pearl is explaining a formal system of causality; Senn and Harrell are explaining how to design studies for causality.

2

0

28

Noah Greifer

@noah_greifer

3 years

My thoughts on the ubiquity and limitations of propensity scores. I'm not as much of a propensity score fanatic as you might think ;) #causaltwitter #epitwitter

1

5

27

Noah Greifer

@noah_greifer

3 years

@adamjnafa 😌
For when you need a citation clapback.

3

2

29

Noah Greifer

@noah_greifer

4 years

MatchIt version 4.2.0 was released on CRAN today!
New features:
- distance can be supplied as a distance matrix
- new tools and guide for moderation analysis
- anti-exact matching
- speed improvements (esp. for exact matching)
#Rstats #Causaltwitter #Econtwitter #Epitwitter

1

10

25

Noah Greifer

@noah_greifer

11 months

I know I'm usually answering questions on CV and not asking them, but if anyone could help with a question about propensity scores (really) and GMM I would appreciate it! #EconTwitter #causalinference

6

9

27

Noah Greifer

@noah_greifer

5 years

Also peep the new bio 👀

7

0

24

Noah Greifer

@noah_greifer

9 months

@MatthewBJane One way would be to plot the linear predictor (the mean of the latent variable) and the thresholds. You also superimpose the implied logistic distributions on the line. Something like this from Long and Freese (2014):

2

0

27

Noah Greifer

@noah_greifer

10 months

@PhDemetri Recommending:
- Greifer & Stuart (2021):
- Stuart (2010):
- Ho et al. (2007):
- The MatchIt documentation:
This is my area of expertise so again feel free to ask.

2

3

28