Noah Greifer
@noah_greifer
Followers: 4K | Following: 2K | Media: 42 | Statuses: 2K
Statistical consultant and programmer at @Harvard @IQSS | Maintainer of the #Rstats packages 'cobalt', 'MatchIt', and 'WeightIt' (and several others) | he/him
Cambridge, MA
Joined January 2020
Though I don't see it as really relevant to my professional life, which is all I post about here, I am gay 🏳️‍🌈. Much of my work has been inspired by that of other queer scientists, and hopefully I can inspire yet others. #NationalComingOutDay
6
7
541
Hot take: all your logistic regressions should be bias-corrected (i.e., Firth). In #Rstats, this is as simple as adding
method = brglm2::brglmFit
in your call to glm(). Nothing else needs to change.
21
45
412
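A minimal sketch of the tweet above, assuming the {brglm2} package is installed (the code skips the bias-reduced fit otherwise). The six-row data set is invented purely for illustration: it is completely separated, so the ordinary maximum-likelihood slope diverges while the bias-reduced fit stays finite.

```r
# Toy, made-up data with complete separation: y is 0 for x <= 3 and 1 for x >= 4.
d <- data.frame(x = 1:6, y = c(0, 0, 0, 1, 1, 1))

# Ordinary maximum-likelihood logistic regression.
# glm() warns about fitted probabilities of 0 or 1; the slope estimate is huge.
fit_ml <- suppressWarnings(glm(y ~ x, family = binomial(), data = d))

# Same call with only the fitting method changed, as the tweet describes.
if (requireNamespace("brglm2", quietly = TRUE)) {
  fit_firth <- glm(y ~ x, family = binomial(), data = d,
                   method = brglm2::brglmFit)
  print(coef(fit_ml))
  print(coef(fit_firth))
}
```

Everything else in the call (formula, family, data) is unchanged between the two fits.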
Some surprising #causalinference facts:
- A confounder doesn't have to cause the treatment.
- A confounder doesn't have to cause the outcome.
- A variable can cause the treatment and outcome and not be a confounder.
See my answer on CV for an explanation:
8
27
186
I'm so excited to announce that my #Rstats package `WeightIt` for estimating balancing weights (e.g., IPTW) has received a major new update that is now on CRAN. I wrote a blog post about the biggest new features: #causaltwitter #epitwitter
1
39
178
I'm excited to announce that I have accepted a position at the @Harvard @IQSS as a data science specialist! I'm sad to leave my postdoc with the amazing @Lizstuartdc early, but I could not pass up this incredible opportunity. A big thank you to @kinggary for seeing my potential.
13
2
148
#Rstats {MatchIt} v4.5.0 is out!
Big changes:
- New matching method: generalized full matching (`method = "quick"`)
- New unified framework for estimating effects after matching using {marginaleffects}
Read the rest here: (1/8)
5
24
121
Hey everyone, I passed :) There's a new PhD in town. Thank you so much for your support in all this! I'm so glad I had a whole community rally behind me and send their love and encouragement. It really meant a lot.
Defending my dissertation tomorrow(!) and feeling pretty shitty about it. Rereading my document and realizing how long, boring, and riddled with small errors it is. Any words of advice/encouragement/comfort (here or DM) would be appreciated.
9
1
109
As we wait for the wisdom to drop, I remind everyone that {WeightIt} is the only #Rstats package (I think!) that correctly computes standard errors for IPWRA and can replicate teffects ipwra in Stata, but supports more estimators, estimands, and treatment types.
Tomorrow I'll tweet about why, of all the treatment effect estimators available when treatment is unconfounded, I prefer IPWRA: inverse probability weighted regression adjustment.
3
12
114
Does anyone know of guides to help medical researchers choose among odds ratios, risk ratios, risk differences, NNT, etc.? Or between marginal and conditional effects? I end up having to write the same long email explaining these choices each time I consult. #epitwitter
19
16
105
Have you ever wanted a FWB? Of course I'm talking about the fractional weighted bootstrap! (aka the Bayesian bootstrap). My new #Rstats package {fwb} implements the FWB and acts as a drop-in for {boot}. Check out the website:
So what is the FWB?
5
26
104
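For readers wondering what a fractional weighted bootstrap replicate looks like, here is a base-R sketch under stated assumptions. It does not use the {fwb} package or its API: instead of resampling rows, each replicate draws positive random weights (normalized Exp(1) draws, i.e. flat Dirichlet weights scaled to sum to n) and recomputes a weighted statistic. The simulated data, the seed, and the number of replicates are arbitrary choices for illustration.

```r
set.seed(1)                       # arbitrary seed for reproducibility
x <- rnorm(50, mean = 10, sd = 2) # simulated data, illustration only
n <- length(x)
R <- 2000                         # number of bootstrap replicates

fwb_means <- replicate(R, {
  w <- rexp(n)                    # independent Exp(1) draws
  w <- w / sum(w) * n             # normalize: Dirichlet(1, ..., 1) weights scaled to sum to n
  weighted.mean(x, w)             # statistic recomputed with fractional weights
})

se_fwb      <- sd(fwb_means)      # bootstrap standard error of the mean
se_analytic <- sd(x) / sqrt(n)    # usual analytic standard error, for comparison
```

Because every unit keeps a strictly positive weight in every replicate, no observation is ever dropped, unlike in the ordinary resampling bootstrap.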
New blog post! I explain M-estimation (and show you how to do it!), demonstrate how logistic regression is inextricably linked to covariate balance, and reveal the genius of CBPS and overlap weights. #causalinference #econtwitter #epitwitter #Rstats
6
25
105
#Rstats {clarify} is out! It uses simulation-based inference to compute interpretable quantities from regression models, such as average marginal effects and predictions at representative values, similar to {Zelig} and Clarify for Stata.
Did you use Clarify for Stata but would like to do it in R? Or maybe you used {Zelig} for R (before it retired)? Welcome to our new R package {clarify}: Software for Interpreting and Presenting Statistical Results
1
12
89
I think I need to stop gatekeeping this paper, which has recently become one of my favorites: "Assumption Lean Regression" by Berk et al. (2023), and its more technical cousin here by Buja et al. (2019):
LR parameter estimates need no assumption of normality or linearity between variables (see Gauss-Markov theorem). The conditional normality assumption is needed for analytic SEs and test-statistics if we model the residuals as normal, but it can be any distribution.
2
12
88
New blog post! On how matching is a nonparametric method of estimating propensity scores, and matching weights are propensity score weights. #causaltwitter #epitwitter #EconTwitter
2
18
78
To be clear, I'm not offering an unlimited free consulting service. I enjoy helping and I love #Rstats, but I'm afraid this tweet is taking on a life of its own 😅
2
1
71
#causalinference question: Does fitting a hurdle/zero-inflated model yield biased effects because you are conditioning on a post-trt collider (i.e., membership in the zero class)? In particular, is the coefficient on the count part uninterpretable as causal, even in an RCT?
9
13
64
I get a lot of positive feedback on @FarhadPishgar and my #Rstats package {MatchThem} for matching and weighting with multiply imputed data. My newest blog post demonstrates how to integrate it with {marginaleffects} and {clarify} to estimate tx effects:
0
11
73
Random #Rstats fact:
binomial()$linkinv(x)
is faster than and yields the same value as
plogis(x)
both of which are equal to (and faster than)
(1 + exp(-x))^-1
also known as "expit" or "inverse logit".
6
4
68
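A quick base-R check of the equality part of the tweet above. The input grid is an arbitrary choice, and the sketch deliberately does not measure speed, since timings depend on the machine and R version.

```r
# Three ways to compute the inverse logit ("expit") in base R.
x <- seq(-5, 5, by = 0.5)            # arbitrary grid of log-odds values

expit_linkinv <- binomial()$linkinv(x)
expit_plogis  <- plogis(x)
expit_manual  <- (1 + exp(-x))^-1

# All three agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(expit_linkinv, expit_plogis)))
stopifnot(isTRUE(all.equal(expit_plogis, expit_manual)))
```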
A big update to my #Rstats package {WeightIt} (0.14.0). New features:
- Energy balancing for continuous treatments using methodology by @jared_huling
- A new vignette on estimating effects after weighting using @VincentAB's {marginaleffects}
All changes:
1
12
61
A huge new update is coming to #Rstats `WeightIt`, with two new features not available elsewhere in R. One is a weighting method (old in the literature but new to R) and the other will help with effect estimation. Any guesses as to what they are?
3
2
59
If you are an RStudio + Dropbox user and notice that DB is constantly syncing when you have RS open, using a lot of CPU, I have a solution for you, with much credit to @openai ChatGPT for helping me with the solution, which requires using Terminal because DB sucks. 🧵
5
8
62
The new #Rstats MatchIt V4 is finally out! So many new features, fixes, and improvements! Pages and pages of documentation! A new website and logo! #CausalTwitter #EconTwitter #epitwitter @kinggary @Lizstuartdc
New version of "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference" with new features, website, more.
1
17
52
It pains me to see people manually program "logit" functions in #rstats when R already has these built-ins:
qlogis() = "logit"; probability to log odds
plogis() = "inverse logit"; log odds to probability
8
14
50
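A small sketch of the two built-ins from the tweet above, using arbitrary example probabilities. It checks that qlogis() matches a hand-written logit and that plogis() undoes it.

```r
p <- c(0.1, 0.25, 0.5, 0.9)          # arbitrary example probabilities

# Probability -> log odds ("logit").
log_odds <- qlogis(p)
stopifnot(isTRUE(all.equal(log_odds, log(p / (1 - p)))))

# Log odds -> probability ("inverse logit"); recovers the original values.
p_back <- plogis(log_odds)
stopifnot(isTRUE(all.equal(p_back, p)))
```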
I'm sorry, but I have to once again tweet about this absolutely incredible paper. Every page describes a new discovery or connection. Seriously one of the most ambitious and illuminating #causalinference papers I've ever read. And extremely clear, too.
1
8
48
@MatthewBJane I like mclogit::mblogit(), which performs fast multinomial logistic regression with optional random effects. Supported by {marginaleffects}. Make sure not to use mclogit() unless you know what you're doing! (It fits a different model.)
2
3
51
Any good papers on alternatives to hazard ratios for quantifying treatment effects on survival outcomes? Ideally review papers aimed at an applied audience. #epitwitter #causaltwitter
10
8
52
The #rstats `mediation` package is great, but people need to understand that it's not a general-purpose mediation package. It implements one specific method of mediation that requires parameters (like the "treated" and "control" values) to be set in a specific way.
8
9
50
@PhDemetri I'm so flattered, thank you so much!!! I only accept payment in the form of validation, recognition, and exposure at the moment, so this tweet was payment enough :)
4
0
47
@kareem_carr This relies on the idea that coefficients in multiple regression have meaningful interpretations. But they don't. That's what the table 2 fallacy is all about. If you want average marginal effects, you can get those from machine learning models.
5
2
48
ATE, ATT, ATO... how do you choose? Different methods target different estimands, yielding effects with different interpretations. How do these estimands differ, and which one is right for you? @Lizstuartdc and I explore that in our new article: >>
2
9
43
I loved being a guest on @quantitudepod and talking about my favorite topic, propensity scores! Thanks P & G :)
S3E27: Propensity Scores — I Meant To Do That! P & G hang out with @noah_greifer, Institute for Quantitative Social Science at Harvard University, to discuss propensity scores: what they are, how we get them, and how they can strengthen causal inference.
3
2
44
How can you further adjust for propensity scores after matching? My answer here:
Short answer: g-computation in the matched sample, made possible by @VincentAB's {marginaleffects} package.
1
3
43
This is a must-read for anyone interested in causal inference methods, especially if TMLE or DML are opaque to you. Well written as always @ildiazm. Your clarity, rigor, and expertise are inspiring.
What are the differences between one-step estimation, Double ML, and Targeted ML? This commentary (@ildiazm) and blog post (@mark_vdlaan) provide an overview of the history of machine learning in semiparametrics.
1
4
42
Come join me at Harvard! The Data Science Services team at @IQSS is hiring for a statistical consultant position. Details here:
This is essentially the same position I'm in, so I'm happy to answer any questions about it.
8
26
37
The observation was due to the misspecification of the variance in both models. Using robust SEs (with quasipoisson) revealed the hypothesized pattern. Lesson: quasipoisson + robust SEs over Poisson/NB.
Presuming a clean RCT, is the ANCOVA model better than the ANOVA model when using the Poisson likelihood? After working with a real data set and doing a little simulation, it seems like the Poisson ANCOVA doesn't boost power for the beta coefficient, or for the ATE. Citations?
4
3
35
@SolomonKurz I don't know if it's *the* way to go, but it's a way to go. I almost always use random forest imputation. It's available in {mice}.
0
1
34
This is a great post about the plausibility of the utility of covariate adjustment methods (e.g., OLS, PSM) in economics. Every field needs a paper like this instead of endless simulation studies comparing methods. #causalinference @dmckenzie001
3
9
33
@CausalHuber convincingly demonstrated that this approach is flawed in economics research because one conditions on colliders, allowing bias to remain in the direct effect and therefore distorting the disparity estimate.
1
3
28
If you've ever been curious about entropy balancing, you might benefit from reading my answer to this question about it on CrossValidated. #causaltwitter
3
4
27
So, the correct answer is D (1)! Why? 1 < 2 evaluates to a length 1 logical vector, which means only the first element of the second argument is returned. The length of the first argument determines the length of the output.
#Rstats quiz:
a <- 1:4
b <- ifelse(1 < 2, a, 0)
print(b)
What should you expect?
2
2
27
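The quiz can be run directly in base R. The sketch below reproduces it and adds one possible alternative, if/else, for the case where a single condition should select a whole vector; the alternative is an illustration rather than part of the original thread.

```r
a <- 1:4

# ifelse() is vectorized over its first argument (the test).
# The test `1 < 2` has length 1, so the result has length 1.
b <- ifelse(1 < 2, a, 0)
print(b)          # prints [1] 1

# if/else evaluates one condition and returns the chosen object whole.
b_full <- if (1 < 2) a else 0
print(b_full)     # prints [1] 1 2 3 4
```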
@PhDemetri They talk past each other because they are saying two different things. But that means you can learn two different things and reconcile them yourself. Pearl is explaining a formal system of causality; Senn and Harrell are explaining how to design studies for causality.
2
0
28
My thoughts on the ubiquity and limitations of propensity scores. I'm not as much of a propensity score fanatic as you might think ;) #causaltwitter #epitwitter
1
5
27
MatchIt version 4.2.0 was released on CRAN today!
New features:
- distance can be supplied as a distance matrix
- new tools and guide for moderation analysis
- anti-exact matching
- speed improvements (esp. for exact matching)
#Rstats #Causaltwitter #Econtwitter #Epitwitter
1
10
25
I know I'm usually answering questions on CV and not asking them, but if anyone could help with a question about propensity scores (really) and GMM I would appreciate it! #EconTwitter #causalinference.
6
9
27
@MatthewBJane One way would be to plot the linear predictor (the mean of the latent variable) and the thresholds. You also superimpose the implied logistic distributions on the line. Something like this from Long and Freese (2014):
2
0
27
@PhDemetri Recommending:
- Greifer & Stuart (2021)
- Stuart (2010)
- Ho et al. (2007)
- The MatchIt documentation
This is my area of expertise so again feel free to ask.
2
3
28