Noah Greifer

@noah_greifer

Followers
4K
Following
2K
Media
42
Statuses
2K

Statistical consultant and programmer at @Harvard @IQSS | Maintainer of the #Rstats packages 'cobalt', 'MatchIt', and 'WeightIt' (and several others) | he/him

Cambridge, MA
Joined January 2020
@noah_greifer
Noah Greifer
2 years
I finally made a personal website! Check it out below. I'll be posting blogs about statistics and statistical programming.
5
7
105
@noah_greifer
Noah Greifer
5 years
If you're spending more than an hour trying to figure out something potentially simple in R, just message me. Worst case I'll say I can't help at the moment or don't know the answer. Best case I solve your problem in minutes.
28
121
1K
@noah_greifer
Noah Greifer
2 years
Though I don't see it as really relevant to my professional life, which is all I post about here, I am gay 🏳️‍🌈. Much of my work has been inspired by that of other queer scientists, and hopefully I can inspire yet others. #NationalComingOutDay
6
7
541
@noah_greifer
Noah Greifer
3 years
Hot take: all your logistic regressions should be bias-corrected (i.e., Firth). In #Rstats, this is as simple as adding `method = brglm2::brglmFit` in your call to glm(). Nothing else needs to change.
21
45
412
@noah_greifer
Noah Greifer
10 months
This is one of my favorite posts on CrossValidated because it clearly describes what each assumption in linear regression means and what it is necessary and sufficient for. A great antidote to people thinking residuals have to be normally distributed.
6
51
366
@noah_greifer
Noah Greifer
3 years
Sometimes I think statistical methods should be accessible to all and I want to devote my life to making them comprehensible and easy to implement, and sometimes I think only like 20 people in the world should be allowed to run a propensity score analysis.
8
16
351
@noah_greifer
Noah Greifer
4 months
Sometimes it pisses me off how well bootstrapping works. I tend to prefer analytic solutions for speed and to remove Monte Carlo error, but there is no denying the effectiveness and generality of the bootstrap for most statistical problems I encounter. It can feel like cheating.
15
21
344
@noah_greifer
Noah Greifer
5 years
Defending my dissertation tomorrow(!) and feeling pretty shitty about it. Rereading my document and realizing how long, boring, and riddled with small errors it is. Any words of advice/encouragement/comfort (here or DM) would be appreciated.
55
4
227
@noah_greifer
Noah Greifer
3 years
Understanding regression really well is so empowering.
6
13
236
@noah_greifer
Noah Greifer
2 years
New Rosenbaum & Rubin dropped. My favorite line: "The statistician who adjusts for observed covariates in an ornate and obscure way does no service, particularly if ornate obscurity erects barriers to success in the step from association to causation."
4
47
216
@noah_greifer
Noah Greifer
3 years
A way to instantly make your presentations better: every time you show a graph, explain what its axes mean and what a given point corresponds to.
9
20
186
@noah_greifer
Noah Greifer
2 years
Some surprising #causalinference facts:
- A confounder doesn't have to cause the treatment.
- A confounder doesn't have to cause the outcome.
- A variable can cause the treatment and outcome and not be a confounder.
See my answer on CV for an explanation.
8
27
186
@noah_greifer
Noah Greifer
11 months
I'm so excited to announce that my #Rstats package `WeightIt` for estimating balancing weights (e.g., IPTW) has received a major new update that is now on CRAN. I wrote a blog post about the biggest new features. #causaltwitter #epitwitter
1
39
178
@noah_greifer
Noah Greifer
6 months
Working on an R package no one asked for that no one is going to use. Why? I just think it's neat.
5
5
164
@noah_greifer
Noah Greifer
4 years
I'm excited to announce that I have accepted a position at the @Harvard @IQSS as a data science specialist! I'm sad to leave my postdoc with the amazing @Lizstuartdc early, but I could not pass up this incredible opportunity. A big thank you to @kinggary for seeing my potential.
13
2
148
@noah_greifer
Noah Greifer
6 months
My new blog post, An Odds Ratio Paradox, in which I introduce the paradox and don't solve it.
5
18
141
@noah_greifer
Noah Greifer
2 years
#Rstats {MatchIt} v4.5.0 is out! Big changes:
- New matching method: generalized full matching (`method = "quick"`)
- New unified framework for estimating effects after matching using {marginaleffects}
Read the rest here: (1/8)
5
24
121
@noah_greifer
Noah Greifer
3 years
Why Do We Do Matching vs. Regression To Adjust for Confounding? A Tale.
4
26
122
@noah_greifer
Noah Greifer
5 years
Hey everyone, I passed :) There's a new PhD in town. Thank you so much for your support in all this! I'm so glad I had a whole community rally behind me and send their love and encouragement. It really meant a lot.
@noah_greifer
Noah Greifer
5 years
Defending my dissertation tomorrow(!) and feeling pretty shitty about it. Rereading my document and realizing how long, boring, and riddled with small errors it is. Any words of advice/encouragement/comfort (here or DM) would be appreciated.
9
1
109
@noah_greifer
Noah Greifer
4 months
As we wait for the wisdom to drop, I remind everyone that {WeightIt} is the only #Rstats package (I think!) that correctly computes standard errors for IPWRA and can replicate teffects ipwra in Stata, but supports more estimators, estimands, and treatment types.
@jmwooldridge
Jeffrey Wooldridge
4 months
Tomorrow I'll tweet about why, of all the treatment effect estimators available when treatment is unconfounded, I prefer IPWRA: inverse probability weighted regression adjustment.
3
12
114
@noah_greifer
Noah Greifer
2 years
Does anyone know of guides to help medical researchers choose among odds ratios, risk ratios, risk differences, NNT, etc.? Or between marginal and conditional effects? I end up having to write the same long email explaining these choices each time I consult. #epitwitter
19
16
105
@noah_greifer
Noah Greifer
2 years
Have you ever wanted a FWB? Of course I'm talking about the fractional weighted bootstrap! (aka the Bayesian bootstrap). My new #Rstats package {fwb} implements the FWB and acts as a drop-in for {boot}. Check out the website. So what is the FWB?
5
26
104
@noah_greifer
Noah Greifer
11 months
New blog post! I explain M-estimation (and show you how to do it!), demonstrate how logistic regression is inextricably linked to covariate balance, and reveal the genius of CBPS and overlap weights. #causalinference #econtwitter #epitwitter #Rstats
6
25
105
@noah_greifer
Noah Greifer
5 years
You do learn best by struggling through it yourself, but sometimes you just want it done. And I think there can be a difference between experiential learning and needless floundering.
2
1
99
@noah_greifer
Noah Greifer
4 months
Those who use linear regression, why don't you use flexible models, like GAMs, splines, or locally weighted linear models? (This is a query, not a judgment; I don't use them either.)
23
9
93
@noah_greifer
Noah Greifer
2 years
#Rstats {clarify} is out! It uses simulation-based inference to compute interpretable quantities from regression models, such as average marginal effects and predictions at representative values, similar to {Zelig} and Clarify for Stata.
@kinggary
Gary King
2 years
Did you use Clarify for Stata but would like to do it in R? Or maybe you used {Zelig} for R (before it retired)? Welcome to our new R package {clarify}: Software for Interpreting and Presenting Statistical Results
1
12
89
@noah_greifer
Noah Greifer
5 months
I think I need to stop gatekeeping this paper, which has recently become one of my favorites: "Assumption Lean Regression" by Berk et al. (2023), and its more technical cousin by Buja et al. (2019).
@MatthewBJane
Matthew B Jané
5 months
LR parameter estimates need no assumption of normality or linearity between variables (see Gauss-Markov theorem). The conditional normality assumption is needed for analytic SEs and test-statistics if we model the residuals as normal, but it can be any distribution.
2
12
88
@noah_greifer
Noah Greifer
4 years
My first publication with @Lizstuartdc, the dream finally came true :)
3
0
76
@noah_greifer
Noah Greifer
2 years
I'm really trying to figure out survival analysis and am seeking recommendations for learning materials, which can be of any form, ideally oriented towards junior biostats PhD students, i.e., getting into the weeds of estimation and inference for basic methods. Thanks!
21
8
76
@noah_greifer
Noah Greifer
2 years
New blog post! On how matching is a nonparametric method of estimating propensity scores, and matching weights are propensity score weights. #causaltwitter #epitwitter #EconTwitter
2
18
78
@noah_greifer
Noah Greifer
5 years
To be clear, I'm not offering an unlimited free consulting service. I enjoy helping and I love #Rstats, but I'm afraid this tweet is taking on a life of its own 😅
2
1
71
@noah_greifer
Noah Greifer
4 years
The moment I live for and the reason I enjoy volunteering my time to help others with R :)
0
1
75
@noah_greifer
Noah Greifer
10 months
To ask "does M mediate the effect of A on Y?" is to ask "what is the indirect effect of A on Y through M?" That is, we have to convert a substantive question into one with a specific estimand, the indirect effect. 🧵
1
11
72
@noah_greifer
Noah Greifer
2 years
#causalinference question: Does fitting a hurdle/zero-inflated model yield biased effects because you are conditioning on a post-trt collider (i.e., membership in the zero class)? In particular, is the coefficient on the count part uninterpretable as causal, even in an RCT?
9
13
64
@noah_greifer
Noah Greifer
2 years
I get a lot of positive feedback on @FarhadPishgar and my #Rstats package {MatchThem} for matching and weighting with multiply imputed data. My newest blog post demonstrates how to integrate it with {marginaleffects} and {clarify} to estimate treatment effects.
0
11
73
@noah_greifer
Noah Greifer
11 months
Random #Rstats fact: `binomial()$linkinv(x)` is faster than and yields the same value as `plogis(x)`, both of which are equal to (and faster than) `(1 + exp(-x))^-1`, also known as "expit" or "inverse logit".
6
4
68
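The equality claimed in the tweet above can be checked with a minimal base R sketch. The grid of `x` values and the variable names are illustrative assumptions, and the sketch checks only that the three expressions agree, not the speed claim.

```r
# Three ways to compute the inverse logit ("expit") in base R.
# The grid of x values is an arbitrary illustration.
x <- seq(-5, 5, by = 0.5)

via_linkinv <- binomial()$linkinv(x)  # inverse of the default logit link
via_plogis  <- plogis(x)              # CDF of the standard logistic distribution
via_manual  <- (1 + exp(-x))^-1       # the formula written out by hand

# all.equal() compares numbers up to floating-point tolerance
same_values <- isTRUE(all.equal(via_linkinv, via_plogis)) &&
  isTRUE(all.equal(via_plogis, via_manual))
print(same_values)
```

Timing differences, if they matter for a given use, would need to be measured separately on the machine in question.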
@noah_greifer
Noah Greifer
9 months
#Rstats {WeightIt} v1.1.0 is released! Updates here: Summary below:
1
12
66
@noah_greifer
Noah Greifer
2 years
New MatchIt update coming soon 👀
4
3
61
@noah_greifer
Noah Greifer
2 years
Genetic matching is uniformly superior to nearest-neighbor matching (if you have the patience for it!). In my newest blog post, I explain everything you could want to know about genetic matching, including how to program it yourself!
2
15
65
@noah_greifer
Noah Greifer
2 years
A big update to my #Rstats package {WeightIt} (0.14.0). New features:
- Energy balancing for continuous treatments using methodology by @jared_huling
- A new vignette on estimating effects after weighting using @VincentAB's {marginaleffects}
All changes:
1
12
61
@noah_greifer
Noah Greifer
11 months
A huge new update is coming to #Rstats `WeightIt`, with two new features not available elsewhere in R. One is a weighting method (old in the literature but new to R) and the other will help with effect estimation. Any guesses as to what they are?
3
2
59
@noah_greifer
Noah Greifer
1 year
If you are an RStudio + Dropbox user and notice that DB is constantly syncing when you have RS open, using a lot of CPU, I have a solution for you, with much credit to @openai ChatGPT for helping me with the solution, which requires using Terminal because DB sucks. 🧵
5
8
62
@noah_greifer
Noah Greifer
2 years
I haven't been asked to review papers in a while, which makes me fear that bad research using propensity scores is getting through. I am available to review application papers that use PS and applied methodological papers on PS (e.g., simulation studies).
2
8
53
@noah_greifer
Noah Greifer
5 months
1) Read the documentation. 2) Don't do mediation. Okay that knocks out about 90% of the questions I get. Now let me implement this obscure estimator that even its inventor won't use.
2
0
53
@noah_greifer
Noah Greifer
4 years
In #Rstats, if you printed something to the console but forgot to save it as an object and you want to use the output without re-running the functions, the output is stored in the `.Last.value` variable and can be saved from there. @RLangTip
4
10
53
@noah_greifer
Noah Greifer
10 months
There is a kind of "equity" study that seems popular in medical research where you adjust for all mediators between group membership (e.g., race) and an outcome to claim that a disparity exists. The direct effect of group is interpreted as the magnitude of the disparity.
3
11
57
@noah_greifer
Noah Greifer
3 years
If you've ever thought the point of centering variables in regression was to reduce collinearity, get that out of your head immediately! The *sole* point of centering is to change the interpretation of coefficients in the model.
5
7
50
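The centering point above can be illustrated with a small simulation in base R. This is a sketch under stated assumptions: the data-generating model, sample size, and variable names are invented for illustration. It shows that centering the predictors in an interaction model leaves the fitted values and the interaction coefficient unchanged. Only the lower-order coefficients change, because they now describe the slope at the mean of the other variable instead of at zero.

```r
# Simulated data; all numbers here are arbitrary illustrations.
set.seed(123)
n <- 200
x <- rnorm(n, mean = 50, sd = 10)
z <- rnorm(n, mean = 20, sd = 5)
y <- 1 + 0.3 * x + 0.5 * z + 0.02 * x * z + rnorm(n)

# Same model fit with raw and with mean-centered predictors
fit_raw <- lm(y ~ x * z)
xc <- x - mean(x)
zc <- z - mean(z)
fit_centered <- lm(y ~ xc * zc)

# The two fits are the same model: identical fitted values
same_fit <- isTRUE(all.equal(unname(fitted(fit_raw)),
                             unname(fitted(fit_centered))))

# Coefficient on x: slope of x when z = 0 (raw) vs. when z = mean(z) (centered)
slope_at_zero   <- unname(coef(fit_raw)["x"])
slope_at_mean_z <- unname(coef(fit_centered)["xc"])

# The centered coefficient can be recovered from the raw fit
recovered <- unname(coef(fit_raw)["x"] + coef(fit_raw)["x:z"] * mean(z))

print(c(same_fit = same_fit))
print(c(slope_at_zero = slope_at_zero,
        slope_at_mean_z = slope_at_mean_z,
        recovered = recovered))
```

Because the centered coefficient is an exact function of the raw coefficients, the two parameterizations carry the same information. They differ only in which quantity each coefficient reports.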
@noah_greifer
Noah Greifer
5 years
Did not expect to triple my follower count overnight. You've all made a terrible mistake; I'm incredibly boring (on Twitter).
1
0
52
@noah_greifer
Noah Greifer
4 years
The new #Rstats MatchIt V4 is finally out! So many new features, fixes, and improvements! Pages and pages of documentation! A new website and logo! #CausalTwitter #EconTwitter #epitwitter @kinggary @Lizstuartdc
@kinggary
Gary King
4 years
New version of "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference" with new features, website, more.
1
17
52
@noah_greifer
Noah Greifer
4 years
It pains me to see people manually program "logit" functions in #rstats when R already has these built-ins:
qlogis() = "logit"; probability to log odds
plogis() = "inverse logit"; log odds to probability
8
14
50
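As a brief illustration of the two built-ins named above, the following base R sketch converts a few probabilities to log odds and back. The probabilities and variable names are arbitrary choices for the example.

```r
# Example probabilities (arbitrary)
p <- c(0.1, 0.25, 0.5, 0.9)

log_odds     <- qlogis(p)          # "logit": probability to log odds
manual_logit <- log(p / (1 - p))   # the hand-written version it replaces
p_back       <- plogis(log_odds)   # "inverse logit": log odds to probability

# Built-in and manual versions agree, and the round trip recovers p
print(all.equal(log_odds, manual_logit))
print(all.equal(p_back, p))
```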
@noah_greifer
Noah Greifer
3 years
I'm sorry, but I have to once again tweet about this absolutely incredible paper. Every page describes a new discovery or connection. Seriously one of the most ambitious and illuminating #causalinference papers I've ever read. And extremely clear, too.
1
8
48
@noah_greifer
Noah Greifer
3 months
@MatthewBJane I like mclogit::mblogit(), which performs fast multinomial logistic regression with optional random effects. Supported by {marginaleffects}. Make sure not to use mclogit() unless you know what you're doing! (It fits a different model.)
2
3
51
@noah_greifer
Noah Greifer
2 years
Any good papers on alternatives to hazard ratios for quantifying treatment effects on survival outcomes? Ideally review papers aimed at an applied audience. #epitwitter #causaltwitter
10
8
52
@noah_greifer
Noah Greifer
2 years
2000 followers! Thank you so much for giving me a platform to talk about statistics :) I know my feed is pretty dry but hopefully I've improved some people's lives with my online presence.
3
2
49
@noah_greifer
Noah Greifer
10 months
The #rstats `mediation` package is great, but people need to understand that it's not a general-purpose mediation package. It implements one specific method of mediation that requires parameters (like the "treated" and "control" values) to be set in a specific way.
8
9
50
@noah_greifer
Noah Greifer
11 months
@PhDemetri I'm so flattered, thank you so much!!! I only accept payment in the form of validation, recognition, and exposure at the moment, so this tweet was payment enough :)
4
0
47
@noah_greifer
Noah Greifer
3 years
@kareem_carr This relies on the idea that coefficients in multiple regression have meaningful interpretations. But they don't. That's what the table 2 fallacy is all about. If you want average marginal effects, you can get those from machine learning models.
5
2
48
@noah_greifer
Noah Greifer
2 years
One of you had a nice blog post about why we should never do mediation analysis. Please help me find it (or submit your own helpful posts/articles).
9
6
46
@noah_greifer
Noah Greifer
4 years
ATE, ATT, ATO: how do you choose? Different methods target different estimands, yielding effects with different interpretations. How do these estimands differ, and which one is right for you? @Lizstuartdc and I explore that in our new article.
2
9
43
@noah_greifer
Noah Greifer
3 years
I loved being a guest on @quantitudepod and talking about my favorite topic, propensity scores! Thanks P & G :)
@quantitudepod
quantitudethepodcast
3 years
S3E27: Propensity Scores — I Meant To Do That!. P & G hang out with @noah_greifer, Institute for Quantitative Social Sciences at Harvard University, to discuss propensity scores: what they are, how we get them, and how they can strengthen causal inference.
3
2
44
@noah_greifer
Noah Greifer
2 years
"A Violin Plot"
or
"A Kernel Density Plot and Then the Exact Same Kernel Density Plot but Upside Down This Time"
or
"What If a Kernel Density Plot, but Twice?"
or
"A Georgia O'Keeffe Painting but Ugly and Made of Data"
or
3
2
39
@noah_greifer
Noah Greifer
2 years
Have any of you used for writing a manuscript? I saw it recommended by a journal I was submitting to and it looks really cool. Curious if anyone has used it in practice and whether you would recommend it.
4
9
42
@noah_greifer
Noah Greifer
1 year
The problem with conditioning on a post-treatment variable (CPTV) isn't (just) that you are conditioning on a collider; it's that you are conditioning on a mediator. Even if it weren't a collider, CPTV would still change the interpretation of the treatment effect estimate.
1
8
41
@noah_greifer
Noah Greifer
3 years
All models are wrong except my cross-fit SuperLearner with GAM, GBM, random forests, HAL, and BART as candidate libraries.
1
1
40
@noah_greifer
Noah Greifer
2 years
Really frustrated with a paper I'm reviewing. I recommended rejection with a litany of complaints, editor gave them a resubmit, the resubmission doesn't fix any of my concerns. We're on round 3 and the paper still sucks.
10
0
42
@noah_greifer
Noah Greifer
3 years
How can you further adjust for propensity scores after matching? My answer here: Short answer: g-computation in the matched sample, made possible by @VincentAB's {marginaleffects} package.
1
3
43
@noah_greifer
Noah Greifer
2 years
I currently have a blog post up about performing subgroup/moderation analysis after propensity score matching in R. I hope you find it useful!
1
4
42
@noah_greifer
Noah Greifer
9 months
This is a must-read for anyone interested in causal inference methods, especially if TMLE or DML are opaque to you. Well written as always @ildiazm. Your clarity, rigor, and expertise are inspiring.
@LarsvanderLaan3
Lars van der Laan
9 months
What are the differences between one-step estimation, Double ML, and Targeted ML? This commentary (@ildiazm) and blog post (@mark_vdlaan) provide an overview of the history of machine learning in semiparametrics.
1
4
42
@noah_greifer
Noah Greifer
2 years
*me clutching onto my ornate and obscure covariate adjustment methods*
1
2
36
@noah_greifer
Noah Greifer
2 years
MatchIt has been updated to 4.5.3 with some critical bug fixes, in particular with k:1 matching with replacement. Please update MatchIt and re-run your analyses if you used this method using version 4.5.1 or 4.5.2 (i.e., between the end of Feb and now).
1
5
40
@noah_greifer
Noah Greifer
1 year
Today I hit 30k reputation points on CrossValidated. Thanks to everyone who has found my answers useful and upvoted or shared them!
4
0
33
@noah_greifer
Noah Greifer
3 years
Come join me at Harvard! The Data Science Services team at @IQSS is hiring for a statistical consultant position. Details here: This is essentially the same position I'm in, so I'm happy to answer any questions about it.
8
26
37
@noah_greifer
Noah Greifer
2 years
I'll be honest. I have no idea what Quarto is and I'm too afraid to ask.
2
0
37
@noah_greifer
Noah Greifer
3 years
Can't wait to show you :) #Rstats
1
2
32
@noah_greifer
Noah Greifer
2 years
The observation was due to the misspecification of the variance in both models. Using robust SEs (with quasipoisson) revealed the hypothesized pattern. Lesson: quasipoisson + robust SEs over Poisson/NB.
@SolomonKurz
Solomon Kurz
2 years
Presuming a clean RCT, is the ANCOVA model better than the ANOVA model when using the Poisson likelihood? After working with a real data set and doing a little simulation, it seems like the Poisson ANCOVA doesn't boost power for the beta coefficient, or for the ATE. Citations?
4
3
35
@noah_greifer
Noah Greifer
2 years
One of the biggest updates to {MatchIt} will be coming in a few months. I know I just updated it, but I'm adding a huge new feature that will take it to the next level.
3
1
32
@noah_greifer
Noah Greifer
4 months
Hit 4k followers today :) Thank you for the support everyone! How can I incorporate that into my H-index? 🥴
3
0
35
@noah_greifer
Noah Greifer
2 years
@SolomonKurz I don't know if it's *the* way to go, but it's a way to go. I almost always use random forest imputation. It's available in {mice}.
0
1
34
@noah_greifer
Noah Greifer
3 years
This is a great post about the plausibility of the utility of covariate adjustment methods (e.g., OLS, PSM) in economics. Every field needs a paper like this instead of endless simulation studies comparing methods. #causalinference @dmckenzie001
3
9
33
@noah_greifer
Noah Greifer
2 years
Another day, another problem solved by @VincentAB's {marginaleffects} #Rstats package.
2
2
34
@noah_greifer
Noah Greifer
3 years
#Rstats #ggplot2 tip: To "zoom in" on a plot, you MUST use coord_cartesian() and not lims() or scale_x_continuous()! coord_cartesian() changes the plot area; lims() or scale_x_continuous() discard data points! See below for an example.
1
8
27
@noah_greifer
Noah Greifer
1 year
I hear it and I know
0
2
30
@noah_greifer
Noah Greifer
1 year
Thank you everyone for 3k followers! This is easily the largest platform I've ever had and I'm honored so many of you are interested in what I have to say!
0
2
28
@noah_greifer
Noah Greifer
6 months
Call for some statistics help: How can I compare nested models that are not *symbolically* nested when using robust SEs? LR/score test doesn't account for robust SEs, and the usual Wald test needs symbolic nesting. Details here:
6
4
29
@noah_greifer
Noah Greifer
10 months
@CausalHuber convincingly demonstrated that this approach is flawed in economics research because one conditions on colliders, allowing bias to remain in the direct effect and therefore distorting the disparity estimate.
1
3
28
@noah_greifer
Noah Greifer
3 years
I just hit 20k points on CrossValidated, the StackExchange statistics help site. I've been on the site for 5 years and 8 months, have answered 651 questions, and have reached ~360,000 people.
3
1
26
@noah_greifer
Noah Greifer
2 years
If you've ever been curious about entropy balancing, you might benefit from reading my answer to this question about it on CrossValidated. #causaltwitter.
3
4
27
@noah_greifer
Noah Greifer
2 years
I have read so many great threads like this, putting names to all my experiences, that say "This is ADHD". The biggest barriers to my success are things people often label as symptoms of untreated ADHD. Chronic procrastination, hyperfocus, inability to do certain tasks, ...
2
1
27
@noah_greifer
Noah Greifer
3 years
First day at Harvard today 😬
2
0
26
@noah_greifer
Noah Greifer
1 year
So, the correct answer is D (1)! Why? `1 < 2` evaluates to a length-1 logical vector, which means only the first element of the second argument is returned. The length of the first argument determines the length of the output.
@noah_greifer
Noah Greifer
1 year
#Rstats quiz:
a <- 1:4
b <- ifelse(1 < 2, a, 0)
print(b)
What should you expect?
2
2
27
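The quiz behavior can be reproduced in base R with the short sketch below. The two alternatives after the quiz code are illustrative additions, not part of the original thread: one supplies a test as long as `a`, and the other uses `if`/`else`, which selects a whole object based on a single condition.

```r
a <- 1:4

# The quiz: the test `1 < 2` has length 1, so the result has length 1
b <- ifelse(1 < 2, a, 0)
print(b)  # 1

# Illustrative alternative 1: a test with the same length as `a`
b_vectorized <- ifelse(rep(1 < 2, length(a)), a, 0)
print(b_vectorized)

# Illustrative alternative 2: if/else returns the whole chosen object
b_scalar_test <- if (1 < 2) a else 0
print(b_scalar_test)
```

The general pattern is that `ifelse()` suits element-wise choices driven by a vector test, while `if`/`else` suits a single condition that picks between two whole objects.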
@noah_greifer
Noah Greifer
1 year
@PhDemetri They talk past each other because they are saying two different things. But that means you can learn two different things and reconcile them yourself. Pearl is explaining a formal system of causality; Senn and Harrell are explaining how to design studies for causality.
2
0
28
@noah_greifer
Noah Greifer
3 years
My thoughts on the ubiquity and limitations of propensity scores. I'm not as much of a propensity score fanatic as you might think ;) #causaltwitter #epitwitter
1
5
27
@noah_greifer
Noah Greifer
3 years
@adamjnafa 😌 For when you need a citation clapback.
3
2
29
@noah_greifer
Noah Greifer
4 years
MatchIt version 4.2.0 was released on CRAN today! New features:
- distance can be supplied as a distance matrix
- new tools and guide for moderation analysis
- anti-exact matching
- speed improvements (esp. for exact matching)
#Rstats #Causaltwitter #Econtwitter #Epitwitter
1
10
25
@noah_greifer
Noah Greifer
11 months
I know I'm usually answering questions on CV and not asking them, but if anyone could help with a question about propensity scores (really) and GMM I would appreciate it! #EconTwitter #causalinference
6
9
27
@noah_greifer
Noah Greifer
5 years
Also peep the new bio 👀
7
0
24
@noah_greifer
Noah Greifer
9 months
@MatthewBJane One way would be to plot the linear predictor (the mean of the latent variable) and the thresholds. You also superimpose the implied logistic distributions on the line. Something like this from Long and Freese (2014):
2
0
27
@noah_greifer
Noah Greifer
10 months
@PhDemetri Recommending:
- Greifer & Stuart (2021)
- Stuart (2010)
- Ho et al. (2007)
- The MatchIt documentation
This is my area of expertise so again feel free to ask.
2
3
28