Nick HK Profile
Nick HK

@nickchk

Followers: 19,152
Following: 341
Media: 1,194
Statuses: 17,178

Econ prof @SeattleU . Book The Effect out now! Check my pinned thread for all my projects. Substack

Joined October 2010
Pinned Tweet
@nickchk
Nick HK
5 years
This is a thread of me QTing myself so I can pin all my posted-on-Twitter projects. Causal inference animated graphs:
@nickchk
Nick HK
6 years
As requested, slower graphs! Also added a graph on collider bias, the webpage explanation helps there. These graphs are intended to show what standard causal inference methods actually *do* to data, and how they work. This is what controlling for a binary variable looks like:
26
458
2K
6
94
485
@nickchk
Nick HK
6 years
I've been getting used to gganimate and thought it would be useful to put together some illustrations of what various causal inference methods *actually do to data* and how they work. Here, for example, is what it means to control for a (binary) variable
43
1K
3K
@nickchk
Nick HK
2 years
In regression, there are several things that students are eternally concerned about but are actually Just Fine: 1. Your coefficients don't need to be significant 2. Your R^2 doesn't need to be huge 3. Your predictors can be correlated 4. Your variables don't need to be normal
51
362
3K
@nickchk
Nick HK
4 years
I feel like this is often not made clear enough in metrics
Tweet media one
33
425
2K
@nickchk
Nick HK
4 years
My students have been having trouble figuring out how to put together a regression model - what to include, what to interact, which variables are left-hand/right hand. So I made a flowchart.
Tweet media one
23
380
2K
@nickchk
Nick HK
5 years
A graphical explanation of why you shouldn't use R^2 to pick your model.
Tweet media one
28
328
1K
@nickchk
Nick HK
6 years
Over winter break I'm planning to automate my CV, so I can just update a spreadsheet with accomplishments and it will spit out an updated website and PDF CV. Is this something anyone else would use or should I write it just for myself?
86
55
1K
@nickchk
Nick HK
3 years
Today I'm releasing the first video designed to accompany my book The Effect, about research design and causal inference. There are ~70 videos planned for the series. These can be used to accompany the book, as classroom material, or just on their own.
10
231
1K
@nickchk
Nick HK
3 years
Today I finally get to share the full draft of my new book The Effect. I think it turned out really well. The Effect offers a highly accessible and intuitive approach to research design and causal inference with observational data. Find it here:
20
285
1K
@nickchk
Nick HK
3 years
My causal inference book The Effect is finally out today! It's a super intuitive look at causal research design, causal diagrams, and standard research methods. I'm super excited, and jealous of everyone getting their copies today before I get mine!
27
162
1K
@nickchk
Nick HK
5 years
I'm excited to show off something I've been working on for a few months now: the materials from a new class that pinpoints a very different, and I think very promising, way of teaching undergrad econometrics (thread)
20
221
887
@nickchk
Nick HK
3 years
Last night my husband offered every trick or treater (old enough to handle it) a choice between a candy or a "mystery box". Every single child chose the mystery box. Trust levels high in my neighborhood.
22
23
829
@nickchk
Nick HK
4 years
I've written an undergraduate-friendly guide to using simulations to do power analysis. It's in R but the concepts are general. Suppose I could do a Stata version too.
10
148
855
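The simulation approach to power analysis mentioned above can be sketched in a few lines. This is not the guide's code (the guide is in R); it is a minimal Python illustration in which the function name, the effect sizes, the sample size, and the two-sided 5% test are all assumptions chosen for the example.

```python
import random
from statistics import NormalDist


def simulated_power(p_control, p_treated, n_per_group,
                    n_sims=500, alpha=0.05, seed=0):
    """Share of simulated experiments where a two-proportion z-test rejects.

    Hypothetical helper for illustration: binary outcome, equal-sized arms.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        # Simulate one experiment: count successes in each arm
        x_c = sum(rng.random() < p_control for _ in range(n_per_group))
        x_t = sum(rng.random() < p_treated for _ in range(n_per_group))
        pc, pt = x_c / n_per_group, x_t / n_per_group
        # Unpooled standard error of the difference in proportions
        se = ((pc * (1 - pc) + pt * (1 - pt)) / n_per_group) ** 0.5
        if se > 0 and abs(pt - pc) / se > z_crit:
            rejections += 1
    return rejections / n_sims


# Assumed scenario: a rate moving from 50% to 60%, 500 units per arm
power = simulated_power(0.50, 0.60, n_per_group=500)
print(f"Estimated power: {power:.2f}")
```

The same loop works for any design you can simulate: swap in your own data-generating process and test, then vary `n_per_group` until the rejection share reaches the power you want.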
@nickchk
Nick HK
2 years
Statistics terminology is truly unhinged. Different names for the same terms across fields. The same names for different things across fields. Common language terms applied in highly specialized ways. Two similar terms used to mean entirely opposite concepts.
35
70
749
@nickchk
Nick HK
2 years
I wrote a guide to common errors and mistakes I see beginning R students making in my data viz class. Stuff like "function not found" or "package was compiled with R version..." or "don't put numbers in quotes". Your students may find it handy too.
8
153
753
@nickchk
Nick HK
4 years
I have accumulated many Stata tips over the years, but none of them are as simple yet provoke as many jaw-dropping "I wasted *all that time*" responses as inlist(). (as for why inlist() only accepts ten arguments I do not know)
Tweet media one
21
81
756
@nickchk
Nick HK
2 years
If you teach students to work with data, you're doing them a great disservice if you just teach them how to run models/analyses and not how to clean and manipulate data. If you haven't tested them directly on this, you'll be surprised how unintuitive this is to new users!
9
79
714
@nickchk
Nick HK
3 years
I am going to make a lil website that simply has correct definitions of "statistically significant", "confidence interval", and "p-value" and also a list of common incorrect definitions of these terms. If you have a good incorrect definition, list it here, I may include it.
39
34
683
@nickchk
Nick HK
5 years
Very excited to say that my 📦 #Rstats package📦 vtable is now available on CRAN, installable with install.packages('vtable') vtable is a *variable browser* for R that helps you look at your data WHILE you're working on it. (thread/)
17
140
602
@nickchk
Nick HK
2 years
I've updated my Data Wrangling in the Tidyverse course material and am uploading a 17-part video series. This assumes little previous R knowledge (although some). Covers tidying, manipulating variables, cleaning factors, dates, and strings. Enjoy!
5
108
574
@nickchk
Nick HK
2 years
Difference-in-differences and regression discontinuity estimates don't directly tell you the effect of a treatment. They tell you the effect of a *cutoff*. It's only our theoretical understanding of what that cutoff means that lets us interpret that as an effect of treatment.
6
37
534
@nickchk
Nick HK
4 years
I don't post about my personal life a lot but today we finally had our court date to finalize the adoption. I've found that academics, who make up a lot of my followers, are above-average curious about adoption. So here's a little about the process, and how it went.
14
22
529
@nickchk
Nick HK
4 years
If you look at autocorrelated data, pick any random day, and look for a break in linear trend on that day, you will find a statistically significant break a ludicrous amount of the time. In this random AR(1) data with delta = .8, I reject the null at 95% about half the time.
Tweet media one
17
79
512
@nickchk
Nick HK
3 years
It's fully-featured and ready to go! The did Stata package allows Stata access to the Callaway and @pedrohcgs R package "did" for estimating difference-in-differences with staggered treatment and covariates. Check it out, and installation instructions, at
4
131
510
@nickchk
Nick HK
4 years
I have a new paper in the Journal of Causal Inference about IVs! You can greatly improve the finite-sample performance of your IV by simply modeling first-stage effect heterogeneity. Super easy! Even if you do a bad job, it helps! Big error reductions.
10
103
507
@nickchk
Nick HK
3 years
Success! My package📦causaldata 📦 is available on CRAN (install.packages('causaldata')), ssc (ssc install causaldata), and PyPI (pip install causaldata). causaldata contains data sets used in The Effect (by me), The Mixtape ( @causalinf ), and What If ( @_MiguelHernan and Robins)
1
90
504
@nickchk
Nick HK
4 years
This fixed effects animation stands less well on its own (i.e. explains less), but if you already sorta know how FE works I think this works better and looks nicer (if you wanna drop this in some slides, this is from the Wooldridge crime4 data set, GIF: )
10
97
493
@nickchk
Nick HK
3 years
I have a new package for R, Stata, and Python: causaldata. Data sets to run example code from causal inference books. Currently it just has mine, The Effect, but planning to add at least The Mixtape before uploading to CRAN/ssc/PyPI. Install instructions:
8
77
489
@nickchk
Nick HK
5 years
Some neat tools I use with some regularity and IMO should be more visible: 1) TablesGenerator, to convert tables between Word / Excel / LaTeX / Text / HTML
4
112
482
@nickchk
Nick HK
4 years
I've been working on an intuitive and accessible intro textbook on causality and research design. It's going pretty well, Volume 1 almost done. I'll be posting chapters from the book, one per week. Link here, and I'll update this thread with chapters
5
96
477
@nickchk
Nick HK
6 years
Hey all! I'm excited to reveal my new R package. It fills a gap I've been sorely missing since I've been moving from Stata to R: a quick and easy VARIABLE BROWSER! One you can use Find-in-Page on, and variable labels! Introducing the package vtable()! (thread)
12
116
451
@nickchk
Nick HK
2 years
In private consulting, the econometrics tool that people have not heard about but most often gets them very excited when I suggest and explain it is Kitagawa-Oaxaca-Blinder decomposition
9
28
421
@nickchk
Nick HK
4 years
I just learned that you can scale your LaTeX table to the textwidth by wrapping the tabular in \resizebox{\textwidth}{!}{} and OMG I'm never going back
12
27
424
@nickchk
Nick HK
6 years
In addition to being a professor, I also do freelance statistical consulting (my rent is too damn high). I thought it would be interesting to go over the way that people think about statistics in this hidden little corner of the internet.
7
150
417
@nickchk
Nick HK
3 years
When the abstract says "we use an instrumental variables approach to establish causality" and doesn't say what the instrument is, you know you're in for a wild ride
11
19
417
@nickchk
Nick HK
6 years
Difference-in-differences
10
138
403
@nickchk
Nick HK
3 years
I still find it hard to believe that OLS consistency only relies on X being uncorrelated with ε, not independent of ε, and I occasionally run a simulation to re-convince myself.
Tweet media one
Tweet media two
8
49
412
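A simulation along the lines described above might look like the following. It is not the code behind the tweet's images; it is a sketch with an assumed data-generating process in which the error is a deterministic function of X (so clearly not independent of it) yet uncorrelated with it, and an assumed true slope of 2.

```python
import random


def ols_slope(x, y):
    """Univariate OLS slope: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


rng = random.Random(1)
n = 50_000
x = [rng.gauss(0, 1) for _ in range(n)]
# eps depends on x, but Cov(x, eps) = E[x^3] - E[x] = 0 for a standard normal
eps = [xi ** 2 - 1 for xi in x]
y = [2 * xi + e for xi, e in zip(x, eps)]  # assumed true slope: 2

slope = ols_slope(x, y)
print(f"Estimated slope: {slope:.3f}")
```

With a large sample the estimate sits close to 2 even though knowing X tells you the error exactly.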
@nickchk
Nick HK
4 years
All the materials for my econometrics class are online (except assignments/exams). Check them out if you like. Videos: Slides: Most readings are Bailey, but some are my book: Swirl:
5
105
399
@nickchk
Nick HK
5 years
The econ job market is starting again. Just a minor advice thread, specifically aimed at people in the academic market who are not superstar candidates. IMO too much of the advice is aimed at people who are heading towards a job at an R1.
8
93
375
@nickchk
Nick HK
4 years
ⁿᵒᵗ ʳᵉᵃˡˡʸ ᵗʰᵉ ᵗᵒᵖᶦᶜ ᵒᶠ ᵗʰᵉ ᵐᵒᵐᵉⁿᵗ ᵇᵘᵗ ᶦ ᵍᵒᵗ ᵗᵉⁿᵘʳᵉ ᵃʸʸ, ˡᵉᵃᵛᶦⁿᵍ ᶠᵒʳ ˢᵉᵃᵗᵗˡᵉ ᵃⁿʸʷᵃʸ ᵗʰᵒ
23
0
354
@nickchk
Nick HK
5 years
This is some masterclass variable coding
Tweet media one
12
24
350
@nickchk
Nick HK
4 years
The kids wanted more data wrangling and so it shall be
Tweet media one
10
55
352
@nickchk
Nick HK
2 years
scales, which formats numbers for presentation, is such an undersung package. I use it all the time. If you make anything in R that is intended for an audience to see - graphs, tables, RMarkdown/Quarto, it's perfect.
6
28
351
@nickchk
Nick HK
2 years
Speaking of DID by the way, if you missed it, there's now an R package for @jmwooldridge 's extended two-way fixed effects estimator for staggered treatment cases, courtesy of @grant_mcdermott .
2
58
351
@nickchk
Nick HK
3 years
When I was in grad school, learning LaTeX was 100% worth my time. For current grad students I think you're probably better off learning how to write in Markdown and have something else do the LaTeX conversion for you when you want a PDF.
20
33
342
@nickchk
Nick HK
4 years
Slides available for my course on causal inference. Causal concepts and the whole deal with what identification actually is, plus methods like controlling for stuff, standard designs like DID, RDD, IV, and details like estimation and het. treatment fx!
5
76
339
@nickchk
Nick HK
4 years
Some neat R pipe %>% tricks you might not know: 1. Pipe to {} to do any sort of calculation you like, referring to the passed object with . 2. Pipe to `[[`() to apply the "[[" function (i.e. object[[index]]), or similarly `[` for object[index]
Tweet media one
Tweet media two
10
45
336
@nickchk
Nick HK
2 years
Diff-in-diff is interesting because you start with *such* a simple implementation - OLS with an interaction term and a fairly grokkable parallel trends assumption. And it's great! Then you realize that breaking that basic case at *all* makes you have to change *everything*
2
24
335
@nickchk
Nick HK
3 years
"OLS of Y on X is unbiased only if X is unrelated to the error e. If there's a term Z related to X in e, the sign of the bias is the product of the signs of Cov(X,Z) and Cov(Y, Z)" ??? "If Z hangs around X but OLS doesn't know about it, it'll give X all the credit for Z" "oh ok"
5
21
331
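The "X gets all the credit for Z" intuition can be seen in a small simulation. This is a sketch under assumed coefficients (the 0.5 and 1.0 values below are made up for illustration): Z moves with X and also affects Y, and the regression of Y on X alone leaves Z in the error.

```python
import random


def ols_slope(x, y):
    """Univariate OLS slope: Cov(x, y) / Var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


rng = random.Random(42)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.5 * zi + rng.gauss(0, 1) for zi in z]           # Z hangs around X
y = [1.0 * xi + 1.0 * zi + rng.gauss(0, 1)             # true effect of X is 1
     for xi, zi in zip(x, z)]

# Regress Y on X only: Z is omitted, so X absorbs part of Z's effect
short_slope = ols_slope(x, y)
print(f"Slope with Z omitted: {short_slope:.3f} (true effect of X: 1.0)")
```

Here Cov(X,Z) and Cov(Y,Z) are both positive, so the bias is upward: in this setup the slope converges to 1 + Cov(X,Z)/Var(X) = 1 + 0.5/1.25 = 1.4 rather than 1.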
@nickchk
Nick HK
5 years
The 20th page has been added to the Library of Statistical Techniques. I'll be posting new pages here as they come in. Please consider contributing! I'm planning to write one a week. Edit pages that are there or add your own. It's easy.
4
85
330
@nickchk
Nick HK
3 years
causaldata is complete in R, Stata, and Python! Or at least it now has all data sets from both The Effect by me and Causal Inference: The Mixtape by @causalinf (except for judge_fe, it's too big). Install instructions:
2
57
323
@nickchk
Nick HK
2 years
without exaggeration, the most important skill for any data analyst working in any field is the ability to notice when something is wrong
7
47
316
@nickchk
Nick HK
2 years
Announcement and invitation! A new project that aims to improve the quality of research in applied microeconomics by examining researcher choices. I am hoping to recruit up to *200 researchers* of all kinds (with pay) and hope you will join me! (Thread)
12
140
319
@nickchk
Nick HK
2 years
Never been a fan of the way students are commonly taught about outliers as things to be precisely defined, detected, and destroyed. Sure, sometimes checking outliers past a certain cutoff lets you detect data errors, and dropping/fixing those makes some sense...
9
29
318
@nickchk
Nick HK
4 years
But in the end, you *do* get a kid, so...
Tweet media one
27
1
310
@nickchk
Nick HK
2 years
Updated my data viz course slides if you're looking for material (or want a free course). Note it's also a "motley assortment of a bunch of stuff you gotta be good at data" course, like data cleaning, communication, statistical intuition, etc.
7
40
310
@nickchk
Nick HK
3 years
The causaldata R, Stata, and Python packages, with data sets for running code examples in The Effect, Causal Inference: The Mixtape, and What If, have a minor update to v.0.1.3 on CRAN/ssc/PyPI, fixing some issues with Mixtape data sets.
1
43
308
@nickchk
Nick HK
5 years
I'm a dad now 🏳️‍🌈
Tweet media one
26
2
296
@nickchk
Nick HK
5 years
I have a new paper! "Human Capital vs. Signaling is Empirically Unresolvable." The most unusual paper I've written. I try to identify the relative importance of HC and S in education returns, and come to the conclusion that this question is unanswerable.
9
56
285
@nickchk
Nick HK
5 years
People seemed to be into the idea, so I am launching the Library of Statistical Techniques, or LOST. LOST is a Wiki guide to doing things in statistical software/code. Instructions, examples, and a little Rosetta stone btwn languages.
6
86
283
@nickchk
Nick HK
5 years
Been working on a Stata package requiring ML and realized I could whip something up that would be pretty useful here. So it's MLRtime, a Stata package for running Machine Learning commands in R! So yes, now you can do random or causal forests in Stata, with mostly-Stata syntax.
3
68
284
@nickchk
Nick HK
6 years
Here's a link to a page with these graphs alongside DAGs and more explanatory detail, if you want something to link your students to. Also if you have any ideas for other methods I should animate let me know.
4
42
285
@nickchk
Nick HK
10 months
Very cool paper just out in AER. They actually find *negative* amounts of publication bias - review process selects *against* marginally sig. results, even though reviewers like significance. Observed bunching is driven by p-hacking before submission.
3
55
275
@nickchk
Nick HK
4 years
Conditional probability P(A|B) = P(A and B)/P(B) means "out of (denominator) all the B's, how many (numerator) are also A's?" and every time I see a textbook that doesn't use this intuition it annoys me
7
23
275
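The "out of all the B's, how many are also A's" reading translates directly into counting. The events below (two dice, B = total of at least 10, A = first die shows 6) are an assumed example, not one from the tweet.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice
rolls = list(product(range(1, 7), repeat=2))

# Denominator: all the B's (total is at least 10)
b = [r for r in rolls if sum(r) >= 10]
# Numerator: the B's that are also A's (first die is a 6)
a_and_b = [r for r in b if r[0] == 6]

p_a_given_b = Fraction(len(a_and_b), len(b))

# Same answer from the textbook formula P(A and B) / P(B)
by_formula = Fraction(len(a_and_b), len(rolls)) / Fraction(len(b), len(rolls))

print(p_a_given_b, by_formula)
```

Dividing both counts by 36 changes nothing, which is why the formula and the counting intuition always agree.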
@nickchk
Nick HK
3 years
A clean way to interpret the univariate OLS slope, Cov(X,Y)/Var(X), is "out of all the variation in X (/Var(X)), how much of that is related to Y (Cov(X,Y))?" Logic extends to multivariate (X'X)^(-1)X'Y if you remember that inverse = divides by = "out of"
9
24
271
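As a small check of the Cov(X,Y)/Var(X) reading, the sketch below computes the slope on a made-up five-point data set and confirms that nudging the slope in either direction only increases the squared error. The data and helper names are assumptions for illustration.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# "How much of X's variation is related to Y" ...
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
# ... "out of all the variation in X"
var_x = sum((a - mean_x) ** 2 for a in x)

slope = cov_xy / var_x
intercept = mean_y - slope * mean_x


def sse(b1):
    """Sum of squared errors for a line through the means with slope b1."""
    b0 = mean_y - b1 * mean_x
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))


print(f"slope = {slope}, intercept = {intercept}")
```

The common 1/(n-1) factor in the covariance and variance cancels, so the raw sums are enough.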
@nickchk
Nick HK
3 years
I have a new short working paper: Linear Rescaling to Accurately Interpret Logarithms. Do you use logs? Do you interpret increases in ln(X) in terms of percentage changes in X? Did you know there are a lot of problems in the way we're told to do that?
Tweet media one
5
54
265
@nickchk
Nick HK
3 years
(the mystery box had a full-size candy bar)
5
0
251
@nickchk
Nick HK
4 years
Happy father's day to me, I relaxed and made this animation. This is what it looks like to control for a continuous variable in a linear model.
8
46
259
@nickchk
Nick HK
2 years
been hearing plenty of arguments as to why I should start using = instead of <- in R. i'm only sorta convinced so i've started alternating between the two in the same code. this should make everyone happy
12
5
260
@nickchk
Nick HK
4 years
A video version of my workshop "Teaching Econometrics with R", targeted at faculty already familiar with another language, and focusing on getting used to R as well as relevant functions and packages for undergrad econ
1
52
260
@nickchk
Nick HK
11 months
Copilot is live in the most recent normal release of RStudio, btw. Tools -> Global Options -> Copilot
Tweet media one
@nickchk
Nick HK
11 months
FYI, I haven't installed it myself and will probably just wait for the actual release, but the newest daily version of RStudio has GitHub Copilot integration, and I hear it works quite well!
5
12
100
5
35
253
@nickchk
Nick HK
4 years
A frequent error I see students make is thinking that they can account for "the effect of X on Y is different for group A vs group B" by adding group as a regression control variable. This is incorrect! So I made a graph.
Tweet media one
6
40
248
@nickchk
Nick HK
3 years
Now with a cover! Get your preorders of The Effect in while it's still on sale. and as always you can check it out first at
Tweet media one
6
40
251
@nickchk
Nick HK
3 years
the "use OLS for everything" vs. "don't" debate is neat because it really highlights how doing statistics well is much less about being right about anything than it is about making your wrongness as harmless as possible
7
17
251
@nickchk
Nick HK
2 years
Some neat ggplot2/adjacent tricks I didn't discover until surprisingly recently - expand option in scale_*_discrete to add room around factor variables - position_dodge() to line up geoms without position = 'dodge' with something dodged - scales::label_wrap to word wrap labels
Tweet media one
Tweet media two
4
16
245
@nickchk
Nick HK
4 years
Finishing my econometrics prep. Gonna make a hard sell for my vtable package in your class if you're using R. - Easily explore variable characteristics and values with vtable - Super easy summary tables with sumtable - Balance tables with sumtable
6
26
247
@nickchk
Nick HK
4 years
A demonstration of sampling variation in OLS with N = 2 (albeit resampling from a larger sample, not the population)
6
42
245
@nickchk
Nick HK
4 years
Having a tenure-track job where it is perfectly fine that I will never publish in a top 5 is actually pretty sweet, I recommend it to anyone with the same crossed wires I have where academia is endlessly fun
2
0
243
@nickchk
Nick HK
5 years
Still mildly annoyed by the RateMyProfessor that says my class is impossibly hard, and then at the end tacks on that if you want to pass you should try going to lecture or reading the book because they did neither.
10
9
238
@nickchk
Nick HK
6 years
Something that would be extremely useful would be a Wikipedia-style statistics practitioner cookbook. Something you can look up the method you're about to do and it will remind you of all the best practices. (thread)
11
32
235
@nickchk
Nick HK
4 years
I do a lot of statistical programming and econ work on Upwork. Data cleaning, research design, editing, coding. If this is work you are capable of and you need another income stream bc of quarantine, give it a shot. Also let me know and I can probably send some clients your way
3
40
238
@nickchk
Nick HK
5 years
Reading a bunch of DID papers and it's nuts just how much more widely applicable DID is than any other CI method. Anything improving DID to make results more believable (and there have been a few recently) is maybe the highest marginal value an applied econometrician could offer.
7
23
234
@nickchk
Nick HK
2 years
Haha I'm never maintaining a repository of citation info ever again, this is so nice. In the RStudio/Quarto visual editor (which I think is soon to be not-just-in-RStudio)
Tweet media one
Tweet media two
Tweet media three
Tweet media four
10
17
235
@nickchk
Nick HK
3 years
Clue is a board game about finding a dead body but being unsure about which room you found it in and unable to tell the difference between someone who was bonked over the head or shot
3
18
218
@nickchk
Nick HK
3 years
I've never been a huge fan of adjusted R^2, but it has now been pointed out to me that whether an additional variable makes adj R2 go up or down is whether its t-stat is above or below 1, and now I like it less.
8
27
231
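The t-stat rule above follows from a short derivation. The notation is assumed here, not taken from the thread: n observations, k coefficients in the smaller model (intercept included), RSS_0 and RSS_1 the residual sums of squares before and after adding one variable, and t the classical (homoskedastic) t-statistic on the added variable, for which t^2 equals the F-statistic of the one-variable restriction.

```latex
\begin{align*}
\bar{R}^2 &= 1 - \frac{\mathrm{RSS}/(n-k)}{\mathrm{TSS}/(n-1)} \\[4pt]
\bar{R}^2_1 > \bar{R}^2_0
  &\iff \frac{\mathrm{RSS}_1}{n-k-1} < \frac{\mathrm{RSS}_0}{n-k} \\
  &\iff (n-k-1)\,(\mathrm{RSS}_0 - \mathrm{RSS}_1) > \mathrm{RSS}_1 \\
  &\iff F = \frac{\mathrm{RSS}_0 - \mathrm{RSS}_1}{\mathrm{RSS}_1/(n-k-1)} > 1
  \iff t^2 > 1
\end{align*}
```

So adjusted R^2 rewards any variable with |t| > 1, a far lower bar than conventional significance thresholds.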
@nickchk
Nick HK
3 years
At long last (and, of course, exactly one day after I sent out my one-email-a-month-I-swear update email), my book The Effect: An Introduction to Research Design and Causality is now available for preorder and on sale! Please check it out.
6
49
231
@nickchk
Nick HK
1 year
I'll never forget the time someone came to my house, saw the various dolls and princess castles and stuffies, and said "wow, you must really like toys" instead of correctly deducing that I have a child
8
5
225
@nickchk
Nick HK
2 years
Guess at how many observations you need to have 90% statistical power to detect an effect that increases a rate from 50% to 55% (not an inconsequential boost in many settings!). Assume perfect randomization, half treated/half control Answer in third post.
21
40
225
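For readers who want to check their guess, here is one way to compute a sample size for this kind of question. It is a sketch using the standard normal-approximation formula for comparing two proportions, and it assumes a two-sided test at the 5% level, which the tweet does not specify; the thread's own answer may rest on different assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, power=0.90, alpha=0.05):
    """Approximate observations per arm to detect p1 vs p2.

    Normal approximation with unpooled variances; two-sided alpha is an
    assumption made for this illustration.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance_sum / (p2 - p1) ** 2)


per_arm = n_per_group(0.50, 0.55)
total = 2 * per_arm  # half treated, half control
print(f"Per arm: {per_arm}, total: {total}")
```

Because the required sample scales with one over the squared effect size, halving the detectable effect roughly quadruples the number of observations.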
@nickchk
Nick HK
3 years
Making the same package in R, Stata, and Python really drives home how nice the package development environment is in R. It's possible (likely) that I'm missing some neat tools in Python to make this easier but R is just on a different level for this task
7
13
224
@nickchk
Nick HK
2 years
I have not been able to get my hands on exact numbers, but from some back calculations it looks like The Effect has sold something like 3000 copies so far, which is pretty amazing, especially for a book also available for free! Thank you all very much.
9
12
226
@nickchk
Nick HK
2 years
One very cool thing about dbplyr and dtplyr is that they don't just run your code, but convert the dplyr syntax itself into SQL/data.table, and show you how to do what you wanted. Are there any other packages like this? Could I type in dplyr or data.table and get pandas syntax?
Tweet media one
Tweet media two
6
23
225
@nickchk
Nick HK
3 years
I've just signed a book deal for The Effect with Chapman & Hall, who also publish the excellent What If by @_MiguelHernan and Robins. So a physical version will be available at some point. Building a little causal inference library over there!
11
16
216
@nickchk
Nick HK
5 years
What I have learned from academic twitter is that every statistical method is both the gold standard and also so completely worthless and unbelievable that it doesn't even justify explaining why it's bad
4
27
210
@nickchk
Nick HK
6 years
This is for a class I'm designing on programming and causal inference (h/t @causalinf ) designed to go BEFORE the rest of the econometrics sequence. The idea is teaching concepts before methods. Notice that none of these graphs use regression! It's not necessary!
3
11
214
@nickchk
Nick HK
4 years
shots fired @causalinf
Tweet media one
3
6
214
@nickchk
Nick HK
2 years
I correctly predicted the econ Nobel last year, but only because I kept predicting the same people every year until it happened. Strongly recommend this strategy.
9
6
215
@nickchk
Nick HK
6 years
This is the first time I've been excited about something new from Microsoft since like 2010
5
37
208