We've all encountered this problem: "I want to take logs, but my data have zeros => ???"
Commonly (myself included), people use an ad hoc transformation m(.) (e.g. m(x) = arcsinh(x)) that approximates log, and interpret the resulting effect as a %
🧵 on a paper with
@jondr44
1/n
Twitter is fun because we can promote the work of friends and people we admire!
Today I want to call your attention to a brand new paper by
@jondr44
and
@jiafengkevinc
, available at:
What is the gist?
Check the abstract!
I'm very excited to join
@StanfordEcon
as an assistant professor in 2025, after a year
@SIEPR
. I'm stressed out about moving my 🐱 across the continent, but otherwise beyond thrilled for what's to come!
(also, definitely welcome all advice re: bay area)
Synthetic control can be interpreted as an online learning algorithm. Even without assumptions on outcomes, results from online learning imply that synth predictions perform almost as well as those made by the best weighted match.
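Here's a minimal simulated sketch of that online-learning view, using exponentiated-gradient updates over donor weights on the simplex (one standard no-regret algorithm; the data, true weights, and learning rate below are illustrative, not the paper's exact construction):

```python
# Sketch: synthetic-control-style prediction as online learning over donors.
# Simulated data; exponentiated gradient is one standard no-regret algorithm,
# not necessarily the construction in the paper.
import numpy as np

rng = np.random.default_rng(0)
T, J = 500, 5                                   # pre-treatment periods, donors
true_w = np.array([0.6, 0.3, 0.1, 0.0, 0.0])    # oracle convex combination
donors = rng.uniform(0.0, 1.0, size=(T, J))     # bounded donor outcomes
target = donors @ true_w + 0.05 * rng.normal(size=T)

w = np.ones(J) / J                              # start uniform on the simplex
eta = np.sqrt(np.log(J) / T)                    # conservative learning rate
preds = np.empty(T)
for t in range(T):
    preds[t] = donors[t] @ w                    # predict before seeing target[t]
    grad = 2.0 * (preds[t] - target[t]) * donors[t]  # gradient of squared loss
    w = w * np.exp(-eta * grad)                 # exponentiated-gradient step
    w /= w.sum()                                # renormalize onto the simplex

print(f"online MSE {np.mean((preds - target) ** 2):.4f} "
      f"vs best-weights MSE {np.mean((donors @ true_w - target) ** 2):.4f}")
print("learned weights:", np.round(w, 2))
```

The regret guarantee for this family of updates is O(sqrt(T log J)) against the best fixed weight vector on the simplex, which is the "best weighted match" benchmark above.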
@jiafengkevinc
New short paper documenting an easy recipe for incorporating machine learning into linear IV to boost instrument strength *essentially for free*:
Joint work with Daniel Chen and
@econ_greg
#EconTwitter
#CausalTwitter
Excited to share a new working paper taking a closer look at causal inference in school choice settings, where students are matched to schools via algorithms. The paper derives the identified causal estimands under heterogeneous treatment effects (1/n)
Very cool new paper by
@jiafengchen42
that gives a justification for synthetic control methods
He shows that when treatment timing is random and you have many time periods, SC has close to optimal regret when an adversary chooses the potential outcomes
I want to open a discussion about this post by Gelman (
@StatModeling
) and various aspects of RDD inference. Gelman criticizes a paper finding that winning a close election adds 2-10 years of life ("days alive after election") (thread 1/n)
#EconTwitter
Since the Twitter exploit must be worth more than $118,000 to a profit-maximizing attacker, the fact that the realized profit is only $118,000 illustrates that the criminal market is not efficient. In this essay I will
I'm thrilled to announce that I'll be offering difference-in-difference offsets, where I promise to not write DID papers for a price
For a higher price I'll start working on projects like "Why TWFE is More Robust Than You Think" and "Why Constant Treatment Effect is Innocuous"
Excited to share a new preprint with
@DRitzwoller
on semiparametric estimation (read, *machine learning* estimation) of long-term treatment effects, such as those identified by a latent unconfoundedness or a surrogacy assumption (1/n)
Our new working paper: tell the computer what you know (and don't know!) about a causal question w/ discrete data → automatically get the most precise possible answer (bounds, or a point estimate). Joint w/
@guilhermejd1
@nsfinkelstein
@dean_c_knox
Shpitser. 🧵
possibly really dumb question: does IV "work" if the instrument Z is confounded with treatment W?
i.e. the DAG is Z -> W -> Y, W <- V -> Y, Z <- U -> W.
In PO notation,
Z = Z(U, e1)
W = W(Z, U, V, e2)
Y = Y(W, V, e3)
and (e1, e2, e3, U, V) are jointly independent
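A quick check of the linear case (my simulation; all hypothetical coefficients set to 1 except the true effect): since U never enters Y directly, the exclusion restriction still holds, and simple IV recovers the effect of W on Y even though Z is confounded with W. This doesn't settle the nonparametric / heterogeneous-effects case.

```python
# Linear simulation of the DAG above: Z <- U -> W, Z -> W -> Y, W <- V -> Y.
# U confounds Z and W but never enters Y, so exclusion still holds for Z.
import numpy as np

rng = np.random.default_rng(42)
n, beta = 1_000_000, 2.0                  # sample size, true effect of W on Y
U, V = rng.normal(size=n), rng.normal(size=n)
e1, e2, e3 = rng.normal(size=(3, n))

Z = U + e1                                # Z = Z(U, e1)
W = Z + U + V + e2                        # W = W(Z, U, V, e2)
Y = beta * W + V + e3                     # Y = Y(W, V, e3)

iv = np.cov(Z, Y)[0, 1] / np.cov(Z, W)[0, 1]   # simple IV / Wald estimand
ols = np.cov(W, Y)[0, 1] / np.var(W)           # biased by V, the W-Y confounder
print(f"true beta {beta}, IV {iv:.3f}, OLS {ols:.3f}")
```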
Been learning DAGs lately and wanted to be more fluent in translating between DAGs and POs
I think this is a reasonably short proof of the front-door formula/napkin problem that uses only potential outcomes…?
(it can conceivably fit on a big napkin?)
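For reference, this is the target of the proof, the standard front-door adjustment for X → M → Y with an unobserved confounder of X and Y:

```latex
P\big(y \mid \mathrm{do}(x)\big)
  = \sum_{m} P(m \mid x) \sum_{x'} P\big(y \mid m, x'\big)\, P(x')
```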
Another way of saying this is that we can't have all three: (a) an average effect, (b) scale-invariance, and (c) point identification. In practice, then, we must decide which two properties we want. [5/n]
the small angle approximation solves the dynamics of the pendulum by approximating sin(x) ≈ x. How good an approximation is this? This shows the true dynamics (blue) and the approximation (white) for various initial angles
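A sketch of that comparison (not the original animation code; g/L and the initial angles are illustrative):

```python
# Exact pendulum dynamics vs. the small-angle approximation sin(x) ~ x.
import numpy as np
from scipy.integrate import solve_ivp

g_over_L = 1.0
t = np.linspace(0.0, 20.0, 1000)

for theta0 in (0.1, 0.5, 1.5):                 # initial angles in radians
    exact = solve_ivp(                          # theta'' = -(g/L) sin(theta)
        lambda _, y: [y[1], -g_over_L * np.sin(y[0])],
        (t[0], t[-1]), [theta0, 0.0], t_eval=t,
    )
    approx = theta0 * np.cos(np.sqrt(g_over_L) * t)   # linearized solution
    err = np.max(np.abs(exact.y[0] - approx))
    print(f"theta0 = {theta0}: max |exact - small-angle| = {err:.3f}")
```

As expected, the approximation is excellent for small initial angles and drifts badly out of phase for large ones, since the true period grows with amplitude.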
intro statistics rarely emphasizes the difference between an estimand and an estimator (usually it's fairly clear, e.g. sample mean estimates the mean), but the difference becomes super pronounced and subtle in causal inference
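A toy simulation makes the contrast concrete (all numbers illustrative): the estimand is a fixed population quantity defined via potential outcomes; the estimator is a random quantity computed from the sample we actually observe.

```python
# Estimand vs. estimator in a randomized experiment, in miniature.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
y0 = rng.normal(0.0, 1.0, size=n)              # potential outcome, control
y1 = y0 + 1.0 + rng.normal(0.0, 1.0, size=n)   # potential outcome, treated
ate = np.mean(y1 - y0)                         # estimand: average treatment effect

d = rng.integers(0, 2, size=n)                 # randomized assignment
y_obs = np.where(d == 1, y1, y0)               # only one outcome is observed
diff_means = y_obs[d == 1].mean() - y_obs[d == 0].mean()   # estimator
print(f"estimand {ate:.3f} vs estimate {diff_means:.3f}")
```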
Well, a percent is invariant to units, and so a basic requirement is that differences in m(.) should be invariant to scaling.
It turns out the opposite is true: For any m(.) that approximates log, but is defined at 0, one can choose the scaling to get anything one wants 😬[2/n]
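A quick numerical sketch of the problem (illustrative numbers: y moves from 10 to 15, a 50% increase):

```python
# Differences in log are unit-invariant; differences in arcsinh are not.
import numpy as np

y0, y1 = 10.0, 15.0
print(f"log difference {np.log(y1) - np.log(y0):.3f} (same in any units)")

for c in (0.01, 1.0, 100.0):                   # re-measure y in other units
    diff = np.arcsinh(c * y1) - np.arcsinh(c * y0)
    print(f"scale {c:>6}: arcsinh difference {diff:.3f}")
# Small c: arcsinh(x) ~ x, so the difference shrinks toward 0;
# large c: arcsinh(x) ~ log(2x), so it approaches log(1.5) ~ 0.405.
```

The same rescaling freedom is what lets the implied "percent effect" be pushed wherever one wants once zeros force you off the pure log.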
Defining notation for a theory paper is halfway between making up a language and choosing an opening in chess. Small early choices affect what moves are available later.
A buddy and I passed a bunch of SEC 10-K filings through a big pre-trained neural network and examined its brain---because why not, university compute is free and we can't mine BTC anyway
Turns out the network understands which companies are close to which
#EconTwitter
⬇️⬇️⬇️
How can transformer models help us understand economic and financial information?
New workshop paper with
@jiafengchen42
develops an industry classification index from text with greater informativeness about financial metrics than existing text/expert-based classifications [1/2]
I am reviewing a paper and came across the word “eternal validity,” which is a typo… but it's a brilliant typo. Why do we economists only talk about internal and external validity? Why not eternal validity???? 😏
#EconTwitter
#AcademicChatter
@ben_golub
@CaltechEconThry
Proof I first saw from
@stat110
's stat 210:
WLOG, assume f(X), g(X) are mean zero.
Observe that
E[f(X) g(X)] + E[f(Y) g(Y)] = E[(f(X) - f(Y)) (g(X) - g(Y))] for X, Y independent and Y ~ X
Conclude by noting that (f(x) - f(y))(g(x) - g(y)) ≥ 0 everywhere, since f and g are both nondecreasing
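Spelling out the middle identity:

```latex
\begin{align*}
\mathbb{E}\big[(f(X)-f(Y))(g(X)-g(Y))\big]
  &= \mathbb{E}[f(X)g(X)] + \mathbb{E}[f(Y)g(Y)] \\
  &\quad - \mathbb{E}[f(X)]\,\mathbb{E}[g(Y)] - \mathbb{E}[f(Y)]\,\mathbb{E}[g(X)] \\
  &= 2\,\mathbb{E}[f(X)g(X)]
\end{align*}
```

where the cross terms factor by independence and vanish by the mean-zero normalization, so E[f(X) g(X)] = Cov(f(X), g(X)) ≥ 0.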
this is why the pscore-as-dimension-reduction thing is a bit of mental gymnastics and word sorcery---estimating the pscore is a high-dimensional problem!
@jiafengkevinc
@paulgp
Compare to this -- the point is not that NumPy is 180x faster than R. The point is that my NumPy setup is using the correct BLAS and my R setup is not:
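For anyone who wants to run the same diagnosis: np.show_config() prints the BLAS/LAPACK that NumPy was built against, and a large matmul makes the difference visible (matrix size arbitrary; in R, sessionInfo() reports the BLAS/LAPACK in use):

```python
# Check which BLAS NumPy is linked against, then time a dense matmul.
import time
import numpy as np

np.show_config()                       # prints BLAS/LAPACK build information

x = np.random.default_rng(0).normal(size=(2000, 2000))
t0 = time.perf_counter()
_ = x @ x                              # dispatched to BLAS dgemm
print(f"2000x2000 matmul: {time.perf_counter() - t0:.3f}s")
```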
📢 **REStud Page Limit**
With effect from 1st July 2022, a page limit policy applies to all submissions. Papers should be under 45 pages. Online appendices should not exceed 30 pages. A “grace period” is in place until 15th August 2022. For more details:
Happy to share my new NBER working paper with Ed Glaeser and
@davidmwessel
. We find that the Opportunity Zone program, as a part of the 2017 Tax Cuts and Jobs Act, has not (so far) generated a response in residential housing prices. Link: .
#econtwitter
Now, I can't possibly pass on an opportunity to promote some recent work on Twitter, to Twitter, by quote-tweeting
@TwitterEng
(joint work with
@DRitzwoller
)
We built a causal estimation framework on the idea of statistical 'surrogacy' (Athey et al. 2016): when we can’t wait to observe long-run outcomes, we create a model based on intermediate data.
Suppose (Xn, Yn) converges in distribution to (X, Y), and (Yn, Zn) converges in distribution to (Y', Z). (This implies Y' ~ Y). Suppose Xn is independent of Zn given Yn.
T/F: (Xn, Yn, Zn) converges in distribution to some (X'', Y'', Z'')?
I will be presenting this paper on inference post adaptive experiments this Thursday (11:30 am ET) at the International Seminar on Selective Inference.
Come check it out!