I'm teaching a "data science for economists" course this semester.
If you're interested in learning more about
#rstats
, Git(Hub), programming, databases, cloud computation, ML, etc., I'll be making all of my course material publicly available here:
I made a slide deck on data.table for my "big data in economics" class:
The target audience is
#rstats
users who are familiar with the
#tidyverse
, but are (a) interested in what
#rdatatable
has to offer and/or (b) how the two complement each other.
Reminder:
Bootstrapping is the most intuitive way to teach inference in introductory stats courses. And the only reason we don’t is because of outdated pedagogy.
The `tinyplot` 📦 is now available on CRAN 🎉
_tl;dr_ `tinyplot` is a super lightweight extension of the base
#rstats
graphics system. Makes it easy to draw complex (base) plots with minimal code/effort. See the website for a detailed tutorial + more info.
Stoked to teach my data science for economists course again this year. Yesterday we covered Git. I’ll repeat that I’m prepared to give this lecture (or another) at any institution that invites me out for a seminar...
The lecture slides, etc. are all here:
In teaching some intro-to-ML stuff, I've noticed that there's not a nice automated way to visualize decision tree partitions in
#ggplot2
.
So I packaged up some simple functions in my spare time (ha ha, I know). Feel free to kick the tyres.
#rstats
My American friends are incredulous whenever I explain the restrictions of the H-1B visa. My spouse is not allowed to work. I cannot accept short-term work offers (consulting gigs or paid speaking engagements). I cannot put my house on Airbnb. Even volunteer work is complicated.
1/The next book in my "immigration and diversity" reading list is "Probationary Americans", by Edward Park and John Park.
It's about all the hoops that America makes immigrants jump through these days.
If you are a Stata user (or you know one!) you really should check out
@JulianReif
's fantastic reproducibility and coding guide.
Lots of good advice and it even comes with an
@AeaData
-compliant sample replication package.
I'm seriously considering turning my "Data Science for Economists" course notes into a book.
You can help me decide by answering three simple polling questions below!
(Comments and retweets appreciated.)
A shortcut I like to use is calling multiple geoms in an lapply() call, since this automatically generates a list. Works well for investigating plotting variations, e.g.
ggplot(diamonds, aes(carat)) +
lapply(c(50,200), function(b) geom_histogram(bins=b, alpha=0.3))
In news that will surprise no-one,
@edrubin
has produced some stellar lecture material for his PhD econometrics course.
If you'd like to learn more about econometrics, causal inference, simulation (and the R code to do it all), please check out his repo:
@dlmillimet
@snavarrol
@causalinf
Underscoring the madness that most PhD programs don't offer a dedicated data cleaning & wrangling course. Or workflow or version control. We just feed students pristine datasets & puzzle over the fact that so many struggle to stay on track. At least offer a programming bootcamp!
New mini
#rstats
📦:
An
#rmarkdown
template that I use for exporting documents to both PDF and HTML format. Takes care of various annoyances/inconsistencies like:
- Author affiliations
- Multicolumn environments
- Non-standard fonts in PDF plots
- etc.
Same, but for plotting software & themes.
Stata white: economist.
Stata default blue: labour economist.
ggplot2 default grey: Just learnt ggplot2. (Or, Hadley Wickham.)
ggplot2 other: omg I love R
Matplotlib: masochist/macroeconomist.
Excel: "data scientist in finance".
What an academic's choice of typeface for their CV reveals about them:
Times New Roman: old
Computer Modern: math snob
Garamond: style snob
Calibri: don't know how to use Word
Arial: Google Docs user
Comic Sans: nothing left to prove
Woah.. I just tried the newly-enabled GPU support for
#julialang
's FixedEffectModels package (for high dimensional FEs) & the performance boost is truly impressive. At least 2x as fast as the next best option. Some benchmarks from an actual model that I'm estimating.
#econtwitter
@JoshuaSGoodman
We've just done the same
@uoregon
. I've been amazed at how well/quickly senior faculty have embraced the transition. (The students seem very happy.)
@edrubin
and I have put together various resources for our colleagues. E.g. this "bare minimum" intro to R:
Since various SW benchmarks are going around today... A short thread on why I use
#rstats
.
Put simply, it offers by far the fastest & most efficient tools for the work I do (i.e. mostly data wrangling & applied econometrics).
As I say in the syllabus, this course basically covers all of the things I wish I'd been taught in grad school. At the same time, I've benefited immensely from so many people making their teaching materials (and software!) publicly available. This is me trying to pay it forward.
Okay this is my favourite student gift this quarter. We used Natural Lite as an ex. of an inferior good during my intermediate micro class... even though I kept saying I'd never tried it.
PS. Blame
@mikeanton13
, whose lecture notes I cribbed.
PPS. Will this work with Laphroaig?
Well, well, well... another development in the HDFE model space. Check out the new
#rstats
fixest 📦.
Claims to be significantly faster than lfe, reghdfe, et al. No GPU required and also supports non-linear models (logit, etc.)
Pretty niche, but here's one I use: You can trick R into giving the full marg. effect of interaction terms by writing the model as y~x1+x1:x2 (instead of y~x1*x2).
E.g. Say you want the full ME of 'Wind:Month6'. On the LHS you must add -2.368+4.051. The RHS gives 1.683 directly.
#EconTwitter
, what is one of your favorite programming tricks/commands?
#rstats
#Stata
One of mine: in LaTeX, to easily refer to a table/figure, define the file path in a new command:
\newcommand\path{../output/}
...
\input{\path/tbl.tex}
(tbl.tex is a table in its own file)
1970: One more lane will fix it.
1980: One more lane will fix it.
1990: One more lane will fix it.
2000: One more lane will fix it.
2010: One more lane will fix it.
2020: ?
via
@avelezig
I’m giving this workshop👇on high-performance data wrangling next week. $/€20 gets you in the door, with all proceeds to UKR aid orgs. I promise to make it worthwhile.
tl;dr The power of these newer data tools is honestly staggering. I use them all the time. You can too.
❗️Our next workshop will be on May 2nd, 6 pm CEST on Big Data wrangling with DuckDB & Polars!
Register or sponsor a student by donating to support 🇺🇦!
Details:
Please share!
#AcademicTwitter
#EconTwitter
Lots of good intro to ML resources available these days... but since this is
#EconTwitter
I should mention that
@edrubin
is making slides for his new MSc class available:
They look fantastic.
ICYMI: My "data science for economists" course material is available online.
We just got to webscraping and I'll be adding links to new lecture slides each week as part of this original thread. 👇
I'm teaching a "data science for economists" course this semester.
If you're interested in learning more about
#rstats
, Git(Hub), programming, databases, cloud computation, ML, etc., I'll be making all of my course material publicly available here:
It's my mid-term tenure review today. By chance, my former adviser (and all-round dude) Chris Costello was passing through town and we managed to grab breakfast together.
It's amazing how much of our success is just being lucky enough to have great mentors.
Last week, I mentioned a trick for quickly obtaining the full marginal effect of interaction terms.
Here's an expanded blog post version, showing that the same shortcut works for higher-order interactions and non-linear models.
With this year's academic job market looking miserable, it's a good time to ask: How are we preparing our students for life post-grad school? (Graduate students: How are you preparing yourselves?)
Internships, networking, marketing, etc. Lots of ideas and sage advice here 👇
Since people are sharing their 1st year PhD war stories here's mine:
I went back to school after 5 years out. There are definite advantages to this, but one mistake I made was thinking it would be easy because I was a lot (like, really a lot) more focused than undergrad.
Reminder:
@lrberge
's fixest 📦 keeps getting better & better. A new version just hit CRAN with a bunch of new features/fixes.
Run high-dimensional fixed effect models at insane speeds... Like, anywhere from 10-100(!) times faster than lfe or reghdfe.
For years, I’ve uncritically (if infrequently) parroted the alarming findings of the Stanford Prison Experiment.
Turns out it is very close to junk science. I’ll be correcting the record whenever given the chance now.
THREAD 👇
Doing my best to stay off social media. But, just to be clear:
Damn the racists.
Damn police brutality.
Damn the nihilists and looters trying to co-opt righteous, peaceful protest.
And damn this abominable administration.
There have been some innovative new measures od COVID19's impact on the economic, incl Google Trends (
@paulgp
et al.) and electricity (
@SteveCicala
).
Here's another one that might be of interest: Global GitHub activity.
"The use of chopsticks, for instance, is highly predictable using sequenced genes. It would of course be ridiculous to conclude that chopstick use is genetically determined and unalterable by social institutions."
🔥from
@maxkasy
.
The politics of machine learning (part II): What are the dangers of ML - filter bubbles and polarization, targeting, inequality, and discrimination, ownership of data and concentration of power:
Woah... Just seen that
@rstudio
have made their awesome Package Manager publicly available.
Among other things, this means you can install
#rstats
binaries on multiple Linux distros. (No more installing from source!)
This is a fun thread.
For me it was reading Schelling's Micromotives and Macrobehaviour in undergrad.
During grad school, I'd say no book had a deeper impact on my thinking than E.T. Jaynes' Probability Theory. It is phenomenal.
I’m curious about what intellectual events changed your life? I’ve said it before but just so I can go first: Becker’s Nobel speech made me go to grad school, Roth and Sotomayor made me think in terms of two sided matching, and George Box made me value what models were and did.
To be clear, I think it's only natural and correct for the US to carefully manage its immigration policy. (I am not advocating open borders here.) But the current system seems... sub-optimal from nearly everyone's perspective.
I'm increasingly converting all of my research projects into packages. It makes so much sense once you think about.
Click through for the tips. Stay for the amazing slide deck.
Looking forward to speaking at
@nyhackr
tonight! I'll be talking about why your project is already an
#rstats
package and why you're already walking the path of R development, even if you don't realize it
slides:
The funny thing is, whenever someone condescendingly tells me that “R is just gimmicks” (or some such nonsense), I know that I just have to show them something like this to convince them to take it seriously.
Not to be that guy... but logging back on here after a few days and thought this would have received more traction. Don't you people like ggplot2 tricks??
A shortcut I like to use is calling multiple geoms in an lapply() call, since this automatically generates a list. Works well for investigating plotting variations, e.g.
ggplot(diamonds, aes(carat)) +
lapply(c(50,200), function(b) geom_histogram(bins=b, alpha=0.3))
Reminder that lots of
#rstats
packages come with built-in, interactive demos and they are great.
## list all demos
demo(package = .packages(all.available = TRUE))
## list demos for xgboost
demo(package = "xgboost")
## run one
demo("basic_walkthrough", package = "xgboost")
I made a slide deck on data.table for my "big data in economics" class:
The target audience is
#rstats
users who are familiar with the
#tidyverse
, but are (a) interested in what
#rdatatable
has to offer and/or (b) how the two complement each other.
Lecture 8: Regression analysis in R.
A whirlwind tour of (nearly!) all the main regression functions and packages that an applied economist could want: OLS, IV, FE models, robust and clustered SEs...
I made a slide deck on data.table for my "big data in economics" class:
The target audience is
#rstats
users who are familiar with the
#tidyverse
, but are (a) interested in what
#rdatatable
has to offer and/or (b) how the two complement each other.
With all the friendly language war stuff stuff by happening in
#econtwitter
today (R vs Python vs Stata), I want to make over point that I haven't seen elsewhere: Academic econ no longer occupies a privileged position in the econometric sphere.
We should end the "dinners as part of the interview" hiring practice in academia. It's awkward for the interviewee, and the whole "we need to know if they'll fit in with the department"/collegiality thing is just another way to discriminate against nontraditional hires.
@saraunsaree
Either is great. (Anyone who claims x is bad/outdated etc is profoundly misinformed.) I started in Python but now predominantly use R because it has better support for my specific needs. Your needs may be different. If you decide to go the R route:
I recently listened to a podcast with Daron Acemoglu in which he mentioned using voice recognition software to write his books (and maybe even papers?) Does anyone else do this?
#EconTwitter
You have to be very lucky to land an academic job at the best of times. (Trust me, I know.) But, for those of us in quantitative / data-heavy fields at least, your outside option has never been so good.
Tool up, stay informed, and know that career happiness comes in many forms.
So unlike
@causalinf
, for instance, knowing (& liking) my outside options were big factors in my mental health. "If prelims go south on me, I can always move on to this other thing.. which is pretty cool actually."
Reducing pressure was key. Same on the job market 4 years later.
Google Dataset Search is now officially out of beta.
"Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets & find links to where the data is."
Nice work, Natasha Noy and everyone else involved!
Lecture 10: Writing functions and programming intro.
#rstats
#programming
(No nice data visualizations for this lecture, so here is a picture of a border collie puppy instead. Also: objectively the best kind of puppy.)
@DinaPomeranz
[1/n] I did my PhD in Norway and the support was fantastic. We seriously neglect quality of life in deciding a PhD (mental health, finances, etc.) and I think my experience was hard to beat. More about that here:
How are economists based in developing countries, with limited funding for international trips, ever supposed to catch up (that is, if they are accepted/invited to present at seminars/conferences)?
Okay
#econtwitter
, I see a lot of people asking for advice on parallel programming. This lecture has you covered.
Going parallel can yield huge benefits, is remarkably easy to do nowadays, and shouldn't cost you a cent extra.
Lecture 12: Parallel programming.
#rstats
#parallel
#multicore
Learn how easy it is to run R code in parallel (and then congratulate yourself for being so awesome).
This is bad. ICE just told students here on student visas that if their school is going online-only this fall, the students must depart the United States and cannot remain through the fall semester.
"The blue paradox: Preemptive overfishing in marine reserves":
Joint work with Kyle Meng, Gavin McDonald and Chris Costello now available online at
@PNASNews
!
A quick thread on our motivation and key findings... [1/n]
For decades, the go-to textbook for natural resource economics has been Conrad. He has a bunch of nice Excel exercises for solving+plotting dynamic models. It would be cool to update these with Shiny app versions. Here's one example:
#EconTwitter
But working long hours in the private sector simply does not prepare you for having to *think* constantly. About difficult concepts. A lot of non-academic work is rote. Sure the hours are long (and potentially highly stressful), but the tasks generally aren't mentally exhausting.
@KelliMarquardt
Honestly, the best guard I've found against this is just to put *everything* up on the web.
Sounds counterintuitive, but the bigger your "portfolio" the less you feel judged by any one thing. Plus, I def code better from the outset if I know I'm going to be sharing at the end.
@benconomics
@amandayagan
@edrubin
Seconded.
Amanda, looks like you’ve already been inundated with good links, but if you’re looking for a “bare minimum” (<1 hr) R intro guide for Stata users... Ed and I put this tutorial together for some of our colleagues:
@ryanbedwards
@ArielOrtizBobea
@dynarski
Mate, the shock going from a J-1 to H-1B is something. I'm half convinced the current system persists simply mostly because most US citizens (i.e. the actual voting populace) have no idea what it actually entails.
My American friends are incredulous whenever I explain the restrictions of the H-1B visa. My spouse is not allowed to work. I cannot accept short-term work offers (consulting gigs or paid speaking engagements). I cannot put my house on Airbnb. Even volunteer work is complicated.
@Andrew___Baker
@dlmillimet
@snavarrol
@causalinf
At the risk of shameless self promotion, feel free to borrow from mine:
Last I checked, it had racked up something like 12,000 unique visitors. Clearly there's demand.
@edrubin
are starting to work on a book version. More news hopefully soon ;)