Dan Becker

@dan_s_becker

Followers: 5,067
Following: 888
Media: 117
Statuses: 2,125

Data scientist. Ex-Google. Founded Decision AI (acquired by DataRobot.) Thinking about how to make AI more practically useful.

Golden, Colorado
Joined October 2011
@dan_s_becker
Dan Becker
2 years
5 years ago (after AlphaGo beat Lee Sedol), I and many others thought RL would soon change the world. Its impact has been smaller than we anticipated (mainly due to the Sim2Real problem). If LLMs have less impact by 2027 than many now expect, what will be the reason?
65
44
486
@dan_s_becker
Dan Becker
5 years
Kaggle is formally releasing a new micro-course by @alexis_b_cook next week. But it's already available at It's just so good. Possibly the single best resource on the internet for someone new to Python who wants the fast path to useful skills.
6
101
419
@dan_s_becker
Dan Becker
5 years
What would a more applied approach to AI look like? I think I have the answer. So I'm leaving Google and Kaggle to build a business around it. Check it out at Want help getting more than predictions, so you can optimize decisions? Let's talk. I can help
11
29
275
@dan_s_becker
Dan Becker
1 year
Announcing something I've wanted to do for years A Decision Optimization course with @weights_biases covering both simple & sophisticated techniques to help data scientists and ML engineers make their existing skills far more valuable The backstory 👇
9
56
255
@dan_s_becker
Dan Becker
3 years
I'm 50/50 on whether we'll still use deep learning in 2030. But I'm confident we'll still use transfer learning. Transfer learning is such a good idea, it's gotta be here to stay.
9
19
225
@dan_s_becker
Dan Becker
3 years
1) Start with a brainless baseline. 2) Repeatedly make small improvements. That's how xgboost and deep learning work. It's how people run successful ML projects. Not a bad strategy. We should use that more in other places.
8
31
210
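Editor's sketch of the two steps in the post above, in the style of gradient boosting: begin with a constant prediction (the mean), then repeatedly fit a one-split "stump" to the remaining errors and add a scaled-down copy of it. This is an illustration only, not xgboost's actual implementation; the toy data, `learning_rate`, number of rounds, and the helper names `fit_stump` and `mse` are all assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Pick the single threshold on x that best reduces squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        sse = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


# Toy data (hypothetical): y = x^2 on a small grid.
xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]

# Step 1: the brainless baseline predicts the mean everywhere.
baseline = sum(ys) / len(ys)
preds = [baseline] * len(xs)
history = [mse(preds, ys)]

# Step 2: repeatedly make a small improvement by fitting the leftover errors.
learning_rate = 0.3
for _ in range(20):
    residuals = [y - p for y, p in zip(ys, preds)]
    stump = fit_stump(xs, residuals)
    preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    history.append(mse(preds, ys))
```

`history` records the training error after each round, so printing its first and last entries shows how far the small steps moved the model from the baseline.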
@dan_s_becker
Dan Becker
6 years
Kaggle just released a new Python course based on the wildly successful 7-day Learn Python Challenge. Check it out: Great explanations, and a range of exercises that will be fun for both new and experienced Python programmers.
0
49
200
@dan_s_becker
Dan Becker
5 years
Great summary of the state of RL. Why it has huge potential How it currently doesn't work (really, it doesn't) Suggestions on where to go from here I hope RL returns from its academic meandering, and we refocus on what's needed to solve real problems
1
30
200
@dan_s_becker
Dan Becker
6 years
For anyone that wants to learn Python, Kaggle will host the "Learn Python Challenge" from June 11-18. In 20 minutes a day, you'll learn the basics most relevant for data science (and apply them to interesting hands-on puzzles).
4
75
196
@dan_s_becker
Dan Becker
5 years
Kaggle is hiring an AI educator. Want to have a global impact? Here's your chance to share your skills with 10000s of data scientists a month.
3
53
168
@dan_s_becker
Dan Becker
2 years
I keep hearing the claim "ChatGPT is just autocompletion" or "modern AI approaches just predict the next word." That changed a year ago with the InstructGPT paper. Important read for anyone that wants to understand current AI
9
26
170
@dan_s_becker
Dan Becker
5 months
Writing about LLMs has so much hype and academic stuff with little actionable insight. This article is clear, real and practical. It's a great read
2
31
159
@dan_s_becker
Dan Becker
6 years
My favorite questions when interviewing data scientists are about ML explainability. ML explainability is so useful, but it isn't as widely known as it should be. Kaggle has a free micro-course teaching the key ideas in ML explainability.
1
31
130
@dan_s_becker
Dan Becker
3 years
I just saw that the notebooks I authored for #Kaggle Learn courses have been forked over 2,000,000 times 🤯 There are a lot of great, free, applied data science courses at
5
9
126
@dan_s_becker
Dan Becker
3 years
Python puzzle: My code is
fns = [lambda x: term for term in ('a', 'b', 'c')]
out = [f(None) for f in fns]
---
The result is that out is ['c', 'c', 'c']. What's happening here?
13
21
125
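Editor's note on the puzzle above: the usual explanation is late binding. Each lambda looks up `term` when it is called, not when it is created, and by then the comprehension has finished with `term` set to its last value. The sketch below reproduces the behavior and shows one common workaround, binding the current value as a default argument; the names `late`, `fixed_fns`, and `early` are introduced here for illustration.

```python
# Each lambda reads `term` at call time, after the loop has ended.
fns = [lambda x: term for term in ("a", "b", "c")]
late = [f(None) for f in fns]

# A default argument is evaluated when the lambda is defined,
# so each lambda keeps its own copy of the value.
fixed_fns = [lambda x, term=term: term for term in ("a", "b", "c")]
early = [f(None) for f in fixed_fns]

print(late)   # ['c', 'c', 'c']
print(early)  # ['a', 'b', 'c']
```

`functools.partial` is another way to freeze the value, at the cost of a slightly different call signature.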
@dan_s_becker
Dan Becker
5 years
I heard about @streamlit earlier this week Tried it for the first time this morning 🔥Holy smokes 🔥 I'm not sure I'll ever write a Jupyter notebook again. They might still have some use case? But Streamlit is shockingly nice to use Thx @HamelHusain for telling me about it
6
14
111
@dan_s_becker
Dan Becker
4 years
Why is this workflow uncommon? 1) Train an ML model. 2) Calculate the absolute value of the errors on the validation set, and build a "confidence" model that predicts error magnitudes. Then, prospectively, you'd make calls to both models, yielding both a prediction and a confidence level.
19
10
99
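Editor's sketch of the workflow described in the post above, using only the standard library. Both models are plain one-variable least-squares lines, chosen to keep the example short; the post does not say which model types to use. The synthetic data (noise that grows with `x`) and the helper names `fit_line`, `predict`, and `predict_with_confidence` are assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict(model, x):
    a, b = model
    return a + b * x


rng = random.Random(0)


def make_data(n):
    """Synthetic data whose noise level increases with x (an assumption)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 0.2 + 0.5 * x) for x in xs]
    return xs, ys


train_x, train_y = make_data(500)
valid_x, valid_y = make_data(500)

# 1) Train the main model on the training set.
model = fit_line(train_x, train_y)

# 2) Compute absolute errors on the validation set and fit a second
#    model that predicts the error magnitude from the same feature.
abs_err = [abs(y - predict(model, x)) for x, y in zip(valid_x, valid_y)]
confidence_model = fit_line(valid_x, abs_err)


def predict_with_confidence(x):
    """Return (prediction, expected absolute error) for a new input."""
    return predict(model, x), predict(confidence_model, x)
```

Calling `predict_with_confidence(1.0)` and `predict_with_confidence(9.0)` should show a larger expected error at the noisier end of the range. Fitting the error model on validation rather than training data matters, because training errors tend to understate how wrong the model is on new inputs.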
@dan_s_becker
Dan Becker
4 years
HYPOTHESIS: Most data science today is still just experimentation TEST: Many people seeing this are data scientists. Who has personally built a model for their current job that has been put in production or used to inform a meaningful decision?
42
11
101
@dan_s_becker
Dan Becker
2 years
I've been using Pandas for about 10 years, and I still improved my Pandas skill working through the Effective Pandas book by @__mharrison__ Nice book
4
5
90
@dan_s_becker
Dan Becker
3 years
@VCBrags I had to learn how to cook again after I left my last job
0
0
90
@dan_s_becker
Dan Becker
7 years
3 of the very best data scientists I've met have no college degree. And I more generally observe ~ 0 correlation between DS skill and formal degree. My sample size is small, but I'm still convinced academic achievement is terribly overrated in our field.
7
19
89
@dan_s_becker
Dan Becker
3 years
Plot twist: Satoshi was a growth hacker in the Nvidia marketing department.
1
4
77
@dan_s_becker
Dan Becker
3 years
Announcing DataRobot's acquisition of my startup (Decision AI)
7
5
77
@dan_s_becker
Dan Becker
4 years
Some data scientists want to be player 1. Others want to be player 2. We built Decision AI purely for #2, and I feel good about that bet. We'll see.
Tweet media one
6
10
75
@dan_s_becker
Dan Becker
3 years
This problem isn't unique to ML/AI. It comes up with most data analysis, even casually looking at descriptive statistics. We're tempted to attribute differences to whatever feature tells a good story... but data collection usually introduces many confounding factors.
4
11
74
@dan_s_becker
Dan Becker
5 years
People think they can't improve themselves quickly. But the 4-hour Intro to ML course on Kaggle gives you enough background to independently have fun and grow with your own Machine Learning projects.
2
15
76
@dan_s_becker
Dan Becker
3 years
Most AI researchers hide human knowledge from models, so the model is a tabula rasa to learn from data What about this: Try to embed as much human knowledge as possible. Then learn from data on top of that It's less focused on "AGI" and more focused on solving problems well
10
7
69
@dan_s_becker
Dan Becker
5 years
I've historically been lukewarm about matplotlib. But a multi-year tour of other Python graphing libraries has me more appreciative. It's flexible enough for anything I'd want... and while others might have clever APIs, the comprehensive matplotlib docs make up for it.
5
4
63
@dan_s_becker
Dan Becker
1 year
Today @weights_biases is releasing part 2 of my Decision Optimization course. Learn the key tricks about going from standard loss functions to good decisions. And see how to directly optimize the outcomes you care about
Tweet media one
1
9
63
@dan_s_becker
Dan Becker
4 years
My Twitter newsfeed would suggest everyone's busy with fun modeling libraries StackOverflow tells a different story A reminder that people are busy munging data. Also, I didn't realize Spark was this widely used (or at least subject to so many questions).
Tweet media one
11
7
60
@dan_s_becker
Dan Becker
4 years
I've seen some great data scientists struggle to have a practical impact because they don't know simulation techniques I'll share a better way in this webinar
Tweet media one
2
4
59
@dan_s_becker
Dan Becker
6 months
We've added a fantastic line-up of speakers to our LLM fine-tuning course () It includes @eugeneyan @sh_reya @TheZachMueller @winglian @charles_irl This will be a unique opportunity for those who can join. But registration closes May 10.
4
4
53
@dan_s_becker
Dan Becker
6 years
Explanations for why GANs produce nice results always felt hand-wavy I just saw this video with an approach to generate high quality images without GANs I predict we see more new approaches to generative models in 2019. Maybe replacing GANs entirely
2
10
53
@dan_s_becker
Dan Becker
4 years
Why I started Decision AI
Tweet media one
Tweet media two
3
9
50
@dan_s_becker
Dan Becker
4 years
I've had conversations with 40 pro data scientists in the last 2 weeks Most assumed they were using ML predictions in an approximately optimal way... and most found they could do much better I'll show how to deliver more value from the same predictions
1
8
46
@dan_s_becker
Dan Becker
3 years
Anthony Goldbloom was my most helpful angel investor when I started Decision AI. He just started a VC fund (AIX Ventures) with Richard Socher, Pieter Abbeel, and Chris Manning! That's a legit all-star lineup of AI. If I started something, those are the first investors I'd want.
2
3
47
@dan_s_becker
Dan Becker
2 years
I hear "PhD data scientist" to describe the persona of great data scientists. But the most effective data scientists I know don't have advanced degrees. So I'm going to start referring to the expert DS persona with the phrase "high school graduate data scientist"
1
3
47
@dan_s_becker
Dan Becker
2 years
@__mharrison__ My favorite ML podcasts are Gradient Dissent by @l2k and TWIML by @samcharrington The Analytics Engineering Podcast is great for analytics and the data industry.
2
2
46
@dan_s_becker
Dan Becker
3 years
@arkosiorek Sim2real in RL. RL is the most promising and least practically useful area of ML... And its current uselessness is because learning in simulation doesn't work in reality.
3
1
45
@dan_s_becker
Dan Becker
6 years
Data scientists won't tune hyperparameters or design architectures in 5 years. AutoML will replace that. Instead we'll structure rich (multi-equation) models to reflect outside knowledge. That can't be automated. And Probabilistic Programming will be the key tool to do it.
1
8
44
@dan_s_becker
Dan Becker
3 years
If data is the new oil, let's start using this oil for things more important than ad targeting and churn prediction.
4
4
41
@dan_s_becker
Dan Becker
3 years
I've heard gatekeeping from students saying projects should use personal implementations of ML algos. But I never hear that from people with more experience You're just so much faster and deliver fewer bugs when using well-tested tools with higher-level APIs
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
3 years
Life is easier with tools like @huggingface , @weights_biases , and @fastdotai , don't you agree? 😄
Tweet media one
13
50
582
5
3
42
@dan_s_becker
Dan Becker
7 years
The #Gartner magic quadrant is so bad. Everyone knows it is pay-to-play, and the results contradict reality so badly. Gartner's success says something disconcerting about the executives who purchase these technologies.
Tweet media one
4
13
40
@dan_s_becker
Dan Becker
10 years
Want to run & experiment with convolutional neural nets? Fork and run this. Be up and running in 2 minutes:
1
16
40
@dan_s_becker
Dan Becker
7 years
@kaggle I always liked this one. But let's be honest, the stats class caused the change in beliefs :)
Tweet media one
1
11
39
@dan_s_becker
Dan Becker
6 years
For anyone who missed the ML For Insights Challenge 🧠💪 It's now available as a course on Kaggle Learn 📈
2
7
36
@dan_s_becker
Dan Becker
6 years
Why is it hard to find time to learn coding? Because most people's first programs are uninspiring. The new Data Visualization course by @alexis_b_cook changes all that. You can make fun and impressive graphics from Day 1, and learn Python in the process
1
7
36
@dan_s_becker
Dan Becker
5 months
I like the standard LLM fine-tuning tools. So I wasn't sure I'd like Predibase. Then I used it. Their UI is really nice... especially the data visibility part. I'm going to use this more and more. So I'm pumped that @predibase just offered free compute credits to all participants.
1
6
36
@dan_s_becker
Dan Becker
3 years
Early TF users struggled to choose between the various higher-level APIs. The embarrassment of riches was solved when Keras became the official high-level TF API Now Jax is in the same place as early TF w/ Flax, Haiku, Trax, Elegy I hope the community consolidates on one again
5
2
33
@dan_s_becker
Dan Becker
3 years
Someone asked for a quick reference on Transformer models. Here's my favorite (by far): Everything @JayAlammar writes on NLP is fantastic
1
4
33
@dan_s_becker
Dan Becker
9 months
@HamelHusain and I are thinking about teaching a four-session, cohort-based course on LLM fine-tuning for data scientists and software engineers. We set up this survey to gauge interest: If you take the survey, we'll make sure you're the first to hear
1
3
31
@dan_s_becker
Dan Becker
6 years
Model interpretation is so valuable to data scientists, but way too few data scientists know how to see what their ML models are learning. Starting next week, Kaggle Learn can show you how to extract the insights from your ML models
0
7
33
@dan_s_becker
Dan Becker
7 years
@kaggle Probably not a causal relationship
Tweet media one
0
10
33
@dan_s_becker
Dan Becker
5 years
Are @kaggle competitions getting more competitive now that so many people are quarantined at home?
2
0
33
@dan_s_becker
Dan Becker
5 years
DS project I hope someone does: Curate top 50 #COVID19 tweets of the day (w/ Twitter API) Signals for ranking top 50? Total likes. Retweets by people like @NAChristakis , etc Submit to for auto-updating and big audience Respond here to brainstorm 1/2
3
5
32
@dan_s_becker
Dan Becker
4 years
I wanted to see climate change so far in different places, so I made this @streamlit app to explore it There's been a lot of aridification in the American Southwest, but it's changes in Europe that surprised me most
Tweet media one
1
6
32
@dan_s_becker
Dan Becker
6 years
A new session of starts on February 11. Their materials look really complete and well thought out. More info here:
2
11
29
@dan_s_becker
Dan Becker
4 years
@kareem_carr I had the views in this thread until I read and looked at the literature @sarahookr points to It's intuitive that models transmit rather than create bias. But research shows that's not correct
@sarahookr
Sara Hooker
4 years
Yesterday, I ended up in a debate where the position was "algorithmic bias is a data problem". I thought this had already been well refuted within our research community but clearly not. So, to say it yet again -- it is not just the data. The model matters. 1/n
29
756
3K
1
3
28
@dan_s_becker
Dan Becker
5 years
I'm frequently amazed how disconnected our conservation efforts are from what actions actually help with environmental conservation. Today's reminder: landfill vs recycling
3
4
28
@dan_s_becker
Dan Becker
3 years
I think of this graph from Harvard Business Review whenever someone recommends an HBR article... Which sadly isn't never
@jlowin
Jeremiah Lowin
3 years
I wonder what they think “data science” actually is
Tweet media one
75
187
1K
0
0
28
@dan_s_becker
Dan Becker
4 years
@rasbt Step 1: Spark on Python becomes reasonable alternative to Scala Step 2: Realize we don't need Spark
0
0
28
@dan_s_becker
Dan Becker
4 years
I bet a lot of developers can write CSS 10X faster than me And I could probably write pandas code 10X faster than them Sooooo, I guess we're all 10X engineers! 🥳🥳🥳
0
1
28
@dan_s_becker
Dan Becker
2 years
Wow. It came up with this in <5 seconds That is superhuman performance
Tweet media one
1
2
27
@dan_s_becker
Dan Becker
5 years
To all the haters who said tech is building addictive technologies that don't improve users' lives: You haters were right. Sorry for ever doubting you.
0
10
26
@dan_s_becker
Dan Becker
3 years
Without the gamification of Kaggle competitions, I never would have gotten into ML
3
0
26
@dan_s_becker
Dan Becker
3 years
@HamelHusain I tell my kids that our family is a tech startup They thought it was weird at first, but I showed them the deed to our home, confirming they have no equity. So now they get it
4
4
26
@dan_s_becker
Dan Becker
5 years
I just saw that the machine learning lessons I wrote for #kaggle Learn surpassed 1M uses. That's pretty good. Though the best courses are the ones by @alexis_b_cook. Worth checking those out at
Tweet media one
4
1
26
@dan_s_becker
Dan Becker
4 years
I love reading data science stuff. But my phone pushed a story from "Towards Data Science" about GPT-3 replacing programmers. Thanks, phone... for the reminder to block articles from Towards Data Science.
0
0
24
@dan_s_becker
Dan Becker
5 years
2019 is one heck of a year to get into tech. Most top titles in Glassdoor's "Highest Paying Entry Level Jobs" are some version of - Data scientist - UI designer - Software engineer
Tweet media one
0
5
25
@dan_s_becker
Dan Becker
2 years
@bernhardsson Perhaps you haven't heard about Amazon's new guarantee. If your data isn't still available millennia after humans wipe each other off the face of the earth... they'll refund your storage charges.
0
0
22
@dan_s_becker
Dan Becker
4 years
Most people struggle to use ML models well Solving this problem is the #1 thing data scientists can do to build trust with colleagues and impact a company's bottom line If you're ready for a modeling tool that helps you bridge the gap, we should talk
Tweet media one
2
5
23
@dan_s_becker
Dan Becker
5 years
Congrats DataRobot on raising another $206M. It's a hard working team solving real problems Data scientists and aspiring data scientists should think about developing skills that will be useful in an age of AutoML Fiddling with model parameters will be an outdated workflow 1/3
2
4
22
@dan_s_becker
Dan Becker
4 years
Interactive data analysis is WAY more engaging than static graphs or text People default to static publishing because it used to be SO much easier. Tools like Streamlit & Dash are changing that I wonder what a Substack for interactive data apps would look like. @myelbows ?
5
1
23
@dan_s_becker
Dan Becker
2 years
I kinda don't get how @Amplitude_HQ made a product analytics tool that's so much better than every other BI or product analytics tool I've ever used
2
2
23
@dan_s_becker
Dan Becker
3 years
@GiorgioMantova @burkov Some folks from Hugging Face are writing the book you are looking for:
1
2
22
@dan_s_becker
Dan Becker
6 years
I'm frequently asked how to get a first job in data science. My answer, which I'm confident is good advice: Do interesting projects Make the results public Make them look polished Your resume might get someone to look at your projects, but the proof is in good (real) work
@benhamner
Ben Hamner
6 years
Good discussion on Kaggle’s role in an online portfolio for job candidates. TL;DR: it’s helpful to create and link to high quality kernels - demonstrates that you can apply your skills vs. “I completed these courses”
0
9
76
1
3
19
@dan_s_becker
Dan Becker
4 years
My Dell 9500 running Ubuntu doesn't sleep when I close the lid But it sleeps when I reopen the lid... about 80% of the time This is nuts
6
0
22
@dan_s_becker
Dan Becker
4 years
Have you seen people using averages or point predictions when they should look at distributions? Decision AI tracks full distributions, because it's important in so many practical situations
Tweet media one
1
2
21
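Editor's sketch of why a point prediction can mislead where a distribution would not, using a simple stocking decision. Nothing here comes from Decision AI itself: the demand distribution, `PRICE`, `COST`, and the function names are hypothetical, picked only to make the gap visible. The idea is to compare the stock level chosen by plugging in average demand against the one that maximizes average profit over sampled demand.

```python
import random

rng = random.Random(0)

# Hypothetical demand forecast expressed as samples, not a single number.
demand_samples = [max(0.0, rng.gauss(100, 30)) for _ in range(2000)]

PRICE, COST = 10.0, 8.0  # assumed unit price and unit cost (thin margin)


def profit(stock, demand):
    """Revenue on units actually sold, minus the cost of everything stocked."""
    return PRICE * min(stock, demand) - COST * stock


def expected_profit(stock):
    """Average profit over the sampled demand distribution."""
    return sum(profit(stock, d) for d in demand_samples) / len(demand_samples)


# Decision from a point prediction: stock the average demand.
mean_demand = sum(demand_samples) / len(demand_samples)
stock_from_point = round(mean_demand)

# Decision from the full distribution: search for the best stock level.
stock_from_dist = max(range(0, 201), key=expected_profit)

print(stock_from_point, expected_profit(stock_from_point))
print(stock_from_dist, expected_profit(stock_from_dist))
```

With a thin margin, unsold units cost more than missed sales, so the distribution-aware choice stocks less than the average demand. The point prediction cannot express that asymmetry because it carries no information about how often demand falls short.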
@dan_s_becker
Dan Becker
2 years
I tried a few transcription APIs last week. None had the speed + accuracy I wanted. Just tried OpenAI's new API (based on the whisper-large model). It transcribed a 1-minute clip in 5s with no transcription errors. Really happy with this.
1
0
21
@dan_s_becker
Dan Becker
6 years
@HamelHusain @lc0d3r @kaggle Here are the links @HamelHusain : 0. Use cases for ML Insights: 1. Permutation Importance: 2. Partial Dep Plots: 3. Shap Values: 4. Advanced uses of Shap Values:
2
7
19
@dan_s_becker
Dan Becker
4 years
I'd heard algorithmic bias is more than a data problem. But found it counterintuitive until this thread. Thanks for the explanation @sarahookr
@sarahookr
Sara Hooker
4 years
Yesterday, I ended up in a debate where the position was "algorithmic bias is a data problem". I thought this had already been well refuted within our research community but clearly not. So, to say it yet again -- it is not just the data. The model matters. 1/n
29
756
3K
1
4
21
@dan_s_becker
Dan Becker
4 years
It's common practice to use ML without even thinking about causality. But the resulting predictions generalize poorly in a changing world. Like the one we live in. There's a lot you can do, and we are learning more. Glad to see research like this.
0
1
21
@dan_s_becker
Dan Becker
3 years
I frequently want alerts when code finishes running so I can check the results Is there a super-easy way to drop in a line of Python that sends me an SMS?
5
4
21
@dan_s_becker
Dan Becker
3 years
Accelerated sklearn patch from Intel gives massive speedups in some cases
Tweet media one
0
8
18
@dan_s_becker
Dan Becker
7 years
@jamie_hall This may explain why we can no longer promise "study hard as a kid, be willing to work hard, and you'll be ok"
@AndrewYNg
Andrew Ng
7 years
Tech world is used to tectonic shift every 5 years from new inventions. Now tech has infected other industries so everyone has to shift.
19
453
985
1
3
18
@dan_s_becker
Dan Becker
6 years
Congrats to @DataRobot for raising another $100M round of financing. They have an awesome product that can make most data scientists and analysts more effective.
0
2
18
@dan_s_becker
Dan Becker
1 year
I'm experimenting with LLMs to define 3D models of physical objects (write OpenSCAD and the FreeCAD Python API code) Anyone exploring related topics? I'd love to chat.
3
5
18
@dan_s_becker
Dan Becker
5 years
Data scientists need to make models quickly & account for breaks from the patterns in historical data Probabilistic simulation is the tool We're building a tool for it: Pondering whether simulation is right for your problem? Let's chat
2
10
17
@dan_s_becker
Dan Becker
6 years
I admire the AutoML Tables team for publicizing the entry with the tool before the competition started. They could have entered silently, and published the result only if it was good But this result is more representative and compelling because it wasn't cherry-picked.
@quocleix
Quoc Le
6 years
Update from #KaggleDays , 5 hours into the competition and Google AutoML still maintains its lead. Three hours to go (five hours since I took the pictures).
Tweet media one
Tweet media two
5
11
96
2
0
19
@dan_s_becker
Dan Becker
4 years
Say hypothetically you're a student at a US university You'd rather not pay a bajillion $ for online classes Fall job options for a 20 year old aren't great You aren't some Thiel fellow type autodidact who can solve nuclear fusion in your semester off What would you do?
8
1
17
@dan_s_becker
Dan Becker
4 years
@AnnieLowrey @yanathomas We already have too much to read rather than too little. So the lack of content may be a feature rather than a shortcoming I like your writing, and I'd buy an Atlantic to read it. But I won't spend the time picking an Atlantic off the shelf to figure if you wrote anything in it
0
0
16
@dan_s_becker
Dan Becker
5 years
Most people won't realize how important a development this is for Kaggle competitions. And in a couple years, the old style of kaggle competitions will feel primitive.
@kaggle
Kaggle
5 years
We're excited to announce...🥁🥁🥁 Synchronous Kernels-only Competitions! What's this? Read all about it in this blog post by @wcukierski *AND* check out our 1st synchronous Kernels-only competition (linked in blog). #nofreehunch
Tweet media one
0
28
90
1
5
18
@dan_s_becker
Dan Becker
3 years
I recently spoke with #DataFramed , the @DataCamp podcast, about how data teams can move from making predictions to optimizing decisions Episode came out today
0
3
18
@dan_s_becker
Dan Becker
4 years
We do ML that "doesn't matter" because standard ML workflows are insufficient to optimize decisions in complex dynamic environments I think my new project at will make ML on tabular data vastly more actionable
1
1
17
@dan_s_becker
Dan Becker
4 years
Personally, I like ML research. But it covers different issues than real-world problems Decision AI is built to improve real-world decision-making It isn’t for everyone, but pragmatic data scientists (and the people who pay them) will love the difference
Tweet media one
0
0
17
@dan_s_becker
Dan Becker
4 years
Decision AI is focused on letting data scientists hit exactly these criteria Most people won't realize how far off ML has been until they see a better way.
@tayloramurphy
Taylor A Murphy
4 years
Sure, machine learning is fun, but have you ever written a function that delivers business value, is well tested, and can be iterated on by your colleagues?
30
102
904
0
2
17