Ben Hamner

@benhamner

Followers: 33,010
Following: 3,673
Media: 735
Statuses: 4,564

working on something new. formerly @kaggle cto

SF
Joined March 2009
Pinned Tweet
@benhamner
Ben Hamner
3 years
Wordle xxx 1/6 🟩🟩🟩🟩🟩 Get Wordle right on your first guess using the daily ⬛🟨🟩 tweet distribution
39
231
1K
@benhamner
Ben Hamner
2 years
Let’s make a deal: America will adopt the metric system, Europe will adopt 10,000,000.00-style number formatting
414
538
10K
@benhamner
Ben Hamner
4 years
Programming: 10% writing code. 90% figuring out why it doesn’t work. Analyzing data and ML: 1% writing code. 9% figuring out why code doesn’t work. 90% figuring out what’s wrong with the data
74
2K
8K
@benhamner
Ben Hamner
2 years
@brianwilt Both America and Europe are wrong here. ISO-8601 dates (YYYY-MM-DD) is a hill I will die on. Unambiguous and sortable as strings!
70
72
3K
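A small sketch of the "sortable as strings" claim, using only the Python standard library. The sample dates are made up for illustration, and the property is assumed to apply to four-digit years, where every field is zero-padded.

```python
from datetime import date

# Hypothetical sample dates, deliberately out of order.
dates = [date(2023, 11, 5), date(2023, 2, 14), date(2021, 12, 31)]

iso = [d.isoformat() for d in dates]          # "YYYY-MM-DD"
us = [d.strftime("%m/%d/%Y") for d in dates]  # "MM/DD/YYYY"

# Plain string sorting of ISO 8601 dates matches chronological order,
# because the most significant field (the year) comes first.
iso_sorted = sorted(iso)
chronological_iso = [d.isoformat() for d in sorted(dates)]

# Plain string sorting of US-style dates compares the month first,
# so dates from different years end up interleaved.
us_sorted = sorted(us)
chronological_us = [d.strftime("%m/%d/%Y") for d in sorted(dates)]

print(iso_sorted == chronological_iso)
print(us_sorted == chronological_us)
```

The same ordering argument is why ISO 8601 strings work as filename prefixes and as text keys in a database.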
@benhamner
Ben Hamner
2 years
Between ChatGPT and GitHub Copilot I think I spoke more to AIs this week than humans
55
198
2K
@benhamner
Ben Hamner
7 years
Easy parts of applying machine learning: .fit() .predict()
Hard parts: .clean() .transform() .get_data() .frame_problem() .debug() .handle_nonstationarities() .handle_missing_inputs()
29
830
2K
@benhamner
Ben Hamner
7 years
Replace "AI" with "matrix multiplication & gradient descent" in the calls for "government regulation of AI" to see just how absurd they are
60
913
2K
@benhamner
Ben Hamner
7 years
Excited to launch Kaggle Learn - interactive tutorials on machine learning, deep learning, R, and data visualization
12
437
1K
@benhamner
Ben Hamner
7 years
The blockchain movement is 100x worse than the NoSQL movement. Every time I see a new blockchain idea I ask “would a relational DB be unambiguously better in every regard here?” (generating page views excepted). 99% of the time the answer’s yes
34
290
1K
@benhamner
Ben Hamner
4 years
VS Code data structure visualization extension. This is neat
Tweet media one
12
263
1K
@benhamner
Ben Hamner
2 years
pandas pro tip: use .format(thousands=",") to make larger numbers legible (I'm confused why this isn't the default display)
Tweet media one
12
122
993
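The screenshot for this tip is not preserved, so the following is only a sketch of what it may have shown. The `thousands` argument is assumed to be the one on `Styler.format`, reached through `df.style` in a notebook (pandas 1.3 or later, with jinja2 installed). The runnable part below produces the same grouping as plain strings, and the DataFrame contents are invented for illustration.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame(
    {"city": ["A", "B"], "population": [8804190, 873965]}
)

# Plain-string version: apply Python's "," format spec to each value.
formatted = df["population"].map("{:,}".format)
print(formatted.tolist())

# In a notebook, the tip above would presumably be written as
#     df.style.format(thousands=",")
# which changes only how the table is displayed and leaves the
# underlying numeric values untouched.
```

The `map` version turns the column into strings, so it is best kept for display or export and not assigned back over the numeric column.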
@benhamner
Ben Hamner
11 months
How did a tiny research team at OpenAI outperform thousands of scientists at Microsoft Research? Turns out they used Google Meet instead of Microsoft Teams
19
62
979
@benhamner
Ben Hamner
7 years
Deep learning and AI get all the buzz/press. The untold story is all the hard valuable work to create high quality datasets that enable them
26
317
847
@benhamner
Ben Hamner
7 years
It’s embarrassing and infuriating that some #NIPS2017 authors couldn’t get visas to present their work. USA should be leading through enlightened example, not disgusting racism. We lose when we can’t attract the top AI minds, whatever they look like and wherever they’re born
19
245
782
@benhamner
Ben Hamner
6 years
Congratulations @mikb0b , who just became our youngest @kaggle grandmaster at 17
Tweet media one
11
119
724
@benhamner
Ben Hamner
10 years
When you write code, keep in mind that you're collaborating with your future self
23
635
711
@benhamner
Ben Hamner
3 years
Whoa! Pandas has a nifty function, read_html, for pulling a webpage and returning a list of dataframes representing the tables on it. I wanted sunrise/sunset for SF, and thought I was going to have to get my hands dirty parsing. Nope! It's a pandas one-liner
Tweet media one
11
83
681
@benhamner
Ben Hamner
7 years
Want to learn how to use Keras and TensorFlow to apply deep learning to computer vision problems? Great set of intro videos + exercises by @dan_s_becker on Kaggle Learn
Tweet media one
5
174
605
@benhamner
Ben Hamner
5 years
Federated learning: train machine learning models while preserving user privacy, by keeping user data on device (e.g. mobile phone) and only sending encrypted gradient updates (that can only be decrypted in aggregate) back to the server
7
165
587
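One way the "only decryptable in aggregate" idea can be realized is pairwise masking, as used in secure aggregation protocols. The toy sketch below assumes that approach and is not necessarily what the tweet had in mind: each pair of clients shares a random mask that one adds and the other subtracts, so the server sees only masked values, yet their sum equals the true sum. The function name, the modulus, and the integer "updates" are all hypothetical, and a single seeded generator stands in for the pairwise shared secrets a real protocol would negotiate.

```python
import random

MODULUS = 2**31 - 1  # all arithmetic is done modulo this value


def masked_updates(updates, seed=0):
    """Return one masked update per client; the masks cancel in the sum."""
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    masked = [u % MODULUS for u in updates]
    n = len(updates)
    for i in range(n):
        for j in range(i + 1, n):
            mask = rng.randrange(MODULUS)
            masked[i] = (masked[i] + mask) % MODULUS  # client i adds the mask
            masked[j] = (masked[j] - mask) % MODULUS  # client j subtracts it
    return masked


# Hypothetical integer-quantized gradient values from three devices.
client_updates = [12, 7, 30]

masked = masked_updates(client_updates)
aggregate = sum(masked) % MODULUS

print(aggregate)
```

Individual masked values reveal nothing useful on their own in this simplified setting; handling clients that drop out mid-round is the hard part a real protocol has to add.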
@benhamner
Ben Hamner
10 years
A 3% decrease in California almond production would save as much water as completely shutting off water usage in SF (all homes, businesses)
73
923
568
@benhamner
Ben Hamner
4 years
Feature stores for ML: nice collection of tech talks from companies that have rolled their own feature stores for productionizing the data that feeds ML applications
5
130
578
@benhamner
Ben Hamner
5 years
We just launched the toughest @kaggle competition in a long time with @fchollet . Can software learn to generalize complex, abstract tasks from a tiny number of examples? Easy to get started on, and a good result would mean a substantial leap forward in AI
5
162
573
@benhamner
Ben Hamner
7 years
93% of public, upvoted Python kernels on Kaggle use pandas @wesmckinn. The only two other libraries directly imported >50% of the time are numpy (89%) and matplotlib (59%). Impossible to overstate the impact pandas has had on the PyData ecosystem
13
193
564
@benhamner
Ben Hamner
6 years
Wow. This may be one of the most effective data visualizations I’ve ever seen. Brilliant use of a green screen. Worth watching all the way through
@weatherchannel
The Weather Channel
6 years
Storm surge will be a huge factor for Hurricane #Florence Check out what it might look like with @TWCErikaNavarro :
643
10K
23K
6
170
538
@benhamner
Ben Hamner
6 years
It's crazy how much our universities focus the next generation on test results, course completions, and degrees. I wish they empowered students to create and build. The transcript they should be aiming for is "here's the ten best things we created during our time here"
17
156
546
@benhamner
Ben Hamner
5 years
Pandas is a swiss army knife for working with data! This @kaggle notebook highlights 100 tricks
3
123
538
@benhamner
Ben Hamner
7 years
Statistics: you can't add probabilities like that! Machine learning: ¯\_(ツ)_/¯ it improves my model performance
10
168
502
@benhamner
Ben Hamner
6 years
Nice overview from @netflix on how they built an internal platform around their use for Jupyter notebooks. This resonates with the direction we’re building out with @kaggle kernels
2
174
509
@benhamner
Ben Hamner
7 years
Data visualization in Python - nice set of interactive notebook tutorials by @ResidentMario
Tweet media one
1
152
497
@benhamner
Ben Hamner
7 years
Three most used datasets in #NIPS2017 : 1. MNIST (110 papers) 2. CIFAR (79 papers) 3. ImageNet (60 papers)
Tweet media one
11
221
492
@benhamner
Ben Hamner
4 years
As data scientists, when an analytics result doesn't match our expectations, we scrutinize everything to explain it (data issues, code issues, etc.). This scrutiny often finds bugs that overturn the result. I worry that we only apply this scrutiny and rigor to unexpected results
17
74
486
@benhamner
Ben Hamner
6 years
Deep convolutional neural network trained and evaluated on 200,000 breast cancer exams achieves an AUC of 0.895, equivalent to expert radiologists. A hybrid model combining the radiologist and machine readings achieves the best results
4
165
480
@benhamner
Ben Hamner
6 years
Looks like NIPS 2018 may have sold out in under 15 minutes. For those debating ML hype, getting a ticket to an ML conference is now more challenging than a Taylor Swift concert or a Hamilton showing
9
200
475
@benhamner
Ben Hamner
6 years
Kaggle Kernels now supports GPUs! You can attach one to your kernel through the settings tab. Here’s an example of training a model on a GPU
7
147
449
@benhamner
Ben Hamner
4 years
What is a perfect date? YYYY-MM-DD (ISO 8601 format)
18
26
449
@benhamner
Ben Hamner
5 years
I just trained a 1 trillion parameter neural net! All parameters just happen to be 0
15
27
447
@benhamner
Ben Hamner
2 years
I have one problem: I need to install a Python package. Great, now I have 99 problems
22
28
433
@benhamner
Ben Hamner
7 years
We now have over 10,000 public datasets shared on Kaggle! This is a key milestone in our mission to help the world learn from data
Tweet media one
3
165
420
@benhamner
Ben Hamner
2 years
tqdm in Python notebooks is insanely easy to use: "for i in my_list:" becomes "for i in tqdm(my_list):" and you get a beautiful progress bar and ETA for any long-running loop
Tweet media one
6
43
361
@benhamner
Ben Hamner
6 years
Most online courses are incentivized to get you to waste time on more online courses. We launched Kaggle Learn as a series of small, bite-sized tutorials because the best way to learn AI is developing your own projects as quickly as possible
4
88
348
@benhamner
Ben Hamner
10 years
Tech journalists have successfully predicted 1,000 of the past 1 bubbles
13
275
317
@benhamner
Ben Hamner
6 years
What is a data scientist's favorite tool? ⌘C-⌘V
23
58
310
@benhamner
Ben Hamner
5 years
We just launched a @kaggle challenge focused on open #COVID19 research questions, including a dataset of 29,000 relevant papers to help. Thanks White House @WHOSTP @allen_ai @NIH @Microsoft @Georgetown @ChanZuckerberg for rapid collaboration on data
2
148
308
@benhamner
Ben Hamner
8 years
Fear of machine intelligence putting data scientists out of work is like being scared of compilers eliminating programming jobs in the 70s
7
136
293
@benhamner
Ben Hamner
7 years
To all gamers who got told you weren't doing good for society: A massive thanks for funding GPU R&D, which enabled this wave of AI advances
6
90
291
@benhamner
Ben Hamner
5 years
What machine learning commentators talk about: deep neural net flavor du jour, AI risk. What machine learning practitioners talk about: messy data, data labeling, tuning learning rates, collecting more data, feature representation, cost functions, latency, productionization, ...
9
62
289
@benhamner
Ben Hamner
8 years
Everyone gets jazzed about ML algorithms. High quality, context-appropriate data is the crucial enabler for every application I've touched
8
116
284
@benhamner
Ben Hamner
7 years
Kaggle now has an API for downloading data and submitting to competitions!
4
66
281
@benhamner
Ben Hamner
9 years
Most AI breakthroughs constrained by high quality datasets, not algorithms
Tweet media one
10
266
281
@benhamner
Ben Hamner
6 years
Saw a comment that there's close to a 1% chance of dying from a car accident. Was shocked it's this high. Back-of-the-enveloped the math, and it pans out
Tweet media one
16
63
274
@benhamner
Ben Hamner
3 years
Want to convert a daily time series to a weekly moving average?
df["col"].rolling(7).mean()
pandas is delightful
5
27
277
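A short sketch of the rolling-mean one-liner on a made-up daily series. The column name "col" follows the tweet, while the date range and values are assumptions chosen so the averages are easy to check by hand.

```python
import pandas as pd

# Hypothetical daily series: the values 1..10 over ten consecutive days.
idx = pd.date_range("2024-01-01", periods=10, freq="D")
df = pd.DataFrame({"col": range(1, 11)}, index=idx)

# Trailing 7-day window: each row averages that day and the six before it.
df["weekly_avg"] = df["col"].rolling(7).mean()

# The first six rows are NaN because a full 7-day window is not yet available.
print(df)
```

Passing `min_periods=1` to `rolling` would fill those leading rows with averages over however many days are available so far, at the cost of noisier early values.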
@benhamner
Ben Hamner
2 years
I had a cron job running for several weeks to notify me when a swim lesson spot opened up for my 10-month-old. Is this peak SF tech parent?
15
7
273
@benhamner
Ben Hamner
3 years
GitHub Copilot's a super cool technology, but it's as close to automating your code writing as Gmail Smart Compose is to automating your email writing
7
25
259
@benhamner
Ben Hamner
7 years
One big takeaway from Kaggle's second kernels competition: limiting compute is an incredibly effective regularizer on model complexity
2
58
261
@benhamner
Ben Hamner
5 years
“The missing semester of your CS education” - looking at the syllabus, this is probably the most important set of skills to master for programming in practice. The shell, git, data wrangling, debugging, etc.
2
79
256
@benhamner
Ben Hamner
5 years
I bet the average length of hair in the US right now’s the longest it’s been in a century
14
11
251
@benhamner
Ben Hamner
7 years
The rules of machine learning: best practices for ML engineering by Martin Zinkevich
Tweet media one
0
66
243
@benhamner
Ben Hamner
3 months
Hiring several backend/data/ML engineers for our new(ish) company, focused on building high-quality structured data from raw, noisy inputs. Have funding, revenue, users
10
24
248
@benhamner
Ben Hamner
4 years
One big ML pain point has been putting models into production. It's now possible to incorporate Jupyter notebooks directly into production workflows, making this one step easier!
4
47
239
@benhamner
Ben Hamner
6 years
We have a fun new NLP @kaggle competition for you in collaboration with @Quora - train ML models on 1.3 million questions to classify them as sincere or insincere
0
64
236
@benhamner
Ben Hamner
7 years
Headline: "Killer AI will take over the world" Reality: "High quality datasets, addition, and multiplication empower the global economy"
9
98
234
@benhamner
Ben Hamner
7 years
Want to easily download the data on Kaggle? Use our API and CLI
> pip install kaggle
> kaggle datasets download -d rtatman/lego-database
2
84
235
@benhamner
Ben Hamner
5 years
One of our big focuses at @kaggle is improving the quality of the public data ecosystem. As part of this, we launched dataset usability ratings on 17000+ public datasets to promote better practices around documentation and tutorials
Tweet media one
2
70
236
@benhamner
Ben Hamner
5 years
Privacy-preserving #COVID19 tracing, in cartoon form
Tweet media one
4
106
225
@benhamner
Ben Hamner
6 years
"Scheduling notebooks at Netflix". cron on notebooks is a powerful idea - we've been thinking about how we want to incorporate this into Kaggle
2
55
224
@benhamner
Ben Hamner
4 years
Our newest @kaggle competition is OCR for chemical compounds. Can you apply ML to translate from an image of the chemical structure to the text string that represents it? 4 million chemical structure images to help solve this problem!
3
56
217
@benhamner
Ben Hamner
4 years
She had the right response. Gotta have standards, and this guy just wasn't up to ISO 8601
Tweet media one
3
32
215
@benhamner
Ben Hamner
6 years
Publish your dataset on Kaggle, and our new Kaggle Kerneler bot will write an automatic exploratory analysis on it for you in Python, showing you how to load and get started on the data
Tweet media one
5
52
213
@benhamner
Ben Hamner
6 years
One new Kaggler learned about machine learning during her maternity leave and finished in the top 2% of a challenge on identifying cell nuclei
0
53
213
@benhamner
Ben Hamner
6 years
Have you wanted to start learning Python for data and analytics but never taken the leap? Sign up for Kaggle’s “Learn Python” track, where you’ll learn to apply Python to a fun 20-minute puzzle every day from June 11-17
4
55
207
@benhamner
Ben Hamner
5 years
We now have over 20,000 datasets published on Kaggle! 📈🎉🎊🙌 Thanks to our designers' + engineers' hard work to build a platform for this, and to all of you, for making data you can open+accessible, and sharing your reproducible notebooks on these datasets
Tweet media one
6
47
209
@benhamner
Ben Hamner
7 years
Browser tabs on my computer multiply like rabbits. And then every once in a while there is a mass extinction event that forces the system to restart from scratch
10
24
210
@benhamner
Ben Hamner
7 years
"Sir, you're under arrest for attempted international terrorism. Setting learning rate α=100000 is above government-approved safe values"
5
47
203
@benhamner
Ben Hamner
5 years
We’re starting to formally invite automated machine learning tools to submit benchmark solutions to @kaggle competitions
3
50
202
@benhamner
Ben Hamner
7 years
37% of Silicon Valley was born outside the US. This number would be even higher if there weren’t structural barriers in place to recruiting world-class talent, no matter where they happened to be born
7
54
204
@benhamner
Ben Hamner
7 years
Getting started with machine learning and want to explore different libraries and ideas? Here's some of our favorite ML-friendly public datasets on Kaggle that are (mostly) clean and easy to work with
2
65
202
@benhamner
Ben Hamner
6 years
Keras is the primary ML framework used by competition winners on Kaggle since 2016. Congrats @fchollet for creating an API that's incredibly intuitive and easy to get started with, while being flexible and powerful enough for state-of-the-art performance
@fchollet
François Chollet
6 years
What machine learning tools do Kaggle champions use? We ran a survey among teams that ranked in the *top 5* of a competition since 2016. The first question asked about the *primary* framework they used. Very happy to see confirmation that winning teams prefer Keras :)
Tweet media one
18
362
1K
0
46
199
@benhamner
Ben Hamner
8 years
I extracted the text of all the NIPS papers & published it as a dataset #nips2016
Tweet media one
4
111
196
@benhamner
Ben Hamner
6 years
It's funny how many people are worried about AI automating almost every job but their own. From the outside, it's easy to overlook the complexities inherent in others' jobs, and how far we are from automating almost all of them in practice.
7
39
191
@benhamner
Ben Hamner
5 years
White themes too bright for coding your Jupyter notebooks? @noderaider just launched a dark theme for editing @Kaggle Kernels. Welcome to the dark side, Kernels
Tweet media one
7
30
201
@benhamner
Ben Hamner
7 years
We now have thousands of open datasets on @kaggle ! Here's how to find one for you
4
86
196
@benhamner
Ben Hamner
6 years
This 👇. Sadly, trying to reproduce machine learning results from a PDF is kinda like trying to reproduce an extravagant dish from its Instagram photo. Sharing code and data, and starting from that, are critical!
@catherineols
Catherine Olsson
6 years
This has come up again, so I’m going to repeat it: If you’re learning ML and want to “reimplement a paper”, you should work from the *github code*, NOT the pdf. The algorithm that the authors actually ran is often subtly (& unintentionally) different from what the paper says.
27
178
844
4
57
197
@benhamner
Ben Hamner
8 years
Popular datasets referenced over time in NIPS papers. Surprisingly, MNIST reigns king #nips2016
Tweet media one
6
147
188
@benhamner
Ben Hamner
6 years
Data science glossary on @kaggle - a great curated list of kernels providing forkable and reproducible tutorials on machine learning algorithms
2
56
191
@benhamner
Ben Hamner
7 years
Many comments online are toxic and harassing. We want to provide the tools to detect and fix this using machine learning
8
76
191
@benhamner
Ben Hamner
3 years
Tweet media one
5
14
192
@benhamner
Ben Hamner
7 years
Super confused why we still use resumes. Get 100x the signal from domain profiles (GitHub, StackOverflow, Kaggle, etc.) & real work samples
29
46
191
@benhamner
Ben Hamner
3 years
Want to get started with game AI programming? Try the latest @kaggle simulation competition: Lux AI, a 1v1 resource-gathering game to produce enough light for your city to survive the night
0
32
189
@benhamner
Ben Hamner
7 years
You can now query all historic Bitcoin blockchain transactions through Kaggle Kernels. Here's a visualization of the network that led to the 10k Bitcoin pizza transaction early on
Tweet media one
4
71
189
@benhamner
Ben Hamner
7 years
AI != ML != DL != RL
9
34
185
@benhamner
Ben Hamner
8 years
One-liner to make colleague lose all credibility: echo -e "library(ggplot2)\nlibrary(ggthemes)\ntheme_set(theme_excel())" >> ~/.Rprofile
4
47
182
@benhamner
Ben Hamner
6 years
It’s ironic how GDPR has substantially increased email spam
5
43
181
@benhamner
Ben Hamner
3 years
On August 2nd, @kaggle is kicking off "30 days of ML" for those new to ML to learn the basics in an hour a day of structured, hands-on challenges. No prior coding experience necessary! Sign up here:
2
61
175
@benhamner
Ben Hamner
8 years
Let's agree that we won't call sophisticated (or unsophisticated) forms of regression "artificial intelligence" when speaking to journalists
7
66
178
@benhamner
Ben Hamner
6 years
Why did the naive Bayesian feel patriotic when they heard fireworks? They assumed independence! (HT @wzchen )
2
30
174
@benhamner
Ben Hamner
5 years
The mathematician in me has a bone to pick with this sign
Tweet media one
8
12
176
@benhamner
Ben Hamner
8 years
Deep forest: an alternative to deep neural networks
Tweet media one
6
70
175
@benhamner
Ben Hamner
9 years
One company's data scientist is another's quant & another's analyst & another's developer & another's ML engineer & another's DB admin
11
171
174