Davis Blalock

@davisblalock

12K Followers · 364 Following · 1K Statuses

Research scientist + first hire @MosaicML, now @Databricks. @MIT PhD. I post about AI technical progress + sometimes the business side.

San Francisco, CA
Joined December 2016
@davisblalock
Davis Blalock
2 years
I've written about 500+ machine learning papers in the past year. Here are some of my most popular threads: [1/n]
12 replies · 71 retweets · 374 likes
@davisblalock
Davis Blalock
18 days
RT @SirrahChan: Here's my attempt at visualizing the training pipeline for DeepSeek-R1(-Zero) and the distillation to smaller models. Not…
0 replies · 241 retweets · 0 likes
@davisblalock
Davis Blalock
18 days
RT @ZimingLiu11: New paper🚨: "Physics of Skill Learning" Training dynamics is complicated, but are there simple "physical laws" behind it?…
0 replies · 99 retweets · 0 likes
@davisblalock
Davis Blalock
5 months
RT @hbXNov: New paper📢 LLM folks have been supervised finetuning their models with data from large and expensive models (e.g., Gemini Pro).…
0 replies · 144 retweets · 0 likes
@davisblalock
Davis Blalock
5 months
An interesting history-of-science datapoint + yet another win in Noam Shazeer's track record. I can attest that this happens a lot at industry labs—we invent way more stuff than we have time to publish.
@thecharlieblake
Charlie Blake
6 months
In a rather surreal discovery ... it turns out that the "Unit Scaling" idea I've been pushing for the last 2 years was already described in a 2020 docstring by @NoamShazeer, using exactly the same name. The clichés are true.
[image attached]
1 reply · 2 retweets · 32 likes
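For context, here is a minimal sketch of the unit scaling idea as I understand it, purely an illustration rather than Blake's or Shazeer's actual code. The core move is to apply a fixed factor like 1/sqrt(fan_in) outside the weights, so activations start with roughly unit variance; the full method also chooses separate scales for the backward pass, which this sketch omits.

```python
import torch
import torch.nn as nn

class UnitScaledLinear(nn.Module):
    """Linear layer rescaled to produce ~unit-variance output at init."""

    def __init__(self, fan_in: int, fan_out: int):
        super().__init__()
        # Unit-variance weights; the usual 1/sqrt(fan_in) factor is
        # applied in forward() rather than baked into the init.
        self.weight = nn.Parameter(torch.randn(fan_out, fan_in))
        self.fan_in = fan_in

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # For unit-variance x, x @ W.T has variance ~fan_in, so this
        # scale restores unit variance (useful for low-precision formats).
        return (x @ self.weight.t()) / self.fan_in**0.5

x = torch.randn(4096, 512)        # unit-variance input
y = UnitScaledLinear(512, 256)(x)
print(y.std())                    # ~1.0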
@davisblalock
Davis Blalock
6 months
RT @xariusrke: 1/n FP8 training is hard - loss divergence and instability often lead to the conclusion that it’s not possible. But we’ve fo…
0 replies · 86 retweets · 0 likes
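For anyone wondering why FP8 is finicky, here is a generic sketch of the per-tensor scaling that most FP8 recipes rely on. This illustrates the general idea, not the method from the thread above, and assumes PyTorch >= 2.1 for the float8_e4m3fn dtype. FP8 e4m3 tops out at 448, so tensors must be rescaled into range before casting; keeping those scales accurate as value distributions shift during training is a major source of instability.

```python
import torch

x = torch.randn(4, 4) * 100.0                # values too large for FP8
scale = x.abs().max() / 448.0                # 448 = e4m3's max finite value
x_fp8 = (x / scale).to(torch.float8_e4m3fn)  # cast into representable range
x_back = x_fp8.to(torch.float32) * scale     # dequantize
print((x - x_back).abs().max())              # small rounding error, no overflow
```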
@davisblalock
Davis Blalock
6 months
RT @ZackAnkner: Excited to announce our new work: Critique-out-Loud (CLoud) reward models. CLoud reward models first produce a chain of tho…
0 replies · 57 retweets · 0 likes
@davisblalock
Davis Blalock
6 months
RT @Thom_Wolf: It's Sunday morning and we have some time with the coffee, so let me tell you about some of our recent surprising journey in synt…
0 replies · 113 retweets · 0 likes
@davisblalock
Davis Blalock
6 months
RT @BlackHC: This is one of the best papers I have read in a while. It contains a crazy amount of insights and ideas 🤯
0 replies · 35 retweets · 0 likes
@davisblalock
Davis Blalock
6 months
RT @Azaliamirh: Is inference compute a new dimension for scaling LLMs? In our latest paper, we explore scaling inference compute by increa…
0 replies · 67 retweets · 0 likes
@davisblalock
Davis Blalock
7 months
RT @maxzimmerberlin: A good time to share our #ICLR2023 paper: "How I Learned to Stop Worrying and Love Retraining" We explore sparsity-adap…
0 replies · 9 retweets · 0 likes
@davisblalock
Davis Blalock
7 months
RT @nsaphra: Chatbots have biases in what they say—but what about biases in what they WON'T say? Our new paper (w/@victoria_r_li & @YidaEdw
0 replies · 27 retweets · 0 likes
@davisblalock
Davis Blalock
7 months
RT @sarahookr: Does more compute equate with greater risk? What is our track record at predicting what risks emerge with scale? I don't…
0 replies · 81 retweets · 0 likes
@davisblalock
Davis Blalock
7 months
RT @wellecks: What do nucleus sampling, tree-of-thought, and PagedAttention have in common? They're all part of our new survey: "From Deco…
0 replies · 114 retweets · 0 likes
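Of the three methods named above, nucleus sampling is the easiest to show concretely. Here is a minimal sketch of the standard top-p algorithm (the textbook version from Holtzman et al., 2019, not code from the survey): sample from the smallest set of tokens whose cumulative probability exceeds p.

```python
import numpy as np

def nucleus_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id from the smallest set of tokens whose
    cumulative probability exceeds p."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]        # most probable first
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]               # smallest set covering p
    # Renormalize within the nucleus and sample.
    return int(rng.choice(nucleus, p=probs[nucleus] / probs[nucleus].sum()))

print(nucleus_sample(np.array([2.0, 1.0, 0.5, -1.0]), p=0.9))
```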
@davisblalock
Davis Blalock
7 months
RT @DimitrisPapail: Thread on our newest paper: 1/n The initial motivation of our project was the "lost in the middle" phenomenon observed…
0 replies · 16 retweets · 0 likes
@davisblalock
Davis Blalock
7 months
RT @mvpatel2000: Gemma 2 is out with 9b and 27b! A few things I really liked in the tech report:
On pretraining:
- GQA (finally lol)
- interle…
0 replies · 11 retweets · 0 likes
@davisblalock
Davis Blalock
7 months
Opinion: if you like the @leopoldasch vision of AGI-as-an-automated-remote-worker, you should be bullish on Microsoft, Google, and Databricks as companies positioned to make this happen.

Much of making that product a reality will not be the sexy AI parts, but the enterprise-readiness and ownership of the app + data layers (to preserve security + privacy, get distribution, collect imitation learning + preference data, and ensure reliability).

For the latter reason (owning the app + data layers), companies with console-like experiences (the Bloomberg terminal, CAD tools, Replit) may also be positioned to make domain-specific "workers," though their AI will lean heavily on external infra.

Disclaimer: I'm biased regarding Databricks and my thinking here isn't fully clear; comments welcome!
@matei_zaharia
Matei Zaharia
8 months
Databricks Assistant, our in-context AI, is now GA! I'm really proud of what the team built: it's one of the most effective AI assistants in the industry according to users, thanks to integrating context like related code, data, usage patterns, etc.
3 replies · 1 retweet · 18 likes
@davisblalock
Davis Blalock
7 months
RT @mvpatel2000: Fun collaboration between @DbrxMosaicAI and @PyTorch team! We've been working hard to scale MoEs and PyTorch distributed t…
0 replies · 27 retweets · 0 likes
@davisblalock
Davis Blalock
8 months
@UbertiGavin @Tim_Dettmers Are these from the TRT LLM page [1]? I'm seeing 8xH200s max out at 1441.26 for fp8 llama 3 70b there. Although the 20k is way more in line with mlperf... [2]
1 reply · 0 retweets · 0 likes
@davisblalock
Davis Blalock
8 months
RT @michaelryan207: MIPROv2, our new state-of-the-art optimizer for LM programs, is live in DSPy @stanfordnlp! It's even faster, cheaper,…
0 replies · 73 retweets · 0 likes