Jacob Steinhardt Profile
Jacob Steinhardt

@JacobSteinhardt

Followers: 7,624
Following: 69
Media: 16
Statuses: 329

Assistant Professor of Statistics, UC Berkeley

Joined December 2011
@JacobSteinhardt
Jacob Steinhardt
4 years
If our international students don't get a salary, I won't either. I pledge to donate my fall salary unless we fix U.S. immigration policy to allow international students (including incoming students) to be paid their stipend.
5
73
1K
@JacobSteinhardt
Jacob Steinhardt
2 years
My student Kayo Yin needs your help. Her visa has been unnecessarily delayed, which could prevent her from coming to UC Berkeley to start her studies. Despite her bringing all required documents, the @StateDept refused to process the visa, and it could take months to re-process.
34
332
1K
@JacobSteinhardt
Jacob Steinhardt
1 year
Many people, including me, have been surprised by recent developments in machine learning. To be less surprised in the future, we should make and discuss specific projections about future models. In this spirit, I predict properties of models in 2030:
23
120
548
@JacobSteinhardt
Jacob Steinhardt
2 years
In 2021, I created a forecasting prize to predict ML performance on benchmarks in June 2022 (and 2023, 2024, and 2025). June has ended, so we can see how the forecasters did:
5
96
502
@JacobSteinhardt
Jacob Steinhardt
2 years
This NYT article on Azalia and Anna's excellent chip design work is gross, to the point of journalistic malpractice. It platforms a bully while drawing an absurd parallel to @timnitGebru's firing. @CadeMetz should be ashamed. (not linking so it doesn't get more clicks)
Tweet media one
17
46
416
@JacobSteinhardt
Jacob Steinhardt
6 months
Can we build an LLM system to forecast geopolitical events at the level of human forecasters? Introducing our work Approaching Human-Level Forecasting with Language Models! Arxiv: Joint work with @dannyhalawi15, @FredZhang0, and @jcyhc_ai
Tweet media one
12
70
381
@JacobSteinhardt
Jacob Steinhardt
1 year
A core intuition I have about deep neural networks is that they are complex adaptive systems. This creates a number of control difficulties that are different from traditional engineering challenges:
8
79
335
@JacobSteinhardt
Jacob Steinhardt
2 years
I'm back to blogging, with some new thoughts on emergence: I answer the question: what are some specific emergent "failure modes" for ML systems that we should be on the lookout for?
5
49
233
@JacobSteinhardt
Jacob Steinhardt
2 years
To give an idea of just how much SOTA exceeded forecasters' expectations, here are the prediction intervals for the MATH and Massive Multitask benchmarks. Both outcomes exceeded the 95th percentile prediction.
Tweet media one
6
36
208
@JacobSteinhardt
Jacob Steinhardt
3 years
Awesome to see @DeepMind's recent language modeling paper include our forecasts as a comparison point! Hopefully more papers track progress relative to forecasts so that we can better understand the pace of progress in deep learning.
Tweet media one
1
23
198
@JacobSteinhardt
Jacob Steinhardt
3 years
On my blog, I've recently been discussing emergent behavior and in particular the idea that "More is Different". As part of this, I've compiled a list of examples across a variety of domains:
3
27
194
@JacobSteinhardt
Jacob Steinhardt
2 years
The US Embassy in London must approve Kayo's visa immediately. This is embarrassing and will harm US competitiveness in AI. Please retweet!
4
24
189
@JacobSteinhardt
Jacob Steinhardt
1 year
Since GPT-4 was released last week, I decided to switch things up from AI-related blogging and instead talk about research group culture. In my group, I've come up with a set of principles to help foster healthy and productive group meetings: .
2
18
184
@JacobSteinhardt
Jacob Steinhardt
2 years
Kayo has already done stellar machine learning work for her Master's degree at CMU, one of the top US universities. ML expertise is sorely needed in the US. Is the U.S. really so eager to shoot itself in the foot?
1
4
174
@JacobSteinhardt
Jacob Steinhardt
2 years
Finally, while forecasters underpredicted progress on capabilities, they *overpredicted* progress on robustness. So while capabilities are advancing quickly, safety properties may be behind schedule. A troubling thought.
2
27
149
@JacobSteinhardt
Jacob Steinhardt
2 years
Kayo's semester starts in one week. She's a French citizen who has spent significant time in the U.S. In addition to all required documents, we've sent extensive additional docs to "prove" that Kayo is really coming to Berkeley. There's no reason this can't be approved tomorrow.
2
4
131
@JacobSteinhardt
Jacob Steinhardt
10 months
I worry about tail risks from future AI systems, but I haven't read descriptions that feel plausible to me, so I tried writing some of my own: . This led to four vignettes covering cyberattacks, economic competition, and bioterrorism.
4
20
110
@JacobSteinhardt
Jacob Steinhardt
2 years
I've known Anna for a long time now, and she's one of the most impressive junior ML researchers around. She also holds herself to high standards of integrity. I've been impressed with how well she's handled this situation. Let's give her and Azalia our support.
1
5
112
@JacobSteinhardt
Jacob Steinhardt
3 years
A blog post series on a key way I've changed my mind about ML: the (relative) value of empirical data vs. thought experiments for predicting future ML developments.
1
18
97
@JacobSteinhardt
Jacob Steinhardt
10 months
ML systems are different from traditional software, in that most of their properties are acquired from data, without explicit human intent. This is unintuitive and creates new types of risk. In this blog post I talk about one such risk: unwanted drives
2
19
94
@JacobSteinhardt
Jacob Steinhardt
1 year
I quite enjoyed this workshop, and was pretty happy with the talk I gave (new and made ~from scratch!). My topic was using LLMs to help us understand LLMs, covering great work by @TongPetersb, @ErikJones313, @ZhongRuiqi, and others. You can watch it here:
1
16
91
@JacobSteinhardt
Jacob Steinhardt
2 years
I suspect most of us in the ML field still haven't internalized how quickly ML capabilities are advancing. We should be preregistering forecasts so that we can learn and correct! I intend to do so for June 2023.
1
6
84
@JacobSteinhardt
Jacob Steinhardt
2 years
Findings:
* Forecasters significantly underpredicted progress
* But were more accurate than me (I underpredicted progress even more!)
* Also were (probably) more accurate than the median ML researcher
1
8
82
@JacobSteinhardt
Jacob Steinhardt
2 years
Satrajit Chatterjee (the subject of the article) is portrayed as being fired after raising scientific concerns with Azalia Mirhoseini and Anna Goldie's Nature paper on chip design. In reality, Chatterjee waged a years-long campaign to harass & undermine their work.
1
1
72
@JacobSteinhardt
Jacob Steinhardt
1 year
Over the past two years, I and many other forecasters registered predictions about the state-of-the-art accuracy on ML benchmarks in 2022-2025. In this blog post, I evaluate the predictions for 2023:
3
14
72
@JacobSteinhardt
Jacob Steinhardt
4 years
Gov. Cuomo recently said that he's using R = 1.1 as a trigger point for “circuit breaking” New York’s reopening. This is a weird policy that doesn't make sense, but not because we should use R = 1 instead. 1/N
3
15
65
@JacobSteinhardt
Jacob Steinhardt
2 years
Google's statement says Chatterjee was "terminated with cause". This is an unusually strong statement and shows Google had serious problems with him. NYT should have known this, so it's unclear why they paint this as "he said, she said" (and give most space to Chatterjee).
Tweet media one
1
1
53
@JacobSteinhardt
Jacob Steinhardt
3 years
I argue that while ML models have undergone many qualitative shifts (and will continue to do so), many empirical findings hold up well even across these shifts: Part of the "More is Different" series on my blog!
1
6
52
@JacobSteinhardt
Jacob Steinhardt
4 years
New paper on household transmission of SARS-CoV-2: , with @mihaela_curmei, @andrew_ilyas, and @OwainEvans_UK. Very interested in feedback! We show that under lockdowns, 30-55% of transmissions occur in households. 1/4.
Tweet media one
2
14
50
@JacobSteinhardt
Jacob Steinhardt
2 years
Interestingly, forecasters' biggest miss was on the MATH dataset, where @alewkowycz @ethansdyer and others set a record of 50.3% on the very last day of June! One day made a huge difference.
2
6
50
@JacobSteinhardt
Jacob Steinhardt
2 years
My tutorial slides on Aligning ML Systems are now online, in HTML format, with clickable references! [NB some minor formatting errors were introduced when converting to HTML]
@NicolasPapernot
Nicolas Papernot
2 years
Next up @satml_conf is @JacobSteinhardt who is giving a terrific tutorial on the topic of "Aligning ML Systems with Human Intent" (like all SaTML content, it is being recorded and will be released in a couple of days)
Tweet media one
1
7
33
1
7
45
@JacobSteinhardt
Jacob Steinhardt
2 years
It's particularly gross that the article repeatedly draws parallels with Timnit Gebru's firing, which is completely different in terms of the facts on the ground. Timnit agrees: . Seems clear that NYT did this for clicks.
I haven't read this @nytimes article by @daiwaka & @CadeMetz . But I had heard about the person from many ppl. To the extent the story is connected to mine, it's ONLY the pattern of action on toxic men taken too late while ppl like me are retaliated against
3
31
208
2
2
43
@JacobSteinhardt
Jacob Steinhardt
2 years
Of course, NYT's in the business of clicks. But they should draw the line when giving a bully a platform to continue to harass two junior researchers.
1
2
41
@JacobSteinhardt
Jacob Steinhardt
10 months
Nora is a super creative thinker and very capable engineer. I'd highly recommend working for her if you want to do cool work on understanding ML models at an open-source org!
@norabelrose
Nora Belrose
10 months
My Interpretability research team at @AiEleuther is hiring! If you're interested, please read our job posting and submit: 1. Your CV 2. Three interp papers you'd like to build on 3. Links to cool open source repos you've built to contact@eleuther.ai
10
44
249
5
0
38
@JacobSteinhardt
Jacob Steinhardt
1 year
Some nice pushback on my GPT-2030 post by @xuanalogue, with lots of links!
@xuanalogue
xuan (ɕɥɛn / sh-yen)
1 year
I respect Jacob a lot but I find it really difficult to engage with predictions of LLM capabilities that presume some version of the scaling hypothesis will continue to hold - it just seems highly implausible given everything we already know about the limits of transformers!
9
29
200
2
2
37
@JacobSteinhardt
Jacob Steinhardt
3 years
Is remote work slower? I estimate 0-50% slower for many tasks, but for some tasks (esp. branching into new areas/skillsets) it can easily be 5x slower. Easy to underestimate for managers, but huge effect:
2
2
33
@JacobSteinhardt
Jacob Steinhardt
1 year
In particular, I project that "GPT-2030" will have a number of properties that are surprising relative to current systems: 1. Superhuman abilities at specific tasks, such as math, programming, and hacking. 2. Fast inference speed and throughput (enough to run millions of copies)
3
5
30
@JacobSteinhardt
Jacob Steinhardt
1 year
Complex adaptive systems follow the law of unintended consequences: straightforward attempts to control traffic, ecosystems, firms, or pathogens fail in unexpected ways. And we can see similar issues in deep networks with reward hacking and emergence.
1
2
30
@JacobSteinhardt
Jacob Steinhardt
1 year
4. Consider not building certain systems. In biology, some gain-of-function research is heavily restricted, and there are significant safeguards around rapidly-evolving systems like pathogens. We should ask if and when similar principles should apply in machine learning.
1
1
26
@JacobSteinhardt
Jacob Steinhardt
1 year
Based on this, I examine a number of principles for improving the safety of deep learning systems that are inspired by the complex systems literature: 1. Build sharp cliffs in the reward landscape around bad behaviors, so that models never explore them in the first place.
1
2
24
@JacobSteinhardt
Jacob Steinhardt
1 year
I've previously made forecasts for mid-2023 (which I'll discuss in July once they resolve). Thinking 7 years out is obviously much harder, but I think important for preparing for the future impacts of ML.
2
0
24
@JacobSteinhardt
Jacob Steinhardt
1 year
2. Train models to self-regulate and have limited aims. 3. Pretraining shapes most of the structure of a model. Consider what heuristics you are baking in at pretraining time, rather than relying on fine-tuning to fix problems.
1
1
24
@JacobSteinhardt
Jacob Steinhardt
3 years
Many have heard of deliberate practice, but I identify another important mental stance called *deliberate play*. Deliberate play is intentional, but with a softer focus. Deliberate practice develops skills; deliberate play develops frameworks.
0
4
24
@JacobSteinhardt
Jacob Steinhardt
2 years
@EpochAIResearch is one of the coolest (and in my opinion underrated) research orgs for understanding trends in ML. Rather than speculating, they meticulously analyze empirical trends and make projections for the future. Lots of interesting findings in their data!
@pvllss
Pablo Villalobos
2 years
We at @EpochAIResearch recently published a new short report! In "Trends in Training Dataset Sizes", we explore the growth of ML training datasets over the past few decades. Doubling time has historically been 16 months for language datasets and 41 months for vision. 🧵1/3
Tweet media one
1
5
23
0
4
24
@JacobSteinhardt
Jacob Steinhardt
2 years
What will SOTA for ML benchmarks be in 2023? I forecast results for the MATH and MMLU benchmarks, two benchmarks that have had surprising progress in the past year:
1
6
22
@JacobSteinhardt
Jacob Steinhardt
3 years
In the next post of this series, I argue that when predicting the future of ML, we should not simply expect existing empirical trends to continue. Instead, we will often observe qualitatively new, "emergent" behavior: .
@JacobSteinhardt
Jacob Steinhardt
3 years
A blog post series on a key way I've changed my mind about ML: the (relative) value of empirical data vs. thought experiments for predicting future ML developments.
1
18
97
0
2
21
@JacobSteinhardt
Jacob Steinhardt
1 year
3. Parallel learning. Because copies have identical weights, they can propagate millions of gradient updates in parallel. This means models could rapidly learn new tasks (including "bad" tasks like manipulation/misinformation).
1
1
21
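To make the parallel-learning point in the tweet above concrete, here is a toy sketch (an editor's illustration under stated assumptions, not anything from the thread): many identical copies compute gradients on different data, and a single averaged update propagates everyone's experience to all copies at once.

```python
# Toy illustration (editor's sketch, not from the thread): copies with
# identical weights can pool gradients computed on different data, so a
# single averaged update spreads every copy's experience to all of them.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=8)   # the shared weights of every copy
n_copies, lr = 4, 0.1

# Each copy sees different data; its gradient is stubbed with noise here.
per_copy_grads = [rng.normal(size=8) for _ in range(n_copies)]

# One averaged update, applied once, keeps all copies identical while
# incorporating what each copy learned in parallel.
weights -= lr * np.mean(per_copy_grads, axis=0)
```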
@JacobSteinhardt
Jacob Steinhardt
1 year
4. New modalities. Beyond tool use and images, models may be trained on proteins, astronomical images, networks, etc. They could therefore have a strong intuitive grasp of these more "exotic" domains.
2
2
20
@JacobSteinhardt
Jacob Steinhardt
1 year
I elaborate on these and consider several additional ideas in the blog post itself. Thanks to @DanHendrycks for first articulating the complex systems perspective on deep learning to me. He's continuing to do great work in that and other directions at
0
0
18
@JacobSteinhardt
Jacob Steinhardt
3 years
For predicting what future ML systems will look like, it's helpful to have "anchors"---reference classes that are broadly analogous to future ML. Common anchors include "current ML" and "humans", but I think there are many other good choices:
2
3
17
@JacobSteinhardt
Jacob Steinhardt
6 months
In this work, we build an LM pipeline for automated forecasting. Given any question about a future event, it retrieves and summarizes relevant articles, reasons about them, and predicts the probability that the event occurs.
Tweet media one
2
0
16
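As a rough schematic of the retrieve → summarize → reason → predict shape described in the tweet above (an editor's sketch in which every helper function is a hypothetical placeholder, not the paper's actual code):

```python
# Schematic of the pipeline shape described above. Every helper here is a
# hypothetical placeholder supplied by the caller; this is an illustration,
# not the paper's implementation.

def forecast(question, retrieve, summarize, reason, predict) -> float:
    """Estimate the probability that the event in `question` occurs."""
    articles = retrieve(question)                 # fetch relevant news articles
    summaries = [summarize(a) for a in articles]  # condense each article
    rationale = reason(question, summaries)       # LM writes an argued rationale
    p = predict(question, rationale)              # LM maps rationale to a probability
    return min(max(p, 0.0), 1.0)                  # clamp to a valid probability
```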
@JacobSteinhardt
Jacob Steinhardt
2 years
If you want to join me on this, you can register predictions on Metaculus for the MATH and Massive Multitask benchmarks: * * It's pretty easy--just need a Google account. The MATH one is open now and Multitask should be open soon.
@JacobSteinhardt
Jacob Steinhardt
2 years
I suspect most of us in the ML field still haven't internalized how quickly ML capabilities are advancing. We should be preregistering forecasts so that we can learn and correct! I intend to do so for June 2023.
1
6
84
3
4
16
@JacobSteinhardt
Jacob Steinhardt
1 year
I then consider a few ways GPT-2030 could affect society. Importantly, there are serious misuse risks (such as hacking and persuasion) that we should address. These are just two examples, and generally I favor more work on forward-looking analyses of societal impacts.
4
1
16
@JacobSteinhardt
Jacob Steinhardt
2 years
@aghobarah Definitely agree in terms of research track record. But in terms of professional standing, Anna's a PhD student and Azalia's on the academic job market right now. This is important, because it means their careers are more affected by this sort of press (vs. a tenured prof).
0
0
16
@JacobSteinhardt
Jacob Steinhardt
4 years
If you’re interested in this, @andrew_ilyas and I have a working paper discussing these issues in more detail: .
0
1
15
@JacobSteinhardt
Jacob Steinhardt
4 years
@chhaviyadav_ Consulates are closed due to COVID-19, so incoming international students can't apply for visas. This has been true for a while but is now at the point of affecting students directly. See e.g. this June letter from GOP representatives asking Pompeo to fix it:
1
1
15
@JacobSteinhardt
Jacob Steinhardt
4 years
Some exciting new work by my student @DanHendrycks and collaborators. We identify seven hypotheses about OOD generalization in the literature, and collect several new datasets to test these. Trying to add more "strong inference" to ML (cf. Platt 1964).
@DanHendrycks
Dan Hendrycks
4 years
What methods actually improve robustness? In this paper, we test robustness to changes in geography, time, occlusion, rendition, real image blurs, and so on with 4 new datasets. No published method consistently improves robustness.
Tweet media one
Tweet media two
Tweet media three
3
29
137
0
1
13
@JacobSteinhardt
Jacob Steinhardt
4 years
Curated list of documented police abuse during protests: . Compilations like this are a compelling reminder that George Floyd is the most salient instance of a broader trend. (And remember: there are also many good police who are supporting protestors.)
0
1
14
@JacobSteinhardt
Jacob Steinhardt
6 months
We compare our system to ensembles of competitive human forecasters ("the crowd"). We approach the performance of the crowd across all questions, and beat the crowd on questions where they are less confident (probabilities between 0.3 and 0.7).
Tweet media one
2
1
13
@JacobSteinhardt
Jacob Steinhardt
4 years
Good to see this analysis, but the headline is misleading. 24 states have *point estimates* over 1, but the uncertainty in those estimates is large. Consider the null hypothesis that Rt=0.95 everywhere. Then we would expect 19 states with estimates above 1 (eyeballing stdev=0.17 from fig. 4).
@MRC_Outbreak
MRC Centre for Global Infectious Disease Analysis
4 years
UPDATE: #covid19science #COVID19 in USA ➡️Initial national average reproduction number R was 2.2 ➡️24 states have Rt over 1 ➡️Increasing mobility cause resurgence (doubling number of deaths in 8 weeks) ➡️4.1% of people infected nationally 🔰Report
Tweet media one
10
43
47
1
1
13
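A quick back-of-envelope check of the arithmetic in the tweet above (an editor's sketch; the 0.17 standard deviation is the eyeballed figure from the tweet, not a fitted value):

```python
# If every state's true Rt is 0.95 and each estimate is roughly
# Normal(0.95, 0.17), how many of 50 states would show point estimates
# above 1 by chance alone?
from scipy.stats import norm

true_rt, sd, n_states = 0.95, 0.17, 50
p_above_one = 1 - norm.cdf(1.0, loc=true_rt, scale=sd)
print(f"P(estimate > 1) = {p_above_one:.2f}")                     # ~0.38
print(f"Expected states above 1: {n_states * p_above_one:.1f}")  # ~19
```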
@JacobSteinhardt
Jacob Steinhardt
6 months
Moreover, averaging our prediction with the crowd consistently outperforms the crowd itself (as measured by Brier score, the most commonly-used metric of forecasting performance).
1
1
13
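Since the tweet above leans on the Brier score, here is a minimal illustration of how it is computed, with made-up forecasts (nothing here is the paper's data): it is the mean squared difference between forecast probabilities and binary outcomes, so lower is better.

```python
# Minimal Brier score illustration with made-up numbers (not the paper's
# data): mean squared difference between forecasts and 0/1 outcomes.
import numpy as np

def brier_score(probs, outcomes):
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    return float(np.mean((probs - outcomes) ** 2))

system   = [0.8, 0.3, 0.6, 0.1]   # hypothetical system forecasts
crowd    = [0.7, 0.4, 0.5, 0.2]   # hypothetical crowd forecasts
outcomes = [1, 0, 1, 0]           # how the questions resolved

average = [(s + c) / 2 for s, c in zip(system, crowd)]
for name, p in [("system", system), ("crowd", crowd), ("average", average)]:
    print(name, brier_score(p, outcomes))  # lower is better
```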
@JacobSteinhardt
Jacob Steinhardt
6 months
Our system has a number of interesting properties. For instance, our forecasted probabilities are well-calibrated, even though we perform no explicit calibration and even though the base models themselves are not (!).
Tweet media one
1
1
12
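For readers unfamiliar with the calibration claim above, here is one common way such a check is done (an editor's sketch on synthetic data, not the paper's evaluation code): bin forecasts by predicted probability and compare each bin's mean forecast to the observed event frequency.

```python
# A common calibration check (editor's sketch, not the paper's code): bin
# forecasts by predicted probability and compare each bin's mean forecast
# to the empirical frequency of the event in that bin.
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    probs, outcomes = np.asarray(probs), np.asarray(outcomes, float)
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            yield probs[mask].mean(), outcomes[mask].mean(), int(mask.sum())

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)       # synthetic forecasts
y = rng.uniform(size=1000) < p   # outcomes drawn so forecasts are calibrated
for mean_forecast, observed_freq, n in calibration_table(p, y):
    print(f"forecast {mean_forecast:.2f} -> observed {observed_freq:.2f} (n={n})")
```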
@JacobSteinhardt
Jacob Steinhardt
4 years
Lots of people hating on hydroxychloroquine because Trump likes it. But just because Trump likes something doesn't mean it kills people. Maybe it does, but let's demand real evidence instead of giving shoddy science a pass.
1
1
12
@JacobSteinhardt
Jacob Steinhardt
4 years
The actual issue is that R is not a good metric for directly setting policy, because it's difficult to estimate and far-removed from things in the world we care about, like hospital demand.
1
1
11
@JacobSteinhardt
Jacob Steinhardt
4 years
What's going on with Georgia? They've been "open" for a while now and there's been no apparent spike in cases. I don't think this can just be poor testing, because other data sources (e.g. FB surveys) show the same thing: 1/5
2
0
12
@JacobSteinhardt
Jacob Steinhardt
3 years
Examples include gecko feet, operating systems, economic specialization, hemoglobin, polymers, eyes, ant colonies, transistors, cities, and skill acquisition. If you're interested in reading about how this applies to ML, check out the full blog series!
1
2
11
@JacobSteinhardt
Jacob Steinhardt
2 years
I *also* still think there are unknown unknowns, and we should probably slow down and understand what current large ML systems are doing, before rushing to deploy new ones. But hopefully concrete behaviors will open the door to concrete research towards addressing them.
1
2
11
@JacobSteinhardt
Jacob Steinhardt
10 months
Overall, each scenario requires a few things to "go right" for the rogue AI system; I think of them as moderate but not extreme tail events, and assign ~5% probability to "something like" one of these scenarios happening by 2050. (w/ additional prob. on other/unknown scenarios)
2
1
9
@JacobSteinhardt
Jacob Steinhardt
6 months
Second, our model underperforms on "easy" questions (where the answer is nearly certain), because it is unwilling to give probabilities very close to 0 or 1. This is possibly an artifact of its safety training.
1
1
10
@JacobSteinhardt
Jacob Steinhardt
1 year
In research, it's important to create an environment that allows for risk-taking and mistakes, while also pushing eventually towards excellence and innovation. I aim to set discussion norms that promote both of these.
1
0
10
@JacobSteinhardt
Jacob Steinhardt
4 years
Some great recommendations from Chloe Cockburn (a program officer at Open Philanthropy, where I worked last summer). My understanding is that DA elections (starts at #9 on the list) are a high-impact route to police and criminal justice reform.
@chloecockburn
chloe cockburn 🟧
4 years
Thread: A lot of people are asking me where to give $ in this moment (I direct criminal justice giving at Open Philanthropy). I've compiled a list of recs for police accountability, including shrinking their budgets; decarceration; and transforming systems. /1
21
603
853
0
0
10
@JacobSteinhardt
Jacob Steinhardt
1 year
@xuanalogue Only thing missing is a counter-prediction so we can compare in 7 years :)
2
0
10
@JacobSteinhardt
Jacob Steinhardt
4 years
Open letter on police reform at UC Berkeley. I helped draft this, together with several amazing students. If you're at UCB and want to sign, please get in touch via e-mail. UCB has already pursued some good reforms, but there's much more to be done.
0
1
7
@JacobSteinhardt
Jacob Steinhardt
9 months
Signal-boosting this pushback since Nuño has a strong forecasting track record. I agree the AI part is not a traditional reference-class analysis, but I think "AI is an adaptive self-replicator, and this often causes problems" is importantly less inside-view than [a long argument about paperclips].
@NunoSempere
Nuño Sempere
9 months
@JacobSteinhardt @DhruvMadeka I like the overall analysis. I think that the move of noticing that AIs might share some characteristics with pandemics, in that AIs might be self-replicating, is an inside-view move, and I don't feel great about characterizing that as a reference class analysis.
1
0
2
1
0
9
@JacobSteinhardt
Jacob Steinhardt
6 months
We are excited to continue this work! Please email @dannyhalawi15 at dannyhalawi15@gmail.com to get in touch.
3
0
9
@JacobSteinhardt
Jacob Steinhardt
6 months
Finally, we provide a self-supervised method that fine-tunes models to forecast better, based on having them mimic rationales and forecasts that outperform the crowd. This is effective enough that fine-tuned GPT-3.5 can beat a carefully prompted GPT-4.
Tweet media one
1
1
9
@JacobSteinhardt
Jacob Steinhardt
2 years
Interesting opportunity to do mechanistic interpretability research! (I have worked/collaborated with Redwood and enjoyed it.)
@NeelNanda5
Neel Nanda
2 years
I'm helping Redwood Research run REMIX, a 1-month mechanistic interpretability sprint where 25+ people will reverse engineer circuits in GPT-2 Small. This seems like a great way to get experience exploring @ch402's transformer circuits work. Apply by 13th Nov!
7
28
167
0
0
9
@JacobSteinhardt
Jacob Steinhardt
6 months
For some cool related work, see , which examines human-LLM forecasting teams, and and , which introduce AI forecasting competitions.
1
0
9
@JacobSteinhardt
Jacob Steinhardt
3 years
This one on writing is an oldie, but hopefully useful to people gearing up for ICLR! Also highly recommend "Style: Lessons in Clarity and Grace" by Williams and Bizup for the book-length treatment of good writing.
0
1
9
@JacobSteinhardt
Jacob Steinhardt
2 years
@satml_conf was a great experience. More interesting conversations and ideas per day than at ICML, NeurIPS, or ICLR. The smaller size contributed, as well as a great program. Thanks to @NicolasPapernot and all the organizers!
@NicolasPapernot
Nicolas Papernot
2 years
And @satml_conf is a wrap! Thank you to all the attendees for their amazing energy! Excited to announce that @carmelatroncoso has agreed to co-chair the conference with me next year!!
Tweet media one
4
6
91
1
1
8
@JacobSteinhardt
Jacob Steinhardt
4 years
On the other hand, the secondary attack rate (probability of transmission) is surprisingly low: ~30% between two household members. This implies infection is not inevitable even between close contacts; basic precautions, e.g. handwashing, are still worthwhile. 2/4
Tweet media one
3
10
8
@JacobSteinhardt
Jacob Steinhardt
4 years
More criticism of the Yale wastewater study, with links to a cool analysis by @xangregg. One thing to keep in mind is that there are excellent, careful researchers in this area who *aren't* publishing results yet because they're waiting for better data. Similar to how serology played out.
@wfithian
Will Fithian
4 years
Incredible... stats meets the 24 hour news cycle. Data scraped from pdf, analyzed and reanalyzed w/in a few days of an exciting preprint appearing. Purple curve (linear smoothing + robust handling of outliers) is v. similar to smoothed curve in the preprint.
1
9
33
1
0
7
@JacobSteinhardt
Jacob Steinhardt
4 years
I'm worried that we're ignoring this data point because it doesn't fit our priors. It's not what I expected either, but therefore important to discuss. Most explanations I see say GA has too few tests or is making up numbers, but these seem untenable given the survey data. 5/5
0
0
8
@JacobSteinhardt
Jacob Steinhardt
1 year
@xuanalogue Thanks, I appreciated this! I don't think I'm claiming data/scale is all that matters, and agree ideas are an important part of the picture. For instance Parsel is an example of ideas helping a lot on APPS.
1
0
7
@JacobSteinhardt
Jacob Steinhardt
3 years
I wrote one of these (the measurement RFP) so I explain here why I think measurement is a promising tool for AI alignment:
@open_phil
Open Philanthropy
3 years
We're accepting proposals for projects working with deep learning systems that could help us understand and make progress on AI alignment. Learn more about the research directions and the application process here:
1
30
86
1
1
6
@JacobSteinhardt
Jacob Steinhardt
4 years
On R = 1.1 in particular: it’s difficult to tell the difference between R=1.1 and R=0.9 without at least 7 days of data, and probably more. R=1.1 corresponds to 2%/day growth, and 0.9 to -2%/day decline.
1
0
5
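To spell out the 2%/day arithmetic in the tweet above (an editor's sketch; the 5-day mean generation interval is an assumption for illustration, and the exact mapping depends on the epidemiological model used):

```python
# Rough check of the R -> growth-rate arithmetic above, under a simple
# exponential model. The 5-day mean generation interval is an assumption;
# the exact mapping depends on the model.
import math

T_GEN = 5.0  # assumed mean generation interval, in days

def daily_growth(R):
    """Approximate daily growth rate implied by reproduction number R."""
    return R ** (1.0 / T_GEN) - 1.0

for R in (1.1, 0.9):
    print(f"R = {R}: {daily_growth(R):+.1%}/day")  # ~+1.9%/day and ~-2.1%/day
```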
@JacobSteinhardt
Jacob Steinhardt
3 years
Anyways, that's just a preview; I'll lay out my full position (and the arguments behind it) in the series, which posts each Tuesday for the next 5 weeks. You can read the first post here: . Comments and feedback welcome!
0
1
6
@JacobSteinhardt
Jacob Steinhardt
2 years
Wow, this is great! Everyone should read Carol's piece if they want to understand transformer inference costs.
@kipperrii
kipply
2 years
transformer inference performance is becoming increasingly important and there's not as much lore on it, so here is a lot of lore that i think fully models llm inference performance
6
64
486
0
1
6
@JacobSteinhardt
Jacob Steinhardt
4 years
While there is a real R (avg # of infections per source), we can't measure this without the infection graph, which few regions have. Instead the "R" we talk about is a model parameter that we’re imputing under lots of assumptions about generation time, infection dynamics, etc.
1
1
5
@JacobSteinhardt
Jacob Steinhardt
1 year
This is a very thoughtful article by @_achan96_ that I enjoyed reading!
@_achan96_
Alan Chan
1 year
There's been a lot of controversy about the CAIS statement on extinction risk from AI, so let's talk about it! I wrote a post with some of my detailed thoughts on objections to the statement.
2
14
25
0
0
4
@JacobSteinhardt
Jacob Steinhardt
1 year
A very creative and thought-provoking read by @DanHendrycks
@DanHendrycks
Dan Hendrycks
1 year
As AI systems become more useful, people will delegate greater authority to them across more tasks. AIs are evolving in an increasingly frenzied and uncontrolled manner. This carries risks as natural selection favors AIs over humans. Paper: (🧵 below)
Tweet media one
Tweet media two
17
48
241
0
0
5
@JacobSteinhardt
Jacob Steinhardt
3 years
Blog post up on Bounded Regret for those who want to learn more!
@FrancesDing
Frances Ding
3 years
Papers often propose a similarity metric and justify it with intuitive desiderata, but different intuitive tests can make any method look good. Our work (joint with Jean-Stanislas Denain and @JacobSteinhardt) provides a quantitative benchmark for evaluating similarity metrics 4/7
1
0
2
0
0
5
@JacobSteinhardt
Jacob Steinhardt
10 months
I am very interested in discussion and feedback on these scenarios. Debating them has shaped my overall view of catastrophic risks from AI (both overall probability and relative likelihood of different paths), and I expect further discussion to continue to do so.
1
1
5
@JacobSteinhardt
Jacob Steinhardt
3 years
What changed? Ironically, GPT-3. GPT-3 showed that new qualitative capabilities (like in-context learning) can emerge without warning. Despite being a huge engineering accomplishment, GPT-3 showed the limits of the Engineering mindset for predicting the future.
1
1
5