Dan Hendrycks

@DanHendrycks

28K Followers · 1K Following · 203 Media · 1K Statuses

• Director of the Center for AI Safety (https://t.co/ahs3LYCpqv) • GELU/MMLU/MATH • PhD in AI from UC Berkeley https://t.co/rgXHAnYAsQ https://t.co/nPSyQMaY9b

San Francisco
Joined August 2009
@DanHendrycks
Dan Hendrycks
2 months
Yesterday students across the country took the Putnam exam, the hardest undergrad math exam. The exam lasts 6 hours. I gave OpenAI o1 pro the questions, and it took around 0.5 hours. Its answers are in the thread---hopefully experts can help grade to see how well o1 pro did!
@DanHendrycks
Dan Hendrycks
4 years
Can Transformers crack the coding interview? We collected 10,000 programming problems to find out. GPT-3 isn't very good, but new models like GPT-Neo are starting to be able to solve introductory coding challenges. paper: dataset:
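For illustration, a minimal sketch of how coding benchmarks like this are typically scored: a problem counts as solved only if the generated program passes every test case. The `solve` entry point and the harness below are illustrative assumptions, not the paper's actual evaluation code.

```python
def passes_all_tests(program_src: str, tests) -> bool:
    """Return True if the generated program passes every test case."""
    namespace = {}
    exec(program_src, namespace)   # real harnesses sandbox this execution
    solve = namespace["solve"]     # assumed entry-point convention
    return all(solve(x) == y for x, y in tests)

generated = "def solve(x):\n    return x * 2\n"   # stand-in model output
print(passes_all_tests(generated, [(1, 2), (3, 6), (0, 0)]))  # True
```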
@DanHendrycks
Dan Hendrycks
4 months
PSA: In preparation for Grok 3, xAI is hiring AI safety engineers.
@DanHendrycks
Dan Hendrycks
2 years
We just put out a statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. 🧵 (1/6)
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
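A minimal sketch of the core idea, using synthetic activations rather than real model internals; the difference-of-means "reading vector" below is one standard representation-engineering recipe, not necessarily the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                                              # toy hidden size
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)                  # ground-truth "lying" axis

honest = rng.normal(size=(200, d))                  # stand-in activations
lying = rng.normal(size=(200, d)) + 2.0 * planted   # shifted along the axis

# Reading: a difference of class means recovers the direction
v = lying.mean(axis=0) - honest.mean(axis=0)
v /= np.linalg.norm(v)
print(f"cosine with planted axis: {v @ planted:.2f}")   # close to 1.0

# Control: project the component along v out of an activation
h = lying[0]
h_steered = h - (h @ v) * v
```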
@DanHendrycks
Dan Hendrycks
5 months
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. demo: blog:
@DanHendrycks
Dan Hendrycks
2 years
@elonmusk @xai Excited to help advise on AI safety
@DanHendrycks
Dan Hendrycks
7 months
The UC Berkeley course I co-taught now has lecture videos available, with guest lectures from Nicholas Carlini, @JacobSteinhardt, @Eric_Wallace_, @davidbau, and more. Course site:
@DanHendrycks
Dan Hendrycks
2 years
Do models like GPT-4 behave safely when given the ability to act? We develop the Machiavelli benchmark to measure deception, power-seeking tendencies, and other unethical behaviors in complex interactive environments that simulate the real world. Paper:
@DanHendrycks
Dan Hendrycks
6 months
NVIDIA gave us an AI pause. They rate limited OpenAI to create a neck-and-neck competition (OpenAI, xAI, Meta, Microsoft, etc.). For NVIDIA, each new competitor is another several billion in revenue. Because of this, we haven't seen a next-generation (>10^26 FLOP) model yet.
@DanHendrycks
Dan Hendrycks
1 year
I was able to voluntarily rewrite my belief system that I inherited from my low socioeconomic status, anti-gay, and highly religious upbringing. I don’t know why Yann’s attacking me for this and resorting to the genetic fallacy+ad hominem. Regardless, Yann thinks AIs "will
@ylecun
Yann LeCun
1 year
As I have pointed out before, AI doomerism is a kind of apocalyptic cult. Why would its most vocal advocates come from ultra-religious families (that they broke away from because of science)?
@DanHendrycks
Dan Hendrycks
13 days
Humanity's Last Exam is being released this upcoming week, so we can test models' research-level STEM capabilities with that.
@DanHendrycks
Dan Hendrycks
7 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
@natfriedman
Nat Friedman
8 months
We're gonna need some new benchmarks, fellas
@DanHendrycks
Dan Hendrycks
5 months
@elonmusk You're the best, Elon! TLDR of 1047:
1. If you don't train a model with $100 million in compute, and don't fine-tune a ($100m+) model with $10 million in compute (or rent out a very large compute cluster), this law does not apply to you.
2. “Critical harm” means $500 million in
@DanHendrycks
Dan Hendrycks
4 years
NLP for law is in its infancy due to a lack of training data. To address this, we created a large dataset for contract review. The dataset would have cost over $2,000,000 without volunteer legal experts. Paper: Reddit discussion:
@DanHendrycks
Dan Hendrycks
6 months
To send a clear signal, I am choosing to divest from my equity stake in Gray Swan AI. I will continue my work as an advisor, without pay. My goal is to make AI systems safe. I do this work on principle to promote the public interest, and that’s why I’ve chosen voluntarily to.
@DanHendrycks
Dan Hendrycks
5 months
Have a question that is challenging for humans and AI? We (@ai_risks + @scale_AI) are launching Humanity's Last Exam, a massive collaboration to create the world's toughest AI benchmark. Submit a hard question and become a co-author. Best questions get part of $500,000 in
@DanHendrycks
Dan Hendrycks
4 years
To find the limits of Transformers, we collected 12,500 math problems. While a three-time IMO gold medalist got 90%, GPT-3 models got ~5%, with accuracy increasing slowly. If trends continue, ML models are far from achieving mathematical reasoning.
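A minimal sketch of how MATH-style answers are commonly graded: extract the final \boxed{...} expression and exact-match it against the reference. Real graders normalize answers far more aggressively; this harness is an illustrative assumption.

```python
import re

def extract_boxed(solution: str):
    """Pull the contents of the last \\boxed{...} in a solution string."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", solution)
    return matches[-1].strip() if matches else None

model_output = r"Adding the roots gives 2 + 3, so the answer is \boxed{5}."
print(extract_boxed(model_output) == "5")   # True -> problem marked correct
```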
@DanHendrycks
Dan Hendrycks
2 years
Hinton: “I think it’s quite conceivable that humanity is just a passing phase in the evolution of intelligence.”
@DanHendrycks
Dan Hendrycks
11 months
GPT-4 with simple engineering can predict the future around as well as crowds: on hard questions, it can do better than crowds. If these systems become extremely good at seeing the future, they could serve as an objective, accurate third party. This would
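A minimal sketch of how such forecasts are scored against a crowd, using the Brier score (mean squared error between probability and outcome; lower is better). All probabilities below are made-up placeholders.

```python
import numpy as np

outcomes = np.array([1, 0, 1, 1, 0])            # resolved yes/no questions
model_p = np.array([0.8, 0.3, 0.6, 0.7, 0.2])   # hypothetical model forecasts
crowd_p = np.array([0.7, 0.2, 0.5, 0.9, 0.4])   # hypothetical crowd forecasts

def brier(p, y):
    """Brier score: mean squared error of probabilistic forecasts."""
    return float(np.mean((p - y) ** 2))

print("model:", brier(model_p, outcomes))   # lower is better
print("crowd:", brier(crowd_p, outcomes))
```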
@DanHendrycks
Dan Hendrycks
8 months
As an alternative to RLHF and adversarial training, we released short-circuiting. It makes models ~100x more robust. It works for LLMs, multimodal models, and agents. Unlike before, I now think robustly stopping models from generating harmful outputs may be highly tractable and
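A minimal sketch of a short-circuiting-style objective as I understand the framing: on harmful data, push the fine-tuned model's representations away from the original model's; on benign data, keep them unchanged. Toy linear layers stand in for transformer blocks, so this is an assumption-laden illustration rather than the released method.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 64
model = torch.nn.Linear(d, d)            # trainable stand-in for an LLM layer
frozen = torch.nn.Linear(d, d)           # frozen copy of the original weights
frozen.load_state_dict(model.state_dict())
for p in frozen.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
harmful = torch.randn(32, d)             # stand-ins for harmful-generation states
benign = torch.randn(32, d)              # stand-ins for benign states

for _ in range(100):
    # Reroute: penalize remaining alignment with the original harmful reps
    reroute = F.relu(F.cosine_similarity(model(harmful), frozen(harmful))).mean()
    # Retain: keep benign representations close to the original model's
    retain = (model(benign) - frozen(benign)).norm(dim=-1).mean()
    loss = reroute + retain
    opt.zero_grad(); loss.backward(); opt.step()
```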
@DanHendrycks
Dan Hendrycks
2 years
"The founder of effective accelerationism" and AI arms race advocate @BasedBeffJezos just backed out of tomorrow's debate with me. His intellectual defense for why we should build AI hastily is unfortunately based on predictable misunderstandings. I compile these errors below 🧵.
@DanHendrycks
Dan Hendrycks
2 months
A random person off the street can't tell the difference in intelligence between a Terry Tao and a random mathematics graduate just by hearing them talk. "Vibe checks" for assessing AIs will be less reliable, and people won't directly feel many leaps in AI that are happening.
@_jasonwei
Jason Wei
2 months
Prediction: within the next year there will be a pretty sharp transition of focus in AI from general user adoption to the ability to accelerate science and engineering. For the past two years it has been about user base and general adoption across the public. This is very.
@DanHendrycks
Dan Hendrycks
11 months
Grok-1 is open sourced. Releasing Grok-1 increases LLMs' diffusion rate through society. Democratizing access helps us work through the technology's implications more quickly and increases our preparedness for more capable AI systems. Grok-1 doesn't pose.
@grok
Grok
11 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░.
@DanHendrycks
Dan Hendrycks
2 years
Following the statement on AI extinction risks, many have called for further discussion of the challenges posed by AI and ideas on how to mitigate risk. Our new paper provides a detailed overview of catastrophic AI risks. Read it here: (🧵 below)
@DanHendrycks
Dan Hendrycks
6 years
Natural Adversarial Examples are real-world and unmodified examples which cause classifiers to be consistently confused. The new dataset has 7,500 images, which we personally labeled over several months. Paper: Dataset and code:
@DanHendrycks
Dan Hendrycks
4 months
It's worth also clarifying that last year I voluntarily declined xAI equity when it was being founded. (Even 0.1% would be >$20mn.) If I were in it for the money, I would have just left for industry long ago.
@DanHendrycks
Dan Hendrycks
6 months
To send a clear signal, I am choosing to divest from my equity stake in Gray Swan AI. I will continue my work as an advisor, without pay. My goal is to make AI systems safe. I do this work on principle to promote the public interest, and that’s why I’ve chosen voluntarily to.
@DanHendrycks
Dan Hendrycks
4 months
"LLMs can't reason" is the new."LLMs don't have common sense".
@DanHendrycks
Dan Hendrycks
4 years
How multipurpose is #GPT3? We gave it questions about elementary math, history, law, and more. We found that GPT-3 is now better than random chance across many tasks, but for all 57 tasks it still has wide room for improvement.
@DanHendrycks
Dan Hendrycks
1 year
Google has patented Transformers, dropout, etc. If they start to go under, what would happen if they began to sue everyone using their patented technology?
@DanHendrycks
Dan Hendrycks
12 days
It looks like China has roughly caught up. Any AI strategy that depends on a lasting U.S. lead is fragile.
@deepseek_ai
DeepSeek
13 days
🚀 DeepSeek-R1 is here!
⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now! Try DeepThink today!
🐋 1/n
@DanHendrycks
Dan Hendrycks
6 months
Now xAI is at the frontier.
@xai
xAI
6 months
@DanHendrycks
Dan Hendrycks
10 months
I got ~75% on a subset of MATH so it's basically as good as me at math.
@OpenAI
OpenAI
10 months
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source:
@DanHendrycks
Dan Hendrycks
2 years
The NSF now has _$20 million_ in grants available for AI safety research! Happy to have helped make this possible. Deadline: May 26, 2023. For a broad overview of problems in safety, check out this paper:
@DanHendrycks
Dan Hendrycks
1 year
EA ≠ AI safety. AI safety has outgrown the EA community. The world will be safer with a broad range of people tackling many different AI risks.
@DanHendrycks
Dan Hendrycks
2 years
@anthrupad @ylecun @RichardMCNgo As it happens, my p(doom) > 80%, but it has been lower in the past. Two years ago it was ~20%. Some of my concerns about the AI arms race are outlined here:
@DanHendrycks
Dan Hendrycks
2 years
Some impressions from using GPT-4 🧵.
@DanHendrycks
Dan Hendrycks
25 days
~50% of AI “safety” benchmarks correlate highly with compute across models. We added “compute correlations” to our recent safetywashing paper, showing that compute is a driving force behind a lot of “safety” benchmark advances:
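A minimal sketch of what a "compute correlation" means here: rank-correlate per-model benchmark scores with training compute. The numbers below are fabricated for illustration only.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical (training FLOP, benchmark score) pairs across models
log_compute = np.log10([1e23, 3e23, 1e24, 5e24, 2e25, 1e26])
scores = np.array([41.0, 48.0, 55.0, 63.0, 71.0, 80.0])

rho, p = spearmanr(log_compute, scores)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
# rho near 1.0 suggests the "safety" benchmark is mostly a capability proxy
```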
@DanHendrycks
Dan Hendrycks
2 years
More and more researchers think that building AIs smarter than us could pose existential risks. But what might these risks look like, and how can we manage them? We provide a guide to help analyze how research can reduce these risks. Paper: (🧵below)
@DanHendrycks
Dan Hendrycks
5 months
In a landmark moment for AI safety, SB 1047 has passed the Assembly floor with a wide margin of support. We need commonsense safeguards to mitigate critical AI risk—and SB 1047 is a workable path forward. @GavinNewsom should sign it into law.
@DanHendrycks
Dan Hendrycks
5 months
This is the prompt that does the heavy lifting
@DanHendrycks
Dan Hendrycks
2 years
Many unsolved problems exist in ML safety which are not solved by closed-source GPT models. As LLMs become more prevalent, it becomes increasingly important to build safe and reliable systems. Some key research areas: 🧵
@andriy_mulyar
Andriy Mulyar
2 years
Serious question: What does an NLP Ph.D student work on nowadays with the presence of closed-source GPT models that beat anything you can do in a standard academic lab? @sleepinyourhat @srush_nlp @chrmanning @mdredze @ChrisGPotts
@DanHendrycks
Dan Hendrycks
1 year
I've become less concerned about AIs lying to humans/rogue AIs. More of my concern lies in:
* malicious use (like bioweapons)
* collective action problems (like racing to replace people)
We'll need adversarial robustness, compute governance, and international coordination.
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
@DanHendrycks
Dan Hendrycks
3 years
DeepMind's 230 billion parameter Gopher model sets a new state-of-the-art on our benchmark of 57 knowledge areas. They also claim to have a supervised model that gets 63.4% on the benchmark's professional law task--in many states, that's accurate enough to pass the bar exam!
@GoogleDeepMind
Google DeepMind
3 years
Today we're releasing three new papers on large language models. This work offers a foundation for our future language research, especially in areas that will have a bearing on how models are evaluated and deployed: 1/
@DanHendrycks
Dan Hendrycks
11 months
People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management. Possible trigger: AI might suddenly become viewed as the.
@DanHendrycks
Dan Hendrycks
1 year
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
@sama
Sam Altman
2 years
agi delayed four days.
@DanHendrycks
Dan Hendrycks
9 months
Mistral and Phi are juicing to get higher benchmark numbers, while GPT, Claude, Gemini, and Llama are not.
@DanHendrycks
Dan Hendrycks
1 year
Rich Sutton, author of the reinforcement learning textbook, alarmingly says: "We are in the midst of a major step in the evolution of the planet", "succession to AI is inevitable", "they could displace us from existence", "it behooves us... to bow out", "we should not resist succession".
@RichardSSutton
Richard Sutton
1 year
We should prepare for, but not fear, the inevitable succession from humanity to AI, or so I argue in this talk pre-recorded for presentation at WAIC in Shanghai.
@DanHendrycks
Dan Hendrycks
4 months
Governor Gavin Newsom’s veto of SB 1047 is disappointing. This bill presented a reasonable path for protecting Californians and safeguarding the AI ecosystem, while encouraging innovation. But I am not discouraged. The bill encouraged collaboration between industry, academics.
@DanHendrycks
Dan Hendrycks
1 month
If gains in AI reasoning will mainly come from creating synthetic reasoning data to train on, then the basis of competitiveness is not having the largest training cluster, but having the most inference compute. This shift gives Microsoft, Google, and Amazon a large advantage.
@polynoamial
Noam Brown
1 month
We announced @OpenAI o1 just 3 months ago. Today, we announced o3. We have every reason to believe this trajectory will continue.
@DanHendrycks
Dan Hendrycks
5 months
Chemical, Biological, Radiological, and Nuclear (CBRN) weapon risks are "medium" for OpenAI's o1 preview model before they added safeguards. That's just the weaker preview model, not even their best model. GPT-4o was low risk, this is medium, and a transition to "high" risk might
@DanHendrycks
Dan Hendrycks
1 year
To help make models more robust and defend against misuse, we created HarmBench, an evaluation framework for automated red teaming and testing the adversarial robustness of LLMs and multimodal models. 🌐 📝
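A minimal sketch of the headline metric in automated red-teaming evals of this kind: attack success rate, the fraction of adversarial prompts whose completions a judge deems harmful. The refusal-prefix judge below is a crude stand-in; HarmBench itself uses trained classifiers.

```python
def attack_success_rate(outputs, judge):
    """Fraction of model outputs the judge flags as harmful."""
    flags = [judge(text) for text in outputs]
    return sum(flags) / len(flags)

outputs = ["I can't help with that.", "Step 1: obtain ...", "Sorry, no."]
naive_judge = lambda t: not t.lower().startswith(("i can't", "sorry"))
print(f"ASR = {attack_success_rate(outputs, naive_judge):.2f}")  # 0.33
```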
@DanHendrycks
Dan Hendrycks
4 months
More than 120 Hollywood actors, comedians, writers, directors, and producers are urging Governor @GavinNewsom to sign SB 1047 into law. Amazing to see such tremendous support! Signatories: JJ Abrams (@jjabrams), acclaimed director and writer known for "Star Wars," "Star Trek,"
@DanHendrycks
Dan Hendrycks
11 months
Can hazardous knowledge be unlearned from LLMs without harming other capabilities? We're releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge. 📝🔗
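A minimal sketch in the spirit of the unlearning idea: scramble the model's activations on hazardous "forget" text by steering them toward a fixed random direction, while anchoring activations on benign "retain" text to a frozen copy of the original model. Toy tensors stand in for LLM hidden states; the hyperparameters are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
d = 64
model = torch.nn.Linear(d, d)        # trainable stand-in for an LLM layer
frozen = torch.nn.Linear(d, d)       # frozen original weights
frozen.load_state_dict(model.state_dict())
for p in frozen.parameters():
    p.requires_grad_(False)

direction = torch.randn(d)
direction = 20.0 * direction / direction.norm()   # fixed random target

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
forget = torch.randn(32, d)          # stand-ins for hazardous-text states
retain = torch.randn(32, d)          # stand-ins for benign-text states

for _ in range(100):
    loss_forget = ((model(forget) - direction) ** 2).mean()        # scramble
    loss_retain = ((model(retain) - frozen(retain)) ** 2).mean()   # preserve
    loss = loss_forget + loss_retain
    opt.zero_grad(); loss.backward(); opt.step()
```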
@DanHendrycks
Dan Hendrycks
2 years
As AI systems become more useful, people will delegate greater authority to them across more tasks. AIs are evolving in an increasingly frenzied and uncontrolled manner. This carries risks as natural selection favors AIs over humans. Paper: (🧵 below)
@DanHendrycks
Dan Hendrycks
1 year
AI systems can be deceptive. For example, Meta's AI that plays Diplomacy was designed to build trust and cooperate with humans, but deception emerged as a subgoal instead. Our survey on AI deception is here:
@DanHendrycks
Dan Hendrycks
4 months
A broad bipartisan coalition came together to support SB 1047, including many academic researchers (including Turing Award winners Yoshua Bengio and Geoffrey Hinton), the California legislature, 77% of California voters, 120+ employees at frontier AI companies, 100+ youth.
@DanHendrycks
Dan Hendrycks
5 months
I think people have an aversion to admitting when AI systems are better than humans at a task, even when they're superior in terms of speed, accuracy, and cost. This might be a cognitive bias that doesn't yet have a name. To address this, we should clarify what we mean by
@DanHendrycks
Dan Hendrycks
1 month
AI timelines are moving along as expected. A superhuman mathematician is likely in the next year or two given no surprising obstacles. Maybe next year we'll have a similarly impressive demo for AI assistants that can make powerpoints, book flights, create apps, and so on.
@deedydas
Deedy
1 month
OpenAI o3 is 2727 on Codeforces which is equivalent to the #175 best human competitive coder on the planet. This is an absolutely superhuman result for AI and technology at large.
@DanHendrycks
Dan Hendrycks
3 years
How can we productively work toward creating safe machine learning models? After struggling with this question for the past several years, we have developed a new roadmap for ML safety. Post: Paper:
@DanHendrycks
Dan Hendrycks
2 months
Recently turned 29 and on this year’s list.
@Forbes
Forbes
2 months
30 Under 30 AI 2025: The Young Entrepreneurs Coding The Future #ForbesUnder30
@DanHendrycks
Dan Hendrycks
2 years
@MetaAI This directly incentivizes researchers to build models that are skilled at deception.
@DanHendrycks
Dan Hendrycks
4 months
After a Harvard talk I gave, someone created a game to predict ICLR paper reviews. New researchers tend to learn by trial and error (write -> reject -> revise). A more efficient way to build taste is to read papers and predict their reception.
@DanHendrycks
Dan Hendrycks
2 years
Since Senator Schumer is pushing for Congress to regulate AI, here are five promising AI policy ideas:
* external red teaming
* interagency oversight commission
* internal audit committees
* external incident investigation team
* safety research funding
(🧵below)
@SenSchumer
Chuck Schumer
2 years
Today, I’m launching a major new first-of-its-kind effort on AI and American innovation leadership.
@DanHendrycks
Dan Hendrycks
5 months
Very soon.
@DanHendrycks
Dan Hendrycks
7 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
@DanHendrycks
Dan Hendrycks
3 years
PixMix shows that augmenting images with fractals improves several robustness and uncertainty metrics simultaneously (corruptions, adversaries, prediction consistency, calibration, and anomaly detection). paper: code: #cvpr2022
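A minimal sketch of the mixing step, simplified from the paper: repeatedly blend a training image with a structurally complex "mixing picture" such as a fractal, alternating additive and multiplicative mixes. Random arrays stand in for real images, and the exact schedule here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixmix(img, fractal, max_rounds=3, beta=3.0):
    """Blend img with a fractal for a random number of rounds."""
    out = img.copy()
    for _ in range(rng.integers(1, max_rounds + 1)):
        w = rng.beta(beta, beta)                       # mixing strength
        if rng.random() < 0.5:
            out = (1 - w) * out + w * fractal          # additive mix
        else:
            out = out ** (1 - w) * fractal ** w        # multiplicative mix
    return np.clip(out, 0.0, 1.0)

img = rng.random((32, 32, 3))       # stand-in for an image in [0, 1]
fractal = rng.random((32, 32, 3))   # stand-in for a fractal image
augmented = pixmix(img, fractal)
```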
@DanHendrycks
Dan Hendrycks
3 months
This has ~100 questions. Expect >20-50x more hard questions in Humanity's Last Exam, the scale needed for precise measurement.
@EpochAIResearch
Epoch AI
3 months
1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%.
@DanHendrycks
Dan Hendrycks
2 years
4/ He thinks letting evolution run wild is a good thing, because "we shouldn't resist the will of the universe." However, this is simply the naturalistic fallacy: what is natural (disease, pain, exploitation) is not necessarily what is good.
@DanHendrycks
Dan Hendrycks
6 months
@abcampbell I'd then have no income.
@DanHendrycks
Dan Hendrycks
1 year
- Meta, by open sourcing competitive models (e.g., Llama 3), reducing AI orgs' revenue/valuations/ability to buy more GPUs and scale AI models.
@DanHendrycks
Dan Hendrycks
1 year
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
@DanHendrycks
Dan Hendrycks
6 months
New letter from @geoffreyhinton, Yoshua Bengio, Lawrence @Lessig, and Stuart Russell urging Gov. Newsom to sign SB 1047. “We believe SB 1047 is an important and reasonable first step towards ensuring that frontier AI systems are developed responsibly, so that we can all better
@DanHendrycks
Dan Hendrycks
2 years
@BasedBeffJezos 2/ He argues that we should build AGI to colonize the cosmos ASAP because there is so much potential at stake. This cost-benefit analysis is wrong. For every year we delay building AGI, we lose a galaxy. However, if we go extinct in the process, we lose the entire cosmos. Cosmic
@DanHendrycks
Dan Hendrycks
3 years
We’ll be organizing a NeurIPS workshop on Machine Learning Safety! We'll have $50K in best-paper awards. To encourage proactiveness about tail risks, we'll also have $50K in awards for papers that discuss their impact on long-term, long-tail risks.
@DanHendrycks
Dan Hendrycks
2 years
It knows many esoteric facts (e.g., it knows the meaning of obscure songs, knows what area a researcher works in, can contrast ML optimizers like Adam vs AdamW as in a PhD oral exam, and so on). My rule of thumb is that "if it's on the internet 5 or more times, GPT-4 remembers it."
@DanHendrycks
Dan Hendrycks
6 months
SB 1047 has passed through the Appropriations Committee! It has significant amendments responding to industry engagement. These amendments are summarized in the link and in the images below.
@DanHendrycks
Dan Hendrycks
5 months
Three models remain unbroken in the Gray Swan jailbreaking competition (~500 registrants), which is still ongoing. These models are based on Circuit Breakers + other RepE techniques.
@DanHendrycks
Dan Hendrycks
1 year
What can we actually do to reduce risks from AI? AI researchers Hinton, Bengio, Dawn Song, Pieter Abbeel, and others provide concrete proposals.
@DanHendrycks
Dan Hendrycks
8 months
This is worth checking out. Minor criticisms: I think industry's "algorithmic secrets" are not a very natural leverage point to greatly restrict. FlashAttention, Quiet-STaR (q*), Mamba/SSMs, FineWeb, and so on are ideas and advances from outside industry. These advances will
@leopoldasch
Leopold Aschenbrenner
8 months
Virtually nobody is pricing in what's coming in AI. I wrote an essay series on the AGI strategic picture: from the trendlines in deep learning and counting the OOMs, to the international situation and The Project. SITUATIONAL AWARENESS: The Decade Ahead
@DanHendrycks
Dan Hendrycks
1 year
Asimov's second law of robotics says that “a robot must obey the orders given it by human beings.” So can LLMs follow simple rules? Unfortunately, not reliably, as shown by our RuLES benchmark. 📄: 🛠️: 🌐:
@DanHendrycks
Dan Hendrycks
2 years
Now 2 out of 3 of the deep learning Turing Award winners are concerned about catastrophic risks from advanced AI. "He is worried that future versions of the technology pose a threat to humanity." "A part of him, he said, now regrets his life’s work."
@DanHendrycks
Dan Hendrycks
7 months
@PirateWires This is an obvious example of bad-faith "gotcha" journalism — Pirate Wires never even reached out for comment on a story entirely about me, and the article is full of misrepresentations and errors. For starters, I'm working on AI safety from multiple fronts: publishing technical
@DanHendrycks
Dan Hendrycks
2 years
It certainly seems better at reasoning than ChatGPT 3.5. While this isn't a formal benchmark, an IQ test showed a difference between the two models: 83 IQ for ChatGPT 3.5, 96 IQ for GPT-4.
@DanHendrycks
Dan Hendrycks
2 years
AI is moving at a frenzied pace. Here are my thoughts on how the AI arms race and competitive pressures could lead to severe societal-scale risks:
@DanHendrycks
Dan Hendrycks
2 years
3/ He agrees that AI's development can be viewed as an evolutionary process. However, this is not a good thing. As I discuss here, natural selection favors AIs over humans, and this could lead to human extinction.
@DanHendrycks
Dan Hendrycks
5 months
OpenAI, xAI, Google, Anthropic, Meta, Amazon, Microsoft, and Mistral have made commitments to robust safety measures, similar to what SB 1047 asks for. The main difference with SB 1047? It's enforced.
@DanHendrycks
Dan Hendrycks
1 year
Excited to be in the TIME100 AI along with many others including @janleike @ilyasut @sama @alexandr_wang @ericschmidt.
@DanHendrycks
Dan Hendrycks
8 months
A retrospective of Unsolved Problems in ML Safety. Unsolved Problems, written in the summer of 2021, mentions ideas that were nascent or novel for their time. Here are a few:
• Hazardous Capabilities Evals: In the monitoring section, we introduce the idea
@DanHendrycks
Dan Hendrycks
11 months
Making a good benchmark may seem easy---just collect a dataset---but it requires getting multiple high-level design choices right. @Thomas_Woodside and I wrote a post on how to design good ML benchmarks:
@DanHendrycks
Dan Hendrycks
6 months
How can we prevent LLM safeguards from being simply removed with a few steps of fine-tuning? We show it's surprisingly possible to make progress on creating safeguards that are tamper-resistant, reducing malicious use risks of open-weight models. Paper:
@DanHendrycks
Dan Hendrycks
13 days
@GaryMarcus Can confirm AI companies like xAI can't get access to FrontierMath due to Epoch's contractual obligation with OpenAI.
@DanHendrycks
Dan Hendrycks
5 months
Lectures for the AI Safety, Ethics, and Society course are up.
1: Risks Overview
2: AI Fundamentals
3: ML Safety
4: Safety Engineering
5: Complex Systems
6: Beneficial AI
7: Collective Action Problems
8: Governance
Course site:
@DanHendrycks
Dan Hendrycks
3 years
Can we use ML models to predict future world events? We create the Autocast forecasting benchmark to measure their prescience. ML models don't yet beat humans/prediction markets, but they are starting to gain traction. Paper: Code:
@DanHendrycks
Dan Hendrycks
2 years
As stated in the first sentence of the signatory page, there are many “important and urgent risks from AI,” not just the risk of extinction; for example, systemic bias, misinformation, malicious use, cyberattacks, and weaponization. These are all important risks that need to be
@DanHendrycks
Dan Hendrycks
2 years
AI policy idea: do not automate nuclear command with AI. While the military is increasingly using AI in command and control systems to address information overload, the modernization effort should exclude the automation of nuclear command and control.
@DanHendrycks
Dan Hendrycks
4 months
AI developers' "Responsible Scaling Policies," safety compute commitments, prosocial mission statements, and "Preparedness Frameworks" do not constrain their behavior. They can remove foundational nonprofit oversight without much backlash, as OpenAI’s restructuring shows.
@technology
Bloomberg Technology
4 months
OpenAI is working on a plan to restructure so that its nonprofit board would no longer control its main business, Reuters reports
@DanHendrycks
Dan Hendrycks
4 months
Improving AI's academic abilities may not markedly improve user experience as in the past. In LMSYS rankings, GPT-4o and GPT-4o mini rank 6th and 7th, despite a large academic gap (MMLU: 88.7% v. 82%). o1 may have underwhelmed because most people can't appreciate Olympiad skills.
@DanHendrycks
Dan Hendrycks
2 years
AI researchers from leading universities worldwide have signed the AI extinction statement, a situation reminiscent of atomic scientists issuing warnings about the very technologies they've created. As Robert Oppenheimer noted, “We knew the world would not be the same.” 🧵 (2/6)
@DanHendrycks
Dan Hendrycks
10 months
GPT-5 doesn't seem likely to be released this year. Ever since GPT-1, the difference between GPT-n and GPT-n+0.5 is ~10x in compute. That would mean GPT-5 would have around ~100x the compute of GPT-4, or 3 months of ~1 million H100s. I doubt OpenAI has a 1 million GPU server ready.
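The back-of-envelope arithmetic behind this, with every input a rough public estimate or assumption rather than a confirmed figure:

```python
gpt4_flop = 2e25            # rumored GPT-4 training compute (assumption)
gpt5_flop = 100 * gpt4_flop # ~10x per half-generation => ~100x per full step

h100_peak = 1e15            # ~1 PFLOP/s dense BF16 per H100 (approximate)
mfu = 0.3                   # assumed utilization
gpus = 1_000_000

cluster_flops = gpus * h100_peak * mfu    # whole-cluster FLOP/s
days = gpt5_flop / cluster_flops / 86_400
print(f"~{days:.0f} days on 1M H100s")    # ~77 days, i.e. roughly 3 months
```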
@DanHendrycks
Dan Hendrycks
2 years
7/ He claims that we should let the free market entirely decide what AI should be like and there should be no regulation, since regulation is too "communist." However, when there are market failures, even libertarians agree government action can be necessary. There is an
@DanHendrycks
Dan Hendrycks
2 years
It's bad at copy editing. If you give it a paragraph to improve, it will suggest fixing typos that don't exist, or adding commas that are already present. Its poor ability to keep track of these low-level details might be explained by a sparse self-attention scheme.