Dan Hendrycks

@DanHendrycks

28K Followers · 1K Following · 203 Media · 1K Statuses

• Director of the Center for AI Safety (https://t.co/ahs3LYCpqv) • GELU/MMLU/MATH • PhD in AI from UC Berkeley https://t.co/rgXHAnYAsQ https://t.co/nPSyQMaY9b

San Francisco
Joined August 2009
@DanHendrycks
Dan Hendrycks
2 months
Yesterday students across the country took the Putnam exam, the hardest undergrad math exam. The exam lasts 6 hours. I gave OpenAI o1 pro the questions, and it took around 0.5 hours. Its answers are in the thread---hopefully experts can help grade to see how well o1 pro did!
@DanHendrycks
Dan Hendrycks
4 years
Can Transformers crack the coding interview? We collected 10,000 programming problems to find out. GPT-3 isn't very good, but new models like GPT-Neo are starting to be able to solve introductory coding challenges. paper: dataset:
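For illustration, a minimal sketch of how coding benchmarks like this are typically scored: a problem counts as solved only if the generated program passes every test case. The `solve` entry point and the harness below are illustrative assumptions, not the paper's actual evaluation code.

```python
def passes_all_tests(program_src: str, tests) -> bool:
    """Return True if the generated program passes every test case."""
    namespace = {}
    exec(program_src, namespace)   # real harnesses sandbox this execution
    solve = namespace["solve"]     # assumed entry-point convention
    return all(solve(x) == y for x, y in tests)

generated = "def solve(x):\n    return x * 2\n"   # stand-in model output
print(passes_all_tests(generated, [(1, 2), (3, 6), (0, 0)]))  # True
```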
@DanHendrycks
Dan Hendrycks
4 months
PSA: In preparation for Grok 3, xAI is hiring AI safety engineers.
@DanHendrycks
Dan Hendrycks
2 years
We just put out a statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. 🧵 (1/6)
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
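A minimal sketch of the core idea, using synthetic activations rather than real model internals; the difference-of-means "reading vector" below is one standard representation-engineering recipe, not necessarily the paper's exact method.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                                              # toy hidden size
planted = rng.normal(size=d)
planted /= np.linalg.norm(planted)                  # ground-truth "lying" axis

honest = rng.normal(size=(200, d))                  # stand-in activations
lying = rng.normal(size=(200, d)) + 2.0 * planted   # shifted along the axis

# Reading: a difference of class means recovers the direction
v = lying.mean(axis=0) - honest.mean(axis=0)
v /= np.linalg.norm(v)
print(f"cosine with planted axis: {v @ planted:.2f}")   # close to 1.0

# Control: project the component along v out of an activation
h = lying[0]
h_steered = h - (h @ v) * v
```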
@DanHendrycks
Dan Hendrycks
5 months
We've created a demo of an AI that can predict the future at a superhuman level (on par with groups of human forecasters working together). Consequently I think AI forecasters will soon automate most prediction markets. demo: blog:
@DanHendrycks
Dan Hendrycks
2 years
@elonmusk @xai Excited to help advise on AI safety
@DanHendrycks
Dan Hendrycks
7 months
The UC Berkeley course I co-taught now has lecture videos available, with guest lectures from Nicholas Carlini, @JacobSteinhardt, @Eric_Wallace_, @davidbau, and more. Course site:
@DanHendrycks
Dan Hendrycks
2 years
Do models like GPT-4 behave safely when given the ability to act? We develop the Machiavelli benchmark to measure deception, power-seeking tendencies, and other unethical behaviors in complex interactive environments that simulate the real world. Paper:
@DanHendrycks
Dan Hendrycks
6 months
NVIDIA gave us an AI pause. They rate limited OpenAI to create a neck-and-neck competition (OpenAI, xAI, Meta, Microsoft, etc.). For NVIDIA, each new competitor is another several billion in revenue. Because of this, we haven't seen a next-generation (>10^26 FLOP) model yet.
@DanHendrycks
Dan Hendrycks
1 year
I was able to voluntarily rewrite my belief system that I inherited from my low socioeconomic status, anti-gay, and highly religious upbringing. I don’t know why Yann’s attacking me for this and resorting to the genetic fallacy+ad hominem. Regardless, Yann thinks AIs "will
@ylecun
Yann LeCun
1 year
As I have pointed out before, AI doomerism is a kind of apocalyptic cult. Why would its most vocal advocates come from ultra-religious families (that they broke away from because of science)?
@DanHendrycks
Dan Hendrycks
13 days
Humanity's Last Exam is being released this upcoming week, so we can test models' research-level STEM capabilities with that.
@DanHendrycks
Dan Hendrycks
7 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
@natfriedman
Nat Friedman
8 months
We're gonna need some new benchmarks, fellas
@DanHendrycks
Dan Hendrycks
5 months
@elonmusk You're the best, Elon! TLDR of 1047:
1. If you don't train a model with $100 million in compute, and don't fine-tune a ($100m+) model with $10 million in compute (or rent out a very large compute cluster), this law does not apply to you.
2. “Critical harm” means $500 million in
@DanHendrycks
Dan Hendrycks
4 years
NLP for law is in its infancy due to a lack of training data. To address this, we created a large dataset for contract review. The dataset would have cost over $2,000,000 without volunteer legal experts. Paper: Reddit discussion:
@DanHendrycks
Dan Hendrycks
6 months
To send a clear signal, I am choosing to divest from my equity stake in Gray Swan AI. I will continue my work as an advisor, without pay. My goal is to make AI systems safe. I do this work on principle to promote the public interest, and that’s why I’ve chosen voluntarily to.
@DanHendrycks
Dan Hendrycks
5 months
Have a question that is challenging for humans and AI? We (@ai_risks + @scale_AI) are launching Humanity's Last Exam, a massive collaboration to create the world's toughest AI benchmark. Submit a hard question and become a co-author. Best questions get part of $500,000 in
@DanHendrycks
Dan Hendrycks
4 years
To find the limits of Transformers, we collected 12,500 math problems. While a three-time IMO gold medalist got 90%, GPT-3 models got ~5%, with accuracy increasing slowly. If trends continue, ML models are far from achieving mathematical reasoning.
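A minimal sketch of how MATH-style answers are commonly graded: extract the final \boxed{...} expression and exact-match it against the reference. Real graders normalize answers far more aggressively; this harness is an illustrative assumption.

```python
import re

def extract_boxed(solution: str):
    """Pull the contents of the last \\boxed{...} in a solution string."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", solution)
    return matches[-1].strip() if matches else None

model_output = r"Adding the roots gives 2 + 3, so the answer is \boxed{5}."
print(extract_boxed(model_output) == "5")   # True -> problem marked correct
```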
@DanHendrycks
Dan Hendrycks
2 years
Hinton: “I think it’s quite conceivable that humanity is just a passing phase in the evolution of intelligence.”
@DanHendrycks
Dan Hendrycks
11 months
GPT-4 with simple engineering can predict the future around as well as crowds: on hard questions, it can do better than crowds. If these systems become extremely good at seeing the future, they could serve as an objective, accurate third party. This would
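A minimal sketch of how such forecasts are scored against a crowd, using the Brier score (mean squared error between probability and outcome; lower is better). All probabilities below are made-up placeholders.

```python
import numpy as np

outcomes = np.array([1, 0, 1, 1, 0])            # resolved yes/no questions
model_p = np.array([0.8, 0.3, 0.6, 0.7, 0.2])   # hypothetical model forecasts
crowd_p = np.array([0.7, 0.2, 0.5, 0.9, 0.4])   # hypothetical crowd forecasts

def brier(p, y):
    """Brier score: mean squared error of probabilistic forecasts."""
    return float(np.mean((p - y) ** 2))

print("model:", brier(model_p, outcomes))   # lower is better
print("crowd:", brier(crowd_p, outcomes))
```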
@DanHendrycks
Dan Hendrycks
8 months
As an alternative to RLHF and adversarial training, we released short-circuiting. It makes models ~100x more robust. It works for LLMs, multimodal models, and agents. Unlike before, I now think robustly stopping models from generating harmful outputs may be highly tractable and
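A minimal sketch of a short-circuiting-style objective as I understand the framing: on harmful data, push the fine-tuned model's representations away from the original model's; on benign data, keep them unchanged. Toy linear layers stand in for transformer blocks, so this is an assumption-laden illustration rather than the released method.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d = 64
model = torch.nn.Linear(d, d)            # trainable stand-in for an LLM layer
frozen = torch.nn.Linear(d, d)           # frozen copy of the original weights
frozen.load_state_dict(model.state_dict())
for p in frozen.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
harmful = torch.randn(32, d)             # stand-ins for harmful-generation states
benign = torch.randn(32, d)              # stand-ins for benign states

for _ in range(100):
    # Reroute: penalize remaining alignment with the original harmful reps
    reroute = F.relu(F.cosine_similarity(model(harmful), frozen(harmful))).mean()
    # Retain: keep benign representations close to the original model's
    retain = (model(benign) - frozen(benign)).norm(dim=-1).mean()
    loss = reroute + retain
    opt.zero_grad(); loss.backward(); opt.step()
```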
@DanHendrycks
Dan Hendrycks
2 years
"The founder of effective accelerationism" and AI arms race advocate @BasedBeffJezos just backed out of tomorrow's debate with me. His intellectual defense for why we should build AI hastily is unfortunately based on predictable misunderstandings. I compile these errors below 🧵.
@DanHendrycks
Dan Hendrycks
2 months
A random person off the street can't tell the difference in intelligence between a Terry Tao and a random mathematics graduate just by hearing them talk. "Vibe checks" for assessing AIs will be less reliable, and people won't directly feel many leaps in AI that are happening.
@_jasonwei
Jason Wei
2 months
Prediction: within the next year there will be a pretty sharp transition of focus in AI from general user adoption to the ability to accelerate science and engineering. For the past two years it has been about user base and general adoption across the public. This is very.
@DanHendrycks
Dan Hendrycks
11 months
Grok-1 is open sourced. Releasing Grok-1 increases LLMs' diffusion rate through society. Democratizing access helps us work through the technology's implications more quickly and increases our preparedness for more capable AI systems. Grok-1 doesn't pose.
@grok
Grok
11 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░.
@DanHendrycks
Dan Hendrycks
2 years
Following the statement on AI extinction risks, many have called for further discussion of the challenges posed by AI and ideas on how to mitigate risk. Our new paper provides a detailed overview of catastrophic AI risks. Read it here: (🧵 below)
@DanHendrycks
Dan Hendrycks
6 years
Natural Adversarial Examples are real-world and unmodified examples which cause classifiers to be consistently confused. The new dataset has 7,500 images, which we personally labeled over several months. Paper: Dataset and code:
@DanHendrycks
Dan Hendrycks
4 months
It's worth also clarifying that last year I voluntarily declined xAI equity when it was being founded. (Even 0.1% would be >$20mn.) If I were in it for the money, I would have just left for industry long ago.
@DanHendrycks
Dan Hendrycks
6 months
To send a clear signal, I am choosing to divest from my equity stake in Gray Swan AI. I will continue my work as an advisor, without pay. My goal is to make AI systems safe. I do this work on principle to promote the public interest, and that’s why I’ve chosen voluntarily to.
@DanHendrycks
Dan Hendrycks
4 months
"LLMs can't reason" is the new."LLMs don't have common sense".
@DanHendrycks
Dan Hendrycks
4 years
How multipurpose is #GPT3? We gave it questions about elementary math, history, law, and more. We found that GPT-3 is now better than random chance across many tasks, but for all 57 tasks it still has wide room for improvement.
@DanHendrycks
Dan Hendrycks
1 year
Google has patented Transformers, dropout, etc. If they start to go under, what would happen if they began to sue everyone using their patented technology?
@DanHendrycks
Dan Hendrycks
12 days
It looks like China has roughly caught up. Any AI strategy that depends on a lasting U.S. lead is fragile.
@deepseek_ai
DeepSeek
13 days
🚀 DeepSeek-R1 is here!
⚡ Performance on par with OpenAI-o1
📖 Fully open-source model & technical report
🏆 MIT licensed: Distill & commercialize freely!
🌐 Website & API are live now! Try DeepThink today!
🐋 1/n
@DanHendrycks
Dan Hendrycks
6 months
Now xAI is at the frontier.
@xai
xAI
6 months
@DanHendrycks
Dan Hendrycks
10 months
I got ~75% on a subset of MATH so it's basically as good as me at math.
@OpenAI
OpenAI
10 months
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source:
@DanHendrycks
Dan Hendrycks
2 years
The NSF now has _$20 million_ in grants available for AI safety research! Happy to have helped make this possible. Deadline: May 26, 2023. For a broad overview of problems in safety, check out this paper:
@DanHendrycks
Dan Hendrycks
1 year
EA ≠ AI safety. AI safety has outgrown the EA community. The world will be safer with a broad range of people tackling many different AI risks.
@DanHendrycks
Dan Hendrycks
2 years
@anthrupad @ylecun @RichardMCNgo As it happens, my p(doom) > 80%, but it has been lower in the past. Two years ago it was ~20%. Some of my concerns about the AI arms race are outlined here:
@DanHendrycks
Dan Hendrycks
2 years
Some impressions from using GPT-4 🧵.
@DanHendrycks
Dan Hendrycks
25 days
~50% of AI “safety” benchmarks correlate highly with compute across models. We added “compute correlations” to our recent safetywashing paper, showing that compute is a driving force behind a lot of “safety” benchmark advances:
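A minimal sketch of what a "compute correlation" means here: rank-correlate per-model benchmark scores with training compute. The numbers below are fabricated for illustration only.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical (training FLOP, benchmark score) pairs across models
log_compute = np.log10([1e23, 3e23, 1e24, 5e24, 2e25, 1e26])
scores = np.array([41.0, 48.0, 55.0, 63.0, 71.0, 80.0])

rho, p = spearmanr(log_compute, scores)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
# rho near 1.0 suggests the "safety" benchmark is mostly a capability proxy
```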
@DanHendrycks
Dan Hendrycks
2 years
More and more researchers think that building AIs smarter than us could pose existential risks. But what might these risks look like, and how can we manage them? We provide a guide to help analyze how research can reduce these risks. Paper: (🧵below)
@DanHendrycks
Dan Hendrycks
5 months
In a landmark moment for AI safety, SB 1047 has passed the Assembly floor with a wide margin of support. We need commonsense safeguards to mitigate critical AI risk—and SB 1047 is a workable path forward. @GavinNewsom should sign it into law.
@DanHendrycks
Dan Hendrycks
5 months
This is the prompt that does the heavy lifting
@DanHendrycks
Dan Hendrycks
2 years
Many unsolved problems exist in ML safety which are not solved by closed-source GPT models. As LLMs become more prevalent, it becomes increasingly important to build safe and reliable systems. Some key research areas: 🧵
@andriy_mulyar
Andriy Mulyar
2 years
Serious question: What does an NLP Ph.D student work on nowadays with the presence of closed-source GPT models that beat anything you can do in a standard academic lab? @sleepinyourhat @srush_nlp @chrmanning @mdredze @ChrisGPotts
@DanHendrycks
Dan Hendrycks
1 year
I've become less concerned about AIs lying to humans/rogue AIs. More of my concern lies in:
* malicious use (like bioweapons)
* collective action problems (like racing to replace people)
We'll need adversarial robustness, compute governance, and international coordination.
@DanHendrycks
Dan Hendrycks
1 year
AI models are not just black boxes or giant inscrutable matrices. We discover they have interpretable internal representations, and we control these to influence hallucinations, bias, harmfulness, and whether an LLM lies. 🌐: 📄:
@DanHendrycks
Dan Hendrycks
3 years
DeepMind's 230 billion parameter Gopher model sets a new state-of-the-art on our benchmark of 57 knowledge areas. They also claim to have a supervised model that gets 63.4% on the benchmark's professional law task--in many states, that's accurate enough to pass the bar exam!
@GoogleDeepMind
Google DeepMind
3 years
Today we're releasing three new papers on large language models. This work offers a foundation for our future language research, especially in areas that will have a bearing on how models are evaluated and deployed: 1/
@DanHendrycks
Dan Hendrycks
11 months
People aren't thinking through the implications of the military controlling AI development. It's plausible AI companies won't be shaping AI development in a few years, and that would dramatically change AI risk management. Possible trigger: AI might suddenly become viewed as the.
@DanHendrycks
Dan Hendrycks
1 year
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
@sama
Sam Altman
2 years
agi delayed four days.
@DanHendrycks
Dan Hendrycks
9 months
Mistral and Phi are juicing to get higher benchmark numbers, while GPT, Claude, Gemini, and Llama are not.
@DanHendrycks
Dan Hendrycks
1 year
Rich Sutton, author of the reinforcement learning textbook, alarmingly says: "We are in the midst of a major step in the evolution of the planet", "succession to AI is inevitable", "they could displace us from existence", "it behooves us... to bow out", "we should not resist succession".
@RichardSSutton
Richard Sutton
1 year
We should prepare for, but not fear, the inevitable succession from humanity to AI, or so I argue in this talk pre-recorded for presentation at WAIC in Shanghai.
@DanHendrycks
Dan Hendrycks
4 months
Governor Gavin Newsom’s veto of SB 1047 is disappointing. This bill presented a reasonable path for protecting Californians and safeguarding the AI ecosystem, while encouraging innovation. But I am not discouraged. The bill encouraged collaboration between industry, academics.
@DanHendrycks
Dan Hendrycks
1 month
If gains in AI reasoning will mainly come from creating synthetic reasoning data to train on, then the basis of competitiveness is not having the largest training cluster, but having the most inference compute. This shift gives Microsoft, Google, and Amazon a large advantage.
@polynoamial
Noam Brown
1 month
We announced @OpenAI o1 just 3 months ago. Today, we announced o3. We have every reason to believe this trajectory will continue.
@DanHendrycks
Dan Hendrycks
5 months
Chemical, Biological, Radiological, and Nuclear (CBRN) weapon risks are "medium" for OpenAI's o1 preview model before they added safeguards. That's just the weaker preview model, not even their best model. GPT-4o was low risk, this is medium, and a transition to "high" risk might
@DanHendrycks
Dan Hendrycks
1 year
To help make models more robust and defend against misuse, we created HarmBench, an evaluation framework for automated red teaming and testing the adversarial robustness of LLMs and multimodal models. 🌐 📝
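A minimal sketch of the headline metric in automated red-teaming evals of this kind: attack success rate, the fraction of adversarial prompts whose completions a judge deems harmful. The refusal-prefix judge below is a crude stand-in; HarmBench itself uses trained classifiers.

```python
def attack_success_rate(outputs, judge):
    """Fraction of model outputs the judge flags as harmful."""
    flags = [judge(text) for text in outputs]
    return sum(flags) / len(flags)

outputs = ["I can't help with that.", "Step 1: obtain ...", "Sorry, no."]
naive_judge = lambda t: not t.lower().startswith(("i can't", "sorry"))
print(f"ASR = {attack_success_rate(outputs, naive_judge):.2f}")  # 0.33
```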
@DanHendrycks
Dan Hendrycks
4 months
More than 120 Hollywood actors, comedians, writers, directors, and producers are urging Governor @GavinNewsom to sign SB 1047 into law. Amazing to see such tremendous support! Signatories: JJ Abrams (@jjabrams), acclaimed director and writer known for "Star Wars," "Star Trek,"
@DanHendrycks
Dan Hendrycks
11 months
Can hazardous knowledge be unlearned from LLMs without harming other capabilities? We're releasing the Weapons of Mass Destruction Proxy (WMDP), a dataset about weaponization, and we create a way to unlearn this knowledge. 📝🔗
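A minimal sketch in the spirit of the unlearning idea: scramble the model's activations on hazardous "forget" text by steering them toward a fixed random direction, while anchoring activations on benign "retain" text to a frozen copy of the original model. Toy tensors stand in for LLM hidden states; the hyperparameters are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
d = 64
model = torch.nn.Linear(d, d)        # trainable stand-in for an LLM layer
frozen = torch.nn.Linear(d, d)       # frozen original weights
frozen.load_state_dict(model.state_dict())
for p in frozen.parameters():
    p.requires_grad_(False)

direction = torch.randn(d)
direction = 20.0 * direction / direction.norm()   # fixed random target

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
forget = torch.randn(32, d)          # stand-ins for hazardous-text states
retain = torch.randn(32, d)          # stand-ins for benign-text states

for _ in range(100):
    loss_forget = ((model(forget) - direction) ** 2).mean()        # scramble
    loss_retain = ((model(retain) - frozen(retain)) ** 2).mean()   # preserve
    loss = loss_forget + loss_retain
    opt.zero_grad(); loss.backward(); opt.step()
```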
@DanHendrycks
Dan Hendrycks
2 years
As AI systems become more useful, people will delegate greater authority to them across more tasks. AIs are evolving in an increasingly frenzied and uncontrolled manner. This carries risks as natural selection favors AIs over humans. Paper: (🧵 below)
@DanHendrycks
Dan Hendrycks
1 year
AI systems can be deceptive. For example, Meta's AI that plays Diplomacy was designed to build trust and cooperate with humans, but deception emerged as a subgoal instead. Our survey on AI deception is here:
@DanHendrycks
Dan Hendrycks
4 months
A broad bipartisan coalition came together to support SB 1047, including many academic researchers (including Turing Award winners Yoshua Bengio and Geoffrey Hinton), the California legislature, 77% of California voters, 120+ employees at frontier AI companies, 100+ youth.
@DanHendrycks
Dan Hendrycks
5 months
I think people have an aversion to admitting when AI systems are better than humans at a task, even when they're superior in terms of speed, accuracy, and cost. This might be a cognitive bias that doesn't yet have a name. To address this, we should clarify what we mean by
@DanHendrycks
Dan Hendrycks
1 month
AI timelines are moving along as expected. A superhuman mathematician is likely in the next year or two given no surprising obstacles. Maybe next year we'll have a similarly impressive demo for AI assistants that can make powerpoints, book flights, create apps, and so on.
@deedydas
Deedy
1 month
OpenAI o3 is 2727 on Codeforces which is equivalent to the #175 best human competitive coder on the planet. This is an absolutely superhuman result for AI and technology at large.
@DanHendrycks
Dan Hendrycks
3 years
How can we productively work toward creating safe machine learning models? After struggling with this question for the past several years, we have developed a new roadmap for ML safety. Post: Paper:
@DanHendrycks
Dan Hendrycks
2 months
Recently turned 29 and on this year’s list.
@Forbes
Forbes
2 months
30 Under 30 AI 2025: The Young Entrepreneurs Coding The Future #ForbesUnder30
@DanHendrycks
Dan Hendrycks
2 years
@MetaAI This directly incentivizes researchers to build models that are skilled at deception.
@DanHendrycks
Dan Hendrycks
4 months
After a Harvard talk I gave, someone created a game to predict ICLR paper reviews. New researchers tend to learn by trial and error (write -> reject -> revise). A more efficient way to build taste is to read papers and predict their reception.
@DanHendrycks
Dan Hendrycks
2 years
Since Senator Schumer is pushing for Congress to regulate AI, here are five promising AI policy ideas:
* external red teaming
* interagency oversight commission
* internal audit committees
* external incident investigation team
* safety research funding
(🧵below)
@SenSchumer
Chuck Schumer
2 years
Today, I’m launching a major new first-of-its-kind effort on AI and American innovation leadership.
@DanHendrycks
Dan Hendrycks
5 months
Very soon.
@DanHendrycks
Dan Hendrycks
7 months
Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH.
@DanHendrycks
Dan Hendrycks
3 years
PixMix shows that augmenting images with fractals improves several robustness and uncertainty metrics simultaneously (corruptions, adversaries, prediction consistency, calibration, and anomaly detection). paper: code: #cvpr2022
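A minimal sketch of the mixing step, simplified from the paper: repeatedly blend a training image with a structurally complex "mixing picture" such as a fractal, alternating additive and multiplicative mixes. Random arrays stand in for real images, and the exact schedule here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def pixmix(img, fractal, max_rounds=3, beta=3.0):
    """Blend img with a fractal for a random number of rounds."""
    out = img.copy()
    for _ in range(rng.integers(1, max_rounds + 1)):
        w = rng.beta(beta, beta)                       # mixing strength
        if rng.random() < 0.5:
            out = (1 - w) * out + w * fractal          # additive mix
        else:
            out = out ** (1 - w) * fractal ** w        # multiplicative mix
    return np.clip(out, 0.0, 1.0)

img = rng.random((32, 32, 3))       # stand-in for an image in [0, 1]
fractal = rng.random((32, 32, 3))   # stand-in for a fractal image
augmented = pixmix(img, fractal)
```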
@DanHendrycks
Dan Hendrycks
3 months
This has ~100 questions. Expect >20-50x more hard questions in Humanity's Last Exam, the scale needed for precise measurement.
@EpochAIResearch
Epoch AI
3 months
1/10 Today we're launching FrontierMath, a benchmark for evaluating advanced mathematical reasoning in AI. We collaborated with 60+ leading mathematicians to create hundreds of original, exceptionally challenging math problems, of which current AI systems solve less than 2%.
@DanHendrycks
Dan Hendrycks
2 years
4/ He thinks letting evolution run wild is a good thing, because "we shouldn't resist the will of the universe." However, this is simply the naturalistic fallacy: what is natural (disease, pain, exploitation) is not necessarily what is good.
@DanHendrycks
Dan Hendrycks
6 months
@abcampbell I'd then have no income.
@DanHendrycks
Dan Hendrycks
1 year
- Meta, by open sourcing competitive models (e.g., Llama 3), reducing AI orgs' revenue/valuations/ability to buy more GPUs and scale AI models.
@DanHendrycks
Dan Hendrycks
1 year
Things that have most slowed down AI timelines/development:
- reviewers, by favoring cleverness and proofs over simplicity and performance
- NVIDIA, by distributing GPUs widely rather than to buyers most willing to pay
- TensorFlow
@DanHendrycks
Dan Hendrycks
6 months
New letter from @geoffreyhinton, Yoshua Bengio, Lawrence @Lessig, and Stuart Russell urging Gov. Newsom to sign SB 1047. “We believe SB 1047 is an important and reasonable first step towards ensuring that frontier AI systems are developed responsibly, so that we can all better
@DanHendrycks
Dan Hendrycks
2 years
@BasedBeffJezos 2/ He argues that we should build AGI to colonize the cosmos ASAP because there is so much potential at stake. This cost-benefit analysis is wrong. For every year we delay building AGI, we lose a galaxy. However, if we go extinct in the process, we lose the entire cosmos. Cosmic
@DanHendrycks
Dan Hendrycks
3 years
We’ll be organizing a NeurIPS workshop on Machine Learning Safety! We'll have $50K in best-paper awards. To encourage proactiveness about tail risks, we'll also have $50K in awards for papers that discuss their impact on long-term, long-tail risks.
@DanHendrycks
Dan Hendrycks
2 years
It knows many esoteric facts (e.g., it knows the meaning of obscure songs, knows what area a researcher works in, can contrast ML optimizers like Adam vs AdamW as in a PhD oral exam, and so on). My rule of thumb is that "if it's on the internet 5 or more times, GPT-4 remembers it."
@DanHendrycks
Dan Hendrycks
6 months
SB 1047 has passed through the Appropriations Committee! It has significant amendments responding to industry engagement. These amendments are summarized in the link and in the images below.
@DanHendrycks
Dan Hendrycks
5 months
Three models remain unbroken in the Gray Swan jailbreaking competition (~500 registrants), which is still ongoing. These models are based on Circuit Breakers + other RepE techniques.
@DanHendrycks
Dan Hendrycks
1 year
What can we actually do to reduce risks from AI? AI researchers Hinton, Bengio, Dawn Song, Pieter Abbeel, and others provide concrete proposals.
@DanHendrycks
Dan Hendrycks
8 months
This is worth checking out. Minor criticisms: I think industry's "algorithmic secrets" are not a very natural leverage point to greatly restrict. FlashAttention, Quiet-STaR (q*), Mamba/SSMs, FineWeb, and so on are ideas and advances from outside industry. These advances will
@leopoldasch
Leopold Aschenbrenner
8 months
Virtually nobody is pricing in what's coming in AI. I wrote an essay series on the AGI strategic picture: from the trendlines in deep learning and counting the OOMs, to the international situation and The Project. SITUATIONAL AWARENESS: The Decade Ahead
@DanHendrycks
Dan Hendrycks
1 year
Asimov's second law of robotics says that “a robot must obey the orders given it by human beings.” So can LLMs follow simple rules? Unfortunately, not reliably, as shown by our RuLES benchmark. 📄: 🛠️: 🌐:
@DanHendrycks
Dan Hendrycks
2 years
Now 2 out of 3 of the deep learning Turing Award winners are concerned about catastrophic risks from advanced AI. "He is worried that future versions of the technology pose a threat to humanity." "A part of him, he said, now regrets his life’s work."
@DanHendrycks
Dan Hendrycks
7 months
@PirateWires This is an obvious example of bad-faith "gotcha" journalism — Pirate Wires never even reached out for comment on a story entirely about me, and the article is full of misrepresentations and errors. For starters, I'm working on AI safety from multiple fronts: publishing technical
@DanHendrycks
Dan Hendrycks
2 years
It certainly seems better at reasoning than ChatGPT 3.5. While this isn't a formal benchmark, an IQ test showed a difference between the two models: 83 IQ for ChatGPT 3.5, 96 IQ for GPT-4.
@DanHendrycks
Dan Hendrycks
2 years
AI is moving at a frenzied pace. Here are my thoughts on how the AI arms race and competitive pressures could lead to severe societal-scale risks:
@DanHendrycks
Dan Hendrycks
2 years
3/ He agrees that AI's development can be viewed as an evolutionary process. However, this is not a good thing. As I discuss here, natural selection favors AIs over humans, and this could lead to human extinction.
@DanHendrycks
Dan Hendrycks
5 months
OpenAI, xAI, Google, Anthropic, Meta, Amazon, Microsoft, and Mistral have made commitments to robust safety measures, similar to what SB 1047 asks for. The main difference with SB 1047? It's enforced.
@DanHendrycks
Dan Hendrycks
1 year
Excited to be in the TIME100 AI along with many others including @janleike @ilyasut @sama @alexandr_wang @ericschmidt.
@DanHendrycks
Dan Hendrycks
8 months
A retrospective of Unsolved Problems in ML Safety. Unsolved Problems, written in the summer of 2021, mentions ideas that were nascent or novel for their time. Here are a few:
• Hazardous Capabilities Evals: In the monitoring section, we introduce the idea
@DanHendrycks
Dan Hendrycks
11 months
Making a good benchmark may seem easy---just collect a dataset---but it requires getting multiple high-level design choices right. @Thomas_Woodside and I wrote a post on how to design good ML benchmarks:
@DanHendrycks
Dan Hendrycks
6 months
How can we prevent LLM safeguards from being simply removed with a few steps of fine-tuning? We show it's surprisingly possible to make progress on creating safeguards that are tamper-resistant, reducing malicious use risks of open-weight models. Paper:
@DanHendrycks
Dan Hendrycks
13 days
@GaryMarcus Can confirm AI companies like xAI can't get access to FrontierMath due to Epoch's contractual obligation with OpenAI.
@DanHendrycks
Dan Hendrycks
5 months
Lectures for the AI Safety, Ethics, and Society course are up.
1: Risks Overview
2: AI Fundamentals
3: ML Safety
4: Safety Engineering
5: Complex Systems
6: Beneficial AI
7: Collective Action Problems
8: Governance
Course site:
@DanHendrycks
Dan Hendrycks
3 years
Can we use ML models to predict future world events? We create the Autocast forecasting benchmark to measure their prescience. ML models don't yet beat humans/prediction markets, but they are starting to gain traction. Paper: Code:
@DanHendrycks
Dan Hendrycks
2 years
As stated in the first sentence of the signatory page, there are many “important and urgent risks from AI,” not just the risk of extinction; for example, systemic bias, misinformation, malicious use, cyberattacks, and weaponization. These are all important risks that need to be
@DanHendrycks
Dan Hendrycks
2 years
AI policy idea: do not automate nuclear command with AI. While the military is increasingly using AI in command and control systems to address information overload, the modernization effort should exclude the automation of nuclear command and control.
@DanHendrycks
Dan Hendrycks
4 months
AI developers' "Responsible Scaling Policies," safety compute commitments, prosocial mission statements, and "Preparedness Frameworks" do not constrain their behavior. They can remove foundational nonprofit oversight without much backlash, as OpenAI’s restructuring shows.
@technology
Bloomberg Technology
4 months
OpenAI is working on a plan to restructure so that its nonprofit board would no longer control its main business, Reuters reports
@DanHendrycks
Dan Hendrycks
4 months
Improving AI's academic abilities may not markedly improve user experience as in the past. In LMSYS rankings, GPT-4o and GPT-4o mini rank 6th and 7th, despite a large academic gap (MMLU: 88.7% v. 82%). o1 may have underwhelmed because most people can't appreciate Olympiad skills.
@DanHendrycks
Dan Hendrycks
2 years
AI researchers from leading universities worldwide have signed the AI extinction statement, a situation reminiscent of atomic scientists issuing warnings about the very technologies they've created. As Robert Oppenheimer noted, “We knew the world would not be the same.” 🧵 (2/6)
@DanHendrycks
Dan Hendrycks
10 months
GPT-5 doesn't seem likely to be released this year. Ever since GPT-1, the difference between GPT-n and GPT-n+0.5 is ~10x in compute. That would mean GPT-5 would have around ~100x the compute of GPT-4, or 3 months of ~1 million H100s. I doubt OpenAI has a 1 million GPU server ready.
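The back-of-envelope arithmetic behind this, with every input a rough public estimate or assumption rather than a confirmed figure:

```python
gpt4_flop = 2e25            # rumored GPT-4 training compute (assumption)
gpt5_flop = 100 * gpt4_flop # ~10x per half-generation => ~100x per full step

h100_peak = 1e15            # ~1 PFLOP/s dense BF16 per H100 (approximate)
mfu = 0.3                   # assumed utilization
gpus = 1_000_000

cluster_flops = gpus * h100_peak * mfu    # whole-cluster FLOP/s
days = gpt5_flop / cluster_flops / 86_400
print(f"~{days:.0f} days on 1M H100s")    # ~77 days, i.e. roughly 3 months
```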
@DanHendrycks
Dan Hendrycks
2 years
7/ He claims that we should let the free market entirely decide what AI should be like and there should be no regulation, since regulation is too "communist." However, when there are market failures, even libertarians agree government action can be necessary. There is an
@DanHendrycks
Dan Hendrycks
2 years
It's bad at copy editing. If you give it a paragraph to improve, it will suggest fixing typos that don't exist, or adding commas that are already present. Its poor ability to keep track of these low-level details might be explained by a sparse self-attention scheme.