![David Krueger Profile](https://pbs.twimg.com/profile_images/833081523213438976/DeIwiR-6.jpg)
David Krueger
@DavidSKrueger
Followers
16K
Following
3K
Media
170
Statuses
4K
AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Joined November 2011
4/9 NeurIPS submissions accepted! Congrats to @JamesAllingham, @kayembruno, @jmhernandez233, @LukeMarks00, @FazlBarez, @RyanPGreenblatt, @FabienDRoger, @dmkrash, Stephen Chung, @scottniekum, and all of my other co-authors on these works! Summaries / links in thread.
4
7
91
Greg was one of the founding team at OpenAI who seemed cynical and embarrassed about the org's mission (basically, the focus on AGI and x-risk) in the early days. I remember at ICLR Puerto Rico, in 2016, the summer after OpenAI was founded, a bunch of researchers sitting out on.
We’re really grateful to Jan for everything he's done for OpenAI, and we know he'll continue to contribute to the mission from outside. In light of the questions his departure has raised, we wanted to explain a bit about how we think about our overall strategy. First, we have.
39
30
493
I’m super excited to release our 100+ page collaborative agenda - led by @usmananwar391 - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities! Some highlights below.
7
156
461
If LLMs are as adversarially vulnerable as image models, then safety filters won't work for any mildly sophisticated actor (e.g. grad student). It's not obvious they are, since text is discrete, meaning: 1) the attack space is restricted, and 2) attacks are harder to find.
🚨We found adversarial suffixes that completely circumvent the alignment of open source LLMs. More concerningly, the same prompts transfer to ChatGPT, Claude, Bard, and LLaMA-2…🧵 Website: Paper:
28
59
338
My research group @kasl_ai is looking for interns! Applications are due in 2 weeks: ***January 29***. The long-awaited form: Please share widely!!
7
74
281
I'm thrilled to be joining the Frontier AI Task Force (formerly "Foundation Models Task Force") as a research director along with @yaringal! I'm really excited to see what we can accomplish. Stay tuned!
1/ 11 weeks ago I agreed to Chair the UK's efforts to accelerate state capacity in AI Safety - measuring and mitigating the risks of frontier models so we can safely capture their opportunities. Here is our first progress report:
22
18
268
A clip from my first live TV appearance! With @AyoSokale on @GMB discussing extinction risk from AI.
'AI is not going to be like every other technology.' 'I think we're going to struggle to survive as a species.' Could AI lead to the extinction of humanity?
20
28
197
For a long time, people have applied a double standard for arguments for/against AI x-risk. Arguments for x-risk have been (incorrectly) dismissed as “anthropomorphising”. But Ng is “sure” that we’ll have warning shots based on an anecdote about his kids.
Last weekend, my two kids colluded in a hilariously bad attempt to mislead me to look in the wrong place during a game of hide-and-seek. I was reminded that most capabilities — in humans or in AI — develop slowly. Some people fear that AI someday will learn to deceive humans
6
9
186
Our paper “Defining and Characterizing Reward Hacking” was accepted at NeurIPS! With @JoarMVS, @__niki_howe__, & @dmkrash. We define what it means for a proxy reward function to be hackable/gameable. Bad news: hackability is ~unavoidable (Thm1)!
4
23
178
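For readers curious what "hackable" could mean formally, here is a hedged sketch of one natural formalization in terms of policy returns. The notation (the policy set Π and the return J_R) is mine for illustration and may not match the paper's exact definitions:

```latex
% Hedged sketch of one natural formalization of proxy-reward hackability.
% Notation is illustrative and may differ from the paper's:
%   \Pi       -- a set of policies under consideration
%   J_R(\pi)  -- expected return of policy \pi under reward function R
\[
\tilde{R}\ \text{is \emph{unhackable} w.r.t. the true reward}\ R\ \text{on}\ \Pi
\;\iff\;
\forall\, \pi,\pi' \in \Pi:\;
J_{\tilde{R}}(\pi) < J_{\tilde{R}}(\pi')
\;\Rightarrow\;
J_{R}(\pi) \le J_{R}(\pi').
\]
\[
\text{Hacking occurs when}\ \exists\, \pi,\pi' \in \Pi:\;
J_{\tilde{R}}(\pi) > J_{\tilde{R}}(\pi')
\ \text{while}\
J_{R}(\pi) < J_{R}(\pi').
\]
```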
It still amazes me that Yann and Andrew Ng can confidently predict that human-like/level AI is too far off to worry about, and act like people who disagree are the ones who are overconfident in their beliefs.
@ESYudkowsky @erikbryn My entire career has been focused on figuring out what's missing from AI systems to reach human-like intelligence. I tell you, we're not there yet. If you want to know what's missing, just listen to one of my talks of the last 7 or 8 years, preferably a recent one like this:
24
9
164
If you're in ML, you've probably heard about BigBiGAN because of the DM PR machine. But you may not have heard about this paper by @philip_bachman et al. that came 4 days later and crushes their results.
4
36
169
Want to generalize Out-of-Distribution? Try INCREASING risk (=loss) on training environments with higher performance! Joint work with: @ethancaballero, @jh_jacobsen, @yayitsamyzhang, @jjbinas, @LPRmi, @AaronCourville
1
33
153
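As a rough illustration of the idea in the tweet (equalizing, and thus deliberately raising, risk on the best-performing training environments), here is a minimal PyTorch-style sketch of a variance-penalty objective over per-environment risks. The function name, the beta value, and the env_batches variable are hypothetical, and this is not the authors' exact implementation:

```python
import torch


def variance_penalty_loss(env_risks: torch.Tensor, beta: float = 10.0) -> torch.Tensor:
    """Illustrative sketch of a risk-extrapolation-style objective (not the paper's code).

    env_risks: 1-D tensor of mean training losses ("risks"), one per training environment.
    The variance penalty pushes the optimizer to equalize risk across environments,
    which effectively increases risk on the environments where performance is highest.
    """
    mean_risk = env_risks.mean()
    variance = ((env_risks - mean_risk) ** 2).mean()
    return mean_risk + beta * variance


# Usage sketch: compute the loss separately on each training environment's batch,
# stack the per-environment risks, and backprop through the combined objective.
# env_risks = torch.stack([loss_fn(model(x_e), y_e) for (x_e, y_e) in env_batches])
# total_loss = variance_penalty_loss(env_risks, beta=10.0)
# total_loss.backward()
```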
Recommended! This is a well-written, concise (10 pages!) summary of the Alignment Problem. It helps explain: 1) the basic concerns of (many) alignment researchers, 2) why we don't think it's crazy to talk about AI killing everyone, and 3) what's distinctive about the field.
At OpenAI a lot of our work aims to align language models like ChatGPT with human preferences. But this could become much harder once models can act coherently over long timeframes and exploit human fallibility to get more reward. 📜Paper: 🧵Thread:
5
23
153
It’s not the case that “everyone thought” that “AI” is “immediately dangerous”, and @tszzl really ought to know better. I can see no way to interpret that statement which makes it true. Maybe it “vibes”, but it’s a lie.
obviously because ai is less immediately dangerous and more default aligned than everyone thought and iterative deployment works. total openai ideological victory though.
10
4
150
I am looking for PhD students!! I'm increasingly interested in work supporting AI governance, e.g. work that: highlights the need for policy (e.g. by breaking methods or models); could help monitor and enforce policies; or generally increases affordances for policymakers.
David Krueger @DavidSKrueger, who focuses on deep learning, AI alignment, & AI safety, with interests including: foundation models; jailbreaking, reward hacking, & alignment; generalization in deep learning; policy, governance, & societal impacts of AI.
4
32
139
I applaud Yoshua for speaking out. It is emotionally difficult to confront the possibility of human extinction from AI. It is especially difficult to do this if you've spent your life working on building it. And it is especially difficult if you've been dismissive in the past.
Yoshua Bengio admits to feeling "lost" over his life's work: "he would have prioritised safety over usefulness had he realised the pace at which it would evolve".
7
8
122
Honestly, I think the idea that animals only work via "association" and lack "inferential capabilities" is implausible metaphysical speciesist garbage functioning primarily to perpetuate atrocities.
So uhh, this whole reasoning vs. stochastic parrots thing has been going on for quite a while… in *animal cognition*. "among the most long-standing and most intensely debated controversies"
3
8
117
@PhilBeaudoin @tyrell_turing Just the title is maddening. I am seriously disappointed in the authors. I encourage everyone to spend more time engaging with existing literature and experts before calling the subject of my entire field an "illusion". This is so disrespectful and ignorant to the hundreds.
3
5
114
A lot of people in the AI x-safety community seem to think @AnthropicAI has some sort of moral high ground over @OpenAI, but this sort of thing makes it hard for me to see why that is supposed to be the case. #justdontbuildAGI.
Anthropic plans to build a model, tentatively called Claude-Next, 10X more capable than today’s most powerful AI, which will require spending $1 billion over the next 18 months. “Companies that train the best models will be too far ahead for anyone to catch up”
10
7
112
I've worried AI could lead to human extinction ever since I heard about Deep Learning from Hinton's Coursera course, >10 years ago. So it's great to see so many AI researchers advocating for AI x-safety as a global priority. Let's stop arguing over it and figure out what to do!
We just put out a statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. 🧵 (1/6)
9
22
109
I've been telling people we should expect breakthroughs like this to make regulating AI much harder. For now there are a few big labs to worry about, but over time, as compute gets cheaper and/or restrictions come into place, it could get a lot harder. What's the end game?
DiLoCo: Distributed Low-Communication Training of Language Models. paper page: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of
19
18
108
I've said for a while that everyone should be able to agree that AI x-risk >= 1%. Now I guess even "optimists" agree on this, so… victory??
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all. We strive for a future where everyone is empowered by AIs under their own control. In our first post, we argue AI is easy to control, and will get more controllable over time.
32
8
104
Yeah I am super frustrated by how often I hear people say: "China will never cooperate", "we can't trust China", "China doesn't care about AI safety", "we can't regulate AI because then China wins", etc. I always ask "why do you say that?" and am never satisfied with the answer.
People sometimes ask me why I am more optimistic than some others about the US and China cooperating on AI safety. Besides the incremental progress happening, I think one point to bear in mind is “we basically haven’t even tried yet - let’s try first then discuss.”.
7
9
105
Really excited to see my commentary published in @newscientist! There's waaay too much to say on the topic, but to spell it out, the solution is REGULATION. We urgently need to stop the reckless and irresponsible deployment of powerful AI systems.
Existential risk from AI is admittedly more speculative than pressing concerns such as its bias, but the basic solution is the same. A robust public discussion is long overdue, says @DavidSKrueger
9
18
95
"even 10-20 years from now is very soon" for building highly advanced AI systems. 100%. I have been saying this for 10 years. This is maybe the most commonly neglected argument for taking AI x-risk seriously. I am still super confused why this is not 100% obvious to everyone.
I tried to describe this difficulty here. The crux of it is that waiting for scientific clarity would be lovely, but may be a luxury we don't have. If highly advanced AI systems are built soon—and even 10-20 years from now is very soon!—then we need to start preparing now.
3
15
90
If you're still interested in doing a postdoc with me, but haven't reached out yet, please do so! I've gotten an extension on the admin deadlines for the FLI fellowship, so if you can apply by Jan 2, then you can still be considered for this round of my hiring.
I'm looking for: PhD students, postdocs, and potentially interns and research assistants (like a post-doc for those who haven't got a PhD). Potential post-docs should reach out ASAP and strongly consider applying for funding from FLI; this would need to happen quite soon.
8
17
85
Congrats to the whole team! IIRC, our findings in this work blew my mind more than any other result. This may be the first evidence for the existence of a *mechanism* by which sufficiently advanced AI systems would tend to become agentic (and thus have instrumental goals).
1/ Excited to finally tweet about our paper “Implicit meta-learning may lead LLMs to trust more reliable sources”, to appear at ICML 2024. Our results suggest that during training, LLMs better internalize text that appears useful for predicting other text (e.g. seems reliable).
2
9
85
Seriously. Also, I recommend practicing talking about it in a way that sounds normal, since it can feel hard to do when you expect people will think it's weird.
More people need to speak up about the implications of developing smarter-than-human AI systems. This is one of the biggest bottlenecks to sensible AI policy.
5
2
85
RLHF is super popular, more so than I expected it would be when we wrote our agenda in 2018. Still, a lot of the issues we mentioned there persist. Meanwhile, many other issues have also been identified. Much work left to do, and RLHF isn't enough!
New paper: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. We survey over 250 papers to review challenges with RLHF with a focus on large language models. Highlights in thread 🧵
1
12
80
Well, that's fairly alarming. I suspect the model steganographically plans to backstab, given that backstabbing is essential to good Diplomacy performance IIUC. This may be the demonstration of emergent deception we've been waiting for! I'll need to look closer.
@petermilley It's designed to never intentionally backstab - all its messages correspond to actions it currently plans to take. However, sometimes it changes its mind.
7
6
82