David Krueger

@DavidSKrueger

Followers
16K
Following
3K
Media
170
Statuses
4K

AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.

Joined November 2011
@DavidSKrueger
David Krueger
4 months
4/9 NeurIPS submissions accepted! Congrats to @JamesAllingham, @kayembruno, @jmhernandez233, @LukeMarks00, @FazlBarez, @RyanPGreenblatt, @FabienDRoger, @dmkrash, Stephen Chung, @scottniekum, and all of my other co-authors on these works! Summaries / links in thread.
4
7
91
@DavidSKrueger
David Krueger
21 days
99.99% of people talking about AGI
Tweet media one
46
210
2K
@DavidSKrueger
David Krueger
3 years
Machine Learning lives in an uncanny valley between Science and Engineering. It's the worst of both worlds. We don't care about understanding, just making things "work" (bad science). We don't care if things work in the real world, just on contrived benchmarks (bad engineering).
35
174
1K
@DavidSKrueger
David Krueger
4 years
Big News! I'm thrilled to be joining the Computational and Biological Learning Lab (CBL) at Cambridge this fall as a Lecturer (tenure track faculty)! I plan to focus on Deep Learning and AI Alignment. I'm recruiting PhD students to start as soon as this fall.
Tweet media one
65
109
1K
@DavidSKrueger
David Krueger
1 year
I'm looking for:
- PhD students
- postdocs
- and potentially interns and research assistants (like a post-doc for those who haven't got a PhD).
Potential post-docs should reach out ASAP and strongly consider applying for funding from FLI; this would need to happen quite soon.
39
231
898
@DavidSKrueger
David Krueger
2 years
There are a significant number of people in the AI research community who explicitly think humans should be replaced by AI as the natural next step in evolution, and the sooner the better!
184
139
838
@DavidSKrueger
David Krueger
3 years
I am looking for 1-2 PhD students to join my research group at Cambridge (CBL) fall 2022. Our focus is on Deep Learning, AI Alignment, and existential safety. Deadline for university funding is December 2.
26
369
852
@DavidSKrueger
David Krueger
2 years
Geoff Hinton quit Google so he can speak freely on the risks of AI. "He is worried that future versions of the technology pose a threat to humanity".
24
141
581
@DavidSKrueger
David Krueger
2 years
If you think the risk of human extinction from AI is overblown, just go looking for the good counter-arguments. That's what I did 10 years ago. They don't exist.
160
55
519
@DavidSKrueger
David Krueger
2 years
It is remarkable that this anti-social position is tolerated in our field. Imagine if biologists were regularly saying things like "I think it would be good if some disease exterminated humanity -- that's just evolution!".
22
43
514
@DavidSKrueger
David Krueger
9 months
Greg was one of the founding team at OpenAI who seemed cynical and embarrassed about the org's mission (basically, the focus on AGI and x-risk) in the early days. I remember at ICLR Puerto Rico, in 2016, the summer after OpenAI was founded, a bunch of researchers sitting out on.
@gdb
Greg Brockman
9 months
We’re really grateful to Jan for everything he's done for OpenAI, and we know he'll continue to contribute to the mission from outside. In light of the questions his departure has raised, we wanted to explain a bit about how we think about our overall strategy. First, we have.
39
30
493
@DavidSKrueger
David Krueger
10 months
I’m super excited to release our 100+ page collaborative agenda - led by @usmananwar391 - on “Foundational Challenges In Assuring Alignment and Safety of LLMs” alongside 35+ co-authors from NLP, ML, and AI Safety communities! Some highlights below.
Tweet media one
7
156
461
@DavidSKrueger
David Krueger
3 years
Now we finally have something to cite to refute people who say "neural networks can't extrapolate" :).
7
64
344
@DavidSKrueger
David Krueger
2 years
If LLMs are as adversarially vulnerable as image models, then safety filters won't work for any mildly sophisticated actor (e.g. grad student). It's not obvious they are, since text is discrete, meaning: 1) the attack space is restricted, and 2) attacks are harder to find.
@andyzou_jiaming
Andy Zou
2 years
🚨We found adversarial suffixes that completely circumvent the alignment of open source LLMs. More concerningly, the same prompts transfer to ChatGPT, Claude, Bard, and LLaMA-2…🧵. Website: Paper:
Tweet media one
28
59
338
@DavidSKrueger
David Krueger
3 years
I'm a doctor now! Almost 1 year after starting my faculty position at Cambridge, I've finally defended my thesis :) Counting my MSc I've been at Mila for ~9 years. Only Yoshua, Aaron, and Pascal have been around longer. I'm going to miss it! It's been a wild ride!
27
1
323
@DavidSKrueger
David Krueger
2 years
I recently told an x-risk focused community: "I think there are clearly important present day harms of AI that are neglected by the AI safety community, and that a lot of x-risk comes from increasingly bad versions of such problems". I was asked for examples. My response: 🧵
21
114
288
@DavidSKrueger
David Krueger
1 year
My research group @kasl_ai is looking for interns! Applications are due in 2 weeks ***January 29***. The long-awaited form: Please share widely!!
7
74
281
@DavidSKrueger
David Krueger
1 year
I'm thrilled to be joining the Frontier AI Task Force (formerly "Foundation Models Task Force") as a research director along with @yaringal! I'm really excited to see what we can accomplish. Stay tuned!
@soundboy
Ian Hogarth
1 year
1/ 11 weeks ago I agreed to Chair the UK's efforts to accelerate state capacity in AI Safety - measuring and mitigating the risks of frontier models so we can safely capture their opportunities. Here is our first progress report:
22
18
268
@DavidSKrueger
David Krueger
2 years
What to make of this?
Tweet media one
34
5
227
@DavidSKrueger
David Krueger
2 years
We're looking for a few (paid) interns this summer! Apply here by April 30:
7
53
235
@DavidSKrueger
David Krueger
3 years
As of a few minutes ago, I am officially a (UNIVERSITY) ASSISTANT PROFESSOR! No more conversations about how "At Cambridge, Lecturer means TT faculty." Well, fewer anyway, I hope.
17
1
217
@DavidSKrueger
David Krueger
2 years
It's been disappointing to see how many of the media articles have focused -- almost exclusively -- on the tech CEOs that signed the CAIS statement on AI risk. In my mind the headline is that >100 AI professors signed it, and it's NOT just "the usual suspects".
11
31
207
@DavidSKrueger
David Krueger
5 months
When people start fighting each other using Superintelligent AI, nobody wins. The AI wins. If people don't realize this, they will make terrible decisions.
28
29
191
@DavidSKrueger
David Krueger
2 years
A clip from my first live TV appearance! With @AyoSokale on @GMB discussing extinction risk from AI.
@GMB
Good Morning Britain
2 years
'AI is not going to be like every other technology.' 'I think we're going to struggle to survive as a species.' Could AI lead to the extinction of humanity?
20
28
197
@DavidSKrueger
David Krueger
2 years
Personally, I am deeply concerned that the current trajectory for AI regulation will accomplish little more than a regulatory moat. Licensing, evaluations, and audits are all good ideas. But they are not enough. We need an indefinite pause on the development of more powerful AI.
44
35
197
@DavidSKrueger
David Krueger
3 months
Hot take 🔥🔥🔥 How about we DON'T build Skynet?
Tweet media one
14
30
194
@DavidSKrueger
David Krueger
2 years
Saying there is no reason to expect advanced AI to take over the world and kill everyone is factually incorrect. There are many articles discussing this, including top tier ML publications. Saying there is no evidence is applying a dangerously inappropriate standard of evidence.
36
8
193
@DavidSKrueger
David Krueger
5 months
For a long time, people have applied a double standard for arguments for/against AI x-risk. Arguments for x-risk have been (incorrectly) dismissed as “anthropomorphising”. But Ng is “sure” that we’ll have warning shots based on an anecdote about his kids.
@AndrewYNg
Andrew Ng
5 months
Last weekend, my two kids colluded in a hilariously bad attempt to mislead me to look in the wrong place during a game of hide-and-seek. I was reminded that most capabilities — in humans or in AI — develop slowly. Some people fear that AI someday will learn to deceive humans
Tweet media one
6
9
186
@DavidSKrueger
David Krueger
2 years
How does this still only have 17 citations?
5
18
187
@DavidSKrueger
David Krueger
3 months
Most of your ICLR reviewers voting to reject?? Don't give up! An average score of 5.4 already puts you in the top 30%!
Tweet media one
3
16
186
@DavidSKrueger
David Krueger
2 years
Our paper “Defining and Characterizing Reward Hacking” was accepted at NeurIPS! With @JoarMVS, @__niki_howe__, & @dmkrash. We define what it means for a proxy reward function to be hackable/gameable. Bad news: hackability is ~unavoidable (Thm 1)!
Tweet media one
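Roughly (a paraphrase of the paper's notion, not the exact formal statement; the notation J, π, π′ here is illustrative): a proxy reward is hackable relative to the true reward on a set of policies if there exist policies π and π′ such that J_proxy(π′) > J_proxy(π) while J_true(π′) < J_true(π), i.e. a change that "improves" the proxy's return strictly worsens true performance. Unhackability requires that no such pair of policies exists.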
4
23
178
@DavidSKrueger
David Krueger
3 years
I recently started as a professor in CBL (Cambridge) and I'm rapidly expanding my group. Currently seeking postdocs in Deep Learning and AI Alignment! **Deadline August 2** Please advertise far and wide through your networks!
6
69
173
@DavidSKrueger
David Krueger
3 months
AI safety trilemma: you can't simultaneously:
1) Think AGI is coming soon
2) Not want to slow it down
3) Be one of the good guys
If you're working at an org developing AGI and its public stance isn't "please help us stop the AGI arms race!", question the leadership.
31
26
173
@DavidSKrueger
David Krueger
3 years
The real Turing test is whether your AI takes over the world.
15
10
162
@DavidSKrueger
David Krueger
2 years
It still amazes me that Yann and Andrew Ng can confidently predict that human-like/level AI is too far off to worry about, and act like people who disagree are the ones who are overconfident in their beliefs.
@ylecun
Yann LeCun
2 years
@ESYudkowsky @erikbryn My entire career has been focused on figuring what's missing from AI systems to reach human-like intelligence. I tell you, we're not there yet. If you want to know what's missing, just listen to one of my talks of the last 7 or 8 years, preferably a recent one like this:.
24
9
164
@DavidSKrueger
David Krueger
6 years
If you're in ML, you've probably heard about BigBiGAN because of the DM PR machine. But you may not have heard about this paper by @philip_bachman et al. that came 4 days later and crushes their results.
4
36
169
@DavidSKrueger
David Krueger
11 months
...multiple ICML submissions mentioning in passing how you can use chain-of-thought to figure out why a model did what it did, without any awareness that it's not necessarily faithful.
11
12
161
@DavidSKrueger
David Krueger
2 years
Yoshua Bengio thinks we should slow down the development of more powerful AI. Yoshua is:
- One of the pioneers of Deep Learning, the technology driving progress in AI
- Winner of the Turing award (the "Nobel Prize" of CS)
- One of my MSc supervisors
6
23
159
@DavidSKrueger
David Krueger
3 years
Our paper on o̶p̶t̶i̶m̶i̶z̶a̶t̶i̶o̶n̶ ̶d̶a̶e̶m̶o̶n̶s̶ / i̶n̶n̶e̶r̶ ̶o̶p̶t̶i̶m̶i̶z̶e̶r̶s̶ / m̶e̶s̶a̶-̶o̶p̶t̶i̶m̶i̶z̶a̶t̶i̶o̶n̶ / i̶n̶n̶e̶r̶ ̶a̶l̶i̶g̶n̶m̶e̶n̶t̶ / o̶b̶j̶e̶c̶t̶i̶v̶e̶ ̶r̶o̶b̶u̶s̶t̶n̶e̶s̶s̶ / **goal misgeneralization** was accepted at ICML! .
Tweet media one
1
19
159
@DavidSKrueger
David Krueger
5 years
Want to generalize Out-of-Distribution? Try INCREASING risk(=loss) on training environments with higher performance!
Joint work with: @ethancaballero, @jh_jacobsen, @yayitsamyzhang, @jjbinas, @LPRmi, @AaronCourville
Tweet media one
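To illustrate the idea, a minimal sketch of a variance-penalty objective in PyTorch (the function name, beta value, and variable names are illustrative, not the paper's code):

    import torch

    def vrex_objective(per_env_losses: torch.Tensor, beta: float = 10.0) -> torch.Tensor:
        # per_env_losses: one mean training loss per training environment.
        # Penalizing the variance of per-environment risks pushes the optimizer
        # to accept slightly higher loss on easy (high-performance) environments
        # in exchange for lower loss on hard ones, flattening the risk profile.
        mean_risk = per_env_losses.mean()
        risk_variance = per_env_losses.var()
        return mean_risk + beta * risk_variance

Minimizing this in place of the plain average loss is one way to trade off risk across environments as described in the tweet.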
1
33
153
@DavidSKrueger
David Krueger
3 years
Serious question: why do some ML researchers not publish in ICML/ICLR/NeurIPS? Just too competitive? In my experience these are almost the only venues that have large impact in ML, excepting a few application-oriented conferences like CVPR... well, and Science/Nature, I guess ;)
36
14
152
@DavidSKrueger
David Krueger
2 years
I spent most of ICLR talking to researchers about AI x-safety. It was great! It seems many researchers are now keen-to-eager to have these conversations. Some recurring themes below.
1
20
149
@DavidSKrueger
David Krueger
2 years
Recommended! This is a well-written, concise (10 pages!) summary of the Alignment Problem. It helps explain:
1) The basic concerns of (many) alignment researchers
2) Why we don't think it's crazy to talk about AI killing everyone
3) What's distinctive about the field
@RichardMCNgo
Richard Ngo
2 years
At OpenAI a lot of our work aims to align language models like ChatGPT with human preferences. But this could become much harder once models can act coherently over long timeframes and exploit human fallibility to get more reward. 📜Paper: 🧵Thread:
Tweet media one
5
23
153
@DavidSKrueger
David Krueger
2 years
We are looking for more collaborators to help drive forward a few projects in my group! Open to various arrangements; looking for people with some experience, who can start soon and spend 20+ hrs/week. We'll start reviewing applications end of next week.
7
36
151
@DavidSKrueger
David Krueger
3 months
It’s not the case that “everyone thought” that “AI” is “immediately dangerous”, and @tszzl really ought to know better. I can see no way to interpret that statement which makes it true. Maybe it “vibes”, but it’s a lie.
@tszzl
roon
3 months
obviously because ai is less immediately dangerous and more default aligned than everyone thought and iterative deployment works. total openai ideological victory though.
10
4
150
@DavidSKrueger
David Krueger
2 years
Public Service Announcement (literally): the £100M UK AI safety taskforce has launched, with an open recruiting call:
2
39
147
@DavidSKrueger
David Krueger
4 months
I believe the call for "evidence-based AI policy" is wildly inappropriate and misleading. No, I'm not against evidence or science! But there are MAJOR problems with this narrative.
Tweet media one
19
21
145
@DavidSKrueger
David Krueger
5 years
(received) classic advice on academic writing: Spend equal time on:
1) the Title
2) the Abstract
3) Intro+Conclusion
4) the rest of the paper
(reinterpreted) modern advice: Spend equal time on:
1) the GIF
2) the Tweet
3) the Blog post
4) the paper
0
9
145
@DavidSKrueger
David Krueger
1 year
I am looking for PhD students!! I'm increasingly interested in work supporting AI governance, e.g. work that:
- highlights the need for policy, e.g. by breaking methods or models
- could help monitor and enforce policies
- generally increases affordances for policymakers
@CambridgeMLG
Cambridge MLG
1 year
David Krueger @DavidSKrueger, who focuses on deep learning, AI alignment, & AI safety, with interests including: foundation models; jailbreaking, reward hacking, & alignment; generalization in deep learning; policy, governance, & societal impacts of AI.
Tweet media one
4
32
139
@DavidSKrueger
David Krueger
2 years
"Mistakes were made? A critical look at how EA has approached AI safety". David Krueger | EAG London 23.
Tweet media one
5
15
140
@DavidSKrueger
David Krueger
3 months
o1 signals an end to "AI equality". "America started the tradition where the richest consumers buy essentially the same things as the poorest. You can be watching TV and see Coca-Cola, and you know that the President drinks Coke." - Warhol. This is true of GPT models, but not o1.
17
7
133
@DavidSKrueger
David Krueger
3 months
It's become common to claim that alignment is basically easy / solved / "alignment by default" / etc. This is false on two levels:
1) Getting chatbots to behave isn't solving alignment
2) We don't seem to know how to do that, as the recent Gemini example seems to illustrate
10
16
135
@DavidSKrueger
David Krueger
3 years
Ever feel like "it's all been done" in AI? The fundamentals (TD, backprop, etc.) are there and the rest is just scaling and engineering hacks? Try working on alignment! We have fundamental unsolved problems out the wazoo! Embedded agency, incentives, interpretability, ...
7
13
129
@DavidSKrueger
David Krueger
2 years
If you aren't concerned about x-risk from out-of-control AI, what would it take to convince you? A few possible ideas:
- A chatbot that can mimic my friends
- Working humanoid robots
- Most ML researchers publicly voice concerns
- A "sinister stumble"
- AI software engineer
67
12
127
@DavidSKrueger
David Krueger
3 years
3 things reasonable people should be able to agree on:
1) AI x-risk is non-negligible
2) There are things we can do to reduce AI x-risk
3) Mitigating AI x-risk is an ethical priority
Notably absent:
- AGI is imminent
- AI x-risk is the most important problem
13
7
126
@DavidSKrueger
David Krueger
2 years
Tweet media one
1
6
121
@DavidSKrueger
David Krueger
6 months
e/accs and VCs in 5 years: "Sure, nobody can get their AI to pass the 'don't go rogue and kill everyone' evals IN SIMULATION, but this is all just doomer decel fearmongering! You can't regulate math! Think of the economy! If we don't, China will!"
13
8
119
@DavidSKrueger
David Krueger
5 months
I asked a mechinterp workshop with ~100 people who agreed with that statement, and literally nobody raised their hand. a16z is utterly shameless.
@StephenLCasper
Cas (Stephen Casper)
5 months
It's not just SB 1047 either. Remember when a16z told the UK government that mechanistic interpretability was solved?
Tweet media one
2
9
118
@DavidSKrueger
David Krueger
2 years
I applaud Yoshua for speaking out. It is emotionally difficult to confront the possibility of human extinction from AI. It is especially difficult to do this if you've spent your life working on building it. And it is especially difficult if you've been dismissive in the past.
@finmoorhouse
Fin Moorhouse
2 years
Yoshua Bengio admits to feeling "lost" over his life's work: "he would have prioritised safety over usefulness had he realised the pace at which it would evolve".
7
8
122
@DavidSKrueger
David Krueger
9 months
For a while I've been saying that I still know *some* people at OpenAI understand and care about x-safety. The list is growing shorter and shorter.
@janleike
Jan Leike
9 months
I resigned.
5
5
121
@DavidSKrueger
David Krueger
2 years
Honestly, I was underwhelmed by ChatGPT (maybe because I'd already demoed a similar Anthropic model). Am I missing something? What's the impressive, cool stuff you've seen it do (esp. that GPT-3 didn't)?
52
7
116
@DavidSKrueger
David Krueger
6 months
I have a reputation in the x-safety community as a mechanistic interpretability "skeptic". I still think too many x-safety people work on it vs other technical topics (and too much on technical stuff overall), but I've had a few positive updates on mechinterp over the years. 🧵.
5
10
121
@DavidSKrueger
David Krueger
3 months
Honestly, I think the idea that animals only work via "association" and lack "inferential capabilities" is implausible metaphysical speciesist garbage functioning primarily to perpetuate atrocities.
@paul_cal
Paul Calcraft
3 months
So uhh, this whole reasoning vs. stochastic parrots thing has been going on for quite a while. in *animal cognition*. "among the most long-standing and most intensely debated controversies"
Tweet media one
3
8
117
@DavidSKrueger
David Krueger
2 years
I don't want to summarize too much, since I think everyone should read this latest piece by Yoshua Bengio (Turing award winner for Deep Learning, along with Geoff Hinton and Yann LeCun): Suffice to say that this piece engages w/ AI x-risk very seriously.
0
36
118
@DavidSKrueger
David Krueger
2 years
@PhilBeaudoin @tyrell_turing Just the title is maddening. I am seriously disappointed in the authors. I encourage everyone to spend more time engaging with existing literature and experts before calling the subject of my entire field an "illusion". This is so disrespectful and ignorant to the hundreds.
3
5
114
@DavidSKrueger
David Krueger
3 years
Tweet media one
3
9
114
@DavidSKrueger
David Krueger
2 years
A lot of people in the AI x-safety community seem to think @AnthropicAI has some sort of moral high ground over @OpenAI, but this sort of thing makes it hard for me to see why that is supposed to be the case. #justdontbuildAGI.
@ethanCaballero
Ethan Caballero is busy
2 years
Anthropic plans to build a model tentatively called Claude-Next 10X more capable than today’s most powerful AI that'll require spending $1Billion over the next 18 months. “Companies that train the best models will be too far ahead for anyone to catch up”.
10
7
112
@DavidSKrueger
David Krueger
2 months
No, the spiciest part is where it does this WITHOUT ANY SPECIFIC NUDGING OR PROMPTING. Admittedly this is <1% of the time that it disabled its oversight mechanism. But it also faked alignment 37% of the time!? Sadly scant on details, as we've come to expect :/
Tweet media one
@simonw
Simon Willison
2 months
Here's the spiciest detail from the new o1 system card:
Tweet media one
5
11
112
@DavidSKrueger
David Krueger
2 years
I've worried AI could lead to human extinction ever since I heard about Deep Learning from Hinton's Coursera course, >10 years ago. So it's great to see so many AI researchers advocating for AI x-safety as a global priority. Let's stop arguing over it and figure out what to do!
@DanHendrycks
Dan Hendrycks
2 years
We just put out a statement:. “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.”. Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. 🧵 (1/6).
9
22
109
@DavidSKrueger
David Krueger
4 months
Tweet media one
3
2
111
@DavidSKrueger
David Krueger
1 year
I've been telling people we should expect breakthroughs like this to make regulating AI much harder. For now there are a few big labs to worry about, but over time as compute gets cheaper and/or restrictions come into place, it could get a lot harder. What's the end game?
@_akhaliq
AK
1 year
DiLoCo: Distributed Low-Communication Training of Language Models. paper page: Large language models (LLM) have become a critical component in many applications of machine learning. However, standard approaches to training LLM require a large number of
Tweet media one
19
18
108
@DavidSKrueger
David Krueger
1 year
I've said for a while that everyone should be able to agree that AI x-risk >= 1%. Now I guess even "optimists" agree on this, so... victory??
@norabelrose
Nora Belrose
1 year
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all. We strive for a future where everyone is empowered by AIs under their own control. In our first post, we argue AI is easy to control, and will get more controllable over time.
32
8
104
@DavidSKrueger
David Krueger
3 months
Yeah I am super frustrated by how often I hear people say:
"China will never cooperate"
"we can't trust China"
"China doesn't care about AI safety"
"we can't regulate AI because then China wins"
etc.
I always ask: "why do you say that?" and am never satisfied with the answer.
@Miles_Brundage
Miles Brundage
3 months
People sometimes ask me why I am more optimistic than some others about the US and China cooperating on AI safety. Besides the incremental progress happening, I think one point to bear in mind is “we basically haven’t even tried yet - let’s try first then discuss.”.
7
9
105
@DavidSKrueger
David Krueger
3 years
Are autoencoders making / going to make a comeback?
12
8
102
@DavidSKrueger
David Krueger
2 years
I've been trying to write a good Tweet(/thread) about why objections that AI x-risk is "not scientific enough" are misguided and prove too much. There's a lot to say, and it deserves a full blog post, but I'll try and just rapid fire a few arguments:.
12
21
98
@DavidSKrueger
David Krueger
2 years
You can disagree that AI is an x-risk, but please don't try to erase the large amount of scholarship that has gone into studying this question.
14
9
98
@DavidSKrueger
David Krueger
2 years
"I don't buy it" is finally starting to become an unacceptable way for AI researchers to respond to AI x-risk.
6
5
102
@DavidSKrueger
David Krueger
2 years
A quick and dirty list of pointers for AI x-risk:
- The alignment problem from a deep learning perspective
- AGISF
- My post on LessWrong:
- A work-in-progress syllabus I made:
5
19
95
@DavidSKrueger
David Krueger
2 years
OK so really people use "AI Alignment" to mean 3 different things, and we should stop doing that. My current preference is to (mostly) use it for Intent Alignment.
Tweet media one
14
8
94
@DavidSKrueger
David Krueger
2 years
Really excited to see my commentary published in @newscientist! There's waaay too much to say on the topic, but to spell it out, the solution is REGULATION. We urgently need to stop the reckless and irresponsible deployment of powerful AI systems.
@newscientist
New Scientist
2 years
Existential risk from AI is admittedly more speculative than pressing concerns such as its bias, but the basic solution is the same. A robust public discussion is long overdue, says @DavidSKrueger
9
18
95
@DavidSKrueger
David Krueger
2 years
The early days of COVID remind me of how people now treat AI x-risk. Several similar thought patterns / memes:
1) Don't "panic" -- we need to downplay the potential risks in order to prevent too extreme of a reaction.
2) Endorsing low-cost "solutions" instead of effective ones.
14
13
89
@DavidSKrueger
David Krueger
2 years
What are the top AI existential safety arguments/claims that you think are BS?
45
6
90
@DavidSKrueger
David Krueger
8 months
Recently, I've been feeling an increasing disconnect between my views and those of many in the AI safety community. I will characterize what I'll call the "pseudo-consensus view" below. But first, I emphasize that there are MANY in the AI safety community who disagree with it!.
12
6
91
@DavidSKrueger
David Krueger
2 years
“Look at how it was five years ago and how it is now,” he said of A.I. technology. “Take the difference and propagate it forwards. That’s scary.”.
2
9
90
@DavidSKrueger
David Krueger
4 months
"even 10-20 years from now is very soon" for building highly advanced AI systems. 100%. I have been saying this for 10 years. This is maybe the most commonly neglected argument for taking AI x-risk seriously. I am still super confused why this is not 100% obvious to everyone.
@hlntnr
Helen Toner
5 months
I tried to describe this difficulty here. The crux of it is that waiting for scientific clarity would be lovely, but may be a luxury we don't have. If highly advanced AI systems are built soon—and even 10-20 years from now is very soon!—then we need to start preparing now.
Tweet media one
Tweet media two
3
15
90
@DavidSKrueger
David Krueger
2 years
AI x-risk is finally on the public radar in a big way thanks to ChatGPT and GPT4. I'm concerned this might lead to an impression that the x-safety community "cried wolf" when GPT4 doesn't kill everyone or cause society to collapse. Some subtleties below:
Tweet media one
5
18
92
@DavidSKrueger
David Krueger
2 years
The obvious solution to AI x-risk is -- and has always been -- to not build AGI. Of course, this is an extremely difficult global coordination problem, analogous to climate change. But I don't think there's any good alternative, ATM.
22
12
86
@DavidSKrueger
David Krueger
2 years
My favorite ChatGPT generation so far:
Tweet media one
2
8
86
@DavidSKrueger
David Krueger
1 year
If you're still interested in doing a postdoc with me, but haven't reached out yet, please do so! I've gotten an extension on the admin deadlines for the FLI fellowship, so if you can apply by Jan 2, then you can still be considered for this round of my hiring.
@DavidSKrueger
David Krueger
1 year
I'm looking for:
- PhD students
- postdocs
- and potentially interns and research assistants (like a post-doc for those who haven't got a PhD).
Potential post-docs should reach out ASAP and strongly consider applying for funding from FLI; this would need to happen quite soon.
8
17
85
@DavidSKrueger
David Krueger
2 years
One basic trend is disempowering people, taking human judgment and discretion out of the loop, and pursuing proxy metrics that tend to:
- hurt those without power
- help those with power
- make an overall negative contribution to the common good
Some particular examples of this:
5
5
81
@DavidSKrueger
David Krueger
2 months
So it’s finally gotten to the point where you can spend all NeurIPS at corporate parties… not just the evenings, the whole fricken conference.
5
0
86
@DavidSKrueger
David Krueger
7 months
Congrats to the whole team! IIRC, our findings in this work blew my mind more than any other result. This may be the first evidence for the existence of a *mechanism* by which sufficiently advanced AI systems would tend to become agentic (and thus have instrumental goals).
@dmkrash
Dima Krasheninnikov
7 months
1/ Excited to finally tweet about our paper “Implicit meta-learning may lead LLMs to trust more reliable sources”, to appear at ICML 2024. Our results suggest that during training, LLMs better internalize text that appears useful for predicting other text (e.g. seems reliable).
Tweet media one
2
9
85
@DavidSKrueger
David Krueger
3 months
Seriously. Also, I recommend practicing talking about it in a way that sounds normal, since it can feel hard to do when you expect people will think it's weird.
@JeffLadish
Jeffrey Ladish
3 months
More people need to speak up about the implications of developing smarter-than-human AI systems. This is one of the biggest bottlenecks to sensible AI policy.
5
2
85
@DavidSKrueger
David Krueger
2 years
I'm planning to do a Reddit AMA sometime soon(ish). Anyone have any tips for how to make it go well/viral? Working title: "I'm an AI professor at the University of Cambridge working on preventing AI from taking over the world and killing everyone, AMA".
17
6
82
@DavidSKrueger
David Krueger
2 years
Most people would've been shocked 6 months ago to see Hinton and Bengio signing this. A few other names that might be surprising:
- Ian Goodfellow
- Dawn Song
- Been Kim
- David Silver
- Andrew Barto
- Atoosa Kasirzadeh
- Nando de Freitas
- Frank Hutter
- Jakob Foerster
1
7
84
@DavidSKrueger
David Krueger
2 years
My perception: if a DNN "wants" to solve a task a certain way, or "wants" to represent certain information, it's hard to prevent that (e.g. with penalty terms): the DNN will just "hide" that information from your detection method. I call this "steganography everywhere".
3
8
83
@DavidSKrueger
David Krueger
2 years
It's great that people (ARC) are doing these sorts of evaluations, but... We don't know how to properly assess the "ability" of models to do things. We can only evaluate their *propensity* to do things under specific circumstances. This is a critical distinction.
Tweet media one
4
4
82
@DavidSKrueger
David Krueger
2 years
RLHF is super popular, more so than I expected it would be when we wrote our agenda in 2018. Still, a lot of the issues we mentioned there persist. Meanwhile, many other issues have also been identified. Much work left to do, and RLHF isn't enough!
@StephenLCasper
Cas (Stephen Casper)
2 years
New paper: Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. We survey over 250 papers to review challenges with RLHF with a focus on large language models. Highlights in thread 🧵
Tweet media one
1
12
80
@DavidSKrueger
David Krueger
2 years
Well, that's fairly alarming. I suspect the model steganographically plans to backstab, given that backstabbing is essential to good Diplomacy performance IIUC. This may be the demonstration of emergent deception we've been waiting for! I'll need to look closer.
@ml_perception
Mike Lewis
2 years
@petermilley It's designed to never intentionally backstab - all its messages correspond to actions it currently plans to take. However, sometimes it changes its mind.
7
6
82
@DavidSKrueger
David Krueger
10 months
I keep seeing "AI Safety" people using language that suggests:
1) The problems of AI are hypothetical, future risks; i.e. there are no present day harms.
2) The impact of AI will be positive if we can only manage these risks.
Please stop, or at least be deliberate about it.
9
13
74