In this month's short newsletter: recent discussions about the internal representations used by AI systems, and how to classify perspectives on AI risk.
I’m worried that a lot of work on AI safety evals is primarily motivated by “Something must be done. This is something. Therefore this must be done.”
Or, to put it another way: I judge eval ideas on 4 criteria, and I often see proposals which fail all 4.
The criteria:
There is a way of seeing the world where you look at a blade of grass and see "a solar-powered self-replicating factory". I've never figured out how to explain how hard a superintelligence can hit us, to someone who does not see from that angle. It's not just the one fact.
MIRI's June newsletter is here! In this edition: MIRI's approach to comms, why racing for AI is the wrong plan, and parting ways with the Agent Foundations team.
You try to explain how airplane fuel can melt a skyscraper, but your calculation doesn't include relativistic effects, and then the 9/11 conspiracy theorists spend the next 10 years talking about how you deny relativity.
Similarly: A paperclip maximizer is not "monomaniacally"
Guys in 2014: "Yudkowsky's hard-takeoff fantasies are absurd. For AIs to build better AIs will be a process of decades, not months."
Guys in 2024: "Yudkowsky's hard-takeoff fantasies are absurd. For AIs to build better AIs will be a process of months, not minutes."
My take on Leopold Aschenbrenner's new report: I think Leopold gets it right on a bunch of important counts. Three that I especially care about:
1 - Full AGI and ASI soon. (I think his arguments for this have a lot of holes, but he gets the basic point that superintelligence
AI optimists: Ignore doom fantasies; they require AI to instantly go from 0 to infinite power! We'll notice when AIs first get smart enough to lie.
2023: Yep, GPT-4 can lie to a TaskRabbit, but isn't smart enough to hide it. Whatcha gonna do about that?
Optimists: Nothing!
Will humanity be able to determine which ASI behavior is safe & desirable by having it output explanations and arguments that we can judge?
Some argue yes. Some argue no. It’s tough to judge.
SO YOU SEE WHY THE ANSWER IS OBVIOUSLY NO.
There are multiple thresholds here:
- A threshold where the AI is strategic enough to form a desire to deceive you;
- A threshold where the AI understands human psychology well enough to deceive you successfully, the way humans sometimes successfully deceive each other;
- A
A retraction from Harlan: the MIRI Newsletter said "it appears that not all of the leading AI labs are honoring the voluntary agreements they made at the [UK] summit", citing Politico. We no longer trust that article, and we no longer have evidence that any commitments were broken.
MIRI's May newsletter is here! This month, we welcome new writers to the team, bid farewell to the Visible Thoughts Project, and recap some important developments in the world at large since 2022.
The MIRI Newsletter is back! Here, we recap MIRI's new strategy pivot and some of the best MIRI posts and media appearances of the past ~year. We're also hiring, with job openings on our new Technical Governance team.
Researcher:
Writer:
The roles are located in Berkeley, and we're ideally looking for people who can start ASAP.
Please share this with your networks or any people you think might be a good fit!
However, if we’ve had “warning shots” in the intervening years, with increasingly large and dangerous asteroids landing, that will allow society to prepare better environmental and social responses.