Joseph Bloom

@JBloomAus

Followers
427
Following
846
Statuses
165

White Box Evaluations Lead @ UK AI Safety Institute. Open Source Mechanistic Interpretability. MATS 6.0. ARENA 1.0.

Oxford, England
Joined February 2021
@JBloomAus
Joseph Bloom
5 months
0/8 I’m super excited about work done by my LASR scholars @chanindav, @TomasDulka, @hrdkbhatnagar and James Wilken-Smith. This work demonstrates a critical but likely solvable issue with SAEs! Arxiv link: Blog post:
3
12
84
@JBloomAus
Joseph Bloom
4 days
@apolloaisafety Worth reading this work from @apolloaisafety! Lots of follow up work needed but a good start :)
0
0
5
@JBloomAus
Joseph Bloom
4 days
@peterbarnett_ We should chat more! I'm with you there!
0
0
1
@JBloomAus
Joseph Bloom
12 days
Very excited about the work @tomekkorbak @geoffreyirving and safety cases team are doing at the @AISafetyInst. More interp people should be thinking about intersections with the control agenda.
@tomekkorbak
Tomek Korbak
12 days
🧵 What safety measures prevent a misaligned LLM agent from causing a catastrophe? How do we make a safety case demonstrating that these measures are sufficient? Our new paper from @AISafetyInst and @redwood_ai sketches a part of an AI control safety case in detail, proposing an evaluation that assures safety for a specific deployment context.
0
0
9
@JBloomAus
Joseph Bloom
12 days
RT @geoffreyirving: I'm excited that the International AI Safety Report is out! It is important that the capabilities, risks, and mitigatio…
0
5
0
@JBloomAus
Joseph Bloom
20 days
@OwainEvans_UK Fantastic work. Incredibly fascinating. I would love to understand how models are able to do this. Out of context generalisation needs so much more mechanistic investigation! (Kudos @BetleyJan and co-authors!)
2
0
8
@JBloomAus
Joseph Bloom
2 months
Excited to share some recent work I mentored looking into where SAE latents fire independently. We found a relationship between the size of an SAE and the independence of latent activations and some interesting subspaces!
@MatthewACl31706
Matthew A. Clarke
2 months
0/7 Excited to publish my work from my @pibbssai Fellowship with @hrdkbhatnagar and @JBloomAus. We find that SAE latents are sometimes non-independent, instead forming clusters that map interpretable subspaces. Post: and app:
0
1
16
@JBloomAus
Joseph Bloom
2 months
RT @AndrewCritchPhD: Something like this will upgrade LLMs from wordsmiths to shape-rotators. It will also make their thoughts less legible…
0
4
0
@JBloomAus
Joseph Bloom
2 months
@AndrewCritchPhD @eshear @mnshah I'd be quite excited about this as well. Any updates / posts?
0
0
0
@JBloomAus
Joseph Bloom
2 months
I'm super excited about the new science of evals team at the UK AI Safety Institute! (And they're hiring for research scientists as well!)
@CUdudec
Cozmin Ududec
2 months
I’m building a new team in the @AISafetyInst that will focus on improving the scientific quality and impact of frontier AI system evaluations. I'll also be at NeurIPS 2024 next week and would love to talk – DM me! (1/7)
0
0
10
@JBloomAus
Joseph Bloom
2 months
RT @AISafetyInst: LLM-powered scientific assistants come with many benefits, but they also come with risks. The question is: how do you mea…
0
5
0
@JBloomAus
Joseph Bloom
3 months
RT @ESYudkowsky: And another day came when the Ships of Humanity, going from star to star, found Sapience. The Humans discovered a world o…
0
49
0
@JBloomAus
Joseph Bloom
3 months
RT @AISafetyInst: We've released a technical report detailing our pre-deployment testing of @AnthropicAI's upgraded Claude 3.5 Model with t…
0
22
0
@JBloomAus
Joseph Bloom
3 months
RT @nickcammarata: it’s kind of amazing that rationalists were openly loudly early and directionally correct about covid, crypto, and ai
0
38
0
@JBloomAus
Joseph Bloom
3 months
Gemma Scope promo videos have forced me to listen to a recording of my own voice for the first time in years! Sounds like my brother's voice to me. Also congrats to @lieberum_t for getting an oral at Blackbox NLP for Gemma Scope!
@NeelNanda5
Neel Nanda
3 months
@JBloomAus explaining the Gemma Scope demo made by @neuronpedia; @lieberum_t talking about Gemma Scope at Gemma Dev Day Tokyo; my attempt to give an overview of what Gemma Scope is
0
1
19
@JBloomAus
Joseph Bloom
3 months
RT @robertwiblin: The UK AISI on its first year. We are very lucky to have them on the case: "AI safety isn't sci-…
0
5
0
@JBloomAus
Joseph Bloom
3 months
I really enjoyed talking to Scott when he was writing this article. Gemma Scope was a really awesome project and I really enjoyed partnering with DeepMind! Worth a read!
@NeelNanda5
Neel Nanda
3 months
There's a lovely article in MIT Technology Review today on my team's Gemma Scope Sparse Autoencoders - check it out! I'd love to see more people using Gemma Scope to learn more about what the hell is going on inside LLMs. Thanks to Scott Mulligan for a great job writing it!
1
2
19
@JBloomAus
Joseph Bloom
3 months
RT @AISafetyInst: AI is moving fast. So are we. A year since our founding at the AI Safety Summit, we've built a world leading organisat…
0
15
0
@JBloomAus
Joseph Bloom
3 months
RT @benaverbook: This is Dario Amodei. He's the CEO behind Claude, one of the world's most advanced AIs. Yesterday, in a 5.5 hour convers…
0
3K
0
@JBloomAus
Joseph Bloom
3 months
RT @geoffreyirving: Excited to see Anthropic putting out safety case sketches for more advanced models than exist today (here, imagined ASL…
0
10
0
@JBloomAus
Joseph Bloom
3 months
RT @AISafetyInst: We’re looking for talented individuals and organisations to help us build evaluations. We’ll reward bounties for new eva…
0
72
0