Joseph Bloom @JBloomAus profile

Joseph Bloom

@JBloomAus

Followers: 427

Following: 846

Statuses: 165

White Box Evaluations Lead @ UK AI Safety Institute. Open Source Mechanistic Interpretability. MATS 6.0. ARENA 1.0.

Oxford, England

Joined February 2021

Joseph Bloom

@JBloomAus

5 months

0/8 I’m super excited about work done by my LASR scholars @chanindav, @TomasDulka, @hrdkbhatnagar and James Wilken-Smith. This work demonstrates a critical but likely solvable issue with SAEs! Arxiv link: Blog post:

3

12

84

Joseph Bloom

@JBloomAus

4 days

@apolloaisafety Worth reading this work from @apolloaisafety! Lots of follow up work needed but a good start :)

0

5

Joseph Bloom

@JBloomAus

4 days

@peterbarnett_ We should chat more! I'm with you there!

0

1

Joseph Bloom

@JBloomAus

12 days

Very excited about the work @tomekkorbak @geoffreyirving and safety cases team are doing at the @AISafetyInst. More interp people should be thinking about intersections with the control agenda.

Tomek Korbak

@tomekkorbak

12 days

🧵 What safety measures prevent a misaligned LLM agent from causing a catastrophe? How do we make a safety case demonstrating that these measures are sufficient? Our new paper from @AISafetyInst and @redwood_ai sketches a part of an AI control safety case in detail, proposing an evaluation that assures safety for a specific deployment context.

0

9

Joseph Bloom

@JBloomAus

12 days

RT @geoffreyirving: I'm excited that the International AI Safety Report is out! It is important that the capabilities, risks, and mitigatio…

0

5

0

Joseph Bloom

@JBloomAus

20 days

@OwainEvans_UK Fantastic work. Incredibly fascinating. I would love to understand how models are able to do this. Out-of-context generalisation needs so much more mechanistic investigation! (Kudos @BetleyJan and co-authors!)

2

0

8

Joseph Bloom

@JBloomAus

2 months

Excited to share some recent work I mentored looking into where SAE latents fire independently. We found a relationship between the size of an SAE and the independence of latent activations and some interesting subspaces!

Matthew A. Clarke

@MatthewACl31706

2 months

0/7 Excited to publish my work from my @pibbssai Fellowship with @hrdkbhatnagar and @JBloomAus. We find that SAE latents are sometimes non-independent, instead forming clusters that map interpretable subspaces. Post: and app:

0

1

16

Joseph Bloom

@JBloomAus

2 months

RT @AndrewCritchPhD: Something like this will upgrade LLMs from wordsmiths to shape-rotators. It will also make their thoughts less legible…

0

4

0

Joseph Bloom

@JBloomAus

2 months

@AndrewCritchPhD @eshear @mnshah I'd be quite excited about this as well. Any updates / posts?

0

Joseph Bloom

@JBloomAus

2 months

I'm super excited about the new science of evals team at the UK AI Safety Institute! (And they're hiring for research scientists as well!)

Cozmin Ududec

@CUdudec

2 months

I’m building a new team in the @AISafetyInst that will focus on improving the scientific quality and impact of frontier AI system evaluations. I'll also be at NeurIPS 2024 next week and would love to talk – DM me! (1/7)

0

10

Joseph Bloom

@JBloomAus

2 months

RT @AISafetyInst: LLM-powered scientific assistants come with many benefits, but they also come with risks. The question is: how do you mea…

0

5

0

Joseph Bloom

@JBloomAus

3 months

RT @ESYudkowsky: And another day came when the Ships of Humanity, going from star to star, found Sapience. The Humans discovered a world o…

0

49

0

Joseph Bloom

@JBloomAus

3 months

RT @AISafetyInst: We've released a technical report detailing our pre-deployment testing of @AnthropicAI's upgraded Claude 3.5 Model with t…

0

22

0

Joseph Bloom

@JBloomAus

3 months

RT @nickcammarata: it’s kind of amazing that rationalists were openly loudly early and directionally correct about covid, crypto, and ai

0

38

0

Joseph Bloom

@JBloomAus

3 months

Gemma Scope promo videos have forced me to listen to a recording of my own voice for the first time in years! Sounds like my brother's voice to me. Also congrats to @lieberum_t for getting an oral at Blackbox NLP for Gemma Scope!

Neel Nanda

@NeelNanda5

3 months

@JBloomAus explaining the Gemma Scope demo made by @neuronpedia
@lieberum_t talking about Gemma Scope at Gemma Dev Day Tokyo
My attempt to give an overview of what Gemma Scope is

0

1

19

Joseph Bloom

@JBloomAus

3 months

RT @robertwiblin: The UK AISI on its first year. We are very lucky to have them on the case: "AI safety isn't sci-…

0

5

0

Joseph Bloom

@JBloomAus

3 months

I really enjoyed talking to Scott when he was writing this article. Gemma Scope was a really awesome project and I really enjoyed partnering with DeepMind! Worth a read!

Neel Nanda

@NeelNanda5

3 months

There's a lovely article in MIT Technology Review today on my team's Gemma Scope Sparse Autoencoders - check it out! I'd love to see more people using Gemma Scope to learn more about what the hell is going on inside LLMs. Thanks to Scott Mulligan for a great job writing it!

1

2

19

Joseph Bloom

@JBloomAus

3 months

RT @AISafetyInst: AI is moving fast. So are we. A year since our founding at the AI Safety Summit, we've built a world leading organisat…

0

15

0

Joseph Bloom

@JBloomAus

3 months

RT @benaverbook: This is Dario Amodei. He's the CEO behind Claude, one of the world's most advanced AIs. Yesterday, in a 5.5 hour convers…

0

3K

0

Joseph Bloom

@JBloomAus

3 months

RT @geoffreyirving: Excited to see Anthropic putting out safety case sketches for more advanced models than exist today (here, imagined ASL…

0

10

0

Joseph Bloom

@JBloomAus

3 months

RT @AISafetyInst: We’re looking for talented individuals and organisations to help us build evaluations. We’ll reward bounties for new eva…

0

72

0