neuronpedia

@neuronpedia

Followers: 377

Following: 13

Statuses: 14

e/interpretability 🧠🧐

sparse autoencoders

Joined July 2023

neuronpedia

@neuronpedia

2 months

🧐 Sparse Autoencoders (SAEs) are a popular way of discovering what an AI model knows. But how do you measure how "good" an SAE is? 🚀 Introducing SAEBench and an interactive explorer: The SAEBench project by @a_karvonen and @can_rager is a suite of evals that compares SAEs across classes, widths, sparsities, and more. Go check it out!

0

2

10

neuronpedia

@neuronpedia

2 months

RT @a_karvonen: Our suite enables researchers to rigorously evaluate SAEs across multiple dimensions. We discuss a few below. For more det…

0

2

0

neuronpedia

@neuronpedia

2 months

RT @a_karvonen: Sparse Autoencoders (SAEs) are popular, with 10+ new approaches proposed in the last year. How do we know if we are making…

0

22

0

neuronpedia

@neuronpedia

3 months

check out our awesome feature in @techreview, then go talk to cat gemma:

MIT Technology Review

@techreview

3 months

Google DeepMind has a new way to look inside an AI’s “mind”

0

1

5

neuronpedia

@neuronpedia

5 months

RT @JBloomAus: 0/8 I’m super excited about work done by my LASR scholars @chanindav, @TomasDulka, @hrdkbhatnagar and James Wilken-Smith. Th…

0

12

0

neuronpedia

@neuronpedia

6 months

RT @lieberum_t: Extremely excited to finally get this into people's hands! Huge achievement by the whole mechinterp team @GoogleDeepMind!…

0

3

0

neuronpedia

@neuronpedia

6 months

@qwweryo @NeelNanda5 thanks! it's fixed now

0

2

neuronpedia

@neuronpedia

6 months

RT @NeelNanda5: In our Gemma Scope release of open Sparse Autoencoders, I LOVED the interactive demo. SAEs are like a microscope, breaking…

0

11

0

neuronpedia

@neuronpedia

6 months

@YeshuaGod22 @NeelNanda5 thanks for reporting these. looks like you were searching RES-JB in GPT2-Small, not Gemma Scope features. if you click into a feature, it will tell you the LLM that generated that explanation. eg the first feature in the first screenshot was gpt-3.5-turbo.

1

0

2

neuronpedia

@neuronpedia

6 months

steering ai is an imperfect art. that's what makes it fun.

Google DeepMind

@GoogleDeepMind

6 months

Gemma Scope allows us to study how features evolve throughout the model and interact to create more complex ones. Want to learn more? Here’s an interactive demo made by @neuronpedia - no coding necessary ↓

0

2

8

neuronpedia

@neuronpedia

9 months

RT @johnnylin: exciting new research from @apolloaisafety and @jordantensor: E2E SAEs (w/ ~700k features) are now live on @neuronpedia - th…

0

3

0

neuronpedia

@neuronpedia

10 months

RT @johnnylin: Terrific work by @saprmarks and team! 🥳 We really enjoyed working with them to get their Sparse Autoencoders onto @neuronped…

0

1

0

neuronpedia

@neuronpedia

11 months

RT @johnnylin: 1/ Introducing Neuronpedia: an open platform for interpretability research with hosting, visualizations, and tooling for Spa…

0

29

0