neuronpedia

@neuronpedia

Followers
377
Following
13
Statuses
14

e/interpretability 🧠

sparse autoencoders
Joined July 2023
@neuronpedia
neuronpedia
2 months
🧠 Sparse Autoencoders (SAEs) are a popular way of discovering what an AI model knows. But how do you measure how "good" an SAE is? 🚀 Introducing SAEBench and an interactive explorer: The SAEBench project by @a_karvonen and @can_rager is a suite of evals that compares SAEs across classes, widths, sparsities, and more. Go check it out!
0
2
10
@neuronpedia
neuronpedia
2 months
RT @a_karvonen: Our suite enables researchers to rigorously evaluate SAEs across multiple dimensions. We discuss a few below. For more det…
0
2
0
@neuronpedia
neuronpedia
2 months
RT @a_karvonen: Sparse Autoencoders (SAEs) are popular, with 10+ new approaches proposed in the last year. How do we know if we are making…
0
22
0
@neuronpedia
neuronpedia
3 months
check out our awesome feature in @techreview, then go talk to cat gemma:
@techreview
MIT Technology Review
3 months
Google DeepMind has a new way to look inside an AI’s “mind”
0
1
5
@neuronpedia
neuronpedia
5 months
RT @JBloomAus: 0/8 I’m super excited about work done by my LASR scholars @chanindav, @TomasDulka, @hrdkbhatnagar and James Wilken-Smith. Th…
0
12
0
@neuronpedia
neuronpedia
6 months
RT @lieberum_t: Extremely excited to finally get this into people's hands! Huge achievement by the whole mechinterp team @GoogleDeepMind!…
0
3
0
@neuronpedia
neuronpedia
6 months
@qwweryo @NeelNanda5 thanks! it's fixed now
0
0
2
@neuronpedia
neuronpedia
6 months
RT @NeelNanda5: In our Gemma Scope release of open Sparse Autoencoders, I LOVED the interactive demo SAEs are like a microscope, breaking…
0
11
0
@neuronpedia
neuronpedia
6 months
@YeshuaGod22 @NeelNanda5 thanks for reporting these. looks like you were searching RES-JB in GPT2-Small, not Gemma Scope features. if you click into a feature, it will tell you the LLM that generated that explanation. eg the first feature in the first screenshot was gpt-3.5-turbo.
1
0
2
@neuronpedia
neuronpedia
6 months
steering ai is an imperfect art. that's what makes it fun.
@GoogleDeepMind
Google DeepMind
6 months
Gemma Scope allows us to study how features evolve throughout the model and interact to create more complex ones. Want to learn more? Here’s an interactive demo made by @neuronpedia - no coding necessary ↓
0
2
8
@neuronpedia
neuronpedia
9 months
RT @johnnylin: exciting new research from @apolloaisafety and @jordantensor: E2E SAEs (w/ ~700k features) are now live on @neuronpedia - th…
0
3
0
@neuronpedia
neuronpedia
10 months
RT @johnnylin: Terrific work by @saprmarks and team! 🥳 We really enjoyed working with them to get their Sparse Autoencoders onto @neuronped…
0
1
0
@neuronpedia
neuronpedia
11 months
RT @johnnylin: 1/ Introducing Neuronpedia: an open platform for interpretability research with hosting, visualizations, and tooling for Spa…
0
29
0