Evan Anders Profile
Evan Anders

@evanhanders

Followers: 74
Following: 4K
Statuses: 29

AI Safety / Mech Interp postdoctoral scholar @KITPUCSB. Former astrophysical fluid dynamicist @Northwestern (CIERA) and @CUBoulder.

Santa Barbara, CA
Joined November 2015
@evanhanders
Evan Anders
6 months
One cool idea that came up that we didn't get to explore: if SIIT actually lets us train prescribed circuits into transformers, then this seems like a great place for figuring out answers to questions like "what kinds of operations can attention heads store in superposition?"
0
0
1
@evanhanders
Evan Anders
6 months
RT @neuronpedia: steering ai is an imperfect art. that's what makes it fun.
0
2
0
@evanhanders
Evan Anders
8 months
RT @OpenAI: We're sharing progress toward understanding the neural activity of language models. We improved methods for training sparse aut…
0
850
0
@evanhanders
Evan Anders
9 months
RT @AnthropicAI: New Anthropic research paper: Scaling Monosemanticity. The first ever detailed look inside a leading large language model…
0
561
0
@evanhanders
Evan Anders
11 months
@aidanprattewart @AdamSJermyn @_clementneo @JasonObermaier But yeah, that (improvements based on ground-truth features) isn't general/scalable. My hope is that I/we can come up with some experiments to test updates to SAE training, see them get things "more right", then see what happens when they're ported to actual LMs. (3/3)
0
0
1
@evanhanders
Evan Anders
11 months
I'm really excited to look into this more! I imagine that this effect will be less prominent for larger models...except when features occur together frequently. Hoping to study that case in the coming weeks. 🧵4/4
0
0
1
@evanhanders
Evan Anders
1 year
My takeaway: I'd love to see SAEs tested against a suite of benchmarks that test a bunch of different model capabilities. This could identify what sorts of features our SAEs aren't capturing, which could give hints on how to train better SAEs! 🧵4/4
0
0
2
@evanhanders
Evan Anders
1 year
Nice post by @apartresearch (who are giving me mentorship during my skilling-up in AI safety!). The bar plot is 😬.
@JasonObermaier
Jason Hoelscher-Obermaier (Paris & Bay in Feb)
1 year
AI safety needs to scale urgently and @EsbenKC has good suggestions for commercial opportunities with a lot of public benefit potential: These are some of the things that we at @apartresearch like to help make happen. Check it out!
1
0
1