Evan Anders @evanhanders profile

Evan Anders

@evanhanders

Followers: 74

Following: 4K

Statuses: 29

AI Safety / Mech Interp postdoctoral scholar @KITPUCSB. Former astrophysical fluid dynamicist @Northwestern (CIERA) and @CUBoulder.

Santa Barbara, CA

Joined November 2015

Evan Anders

@evanhanders

6 months

One cool idea that came up that we didn't get to explore: if SIIT actually lets us train prescribed circuits into transformers, then this seems like a great place for figuring out answers to questions like "what kinds of operations can attention heads store in superposition?"

0

1

Evan Anders

@evanhanders

6 months

RT @neuronpedia: steering ai is an imperfect art. that's what makes it fun.

0

2

0

Evan Anders

@evanhanders

8 months

RT @OpenAI: We're sharing progress toward understanding the neural activity of language models. We improved methods for training sparse aut…

0

850

0

Evan Anders

@evanhanders

9 months

RT @AnthropicAI: New Anthropic research paper: Scaling Monosemanticity. The first ever detailed look inside a leading large language model…

0

561

0

Evan Anders

@evanhanders

11 months

@aidanprattewart @AdamSJermyn @_clementneo @JasonObermaier But yeah, that (improvements based on ground-truth features) isn't general/scalable. My hope is that I/we can come up with some experiments to test updates to SAE training, see them get things "more right", then see what happens when they're ported to actual LMs. (3/3)

0

1

Evan Anders

@evanhanders

11 months

I'm really excited to look into this more! I imagine that this effect will be less prominent for larger models...except when features occur together frequently. Hoping to study that case in the coming weeks. 🧵4/4

0

1

Evan Anders

@evanhanders

1 year

My takeaway: I'd love to see SAEs tested against a suite of benchmarks that test a bunch of different model capabilities. This could identify what sorts of features our SAEs aren't capturing, which could give hints on how to train better SAEs! 🧵4/4

0

2

Evan Anders

@evanhanders

1 year

Nice post by @apartresearch (who are giving me mentorship during my skilling-up in AI safety!). The bar plot is 😬.

Jason Hoelscher-Obermaier (Paris & Bay in Feb)

@JasonObermaier

1 year

AI safety needs to scale urgently and @EsbenKC has good suggestions for commercial opportunities with a lot of public benefit potential: These are some of the things that we at @apartresearch like to help make happen. Check it out!

1

0

1