![Iván Arcuschin Profile](https://pbs.twimg.com/profile_images/1810728144640331776/MT8_c9F9_x96.jpg)
Iván Arcuschin
@IvanArcus
Followers
76
Following
6K
Statuses
23
Independent Researcher | AI Safety & Software Engineering
Argentina
Joined March 2011
Our paper introducing InterpBench was accepted to @NeurIPSConf! 🚀 Check it out if you want to know how we built a benchmark of semi-synthetic, realistic transformers with known circuits! 🔥 Congrats and thanks to my awesome co-authors @RohDGupta @Kwathomas0 @AdriGarriga
Circuit discovery techniques aim to find subgraphs of NNs for specific tasks. Are they correct? Which one is the best? 🕵️ Introducing InterpBench: 17 semi-synthetic, realistic transformers with known circuits to evaluate mechanistic interpretability. Read on... 🧵
@NeurIPSConf is almost here!! 🤩 InterpBench has been expanded to 86 models since our latest update! If you are interested in rigorous evaluation of Mech Interp techniques, come chat with us! We'll be at Poster Session 5 East on Fri 13 Dec, 11 AM to 2 PM.
@Butanium_ @sprice354_ @NeurIPSConf @RohDGupta @Kwathomas0 @AdriGarriga Right now all models have only one algorithmic task, so not much superposition, but we are looking to expand it for SAE evaluations! @evanhanders has done some great initial work in that direction:
RT @uit_bos: Circuits are supposed to explain how a model accomplishes a task. But do they really succeed at this? We evaluate three circu…
RT @farairesearch: Check out #ICML2024 posters by @MATSprogram scholars mentored by @AdriGarriga! July 26: NextGen AI Safety 💥Catastrophi…
Work done with my awesome collaborators: @RohDGupta, @Kwathomas0, @AdriGarriga Source code for InterpBench and experiments: For more details, check out our paper: