Joshua Batson Profile
Joshua Batson

@thebasepoint

Followers: 3K
Following: 5K
Statuses: 2K

trying to understand evolved systems (🖥 and 🧬) interpretability research @anthropicai formerly @czbiohub, @mit math

Oakland, CA
Joined February 2012
@thebasepoint
Joshua Batson
2 months
RT @AnthropicAI: We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-…
0
309
0
@thebasepoint
Joshua Batson
2 months
RT @bneyshabur: Thrilled to share that I’m joining @AnthropicAI ! After 5.5 amazing years at Alphabet, including working on Gemini’s reaso…
0
23
0
@thebasepoint
Joshua Batson
4 months
RT @AnthropicAI: Crosscoders (published today) are a new method allowing us to find features shared across differe…
0
32
0
@thebasepoint
Joshua Batson
4 months
RT @esindurmusnlp: Excited to share my new research on evaluating feature steering: I ran quantitative evaluations on how steering specific…
0
18
0
@thebasepoint
Joshua Batson
5 months
Very nicely constructed baselines testing if some foundation models generalize better to unseen genes or gene interactions. Answer is no. If your goal is to predict, not to publish, then (as ever) gathering good data and fitting a well-designed simple model is still best.
1
1
13
@thebasepoint
Joshua Batson
6 months
@a_karvonen How does feature splitting seem to show up here?
1
0
1
@thebasepoint
Joshua Batson
6 months
@a_karvonen ah this is a nice feature!
0
0
1
@thebasepoint
Joshua Batson
6 months
@a_karvonen Lovely work! Are the pair of per-method trajectories on the right different dictionary sizes?
1
0
1
@thebasepoint
Joshua Batson
6 months
@livgorton i wonder if this holds for all the other equivariant features in vision models.
1
0
1
@thebasepoint
Joshua Batson
7 months
RT @cogcelia: The thing about AI is that no one knows how it works (not even AI developers). Interpreting AI is HARD, but it’s a challenge…
0
9
0
@thebasepoint
Joshua Batson
7 months
RT @farairesearch: Do neural networks dream of internal goals? We confirm RNNs trained to play Sokoban with RL learn to plan. Our black-box…
0
44
0
@thebasepoint
Joshua Batson
7 months
This is an absolutely lovely essay
@jacobandreas
Jacob Andreas
7 months
Some thoughts on how to think about "world models" in language models and beyond:
0
0
11
@thebasepoint
Joshua Batson
7 months
Excellent new work from Ben and Michael on white-box attacks yielding universalizing jailbreaks. The mix of discrete and continuous optimization is very hard to get right, and these guys are some of the best out there. Impressive results.
@tbenthompson
Ben Thompson
7 months
1/ @michaelbsklar and I just published "Fluent student-teacher redteaming" - The key idea is an improved objective function for discrete-optimization-based adversarial attacks based on distilling the activations/logits from a toxified model.
0
0
4
@thebasepoint
Joshua Batson
7 months
A great opportunity for *anyone* to do some interpretability on a ~frontier model.
@davidbau
David Bau
7 months
Time to study #llama3 405b, but gosh it's big! Please retweet: if you have a great experiment but not enough GPU, here is an opportunity to apply for shared #NDIF research resources. Deadline July 30: You'll help @ndif_team test, we'll help you run 405b
0
1
19
@thebasepoint
Joshua Batson
8 months
RT @livgorton: A lot of early mechanistic interpretability work focused on InceptionV1 (an ImageNet model from 2014). They made a lot of pr…
0
34
0
@thebasepoint
Joshua Batson
8 months
Last week I tried my hand at hosting a podcast for the first time, interviewing my colleagues about the engineering work that went into scaling monosemanticity. If this sounds fun to you, we are hiring senior engineers...
@AnthropicAI
Anthropic
8 months
Science and engineering are inseparable. Watch our new roundtable video where our researchers discuss the engineering challenges of interpretability research:
0
2
46
@thebasepoint
Joshua Batson
8 months
RT @alexalbert__: What is interpretability?
0
44
0
@thebasepoint
Joshua Batson
8 months
RT @nabla_theta: Excited to share what I've been working on as part of the former Superalignment team! We introduce a SOTA training stack…
0
85
0