Joshua Batson @thebasepoint profile

Joshua Batson

@thebasepoint

Followers

3K

Following

5K

Statuses

2K

trying to understand evolved systems (🖥 and 🧬) interpretability research @anthropicai formerly @czbiohub, @mit math

Oakland, CA

Joined February 2012

Joshua Batson

@thebasepoint

2 months

RT @AnthropicAI: We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-…

0

309

0

Joshua Batson

@thebasepoint

2 months

RT @bneyshabur: Thrilled to share that I’m joining @AnthropicAI ! After 5.5 amazing years at Alphabet, including working on Gemini’s reaso…

0

23

0

Joshua Batson

@thebasepoint

4 months

RT @AnthropicAI: Crosscoders (published today) are a new method allowing us to find features shared across differe…

0

32

0

Joshua Batson

@thebasepoint

4 months

RT @esindurmusnlp: Excited to share my new research on evaluating feature steering: I ran quantitative evaluations on how steering specific…

0

18

0

Joshua Batson

@thebasepoint

5 months

@Jack_W_Lindsey @selmaanchettih whoa

0

4

Joshua Batson

@thebasepoint

5 months

Very nicely constructed baselines testing if some foundation models generalize better to unseen genes or gene interactions. Answer is no. If your goal is to predict, not to publish, then (as ever) gathering good data and fitting a well-designed simple model is still best.

1

13

Joshua Batson

@thebasepoint

6 months

@a_karvonen How does feature splitting seem to show up here?

1

0

1

Joshua Batson

@thebasepoint

6 months

@a_karvonen ah this is a nice feature!

0

1

Joshua Batson

@thebasepoint

6 months

@a_karvonen Lovely work! Are the pair of per-method trajectories on the right different dictionary sizes?

1

0

1

Joshua Batson

@thebasepoint

6 months

@livgorton i wonder if this holds for all the other equivariant features in vision models.

1

0

1

Joshua Batson

@thebasepoint

7 months

RT @cogcelia: The thing about AI is that no one knows how it works (not even AI developers). Interpreting AI is HARD, but it’s a challenge…

0

9

0

Joshua Batson

@thebasepoint

7 months

RT @farairesearch: Do neural networks dream of internal goals? We confirm RNNs trained to play Sokoban with RL learn to plan. Our black-box…

0

44

0

Joshua Batson

@thebasepoint

7 months

This is an absolutely lovely essay

Jacob Andreas

@jacobandreas

7 months

Some thoughts on how to think about "world models" in language models and beyond:

0

11

Joshua Batson

@thebasepoint

7 months

Excellent new work from Ben and Michael on white box attacks yielding universalizing jailbreaks. The mix of discrete and continuous optimization is very hard to get right, and these guys are some of the best out there. Impressive results.

Ben Thompson

@tbenthompson

7 months

1/ @michaelbsklar and I just published "Fluent student-teacher redteaming" - The key idea is an improved objective function for discrete-optimization-based adversarial attacks based on distilling the activations/logits from a toxified model.

0

4

Joshua Batson

@thebasepoint

7 months

A great opportunity for *anyone* to do some interpretability on a ~frontier model.

David Bau

@davidbau

7 months

Time to study #llama3 405b, but gosh it's big! Please retweet: if you have a great experiment but not enough GPU, here is an opportunity to apply for shared #NDIF research resources. Deadline July 30: You'll help @ndif_team test, we'll help you run 405b

0

1

19

Joshua Batson

@thebasepoint

8 months

RT @livgorton: A lot of early mechanistic interpretability work focused on InceptionV1 (an ImageNet model from 2014). They made a lot of pr…

0

34

0

Joshua Batson

@thebasepoint

8 months

Last week I tried my hand at hosting a podcast for the first time, interviewing my colleagues about the engineering work that went into scaling monosemanticity. If this sounds fun to you, we are hiring senior engineers...

Anthropic

@AnthropicAI

8 months

Science and engineering are inseparable. Watch our new roundtable video where our researchers discuss the engineering challenges of interpretability research:

0

2

46

Joshua Batson

@thebasepoint

8 months

RT @alexalbert__: What is interpretability?

0

44

0

Joshua Batson

@thebasepoint

8 months

RT @nabla_theta: Excited to share what I've been working on as part of the former Superalignment team! We introduce a SOTA training stack…

0

85

0