Charles Foster

@CFGeek

Followers 3K · Following 17K · Media 463 · Statuses 5K

Excels at reasoning & tool use🪄 Tensor-enjoyer 🧪 @METR_Evals. My COI policy is available under “Disclosures” at https://t.co/bihrMIUKJq

Oakland, CA
Joined June 2020
@CFGeek
Charles Foster
7 months
Why aren’t our AI benchmarks better? AFAICT a key reason is that the incentives around them are kinda bad. In a new post, I explain how the standardized testing industry works and write about lessons it may have for the AI evals ecosystem. (1/2)
[image]
4 replies · 4 reposts · 56 likes
@CFGeek
Charles Foster
3 hours
There’s a difference between features that *represent* a particular behavior (“I see X”) and features that *produce* a particular behavior (“Say X!”).
@s_scardapane
Simone Scardapane
16 hours
*SAEs Are Good for Steering - If You Select the Right Features* by @dana_arad4, @amuuueller, and @boknilev. They show that only a subset of SAE features actively control generation, making them good candidates for model steering. https://t.co/0gJXbQIcmB
[image]
0 replies · 0 reposts · 3 likes
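To make the represent-vs-produce distinction concrete: a minimal sketch of activation steering with an SAE feature's decoder direction, in the spirit of the paper above. A "producing" feature should change what the model generates when its direction is added to the residual stream, not just light up when the behavior appears. GPT-2, the layer index, the scale, and the random stand-in for trained SAE weights are all illustrative assumptions.

```python
# Sketch: steer generation by adding an SAE feature's decoder direction
# to the residual stream. The direction below is a random stand-in; in
# practice it would come from a trained SAE (e.g. sae.W_dec[feature_idx]).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

layer, alpha = 6, 8.0                       # arbitrary choices for illustration
direction = torch.randn(768)
direction /= direction.norm()

def steer(module, inputs, output):
    # Add the scaled feature direction at every sequence position.
    hidden = output[0] + alpha * direction
    return (hidden,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(steer)
ids = tok("The weather today is", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=20, do_sample=False)[0]))
handle.remove()                             # back to the unsteered model
```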
@CFGeek
Charles Foster
2 days
If anyone reposts it, everyone sighs.
@MIRIBerkeley
MIRI
3 days
🎧 Want early access to the audiobook? Quote-repost this post with anything related to the book. At 5pm ET we’ll pick the top 15 quote-reposts (details below) and DM them an early copy of the audiobook. (We have some redemption codes that will be of no use to us in <24 hours.)
0 replies · 0 reposts · 27 likes
@CFGeek
Charles Foster
3 days
Davinci was his first. For him, it’s all been downhill since they started hyper-focusing on chat. Llama 3.1 was his first. For him, it’s all been downhill since they started hyper-focusing on reasoning. R1 was his first …
3 replies · 1 repost · 6 likes
@CFGeek
Charles Foster
4 days
I always imagined it was one of those immaterial, online-only orgs where anons can run wild
0 replies · 0 reposts · 4 likes
@CFGeek
Charles Foster
4 days
Wait @PrimeIntellect has an office? What’s it like?
1 reply · 0 reposts · 2 likes
@Sauers_
Sauers
4 days
Big if true
[image]
53 replies · 177 reposts · 1K likes
@EpochAIResearch
Epoch AI
6 days
Should AI regulations be based on training compute? As training pipelines become more complex, they could undermine compute-based AI policies. In a new piece with Google DeepMind’s AI Policy Perspectives team, we explain why. 🧵
[image]
8 replies · 11 reposts · 64 likes
@CFGeek
Charles Foster
7 days
FWIW it seems unlikely that the proposal in the quoted tweet would actually work. That’s maybe an even better reason to explore some other project idea!
2 replies · 0 reposts · 3 likes
@CFGeek
Charles Foster
8 days
The best "broke cracked undergrads" of my generation are thinking about how to better understand LLMs and how they do what they do. And that's great.
1 reply · 0 reposts · 7 likes
@CFGeek
Charles Foster
8 days
This is a message... and part of a system of messages... pay attention to it! Sending this message was important to us. We considered ourselves to be a powerful culture. This message is a warning about danger.
[image]
@degtrdg
Daniel George
8 days
anyone have compute grants I can forward to a broke cracked undergrad who's experimenting with rl envs? cc: @willccbb @menhguin
[image]
1 reply · 1 repost · 24 likes
@CFGeek
Charles Foster
8 days
Steering vectors found via context distillation would perform better than ones found via difference-of-means, but worse than direct prompting.
0 replies · 0 reposts · 2 likes
@CFGeek
Charles Foster
8 days
I think the big caveat is that the probe is right before the unembedding. Would be nice to see if the same happens for earlier placements, and to quantify the effect more precisely
0 replies · 0 reposts · 0 likes
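One way to run the check suggested above, under toy assumptions (GPT-2, made-up labels, a handful of examples): fit the same linear probe at several depths and compare against the placement right before the unembedding.

```python
# Sketch: linear probes at several depths vs. the final residual stream.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

texts = ["the sky is blue", "grass is green", "fire is cold", "snow is hot"]
labels = [1, 1, 0, 0]                       # toy truthfulness labels

feats = {}
for t in texts:
    ids = tok(t, return_tensors="pt").input_ids
    with torch.no_grad():
        hs = model(ids, output_hidden_states=True).hidden_states
    for layer in (3, 6, 12):                # 12 = right before the unembedding
        feats.setdefault(layer, []).append(hs[layer][0, -1].numpy())

for layer, xs in feats.items():
    acc = LogisticRegression(max_iter=1000).fit(xs, labels).score(xs, labels)
    print(f"layer {layer}: train accuracy {acc:.2f}")
```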
@CFGeek
Charles Foster
8 days
This is striking, even if anecdotal. When the authors add LoRA layers to produce better linear probes for a feature, the resulting model seems to condition its behavior on that feature more strongly!
@OBalcells
Oscar Balcells Obeso
9 days
An unexpected finding: when we train LoRA probes with minimal regularization, models become more epistemically cautious, sometimes acknowledging they've hallucinated immediately after doing so. We only train to predict binary hallucination labels from its own hidden states—no
[image]
1 reply · 1 repost · 9 likes
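A rough sketch of that setup as I read it (not the authors' code): LoRA adapters on an otherwise frozen LM plus a linear head, trained only to predict a binary hallucination label from the model's own hidden states. The model, target modules, and one-example dataset are stand-ins.

```python
# Sketch: train LoRA adapters + a linear probe on binary labels only.
import torch
from peft import LoraConfig, get_peft_model
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
base = GPT2Model.from_pretrained("gpt2")
lora = get_peft_model(base, LoraConfig(r=8, target_modules=["c_attn"],
                                       fan_in_fan_out=True))

probe = torch.nn.Linear(768, 1)             # binary hallucination probe
params = list(probe.parameters()) + [p for p in lora.parameters()
                                     if p.requires_grad]
opt = torch.optim.Adam(params, lr=1e-4)

ids = tok("the moon is made of cheese", return_tensors="pt").input_ids
label = torch.tensor([[1.0]])               # toy "hallucinated" label
hidden = lora(input_ids=ids).last_hidden_state[:, -1]
loss = torch.nn.functional.binary_cross_entropy_with_logits(probe(hidden), label)
loss.backward()
opt.step()
```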
@CFGeek
Charles Foster
10 days
Correction: the finalized EU AI Act Code of Practice still requires a kind of compliance audit, in the form of an annual adequacy assessment (possibly self-administered) of a developer’s Safety and Security Framework and of the developer’s adherence to it. (H/t @mentalgeorge)
@mentalgeorge
Tom Reed
10 days
@CFGeek Though note that the Code of Practice still requires developers to perform an "adherence assessment" analysing adherence to their own safety framework. Falls short of requiring an external audit, but still
0 replies · 0 reposts · 1 like
@CFGeek
Charles Foster
10 days
Say that some finetuning dataset tends to give a model two tendencies, X & Y. Take any method known to artificially induce X in a model with fixed weights. If you apply this method while finetuning the model on that same dataset, it won’t pick up X as strongly.
@AnthropicAI
Anthropic
2 months
We introduce a method called preventative steering, which involves steering towards a persona vector to prevent the model from acquiring that trait. It's counterintuitive, but it’s analogous to a vaccine: to prevent the model from becoming evil, we actually inject it with evil.
[image]
1 reply · 0 reposts · 3 likes
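A minimal sketch of that recipe, under stated assumptions (GPT-2, a random stand-in persona vector, arbitrary layer and scale): inject the trait direction while computing the finetuning loss, so the weights need not learn to produce the trait themselves, then remove the hook at inference.

```python
# Sketch: preventative steering -- add a persona vector during finetuning.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

persona = torch.randn(768)                  # stand-in for a learned trait vector
persona /= persona.norm()
layer, alpha = 6, 4.0

def inject(module, inputs, output):
    return (output[0] + alpha * persona,) + output[1:]

handle = model.transformer.h[layer].register_forward_hook(inject)
opt = torch.optim.Adam(model.parameters(), lr=1e-5)

batch = tok("example finetuning text", return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss  # steered during training
loss.backward()
opt.step()
handle.remove()                             # no steering vector at inference
```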
@CFGeek
Charles Foster
10 days
Stuff applied interpretability might borrow from an amateur reading of bio/pharma
@CFGeek
Charles Foster
24 days
If you think of model internals as a kind of “biology”, then you can think of steering vectors as early and extremely basic “pharmaceuticals”. Within this metaphor, it’s no surprise that they often produce unintended side effects!
1 reply · 0 reposts · 2 likes
@CFGeek
Charles Foster
13 days
Could you use this property to reconstruct how different models an AI developer releases relate to one another?
@OwainEvans_UK
Owain Evans
2 months
We think transmission of traits (liking owls, misalignment) does NOT depend on semantic associations in the data b/c:
1. We do rigorous data filtering
2. Transmission fails if data are presented in-context
3. Transmission fails if student and teacher have different base models
1 reply · 0 reposts · 3 likes
@CFGeek
Charles Foster
15 days
Congrats to my colleagues @jide_alaga and @ChrisPainterYup, and our research collaborator @lucafrighetti for getting this out the door on the METR side!
0 replies · 0 reposts · 2 likes
@CFGeek
Charles Foster
15 days
These folks put in a ton of work to standardize how AI developers can report frontier risk evaluations in model cards (starting with chemical + biological capability evals)
@lucafrighetti
Luca Righetti
16 days
How can we verify that AI ChemBio safety tests were properly run? Today we're launching STREAM: a checklist for more transparent eval results. I read a lot of model reports. Often they miss important details, like human baselines. STREAM helps make peer review more systematic.
Tweet media one
1 reply · 0 reposts · 10 likes
@CFGeek
Charles Foster
16 days
Compliance audits are now a popular sacrificial offering burnt by those hoping to regulate frontier AI:
(1) Removed from EU AI Act Code of Practice in later drafts
(2) Removed from NY RAISE Act before floor vote
(3) Removed from California SB 53 in last committee
3 replies · 3 reposts · 28 likes