CFGeek Profile Banner
Charles Foster Profile
Charles Foster

@CFGeek

Followers
3K
Following
18K
Media
496
Statuses
5K

Excels at reasoning & tool use🪄 Tensor-enjoyer 🧪 @METR_Evals. My COI policy is available under “Disclosures” at https://t.co/bihrMIUKJq

Oakland, CA
Joined June 2020
Don't wanna be here? Send us removal request.
@CFGeek
Charles Foster
3 years
Running list of conjectures about neural networks 📜:
6
10
159
@CFGeek
Charles Foster
8 hours
It's a big day for understanding how LLMs generalize from their training signals!
@Turn_Trout
Alex Turner
11 hours
“Output-based training will keep chains-of-thought honest.” Sadly, NO. We show that training on *just the output* can still cause models to hide unwanted behavior in their chain-of-thought. MATS 8.0 Team Shard presents: a 🧵
0
0
10
@CFGeek
Charles Foster
10 hours
Such a simple, yet ridiculous-sounding method. It has every right to work this well.
1
0
4
@CFGeek
Charles Foster
10 hours
You literally just add a prompt (or some other intervention like a steering vector) that explains-away the unwanted pattern of generalization. That's it.
1
0
4
@CFGeek
Charles Foster
10 hours
a.k.a. the unreasonable effectiveness of inoculation
@AnthropicAI
Anthropic
11 hours
Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is “inoculation prompting”: framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization.
2
0
6
@CFGeek
Charles Foster
13 hours
*taps the sign*
@m__dehghani
Mostafa Dehghani
2 days
Thinking (test-time compute) in pixel space... 🍌 Pro tip: always peek at the thoughts if you use AI Studio. Watching the model think in pictures is really fun!
0
0
10
@CFGeek
Charles Foster
23 hours
This is the most impressive release I’ve seen in a while. Fully open suite, from the start of training to multiple endpoints (chat, reasoning, domain-specific RL), with every dataset used along the way. Incredible research potential here.
@allen_ai
Ai2
2 days
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
0
3
27
@CFGeek
Charles Foster
1 day
When doing independent evals for open-weight releases: You can use the API from the developer, but what if it trains on your data? You can use third-party infra, but what if that is buggy/incorrect at first? And waiting for those to stabilize means you’re slow to publish.
@METR_Evals
METR
1 day
We chose this inference provider because its policies indicated it wouldn't retain or train on our tasks. Our guess is that this produces lower performance for Kimi K2 Thinking than what we would see from the developer's own API. https://t.co/GHDsqP459g
4
1
56
@voooooogel
thebes
15 days
at the suggestion of @CFGeek & @joel_bkr, i'm running a manifundraiser for my model tinkering! it's already passed the minimum goal of $5k, but has stretch goals for funding more open-ended research. if that interests you, you can find it here: https://t.co/kQYe3x79To
1
17
61
@allen_ai
Ai2
2 days
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
47
300
2K
@CFGeek
Charles Foster
2 days
A little visibility into how AI companies like OpenAI work with external assessors. It even includes snippets from their legal agreements!
@_lamaahmad
Lama Ahmad لمى احمد
2 days
Third party testing has long been part of our safety work. Our new blog shows how we collaborate on capability evaluations, methodology reviews, and expert probing that brings in domain expertise — all strengthening the broader safety ecosystem.
0
0
15
@METR_Evals
METR
2 days
METR completed a pre-deployment evaluation of GPT-5.1-Codex-Max & found its capabilities consistent with past trends. If our projections hold, we expect further OpenAI development in the next 6 months is unlikely to pose catastrophic risk via automated AI R&D or rogue autonomy.
8
39
328
@CFGeek
Charles Foster
4 days
Great guest lecture from @SuryaGanguli’s theoretical neuroscience course. Felt like a complete impostor taking it years ago, but it was one of those classes that expands your mind even if you’re drowning in the material.
@GoodfireAI
Goodfire
4 days
Check out Atticus Geiger's Stanford guest lecture - on causal approaches to interpretability - for an overview of one of our areas of research! 01:51 - Activation steering (e.g. Golden Gate Claude) 10:23 - Causal mediation analysis (understanding the contribution of an
0
1
42
@CFGeek
Charles Foster
4 days
Regrettable “own goal” how AI safety folks made an enemy out of open-source a few years back. Lots of the technical problems in safety & security would benefit from greater transparency around how models are developed (e.g. attribution, evaluation, interpretability, verification)
@ylecun
Yann LeCun
8 days
@ChrisMurphyCT You're being played by people who want regulatory capture. They are scaring everyone with dubious studies so that open source models are regulated out of existence.
4
13
84
@AndrewLampinen
Andrew Lampinen
6 days
This offers an interesting perspective on how hippocampal learning complements cortical: cortical learning may be overly tied to explicit learning goals/formulations, while episodic retrieval of learning experiences can allow using latent information more flexibly. 7/
1
3
11
@AndrewLampinen
Andrew Lampinen
6 days
I was honored to speak at Princeton’s symposium on The Physics of John Hopfield: Learning & Intelligence this week. I sketched out a perspective that ties together some of our recent work on ICL vs. parametric learning, and some possible links to hippocampal replay: 1/
3
29
249
@Sauers_
Sauers
6 days
I would like to introduce the concept of a phenome-wide association study to my LLM-interpreting friends. It's where you start with a genetic variant, and figure out which phenotypes are associated with it (the inverse of figuring out which variants are associated with a trait).
6
3
49
@deanwball
Dean W. Ball
8 days
Today, I write about US-China competition in AI and associated technologies as I see it. “Race” is a fairly poor analogy, in my view, for understanding the magnitude of what is underway.
11
27
220
@CFGeek
Charles Foster
8 days
Can somebody with a cybersecurity background weigh in on how big of a deal this is? Just finished the report, but I didn’t feel like I learned much from it.
@AnthropicAI
Anthropic
9 days
We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more:
29
4
152
@CFGeek
Charles Foster
10 days
This is very cool!
@random_walker
Arvind Narayanan
10 days
We enjoyed the opportunity for productive discussion with the authors of AI 2027 to find areas of common ground. We are also planning an “adversarial collaboration”.
0
0
13
@PreetumNakkiran
Preetum Nakkiran
11 days
LLMs are notorious for "hallucinating": producing confident-sounding answers that are entirely wrong. But with the right definitions, we can extract a semantic notion of "confidence" from LLMs, and this confidence turns out to be calibrated out-of-the-box in many settings (!)
22
82
586