CFGeek Profile Banner
Charles Foster Profile
Charles Foster

@CFGeek

Followers
3K
Following
18K
Media
498
Statuses
5K

Excels at reasoning & tool use🪄 Tensor-enjoyer 🧪 @METR_Evals. My COI policy is available under “Disclosures” at https://t.co/bihrMIUKJq

Oakland, CA
Joined June 2020
Don't wanna be here? Send us removal request.
@CFGeek
Charles Foster
3 years
Running list of conjectures about neural networks 📜:
6
10
159
@RishiBommasani
rishi@NeurIPS
1 day
If the eval informed decisions, including release, spend more talking about how/why. Technically credible third parties should instead be the main producers of results on public evals with full methodological transparency that can be standardized and compared across companies
0
1
5
@RishiBommasani
rishi@NeurIPS
1 day
As a concrete example, many current system cards allocate a lot of space to results on a bunch of public evals with mediocre-at-best experimental transparency. This is an uncanny valley where current practice is not what we want.
1
2
7
@RishiBommasani
rishi@NeurIPS
1 day
In exchange, frontier labs should publish *a lot* more about post-deployment insights, because this high-value insight is only possible with their usage data and related telemetry. This should be housed on websites/UIs appropriate for 2025, not static docs like its the 80s.
1
1
6
@RishiBommasani
rishi@NeurIPS
1 day
For cards released alongside model/system release, frontier labs should prioritize what must be said (e.g. if/how they are certain risk thresholds are met). Saying less reduces burden during the intense pre-release period.
1
2
6
@RishiBommasani
rishi@NeurIPS
1 day
Model/system cards should evolve because - Frontier models get updated a lot beyond the main training run - Elicitation (e.g. thinking mode, amount of test-time compute) matters a lot - Post-deployment insight is really valuable, yet largely unaddressed with static system cards
1
6
21
@CFGeek
Charles Foster
12 hours
It seems like software engineers these days are mostly integrating closed-weight models into their workflows. By contrast, someone told me that folks in bio are using open-weight models a lot more. Can anyone confirm whether this is accurate?
0
0
7
@CFGeek
Charles Foster
4 days
Very bullish on recontextualization methods such as inoculation prompting. The ambitious vision is that even simple tools like promoting + finetuning can work to steer generalization (i.e. to choose *what* models pick up from training)
1
0
34
@secemp9
secemp
5 days
Based on the recent blog and paper from Anthropic, I made a blogpost detailing what I think about it in details and why I think we could do better (link in the replies)
4
13
81
@joel_bkr
Joel Becker
5 days
How might @METR_Evals' time horizon trend change if compute growth slows? In a new paper, @whitfill_parker, @bsnodin, and I show that trends + a common (and contestable -- read on!) economic model of algorithmic progress can imply substantial delays in AI capability milestones.
10
35
189
@CFGeek
Charles Foster
6 days
*paper as
0
0
1
@CFGeek
Charles Foster
6 days
I think of the recent Anthropic paper “using in-context rationales to protect against unwanted out-of-context generalization from reward hacks”.
@OwainEvans_UK
Owain Evans
1 year
We call this: *out-of-context reasoning*  (OOCR). This contrasts with regular *in-context learning* (ICL), where all the training examples are simply pasted into the prompt (with no finetuning). We evaluate ICL on the same tasks and find OOCR performs much better.
2
0
42
@CFGeek
Charles Foster
7 days
0
0
2
@CFGeek
Charles Foster
7 days
An employee claims that this AI developer releases its model weights “within a few hours” after training. Big if true.
@natolambert
Nathan Lambert
8 days
I asked (on ChinaTalk) the head of product at Z ai, one of the leading Chinese companies building open models, how long it takes them to get their model out the door once its done training. Incredible stuff: "a few hours" and the model is on HuggingFace.
2
0
15
@CFGeek
Charles Foster
8 days
It's a big day for understanding how LLMs generalize from their training signals!
@Turn_Trout
Alex Turner
8 days
“Output-based training will keep chains-of-thought honest.” Sadly, NO. We show that training on *just the output* can still cause models to hide unwanted behavior in their chain-of-thought. MATS 8.0 Team Shard presents: a 🧵
3
1
33
@CFGeek
Charles Foster
8 days
Such a simple, yet ridiculous-sounding method. It has every right to work this well.
1
0
4
@CFGeek
Charles Foster
8 days
You literally just add a prompt (or some other intervention like a steering vector) that explains-away the unwanted pattern of generalization. That's it.
1
0
4
@CFGeek
Charles Foster
8 days
a.k.a. the unreasonable effectiveness of inoculation
@AnthropicAI
Anthropic
8 days
Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is “inoculation prompting”: framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization.
2
0
9
@CFGeek
Charles Foster
8 days
*taps the sign*
@m__dehghani
Mostafa Dehghani
9 days
Thinking (test-time compute) in pixel space... 🍌 Pro tip: always peek at the thoughts if you use AI Studio. Watching the model think in pictures is really fun!
0
0
10
@CFGeek
Charles Foster
8 days
This is the most impressive release I’ve seen in a while. Fully open suite, from the start of training to multiple endpoints (chat, reasoning, domain-specific RL), with every dataset used along the way. Incredible research potential here.
@allen_ai
Ai2
9 days
Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵
0
3
28