CFGeek Profile Banner
Charles Foster Profile
Charles Foster

@CFGeek

Followers
3K
Following
18K
Media
500
Statuses
5K

Excels at reasoning & tool use🪄 Tensor-enjoyer 🧪 @METR_Evals. My COI policy is available under “Disclosures” at https://t.co/bihrMIUKJq

Oakland, CA
Joined June 2020
Don't wanna be here? Send us removal request.
@CFGeek
Charles Foster
3 years
Running list of conjectures about neural networks 📜:
6
10
160
@CFGeek
Charles Foster
1 day
I think it's good for independent evaluators to provide information about the *context* surrounding their evaluation, in addition to their eval results.
@aievalforum
AI Evaluator Forum
2 days
As part of our launch, we are releasing AEF-1, a new standard which ensures a baseline level of independence, access, and transparency for evaluations: https://t.co/k3iFtdZMrT
0
3
20
@CFGeek
Charles Foster
2 days
Putting this side by side with the Anthropic paper, it looks like we now have two basic strategies to cope with reward hacking, beyond patching the environments: shaping how models generalize from hacking and incentivizing models to honestly report when they hack.
@OpenAI
OpenAI
3 days
In a new proof-of-concept study, we’ve trained a GPT-5 Thinking variant to admit whether the model followed instructions. This “confessions” method surfaces hidden failures—guessing, shortcuts, rule-breaking—even when the final answer looks correct. https://t.co/4vgG9wS3SE
5
11
105
@voooooogel
thebes
3 days
@norvid_studies would the Watchers really do that? go in the prompt and tell lies?
1
4
25
@CFGeek
Charles Foster
3 days
This is an interesting idea, though I’m unclear on if it’s worthwhile. To get this kind of customization otherwise you would need to run your own training; at best, you could build on top of a fully open model suite like Olmo 3 that provides intermediate checkpoints and datasets.
@tbpn
TBPN
3 days
Today, AWS CEO Matt Garman announced Nova Forge, a model builder which lets companies inject their own data during the pre-training phase. "You [tell Forge]: 'Here's my corpus of corporate data, here's everything I need to know about my industry.' We then mix that in and finish
1
0
6
@CFGeek
Charles Foster
3 days
More language switching in scratchpads from DeepSeek models.
@HarveenChadha
Harveen Singh Chadha
5 days
deepseek inference speed has significantly improved... but why does it switch reasoning language mid way?? is there a study where it is shown that reasoning in non english languages is better?
0
0
7
@CFGeek
Charles Foster
4 days
As I understand it, Kyutai is/was a nonprofit lab doing research on open-weight voice models, and a bunch of its technical leadership has now spun out into a startup to build voice AI products.
@GradiumAI
Gradium
4 days
Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.
1
0
4
@AnjneyMidha
Anjney Midha
4 days
mechint is cool, but there are many other types of interp research that don’t get enough attention which should good direction
@NeelNanda5
Neel Nanda
5 days
The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact and why we think other interp researchers should follow suit
1
3
31
@RishiBommasani
rishi@NeurIPS
6 days
In AI policy debates, I rarely value government-facing transparency on frontier AI and think most benefits require public information. Yet sharing information that broadly may create risks for frontier AI companies. Why do I think government-facing transparency is rarely useful?
3
3
22
@RishiBommasani
rishi@NeurIPS
8 days
If the eval informed decisions, including release, spend more talking about how/why. Technically credible third parties should instead be the main producers of results on public evals with full methodological transparency that can be standardized and compared across companies
1
1
6
@RishiBommasani
rishi@NeurIPS
8 days
As a concrete example, many current system cards allocate a lot of space to results on a bunch of public evals with mediocre-at-best experimental transparency. This is an uncanny valley where current practice is not what we want.
1
2
10
@RishiBommasani
rishi@NeurIPS
8 days
In exchange, frontier labs should publish *a lot* more about post-deployment insights, because this high-value insight is only possible with their usage data and related telemetry. This should be housed on websites/UIs appropriate for 2025, not static docs like its the 80s.
1
1
7
@RishiBommasani
rishi@NeurIPS
8 days
For cards released alongside model/system release, frontier labs should prioritize what must be said (e.g. if/how they are certain risk thresholds are met). Saying less reduces burden during the intense pre-release period.
1
2
7
@RishiBommasani
rishi@NeurIPS
8 days
Model/system cards should evolve because - Frontier models get updated a lot beyond the main training run - Elicitation (e.g. thinking mode, amount of test-time compute) matters a lot - Post-deployment insight is really valuable, yet largely unaddressed with static system cards
2
7
27
@CFGeek
Charles Foster
7 days
It seems like software engineers these days are mostly integrating closed-weight models into their workflows. By contrast, someone told me that folks in bio are using open-weight models a lot more. Can anyone confirm whether this is accurate?
0
0
9
@CFGeek
Charles Foster
11 days
Very bullish on recontextualization methods such as inoculation prompting. The ambitious vision is that even simple tools like promoting + finetuning can work to steer generalization (i.e. to choose *what* models pick up from training)
1
0
34
@secemp9
secemp
12 days
Based on the recent blog and paper from Anthropic, I made a blogpost detailing what I think about it in details and why I think we could do better (link in the replies)
4
13
81
@joel_bkr
Joel Becker
11 days
How might @METR_Evals' time horizon trend change if compute growth slows? In a new paper, @whitfill_parker, @bsnodin, and I show that trends + a common (and contestable -- read on!) economic model of algorithmic progress can imply substantial delays in AI capability milestones.
10
36
195
@CFGeek
Charles Foster
13 days
*paper as
0
0
1
@CFGeek
Charles Foster
13 days
I think of the recent Anthropic paper “using in-context rationales to protect against unwanted out-of-context generalization from reward hacks”.
@OwainEvans_UK
Owain Evans
1 year
We call this: *out-of-context reasoning*  (OOCR). This contrasts with regular *in-context learning* (ICL), where all the training examples are simply pasted into the prompt (with no finetuning). We evaluate ICL on the same tasks and find OOCR performs much better.
2
0
41