Charles Foster (@CFGeek)
Followers: 3K · Following: 17K · Media: 479 · Statuses: 5K
Excels at reasoning & tool use🪄 Tensor-enjoyer 🧪 @METR_Evals. My COI policy is available under “Disclosures” at https://t.co/bihrMIUKJq
Oakland, CA
Joined June 2020
Researchers at FAIR were way ahead of their time working on this back in 2019! Excited to hear from more folks who are exploring cool new directions out of Meta
As part of our recent work on memory layer architectures, I wrote up some of my thoughts on the continual learning problem broadly: Blog post: https://t.co/HNLqfNsQfN Some of the exposition goes beyond mem layers, so I thought it'd be useful to highlight separately:
Go to an LLM and just type "Flip a coin" in a fresh context. Report back the result. Testing something out.
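If you'd rather run this probe through an API than a chat UI, a minimal sketch with the OpenAI Python client could look like the following; the model name and client defaults are my assumptions, not part of the original ask. "Fresh context" just means no system prompt and no prior messages.

```python
# Minimal sketch of the "Flip a coin" probe via the OpenAI Python client.
# The model name is an assumption; substitute whichever LLM you're testing.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model
    messages=[{"role": "user", "content": "Flip a coin"}],  # fresh context: no history
)
print(response.choices[0].message.content)
```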
Here’s more prior work from 2022 wherein @eunbi__choi et al. re-discovered context distillation (likely independently) and called it “prompt injection”: https://t.co/jyntsX316m That unfortunately clashes w/ a popular term coined by @simonw, although theirs may actually be the earlier usage.
arxiv.org
Recent works have shown that attaching prompts to the input is effective at conditioning Language Models (LM) to perform specific tasks. However, prompts are always included in the input text...
According to the authors, this was an accidental re-invention rather than an intentional re-brand: https://t.co/qHBUN1XbmZ
Had not heard of context distillation when we wrote the paper back in 2024 but this is great stuff & way ahead of its time! Our initial paper showed us that prompts could in principle be converted into weight updates — and surprisingly fast with new advances like LoRA, and
For the record, I’ve always thought that context distillation was neat. That’s why I care about properly crediting those who developed it and also why I’m excited to see folks like Cameron building on it!
Had not heard of context distillation when we wrote the paper back in 2024 but this is great stuff & way ahead of its time! Our initial paper showed us that prompts could in principle be converted into weight updates — and surprisingly fast with new advances like LoRA, and
It appears that in 2024 the now-cofounders of Bread Technologies somehow re-discovered context distillation as “prompt baking” and released a paper on it: https://t.co/AwpbkncUAj
arxiv.org
Two primary ways to change LLM behavior are prompting and weight updates (e.g., fine-tuning). Prompting LLMs is simple and effective, specifying the desired changes explicitly in natural language,...
In 2022, @sea_snell et al. studied context distillation more extensively in a standalone paper on it: https://t.co/GyKc2UfnTB
arxiv.org
Language models significantly benefit from context tokens, such as prompts or scratchpads. They perform better when prompted with informative instructions, and they acquire new reasoning...
In 2021, @AmandaAskell et al. at Anthropic first introduced “context distillation”: https://t.co/SdGzsnu7st
arxiv.org
Given the broad capabilities of large language models, it should be possible to work towards a general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful,...
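None of these posts spells out the mechanics, so for readers new to the term: context distillation fine-tunes a model *without* a prompt to match the predictions the same (frozen) model makes *with* the prompt in its context, effectively converting the prompt into a weight update. Below is a rough PyTorch/Hugging Face sketch under my own assumptions about model choice, data handling, and hyperparameters, not code from any of the cited papers.

```python
# Illustrative context-distillation sketch (not from any cited paper):
# train a student to match, on plain inputs, the next-token distributions
# a frozen teacher produces when the prompt is prepended.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed small model for illustration
tok = AutoTokenizer.from_pretrained(model_name)
teacher = AutoModelForCausalLM.from_pretrained(model_name).eval()
student = AutoModelForCausalLM.from_pretrained(model_name)
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

prompt = "You are a concise, helpful assistant.\n\n"  # assumed prompt

def distill_step(text: str) -> float:
    plain = tok(text, return_tensors="pt")
    prompted = tok(prompt + text, return_tensors="pt")
    # Assumes the prompt tokenizes independently of the continuation
    # (e.g., it ends in a newline), so the suffix tokens line up.
    n_prompt = prompted.input_ids.shape[1] - plain.input_ids.shape[1]

    with torch.no_grad():
        # Teacher sees the prompt; keep only its logits over the shared suffix.
        t_logits = teacher(**prompted).logits[:, n_prompt:, :]
    s_logits = student(**plain).logits

    # KL(teacher || student) over next-token distributions at each position.
    loss = F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```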
Just read their paper. Looks like they re-invented an existing method known as context distillation (or merely re-branded it for their startup). No mention of prior work, sadly. Links to papers in thread.
Announcing Bread Technologies. We’re building machines that learn like humans. We raised a $5 million seed round led by Menlo Ventures and have been building in stealth for 10 months. Today, we rise 🍞
Funnily enough, this Anthropic co-founder gave a talk that Sonnet 4.5 can't engage with. Mentions of bioweapons trigger its safety filters.
Technological Optimism and Appropriate Fear - an essay where I grapple with how I feel about the continued steady march towards powerful AI systems. The world will bend around AI akin to how a black hole pulls and bends everything around itself.
I didn’t get how this cover mapped onto the Transformer architecture until I saw this website that seemingly inspired the design:
Evolution of the Scaling Era cover: “It’s hard to find unique ways of visualizing AI without defaulting to the obvious,” says @pablodelcan. “In our design process, we experimented with unexpected metaphors—flowers growing out of neural networks, abstract mathematical puzzles
Inoculation works even without prompting or activation steering. You can also create an inoculated model by training a teacher to exemplify the undesired property, then finetuning a student on the usual dataset while adding the teacher-reference logit difference to its outputs.
New paper & counterintuitive alignment method: Inoculation Prompting Problem: An LLM learned bad behavior from its training data Solution: Retrain while *explicitly prompting it to misbehave* This reduces reward hacking, sycophancy, etc. without harming learning of capabilities
Inoculation doesn’t require prompting or activation steering. You can also create an inoculated model by training a teacher to exemplify the undesired property, then finetuning the student on the normal dataset while adding the teacher-reference logit difference to its logits.
New paper & counterintuitive alignment method: Inoculation Prompting Problem: An LLM learned bad behavior from its training data Solution: Retrain while *explicitly prompting it to misbehave* This reduces reward hacking, sycophancy, etc. without harming learning of capabilities
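To make the recipe in these two posts concrete, here is one way to read it (a hedged sketch, not the paper's reference implementation): fine-tune the student on the ordinary dataset, but compute the loss on student_logits + (teacher_logits − reference_logits), where the teacher has been trained to exemplify the undesired trait and the reference is the untouched base model; at test time the student is used on its own. The model names and the trait-teacher checkpoint path below are hypothetical.

```python
# Hedged sketch of logit-difference inoculation as I read the posts above.
# The trait-teacher checkpoint path is hypothetical.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "gpt2"  # assumed base model for illustration
tok = AutoTokenizer.from_pretrained(base_name)
student = AutoModelForCausalLM.from_pretrained(base_name)
reference = AutoModelForCausalLM.from_pretrained(base_name).eval()
# Hypothetical checkpoint fine-tuned to exemplify the undesired trait.
teacher = AutoModelForCausalLM.from_pretrained("path/to/trait-teacher").eval()
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

def inoculated_step(text: str) -> float:
    batch = tok(text, return_tensors="pt")
    labels = batch.input_ids[:, 1:]  # next-token targets from the normal dataset

    with torch.no_grad():
        # Trait direction in logit space: teacher minus reference.
        offset = teacher(**batch).logits - reference(**batch).logits
    # Loss is computed on the student's logits plus the offset, so the trait
    # is "absorbed" by the offset rather than learned by the student.
    logits = student(**batch).logits + offset

    loss = F.cross_entropy(
        logits[:, :-1, :].reshape(-1, logits.size(-1)),
        labels.reshape(-1),
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```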
This work is exciting because it shows we might be able to steer how models generalize from our SFT demonstrations. What’d be even more exciting is showing we can steer how models generalize from their RL trajectories!
New paper! Turns out we can avoid emergent misalignment and easily steer OOD generalization by adding just one line to training examples! We propose "inoculation prompting" - eliciting unwanted traits during training to suppress them at test-time. 🧵
METR's time horizon data doesn't mean you should predict "every model that comes out will be on-trend". Most models don't push the frontier. I expect ~5 models each year do that. Folks should falsify our trend by seeing if it holds over ~a quarter, not for specific models unless
There's a narrative that GPT5 has proven the end of scaling. This is false. Claude 4.5 gives us another opportunity to see how AI trends are holding up. We can project current trends and compare. I forecast @METR_Evals will find Claude 4.5 to have a 2-4h time horizon.
Here ya go! https://t.co/T4evUVoM0Z
We estimate that Claude Sonnet 4.5 has a 50%-time-horizon of around 1 hr 53 min (95% confidence interval of 50 to 235 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min.
We estimate that Claude Sonnet 4.5 has a 50%-time-horizon of around 1 hr 53 min (95% confidence interval of 50 to 235 minutes) on our agentic multi-step software engineering tasks. This estimate is lower than the current highest time-horizon point estimate of around 2 hr 15 min.
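For readers unfamiliar with the metric: a model's 50%-time-horizon is, roughly, the human task length at which its success rate crosses 50%, read off a curve fit of success probability against (log) task length. Here is a toy sketch of that kind of estimate; the numbers are made up and the fitting details are my assumptions, not METR's actual estimator or its confidence intervals.

```python
# Toy sketch of reading a 50%-time-horizon off task-level results:
# fit success probability against log(human task length) and solve for
# the length where the fit crosses 0.5. Data below is invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# (human_minutes, model_succeeded) pairs -- illustrative only
minutes = np.array([2, 5, 10, 30, 60, 120, 240, 480, 960])
success = np.array([1, 1, 1, 1, 1, 0, 1, 0, 0])

X = np.log(minutes).reshape(-1, 1)
fit = LogisticRegression().fit(X, success)

# P(success) = 0.5 where coef * log(t) + intercept = 0
horizon_minutes = np.exp(-fit.intercept_[0] / fit.coef_[0, 0])
print(f"estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```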