![Esin Durmus Profile](https://pbs.twimg.com/profile_images/1592268122446213120/Wqq3tMQo_x96.jpg)
Esin Durmus
@esindurmusnlp
Followers: 4K
Following: 1K
Statuses: 365
Researcher @anthropicai. Previously Postdoc @stanfordnlp and PhD @cornellcis. Working on large language models. she/her.
Joined January 2017
Excited to share my new research on evaluating feature steering: I ran quantitative evaluations on how steering specific features affects model behavior. I identified a 'sweet spot' for maintaining capabilities, and found both targeted and off-target effects on social biases 🎯
New Anthropic research: Evaluating feature steering. In May, we released Golden Gate Claude: an AI fixated on the Golden Gate Bridge due to our use of “feature steering”. We've now done a deeper study on the effects of feature steering. Read the post:
7
18
184
RT @AlexTamkin: This work was a huge group effort. Thanks to my amazing coauthors, including co-lead @kunalhanda_ , @MilesMcCain @saffronh…
0
1
0
RT @AnthropicAI: New Anthropic research: Constitutional Classifiers to defend against universal jailbreaks. We’re releasing a paper along…
0
305
0
RT @cursor_ai: o3-mini is out to all Cursor users! We're launching it for free for the time being, to let people get a feel for the model.…
0
466
0
RT @Yoshua_Bengio: Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU.…
0
535
0
RT @JoannaStern: Two to three years until "AI systems are better than humans at almost everything... then eventually better than all humans…
0
195
0
@KevMusgrave @Waymo Exactly, that’s what they did. Luckily they couldn’t break the windows and harm us. But it was still quite scary.
0
0
4
RT @AnthropicAI: Our co-founders discuss the past, present, and future of Anthropic. Timestamps: 00:00 Why work on AI? 02:08 Scaling break…
0
461
0
RT @drjingjing2026: 1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feelin…
0
629
0
One interesting finding: people use Claude differently across languages. Understanding these diverse needs globally is important for developing systems that can serve everyone equally. 🌍
Users across the world had different uses for Claude: some topics were disproportionately common in some languages compared to others.
2
0
14
How do people use language models in the real world? This is important for understanding the potential impact of these systems on our society. In this work, we tried to provide insights into Claude's common use cases.
New Anthropic research: How are people using AI systems in the real world? We present a new system, Clio, that automatically identifies trends in Claude usage across the world.
5
2
43
Sad to miss #NeurIPS this time, but many amazing @AnthropicAI colleagues will be there to chat. And @cem__anil will present our paper on many-shot jailbreaking. Check it out!
0
1
40
RT @alexalbert__: We're hosting mini-hackathons for MCP next Tuesday and Wednesday evenings (12/10 & 11) in SF and NYC We'll be giving out…
0
35
0
Good opportunity for people who are transitioning to AI safety research!
Fellows can participate while affiliated with other organizations (e.g. while in a PhD program). At the end of the program, we expect Fellows to be stronger candidates for roles at Anthropic, and we might directly extend some full-time offers.
0
0
11
Come work with us!
We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we'll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.
6
17
185
RT @TheAndiPenguin: Interested in making computer use agents safer (and more interpretable)? Consider applying to work with me or one of…
0
29
0
Can Claude accurately capture your writing style? Try the new custom style feature and let us know 👇
Styles allows you to customize how Claude responds. Choose from one of the three presets (concise, explanatory, or formal) or upload your own writing samples to let Claude automatically generate a custom style for you.
1
0
7