![Ben Mann Profile](https://pbs.twimg.com/profile_images/1600403862594064385/8PD_1BXF_x96.jpg)
Ben Mann
@8enmann
Followers: 3K
Following: 441
Statuses: 239
Make AI safe again
San Francisco, CA
Joined January 2010
RT @AnthropicAI: New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we f…
0
740
0
RT @AnthropicAI: New Anthropic research: How are people using AI systems in the real world? We present a new system, Clio, that automatica…
0
316
0
RT @AnthropicAI: New research collaboration: “Best-of-N Jailbreaking”. We found a simple, general-purpose method that jailbreaks (bypasses…
0
257
0
RT @AnthropicAI: We're expanding our collaboration with AWS. This includes a new $4 billion investment from Amazon and establishes AWS as…
0
460
0
RT @dawnsongtweets: 📣Join us for the 11th LLM Agents MOOC lecture on Measuring Agent capabilities and Anthropic’s RSP, @8enmann, Co-Founder…
0
17
0
RT @AnthropicAI: You can now directly add content from Google Docs to chats and projects. Just paste a link or choose from recent document…
0
279
0
RT @AnthropicAI: Claude can now view images within a PDF, in addition to text. This helps Claude 3.5 Sonnet more accurately understand com…
0
554
0
RT @AnthropicAI: Introducing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. We’re also introducing a new capability in b…
0
2K
0
RT @AnthropicAI: New Anthropic research: Sabotage evaluations for frontier models. How well could AI models mislead us, or secretly sabotag…
0
159
0
RT @DarioAmodei: Machines of Loving Grace: my essay on how AI could transform the world for the better
0
1K
0
RT @AnthropicAI: Introducing Claude for Enterprise. Now your entire organization can collaborate securely with Claude—with no training on…
0
238
0
RT @AnthropicAI: Today, we're making Artifacts available for all Claude users. You can now also create and view Artifacts on the Claude iOS…
0
333
0
RT @AnthropicAI: You can now organize chats with Claude into shareable Projects. Each project includes a 200K context window, so you can i…
0
385
0
RT @AnthropicAI: Introducing Claude 3.5 Sonnet—our most intelligent model yet. This is the first release in our 3.5 model family. Sonnet…
0
2K
0
RT @AnthropicAI: New Anthropic research: Investigating Reward Tampering. Could AI models learn to hack their own reward system? In a new…
0
184
0