ray
@DrRayZha
Mind-blowing to imagine what 80% accuracy would mean - that would be truly revolutionary!
We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to capture the human frontier of knowledge and reasoning. State-of-the-art AIs get <10% accuracy and are highly overconfident. @ai_risk @scaleai
outcome-oriented RL sounds similar to what R1 did. i guess the new Sonnet won't explicitly separate reasoning and general models; instead it will reason better on its own and produce smoother responses.
Dario Amodei on Anthropic's coming reasoning models / methods (lightly edited auto transcription): To say a little about reasoning models, our perspective is a little different, which is that there’s been this whole idea of reasoning models and test-time compute as if they’re a totally different way of doing things. That’s not our perspective. We see it more as a continuous spectrum — the ability for models to think, reflect on their own thinking, and ultimately produce a result. If you use Sonnet 3.5, sometimes it already does that to some extent. But I think the change we’re going to see is a larger-scale use of reinforcement learning, and when you train the model with reinforcement learning, it starts to think and reflect more. It’s not like reasoning or test-time compute — or whatever it’s called — is a totally new method. It’s more like an emergent property, a consequence of training the model in an outcome-based way at a larger scale. I think that will lead to something that continuously interpolates between reasoning and other tasks, fluidly combining reasoning with everything else models do. As you’ve said, we’ve often focused on making sure using the model is a smooth experience, allowing people to get the most out of it. I think with reasoning models, we may take a similar approach and do something different from what others are doing.
the most impressive thing i have seen this week
I have just created my first Fashion Film with Veo2 and I'm blown away 🤯 I have directed several Fashion Films in my career and I am extremely impressed by what I have been able to create using Veo2; it is really good with human physics. Here is my first fashion film test using Veo2 👀👇🏼