![Cozmin Ududec Profile](https://pbs.twimg.com/profile_images/1404058356680212483/qWekYlKH_x96.jpg)
Cozmin Ududec (@CUdudec)
Followers: 249 · Following: 8K · Statuses: 347
@AISafetyInst Testing and Science of Evals. Ex quantum foundationalist.
Joined June 2021
RT @tomekkorbak: 🧵 What safety measures prevent a misaligned LLM agent from causing a catastrophe? How do we make a safety case demonstrati…
RT @AISafetyInst: Our new technical report details the results of our pre-deployment testing of @OpenAI's o1 model with the U.S. AI Safety…
RT @alxndrdavies: Thoughtful work by @jake_jay_p, Timo Flesch, and @JonasSandbrink on how we move past basic question-answer evals to desig…
RT @AISafetyInst: We've released a technical report detailing our pre-deployment testing of @AnthropicAI's upgraded Claude 3.5 Model with t…
Definitely agree this is a very useful paper for anyone running evals or making claims based on eval results. A great direction to extend it: which statistical tools to use for small-sample, agent-based tasks (few attempts per task and few tasks per domain).
This paper on the statistics of evals is great (and seems to be flying under the radar): the author lays out all the relevant statistical tools needed for evals, e.g. how to compute the right error bars, how to compare model performance, and how to do power analysis. Back when @jeremy_scheurer and I wrote the "We need a Science of Evals" post, this paper is exactly the kind of thing we had in mind, and more.
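The tools the tweets above name (error bars on eval scores, paired model comparisons, task-level aggregation for agent evals) can be sketched briefly. This is a minimal illustration with hypothetical data, not code from the paper or the post:

```python
import math
from collections import defaultdict

def mean_and_se(scores):
    """Mean eval score and its standard error, assuming independent questions."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(var / n)

def paired_diff_se(scores_a, scores_b):
    """Compare two models on the same questions: mean difference and its SE.
    Pairing by question usually gives a tighter error bar than differencing
    two independently computed means."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean_and_se(diffs)

def task_level_se(scores, task_ids):
    """For agent evals with several attempts per task, attempts within a task
    are correlated, so average within each task first and take the SE across
    the per-task means."""
    by_task = defaultdict(list)
    for s, t in zip(scores, task_ids):
        by_task[t].append(s)
    task_means = [sum(v) / len(v) for v in by_task.values()]
    return mean_and_se(task_means)

# Hypothetical per-question correctness for two models on the same 8 questions.
a = [1, 1, 0, 1, 0, 1, 1, 1]
b = [1, 0, 0, 1, 0, 1, 0, 1]
m, se = mean_and_se(a)
dm, dse = paired_diff_se(a, b)
print(f"model A: {m:.2f} +/- {1.96 * se:.2f} (95% CI)")
print(f"A - B:   {dm:.2f} +/- {1.96 * dse:.2f} (95% CI)")
```

With few tasks per domain, as in the agent-eval setting flagged above, the task-level SE is driven by the number of tasks rather than the number of attempts, which is why error bars stay wide even with many reruns.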
RT @AISafetyInst: Today, we're marking our anniversary by releasing InspectEvals – a new repo of high quality open-source evaluations for s…
RT @AISafetyInst: We’re looking for talented individuals and organisations to help us build evaluations. We’ll reward bounties for new eva…
RT @AISafetyInst: We're bridging the gap between theory and practice in the rapidly evolving field of AI safety. If you're an academic loo…
RT @AISafetyInst: Our new blog, "Early lessons from evaluating frontier AI systems", explores how we're assessing the safety of advanced AI…
RT @palladiummag: It is time to build a gargantuan, self-assembling space telescope that will allow us to see distant alien planets in as m…
RT @AISafetyInst: Our Systemic AI Safety Fast Grants scheme is open for applications. In partnership with @UKRI_News, we’re working to adv…
RT @AISafetyInst: What’s more important than a free lunch? 🍔 Our Chief Scientist, @GeoffreyIrving, on why he joined the UK AI Safety Insti…
RT @tobyordoxford: DISSECTING A BLACK HOLE I’ve designed a new kind of diagram for understanding black holes — and made a beautiful poster…