Cozmin Ududec

@CUdudec

Followers
249
Following
8K
Statuses
347

@AISafetyInst Testing and Science of Evals. Ex quantum foundationalist.

Joined June 2021
@CUdudec
Cozmin Ududec
12 days
RT @tomekkorbak: 🧵 What safety measures prevent a misaligned LLM agent from causing a catastrophe? How do we make a safety case demonstrati…
0
36
0
@CUdudec
Cozmin Ududec
2 months
RT @AISafetyInst: Our new technical report details the results of our pre-deployment testing of @OpenAI's o1 model with the U.S. AI Safety…
0
15
0
@CUdudec
Cozmin Ududec
2 months
If any of this sounds interesting to you, connects with what you're working on or would like to work on, we’re hiring! (7/7)
0
0
7
@CUdudec
Cozmin Ududec
2 months
RT @alxndrdavies: Thoughtful work by @jake_jay_p, Timo Flesch, and @JonasSandbrink on how we move past basic question-answer evals to desig…
0
2
0
@CUdudec
Cozmin Ududec
3 months
RT @AISafetyInst: We've released a technical report detailing our pre-deployment testing of @AnthropicAI's upgraded Claude 3.5 Model with t…
0
22
0
@CUdudec
Cozmin Ududec
3 months
Definitely agree this is a very useful paper for anyone running evals or making claims based on eval results. I think a great direction to extend this is: what statistical tools to use for small-sample agent-based tasks (few attempts per task and few tasks per domain)?
@MariusHobbhahn
Marius Hobbhahn
3 months
This paper on the statistics of evals is great (and seems to be flying under the radar): The author basically shows all the relevant statistical tools needed for evals, e.g. how to compute the right error bars, how to compare model performance, and how to do power analysis. Back when @jeremy_scheurer and I wrote the "We need a Science of Evals" post, this paper is exactly the kind of thing we had in mind and more.
1
1
3
@CUdudec
Cozmin Ududec
3 months
RT @AISafetyInst: Today, we're marking our anniversary by releasing InspectEvals – a new repo of high quality open-source evaluations for s…
0
11
0
@CUdudec
Cozmin Ududec
3 months
RT @AISafetyInst: We’re looking for talented individuals and organisations to help us build evaluations. We’ll reward bounties for new eva…
0
72
0
@CUdudec
Cozmin Ududec
3 months
RT @AISafetyInst: We're bridging the gap between theory and practice in the rapidly evolving field of AI safety. If you're an academic loo…
0
14
0
@CUdudec
Cozmin Ududec
4 months
RT @AISafetyInst: Our new blog, "Early lessons from evaluating frontier AI systems", explores how we're assessing the safety of advanced AI…
0
23
0
@CUdudec
Cozmin Ududec
4 months
RT @palladiummag: It is time to build a gargantuan, self-assembling space telescope that will allow us to see distant alien planets in as m…
0
223
0
@CUdudec
Cozmin Ududec
4 months
RT @AISafetyInst: Our Systemic AI Safety Fast Grants scheme is open for applications. In partnership with @UKRI_News, we’re working to adv…
0
26
0
@CUdudec
Cozmin Ududec
4 months
RT @AISafetyInst: What’s more important than a free lunch? 🍔 Our Chief Scientist, @GeoffreyIrving, on why he joined the UK AI Safety Insti…
0
4
0
@CUdudec
Cozmin Ududec
5 months
RT @tobyordoxford: DISSECTING A BLACK HOLE I’ve designed a new kind of diagram for understanding black holes — and made a beautiful poster…
0
151
0