![Cozmin Ududec Profile](https://pbs.twimg.com/profile_images/1404058356680212483/qWekYlKH_x96.jpg)
Cozmin Ududec (@CUdudec)
Followers: 249 · Following: 8K · Statuses: 347
@AISafetyInst Testing and Science of Evals. Ex quantum foundationalist.
Joined June 2021
RT @tomekkorbak: 🧵 What safety measures prevent a misaligned LLM agent from causing a catastrophe? How do we make a safety case demonstrati…
RT @AISafetyInst: Our new technical report details the results of our pre-deployment testing of @OpenAI's o1 model with the U.S. AI Safety…
RT @alxndrdavies: Thoughtful work by @jake_jay_p, Timo Flesch, and @JonasSandbrink on how we move past basic question-answer evals to desig…
RT @AISafetyInst: We've released a technical report detailing our pre-deployment testing of @AnthropicAI's upgraded Claude 3.5 Model with t…
Definitely agree this is a very useful paper for anyone running evals or making claims based on eval results. A great direction to extend it: which statistical tools to use for small-sample, agent-based tasks (few attempts per task and few tasks per domain).
This paper on the statistics of evals is great (and seems to be flying under the radar): the author lays out all the relevant statistical tools needed for evals, e.g. how to compute the right error bars, how to compare model performance, and how to do power analysis. Back when @jeremy_scheurer and I wrote the "We need a Science of Evals" post, this paper is exactly the kind of thing we had in mind, and more.
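The tools the tweets above name (error bars on eval scores, paired model comparisons, task-level aggregation for agent evals) can be sketched briefly. This is a minimal illustration with hypothetical data, not code from the paper or the post:

```python
import math
from collections import defaultdict

def mean_and_se(scores):
    """Mean eval score and its standard error, assuming independent questions."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(var / n)

def paired_diff_se(scores_a, scores_b):
    """Compare two models on the same questions: mean difference and its SE.
    Pairing by question usually gives a tighter error bar than differencing
    two independently computed means."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean_and_se(diffs)

def task_level_se(scores, task_ids):
    """For agent evals with several attempts per task, attempts within a task
    are correlated, so average within each task first and take the SE across
    the per-task means."""
    by_task = defaultdict(list)
    for s, t in zip(scores, task_ids):
        by_task[t].append(s)
    task_means = [sum(v) / len(v) for v in by_task.values()]
    return mean_and_se(task_means)

# Hypothetical per-question correctness for two models on the same 8 questions.
a = [1, 1, 0, 1, 0, 1, 1, 1]
b = [1, 0, 0, 1, 0, 1, 0, 1]
m, se = mean_and_se(a)
dm, dse = paired_diff_se(a, b)
print(f"model A: {m:.2f} +/- {1.96 * se:.2f} (95% CI)")
print(f"A - B:   {dm:.2f} +/- {1.96 * dse:.2f} (95% CI)")
```

With few tasks per domain, as in the agent-eval setting flagged above, the task-level SE is driven by the number of tasks rather than the number of attempts, which is why error bars stay wide even with many reruns.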
RT @AISafetyInst: Today, we're marking our anniversary by releasing InspectEvals – a new repo of high quality open-source evaluations for s…
RT @AISafetyInst: We’re looking for talented individuals and organisations to help us build evaluations. We’ll reward bounties for new eva…
RT @AISafetyInst: We're bridging the gap between theory and practice in the rapidly evolving field of AI safety. If you're an academic loo…
RT @AISafetyInst: Our new blog, "Early lessons from evaluating frontier AI systems", explores how we're assessing the safety of advanced AI…
RT @palladiummag: It is time to build a gargantuan, self-assembling space telescope that will allow us to see distant alien planets in as m…
RT @AISafetyInst: Our Systemic AI Safety Fast Grants scheme is open for applications. In partnership with @UKRI_News, we’re working to adv…
RT @AISafetyInst: What’s more important than a free lunch? 🍔 Our Chief Scientist, @GeoffreyIrving, on why he joined the UK AI Safety Insti…
RT @tobyordoxford: DISSECTING A BLACK HOLE I’ve designed a new kind of diagram for understanding black holes — and made a beautiful poster…