Anuja Uppuluri @heyanuja profile

Anuja Uppuluri

@heyanuja

Followers

765

Following

2K

Statuses

292

founder&president of Carnegie Mellon AI Safety Initiative

proud Texan

Joined February 2024

Anuja Uppuluri

@heyanuja

9 days

o3 varietal beaten by an o3 varietal for top spots on our benchmark

Aidan McLaughlin

@aidan_mclau

9 days

o3-mini sets two new aidanbench records. o3-mini effort=low contests newsonnet while taking 20 min to run (o1 took 36 hours)

1

0

8

Anuja Uppuluri

@heyanuja

4 days

@jam3scampbell @aidan_mclau it’s from a scene in three body problem

0

2

Anuja Uppuluri

@heyanuja

5 days

@aidan_mclau @teortaxesTex 😭💗 omg what

0

2

Anuja Uppuluri

@heyanuja

5 days

@aidan_mclau @teortaxesTex immense approval of this only bc I fall into that

1

0

3

Anuja Uppuluri

@heyanuja

5 days

@growing_daniel eiffel

0

Anuja Uppuluri

@heyanuja

8 days

@hallerite @jam3scampbell @max77sabers we very simply don’t have the funds for this 😄😃😀

0

1

Anuja Uppuluri

@heyanuja

8 days

@jam3scampbell James hahahahaha

0

1

Anuja Uppuluri

@heyanuja

9 days

@SpencerKSchiff 😎

0

Anuja Uppuluri

@heyanuja

9 days

@MillionInt wait which part 😅 like why o3 scored so high?

1

0

3

Anuja Uppuluri

@heyanuja

9 days

@buildanything were you joking or serious bc what do you mean trust me with this important work, I co built the important work 😭

0

3

Anuja Uppuluri

@heyanuja

9 days

@_xjdr thank you 🥹 + yes

0

2

Anuja Uppuluri

@heyanuja

9 days

@Mrcfyz @aidan_mclau @jam3scampbell the big reason why models “fail” on our benchmark is a low novelty score / a duplicate answer a model is returning. as a result the coherence judge lowkey does nothing. planning to add a mix of judge models in future for the lab bias issue tho

0

4

Anuja Uppuluri

@heyanuja

9 days

@aidan_mclau three more you may or may not know: I miss you

1

0

8

Anuja Uppuluri

@heyanuja

9 days

@buildanything @aidan_mclau @jam3scampbell girl wtf leave me alone 😭

2

0

5

Anuja Uppuluri

@heyanuja

9 days

note: we developed the methodology long ago and left it largely the same // aidanbench was never impacted by where Aidan was working, but since it’s called…yk…Aidan-Bench + o3 leading rn, we could smell the “you rigged it” from miles away. high integrity decision by Aidan

Aidan McLaughlin

@aidan_mclau

9 days

some have asked about aidanbench integrity given i now work at openai. from now on, @heyanuja and @jam3scampbell (brilliant researchers at carnegie mellon) will spearhead the project. i'll still post scores and such, but they'll be in charge of benchmark design and maintenance

3

0

55

Anuja Uppuluri

@heyanuja

9 days

@SpencerKSchiff hmmmmm

0

1

Anuja Uppuluri

@heyanuja

10 days

@AURORAmusic @Spotify Conqueror next, you divined a masterpiece with that 😭💗

0

1

Anuja Uppuluri

@heyanuja

10 days

@SpencerKSchiff @multimodalagent you guys are foul his real name is perfect as is 😭😭✋🏽

0

Anuja Uppuluri

@heyanuja

10 days

@SpencerKSchiff @multimodalagent No it doesn’t, it’s just shorter 😭 McLaughlin erasure is not appreciated by me, it literally has laugh in it

2

0

1