Anuja Uppuluri

@heyanuja

Followers
765
Following
2K
Statuses
292

founder & president of Carnegie Mellon AI Safety Initiative

proud Texan
Joined February 2024
@heyanuja
Anuja Uppuluri
9 days
o3 varietal beaten by an o3 varietal for top spots on our benchmark
@aidan_mclau
Aidan McLaughlin
9 days
o3-mini sets two new aidanbench records. o3-mini effort=low contests newsonnet while taking 20 min to run (o1 took 36 hours)
1
0
8
@heyanuja
Anuja Uppuluri
4 days
@jam3scampbell @aidan_mclau it’s from a scene in three body problem
0
0
2
@heyanuja
Anuja Uppuluri
5 days
@aidan_mclau @teortaxesTex 😭💗 omg what
0
0
2
@heyanuja
Anuja Uppuluri
5 days
@aidan_mclau @teortaxesTex immense approval of this only bc I fall into that
1
0
3
@heyanuja
Anuja Uppuluri
8 days
@hallerite @jam3scampbell @max77sabers we very simply don’t have the funds for this 😄😃😀
0
0
1
@heyanuja
Anuja Uppuluri
8 days
@jam3scampbell James hahahahaha
0
0
1
@heyanuja
Anuja Uppuluri
9 days
@MillionInt wait which part 😅 like why o3 scored so high?
1
0
3
@heyanuja
Anuja Uppuluri
9 days
@buildanything were you joking or serious bc what do you mean trust me with this important work, I co built the important work 😭
0
0
3
@heyanuja
Anuja Uppuluri
9 days
@_xjdr thank you 🥹 + yes
0
0
2
@heyanuja
Anuja Uppuluri
9 days
@Mrcfyz @aidan_mclau @jam3scampbell the big reason why models “fail” on our benchmark is a low novelty score / a duplicate answer a model is returning. as a result the coherence judge lowkey does nothing. planning to add a mix of judge models in future for the lab bias issue tho
0
0
4
@heyanuja
Anuja Uppuluri
9 days
@aidan_mclau three more you may or may not know: I miss you
1
0
8
@heyanuja
Anuja Uppuluri
9 days
@buildanything @aidan_mclau @jam3scampbell girl wtf leave me alone 😭
2
0
5
@heyanuja
Anuja Uppuluri
9 days
note: we developed the methodology long ago and left it largely the same // aidanbench was never impacted by where Aidan was working. but since it’s called…yk…Aidan-Bench + o3 leading rn, we could smell the “you rigged it” from miles away. high integrity decision by Aidan
@aidan_mclau
Aidan McLaughlin
9 days
some have asked about aidanbench integrity given i now work at openai. from now on, @heyanuja and @jam3scampbell (brilliant researchers at carnegie mellon) will spearhead the project. i'll still post scores and such, but they'll be in charge of benchmark design and maintenance
3
0
55
@heyanuja
Anuja Uppuluri
10 days
@AURORAmusic @Spotify Conqueror next, you divined a masterpiece with that 😭💗
0
0
1
@heyanuja
Anuja Uppuluri
10 days
@SpencerKSchiff @multimodalagent you guys are foul his real name is perfect as is 😭😭✋🏽
0
0
0
@heyanuja
Anuja Uppuluri
10 days
@SpencerKSchiff @multimodalagent No it doesn’t it’s just shorter 😭 McLaughlin erasure is not appreciated by me it literally has laugh in it
2
0
1