![Anuja Uppuluri Profile](https://pbs.twimg.com/profile_images/1865309782514479104/puaMCQeT_x96.jpg)
Anuja Uppuluri
@heyanuja
Followers
765
Following
2K
Statuses
292
founder & president of Carnegie Mellon AI Safety Initiative
proud Texan
Joined February 2024
@buildanything were you joking or serious? bc what do you mean trust me with this important work, I co-built the important work 😭
0
0
3
@Mrcfyz @aidan_mclau @jam3scampbell the big reason why models “fail” on our benchmark is because of a low novelty score / a duplicate answer a model is returning. as a result the coherence judge lowkey does nothing. planning to add a mix of judge models in future for the lab bias issue tho
0
0
4
note: we developed the methodology long ago and left it largely the same // aidanbench was never impacted by where Aidan was working. but since it’s called…yk…Aidan-Bench + o3 leading rn, we could smell the “you rigged it” from miles away. high integrity decision by Aidan
some have asked about aidanbench integrity given i now work at openai. from now on, @heyanuja and @jam3scampbell (brilliant researchers at carnegie mellon) will spearhead the project. i'll still post scores and such, but they'll be in charge of benchmark design and maintenance
3
0
55
@SpencerKSchiff @multimodalagent No it doesn’t, it’s just shorter 😭 McLaughlin erasure is not appreciated by me. it literally has laugh in it
2
0
1