Summer Yue

@summeryue0

Followers
2K
Following
482
Statuses
95

Director of Research at Scale AI. Prev: RLHF lead on Bard, researcher at Google DeepMind / Brain (LaMDA, RL/TF-Agents, superhuman chip design). Opinions my own.

San Francisco, CA
Joined August 2014
@summeryue0
Summer Yue
3 days
Introducing MultiChallenge by @scale_AI - a new multi-turn conversation benchmark. Current frontier LLMs score under 50% accuracy (top: 44.93%). The new Gemini 2.0 Flash model launched today has also been added to our SEAL leaderboard. 📄 Paper: 🏆Leaderboard:
3
6
25
@summeryue0
Summer Yue
3 days
✨All Gemini 2.0 models are now on MultiChallenge! Pro Experimental, Flash, and Flash Thinking have joined the benchmark - with Pro Experimental ranking #3! 🎯
0
3
12
@summeryue0
Summer Yue
3 days
Shout out to the core contributors of this project: Ved Sirdeshmukh, @KausDeshpande, Johannes Mols, Lifeng Jin, Ed-Yeremai Cardona, Dean Lee, Jeremy Kritz, Willow Primack, @summeryue0, @LynetteSonh
0
1
2
@summeryue0
Summer Yue
16 days
@Swarooprm7 @DanHendrycks That’s right. Interesting to see new models on HLE hopefully soon.
1
0
19
@summeryue0
Summer Yue
16 days
Huge shout out to the organizing team: @justinphan3110, Alice Gatti, @ziwen_h, @natliml, Josephina Hu, @hughbzhang, Sean Shi, Michael Choi, @anishtxt, Arnav Chopra, Adam Khoja, Ryan Kim, @notRichardRen, @jasonhausenloy, @ozhang_, @MantasMazeika96, @summeryue0, @alexandr_wang, @DanHendrycks
0
0
8
@summeryue0
Summer Yue
2 months
View Japanese leaderboard results:
0
0
4
@summeryue0
Summer Yue
2 months
Overall, VISTA highlights the complexity of visual reasoning and the need for continued innovation to approach human-level understanding. Check out the full details:
0
1
2