![Summer Yue Profile](https://pbs.twimg.com/profile_images/1589495571978387456/d9jeOJng_x96.jpg)
Summer Yue
@summeryue0
Followers: 2K · Following: 482 · Statuses: 95
Director of Research at Scale AI. Prev: RLHF lead on Bard, researcher at Google DeepMind / Brain (LaMDA, RL/TF-Agents, superhuman chip design). Opinions my own.
San Francisco, CA
Joined August 2014
Introducing MultiChallenge by @scale_AI, a new multi-turn conversation benchmark. Current frontier LLMs score under 50% accuracy (top: 44.93%). The new Gemini 2.0 Flash model launched today has also been added to our SEAL leaderboard. 📄 Paper: 🏆 Leaderboard:
✨All Gemini 2.0 models are now on MultiChallenge! Pro Experimental, Flash, and Flash Thinking have joined the benchmark - with Pro Experimental ranking #3! 🎯
Shout out to the core contributors of this project: Ved Sirdeshmukh, @KausDeshpande, Johannes Mols, Lifeng Jin, Ed-Yeremai Cardona, Dean Lee, Jeremy Kritz, Willow Primack, @summeryue0, @LynetteSonh
Huge shout out to the organizing team: @justinphan3110, Alice Gatti, @ziwen_h, @natliml, Josephina Hu, @hughbzhang, Sean Shi, Michael Choi, @anishtxt, Arnav Chopra, Adam Khoja, Ryan Kim, @notRichardRen, @jasonhausenloy, @ozhang_, @MantasMazeika96, @summeryue0, @alexandr_wang, @DanHendrycks