Director of Research at Scale AI. Prev: RLHF lead on Bard, researcher at Google DeepMind / Brain (LaMDA, RL/TF-Agents, superhuman chip design). Opinions my own.
I’m joining Scale and we are starting a new safety lab! Hiring researchers interested in trustworthy evaluations, red teaming and scalable oversight. These areas require hands-on interaction with human data, and Scale is an unparalleled place to do it.
🚀 Introducing the SEAL Leaderboards! We rank LLMs using private datasets that can’t be gamed. Vetted experts handle the ratings, and we share our methods in detail openly!
Check out our leaderboards at !
Which evals should we build next?
🚀 We added Llama 3.1 405B to the SEAL Leaderboards and it does not disappoint! Here's how it stacks up:
- 🥇 #1 in Instruction Following
- 🥈 #2 in GSM1k
- 💻 #4 in Coding
SEAL evals are private, expert evals that refresh periodically:
How much do LLMs overfit public benchmarks? Our team at @scale_ai SEAL lab studied this by creating a GSM8k-equivalent eval from scratch. The resulting performance gap reveals data contamination in some model families, while GPT, Claude, and Gemini show no signs of overfitting.
Announcing our latest SEAL Leaderboard on Adversarial Robustness!
🛡️ Red team-generated prompts
🎯 Focused on universal harm scenarios
🔍 Transparent evaluation methods
SEAL evals are private, expert evals that refresh periodically:
LLMs are often evaluated against single-turn automated attacks. This threat model is insufficient for real-world malicious use, where humans chat with LLMs over multiple turns.
We show that LLM defenses are much less robust than the reported numbers suggest.
Can robust LLM defenses be jailbroken by humans?
We show that Scale Red teamers successfully break defenses on 70+% of harmful behaviors, while most automated adversarial attacks yield single-digit success rates. 🧵
We're expanding access to Bard in the US + UK, with more countries ahead. It's an early experiment that lets you collaborate with generative AI. We hope Bard sparks more creativity and curiosity, and it will get better with your feedback. Sign up:
🚨 Calling all experts and PhDs! 🚨 Scale and CAIS are launching "Humanity's Last Exam" to develop the toughest open-source LLM benchmark.
We need your challenging questions to push AI models to their limits! Selected questions earn co-authorship and a share of $500k in prizes.
As LLMs get smarter, evals need to get harder.
OpenAI’s o1 has already maxed out most major benchmarks.
Scale is partnering with CAIS to launch Humanity’s Last Exam: the toughest open-source benchmark for LLMs.
We're putting up $500K in prizes for the best questions.
(read on)
Do LLMs hold knowledge that might be dangerous in the hands of a malicious user? Can hazardous knowledge be unlearned?
Introducing WMDP: an open-source eval benchmark of 4,157 multiple-choice questions that serve as a proxy measurement of an LLM's hazardous knowledge in biosecurity, cybersecurity, and chemical security.
🚀 Math - we released the GSM1k last month. Today, we augmented it with human ratings to account for chatty yet correct responses.
Explore the GSM1k leaderboard as part of SEAL Leaderboards. We were glad to see LLMs have mostly nailed grade school math!
Will be at #NeurIPS2023 next week! If you’re an LLM researcher / research engineer interested in robust evaluations, safety, red teaming or scalable oversight, let’s chat! Mainly hiring for SEAL but also happy to chat about collaboration opportunities.
Gemini is out with 90%+ MMLU! Huge congrats to my friends and former colleagues and everyone who was part of this achievement. Truly fantastic teamwork!
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks.
🚀 Instruction Following - SEAL Leaderboards are out! IF winners:
- GPT-4o and GPT-4 Turbo
- Llama 3 70B Instruct
- Mistral Large
Gemini Pro 1.5 leaps into top 3 in preference rankings, and Claude rockets to #2 in factuality.
See
3. Claude 3.5 Sonnet claimed the #1 Elo score in coding tasks, yet GPT-4 Turbo Preview still excelled in overall correctness. Meanwhile, GPT-4o lagged behind Turbo, Claude 3.5 Sonnet, and Gemini 1.5.
🚀 Impressive OpenAI o1 performance on the SEAL Leaderboards!
- 🛠️ o1-preview leads across the board in Agentic Tool Use (Enterprise), Instruction Following, and Spanish.
- 💻 o1-mini takes the top spot for coding, with o1-preview following at a notable distance.
SEAL
🚀Our latest SEAL Leaderboard on Agentic Tool Use! ()
- 🔧 Compositional problems with step-by-step reasoning
- ✅ Human-verified answers & process supervision
- 🛠️ Featuring complex tasks with 10+ tool calls across chat and enterprise use cases.
🚀 Spanish - The first expert-evaluated SEAL Leaderboards are out! Spanish is our first multilingual leaderboard (), winners:
- GPT-4o
- Gemini 1.5 Pro (post-I/O)
- GPT-4 Turbo
We plan to roll out more languages, which ones should we build next?
Here’s the job link for joining SEAL:
If you have questions about the role feel free to DM me. I might not be able to get through all the pings but I’ll start reviewing all the applications next Friday.
To all OpenAI employees, I want to say:
Learn to feel the AGI.
Act with the gravitas appropriate for what you're building.
I believe you can "ship" the cultural change that's needed.
I am counting on you.
The world is counting on you.
:openai-heart:
Sending ❤️ and virtual hugs to all my courageous friends at OpenAI. The looming uncertainty must feel very overwhelming right now.
Benji is around for cuddles and emotional support. DM if you’d like to hang with us and destress.
3. Closer inspection revealed that Claude 3.5 Sonnet lost points on writing dimensions, especially formatting, which covers the visual presentation and readability of its responses. This weakness doesn't affect Claude’s instruction following score but does affect its Elo ranking.
4. While Claude 3.5 Sonnet dazzles in many areas, it falls short in the Testing use case (developing, enhancing, or fixing tests for existing code), compared to other models.
2. Examining Claude 3.5 Sonnet's top performance in instruction following unveils an intriguing story. Despite its #1 spot for pure instruction following, its preference-ranking Elo score tells a different tale, landing at #5 behind both GPT models, the new Gemini 1.5, and Llama 3.
Very pleased to see OpenAI's efforts and commitment toward safety and transparent disclosure of the large risks this powerful technology may pose.
This sets a wonderful example for a leader in AI.
We are systemizing our safety thinking with our Preparedness Framework, a living document (currently in beta) which details the technical and operational investments we are adopting to guide the safety of our frontier model development.
🚀 Coding - The first expert-evaluated SEAL Leaderboards are out! The coding race is neck and neck, winners:
- GPT-4 Turbo and GPT-4o
- Gemini Pro 1.5
- Claude 3 Opus
See for detailed analysis of each model!
4. In this example on chemistry lab equipment comparing Claude 3.5 Sonnet and GPT-4o, Claude meticulously addressed all requirements. However, its response lacked helpful formatting and contained repetitive information, highlighting areas for writing improvement.
Excited to announce I've joined the SEAL team at @scale_AI in SF! I'm going to be working on leveraging explainability/reasoning methods to improve robustness and oversight quality.
We compile these results into Multi-Turn Human Jailbreaks (MHJ), a dataset of 537 multi-turn jailbreak conversations, with tactics and design considerations for every jailbreak. We publicly release MHJ to support research into more robust defenses.
"Researchers Develop New Technique to Wipe Dangerous Knowledge From AI Systems" by @henshall_will about our work on catastrophic risk benchmarking and unlearning:
Multi-turn human jailbreaks can break other defenses too! Against machine unlearning defenses (RMU), which remove potentially dual-use biosecurity knowledge from LLMs, human red teaming can recover more of the unlearned knowledge than automated attacks.
@its_ericchu
@drjwrae
More of the other way around. Safety/alignment work can often boost capabilities (per the examples you listed), but not as much vice versa.
@FelisMaculosus
Oops, it meant the model before Google I/O vs. after Google I/O. I can see that’s confusing. We’ll update with the proper model versions.
@quantumcastaway
@VrishabhKumar1
Great question. We refresh the prompts periodically! We generally don’t send the same leaderboard eval to the same company more than once; in the special cases where we do, we gray out the model ranking and add a disclaimer about the overfitting risk (as we did for Claude here).