Anian Ruoss Profile
Anian Ruoss

@anianruoss

Followers: 378 · Following: 101 · Statuses: 160

Research Engineer at Google DeepMind. Previously: ETH Zurich

London
Joined May 2021
@anianruoss
Anian Ruoss
4 days
RT @mbalunovic: We finally have an answer to the debate over whether LLMs generalize to new math problems or they merely memorized the answ…
@anianruoss
Anian Ruoss
4 days
@giffmana @TheXeophon That said, the recipe to achieve strong chess play is fairly clear (i.e., the first paper) — it's just that frontier labs focus on other (more important) things 🙂
@anianruoss
Anian Ruoss
5 days
RT @giffmana: @TheXeophon Check out the recent work by @anianruoss Eg
@anianruoss
Anian Ruoss
5 days
RT @daiosai: 🚀 Meet the foundation of a distributed, AI-powered web. 🤫Stealth mode, waitlist open. With Daios, you…
@anianruoss
Anian Ruoss
1 month
@MavorParker @pcastr As Rishabh mentioned above, we evaluated LLMs on Atari, chess, crosswords, DM Control, grid world, and tic-tac-toe:
@agarwl_
Rishabh Agarwal
1 month
@pcastr: I think Atari might be too reactive as a benchmark for LLMs, although I'm not sure if we can fine-tune LLMs to solve it faster than scratch agents.
@anianruoss
Anian Ruoss
2 months
RT @ni_jovanovic: SynthID-Text by @GoogleDeepMind is the first large-scale LLM watermark deployment, but its behavior in adversarial scenar…
@anianruoss
Anian Ruoss
2 months
RT @activelifetribe: Come talk to us if you're around at Neurips '24 in Vancouver 🙂🎉 Super fun collaboration with @anianruoss @gregdeletang
@anianruoss
Anian Ruoss
2 months
RT @ericmalmi: multiple long-time dreams coming true at once: ✅ give a talk at NeurIPS ♟️ play chess on a stage 🤡 make my international deb…
@anianruoss
Anian Ruoss
2 months
RT @ericmalmi: if you're at #NeurIPS2024, want to learn how to make LLMs really good at chess and see a live demo, come and visit the @Goog
@anianruoss
Anian Ruoss
2 months
Join us for our NeurIPS 2024 demo "Mastering Chess With Language Models", where I'll present some of our recent work on playing chess with LLMs. 📅 Wednesday, 09:30-10:00 PT 📍 @GoogleDeepMind Booth
@anianruoss
Anian Ruoss
2 months
RT @rohanpaul_ai: LMAct shows current LLMs still can't consistently learn to act from examples, even with hundreds of demonstrations LMAct…
@anianruoss
Anian Ruoss
2 months
@DrJimFan More details in this thread:
@anianruoss
Anian Ruoss
2 months
Ever wonder how well frontier models (Claude 3.5 Sonnet, Gemini 1.5 Flash & Pro, GPT-4o, o1-mini & o1-preview) play Atari, chess, or tic-tac-toe? We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations. 🧵 1/N
@anianruoss
Anian Ruoss
2 months
@polynoamial More details in this thread:
@anianruoss
Anian Ruoss
2 months
@chanpyb The demonstration episodes are separated by two newline characters (I agree that listing 1 doesn't do a great job of conveying this).
@anianruoss
Anian Ruoss
2 months
@CynepiaT @PardoFab @SirrahChan @bonniesjli @VladMnih Not yet. We're working on it and will let you know once it's available!
@anianruoss
Anian Ruoss
2 months
RT @SirrahChan: LMs see, can LMs do? LMAct benchmarks current SOTA foundation models' ability to act in text/visual environments using tex…