![Anian Ruoss Profile](https://pbs.twimg.com/profile_images/1394738658415104002/wJ1-M7xN_x96.jpg)
Anian Ruoss
@anianruoss
Followers
378
Following
101
Statuses
160
Research Engineer at Google DeepMind. Previously: ETH Zurich
London
Joined May 2021
RT @mbalunovic: We finally have an answer to the debate over whether LLMs generalize to new math problems or they merely memorized the answ…
0
164
0
@giffmana @TheXeophon That said, the recipe to achieve strong chess play is fairly clear (i.e., the first paper) — it's just that frontier labs focus on other (more important) things 🙂
1
0
2
RT @daiosai: 🚀 Meet the foundation of a distributed, AI-powered web. 🤫Stealth mode, waitlist open. With Daios, you…
0
2
0
@MavorParker @pcastr As Rishabh mentioned above, we evaluated LLMs on Atari, chess, crosswords, DM Control, grid world, and tic-tac-toe:
@pcastr: I think Atari might be too reactive as a benchmark for LLMs, although I'm not sure whether we can fine-tune LLMs to solve it faster than from-scratch agents.
1
0
1
RT @ni_jovanovic: SynthID-Text by @GoogleDeepMind is the first large-scale LLM watermark deployment, but its behavior in adversarial scenar…
0
15
0
RT @activelifetribe: Come talk to us if you're around at Neurips '24 in Vancouver 🙂🎉 Super fun collaboration with @anianruoss @gregdeletang…
0
1
0
RT @ericmalmi: multiple long-time dreams coming true at once: ✅ give a talk at NeurIPS ♟️ play chess on a stage 🤡 make my international deb…
0
7
0
RT @ericmalmi: if you're at #NeurIPS2024, want to learn how to make LLMs really good at chess and see a live demo, come and visit the @Goog…
0
3
0
Join us for our NeurIPS 2024 demo "Mastering Chess With Language Models" where I'll present some of our recent work on playing chess with LLMs. 📅 Wednesday, 09:30 - 10:00 PT 📍 @GoogleDeepMind Booth
0
0
8
RT @rohanpaul_ai: LMAct shows current LLMs still can't consistently learn to act from examples, even with hundreds of demonstrations LMAct…
0
3
0
@DrJimFan More details in this thread:
Ever wonder how well frontier models (Claude 3.5 Sonnet, Gemini 1.5 Flash & Pro, GPT-4o, o1-mini & o1-preview) play Atari, chess, or tic-tac-toe? We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations. 🧵 1/N
0
0
0
@polynoamial More details in this thread:
Ever wonder how well frontier models (Claude 3.5 Sonnet, Gemini 1.5 Flash & Pro, GPT-4o, o1-mini & o1-preview) play Atari, chess, or tic-tac-toe? We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations. 🧵 1/N
0
0
0
@chanpyb The demonstration episodes are separated by two newline characters (I agree that listing 1 doesn't do a great job of conveying this).
0
0
1
@CynepiaT @PardoFab @SirrahChan @bonniesjli @VladMnih Not yet. We're working on it and will let you know once it's available!
0
0
1
RT @SirrahChan: LMs see, can LMs do? LMAct benchmarks current SOTA foundation models' ability to act in text/visual environments using tex…
0
3
0