Anian Ruoss @anianruoss profile

Anian Ruoss

@anianruoss

Followers

378

Following

101

Statuses

160

Research Engineer at Google DeepMind Previously: ETH Zurich

London

Joined May 2021


Anian Ruoss

@anianruoss

4 days

RT @mbalunovic: We finally have an answer to the debate over whether LLMs generalize to new math problems or they merely memorized the answ…

0

164

0

Anian Ruoss

@anianruoss

4 days

@giffmana @TheXeophon That said, the recipe to achieve strong chess play is fairly clear (i.e., the first paper) — it's just that frontier labs focus on other (more important) things 🙂

1

0

2

Anian Ruoss

@anianruoss

5 days

RT @giffmana: @TheXeophon Check out the recent work by @anianruoss Eg

0

1

0

Anian Ruoss

@anianruoss

5 days

RT @daiosai: 🚀 Meet the foundation of a distributed, AI-powered web. 🤫Stealth mode, waitlist open. With Daios, you…

0

2

0

Anian Ruoss

@anianruoss

1 month

@MavorParker @pcastr As Rishabh mentioned above, we evaluated LLMs on Atari, chess, crosswords, DM Control, grid world, and tic-tac-toe:

Rishabh Agarwal

@agarwl_

1 month

@pcastr: I think Atari might be too reactive as a benchmark for LLMs, although I'm not sure whether we can fine-tune LLMs to solve it faster than scratch agents.

1

0

1

Anian Ruoss

@anianruoss

2 months

RT @ni_jovanovic: SynthID-Text by @GoogleDeepMind is the first large-scale LLM watermark deployment, but its behavior in adversarial scenar…

0

15

0

Anian Ruoss

@anianruoss

2 months

RT @activelifetribe: Come talk to us if you're around at Neurips '24 in Vancouver 🙂🎉 Super fun collaboration with @anianruoss @gregdeletang…

0

1

0

Anian Ruoss

@anianruoss

2 months

RT @ericmalmi: multiple long-time dreams coming true at once: ✅ give a talk at NeurIPS ♟️ play chess on a stage 🤡 make my international deb…

0

7

0

Anian Ruoss

@anianruoss

2 months

RT @ericmalmi: if you're at #NeurIPS2024, want to learn how to make LLMs really good at chess and see a live demo, come and visit the @Goog…

0

3

0

Anian Ruoss

@anianruoss

2 months

Join us for our NeurIPS 2024 demo "Mastering Chess With Language Models", where I'll present some of our recent work on playing chess with LLMs.
📅 Wednesday, 09:30 - 10:00 PT
📍 @GoogleDeepMind Booth

0

8

Anian Ruoss

@anianruoss

2 months

RT @rohanpaul_ai: LMAct shows current LLMs still can't consistently learn to act from examples, even with hundreds of demonstrations LMAct…

0

3

0

Anian Ruoss

@anianruoss

2 months

@DrJimFan More details in this thread:

Anian Ruoss

@anianruoss

2 months

Ever wonder how well frontier models (Claude 3.5 Sonnet, Gemini 1.5 Flash & Pro, GPT-4o, o1-mini & o1-preview) play Atari, chess, or tic-tac-toe? We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations. 🧵 1/N

0

Anian Ruoss

@anianruoss

2 months

@polynoamial More details in this thread:

Anian Ruoss

@anianruoss

2 months

Ever wonder how well frontier models (Claude 3.5 Sonnet, Gemini 1.5 Flash & Pro, GPT-4o, o1-mini & o1-preview) play Atari, chess, or tic-tac-toe? We present LMAct, an in-context imitation learning benchmark with long multimodal demonstrations. 🧵 1/N

0

Anian Ruoss

@anianruoss

2 months

@chanpyb The demonstration episodes are separated by two newline characters (I agree that listing 1 doesn't do a great job of conveying this).

0

1

Anian Ruoss

@anianruoss

2 months

@CynepiaT @PardoFab @SirrahChan @bonniesjli @VladMnih Not yet. We're working on it and will let you know once it's available!

0

1

Anian Ruoss

@anianruoss

2 months

RT @SirrahChan: LMs see, can LMs do? LMAct benchmarks current SOTA foundation models' ability to act in text/visual environments using tex…

0

3

0