Ted Xiao Profile
Ted Xiao

@xiao_ted

Followers
13K
Following
6K
Media
213
Statuses
1K

Robotics and Gemini @GoogleDeepMind. Posts about large models, robot learning, and scaling. Opinions my own.

San Francisco
Joined October 2013
@xiao_ted
Ted Xiao
2 months
Robotics + AI has completely transformed: a perfect storm of AI breakthroughs, hardware innovation, and capital inflow. But are general robot foundation models truly just around the corner? At recent talks, I took an honest look at hype vs. reality to share what’s missing 🧵👇
5
29
176
@xiao_ted
Ted Xiao
1 year
There will be 3-4 massive news items coming out in the next few weeks that will rock the robotics + AI space. Adjust your timelines, it will be a crazy 2024 📈.
91
232
2K
@xiao_ted
Ted Xiao
1 year
Both MSFT and OpenAI have won massively from capital injections, but Microsoft Research (MSR) has been majorly screwed over. World-class researchers trained and hired for innovative fundamental research — now relegated to “ChatGPT for X” papers because leadership literally does.
26
47
688
@xiao_ted
Ted Xiao
2 years
🚨New RL impact just dropped🚨 1) My friend is a high-level Rocket League player and just alerted me that an open-sourced agent trained with reinforcement learning + self-play has been steamrolling on public servers! It's in the top 0.5% ELO bracket.
11
92
611
@xiao_ted
Ted Xiao
1 year
🚨Big things are happening in humanoid robotics!🚨. As we saw with drones and quadruped robots, it can take a mere decade for bleeding edge R&D areas to become "solved" platforms for commercial consumer use cases. Once open-ended research questions around reliability, dexterity,
Tweet media one
Tweet media two
Tweet media three
12
87
435
@xiao_ted
Ted Xiao
30 days
New phenomenon appearing: the latest generation of foundation models often switch to Chinese in the middle of hard CoT thinking traces. Why? AGI labs like OpenAI and Anthropic utilize 3P data labeling services for PhD-level reasoning data for science, math, and coding; for.
@RishabJainK
Rishab Jain
1 month
Why did o1 pro randomly start thinking in Chinese? No part of the conversation (5+ messages) was in Chinese. very interesting. training data influence
Tweet media one
25
45
439
@xiao_ted
Ted Xiao
2 years
@aidangomezzz you, a heathen, cheering mindlessly when loss go down. me, an X-risk chad, carefully measuring each gradient by hand to make sure it's not over the proscribed limit to prevent FOOM. we are not the same.
3
14
376
@xiao_ted
Ted Xiao
1 year
I can’t emphasize enough how mind-blowing extremely long token context windows are. For both AI researchers and practitioners, massive context windows will have transformative long-term impact, beyond one or two flashy news cycles. ↔️. “More is different”: Just as we saw emergent
Tweet media one
6
58
298
@xiao_ted
Ted Xiao
2 years
The golden days of internet-scale models achieving unprecedented zero-shot results seem to be waning. The new Big Thing is subsequent fine-tuning with humans increasingly out of the loop. How does this work? Let’s explore *Prior Amplification* 🔎. (1/N)
Tweet media one
4
41
301
@xiao_ted
Ted Xiao
2 years
The optimism in robotics research is absolutely incredible these days! I believe all the pieces we need for a “modern attempt at embodied intelligence” are ready. At recent talks, I pitched a potential recipe, and I’d like to share it with you. Let’s break down the key points 🔑
Tweet media one
10
49
261
@xiao_ted
Ted Xiao
1 year
“Sora doesn’t show any new technical innovations” - this take misses the point. 🎯. Sora (and other great polished OAI releases) are not *meant* to provide new knowledge to the research community. They are not papers, careful scientific experiments, or theory that adds to the sum.
6
34
252
@xiao_ted
Ted Xiao
1 year
Instead of just telling robots “what to do”, can we also guide robots by telling them “how to do” tasks?. Unveiling RT-Trajectory, our new work which introduces trajectory conditioned robot policies. These coarse trajectory sketches help robots generalize to novel tasks! 🧵⬇️
3
46
251
@xiao_ted
Ted Xiao
3 years
Our project SayCan wins the Best Paper Award at the RSS Workshop on Scaling Robot Learning! Thanks to all who came to the poster session with great questions and feedback 😁. The future of "LLMs for Robotics and Robotics for LLMs" is brighter than ever!
Tweet media one
4
25
219
@xiao_ted
Ted Xiao
1 year
(Not AGI 😅).
20
6
199
@xiao_ted
Ted Xiao
1 month
I’ve gradually come around to two paths to embodied AGI that I was very skeptical of before:
1️⃣ solving robotics via reasoning
2️⃣ solving robotics via world modeling
I was previously doubtful not of these approaches themselves, but of timelines and efficiency; the “optimal”
8
25
210
@xiao_ted
Ted Xiao
9 months
Open X-Embodiment wins the Best Paper Award at #ICRA2024 🎉🤖! An unprecedented 170+ author list (most didn’t fit on the slide) may be a record for ICRA! So amazing to see what a collaborative community effort can accomplish in pushing robotics + AI forward 🚀
Tweet media one
5
31
187
@xiao_ted
Ted Xiao
1 year
Announcing Open X-Embodiment! One of the most ambitious and large-scale robot learning collaborations to date. Intelligent robots have seen tremendous progress the past years by leveraging big real-world datasets, high capacity network architectures, and
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
40
183
@xiao_ted
Ted Xiao
2 years
Robot learning systems have scaled to many complex tasks, but generalization to novel objects is still challenging. Can we leverage pre-trained VLMs to expand the open-world capabilities of robotic manipulation policies? Introducing MOO: Manipulation of Open-World Objects!🐮🧵👇
Tweet media one
3
37
178
@xiao_ted
Ted Xiao
2 years
My team is looking for outstanding PhD students (graduating after 2023) who are interested in a student researcher internship exploring how simulation can improve real robot manipulation capabilities!
5
33
172
@xiao_ted
Ted Xiao
11 months
Our team is hiring strong research engineers to work on scaling robotics + AI!. Apply here:
7
40
164
@xiao_ted
Ted Xiao
6 months
🚨In case you missed it: it’s been non-stop technical progress updates from Chinese humanoid companies the past 48 hours! 🚨 In just the last 2 days, batched updates from numerous exciting Chinese humanoid companies all dropped at once:
1) @UnitreeRobotics showcases technical
Tweet media one
Tweet media two
Tweet media three
5
32
161
@xiao_ted
Ted Xiao
2 years
I'll be @Stanford tomorrow for a guest lecture for CS25 (organized by @DivGarg9 ). If you're interested in how internet-scale models can dramatically accelerate scaling robot learning in the real world, come say hi!
2
13
157
@xiao_ted
Ted Xiao
2 years
SayCan wins the Special Innovation Award at #CoRL2022!!
✅ LLMs for robotics
✅ real world tasks at scale
✅ 13 robots, MANY collaborators, 17 months
✅ 3 live demos from across the world
Tweet media one
3
12
156
@xiao_ted
Ted Xiao
6 months
The field of Robotics + AI is at a very interesting moment in time right now because we’re seeing the evolution of how optimism from the last 10 years of research is carrying over to new frontiers: towards commercialization, towards more complex morphologies and applications,.
2
18
111
@xiao_ted
Ted Xiao
2 years
What if you used LLMs for… all parts of RL?
GPT-4 as the Actor and Critic
GPT-4 as the Bellman Update
GPT-4 as the reward function
Vector DB as the replay buffer
GPT-4 as the importance sampling logic
@DrJimFan
Jim Fan
2 years
What if we set GPT-4 free in Minecraft? ⛏️. I’m excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving *code* from a skill library. GPT-4 unlocks
9
17
147
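A toy sketch of the thought experiment above, with every classical RL component swapped for an LLM call or a vector store. Everything here is hypothetical illustration: `llm` is a stub standing in for a GPT-4 API call, and `VectorDBReplayBuffer`, `actor`, and `critic` are invented names, not any real system's API.

```python
import random

def llm(prompt: str) -> str:
    """Stand-in for a GPT-4 call; a real system would hit a model API here."""
    # Toy heuristic so the sketch runs end-to-end.
    return "action:right" if "goal" in prompt else "action:left"

class VectorDBReplayBuffer:
    """Replay buffer backed by a (here: in-memory) vector store."""
    def __init__(self):
        self.transitions = []

    def add(self, transition):
        self.transitions.append(transition)

    def sample(self, k=1):
        # An LLM-as-importance-sampler would rank these; we sample uniformly.
        return random.sample(self.transitions, min(k, len(self.transitions)))

def actor(state: str) -> str:
    """LLM as the policy."""
    return llm(f"You are the policy. State: {state}. Choose an action.")

def critic(state: str, action: str) -> float:
    """LLM as the reward function: parse a scalar judgment out of text."""
    verdict = llm(f"Rate taking {action} in {state} toward the goal.")
    return 1.0 if "right" in verdict else 0.0
```

The Bellman update and importance sampling would be further prompts over sampled transitions; the sketch only shows the shape of the loop, not a working agent.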
@xiao_ted
Ted Xiao
4 years
Differentiable physics engines show *huge* wall time speed ups vs. standard MuJoCo 🤯
Tweet media one
@hardmaru
hardmaru
4 years
Speeding Up Reinforcement Learning with a New Physics Simulation Engine. Blog post by @bucketofkets on Brax, a new physics simulation engine that matches the performance of a large compute cluster with just a single TPU or GPU.
2
22
137
@xiao_ted
Ted Xiao
1 year
New @GoogleDeepMind blog post covering 3 recent works on AI + robotics!
1. AutoRT scales data collection with LLMs and VLMs
2. SARA-RT significantly speeds up inference for Robot Transformers
3. RT-Trajectory introduces motion-centric goals for robot generalization
@GoogleDeepMind
Google DeepMind
1 year
How could robotics soon help us in our daily lives? 🤖. Today, we’re announcing a suite of research advances that enable robots to make decisions faster as well as better understand and navigate their environments. Here's a snapshot of the work. 🧵
6
22
134
@xiao_ted
Ted Xiao
2 years
Some recent news: 5 projects to appear at #RSS2023 and 1 at #ICML2023! 🥳🤖
1) RT-1:
2) DIAL:
3) ROSIE:
4) RLS:
5) JSRL:
6) LLM + Robotics Demos, TBA!
2
14
134
@xiao_ted
Ted Xiao
7 months
Action-packed day of robotics + AI at #RSS2024 tomorrow on Monday, July 15! Hope to see you:
- I'll be giving a talk at the GenAI-HRI workshop at 11AM
- I'll be at the Data Gen workshop from 2PM - 4PM
Tweet media one
1
20
133
@xiao_ted
Ted Xiao
2 years
Robot-language datasets have enabled tremendous progress in robotics🤖. However, semantic concepts may not be fully captured by existing language labels, which are often expensive to collect. In our new paper, we study how we can get more mileage out of existing datasets! 🧵👇
1
27
129
@xiao_ted
Ted Xiao
7 months
Announcing one of the most unique and ambitious robot competitions ever: 🏁The Earth Rover Challenge at #IROS2024!
✅Globally distributed in-the-wild evaluation
✅Real world navigation task settings
✅Large training dataset provided
Details below 👇
3
25
132
@xiao_ted
Ted Xiao
2 years
Excited to share 3 papers recently accepted at @corl_conf 🤖: 1) SayCan: Robots ground language model plans with what’s achievable and contextually appropriate. 1.5 year effort with tons of collaborators! Paper: Paper sites:
8
8
127
@xiao_ted
Ted Xiao
3 years
no good deed unpunished 😂
Tweet media one
@xiao_ted
Ted Xiao
3 years
Paper reviewing is not only an honorable civic duty but is also very rewarding (very +EV). It’s the Barry’s Bootcamp of AI research.
3
0
120
@xiao_ted
Ted Xiao
11 months
The writing is clearly on the wall now: a path to embodied intelligence is clearer than ever! Congrats to OpenAI and Figure on the exciting partnership, and welcome to the intelligent robotics game 📈.
@Figure_robot
Figure
1 year
Last month we demonstrated Figure 01 making coffee only using neural networks. This is a fully learned, end-to-end visuomotor policy mapping onboard images to low level actions at 200hz. Next up: excited to push the boundaries on AI learning with OpenAI.
4
11
122
@xiao_ted
Ted Xiao
1 year
6 years of paradigm shifts in robot learning! What's next? 📈
Tweet media one
4
18
122
@xiao_ted
Ted Xiao
1 year
I am convinced that the amount of robot data collected for learning will be larger in 2024-2025 than all previous years combined. Amazing progress in creative teleop (gloves, puppeteering), in cross-embodiment learning, and in large scale industry data engines (humanoids) 🚀.
@chichengcc
Cheng Chi
1 year
Can we collect robot data without any robots?. Introducing Universal Manipulation Interface (UMI). An open-source $400 system from @Stanford designed to democratize robot data collection. 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
2
21
116
@xiao_ted
Ted Xiao
1 year
We’re living through the golden age of AI Research:
- Highly leveraged impact thanks to successful methods scaling and transferring easily to other domains (i.e. generative modeling ideas landing in language, robotics, representation learning)
- Relatively low ramp-up cost due.
@_aidan_clark_
Aidan Clark
1 year
Seeing people tweet about doing PhD apps and I’ll just say I got into 0/8 of the programs I applied to and things turned out great. There are lots of opportunities for research, don’t stress :).
4
8
118
@xiao_ted
Ted Xiao
10 months
Extremely thought-provoking work that essentially says the quiet part out loud: general foundation models for robotic reasoning may already exist *today*. LLMs aren’t just about language-specific capabilities, but rather about vast and general world understanding.
@Ed__Johns
Edward Johns
10 months
Very excited to announce: Keypoint Action Tokens!. We found that LLMs can be repurposed as "imitation learning engines" for robots, by representing both observations & actions as 3D keypoints, and feeding into an LLM for in-context learning. See: More 👇
1
17
117
@xiao_ted
Ted Xiao
8 months
> year is 2031
> be me, token farmer in the slop mines
> bring in 20k novel tokens for weighing at the Token Utility station
> my token haul is two stddevs over the compressibility threshold, phew
> get my daily blueprint nutrition allocation
> trickle down tokenomics utopia
@khoomeik
Rohan Pandey
9 months
📢 Excited to finally be releasing my NeurIPS 2024 submission! Is Chinchilla universal? No! We find that:
1. language model scaling laws depend on data complexity
2. gzip effectively predicts scaling properties from training data
As compressibility 📉, data preference 📈. 🧵⬇️
Tweet media one
8
8
116
@xiao_ted
Ted Xiao
2 years
How do we connect the huge diverse world of bits (digital domain) with the physical world of atoms (robotics domain)?. Our team has begun exploring how to bridge this gap with a variety of different approaches! The projects in blue were announced in just the past 2 weeks:
Tweet media one
3
12
112
@xiao_ted
Ted Xiao
2 years
Real robot data is expensive to collect; can diffusion models perform meaningful and diverse visual data augmentation?. Presenting "Scaling Robot Learning with Semantically Imagined Experience" at #RSS2023 next week!. Website: Session: Tues 7/12, 3-5PM
2
14
108
@xiao_ted
Ted Xiao
5 months
Promising progress from 1X on learned world models which improve with more experience and physical interaction data. What I'm excited about:
- World models are likely the only path forward for reproducible and scalable evaluations in *multi-agent settings*; see success of world.
@1x_tech
1X
5 months
hello, world model.
3
10
104
@xiao_ted
Ted Xiao
2 years
While LLMs have been amazing for high-level reasoning, low-level policy learning is still the bottleneck in robotics. Over 17 months (!!), our team has scaled a massively multitask Transformer-based model to over 700 tasks using 130k real-world episodes. Check out the 🧵 👇.
@hausman_k
Karol Hausman
2 years
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate!
Generalizes to new tasks ✅
Robust to new environments and objects ✅
Fast inference for real time control ✅
Can absorb multi-robot data ✅
Powers SayCan ✅
🧵👇
3
5
104
@xiao_ted
Ted Xiao
2 years
Reinforcement learning (!)
at scale in the real world (!!)
for useful robotics tasks (!!!)
in multiple "in-the-wild" offices (!!!!)
A culmination of years of effort -- so excited to finally be able to share this publicly! 🤖
@hausman_k
Karol Hausman
2 years
Very excited to announce our largest deep RL deployment to date: robots sorting trash end-to-end in real offices! (aka RLS). This project took a long time (started before SayCan/RT-1/other newer works) but the learnings from it have been really valuable.🧵
3
15
94
@xiao_ted
Ted Xiao
1 year
Happy to share that RT-Trajectory has been accepted as a Spotlight (Top 5%) at #ICLR2024! This was my first last-author project, it was a ton of fun collaborating with a strong team led by @Jiayuan_Gu 🥳. Blogpost: Website: 🧵⬇️
@xiao_ted
Ted Xiao
1 year
Instead of just telling robots “what to do”, can we also guide robots by telling them “how to do” tasks?. Unveiling RT-Trajectory, our new work which introduces trajectory conditioned robot policies. These coarse trajectory sketches help robots generalize to novel tasks! 🧵⬇️
5
14
96
@xiao_ted
Ted Xiao
3 years
my team is still using TF2, and in my humble opinion we are still on the bleeding edge of robot learning 🤷🏻‍♂️.
@ylecun
Yann LeCun
3 years
The great competition between Deep Learning frameworks enters a new phase. Now that Google's TensorFlow has lost to Meta's PyTorch, Google is internally switching to JAX.
5
6
90
@xiao_ted
Ted Xiao
1 year
Made a new friend today, Unitree H1!. In 2019, Unitree visited to showcase an early version of the Go1, which has become a stable and affordable market leader. It’ll be fun to look back in 2027 on today and see how much progress has been made in humanoid robotics!
Tweet media one
1
2
90
@xiao_ted
Ted Xiao
1 year
Foundation models have shown impressive results on robots via language or code -- but these may not be good fits for grounded robotics problems. Our method PIVOT poses robot control as a VQA problem and introduces *iterative* visual optimization over robot actions! 🧵
1
19
91
@xiao_ted
Ted Xiao
1 year
AutoRT is now out on arXiv! Check out how we set up fleet-scale data collection by leveraging Foundation Models for robot orchestration 📈. Website: Paper: Original Thread:
@_akhaliq
AK
1 year
Google presents AutoRT. Embodied Foundation Models for Large Scale Orchestration of Robotic Agents. paper page: demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both
2
19
85
@xiao_ted
Ted Xiao
1 month
Extremely saddened to hear about Felix’s passing. His thoughtful research contributions played a major part of my own development as a scientist. Most recently, his poignant essay on AI and stress humanized the pressure cooker that is today’s AI environment. It resonated deeply.
@FelixHill84
Felix Hill
4 months
Do you work in AI?
Do you find things uniquely stressful right now, like never before?
Have you ever suffered from a mental illness?
Read my personal experience of those challenges here:
3
6
89
@xiao_ted
Ted Xiao
10 months
There is increased decoupling between academia and real-world impacts, especially in robotics. Reviewing an impressive demo paper (end-to-end system deployed in production warehouses on 100ks of wild objects) and it's getting slaughtered by other reviewers. Many such cases.🤡.
5
3
86
@xiao_ted
Ted Xiao
5 months
Action-conditioned world models are one step closer! Neural simulations hold a lot of promise for scaling up real-world interaction data, especially for domains which physics-based simulators struggle at. 🚀.
@_akhaliq
AK
5 months
Google presents Diffusion Models Are Real-Time Game Engines. discuss: We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality.
0
14
84
@xiao_ted
Ted Xiao
5 months
“yeah that’s definitely just a dude in a suit” - the highest compliment a humanoid company can get. Congrats to @ericjang11 @BerntBornich and the team at 1X for a new SOTA humanoid!.
@1x_tech
1X
5 months
Introducing NEO Beta. Designed for humans. Built for the home.
3
2
81
@xiao_ted
Ted Xiao
5 months
Announcing the 3rd Workshop on Language and Robot Learning at #CoRL2024 in Munich! This year's theme is "Language as an Interface". Featuring a great speaker lineup and two panels 🤖. Website: CfP: Deadline: October 2
Tweet media one
2
11
83
@xiao_ted
Ted Xiao
2 years
Introducing RT-2, representing the culmination of two trends:
- Tokenize and train everything together: web-scale text, images, and robot data
- VLMs are not just representations, big VLMs *are policies*
Sounds subtle, but we’ll look back on this as an inflection point! 📈
@GoogleDeepMind
Google DeepMind
2 years
Today, we announced 𝗥𝗧-𝟮: a first of its kind vision-language-action model to control robots. 🤖. It learns from both web and robotics data and translates this knowledge into generalised instructions. Find out more:
1
17
82
@xiao_ted
Ted Xiao
1 year
Excited to share that 4 of our recent works will appear at #ICRA 2024 in Yokohama! (1) RT-X: Open X-Embodiment and (2) GenGap expand and study large generalist datasets. (3) PromptBook and (4) PG-VLM improve the robotic reasoning capabilities of frontier LLMs and VLMs. 🧵👇
5
13
82
@xiao_ted
Ted Xiao
2 years
Looking forward to showcasing one of the first foundation models for robotics at #RSS2023 next week! Presenting "RT-1: Robotics Transformer for Real-world Control at Scale" from the Google DeepMind robotics team. Website: Session: Tuesday 7/12, 3PM-5PM
1
20
79
@xiao_ted
Ted Xiao
1 year
Update: Our recent work "Manipulation of Open-World Objects" (MOO) has been accepted to #CoRL2023! Extremely simple object-centric representations (literally just a single pixel) can significantly boost robot generalization! Check out the original thread + new updates below⬇️
@xiao_ted
Ted Xiao
2 years
Robot learning systems have scaled to many complex tasks, but generalization to novel objects is still challenging. Can we leverage pre-trained VLMs to expand the open-world capabilities of robotic manipulation policies? Introducing MOO: Manipulation of Open-World Objects!🐮🧵👇
Tweet media one
2
20
78
@xiao_ted
Ted Xiao
11 months
The secret ingredient behind robot foundation models has been compositional generalization, which enables positive transfer across different axes of generalization. I'm excited to share our new project where we use compositional generalization to inform data collection! 🧵⬇️
@jensen_gao
Jensen Gao
11 months
How should we efficiently collect robot data for generalization? We propose data collection procedures guided by the abilities of policies to compose environmental factors in their data. Policies trained with data from our procedures can transfer to entirely new settings. (1/8)
1
10
75
@xiao_ted
Ted Xiao
2 years
@DrJimFan Agree 100%. Feels extremely lucky to be a witness and participant in this era of history.
@xiao_ted
Ted Xiao
2 years
Our grandchildren will look back at our timeline as a step function - a binary before and after of amazing technologies. But I’m so grateful to be part of this most magical time period where we are living through and defining history. Cheers to 2022 and an even better 2023! 🎊
Tweet media one
3
4
75
@xiao_ted
Ted Xiao
9 months
And that’s a wrap for a fantastic week of robots, AI, and sushi at #ICRA2024! Great to catch up with old friends and meet new ones. And as always, a fun time spreading the gospel of big data and generalist models 🚀
Tweet media one
Tweet media two
Tweet media three
Tweet media four
3
4
77
@xiao_ted
Ted Xiao
1 year
Announcing the 2nd Workshop on Language and Robot Learning at #CoRL2023 on November 6th, 2023! This year's theme is "Language as Grounding". Featuring a great speaker lineup and two panels! Website: CfP: Deadline: October 1
Tweet media one
2
12
74
@xiao_ted
Ted Xiao
9 months
Great debate today at #ICRA2024 on “Generative AI will make a lot of traditional robotics approaches obsolete”! But I suspect 57% of the room will be very shocked/unhappy over the next 5 years 🙃
Tweet media one
9
14
74
@xiao_ted
Ted Xiao
2 years
Congrats to Ding Liren for becoming the World Chess Champion! 👑♟️. What an exhilarating series and an amazing Game 4 of the Rapid Tiebreakers, where Ding’s mental game shined through. Well deserved victory!.
@chesscom
Chess.com
2 years
Ding Liren wins the 2023 FIDE World Championship 🏆. Congratulations to Ding on becoming the new FIDE World Champion, and cementing his place in chess history after a thrilling match! 👏
Tweet media one
1
8
70
@xiao_ted
Ted Xiao
1 year
It's clear now that the "one robot to many robots" revolution is going to be momentous for scaling smart robots. Happy to fill in the next milestone on the "paradigm shift" timeline! Open X-Embodiment + RT-X: Website: Arxiv:
Tweet media one
1
10
72
@xiao_ted
Ted Xiao
2 years
Robotics is the answer to the not-so-secret fact that many generative modeling domains have saturated existing public datasets (HQ art, code, even some types of text). If you need to keep growing # tokens for Chinchilla-optimal scaling, then interaction (robotics!) gets attractive.
6
5
70
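Back-of-envelope version of this argument, using the widely cited ~20 training tokens per parameter heuristic from the Chinchilla paper (a rough approximation, not an exact law; the function name is illustrative):

```python
def chinchilla_optimal_tokens(params: float) -> float:
    """Approximate compute-optimal training token count (~20 tokens/param)."""
    return 20.0 * params

# A 70B-parameter model would want on the order of 1.4 trillion training
# tokens -- beyond what many curated public corpora supply, which is the
# saturation pressure the tweet points at.
tokens_needed = chinchilla_optimal_tokens(70e9)  # 1.4e12
```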
@xiao_ted
Ted Xiao
9 months
Robotics progress is unbelievably fast these days🚀. Excited to share a few items on my agenda this week at a jam-packed #ICRA2024, covering numerous works exploring the intersection of foundation models and robotics. 🧵👇
Tweet media one
1
11
72
@xiao_ted
Ted Xiao
2 years
Wow, 4 years later, the MineRL Diamond challenge has been solved *without demonstrations* 🤯! In 2019, the best solutions used many priors and demos but still couldn’t solve the task. In 2022, VPT was the 1st method to collect diamonds, but with demos + IDM. Congrats DreamerV3!
@GoogleDeepMind
Google DeepMind
2 years
Introducing DreamerV3: the first general algorithm to collect diamonds in Minecraft from scratch - solving an important challenge in AI. 💎. It learns to master many domains without tuning, making reinforcement learning broadly applicable. Find out more:
1
17
69
@xiao_ted
Ted Xiao
11 months
Amazing unveil from @DrJimFan and @yukez: a cross-embodiment *humanoid* foundation model project in just three months (!!). Exciting to see long-term seemingly unrelated bets by NVIDIA pay off: extensive sim2real, multimodal robot policies, and top-tier accelerators! 👏.
@DrJimFan
Jim Fan
11 months
Today is the beginning of our moonshot to solve embodied AGI in the physical world. I’m so excited to announce Project GR00T, our new initiative to create a general-purpose foundation model for humanoid robot learning. The GR00T model will enable a robot to understand multimodal
4
10
67
@xiao_ted
Ted Xiao
2 months
🚨 New Model Alert! 🚨. Gemini 2.0 Flash is an extremely strong multimodal model which showcases impressive spatial reasoning capabilities: understanding that we live in a 3D world with consistent rules of physics and geometric/semantic relationships. 🌎. Examples below! 🧵
Tweet media one
2
12
69
@xiao_ted
Ted Xiao
11 months
A major debate in robot learning is whether language is only a good modality for abstract high-level semantics but not low-level motion. Our recent work RT-Hierarchy shows that granular *language motions* can go surprisingly far! Thread below ⬇️
@suneel_belkhale
Suneel Belkhale
11 months
Is language capable of representing low-level *motions* of a robot?. RT-Hierarchy learns an action hierarchy using motions described in language, like “move arm forward” or “close gripper” to improve policy learning. 📜: 🏠: (1/10)
1
10
64
@xiao_ted
Ted Xiao
1 year
Nice work from Berkeley showing how motion-centric information is useful for robot policies to exhibit cross-embodiment transfer! Brings together some ideas from coarse egocentric trajectories and dense point tracking flow 🌊.
@Xingyu2017
Xingyu Lin
1 year
What state representation should robots have? 🤖 I’m thrilled to present an Any-point Trajectory Model (ATM), which models physical motions from videos without additional assumptions and shows significant positive transfer from cross-embodiment human and robot videos! 🧵👇
2
9
66
@xiao_ted
Ted Xiao
1 year
Day 3 of #CoRL2023! We are presenting three works bridging internet-scale foundation models with robotics. See you at Poster Session 4 at 5:15PM!
- Language to Reward:
- RT-2:
- MOO:
Tweet media one
0
10
64
@xiao_ted
Ted Xiao
8 months
Very nice new post from @natolambert on the vibe shift in robotics foundation models!
Tweet media one
3
4
29
@xiao_ted
Ted Xiao
7 months
Interested in how VLMs can *already* understand embodied reasoning without any finetuning? Our method PIVOT shows this can work by reasoning about actions as visual annotations! Check out our poster at #ICML2024 today, at 1:30PM Hall C 4-9 #109!
Tweet media one
5
10
65
@xiao_ted
Ted Xiao
2 years
Generalization is notoriously hard, but especially so in difficult settings like vision-based robot manipulation. Our recent work *GenGap* decomposes this complex problem into different *generalization axes*! (1/5). Website: Paper:
1
11
64
@xiao_ted
Ted Xiao
1 year
To scale robot data collection effectively 🚀, it's clear that we need to go beyond the 1 human : 1 robot ratio. Towards this, we introduce AutoRT: leveraging foundation model reasoning and planning for robot orchestration at scale! Check out the thread from Keerthana below 🧵
@keerthanpg
Keerthana Gopalakrishnan
1 year
In the last two years, large foundation models have proven capable of perceiving and reasoning about the world around us, unlocking a key possibility for scaling robotics. We introduce AutoRT, a framework for orchestrating robotic agents in the wild using foundation models!
0
6
64
@xiao_ted
Ted Xiao
2 years
Our grandchildren will look back at our timeline as a step function - a binary before and after of amazing technologies. But I’m so grateful to be part of this most magical time period where we are living through and defining history. Cheers to 2022 and an even better 2023! 🎊
Tweet media one
0
12
62
@xiao_ted
Ted Xiao
9 months
Scalable evaluation is a major bottleneck for generalist real-world robot policies. Projects like RT-1 and RT-2 required *thousands* of evaluation trials in the real world 😱. In our new work SIMPLER, we evaluate robot policies in simulation to predict real world performance! 👇
@kylehkhsu
Kyle Hsu
9 months
[1/14] Real robot rollouts are the gold standard for evaluating generalist manipulation policies, but is there a less painful way to get good signal for iterating on your design decisions? Let’s take a deep dive on SIMPLER 🧵👇 (or see quoted video)!.
2
10
61
@xiao_ted
Ted Xiao
5 months
Molmo is a very exciting multimodal foundation model release, especially for robotics. The emphasis on pointing data makes it the first open VLM optimized for visual grounding — and you can see this clearly with impressive performance on RealworldQA or OOD robotics perception!
Tweet media one
@ehsanik
Kiana Ehsani
5 months
Try out Molmo on your application! This is a great example by @DJiafei! We have a few videos describing Molmo's different capabilities on our blog! This one is me trying it out on a bunch of tasks and images from RT-X:
0
9
60
@xiao_ted
Ted Xiao
1 year
In contrast, I’m grateful to be part of a growing all-star team at GDM Robotics pushing forwards on the frontier of Embodied AI 🙌. I’m very optimistic about real-world tokens being indispensable for AGI!
@pmddomingos
Pedro Domingos
1 year
What do Google, Microsoft and OpenAI have in common? They all had robotics projects and gave up on them.
2
3
59
@xiao_ted
Ted Xiao
9 months
Hello Japan🗾🇯🇵, I’m in Yokohama for #ICRA2024 the next 10 days. Looking forward to sharing some of our team's recent work on robotics + AI. Say hi if you're interested in robot learning, scaling, or foundation models!
2
2
58
@xiao_ted
Ted Xiao
1 year
Can "teachability" be a core LLM capability that can be learned via finetuning? Introducing Language Model Predictive Control (LMPC): distilling entire in-context teaching sessions via in-weight finetuning improves the teachability of LLMs! Website: 🧵👇
Tweet media one
@jackyliang42
Jacky Liang
1 year
We can teach LLMs to write better robot code through natural language feedback. But can LLMs remember what they were taught and improve their teachability over time?. Introducing our latest work, Learning to Learn Faster from Human Feedback with Language Model Predictive Control
1
12
57
@xiao_ted
Ted Xiao
5 months
Ilya once said that when super-human world modeling is achieved, robotics would be solved 6 months later: just solve AGI, and then ask the AGI to solve robotics. But is this plausible given current trends? 🤔 . Today’s SOTA frontier models soak up human-centric priors
Tweet media one
6
8
59
@xiao_ted
Ted Xiao
10 months
Emergent RL capabilities have never looked so cute before 🥹. Check out an amazing effort from colleagues on scaling RL + self-play to real world football!.
@GoogleDeepMind
Google DeepMind
10 months
Soccer players have to master a range of dynamic skills, from turning and kicking to chasing a ball. How could robots do the same? ⚽. We trained our AI agents to demonstrate a range of agile behaviors using reinforcement learning. Here’s how. 🧵
0
11
58
@xiao_ted
Ted Xiao
2 years
Q: What happens when you combine LLMs, general robot manipulation policies, and a live remote demo from halfway across the world?. A: The Google DeepMind demo at #RSS2023 on Tuesday. Join us at the 2:30PM Demo Session on 7/11!. w/ @andyzeng_ @hausman_k @brian_ichter and🤖
Tweet media one
3
7
58
@xiao_ted
Ted Xiao
1 year
Tons of work left on the path to Embodied AI, but the puzzle pieces are coming together: improving the physical intelligence of foundation models, scaling up data collection systems, and focusing on more diverse + general capabilities. Quite a few new works coming out soon 👀.
2
6
56
@xiao_ted
Ted Xiao
7 months
Robotics will increasingly resemble the field of foundation models: scaling, pretraining, and post-training generalist models. But the opposite is also true. Challenges in large-scale foundation modeling have started looking more and more like perennial problems in robotics.
Tweet media one
2
6
56
@xiao_ted
Ted Xiao
3 months
VLMs can express their temporal and semantic knowledge of robotics via *predicting value functions of robot videos*! Check out our recent work on leveraging VLMs like Gemini for Generative Value Learning 🧵👇.
@JasonMa2020
Jason Ma
3 months
Excited to finally share Generative Value Learning (GVL), my @GoogleDeepMind project on extracting universal value functions from long-context VLMs via in-context learning!. We discovered a simple method to generate zero-shot and few-shot values for 300+ robot tasks and 50+
0
7
57
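The core GVL trick described above, asking a VLM for per-frame task-completion values over a *shuffled* video so it judges visual content rather than frame position, can be sketched as follows. The VLM call is stubbed out here (a real system would prompt a long-context model like Gemini); the stub and data shapes are assumptions for illustration.

```python
import random

def vlm_completion_score(frame):
    """Stub for a long-context VLM call (hypothetical API).
    A real implementation would prompt the model for a 0-100
    task-completion estimate given the frame image."""
    return frame["true_progress"]  # placeholder stand-in

def generative_value_learning(frames, seed=0):
    """Score frames in shuffled order, then restore temporal order.
    Shuffling breaks the monotonic-position shortcut, forcing the
    model to ground its value estimate in the frame itself."""
    order = list(range(len(frames)))
    random.Random(seed).shuffle(order)
    scores = {i: vlm_completion_score(frames[i]) for i in order}
    return [scores[i] for i in range(len(frames))]

# Toy episode: progress rises monotonically from 0% to 100%.
episode = [{"true_progress": p} for p in (0, 25, 50, 75, 100)]
values = generative_value_learning(episode)
```

With a real VLM in place of the stub, the recovered value curve can be compared against ground-truth progress to evaluate the model's temporal understanding.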
@xiao_ted
Ted Xiao
1 year
The major thing going for IL is that it turns open-ended research risks (RL tuning, GAIL instabilities, reward shaping) into a data engineering problem, which is much more tractable and easier to make consistent progress on. Intellectually disappointing but effective 🫠.
@EugeneVinitsky
Eugene Vinitsky 🍒
1 year
Huge fraction of all written text as its training data and GPT-4 still makes basic reasoning mistakes; kinda makes you suspicious about imitation learning as a robust approach in other settings e.g. driving, robotics.
4
3
54
@xiao_ted
Ted Xiao
1 year
Had a great time today with @YevgenChebotar and @QuanVng visiting @USCViterbi to give a talk on “Robot Learning in the Era of Foundation Models”. Slides out soon, packed with works from *just the past 5 months* 🤯. Thanks to @daniel_t_seita for hosting!
Tweet media one
1
3
57
@xiao_ted
Ted Xiao
1 year
@DrJimFan Nice detective work! Agree with architectural guesses (1), (3), (5). I think (2) and (4) are a bit less obvious; very high DoF + long horizon + high frequency control means that standard design decisions in manipulation may not be enough. But video-level tokenization is hard.
1
2
56
@xiao_ted
Ted Xiao
4 months
Cool to see the long-awaited Tesla humanoid update! Nice progress on a new high-DoF hand and better locomotion. It’s true that most “mind-blowing” behaviors yesterday were teleoperated, and many researchers are understandably not happy about the marketing first approach. But
Tweet media one
5
6
56
@xiao_ted
Ted Xiao
12 days
Excited to share that 3 works exploring the frontier of foundation models + robotics are accepted to ICRA and ICLR 2025!
- RT-Affordance: visual affordances for VLA training
- STEER: how VLMs can orchestrate control via dense language
- GVL: in-context value functions
Links👇.
1
7
57
@xiao_ted
Ted Xiao
2 years
Excited to showcase how generative models can be used for semantically relevant image augmentation for robotics. ROSIE uses a diffusion model to produce semantically relevant visual augmentations on existing datasets -- unlocking new skills and more robust policies 🖌️🎨.
@xf1280
Fei Xia
2 years
Text-to-image generative models, meet robotics! . We present ROSIE: Scaling RObot Learning with Semantically Imagined Experience, where we augment real robotics data with semantically imagined scenarios for downstream manipulation learning. Website: 🧵👇
1
7
51
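The ROSIE recipe above, inpainting semantically new objects into existing robot episodes and relabeling the instruction to match, can be sketched with the diffusion call stubbed out. A real system would invoke a text-guided inpainting model; the stub, data shapes, and field names here are hypothetical.

```python
def inpaint(image, region, prompt):
    """Stub for a text-guided diffusion inpainting call (hypothetical
    API). A real system would mask the region and synthesize the
    prompted object into the masked pixels."""
    out = dict(image)
    out["objects"] = image["objects"].copy()
    out["objects"][region] = prompt
    return out

def rosie_augment(sample, region, new_object):
    """Produce an augmented (image, instruction) pair by imagining a
    new object into an existing episode, ROSIE-style: the inpainted
    image and the rewritten instruction stay consistent."""
    old_object = sample["image"]["objects"][region]
    new_image = inpaint(sample["image"], region, new_object)
    new_instruction = sample["instruction"].replace(old_object, new_object)
    return {"image": new_image, "instruction": new_instruction}

# Toy episode augmented with a semantically imagined object.
orig = {"image": {"objects": ["coke can", "table"]},
        "instruction": "pick up the coke can"}
aug = rosie_augment(orig, 0, "green chip bag")
```

Because actions are unchanged, the augmented pair can be added directly to the training set, which is what lets the policy generalize to objects it never physically manipulated.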
@xiao_ted
Ted Xiao
5 months
How can we connect advances in video generation foundation models to low-level robot actions? We propose a simple but powerful idea: condition robot policies directly on generated human videos!. Video Generation 🤝 Robot Actions. Check out Homanga’s thread:.
@mangahomanga
Homanga Bharadhwaj
5 months
Gen2Act: Casting language-conditioned manipulation as *human video generation* followed by *closed-loop policy execution conditioned on the generated video* enables solving diverse real-world tasks unseen in the robot dataset!. 1/n
1
9
54
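The two-stage Gen2Act idea above (first generate a human video from the language instruction, then run a closed-loop policy conditioned on that video) can be sketched with both models stubbed out. The generator and policy here are hypothetical placeholders, not the paper's architecture.

```python
def generate_human_video(instruction, n_frames=4):
    """Stub for a video generation model (hypothetical API):
    returns predicted human-demonstration frames for the task."""
    return [f"{instruction}-frame-{t}" for t in range(n_frames)]

def policy(observation, generated_video):
    """Stub closed-loop policy conditioned on the generated video.
    A real policy would attend over video features at every control
    step; here we just track progress through the generated plan."""
    step = observation["t"] % len(generated_video)
    return {"target_frame": generated_video[step]}

# Generate once, then condition every control step on the same video.
video = generate_human_video("open the drawer")
actions = [policy({"t": t}, video) for t in range(4)]
```

The separation matters: the video model carries the web-scale visual knowledge of how humans do tasks, while the policy only has to learn to track the generated video, a much narrower problem.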
@xiao_ted
Ted Xiao
11 months
Large scale cross-embodied language-conditioned agents in a variety of video game domains! What’s particularly exciting is seeing positive transfer: generalist agents outperform specialist agents.
@GoogleDeepMind
Google DeepMind
11 months
Introducing SIMA: the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games. 🕹️. It can complete tasks similar to a human, and outperforms an agent trained in just one setting. 🧵
0
4
50
@xiao_ted
Ted Xiao
1 year
Announcing our recent work RT-Sketch!
❌ Goal images contain a lot of useful information, but perhaps too much.
❌ Language instructions may not provide enough information.
✅ Goal *sketches* focus on the important details, and are easy to specify!
Check out the thread:
@priyasun_
Priya Sundaresan
1 year
We can tell our robots what we want them to do, but language can be underspecified. Goal images are worth 1,000 words, but can be overspecified. Hand-drawn sketches are a happy medium for communicating goals to robots!. 🤖✏️Introducing RT-Sketch: 🧵1/11
2
7
51
@xiao_ted
Ted Xiao
2 years
6) This is the highest praise possible if you’re in ML research. This is the peak 😎
Tweet media one
15
3
46
@xiao_ted
Ted Xiao
20 days
A lot of interesting discussion recently on the implications of OpenAI being a major sponsor of the FrontierMath benchmark. There are two extremes of what this implies:
1) omg OAI is benchmark hacking and leaking test data into training!
2) OAI uses questions in the private
Tweet media one
6
7
51