Andy Zeng
@andyzeng_
Followers
8K
Following
986
Media
60
Statuses
368
Building smarter robots @GoogleDeepMind. PhD @Princeton. CS & Math @UCBerkeley
Joined September 2017
Can robots learn to pick up objects and accurately toss them into bins outside their natural range? Check out our latest work, TossingBot! w/ @SongShuran, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser #robotics #AI #research
9
156
476
Code-writing LLMs are surprisingly good at 📝 writing reward functions for 🦾 MPC low-level control – providing a chat-like interface to teach robots to do things like "stand up and moon-walk" 🐶. Read more about it here 👇 and in @xf1280's 🧵.
🤖Excited to share our project, where we propose to use rewards represented in code as a flexible interface between LLMs and an optimization-based motion controller. Website: Want to learn more about how we make a robot dog do the moonwalk, MJ style? 🕺🕺
5
97
394
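For readers curious how "rewards as code" can drive a controller, here is a minimal sketch (not the paper's implementation): a hand-written stand-in for an LLM-authored reward term, scored by a toy random-shooting MPC loop. The dynamics model, state layout, and reward shape are illustrative assumptions.

```python
import numpy as np

def reward_stand_up(state: np.ndarray) -> float:
    """A reward term an LLM might emit for 'stand up': keep the torso high and level."""
    torso_height, torso_pitch = state[0], state[1]
    return -abs(torso_height - 0.35) - 0.5 * abs(torso_pitch)

def toy_dynamics(state: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Placeholder dynamics: the state drifts toward the commanded action."""
    return state + 0.1 * (action - state)

def random_shooting_mpc(state, reward_fn, horizon=10, num_samples=256):
    """Return the first action of the best randomly sampled action sequence."""
    best_return, best_action = -np.inf, None
    for _ in range(num_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, state.shape[0]))
        s, total = state.copy(), 0.0
        for a in actions:
            s = toy_dynamics(s, a)
            total += reward_fn(s)
        if total > best_return:
            best_return, best_action = total, actions[0]
    return best_action

state = np.zeros(2)  # [torso_height, torso_pitch], an assumed toy state
print("first MPC action:", random_shooting_mpc(state, reward_stand_up))
```

Swapping `reward_stand_up` for a different LLM-written reward function is the chat-like interface the tweet describes.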
Can robots 🤖 navigate to sounds 🔊 they've heard? w/ audio-language 🔊✏️ foundation models, excited that we can now ask our helper robots to "go to where you heard coughing". Audio-Visual-Language Maps w/ @huang_chenguang @oier_mees @wolfram_burgard:
1
46
201
We built PaLM-E 🌴🤖, one of the largest multimodal language models to date, trained end-to-end on robot data. Images, text, state inputs, neural scene embeddings – you name it. And it's fantastic on robots. Check out Danny's thread 👇.
What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist across robotics, vision, and language. Website:
2
26
197
Still crazy to me that we can prompt LLMs (GPT-3 or PaLM) with a bunch of numbers 📝 to discover and improve closed-loop policies that stabilize CartPole – entirely in-context w/o model finetuning. Read more in @suvir_m's post 👇 and try out the code
In a new preprint, we assess LLMs’ in-context learning abilities for *abstract* non-linguistic patterns—& explore how this might be useful for robotics. Examples: extrapolating symbolic patterns, extending periodic motions, and discovering simple policies (e.g. for CartPole). (1/8)
0
39
195
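A minimal sketch of the prompting idea above, not the paper's code: serialize a rollout as raw numbers and ask a language model to continue the pattern with the next action. `query_llm` is a hypothetical stand-in for whatever text-completion API is used.

```python
def format_trajectory(states, actions):
    """Serialize a rollout as plain numbers, e.g. '+0.03 +0.12 -0.02 +0.31 -> 1'."""
    lines = []
    for s, a in zip(states, actions):
        lines.append(" ".join(f"{x:+.2f}" for x in s) + f" -> {a}")
    return "\n".join(lines)

def next_action_in_context(query_llm, states, actions, new_state):
    """Ask the LLM to continue the numeric pattern with an action for new_state."""
    prompt = (format_trajectory(states, actions) + "\n"
              + " ".join(f"{x:+.2f}" for x in new_state) + " ->")
    return int(query_llm(prompt).strip().split()[0])

# Usage (with any completion function bound to `query_llm`):
# action = next_action_in_context(query_llm, past_states, past_actions, current_state)
```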
Excited to share "Visual Language Maps"! VLMaps fuse visual language model features into a dense 3D map for robot navigation from natural language instructions. Website: Led by the amazing @huang_chenguang w/ @oier_mees, @wolfram_burgard
3
31
159
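A toy sketch of the fuse-then-query idea above (not the released VLMaps pipeline): average per-pixel visual-language features into a 2D grid map, then locate a language query by cosine similarity. The feature extractor, pixel-to-cell projection, and text encoder are assumed to exist elsewhere.

```python
import numpy as np

def fuse_frame(grid, counts, feats, cells):
    """feats: (N, C) per-pixel features; cells: (N, 2) map cells they project to."""
    for f, (i, j) in zip(feats, cells):
        grid[i, j] += f
        counts[i, j] += 1
    return grid, counts

def locate(grid, counts, text_embedding):
    """Return the map cell whose fused feature best matches the text embedding."""
    fused = grid / np.maximum(counts[..., None], 1)            # mean feature per cell
    sims = fused @ text_embedding / (
        np.linalg.norm(fused, axis=-1) * np.linalg.norm(text_embedding) + 1e-8)
    return np.unravel_index(np.argmax(sims), sims.shape)

# Usage sketch: grid = np.zeros((H, W, C)); counts = np.zeros((H, W));
# call fuse_frame(...) per camera frame, then locate(grid, counts, embed_text("sofa")).
```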
Tried ImageNet pre-training for your robot learning models only to find out it didn't help? Turns out which dataset you use & which weights you transfer matter a lot. Check out our blog post! w/ @yen_chen_lin @SongShuran @phillip_isola Tsung-Yi Lin.
Check out new research into applying transfer learning to robotic manipulation. By leveraging pre-trained weights from computer vision models, it’s possible to greatly improve the training efficiency for robotic manipulation tasks. Learn all about it at
1
36
142
Released @PyTorch code for “Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning” (works for robots in both sim & real). Happy hacking :) Code: Paper: Project:
2
55
126
The Abstraction and Reasoning Corpus from @fchollet is a hard AGI benchmark – LLMs are the closest thing to a generalist that can do 85+ problems, and they still solve many of them with completely random tokens sampled from the vocabulary. This token invariance is fascinating 🤔
In a new preprint, we assess LLMs’ in-context learning abilities for *abstract* non-linguistic patterns—& explore how this might be useful for robotics. Examples: extrapolating symbolic patterns, extending periodic motions, and discovering simple policies (e.g. for CartPole). (1/8)
1
15
100
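A small sketch of the random-token probe described above (an assumption about the setup, not the authors' exact code): remap an ARC-style grid's symbols to arbitrary tokens from a vocabulary and serialize both versions for prompting, so the structure of the problem is preserved while the surface tokens change.

```python
import random

grid = [[0, 0, 1],
        [0, 1, 0],
        [1, 0, 0]]

vocab = ["apple", "zest", "qu", "##ing", "7", "blue"]  # stand-in token strings
symbols = sorted({v for row in grid for v in row})
mapping = dict(zip(symbols, random.sample(vocab, len(symbols))))

def serialize(grid, mapping=None):
    """Render the grid row by row, optionally remapping each symbol."""
    return "\n".join(" ".join(str(mapping[v]) if mapping else str(v) for v in row)
                     for row in grid)

print(serialize(grid))           # original symbols
print(serialize(grid, mapping))  # same structure, random tokens
```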
Through vision and interaction, can robots discover the physical properties of objects? We explore this question in our latest work, which will appear at RSS tomorrow. See you there! w/ Zhenjia Xu, @jiajunwu_cs, Josh Tenenbaum, @SongShuran #robotics #AI
1
14
98
Had a blast demo'ing language + robots (w/ PaLM 2) at Google I/O! w/ @xf1280 @brian_ichter @RandomRobotics @peteflorence Spencer Goodrich. (glad we didn't tank the stock price) 😅
3
4
99
For end-to-end robot learning: pixels to joint angles, or to Cartesian poses? IKP uses Implicit BC + (differentiable) kinematics to learn inductive patterns in both action spaces. w/ @AdityaGanapathi @peteflorence Jake Varley @kaylburns @Ken_Goldberg
1
15
92
Language models can generate plans and code 📝 but sometimes they’re just better off responding "I dunno 🤷♀️". Read more 👇 on how we’re aligning LLM uncertainty (with statistical guarantees) on robots 🤖 where safety matters.
LLMs can generate plans and write robot code 📝 but they can also make mistakes. How do we get LLMs to 𝘬𝘯𝘰𝘸 𝘸𝘩𝘦𝘯 𝘵𝘩𝘦𝘺 𝘥𝘰𝘯'𝘵 𝘬𝘯𝘰𝘸 🤷 and ask for help? Read more on how we can do this (with statistical guarantees) for LLMs on robots 👇.
0
19
88
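A minimal sketch of one way to trigger "ask for help" with a statistical flavor, assuming the LLM returns a confidence score per candidate plan; this is an illustration of the idea, not the paper's exact calibration procedure.

```python
import numpy as np

def calibrate_threshold(true_option_scores, alpha=0.1):
    """Approximate split-conformal quantile over held-out calibration scores of
    the correct option, so that it is retained roughly 1 - alpha of the time."""
    scores = np.sort(np.asarray(true_option_scores))
    k = max(int(np.floor(alpha * (len(scores) + 1))) - 1, 0)
    return scores[k]

def plan_or_ask(option_scores, qhat):
    """Execute only if exactly one option clears the threshold; otherwise ask."""
    keep = [i for i, s in enumerate(option_scores) if s >= qhat]
    if len(keep) == 1:
        return ("execute", keep[0])
    return ("ask_for_help", keep)

# Usage sketch: qhat = calibrate_threshold(calibration_scores)
# decision = plan_or_ask([0.72, 0.05, 0.18, 0.03], qhat)
```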
Turns out spatial action maps + intention representations improve multi-agent multi-skill coordination for mobile manipulation. We also have adorable little throwing Anki robots now too! w/ Jimmy Wu, X. Sun, @SongShuran, S. Rusinkiewicz, T. Funkhouser
0
18
63
Large model planners (PaLM-E) generate text 📝 but struggle with physics – can generating videos 🎞️ help? In "Video Language Planning" we train VLMs + video models to enable robots to imagine (then do) really long multi-step tasks. Led by the amazing @du_yilun @mengjiao_yang 👇.
Introducing Video Language Planning! By planning across the space of generated videos and language, we can synthesize long-horizon video plans and solve much longer-horizon tasks than existing baselines (such as RT-2 and PaLM-E). (1/5)
1
8
60
Turns out robots can write their own code using LLMs, given natural language instructions from people! Part of the magic is hierarchical code-gen (e.g. recursively defining functions), which also improves SOTA on generic code-gen benchmarks. Check out the 🧵 from Jacky!
How can robots perform a wide variety of novel tasks from natural language? Excited to present Code as Policies: using language models to directly write robot policy code from language instructions. See paper, colabs, blog, and demos at: long 🧵👇
0
7
59
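A rough sketch of the hierarchical code-gen idea above: when LLM-written policy code calls a function that doesn't exist yet, ask the LLM to write that helper and retry. `query_llm` and the robot APIs exposed in `scope` are hypothetical placeholders, not the released Code as Policies interface.

```python
def run_policy_code(code: str, scope: dict, query_llm, max_depth: int = 3):
    """Execute LLM-written policy code, recursively generating missing helpers."""
    for _ in range(max_depth):
        try:
            exec(code, scope)              # run the generated policy
            return
        except NameError as e:             # e.g. "name 'stack_objects' is not defined"
            missing = str(e).split("'")[1]
            helper = query_llm(f"Write a Python function named `{missing}` "
                               f"needed by this code:\n{code}")
            exec(helper, scope)            # define the helper, then retry
    raise RuntimeError("could not resolve undefined functions")

# Usage sketch: scope exposes robot APIs such as pick(obj) and place(obj, pos);
# code = query_llm("# put the blocks in the matching bowls")
# run_policy_code(code, scope, query_llm)
```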
Incredibly thrilled to have our work on TossingBot receive the Best Systems Paper Award at RSS 2019! Congrats to all my coauthors, and a huge shout-out to my collaborators @GoogleAI who helped make this work possible :). Links to paper and videos:
3
4
60
Getting closer to robots 🤖 that learn in-context (fast adaptation) by day 📝 and fine-tune by night 😴. Excited that we're thinking more about human-robot interaction as model predictive control (powered by foundation models)! Read more 👇
We can teach LLMs to write better robot code through natural language feedback. But can LLMs remember what they were taught and improve their teachability over time? Introducing our latest work, Learning to Learn Faster from Human Feedback with Language Model Predictive Control
0
8
56
We wrote a blog post on XIRL! (a) If a person had to transfer pens between cups, they might do it all at once. (b) If a robot had to do the same, it might do it one by one due to hardware limits. Can robots self-learn skill (b) from videos of a person doing (a)?
Introducing XIRL, a self-supervised method for Cross-embodiment Inverse RL, which summarizes task objective knowledge from videos in the form of reward functions used to teach tasks to robots with new physical embodiments. Read more and copy the code ↓
0
6
57
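A simplified sketch of the cross-embodiment reward idea above (not the released XIRL code): embed the current camera frame and use the negative distance to a goal embedding distilled from human videos as the reward. `embed` stands in for whatever self-supervised frame encoder is trained.

```python
import numpy as np

def xirl_style_reward(frame, goal_embedding, embed):
    """Reward = negative distance between the frame's embedding and the goal."""
    z = embed(frame)
    return -float(np.linalg.norm(z - goal_embedding))

# The goal embedding could be the mean embedding of the final frames of human demos:
# goal_embedding = np.mean([embed(video[-1]) for video in human_videos], axis=0)
```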
We're hosting a workshop on Language and Robotics this year at #RSS2023! We've got an incredible panel of speakers, and we're excited to discuss the future of articulate robots together! Join us and make a submission here:
0
11
52
This came out of an amazing collaboration between the Robotics and AR teams at Google w/ @almostsquare @tek2222 @kchorolab @fedassa @aveekly @ryoo_michael @vikassindhwani @JohnnyChungLee Vincent Vanhoucke @peteflorence.
1
1
49
Really excited about this! Simple BC with implicit models (states & actions as input) can learn complex closed-loop manipulation skills from RGB pixels better than their explicit counterparts, and gives rise to a new class of BC baselines that are competitive with SOTA offline RL.
Excited to share more about our "Implicit Behavioral Cloning" work! ✅ *code* just released: ✅ *videos*: Will be sharing more this week at #CoRL2021. I'll also maybe write a TL;DR thread soon; meanwhile, check out the website!
0
15
46
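For context on what an "implicit" policy means in practice, here is a bare-bones sketch of the inference step (derivative-free sampling rather than the paper's full training and optimization recipe). `energy` is assumed to be a trained E(observation, action) network.

```python
import numpy as np

def implicit_policy_act(energy, obs, action_low, action_high, num_samples=1024):
    """Pick the sampled action with the lowest energy under E(obs, action),
    instead of regressing the action directly as an explicit policy would."""
    actions = np.random.uniform(action_low, action_high,
                                size=(num_samples, len(action_low)))
    energies = np.array([energy(obs, a) for a in actions])
    return actions[np.argmin(energies)]

# Usage sketch:
# a = implicit_policy_act(trained_energy_fn, obs, action_low=[-1, -1], action_high=[1, 1])
```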
It turns out that rigid spatial displacements can serve as useful priors for non-rigid ones — enabling goal-driven rearrangement of deformable objects! Check out our latest work w/ Daniel Seita, @peteflorence, @JonathanTompson, @erwincoumans, @vikassindhwani, @ken_goldberg.
Learn about a new open-source benchmark and suite of simulated tasks for robotic manipulation of deformable objects — including cables, fabrics and bags — with a set of model architectures that enable learning complex relative spatial relations.
0
5
37
Visual prompting 🖼️📝 meets sampling-based optimization 🎲📊. PIVOT is a neat way to extract more (e.g. spatial, actionable) knowledge from large VLMs (GPT-4 or Gemini) in ways that can be used on agents and robots. Read more 👇 at
How do you get zero-shot robot control from VLMs? Introducing Prompting with Iterative Visual Optimization, or PIVOT! It casts spatial reasoning tasks as VQA by visually annotating images, which VLMs can understand and answer. Project website:
0
2
39
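A schematic sketch of the iterative visual optimization loop described above. `annotate_and_ask_vlm` is a hypothetical function that draws numbered candidate points on the image, queries a VLM, and returns the indices it picked; the Gaussian resampling is an illustrative choice.

```python
import numpy as np

def pivot_style_optimize(image, annotate_and_ask_vlm, num_iters=3, num_candidates=8):
    """Iteratively sample 2D candidates, let a VLM pick the best, and refine."""
    mean, std = np.array([0.5, 0.5]), np.array([0.3, 0.3])      # normalized image coords
    for _ in range(num_iters):
        candidates = np.clip(np.random.normal(mean, std,
                                              size=(num_candidates, 2)), 0.0, 1.0)
        chosen = annotate_and_ask_vlm(image, candidates)         # indices chosen by VLM
        if len(chosen) == 0:
            continue
        picked = candidates[np.asarray(chosen)]
        mean = picked.mean(axis=0)
        std = 0.5 * (picked.std(axis=0) + 1e-3)                  # shrink the search
    return mean                                                  # refined 2D target
```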
Congratulations!
UMI was named an Outstanding System Paper finalist at #RSS2024. Congratulations team!! 🥳 Hope to see more UMIs running around the world 😊!
0
2
37
Training robot hands 🤖 to play piano 🎹 is surprisingly hard – subtleties of precise contact, hitting chords at just the right moments, moving fingers in anticipation of what comes next… all make it a great testbed for control. Check out our latest benchmark in Kevin's post! 👇
Introducing 𝗥𝗼𝗯𝗼𝗣𝗶𝗮𝗻𝗶𝘀𝘁 🎹🤖, a new benchmark for high-dimensional robot control! Solving it requires mastering the piano with two anthropomorphic hands. This has been one year in the making, and I couldn’t be happier to release it today! Some highlights below:
1
6
31
Impressive results from “Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation” by @peteflorence, Lucas Manuelli, Russ Tedrake. They show a robot using the learned descriptors to find class-consistent grasping points.
0
20
30
Really excited about our latest upgrades to PaLM-SayCan! Love that we get to benefit from LLM capabilities, as we start thinking more about language as robot middleware. 🧵👇
We have some exciting updates to SayCan! Together with the updated paper, we're adding new resources to learn more about this work: Interactive site: Blog posts: and Video:
1
2
30
Join us for the workshop on "Pre-training Robot Learning" at CoRL 2022! Submission deadline: Sep 28, 2022. We have an incredible lineup of speakers! Website: @stepjamUK's 🧵👇
Announcing the 1st "Workshop on Pre-training Robot Learning" at @corl_conf, Dec 15. Fantastic lineup of speakers: Jitendra Malik, Chelsea Finn, Joseph Lim, Kristen Grauman, Abhinav Gupta, Raia Hadsell. Submit your 4-page extended abstract by September 28.
2
5
30
Code-writing LLMs 📝 + Python interpreters 💻 can do powerful things - but sometimes asking the LLM to "simulate" the interpreter can help with linguistic subtasks (e.g. get_facts, detect_sarcasm). "LMulators" for code-driven reasoning set a new SOTA on BBH. See @ChengshuEricLi's 🧵👇
We are excited to announce Chain of Code (CoC), a simple yet surprisingly effective method that improves Language Model code-driven reasoning. On BIG-Bench Hard, CoC achieves 84%, a gain of 12% over Chain of Thought. Website: Paper:
0
5
27
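A toy sketch of the "LMulator" idea above: run each generated line through the Python interpreter when possible, and fall back to the LLM to simulate the state update when the line isn't executable. `query_llm` is a hypothetical completion function, and assuming the model returns the new program state as a dict literal is an illustrative simplification.

```python
import ast

def chain_of_code_step(line: str, state: dict, query_llm):
    """Execute one generated line for real if possible, else emulate it with the LLM."""
    try:
        exec(line, {}, state)                  # interpreter handles executable lines
    except Exception:
        prompt = (f"state = {state}\n"
                  f"# Simulate this Python line and return the new state as a dict literal:\n"
                  f"{line}\n")
        state.update(ast.literal_eval(query_llm(prompt)))
    return state

# Usage sketch:
# state = {}
# for line in generated_program.splitlines():
#     state = chain_of_code_step(line, state, query_llm)
```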
Incredibly exciting work from our colleagues at Google on large language models <-> robot affordances!
Super excited to introduce SayCan: the 1st publication of a large effort we've been working on for 1+ years. Robots ground large language models in reality by acting as their eyes and hands, while LLMs help robots execute long, abstract language instructions.
0
1
28
Check out the summary video of our upcoming @iros_2018 publication: “Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning” #robotics #iros2018. Full video: Project webpage:
0
11
23
I love that LLMs are capable of generating code to compose vision APIs 📷 to answer questions about images – with competitive few-shot performance! "Infinite use of finite means": excited about the potential here for vision. Check out our take on CodeVQA in Sanjay's thread! 👇
New paper at #acl2023nlp! "Modular Visual Question Answering via Code Generation". With @medhini_n @kushaltk1248 @KevinYa33964384 @NagraniArsha @CordeliaSchmid @andyzengtweets @trevordarrell Dan Klein (@berkeley_ai/@GoogleAI)! 📜 💻
0
5
24
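A hedged sketch of the idea above: the LLM writes a short program that composes vision primitives to answer a question. The primitives below (`find`, `simple_query`) and the prompt format are hypothetical stand-ins, not the paper's released API.

```python
def answer_question(image, question, query_llm, primitives):
    """Have the LLM compose vision calls into code that computes `answer`."""
    prompt = ("# Available: find(image, name) -> list of crops, "
              "simple_query(crop, question) -> str\n"
              f"# Write Python that sets a variable `answer` for: {question}\n")
    code = query_llm(prompt)
    scope = {"image": image, **primitives}
    exec(code, scope)                  # run the generated composition of vision calls
    return scope.get("answer")

# e.g. for "How many mugs are on the table?" the LLM might emit:
#   answer = len(find(image, "mug"))
```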
Predicting dense heatmaps of navigational endpoints from visual input seems to help RL agents more quickly learn mobile manipulation tasks like pushing. Check out spatial action maps! w/ Jimmy Wu, X. Sun, @SongShuran, J. Lee, S. Rusinkiewicz, T. Funkhouser
0
1
23
In robot learning, we often assume discrete-time MDPs. But physical robots are not discrete-time! Sensors stream asynchronously: images @ 30Hz, proprioception @ 100Hz, force-torque @ 500Hz… InFuser takes a step towards continuous-time multi-scale feedback control w/ N-CDEs. See Sumeet’s 🧵👇
Introducing 'InFuser' - an architecture for learning hybrid continuous-time policies for dynamic tasks! Using Neural CDEs [@patrickkidger], we present a model for handling irregularly sampled, multi-frequency, multi-sensory observations and outputting continuous-time control.
0
2
22
Rearranging deep features can provide spatial structure that improves learning of rearrangement tasks for robot manipulation. Brute-force search here is just a convolution, fast and practical for real-world pick-and-place. Now open-source (thanks to @ayzwah)!
Can models more efficiently learn rearrangement tasks by overlaying 3D space instead of using object-centric representations? Check out Transporter Nets, an open-source framework for sample-efficient robot manipulation, with related benchmark tasks. See ↓
0
3
20
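A simplified sketch of "search as a convolution" described above: score every candidate placement by cross-correlating a crop's deep features against the scene's feature map, then take the argmax. The dense loops stand in for the FFT/conv ops a real implementation would use, and the shapes are illustrative.

```python
import numpy as np

def placement_heatmap(scene_feats, crop_feats):
    """scene_feats: (H, W, C); crop_feats: (h, w, C). Returns (H-h+1, W-w+1) scores."""
    H, W, _ = scene_feats.shape
    h, w, _ = crop_feats.shape
    scores = np.zeros((H - h + 1, W - w + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            scores[i, j] = np.sum(scene_feats[i:i + h, j:j + w] * crop_feats)
    return scores

def best_placement(scene_feats, crop_feats):
    """Brute-force search = argmax of the cross-correlation heatmap."""
    scores = placement_heatmap(scene_feats, crop_feats)
    return np.unravel_index(np.argmax(scores), scores.shape)   # (row, col) pixel
```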
Check out this excellent summary video of Form2Fit by @karoly_zsolnai from Two Minute Papers!
This Robot Arm Learned To Assemble Objects It Hasn’t Seen Before.▶️Full video (ours): 📜Source paper: #ai #deeplearning #science #twominutepapers
0
2
18
Great blog post from Kevin on sample-efficient representations and inductive biases! Convolutions for vision, self-attention for language and sequences: what will it be for robotics?
New blog post: "Representation Matters". How cleverly designing your state and action space can give you orders of magnitude more sample efficiency in imitation learning.
0
3
16
We’d like robots to learn from YouTube videos… but humans not only *look* different, but also *do* things differently than robots. We study 3rd-person imitation with self-supervised rewards that generalize to new embodiment appearances and control strategies. Kevin’s thread! 👇
How can robots 🤖 learn from videos of humans, especially when humans perform the same task in different ways? A 🧵 introducing our #CoRL2021 paper "XIRL: Cross-embodiment Inverse RL". Website & code: 1/
0
3
17
Incredibly compelling results on modeling the distributional multimodalities ✌️ in policy space. Fantastic work Cheng and team!
What if the form of the visuomotor policy has been the bottleneck for robotic manipulation all along? Diffusion Policy achieves a 46.9% improvement vs. prior SOTA on 11 tasks from 4 benchmarks + 4 real-world tasks! (1/7) Website: Paper:
0
1
17
Is it possible to predict 3D data and semantics for a full 360° view using only a single image? Come to our oral for Im2Pano3D this Wed at #CVPR2018! GPU-enabled code @NVIDIAAIDev available: w/ @SongShuran, A. Chang, M. Savva, @silviocinguetta, T. Funkhouser
0
7
15
This project was led by the amazing @huang_chenguang w/ @oier_mees and @wolfram_burgard, an incredibly fun collaboration w/ Freiburg University and University of Technology Nuremberg. Website & paper:
5
2
15
Awesome summary of TossingBot from @karoly_zsolnai @ Two Minute Papers. Thanks for sharing Karoly!
0
1
14
Congratulations, Shuran!
Congrats to our @ColumbiaCompSci Prof Shuran Song @SongShuran, who's won an @NSF CAREER award to enable #Robots to learn on their own and adapt to new environments. @ColumbiaScience @Columbia
0
0
15
Textual closed-loop feedback enables language model robot planners to:
✅ react to lower-level control mistakes
✅ adapt to new instructions on the fly
✅ propose new plans if the original was infeasible
✅ answer natural language questions about their understanding of the world
🧵👇
Have you ever “heard” yourself talk in your head? Turns out it's a useful tool for robots too! Introducing Inner Monologue: feeding continual textual feedback into LLMs allows robots to articulate a grounded “thought process” to execute long, abstract instructions 🧵👇
0
1
14
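A schematic sketch of the closed-loop textual feedback pattern described above. `query_llm`, `execute_skill`, and `describe_scene` are hypothetical hooks for the planner, the low-level controllers, and the scene descriptors (success detectors, object detectors, human answers).

```python
def inner_monologue_loop(instruction, query_llm, execute_skill, describe_scene,
                         max_steps=10):
    """Plan one step at a time, appending textual feedback after each action."""
    history = [f"Human: {instruction}"]
    for _ in range(max_steps):
        step = query_llm("\n".join(history) + "\nRobot action:")
        if "done" in step.lower():
            break
        success = execute_skill(step)                       # low-level controller
        history.append(f"Robot action: {step}")
        history.append(f"Feedback: {'success' if success else 'failure'}. "
                       f"Scene: {describe_scene()}")        # grounded textual feedback
    return history
```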
From digital assistants to robot butlers, learning user preferences and generalizing them to new settings can provide a more personalized experience. Check out Jimmy's 🧵👇 on how we're doing this with LLMs and foundation models towards a generalist TidyBot 🧹 (that throws!).
When organizing a home, everyone has unique preferences for where things go. How can household robots learn your preferences from just a few examples? Introducing 𝗧𝗶𝗱𝘆𝗕𝗼𝘁: Personalized Robot Assistance with Large Language Models. Project page:
0
0
13
Are there questions you’d like to ask our speakers for the panel discussion during the “Scaling Robot Learning” workshop at #ICRA2022 (May 27)? Fill out this form here: Workshop:
0
3
12
Turns out we can get multiple models to jointly steer LLM next-token prediction as a way to ground them (e.g. to visual inputs, the physical world). Check out Wenlong's thread on what we're excited to call "grounded decoding" 👇.
Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLMs for robots. Website: 🧵👇
0
1
11
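A minimal sketch of jointly steering next-token prediction as described above: combine (in log space) the LLM's token scores with a grounded model's token scores and pick the token that satisfies both. Both scoring functions and the greedy decoding loop are placeholders for real models and samplers.

```python
import numpy as np

def grounded_decode_step(llm_logprobs, grounded_logprobs, beta=1.0):
    """llm_logprobs, grounded_logprobs: arrays over the same token vocabulary."""
    combined = llm_logprobs + beta * grounded_logprobs   # joint score per token
    return int(np.argmax(combined))                      # greedy pick

# Decoding loop sketch:
# while not done:
#     tok = grounded_decode_step(llm_token_logprobs(prefix),
#                                affordance_token_logprobs(prefix))
#     prefix.append(tok)
```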
Had a fantastic time at #ICRA2018! Here was our poster on “Robotic Pick-and-place of Novel Objects in Clutter with Multi-affordance Grasping and Cross-domain Image Matching” with Team MIT-Princeton from the Amazon Robotics Challenge!
0
3
9
100% agree. It still surprises me to this day just how much easier the optimization can be with the right architectures and data representations. E.g. Transformers!
An incredible feature of neural nets is that by choosing certain architectures they may exhibit extreme generalization. Similarly in sequence prediction, by choosing the right RNNs, the model may generalize extremely.
0
1
9
Exciting new ICRA paper led by @WiYoungsun! VIRDO uses neural fields to predict how an object will deform, given visual-tactile sensing w/ partial point clouds + forces & contacts. w/ @NimaFazeli7’s robotics lab at UMich, @peteflorence
1
0
7
Fantastic sculptures from @JohnVMuntean. A nice example of why a 3D understanding of our visual world is important for AI systems: to be aware of 2D ambiguity.
0
2
6
In environments where there are multiple instances of the same object (chairs, tables, or sofas), sound can help robots disambiguate destinations – "go to the sofa near the sound of the baby crying." @huang_chenguang puts this to the test with a number of simulated experiments.
1
0
5
Submission deadline: May 25, 2022 AoE. Best Paper Award of $1,000 sponsored by @FlexivRobotics.
Take a look at #ATXWest exhibits: Flexiv's robotic #teleoperation allows the operator to make one or more remote arm(s) do synchronous operations by controlling a master arm and getting real-time force feedback. It can be widely applied in fields of medical treatment, R&D, etc.
0
0
3
@athundt @stepjamUK Thank you for the pointers, Andrew! This is an important topic and highly relevant to the workshop as well. We've added it as a topic to be discussed in the workshop and panel discussions.
1
0
2
Robo-pickers today are often programmed with only a single skill: how to grasp objects. But is it possible to have a robot automatically learn other skills (like pushing) to support more efficient picking? #Robotics #AI #DeepLearning Our latest work:
0
2
2
Impressive work on deep quadruped control for animation from mocap data: “Mode-Adaptive Neural Networks for Quadruped Motion Control” by @blacksquirrel__ @dukecyto and others at Univ. of Edinburgh and Adobe. Paper:
0
3
2
Can neural nets infer what’s behind you? Kinda. We tried :) Check out our new CVPR publication: #AI #DeepLearning #ComputerVision
0
1
2
MIT's awesome article and video on our robo-picker: Project website+paper+code: Accepted for publication at ICRA 2018. Proud of the incredible team :) #Robotics #AI #DeepLearning
0
0
1