Andy Zeng

@andyzeng_

Followers: 8K
Following: 986
Media: 60
Statuses: 368

Building smarter robots @GoogleDeepMind. PhD @Princeton. CS & Math @UCBerkeley

Joined September 2017
@andyzeng_
Andy Zeng
3 years
With multiple foundation models “talking to each other”, we can combine commonsense across domains, to do multimodal tasks like zero-shot video Q&A or image captioning, no finetuning needed. Socratic Models: website + code: paper:
21
379
2K
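A minimal sketch of the composition idea behind the tweet above: a VLM turns pixels into text, and an LM reasons over that text. The `vlm_caption` and `lm_complete` functions here are hypothetical stubs, not the released Socratic Models code.

```python
# Minimal sketch of composing a VLM and an LM through language (Socratic Models idea).
# vlm_caption / lm_complete are hypothetical stubs; swap in real foundation-model calls.

def vlm_caption(image) -> str:
    """Stub: a visual-language model describes what it sees."""
    return "a person pouring coffee into a mug on a kitchen counter"

def lm_complete(prompt: str) -> str:
    """Stub: a large language model completes the prompt."""
    return "They are making coffee, probably in the morning."

def answer_about_image(image, question: str) -> str:
    # The VLM turns pixels into text; the LM reasons over that text.
    description = vlm_caption(image)
    prompt = (
        f"Image description: {description}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return lm_complete(prompt)

if __name__ == "__main__":
    print(answer_about_image(image=None, question="What is the person doing, and when?"))
```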
@andyzeng_
Andy Zeng
3 years
Join us next week at the CVPR Tutorial on Vision-Based Robot Learning! We’ll distribute Colabs that show you how to run Socratic Models for language-driven robot pick & place right in your browser (in person, or online!).
17
205
1K
@andyzeng_
Andy Zeng
6 years
Can robots learn to pick up stuff and accurately toss it into bins outside their natural range? Check out our latest work, TossingBot! w/ @SongShuran, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser #robotics #AI #research
9
156
476
@andyzeng_
Andy Zeng
2 years
Code-writing LLMs are surprisingly good at 📝 writing reward functions for 🦾 MPC low-level control – providing a chat-like interface to teach robots to do things like "stand up and moon-walk" 🐶. Read more about it here 👇 and in @xf1280's 🧵.
@xf1280
Fei Xia
2 years
🤖Excited to share our project where we propose to use rewards represented in code as a flexible interface between LLMs and an optimization-based motion controller. Website: Want to learn more about how we make a robot dog do the moonwalk, MJ style? 🕺🕺
5
97
394
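A toy sketch of the "rewards as code" interface described above: a generated reward function bridges the language model and a low-level optimizer. Here the `reward_src` string stands in for LLM output and the "controller" is a crude random-shooting optimizer, not the real MPC stack; all names are hypothetical.

```python
# Toy sketch of "rewards as code" between an LLM and an optimization-based controller.
# reward_src stands in for text an LLM would generate from an instruction like "stand up";
# the controller below is a crude random-shooting optimizer, not real MPC.
import numpy as np

reward_src = """
def reward(state):
    # Higher body height and small tilt -> closer to "standing up".
    height, tilt = state
    return height - 0.5 * abs(tilt)
"""

scope = {}
exec(reward_src, scope)          # compile the LLM-written reward
reward = scope["reward"]

def simulate(state, action):
    # Stand-in dynamics: the action nudges height and tilt.
    height, tilt = state
    return (height + 0.1 * action[0], tilt + 0.1 * action[1])

def plan(state, num_samples=256, rng=np.random.default_rng(0)):
    # Pick the sampled action whose predicted next state scores best under the reward.
    actions = rng.uniform(-1, 1, size=(num_samples, 2))
    scores = [reward(simulate(state, a)) for a in actions]
    return actions[int(np.argmax(scores))]

print(plan(state=(0.2, 0.3)))
```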
@andyzeng_
Andy Zeng
2 years
Can robots 🤖 navigate to sounds 🔊 they've heard? w/ audio-language 🔊✏️ foundation models, excited that we can now ask our helper robots to "go to where you heard coughing". Audio-Visual-Language Maps w/ @huang_chenguang @oier_mees @wolfram_burgard:
1
46
201
@andyzeng_
Andy Zeng
2 years
We built PaLM-E 🌴🤖, one of the largest multimodal language models to date, trained end-to-end on robot data. Images, text, state inputs, neural scene embeddings – you name it. And it's fantastic on robots. Check out Danny's thread 👇.
@DannyDriess
Danny Driess
2 years
What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist - across robotics, vision, and language. Website:
2
26
197
@andyzeng_
Andy Zeng
2 years
Still crazy to me that we can prompt LLMs (GPT-3 or PaLM) with a bunch of numbers 📝 to discover and improve closed-loop policies that stabilize CartPole – entirely in-context w/o model finetuning. Read more in @suvir_m's post 👇 and try out the code
@suvir_m
Suvir Mirchandani
2 years
In a new preprint, we assess LLMs’ in-context learning abilities for *abstract* non-linguistic patterns—& explore how this might be useful for robotics. Examples:
- extrapolating symbolic patterns
- extending periodic motions
- discovering simple policies (e.g. for CartPole)
(1/8)
0
39
195
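A sketch of the idea in the tweet above: serialize states and actions as plain numbers in a text prompt and let the LLM continue the pattern. The `llm_next_token` function is a hypothetical stub standing in for a real GPT-3/PaLM call.

```python
# Sketch of treating an LLM as a sequence model over raw numbers (in-context policy improvement).
# llm_next_token is a hypothetical stub; the real work prompts an actual LLM.

def serialize(history):
    # Flatten (state, action) pairs into a plain text sequence of numbers.
    lines = []
    for state, action in history:
        lines.append(" ".join(f"{x:.2f}" for x in state) + f" -> {action}")
    return "\n".join(lines)

def llm_next_token(prompt: str) -> str:
    """Stub: an LLM would complete the numeric pattern; here we just echo an action."""
    return "1"

def in_context_policy(history, new_state):
    prompt = serialize(history) + "\n" + " ".join(f"{x:.2f}" for x in new_state) + " ->"
    return int(llm_next_token(prompt))

history = [((0.01, 0.20, -0.03, -0.30), 0), ((0.02, 0.01, -0.04, 0.02), 1)]
print(in_context_policy(history, new_state=(0.02, -0.18, -0.03, 0.33)))
```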
@andyzeng_
Andy Zeng
2 years
Excited to share "Visual Language Maps"! VLMaps fuse visual language model features into a dense 3D map for robot navigation from natural language instructions. Website: Led by the amazing @huang_chenguang w/ @oier_mees, @wolfram_burgard
3
31
159
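A rough sketch of the core VLMaps idea mentioned above: back-project per-pixel visual-language features into a top-down grid, then index that grid with a text embedding. The feature extractors are random stand-ins (the real system uses a model like LSeg plus camera poses and depth); this is not the released code.

```python
# Sketch of the VLMaps idea: fuse per-pixel visual-language features into a top-down grid,
# then localize landmarks by text-embedding similarity. Features here are random stubs.
import numpy as np

D, H, W = 8, 64, 64                       # feature dim, map size
rng = np.random.default_rng(0)
feature_map = np.zeros((H, W, D))         # running mean of features per map cell
counts = np.zeros((H, W, 1))

def fuse(pixel_features, cells):
    # pixel_features: (N, D) VLM features; cells: (N, 2) map coords from depth + pose.
    for f, (i, j) in zip(pixel_features, cells):
        counts[i, j] += 1
        feature_map[i, j] += (f - feature_map[i, j]) / counts[i, j]

def localize(text_embedding):
    # Cosine similarity between each map cell and the text query; argmax is the landmark.
    norms = np.linalg.norm(feature_map, axis=-1) * np.linalg.norm(text_embedding) + 1e-8
    sim = feature_map @ text_embedding / norms
    return np.unravel_index(np.argmax(sim), sim.shape)

fuse(rng.normal(size=(100, D)), rng.integers(0, H, size=(100, 2)))
print(localize(rng.normal(size=D)))       # e.g. the cell best matching "sofa"
```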
@andyzeng_
Andy Zeng
5 years
Tried ImageNet pre-training for your robot learning models only to find out it didn't help? Turns out which dataset you use & which weights you transfer matter a lot. Check out our blog post! w/ @yen_chen_lin @SongShuran @phillip_isola Tsung-Yi Lin.
@GoogleAI
Google AI
5 years
Check out new research into applying transfer learning to robotic manipulation. By leveraging pre-trained weights from computer vision models, it’s possible to greatly improve the training efficiency for robotic manipulation tasks. Learn all about it at
1
36
142
@andyzeng_
Andy Zeng
7 years
Released @PyTorch code for “Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning” (works for robots in both sim & real). Happy hacking :) Code: Paper: Project:
2
55
126
@andyzeng_
Andy Zeng
3 years
From recalling events, to contextual and temporal reasoning – prompting foundation models to engage in guided Socratic discussions enables a variety of new open-ended video Q&A capabilities.
1
8
107
@andyzeng_
Andy Zeng
2 years
The Abstraction and Reasoning Corpus from @fchollet is a hard AGI benchmark – LLMs are the closest thing to a generalist that can do 85+ problems, and still continue to do many of them with completely random tokens sampled from the vocabulary. This token invariance is fascinating 🤔
@suvir_m
Suvir Mirchandani
2 years
In a new preprint, we assess LLMs’ in-context learning abilities for *abstract* non-linguistic patterns—& explore how this might be useful for robotics. Examples:
- extrapolating symbolic patterns
- extending periodic motions
- discovering simple policies (e.g. for CartPole)
(1/8)
1
15
100
@andyzeng_
Andy Zeng
6 years
Through vision and interaction, can robots discover the physical properties of objects? We explore this question in our latest work, which will appear at RSS tomorrow. See you there! w/ Zhenjia Xu, @jiajunwu_cs, Josh Tenenbaum, @SongShuran #robotics #AI
1
14
98
@andyzeng_
Andy Zeng
2 years
Had a blast demo'ing language + robots (w/ PaLM 2) at Google I/O! w/ @xf1280 @brian_ichter @RandomRobotics @peteflorence Spencer Goodrich (glad we didn't tank the stock price) 😅
3
4
99
@andyzeng_
Andy Zeng
3 years
For end-to-end robot learning: pixels to joint angles, or to Cartesian poses? IKP uses Implicit BC + (differentiable) kinematics to learn inductive patterns in both action spaces. w/ @AdityaGanapathi @peteflorence Jake Varley @kaylburns @Ken_Goldberg
1
15
92
@andyzeng_
Andy Zeng
3 years
One way to approach video understanding is to turn it into a reading comprehension problem. This turns a classically hard computer vision task into something that we know large language models are good at.
2
6
88
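A sketch of "video understanding as reading comprehension" from the tweet above: turn frames and audio into a text log, then hand that log to an LLM with a question. The `caption_frame`, `transcribe_audio`, and `lm` functions are hypothetical stubs.

```python
# Sketch of video Q&A as reading comprehension: build a text log from vision/audio models,
# then let an LLM answer over it. caption_frame / transcribe_audio / lm are stubs.

def caption_frame(frame) -> str:
    return "a person opens the fridge"           # stub for a VLM captioner

def transcribe_audio(clip) -> str:
    return "let's see what we have for dinner"   # stub for a speech model

def lm(prompt: str) -> str:
    return "They were deciding what to cook."    # stub for an LLM

def video_qa(frames, audio_clips, question: str) -> str:
    log = []
    for t, frame in enumerate(frames):
        log.append(f"[{t}s] vision: {caption_frame(frame)}")
    for t, clip in enumerate(audio_clips):
        log.append(f"[{t}s] speech: {transcribe_audio(clip)}")
    prompt = "Video log:\n" + "\n".join(log) + f"\n\nQuestion: {question}\nAnswer:"
    return lm(prompt)

print(video_qa(frames=[None, None], audio_clips=[None], question="What was the person doing?"))
```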
@andyzeng_
Andy Zeng
2 years
Language models can generate plans and code 📝 but sometimes they’re just better off responding "I dunno 🤷‍♀️". Read more 👇 on how we’re aligning LLM uncertainty (with statistical guarantees) on robots 🤖 where safety matters.
@allenzren
Allen Z. Ren
2 years
LLMs can generate plans and write robot code 📝 but they can also make mistakes. How do we get LLMs to 𝘬𝘯𝘰𝘸 𝘸𝘩𝘦𝘯 𝘵𝘩𝘦𝘺 𝘥𝘰𝘯'𝘵 𝘬𝘯𝘰𝘸 🤷 and ask for help? Read more on how we can do this (with statistical guarantees) for LLMs on robots 👇.
0
19
88
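A small sketch of one way to get the "statistical guarantees" mentioned above: split conformal prediction over the LLM's multiple-choice option scores, asking for help whenever the calibrated prediction set is not a single option. This is illustrative only, with made-up calibration scores, and is not the released code.

```python
# Sketch of calibrated "ask for help": conformal prediction over an LLM's option scores.
import numpy as np

def calibrate(cal_scores, alpha=0.1):
    # cal_scores: probability the LLM assigned to the *correct* option on calibration data.
    n = len(cal_scores)
    nonconformity = np.sort(1.0 - np.asarray(cal_scores))
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n) - 1   # conservative quantile index
    return nonconformity[k]

def prediction_set(option_probs, qhat):
    # Keep every option whose nonconformity is below the calibrated threshold.
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]

qhat = calibrate(cal_scores=[0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6, 0.88], alpha=0.1)
options = {"place apple in drawer": 0.55, "place apple in bowl": 0.40, "do nothing": 0.05}
kept = prediction_set(options, qhat)
print(kept, "-> ask for help" if len(kept) != 1 else "-> act")
```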
@andyzeng_
Andy Zeng
5 years
Sensing transparent objects is the Achilles heel of 3D vision in robotics. Can deep learning help? Check out ClearGrasp -- enabling commodity RGB-D sensors to see transparent surfaces, and improve robotic grasping. w/ SynthesisAI, @Columbia, @GoogleAI.
1
21
75
@andyzeng_
Andy Zeng
3 years
What do we need to scale robot learning? Self-supervision? Simulation? Distributed training? Here’s our call for workshop papers: ICRA 2022 Workshop on Scaling Robot Learning. We’ve got a great lineup of speakers. Looking forward to your contributions!
0
16
66
@andyzeng_
Andy Zeng
3 years
A couple more examples – here’s zero-shot image captioning, with the large language model (LM) and visual-language model (VLM) working together. Code is already open-source for this one:
1
4
65
@andyzeng_
Andy Zeng
4 years
Turns out spatial action maps + intention representations improve multi-agent multi-skill coordination for mobile manipulation. We also have adorable little throwing Anki robots now too! w/ Jimmy Wu, X. Sun, @SongShuran, S. Rusinkiewicz, T. Funkhouser
0
18
63
@andyzeng_
Andy Zeng
1 year
Large model planners (PaLM-E) generate text 📝 but struggle with physics – can generating videos 🎞️ help? In "Video Language Planning" we train VLMs + video models to enable robots to imagine (then do) really long multi-step tasks. Led by the amazing @du_yilun @mengjiao_yang 👇.
@du_yilun
Yilun Du
1 year
Introducing Video Language Planning! By planning across the space of generated videos/language, we can synthesize long-horizon video plans and solve much longer horizon tasks than existing baselines (such as RT-2 and PaLM-E). (1/5)
1
8
60
@andyzeng_
Andy Zeng
2 years
Turns out robots can write their own code using LLMs, given natural language instructions by people! Part of the magic is hierarchical code-gen (e.g. recursively defining functions), which also improves SOTA on generic codegen benchmarks. Check out the 🧵 from Jacky!
@jackyliang42
Jacky Liang
2 years
How can robots perform a wide variety of novel tasks from natural language? Excited to present Code as Policies - using language models to directly write robot policy code from language instructions. See paper, colabs, blog, and demos at: long 🧵👇
0
7
59
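A toy sketch of the hierarchical code-gen idea from the tweet above: if the generated policy calls an undefined helper, ask the language model to write that helper too, then retry. The `llm_write` function and `SNIPPETS` table are hypothetical stand-ins, not the released Code as Policies code.

```python
# Sketch of hierarchical code generation: undefined helpers get generated on demand.

SNIPPETS = {
    "policy": "def policy():\n    return pick_and_place('apple', 'bowl')\n",
    "pick_and_place": (
        "def pick_and_place(obj, target):\n"
        "    return f'pick {obj}; place on {target}'\n"
    ),
}

def llm_write(name: str) -> str:
    """Stub: return code defining `name` (a code-writing LLM would generate this)."""
    return SNIPPETS[name]

scope = {}
exec(llm_write("policy"), scope)           # top-level policy generated first
while True:
    try:
        print(scope["policy"]())           # -> "pick apple; place on bowl"
        break
    except NameError as e:                 # undefined helper -> generate it, then retry
        missing = str(e).split("'")[1]
        exec(llm_write(missing), scope)
```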
@andyzeng_
Andy Zeng
6 years
Incredibly thrilled to have our work on TossingBot receive the Best Systems Paper Award at RSS 2019! Congrats to all my coauthors, and huge shout out to my collaborators at @GoogleAI who helped make this work possible :) Links to paper and videos:
3
4
60
@andyzeng_
Andy Zeng
1 year
Getting closer to robots 🤖 that in-context learn (fast adaptation) by day 📝 and fine-tune by night 😴. Excited that we're thinking more about human-robot interaction as model predictive control (powered by foundation models)! Read more 👇
@jackyliang42
Jacky Liang
1 year
We can teach LLMs to write better robot code through natural language feedback. But can LLMs remember what they were taught and improve their teachability over time? Introducing our latest work, Learning to Learn Faster from Human Feedback with Language Model Predictive Control
0
8
56
@andyzeng_
Andy Zeng
3 years
We wrote a blog post on ! (a) If a person had to transfer pens between cups, they might do it all at once. (b) If a robot had to do the same, it might do it one-by-one due to hardware limits. Can robots self-learn skill (b) from videos of a person doing (a)?
@GoogleAI
Google AI
3 years
Introducing XIRL, a self-supervised method for Cross-embodiment Inverse RL, which summarizes task objective knowledge from videos in the form of reward functions used to teach tasks to robots with new physical embodiments. Read more and copy the code ↓
0
6
57
@andyzeng_
Andy Zeng
2 years
We're hosting a workshop on Language and Robotics this year at #RSS2023! We've got an incredible panel of speakers, and we're excited to discuss the future of articulate robots together! Join us and make a submission here:
0
11
52
@andyzeng_
Andy Zeng
3 years
This came out of an amazing collaboration between the Robotics and AR teams at Google w/ @almostsquare @tek2222 @kchorolab @fedassa @aveekly @ryoo_michael @vikassindhwani @JohnnyChungLee Vincent Vanhoucke @peteflorence.
1
1
49
@andyzeng_
Andy Zeng
3 years
In general, we’re excited about Socratic Models – they present new ways to think about how we can tackle new multimodal applications with the existing foundation models that we already have today, without additional finetuning or data collection.
2
0
49
@andyzeng_
Andy Zeng
3 years
We’re hosting the 2nd Workshop on “Scaling Robot Learning” at RSS 2022! Beyond scaling robot systems, this 2nd edition of the workshop focuses on how academia can contribute with algorithmic advancements. We have an amazing lineup of speakers! Website:
3
8
44
@andyzeng_
Andy Zeng
3 years
Really excited about this! Simple BC with implicit models (states & actions as input) can learn complex closed-loop manipulation skills from RGB pixels better than their explicit counterparts, and give rise to a new class of BC baselines that are competitive with SOTA offline RL.
@peteflorence
Pete Florence
3 years
Excited to share more about our "Implicit Behavioral Cloning" work! ✅ *code* just released: ✅ *videos*: Will be sharing more this week at #CoRL2021. I'll also maybe write a TL;DR thread soon; meanwhile, check out the website!
0
15
46
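A sketch of the implicit-policy inference step behind the tweets above: instead of regressing actions, pick the action that minimizes a learned energy E(obs, action) with derivative-free sampling. The energy here is an untrained toy network (the real method trains one on demonstrations); it is illustrative only.

```python
# Sketch of implicit policy inference: argmin over actions of an energy E(obs, action),
# optimized with simple sample-and-resample search. The energy weights are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(6, 32))      # toy energy network (obs dim 4 + action dim 2)
W2 = rng.normal(size=(32, 1))

def energy(obs, actions):
    x = np.concatenate([np.tile(obs, (len(actions), 1)), actions], axis=1)
    return (np.tanh(x @ W1) @ W2).ravel()

def infer_action(obs, iters=3, samples=256, sigma=0.3):
    # Derivative-free optimization: sample, keep the best, resample around it.
    best = rng.uniform(-1, 1, size=2)
    for _ in range(iters):
        cand = best + sigma * rng.normal(size=(samples, 2))
        cand = np.clip(np.vstack([cand, best]), -1, 1)
        best = cand[int(np.argmin(energy(obs, cand)))]
        sigma *= 0.5
    return best

print(infer_action(obs=np.array([0.1, -0.2, 0.05, 0.3])))
```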
@andyzeng_
Andy Zeng
4 years
It turns out that rigid spatial displacements can serve as useful priors for non-rigid ones — enabling goal-driven rearrangement of deformable objects! Check out our latest work w/ Daniel Seita, @peteflorence, @JonathanTompson, @erwincoumans, @vikassindhwani, @ken_goldberg.
@GoogleAI
Google AI
4 years
Learn about a new open-source benchmark and suite of simulated tasks for robotic manipulation of deformable objects — including cables, fabrics and bags — with a set of model architectures that enable learning complex relative spatial relations.
0
5
37
@andyzeng_
Andy Zeng
3 years
And here’s video-to-text retrieval. The Socratic Models framework makes it easy to add together new modalities (like speech from audio). In this case we can provide a new zero-shot SoTA, nearing the best finetuned methods.
1
1
38
@andyzeng_
Andy Zeng
1 year
Visual prompting 🖼️📝 meets sampling-based optimization 🎲📊. PIVOT is a neat way to extract more (e.g. spatial, actionable) knowledge from large VLMs (GPT-4 or Gemini) in ways that can be used on agents and robots. Read more 👇 at
@brian_ichter
Brian Ichter
1 year
How do you get zero-shot robot control from VLMs? Introducing Prompting with Iterative Visual Optimization, or PIVOT! It casts spatial reasoning tasks as VQA by visually annotating images, which VLMs can understand and answer. Project website:
0
2
39
@andyzeng_
Andy Zeng
7 months
Congratulations!
@SongShuran
Shuran Song
7 months
UMI is an Outstanding Systems Paper finalist at #RSS2024. Congratulations team!! 🥳 Hope to see more UMI running around the world 😊!
0
2
37
@andyzeng_
Andy Zeng
2 years
Training robot hands 🤖 to play piano 🎹 is surprisingly hard – subtleties of precise contact, hitting chords at just the right moments, moving fingers in anticipation of what comes next… all make it a great testbed for control. Check out our latest benchmark in Kevin's post! 👇
@kevin_zakka
Kevin Zakka
2 years
Introducing 𝗥𝗼𝗯𝗼𝗣𝗶𝗮𝗻𝗶𝘀𝘁 🎹🤖, a new benchmark for high-dimensional robot control! Solving it requires mastering the piano with two anthropomorphic hands. This has been one year in the making, and I couldn’t be happier to release it today! Some highlights below:
1
6
31
@andyzeng_
Andy Zeng
7 years
Impressive results from “Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation” by @peteflorence, Lucas Manuelli, Russ Tedrake. They show a robot using the learned descriptors to find class-consistent grasping points.
0
20
30
@andyzeng_
Andy Zeng
2 years
Really excited about our latest upgrades to PaLM-SayCan! Love that we get to benefit from LLM capabilities, as we start thinking more about language as robot middleware. 🧵👇
@hausman_k
Karol Hausman
2 years
We have some exciting updates to SayCan! Together with the updated paper, we're adding new resources to learn more about this work: Interactive site: Blog posts: and Video:
1
2
30
@andyzeng_
Andy Zeng
2 years
Join us for the workshop on "Pre-training Robot Learning" at CoRL 2022! Submission deadline: Sep 28, 2022. We have an incredible lineup of speakers! Website: @stepjamUK's 🧵👇
@stepjamUK
Stephen James
2 years
Announcing the 1st "Workshop on Pre-training Robot Learning" at @corl_conf, Dec 15. Fantastic lineup of speakers: Jitendra Malik, Chelsea Finn, Joseph Lim, Kristen Grauman, Abhinav Gupta, Raia Hadsell. Submit your 4-page extended abstract by September 28.
2
5
30
@andyzeng_
Andy Zeng
3 years
Socratic Models meets DALL-E 2! And generates these captions:
0.3211 A creative android works on a painting in a laboratory.
0.3067 "A robotic painter in a future art studio."
0.2926 The future of painting? A robotic pilot creates a work of art.
@markchen90
Mark Chen
3 years
"a robot hand painting a self portrait on a canvas" by dalle-2 (
2
12
30
@andyzeng_
Andy Zeng
1 year
Code-writing LLMs 📝 + Python interpreters 💻 can do powerful things - but sometimes asking the LLM to "simulate" the interpreter can help with linguistic subtasks (e.g. get_facts, detect_sarcasm). "LMulators" for code-driven reasoning set a new SOTA on BBH. See @ChengshuEricLi's 🧵👇
@ChengshuEricLi
Chengshu Li
1 year
We are excited to announce Chain of Code (CoC), a simple yet surprisingly effective method that improves Language Model code-driven reasoning. On BIG-Bench Hard, CoC achieves 84%, a gain of 12% over Chain of Thought. Website: Paper:
0
5
27
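A toy sketch of the "LMulator" idea above: run LLM-written code line by line with the real Python interpreter, and when a line isn't actually executable (e.g. a semantic helper like detect_sarcasm), fall back to a language model to "simulate" the result. The `lm_simulate` stub is hypothetical, and the sketch assumes simple one-assignment-per-line programs; it is not the Chain of Code release.

```python
# Toy sketch of code-driven reasoning with an LM fallback ("LMulator").

def lm_simulate(expression: str, state: dict):
    """Stub: an LM would judge the semantic expression given the current state."""
    return True   # pretend the LM decided the text is sarcastic

def run(lines):
    state = {}
    for line in lines:
        try:
            exec(line, {}, state)                     # try the real interpreter first
        except Exception:
            var, expr = [s.strip() for s in line.split("=", 1)]
            state[var] = lm_simulate(expr, state)     # LM fills in the value instead
    return state

program = [
    "text = 'Oh great, another Monday.'",
    "sarcastic = detect_sarcasm(text)",               # not a real function -> simulated
    "answer = 'yes' if sarcastic else 'no'",
]
print(run(program)["answer"])                         # -> 'yes'
```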
@andyzeng_
Andy Zeng
3 years
Incredibly exciting work from our colleagues at Google on large language models <-> robot affordances!
@hausman_k
Karol Hausman
3 years
Super excited to introduce SayCan (: 1st publication of a large effort we've been working on for 1+ years. Robots ground large language models in reality by acting as their eyes and hands while LLMs help robots execute long, abstract language instructions
0
1
28
@andyzeng_
Andy Zeng
7 years
Check out the summary video of our upcoming @iros_2018 publication: “Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning” #robotics #iros2018. Full video: Project webpage:
0
11
23
@andyzeng_
Andy Zeng
2 years
I love that LLMs are capable of generating code to compose vision APIs 📷 to answer questions about images – with competitive few-shot performance! "Infinite use of finite means" – excited about the potential here for vision. Check out our take on CodeVQA in Sanjay's thread! 👇
@sanjayssub
Sanjay Subramanian
2 years
New paper at #acl2023nlp! "Modular Visual Question Answering via Code Generation". With @medhini_n @kushaltk1248 @KevinYa33964384 @NagraniArsha @CordeliaSchmid @andyzengtweets @trevordarrell Dan Klein (@berkeley_ai/@GoogleAI)! 📜 💻
0
5
24
@andyzeng_
Andy Zeng
5 years
Predicting dense heatmaps of navigational endpoints from visual input seems to help RL agents more quickly learn mobile manipulation tasks like pushing. Check out spatial action maps! w/ Jimmy Wu, X. Sun, @SongShuran, J. Lee, S. Rusinkiewicz, T. Funkhouser
0
1
23
@andyzeng_
Andy Zeng
3 years
In robot learning, we often assume discrete-time MDPs. But physical robots are not discrete-time! Sensor data arrives asynchronously: images @ 30Hz, proprioceptive @ 100Hz, force-torque @ 500Hz… InFuser takes a step towards continuous-time multi-scale feedback control w/ N-CDEs. See Sumeet's 🧵👇
@Sumeet_Robotics
Sumeet Singh
3 years
Introducing 'InFuser' - an architecture for learning hybrid continuous-time policies for dynamic tasks! Using Neural CDEs [@patrickkidger], we present a model for handling irregularly sampled multi-frequency-multi-sensory observations, and outputting continuous-time control.
0
2
22
@andyzeng_
Andy Zeng
6 years
3D semantic keypoints can be more useful and generalizable than 6D poses — as shown for robotic manipulation by colleagues at MIT. Check out their work!
0
14
22
@andyzeng_
Andy Zeng
4 years
Rearranging deep features can provide spatial structure that improves learning rearrangement tasks for robot manipulation. Brute-force search here is just a convolution, fast and practical for real-world pick-and-place. Now open-source (thanks to @ayzwah)!
@GoogleAI
Google AI
4 years
Can models more efficiently learn rearrangement tasks by overlaying 3D space instead of using object-centric representations? Check out Transporter Nets, an open-source framework for sample-efficient robot manipulation, with related benchmark tasks. See ↓
0
3
20
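A sketch of the "search as convolution" point above: score every candidate placement by cross-correlating a feature crop (centered on the picked object) against the scene feature map. Features here are a random single-channel stand-in for learned deep features; requires numpy/scipy and is not the Transporter Nets release.

```python
# Sketch of scoring all placements at once via cross-correlation.
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
scene = rng.normal(size=(64, 64))          # toy scene feature map (single channel)
pick = (20, 35)                            # picked-object location (row, col)
crop = scene[pick[0] - 4: pick[0] + 4, pick[1] - 4: pick[1] + 4]   # 8x8 query kernel

# One correlation sweep scores every candidate placement; the peak is where the
# scene features best match the crop (for this toy self-crop, near the pick point).
scores = correlate2d(scene, crop, mode="same")
place = np.unravel_index(np.argmax(scores), scores.shape)
print("best placement (row, col):", place)
```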
@andyzeng_
Andy Zeng
6 years
Need a small and fast inverse kinematics solver for real-time robot motion planning? Released a low-friction Python wrapper over OpenRAVE's IKFast, with a demo for the UR5: Useful for manipulation, visualization… or prototyping things like this:
0
1
17
@andyzeng_
Andy Zeng
5 years
Check out this excellent summary video of Form2Fit by @karoly_zsolnai from Two Minute Papers!
@twominutepapers
Two Minute Papers
5 years
This Robot Arm Learned To Assemble Objects It Hasn’t Seen Before. ▶️ Full video (ours): 📜 Source paper: #ai #deeplearning #science #twominutepapers
0
2
18
@andyzeng_
Andy Zeng
4 years
Great blog post from Kevin on sample-efficient representations and inductive biases! Convolutions for vision, self-attention for language and sequences --- what will it be for robotics?
@kevin_zakka
Kevin Zakka
4 years
New blog post: "Representation Matters". How cleverly designing your state and action space can give you orders of magnitude more sample efficiency in imitation learning.
0
3
16
@andyzeng_
Andy Zeng
3 years
We’d like robots to learn from YouTube videos… but humans not only *look* different, but also *do* things differently than robots. We study 3rd-person imitation with self-supervised rewards that generalize to new embodiment appearances and control strategies. Kevin’s thread! 👇
@kevin_zakka
Kevin Zakka
3 years
How can robots 🤖 learn from videos of humans, especially when humans perform the same task in different ways? A 🧵 introducing our #CoRL2021 paper "XIRL: Cross-embodiment Inverse RL". Website & code: 1/
0
3
17
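A minimal sketch of how a learned video embedding can become a reward, in the spirit of the cross-embodiment idea above: reward the robot for getting closer (in embedding space) to the average goal-frame embedding from human videos. The `embed` function is a random stub for the learned encoder; this is not the released XIRL code.

```python
# Sketch of an embedding-distance reward from demonstration videos.
import numpy as np

rng = np.random.default_rng(0)

def embed(frame) -> np.ndarray:
    """Stub: a self-supervised video encoder (e.g. trained for temporal alignment)."""
    return rng.normal(size=16)

# Goal embedding: average of the last frames of (placeholder) human demo videos.
goal = np.mean([embed(video[-1]) for video in [["h1"], ["h2"], ["h3"]]], axis=0)

def reward(frame) -> float:
    # Higher reward the closer the current observation is to the goal embedding.
    return -float(np.linalg.norm(embed(frame) - goal))

print(reward("robot_frame"))
```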
@andyzeng_
Andy Zeng
2 years
Incredibly compelling results on modeling the distributional multimodalities ✌️ in policy space. Fantastic work Cheng and team!
@chichengcc
Cheng Chi
2 years
What if the form of visuomotor policy has been the bottleneck for robotic manipulation all along? Diffusion Policy achieves a 46.9% improvement vs prior SoTA on 11 tasks from 4 benchmarks + 4 real-world tasks! (1/7) Website: Paper:
0
1
17
@andyzeng_
Andy Zeng
7 years
Is it possible to predict 3D data and semantics for a full 360° view using only a single image? Come to our oral for Im2Pano3D this Wed at #CVPR2018! GPU-enabled code @NVIDIAAIDev available: w/ @SongShuran, A. Chang, M. Savva, @silviocinguetta, T. Funkhouser
0
7
15
@andyzeng_
Andy Zeng
2 years
This project was led by the amazing @huang_chenguang w/ @oier_mees and @wolfram_burgard, an incredibly fun collaboration w/ the University of Freiburg and the University of Technology Nuremberg. Website & paper:
5
2
15
@andyzeng_
Andy Zeng
6 years
Awesome summary of TossingBot from @karoly_zsolnai @ Two Minute Papers. Thanks for sharing Karoly!
@twominutepapers
Two Minute Papers
6 years
This Robot Arm AI Throws Objects with Amazing Precision - #tossingbot #ai
0
1
14
@andyzeng_
Andy Zeng
3 years
Congratulations Shuran!
@CUSEAS
Columbia Engineering
3 years
Congrats to our @ColumbiaCompSci Prof Shuran Song @SongShuran, who's won an @NSF CAREER award to enable #Robots to learn on their own and adapt to new environments. @ColumbiaScience @Columbia
0
0
15
@andyzeng_
Andy Zeng
3 years
Textual closed-loop feedback enables language model robot planners to:
✅ react to lower-level control mistakes
✅ adapt to new instructions on-the-fly
✅ propose new plans if the original was infeasible
✅ answer natural language questions about their understanding of the world
🧵👇
@hausman_k
Karol Hausman
3 years
Have you ever “heard” yourself talk in your head? Turns out it's a useful tool for robots too! Introducing Inner Monologue: feeding continual textual feedback into LLMs allows robots to articulate a grounded “thought process” to execute long, abstract instructions 🧵👇
0
1
14
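A toy sketch of the closed-loop textual feedback idea above: the planner's prompt grows with outcomes ("success"/"failure"), so the next plan step can react to mistakes. The `llm_plan` and `execute` functions are hypothetical stubs; the real system uses an LLM plus perception and success detectors.

```python
# Sketch of a textual feedback loop between a (stubbed) LLM planner and a robot.

def llm_plan(prompt: str) -> str:
    """Stub planner: retry the last step after a failure, otherwise finish."""
    if prompt.rstrip().endswith("failure."):
        return "pick up the coke can"      # try again
    if "success" in prompt:
        return "done"
    return "pick up the coke can"

def execute(step: str, attempt: int) -> str:
    return "failure" if attempt == 0 else "success"   # stub robot: fails once, then succeeds

instruction = "Task: bring me the coke can.\n"
prompt, attempt = instruction, 0
while True:
    step = llm_plan(prompt)
    if step == "done":
        break
    outcome = execute(step, attempt)
    prompt += f"Robot action: {step}. Outcome: {outcome}.\n"   # feedback back into the LLM
    attempt += 1
print(prompt)
```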
@andyzeng_
Andy Zeng
2 years
From digital assistants to robot butlers, learning user preferences and generalizing them to new settings can provide a more personalized experience. Check out Jimmy's 🧵👇 on how we're doing this with LLMs and foundation models towards a generalist TidyBot 🧹 (that throws!).
@jimmyyhwu
Jimmy Wu
2 years
When organizing a home, everyone has unique preferences for where things go. How can household robots learn your preferences from just a few examples? Introducing 𝗧𝗶𝗱𝘆𝗕𝗼𝘁: Personalized Robot Assistance with Large Language Models. Project page:
0
0
13
@andyzeng_
Andy Zeng
2 years
There's so much information stored in audio data – and I'm excited that we can tap into that with audio-language models, on systems where language serves as robot middleware.
1
1
11
@andyzeng_
Andy Zeng
3 years
Are there questions you’d like to ask our speakers for the panel discussion during the “Scaling Robot Learning” workshop at #ICRA2022 (May 27)? Fill out this form here: Workshop:
0
3
12
@andyzeng_
Andy Zeng
4 years
Congrats Shuran!
@SongShuran
Shuran Song
4 years
Honored to be a Microsoft Research Faculty Fellow!
0
0
12
@andyzeng_
Andy Zeng
3 years
Extending our submission deadline for the ICRA '22 Workshop on "Scaling Robot Learning" to Apr 18 (anywhere-on-earth time)! We have an incredible list of workshop speakers. There will also be a Best Paper Award of $1,000 (sponsored by Google).
0
1
11
@andyzeng_
Andy Zeng
2 years
Turns out we can get multiple models to jointly steer LLM next-token prediction as a way to ground them (e.g. to visual inputs, the physical world). Check out Wenlong's thread on what we're excited to call "grounded decoding" 👇
@wenlong_huang
Wenlong Huang
2 years
Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLMs for robots. Website: 🧵👇
0
1
11
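A small sketch of jointly steering next-token prediction as described above: combine the LLM's token probability with a grounding score (e.g. whether the mentioned object is actually in the scene) and pick the best-scoring token. The probability tables below are toy numbers, not model outputs.

```python
# Sketch of grounded decoding: score candidates by p_LM(w) * p_grounded(w)^beta.

lm_probs = {"apple": 0.5, "unicorn": 0.4, "sponge": 0.1}        # what the LLM wants to say
grounded_probs = {"apple": 0.6, "unicorn": 0.0, "sponge": 0.4}  # what the scene affords

def grounded_next_token(lm_p, ground_p, beta=1.0):
    # Combine the two distributions, then renormalize over the candidates.
    scores = {w: lm_p[w] * (ground_p.get(w, 0.0) ** beta) for w in lm_p}
    total = sum(scores.values()) or 1.0
    return max(scores, key=scores.get), {w: s / total for w, s in scores.items()}

token, dist = grounded_next_token(lm_probs, grounded_probs)
print(token, dist)   # "unicorn" is suppressed because it isn't grounded in the scene
```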
@andyzeng_
Andy Zeng
7 years
Had a fantastic time at #ICRA2018! Here was our poster on “Robotic Pick-and-place of Novel Objects in Clutter with Multi-affordance Grasping and Cross-domain Image Matching” with Team MIT-Princeton from the Amazon Robotics Challenge!
0
3
9
@andyzeng_
Andy Zeng
5 years
@kevin_zakka @SongShuran Paper: Webpage:
0
10
9
@andyzeng_
Andy Zeng
6 years
Very exciting coverage of our work on robotic manipulation from Two Minute Papers! Thank you Karoly!
@twominutepapers
Two Minute Papers
6 years
This Robot Learned To Clean Up Clutter. Full video:
1
1
9
@andyzeng_
Andy Zeng
5 years
100% agree. It still surprises me to this day just how much easier the optimization can be, with the right architectures and data representations. E.g. Transformers!
@shaneguML
Shane Gu
5 years
An incredible feature of neural nets is that by choosing certain architectures they may exhibit extreme generalization. Similarly in sequence prediction, by choosing the right RNNs, the model may generalize extremely well.
0
1
9
@andyzeng_
Andy Zeng
6 years
Excellent article from @CadeMetz @nytimes on our work in robotics at @GoogleAI. Check it out!
@CadeMetz
Cade Metz
6 years
After about 14 hours of trial and error inside Google's new lab, this robotic arm learns to pick up objects and toss them into a bin several feet away:
0
2
8
@andyzeng_
Andy Zeng
2 years
Lots of recent work in the area! In just the last month: NLMap & CLIP-Fields. VLMaps is only our take on the problem, but I love that we get to explore spatial goals as a central part of the problem + open-vocab obstacle maps.
0
1
5
@andyzeng_
Andy Zeng
3 years
Exciting new ICRA paper led by @WiYoungsun! VIRDO uses neural fields to predict how an object will deform, given visual-tactile sensing w/ partial point clouds + forces & contacts. w/ @NimaFazeli7’s robotics lab at UMich, @peteflorence
1
0
7
@andyzeng_
Andy Zeng
5 years
For grasping, we find that MS COCO > ImageNet, and transferring both backbone & head yields better self-supervised trial-and-error exploration.
2
0
6
@andyzeng_
Andy Zeng
7 years
Fantastic sculptures from @JohnVMuntean. A nice example of why a 3D understanding of our visual world is important for AI systems -- to be aware of 2D ambiguity.
0
2
6
@andyzeng_
Andy Zeng
7 years
Thrilled to be a recipient of the NVIDIA fellowship. Thanks NVIDIA!
1
0
6
@andyzeng_
Andy Zeng
2 years
Excited to see others explore this area as well! We got to integrate parts of ConceptFusion in ways that were complementary to our approach to get the best of both.
1
1
6
@andyzeng_
Andy Zeng
2 years
VLMaps provides spatial grounding for VLMs like LSeg. Notably, when combined with code-writing LLMs, this allows navigating to spatial goals from natural language such as: "go in between the sofa and TV" or "move 3 meters to the right of the chair"
1
0
5
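A sketch of the kind of code a code-writing LLM could emit for the spatial goals quoted above, given a `localize(name)` helper backed by the open-vocabulary map. The landmark coordinates and the "+x is right" convention are placeholders, not real map queries.

```python
# Sketch of LLM-written spatial-goal code on top of a map-query helper.
import numpy as np

LANDMARKS = {"sofa": np.array([2.0, 1.0]), "tv": np.array([4.0, 3.0]),
             "chair": np.array([0.0, 0.0])}

def localize(name: str) -> np.ndarray:
    return LANDMARKS[name]                # stub for a text query over the VLMap

# "go in between the sofa and TV"
goal_between = 0.5 * (localize("sofa") + localize("tv"))

# "move 3 meters to the right of the chair" (assume +x is "right" in the map frame)
goal_right_of = localize("chair") + np.array([3.0, 0.0])

print(goal_between, goal_right_of)
```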
@andyzeng_
Andy Zeng
2 years
What's also exciting to me about this direction, is that it benefits not only from "scaling up" ⬆️ with larger foundation models, but also from "scaling horizontally" ↔️ with parallel simulation and online optimization e.g., via MuJoCo MPC.
0
0
4
@andyzeng_
Andy Zeng
2 years
We do see varying performance between foundation models (e.g. wav2clip and AudioCLIP). While AudioCLIP worked well for our use case, the system is also fairly flexible, so we can hot-swap with new audio-language models as they come.
1
0
5
@andyzeng_
Andy Zeng
2 years
AVLMaps fuses (A)udio (V)isual (L)anguage features into a shared 3D map representation that's open-vocabulary, where we can localize landmarks using multimodal queries like textual descriptions, images, or audio snippets.
1
0
4
@andyzeng_
Andy Zeng
2 years
In environments where there are multiple instances of the same object (chairs, tables, or sofas), sound can help robots disambiguate destinations – "go to the sofa near the sound of the baby crying." @huang_chenguang puts this to the test with a number of simulated experiments.
1
0
5
@andyzeng_
Andy Zeng
2 years
We compared against CoW and LM-Nav, and we were excited to see VLMaps improve in its capacity to (i) navigate to spatial goals, and (ii) handle long-horizon tasks with multiple subgoals (w/ ambiguity).
1
0
3
@andyzeng_
Andy Zeng
2 years
VLMaps allows "open vocabulary obstacle maps" for path planning with different robots! E.g. a drone can fly over tables, but a mobile robot may not. Both can share a VLMap of the same env, just with different object categories to index different obstacles.
1
0
3
@andyzeng_
Andy Zeng
3 years
@SongShuran Congratulations!!
0
0
3
@andyzeng_
Andy Zeng
4 years
@ari_seff dangit ari. I knew it. got any other belated confessions?
1
0
3
@andyzeng_
Andy Zeng
3 years
Submission deadline: May 25, 2022 AoE. Best Paper Award of $1,000 sponsored by @FlexivRobotics.
@FlexivRobotics
Flexiv Robotics
3 years
Take a look at #ATXWest exhibits: Flexiv's robotic #teleoperation allows the operator to make one or more remote arm(s) do synchronous operations by controlling a master arm and getting real-time force feedback. It can be widely applied in fields of medical treatment, R&D, etc.
0
0
3
@andyzeng_
Andy Zeng
2 years
@athundt @stepjamUK Thank you for the pointers, Andrew! This is an important topic and highly relevant to the workshop as well. We've added it as a topic to be discussed in the workshop and panel discussions.
1
0
2
@andyzeng_
Andy Zeng
7 years
Robo-pickers today are often programmed with only a single skill: how to grasp objects. But is it possible to have a robot automatically learn other skills (like pushing) to support more efficient picking? #Robotics #AI #DeepLearning Our latest work:
0
2
2
@andyzeng_
Andy Zeng
3 years
0
0
2
@andyzeng_
Andy Zeng
4 years
@danfei_xu Very cool work!
1
0
2
@andyzeng_
Andy Zeng
3 years
@kevin_zakka @peteflorence Thrilled to have you back Kevin!
1
0
2
@andyzeng_
Andy Zeng
7 years
Impressive work on deep quadruped control for animation from mocap data: “Mode-Adaptive Neural Networks for Quadruped Motion Control” by @blacksquirrel__ @dukecyto and others at the Univ. of Edinburgh and Adobe. Paper:
0
3
2
@andyzeng_
Andy Zeng
7 years
Can neural nets infer what’s behind you? Kinda. We tried :) Check out our new CVPR publication: #AI #DeepLearning #ComputerVision
0
1
2
@andyzeng_
Andy Zeng
3 years
@mohito1905 Love this! Fantastic work!
0
0
2
@andyzeng_
Andy Zeng
3 years
@quasimondo That’s fantastic Mario! Really glad to hear it helped.
0
0
2
@andyzeng_
Andy Zeng
7 years
Check out our approach: The main goal of our work is to demonstrate that it is possible – and practical – for a robotic system to pick and recognize novel objects with only a few of their product images (e.g. scraped from the web), without any re-training.
0
0
1
@andyzeng_
Andy Zeng
7 years
Amazon plans to build a domestic robot, Vesta. Perhaps an Alexa on wheels? Exciting to see how this turns out! Could be the killer app for all of those 3D scene understanding algorithms we’ve built 🙂.
0
0
1
@andyzeng_
Andy Zeng
7 years
MIT's awesome article and video on our robo-picker: Project website+paper+code: Accepted for publication at ICRA 2018. Proud of the incredible team :) #Robotics #AI #DeepLearning
0
0
1
@andyzeng_
Andy Zeng
3 years
The action space is the label space in BC – some spaces are smoother than others, depending on the task. Deep nets are biased to learn low-frequency functions first (Basri et al.). So in the low-data regime, the choice of action space can influence generalization.
2
0
1