Andy Zeng

@andyzeng_

Followers: 8K
Following: 986
Media: 60
Statuses: 368

Building smarter robots @GoogleDeepMind. PhD @Princeton. CS & Math @UCBerkeley

Joined September 2017
@andyzeng_
Andy Zeng
3 years
With multiple foundation models “talking to each other”, we can combine commonsense across domains, to do multimodal tasks like zero-shot video Q&A or image captioning, no finetuning needed. Socratic Models: website + code: paper:
21
379
2K
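A minimal sketch of the composition idea behind the tweet above: a VLM turns pixels into text, and an LM reasons over that text. The `vlm_caption` and `lm_complete` functions here are hypothetical stubs, not the released Socratic Models code.

```python
# Minimal sketch of composing a VLM and an LM through language (Socratic Models idea).
# vlm_caption / lm_complete are hypothetical stubs; swap in real foundation-model calls.

def vlm_caption(image) -> str:
    """Stub: a visual-language model describes what it sees."""
    return "a person pouring coffee into a mug on a kitchen counter"

def lm_complete(prompt: str) -> str:
    """Stub: a large language model completes the prompt."""
    return "They are making coffee, probably in the morning."

def answer_about_image(image, question: str) -> str:
    # The VLM turns pixels into text; the LM reasons over that text.
    description = vlm_caption(image)
    prompt = (
        f"Image description: {description}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return lm_complete(prompt)

if __name__ == "__main__":
    print(answer_about_image(image=None, question="What is the person doing, and when?"))
```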
@andyzeng_
Andy Zeng
3 years
Join us next week at the CVPR Tutorial on Vision-Based Robot Learning! We’ll distribute Colabs that show you how to run Socratic Models for language-driven robot pick & place right in your browser (in person, or online!).
17
205
1K
@andyzeng_
Andy Zeng
6 years
Can robots learn to pick up stuff and accurately toss it into bins outside their natural range? Check out our latest work, TossingBot! w/ @SongShuran, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser #robotics #AI #research
9
156
476
@andyzeng_
Andy Zeng
2 years
Code-writing LLMs are surprisingly good at 📝 writing reward functions for 🦾 MPC low-level control – providing a chat-like interface to teach robots to do things like "stand up and moon-walk" 🐶. Read more about it here 👇 and in @xf1280's 🧵.
@xf1280
Fei Xia
2 years
🤖Excited to share our project where we propose to use rewards represented in code as a flexible interface between LLMs and an optimization-based motion controller. Website: Want to learn more about how we make a robot dog do the moonwalk, MJ style? 🕺🕺
5
97
394
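A toy sketch of the "rewards as code" interface described above: a generated reward function bridges the language model and a low-level optimizer. Here the `reward_src` string stands in for LLM output and the "controller" is a crude random-shooting optimizer, not the real MPC stack; all names are hypothetical.

```python
# Toy sketch of "rewards as code" between an LLM and an optimization-based controller.
# reward_src stands in for text an LLM would generate from an instruction like "stand up";
# the controller below is a crude random-shooting optimizer, not real MPC.
import numpy as np

reward_src = """
def reward(state):
    # Higher body height and small tilt -> closer to "standing up".
    height, tilt = state
    return height - 0.5 * abs(tilt)
"""

scope = {}
exec(reward_src, scope)          # compile the LLM-written reward
reward = scope["reward"]

def simulate(state, action):
    # Stand-in dynamics: the action nudges height and tilt.
    height, tilt = state
    return (height + 0.1 * action[0], tilt + 0.1 * action[1])

def plan(state, num_samples=256, rng=np.random.default_rng(0)):
    # Pick the sampled action whose predicted next state scores best under the reward.
    actions = rng.uniform(-1, 1, size=(num_samples, 2))
    scores = [reward(simulate(state, a)) for a in actions]
    return actions[int(np.argmax(scores))]

print(plan(state=(0.2, 0.3)))
```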
@andyzeng_
Andy Zeng
2 years
Can robots 🤖 navigate to sounds 🔊 they've heard? w/ audio-language 🔊✏️ foundation models, excited that we can now ask our helper robots to "go to where you heard coughing". Audio-Visual-Language Maps w/ @huang_chenguang @oier_mees @wolfram_burgard:
1
46
201
@andyzeng_
Andy Zeng
2 years
We built PaLM-E 🌴🤖, one of the largest multimodal language models to date, trained end-to-end on robot data. Images, text, state inputs, neural scene embeddings – you name it. And it's fantastic on robots. Check out Danny's thread 👇.
@DannyDriess
Danny Driess
2 years
What happens when we train the largest vision-language model and add in robot experiences? The result is PaLM-E 🌴🤖, a 562-billion parameter, general-purpose, embodied visual-language generalist - across robotics, vision, and language. Website:
2
26
197
@andyzeng_
Andy Zeng
2 years
Still crazy to me that we can prompt LLMs (GPT-3 or PaLM) with a bunch of numbers 📝 to discover and improve closed-loop policies that stabilize CartPole – entirely in-context w/o model finetuning. Read more in @suvir_m's post 👇 and try out the code
@suvir_m
Suvir Mirchandani
2 years
In a new preprint, we assess LLMs’ in-context learning abilities for *abstract* non-linguistic patterns—& explore how this might be useful for robotics. Examples:
- extrapolating symbolic patterns
- extending periodic motions
- discovering simple policies (e.g. for CartPole)
(1/8)
0
39
195
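A sketch of the idea in the tweet above: serialize states and actions as plain numbers in a text prompt and let the LLM continue the pattern. The `llm_next_token` function is a hypothetical stub standing in for a real GPT-3/PaLM call.

```python
# Sketch of treating an LLM as a sequence model over raw numbers (in-context policy improvement).
# llm_next_token is a hypothetical stub; the real work prompts an actual LLM.

def serialize(history):
    # Flatten (state, action) pairs into a plain text sequence of numbers.
    lines = []
    for state, action in history:
        lines.append(" ".join(f"{x:.2f}" for x in state) + f" -> {action}")
    return "\n".join(lines)

def llm_next_token(prompt: str) -> str:
    """Stub: an LLM would complete the numeric pattern; here we just echo an action."""
    return "1"

def in_context_policy(history, new_state):
    prompt = serialize(history) + "\n" + " ".join(f"{x:.2f}" for x in new_state) + " ->"
    return int(llm_next_token(prompt))

history = [((0.01, 0.20, -0.03, -0.30), 0), ((0.02, 0.01, -0.04, 0.02), 1)]
print(in_context_policy(history, new_state=(0.02, -0.18, -0.03, 0.33)))
```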
@andyzeng_
Andy Zeng
2 years
Excited to share "Visual Language Maps"! VLMaps fuse visual language model features into a dense 3D map for robot navigation from natural language instructions. Website: Led by the amazing @huang_chenguang w/ @oier_mees, @wolfram_burgard
3
31
159
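A rough sketch of the core VLMaps idea mentioned above: back-project per-pixel visual-language features into a top-down grid, then index that grid with a text embedding. The feature extractors are random stand-ins (the real system uses a model like LSeg plus camera poses and depth); this is not the released code.

```python
# Sketch of the VLMaps idea: fuse per-pixel visual-language features into a top-down grid,
# then localize landmarks by text-embedding similarity. Features here are random stubs.
import numpy as np

D, H, W = 8, 64, 64                       # feature dim, map size
rng = np.random.default_rng(0)
feature_map = np.zeros((H, W, D))         # running mean of features per map cell
counts = np.zeros((H, W, 1))

def fuse(pixel_features, cells):
    # pixel_features: (N, D) VLM features; cells: (N, 2) map coords from depth + pose.
    for f, (i, j) in zip(pixel_features, cells):
        counts[i, j] += 1
        feature_map[i, j] += (f - feature_map[i, j]) / counts[i, j]

def localize(text_embedding):
    # Cosine similarity between each map cell and the text query; argmax is the landmark.
    norms = np.linalg.norm(feature_map, axis=-1) * np.linalg.norm(text_embedding) + 1e-8
    sim = feature_map @ text_embedding / norms
    return np.unravel_index(np.argmax(sim), sim.shape)

fuse(rng.normal(size=(100, D)), rng.integers(0, H, size=(100, 2)))
print(localize(rng.normal(size=D)))       # e.g. the cell best matching "sofa"
```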
@andyzeng_
Andy Zeng
5 years
Tried ImageNet pre-training for your robot learning models only to find out it didn't help? Turns out which dataset you use & which weights you transfer matter a lot. Check out our blog post! w/ @yen_chen_lin @SongShuran @phillip_isola Tsung-Yi Lin.
@GoogleAI
Google AI
5 years
Check out new research into applying transfer learning to robotic manipulation. By leveraging pre-trained weights from computer vision models, it’s possible to greatly improve the training efficiency for robotic manipulation tasks. Learn all about it at
1
36
142
@andyzeng_
Andy Zeng
7 years
Released @PyTorch code for “Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning” (works for robots in both sim & real). Happy hacking :) Code: Paper: Project:
2
55
126
@andyzeng_
Andy Zeng
3 years
From recalling events, to contextual and temporal reasoning – prompting foundation models to engage in guided Socratic discussions enables a variety of new open-ended video Q&A capabilities.
1
8
107
@andyzeng_
Andy Zeng
2 years
The Abstraction and Reasoning Corpus from @fchollet is a hard AGI benchmark – LLMs are the closest thing to a generalist that can do 85+ problems, and still continue to do many of them with completely random tokens sampled from the vocabulary. This token invariance is fascinating 🤔
@suvir_m
Suvir Mirchandani
2 years
In a new preprint, we assess LLMs’ in-context learning abilities for *abstract* non-linguistic patterns—& explore how this might be useful for robotics. Examples:
- extrapolating symbolic patterns
- extending periodic motions
- discovering simple policies (e.g. for CartPole)
(1/8)
1
15
100
@andyzeng_
Andy Zeng
6 years
Through vision and interaction, can robots discover the physical properties of objects? We explore this question in our latest work, which will appear at RSS tomorrow. See you there! w/ Zhenjia Xu, @jiajunwu_cs, Josh Tenenbaum, @SongShuran #robotics #AI
1
14
98
@andyzeng_
Andy Zeng
2 years
Had a blast demo'ing language + robots (w/ PaLM 2) at Google I/O! w/ @xf1280 @brian_ichter @RandomRobotics @peteflorence Spencer Goodrich (glad we didn't tank the stock price) 😅
3
4
99
@andyzeng_
Andy Zeng
3 years
For end-to-end robot learning: pixels to joint angles, or to Cartesian poses? IKP uses Implicit BC + (differentiable) kinematics to learn inductive patterns in both action spaces. w/ @AdityaGanapathi @peteflorence Jake Varley @kaylburns @Ken_Goldberg
1
15
92
@andyzeng_
Andy Zeng
3 years
One way to approach video understanding is to turn it into a reading comprehension problem. This turns a classically hard computer vision task into something that we know large language models are good at.
2
6
88
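A sketch of "video understanding as reading comprehension" from the tweet above: turn frames and audio into a text log, then hand that log to an LLM with a question. The `caption_frame`, `transcribe_audio`, and `lm` functions are hypothetical stubs.

```python
# Sketch of video Q&A as reading comprehension: build a text log from vision/audio models,
# then let an LLM answer over it. caption_frame / transcribe_audio / lm are stubs.

def caption_frame(frame) -> str:
    return "a person opens the fridge"           # stub for a VLM captioner

def transcribe_audio(clip) -> str:
    return "let's see what we have for dinner"   # stub for a speech model

def lm(prompt: str) -> str:
    return "They were deciding what to cook."    # stub for an LLM

def video_qa(frames, audio_clips, question: str) -> str:
    log = []
    for t, frame in enumerate(frames):
        log.append(f"[{t}s] vision: {caption_frame(frame)}")
    for t, clip in enumerate(audio_clips):
        log.append(f"[{t}s] speech: {transcribe_audio(clip)}")
    prompt = "Video log:\n" + "\n".join(log) + f"\n\nQuestion: {question}\nAnswer:"
    return lm(prompt)

print(video_qa(frames=[None, None], audio_clips=[None], question="What was the person doing?"))
```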
@andyzeng_
Andy Zeng
2 years
Language models can generate plans and code 📝 but sometimes they’re just better off responding "I dunno 🤷‍♀️". Read more 👇 on how we’re aligning LLM uncertainty (with statistical guarantees) on robots 🤖 where safety matters.
@allenzren
Allen Z. Ren
2 years
LLMs can generate plans and write robot code 📝 but they can also make mistakes. How do we get LLMs to 𝘬𝘯𝘰𝘸 𝘸𝘩𝘦𝘯 𝘵𝘩𝘦𝘺 𝘥𝘰𝘯'𝘵 𝘬𝘯𝘰𝘸 🤷 and ask for help? Read more on how we can do this (with statistical guarantees) for LLMs on robots 👇.
0
19
88
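A small sketch of one way to get the "statistical guarantees" mentioned above: split conformal prediction over the LLM's multiple-choice option scores, asking for help whenever the calibrated prediction set is not a single option. This is illustrative only, with made-up calibration scores, and is not the released code.

```python
# Sketch of calibrated "ask for help": conformal prediction over an LLM's option scores.
import numpy as np

def calibrate(cal_scores, alpha=0.1):
    # cal_scores: probability the LLM assigned to the *correct* option on calibration data.
    n = len(cal_scores)
    nonconformity = np.sort(1.0 - np.asarray(cal_scores))
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n) - 1   # conservative quantile index
    return nonconformity[k]

def prediction_set(option_probs, qhat):
    # Keep every option whose nonconformity is below the calibrated threshold.
    return [opt for opt, p in option_probs.items() if 1.0 - p <= qhat]

qhat = calibrate(cal_scores=[0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.6, 0.88], alpha=0.1)
options = {"place apple in drawer": 0.55, "place apple in bowl": 0.40, "do nothing": 0.05}
kept = prediction_set(options, qhat)
print(kept, "-> ask for help" if len(kept) != 1 else "-> act")
```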
@andyzeng_
Andy Zeng
5 years
Sensing transparent objects is the Achilles heel of 3D vision in robotics. Can deep learning help? Check out ClearGrasp -- enabling commodity RGB-D sensors to see transparent surfaces, and improve robotic grasping. w/ SynthesisAI, @Columbia, @GoogleAI.
1
21
75
@andyzeng_
Andy Zeng
3 years
What do we need to scale robot learning? Self-supervision? Simulation? Distributed training? Here’s our call for workshop papers: ICRA 2022 Workshop on Scaling Robot Learning. We’ve got a great lineup of speakers. Looking forward to your contributions!
0
16
66
@andyzeng_
Andy Zeng
3 years
A couple more examples – here’s zero-shot image captioning, with the large language model (LM) and visual-language model (VLM) working together. Code is already open-source for this one:
1
4
65
@andyzeng_
Andy Zeng
4 years
Turns out spatial action maps + intention representations improve multi-agent multi-skill coordination for mobile manipulation. We also have adorable little throwing Anki robots now too! w/ Jimmy Wu, X. Sun, @SongShuran, S. Rusinkiewicz, T. Funkhouser
0
18
63
@andyzeng_
Andy Zeng
1 year
Large model planners (PaLM-E) generate text 📝 but struggle with physics – can generating videos 🎞️ help? In "Video Language Planning" we train VLMs + video models to enable robots to imagine (then do) really long multi-step tasks. Led by the amazing @du_yilun @mengjiao_yang 👇.
@du_yilun
Yilun Du
1 year
Introducing Video Language Planning! By planning across the space of generated videos/language, we can synthesize long-horizon video plans and solve much longer horizon tasks than existing baselines (such as RT-2 and PaLM-E). (1/5)
1
8
60
@andyzeng_
Andy Zeng
2 years
Turns out robots can write their own code using LLMs, given natural language instructions by people! Part of the magic is hierarchical code-gen (e.g. recursively defining functions), which also improves SOTA on generic codegen benchmarks. Check out the 🧵 from Jacky!
@jackyliang42
Jacky Liang
2 years
How can robots perform a wide variety of novel tasks from natural language? Excited to present Code as Policies - using language models to directly write robot policy code from language instructions. See paper, colabs, blog, and demos at: long 🧵👇
0
7
59
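A toy sketch of the hierarchical code-gen idea from the tweet above: if the generated policy calls an undefined helper, ask the language model to write that helper too, then retry. The `llm_write` function and `SNIPPETS` table are hypothetical stand-ins, not the released Code as Policies code.

```python
# Sketch of hierarchical code generation: undefined helpers get generated on demand.

SNIPPETS = {
    "policy": "def policy():\n    return pick_and_place('apple', 'bowl')\n",
    "pick_and_place": (
        "def pick_and_place(obj, target):\n"
        "    return f'pick {obj}; place on {target}'\n"
    ),
}

def llm_write(name: str) -> str:
    """Stub: return code defining `name` (a code-writing LLM would generate this)."""
    return SNIPPETS[name]

scope = {}
exec(llm_write("policy"), scope)           # top-level policy generated first
while True:
    try:
        print(scope["policy"]())           # -> "pick apple; place on bowl"
        break
    except NameError as e:                 # undefined helper -> generate it, then retry
        missing = str(e).split("'")[1]
        exec(llm_write(missing), scope)
```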
@andyzeng_
Andy Zeng
6 years
Incredibly thrilled to have our work on TossingBot receive the Best Systems Paper Award at RSS 2019! Congrats to all my coauthors, and huge shout out to my collaborators at @GoogleAI who helped make this work possible :) Links to paper and videos:
3
4
60
@andyzeng_
Andy Zeng
1 year
Getting closer to robots 🤖 that in-context learn (fast adaptation) by day 📝 and fine-tune by night 😴. Excited that we're thinking more about human-robot interaction as model predictive control (powered by foundation models)! Read more 👇
@jackyliang42
Jacky Liang
1 year
We can teach LLMs to write better robot code through natural language feedback. But can LLMs remember what they were taught and improve their teachability over time? Introducing our latest work, Learning to Learn Faster from Human Feedback with Language Model Predictive Control
0
8
56
@andyzeng_
Andy Zeng
3 years
We wrote a blog post on ! (a) If a person had to transfer pens between cups, they might do it all at once. (b) If a robot had to do the same, it might do it one-by-one due to hardware limits. Can robots self-learn skill (b) from videos of a person doing (a)?
@GoogleAI
Google AI
3 years
Introducing XIRL, a self-supervised method for Cross-embodiment Inverse RL, which summarizes task objective knowledge from videos in the form of reward functions used to teach tasks to robots with new physical embodiments. Read more and copy the code ↓
0
6
57
@andyzeng_
Andy Zeng
2 years
We're hosting a workshop on Language and Robotics this year at #RSS2023! We've got an incredible panel of speakers, and we're excited to discuss the future of articulate robots together! Join us and make a submission here:
0
11
52
@andyzeng_
Andy Zeng
3 years
This came out of an amazing collaboration between the Robotics and AR teams at Google w/ @almostsquare @tek2222 @kchorolab @fedassa @aveekly @ryoo_michael @vikassindhwani @JohnnyChungLee Vincent Vanhoucke @peteflorence.
1
1
49
@andyzeng_
Andy Zeng
3 years
In general, we’re excited about Socratic Models – they present new ways to think about how we can tackle new multimodal applications with the existing foundation models that we already have today, without additional finetuning or data collection.
2
0
49
@andyzeng_
Andy Zeng
3 years
We’re hosting the 2nd Workshop on “Scaling Robot Learning” at RSS 2022! Beyond scaling robot systems, this 2nd edition of the workshop focuses on how academia can contribute with algorithmic advancements. We have an amazing lineup of speakers! Website:
3
8
44
@andyzeng_
Andy Zeng
3 years
Really excited about this! Simple BC with implicit models (states & actions as input) can learn complex closed-loop manipulation skills from RGB pixels better than their explicit counterparts, and give rise to a new class of BC baselines that are competitive with SOTA offline RL.
@peteflorence
Pete Florence
3 years
Excited to share more about our "Implicit Behavioral Cloning" work! ✅ *code* just released: ✅ *videos*: Will be sharing more this week at #CoRL2021. I'll also maybe write a TL;DR thread soon; meanwhile, check out the website!
0
15
46
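A sketch of the implicit-policy inference step behind the tweets above: instead of regressing actions, pick the action that minimizes a learned energy E(obs, action) with derivative-free sampling. The energy here is an untrained toy network (the real method trains one on demonstrations); it is illustrative only.

```python
# Sketch of implicit policy inference: argmin over actions of an energy E(obs, action),
# optimized with simple sample-and-resample search. The energy weights are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(6, 32))      # toy energy network (obs dim 4 + action dim 2)
W2 = rng.normal(size=(32, 1))

def energy(obs, actions):
    x = np.concatenate([np.tile(obs, (len(actions), 1)), actions], axis=1)
    return (np.tanh(x @ W1) @ W2).ravel()

def infer_action(obs, iters=3, samples=256, sigma=0.3):
    # Derivative-free optimization: sample, keep the best, resample around it.
    best = rng.uniform(-1, 1, size=2)
    for _ in range(iters):
        cand = best + sigma * rng.normal(size=(samples, 2))
        cand = np.clip(np.vstack([cand, best]), -1, 1)
        best = cand[int(np.argmin(energy(obs, cand)))]
        sigma *= 0.5
    return best

print(infer_action(obs=np.array([0.1, -0.2, 0.05, 0.3])))
```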
@andyzeng_
Andy Zeng
4 years
It turns out that rigid spatial displacements can serve as useful priors for non-rigid ones — enabling goal-driven rearrangement of deformable objects! Check out our latest work w/ Daniel Seita, @peteflorence, @JonathanTompson, @erwincoumans, @vikassindhwani, @ken_goldberg.
@GoogleAI
Google AI
4 years
Learn about a new open-source benchmark and suite of simulated tasks for robotic manipulation of deformable objects — including cables, fabrics and bags — with a set of model architectures that enable learning complex relative spatial relations.
0
5
37
@andyzeng_
Andy Zeng
3 years
And here’s video-to-text retrieval. The Socratic Models framework makes it easy to add together new modalities (like speech from audio). In this case we can provide a new zero-shot SoTA, nearing the best finetuned methods.
1
1
38
@andyzeng_
Andy Zeng
1 year
Visual prompting 🖼️📝 meets sampling-based optimization 🎲📊. PIVOT is a neat way to extract more (e.g. spatial, actionable) knowledge from large VLMs (GPT-4 or Gemini) in ways that can be used on agents and robots. Read more 👇 at
@brian_ichter
Brian Ichter
1 year
How do you get zero-shot robot control from VLMs? Introducing Prompting with Iterative Visual Optimization, or PIVOT! It casts spatial reasoning tasks as VQA by visually annotating images, which VLMs can understand and answer. Project website:
0
2
39
@andyzeng_
Andy Zeng
7 months
Congratulations!
@SongShuran
Shuran Song
7 months
UMI is an Outstanding Systems Paper finalist at #RSS2024. Congratulations team!! 🥳 Hope to see more UMI running around the world 😊!
0
2
37
@andyzeng_
Andy Zeng
2 years
Training robot hands 🤖 to play piano 🎹 is surprisingly hard – subtleties of precise contact, hitting chords at just the right moments, moving fingers in anticipation of what comes next… all make it a great testbed for control. Check out our latest benchmark in Kevin's post! 👇
@kevin_zakka
Kevin Zakka
2 years
Introducing 𝗥𝗼𝗯𝗼𝗣𝗶𝗮𝗻𝗶𝘀𝘁 🎹🤖, a new benchmark for high-dimensional robot control! Solving it requires mastering the piano with two anthropomorphic hands. This has been one year in the making, and I couldn’t be happier to release it today! Some highlights below:
1
6
31
@andyzeng_
Andy Zeng
7 years
Impressive results from “Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation” by @peteflorence, Lucas Manuelli, Russ Tedrake. They show a robot using the learned descriptors to find class-consistent grasping points.
0
20
30
@andyzeng_
Andy Zeng
2 years
Really excited about our latest upgrades to PaLM-SayCan! Love that we get to benefit from LLM capabilities, as we start thinking more about language as robot middleware. 🧵👇
@hausman_k
Karol Hausman
2 years
We have some exciting updates to SayCan! Together with the updated paper, we're adding new resources to learn more about this work: Interactive site: Blog posts: and Video:
1
2
30
@andyzeng_
Andy Zeng
2 years
Join us for the workshop on "Pre-training Robot Learning" at CoRL 2022! Submission deadline: Sep 28, 2022. We have an incredible lineup of speakers! Website: @stepjamUK's 🧵👇
@stepjamUK
Stephen James
2 years
Announcing the 1st "Workshop on Pre-training Robot Learning" at @corl_conf, Dec 15. Fantastic lineup of speakers: Jitendra Malik, Chelsea Finn, Joseph Lim, Kristen Grauman, Abhinav Gupta, Raia Hadsell. Submit your 4-page extended abstract by September 28.
2
5
30
@andyzeng_
Andy Zeng
3 years
Socratic Models meets DALL-E 2! And generates these captions:
0.3211 A creative android works on a painting in a laboratory.
0.3067 "A robotic painter in a future art studio."
0.2926 The future of painting? A robotic pilot creates a work of art.
@markchen90
Mark Chen
3 years
"a robot hand painting a self portrait on a canvas" by dalle-2 (
2
12
30
@andyzeng_
Andy Zeng
1 year
Code-writing LLMs 📝 + Python interpreters 💻 can do powerful things - but sometimes asking the LLM to "simulate" the interpreter can help with linguistic subtasks (e.g. get_facts, detect_sarcasm). "LMulators" for code-driven reasoning set a new SOTA on BBH. See @ChengshuEricLi's 🧵👇
@ChengshuEricLi
Chengshu Li
1 year
We are excited to announce Chain of Code (CoC), a simple yet surprisingly effective method that improves Language Model code-driven reasoning. On BIG-Bench Hard, CoC achieves 84%, a gain of 12% over Chain of Thought. Website: Paper:
0
5
27
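A toy sketch of the "LMulator" idea above: run LLM-written code line by line with the real Python interpreter, and when a line isn't actually executable (e.g. a semantic helper like detect_sarcasm), fall back to a language model to "simulate" the result. The `lm_simulate` stub is hypothetical, and the sketch assumes simple one-assignment-per-line programs; it is not the Chain of Code release.

```python
# Toy sketch of code-driven reasoning with an LM fallback ("LMulator").

def lm_simulate(expression: str, state: dict):
    """Stub: an LM would judge the semantic expression given the current state."""
    return True   # pretend the LM decided the text is sarcastic

def run(lines):
    state = {}
    for line in lines:
        try:
            exec(line, {}, state)                     # try the real interpreter first
        except Exception:
            var, expr = [s.strip() for s in line.split("=", 1)]
            state[var] = lm_simulate(expr, state)     # LM fills in the value instead
    return state

program = [
    "text = 'Oh great, another Monday.'",
    "sarcastic = detect_sarcasm(text)",               # not a real function -> simulated
    "answer = 'yes' if sarcastic else 'no'",
]
print(run(program)["answer"])                         # -> 'yes'
```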
@andyzeng_
Andy Zeng
3 years
Incredibly exciting work from our colleagues at Google on large language models <-> robot affordances!
@hausman_k
Karol Hausman
3 years
Super excited to introduce SayCan (: 1st publication of a large effort we've been working on for 1+ years. Robots ground large language models in reality by acting as their eyes and hands while LLMs help robots execute long, abstract language instructions
0
1
28
@andyzeng_
Andy Zeng
7 years
Check out the summary video of our upcoming @iros_2018 publication: “Learning Synergies between Pushing and Grasping with Self-supervised Deep Reinforcement Learning” #robotics #iros2018. Full video: Project webpage:
0
11
23
@andyzeng_
Andy Zeng
2 years
I love that LLMs are capable of generating code to compose vision APIs 📷 to answer questions about images – with competitive few-shot performance! "Infinite use of finite means" – excited about the potential here for vision. Check out our take on CodeVQA in Sanjay's thread! 👇
@sanjayssub
Sanjay Subramanian
2 years
New paper at #acl2023nlp! "Modular Visual Question Answering via Code Generation". With @medhini_n @kushaltk1248 @KevinYa33964384 @NagraniArsha @CordeliaSchmid @andyzengtweets @trevordarrell Dan Klein (@berkeley_ai/@GoogleAI)! 📜 💻
0
5
24
@andyzeng_
Andy Zeng
5 years
Predicting dense heatmaps of navigational endpoints from visual input seems to help RL agents more quickly learn mobile manipulation tasks like pushing. Check out spatial action maps! w/ Jimmy Wu, X. Sun, @SongShuran, J. Lee, S. Rusinkiewicz, T. Funkhouser
0
1
23
@andyzeng_
Andy Zeng
3 years
In robot learning, we often assume discrete-time MDPs. But physical robots are not discrete-time! Sensor data arrives asynchronously: images @ 30Hz, proprioceptive @ 100Hz, force-torque @ 500Hz… InFuser takes a step towards continuous-time multi-scale feedback control w/ N-CDEs. See Sumeet's 🧵👇
@Sumeet_Robotics
Sumeet Singh
3 years
Introducing 'InFuser' - an architecture for learning hybrid continuous-time policies for dynamic tasks! Using Neural CDEs [@patrickkidger], we present a model for handling irregularly sampled multi-frequency-multi-sensory observations, and outputting continuous-time control.
0
2
22
@andyzeng_
Andy Zeng
6 years
3D semantic keypoints can be more useful and generalizable than 6D poses — as shown for robotic manipulation by colleagues at MIT. Check out their work!
0
14
22
@andyzeng_
Andy Zeng
4 years
Rearranging deep features can provide spatial structure that improves learning rearrangement tasks for robot manipulation. Brute-force search here is just a convolution, fast and practical for real-world pick-and-place. Now open-source (thanks to @ayzwah)!
@GoogleAI
Google AI
4 years
Can models more efficiently learn rearrangement tasks by overlaying 3D space instead of using object-centric representations? Check out Transporter Nets, an open-source framework for sample-efficient robot manipulation, with related benchmark tasks. See ↓
0
3
20
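A sketch of the "search as convolution" point above: score every candidate placement by cross-correlating a feature crop (centered on the picked object) against the scene feature map. Features here are a random single-channel stand-in for learned deep features; requires numpy/scipy and is not the Transporter Nets release.

```python
# Sketch of scoring all placements at once via cross-correlation.
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
scene = rng.normal(size=(64, 64))          # toy scene feature map (single channel)
pick = (20, 35)                            # picked-object location (row, col)
crop = scene[pick[0] - 4: pick[0] + 4, pick[1] - 4: pick[1] + 4]   # 8x8 query kernel

# One correlation sweep scores every candidate placement; the peak is where the
# scene features best match the crop (for this toy self-crop, near the pick point).
scores = correlate2d(scene, crop, mode="same")
place = np.unravel_index(np.argmax(scores), scores.shape)
print("best placement (row, col):", place)
```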
@andyzeng_
Andy Zeng
6 years
Need a small and fast inverse kinematics solver for real-time robot motion planning? Released a low-friction Python wrapper over OpenRAVE's IKFast, with a demo for the UR5: Useful for manipulation, visualization… or prototyping things like this:
0
1
17
@andyzeng_
Andy Zeng
5 years
Check out this excellent summary video of Form2Fit by @karoly_zsolnai from Two Minute Papers!
@twominutepapers
Two Minute Papers
5 years
This Robot Arm Learned To Assemble Objects It Hasn’t Seen Before. ▶️ Full video (ours): 📜 Source paper: #ai #deeplearning #science #twominutepapers
0
2
18
@andyzeng_
Andy Zeng
4 years
Great blog post from Kevin on sample-efficient representations and inductive biases! Convolutions for vision, self-attention for language and sequences --- what will it be for robotics?
@kevin_zakka
Kevin Zakka
4 years
New blog post: "Representation Matters". How cleverly designing your state and action space can give you orders of magnitude more sample efficiency in imitation learning.
0
3
16
@andyzeng_
Andy Zeng
3 years
We’d like robots to learn from YouTube videos… but humans not only *look* different, but also *do* things differently than robots. We study 3rd-person imitation with self-supervised rewards that generalize to new embodiment appearances and control strategies. Kevin’s thread! 👇
@kevin_zakka
Kevin Zakka
3 years
How can robots 🤖 learn from videos of humans, especially when humans perform the same task in different ways? A 🧵 introducing our #CoRL2021 paper "XIRL: Cross-embodiment Inverse RL". Website & code: 1/
0
3
17
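A minimal sketch of how a learned video embedding can become a reward, in the spirit of the cross-embodiment idea above: reward the robot for getting closer (in embedding space) to the average goal-frame embedding from human videos. The `embed` function is a random stub for the learned encoder; this is not the released XIRL code.

```python
# Sketch of an embedding-distance reward from demonstration videos.
import numpy as np

rng = np.random.default_rng(0)

def embed(frame) -> np.ndarray:
    """Stub: a self-supervised video encoder (e.g. trained for temporal alignment)."""
    return rng.normal(size=16)

# Goal embedding: average of the last frames of (placeholder) human demo videos.
goal = np.mean([embed(video[-1]) for video in [["h1"], ["h2"], ["h3"]]], axis=0)

def reward(frame) -> float:
    # Higher reward the closer the current observation is to the goal embedding.
    return -float(np.linalg.norm(embed(frame) - goal))

print(reward("robot_frame"))
```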
@andyzeng_
Andy Zeng
2 years
Incredibly compelling results on modeling the distributional multimodalities ✌️ in policy space. Fantastic work Cheng and team!
@chichengcc
Cheng Chi
2 years
What if the form of visuomotor policy has been the bottleneck for robotic manipulation all along? Diffusion Policy achieves a 46.9% improvement vs prior SoTA on 11 tasks from 4 benchmarks + 4 real-world tasks! (1/7) Website: Paper:
0
1
17
@andyzeng_
Andy Zeng
7 years
Is it possible to predict 3D data and semantics for a full 360° view using only a single image? Come to our oral for Im2Pano3D this Wed at #CVPR2018! GPU-enabled code @NVIDIAAIDev available: w/ @SongShuran, A. Chang, M. Savva, @silviocinguetta, T. Funkhouser
0
7
15
@andyzeng_
Andy Zeng
2 years
This project was led by the amazing @huang_chenguang w/ @oier_mees and @wolfram_burgard, an incredibly fun collaboration w/ the University of Freiburg and the University of Technology Nuremberg. Website & paper:
5
2
15
@andyzeng_
Andy Zeng
6 years
Awesome summary of TossingBot from @karoly_zsolnai @ Two Minute Papers. Thanks for sharing Karoly!
@twominutepapers
Two Minute Papers
6 years
This Robot Arm AI Throws Objects with Amazing Precision - #tossingbot #ai
0
1
14
@andyzeng_
Andy Zeng
3 years
Congratulations Shuran!
@CUSEAS
Columbia Engineering
3 years
Congrats to our @ColumbiaCompSci Prof Shuran Song @SongShuran, who's won an @NSF CAREER award to enable #Robots to learn on their own and adapt to new environments. @ColumbiaScience @Columbia
0
0
15
@andyzeng_
Andy Zeng
3 years
Textual closed-loop feedback enables language model robot planners to:
✅ react to lower-level control mistakes
✅ adapt to new instructions on-the-fly
✅ propose new plans if the original was infeasible
✅ answer natural language questions about their understanding of the world
🧵👇
@hausman_k
Karol Hausman
3 years
Have you ever “heard” yourself talk in your head? Turns out it's a useful tool for robots too! Introducing Inner Monologue: feeding continual textual feedback into LLMs allows robots to articulate a grounded “thought process” to execute long, abstract instructions 🧵👇
0
1
14
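A toy sketch of the closed-loop textual feedback idea above: the planner's prompt grows with outcomes ("success"/"failure"), so the next plan step can react to mistakes. The `llm_plan` and `execute` functions are hypothetical stubs; the real system uses an LLM plus perception and success detectors.

```python
# Sketch of a textual feedback loop between a (stubbed) LLM planner and a robot.

def llm_plan(prompt: str) -> str:
    """Stub planner: retry the last step after a failure, otherwise finish."""
    if prompt.rstrip().endswith("failure."):
        return "pick up the coke can"      # try again
    if "success" in prompt:
        return "done"
    return "pick up the coke can"

def execute(step: str, attempt: int) -> str:
    return "failure" if attempt == 0 else "success"   # stub robot: fails once, then succeeds

instruction = "Task: bring me the coke can.\n"
prompt, attempt = instruction, 0
while True:
    step = llm_plan(prompt)
    if step == "done":
        break
    outcome = execute(step, attempt)
    prompt += f"Robot action: {step}. Outcome: {outcome}.\n"   # feedback back into the LLM
    attempt += 1
print(prompt)
```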
@andyzeng_
Andy Zeng
2 years
From digital assistants to robot butlers, learning user preferences and generalizing them to new settings can provide a more personalized experience. Check out Jimmy's 🧵👇 on how we're doing this with LLMs and foundation models towards a generalist TidyBot 🧹 (that throws!).
@jimmyyhwu
Jimmy Wu
2 years
When organizing a home, everyone has unique preferences for where things go. How can household robots learn your preferences from just a few examples? Introducing 𝗧𝗶𝗱𝘆𝗕𝗼𝘁: Personalized Robot Assistance with Large Language Models. Project page:
0
0
13
@andyzeng_
Andy Zeng
2 years
There's so much information stored in audio data – and I'm excited that we can tap into that with audio-language models, on systems where language serves as robot middleware.
1
1
11
@andyzeng_
Andy Zeng
3 years
Are there questions you’d like to ask our speakers for the panel discussion during the “Scaling Robot Learning” workshop at #ICRA2022 (May 27)? Fill out this form here: Workshop:
0
3
12
@andyzeng_
Andy Zeng
4 years
Congrats Shuran!
@SongShuran
Shuran Song
4 years
Honored to be a Microsoft Research Faculty Fellow!
0
0
12
@andyzeng_
Andy Zeng
3 years
Extending our submission deadline for the ICRA '22 Workshop on "Scaling Robot Learning" to Apr 18 (anywhere-on-earth time)! We have an incredible list of workshop speakers. There will also be a Best Paper Award of $1,000 (sponsored by Google).
0
1
11
@andyzeng_
Andy Zeng
2 years
Turns out we can get multiple models to jointly steer LLM next-token prediction as a way to ground them (e.g. to visual inputs, the physical world). Check out Wenlong's thread on what we're excited to call "grounded decoding" 👇
@wenlong_huang
Wenlong Huang
2 years
Large language models gathered tons of world knowledge by speaking human language. But can they ever speak “robot language”? Introducing “Grounded Decoding”: a scalable way to decode *grounded text* from LLMs for robots. Website: 🧵👇
0
1
11
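A small sketch of jointly steering next-token prediction as described above: combine the LLM's token probability with a grounding score (e.g. whether the mentioned object is actually in the scene) and pick the best-scoring token. The probability tables below are toy numbers, not model outputs.

```python
# Sketch of grounded decoding: score candidates by p_LM(w) * p_grounded(w)^beta.

lm_probs = {"apple": 0.5, "unicorn": 0.4, "sponge": 0.1}        # what the LLM wants to say
grounded_probs = {"apple": 0.6, "unicorn": 0.0, "sponge": 0.4}  # what the scene affords

def grounded_next_token(lm_p, ground_p, beta=1.0):
    # Combine the two distributions, then renormalize over the candidates.
    scores = {w: lm_p[w] * (ground_p.get(w, 0.0) ** beta) for w in lm_p}
    total = sum(scores.values()) or 1.0
    return max(scores, key=scores.get), {w: s / total for w, s in scores.items()}

token, dist = grounded_next_token(lm_probs, grounded_probs)
print(token, dist)   # "unicorn" is suppressed because it isn't grounded in the scene
```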
@andyzeng_
Andy Zeng
7 years
Had a fantastic time at #ICRA2018! Here was our poster on “Robotic Pick-and-place of Novel Objects in Clutter with Multi-affordance Grasping and Cross-domain Image Matching” with Team MIT-Princeton from the Amazon Robotics Challenge!
0
3
9
@andyzeng_
Andy Zeng
5 years
@kevin_zakka @SongShuran Paper: Webpage:
0
10
9
@andyzeng_
Andy Zeng
6 years
Very exciting coverage of our work on robotic manipulation from Two Minute Papers! Thank you Karoly!
@twominutepapers
Two Minute Papers
6 years
This Robot Learned To Clean Up Clutter. Full video:
1
1
9
@andyzeng_
Andy Zeng
5 years
100% agree. It still surprises me to this day just how much easier the optimization can be, with the right architectures and data representations. E.g. Transformers!
@shaneguML
Shane Gu
5 years
An incredible feature of neural nets is that by choosing certain architectures they may exhibit extreme generalization. Similarly in sequence prediction, by choosing the right RNNs, the model may generalize extremely well.
0
1
9
@andyzeng_
Andy Zeng
6 years
Excellent article from @CadeMetz @nytimes on our work in robotics at @GoogleAI. Check it out!
@CadeMetz
Cade Metz
6 years
After about 14 hours of trial and error inside Google's new lab, this robotic arm learns to pick up objects and toss them into a bin several feet away:
0
2
8
@andyzeng_
Andy Zeng
2 years
Lots of recent work in the area! In just the last month: NLMap & CLIP-Fields. VLMaps is only our take on the problem, but I love that we get to explore spatial goals as a central part of the problem + open-vocab obstacle maps.
0
1
5
@andyzeng_
Andy Zeng
3 years
Exciting new ICRA paper led by @WiYoungsun! VIRDO uses neural fields to predict how an object will deform, given visual-tactile sensing w/ partial point clouds + forces & contacts. w/ @NimaFazeli7’s robotics lab at UMich, @peteflorence
1
0
7
@andyzeng_
Andy Zeng
5 years
For grasping, we find that MS COCO > ImageNet, and transferring both backbone & head yields better self-supervised trial-and-error exploration.
2
0
6
@andyzeng_
Andy Zeng
7 years
Fantastic sculptures from @JohnVMuntean. A nice example of why a 3D understanding of our visual world is important for AI systems -- to be aware of 2D ambiguity.
0
2
6
@andyzeng_
Andy Zeng
7 years
Thrilled to be a recipient of the NVIDIA fellowship. Thanks NVIDIA!
1
0
6
@andyzeng_
Andy Zeng
2 years
Excited to see others explore this area as well! We got to integrate parts of ConceptFusion in ways that were complementary to our approach to get the best of both.
1
1
6
@andyzeng_
Andy Zeng
2 years
VLMaps provides spatial grounding for VLMs like LSeg. Notably, when combined with code-writing LLMs, this allows navigating to spatial goals from natural language such as: "go in between the sofa and TV" or "move 3 meters to the right of the chair"
1
0
5
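A sketch of the kind of code a code-writing LLM could emit for the spatial goals quoted above, given a `localize(name)` helper backed by the open-vocabulary map. The landmark coordinates and the "+x is right" convention are placeholders, not real map queries.

```python
# Sketch of LLM-written spatial-goal code on top of a map-query helper.
import numpy as np

LANDMARKS = {"sofa": np.array([2.0, 1.0]), "tv": np.array([4.0, 3.0]),
             "chair": np.array([0.0, 0.0])}

def localize(name: str) -> np.ndarray:
    return LANDMARKS[name]                # stub for a text query over the VLMap

# "go in between the sofa and TV"
goal_between = 0.5 * (localize("sofa") + localize("tv"))

# "move 3 meters to the right of the chair" (assume +x is "right" in the map frame)
goal_right_of = localize("chair") + np.array([3.0, 0.0])

print(goal_between, goal_right_of)
```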
@andyzeng_
Andy Zeng
2 years
What's also exciting to me about this direction, is that it benefits not only from "scaling up" ⬆️ with larger foundation models, but also from "scaling horizontally" ↔️ with parallel simulation and online optimization e.g., via MuJoCo MPC.
0
0
4
@andyzeng_
Andy Zeng
2 years
We do see varying performance between foundation models (e.g. wav2clip and AudioCLIP). While AudioCLIP worked well for our use case, the system is also fairly flexible, so we can hot-swap with new audio-language models as they come.
1
0
5
@andyzeng_
Andy Zeng
2 years
AVLMaps fuses (A)udio (V)isual (L)anguage features into a shared 3D map representation that's open-vocabulary, where we can localize landmarks using multimodal queries like textual descriptions, images, or audio snippets.
1
0
4
@andyzeng_
Andy Zeng
2 years
In environments where there are multiple instances of the same object (chairs, tables, or sofas), sound can help robots disambiguate destinations – "go to the sofa near the sound of the baby crying." @huang_chenguang puts this to the test with a number of simulated experiments.
1
0
5
@andyzeng_
Andy Zeng
2 years
We compared against CoW and LM-Nav, and we were excited to see VLMaps improve in its capacity to (i) navigate to spatial goals, and (ii) handle long-horizon tasks with multiple subgoals (w/ ambiguity).
1
0
3
@andyzeng_
Andy Zeng
2 years
VLMaps allows "open vocabulary obstacle maps" for path planning with different robots! E.g. a drone can fly over tables, but a mobile robot may not. Both can share a VLMap of the same env, just with different object categories to index different obstacles.
1
0
3
@andyzeng_
Andy Zeng
3 years
@SongShuran Congratulations!!
0
0
3
@andyzeng_
Andy Zeng
4 years
@ari_seff dangit ari. I knew it. got any other belated confessions?
1
0
3
@andyzeng_
Andy Zeng
3 years
Submission deadline: May 25, 2022 AoE. Best Paper Award of $1,000 sponsored by @FlexivRobotics.
@FlexivRobotics
Flexiv Robotics
3 years
Take a look at #ATXWest exhibits: Flexiv's robotic #teleoperation allows the operator to make one or more remote arm(s) do synchronous operations by controlling a master arm and getting real-time force feedback. It can be widely applied in fields of medical treatment, R&D, etc.
0
0
3
@andyzeng_
Andy Zeng
2 years
@athundt @stepjamUK Thank you for the pointers, Andrew! This is an important topic and highly relevant to the workshop as well. We've added it as a topic to be discussed in the workshop and panel discussions.
1
0
2
@andyzeng_
Andy Zeng
7 years
Robo-pickers today are often programmed with only a single skill: how to grasp objects. But is it possible to have a robot automatically learn other skills (like pushing) to support more efficient picking? #Robotics #AI #DeepLearning Our latest work:
0
2
2
@andyzeng_
Andy Zeng
3 years
0
0
2
@andyzeng_
Andy Zeng
4 years
@danfei_xu Very cool work!
1
0
2
@andyzeng_
Andy Zeng
3 years
@kevin_zakka @peteflorence Thrilled to have you back Kevin!
1
0
2
@andyzeng_
Andy Zeng
7 years
Impressive work on deep quadruped control for animation from mocap data: “Mode-Adaptive Neural Networks for Quadruped Motion Control” by @blacksquirrel__ @dukecyto and others at the Univ. of Edinburgh and Adobe. Paper:
0
3
2
@andyzeng_
Andy Zeng
7 years
Can neural nets infer what’s behind you? Kinda. We tried :) Check out our new CVPR publication: #AI #DeepLearning #ComputerVision
0
1
2
@andyzeng_
Andy Zeng
3 years
@mohito1905 Love this! Fantastic work!
0
0
2
@andyzeng_
Andy Zeng
3 years
@quasimondo That’s fantastic Mario! Really glad to hear it helped.
0
0
2
@andyzeng_
Andy Zeng
7 years
Check out our approach: The main goal of our work is to demonstrate that it is possible – and practical – for a robotic system to pick and recognize novel objects with only a few of their product images (e.g. scraped from the web), without any re-training.
0
0
1
@andyzeng_
Andy Zeng
7 years
Amazon plans to build a domestic robot, Vesta. Perhaps an Alexa on wheels? Exciting to see how this turns out! Could be the killer app for all of those 3D scene understanding algorithms we’ve built 🙂.
0
0
1
@andyzeng_
Andy Zeng
7 years
MIT's awesome article and video on our robo-picker: Project website+paper+code: Accepted for publication at ICRA 2018. Proud of the incredible team :) #Robotics #AI #DeepLearning
0
0
1
@andyzeng_
Andy Zeng
3 years
The action space is the label space in BC – some spaces are smoother than others, depending on the task. Deep nets are biased to learn low-frequency functions first (Basri et al.). So in the low-data regime, the choice of action space can influence generalization.
2
0
1