🚀 Excited to share our latest work: MANIPULATE-ANYTHING! 🦾 This scalable method pushes the boundaries of real-world robotic manipulation through zero-shot task execution and automated BC data generation. Here's a quick overview:👇
For large-scale robotic deployment🤖 in the real world 🌏, robots must adapt to changes in environments and objects. Ever questioned the generalizability of your robot's manipulation policy? Put it to the test with The Colosseum 🏛️. Check out our project:
Humans use pointing to communicate plans intuitively. Compared to language, pointing provides more precise guidance for robot behavior.
Can we teach a robot how to point like humans? Introducing RoboPoint 🤖👉, an open-source VLM instruction-tuned to point.
Check out our new work:
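A minimal sketch of how a pointing VLM's output could be turned into robot targets, assuming the model replies with a list of 2D image points and that a depth image plus pinhole intrinsics are available (the reply format and helper names are assumptions, not RoboPoint's exact interface):

```python
import re
import numpy as np

def parse_points(reply: str):
    """Extract (u, v) pixel coordinates from a reply like '[(120.0, 85.5), (200.0, 150.0)]'."""
    return [(float(u), float(v)) for u, v in re.findall(r"\(([\d.]+),\s*([\d.]+)\)", reply)]

def deproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) into the camera frame using its depth value."""
    z = depth[int(v), int(u)]
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])

if __name__ == "__main__":
    reply = "[(120.0, 85.5), (200.0, 150.0)]"          # hypothetical VLM pointing output
    depth = np.full((240, 320), 0.6, dtype=np.float32)  # dummy depth map in meters
    for u, v in parse_points(reply):
        print(deproject(u, v, depth, fx=300.0, fy=300.0, cx=160.0, cy=120.0))
```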
🚨Is it possible to devise an intuitive approach for crowdsourcing training data for robots without requiring a physical robot🤖?
Can we democratize robot learning for all?🧑🤝🧑
Check out our latest
#CoRL2023
paper->
AR2-D2: Training a Robot Without a Robot
Grad school application season is here.
I always find that PhD applications are about outsmarting the smart ones, so I wrote this article to share some of my experiences with the application process.
Best of luck!
After receiving multiple offers from top computer science/AI programmes globally, I have decided to pursue my PhD study at the
@UW
,
@uwcse
.
I will be joining the group of the legendary roboticist, Professor Dieter Fox, at the University of Washington!
#phdlife
#AcademicChatter
The upper bound of a VLA model is probably 90% of all tasks, but there will inherently be that 10% of robotics tasks which can't be solved without understanding other multimodal inputs (such as audio, tactile sensing, etc.). 🤖
🎉Happy to share that I will be joining NVIDIA as a 2024 summer research scientist intern working on robotic manipulation. See you in Santa Clara and Seattle!
*Hope to meet Jensen in person
Feels surreal, but here I am. Travelling halfway across the 🌎 to do my
#PhD
for the next 5 years.
A mix of uncertainty and excitement about the unknown future.
#phdlife
@PhDchatter
Exploring robotic manipulation via LLMs & VLMs, we introduce NEWTON🍎: a Repository, Pipeline, and Benchmark designed to evaluate the physical reasoning capability of LLMs. Discover more in our
#EMNLP
paper next week!
Try out
#LumaDreamMachine
for robotics action generation. Even though there are artifacts in the generated objects, I would say that the kinematics of the robot motion are pretty good. Could we use this for robotics data?
Still feels surreal that in the past two weeks I met both the Godfather and Godmother of AI 🤯.
@ylecun
@drfeifei
Learnt so much from them 📖
Now back to the
#RSS
grind
If you attended Prof. Dieter Fox's talk on 'Where is RoboGPT?' at
@RoboticsSciSys
, you'll recall him sharing our work, RoboPoint, a domain-specific spatial VLM for generating intermediate representations beneficial for robotics tasks. We are pleased to announce that our live
Humans use pointing to communicate plans intuitively. Compared to language, pointing provides more precise guidance for robot behavior.
Can we teach a robot how to point like humans? Introducing RoboPoint 🤖👉, an open-source VLM instruction-tuned to point.
Check out our new work:
I am deeply honored to have been featured in the "I am CSE" series by
@uwcse
as part of the graduate visit event. In this feature, I had the opportunity to share my experience in selecting UW along with a preview of my first-year research.
Will be presenting this work at the
#ICRA2024
Future roadmap for manipulation skills workshop tomorrow! I will be the first to present; come check it out and talk to me about Benchmarking for Robotics Manipulation. 🦾
For large-scale robotic deployment🤖 in the real world 🌏, robots must adapt to changes in environments and objects. Ever questioned the generalizability of your robot's manipulation policy? Put it to the test with The Colosseum 🏛️. Check out our project:
📣Humans intuitively focus on things essential for the task at hand. Inspired by this idea💡, we're excited to explore whether an Embodied AI agent can develop more task-driven visual perception through such a 'filtering' mechanism. 🤖✨
🔍 Check out our latest work on the
Foundation models in robotics can probably handle 90% of real-world tasks given enough data. However, it's the remaining 10% that truly matters, and this gap is unbridgeable without other modalities. Hence, I am excited about our new
#RSS
work on a tactile-based LVLM for robotics!
TLDR: Octopi is an LVLM that leverages both tactile representation learning and large vision-language models to predict and reason about vision-based tactile inputs with minimal language fine-tuning.
Project page:
Paper:
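A minimal sketch of the common tactile-LVLM pattern described above: encode a vision-based tactile image, project it into the language model's token space, and prepend the resulting soft tokens to the prompt (module names and dimensions are illustrative assumptions, not Octopi's actual implementation):

```python
import torch
import torch.nn as nn

class TactileAdapter(nn.Module):
    """Project tactile embeddings into LLM token space (hypothetical dims)."""
    def __init__(self, tactile_dim=512, llm_dim=4096, n_tokens=8):
        super().__init__()
        self.proj = nn.Linear(tactile_dim, llm_dim * n_tokens)
        self.n_tokens, self.llm_dim = n_tokens, llm_dim

    def forward(self, tactile_feat):
        # tactile_feat: (B, tactile_dim) from a pretrained tactile encoder
        return self.proj(tactile_feat).view(-1, self.n_tokens, self.llm_dim)

if __name__ == "__main__":
    adapter = TactileAdapter()
    feat = torch.randn(2, 512)   # dummy tactile embeddings
    tokens = adapter(feat)       # (2, 8, 4096) soft tokens to prepend to the LLM prompt
    print(tokens.shape)
```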
With so much exciting work coming out over the past few weeks, we would like to do a soft release of our benchmark🧑💻
Everything you need to get started with Colosseum is well documented here:
But feel free to contact us with any questions!
For large-scale robotic deployment🤖 in the real world 🌏, robots must adapt to changes in environments and objects. Ever questioned the generalizability of your robot's manipulation policy? Put it to the test with The Colosseum 🏛️. Check out our project:
This was my first robotics conference and it was awesome! Thanks to the organisers of
#ICRA2023
for making it a successful event!
And to everyone I met at the conference, let's stay in touch!
Had a blast presenting our work at
@corl_conf
2023 and making new friends in robot learning. Huge shoutout to the organizers for a fantastic conference!
#CoRL2023
📢Heading to
@corl_conf
in Atlanta to present "AR2-D2: Training a Robot without a Robot." Learn how to gather training data for robot demonstrations without needing a physical robot! 🤖 Join Poster Session 2: RL/IL, Nov 7, 4:15-5:00 PM for a live demo!
#Robots
#AR
#data
📣Humans intuitively focus on things essential for the task at hand. Inspired by this idea💡, we're excited to explore whether an Embodied AI agent can develop more task-driven visual perception through such a 'filtering' mechanism. 🤖✨
🔍 Check out our latest work on the
Embodied-AI 🤖 models employ general-purpose vision backbones such as CLIP to encode observations. How can we achieve more task-driven visual perception for embodied AI?
We introduce a parameter-efficient approach that selectively filters visual representations for Embodied-AI.
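A minimal sketch of the kind of task-conditioned filtering this describes, assuming frozen CLIP patch tokens and a task/text embedding as inputs (module names, dimensions, and the simple gating head are illustrative assumptions, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class TaskConditionedFilter(nn.Module):
    """Parameter-efficient filter on top of a frozen vision backbone.

    Assumes `visual_tokens` come from a frozen encoder (e.g. CLIP ViT patches)
    and `task_emb` is a text/goal embedding; dimensions are illustrative.
    """
    def __init__(self, vis_dim=768, task_dim=512, hidden=256):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(vis_dim + task_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, visual_tokens, task_emb):
        # visual_tokens: (B, N, vis_dim), task_emb: (B, task_dim)
        task = task_emb.unsqueeze(1).expand(-1, visual_tokens.size(1), -1)
        scores = torch.sigmoid(self.gate(torch.cat([visual_tokens, task], dim=-1)))
        return visual_tokens * scores  # soft-mask task-irrelevant tokens

if __name__ == "__main__":
    filt = TaskConditionedFilter()
    tokens = torch.randn(2, 196, 768)  # e.g. 14x14 ViT patch tokens
    task = torch.randn(2, 512)         # e.g. text embedding of the instruction
    print(filt(tokens, task).shape)    # torch.Size([2, 196, 768])
```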
"A Survey on Machine Learning Approaches for Modelling Intuitive Physics" has been accepted to
@IJCAIconf
2022, survey track (18% acceptance rate).
Check out the paper and resources.
Paper:
Webpage:
#MachineLearning
#Physics
PIP has been accepted to
#ECCV2022
!
This work improves physical reasoning by mimicking the selective temporal attention humans use during mental simulations.
Paper:
Website:
@SamsonYuBaiJian
@bihan_wen
@soujanyaporia
Let’s think about humanoid robots beyond carrying boxes. How about having the humanoid come out the door, interact with humans, and even dance?
Introducing Expressive Whole-Body Control for Humanoid Robots:
See how our robot performs rich, diverse,
We tackle robotic manipulation with real-world imitation learning or sim2real approaches. But what if we merge the two?
Introducing CyberDemo: a novel pipeline that leverages simulated human demos to master real-world dexterous manipulation tasks.
Foundation models for robotics shouldn’t come from just one kind of data, and this work is really cool! It combines different modalities of data, since different tasks may need different kinds of data to aid learning.
How to deal with the vastly heterogeneous datasets and tasks in robotics? Robotic data are generated from different domains such as simulation, real robots, and human videos. We study policy composition (PoCo) using diffusion models.
project website:
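A minimal sketch of the general idea of composing diffusion policies across data domains: combine the noise predictions of per-domain models at every denoising step (the toy models, weights, and simplified update rule are assumptions for illustration, not PoCo's exact formulation):

```python
import torch
import torch.nn as nn

class ToyNoiseModel(nn.Module):
    """Stand-in for a per-domain diffusion policy that predicts noise on actions."""
    def __init__(self, action_dim=7, obs_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(action_dim + obs_dim + 1, 64),
                                 nn.ReLU(), nn.Linear(64, action_dim))

    def forward(self, action, t, obs):
        t_feat = torch.full((action.shape[0], 1), float(t))
        return self.net(torch.cat([action, obs, t_feat], dim=-1))

@torch.no_grad()
def sample_composed_action(models, weights, obs, action_dim=7, steps=50):
    action = torch.randn(1, action_dim)  # start from Gaussian noise
    for t in reversed(range(steps)):
        # weighted sum of per-domain noise predictions = composed denoiser
        eps = sum(w * m(action, t, obs) for m, w in zip(models, weights))
        action = action - eps / steps    # toy denoising update (not a full DDPM schedule)
    return action

if __name__ == "__main__":
    obs = torch.randn(1, 16)
    models = [ToyNoiseModel(), ToyNoiseModel(), ToyNoiseModel()]  # e.g. sim / real / human video
    print(sample_composed_action(models, weights=[0.4, 0.4, 0.2], obs=obs))
```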
The upper bound of a VLA model is probably 90% of all tasks, but there will inherently be that 10% of robotics tasks which can't be solved without understanding other multimodal inputs (such as audio, tactile sensing, etc.). 🤖
Finally
#CoRL
is over. I will be attending
#CVPR2024
in Seattle. If anyone is coming, feel free to reach out; I am open to chatting about anything 🔥 in AI and Robotics!
🎉 Excited to share our latest work on physical-reasoning systems through modelling violation of expectation in AI systems. 🤖🧠
"A Benchmark for Modeling Violation-of-Expectation in Physical Reasoning Across Event Categories",
which was accepted to
#CogSci
2023.
Happy to share that our work Good Time to Ask was awarded the Best Paper Award at Ubiquitous Robotics 2023! Thank you to the Korea Robotics Society (KROS) for organizing, and I hope to keep in touch with everyone I met.
The rise of low-cost robotics hardware these days reminded me of this robotic ‘arm’ I built in high school 😂, almost a decade ago. Still excited about it today!
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware!
A low-cost, open-source, mobile manipulator.
One of the highest-effort projects of my past 5 years! Not possible without co-lead
@zipengfu
and
@chelseabfinn
.
At the end, what's better than cooking yourself a meal with the 🤖🧑🍳
Great work by
@mohito1905
!
GENIMA has shown an incredible level of generalizability in our Colosseum benchmark!
The formulation of action generation as image generation is definitely a clever way to look at things, making this approach comparable to 3D-based next-point prediction.
Image-generation diffusion models can draw arbitrary visual patterns. What if we fine-tune Stable Diffusion to 🖌️ draw joint actions 🦾 on RGB observations?
Introducing 𝗚𝗘𝗡𝗜𝗠𝗔
paper, videos, code, ckpts:
🧵Thread⬇️
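A minimal sketch of the "draw actions as images" idea: overlay per-joint target markers on the RGB observation, i.e. the kind of target image a diffusion model could be fine-tuned to generate (the color coding and marker layout are illustrative assumptions, not GENIMA's actual target encoding):

```python
import numpy as np
import cv2

# one hypothetical color per joint of a 7-DoF arm
JOINT_COLORS = [(255, 0, 0), (0, 255, 0), (0, 0, 255),
                (255, 255, 0), (255, 0, 255), (0, 255, 255), (255, 255, 255)]

def draw_joint_targets(rgb, joint_pixels):
    """rgb: (H, W, 3) uint8 observation; joint_pixels: list of (u, v) pixel targets per joint."""
    canvas = rgb.copy()
    for color, (u, v) in zip(JOINT_COLORS, joint_pixels):
        cv2.circle(canvas, (int(u), int(v)), 6, color, -1)  # filled marker per joint target
    return canvas

if __name__ == "__main__":
    obs = np.zeros((256, 256, 3), dtype=np.uint8)      # dummy RGB observation
    targets = [(40 + 30 * i, 128) for i in range(7)]    # dummy joint targets in image space
    out = draw_joint_targets(obs, targets)
    print(out.shape, out.dtype)
```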
Even though these tasks were all teleoperated, I can see the upper bound of household robotics and its potential! Really, kudos to
@zipengfu
and
@tonyzzhao
on this!
Mobile ALOHA's hardware is very capable. We brought it home yesterday and tried more tasks! It can:
- do laundry👔👖
- self-charge⚡️
- use a vacuum
- water plants🌳
- load and unload a dishwasher
- use a coffee machine☕️
- obtain drinks from the fridge and open a beer🍺
- open
Kind of a late post. This quarter I am TAing CSE 599 AI vs IA for
@RanjayKrishna
. This course explores the opportunities and intersections between HCI and AI research! Super interesting and insightful course; I highly recommend taking it next winter quarter.
Happy to share that we have 0/2
#NeurIPS2022
papers accepted. Maybe they are like all the other great scientific discoveries that were just ahead of their time.
We will continue to work hard on it and release the code after the world has caught up with us. 😂
#phdchat
#AcademicChatter
Flying to
@RoboticsSciSys
, looking forward to meeting up with new and old friends.
Anyone keen to chat about anything 🔥 in AI and Robotics, ping me!
How can we tackle complex household tasks like opening doors and cleaning tables in the real world?
Introducing HarmonicMM: Our latest model seamlessly combines navigation and manipulation, enabling robots to tackle household tasks using only RGB visual observation and robot proprioception.
Thanks to
@TheAITalksOrg
for the invite! I will be sharing some of our recent work that focuses on how we could scale up ‘data’ the right way for building foundation models in robotics 🤖.
Feel free to join virtually!
Tesla Optimus can arrange batteries in their factories; ours can do skincare (on
@QinYuzhe
)!
We open-source Bunny-VisionPro, a teleoperation system for bimanual hand manipulation. Users can control the robot hands in real time using VisionPro, flexible like a bunny. 🐇
🎉 Very excited to present our recent work on “Selective🔍 Visual Representations for Embodied-AI🤖” next week at ICLR in Vienna🇦🇹!!
📣📣Important update! Our code and pretrained models are now available through our project website 🌐: 🚀
👋Come to my
I had the privilege of attending an insightful keynote by the legendary
@chrmanning
. His guidance on how PhD candidates can excel in the era of Large Language Models (LLMs) was particularly enlightening. Additionally, it was an honor to present our research at
#EMNLP2023
. The
Mobile ALOHA 🏄 is coming soon!
Special thanks to
@tonyzzhao
for throwing random objects into the scene, and
@chelseabfinn
for the heavy pot (> 3 lbs)!
Stay tuned!
After one full year of revision and rebuttal, our survey paper on Embodied AI has finally been published in IEEE Transactions on Emerging Topics in Computational Intelligence.
#embodied
#AI
#AcademicChatter
Sometimes I feel that doing a PhD is like hiking. It is going to be tough and painful as you climb up, but once you reach the peak and see the glorious view, it is all worthwhile!
@AcademicChatter
#phd
#hiking
OLMo is here! And it’s 100% open.
It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here:
Alas, 3 weeks is far too short of a time to see such excellent and admirable people. I don't know half of you half as well as I should like, and I like less than half of you half as well as you deserve. 😉 1/2
Children are the best learning agents! Hence, we proposed "ABCDE: An Agent-Based Cognitive Development Environment," a work done by a final-year student whom I co-advised.
Happy to share that it has been accepted to the Embodied AI Workshop at this year's
#CVPR2022
!
Stay tuned!
I will be in London for
#ICRA2023
@ieee_ras_icra
. Excited to present my work on Good Time to Ask on 29 May at the Communicating Robot Learning across Human-Robot Interaction workshop!
Robotics and AI draw inspiration from many different fields. Hence, I will be going to
#Cogsci
2023 in Sydney, Australia to learn and present our work.
Ping me if you would like to meet!
Crazy AF. A paper studies
@_akhaliq
and
@arankomatsuzaki
paper tweets and finds those papers get 2-3x higher citation counts than a control group.
They are now influencers 😄 Whether you like it or not, the TikTokification of academia is here!
Meet Dobb·E: a home robot system that needs just 5 minutes of human teaching to learn new tasks. Dobb·E has visited 10 homes, learned 100+ tasks, and we are just getting started!
Dobb·E is fully open-sourced (including hardware, models, and software):
🧵
Introducing Open-World Mobile Manipulation 🦾🌍
– A full-stack approach for operating articulated objects in open-ended unstructured environments:
Unlocking doors with lever handles/ round knobs/ spring-loaded hinges 🔓🚪
Opening cabinets, drawers, and refrigerators 🗄️
👇
Introducing 3D Diffusion Policy (DP3), a simple visual imitation learning algorithm that achieves:
- 55.3% relative improvement on 72 simulated tasks, most with 10 demos
- 85% success rates on 4 real-world tasks, with 40 demos🥟🌯
Open-sourced!
Code/Data: