Introducing Yell At Your Robot (YAY Robot!) 🗣️ - a fun collaboration b/w @Stanford and @UCBerkeley 🤖
We enable robots to improve on-the-fly from language corrections: robots rapidly adapt in real-time and continuously improve from human verbal feedback.
Transformers excel at identifying patterns, but they falter with limited data - a common setback in robotics.🤔
Introducing Cross-Episodic Curriculum (CEC), boosting learning efficiency & generalization of Transformer agents across RL & IL settings! 🧵
To appear at #NeurIPS2023
Can robots be farsighted? We introduce SkiMo (Skill + Model-based RL), which allows more accurate and efficient long-horizon planning through temporal abstraction. SkiMo learns temporally-extended, sparse-reward tasks with 5x fewer samples!
🧵👇
Learning long-horizon tasks is hard, but it can be easier when learning in a better action space.
Our new waypoint method boosts imitation learning performance and data efficiency, proving effective across 8 robotic tasks & 10 datasets. Check out Chelsea’s 🧵 for more details!
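The waypoint idea can be illustrated with a toy sketch (not the paper's actual selection algorithm — the function name and the interpolation-error criterion here are assumptions for illustration): keep only the poses needed so that straight-line interpolation reproduces the dense trajectory within a tolerance.

```python
import numpy as np

def extract_waypoints(trajectory, tol=0.05):
    """Greedy sketch of a waypoint action space: keep only the poses needed so
    that linear interpolation between kept waypoints stays within `tol` of the
    original dense trajectory (a stand-in for the paper's criterion)."""
    trajectory = np.asarray(trajectory, dtype=float)
    waypoints = [0]
    start = 0
    for end in range(2, len(trajectory)):
        # Interpolate from the current waypoint to the candidate endpoint.
        seg = np.linspace(trajectory[start], trajectory[end], end - start + 1)
        if np.max(np.abs(seg - trajectory[start:end + 1])) > tol:
            waypoints.append(end - 1)   # previous point was the last safe waypoint
            start = end - 1
    waypoints.append(len(trajectory) - 1)
    return waypoints

# A dense 1-D trajectory: straight segment out, then back.
traj = [[0.0], [0.1], [0.2], [0.3], [0.2], [0.1], [0.0]]
wps = extract_waypoints(traj)  # the turn at index 3 must be kept
```

The imitation policy then only has to predict the sparse waypoints, which shortens the effective horizon.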
As impressive as always, great work @tonyzzhao !!
Seeing Tony’s dexterous manipulation policies over the past year has changed my mind about what data can solve.
Now a question that keeps me up at night is what data cannot or should not solve.
Introducing 𝐀𝐋𝐎𝐇𝐀 𝐔𝐧𝐥𝐞𝐚𝐬𝐡𝐞𝐝 🌋 - Pushing the boundaries of dexterity with low-cost robots and AI.
@GoogleDeepMind
Finally got to share some videos after a few months. The robots are fully autonomous, filmed in one continuous shot. Enjoy!
Introducing HumanPlus - the shadowing part
Humanoids are born to use human data. We build a real-time shadowing system using a single RGB camera and a whole-body policy for cloning human motion. Examples:
- boxing🥊
- playing the piano🎹/ping pong
- tossing
- typing
Open-sourced!
How does it work? A high-level policy (akin to a VLM) generates language instructions. Then, a low-level policy (end-to-end language-conditioned BC) executes the skill. This enables robots to understand language instructions and act on them.
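A minimal sketch of this hierarchy, with placeholder policies standing in for the learned VLM-style high level and the language-conditioned BC low level (all names, instructions, and the re-planning interval are illustrative assumptions):

```python
import random

def high_level_policy(observation):
    """Hypothetical high-level policy: maps an observation to a language
    instruction for the next skill (stands in for a learned VLM-like model)."""
    return random.choice(["pick up the bag", "open the ziploc", "insert the item"])

def low_level_policy(observation, instruction):
    """Hypothetical language-conditioned BC policy: maps (observation,
    instruction) to a motor command."""
    return {"instruction": instruction, "action": [0.0] * 7}  # e.g. 7-DoF arm command

def run_episode(env_steps=30, skill_horizon=10):
    """High level re-plans every `skill_horizon` steps; low level acts every step."""
    trace = []
    instruction = None
    for t in range(env_steps):
        obs = {"t": t}  # placeholder observation
        if t % skill_horizon == 0:
            instruction = high_level_policy(obs)  # choose the next skill in language
        trace.append(low_level_policy(obs, instruction))
    return trace

trace = run_episode()
```

Language is the interface between the two levels, which is what makes verbal corrections a natural supervision signal.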
In this work we integrate language corrections to supervise language-conditioned skills in real-time, and use this feedback to iteratively improve the policy.
Long-horizon tasks are hard - the longer the task, the more likely that some stage will fail. Can humans help robots continuously improve through intuitive and natural feedback?
Check out our #CoRL2022 paper on learning skill dynamics for model-based RL! We present a sample-efficient RL algorithm based on temporal abstraction (skills).
w/ @YoungwoonLee and @lucy_x_shi (an undergrad who will graduate soon!)
During deployment, people can intervene through corrective language commands, overriding the high-level policy for the robot’s on-the-fly adaptation. These interventions are then used to post-train and improve the high-level policy.
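The intervention logic can be sketched as follows; the data format and helper names are hypothetical, but the idea matches the text: a verbal correction overrides the proposed instruction, and the relabeled pairs become post-training data for the high-level policy.

```python
def select_instruction(high_level_instruction, human_correction):
    """A verbal correction, when present, overrides the high-level policy."""
    return human_correction if human_correction is not None else high_level_instruction

def collect_post_training_data(episode):
    """Keep (observation, instruction) pairs; corrected steps relabel the
    high-level target so post-training pushes the policy toward the fix."""
    dataset = []
    for obs, proposed, correction in episode:
        dataset.append((obs, select_instruction(proposed, correction)))
    return dataset

# Hypothetical episode: (observation, proposed instruction, optional correction).
episode = [
    ({"t": 0}, "grasp the bag", None),
    ({"t": 1}, "grasp the bag", "pinch more of the bag"),  # human yells a fix
    ({"t": 2}, "insert the chips", None),
]
data = collect_post_training_data(episode)
```

Iterating this collect-then-post-train loop is what drives the continuous improvement described above.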
For more details:
📄Paper:
🌐Website + Code:
Endless thanks to my incredible team! Co-lead: @YunfanJiang, w/ @__jakegrigsby__ @DrJimFan @yukez
This journey of exploration and development has been immensely rewarding because of you!
Joint work w/ @YoungwoonLee and @JosephLim_AI
For more details and videos, check out the paper and website. We'll also make the code available soon.
Paper:
Project website:
Happy to answer any questions! ✨
We evaluate our method on four long-horizon, sparse-reward tasks that cover challenges in exploration, skill composition, generalization, and extremely task-agnostic datasets. Compared to prior methods, SkiMo achieves better performance and requires far fewer samples!
We find that robots continuously learn from interactions - language corrections improve the autonomous policy's performance by 20% through iterative post-training.
Then does it predict accurately over a long horizon? We compare predictions over 500 timesteps using a flat model and the skill dynamics model. The flat model’s prediction quickly deviates from the ground truth, while the skill dynamics model’s prediction shows little error.
“Foundation models” have catalyzed progress in large-scale research and applications. My hope is to see the emergence of "foundation hardware" in the near future. ALOHA exhibits immense potential in this regard. Check it out if you’re interested in fine manipulation!
For more results on hierarchical vs. flat BC, GPT-4V as high-level policy, impact of data quality, etc., check out our paper & website:
We also open-source the code for YAY Robot, and some automated tools for collecting language-annotated robotic data.
To investigate exploration & exploitation behaviors, we visualize trajectories in the replay buffer (light blue for early trajectories and dark blue for recent trajectories). SkiMo shows wide coverage of the maze early in the training, and fast convergence to the solution.
In pretraining, SkiMo leverages offline task-agnostic data to extract skill dynamics and a skill repertoire. Unlike prior works that train the model and the skill policy separately, we propose to _jointly_ train them to extract a skill space that is conducive to planning.
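A toy numpy sketch of the joint objective, with random linear maps standing in for SkiMo's networks (all shapes, names, and the equal loss weighting are assumptions): the same skill latent must both decode into actions and support skill-level state prediction, so minimizing one summed loss couples the two.

```python
import numpy as np

rng = np.random.default_rng(0)
state_dim, action_dim, skill_dim, horizon = 4, 2, 3, 10

# Hypothetical parameters (in SkiMo these are neural networks).
enc = rng.normal(size=(state_dim + horizon * action_dim, skill_dim))   # skill encoder
dec = rng.normal(size=(state_dim + skill_dim, action_dim))             # skill policy (decoder)
dyn = rng.normal(size=(state_dim + skill_dim, state_dim))              # skill dynamics

def joint_loss(states, actions):
    """One trajectory segment -> combined loss. Jointly minimizing both terms
    shapes a skill space that is both decodable and predictable."""
    z = np.concatenate([states[0], actions.ravel()]) @ enc        # encode segment to skill z
    pred_actions = np.stack(
        [np.concatenate([s, z]) @ dec for s in states[:-1]]
    )
    bc_loss = np.mean((pred_actions - actions) ** 2)              # reconstruct actions from z
    pred_next = np.concatenate([states[0], z]) @ dyn              # one skill-level jump
    dyn_loss = np.mean((pred_next - states[-1]) ** 2)             # match state after H steps
    return float(bc_loss + dyn_loss)                              # trained jointly, not separately

states = rng.normal(size=(horizon + 1, state_dim))
actions = rng.normal(size=(horizon, action_dim))
loss = joint_loss(states, actions)
```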
Results speak! 🚀 CEC outperforms offline RL techniques (e.g., DT) and BC baselines trained on expert data, even exceeding RL oracles by up to 50% *zero-shot* - all with the same parameter count and data size!
Phase 2️⃣: Causally distilling policy refinement into Transformer agent model weights via *cross-episodic attention*, allowing the policy to trace & internalize improved behaviors from curricular data.
Introducing LocoProp, a new framework that reconceives a neural network as a modular composition of layers—each of which is trained with its own weight regularizer, target output and loss function—yielding both high performance and efficiency. Read more →
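A rough sketch of the layer-local idea, under the assumption that each layer's target output comes from a small gradient step on its activations (the actual LocoProp target construction and per-layer loss functions differ in detail — this is illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(8, 4))                # batch of inputs
W1 = rng.normal(size=(4, 5)) * 0.1         # layer 1 weights (linear for simplicity)
W2 = rng.normal(size=(5, 1)) * 0.1         # layer 2 weights
y = rng.normal(size=(8, 1))                # regression targets

def local_layer_update(W0, inputs, target_out, lr=0.05, reg=0.1, inner_steps=5):
    """LocoProp-style local solve: minimize a squared loss to the layer's target
    output plus a proximity regularizer keeping W near its current value W0."""
    W = W0.copy()
    for _ in range(inner_steps):
        pred = inputs @ W
        grad = inputs.T @ (pred - target_out) / len(inputs) + reg * (W - W0)
        W = W - lr * grad
    return W

# Forward pass, then form per-layer targets via a small step on activations.
h = x @ W1
out = h @ W2
g_out = out - y                                  # gradient of squared loss wrt output
target2 = out - 0.5 * g_out                      # target for layer 2's output
g_h = g_out @ W2.T                               # backpropagated activation gradient
target1 = h - 0.5 * g_h                          # target for layer 1's output

W2_new = local_layer_update(W2, h, target2)      # layers update independently,
W1_new = local_layer_update(W1, x, target1)      # so they can run in parallel
```

Because each layer solves its own small problem, the updates decouple, which is part of where the claimed efficiency comes from.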
In downstream RL, we learn a high-level task policy in the skill space (skill-based RL) and leverage the skill dynamics model to generate imaginary rollouts for policy optimization and planning (model-based RL).
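In sketch form (numpy, with random linear maps standing in for the learned skill dynamics and a reward head — all names and shapes are assumptions): each imagined step jumps a whole skill, so a few model steps cover many environment steps without touching the real robot.

```python
import numpy as np

rng = np.random.default_rng(1)
state_dim, skill_dim = 4, 3
dyn = rng.normal(size=(state_dim + skill_dim, state_dim)) * 0.1   # skill dynamics (stand-in)
rew = rng.normal(size=(state_dim + skill_dim,))                   # reward head (stand-in)

def imagine_rollout(state, task_policy, num_skills=5):
    """Roll the skill dynamics forward in imagination: with a skill horizon of
    10, 5 model steps cover ~50 environment steps."""
    total_reward = 0.0
    for _ in range(num_skills):
        z = task_policy(state)                  # high-level task policy picks a skill
        sz = np.concatenate([state, z])
        total_reward += float(sz @ rew)         # predicted skill-level reward
        state = sz @ dyn                        # predicted state after the whole skill
    return state, total_reward

policy = lambda s: np.tanh(s[:skill_dim])       # stand-in task policy in skill space
final_state, ret = imagine_rollout(rng.normal(size=state_dim), policy)
```

The imagined returns are what the task policy is optimized against, which is the model-based half of "skill-based + model-based RL".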
🤖👥 In IL settings, human demos vary in quality, but still showcase improvement patterns & generally effective manipulation skills across different operators 🎥:
We leverage Transformers to extract & *extrapolate* these patterns for faster, further improvement in embodied tasks
How to maximize learning from scarce data?
Key insight: looking at data _across_ episodes reveals useful improvement patterns. E.g., an RL agent acquires progressively better navigation skills 🎥:
🦾 Robust Policies: In novel test scenarios (e.g., unseen maze mechanisms, OOD difficulties, varying environment dynamics), CEC improves policy performance by up to 1.6x over RL oracles!
Method: Cross-Episodic Curriculum (CEC)
Phase 1️⃣: Formulating curricular sequences, capturing:
a) policy improvement in single environments,
b) learning progress in increasingly harder environments, or
c) demonstrators' rising proficiency
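The curricular-sequence construction above can be sketched generically: order episodes by whichever curriculum signal applies (policy return, environment difficulty, or operator proficiency) and concatenate them into one long context for cross-episodic attention. The field names here are illustrative assumptions.

```python
def make_curricular_sequence(episodes):
    """Order episodes by the curriculum signal (here: episode return, standing
    in for improvement / difficulty / proficiency) and flatten them into one
    long token sequence so the Transformer can attend across episodes."""
    ordered = sorted(episodes, key=lambda ep: ep["return"])   # worst -> best
    sequence = []
    for ep in ordered:
        sequence.extend(ep["transitions"])                    # concatenate across episodes
    return ordered, sequence

# Hypothetical episodes with increasing quality.
episodes = [
    {"return": 0.9, "transitions": [("s5", "a5"), ("s6", "a6")]},
    {"return": 0.1, "transitions": [("s1", "a1")]},
    {"return": 0.5, "transitions": [("s3", "a3"), ("s4", "a4")]},
]
ordered, seq = make_curricular_sequence(episodes)
```

Training on such sequences is what lets the model extract the improvement pattern rather than just the average behavior.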
Humans efficiently plan with high-level skills to solve complex tasks, like washing and cutting for cooking. But MBRL today typically plans with single-step models, akin to a human planning out every muscle movement. This does not scale to long-horizon tasks!
SkiMo learns a model that predicts the effects of whole _skills_. This allows it to skip the low-level details of skill execution when reasoning over long time horizons --> faster planning & less error accumulation!
@natolambert
Altogether, the agent then plans directly over time in the skill space (choose skill -> predict outcome -> repeat) & it can predict more accurately over the long term (temporally-extended reasoning + fewer planning steps required). 4/4
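This choose-predict-repeat loop reads as a shooting method; here is a minimal random-shooting sketch in skill space (the real planner, objective, and skill prior are more sophisticated — all names here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
state_dim, skill_dim = 4, 3
dyn = rng.normal(size=(state_dim + skill_dim, state_dim)) * 0.1   # skill dynamics (stand-in)
goal = np.ones(state_dim)                                         # toy goal state

def predict(state, z):
    """Skill dynamics: state + chosen skill -> predicted state after the skill."""
    return np.concatenate([state, z]) @ dyn

def plan(state, horizon=4, num_candidates=64):
    """Random-shooting planner over skills: sample candidate skill sequences,
    roll each through the skill dynamics, keep the first skill of the best."""
    best_z, best_dist = None, np.inf
    for _ in range(num_candidates):
        zs = rng.normal(size=(horizon, skill_dim))   # candidate skill sequence
        s = state
        for z in zs:                                 # choose skill -> predict -> repeat
            s = predict(s, z)
        dist = np.linalg.norm(s - goal)              # score by predicted goal distance
        if dist < best_dist:
            best_z, best_dist = zs[0], dist
    return best_z

z0 = plan(np.zeros(state_dim))                       # first skill to execute
```

Planning over 4 skills here stands in for ~40 low-level actions, which is exactly the temporal abstraction the thread describes.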
to clarify “dancing to the beat” - even though I'd be thrilled if our robot could learn to dance 🕺, the jerkiness at the end was simply because it’s unsure of its next move after task completion. The sync with background music is a happy coincidence 😂
@ChongZitaZhang
@Stanford
@UCBerkeley
ah that’s interesting! we don’t have this kind of high-level semantics in the data so don’t really know. I’d be curious! we’ve only tried motion/skill generalization before - e.g. “wiggle” seems to generalize well to different objects
@natolambert
The policy is abstracted through skills (≈options) as continuous variables that encode action sequences (currently with a fixed length of 10 for stability; variable-length skills will be an interesting future direction). 2/4
@DavidChen930109
Hey, I don’t think there’s documentation for Franka Kitchen in particular, but maybe you want to check out the D4RL website & repo for more info
Initially, I thought learning long-horizon bimanual fine manipulation tasks on real robots would be a nightmare. Surprisingly, it was an absolute joy. All credit goes to Tony's low-cost open-source hardware system, ALOHA 👏
Introducing ALOHA 🏖: 𝐀 𝐋ow-cost 𝐎pen-source 𝐇𝐀rdware System for Bimanual Teleoperation
After 8 months of iterating at @stanford and 2 months of working with beta users, we are finally ready to release it!
Here is what ALOHA is capable of: