🚨 Big news 🚨
Together with a group of amazing folks, we decided to start a company that tackles one of the hardest and most impactful problems: Physical Intelligence
In fact, we even named our company after it: Physical Intelligence, or Pi (π) for short
🧵
Following the principle of optimism in the face of uncertainty, Optimistic Actor Critic (OAC) derives its exploration policy from an upper confidence bound on the Q-function instead of the usual lower bound. Learn how OAC improves sample efficiency over other methods:
#NeurIPS2019
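For intuition, here's a toy sketch of the bound construction with two critics (my own code, not the paper's; `beta_ub` stands in for OAC's optimism hyperparameter):

```python
import torch

# With two bootstrapped critics, their mean and disagreement give a crude
# epistemic-uncertainty estimate. Training targets use the pessimistic lower
# bound; the exploration policy instead ascends the optimistic upper bound.
def q_bounds(q1: torch.Tensor, q2: torch.Tensor, beta_ub: float = 1.0):
    mean = (q1 + q2) / 2
    std = (q1 - q2).abs() / 2    # disagreement between the critics
    q_lb = mean - std            # equals min(q1, q2): the usual pessimistic estimate
    q_ub = mean + beta_ub * std  # optimistic bound used to pick exploration actions
    return q_lb, q_ub
```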
Introducing 𝐌𝐨𝐛𝐢𝐥𝐞 𝐀𝐋𝐎𝐇𝐀🏄 -- Hardware!
A low-cost, open-source, mobile manipulator.
One of the highest-effort projects of my past 5 years! Not possible without co-leads
@zipengfu
and
@chelseabfinn
.
In the end, what's better than cooking yourself a meal with the 🤖🧑🍳
Mind blown from reading this paper: meta-gradient reinforcement learning (). A learning algorithm that edits itself online during training at no extra data cost. Online cross-validation is so cool 🤯🤯🤯
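Roughly, the mechanism looks like this (a heavily simplified toy of my own, not the paper's code; `eta`, `theta`, and the quadratic loss are stand-ins for a real meta-parameter, agent, and TD loss):

```python
import torch

eta = torch.tensor(0.99, requires_grad=True)  # meta-parameter, e.g. a discount
theta = torch.randn(4, requires_grad=True)    # toy agent parameters

def loss(theta, eta, batch):                  # stand-in for a TD-style loss
    return ((batch - eta * theta.sum()) ** 2).mean()

train_batch, val_batch = torch.randn(8), torch.randn(8)

# One differentiable inner update: theta' = theta - alpha * dL/dtheta, which
# depends on eta because the inner loss does.
(g,) = torch.autograd.grad(loss(theta, eta, train_batch), theta, create_graph=True)
theta_prime = theta - 1e-2 * g

# "Online cross-validation": the held-out loss at theta' is differentiated
# w.r.t. eta through the inner step, then eta is nudged downhill.
(meta_grad,) = torch.autograd.grad(loss(theta_prime, eta.detach(), val_batch), eta)
eta.data -= 1e-3 * meta_grad
```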
1. Optimistic Actor Critic (NeurIPS 2019 spotlight)
Existing tricks to stabilize training lead to pessimistic exploration. We introduce optimistic exploration and obtain sample-efficiency gains!
Paper:
Three months ago we released the Open X-Embodiment dataset; today we're taking the next step:
Introducing Octo 🐙, a generalist robot policy, trained on 800k robot trajectories, stronger than RT-1X, flexible observation + action spaces, fully open source!
💻:
/🧵
To evaluate the RT-2-X model, we host it in the cloud and query it over the internet to run evaluations at Stanford and Berkeley. A glimpse into the robot cloud API future!
Proud to announce Dobb·E: the next step in home robot systems, which I have been working on for the past 3 years.
We have visited 10 homes, learned 100+ tasks, and we are just getting started!
And we fully open-sourced it all, hardware, models, and software: 🧵
Max Entropy has been hugely influential in continuous RL, but why does it work? What's the mechanism of action? We believe it has to do with saturation in the action space!
Tune in at ICML on 14 July, 13:00-13:45 AOE and 23:00-23:45 AOE.
pdf:
@icmlconf
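A toy illustration of the saturation effect (my own example, not the paper's experiment): with a tanh-squashed Gaussian policy, a larger entropy bonus widens the pre-squash distribution, so more actions pile up at the bounds.

```python
import torch

stds = torch.tensor([0.5, 2.0, 8.0])  # low / medium / high entropy
samples = torch.distributions.Normal(0.0, stds).sample((100_000,))
actions = torch.tanh(samples)         # squashed into (-1, 1)
# Fraction of near-boundary actions grows with the policy's entropy:
print((actions.abs() > 0.99).float().mean(dim=0))
```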
Cross-embodied robot policies hold the promise of one policy to control all robots. But how far does transfer go? In new work, we study positive transfer between *manipulation* & *navigation* and show that nav data helps manipulation, and vice versa!
🧵 👇
To evaluate the RT-1-X model, we sent the model checkpoints to 5 different academic labs and ran evaluation using existing robot infrastructure and control stack without any modifications. 🙀
We did not standardize the control stack across the 5 different labs.
The project is a collaboration between 173 researchers from 34 different research labs. We pooled data to create a one-of-a-kind dataset containing 22 embodiments.
We can tell our robots what we want them to do, but language can be underspecified. Goal images are worth 1,000 words, but can be overspecified.
Hand-drawn sketches are a happy medium for communicating goals to robots!
🤖✏️Introducing RT-Sketch:
🧵1/11
Our paper on model-free RL was accepted to
#ICLR2019
. Congrats to co-author Yiming Zhang (NYU) and Keith Ross (NYU/NYU Shanghai).
TLDR: find the optimal non-parameterized policy by solving a constrained optimization problem, then parameterize it.
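In sketch form, for discrete actions (my own toy code, not the paper's; `lam` stands in for the temperature induced by the KL constraint):

```python
import torch
import torch.nn.functional as F

# Step 1: under a KL trust-region constraint, the optimal non-parameterized
# policy has a closed form: pi*(a|s) ∝ pi_old(a|s) * exp(A(s, a) / lam).
def nonparametric_target(logp_old, advantages, lam=1.0):
    return F.softmax(logp_old + advantages / lam, dim=-1)

# Step 2: "parameterize it" by fitting pi_theta to pi* with supervised
# learning, i.e. minimizing KL(pi* || pi_theta).
def projection_loss(logp_new, target_probs):
    return F.kl_div(logp_new, target_probs, reduction="batchmean")

logp_old = torch.log_softmax(torch.randn(32, 6), dim=-1)  # old policy, 6 actions
adv = torch.randn(32, 6)                                  # advantage estimates
pi_star = nonparametric_target(logp_old, adv)
logits = torch.randn(32, 6, requires_grad=True)           # new policy's logits
loss = projection_loss(torch.log_softmax(logits, dim=-1), pi_star)
loss.backward()                                           # train pi_theta on pi*
```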
Very excited to release the Open X-Embodiment Dataset today — the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step!
There’s lots to unpack here, so let’s do a deep dive into the dataset!
🧵1/15
Modeling-wise, we made minimal changes to RT-1 and RT-2 and were surprised to obtain performance improvements out of the box.
We refer to the RT-1 and RT-2 models trained on the X-Embodiment dataset as RT-1-X and RT-2-X.
Super simple code change to get value-based deep RL to scale *much* better w/ big models across the board on Atari games, robotic manipulation w/ transformers, LLM + text games, & even Chess!
Just use a classification loss (i.e., cross entropy), not MSE!!
🧵⬇️
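A minimal sketch of the swap, assuming a simple "two-hot" target encoding over a fixed value support (bin range, bin count, and names are my own; the paper's preferred encoding may differ):

```python
import torch
import torch.nn.functional as F

def two_hot(targets: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
    """Encode scalar targets as soft distributions over sorted 1-D `bins`."""
    idx = torch.clamp(torch.searchsorted(bins, targets), 1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (targets - lo) / (hi - lo)  # interpolate between neighboring bins
    dist = torch.zeros(targets.shape[0], len(bins))
    dist.scatter_(1, (idx - 1).unsqueeze(1), (1 - w_hi).unsqueeze(1))
    dist.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return dist

bins = torch.linspace(-10.0, 10.0, 51)  # assumed value support
logits = torch.randn(32, 51)            # network now outputs bin logits
td_targets = torch.rand(32) * 4 - 2     # stand-in scalar TD targets
loss = F.cross_entropy(logits, two_hot(td_targets, bins))  # instead of F.mse_loss
```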
It's been a few days since the RT-X release, and one of the most gratifying things to me in the reaction is the recognition of how much this was a team effort -- a large portion of the robotic learning community coming together to do something bigger than any one lab could do.
Many researchers have asked us about sharing our RT dataset and making it easier to participate in large-scale robot learning research.
We're working on it and we'll have some updates on this soon! 👀
Introducing Mirage: Zero-shot transfer of visuomotor policies to unseen robot embodiments 🤖
With Mirage, you can train a policy on one robot and deploy it on a different one that it has never seen, with no additional data or training! 🧵👇 (1/8)
🌐
The timeline split of AI vs Robot Hardware has changed
over the last 90 days i've witnessed industry-leading AI in our lab running on humanoid hardware, and frankly it's blown me away
i'm watching robots perform complex tasks entirely with neural nets. AI-trained tasks that i
@xf1280
@DrJimFan
@scott_e_reed
Thanks Fei!
Please note that 3 Hz is the system-level latency, i.e., including camera and communication overhead.
The neural network itself runs much faster (see Table 13 on page 30).
2. Pre-training as Batch Meta Reinforcement Learning with tiMe
We introduce a pre-training method for RL that uses only observational data and NO environment interaction during meta-training.
It generalizes zero-shot to unseen MDPs, which is important for scalable data collection.
So far, there have been some remarkable large-scale robotic learning results, datasets, and milestones this year. But we have something pretty big coming out tomorrow. So big that we needed a globe to visualize its scale😉
@nguyentienvu
Reminds me of an email that starts with “It is our pleasure to inform you that your grant application has been rejected...” true story 😀😀😀
📢Thrilled to announce sudoAI (
@sudoAI_
), founded by a group of leading AI talents and me!🚀
We are dedicated to revolutionizing digital & physical realms by crafting interactive AI-generated 3D environments!
Join our 3D Gen AI model waitlist today!
👉
With Kamil Ciosek, Robert Loftin, Katja Hofmann of MSR Cambridge.
My contribution was done during my internship, from which I grew a whole lot!
If you want a non-trivial probability of producing a spotlight, apply here : )
A SOTA grasping network fails catastrophically when transferred to new robot morphologies because it overfits to the geometry of the gripper.
Our approach recovers >90% grasping performance without training on any real-world grasping data.
2. An efficient, simple and theoretically motivated method for safe RL! The techniques should be applicable to any optimization problem where the objective is convex in the output of the NN!
Arxiv:
PyTorch 1.3 includes support for model deployment to mobile devices, quantization, & front-end improvements, like the ability to name tensors. New tools & libraries are also launching for improved model interpretability & multimodal development. Read more:
Given RGBD observations of a tabletop scene, we do the following (sketched in code after this list):
1. reconstruct the geometry of the objects in the scene
2. place the reconstructions in a simulated environment (without needing pose estimation at all)
3. use the reconstructions to train or fine-tune grasping networks
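A hypothetical end-to-end sketch of those three steps (every function and class below is a made-up stub, not the released code):

```python
import numpy as np

def reconstruct_meshes(rgbd):                 # 1. geometry from RGBD pixels
    points = rgbd[..., :3][rgbd[..., 3] > 0]  #    keep pixels with valid depth
    return [points]                           #    stand-in for per-object meshes

class SimScene:                               # 2. reconstructions in simulation;
    def __init__(self, meshes):               #    meshes are already expressed in
        self.meshes = meshes                  #    the camera frame, so no pose
    def execute(self, grasp):                 #    estimation is needed
        return bool(np.random.rand() > 0.5)   # 3. rollout yields a success label

def finetune(update_grasp_net, rgbd, n_trials=10):
    scene = SimScene(reconstruct_meshes(rgbd))
    for _ in range(n_trials):
        grasp = np.random.randn(6)            # stub 6-DoF grasp proposal
        update_grasp_net(grasp, scene.execute(grasp))

finetune(lambda grasp, success: None, np.random.rand(64, 64, 4))
```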
Played escape the room yesterday. Must be how it feels to be an RL agent, forced to generalize to an unseen MDP with a sparse reward function, guided by a learnt intrinsic reward function 🧐
Interesting that they have humans teleoperate the arms instead of letting the robot autonomously propose goals.
Could this be a design choice to maintain safety during training?
Excited to share our new work on learning from play!
We show that a single agent, after self-supervising on 3 hours of play data, can generalize zero-shot to 18 manipulation tasks with 85% success.
interactive paper:
1/
Big thanks to co-authors who made research a less lonely endeavor!
Jiachen Li (UCSD)
Shuang Liu (UCSD)
Minghua Liu (UCSD)
@MLciosek
@hiskov
Hao Su (UCSD)
Yiming Zhang (NYU)
Keith Ross (NYU)
@icmlconf
The ICML page inside CMT just stopped loading. It was working 5 minutes ago, and now I can't load the page to initiate the reviewers' discussion. Help pls!
Can we also have auto-complete for natural language text, rather than just LaTeX commands?
@overleaf
Would save a lot of typing, especially for scientific lingo!