Karl Pertsch Profile
Karl Pertsch

@KarlPertsch

Followers
3K
Following
405
Media
91
Statuses
336

Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.

Joined July 2015
@KarlPertsch
Karl Pertsch
16 days
Excited to release FAST, our new robot action tokenizer! 🤖 Some highlights:
- Simple autoregressive VLAs match diffusion VLA performance
- Trains up to 5x faster
- Works on all robot datasets we tested
- First VLAs that work out-of-the-box in new environments!
🧵/
17
96
508
@KarlPertsch
Karl Pertsch
1 year
Very excited to release the Open X-Embodiment Dataset today -- the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There's lots to unpack here, so let's do a deep dive into the dataset! 🧵1/15
8
90
444
@KarlPertsch
Karl Pertsch
4 months
I'll give a talk at the Multimodal Agents workshop at ECCV tomorrow Sept 30, at 2:20pm CET. Excited for my first talk at a vision conference: robotics is increasingly becoming a multi-modal sequence modeling problem w/ lots of potential for LLM/VLM researchers to have big impact!
Tweet media one
7
46
433
@KarlPertsch
Karl Pertsch
1 year
3 mo. ago we released the Open X-Embodiment dataset, today we're taking the next step: Introducing Octo 🐙, a generalist robot policy, trained on 800k robot trajectories, stronger than RT-1-X, flexible observation + action spaces, fully open source! 💻: /🧵
10
90
368
@KarlPertsch
Karl Pertsch
8 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA).
🦾 SoTA generalist policy (better than Octo & RT-2-X)
⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA
💻 Open-source PyTorch codebase
🤗 Models on HuggingFace
1/
4
64
390
@KarlPertsch
Karl Pertsch
4 months
It was fun giving this talk yesterday! The live talk wasn't recorded, but I just uploaded the recording of a practice run I did the night before (link below). A short thread of key points from the talk 🧵
@KarlPertsch
Karl Pertsch
4 months
I'll give a talk at the Multimodal Agents workshop at ECCV tomorrow Sept 30, at 2:20pm CET. Excited for my first talk at a vision conference: robotics is increasingly becoming a multi-modal sequence modeling problem w/ lots of potential for LLM/VLM researchers to have big impact!
Tweet media one
3
29
224
@KarlPertsch
Karl Pertsch
11 months
Access to *diverse* training data is a major bottleneck in robot learning. We're releasing DROID, a large-scale in-the-wild manipulation dataset: 76k trajectories, 500+ scenes, multi-view stereo, language annotations, etc. Check it out & download today! 💻:
8
59
193
@KarlPertsch
Karl Pertsch
7 months
Our OpenVLA model has been downloaded more than 20k times in less than a month -- the most for any robotics model on the 🤗 hub by a long shot! Here is a little "cookbook" for people who want to get started using OpenVLA! 🧑‍🍳 1/🧵
Tweet media one
2
16
166
@KarlPertsch
Karl Pertsch
3 months
I will be at @corl_conf this week, co-presenting 4 papers and one workshop across the full spectrum of scalable robot learning research: data, models & evals! Also happy to chat about research @physical_int! Short 🧵 w/ paper pointers 👇
3
13
132
@KarlPertsch
Karl Pertsch
3 months
I started at Pi part-time a few months back, and I'm excited to share what we've been up to! π₀ is the first generalist VLA that can solve many *dexterous* tasks, including some really long-horizon laundry manipulation tasks. A few notes 👇
@physical_int
Physical Intelligence
3 months
At Physical Intelligence (π) our mission is to bring general-purpose AI into the physical world. We're excited to show the first step towards this mission - our first generalist model π₀ 🧠 🤖. Paper, blog, uncut videos:
2
5
108
@KarlPertsch
Karl Pertsch
9 months
Our OpenX paper won best paper at ICRA! Congrats to all my co-authors! 🎉🎉 This is an ongoing effort, we recently added new datasets from the community that double the size of the OpenX dataset -- keep 'em coming! :) Check datasets & how to contribute:
Tweet media one
3
14
104
@KarlPertsch
Karl Pertsch
5 months
Excited to announce the Workshop on X-Embodiment Robot Learning at #Corl2024! How can we build robot foundation models that can control many different robots & where do we find data to train them? Submit your work on scalable & x-embodied robot learning and join us in Munich! 🙂
Tweet media one
2
9
102
@KarlPertsch
Karl Pertsch
9 months
Octo has been accepted to RSS and we finally arXiv'd the paper! 🐙 Many small updates vs the December release: more ablations, new checkpoints, code fixes, etc. 👇
@_akhaliq
AK
9 months
Octo. An Open-Source Generalist Robot Policy. Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a
Tweet media one
4
10
97
@KarlPertsch
Karl Pertsch
3 months
Happy that *two* of our CoRL papers got nominated as outstanding paper award finalists (ReMix & OpenVLA)! Congrats to all my co-authors, esp. @JoeyHejna, @moo_jin_kim, @siddkaramcheti. And congrats to the award winners from AI2 & TRI, well deserved! :)
Tweet media one
4
4
94
@KarlPertsch
Karl Pertsch
3 months
Excited to kick off the X-Embodiment workshop @corl_conf in the morning (9am @ room Terra)! We have an exciting lineup of speakers, including @SongShuran, @kvablack, @ryancjulian, @ehsanik, @yukez, @Ed__Johns, @svlevine
Tweet media one
3
14
87
@KarlPertsch
Karl Pertsch
1 year
It was fun to present Open X-Embodiment & RT-X at CoRL today with @QuanVng! We were very excited about the initial release of the Open X-Embodiment dataset, but it's just the start! We covered lots of open problems in the talk as well 👇
Tweet media one
1
7
73
@KarlPertsch
Karl Pertsch
3 months
Had a great time organizing yesterday's X-Embodiment workshop! The full recording and all papers are now live on our workshop website -- learn all the latest on x-embodiment & scalable robot learning research!
@KarlPertsch
Karl Pertsch
3 months
Excited to kick off the X-Embodiment workshop @corl_conf in the morning (9am @ room Terra)! . We have an exciting lineup of speakers, including @SongShuran, @kvablack, @ryancjulian, @ehsanik, @yukez, @Ed__Johns, @svlevine
Tweet media one
3
14
73
@KarlPertsch
Karl Pertsch
2 years
Excited to present STAR, our work on cross-domain imitation @corl_conf! Our goal: use demonstrations across domains, e.g. from robot in kitchen A to robot in kitchen B, or even from human to robot. With STAR I can teach a robot new tasks with videos recorded in my kitchen! 🧵👇
1
18
68
@KarlPertsch
Karl Pertsch
16 days
With FAST, we scale autoregressive VLA training to pi0 scale, and we can solve some pretty complex robot tasks, simply via next token prediction! The best part: in our experiments, pi0+FAST converges 5x faster than diffusion pi0! Days instead of weeks of training! 🎉 3/
Tweet media one
Tweet media two
Tweet media three
3
8
59
@KarlPertsch
Karl Pertsch
1 year
It's awesome to see the positive community response to our release! We're getting inquiries from around the world to contribute more data -- wheeled robots, drones, humanoids, etc! 🚀🚀🚀 Please keep them coming 🙂 open-x-embodiment@googlegroups.com
@KarlPertsch
Karl Pertsch
1 year
Very excited to release the Open X-Embodiment Dataset today -- the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There's lots to unpack here, so let's do a deep dive into the dataset! 🧵1/15
0
7
58
@KarlPertsch
Karl Pertsch
5 months
If you're interested in scalable robot learning & applying for PhDs this cycle, apply to @shahdhruv_'s new lab at Princeton! Dhruv pioneered X-embodied robot foundation models for navigation and I'm sure his lab will work on lots of exciting large-scale robot learning problems!
@shahdhruv_
Dhruv Shah
5 months
Excited to share that I will be joining @Princeton as an Assistant Professor in ECE & Robotics next academic year! 🐯🤖 I am recruiting PhD students for the upcoming admissions cycle. If you are interested in working with me, please consider applying.
0
5
57
@KarlPertsch
Karl Pertsch
5 months
Curating large-scale robot training datasets is mostly black magic right now -- I called the data mix we used for Octo & OpenVLA the "magic soup" 🧙‍♂️. In our project ReMix, Joey made a first step towards a more principled solution -- automatically finding good data mixture weights!
@JoeyHejna
Joey Hejna
5 months
As imitation learning policies continue to scale, deciding how to weigh different robot datasets will become even more difficult. To address this problem we introduce ReMix, a method for automatically curating large RT-X scale imitation learning datasets. 🧵(1/5)
0
6
55
@KarlPertsch
Karl Pertsch
9 months
Evaluation of robot foundation models is a huge challenge: imagine running robot rollouts across 100s of scenes + tasks + embodiments. How can we make eval keep up w/ model improvements? Introducing SIMPLER: sim eval envs for your favorite real robot foundation models! Short 🧵
@XuanlinLi2
Xuanlin Li (Simon)
9 months
Scalable, reproducible, and reliable robotic evaluation remains an open challenge, especially in the age of generalist robot foundation models. Can *simulation* effectively predict *real-world* robot policy performance & behavior? Presenting SIMPLER! 👇
1
6
41
@KarlPertsch
Karl Pertsch
7 months
Excited to release our work on Embodied Chain-of-Thought Reasoning today! We can boost performance of vision-language-action models like OpenVLA by a large margin without any additional robot training data! The key: simply think before you act! 1/
@MiZawalski
Michaล‚ Zawalski
7 months
🤖 Can robots think through complex tasks step-by-step like language models? We present Embodied Chain-of-Thought Reasoning (ECoT): enabling robots to reason about plans and actions for better performance 🎯, interpretability 🧐, and generalization 🌎. See
1
8
42
@KarlPertsch
Karl Pertsch
2 years
Robot learning needs data, but collecting it is expensive. How can we make the most of existing datasets? In SPRINT, we use LLMs to auto-augment language instructions on robot datasets. Our agents learn a lot more tasks during pre-training *for free*! See Jesse's 🧵 for details! 👇
@Jesse_Y_Zhang
Jesse Zhang
2 years
Having humans annotate data to pre-train robots is expensive and time-consuming! Introducing SPRINT: A pre-training approach using LLMs and offline RL to equip robots w/ many language-annotated skills while minimizing human annotation effort! URL: 🧵👇
1
2
39
@KarlPertsch
Karl Pertsch
4 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
1
5
34
@KarlPertsch
Karl Pertsch
16 days
The key idea: FAST compresses actions before training on them. This removes redundancy & makes autoregressive VLA training on high-frequency tasks possible, where models like OpenVLA failed. We use the discrete cosine transform for compressing actions (also used by e.g. JPEG). 2/
Tweet media one
1
3
34
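A minimal sketch of the compression idea in the tweet above, assuming an action chunk of shape (horizon, action_dim); the function names and quantization scale are illustrative, not the released FAST implementation (which additionally compresses the quantized coefficients with a learned vocabulary, omitted here).

```python
# Illustrative DCT-based action compression (not the actual FAST code).
import numpy as np
from scipy.fft import dct, idct

def compress_action_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """DCT each action dimension over time, then round to integers.

    Small high-frequency coefficients round to zero, which removes the
    redundancy in smooth, high-frequency action sequences.
    """
    coeffs = dct(actions, axis=0, norm="ortho")        # (horizon, action_dim)
    return np.round(coeffs * scale).astype(np.int32)   # quantized coefficients

def decompress_action_chunk(quantized: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert the quantization and the DCT to recover an action chunk."""
    return idct(quantized.astype(np.float32) / scale, axis=0, norm="ortho")

# Example: a smooth 50-step, 7-DoF chunk compresses to mostly-zero coefficients,
# which a downstream tokenizer can encode very compactly.
chunk = np.sin(np.linspace(0, np.pi, 50))[:, None] * np.ones((50, 7))
tokens = compress_action_chunk(chunk)
recovered = decompress_action_chunk(tokens)
print(np.abs(recovered - chunk).max())  # small reconstruction error
```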
@KarlPertsch
Karl Pertsch
10 months
Shoutout to the folks at Rerun who built a visualizer for our DROID dataset -- looks very cool! Allows you to visualize the point cloud from our multi-view stereo cams as well! And should work for any new dataset collected on the DROID robot platform! Thanks @rerundotio :)
@rerundotio
Rerun
10 months
A Rerun Viewer for the DROID Dataset! DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset is a robot manipulation dataset by @SashaKhazatsky et al. with 76k demonstration trajectories or 350h of interaction data, collected across 564 scenes and 86 tasks.
1
2
32
@KarlPertsch
Karl Pertsch
2 years
New work on scaling robot learning from the team I work with at Google! Especially excited about RT-1's capability to ingest data from diverse sources, e.g. sim or even experience from other robots, + demonstrate transfer -- very useful for scaling robotic dataset size & diversity!
@hausman_k
Karol Hausman
2 years
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate!
Generalizes to new tasks ✅
Robust to new environments and objects ✅
Fast inference for real time control ✅
Can absorb multi-robot data ✅
Powers SayCan ✅
🧵👇
1
0
29
@KarlPertsch
Karl Pertsch
2 years
Data collection is a major bottleneck in robot learning: it's mostly done w/ tedious & expensive human teleoperation. Can we use learning to make data collection itself more efficient? Introducing PATO, our approach for scalable robot data collection w/ learned assistive policies.
@ShivinDass
Shivin Dass
2 years
Excited to present PATO: Policy Assisted TeleOperation, our recent work on scaling robot data collection! PATO uses a policy trained on prior data to assist the user during data collection, making teleop easier and even allowing teleop of multiple robots simultaneously. 🧵👇
1
4
29
@KarlPertsch
Karl Pertsch
4 years
Grateful to be awarded the best paper presentation award @corl_conf! 🎉 Huge credit goes to all my lab mates @ CLVR lab, particularly to my co-author @YoungwoonLee, for all the tireless feedback that greatly improved the talk! :) Talk recording:
Tweet media one
3
2
29
@KarlPertsch
Karl Pertsch
5 months
Our SIMPLER sim evaluation, now w/ GPU parallelization thanks to @Stone_Tao! Great work!
@Stone_Tao
Stone Tao
5 months
just made it possible to evaluate generalist robotics models like Octo at 60-100x real world evaluation speeds via gpu simulation and rendering (~10x faster than original cpu sim code). All videos below are from our open source ManiSkill GPU sim!
Tweet media one
Tweet media two
Tweet media three
1
2
28
@KarlPertsch
Karl Pertsch
16 days
We are releasing a FAST tokenizer we pre-trained on 1M real robot action sequences. In our tests it works well across all kinds of robots -- and it's all on HuggingFace! Happy VLA training! :) 5/
Tweet media one
1
2
28
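A hedged sketch of loading the released tokenizer from the HuggingFace Hub; the repo id and the encode/decode call pattern are assumptions based on the usual custom-processor workflow, so check the model card for the exact interface.

```python
# Hedged sketch: loading the released FAST tokenizer from the Hub.
import numpy as np
from transformers import AutoProcessor

tokenizer = AutoProcessor.from_pretrained(
    "physical-intelligence/fast",  # assumed repo id, see the model card
    trust_remote_code=True,        # tokenizer ships as custom processor code
)

# A batch of one action chunk: 50 timesteps x 7 action dims, values in [-1, 1].
action_chunk = np.random.uniform(-1, 1, size=(1, 50, 7))

tokens = tokenizer(action_chunk)      # -> integer token ids (assumed interface)
recovered = tokenizer.decode(         # -> back to a (1, 50, 7) action chunk
    tokens, time_horizon=50, action_dim=7
)
```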
@KarlPertsch
Karl Pertsch
11 months
Check out Lucy's new project! Finally, every roboticist's favorite pastime, "yelling at your robot", can be useful for once! Bonus: lots of ALOHA trail mix in the lab!
@lucy_x_shi
Lucy Shi
11 months
Introducing Yell At Your Robot (YAY Robot!) 🗣️ - a fun collaboration b/w @Stanford and @UCBerkeley 🤖. We enable robots to improve on-the-fly from language corrections: robots rapidly adapt in real-time and continuously improve from human verbal feedback. YAY Robot enables
1
0
24
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄 Paper: 💻 Website & Code: Thread 👇 (1/8)
2
11
24
@KarlPertsch
Karl Pertsch
1 year
If you want to browse through the Open X-Embodiment data, but don't like fiddling with Colabs, check out this neat website @its_dibya built that gives you a quick overview of all datasets!
@its_dibya
Dibya Ghosh
1 year
Got a chance to dig through the big robot X-embodiment dataset released last week, and hacked together a little website for others to look through the data. Check it out! There's some pretty random and diverse robot data in there
0
2
23
@KarlPertsch
Karl Pertsch
8 months
This looks awesome! Simulation can be a valuable tool for robot data scaling & eval, but the hard part is building diverse simulation envs AND datasets. Glad to see Soroush et al's sim data line of work expanded to more diverse envs! Excited to give this a try!
@snasiriany
Soroush Nasiriany
8 months
I'm excited to introduce RoboCasa, a large-scale simulation framework for everyday tasks. Scaling is the key driving force to unlocking generalist robots, and RoboCasa leverages simulation to take scaling to a whole new level. A short 🧵
2
3
24
@KarlPertsch
Karl Pertsch
16 days
My favorite result: with FAST we can finally train VLAs on the DROID dataset & they work zero-shot in many scenes! Below is the same policy controlling robots at Berkeley, Stanford and UW. Just point a camera at the scene, type out an instruction, et voila! 4/
2
2
23
@KarlPertsch
Karl Pertsch
3 months
Nice! Hopefully more hardware companies will follow this example and contribute to open-source datasets for robot learning research! :)
@UnitreeRobotics
Unitree
3 months
Unitree G1 Open Source Dataset. In order to promote the development of the global embodied AI industry, the Unitree G1 robot operation dataset is open-sourced, adapted to a variety of open-source solutions, and continuously updated: Open-source data collection:
0
2
23
@KarlPertsch
Karl Pertsch
4 months
Big thanks to my (co-)leads on the presented papers:
OpenVLA: @moo_jin_kim @siddkaramcheti
Embodied CoT: @MiZawalski @verityw_
If you're interested in scalable robot learning, go follow them :) Full talk recording:
0
5
22
@KarlPertsch
Karl Pertsch
4 months
The talk covers our recent work on training vision-language-action models for robotics. I'll discuss how embracing LLMs/VLMs in robotics allows us to scale policy learning, but also cover key differences btw robotics and other multi-modal agents. Zoom:
Tweet media one
3
4
22
@KarlPertsch
Karl Pertsch
2 years
Glad to see RT-2 out! We show that VLM backbones are a great way to equip policies with robustness from internet-scale data. RT-2 strongly improves the generalization ability of existing skills (e.g. new scenes / objects) -- learning new low-level behaviors is the next frontier!
@hausman_k
Karol Hausman
2 years
PaLM-E or GPT-4 can speak in many languages and understand images. What if they could speak robot actions? Introducing RT-2: our new model that uses a VLM (up to 55B params) backbone and fine-tunes it to directly output robot actions!
1
2
21
@KarlPertsch
Karl Pertsch
9 months
Big FOMO! -- but you guys will rock the presentation :) If you're @ ICRA, check out Quan's presentation of our Open X-Embodiment project today, nominated for a best paper award 🎉
Room: CC-Main Hall
Time: 10:30-12:00
@QuanVng
Quan Vuong
9 months
Wish @KarlPertsch was at ICRA for Open X-Embodiment 🥲
1
0
20
@KarlPertsch
Karl Pertsch
5 years
0
1
19
@KarlPertsch
Karl Pertsch
16 days
Please find more details about FAST in our paper! Thanks to @KyleStachowicz and many colleagues @physical_int who helped with this project! Paper: Website:
2
1
19
@KarlPertsch
Karl Pertsch
16 days
I am very excited about FAST, because (1) it makes VLA training really easy, even on complex tasks, and (2) with FAST it's trivial to interleave non-robot data in VLA training (web data, subgoals, video prediction, etc.), it's all just tokens! Lots of things to explore! :) 6/
1
2
19
@KarlPertsch
Karl Pertsch
7 months
Cool use of a fine-tuned VLM for autonomous driving! Appreciate all the ablations in the paper + focus on speeding up inference on edge compute!
@zhaohang0124
Hang Zhao
7 months
Introducing ๐ƒ๐ซ๐ข๐ฏ๐ž๐•๐‹๐Œ, VLM meets Autonomous Driving. We propose a dual system that drives a car autonomously in complex driving scenarios. - Slow system: VLM.- Fast system: classical AD pipeline.Enjoy our onboard demo!.Project Page:
0
2
18
@KarlPertsch
Karl Pertsch
3 months
Check out Kevin's thread on π₀ -- Kevin had a huge impact on model design & implementation! To get all the practitioner's tips for training generalist robot policies, don't miss his talk at our X-Embodiment workshop at CoRL next week! (we'll try to stream!)
@kvablack
Kevin Black
3 months
It's been 6 months since I slammed the brakes on several PhD research projects to go work at π. 😅 super excited to finally share our results! A short 🧵 with some details:
1
0
18
@KarlPertsch
Karl Pertsch
1 year
Check out @Jesse_Y_Zhang's CoRL oral on LLM-guided skill learning. Simple recipe: start from a base set of skills -> use LLM to guide exploration towards meaningful skill chains -> expand the skill library w/ RL. We show that this "skill bootstrapping" phase helps downstream RL!
@Jesse_Y_Zhang
Jesse Zhang
1 year
How can our robots autonomously practice **new tasks** in **new environments**? Introducing BOSS: A reinforcement learning (RL) framework that trains agents to solve new tasks in new environments with LLM guidance! **CoRL 2023 Oral** 🧵👇
1
2
17
@KarlPertsch
Karl Pertsch
4 years
Excited to be presenting SPiRL as an oral talk at today's plenary session on RL @corl_conf! Join to learn about skill priors for accelerated RL on new tasks!
Oral: Wed (today), 8:15am PST
Interactive: Wed, 12:30pm PST
w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄 Paper: 💻 Website & Code: Thread 👇 (1/8)
1
4
18
@KarlPertsch
Karl Pertsch
3 years
Interested in large task-agnostic datasets in robotics? We show how to effectively combine them w/ demonstrations for sample-efficient learning of new tasks! Presenting @corl_conf poster session 4 (Wed 11.30-12.30 GMT)! 📜: 💻:
@KarlPertsch
Karl Pertsch
4 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
2
3
17
@KarlPertsch
Karl Pertsch
4 months
We extended the deadline for our X-Embodiment workshop at CoRL to Oct 10! Submit your ICLR papers & share your findings with the community! :) PS: we also got funding from Google for some paper awards, so even more reason to submit!
@KarlPertsch
Karl Pertsch
5 months
Excited to announce the Workshop on X-Embodiment Robot Learning at #Corl2024! How can we build robot foundation models that can control many different robots & where do we find data to train them? Submit your work on scalable & x-embodied robot learning and join us in Munich! 🙂
Tweet media one
0
0
16
@KarlPertsch
Karl Pertsch
1 year
@chris_j_paxton @_ericrosen Indeed existing x-embodiment models like RT-X/Octo don't align action spaces or condition on action space definition/URDF -- that's a major reason why they don't usually work 0-shot on new robot setups: they don't know what action space to use -- we're hoping to fix that soon! :)
3
3
16
@KarlPertsch
Karl Pertsch
1 year
Super cool work from Cheng et al! Robot data collection in the wild without the pain of moving robots around! Before we deploy robots at scale + in the wild, this can greatly increase diversity of robot data + help overcome activation energy for getting generalizable policies.
@chichengcc
Cheng Chi
1 year
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI). An open-source $400 system from @Stanford designed to democratize robot data collection. 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
1
1
17
@KarlPertsch
Karl Pertsch
5 years
Check out our new work on visual planning and control! Our model uses a divide-and-conquer strategy to break long-horizon planning problems into easier sub-problems, allowing us to solve tasks that require planning over hundreds of time steps!
@svlevine
Sergey Levine
5 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start & goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch, @_oleh, @febert8888, @chelseabfinn, @dineshjayaraman
1
2
15
@KarlPertsch
Karl Pertsch
21 days
@VilleKuosmanen Fine-tuning the vision encoder turned out to be very important in our OpenVLA experiments, so I'd recommend trying LoRA on everything and the "sandwich" top+bottom thing you suggested. We have some LoRA experiments in the OpenVLA paper, but only tested it after robot pretraining.
2
1
16
@KarlPertsch
Karl Pertsch
7 months
This should be a great tutorial by Lerrel, @notmahi and @RussTedrake for anyone wanting to catch up on modern techniques for imitation learning! Lots of the practical tips should transfer to fine-tuning of large pre-trained models too! (see zoom link in Lerrel's thread)
@LerrelPinto
Lerrel Pinto
7 months
This #RSS2024 on July 19, we are organizing a tutorial on supervised policy learning for real world robots! Talks by @notmahi & @RussTedrake will cover the fundamentals of imitation, recent algorithms, walk-through code, and practical considerations.
Tweet media one
0
0
15
@KarlPertsch
Karl Pertsch
3 years
Check out Lucy's and @YoungwoonLee's cool work on combining learned skills and model-based RL! Enables more sample efficient learning than model-free skill-RL approaches like SPiRL! First skill-based RL results on the new CALVIN benchmark! Lucy's first paper -- well done! :)
@lucy_x_shi
Lucy Shi
3 years
Can robots be farsighted? We introduce SkiMo (Skill + Model-based RL), which allows more accurate and efficient long-horizon planning through temporal abstraction. SkiMo learns temporally-extended, sparse-reward tasks with 5x fewer samples! 🧵👇
1
1
14
@KarlPertsch
Karl Pertsch
3 years
Excited to present two papers w/ co-authors at ICLR this week!
1⃣ Task-Induced Representation Learning: We investigate representation learning in visually complex environments.
Q: How can we learn to represent important info & ignore distractors?
A: Use prior task experience!
1
2
14
@KarlPertsch
Karl Pertsch
1 year
2D trajectories for task specification are more grounded than language, but easier to provide than goal images, e.g. by crowd workers / VLMs. Easy to relabel in hindsight + transfer nicely from human video! Very cool work @Jiayuan_Gu @xiao_ted et al!
@xiao_ted
Ted Xiao
1 year
Instead of just telling robots "what to do", can we also guide robots by telling them "how to do" tasks? Unveiling RT-Trajectory, our new work which introduces trajectory-conditioned robot policies. These coarse trajectory sketches help robots generalize to novel tasks! 🧵⬇️
0
2
13
@KarlPertsch
Karl Pertsch
6 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh, in collaboration with @yjy0625, @CSProfKGD, Joseph Lim, @KostasPenn, @drew_jaegle).
Tweet media one
1
1
12
@KarlPertsch
Karl Pertsch
1 year
Out of the box, Octo can control multiple robots, use 3rd person + wrist cameras, language instructions & goal images. Key feature: Octo can be quickly finetuned to use new observation & action spaces! In <5 hours on a 24 GB VRAM GPU! 2/
1
1
12
@KarlPertsch
Karl Pertsch
2 years
By training on in-the-wild human videos, we can use demonstrations from *unseen* environments, e.g. 3 mins of video recorded in my kitchen substantially accelerates RL in a new robot env in our experiments.
1
4
11
@KarlPertsch
Karl Pertsch
6 years
We will present our work on keyframe-based video prediction in the workshop on Task-agnostic RL (TARL) tomorrow afternoon. If you're at ICLR, come see us at our poster! (joint work with @_oleh, @yiy0602, @CSProfKGD, Joseph Lim, @KostasPenn , @drew_jaegle).
@KarlPertsch
Karl Pertsch
6 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh, in collaboration with @yjy0625, @CSProfKGD, Joseph Lim, @KostasPenn, @drew_jaegle).
Tweet media one
1
6
10
@KarlPertsch
Karl Pertsch
1 year
To show that the data is useful for learning, we trained a series of large-scale policies (RT-1-X, RT-2-X) & found co-training with our data to improve performance substantially! We're releasing model checkpoints too, check Quan's tweets for details! 11/
@QuanVng
Quan Vuong
1 year
RT-X: generalist AI models lead to 50% improvement over RT-1 and 3x improvement over RT-2, our previous best models. 🔥🥳🧵 Project website:
1
2
10
@KarlPertsch
Karl Pertsch
1 year
We assembled the dataset by pooling *existing* robot datasets from our collaborators @ Google and many many academic labs (34!). In total we included 60 individual datasets with 22 different robot embodiments -- many robot arms, bi-manual robots, quadrupeds, wheeled robots etc. 2/
Tweet media one
1
2
8
@KarlPertsch
Karl Pertsch
3 months
Compared to OpenVLA, our previous VLA policy (see below), π₀ uses flow matching as the decoding mechanism (fast + expressive). That's key to making it work on high-freq data -- it allows us to run a 3.3B param model for 50Hz control on a 4090!
@KarlPertsch
Karl Pertsch
8 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA).
🦾 SoTA generalist policy (better than Octo & RT-2-X)
⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA
💻 Open-source PyTorch codebase
🤗 Models on HuggingFace
1/
2
0
9
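For readers unfamiliar with flow matching: a minimal, generic flow-matching training loss for an action decoder, a sketch of the idea rather than the π₀ implementation; `policy_net` and the tensor shapes are placeholders.

```python
# Generic flow-matching loss sketch (illustrative, not the pi0 code).
# `policy_net` is a placeholder mapping (noisy_actions, t, obs_emb) -> velocity.
import torch

def flow_matching_loss(policy_net, obs_emb, actions):
    """Linear-interpolant flow matching: predict the velocity (actions - noise)."""
    noise = torch.randn_like(actions)                              # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1, 1, device=actions.device)  # per-sample time
    x_t = (1 - t) * noise + t * actions                            # noise -> data path
    target_velocity = actions - noise                              # constant along path
    pred_velocity = policy_net(x_t, t.squeeze(-1).squeeze(-1), obs_emb)
    return torch.mean((pred_velocity - target_velocity) ** 2)

# At inference, an action chunk is decoded by integrating the learned velocity
# field from noise over a handful of Euler steps, which keeps decoding fast
# enough for high-frequency control.
```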
@KarlPertsch
Karl Pertsch
8 months
How to use it? It's all on HuggingFace -- two lines to load the model, no code install needed. We also open-source our full PyTorch training code & data. Scales from fine-tuning on 1 GPU to training billion-parameter VLAs on distributed clusters! 5/
Tweet media one
1
1
9
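A hedged sketch of the "two lines to load" claim using the standard HuggingFace `trust_remote_code` pattern; the repo id and the `predict_action` helper follow the OpenVLA model card, but treat exact argument names (e.g. `unnorm_key`) as assumptions and verify against the card.

```python
# Hedged sketch: loading OpenVLA from the Hub and querying one action.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

image = Image.open("frame.png")  # current camera frame (placeholder path)
prompt = "In: What action should the robot take to pick up the cup?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
action = model.predict_action(**inputs, unnorm_key="bridge_orig")  # 7-DoF action
```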
@KarlPertsch
Karl Pertsch
4 months
Turns out this was a placeholder zoom link 😅 The correct link is here:
2
0
9
@KarlPertsch
Karl Pertsch
1 year
Here are the dataset resource links:
✅ Colab (vis / download / data loaders):
✅ Overview Sheet (filtering):
All data is fully open-source under a commercially usable CC-BY 4.0 license! 10/
1
2
9
@KarlPertsch
Karl Pertsch
7 months
Great work! 💯 Lots of room to improve on the vision side of VLMs -- robotics could be a great test bed too! For VLA training (VLM+action) we found existing vision encoders need lots of fine-tuning to work well for robot control, though admittedly 🤖 eval isn't straightforward 🥲
@sainingxie
Saining Xie
7 months
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations. 🧵[1/n]
Tweet media one
0
1
9
@KarlPertsch
Karl Pertsch
1 year
Creating this dataset was a huge community effort (look at that author list 😀)! I led the dataset construction and had calls with countless labs & everybody was very excited to contribute data -- there is a lot of momentum in the community towards sharing & reusing data 🙂 12/
Tweet media one
1
0
9
@KarlPertsch
Karl Pertsch
1 year
I'm very excited to see how the community will use this dataset! Let me know if you have any questions! 🙂 💻 Project Website: 15/15
1
1
9
@KarlPertsch
Karl Pertsch
1 year
The full dataset download is ~4.5 TB. We also provide a sheet that allows you to filter the data along many attributes, e.g. if you only want to download Franka robot data or only data with wrist cams, natural language instructions etc! Tailor the data to your use case! 9/
Tweet media one
1
0
8
@KarlPertsch
Karl Pertsch
3 months
Pi is a great place to do fundamental robotics research & publish it! P.S.: we're hiring :)
1
0
8
@KarlPertsch
Karl Pertsch
5 years
@_oleh and I are presenting our work on hierarchical models for long-horizon prediction and planning at the #BIGICML workshop today, starting at 10:40 PT. Come join us to chat about predictive models and model-based RL!
@svlevine
Sergey Levine
5 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start & goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch, @_oleh, @febert8888, @chelseabfinn, @dineshjayaraman
0
2
8
@KarlPertsch
Karl Pertsch
8 months
Check out Sidd's thread about OpenVLA and some key open questions for VLA research!
@siddkaramcheti
Siddharth Karamcheti
8 months
Thrilled to announce OpenVLA ( -- a vision-language-action policy for robotic control! Shout out to my co-leads @moo_jin_kim & @KarlPertsch; see their threads for overviews of our work. Here though, I want to talk about observations & next steps! 🧵⬇️
0
0
8
@KarlPertsch
Karl Pertsch
4 months
Very cool, thanks for the walk-through on trying the model on robotics data! Spatial grounding is key to making VLMs useful for robotics, and Molmo's grounding seems very robust in the examples Kiana tried! Looking forward to giving it a spin!
@ehsanik
Kiana Ehsani
4 months
Try out Molmo on your application! This is a great example by @DJiafei! We have a few videos describing Molmo's different capabilities on our blog! This one is me trying it out on a bunch of tasks and images from RT-X:
1
2
8
@KarlPertsch
Karl Pertsch
8 months
How does it work? We take a strong open-source VLM, Prismatic 7B, and fine-tune it to predict robot actions, using a curated dataset of 970k robot demonstrations. This recipe scales, and allows robotics to reuse pretrained models from the community (SigLIP, DinoV2, Llama2) 🚀 2/
Tweet media one
2
0
8
@KarlPertsch
Karl Pertsch
1 year
Last but not least: Octo is your one-stop shop for training on OpenX data! We're releasing high-quality data loaders that work with PyTorch and JAX + a curated dataset split! 7/
@KarlPertsch
Karl Pertsch
1 year
Very excited to release the Open X-Embodiment Dataset today -- the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There's lots to unpack here, so let's do a deep dive into the dataset! 🧵1/15
2
0
7
@KarlPertsch
Karl Pertsch
5 months
Big thanks to my co-organizers @keerthanpg @Lawrence_Y_Chen @lucy_x_shi @xiao_ted @QuanVng @pannag_ Christine Chan @Ken_Goldberg @gauravsukhatme @chelseabfinn!
Paper submission deadline: 10/03
Date: 11/09, Munich, Germany
Workshop Website:
0
3
7
@KarlPertsch
Karl Pertsch
1 year
This was a big team effort w/ collaborators from UC Berkeley, Stanford & CMU! I'm very grateful to all collaborators!! :) @its_dibya @HomerWalke @kvablack @oier_mees @SudeepDasari @JoeyHejna Tobias Kreiman, Charles Xu @jianlanluo You Liang Tan @DorsaSadigh @chelseabfinn @svlevine
2
0
6
@KarlPertsch
Karl Pertsch
4 years
Excited to present SPiRL in contributed talks at the Deep RL and Robot Learning workshops @NeurIPSConf! Join us during the poster sessions to chat about all things skill learning & transfer!
DRL Poster: Room F, A1
Robot Learning Poster: C3
w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄 Paper: 💻 Website & Code: Thread 👇 (1/8)
0
1
7
@KarlPertsch
Karl Pertsch
4 months
My main message: large-scale robot learning today looks very similar to other multi-modal sequence modeling problems, e.g. VLM training. We train so-called vision-language-action models (VLAs) on large datasets of interleaved image, text and action tokens. 2/
1
0
7
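To make "action tokens" concrete: a sketch of one common tokenization scheme (uniform binning of normalized actions, RT-2 style) so continuous actions can sit in the same token stream as text and image tokens; the bin count and normalization range are illustrative.

```python
# Illustrative action discretization: continuous actions -> discrete token ids.
import numpy as np

NUM_BINS = 256  # assumed vocabulary size reserved for action tokens

def actions_to_tokens(actions: np.ndarray) -> np.ndarray:
    """Map normalized actions in [-1, 1] to integer token ids in [0, NUM_BINS-1]."""
    clipped = np.clip(actions, -1.0, 1.0)
    edges = np.linspace(-1.0, 1.0, NUM_BINS + 1)[1:-1]  # interior bin edges
    return np.digitize(clipped, edges)

def tokens_to_actions(tokens: np.ndarray) -> np.ndarray:
    """Map token ids back to bin-center actions for execution on the robot."""
    centers = np.linspace(-1.0, 1.0, NUM_BINS)
    return centers[tokens]

action = np.array([0.1, -0.4, 0.9, 0.0, 0.0, -1.0, 1.0])  # e.g. a 7-DoF delta action
print(actions_to_tokens(action))  # 7 integer ids, appended after the text tokens
```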
@KarlPertsch
Karl Pertsch
3 months
You can submit 🌶️ questions for the panel on our website: The whole workshop will be live-streamed to YouTube + recorded (if conference internet permits 🥲):
3
2
6
@KarlPertsch
Karl Pertsch
1 year
We plan to expand the dataset over time and e.g. add more mobile manipulation and simulation data. If you have data that would be good to integrate, simulated or real, please fill out the form:
Tweet media one
0
0
6
@KarlPertsch
Karl Pertsch
4 months
Notably, the VLM pre-training allows VLAs like RT-2-X and OpenVLA to generalize more broadly than prior robot models w/o internet pre-training. Using VLM backbones also made it easy to optimize training + inference efficiency via LoRA + quantization! 4/
Tweet media one
Tweet media two
1
0
6
@KarlPertsch
Karl Pertsch
1 year
Using the data is easy! All data is stored in tfrecords & we made a colab for visualizing & downloading the data (w/ examples for efficient data loaders)! Each dataset stores observations/actions in its "native" format & resolution, but it's easy to align & mix them on-the-fly! 8/
Tweet media one
1
0
6
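A hedged sketch of reading one of the tfrecord-backed (RLDS) datasets with `tensorflow_datasets`, in the spirit of the release Colab; the GCS path, dataset name, and version are illustrative placeholders, so take the real ones from the overview sheet.

```python
# Hedged sketch: iterating an RLDS-formatted OpenX dataset with tfds.
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(
    "gs://gresearch/robotics/bridge/0.1.0"   # assumed path; check the sheet
)
ds = builder.as_dataset(split="train[:10]")  # a handful of episodes

for episode in ds:
    for step in episode["steps"]:            # RLDS: episodes contain steps
        obs = step["observation"]            # per-dataset "native" format
        action = step["action"]
        break
    break
```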
@KarlPertsch
Karl Pertsch
7 months
This is great work! 38 fine-tuning tasks for every eval 🤯 Thanks for sharing many ablations @giffmana and team! Also confirms our finding that vision encoder fine-tuning is required for fine-grained spatial tasks like robot control! Any plans to release larger PaliGemma models? :)
@giffmana
Lucas Beyer (bl16)
7 months
✨PaliGemma report will hit arXiv tonight. We tried hard to make it interesting, and not "here model. sota results. kthxbye.". So here's some of the many interesting ablations we did, check the paper tomorrow for more! 🧶
Tweet media one
1
0
6
@KarlPertsch
Karl Pertsch
3 months
Our focus for this first release was to push the dexterity of generalist robot policies. The videos on our blog show some of the most dexterous autonomous policies to date, and they are all based on a single base model checkpoint.
1
0
6
@KarlPertsch
Karl Pertsch
1 year
We analyzed the properties of the combined dataset! First, the number of datasets per robot embodiment: many academic labs use Franka robot arms, so we have many (smaller) Franka datasets and a long-tail of other robot embodiments! 3/
Tweet media one
1
3
6
@KarlPertsch
Karl Pertsch
4 months
Thus, we can simply fine-tune existing VLMs on our data to act as robot policies. We can reuse a lot of pieces from the VLM ecosystem -- scalable models, training & serving infra etc. In OpenVLA we packaged all of that into a strong robot policy:
Tweet media one
@KarlPertsch
Karl Pertsch
8 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA).
🦾 SoTA generalist policy (better than Octo & RT-2-X)
⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA
💻 Open-source PyTorch codebase
🤗 Models on HuggingFace
1/
1
0
6
@KarlPertsch
Karl Pertsch
7 months
When collecting your fine-tuning data, start with little variation in terms of objects, positions, scenes, backgrounds, camera angles, etc. It's easier to catch bugs in your robot pipeline this way. But, for best policy generalization, collect more diverse demo data later! 3/
1
0
5
@KarlPertsch
Karl Pertsch
8 months
Big shoutout to my co-leads @moo_jin_kim and @siddkaramcheti, and thanks to my advisors @chelseabfinn and @svlevine, and many others involved! Also thanks to @ToyotaResearch for providing the compute to enable this kind of open-source research! 9/9
1
0
5
@KarlPertsch
Karl Pertsch
1 year
We're fully open-sourcing model checkpoints, our pre-training and finetuning pipelines! Initially, Octo comes in two sizes: Octo-Small (27M params) and Octo-Base (93M params). All models are on HuggingFace, so loading an Octo model is as easy as this: 5/
Tweet media one
1
0
5
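The code screenshot from this tweet is not preserved in the archive; below is a hedged reconstruction of what loading Octo from the Hub looks like, assuming the import path and repo ids from the open-source Octo (JAX) codebase.

```python
# Hedged reconstruction: loading a pretrained Octo checkpoint from HuggingFace.
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base")  # or octo-small
```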
@KarlPertsch
Karl Pertsch
1 year
Octo is only the first step towards building generalist robot policies and we're planning to improve the models over time -- larger sizes, more robot morphologies, RL etc. -- really excited to see how folks will use Octo! :) 8/
1
0
5
@KarlPertsch
Karl Pertsch
7 months
Most importantly: 99% of applications will require *fine-tuning*, i.e. collect a small dataset of <100 robot demos in your target domain & fine-tune OpenVLA on it. Why? OpenVLA needs to learn your robot's action space, camera setup, etc. More on 0-shot usage at the end! 2/
1
0
5
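A hedged illustration of the LoRA step for such a fine-tuning run, using `peft` on the Hub-loaded model; the OpenVLA release ships its own training code, so this is only a sketch, and the rank and target-module choices are assumptions.

```python
# Hedged sketch: wrapping the HF-loaded OpenVLA model with LoRA adapters.
import torch
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=32,                         # low-rank adapter dimension (assumption)
    lora_alpha=16,
    target_modules="all-linear",  # adapt all linear layers, vision encoder included
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a few percent of weights are trainable
```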
@KarlPertsch
Karl Pertsch
8 months
Please check out Moo Jin's thread for more details about OpenVLA -- Moo Jin really carried the torch in this project, which was the first project in his PhD! Way to go Moo Jin! :)
@moo_jin_kim
Moo Jin Kim
8 months
✨ Introducing OpenVLA -- an open-source vision-language-action model for robotics!
- SOTA generalist policy
- 7B params
- outperforms Octo, RT-2-X on zero-shot evals 🦾
- trained on 970k episodes from OpenX dataset 🤖
- fully open: model/code/data all online 🤗
🧵👇
1
0
5
@KarlPertsch
Karl Pertsch
1 year
We're hoping to continue this momentum and keep growing the dataset 🚀! We're still figuring out the details, but if you or your lab have data you'd like to contribute, feel free to shoot an email to open-x-embodiment@googlegroups.com and we will get back to you! :) 13/
1
1
5
@KarlPertsch
Karl Pertsch
4 years
Jun will present our work on augmenting RL w/ motion planners at @corl_conf today. Our RL agents learn to use motion planners for solving challenging manipulation tasks w/ many obstacles! Interactive Session: today, 11:10am PST. Led jointly by Jun (@junjungoal) & @YoungwoonLee.
0
1
5