Karl Pertsch Profile
Karl Pertsch

@KarlPertsch

Followers
3K
Following
405
Media
91
Statuses
336

Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.

Joined July 2015
@KarlPertsch
Karl Pertsch
16 days
Excited to release FAST, our new robot action tokenizer! 🤖 Some highlights:
- Simple autoregressive VLAs match diffusion VLA performance
- Trains up to 5x faster
- Works on all robot datasets we tested
- First VLAs that work out-of-the-box in new environments!
🧵/
17
96
508
@KarlPertsch
Karl Pertsch
1 year
Very excited to release the Open X-Embodiment Dataset today -- the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There's lots to unpack here, so let's do a deep dive into the dataset! 🧵1/15
8
90
444
@KarlPertsch
Karl Pertsch
4 months
I'll give a talk at the Multimodal Agents workshop at ECCV tomorrow Sept 30, at 2:20pm CET. Excited for my first talk at a vision conference: robotics is increasingly becoming a multi-modal sequence modeling problem w/ lots of potential for LLM/VLM researchers to have big impact!
Tweet media one
7
46
433
@KarlPertsch
Karl Pertsch
1 year
3 mo. ago we released the Open X-Embodiment dataset, today we're taking the next step: Introducing Octo 🐙, a generalist robot policy, trained on 800k robot trajectories, stronger than RT-1-X, flexible observation + action spaces, fully open source! 💻: /🧵
10
90
368
@KarlPertsch
Karl Pertsch
8 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA).
🦾 SoTA generalist policy (better than Octo & RT-2-X)
⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA
💻 Open-source PyTorch codebase
🤗 Models on HuggingFace
1/
4
64
390
@KarlPertsch
Karl Pertsch
4 months
It was fun giving this talk yesterday! The live talk wasn't recorded, but I just uploaded the recording of a practice run I did the night before (link below). A short thread of key points from the talk 🧵
@KarlPertsch
Karl Pertsch
4 months
I'll give a talk at the Multimodal Agents workshop at ECCV tomorrow Sept 30, at 2:20pm CET. Excited for my first talk at a vision conference: robotics is increasingly becoming a multi-modal sequence modeling problem w/ lots of potential for LLM/VLM researchers to have big impact!
Tweet media one
3
29
224
@KarlPertsch
Karl Pertsch
11 months
Access to *diverse* training data is a major bottleneck in robot learning. We're releasing DROID, a large-scale in-the-wild manipulation dataset: 76k trajectories, 500+ scenes, multi-view stereo, language annotations, etc. Check it out & download today! 💻:
8
59
193
@KarlPertsch
Karl Pertsch
7 months
Our OpenVLA model has been downloaded more than 20k times in less than a month -- the most for any robotics model on the 🤗 hub by a long shot! Here is a little "cookbook" for people who want to get started using OpenVLA! 🧑‍🍳 1/🧵
Tweet media one
2
16
166
@KarlPertsch
Karl Pertsch
3 months
I will be at @corl_conf this week, co-presenting 4 papers and one workshop across the full spectrum of scalable robot learning research: data, models & evals! Also happy to chat about research @physical_int! Short 🧵 w/ paper pointers 👇
3
13
132
@KarlPertsch
Karl Pertsch
3 months
I started at Pi part-time a few months back, and I'm excited to share what we've been up to! π₀ is the first generalist VLA that can solve many *dexterous* tasks, including some really long-horizon laundry manipulation tasks. A few notes 👇
@physical_int
Physical Intelligence
3 months
At Physical Intelligence (π) our mission is to bring general-purpose AI into the physical world. We're excited to show the first step towards this mission - our first generalist model π₀ 🧠 🤖. Paper, blog, uncut videos:
2
5
108
@KarlPertsch
Karl Pertsch
9 months
Our OpenX paper won best paper at ICRA! Congrats to all my co-authors! 🎉🎉 This is an ongoing effort, we recently added new datasets from the community that double the size of the OpenX dataset -- keep 'em coming! :) Check datasets & how to contribute:
Tweet media one
3
14
104
@KarlPertsch
Karl Pertsch
5 months
Excited to announce the Workshop on X-Embodiment Robot Learning at #Corl2024! How can we build robot foundation models that can control many different robots & where do we find data to train them? Submit your work on scalable & x-embodied robot learning and join us in Munich! 🙂
Tweet media one
2
9
102
@KarlPertsch
Karl Pertsch
9 months
Octo has been accepted to RSS and we finally arXiv'd the paper! 🐙 Many small updates vs the December release: more ablations, new checkpoints, code fixes, etc. 👇
@_akhaliq
AK
9 months
Octo. An Open-Source Generalist Robot Policy. Large policies pretrained on diverse robot datasets have the potential to transform robotic learning: instead of training new policies from scratch, such generalist robot policies may be finetuned with only a
Tweet media one
4
10
97
@KarlPertsch
Karl Pertsch
3 months
Happy that *two* of our CoRL papers got nominated as outstanding paper award finalists (ReMix & OpenVLA)! Congrats to all my co-authors, esp. @JoeyHejna, @moo_jin_kim, @siddkaramcheti. And congrats to the award winners from AI2 & TRI, well deserved! :)
Tweet media one
4
4
94
@KarlPertsch
Karl Pertsch
3 months
Excited to kick off the X-Embodiment workshop @corl_conf in the morning (9am @ room Terra)! We have an exciting lineup of speakers, including @SongShuran, @kvablack, @ryancjulian, @ehsanik, @yukez, @Ed__Johns, @svlevine
Tweet media one
3
14
87
@KarlPertsch
Karl Pertsch
1 year
It was fun to present Open X-Embodiment & RT-X at CoRL today with @QuanVng! We were very excited about the initial release of the Open X-Embodiment dataset, but it's just the start! We covered lots of open problems in the talk as well 👇
Tweet media one
1
7
73
@KarlPertsch
Karl Pertsch
3 months
Had a great time organizing yesterday's X-Embodiment workshop! The full recording and all papers are now live on our workshop website -- learn all the latest on x-embodiment & scalable robot learning research!
@KarlPertsch
Karl Pertsch
3 months
Excited to kick off the X-Embodiment workshop @corl_conf in the morning (9am @ room Terra)! . We have an exciting lineup of speakers, including @SongShuran, @kvablack, @ryancjulian, @ehsanik, @yukez, @Ed__Johns, @svlevine
Tweet media one
3
14
73
@KarlPertsch
Karl Pertsch
2 years
Excited to present STAR, our work on cross-domain imitation @corl_conf! Our goal: use demonstrations across domains, e.g. from robot in kitchen A to robot in kitchen B, or even from human to robot. With STAR I can teach a robot new tasks with videos recorded in my kitchen! 🧵👇
1
18
68
@KarlPertsch
Karl Pertsch
16 days
With FAST, we scale autoregressive VLA training to pi0 scale, and we can solve some pretty complex robot tasks, simply via next token prediction! The best part: in our experiments, pi0+FAST converges 5x faster than diffusion pi0! Days instead of weeks of training! 🎉 3/
Tweet media one
Tweet media two
Tweet media three
3
8
59
@KarlPertsch
Karl Pertsch
1 year
It's awesome to see the positive community response to our release! We're getting inquiries from around the world to contribute more data -- wheeled robots, drones, humanoids, etc! 🚀🚀🚀 Please keep them coming 🙂 open-x-embodiment@googlegroups.com
@KarlPertsch
Karl Pertsch
1 year
Very excited to release the Open X-Embodiment Dataset today -- the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There's lots to unpack here, so let's do a deep dive into the dataset! 🧵1/15
0
7
58
@KarlPertsch
Karl Pertsch
5 months
If you're interested in scalable robot learning & applying for PhDs this cycle, apply to @shahdhruv_'s new lab at Princeton! Dhruv pioneered X-embodied robot foundation models for navigation and I'm sure his lab will work on lots of exciting large-scale robot learning problems!
@shahdhruv_
Dhruv Shah
5 months
Excited to share that I will be joining @Princeton as an Assistant Professor in ECE & Robotics next academic year! 🐯🤖 I am recruiting PhD students for the upcoming admissions cycle. If you are interested in working with me, please consider applying.
0
5
57
@KarlPertsch
Karl Pertsch
5 months
Curating large-scale robot training datasets is mostly black magic right now -- I called the data mix we used for Octo & OpenVLA the "magic soup" 🧙‍♂️. In our project ReMix, Joey made a first step towards a more principled solution -- automatically finding good data mixture weights!
@JoeyHejna
Joey Hejna
5 months
As imitation learning policies continue to scale, deciding how to weigh different robot datasets will become even more difficult. To address this problem we introduce ReMix, a method for automatically curating large RT-X scale imitation learning datasets. 🧵(1/5)
0
6
55
@KarlPertsch
Karl Pertsch
9 months
Evaluation of robot foundation models is a huge challenge: imagine running robot rollouts across 100s of scenes + tasks + embodiments. How can we make eval keep up w/ model improvements? Introducing SIMPLER: sim eval envs for your favorite real robot foundation models! Short 🧵
@XuanlinLi2
Xuanlin Li (Simon)
9 months
Scalable, reproducible, and reliable robotic evaluation remains an open challenge, especially in the age of generalist robot foundation models. Can *simulation* effectively predict *real-world* robot policy performance & behavior? Presenting SIMPLER! 👇
1
6
41
@KarlPertsch
Karl Pertsch
7 months
Excited to release our work on Embodied Chain-of-Thought Reasoning today! We can boost performance of vision-language-action models like OpenVLA by a large margin without any additional robot training data! The key: simply think before you act! 1/
@MiZawalski
Michaล‚ Zawalski
7 months
🤖 Can robots think through complex tasks step-by-step like language models? We present Embodied Chain-of-Thought Reasoning (ECoT): enabling robots to reason about plans and actions for better performance 🎯, interpretability 🧐, and generalization 🌎. See
1
8
42
@KarlPertsch
Karl Pertsch
2 years
Robot learning needs data, but collecting it is expensive. How can we make the most of existing datasets? In SPRINT, we use LLMs to auto-augment language instructions on robot datasets. Our agents learn a lot more tasks during pre-training *for free*! See Jesse's 🧵 for details! 👇
@Jesse_Y_Zhang
Jesse Zhang
2 years
Having humans annotate data to pre-train robots is expensive and time-consuming! Introducing SPRINT: A pre-training approach using LLMs and offline RL to equip robots w/ many language-annotated skills while minimizing human annotation effort! URL: 🧵👇
1
2
39
@KarlPertsch
Karl Pertsch
4 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
1
5
34
@KarlPertsch
Karl Pertsch
16 days
The key idea: FAST compresses actions before training on them. This removes redundancy & makes autoregressive VLA training on high-frequency tasks possible, where models like OpenVLA failed. We use the discrete cosine transform for compressing actions (also used by e.g. JPEG). 2/
Tweet media one
1
3
34
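A minimal sketch of the compression idea in the tweet above, assuming an action chunk of shape (horizon, action_dim); the function names and quantization scale are illustrative, not the released FAST implementation (which additionally compresses the quantized coefficients with a learned vocabulary, omitted here).

```python
# Illustrative DCT-based action compression (not the actual FAST code).
import numpy as np
from scipy.fft import dct, idct

def compress_action_chunk(actions: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """DCT each action dimension over time, then round to integers.

    Small high-frequency coefficients round to zero, which removes the
    redundancy in smooth, high-frequency action sequences.
    """
    coeffs = dct(actions, axis=0, norm="ortho")        # (horizon, action_dim)
    return np.round(coeffs * scale).astype(np.int32)   # quantized coefficients

def decompress_action_chunk(quantized: np.ndarray, scale: float = 10.0) -> np.ndarray:
    """Invert the quantization and the DCT to recover an action chunk."""
    return idct(quantized.astype(np.float32) / scale, axis=0, norm="ortho")

# Example: a smooth 50-step, 7-DoF chunk compresses to mostly-zero coefficients,
# which a downstream tokenizer can encode very compactly.
chunk = np.sin(np.linspace(0, np.pi, 50))[:, None] * np.ones((50, 7))
tokens = compress_action_chunk(chunk)
recovered = decompress_action_chunk(tokens)
print(np.abs(recovered - chunk).max())  # small reconstruction error
```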
@KarlPertsch
Karl Pertsch
10 months
Shoutout to the folks at Rerun who built a visualizer for our DROID dataset -- looks very cool! Allows you to visualize the point cloud from our multi-view stereo cams as well! And should work for any new dataset collected on the DROID robot platform! Thanks @rerundotio :)
@rerundotio
Rerun
10 months
A Rerun Viewer for the DROID Dataset! DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset is a robot manipulation dataset by @SashaKhazatsky et al. with 76k demonstration trajectories or 350h of interaction data, collected across 564 scenes and 86 tasks.
1
2
32
@KarlPertsch
Karl Pertsch
2 years
New work on scaling robot learning from the team I work with at Google! Especially excited about RT-1's capability to ingest data from diverse sources, e.g. sim or even experience from other robots, + demonstrate transfer -- very useful for scaling robotic dataset size & diversity!
@hausman_k
Karol Hausman
2 years
Introducing RT-1, a robotic model that can execute over 700 instructions in the real world at 97% success rate!
Generalizes to new tasks ✅
Robust to new environments and objects ✅
Fast inference for real time control ✅
Can absorb multi-robot data ✅
Powers SayCan ✅
🧵👇
1
0
29
@KarlPertsch
Karl Pertsch
2 years
Data collection is a major bottleneck in robot learning: it's mostly done w/ tedious & expensive human teleoperation. Can we use learning to make data collection itself more efficient? Introducing PATO, our approach for scalable robot data collection w/ learned assistive policies.
@ShivinDass
Shivin Dass
2 years
Excited to present PATO: Policy Assisted TeleOperation, our recent work on scaling robot data collection! PATO uses a policy trained on prior data to assist the user during data collection, making teleop easier and even allowing teleop of multiple robots simultaneously. 🧵👇
1
4
29
@KarlPertsch
Karl Pertsch
4 years
Grateful to be awarded the best paper presentation award @corl_conf! 🎉 Huge credit goes to all my lab mates @ CLVR lab, particularly to my co-author @YoungwoonLee, for all the tireless feedback that greatly improved the talk! :) Talk recording:
Tweet media one
3
2
29
@KarlPertsch
Karl Pertsch
5 months
Our SIMPLER sim evaluation, now w/ GPU parallelization thanks to @Stone_Tao! Great work!
@Stone_Tao
Stone Tao
5 months
just made it possible to evaluate generalist robotics models like Octo at 60-100x real world evaluation speeds via gpu simulation and rendering (~10x faster than original cpu sim code). All videos below are from our open source ManiSkill GPU sim!
Tweet media one
Tweet media two
Tweet media three
1
2
28
@KarlPertsch
Karl Pertsch
16 days
We are releasing a FAST tokenizer we pre-trained on 1M real robot action sequences. In our tests it works well across all kinds of robots -- and it's all on HuggingFace! Happy VLA training! :) 5/
Tweet media one
1
2
28
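A hedged sketch of loading the released tokenizer from the HuggingFace Hub; the repo id and the encode/decode call pattern are assumptions based on the usual custom-processor workflow, so check the model card for the exact interface.

```python
# Hedged sketch: loading the released FAST tokenizer from the Hub.
import numpy as np
from transformers import AutoProcessor

tokenizer = AutoProcessor.from_pretrained(
    "physical-intelligence/fast",  # assumed repo id, see the model card
    trust_remote_code=True,        # tokenizer ships as custom processor code
)

# A batch of one action chunk: 50 timesteps x 7 action dims, values in [-1, 1].
action_chunk = np.random.uniform(-1, 1, size=(1, 50, 7))

tokens = tokenizer(action_chunk)      # -> integer token ids (assumed interface)
recovered = tokenizer.decode(         # -> back to a (1, 50, 7) action chunk
    tokens, time_horizon=50, action_dim=7
)
```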
@KarlPertsch
Karl Pertsch
11 months
Check out Lucy's new project! Finally, every roboticist's favorite pastime, "yelling at your robot", can be useful for once! Bonus: lots of ALOHA trail mix in the lab!
@lucy_x_shi
Lucy Shi
11 months
Introducing Yell At Your Robot (YAY Robot!) 🗣️ - a fun collaboration b/w @Stanford and @UCBerkeley 🤖. We enable robots to improve on-the-fly from language corrections: robots rapidly adapt in real-time and continuously improve from human verbal feedback. YAY Robot enables
1
0
24
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄 Paper: 💻 Website & Code: Thread 👇 (1/8)
2
11
24
@KarlPertsch
Karl Pertsch
1 year
If you want to browse through the Open X-Embodiment data, but don't like fiddling with Colabs, check out this neat website @its_dibya built that gives you a quick overview of all datasets!
@its_dibya
Dibya Ghosh
1 year
Got a chance to dig through the big robot X-embodiment dataset released last week, and hacked together a little website for others to look through the data. Check it out! There's some pretty random and diverse robot data in there
0
2
23
@KarlPertsch
Karl Pertsch
8 months
This looks awesome! Simulation can be a valuable tool for robot data scaling & eval, but the hard part is building diverse simulation envs AND datasets. Glad to see Soroush et al's sim data line of work expanded to more diverse envs! Excited to give this a try!
@snasiriany
Soroush Nasiriany
8 months
I'm excited to introduce RoboCasa, a large-scale simulation framework for everyday tasks. Scaling is the key driving force to unlocking generalist robots, and RoboCasa leverages simulation to take scaling to a whole new level. A short 🧵
2
3
24
@KarlPertsch
Karl Pertsch
16 days
My favorite result: with FAST we can finally train VLAs on the DROID dataset & they work zero-shot in many scenes! Below is the same policy controlling robots at Berkeley, Stanford and UW. Just point a camera at the scene, type out an instruction, et voila! 4/
2
2
23
@KarlPertsch
Karl Pertsch
3 months
Nice! Hopefully more hardware companies will follow this example and contribute to open-source datasets for robot learning research! :)
@UnitreeRobotics
Unitree
3 months
Unitree G1 Open Source Dataset. In order to promote the development of the global embodied AI industry, the Unitree G1 robot operation dataset is open-sourced, adapted to a variety of open-source solutions, and continuously updated: Open-source data collection:
0
2
23
@KarlPertsch
Karl Pertsch
4 months
Big thanks to my (co-)leads on the presented papers:
OpenVLA: @moo_jin_kim @siddkaramcheti
Embodied CoT: @MiZawalski @verityw_
If you're interested in scalable robot learning, go follow them :) Full talk recording:
0
5
22
@KarlPertsch
Karl Pertsch
4 months
The talk covers our recent work on training vision-language-action models for robotics. I'll discuss how embracing LLMs/VLMs in robotics allows us to scale policy learning, but also cover key differences btw robotics and other multi-modal agents. Zoom:
Tweet media one
3
4
22
@KarlPertsch
Karl Pertsch
2 years
Glad to see RT-2 out! We show that VLM backbones are a great way to equip policies with robustness from internet-scale data. RT-2 strongly improves the generalization ability of existing skills (e.g. new scenes / objects) -- learning new low-level behaviors is the next frontier!
@hausman_k
Karol Hausman
2 years
PaLM-E or GPT-4 can speak in many languages and understand images. What if they could speak robot actions? Introducing RT-2: our new model that uses a VLM (up to 55B params) backbone and fine-tunes it to directly output robot actions!
1
2
21
@KarlPertsch
Karl Pertsch
9 months
Big FOMO! -- but you guys will rock the presentation :) If you're @ ICRA, check out Quan's presentation of our Open X-Embodiment project today, nominated for a best paper award 🎉
Room: CC-Main Hall
Time: 10:30-12:00
@QuanVng
Quan Vuong
9 months
Wish @KarlPertsch was at ICRA for Open X-Embodiment 🥲
1
0
20
@KarlPertsch
Karl Pertsch
5 years
0
1
19
@KarlPertsch
Karl Pertsch
16 days
Please find more details about FAST in our paper! Thanks to @KyleStachowicz and many colleagues @physical_int who helped with this project! Paper: Website:
2
1
19
@KarlPertsch
Karl Pertsch
16 days
I am very excited about FAST, because (1) it makes VLA training really easy, even on complex tasks, and (2) with FAST it's trivial to interleave non-robot data in VLA training (web data, subgoals, video prediction, etc.), it's all just tokens! Lots of things to explore! :) 6/
1
2
19
@KarlPertsch
Karl Pertsch
7 months
Cool use of a fine-tuned VLM for autonomous driving! Appreciate all the ablations in the paper + focus on speeding up inference on edge compute!
@zhaohang0124
Hang Zhao
7 months
Introducing ๐ƒ๐ซ๐ข๐ฏ๐ž๐•๐‹๐Œ, VLM meets Autonomous Driving. We propose a dual system that drives a car autonomously in complex driving scenarios. - Slow system: VLM.- Fast system: classical AD pipeline.Enjoy our onboard demo!.Project Page:
0
2
18
@KarlPertsch
Karl Pertsch
3 months
Check out Kevin's thread on π₀ -- Kevin had a huge impact on model design & implementation! To get all the practitioner's tips for training generalist robot policies, don't miss his talk at our X-Embodiment workshop at CoRL next week! (we'll try to stream!)
@kvablack
Kevin Black
3 months
It's been 6 months since I slammed the brakes on several PhD research projects to go work at π. 😅 super excited to finally share our results! A short 🧵 with some details:
1
0
18
@KarlPertsch
Karl Pertsch
1 year
Check out @Jesse_Y_Zhang's CoRL oral on LLM-guided skill learning. Simple recipe: start from a base set of skills -> use LLM to guide exploration towards meaningful skill chains -> expand the skill library w/ RL. We show that this "skill bootstrapping" phase helps downstream RL!
@Jesse_Y_Zhang
Jesse Zhang
1 year
How can our robots autonomously practice **new tasks** in **new environments**? Introducing BOSS: A reinforcement learning (RL) framework that trains agents to solve new tasks in new environments with LLM guidance! **CoRL 2023 Oral** 🧵👇
1
2
17
@KarlPertsch
Karl Pertsch
4 years
Excited to be presenting SPiRL as an oral talk at today's plenary session on RL @corl_conf! Join to learn about skill priors for accelerated RL on new tasks!
Oral: Wed (today), 8:15am PST
Interactive: Wed, 12:30pm PST
w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄 Paper: 💻 Website & Code: Thread 👇 (1/8)
1
4
18
@KarlPertsch
Karl Pertsch
3 years
Interested in large task-agnostic datasets in robotics? We show how to effectively combine them w/ demonstrations for sample-efficient learning of new tasks! Presenting @corl_conf poster session 4 (Wed 11.30-12.30 GMT)! 📜: 💻:
@KarlPertsch
Karl Pertsch
4 years
New paper on *Skill-based Learning with Demonstrations (SkiLD)*! While current imitation learning follows the _low-level actions_ in the demos, SkiLD follows the demonstrated _skills_. SkiLD enables efficient demo-guided RL & imitation learning on long-horizon tasks! 1/N
2
3
17
@KarlPertsch
Karl Pertsch
4 months
We extended the deadline for our X-Embodiment workshop at CoRL to Oct 10! Submit your ICLR papers & share your findings with the community! :) PS: we also got funding from Google for some paper awards, so even more reason to submit!
@KarlPertsch
Karl Pertsch
5 months
Excited to announce the Workshop on X-Embodiment Robot Learning at #Corl2024! How can we build robot foundation models that can control many different robots & where do we find data to train them? Submit your work on scalable & x-embodied robot learning and join us in Munich! 🙂
Tweet media one
0
0
16
@KarlPertsch
Karl Pertsch
1 year
@chris_j_paxton @_ericrosen Indeed existing x-embodiment models like RT-X/Octo don't align action spaces or condition on action space definition/URDF -- that's a major reason why they don't usually work 0-shot on new robot setups: they don't know what action space to use -- we're hoping to fix that soon! :)
3
3
16
@KarlPertsch
Karl Pertsch
1 year
Super cool work from Cheng et al! Robot data collection in the wild without the pain of moving robots around! Before we deploy robots at scale + in the wild, this can greatly increase diversity of robot data + help overcome activation energy for getting generalizable policies.
@chichengcc
Cheng Chi
1 year
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI). An open-source $400 system from @Stanford designed to democratize robot data collection. 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
1
1
17
@KarlPertsch
Karl Pertsch
5 years
Check out our new work on visual planning and control! Our model uses a divide-and-conquer strategy to break long-horizon planning problems into easier sub-problems, allowing us to solve tasks that require planning over hundreds of time steps!
@svlevine
Sergey Levine
5 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start & goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch, @_oleh, @febert8888, @chelseabfinn, @dineshjayaraman
1
2
15
@KarlPertsch
Karl Pertsch
21 days
@VilleKuosmanen Fine-tuning the vision encoder turned out to be very important in our OpenVLA experiments, so I'd recommend trying LoRA on everything and the "sandwich" top+bottom thing you suggested. We have some LoRA experiments in the OpenVLA paper, but only tested it after robot pretraining.
2
1
16
@KarlPertsch
Karl Pertsch
7 months
This should be a great tutorial by Lerrel, @notmahi and @RussTedrake for anyone wanting to catch up on modern techniques for imitation learning! Lots of the practical tips should transfer to fine-tuning of large pre-trained models too! (see zoom link in Lerrel's thread)
@LerrelPinto
Lerrel Pinto
7 months
This #RSS2024 on July 19, we are organizing a tutorial on supervised policy learning for real world robots! Talks by @notmahi & @RussTedrake will cover the fundamentals of imitation, recent algorithms, walk-through code, and practical considerations.
Tweet media one
0
0
15
@KarlPertsch
Karl Pertsch
3 years
Check out Lucy's and @YoungwoonLee's cool work on combining learned skills and model-based RL! Enables more sample efficient learning than model-free skill-RL approaches like SPiRL! First skill-based RL results on the new CALVIN benchmark! Lucy's first paper -- well done! :)
@lucy_x_shi
Lucy Shi
3 years
Can robots be farsighted? We introduce SkiMo (Skill + Model-based RL), which allows more accurate and efficient long-horizon planning through temporal abstraction. SkiMo learns temporally-extended, sparse-reward tasks with 5x fewer samples! 🧵👇
1
1
14
@KarlPertsch
Karl Pertsch
3 years
Excited to present two papers w/ co-authors at ICLR this week!
1⃣ Task-Induced Representation Learning: We investigate representation learning in visually complex environments.
Q: How can we learn to represent important info & ignore distractors?
A: Use prior task experience!
1
2
14
@KarlPertsch
Karl Pertsch
1 year
2D trajectories for task specification are more grounded than language, but easier to provide than goal images, e.g. by crowd workers / VLMs. Easy to relabel in hindsight + transfer nicely from human video! Very cool work @Jiayuan_Gu @xiao_ted et al!
@xiao_ted
Ted Xiao
1 year
Instead of just telling robots "what to do", can we also guide robots by telling them "how to do" tasks? Unveiling RT-Trajectory, our new work which introduces trajectory-conditioned robot policies. These coarse trajectory sketches help robots generalize to novel tasks! 🧵⬇️
0
2
13
@KarlPertsch
Karl Pertsch
6 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh, in collaboration with @yjy0625, @CSProfKGD, Joseph Lim, @KostasPenn, @drew_jaegle).
Tweet media one
1
1
12
@KarlPertsch
Karl Pertsch
1 year
Out of the box, Octo can control multiple robots, use 3rd person + wrist cameras, language instructions & goal images. Key feature: Octo can be quickly finetuned to use new observation & action spaces! In <5 hours on a 24 GB VRAM GPU! 2/
1
1
12
@KarlPertsch
Karl Pertsch
2 years
By training on in-the-wild human videos, we can use demonstrations from *unseen* environments, e.g. 3 mins of video recorded in my kitchen substantially accelerates RL in a new robot env in our experiments.
1
4
11
@KarlPertsch
Karl Pertsch
6 years
We will present our work on keyframe-based video prediction in the workshop on Task-agnostic RL (TARL) tomorrow afternoon. If you're at ICLR, come see us at our poster! (joint work with @_oleh, @yiy0602, @CSProfKGD, Joseph Lim, @KostasPenn , @drew_jaegle).
@KarlPertsch
Karl Pertsch
6 years
(1/n) Check out our new work on keyframe-based video prediction for subgoal discovery! (joint work with @_oleh, in collaboration with @yjy0625, @CSProfKGD, Joseph Lim, @KostasPenn, @drew_jaegle).
Tweet media one
1
6
10
@KarlPertsch
Karl Pertsch
1 year
To show that the data is useful for learning, we trained a series of large-scale policies (RT-1-X, RT-2-X) & found co-training with our data to improve performance substantially! We're releasing model checkpoints too, check Quan's tweets for details! 11/
@QuanVng
Quan Vuong
1 year
RT-X: generalist AI models lead to 50% improvement over RT-1 and 3x improvement over RT-2, our previous best models. 🔥🥳🧵 Project website:
1
2
10
@KarlPertsch
Karl Pertsch
1 year
We assembled the dataset by pooling *existing* robot datasets from our collaborators @ Google and many many academic labs (34!). In total we included 60 individual datasets with 22 different robot embodiments -- many robot arms, bi-manual robots, quadrupeds, wheeled robots etc. 2/
Tweet media one
1
2
8
@KarlPertsch
Karl Pertsch
3 months
Compared to OpenVLA, our previous VLA policy (see below), π₀ uses flow matching as the decoding mechanism (fast + expressive). That's key to making it work on high-freq data -- it allows us to run a 3.3B param model for 50Hz control on a 4090!
@KarlPertsch
Karl Pertsch
8 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA).
🦾 SoTA generalist policy (better than Octo & RT-2-X)
⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA
💻 Open-source PyTorch codebase
🤗 Models on HuggingFace
1/
2
0
9
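For readers unfamiliar with flow matching: a minimal, generic flow-matching training loss for an action decoder, a sketch of the idea rather than the π₀ implementation; `policy_net` and the tensor shapes are placeholders.

```python
# Generic flow-matching loss sketch (illustrative, not the pi0 code).
# `policy_net` is a placeholder mapping (noisy_actions, t, obs_emb) -> velocity.
import torch

def flow_matching_loss(policy_net, obs_emb, actions):
    """Linear-interpolant flow matching: predict the velocity (actions - noise)."""
    noise = torch.randn_like(actions)                              # x_0 ~ N(0, I)
    t = torch.rand(actions.shape[0], 1, 1, device=actions.device)  # per-sample time
    x_t = (1 - t) * noise + t * actions                            # noise -> data path
    target_velocity = actions - noise                              # constant along path
    pred_velocity = policy_net(x_t, t.squeeze(-1).squeeze(-1), obs_emb)
    return torch.mean((pred_velocity - target_velocity) ** 2)

# At inference, an action chunk is decoded by integrating the learned velocity
# field from noise over a handful of Euler steps, which keeps decoding fast
# enough for high-frequency control.
```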
@KarlPertsch
Karl Pertsch
8 months
How to use it? It's all on HuggingFace -- two lines to load the model, no code install needed. We also open-source our full PyTorch training code & data. Scales from fine-tuning on 1 GPU to training billion-parameter VLAs on distributed clusters! 5/
Tweet media one
1
1
9
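A hedged sketch of the "two lines to load" claim using the standard HuggingFace `trust_remote_code` pattern; the repo id and the `predict_action` helper follow the OpenVLA model card, but treat exact argument names (e.g. `unnorm_key`) as assumptions and verify against the card.

```python
# Hedged sketch: loading OpenVLA from the Hub and querying one action.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
).to("cuda")

image = Image.open("frame.png")  # current camera frame (placeholder path)
prompt = "In: What action should the robot take to pick up the cup?\nOut:"
inputs = processor(prompt, image).to("cuda", dtype=torch.bfloat16)
action = model.predict_action(**inputs, unnorm_key="bridge_orig")  # 7-DoF action
```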
@KarlPertsch
Karl Pertsch
4 months
Turns out this was a placeholder zoom link 😅 The correct link is here:
2
0
9
@KarlPertsch
Karl Pertsch
1 year
Here are the dataset resource links:
✅ Colab (vis / download / data loaders):
✅ Overview Sheet (filtering):
All data is fully open-source under a commercially usable CC-BY 4.0 license! 10/
1
2
9
@KarlPertsch
Karl Pertsch
7 months
Great work! 💯 Lots of room to improve on the vision side of VLMs -- robotics could be a great test bed too! For VLA training (VLM+action) we found existing vision encoders need lots of fine-tuning to work well for robot control, though admittedly 🤖 eval isn't straightforward 🥲
@sainingxie
Saining Xie
7 months
Introducing Cambrian-1, a fully open project from our group at NYU. The world doesn't need another MLLM to rival GPT-4V. Cambrian is unique as a vision-centric exploration & here's why I think it's time to shift focus from scaling LLMs to enhancing visual representations. 🧵[1/n]
Tweet media one
0
1
9
@KarlPertsch
Karl Pertsch
1 year
Creating this dataset was a huge community effort (look at that author list 😀)! I led the dataset construction and had calls with countless labs & everybody was very excited to contribute data -- there is a lot of momentum in the community towards sharing & reusing data 🙂 12/
Tweet media one
1
0
9
@KarlPertsch
Karl Pertsch
1 year
I'm very excited to see how the community will use this dataset! Let me know if you have any questions! 🙂 💻 Project Website: 15/15
1
1
9
@KarlPertsch
Karl Pertsch
1 year
The full dataset download is ~4.5 TB. We also provide a sheet that allows you to filter the data along many attributes, e.g. if you only want to download Franka robot data or only data with wrist cams, natural language instructions etc! Tailor the data to your use case! 9/
Tweet media one
1
0
8
@KarlPertsch
Karl Pertsch
3 months
Pi is a great place to do fundamental robotics research & publish it! P.S.: we're hiring :)
1
0
8
@KarlPertsch
Karl Pertsch
5 years
@_oleh and I are presenting our work on hierarchical models for long-horizon prediction and planning at the #BIGICML workshop today, starting at 10:40 PT. Come join us to chat about predictive models and model-based RL!
@svlevine
Sergey Levine
5 years
Instead of predicting in sequence, we can predict hierarchically: midpoint b/w start & goal, midpoint between that, etc. This hierarchical approach is great for planning w/ images! @KarlPertsch, @_oleh, @febert8888, @chelseabfinn, @dineshjayaraman
0
2
8
@KarlPertsch
Karl Pertsch
8 months
Check out Sidd's thread about OpenVLA and some key open questions for VLA research!
@siddkaramcheti
Siddharth Karamcheti
8 months
Thrilled to announce OpenVLA ( -- a vision-language-action policy for robotic control! Shout out to my co-leads @moo_jin_kim & @KarlPertsch; see their threads for overviews of our work. Here though, I want to talk about observations & next steps! 🧵⬇️
0
0
8
@KarlPertsch
Karl Pertsch
4 months
Very cool, thanks for the walk-through on trying the model on robotics data! Spatial grounding is key to making VLMs useful for robotics, and Molmo's grounding seems very robust in the examples Kiana tried! Looking forward to giving it a spin!
@ehsanik
Kiana Ehsani
4 months
Try out Molmo on your application! This is a great example by @DJiafei! We have a few videos describing Molmo's different capabilities on our blog! This one is me trying it out on a bunch of tasks and images from RT-X:
1
2
8
@KarlPertsch
Karl Pertsch
8 months
How does it work? We take a strong open-source VLM, Prismatic 7B, and fine-tune it to predict robot actions, using a curated dataset of 970k robot demonstrations. This recipe scales, and allows robotics to reuse pretrained models from the community (SigLIP, DinoV2, Llama2) 🚀 2/
Tweet media one
2
0
8
@KarlPertsch
Karl Pertsch
1 year
Last but not least: Octo is your one-stop shop for training on OpenX data! We're releasing high-quality data loaders that work with PyTorch and JAX + a curated dataset split! 7/
@KarlPertsch
Karl Pertsch
1 year
Very excited to release the Open X-Embodiment Dataset today -- the largest robot dataset to date with 1M+ trajectories! Robotics needs more data & this is a big step! There's lots to unpack here, so let's do a deep dive into the dataset! 🧵1/15
2
0
7
@KarlPertsch
Karl Pertsch
5 months
Big thanks to my co-organizers @keerthanpg @Lawrence_Y_Chen @lucy_x_shi @xiao_ted @QuanVng @pannag_ Christine Chan @Ken_Goldberg @gauravsukhatme @chelseabfinn!
Paper submission deadline: 10/03
Date: 11/09, Munich, Germany
Workshop Website:
0
3
7
@KarlPertsch
Karl Pertsch
1 year
This was a big team effort w/ collaborators from UC Berkeley, Stanford & CMU! I'm very grateful to all collaborators!! :) @its_dibya @HomerWalke @kvablack @oier_mees @SudeepDasari @JoeyHejna Tobias Kreiman, Charles Xu @jianlanluo You Liang Tan @DorsaSadigh @chelseabfinn @svlevine
2
0
6
@KarlPertsch
Karl Pertsch
4 years
Excited to present SPiRL in contributed talks at the Deep RL and Robot Learning workshops @NeurIPSConf! Join us during the poster sessions to chat about all things skill learning & transfer!
DRL Poster: Room F, A1
Robot Learning Poster: C3
w/ @YoungwoonLee & @JosephLim_AI
@KarlPertsch
Karl Pertsch
4 years
How can we use large offline datasets for accelerating the learning of new tasks? We can transfer skills! Check out our #CoRL2020 paper on efficient skill transfer with learned skill priors! 📄 Paper: 💻 Website & Code: Thread 👇 (1/8)
0
1
7
@KarlPertsch
Karl Pertsch
4 months
My main message: large-scale robot learning today looks very similar to other multi-modal sequence modeling problems, e.g. VLM training. We train so-called vision-language-action models (VLAs) on large datasets of interleaved image, text and action tokens. 2/
1
0
7
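To make "action tokens" concrete: a sketch of one common tokenization scheme (uniform binning of normalized actions, RT-2 style) so continuous actions can sit in the same token stream as text and image tokens; the bin count and normalization range are illustrative.

```python
# Illustrative action discretization: continuous actions -> discrete token ids.
import numpy as np

NUM_BINS = 256  # assumed vocabulary size reserved for action tokens

def actions_to_tokens(actions: np.ndarray) -> np.ndarray:
    """Map normalized actions in [-1, 1] to integer token ids in [0, NUM_BINS-1]."""
    clipped = np.clip(actions, -1.0, 1.0)
    edges = np.linspace(-1.0, 1.0, NUM_BINS + 1)[1:-1]  # interior bin edges
    return np.digitize(clipped, edges)

def tokens_to_actions(tokens: np.ndarray) -> np.ndarray:
    """Map token ids back to bin-center actions for execution on the robot."""
    centers = np.linspace(-1.0, 1.0, NUM_BINS)
    return centers[tokens]

action = np.array([0.1, -0.4, 0.9, 0.0, 0.0, -1.0, 1.0])  # e.g. a 7-DoF delta action
print(actions_to_tokens(action))  # 7 integer ids, appended after the text tokens
```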
@KarlPertsch
Karl Pertsch
3 months
You can submit 🌶️ questions for the panel on our website: The whole workshop will be live-streamed to YouTube + recorded (if conference internet permits 🥲):
3
2
6
@KarlPertsch
Karl Pertsch
1 year
We plan to expand the dataset over time and e.g. add more mobile manipulation and simulation data. If you have data that would be good to integrate, simulated or real, please fill out the form:
Tweet media one
0
0
6
@KarlPertsch
Karl Pertsch
4 months
Notably, the VLM pre-training allows VLAs like RT-2-X and OpenVLA to generalize more broadly than prior robot models w/o internet pre-training. Using VLM backbones also made it easy to optimize training + inference efficiency via LoRA + quantization! 4/
Tweet media one
Tweet media two
1
0
6
@KarlPertsch
Karl Pertsch
1 year
Using the data is easy! All data is stored in tfrecords & we made a colab for visualizing & downloading the data (w/ examples for efficient data loaders)! Each dataset stores observations/actions in its "native" format & resolution, but it's easy to align & mix them on-the-fly! 8/
Tweet media one
1
0
6
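A hedged sketch of reading one of the tfrecord-backed (RLDS) datasets with `tensorflow_datasets`, in the spirit of the release Colab; the GCS path, dataset name, and version are illustrative placeholders, so take the real ones from the overview sheet.

```python
# Hedged sketch: iterating an RLDS-formatted OpenX dataset with tfds.
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory(
    "gs://gresearch/robotics/bridge/0.1.0"   # assumed path; check the sheet
)
ds = builder.as_dataset(split="train[:10]")  # a handful of episodes

for episode in ds:
    for step in episode["steps"]:            # RLDS: episodes contain steps
        obs = step["observation"]            # per-dataset "native" format
        action = step["action"]
        break
    break
```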
@KarlPertsch
Karl Pertsch
7 months
This is great work! 38 fine-tuning tasks for every eval 🤯 Thanks for sharing many ablations @giffmana and team! Also confirms our finding that vision encoder fine-tuning is required for fine-grained spatial tasks like robot control! Any plans to release larger PaliGemma models? :)
@giffmana
Lucas Beyer (bl16)
7 months
✨PaliGemma report will hit arXiv tonight. We tried hard to make it interesting, and not "here model. sota results. kthxbye.". So here's some of the many interesting ablations we did, check the paper tomorrow for more! 🧶
Tweet media one
1
0
6
@KarlPertsch
Karl Pertsch
3 months
Our focus for this first release was to push the dexterity of generalist robot policies. The videos on our blog show some of the most dexterous autonomous policies to date, and they are all based on a single base model checkpoint.
1
0
6
@KarlPertsch
Karl Pertsch
1 year
We analyzed the properties of the combined dataset! First, the number of datasets per robot embodiment: many academic labs use Franka robot arms, so we have many (smaller) Franka datasets and a long-tail of other robot embodiments! 3/
Tweet media one
1
3
6
@KarlPertsch
Karl Pertsch
4 months
Thus, we can simply fine-tune existing VLMs on our data to act as robot policies. We can reuse a lot of pieces from the VLM ecosystem -- scalable models, training & serving infra etc. In OpenVLA we packaged all of that into a strong robot policy:
Tweet media one
@KarlPertsch
Karl Pertsch
8 months
Very excited to release OpenVLA today, a 7B parameter open-source vision-language-action model (VLA).
🦾 SoTA generalist policy (better than Octo & RT-2-X)
⚡️ Easy to run & fine-tune on 1 GPU with quantization and LoRA
💻 Open-source PyTorch codebase
🤗 Models on HuggingFace
1/
1
0
6
@KarlPertsch
Karl Pertsch
7 months
When collecting your fine-tuning data, start with little variation in terms of objects, positions, scenes, backgrounds, camera angles, etc. It's easier to catch bugs in your robot pipeline this way. But, for best policy generalization, collect more diverse demo data later! 3/
1
0
5
@KarlPertsch
Karl Pertsch
8 months
Big shoutout to my co-leads @moo_jin_kim and @siddkaramcheti, and thanks to my advisors @chelseabfinn and @svlevine, and many others involved! Also thanks to @ToyotaResearch for providing the compute to enable this kind of open-source research! 9/9
1
0
5
@KarlPertsch
Karl Pertsch
1 year
We're fully open-sourcing model checkpoints, our pre-training and finetuning pipelines! Initially, Octo comes in two sizes: Octo-Small (27M params) and Octo-Base (93M params). All models are on HuggingFace, so loading an Octo model is as easy as this: 5/
Tweet media one
1
0
5
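The code screenshot from this tweet is not preserved in the archive; below is a hedged reconstruction of what loading Octo from the Hub looks like, assuming the import path and repo ids from the open-source Octo (JAX) codebase.

```python
# Hedged reconstruction: loading a pretrained Octo checkpoint from HuggingFace.
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("hf://rail-berkeley/octo-base")  # or octo-small
```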
@KarlPertsch
Karl Pertsch
1 year
Octo is only the first step towards building generalist robot policies and we're planning to improve the models over time -- larger sizes, more robot morphologies, RL etc. -- really excited to see how folks will use Octo! :) 8/
1
0
5
@KarlPertsch
Karl Pertsch
7 months
Most importantly: 99% of applications will require *fine-tuning*, i.e. collect a small dataset of <100 robot demos in your target domain & fine-tune OpenVLA on it. Why? OpenVLA needs to learn your robot's action space, camera setup, etc. More on 0-shot usage at the end! 2/
1
0
5
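A hedged illustration of the LoRA step for such a fine-tuning run, using `peft` on the Hub-loaded model; the OpenVLA release ships its own training code, so this is only a sketch, and the rank and target-module choices are assumptions.

```python
# Hedged sketch: wrapping the HF-loaded OpenVLA model with LoRA adapters.
import torch
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b", torch_dtype=torch.bfloat16, trust_remote_code=True
)

lora_cfg = LoraConfig(
    r=32,                         # low-rank adapter dimension (assumption)
    lora_alpha=16,
    target_modules="all-linear",  # adapt all linear layers, vision encoder included
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a few percent of weights are trainable
```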
@KarlPertsch
Karl Pertsch
8 months
Please check out Moo Jin's thread for more details about OpenVLA -- Moo Jin really carried the torch in this project, which was the first project in his PhD! Way to go Moo Jin! :)
@moo_jin_kim
Moo Jin Kim
8 months
✨ Introducing OpenVLA -- an open-source vision-language-action model for robotics!
- SOTA generalist policy
- 7B params
- outperforms Octo, RT-2-X on zero-shot evals 🦾
- trained on 970k episodes from OpenX dataset 🤖
- fully open: model/code/data all online 🤗
🧵👇
1
0
5
@KarlPertsch
Karl Pertsch
1 year
We're hoping to continue this momentum and keep growing the dataset 🚀! We're still figuring out the details, but if you or your lab have data you'd like to contribute, feel free to shoot an email to open-x-embodiment@googlegroups.com and we will get back to you! :) 13/
1
1
5
@KarlPertsch
Karl Pertsch
4 years
Jun will present our work on augmenting RL w/ motion planners at @corl_conf today. Our RL agents learn to use motion planners for solving challenging manipulation tasks w/ many obstacles! Interactive Session: today, 11:10am PST. Led jointly by Jun (@junjungoal) & @YoungwoonLee.
0
1
5