![Ayzaan Wahid Profile](https://pbs.twimg.com/profile_images/1638732830963535872/7SogEb1M.jpg)
Ayzaan Wahid (@ayzwah) · 1K followers · 254 following · 13 media · 40 statuses
For the past year we've been working on ALOHA Unleashed at @GoogleDeepMind - pushing the scale and dexterity of tasks on our ALOHA 2 fleet. Here is a thread with some of the coolest videos! The first task is hanging a shirt on a hanger (autonomous, 1x speed).
One time @tonyzzhao took off his sweater to try it with the model. The policy was never trained on adult-sized shirts or any type of sweater, but we found it's able to generalize.
We've been working on ALOHA 2 for several months to scale up the ALOHA platform and create a robot fleet to collect dexterous manipulation data. The new hardware is more robust, user-friendly, and enables a much wider range of tasks. Check out Tony's thread for details!
Led by @GoogleDeepMind, we present ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation. ALOHA 2 significantly improves the durability of the original ALOHA, enabling fleet-scale data collection on more complex tasks. As usual, everything is open-sourced!
Check out @tonyzzhao's tweet for a continuous-take video.
Introducing ALOHA Unleashed - Pushing the boundaries of dexterity with low-cost robots and AI. @GoogleDeepMind. Finally got to share some videos after a few months. Robots are fully autonomous, filmed in one continuous shot. Enjoy!
Check out the blog post on PaLM-E! Also sharing a few more sports-themed examples of PaLM-E capabilities. First, here's PaLM-E describing images using emoji:
Today we share PaLM-E, a generalist, embodied language model for robotics. The largest instantiation, 562 billion parameters, is also a state-of-the-art visual-language model, has PaLM's language skills, and can be successfully applied across robot types.
This was a fun project with an amazing team: @tonyzzhao, @JonathanTompson, @DannyDriess, @peteflorence, @coolboi95, @chelseabfinn, @SpencerGoodric6.
@ashvinair @GoogleDeepMind thanks Ashvin! when are you making your comeback to robotics? haha.
@ericjang11 you may be interested in the sim ablations we ran:
- 50% of the data still does pretty well (though there's a larger gap on the SPL metric)
- the sim dataset is smaller than the real one (~180k trajectories)
- we also ran the BC-Z ResNet+FiLM encoder, which does pretty well!
Still lots more to learn about data complexity though!
Check out our blog post on Interactive Language. We're also releasing Language Table, with >600k real and sim robot episodes. Here's a colab notebook which gives an overview of the data and sim environment:
Interactive Language is an imitation learning framework for producing real-time, open vocabulary language-conditionable robots. Learn more and check out the newly released and largest available language-annotated robot dataset, called Language-Table.
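For anyone who wants a quick look at the data before opening the Colab, here is a minimal sketch of reading a Language-Table split with tensorflow_datasets in RLDS format. The GCS path, version string, and feature names are assumptions rather than details stated in this thread; the google-research/language-table README has the canonical dataset locations and schema.

```python
# Minimal sketch: peek at Language-Table episodes via TFDS (RLDS format).
# The dataset directory and feature names below are assumptions -- check the
# google-research/language-table README for the canonical paths and schema.
import tensorflow_datasets as tfds

DATASET_DIR = "gs://gresearch/robotics/language_table_sim/0.0.1"  # assumed path

builder = tfds.builder_from_directory(builder_dir=DATASET_DIR)
episodes = builder.as_dataset(split="train")

# RLDS stores each episode with a nested "steps" dataset; every step holds
# the observation (image + language instruction) and the robot action.
for episode in episodes.take(1):
    for step in episode["steps"].take(3):
        print(sorted(step["observation"].keys()))  # observation feature names
        print(step["action"])                       # per-step action tensor
```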
We're hoping this benchmark can be useful for many ideas in robot learning!
We think this dataset could have multiple uses beyond language control of robots, e.g.:
* Robot video captioning
* Action-conditioned future video prediction
* Reward prediction
* Vision-language models + actions
* Sim2real
* Multi-robot pretraining (GATO-2?)
here's an example of multimodal chain-of-thought to answer a series of questions about the image.
Here's a many-step zero-shot CoT example (prompt by @ayzwah!). Note large VQA training datasets (VQAv2, OKVQA, etc.) typically only have 1-, 2-, 3-word answers, so these many-step answers are considerably out-of-distribution.