Ayzaan Wahid Profile
Ayzaan Wahid

@ayzwah

Followers
1K
Following
254
Media
13
Statuses
40

Robotics @GoogleDeepmind

Joined October 2010
@ayzwah
Ayzaan Wahid
10 months
For the past year we've been working on ALOHA Unleashed 🌋 @GoogleDeepmind - pushing the scale and dexterity of tasks on our ALOHA 2 fleet. Here is a thread with some of the coolest videos! The first task is hanging a shirt on a hanger (autonomous 1x).
32
112
541
@ayzwah
Ayzaan Wahid
10 months
One time @tonyzzhao took off his sweater to try it with the model. The policy was never trained on an adult-sized shirt or any type of sweater, but we found it's able to generalize.
2
9
129
@ayzwah
Ayzaan Wahid
10 months
We're also able to learn precise insertion tasks, like replacing the finger on another robot in our fleet. Here's a closeup of that task.
4
3
62
@ayzwah
Ayzaan Wahid
1 year
We've been working on ALOHA 2 for several months to scale up the ALOHA platform and create a robot fleet for collecting dexterous manipulation data. The new hardware is more robust and user-friendly, and enables a much wider range of tasks. Check out Tony's thread for details!
@tonyzzhao
Tony Z. Zhao
1 year
Led by @GoogleDeepMind, we present ALOHA 2 🤙: An Enhanced Low-Cost Hardware for Bimanual Teleoperation. ALOHA 2 🤙 significantly improves the durability of the original ALOHA 🖐️, enabling fleet-scale data collection on more complex tasks. As usual, everything is open-sourced!
1
4
50
@ayzwah
Ayzaan Wahid
10 months
Here's another precise industrial gear insertion task which requires a tight friction fit and meshing the teeth on the gears.
1
3
49
@ayzwah
Ayzaan Wahid
10 months
Here's the policy on a variety of shirt colors (4x speed).
1
4
47
@ayzwah
Ayzaan Wahid
10 months
We also tried pushing the dexterity even further with our shoelace-tying task, which requires straightening the shoe and laces, then tying the bunny ears on the shoe (1x speed):
1
4
40
@ayzwah
Ayzaan Wahid
10 months
Check out @tonyzzhao's tweet for a continuous-take video.
@tonyzzhao
Tony Z. Zhao
10 months
Introducing ALOHA Unleashed 🌋 - Pushing the boundaries of dexterity with low-cost robots and AI. @GoogleDeepMind. Finally got to share some videos after a few months. Robots are fully autonomous, filmed in one continuous shot. Enjoy!
1
0
33
@ayzwah
Ayzaan Wahid
10 months
More examples of our shoelace tying policy (4x speed):
3
3
30
@ayzwah
Ayzaan Wahid
2 years
Check out the blog post on PaLM-E! Also sharing a few more sports-themed examples of PaLM-E capabilities. First, here's PaLM-E describing images using emoji: 🐐
Tweet media one
@GoogleAI
Google AI
2 years
Today we share PaLM-E, a generalist, embodied language model for robotics. The largest instantiation, at 562 billion parameters, is also a state-of-the-art visual-language model, has PaLM's language skills, and can be successfully applied across robot types →
5
6
30
@ayzwah
Ayzaan Wahid
10 months
2
0
11
@ayzwah
Ayzaan Wahid
2 years
Check out our work "Interactive Language: Talking to Robots in Real Time". We end up with an open-vocab robot policy (language + pixels -> actions) that lets a human and robot collaborate to accomplish long-horizon tasks; a schematic of the interface follows below. Video:
1
4
7
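A rough sketch of the interface the tweet describes: a policy that maps (camera pixels, free-form instruction) to an action, re-queried every control step so a human can re-instruct the robot mid-episode. Everything below is an illustrative assumption, not the released Interactive Language API.

    import numpy as np

    class RandomLanguagePolicy:
        """Stand-in for an open-vocab policy: (image, instruction) -> action."""
        def act(self, image: np.ndarray, instruction: str) -> np.ndarray:
            # A real policy would encode the instruction and image with a
            # trained network; here we return a random 2-D end-effector delta.
            return np.random.uniform(-1.0, 1.0, size=2)

    def run_interactive_episode(policy, steps=10):
        image = np.zeros((64, 64, 3), dtype=np.uint8)  # placeholder camera frame
        for t in range(steps):
            # The instruction may change at any step -- that is what makes the
            # policy interactive: a human can steer the robot in real time.
            instruction = "push the blue cube to the red circle"
            action = policy.act(image, instruction)
            print(t, action)

    run_interactive_episode(RandomLanguagePolicy())

Because the policy is re-queried at every step, swapping the instruction string mid-episode immediately redirects the behavior, which is what enables the long-horizon human-robot collaboration shown in the video.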
@ayzwah
Ayzaan Wahid
1 year
Big props to @the_real_btaba and @kevin_zakka for creating a super realistic MuJoCo sim model (a minimal loading sketch follows below).
Tweet media one
0
0
8
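For anyone who wants to poke at the sim: loading the open-sourced ALOHA 2 model takes only a few lines of MuJoCo's Python bindings. A minimal sketch, assuming a local checkout of the MuJoCo Menagerie with the scene at aloha/scene.xml (the path is an assumption; check the repo):

    import mujoco

    # Load the ALOHA 2 scene from a local MuJoCo Menagerie checkout.
    # The path is an assumption -- adjust to wherever the repo lives.
    model = mujoco.MjModel.from_xml_path("mujoco_menagerie/aloha/scene.xml")
    data = mujoco.MjData(model)

    # Step the passive dynamics for one simulated second.
    while data.time < 1.0:
        mujoco.mj_step(model, data)

    print(f"{model.nq} position DoFs, simulated t={data.time:.2f}s")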
@ayzwah
Ayzaan Wahid
10 months
@ashvinair @GoogleDeepMind thanks Ashvin! When are you making your comeback to robotics? haha.
1
0
12
@ayzwah
Ayzaan Wahid
14 years
first tweet ever.
1
1
4
@ayzwah
Ayzaan Wahid
10 months
@ericjang11 @tonyzzhao thanks Eric, definitely inspired by your one-take video!
0
0
3
@ayzwah
Ayzaan Wahid
2 years
@ericjang11 you may be interested in sim ablations we ran:
- 50% of data still does pretty well (though larger gap in SPL metric)
- sim dataset smaller than real (~180k trajectories)
- also ran BC-Z RN+FiLM encoder, which does pretty well!
Still lots more to learn about data complexity though!
Tweet media one
1
0
3
@ayzwah
Ayzaan Wahid
2 years
And finally, PaLM-E reminding us how many rings Kobe has won.
Tweet media one
1
0
2
@ayzwah
Ayzaan Wahid
2 years
Check out our blog post on Interactive Language. We're also releasing Language Table, with >600k real and sim robot episodes. Here's a Colab notebook that gives an overview of the data and sim environment (a minimal loading sketch also follows below):
@GoogleAI
Google AI
2 years
Interactive Language is an imitation learning framework for producing real-time, open-vocabulary language-conditionable robots. Learn more and check out the newly released and largest available language-annotated robot dataset, called Language-Table →
1
1
1
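The released episodes are in RLDS format, so they can be browsed with tensorflow_datasets. A minimal sketch, assuming the public GCS path and field names from memory of the release (the linked Colab is the canonical reference):

    import tensorflow_datasets as tfds

    # The dataset path is an assumption -- see the Colab for the canonical one.
    builder = tfds.builder_from_directory(
        "gs://gresearch/robotics/language_table/0.0.1")
    ds = builder.as_dataset(split="train")

    for episode in ds.take(1):
        # RLDS convention: each episode is itself a dataset of timesteps.
        for step in episode["steps"]:
            rgb = step["observation"]["rgb"]  # camera image (field name assumed)
            action = step["action"]           # 2-D end-effector delta (assumed)
            print(rgb.shape, action.numpy())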
@ayzwah
Ayzaan Wahid
2 years
We're hoping this benchmark can be useful for many ideas in robot learning!
@coreylynch
Corey Lynch
2 years
We think this dataset could have multiple uses beyond language control of robots, e.g.:
* Robot video captioning
* Action-conditioned future video prediction
* Reward prediction
* Vision-language models + actions
* Sim2real
* Multi-robot pretraining (GATO-2?)
0
0
1
@ayzwah
Ayzaan Wahid
10 months
@coreylynch @tonyzzhao thanks Corey!
0
0
1
@ayzwah
Ayzaan Wahid
2 years
@coreylynch
Corey Lynch
2 years
Excited to share our work on robots that follow real-time language 🔄🔄: Interactive Language: Talking to Robots in Real Time. Here's me talking to our best policy :)
0
0
1
@ayzwah
Ayzaan Wahid
2 years
Here's an example of multimodal chain-of-thought reasoning to answer a series of questions about the image.
@peteflorence
Pete Florence
2 years
Here's a many-step zero-shot CoT example (prompt by @ayzwah!). Note large VQA training datasets (VQAv2, OKVQA, etc.) typically only have 1-, 2-, or 3-word answers, so these many-step answers are considerably out-of-distribution.
Tweet media one
1
0
1