![Minghuan Liu Profile](https://pbs.twimg.com/profile_images/1241966191758852096/Ti6fayWO_x96.jpg)
Minghuan Liu
@ericliuof97
Followers: 342 · Following: 344 · Statuses: 102
Ph.D @sjtu1896. Prev: Visit @UCSD at @xiaolonw's lab. Robot Learning, Reinforcement Learning, Imitation Learning.
San Diego, CA
Joined September 2016
This work would not have been possible without our coauthors, Yufei Xue, Wentao Dong, Weinan Zhang, and @pangjiangmiao!
Very impressive! High agility is truly what a powerful humanoid controller needs.
🚀 Can we make a humanoid move like Cristiano Ronaldo, LeBron James, and Kobe Bryant? YES! 🤖 Introducing ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills Website: Code:
@roeiherzig Just took a closer look at the work, very insightful! Sorry for my incorrect conclusion. How best to utilize VLAs is still an open problem for now, and RoboVLMs is indeed still limited as preliminary work.
@YouJiacheng Sorry for my incorrect words. But MoE is another dimension that all four of the current classes of VLAs can be augmented with. If we ignore MoE, we can simply classify the current version of pi0 as one-step continuous. I think we'd better add a limitation discussion for the pi0 case.
@YouJiacheng Your taxonomy of token interaction makes sense. As I mentioned before, this work keeps the attention interaction inside each VLM to support fast integration, which allows it to support a), b), and c). For d), we would need to change the attention mask inside every VLM, which should be studied further.
@YouJiacheng We define the name "interleaved" under a principle of history modeling. I think it may be better to just call it a one-step continuous model (in Fig. 2).
@YouJiacheng Makes sense. I think it may be better classified as a one-step continuous formulation. Will fix it soon in the revision. Also, we have not supported that kind of implementation, because doing so would require changing the forward function specifically for each VLM backbone.
@DavidFSWD The 9B model cannot run directly on a Jetson. But in theory, we could get there with smaller models, distillation, and model compilation techniques if we wanted to.
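The distillation route mentioned above can be sketched roughly. This is a generic knowledge-distillation loss in plain numpy, not anything from the RoboVLMs codebase; the logits, temperature, and shapes are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=2.0):
    """Soft-label KL divergence used in knowledge distillation.

    The student matches the teacher's temperature-softened distribution;
    scaling by T**2 keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits / T)   # teacher soft targets
    q = softmax(student_logits / T)   # student predictions
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T**2)

# Identical logits -> zero loss; diverging logits -> positive loss.
t = np.array([[2.0, 1.0, 0.1]])
print(distill_kl(t, t))               # 0.0
print(distill_kl(t, t[:, ::-1]) > 0)  # True
```

In a real pipeline this term would be minimized w.r.t. the small student model's parameters, usually mixed with the ordinary task loss.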
@YouJiacheng Our RoboVLMs support the pooling and learnable-token formats, and we will add the cross-attention format in the future. But how the output tokens are represented does not change the fact that pi_0 can be classified as a policy-head formulation.
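The two output formats mentioned here can be sketched in a few lines. This is a toy numpy illustration, not the RoboVLMs implementation; the feature sizes and the single-head attention readout are assumptions for clarity:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 8, 16                       # sequence length, feature dim (illustrative)
feats = rng.normal(size=(T, D))    # per-token features from a VLM backbone

# a) Pooling format: average all token features into one action embedding.
pooled = feats.mean(axis=0)        # shape (D,)

# b) Learnable-token format: a trainable query vector attends over the
#    tokens (single-head attention, shown here at random initialization).
query = rng.normal(size=(D,))      # the learnable "action token"
attn = np.exp(feats @ query / np.sqrt(D))
attn /= attn.sum()                 # attention weights over the T tokens
readout = attn @ feats             # shape (D,)

print(pooled.shape, readout.shape)  # (16,) (16,)
```

Either way, the resulting vector would then feed a policy head that decodes actions, which is why the output-token format is orthogonal to the policy-head classification.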