Boyi Li Profile
Boyi Li

@Boyiliee

Followers: 2K · Following: 468 · Media: 24 · Statuses: 114

Joined March 2020
@Boyiliee
Boyi Li
1 month
I've dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it's a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃 3DHM can generate human videos from a single real or synthetic human
4
38
192
@Boyiliee
Boyi Li
6 months
🚀 Introducing Wolf 🐺: a mixture-of-experts video captioning framework that outperforms GPT-4V and Gemini-Pro-1.5 on general scenes 🖼️, autonomous driving 🚗, and robotics videos 🤖. 👑:
8
63
201
@Boyiliee
Boyi Li
1 year
Super excited to announce our new work: Synthesizing Moving People with 3D Control (3DHM) 💡. Why is 3DHM unique? With 3D Control, 3DHM can animate a random human photo with any poses in a 360-degree camera view and any camera azimuths from any video!
15
53
241
@Boyiliee
Boyi Li
11 months
🚀 Thrilled to share our CVPR 2024 paper: Self-correcting LLM-controlled Diffusion Models (SLD)! SLD can automatically edit any image or fix text-to-image misalignments across any generative model like #DALLE3 and #SDXL - no extra training is needed.
7
42
223
@Boyiliee
Boyi Li
1 year
Can we ask a robot to make boba milk 🧋? Can we ask a robot to make any drinks based on limited task guidelines? 📒 Can we change our minds and interact with the robot during the execution? 🤖 Yes for all! We are excited to announce ITP (Interactive Task Planning with Language
5
47
213
@Boyiliee
Boyi Li
11 months
🚘 Excited to share LLaDA @cvpr #CVPR2024, featured in #GTC2024! LLaDA is a simple yet powerful tool that enables human drivers and autonomous vehicles alike to Drive Everywhere by adapting their tasks and motion plans to traffic rules.
3
32
181
@Boyiliee
Boyi Li
3 years
Semantic segmentation is not limited to a fixed label set! Happy to introduce LSeg, a novel model that can dynamically handle arbitrary label sets on the fly with varying length, content, and order 🌻. Paper: Demo:
4
24
112
@Boyiliee
Boyi Li
1 year
We are happy to announce the first Vision and Language for Autonomous Driving and Robotics (VLADR) workshop at @CVPR 2024! Call for contributions and more details 👇🏻 See you in Seattle! 😃
1
19
90
@Boyiliee
Boyi Li
8 months
🤖 Our "Vision and Language for Autonomous Driving and Robotics" full-day workshop @CVPR will take place next Tuesday. Please check the details here: See you in Seattle!
@Boyiliee
Boyi Li
1 year
Paper on huggingface:
0
9
36
@Boyiliee
Boyi Li
1 year
0
3
26
@Boyiliee
Boyi Li
4 years
WiCV 2021 @CVPR is coming soon (June 19th), please feel free to share this message or send the application link to anyone who needs help 🌳.
@WiCVworkshop
WiCV
4 years
With the generous funding received from our sponsors, we are happy to offer extra spots for the WiCV workshop at the upcoming CVPR. Please fill out this form by Jun 18th at the latest to be considered for the conference registration funding:
0
2
15
@Boyiliee
Boyi Li
10 months
@BoyuanChen0 @MIT @MIT_CSAIL So cute! How about asking a robot to make a boba milk for you 😆.
@Boyiliee
Boyi Li
1 year
Can we ask a robot to make boba milk 🧋? Can we ask a robot to make any drinks based on limited task guidelines? 📒 Can we change our minds and interact with the robot during the execution? 🤖 Yes for all! We are excited to announce ITP (Interactive Task Planning with Language
1
0
12
@Boyiliee
Boyi Li
11 months
LLaDA (Driving Everywhere with Large Language Model Policy Adaptation) has been featured in #GTC2024 @NVIDIAAI @nvidia 😃🚗. Please check our GTC video for more details:
1
0
8
@Boyiliee
Boyi Li
11 months
@NVIDIAAI @nvidia Proudly working with an incredible team @yuewang314, @PointsCoder, @iamborisi, @Veer_Sushant, Karen Leung and @drmapavone @NVIDIAAI @nvidia #GTC2024 ⛳️. 📚 arxiv: 💻 project page: 🌐 related link: @_akhaliq.
@_akhaliq
AK
1 year
Nvidia presents Driving Everywhere with Large Language Model Policy Adaptation. paper page: Adapting driving behavior to new environments, customs, and laws is a long-standing problem in autonomous driving, precluding the widespread deployment of
0
0
8
@Boyiliee
Boyi Li
8 months
@CVPR Glad to share our LLaDA demo! Working with @yuewang314, @PointsCoder, @iamborisi, @Veer_Sushant, Karen Leung and @drmapavone @NVIDIAAI @NVIDIADRIVE.
0
0
8
@Boyiliee
Boyi Li
8 months
We will host keynote talks by @trevordarrell, @JitendraMalikCV, @chelseabfinn, @xf1280, and Long Chen, as well as oral & poster sessions! Co-organizing the workshop with @yuewang314, @zhaohang0124, @JiaweiYang118, @PointsCoder, @SergeBelongie, @FidlerSanja, and @drmapavone!
0
0
8
@Boyiliee
Boyi Li
1 year
@_akhaliq Thanks AK for sharing our work!
1
0
7
@Boyiliee
Boyi Li
11 months
0
0
6
@Boyiliee
Boyi Li
6 months
Video understanding is challenging, and currently, a single VLM cannot capture everything in a video. Wolf aims to produce accurate descriptions of scenes, agent motion, and videos by introducing a novel framework, the CapScore metric, and four human-annotated datasets.
1
0
9
@Boyiliee
Boyi Li
1 year
Super honored to work with @brjathu @YGandelsman Alexei A. Efros, and @JitendraMalikCV! We also thank @geopavlakos @goelshbhm, and @jane_h_wu for the insightful discussions!
1
0
6
@Boyiliee
Boyi Li
1 year
The 3DHM training pipeline is self-supervised and scalable; it can be trained with any human videos. Please visit our website for more information:
1
1
6
@Boyiliee
Boyi Li
3 years
Joint work with @KilianQW, @SergeBelongie, Vladlen Koltun and @ranftlr1 @cs_cornell @cornell_tech @uni_copenhagen @Apple @IntelAI. Code is available at Any feedback is welcome and appreciated.
1
0
5
@Boyiliee
Boyi Li
2 years
@_akhaliq Thanks @_akhaliq for sharing our work!
0
0
3
@Boyiliee
Boyi Li
1 year
💡 Motions from Text. Text Input: A person turns to his right and paces back and forth.
1
0
4
@Boyiliee
Boyi Li
6 months
To evaluate caption quality, we introduce CapScore, an LLM-based metric that assesses the quality and similarity of generated captions to ground-truth captions. We built 4 human-annotated datasets in 3 domains: AV, robotics, and general scenes, to enable comprehensive comparisons.
1
0
3
@Boyiliee
Boyi Li
6 months
Wolf achieves superior captioning performance compared to SOTA methods from the research community (VILA1.5, CogAgent) and commercial solutions (Gemini-Pro-1.5, GPT-4V). For example, Wolf improves caption quality by 55.6% over GPT-4V on challenging driving videos (by CapScore).
1
0
2
@Boyiliee
Boyi Li
4 years
@tkasarla_ @CVPR @WiCVworkshop Very happy to have you this year! Great job!!
0
0
3
@Boyiliee
Boyi Li
6 months
@_akhaliq
AK
6 months
Wolf: Captioning Everything with a World Summarization Framework. We propose Wolf, a WOrLd summarization Framework for accurate video captioning. Wolf is an automated captioning framework that adopts a mixture-of-experts approach, leveraging complementary strengths of
0
0
3
@Boyiliee
Boyi Li
4 years
@ceyda_cinarel Very happy you love them 😃❤️! These are largely inspired by the discussion with @SergeBelongie, @YinCui1 and @TsungYiLin1.
0
0
3
@Boyiliee
Boyi Li
1 year
💡 Motions from Random Videos (Various 3D poses)
1
0
3
@Boyiliee
Boyi Li
11 months
๐Ÿ› ๏ธ Explore the SLD pipeline: a simple yet effective โ€œtraining-freeโ€ approach that combines LLM-integrated detectors for image assessment with base diffusion models for actual image modification.
1
0
3
@Boyiliee
Boyi Li
11 months
Proudly working with an incredible team @tsunghan_wu, @LongTonyLian, @profjoeyg, and @trevordarrell at @berkeley_ai @CVPR #CVPR2024. Excited for this journey in exploring LLMs in content creation! 😃
1
0
2
@Boyiliee
Boyi Li
1 year
0
1
3
@Boyiliee
Boyi Li
6 months
We invite the community to participate in the Wolf Challenge: and use Wolf in their caption pipeline!
1
0
2
@Boyiliee
Boyi Li
6 months
In conclusion, Wolf can (1) generate high-quality captions for challenging autonomous driving videos; (2) provide a general benchmark for video caption quality.
1
0
2
@Boyiliee
Boyi Li
3 years
Our code also includes detailed instructions for trying the demo:
0
0
2
@Boyiliee
Boyi Li
1 year
@jazcollins_ Always enjoy Jasmine's example 😆.
0
0
2
@Boyiliee
Boyi Li
6 months
Further, Wolf can generate high-quality captions to improve multimodal foundation models. VILA-1.5, when fine-tuned with Wolf-provided captions on 500 highly interactive nuScenes videos, significantly outperforms the original VILA-1.5 model.
1
0
2
@Boyiliee
Boyi Li
3 years
@ak92501 Thanks, @ak92501 🌻 We additionally created a short video demo to further showcase the capabilities of LSeg, which can be found here:
0
0
2
@Boyiliee
Boyi Li
11 months
📈 SLD dominates in negation, numeracy, attribute binding, and spatial relationships, outperforming existing models including #DALLE3 on text-to-image generation.
1
0
2
@Boyiliee
Boyi Li
6 months
@muhammedce7in Yes, this would be possible!
1
0
1
@Boyiliee
Boyi Li
1 year
💡 Long-range Motions
1
0
1
@Boyiliee
Boyi Li
2 years
@jbhuang0604 @ml_umd Thanks Jia-Bin!
0
0
1
@Boyiliee
Boyi Li
3 years
@ylzou_Zack Thanks @ylzou_Zack! I really like this question. As mentioned in the paper, due to memory constraints the image encoder predicts pixel embeddings at a lower resolution than the input image resolution.
1
0
1
@Boyiliee
Boyi Li
3 years
@Haz09714143 Thanks for your question and interest in LSeg! We've updated detailed instructions in our code: Please let us know if you have any questions! Any feedback is welcome and appreciated.
0
0
0
@Boyiliee
Boyi Li
9 months
@medhini_n @trevordarrell Congrats! And so pretty!
0
0
1
@Boyiliee
Boyi Li
1 year
@brjathu @YGandelsman @JitendraMalikCV @geopavlakos @goelshbhm @jane_h_wu We hope 3DHM could contribute valuable insights and encourage further advancements in research, with an emphasis on the alignment of pixels with 3D and the integration of video generation with 3D control. 💡
1
0
1
@Boyiliee
Boyi Li
11 months
@rajammanabrolu Very cute 😆.
0
0
1