Boyi Li Profile
Boyi Li

@Boyiliee

Followers: 2K · Following: 468 · Media: 24 · Statuses: 114

Joined March 2020
@Boyiliee
Boyi Li
1 month
I've dreamt of creating a tool that could animate anyone with any motion from just ONE image… and now it's a reality! 🎉 Super excited to introduce updated 3DHM: Synthesizing Moving People with 3D Control. 🕺💃 3DHM can generate human videos from a single real or synthetic human
4
38
192
@Boyiliee
Boyi Li
6 months
🚀 Introducing Wolf 🐺: a mixture-of-experts video captioning framework that outperforms GPT-4V and Gemini-Pro-1.5 on general scenes 🖼️, autonomous driving 🚗, and robotics videos 🤖. 👑:
8
63
201
@Boyiliee
Boyi Li
1 year
Super excited to announce our new work: Synthesizing Moving People with 3D Control (3DHM) 💡. Why is 3DHM unique? With 3D Control, 3DHM can animate a random human photo with any poses in a 360-degree camera view and any camera azimuths from any video!
15
53
241
@Boyiliee
Boyi Li
11 months
🚀 Thrilled to share our CVPR 2024 paper: Self-correcting LLM-controlled Diffusion Models (SLD)! SLD can automatically edit any image or fix text-to-image misalignments across any generative model like #DALLE3 and #SDXL - no extra training is needed.
7
42
223
@Boyiliee
Boyi Li
1 year
Can we ask a robot to make boba milk 🧋? Can we ask a robot to make any drinks based on limited task guidelines? 📒 Can we change our minds and interact with the robot during the execution? 🤖 Yes for all! We are excited to announce ITP (Interactive Task Planning with Language
5
47
213
@Boyiliee
Boyi Li
11 months
🚘 Excited to share LLaDA @cvpr #CVPR2024, featured in #GTC2024! LLaDA is a simple yet powerful tool that enables human drivers and autonomous vehicles alike to Drive Everywhere by adapting their tasks and motion plans to traffic rules.
3
32
181
@Boyiliee
Boyi Li
3 years
Semantic segmentation is not limited to a fixed label set! Happy to introduce LSeg, a novel model that can dynamically handle arbitrary label sets on the fly with varying length, content, and order 🌻. Paper: Demo:
4
24
112
@Boyiliee
Boyi Li
1 year
We are happy to announce the first Vision and Language for Autonomous Driving and Robotics (VLADR) workshop at @CVPR 2024! Call for contributions and more details 👇🏻 See you in Seattle! 😃
1
19
90
@Boyiliee
Boyi Li
8 months
🤖 Our "Vision and Language for Autonomous Driving and Robotics" full-day workshop @CVPR will take place next Tuesday. Please check the details here: See you in Seattle!
@Boyiliee
Boyi Li
1 year
Paper on huggingface:
0
9
36
@Boyiliee
Boyi Li
1 year
0
3
26
@Boyiliee
Boyi Li
4 years
WiCV 2021 @CVPR is coming soon (June 19th), please feel free to share this message or send the application link to anyone who needs help 🌳.
@WiCVworkshop
WiCV
4 years
With the generous funding received from our sponsors, we are happy to offer extra spots for the WiCV workshop at the upcoming CVPR. Please fill out this form by Jun 18th at the latest to be considered for the conference registration funding:
0
2
15
@Boyiliee
Boyi Li
10 months
@BoyuanChen0 @MIT @MIT_CSAIL So cute! How about asking a robot to make a boba milk for you 😆.
@Boyiliee
Boyi Li
1 year
Can we ask a robot to make boba milk 🧋? Can we ask a robot to make any drinks based on limited task guidelines? 📒 Can we change our minds and interact with the robot during the execution? 🤖 Yes for all! We are excited to announce ITP (Interactive Task Planning with Language
1
0
12
@Boyiliee
Boyi Li
11 months
LLaDA (Driving Everywhere with Large Language Model Policy Adaptation) has been featured in #GTC2024 @NVIDIAAI @nvidia 😃🚗. Please check our GTC video for more details:
1
0
8
@Boyiliee
Boyi Li
11 months
@NVIDIAAI @nvidia Proudly working with an incredible team @yuewang314, @PointsCoder, @iamborisi, @Veer_Sushant, Karen Leung and @drmapavone @NVIDIAAI @nvidia #GTC2024 ⛳️. 📚 arxiv: 💻 project page: 🌐 related link: @_akhaliq.
@_akhaliq
AK
1 year
Nvidia presents Driving Everywhere with Large Language Model Policy Adaptation. paper page: Adapting driving behavior to new environments, customs, and laws is a long-standing problem in autonomous driving, precluding the widespread deployment of
0
0
8
@Boyiliee
Boyi Li
8 months
@CVPR Glad to share our LLaDA demo! Working with @yuewang314, @PointsCoder, @iamborisi, @Veer_Sushant, Karen Leung and @drmapavone @NVIDIAAI @NVIDIADRIVE.
0
0
8
@Boyiliee
Boyi Li
8 months
We will host keynote talks by @trevordarrell, @JitendraMalikCV, @chelseabfinn, @xf1280, and Long Chen, as well as oral & poster sessions! Co-organizing the workshop with @yuewang314, @zhaohang0124, @JiaweiYang118, @PointsCoder, @SergeBelongie, @FidlerSanja, and @drmapavone!
0
0
8
@Boyiliee
Boyi Li
1 year
@_akhaliq Thanks AK for sharing our work!
1
0
7
@Boyiliee
Boyi Li
11 months
0
0
6
@Boyiliee
Boyi Li
6 months
Video understanding is challenging, and currently, a single VLM cannot capture everything in a video. Wolf aims to produce accurate descriptions of scenes, agent motion, and videos by introducing a novel framework, the CapScore metric, and four human-annotated datasets.
1
0
9
@Boyiliee
Boyi Li
1 year
Super honored to work with @brjathu @YGandelsman Alexei A. Efros, and @JitendraMalikCV! We also thank @geopavlakos @goelshbhm, and @jane_h_wu for the insightful discussions!
1
0
6
@Boyiliee
Boyi Li
1 year
The 3DHM training pipeline is self-supervised and scalable; it can be trained with any human videos. Please visit our website for more information:
1
1
6
@Boyiliee
Boyi Li
3 years
Joint work with @KilianQW, @SergeBelongie, Vladlen Koltun and @ranftlr1 @cs_cornell @cornell_tech @uni_copenhagen @Apple @IntelAI. Code is available at Any feedback is welcome and appreciated.
1
0
5
@Boyiliee
Boyi Li
2 years
@_akhaliq Thanks @_akhaliq for sharing our work!
0
0
3
@Boyiliee
Boyi Li
1 year
💡 Motions from Text. Text Input: A person turns to his right and paces back and forth.
1
0
4
@Boyiliee
Boyi Li
6 months
To evaluate caption quality, we introduce CapScore, an LLM-based metric that assesses the quality and similarity of generated captions to ground-truth captions. We built 4 human-annotated datasets in 3 domains: AV, robotics, and general scenes, to enable comprehensive comparisons.
1
0
3
@Boyiliee
Boyi Li
6 months
Wolf achieves superior captioning performance compared to SOTA methods from the research community (VILA1.5, CogAgent) and commercial solutions (Gemini-Pro-1.5, GPT-4V). For example, Wolf improves caption quality by 55.6% over GPT-4V on challenging driving videos (by CapScore).
1
0
2
@Boyiliee
Boyi Li
4 years
@tkasarla_ @CVPR @WiCVworkshop Very happy to have you this year! Great job!!
0
0
3
@Boyiliee
Boyi Li
6 months
@_akhaliq
AK
6 months
Wolf: Captioning Everything with a World Summarization Framework. We propose Wolf, a WOrLd summarization Framework for accurate video captioning. Wolf is an automated captioning framework that adopts a mixture-of-experts approach, leveraging complementary strengths of
0
0
3
@Boyiliee
Boyi Li
4 years
@ceyda_cinarel Very happy you love them 😃❤️! These are largely inspired by the discussion with @SergeBelongie, @YinCui1 and @TsungYiLin1.
0
0
3
@Boyiliee
Boyi Li
1 year
💡 Motions from Random Videos (Various 3D poses)
1
0
3
@Boyiliee
Boyi Li
11 months
๐Ÿ› ๏ธ Explore the SLD pipeline: a simple yet effective โ€œtraining-freeโ€ approach that combines LLM-integrated detectors for image assessment with base diffusion models for actual image modification.
1
0
3
@Boyiliee
Boyi Li
11 months
Proudly working with an incredible team @tsunghan_wu, @LongTonyLian, @profjoeyg, and @trevordarrell at @berkeley_ai @CVPR #CVPR2024. Excited for this journey in exploring LLMs in content creation! 😃
1
0
2
@Boyiliee
Boyi Li
1 year
0
1
3
@Boyiliee
Boyi Li
6 months
We invite the community to participate in the Wolf Challenge: and use Wolf in their caption pipeline!
1
0
2
@Boyiliee
Boyi Li
6 months
In conclusion, Wolf can (1) generate high-quality captions for challenging autonomous driving videos; (2) provide a general benchmark for video caption quality.
1
0
2
@Boyiliee
Boyi Li
3 years
Our code also includes detailed instructions for trying the demo:
0
0
2
@Boyiliee
Boyi Li
1 year
@jazcollins_ Always enjoy Jasmine's example 😆.
0
0
2
@Boyiliee
Boyi Li
6 months
Further, Wolf can generate high-quality captions to improve multimodal foundation models. VILA-1.5, when fine-tuned with Wolf-provided captions on 500 highly interactive nuScenes videos, significantly outperforms the original VILA-1.5 model.
1
0
2
@Boyiliee
Boyi Li
3 years
@ak92501 Thanks, @ak92501 🌻 We additionally created a short video demo to further showcase the capabilities of LSeg, which can be found here:
0
0
2
@Boyiliee
Boyi Li
11 months
📈 SLD dominates in negation, numeracy, attribute binding, and spatial relationships, outperforming existing models including #DALLE3 on text-to-image generation.
1
0
2
@Boyiliee
Boyi Li
6 months
@muhammedce7in Yes, this would be possible!
1
0
1
@Boyiliee
Boyi Li
1 year
💡 Long-range Motions
1
0
1
@Boyiliee
Boyi Li
2 years
@jbhuang0604 @ml_umd Thanks Jia-Bin!
0
0
1
@Boyiliee
Boyi Li
3 years
@ylzou_Zack Thanks @ylzou_Zack! I really like this question. As mentioned in the paper, due to memory constraints the image encoder predicts pixel embeddings at a lower resolution than the input image resolution.
1
0
1
@Boyiliee
Boyi Li
3 years
@Haz09714143 Thanks for your question and interest in LSeg! We've updated detailed instructions in our code: Please let us know if you have any questions! Any feedback is welcome and appreciated.
0
0
0
@Boyiliee
Boyi Li
9 months
@medhini_n @trevordarrell Congrats! And so pretty!
0
0
1
@Boyiliee
Boyi Li
1 year
@brjathu @YGandelsman @JitendraMalikCV @geopavlakos @goelshbhm @jane_h_wu We hope 3DHM could contribute valuable insights and encourage further advancements in research, with an emphasis on the alignment of pixels with 3D and the integration of video generation with 3D control. 💡
1
0
1
@Boyiliee
Boyi Li
11 months
@rajammanabrolu Very cute 😆.
0
0
1