
Zhenzhi Wang
@zhenzhiwang
Followers
162
Following
11
Media
3
Statuses
16
Ph.D. candidate at MMLab, CUHK. Working on human-centric video generation. Previously NJU CS & Physics.
Sha Tin District, Hong Kong
Joined November 2020
Video generation models can now generate multi-person dialogue videos, or talking videos with human-object interaction (HOI), from a text prompt and N pairs of {cropped reference image (e.g., a head image), audio} without any lip post-processing. Paper: https://t.co/AoIOtYLYbD Demo: https://t.co/c3QfgvHf5i
3
7
19
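A hedged sketch of how inputs like those described above could be organized: a text prompt plus N per-speaker pairs of {cropped reference image, audio}. The class names and fields are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SpeakerCondition:
    reference_image: str  # path to a cropped reference image (e.g., a head crop)
    audio: str            # path to this speaker's audio track

@dataclass
class DialogueRequest:
    prompt: str                       # scene-level text prompt
    speakers: List[SpeakerCondition]  # N per-person {image, audio} pairs

# Example: a two-person dialogue clip conditioned on two {head image, audio} pairs.
request = DialogueRequest(
    prompt="Two researchers discuss a paper at a whiteboard",
    speakers=[
        SpeakerCondition("person_a_head.png", "person_a.wav"),
        SpeakerCondition("person_b_head.png", "person_b.wav"),
    ],
)
print(len(request.speakers), "speakers")
```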
Some random thoughts I've been having about video world models and long video generation since working on Mixture of Contexts (whose title could also be "Learnable Sparse Attention for Long Video Generation"). Semi-long post alert! 1. Learnable sparse attention is still underrated
How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4
6
36
207
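A minimal toy sketch of the general idea behind learnable sparse attention over contexts, as discussed in the two posts above: each query attends only to the top-k most relevant chunks of the history rather than the full sequence. This is not the Mixture of Contexts implementation; the fixed-size chunking, the mean-key scoring, and the top-k selection are all simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def chunked_sparse_attention(q, k, v, chunk_size=64, topk=4):
    # q, k, v: (seq_len, dim). The history is split into fixed-size chunks and
    # each query attends only inside its top-k highest-scoring chunks.
    n_chunks = k.shape[0] // chunk_size
    k_chunks = k[: n_chunks * chunk_size].view(n_chunks, chunk_size, -1)
    v_chunks = v[: n_chunks * chunk_size].view(n_chunks, chunk_size, -1)

    # Score each chunk by similarity between the query and the chunk's mean key.
    chunk_keys = k_chunks.mean(dim=1)                        # (n_chunks, dim)
    scores = q @ chunk_keys.T                                # (seq_len, n_chunks)
    top = scores.topk(min(topk, n_chunks), dim=-1).indices   # (seq_len, topk)

    out = torch.zeros_like(q)
    scale = k.shape[-1] ** 0.5
    for i in range(q.shape[0]):
        sel_k = k_chunks[top[i]].reshape(-1, k.shape[-1])    # keys of selected chunks
        sel_v = v_chunks[top[i]].reshape(-1, v.shape[-1])
        attn = F.softmax((q[i : i + 1] @ sel_k.T) / scale, dim=-1)
        out[i] = (attn @ sel_v).squeeze(0)
    return out

q = k = v = torch.randn(256, 32)
print(chunked_sparse_attention(q, k, v).shape)  # torch.Size([256, 32])
```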
HumanVid's extension to multi-person human image animation has been accepted to ICCV 2025! Thanks to my collaborators @liyixxxuan @zengyh1900 @GuoywGuo @tianfanx @lindahua @doubledaibo Paper: https://t.co/FNhWE4ZBTv Code will be open-sourced soon.
arxiv.org
Generating human videos from a single image while ensuring high visual quality and precise control is a challenging task, especially in complex scenarios involving multiple individuals and...
0
2
5
Inspired by Gen-4's impressive multi-shot results? Check out our recent work on scene-level video generation via Long Context Tuning! Homepage: https://t.co/kp7z3U6wLz Paper:
1
4
18
[1/3] Want to capture a fantastic HDR image with two simple shots on your cellphone? Try UltraFusion HDR. It takes two images with an exposure difference of up to 9 stops and robustly generates an HDR output. Try it on your own captured images (4Kx3K supported): https://t.co/P7V62H3KNE
1
17
66
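For intuition about what fusing a short and a long exposure involves, here is a naive classical exposure-fusion baseline in the spirit of Mertens et al. (well-exposedness weighting). It is explicitly not UltraFusion's learning-based method and will not handle a 9-stop gap robustly; it only illustrates the problem setup.

```python
import numpy as np

def naive_exposure_fusion(short_exp, long_exp, sigma=0.2):
    """Blend two exposures with well-exposedness weights (toy illustration only)."""
    # Both inputs: float images in [0, 1] with identical shape (H, W, 3).
    stack = np.stack([short_exp, long_exp])              # (2, H, W, 3)
    # Pixels near mid-gray get high weight; clipped shadows/highlights get low weight.
    weights = np.exp(-((stack - 0.5) ** 2) / (2 * sigma ** 2))
    weights = weights / (weights.sum(axis=0, keepdims=True) + 1e-8)
    return (weights * stack).sum(axis=0)                 # per-pixel weighted blend

short = np.random.rand(64, 64, 3) * 0.3                     # toy underexposed frame
long = np.clip(np.random.rand(64, 64, 3) + 0.5, 0.0, 1.0)   # toy overexposed frame
fused = naive_exposure_fusion(short, long)
print(fused.shape)  # (64, 64, 3)
```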
Excited to introduce IDArb! Our method can predict plausible and consistent geometry and PBR material for any number of input images under varying illuminations! Webpage: https://t.co/GvfyvbEq25
2
25
76
Excited to attend NeurIPS 2024 during Dec 10-15, with two first-author papers accepted. Hope to have a chat about (human) video generation! Papers: 1: https://t.co/IYAD13rdky 2:
arxiv.org
Human image animation involves generating videos from a character photo, allowing user control and unlocking the potential for video and movie production. While recent approaches yield impressive...
0
0
3
Want to generate camera-controllable human videos like a real movie clip? Try our HumanVid dataset and a baseline model combining AnimateAnyone and CameraCtrl. Project Page: https://t.co/ix2w7jYelN Paper: https://t.co/D8uCejx6KZ Data and code coming soon.
7
37
116
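A hypothetical sketch of the two per-frame conditions such a baseline combines, a driving pose sequence (as in AnimateAnyone) and a camera trajectory (as in CameraCtrl), on top of a single reference image. The names and tensor shapes are assumptions for illustration, not HumanVid's actual data format.

```python
from dataclasses import dataclass
import torch

@dataclass
class HumanVidSample:
    reference_image: torch.Tensor  # (3, H, W) single character photo
    pose_sequence: torch.Tensor    # (T, J, 2) driving 2D keypoints per frame
    camera_poses: torch.Tensor     # (T, 4, 4) camera extrinsics per frame

# Toy example: 48 frames, 18 body keypoints, identity camera for every frame.
sample = HumanVidSample(
    reference_image=torch.rand(3, 512, 512),
    pose_sequence=torch.rand(48, 18, 2),
    camera_poses=torch.eye(4).repeat(48, 1, 1),
)
print(sample.pose_sequence.shape, sample.camera_poses.shape)
```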
We have released the MatrixCityPlugin, render sequences, configs, and related scripts (https://t.co/4Qu5NwJ8ZT):
- collect large-scale and high-quality city data
- control lighting, fog, and human and car crowds
- obtain depth, normal, and decomposed BRDF materials
github.com
[ICCV 2023] MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond. - city-super/MatrixCity
#ICCV2023 We curated MatrixCity, a large-scale, comprehensive, and high-quality synthetic dataset for city-scale neural rendering, built on the UE5 City Sample. @ICCVConference - Project: https://t.co/MwBxJWg4Kw - Paper: https://t.co/Xoix3IadpD
0
42
152
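Purely as illustration of the kind of controls listed above (lighting, fog, crowds, and which ground-truth buffers to export per camera sequence), here is a hypothetical config sketch; the field names are invented and are not the actual MatrixCityPlugin schema.

```python
# Field names below are invented for illustration; they are not the plugin's schema.
render_config = {
    "sequence": "camera_path_block_A.json",       # camera trajectory to replay
    "lighting": {"time_of_day": "noon", "sun_angle_deg": 35},
    "fog_density": 0.02,                          # 0 = clear sky, higher = denser fog
    "crowds": {"humans": True, "cars": True},     # toggle dynamic crowds
    "outputs": ["rgb", "depth", "normal",         # ground-truth buffers to export
                "brdf_albedo", "brdf_roughness"],
    "resolution": [1920, 1080],
}
print(render_config["outputs"])
```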
Yuwei (@GuoywGuo) just released #AnimateDiff v3 and #SparseCtrl, which allow you to animate ONE keyframe, generate a transition between TWO keyframes, and interpolate MULTIPLE sparse keyframes. RGB images and scribbles are supported for now. Github: https://t.co/IeQ5ui4TDC
14
56
312
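A conceptual sketch, not the AnimateDiff/SparseCtrl API: sparse keyframe control can be thought of as conditioning a short clip on images at a few frame indices plus a mask marking which frames are constrained, leaving the model to fill in the rest. The helper below is hypothetical.

```python
import torch

def build_sparse_condition(num_frames, keyframes):
    # keyframes: dict mapping frame index -> (3, H, W) condition image (RGB or scribble)
    h, w = next(iter(keyframes.values())).shape[-2:]
    cond = torch.zeros(num_frames, 3, h, w)  # condition images, zeros elsewhere
    mask = torch.zeros(num_frames, 1, h, w)  # 1 where a keyframe is provided
    for idx, image in keyframes.items():
        cond[idx] = image
        mask[idx] = 1.0
    return cond, mask

# One keyframe animates; two keyframes define a transition; more means interpolation.
cond, mask = build_sparse_condition(
    16, {0: torch.rand(3, 256, 256), 15: torch.rand(3, 256, 256)}
)
print(cond.shape, int(mask[:, 0, 0, 0].sum()))  # 2 conditioned frames
```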
Method: (1) define human interactions as joint contact pairs and let an LLM generate them; (2) train a spatially controllable MDM on every joint that takes contact pairs as its spatial condition. We can generalize to an arbitrary number of humans without any interaction training data.
1
0
2
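A hedged sketch of the contact-pair idea described above: an interaction is a set of joint contact pairs (e.g., person A's right wrist near person B's left shoulder over a frame range), which can be turned into per-frame spatial constraints on generated motion. The data structure and the violation measure below are illustrative assumptions, not InterControl's actual code.

```python
from dataclasses import dataclass
import torch

@dataclass
class ContactPair:
    person_a: int
    joint_a: int     # joint index on person A (e.g., right wrist)
    person_b: int
    joint_b: int     # joint index on person B (e.g., left shoulder)
    start: int       # first frame of the contact
    end: int         # last frame of the contact
    max_dist: float  # allowed distance between the two joints while in contact

def contact_violation(motions, pair):
    # motions: (num_people, frames, joints, 3) generated joint positions
    a = motions[pair.person_a, pair.start : pair.end + 1, pair.joint_a]
    b = motions[pair.person_b, pair.start : pair.end + 1, pair.joint_b]
    dist = (a - b).norm(dim=-1)
    # Penalize only the amount by which the joints exceed the allowed distance.
    return torch.clamp(dist - pair.max_dist, min=0.0).mean()

# Toy example: two people, 120 frames, 22 joints each; one hand-to-shoulder contact.
motions = torch.randn(2, 120, 22, 3)
pair = ContactPair(person_a=0, joint_a=20, person_b=1, joint_b=16,
                   start=20, end=60, max_dist=0.05)
print(contact_violation(motions, pair))
```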
Excited to present our new work, InterControl. TL;DR: We can generate human motion interactions with a spatially controllable MDM that is trained only on single-person data. arxiv: https://t.co/IYAD13rdky code: https://t.co/fwXZdPvrns
1
7
31