Zhenzhi Wang Profile
Zhenzhi Wang

@zhenzhiwang

Followers: 162 · Following: 11 · Media: 3 · Statuses: 16

Ph.D. candidate at MMLab, CUHK, working on human-centric video generation. Previously NJU CS & Physics.

Sha Tin District, Hong Kong
Joined November 2020
@zhenzhiwang
Zhenzhi Wang
3 months
Video generation models can now generate multi-person dialogue videos, or talking videos with HOI (human-object interaction), from text prompts and N pairs of {cropped reference image (e.g., a head image), audio}, without any lip post-processing. Paper: https://t.co/AoIOtYLYbD Demo: https://t.co/c3QfgvHf5i
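The interface described here, a text prompt plus N pairs of {cropped reference image, audio}, can be sketched as a simple input schema. All names below (`PersonInput`, `build_request`, the file paths) are hypothetical illustrations, not the actual API from the linked paper or demo:

```python
from dataclasses import dataclass

@dataclass
class PersonInput:
    """One identity in the generated multi-person video (hypothetical schema)."""
    reference_image: str  # path to a cropped reference image, e.g. a head crop
    audio: str            # path to that person's speech audio track

def build_request(prompt: str, people: list[PersonInput]) -> dict:
    """Bundle a text prompt with N {image, audio} pairs, as the tweet describes."""
    return {
        "prompt": prompt,
        "speakers": [
            {"image": p.reference_image, "audio": p.audio} for p in people
        ],
    }

request = build_request(
    "Two people discuss a paper at a whiteboard",
    [PersonInput("alice_head.png", "alice.wav"),
     PersonInput("bob_head.png", "bob.wav")],
)
```

Since lip sync needs no post-processing step, the request above would be the model's entire per-identity input.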
@prime_cai
Shengqu Cai
2 days
Some random thoughts I've been having about video world models / long video generation since working on Mixture of Contexts (whose title could also be "Learnable Sparse Attention for Long Video Generation"): 🚨 Semi-long Post Alert 🚨 1. Learnable sparse attention is still underrated
@GordonWetzstein
Gordon Wetzstein
19 days
How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4
@zhenzhiwang
Zhenzhi Wang
3 months
HumanVid's extension to multi-person human image animation has been accepted to ICCV 2025! Thanks to collaborators @liyixxxuan @zengyh1900 @GuoywGuo @tianfanx @lindahua @doubledaibo Paper: https://t.co/FNhWE4ZBTv Code will be open-sourced soon.
arxiv.org
Generating human videos from a single image while ensuring high visual quality and precise control is a challenging task, especially in complex scenarios involving multiple individuals and...
@zhenzhiwang
Zhenzhi Wang
1 year
Want to generate camera-controllable human videos like real movie clips? Try our HumanVid dataset and a baseline model combining AnimateAnyone and CameraCtrl. Project Page: https://t.co/ix2w7jYelN Paper: https://t.co/D8uCejx6KZ Data and code coming soon.
@GuoywGuo
Yuwei Guo
6 months
🎯 Inspired by Gen-4's impressive multi-shot results? 🚀 Check out our recent work on scene-level video generation via Long Context Tuning! 🏠 Homepage: https://t.co/kp7z3U6wLz 📄 Paper:
@eyupyusufa
Yusuf Altunbıçak
6 months
8 different scenes, all with the same place and character consistency. Created with Runway Gen-4.
@tianfanx
Tianfan Xue
8 months
[1/3] Want to capture a fantastic HDR image with 2 simple shots on your cellphone? Try UltraFusion HDR. It takes two images with exposure differences of up to 9 stops and robustly generates an HDR output. Try your own captured images (supports 4Kx3K): https://t.co/P7V62H3KNE
@ZhibingLi_6626
Zhibing Li
9 months
🎉 Excited to introduce IDArb! 🎉 Our method can predict plausible and consistent geometry and PBR materials for any number 📷 of input images under varying illuminations ☀️! Webpage: https://t.co/GvfyvbEq25
@zhenzhiwang
Zhenzhi Wang
10 months
Excited to attend NeurIPS 2024, Dec 10th-15th, with two first-author papers accepted. Hope to chat about (human) video generation! Papers: 1: https://t.co/IYAD13rdky 2:
arxiv.org
Human image animation involves generating videos from a character photo, allowing user control and unlocking the potential for video and movie production. While recent approaches yield impressive...
@liyixxxuan
Yixuan Li
2 years
We have released the MatrixCityPlugin, render sequences, configs, and related scripts (https://t.co/4Qu5NwJ8ZT):
- collect large-scale, high-quality city data
- control lighting, fog, and human/car crowds
- obtain depth, normals, and decomposed BRDF materials
github.com
[ICCV 2023] MatrixCity: A Large-scale City Dataset for City-scale Neural Rendering and Beyond. - city-super/MatrixCity
@liyixxxuan
Yixuan Li
2 years
#ICCV2023 We curated MatrixCity, a large-scale, comprehensive, and high-quality synthetic dataset for city-scale neural rendering, based on the UE5 City Sample. @ICCVConference -Project: https://t.co/MwBxJWg4Kw -Paper: https://t.co/Xoix3IadpD
@CeyuanY
Ceyuan Yang
2 years
Yuwei (@GuoywGuo) just released #AnimateDiff v3 and #SparseCtrl, which allow animating ONE keyframe, generating transitions between TWO keyframes, and interpolating MULTIPLE sparse keyframes. RGB images and scribbles are supported for now. GitHub: https://t.co/IeQ5ui4TDC
@zhenzhiwang
Zhenzhi Wang
2 years
Method: (1) define human interactions as human joint contact pairs and let an LLM generate them; (2) train a spatially controllable MDM on every joint that takes contact pairs as spatial conditions. We can generalize to an arbitrary number of humans without any interaction training data.
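The two-step pipeline above (an LLM plans joint contact pairs, which then act as spatial conditions for the motion diffusion model) could be represented minimally as follows. These names (`ContactPair`, `to_spatial_conditions`) are illustrative only, not from the InterControl codebase:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContactPair:
    """A desired contact between joints of two people, e.g. from an LLM plan."""
    person_a: int  # index of the first person
    joint_a: str   # e.g. "right_hand"
    person_b: int  # index of the second person
    joint_b: str
    frame: int     # frame at which the contact should hold

def to_spatial_conditions(pairs: list[ContactPair]) -> dict[int, list[tuple]]:
    """Group contact constraints per frame: the kind of sparse spatial
    condition a controllable motion model could consume as guidance."""
    conditions: dict[int, list[tuple]] = {}
    for p in pairs:
        conditions.setdefault(p.frame, []).append(
            ((p.person_a, p.joint_a), (p.person_b, p.joint_b))
        )
    return conditions

# A hypothetical LLM-generated plan for a two-person handshake:
plan = [
    ContactPair(0, "right_hand", 1, "left_hand", frame=30),
    ContactPair(0, "right_hand", 1, "left_hand", frame=31),
]
conds = to_spatial_conditions(plan)
```

Because each constraint names only person/joint indices, the same representation extends to any number of humans, which is what lets a single-person-trained model generalize.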
@zhenzhiwang
Zhenzhi Wang
2 years
Excited to present our new work, InterControl. TL;DR: We can generate human motion interactions with a spatially controllable MDM that is trained only on single-person data. arXiv: https://t.co/IYAD13rdky code: https://t.co/fwXZdPvrns