![Tianxing Wu@ECCV Milan Profile](https://pbs.twimg.com/profile_images/1671038315259367425/cujdXRbj_x96.jpg)
Tianxing Wu@ECCV Milan
@_tianxing
Followers
119
Following
192
Statuses
35
Interested in improving the inference quality of diffusion models? Check out our #FreeInit poster at No. 226 from 10:30am to 12:30pm tomorrow (Tuesday 1 Oct). Will be happy to discuss video diffusion models, noise initialization and more! #ECCV2024
📢Another Free Lunch for GenAI📢 We propose #FreeInit, a sampling strategy that improves the temporal consistency of video generation at inference time. It requires no training and can be plugged into any video diffusion model - Project: - Code:
0
3
21
RT @JingkangY: Carry your cool backpack 🎒 and come to visit us in the October 1 and 2 afternoon sessions! Octopus 🐙 - a vision-language m…
0
3
0
RT @liuziwei7: 📢#ECCV24 Welcome to check out our multimodal GenAI work @eccvconf📢 * Generative Models: - LGM: - L…
0
14
0
RT @liuziwei7: 🔥Video Generation with Image Prompts🔥 #CVPR2024 We propose *video generation with image prompts* 📽️VideoBooth📽️, providing…
0
29
0
RT @_akhaliq: FreeInit : Bridging Initialization Gap in Video Diffusion Models @Gradio demo is out on @huggingface demo:
0
24
0
The @Gradio demo for #FreeInit is now out on @huggingface🤗! Feel free to try it out and play around with the parameters :) HF demo link:
FreeInit: Bridging Initialization Gap in Video Diffusion Models paper page: Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics. In this paper, we delve deep into the noise initialization of video diffusion models, and discover an implicit training-inference gap that contributes to the unsatisfactory inference quality. Our key findings are: 1) the spatial-temporal frequency distribution of the initial latent at inference is intrinsically different from that at training, and 2) the denoising process is significantly influenced by the low-frequency components of the initial noise. Motivated by these observations, we propose a concise yet effective inference sampling strategy, FreeInit, which significantly improves the temporal consistency of videos generated by diffusion models. By iteratively refining the spatial-temporal low-frequency components of the initial latent during inference, FreeInit is able to compensate for the initialization gap between training and inference, thus effectively improving the subject appearance and temporal consistency of generation results. Extensive experiments demonstrate that FreeInit consistently enhances the generation results of various text-to-video generation models without additional training.
1
10
46
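The core frequency-domain operation described in the abstract — keeping the refined low-frequency components of the initial latent while renewing its high frequencies with fresh Gaussian noise — can be sketched as below. This is a simplified illustration, not the paper's implementation: it uses an ideal (box) low-pass mask in NumPy over a single-channel latent, whereas the actual method applies the filtering to diffused video latents inside a diffusion sampler and may use a smoother filter shape.

```python
import numpy as np

def freq_mix_3d(z, noise, cutoff=0.25):
    """Blend the low frequencies of latent `z` with the high frequencies
    of fresh `noise` in the 3D (time, height, width) FFT domain.
    Simplified sketch of FreeInit-style noise reinitialization."""
    axes = (0, 1, 2)
    # Move both tensors to the (centered) frequency domain
    Z = np.fft.fftshift(np.fft.fftn(z, axes=axes), axes=axes)
    N = np.fft.fftshift(np.fft.fftn(noise, axes=axes), axes=axes)
    # Ideal (box) low-pass mask: 1 inside the cutoff radius, 0 outside
    T, H, W = z.shape
    t = np.linspace(-1, 1, T)[:, None, None]
    h = np.linspace(-1, 1, H)[None, :, None]
    w = np.linspace(-1, 1, W)[None, None, :]
    lpf = (t**2 + h**2 + w**2) <= cutoff**2
    # Keep z's low frequencies; take high frequencies from fresh noise
    mixed = Z * lpf + N * (1 - lpf)
    out = np.fft.ifftn(np.fft.ifftshift(mixed, axes=axes), axes=axes)
    return out.real
```

With `cutoff` large enough to cover the whole spectrum the function returns `z` unchanged; with `cutoff=0` it returns (almost) pure fresh noise, so the parameter controls how much low-frequency structure is carried over between passes.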
RT @hamadakoichi: Anime Synthesis with FreeInit + AnimateDiff #FreeInit #AnimateDiff #StableDiffusion #AIAnime #AIArt
0
8
0
@DigThatData @_akhaliq Yeah, it looks like this partial renoising strategy may have some similar functionality. Very interesting. Thanks for the discussion! I think the training-inference gap is the core problem, and other, better approaches can be developed to address it.
0
0
0
@DigThatData @_akhaliq In the ablation, we show that using multiple passes w/o noise reinitialization can indeed improve consistency, but not as effectively. From my experience, multiple passes alone cannot fix some extreme inconsistencies, and this is exactly where the FFT operations come into play. Please refer to Fig. A11 for details
0
0
0
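The multi-pass procedure discussed in the ablation thread above can be sketched as the following loop. The `denoise`, `rediffuse`, and `freq_mix` callables are hypothetical placeholders standing in for a real diffusion sampling pass, the forward-diffusion step back to the noise level, and the low-frequency mixing operation; only the control flow mirrors the described strategy.

```python
import numpy as np

def freeinit_loop(denoise, rediffuse, freq_mix, shape, n_iters=3, seed=0):
    """Sketch of an iterative FreeInit-style inference loop.
    Each pass re-noises the previous result and replaces its high
    frequencies with fresh Gaussian noise, progressively refining the
    low-frequency structure of the initial latent."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)            # initial noise z_T
    for _ in range(n_iters):
        x0 = denoise(z)                       # full sampling pass -> clean latent
        z_noisy = rediffuse(x0)               # forward-diffuse back to noise level
        fresh = rng.standard_normal(shape)    # fresh i.i.d. Gaussian noise
        z = freq_mix(z_noisy, fresh)          # keep low freqs, renew high freqs
    return denoise(z)                         # final sampling pass
```

Without the `freq_mix` step (i.e. `freq_mix = lambda a, b: a`), this degenerates to the plain multi-pass baseline the ablation compares against.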
RT @op7418: Nanyang Technological University has released FreeInit, a method that greatly improves content consistency in AI video generation; the demos look very smooth. It can also be combined with the existing SD ecosystem. They have also released a way to combine it with AnimateDiff, so once someone builds a plugin it will be ready to use. The video compares results with and without FreeInit…
0
24
0
RT @liuziwei7: 📢Another Free Lunch for GenAI📢 We propose #FreeInit, a sampling strategy to improve temporal consistency of video generatio…
0
13
0
@Truongtoc1980 @_akhaliq Also, our core observation is: the quality of the initial noise's low-frequency components is crucial for consistency. FreeInit is just one concise attempt; I think more approaches can be developed to address this problem.
0
0
1