Generative video models are rapidly improving in quality. Meet Replay, a new AI model that can generate stunning videos from text.
Replay v0.1 is designed to create ultra-smooth HD videos with a new interface. Available today for everyone.
What's New?
1. Replay understands plain
Excited to present VectorFusion at #CVPR2023, the first diffusion model for text-to-SVG generation 🖊️
Find me at the Tuesday AM session, Poster 182
We are also hiring exceptional engineers and researchers to build generative models at @genmoai. DM to talk about roles, diffusion
Check out our new paper - we put NeRF on a diet! Given just 1 to 8 images, DietNeRF renders consistent novel views of an object using prior knowledge from large visual encoders like CLIP ViT.
w/ Matthew Tancik, @pabbeel
1/
Check out #DreamFusion: our paper on AI-based text-to-3D generation! Just take a look at these synthetic robots 🤖🤖🤖
This draws on years of work from our fabulous team on diffusion models and neural rendering, and I'm so excited for what comes next.
Happy to announce DreamFusion, our new method for Text-to-3D!
We optimize a NeRF from scratch using a pretrained text-to-image diffusion model. No 3D data needed!
Joint work w/ the incredible team of @BenMildenhall @ajayj_ @jon_barron #dreamfusion
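For readers wondering how a NeRF can be trained with no 3D data: the paper's Score Distillation Sampling loss differentiates a frozen text-to-image diffusion model through the rendering. As stated in the DreamFusion paper, its gradient is

$$ \nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, x = g(\theta)) = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t;\, y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \right] $$

where $g(\theta)$ renders the NeRF, $x_t$ is the rendering with noise $\epsilon$ added at timestep $t$, $\hat{\epsilon}_\phi$ is the diffusion model's noise prediction for caption $y$, and $w(t)$ is a weighting function. Notably, the gradient omits the diffusion U-Net Jacobian, so only the renderer is differentiated.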
It's alive! Since there's interest, I released my code for automatically generating SVG vector graphics from textual captions: . Based on @tzumaoli's diffvg and @OpenAI's CLIP. Caption: "a painting of an evergreen tree"
PixelCNNs generate images pixel-by-pixel in a fixed order. Can we choose the order? Yes! We propose Locally Masked Convolution: a simple, efficient operation for arbitrary order training+testing & more accurate likelihoods.
Paper w/ @pathak2206 @pabbeel
1/8
Wow! DreamFusion has been given the Outstanding Paper award at #iclr2023
Huge congratulations to my co-authors @poolio @BenMildenhall @jon_barron, and thank you to the conference organizers and reviewers for the feedback and recognition!
New work on autoregressive generative models! We improve the expressiveness of your favorite continuous autoreg models like Trajectory Transformer, WaveNet and Image GPT with Adaptive Categorical Discretization. See below for code. Paper at #UAI2022
Check out our latest work on generative modeling with Adaptive Categorical Discretization (AdaCat)!
AdaCat generalizes uniform discretization and can improve existing autoregressive models on density estimation for tabular data, images, audio, and trajectories in RL. Thread: 1/N
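As a rough illustration of the core idea (my own minimal sketch, not the paper's parameterization): a 1-D distribution on [0, 1) can be modeled as a piecewise-uniform density whose bin widths *and* bin masses are both learned, so probability mass can concentrate in narrow bins where the data is dense.

```python
import numpy as np

def adacat_density(x, width_logits, mass_logits):
    """Piecewise-uniform density on [0, 1) with learned, adaptive bins.

    Bin widths and bin masses are both softmax-normalized, so the
    density integrates to 1 by construction.
    """
    widths = np.exp(width_logits) / np.exp(width_logits).sum()
    masses = np.exp(mass_logits) / np.exp(mass_logits).sum()
    edges = np.concatenate([[0.0], np.cumsum(widths)])
    k = int(np.searchsorted(edges, x, side="right")) - 1
    k = min(max(k, 0), len(widths) - 1)   # clamp for x on the boundary
    return masses[k] / widths[k]

# With all-zero logits every bin has equal width and mass,
# recovering the uniform density:
adacat_density(0.5, np.zeros(4), np.zeros(4))  # → 1.0
```

Uniform discretization is the special case of equal widths; learning the widths is what makes the categorical "adaptive".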
Use AI to Paint a Video with @genmoai
1️⃣ Upload a photo 📷
2️⃣ Paint over it and write a caption
3️⃣ One minute later, it's animated
🎥🖌️ New video inpainting model out now
I've heard several people say they're no longer surprised by new capabilities of generative models. Tough crowd! What would personally surprise or delight you?
I'll be giving my dissertation talk today at 3-4 PM Pacific Time on Transferable Generative Models. Send me a DM if you'd be interested in joining the Zoom.
Modern generative models feel like powerful alien technologies that crash landed on Earth. Yet, they’re based on surprisingly simple concepts. I’m glad to have contributed to the popularization of diffusion models with our 2020 DDPM paper, and am so excited for progress to come.
Generative models (such as Dall-E 2 and PaLM) are becoming just such an insanely powerful, almost magic-like technology, it's completely NUTS. And it seems like most (non-ML) people still don't fully grasp the implications. This technology will thoroughly transform society.
What if your IDE crashed with specific variable names? 😱
#ML4Code is powerful, but models like BERT are sensitive to adversarial code edits, performing worse than random guessing! We find that contrastive learning is more robust. Check out ContraCode: 👇
Having a blast in New Orleans! Crazy to think that we presented DDPM as a virtual poster just 3 years ago at NeurIPS 2020. In person food is much better 😋 Check out these beignets I made in a few minutes on Genmo.
Come find / DM me to chat about big diffusion models for video.
Meet @GenmoAI. We help you create media in the formats you need to tell your stories. Try Genmo today with hilarious and immersive text-to-video generation 🎬
Announcing Genmo Video, a generative media platform with a new text-to-video model that can generate immersive live artwork from any prompt or any image.
What will you create? 🎨▶️
Free public access:
Discord:
👇1/n
Controllability is a major problem for generative models: it's hard to find the exact result you're looking for in the latent space. Today, we announced Replay Camera Controls to let creators take the reins on generative video.
Camera movement drives emotion, pace, and mood in filmmaking.
Announcing Replay Camera Controls. Starting today, zoom, pan, roll and tilt the virtual camera to render gorgeous, swooping effects. 💫
Yesterday, you met Replay, our new, ultra-crisp video model. Now, take even more
Generate mesmerizing videos with diffusion on a single GPU! Try out "Journey to the BAOAB-limit: finding effective MCMC samplers for score-based models" at . Joint work with the awesome @poolio. Caption: "a DSLR photo of a large basket of rainbow macarons."
PyTorch aficionados: what is your process for locating the cause of NaNs, especially ones that only seem to occur in the backward pass? detect_anomaly tells me MulBackward0 returns NaNs, but the torchviz graph contains 38 ops with that name 😅
DietNeRF has been accepted to #ICCV2021! 🍾 Special thanks to collaborators Matthew Tancik and @pabbeel, to reviewers, and to all those who gave feedback including @aditij for coming up with the name! We’ll be at the conference in October.
Paper:
I'll be at NeurIPS in New Orleans this week, Tuesday night onward. Looking forward to catching up with old friends and meeting new folks! Let me know if you'd like to chat
I’ll be at NeurIPS this year in New Orleans. Excited to cheer on my collaborators @AleEscontrela @AdemiAdeniji during the Wednesday morning poster session. In VIPER, we use video generative models to train reinforcement learning agents. Come find us at poster 1412 or DM to meet.
Today we're releasing Video Prediction Rewards (VIPER 🐍), a simple yet powerful method for extracting rewards from video prediction models!
VIPER learns reward functions from raw videos, and generalizes to entirely new domains for which no training data is available
🧵 thread
Fun application: Our Locally Masked PixelCNN can generate images along a Hilbert space-filling curve. The Hilbert ordering is resolution agnostic and ensures that consecutively generated pixels are nearby! Video of unconditional generation along Hilbert curve: 2/8
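For intuition, here's a minimal sketch (my own, not the released code) of the standard index-to-coordinate Hilbert-curve mapping that yields such a generation order; consecutive indices always land on adjacent pixels, at any power-of-two resolution:

```python
def hilbert_d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid.

    n must be a power of two. Consecutive d values map to adjacent
    cells, so consecutively generated pixels are spatial neighbors.
    """
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the sub-quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx                      # move into the right quadrant
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Generation order for an 8x8 image:
order = [hilbert_d2xy(8, d) for d in range(64)]
```

Every cell is visited exactly once, and each step moves to a neighboring cell, which is the locality property the tweet highlights.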
🖼️ Hot off the presses: V2 image generation! We've been hard at work on Genmo's new text-to-image model. It generates gorgeous 1024x1024 pictures, with improved coherence and style.
Check out our community's results at . Who wants access?
Last week, I presented Locally Masked Convolution for Autoregressive Models at #UAI2020. It was a great conference with insightful conversations! You can read our paper at , or see the talk at . Joint work with @pabbeel @pathak2206
We build some crazy infrastructure at Genmo to improve user experience. Today, we released streaming previews. Right after submitting a video to our GPU cluster, Genmo sends users renders of their AI videos. It's super fun to play with. Try queuing up to 4 videos at a time and
Replay now streams AI-generated videos for fast iteration. Your clips will start rendering and streaming almost as soon as you click "submit".
Genmo's infrastructure ships pixels straight off the GPU to your computer screen 📦
Thanks @shaneguML! My Twitter profile photo was generated with Score Distillation Sampling. We can also sample images by optimizing 2D Fourier Feature Net weights, and tried some early experiments using Mitsuba 3 as the differentiable renderer.
Besides incredible results, a great contribution of this work is to showcase the potential of Score Distillation Sampling (SDS): "SDS allows us to optimize samples in an arbitrary parameter space"
"Text-image diffusion as a prior + X" has so many applications to come!
.@genmoai releases Replay, a new AI model that can generate videos from text
blog:
Replay is powered by aligning a new diffusion model for videos with the LLM behind Genmo Chat. Replay excels at generating high-fidelity visual media without expert
Fast multicloud data transfer is going to be important for synthesis and representation learning: we need to move TB of data quickly and cheaply to the accelerators. Looking forward to trying this. Great work @_parasj!
Releasing Skyplane, a new open-source tool to move huge datasets between clouds.
Skyplane is:
1. 🔥 Blazing fast (110x faster)
2. 🤑 Cheap (4x cheaper)
3. 🌐 Universal (AWS, Azure and GCP)
Read more:
1/
Very exciting progress on high-resolution image generation through denoising diffusion probabilistic models, a collaboration with @hojonathanho @pabbeel
New paper on diffusion probabilistic models with @ajayjain318 & @pabbeel:
Likelihood-based generative model with SOTA FID=3.17 on unconditional CIFAR10, and ProgressiveGAN-like quality on 256x256 LSUN & CelebA-HQ (sometimes generates dataset watermarks!)
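For context, the simplified training objective from the DDPM paper is just noise prediction under a fixed forward noising schedule:

$$ \mathcal{L}_{\mathrm{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}\!\left[ \left\lVert \epsilon - \epsilon_\theta\!\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,\; t\big) \right\rVert^2 \right] $$

where $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$ accumulates the noise schedule. Sampling then starts from pure Gaussian noise and iteratively denoises, one timestep at a time.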
Text-to-3D-to-Image! I ran the output of DreamFusion through #stablediffusion #img2img. This workflow would be amazing for generating artwork with precise control over the subject. #dreamfusion
How do drivers predict future pedestrian behavior? Our map-aware model, the Discrete Residual Flow Network, predicts multimodal and uncertain pedestrian behaviors over long time horizons. We'll be giving a talk at #CoRL2019! Paper:
Replay FX gives generative AI videos very cool kaleidoscopic effects🛟🌸
We introduced a library of six customizable effects on our website. FX come with controllable sliders to adjust their strength and playback speed
Check out our paper @NeurIPSConf 2020: Sparse Graphical Memory for Robust Planning. Allows an RL agent to build, abstract and plan over a memory of previous observations, really helping with robustness of long-horizon navigation.
New paper coming up at @NeurIPSConf - Sparse Graphical Memory for Robust Planning uses state abstractions to improve long-horizon navigation tasks from pixels!
Paper:
Site:
Co-led by @emmons_scott, @ajayj_, and myself.
[1/N]
Wonderful work all! The generative modeling space continues to be very exciting. These architectural and sampling insights should be useful for other models - especially looking forward to testing dynamic thresholding and the efficient U-Net soon.
Introducing Imagen, a new text-to-image synthesis model that can generate high-fidelity, photorealistic images from a deep level of language understanding. Learn more and check out some examples of #imagen at
We just released AITemplate -- a high-performance Inference Engine -- similar to TensorRT but open-source.
It is really fast!
On StableDiffusion, it is 2.5x faster than the XLA based version released last week.
Great work @Michaelvll1 @_parasj and friends! A nice approach for combining local reasoning over a graph with global self-attention. The GNN may be learning a positional embedding based on local graph structure.
"Representing Long-Range Context for Graph Neural Networks with Global Attention" by @_parasj @Michaelvll1 et al. at #NeurIPS2021
Strong results on graph classification when combining GNN --> Transformers.
PDF:
Genmo text-to-video
Text prompts + camera motion
1) aerial view of a cosy cottage, snow fall, christmas
2) christmas tree in a cosy cottage
3) long hallway, cosy cottage, christmas decorations
4) top view, celebratory meal, dinner table, cosy cottage, christmas
5) Full moon,
This is a great application of DDPM by @cnxhk, and it's impressive that only 6 iterations are needed during sampling. Generative models should scale with data complexity rather than dimensionality!
WaveGrad by @cnxhk et al is a clever new neural net for generating speech. It starts with random noise, and iteratively denoises, essentially hallucinating a voice in the noise. Listen to the voice emerge, as the number of iterations increases with time 🔉
New paper on contrastive learning for programming languages with @_parasj @tianjun_zhang @pabbeel @mejoeyg Ion Stoica! ContraCode learns similar representations for equivalent but differently implemented JavaScript programs using compiler-based data augmentations #ml4code
Can contrastive learning be applied to text? Yes! We propose Contrastive Code Representation Learning: an unsupervised pretext task to learn representations of program *functionality*. Paper w/ @ajayj_ Tianjun Zhang @pabbeel @mejoeyg Ion Stoica 👇 1/7
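The pretext task boils down to an InfoNCE-style loss: embeddings of compiler-augmented variants of the same program are pulled together, all other programs in the batch pushed apart. A minimal numpy sketch of that loss (my own illustration of the objective, not ContraCode's actual MoCo-style training code):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: each anchor should match its own positive
    against all other positives in the batch (in-batch negatives)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # correct pair on the diagonal

rng = rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))  # matched pairs
mismatched = info_nce(z, z[::-1].copy())                    # shuffled pairs
# matched pairs give a much lower loss than mismatched ones
```

Here the "augmented variant" is simulated by a small perturbation; in ContraCode the positives come from semantics-preserving source transformations.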
Chat with us tomorrow at #NeurIPS2020 during the 9-11 AM PST poster session! You can check out the talk or teleport to our poster from . We're in Town A0, Spot B0.
Our LMConv layer is cheap to evaluate and easy to implement in pure PyTorch by masking the im2col matrix before matrix multiplication.
Code is open-source at 4/8
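Conceptually (a simplified numpy sketch of the idea, written as an explicit per-patch loop rather than the actual masked-matrix-multiply release): for each output pixel, zero out every input pixel that appears at or after it in the chosen generation order, so the convolution respects an arbitrary autoregressive ordering.

```python
import numpy as np

def locally_masked_conv(img, weight, order):
    """3x3 convolution where, for each output pixel, input pixels at or
    after it in `order` are masked out of the receptive field.

    img:    (H, W) array
    weight: (3, 3) kernel
    order:  (H*W,) permutation giving the pixel generation order
    """
    H, W = img.shape
    rank = np.empty(H * W, dtype=int)
    rank[order] = np.arange(H * W)        # rank[p] = position of pixel p
    padded = np.pad(img, 1)
    out = np.zeros_like(img, dtype=float)
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + 3, x:x + 3].copy()
            for dy in range(3):
                for dx in range(3):
                    sy, sx = y + dy - 1, x + dx - 1
                    inside = 0 <= sy < H and 0 <= sx < W
                    # mask pixels not yet generated (incl. the pixel itself)
                    if inside and rank[sy * W + sx] >= rank[y * W + x]:
                        patch[dy, dx] = 0.0
            out[y, x] = (patch * weight).sum()
    return out
```

The causality guarantee: perturbing a pixel can only change outputs at pixels that come later in `order`, which is exactly what arbitrary-order autoregressive training needs.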
@junyanz89 @ndrewLiu Cool! Wonderful to see more progress on few-shot NeRF. You might enjoy DietNeRF, one of our works that approaches the overfitting problem with another auxiliary loss:
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Achieves SotA results on text-to-image generation, text-to-video generation, video prediction, etc. Outperforms DALL-E in text2image.
abs:
repo:
It was a pleasure to catch up with @profjoeyg earlier this week on the Generating Conversation podcast. Joey was a fantastic mentor and collaborator at Berkeley. Check out the interview below.
Latest interview from @profjoeyg is out!
We chatted with @ajayj_, who's the co-founder of @genmoai.
The conversation touches on the history of diffusion models 🖼️, some awesome demos 🎥, and the tech behind Genmo 🤖. Check it out!
Research internships are now available @Waabi_ai. All year long, with durations of 3-12 months. Available in both Canada and the US. Join the team at the forefront of innovation in #SelfDrivingCars!
Apply:
I am very excited that our work applying approximate computing to the Google TPU won the Best Paper award at MLArchSys @ISCA2021!
Our talk is today at 10:15am PT at #ISCA
We're at the UAI virtual poster session now. Come find @qiyang_li and me at Poster Session II (c), poster spot G5. Joint work with @pabbeel at @berkeley_ai.
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
significantly improves over previous GANs and outperforms distilled diffusion models in terms of sample quality and speed
abs:
project page:
LMConv improves PixelCNN++ (@TimSalimans et al) CIFAR10 likelihoods to 2.89 bpd by averaging over multiple orders (with one set of parameters). At test time, generate coherent image completions by choosing a maximum context order: just sample the missing pixels last! 5/8
PixelCNNs (@avdnoord et al 2016) introduced a convolutional inductive bias with parallel training, useful for generating images in raster scan order and fitting latent priors (eg VQ-VAE). SPN (@jacobmenick et al) proposed a variant to support a different subscale order. 7/8
I'm sharing this since we've made a big update to our paper from last year, with lots of new robustness experiments, insights and open questions. Co-authored with @_parasj @tianjun_zhang @pabbeel @mejoeyg and Ion Stoica. 3/3