Happy to announce Imagine Flash, our real-time image synthesis model! Watch the image evolve in real time with each character you type!
I'm proud to be leading the Flash project with my teammates - it's incredibly rewarding to witness the transformation of a quick demo I
⚡️SD3-Turbo: Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Following Stable Diffusion 3, my ex-colleagues have published a preprint on distilling SD3 down to 4 sampling steps while maintaining quality.
The new method is called Latent Adversarial Diffusion Distillation.
You don't need EfficientNets. Simple tricks make ResNets better and faster than EfficientNets
Revisiting ResNets: Improved Training and Scaling Strategies
🤙
Self-supervised Learning for Medical images
Due to fixed imaging procedures, medical images like X-ray or CT scans are usually well aligned.
This alignment can be exploited to automatically mine similar pairs of images for training
Swin Transformer: New SOTA backbone for Computer Vision🔥
👉 What?
New vision Transformer architecture called Swin Transformer that can serve as a backbone in computer vision instead of CNNs.
📝
⚒ Code (soon)
Thread 👇
We have released the code and weights for our
#CVPR2023
paper "Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model"!
code:
abs:
project:
The demo is below:
Staff Research Scientist: Personal Update
I have some exciting news that I'd like to share with you! On Monday, I was promoted to E6, which means I am now a Staff Research Scientist at Meta GenAI.
This was made possible thanks to the significant impact and scope of a Generative
🔴 PERFUSION: a generative AI model from NVIDIA that fits on a floppy disk 💾
It takes up just 100KB. Yes, you heard that right, much less than any picture you take with your mobile phone! Why is this revolutionary, and why could it change everything?
I'll tell you 🧵👇
Come to see our
#CVPR2023
poster "Avatars Grow Legs: Generating Smooth Human Motion from Sparse Tracking Inputs with Diffusion Model".
Learn how to synthesize full-body motion from head and wrist tracking only!
webpage:
Today at 10:30-12:30, poster
#46
.
Neural 3D Video Synthesis
NeRF-like model generates frames conditioned on position, view direction, and a time-variant latent code.
When it gets faster, it will enable mind-blowing applications!
📝
🌐
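Conceptually, the per-sample radiance query looks roughly like this minimal sketch (layer sizes, the latent dimension, and the omitted positional encoding are my assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class DynamicRadianceField(nn.Module):
    """NeRF-style MLP that additionally takes a per-frame latent code (sketch)."""
    def __init__(self, latent_dim=32, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma = nn.Linear(hidden, 1)                      # volume density
        self.color = nn.Sequential(nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
                                   nn.Linear(hidden // 2, 3))  # view-dependent RGB

    def forward(self, xyz, view_dir, z_t):
        # xyz: (N, 3) sample positions, view_dir: (N, 3), z_t: (N, latent_dim) time code
        h = self.trunk(torch.cat([xyz, z_t], dim=-1))
        return self.sigma(h), self.color(torch.cat([h, view_dir], dim=-1))
```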
⚔️ FastNeRF vs NeX ⚔️
Great minds think alike. FastNeRF is built around the same idea as NeX, but with a slightly different implementation. Which one is faster?
NeX
FastNeRF
To learn about the differences between the two -> thread 👇
NeX: Real-time View Synthesis with Neural Basis Expansion
An amazing new approach to novel view synthesis: a combination of multiplane images (MPI) and neural basis expansion (NeRF-like nets). It can reproduce spectacular, complex view-dependent effects
🌐
Check out our new
#CVPR21
paper!
Discovering Relationships between Object Categories via Universal Canonical Maps
In collaboration with FAIR (
@NataliaNeverova
, P. Labatut,
@davnov134
and A. Vedaldi)
🌐
▶️
📝
StyleGAN2 for transferring garments between different poses and body shapes. The results are pretty neat! Virtual try-on is coming soon, folks!
🌎Project page:
🧥Interactive example:
Our paper "Avatars Grow Legs" (CVPR 2023) is out!
TL;DR: Fast diffusion models that generate full-body motion from head and hand tracking inputs.
Will release the code in a few days.
My new video on self-supervised representation learning (also easy to understand for beginners). I explain CliqueCNN which builds compact cliques for classification as a pretext task and I discuss other self-supervised learning approaches.
@itsbautistam
Last week I gave a talk at Heidelberg SIAM chapter
"Identification of Humpback Whales using Deep Metric Learning".
I talked about our recent CVPR'19 paper and about Humpback Whale Identification challenge at
@kaggle
.
Slides:
Since joining Meta GenAI, our team has focused on speeding up image synthesis.
Exciting news!🚀 We've unlocked high-quality image synthesis in just ~5 sec! MZ showcased our progress at Meta Connect.
Try it out with the /imagine command in our AI chatbot in FB, IG or WA
Hiring interns for our team in Reality Labs Zurich!
We are looking for PhD students with a strong research background, proven by publications in top-tier venues. The primary goal of the internship is to submit a paper to CVPR 2023.
Details in the thread⬇️
Cool work from
@facebookai
!
It can generate an image of the input text in any style, given an example of the reference style. The architecture looks similar to StyleGAN's, but instead of noise, every normalization layer is conditioned on the encoded style vector.
We will be presenting our work "Divide and Conquer the Embedding Space for Metric Learning" at
#CVPR2019
on Tuesday 18th: Poster 24 at 10:15.
Authors: Me, Vadim Tschernezki, Uta Büchler (
@uta0590
) and Björn Ommer. Paper and Code:
Barlow Twins: Self-Supervised Learning via Redundancy Reduction
New self-supervised learning loss: compute the cross-correlation matrix between the features of two distorted versions of a sample and push it toward the identity matrix.
🛠️
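A minimal sketch of that loss in PyTorch (standardizing the embeddings per feature dimension and weighting the off-diagonal term follow the description above; the exact hyperparameters here are illustrative):

```python
import torch

def barlow_twins_loss(z_a, z_b, lambd=5e-3, eps=1e-6):
    # z_a, z_b: (N, D) embeddings of two distorted views of the same batch
    N, D = z_a.shape
    # standardize each feature dimension across the batch
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + eps)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + eps)
    c = z_a.T @ z_b / N                              # cross-correlation matrix (D x D)
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()   # invariance: diagonal -> 1
    off_mask = ~torch.eye(D, dtype=torch.bool)
    off_diag = c[off_mask].pow(2).sum()              # redundancy reduction: off-diagonal -> 0
    return on_diag + lambd * off_diag
```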
🔥New DALL-E? Paint by Word 🔥
Edit a generated image by painting a mask at any location of the image and specifying any text description. Or generate a full image just from textual input.
📝
1/
CvT: Introducing Convolutions to Vision Transformers🔥
SOTA ImageNet Results (almost)
Injects the inductive biases of CNNs (i.e. shift, scale, and distortion invariance) into the ViT architecture while maintaining the flexibility of Transformers.
📝
Thread👇
Our paper on training with pseudo-labels for semantic segmentation, GCPR 2019.
Semi-Supervised Segmentation of Salt Bodies in Seismic Images:
SOTA (1st place) at TGS Salt Identification Challenge.
🌐
📝
#kaggle
#TGS2019
#GCPR19
New video on my YouTube channel!
In this video, I explain VectorNet - a method for future motion prediction based on a vectorized representation of the scene instead of RGB images.
🎬
I'm happy to announce that our team (me,
@KonevSteven
, K. Brodt) was awarded 3rd place within the Waymo Motion Prediction Challenge 🥳
Task: predict trajectories of the agents for 8 seconds into the future.
📜Technical report
We also released our code ↓
How to easily edit and compose images like in Photoshop using GANs🔥
❓What?
Given an incomplete image or a collage of images, generate a realistic image
📌How?
1. Train a regressor to predict the StyleGAN latent code even from an incomplete image
2. Embed the collage with the regressor and feed it to the GAN (rough sketch below)
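The inference step, as a hedged sketch (the regressor E and generator G are placeholders; how E is trained against G's latents is not shown):

```python
import torch

@torch.no_grad()
def compose_with_gan(collage, E, G):
    # collage: (1, 3, H, W) incomplete image or pasted-together crops
    w = E(collage)   # regressor predicts a StyleGAN latent even for incomplete input
    return G(w)      # the generator re-renders the collage as a coherent, realistic image
```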
Presenting our work "Re-ReND: Real-time Rendering of NeRFs across Devices"
#ICCV23
We show how to bake a NeRF into a mesh with rich view-dependent textures, enabling rendering at 100-1000 FPS on different devices without loss of quality.
Visit our poster:
ID: 3760
Foyer Sud - 140
Self-supervised learning: The dark matter of intelligence
Blog post by
@ylecun
and
@ishan_
- well-known experts in self-supervised learning at FAIR.
0/5
They talk about:
- Self-supervised learning as a paradigm in general
...
I wrote a blog post that briefly explains the SMAL model for fitting 3D animal shapes to RGB images.
Based on the paper "3D Menagerie: Modeling the 3D Shape and Pose of Animals", CVPR 2017
@silvia_zuffi
@Michael_J_Black
🌐
StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery 🔥
Uses the CLIP model to guide image editing in StyleGAN via text queries.
📝Paper
⚙️ code
Thread 👇
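The latent-optimization flavor of the idea, as a rough sketch (the StyleGAN synthesis call, the loss weights, and the omitted CLIP preprocessing are my assumptions; the paper also proposes mapper and global-direction variants):

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

def edit_by_text(w_init, G, prompt, steps=200, lr=0.05, lambda_id=0.1, device="cpu"):
    clip_model, _ = clip.load("ViT-B/32", device=device)
    txt_feat = clip_model.encode_text(clip.tokenize([prompt]).to(device))
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G.synthesis(w)                          # placeholder StyleGAN synthesis call
        img_224 = F.interpolate(img, size=224, mode="bilinear", align_corners=False)
        img_feat = clip_model.encode_image(img_224)   # CLIP normalization omitted for brevity
        clip_loss = 1 - torch.cosine_similarity(img_feat, txt_feat).mean()
        loss = clip_loss + lambda_id * (w - w_init).pow(2).mean()  # stay close to the source latent
        opt.zero_grad(); loss.backward(); opt.step()
    return w.detach()
```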
Papers from A. Efros are always top-notch 🔝
One of my favourite papers I've read recently:
Space-Time Correspondence as a Contrastive Random Walk
* Tracking w/o supervision using a random walk between image patches.
🌐
Happy to share that we got 1/1
#NeurIPS2022
papers accepted this year from our small team in Reality Labs Zurich!
Working on the camera-ready and will upload it to arXiv soon. Small spoiler: it's on learning implicit 3D shape representations for shape reconstruction.
Some cool results from VQGAN+CLIP experiments
1. "Holy war against capitalism"
2. "Polygonal fast food"
3. "Minecraft Starcraft"
4. "Modern cubist painting"
🎩Colab:
Our team (me,
@ppleskov
and
@shakhrayv
) finished 10th (out of 2131 teams) in Humpback Whale Identification challenge on
@kaggle
.
Special thanks to
@odsai_en
community for fruitful discussions!
🔥Fresh drop - Mixtral-8x22B!
As usual,
@MistralAI
stays true to their style by simply leaving a magnet link to a torrent with the weights of their new model. Nice trolling!
The new model is a Mixture of Experts Mixtral-8x22B:
- Model size: 262 GB (I assume the weights are in
My YouTube video explaining HOW to EARN $6000 By WINNING A KAGGLE AUTONOMOUS DRIVING COMPETITION.
The video is on our 3rd place solution for the
@Kaggle
@LyftLevel5
Motion Prediction for Autonomous Vehicles competition.
🛠️
🎬
Our paper was accepted as oral at ECCV 2018!
"A Style-Aware Content Loss for Real-time HD Style Transfer"
Artsiom Sanakoyeu*, Dmytro Kotovenko*, Sabine Lang (
@lang254
) , Björn Ommer
Project page:
Source code is coming soon.
A few weeks ago, Mark announced the creation of a new organization within
#Meta
- GenAI, focusing solely on Generative AI. Our team has left Reality Labs & joined the new org.
Thrilled as I've been working on diffusion models for the past year - now full steam ahead! 🚀
#GenAI
We're looking for talented PhD interns to join our team at Meta Reality Labs in Zurich.
Our focus is on 3D human motion synthesis & tracking for AR/VR, and we're offering the chance to work on cutting-edge technology like generative models (diffusion, VAEs) for motion synthesis
My blog post on how to design a container with O(1) insert, remove, and get-random-element operations.
I draw some nice analogies with the implementation of std::vector.
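For reference, a minimal Python sketch of the classic trick (dense array + hash map, removal by swapping with the last element), mirroring the std::vector analogy from the post:

```python
import random

class RandomizedSet:
    """O(1) insert / remove / get_random."""
    def __init__(self):
        self.items = []   # dense array of values (plays the role of std::vector)
        self.index = {}   # value -> position in self.items

    def insert(self, val) -> bool:
        if val in self.index:
            return False
        self.index[val] = len(self.items)
        self.items.append(val)
        return True

    def remove(self, val) -> bool:
        if val not in self.index:
            return False
        # swap the element to remove with the last one, then pop the tail
        pos, last = self.index[val], self.items[-1]
        self.items[pos] = last
        self.index[last] = pos
        self.items.pop()
        del self.index[val]
        return True

    def get_random(self):
        return random.choice(self.items)   # uniform thanks to the dense array
```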
Can Vision Transformers Learn without Natural Images?
1/ We can pretrain Vision Transformers purely on synthetic fractal data w/o any manual annotations and achieve similar performance on downstream tasks as self-supervised pretraining on ImageNet...
📝
Some nice stylization results on style transfer from our work "A Content Transformation Block For Image Style Transfer",
#CVPR2019
More results are on the project page
🌐
▶️
📝
Germans are building a European analogue of OpenAI
The German startup Aleph Alpha, based in Heidelberg, recently raised $27M. The goal they have set themselves is ambitious (maybe even too ambitious): they want to create another breakthrough in AI, akin to GPT-3.
PULSE: Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
#CVPR2020
Upsample a photo by finding a suitable latent vector in a pretrained StyleGAN
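The gist as a hedged sketch (the generator G, its latent_dim attribute, and the plain MSE downscaling loss are placeholders; the actual method constrains the latent search differently):

```python
import torch
import torch.nn.functional as F

def upsample_via_latent_search(lr_image, G, steps=500, lr=0.1):
    # lr_image: (1, 3, h, w) low-res photo; G: pretrained StyleGAN-like generator (placeholder)
    z = torch.randn(1, G.latent_dim, requires_grad=True)  # latent to optimize
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        hr = G(z)  # candidate high-res image
        # the candidate, once downscaled, should match the observed low-res photo
        down = F.interpolate(hr, size=lr_image.shape[-2:], mode="bicubic", align_corners=False)
        loss = F.mse_loss(down, lr_image)
        opt.zero_grad(); loss.backward(); opt.step()
    return G(z).detach()
```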
Our style transfer on steroids at ICCV19!
"Content and Style Disentanglement for Artistic Style Transfer"
We learn subtle variations of styles and disentangle style from content
Project page:
Video:
#ICCV19
Google open-sourced its AutoML framework for model architecture search. It automatically finds the right model architecture for any classification problem.
Now you can write `fit(); predict()` and call it a day! Of course, if you have enough GPUs 😅
Learning High Fidelity Depths of Dressed Humans by Watching TikTok Dance Videos
The single-frame depth is refined in a self-supervised way by leveraging local transformations of body parts to enforce geometric consistency across different poses.
In addition to Llama 3, today we’re also publishing a new paper: Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation ➡️
This work from GenAI researchers is enabling new image generation features in Meta AI on
@WhatsApp
& web.
ViViT: A Video Vision Transformer
A pure transformer-based model for video classification, drawing upon the recent success of transformers in image classification.
It extracts spatiotemporal tokens from the video, which are then encoded by a series of transformer layers
📝
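A minimal sketch of the spatiotemporal ("tubelet") tokenization idea with made-up sizes (not the authors' code; the factorized encoder variants from the paper are not shown):

```python
import torch
import torch.nn as nn

class TubeletEmbedding(nn.Module):
    """Split a video into spatiotemporal patches and project them to tokens."""
    def __init__(self, dim=768, tubelet=(2, 16, 16), in_ch=3):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=tubelet, stride=tubelet)

    def forward(self, video):                     # video: (B, C, T, H, W)
        tokens = self.proj(video)                 # (B, dim, T', H', W')
        return tokens.flatten(2).transpose(1, 2)  # (B, num_tokens, dim)

# The token sequence is then fed to standard transformer encoder layers, e.g.:
layer = nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)          # depth is illustrative
tokens = TubeletEmbedding()(torch.randn(1, 3, 16, 224, 224))  # 16-frame clip
out = encoder(tokens)                                         # (1, 1568, 768)
```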
Any pointers to self-supervised learning papers where geometrical equivariance is enforced in the learned representations? RotNet is a simple example of equivariance. "Unsupervised Part-Based Disentangling of Object Shape and Appearance" is another example.
Anything else?
New Video!
Computer Vision for animals is a fast-growing and very promising sub-field.
In this video I explain how to reconstruct a 3D model of an animal from a single photo using a cycle consistency loss.
Graph Representation Learning Book
A brief but comprehensive introduction to graph representation learning, including methods for embedding graph data, graph neural networks, and deep generative models of graphs.
Happy to share that our GCPR'19 paper was selected for an oral presentation!
"Semi-Supervised Segmentation of Salt Bodies in Seismic Images" : 1st place solution at TGS Salt Identification Challenge
@kaggle
@TGScompany
Paper:
#GCPR19
#kaggle
Generative Adversarial Transformers
📝
🛠️
The GANsformer leverages a bipartite structure to allow long-range interactions while evading the quadratic complexity that standard transformers suffer from. It presents two novel attention types.
Thrilled to announce that our Zurich team was directly responsible for optimizing the AI Sticker generative model. Just type in a description, and watch as it creates personalized stickers for you in IG/FB, WA.
A glimpse of this in an excerpt from MZ's keynote at Meta Connect
MIT: Deep Learning for Art, Aesthetics, and Creativity
An awesome mini-course from MIT on Neural Art and Creativity. This course has a lineup of great invited speakers like Phillip Isola (MIT), Alyosha Efros (UC Berkeley), Jeff Clune (OpenAI), etc.
🌀
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
paper page:
Diffusion models have recently revolutionized the field of image synthesis due to their ability to generate photorealistic images. However, one of the major drawbacks of
Facebook published its ultimate SElf-supERvised (SEER) model.
- They pretrained it on 1B random, unlabeled, and uncurated Instagram images 👀.
- SEER outperformed SOTA self-supervised systems, reaching 84.2% top-1 accuracy on ImageNet.
🛠️
MacaquePose: A Novel “In the Wild” Macaque Monkey Pose Dataset
The dataset provides keypoints for macaques in naturalistic scenes; it consists of 13k images and 16k monkey instances.
📝pdf
🌀Read more in my telegram channel post
New video!
I explain the paper "Taming Transformers for High-Res Image Synthesis".
The paper introduces VQGAN, a GAN that learns a codebook of context-rich visual parts and uses it to quantize the bottleneck representation at every forward pass.
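The quantization step in a nutshell, as a rough sketch (codebook size and feature dim are illustrative; the commitment/codebook losses and the adversarial and perceptual parts of VQGAN are omitted):

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Snap each spatial feature of the bottleneck to its nearest codebook entry."""
    def __init__(self, num_codes=1024, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                # z: (B, dim, H, W) encoder output
        B, D, H, W = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, D)      # (B*H*W, D)
        dists = torch.cdist(flat, self.codebook.weight)  # distance to every code
        idx = dists.argmin(dim=1)                        # nearest-neighbour code indices
        z_q = self.codebook(idx).view(B, H, W, D).permute(0, 3, 1, 2)
        z_q = z + (z_q - z).detach()                     # straight-through estimator
        return z_q, idx.view(B, H, W)
```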
It's such a big honour to be selected as one of the best
#Neurips2019
reviewers!
Moreover, I will get a free conference registration! Awesome!
@hugo_larochelle
Another cool work from OpenAI: Diffusion Models Beat GANs on Image Synthesis. New SOTA for image generation on ImageNet
A new type of generative model is proposed - the Diffusion Probabilistic Model.
📝Paper
🛠️Code
Thread 👇
Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera
#CVPR2020
@JaeShinYoon2
This is pretty cool! The model can do space-time navigation and bullet time effect!
🔗
LatentCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions
A framework that learns meaningful directions in GANs' latent space using unsupervised contrastive learning.
📝
🛠
Thread👇