I'm super excited to launch
@ReshotAI
! It's an AI face editor, and it works so well! 🤯🔥
Starting off with facial expressions: 🤪
👀 Edit eye movement & winking
🔄 Change the head rotation & tilt
😃 Adjust the smile, mouth opening
Here are more examples to see it in action ⬇️⬇️
Upscale-A-Video was just released and it's so good! 🤯
It's a temporal-consistent Diffusion Model for video Super-Resolution, and has some of the best results I've ever seen. Look at how sharp those lines become!
More details below ⬇️⬇️
Google just revealed an ABSOLUTE depth estimation model 🤯
As opposed to recent depth models (Marigold, PatchFusion) which aim for maximum detail, DMD aims to estimate the ABSOLUTE depth (in meters) within the image
More details below ⬇️⬇️
Meta AI strikes again, with Relightable Gaussian Codec Avatars
This is an update to the Meta Codec Avatars 2.0, building on 3D Gaussian Splatting.
As a result, we get fully relightable real-time avatars, accurate at the hair strand level 🤯
More details below ⬇️⬇️
Another experiment with Gaussian Painters ✨🎨
By optimizing 3D Gaussian Splattings over separate images at several viewpoints, it is possible to get a Steganography effect! Three paintings are hidden in those gaussian splats
I optimized 3D Gaussian Splattings over a single picture on a 2D plane. I'm calling this "Gaussian Painters" 🎨✨
Watch the gaussian splats work to paint the Girl with a Pearl Earring!
Here's how I did it (code below) ⬇️⬇️
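In spirit, the forward pass boils down to painting colored 2D gaussians onto a canvas. Here's a minimal NumPy toy renderer (my own sketch, not the full differentiable pipeline):

```python
import numpy as np

def splat_gaussians(h, w, means, scales, colors, opacities):
    """Paint isotropic 2D gaussians onto an (h, w) canvas, back to front."""
    ys, xs = np.mgrid[0:h, 0:w]
    canvas = np.zeros((h, w, 3))
    for (cy, cx), s, color, op in zip(means, scales, colors, opacities):
        # per-pixel alpha falls off with distance from the gaussian center
        alpha = op * np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * s ** 2))
        canvas = canvas * (1 - alpha[..., None]) + np.asarray(color) * alpha[..., None]
    return canvas
```

Making this differentiable (e.g. in PyTorch) and optimizing the means/scales/colors against the target image is what produces the "painting" animation.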
Wow Marigold 🌼 depth estimation works extremely well! 🤯
And the best thing is that the checkpoints and code are fully available for commercial use!
Try it out yourself! ⬇️⬇️
This is literally magic 🤯
FMA-Net is a new AI method for video deblurring! It uses complex motion representation learning for spatio-temporally-variant restoration with kernels that are aware of motion trajectories.
More info below ⬇️⬇️
PatchFusion was just released. Compared to ZoeDepth, it predicts depth maps with much finer details, just look at the comparison below! 🤯
Its main contribution is a new Global-to-Local module and Consistency-Aware Training.
More examples below (with code) ⬇️⬇️
LooseControl was just released and it's so good! 🔥
It enables depth-map-conditioned image generation, but unlike ControlNet, it allows looser control via simple 3D bounding boxes. And look at how stable it is across frames!
More examples (with code) below ⬇️⬇️
This scene was scanned using only 3 pictures 🤯
In my opinion, this was the biggest flaw of NeRFs & 3D Gaussian splats: they are trained from scratch every time with no knowledge of the world. With ReconFusion, we now acquire it from diffusion models
More examples below ⬇️⬇️
New 3D Gaussian Splatting recording! Those metallic reflections and leather were captured REALLY well!
When looking closer, you can also see how the watch hands are modeled with just a couple of elongated gaussians.
#GaussianSplatting
Made a little visualisation for my latest project on free-view one-shot image generation 🤩
Just pick a photo, and generate images with full control of rotation and facial expressions. Or choose a driving video and let the magic happen✨
Try it for free using
@litso_app
!
OpenVoice was just released! 🤯
Given a short audio clip, it clones the reference voice and can generate speech in multiple languages, while having control over emotion, accent, rhythm, pauses, and intonation!
Code & details below ⬇️⬇️
A new highly accurate OCR was just released and it's open-source!
Surya is accurate down to the line level, and multilingual. Well, in this example, only the newspaper name was not detected 😅
Link below ⬇️⬇️
EfficientSAM was just released and it's fast! 💨
With 20x fewer params, it is now 20x faster than the original SAM segmentation model, while staying in the same accuracy range.
See below for the project page and an interactive
@huggingface
space to try it out! ⬇️⬇️
New 3D Gaussian Splatting capture at the Vintage Cars association in Versailles, from a 30 seconds recording.
Some floaters were cleaned with my b3d plugin. Original output below ⬇️⬇️
#GaussianSplatting
Imagine this on the
@Nike
website.
This is a 3D capture of the Nike ZoomX Vaporfly Next%, and visualizing this feels as real as touching the real shoe.
3D Gaussian Splattings are SO good at modeling fine structures, like in this case the transparent fabric.
#GaussianSplatting
Copy-paste any object into an image with AI! 🤯
Here's one application of using AnyDoor for virtual try-on, but it's much more general and is designed to maintain texture details yet allow versatile local variations!
Links below (with code!) ⬇️⬇️
Get crisp 3D Gaussian splats from blurry inputs 🤯
Capturing sharp videos is often impossible because of lens defocus, object motion, or camera shake, and 3DGS faithfully learns to fit this blur.
"Deblurring 3DGS" optimizes a small MLP to model the scene blurriness!
More details below ⬇️⬇️
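A rough sketch of what such a small MLP could look like (my own guess at the shape, not the paper's code): a tiny network that takes per-gaussian features and predicts small, bounded offsets, so the underlying sharp scene stays intact:

```python
import numpy as np

def deblur_offsets(params, feats):
    """Tiny one-hidden-layer MLP: per-gaussian features in,
    small bounded offsets out (e.g. to scale/rotation)."""
    W1, b1, W2, b2 = params
    h = np.maximum(0.0, feats @ W1 + b1)   # ReLU hidden layer
    return np.tanh(h @ W2 + b2) * 0.1      # offsets bounded to (-0.1, 0.1)
```

At render time the blurry training views are fit with the offsets applied; rendering without them gives the deblurred scene.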
An open-source version of AnimateAnyone was just released! (Moore-AnimateAnyone)
I just tried it, the quality is not there just yet, but it has great potential!
Links below ⬇️⬇️
3.8 mb 🤯
I tried the 3D Gaussian Splatting add-on for Unity by
@aras_p
on my Nike shoe capture, and when using the “Very low” quality, the file size becomes 3.8mb with minimal visual loss.
That’s the lowest file size I’ve seen for a 3DGS yet!
#GaussianSplatting
Wow, generate infinite videos from a single image 🤯
WonderJourney creates coherently connected 3D scenes along a controllable camera trajectory. Look how running the code three times results in completely different videos!
More examples below ⬇️⬇️
While everyone is waiting for AnimateAnyone, MagicAnimate was just released and it's really impressive! 🤯
It needs a single image and a motion video, and it produces an animated video!
See below for more examples ⬇️⬇️
A new video generation paper just dropped 🤯
DreaMoving creates human dance videos given a target identity and posture sequences.
But unlike AnimateAnyone and MagicAnimate, a full body picture is not required as input (only face + optional prompt).
More details below ⬇️⬇️
A new real-time Radiance Field paper beating 3DGS was just released! 🔥
Similarly to 3D Gaussian Splatting, TRIPS optimizes a point-cloud with color, position & size that gets splatted to the screen. But it does so using a single trilinear write in an image pyramid
More info ⬇️
Adaptive Shells was just awarded best paper at SIGGRAPH Asia! 🙌
It's a new hybrid method between a NeRF and mesh, and achieves up to 300 FPS at HD resolution!
More details below ⬇️⬇️
DreamBooth from a SINGLE image with perfect accuracy 🤯
Unlike specialized models like AnimateAnyone, DreamTuner is a general method for subject-driven generation, controllable via text or pose
But it works so well, it can create temporally consistent animations!
More below ⬇️
Gaussian Head Avatars look amazing! 🤯
Capture a dynamic 3D Gaussian splat of a face, then animate it in 3D using another actor. Imagine the potential for the film industry!
More examples below ⬇️⬇️
Meta AI's new real-time translation model is so impressive! 🤯
It streams the translation BEFORE waiting for the end of a sentence, with <2 seconds of latency. See how fast the translation appears after the speaker starts talking 💨
More details below (with code!) ⬇️⬇️
Google just announced VideoPoet: a multimodal video generation model!
It's massively multimodal and can take as input: text, image, depth & optical flow or a masked video and is one of the first models that generates video + audio!
More info below ⬇️⬇️
Wow this is cool! 🤯
PixelLLM generates image captions with pixel coordinates
Just a few years ago, the field of Explainable AI was amazed by simple heatmaps in the image classification task (single-label prediction). This brings it to a whole new level!
Project links below ⬇️
3D Gaussian Splatting is INSANELY good at fur rendering. Look at the fuzzy details here!
Makes sense since it literally optimizes over small ellipsoid particles, as opposed to NeRF or photogrammetry.
#GaussianSplatting
Segment Anything Model (SAM) now runs at 30 FPS on an iPhone! 🤯
EdgeSAM is the first SAM variant that can run at over 30 FPS on an iPhone 14 with good quality. Look how accurately it segments tiny vegetables!
Code and
@HuggingFace
demo below! ⬇️⬇️
3D Gaussian Splattings from a single image 🤯
Compared to recent novel-view synthesis approaches (like Stable Zero123) which generate novel views as images (causing inconsistencies), this work generates 3D Gaussians directly (via a point cloud and triplane features)
More below ⬇️
The first Consistency Model for Video was just released! 🤯
It enables video generation with as little as 4 sampling steps: generating 16 frames (at 256x256 resolution) takes only 10 seconds! So not real-time yet (as it is for images), but close!
More details below! ⬇️⬇️
The code for DreamTalk was just released!
Given any audio (text or song) and a single image frame, it generates a lip-synced animated video, copying the "expression" of a style reference.
Links below ⬇️⬇️
Temporally stable 3D body MoCap with a SINGLE camera & occlusions!🔥
Obtaining globally coherent & plausible motions through occlusions is an incredibly difficult problem, but RoHM (by Meta and ETH Zurich) seems to have just solved this!
More info below ⬇️⬇️
Photopea just announced their new Background Removal tool
It's available for FREE and, imo, works better than "Remove bg"! Wow 🤯
More examples below ⬇️⬇️
Wow the loading of
#GaussianSplatting
in
@LumaLabsAI
is so smart and satisfying! 😍
It shows only the point cloud until fully loaded, plus progressive streaming from the center to the background
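A guess at how such a center-to-background streaming order could be implemented (hypothetical, not Luma's actual code): sort the splats by distance from the scene's centroid before sending them:

```python
import numpy as np

def center_out_order(positions):
    """Indices of splats sorted by distance from the cloud centroid,
    so streaming fills the scene in from the center outwards."""
    center = positions.mean(axis=0)
    return np.argsort(np.linalg.norm(positions - center, axis=1))
```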
This is insane!
@antimatter15
has implemented a WebGL viewer for 3D Gaussian Splattings.
Unlike other implementations, this uses vanilla WebGL, and runs on any device in the browser (60+ FPS on my desktop, 30 FPS on mobile but no touch controls yet).
Link to try it below ⬇️⬇️
Just tried
@LumaLabsAI
's Text-to-3D
(low-poly) 3D generation is now as fast as image generation, which you can then upscale for higher resolution 3D models.
Looks promising! 🔥
🔥 Introducing Genie 1.0, our first step towards building multimodal AI. Genie is a text-to-3d model capable of creating any 3d object you can dream of in under 10 seconds with materials, quad mesh retopology, variable polycount, and in all standard formats! Try it on web and in
High quality real-time NeRFs on your phone🤯
MERF is a new streamable memory-efficient approach that achieves real-time performance while equaling the quality of Zip-NeRF (and outperforming 3DGS)
Try it out yourself below ⬇️⬇️
Try furniture in your living room before buying 🔥
Amazon just announced "Diffuse to Choose", a new diffusion-based image-conditioned inpainting model. It is fast and accurately copies fine details of the reference to the target image.
More examples below ⬇️⬇️
This is so perfect 🔥✨
SDXL Auto FaceSwap by
@fffiloni
enables creating new images using the face of a source image.
Try it out in this
@huggingface
space ⬇️⬇️
Fully control AI videos with simple boxes 🤯
Recent approaches enable control with human pose or depth maps, but creating these maps is challenging. TrailBlazer (built on top of ZeroScope) enables control with boxes through spatial & temporal attention map editing
More below ⬇️
#GaussianSplatting
from just two images in a single forward pass 🤯
PixelSplat predicts a dense probability distribution and samples Gaussians through a differentiable operation, allowing gradients to back-propagate to the 3DGS representation
Completely insane results! More ⬇️⬇️
Tested Meta AI's new Audio2Photoreal: photorealistic animated 3D Codec Avatars from audio alone, sound on 🔊
Needs better face expressions, but very promising multi-view results!
Code links below ⬇️⬇️
New 3D Gaussian Splatting capture of a park near Versailles. The hardest part of shooting outdoors is finding the perfect moment when no people are around 😅
#GaussianSplatting
Neural radiance field methods like Zip-NeRF perform very poorly when given only a few images. This is because they learn the scene from scratch with no prior information about the world.
ReconFusion fixes that! 🔥⬇️⬇️
DepthAnything was just released! 🔥
TLDR: it was trained on labeled + 62M unlabeled images
The encoder is initialized with DINOv2, a segmentation model helps to detect the sky (and set depth to ∞), the unlabeled images are strongly distorted (color, blur, CutMix).
More ⬇️⬇️
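For intuition, here is a minimal CutMix-style distortion sketch (my own toy version, not the paper's training code): a random rectangle of one image is pasted into another, forcing the student to stay robust under heavy corruption:

```python
import numpy as np

def cutmix(img_a, img_b, rng):
    """Paste a random rectangle of img_b into a copy of img_a."""
    h, w, _ = img_a.shape
    ch, cw = int(rng.integers(1, h)), int(rng.integers(1, w))
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    out = img_a.copy()
    out[y:y + ch, x:x + cw] = img_b[y:y + ch, x:x + cw]
    return out
```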
Get in-context descriptions of any object in an image 🤯
Osprey is a new Pixel Understanding model that can be integrated with Segment Anything Model (SAM) to obtain multi-granularity semantics of any region in an image!
More info below ⬇️⬇️
Generate high-resolution UV textures from just a mesh 🤯
Compared to other approaches, Paint3D manages to create UV textures without embedded illumination information. It does so using a novel coarse-to-fine approach.
More info below ⬇️⬇️
I optimized the Spherical Harmonic coefficients of a grid of 100x100 gaussian spheres to fit pictures at different viewpoints, creating a lenticular card effect! ✨🌈
Once trained, the images are stored in the SH coefficients of SHARED spheres
Here's how it works (w. code) ⬇️⬇️
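For intuition, here's how view-dependent color comes out of SH coefficients, sketched at degree 1 (the actual capture likely uses higher degrees; constants follow the usual 3DGS convention):

```python
import numpy as np

# Degree-0 and degree-1 spherical harmonics constants
C0, C1 = 0.28209479, 0.48860251

def sh_color(coeffs, d):
    """View-dependent RGB from degree-1 SH coefficients.
    coeffs: (4, 3) array (DC + 3 linear terms, per channel);
    d: unit view direction (x, y, z)."""
    basis = np.array([C0, -C1 * d[1], C1 * d[2], -C1 * d[0]])
    return basis @ coeffs  # (3,) RGB
```

Because the basis depends on the view direction `d`, the same shared spheres show a different image from each viewpoint, which is exactly the lenticular effect.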
Generate a 3D object from a few pictures only! 🤯
UpFusion also works with a single picture as input, but providing a few UNPOSED images improves the fidelity to the input object!
More details below ⬇️⬇️
Font resolution test on a 3D Gaussian Splatting capture. High frequency areas use more splats, while uniform ones are covered by just a handful of gaussians.
Still mind blown that this uses only 3D ellipsoids.
#GaussianSplatting
Ok this is cool! 🤯
Testing the
@krea_ai
AI Enhancer on one of our BricksAR
#LEGO
buildings to turn it into a realistic Parisian café 😍
Check below for more ⬇️⬇️
Tested 3D Gaussian Splatting on a capture from the Comics Art Museum, Brussels. 🇧🇪
Super impressive training convergence and real-time rendering!
#GaussianSplatting
Excited to announce that "DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation" has been accepted to
#NeurIPS
!
If you're interested in vector graphics & sketch generation, feel free to check it out
Code:
Paper:
DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
Exciting work from
@alxandrecarlier
et al. Transformer-based hierarchical generative models learn latent representations of vector graphics, with nice applications in SVG animation.
Neural Haircut: Prior-Guided Strand-Based Hair Reconstruction
Paper page:
Generating realistic human 3D reconstructions using image or video data is essential for various communication and entertainment applications. While existing methods achieved
Ok this is wild for 3D artists.
What if you could just get a ready-to-use material by simply clicking on a material in an IMAGE? This is what MaterialPalette solves.
Especially useful when modeling from a reference image, just pick the correct material!
Project page below ⬇️⬇️
There's something beautiful about visualizing impressionist artworks from Claude Monet using new technology.
Peaceful scenes from the past that become 3d dreams, and which would be fun to play with in VR.
Created using GaussianPainters.
#GaussianSplatting
For reference, here are the ground-truth depth maps for the previous images. DMD improves the relative depth error by up to 25% over ZoeDepth!
One finding is that conditioning on the FOV is essential for disambiguating depth-scale.
Replace a person from a video with any character using a single picture 🤯
This new paper from Alibaba is called MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
More below ⬇️⬇️
I tested the ControlNet for video (MagicAnimate) and here is my opinion: it works great but has some flaws.
- the identity of the motion video leaks to the resulting video (and deforms body shape)
- bad hands and face (unsurprisingly!)
But a great first step for consistent
Those fonts do not exist 🤯
@AdobeResearch
strikes again with VecFusion, a new diffusion approach for Vector Image generation. Here it generates missing glyphs from just a few examples!
If you follow me from my DeepSVG paper, you know how excited I am about this!
More below ⬇️⬇️
I'm building the best YouTube Thumbnails editor with my app
@ThumbnailsPro
With ✨AI foreground segmentation so that you can place content behind the main subject
What other features should it include?
Create your own upside-down optical illusions with Stable Diffusion XL! 🎨✨
I have created a colab notebook with the modified diffusion process for you to try!
Link in the comments ⬇️⬇️
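A simplified sketch of the idea (toy NumPy version, assuming the standard flip-and-average trick; the prompt conditioning is folded into `eps_fn` for clarity): at each denoising step, average the noise prediction for the latent with the rotated-back prediction for its 180°-rotated version:

```python
import numpy as np

def illusion_noise(eps_fn, x):
    """Combine noise predictions so denoising satisfies both the
    upright prompt and the upside-down prompt at once."""
    flip = lambda img: img[::-1, ::-1]   # 180° rotation
    eps_up = eps_fn(x)                   # prediction for the upright view
    eps_down = flip(eps_fn(flip(x)))     # upside-down prediction, rotated back
    return 0.5 * (eps_up + eps_down)
```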
Fully control AI videos with simple boxes 🤯
Recent approaches enable control with human pose or depth maps, but creating these maps is challenging. TrailBlazer (built on top of ZeroScope) enables control with boxes through spatial & temporal attention map editing
More below ⬇️