With friends at
@Google
we announce 💜 Magic Insert 💜 - a generative AI method that lets you drag-and-drop a subject into an image with a vastly different style, achieving a style-harmonized and realistic insertion of the subject (Thread 🧵)
web:
Today, along with my collaborators at
@GoogleAI
, we announce DreamBooth! It allows a user to generate a subject of choice (pet, object, etc.) in myriad contexts and with text-guided semantic variations! The options are endless. (Thread 👇)
webpage:
1/N
Today, with collaborators at
@Google
, we're excited to announce 🥳🥳HyperDreamBooth🥳 🥳! It's like DreamBooth, but smaller, faster and better. 25x faster. Think 30 minutes vs. 14 hours for 100 models. And it works on a single image!
(Thread 👇)
webpage:
With collaborators
@Google
we're announcing 💫 ZipLoRA 💫! Merging LoRAs has been a big thing in the community, but tuning can be an onerous process. ZipLoRA lets us easily combine any subject LoRA with any style LoRA! Easy to reimplement 🥳
link:
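For anyone who wants to reimplement it: here is a rough numpy sketch of the core merging idea as I understand it, with per-column merger coefficients on each LoRA delta and a cosine-overlap penalty between the scaled columns. The function names, dimensions, and the gradient-free setup are all illustrative, not the official implementation.

```python
import numpy as np

def zip_merge(delta_subject, delta_style, m_s, m_c):
    """Merge two LoRA weight deltas with per-column coefficients.

    delta_*: (d_out, d_in) LoRA deltas (B @ A) for the same base layer.
    m_s, m_c: (d_in,) per-column merger coefficients (learned in practice).
    """
    return delta_subject * m_s + delta_style * m_c

def column_overlap(delta_subject, delta_style, m_s, m_c):
    # ZipLoRA-style penalty: cosine similarity between the scaled columns
    # of the two deltas; minimizing it reduces interference when merging.
    a = delta_subject * m_s
    b = delta_style * m_c
    num = np.abs((a * b).sum(axis=0))
    den = np.linalg.norm(a, axis=0) * np.linalg.norm(b, axis=0) + 1e-8
    return (num / den).mean()

rng = np.random.default_rng(0)
d_out, d_in = 8, 6
ds = rng.normal(size=(d_out, d_in))   # stand-in subject LoRA delta
dt = rng.normal(size=(d_out, d_in))   # stand-in style LoRA delta
m_s = np.ones(d_in)                   # all-ones = plain addition baseline
m_c = np.ones(d_in)
merged = zip_merge(ds, dt, m_s, m_c)
print(merged.shape, round(column_overlap(ds, dt, m_s, m_c), 3))
```

In the actual method the coefficients are optimized (keeping the merged layer close to each individual LoRA's output while driving the overlap term down); the sketch only shows the merge and the penalty being traded off.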
Today, along with collaborators at
@GoogleAI
, we’re excited to announce StyleDrop! It allows a user to generate new images that follow a specific style of their choice given only a single style reference image 🤯 (Thread 👇)
webpage:
Today, with collaborators at Google, we're announcing 🤩RealFill🤩! A generative AI approach to fill missing regions of an image with the content that should have been there. The best way to turn almost perfect pictures into invaluable memories!
page:
Today, with collaborators at Google and UT Austin, we're announcing 🤖 RB-Modulation 🤖! It's a whole new training-free framework for conditioning on reference images (for style or subject) without adapters (!) with an elegant formulation 🔥
web:
AI generated writing *feels* AI-generated at a visceral level, and even if you ask an LLM to make the writing feel or read less AI-generated it horrifically fails and makes it feel even more AI-generated. Any tricks that can help? Any prompts to share?
We are 🔥super excited🔥 to release the Platypus family of finetuned LLMs 🥳🥳. Platypus achieves the top score in the Hugging Face Open LLM Leaderboard 🏆! The main focus of our work is to achieve cheap, fast and powerful refinement of base LLMs.
page:
@cpicciolini
@SamHarrisOrg
Can you address his explanation, namely that you attributed specific positions to those two people which they do not actually hold?
Super happy to announce that I will be joining
@Google
as a Research Scientist and will be starting tomorrow! Extremely excited by this new step and very grateful for everyone that made this possible. 🥳🥳🥳
@bradesposito
Interestingly she describes exactly what is wrong with streaming now. Many times I have to see an album cover to remember that it’s great and I should listen to it again!
🥳 DreamBooth has been accepted to CVPR 2023. And with this comes a *big update* to the paper including the largest evaluation dataset for subject driven generation and an evaluation protocol! Find it in the project webpage:
(a thread)
#Dreambooth
1/N
Our team is looking for PhD student researchers starting in January, either full-time or part-time (full-time preferred). If you want to work on new, exciting applications and methods like I did with DreamBooth, then please reach out. DMs open.
🥳 ZipLoRA has been accepted to ECCV 2024 🥳 looking forward to its continued impact in the research community - so many people already doing work on style/subject combinations and LoRA merging for diffusion models! Congratulations to all the co-authors.
Excited 🥳🥳🥳 to release my first senior-author work, done while still a student at BU, with a star-studded lineup of collaborators and an incredible student first author
@ArielNLee
🌻🙌- it's all about differences between Vision Transformers & CNNs 👇
Some really cool news that I forgot to share. HyperDreamBooth is accepted to CVPR 2024. Next tweet in this thread has a link to the YouTube video - and I will present the paper at the conference. Hope to see many of you at the poster session 🚀
On Distillation of Guided Diffusion Models
abs:
On ImageNet 64x64 and CIFAR-10, approach is able to generate images visually comparable to that of the original model using as few as 4 sampling steps
As fast as generative AI is advancing, as quickly as it is getting commoditized, and as rapidly as everyone is becoming blasé about it, I can say that I still remember the first days of dreambooth and how mindblown I was. I hope I remember that feeling forever 🤍
Today, at NeurIPS, we announce counterfactual simulation testing, a new framework for comparing vastly different network architectures using counterfactuals. We use it to compare the robustness of modern ConvNets and Transformers. (Thread 👇)
webpage:
Our method has some surprising capabilities inherited from large diffusion models. For example it can generate novel art renditions of a subject! Here are some renditions of a specific dog in the style of famous painters.
4/N
So, I quickly implemented ZipLoRA with 🤗🧨 (some people have already noticed, though)
code:
I hope it helps; feel free to drop your comments and feedback~
Big thanks to the authors for their awesome work 🙌
the more I look at the videos, the more the motions feel like a video game (the walking here), but the appearance of only some videos looks like video-game footage. Maybe this model is trained on a lot of game footage? Models are good at learning to change style from simulated to real.
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy
Text-to-image diffusion models are extremely powerful and allow for flexible generation of images with complex user captions. One limitation is that controlling the subject’s appearance and identity using text is very hard.
2/N
We can even do realistic viewpoint changes for some subjects which have a strong class prior! Here are some examples of different viewpoints for a cat. Notice the detailed fur patterns on the forehead are preserved. 🤯
7/N
By finetuning a model (Imagen here) with a few images of a subject (~3-5), a user can generate variations of the subject, e.g. by controlling the environment and context of the subject. Ever wanted a high-quality picture of your dog in Paris (no travel required)?
3/N
My first paper as senior author (done while I was still a PhD student at BU!). So proud of Ariel and grateful for all coauthors 🙏🌸. I feel blessed. Thread coming out tomorrow 🔥
Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
paper page:
Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional
@TectonixGEO
@vdbDennis
@xmodesocial
I mean, how anonymized is it really if you can track a phone's location? You can easily figure out where people live, and identifying the person is one step away (maybe even a Google search away)
ZipLoRA-pytorch with
@Gradio
demo by
@mk1stats
local demo:
Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA)
Cool work which proposes a very similar "lower-rank" LoRA, like the Lightweight DreamBooth we proposed in our HyperDreamBooth work (), but for LLMs. 10x reduction in size, just like in our case!
VeRA: Vector-based Random Matrix Adaptation
paper page:
Low-rank adaptation (LoRA) is a popular method that reduces the number of trainable parameters when finetuning large language models, but still faces acute storage challenges when scaling to even
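The storage savings behind LoRA-style adapters are easy to check with back-of-the-envelope arithmetic. A minimal sketch (the layer dimensions and rank here are illustrative choices, not numbers from the paper):

```python
def lora_params(d_out, d_in, rank):
    # LoRA replaces a full (d_out x d_in) weight update with B @ A,
    # where B is (d_out x rank) and A is (rank x d_in).
    return rank * (d_out + d_in)

d_out = d_in = 4096                       # a typical large projection layer
full = d_out * d_in                       # params to finetune the layer directly
lora = lora_params(d_out, d_in, rank=8)   # params for a rank-8 adapter
print(full, lora, full // lora)
```

At rank 8 this is 65,536 adapter parameters versus 16.8M for the full update, a 256x reduction for that layer; even so, one full set of A and B matrices per user or per concept adds up fast at scale, which is the storage pressure methods like VeRA target.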
Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models
paper page:
Text-to-image (T2I) personalization allows users to guide the creative image generation process by combining their own visual concepts in natural language
Finally, our method can generate new images of a subject with different expressions/emotions. Note that the original images of the subject dog here did not exhibit any of these expressions.
8/N
🚀 Presenting our latest SOTA LLM: OpenOrca-Platypus2-13B 🚀. Kudos to
@ArielNLee
and
@ColeJHunter
and the great people of
@alignment_lab
for topping the Hugging Face leaderboard in the 13B parameter category! Excited by this collaboration.
link:
@JxckSweeney
@elonmusk
So you run a Twitter account that tracks Musk's jet purportedly because it is "of service" and "interesting", yet here you are offering to take it down if the amount they pay you is enough? I don't understand.
I’m defending my PhD thesis tomorrow 🎉 at 3pm EST. It’s called: Simulating to Learn. Such a fun journey. Will post the video afterwards. If you want the zoom link send me a dm.
@JamesTodaroMD
@elonmusk
James, there has been a lot of criticism of the Santa Clara study, and it might overestimate positive cases because of the biased sample and the false-positive rate of antibody tests. The 0.1% IFR I computed from that data would imply a prevalence of more than 100% in NYC.
But here is a result I really didn't expect. What surprises me is how well it handles the translation of ideas into arbitrary styles, changing the object shape to fit the style - and following stylistic flourishes and geometrical style components.
Some amazing work by Google researchers on personalization of video models. Their work allows for video subject-driven generation and style-driven generation with some seriously impressive results 🤯 imagine the possibilities
web:
Congratulations to
@kihyuk_sohn
,
@dilipkay
and to all authors involved in this work! The list is long and can be found below. For more amazing examples go to the project page.
paper:
project webpage:
Google announces PALP
Prompt Aligned Personalization of Text-to-Image Models
paper page:
Content creators often aim to create personalized images using personal subjects that go beyond the capabilities of conventional text-to-image models. Additionally,
A lot of times, you get more secret sauce from a conversation than reading a paper.
I will be presenting HyperDreamBooth at CVPR @ the Wednesday Poster Session: 17:15 - 18:45 (Paper 168) and @ the Google Booth at 12pm on Thursday
🌷🌻🌸
(also looking for really strong student
One main difficulty in finetuning a diffusion model using a few images is overfitting. We tackle this problem with an autogenous class-specific prior preservation loss. More details in the paper.
9/N
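In pseudocode, the combined objective looks roughly like this. This is a simplified numpy sketch: the real loss operates on diffusion noise predictions at sampled timesteps, and the weighting `lam` here is a hyperparameter I picked for illustration.

```python
import numpy as np

def mse(pred, target):
    return ((pred - target) ** 2).mean()

def dreambooth_loss(pred, target, prior_pred, prior_target, lam=1.0):
    """Subject reconstruction term plus class-prior preservation term.

    pred/target: model noise prediction and true noise for a subject image.
    prior_pred/prior_target: same, but for an image the *original* model
    generated for the subject's class (e.g. "a dog"); supervising on these
    self-generated samples anchors the class prior and fights overfitting
    to the few subject images.
    """
    return mse(pred, target) + lam * mse(prior_pred, prior_target)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
# Perfect subject fit, but the prior prediction has drifted by 1.0:
loss = dreambooth_loss(x, x, x, x + 1.0)
print(loss)  # 1.0: the prior term alone contributes here
```

The point of the second term is exactly what the thread describes: without it, the model collapses onto the handful of subject photos; with it, the class prior (and thus generation variability) is preserved.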
DreamBooth featured at Google I/O 🥳 on an insane concept: a card game with 7+ Million unique generated characters! Amazing work by the I/O Flip team! 🤯 The first instance of such a card game? (clip linked)
In order to do so we propose an optimized, small, yet very powerful dataset named Open-Platypus, which is a curated subset of open datasets and focuses on enhancing LLMs' STEM and logic proficiency. We release this dataset to the public.
Before diving into technical details, let's explore some impressive examples. StyleDrop can extract the color palette and overall style from this watercolor cat painting, and generate almost anything one can imagine in that same style.
I think
@Scenario_gg
are pushing the limits of DreamBooth in crazy ways. They really are alchemists working with the original DreamBooth idea to make it much stronger and to be able to do more things with it.
We just made creating your next Consistent Character waaaaaaay easier :D
Workflow 1/3
I am sharing THREE workflows this week for using the new "Character Base" LoRAs that we just added to Scenario to:
- Use as a consistent character
- Create a new consistent character from
-
We also thank the Imagen team for lending us access to their incredible model. And we deeply thank all of the great people who helped with reviews and feedback (all acknowledged in the paper).
Again, our project website is:
13/13 (END)
Our freshly minted ICCV2023 paper: The nice anti-aliasing of mip-NeRF 360, but with most of the speed of Instant NGP. Error rate reductions of 8%-77% compared to either prior technique, and 24x faster than the most accurate NeRF baseline we tried.
We are able to alleviate overfitting using this approach. We show that finetuning without this loss term leads to accelerated overfitting to subject pose, appearance, or context, which decreases generation variability and yields incorrect scenes.
10/N
@afneil
The study is hard to read. From what I saw, it 1. is a retrospective study 2. treats patients who are severely ill, probably later in the course of the disease.
HCQ has in vitro antiviral effects against SARS-CoV-2 and should be used EARLY. It's not effective when used late!
Here is a ✨ demo ✨ that you can access on the desktop version of the website. We're excited by the options Magic Insert opens up for artistic creation, content creation and for the overall expansion of GenAI controllability.
We just released the dataset for 👽 RealFill 👽
Also - the paper has been updated on arxiv. I guess I also forgot to mention that RealFill has been accepted at SIGGRAPH 2024 🥳
arxiv:
dataset:
Party time! The SD3 paper made it to arxiv:
Key takeaways:
- flow matching is very nice.
- back to work with
@pess_r
and a fantastic team ♥️
The paper is full of details on improved flow matching, scaling and engineering. Enjoy!
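For readers wondering what "flow matching is very nice" cashes out to: a toy sketch of the rectified-flow variant, where you interpolate in a straight line between noise and data and regress the constant velocity, then sample by integrating that velocity field. Everything here, including the oracle "model", is illustrative rather than anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(loc=3.0, size=(1024, 2))   # "data" samples
x0 = rng.normal(size=(1024, 2))            # noise samples
t = rng.uniform(size=(1024, 1))            # random timesteps in [0, 1]

xt = (1.0 - t) * x0 + t * x1   # straight-line path from noise to data
v_target = x1 - x0             # constant velocity along that path

# Training would regress a network v_theta(xt, t) onto v_target with MSE.
# Here we plug in the oracle velocity as a stand-in for a trained network:
v_pred = v_target
fm_loss = ((v_pred - v_target) ** 2).mean()
print(fm_loss)  # 0.0 for the oracle

# Sampling = integrating the ODE dx/dt = v(x, t) from t=0 to 1 (Euler):
x = x0.copy()
steps = 10
for _ in range(steps):
    x += (1.0 / steps) * v_target  # with the oracle velocity, lands on x1
print(np.allclose(x, x1))
```

The appeal is that the target is simple (no noise schedules to invert) and the straight paths make few-step sampling behave well, which is part of what the SD3 paper digs into.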
@alexandrosM
@R_H_Ebright
This letter is pretty startling, I have to say. As scientists, how could they have been so certain about the origins of the virus about a month after news of the outbreak? It's always important to keep a little bit of doubt when the evidence is not fully there yet
@marwilliamson
The sanctions were primarily targeted towards the regime (who wine and dine at expensive restaurants while the people starve). I just don’t agree with this specific example.
"This Should Be Impossible!" 🥳🥳 Our RealFill work at
@Google
made it into Two Minute Paper (
@twominutepapers
🙏). Truly great presentation of the work!
TLDR: Meet ✨Lumiere✨ our new text-to-video model from
@GoogleAI
!
Lumiere is designed to create entire clips in just one go, seamlessly opening up possibilities for many applications:
Image-to-video 🖼️ Stylized generation 🖌️ Video editing 🪩 and beyond.
See 🧵👇