dome | Outlier

@dome_271

Followers: 2,117
Following: 384
Media: 137
Statuses: 618
Pinned Tweet
@dome_271
dome | Outlier
8 months
It finally happened. We are releasing Stable Cascade (Würstchen v3) together with @StabilityAI ! And guess what? It's the best open-source text-to-image model now! You can find the blog post explaining everything here: 🧵 1/5
90
83
422
@dome_271
dome | Outlier
1 year
We release Würstchen. TL;DR: Reduce the training time of text-to-image models by 16x compared to Stable Diffusion 1.4 while achieving similar results in metrics and visual appearance. 9,200 GPU hours vs 150,000 GPU hours.
Tweet media one
@_akhaliq
AK
1 year
Wuerstchen: Efficient Pretraining of Text-to-Image Models paper page: introduce Wuerstchen, a novel technique for text-to-image synthesis that unites competitive performance with unprecedented cost-effectiveness and ease of training on constrained
Tweet media one
4
21
121
13
55
296
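The 16x headline above is consistent with the quoted GPU-hour budgets; a quick sanity check, using only the numbers from the tweet:

```python
# GPU-hour budgets as quoted in the tweet
wuerstchen_hours = 9_200    # Würstchen pretraining
sd14_hours = 150_000        # Stable Diffusion 1.4 pretraining

speedup = sd14_hours / wuerstchen_hours
print(f"{speedup:.1f}x")    # ~16.3x, in line with the "16x" claim
```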
@dome_271
dome | Outlier
1 year
After so much work we now finally release Würstchen v2!!!! You can use it with @diffuserslib ! Find all information in the blog post: I can't describe how happy I am to have this out :c Big thanks to @krasul @RisingSayak @pcuenq @multimodalart +Patrick
23
54
233
@dome_271
dome | Outlier
1 year
So happy about this! Our new Paella model is out.
5
39
204
@dome_271
dome | Outlier
3 months
UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks Image generations between 4k and 6k! Utilizes Würstchen v3 / Stable Cascade and finetunes / extends the model with some really cool ideas. I'm super impressed with the details of the generations!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
7
28
160
@dome_271
dome | Outlier
1 year
Würstchen v2 vs. Würstchen v3 "A grandpa holding a sign that says 'Thomas', photo." Würstchen v3 is much better at text. We are still evaluating other categories and the model is still finetuning. Really hoping this model becomes a banger! @StabilityAI @pabloppp
Tweet media one
11
24
158
@dome_271
dome | Outlier
2 years
I made a new video explaining Diffusion Models. I did a simple yet comprehensive explanation of both the idea and gave a full math derivation! Check it out:
Tweet media one
2
35
150
@dome_271
dome | Outlier
1 year
Here is some progress on Würstchen (). The model trained for 260k steps at 512x512 and now 100k steps at 1024x1024. Batch size of 1280. 7,691 GPU hours. (Other models usually take > 100,000 GPU hours) I will explain what we changed + show some results. 1/9
Tweet media one
2
34
146
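For scale, the step counts and batch size quoted above determine how many image samples the model has consumed so far; a back-of-the-envelope sketch (all numbers from the tweet):

```python
steps_512 = 260_000    # training steps at 512x512
steps_1024 = 100_000   # continued training at 1024x1024
batch_size = 1_280

samples = (steps_512 + steps_1024) * batch_size
print(f"{samples:,} samples seen")  # 460,800,000 samples seen
```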
@dome_271
dome | Outlier
8 months
Würstchen v3 - Face ControlNet Still work in progress. This will probably also be released with v3 in 2-3 weeks. Excited! @pabloppp @_hsnyder_
Tweet media one
7
13
121
@dome_271
dome | Outlier
9 months
We trained some ControlNets for Würstchen v3. You can try them out on the discord: There are canny and inpainting atm "photo of an alien from Area 51"
Tweet media one
Tweet media two
5
21
115
@dome_271
dome | Outlier
7 months
Excited to share that I'm joining @LumaLabsAI next month to work on some epic stuff.
18
3
100
@dome_271
dome | Outlier
1 year
Würstchen v2 - some cinematic 1024x2048 generated images. 4 images at 1024x2048 take 7 seconds to generate! Stable Diffusion XL takes 40 seconds to do the same. More images in the thread below. Note: Würstchen was not finetuned on some fancy dataset, just pretraining! OUT SOON
Tweet media one
6
14
95
@dome_271
dome | Outlier
4 months
LET'S GOOOOOOOOOOO SO HAPPY TO SHARE OUR FIRST TEXT-TO-VIDEO MODEL!
@LumaLabsAI
Luma AI
4 months
Introducing Dream Machine - a next generation video model for creating high quality, realistic shots from text instructions and images using AI. It’s available to everyone today! Try for free here #LumaDreamMachine
591
2K
7K
12
15
87
@dome_271
dome | Outlier
2 years
Here it is! My PyTorch Implementation video on Diffusion Models. It contains unconditional and conditional code & training. And I'm also implementing classifier-free-guidance and exponential moving average!
Tweet media one
1
14
87
@dome_271
dome | Outlier
4 months
Hey that's me
@LumaLabsAI
Luma AI
4 months
This is Dream Machine, our first generative text-to-video and image-to-video model. This video showcases some of the capabilities of #LumaDreamMachine that we're most proud of. Try Dream Machine for free today 👉
80
113
720
6
4
86
@dome_271
dome | Outlier
3 months
I really don't understand why Stability apparently is very much against Stable Cascade. What have we done lol? What's wrong with the model xd It's not even mentioned on the models page :(
Tweet media one
21
2
85
@dome_271
dome | Outlier
8 months
Following the release of Stable Cascade, I want to highlight a point that might be really interesting to researchers. As you have seen, Stable Cascade compresses images down 42x spatially, while reconstructing them very accurately. This means, .... 1/4
2
13
78
@dome_271
dome | Outlier
1 year
We are continuing to try to improve Würstchen (). The images show a new model after just 160k training steps -> that's only 2,400 GPU hours.
Tweet media one
3
6
75
@dome_271
dome | Outlier
1 year
You can try out Würstchen v2 on the Huggingface Demo!
Tweet media one
6
10
71
@dome_271
dome | Outlier
9 months
Würstchen received an Oral at ICLR WUHUUUUUUUUUUUUUUUUUUUUUU See you in May WUHUUUUUUUUUU Much love to @pabloppp @M_L_Richter @MAubreville @chrisjpal @iclr_conf
Tweet media one
9
6
69
@dome_271
dome | Outlier
1 year
Training of Würstchen v3 has started! 1B and 3.6B versions are training. ( @pabloppp started the trainings and took the big challenge of fighting FSDP all by himself)
Tweet media one
Tweet media two
Tweet media three
4
8
63
@dome_271
dome | Outlier
8 months
This is incredible. With that you are able to run Stable Cascade in FP16 (which before overflowed) and @KBlueleaf also shows how to use the model in FP8. Might be interesting to some and should make it more accessible to more GPUs that don't support bfloat16. Thank you so much!
@KBlueleaf
琥珀青葉@LyCORIS
8 months
Solved the FP16 problem of Stable Cascade
Tweet media one
Tweet media two
Tweet media three
Tweet media four
6
20
125
2
9
62
@dome_271
dome | Outlier
1 year
Würstchen - Text-to-Video. Turns out using Würstchen for video generation might give even better benefits than on images in terms of training & sampling efficiency. Still very early on, but we are working on it! Model: 550k steps image + 220k steps video GPU Hours: 11200
Tweet media one
1
10
59
@dome_271
dome | Outlier
8 months
Extremely shocked by this quality. Unbelievable and unseen. Time to try to train a model that has the same quality 🫣
@sama
Sam Altman
8 months
here is sora, our video generation model: today we are starting red-teaming and offering access to a limited number of creators. @_tim_brooks @billpeeb @model_mechanic are really incredible; amazing work by them and the team. remarkable moment.
2K
4K
26K
6
0
59
@dome_271
dome | Outlier
1 year
I can not wait to train on this data. That’s incredible. Huge huge shoutouts to the authors!!!!
@_akhaliq
AK
1 year
JourneyDB: A Benchmark for Generative Image Understanding paper page: While recent advancements in vision-language models have revolutionized multi-modal understanding, it remains unclear whether they possess the capabilities of comprehending the
Tweet media one
2
26
197
1
12
57
@dome_271
dome | Outlier
5 months
Does anyone else feel like diffusion models have a hard time generating high-frequency details? Any experience / thoughts / ideas / pointers on that matter? We have often observed that our models don't generate high frequencies in images and have found it hard to improve.
19
4
55
@dome_271
dome | Outlier
1 year
You can test Würstchen v3 on our Discord Server in the #würstchen-v3 channel. Feel free to join and let us know what you think of this preliminary version of Würstchen v3. Link:
Tweet media one
2
7
55
@dome_271
dome | Outlier
1 year
Würstchen v3: 'Cinematic realistic photography of an anthropomorphic dog wearing a hat and sunglasses standing in front of the eiffel tower holding a sign that says "WURST" in colourful letters' *cherrypicked
Tweet media one
3
6
55
@dome_271
dome | Outlier
11 months
Würstchen v3 :D
Tweet media one
3
3
54
@dome_271
dome | Outlier
8 months
You cannot believe how much work went into this release, the models and the codebase. We release all of our code, including training, finetuning, ControlNet, LoRA and normal inference here:
1
1
53
@dome_271
dome | Outlier
1 year
Such a good paper, and really insightful on training instabilities in large machine learning models with AdamW
@_akhaliq
AK
1 year
1. Theory on Adam Instability in Large-Scale Machine Learning paper: abstract: We present a theory for the previously unexplained divergent behavior noticed in the training of large language models. We argue that the phenomenon is an artifact of the
Tweet media one
1
7
59
0
4
51
@dome_271
dome | Outlier
1 year
Unbelievable.... So much more to come!
Tweet media one
3
0
50
@dome_271
dome | Outlier
1 year
"Pikachu dressed as an astronaut standing on mars, cgi, cinematic" (non-cherrypicked, not finetuned on some carefully crafted dataset, 1536x1024, 4 images generated in ~5 seconds) - Würstchen v2 - soon
Tweet media one
2
3
49
@dome_271
dome | Outlier
2 years
I uploaded a new video explaining Cross Attention. In my opinion a technique that is often not really spoken about, although it powers models such as #stablediffusion #imagen #muse etc. Let me know what you think!
Tweet media one
3
12
45
@dome_271
dome | Outlier
1 year
Würstchen v3 is also able to do image variations out of the box. The top image was generated with: "A photo of a cow wearing a cowboy hat" And the images below that are image variations based only on that given image, no caption.
Tweet media one
4
7
41
@dome_271
dome | Outlier
8 months
"cinematic" Würstchen v3
Tweet media one
1
3
40
@dome_271
dome | Outlier
1 year
Würstchen is trending on Huggingface Spaces 🤓
Tweet media one
3
3
38
@dome_271
dome | Outlier
1 year
This is really cool: We finetuned Würstchen v2 on some pretty data. Here are generations from the base model & the finetuned model AFTER JUST 2000 steps. (batch size = 384). Prompt: "portrait of a mysterious dog, creative concept trending on artstation" (no neg prompt, cfg = 4)
Tweet media one
Tweet media two
8
2
38
@dome_271
dome | Outlier
1 year
"Anthropomorphic cat dressed as a fire fighter" - Würstchen v2 Finetune - CFG=4.0, no negative prompt, release next week :D
Tweet media one
0
3
35
@dome_271
dome | Outlier
8 months
Or also for text-to-video generation: imagine having a really good model that compresses videos 42x spatially and maybe 8x temporally. That would mean super efficient and fast generations. If you have other ideas, we can chat about them if you want:
2
0
33
@dome_271
dome | Outlier
8 months
That almost all information of a 3x1024x1024 image is stored in just 16x24x24 numbers. You can test this yourself with the notebook we have that explains all the details: That's why we think it makes so much sense to train the T2I model in that space.
2
5
34
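The compression claim above is easy to verify from the quoted shapes; a minimal sketch (shapes taken directly from the tweet):

```python
image_elems = 3 * 1024 * 1024   # RGB image: 3 x 1024 x 1024 pixels
latent_elems = 16 * 24 * 24     # latent quoted above: 16 x 24 x 24 numbers

print(image_elems // latent_elems)  # 341x fewer numbers overall
print(round(1024 / 24, 1))          # ~42.7x per spatial axis -> the "42x spatial" figure
```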
@dome_271
dome | Outlier
1 year
Finetuned Würstchen for 2k steps at 1536x1024 and 1024x1536. It's crazy how fast models can adapt to new image sizes and aspect ratios. This model has now trained in total for 28,000 GPU hours (916k steps). (SD 1.4 used 150,000)
Tweet media one
Tweet media two
1
6
33
@dome_271
dome | Outlier
1 year
New Explanation Video about Paella! Check it out here: I'm explaining how Paella works & the story behind it!
3
10
34
@dome_271
dome | Outlier
1 year
1 / N: We are trying some new conditions on Würstchen, (cheeky stealing of SDXL ideas lol) by conditioning on aesthetic scores, crop ratios and image sizes. The first set of images here is using an aesthetic score of 7 and the second using an aesthetic score of 5.
Tweet media one
Tweet media two
4
4
34
@dome_271
dome | Outlier
1 year
That’s where we trained Würstchen btw:
Tweet media one
5
0
32
@dome_271
dome | Outlier
1 year
"An astronaut in an orange space suit walking on a foreign mysterious planet in a different galaxy" - Würstchen v2
Tweet media one
3
2
32
@dome_271
dome | Outlier
1 year
"Photography of an astronaut running scared in a cave, trying to escape from an extraterrestrial creature, ci, cinematic" Würstchen v3 - 2048x1536 v3 will come with 4 models: Stage C: 1B, 3.6B Stage B: 700M, 3B Images here are from Stage C 1B and Stage B 700M @pabloppp
Tweet media one
1
4
31
@dome_271
dome | Outlier
1 year
First baby steps on text-to-video using Würstchen ()
Tweet media one
0
4
30
@dome_271
dome | Outlier
1 year
Würstchen v3 checkpoint file sizes have awesome numbers in float16 and float32
Tweet media one
Tweet media two
4
0
29
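The actual file sizes did not survive the scrape, but the usual rule of thumb is 2 bytes per parameter in float16 and 4 in float32. A rough sketch for the 1B and 3.6B Stage C sizes mentioned elsewhere in this timeline (parameter counts are my assumption; optimizer state and metadata are ignored):

```python
def ckpt_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate checkpoint size in GiB: params x bytes, weights only."""
    return n_params * bytes_per_param / 1024**3

for n in (1.0e9, 3.6e9):  # Stage C sizes mentioned in the thread
    print(f"{n/1e9:.1f}B params: "
          f"fp16 ~{ckpt_gib(n, 2):.1f} GiB, fp32 ~{ckpt_gib(n, 4):.1f} GiB")
```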
@dome_271
dome | Outlier
7 months
Now that's an epic horse @Suhail
Tweet media one
1
1
28
@dome_271
dome | Outlier
1 year
We will release the weights and the updated code on our GitHub (). Training of Stage C took less than 5 days on 64 GPUs, and can be reproduced much easier. 5/9
1
3
28
@dome_271
dome | Outlier
1 year
Würstchen v3
Tweet media one
1
4
28
@dome_271
dome | Outlier
1 year
"A photo of a beautiful house standing on top of a mountain" - Würstchen v2 Finetuned. Release next week.
Tweet media one
2
2
26
@dome_271
dome | Outlier
1 year
"Cinematic realistic photography of a teddy bear sitting on a luxury bike driving through time square in new york" Würstchen v3
Tweet media one
1
2
27
@dome_271
dome | Outlier
1 year
I love the paper. Finally an open-source zero-shot "finetuning" model. Also the approach seems to be pretty cool. Detect identities, (embed->concat->project) identities + regularize cross-attn maps in training with segmentation maps of identities. I wanna try it too!
@_akhaliq
AK
1 year
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention FastComposer generates images of multiple unseen individuals with different styles, actions, and contexts. It achieves 300 times-2500 times speedup compared to fine-tuning-based methods and
Tweet media one
3
50
211
1
8
24
@dome_271
dome | Outlier
7 months
"grainy closeup photo of an antropomorphic mushroom creature with a cute face and limbs sitting sad on a mossy rock in the middle of a forest" #stablecascade
Tweet media one
0
3
25
@dome_271
dome | Outlier
8 months
I'm soooo happy with the outputs, look at them:
Tweet media one
3
0
24
@dome_271
dome | Outlier
1 year
Pikachu as different animals. Würstchen v2. Release in the next days.
Tweet media one
Tweet media two
Tweet media three
3
2
25
@dome_271
dome | Outlier
1 year
We will continue trying to improve Würstchen and soon go to text-to-videos :D Thanks to @StabilityAI for providing compute to do this research! Shared work with @pabloppp 9/9
Tweet media one
1
0
25
@dome_271
dome | Outlier
4 months
This is literally one of the funniest posts of #LumaDreamMachine
@BengtTibert
Bengt Tibert
4 months
Assembly Made with #LumaDreamMachine 🔥 Thank you @LumaLabsAI for the invitation to play with your new mind blowing toy!🔥 I sleep even less now. Track made with @udiomusic @fellowshiptrust #artificialfriends
78
128
734
3
1
23
@dome_271
dome | Outlier
2 months
Yey
@LumaLabsAI
Luma AI
2 months
Dream Machine 1.5 is here 🎉 Now with higher-quality text-to-video, smarter understanding of your prompts, custom text rendering, and improved image-to-video! Level up. #LumaDreamMachine
97
303
1K
2
3
24
@dome_271
dome | Outlier
3 months
Tweet media one
3
3
23
@dome_271
dome | Outlier
1 year
"A photo of an astronaut riding a horse" - Würstchen v2 Finetuned
Tweet media one
2
1
23
@dome_271
dome | Outlier
1 year
"Anthropomorphic blue owl, big green eyes, lots of details, portrait, finely detailed armor, cinematic lighting, intricate filigree metal design, 8k, unreal engine, octane render, realistic, redshift render" - Würstchen v2. We will release soon.
Tweet media one
1
0
23
@dome_271
dome | Outlier
1 year
Wow that's a nice step towards interpretability if that's true :o
@_akhaliq
AK
1 year
Explaining black box text modules in natural language with language models paper page:
Tweet media one
2
36
149
0
5
19
@dome_271
dome | Outlier
1 year
wow I did not know that
Tweet media one
8
3
22
@dome_271
dome | Outlier
1 year
How time passes. One year ago @pabloppp and I started with text-to-image models and we went from this (left) to this (right) within one year. Let's see where we will be next year! "A pikachu shaped hat"
Tweet media one
Tweet media two
0
3
22
@dome_271
dome | Outlier
5 months
We will be at ICLR! Wanna hang out after the conference and chat a bit? Sign up to our happy hour that we are hosting and let’s get some drinks!
@LumaLabsAI
Luma AI
5 months
Attending @iclr_conf next week? Come grab a drink and connect with us during our happy hour. 🔗 #ICLR2024
1
9
43
4
1
21
@dome_271
dome | Outlier
1 year
Würstchen Base Model vs. Finetuned Model. Release next week.
Tweet media one
Tweet media two
2
1
21
@dome_271
dome | Outlier
1 year
Wow nice. Never tried that!
@nerijs
Brandon G. Neri
1 year
Can confirm Würstchen can do cute pixel art corgis out of the box
Tweet media one
Tweet media two
1
1
30
1
1
21
@dome_271
dome | Outlier
5 months
Würstchen 🤓
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
5 months
MediSyn: Text-Guided Diffusion Models for Broad Medical 2D and 3D Image Synthesis abs: Two instruction-tuned text-guided latent diffusion models, one for 2D medical images and one for 3D medical images. Trained on a dataset of 5.7 million 2D medical
Tweet media one
2
18
78
2
1
21
@dome_271
dome | Outlier
8 months
So cool thank you!
@mk1stats
mkshing
8 months
@dome_271 @pabloppp @StabilityAI Also, I hacked to run the Stable Cascade on Colab free plan (T4 16GB)! Enjoy! #stablecascade colab:
5
31
108
0
3
20
@dome_271
dome | Outlier
8 months
Moreover, Stable Cascade performs really well compared to other models:
Tweet media one
Tweet media two
1
0
19
@dome_271
dome | Outlier
1 year
1024x1024 samples of Würstchen after 260k training iterations at only 512x512. Training used 4,126 GPU hours so far and took < 3 days.
Tweet media one
Tweet media two
Tweet media three
2
3
20
@dome_271
dome | Outlier
1 year
Generating very contrast-rich black-and-white images is also possible, which e.g. Stable Diffusion has problems () with due to a different minimum signal-to-noise ratio: "A black square on a white background". 7/9
Tweet media one
2
0
20
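The signal-to-noise issue referenced above is that common noise schedules never reach zero signal at the final timestep, so the model never trains on pure noise and some of the image's mean brightness always leaks through. A sketch of the terminal SNR using the classic linear DDPM schedule (an assumption for illustration, not necessarily the exact schedule Stable Diffusion uses):

```python
# Terminal SNR of a linear beta schedule (assumed DDPM-style defaults)
T = 1000
alpha_bar = 1.0
for t in range(T):
    beta = 1e-4 + (0.02 - 1e-4) * t / (T - 1)  # linear from 1e-4 to 0.02
    alpha_bar *= 1.0 - beta                    # cumulative signal retention

snr_final = alpha_bar / (1.0 - alpha_bar)
print(f"terminal SNR: {snr_final:.2e}")  # small but nonzero -> never pure noise
```

Because this terminal SNR is nonzero, a trace of the original image mean always survives the forward process, which is one explanation for why solid black-on-white generations are hard for such schedules.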
@dome_271
dome | Outlier
8 months
There might be other cool things to explore here. For example using the encoder to encode images and then train an image classifier in that space. Or other image tasks could use that model to first embed images to a small space, potentially leading to more efficient training.
1
0
18
@dome_271
dome | Outlier
2 years
I created the animations of my last video with manim, (the python library created by 3B1B to animate his videos, which is maintained by an open-source community) and put the code online here if anyone is interested:
1
2
19
@dome_271
dome | Outlier
1 year
Finally some Würstchen generated by Würstchen. Kinda interesting vibe.
Tweet media one
3
1
19
@dome_271
dome | Outlier
1 year
Sometimes I'm just so amazed at text-to-image models. So in love with these. Next Week Release - Würstchen v2!!!!
Tweet media one
3
0
19
@dome_271
dome | Outlier
1 year
"Dramatic photography of a frog evolving into a crab, crab legs, macro photography" Würstchen v2 - ( cc @pabloppp )
Tweet media one
1
2
18
@dome_271
dome | Outlier
1 year
"astronaut in the mushroom forest, psychedelic mood, astral figures, morning light, clear sky, extremely detailed, strong use of colors, pop surrealism, hard edges, heavy paint brush, 8k" - Würstchen v2 Finetuned - Release Next Week
Tweet media one
3
1
18
@dome_271
dome | Outlier
1 year
Finally someone is doing that on language models
@_akhaliq
AK
1 year
Stay on topic with Classifier-Free Guidance paper page: Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be
Tweet media one
0
25
125
1
1
18
@dome_271
dome | Outlier
1 year
One of the biggest improvements we saw comes from making use of findings from Consistency Models ( @YSongStanford @prafdhar @markchen90 @ilyasut ), which we combine with an epsilon objective. The model learns much much faster. 2/9
Tweet media one
1
1
17
@dome_271
dome | Outlier
8 months
WOW it actually happened! I have been looking forward so much to this !!!!!!!!!🤗 Thanks <3333333333333333
1
1
17
@dome_271
dome | Outlier
1 year
That's how most of my videos look after finishing them. It's quite a lot of work, but I enjoy it a lot. (The latest Paella video took about 60 hours of work, my DDPM video took 200 hours of work back then)
Tweet media one
2
0
17
@dome_271
dome | Outlier
1 month
"A serene digital illustration featuring a simplified, stylized mountain landscape bathed in sunlight, with gentle pastel hues dominated by soft yellows, where a lone figure is captured in a moment of bliss as they leisurely ride a bike across the foreground, invoking a sense of
Tweet media one
0
1
18
@dome_271
dome | Outlier
1 year
Pretty cool huh?
@wand_app
wand
1 year
Wand is now powered by #StableDiffusionXL ⚡️⚡️⚡️ Turn your sketches into highly detailed scenes 🖼️ Want to try? Waitlist in bio
1
7
48
1
0
15
@dome_271
dome | Outlier
5 months
Aww ❤️ thank you so much
1
0
15
@dome_271
dome | Outlier
1 year
Würstchen v2 - Release Next Week! Spiderman vs Batman
Tweet media one
Tweet media two
1
0
15
@dome_271
dome | Outlier
1 year
This makes training the text-conditional stage very fast & cheap, opening doors to better investigate large text-to-image models and enabling more people to train & finetune for cheaper. Work with @pabloppp @MAubreville
Tweet media one
1
2
14
@dome_271
dome | Outlier
11 months
"a still shot from a mobile time-lapse photography shoot of a city at night. It's a beautiful view of the city at night with city and traffic lights gleaming." We are using ChatGPT to enrich captions now. You can try it out on our Discord:
Tweet media one
2
4
13
@dome_271
dome | Outlier
3 months
Taylor Swift Metric will become a thing for evaluating text-to-image models. Forget FID. You heard it here first.
1
3
13
@dome_271
dome | Outlier
1 year
We achieve this by decoupling the text-conditional model even further from high resolutions. We use two models to compress 512x512 images into a tiny, low-dimensional latent space of 12x12, resulting in an f42 spatial compression, while reconstructing them faithfully.
Tweet media one
1
2
13
@dome_271
dome | Outlier
1 year
Würstchen v2 - Release Next Week! "portrait, cyberpunk mandalorian, futuristic, highly detailed" cfg = 4.0 sampler = ddpm steps = 30 (scaled) negative prompt = "" @pabloppp
Tweet media one
2
0
13
@dome_271
dome | Outlier
1 year
@nathanwchan @_akhaliq It just means that training classification models can now be improved by also training them on images generated by text-to-image models like StableDiffusion. The generated images are used to enlarge the dataset the classification models train on. And apparently it works.
3
1
13
@dome_271
dome | Outlier
8 months
Pretty cool 🥰
@artificialguybr
𝑨𝒓𝒕𝒊𝒇𝒊𝒄𝒊𝒂𝒍 𝑮𝒖𝒚
8 months
Invictus.Redmond is here! Invictus is a Stable Cascade Generalist finetune. Its Stage C finetuned. Thanks @RedmondAI for all GPU Support. Need GPU? Talk to Redmond. Download it for free on Civitai and HF. Links e more examples below!
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
17
73
1
0
12