Sayak Paul (@RisingSayak)
Followers: 19K · Following: 8K · Statuses: 6K
Introducing "Flux-Edit", an experimental finetune of Flux.1-Dev (@bfl_ml) following the "Flux Control" framework 🔥 It works nicely on a wide variety of editing tasks beyond style transfer, and it also works with turbo LoRAs, cutting inference from 50 steps to 8. Ckpt + code ⬇️
RT @dwarkesh_sp: The @JeffDean & @NoamShazeer episode. We talk about 25 years at Google, from PageRank to MapReduce to the Transformer to…
Yes, very true! But since images can have very different textures, I wonder how much one can fit within the exemplars for ICL.

> And it should also use some kind of reasoning (doesn't necessarily need to use a reasoning model) for more accurate result

Yes, I'm keen to see how this gets incorporated. Feel free to comment with a patch on my gist here if you have time:
@gmongaras True. Finding a middle ground could be nice, like some combination of global attention and window attention (alternating every other block), as was done in Gemma.
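A minimal sketch of the interleaving idea, using plain boolean masks where `True` means "query i may attend to key j". The `radius` window size, the choice that even-indexed blocks are global and odd-indexed blocks are local, and the helper names are all hypothetical illustrations, not Gemma's actual configuration.

```python
def global_mask(seq_len: int, causal: bool = False) -> list[list[bool]]:
    """Full attention: every query sees every key (or every earlier key if causal)."""
    return [[(j <= i) if causal else True for j in range(seq_len)] for i in range(seq_len)]


def window_mask(seq_len: int, radius: int, causal: bool = False) -> list[list[bool]]:
    """Local attention: query i only sees keys within `radius` positions of itself."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            if causal:
                row.append(0 <= i - j <= radius)  # only the current and `radius` previous keys
            else:
                row.append(abs(i - j) <= radius)  # `radius` keys on each side plus itself
        mask.append(row)
    return mask


def interleaved_masks(num_blocks: int, seq_len: int, radius: int,
                      causal: bool = False) -> list[list[list[bool]]]:
    """One mask per block, alternating global and window attention (hypothetical parity)."""
    return [
        global_mask(seq_len, causal) if block % 2 == 0 else window_mask(seq_len, radius, causal)
        for block in range(num_blocks)
    ]


masks = interleaved_masks(num_blocks=4, seq_len=6, radius=1)
# Number of allowed query-key pairs per block.
print([sum(map(sum, m)) for m in masks])  # [36, 16, 36, 16]
```

The window blocks touch far fewer query-key pairs than the global ones, which is where the savings come from, while the global blocks keep long-range mixing. In a real model these booleans would become an attention mask; for example, PyTorch's `scaled_dot_product_attention` accepts a boolean `attn_mask` in which `True` marks positions that take part in attention.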
@jbohnslav Oh, I didn't ask the model to use its own reasoning to score, nor did I use any kind of on-policy sampling.
Fair. Even so, when attention masks can be incorporated through flash attention, the memory and speed benefits become really prominent compared to the case without them. So it would be nice to have a good study here. I'd also be pretty interested to see how mixing full and local attention (like Gemma) plays out here.
Glad to see you here, and thanks for the insights. Also, thanks for your contribs!

> performing joint attention can remove the unidirectional bias in text embeddings due to LLM's causal attention.

This is good to know. So for T5-like text encoders, as used in video models like LTX and CogVideoX, is cross-attention better?

> Besides, it provides a more unified and natural way of modeling text-image/video sequences for various applications.

Could you elaborate on this?
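A toy sketch of the structural difference the quoted reply points at. It assumes a single head, no mask, and identity query/key/value projections (no learned weights); these are simplifications for illustration only, and the function names are hypothetical. With cross-attention, image tokens read from the text tokens but the text embeddings pass through unchanged. With joint attention, text and image tokens are concatenated and attend to each other bidirectionally, so each text token is also updated using later text tokens and image tokens, something a causal LLM encoder never allowed.

```python
import math

Tokens = list[list[float]]


def attend(queries: Tokens, keys: Tokens, values: Tokens) -> Tokens:
    """Single-head scaled dot-product attention with no mask and no learned projections."""
    dim = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) / total
                    for d in range(len(values[0]))])
    return out


def cross_attention(image_tokens: Tokens, text_tokens: Tokens) -> tuple[Tokens, Tokens]:
    # Image queries read from text keys/values; the text tokens are returned untouched.
    return attend(image_tokens, text_tokens, text_tokens), text_tokens


def joint_attention(image_tokens: Tokens, text_tokens: Tokens) -> tuple[Tokens, Tokens]:
    # Concatenate both streams; every token attends to every token in both directions.
    tokens = text_tokens + image_tokens
    updated = attend(tokens, tokens, tokens)
    n_text = len(text_tokens)
    return updated[n_text:], updated[:n_text]


text = [[1.0, 0.0], [0.0, 1.0]]
image = [[1.0, 1.0]]
_, text_after_cross = cross_attention(image, text)
_, text_after_joint = joint_attention(image, text)
print(text_after_cross == text)  # True: text embeddings are unchanged
print(text_after_joint == text)  # False: text embeddings are revised
```

This only shows where information can flow; it says nothing about which design works better in practice for T5-style (already bidirectional) encoders, which is the open question above.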
@YKirstain Plus, I don't see an easy way to incorporate masks in joint attention. Perhaps that isn't needed, but I haven't seen it explored either.
RT @RisingSayak: We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨 We cover the models support…