Sayak Paul

@RisingSayak

Followers: 19K
Following: 8K
Statuses: 6K

ML at Hugging Face 🤗

Earth
Joined May 2012
@RisingSayak
Sayak Paul
23 days
Introducing "Flux-Edit", an experimental finetune of Flux.1-Dev (@bfl_ml), following the "Flux Control" framework 🔥 Works nicely with a wide variety of editing tasks beyond style transfer. Works with turbo LoRAs, reducing steps from 50 to 8. Ckpt + code ⬇️
13
64
299
@RisingSayak
Sayak Paul
15 hours
RT @dwarkesh_sp: The @JeffDean & @NoamShazeer episode. We talk about 25 years at Google, from PageRank to MapReduce to the Transformer to…
0
211
0
@RisingSayak
Sayak Paul
1 day
@bdsqlsz Feel free to submit a PR :)
0
0
1
@RisingSayak
Sayak Paul
1 day
@aryanvs_ Bing Bang Theory
0
0
0
@RisingSayak
Sayak Paul
1 day
It would be blasphemy not to mention `hlky`, who is the primary developer of this tooling. Check out the code and give us your feedback. Contributions are welcome too 🤗
1
0
5
@RisingSayak
Sayak Paul
2 days
Yes, very true! But since images can have very different textures, I wonder how much one can fit within the exemplars for ICL.

> And it should also use some kind of reasoning (doesn't necessarily need to use a reasoning model) for more accurate result

Yes, keen on seeing how this is incorporated. Feel free to comment with a patch on my gist here if you have time:
0
0
0
@RisingSayak
Sayak Paul
3 days
@gmongaras True. Trying to find a middle ground could be nice. Like some sort of combination of global attention and window attention (every other block) as was done in Gemma.
0
0
0
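The alternating global/local scheme mentioned above can be illustrated with mask construction alone. This is a minimal numpy sketch of the idea (full attention on even blocks, sliding-window attention on odd blocks); `layer_masks` and `sliding_window_mask` are hypothetical helpers for illustration, not Gemma's actual implementation.

```python
import numpy as np

def full_mask(seq_len: int) -> np.ndarray:
    # Every position may attend to every other position (global attention).
    return np.ones((seq_len, seq_len), dtype=bool)

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    # Position i may attend to j iff |i - j| < window (local attention).
    # A causal variant would additionally require j <= i.
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) < window

def layer_masks(num_layers: int, seq_len: int, window: int) -> list:
    # Alternate global and local attention every other block.
    return [
        full_mask(seq_len) if i % 2 == 0 else sliding_window_mask(seq_len, window)
        for i in range(num_layers)
    ]

masks = layer_masks(num_layers=4, seq_len=8, window=2)
```

In a real model these boolean masks would be passed to the attention kernel per block; the local blocks keep cost linear in sequence length while the interleaved global blocks preserve long-range mixing.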
@RisingSayak
Sayak Paul
3 days
0
0
1
@RisingSayak
Sayak Paul
3 days
@jbohnslav Oh, I didn't ask the model to use its own reasoning to score (or do any kind of on-policy sampling, either).
0
0
1
@RisingSayak
Sayak Paul
3 days
Code:
Demo:
The LLM grading prompt comes from @ma_nanye et al.:
0
0
8
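For readers unfamiliar with the setup, an LLM-as-judge grader boils down to templating the task, a rubric, and the candidate output into one prompt. The sketch below is purely illustrative; `build_grading_prompt` is a hypothetical helper and its wording is not the actual prompt from the cited work.

```python
def build_grading_prompt(task: str, candidate: str, rubric: str) -> str:
    # Assemble a simple LLM-as-judge prompt (hypothetical wording).
    return (
        "You are grading a model output.\n"
        f"Task: {task}\n"
        f"Rubric: {rubric}\n"
        f"Candidate output: {candidate}\n"
        "Return a score on a scale of 1 to 10 and a one-sentence justification."
    )

prompt = build_grading_prompt(
    task="Edit the image to a watercolor style.",
    candidate="A watercolor rendering of the input scene.",
    rubric="Faithfulness to the edit instruction and preservation of content.",
)
```

The resulting string would then be sent to the grading LLM, whose numeric score is parsed out of the response.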
@RisingSayak
Sayak Paul
4 days
Fair. Even with that, when attention masks can be incorporated through flash attn, the memory-speed benefits become really prominent compared to the case without. So, it would be nice to have a good study here. I'd be pretty interested to also see how mixing full attn. and local attn. (like Gemma) plays out here.
0
0
0
@RisingSayak
Sayak Paul
4 days
Glad to see you here and your insights. Also, thanks for your contribs!

> performing joint attention can remove the unidirectional bias in text embeddings due to LLM's causal attention.

This is good to know. So, for T5-like text encoders, as used in video models like LTX and CogVideoX -- is cross-attention better?

> Besides, it provides a more unified and natural way of modeling text-image/video sequences for various applications.

Could you elaborate more on this?
1
0
0
@RisingSayak
Sayak Paul
4 days
@YKirstain Plus I don’t see an easy way to incorporate masks in joint attn — perhaps that is not needed but haven’t seen that being explored either.
0
0
0
@RisingSayak
Sayak Paul
5 days
RT @RisingSayak: We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨 We cover the models support…
0
18
0