Sayak Paul @RisingSayak profile

Sayak Paul

@RisingSayak

Followers

19K

Following

8K

Statuses

6K

ML at Hugging Face 🤗

Earth

Joined May 2012

Sayak Paul

@RisingSayak

23 days

Introducing "Flux-Edit", an experimental finetune of Flux.1-Dev (@bfl_ml), following the "Flux Control" framework 🔥 Works nicely with a wide variety of editing tasks beyond style transfer. Works with turbo LoRAs, reducing steps from 50 to 8. Ckpt + code ⬇️

13

64

299

Sayak Paul

@RisingSayak

15 hours

RT @dwarkesh_sp: The @JeffDean & @NoamShazeer episode. We talk about 25 years at Google, from PageRank to MapReduce to the Transformer to…

0

211

0

Sayak Paul

@RisingSayak

1 day

@bdsqlsz Feel free to submit a PR :)

0

1

Sayak Paul

@RisingSayak

1 day

@aryanvs_ Bing Bang Theory

0

Sayak Paul

@RisingSayak

1 day

It would be blasphemy not to mention `hlky`, who is the primary developer of this tooling. Check out the code and give us your feedback. Contributions are welcome too 🤗

1

0

5

Sayak Paul

@RisingSayak

2 days

Yes, very true! But since images can have very different textures, I wonder how much one can fit within the exemplars for ICL.

> And it should also use some kind of reasoning (doesn't necessarily need to use a reasoning model) for more accurate result

Yes, keen on seeing how this is incorporated. Feel free to comment with a patch on my gist here if you have time:

0

Sayak Paul

@RisingSayak

3 days

@gmongaras True. Trying to find a middle ground could be nice. Like some sort of combination of global attention and window attention (every other block) as was done in Gemma.

0

Sayak Paul

@RisingSayak

3 days

@arnabbiswas1 Cc: @RemiCadene

0

1

Sayak Paul

@RisingSayak

3 days

@jbohnslav Oh, I didn’t ask the model to use its own reasoning to score (or any kind of on-policy sampling, either).

0

1

Sayak Paul

@RisingSayak

3 days

Code: Demo: The LLM grading prompt comes from @ma_nanye et al.:

0

8

Sayak Paul

@RisingSayak

4 days

Fair. Even with that, when attention masks can be incorporated through flash attn, the memory-speed benefits become really prominent compared to the case without. So, it would be nice to have a good study here. I'd be pretty interested to also see how mixing full attn. and local attn. (like Gemma) plays out here.

0

Sayak Paul

@RisingSayak

4 days

Glad to see you here and your insights. Also, thanks for your contribs!

> performing joint attention can remove the unidirectional bias in text embeddings due to LLM's causal attention.

This is good to know. So, for T5-like text encoders, as used in video models like LTX, CogVideoX -- is cross-attention better?

> Besides, it provides a more unified and natural way of modeling text-image/video sequences for various applications.

Could you elaborate more on this?

1

0

Sayak Paul

@RisingSayak

4 days

@YKirstain Plus I don’t see an easy way to incorporate masks in joint attn — perhaps that is not needed but haven’t seen that being explored either.

0

Sayak Paul

@RisingSayak

5 days

RT @RisingSayak: We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨 We cover the models support…

0

18

0

Sayak Paul

@RisingSayak

5 days

@PrabhhavSharma @AshwiniVaishnaw @aakrit One.

0

1