![Sulin Liu Profile](https://pbs.twimg.com/profile_images/1869600616189952000/xc1Sw7gz_x96.jpg)
Sulin Liu
@su_lin_liu
Followers
567
Following
2K
Statuses
186
Postdoc @MIT Ex: Machine Learning PhD @Princeton @Meta @NTUsg @NUSingapore
Joined March 2011
Discrete generative models use denoisers for generation, but they can slip up. What if generation *isn’t only* about denoising?🤔 Introducing DDPD: Discrete Diffusion with Planned Denoising🤗🧵(1/11) w/ @junonam_ @AndrewC_ML @HannesStaerk @xuyilun2 Tommi Jaakkola @RGBLabMIT
5
54
226
Re "only works in conditional sampling", agree that this is a limitation, does not allow you to sample general images in the same way prompting a LLM can achieve. the Bayes' rule kind of makes sense to me -- it's more about making use of the difference of the "conditioned score direction" and the "average score direction". I imagine this to be more training data efficient? but this also makes CFG lose the general sampling capability. probably also why the model size of diffusion models are much smaller
0
0
2
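For reference, a minimal sketch of the classifier-free guidance combination being described, assuming a noise/score-prediction model that can be run with and without the conditioning signal; the call signature, the null-condition convention, and the weight w are illustrative assumptions, not the exact API of any specific model.

```python
import torch

def cfg_prediction(model, x_t, t, cond, w: float = 3.0):
    """Classifier-free guidance sketch: amplify the difference between the
    "conditioned score direction" and the "average score direction".
    `model`, `cond=None` as the null condition, and `w` are assumptions."""
    eps_uncond = model(x_t, t, cond=None)  # average (unconditional) prediction
    eps_cond = model(x_t, t, cond=cond)    # conditioned prediction
    # guided = uncond + w * (cond - uncond); w > 1 sharpens conditional samples
    return eps_uncond + w * (eps_cond - eps_uncond)
```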
@agihippo @Swarooprm7 It's not even that close, more like OAI vs. some unknown lab ... (wait...
0
0
0
@jxmnop Not true, many are PhDs from the top 2 China universities, which can be on par with top US PhDs
1
0
10
@AharonAzulay @ma_nanye Our recent DDPD paper might be of interest
Discrete generative models use denoisers for generation, but they can slip up. What if generation *isn’t only* about denoising?🤔 Introducing DDPD: Discrete Diffusion with Planned Denoising🤗🧵(1/11) w/ @junonam_ @AndrewC_ML @HannesStaerk @xuyilun2 Tommi Jaakkola @RGBLabMIT
0
0
1
Cool paper on inference-time search for diffusion models! The use of a verifier for search at test time is similar in spirit, but very different in method/theory between continuous diffusion and discrete diffusion (DDPD). So many interesting things to further explore!
Inference-time scaling for LLMs drastically improves the model's abilities in many respects, but what about diffusion models? In our latest study, "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps", we reframe inference-time scaling as a search problem over sampling noises. Our results show that increasing search computation can further enhance generation performance, pushing the capabilities of diffusion models further. 🧵[1/n]
0
1
7
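As a rough illustration of that search framing, a minimal best-of-N sketch over initial sampling noises, assuming a black-box sample(noise) generator and a verifier that scores outputs; both callables are placeholders, and the actual search algorithms studied in the paper may be more sophisticated than this naive loop.

```python
import torch

def best_of_n_search(sample, verifier, shape, n_candidates: int = 16):
    """Spend extra inference compute by searching over sampling noises:
    draw several noise seeds, generate from each, keep the sample the
    verifier scores highest. `sample` and `verifier` are placeholders."""
    best_x, best_score = None, float("-inf")
    for _ in range(n_candidates):
        noise = torch.randn(shape)   # candidate initial sampling noise
        x = sample(noise)            # run the full denoising chain from this noise
        score = verifier(x)          # external quality score for the result
        if score > best_score:
            best_x, best_score = x, score
    return best_x, best_score
```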
@kohjingyu Haha, it's very open to interpretation😂 Personally I don't think the deterministic activations in NNs match neurons, which are inherently quantum from electro-chemical processes in the brain. But I have really little idea about how to model the brain as a whole.
0
0
1
RT @brekelmaniac: I wrote a thing about "RL or control as Bayesian inference", which encompasses - RLHF and controlled generation in LLMs -…
0
100
0
RT @JerryWeiAI: My holiday side quest at @AnthropicAI: How well can Claude play Geoguessr? 🗺️ I had Claude look at 200K+ Street View image…
0
28
0
@YouJiacheng @cloneofsimo Ah that's a good idea. Yea cost will be an issue, but maybe a few steps with parallel decoding. Essentially what we were doing in the paper is one-step parallel sampling of z_t.
0
0
1
@YouJiacheng @cloneofsimo That's a cool idea! The model will need to infer change (flow) of the z vector from the change of x, which might offer a more consistent estimate of z.
1
0
0
p(x^d | x_noisy, z^d = N) is the reconstruction probability we want to compute when the d-th dimension is picked by the planner for denoising (i.e. the denoising step). In discrete diffusion, the denoising prediction is restricted (by the transformer) to be per-dimension instead of a joint probability. (13) is a way to use a masked denoiser to compute the denoising step for x_t with noisy variables. Approximation error might occur for p(z_t | x_t, z_t^d = N) when using independent p(z_t^d | x_t) predictions from the planner (also a transformer), but for text we found this approximation error is minimal; for images it might be larger than for text.
1
0
1
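Spelling out the decomposition being referenced, as a math sketch reconstructed from this thread (my reading of what (13) expresses, not a verbatim copy of the paper's equation):

```latex
% Reconstruction probability for dimension d, marginalizing over the
% latent noisy/clean indicators z_t (reconstruction of (13) as described above):
p\left(x_1^d \mid x_t,\, z_t^d = \mathrm{N}\right)
  = \sum_{z_t} p\left(z_t \mid x_t,\, z_t^d = \mathrm{N}\right)\,
               p\left(x_1^d \mid x_t,\, z_t\right)

% Planner approximation: independent per-position predictions,
% which is where the approximation error mentioned above comes from:
p\left(z_t \mid x_t\right) \approx \prod_{i} p\left(z_t^i \mid x_t\right)
```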
Sure, happy to! (13) states that the denoising probability for a noisy image (with no [MASK] token state modeled) can be decomposed into an expectation over the latent noisy/clean state z, and denoising conditioned on z. To use a mask denoiser, one can sample a realization of z_t from the planner, apply [MASK] to the positions z_t marks as noisy, and use the mask denoiser to get the reconstruction probability p(x_1^d | x_t, z_t). In practice, the z predictions are made independently for each position, which might introduce approximation error in p(z_t | x_t)
1
0
1
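A minimal sketch of that procedure with one Monte Carlo sample of z_t, assuming hypothetical planner and mask_denoiser callables (per-position noisy probabilities and a standard masked-token denoiser); the mask id and tensor shapes are placeholders, not the paper's implementation.

```python
import torch

MASK_ID = 0  # placeholder id for the [MASK] token

def denoise_with_planner(planner, mask_denoiser, x_t, d):
    """Estimate p(x_1^d | x_t, z_t^d = noisy) with one sampled z_t:
    1) sample which positions are noisy from the planner (independently per position),
    2) force position d to be noisy and mask all noisy positions,
    3) query the masked denoiser for the reconstruction distribution at d."""
    p_noisy = planner(x_t)                 # (seq_len,) P(z_t^i = noisy | x_t)
    z_t = torch.bernoulli(p_noisy).bool()  # independent per-position realization
    z_t[d] = True                          # condition on z_t^d = noisy
    x_masked = x_t.clone()
    x_masked[z_t] = MASK_ID                # apply [MASK] to noisy positions
    logits = mask_denoiser(x_masked)       # (seq_len, vocab_size)
    return logits[d].softmax(-1)           # reconstruction probs for dimension d
```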
Cool visual about how DDPD works!
In DDPD, the planner decides which tokens to denoise, and the denoiser decides what to replace them with. The model's knowledge is decomposed into guessing which part is incoherent and how it's incoherent. Left is the planner's prediction on 'what's wrong'. Right is the denoising state. You can see it's very confident on the noisy part
0
0
14
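To make the planner/denoiser split concrete, a rough sketch of one planned-denoising update; the function names, greedy position choice, and categorical resampling are illustrative assumptions, not the paper's exact sampling rule.

```python
import torch

def ddpd_step(planner, denoiser, x):
    """One planned-denoising update: the planner guesses which position is
    most likely corrupted ("what's wrong"), the denoiser proposes what to
    put there instead. `planner` and `denoiser` are placeholder callables."""
    p_noisy = planner(x)                          # (seq_len,) P(position is noisy)
    d = int(p_noisy.argmax())                     # pick the most suspicious position
    probs = denoiser(x, d)                        # (vocab_size,) replacement distribution
    x = x.clone()
    x[d] = torch.multinomial(probs, 1).item()     # resample that token
    return x
```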