Sulin Liu

@su_lin_liu

Followers: 567 · Following: 2K · Statuses: 186

Postdoc @MIT Ex: Machine Learning PhD @Princeton @Meta @NTUsg @NUSingapore

Joined March 2011
@su_lin_liu
Sulin Liu
4 months
Discrete generative models use denoisers for generation, but they can slip up. What if generation *isn’t only* about denoising?🤔 Introducing DDPD: Discrete Diffusion with Planned Denoising🤗🧵(1/11) w/ @junonam_ @AndrewC_ML @HannesStaerk @xuyilun2 Tommi Jaakkola @RGBLabMIT
@su_lin_liu
Sulin Liu
1 day
Re "only works in conditional sampling", agree that this is a limitation, does not allow you to sample general images in the same way prompting a LLM can achieve. the Bayes' rule kind of makes sense to me -- it's more about making use of the difference of the "conditioned score direction" and the "average score direction". I imagine this to be more training data efficient? but this also makes CFG lose the general sampling capability. probably also why the model size of diffusion models are much smaller
@su_lin_liu
Sulin Liu
4 days
@agihippo @Swarooprm7 It's not even that close; more like OAI vs. some unknown lab ... (wait...
@su_lin_liu
Sulin Liu
12 days
RT @DarioAmodei: My thoughts on China, export controls and two possible futures
@su_lin_liu
Sulin Liu
12 days
@jxmnop Not true; many are PhDs from the top two Chinese universities, which can be on par with top US PhDs
@su_lin_liu
Sulin Liu
18 days
This quirky topic summarization (edge case?) somehow made my day😂
@su_lin_liu
Sulin Liu
24 days
@kohjingyu Double McSpicy ftw!
@su_lin_liu
Sulin Liu
24 days
@AharonAzulay @ma_nanye Our recent DDPD paper might be of interest
@su_lin_liu
Sulin Liu
24 days
Cool paper on inference-time search for diffusion models! The use of a verifier for search at test time is similar in spirit but very different in method/theory between continuous diffusion and discrete diffusion (DDPD). So many interesting things to explore further!
@ma_nanye
Willis (Nanye) Ma
24 days
Inference-time scaling for LLMs drastically improves the model's abilities in many respects, but what about diffusion models? In our latest study, "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps", we reframe inference-time scaling as a search problem over sampling noises. Our results show that increasing search computation can further enhance generation performance, pushing the capabilities of diffusion models further. 🧵[1/n]
@su_lin_liu
Sulin Liu
24 days
Nice work introducing search into the diffusion sampling process! Our recent work DDPD used a verifier for on-path planning and search in discrete diffusion. Would be interesting to explore the connections, and to try best-of-N sampling for discrete diffusion.
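A minimal sketch of what verifier-based best-of-N could look like for any sampler (the `sample` and `verifier_score` callables are placeholders, not interfaces from either paper):

```python
import torch

def best_of_n(sample, verifier_score, n: int):
    # Draw n candidate generations and keep the one the verifier scores highest.
    # `sample()` returns one candidate; `verifier_score(x)` returns a scalar score.
    candidates = [sample() for _ in range(n)]
    scores = torch.tensor([verifier_score(x) for x in candidates])
    return candidates[int(scores.argmax())]
```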
@su_lin_liu
Sulin Liu
25 days
@kohjingyu Haha, it's very open to interpretation😂 Personally I don't think the deterministic activations in NNs match neurons, which are inherently quantum due to the electro-chemical processes in the brain. But I really have little idea of how to model the brain as a whole.
@su_lin_liu
Sulin Liu
1 month
RT @brekelmaniac: I wrote a thing about "RL or control as Bayesian inference", which encompasses - RLHF and controlled generation in LLMs -…
@su_lin_liu
Sulin Liu
1 month
RT @JerryWeiAI: My holiday side quest at @AnthropicAI: How well can Claude play Geoguessr? 🗺️ I had Claude look at 200K+ Street View image…
@su_lin_liu
Sulin Liu
1 month
@YouJiacheng @cloneofsimo Ah, that's a good idea. Yeah, cost will be an issue, but maybe a few steps with parallel decoding would work. Essentially, what we were doing in the paper is one-step parallel sampling of z_t.
@su_lin_liu
Sulin Liu
1 month
@YouJiacheng @cloneofsimo That's a cool idea! The model will need to infer change (flow) of the z vector from the change of x, which might offer a more consistent estimate of z.
@su_lin_liu
Sulin Liu
1 month
p(x^d | x_noisy, z^d = N) is the reconstruction probability we want to compute when the d-th dimension is picked by the planner for denoising (i.e., the denoising step) in discrete diffusion; the denoising prediction is restricted (by the transformer) to be per-dimension rather than a joint probability. (13) is a way to use a masked denoiser to compute the denoising step for x_t with noisy variables. Approximation error can arise for p(z_t | x_t, z_t^d = N) when using independent per-dimension predictions p(z_t^d | x_t) from the planner (also a transformer). For text we found this approximation error to be minimal; for images it might be larger.
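A rough sketch of how I read that step (the `planner`/`mask_denoiser` interfaces and `MASK_ID` are assumptions for illustration, not the paper's actual code):

```python
import torch

MASK_ID = 0  # hypothetical [MASK] token id

def denoise_step(x_t: torch.Tensor, d: int, planner, mask_denoiser) -> torch.Tensor:
    # One DDPD-style denoising step for dimension d via a masked denoiser.
    # planner(x_t) -> per-dimension probabilities that each token is noisy;
    # mask_denoiser(x_masked) -> per-position logits over the vocabulary.
    p_noisy = planner(x_t)                   # shape: (seq_len,)
    z_t = torch.bernoulli(p_noisy).bool()    # independent noisy/clean indicators
    z_t[d] = True                            # condition on z_t^d = N (dim d is noisy)
    x_masked = x_t.clone()
    x_masked[z_t] = MASK_ID                  # mask every position sampled as noisy
    logits = mask_denoiser(x_masked)
    return torch.softmax(logits[d], dim=-1)  # p(x^d | x_t, z_t) for the picked dimension
```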
@su_lin_liu
Sulin Liu
1 month
@YouJiacheng @cloneofsimo Yes exactly!
@su_lin_liu
Sulin Liu
1 month
Sure, happy to! (13) states that the denoising probability for a noisy image (with no [MASK] token state modeled) can be decomposed into an expectation over the latent noisy/clean state z, followed by denoising based on z. To use a mask denoiser, one can sample a realization of z_t from the planner, apply [MASK] to the positions where z is noisy, and use the mask denoiser to get the reconstruction probability p(x_1^d | x_t, z_t). In practice the z predictions are made independently for each position, which might introduce approximation errors for p(z_t | x_t).
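In symbols, my reading of the decomposition described above (notation mine, not copied from the paper):

```latex
p(x_1^d \mid x_t) \;=\; \mathbb{E}_{z_t \sim p(z_t \mid x_t)}\left[\, p(x_1^d \mid x_t, z_t) \,\right]
```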
@su_lin_liu
Sulin Liu
1 month
Cool visual of how DDPD works!
@cloneofsimo
Simo Ryu
1 month
In DDPD, the planner decides which tokens to denoise, and the denoiser decides what to replace them with. The model's knowledge is decomposed into guessing which part is incoherent and how it's incoherent. Left is the planner's prediction of "what's wrong"; right is the denoising state. You can see it's very confident on the noisy part
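A toy sketch of that planner/denoiser split as a sampling loop (the interfaces and the 0.5 stopping threshold are my assumptions, not from the thread):

```python
import torch

def ddpd_sample(x: torch.Tensor, planner, denoiser, steps: int) -> torch.Tensor:
    # Toy planner/denoiser loop: the planner guesses which positions are
    # incoherent; the denoiser proposes what to put there instead.
    for _ in range(steps):
        p_noisy = planner(x)            # (seq_len,) prob each token is wrong
        d = int(p_noisy.argmax())       # pick the most suspicious position
        if p_noisy[d] < 0.5:            # assumed stopping rule: nothing looks noisy
            break
        logits = denoiser(x, d)         # denoiser's distribution for position d
        x[d] = torch.distributions.Categorical(logits=logits).sample()
    return x
```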