Daniel Geng Profile
Daniel Geng

@dangengdg

Followers
664
Following
650
Media
26
Statuses
90

PhD student at @UmichCSE. Currently a student researcher at @GoogleDeepMind. Interested in computer vision and generative models. Previously @MetaAI and @berkeley_ai.

Joined August 2016
Pinned Tweet
@dangengdg
Daniel Geng
10 months
Can you make a jigsaw puzzle with two different solutions? Or an image that changes appearance when flipped? We can do that, and a lot more, by using diffusion models to generate optical illusions! Continue reading for more illusions and method details 🧵
16
119
610
@dangengdg
Daniel Geng
5 months
What do you see in these images? These are called hybrid images, originally proposed by Aude Oliva et al. They change appearance depending on size or viewing distance, and are just one kind of perceptual illusion that our method, Factorized Diffusion, can make.
8
84
387
@dangengdg
Daniel Geng
3 months
I'm at CVPR presenting "Visual Anagrams":
- Tuesday: 10am, Poster #429
- Friday: Oral 6B @ 1pm, Poster #118 (pm)
Let me know if you want to chat! Also, we manufactured a bunch of these "jigsaws with two solutions." If you want one, just hunt me down in the conference hall :)
8
23
166
@dangengdg
Daniel Geng
8 months
Can we use motion to prompt diffusion models? Our #ICLR2024 paper does just that. We propose Motion Guidance, a technique that allows users to edit an image by specifying “where things should move.”
2
15
76
@dangengdg
Daniel Geng
10 months
See our website, paper, and code for more details (and more illusions)! Website: arXiv: Code: Colab notebook: Big thanks to my collaborators @invernopark and @andrewhowens !
2
7
39
@dangengdg
Daniel Geng
4 months
This is an image of Corgis, but when played as a spectrogram sounds like dogs barking! Really thankful I got the chance to work on this super fun project with first author @CzyangChen . Check out his thread for many more examples, and to see how they're made!
@CzyangChen
Ziyang Chen
4 months
These spectrograms look like images, but can also be played as a sound! We call these images that sound. How do we make them? Look and listen below to find out, and to see more examples!
1
41
168
0
3
38
@dangengdg
Daniel Geng
5 months
I'll be presenting this work at #ICLR2024 on Wednesday, 10:45am in Hall B, #81 . Stop by if you're interested or reach out if you just want to chat!
@dangengdg
Daniel Geng
8 months
Can we use motion to prompt diffusion models? Our #ICLR2024 paper does just that. We propose Motion Guidance, a technique that allows users to edit an image by specifying “where things should move.”
2
15
76
0
4
36
@dangengdg
Daniel Geng
10 months
Using our method, we can create images that change appearance when flipped or rearranged…
1
1
25
@dangengdg
Daniel Geng
5 months
We can also make these images that change when viewed in grayscale. Since the human eye can't see color under dim lighting, there is actually a physical mechanism for this illusion: these images change appearance when taken from a bright room to a dim one!
1
4
25
@dangengdg
Daniel Geng
10 months
We can even make illusions with three different subjects!
2
5
25
@dangengdg
Daniel Geng
5 months
Many many more results are available at our website! code: [code link is not a typo btw!] arxiv: website:
1
1
23
@dangengdg
Daniel Geng
10 months
Our method is zero-shot and conceptually simple: just take a diffusion model, and denoise multiple views/transformations of an image…
2
0
24
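The denoising scheme sketched in this tweet can be written out in a few lines. This is an illustrative sketch, not the paper's code: `epsilon` is a hypothetical stand-in for a diffusion model's noise predictor, and the example views are the identity and a 180° rotation.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon(x, prompt):
    # Hypothetical stand-in for a diffusion model's noise prediction.
    # A real implementation would call the model conditioned on `prompt`.
    return 0.1 * x

def anagram_step(x_t, views, inverse_views, prompts):
    """Combine noise estimates from several views of the same noisy image.

    For each view v_i: predict noise on v_i(x_t) under prompt_i, map the
    estimate back with the inverse view, then average the results."""
    estimates = []
    for v, v_inv, p in zip(views, inverse_views, prompts):
        eps = epsilon(v(x_t), p)       # noise estimate in the view's frame
        estimates.append(v_inv(eps))   # map back to the canonical frame
    return sum(estimates) / len(estimates)

# Identity view plus a 180-degree rotation (an orthogonal pixel permutation)
x = rng.standard_normal((8, 8))
views = [lambda im: im, lambda im: np.rot90(im, 2)]
inverse_views = [lambda im: im, lambda im: np.rot90(im, -2)]
eps_combined = anagram_step(x, views, inverse_views, ["a dog", "a cat"])
```

The averaged estimate is then used in an ordinary sampler update, so the method stays zero-shot.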
@dangengdg
Daniel Geng
3 months
@andrewhowens and @invernopark already explained the CVPR t-shirt, but here are some of the _designs_ we considered. Thank you so so much to @ctocevents , @elluba , and @wjscheirer for asking us to do this!
1
2
25
@dangengdg
Daniel Geng
10 months
Most orthogonal transformations on an image are pretty meaningless, but luckily permutations are a subset of these transformations. This is where the idea of a "visual anagram" comes from—images that change appearance under arbitrary permutations of their pixels!
1
2
22
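A quick sanity check of the claim that permutations are orthogonal (illustrative, not from the paper's code):

```python
import numpy as np

# A permutation matrix just reorders coordinates; check that it is orthogonal.
perm = [2, 0, 3, 1]                     # an arbitrary permutation of 4 pixels
P = np.eye(4)[perm]                     # the corresponding permutation matrix
print(np.allclose(P @ P.T, np.eye(4)))  # prints True: P Pᵀ = I, so P is orthogonal
```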
@dangengdg
Daniel Geng
10 months
…or rotated, or skewed.
1
1
19
@dangengdg
Daniel Geng
10 months
But there’s a catch! We found that not every view would work. The view needs to satisfy two conditions. The first is linearity, which ensures the transformed image is the correct mix of noise and signal:
1
0
18
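In DDPM-style notation (my paraphrase of the condition, not the paper's exact statement), the linearity requirement on a view $v$ reads:

```latex
v\!\left(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon\right)
  = \sqrt{\bar\alpha_t}\, v(x_0) + \sqrt{1-\bar\alpha_t}\, v(\epsilon)
```

i.e. applying the view to a noisy image gives a correctly-weighted mix of the viewed clean image and the viewed noise.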
@dangengdg
Daniel Geng
10 months
And we can make images that change appearance when inverted, and as promised, jigsaw puzzles with two solutions.
1
0
17
@dangengdg
Daniel Geng
5 months
By using a Laplacian pyramid decomposition, we can even manage to make (somewhat decent) "triple" hybrid images.
1
3
16
@dangengdg
Daniel Geng
10 months
The second condition we call “statistical consistency.” The transformed noise needs to be iid Gaussian, as that’s the assumption in diffusion. It turns out this is only possible if your transformation is orthogonal.
1
0
16
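Concretely (standard Gaussian algebra, stated here for intuition): if the view acts on the pixels as a linear map $A$, then

```latex
\epsilon \sim \mathcal{N}(0, I) \;\Longrightarrow\; A\epsilon \sim \mathcal{N}\!\left(0,\, A A^{\top}\right),
```

which is again $\mathcal{N}(0, I)$ exactly when $A A^{\top} = I$, i.e. when $A$ is orthogonal.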
@dangengdg
Daniel Geng
5 months
And by extracting low frequencies from a real image, and generating the missing high frequencies with our method, we can make hybrid images from real images. In effect we are solving a (noiseless) inverse problem. Anyways, here's Thomas Edison turning into a lightbulb:
1
0
14
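The "noiseless inverse problem" view can be sketched as a projection step: keep the generated high frequencies but impose the real image's low frequencies. This is an illustrative FFT-based version with made-up function names, not the paper's code.

```python
import numpy as np

def fft_lowpass(x, cutoff):
    """Keep only frequencies within `cutoff` of the spectrum's center."""
    F = np.fft.fftshift(np.fft.fft2(x))
    h, w = x.shape
    yy, xx = np.mgrid[:h, :w]
    mask = ((yy - h / 2) ** 2 + (xx - w / 2) ** 2) <= cutoff ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))

def impose_low_frequencies(generated, real, cutoff=4):
    # Keep the generated high frequencies, replace the lows with the
    # real image's — the projection applied during sampling.
    return fft_lowpass(real, cutoff) + (generated - fft_lowpass(generated, cutoff))
```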
@dangengdg
Daniel Geng
5 months
And we can make images that change appearance when motion blurred.
1
1
12
@dangengdg
Daniel Geng
5 months
Also, big thank you to my collaborators @invernopark and @andrewhowens !
0
0
10
@dangengdg
Daniel Geng
5 months
Our method works by decomposing an image into a sum of components. For example into high and low frequencies, or grayscale and color components. We then use a diffusion model to control each of these components individually, in a zero-shot manner.
1
0
11
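A minimal sketch of this component-wise control, assuming a hypothetical noise predictor `eps_fn(x, prompt)` and using a simple box filter as the decomposition (the paper uses proper frequency and color decompositions):

```python
import numpy as np

def lowpass(x, k=3):
    # Simple box-filter low-pass, a stand-in for a real frequency decomposition.
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x)
    for i in range(k):
        for j in range(k):
            out += xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out / (k * k)

def factorized_epsilon(x_t, eps_fn, prompt_low, prompt_high):
    """Combine noise estimates component-wise: the low-frequency component
    is steered by one prompt, the high-frequency residual by another."""
    eps_a = eps_fn(x_t, prompt_low)
    eps_b = eps_fn(x_t, prompt_high)
    return lowpass(eps_a) + (eps_b - lowpass(eps_b))
```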
@dangengdg
Daniel Geng
10 months
I had a wonderful time working w/ @ZhaoyingPan and @andrewhowens on our #NeurIPS2023 paper "Self-Supervised Motion Magnification." We propose a simple method for magnifying tiny motions in video, and also show some neat tricks like magnification targeting and test time adaptation.
@ZhaoyingPan
Zhaoying Pan
10 months
(1/4) Excited to present our NeurIPS paper: "Self-Supervised Motion Magnification by Backpropagating Through Optical Flow”, a simple, self-supervised method for magnifying subtle motions. arXiv: Website: @dangengdg @andrewhowens
1
1
16
0
2
11
@dangengdg
Daniel Geng
8 months
Our work is inspired by/related to DragGAN ( @XingangP ), DragonDiffusion (Chong Mou), and DragDiffusion ( @YujunPeiyangShi ). The technique from Universal Guided Diffusion ( @arpitbansal297 ) is also quite important for our method to work.
1
0
7
@dangengdg
Daniel Geng
10 months
And secondly to this great CVPR demo by @RyanBurgert , @XiangLi54505720 , @abe_leite , @kahnchana , and @ryoo_michael , which makes some really cool and creative illusions using SDS.
1
0
8
@dangengdg
Daniel Geng
8 months
Big thanks to @andrewhowens for advising me on this project. Please check out links for more info and results! website: arXiv: code: visualization code:
0
0
5
@dangengdg
Daniel Geng
10 months
I also want to give a pointer to some awesome related work: First to Matt Tancik, who implemented a method quite similar to ours a while back:
1
0
6
@dangengdg
Daniel Geng
3 months
Puzzles can also be found with my advisor @andrewhowens or my coauthor @invernopark !
0
0
6
@dangengdg
Daniel Geng
5 months
Finally, using our method with certain decompositions reduces (roughly!) to prior work on spatial or compositional control of diffusion models. Details are in the paper.
1
0
5
@dangengdg
Daniel Geng
3 months
@HaareBlond Thanks! You may be interested in our recent work, led by @CzyangChen , which does weird, but really really cool things to sound and spectrograms.
1
0
5
@dangengdg
Daniel Geng
1 year
This creative paper led by @alexlioralexli shows that generative models can actually be used as zero-shot classifiers!
@alexlioralexli
Alex Li
1 year
Diffusion models have amazing image creation abilities. But how well does their generative knowledge transfer to discriminative tasks? We present Diffusion Classifier: strong classification results with pretrained conditional diffusion models, *with no additional training*! 1/9
14
79
398
0
0
4
@dangengdg
Daniel Geng
10 months
@CSProfKGD Honored to be on your reading list Kosta! Also glad you printed out the paper, it makes viewing the figures much easier :)
1
0
3
@dangengdg
Daniel Geng
8 months
Our method requires no finetuning, works on real images, and enables fine-grained editing of images with pretty complex motion. Here, we visualize the optical flow, and corresponding points between the original image and the “motion edited” image.
1
0
2
@dangengdg
Daniel Geng
3 months
@BBarash Thank you!!!
0
0
3
@dangengdg
Daniel Geng
8 months
We can also extract motion from an existing video, and apply that motion to images. Here we take the spinning of the earth, and use it to rotate various animal faces.
1
0
2
@dangengdg
Daniel Geng
3 months
@danbgoldman @andrewhowens @invernopark Hi Dan, we were thinking of trying to print more. I'll add your name to a list of people who want one and I'll let you know if we figure it out. (Big fan of your work btw!)
1
0
3
@dangengdg
Daniel Geng
8 months
Here’s some more results. Our website has tons more.
1
0
3
@dangengdg
Daniel Geng
10 months
@jon_barron Wow thanks a ton Jon! I'm glad you enjoyed it :D We've got a ton more examples that I'll be putting on the website soon
0
0
2
@dangengdg
Daniel Geng
8 months
Our method also has limitations, such as (a) failures on OOD flow fields, (b) potential identity loss, and (c) occasional convergence issues. It is also slow to sample from. We hope future work can help alleviate these issues.
1
0
2
@dangengdg
Daniel Geng
5 months
@_jasonliu_ This is a cool idea! We were thinking that these images could be a form of steganography. Like, you're a spy and a message only appears when you look at the photo in dim lighting. It could also act as really lossy compression, but I think there's probably more practical methods
1
0
2
@dangengdg
Daniel Geng
8 months
We achieve this by doing diffusion guidance through an off-the-shelf optical flow network. Our proposed guidance loss encourages the edited image to have the user specified motion w.r.t. the source image, as estimated by the flow network.
1
0
2
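The guidance objective described here can be sketched as follows; `flow_fn` stands in for an off-the-shelf optical flow network, and the toy flow in the example is purely illustrative:

```python
import numpy as np

def guidance_loss(flow_fn, source, edited, target_flow):
    """Sketch of the guidance objective: a flow network estimates motion
    between the source and edited images, and we penalize deviation from
    the user-specified target flow. During sampling, the gradient of this
    loss w.r.t. the noisy image steers each diffusion update."""
    est = flow_fn(source, edited)
    return float(np.mean((est - target_flow) ** 2))

# Toy stand-in flow: per-pixel intensity difference interpreted as motion
toy_flow = lambda a, b: b - a
src = np.zeros((4, 4))
edit = np.ones((4, 4))
loss = guidance_loss(toy_flow, src, edit, target_flow=np.ones((4, 4)))
# A perfectly matching target flow gives zero loss
```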
@dangengdg
Daniel Geng
8 months
We also wrote a simple GUI to make these dense motion fields. By just clicking and dragging, a user can segment out an object with SAM and create complex flow fields.
1
0
2
@dangengdg
Daniel Geng
10 months
@phillip_isola Wow thanks! I'm glad you like it :D
0
0
1
@dangengdg
Daniel Geng
10 months
@CSProfKGD I think one of the hardest parts of writing this paper was arranging as many images as possible into the teaser figure :)
0
0
1
@dangengdg
Daniel Geng
10 months
@deviparikh Thank you so much Devi!!
0
0
1
@dangengdg
Daniel Geng
10 months
@eerac This might work, you could try it out! I think you would have to be careful with the noise though... An uninvertible transformation might mess up the iid Gaussian-ness of it
1
0
1
@dangengdg
Daniel Geng
3 months
@laurayuzheng hahaha, hello! :) I'll dm you
0
0
1
@dangengdg
Daniel Geng
3 months
@DanielZoran_ @CVPR @andrewhowens Oh my goodness I just saw this, thank you so much!! :D
3
0
1
@dangengdg
Daniel Geng
3 months
And to @invernopark 's tweet:
@invernopark
Aaron Inbum Park
3 months
Here are some of the candidates for this year's @CVPR T-shirt design using our newest work, Factorized Diffusion!
2
7
64
0
0
1
@dangengdg
Daniel Geng
5 months
@NagabhushanSN95 It's related! I think it's more that high frequency components of the image go away when you downsample. You could check out the hybrid images paper if you want more details:
0
0
1
@dangengdg
Daniel Geng
10 months
@HaareBlond @invernopark @andrewhowens Yeah, like you said, latent diffusion doesn't work *well* (but it does kind of work). Audio is really interesting as well! We sort of lucked out tho, because the views that work with this method correspond to visually interpretable views. idk if the same would hold for audio
0
0
1
@dangengdg
Daniel Geng
5 months
@anand_bhattad @andrewhowens Thank you for the kind words! :D
0
0
1