Alex Nichol (@unixpickle)
9,406 Followers · 398 Following · 970 Media · 5,647 Statuses

Code, AI, and 3D printing. Opinions are my own, not my computer's...for now. Husband of @thesamnichol. Co-creator of DALL-E 2. Researcher @openai.

Joined September 2011

@unixpickle · Alex Nichol · 4 years
Luckily, years of Deep Learning research have honed my ability to sit in front of a graph and refresh every few seconds.
12 replies · 183 retweets · 2K likes

@unixpickle · Alex Nichol · 1 year
Super excited to release Shap-E, our latest work on 3D generative modeling. Paper: Code/models:
26 replies · 181 retweets · 807 likes

@unixpickle · Alex Nichol · 2 years
Happy to share a cool hobby project I've been working on for the past few weeks: predicting attributes of cars from images. Blog post: Github: Gradio demo:
[1 image]
27 replies · 70 retweets · 625 likes

@unixpickle · Alex Nichol · 2 months
GPT-4o is a huge step forward for image generation. Not only is it amazing at rendering text and following captions, it also provides a very natural way to iteratively edit and compose visual concepts. 1/8
[3 images]
20 replies · 56 retweets · 451 likes

@unixpickle · Alex Nichol · 9 months
There is a new 3D generative model in town: Genie by @LumaLabsAI . This model is surprisingly good and insanely fast! As someone who has worked on 3D generative modeling, here are my observations and guesses about how this system works (without any insider info):
5 replies · 34 retweets · 315 likes

@unixpickle · Alex Nichol · 8 months
> @sama · Sam Altman · 8 months
> i love the openai team so much
> 5K replies · 4K retweets · 73K likes
4 replies · 7 retweets · 278 likes

@unixpickle · Alex Nichol · 8 months
OpenAI is nothing without its people.
4 replies · 15 retweets · 268 likes

@unixpickle · Alex Nichol · 3 years
Now anybody can tinker with and explore the current SOTA generative image models! We have released model weights for our paper "Diffusion Models Beat GANs on Image Synthesis":
3 replies · 34 retweets · 234 likes

@unixpickle · Alex Nichol · 5 years
Agent finally learning to beat the 10th floor! A few more days of training and I'll probably be ready to submit this. Whole thing is rewritten in PyTorch now, no more anyrl-py or TensorFlow.
6 replies · 30 retweets · 201 likes

@unixpickle · Alex Nichol · 2 years
I'm still not convinced that large-scale RL is anything other than "smoothed-out search distilled into a small model". 1/5
10 replies · 29 retweets · 190 likes

@unixpickle · Alex Nichol · 2 years
My theory for why Python is so popular for deep learning: Python is so slow that the CPU basically looks like a GPU--you need vec ops or special kernels just to get good perf. Python devs were used to this from the start (e.g. numpy) so GPU frameworks felt natural.
11 replies · 15 retweets · 174 likes

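A toy illustration of that dynamic (not from the tweet itself): the same reduction runs orders of magnitude faster when dispatched to a vectorized numpy kernel than when executed element by element in the interpreter, which is exactly the loop-vs-kernel tradeoff GPU users face.

```python
import time
import numpy as np

xs = np.random.rand(10_000_000)

# Pure-Python loop: every element pays interpreter overhead.
start = time.time()
total = 0.0
for x in xs:
    total += x * x
print("python loop:", time.time() - start)

# Vectorized: one call dispatches the whole reduction to optimized C code,
# much like launching a single GPU kernel.
start = time.time()
total = float(np.dot(xs, xs))
print("numpy:      ", time.time() - start)
```
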
@unixpickle · Alex Nichol · 11 months
So they initialized the carpet at x_T and forgot to run the diffusion sampling chain...?
[1 image]
7 replies · 8 retweets · 167 likes

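Context for the joke: in a diffusion model, x_T is pure Gaussian noise, and an image only emerges after running the full reverse chain. A minimal DDPM-style sampler as a sketch; the `model(x, t)` noise predictor and the schedule here are illustrative:

```python
import torch

def ddpm_sample(model, shape, T=1000):
    betas = torch.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)  # x_T: pure noise (the "carpet" before sampling)
    for t in reversed(range(T)):
        eps = model(x, torch.full((shape[0],), t))  # predicted noise
        # Posterior mean of x_{t-1}; skip this loop and you keep the noise.
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x
```
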
@unixpickle · Alex Nichol · 2 years
Asked DALL•E 2 for some to-do list app designs.
[4 images]
5 replies · 8 retweets · 154 likes

@unixpickle · Alex Nichol · 6 years
Finally released our new, extremely simple meta-learning algorithm!
4 replies · 33 retweets · 146 likes

@unixpickle · Alex Nichol · 3 years
My New Year's resolution is 256x256.
5 replies · 8 retweets · 150 likes

@unixpickle · Alex Nichol · 3 years
I am very surprised that this 13B parameter model has been released! Very awesome
1 reply · 19 retweets · 141 likes

@unixpickle · Alex Nichol · 11 months
I've been working through the Bayesian Flow Networks paper a bit. Here are a few of my thoughts and tips to readers...
2 replies · 8 retweets · 132 likes

@unixpickle · Alex Nichol · 3 years
I tried JAX this weekend, and made a cool project with it: texture mapping with VAEs.
[1 image]
2 replies · 18 retweets · 128 likes

@unixpickle · Alex Nichol · 10 months
TIL that matrix multiplications are rather inefficient on my 6-year-old NVIDIA Titan X, achieving only 25% occupancy even with optimized kernels. I noticed this while writing my own matmul kernel in pure PTX, where I bumped up against some annoying hardware constraints. 1/6
4 replies · 3 retweets · 126 likes

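Occupancy itself takes a profiler to measure, but getting a rough sense of how far a card sits from its peak matmul throughput only takes a timer. A sketch along those lines (sizes and iteration count arbitrary, and throughput is a different quantity than occupancy):

```python
import torch

def matmul_tflops(n=4096, iters=20):
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    # A square matmul costs ~2*n^3 FLOPs; compare against the card's peak.
    return 2 * n**3 * iters / (start.elapsed_time(end) / 1e3) / 1e12
```
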
@unixpickle · Alex Nichol · 3 years
Finally got decent speaker conversion results using VQ-VAE + diffusion models! Sampling only takes a few seconds on a GPU.
3 replies · 17 retweets · 118 likes

@unixpickle · Alex Nichol · 2 years
Super excited to see what cool things people do with this.
> @_akhaliq · AK · 2 years
> Point·E: A System for Generating 3D Point Clouds from Complex Prompts abs: github:
> [1 image]
> 7 replies · 72 retweets · 354 likes
6 replies · 10 retweets · 105 likes

@unixpickle · Alex Nichol · 8 months
So RNNs are just really deep networks. We now seem to be good at training really deep networks, i.e. LLMs, with layer norm and residual connections. Has anybody gone back to see if LSTM/GRU/other tricks are still necessary for RNNs, versus just a residual RNN?
17 replies · 2 retweets · 104 likes

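The question invites a concrete baseline. A minimal "residual RNN" cell in the spirit of the tweet, assuming pre-norm and an additive update in place of LSTM/GRU gating (purely illustrative):

```python
import torch
import torch.nn as nn

class ResidualRNNCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.in_proj = nn.Linear(dim, dim)
        self.hid_proj = nn.Linear(dim, dim)

    def forward(self, x, h):
        # Residual update: the hidden state is nudged rather than rewritten,
        # mirroring how residual connections stabilize very deep networks.
        return h + torch.tanh(self.in_proj(x) + self.hid_proj(self.norm(h)))
```
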
@unixpickle · Alex Nichol · 1 year
Happy to share my latest hobby project: representing 3D shapes as oblique decision trees. Surprisingly, this representation has a few neat properties that might make it useful! Blog: Github:
5 replies · 11 retweets · 100 likes

@unixpickle · Alex Nichol · 5 years
This is something I always wanted to investigate, but never had time to. It would seem that MAML (and presumably Reptile as well) learn useful features in the outer loop, rather than learning how to learn useful features in the inner loop.
> @OriolVinyalsML · Oriol Vinyals · 5 years
> Rapid Learning or Feature Reuse? Meta-learning algorithms on standard benchmarks have much more feature reuse than rapid learning! This also gives us a way to simplify MAML -- (Almost) No Inner Loop (A)NIL. With Aniruddh Raghu @maithra_raghu Samy Bengio.
> [1 image]
> 9 replies · 163 retweets · 603 likes
1 reply · 13 retweets · 100 likes

@unixpickle · Alex Nichol · 3 years
JoJoGAN works so well because GANs trained on real faces can project stylized faces onto the manifold of real faces zero-shot. I wonder why this is the case, and what other domains would have this property.
3 replies · 5 retweets · 95 likes

@unixpickle · Alex Nichol · 5 years
Wrote an incredibly concise and easy-to-use implementation of MAML in PyTorch. Just plug in any nn.Module.
0 replies · 23 retweets · 88 likes

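His repository's actual interface isn't shown here, but the core MAML step is compact enough to sketch with modern `torch.func`; all names and hyperparameters below are illustrative:

```python
import torch
from torch.func import functional_call

def maml_episode_loss(model, loss_fn, support_x, support_y, query_x, query_y,
                      inner_lr=0.01, inner_steps=1):
    # One MAML episode: adapt a differentiable copy of the parameters on the
    # support set, then score the adapted parameters on the query set.
    # create_graph=True lets the outer optimizer backprop through the
    # inner-loop updates.
    params = dict(model.named_parameters())
    for _ in range(inner_steps):
        loss = loss_fn(functional_call(model, params, (support_x,)), support_y)
        grads = torch.autograd.grad(loss, list(params.values()), create_graph=True)
        params = {name: p - inner_lr * g
                  for (name, p), g in zip(params.items(), grads)}
    return loss_fn(functional_call(model, params, (query_x,)), query_y)
```

The outer loop simply backpropagates this query loss into the original parameters, which is what makes "plug in any nn.Module" possible.
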
@unixpickle · Alex Nichol · 1 year
Can all the fake "AI people" become VR people now?
8 replies · 4 retweets · 83 likes

@unixpickle · Alex Nichol · 2 years
If there's one thing I'm remembered for, let it be popularizing the use of corgis in generative modeling research.
3 replies · 2 retweets · 85 likes

@unixpickle · Alex Nichol · 2 months
Announcing my own compression challenge. My program achieves 10x compression, but I will accept any winners who achieve at least 9x compression. I will pay $2,000 to the first person to solve this before May 29, 2024, at 12:00 PM PST.
11 replies · 8 retweets · 76 likes

@unixpickle · Alex Nichol · 3 years
@ak92501 This is ridiculous. None of these are novel contributions and they don't reference any of the many past works that apply diffusion to inpainting.
[1 image]
4 replies · 5 retweets · 78 likes

@unixpickle · Alex Nichol · 3 years
Saw what I can only assume was a DL practitioner today.
[1 image]
1 reply · 6 retweets · 78 likes

@unixpickle · Alex Nichol · 2 months
One use case I'm excited about is telling a story with images. In this example, we use the model to create a character and then immerse her in a visually-consistent, fictional world. 2/8
[4 images]
2 replies · 3 retweets · 75 likes

@unixpickle · Alex Nichol · 3 years
I hope GAN researchers remember that faces aren't the only images on the internet and at least start using ImageNet again...
4 replies · 6 retweets · 76 likes

@unixpickle · Alex Nichol · 3 years
My latest vacation project was Bezier MNIST: a vector graphics version of the MNIST character dataset. Earlier this week I made an algorithm to convert images to Bezier curves, and now I applied it to all of MNIST!
[1 image]
1 reply · 3 retweets · 75 likes

@unixpickle · Alex Nichol · 1 year
PyTorch should always tell you every shape of every argument whenever a function raises a shape error.
3 replies · 0 retweets · 72 likes

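Until that exists, a decorator can bolt the behavior on from the outside. A sketch (a hypothetical helper, not a PyTorch feature):

```python
import functools
import torch

def report_shapes(fn):
    # On failure, re-raise with the shapes of all tensor arguments attached:
    # roughly the behavior the tweet wishes PyTorch had by default.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except RuntimeError as e:
            shapes = [tuple(a.shape) for a in args if torch.is_tensor(a)]
            raise RuntimeError(f"{e} (tensor argument shapes: {shapes})") from e
    return wrapper

@report_shapes
def bmm(a, b):
    return torch.bmm(a, b)
```
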
@unixpickle · Alex Nichol · 2 months
A neat thing about this model is that it can produce multiple consistent views of a 3D object, allowing us to reconstruct 3D models of complex shapes. 5/8
6 replies · 3 retweets · 70 likes

@unixpickle · Alex Nichol · 10 months
@_akhaliq LLMs strike back against gzip?
1 reply · 0 retweets · 69 likes

@unixpickle · Alex Nichol · 2 years
Vaguely inspired by Netflix's "Is It Cake?", I had DALL•E 2 generate some cake ideas.
2 replies · 12 retweets · 69 likes

@unixpickle · Alex Nichol · 2 years
Idea for AI startup: film & analyze everything I do in my home, then instantly answer queries like "where did I put my scissors?"
8 replies · 3 retweets · 68 likes

@unixpickle · Alex Nichol · 2 years
"A chunky corgi on a scale" #dalle
[1 image]
6 replies · 1 retweet · 68 likes

@unixpickle · Alex Nichol · 7 months
It continues to astound me that the built-in PyTorch Transformer isn't really used/usable in any serious projects. They should consider updating the standard library, since this is such a crucial building block.
7 replies · 2 retweets · 62 likes

@unixpickle · Alex Nichol · 4 years
DDIMs are generative models with a latent space that is never learned or explicitly defined--it's implicitly created from the data distribution. Mind-bending stuff.
1 reply · 10 retweets · 66 likes

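The reason x_T behaves like a latent code: with the stochasticity parameter set to zero, DDIM sampling is the deterministic map below (notation as in Song et al.'s DDIM paper), so each noise vector corresponds to exactly one sample even though no encoder was ever trained:

```latex
x_{t-1} \;=\; \sqrt{\bar\alpha_{t-1}}\,
\underbrace{\left(\frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}\right)}_{\text{predicted } x_0}
\;+\; \sqrt{1-\bar\alpha_{t-1}}\;\epsilon_\theta(x_t, t)
```

Iterating this map from x_T to x_0 defines an invertible correspondence between noise and data, which is the "implicit latent space" the tweet refers to.
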
@unixpickle · Alex Nichol · 2 years
I just read this paper as someone familiar with RL but not so much with video compression. Some thoughts and questions in thread...
> @_akhaliq · AK · 2 years
> MuZero with Self-competition for Rate Control in VP9 Video Compression abs: MuZero-based rate control achieves an average 6.28% reduction in size of the compressed videos for the same delivered video quality level (measured as PSNR BD-rate)
> [3 images]
> 0 replies · 5 retweets · 39 likes
1 reply · 6 retweets · 64 likes

@unixpickle · Alex Nichol · 5 years
New blog post on three of my recent ML research projects that _didn't_ pan out. Negative results can be informative, and I think the ideas are still interesting!
1 reply · 11 retweets · 62 likes

@unixpickle · Alex Nichol · 2 years
Turning a NeRF into a textured mesh is an interesting way to explore what the model is actually learning. In this case, it's clear what the training camera angles don't cover.
1 reply · 6 retweets · 59 likes

@unixpickle · Alex Nichol · 5 months
I'm surprised how few people realize it's possible to underfit and overfit at the same time.
2 replies · 2 retweets · 60 likes

@unixpickle · Alex Nichol · 5 years
Hypothesis: RL can learn to play N-dimensional Pong for all N.
1 reply · 12 retweets · 60 likes

@unixpickle · Alex Nichol · 4 years
Working on a new kind of discrete VAE that can generate good samples without any auto-regressive prior. Here are samples and test-set reconstructions on MNIST with a 60-bit(!!!) prior:
[1 image]
4 replies · 0 retweets · 60 likes

@unixpickle · Alex Nichol · 2 years
Apparently this is $30k
[1 image]
5 replies · 7 retweets · 58 likes

@unixpickle · Alex Nichol · 19 days
I investigated this in 2017 and even at the time it looked encouraging. Glad to see it make a comeback.
> @arankomatsuzaki · Aran Komatsuzaki · 20 days
> Learning to (Learn at Test Time): RNNs with Expressive Hidden States
> - Performs Linear-time RNN by propagating the gradient to the next step, i.e., test-time training
> - Achieves better perplexity than Mamba
> [1 image]
> 7 replies · 82 retweets · 413 likes
2 replies · 7 retweets · 61 likes

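The core idea, that the hidden state is itself a little model trained as the sequence streams by, fits in a few lines. A deliberately tiny sketch, not the paper's actual parameterization:

```python
import torch

def ttt_scan(tokens, W, lr=0.1):
    # tokens: iterable of (dim,) vectors; W: (dim, dim) "hidden state".
    # Each step trains the hidden state with one gradient step on a
    # self-supervised reconstruction loss, then uses it to predict.
    outputs = []
    for x in tokens:
        err = W @ x - x                   # residual of 0.5 * ||W x - x||^2
        W = W - lr * torch.outer(err, x)  # recurrence = a training step
        outputs.append(W @ x)
    return torch.stack(outputs), W
```
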
@unixpickle · Alex Nichol · 5 months
Everyone seems excited about Mamba. What I can't glean is whether anybody has tried evals other than log-loss. Do we have a strong reason to believe these models actually compete with Transformers on realistic tasks?
6 replies · 3 retweets · 57 likes

@unixpickle · Alex Nichol · 3 years
As a totally hacky and preliminary demo, check out the web app I made for my VQVAE + diffusion voice changer! I'd say it works reasonably well for me 50% of the time. Curious to see what you all can make it do!
1 reply · 7 retweets · 57 likes

@unixpickle · Alex Nichol · 2 years
It's kind of mind-blowing that NeRF using COLMAP camera estimation is better at reconstructing scenes than COLMAP itself.
2 replies · 3 retweets · 55 likes

@unixpickle · Alex Nichol · 2 months
Speaking of consistent characters, how about becoming movie stars? Here, the model is able to depict me and @gabeeegoooh as detectives in a stunning movie poster. Note how our names and the movie title are rendered properly! 3/8
[2 images]
3 replies · 2 retweets · 53 likes

@unixpickle · Alex Nichol · 2 years
What we really want is algorithms that use lots of knowledge and reasoning to choose what to explore, how to act, how to attribute credit, etc. Results on single games do not look like this at all. 5/5
6 replies · 4 retweets · 54 likes

@unixpickle · Alex Nichol · 3 years
Getting scooped on a really good idea sucks, but sometimes I have to remind myself that nothing is as bad as being forgotten as the inventor of calculus.
4 replies · 2 retweets · 54 likes

@unixpickle · Alex Nichol · 2 months
The model can also compose ideas across images, e.g. here it is able to add the OpenAI logo to a photo of a coaster. 4/8
[1 image]
1 reply · 4 retweets · 52 likes

@unixpickle · Alex Nichol · 7 months
I'm reading this 2011 paper on parallelizable pseudorandom number generators, and it contains some surprisingly good knowledge for a reader with no background in cryptography.
[1 image]
2 replies · 5 retweets · 54 likes

@unixpickle · Alex Nichol · 4 years
There's something unsettling about SimCLR. The paper shows that the network, at some intermediate layer, has more relevant information to classification than it does right before the output layer used for the contrastive loss. Why should this be the case?
3 replies · 5 retweets · 53 likes

@unixpickle · Alex Nichol · 10 months
I have now learned that it's a fallacy to equate occupancy with efficiency! It's possible to fully utilize the hardware with low occupancy by leaning on instruction-level parallelism. @cHHillee pointed me to this very elucidating slide deck:
> @unixpickle · Alex Nichol · 10 months
> TIL that matrix multiplications are rather inefficient on my 6-year-old NVIDIA Titan X, achieving only 25% occupancy even with optimized kernels. I noticed this while writing my own matmul kernel in pure PTX, where I bumped up against some annoying hardware constraints. 1/6
> 4 replies · 3 retweets · 126 likes
1 reply · 4 retweets · 50 likes

@unixpickle · Alex Nichol · 1 year
I've learned that whenever I write PyTorch code with scatter_ and gather, there's a 90% chance it will have a bug.
8 replies · 1 retweet · 48 likes

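A quick refresher on the semantics that cause most of those bugs (shapes here are illustrative):

```python
import torch

# gather: out[b, i] = src[b, index[b, i]]. The index tensor must have the
# same number of dimensions as src, and the output takes index's shape.
src = torch.arange(12.0).reshape(3, 4)
index = torch.tensor([[0, 3], [1, 1], [2, 0]])   # shape (3, 2)
picked = src.gather(dim=1, index=index)          # shape (3, 2)

# scatter_ is the inverse write: dst[b, index[b, i]] = val[b, i].
dst = torch.zeros(3, 4)
dst.scatter_(dim=1, index=index, src=picked)
```
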
@unixpickle · Alex Nichol · 3 years
NeRFs are periodic functions, so this is what happens if you render a scene without clamping to the bounds of the model.
[1 image]
4 replies · 0 retweets · 49 likes

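The periodicity comes from the sinusoidal positional encoding NeRF applies to each input coordinate p: every feature the MLP ever sees repeats outside the training bounds, so the learned scene tiles space.

```latex
\gamma(p) = \left(\sin(2^{0}\pi p),\, \cos(2^{0}\pi p),\, \ldots,\,
                 \sin(2^{L-1}\pi p),\, \cos(2^{L-1}\pi p)\right)
```
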
@unixpickle · Alex Nichol · 9 months
I thought the codefusion paper was too good to be true. How could such a tiny model learn to write _any_ meaningful code? The thing they didn't mention in the abstract is that they use a pretrained CodeT5 encoder, which is much larger than the entire rest of their architecture.
3 replies · 6 retweets · 47 likes

@unixpickle · Alex Nichol · 8 months
I feel deceived and blindsided. Apparently, Snoop didn't actually quit weed! WTF
3 replies · 1 retweet · 46 likes

@unixpickle · Alex Nichol · 6 months
My wife's summary of my workday:
1. Fix a bunch of bugs
2. Introduce one bug that breaks everything
3. Launch a bunch of experiments to run overnight
4. Realize the bug from step 2 the next day
1 reply · 0 retweets · 47 likes

@unixpickle · Alex Nichol · 5 years
RL has its own "five second rule": "if it happened more than 5 seconds ago...it doesn't matter?"
1 reply · 3 retweets · 45 likes

@unixpickle · Alex Nichol · 1 year
I maybe get coffee shops asking for tips, but now AWS?
[1 image]
6 replies · 3 retweets · 44 likes

@unixpickle · Alex Nichol · 2 months
By generating multiple images and context, and leveraging the model's amazing text rendering capabilities, we can do neat things like create custom fonts. 6/8
[2 images]
2 replies · 0 retweets · 43 likes

@unixpickle · Alex Nichol · 8 months
Anybody else find it a bit surprising how many independent video models are suddenly dropping?
7 replies · 2 retweets · 41 likes

@unixpickle · Alex Nichol · 2 months
Guys, I finally figured out what Ava said to Kyoko in Ex Machina.
[1 image]
2 replies · 1 retweet · 43 likes

@unixpickle · Alex Nichol · 2 months
Solution to my compression challenge. The files were encrypted -- simple as that. Encrypted files "look like noise", but they _do_ have structure and _can_ be compressed--if you are willing to invest lifetimes of compute into it!
7 replies · 1 retweet · 42 likes

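A toy demonstration of the punchline (the stream cipher below is a stand-in; the challenge's actual files and cipher aren't shown): a compressor sees ciphertext as structureless even when the plaintext is maximally redundant.

```python
import hashlib
import os
import zlib

def keystream_encrypt(data: bytes, key: bytes) -> bytes:
    # Toy stream cipher: XOR the data with a SHA-256-based keystream.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, stream))

plaintext = b"highly structured, highly compressible " * 1000
ciphertext = keystream_encrypt(plaintext, os.urandom(32))

print(len(zlib.compress(plaintext)) / len(plaintext))    # tiny ratio
print(len(zlib.compress(ciphertext)) / len(ciphertext))  # ~1.0: "looks like noise"
```

The structure is still there, since knowing the key would let you decrypt and then compress, but recovering it from the ciphertext alone is the part that costs "lifetimes of compute".
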
@unixpickle · Alex Nichol · 2 years
Impossible mug print came in the mail :)
[2 images]
2 replies · 2 retweets · 41 likes

@unixpickle · Alex Nichol · 7 months
Has anybody tried scaling up LPIPS since the original work ~5 years ago? It seems like we could get a much better metric with modern backbones and a larger dataset of judgements.
6 replies · 0 retweets · 40 likes

@unixpickle · Alex Nichol · 5 months
Just read this paper. My main question is why you'd inpaint a ball instead of outpainting a panorama, since I think they can accomplish the same thing. Paper:
[1 image]
7 replies · 1 retweet · 39 likes

@unixpickle · Alex Nichol · 6 years
Classic reward farming on StreetPursuit-v0.
3 replies · 9 retweets · 40 likes

@unixpickle · Alex Nichol · 5 years
This massive neural net has been training for over two months with mostly unsupervised learning. Time for RL to begin.
[1 image]
0 replies · 2 retweets · 40 likes

@unixpickle · Alex Nichol · 3 years
I got a parking ticket in the Bay Area. You know what that means! I scraped ticket data from the online payment website. I estimate that all the tickets add up to between $79M and $130M.
[3 images]
4 replies · 1 retweet · 39 likes

@unixpickle · Alex Nichol · 8 months
💜
> @ilyasut · Ilya Sutskever · 8 months
> I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
> 7K replies · 4K retweets · 33K likes
2 replies · 2 retweets · 39 likes

@unixpickle · Alex Nichol · 9 months
So what is my guess about how this model works? My best guess is that an image diffusion model is fine-tuned to produce multi-view images, possibly with an additional depth channel, and then a novel second stage produces a NeRF quickly from this output.
3 replies · 2 retweets · 38 likes

@unixpickle · Alex Nichol · 2 months
In this example, we can see just how well the model does at rendering a complex image. It uses two separate chat bubbles for the messages, renders a ton of text correctly all at once, and almost perfectly depicts a QWERTY keyboard. 7/8
[1 image]
2 replies · 1 retweet · 36 likes

@unixpickle · Alex Nichol · 2 years
Did they know the DeepNorm paper was coming out?
[1 image]
0 replies · 3 retweets · 39 likes

@unixpickle · Alex Nichol · 1 year
Discreetly scanned this purse-that-looks-like-a-snail in Florence today. Lots of other stuff in that store I wish I had grabbed too.
5 replies · 1 retweet · 38 likes

@unixpickle · Alex Nichol · 1 month
Give this a read; it may blow your mind.
6 replies · 8 retweets · 38 likes

@unixpickle · Alex Nichol · 4 years
Amazing work! This is a very important moment for RL.
> @AravSrinivas · Aravind Srinivas · 4 years
> New paper - CURL: Contrastive Unsupervised Representations for RL! We use the simplest form of contrastive learning (instance-based) as an auxiliary task in model-free RL. SoTA by *significant* margin on DMControl and Atari for data-efficiency.
> [1 image]
> 11 replies · 240 retweets · 1K likes
1 reply · 6 retweets · 37 likes

@unixpickle · Alex Nichol · 2 years
Latest learning Blender experiment: I created a corgi and animated it "walking".
2 replies · 0 retweets · 38 likes

@unixpickle · Alex Nichol · 3 years
Thanks to Aran, I never need to tweet my own papers!
> @arankomatsuzaki · Aran Komatsuzaki · 3 years
> Diffusion Models Beat GANs on Image Synthesis Achieves 3.85 FID on ImageNet 512×512 and matches BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution.
> [1 image]
> 8 replies · 111 retweets · 612 likes
1 reply · 1 retweet · 37 likes

@unixpickle · Alex Nichol · 9 months
There are so many parallels between GPU programming and distributed systems design. In both cases, you have to think about synchronization, communication overhead, locality, and parallelism. The only substantial difference is reasoning about hardware failures.
2 replies · 4 retweets · 35 likes

@unixpickle · Alex Nichol · 7 months
Just wasted a day or two because optimizer.load_state_dict() also resets an optimizer's hyperparameters. I was wondering why all my resumed runs with changed LR and other HPs did not change significantly.
2 replies · 0 retweets · 36 likes

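The workaround, as a sketch ("opt.pt" is a hypothetical checkpoint path): re-apply the hyperparameters you actually want after loading, since the loaded state dict overwrites every param group.

```python
import torch

model = torch.nn.Linear(4, 4)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # the new LR you want

# Restoring optimizer state (momentum buffers, etc.) also restores the old
# hyperparameters, silently clobbering the lr passed to the constructor...
opt.load_state_dict(torch.load("opt.pt"))

# ...so set them again afterwards.
for group in opt.param_groups:
    group["lr"] = 1e-4
```
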
@unixpickle · Alex Nichol · 2 years
Some donut renderings. #dalle
[3 images]
3 replies · 1 retweet · 37 likes

@unixpickle · Alex Nichol · 2 years
Some fun with inpainting (note the unchanged part of the image).
[4 images]
2 replies · 4 retweets · 36 likes

@unixpickle · Alex Nichol · 5 years
@marwanmattar Some of the best ML researchers I've ever met do not have PhDs--or any degree, for that matter. It's your choice as a company how you want to filter candidates, but there are likely better ways.
[1 image]
0 replies · 3 retweets · 35 likes

@unixpickle · Alex Nichol · 2 years
Noticed this 2018 NeurIPS paper, which doesn't actually train any neural networks. Instead, it learns decision trees with sparse linear SVMs at the branches.
1 reply · 0 retweets · 34 likes

@unixpickle · Alex Nichol · 2 years
@rasbt Disclaimer: I was an author on that paper. Yes, diffusion models will really shine for most use cases. GANs might still be better for very narrow domains, but otherwise wouldn't be my first choice.
3 replies · 2 retweets · 34 likes

@unixpickle · Alex Nichol · 3 years
I sometimes work on small-scale ML projects on vacation. It's always a reminder that you can still have interesting ideas and iterate on them with only a single GPU.
1 reply · 0 retweets · 34 likes