Jerry Li

@jerryzli

Followers: 3,455
Following: 169
Media: 14
Statuses: 194

Senior researcher at @MSFTResearch. Machine learning, esp. robust and principled ML. @MIT Ph.D. '18. More importantly, I once hit Challenger in TFT 😎

Seattle
Joined December 2011
Pinned Tweet
@jerryzli
Jerry Li
2 months
Hello! For my biennial post here, a bit of a personal update: after 5 great years at @MSFTResearch, this fall, I'll be joining the CSE faculty at @UW. Much love to the great folks at MSR I've had the pleasure of working with. 1/2
13
10
340
@jerryzli
Jerry Li
3 years
theoretical research in a nutshell
[image]
1
22
209
@jerryzli
Jerry Li
4 years
Hello! Just like @ilyaraz2 , I'm making video lectures covering material from the class I taught at UW last fall. The first video is already recorded and barring (additional) issues with Youtube it'll be online on Monday. (1/3)
3
20
137
@jerryzli
Jerry Li
3 years
Hi! So I was coerced into writing this thread---er I mean, let me tell you a little bit about quantum learning and testing, and about some of the papers @sitanch, Jordan Cotler, @RobertHuangHY, and I just posted on arXiv: 1/
3
5
79
@jerryzli
Jerry Li
4 years
Adversarial examples are not only an important theoretical problem; they also pose security risks to systems that use AI as a component. We consider the real-world impact of adversarial ML, how system designers can mitigate these risks, and how researchers can help.
1
9
47
@jerryzli
Jerry Li
4 years
@ccanonne_ However, note that this effort may have been abandoned due to the author becoming addicted to various video games
2
0
46
@jerryzli
Jerry Li
2 years
When can you learn a simple transformation of a Gaussian in high dimensions? We give the first nontrivial end-to-end guarantees for this basic problem, which also has applications to learning deep generative models. Massive props to @sitanch for hard carrying, see his thread ⬇️
@sitanch
Sitan Chen
2 years
Excited to share something I've been working on over the last year, joint with @jerryzli, Yuanzhi Li, and Anru Zhang! We give provably efficient algorithms for learning a rich family of "pushforward distributions" inspired by generative models. 1/n
[image]
4
36
257
0
0
43
@jerryzli
Jerry Li
2 years
Slightly belated, but happy to announce a new paper with @sitanch, Brice Huang, and Allen Liu. We give tight lower bounds for basic quantum property testing problems without quantum memory. tl;dr Non-adaptivity is all you need, a thread 🤪 1/n
2
5
40
@jerryzli
Jerry Li
3 years
First time back in the office and immediately having technical issues with my computer. So nostalgic 🥲
1
0
35
@jerryzli
Jerry Li
3 years
Obvious bias aside: this talk was one of a number of exciting talks at the joint @MLFoundations / @SimonsInstitute workshop co-organized by Adam Klivans, Tselil Schramm, and myself. Check out the full list of talks and speakers here:
@SebastienBubeck
Sebastien Bubeck
3 years
Just watched an incredible talk by @AlexGDimakis at the Simons Institute, highly recommended. Their Iterative Layer Optimization technique for solving inverse problems with GANs makes a LOT of sense! The empirical results on the famous blurred Obama face speak for themselves! 1/4
[image]
3
78
457
0
6
33
@jerryzli
Jerry Li
4 years
Lecture 5: Efficient filtering from spectral signatures. The first "payoff" lecture! We give a super simple (~8 lines!) poly-time algorithm for robust mean estimation, with a fully self-contained analysis.
0
3
34
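A minimal numpy sketch of a filter of this flavor, assuming inliers have covariance roughly σ²·I; the threshold, the scoring rule, and the name `filter_mean` are illustrative, not the lecture's exact algorithm:

```python
import numpy as np

def filter_mean(X, eps, sigma=1.0, max_iter=50):
    """Toy spectral filter: X is (n, d) with an eps-fraction corrupted."""
    X = X.copy()
    for _ in range(max_iter):
        mu = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        lam, v = eigvals[-1], eigvecs[:, -1]   # top variance direction
        if lam <= 9 * sigma**2:                # illustrative threshold:
            return mu                          # no spectral signature left
        # Outliers that shift the mean must stick out along v; drop the
        # eps-fraction of points with the largest squared projections.
        scores = ((X - mu) @ v) ** 2
        X = X[scores <= np.quantile(scores, 1 - eps)]
    return X.mean(axis=0)

# Usage: 10% of points pushed far out along the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
X[:100, 0] += 10.0
print(np.linalg.norm(filter_mean(X, eps=0.1)))  # small, vs. ~1.0 for the plain mean
```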
@jerryzli
Jerry Li
4 years
Lecture notes that I'll loosely follow are available here: (3/3)
1
5
32
@jerryzli
Jerry Li
4 years
New video: A short proof of the spectral signatures lemma. Alternative title: Jerry learns that manim is a thing, thinks it's amazing, and obsessively messes around with it all week. But hey, this time, all the constants are correct!
1
5
32
@jerryzli
Jerry Li
4 years
Lecture 4.5: Finite sample concentration with bounded second moments. Sorry for the weird timing, youtube hates me sometimes! Also: can you spot all of the bungled constants??? 🤔
3
5
30
@jerryzli
Jerry Li
4 years
for unknowable reasons sometimes youtube decides that it wants to upload exactly 67% of my videos...sorry about the delay, but the next video lectures are coming!
2
0
27
@jerryzli
Jerry Li
2 months
If you're a talented student interested in a nonempty subset of {TCS, quantum, machine learning, AI}, please consider UW! It's a very exciting place for all of these areas! 2/2
0
3
27
@jerryzli
Jerry Li
4 years
I'm generally planning on posting my lectures twice a week, but for my own sanity I'm only planning to do 1 per week (hopefully on Mondays) until the NeurIPS deadline.
1
0
25
@jerryzli
Jerry Li
5 years
A bit late, but let it be written here for the record that a couple of weeks ago, @SebastienBubeck promised, in front of witnesses (@TheGregYang, @hadisalmanX), to dye his hair after his third child @ilyaraz2
3
2
19
@jerryzli
Jerry Li
4 years
I'm planning to focus only on robust statistics, so I won't cover everything I covered in the class. However, this allows me to cover more topics in this area than I could during the class, including robust covariance estimation, linear time algos, and SoS-based methods. (2/3)
1
2
18
@jerryzli
Jerry Li
4 years
@thegautamkamath how odd, I spent the whole day reading physical copies of papers I have to review and yet my stress level has increased dramatically 🤔🤔
1
0
15
@jerryzli
Jerry Li
1 year
@thegautamkamath @eytyang Whoa I won't stand for such blatant lies. I definitely provided citations.
[image]
0
0
13
@jerryzli
Jerry Li
2 years
Big shout-out to all my co-authors, they are all insanely talented. Fun fact: Brice and Allen didn't know any quantum at all, and within a month or so of working with them, we had a working proof of this lower bound, which had eluded @sitanch and me for nearly 2 years...🙃 9/9
0
0
12
@jerryzli
Jerry Li
5 years
who dis🤔 (h/t @TheGregYang)
[image 1]
[image 2]
2
0
11
@jerryzli
Jerry Li
3 years
First off, what is a quantum distribution? These are known as "mixed states": a mixed state is any d-dimensional PSD matrix with trace 1. Notice that the eigenvalues of such a state form a probability distribution. You can think of mixed states as rotations of distributions in d-dim space. 2/
1
0
10
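A quick numpy check of this (my own illustration, not from the thread): build a random mixed state and verify it's PSD with trace 1, so its eigenvalues form a probability distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4

# A A† is PSD for any A; normalizing by the trace gives a mixed state.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho = rho / np.trace(rho).real

eigvals = np.linalg.eigvalsh(rho)       # real, since rho is Hermitian
print(np.all(eigvals >= -1e-12))        # PSD: True
print(np.isclose(eigvals.sum(), 1.0))   # eigenvalues sum to 1, i.e. they
                                        # form a probability distribution
```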
@jerryzli
Jerry Li
3 years
We also show that limited amounts of quantum memory are not sufficient for some of these tasks, and we show hierarchies for the power of entangled measurements. The techniques are surprisingly elementary, and we think they may have applications to classical learning! 10/
1
0
9
@jerryzli
Jerry Li
3 years
Big shoutout to my coauthors @sitanch, Jordan Cotler, and @RobertHuangHY, they're all super great and you should hire them (although @hseas already claimed @sitanch). 11/end
1
0
8
@jerryzli
Jerry Li
2 years
@SebastienBubeck not pictured: the thousands of mosquitos all around us
0
0
8
@jerryzli
Jerry Li
3 years
How do you interact with a mixed state? In the classical world, we get samples from our distribution. Quantumly, one designs a measurement scheme known as a POVM, and then measures the mixed state in this POVM. The outcome of this is a draw from a classical distribution. 3/
[image]
1
0
8
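A toy simulation of this measurement process (my own sketch; the POVM here is just the computational-basis measurement): outcome i occurs with probability tr(E_i ρ).

```python
import numpy as np

def measure(rho, povm, rng):
    """Measure mixed state rho with POVM {E_i}: outcome i has prob tr(E_i rho)."""
    probs = np.array([np.trace(E @ rho).real for E in povm])
    probs = np.clip(probs, 0.0, None)
    probs /= probs.sum()                  # guard against rounding error
    return rng.choice(len(povm), p=probs)

rng = np.random.default_rng(0)
d = 3
rho = np.eye(d) / d                          # maximally mixed state
povm = [np.outer(e, e) for e in np.eye(d)]   # E_i = |i><i|, sums to identity
print([measure(rho, povm, rng) for _ in range(5)])  # uniform draws from {0,1,2}
```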
@jerryzli
Jerry Li
5 years
@XandaSchofield There are dozens of us!
[image]
0
0
8
@jerryzli
Jerry Li
3 years
In our new papers, we show *exponential* separations for a number of learning and testing tasks, including shadow tomography, purity testing, and some channel testing tasks. In fact, sometimes there are exponential separations even with very mild entanglement. 9/
1
0
8
@jerryzli
Jerry Li
2 years
We show that the answer is no: adaptive algorithms still need Θ(d^3/2) samples. We construct a new hard instance based on Gaussian perturbations. We do this because unlike previous instances, the likelihood ratio for this instance has a really elegant self-similar structure. 5/n
[image]
1
0
7
@jerryzli
Jerry Li
3 years
Previously, @SebastienBubeck, @sitanch, and I showed that for the task of mixedness testing, there are polynomial gaps between the power of entangled and unentangled measurements. But are there tasks with much larger separations? 8/
1
0
7
@jerryzli
Jerry Li
3 years
In the same way that n draws from a classical distribution can be thought of as one draw from the joint distribution, n copies of a mixed state can be thought of as one big tensored up mixed state. One can then apply an arbitrary POVM to this entire tensor! 5/
1
0
7
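A short numpy illustration of this point (my own example): three copies of a single-qubit mixed state are one 8x8 mixed state, built with Kronecker products.

```python
import numpy as np

rho = np.diag([0.75, 0.25])              # a single-qubit mixed state

# n copies of rho = one big tensored-up mixed state, just as n i.i.d.
# samples form one draw from the product distribution.
rho3 = np.kron(np.kron(rho, rho), rho)
print(rho3.shape, np.trace(rho3))        # (8, 8) 1.0

# A POVM acting on this 8x8 state can correlate the copies; such
# entangled measurements aren't expressible as per-copy POVMs.
```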
@jerryzli
Jerry Li
3 years
This allows us to do measurements that we can't do just by applying POVMs to each individual state. Such entangled measurements are quite powerful; many algos use them to get optimal statistical rates, see e.g. @BooleanAnalysis and John Wright's amazing papers. 6/
1
0
6
@jerryzli
Jerry Li
4 years
@thegautamkamath mfw I remember twitter exists and decide to check it for the first time in weeks
0
0
6
@jerryzli
Jerry Li
3 years
But they also come at a cost: they require the algorithm to store many copies of the mixed state in memory simultaneously. They're also complex to form. In our papers, we ask if they're truly necessary for a number of basic quantum learning tasks. 7/
1
1
6
@jerryzli
Jerry Li
3 years
There are two key differences between classical and quantum learning. First, notice that the measurement operation is inherently interactive, so measurements can be chosen adaptively, which is not true classically. Also, measurements can be entangled. 4/
1
0
6
@jerryzli
Jerry Li
2 years
For state certification, we generalize the prior instance-optimal bounds of @BooleanAnalysis, @sitanch, and myself, and show that they hold for arbitrary adaptive measurements, not just non-adaptive ones. Here too, you seem to get the same bounds with and without adaptivity. 7/n
1
0
5
@jerryzli
Jerry Li
2 years
This lets us boil the question down to controlling the behavior of a certain matrix martingale, which we can do using well-known techniques from matrix concentration inequalities. 🤯 6/n
1
0
5
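For flavor, one standard off-the-shelf tool of this kind is matrix Azuma (Tropp 2012); I'm not claiming this is the exact inequality used in the paper:

```latex
% Matrix Azuma (Tropp 2012): for self-adjoint d x d martingale
% differences X_1, ..., X_n with X_k^2 \preceq A_k^2 for fixed A_k,
\[
  \Pr\Big[\lambda_{\max}\Big(\sum_{k=1}^n X_k\Big) \ge t\Big]
  \le d \exp\!\Big(-\frac{t^2}{8\sigma^2}\Big),
  \qquad \sigma^2 = \Big\|\sum_{k=1}^n A_k^2\Big\|.
\]
```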
@jerryzli
Jerry Li
2 years
It's "folklore" that if these measurements are specified ahead of time, you need Θ(d^3/2) samples. More recently, @SebastienBubeck, @sitanch, and I showed that even adaptive algorithms require Ω(d^4/3) samples. But this left the question: does adaptivity help? 4/n
1
0
5
@jerryzli
Jerry Li
3 years
@thesasho @BooleanAnalysis this reminds me of when I was learning probability theory my prof tried to use Av notation for Markov chains and at some point literally threw his notes in frustration and walked out of class...good times
0
0
5
@jerryzli
Jerry Li
4 years
@thegautamkamath @VishwakFTW why can't I dislike a tweet D:<
0
0
4
@jerryzli
Jerry Li
2 years
We consider mixedness testing and state certification, which are the natural quantum analogues of uniformity and identity testing, respectively. For mixedness testing of a d-dim state, @BooleanAnalysis and John Wright showed that Θ(d) samples are sufficient and necessary. 2/n
1
0
4
@jerryzli
Jerry Li
3 years
from
1
0
3
@jerryzli
Jerry Li
2 years
But their tester needs heavily entangled measurements, which makes their tester impractical to implement on existing quantum devices.😭 So what if we only consider testers that don't use entangled measurements, i.e., only measure one copy of the state at a time? 🤔3/n
1
0
4
@jerryzli
Jerry Li
5 years
@suriyagnskr I figured it out in the end!
1
0
4
@jerryzli
Jerry Li
4 years
...but perhaps consider watching this video on Thursday.
1
0
4
@jerryzli
Jerry Li
4 years
:')
0
0
4
@jerryzli
Jerry Li
4 years
Joint work with some great collaborators:
0
0
4
@jerryzli
Jerry Li
2 years
It's pretty fascinating: even though proving lower bounds against adaptive algorithms is often really hard, I don't know of any natural quantum state learning setting where it actually gives any advantage. Hence, non-adaptivity is all you need????? 🤔🤔🤔 8/n
1
0
4
@jerryzli
Jerry Li
4 years
(whoops, sorry, made it public!) (but really, maybe consider watching it on Thursday?)
1
0
3
@jerryzli
Jerry Li
5 years
Estimating Coronavirus Prevalence by Cross-Checking Countries by Jacob Steinhardt
0
1
3
@jerryzli
Jerry Li
5 years
@thegautamkamath @ccanonne_ yeah, what @thegautamkamath said. Surprisingly, the results appear to achieve incomparable sample complexities, and the techniques are quite different!
0
0
3
@jerryzli
Jerry Li
5 years
@suriyagnskr As is so often the case, the solution involved "lifting" the problem and, in this case, the chair arm.
1
0
3
@jerryzli
Jerry Li
5 years
@ccanonne_ this is not quite matrix Bernstein's, but you can often do the following: since Frobenius is self-dual, you just need that <M, U> concentrates, where M is your matrix and U is any unit-Frobenius matrix. Get a good tail bound for a fixed U, then union bound over a net of ~2^(d^2) of them.
1
0
3
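Spelled out a bit (with the union bound taken over a net, and illustrative constants):

```latex
% Self-duality of the Frobenius norm, plus a 1/2-net N of the unit
% Frobenius sphere in R^{d x d} with |N| <= 5^{d^2}:
\[
  \|M\|_F = \sup_{\|U\|_F = 1} \langle M, U \rangle
          \le 2 \max_{U \in \mathcal{N}} \langle M, U \rangle .
\]
% So if each fixed U satisfies Pr[ <M,U> >= t ] <= exp(-c t^2),
% a union bound over the net gives
\[
  \Pr\big[\|M\|_F \ge 2t\big] \le 5^{d^2} e^{-c t^2},
\]
% which is small once t \gtrsim d / \sqrt{c}.
```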
@jerryzli
Jerry Li
3 years
@thegautamkamath @tetraduzione wow good thing I randomly decided to check twitter today, outlook decided that this email was junk...
0
0
2
@jerryzli
Jerry Li
5 years
@ccanonne_ Yeah, what I suggested is exactly the analog of the proof technique used there. See for instance the discussion of Cor. 2.1.12 in my thesis (although it's almost certainly been written down before).
1
0
2
@jerryzli
Jerry Li
5 years
@ccanonne_ also, rude D:<
0
0
1
@jerryzli
Jerry Li
4 years
@zstevenwu I know, I know, your whole day must be ruined!
0
0
1
@jerryzli
Jerry Li
4 years
@thegautamkamath @zacharylipton I'm honestly shocked that i-camel isn't a thing
2
0
1
@jerryzli
Jerry Li
3 years
1
0
1
@jerryzli
Jerry Li
4 years
@kwon_youngjoon bruh couldn't you have taught that lesson without scarring me with that image...like what is that on your face
1
0
1