Ernest Ryu Profile
Ernest Ryu

@ErnestRyu

Followers
1,514
Following
311
Media
24
Statuses
169

Professor of Applied Mathematics at UCLA. Interested in deep learning and optimization.

Seoul, Korea
Joined March 2019
@ErnestRyu
Ernest Ryu
5 months
I am excited to announce that I am joining the UCLA Department of Mathematics @uclamath as an Assistant Professor this coming Fall. I will dearly miss my time at Seoul National University, but I am looking forward to the new exciting opportunities ahead.
21
11
327
@ErnestRyu
Ernest Ryu
2 years
Found our book being advertised at ICML! Our book on large-scale convex optimization by Wotao and myself is scheduled to drop in October 2022. You can find a preprint here. (1/3)
Tweet media one
6
32
226
@ErnestRyu
Ernest Ryu
2 years
Our book with @wotaoyin is finally out! The final draft has been sent to the printers, and orders will be shipped in November.
@cambUP_maths
Cambridge University Press - Mathematics
2 years
Large-Scale Convex Optimization by @ErnestRyu and @wotaoyin A unified analysis of first-order optimization methods, including parallel-distributed algorithms, using monotone operators. ☑️
Tweet media one
0
14
80
1
40
226
@ErnestRyu
Ernest Ryu
1 year
🎉 Excited to announce that our lab has 4 papers accepted at NeurIPS. 1 in empirical deep learning, 1 in RL theory, and 2 in convex optimization theory. I'll post an advertisement for these papers, one per day! 1/N
1
12
162
@ErnestRyu
Ernest Ryu
10 months
I recently gave a talk at the Stanford ISL colloquium entitled ‘Toward a Grand Unified Theory of Accelerations in Optimization and Machine Learning.’ In this talk, I explain why I find acceleration an exciting and vibrant subject.
1
22
136
@ErnestRyu
Ernest Ryu
2 years
Excited to announce that our lab at SNU has 3 papers accepted at ICML, 2 as long talks! Let me use this opportunity to advertise these papers quickly. (1/10)
4
8
106
@ErnestRyu
Ernest Ryu
1 year
Final NeurIPS paper advert. Paper 4 (Convex optimization): When minimizing a convex function f(x), we usually want an approximate solution such that f(x) is small. But what if we instead want ||∇f(x)|| to be small? 12/N
2
4
98
@ErnestRyu
Ernest Ryu
1 year
Image clustering is a quintessential unsupervised learning task, but clustering methods often fail to cluster images the way you want. Wouldn't it be great if you could "tell" the method, in words, the criterion by which it should cluster the images? (1/8)
1
8
83
@ErnestRyu
Ernest Ryu
1 year
NeurIPS paper advert continued. Paper #2 (RL theory): The classical value iteration (VI) is a cornerstone of RL theory. Surprisingly, VI is suboptimal! We show how to accelerate it in 'Accelerating Value Iteration with Anchoring'. @IjongminS 6/N
Tweet media one
2
5
74
@ErnestRyu
Ernest Ryu
1 year
NeurIPS paper advert continued. Paper 3 (Convex optimization): Anchor acceleration is a mechanism distinct from Nesterov acceleration, and it has been used to accelerate algorithms for minimax optimization, fixed-point problems, and value iterations. 9/N
Tweet media one
1
9
76
@ErnestRyu
Ernest Ryu
2 years
ICML2022! My first in-person conference since ICML 2019. There’s such a surreal feeling to finally returning to normalcy. Looking forward to having a great time.
Tweet media one
2
0
64
@ErnestRyu
Ernest Ryu
9 months
Accepted at ICLR 2024! Vienna, here we come!
@ErnestRyu
Ernest Ryu
1 year
@sehyunkwon22 @jadenpark98 @minkyu_kim__ @jaewoong_cho @Kangwook_Lee In this work, we use a vision-language model (LLaVA) and a large language model (GPT-4) to cluster images based on user-specified text criteria, and it works very well! A given dataset can be clustered in multiple ways by varying the clustering criterion. (3/8)
Tweet media one
1
1
12
0
7
56
@ErnestRyu
Ernest Ryu
11 months
Proud advisor moment: My student TaeHo @TaeHo_Y00N had a blast of a poster presentation today at NeurIPS! (1/2)
Tweet media one
1
2
51
@ErnestRyu
Ernest Ryu
1 year
Professor Hédy Attouch, who has made significant contributions to convex analysis, monotone operator theory, and optimization, has passed away. Much of my research on splitting methods is influenced by his work. Thank you, Professor Attouch, for being an inspiring mathematician.
@GuillaumeG_
Guillaume Garrigos
1 year
It is with extreme sadness that I have learned that one of my PhD advisors, Hédy Attouch, has passed away.
Tweet media one
2
10
111
1
9
39
@ErnestRyu
Ernest Ryu
2 years
The performance estimation problem (PEP) is a computer-assisted proof methodology for optimization algorithms. In our collaboration with @shuvo_das_gupta and Bart Van Parys of MIT, we greatly expand this framework using branch-and-bound algorithms. (1/8)
@shuvo_das_gupta
Shuvomoy Das Gupta
2 years
Excited to share a presentation for our paper "BnB-PEP: A Unified Methodology for Constructing Optimal Optimization Methods" (joint work with Bart Van Parys and @ErnestRyu ) on YouTube: .
0
4
21
1
6
29
@ErnestRyu
Ernest Ryu
3 years
Roland Glowinski is known for being one of the inventors of ADMM. This is a huge loss to the optimization community. I guess we are all part of the circle of life, and great mathematicians also pass, even if their work lives on.
0
0
18
@ErnestRyu
Ernest Ryu
1 year
We introduce 'Censored Sampling of Diffusion Models Using 3 Minutes of Human Feedback', which is joint work with folks at the KRAFTON AI Research Center! @TaeHo_Y00N @KibeomM00 @keonlee9420 @jaewoong_cho 3/N
Tweet media one
1
0
18
@ErnestRyu
Ernest Ryu
1 year
@prof_grimmer This is very nice! If I may advertise relevant results from my work with @shuvo_das_gupta and Bart, we numerically investigate the *optimal* stepsizes of GD. (1/N)
Tweet media one
2
1
18
@ErnestRyu
Ernest Ryu
1 year
Paper #1 (Empirical deep learning): Diffusion models are very powerful, but they often generate undesirable images alongside the desired ones. Can we prevent (censor) a diffusion model from generating bad images while allowing it only to generate good images? 2/N
Tweet media one
1
0
16
@ErnestRyu
Ernest Ryu
1 year
@TaeHo_Y00N @KibeomM00 @keonlee9420 @jaewoong_cho We show that labeling around 100 images as "good" or "bad" (which takes about 3 minutes of human time) is enough to train an auxiliary reward model that can guide the sampling process away from bad images. 4/N
1
0
15
@ErnestRyu
Ernest Ryu
1 year
@TaeHo_Y00N @KibeomM00 @keonlee9420 @jaewoong_cho The reward model is a small ResNet trained on a few hundred labeled images, and the diffusion model's score network is not fine-tuned. Our approach to censoring is human-label efficient, computationally lightweight, and very effective! 5/N (Paper #2 advert coming tomorrow)
0
0
15
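A generic sketch of reward-guided sampling in the spirit described above: a frozen score network plus a gradient from a small reward model nudging samples toward "good" regions. The modules `score_net` and `reward_net` are hypothetical placeholders, and this simplified Langevin-style guidance step is an illustrative assumption, not necessarily the paper's exact mechanism.

```python
import torch

@torch.no_grad()
def guided_denoising_step(x, t, score_net, reward_net, step_size=1e-2, guidance_weight=1.0):
    """One illustrative reverse-diffusion step with reward guidance.
    `score_net(x, t)` is a frozen pretrained score network; `reward_net(x)` outputs
    a logit for "good". Both are hypothetical modules; this is a sketch, not the
    paper's exact sampler."""
    # The reward gradient needs autograd, so re-enable it locally.
    with torch.enable_grad():
        x_req = x.detach().requires_grad_(True)
        log_good = torch.nn.functional.logsigmoid(reward_net(x_req)).sum()
        reward_grad = torch.autograd.grad(log_good, x_req)[0]
    # Frozen score plus a guidance term pulling toward high-reward regions.
    drift = score_net(x, t) + guidance_weight * reward_grad
    noise = torch.randn_like(x)
    return x + step_size * drift + (2 * step_size) ** 0.5 * noise
```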
@ErnestRyu
Ernest Ryu
1 year
@AsuOzdaglar @chanwoopark20 @Jaeyeon_Kim_0 Usual notions of duality in optimization are between functions and optimization problems. We establish the first instance of a duality between *algorithms*. Our view is that an optimization algorithm is a mathematical object that we can take the dual of! 16/N
1
0
14
@ErnestRyu
Ernest Ryu
1 year
In 'Continuous-time Analysis of Anchor Acceleration', we perform a continuous-time analysis of this mechanism and provide some insight into the counter-intuitive acceleration mechanisms. @Jaewook_J_Suh 11/N
0
1
14
@ErnestRyu
Ernest Ryu
11 months
I'll be at #NeurIPS2023 , HMU if you want to chat!
0
0
13
@ErnestRyu
Ernest Ryu
1 year
In 'Time-Reversed Dissipation Induces Duality Between Minimizing Gradient Norm and Function Value', joint work with Asu at MIT, we present H-duality, a duality principle relating the acceleration mechanisms. @AsuOzdaglar @chanwoopark20 @Jaeyeon_Kim_0 14/N
1
1
13
@ErnestRyu
Ernest Ryu
2 years
We hope this book serves as a nice introduction to the, in my view, elegant field of convex optimization algorithms. Slides (with their LaTeX source code) and YouTube lecture videos are also available. (3/3)
1
0
12
@ErnestRyu
Ernest Ryu
1 year
It turns out that Nesterov acceleration is not optimal for reducing gradient magnitude; Kim-Fessler acceleration is. Interestingly, there is a glaring symmetry between these two acceleration mechanisms. 13/N
Tweet media one
1
0
11
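For concreteness, the symmetry can be sketched with the standard rates (constants omitted; these are the usual textbook forms, not quotes from the paper):

```latex
% Nesterov's AGM: function-value guarantee from the initial distance to a solution
f(x_N) - f(x_\star) \le O\!\left( \frac{L\,\|x_0 - x_\star\|^2}{N^2} \right)

% Kim-Fessler acceleration (OGM-G): gradient-norm guarantee from the initial function gap
\|\nabla f(x_N)\|^2 \le O\!\left( \frac{L\,\bigl(f(x_0) - f(x_\star)\bigr)}{N^2} \right)
```

The two guarantees mirror each other: the initial distance and final function gap swap roles with the initial function gap and final gradient norm.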
@ErnestRyu
Ernest Ryu
1 year
@sehyunkwon22 @jadenpark98 @minkyu_kim__ @jaewoong_cho @Kangwook_Lee In this work, we use a vision-language model (LLaVA) and a large language model (GPT-4) to cluster images based on user-specified text criteria, and it works very well! A given dataset can be clustered in multiple ways by varying the clustering criterion. (3/8)
Tweet media one
1
1
12
@ErnestRyu
Ernest Ryu
11 months
@damekdavis Terry’s productivity: -47 lines of code
0
0
11
@ErnestRyu
Ernest Ryu
1 year
@AsuOzdaglar @chanwoopark20 @Jaeyeon_Kim_0 This concludes my 4-part announcement of our lab's NeurIPS papers. If anybody wants to chat about our work, shoot us a DM or meet us at NeurIPS 23! 17/N
0
0
11
@ErnestRyu
Ernest Ryu
2 years
Say you have an optimization algorithm, and you want to find its convergence proof. Do you want computer assistance in finding the proof? If so, check out Shuvo's talk given at the 2023 PEP Workshop, UCLouvain! 🇧🇪
@shuvo_das_gupta
Shuvomoy Das Gupta
2 years
🎥 Excited to share the YouTube video of my presentation at the 2023 Workshop on Performance Estimation Problems (PEP), UCLouvain, Belgium! 🇧🇪 Check out my talk titled "Design and analysis of first-order methods via nonconvex QCQP frameworks" at (1/5)
1
3
15
0
1
10
@ErnestRyu
Ernest Ryu
1 year
Anchoring pulls iterates back towards the starting point of the algorithm, and the strength of this anchor diminishes over iterations. This counter-intuitive acceleration mechanism is understood much less than Nesterov acceleration. 10/N
Tweet media one
1
0
9
@ErnestRyu
Ernest Ryu
2 years
Dear Twitter ML theory folks, Does anybody know what the plans are for COLT23? I am part of a group planning a workshop next summer, and we may choose to run it as a satellite event of ICML or COLT. (1/2)
1
0
9
@ErnestRyu
Ernest Ryu
11 months
Taeho presented work on censoring (aligning) diffusion models to generate good images, as defined through a minimal amount of human feedback. Here’s the poster and the link to the paper (2/2)
Tweet media one
0
0
7
@ErnestRyu
Ernest Ryu
1 year
Through modern multi-modal foundation models (which are 🤯 mind-blowingly powerful), we present a new paradigm of image clustering. If interested in further details, check out our preprint or meet us at the NeurIPS workshop! (8/8)
0
0
9
@ErnestRyu
Ernest Ryu
2 years
In this book, we provide a unified presentation of convex optimization algorithms using monotone operators. In chapters 1-3, there is only *one* theorem, and approximately 40 convex optimization algorithms are analyzed by reducing them to instances of "Theorem 1". (2/3)
1
1
8
@ErnestRyu
Ernest Ryu
1 year
@AsuOzdaglar @chanwoopark20 @Jaeyeon_Kim_0 In continuous time, taking the H-dual corresponds to reversing the time dependence of the dissipation (friction) term. The H-duality principle, loosely speaking, shows that under the H-dual, a function-value guarantee translates to a gradient-norm guarantee and vice versa. 15/N
Tweet media one
1
0
8
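In symbols, a rough sketch of the statement above (the exact formulation in the paper may differ):

```latex
% Primal flow with dissipation \gamma(t) on the horizon [0, T]
\ddot{X}(t) + \gamma(t)\,\dot{X}(t) + \nabla f(X(t)) = 0

% H-dual flow: the same dissipation run backwards in time
\ddot{Y}(t) + \gamma(T - t)\,\dot{Y}(t) + \nabla f(Y(t)) = 0
```

Loosely speaking, a function-value guarantee for one flow translates into a gradient-norm guarantee for the other, and vice versa.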
@ErnestRyu
Ernest Ryu
1 year
In addition to the faster rate, we provide a matching complexity lower bound. This proves that Anc-VI is optimal (while classical VI is suboptimal) in the regime γ≈1. 8/N The crucial mechanism is 'anchoring'. More on this in the paper #3 advert coming tomorrow!
Tweet media one
1
0
8
@ErnestRyu
Ernest Ryu
1 year
@sehyunkwon22 @jadenpark98 @minkyu_kim__ @jaewoong_cho @Kangwook_Lee Step 3 uses the LLM to assign the image descriptions from Step 1 to the K cluster names from Step 2. Finally, if the results are unsatisfactory, you can adjust your text criterion to be more specific and go back to Step 1. (7/8)
1
0
7
@ErnestRyu
Ernest Ryu
2 years
Question to RL twitter. Lately, I've been studying modern deep RL, and there's one thing that has puzzled me. Why does double Q-learning help? I understand that double DQN mitigates overestimation of Q values, but why should overestimating Q-values lead to bad policies? (1/3)
1
0
7
@ErnestRyu
Ernest Ryu
1 year
@sehyunkwon22 @jadenpark98 @minkyu_kim__ @jaewoong_cho @Kangwook_Lee If you think about it, providing only the images for clustering is not a realistic setup. In practice, users are willing to guide the clustering algorithm through some minimal input, and foundation models are available for us to use. (4/8)
1
0
7
@ErnestRyu
Ernest Ryu
1 year
Classical VI converges at rate O(γ^k), but this rate is problematic when the discount factor γ is close to 1, and the iteration count k is not too large. Anchored value iteration (Anc-VI) achieves a faster rate on the Bellman error. 7/N
Tweet media one
1
0
6
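A minimal sketch of the anchoring mechanism in value-iteration form. The toy Bellman operator and the simple Halpern-style weight β_k = 1/(k+2) are illustrative assumptions; the paper's actual anchoring coefficients differ and depend on the discount factor γ.

```python
import numpy as np

def value_iteration(bellman, V0, num_iters):
    """Classical VI: V_{k+1} = T(V_k)."""
    V = V0.copy()
    for _ in range(num_iters):
        V = bellman(V)
    return V

def anchored_value_iteration(bellman, V0, num_iters):
    """Anchored VI (sketch): each iterate is pulled back toward the starting
    point V0 by a weight that diminishes with k. The Halpern-style weight
    beta_k = 1/(k+2) below is an illustrative assumption, not the paper's choice."""
    V = V0.copy()
    for k in range(num_iters):
        beta = 1.0 / (k + 2)
        V = beta * V0 + (1.0 - beta) * bellman(V)
    return V

# Toy usage: a 3-state Markov reward process with discount gamma = 0.99.
gamma = 0.99
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.9]])
r = np.array([1.0, 0.0, 2.0])
T = lambda V: r + gamma * P @ V  # Bellman (expectation) operator
V0 = np.zeros(3)
V_vi = value_iteration(T, V0, 200)
V_anc = anchored_value_iteration(T, V0, 200)
print("Bellman error, classical VI:", np.max(np.abs(T(V_vi) - V_vi)))
print("Bellman error, anchored VI :", np.max(np.abs(T(V_anc) - V_anc)))
```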
@ErnestRyu
Ernest Ryu
2 years
@Singularitarian I’m not sure what Daniel’s context is, but I am very happy to answer any questions about my book on Twitter or MathOverflow! So far, I have gotten email questions, but no public questions yet.
1
0
6
@ErnestRyu
Ernest Ryu
1 year
@sehyunkwon22 @jadenpark98 @minkyu_kim__ @jaewoong_cho @Kangwook_Lee Recent foundation models allow us to effectively incorporate textual user input into this computer vision task. Given a user-specified text criterion TC, Step 1 uses the instruction-tuned vision-language model LLaVA to extract a description of the image relevant to TC. (5/8)
Tweet media one
1
0
6
@ErnestRyu
Ernest Ryu
1 year
@sehyunkwon22 @jadenpark98 @minkyu_kim__ @jaewoong_cho @Kangwook_Lee Step 2a uses the LLM GPT-4 to summarize the image description into a single word in the context of TC. Step 2b uses the LLM to consolidate these single-word descriptions into K cluster names, where K is a pre-specified cluster count. (6/8)
1
0
6
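Putting the steps from this thread together, a minimal sketch of the pipeline. The callables `llava_describe`, `gpt4_summarize_word`, `gpt4_consolidate_names`, and `gpt4_assign_cluster` are hypothetical wrappers around LLaVA and GPT-4, not the paper's actual API.

```python
def cluster_by_text_criterion(images, text_criterion, K,
                              llava_describe, gpt4_summarize_word,
                              gpt4_consolidate_names, gpt4_assign_cluster):
    """Sketch of clustering images by a user-specified text criterion (TC).
    The four callables are hypothetical wrappers around LLaVA / GPT-4."""
    # Step 1: the vision-language model extracts a TC-relevant description per image.
    descriptions = [llava_describe(img, text_criterion) for img in images]

    # Step 2a: the LLM summarizes each description into a single word in the context of TC.
    words = [gpt4_summarize_word(d, text_criterion) for d in descriptions]

    # Step 2b: the LLM consolidates the single-word descriptions into K cluster names.
    cluster_names = gpt4_consolidate_names(words, text_criterion, K)

    # Step 3: the LLM assigns each image's description to one of the K cluster names.
    assignments = [gpt4_assign_cluster(d, cluster_names) for d in descriptions]

    # If the result is unsatisfactory, refine text_criterion and rerun from Step 1.
    return cluster_names, assignments
```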
@ErnestRyu
Ernest Ryu
3 years
@konstmish My work with monotone operators has partially been driven by this desire to find some unifying structure to the zoo of algorithms. However, I find myself adding to this "problem" by publishing new methods and thereby increasing, not decreasing, the number of optimization algorithms!
1
0
5
@ErnestRyu
Ernest Ryu
3 years
@PatrickKidger Hi Patrick, I understand your frustration, but I must say that I found your 2020 COLT talk on width-bounded depth-wise uni. apx. to be incredibly informative and useful. It was a great introduction to an area I had not worked on before.
1
0
5
@ErnestRyu
Ernest Ryu
2 years
@GlockerBen Our paper was desk rejected before the start of the review period. I'm sorry this happened to you, and I share your disappointment. (1/4)
1
1
5
@ErnestRyu
Ernest Ryu
3 years
@PatrickKidger The recorded talk led me to read your paper and even remember your name 😉 So your time spent on the video has produced some value.
0
0
4
@ErnestRyu
Ernest Ryu
1 year
@prof_grimmer @shuvo_das_gupta Detail (iii): Our numerics go up to N=50. Our work says nothing about whether this phenomenon persists for larger N. @prof_grimmer ’s work makes progress towards a theoretical understanding of this phenomenon. (5/N, N=5)
0
0
4
@ErnestRyu
Ernest Ryu
2 years
If anything's been decided about COLT23, it'd be really helpful info. Is there someone I could ask publicly or privately? Tagging folks in the COLT22 organization committee in case they can help. @mraginsky @ccanonne_ @suriyagnskr @bguedj @CiaraPikeBurke @vernadec (2/2)
2
0
4
@ErnestRyu
Ernest Ryu
3 years
@konstmish Personally, I am quite interested in the problem of "there are too many optimization algorithms". Zhao, Lessard, and Udell have recently proposed an algorithm (based on ideas from control theory) for detecting equivalent algorithms.
0
0
4
@ErnestRyu
Ernest Ryu
10 months
@AbecidAdam Sure thing. Here are my slides:
0
0
4
@ErnestRyu
Ernest Ryu
1 year
@prof_grimmer @shuvo_das_gupta Detail (ii): These step sizes are found by solving a non-convex QCQP via a spatial branch-and-bound method. (4/N)
1
0
3
@ErnestRyu
Ernest Ryu
1 year
@prof_grimmer @shuvo_das_gupta Detail (i): The optimal stepsizes are not unique. Sometimes, but not always, we can permute the order of some step sizes and maintain the same optimal performance. (3/N)
1
0
3
@ErnestRyu
Ernest Ryu
6 months
@shuvo_das_gupta @RiceUniversity @Columbia @ChrKroer Congrats Shuvo! You'll be a wonderful professor!
1
0
3
@ErnestRyu
Ernest Ryu
3 months
@hanrui_w @UCLAComSci Congrats! Welcome to UCLA!
0
0
3
@ErnestRyu
Ernest Ryu
1 year
@boazbaraktcs It was wonderful to have you Boaz! There was so much excitement among the students. Thanks for visiting SNU and delivering an inspiring lecture series!
1
0
3
@ErnestRyu
Ernest Ryu
3 years
@konstmish In algebra, mathematicians classify groups and other structures into a handful of equivalence classes. I wonder if optimization algorithms can be collected, classified, and organized in a similar manner.
0
0
3
@ErnestRyu
Ernest Ryu
2 years
Our central thesis is that the analysis of such ODEs simplifies dramatically under a change of variables into "dilated coordinates." The classical O(1/t^2) rate follows from a conservation-of-energy argument, and the mysterious "z-iterates" of AGM are conjugate momentum. (4/10)
1
0
2
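For reference, the ODE in question, in its standard continuous-time form for Nesterov's AGM (the dilated-coordinates change of variables itself is not reproduced here):

```latex
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f(X(t)) = 0,
\qquad
f(X(t)) - f(x_\star) \le O\!\left(\frac{1}{t^2}\right)
```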
@ErnestRyu
Ernest Ryu
3 years
@konstmish This is interesting. Is this talk available online? I am interested to hear what Nesterov offered as a solution.
8
0
2
@ErnestRyu
Ernest Ryu
3 years
@sangwoomo @jasondeanlee Uni. apx. thm., in its classical form, is not quantitative, i.e., we can approximate with large enough size, but who knows how large. Since basically all neural networks (wide or deep) now have uni. apx. thms and there are constructive ways to find the approximations, (2/3)
1
0
2
@ErnestRyu
Ernest Ryu
2 years
The prior analysis of overparameterized neural networks via the NTK theory focused on infinitely wide networks. For infinitely deep networks with the standard LeCun initialization, the NTK becomes degenerate. (9/10)
1
0
2
@ErnestRyu
Ernest Ryu
2 years
@aaron_defazio @wotaoyin Thanks for the shoutout Aaron! I'll let the artist know that you appreciate the cover art :)
1
0
2
@ErnestRyu
Ernest Ryu
2 years
@GlockerBen I also believe that the requirement was not made sufficiently clear. When the same mistake is made by many people, at some point one can reasonably attribute the cause to poor UX design (an unclear announcement) rather than individual oversight. (3/4)
1
1
2
@ErnestRyu
Ernest Ryu
2 years
@GlockerBen I asked the program chairs if they could extend some leniency. They promptly and respectfully stated that the decision was final. I was sad, but the mistake is my fault, and I understand that keeping order for a conference with ~10000 submissions is a near-intractable task. (2/4)
1
1
2
@ErnestRyu
Ernest Ryu
1 year
@prof_grimmer @shuvo_das_gupta Our numerics go up to about N=50 iterations, but we find that mostly small step sizes with an occasional long step are optimal. (2/N)
Tweet media one
1
0
2
@ErnestRyu
Ernest Ryu
2 years
0
0
2
@ErnestRyu
Ernest Ryu
2 years
@FeiziSoheil Hi Soheil, we haven't met, but I know you through your work on GANs and your Deep Learning Foundations YouTube videos. It'd be great if we could meet and chat at ICML.
0
0
2
@ErnestRyu
Ernest Ryu
2 years
@RdeHeide @AssocCompLearn It’s public: Bangalore Jul 19-22
1
0
2
@ErnestRyu
Ernest Ryu
3 years
@sangwoomo @jasondeanlee the remaining question is in their efficiency. My understanding of Jason’s line of work is to obtain quantitative results. Two-layer nets *can* approximate the function with SGD; they just need too many neurons. Three-layer nets can do the job with far fewer neurons. (3/3)
1
0
2
@ErnestRyu
Ernest Ryu
2 years
@GlockerBen In any case, issuing "desk rejections" now is nonsense. My guess is that many missing checklists were overlooked (or perhaps ACs refused to desk reject), and the PCs just found out about them. So, to enforce the rules consistently, they are now belatedly being desk rejected. (4/4)
0
1
2
@ErnestRyu
Ernest Ryu
2 years
@shuvo_das_gupta We call this the BnB-PEP. This vastly expands the applicability of the PEP's computer-assisted proof methodology. If interested, please check out our paper and Shuvo's excellent and detailed talk. (7/8)
1
0
2
@ErnestRyu
Ernest Ryu
3 months
@madsjw Fantastic! Congrats!
0
0
2
@ErnestRyu
Ernest Ryu
2 years
Our work shows that an infinitely-deep ReLU MLP with a particular initialization is provably trainable. The key technical challenge is establishing the NTK's invariance in the infinite-depth limit, which requires controlling the behavior under infinite compositions. (10/10)
0
0
2
@ErnestRyu
Ernest Ryu
3 years
@sangwoomo @jasondeanlee Rahimi and Recht's random kitchen sink approach can be used to constructively perform the universal approximation with 2-layer MLPs with only the final layer trained. You can also appeal to the NTK theory to show something similar for L-layer MLPs with all layers trained. (1/3)
1
0
2
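A minimal sketch of the random-features (random kitchen sinks) construction mentioned above: random first-layer weights are frozen and only the final linear layer is fit. This is a generic illustration under simple assumptions, not the specific construction in the referenced papers.

```python
import numpy as np

def random_kitchen_sink_fit(X, y, num_features=500, bandwidth=1.0, reg=1e-6, seed=0):
    """Two-layer approximation with only the final layer trained:
    random Fourier features phi(x) = cos(Wx + b), then ridge regression."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))  # frozen random weights
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    Phi = np.cos(X @ W + b)                        # fixed random hidden layer
    A = Phi.T @ Phi + reg * np.eye(num_features)   # ridge-regularized normal equations
    c = np.linalg.solve(A, Phi.T @ y)              # only these final-layer weights are learned
    return lambda Xnew: np.cos(Xnew @ W + b) @ c

# Toy usage: approximate a smooth 1-D target from samples.
X = np.random.rand(200, 1) * 4 - 2
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 0]
f_hat = random_kitchen_sink_fit(X, y)
print(float(np.mean((f_hat(X) - y) ** 2)))  # training mean-squared error
```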
@ErnestRyu
Ernest Ryu
2 years
0
0
1
@ErnestRyu
Ernest Ryu
2 years
0
0
1
@ErnestRyu
Ernest Ryu
3 years
@konstmish To shamelessly promote my work, I worked on a result establishing that under certain (very restrictive) conditions, DRS is the only splitting method that utilizes resolvents.
0
0
1
@ErnestRyu
Ernest Ryu
4 months
@masonporter Nice! And you also have an academic grandchild now!
0
0
1
@ErnestRyu
Ernest Ryu
3 years
@weijie444 @SimonShaoleiDu It’s a great paper. Congrats on the acceptance!
1
0
1
@ErnestRyu
Ernest Ryu
2 years
Continuous-time models of momentum-based accelerated gradient methods do provide some insight, but their analyses are still somewhat mysterious; where do these energy functions come from, and what do the terms mean? (3/10)
1
0
1
@ErnestRyu
Ernest Ryu
2 years
@konstmish Congrats Konstantin!
0
0
1
@ErnestRyu
Ernest Ryu
2 years
For the general problem of fixed-point iterations with a non-expansive or contractive operator, are the simple fixed-point iterations à la Banach or KM optimal? Can we characterize the optimal complexity as was done for Nesterov's accelerated gradient method? (6/10)
1
0
1
@ErnestRyu
Ernest Ryu
2 years
@TaylorAdrien This is a beautiful paper. Congrats on the MPA acceptance!
1
0
1
@ErnestRyu
Ernest Ryu
10 months
0
0
1
@ErnestRyu
Ernest Ryu
2 years
0
0
1
@ErnestRyu
Ernest Ryu
2 years
1
0
1
@ErnestRyu
Ernest Ryu
2 years
@shuvo_das_gupta Given a fixed optimization algorithm, finding its best convergence proof is, in principle, an optimization problem, and we call this the performance estimation problem (PEP). Surprisingly, the PEP can be formulated as a finite-dimensional (convex) SDP, no relaxation. 🤯 (2/8)
2
0
1
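Schematically, and only as a sketch of the general PEP idea rather than the paper's exact formulation, the worst-case performance of N steps of a fixed method on L-smooth convex functions is itself an optimization problem:

```latex
\begin{aligned}
\text{maximize}   \quad & f(x_N) - f(x_\star) \\
\text{subject to} \quad & f \ \text{is $L$-smooth and convex}, \quad \|x_0 - x_\star\| \le R, \\
                        & x_1, \dots, x_N \ \text{generated from $x_0$ by the fixed algorithm.}
\end{aligned}
```

Replacing the constraint "f is L-smooth and convex" by finitely many interpolation conditions on the values f(x_i) and gradients ∇f(x_i) turns this infinite-dimensional problem into the finite-dimensional convex SDP mentioned in the tweet, with no relaxation.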
@ErnestRyu
Ernest Ryu
2 years
In this work, we accelerate such fixed-point iterations via an "anchoring" mechanism (from O(1/k) to O(1/k^2)) and provide exactly matching complexity lower bounds to establish exact optimality. (7/10)
1
0
1
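A rough sketch of the anchoring mechanism for a nonexpansive operator T, in its Halpern-style form (the exact method and constants in the paper may differ):

```latex
x_{k+1} = \frac{1}{k+2}\,x_0 + \frac{k+1}{k+2}\,T(x_k),
\qquad
\|x_k - T(x_k)\|^2 = O\!\left(\frac{1}{k^2}\right)
```

compared with O(1/k) for the plain Krasnosel'skii–Mann iteration, matching the rates quoted above.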
@ErnestRyu
Ernest Ryu
3 years
@Dirque_L Wow. Thanks for the shoutout Dirk! @wotaoyin and I are working hard to finish the draft and to get it published!
0
0
1
@ErnestRyu
Ernest Ryu
2 years
@gautamcgoel @shuvo_das_gupta Of course, proofs establishing f(x^N) − f(x*) ≤ O(1/N) are known, but their constants are worse. The best proof for simple gradient descent is not known.
1
0
1