Just learned: Adding randomness before quantization helps! One-bit quantization of X is sign(X). If |X|≤c then it's better to store sign(X+Y) where Y is uniformly distributed on [-c,c]!
Why? We still have
E[X] = E[c*sign(X+Y)]
1/n
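A minimal numerical check (my own sketch, not part of the thread): for fixed x with |x| ≤ c, E[sign(x+Y)] = (x+c)/(2c) − (c−x)/(2c) = x/c, so c·sign(x+Y) is unbiased for x.

```python
import numpy as np

rng = np.random.default_rng(0)
c, x, n = 1.0, 0.37, 1_000_000

# Dithered one-bit quantization: store sign(x + Y) with Y ~ Uniform[-c, c].
y = rng.uniform(-c, c, size=n)
dithered = c * np.sign(x + y)

# E[c * sign(x + Y)] = c * ((x + c)/(2c) - (c - x)/(2c)) = x, so averaging
# the one-bit values recovers x; plain sign(x) only ever gives +-c.
print(dithered.mean())   # ~ 0.37
print(c * np.sign(x))    # 1.0 -- biased without dithering
```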
@littmath
Something is off here. Is achtzehn (18) really first? Alphabetically, acht (8) comes before it, and so do achtundachtzig (88) and the other achtund* numbers!
This book draft "Large-Scale Convex Optimization via Monotone Operators" by Ernest Ryu and Wotao Yin is awesome!
The "scaled relative graph" framework allows for a beautiful and totally graphical analysis of methods for monotone inclusions!
It is completely irrelevant if one should know what PAC stands for or not. Always define acronyms. Always. It's not about now - it's about producing literature that can be understood indefinitely.
This is quite amazing: It works since 5*sqrt(2) is very close to 7. More precisely: (5*sqrt(2))² = 50 = 49+1 = 7².
Are there other possibilities? Turns out there are infinitely many of them!
1/n
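A quick sketch of where the infinitely many pairs come from (my own, reading "possibility" as a pair (a, b) with 2a² = b² + 1): these are solutions of the Pell-type equation b² − 2a² = −1, generated from (b, a) = (1, 1) by multiplying b + a√2 with the fundamental unit 3 + 2√2.

```python
# Pairs (a, b) with (a*sqrt(2))^2 = 2a^2 = b^2 + 1, i.e. b^2 - 2a^2 = -1.
# Multiplying b + a*sqrt(2) by 3 + 2*sqrt(2) maps one solution to the next:
# (b, a) -> (3b + 4a, 2b + 3a).
b, a = 1, 1
for _ in range(5):
    print(f"{a}*sqrt(2) ~ {b}:  2*{a}^2 = {2 * a * a} = {b}^2 + 1")
    b, a = 3 * b + 4 * a, 2 * b + 3 * a
```

The second pair printed is exactly (a, b) = (5, 7) from above; the next ones are (29, 41) and (169, 239).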
Germany's math undergrad curricula often contain a course "complex analysis". When I ask colleagues why, they never answer "because it is so useful". Why is it still there and how should we replace it? My modest proposal:
Just change from 𝐜𝐨𝐦𝐩𝐥𝐞𝐱 to 𝐜𝐨𝐧𝐯𝐞𝐱! 1/10
Als "Arbeitgeber" von Postdocs sage ich: Es ist zum Verzweifeln.
Mit genügend Drittmitteln lassen sich zwar immer noch halbwegs annehmbare Bedingungen für Postdocs schaffen, es wird aber nichts leichter oder gar besser.
@TonyTheLion2500
Purely mathematical: The product of the variance of a function and the variance of its Fourier transform has a positive universal lower bound.
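For reference, my own phrasing of the precise statement (Heisenberg's inequality in the convention f̂(ξ) = ∫ f(x)e^{−2πixξ} dx with ‖f‖₂ = 1; the same bound holds with the moments centered at the respective means):

```latex
\left(\int x^2 |f(x)|^2 \, dx\right)
\left(\int \xi^2 |\hat f(\xi)|^2 \, d\xi\right)
\;\ge\; \frac{1}{16\pi^2},
```

with equality exactly for Gaussians.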
These older papers on fitting a few parameters in ODEs are pretty interesting:
[2007] Parameter estimation for differential equations
[2009] On Identifiability of Nonlinear ODE Models and Applications in Viral Dynamics
I had my inaugural lecture yesterday at
@unibremen
!
I talked about Ax=b and treated: What if
… we only have a single row of A at a time? (sketch below)
… we only have a black-box function x → Ax (and maybe also y → Vᵀy with Vᵀ ≈ Aᵀ)?
… we want to have min φ(x) s.t. Ax=b?
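For the single-row question, one classical answer is the (randomized) Kaczmarz method; a minimal sketch, assuming a consistent system Ax = b (the example data is made up):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=10_000, seed=0):
    """Solve Ax = b touching only one row a_i per step: project the current
    iterate onto the hyperplane {x : <a_i, x> = b_i}."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    p = np.sum(A**2, axis=1)   # row sampling ~ ||a_i||^2 (Strohmer-Vershynin)
    p /= p.sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a
    return x

A = np.random.default_rng(1).standard_normal((200, 50))
x_true = np.ones(50)
print(np.linalg.norm(randomized_kaczmarz(A, A @ x_true) - x_true))  # tiny
```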
New preprint 📄 "Linearly convergent adjoint free solution of least squares problems by random descent" by Felix Schneppe,
@ltondji
and myself
In this paper we revisited an old problem: What if you want to minimize ||Av-b||² 𝗕𝗨𝗧 1/10
Goodbye
@tuBraunschweig
! Today is the day I move out of my office. Thanks for the great time I had!
Will be on campus regularly for the next few months still, though.
@TaliaRinger
In my experience that's not exactly how it works, at least not everywhere in math. There are enough referees for "top journals" who judge harshly based on what they deem to be of interest or use. "Almost top" journals are more relaxed.
@Anthony_Bonato
The correct answer would have been "Nothing - it means absolutely nothing."
(It is used to indicate which one is the integration variable, though.)
I had to cancel my trip to the
@CIRM
workshop "Optimization for Machine Learning" this week. My talk would have been today so I thought I twitter the message. Thread on "Extensions of the randomized Kaczmarz method: sparsity and mismatched adjoints"
Ping
@salmonjsph
Yesterday I attended a PhD defense where the student stuttered. The stutter was severe and at times it was hard for me to follow. I got very much distracted by the stutter.
End of the story: It was an excellent defense and the student graduated with distinction!
New optimization paper alert! "Directional Smoothness and Gradient Methods: Convergence and Adaptivity". We develop new sub-optimality bounds for gradient descent that depend on the local smoothness (curvature) along the optimization path.
“Most of the time it doesn’t work out. As pretty much every single graduate student in mathematics can attest, [..] you probably spend two-thirds of your time getting stuck and banging your head against a wall.” Encouraging quote, especially from this guy!
New preprint "A Bregman-Kaczmarz method for nonlinear systems of equations" with
@gowerrobert
and Maximilian Winkler on the arxiv:
What's it about? 🧵 1/5
Reminds me of the Seven C's of Analysis:
Convergence, completeness, closedness, compactness, continuity, connectedness, and convexity.
(From Kenneth Lange's book "Optimization".)
@TonyTheLion2500
I find the view from signal processing most intuitive: A signal that is localized in time cannot be well localized in frequency and vice versa. Put differently, if you want to build a signal that is localized in time you need to use arbitrarily large frequencies.
This is a remarkable fact. Especially since the definition of "convex" only needs a linear structure but no topology whatsoever. As a consequence, the statement is true for any topology! (Hint: Change of topology changes both the notion of continuity and interior.)
Was trying to figure out whether "stepsize" or "step size" is more common. Checked, just to learn that it contains "step size" seven times and "stepsize" six times… 🤔
Fun problem: How to sample from {1,…,m} uniformly if you only have a fair coin?
Follow-up: Which method needs the lowest number of coin tosses on average? 1/4
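A minimal sketch of the standard rejection scheme (my own): toss k = ⌈log₂ m⌉ coins to get a uniform integer in {0, …, 2^k − 1} and reject anything ≥ m; since the acceptance probability m/2^k is at least 1/2, the expected number of tosses is at most 2⌈log₂ m⌉.

```python
import math, random

def coin():
    """A fair coin toss: returns 0 or 1."""
    return random.getrandbits(1)

def uniform_m(m):
    """Sample uniformly from {1, ..., m} using only fair coin tosses.
    Rejection sampling: build a k-bit uniform integer, retry if it is >= m."""
    k = math.ceil(math.log2(m))
    while True:
        r = 0
        for _ in range(k):
            r = 2 * r + coin()
        if r < m:          # accept with probability m / 2**k >= 1/2
            return r + 1   # shift {0, ..., m-1} to {1, ..., m}

print([uniform_m(6) for _ in range(10)])
```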
Jannis Marquardt,
@emanuele_naldi
and I just uploaded the paper "The Degenerate Variable Metric Proximal Point Algorithm and Adaptive Stepsizes for Primal-Dual Douglas-Rachford" to the arxiv (). The convergence of DR still seems mysterious to me and… 1/2

Yesterday I uploaded the preprint "Adaptive Bregman-Kaczmarz: An Approach to Solve Linear Inverse Problems with Independent Noise Exactly" (with
@ltondji
and
@Idriss_Tondji
) to the arxiv. Here is a 🧵
1/7
Well, I may not have won the Lehrleo from
@tuBraunschweig
this year but at least my teaching evaluation says
"Herr Lorenz ist ein Ehrenmann"
💪
(Sorry, no translation possible, I guess.)
The history of CT is quite amazing. This "algebraic reconstruction technique" is a reinvention of the Kaczmarz method from 1931 and it is still under intense research these days! (Btw. the method "just" solves a linear system of eqs.)
The first CT scan of a patient's brain was taken on this day in 1971.
The images from these scans took over 2 hours to be processed by algebraic reconstruction techniques on a large computer.
Recently I uploaded the paper "Learning Variational Models with Unrolling and Bilevel Optimization" (together with
@gusgustav85
, Niklas Breustedt and
@timodewolff
). What's that about? Find out in a… 🧵
1/7
@francoisfleuret
For me the SVD finally provided good intuition: A linear map is a rotation, an axis scaling, and another rotation. The inverse is "inverse rotation, inverse scaling, and inverse rotation". The transpose inverts the rotations but not the scaling.
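A quick numpy check of that reading (my own sketch): with A = UΣVᵀ, the inverse is VΣ⁻¹Uᵀ while the transpose is VΣUᵀ, so the transpose undoes the rotations but keeps the scaling.

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((3, 3))
U, s, Vt = np.linalg.svd(A)   # A = U @ diag(s) @ Vt

# Inverse: inverse rotation, inverse scaling, inverse rotation.
print(np.allclose(np.linalg.inv(A), Vt.T @ np.diag(1 / s) @ U.T))  # True
# Transpose: the same reversed rotations, but the scaling is unchanged.
print(np.allclose(A.T, Vt.T @ np.diag(s) @ U.T))                   # True
```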
Hey, math tweeps! I am looking for a convergence proof for projected SGD for a minimization problem of the form
minᵤ 𝔼ₓ[F(u,x)] s.t. u∈ C
with convex C. Does such a proof exist already?
(Algo is: sample x, do gradient step u − τ∇ᵤF(u,x) and project result onto C.)
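For concreteness, a minimal sketch of that algorithm (my own toy instance: F(u,x) = ½‖u−x‖² and C a box, so the constrained minimizer is the projection of 𝔼[x] onto C):

```python
import numpy as np

rng = np.random.default_rng(0)

def project_box(u, lo=-1.0, hi=1.0):
    """Euclidean projection onto C = [lo, hi]^3, a stand-in for general convex C."""
    return np.clip(u, lo, hi)

# F(u, x) = 0.5*||u - x||^2, so E_x[F(u, x)] = 0.5*||u - E[x]||^2 + const and
# the minimizer over the box is the projection of E[x].
mu = np.array([2.0, 0.5, -3.0])
u = np.zeros(3)
for k in range(20_000):
    x = mu + rng.standard_normal(3)       # sample x
    tau = 1.0 / (k + 10)                  # classical O(1/k) step size
    u = project_box(u - tau * (u - x))    # gradient step, then project onto C
print(u)  # ~ [1.0, 0.5, -1.0], the projection of E[x] onto the box
```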
We've got
@paperswithcode
and
@arxiv
(even partnered up) which is great, but why don't we have "papers with videos" yet? There are tons of recordings of talks, but they're hard to find once you've found the paper...
#AcademicTwitter
#mathtwitter
Shepard interpolation is a surprisingly simple multi-dimensional interpolation method. It uses singular radial basis functions inversely proportional to the distances to the samples. Nearest neighbor interpolation is the limit case.
It has surprising properties when used for regression with noisy samples: It seems to converge to the true function (in some Sobolev norm) as the number of samples goes to infinity (but with constant noise level). Does anyone know a proof?
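A minimal sketch of a basic form of the method (my own; weights wᵢ(x) = ‖x − xᵢ‖^(−p), interpolant Σᵢ wᵢ f(xᵢ) / Σᵢ wᵢ, and p → ∞ recovers nearest neighbor):

```python
import numpy as np

def shepard(x_query, x_samples, f_samples, p=2.0, eps=1e-12):
    """Inverse-distance-weighted (Shepard) interpolation at a single point.
    Weights ~ 1/dist^p; as p grows this tends to nearest-neighbor interpolation."""
    d = np.linalg.norm(x_samples - x_query, axis=1)
    if d.min() < eps:                 # query coincides with a sample point
        return f_samples[d.argmin()]
    w = d ** (-p)
    return w @ f_samples / w.sum()

xs = np.random.default_rng(0).uniform(0, 1, size=(50, 2))
fs = np.sin(2 * np.pi * xs[:, 0]) * xs[:, 1]
print(shepard(np.array([0.3, 0.7]), xs, fs))
```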
We are hiring an assistant prof in statistical learning and information theory! Please retweet and forward to potential candidates.
Feel free to send any questions directly to me!
Did you ever use unrolling to train a variational model? Wonder why it worked (or didn't work as expected)? Come to MS79 at
#SIAMMDS22
in Palm 3 tomorrow at 9:30a (6:30p CEST) organized by
@moskitos_bite
and myself!
(Part 2 will be on Thursday, same time, same place.)
MFO in Oberwolfach has this amazing library and these bookshelves where they display some new books. Somehow they always manage to present quite many books I am eager to read. Here are some in a 🧵
Oldies but goldies: P. L. Lions and B. Mercier, Splitting Algorithms for the Sum of Two Nonlinear Operators, 1979. Douglas-Rachford algorithm (dual of ADMM) minimizes the sum of two convex functions. Typical convergence pattern is spiralling.
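For reference, a minimal sketch of the Douglas-Rachford iteration (my own toy instance with f = ‖·‖₁ and g = ½‖· − a‖², both of which have closed-form proxes):

```python
import numpy as np

# Douglas-Rachford for min_x f(x) + g(x) with f = |.|_1, g = 0.5*||x - a||^2.
a, t = np.array([3.0, -0.2, 1.5]), 1.0

prox_f = lambda v: np.sign(v) * np.maximum(np.abs(v) - t, 0)  # soft-threshold
prox_g = lambda v: (v + t * a) / (1 + t)

z = np.zeros_like(a)
for _ in range(200):
    x = prox_f(z)
    y = prox_g(2 * x - z)
    z = z + y - x   # the governing (firmly nonexpansive) fixed-point iteration
print(x)  # ~ [2.0, 0.0, 0.5], i.e. soft-threshold(a, 1), the true minimizer
```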
@docmilanfar
Would make much more sense to call lse(z) the softmax, and σ(z) the softargmax (because the former approximates the max and the latter the argmax…), but that battle is lost…
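A tiny check of the naming point (my own): lse(z) = log Σᵢ exp(zᵢ) approximates max(z), while σ(z) = exp(z)/Σᵢ exp(zᵢ) approximates the one-hot argmax.

```python
import numpy as np

z = np.array([1.0, 3.0, 2.0])

lse = np.log(np.sum(np.exp(z)))        # "softmax": smooth stand-in for max(z)
sigma = np.exp(z) / np.sum(np.exp(z))  # "softargmax": smooth one-hot argmax

print(lse, z.max())    # 3.407... vs 3.0 -- max(z) <= lse(z) <= max(z) + log(n)
print(sigma.round(3))  # [0.09  0.665 0.245] -- mass concentrates at the argmax
```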
Spoiler for the inverse problems lecture next Monday: Movie time!
Two years ago, Katie Bouman talked about how she planned to reconstruct the picture of a black hole, and guess what: it's inverse problems and computational imaging! via
@TEDTalks
If you attend the
@TheSIAMNews
Imaging conference next week and are interested in monotone inclusions, TV reg and such, don't miss our minisymposium "Challenges in Nonsmooth Convex Optimization and Monotone Inclusions Problems"!
(btw, timezone is UTC-4)
I can't remember a time at which I found such an illustration in a math book helpful.
(It's from Luenberger's "Optimization by vector space methods", which is awesome. But this illustration… 🙄)
Takeaway from this rundown of software for scientific computing: If
@matlab
is the BMW sedan of scientific computing, then
@ThePSF
(Python) is the Ford pickup, and
@JuliaLanguage
is the Tesla.
Feels about right.
Optimal transport is the new math of deep learning, deep learning is practically all of artificial intelligence these days, and artificial intelligence is the new electricity according to
@AndrewYNg
I conclude that optimal transport is an electrifying field!
A few days ago I uploaded the paper "Chambolle-Pock's Primal-Dual Method with Mismatched Adjoint" together with Felix Schneppe. We wanted to know what happens to CP when the computation of the adjoint is not exact.
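For context, a minimal sketch of the Chambolle-Pock iteration on a toy problem of my own (min_x ½‖Kx − b‖² + ½‖x‖²); the paper's question concerns what happens when K.T below is replaced by an inexact Vᵀ ≈ Kᵀ:

```python
import numpy as np

rng = np.random.default_rng(0)
K = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

# min_x F(Kx) + G(x) with F = 0.5*||. - b||^2 and G = 0.5*||.||^2.
L = np.linalg.norm(K, 2)
sigma = tau = 0.9 / L                                  # needs sigma*tau*||K||^2 < 1

prox_Fstar = lambda v: (v - sigma * b) / (1 + sigma)   # prox of sigma*F^*
prox_G = lambda v: v / (1 + tau)                       # prox of tau*G

x = np.zeros(10); xbar = x.copy(); y = np.zeros(20)
for _ in range(2000):
    y = prox_Fstar(y + sigma * (K @ xbar))
    x_new = prox_G(x - tau * (K.T @ y))   # K.T is where a mismatched adjoint enters
    xbar = 2 * x_new - x
    x = x_new

# Compare against the closed-form solution (K^T K + I)^{-1} K^T b.
print(np.linalg.norm(x - np.linalg.solve(K.T @ K + np.eye(10), K.T @ b)))  # tiny
```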
I was reading the paper "Proximal methods for point source location" and found myself looking for the "like"-button 🤦
On a second thought: arXiv should really implement something like that…
I don't care the slightest bit about tau, but wanted to add that ]a,b[ is the only way to write open intervals. (a,b) is either the ordered pair or (if you are evil) the inner product.
A data science master's costs between 30,000 and 100,000!? Not at
@tuBraunschweig
! Our international Master of Data Science starts this fall and you just pay the standard fees (< €400 per semester). Apply by July 15!
Our master's program in Data Science starts in a few weeks! It's the first program at
@tuBraunschweig
that's fully taught in English - looking forward to welcoming the first students!
Wow - I got mail from sixth graders from Munich. They liked our
#geometry
#origami
project and built their own torus!
They also have this cool dragon made of modular origami and a lot of other stuff. Impressive!