Beyond excited to be starting this company with Ilya and DG! I can't imagine working on anything else at this point in human history. If you feel the same and want to work in a small, cracked, high-trust team that will produce miracles, please reach out.
Superintelligence is within reach.
Building safe superintelligence (SSI) is the most important technical problem of our time.
We've started the world’s first straight-shot SSI lab, with one goal and one product: a safe superintelligence.
It’s called Safe Superintelligence
Our paper (with Matt Hoffman and @jaschasd) on learning MCMC kernels parameterized by neural networks was accepted to ICLR. Up to 106x ESS and improved posterior sampling for deep generative models.
Paper:
Code:
cc: @GoogleBrain
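For intuition, here is a tiny runnable sketch (my own simplification, not the paper's actual update; all names are mine): a Metropolis-Hastings chain whose proposal scale comes from a small parameterized "network". The Hastings correction keeps the chain exact for *any* parameters, which is what makes it safe to train the proposal for higher ESS.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Standard normal target density (up to an additive constant).
    return -0.5 * x ** 2

def sigma_theta(x, theta):
    # Toy "network" with one hidden unit; theta are its learnable weights.
    w1, b1, w2, b2 = theta
    return np.exp(w2 * np.tanh(w1 * x + b1) + b2)  # positive step size

def mh_step(x, theta):
    s_fwd = sigma_theta(x, theta)
    x_new = x + s_fwd * rng.standard_normal()
    s_bwd = sigma_theta(x_new, theta)
    # State-dependent proposals are asymmetric, so the Hastings
    # correction log q(x | x_new) - log q(x_new | x) is required.
    log_q = (-0.5 * ((x - x_new) / s_bwd) ** 2 - np.log(s_bwd)) \
          - (-0.5 * ((x_new - x) / s_fwd) ** 2 - np.log(s_fwd))
    log_alpha = log_target(x_new) - log_target(x) + log_q
    return x_new if np.log(rng.uniform()) < log_alpha else x

theta = (1.0, 0.0, 0.5, 0.0)  # arbitrary (untrained) parameters
x, samples = 0.0, []
for _ in range(5000):
    x = mh_step(x, theta)
    samples.append(x)
print(float(np.mean(samples)), float(np.std(samples)))  # close to 0 and 1
```

Training would then tune theta to maximize ESS; the point of the correction is that exactness holds throughout training.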
First-order methods are great, but we never know which one to use. In our paper (w/ John Duchi), we tell you which one to use and when, depending on the quadratic convexity of the constraints. Also, adaptivity matters. To appear as an oral at #NeurIPS2019.
Two papers on optimization with differential privacy accepted to #NeurIPS2021!
In the 1st, with @SZiteng and collaborators at Google, we provide optimal algorithms to learn with *user-level* DP, a more stringent notion that protects a user's entire contribution (of many samples).
DRO: no one knows what it does, and it doesn't scale anyway... Or does it? In our #NeurIPS2020 paper, we propose optimization algorithms with running time independent of dimension and dataset size (think SGD for ERM) for CVaR and chi-square objectives. 1/4
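To give the flavor, here is a toy NumPy sketch (mine, not the paper's exact estimator): a CVaR-style SGD that, at each step, keeps only the worst alpha-fraction of a fresh minibatch, so the per-iteration cost never touches the dataset size n.

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear regression data with a 10% minority of high-noise "hard" examples.
n, d = 10_000, 5
X = rng.standard_normal((n, d))
w_true = np.ones(d)
noise_scale = np.where(rng.uniform(size=n) < 0.1, 3.0, 0.3)
y = X @ w_true + noise_scale * rng.standard_normal(n)

def cvar_sgd(alpha=0.2, batch=64, lr0=0.1, steps=3000):
    w = np.zeros(d)
    w_avg = np.zeros(d)
    k = max(1, int(alpha * batch))  # size of the worst alpha-fraction
    for t in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        resid = X[idx] @ w - y[idx]
        worst = np.argsort(0.5 * resid ** 2)[-k:]   # worst alpha-fraction
        grad = X[idx][worst].T @ resid[worst] / k   # subgradient of CVaR
        w -= lr0 / np.sqrt(t + 1) * grad
        if t >= steps // 2:                          # tail averaging
            w_avg += w
    return w_avg / (steps - steps // 2)

w = cvar_sgd()
print(np.round(w, 1))  # close to w_true = (1, ..., 1)
```

Each step costs O(batch * d), independent of n, which is the "think SGD for ERM" property the tweet refers to.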
On my way to #ICLR2018! Will be presenting our work (with Matt Hoffman and @jaschasd) on generalizing HMC with neural networks. Come to the poster on Thursday!
Code:
Paper:
cc: @GoogleBrain
Choosing the optimal gradient algorithm for the geometry at hand is important. Interestingly, how to do it appears in seminal results on the Gaussian sequence model. Come to our oral at 4:50pm in West Exhibition Hall A! #NeurIPS2019
New paper out on learning under *user*-level differential privacy constraints!
In the standard DP setting, we implicitly assume that each user contributes a single sample, but it turns out we often contribute many, many samples (like all of our texts). 1/
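To see why the unit of privacy matters, here is a toy illustration (mine, not the paper's algorithm) of user-level DP mean estimation: collapse each user's whole batch into one clipped summary, so swapping *all* of a user's samples moves the output by at most 2*clip / n_users.

```python
import numpy as np

rng = np.random.default_rng(2)

def user_level_dp_mean(user_batches, eps, clip=1.0):
    # One clipped summary per user: replacing a user's entire batch
    # changes the average of summaries by at most 2*clip / n_users,
    # so Laplace noise at that scale gives eps user-level DP.
    summaries = np.clip([np.mean(b) for b in user_batches], -clip, clip)
    sensitivity = 2 * clip / len(summaries)
    noise = rng.laplace(scale=sensitivity / eps)
    return float(np.mean(summaries) + noise)

# 500 users, each contributing 50 samples from N(0.3, 1).
batches = [0.3 + rng.standard_normal(50) for _ in range(500)]
est = user_level_dp_mean(batches, eps=1.0)
print(round(est, 2))  # close to the true mean 0.3
```

Naively applying item-level DP here would instead require noise scaled to one *sample*, while the privacy guarantee would only cover one sample per user, which is exactly the gap the user-level notion closes.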
The poster everyone has been waiting for will happen tomorrow (Wednesday) between 9 and 11am PST.
Come say hi and learn the secret to robustness at scale!
Paper:
Gathertown:
Just arrived at #AAAI2018! Will be presenting our work (with @ermonste) on sample-efficient policy optimization for discrete action spaces. Spotlight tomorrow at 2:30.
Paper:
Great collaboration with @violet_zct et al. on DRO for multilingual translation! Two key ideas:
- surprisingly, the robust objective improves performance on *every* language pair (vs ERM)
- tailoring the optimization algorithm to the architecture (here transformers) matters a lot
I am excited to introduce our EMNLP paper "Distributionally Robust Multilingual Machine Translation". To encourage uniform performance across languages, we propose a new learning objective for multilingual training based on the concept of distributionally robust optimization.
I am hiring PhD students in ML this cycle! Topics include unsupervised ML, generative models, seq. decision making, and climate change.
For those interested, please apply to UCLA CS. Deadline: Dec 15. Please share widely!
More info:
#AI #ML #AcademicTwitter
@hardmaru There is a proof of this result for an arbitrary decoder and a linear encoder in the very cool paper of @adityagrover_ and @ermonste (in the case beta=0)!
Can you detect COVID-19 using Machine Learning? 🤔
You have an X-ray or CT scan and the task is to detect if the patient has COVID-19 or not. Sounds doable, right?
None of the 415 ML papers published on the subject in 2020 was usable. Not a single one!
Let's see why 👇
In the 2nd, with @HilalAsi2 and John Duchi, we show that we can *beat* the "unavoidable" dimension factor when the function has growth around the optimum.
Furthermore, we don't need to know the growth constant a priori: we adapt to achieve the best rate over these families.
Some open questions remain:
- There is still a gap between the (known) lower and upper bounds for the chi-square uncertainty sets.
- What's the effect of DRO-type training for big deep learning models? This couldn't really be evaluated before!
3/4
Things that didn’t exist on Thanksgiving 10 years ago:
Snapchat
Facebook Messenger
Apple Maps
Uber Eats
TikTok
Twitch
Zoom
Lyft
Slack
Peloton
Ethereum
AirPods
Apple Watch
Google Classroom
Microsoft Teams
Alexa
Tinder
Telegram
Oculus
Tesla Model S
Apple Music
Chromebook
Minecraft
@thegautamkamath This was a billion-dollar startup idea I've had for a while. The price of posters in the US is insane (~$100 per poster), and you pay it for every half-day of conference at least. Also, you can re-use TVs across conferences! Genius.
We propose two types of algorithms: one based on minimizing a surrogate objective, the other utilizing a more sophisticated (multilevel) gradient estimator. We show that the latter is optimal in two out of three settings, and all are implemented in a few lines of PyTorch. 2/4
Generalizing Hamiltonian Monte Carlo with Neural Networks. Cool work from my colleagues at @GoogleBrain where they train MCMC kernels parameterized by neural nets. Should improve protein folding and physics simulations.
paper:
code:
@Mridgyy I’ve been wanting to start using Roam, but the lack of LaTeX macros is a bit of a roadblock for me (to write proofs and whatnot). Has it been an issue for you?
@Maitre_Eolas Can Maître Gildas Le Gonidec de Kerhalic also be sanctioned at the disciplinary level after such a ruling by the Cour de cassation?
@admercs @dustinvtran I am not currently affiliated with Google, so I cannot know for sure, but I would be pretty excited by this addition to the amazing TFP!
We provide algorithms and lower bounds for these tasks (and sometimes they even match!) based on a novel generic mean estimator whose error scales with the concentration radius (think sub-Gaussian parameter) rather than the whole range of the random variable. 4/
It’s a bit annoying when people attack @NateSilver538’s “90% chance for Biden to win” on the basis of it being a close election. If you win a close game 100% of the time, 100/0 was the right forecast, no matter how close the game was.
@shatterfront I don't get this argument; is it that clear that if Hillary / anyone else were president, there would be like 0 COVID deaths? The death rates in Western Europe, with socialized medicine and reasonable leaders, are essentially the same as in the US.
Gradient descent is inefficient at finding the saddle points (Nash equilibria) of min-max games because of its spiralling behaviour. Beware when training your GANs…
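The spiral is easy to see numerically on the standard toy bilinear game min_x max_y x*y, whose unique Nash equilibrium is (0, 0) (my own illustration): simultaneous gradient updates rotate outward and diverge, while an extragradient step converges.

```python
import numpy as np

def sim_gda(lr=0.2, steps=500):
    # Simultaneous gradient descent (on x) / ascent (on y) for f(x, y) = x*y.
    # Each update multiplies the distance to (0, 0) by sqrt(1 + lr^2) > 1.
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x
    return float(np.hypot(x, y))

def extragradient(lr=0.2, steps=500):
    # Extragradient: evaluate the gradient at a lookahead point,
    # which damps the rotation and converges on bilinear games.
    x, y = 1.0, 1.0
    for _ in range(steps):
        xh, yh = x - lr * y, y + lr * x   # lookahead half-step
        x, y = x - lr * yh, y + lr * xh   # update with lookahead gradients
    return float(np.hypot(x, y))

print(sim_gda())        # distance to equilibrium blows up: the spiral diverges
print(extragradient())  # distance to equilibrium shrinks to ~0
```

The same qualitative behaviour is what makes naive simultaneous updates unstable when training GANs.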
@thejonullman @HilalAsi2 I was being a bit facetious, apologies! I meant that it significantly improves the exponent of the d factor, e.g. if we have 5/4-growth, we go from d/nε -> (d/nε)^5! We also hypothesize that sharp growth could give rates logarithmic in dimension, but we haven't proved it yet 🙃