A truth-seeking AI needs to know what it doesn't know.
This requires an *epistemic* neural network.
...
@ibab_ml knows this; he covets "the epinet" for Grok.
... could this be why @elonmusk is suing OpenAI?
... is this what @roon saw?
Listen to @TalkRLPodcast to find out:
Looking back over the year, the one paper that gave me the best "aha" moment was...
Reconciling Modern Machine Learning and the Bias-Variance Tradeoff:
The "bias-variance" you knew was just the first piece of the story!
This feels like a real breakthrough:
Take the same basic algorithm as AlphaZero, but now *learning* its own simulator.
Beautiful, elegant approach to model-based RL.
... AND ALSO STATE OF THE ART RESULTS!
Well done to the team at @DeepMindAI
#MuZero
Are you interested in #ThompsonSampling and #exploration, but looking for a good reference?
A Tutorial on Thompson Sampling
This tutorial covers the algorithm and its applications, illustrating the concepts through a range of examples... check it out!
Have you heard of "RL as Inference"?
... you might be surprised that this framing completely ignores the role of uncertainty!
(confusing, since it talks a lot about "posteriors")
Our #ICLR spotlight tries to make sense of this:
Here's my top tip for research:
Start with an example that is SIMPLE and EXTREME.
- SIMPLE: clean and clear example
- EXTREME: pushes the key issues to the limit
If you can stress test your ideas in these edge cases, it is much easier to port the key insights to complex tasks.
Really excited to release #bsuite to the public!
- Clear, scalable experiments that test core #RL capabilities.
- Works with OpenAI gym, Dopamine.
- Detailed colab analysis
- Automated LaTeX appendix
Example report:
We are excited to release Behaviour Suite for Reinforcement Learning, or ‘bsuite’ – a collection of carefully-designed experiments that investigate core capabilities of RL agents
GitHub:
Paper:
Another great paper for understanding generalization properties in the overparameterized regime:
Spectrally-normalized margin bounds for neural networks
Bartlett et al.
It does feel like the "dark arts" of neural nets are waning...
Really excited about this research...
The culmination of a lot of peoples' hard work!
- Cool insights on marginal/joint predictions
- Opensource code for a new testbed in the field
... I definitely learnt a lot working on this, you might too!
Does Bayesian deep learning work? The Neural Testbed provides tools to evaluate uncertainty estimates. These tools assess both the quality of marginal prediction per input & joint predictions given many inputs.
Github:
Paper: 1/
A lot of the value from ChatGPT comes in mundane/mindless drudgery, not deep thinking.
People like @GaryMarcus overlook the genuine value here:
- installing nvidia drivers
- sorting out messed up ruby version
- setting up custom domain name
Many such cases.
Einstein's paper on Brownian motion:
~4 pages A4, easy to follow, Nobel Prize
Self-Normalizing Neural Networks:
>100 pages, reams of numerical equations, SELU=slightly bent RELU
... @zacharylipton I don't know what you mean 🤷‍♂️
Reading about the Manhattan project, clear parallels to AI today—celebrated scientists churning out research under tremendous pressure. Only they shaped the future of energy, warfare, and the international world order... and we produced the Squish activation fn.
Excited to share some of our recent work!
Fine-Tuning Language Models via Epistemic Neural Networks
TL;DR: prioritise getting labels for your most *uncertain* inputs, match performance with 2x less data & better final performance
Discussion (1/n)
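The prioritisation in the TL;DR can be approximated with any model that exposes predictive uncertainty. A minimal sketch (my own toy stand-in using a small ensemble, NOT the paper's epinet): score each unlabeled input by how much the ensemble members disagree, and label the top-k first.

```python
import numpy as np

def select_most_uncertain(ensemble_probs, k):
    """Pick the k pool indices where ensemble members disagree most.

    ensemble_probs: array [n_models, n_inputs, n_classes] of predicted
    class probabilities. Disagreement = variance of the predictions
    across ensemble members, summed over classes.
    """
    disagreement = ensemble_probs.var(axis=0).sum(axis=-1)  # [n_inputs]
    return np.argsort(-disagreement)[:k]

# Toy pool: 3 ensemble members, 4 inputs, 2 classes.
probs = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.2, 0.8]],
    [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1], [0.3, 0.7]],
    [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5], [0.2, 0.8]],
])
# Input 2 is the one the members genuinely disagree on (opposite
# classes), so it gets labeled first. Input 1 is "uncertain" per
# member but the members agree, so it scores zero here.
print(select_most_uncertain(probs, k=1))  # -> [2]
```

Note the design choice: disagreement *between* members, not per-member entropy, is what distinguishes "I don't know" from "the answer is genuinely 50/50".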
Excited to (finally) present our work on Epistemic Neural Networks as a spotlight for #NeurIPS23
"Get better uncertainty than an ensemble size=100 at cost less than 2x base models"
Poster 1924
We’ve acquired the MuJoCo physics simulator () and are making it free for all, to support research everywhere. MuJoCo is a fast, powerful, easy-to-use, and soon to be open-source simulation tool, designed for robotics research:
Big thanks to @pbloemesquire for a great tutorial:
Transformers from scratch
If (like me) you're excited about #GPT3 but found yourself waving your hands through various NN diagrams on self-attention... this is the cure! 🙌
I often hear that "deep learning was all invented in the 90s"...
But seems like many things didn't actually work before:
- ReLU instead of sigmoid
- ADAM instead of SGD
- Favourable weight initialization
I wonder if there are similar "tricks" holding back current research?
Better late than never...
"Deep Exploration via Randomized Value Functions" published in JMLR:
This paper presents RVF as a scalable approach to deep exploration with generalization in RL.
Proud of this work with Ben, Dan and Zheng!
"One weird trick" for DQN in large (continuous) action spaces:
- Initialize uniform action-sampling distribution.
- Choose sampled action with highest Q.
- Train sampling to produce "best action" + also some entropy.
- ... Works surprisingly well!
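The steps above fit in a few lines. A hedged numpy sketch (my own names and shapes, not the paper's code): sample candidate actions from a proposal distribution, score each with Q, and act with the best one. Training the proposal towards high-Q actions (+ entropy) happens elsewhere.

```python
import numpy as np

rng = np.random.default_rng(0)

def act(q_fn, proposal_sample, n_proposals=64):
    """Approximate argmax_a Q(s, a) in a large/continuous action space:
    sample candidates from a (learned) proposal, evaluate Q on each,
    take the best."""
    candidates = proposal_sample(n_proposals)           # [n, action_dim]
    q_values = np.array([q_fn(a) for a in candidates])  # [n]
    return candidates[np.argmax(q_values)]

# Toy example: Q peaks at action = 0.7 on a 1-D action space, with a
# uniform proposal standing in for the learned sampler.
q_fn = lambda a: -(a[0] - 0.7) ** 2
proposal = lambda n: rng.uniform(0.0, 1.0, size=(n, 1))
best = act(q_fn, proposal, n_proposals=256)
assert abs(best[0] - 0.7) < 0.1  # sampled argmax lands near the true peak
```

With enough proposals the sampled argmax is close to the true argmax; the point of *learning* the proposal is to need far fewer samples than uniform.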
Great stuff @dwf, @VladMnih!
Q-learning is difficult to apply when the number of available actions is large. We show that a simple extension based on amortized stochastic search allows Q-learning to scale to high-dimensional discrete, continuous or hybrid action spaces:
Totally agree:
The part that is hard for humans (symbolically solving the cube) is pretty easy for computers...
The part that is totally trivial for humans (twisting a cube with two hands) is still essentially impossible for RL robotics!
I find it funny folks are focusing on the symbolic challenge. The big challenge is attaching that hand to a moving controllable robot arm, and preferably having two coordinated hands learning diverse behaviours by RL, from sensors, with low sample complexity and in a safe manner.
Thought-provoking book, thanks @demishassabis:
The Order of Time
TL;DR:
Time as we know it (fundamentally ordered from past to future) does not exist.
Our perception of time is a side-effect of us residing in a low-entropy region of space + 2nd law.
We just updated our @NipsConference spotlight paper
"Randomized Prior Functions for Deep Reinforcement Learning"
If you're too lazy to read the paper... then just head to our accompanying website - we have #CODE + demos you can run in the browser!
Amazing work from everyone on the team... incredible what a great team working together can accomplish.
... did we mention that this is also available FOR FREE 🫡
GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing.
This paper is not long, and very easy to read... so I definitely recommend it.
The combination of:
1) Simple and targeted experiments
2) Sane and sensible writing
3) Excellent figures
Helps to provide a lot of insight into #DeepLearning - more please!
It says a lot that I had to honestly check if this was a troll account...
As expected:
- Homology detection is not the same as protein prediction.
- People used neural nets for this before 2007.
- AlphaFold is not using an LSTM.
... @SchmidhuberAI it's not a good look for you!
Kunihiko Fukushima was awarded the 2021 Bower Award for his enormous contributions to deep learning, particularly his highly influential convolutional neural network architecture. My laudation of Kunihiko at the 2021 award ceremony is on YouTube:
Fantastic talk from @SebastienBubeck on the "Physics of AI":
- Intelligence has emerged: why? how?
- Let's study this with *controlled experiments* and *toy models*
- Clean and clear insights that peer slightly behind the magic curtain
As part of the #bsuite release, we also include bsuite/baselines:
These are simple, clear, and correct agent implementations in #TF1, #TF2 and #JAX ... many in under 100 lines of code!
We built bsuite to do two things:
1. Offer clear, informative, and scalable experiments that capture key issues in RL
2. Study agent behaviour through performance on shared benchmarks
You can get started with bsuite in this colab:
And once you've been through @pbloemesquire's tutorial, you have to check out @karpathy's tutorial code:
Focus on the key points, #simple, #sane, and such a valuable resource in teaching... this stuff is really great!
Big thanks to @pbloemesquire for a great tutorial:
Transformers from scratch
If (like me) you're excited about #GPT3 but found yourself waving your hands through various NN diagrams on self-attention... this is the cure! 🙌
@_aidan_clark_ this is a classic case of conflating *a bad RL algorithm* (policy gradient?) vs *the RL problem*...
You're highlighting efficient exploration as one of the outstanding problems to solve - I agree.
... and that's something that's only really studied in RL!
I got disillusioned with RL when I realized that it was always:
step 1: act randomly for ~years worth of data before stumbling upon a reward
step 2: figure out how to repeat that action in a generalizable way
.... and no one had good ideas for improving step 1
Paper summary:
- Tabular Q-learning converges to optimal with infinite data.
- You might hope Q-learning + function approx converges similarly to the best policy in that class.
- But actually that's not true... Basically because MDP with function approx ~= POMDP
#NeurIPS2018
Congratulations to Google researchers @tylerlu, @CraigBoutilier and Dale Schuurmans, whose paper “Non-delusional Q-learning and Value Iteration” has received a #NeurIPS2018 Best Paper Award! Check it out at .
Great talk from @jacobmbuckman on STEVE - stochastic ensemble value expansion.
"If you want to roll forward a model, it's important to incorporate uncertainty estimates - and bootstrap ensemble works well for this"
Nice work, and very clear+engaging talk!
#NeurIPS2018
Today we're sharing structure predictions for six proteins associated with the virus that causes COVID-19, generated by the most up-to-date version of our AlphaFold system. We hope this contributes to the research community’s understanding of the virus:
If you are submitting an RL paper to AAAI, you should include a #bsuite evaluation (+ automated LaTeX appendix).
- Paper:
- Github:
- Report:
If you're interested, but having trouble then get in touch!
It's nice that @SchmidhuberAI is using his fame/expertise/brainpower to tackle the important issues:
❌ COVID-19 Pandemic
❌ Black Lives Matter
❌ Global Warming
❌ Existential risks of AI
❌ Any research post "annus mirabilis"
✔️ The 2018 Turing award
... really?
ACM lauds the awardees for work that did not cite the origins of the used methods. I correct ACM's distortions of deep learning history and mention 8 of our direct priority disputes with Bengio & Hinton.
#selfcorrectingscience
The GOAT of tennis @DjokerNole said: “35 is the new 25.” I say: “60 is the new 35.” AI research has kept me strong and healthy. AI could work wonders for you, too!
According to @ylecun #neurips2018
RL gets one scalar = weak signal
"self supervised" = strong signal
But to succeed in RL you have to understand state, transitions, and how the world works!
Rewards help shape what you care about, but it's so very far from the "only" signal in RL
Great talk from Ben Van Roy at the #NeurIPS2019 workshop on optimization for RL.
Is it time for the field to move beyond "MDP"?
Thinking about "agent state" might be a better perspective for learning in complex worlds... the real world "state" is just too complex!
Our most recent work is out in Nature! We're reporting on (reinforcement) learning to navigate Loon stratospheric balloons and minimizing the sim2real gap. Results from a 39-day Pacific Ocean experiment show RL keeps its strong lead in real conditions.
#MachineLearning conference review burdens are getting out of control... too many low quality submissions + reviews!
Here's a controversial solution:
- $100 fee to submit a paper for review
- Waived for papers that pass some "quality bar"
- Use proceeds to fund D&I initiatives
@__nmca__ @JAslanides @geoffreyirving
If you're interested in:
- Uncertainty
- Alignment
- RL from human feedback
- Language models
Recent papers:
Consider applying for internships/positions in the "Efficient Agent Team" working in MTV (with @ibab_ml @goodfellow_ian nearby) ;D
(5/5)
100% another great paper in this area from @mrtz @OriolVinyalsML and more:
Understanding Deep Learning Requires Rethinking Generalization
I love something that gets the conversation (or controversy) going! 😜
Large model != Poor generalization
Excited to kick off the Deep Reinforcement Learning theory workshop at the Simons Institute today, co-organized with @LihongLi20. Today's topic is Offline reinforcement learning 🔥 Schedule is here:
I actually don't think this is controversial... And I'm definitely "team Bayes"
Yes, an independent Gaussian prior over NN weights is nonsense... We know the *interaction* is the most important part!
But there's still huge potential for effective Bayesian deep learning!
Welcome @SchmidhuberAI to Twitter!
Approaching 10k followers... But yet to follow a single account... Who will be first?
Ivakhnenko and Fukushima seem more likely than @ylecun and @geoffreyhinton.
Thanks a lot to @robinc for hosting me + doing such a great job driving the discussion!
As mentioned in the podcast, I am especially interested to hear from people who are NOT on the same page as me...
Episode 49: Ian Osband @IanOsband, research scientist at OpenAI (ex @GoogleDeepMind, @Stanford) on decision making under uncertainty, information theory in RL, uncertainty, joint predictions, epistemic neural networks and more!
A surprising deep learning mystery:
Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time.
I missed this paper when it came out!
Really glad that @vincefort brought it to my attention...
Even if ensemble + prior function is not the precise posterior at least it's not overconfident... and it will eventually concentrate with data. 🥳
You could still drive efficient exploration in an Actor-Critic algorithm though... and use policy gradient as a sub-procedure.
For example, you could keep a distribution (or ensemble) of plausible value functions, and optimize a policy for each of these.
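That ensemble-of-critics idea can be sketched in a few lines. A toy, hedged version (my own naming, not from any particular paper): keep K (actor, critic) pairs and, at the start of each episode, sample one pair to act with — Thompson sampling over plausible value functions.

```python
import random

class EnsembleActorCritic:
    """K independently-initialised (actor, critic) pairs.

    Each episode, sample one pair uniformly and act with it.
    Disagreement between critics drives deep exploration, while each
    actor is still trained by ordinary policy gradient against its own
    critic (training loop omitted here).
    """
    def __init__(self, make_actor, make_critic, k=10, seed=0):
        self.members = [(make_actor(), make_critic()) for _ in range(k)]
        self.rng = random.Random(seed)
        self.active = None

    def begin_episode(self):
        self.active = self.rng.choice(self.members)

    def act(self, observation):
        actor, _critic = self.active
        return actor(observation)

# Toy usage: "actors" are constant policies returning their own index.
idx = iter(range(10))
agents = EnsembleActorCritic(
    make_actor=lambda: (lambda obs, i=next(idx): i),
    make_critic=lambda: None,
)
agents.begin_episode()
print(agents.act(None))  # some sampled member's action, an int in 0..9
```

The key point from the thread survives even in this toy: policy gradient is only a *sub-procedure* per member; the exploration comes from sampling which member acts.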
@adityamodi94 @nanjiang_cs @HaqueIshfaq
Actually, I don't think this is a coincidence...
If you want to explore efficiently, you first need to be able to reason counterfactually: "what might things be like if I went and did XYZ?".
Basic policy gradient is not going to be able to do this effectively.
We also #opensourced all the #code for the book:
Recently upgraded from Py2 -> Py3 and made sure everything was still running as expected 😆
As an added bonus, you can now run this all in your browser without installing anything!
One problem that hierarchical RL has is that it's not totally clear how it *could* pan out convincingly...
(Separate from standard RL)
If we could distil some simple examples that embody what it means to be "good at hierarchical RL" that would be a great first step!
There's a handful of ML ideas that just *feel right*---perhaps due to evoking some aspect of human learning?---that keep recurring but never seem to have panned out convincingly. Here's two: (1) curriculum learning; (2) hierarchical reinforcement learning. (Dis)agree? Got others?
Missed this one at the time @SebastienBubeck!
The videos from the #ICML2018 workshop on #exploration are all online:
Please get in touch - especially if there are parts you disagree with! ;D
Big thanks to Ben Van Roy, who I think really cultivates this way of thinking analytically...
Ben's way of thinking is even called out in:
... and honoured to say that, believe it or not, bsuite even gets a shout-out in the book! 🤖🧠🥳
We're releasing "Dota 2 with Large Scale Deep Reinforcement Learning", a scientific paper analyzing our findings from our 3-year Dota project:
One highlight — we trained a new agent, Rerun, which has a 98% win rate vs the version that beat @OGEsports.
... of course this "control perspective" completely ignores one of the biggest questions in reinforcement learning: EXPLORATION.
If you're interested in how/why this is such a problem - come to the keynote talk "what is exploration"
Sunday 9am #ICML2018
Very lucky to get a last-minute invite to the RL workshop on predictive intelligence. A week of workshops, discussion and debate on #AI, #RL with a lot of heavy hitters... Oh yeah and it's also in #barbados with snorkel breaks 🐢 #bellairs #fresh
This is really how I think of the bsuite project:
We want to collect the most simple/extreme problems in core reinforcement learning research.
... bonus points if they are *scalable*, so that the level of extreme-ness can be dialed up/down
We're releasing Procgen Benchmark, 16 procedurally-generated environments for measuring how quickly a reinforcement learning agent learns generalizable skills.
This has become the standard research platform used by the OpenAI RL team:
@GaryMarcus Also... think you probably know this... but company valuations are typically dominated by *future* earnings.
Agree that people are betting on big growth in the sector - maybe you should start "shorting" these companies!
You could become quite rich.
Go-Explore attains *by far* the best scores on #MontezumaRevenge - impressive!
However, we should be clear about what the goal of research in #RL (and #exploration in particular) is:
There is plenty of room for all this research in #MachineLearning
Thrilled to announce our first major breakthrough in applying AI to a grand challenge in science. #AlphaFold has been validated as a solution to the ‘protein folding problem’ & we hope it will have a big impact on disease understanding and drug discovery:
Bellmansplaining: take big deep neural networks, train supervised from human data and a huge amount of tinkering +1000x more data/compute than before, declare fundamental breakthroughs due to RL research.
Bayesplaining: take a well established method, express it as a series of crude approximations to a Bayesian approach, throw it back at the community where it was invented.
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
Cool new #RL competition: learn to mine a diamond in Minecraft in 4 days of CPU training.
... but something really triggers me about calling this competition "sample-efficient"... they limit your #COMPUTE *not* your #DATA ... why not limit the number of frames??
What would you do with so much money?
Why not start with something small and demonstrate scaling properties: theorems, experiments.
Then, come back and ask for more money with a clear plan.
Didn't people already give you $$$ for Geometric Intelligence?
Don't you have tenure?
Suppose just for a second that Domingos (and I, and many others) were correct that neurosymbolic AI was one of the most promising research directions, and further suppose that we lived in a world in which people trying to pursue that research direction couldn’t get 1% of the
Pretty disappointed in @yaringal after I tried to work together!
But, if we're doing an #ML #showdown ... let's do points not typos:
- Dropout "posteriors" give bad decisions.
- Doesn't even pass linear sanity checks!
- Alternative?
Get it going @slashML @yaringal
Would have preferred to do this via email, but:
- lambda/d should be lambda/np in (6), thanks!
- this typo in the appendix doesn't affect *any* other statements/proof.
- "concrete" dropout does not address the issues we highlight.
- happy to add this baseline for clarification.
@svlevine
Favourite quote from Emo Todorov at #NeurIPS2018 on hearing that #bostondynamics has started using some reinforcement learning: “Oh good! That will slow them down.”
Excited for the final day of workshops at #ICML2018!
If you're interested in what I have to say:
9am - "What is Exploration" (Exploration)
11.30am - "Deep Exploration via Randomized Value Functions" (PGM)
4.30pm - Panel Discussion (Exploration)
Particularly if you disagree! ;D
It's all well and good pushing for advanced reasoning, and robustness to academic trolling...
But even in their current form, it's a super valuable tool, and I think you'd be mad to exclude it from the path to AGI.
> "you wouldn't ask Terence Tao how to fix your nvidia driver"
Congratulations to Leon Bottou and @obousquet for the #NeurIPS2018 test of time award:
Outlining the benefits of imperfect (but fast) SGD vs batch training.
Particularly good talk from @obousquet ... Would recommend watching the recording!
For some idea of what we've done in our first 6m:
Roadmap for agent:
Network architectures:
Rethinking Bayesian Deep Learning:
We have a fantastic team just getting started.
Hustlers wanted, PhD optional.
Example: I had a website I set up ~8y ago while hunting for a job with @GoogleDeepMind
It was out of date, and I'd completely forgotten the arcane CSS/Jekyll/Ruby I had used to make it... let alone customize the domain.
A few minutes later: 📈
Awesome results from @OpenAI:
"use prediction error on a random network as a bonus for exploration."
You could even call this a follow-up on our @NIPS 2018 spotlight paper:
"Randomized Prior Functions for Deep Reinforcement Learning"
Very impressive!
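The bonus described in that tweet (random network distillation) is only a few lines. A hedged numpy sketch of the idea, not OpenAI's implementation (linear networks stand in for the real ones):

```python
import numpy as np

rng = np.random.default_rng(0)

class RNDBonus:
    """Exploration bonus = prediction error on a fixed random network.

    A frozen random target network maps states to feature vectors; a
    predictor is trained to imitate it on visited states. Rarely-seen
    states have a large error ||pred(s) - target(s)||^2, which is added
    to the reward as a novelty bonus.
    """
    def __init__(self, state_dim, feat_dim=8, lr=0.5):
        self.W_target = rng.normal(size=(state_dim, feat_dim))  # frozen
        self.W_pred = np.zeros((state_dim, feat_dim))           # trained
        self.lr = lr

    def bonus(self, s):
        err = s @ self.W_pred - s @ self.W_target
        return float((err ** 2).sum())

    def update(self, s):
        # One gradient step of the predictor towards the target features.
        err = s @ self.W_pred - s @ self.W_target
        self.W_pred -= self.lr * np.outer(s, err)

rnd = RNDBonus(state_dim=4)
s = np.array([1.0, 0.0, 0.0, 0.0])
before = rnd.bonus(s)
for _ in range(50):
    rnd.update(s)
after = rnd.bonus(s)
assert after < before  # bonus shrinks for frequently-visited states
```

The connection to prior functions: in both cases a *fixed random function* supplies the variation, and training only ever reduces the mismatch on visited data.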
NVIDIA Research developed a #deeplearning model that turns rough doodles into photorealistic masterpieces. Like a smart paintbrush, this GAN-based tool converts segmentation maps into life-like images:
#GTC19
Yoshua Bengio, Geoffrey Hinton and Yann LeCun, the fathers of #DeepLearning, receive the 2018 #ACMTuringAward for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing today.
I mostly agree... But my most recent experience with JMLR took well over a year for the first review!
I'm not sure it's always worth it to wait that long for a high quality gradient update vs many more noisy SGD steps via conference.
Every time I get back reviews from JMLR, I'm just blown away by the quality (as compared to the typical reviews from an ML/AI conference). The questions/comments are often informative to the point they can really be seen as a contribution to the paper itself!
The key technology here is the ability to estimate model uncertainty in a language model.
To do this, we use a new type of network architecture called an *epinet* = a small additional network designed to estimate uncertainty.
(2/n)
ChatGPT (+ other LLMs) take actions grounded in the real world:
interacting with human users to satisfy their requests
Things really are backwards if you think that playing Goat Simulator 3 for thousands of years of simulated gameplay to finally reach 200%-relative simulated
LLMs are amazing but they’re not grounded in external, embodied environments. That’s why I’m excited to finally be able to talk about the project I’ve been working on for over a year: SIMA, an agent that can follow natural language in video games!
What actually constitutes a good representation for reinforcement learning? Lots of sufficient conditions. But what's necessary? New paper: . Surprisingly, good value (or policy) based representations just don't cut it! w/ @SimonShaoleiDu @RuosongW @lyang36
Great to see high-quality software open source from @berkeley_ai! 👏
But why do these #RL frameworks end up with so many complex Agent interfaces:
(OpenAI Baselines + Dopamine are similar)
Why not:
- agent.act(observation)
- agent.observe(transition)
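The two-method interface suggested above can be written down directly. A minimal sketch (everything beyond `act`/`observe` is my own naming):

```python
import random

class Agent:
    """Minimal RL agent interface: act on an observation, observe a
    transition. All learning machinery stays behind these two calls."""
    def act(self, observation):
        raise NotImplementedError

    def observe(self, transition):
        raise NotImplementedError

class RandomAgent(Agent):
    """Uniform-random baseline over a discrete action set."""
    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randrange(self.n_actions)

    def observe(self, transition):
        pass  # a learning agent would update from (s, a, r, s') here

agent = RandomAgent(n_actions=4)
a = agent.act(None)
agent.observe((None, a, 0.0, None))
assert 0 <= a < 4
```

The appeal is that the environment loop never needs to know whether the agent is tabular, deep, distributed, or random: it only ever calls `act` and `observe`.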
New reinforcement learning library rlpyt in pytorch thanks to Adam Stooke from @berkeley_ai (and previously an intern with me at @DeepMindAI). There is a whole suite of RL algorithms implemented and a framework for small and medium scale distributed training.
We live in such strange times. Apple, a company famous for its secrecy, published a paper with staggering amount of details on their multimodal foundation model. Those who are supposed to be open are now wayyy less than Apple.
MM1 is a treasure trove of analysis. They discuss