Imitation is the foundation of
#LLM
training.
And it is a
#ReinforcementLearning
problem!
Compared to supervised learning, RL (here inverse RL) better exploits sequential structure and online data, and additionally extracts rewards.
Beyond thrilled for our
@GoogleDeepMind
paper!
After long nights, sacrificed weekends, and many chats with colleagues and friends, I have finally completed this (much too long) post on
#MachineLearning
and Structure for Mobile
#Robots
Trying to get a better grasp of transfer in reinforcement learning? Look no further!
Over the last couple of years we have created a survey and taxonomy with colleagues
@GoogleDeepMind
1/N 🧵 👇
Thanks everyone for the congratulations!
Now seems a good time to announce that after continuing my postdoc in Oxford until August, I will be joining
@DeepMindAI
as research scientist. Looking forward to new challenges and working together towards more capable autonomous systems!
Let's see how far we get this time...
Bought during my PhD (i.e. many years ago) and stopped reading at least 3 times. Now, after the ICLR deadline, it's time again.
Any opinions?
(The book, not my inability to complete it)
It's back! We're accepting internship applications again!
@GoogleDeepMind
Looking forward to working again with many incredible junior researchers (& engineers)!
Please reach out with any questions!
Fantastic new machine learning book by John Winn, Christopher Bishop, Thomas Diethe! (this is already largely accessible online for free)
Great interface, engagingly written, very intuitive & built on examples
Deep RL opening another door! 🤖⚽
It's amazing what dynamic, interactive behaviours emerge from training for quite simple objectives in a complex world.
Thrilled that the paper is public and incredibly proud of our team!
Additional coverage on
@60Minutes
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning
investigated the application of Deep Reinforcement Learning (Deep RL) for low-cost, miniature humanoid hardware in a dynamic environment, showing the method can synthesize sophisticated and safe
Cannot think of a better place (and a better model 😉) to embody AI in the physical world! We're hiring
@GoogleDeepMind
#Robotics
. Reach out with any qs
Gooal! Our work on autonomous, vision-based robot soccer is coming to
#CoRL2024
!
(using large-scale multi-agent RL in simulation with NeRF based rendering and lifelong learning via Replay across Experiments)
Paper
Videos
Catch up
Let's move these robots out of the lab! (check out the end of our paper for initial steps)
In particular, onboard egocentric vision instead of external sensors can go a long way.
Have a look at how the robot learns to control its head to keep track of everything!
Football players can tackle, get up, kick and chase a ball in one seamless motion. How could robots master these motor skills? ⚽
We trained AI agents to demonstrate these agile behaviours using end-to-end reinforcement learning.
Find out more:
Can deep reinforcement learning enable autonomous
#robots
🤖🦿 to play real-world soccer ⚽️?
Thrilled to share our latest step towards learning multi-agent robot soccer purely with onboard computation and sensing. We're extending prior motion capture
Ever wondered how to tune your hyperparameters while training RL agents? w/o running thousands of experiments in parallel? And even combine them?
Check out our work @
#CoRL2021
on training mixture agents which combines components with diverse architectures, distributions, etc
The talks from
#CoRL2017
are online. Many great talks, including our work on Reverse Curricula for RL and Mutual Alignment for Transfer Learning
#robotics
Excited to announce our newest work in hierarchical reinforcement learning. We design a robust off-policy learning framework which provides an easy transition from training flat Gaussian policies, to mixture policies, to option policies.
Thread 🔽
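A minimal sketch of the policy hierarchy the tweet describes, under the assumption (mine, not the paper's) that a flat Gaussian is simply the one-component special case of a mixture policy; all parameters below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mixture policy over 2-D actions: a high-level categorical
# distribution picks a component, a low-level Gaussian samples the action.
means  = np.array([[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]])  # 3 components
stds   = np.full_like(means, 0.1)
logits = np.zeros(3)  # uniform high-level distribution for this sketch

def sample_action(rng):
    probs = np.exp(logits) / np.exp(logits).sum()
    k = rng.choice(len(probs), p=probs)             # high-level choice
    return means[k] + stds[k] * rng.normal(size=2)  # low-level Gaussian sample

actions = np.array([sample_action(rng) for _ in range(1000)])
```

With a single component this reduces to a flat Gaussian policy; option policies would additionally make the high-level choice persist over time.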
Representation matters!
I'm excited to share (slightly delayed) our newest work discussing what is a 'good' representation for
#ReinforcementLearning
in
#Robotics
.
Preparing
#ReinforcementLearning
lectures in
@GoogleColab
is fantastic!
Very interactive, easy to debug and visualise. Plus free cloud GPUs/TPUs!
I wish this had been available when I was learning about the field nearly a decade ago!
Looking forward to my (just begun) short-time postdoc at the Oxford Robotics Institute. Proud to be working with this incredible team on tasks in LfD, RL and lifelong learning (etc) to increase robustness and reduce the effort of training robots
#robotics
#machinelearning
In our recent paper, we show that a minor variation of DQN actually solves many continuous control problems from state or pixels on par with state-of-the-art actor critic methods such as (D4PG, DMPO, SAC, DrQv2, DreamerV2, ...).
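A rough sketch of the core idea as I read it (this is an illustration of per-dimension action discretization on top of Q-learning, not the paper's implementation; the toy linear "network" and bin counts are made up):

```python
import numpy as np

# Treat each continuous action dimension independently and discretize it
# into a few bins, so a DQN-style argmax stays tractable even in
# multi-dimensional continuous action spaces.
BINS = np.linspace(-1.0, 1.0, 5)   # 5 bins per action dimension
ACT_DIM = 3                        # hypothetical 3-D continuous action space

rng = np.random.default_rng(0)
W = rng.normal(size=(4, ACT_DIM * len(BINS)))  # toy linear "Q-network"

def q_values(state):
    """Per-dimension, per-bin Q-values: shape (ACT_DIM, len(BINS))."""
    return (state @ W).reshape(ACT_DIM, len(BINS))

def greedy_action(state):
    """Independent argmax per dimension, decoded to continuous values."""
    q = q_values(state)
    return BINS[np.argmax(q, axis=1)]

state = rng.normal(size=4)
action = greedy_action(state)
```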
Finally finished my slides for the International Symposium on
#RobotLearning
tomorrow morning at the
@UTokyo_News_en
.
Looking forward to covering two recent projects as examples of the role of
#ReinforcementLearning
in the age of Gemini, ChatGPT and friends.
Thank you
Our 🌎-scale Inverse RL paper is finally out! Thrilled to share this multi-year project on route recommendation.
Understanding preferences is much harder than behaviour: not just WHAT but WHY!
We address the challenge via IRL at massive scale (100s of millions of states, samples, params)!
Sutton and Barto's
#ReinforcementLearning
book has had a massive, exciting update. If anyone has not yet had a look, I highly recommend it, both as a quick refresher and as a full intro to a sub-field.
How is everyone's
#ICML2023
rebuttal experience? Looks like my team won't get any feedback across submissions.
These are only a couple of data points and I'd love to see stats from
@icmlconf
!
A very negative signal for junior PhDs, who could learn to skip rebuttals.
Imagine you only have to run your
#FoundationModel
once! Offline and before any interaction with users.
This is possible due to the compositionality and graph structure underlying routes in Google Maps!
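A toy sketch of the compositional idea (the graph and costs below are made up, not from the actual system): score each road segment once in an offline pass, then any route's score at query time is just a sum over its segments, with no model call needed.

```python
# Hypothetical per-edge costs produced by a single offline pass.
edge_cost = {("A", "B"): 2.0, ("B", "C"): 1.5, ("A", "C"): 4.0}

def route_cost(route):
    """Compose per-edge scores along a route at query time."""
    return sum(edge_cost[e] for e in zip(route, route[1:]))

c = route_cost(["A", "B", "C"])  # 2.0 + 1.5 = 3.5
```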
No better word than 'excitement' 🎉👀🤖⚽
Our robot soccer work has finally been published in
@SciRobotics
!
@GoogleDeepMind
RL for controller design is an extremely capable and flexible approach (eg incorporate large multimodal models in the future)!
Soccer players have to master a range of dynamic skills, from turning and kicking to chasing a ball. How could robots do the same? ⚽
We trained our AI agents to demonstrate a range of agile behaviors using reinforcement learning.
Here’s how. 🧵
It feels strange to see all this amazing legacy Google Brain work as part of DeepMind!
An incredible privilege to have all this talent, old and new friends, under the same (metaphorical) roof!
PaLM-E is a generalist, embodied language model for robotics. 🤖
It solves many tasks on 𝘮𝘶𝘭𝘵𝘪𝘱𝘭𝘦 types of robots and for 𝘮𝘶𝘭𝘵𝘪𝘱𝘭𝘦 modalities, including images and neural scene representations.
Hear more from researchers at
#ICML2023
:
📍Booth
#109
⌚10.30 HST
AlphaGoZero is 'thinking fast and slow'!
Post from David Barber (UCL) on Expert Iteration:
Seems like I'm a bit late to the party in connecting the combination of Deep Learning and MCTS to Kahneman's 'Thinking, Fast and Slow'!
We’ve acquired the MuJoCo physics simulator and are making it free for all, to support research everywhere. MuJoCo is a fast, powerful, easy-to-use, and soon to be open-source simulation tool, designed for robotics research:
Reinforcement learning has found a completely new role in the age of LLMs, VLMs and VLAs. Better catch up on the best ways for adaptation and transfer via RL!
The Gemini era is here. Thrilled to launch Gemini 1.0, our most capable & general AI model. Built to be natively multimodal, it can understand many types of info. Efficient & flexible, it comes in 3 sizes each best-in-class & optimized for different uses
Just finished Peter Feibelman's 'A PhD Is Not Enough!'. A great guide emphasising some of the aspects one easily tends to overlook during the time as PhD / postdoc. 1/x
Reinforcement learning is most useful if a) demonstrations are hard to get and b) a system is hard to model.
(i.e. a) rules out imitation, b) rules out MPC, etc.)
Excited to share Mohak's internship report and the 'Box o Flows' enabling us to ask questions in this space!
Continual learning and transfer between tasks is one of the most relevant/fascinating directions in current ML research (for me). It's a broad field and hard to keep up with the progress. So, I'm even more happy for this intuitive figure in
@DeepMindAI
's 'Progress & Compress'
Learning fast and slow 🤖🧠⏳
Highly excited that our (long-term) work on more flexible
#ReinforcementLearning
and
#TransferLearning
is finally public.
Two separate processes enable fast, coarse adaptation and slow, but better final performance.
1/n 🧵
Fascinating, how your reaction when finding a paper published about an idea you had depends on your connection to the field.
Field of expertise: No, I’m too slow!
Field of exploration: Yes, I’m on the right track!
It's a common joke in
#MachineLearning
and
#ArtificialIntelligence
that 'X is all you need' or that it is 'unreasonably effective'.
The powerful underlying idea is 'simplicity'. If we truly only need X, then our methods become clearer and we can accelerate further progress.
🧵
Reward shaping in
#reinforcementlearning
is a pain! I need a couple of papers to visualize this pain for a slide.
What are your favorite examples?
(Our own dirty laundry: 9 shaping terms for the humanoid
#robot
soccer work)
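A hypothetical illustration of why shaping gets painful: the total reward becomes a weighted sum of hand-tuned terms. The terms and weights below are invented for the example, not the actual nine from the soccer work.

```python
# Made-up shaping terms and weights, each of which has to be hand-tuned.
SHAPING_WEIGHTS = {
    "velocity_to_ball": 0.1,
    "upright_bonus":    0.02,
    "energy_penalty":  -0.001,
}

def shaped_reward(task_reward, terms):
    """task_reward: the sparse objective; terms: dict of shaping values."""
    return task_reward + sum(SHAPING_WEIGHTS[name] * value
                             for name, value in terms.items())

r = shaped_reward(0.0, {"velocity_to_ball": 2.0,
                        "upright_bonus": 1.0,
                        "energy_penalty": 5.0})
```

Every added term interacts with the others, so each new one multiplies the tuning effort.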
Congratulations to everyone who submitted a paper to
#ICLR2022
yesterday!
Also: Congratulations to everyone who made a last minute call to instead improve their paper and submit to a future conference!
These decisions are hard but important. Your future self might thank you!
*It doesn't matter much.*
Vast majority of the papers won't matter in the long run. Your career will be shaped only by a few good ones. Instead of getting an "okay" paper accepted, it could be a blessing in disguise to revise and strengthen your paper.
Fig credit: Bill Freeman
2.8 million images were used to build a grid of Block-NeRFs and create the largest neural scene representation to date, capable of rendering an entire neighborhood in San Francisco. Dive in to the latest research from Waymo and Google Research:
Proud to announce our recent work on compositional, hierarchical models to strengthen
#transfer
between related tasks while mitigating negative interference. We considerably improve
#dataefficiency
for reinforcement learning on physical
#robots
(reducing training time by weeks)
Data-efficiency is one of the principal challenges for applying reinforcement learning on physical systems. We use hierarchical models to strengthen transfer while mitigating negative interference - saving weeks of training time for physical robots.
Software engineers ignoring libraries, inventors reinventing wheels & researchers not reading papers.
Repeatedly training models completely from scratch should seem similarly hilarious.
Very interesting work on Q-function hyperparameter optimization for RL from Théo Vincent, Fabian Wahren,
@Jan_R_Peters
,
@_bbelousov
, Carlo D’Eramo
Wonder how this perspective might interact with
@timseyde
's hyperparameter mixture policies?
Using reinforcement learning to generate deployable controllers? We have a simple trick to improve performance for many off-policy RL algorithms.
Development commonly includes large numbers of experiments to adapt algorithms, parameters, etc. Why waste the generated experience?
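A minimal sketch of the reuse idea (details assumed for illustration, not the actual implementation): instead of discarding the experience generated while iterating on algorithms and hyperparameters, seed the new run's replay buffer with it.

```python
import random

# Toy transitions (state, action, reward, next_state) from a prior run.
old_run_data = [("s%d" % i, "a", 0.0, "s%d" % (i + 1)) for i in range(100)]

replay_buffer = []
replay_buffer.extend(old_run_data)  # reuse transitions from prior experiments

def sample_batch(rng, batch_size=8):
    """Sample a training batch from the (pre-seeded) replay buffer."""
    return rng.sample(replay_buffer, batch_size)

rng = random.Random(0)
batch = sample_batch(rng)
```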
Looking forward to discussing embodied AI and robotics
@imperialcollege
this Thursday.
My talk will cover the role of '
#ReinforcementLearning
in the Age of Large Data'. And while the data generation side should be quite trivial to many, expect some discussions around why we
@markus_with_k
ooh, thanks! That has Wednesday, not sure about Monday (they're unlisted vids so can't find via search) but Tues+Wed will keep me entertained for a while in any case :)
Introducing MuJoCo 2.1.1!
This version includes the top feature request:
#MuJoCo
now runs on ARM, including Apple Silicon. And yes, MuJoCo on the M1 Max is lightning fast.
Visit GitHub and read the changelog for more details:
I'll try to share the full slides later!
But for now, here is the work that was covered, plus takeaways.
Thanks again everyone, really enjoyed the questions!
Working in
#ReinforcementLearning
for continuous Control or
#Robotics
?
You've probably repeatedly seen bang-bang behaviour emerge (which can quickly break your robot).
@timseyde
is asking why and what it means for us when designing algorithms and environments.
'Evolutionarily speaking, brains are not for rational thinking, linguistic communication, or even for perceiving the world. The most fundamental reason any organism has a brain is to help it stay alive.'
@anilkseth
@NautilusMag
Great experience with
#ICLR2024
!
Pure joy of working with our team
@GoogleDeepMind
leading to two accepted papers! Fantastic work by Dhruva Tirumala &
@BarnesMJ
!
Something for
#RL
(lifelong learning) and for
#IRL
(world-scale models!!); both heavily data-centric!
Thread 🧵👇
Glad to see how robotics, computer vision and machine learning as fields are moving towards (even) more open access. Conferences & workshops have become more (virtually) open over the last few years.
@RoboticsSciSys
is going to be live streamed this year!
Great to see this survey on successes of deep
#reinforcementlearning
for
#robot
deployment! There is much more to come over the next years.
Shameless self-plug: Dhruva's paper finally occupies the last free cell in table 2. Bingo!
That's one problem with being Bayesian: knowing when to stop.
[About also treating kernel parameters in Bayesian manner]
Loosely quoting
@lawrennd
at
#NIPS17