Ian Osband Profile Banner
Ian Osband Profile
Ian Osband

@IanOsband

Followers
8,170
Following
367
Media
71
Statuses
567

Research scientist at OpenAI working on decision making under uncertainty.

Joined July 2012
Pinned Tweet
@IanOsband
Ian Osband
5 months
A truth-seeking AI needs to know what it doesn't know. This requires an *epistemic* neural network. ... @ibab_ml knows this, he covets "the epinet" for Grok. ... could this be why @elonmusk is suing OpenAI? ... is this what @roon saw? Listen to @TalkRLPodcast to find out:
Tweet media one
6
3
31
@IanOsband
Ian Osband
5 years
Looking back over the year, the one paper that gave me the best "aha" moment was... Reconciling Modern Machine Learning and the Bias-Variance Tradeoff: The "bias-variance" you knew was just the first piece of the story!
Tweet media one
15
481
2K
@IanOsband
Ian Osband
5 years
This feels like a real breakthrough: Take the same basic algorithm as AlphaZero, but now *learning* its own simulator. Beautiful, elegant approach to model-based RL. ... AND ALSO STATE OF THE ART RESULTS! Well done to the team at @DeepMindAI #MuZero
5
196
751
@IanOsband
Ian Osband
4 years
Are you interested in #ThompsonSampling and #exploration , but looking for a good reference? A Tutorial on Thompson Sampling This tutorial covers the algorithm and its applications, illustrating the concepts through a range of examples... check it out!
Tweet media one
2
87
414
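The one-line version of the algorithm behind the tutorial: sample a plausible success rate for each arm from your posterior, then act greedily on that sample. A minimal sketch for Bernoulli bandits with Beta(1, 1) priors — the function name and arm probabilities below are illustrative, not taken from the tutorial:

```python
import random

def thompson_sampling(true_probs, n_steps=2000, seed=0):
    """Minimal Bernoulli-bandit Thompson sampling with Beta(1, 1) priors."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # Beta alpha parameter (includes prior count)
    failures = [1] * n_arms   # Beta beta parameter (includes prior count)
    total_reward = 0
    for _ in range(n_steps):
        # Sample one plausible success rate per arm from its posterior...
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # ...and act greedily with respect to that sample.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return successes, failures, total_reward

succ, fail, reward = thompson_sampling([0.2, 0.5, 0.8])
```

Exploration falls out automatically: arms with few observations have wide posteriors, so their samples are occasionally the maximum.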
@IanOsband
Ian Osband
5 years
Have you heard of "RL as Inference"? ... you might be surprised that this framing completely ignores the role of uncertainty! (confusing, since it talks a lot about "posteriors") Our #ICLR spotlight tries to make sense of this:
Tweet media one
7
69
316
@IanOsband
Ian Osband
3 years
Here's my top tip for research: Start with an example that is SIMPLE and EXTREME. - SIMPLE: clean and clear example - EXTREME: pushes the key issues to the limit If you can stress test your ideas in these edge cases, it is much easier to port the key insights to complex tasks.
4
28
278
@IanOsband
Ian Osband
5 years
Really excited to release #bsuite to the public! - Clear, scalable experiments that test core #RL capabilities. - Works with OpenAI gym, Dopamine. - Detailed colab analysis - Automated LaTeX appendix Example report:
Tweet media one
@GoogleDeepMind
Google DeepMind
5 years
We are excited to release Behaviour Suite for Reinforcement Learning, or ‘bsuite’ – a collection of carefully-designed experiments that investigate core capabilities of RL agents GitHub: Paper:
2
400
1K
2
53
205
@IanOsband
Ian Osband
5 years
Another great paper for understanding generalization properties in the overparameterized regime: Spectrally-normalized margin bounds for neural networks, Bartlett et al. It does feel like the "dark arts" of neural nets are waning...
Tweet media one
2
45
184
@IanOsband
Ian Osband
3 years
Really excited about this research... The culmination of a lot of people's hard work! - Cool insights on marginal/joint predictions - Open-source code for a new testbed in the field ... I definitely learnt a lot working on this, you might too!
@GoogleDeepMind
Google DeepMind
3 years
Does Bayesian deep learning work? The Neural Testbed provides tools to evaluate uncertainty estimates. These tools assess both the quality of marginal prediction per input & joint predictions given many inputs. Github: Paper: 1/
Tweet media one
8
167
821
5
22
169
@IanOsband
Ian Osband
8 months
I think the OpenAI board should resign
7
7
154
@IanOsband
Ian Osband
8 months
♥️
@sama
Sam Altman
8 months
i love the openai team so much
5K
4K
73K
5
5
141
@IanOsband
Ian Osband
4 months
A lot of the value from ChatGPT comes in mundane/mindless drudgery, not deep thinking. People like @GaryMarcus overlook the genuine value here: - installing nvidia drivers - sorting out messed up ruby version - setting up custom domain name Many such cases.
25
6
130
@IanOsband
Ian Osband
4 years
Einstein's paper on Brownian motion: ~4 pages A4, easy to follow, Nobel Prize Self-Normalizing Neural Networks: >100 pages, reams of numerical equations, SELU=slightly bent RELU ... @zacharylipton I don't know what you mean 🤷‍♂️
Tweet media one
@zacharylipton
Zachary Lipton
4 years
Reading abt the Manhattan project, clear parallels to AI today—celebrated scientists churning out research under tremendous pressure. Only they shaped the future of energy, warfare, and the international world order... and we produced the Squish activation fn.
14
17
246
5
15
115
@IanOsband
Ian Osband
2 years
Excited to share some of our recent work! Fine-Tuning Language Models via Epistemic Neural Networks TL;DR: prioritise getting labels for your most *uncertain* inputs, match performance with 2x less data & better final performance Discussion (1/n)
@__nmca__
Nat McAleese
2 years
Learn your classification task with 2x less data & better final accuracy via active learning in our new paper: . How does it work? (1/n)
Tweet media one
7
32
206
4
19
115
@IanOsband
Ian Osband
8 months
Excited to (finally) present our work on Epistemic Neural Networks as a spotlight for #NeurIPS23 "Get better uncertainty than an ensemble size=100 at cost less than 2x base models" Poster 1924
5
11
102
@IanOsband
Ian Osband
3 years
Big news for reproducible research!
@GoogleDeepMind
Google DeepMind
3 years
We’ve acquired the MuJoCo physics simulator () and are making it free for all, to support research everywhere. MuJoCo is a fast, powerful, easy-to-use, and soon to be open-source simulation tool, designed for robotics research:
85
2K
6K
0
10
101
@IanOsband
Ian Osband
4 years
Big thanks to @pbloemesquire for a great tutorial: Transformers from scratch If (like me) you're excited about #GPT3 but found yourself waving your hands through various NN diagrams on self-attention... this is the cure! 🙌
Tweet media one
1
18
96
@IanOsband
Ian Osband
4 years
I often hear that "deep learning was all invented in the 90s"... But seems like many things didn't actually work before: - ReLU instead of sigmoid - ADAM instead of SGD - Favourable weight initialization I wonder if there are similar "tricks" holding back current research?
14
4
88
@IanOsband
Ian Osband
5 years
Better late than never... "Deep Exploration via Randomized Value Functions" published in JMLR: This paper presents RVF as a scalable approach to deep exploration with generalization in RL. Proud of this work with Ben, Dan and Zheng!
Tweet media one
1
13
90
@IanOsband
Ian Osband
5 years
"One weird trick" for DQN in large (continuous) action spaces: - Initialize uniform action-sampling distribution. - Choose sampled action with highest Q. - Train sampling to produce "best action" + also some entropy. - ... Works surprisingly well! Great stuff @dwf , @VladMnih !
@GoogleDeepMind
Google DeepMind
5 years
Q-learning is difficult to apply when the number of available actions is large. We show that a simple extension based on amortized stochastic search allows Q-learning to scale to high-dimensional discrete, continuous or hybrid action spaces:
6
264
850
1
9
87
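The four steps in the tweet above can be sketched in miniature. This is a toy, one-state analogue of the idea: a Gaussian proposal stands in for the learned sampling distribution, and slow annealing stands in for the entropy term. The function name and the quadratic Q are made up for illustration:

```python
import random

def amortized_argmax_q(q_fn, n_iters=300, n_samples=32, lr=0.1, seed=0):
    """Toy sketch of amortized maximization: a Gaussian proposal is trained
    to concentrate on the action that scores highest under Q."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # proposal starts broad ("uniform-ish")
    for _ in range(n_iters):
        actions = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        best = max(actions, key=q_fn)     # choose sampled action with highest Q
        mu += lr * (best - mu)            # train proposal toward the "best action"
        sigma = max(0.05, sigma * 0.995)  # anneal slowly, keeping some entropy
    return mu

# Hypothetical Q with its maximum at a = 0.7.
q = lambda a: -(a - 0.7) ** 2
mu = amortized_argmax_q(q)
```

The payoff is that the expensive argmax over a continuous action space is replaced by a handful of samples from a learned proposal.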
@IanOsband
Ian Osband
5 years
Totally agree: The part that is hard for humans (symbolically solving the cube) is pretty easy for computers... The part that is totally trivial for humans (twisting a cube with two hands) is still essentially impossible for RL robotics!
@NandoDF
Nando de Freitas
5 years
I find it funny folks are focusing on the symbolic challenge. The big challenge is attaching that hand to a moving controllable robot arm, and preferably having two coordinated hands learning diverse behaviours by RL, from sensors, with low sample complexity and in a safe manner.
14
50
315
7
7
82
@IanOsband
Ian Osband
3 years
@WiMLworkshop @wimlds @QueerinAI @AiDisability @black_in_ai @Khipu_AI @DeepIndaba @_LXAI @women_in_ai ... we need you! The Efficient Agent Team is hiring: - New DeepMind group in California - Focus on RL, data efficiency and rich feedback - Looking to scale up theory --> practice
@NandoDF
Nando de Freitas
3 years
If you’re advertising a machine learning or AI scholarship or job on Twitter, please consider announcing it to @QueerinAI @AiDisability @black_in_ai @Khipu_AI @DeepIndaba @_LXAI @WiMLworkshop @women_in_ai and other groups who care about diversity and inclusion. Thanks
1
61
295
5
19
78
@IanOsband
Ian Osband
5 years
Thought-provoking book, thanks @demishassabis : The Order of Time TL;DR: Time as we know it (fundamentally ordered from past to future) does not exist. Our perception of time is a side-effect of us residing in a low-entropy region of space + 2nd law.
Tweet media one
1
5
75
@IanOsband
Ian Osband
6 years
We just updated our @NipsConference spotlight paper "Randomized Prior Functions for Deep Reinforcement Learning" If you're too lazy to read the paper... then just head to our accompanying website - we have #CODE + demos you can run in the browser!
2
11
68
@IanOsband
Ian Osband
3 months
Amazing work from everyone on the team... incredible what a great team working together can accomplish. ... did we mention that this is also available FOR FREE 🫡
@LiamFedus
William Fedus
3 months
GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing.
Tweet media one
194
907
5K
4
2
66
@IanOsband
Ian Osband
5 years
This paper is not long, and very easy to read... so I definitely recommend it. The combination of: 1) Simple and targeted experiments 2) Sane and sensible writing 3) Excellent figures Helps to provide a lot of insight to #DeepLearning - more please!
Tweet media one
0
13
59
@IanOsband
Ian Osband
4 years
It says a lot that I had to honestly check if this was a troll account... As expected: - Homology detection is not the same as protein prediction. - People used neural nets for this before 2007. - AlphaFold is not using an LSTM. ... @SchmidhuberAI it's not a good look for you!
1
1
51
@IanOsband
Ian Osband
3 years
Extremely magnanimous of your "laudation of Kunihiko" to clarify that it was in fact @SchmidhuberAI that invented the Transformer in 1991! 🤣 #annusmirabilis #ididitfirst #cookielicking
Tweet media one
@SchmidhuberAI
Jürgen Schmidhuber
3 years
Kunihiko Fukushima was awarded the 2021 Bower Award for his enormous contributions to deep learning, particularly his highly influential convolutional neural network architecture. My laudation of Kunihiko at the 2021 award ceremony is on YouTube:
Tweet media one
6
134
680
0
4
45
@IanOsband
Ian Osband
1 year
Fantastic talk from @SebastienBubeck on the "Physics of AI": - Intelligence has emerged: why? how? - Let's study this with *controlled experiments* and *toy models* - Clean and clear insights that peer slightly behind the magic curtain
1
8
44
@IanOsband
Ian Osband
5 years
As part of the #bsuite release, we also include bsuite/baselines: These are simple, clear, and correct agent implementations in #TF1 , #TF2 and #JAX ... many in under 100 lines of code!
Tweet media one
@GoogleDeepMind
Google DeepMind
5 years
We built bsuite to do two things: 1. Offer clear, informative, and scalable experiments that capture key issues in RL 2. Study agent behaviour through performance on shared benchmarks You can get started with bsuite in this colab:
1
40
179
2
13
44
@IanOsband
Ian Osband
4 years
And once you've been through @pbloemesquire 's tutorial, you have to check out @karpathy 's tutorial code: Focus on the key points, #simple , #sane , and such a valuable resource in teaching... this stuff is really great!
@IanOsband
Ian Osband
4 years
Big thanks to @pbloemesquire for a great tutorial: Transformers from scratch If (like me) you're excited about #GPT3 but found yourself waving your hands through various NN diagrams on self-attention... this is the cure! 🙌
Tweet media one
1
18
96
1
3
43
@IanOsband
Ian Osband
1 year
@_aidan_clark_ this is a classic case of conflating *a bad RL algorithm* (policy gradient ?) vs *the RL problem*... You're highlighting efficient exploration as one of the outstanding problems to solve - I agree. ... and that's something that's only really studied in RL!
@_aidan_clark_
Aidan Clark
1 year
I got disillusioned with RL when I realized that it was always: step 1: act randomly for ~years worth of data before stumbling upon a reward step 2: figure out how to repeat that action in a generalizable way .... and no one had good ideas for improving step 1
36
22
383
1
1
42
@IanOsband
Ian Osband
6 years
Paper summary: - Tabular Q-learning converges to optimal with infinite data. - You might hope Q-learning + function approx converges similarly to the best policy in that class. - But actually that's not true... Basically because MDP with function approx ~= POMDP #NeurIPS2018
@GoogleAI
Google AI
6 years
Congratulations to Google researchers @tylerlu , @CraigBoutilier and Dale Schuurmans, whose paper “Non-delusional Q-learning and Value Iteration” has received a #NeurIPS2018 Best Paper Award! Check it out at .
2
57
260
2
5
42
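The first bullet of the summary — tabular Q-learning converging to optimal values given enough data — is easy to verify on a toy chain MDP. A self-contained sketch; the 2-state environment and all names are invented for illustration:

```python
import random

def tabular_q_learning(n_episodes=5000, gamma=0.9, lr=0.1, eps=0.2, seed=0):
    """Tabular Q-learning on a toy 2-state chain: action 1 in state 0 moves
    to state 1; action 1 in state 1 pays reward 1 and ends the episode."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    for _ in range(n_episodes):
        s = 0
        for _ in range(10):  # cap episode length
            # Epsilon-greedy action selection over the current Q-table.
            if rng.random() < eps:
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda b: q[(s, b)])
            if s == 0 and a == 1:
                r, s_next, done = 0.0, 1, False
            elif s == 1 and a == 1:
                r, s_next, done = 1.0, 0, True
            else:
                r, s_next, done = 0.0, s, False  # action 0 stays put
            target = r if done else r + gamma * max(q[(s_next, 0)], q[(s_next, 1)])
            q[(s, a)] += lr * (target - q[(s, a)])
            if done:
                break
            s = s_next
    return q

q = tabular_q_learning()
```

Here the optimal values are Q*(1,1) = 1 and Q*(0,1) = 0.9, and the tabular update recovers them; the paper's point is that this guarantee breaks once you add function approximation.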
@IanOsband
Ian Osband
6 years
Great talk from @jacobmbuckman on STEVE - stochastic ensemble value expansion. "If you want to roll forward a model, it's important to incorporate uncertainty estimates - and bootstrap ensemble works well for this" Nice work, and very clear+engaging talk! #NeurIPS2018
Tweet media one
0
6
41
@IanOsband
Ian Osband
4 years
Amazing to see @DeepMind #AI on the world's most pressing problems!
@GoogleDeepMind
Google DeepMind
4 years
Today we're sharing structure predictions for six proteins associated with the virus that causes COVID-19, generated by the most up-to-date version of our AlphaFold system. We hope this contributes to the research community’s understanding of the virus:
Tweet media one
27
931
2K
0
3
39
@IanOsband
Ian Osband
5 years
If you are submitting an RL paper to AAAI, you should include a #bsuite evaluation (+ automated LaTeX appendix). - Paper: - Github: - Report: If you're interested, but having trouble then get in touch!
Tweet media one
1
0
39
@IanOsband
Ian Osband
6 years
Meet the professors at @MILAMontreal : … … I knew #MachineLearning was pretty hyped... but now it is getting #HYPE ! @slashML @boredyannlecun @dwf
2
4
38
@IanOsband
Ian Osband
4 years
It's nice that @SchmidhuberAI is using his fame/expertise/brainpower to tackle the important issues: ❌ COVID-19 Pandemic ❌ Black Lives Matter ❌ Global Warming ❌ Existential risks of AI ❌ Any research post "annus mirabilis" ✔️ The 2018 Turing award ... really?
@SchmidhuberAI
Jürgen Schmidhuber
4 years
ACM lauds the awardees for work that did not cite the origins of the used methods. I correct ACM's distortions of deep learning history and mention 8 of our direct priority disputes with Bengio & Hinton. #selfcorrectingscience
13
67
314
1
0
38
@IanOsband
Ian Osband
7 months
I hope this becomes a new form of copypasta... will we see more ML researchers posting thirst traps?
@SchmidhuberAI
Jürgen Schmidhuber
7 months
The GOAT of tennis @DjokerNole said: "35 is the new 25.” I say: “60 is the new 35.” AI research has kept me strong and healthy. AI could work wonders for you, too!
Tweet media one
165
152
2K
1
2
37
@IanOsband
Ian Osband
6 years
According to @ylecun #neurips2018 RL gets one scalar = weak signal "self supervised" = strong signal But to succeed in RL you have to understand state, transitions, and how the world works! Rewards help shape what you care about, but it's so very far from the "only" signal in RL
0
6
35
@IanOsband
Ian Osband
5 years
Great talk from Ben Van Roy at the #NeurIPS2019 workshop on optimization for RL. Is it time for the field to move beyond "MDP"? Thinking about "agent state" might be a better perspective for learning in complex worlds... the real world "state" is just too complex!
Tweet media one
1
3
36
@IanOsband
Ian Osband
4 years
Congratulations @marcgbellemare - a huge achievement and a great success for RL in the real world!
@marcgbellemare
Marc G. Bellemare
4 years
Our most recent work is out in Nature! We're reporting on (reinforcement) learning to navigate Loon stratospheric balloons and minimizing the sim2real gap. Results from a 39-day Pacific Ocean experiment show RL keeps its strong lead in real conditions.
23
108
765
1
2
36
@IanOsband
Ian Osband
5 years
#MachineLearning conference review burdens are getting out of control... too many low quality submissions + reviews! Here's a controversial solution: - $100 fee to submit a paper for review - Waived for papers that pass some "quality bar" - Use proceeds to fund D&I initiatives
5
3
33
@IanOsband
Ian Osband
5 years
This is why you need @PlotNine : It's @matplotlib under the hood but uses a "grammar of graphics" that copies @hadleywickham 's #ggplot2 from R... Almost like #keras to #tf Seriously only takes 1 day to get up to speed... You will not regret it.
1
5
31
@IanOsband
Ian Osband
5 years
Come see our #bsuite poster at the #NeurIPS2019 Deep RL Workshop... Bring your questions! West Exhibition Hall C
Tweet media one
1
6
31
@IanOsband
Ian Osband
2 years
@__nmca__ @JAslanides @geoffreyirving If you're interested in: - Uncertainty - Alignment - RL from human feedback - Language models Recent papers: Consider applying for internships/positions in the "Efficient Agent Team" working in MTV (with @ibab_ml @goodfellow_ian nearby) ;D (5/5)
1
9
31
@IanOsband
Ian Osband
5 years
100% another great paper in this area from @mrtz @OriolVinyalsML and more: Understanding Deep Learning Requires Rethinking Generalization I love something that gets the conversation (or controversy) going! 😜 Large model != Poor generalization
@OriolVinyalsML
Oriol Vinyals
5 years
The paper "Understanding deep learning requires rethinking generalization" mostly asked questions. Glad to see some answers / new theories since then!
1
46
161
1
5
30
@IanOsband
Ian Osband
4 years
Great first day of the conference - thanks so much @marcgbellemare and @LihongLi20 ! Particularly recommend the panel discussion: Top experts discussing "offline reinforcement learning": @EmmaBrunskill @tengyuma @svlevine @ofirnachum @pcastr 🔥
@marcgbellemare
Marc G. Bellemare
4 years
Excited to kick off the Deep Reinforcement Learning theory workshop at the Simons Institute today, co-organized with @LihongLi20 . Today's topic is Offline reinforcement learning 🔥 Schedule is here:
1
20
99
1
2
30
@IanOsband
Ian Osband
5 years
I actually don't think this is controversial... And I'm definitely "team Bayes" Yes, an independent Gaussian prior over NN weights is nonsense... We know the *interaction* is the most important part! But there's still huge potential for effective Bayesian deep learning!
@jacobmbuckman
Jacob Buckman
5 years
Highly persuasive rant. Any Bayesians out there willing to defend the reverend's honor?
6
6
36
1
3
28
@IanOsband
Ian Osband
5 years
Welcome @SchmidhuberAI to Twitter! Approaching 10k followers... But yet to follow a single account... Who will be first? Ivakhnenko and Fukushima seem more likely than @ylecun and @geoffreyhinton .
Tweet media one
0
1
27
@IanOsband
Ian Osband
5 months
Thanks a lot to @robinc for hosting me + such a great job driving the discussion! As mentioned in the podcast, I am especially interested to hear from people who are NOT on the same page as me...
@TalkRLPodcast
TalkRL Podcast
5 months
Episode 49: Ian Osband @IanOsband research scientist at OpenAI (ex @GoogleDeepMind , @Stanford ) on decision making under uncertainty, information theory in RL, uncertainty, joint predictions, epistemic neural networks and more!
Tweet media one
4
8
64
0
2
25
@IanOsband
Ian Osband
5 years
This "double descent" keeps cropping up... it's quite a funny thing! I like this paper from Mei+Montanari: Precise asymptotics for a stylized MLP.
@OpenAI
OpenAI
5 years
A surprising deep learning mystery: Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time.
Tweet media one
98
665
2K
0
5
26
@IanOsband
Ian Osband
5 years
😅
@danluu
Dan Luu
5 years
"actually the seed is also a hyper-parameter"
Tweet media one
9
229
981
1
4
26
@IanOsband
Ian Osband
3 years
I missed this paper when it came out! Really glad that @vincefort brought it to my attention... Even if ensemble + prior function is not the precise posterior, at least it's not overconfident... and it will eventually concentrate with data. 🥳
1
4
26
@IanOsband
Ian Osband
5 years
You could still drive efficient exploration in an Actor-Critic algorithm though... and use policy gradient as a sub-procedure. For example, you could keep a distribution (or ensemble) of plausible value functions, and optimize a policy for each of these.
Tweet media one
@IanOsband
Ian Osband
5 years
@adityamodi94 @nanjiang_cs @HaqueIshfaq Actually, I don't think this is a coincidence... If you want to explore efficiently, you first need to be able to reason counterfactually: "what might things be like if I went and did XYZ?". Basic policy gradient is not going to be able to do this effectively.
0
0
4
0
6
24
@IanOsband
Ian Osband
5 years
A giant bag of ham has been delivered to my work addressed to me, and I don't know why... Where do I go from here? #hamgate
Tweet media one
7
0
25
@IanOsband
Ian Osband
4 years
We also #opensourced all the #code for the book: Recently upgraded from Py2 -> Py3 and made sure everything was still running as expected 😆 As an added bonus, you can now run this all in your browser without installing anything!
2
2
24
@IanOsband
Ian Osband
4 years
One problem that hierarchical RL has is that it's not totally clear how it *could* pan out convincingly... (Separate from standard RL) If we could distil some simple examples that embody what it means to be "good at hierarchical RL" that would be a great first step!
@zacharylipton
Zachary Lipton
4 years
There's a handful of ML ideas that just *feel right*---perhaps due to evoking some aspect of human learning?---that keep recurring but never seem to have panned out convincingly. Here's two: (1) curriculum learning; (2) hierarchical reinforcement learning. (Dis)agree? Got others?
33
26
322
5
1
24
@IanOsband
Ian Osband
6 years
Missed this one at the time @SebastienBubeck ! The videos from the #ICML2018 workshop on #exploration are all online: Please get in touch - especially if there are parts you disagree with! ;D
@SebastienBubeck
Sebastien Bubeck
6 years
@IanOsband Will it be recorded?
0
0
0
0
5
24
@IanOsband
Ian Osband
4 months
@GaryMarcus We need a new "geometric intelligence" Gary! Looking forward to when you blow these LLMs out the water 🙏
3
1
24
@IanOsband
Ian Osband
5 years
Congratulations @OriolVinyalsML @maxjaderberg and everyone else on the team! Amazing stuff 🤖🏆🏅
@GoogleDeepMind
Google DeepMind
5 years
Our #AlphaStar research features on the cover of @Nature this week! Read the paper here:
9
167
514
0
1
24
@IanOsband
Ian Osband
3 years
Big thanks to Ben Van Roy, who I think really cultivates this way of thinking analytically... Ben's way of thinking is even called out in: ... and honoured to say that, believe it or not, bsuite even gets a shout-out in the book! 🤖🧠🥳
0
1
24
@IanOsband
Ian Osband
5 years
Huge congratulations to @ilyasut and team... Definitely one of the biggest results in AI research this year!
@OpenAI
OpenAI
5 years
We're releasing "Dota 2 with Large Scale Deep Reinforcement Learning", a scientific paper analyzing our findings from our 3-year Dota project: One highlight — we trained a new agent, Rerun, which has a 98% win rate vs the version that beat @OGEsports .
52
581
2K
0
2
24
@IanOsband
Ian Osband
6 years
... of course this "control perspective" completely ignores one of the biggest questions in reinforcement learning: EXPLORATION. If you're interested in how/why this is such a problem - come to the keynote talk "what is exploration" Sunday 9am #ICML2018
4
3
23
@IanOsband
Ian Osband
9 months
@ESYudkowsky ... have you tried working out?
2
0
19
@IanOsband
Ian Osband
5 years
Very lucky to get a last-minute invite to the RL workshop on predictive intelligence. A week of workshops, discussion and debate on #AI , #RL with a lot of heavy hitters... Oh yeah and it's also in #barbados with snorkel breaks 🐢 #bellairs #fresh .
Tweet media one
0
0
21
@IanOsband
Ian Osband
3 years
This is really how I think of the bsuite project: We want to collect the most simple/extreme problems in core reinforcement learning research. ... bonus points if they are *scalable*, so that the level of extreme-ness can be dialed up/down
1
1
19
@IanOsband
Ian Osband
5 years
Very interesting results on "train" vs "test" in simulated RL domains from @OpenAI
@OpenAI
OpenAI
5 years
We're releasing Procgen Benchmark, 16 procedurally-generated environments for measuring how quickly a reinforcement learning agent learns generalizable skills. This has become the standard research platform used by the OpenAI RL team:
51
358
998
1
0
19
@IanOsband
Ian Osband
4 months
@GaryMarcus Also... think you probably know this... but company valuations are typically dominated by *future* earnings. Agree that people are betting on big growth in the sector - maybe you should start "shorting" these companies! You could become quite rich.
6
0
19
@IanOsband
Ian Osband
6 years
Go-Explore attains *by far* the best scores on #MontezumaRevenge - impressive! However, we should be clear about what the goal of research in #RL (and #exploration in particular) is: There is plenty of room for all this research in #MachineLearning
@jeffclune
Jeff Clune
6 years
Montezuma’s Revenge solved! 2 million points & level 159! Go-Explore is a new algorithm for hard-exploration problems. Shatters Pitfall records too 21,000 vs 0 Blog: Vid By @AdrienLE @Joost_Huizinga @joelbot3000 @kenneth0stanley & me
18
188
484
1
5
18
@IanOsband
Ian Osband
4 years
Amazing work from the team - very exciting!
@demishassabis
Demis Hassabis
4 years
Thrilled to announce our first major breakthrough in applying AI to a grand challenge in science. #AlphaFold has been validated as a solution to the ‘protein folding problem’ & we hope it will have a big impact on disease understanding and drug discovery:
162
2K
8K
0
0
18
@IanOsband
Ian Osband
2 years
Bellmansplaining: take big deep neural networks, train supervised from human data and a huge amount of tinkering +1000x more data/compute than before, declare fundamental breakthroughs due to RL research.
@usuallyuseless
Justin Bayer
2 years
Bayesplaining: take a well established method, express it as a series of crude approximations to a Bayesian approach, throw it back at the community where it was invented.
16
74
756
0
0
18
@IanOsband
Ian Osband
8 months
♥️
@ilyasut
Ilya Sutskever
8 months
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
7K
4K
33K
0
0
18
@IanOsband
Ian Osband
5 years
Cool new #RL competition: learn to mine a diamond in Minecraft in 4 days of CPU training. ... but something really triggers me about calling this competition "sample-efficient"... they limit your #COMPUTE *not* your #DATA ... why not limit the number of frames??
@wgussml
william
5 years
Excited to announce our #NeurIPS2019 competition: The MineRL Competition for Sample-Efficient Reinforcement Learning! With @rsalakhu @katjahofmann @diego_pliebana @flippnflops @svlevine @OriolVinyalsML @chelseabfinn and others! Participate here!
7
116
264
1
1
18
@IanOsband
Ian Osband
4 months
What would you do with so much money? Why not start with something small and demonstrate scaling properties: theorems, experiments. Then, come back and ask for more money with a clear plan. Didn't people already give you $$$ for Geometric Intelligence? Don't you have tenure?
@GaryMarcus
Gary Marcus
4 months
Suppose just for a second that Domingos (and I, and many others) were correct that neurosymbolic AI was one of the most promising research directions, and further suppose that we lived in a world in which people trying to pursue that research direction couldn’t get 1% of the
30
15
144
2
0
17
@IanOsband
Ian Osband
6 years
Pretty disappointed in @yaringal after I tried to work together! But, if we're doing an #ML #showdown ... let's do points not typos: - Dropout "posteriors" give bad decisions. - Doesn't even pass linear sanity checks! - Alternative? Get it going @slashML
@IanOsband
Ian Osband
6 years
@yaringal Would have preferred to do this via email, but: - lambda/d should be lambda/np in (6), thanks! - this typo in the appendix doesn't affect *any* other statements/proof. - "concrete" dropout does not address the issues we highlight. - happy to add this baseline for clarification.
1
0
10
3
1
17
@IanOsband
Ian Osband
8 months
@j_foerst It's Jurgen's world... we're just living in it
Tweet media one
1
0
16
@IanOsband
Ian Osband
6 years
This did make me lol
@biletubes
Bill Tubbs
6 years
@svlevine Favourite quote from Emo Todorov at #NeurIPS2018 on hearing that #bostondynamics has started using some reinforcement learning: “Oh good! That will slow them down.”
0
6
41
0
0
16
@IanOsband
Ian Osband
6 years
Excited for the final day of workshops at #ICML2018 ! If you're interested in what I have to say: 9am - "What is Exploration" (Exploration) 11.30am - "Deep Exploration via Randomized Value Functions" (PGM) 4.30pm - Panel Discussion (Exploration) Particularly if you disagree! ;D
0
1
16
@IanOsband
Ian Osband
4 months
It's all well and good pushing for advanced reasoning, and robustness to academic trolling... But even in their current form, it's a super valuable tool, and I think you'd be mad to exclude it from the path to AGI. > "you wouldn't ask Terence Tao how to fix your nvidia driver"
5
1
16
@IanOsband
Ian Osband
6 years
Congratulations to Leon Bottou and @obousquet for the #NeurIPS2018 test of time award: Outlining the benefits of imperfect (but fast) SGD vs batch training. Particularly good talk from @obousquet ... Would recommend watching the recording!
1
5
16
@IanOsband
Ian Osband
3 years
For some idea of what we've done in our first 6m: Roadmap for agent: Network architectures: Rethinking Bayesian Deep Learning: We have a fantastic team just getting started. Hustlers wanted, PhD optional.
2
1
16
@IanOsband
Ian Osband
4 months
Example: I had a website I set up ~8y ago while hunting for a job with @GoogleDeepMind It was out of date, and I'd completely forgotten the arcane css/Jekyll/ruby I had used to make it... Let alone customize the domain. A few minutes later: 📈
4
0
15
@IanOsband
Ian Osband
6 years
Awesome results from @OpenAI : "use prediction error on a random network as a bonus for exploration." You could even call this a follow up on our @NIPS 2018 spotlight paper: "Randomized Prior Functions for Deep Reinforcement Learning" Very impressive!
@OpenAI
OpenAI
6 years
Random Network Distillation: A prediction-based method that achieves state-of-the-art performance on Montezuma’s Revenge -
8
156
521
1
3
15
@IanOsband
Ian Osband
8 months
🤡
@SchmidhuberAI
Jürgen Schmidhuber
8 months
How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. More than a dozen concrete AI priority disputes under
Tweet media one
48
132
963
2
0
15
@IanOsband
Ian Osband
5 years
Straight up - I'm not usually a GAN man... But that's pretty cool...
@NVIDIAAI
NVIDIA AI
5 years
NVIDIA Research developed a #deeplearning model that turns rough doodles into photorealistic masterpieces. Like a smart paintbrush, this GANs based tool converts segmentation maps into life-like images: #GTC19
17
374
998
0
0
14
@IanOsband
Ian Osband
5 years
Massive congratulations to the Big 3 of Deep Learning! ... but didn't #schmidhuber win the #ACMTuringAward back in the 90s?
@TheOfficialACM
Association for Computing Machinery
5 years
Yoshua Bengio, Geoffrey Hinton and Yann LeCun, the fathers of #DeepLearning , receive the 2018 #ACMTuringAward for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing today.
Tweet media one
28
1K
3K
2
1
14
@IanOsband
Ian Osband
5 months
@_aidan_clark_ It's more of a vibes based metric.
Tweet media one
0
0
14
@IanOsband
Ian Osband
2 years
Not since the days of @iaindunning has there been such a high profile "Ian" at DeepMind! 🚀
@goodfellow_ian
Ian Goodfellow
2 years
I'm excited to announce that I've joined DeepMind! I'll be a research scientist in @OriolVinyalsML 's Deep Learning team.
152
239
7K
0
0
13
@IanOsband
Ian Osband
2 years
Excited to be giving a talk at the Stanford RL forum: Tune in *today* at 4pm PDT... you can join via zoom by following the link below.
2
1
14
@IanOsband
Ian Osband
5 years
I mostly agree... But my most recent experience with JMLR took well over a year for the first review! I'm not sure it's always worth it to wait that long for a high quality gradient update Vs many more noisy SGD steps via conference.
@alshedivat
Maruan Al-Shedivat
5 years
Every time I get back reviews from JMLR, I'm just blown away by the quality (as compared to the typical reviews from an ML/AI conference). The questions/comments are often informative to the point they can really be seen as a contribution to the paper itself!
3
3
130
1
0
14
@IanOsband
Ian Osband
2 years
The key technology here is the ability to estimate model uncertainty in a language model. To do this, we use a new type of network architecture called an *epinet* = a small additional network designed to estimate uncertainty. (2/n)
1
2
14
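As a rough picture of the epinet idea (not the paper's actual architecture): a small index-conditioned head is added to a base prediction, and resampling the epistemic index z induces a distribution over outputs whose spread serves as the uncertainty estimate. Everything below — weights, dimensions, function name — is a hypothetical toy:

```python
import random

def epinet_predictions(features, index_dim=8, n_index_samples=20, seed=0):
    """Toy epinet sketch: base output plus a small index-conditioned term.
    Varying the epistemic index z yields a distribution over predictions."""
    rng = random.Random(seed)
    d = len(features)
    # Hypothetical fixed weights standing in for trained parameters.
    w_base = [0.5] * d
    w_epi = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(index_dim)]
    base = sum(w * f for w, f in zip(w_base, features))
    preds = []
    for _ in range(n_index_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(index_dim)]
        # Epinet term: z-weighted combination of small per-index linear heads.
        epi = sum(z[k] * sum(w * f for w, f in zip(w_epi[k], features))
                  for k in range(index_dim))
        preds.append(base + epi)
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var

mean, var = epinet_predictions([1.0, 2.0, 3.0])
```

Because the extra head is small relative to the base network, sampling many indices is far cheaper than running a large ensemble — which is the cost argument in the thread.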
@IanOsband
Ian Osband
5 months
ChatGPT (+ other LLMs) take actions grounded in the real world: interacting with human users to satisfy their requests Things really are backwards if you think that playing Goat Simulator 3 for thousands of years of simulated gameplay to finally reach 200%-relative simulated
@janexwang
Jane Wang
5 months
LLMs are amazing but they’re not grounded in external, embodied environments. That’s why I’m excited to finally be able to talk about the project I’ve been working on for over a year: SIMA, an agent that can follow natural language in video games!
5
42
265
2
0
13
@IanOsband
Ian Osband
5 months
I'd say that the hundreds of millions of @ChatGPTapp users is a pretty real grounding 🥴
@janexwang
Jane Wang
5 months
LLMs are amazing but they’re not grounded in external, embodied environments. That’s why I’m excited to finally be able to talk about the project I’ve been working on for over a year: SIMA, an agent that can follow natural language in video games!
5
42
265
2
0
13
@IanOsband
Ian Osband
5 years
Looking forward to reading this one!
@ShamKakade6
Sham Kakade
5 years
What actually constitutes a good representation for reinforcement learning? Lots of sufficient conditions. But what's necessary? New paper: . Surprisingly, good value (or policy) based representations just don't cut it! w/ @SimonShaoleiDu @RuosongW @lyang36
2
32
178
0
1
13
@IanOsband
Ian Osband
5 years
Great to see high-quality software open source from @berkeley_ai ! 👏 But why do these #RL frameworks end up with so many complex Agent interfaces: (OpenAI Baselines + Dopamine are similar) Why not: - agent.act(observation) - agent.observe(transition)
@maxjaderberg
Max Jaderberg
5 years
New reinforcement learning library rlpyt in pytorch thanks to Adam Stooke from @berkeley_ai (and previously intern with me at @DeepMindAI ). There are a whole suite of RL algorithms implemented and framework for small and medium scale distributed training.
Tweet media one
3
114
427
2
3
13
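The two-method interface proposed in the tweet is easy to pin down concretely. A minimal sketch — the `Transition` fields and `RandomAgent` are illustrative choices, not any library's actual API:

```python
import random
from dataclasses import dataclass
from typing import Any

@dataclass
class Transition:
    observation: Any
    action: Any
    reward: float
    next_observation: Any
    done: bool

class RandomAgent:
    """Minimal agent honouring the two-method interface: act + observe."""

    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.total_reward = 0.0

    def act(self, observation):
        # Select an action given the current observation.
        return self.rng.randrange(self.n_actions)

    def observe(self, transition):
        # Update internal state / learning from the latest transition.
        self.total_reward += transition.reward

agent = RandomAgent(n_actions=4)
a = agent.act(observation=0)
agent.observe(Transition(0, a, 1.0, 1, False))
```

The appeal is that the environment loop owns all the plumbing, and any agent that implements these two methods plugs straight in.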
@IanOsband
Ian Osband
5 months
commoditize your complement
@DrJimFan
Jim Fan
5 months
We live in such strange times. Apple, a company famous for its secrecy, published a paper with staggering amount of details on their multimodal foundation model. Those who are supposed to be open are now wayyy less than Apple. MM1 is a treasure trove of analysis. They discuss
Tweet media one
57
754
4K
0
0
13