A truth-seeking AI needs to know what it doesn't know.
This requires an *epistemic* neural network.
...
@ibab_ml knows this; he covets "the epinet" for Grok.
... could this be why @elonmusk is suing OpenAI?
... is this what @roon saw?
Listen to @TalkRLPodcast to find out:
Looking back over the year, the one paper that gave me the best "aha" moment was...
Reconciling Modern Machine Learning and the Bias-Variance Tradeoff:
The "bias-variance" you knew was just the first piece of the story!
This feels like a real breakthrough:
Take the same basic algorithm as AlphaZero, but now *learning* its own simulator.
Beautiful, elegant approach to model-based RL.
... AND ALSO STATE OF THE ART RESULTS!
Well done to the team at @DeepMindAI
#MuZero
Are you interested in #ThompsonSampling and #exploration, but looking for a good reference?
A Tutorial on Thompson Sampling
This tutorial covers the algorithm and its applications, illustrating the concepts through a range of examples... check it out!
Have you heard of "RL as Inference"?
... you might be surprised that this framing completely ignores the role of uncertainty!
(confusing, since it talks a lot about "posteriors")
Our #ICLR spotlight tries to make sense of this:
Here's my top tip for research:
Start with an example that is SIMPLE and EXTREME.
- SIMPLE: clean and clear example
- EXTREME: pushes the key issues to the limit
If you can stress test your ideas in these edge cases, it is much easier to port the key insights to complex tasks.
Really excited to release #bsuite to the public!
- Clear, scalable experiments that test core #RL capabilities.
- Works with OpenAI gym, Dopamine.
- Detailed colab analysis
- Automated LaTeX appendix
Example report:
We are excited to release Behaviour Suite for Reinforcement Learning, or ‘bsuite’ – a collection of carefully-designed experiments that investigate core capabilities of RL agents
GitHub:
Paper:
Another great paper for understanding generalization properties in the overparameterized regime:
Spectrally-normalized margin bounds for neural networks
Bartlett et al.
It does feel like the "dark arts" of neural nets are waning...
Really excited about this research...
The culmination of a lot of peoples' hard work!
- Cool insights on marginal/joint predictions
- Opensource code for a new testbed in the field
... I definitely learnt a lot working on this, you might too!
Does Bayesian deep learning work? The Neural Testbed provides tools to evaluate uncertainty estimates. These tools assess both the quality of marginal prediction per input & joint predictions given many inputs.
Github:
Paper: 1/
A lot of the value from ChatGPT comes in mundane/mindless drudgery, not deep thinking.
People like @GaryMarcus overlook the genuine value here:
- installing nvidia drivers
- sorting out messed up ruby version
- setting up custom domain name
Many such cases.
Einstein's paper on Brownian motion:
~4 pages A4, easy to follow, Nobel Prize
Self-Normalizing Neural Networks:
>100 pages, reams of numerical equations, SELU=slightly bent RELU
... @zacharylipton I don't know what you mean 🤷‍♂️
Reading about the Manhattan project, clear parallels to AI today—celebrated scientists churning out research under tremendous pressure. Only they shaped the future of energy, warfare, and the international world order... and we produced the Squish activation fn.
Excited to share some of our recent work!
Fine-Tuning Language Models via Epistemic Neural Networks
TL;DR: prioritise getting labels for your most *uncertain* inputs, match performance with 2x less data & better final performance
Discussion (1/n)
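The prioritisation in the TL;DR can be approximated with any model that exposes predictive uncertainty. A minimal sketch (my own toy stand-in using a small ensemble, NOT the paper's epinet): score each unlabeled input by how much the ensemble members disagree, and label the top-k first.

```python
import numpy as np

def select_most_uncertain(ensemble_probs, k):
    """Pick the k pool indices where ensemble members disagree most.

    ensemble_probs: array [n_models, n_inputs, n_classes] of predicted
    class probabilities. Disagreement = variance of the predictions
    across ensemble members, summed over classes.
    """
    disagreement = ensemble_probs.var(axis=0).sum(axis=-1)  # [n_inputs]
    return np.argsort(-disagreement)[:k]

# Toy pool: 3 ensemble members, 4 inputs, 2 classes.
probs = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.2, 0.8]],
    [[0.9, 0.1], [0.5, 0.5], [0.9, 0.1], [0.3, 0.7]],
    [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5], [0.2, 0.8]],
])
# Input 2 is the one the members genuinely disagree on (opposite
# classes), so it gets labeled first. Input 1 is "uncertain" per
# member but the members agree, so it scores zero here.
print(select_most_uncertain(probs, k=1))  # -> [2]
```

Note the design choice: disagreement *between* members, not per-member entropy, is what distinguishes "I don't know" from "the answer is genuinely 50/50".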
Excited to (finally) present our work on Epistemic Neural Networks as a spotlight for #NeurIPS23
"Get better uncertainty than an ensemble size=100 at cost less than 2x base models"
Poster 1924
We’ve acquired the MuJoCo physics simulator () and are making it free for all, to support research everywhere. MuJoCo is a fast, powerful, easy-to-use, and soon to be open-source simulation tool, designed for robotics research:
Big thanks to @pbloemesquire for a great tutorial:
Transformers from scratch
If (like me) you're excited about #GPT3 but found yourself waving your hands through various NN diagrams on self-attention... this is the cure! 🙌
I often hear that "deep learning was all invented in the 90s"...
But seems like many things didn't actually work before:
- ReLU instead of sigmoid
- ADAM instead of SGD
- Favourable weight initialization
I wonder if there are similar "tricks" holding back current research?
Better late than never...
"Deep Exploration via Randomized Value Functions" published in JMLR:
This paper presents RVF as a scalable approach to deep exploration with generalization in RL.
Proud of this work with Ben, Dan and Zheng!
"One weird trick" for DQN in large (continuous) action spaces:
- Initialize uniform action-sampling distribution.
- Choose sampled action with highest Q.
- Train sampling to produce "best action" + also some entropy.
- ... Works surprisingly well!
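The steps above fit in a few lines. A hedged numpy sketch (my own names and shapes, not the paper's code): sample candidate actions from a proposal distribution, score each with Q, and act with the best one. Training the proposal towards high-Q actions (+ entropy) happens elsewhere.

```python
import numpy as np

rng = np.random.default_rng(0)

def act(q_fn, proposal_sample, n_proposals=64):
    """Approximate argmax_a Q(s, a) in a large/continuous action space:
    sample candidates from a (learned) proposal, evaluate Q on each,
    take the best."""
    candidates = proposal_sample(n_proposals)           # [n, action_dim]
    q_values = np.array([q_fn(a) for a in candidates])  # [n]
    return candidates[np.argmax(q_values)]

# Toy example: Q peaks at action = 0.7 on a 1-D action space, with a
# uniform proposal standing in for the learned sampler.
q_fn = lambda a: -(a[0] - 0.7) ** 2
proposal = lambda n: rng.uniform(0.0, 1.0, size=(n, 1))
best = act(q_fn, proposal, n_proposals=256)
assert abs(best[0] - 0.7) < 0.1  # sampled argmax lands near the true peak
```

With enough proposals the sampled argmax is close to the true argmax; the point of *learning* the proposal is to need far fewer samples than uniform.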
Great stuff @dwf, @VladMnih!
Q-learning is difficult to apply when the number of available actions is large. We show that a simple extension based on amortized stochastic search allows Q-learning to scale to high-dimensional discrete, continuous or hybrid action spaces:
Totally agree:
The part that is hard for humans (symbolically solving the cube) is pretty easy for computers...
The part that is totally trivial for humans (twisting a cube with two hands) is still essentially impossible for RL robotics!
I find it funny folks are focusing on the symbolic challenge. The big challenge is attaching that hand to a moving controllable robot arm, and preferably having two coordinated hands learning diverse behaviours by RL, from sensors, with low sample complexity and in a safe manner.
Thought-provoking book, thanks @demishassabis:
The Order of Time
TL;DR:
Time as we know it (fundamentally ordered from past to future) does not exist.
Our perception of time is a side-effect of us residing in a low-entropy region of space + 2nd law.
We just updated our @NipsConference spotlight paper
"Randomized Prior Functions for Deep Reinforcement Learning"
If you're too lazy to read the paper... then just head to our accompanying website - we have #CODE + demos you can run in the browser!
Amazing work from everyone on the team... incredible what a great team working together can accomplish.
... did we mention that this is also available FOR FREE 🫡
GPT-4o is our new state-of-the-art frontier model. We’ve been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot 🙂. Here’s how it’s been doing.
This paper is not long, and very easy to read... so I definitely recommend it.
The combination of:
1) Simple and targeted experiments
2) Sane and sensible writing
3) Excellent figures
Helps to provide a lot of insight into #DeepLearning - more please!
It says a lot that I had to honestly check if this was a troll account...
As expected:
- Homology detection is not the same as protein prediction.
- People used neural nets for this before 2007.
- AlphaFold is not using an LSTM.
... @SchmidhuberAI it's not a good look for you!
Kunihiko Fukushima was awarded the 2021 Bower Award for his enormous contributions to deep learning, particularly his highly influential convolutional neural network architecture. My laudation of Kunihiko at the 2021 award ceremony is on YouTube:
Fantastic talk from @SebastienBubeck on the "Physics of AI":
- Intelligence has emerged: why? how?
- Let's study this with *controlled experiments* and *toy models*
- Clean and clear insights that peer slightly behind the magic curtain
As part of the #bsuite release, we also include bsuite/baselines:
These are simple, clear, and correct agent implementations in #TF1, #TF2 and #JAX ... many in under 100 lines of code!
We built bsuite to do two things:
1. Offer clear, informative, and scalable experiments that capture key issues in RL
2. Study agent behaviour through performance on shared benchmarks
You can get started with bsuite in this colab:
And once you've been through @pbloemesquire's tutorial, you have to check out @karpathy's tutorial code:
Focus on the key points, #simple, #sane, and such a valuable resource in teaching... this stuff is really great!
Big thanks to @pbloemesquire for a great tutorial:
Transformers from scratch
If (like me) you're excited about #GPT3 but found yourself waving your hands through various NN diagrams on self-attention... this is the cure! 🙌
@_aidan_clark_ this is a classic case of conflating *a bad RL algorithm* (policy gradient?) vs *the RL problem*...
You're highlighting efficient exploration as one of the outstanding problems to solve - I agree.
... and that's something that's only really studied in RL!
I got disillusioned with RL when I realized that it was always:
step 1: act randomly for ~years worth of data before stumbling upon a reward
step 2: figure out how to repeat that action in a generalizable way
.... and no one had good ideas for improving step 1
Paper summary:
- Tabular Q-learning converges to optimal with infinite data.
- You might hope Q-learning + function approx converges similarly to the best policy in that class.
- But actually that's not true... Basically because MDP with function approx ~= POMDP
#NeurIPS2018
Congratulations to Google researchers @tylerlu, @CraigBoutilier and Dale Schuurmans, whose paper “Non-delusional Q-learning and Value Iteration” has received a #NeurIPS2018 Best Paper Award! Check it out at .
Great talk from @jacobmbuckman on STEVE - stochastic ensemble value expansion.
"If you want to roll forward a model, it's important to incorporate uncertainty estimates - and bootstrap ensemble works well for this"
Nice work, and very clear+engaging talk!
#NeurIPS2018
Today we're sharing structure predictions for six proteins associated with the virus that causes COVID-19, generated by the most up-to-date version of our AlphaFold system. We hope this contributes to the research community’s understanding of the virus:
If you are submitting an RL paper to AAAI, you should include a #bsuite evaluation (+ automated LaTeX appendix).
- Paper:
- Github:
- Report:
If you're interested, but having trouble then get in touch!
It's nice that @SchmidhuberAI is using his fame/expertise/brainpower to tackle the important issues:
❌ COVID-19 Pandemic
❌ Black Lives Matter
❌ Global Warming
❌ Existential risks of AI
❌ Any research post "annus mirabilis"
✔️ The 2018 Turing award
... really?
ACM lauds the awardees for work that did not cite the origins of the used methods. I correct ACM's distortions of deep learning history and mention 8 of our direct priority disputes with Bengio & Hinton.
#selfcorrectingscience
The GOAT of tennis @DjokerNole said: “35 is the new 25.” I say: “60 is the new 35.” AI research has kept me strong and healthy. AI could work wonders for you, too!
According to @ylecun #neurips2018
RL gets one scalar = weak signal
"self supervised" = strong signal
But to succeed in RL you have to understand state, transitions, and how the world works!
Rewards help shape what you care about, but it's so very far from the "only" signal in RL
Great talk from Ben Van Roy at the #NeurIPS2019 workshop on optimization for RL.
Is it time for the field to move beyond "MDP"?
Thinking about "agent state" might be a better perspective for learning in complex worlds... the real world "state" is just too complex!
Our most recent work is out in Nature! We're reporting on (reinforcement) learning to navigate Loon stratospheric balloons and minimizing the sim2real gap. Results from a 39-day Pacific Ocean experiment show RL keeps its strong lead in real conditions.
#MachineLearning conference review burdens are getting out of control... too many low quality submissions + reviews!
Here's a controversial solution:
- $100 fee to submit a paper for review
- Waived for papers that pass some "quality bar"
- Use proceeds to fund D&I initiatives
@__nmca__ @JAslanides @geoffreyirving
If you're interested in:
- Uncertainty
- Alignment
- RL from human feedback
- Language models
Recent papers:
Consider applying for internships/positions in the "Efficient Agent Team" working in MTV (with @ibab_ml @goodfellow_ian nearby) ;D
(5/5)
100% another great paper in this area from @mrtz @OriolVinyalsML and more:
Understanding Deep Learning Requires Rethinking Generalization
I love something that gets the conversation (or controversy) going! 😜
Large model != Poor generalization
Excited to kick off the Deep Reinforcement Learning theory workshop at the Simons Institute today, co-organized with @LihongLi20. Today's topic is Offline reinforcement learning 🔥 Schedule is here:
I actually don't think this is controversial... And I'm definitely "team Bayes"
Yes, an independent Gaussian prior over NN weights is nonsense... We know the *interaction* is the most important part!
But there's still huge potential for effective Bayesian deep learning!
Welcome @SchmidhuberAI to Twitter!
Approaching 10k followers... But yet to follow a single account... Who will be first?
Ivakhnenko and Fukushima seem more likely than @ylecun and @geoffreyhinton.
Thanks a lot to @robinc for hosting me + doing such a great job driving the discussion!
As mentioned in the podcast, I am especially interested to hear from people who are NOT on the same page as me...
Episode 49: Ian Osband @IanOsband, research scientist at OpenAI (ex @GoogleDeepMind, @Stanford) on decision making under uncertainty, information theory in RL, uncertainty, joint predictions, epistemic neural networks and more!
A surprising deep learning mystery:
Contrary to conventional wisdom, performance of unregularized CNNs, ResNets, and transformers is non-monotonic: improves, then gets worse, then improves again with increasing model size, data size, or training time.
I missed this paper when it came out!
Really glad that @vincefort brought it to my attention...
Even if ensemble + prior function is not the precise posterior at least it's not overconfident... and it will eventually concentrate with data. 🥳
You could still drive efficient exploration in an Actor-Critic algorithm though... and use policy gradient as a sub-procedure.
For example, you could keep a distribution (or ensemble) of plausible value functions, and optimize a policy for each of these.
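That ensemble-of-critics idea can be sketched in a few lines. A toy, hedged version (my own naming, not from any particular paper): keep K (actor, critic) pairs and, at the start of each episode, sample one pair to act with — Thompson sampling over plausible value functions.

```python
import random

class EnsembleActorCritic:
    """K independently-initialised (actor, critic) pairs.

    Each episode, sample one pair uniformly and act with it.
    Disagreement between critics drives deep exploration, while each
    actor is still trained by ordinary policy gradient against its own
    critic (training loop omitted here).
    """
    def __init__(self, make_actor, make_critic, k=10, seed=0):
        self.members = [(make_actor(), make_critic()) for _ in range(k)]
        self.rng = random.Random(seed)
        self.active = None

    def begin_episode(self):
        self.active = self.rng.choice(self.members)

    def act(self, observation):
        actor, _critic = self.active
        return actor(observation)

# Toy usage: "actors" are constant policies returning their own index.
idx = iter(range(10))
agents = EnsembleActorCritic(
    make_actor=lambda: (lambda obs, i=next(idx): i),
    make_critic=lambda: None,
)
agents.begin_episode()
print(agents.act(None))  # some sampled member's action, an int in 0..9
```

The key point from the thread survives even in this toy: policy gradient is only a *sub-procedure* per member; the exploration comes from sampling which member acts.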
@adityamodi94 @nanjiang_cs @HaqueIshfaq
Actually, I don't think this is a coincidence...
If you want to explore efficiently, you first need to be able to reason counterfactually: "what might things be like if I went and did XYZ?".
Basic policy gradient is not going to be able to do this effectively.
We also #opensourced all the #code for the book:
Recently upgraded from Py2 -> Py3 and made sure everything was still running as expected 😆
As an added bonus, you can now run this all in your browser without installing anything!
One problem that hierarchical RL has is that it's not totally clear how it *could* pan out convincingly...
(Separate from standard RL)
If we could distil some simple examples that embody what it means to be "good at hierarchical RL" that would be a great first step!
There's a handful of ML ideas that just *feel right*---perhaps due to evoking some aspect of human learning?---that keep recurring but never seem to have panned out convincingly. Here's two: (1) curriculum learning; (2) hierarchical reinforcement learning. (Dis)agree? Got others?
Missed this one at the time @SebastienBubeck!
The videos from the #ICML2018 workshop on #exploration are all online:
Please get in touch - especially if there are parts you disagree with! ;D
Big thanks to Ben Van Roy, who I think really cultivates this way of thinking analytically...
Ben's way of thinking is even called out in:
... and honoured to say that, believe it or not, bsuite even gets a shout-out in the book! 🤖🧠🥳
We're releasing "Dota 2 with Large Scale Deep Reinforcement Learning", a scientific paper analyzing our findings from our 3-year Dota project:
One highlight — we trained a new agent, Rerun, which has a 98% win rate vs the version that beat @OGEsports.
... of course this "control perspective" completely ignores one of the biggest questions in reinforcement learning: EXPLORATION.
If you're interested in how/why this is such a problem - come to the keynote talk "what is exploration"
Sunday 9am #ICML2018
Very lucky to get a last-minute invite to the RL workshop on predictive intelligence. A week of workshops, discussion and debate on #AI, #RL with a lot of heavy hitters... Oh yeah and it's also in #barbados with snorkel breaks 🐢 #bellairs #fresh
This is really how I think of the bsuite project:
We want to collect the most simple/extreme problems in core reinforcement learning research.
... bonus points if they are *scalable*, so that the level of extreme-ness can be dialed up/down
We're releasing Procgen Benchmark, 16 procedurally-generated environments for measuring how quickly a reinforcement learning agent learns generalizable skills.
This has become the standard research platform used by the OpenAI RL team:
@GaryMarcus Also... think you probably know this... but company valuations are typically dominated by *future* earnings.
Agree that people are betting on big growth in the sector - maybe you should start "shorting" these companies!
You could become quite rich.
Go-Explore attains *by far* the best scores on #MontezumaRevenge - impressive!
However, we should be clear about what the goal of research in #RL (and #exploration in particular) is:
There is plenty of room for all this research in #MachineLearning
Thrilled to announce our first major breakthrough in applying AI to a grand challenge in science. #AlphaFold has been validated as a solution to the ‘protein folding problem’ & we hope it will have a big impact on disease understanding and drug discovery:
Bellmansplaining: take big deep neural networks, train supervised from human data and a huge amount of tinkering +1000x more data/compute than before, declare fundamental breakthroughs due to RL research.
Bayesplaining: take a well established method, express it as a series of crude approximations to a Bayesian approach, throw it back at the community where it was invented.
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
Cool new #RL competition: learn to mine a diamond in Minecraft in 4 days of CPU training.
... but something really triggers me about calling this competition "sample-efficient"... they limit your #COMPUTE *not* your #DATA ... why not limit the number of frames??
What would you do with so much money?
Why not start with something small and demonstrate scaling properties: theorems, experiments.
Then, come back and ask for more money with a clear plan.
Didn't people already give you $$$ for Geometric Intelligence?
Don't you have tenure?
Suppose just for a second that Domingos (and I, and many others) were correct that neurosymbolic AI was one of the most promising research directions, and further suppose that we lived in a world in which people trying to pursue that research direction couldn’t get 1% of the
Pretty disappointed in @yaringal after I tried to work together!
But, if we're doing an #ML #showdown ... let's do points not typos:
- Dropout "posteriors" give bad decisions.
- Doesn't even pass linear sanity checks!
- Alternative?
Get it going @slashML @yaringal
Would have preferred to do this via email, but:
- lambda/d should be lambda/np in (6), thanks!
- this typo in the appendix doesn't affect *any* other statements/proof.
- "concrete" dropout does not address the issues we highlight.
- happy to add this baseline for clarification.
@svlevine
Favourite quote from Emo Todorov at #NeurIPS2018 on hearing that #bostondynamics has started using some reinforcement learning: “Oh good! That will slow them down.”
Excited for the final day of workshops at #ICML2018!
If you're interested in what I have to say:
9am - "What is Exploration" (Exploration)
11.30am - "Deep Exploration via Randomized Value Functions" (PGM)
4.30pm - Panel Discussion (Exploration)
Particularly if you disagree! ;D
It's all well and good pushing for advanced reasoning, and robustness to academic trolling...
But even in their current form, it's a super valuable tool, and I think you'd be mad to exclude it from the path to AGI.
> "you wouldn't ask Terence Tao how to fix your nvidia driver"
Congratulations to Leon Bottou and @obousquet for the #NeurIPS2018 test of time award:
Outlining the benefits of imperfect (but fast) SGD vs batch training.
Particularly good talk from @obousquet ... Would recommend watching the recording!
For some idea of what we've done in our first 6m:
Roadmap for agent:
Network architectures:
Rethinking Bayesian Deep Learning:
We have a fantastic team just getting started.
Hustlers wanted, PhD optional.
Example: I had a website I set up ~8y ago while hunting for a job with @GoogleDeepMind
It was out of date, and I'd completely forgotten the arcane CSS/Jekyll/Ruby I had used to make it... let alone customize the domain.
A few minutes later: 📈
Awesome results from @OpenAI:
"use prediction error on a random network as a bonus for exploration."
You could even call this a follow-up on our @NIPS 2018 spotlight paper:
"Randomized Prior Functions for Deep Reinforcement Learning"
Very impressive!
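The bonus described in that tweet (random network distillation) is only a few lines. A hedged numpy sketch of the idea, not OpenAI's implementation (linear networks stand in for the real ones):

```python
import numpy as np

rng = np.random.default_rng(0)

class RNDBonus:
    """Exploration bonus = prediction error on a fixed random network.

    A frozen random target network maps states to feature vectors; a
    predictor is trained to imitate it on visited states. Rarely-seen
    states have a large error ||pred(s) - target(s)||^2, which is added
    to the reward as a novelty bonus.
    """
    def __init__(self, state_dim, feat_dim=8, lr=0.5):
        self.W_target = rng.normal(size=(state_dim, feat_dim))  # frozen
        self.W_pred = np.zeros((state_dim, feat_dim))           # trained
        self.lr = lr

    def bonus(self, s):
        err = s @ self.W_pred - s @ self.W_target
        return float((err ** 2).sum())

    def update(self, s):
        # One gradient step of the predictor towards the target features.
        err = s @ self.W_pred - s @ self.W_target
        self.W_pred -= self.lr * np.outer(s, err)

rnd = RNDBonus(state_dim=4)
s = np.array([1.0, 0.0, 0.0, 0.0])
before = rnd.bonus(s)
for _ in range(50):
    rnd.update(s)
after = rnd.bonus(s)
assert after < before  # bonus shrinks for frequently-visited states
```

The connection to prior functions: in both cases a *fixed random function* supplies the variation, and training only ever reduces the mismatch on visited data.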
NVIDIA Research developed a #deeplearning model that turns rough doodles into photorealistic masterpieces. Like a smart paintbrush, this GAN-based tool converts segmentation maps into life-like images:
#GTC19
Yoshua Bengio, Geoffrey Hinton and Yann LeCun, the fathers of #DeepLearning, receive the 2018 #ACMTuringAward for conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing today.
I mostly agree... But my most recent experience with JMLR took well over a year for the first review!
I'm not sure it's always worth it to wait that long for a high quality gradient update vs many more noisy SGD steps via conference.
Every time I get back reviews from JMLR, I'm just blown away by the quality (as compared to the typical reviews from an ML/AI conference). The questions/comments are often informative to the point they can really be seen as a contribution to the paper itself!
The key technology here is the ability to estimate model uncertainty in a language model.
To do this, we use a new type of network architecture called an *epinet* = a small additional network designed to estimate uncertainty.
(2/n)
ChatGPT (+ other LLMs) take actions grounded in the real world:
interacting with human users to satisfy their requests
Things really are backwards if you think that playing Goat Simulator 3 for thousands of years of simulated gameplay to finally reach 200%-relative simulated
LLMs are amazing but they’re not grounded in external, embodied environments. That’s why I’m excited to finally be able to talk about the project I’ve been working on for over a year: SIMA, an agent that can follow natural language in video games!
What actually constitutes a good representation for reinforcement learning? Lots of sufficient conditions. But what's necessary? New paper: . Surprisingly, good value (or policy) based representations just don't cut it! w/ @SimonShaoleiDu @RuosongW @lyang36
Great to see high-quality software open source from @berkeley_ai! 👏
But why do these #RL frameworks end up with so many complex Agent interfaces:
(OpenAI Baselines + Dopamine are similar)
Why not:
- agent.act(observation)
- agent.observe(transition)
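The two-method interface suggested above can be written down directly. A minimal sketch (everything beyond `act`/`observe` is my own naming):

```python
import random

class Agent:
    """Minimal RL agent interface: act on an observation, observe a
    transition. All learning machinery stays behind these two calls."""
    def act(self, observation):
        raise NotImplementedError

    def observe(self, transition):
        raise NotImplementedError

class RandomAgent(Agent):
    """Uniform-random baseline over a discrete action set."""
    def __init__(self, n_actions, seed=0):
        self.n_actions = n_actions
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.randrange(self.n_actions)

    def observe(self, transition):
        pass  # a learning agent would update from (s, a, r, s') here

agent = RandomAgent(n_actions=4)
a = agent.act(None)
agent.observe((None, a, 0.0, None))
assert 0 <= a < 4
```

The appeal is that the environment loop never needs to know whether the agent is tabular, deep, distributed, or random: it only ever calls `act` and `observe`.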
New reinforcement learning library rlpyt in pytorch thanks to Adam Stooke from @berkeley_ai (and previously an intern with me at @DeepMindAI). There is a whole suite of RL algorithms implemented and a framework for small and medium scale distributed training.
We live in such strange times. Apple, a company famous for its secrecy, published a paper with staggering amount of details on their multimodal foundation model. Those who are supposed to be open are now wayyy less than Apple.
MM1 is a treasure trove of analysis. They discuss