Cong Lu

@cong_ml

Followers: 1,171
Following: 978
Media: 32
Statuses: 241

Postdoctoral Research Fellow @UBC_CS, working on open-ended RL and AI for Scientific Discovery. Prev: PhD @UniofOxford, RS Intern @Waymo, @MSFTResearch!

Vancouver, British Columbia
Joined October 2019
Pinned Tweet
@cong_ml
Cong Lu
9 days
It’s been a dream of mine since I started in ML to see autonomous agents conduct research independently and discover novel ideas! 💡 Today we take a large step towards making this a reality. We introduce *The AI Scientist*, led together with @_chris_lu_ and @RobertTLange. [1/N]
@SakanaAILabs
Sakana AI
9 days
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI…
@cong_ml
Cong Lu
1 year
RL agents 🤖 need a lot of data, which they usually need to gather themselves. But does that data need to be real? Enter *Synthetic Experience Replay*, leveraging recent advances in #GenerativeAI to vastly upsample ⬆️ an agent’s training data! [1/N]
@cong_ml
Cong Lu
6 months
🚨 Model-based methods for offline RL aren’t working for the reasons you think! 🚨 In our new work, led by @anyaasims, we uncover a hidden “edge-of-reach” pathology which we show is the actual reason why offline MBRL methods work or fail! Let's dive in! 🧵 [1/N]
@cong_ml
Cong Lu
3 months
I am extremely excited to share Intelligent Go-Explore, a robust exploration framework for foundation model agents! 🤖 It was a delight to work with @shengranhu and @jeffclune on this! 📜 Paper: 🌐 Website and Code:
@jeffclune
Jeff Clune
3 months
Excited to introduce Intelligent Go-Explore: Foundation model (FM) agents have the potential to be invaluable, but struggle to learn hard-exploration tasks! Our new algorithm drastically improves their exploration abilities via Go-Explore + FM intelligence. Led by @cong_ml 🧵1/
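For readers curious how the loop works, here is a minimal Python sketch of the Go-Explore pattern with a foundation model standing in for hand-designed heuristics, as the threads above describe. The fm.choose_state, fm.act, and fm.is_interesting calls are hypothetical stand-ins, not the released API.

def intelligent_go_explore(env, fm, n_iterations=100, horizon=32):
    # Archive of promising states discovered so far. The foundation
    # model (fm) replaces Go-Explore's hand-crafted heuristics for
    # both state selection and "interestingness" judgments.
    archive = [env.reset()]
    for _ in range(n_iterations):
        state = fm.choose_state(archive)   # "go": pick a promising state
        env.restore(state)                 # assumes a resettable env
        for _ in range(horizon):           # "explore" from that state
            state, _, done, _ = env.step(fm.act(state))  # gym-style step
            if fm.is_interesting(state, archive):
                archive.append(state)
            if done:
                break
    return archive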
@cong_ml
Cong Lu
9 months
Super excited to share that I'll be starting as a postdoc @UBC_CS with @jeffclune this January, working on advancing open-endedness with large language/multimodal models and deep RL! 🤩 I'll be at NeurIPS next week and would love to discuss any of those topics. I'll also be...
@cong_ml
Cong Lu
8 months
Cya #NeurIPS2023!! It’s been a blast!
@cong_ml
Cong Lu
2 years
Delighted that our paper won the *Outstanding Paper Award* at #LDOD at #RSS2022! 🥳 Thanks to the organizers for an amazing event! Paper + Code + Data: Joint with my amazing collaborators 🥰: @philipjohnball @timrudner @jparkerholder @maosbot @yeewhye
@cong_ml
Cong Lu
2 years
Offline RL offers tremendous potential for training agents from large pre-collected datasets. However, the majority of work focuses on the proprioceptive setting. In this work we release the first public benchmark for continuous control using *visual observations*, V-D4RL. [1/N]
@cong_ml
Cong Lu
1 year
We've now released code for this project at ! We think the potential of synthetic data for sample efficiency and robustness is huge and can't wait to see what people do with it! In other news, we've extended the paper with pixel-based experiments... [1/2]
[Quoted tweet: the *Synthetic Experience Replay* thread above.]
@cong_ml
Cong Lu
9 days
I love this visualisation of where we were at the start of the project (no LaTeX, only Markdown, only a few experiments). Our current version of The AI Scientist is the worst it will ever be. 🚀🚀🚀
@_chris_lu_
Chris Lu
9 days
Although The AI Scientist still makes basic mistakes, its performance will only improve from here. We see this work as being similar to early developments in GenAI, where basic mistakes in image generation were quickly overcome. Our initial manuscripts looked like this:
@cong_ml
Cong Lu
10 months
Thank you so much - I was so incredibly fortunate to have spent these years in Oxford under your and @maosbot's supervision, and will always treasure the fun discussions and lessons learned throughout!
@yeewhye
Yee Whye Teh
10 months
Congratulations @cong_ml on defending his DPhil dissertation! Excellent work throughout the past few years, and thanks to examiners @j_foerst @_rockt!
@cong_ml
Cong Lu
1 year
Delighted that V-D4RL has been accepted at TMLR! Our benchmark and algorithms are the perfect way to start studying offline RL from pixels. As performance in proprioceptive envs saturates, it’s increasingly necessary to look further! 🧐 Here are some notable uses so far… [1/N]
@cong_ml
Cong Lu
11 months
Delighted that this piece of work was accepted to #NeurIPS2023! Excited to chat about it in New Orleans ✈️✈️
[Quoted tweet: the *Synthetic Experience Replay* thread above.]
@cong_ml
Cong Lu
8 months
Come catch us at poster #1409 now!
@cong_ml
Cong Lu
1 year
Will be presenting our spotlight at Reincarnating RL @iclr_conf on generating synthetic data for RL with diffusion models at 10:40AM tomorrow! If you can't make it, here's the pre-recorded talk: Paper: #ICLR2023 #GenerativeAI
@cong_ml
Cong Lu
4 months
Super excited to share our new work led by @JacksonMattT showing that policy guidance + trajectory diffusion models produce extremely strong RL training data! 💥💥 As an added bonus, our code comes with JAX implementations of offline RL algorithms and diffusion upsampling! 🚀
@JacksonMattT
Matthew Jackson
4 months
🎮 Introducing the new and improved Policy-Guided Diffusion! Vastly more accurate trajectory generation than autoregressive models, with strong gains in offline RL performance! Plus a ton of new theory and results since our NeurIPS workshop paper... Check it out ⤵️
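As a rough illustration of the idea in these threads: trajectory-level denoising nudged towards the target policy, in the style of classifier guidance. The tensor shapes and the policy.log_prob / model.denoise interfaces are assumptions for this sketch, not the paper's exact formulation.

import torch

def policy_guided_step(model, policy, traj, t, guidance_scale=1.0):
    # One reverse-diffusion step over a whole trajectory tensor,
    # biased towards actions the target policy assigns high
    # log-likelihood to (classifier-guidance style).
    traj = traj.detach().requires_grad_(True)
    log_prob = policy.log_prob(traj).sum()     # hypothetical interface
    grad = torch.autograd.grad(log_prob, traj)[0]
    with torch.no_grad():
        denoised = model.denoise(traj, t)      # base diffusion update
    return denoised + guidance_scale * grad    # shift towards on-policy data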
@cong_ml
Cong Lu
2 years
No better time to start on offline RL from pixels! V-D4RL is now on @huggingface at 💥 New D4RL-style visual datasets! 💥 Competitive baselines based on Dreamer and DrQ! 💥 A set of exciting open problems! Thanks @Thom_Wolf for the idea 😻
[Quoted tweet: the #RSS2022 *Outstanding Paper Award* announcement above.]
@cong_ml
Cong Lu
2 years
If you’re attending #AIIDE22 and are interested in efficient, scalable, and simple-to-implement game testing, come to the #EXAG2022 workshop where I’ll be presenting our paper on Go-Explore for automated reachability testing at 11:20AM PDT! Paper 📜:
@cong_ml
Cong Lu
8 months
Come chat to us at the #NeurIPS2023 Robot Learning Workshop in Hall B2 about policy-guided diffusion! Super exciting work showing that guided diffusion enables long-sequence on-policy synthetic data for training agents! 🚀🚀
@JacksonMattT
Matthew Jackson
8 months
Come check out a sneak peek of our work **Policy-Guided Diffusion** today at the NeurIPS Workshop on Robot Learning! Using offline data, we generate entire trajectories that are: ✅ On-policy, ✅ Without compounding error, ✅ Without model pessimism!
@cong_ml
Cong Lu
2 years
Come chat to us now at the #icml DARL workshop about simple baselines and the first public benchmark for offline continuous control from pixels! In Hall G 🥳
@cong_ml
Cong Lu
3 months
So cool to see people building agents with Intelligent Go-Explore already!! 🚀🚀🚀
@AtlantisPleb
Christopher David ⚡️
3 months
Magency: my project for the @craft_ventures agents hackathon 🤖👇 There is no good way to control AI agents on a mobile phone. There should be an app for that! Magency is a mobile app that lets you "make a wish", aka just say what you want into one text box and see a feed of…
@cong_ml
Cong Lu
20 days
Delighted that research from our lab was featured in Science News! Great read about harnessing large language models to create open-ended learning systems!
@jeffclune
Jeff Clune
20 days
OMNI-EPIC & Intelligent Go-Explore in Science News! "Both works are significant advancements towards creating open-ended learning systems," - Tim Rocktäschel. Led by @jennyzhangzt @maxencefaldor & @conglu. Quotes @j_foerst & @togelius too. Thx @SilverJacket!
@cong_ml
Cong Lu
3 years
Really excited about this recent work, featuring in #ICML2021, on meta-learning task exploration in agent belief space!
@whi_rl
WhiRL
3 years
Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning - @luisa_zintgraf , @lylbfeng , @cong_ml , @MaxiIgl , @kristianhartika , @katjahofmann , @shimon8282
@cong_ml
Cong Lu
2 months
In many realistic imitation learning settings, we often have differences in observations between experts and imitators, e.g. when experts have privileged information. Excited to share our new work towards a principled Bayesian solution to resolving such imitation gaps! 😍
@ristovuorio
Risto Vuorio
2 months
Demonstrating the desired behavior is often easier than defining a good reward function. However, when the demonstrator observes the world differently than the imitator, imitation learning can fail. ➡️In our new pre-print, we propose a Bayesian solution to this imitation gap.
@cong_ml
Cong Lu
2 years
Excited to share our ICLR spotlight on revisiting 🤔 design choices in offline MBRL: later today! In the meantime, check out our light-hearted video introducing the paper: 😀 @philipjohnball @jparkerholder @maosbot, Steve Roberts
@philipjohnball
Philip J. Ball
3 years
Model-based approaches have recently shown SOTA performance in the offline RL setting, typically by penalizing regions with dynamics uncertainty. But how well are current methods actually doing this? 1/
@cong_ml
Cong Lu
3 months
Extremely excited to share our new work led by @GunshiGupta and @KarmeshYadav showing that pretrained diffusion models provide powerful vision-language representations for control tasks that drive efficiency and generalization! All code open-sourced at:
@GunshiGupta
Gunshi Gupta
3 months
Excited to be giving a contributed oral talk tomorrow at the #GenAI4DM Workshop at #ICLR2024 about our latest work harnessing pre-trained diffusion models as vision-language representation learners that excel across a wide variety of control tasks! Details in 🧵 below!
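A loose sketch of the recipe as I read the thread: run an image and a task description through a frozen text-conditioned diffusion backbone and pool intermediate activations into a state representation for a policy. Every interface here (frozen_unet, return_features) is a hypothetical stand-in, not the released code.

import torch

def vision_language_features(frozen_unet, image_latent, text_emb, t=0):
    # Hypothetical sketch: one forward pass of a frozen text-conditioned
    # denoising network; intermediate activations act as a fused
    # vision-language representation for downstream control.
    with torch.no_grad():
        feats = frozen_unet(image_latent, t, text_emb,
                            return_features=True)  # assumed flag
    return feats.flatten(start_dim=1)              # per-example feature vector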
@cong_ml
Cong Lu
3 years
Come chat to us @jparkerholder @philipjohnball at the #ICLR SSL-RL Workshop about generalisation to environments with changed dynamics from offline data on a single environment with Augmented World Models! Gathertown Link: Paper:
@cong_ml
Cong Lu
1 year
We are excited about scaling this work to more settings! This is joint work with awesome co-authors: @philipjohnball, @jparkerholder 🥰. Come chat with us at the Reincarnating RL Workshop at @iclr_conf or get in touch! Paper: [7/N]
@cong_ml
Cong Lu
2 years
Super excited by our recent work massively expanding the scope of PBT methods in RL! 💥 Joint adaptation of architecture and hyperparameters 💥 Treating the *whole* RL hyperparameter space with trust-region BO 💥 Massive improvements over the prior PBT baselines, all code online!
@wanxingchen_
Xingchen Wan
2 years
(1/7) Population Based Training (PBT) has been shown to be highly effective for tuning hyperparameters (HPs) for deep RL. Now with the advent of massively parallel simulators, there has never been a better time to use these methods! However, PBT has a couple of key problems…
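A generic sketch of the PBT skeleton this thread builds on; the usual random hyperparameter perturbation is swapped for a suggestion function, with bo_suggest standing in (hypothetically) for the trust-region Bayesian-optimization step.

import random

def pbt_step(population, bo_suggest, exploit_frac=0.25):
    # population: list of dicts with "weights", "hparams", "score".
    population.sort(key=lambda m: m["score"], reverse=True)
    k = max(1, int(len(population) * exploit_frac))
    for loser in population[-k:]:
        winner = random.choice(population[:k])
        loser["weights"] = winner["weights"]      # exploit: copy weights (deep-copy in practice)
        loser["hparams"] = bo_suggest(population) # explore: BO proposal instead of random perturbation
    return population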
@cong_ml
Cong Lu
3 years
Really excited to share our recent work with @philipjohnball, @jparkerholder and Steve Roberts on dynamics generalisation from offline data collected in a single environment! To appear as a spotlight in the #ICLR2021 SSL-RL Workshop 😃
@jparkerholder
Jack Parker-Holder
3 years
The case for offline RL is clear: we often have access to real world data in settings where it is expensive (and potentially even dangerous) to collect new experience. But what happens if this offline data doesn’t perfectly match the test environment? [1/8]
@cong_ml
Cong Lu
3 years
Come chat to us at C0 about generalisation to new tasks from offline data on a single environment with AugWM! #ICML2021
@oxcsml
OxCSML
3 years
Spotlight presentation in Reinforcement Learning 5, Wed 21 Jul 02:00–03:00 BST (Tues 6 p.m. PDT). Poster Session 2: Wed 21 Jul 04:00–07:00 BST (Tues 8 p.m.–11 p.m. PDT). @philipjohnball, @cong_ml, @jparkerholder, Stephen Roberts #ICML2021
@cong_ml
Cong Lu
4 years
Excited to present some recent work with @timrudner, @maosbot and @yaringal at the #NeurIPS2020 BDL Meetup today at 12 & 5pm GMT! Join us at !
@timrudner
Tim G. J. Rudner
4 years
A Probabilistic Perspective on Pathologies in Behavioral Cloning for Reinforcement Learning with @cong_ml, @maosbot and @yaringal 4/5
@cong_ml
Cong Lu
3 months
Come see us tomorrow at the contributed orals at the #GenAI4DM workshop, talking about leveraging text-to-image diffusion models as vision-language representation learners for control! #ICLR2024
@rl_agent
Lisa Lee
3 months
Also looking forward to the Contributed Oral Talks at #GenAI4DM workshop at #ICLR2024: Do Transformer World Models Give Better Policy Gradients? Authors: @michel_ma_ @twni2016 Clement Gehring @proceduralia @pierrelux Pretrained Text-to-Image Diffusion…
@cong_ml
Cong Lu
3 years
Come chat to us now @ D5, RL4RL workshop, about our new work revisiting uncertainty quantification in offline MBRL and showcasing new SOTA results on D4RL MuJoCo 😻 Paper: @philipjohnball @jparkerholder @maosbot, Stephen Roberts
@cong_ml
Cong Lu
8 days
Thanks for having me! Super fun discussion on The AI Scientist! ❤️
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
8 days
WE ARE STARTING IN 15 MIN! We have a great list of papers and guests! The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery - first author @cong_ml will be presenting! Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model…
@cong_ml
Cong Lu
3 years
Check out our recent work revisiting design choices in offline model-based reinforcement learning! 🤔 arXiv: With fantastic collaborators: @philipjohnball @jparkerholder @maosbot Stephen Roberts!
[Quoted tweet: Philip J. Ball's offline MBRL thread above.]
@cong_ml
Cong Lu
1 year
(cont.) with data generated in latent space! Our experiments show a solid performance gain out of the box on the standard *V-D4RL* datasets, with lots more untapped potential! Check it out here: [2/2]
@cong_ml
Cong Lu
9 days
Agreed, truly a phenomenal team to envision the future of scientific discovery with!!! 🥰🥰🥰
@RobertTLange
Robert Lange
9 days
Time to retire @SchmidhuberAI? 📹: Jokes aside - this project has been soo much fun! @_chris_lu_ and @cong_ml made this one of the best collabs I've had so far 🥰
@cong_ml
Cong Lu
1 year
Our algorithm is conceptually simple and is compatible with *any RL algorithm* utilizing experience replay! Synthetic data generated by a diffusion model may simply be added to the replay buffer and trained on as if it were real experience. [2/N]
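A minimal sketch of the mechanism described in this tweet: synthetic transitions are appended to the same buffer as real ones and sampled identically, so the downstream RL algorithm is unchanged. The generative_model.sample interface is an illustrative assumption.

import random

class ReplayBuffer:
    def __init__(self, capacity=1_000_000):
        self.data, self.capacity = [], capacity

    def add(self, transition):        # (obs, act, rew, next_obs, done)
        self.data.append(transition)
        if len(self.data) > self.capacity:
            self.data.pop(0)

    def sample(self, batch_size):
        return random.sample(self.data, batch_size)

def upsample(buffer, generative_model, n_synthetic):
    # Diffusion-generated transitions are added exactly like real ones;
    # any replay-based agent (SAC, TD3, DQN, ...) trains on them as-is.
    for transition in generative_model.sample(n_synthetic):
        buffer.add(transition)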
@cong_ml
Cong Lu
10 months
I'm also extremely grateful to @j_foerst and @_rockt for putting me through my paces and for an incredibly informative discussion! Thanks as well to all my collaborators and mentors who guided me along the way, as well as my friends and family! 🥰 Stay tuned for what's next!!
@cong_ml
Cong Lu
1 year
Agents trained with synthetic data perform as well as agents trained with much more real data! With *no algorithmic changes needed at all*, simple RL agents with synthetic data beat carefully designed data-efficient algorithms and prior data augmentation methods. [3/N]
@cong_ml
Cong Lu
6 months
We are excited about what this unified perspective of offline RL might mean for future work! To find out more, please check out our paper: , and code: . Thanks again to my amazing co-authors @anyaasims @yeewhye! 🥰🥰 [N/N]
@cong_ml
Cong Lu
2 years
We hope this work can springboard progress in this very nascent field! Work done with some awesome collaborators: @philipjohnball @timrudner @jparkerholder @maosbot @yeewhye. 🥳 [8/N]
@cong_ml
Cong Lu
9 months
... presenting our new work on efficiently training RL agents with synthetic generative data () at Poster Session 2 on Tuesday. Do come say hi! 👋
[Quoted tweet: the *Synthetic Experience Replay* thread above.]
@cong_ml
Cong Lu
6 months
These edge-of-reach states trigger catastrophic value overestimation and a complete collapse of learning! For example, this figure shows how the agent 🤖 completely ignores the reward function and instead just aims towards an arbitrary edge-of-reach state! [4/N]
@cong_ml
Cong Lu
1 year
So why was this not possible before? It turns out that small differences in sample quality with VAEs and GANs significantly affect downstream RL performance. [5/N]
@cong_ml
Cong Lu
3 months
@jsuarez5341 Agreed! Some way of selectively deferring to the FM for the “harder” parts of the env would drastically increase throughput!
@cong_ml
Cong Lu
9 days
We produce a vast archive of completed papers across both proprietary and open-weight LLMs, giving us, for the first time, a sense of their ability to partake in the entire scientific process. [3/N]
@cong_ml
Cong Lu
2 years
We further analyze challenges and opportunities unique to the pixel-based setting, including data with visual distractions. We see that our algorithms are robust to visual distractions, but only Offline DV2 generalizes to unseen distractions. Scope for future work here!! 🧐 [5/N]
@cong_ml
Cong Lu
1 year
Remarkably, we find that the synthetic samples generated by our diffusion model are simultaneously *more diverse, more novel, and more accurate* to the true environment dynamics than the best data augmentation method.💥 [4/N]
@cong_ml
Cong Lu
2 years
To kick off progress on new methods, we include strong baselines derived from the SoTA DreamerV2 and DrQ-v2 algorithms! Concretely, we adapt DreamerV2 (@danijarh) to the offline setting by introducing a penalty based on mean disagreement, resulting in Offline DV2. [2/N]
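A small numpy sketch of a mean-disagreement penalty of the kind named here (the exact penalty form and scale in Offline DV2 may differ): rewards are reduced in proportion to how much an ensemble of dynamics models disagrees about the next latent state.

import numpy as np

def penalized_reward(ensemble_preds, reward, penalty_scale=1.0):
    # ensemble_preds: (n_models, latent_dim) predictions of the next
    # latent state for one (state, action) pair.
    mean_pred = ensemble_preds.mean(axis=0)
    disagreement = np.linalg.norm(ensemble_preds - mean_pred, axis=-1).mean()
    return reward - penalty_scale * disagreement  # pessimism where models disagree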
@cong_ml
Cong Lu
2 years
This work was done during an awesome MAX internship over the summer, hosted jointly by @MSFTResearchCam and @XboxStudio in the very lovely Cambridge. I’m super grateful to my hosts @ralgeorgescu and @rookboom, and all the friends made along the way! 🥰🥰
@cong_ml
Cong Lu
6 months
For context: In offline MBRL, existing methods usually assume that any issues are due to model errors. ➡️ Therefore, approaches are based around preventing model exploitation. However, using the perfect “oracle” dynamics causes existing methods to completely fail! 📉 [2/N]
@cong_ml
Cong Lu
2 years
We found our base algorithms already represent a strong multitask baseline, opening the door to training generalist agents from offline data. 🤯 This could be because we can directly distinguish between different tasks instead of relying on explicit meta or multitask algos! [6/N]
@cong_ml
Cong Lu
2 years
💥 ML Research Opportunity for under-represented undergrads at Oxford! 💥 Would appreciate help sharing this widely! UNIQ+ is an awesome way to spend two months getting stuck into ML at great groups @oxcsml @CompSciOxford. See proposed projects here:
@cong_ml
Cong Lu
6 months
But existing methods work?! We discuss how they inadvertently address edge-of-reach states despite being motivated by model error. Can we directly target the true problem? Yes! We introduce RAVL, which precisely corrects edge-of-reach states using value pessimism. [5/N]
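A compact sketch of ensemble value pessimism in the spirit of this thread (illustrative, not RAVL's exact objective): edge-of-reach states never receive Bellman backups, so their ensemble value estimates stay high-variance, and taking a conservative statistic over the ensemble suppresses the resulting overestimation.

import torch

def pessimistic_td_target(q_ensemble, next_obs, next_act, reward, done, gamma=0.99):
    # q_ensemble: iterable of networks mapping (obs, act) -> Q-value.
    with torch.no_grad():
        qs = torch.stack([q(next_obs, next_act) for q in q_ensemble])
        q_conservative = qs.min(dim=0).values  # penalize ensemble disagreement
    return reward + gamma * (1.0 - done) * q_conservative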
@cong_ml
Cong Lu
2 years
Towards this goal, we were curious to see how our baselines scaled with more data. Interestingly, the RL methods scale far better than BC, with gains of >30% compared to 10% when we go from 100K samples to 500K! This may have implications for when we scale even further! 🚀 [7/N]
@cong_ml
Cong Lu
8 days
@Hoper_Tom @jeffclune Thank you for kindly sharing these works! We will discuss these in the updated version of the paper, and also look forward to integrating the insights from your paper into our work!
@cong_ml
Cong Lu
6 months
So, what is going on? We show that there exist “edge-of-reach” states which are used in training but which the agent can never sample actions from *even with unlimited model-based data collection* (as illustrated below). [3/N]
@cong_ml
Cong Lu
2 years
We adapt the algorithm DrQ-v2 (@denisyarats) by adding an adaptive behavioral cloning term similar to TD3+BC, resulting in DrQ+BC. We also include a CQL and BC implementation in the same codebase. [3/N]
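For concreteness, the TD3+BC-style actor loss referenced here (this is the published TD3+BC form; its use inside DrQ+BC may differ in detail): the Q term is normalized by its own average magnitude, so the behavioral cloning term keeps a consistent relative scale.

import torch.nn.functional as F

def td3_bc_actor_loss(actor, critic, obs, dataset_actions, alpha=2.5):
    pi = actor(obs)
    q = critic(obs, pi)
    lam = alpha / q.abs().mean().detach()  # adaptive BC weighting
    return -lam * q.mean() + F.mse_loss(pi, dataset_actions)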
@cong_ml
Cong Lu
1 year
Another key benefit we observe on some algorithms is the ability to scale up the network size and obtain better performance! RL algorithms are typically limited to training with very small networks, and synthetic data could lift this restriction! [6/N]
@cong_ml
Cong Lu
9 days
Given a starter research area (e.g. diffusion, language modeling), The AI Scientist generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a review process for eval. [2/N]
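Schematically, the stages named in this tweet chain together roughly as below; every helper name is a placeholder for an LLM-driven step, not the released codebase.

def run_experiments(code):   # placeholder: execute generated experiment code
    ...

def visualize(results):      # placeholder: produce result figures
    ...

def ai_scientist(research_area, llm):
    idea = llm.generate_idea(research_area)          # ideation
    code = llm.write_code(idea)                      # implementation
    results = run_experiments(code)                  # execution
    figures = visualize(results)                     # visualization
    paper = llm.write_paper(idea, results, figures)  # full write-up
    review = llm.review(paper)                       # automated review for eval
    return paper, review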
@cong_ml
Cong Lu
2 years
We find that the model-based Offline DV2 performs best on datasets with diverse data, DrQ+BC deals well with mixed but high-reward data, and BC is best on the expert datasets. CQL is also a strong option for high-reward data. [4/N]
@cong_ml
Cong Lu
2 years
@luisa_zintgraf @shimon8282 @katjahofmann Huge congratulations!! 🥳🥳🥳
@cong_ml
Cong Lu
6 months
RAVL is theoretically principled and reaches state-of-the-art on D4RL and pixel-based V-D4RL without any explicit dynamics penalty! 📈 Furthermore, these insights serve to correct and unify our understanding of offline RL across model-free and model-based approaches. [6/N]
@cong_ml
Cong Lu
7 months
@jsuarez5341 @arankomatsuzaki We found this as well in a project that sounds v close to both these efforts! See Table 3 of where we show diffusion synthetic data can help scale the network sizes in TD3 :)
@cong_ml
Cong Lu
1 year
@Stone_Tao Thanks! At the moment, it's roughly 50/50 for diffusion vs. RL training. Big potential for speed-ups there though. DMC is on proprioceptive, visual transitions incoming! :)
@cong_ml
Cong Lu
1 year
Agent-controller representations: principled offline RL with rich exogenous information () @riashatislam @manan_tomar, learning how to handle the rich irrelevant information commonly found in pixel-based datasets! [2/N]
@cong_ml
Cong Lu
1 year
@vladkurenkov @shaneguML @ML_is_overhyped Super cool work!! We also found deeper networks to help in TD3+BC, esp. with synthetic data ;) (Table 3 of )
@cong_ml
Cong Lu
3 months
@Abel_TorresM @jeffclune We discuss NetHack in the conclusion! 🚀
@cong_ml
Cong Lu
5 months
@percyliang @siddkaramcheti RoBERTa, rejected from ICLR, 12k cites
@cong_ml
Cong Lu
1 year
@Stone_Tao Yes, exactly :)
@cong_ml
Cong Lu
5 months
@TesfayZemuy Yes, they were certainly slower. No reason not to use diffusion models instead as the generative model :)
@cong_ml
Cong Lu
5 months
@tw_killian Congrats Dr!