Jens Tuyls Profile
Jens Tuyls
@JensTuyls
783 Followers · 799 Following · 5 Media · 99 Statuses

PhD @PrincetonCS. Previously CS & Eng. @UCIrvine. Studying AI, ML, RL, #NLProc.

Silicon Valley, CA
Joined June 2016
Pinned Tweet
@JensTuyls · 1 year
Imitation learning is one of the most widely used methods in ML, but how does compute affect its performance? We explore this question in the challenging game of NetHack and find our scaled-up agent to outperform prior SOTA by 2x! [1/6]
2 replies · 19 retweets · 107 likes

@JensTuyls · 3 years
How can RL agents deal with both sparse rewards and large, dynamic action spaces – a key challenge in text games? Our method eXploit-Then-eXplore (XTX) tackles these challenges and achieves a more than 2x improvement on Zork! #ICLR2022 Spotlight 📜 [1/5]
6 replies · 11 retweets · 44 likes

@JensTuyls · 9 months
I’ll be at @NeurIPSConf this week! Feel free to reach out if you’d like to chat about scaling in RL/IL, language agents (or broadly RL + NLP), or game theory!
0 replies · 0 retweets · 17 likes

@JensTuyls · 8 years
Loving the new Alexa Skills Kit SDK for Node.js! @alexadevs @amazonecho @AmazonAlexa #amazonecho
0 replies · 0 retweets · 8 likes

@JensTuyls · 1 year
More broadly, our results call for work in the larger IL and RL community to more carefully consider the role of scaling laws, which could provide large improvements in many other domains. Also check out prior work by @openai: . [5/6]
1 reply · 0 retweets · 6 likes

@JensTuyls · 1 year
We train a suite of neural NetHack agents with different model sizes using Behavioral Cloning (BC) and analyze the loss and mean return isoFLOP profiles. We find both BC loss and mean return to follow clear power law trends with respect to FLOPs. [3/6]
1 reply · 0 retweets · 6 likes
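A power-law trend like the one this thread describes is typically found by fitting in log-log space, where a power law becomes a straight line. The following is a minimal illustrative sketch, not the authors' code; the compute budgets and losses below are synthetic numbers, and the form L(C) = a·C^(−b) is an assumption about the shape of the fit.

```python
import numpy as np

# Illustrative synthetic data: compute budgets (FLOPs) and BC losses.
# These numbers are made up; the actual NetHack results are in the paper.
flops = np.array([1e15, 1e16, 1e17, 1e18, 1e19])
loss = 5.0 * flops ** -0.05  # pretend measurements following a power law

# A power law L(C) = a * C**(-b) is linear in log-log space:
# log L = log a - b * log C, so ordinary least squares recovers a and b.
log_c, log_l = np.log(flops), np.log(loss)
slope, log_a = np.polyfit(log_c, log_l, 1)
a, b = np.exp(log_a), -slope

predicted = a * flops ** (-b)  # should track the measured losses
```

Because the synthetic losses follow the power law exactly, the fit recovers a = 5.0 and b = 0.05; on real isoFLOP data the fit would of course be noisy.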

@JensTuyls · 1 year
Using these power laws, we forecast the model and data size needed to train an agent aimed at recovering the underlying expert. While our agent falls short of expert performance, it sets a new SOTA (2.7K) in the unsolved game of NetHack, surpassing the prior best by 2x! [4/6]
1 reply · 0 retweets · 5 likes
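The forecasting step described here amounts to inverting a fitted power law: if loss follows L(C) = a·C^(−b), solving for C predicts the compute needed to reach a target loss. A hedged sketch with made-up coefficients (not the paper's fitted values):

```python
# Illustrative only: invert a fitted power law loss(C) = a * C**(-b)
# to forecast the compute needed for a target loss. The coefficients
# here are invented, not the fitted NetHack values.
a, b = 5.0, 0.05

def compute_for_target_loss(target_loss, a, b):
    """Solve target_loss = a * C**(-b) for the compute budget C."""
    return (a / target_loss) ** (1.0 / b)

needed_flops = compute_for_target_loss(4.0, a, b)

# Sanity check: plugging the forecast back into the power law
# should recover the target loss.
recovered_loss = a * needed_flops ** (-b)
```

The same inversion, applied to separate power laws for model size and data size, is the general recipe for this kind of compute-optimal forecast.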

@JensTuyls · 1 year
Prior works have found IL to consistently underperform the data-generating policy. However, these works often overlook the role of compute in terms of model and data size. Inspired by work around LLMs, we see if scaling up IL can provide similar performance gains. [2/6]
1 reply · 0 retweets · 5 likes

@JensTuyls · 8 years
Black smoke over the bay. What's happening? @ABC @CNN @CBSNews #fireInTheBay
0 replies · 0 retweets · 4 likes

@JensTuyls · 3 years
XTX outperforms several competitive baselines across 12 games in the Jericho benchmark (avg norm. scores across games in fig) in both the deterministic and stochastic setting, showing the strength of our multi-stage approach with strategic exploration at the frontier. [4/5]
1 reply · 0 retweets · 2 likes

@JensTuyls · 3 years
XTX employs a two-stage rollout in each episode to tackle these: (1) An *exploitation* policy trained on promising past trajectories returns to the frontier. (2) An *exploration* policy that uses past experience and curiosity explores the frontier. [3/5]
1 reply · 0 retweets · 2 likes
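The two-stage rollout described here can be sketched as follows. This is a toy illustration of the episode structure only, not the authors' implementation: the environment, policies, and the fixed step budget used to decide when the frontier is reached are all invented for the example.

```python
class ToyEnv:
    """Minimal episodic environment stub for illustrating the rollout."""
    def __init__(self, horizon=10):
        self.horizon, self.t = horizon, 0
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "explore" else 0.0  # toy reward signal
        return self.t, reward, self.t >= self.horizon

class FixedPolicy:
    """Stub policy that always emits one action."""
    def __init__(self, action, max_steps=0):
        self.action, self.max_steps = action, max_steps
    def act(self, obs):
        return self.action

def run_episode(env, exploit_policy, explore_policy, trajectory_buffer):
    """One XTX-style episode: exploit back to the frontier, then explore."""
    obs = env.reset()
    trajectory, done, total = [], False, 0.0

    # Stage 1: the exploitation policy, trained on promising past
    # trajectories, drives the agent toward the frontier (here a fixed
    # step budget stands in for the real switching criterion).
    for _ in range(exploit_policy.max_steps):
        if done:
            break
        action = exploit_policy.act(obs)
        obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        total += reward

    # Stage 2: the exploration policy takes over for the rest of
    # the episode, exploring beyond the frontier.
    while not done:
        action = explore_policy.act(obs)
        obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        total += reward

    # Trajectories are stored; promising ones later retrain the
    # exploitation policy via imitation.
    trajectory_buffer.append(trajectory)
    return total

buffer = []
ret = run_episode(ToyEnv(10), FixedPolicy("exploit", max_steps=4),
                  FixedPolicy("explore"), buffer)
```

In this toy run, the exploit stage consumes 4 of the 10 steps and the explore stage the remaining 6, so the episode return is 6.0.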

@JensTuyls · 3 years
Text games present unique challenges: (1) *Sparse rewards:* agents need to quickly learn from only a few rewarding trajectories. (2) *Large, dynamic action spaces* of up to 50 actions which can differ across states (e.g. “Echo” in fig), requiring clever exploration. [2/5]
1 reply · 0 retweets · 1 like

@JensTuyls · 8 years
@andrew_j_mead @techedrob @udemy Such a great course! Very helpful.
1 reply · 2 retweets · 1 like

@JensTuyls · 2 years
@s_mandt Congrats!! 🎉
0 replies · 0 retweets · 1 like

@JensTuyls · 8 years
@udemy Hi there! I've learned so much through Udemy. Thank you!
0 replies · 0 retweets · 1 like