Bartłomiej Cupiał

@CupiaBart

Followers
1,038
Following
395
Media
5
Statuses
59

I sure do like machine learning

Warsaw, Poland
Joined May 2019
Pinned Tweet
@CupiaBart
Bartłomiej Cupiał
5 months
🚀Excited to share our latest work on fine-tuning RL models! By integrating fine-tuning with knowledge retention methods, we've achieved SOTA🔥in NetHack🎮, with scores surpassing 10K points, doubling the previous record. A detailed thread coming soon! ✨
7
24
109
@CupiaBart
Bartłomiej Cupiał
2 months
So here's the story of, by far, the weirdest bug I've encountered in my CS career. Together with @maciejwolczyk, we've been training a neural network that learns how to play NetHack, an old roguelike game that looks like the one in the screenshot. Recently, something unexpected happened.
Tweet media one
149
2K
9K
@CupiaBart
Bartłomiej Cupiał
2 months
The moral is: if you encounter an unexpected bug, be sure to consult the lunar calendar. Big thanks to @JensTuyls for solving this for us!
53
137
4K
@CupiaBart
Bartłomiej Cupiał
2 months
So apparently NetHack has a mechanic that slightly changes how the game plays whenever there is a full moon according to your system clock: the player character is luckier, werewolves appear in their animal form, and the dogs howl ominously.
6
47
2K
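A minimal Python sketch of how such a clock-based moon check can work. It is a port, from memory, of the integer approximation that NetHack's C source is commonly described as using (a `phase_of_the_moon` routine based on the day of the year and the epact), so treat the constants as an assumption rather than a verified copy. The helper name `is_full_moon` is hypothetical.

```python
from datetime import date


def phase_of_the_moon(day: date) -> int:
    """Approximate moon phase in 0..7, where 0 is a new moon and 4 is a full moon.

    Assumption: this mirrors the integer approximation attributed to NetHack;
    it is not an astronomical calculation and can be off by a day or two.
    """
    day_of_year = day.timetuple().tm_yday - 1  # 0-based, like C's tm_yday
    years_since_1900 = day.year - 1900         # like C's tm_year
    golden = (years_since_1900 % 19) + 1       # position in the 19-year Metonic cycle
    epact = (11 * golden + 18) % 30            # approximate age of the moon on Jan 1
    if (epact == 25 and golden > 11) or epact == 24:
        epact += 1
    return ((((day_of_year + epact) * 6 + 11) % 177) // 22) & 7


def is_full_moon(day: date) -> bool:
    """Hypothetical helper: True when the approximation reports a full moon."""
    return phase_of_the_moon(day) == 4
```

Calling `is_full_moon(date.today())` before an evaluation run is one way to flag results that may be affected. Under this formula, `date(2025, 6, 13)` evaluates to phase 4.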
@CupiaBart
Bartłomiej Cupiał
2 months
The next morning I see a lot of messages on Slack. Jens replied, "Oh yes, it's probably a full moon today." What.
2
47
2K
@CupiaBart
Bartłomiej Cupiał
2 months
It doesn't make the game harder, but the model hasn't seen full moon data in its training set, so the score drops. In this particular case, it drops from 5k points to 3k points. We override the time so it's not a full moon, we evaluate the model - and it's 5k points again.
9
18
2K
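The thread does not say how the time was overridden. One possible approach, sketched here as an assumption, is to launch the evaluation process under libfaketime, which intercepts time calls when preloaded and reads the desired time from the `FAKETIME` environment variable. The library path, the function names, and the evaluation command are all hypothetical, and the library would need to be available wherever the game process actually runs (for example, inside the container).

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence

# Hypothetical install path; it differs between distributions.
FAKETIME_LIB = "/usr/lib/x86_64-linux-gnu/faketime/libfaketime.so.1"


def faketime_env(
    fake_time: str,
    base: Optional[Mapping[str, str]] = None,
    lib_path: str = FAKETIME_LIB,
) -> dict:
    """Return a copy of the environment with libfaketime preloaded.

    fake_time uses libfaketime's absolute format, "YYYY-MM-DD hh:mm:ss".
    """
    env = dict(os.environ if base is None else base)
    preload = env.get("LD_PRELOAD", "")
    # Put libfaketime first, keeping any libraries that were already preloaded.
    env["LD_PRELOAD"] = f"{lib_path}:{preload}" if preload else lib_path
    env["FAKETIME"] = fake_time
    return env


def run_with_fake_time(cmd: Sequence[str], fake_time: str) -> subprocess.CompletedProcess:
    """Run cmd so that it sees fake_time instead of the real system clock."""
    return subprocess.run(list(cmd), env=faketime_env(fake_time), check=True)
```

A call such as `run_with_fake_time(["python", "evaluate.py"], "2024-05-08 12:00:00")` would pin the process to a date away from the full moon; `evaluate.py` is a placeholder name. Setting the variables per process avoids changing the clock of the whole machine.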
@CupiaBart
Bartłomiej Cupiał
2 months
I check a moon phase calendar, and yes, it's a full moon today. Hands shaking, I start a new NetHack game, and the message says "You are lucky! Full moon tonight." What.
5
13
1K
@CupiaBart
Bartłomiej Cupiał
2 months
By this point we've spent several hours on this, and it's 7 PM. I am starting to feel like a madman. I can't even watch a TV show without constantly thinking about the bug. Before going to sleep I decide to ask @JensTuyls, the author of the model, if he knows what might be broken.
1
6
734
@CupiaBart
Bartłomiej Cupiał
2 months
Namely, the CUDA libraries that allow us to compute things quickly on GPU. So we suspect that maybe something about these libraries changed that degraded the model. Because what else could have? And yes, recently the version was changed from 11.8 to 12.4.
2
4
676
@CupiaBart
Bartłomiej Cupiał
2 months
We use a model by @JensTuyls that clones expert behavior on NetHack, and we improve it using RL methods. That model gets 5000 points and we finetune it in the game so that the score improves. However, suddenly in a recent run, Jens' model only got 3000 points. Quite a drop.
3
5
667
@CupiaBart
Bartłomiej Cupiał
2 months
Revert code a few weeks back? Still 3000 points. Luckily, the server we run our experiments on saves the files from the previous runs. We find the files corresponding to a run that previously got 5000 points, we re-run, and, well, it gets 3000. Nothing about the code changed.
2
5
635
@CupiaBart
Bartłomiej Cupiał
2 months
The CUDA mismatch probably shouldn't impact the results in this particular way, but we see no other explanation. We override the version to 11.8 - we still get 3000 points. We build a new environment from scratch, for CUDA 12.4 - 3000 points. Welp.
1
3
633
@CupiaBart
Bartłomiej Cupiał
2 months
We repeat the evaluation on a personal laptop. This is slow and expensive without the specialized hardware, but we make it work. Again, 3000 points. We disable multithreading, GPU, and some other things that have at least a conceivable chance of causing the problem - 3000 points.
1
2
632
@CupiaBart
Bartłomiej Cupiał
2 months
We start suspecting our software stack. Thankfully, we use Singularity which means that our whole environment is in a single, self-contained file. That file hasn't changed for a few months, so that shouldn't be the problem. However, the container loads one thing from the server.
1
8
629
@CupiaBart
Bartłomiej Cupiał
2 months
This problem is consistent between seeds, so it's not just a fluke. Well, we probably screwed up something in the code for loading the model in a recent commit. Let's revert, no biggie. Except that after reverting to a version of the code from a few days back, we still get 3000.
1
4
605
@CupiaBart
Bartłomiej Cupiał
2 months
@tomcocobrico @maciejwolczyk This is actually the case. I remember now that a few months ago I had a similar problem in a sub-experiment. I just thought the code had bugs and abandoned that branch :D
0
0
355
@CupiaBart
Bartłomiej Cupiał
2 months
@cglassey_author @maciejwolczyk My favorite interaction is using a cockatrice corpse as a weapon, very few monsters have stoning resistance :)
2
0
30
@CupiaBart
Bartłomiej Cupiał
6 months
Hey, I wanted to tell you that I added NetHack to sample_factory. Just transferring the code boosted my results by over 30%! If you ever want to do experiments on NetHack, this is now definitely the best place, as other NetHack repositories are heavily research-oriented.
3
4
17
@CupiaBart
Bartłomiej Cupiał
2 months
@AviBenemanuel @maciejwolczyk Yes, but unfortunately Singularity takes its time from the host machine. Also, changing the time on your machine breaks other things :<
0
0
5
@CupiaBart
Bartłomiej Cupiał
2 months
@sheepyk @maciejwolczyk Yes, this is quite the issue. Maximizing the score means that you will just farm monsters. Finding the items required for ascension, or even just doing a quest, is too much for a pure RL agent.
0
0
4
@CupiaBart
Bartłomiej Cupiał
2 months
@RyanSullyvan It turns out the results in my paper accepted to ICML have a lower score because of the full moon lol, by 1000 points or so
1
0
5
@CupiaBart
Bartłomiej Cupiał
6 months
If you're interested readme: wandb report: hugging face model card:
0
0
5
@CupiaBart
Bartłomiej Cupiał
2 months
@kinitawowi @JensTuyls Save the date! Friday, June 13, 2025: full moon
0
0
4
@CupiaBart
Bartłomiej Cupiał
20 days
Model-free beating model-based RL! Looks like we need harder environments.
@mic_nau
Michal Nauman
20 days
Scaling has done wonders for deep learning, but for a long time it failed in on-policy RL... until now! We show that when done appropriately, scaling leads to state-of-the-art results in a variety of continuous control tasks🔥 Introducing BRO: Bigger, Regularized, Optimistic! 🧵
8
48
266
0
0
4
@CupiaBart
Bartłomiej Cupiał
5 months
@hbouammar Thanks again for having me!
1
0
4
@CupiaBart
Bartłomiej Cupiał
2 months
@Abel_TorresM In terms of luck, yes. But at the start of the game the RNN sees a message it has never seen before ("full moon tonight"), which can influence the hidden states.
1
0
3
@CupiaBart
Bartłomiej Cupiał
2 months
@Kallahan11 @JensTuyls For the next year we are safe 🌕
0
0
3
@CupiaBart
Bartłomiej Cupiał
5 months
@ulyanapiterbarg Thank you for pointing this out. We wanted to focus on the simplest setting since this is a complex game, so we ran experiments only in the monk setting to decrease the number of experiments. I just ran experiments on @ and without tuning any of the hparams we get ~4.5k score.
1
0
3
@CupiaBart
Bartłomiej Cupiał
14 days
@IntuitMachine Perplexity uses RAG and it works fine
0
0
2
@CupiaBart
Bartłomiej Cupiał
2 months
@dlowd That's certainly true! I spent more time than I would have liked getting my first ascension :)
1
0
2
@CupiaBart
Bartłomiej Cupiał
2 months
@sheepyk @maciejwolczyk That's why NetHack is such an interesting testbed for new RL approaches :)
0
0
1
@CupiaBart
Bartłomiej Cupiał
2 months
@yasuoyamasaki During a full moon, your base Luck is increased by one, which is an advantage. Werecreatures, especially at night, will usually be in their animal form, which can give you lycanthropy.
0
0
1