Maciej Wołczyk Profile
Maciej Wołczyk

@maciejwolczyk

Followers: 526 · Following: 286 · Media: 26 · Statuses: 123

Post-Doc @IDEAS_NCBR, alumnus of Jagiellonian University. Reinforcement learning / continual learning / transfer learning. Outside of work: 🏃🏻🧗🎮📖🎸

Joined March 2011
@maciejwolczyk
Maciej Wołczyk
2 years
Yesterday was the last day of MLSS^N, a summer school on ML and neuroscience I co-organised. If you missed it, you can catch up on the wonderful lectures of @rpascanu , @SaxeLab , @vdbergrianne and many other amazing speakers here:
4
48
200
@maciejwolczyk
Maciej Wołczyk
1 month
Fine-tuning has been essential for LLMs, but it doesn’t work too well in reinforcement learning. 🤔 Why? In our ICML 2024 spotlight, we highlight a crucial issue with this approach and show that if done correctly, it outperforms previous SOTA by 2x on a very challenging domain!🧵
Tweet media one
2
40
190
@maciejwolczyk
Maciej Wołczyk
2 years
Continual Learning + Interval Arithmetic = InterContiNet! We approach the continual learning problem by intersecting hyperrectangles in the parameter space, which allows us to give guarantees on future performance. To be presented on Thursday at @icmlconf . #ICML2022 (1/N)
Tweet media one
1
9
47
@maciejwolczyk
Maciej Wołczyk
20 days
I am proud to say I have joined @ELLISforEurope ! Europe needs more continent-wide collaboration in AI research, and I'm here to help!
@IDEAS_NCBR
IDEAS NCBR
21 days
@maciejwolczyk , a postdoc at IDEAS NCBR, has been accepted into the prestigious @ELLISforEurope ! His research focuses on efficiently adapting deep learning models, especially in continual and reinforcement learning. 🙌 Congratulations! #MachineLearning #Research #ELLIS #AI
Tweet media one
0
0
9
4
4
46
@maciejwolczyk
Maciej Wołczyk
2 years
Transfer learning is an essential component of continual learning, but it's still not well understood. Although we have some insights into sharing knowledge between tasks in NLP and CV, what can be said about RL? Find out in our #NeurIPS2022 paper: (1/N)
1
5
36
@maciejwolczyk
Maciej Wołczyk
3 months
I'm very proud to have received the FNP Start scholarship for talented young researchers!
@IDEAS_NCBR
IDEAS NCBR
3 months
💯 @maciejwolczyk (IDEAS NCBR) and @piotr_kicki ( @PUT_Poznan , IDEAS NCBR) are among the 100 most talented young scientists in Poland according to @FNP_org_pl . Congratulations! 🏆 Each laureate will receive a one-year stipend of PLN 30,000.
1
4
18
2
2
28
@maciejwolczyk
Maciej Wołczyk
1 month
Thanks for having us, it's been great!
@_rockt
Tim Rocktäschel
1 month
Great presentation by @maciejwolczyk and @CupiaBart of their paper "Fine-tuning Reinforcement Learning Models is Secretly a Forgetting Mitigation Problem" () as well as the story behind their @NetHack_LE full moon bug hunt at the @UCL_DARK Seminar Series.
Tweet media one
2
9
60
1
1
18
@maciejwolczyk
Maciej Wołczyk
3 months
Join me and @CupiaBart on a convoluted journey of troubleshooting a very very nasty bug.
@CupiaBart
Bartłomiej Cupiał
3 months
So here's a story of, by far, the weirdest bug I've encountered in my CS career. Along with @maciejwolczyk we've been training a neural network that learns how to play NetHack, an old roguelike game that looks like the one in the screenshot. Recently, something unexpected happened.
Tweet media one
148
2K
9K
0
0
16
@maciejwolczyk
Maciej Wołczyk
5 years
Visit me today @neuroAIworkshop and check out our poster on Spatial Neural Networks. At GMUM we are really excited about combining ideas from AI and neuroscience!
Tweet media one
3
3
14
@maciejwolczyk
Maciej Wołczyk
2 years
I will be at #NeurIPS2022 next week! Let me know if you want to grab a coffee and chat about transfer learning, RL, or something entirely different. In other news, I'm on the job market for a research scientist position, so this would also make for a great conversation topic!
0
0
14
@maciejwolczyk
Maciej Wołczyk
2 months
Check out this sneak peek into the geometry of SSM states! The hidden states in SSMs have some surprisingly nice properties -- you can just average them to boost performance on ICL tasks and easily retrieve states corresponding to similar tasks. Based on our latest pre-print.
@maciejpioro
Maciej Pióro
2 months
Language modeling based on gated linear recurrence, such as Mamba, Hawk and RecurrentGemma is gaining traction! But can you use the statefulness of these models for more than efficiency? Super happy to introduce (snake 😉) state soups! 🧵
Tweet media one
2
5
28
0
0
12
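A minimal sketch of the state-averaging idea from the thread above, using a toy gated linear recurrence in place of a real Mamba/Hawk layer; the layer, shapes, and data here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def gated_linear_scan(x, a, B, h0=None):
    """Toy gated linear recurrence: h_t = a * h_{t-1} + B @ x_t.
    Returns the final hidden state for a sequence x of shape (T, d_in)."""
    h = np.zeros(a.shape) if h0 is None else h0.copy()
    for x_t in x:
        h = a * h + B @ x_t
    return h

rng = np.random.default_rng(0)
d_in, d_state = 8, 16
a = rng.uniform(0.8, 0.99, size=d_state)      # per-channel decay ("gate")
B = 0.1 * rng.normal(size=(d_state, d_in))

# Two prompts demonstrating the same in-context task.
prompt_1 = rng.normal(size=(32, d_in))
prompt_2 = rng.normal(size=(32, d_in))
h1 = gated_linear_scan(prompt_1, a, B)
h2 = gated_linear_scan(prompt_2, a, B)

# "State soup": average the hidden states instead of concatenating the prompts,
# then continue the recurrence on a query from the mixed state.
h_soup = 0.5 * (h1 + h2)
query = rng.normal(size=(4, d_in))
h_final = gated_linear_scan(query, a, B, h0=h_soup)
print(h_final.shape)  # (16,)
```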
@maciejwolczyk
Maciej Wołczyk
2 years
No papers to present at this year's @iclr_conf , but I'm very proud to be one of the highlighted reviewers!
0
0
11
@maciejwolczyk
Maciej Wołczyk
1 month
In fact, by simply combining fine-tuning with knowledge retention methods, we managed to outperform the previous SOTA in NetHack by 2x. This gives us hope that the current foundation models in RL can also be improved through careful online fine-tuning!
Tweet media one
1
0
8
@maciejwolczyk
Maciej Wołczyk
3 months
@niebezpiecznik Time for mandatory astrology training for IT specialists :)
1
0
9
@maciejwolczyk
Maciej Wołczyk
3 years
We'll be presenting our NeurIPS poster on Wednesday! Please come, say hi, and chat about the challenges of continual and reinforcement learning!
@PiotrRMilos
Piotr Miłoś@ICML
3 years
We are happy to present our Continual World benchmark: . To learn what works and what does not on the intersection of CL and RL, and why the forward transfer is important, meet us at #NeurIPS2021 (Wed Dec 08, 08:30am - 10:00am GMT): .
1
0
18
1
2
9
@maciejwolczyk
Maciej Wołczyk
1 month
Years of study in the continual learning field have shown that neural networks catastrophically forget data that is not currently seen. Fortunately, this field has also developed tools to stop neural nets from forgetting. We can use them to protect old skills when learning new ones.
Tweet media one
1
0
8
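The tweet above points to standard continual-learning tools for preventing forgetting. Below is a minimal sketch of one such tool, an EWC-style quadratic penalty that anchors the weights important for the old skill while fine-tuning on new data; this illustrates the general idea, not the exact recipe from the paper, and the toy network, data, and Fisher estimate are all placeholders.

```python
import torch
import torch.nn as nn

# Toy policy network standing in for the pre-trained agent.
policy = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2))

# Snapshot of the pre-trained weights and a (here: dummy) diagonal Fisher
# estimate saying how important each weight is for the old skill.
old_params = {n: p.detach().clone() for n, p in policy.named_parameters()}
fisher_diag = {n: torch.ones_like(p) for n, p in policy.named_parameters()}

def retention_penalty(model):
    """EWC-style penalty pulling important weights back toward the pre-trained values."""
    return sum(
        (fisher_diag[n] * (p - old_params[n]) ** 2).sum()
        for n, p in model.named_parameters()
    )

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
lam = 1.0  # strength of knowledge retention

for _ in range(100):  # fine-tuning steps on the new task (random stand-in data)
    obs = torch.randn(64, 4)
    target_actions = torch.randint(0, 2, (64,))
    new_task_loss = nn.functional.cross_entropy(policy(obs), target_actions)
    loss = new_task_loss + lam * retention_penalty(policy)
    opt.zero_grad()
    loss.backward()
    opt.step()
```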
@maciejwolczyk
Maciej Wołczyk
2 years
You can check out the full speaker lineup at Big thanks to all of the speakers, co-organizers, and the participants who helped to make it happen!
1
1
8
@maciejwolczyk
Maciej Wołczyk
1 month
If you simply fine-tune your NetHack model, it performs quite badly since it rapidly forgets the Far states it cannot reach at the beginning of the training. On the other hand, applying knowledge retention methods from continual learning again fixes this problem!
Tweet media one
1
0
7
@maciejwolczyk
Maciej Wołczyk
2 years
I'm happy to be a part of this initiative! Conditional computation is a fascinating and rather underexplored part of deep learning with practical applications (especially for efficiency) and intriguing ties to neuroscience. Check out our website and consider submitting a paper!
@dynn_icml2022
Workshop on Dynamic Neural Networks @ ICML 2022
2 years
Announcing the 1st Dynamic Neural Networks (DyNN) workshop, a hybrid event @icmlconf 2022! 👇 We hope DyNN can promote discussion on innovative approaches for dynamic neural networks from all perspectives. Want to learn more?
1
27
91
1
1
5
@maciejwolczyk
Maciej Wołczyk
2 years
@sarahookr @tomasztrzcinsk1 Are there any openings in your IDEAS NCBR group? Seems like a good fit.
0
0
6
@maciejwolczyk
Maciej Wołczyk
1 month
...and this works very well in our robotic task!
Tweet media one
1
0
6
@maciejwolczyk
Maciej Wołczyk
4 years
Check out our new paper introducing a generative autoencoder with a Gaussian mixture as the latent space prior! Since both class and style are contained in a single continuous space, we are able to smoothly change the label of a given image.
@BernhardGeiger
Bernhard Geiger
4 years
Finally out! Semi-supervised generative Cramer-Wold autoencoder with a GMM as target -- read early access at IEEE T-NNLS () or preprint on arXiv ()! #DeepLearning (Collaboration between @JagiellonskiUni and @Know_Center )
Tweet media one
Tweet media two
1
1
9
1
2
5
@maciejwolczyk
Maciej Wołczyk
1 month
We can conceptualize this problem by partitioning the environment into the Close states, which are easily available at the beginning of the trajectory, and the Far states, which are only available after putting some work into learning.
1
0
5
@maciejwolczyk
Maciej Wołczyk
2 years
Find more in our paper and visit us at the poster session on Thursday afternoon, Hall J #110 ! Huge thanks to my amazing co-authors: @Michal_Zajac_ @rpascanu @LukeKucinski @PiotrRMilos (9/9)
0
0
5
@maciejwolczyk
Maciej Wołczyk
1 month
We run experiments on NetHack, an extremely challenging env, using a massively scaled-up behavioral cloning model. If you compare the states the agent sees during the offline pre-training and states gathered by unrolling the cloned policy, the difference is massive!
Tweet media one
1
0
5
@maciejwolczyk
Maciej Wołczyk
2 years
There is still much to do, e.g. scaling up and finding tighter bounds on performance, but I'm really excited about this research direction! If you'd like to know more, check out our paper and visit me at the poster session on Thursday afternoon! (8/8)
0
0
5
@maciejwolczyk
Maciej Wołczyk
6 years
Me: recursion jokes are dumb Also me: Me: recursion jokes are dumb Also me: Me: recursion jokes are dumb Also me: ...
0
1
4
@maciejwolczyk
Maciej Wołczyk
6 years
@dgrey0 Mildly interesting fact - in Polish there's a word "rogalik" which means "croissant" and sounds vaguely like "roguelike". So, since "rogalik" rolls off our tongues easier than "roguelike" (and since it's fun), we sometimes refer to roguelikes as croissants.
0
0
4
@maciejwolczyk
Maciej Wołczyk
1 month
A similar problem will appear when your model is pre-trained offline through behavioral cloning. In complex environments, we can never perfectly clone the expert, so our policy will make small mistakes here and there.
1
0
4
@maciejwolczyk
Maciej Wołczyk
1 month
In our case, the Close states represent opening the drawer, and the Far states represent picking up and moving objects. As we start fine-tuning, the model cannot solve Close, so it doesn’t see any examples from Far. This causes it to forget how to behave in Far states.
Tweet media one
1
0
4
@maciejwolczyk
Maciej Wołczyk
4 years
Surprisingly, the model disentangles the class and style variables without any explicit regularisation. Because of that we are able to smoothly change the class without affecting other features.
Tweet media one
0
0
3
@maciejwolczyk
Maciej Wołczyk
1 month
These mistakes compound as we roll out the trajectory, getting further and further from the pre-trained distribution. We will see Far states during fine-tuning much less often than during pre-training, which again leads to forgetting. How big of a problem is this in practice?
Tweet media one
1
0
3
@maciejwolczyk
Maciej Wołczyk
1 month
When trained, the model will figure out how to open the drawer, but it will no longer be able to pick up the object. What happened?
Tweet media one
1
0
3
@maciejwolczyk
Maciej Wołczyk
2 years
Finding #1 : The critic is more important for transfer than the other components. It's not enough to carry over skills and behaviors; you need the value function as well in order to maximize improvements. (4/N)
Tweet media one
1
0
3
@maciejwolczyk
Maciej Wołczyk
2 years
We focus on the SAC algorithm and perform an exhaustive study of transfer using the Continual World benchmark consisting of 10 robotic tasks. We explore the short-term transfer between pairs of tasks (10x10 = 100 pairs) as well as continual learning on the whole sequence. (2/N)
Tweet media one
1
0
3
@maciejwolczyk
Maciej Wołczyk
1 month
Imagine you have a robot steered by a neural network that can pick up objects. Now, the object it has to move is hidden inside a drawer.
Tweet media one
1
0
3
@maciejwolczyk
Maciej Wołczyk
3 months
@richinseattle @CupiaBart @JensTuyls You might enjoy our latest paper then: it's about fine-tuning in RL in general, but we have a bunch of experiments on NetHack. There's also a growing community working on NetHack in ML, see e.g.,
0
0
2
@maciejwolczyk
Maciej Wołczyk
3 years
...however in practice, it doesn't happen! Even though CL methods remember A, they show much worse forward transfer on C than a simple fine-tuning approach. Solving catastrophic forgetting is not enough to have a good continual learner!
Tweet media one
0
0
3
@maciejwolczyk
Maciej Wołczyk
1 month
I'll be at #ICML2024 for the rest of the week, let me know if you'd like to talk over coffee and/or lunch!
0
0
3
@maciejwolczyk
Maciej Wołczyk
3 years
If you'd like to explore the amazing ways deep learning and neuroscience interact with each other, be on the lookout for our summer school! Co-organised by GMUM () and the wonderful folks at @MLinPL . More info coming shortly!
@MLinPL
ML in PL
3 years
❗🔊 Attention, all machine learners – this summer you can learn from the very best in the fields of ML, CV, and computational neuroscience in the beautiful city of Kraków! The MLSS^N summer school will be a joint effort between GMUM and ML in PL! Details:
Tweet media one
0
3
8
0
1
3
@maciejwolczyk
Maciej Wołczyk
2 years
In continual learning terms, we can find a hyperrectangle of parameters that perform well on the current task and get a guarantee on the performance during the rest of the training, as long as we stay within this parameter region! (6/N)
1
0
2
@maciejwolczyk
Maciej Wołczyk
2 years
Continual learning can be formalized as training on a sequence of tasks while maintaining performance on the previous tasks. Equivalently, we want to find the new solution within a region of parameters that perform well on previous tasks. (2/N)
Tweet media one
1
0
2
@maciejwolczyk
Maciej Wołczyk
2 years
Unfortunately, we had some sound issues during the first lecture, but after that it gets much better, so don't get discouraged!
0
0
2
@maciejwolczyk
Maciej Wołczyk
2 years
Finding #4 : Reusing replay data from previous tasks can help a lot if done properly. Simply storing the whole past in SAC's buffer (Perfect Memory) is not a good idea, but distilling the knowledge through behavioral cloning works out great! (7/N)
Tweet media one
1
0
2
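A rough sketch of the distillation idea in Finding #4: rather than keeping the entire past replay buffer, regularize the current actor to reproduce the previous policy's actions on a small set of stored states. The network, buffer, and loss weighting are illustrative placeholders, not the actual ClonEx-SAC code.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 12, 4
actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                      nn.Linear(64, act_dim), nn.Tanh())

# Small buffer of states from previous tasks, labeled with the actions
# the previous (frozen) actor took there.
old_states = torch.randn(256, obs_dim)
with torch.no_grad():
    old_actions = actor(old_states)  # in practice: outputs of the *previous* actor snapshot

def bc_retention_loss(actor, states, expert_actions):
    """Behavioral-cloning term that keeps the old skills alive during new-task updates."""
    return ((actor(states) - expert_actions) ** 2).mean()

# During each new-task actor update, the cloning term is simply added on top of
# the usual SAC actor loss (beta controls how strongly old behavior is protected).
beta = 1.0
sac_actor_loss = torch.tensor(0.0)  # placeholder for the standard SAC objective
actor_loss = sac_actor_loss + beta * bc_retention_loss(actor, old_states, old_actions)
actor_loss.backward()
```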
@maciejwolczyk
Maciej Wołczyk
2 years
To work with hyperrectangle parameter regions in practice, we represent each weight in a neural network as an interval [w_a, w_b] rather than a single point. Using interval arithmetic we implement the basic neural network components (layers, activations, etc). (4/N)
Tweet media one
1
0
2
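A minimal numpy sketch of the interval propagation described above: every weight lives in an interval [w_a, w_b], and for a fixed input we can compute guaranteed bounds on a layer's output (the sizes and the epsilon are made up for illustration).

```python
import numpy as np

def interval_linear(x, W_lo, W_hi, b_lo, b_hi):
    """Propagate a point input x through a linear layer whose weights and
    biases are intervals; returns elementwise lower/upper output bounds."""
    x_pos = np.clip(x, 0, None)   # positive part of the input
    x_neg = np.clip(x, None, 0)   # negative part of the input
    y_lo = W_lo @ x_pos + W_hi @ x_neg + b_lo
    y_hi = W_hi @ x_pos + W_lo @ x_neg + b_hi
    return y_lo, y_hi

def interval_relu(y_lo, y_hi):
    """Monotone activations apply elementwise to both bounds."""
    return np.maximum(y_lo, 0.0), np.maximum(y_hi, 0.0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(5, 3)), rng.normal(size=5)
eps = 0.05                                  # radius of the weight hyperrectangle
W_lo, W_hi, b_lo, b_hi = W - eps, W + eps, b - eps, b + eps

x = rng.normal(size=3)
y_lo, y_hi = interval_relu(*interval_linear(x, W_lo, W_hi, b_lo, b_hi))
assert np.all(y_lo <= y_hi)                 # every output is bracketed by its bounds
```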
@maciejwolczyk
Maciej Wołczyk
2 years
We take SAC apart into three components: the exploration policy, the parameters of the actor, and the parameters of the critic. We study how carrying them over from previous tasks impacts the speed of learning a new task (forward transfer, FT). Here are our main findings: (3/N)
1
0
2
@maciejwolczyk
Maciej Wołczyk
3 years
Sneak peek: It seems that even if CL methods can remember the past, they cannot efficiently reuse it. Let's say we have a sequence of tasks A->B->C, where A->C exhibits high forward transfer, and B acts as a distractor. Then CL methods should remember A to perform better at C...
1
0
2
@maciejwolczyk
Maciej Wołczyk
2 years
The empirical results are not groundbreaking, but we can match the performance of regularization methods (EWC/MAS/SI) while providing guarantees. Interval propagation is a bit tricky to work with due to stability issues, but CIFAR is still within reach. (7/N)
Tweet media one
Tweet media two
1
0
2
@maciejwolczyk
Maciej Wołczyk
2 years
These insights lead us to a simple, efficient algorithm we dubbed ClonEx-SAC. On the CW10 and CW20 benchmark sequences from Continual World, it considerably outperforms other much more sophisticated methods. (8/N)
Tweet media one
1
0
2
@maciejwolczyk
Maciej Wołczyk
2 years
Finding #3 : Exploration becomes even more important when we get to longer sequences of tasks. Reusing policies from previous tasks to gather initial data in the new task improves the performance considerably. (6/N)
Tweet media one
1
0
2
@maciejwolczyk
Maciej Wołczyk
5 years
Me: All of gaming Twitter:
Tweet media one
0
1
2
@maciejwolczyk
Maciej Wołczyk
2 years
Finding #2 : Contributions of SAC's components are additive, i.e. FT(actor, critic, exploration) ≈ FT(actor) + FT(critic) + FT(exploration). Even though the critic makes the biggest difference, the actor and exploration still make considerable independent contributions! (5/N)
Tweet media one
1
0
2
@maciejwolczyk
Maciej Wołczyk
2 years
This way, given an interval neural network, we can compute the set of possible outputs for any particular input. Now, we can take the worst-case loss over this set, and as long as we stay within the parameter region, the performance will not get worse. (5/N)
1
0
1
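Continuing the sketch above, one way to take the worst case over the set of possible outputs: for cross-entropy, the pessimistic logits use the lower bound for the true class and the upper bound everywhere else. This is an illustrative formulation under that assumption, not necessarily the paper's exact loss.

```python
import numpy as np

def worst_case_cross_entropy(logits_lo, logits_hi, true_class):
    """Upper bound on cross-entropy over all logits consistent with the interval
    bounds: true-class logit at its lower bound, all others at their upper bound."""
    pessimistic = logits_hi.copy()
    pessimistic[true_class] = logits_lo[true_class]
    shifted = pessimistic - pessimistic.max()            # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[true_class]

logits_lo = np.array([1.0, -0.5, 0.2])
logits_hi = np.array([1.4,  0.1, 0.9])
print(worst_case_cross_entropy(logits_lo, logits_hi, true_class=0))
```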
@maciejwolczyk
Maciej Wołczyk
2 years
However, the parameter regions can be highly irregular, and the problem is NP-hard even if we assume they are polytopes, as shown nicely by @LauchLab @syimplectic @tommy_da_cat . We simplify the problem and assume all regions are hyperrectangles. (3/N)
1
0
1
@maciejwolczyk
Maciej Wołczyk
3 months
@tim_zaman @CupiaBart At this point we were grasping at straws. But yes, we had CUDA in our container; we just thought that maybe something about the system CUDA libraries interfered with our setup.
0
0
1
@maciejwolczyk
Maciej Wołczyk
4 years
@alniac Good performance, pass along my congratulations!
0
0
1
@maciejwolczyk
Maciej Wołczyk
2 years
@LucasPCaccia The regular FIM is the "correct" one for EWC, i.e. it was used in the original paper. I confirmed this with the authors at some point. Still, I haven't seen any experimental studies of the impact of using empirical Fisher in CL, it's an interesting question!
1
0
1
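For context on the distinction in the reply above, a small numpy sketch of the two estimates for a toy softmax classifier: the regular Fisher takes the expectation over labels drawn from the model's own predictive distribution, while the empirical Fisher plugs in the observed labels. The diagonal approximation and toy data are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 3
X = rng.normal(size=(n, d))
y = rng.integers(0, k, size=n)           # observed labels
W = 0.1 * rng.normal(size=(d, k))        # softmax classifier parameters

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_logp(x, label, W):
    """Gradient of log p(label | x) with respect to W for a softmax classifier."""
    p = softmax(x @ W)
    onehot = np.eye(k)[label]
    return np.outer(x, onehot - p)        # shape (d, k)

# Diagonal of the regular Fisher: expectation over labels sampled from the model.
fisher_regular = np.zeros_like(W)
for x in X:
    p = softmax(x @ W)
    for c in range(k):
        fisher_regular += p[c] * grad_logp(x, c, W) ** 2
fisher_regular /= n

# Diagonal of the empirical Fisher: the observed labels are used instead.
fisher_empirical = np.zeros_like(W)
for x, label in zip(X, y):
    fisher_empirical += grad_logp(x, label, W) ** 2
fisher_empirical /= n
```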
@maciejwolczyk
Maciej Wołczyk
5 years
@supergreatfrien @Swery65 Are you involved in this?
1
0
1
@maciejwolczyk
Maciej Wołczyk
5 years
@spikain Soon.
Tweet media one
0
0
1