Bartłomiej Cupiał

@CupiaBart

Followers
1K
Following
736
Statuses
82

I sure do like machine learning

Warsaw, Poland
Joined May 2019
@CupiaBart
Bartłomiej Cupiał
9 months
So here's a story of, by far, the weirdest bug I've encountered in my CS career. Along with @maciejwolczyk, we've been training a neural network that learns how to play NetHack, an old roguelike game that looks like the screenshot. Recently, something unexpected happened.
Tweet media one
140
1K
9K
@CupiaBart
Bartłomiej Cupiał
2 days
Fascinating work from my colleagues on MoE scaling laws! 🔥 They showed you can actually get better performance with MoEs under the same memory constraints as dense models. Really cool to see how they challenged the common assumption about memory vs compute trade-offs.
@jahulas
Jan Ludziejewski
2 days
(1/n) Introducing Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient. We show that MoE is so compute-efficient that, under the same memory budget, it can beat a dense alternative!
Tweet media one
0
0
15
@CupiaBart
Bartłomiej Cupiał
6 days
RT @MartinKlissarov: Can AI agents adapt zero-shot, to complex multi-step language instructions in open-ended environments? We present Mae…
0
52
0
@CupiaBart
Bartłomiej Cupiał
6 days
RT @aviral_kumar2: 🚨Current scalable RL algos train a policy w/o value func, which is limiting with learning in open-ended, non-stationary,…
0
51
0
@CupiaBart
Bartłomiej Cupiał
20 days
BALROG, our benchmark for agentic LLM and VLM reasoning on games, has just been accepted to #ICLR! See you in Singapore 🇸🇬!
@PaglieriDavide
Davide Paglieri
3 months
Tired of saturated benchmarks? Want scope for a significant leap in capabilities? 🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games! BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come. 1/🧵
Tweet media one
1
4
34
@CupiaBart
Bartłomiej Cupiał
2 months
RT @PaglieriDavide: 🚨BALROG leaderboard update This week's new entries on are: Llama 3.3 70B Instruct 🫤 Claude 3…
0
5
0
@CupiaBart
Bartłomiej Cupiał
3 months
@sirbayes @danijarh It should be possible, but there are two problems: 1) Craftax has new monsters, which would require creating new 3D models; 2) I am assuming I would have to rewrite the code in JAX (I am not a JAX expert). Also, it won't be as fast.
0
0
4
@CupiaBart
Bartłomiej Cupiał
3 months
@danijarh btw rendering was done with OpenGL
0
0
4
@CupiaBart
Bartłomiej Cupiał
3 months
@danijarh The goal was to compare VLMs on the same environment with different rendering, to try to disentangle the underperformance of some VLMs on BALROG. Turns out it doesn't help.
1
0
3
@CupiaBart
Bartłomiej Cupiał
3 months
@PaglieriDavide I played for more than a month with the NetHack wiki and only ascended once.
0
0
0
@CupiaBart
Bartłomiej Cupiał
3 months
RT @PaglieriDavide: The ultimate AGI test?
Tweet media one
0
6
0
@CupiaBart
Bartłomiej Cupiał
3 months
RT @emollick: This may sound odd, but game-based benchmarks are some of the most useful for AI, since we have human scores and they require…
0
110
0
@CupiaBart
Bartłomiej Cupiał
3 months
RT @_rockt: Excited to announce "BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games" led @UCL_DARK's @PaglieriDavide! @douwekie
0
5
0
@CupiaBart
Bartłomiej Cupiał
3 months
Want to know if your LLMs can truly act in the real world? BALROG is your answer. Our benchmark rigorously tests planning, reasoning & decision-making in LLMs and VLMs - the skills that matter for real applications.
@PaglieriDavide
Davide Paglieri
3 months
Tired of saturated benchmarks? Want scope for a significant leap in capabilities? 🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games! BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come. 1/🧵
Tweet media one
0
5
22
@CupiaBart
Bartłomiej Cupiał
3 months
RT @ulyanapiterbarg: There are infinitely many ways to write a program. In our new work, we show that training autoregressive LMs to synthe…
0
17
0
@CupiaBart
Bartłomiej Cupiał
3 months
I am happy to announce that I will be presenting our paper about fine-tuning in RL at this year's MLinPL conference! 🥳
Tweet media one
0
3
31
@CupiaBart
Bartłomiej Cupiał
5 months
RT @PiotrRMilos: Excellent news from NeurIPS. Two papers in, including a spotlight. 1. Repurposing Language Models into Embedding Models:…
0
16
0
@CupiaBart
Bartłomiej Cupiał
7 months
Look at the duck I met! #DuckInTheCity
0
0
4