🎄I am a big fan of
@ylecun
's &
@alfcnz
's Deep Learning course. The attention to detail is incredible and one feels the love and passion that go into every single course week (my favorites: weeks 7+8 on EBMs)🤗
#feelthelearn
📜:
📽️:
It's the beginning of a new month - so let's reflect on the core ideas of statistics from the past 50 years ⏳ Great weekend read by
@StatModeling
&
@avehtari
covering the core developments, their commonalities & future directions 🧑🚀
#mlcollage
[17/52]
📜:
Beautiful overview of Bayesian Methods in ML by
@shakir_za
at
#MLSS2020
. Left me pondering about many things beyond Bayesian Inference. Thank you Shakir🙏
Quote of the day: “The cyclist, not the cycle, steers.”🚴♀️
🎤 P-I:
🎤 P-II:
Really happy to share
#visualmlnotes
✍️ a virtual gallery of sketchnotes taken at Machine Learning talks 🧠🤓🤖 which includes last week's
#ICLR2020
. Explore, exploit & feel free to share:
💻 website:
📝 repository:
🤖JAX is more than just the 'next cool autodiff library'. The primitives allow us to flexibly leverage XLA and to speed-up + vectorize neuroevolution methods 🦎 with minimal engineering overhead. Find out more in my new blog post 📝:
Great tutorial on Meta-Learning by
@yeewhye
covering optimisation-based, black-box & probabilistic perspectives on learning task invariances at
#MLSS2020
. Re-watch the videos here:
📺(Part I):
📺(Part II):
🚀 I am very excited to share gymnax 🏋️ — a JAX-based library of RL environments with >20 different classic environments 🌎, which are all easily parallelizable and run on CPU/GPU/TPU.
💻[repo]:
📜[colab]:
There is a lot to wrap your head around in LSTMs🤯. One way of thinking that helped me a lot is the 'conveyor belt' metaphor of the cell state 🧑🏭 by
@ch402
. I put together a little animation 🖼️ Check out the amazing blog post by Chris Olah here✍️:
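The conveyor-belt picture boils down to a single elementwise update: the cell state is rescaled by a forget gate and nudged by a gated candidate, never squashed by an activation itself. A dependency-free sketch (gate pre-activations as plain scalars, names mine):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cell_state_update(c_prev, forget_pre, input_pre, cand_pre):
    """One LSTM cell-state step: the 'conveyor belt'.

    The cell state is only touched elementwise: scaled down by the
    forget gate, then nudged by the gated candidate. No matrix
    multiply, no squashing of c itself -- gradients ride along it.
    """
    f = sigmoid(forget_pre)   # how much of the old state to keep
    i = sigmoid(input_pre)    # how much new content to let on the belt
    g = math.tanh(cand_pre)   # candidate content
    return f * c_prev + i * g

# Forget gate saturated open, input gate shut:
# the state rides the belt unchanged.
print(cell_state_update(3.0, 100.0, -100.0, 0.5))  # ≈ 3.0
```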
What a week 🧠🤓💻! I loved meeting so many of you at
#NeurIPS2019
- the ML community is truly wonderful. Check out all my collected visual notes ✍️ & feel free to share:
The lottery ticket hypothesis 🎲 states that sparse nets can be trained given the right initialisation 🧬. Since the original paper (
@jefrankle
&
@mcarbin
) a lot has happened. Check out my blog post for an overview of recent developments & open Qs.
✍️:
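The recipe behind most of these results is iterative magnitude pruning: train, keep the largest-magnitude weights, rewind the survivors to their original init, repeat. A toy sketch on a flat weight vector (all names & numbers illustrative, not any paper's code):

```python
def magnitude_mask(weights, sparsity):
    """Keep the (1 - sparsity) fraction of weights with largest |w|."""
    n_keep = max(1, int(round(len(weights) * (1.0 - sparsity))))
    ranked = sorted(range(len(weights)), key=lambda i: -abs(weights[i]))
    keep = set(ranked[:n_keep])
    return [1.0 if i in keep else 0.0 for i in range(len(weights))]

def lottery_ticket(init, trained, sparsity):
    """Mask from *trained* magnitudes, values rewound to *init*."""
    mask = magnitude_mask(trained, sparsity)
    return [m * w for m, w in zip(mask, init)]

init    = [0.1, -0.5, 0.2, 0.05]   # weights at initialization
trained = [0.9, -0.1, 1.4, 0.02]   # weights after training
print(lottery_ticket(init, trained, 0.5))  # → [0.1, -0.0, 0.2, 0.0]
```

The key twist of the hypothesis: the surviving weights are reset to their *initial* values, not the trained ones - and the sparse subnetwork still trains to full accuracy.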
Great
#NeurIPS2019
tutorial kick-off by
@EmtiyazKhan
! Showing the unifying Bayesian Principle bridging Human & Deep Learning. Variational Online Gauss-Newton (VOGN; Osawa et al., '19) = A Bayesian Love Story ❤️
JAX sometimes has me feeling like a kid in a candy store 🍭 Here is a small example of how to sample batches of Ornstein-Uhlenbeck process realisations combining lax.fori_loop, jit & vmap 🚀 Auto-vectorisation made intuitive and scalable 🤗
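For reference, this is the plain-Python version of what that snippet vectorizes: an Euler-Maruyama loop over the OU update (parameters illustrative). In JAX the loop body becomes the `lax.fori_loop` carry, `jit` compiles it, and `vmap` replaces the list comprehension over seeds:

```python
import random

def ou_path(n_steps, dt=0.01, theta=1.0, mu=0.0, sigma=0.5, x0=1.0, seed=0):
    """Euler-Maruyama discretisation of dx = theta*(mu - x)*dt + sigma*dW."""
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        dW = rng.gauss(0.0, dt ** 0.5)          # Brownian increment
        x = x + theta * (mu - x) * dt + sigma * dW
        path.append(x)
    return path

# A 'batch' is just a map over seeds -- the role vmap plays in JAX:
batch = [ou_path(100, seed=s) for s in range(8)]
```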
🚀 How can meta-learning, self-attention & JAX power the next generation of Evolutionary Optimizers 🦎?
Excited to share my
@DeepMind
internship project and our
#ICLR2023
paper ‘Discovering Evolution Strategies via Meta-Black-Box Optimization’ 🎉
📜:
🎉 Excited to share `mle-monitor` - a lightweight ML experiment protocol and tool for monitoring resource utilization 📝 It covers local machines/servers and Slurm/Grid engine clusters 📉
💻 [repo]:
📜 [colab]:
📈 What functions do ReLU nets 'like' to learn? 🌈 Using Fourier analysis, Rahaman et al. ('19) reveal their bias to learn low-frequency modes first. Insights for implicit regularization & adv. robustness.
#mlcollage
[3/52]
📝:
💻:
🥳Really excited to be attending
#MLSS2020
. Great set of talks by
@bschoelkopf
& Stefan Bauer starting from 101 causality to Representation Learning for Disentanglement 💯! Re-watch them here:
📺 (Part I):
📺 (Part II):
How to train your d̶r̶a̶g̶o̶n̶ ViT? 🐉 Steiner et al. demonstrate that augmentation & regularization yield model performance comparable to training on 10x data. Many 💵-insights for practitioners.
🎨
#mlcollage
[30/52]
📜:
💻:
🚀 Happy to share my hyperparameter search tool: `mle-hyperopt` - a lightweight API covering many strategies with search space refinement 🪓, configuration export 📥 & storage/reloading of previous logs 🔄
💻[repo]:
📜[colab]:
Friday optimization revelations📉: My life needs more theoretical guarantees & convex + linear =❤️. Enlightening set of talks by
@BachFrancis
at
#MLSS2020
. Recordings can be found here:
📽️(Part I):
📽️(Part II):
🎉 Happy to share a mini-tool that I have been using on a daily basis: `mle-logging` - a lightweight logger 📉 for ML experiments, which makes it easy to aggregate logs across configurations & random seeds 🌱
💻 [repo]:
📜 [colab]:
🥳 New tooling blog post coming your way 🚆 'A Machine Learning Workflow for the iPad Pro' - including my favourite apps, routines and pipelines for working with remote machines and
@Raspberry_Pi
💽👨💻.
✍️:
🤗: Thanks
@tech_crafted
for the inspiration!
Puuuh. What are you up to these days? 💭 I try to stay sane, clean my place 🧹& write✍️. Today's edition - 'Getting started with
#JAX
'. Learn how to embrace the 'jit-grad-vmap' powers 💻 and code your own GRU-RNN in JAX. Stay safe & home. 🤗
💓 N-Beats is a pure Deep Learning architecture for 1D time series forecasting 📈 It achieves M3/M4/tourism SOTA by combining learned/interpretable basis functions 🧑🔬 w. residual stacking & ensembling 🎨
#mlcollage
[38/52]
📜:
💻:
Looking to get started with the
@kaggle
ARC challenge & want to learn about psychometric/ability-based assessment of intelligent systems? Check out my blog post which provides an intro to "On the measure of intelligence" & the corpus by
@fchollet
🤖🧠🎉 👉
🎉 2019 🎉 was quite the year for Deep Reinforcement Learning. In today's blog post I list my top 10 papers 🦄💻🧠 What was your favourite paper? Let me know!
Great start to an all-virtual
#ICLR2020
& the ‘Causal Learning for Decision Making‘ workshop including talks by
@bschoelkopf
& Lars Buesing 🧠📉👨💻. Looking forward to more smooth Q&As and exploring the awesome web interface!
🎉 Stoked to share that I joined
@SakanaAILabs
as a Research Scientist & founding member.
@yujin_tang
&
@hardmaru
's work has been very inspirational for my meta-evolution endeavors🤗
Exciting times ahead: I will be working on nature-inspired foundation models & evolution 🐠/🧬.
🚀 Happy to share evosax - a JAX-based library of Evolution Strategies (ES) featuring >10 different ES ranging from classics (e.g. CMA-ES, PSO) 🦎 to modern neuroevolution methods (e.g. ARS, OpenES, ClipUp)🤖
💻[repo]:
📜[colab]:
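All of these strategies share the same ask-tell loop. A minimal pure-Python Gaussian search in that style (not the evosax API itself - just the interface shape it generalizes; all names & numbers mine):

```python
import random

class SimpleGaussianES:
    """Toy elitist Gaussian search in ask-tell form (illustrative only)."""
    def __init__(self, num_dims, popsize, sigma=0.1, seed=0):
        self.rng = random.Random(seed)
        self.mean = [1.0] * num_dims          # start away from the optimum
        self.popsize, self.sigma = popsize, sigma
        self.best, self.best_fit = list(self.mean), float("inf")

    def ask(self):
        """Sample a population of Gaussian perturbations around the mean."""
        return [[m + self.sigma * self.rng.gauss(0.0, 1.0) for m in self.mean]
                for _ in range(self.popsize)]

    def tell(self, population, fitness):
        """Update the search distribution (here: elitist mean shift)."""
        for x, f in zip(population, fitness):
            if f < self.best_fit:
                self.best, self.best_fit = x, f
        self.mean = self.best

def sphere(x):
    return sum(xi * xi for xi in x)

es = SimpleGaussianES(num_dims=2, popsize=16, seed=42)
for _ in range(50):
    pop = es.ask()
    es.tell(pop, [sphere(x) for x in pop])
print(sphere(es.mean))  # close to 0 after 50 generations
```

Swapping in CMA-ES, OpenES & co. then only changes how `tell` updates the search distribution - the loop stays identical, which is what makes the whole family easy to `jit`/`vmap`.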
Awesome new JAX tutorial by DeepMind 🥳 Covering the philosophy of stateful programs 💭, JAX primitives and more advanced topics such as TPU parallelism, higher-order & per-example gradients ∇. All in all a great resource for every level of expertise🚀
👉
How well do scalable Bayesian methods 🚀 approximate the true model average?
@Pavel_Izmailov
et al. ('21) provide insights into performance, generalization, mixing & tempering 🌡️ of Bayesian Nets! Hamiltonian MC + 512 TPU-v3 = 💘
#mlcollage
[18/52]
📜:
#MLSS2020
was full of wonderful experiences 🦋 I hope to meet many of you soon & in person. Here are all
#visualmlnotes
, videos & slides:
✍️:
📼&📚:
Thank you 🙏 to all hard working volunteers & organizers - you did awesome 🤗
Thinking 💭 about biological & artificial learning with the help of Marr's 3 levels of analysis. Here are the
#visualmlnotes
✍️ from Peter Dayan's talk at
#MLSS2020
& a little pointer to a nice complementary paper by
@jhamrick
&
@shakir_za
:
👉
Excited to share that I got to join DeepMind as a research intern ☀️
This has been a dream 💭 which felt out of reach for a long time. Super grateful to the many people that supported me along the way 🤗
Time to do awesome work with
@flennerhag
,
@TZahavy
& the discovery team🚀
🚀 How similar are network representations across layers & architectures? And how do they emerge through training?🤸New blog on Centered Kernel Alignment (
@skornblith
et al., 2019) & training All-CNN-C in JAX/flax 🤖
📝:
💻:
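Linear CKA itself fits in a few lines: center the features, then compare Gram matrices via a normalized Hilbert-Schmidt inner product. A dependency-free sketch for small matrices (rows = examples, columns = features; helper names mine):

```python
def centered(X):
    """Column-center a matrix given as a list of rows."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def frob_inner(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def matmul_t(X, Y):
    """X^T Y for row-major lists."""
    return [[sum(rx[i] * ry[j] for rx, ry in zip(X, Y))
             for j in range(len(Y[0]))] for i in range(len(X[0]))]

def linear_cka(X, Y):
    """||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered features."""
    X, Y = centered(X), centered(Y)
    hsic = frob_inner(matmul_t(X, Y), matmul_t(X, Y))
    xtx, yty = matmul_t(X, X), matmul_t(Y, Y)
    return hsic / (frob_inner(xtx, xtx) ** 0.5 * frob_inner(yty, yty) ** 0.5)

X = [[1.0, 2.0], [3.0, 0.0], [0.0, 1.0]]
print(linear_cka(X, X))  # ≈ 1.0
```

`linear_cka(X, X)` is 1 for any non-degenerate X, and the score is invariant to isotropic scaling & orthogonal rotation of either representation - exactly why it is a handy similarity index across layers.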
📉 GD can be biased towards finding 'easy' solutions 🐈 By following the eigenvectors of the Hessian with negative eigenvalues, Ridge Rider explores a diverse set of solutions 🎨
#mlcollage
[40/52]
📜:
💻:
🎬:
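The core move is easy to see on a toy saddle f(x, y) = x² - y²: its Hessian is diag(2, -2), so the negative-eigenvalue eigenvector is the y-axis, and riding it in either sign gives two distinct descent 'ridges'. A sketch with the eigenvector hard-coded (illustrative, not the paper's code):

```python
def f(x, y):
    return x * x - y * y

# The Hessian of f is constant: diag(2, -2). The eigenvector of the
# negative eigenvalue -2 is e_y = (0, 1). Ridge Rider branches on
# *both* signs of such eigenvectors to explore diverse solutions.
def ride(start, direction, lr=0.1, steps=20):
    x, y = start
    dx, dy = direction
    for _ in range(steps):
        x, y = x + lr * dx, y + lr * dy
    return (x, y), f(x, y)

saddle = (0.0, 0.0)
_, loss_pos = ride(saddle, (0.0, 1.0))    # ridge 1
_, loss_neg = ride(saddle, (0.0, -1.0))   # ridge 2
print(loss_pos, loss_neg)  # both ≈ -4: two distinct descent paths
```

Plain gradient descent started at the saddle would never move at all (∇f = 0 there) - following curvature information is what unlocks the branching.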
SSL joint-embedding training 🧑🤝🧑 w/o asymmetry shenanigans? 🤯 Zbontar, Jing et al. propose a simple info bottleneck objective avoiding trivial solutions. Robust to small batches + scales w. dimensionality
#mlcollage
[19/52]
📜:
💻:
Can NNs only learn to interpolate?
@randall_balestr
et al. argue that NNs have to extrapolate to solve high dimensional tasks🔶 Questioning the relation of extrapolation & generalization 🎨
#mlcollage
[39/52]
📜:
🎙️ [
@MLStreetTalk
]:
Epic new show out with
@ylecun
and
@randall_balestr
where we discuss their recent 'everything is extrapolation' paper, interpolation & the curse of dimensionality, and dig deep into Randall's work on the spline theory of deep learning.
@DoctorDuggar
@ecsquendor
@ykilcher
'Innate everything' 🧠🧐🐊 -
@hardmaru
argues for the importance of finding the right inductive biases in bodies/architectures (WANNs) & prediction/world models (Observational Dropout) - Transferable Skills Workshop
#NeurIPS2019
🎉 Stoked to share NeuroEvoBench – a JAX-based Evolutionary Optimizer benchmark for Deep Learning 🦎/🧬
🌎 To be presented at
#NeurIPS2023
Datasets & Benchmarks with
@yujin_tang
&
@alanyttian
🌐:
📜:
🧑💻:
✍️Want to learn more about RL, generalization within & across tasks, as well as the 'reward is enough' hypothesis 🌍🔄🤖? Check out a set of thought-provoking talks by
@matteohessel
,
@aharutyu
and David Silver at the
@M2lSchool
✌️
🎉 I transitioned from Berlin to the Tokyo 🗼 office for the 2nd half of my
@GoogleDeepMind
student researcher time!
🤗Deeply thankful to
@yujin_tang
for all the support leading up to & during my first days in Japan 🇯🇵Everything still feels pretty surreal & I am super grateful!
People of the world - I just posted a new blog post covering my
#CCN2019
experience & many keynote talks. It is fair to say - I had a truly fulfilling time 💻❤️🧠. Thank you to all organizers, volunteers & speakers (
@CogCompNeuro
). [1/2]
This is a live dashboard 💻 monitoring my compute resources & the status/database of ML experiments 🚀 [more about this at a later point 🤗]. It is built with `rich` in ca. 10 hours of creative work.
Many gems in
@OriolVinyalsML
's Deep RL workshop talk at
#NeurIPS2019
on AlphaStar. Including scatter connections, imitation-based regularization, the league & the unique problem decomposition.
Workshop talks by Rich Sutton never fail to inspire 💭. Today’s
#ICML2020
Life-Long Learning workshop talk was no different. Exciting ideas about RL agents that learn their own questions & answers in a virtuous cycle 🔴🔄🔵 - all within the General Value Function framework.
Very happy to present our work "On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning" today at the
#ICLR2021
@neverendingrl
workshop. 🎲 + 🤖🔁🌎
Paper 📜:
Poster Session 📢 [3 & 10pm CET]:
Summary 👇
Neural net symmetries induce geometric constraints 🔷 which imply conservation laws under ∇-flow 🧑🔬 This allows for exact prediction of training dynamics. A Noether’s theorem for NNs — great theoretical work by Kunin et al. (2020)
#mlcollage
[7/52]
📝:
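The simplest instance is worth spelling out: if the loss is invariant to rescaling a weight vector w (e.g. weights feeding a normalization layer), differentiating the symmetry yields an orthogonality constraint, which gradient flow turns into a conservation law - a textbook special case of the paper's general construction:

```latex
% Scale symmetry: L(\alpha w) = L(w) for all \alpha > 0.
% Differentiating at \alpha = 1 gives the geometric constraint
\langle w, \nabla_w L \rangle = 0 .
% Under gradient flow \dot{w} = -\nabla_w L, the squared norm is conserved:
\frac{d}{dt}\,\lVert w \rVert^2
  = 2\,\langle w, \dot{w} \rangle
  = -2\,\langle w, \nabla_w L \rangle = 0 .
```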
✂️Why can we train sparse/subspace-constrained NNs? Larsen et al. derive a theory based on Gordon's Escape Theorem 🧑 → 🌔 & investigate optimized (lottery) subspaces using train data/trajectory info🎲
🎨
#mlcollage
[28/52]
📜:
💻:
⛩️ Gated Linear Networks (Veness et al., '19) are backprop-free & trained online + locally via convex programming 🧮 GLNs combat catastrophic forgetting & their linearity allows for interpretable predictions.
#mlcollage
[15/52]
📜:
💻:
🔎 How can one measure the emergence of interpretable concept units in CNNs?
@davidbau
et al. propose network dissection 💉 based on the agreement of filter activations and segmentation models 🎨
#mlcollage
[26/52]
📜:
💻:
4 challenges in lifelong learning 👶-🧑-👵: Formalism, evaluation, exploration & representation. Great start to the Lifelong ML workshop at
#ICML2020
by
@katjahofmann
,
@luisa_zintgraf
&
@contactrika
. P.S.: I have never seen such smooth multi-speaker transitions 😎
Nothing better than starting your day with some invertible models 🤠 Great historic review & explanations by
@laurent_dinh
at
#ICLR2020
! 🤖 Biggest personal takeaway: The power of sparse/triangular Jacobians in determinant computation 📐
🦎/🧬Learned Evolutionary Optimization (& Rob 😋) are going on tour! Super excited to be giving talks about our recent work on meta-discovering attention-based ES/GA & JAX during the coming days 🎙️
@AutomlSeminar
: Today 4pm CET
@ml_collective
: Tomorrow 7pm CET
Come & say hi 🤗
Powerful opening
#NeurIPS2019
keynote by
@celestekidd
! Many inspirational thoughts from developmental psychology. Curiosity and intrinsic motivation in RL have a lot of work to do.
Can we go beyond backprop + SGD? BLUR (Sandler et al., '21) meta-learns a shared low-dimensional genome 🦎 which modulates bi-directional updates 🔁 It generalizes across tasks + FFW architectures & allows NNs to have many states 🧠
#mlcollage
[16/52]
📜:
A global workspace theory for coordination among neural modules in deep learning🧠🔄 🤖 Goyal et al. ('21) propose a low-dim. bottleneck to facilitate synchronisation of specialists & replace costly pairwise attention interactions 🚀
#mlcollage
[11/52]
📜:
🤸Very excited to share evosax 🦎 release v.0.10.0 and a small paper, which covers all features and summarizes recent progress in hardware accelerated & JAX-powered evolutionary optimization!
🧑💻:
📜:
Many new features... 🧵
🦋 Meta-Policy Gradients ∇∇ have the power to change how we think about algorithm design 🧠. Learn more about automated online hyperparameter tuning and end-to-end RL objective discovery 🤖 in my new blog post!
📝:
⏰ Clockwork VAEs by Saxena et al. ('21) scale temporally abstract latent dynamics models by imposing fixed clock speeds on different levels 📐 Very cool ablations that extract each level's information content and frequency adaptation 🧠
#mlcollage
[10/52]
📜:
Workshop talks should push conceptual limits. Fascinating talk by Rich Sutton at the Bio&Artificial RL workshop
#NeurIPS2019
#SuperDyna
P.S.: I will do my best 🧠🧐✍️
Thought-provoking talk by
@white_martha
on the ingredients for BeTR-RL at the
#ICLR2020
workshop🌏! Many interesting ideas for generalization in Meta-RL, learning objectives, restricting complex MDPs & auxiliary tasks 🚀🧐
How does the RL problem affect the lottery ticket phenomenon 🤖🔁🎲? In our
#ICLR2022
spotlight we contrast RL & behavioral cloning tickets, disentangle mask/initialization ticket contributions & analyse the resulting sparse task representations. 🧵👇
📝:
For anyone who didn't catch our (w.
@yujin_tang
&
@alanyttian
) poster presentation on the coolest neuroevolution benchmark out there -- feel free to reach out & chat 📩
Would love to discuss evosax, gymnax and the future of evolutionary methods in the LLM era 🤗
#NeurIPS23
🥱 Training foundation models is so 2023 😋
🚀 Super stoked for
@SakanaAILabs
first release showing how to combine large open-source models in weight and data flow space!
All powered by evolutionary optimization 🦎
Introducing Evolutionary Model Merge: A new approach bringing us closer to automating foundation model development. We use evolution to find great ways of combining open-source models, building new powerful foundation models with user-specified abilities!
🧙 What are representational differences between Vision Transformers & CNNs?
@maithra_raghu
et al. investigate the role of self-attention & skip connections in aggregation & propagation of global info 🔎
🎨
#mlcollage
[32/52]
📜:
Trying something new 🎉 - A one-slide mini-collage of my personal 'paper of the week' 📜
1/52: VQ-VAEs had quite the week in ML 🥑+🪑=🦋 But how do β-VAEs relate to the visual ventral stream?
Check out Higgins et al. (2020) to find out 👉
❓How to efficiently estimate unbiased ∇ in unrolled optimization problems (e.g. hyperparameter tuning, learned optimizers)?🦎 Persistent ES does so by accumulating & applying correction terms for a series of truncated unrolls. 🎨
#mlcollage
[35/52]
📜:
Synthetic ∇s hold the promise of decoupling neural modules 🔵🔄🔴 for large-scale distributed training based on local info. But what are underlying mechanisms & theoretical guarantees? Check out Czarnecki et al. (2017) to find out.
#mlcollage
[5/52]
📝:
🎉 Excited to share `mle-hyperopt` v0.0.5 - a lightweight hyperparameter optimization tool, which now also features implementations of Successive Halving 🪓, Hyperband 🎸 & Population-Based Training 🦎
📂 Repo:
📜 Colab:
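Successive Halving in a nutshell: start many configs on a small budget, keep the best 1/η fraction, multiply the budget by η, repeat. A dependency-free sketch of the schedule (`evaluate` is a stand-in for your train-and-score function; names mine, not the `mle-hyperopt` API):

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2, num_rungs=3):
    """Keep the top 1/eta of configs at each rung, eta-fold the budget."""
    survivors, budget = list(configs), min_budget
    for _ in range(num_rungs):
        scores = [(evaluate(c, budget), c) for c in survivors]
        scores.sort(key=lambda s: s[0])          # lower score = better
        n_keep = max(1, len(survivors) // eta)
        survivors = [c for _, c in scores[:n_keep]]
        budget *= eta
    return survivors

# Toy objective: 'loss' after `budget` steps for learning rate c.
evaluate = lambda c, budget: abs(c - 0.3) / budget
lrs = [0.01, 0.1, 0.2, 0.3, 0.5, 1.0, 2.0, 5.0]
print(successive_halving(lrs, evaluate))  # → [0.3]
```

Hyperband then simply runs several such brackets with different (num_configs, min_budget) trade-offs, hedging against bad early-stopping decisions.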
What is the right framework to study generalization in neural nets? 🧠🔄🤖
@PreetumNakkiran
et al. ('21) study the gap between models trained to minimize the empirical & population loss 📉 Providing a new 🔍 for studying DL phenomena
#mlcollage
[13/52]
📜:
Had a great time at last week's
@sparsenn
workshop ✂️ Absolutely loved
@thoefler
's tutorial covering many considerations (what, when, how). Beautiful distillation 🎨 Check out the accompanying survey paper & recording 🤗
📜:
📺:
🧬 Evolution is the ultimate discovery process & its biological instantiation is the only proof of an open-ended process that has led to diverse intelligence!
One of my deepest beliefs: A scalable evolutionary computation analogue will open up many new powerful perspectives 🧑🔬
🎙️Stoked to present evosax tomorrow at
@PyConDE
It has been quite the journey since my 1st blog on CMA-ES 🦎 and I have never been as stoked about the future of evo optim. 🚀
Slides 📜:
Code 🤖:
Event 📅:
Can memory-based meta-learning not only learn adaptive strategies 💭 but also hard-code innate behavior🦎? In our
#AAAI2022
paper
@sprekeler
& I investigate how lifetime, task complexity & uncertainty shape meta-learned amortized Bayesian inference.
📝:
What drives hippocampus-neocortical interactions in memory consolidation?
@SaxeLab
argues for a top-down perspective & the predictability of the environment. 🧠🤓🌎
🚂 Looking for a tool to manage your training runs locally, on Slurm/Grid Engine clusters, SSH servers or GCP VMs? `mle-scheduler` provides a lightweight API to launch & monitor job queues on all of these 🚀
💻 [repo]:
📜 [colab]:
How can we create training distributions rich enough to yield powerful policies for 🦾 manipulation? OpenAI et al. ('21) scale asymmetric self-play to achieve 0-shot generalisation to unseen objects 🧊🍴.
#mlcollage
[14/52]
📜:
💻:
😈 Adversarial robustness of CNNs correlates with their V1 🧠 response predictivity & can be improved by attaching a fixed-weight bio-constrained Gabor Filter-style model w. stochasticity as front-end 🚀
#mlcollage
[22/52]
📜:
💻:
I had a great time placing 5th in the Algonauts Mini-Track Challenge🎉 using SimCLR-v2 features to predict neural responses to videos 📺 → 🧠 Come & say hi next Tue at the
@CogCompNeuro
WS 📢 talks
👉
📝
🤖
🤖 How can we learn useful temporal abstractions that transfer across tasks? Veeriah et al. ('21) propose to discover options by optimizing their parametrization via meta-∇ 🧠 Loved the idea to disentangle option reward & policy.
#mlcollage
[9/52]
📜:
Looking for a new dinner conversation🥘?
@KonstDaskalakis
got you covered - How about the beauty of directed parity & Sperner's lemma in proving Brouwer's fixed point theorem? 👀
#MLSS2020
video recordings:
📽️ - I:
📽️ - II:
Big shout out to
@TedPetrou
🤗 and the wonderful jupyter_to_medium package, which allowed me to export-import my notebook to medium within less than 2 minutes. Simply dope - saved me hours of my life! Can I buy you a coffee?
#opensource
Check it out:
🎉 Meta-evolved RL algorithms just became temporally aware 🤯 Phenomenal work led by
@JacksonMattT
&
@_chris_lu_
! Happy to have been a small part of the team 🤗
How can we meta-learn new RL algorithms that vastly outperform PPO and its variants? In our ICLR 2024 paper, we find that *temporally-aware* algorithms unlock performance gains, significantly beating PPO on unseen tasks!
work co-led with
@JacksonMattT
at
@whi_rl
@FLAIR_Ox
💊 Canonical Capsules (Sun et al., 2020) remove the need for pre-alignment in 3D point cloud tasks via a k-part capsule decomposition with attention. The representations can be used for auto-encoding, registration & classification.
#mlcollage
[6/52]
🧑💻: