Successfully completed my pet project (pun intended)! The 🤖🐕 can follow commands, navigate to target positions, kick a soccer ball, and play Taiko-no-Tatsujin 🥁
Next project: school of fish 🐟🐠🐡
New paper w/
@hardmaru
on applying attention for RL after AttentionAgent (), this time we shift our attention to the sensory neuron level. The agent is not only permutation invariant but also robust to noise, and it generalizes better.
The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning
We explore RL agents that still work even when their observations get shuffled around a lot!
A fun paper w/
@yujin_tang
web
pdf
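The core idea can be sketched in a few lines of plain numpy: every observation component is fed through the same shared "sensory neuron" projection, and attention against a fixed set of learned queries pools the results, so shuffling the inputs leaves the output unchanged. This is a minimal illustration, not the paper's exact architecture (which, for instance, also feeds each neuron the previous action):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def permutation_invariant_encode(obs, wk, wv, q):
    """One 'sensory neuron' per scalar input, all sharing the same
    weights; attention against fixed queries q pools their outputs
    into a message that does not depend on the input ordering."""
    x = obs[:, None]            # (n_inputs, 1): one neuron per input
    keys = np.tanh(x @ wk)      # shared key projection for every neuron
    vals = np.tanh(x @ wv)      # shared value projection
    attn = softmax(q @ keys.T)  # (n_queries, n_inputs) attention weights
    return attn @ vals          # pooled message, order-free
```

Permuting `obs` permutes the key and value rows together, so the attention-weighted sum is unchanged — that is the whole trick.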
In this task, all fish move at constant speed; each observes the states of its M nearest neighbors and learns to orient itself so the school moves coherently. All fish share the same MLP policy.
EvoJAX allows quick integration of new tasks, we hope to see more user extensions.
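A minimal numpy sketch of the setup described above: each fish builds an observation from its M nearest neighbors and runs it through one MLP shared by the whole school. Function names and the exact observation layout are my own illustration, not the EvoJAX task's code:

```python
import numpy as np

def nearest_m_observation(positions, headings, i, m):
    """Observation for fish i: relative positions and headings of its
    m nearest neighbors, flattened into one vector."""
    d = np.linalg.norm(positions - positions[i], axis=1)
    d[i] = np.inf                      # exclude the fish itself
    idx = np.argsort(d)[:m]           # indices of the m nearest fish
    return np.concatenate([(positions[idx] - positions[i]).ravel(),
                           headings[idx]])

def mlp_policy(obs, params):
    """Two-layer MLP shared by every fish, mapping an observation to a
    turn rate in [-1, 1]."""
    (w1, b1), (w2, b2) = params
    h = np.tanh(obs @ w1 + b1)
    return np.tanh(h @ w2 + b2)
```

Because the policy is shared, evolution only has to optimize one small parameter vector regardless of the school's size.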
My intern efforts have been merged🎉
Based on the flocking example from jax-md <>, we have migrated the flocking movement to work in the EvoJAX environment.
check it out!
We release a new demo task in EvoJAX:
A group of agents (blue) learnt to hunt another (red). An agent gets air from N,S,E,W neighbors if they are not occupied by an opponent or a wall, and dies if its total air is less than 2.
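The air rule is easy to state in code. A hedged sketch, assuming the field is a grid encoded as 0 = free cell and 1 = wall or opponent (the actual task's data layout may differ):

```python
import numpy as np

def air(grid, r, c):
    """Air for the cell at (r, c): one unit from each N/S/E/W neighbor
    that is free (0). Cells outside the grid (walls) and occupied
    cells (1) contribute nothing. An agent with air < 2 dies."""
    h, w = grid.shape
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < h and 0 <= nc < w and grid[nr, nc] == 0:
            total += 1
    return total
```

So hunting amounts to cornering the red agent until fewer than two of its four neighbors remain free.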
SayTap: Language to Quadrupedal Locomotion
We use foot contact patterns as an interface to bridge natural-language instructions and low-level control commands.
New paper w/ Wenhao Yu, Jie Tan,
@heiga_zen
,
@AleksandraFaust
,
@ttyharada
Web
PDF
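To make the interface concrete: a foot contact pattern can be viewed as a binary matrix of four feet over a short horizon, with 1 meaning the foot is on the ground. The templates below are illustrative stand-ins of my own, not SayTap's actual patterns:

```python
import numpy as np

def contact_template(gait, horizon=8):
    """Hypothetical foot-contact template: binary matrix of shape
    (4 feet, horizon), rows ordered FL, FR, RL, RR."""
    t = np.arange(horizon)
    if gait == "trot":                        # diagonal pairs alternate
        fl = ((t // 2) % 2 == 0).astype(int)  # FL/RR phase
        fr = 1 - fl                           # FR/RL phase
        return np.stack([fl, fr, fr, fl])
    if gait == "stand":
        return np.ones((4, horizon), dtype=int)
    raise ValueError(gait)
```

An LLM that emits such a matrix never has to reason about joint angles; the low-level controller just tracks the contacts.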
We have well-developed infra and toolkits for RL, but the equivalent is missing for neuroevolution practitioners. We hope EvoJAX can fill this gap. Your feedback and contributions are most welcome!
EvoJAX is developed by
@yujin_tang
@alanyttian
We tried to make evolution run really fast with JAX on a wide range of tasks: MNIST, Seq2Seq, Locomotion, Multi-Agent Competition, Generative Art.
WaterWorld-Env adapted from
@karpathy
’s old JavaScript demo!
Our paper “DEIR: Efficient and Robust Exploration through Discriminative-Model-Based Episodic Intrinsic Rewards” has been accepted at IJCAI 2023 (15% acceptance rate, w/
@swan_104
@tkaneko
@alanyttian
). DEIR explores more efficiently, especially in partially observable tasks.
Fusing Gundam with Transformer is the "Gotenks" magic in the model space!
By merging a JP LM with specialist models, we achieve top perf on various JP LM benchmarks.
We explored param and data flow space merging, and will further propel the tech across modalities and functions.
Introducing Evolutionary Model Merge: A new approach bringing us closer to automating foundation model development. We use evolution to find great ways of combining open-source models, building new powerful foundation models with user-specified abilities!
Discovering EvoLLM: Harnessing LLMs as evolutionary operators unveils remarkable insights. Thanks
@RobertTLange
for pioneering this work.
Merging evolutionary algorithms & LLMs could unlock a realm of exciting opportunities. A promising avenue for future exploration! ✨
Inspired by
@Troika_London
's work, I used EvoJAX () to create a visual illusion in sim, where a chain of particles place themselves in space to show a square from the front and a heart when observed from the back. The video below shows an initial result.
Our work has been accepted at CoRL 2023, we thank our reviewers for the insightful comments and feedback.
SayTap is also featured in our Google AI blog:
See you in Atlanta :)
Introducing a new language-to-reward system for interfacing LLMs with robots using reward functions. Learn how the system’s predictive control tool enables users to teach robots novel actions using natural language inputs →
SayTap: Language to Quadrupedal Locomotion
paper page:
Large language models (LLMs) have demonstrated the potential to perform high-level planning. Yet, it remains a challenge for LLMs to comprehend low-level commands, such as joint angle targets or
Our robot not only understands direct instructions such as “trot forward fast” but also responds to vague human commands.
Its reaction to “Act as if the ground is very hot” is well aligned with my expectation and is my personal favorite.
Check out our website for more videos!
Join our VLM+🤖 workshop at
#ICRA
in Yokohama, Japan! Share your cool projects, meet awesome people, and explore cutting-edge research. The paper submission deadline is March 11 🚀
Large Language Models (LLMs) and Vision-Language Models (VLMs) are poised to revolutionize robotics.
Join our workshop at
#ICRA2024
on VLMs/LLMs for scene understanding, decision making, control, and more:
Submissions due March 11, 2024!
We contribute an LLM prompt, a reward design, and a training pipeline that allow us to train a quadrupedal locomotion controller in ~20 min on a single V100 GPU. The controller can then be transferred to a real robot without any fine-tuning.
Figure↓ gives an overview of our system
🚀 How can meta-learning, self-attention & JAX power the next generation of Evolutionary Optimizers 🦎?
Excited to share my
@DeepMind
internship project and our
#ICLR2023
paper ‘Discovering Evolution Strategies via Meta-Black-Box Optimization’ 🎉
📜:
Join us today to chat about our
#NeurIPS2021
paper “The Sensory Neuron as a Transformer: Permutation-Invariant Neural Networks for Reinforcement Learning” at the spotlight session
info
poster
08:30PST, 11:30EST, 16:30GMT, 01:30JST
🎉 Stoked to share NeuroEvoBench – a JAX-based Evolutionary Optimizer benchmark for Deep Learning 🦎/🧬
🌎 To be presented at
#NeurIPS2023
Datasets & Benchmarks with
@yujin_tang
&
@alanyttian
🌐:
📜:
🧑‍💻:
While gradient descent has been very successful, let's not forget there are other options that may lead to surprisingly good results. This work by
@RobertTLange
is super interesting!
🦎/🧬Learned Evolutionary Optimization (& Rob 😋) are going on tour! Super excited to be giving talks about our recent work on meta-discovering attention-based ES/GA & JAX during the coming days 🎙️
@AutomlSeminar
: Today 4pm CET
@ml_collective
: Tomorrow 7pm CET
Come & say hi 🤗
Vizier is an indispensable tool for many ML engineers inside Google, and I'm sure external users will benefit from it as well. I'm also glad that our EvoJAX () is used by the open-source Vizier.
At each step, an agent can choose to stay still or move to one of its N, S, E, W neighbors. It also observes the entire field. We employ an attention-based policy network and PGPE to solve this task. The learnt policy zero-shot generalizes to different numbers of agents↓
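Why the policy transfers zero-shot across team sizes: with self-attention, every parameter is per-feature, so the same weights process any number of agents. A schematic numpy version (the real network is richer than this):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def set_policy(agent_states, wq, wk, wv):
    """Self-attention over a variable-size set of agent states.
    Weights depend only on the feature dimension, so the identical
    policy runs unchanged for any number of agents."""
    q = agent_states @ wq                        # (n, d)
    k = agent_states @ wk
    v = agent_states @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (n, n)
    return attn @ v                              # per-agent features
```

The output row count simply follows the input: four agents in, four feature rows out; nine in, nine out — no retraining needed.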
We simplify the new intrinsic reward with the BH inequality to make training tractable. We also introduce a discriminative model that learns to tell genuine transitions from fake trajectories for better embeddings.
Paper:
Video:
DEIR scales a novelty-based intrinsic reward with a conditional mutual information term that relates actions to distances between past and present observations. Agents are thus able to tell true novelties from those rooted in the stochasticity of the environment's dynamics.
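For intuition only — a generic episodic-novelty signal of the flavor DEIR builds on: the distance from the current embedding to its nearest neighbor among this episode's past embeddings. DEIR's actual reward additionally applies the conditional-mutual-information scaling described above; see the paper for the exact formula, which this sketch deliberately omits:

```python
import numpy as np

def episodic_novelty(embedding, memory):
    """Generic episodic novelty: nearest-neighbor distance between the
    current observation embedding and the embeddings seen so far in
    this episode. NOT DEIR's reward -- just the base ingredient."""
    if not memory:
        return 1.0  # first observation of the episode: maximally novel
    return float(min(np.linalg.norm(embedding - m) for m in memory))
```

The failure mode DEIR fixes is visible here: in a stochastic environment this distance stays large forever, so a plain novelty bonus rewards noise as if it were exploration.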
Introducing 𝗥𝗼𝗯𝗼𝗣𝗶𝗮𝗻𝗶𝘀𝘁 🎹🤖, a new benchmark for high-dimensional robot control! Solving it requires mastering the piano with two anthropomorphic hands.
This has been one year in the making, and I couldn’t be happier to release it today! Some highlights below:
@togelius
@hardmaru
@GoogleAI
Just a follow-up, we’ve released an implementation of MAP-Elites (), and will gradually implement other QD methods as well.
As always, we’d love user feedback and contributions :)
I'd like to thank
@moverfitted
for taking the time and effort to make the video; it is the highest possible reward for the authors. I especially love the extra comparison results presented near the end of the video. I wish we'd done that ourselves.
The Sensory Neuron as a Transformer in PyTorch via
@YouTube
Definitely a cool paper and I hope some of you could find my take on the implementation helpful. I only focused on the CartPoleSwingUp task to make things easier.
@yujin_tang
@hardmaru
@noguchis
Thanks for trying our models! For the 10B model, can you try loading it with bfloat16? I believe that's what caused the inference speed difference.
My first thought (before reading the text): an image generating system created this with prompts like "cooking ramen with burning trees". Now I wonder what images will be created with "a tree hit by lightning exposes its vascular system", I guess they won't be close to this
@karpathy
@hardmaru
Hi, I'm not sure at this point if discrete representations have more benefits than continuous ones (being able to play with them in gradient-free ways is definitely a huge one, though). But I think they're less explored (I could be wrong on this), and it's exciting to find out more.
Read this book over the weekend. PFN is a great company; this book describes its vision, values, and management. I like chapter 8 the most, where it emphasizes personal robots. I'm also enthusiastic about robots, and I hope what I'm doing can add to this bright future.
Google cares about mental health; there are sessions that introduce methods to deal with stress. One important thing is to get good sleep. My personal experience says the most effective way to sleep soundly is to NEVER check your agent's learning curve before going to bed.
@danbri
@hardmaru
Not quite. We don't train the agent with permuted inputs and hope it memorizes various patterns; permutation invariance is entirely by design. On the other hand, in occluded Pong, we did drop some fraction of inputs. But it's more general than dropout: we can accept inputs of arbitrary sizes.
@truth_tesla
@karpathy
@hardmaru
The agent has two parts: a self-attention-based visual module and a controller. The former processes the entire image (quote: maximize input data); the latter uses only the selected patches' features (quote: throw away data based on training). Is this different from what you meant?
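A toy version of that split, assuming per-patch features and a single learned scoring vector (the real AttentionAgent votes with self-attention keys and queries, so this is a simplification): the visual module scores every patch, and the controller only ever sees the top-k winners.

```python
import numpy as np

def select_patches(patch_features, w_att, k):
    """Score all image patches, keep only the top-k for the controller.
    patch_features: (n_patches, d); w_att: (d, 1) scoring vector."""
    scores = (patch_features @ w_att).ravel()   # one importance per patch
    top = np.argsort(scores)[-k:][::-1]         # k highest, best first
    return top, patch_features[top]             # indices + kept features
```

Everything outside the top-k patches is discarded before control, which is exactly the "throw away data" half of the quote.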
@Troika_London
Thanks to
@zzznah
's pure jax renderer (), I was able to run the entire training pipeline on accelerators.
This is only an initial result and there can be a lot of improvements. I'll release the code as an EvoJAX example.
Russia’s ruble hit a seven-year high, cementing its status as the world’s best-performing currency.
It has gained about 35% so far this year, and has more than doubled from a low after the invasion of Ukraine.
Shinjiro Koizumi, son of former PM Junichiro Koizumi, is widely disliked in Japan for dumb statements that have zero substance.
As a result, he's the source of many memes. People often just post a picture of his face with a meaningless or obvious statement underneath. Thread.
@tkasasagi
Oasis is great for in-the-bath/on-the-train reading (the Scribe is a little heavy and not waterproof). For papers, you'll need a larger screen. I have a Fujitsu Quaderno and borrowed a Boox from a friend; both are great. I don't take notes, so I have no experience with their pens.
@chenyuio
@hardmaru
While we didn't eliminate pruning entirely, we played with the number of important patches in CarRacing with a noisy background. Performance correlates positively with this number, as expected, but I think the agent will generalize worse with these redundant patches.