MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning

ICLR 2024

Meta-reinforcement learning (meta-RL) is a promising framework for tackling challenging domains requiring efficient exploration. Existing meta-RL algorithms are characterized by low sample efficiency, and mostly focus on low-dimensional task distributions. In parallel, model-based RL methods have been successful in solving partially observable MDPs, of which meta-RL is a special case. In this work, we leverage this success and propose a new model-based approach to meta-RL, based on elements from existing state-of-the-art model-based and meta-RL methods. We demonstrate the effectiveness of our approach on common meta-RL benchmark domains, attaining greater return with better sample efficiency (up to 15×) while requiring very little hyperparameter tuning. In addition, we validate our approach on a slate of more challenging, higher-dimensional domains, taking a step towards real-world generalizing agents.
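
The observation that meta-RL is a special case of a POMDP can be illustrated with a short sketch: a recurrent world model carries its latent state across episodes of the same task, so task inference happens implicitly. This is a minimal illustration under assumed interfaces (env, world_model, policy), not the paper's implementation.

```python
# Illustrative only: the task identity is the hidden state of a POMDP, so a
# recurrent world model can infer it by keeping its latent across episodes.
def meta_rollout(env, world_model, policy, episodes_per_task=3):
    latent = world_model.initial_state()      # recurrent latent, shared across episodes
    trajectory = []
    for _ in range(episodes_per_task):
        obs, done = env.reset(), False        # new episode of the *same* task
        while not done:
            # The latent summarizes all experience with this task so far,
            # including previous episodes, i.e. the meta-RL "context".
            action = policy(latent)
            next_obs, reward, done, _ = env.step(action)
            latent = world_model.update(latent, action, next_obs, reward)
            trajectory.append((obs, action, reward, next_obs, done))
            obs = next_obs
    return trajectory
```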

NeRN - Learning Neural Representations for Neural Networks

ICLR 2023 - Notable Top 25% (Spotlight)

Neural representations have recently been shown to effectively reconstruct a wide range of signals, from 3D meshes and shapes to images and videos. We show that, when adapted correctly, neural representations can be used to directly represent the weights of a pre-trained convolutional neural network, resulting in a Neural Representation for Neural Networks (NeRN). We assign a coordinate to each convolutional kernel in our network and optimize a predictor network to map coordinates to their corresponding weights. We show that incorporating a smoothness constraint over the original network's weights aids NeRN in achieving a better reconstruction. In addition, we employ techniques from the field of knowledge distillation to stabilize the learning process. We demonstrate the effectiveness of NeRN in reconstructing widely used architectures on CIFAR-10, CIFAR-100, and ImageNet.
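
The core mechanism, mapping kernel coordinates to weights with a predictor network, can be sketched as follows. This is a minimal PyTorch-style illustration under assumed choices (a plain MLP predictor, (layer, filter, channel) coordinates, an L2 reconstruction loss); the paper's exact architecture, smoothness constraint, and distillation losses are not reproduced here.

```python
import torch
import torch.nn as nn

class WeightPredictor(nn.Module):
    """Maps a (layer, filter, channel) coordinate to a k x k convolutional kernel."""
    def __init__(self, kernel_size=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, kernel_size * kernel_size),
        )

    def forward(self, coords):                 # coords: (N, 3) float tensor
        return self.net(coords)                # (N, k*k) predicted kernel weights

def reconstruction_loss(predictor, coords, target_kernels):
    """L2 between predicted kernels and the original pre-trained kernels."""
    pred = predictor(coords)
    return ((pred - target_kernels) ** 2).mean()

# Toy usage: reconstruct 10 random "kernels" from their coordinates.
coords = torch.randn(10, 3)
targets = torch.randn(10, 9)
loss = reconstruction_loss(WeightPredictor(), coords, targets)
```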

Meta-RL with Finite Training Tasks - a Density Estimation Approach

NeurIPS 2022

In meta reinforcement learning, an agent learns from a set of training tasks how to quickly solve a new task, drawn from the same task distribution. The question we explore in this work is how many training tasks are required to guarantee approximately optimal behavior with high probability, and which properties of the task distribution affect this number. We propose to directly learn the task distribution, and then train a policy on the learned task distribution. We show that our approach leads to bounds that depend on the dimension of the task distribution. In settings where the task distribution lies on a low-dimensional manifold, we extend our analysis to use dimensionality reduction techniques, obtaining significantly better bounds than previous work, whose bounds strictly depend on the number of states and actions. The key to our approach is the regularization implied by the kernel density estimation method. We further demonstrate that this regularization improves generalization in practice.
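
The approach of learning the task distribution directly and then training on it can be illustrated with kernel density estimation. A minimal sketch, assuming tasks are parameterized by low-dimensional vectors and using scikit-learn's KernelDensity; here the bandwidth plays the role of the regularizer mentioned above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Parameters of the finite set of training tasks
# (illustrative: 20 tasks in a 2-D task space).
train_task_params = np.random.rand(20, 2)

# Step 1: learn the task distribution with kernel density estimation.
kde = KernelDensity(kernel="gaussian", bandwidth=0.1).fit(train_task_params)

# Step 2: train the meta-policy on tasks sampled from the *learned*
# distribution rather than only on the 20 observed tasks; the KDE bandwidth
# smooths the empirical distribution and acts as a regularizer.
sampled_tasks = kde.sample(n_samples=1000)
```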

You Got Me Dancing

Motion transfer, the task of re-enacting the image of a person according to the movement of another, is an active research field in computer vision. While recent methods achieve realistic-looking results in controlled scenarios, it is challenging to obtain similar results in complex, crowded, in-the-wild scenes. In this work we tackle this task while integrating the synthesized person into the real-world target scene. We call this task Scene Aware Motion Transfer (SMT). To achieve a robust solution, we introduce a novel workflow that harnesses a set of models, each attaining state-of-the-art results in its respective field. We first construct a novel person tracking workflow to separate each unique identity from the people in the scene. We then utilize the tracking results for a targeted single-person motion transfer, resulting in a fully automatic workflow that handles complex videos. Extensive evaluation demonstrates the quality and robustness of the results in different scenarios.
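
The workflow can be summarized as three stages: multi-person tracking, targeted single-person motion transfer, and compositing into the target scene. The sketch below is only an outline of that pipeline; every function is a hypothetical placeholder standing in for a full model.

```python
def track_identities(scene_video):
    """Stage 1 (placeholder): separate each unique person in the scene into its own track."""
    raise NotImplementedError("multi-person tracking model goes here")

def transfer_motion(track, driving_motion):
    """Stage 2 (placeholder): single-person motion transfer for one tracked identity."""
    raise NotImplementedError("single-person motion transfer model goes here")

def composite_into_scene(scene_video, rendered_people):
    """Stage 3 (placeholder): blend the synthesized people back into the real-world scene."""
    raise NotImplementedError("scene-aware compositing goes here")

def scene_aware_motion_transfer(scene_video, driving_motion):
    tracks = track_identities(scene_video)
    rendered = [transfer_motion(track, driving_motion) for track in tracks]
    return composite_into_scene(scene_video, rendered)
```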