Since the introduction of Gym (now Gymnasium) in 2016 and PettingZoo in 2020, these libraries have provided a common API for training libraries and environments to build upon. In line with Farama’s long-term goal, described in our Announcing The Farama Foundation blog post, we propose Minari, a dataset API for Offline Reinforcement Learning (Offline RL). Minari provides the capability to create your own environment-based datasets, download open-source datasets, and upload your own datasets for others to use. This blog post outlines what Offline Reinforcement Learning is, the design philosophy of Minari, and our plan going forward.
You can start playing around with Minari today; see our website, minari.farama.org, for example implementations and tutorials.
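As a quick taste, here is a minimal sketch of downloading a hosted dataset and iterating over its episodes. It follows the pattern in the Minari documentation, but the dataset ID is only illustrative and exact names may differ between versions.

```python
# Minimal sketch: download a hosted dataset and iterate over its episodes.
# The dataset ID below is illustrative; see the Minari docs for the current list.
import minari

minari.download_dataset("pen-human-v1")        # fetch the dataset into the local cache
dataset = minari.load_dataset("pen-human-v1")  # load it as a MinariDataset

print(f"Total episodes: {dataset.total_episodes}")

# Sample a handful of episodes and inspect their transitions.
for episode in dataset.sample_episodes(n_episodes=4):
    print(episode.observations.shape, episode.actions.shape, episode.rewards.sum())
```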
The majority of previous Reinforcement Learning research has focused on online learning, where an agent/policy actively interacts with an environment to update and improve over time. This is commonly achieved through simulation, where the environment is a video game or physics engine, e.g. Atari games or MuJoCo. However, deploying online Reinforcement Learning to real-world settings such as robotics, autonomous driving, energy management or healthcare is more challenging, because an agent must learn through trial and, in particular, error. For example, training a self-driving car to navigate a city with online reinforcement learning would require the agent to learn from scratch, which, amongst pedestrians and other drivers, is an unacceptable safety hazard.
An alternative method is to train the agent on a large dataset of human driving experience, so that the agent can learn safe driving from real data before ever being deployed to the real world. This approach is the foundation of offline reinforcement learning, which has seen an explosion in use and research in the past few years [1]. In contrast to online RL, where agents learn by directly interacting with the environment, Offline RL agents learn by updating the policy from samples of a static dataset of previously collected data.
The collected dataset can be generated by humans, a suboptimal policy, or any sort of control system that produces actions for the agent. This approach has already shown promising results in robotics, video games, disease mitigation, autonomous driving, and generalist agents, as well as in other industry applications such as Spotify’s recommendation system and Amazon’s research on order fraud evaluation for e-commerce.
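To make the distinction concrete, the sketch below shows the basic shape of offline learning: every update draws a minibatch from a fixed dataset of previously collected transitions, and the environment is never stepped. The synthetic data and the simple behavioural-cloning-style linear update are stand-ins purely for illustration, not a Minari API or a specific offline RL algorithm.

```python
# Illustrative sketch of offline learning: the policy improves only from a
# static dataset of previously collected transitions; env.step() is never called.
import numpy as np

rng = np.random.default_rng(0)

# A static dataset of (observation, action) pairs, e.g. logged from a human or a
# previously trained controller. Random placeholders are used here for illustration.
dataset = {
    "observations": rng.normal(size=(10_000, 4)),
    "actions": rng.normal(size=(10_000, 2)),
}

# Train a linear policy to imitate the dataset actions via minibatch gradient descent.
weights = np.zeros((4, 2))
for step in range(1_000):
    idx = rng.integers(0, len(dataset["observations"]), size=256)  # sample a minibatch
    obs, act = dataset["observations"][idx], dataset["actions"][idx]
    grad = obs.T @ (obs @ weights - act) / len(idx)                # mean-squared-error gradient
    weights -= 0.05 * grad                                         # update from static data only
```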
Within Offline RL, a number of open-source datasets already exist; however, none of them share a common API for users to interact with. Examples include Bridge, RoboNet, and VAL for real-world vision-based robot learning; D4RL and RL Unplugged, which contain benchmark datasets from different simulated environments; and Crowdplay, which provides a human-interaction interface for collecting datasets from RL environments.
Today, the Farama Foundation is introducing Minari as one of its core API packages alongside Gymnasium and PettingZoo, to serve as an open-source standard API and reference collection of Offline RL datasets. We believe that by open-sourcing a large collection of standard datasets, researchers can advance the field more efficiently, effectively, and collaboratively. We aim for Minari to become the de facto API for open-source offline RL datasets, supporting the development of new algorithms and providing researchers with a common benchmark to compare results. We are uniquely positioned to do this as a neutral nonprofit with a diverse board of directors that maintains a number of open-source RL environments, e.g., Minigrid, Miniworld and Gymnasium-Robotics. This release is our first push into the offline RL space, which we want to begin supporting at a level comparable to online RL going forward. We plan to integrate Minari into all of the environments that Farama maintains, as well as work with third-party libraries to adopt it, similar to what we do for Gymnasium and PettingZoo.
Furthermore, several major open source RL projects have agreed to switch to using Minari as their standard, and we hope to see many releases along these lines in the future. As a result, we are planning to deprecate D4RL in favor of Minari; several D4RL datasets (Adroit Hand, Point Maze and Kitchen) are already included in Minari, and we are actively working on adding the rest.
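If you want to see which datasets are already hosted, including the ported D4RL ones, a sketch along these lines should work, assuming the `list_remote_datasets` helper described in the Minari documentation:

```python
# Sketch: list the datasets currently hosted on the Minari remote server.
# Assumes the list_remote_datasets helper described in the Minari documentation.
import minari

remote_datasets = minari.list_remote_datasets()  # mapping of dataset_id -> metadata
for dataset_id in sorted(remote_datasets):
    print(dataset_id)
```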
When designing Minari, we have worked to incorporate several key ideas:
We envision a bright future for Offline RL with many new applications in different disciplines, and hope that Minari will be an integral part of this process. To further this goal, we have the following high-level roadmap:
A lot of work remains to be done, and we are continuing to develop Minari. If you want to be part of this journey, we would love to hear from you and look forward to your contributions. If you have questions, the best way to get in touch with us is to join our Discord server. We hope to see you there.
[1] Levine, Sergey, et al. “Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems.” arXiv preprint arXiv:2005.01643, 2020, http://arxiv.org/abs/2005.01643.
[2] Prudencio, R. F., M. R. O. A. Maximo, and E. L. Colombini. “A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems.” IEEE Transactions on Neural Networks and Learning Systems, 2023, doi: 10.1109/TNNLS.2023.3250269.