An open source robotics benchmark for meta- and multi-task reinforcement learning

Meta-World

Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic manipulation tasks. We aim to provide task distributions that are sufficiently broad to evaluate meta-RL algorithms' generalization ability to new behaviors.

For more background information, please refer to our website and the accompanying conference publication, which provides baseline results for 8 state-of-the-art meta- and multi-task RL algorithms.

Join the Community

Join our mailing list, [email protected], for infrequent announcements about the status of the benchmark, critical bugs and known issues before conference deadlines, and future plans.

Need some help? Have a question which is not quite a bug and not quite a feature request?

Join the community Slack by filling out this Google Form.

Installation

Meta-World is based on MuJoCo, which has a proprietary dependency we can't set up for you. Please follow the instructions in the mujoco-py package for help. Once you're ready to install everything, run:

pip install git+https://github.com/rlworkgroup/[email protected]#egg=metaworld

Alternatively, you can clone the repository and install an editable version locally:

git clone https://github.com/rlworkgroup/metaworld.git
cd metaworld
pip install -e .

Using the benchmark

Here is a list of benchmark environments for meta-RL (ML*) and multi-task-RL (MT*):

  • ML1 is a meta-RL benchmark environment that tests few-shot adaptation to goal variation within a single task. You can choose to test variation within any of the 50 tasks for this benchmark.
  • ML10 is a meta-RL benchmark that tests few-shot adaptation to new tasks. It comprises 10 meta-train tasks and 5 test tasks.
  • ML45 is a meta-RL benchmark that tests few-shot adaptation to new tasks. It comprises 45 meta-train tasks and 5 test tasks.
  • MT1, MT10, and MT50 are multi-task-RL benchmark environments for learning a multi-task policy that performs 1, 10, and 50 training tasks, respectively. MT1 is similar to ML1 because you can choose to test variation within any of the 50 tasks for this benchmark. In the original Meta-World experiments, we augment MT10 and MT50 environment observations with a one-hot vector that identifies the task. We don't enforce how users utilize task one-hot vectors; however, one solution would be to use a Gym wrapper that appends the one-hot vector to each observation.
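As an illustration of the one-hot augmentation described above, here is a minimal sketch of such a wrapper. `OneHotTaskWrapper` is a hypothetical name and is not part of Meta-World; it assumes the wrapped env exposes `reset()` and the old Gym 4-tuple `step()` used in the examples in this README. A real `gym.ObservationWrapper` would also need to widen `observation_space` to include the extra dimensions, which this sketch does not do.

```python
import numpy as np


class OneHotTaskWrapper:
    """Hypothetical wrapper (not part of Meta-World) that appends a one-hot
    task ID to every observation returned by reset() and step()."""

    def __init__(self, env, task_index, num_tasks):
        self.env = env
        # Fixed one-hot vector identifying this env's task among num_tasks.
        self.one_hot = np.zeros(num_tasks, dtype=np.float64)
        self.one_hot[task_index] = 1.0

    def _augment(self, obs):
        # Concatenate the raw observation with the task identifier.
        return np.concatenate([np.asarray(obs, dtype=np.float64), self.one_hot])

    def reset(self):
        return self._augment(self.env.reset())

    def step(self, action):
        # Assumes the old Gym 4-tuple (obs, reward, done, info) API.
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info

    def __getattr__(self, name):
        # Forward anything else (action_space, set_task, ...) to the wrapped env.
        # Note: observation_space is forwarded unchanged, so its shape is stale.
        return getattr(self.env, name)
```

For MT10, each of the 10 training environments would get its own `task_index` in `range(10)`, so a 12-dimensional observation becomes 22-dimensional.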

Basics

We provide a Benchmark API that allows constructing environments that follow the gym.Env interface.

To use a Benchmark, first construct it (this samples the tasks allowed for one run of an algorithm on the benchmark). Then, construct at least one instance of each environment listed in benchmark.train_classes and benchmark.test_classes. For each of those environments, a task must be assigned to it using env.set_task(task) from benchmark.train_tasks and benchmark.test_tasks, respectively. Tasks can only be assigned to environments which have a key in benchmark.train_classes or benchmark.test_classes matching task.env_name.
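The construction rule above (one instance per class, each given a task whose `env_name` matches its key) can be captured in a small helper. This is a hypothetical convenience function, not part of the Meta-World API; it only assumes that `classes` behaves like `benchmark.train_classes` / `benchmark.test_classes` (a dict from env name to env class) and that each task has an `env_name` attribute, as described above.

```python
import random
from collections import defaultdict


def build_envs(classes, tasks, rng=random):
    """Construct one env per class and assign it a task whose env_name matches.

    Hypothetical helper, not part of Meta-World. `classes` maps env names to
    env classes (e.g. benchmark.train_classes) and `tasks` is an iterable of
    task objects with an `env_name` attribute (e.g. benchmark.train_tasks).
    """
    # Group tasks by the environment they belong to, rejecting orphans.
    tasks_by_name = defaultdict(list)
    for task in tasks:
        if task.env_name not in classes:
            raise ValueError(f"task for {task.env_name!r} has no matching env class")
        tasks_by_name[task.env_name].append(task)

    envs = {}
    for name, env_cls in classes.items():
        if not tasks_by_name[name]:
            raise ValueError(f"no tasks available for {name!r}")
        env = env_cls()
        # Pick one of the matching tasks at random and assign it.
        env.set_task(rng.choice(tasks_by_name[name]))
        envs[name] = env
    return envs
```

Passing a seeded `random.Random` as `rng` makes the task choice reproducible, which the global-`random` examples below do not guarantee.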

Please see below for some small examples using this API.

Running ML1 or MT1

import metaworld
import random

print(metaworld.ML1.ENV_NAMES)  # Check out the available environments

ml1 = metaworld.ML1('pick-place-v1') # Construct the benchmark, sampling tasks

env = ml1.train_classes['pick-place-v1']()  # Create an environment for the pick-place-v1 task
task = random.choice(ml1.train_tasks)
env.set_task(task)  # Set task

obs = env.reset()  # Reset environment
a = env.action_space.sample()  # Sample an action
obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

MT1 can be run the same way, except that it does not contain any test_tasks.

Running a benchmark

Create environments with train tasks (ML10, MT10, ML45, or MT50):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

training_envs = []
for name, env_cls in ml10.train_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.train_tasks
                        if task.env_name == name])
  env.set_task(task)
  training_envs.append(env)

for env in training_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Create environments with test tasks (this only works for ML10 and ML45, since MT10 and MT50 don't have a separate set of test tasks):

import metaworld
import random

ml10 = metaworld.ML10() # Construct the benchmark, sampling tasks

testing_envs = []
for name, env_cls in ml10.test_classes.items():
  env = env_cls()
  task = random.choice([task for task in ml10.test_tasks
                        if task.env_name == name])
  env.set_task(task)
  testing_envs.append(env)

for env in testing_envs:
  obs = env.reset()  # Reset environment
  a = env.action_space.sample()  # Sample an action
  obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action

Citing Meta-World

If you use Meta-World for academic research, please cite our CoRL 2019 paper using the following BibTeX entry.

@inproceedings{yu2019meta,
  title={Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning},
  author={Tianhe Yu and Deirdre Quillen and Zhanpeng He and Ryan Julian and Karol Hausman and Chelsea Finn and Sergey Levine},
  booktitle={Conference on Robot Learning (CoRL)},
  year={2019},
  eprint={1910.10897},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/1910.10897}
}

Accompanying Baselines

If you're looking for implementations of the baseline algorithms used in the Meta-World conference publication, please look at our sister repository, Garage. Note that these aren't exactly the same baselines that were used in the original conference publication; however, they are faithful to the original baselines.

Become a Contributor

We welcome all contributions to Meta-World. Please refer to the contributor's guide for how to prepare your contributions.

Acknowledgements

Meta-World is a work by Tianhe Yu (Stanford University), Deirdre Quillen (UC Berkeley), Zhanpeng He (Columbia University), Ryan Julian (University of Southern California), Karol Hausman (Google AI), Chelsea Finn (Stanford University) and Sergey Levine (UC Berkeley).

The code for Meta-World was originally based on multiworld, which is developed by Vitchyr H. Pong, Murtaza Dalal, Ashvin Nair, Shikhar Bahl, Steven Lin, Soroush Nasiriany, Kristian Hartikainen and Coline Devin. The Meta-World authors are grateful for their efforts in providing such a great framework as a foundation for our work. We also would like to thank Russell Mendonca for his work on reward functions for some of the environments.

Owner
Reinforcement Learning Working Group
Coalition of researchers which develop open source reinforcement learning research software
Comments
  • All environments produce observations outside of observation space.

    The following is a minimal working example which shows that all of the environments produce observations outside of their observation space. All it does is iterate over each environment from ML1, sample and set a task for the given environment, then take random actions in the environment and test whether or not the observations are inside the observation space, and at which indices (if any) an observation lies outside of the bounds of the observation space. You will get different results depending on the value of TIMESTEPS_PER_ENV, but setting this value to 1000 should yield violating observations for most environments. This is an issue, say, for RL implementations like RLlib which expect observations to be inside the observation space, and makes the environment incompatible with such libraries. This might be related to issue #31, though that issue only points out incorrect observation space boundaries regarding the goal coordinates, and the script below should point out that there are violations in other dimensions as well.

    import numpy as np
    from metaworld.benchmarks import ML1
    
    TIMESTEPS_PER_ENV = 1000
    
    def main():
    
        # Iterate over environment names.
        for env_name in ML1.available_tasks():
    
            # Create environment.
            env = ML1.get_train_tasks(env_name)
            tasks = env.sample_tasks(1)
            env.set_task(tasks[0])
    
            # Get boundaries of observation space and initial observation.
            low = env.observation_space.low
            high = env.observation_space.high
            obs = env.reset()
    
            # Create list of indices of observation space whose bounds are violated.
            broken_indices = []
    
            # Run environment.
            for _ in range(TIMESTEPS_PER_ENV):
    
                # Test if observation is outside observation space.
                if np.any(np.logical_or(obs < low, obs > high)):
                    current_indices = np.argwhere(np.logical_or(obs < low, obs > high))
                    current_indices = current_indices.reshape((-1,)).tolist()
                    for current_index in current_indices:
                        if current_index not in broken_indices:
                            broken_indices.append(current_index)
        
                # Sample action and perform environment step.
                a = env.action_space.sample()
                obs, reward, done, info = env.step(a)
    
            # Print out which indices of observation space were violated.
            broken_indices = sorted(broken_indices)
            print("%s broken indices: %r" % (env_name, broken_indices))
    
    if __name__ == "__main__":
        main()
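The index bookkeeping in the script above can be vectorized with NumPy, and a common stop-gap for libraries that reject out-of-space observations is to clip each observation into the declared bounds. Both functions below are a hypothetical sketch, not part of Meta-World; clipping only hides the mismatch between the environment and its declared `observation_space` and can distort the information the agent sees, so correcting the bounds themselves remains the real fix.

```python
import numpy as np


def violating_indices(obs, low, high):
    """Return the sorted indices where obs lies outside [low, high]."""
    obs = np.asarray(obs)
    mask = (obs < low) | (obs > high)
    return np.flatnonzero(mask).tolist()


def clip_to_bounds(obs, low, high):
    """Clip obs element-wise into [low, high].

    Stop-gap only: it masks, rather than fixes, an observation space whose
    declared bounds are too narrow for what the environment produces.
    """
    return np.clip(obs, low, high)
```

In the script above, `broken_indices` could then be accumulated as a set via `broken.update(violating_indices(obs, low, high))`.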
    
  • Vectorizing Envs over Many Workers Results in Memory Overflow

    Currently, I'm using RLlib for running metaworld envs, where each worker runs many vectorized instances of the environment.

    I tried running MAML/ProMP with 40 workers and 20 envs per worker on one of the meta-environments (Push). RLlib can train on this for a couple of iterations before crashing due to memory overflow (it can't allocate more memory). I'm not sure what exactly the issue is; do you have any leads? I was thinking it might be a memory leak, but with fewer workers, training was worse overall and, most importantly, did not crash.

  • Updated Observation Space for SawyerReachPushPickPlaceEnv

    Fix for issue #39 for a single environment. A detailed explanation of the logic behind these changes is found here: https://github.com/rlworkgroup/metaworld/issues/39#issuecomment-632422667

  • ML1 Tasks for Constant Goals

    Currently, we are trying to use specific environments in ML1 to keep the goal constant per task in a MAML setting (with env.reset() meaning that initial positions change but the goal stays constant).

    However, we are not clear on what a task means in the ML1 setting. Based on the code for one of the environments we are trying to run, it seems like calling self.set_task will update self.goal. However, when the environment is reset, self._state_goal is initially self.goal but is then assigned a randomly generated goal + a concatenation of initial reacher arm positions, which also appears to be random. When self.random_init is False, it works as intended but the starting states are constant.

    We are wondering whether there is a way to define a task using the Meta-World API such that, for a given task, the goal position is held constant but the initial observation changes when env.reset() is called.

  • Scripted policies for reach-push-pick-place environment

    Hard-coded policies for the sawyer reach-push-pick-place environment. Since random_init=True, a test may fail occasionally, but that hasn't happened to me yet.

    Note that these tests will fail on master, as the env is not solvable without the changes from #95.

    Partially addresses #90

  • Missing one-hot vector

    Hello :)

    The documentation mentions that MT10 and MT50 augment environment observations with a one-hot vector which identifies the task. When I create an MT10 instance (using the code below), I do not get the task ID. Could you please explain what I am missing?

    import metaworld
    import random
    
    mt10 = metaworld.MT10() # Construct the benchmark, sampling tasks
    
    training_envs = []
    for name, env_cls in mt10.train_classes.items():
      env = env_cls()
      task = random.choice([task for task in mt10.train_tasks
                            if task.env_name == name])
      env.set_task(task)
      training_envs.append(env)
    
    for env in training_envs:
      obs = env.reset()  # Reset environment
      a = env.action_space.sample()  # Sample an action
      obs, reward, done, info = env.step(a)  # Step the environment with the sampled random action
    

    The shape of obs is (12,) and not (13,)

  • Reproducing Figure 11 and reporting success rate

    Hi all and @avnishn,

    I've been trying to reproduce results from Figure 11 in https://arxiv.org/pdf/1910.10897.pdf using https://github.com/rlworkgroup/garage/blob/08492007d6e2d9ead9beb83a8a4247e52019ac7d/metaworld_examples/sac_metaworld.py and hyper-parameters reported in Table 3. Should I use Table 3 for hyper-parameters?

    One thing that is not clear to me is how the success rate is reported. I notice that env.step returns 'success' in info, but I want to verify that this is what is reported in the paper. Here is the code I use to report results (random actions are used for simplicity):

    from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
    env_cls = ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE['hammer-v2-goal-observable']
    eval_env= env_cls(seed=0)
    eval_env.seed(0)
    avg_reward = 0 
    success_rate = 0 
    num_evals = 2
    
    for _ in range(num_evals):
        obs = eval_env.reset()
        done = False
        stp = 0
        while not done and stp < eval_env.max_path_length:
            obs, reward, done, info = eval_env.step(eval_env.action_space.sample())
            avg_reward += reward
            stp += 1
            if 'success' in info:
                success_rate += info['success']
    avg_reward /= num_evals
    success_rate /= num_evals
    

    Is this the right way to report the success rate like Figure 11? Thanks for your help. Rasool
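One thing worth noting about the loop above: `success_rate += info['success']` runs on every step, so it sums per-step flags and, after dividing by `num_evals`, can exceed 1. A minimal sketch of an episode-level metric is below. The convention it uses (an episode counts as a success if `info['success']` is ever truthy) is an assumption and should be checked against the paper's evaluation protocol; the function name is hypothetical and it assumes the old Gym 4-tuple `step()` API.

```python
def evaluate_success_rate(env, policy, num_episodes, max_path_length):
    """Fraction of episodes in which info['success'] was ever truthy.

    Hypothetical helper. Assumes env.step() returns (obs, reward, done, info)
    and that info carries a per-step 'success' flag.
    """
    successes = 0
    for _ in range(num_episodes):
        obs = env.reset()
        episode_success = False
        for _ in range(max_path_length):
            obs, reward, done, info = env.step(policy(obs))
            # Record success once per episode instead of summing step flags.
            if info.get("success", 0.0):
                episode_success = True
            if done:
                break
        successes += int(episode_success)
    return successes / num_episodes
```

With the code above, this would be called as `evaluate_success_rate(eval_env, lambda obs: eval_env.action_space.sample(), num_evals, eval_env.max_path_length)`. Swapping the "ever succeeded" rule for "succeeded at the last step" only changes where `episode_success` is set.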

  • Some question about metaworld environment.

    I'm interested in your Meta-World. It looks like a good benchmark for meta-RL. I have some questions about it.

    1. When an episode starts, the Sawyer arm moves to a specific pose during the first K steps (K is about 10). It seems like an 'init' function is being called by MuJoCo. I think this will cause problems when the agent does reinforcement learning. Is this intended?

    2. The reward scale differs greatly between some environments. The reach env gets almost 100 from the start, while the push env gets a small reward (between 0 and 1). Is this intended?

  • Missing Environments

    If you try running scripts/demo_sawyer.py, many of the imports fail because of missing environments, such as from metaworld.envs.mujoco.sawyer_xyz.sawyer_stack import SawyerStackEnv.

  • Confused about the success rate

    I am new to Meta-World! Thank you for building such a meaningful and complex project. I have some questions about it. In the project setting, does the success rate represent success in one step or in one episode? In your project, the agent reaches the max path length in one episode (such as 150 steps), so does the success rate represent the average success over those 150 steps or the success at the last step? Could you give me some pointers?

  • Allow setting the seed when sampling configurations

    This is useful for reproducibility, but might also place a larger responsibility on us to ensure the goal sampling process doesn't change across versions.
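A minimal sketch of seeded sampling, assuming tasks are drawn from a plain list; `sample_tasks` is a hypothetical helper, not a Meta-World function. It uses a private `random.Random(seed)` so the result neither depends on nor disturbs the global random state. On the version-stability concern raised above: the Python `random` documentation only guarantees that `random()` reproduces the same sequence for a given seed across versions, not helpers such as `sample()`, so strict cross-version reproducibility would need a sampler built on `random()` alone.

```python
import random


def sample_tasks(tasks, num_tasks, seed):
    """Reproducibly sample `num_tasks` distinct tasks using a private RNG.

    Hypothetical helper: isolates randomness in random.Random(seed) so the
    global random state neither affects nor is affected by the sampling.
    """
    rng = random.Random(seed)
    return rng.sample(list(tasks), num_tasks)
```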

  • Error when setting rand_init to False in some environments

    In the V2 push-wall, pick-place-wall, push-back, and shelf-place environments, setting rand_init to False and then resetting the environment would lead to the following error when calling step():

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/mujoco_env.py", line 25, in inner
        return func(*args, **kwargs)
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_push_wall_v2.py", line 79, in evaluate_state
        ) = self.compute_reward(action, obs)
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_push_wall_v2.py", line 157, in compute_reward
        object_grasped = self._gripper_caging_reward(
      File "/mnt/beegfs/home/zchuning/metaworld/metaworld/envs/mujoco/sawyer_xyz/sawyer_xyz_env.py", line 572, in _gripper_caging_reward
        caging_xz_margin = np.linalg.norm(self.obj_init_pos[xz] - self.init_tcp[xz])
    TypeError: list indices must be integers or slices, not list
    

    This is because the reset() function of these environments sets self.obj_init_pos to self.adjust_initObjPos(self.init_config['obj_init_pos']) when self.random_init is False (e.g. line 112 in sawyer_push_wall_v2.py). However, self.adjust_initObjPos() returns a Python list instead of a numpy array. So line 572 of sawyer_xyz_env.py triggers an error by indexing a Python list with a list of indices.

    A simple fix is to wrap a np.array() around self.adjust_initObjPos(self.init_config['obj_init_pos']), similar to how some other environments (e.g. Push V2) handle calls to fix_extreme_obs_pos(). But there might be a more systematic way to fix this, hence the github issue instead of a pull request.
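The failure and the proposed fix can be reproduced in isolation. The values below are hypothetical, and `xz = [0, 2]` is an assumption about what the index list in `_gripper_caging_reward` selects; the point is only that a Python list rejects list-valued indices while a NumPy array supports them.

```python
import numpy as np

xz = [0, 2]  # assumed indices of the x and z coordinates
init_pos_list = [0.1, 0.6, 0.02]  # hypothetical value, a plain list as returned by adjust_initObjPos()
init_tcp = np.array([0.0, 0.6, 0.2])  # hypothetical tool-center-point position

error_message = ""
try:
    init_pos_list[xz]  # Python lists cannot be indexed by a list
except TypeError as err:
    error_message = str(err)

# Wrapping the value in np.array() enables NumPy fancy indexing.
init_pos = np.array(init_pos_list)
caging_xz_margin = np.linalg.norm(init_pos[xz] - init_tcp[xz])
```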

  • Incorrect reset space for object in disassemble

    The lower bound for the random reset space of the object in disassemble is higher than the upper bound for the first index. This appears to be an issue for both v1 https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/v1/sawyer_disassemble_peg.py#L14-L15 and v2 https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/v2/sawyer_disassemble_peg_v2.py#L15-L16

    np.random.uniform seems to invert the values when low > high. However, when using explicit seeding via self.np_random, seed = seeding.np_random(seed), self.np_random.uniform raises a ValueError.
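Until the bounds themselves are corrected, one defensive option is to order the bounds element-wise before sampling, so the result does not depend on how a particular RNG handles `low > high` (the NumPy documentation calls that case undefined). This is a hypothetical sketch with made-up bounds, not the actual disassemble reset space; it works with both `np.random.default_rng()` generators and legacy `np.random.RandomState` objects.

```python
import numpy as np


def sample_in_box(rng, low, high):
    """Sample uniformly from the box spanned by `low` and `high`.

    Hypothetical helper: sorting the bounds element-wise first means an
    inverted coordinate still yields a valid sample instead of relying on
    RNG-specific handling of low > high.
    """
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    lo, hi = np.minimum(low, high), np.maximum(low, high)
    return rng.uniform(lo, hi)
```

Silently reordering can hide a real bug in the reset space, so asserting `np.all(low <= high)` when the environment is constructed is a reasonable complement.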

  • Dtype of observation space does not match returned observation

    Currently, the observation space does not specify a dtype, which means it defaults to float32; however, the returned numpy array is float64 by default. This will lead to exceptions with more recent versions of gym, as they introduced explicit checks that returned observations are contained in the observation space. Hence, I propose to add the dtype to the Box space constructor https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/sawyer_xyz_env.py#L385-L403 and/or cast the returned values from step() and reset() to the appropriate dtype.

    The same thing should probably be done for the action space as well https://github.com/rlworkgroup/metaworld/blob/18118a28c06893da0f363786696cc792457b062b/metaworld/envs/mujoco/sawyer_xyz/sawyer_xyz_env.py#L128-L131
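The second option (casting what `step()` and `reset()` return) can be sketched as a thin wrapper. `Float32ObsEnv` is a hypothetical name, not part of Meta-World or Gym; it assumes the observation space is declared as float32 and that the wrapped env uses the old Gym 4-tuple `step()` API. The same pattern could cast actions before they reach the env.

```python
import numpy as np


class Float32ObsEnv:
    """Hypothetical wrapper that casts observations to the space's dtype."""

    def __init__(self, env, dtype=np.float32):
        self.env = env
        self.dtype = dtype  # assumed to match the Box observation space dtype

    def reset(self):
        return np.asarray(self.env.reset(), dtype=self.dtype)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return np.asarray(obs, dtype=self.dtype), reward, done, info

    def __getattr__(self, name):
        # Forward everything else (spaces, set_task, ...) to the wrapped env.
        return getattr(self.env, name)
```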

  • Replacing mujoco-py with pybind11-based mujoco bindings

    Recently, other big benchmark suites, such as OpenAI gym and DMC, switched to the new pybind11-based mujoco bindings. Since these bindings are made available in DeepMind's official mujoco repository, I expect that they will be properly maintained for quite a while. In addition, this makes installation easier for users as well as potentially allows for a larger user base and fewer issues related to the mujoco installation.

    As MetaWorld uses its own MujocoEnv version, I suggest including something similar to the OpenAI gym repo to allow for a smooth transition and reproducibility of existing results. While I don't think this change is urgent, I think it is in the long term interest of both users and maintainers.

  • Cannot render on Tesla V100 GPU

    Hi, it seems that the same bug always exists under V100-32G GPU environment:

    Found 0 GPUs for rendering. Using device 0.
    File "mjsim.pyx", line 156, in mujoco_py.cymj.MjSim.render
      File "mjsim.pyx", line 158, in mujoco_py.cymj.MjSim.render
      File "mjrendercontext.pyx", line 46, in mujoco_py.cymj.MjRenderContext.__init__
      File "mjrendercontext.pyx", line 114, in mujoco_py.cymj.MjRenderContext._setup_opengl_context
      File "opengl_context.pyx", line 130, in mujoco_py.cymj.OffscreenOpenGLContext.__init__
    RuntimeError: Failed to initialize OpenGL
    

    My code works well on P100 GPUs. Does this mean that metaworld/mujoco_py do not support rendering on V100? Have you ever encountered such a problem with V100 GPUs?

  • mujoco_py.cymj.GlfwError: Failed to initialize GLFW

    When I use this repo on a server without a display, I get the error mujoco_py.cymj.GlfwError: Failed to initialize GLFW. I know it is caused by the MjViewer, but how can I fix it?
