A toolkit for developing and comparing reinforcement learning algorithms.

Status: Maintenance (expect bug fixes and minor updates)

OpenAI Gym

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. This is the gym open-source library, which gives you access to a standardized set of environments.


See the What's New section below.

gym makes no assumptions about the structure of your agent, and is compatible with any numerical computation library, such as TensorFlow or Theano. You can use it from Python code, and soon from other languages.

If you're not sure where to start, we recommend beginning with the docs on our site. See also the FAQ.

A whitepaper for OpenAI Gym is available at http://arxiv.org/abs/1606.01540, and here's a BibTeX entry that you can use to cite it in a publication:

@misc{1606.01540,
  Author = {Greg Brockman and Vicki Cheung and Ludwig Pettersson and Jonas Schneider and John Schulman and Jie Tang and Wojciech Zaremba},
  Title = {OpenAI Gym},
  Year = {2016},
  Eprint = {arXiv:1606.01540},
}

Basics

There are two basic concepts in reinforcement learning: the environment (namely, the outside world) and the agent (namely, the algorithm you are writing). The agent sends actions to the environment, and the environment replies with observations and rewards (that is, a score).

The core gym interface is Env, which is the unified environment interface. There is no interface for agents; that part is left to you. The following are the Env methods you should know (a minimal usage sketch follows the list):

  • reset(self): Reset the environment's state. Returns observation.
  • step(self, action): Step the environment by one timestep. Returns observation, reward, done, info.
  • render(self, mode='human'): Render one frame of the environment. The default mode will do something human friendly, such as pop up a window.
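
For concreteness, here is a minimal sketch of the agent-environment loop using these methods. CartPole-v0 is just an example environment id; substitute any registered environment, and replace the random action with your agent's policy.

import gym

env = gym.make('CartPole-v0')
observation = env.reset()
for _ in range(1000):
    env.render()
    action = env.action_space.sample()  # a random agent; replace with your own policy
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()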

Supported systems

We currently support Linux and OS X running Python 3.5 to 3.8. Windows support is experimental: algorithmic, toy_text, classic_control and atari should work on Windows (see the next section for installation instructions); nevertheless, proceed at your own risk.

Installation

You can perform a minimal install of gym with:

git clone https://github.com/openai/gym.git
cd gym
pip install -e .

If you prefer, you can do a minimal install of the packaged version directly from PyPI:

pip install gym

You'll be able to run a few environments right away:

  • algorithmic
  • toy_text
  • classic_control (you'll need pyglet to render though)

We recommend playing with those environments at first, and then later installing the dependencies for the remaining environments.

You can also run gym on gitpod.io to play with the examples online. In the preview window you can click on the mp4 file you want to view. If you want to view another mp4 file, just press the back button and click on another mp4 file.

Installing everything

To install the full set of environments, you'll need to have some system packages installed. We'll build out the list here over time; please let us know what you end up installing on your platform. Also, take a look at the docker files (py.Dockerfile) to see the composition of our CI-tested images.

On Ubuntu 16.04 and 18.04:

apt-get install -y libglu1-mesa-dev libgl1-mesa-dev libosmesa6-dev xvfb ffmpeg curl patchelf libglfw3 libglfw3-dev cmake zlib1g zlib1g-dev swig

MuJoCo has a proprietary dependency we can't set up for you. Follow the instructions in the mujoco-py package for help. Note that we currently do not support MuJoCo 2.0 and above, so you will need to install a version of mujoco-py which is built for a lower version of MuJoCo like MuJoCo 1.5 (example - mujoco-py-1.50.1.0). As an alternative to mujoco-py, consider PyBullet which uses the open source Bullet physics engine and has no license requirement.

Once you're ready to install everything, run pip install -e '.[all]' (or pip install 'gym[all]').

Pip version

To run pip install -e '.[all]', you'll need a semi-recent pip. Please make sure your pip is at least at version 1.5.0. You can upgrade using the following: pip install --ignore-installed pip. Alternatively, you can open setup.py and install the dependencies by hand.

Rendering on a server

If you're trying to render video on a server, you'll need to connect a fake display. The easiest way to do this is by running under xvfb-run (on Ubuntu, install the xvfb package):

xvfb-run -s "-screen 0 1400x900x24" bash

Installing dependencies for specific environments

If you'd like to install the dependencies for only specific environments, see setup.py. We maintain the lists of dependencies on a per-environment group basis.

Environments

See List of Environments and the gym site.

For information on creating your own environments, see Creating your own Environments.

Examples

See the examples directory.

Testing

We are using pytest for tests. You can run them via:

pytest

Resources

What's new

  • 2020-12-18 (v 0.18.0)
    • Add python 3.9 support
    • Remove python 3.5 support (thanks @justinkterry on both!)
    • TimeAwareObservationWrapper (thanks @zuoxingdong!)
    • Space-related fixes and tests (thanks @wmmc88!)
  • 2020-09-29 (v 0.17.3)
    • Allow custom spaces in VectorEnv (thanks @tristandeleu!)
    • CarRacing performance improvements (thanks @leocus!)
    • Dict spaces are now iterable (thanks @NotNANtoN!)
  • 2020-05-08 (v 0.17.2)
    • remove unnecessary precision warning when creating Box with scalar bounds - thanks @johannespitz!
    • remove six from the dependencies
    • FetchEnv sample goal range can be specified through kwargs - thanks @YangRui2015!
  • 2020-03-05 (v 0.17.1)
    • update cloudpickle dependency to be >=1.2.0,<1.4.0
  • 2020-02-21 (v 0.17.0)
    • Drop python 2 support
    • Add python 3.8 build
  • 2020-02-09 (v 0.16.0)
    • EnvSpec API change - remove tags field (retro-active version bump, the changes are actually already in the codebase since 0.15.5 - thanks @wookayin for keeping us in check!)
  • 2020-02-03 (v0.15.6)
    • pyglet 1.4 compatibility (this time for real :))
    • Fixed the bug in BipedalWalker and BipedalWalkerHardcore, bumped version to 3 (thanks @chozabu!)
  • 2020-01-24 (v0.15.5)
    • pyglet 1.4 compatibility
    • remove python-opencv from the requirements
  • 2019-11-08 (v0.15.4)
    • Added multiple env wrappers (thanks @zuoxingdong and @hartikainen!)
    • Removed mujoco >= 2.0 support due to lack of tests
  • 2019-10-09 (v0.15.3)
    • VectorEnv modifications - unified the VectorEnv api (added reset_async, reset_wait, step_async, step_wait methods to SyncVectorEnv); more flexibility in AsyncVectorEnv workers
  • 2019-08-23 (v0.15.2)
    • More Wrappers - AtariPreprocessing, FrameStack, GrayScaleObservation, FilterObservation, FlattenDictObservationsWrapper, PixelObservationWrapper, TransformReward (thanks @zuoxingdong, @hartikainen)
    • Remove rgb_rendering_tracking logic from mujoco environments (default behavior stays the same for the -v3 environments, rgb rendering returns a view from tracking camera)
    • Velocity goal constraint for MountainCar (thanks @abhinavsagar)
    • Taxi-v2 -> Taxi-v3 (add missing wall in the map to replicate the env as described in the original paper, thanks @kobotics)
  • 2019-07-26 (v0.14.0)
    • Wrapper cleanup
    • Spec-related bug fixes
    • VectorEnv fixes
  • 2019-06-21 (v0.13.1)
    • Bug fix for ALE 0.6 difficulty modes
    • Use narrow range for pyglet versions
  • 2019-06-21 (v0.13.0)
    • Upgrade to ALE 0.6 (atari-py 0.2.0) (thanks @JesseFarebro!)
  • 2019-06-21 (v0.12.6)
    • Added vectorized environments (thanks @tristandeleu!). A vectorized environment runs multiple copies of an environment in parallel. To create a vectorized version of an environment, use gym.vector.make(env_id, num_envs, **kwargs), for instance, gym.vector.make('Pong-v4', 16).
  • 2019-05-28 (v0.12.5)
    • fixed Fetch-slide environment to be solvable.
  • 2019-05-24 (v0.12.4)
    • remove pyopengl dependency and use more narrow atari-py and box2d-py versions
  • 2019-03-25 (v0.12.1)
    • rgb rendering in MuJoCo locomotion -v3 environments now comes from the tracking camera (so that the agent does not run away from the field of view). The old behaviour can be restored by passing the rgb_rendering_tracking=False kwarg. Also, a potentially breaking change!!! The Wrapper class now forwards methods and attributes to the wrapped env.
  • 2019-02-26 (v0.12.0)
    • release mujoco environments v3 with support for gym.make kwargs such as xml_file, ctrl_cost_weight, reset_noise_scale etc
  • 2019-02-06 (v0.11.0)
    • remove gym.spaces.np_random common PRNG; use per-instance PRNG instead.
    • support for kwargs in gym.make
    • lots of bugfixes
  • 2018-02-28: Release of a set of new robotics environments.

  • 2018-01-25: Made some aesthetic improvements and removed unmaintained parts of gym. This may seem like a downgrade in functionality, but it is actually a long-needed cleanup in preparation for some great new things that will be released in the next month.

    • Now your Env and Wrapper subclasses should define step, reset, render, close, seed rather than underscored method names.
    • Removed the board_game, debugging, safety, parameter_tuning environments since they're not being maintained by us at OpenAI. We encourage authors and users to create new repositories for these environments.
    • Changed MultiDiscrete action space to range from [0, ..., n-1] rather than [a, ..., b-1].
    • No more render(close=True), use env-specific methods to close the rendering.
    • Removed scoreboard directory, since site doesn't exist anymore.
    • Moved gym/monitoring to gym/wrappers/monitoring
    • Add dtype to Space.
    • No longer using Python's built-in logging module; using gym.logger instead
  • 2018-01-24: All continuous control environments now use mujoco_py >= 1.50. Versions have been updated accordingly to -v2, e.g. HalfCheetah-v2. Performance should be similar (see https://github.com/openai/gym/pull/834) but there are likely some differences due to changes in MuJoCo.

  • 2017-06-16: Make env.spec into a property to fix a bug that occurs when you try to print out an unregistered Env.

  • 2017-05-13: BACKWARDS INCOMPATIBILITY: The Atari environments are now at v4. To keep using the old v3 environments, keep gym <= 0.8.2 and atari-py <= 0.0.21. Note that the v4 environments will not give identical results to existing v3 results, although differences are minor. The v4 environments incorporate the latest Arcade Learning Environment (ALE), including several ROM fixes, and now handle loading and saving of the emulator state. While seeds still ensure determinism, the effect of any given seed is not preserved across this upgrade because the random number generator in ALE has changed. The *NoFrameSkip-v4 environments should be considered the canonical Atari environments from now on.

  • 2017-03-05: BACKWARDS INCOMPATIBILITY: The configure method has been removed from Env. configure was not used by gym, but was used by some dependent libraries including universe. These libraries will migrate away from the configure method by using wrappers instead. This change is on master and will be released with 0.8.0.

  • 2016-12-27: BACKWARDS INCOMPATIBILITY: The gym monitor is now a wrapper. Rather than starting monitoring as env.monitor.start(directory), envs are now wrapped as follows: env = wrappers.Monitor(env, directory). This change is on master and will be released with 0.7.0.

  • 2016-11-1: Several experimental changes to how a running monitor interacts with environments. The monitor will now raise an error if reset() is called when the env has not returned done=True. The monitor will only record complete episodes where done=True. Finally, the monitor no longer calls seed() on the underlying env, nor does it record or upload seed information.

  • 2016-10-31: We're experimentally expanding the environment ID format to include an optional username.

  • 2016-09-21: Switch the Gym automated logger setup to configure the root logger rather than just the 'gym' logger.

  • 2016-08-17: Calling close on an env will also close the monitor and any rendering windows.

  • 2016-08-17: The monitor will no longer write manifest files in real-time, unless write_upon_reset=True is passed.

  • 2016-05-28: For controlled reproducibility, envs now support seeding (cf #91 and #135). The monitor records which seeds are used. We will soon add seed information to the display on the scoreboard.

Comments
  • Seeding update

    Seeding update

    See Issue #1663

    This is a bit of a ride. The base change is getting rid of gym's custom seeding utils and instead using np.random.default_rng(), as recommended with modern versions of NumPy. I kept the gym.utils.seeding.np_random interface and changed it to be essentially a synonym for default_rng (with some API differences, consistent with the old np_random).

    Because the API (then RandomState, now Generator) changed a bit, np_random.randint calls were replaced with np_random.integers, np_random.rand -> np_random.random, np_random.randn -> np_random.standard_normal. This is all in accordance with the recommended NumPy conversion.
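
    For illustration, a small sketch of these conversions in plain NumPy (np_random here is just any Generator instance; the calls mirror the replacements listed above):

    import numpy as np

    np_random = np.random.default_rng(42)

    np_random.integers(0, 10)     # replaces np_random.randint(0, 10)
    np_random.random()            # replaces np_random.rand()
    np_random.standard_normal(3)  # replaces np_random.randn(3)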

    My doubts in order of (subjective) importance

    Doubt number 1:

    In gym/utils/seeding.py#L18 I'm accessing a "protected" variable, seed_seq. This is for retrieving the random seed that was automatically generated under the hood when the passed seed is None (it also gives the correct value if the passed seed is an integer). An alternative solution would be restoring the whole create_seed machinery, which generates a random initial seed from os.urandom. I was unable to find another way to get the initial seed of a default Generator (default_rng(None)) instance.
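
    To make the "protected variable" point concrete, here is a sketch of how the automatically generated seed can be recovered; this relies on NumPy internals and is an assumption about where the SeedSequence lives, not gym code:

    import numpy as np

    rng = np.random.default_rng(None)
    # _seed_seq is the protected SeedSequence created under the hood;
    # its entropy is the seed that was generated when None was passed
    seed = rng.bit_generator._seed_seq.entropy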

    Doubt number 2:

    In gym/spaces/multi_discrete.py#L64. It turns out that a Generator doesn't directly support get_state and set_state. The same functionality seems to be achievable by accessing the bit generator and modifying its state directly (without getters/setters).
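
    A quick sketch of that workaround in plain NumPy (again illustrative, not the gym code itself):

    import numpy as np

    rng = np.random.default_rng(123)
    state = rng.bit_generator.state   # snapshot the generator state as a dict
    rng.standard_normal()             # advance the generator
    rng.bit_generator.state = state   # restore the earlier state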

    Doubt number 3:

    As mentioned earlier, this version maintains the gym.utils.seeding file with just a single function. Functionally, I don't think that's a problem, but it might feel a bit redundant from a stylistic point of view. This could be replaced by changing the roughly 17 calls of this function that occur in the codebase, but at the moment I think that would be a bad move. The reason is that the function passes through the seed that's generated when the passed seed is None (see Doubt 1), which has to be obtained through mildly sketchy means, so it's better to keep that contained within the function. I don't think passing the seed through is strictly necessary, but removing it would significantly change the external API, and I tend to be hesitant about changes like this if there's no good reason. Overall I think it's good to keep the seeding.np_random function for consistency with previous versions. The alternative is completely removing the concept of "gym seeding" and using NumPy directly (right now "gym seeding" is basically an interface for NumPy seeding).

    Doubt number 4:

    Pinging @araffin as there's a possibility this will (again) break some old pickled spaces in certain cases, and I know this was an issue with SB3 and the model zoo. Specifically, if you create a Space with the current master branch, run space.sample() at least once, and then pickle it, it will be pickled with a RandomState instance, which is now considered a legacy generator in NumPy. If you unpickle it using new gym code (i.e. this PR), space.np_random will still point to a RandomState, but the rest of the code expects space.np_random to be a Generator instance, which has a few API changes (see the beginning of this post).

    Overall I don't know how important it is for the internal details of gym objects to remain the same - which is more or less necessary for old objects to be unpickleable in new gym versions. There's probably a way to provide a custom unpickling protocol as a compatibility layer - I'm not sufficiently familiar with this to do it, but I imagine it should be doable on the user side? (i.e. not in gym)

    Doubt number 2137: (very low importance)

    This doesn't technically solve #2210. IMHO this is absolutely acceptable, because np.random.seed is part of the legacy seeding mechanism, and is officially discouraged by NumPy. Proper usage yields expected results:

    import numpy as np
    from gym.utils import seeding
    
    user_given_seed = None
    np_random, seed = seeding.np_random(user_given_seed)
    
    # since np.random.randn or similar is still used in some places, we also seed the global NumPy random generator
    np.random.seed(seed)
    

    tl;dr

    This definitely needs another set of eyes on it because I'm not confident enough about the nitty-gritty details of NumPy RNG. There are a few things I'm not 100% happy with from a stylistic point of view, but as far as my understanding goes, the functionality is what it should be. There's also the question of supporting old pickled objects, which I think is a whole different topic that needs discussing now that gym is maintained again.

  • Box2d won't find some RAND_LIMIT_swigconstant

    Box2d won't find some RAND_LIMIT_swigconstant

    Hello!

    It's probably some silly mistake on my side, but I wasn't able to fix it by random lever pulling, as usual.

    Installing Box2D as in the instructions (using pip install -e .[all]) throws an error when trying to use some of the Box2D examples.

    Code that reproduces the issue:

    import gym
    atari = gym.make('LunarLander-v0')
    atari.reset()
    
    [2016-05-16 02:14:25,430] Making new env: LunarLander-v0
    
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-1-f89e78f4410b> in <module>()
          1 import gym
    ----> 2 atari = gym.make('LunarLander-v0')
          3 atari.reset()
          4 #plt.imshow(atari.render('rgb_array'))
    
    /home/jheuristic/yozhik/gym/gym/envs/registration.pyc in make(self, id)
         77         logger.info('Making new env: %s', id)
         78         spec = self.spec(id)
    ---> 79         return spec.make()
         80 
         81     def all(self):
    
    /home/jheuristic/yozhik/gym/gym/envs/registration.pyc in make(self)
         52             raise error.Error('Attempting to make deprecated env {}. (HINT: is there a newer registered version of this env?)'.format(self.id))
         53 
    ---> 54         cls = load(self._entry_point)
         55         env = cls(**self._kwargs)
         56 
    
    /home/jheuristic/yozhik/gym/gym/envs/registration.pyc in load(name)
         11 def load(name):
         12     entry_point = pkg_resources.EntryPoint.parse('x={}'.format(name))
    ---> 13     result = entry_point.load(False)
         14     return result
         15 
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/pkg_resources/__init__.pyc in load(self, require, *args, **kwargs)
       2378         if require:
       2379             self.require(*args, **kwargs)
    -> 2380         return self.resolve()
       2381 
       2382     def resolve(self):
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/pkg_resources/__init__.pyc in resolve(self)
       2384         Resolve the entry point from its module and attrs.
       2385         """
    -> 2386         module = __import__(self.module_name, fromlist=['__name__'], level=0)
       2387         try:
       2388             return functools.reduce(getattr, self.attrs, module)
    
    /home/jheuristic/yozhik/gym/gym/envs/box2d/__init__.py in <module>()
    ----> 1 from gym.envs.box2d.lunar_lander import LunarLander
          2 from gym.envs.box2d.bipedal_walker import BipedalWalker, BipedalWalkerHardcore
    
    /home/jheuristic/yozhik/gym/gym/envs/box2d/lunar_lander.py in <module>()
          3 from six.moves import xrange
          4 
    ----> 5 import Box2D
          6 from Box2D.b2 import (edgeShape, circleShape, fixtureDef, polygonShape, revoluteJointDef, contactListener)
          7 
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/Box2D/__init__.py in <module>()
         18 # 3. This notice may not be removed or altered from any source distribution.
         19 #
    ---> 20 from .Box2D import *
         21 __author__ = '$Date$'
         22 __version__ = '2.3.1'
    
    /home/jheuristic/thenv/local/lib/python2.7/site-packages/Box2D/Box2D.py in <module>()
        433     return _Box2D.b2CheckPolygon(shape, additional_checks)
        434 
    --> 435 _Box2D.RAND_LIMIT_swigconstant(_Box2D)
        436 RAND_LIMIT = _Box2D.RAND_LIMIT
        437 
    
    AttributeError: 'module' object has no attribute 'RAND_LIMIT_swigconstant'
    
    

    What didn't help:

    pip uninstall gym
    apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl
    git clone https://github.com/openai/gym
    cd gym
    pip install -e .[all] --upgrade
    

    The OS is Ubuntu 14.04 Server x64. It may be a clue that I am running the thing from inside a python2 virtualenv (with all numpys, etc. installed).

  • New Step API with terminated, truncated bools instead of done

    New Step API with terminated, truncated bools instead of done

    Description

    step method is changed to return five items instead of four.

    Old API - done=True if episode ends in any way.

    New API - terminated=True if the environment terminates (e.g. due to task completion, failure, etc.); truncated=True if the episode truncates due to a time limit or a reason that is not defined as part of the task MDP.
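
    As a rough sketch of what an agent loop looks like under the new API (new_step_api=True is the make-time switch described later in this PR; CartPole-v1 is just an example environment):

    import gym

    env = gym.make("CartPole-v1", new_step_api=True)
    obs = env.reset()
    done = False
    while not done:
        # five return values instead of four
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        done = terminated or truncated
    env.close()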

    Link to docs - https://github.com/Farama-Foundation/gym-docs/pull/115 (To be updated with latest changes)

    Changes

    1. All existing environment implementations are changed to the new API without direct support for the old API. However, gym.make for any environment will default to the old API through a compatibility wrapper.
    2. Vector env implementations are changed to the new API, with backward compatibility for the old API and defaulting to it. The new API can be enabled with a newly added constructor argument, new_step_api=True.
    3. All wrapper implementations are changed to the new API, have backward compatibility, and default to the old API (can be switched to the new API with new_step_api=True).
    4. Some changes in phrasing - terminal_reward, terminal_observation etc. are replaced with final_reward, final_observation etc. The intention is to reserve the word 'termination' for the terminated=True case. (For some motivation: Sutton and Barto use terminal states to refer specifically to special states at the end of the MDP whose values are 0. This is not true for a truncation, where the value of the final state need not be 0, so the current usage of terminal_obs etc. would be incorrect under this definition.)
    5. All tests continue to be performed with the old API (since that is the default for now), with a single exception for when the test env is unwrapped and the compatibility wrapper therefore doesn't apply. Special tests are also added just for the new API.
    6. The new_step_api argument is used in different places. Its meaning is taken to be "whether this function / class should output step values in the new API or not". E.g. self.new_step_api in a wrapper signifies whether the wrapper's step method outputs items in the new API (the wrapper itself might have been written in the new or old API, but through compatibility code it will output according to self.new_step_api).
    7. play.py alone is retained in the old API due to the difficulty of having it be compatible with both APIs simultaneously, and because it is slightly lower priority.

    StepAPICompatibility Wrapper

    1. This wrapper is added to support conversion from old to new API and vice versa.
    2. Takes new_step_api argument in __init__. False (old API) by default.
    3. Wrapper applied at make with new_step_api=False by default. It can be changed during make like gym.make("CartPole-v1", new_step_api=True). The order of wrappers applied at make is as follows - core env -> PassiveEnvChecker -> StepAPICompatibility -> other wrappers

    step_api_compatibility function

    This function is similar to the wrapper; it is used for backward compatibility in wrappers and vector envs, at the interfaces between env / wrapper / vector / outside code. Example usage:

    # wrapper's step method
    def step(self, action):
    
        # here self.env.step is made to return in new API, since the wrapper is written in new API
        obs, rew, terminated, truncated, info = step_api_compatibility(self.env.step(action), new_step_api=True) 
    
        if terminated or truncated:
            print("Episode end")
        ### more wrapper code
    
        # here the wrapper is made to return in API specified by self.new_step_api, that is set to False by default, and can be changed according to the situation
        return step_api_compatibility((obs, rew, terminated, truncated, info), new_step_api=self.new_step_api) 
    

    TimeLimit

    1. In the current implementation of the TimeLimit wrapper, the existence of the 'TimeLimit.truncated' key in info means that truncation has occurred. The boolean value it is set to indicates whether the episode ended purely due to the time limit (i.e. whether the core environment had not already ended). So info['TimeLimit.truncated']=False means the core environment had also terminated on that step, and we can infer terminated=True, truncated=True in this case.
    2. To convert the old API to the new one, the compatibility function first checks info. If there is nothing in info, it returns terminated=done and truncated=False, as there is no better information available. If the TimeLimit info is available, it sets the two bools accordingly (a sketch of this conversion follows below).
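
    A hypothetical sketch of just this old-to-new conversion logic (the actual helper in this PR is step_api_compatibility; the function name here is illustrative):

    def to_new_step_api(obs, reward, done, info):
        # infer (terminated, truncated) from the old (done, info) convention
        if "TimeLimit.truncated" in info:
            truncated = True
            # the stored bool is True only when the episode ended purely due to the time limit
            terminated = done and not info["TimeLimit.truncated"]
        else:
            terminated = done
            truncated = False
        return obs, reward, terminated, truncated, info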

    Backward Compatibility

    The PR attempts to achieve almost complete backward compatibility. However, there are cases which haven't been covered. Environments imported directly, e.g. from gym.envs.classic_control import CartPoleEnv, are not backward compatible as these are rewritten in the new API; the StepAPICompatibility wrapper would need to be applied manually in this case. Envs made through gym.make all default to the old API. Vector envs and wrappers also default to the old API. These should all continue to work without problems, but due to the scale of the change, bugs are expected.

    Warning Details

    Warnings are raised at the following locations:

    1. gym.Wrapper constructor - a warning is raised if self.new_step_api==False. This means any wrapper that does not explicitly pass new_step_api=True into super() will raise the warning, since self.new_step_api=False by default. This is taken care of by the wrappers written inside gym. Third-party wrappers will face a problem in one specific situation - if the wrapper is not impacted by the step API, e.g. a wrapper subclassing ActionWrapper. Such a wrapper would work without any change for both APIs; however, to avoid the warning, it still needs to pass new_step_api=True into super() (see the sketch after this list). The thinking is - "If your wrapper supports the new step API, you need to pass new_step_api=True to avoid the warning".
    2. PassiveEnvChecker, passive_env_step_check function - if step return has 4 items a warning is raised. This happens only once since this function is only run once after env initialization. Since PassiveEnvChecker is wrapped first before step compatibility in make, this will raise a warning based on the core env implementation's API.
    3. gym.VectorEnv constructor - warning raised if self.new_step_api==False.
    4. StepAPICompatibility wrapper constructor - the wrapper that is applied by default at make. If new_step_api=False, a warning is raised. This is independent of whether the core env is implemented in new or old api and only depends on the new_step_api argument.
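
    For example, a hypothetical third-party wrapper that is unaffected by the step API could silence the warning like this (a sketch under the constructor signature described above, not gym code):

    import gym

    class MyActionWrapper(gym.ActionWrapper):
        def __init__(self, env):
            # explicitly declare support for the new step API to avoid the warning
            super().__init__(env, new_step_api=True)

        def action(self, act):
            return act  # identity action transform, just for illustration
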
    Type of change

    • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [x] This change requires a documentation update

    Checklist:

    • [x] I have run the pre-commit checks with pre-commit run --all-files
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation (need to update with latest changes)
    • [x] My changes generate no new warnings (only intentional warnings)
    • [ ] I have added tests that prove my fix is effective or that my feature works (added two tests but maybe needs more to be comprehensive)
    • [x] New and existing unit tests pass locally with my changes
    • [x] Locally runs with atari / pybullet envs
  • Use mujoco bindings instead of mujoco_py

    Use mujoco bindings instead of mujoco_py

    Changes made:

    • Create Viewer() class to render window in "human" mode with dm_viewer and glfw
    • Modified the default viewer_setup() method for all mujoco_environments (only for v3 envs)
  • Render API

    Render API

    New render API

    Following this discussion: #2540

    This PR extends the render() method, allowing the user to specify render_mode during instantiation of the environment. The default value of render_mode is None; in that case, the user can call render with the preferred mode as before, so these changes are backwards compatible. If render_mode != None, the mode argument of .render() is ignored. If render_mode = "human", rendering is handled by the environment without needing to call .render(). With other render modes, .render() returns a List with all the renders since the last .reset() or .render() call. For example, with render_mode = "rgb_array", .render() returns a List of np.ndarray, while with render_mode = "ansi" it returns a List[str].

    TODO

    • [x] Add deprecation warnings to mode arg in .render() and VideoRecorder

    Examples

    import gym
    
    env = gym.make('FrozenLake-v1', render_mode="human")
    env.reset()
    for _ in range(100):
        env.step(env.action_space.sample())
        # env renders automatically, no need to call .render()
        
    env.render()
    > None
    
    import gym
    
    env = gym.make('CartPole-v1', render_mode="rgb_array")
    env.reset()
    
    for _ in range(10):
        env.step(env.action_space.sample())
    
    frames = env.render()
    type(frames)
    > <class 'list'>
    len(frames)
    > 11
    len(env.render()) # expect 0 because frames are popped by previous .render() call
    > 0
    
    env.reset()
    len(env.render())
    > 1
    

    Example of backward compatibility:

    import gym
    
    env = gym.make('FrozenLake-v1')  # default render_mode=None
    env.reset()
    for _ in range(100):
        # no rendering handled by the environment since render_mode = None
        env.step(env.action_space.sample()) 
        env.render()  # render with human mode (default)
       
    
  • ImportError: sys.meta_path is None, Python is likely shutting down

    ImportError: sys.meta_path is None, Python is likely shutting down

    I'm using macOS. When the Python script finishes, it prints the following errors:

    This is a script that causes the problem:

    import gym
    env = gym.make('SpaceInvaders-v0')
    env.reset()
    env.render()
    

    And after executing it, the error occurs:

    ~/G/qlearning $ python atari.py
    Exception ignored in: <bound method SimpleImageViewer.__del__ of <gym.envs.classic_control.rendering.SimpleImageViewer object at 0x1059ab400>>
    Traceback (most recent call last):
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py", line 347, in __del__
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/gym/envs/classic_control/rendering.py", line 343, in close
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/pyglet/window/cocoa/__init__.py", line 281, in close
      File "/Users/louchenyao/anaconda3/lib/python3.6/site-packages/pyglet/window/__init__.py", line 770, in close
    ImportError: sys.meta_path is None, Python is likely shutting down
    

    It doesn't affect the environment while it's running; it's just a little annoying.

  • Environment working, but not render()

    Environment working, but not render()

    Configuration:

    Dell XPS 15, Anaconda 3.6, Python 3.5, NVIDIA GTX 1050

    I installed OpenAI Gym through pip. When I run the code below, I can execute steps in the environment, which return all the information of the specific environment, but the render() method just gives me a blank screen. When I exit Python, the blank screen closes in a normal way.

    Code:

    import gym
    env = gym.make('CartPole-v0')
    env.reset()
    env.render()
    for i in range(1000):
        env.step(env.action_space.sample())
    

    After hours of google searching I think the issue might have something to do with pyglet, the package used for rendering, and possibly a conflict with my nvidia graphics card? All help is welcome. Thanks!

  • Game window hangs up

    Game window hangs up

    Hi,

    I am a beginner with gym. After I render CartPole

    env = gym.make('CartPole-v0')
    env.reset()
    env.render()

    The window is launched from the Jupyter notebook, but it hangs immediately and then the notebook is dead. I am using Python 3.5.4 on OSX 10.11.6. What could be the problem here?

    Thanks.

  • Support MuJoCo 1.5

    Support MuJoCo 1.5

    In order to use MuJoCo on a recent Mac, you need to be using MuJoCo 1.5 (see https://github.com/openai/mujoco-py/issues/36). Otherwise you get:

    >>> import mujoco_py
    ERROR: Could not open disk
    

    This is because MuJoCo before 1.5 doesn't support NVMe disks.

    Gym depends on MuJoCo via mujoco-py, which just released support for MuJoCo 1.5.

    It looks like maybe this is something you're already working on? Or would it be useful for me to look into fixing it?

  • Update on Plans for the MuJoCo, Robotics and Box2d Environments and the Status of Brax and Hardware Accelerated Environments in Gym

    Update on Plans for the MuJoCo, Robotics and Box2d Environments and the Status of Brax and Hardware Accelerated Environments in Gym

    Given DeepMind's acquisition of MuJoCo and past discussions about replacing the MuJoCo environments in Gym, I would like to clarify plans going forward after meeting with the Brax/PyBullet/TDS team at Google and the MuJoCo team at DeepMind.

    1. We are going to replace the documented MuJoCo environments of the "MuJoCo" class with Brax-based environments in a new "Phys3D" class, add a deprecation warning to the "MuJoCo" environments, and move them to a separate deprecated repo some months later. This raises several questions:
      • "Why do the MuJoCo environments have to be replaced?" Despite MuJoCo now being free, the Gym environments have numerous bugs in their simulation configuration and are in a state we are not able to maintain. Moreover, they all depend on mujoco-py, which is now fully deprecated and cannot be reasonably maintained. Given this, to use the environments with the newer free versions of MuJoCo, to fix bugs, and to be able to continue basic maintenance like supporting new Python versions, the environments would have to be very nearly rewritten from scratch. This means that a serious discussion of a change of simulator is appropriate.
      • "Of all the simulators available, why Brax?" First let's list the widely used options: PyBullet, MuJoCo, TDS and Brax. PyBullet, which would originally have been the obvious choice, is no longer seriously maintained in favor of TDS and Brax. Each simulator has pros and cons: TDS has full differentiability, Brax has accelerator support (the environments run on GPUs or TPUs, allowing training to go orders of magnitude faster - e.g. full training in minutes), and PyBullet and MuJoCo are more physically accurate. For the "MuJoCo" environment class, this high level of physical accuracy is not necessary. Accordingly, picking a newer simulator with extra features of use to researchers (differentiability or hardware acceleration) is likely the preferable option. I personally believe that hardware accelerator support is more important, hence choosing Brax.
      • "How long will this take?" We hope to have a release with the Brax-based Phys3D environments within the next 5 weeks, and a lot of progress has already been made, but a definite date is difficult to give. For the most recent updates, see https://github.com/google/brax/issues/49

    2. The "Robotics" environments are being moved out of Gym. This in turn raises several questions: -"Why can't they be maintained as is?" These environments have the same problems with being unmaintainable and having serious bugs as the others in the "MuJoCo" class with hopper and so on do. -"Why can't these be rewritten in Brax like the others?" Brax not physically accurate enough to support such complex simulations well, and while they hope to support this in the future it will take a very long time. -"I use the Robotics environments, were are they going?" ~Into a repo maintained by @Rohan138 , unless someone who is capable of maintaining them to a higher level and wants to reaches out to me. They will still be maintained as best as is reasonably possible in their state, be installable, and be listed as third party environments in Gym.~ https://github.com/Farama-Foundation/gym-robotics -"Shouldn't Gym have robotics environments like this though? Why not rewrite them in a manner that's suitable?" Because I don't think Gym inherently should have them and because we can't. My goal is to make all the environments inside Gym itself good general purpose benchmarks suitable that someone new to the field of reinforcement learning can look at and say "okay, there's are the things everyone uses that I should play with and understand." The robotics environments after many years have never filled this role and have become niche environments specifically for HER and similar research, and while I cannot speak personally to this matter, the robotics researchers I've spoken to say that these environments are no longer widely used in this space, and that forks of them are used instead, which further means these should not live in Gym proper. Regarding why we can't, these would literally have to be rewritten in the new version of MuJoCo (as Pybullet is no longer extensively maintained) and it's new coming Python bindings (which will not be released publicly for many months, likely with Python bindings following later), and that's not something anyone I'm aware of is willing to do due to the utterly extraordinary amount of work required, including the MuJoCo team at Deepmind. -"When will this happen?" Whenever the next release of Gym comes out.

    3. The Box2D environments will be rewritten in Brax in a new Phys2D environment class, and the Box2D environments will be deprecated and then removed, similar to the MuJoCo environments. In this process, the duplicate versions of lunar lander and bipedal walker will be consolidated into one environment, with the different modes as arguments on creation. To answer the natural questions about this here as well:
      • "Why do they need to be rewritten?" This is discussed in https://github.com/openai/gym/issues/2358, but in a nutshell the physics engine they're using (Box2D) has Python bindings that have not been maintained for years, meaning that they'll stop supporting new Python versions, architectures, and other basic maintenance things. After many discussions over months, I cannot get these bindings maintained by basically anyone. Additionally, using pyglet for rendering has been a source of continual problems for Gym, and it does not reasonably support headless rendering (an essential feature).
      • "Why Brax?" Originally I was planning to use the other major 2D physics library (Chipmunk, which has well-maintained Python bindings), but Brax is orders of magnitude faster as it can run on accelerators, and the Brax team is kind enough to be willing to do the replacements for us.
      • "When will this happen?" Probably a month after the Phys3D environments are merged at the current rate, but that's not a timeline people have committed to or anything.

    4. General questions:
      • "These Brax environments can still run on my CPU too, right?" Yep!
      • "Can Brax environments run on AMD GPUs?" With some effort, yes. Brax uses Jax, which uses XLA, which has solid beta support for most AMD GPUs.
      • "Why are you having Gym so heavily depend on Brax?" Because I think it's the best option for environments that already need to be rewritten, and because I think that letting the benchmark environments run orders of magnitude faster via accelerators is of profound value to the community and to beginners in the field.
      • "Is Brax going to be maintained for the long term?" As long as we can realistically expect, yes. All software stands some risk of deprecation, e.g. PyBullet, the Box2D Python bindings (and arguably Box2D itself), PIL (what came before Pillow), and so on. Given what I've seen Google using it for internally, I'm very confident it will be maintained for at least 5 years or so if not longer, which I think is the best we can reasonably plan for.
      • "Are you going to make other environments hardware accelerated so they can similarly run orders of magnitude faster?" Hopefully! This could be done with the toy text environments and the classic control environments pretty easily through Jax. I have no concrete plans or timeline for this.

    Please let me know if anyone has additional questions about the transition here.

  • AttributeError: module 'gym' has no attribute 'make'

    AttributeError: module 'gym' has no attribute 'make'

    >>> import gym
    >>> env = gym.make('Copy-v0')
    Traceback (most recent call last):
      File "<pyshell#5>", line 1, in <module>
        env = gym.make('Copy-v0')
    AttributeError: module 'gym' has no attribute 'make'
    >>> 
    

    I want to know why. @jonasschneider

  • Add save_video util and deprecate RecordVideo in favor of it

    Add save_video util and deprecate RecordVideo in favor of it

    This PR adds a deprecation warning to RecordVideo, in favor of a cleaner util function https://github.com/younik/gym/blob/record-video/gym/utils/save_video.py

    See also https://github.com/openai/gym/issues/2905

    Sample usage:

    import gym
    from gym.utils.save_video import save_video
    
    env = gym.make("MyEnv", render_mode="rgb_array")
    
    env.reset()
    step_starting_index = 0
    episode_index = 0
    for step_index in range(199):
        action = env.action_space.sample()
        _, _, done, _ = env.step(action)
    
        if done:
            save_video(
                 env.render(), 
                 "videos", 
                 fps=env.metadata["render_fps"], 
                 step_starting_index=step_starting_index, 
                 episode_index=episode_index
            )
            step_starting_index = step_index + 1
            episode_index += 1
            env.reset()
    
    env.close()
    
  • Type hint mujoco_env.py and update pyright

    Type hint mujoco_env.py and update pyright

    This PR removes mujoco_env.py from the set of ignored files so that every gym file is tested. In addition, this PR adds comments to the pyproject.toml file noting the number of warnings that each pyright disable parameter would cause; this should help with the type hinting of the project in the future. I have removed unnecessary pyright arguments that do not raise any new warnings or errors, as the original pyright arguments were just copied from another project with limited investigation into their effectiveness.

  • [Question] Documentation for lunar lander rewards incomplete

    [Question] Documentation for lunar lander rewards incomplete

  • TimeAwareObservationV0 Wrapper

    TimeAwareObservationV0 Wrapper

    This PR adds TimeAwareObservationV0 to the new set of dev_wrappers.

    Compared to the old TimeAwareObservation this wrapper uses a Dict observation space with keys time and obs.
    The new wrappers support all types of spaces and vectorized environments.

    The wrapper supports only the new step API. If backward compatibility is required for dev_wrappers I will add it.
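
    As a sketch of the observation structure described above (the concrete space types and sizes here are illustrative assumptions, not the wrapper's actual code):

    import numpy as np
    from gym.spaces import Box, Dict

    # the wrapped observation space becomes a Dict with "obs" and "time" keys
    wrapped_space = Dict({
        "obs": Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float32),     # the original observation space
        "time": Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32),  # elapsed-time information
    })
    sample = wrapped_space.sample()  # e.g. {"obs": array([...]), "time": array([...])}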

  • Reduce memory consumption and hanging for Dict and Tuple spaces

    Reduce memory consumption and hanging for Dict and Tuple spaces

    Description

    This PR fixes extensive memory consumption and hanging when seeding spaces.

    • Iterates over the number of subseeds and generates each seed from np_random.integers() instead of np_random.choice() (a minimal sketch follows below).
    • Uses np.int32 instead of python int for portability.

    Fixes #3010
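
    A minimal sketch of the subseed-generation change described above (illustrative only, not the actual gym code):

    import numpy as np

    np_random = np.random.default_rng(0)
    num_subspaces = 3  # e.g. the number of subspaces in a Dict/Tuple space

    # draw one int32-range subseed per subspace via integers() rather than
    # sampling from a huge candidate array with choice()
    subseeds = [int(np_random.integers(np.iinfo(np.int32).max)) for _ in range(num_subspaces)]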

    Type of change

    • [x] Bug fix (non-breaking change which fixes an issue)
    • [ ] New feature (non-breaking change which adds functionality)
    • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • [ ] This change requires a documentation update

    Checklist:

    • [x] I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
    • [x] I have commented my code, particularly in hard-to-understand areas
    • [ ] I have made corresponding changes to the documentation
    • [x] My changes generate no new warnings
    • [ ] I have added tests that prove my fix is effective or that my feature works
    • [x] New and existing unit tests pass locally with my changes