A toolkit for reproducible reinforcement learning research.

garage

garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit.

The toolkit provides a wide range of modular tools for implementing RL algorithms, including:

  • Composable neural network models
  • Replay buffers
  • High-performance samplers
  • An expressive experiment definition interface
  • Tools for reproducibility (e.g. set a global random seed which all components respect; see the example after this list)
  • Logging to many outputs, including TensorBoard
  • Reliable experiment checkpointing and resuming
  • Environment interfaces for many popular benchmark suites
  • Support for running garage in diverse environments, including always up-to-date Docker containers
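
For example, a minimal experiment definition that sets a global seed might look like the sketch below. It follows the pattern used in the packaged examples, but treat the exact module paths as assumptions and check the documentation for the release you have installed.

from garage import wrap_experiment
from garage.experiment.deterministic import set_seed

@wrap_experiment
def my_experiment(ctxt=None, seed=1):
    # ctxt is the experiment context injected by wrap_experiment
    # (log directory, snapshot configuration, and so on).
    set_seed(seed)  # seed the global RNGs so every component draws from the same stream
    # ... construct the environment, policy, algorithm, and trainer here ...

my_experiment(seed=1)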

See the latest documentation for getting started instructions and detailed APIs.

Installation

pip install --user garage

Examples

Starting from version v2020.10.0, garage comes packaged with examples. To get a list of examples, run:

garage examples

You can also run garage examples --help, or visit the documentation for even more details.

Join the Community

Join the garage-announce mailing list for infrequent updates (<1/mo.) on the status of the project and new releases.

Need some help? Want to ask whether garage is right for your project? Have a question which is not quite a bug and not quite a feature request?

Join the community Slack by filling out this Google Form.

Algorithms

The table below summarizes the algorithms available in garage.

Algorithm Framework(s)
CEM numpy
CMA-ES numpy
REINFORCE (a.k.a. VPG) PyTorch, TensorFlow
DDPG PyTorch, TensorFlow
DQN PyTorch, TensorFlow
DDQN PyTorch, TensorFlow
ERWR TensorFlow
NPO TensorFlow
PPO PyTorch, TensorFlow
REPS TensorFlow
TD3 PyTorch, TensorFlow
TNPG TensorFlow
TRPO PyTorch, TensorFlow
MAML PyTorch
RL2 TensorFlow
PEARL PyTorch
SAC PyTorch
MTSAC PyTorch
MTPPO PyTorch, TensorFlow
MTTRPO PyTorch, TensorFlow
Task Embedding TensorFlow
Behavioral Cloning PyTorch

Supported Tools and Frameworks

garage requires Python 3.6+. If you need Python 3.5 support, the last garage release to support Python 3.5 was v2020.06.

The package is tested on Ubuntu 18.04. It is also known to run on Ubuntu 16.04 and 20.04, and on recent versions of macOS using Homebrew. Windows users can install garage via WSL, or by making use of the Docker containers.

We currently support PyTorch and TensorFlow for implementing the neural network portions of RL algorithms, and additions of new framework support are always welcome. PyTorch modules can be found in the package garage.torch and TensorFlow modules can be found in the package garage.tf. Algorithms which do not require neural networks are found in the package garage.np.
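
As a quick illustration of this layout (the class locations are assumptions based on the package names above; check the API reference for the release you have installed):

from garage.torch.algos import PPO   # PyTorch implementation of PPO
from garage.tf.algos import TRPO     # TensorFlow implementation of TRPO
from garage.np.algos import CEM      # numpy-only algorithm, no neural networks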

The package is available for download on PyPI, and we ensure that it installs successfully into environments defined using conda, Pipenv, and virtualenv.

Testing

The most important feature of garage is its comprehensive automated unit test and benchmarking suite, which helps ensure that the algorithms and modules in garage maintain state-of-the-art performance as the software changes.

Our testing strategy has three pillars:

  • Automation: We use continuous integration to test all modules and algorithms in garage before adding any change. The full installation and test suite is also run nightly, to detect regressions.
  • Acceptance Testing: Any commit which might change the performance of an algorithm is subjected to comprehensive benchmarks of the relevant algorithms before it is merged.
  • Benchmarks and Monitoring: We benchmark the full suite of algorithms against their relevant benchmarks and widely-used implementations regularly, to detect regressions and improvements we may have missed.

Supported Releases

Release       Last date of support
v2020.06      February 28th, 2021

Garage releases a new stable version approximately every 4 months, in February, June, and October. Maintenance releases have a stable API and dependency tree, and receive bug fixes and critical improvements but not new features. We currently support each release for a window of 8 months.

Citing garage

If you use garage for academic research, please cite the repository using the following BibTeX entry. You should update the commit field with the commit or release tag your publication uses.

@misc{garage,
 author = {The garage contributors},
 title = {Garage: A toolkit for reproducible reinforcement learning research},
 year = {2019},
 publisher = {GitHub},
 journal = {GitHub repository},
 howpublished = {\url{https://github.com/rlworkgroup/garage}},
 commit = {be070842071f736eb24f28e4b902a9f144f5c97b}
}

Credits

The earliest code for garage was adopted from a predecessor project called rllab. The garage project is grateful for the contributions of the original rllab authors, and hopes to continue advancing the state of reproducibility in RL research in the same spirit. garage has previously been supported by the Amazon Research Award "Watch, Practice, Learn, Do: Unsupervised Learning of Robust and Composable Robot Motion Skills by Fusing Expert Demonstrations with Robot Experience."


Owner

Reinforcement Learning Working Group, a coalition of researchers who develop open source reinforcement learning research software.

Comments
  • Bug fixes for importing Box

    There are two major changes:

    1. Change the import of gym.spaces.Box to from gym.spaces import Box
    2. Add a spec property to MujocoEnv to fix the missing-method error raised when calling env.spec
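
    A minimal sketch of the first change (the original import statement is not shown in this description, so the "before" line is an assumption):

    # Before: importing Box through a dotted module path fails,
    # because Box is a class inside gym.spaces, not a submodule.
    # import gym.spaces.Box

    # After: import the class directly from gym.spaces.
    from gym.spaces import Box

    box = Box(low=-1.0, high=1.0, shape=(3,))
    print(box.sample())
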
  • Delete stub()

    Traceback (most recent call last):
      File "/Users/jonathon/Documents/garage/garage/scripts/run_experiment.py", line 191, in <module>
        run_experiment(sys.argv)
      File "/Users/jonathon/Documents/garage/garage/scripts/run_experiment.py", line 146, in run_experiment
        logger.log_parameters_lite(params_log_file, args)
      File "/Users/jonathon/Documents/garage/garage/garage/misc/logger.py", line 372, in log_parameters_lite
        json.dump(log_params, f, indent=2, sort_keys=True, cls=MyEncoder)
      File "/anaconda2/envs/garage/lib/python3.6/json/__init__.py", line 179, in dump
        for chunk in iterable:
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 430, in _iterencode
        yield from _iterencode_dict(o, _current_indent_level)
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 404, in _iterencode_dict
        yield from chunks
      [Previous line repeated 1 more times]
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 437, in _iterencode
        o = _default(o)
      File "/Users/jonathon/Documents/garage/garage/garage/misc/logger.py", line 352, in default
        return json.JSONEncoder.default(self, o)
      File "/anaconda2/envs/garage/lib/python3.6/json/encoder.py", line 180, in default
        o.__class__.__name__)
    TypeError: Object of type 'TimeLimit' is not JSON serializable
    
  • Fix a typo to allow evaluating algos deterministically

    When running some experiments with SAC I have discovered that my algorithm does not act deterministically during the evaluation (i.e. the action is sampled from the policy distribution instead of taking the mean/mode of the distribution). The code for obtaining evaluation samples uses the rollout function from sampler.utils with the argument deterministic=True.

    The rollout function is then supposed to look into the agent_info dictionary and use the mean value stored there. Unfortunately, the current code looks into agent_infos (with an s at the end), which is a list of agent_info dictionaries and therefore does not contain the mean key. This means the stochastic, sampled action is used instead. My pull request fixes this typo.

    Technical sidenote - maybe there should be an exception raised if deterministic=True and there is no mean key in the dict?
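
    A self-contained sketch of the pattern described above (the policy and rollout names here are illustrative, not garage's actual sampler code):

    import numpy as np

    def get_action(observation):
        # Stand-in for a stochastic policy: returns a sampled action plus the
        # per-step agent_info dict that stores the distribution mean.
        mean = np.tanh(observation)
        action = mean + 0.1 * np.random.randn(*mean.shape)
        return action, {'mean': mean}

    def rollout_step(observation, deterministic=False):
        action, agent_info = get_action(observation)
        if deterministic:
            # The fix: read the per-step agent_info dict (singular), not the
            # accumulated agent_infos list, and take the distribution mean.
            action = agent_info['mean']
        return action

    print(rollout_step(np.zeros(3), deterministic=True))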

  • Replace CategoricalConvPolicy

    • Remove all occurrences of CategoricalConvPolicy
    • Rename CategoricalConvPolicyWithModel to CategoricalConvPolicy
    • Create and remove integration test

    Benchmark script is located in origin/benchmark_categorical_cnn_policy.

    Results: benchmark figures for MemorizeDigits-v0 with PPO (per-run and mean curves) are attached.

    Also tried running both versions in the Atari environment PongNoFrameskip with PPO, but realized that this combination of environment and algorithm is not ideal for our testing (figure attached).

    As discussed, results from MemorizeDigits are sufficient to show that the layer implementation can be replaced with the model implementation.

  • Fix sleeping processes

    The joblib package responsible for the MemmappingPool has been updated to pick up fixes for bugs that could produce sleeping processes in the parallel sampler. The environment variable JOBLIB_START_METHOD has also been removed, since joblib no longer implements it. However, if run_experiment is interrupted during the optimization steps, sleeping processes are still produced. To fix the problem, the child processes of the parallel sampler now ignore SIGINT, so they are not killed while holding a lock that is also acquired by the parent process, avoiding a deadlock. To make sure the child processes are terminated, the SIGINT handler in the parent process is overridden to call the terminate and join functions on the process pool. The process (a thread in TF) used in Plotter is already terminated by registering the shutdown method with atexit, but one important missing step was to clean up the Queue that communicates with the worker process.
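
    A generic sketch of the SIGINT-handling pattern described above, using the standard library's multiprocessing.Pool rather than garage's actual sampler code:

    import signal
    from multiprocessing import Pool

    def _ignore_sigint():
        # Worker initializer: ignore SIGINT so Ctrl+C in the parent cannot kill a
        # worker while it holds a shared lock (which would deadlock the parent).
        signal.signal(signal.SIGINT, signal.SIG_IGN)

    if __name__ == '__main__':
        pool = Pool(processes=4, initializer=_ignore_sigint)
        try:
            print(pool.map(abs, range(-5, 5)))
            pool.close()
        except KeyboardInterrupt:
            # The parent's SIGINT path: explicitly terminate, then join, the pool.
            pool.terminate()
        finally:
            pool.join()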

  • Failed to reproduce example her_ddpg_fetchreach

    Hi,

    I was trying to run examples/tf/her_ddpg_fetchreach.py but got much worse performance than expected. I attached the results below, and it seems that it's not working at all. Do you have any idea how to make it work? The default parameters look reasonable, but should I try tuning some of them?

    Thank you in advance.

    AverageSuccessRate                   0
    Epoch                                49
    Evaluation/AverageDiscountedReturn   -9.94112
    Evaluation/AverageReturn             -999.93
    Evaluation/CompletionRate            0
    Evaluation/Iteration                 980
    Evaluation/MaxReturn                 -998
    Evaluation/MinReturn                 -1000
    Evaluation/NumTrajs                  100
    Evaluation/StdReturn                 0.324191
    Policy/AveragePolicyLoss             4.17451
    QFunction/AverageAbsQ                4.19709
    QFunction/AverageAbsY                4.19412
    QFunction/AverageQ                   -4.16916
    QFunction/AverageQFunctionLoss       0.0232313
    QFunction/AverageY                   -4.16946
    QFunction/MaxQ                       2.87869
    QFunction/MaxY                       2.62668
    TotalEnvSteps                        100000

  • Master branch can't pass make test

    The current master branch can't pass make test. However, the failing tests pass when run separately with unittest.

    ======================================================================
    ERROR: test_dm_control_tf_policy (tests.garage.envs.dm_control.test_dm_control_tf_policy.TestDmControlTfPolicy)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/envs/dm_control/test_dm_control_tf_policy.py", line 38, in test_dm_control_tf_policy
        runner.train(n_epochs=1, batch_size=10)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 321, in train
        start_epoch=0)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 407, in _train
        self.save(epoch, paths if store_paths else None)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 210, in save
        snapshotter.save_itr_params(epoch, params)
      File "/root/code/garage/garage/logger/snapshotter.py", line 85, in save_itr_params
        with open(file_name, 'wb') as file:
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpgplyc983/params.pkl'
    
    ======================================================================
    ERROR: test_cem_cartpole (tests.garage.np.algos.test_cem.TestCEM)
    Test CEM with Cartpole-v1 environment.
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/np/algos/test_cem.py", line 35, in test_cem_cartpole
        n_epochs=5, batch_size=2000, n_epoch_cycles=n_samples)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 321, in train
        start_epoch=0)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 407, in _train
        self.save(epoch, paths if store_paths else None)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 210, in save
        snapshotter.save_itr_params(epoch, params)
      File "/root/code/garage/garage/logger/snapshotter.py", line 85, in save_itr_params
        with open(file_name, 'wb') as file:
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpgplyc983/params.pkl'
    
    ======================================================================
    ERROR: test_cma_es_cartpole (tests.garage.np.algos.test_cma_es.TestCMAES)
    Test CMAES with Cartpole-v1 environment.
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/np/algos/test_cma_es.py", line 33, in test_cma_es_cartpole
        runner.train(n_epochs=1, batch_size=1000, n_epoch_cycles=n_samples)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 321, in train
        start_epoch=0)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 407, in _train
        self.save(epoch, paths if store_paths else None)
      File "/root/code/garage/garage/experiment/local_tf_runner.py", line 210, in save
        snapshotter.save_itr_params(epoch, params)
      File "/root/code/garage/garage/logger/snapshotter.py", line 85, in save_itr_params
        with open(file_name, 'wb') as file:
    FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpgplyc983/params.pkl'
    
    ======================================================================
    FAIL: test_trpo_recurrent_cartpole (tests.garage.tf.algos.test_trpo_with_model.TestTRPO)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/root/code/garage/tests/garage/tf/algos/test_trpo_with_model.py", line 39, in test_trpo_recurrent_cartpole
        assert last_avg_ret > 90
    AssertionError
    
    ----------------------------------------------------------------------
    Ran 623 tests in 789.240s
    
    FAILED (failures=1, errors=3)
    Makefile:60: recipe for target 'run-headless' failed
    make: *** [run-headless] Error 1
    
  • Update setup script for OS X

    The script has been updated based on setup_linux.sh, using Homebrew to install the equivalent packages and updating the requirements for mujoco-py, gym, and baselines according to their documentation for OS X. Since mujoco-py is installed within the setup scripts for Linux and OS X, the separate script to set up mujoco has been removed. The tensorflow package was also added to environment.yml so it can be installed out of the box without the scripts. The feature to install the GPU flavor of TensorFlow may be removed from the Linux script once everything can be installed using environment.yml only, or another single list of dependencies. Finally, a default for set-envvar and the correct string replacement in the print-error and print-warning functions have been added to the Linux setup script.

  • Add PyTorch TRPO

    Implemented Trust Region Policy Optimization in PyTorch.

    Benchmarks are currently running and should be finished by tomorrow. I opened this PR to get some feedback, since initial results and tests looked good.

  • Wrap examples in tests to run on CI

    Doesn't include the test for sim_policy.py. Will do that separately.

    Update: This also excludes tf/ppo_memorize_digits, tf/dqn_pong.py, and tf/trpo_cubecrash.py for now, since they take too long to run on CI even with 1 epoch. Still figuring out decent enough parameters to run them on CI, but I want to get this merged first.

  • Replace CategoricalLSTMPolicy with Model

    This PR replaces CategoricalLSTMPolicy with the one implemented using garage.tf.models.

    The benchmark script is in the benchmark_categorical_lstm_policy branch, modified from benchmark_ppo_catogorical.py.

    There seems to be some randomness in the benchmark. In some environments, the trials of the old and the new policies do not match very well. When this happens, the new one usually performs worse. See the figures below.

    Raw data for TensorBoard: 2019-09-21-16-09-09-112853.zip

    Figures attached for Assault-ramDeterministic-v4, Breakout-ramDeterministic-v4, ChopperCommand-ramDeterministic-v4, LunarLander-v2, and Tutankham-ramDeterministic-v4 (three seeds each).

  • Suggestion to add how to implement pre-trained policies.

    Hi! I'm currently working on deploying a policy trained with SAC in MuJoCo onto a real robot. I'm trying to load the two Q-functions, but I get weird results in q_loss and the returns. Any suggestion on how to load the policy correctly? Thanks!

  • Constraining the output interval of the GaussianLSTMModel to [0.0 .. 1.0]

    What would be the simplest way to constrain the GaussianLSTMModel [1] to output values only within the interval [0.0 .. 1.0]?

    [1] https://github.com/rlworkgroup/garage/blob/6461a071f0155712add1b41316003e90c9c77899/src/garage/tf/models/gaussian_lstm_model.py#L16

    Many thanks in advance!

  • Support GPUs in RaySampler

    This has been frequently requested. We've avoided adding support largely because it's difficult to test, and because for many environments sampling on CPU is more efficient anyway, but there are enough bug reports about this that it's probably better to just support it.

  • KeyError: 'render.modes' in GymEnv wrapping "CartPole-vX"

    Apparently, classic control environments in Gym have a different key for render modes in env.metadata.

    In fact:

    >>> import gym
    >>> env = gym.make('CartPole-v1')
    >>> env.metadata
    {'render_modes': ['human', 'rgb_array'], 'render_fps': 50}
    

    The garage wrapper, however, expects to find env.metadata['render.modes'], as is the case for other environments: https://github.com/rlworkgroup/garage/blob/c56513f42be9cba2ef5426425a8ad36097e679c2/src/garage/envs/gym_env.py#L147

    This results in a KeyError, unsurprisingly:

    $ python examples/tf/trpo_cartpole.py 
    Traceback (most recent call last):
      File "examples/tf/trpo_cartpole.py", line 57, in <module>
        trpo_cartpole()
      File "/home/***/.local/lib/python3.8/site-packages/garage/experiment/experiment.py", line 369, in __call__
        result = self.function(ctxt, **kwargs)
      File "examples/tf/trpo_cartpole.py", line 33, in trpo_cartpole
        env = GymEnv('CartPole-v1')
      File "/home/***/.local/lib/python3.8/site-packages/garage/envs/gym_env.py", line 147, in __init__
        self._render_modes = self._env.metadata['render.modes']
    KeyError: 'render.modes'
    

    Is this a problem with my version mix? I'm running with:

    • garage==2021.3.0
    • gym==0.23.1

    Thank you all for the great work!
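
    For reference, a minimal workaround sketch that tolerates both metadata keys (an assumption about a possible fix, not garage's actual code):

    import gym

    env = gym.make('CartPole-v1')
    metadata = getattr(env, 'metadata', {})
    # Older gym releases use 'render.modes'; newer classic-control envs use 'render_modes'.
    render_modes = metadata.get('render.modes', metadata.get('render_modes', []))
    print(render_modes)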

  • Make environment with 2 discrete actions

    I was trying to make an environment with 2 discrete actions, but I couldn't find an argument for akro.Discrete to change the shape to 2.

    How can I make an environment with 2 discrete actions? (I don't have the option of mixing actions.)

    Thank you.
