Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

Dopamine



Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grokked codebase in which users can freely experiment with wild ideas (speculative research).

Our design principles are:

  • Easy experimentation: Make it easy for new users to run benchmark experiments.
  • Flexible development: Make it easy for new users to try out research ideas.
  • Compact and reliable: Provide implementations for a few, battle-tested algorithms.
  • Reproducible: Facilitate reproducibility in results. In particular, our setup follows the recommendations given by Machado et al. (2018).

In the spirit of these principles, this first version focuses on supporting the state-of-the-art, single-GPU Rainbow agent (Hessel et al., 2018) applied to Atari 2600 game-playing (Bellemare et al., 2013). Specifically, our Rainbow agent implements the three components identified as most important by Hessel et al.:

  • n-step Bellman updates (see e.g. Mnih et al., 2016)
  • Prioritized experience replay (Schaul et al., 2016)
  • Distributional reinforcement learning (C51)

For completeness, we also provide an implementation of DQN (Mnih et al., 2015). For additional details, please see our documentation.

We provide a set of Colaboratory notebooks which demonstrate how to use Dopamine.

We provide a website which displays the learning curves for all the provided agents, on all the games.

This is not an official Google product.

What's new

  • 16/10/2020: Learning curves for the QR-DQN JAX agent have been added to the baseline plots!

  • 03/08/2020: Dopamine now supports JAX agents! This includes an implementation of the Quantile Regression agent (QR-DQN) which has been a common request. Find out more in our jax subdirectory, which includes trained agent checkpoints.

  • 27/07/2020: Dopamine now runs on TensorFlow 2. However, Dopamine is still written as TensorFlow 1.X code. This means your project may need to explicitly disable TensorFlow 2 behaviours with:

    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
    

    if you are using a custom entry point for training your agent. The migration to TensorFlow 2 also means that Dopamine no longer supports Python 2.

  • 02/09/2019: Dopamine has switched its network definitions to use tf.keras.Model. The previous tf.contrib.slim based networks have been removed. If your agents inherit from Dopamine agents, you need to update your code.

    • ._get_network_type() and ._network_template() functions are replaced with ._create_network() and network_type definitions are moved inside the model definition.

      # The following two functions are replaced with `_create_network()`.
      # def _get_network_type(self):
      #   return collections.namedtuple('DQN_network', ['q_values'])
      # def _network_template(self, state):
      #   return self.network(self.num_actions, self._get_network_type(), state)
      
      def _create_network(self, name):
        """Builds the convolutional network used to compute the agent's Q-values.
      
        Args:
          name: str, this name is passed to the tf.keras.Model and used to create
            variable scope under the hood by the tf.keras.Model.
        Returns:
          network: tf.keras.Model, the network instantiated by the Keras model.
        """
        # `self.network` is set to `atari_lib.NatureDQNNetwork`.
        network = self.network(self.num_actions, name=name)
        return network
      
      def _build_networks(self):
        # The following two lines are replaced.
        # self.online_convnet = tf.make_template('Online', self._network_template)
        # self.target_convnet = tf.make_template('Target', self._network_template)
        self.online_convnet = self._create_network(name='Online')
        self.target_convnet = self._create_network(name='Target')
      
    • If your code overrides ._network_template(), ._get_network_type() or ._build_networks(), make sure you update it to fit the new API. If your code overrides ._build_networks(), you need to replace tf.make_template('Online', self._network_template) with self._create_network(name='Online').

    • The variables of each network can be obtained from the networks as follows: vars = self.online_convnet.variables.

    • Baselines and older checkpoints can be loaded by adding the following line to your gin file.

      atari_lib.maybe_transform_variable_names.legacy_checkpoint_load = True
      
  • 11/06/2019: Visualization utilities added to generate videos and still images of a trained agent interacting with its environment. See an example colaboratory here.

  • 30/01/2019: Dopamine 2.0 now supports general discrete-domain gym environments.

  • 01/11/2018: Download links for each individual checkpoint, to avoid having to download all of the checkpoints.

  • 29/10/2018: Graph definitions now show up in Tensorboard.

  • 16/10/2018: Fixed a subtle bug in the IQN implementation and updated the colab tools, the JSON files, and all the downloadable data.

  • 18/09/2018: Added support for double-DQN style updates for the ImplicitQuantileAgent.

    • Can be enabled via the double_dqn constructor parameter.
  • 18/09/2018: Added support for reporting in-iteration losses directly from the agent to Tensorboard.

    • Set the run_experiment.create_agent.debug_mode = True via the configuration file or using the gin_bindings flag to enable it.
    • Control frequency of writes with the summary_writing_frequency agent constructor parameter (defaults to 500).
  • 27/08/2018: Dopamine launched!
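
As a rough illustration of the gin_bindings route mentioned in the 18/09/2018 entries, the sketch below assembles a training command that enables debug mode and double-DQN style updates. It assumes that the flag can be repeated once per binding, that ImplicitQuantileAgent is registered with gin under that name, and that the config file path shown exists in your checkout. gin_binding_args is a hypothetical helper and is not part of Dopamine.

```python
import shlex


def gin_binding_args(bindings):
    """Turn {'configurable.param': value} pairs into --gin_bindings flags."""
    args = []
    for name, value in bindings.items():
        # Gin parses Python-style literals, so repr() is a reasonable way
        # to render booleans, numbers and strings.
        args.append(f"--gin_bindings={name}={value!r}")
    return args


bindings = {
    # Taken from the changelog entry on in-iteration loss reporting.
    "run_experiment.create_agent.debug_mode": True,
    # Assumed binding names: constructor parameters of the agent class.
    "ImplicitQuantileAgent.double_dqn": True,
    "ImplicitQuantileAgent.summary_writing_frequency": 100,
}

command = [
    "python", "-um", "dopamine.discrete_domains.train",
    "--base_dir", "/tmp/dopamine_runs",
    # Assumed location of the IQN config; adjust to your checkout.
    "--gin_files",
    "dopamine/agents/implicit_quantile/configs/implicit_quantile.gin",
] + gin_binding_args(bindings)

# Print a copy-pasteable shell command instead of launching training.
print(" ".join(shlex.quote(part) for part in command))
```

Printing the command rather than executing it keeps the sketch side-effect free; the same bindings could equally be written as lines in a gin file.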

Instructions

Install via source

Installing from source allows you to modify the agents and experiments as you please, and is likely to be the pathway of choice for long-term use. The instructions below assume that you will be running Dopamine in a virtual environment. A virtual environment lets you control which dependencies are installed for which program.

Dopamine is a TensorFlow-based framework, and we recommend you also consult the TensorFlow documentation for additional details. Finally, these instructions are for Python 3.6 and above.

First download the Dopamine source.

git clone https://github.com/google/dopamine.git

Then create a virtual environment and activate it.

python3 -m venv ./dopamine-venv
source dopamine-venv/bin/activate

Finally, set up the environment and install Dopamine's dependencies.

pip install -U pip
pip install -r dopamine/requirements.txt
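
If an install step fails in a confusing way, it can help to confirm that the interpreter matches the assumptions above (Python 3.6 or later, running inside a virtual environment). The following is a minimal sketch of such a check using only the standard library; check_environment and in_virtualenv are illustrative names, not part of Dopamine.

```python
import sys

MIN_PYTHON = (3, 6)


def in_virtualenv():
    """Return True when the interpreter appears to run inside a venv."""
    # venv points sys.prefix at the environment while sys.base_prefix keeps
    # pointing at the base installation; older virtualenv releases set
    # sys.real_prefix instead.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base != sys.prefix


def check_environment(version_info=sys.version_info):
    """Return a list of human-readable problems (empty when all is well)."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python %d.%d+ is required" % MIN_PYTHON)
    if not in_virtualenv():
        problems.append("no virtual environment appears to be active")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```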

Running tests

You can test whether the installation was successful by running the following:

cd dopamine
export PYTHONPATH=$PYTHONPATH:$PWD
python -m tests.dopamine.atari_init_test

Training agents

Atari games

The entry point to the standard Atari 2600 experiment is dopamine/discrete_domains/train.py. To run the basic DQN agent,

python -um dopamine.discrete_domains.train \
  --base_dir /tmp/dopamine_runs \
  --gin_files dopamine/agents/dqn/configs/dqn.gin

By default, this will kick off an experiment lasting 200 million frames. The command-line interface will output statistics about the latest training episode:

[...]
I0824 17:13:33.078342 140196395337472 tf_logging.py:115] gamma: 0.990000
I0824 17:13:33.795608 140196395337472 tf_logging.py:115] Beginning training...
Steps executed: 5903 Episode length: 1203 Return: -19.

To get finer-grained information about the process, you can adjust the experiment parameters in dopamine/agents/dqn/configs/dqn.gin, in particular by reducing Runner.training_steps and Runner.evaluation_steps, which together determine the total number of steps needed to complete an iteration. This is useful if you want to inspect log files or checkpoints, which are generated at the end of each iteration.
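
To make the relationship between these settings concrete, here is a small back-of-the-envelope sketch. The default values used (200 iterations, 250,000 training steps, 125,000 evaluation steps, and 4 emulator frames per agent step) are assumptions about dqn.gin and the Atari preprocessing rather than figures stated above, so check them against your own config. The helper names are illustrative.

```python
def steps_per_iteration(training_steps, evaluation_steps):
    """Approximate agent steps in one iteration (train phase + eval phase)."""
    return training_steps + evaluation_steps


def training_frames(num_iterations, training_steps, frame_skip):
    """Emulator frames seen during training over the whole experiment."""
    return num_iterations * training_steps * frame_skip


# Assumed defaults (not stated in this README; verify in your dqn.gin).
NUM_ITERATIONS = 200
TRAINING_STEPS = 250_000
EVALUATION_STEPS = 125_000
FRAME_SKIP = 4

print(training_frames(NUM_ITERATIONS, TRAINING_STEPS, FRAME_SKIP))  # 200000000
print(steps_per_iteration(TRAINING_STEPS, EVALUATION_STEPS))        # 375000

# Hypothetical reduced settings for quick inspection of logs/checkpoints.
short_training, short_evaluation = 10_000, 5_000
print(steps_per_iteration(short_training, short_evaluation))        # 15000

# The corresponding lines one might place in a gin file.
print(f"Runner.training_steps = {short_training}")
print(f"Runner.evaluation_steps = {short_evaluation}")
```

Under these assumptions, the shortened settings finish an iteration in roughly 15,000 agent steps instead of 375,000, so the first log files and checkpoint appear correspondingly sooner.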

More generally, the whole of Dopamine is easily configured using the gin configuration framework.

Non-Atari discrete environments

We provide sample configuration files for training an agent on Cartpole and Acrobot. For example, to train C51 on Cartpole with default settings, run the following command:

python -um dopamine.discrete_domains.train \
  --base_dir /tmp/dopamine_runs \
  --gin_files dopamine/agents/rainbow/configs/c51_cartpole.gin

You can train Rainbow on Acrobot with the following command:

python -um dopamine.discrete_domains.train \
  --base_dir /tmp/dopamine_runs \
  --gin_files dopamine/agents/rainbow/configs/rainbow_acrobot.gin

Install as a library

An easy, alternative way to install Dopamine is as a Python library:

pip install dopamine-rl

References

Bellemare et al., The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2013.

Machado et al., Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents. Journal of Artificial Intelligence Research, 2018.

Hessel et al., Rainbow: Combining Improvements in Deep Reinforcement Learning. Proceedings of the AAAI Conference on Artificial Intelligence, 2018.

Mnih et al., Human-level Control through Deep Reinforcement Learning. Nature, 2015.

Mnih et al., Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the International Conference on Machine Learning, 2016.

Schaul et al., Prioritized Experience Replay. Proceedings of the International Conference on Learning Representations, 2016.

Giving credit

If you use Dopamine in your work, we ask that you cite our white paper. Here is an example BibTeX entry:

@article{castro18dopamine,
  author    = {Pablo Samuel Castro and
               Subhodeep Moitra and
               Carles Gelada and
               Saurabh Kumar and
               Marc G. Bellemare},
  title     = {Dopamine: {A} {R}esearch {F}ramework for {D}eep {R}einforcement {L}earning},
  year      = {2018},
  url       = {http://arxiv.org/abs/1812.06110},
  archivePrefix = {arXiv}
}
Comments
  • Cartpole colab shows DQN outperforming C51?


    I would've expected C51 to outperform DQN (at least initially, if not asymptotically) but when I looked at the provided colab notebook, C51 seems to be beaten by DQN quite frequently:

    image

    I ran the notebook myself to get my own results, which largely agreed:

    image

    I suppose there are two questions:

    1. Why is DQN so unstable?

    2. Why does DQN outperform C51?

  • What is the error 'Failed building wheel for atari-py'?

    As the title says: I have installed all of the required packages, but I can't fix this error. In addition, when I execute 'python tests/atari_init_test.py', the error is 'No module named dopamine.atari'. Thanks!

  • Not able to load the agent and get baseline result in the rainbow paper

    Hi, I was trying to load a Rainbow agent from the Breakout checkpoint to reproduce the results from the paper, but I was not able to make it work. It seems that the checkpoint prefix does not match the agent codebase, though I am not sure whether I am wrong about this. Any help would be appreciated. Thanks!

  • question: trouble loading 'content'

    When loading content using experimental_data = colab_utils.load_baselines('/content'), I got this error: "You are trying to merge on float64 and object columns. If you wish to proceed you should use pd.concat". I'm not sure what's wrong. I ran this on my local computer, and I have 'c51, dqn, implicit_quantile, quantile, rainbow' in the 'content' folder. Is this a pandas-related problem?

  • [question] Printing model summary

    I was going through this Colab and trying to make a custom agent on the pattern of MyRandomDQNAgent(dqn_agent.DQNAgent). The DQNAgent's network is specified here, which is in turn the NatureDQNNetwork specified here. Now this is going to sound stupid, but I have a similar object (a multi-headed DQN built in Dopamine) that I am trying to recreate in another library (RLlib). I wanted to print the model.summary() of this Keras.Model to make sure both are equivalent, but I am really struggling with it. I would appreciate it if someone could point out a solution. Thanks!

  • Agent frozen

    I'm trying to visualize Breakout and Pong using example_viz_lib and the DQN agent. However, in both cases, for all 5 checkpoints provided, the agent seems to be stuck in the corner and does not move. In other words, the agent does not seem to play. The same experiment with SpaceInvaders seems to work fine. Is it because, for Breakout and Pong, the agent is waiting for a FIRE action that never arrives, or is there some other reason?

    Thanks

  • MemoryError:

    When I follow these steps to set up Dopamine, everything seems OK until I test "dopamine/atari/train.py". The problem is:

    MemoryError:
      In call to configurable 'WrappedReplayBuffer' (<unbound method WrappedReplayBuffer.__init__>)
      In call to configurable 'DQNAgent' (<unbound method DQNAgent.__init__>)
      In call to configurable 'Runner' (<unbound method Runner.__init__>)

  • TensorFlow AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'RMSPropOptimizer'

    After following all the given installation steps, when I run

    python -um dopamine.discrete_domains.train \
      --base_dir=/tmp/dopamine \
      --gin_files='dopamine/agents/dqn/configs/dqn.gin'

    I get the following error,

    line 97, in DQNAgent: optimizer=tf.train.RMSPropOptimizer( AttributeError: module 'tensorflow_core._api.v2.train' has no attribute 'RMSPropOptimizer'

    How could we solve this problem?

  • [Question] Timeline for generalizing Dopamine and policy for contributions towards this.

    Thanks for this great project!

    There have been quite a few issues (e.g. #3, #36) regarding customising Dopamine to work on new environments. It also seems like quite a few people have made, or are making, forks that allow for this.

    So I was wondering:

    • Is there any current work/timeline for generalising Dopamine to new environments/network structures and so on?
    • What is the policy on accepting contributions towards achieving the above?
  • Issue when creating example from colab.

    I'm trying to run https://colab.research.google.com/github/google/dopamine/blob/master/dopamine/colab/agents.ipynb on my local machine.

    I got this error for both examples

    /home/lukas/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
      from ._conv import register_converters as _register_converters
    [2018-10-14 13:18:26,861] Making new env: AsterixNoFrameskip-v0
    Traceback (most recent call last):
      File "/home/lukas/dopamine/dopamine/agents/luska_1/luska_1.py", line 51, in <module>
        max_steps_per_episode=100)
      File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/config.py", line 1032, in wrapper
        utils.augment_exception_message_and_reraise(e, err_str)
      File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/utils.py", line 48, in augment_exception_message_and_reraise
        six.raise_from(proxy.with_traceback(exception.__traceback__), None)
      File "<string>", line 2, in raise_from
      File "/home/lukas/anaconda3/lib/python3.6/site-packages/gin/config.py", line 1009, in wrapper
        return fn(*new_args, **new_kwargs)
      File "/home/lukas/dopamine/dopamine/atari/run_experiment.py", line 157, in __init__
        self._agent = create_agent_fn(self._sess, self._environment, summary_writer=self._summary_writer)
    TypeError: create_random_dqn_agent() got an unexpected keyword argument 'summary_writer'
      In call to configurable 'Runner' (<function Runner.__init__ at 0x7f9b5b24b6a8>)

    Thanks!

  • What is the CUDA version supported?

    I tried to run Dopamine on my GPU machine with Ubuntu 16.04.4 and CUDA 9.0. I was following the testing and training instructions in the provided README file under virtualenv. Testing and training ran fine, but on the CPU only (high CPU utilization and zero GPU utilization, even after one iteration had finished). I'm running with "dopamine/agents/dqn/configs/dqn.gin", and the configuration uses GPU:0 as tf_device by default. Does anybody have any pointers for this kind of situation?

  • Poor introduction to dopamine


    So I tried following the overview and - to be honest - this has been a pain.

    The entry point to the standard Atari 2600 experiment is [dopamine/discrete_domains/train.py](https://github.com/google/dopamine/blob/master/dopamine/discrete_domains/train.py). To run the basic DQN agent,
    
    python -um dopamine.discrete_domains.train \
      --base_dir /tmp/dopamine_runs \
      --gin_files dopamine/agents/dqn/configs/dqn.gin
    

    This raises loads of errors:

    • First, all the missing imports (I fixed those using pipreqs).
    • Then ModuleNotFoundError: No module named 'dopamine.metrics' (described in GH-196), which I fixed by changing all from dopamine.metrics to from metrics.
    • Finally, Gym complained that it doesn't distribute ROMs anymore:
    gym.error.Error: We're Unable to find the game "Pong". Note: Gym no longer distributes ROMs. If you own a license to use the necessary ROMs for research purposes you can download them via `pip install gym[accept-rom-license]`. Otherwise, you should try importing "Pong" via the command `ale-import-roms`. If you believe this is a mistake perhaps your copy of "Pong" is unsupported. To check if this is the case try providing the environment variable `PYTHONWARNINGS=default::ImportWarning:ale_py.roms`. For more information see: https://github.com/mgbellemare/Arcade-Learning-Environment#rom-management
    

    I believe all of this should run on the core and atari containers that I created following the README, but I don't see where this is explained...

    So I started looking at the colabs hoping to find more information, but I got loads of errors...

    agents.ipynb in Load baseline data

    ---------------------------------------------------------------------------
    
    TypeError                                 Traceback (most recent call last)
    
    <ipython-input-6-76246b013715> in <module>()
          1 # @title Load baseline data
          2 get_ipython().system('gsutil -q -m cp -R gs://download-dopamine-rl/preprocessed-benchmarks/* /content/')
    ----> 3 experimental_data = colab_utils.load_baselines('/content')
    
    1 frames
    
    /usr/lib/python3.7/copyreg.py in _reconstructor(cls, base, state)
         41 def _reconstructor(cls, base, state):
         42     if base is object:
    ---> 43         obj = object.__new__(cls)
         44     else:
         45         obj = base.__new__(cls, state)
    
    TypeError: object.__new__(BlockManager) is not safe, use BlockManager.__new__()
    

    load_statistics.ipynb in Load the baseline data

    ---------------------------------------------------------------------------
    
    TypeError                                 Traceback (most recent call last)
    
    <ipython-input-4-fed9ae80a060> in <module>()
          2 
          3 get_ipython().system('gsutil -q -m cp -R gs://download-dopamine-rl/preprocessed-benchmarks/* /content/')
    ----> 4 experimental_data = colab_utils.load_baselines('/content')
    
    1 frames
    
    /usr/lib/python3.7/copyreg.py in _reconstructor(cls, base, state)
         41 def _reconstructor(cls, base, state):
         42     if base is object:
    ---> 43         obj = object.__new__(cls)
         44     else:
         45         obj = base.__new__(cls, state)
    
    TypeError: object.__new__(BlockManager) is not safe, use BlockManager.__new__()
    

    agent_visualizer.ipynb in Generate the video

    pygame 2.1.2 (SDL 2.0.16, Python 3.7.13)
    Hello from the pygame community. https://www.pygame.org/contribute.html
    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating:
    non-resource variables are not supported in the long term
    
    ---------------------------------------------------------------------------
    
    Exception                                 Traceback (most recent call last)
    
    <ipython-input-3-05aad0365c1e> in <module>()
          4 example_viz_lib.run(agent='rainbow', game='SpaceInvaders', num_steps=num_steps,
          5                     root_dir='/tmp/agent_viz', restore_ckpt='/tmp/tf_ckpt-199',
    ----> 6                     use_legacy_checkpoint=True)
    
    15 frames
    
    /usr/local/lib/python3.7/dist-packages/atari_py/games.py in get_game_path(game_name)
         18     path = os.path.join(_games_dir, game_name) + ".bin"
         19     if not os.path.exists(path):
    ---> 20         raise Exception('ROM is missing for %s, see https://github.com/openai/atari-py#roms for instructions' % (game_name,))
         21     return path
         22 
    
    Exception: ROM is missing for space_invaders, see https://github.com/openai/atari-py#roms for instructions
      In call to configurable 'create_atari_environment' (<function create_atari_environment at 0x7f4b9399da70>)
      In call to configurable 'Runner' (<class 'dopamine.discrete_domains.run_experiment.Runner'>)
    

    agent_visualizer.ipynb in Generate video

    WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/compat/v2_compat.py:107: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
    Instructions for updating:
    non-resource variables are not supported in the long term
    
    ---------------------------------------------------------------------------
    
    Exception                                 Traceback (most recent call last)
    
    <ipython-input-5-527301e7cc72> in <module>()
         12 gin.parse_config(config)
         13 runner = example_viz_lib.MyRunner(base_dir, '/tmp/ckpt.199',
    ---> 14                                   AGENT_CREATORS[agent])
         15 runner.visualize(base_dir / 'images', num_global_steps=200)
    
    13 frames
    
    /usr/local/lib/python3.7/dist-packages/atari_py/games.py in get_game_path(game_name)
         18     path = os.path.join(_games_dir, game_name) + ".bin"
         19     if not os.path.exists(path):
    ---> 20         raise Exception('ROM is missing for %s, see https://github.com/openai/atari-py#roms for instructions' % (game_name,))
         21     return path
         22 
    
    Exception: ROM is missing for breakout, see https://github.com/openai/atari-py#roms for instructions
      In call to configurable 'create_atari_environment' (<function create_atari_environment at 0x7f9a4a7ff200>)
      In call to configurable 'Runner' (<class 'dopamine.discrete_domains.run_experiment.Runner'>)
    

    Happily, tensorboard.ipynb works properly, and so does cartpole.ipynb.

    Overall, all of this makes for a very poor introduction to dopamine.

  • ImportError: cannot import name 'isin' from 'jax._src.numpy.lax_numpy'


    Following the getting started steps for installing dopamine from source on a python3.10 virtual environment, I got the following error:

    ImportError: cannot import name 'isin' from 'jax._src.numpy.lax_numpy'
    

    This Stack Overflow question has information on the same problem; in my environment, the installed flax version was 0.3.6, and updating it to the most recent version (0.4.2) solved this issue. However, I have not run further tests to check whether simply updating the flax version in the requirements is sufficient.

  • NoisyNets implementation issues

    I'm implementing my own RL framework in JAX to better understand RL algorithms, and I found your code very helpful.

    Looking at the NoisyNets implementation, on lines 316 and 317 (https://github.com/google/dopamine/blob/master/dopamine/jax/networks.py), the same rng_key is used each time noise is generated, meaning that no 'new' noise is generated each time an input is passed to the layer. In effect, I think the layer just applies a linear transform.

    This is a short testing example

    import jax
    import numpy as np
    
    from dopamine.jax.networks import NoisyNetwork
    
    if __name__ == '__main__':
        rng = jax.random.PRNGKey(1)
        rng, rng_net_def, rng_net_param = jax.random.split(rng, num=3)
    
        net_def = NoisyNetwork(rng_key=rng_net_def, eval_mode=False)
        net_params = net_def.init(rng_net_param, x=np.zeros(10), features=3)
    
        state = np.random.random(10)
        print(net_def.apply(net_params, x=state, features=3))
        print(net_def.apply(net_params, x=state, features=3))
    

    If this is an issue, then I implemented the following code for my framework

    from typing import Sequence
    
    import jax
    import numpy as onp
    import jax.numpy as jnp
    from flax import linen as nn
    
    class NoisyDense(nn.Module):
        features: int
    
        use_bias: bool = True
    
        @staticmethod
        @jax.jit
        def _f(x: jnp.ndarray) -> jnp.ndarray:
            # See (10) and (11) in Fortunato et al. (2018).
            return jnp.multiply(jnp.sign(x), jnp.power(jnp.abs(x), 0.5))
    
        @nn.compact
        def __call__(self, inputs: onp.ndarray, eval_mode: bool = True, rng: jnp.DeviceArray = None) -> jnp.ndarray:
            if eval_mode:  # Turn off noise during evaluation
                w_epsilon = jnp.zeros(shape=(inputs.shape[0], self.features), dtype=onp.float32)
                b_epsilon = jnp.zeros(shape=(self.features,), dtype=onp.float32)
            else:  # Factored gaussian noise in (10) and (11) in Fortunato et al. (2018).
                p_key, q_key = jax.random.split(rng)
                p, q = jax.random.normal(p_key, [inputs.shape[0], 1]), jax.random.normal(q_key, [1, self.features])
                f_p, f_q = self._f(p), self._f(q)
                # Factored noise is the outer product f(p) f(q)^T, see (10); the bias noise is f(q), see (11).
                w_epsilon, b_epsilon = f_p * f_q, jnp.squeeze(f_q)
    
            def _mu_init(key: jnp.DeviceArray, shape: Sequence[int]):
                # Initialization of mean noise parameters (Section 3.2)
                mean = 1 / jnp.power(inputs.shape[0], 0.5)
                return jax.random.uniform(key, minval=-mean, maxval=mean, shape=shape)
    
            def _sigma_init(_key: jnp.DeviceArray, shape: Sequence[int], dtype=jnp.float32):
                # Initialization of sigma noise parameters (Section 3.2)
                return jnp.ones(shape, dtype) * (0.1 / onp.sqrt(inputs.shape[0]))
    
            # See (8) and (9) in Fortunato et al. (2018) for output computation.
            w_mu = self.param('kernel_mu', _mu_init, (inputs.shape[0], self.features))
            w_sigma = self.param('kernel_sigma', _sigma_init, (inputs.shape[0], self.features))
            out = jnp.matmul(inputs, w_mu + jnp.multiply(w_sigma, w_epsilon))
    
            if self.use_bias:
                b_mu = self.param('bias_mu', _mu_init, (self.features,))
                b_sigma = self.param('bias_sigma', _sigma_init, (self.features,))
                out = out + b_mu + jnp.multiply(b_sigma, b_epsilon)
            return out
    

    Here is some similar testing code

    if __name__ == '__main__':
        rng = jax.random.PRNGKey(1)
        rng, rng_net_def, rng_net_param = jax.random.split(rng, num=3)
    
        net_def = NoisyDense(features=2)
        # NumPy is imported as `onp` in the module above.
        net_params = net_def.init(rng_net_param, onp.zeros(10))
    
        state = onp.random.random(10)
        print(net_def.apply(net_params, inputs=state))
        print(net_def.apply(net_params, inputs=state, eval_mode=False, rng=rng_net_def))
        print(net_def.apply(net_params, inputs=state, eval_mode=False, rng=rng))
    

    I would have submitted this as a pull request but noticed that you are not accepting merges

  • Dockerfile doesn't work anymore


    I had several issues running dopamine with the last docker image because of the last Jax and CUDA changes. Particularly with the cudnn versions.

    I made the following changes to the Dockerfile and managed to run the image successfully on a GPU:

    • Changed this line to ARG cuda_docker_tag="11.4.2-cudnn8-devel-ubuntu20.04" to update the Docker CUDA image to 11.4.2.

    • Changed this line to RUN pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html since the installation command changed.
