Reinforcement Learning Coach by Intel AI Lab enables easy experimentation with state-of-the-art Reinforcement Learning algorithms

Coach

Coach is a Python reinforcement learning framework containing implementations of many state-of-the-art algorithms.

It exposes a set of easy-to-use APIs for experimenting with new RL algorithms, and allows simple integration of new environments to solve. Basic RL components (algorithms, environments, neural network architectures, exploration policies, ...) are well decoupled, so that extending and reusing existing components is fairly painless.

Training an agent to solve an environment is as easy as running:

coach -p CartPole_DQN -r

Example environments (images): Fetch Slide, Pendulum, Starcraft, Doom Deathmatch, CARLA, MontezumaRevenge, Doom Health Gathering, PyBullet Minitaur, Gym Extensions Ant

Benchmarks

One of the main challenges when building a research project, or a solution based on a published algorithm, is getting a concrete and reliable baseline that reproduces the algorithm's results as reported by its authors. To address this problem, we are releasing a set of benchmarks that show Coach reliably reproduces the results of many state-of-the-art algorithms.

Installation

Note: Coach has only been tested on Ubuntu 16.04 LTS, and with Python 3.5.

For some information on installing on Ubuntu 17.10 with Python 3.6.3, please refer to the following issue: https://github.com/IntelLabs/coach/issues/54

Coach has a few prerequisites. The following commands set up everything needed to run Coach on top of OpenAI Gym environments:

# General
sudo -E apt-get install python3-pip cmake zlib1g-dev python3-tk python-opencv -y

# Boost libraries
sudo -E apt-get install libboost-all-dev -y

# Scipy requirements
sudo -E apt-get install libblas-dev liblapack-dev libatlas-base-dev gfortran -y

# PyGame
sudo -E apt-get install libsdl-dev libsdl-image1.2-dev libsdl-mixer1.2-dev libsdl-ttf2.0-dev \
libsmpeg-dev libportmidi-dev libavformat-dev libswscale-dev -y

# Dashboard
sudo -E apt-get install dpkg-dev build-essential python3.5-dev libjpeg-dev libtiff-dev libsdl1.2-dev libnotify-dev \
freeglut3 freeglut3-dev libsm-dev libgtk2.0-dev libgtk-3-dev libwebkitgtk-dev libwebkitgtk-3.0-dev \
libgstreamer-plugins-base1.0-dev -y

# Gym
sudo -E apt-get install libav-tools libsdl2-dev swig cmake -y

We recommend installing coach in a virtualenv:

sudo -E pip3 install virtualenv
virtualenv -p python3 coach_env
. coach_env/bin/activate

Finally, install coach using pip:

pip3 install rl_coach

Or alternatively, for a development environment, install coach from the cloned repository:

cd coach
pip3 install -e .

If a GPU is present, Coach's pip package will install tensorflow-gpu by default. If a GPU is not present, an Intel-optimized TensorFlow will be installed.

In addition to OpenAI Gym, several other environments were tested and are supported. Please follow the instructions in the Supported Environments section below in order to install more environments.

Getting Started

Tutorials and Documentation

Jupyter notebooks demonstrating how to run Coach from command line or as a library, implement an algorithm, or integrate an environment.

Framework documentation, algorithm description and instructions on how to contribute a new agent/environment.

Basic Usage

Running Coach

To allow reproducing results in Coach, we defined a mechanism called preset. There are several available presets under the presets directory. To list all the available presets use the -l flag.

To run a preset, use:

coach -r -p <preset_name>

For example:

  • CartPole environment using Policy Gradients (PG):

    coach -r -p CartPole_PG
  • Basic level of Doom using Dueling network and Double DQN (DDQN) algorithm:

    coach -r -p Doom_Basic_Dueling_DDQN

Some presets apply to a group of environment levels, such as the entire Atari or Mujoco suites. To use these presets, the requested level should be defined using the -lvl flag.

For example:

  • Pong using the Neural Episodic Control (NEC) algorithm:

    coach -r -p Atari_NEC -lvl pong

There are several types of agents that can benefit from running them in a distributed fashion with multiple workers in parallel. Each worker interacts with its own copy of the environment but updates a shared network, which improves the data collection speed and the stability of the learning process. To specify the number of workers to run, use the -n flag.

For example:

  • Breakout using Asynchronous Advantage Actor-Critic (A3C) with 8 workers:

    coach -r -p Atari_A3C -lvl breakout -n 8

It is easy to create new presets for different levels or environments by following the same pattern as in presets.py.

More usage examples can be found here.
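
When several runs are launched from a script, the flags described above (-r, -p, -lvl, -n) can be assembled programmatically. The following is a minimal sketch and not part of Coach: the helper name build_coach_command is hypothetical, and it only builds the argument list. Actually running the command still requires the coach executable to be installed.

```python
def build_coach_command(preset, level=None, num_workers=None, render=True):
    """Build the argument list for a `coach` run (hypothetical helper).

    Only flags mentioned in this README are used:
    -r, -p (preset), -lvl (level) and -n (number of workers).
    """
    command = ["coach"]
    if render:
        command.append("-r")
    command += ["-p", preset]
    if level is not None:
        command += ["-lvl", level]
    if num_workers is not None:
        if num_workers < 1:
            raise ValueError("num_workers must be at least 1")
        command += ["-n", str(num_workers)]
    return command


# Rebuilds the A3C example above as a single command-line string.
print(" ".join(build_coach_command("Atari_A3C", level="breakout", num_workers=8)))
```

The returned list can be passed as-is to subprocess.run, which avoids shell quoting issues.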

Running Coach Dashboard (Visualization)

Training an agent to solve an environment can be tricky at times.

To help debug the training process, Coach outputs several signals per trained algorithm that track algorithmic performance.

While Coach trains an agent, a csv file containing the relevant training signals will be saved to the 'experiments' directory. Coach's dashboard can then be used to dynamically visualize the training signals, and track algorithmic behavior.

To use it, run:

dashboard
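
The csv files can also be inspected without the dashboard. The sketch below is an illustration only: it assumes the signals files live somewhere under an experiments directory, as described above, and it makes no assumption about which columns Coach writes. The function names are hypothetical.

```python
import csv
from pathlib import Path


def latest_signals_csv(experiments_dir):
    """Return the most recently modified .csv file under experiments_dir, or None."""
    root = Path(experiments_dir)
    if not root.is_dir():
        return None
    csv_files = list(root.rglob("*.csv"))
    if not csv_files:
        return None
    return max(csv_files, key=lambda path: path.stat().st_mtime)


def read_signal_names(csv_path):
    """Return the column names found in the header row of a signals file."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])
```

Calling read_signal_names(latest_signals_csv("experiments")) after a run has started lists the signals available for plotting.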

Distributed Multi-Node Coach

As of release 0.11.0, Coach supports horizontal scaling for training RL agents on multiple nodes. In release 0.11.0 this was tested on the ClippedPPO and DQN agents. For usage instructions please refer to the documentation here.

Batch Reinforcement Learning

Training and evaluating an agent from a dataset of experience, where no simulator is available, is supported in Coach. There are example presets and a tutorial.

Supported Environments

  • OpenAI Gym:

    Installed by default by Coach's installer

  • ViZDoom:

    Follow the instructions described in the ViZDoom repository -

    https://github.com/mwydmuch/ViZDoom

    Additionally, Coach assumes that the environment variable VIZDOOM_ROOT points to the ViZDoom installation directory.

  • Roboschool:

    Follow the instructions described in the roboschool repository -

    https://github.com/openai/roboschool

  • GymExtensions:

    Follow the instructions described in the GymExtensions repository -

    https://github.com/Breakend/gym-extensions

    Additionally, add the installation directory to the PYTHONPATH environment variable.

  • PyBullet:

    Follow the instructions described in the Quick Start Guide (basically just - 'pip install pybullet')

  • CARLA:

    Download release 0.8.4 from the CARLA repository -

    https://github.com/carla-simulator/carla/releases

    Install the python client and dependencies from the release tarball:

    pip3 install -r PythonClient/requirements.txt
    pip3 install PythonClient
    

    Create a new CARLA_ROOT environment variable pointing to CARLA's installation directory.

    A simple CARLA settings file (CarlaSettings.ini) is supplied with Coach, and is located in the environments directory.

  • Starcraft:

    Follow the instructions described in the PySC2 repository -

    https://github.com/deepmind/pysc2

  • DeepMind Control Suite:

    Follow the instructions described in the DeepMind Control Suite repository -

    https://github.com/deepmind/dm_control
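
Several of the environments above rely on environment variables (VIZDOOM_ROOT and CARLA_ROOT) pointing at installation directories. A quick pre-flight check can catch a missing or mistyped path before a run starts. This is a minimal sketch, not part of Coach, and the names REQUIRED_ROOTS and missing_environment_roots are hypothetical.

```python
import os

# Variables named in the list above; each should point to an installation directory.
REQUIRED_ROOTS = {"ViZDoom": "VIZDOOM_ROOT", "CARLA": "CARLA_ROOT"}


def missing_environment_roots(environ=None):
    """Return {environment name: variable name} for variables that are unset
    or that do not point to an existing directory."""
    environ = os.environ if environ is None else environ
    missing = {}
    for env_name, variable in REQUIRED_ROOTS.items():
        path = environ.get(variable)
        if not path or not os.path.isdir(path):
            missing[env_name] = variable
    return missing
```

An empty result means both variables are set and point to directories. It does not verify that the installations themselves are complete.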

Supported Algorithms

Value Optimization Agents

Policy Optimization Agents

General Agents

Imitation Learning Agents

Hierarchical Reinforcement Learning Agents

Memory Types

Exploration Techniques

Citation

If you used Coach for your work, please use the following citation:

@misc{caspi_itai_2017_1134899,
  author       = {Caspi, Itai and
                  Leibovich, Gal and
                  Novik, Gal and
                  Endrawis, Shadi},
  title        = {Reinforcement Learning Coach},
  month        = dec,
  year         = 2017,
  doi          = {10.5281/zenodo.1134899},
  url          = {https://doi.org/10.5281/zenodo.1134899}
}

Contact

We'd be happy to get any questions or contributions through GitHub issues and PRs.

Please make sure to take a look here before filing an issue or proposing a PR.

The Coach development team can also be contacted over email.

Disclaimer

Coach is released as reference code for research purposes. It is not an official Intel product, and the level of quality and support may not be as expected from an official product. Additional algorithms and environments are planned to be added to the framework. Feedback and contributions from the open source and RL research communities are more than welcome.

Comments
  • invalid object?

    I just tested running coach from source: 4fe9cba44508f258fc73286d6cbf0af4b1fdfa50

    on both Ubuntu 14.04 and macOS High Sierra, and I get the exact same error:

    python coach.py -p CartPole_DQN -r

    /home/jtoy/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
      return f(*args, **kwds)
    Warning: failed to import the following packages - RoboSchool, GymExtensions, ViZDoom, CARLA, Neon
    Please enter an experiment name: test
    Using tensorflow framework
    Traceback (most recent call last):
      File "coach.py", line 275, in <module>
        env_instance = create_environment(tuning_parameters)
      File "/home/jtoy/sandbox/touchnet/related_projects/coach/environments/__init__.py", line 32, in create_environment
        env = eval(env_type)(tuning_parameters)
      File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/eval.py", line 267, in eval
        ret = eng_inst.evaluate()
      File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 75, in evaluate
        res = self._evaluate()
      File "/home/jtoy/anaconda3/lib/python3.6/site-packages/pandas/core/computation/engines.py", line 122, in _evaluate
        return ne.evaluate(s, local_dict=scope, truediv=truediv)
      File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 807, in evaluate
        zip(names, arguments)]
      File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 806, in <listcomp>
        signature = [(name, getType(arg)) for (name, arg) in
      File "/home/jtoy/anaconda3/lib/python3.6/site-packages/numexpr/necompiler.py", line 704, in getType
        raise ValueError("unknown type %s" % a.dtype.name)
    ValueError: unknown type object
    

    I installed all dependencies with pip install -r requirements_coach.txt

    Is there an issue in master or am I missing something basic?
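
The traceback above shows the call eval(env_type) in environments/__init__.py entering pandas/core/computation/eval.py. That suggests the name eval in that module was bound to the pandas function rather than Python's built-in. The report does not show how this happened; a wildcard import is one common cause. The sketch below reproduces this kind of shadowing with stand-in names and shows an explicit lookup table that does not depend on eval at all. library_eval, DummyEnvironment and ENVIRONMENT_REGISTRY are hypothetical and are not Coach code.

```python
def library_eval(expression):
    """Stand-in for a third-party `eval` that cannot resolve class names."""
    raise ValueError("unknown type object")


class DummyEnvironment:
    """Hypothetical environment wrapper used only for this illustration."""

    def __init__(self, parameters):
        self.parameters = parameters


# Simulates a module namespace in which a library function replaced the built-in name.
eval = library_eval


def create_environment_with_eval(env_type, parameters):
    # `eval` resolves to library_eval here, so the class lookup fails.
    return eval(env_type)(parameters)


# Explicit mapping from type name to class: independent of what `eval` is bound to.
ENVIRONMENT_REGISTRY = {"DummyEnvironment": DummyEnvironment}


def create_environment_with_registry(env_type, parameters):
    return ENVIRONMENT_REGISTRY[env_type](parameters)
```

With the registry, an unknown type name fails with a KeyError naming the missing entry, which is easier to diagnose than an error raised deep inside an unrelated library.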

  • The network's configuration of CARLA_DDPG

    Hi, I tried to run the CARLA_DDPG preset in Coach and, with the help of the documentation, ran it successfully. However, I want to dig into the implementation of DDPG in Coach. I have reviewed the code of CARLA_DDPG.py and worked out the networks of both the actor and the critic, as in the picture below. Could someone check my understanding and give me some additional advice?

  • Problems with PPO/ClippedPPO

    Hey Guys,

    I'm having trouble with the likelihood ratio and nan/inf. I'm using entropy regularization for exploration, and the entropy is getting quite low, so I think that at some point the distributions can't be compared anymore. My model learns up to a certain point and then the nans happen. Adding a small epsilon to the ratio avoids the nans, but then the reward curve just drops at some point and the model stops learning. (The KL divergence also diverges.) I'm using my own environment and a feed-forward architecture, and have a continuous problem.

    I've already tried many things:

    • Optimizers: Adam(with different epsilons), RMSProp
    • Reducing the LR (that just postpones the crash)
    • Reducing the clipping (0.1) and the number of epochs, clipping the gradients
    • Changing the coefficients for the value loss, policy loss and entropy
    • Changing the weight initializers and the network sizes (a bigger network postpones the problem)
    • Changed the activation function (relu, lrelu, selu, tanh)

    If I change the beta coefficient for the entropy, I get either an ever-increasing entropy or it falls until the crash happens. The agent learns pretty well until that point, so I suppose I haven't made any error in my implementation. I may have made an error in the amount by which I changed the parameters.

    Any tips or ideas on that?
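
One commonly used guard against nan/inf in the likelihood ratio is to form it from log-probabilities and bound the log-ratio before exponentiating, rather than dividing two probabilities that may underflow. The following is a minimal sketch of the clipped surrogate term for a single sample under that assumption. It is not Coach's ClippedPPO implementation, and the function name and the max_log_ratio bound are illustrative choices.

```python
import math


def clipped_surrogate(log_prob_new, log_prob_old, advantage,
                      clip_epsilon=0.1, max_log_ratio=20.0):
    """PPO clipped surrogate objective for one sample, from log-probabilities."""
    # Bound the log-ratio so exp() cannot overflow when the policies drift far apart.
    log_ratio = log_prob_new - log_prob_old
    log_ratio = max(-max_log_ratio, min(max_log_ratio, log_ratio))
    ratio = math.exp(log_ratio)
    clipped_ratio = max(1.0 - clip_epsilon, min(1.0 + clip_epsilon, ratio))
    # Pessimistic bound: the smaller of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped_ratio * advantage)
```

This keeps the objective finite, but it does not by itself address a collapsing entropy, which would still need to be handled through the entropy coefficient or the policy parameterization.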

  • Further improvement of using trained agents in production

    Hey all,

    After using rl_coach for some days now, I have trained some models that seem promising. Related directly to issue #71, I have tried to do this with TensorFlow. As suggested in the referenced issue, TF Serving can be used to accomplish it. However, on my side I don't need to go online (and I think that many users won't need it either), so something like:

    Loading graph > Loading weights/parameters > Performing the operation (i.e. some kind of .act() or just .run() the op in a tf.Session) would be sufficient.

    I have tried this path using the different checkpoints saved. Let me share some code:

        import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train.import_meta_graph)

        # metaGraphPath (the saved .meta file) and ckptFilePath (the checkpoint prefix)
        # are assumed to be defined by the caller.

        ### Create the session in which we will run.
        tensorflowSess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
    
        ############# LOADING #############
        ### First, load the meta graph.
        ### NOTE: Restarting training from a saved meta graph only works if the device
        ### assignments have not changed, hence allow_soft_placement=True.
        restorerObject = tf.train.import_meta_graph(metaGraphPath)
    
        ### Then, restore the weights (parameters) of that graph:
        restorerObject.restore(tensorflowSess, ckptFilePath)
    
        ### Finally, get the operations we want to run and create the feed_dict:
        restoredGraph = tf.get_default_graph()
        
        # To return a value we need the tensor of the operation (name:0, name:1, ...),
        # because the tensor holds the returned value, not the operation obtained
        # with get_operation_by_name. Furthermore, fetching the last operation of
        # the graph seems to evaluate the graph all the way back to its inputs.
    
        feedingXObservation = restoredGraph.get_tensor_by_name('main_level/agent/main/online/Placeholder:0')
    

    The problem here is that we need to know 1) the name of the tensor that feeds the data into the NN architecture used in each agent, and 2) the last operation that outputs the action values (or their probabilities, in the case of the Rainbow algorithm for example), so that we can feed a new observation and run inference.

    Setting print_networks_summary=True in the VisualizationParameters gives some hint about what to look for. However, it is not clear how to go about this. For example, say that, like most of us, we want to get the first placeholder to feed the observation and the tensor of the last operation. For a Rainbow agent, one of the most complex, we have the following architecture:

    Network: main, Copies: 2 (online network | target network)
    ----------------------------------------------------------
    Input Embedder: observation
            Input size = [163]
            Noisy Dense (num outputs = 256)
            Activation (type = <function relu at 0x7f90a65608c8>)
    Middleware:
            No layers
    Output Head: rainbow_q_values_head
            State Value Stream - V
                    Dense (num outputs = 512)
                    Dense (num outputs = 51)
            Action Advantage Stream - A
                    Dense (num outputs = 512)
                    Dense (num outputs = 153)
                    Reshape (new size = 3 x 51)
                    Subtract(A, Mean(A))
            Add (V, A)
            Softmax
    

    Let's look for the first placeholder and for the last softmax layer.

    print([n.name for n in restoredGraph.as_graph_def().node if 'Softmax' in n.op]) gives:

    ['main_level/agent/main/online/network_0/rainbow_q_values_head_0/Softmax', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_1', 'main_level/agent/main/online/gradients/main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/Softmax', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_1', 'main_level/agent/main/target/gradients/main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax']

    and looking for the first placeholder like:

    print([n.name for n in restoredGraph.as_graph_def().node if 'Placeholder' in n.op]) gives:

    ['main_level/agent/main/online/Placeholder', 'main_level/agent/main/online/network_0/observation/observation', 'main_level/agent/main/online/network_0/gradients_from_head_0-0_rescalers_1', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/distributions', 'main_level/agent/main/online/network_0/rainbow_q_values_head_0/rainbow_q_values_head_0_importance_weight', 'main_level/agent/main/online/0_holder', 'main_level/agent/main/online/1_holder', 'main_level/agent/main/online/2_holder', 'main_level/agent/main/online/3_holder', 'main_level/agent/main/online/4_holder', 'main_level/agent/main/online/5_holder', 'main_level/agent/main/online/6_holder', 'main_level/agent/main/online/7_holder', 'main_level/agent/main/online/8_holder', 'main_level/agent/main/online/9_holder', 'main_level/agent/main/online/10_holder', 'main_level/agent/main/online/11_holder', 'main_level/agent/main/online/12_holder', 'main_level/agent/main/online/13_holder', 'main_level/agent/main/online/14_holder', 'main_level/agent/main/online/15_holder', 'main_level/agent/main/online/16_holder', 'main_level/agent/main/online/17_holder', 'main_level/agent/main/online/18_holder', 'main_level/agent/main/online/19_holder', 'main_level/agent/main/online/20_holder', 'main_level/agent/main/online/output_gradient_weights', 'main_level/agent/main/target/Placeholder', 'main_level/agent/main/target/network_0/observation/observation', 'main_level/agent/main/target/network_0/gradients_from_head_0-0_rescalers_1', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/distributions', 'main_level/agent/main/target/network_0/rainbow_q_values_head_0/rainbow_q_values_head_0_importance_weight', 'main_level/agent/main/target/0_holder', 'main_level/agent/main/target/1_holder', 'main_level/agent/main/target/2_holder', 'main_level/agent/main/target/3_holder', 'main_level/agent/main/target/4_holder', 'main_level/agent/main/target/5_holder', 'main_level/agent/main/target/6_holder', 
'main_level/agent/main/target/7_holder', 'main_level/agent/main/target/8_holder', 'main_level/agent/main/target/9_holder', 'main_level/agent/main/target/10_holder', 'main_level/agent/main/target/11_holder', 'main_level/agent/main/target/12_holder', 'main_level/agent/main/target/13_holder', 'main_level/agent/main/target/14_holder', 'main_level/agent/main/target/15_holder', 'main_level/agent/main/target/16_holder', 'main_level/agent/main/target/17_holder', 'main_level/agent/main/target/18_holder', 'main_level/agent/main/target/19_holder', 'main_level/agent/main/target/20_holder', 'main_level/agent/main/target/output_gradient_weights', 'Placeholder', 'Placeholder_1', 'Placeholder_2', 'Placeholder_3', 'Placeholder_4', 'Placeholder_5', 'Placeholder_6', 'Placeholder_7', 'Placeholder_8', 'Placeholder_9', 'Placeholder_10', 'Placeholder_11', 'Placeholder_12', 'Placeholder_13', 'Placeholder_14', 'Placeholder_15', 'Placeholder_16', 'Placeholder_17', 'Placeholder_18', 'Placeholder_19', 'Placeholder_20', 'Placeholder_21', 'Placeholder_22', 'Placeholder_23', 'Placeholder_24', 'Placeholder_25', 'Placeholder_26', 'Placeholder_27', 'Placeholder_28', 'Placeholder_29', 'Placeholder_30', 'Placeholder_31', 'Placeholder_32', 'Placeholder_33', 'Placeholder_34', 'Placeholder_35', 'Placeholder_36', 'Placeholder_37', 'Placeholder_38', 'Placeholder_39', 'Placeholder_40', 'Placeholder_41', 'Placeholder_42', 'Placeholder_43', 'Placeholder_44', 'Placeholder_45', 'Placeholder_46', 'Placeholder_47', 'Placeholder_48', 'Placeholder_49', 'Placeholder_50', 'Placeholder_51', 'Placeholder_52', 'Placeholder_53', 'Placeholder_54', 'Placeholder_55', 'Placeholder_56', 'Placeholder_57', 'Placeholder_58', 'Placeholder_59', 'Placeholder_60', 'Placeholder_61', 'Placeholder_62', 'Placeholder_63', 'Placeholder_64', 'Placeholder_65', 'save/filename', 'save/Const']

    So, my intuition is to pick Placeholder:0 and the 'main_level/agent/main/target/gradients/main_level/agent/main/target/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax:0' tensor using get_tensor_by_name, but I'm not sure how to interpret all that information or how to be certain.
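
One heuristic for narrowing down the candidates is to keep only nodes that belong to the online network and to drop everything under a gradients scope, since those nodes exist for training rather than inference. The sketch below applies that filter to a few of the node names printed above. The thread does not confirm this choice, and pick_inference_tensor_names is a hypothetical helper, not part of rl_coach or TensorFlow.

```python
def pick_inference_tensor_names(node_names, op_suffix):
    """Keep online-network nodes whose last path component equals op_suffix,
    skipping anything created for gradient computation."""
    selected = []
    for name in node_names:
        if "/online/" not in name or "/gradients/" in name:
            continue
        if name.split("/")[-1] == op_suffix:
            selected.append(name + ":0")  # ":0" selects the op's first output tensor
    return selected


# A subset of the Softmax-related node names listed above.
softmax_nodes = [
    "main_level/agent/main/online/network_0/rainbow_q_values_head_0/Softmax",
    "main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg",
    "main_level/agent/main/online/gradients/main_level/agent/main/online/network_0/rainbow_q_values_head_0/softmax_cross_entropy_with_logits_sg_grad/LogSoftmax",
    "main_level/agent/main/target/network_0/rainbow_q_values_head_0/Softmax",
]
print(pick_inference_tensor_names(softmax_nodes, "Softmax"))
# prints ['main_level/agent/main/online/network_0/rainbow_q_values_head_0/Softmax:0']
```

The same filter, applied to the placeholder list with op_suffix="observation", selects 'main_level/agent/main/online/network_0/observation/observation:0' as a candidate input. Whether these are the right tensors for a given agent still needs to be verified against the network summary.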

    I think this feature is crucial for the framework to cover the full creation and development cycle. It could be developed further in a PR, or at least handled with TF rather than directly in rl_coach (my idea would be to just give explicit names to the tensors that are needed to make this happen, i.e. the first one and the final one).

    Any thoughts on this? @gal-leibovich @galnov and others: I can try to help make it happen on my side, but I don't know your ideas regarding this important core part of Coach.

    If there is another way to do it (I'm aware that it could be done by loading the whole Coach framework), something like:

    ### Create all the graph and then restore_checkpoint().
    ### Get the observation...
    
    action_info = coach.graph_manager.get_agent().choose_action(observation)
    print("State:{}, Action:{}".format(observation,action_info.action))
    

    If that is possible and "the way" to go, it would be awesome to create a mini-tutorial on how to load a pretrained model after training has exited.

  • Direct Future Prediction

    I get a strange error when I try to run DFP on a custom environment (with a Discrete action space).

    AttributeError                            Traceback (most recent call last)
    <ipython-input-35-2055451d4b4a> in <module>()
         22     agent_params=agent_params,
         23     env_params=env_params,
    ---> 24     schedule_params=SimpleSchedule()
         25 )
    
    ~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/rl_coach/graph_managers/basic_rl_graph_manager.py in __init__(self, agent_params, env_params, schedule_params, vis_params, preset_validation_params)
         39 
         40         self.agent_params.visualization = vis_params
    ---> 41         if self.agent_params.input_filter is None:
         42             self.agent_params.input_filter = env_params.default_input_filter()
         43         if self.agent_params.output_filter is None:
    
    AttributeError: 'DFPAlgorithmParameters' object has no attribute 'input_filter'
    

    The invocation is as follows:

    # define the environment parameters
    bit_length = 10
    env_params = GymVectorEnvironment(level='./custom.py')
    env_params.additional_simulator_parameters = { 'num_states': 100}
    
    agent_params = DFPAlgorithmParameters()
    
    graph_manager = BasicRLGraphManager(
        agent_params=agent_params,
        env_params=env_params,
        schedule_params=SimpleSchedule()
    )
    
  • Now able to use and create custom tensorflow heads, embedders, and middleware.

    Ref #134

    I modified the following classes:

    • HeadParameters
    • MiddlewareParameters
    • InputEmbedderParameters

    Adding a path property (or function as I mention in challenges).

    Then I modified:

    • GeneralTensorFlowNetwork.get_input_embedder
    • GeneralTensorFlowNetwork.get_middleware
    • GeneralTensorFlowNetwork.get_output_head

    To use these paths instead of their own local ones.

    I moved a local dictionary inside GeneralTensorFlowNetwork.get_input_embedder called mod_names to embedder_parameters.MOD_NAMES so that it's more accessible.

    Challenges

    • InputEmbedderParameters.path cannot be a property like the rest. You can call it with emb_type and the path will be created, but that's different from how most paths are made.

    Pytest

    I ran pytest locally and do not see any dramatic changes in the number of passing tests.

  • Installation using pip failed

    After running command pip3 install rl_coach, I got the following error message:

    Collecting rl_coach
      Downloading https://files.pythonhosted.org/packages/95/c9/3e92accfc8f967cda8fd37632ec7ec0a4b5ba71e5a8a4a6df2390adba625/rl-coach-0.10.0.4.tar.gz (223kB)
        100% |████████████████████████████████| 225kB 258kB/s
        Complete output from command python setup.py egg_info:
        /bin/sh: 1: pip: not found
        Traceback (most recent call last):
          File "<string>", line 1, in <module>
          File "/tmp/pip-build-kocjns51/rl-coach/setup.py", line 63, in <module>
            shell=True)
          File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
            raise CalledProcessError(retcode, cmd)
        subprocess.CalledProcessError: Command '['pip install https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp35-cp35m-linux_x86_64.whl']' returned non-zero exit status 127
    
        ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-kocjns51/rl-coach/
    
    

    Has anybody encountered the same problem?
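
The line "/bin/sh: 1: pip: not found" and the exit status 127 in the log indicate that setup.py ran a command named pip through a shell that could not find it, presumably because only pip3 was on the PATH. A common way to avoid depending on the executable name is to invoke pip as a module of the interpreter that is already running. The sketch below shows that pattern. It is an illustration rather than the project's setup.py, and the helper names are hypothetical.

```python
import subprocess
import sys


def pip_install_command(requirement):
    """Return a pip command bound to the interpreter that is currently running."""
    return [sys.executable, "-m", "pip", "install", requirement]


def pip_install(requirement):
    """Install a requirement; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(requirement))
```

Passing the command as a list without shell=True also sidesteps the quoting problem visible in the log, where the whole command string was wrapped in a one-element list.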

  • some wonderful algorithm

    https://github.com/pathak22/noreward-rl https://pathak22.github.io/noreward-rl/ ICM algorithm? (Curiosity-driven Exploration for Deep Reinforcement Learning)

  • Cannot import minio.error ResponseError

    Hi experts, I was following the tutorials and hit this error when running one. Must I have minio working to use Coach? How do I solve this? Is it only for visualization? What lines could I remove to make it work?

    Environment:

    • Ubuntu 18.04
    • minio==7.0.2
    • rl-coach==1.0.1
    Traceback (most recent call last):
      File "batch_rl.py", line 13, in <module>
        from rl_coach.agents.ddqn_bcq_agent import DDQNBCQAgentParameters, KNNParameters
      File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/agents/ddqn_bcq_agent.py", line 25, in <module>
        from rl_coach.graph_managers.batch_rl_graph_manager import BatchRLGraphManager
      File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/graph_managers/batch_rl_graph_manager.py", line 26, in <module>
        from rl_coach.graph_managers.graph_manager import ScheduleParameters
      File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/graph_managers/graph_manager.py", line 35, in <module>
        from rl_coach.data_stores.data_store_impl import get_data_store as data_store_creator
      File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/data_stores/data_store_impl.py", line 19, in <module>
        from rl_coach.data_stores.s3_data_store import S3DataStore, S3DataStoreParameters
      File "/home/hsinyu/hyliu_Python/lib/python3.6/site-packages/rl_coach/data_stores/s3_data_store.py", line 21, in <module>
        from minio.error import ResponseError
    ImportError: cannot import name 'ResponseError'
    
  • The reward function in carla_environment.py

    Hi, I am currently working on my graduation project in CARLA. I have noticed that the reward function for CARLA in Coach is totally different from the formula introduced in "CARLA: An Open Urban Driving Simulator". In the implementation of carla_environment.py, the reward is calculated this way:

    self.reward = speed_reward - (measurements.player_measurements.intersection_otherlane * 5) - (measurements.player_measurements.intersection_offroad * 5) - is_collision * 100 - np.abs(self.control.steer) * 10

    Honestly, I have trained my agent with the reward formula from the CARLA paper. It seems to need many episodes before it performs well, and sometimes it doesn't converge at all, although I used a similar network with the DDPG algorithm. Could you explain why you chose this reward formula? I would really appreciate it. @galnov @galleibo-intel @shadiendrawis @itaicaspi
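
To see how the terms of the reward expression quoted above trade off against each other, it can be written as a standalone function with each measurement passed in explicitly. This is a sketch for experimentation only: how speed_reward is computed is not shown in the thread, so it is taken here as an input, and the function name is hypothetical.

```python
def carla_reward(speed_reward, intersection_otherlane, intersection_offroad,
                 is_collision, steer):
    """Reward expression quoted above, with every measurement passed explicitly."""
    return (speed_reward
            - intersection_otherlane * 5
            - intersection_offroad * 5
            - is_collision * 100
            - abs(steer) * 10)


# Half in the other lane, slightly off-road, mild steering, no collision.
print(carla_reward(10.0, 0.5, 0.2, False, -0.1))  # prints 5.5
```

With these weights, a single collision (a penalty of 100) outweighs the other penalty terms combined whenever the two intersection measurements and the steering magnitude each stay within [0, 1].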

  • Changes to avoid memory leak in Rollout worker

    Currently in the rollout worker, we call restore_checkpoint repeatedly to load the latest model into memory. The restore_checkpoint function calls checkpoint_saver. The checkpoint saver uses GlobalVariablesSaver, which does not release the references to the previous model's variables. As a result, memory keeps growing until the rollout worker crashes.

    This change avoids using the checkpoint saver in the rollout worker, as I believe it is not needed in this code path.

    Also added a test to easily reproduce the issue using the CartPole example. We were also seeing this issue with the AWS DeepRacer implementation, and the current implementation avoids the memory leak there as well.

  • ImportError: cannot import name 'ResponseError' from 'minio.error' when rl-coach installed with pip

    I saw this issue was closed earlier, but I am still receiving it with the version coming from pip.

    ImportError: cannot import name 'ResponseError' from 'minio.error'

    It can be solved manually after installation by replacing all occurrences of "ResponseError" with "InvalidResponseError" in /rl_coach/data_stores/s3_data_store.py
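
Instead of editing the installed file for one specific minio version, the name can be resolved at import time so that either spelling works. The helper below is a hypothetical sketch of that idea and is not part of rl_coach. It relies only on the two names mentioned in this thread, and it does not check that the two exception classes behave identically.

```python
import importlib


def first_available_attribute(module_name, candidate_names):
    """Return the first attribute in candidate_names that module_name provides."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        "none of {} found in {}".format(", ".join(candidate_names), module_name)
    )


# Intended use in s3_data_store.py (requires minio to be installed):
# ResponseError = first_available_attribute(
#     "minio.error", ["ResponseError", "InvalidResponseError"])
```

Pinning minio to a release that still exports ResponseError would be an alternative, at the cost of staying on an older client.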

  • ERROR: No matching distribution found for tensorflow-gpu==1.9.0

    Collecting joblib>=0.17.0
      Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
    ERROR: Could not find a version that satisfies the requirement tensorflow-gpu<=1.14.0,>=1.9.0 (from rl-coach) (from versions: 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.5.0, 2.5.1, 2.5.2, 2.5.3, 2.6.0, 2.6.1, 2.6.2, 2.6.3, 2.7.0rc0, 2.7.0rc1, 2.7.0, 2.7.1, 2.8.0rc0, 2.8.0rc1, 2.8.0)
    ERROR: No matching distribution found for tensorflow-gpu<=1.14.0,>=1.9.0
    

    It seems the joblib library is deprecated and an old version of TensorFlow is being used.
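
The log above states the failing requirement directly: rl-coach asks for tensorflow-gpu between 1.9.0 and 1.14.0, while the versions pip could find start at 2.2.0. The sketch below checks that constraint against a few of the listed versions. It is a simplified comparison on numeric components only and ignores pre-release ordering, so it is not a replacement for pip's resolver; the function names are hypothetical.

```python
import re


def parse_version(text):
    """Return the leading numeric (major, minor, patch) of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognized version: " + text)
    return tuple(int(part) for part in match.groups())


def satisfying_versions(available, lower, upper):
    """Versions v with lower <= v <= upper, compared on numeric components only."""
    low, high = parse_version(lower), parse_version(upper)
    return [v for v in available if low <= parse_version(v) <= high]


# A few of the versions pip listed in the log above.
available = ["2.2.0", "2.5.3", "2.7.0rc0", "2.8.0"]
print(satisfying_versions(available, "1.9.0", "1.14.0"))  # prints []
```

An empty result means no installable release satisfies the pin in this environment, which matches the "No matching distribution found" error.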

  • Categorical DQN - dimension error

    Hi,

    I don't post issues very often, so I hope my problem is clear enough the way I present it below. When trying to train a Categorical DQN (for Batch RL, no interaction with environment), I run into the following error:

    Traceback (most recent call last):
      File "", line 129, in <module>
        graph_manager.improve()
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\batch_rl_graph_manager.py", line 234, in improve
        self.train()
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train
        [manager.train() for manager in self.level_managers]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in <listcomp>
        [manager.train() for manager in self.level_managers]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in train
        [agent.train() for agent in self.agents.values()]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in <listcomp>
        [agent.train() for agent in self.agents.values()]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\agent.py", line 741, in train
        total_loss, losses, unclipped_grads = self.learn_from_batch(batch)
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 113, in learn_from_batch
        self.q_values.add_sample(self.distribution_prediction_to_q_values(TD_targets))
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 82, in distribution_prediction_to_q_values
        return np.dot(prediction, self.z_values)
      File "<array_function internals>", line 6, in dot
    ValueError: shapes (128,2) and (51,) not aligned: 2 (dim 1) != 51 (dim 0)

    The 2 (dim 1) is the number of actions in my ActionSpace, and the 51 (dim 0) corresponds to the number of atoms set in the agent's parameters. So the error suggests that these should be of equal length, which seems strange to me. Is this indeed true? Should these be of the same length? When I set the number of atoms to 2 (to get rid of this error), I got the following error:

    Traceback (most recent call last):
      File "", line 129, in <module>
        graph_manager.improve()
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\batch_rl_graph_manager.py", line 234, in improve
        self.train()
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in train
        [manager.train() for manager in self.level_managers]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\graph_managers\graph_manager.py", line 408, in <listcomp>
        [manager.train() for manager in self.level_managers]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in train
        [agent.train() for agent in self.agents.values()]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\level_manager.py", line 187, in <listcomp>
        [agent.train() for agent in self.agents.values()]
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\agent.py", line 741, in train
        total_loss, losses, unclipped_grads = self.learn_from_batch(batch)
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\rl_coach\agents\categorical_dqn_agent.py", line 116, in learn_from_batch
        target_actions = np.argmax(self.distribution_prediction_to_q_values(distributional_q_st_plus_1), axis=1)
      File "<__array_function__ internals>", line 6, in argmax
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\numpy\core\fromnumeric.py", line 1188, in argmax
        return _wrapfunc(a, 'argmax', axis=axis, out=out)
      File "C:\Users\colin.conda\envs\py36\lib\site-packages\numpy\core\fromnumeric.py", line 58, in _wrapfunc
        return bound(*args, **kwds)
    AxisError: axis 1 is out of bounds for array of dimension 1

    I tried setting the axis to zero, but this results in more complex errors, so I assumed this is not the way to go. Does anyone have a clue how I can fix this error? Any suggestions would be of great help, thanks in advance!
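
The two errors can be reproduced outside Coach with plain NumPy, which may help narrow down where the shapes diverge. The sketch below is not Coach code. It assumes, following the usual categorical DQN formulation, that the distribution prediction is meant to have shape (batch, actions, atoms), so that the dot product with the atom support leaves one Q-value per action. The support range of -10 to 10 and all variable names are illustrative.

```python
import numpy as np

batch_size, num_actions, num_atoms = 128, 2, 51

# Hypothetical atom support; the real range comes from the agent parameters.
z_values = np.linspace(-10.0, 10.0, num_atoms)

# Assumed prediction shape: one probability distribution over atoms
# for every action of every sample in the batch.
rng = np.random.default_rng(0)
logits = rng.normal(size=(batch_size, num_actions, num_atoms))
distribution = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# np.dot contracts the last axis (atoms), leaving (batch, actions).
q_values = np.dot(distribution, z_values)
greedy_actions = np.argmax(q_values, axis=1)
print(q_values.shape, greedy_actions.shape)  # prints (128, 2) (128,)

# A 2-D prediction of shape (batch, actions) cannot be contracted with
# 51 atoms, which is the situation in the first traceback.
flat_prediction = np.zeros((batch_size, num_actions))
try:
    np.dot(flat_prediction, z_values)
    mismatch = None
except ValueError as error:
    mismatch = str(error)

# With 2 atoms the dot product succeeds but collapses to 1-D, so a later
# argmax over axis=1 has no axis to work on (second traceback).
collapsed = np.dot(flat_prediction, np.zeros(num_actions))
print(collapsed.shape)  # prints (128,)
```

Under that assumption, the number of atoms and the number of actions do not need to match. The first error instead suggests that the prediction reaching `distribution_prediction_to_q_values` is missing its atoms axis.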

  • How to load a pretrained model (e.g. SAC) pb file to coach and continue to train ?

    I have a mode.pb file and know its network architecture (it was trained with Coach earlier), but I have no access to the original code. I want to load it with Coach and write some code to continue training this model. How can that be accomplished?

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

Status: Maintenance (expect bug fixes and minor updates) Baselines OpenAI Baselines is a set of high-quality implementations of reinforcement learning

Aug 8, 2022
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms

Stable Baselines Stable Baselines is a set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines. You can read a

Aug 1, 2022
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.

Dopamine Dopamine is a research framework for fast prototyping of reinforcement learning algorithms. It aims to fill the need for a small, easily grok

Aug 8, 2022
Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

SLM Lab Modular Deep Reinforcement Learning framework in PyTorch. Documentation: https://slm-lab.gitbook.io/slm-lab/ BeamRider Breakout KungFuMaster M

Aug 1, 2022
TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning. TF-Agents makes implementing, de

Aug 2, 2022
Doom-based AI Research Platform for Reinforcement Learning from Raw Visual Information.

ViZDoom ViZDoom allows developing AI bots that play Doom using only the visual information (the screen buffer). It is primarily intended for research

Aug 1, 2022
A toolkit for reproducible reinforcement learning research.

garage garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementa

Jul 31, 2022
An open source robotics benchmark for meta- and multi-task reinforcement learning

Meta-World Meta-World is an open-source simulated benchmark for meta-reinforcement learning and multi-task learning consisting of 50 distinct robotic

Jul 29, 2022
A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)

Applied Reinforcement Learning @ Facebook Overview ReAgent is an open source end-to-end platform for applied reinforcement learning (RL) developed and

Aug 2, 2022
Tensorforce: a TensorFlow library for applied reinforcement learning

Tensorforce: a TensorFlow library for applied reinforcement learning Introduction Tensorforce is an open-source deep reinforcement learning framework,

Aug 2, 2022
TensorFlow Reinforcement Learning

TRFL TRFL (pronounced "truffle") is a library built on top of TensorFlow that exposes several useful building blocks for implementing Reinforcement Le

Jul 30, 2022
Deep Reinforcement Learning for Keras.

Deep Reinforcement Learning for Keras What is it? keras-rl implements some state-of-the-art deep reinforcement learning algorithms in Python and seaml

Aug 3, 2022
ChainerRL is a deep reinforcement learning library built on top of Chainer.

ChainerRL ChainerRL is a deep reinforcement learning library that implements various state-of-the-art deep reinforcement algorithms in Python using Ch

Jul 24, 2022
Open world survival environment for reinforcement learning

Crafter Open world survival environment for reinforcement learning. Highlights Crafter is a procedurally generated 2D world, where the agent finds foo

Aug 5, 2022
Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

MARL Tricks Our code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning. We implemented and standardiz

Aug 1, 2022
Paddle-RLBooks is a reinforcement learning code study guide based on pure PaddlePaddle.

Paddle-RLBooks Welcome to Paddle-RLBooks which is a reinforcement learning code study guide based on pure PaddlePaddle. This repository is mainly aimed at reinforcement learn

Jul 26, 2022
Fully Automated YouTube Channel ▶️with Added Extra Features.

Fully Automated Youtube Channel

Aug 3, 2022
Roach: End-to-End Urban Driving by Imitating a Reinforcement Learning Coach

CARLA-Roach This is the official code release of the paper End-to-End Urban Driving by Imitating a Reinforcement Learning Coach by Zhejun Zhang, Alexa

Jul 30, 2022
piSTAR Lab is a modular platform built to make AI experimentation accessible and fun. (pistar.ai)

piSTAR Lab WARNING: This is an early release. Overview piSTAR Lab is a modular deep reinforcement learning platform built to make AI experimentation a

Aug 1, 2022