A platform for Reasoning systems (Reinforcement Learning, Contextual Bandits, etc.)


Applied Reinforcement Learning @ Facebook


Overview

ReAgent is an open source, end-to-end platform for applied reinforcement learning (RL) developed and used at Facebook. ReAgent is built in Python and uses PyTorch for modeling and training, and TorchScript for model serving. The platform contains workflows to train popular deep RL algorithms and includes data preprocessing, feature transformation, distributed training, counterfactual policy evaluation, and optimized serving. For more detailed information about ReAgent, see the white paper here.

The platform was previously named "Horizon", but we recently adopted the name "ReAgent" to emphasize its broader scope in decision making and reasoning.

Algorithms Supported

Installation

ReAgent can be installed via Docker or manually. Detailed instructions on how to install ReAgent can be found here.

Usage

Detailed instructions on how to use ReAgent Models can be found here.

The ReAgent Serving Platform (RASP) tutorial is available here.

License

ReAgent is released under a BSD 3-Clause license. Find out more about it here.

Citing

    @article{gauci2018horizon,
      title={Horizon: Facebook's Open Source Applied Reinforcement Learning Platform},
      author={Gauci, Jason and Conti, Edoardo and Liang, Yitao and Virochsiri, Kittipat and Chen, Zhengxing and He, Yuchen and Kaden, Zachary and Narayanan, Vivek and Ye, Xiaohui},
      journal={arXiv preprint arXiv:1811.00260},
      year={2018}
    }

Comments
  • Upgrade ReAgent to use Python 3.8

    Summary: Currently, we have some test failures (https://app.circleci.com/pipelines/github/facebookresearch/ReAgent/1460/workflows/ecc21254-779b-4a89-a40d-ea317e839d96/jobs/8655) because we are missing some recent language features.

    Differential Revision: D26977836

  • Reimplement MDNRNN using new gym.

    Using our new gym, test MDNRNN feature importance/sensitivity. Also, train DQN to play the POMDP string game with states embedded by MDNRNN. This is in preparation for removing the old gym folder.

  • Lightning SACTrainer

    Summary:

    • Created ReAgentLightningModule as a base class to implement the generator API
    • Implemented reporting for SAC

    TODOs:

    • Convert TD3 to LightningModule
    • Fix the OSS version of model manager
    • Fix on-policy training with Gym (by creating GymDataModule)

    Differential Revision: D23857511

  • add env flag to skip frozen registry check

    Summary: The environment variable SKIP_FROZEN_REGISTRY_CHECK is checked. If it is set to a non-zero value, we print a warning instead of raising an error when attempting to add members to a frozen registry.

    Differential Revision: D32773682
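    The behavior described above can be sketched in plain Python. Note that the `Registry` class and its methods below are hypothetical stand-ins for illustration, not ReAgent's actual registry implementation; only the `SKIP_FROZEN_REGISTRY_CHECK` variable name comes from the summary.

```python
import os
import warnings


class Registry:
    """Minimal sketch of a registry that can be frozen (hypothetical API)."""

    def __init__(self):
        self._members = {}
        self._frozen = False

    def freeze(self):
        self._frozen = True

    def register(self, name, member):
        if self._frozen:
            # If SKIP_FROZEN_REGISTRY_CHECK is set to a non-zero value,
            # downgrade the error to a warning, as the patch describes.
            if os.environ.get("SKIP_FROZEN_REGISTRY_CHECK", "0") != "0":
                warnings.warn(f"Adding '{name}' to a frozen registry")
            else:
                raise RuntimeError(f"Registry is frozen; cannot add '{name}'")
        self._members[name] = member
```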

  • add async_run_episode to gymrunner to support envs with async step methods

    Summary: I need this because my reward evaluation is done by an async coroutine (multiple trajectories are being generated in parallel)

    Differential Revision: D25487664
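    A minimal sketch of what an async episode runner might look like, assuming the environment's `step` is a coroutine. `async_run_episode`'s signature, `ToyAsyncEnv`, and `gather_trajectories` are illustrative assumptions here, not ReAgent's actual gymrunner API.

```python
import asyncio


async def async_run_episode(env, policy, max_steps=100):
    """Run one episode against an env whose step() is a coroutine."""
    obs = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(obs)
        # Awaiting step() lets e.g. reward evaluation happen asynchronously.
        obs, reward, done = await env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


class ToyAsyncEnv:
    """Hypothetical env with an async step(), standing in for a real one."""

    def __init__(self, length=3):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    async def step(self, action):
        await asyncio.sleep(0)  # stand-in for async reward evaluation
        self.t += 1
        return self.t, 1.0, self.t >= self.length


async def gather_trajectories(n):
    # Multiple trajectories generated in parallel, as the summary describes.
    episodes = (async_run_episode(ToyAsyncEnv(), lambda obs: 0) for _ in range(n))
    return await asyncio.gather(*episodes)
```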

  • Migrate REINFORCE trainer to Lightning

    Summary: I migrated the REINFORCE trainer to Lightning. I don't like the fake-optimizer trick, and I'll look into doing it more cleanly.

    Differential Revision: D26246712

  • Extend Gymrunner, add Transition and Trajectory

    Summary: Gymrunner is currently limited, which forces us to write duplicated code when replicating the previous gym environment's behavior, such as adding mdp_id and sequence_number to the replay buffer, or evaluating with gamma < 1.0. This change makes those extensions easier without duplicating code.

    Differential Revision: D21616090
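    A rough sketch of what `Transition` and `Trajectory` containers could look like, covering the use cases mentioned above (carrying `mdp_id` and `sequence_number`, and evaluating with gamma < 1.0). The field and method names are illustrative assumptions, not ReAgent's exact definitions.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Transition:
    """One step of interaction; carries the ids the summary mentions."""
    mdp_id: str
    sequence_number: int
    observation: Any
    action: Any
    reward: float
    terminal: bool


@dataclass
class Trajectory:
    """An ordered sequence of transitions from one episode."""
    transitions: List[Transition] = field(default_factory=list)

    def add(self, transition: Transition) -> None:
        self.transitions.append(transition)

    def discounted_return(self, gamma: float = 1.0) -> float:
        # Supports evaluation with gamma < 1.0, as described in the summary.
        total, discount = 0.0, 1.0
        for t in self.transitions:
            total += discount * t.reward
            discount *= gamma
        return total
```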

  • OOM killed

    Hi,

    I ran dqn_workflow with 7.9 GB of training data, but the process was OOM-killed. Below are my environment details and the OOM logs.

    • workflow: dqn_workflow.py
    • training_data: 8 features, 20,249,257 rows, 7.9 GB
    • training_eval_data: 8 features, 2,028,916 rows, 0.8 GB
    • RAM: 80 GB

    INFO:ml.rl.evaluation.evaluation_data_page:EvaluationDataPage minibatch size: 2028912
    WARNING:ml.rl.evaluation.doubly_robust_estimator:Can't normalize DR-CPE because of small or negative logged_policy_score
    Killed
    
    [Tue May  7 22:05:38 2019] python invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
    [Tue May  7 22:05:38 2019] python cpuset=42ee6ef8b84594988960735ef211ac05221059efc2d524f2afc1e2b49eb46d0c mems_allowed=0-1
    [Tue May  7 22:05:38 2019] CPU: 1 PID: 51997 Comm: python Tainted: P           O      4.20.13-1.el7.elrepo.x86_64 #1
    [Tue May  7 22:05:38 2019] Hardware name: Dell Inc. PowerEdge C4140/013M88, BIOS 1.6.11 11/21/2018
    [Tue May  7 22:05:38 2019] Call Trace:
    [Tue May  7 22:05:38 2019]  dump_stack+0x63/0x88
    [Tue May  7 22:05:38 2019]  dump_header+0x78/0x2a4
    [Tue May  7 22:05:38 2019]  ? mem_cgroup_scan_tasks+0x9c/0xf0
    [Tue May  7 22:05:38 2019]  oom_kill_process+0x26b/0x290
    [Tue May  7 22:05:38 2019]  out_of_memory+0x140/0x4b0
    [Tue May  7 22:05:38 2019]  mem_cgroup_out_of_memory+0x4b/0x80
    [Tue May  7 22:05:38 2019]  try_charge+0x6e2/0x750
    [Tue May  7 22:05:38 2019]  mem_cgroup_try_charge+0x8c/0x1e0
    [Tue May  7 22:05:38 2019]  __add_to_page_cache_locked+0x1a0/0x300
    [Tue May  7 22:05:38 2019]  ? scan_shadow_nodes+0x30/0x30
    [Tue May  7 22:05:38 2019]  add_to_page_cache_lru+0x4e/0xd0
    [Tue May  7 22:05:38 2019]  filemap_fault+0x428/0x7c0
    [Tue May  7 22:05:38 2019]  ? xas_find+0x138/0x1a0
    [Tue May  7 22:05:38 2019]  ? filemap_map_pages+0x153/0x3c0
    [Tue May  7 22:05:38 2019]  __do_fault+0x3e/0xc0
    [Tue May  7 22:05:38 2019]  __handle_mm_fault+0xbd6/0xe80
    [Tue May  7 22:05:38 2019]  handle_mm_fault+0x102/0x220
    [Tue May  7 22:05:38 2019]  __do_page_fault+0x21c/0x4c0
    [Tue May  7 22:05:38 2019]  do_page_fault+0x37/0x140
    [Tue May  7 22:05:38 2019]  ? page_fault+0x8/0x30
    [Tue May  7 22:05:38 2019]  page_fault+0x1e/0x30
    ...
    [Tue May  7 22:05:38 2019] Memory cgroup out of memory: Kill process 51997 (python) score 997 or sacrifice child
    [Tue May  7 22:05:38 2019] Killed process 51997 (python) total-vm:102757536kB, anon-rss:83335008kB, file-rss:132692kB, shmem-rss:8192kB
    [Tue May  7 22:05:42 2019] oom_reaper: reaped process 51997 (python), now anon-rss:0kB, file-rss:127188kB, shmem-rss:8192kB
    

    (Resource usage chart: green = CPU, yellow = RAM)

  • Add sparse features to reward decomposition

    Summary: As a showcase for how to add sparse features to ReAgent

    I mainly referenced two examples:

    1. fbsource/fbcode/minimal_viable_ai/ifr_uv/train.py
    2. fbsource/fbcode/torchrecipes/rec/dlrm_main_fb.py

    Differential Revision: D33225789

  • Cannot build RaspCli inside the cpu Dockerfile

    I followed the installation guide for the CPU Docker image and can successfully go inside the container and run the tests.

    The tutorial has a step that says to start a RASP server. I can't do this because I don't have ./serving/build/RaspCli.

    To build RaspCli, I ran the following commands inside ./serving/build:

    wget https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-1.3.0%2Bcpu.zip -O libtorch.zip && \
        unzip libtorch.zip && \
        rm libtorch.zip
    
    conda install glog
    
    apt -y install libgflags-dev \
                       libgoogle-glog-dev \
                       libboost-tools-dev \
                       libboost-thread1.62-dev
    
    cmake -DCMAKE_PREFIX_PATH=$HOME/libtorch ..
    make
    

    The build fails with many "multiple definition" errors. Here is an excerpt:

    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_large_model':
    (.text+0x11c): multiple definition of `__morestack_large_model'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x11c): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__stack_split_initialize':
    (.text+0x12c): multiple definition of `__stack_split_initialize'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x12c): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_get_guard':
    (.text+0x155): multiple definition of `__morestack_get_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x155): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_set_guard':
    (.text+0x15f): multiple definition of `__morestack_set_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x15f): first defined here
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o): In function `__morestack_make_guard':
    (.text+0x169): multiple definition of `__morestack_make_guard'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a(morestack.o):(.text+0x169): first defined here
    collect2: error: ld returned 1 exit status
    CMakeFiles/RaspCli.dir/build.make:106: recipe for target 'RaspCli' failed
    make[2]: *** [RaspCli] Error 1
    CMakeFiles/Makefile2:110: recipe for target 'CMakeFiles/RaspCli.dir/all' failed
    make[1]: *** [CMakeFiles/RaspCli.dir/all] Error 2
    Makefile:140: recipe for target 'all' failed
    make: *** [all] Error 2
    

    How can I successfully build ReAgent inside the CPU Docker container?

    I am running Ubuntu 18.04.3 LTS.

  • Getting an error running a spark-submit job

    Hello,

    I am trying to follow the instructions here: https://github.com/facebookresearch/Horizon/blob/master/docs/usage.md

    When I run this command:

    /usr/local/spark/bin/spark-submit \
        --class com.facebook.spark.rl.Preprocessor \
        preprocessing/target/rl-preprocessing-1.1.jar \
        "cat ml/rl/workflow/sample_configs/discrete_action/timeline.json"

    I am getting:

    2019-02-27 00:57:03 INFO HiveMetaStore:746 - 0: get_database: global_temp
    2019-02-27 00:57:03 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
    2019-02-27 00:57:03 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
    Exception in thread "main" org.apache.spark.sql.AnalysisException: grouping expressions sequence is empty, and 'source_table.mdp_id' is not an aggregate function. Wrap '()' in windowing function(s) or wrap 'source_table.mdp_id' in first() (or first_value) if you don't care which value you get.;;
    'Sort ['HASH('mdp_id, 'sequence_number) ASC NULLS FIRST], false
    +- 'RepartitionByExpression ['HASH('mdp_id, 'sequence_number)], 200
    +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, next_state_features#24, next_action#25, sequence_number#2, sequence_number_ordinal#26, time_diff#27, possible_actions#7, possible_next_actions#28, metrics#8]
    +- 'Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8, next_state_features#24, next_action#25, sequence_number_ordinal#26, _we3#30, possible_next_actions#28, next_state_features#24, next_action#25, sequence_number_ordinal#26, (coalesce(_we3#30, sequence_number#2) - sequence_number#2) AS time_diff#27, possible_next_actions#28]
    +- 'Window [lead(state_features#4, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_state_features#24, lead(action#5, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS next_action#25, row_number() windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS sequence_number_ordinal#26, lead(sequence_number#2, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS _we3#30, lead(possible_actions#7, 1, null) windowspecdefinition(mdp_id#1, mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST, specifiedwindowframe(RowFrame, 1, 1)) AS possible_next_actions#28], [mdp_id#1], [mdp_id#1 ASC NULLS FIRST, sequence_number#2 ASC NULLS FIRST]
    +- 'Filter isnotnull('next_state_features)
    +- Aggregate [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
    +- SubqueryAlias source_table
    +- Project [mdp_id#1, state_features#4, action#5, action_probability#3, reward#6, sequence_number#2, possible_actions#7, metrics#8]
    +- Filter ((ds#0 >= 2019-01-01) && (ds#0 <= 2019-01-01))
    +- SubqueryAlias cartpole_discrete
    +- Relation[ds#0,mdp_id#1,sequence_number#2,action_probability#3,state_features#4,action#5,reward#6,possible_actions#7,metrics#8] json

    I tried the steps after manually installing HBase (this step is missing from the documentation; please let me know if you want me to add it).

    I am using the Docker-on-Mac instructions (https://github.com/facebookresearch/Horizon/blob/master/docs/installation.md) to get going. Can anyone please help me figure out how to move forward?

  • Sync device in shift_kjt_by_one

    Summary: Use the same device for the constant tensor as for the input tensor, to avoid concatenating tensors on separate devices.

    Reviewed By: czxttkl

    Differential Revision: D39020305
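    The fix pattern can be illustrated with a small, hypothetical helper (not ReAgent's actual `shift_kjt_by_one`): build the constant tensor on the input tensor's device so `torch.cat` never sees tensors on different devices.

```python
import torch


def shifted_lengths(lengths: torch.Tensor) -> torch.Tensor:
    """Shift a 1-D lengths tensor right by one, prepending a zero.

    The zero is created with the input's dtype and device, so concatenation
    works whether `lengths` lives on CPU or GPU.
    (`shifted_lengths` is an illustrative helper, not ReAgent's API.)
    """
    zero = torch.zeros(1, dtype=lengths.dtype, device=lengths.device)
    return torch.cat([zero, lengths[:-1]])
```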

  • Update Pendulum-V0 to V1 in reagent tests

    Summary: This should unblock a failing autodatamodule test (f366492605). Gym documentation says V1 behaves the same as V0: {F763157287}

    Differential Revision: D38918251

  • Clean up

    Summary: Fixed acquisition functions for Bayes by Backprop and did some general cleanup (comments, functions, etc.).

    Reviewed By: czxttkl

    Differential Revision: D36470135

  • Issue in Installation On M1 Pro Mac

    Hi, I am trying to install ReAgent but I get this error at the tox step:

    OSError: dlopen(/*/ReAgent/.tox/py38/lib/python3.8/site-packages/fbgemm_gpu/fbgemm_gpu_py.so, 0x0006): tried: '/*/ReAgent/.tox/py38/lib/python3.8/site-packages/fbgemm_gpu/fbgemm_gpu_py.so' (not a mach-o file)

    For the same reason, I am also unable to run this command: ./reagent/workflow/cli.py run reagent.workflow.gym_batch_rl.offline_gym_random $CONFIG

    I even tried building fbgemm from source, but that fails with the error:

    MKL library not found

    Any help would be appreciated.
