Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

MARL Tricks

Our code for RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning. We implemented the SOTA MARL algorithms and standardized their hyperparameters.

Python MARL framework

PyMARL is WhiRL's framework for deep multi-agent reinforcement learning and includes implementations of the following algorithms:

Value-based Methods:

Actor Critic Methods:

PyMARL is written in PyTorch and uses SMAC as its environment.

Installation instructions

Install Python packages

# requires Anaconda 3 or Miniconda 3
bash install_dependecies.sh

Set up StarCraft II and SMAC:

bash install_sc2.sh

This will download SC2 into the 3rdparty folder and copy over the maps needed to run the experiments.

Run an experiment

# For SMAC
python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
# For Cooperative Predator-Prey
python3 src/main.py --config=qmix_prey --env-config=stag_hunt with env_args.map_name=stag_hunt

The config files act as defaults for an algorithm or environment.

They are all located in src/config.

--config refers to the config files in src/config/algs.

--env-config refers to the config files in src/config/envs.
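As a rough illustration of how config files acting as defaults could combine with command-line overrides, the sketch below merges a default config, an environment config, an algorithm config and one `with key=value` override. The helper names `recursive_update` and `apply_override`, the dictionary contents and the merge order are all assumptions made for illustration; they are not taken from this repository.

```python
import copy


def recursive_update(base, override):
    """Return a copy of base with override merged in, recursing into nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = recursive_update(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_override(config, assignment):
    """Apply one 'a.b.c=value' override in place; values stay strings in this sketch."""
    dotted_key, value = assignment.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Hypothetical contents, for illustration only.
default_config = {"t_max": 1000, "env_args": {"map_name": "placeholder"}}
env_config = {"env": "sc2", "env_args": {"difficulty": "7"}}
alg_config = {"name": "qmix", "t_max": 2000}

config = recursive_update(default_config, env_config)
config = recursive_update(config, alg_config)
config = apply_override(config, "env_args.map_name=corridor")
```

In this sketch later sources win over earlier ones, so the command-line override replaces the placeholder map name while untouched defaults survive.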

Run parallel experiments:

# bash run.sh config_name map_name_list (threads_num arg_list gpu_list experiments_num)
bash run.sh qmix corridor 2 epsilon_anneal_time=500000 0,1 5

Each xxx_list argument is a comma-separated list (for example, the gpu_list 0,1 above).

All results will be stored in the Results folder and named with map_name.
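One way to read the positional arguments is sketched below: the comma-separated lists are split, and one command is planned per map and repetition, with GPU ids handed out in turn. This is an assumption about the intended behaviour, not a transcription of run.sh. The function name `plan_runs`, the hard-coded `--env-config=sc2` and the round-robin GPU assignment are hypothetical, and threads_num (presumably a concurrency cap) is left out.

```python
from itertools import cycle


def plan_runs(config_name, map_name_list, arg_list="", gpu_list="0", experiments_num=1):
    """Return (gpu, command) pairs: one run per map and repetition, GPUs in round-robin order."""
    maps = [m for m in map_name_list.split(",") if m]
    extra_args = [a for a in arg_list.split(",") if a]
    gpus = cycle([g for g in gpu_list.split(",") if g])
    runs = []
    for map_name in maps:
        for _ in range(experiments_num):
            command = [
                "python3", "src/main.py",
                f"--config={config_name}", "--env-config=sc2",  # env config assumed
                "with", f"env_args.map_name={map_name}", *extra_args,
            ]
            runs.append((next(gpus), command))
    return runs


# Mirrors: bash run.sh qmix corridor 2 epsilon_anneal_time=500000 0,1 5
runs = plan_runs("qmix", "corridor", "epsilon_anneal_time=500000", "0,1", 5)
```

Under these assumptions the example plans five runs of qmix on corridor, alternating between GPUs 0 and 1.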

Force all processes to exit

# All Python and game processes of the current user will be terminated.
bash clean.sh

Some test results on Super Hard scenarios

Cite

@article{hu2021riit,
      title={RIIT: Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning}, 
      author={Jian Hu and Haibin Wu and Seth Austin Harding and Siyang Jiang and Shih-wei Liao},
      year={2021},
      eprint={2102.03479},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
Comments
  • Question about action exploration selection

    I compared the code of pymarl and pymarl2.

    I found that the following action exploration selection in pymarl's basic_controller.py https://github.com/oxwhirl/pymarl/blob/c971afdceb34635d31b778021b0ef90d7af51e86/src/controllers/basic_controller.py#L40-L48

    if not test_mode:
        # Epsilon floor
        epsilon_action_num = agent_outs.size(-1)
        if getattr(self.args, "mask_before_softmax", True):
            # With probability epsilon, we will pick an available action uniformly
            epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()
    
        agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                       + th.ones_like(agent_outs) * self.action_selector.epsilon/epsilon_action_num)
    

    has been moved to action_selectors.py https://github.com/hijkzzz/pymarl2/blob/d0aaf583605b2b012a1fd080eb6880a00954ed28/src/components/action_selectors.py#L94-L97

    The computation also seems to differ in whether masking is applied. Could you explain why this change was made?

  • [Help]pysc2.lib.remote_controller.ConnectError: Failed to connect to the SC2 websocket. Is it up?

    When I run command

    python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor
    

    I encounter this error.

    Traceback (most recent call last):
      File "src/main.py", line 109, in <module>
        ex.run_commandline(params)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 318, in run_commandline
        options=args,
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/opt/conda/envs/pymarl/lib/python3.7/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/root/pymarl2/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/root/pymarl2/src/run/run.py", line 177, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "/root/pymarl2/src/runners/parallel_runner.py", line 89, in run
        self.reset()
      File "/root/pymarl2/src/runners/parallel_runner.py", line 78, in reset
        data = parent_conn.recv()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 250, in recv
        buf = self._recv_bytes()
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
        buf = self._recv(4)
      File "/opt/conda/envs/pymarl/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
        chunk = read(handle, remaining)
    KeyboardInterrupt
    

    This issue (https://github.com/deepmind/pysc2/issues/281) says I have to launch the StarCraft II game itself rather than just Battle.net, but I don't know how to do that.

    Could you give me any advice?

  • RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    Hello, the code raises an error when running the following test: main.py --config=coma --env-config=one_step_matrix_game with save_model=True use_tensorboard=True save_model_interval=1000 t_max=50000 runner='episode' batch_size_run=1 use_cuda=False

    Error message:

    Traceback (most recent calls WITHOUT Sacred internals):
      File "D:/sby/RL/pymarl2/main.py", line 35, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "D:\sby\RL\pymarl2\run\run.py", line 56, in run
        run_sequential(args=args, logger=logger)
      File "D:\sby\RL\pymarl2\run\run.py", line 181, in run_sequential
        episode_batch = runner.run(test_mode=False)
      File "D:\sby\RL\pymarl2\runners\episode_runner.py", line 70, in run
        actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\controllers\basic_controller.py", line 23, in select_actions
        t_env, test_mode=test_mode)
      File "D:\sby\RL\pymarl2\components\action_selectors.py", line 105, in select_action
        picked_actions = Categorical(masked_policies).sample().long()
      File "D:\Anaconda3\lib\site-packages\torch\distributions\categorical.py", line 107, in sample
        samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
    RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

    I looked into the problem. It appears to come from rnn_agent.py, at x = F.relu(self.fc1(inputs.view(-1, e)), inplace=True): after many iterations the gradients accumulate and explode, so the output contains NaN. Could you look into fixing this? Thanks. My environment is Windows 10 with PyTorch 1.x.

  • Question about GRF

    Hi, awesome work! You extended GRF into the pymarl framework. However, when I run it with vdn_gfootball.yaml, a lot of debugging information is printed. Could you please help me fix this?

    Detail: absl Dump "episode_done": count limit reached / disabled

  • VMIX algorithm reports NaN

    Traceback (most recent call last):
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
        return self.run(
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 38, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/run/run.py", line 195, in run_sequential
        learner.train(episode_sample, runner.t_env, episode)
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 58, in train
        advantages, td_error, targets_taken, log_pi_taken, entropy = self._calculate_advs(batch, rewards, terminated, actions, avail_actions,
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/learners/policy_gradient_v2.py", line 115, in _calculate_advs
        entropy = categorical_entropy(pi).reshape(-1)  #[bs, t, n_agents, 1]
      File "/NAS2020/Workspaces/DRLGroup/xhwang/Lab/SCII/pymarl2/src/components/action_selectors.py", line 110, in categorical_entropy
        return Categorical(probs=probs).entropy()
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/categorical.py", line 64, in __init__
        super(Categorical, self).__init__(batch_shape, validate_args=validate_args)
      File "/home/xhwang/anaconda3/envs/pymarl/lib/python3.8/site-packages/torch/distributions/distribution.py", line 55, in __init__
        raise ValueError(
    ValueError: Expected parameter probs (Tensor of shape (8, 54, 10, 18)) of distribution Categorical(probs: torch.Size([8, 54, 10, 18])) to satisfy the constraint Simplex(), but found invalid values:
    

    The rest of the output is the tensor data, which I did not paste. The problem is that it contains NaN.

  • TensorBoard logger not working

    Hi, thanks for the good work! I installed the dependencies as instructed and successfully started training. However, it seems that the TensorBoard logs are not written to the /result directory, although I set the use_tensorboard param to "true" in src/config/default.yaml. Could you please help me with this?

  • About the Linux environment

    Hello,

    I would like to run more tests with pymarl2. However, it seems that SC2.4.10 cannot run on CentOS Linux 7.9.2009 because of glibc_2.17, and I got the following:

    /StarCraftII/Versions/Base75689/SC2_x64: /usr/lib64/libc.so.6: version 'GLIBC_2.18' not found (required by /StarCraftII/Libs/libstdc++.so.6)

    Could you please provide your operating environment info? Thanks!

  • Question about policy iterations

    Hello, in your paper I saw S = E · P · I, where S is the total number of samples, E is the number of samples in each episode, P is the number of rollout processes, and I is the number of policy iterations. Does "policy iterations" here refer to target_update_interval, or to how many rounds pass between training steps?

  • AttributeError: 'TracebackException' object has no attribute 'exc_traceback' and RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)

    When I run python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=corridor, I get an error.

    Traceback (most recent call last):
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
        return self.run(
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "src/main.py", line 38, in my_main
        run_REGISTRY[_config['run']](_run, config, _log)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 54, in run
        run_sequential(args=args, logger=logger)
      File "/home/jindingquan/pymarl2-master/src/run/run.py", line 114, in run_sequential
        buffer = ReplayBuffer(scheme, groups, args.buffer_size, env_info["episode_limit"] + 1,
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 209, in __init__
        super(ReplayBuffer, self).__init__(scheme, groups, buffer_size, max_seq_length, preprocess=preprocess, device=device)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 28, in __init__
        self._setup_data(self.scheme, self.groups, batch_size, max_seq_length, self.preprocess)
      File "/home/jindingquan/pymarl2-master/src/components/episode_buffer.py", line 75, in _setup_data
        self.data.transition_data[field_key] = th.zeros((batch_size, max_seq_length, *shape), dtype=dtype, device=self.device)
    RuntimeError: [enforce fail at alloc_cpu.cpp:73] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 7506720000 bytes. Error code 12 (Cannot allocate memory)
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "src/main.py", line 112, in <module>
        ex.run_commandline(params)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
        print_filtered_stacktrace()
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
        print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
        return "".join(filtered_traceback_format(tb_exception))
      File "/home/jindingquan/.conda/envs/pymarl/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
        current_tb = tb_exception.exc_traceback
    AttributeError: 'TracebackException' object has no attribute 'exc_traceback'
    

    How can I fix it? Please help!

  • Problem when modifying maps

    Hi,

    I am doing some personal research, and I used one of your maps (1o_10b_vs_1r.SC2Map) because its terrain design allows me to set up some tasks. I have modified the map's type and number of agents, but when I tried to change some terrain features the code gave me the following error:

    Error

    The only modification I made was to elevate some of the terrain, so I did not change the size of the map.

    Do you know the reason for this error?

    Thanks!

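  • Two questions

    Hello author, I came here from 'starry...' after your comment there. When I first chose an open-source StarCraft codebase, I picked 'starry...' because the original version had too few comments.

    1. When running qmix now, has everything already been fine-tuned?

    2. About the origin of the maps: how were maps such as the 7sz map in Q_plex created? I did not see this map on the StarCraft download page, and you also provide new maps here.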

  • Add NDQ algorithm

    The source code from the NDQ paper is too old and doesn't work with recent PyTorch versions. I modified it so that it now runs on recent PyTorch and is convenient to compare with other methods. I also added a requirements file listing the package versions of my environment.
