SLM Lab

Modular Deep Reinforcement Learning framework in PyTorch. Companion library of the book "Foundations of Deep Reinforcement Learning".

Documentation: https://slm-lab.gitbook.io/slm-lab/

Benchmark demos: PPO on Atari (BeamRider, Breakout, KungFuMaster, MsPacman, Pong, Qbert, Seaquest, SpaceInvaders) and SAC on continuous control (Ant, HalfCheetah, Hopper, Humanoid, InvertedDoublePendulum, InvertedPendulum, Reacher, Walker).
Owner
Wah Loon Keng
Engineer by day, rock climber by night. Mathematician at heart.
Comments
  • All 'search' examples end with error

    Describe the bug

    I'm enjoying the book a lot. It's the best book on the subject I've read, and I've read Sutton & Barto, but I'm an empiricist rather than an academic. Anyway, I can run all the examples in the book in 'dev' and 'train' modes, but not in 'search' mode: they all end with an error. I don't see anybody else complaining about this, so it must be a rookie mistake on my part. I hope you can help so I can continue enjoying the book to its fullest.

    To Reproduce

    1. OS and environment: Ubuntu 18.04
    2. SLM Lab git SHA (run git rev-parse HEAD to get it): What?
    3. spec file used: benchmark/reinforce/reinforce_cartpole.json
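
    For reference, the failing 'search' command is the one shown in the error logs below; the 'dev' and 'train' runs that work use the same form with only the mode argument changed, e.g.:

    python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole dev
    python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole train
    python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole search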

    Additional context

    I'm showing the error logs for Code 2.15 on page 50, but I get similar error logs for all the other code listings run in 'search' mode. The 'data' folder contains 32 files but no plots. All the folders inside 'data' are empty except for 'log', which has a file containing this:

    [2020-01-30 11:03:56,907 PID:3351 INFO search.py run_ray_search] Running ray search for spec reinforce_cartpole
    

    NVIDIA driver version: 440.33.01, CUDA version: 10.2

    Error logs

    python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole search
    [2020-01-30 11:38:57,177 PID:4355 INFO run_lab.py read_spec_and_run] Running lab spec_file:slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json spec_name:reinforce_baseline_cartpole in mode:search
    [2020-01-30 11:38:57,183 PID:4355 INFO search.py run_ray_search] Running ray search for spec reinforce_baseline_cartpole
    2020-01-30 11:38:57,183	WARNING worker.py:1341 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
    2020-01-30 11:38:57,183	INFO node.py:497 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-01-30_11-38-57_183527_4355/logs.
    2020-01-30 11:38:57,288	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:59003 to respond...
    2020-01-30 11:38:57,409	INFO services.py:409 -- Waiting for redis server at 127.0.0.1:55931 to respond...
    2020-01-30 11:38:57,414	INFO services.py:806 -- Starting Redis shard with 3.35 GB max memory.
    2020-01-30 11:38:57,435	INFO node.py:511 -- Process STDOUT and STDERR is being redirected to /tmp/ray/session_2020-01-30_11-38-57_183527_4355/logs.
    2020-01-30 11:38:57,435	INFO services.py:1441 -- Starting the Plasma object store with 5.02 GB memory using /dev/shm.
    2020-01-30 11:38:57,543	INFO tune.py:60 -- Tip: to resume incomplete experiments, pass resume='prompt' or resume=True to run()
    2020-01-30 11:38:57,543	INFO tune.py:223 -- Starting a new experiment.
    == Status ==
    Using FIFO scheduling algorithm.
    Resources requested: 0/8 CPUs, 0/1 GPUs
    Memory usage on this node: 2.1/16.7 GB
    
    2020-01-30 11:38:57,572	WARNING logger.py:130 -- Couldn't import TensorFlow - disabling TensorBoard logging.
    2020-01-30 11:38:57,573	WARNING logger.py:224 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
    == Status ==
    Using FIFO scheduling algorithm.
    Resources requested: 4/8 CPUs, 0/1 GPUs
    Memory usage on this node: 2.2/16.7 GB
    Result logdir: /home/joe/ray_results/reinforce_baseline_cartpole
    Number of trials: 2 ({'RUNNING': 1, 'PENDING': 1})
    PENDING trials:
     - ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1:	PENDING
    RUNNING trials:
     - ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0:	RUNNING
    
    2020-01-30 11:38:57,596	WARNING logger.py:130 -- Couldn't import TensorFlow - disabling TensorBoard logging.
    2020-01-30 11:38:57,607	WARNING logger.py:224 -- Could not instantiate <class 'ray.tune.logger.TFLogger'> - skipping.
    (pid=4389) [2020-01-30 11:38:58,297 PID:4389 INFO logger.py info] Running sessions
    (pid=4388) [2020-01-30 11:38:58,292 PID:4388 INFO logger.py info] Running sessions
    (pid=4388) terminate called after throwing an instance of 'c10::Error'
    (pid=4388)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4388) 
    (pid=4388) Fatal Python error: Aborted
    (pid=4388) 
    (pid=4388) Stack (most recent call first):
    (pid=4389) [2020-01-30 11:38:58,326 PID:4456 INFO openai.py __init__] OpenAIEnv:
    (pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4389) - eval_frequency = 2000
    (pid=4389) - log_frequency = 10000
    (pid=4389) - frame_op = None
    (pid=4389) - frame_op_len = None
    (pid=4389) - image_downsize = (84, 84)
    (pid=4389) - normalize_state = False
    (pid=4389) - reward_scale = None
    (pid=4389) - num_envs = 1
    (pid=4389) - name = CartPole-v0
    (pid=4389) - max_t = 200
    (pid=4389) - max_frame = 100000
    (pid=4389) - to_render = False
    (pid=4389) - is_venv = False
    (pid=4389) - clock_speed = 1
    (pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
    (pid=4389) - done = False
    (pid=4389) - total_reward = nan
    (pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4389) - observation_space = Box(4,)
    (pid=4389) - action_space = Discrete(2)
    (pid=4389) - observable_dim = {'state': 4}
    (pid=4389) - action_dim = 2
    (pid=4389) - is_discrete = True
    (pid=4389) [2020-01-30 11:38:58,327 PID:4453 INFO openai.py __init__] OpenAIEnv:
    (pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4389) - eval_frequency = 2000
    (pid=4389) - log_frequency = 10000
    (pid=4389) - frame_op = None
    (pid=4389) - frame_op_len = None
    (pid=4389) - image_downsize = (84, 84)
    (pid=4389) - normalize_state = False
    (pid=4389) - reward_scale = None
    (pid=4389) - num_envs = 1
    (pid=4389) - name = CartPole-v0
    (pid=4389) - max_t = 200
    (pid=4389) - max_frame = 100000
    (pid=4389) - to_render = False
    (pid=4389) - is_venv = False
    (pid=4389) - clock_speed = 1
    (pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
    (pid=4389) - done = False
    (pid=4389) - total_reward = nan
    (pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4389) - observation_space = Box(4,)
    (pid=4389) - action_space = Discrete(2)
    (pid=4389) - observable_dim = {'state': 4}
    (pid=4389) - action_dim = 2
    (pid=4389) - is_discrete = True
    (pid=4389) [2020-01-30 11:38:58,328 PID:4450 INFO openai.py __init__] OpenAIEnv:
    (pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4389) - eval_frequency = 2000
    (pid=4389) - log_frequency = 10000
    (pid=4389) - frame_op = None
    (pid=4389) - frame_op_len = None
    (pid=4389) - image_downsize = (84, 84)
    (pid=4389) - normalize_state = False
    (pid=4389) - reward_scale = None
    (pid=4389) - num_envs = 1
    (pid=4389) - name = CartPole-v0
    (pid=4389) - max_t = 200
    (pid=4389) - max_frame = 100000
    (pid=4389) - to_render = False
    (pid=4389) - is_venv = False
    (pid=4389) - clock_speed = 1
    (pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
    (pid=4389) - done = False
    (pid=4389) - total_reward = nan
    (pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4389) - observation_space = Box(4,)
    (pid=4389) - action_space = Discrete(2)
    (pid=4389) - observable_dim = {'state': 4}
    (pid=4389) - action_dim = 2
    (pid=4389) - is_discrete = True
    (pid=4389) [2020-01-30 11:38:58,335 PID:4458 INFO openai.py __init__] OpenAIEnv:
    (pid=4389) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4389) - eval_frequency = 2000
    (pid=4389) - log_frequency = 10000
    (pid=4389) - frame_op = None
    (pid=4389) - frame_op_len = None
    (pid=4389) - image_downsize = (84, 84)
    (pid=4389) - normalize_state = False
    (pid=4389) - reward_scale = None
    (pid=4389) - num_envs = 1
    (pid=4389) - name = CartPole-v0
    (pid=4389) - max_t = 200
    (pid=4389) - max_frame = 100000
    (pid=4389) - to_render = False
    (pid=4389) - is_venv = False
    (pid=4389) - clock_speed = 1
    (pid=4389) - clock = <slm_lab.env.base.Clock object at 0x7fcc1a023d30>
    (pid=4389) - done = False
    (pid=4389) - total_reward = nan
    (pid=4389) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4389) - observation_space = Box(4,)
    (pid=4389) - action_space = Discrete(2)
    (pid=4389) - observable_dim = {'state': 4}
    (pid=4389) - action_dim = 2
    (pid=4389) - is_discrete = True
    (pid=4388) [2020-01-30 11:38:58,313 PID:4440 INFO openai.py __init__] OpenAIEnv:
    (pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4388) - eval_frequency = 2000
    (pid=4388) - log_frequency = 10000
    (pid=4388) - frame_op = None
    (pid=4388) - frame_op_len = None
    (pid=4388) - image_downsize = (84, 84)
    (pid=4388) - normalize_state = False
    (pid=4388) - reward_scale = None
    (pid=4388) - num_envs = 1
    (pid=4388) - name = CartPole-v0
    (pid=4388) - max_t = 200
    (pid=4388) - max_frame = 100000
    (pid=4388) - to_render = False
    (pid=4388) - is_venv = False
    (pid=4388) - clock_speed = 1
    (pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
    (pid=4388) - done = False
    (pid=4388) - total_reward = nan
    (pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4388) - observation_space = Box(4,)
    (pid=4388) - action_space = Discrete(2)
    (pid=4388) - observable_dim = {'state': 4}
    (pid=4388) - action_dim = 2
    (pid=4388) - is_discrete = True
    (pid=4388) [2020-01-30 11:38:58,318 PID:4445 INFO openai.py __init__] OpenAIEnv:
    (pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4388) - eval_frequency = 2000
    (pid=4388) - log_frequency = 10000
    (pid=4388) - frame_op = None
    (pid=4388) - frame_op_len = None
    (pid=4388) - image_downsize = (84, 84)
    (pid=4388) - normalize_state = False
    (pid=4388) - reward_scale = None
    (pid=4388) - num_envs = 1
    (pid=4388) - name = CartPole-v0
    (pid=4388) - max_t = 200
    (pid=4388) - max_frame = 100000
    (pid=4388) - to_render = False
    (pid=4388) - is_venv = False
    (pid=4388) - clock_speed = 1
    (pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
    (pid=4388) - done = False
    (pid=4388) - total_reward = nan
    (pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4388) - observation_space = Box(4,)
    (pid=4388) - action_space = Discrete(2)
    (pid=4388) - observable_dim = {'state': 4}
    (pid=4388) - action_dim = 2
    (pid=4388) - is_discrete = True
    (pid=4388) [2020-01-30 11:38:58,319 PID:4449 INFO openai.py __init__] OpenAIEnv:
    (pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4388) - eval_frequency = 2000
    (pid=4388) - log_frequency = 10000
    (pid=4388) - frame_op = None
    (pid=4388) - frame_op_len = None
    (pid=4388) - image_downsize = (84, 84)
    (pid=4388) - normalize_state = False
    (pid=4388) - reward_scale = None
    (pid=4388) - num_envs = 1
    (pid=4388) - name = CartPole-v0
    (pid=4388) - max_t = 200
    (pid=4388) - max_frame = 100000
    (pid=4388) - to_render = False
    (pid=4388) - is_venv = False
    (pid=4388) - clock_speed = 1
    (pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
    (pid=4388) - done = False
    (pid=4388) - total_reward = nan
    (pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4388) - observation_space = Box(4,)
    (pid=4388) - action_space = Discrete(2)
    (pid=4388) - observable_dim = {'state': 4}
    (pid=4388) - action_dim = 2
    (pid=4388) - is_discrete = True
    (pid=4388) [2020-01-30 11:38:58,323 PID:4452 INFO openai.py __init__] OpenAIEnv:
    (pid=4388) - env_spec = {'max_frame': 100000, 'max_t': None, 'name': 'CartPole-v0'}
    (pid=4388) - eval_frequency = 2000
    (pid=4388) - log_frequency = 10000
    (pid=4388) - frame_op = None
    (pid=4388) - frame_op_len = None
    (pid=4388) - image_downsize = (84, 84)
    (pid=4388) - normalize_state = False
    (pid=4388) - reward_scale = None
    (pid=4388) - num_envs = 1
    (pid=4388) - name = CartPole-v0
    (pid=4388) - max_t = 200
    (pid=4388) - max_frame = 100000
    (pid=4388) - to_render = False
    (pid=4388) - is_venv = False
    (pid=4388) - clock_speed = 1
    (pid=4388) - clock = <slm_lab.env.base.Clock object at 0x7fce28f7fcf8>
    (pid=4388) - done = False
    (pid=4388) - total_reward = nan
    (pid=4388) - u_env = <TrackReward<TimeLimit<CartPoleEnv<CartPole-v0>>>>
    (pid=4388) - observation_space = Box(4,)
    (pid=4388) - action_space = Discrete(2)
    (pid=4388) - observable_dim = {'state': 4}
    (pid=4388) - action_dim = 2
    (pid=4388) - is_discrete = True
    (pid=4389) [2020-01-30 11:38:58,339 PID:4453 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
    (pid=4389) [2020-01-30 11:38:58,340 PID:4450 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
    (pid=4389) [2020-01-30 11:38:58,343 PID:4456 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
    (pid=4389) [2020-01-30 11:38:58,345 PID:4450 INFO base.py __init__][2020-01-30 11:38:58,345 PID:4453 INFO base.py __init__] Reinforce:
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddcc0>
    (pid=4389) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4389)  'action_policy': 'default',
    (pid=4389)  'center_return': False,
    (pid=4389)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                        'end_val': 0.001,
    (pid=4389)                        'name': 'linear_decay',
    (pid=4389)                        'start_step': 0,
    (pid=4389)                        'start_val': 0.01},
    (pid=4389)  'explore_var_spec': None,
    (pid=4389)  'gamma': 0.99,
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'training_frequency': 1}
    (pid=4389) - name = Reinforce
    (pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4389) - net_spec = {'clip_grad_val': None,
    (pid=4389)  'hid_layers': [64],
    (pid=4389)  'hid_layers_activation': 'selu',
    (pid=4389)  'loss_spec': {'name': 'MSELoss'},
    (pid=4389)  'lr_scheduler_spec': None,
    (pid=4389)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)  'type': 'MLPNet'}
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddcc0>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bcb710>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bddd68>"
    (pid=4389) }
    (pid=4389) - action_pdtype = default
    (pid=4389) - action_policy = <function default at 0x7fcc21560620>
    (pid=4389) - center_return = False
    (pid=4389) - explore_var_spec = None
    (pid=4389) - entropy_coef_spec = {'end_step': 20000,
    (pid=4389)  'end_val': 0.001,
    (pid=4389)  'name': 'linear_decay',
    (pid=4389)  'start_step': 0,
    (pid=4389)  'start_val': 0.01}
    (pid=4389) - policy_loss_coef = 1.0
    (pid=4389) - gamma = 0.99
    (pid=4389) - training_frequency = 1
    (pid=4389) - to_train = 0
    (pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bddd30>
    (pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdda20>
    (pid=4389) - net = MLPNet(
    (pid=4389)   (model): Sequential(
    (pid=4389)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4389)     (1): SELU()
    (pid=4389)   )
    (pid=4389)   (model_tail): Sequential(
    (pid=4389)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4389)   )
    (pid=4389)   (loss_fn): MSELoss()
    (pid=4389) )
    (pid=4389) - net_names = ['net']
    (pid=4389) - optim = Adam (
    (pid=4389) Parameter Group 0
    (pid=4389)     amsgrad: False
    (pid=4389)     betas: (0.9, 0.999)
    (pid=4389)     eps: 1e-08
    (pid=4389)     lr: 0.002
    (pid=4389)     weight_decay: 0
    (pid=4389) )
    (pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10ba20b8>
    (pid=4389) - global_net = None
    (pid=4389)  Reinforce:
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdddd8>
    (pid=4389) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4389)  'action_policy': 'default',
    (pid=4389)  'center_return': False,
    (pid=4389)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4388) [2020-01-30 11:38:58,330 PID:4445 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
    (pid=4388) [2020-01-30 11:38:58,330 PID:4449 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
    (pid=4388) [2020-01-30 11:38:58,335 PID:4452 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
    (pid=4388) [2020-01-30 11:38:58,335 PID:4449 INFO base.py __init__] Reinforce:
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e097f60>
    (pid=4388) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4388)  'action_policy': 'default',
    (pid=4388)  'center_return': True,
    (pid=4388)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4388)                        'end_val': 0.001,
    (pid=4388)                        'name': 'linear_decay',
    (pid=4388)                        'start_step': 0,
    (pid=4388)                        'start_val': 0.01},
    (pid=4388)  'explore_var_spec': None,
    (pid=4388)  'gamma': 0.99,
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'training_frequency': 1}
    (pid=4388) - name = Reinforce
    (pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4388) - net_spec = {'clip_grad_val': None,
    (pid=4388)  'hid_layers': [64],
    (pid=4388)  'hid_layers_activation': 'selu',
    (pid=4388)  'loss_spec': {'name': 'MSELoss'},
    (pid=4388)  'lr_scheduler_spec': None,
    (pid=4388)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)  'type': 'MLPNet'}
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e097f60>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e097fd0>"
    (pid=4388) }
    (pid=4388) - action_pdtype = default
    (pid=4388) - action_policy = <function default at 0x7fce304ad620>
    (pid=4388) - center_return = True
    (pid=4388) - explore_var_spec = None
    (pid=4388) - entropy_coef_spec = {'end_step': 20000,
    (pid=4388)  'end_val': 0.001,
    (pid=4388)  'name': 'linear_decay',
    (pid=4388)  'start_step': 0,
    (pid=4388)  'start_val': 0.01}
    (pid=4388) - policy_loss_coef = 1.0
    (pid=4388) - gamma = 0.99
    (pid=4388) - training_frequency = 1
    (pid=4388) - to_train = 0
    (pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e097c88>
    (pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e083940>
    (pid=4388) - net = MLPNet(
    (pid=4388)   (model): Sequential(
    (pid=4388)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4388)     (1): SELU()
    (pid=4388)   )
    (pid=4388)   (model_tail): Sequential(
    (pid=4388)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4388)   )
    (pid=4388)   (loss_fn): MSELoss()
    (pid=4388) )
    (pid=4388) - net_names = ['net']
    (pid=4388) - optim = Adam (
    (pid=4388) Parameter Group 0
    (pid=4388)     amsgrad: False
    (pid=4388)     betas: (0.9, 0.999)
    (pid=4388)     eps: 1e-08
    (pid=4388)     lr: 0.002
    (pid=4388)     weight_decay: 0
    (pid=4388) )
    (pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e0562e8>
    (pid=4388) - global_net = None
    (pid=4388) [2020-01-30 11:38:58,335 PID:4445 INFO base.py __init__] Reinforce:
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e098da0>
    (pid=4388) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4388)  'action_policy': 'default',
    (pid=4388)  'center_return': True,
    (pid=4388)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                        'end_val': 0.001,
    (pid=4389)                        'name': 'linear_decay',
    (pid=4389)                        'start_step': 0,
    (pid=4389)                        'start_val': 0.01},
    (pid=4389)  'explore_var_spec': None,
    (pid=4389)  'gamma': 0.99,
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'training_frequency': 1}
    (pid=4389) - name = Reinforce
    (pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4389) - net_spec = {'clip_grad_val': None,
    (pid=4389)  'hid_layers': [64],
    (pid=4389)  'hid_layers_activation': 'selu',
    (pid=4389)  'loss_spec': {'name': 'MSELoss'},
    (pid=4389)  'lr_scheduler_spec': None,
    (pid=4389)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)  'type': 'MLPNet'}
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdddd8>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc5828>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdde80>"
    (pid=4389) }
    (pid=4389) - action_pdtype = default
    (pid=4389) - action_policy = <function default at 0x7fcc21560620>
    (pid=4389) - center_return = False
    (pid=4389) - explore_var_spec = None
    (pid=4389) - entropy_coef_spec = {'end_step': 20000,
    (pid=4389)  'end_val': 0.001,
    (pid=4389)  'name': 'linear_decay',
    (pid=4389)  'start_step': 0,
    (pid=4389)  'start_val': 0.01}
    (pid=4389) - policy_loss_coef = 1.0
    (pid=4389) - gamma = 0.99
    (pid=4389) - training_frequency = 1
    (pid=4389) - to_train = 0
    (pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdde48>
    (pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bddb38>
    (pid=4389) - net = MLPNet(
    (pid=4389)   (model): Sequential(
    (pid=4389)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4389)     (1): SELU()
    (pid=4389)   )
    (pid=4389)   (model_tail): Sequential(
    (pid=4389)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4389)   )
    (pid=4389)   (loss_fn): MSELoss()
    (pid=4389) )
    (pid=4389) - net_names = ['net']
    (pid=4389) - optim = Adam (
    (pid=4389) Parameter Group 0
    (pid=4389)     amsgrad: False
    (pid=4389)     betas: (0.9, 0.999)
    (pid=4389)     eps: 1e-08
    (pid=4389)     lr: 0.002
    (pid=4389)     weight_decay: 0
    (pid=4389) )
    (pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10ba11d0>
    (pid=4389) - global_net = None
    (pid=4389) [2020-01-30 11:38:58,347 PID:4453 INFO __init__.py __init__][2020-01-30 11:38:58,347 PID:4450 INFO __init__.py __init__] Agent:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4389)                'action_policy': 'default',
    (pid=4389)                'center_return': False,
    (pid=4389)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                                      'end_val': 0.001,
    (pid=4389)                                      'name': 'linear_decay',
    (pid=4389)                                      'start_step': 0,
    (pid=4389)                                      'start_val': 0.01},
    (pid=4389)                'explore_var_spec': None,
    (pid=4389)                'gamma': 0.99,
    (pid=4389)                'name': 'Reinforce',
    (pid=4389)                'training_frequency': 1},
    (pid=4389)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4388)                        'end_val': 0.001,
    (pid=4388)                        'name': 'linear_decay',
    (pid=4388)                        'start_step': 0,
    (pid=4388)                        'start_val': 0.01},
    (pid=4388)  'explore_var_spec': None,
    (pid=4388)  'gamma': 0.99,
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'training_frequency': 1}
    (pid=4388) - name = Reinforce
    (pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4388) - net_spec = {'clip_grad_val': None,
    (pid=4388)  'hid_layers': [64],
    (pid=4388)  'hid_layers_activation': 'selu',
    (pid=4388)  'loss_spec': {'name': 'MSELoss'},
    (pid=4388)  'lr_scheduler_spec': None,
    (pid=4388)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)  'type': 'MLPNet'}
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e098da0>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e098e48>"
    (pid=4388) }
    (pid=4388) - action_pdtype = default
    (pid=4388) - action_policy = <function default at 0x7fce304ad620>
    (pid=4388) - center_return = True
    (pid=4388) - explore_var_spec = None
    (pid=4388) - entropy_coef_spec = {'end_step': 20000,
    (pid=4388)  'end_val': 0.001,
    (pid=4388)  'name': 'linear_decay',
    (pid=4388)  'start_step': 0,
    (pid=4388)  'start_val': 0.01}
    (pid=4388) - policy_loss_coef = 1.0
    (pid=4388) - gamma = 0.99
    (pid=4388) - training_frequency = 1
    (pid=4388) - to_train = 0
    (pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e098e10>
    (pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e098f28>
    (pid=4388) - net = MLPNet(
    (pid=4388)   (model): Sequential(
    (pid=4388)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4388)     (1): SELU()
    (pid=4388)   )
    (pid=4388)   (model_tail): Sequential(
    (pid=4388)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4388)   )
    (pid=4388)   (loss_fn): MSELoss()
    (pid=4388) )
    (pid=4388) - net_names = ['net']
    (pid=4388) - optim = Adam (
    (pid=4388) Parameter Group 0
    (pid=4388)     amsgrad: False
    (pid=4388)     betas: (0.9, 0.999)
    (pid=4388)     eps: 1e-08
    (pid=4388)     lr: 0.002
    (pid=4388)     weight_decay: 0
    (pid=4388) )
    (pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e05b1d0>
    (pid=4388) - global_net = None
    (pid=4388) [2020-01-30 11:38:58,336 PID:4449 INFO __init__.py __init__] Agent:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4388)                'action_policy': 'default',
    (pid=4388)                'center_return': True,
    (pid=4388)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4388)                                      'end_val': 0.001,
    (pid=4388)                                      'name': 'linear_decay',
    (pid=4388)                                      'start_step': 0,
    (pid=4388)                                      'start_val': 0.01},
    (pid=4388)                'explore_var_spec': None,
    (pid=4388)                'gamma': 0.99,
    (pid=4388)                'name': 'Reinforce',
    (pid=4388)                'training_frequency': 1},
    (pid=4388)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'net': {'clip_grad_val': None,
    (pid=4389)          'hid_layers': [64],
    (pid=4389)          'hid_layers_activation': 'selu',
    (pid=4389)          'loss_spec': {'name': 'MSELoss'},
    (pid=4389)          'lr_scheduler_spec': None,
    (pid=4389)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)          'type': 'MLPNet'}}
    (pid=4389) - name = Reinforce
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdddd8>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc5828>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdde80>"
    (pid=4389) }
    (pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10bdde10>
    (pid=4389)  Agent:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4389)                'action_policy': 'default',
    (pid=4389)                'center_return': False,
    (pid=4389)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                                      'end_val': 0.001,
    (pid=4389)                                      'name': 'linear_decay',
    (pid=4389)                                      'start_step': 0,
    (pid=4389)                                      'start_val': 0.01},
    (pid=4389)                'explore_var_spec': None,
    (pid=4389)                'gamma': 0.99,
    (pid=4389)                'name': 'Reinforce',
    (pid=4389)                'training_frequency': 1},
    (pid=4389)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'net': {'clip_grad_val': None,
    (pid=4389)          'hid_layers': [64],
    (pid=4389)          'hid_layers_activation': 'selu',
    (pid=4389)          'loss_spec': {'name': 'MSELoss'},
    (pid=4389)          'lr_scheduler_spec': None,
    (pid=4389)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)          'type': 'MLPNet'}}
    (pid=4389) - name = Reinforce
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddcc0>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bcb710>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bddd68>"
    (pid=4389) }
    (pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10bddcf8>
    (pid=4389) [2020-01-30 11:38:58,347 PID:4458 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search[2020-01-30 11:38:58,347 PID:4450 INFO logger.py info][2020-01-30 11:38:58,347 PID:4453 INFO logger.py info]
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'net': {'clip_grad_val': None,
    (pid=4388)          'hid_layers': [64],
    (pid=4388)          'hid_layers_activation': 'selu',
    (pid=4388)          'loss_spec': {'name': 'MSELoss'},
    (pid=4388)          'lr_scheduler_spec': None,
    (pid=4388)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)          'type': 'MLPNet'}}
    (pid=4388) - name = Reinforce
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e097f60>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e097fd0>"
    (pid=4388) }
    (pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e097f98>
    (pid=4388) [2020-01-30 11:38:58,337 PID:4449 INFO logger.py info] Session:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - index = 2
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e097f60>
    (pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>
    (pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044eb8>
    (pid=4388) [2020-01-30 11:38:58,337 PID:4449 INFO logger.py info] Running RL loop for trial 0 session 2
    (pid=4388) [2020-01-30 11:38:58,337 PID:4445 INFO __init__.py __init__] Agent:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4388)                'action_policy': 'default',
    (pid=4388)                'center_return': True,
    (pid=4388)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4388)                                      'end_val': 0.001,
    (pid=4388)                                      'name': 'linear_decay',
    (pid=4388)                                      'start_step': 0,
    (pid=4388)                                      'start_val': 0.01},
    (pid=4388)                'explore_var_spec': None,
    (pid=4388)                'gamma': 0.99,
    (pid=4388)                'name': 'Reinforce',
    (pid=4388)                'training_frequency': 1},
    (pid=4388)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'net': {'clip_grad_val': None,
    (pid=4388)          'hid_layers': [64],
    (pid=4388)          'hid_layers_activation': 'selu',
    (pid=4388)          'loss_spec': {'name': 'MSELoss'},
    (pid=4388)          'lr_scheduler_spec': None,
    (pid=4388)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)          'type': 'MLPNet'}}
    (pid=4388) - name = Reinforce
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e098da0>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4389)  Session:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - index = 0
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddcc0>
    (pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0>
    (pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56cc0> Session:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - index = 1
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdddd8>
    (pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>
    (pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56da0>
    (pid=4389) 
    (pid=4389) [2020-01-30 11:38:58,347 PID:4450 INFO logger.py info] Running RL loop for trial 1 session 0[2020-01-30 11:38:58,347 PID:4453 INFO logger.py info]
    (pid=4389)  Running RL loop for trial 1 session 1
    (pid=4389) [2020-01-30 11:38:58,348 PID:4456 INFO base.py __init__] Reinforce:
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdcf28>
    (pid=4389) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4389)  'action_policy': 'default',
    (pid=4389)  'center_return': False,
    (pid=4389)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                        'end_val': 0.001,
    (pid=4389)                        'name': 'linear_decay',
    (pid=4389)                        'start_step': 0,
    (pid=4389)                        'start_val': 0.01},
    (pid=4389)  'explore_var_spec': None,
    (pid=4389)  'gamma': 0.99,
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'training_frequency': 1}
    (pid=4389) - name = Reinforce
    (pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4389) - net_spec = {'clip_grad_val': None,
    (pid=4389)  'hid_layers': [64],
    (pid=4389)  'hid_layers_activation': 'selu',
    (pid=4389)  'loss_spec': {'name': 'MSELoss'},
    (pid=4389)  'lr_scheduler_spec': None,
    (pid=4389)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)  'type': 'MLPNet'}
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdcf28>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc7940>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdcfd0>"
    (pid=4389) }
    (pid=4389) - action_pdtype = default
    (pid=4389) - action_policy = <function default at 0x7fcc21560620>
    (pid=4389) - center_return = False
    (pid=4389) - explore_var_spec = None
    (pid=4389) - entropy_coef_spec = {'end_step': 20000,
    (pid=4389)  'end_val': 0.001,
    (pid=4389)  'name': 'linear_decay',
    (pid=4389)  'start_step': 0,
    (pid=4389)  'start_val': 0.01}
    (pid=4389) - policy_loss_coef = 1.0
    (pid=4389) - gamma = 0.99
    (pid=4389) - training_frequency = 1
    (pid=4389) - to_train = 0
    (pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdcf98>
    (pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10bdcc50>
    (pid=4389) - net = MLPNet(
    (pid=4389)   (model): Sequential(
    (pid=4389)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4389)     (1): SELU()
    (pid=4389)   )
    (pid=4389)   (model_tail): Sequential(
    (pid=4389)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4389)   )
    (pid=4389)   (loss_fn): MSELoss()
    (pid=4389) )
    (pid=4389) - net_names = ['net']
    (pid=4389) - optim = Adam (
    (pid=4389) Parameter Group 0
    (pid=4389)     amsgrad: False
    (pid=4389)     betas: (0.9, 0.999)
    (pid=4389)     eps: 1e-08
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e098e48>"
    (pid=4388) }
    (pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e098dd8>
    (pid=4388) [2020-01-30 11:38:58,338 PID:4445 INFO logger.py info] Session:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - index = 1
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e098da0>
    (pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>
    (pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044da0>
    (pid=4388) [2020-01-30 11:38:58,338 PID:4445 INFO logger.py info] Running RL loop for trial 0 session 1
    (pid=4388) [2020-01-30 11:38:58,340 PID:4449 INFO __init__.py log_summary] Trial 0 session 2 reinforce_baseline_cartpole_t0_s2 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4388) [2020-01-30 11:38:58,340 PID:4452 INFO base.py __init__] Reinforce:
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e082a58>
    (pid=4388) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4388)  'action_policy': 'default',
    (pid=4388)  'center_return': True,
    (pid=4388)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4388)                        'end_val': 0.001,
    (pid=4388)                        'name': 'linear_decay',
    (pid=4388)                        'start_step': 0,
    (pid=4388)                        'start_val': 0.01},
    (pid=4388)  'explore_var_spec': None,
    (pid=4388)  'gamma': 0.99,
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'training_frequency': 1}
    (pid=4388) - name = Reinforce
    (pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4388) - net_spec = {'clip_grad_val': None,
    (pid=4388)  'hid_layers': [64],
    (pid=4388)  'hid_layers_activation': 'selu',
    (pid=4388)  'loss_spec': {'name': 'MSELoss'},
    (pid=4388)  'lr_scheduler_spec': None,
    (pid=4388)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)  'type': 'MLPNet'}
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e082a58>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e0540b8>"
    (pid=4388) }
    (pid=4388) - action_pdtype = default
    (pid=4388) - action_policy = <function default at 0x7fce304ad620>
    (pid=4388) - center_return = True
    (pid=4388) - explore_var_spec = None
    (pid=4388) - entropy_coef_spec = {'end_step': 20000,
    (pid=4388)  'end_val': 0.001,
    (pid=4388)  'name': 'linear_decay',
    (pid=4388)  'start_step': 0,
    (pid=4388)  'start_val': 0.01}
    (pid=4388) - policy_loss_coef = 1.0
    (pid=4388) - gamma = 0.99
    (pid=4388) - training_frequency = 1
    (pid=4388) - to_train = 0
    (pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e054080>
    (pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e054160>
    (pid=4388) - net = MLPNet(
    (pid=4388)   (model): Sequential(
    (pid=4388)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4388)     (1): SELU()
    (pid=4388)   )
    (pid=4388)   (model_tail): Sequential(
    (pid=4388)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4388)   )
    (pid=4388)   (loss_fn): MSELoss()
    (pid=4388) )
    (pid=4388) - net_names = ['net']
    (pid=4388) - optim = Adam (
    (pid=4388) Parameter Group 0
    (pid=4388)     amsgrad: False
    (pid=4388)     betas: (0.9, 0.999)
    (pid=4388)     eps: 1e-08
    (pid=4389)     lr: 0.002
    (pid=4389)     weight_decay: 0
    (pid=4389) )
    (pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10b9a2e8>
    (pid=4389) - global_net = None
    (pid=4389) [2020-01-30 11:38:58,350 PID:4456 INFO __init__.py __init__] Agent:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4389)                'action_policy': 'default',
    (pid=4389)                'center_return': False,
    (pid=4389)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                                      'end_val': 0.001,
    (pid=4389)                                      'name': 'linear_decay',
    (pid=4389)                                      'start_step': 0,
    (pid=4389)                                      'start_val': 0.01},
    (pid=4389)                'explore_var_spec': None,
    (pid=4389)                'gamma': 0.99,
    (pid=4389)                'name': 'Reinforce',
    (pid=4389)                'training_frequency': 1},
    (pid=4389)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'net': {'clip_grad_val': None,
    (pid=4389)          'hid_layers': [64],
    (pid=4389)          'hid_layers_activation': 'selu',
    (pid=4389)          'loss_spec': {'name': 'MSELoss'},
    (pid=4389)          'lr_scheduler_spec': None,
    (pid=4389)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)          'type': 'MLPNet'}}
    (pid=4389) - name = Reinforce
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bdcf28>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc7940>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10bdcfd0>"
    (pid=4389) }
    (pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10bdcf60>
    (pid=4389) [2020-01-30 11:38:58,351 PID:4456 INFO logger.py info] Session:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - index = 2
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bdcf28>
    (pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>
    (pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56eb8>
    (pid=4389) [2020-01-30 11:38:58,351 PID:4456 INFO logger.py info] Running RL loop for trial 1 session 2
    (pid=4389) [2020-01-30 11:38:58,351 PID:4450 INFO __init__.py log_summary] Trial 1 session 0 reinforce_baseline_cartpole_t1_s0 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4389) [2020-01-30 11:38:58,351 PID:4453 INFO __init__.py log_summary] Trial 1 session 1 reinforce_baseline_cartpole_t1_s1 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4389) [2020-01-30 11:38:58,352 PID:4458 INFO base.py __init__] Reinforce:
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddd68>
    (pid=4389) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4389)  'action_policy': 'default',
    (pid=4389)  'center_return': False,
    (pid=4389)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                        'end_val': 0.001,
    (pid=4389)                        'name': 'linear_decay',
    (pid=4389)                        'start_step': 0,
    (pid=4389)                        'start_val': 0.01},
    (pid=4389)  'explore_var_spec': None,
    (pid=4389)  'gamma': 0.99,
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'training_frequency': 1}
    (pid=4389) - name = Reinforce
    (pid=4389) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4389) - net_spec = {'clip_grad_val': None,
    (pid=4389)  'hid_layers': [64],
    (pid=4389)  'hid_layers_activation': 'selu',
    (pid=4389)  'loss_spec': {'name': 'MSELoss'},
    (pid=4389)  'lr_scheduler_spec': None,
    (pid=4389)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)  'type': 'MLPNet'}
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddd68>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4388)     lr: 0.002
    (pid=4388)     weight_decay: 0
    (pid=4388) )
    (pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e054400>
    (pid=4388) - global_net = None
    (pid=4388) [2020-01-30 11:38:58,342 PID:4445 INFO __init__.py log_summary] Trial 0 session 1 reinforce_baseline_cartpole_t0_s1 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4388) [2020-01-30 11:38:58,342 PID:4452 INFO __init__.py __init__] Agent:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4388)                'action_policy': 'default',
    (pid=4388)                'center_return': True,
    (pid=4388)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4388)                                      'end_val': 0.001,
    (pid=4388)                                      'name': 'linear_decay',
    (pid=4388)                                      'start_step': 0,
    (pid=4388)                                      'start_val': 0.01},
    (pid=4388)                'explore_var_spec': None,
    (pid=4388)                'gamma': 0.99,
    (pid=4388)                'name': 'Reinforce',
    (pid=4388)                'training_frequency': 1},
    (pid=4388)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'net': {'clip_grad_val': None,
    (pid=4388)          'hid_layers': [64],
    (pid=4388)          'hid_layers_activation': 'selu',
    (pid=4388)          'loss_spec': {'name': 'MSELoss'},
    (pid=4388)          'lr_scheduler_spec': None,
    (pid=4388)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)          'type': 'MLPNet'}}
    (pid=4388) - name = Reinforce
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e082a58>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e0540b8>"
    (pid=4388) }
    (pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e054048>
    (pid=4388) [2020-01-30 11:38:58,342 PID:4452 INFO logger.py info] Session:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - index = 3
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e082a58>
    (pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>
    (pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044fd0>
    (pid=4388) [2020-01-30 11:38:58,342 PID:4452 INFO logger.py info] Running RL loop for trial 0 session 3
    (pid=4388) [2020-01-30 11:38:58,343 PID:4440 INFO base.py post_init_nets] Initialized algorithm models for lab_mode: search
    (pid=4388) [2020-01-30 11:38:58,346 PID:4452 INFO __init__.py log_summary] Trial 0 session 3 reinforce_baseline_cartpole_t0_s3 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4388) [2020-01-30 11:38:58,348 PID:4440 INFO base.py __init__] Reinforce:
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e09ac88>
    (pid=4388) - algorithm_spec = {'action_pdtype': 'default',
    (pid=4388)  'action_policy': 'default',
    (pid=4388)  'center_return': True,
    (pid=4388)  'entropy_coef_spec': {'end_step': 20000,
    (pid=4388)                        'end_val': 0.001,
    (pid=4388)                        'name': 'linear_decay',
    (pid=4388)                        'start_step': 0,
    (pid=4388)                        'start_val': 0.01},
    (pid=4388)  'explore_var_spec': None,
    (pid=4388)  'gamma': 0.99,
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'training_frequency': 1}
    (pid=4388) - name = Reinforce
    (pid=4388) - memory_spec = {'name': 'OnPolicyReplay'}
    (pid=4388) - net_spec = {'clip_grad_val': None,
    (pid=4388)  'hid_layers': [64],
    (pid=4388)  'hid_layers_activation': 'selu',
    (pid=4388)  'loss_spec': {'name': 'MSELoss'},
    (pid=4388)  'lr_scheduler_spec': None,
    (pid=4388)  'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)  'type': 'MLPNet'}
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e09ac88>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388) terminate called after throwing an instance of 'c10::Error'
    (pid=4388)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4388) 
    (pid=4388) Fatal Python error: Aborted
    (pid=4388) 
    (pid=4388) Stack (most recent call first):
    (pid=4388) terminate called after throwing an instance of 'c10::Error'
    (pid=4388)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4388) 
    (pid=4388) Fatal Python error: Aborted
    (pid=4388) 
    (pid=4388) Stack (most recent call first):
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc6a58>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10b9a0b8>"
    (pid=4389) }
    (pid=4389) - action_pdtype = default
    (pid=4389) - action_policy = <function default at 0x7fcc21560620>
    (pid=4389) - center_return = False
    (pid=4389) - explore_var_spec = None
    (pid=4389) - entropy_coef_spec = {'end_step': 20000,
    (pid=4389)  'end_val': 0.001,
    (pid=4389)  'name': 'linear_decay',
    (pid=4389)  'start_step': 0,
    (pid=4389)  'start_val': 0.01}
    (pid=4389) - policy_loss_coef = 1.0
    (pid=4389) - gamma = 0.99
    (pid=4389) - training_frequency = 1
    (pid=4389) - to_train = 0
    (pid=4389) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10b9a080>
    (pid=4389) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fcc10b9a160>
    (pid=4389) - net = MLPNet(
    (pid=4389)   (model): Sequential(
    (pid=4389)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4389)     (1): SELU()
    (pid=4389)   )
    (pid=4389)   (model_tail): Sequential(
    (pid=4389)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4389)   )
    (pid=4389)   (loss_fn): MSELoss()
    (pid=4389) )
    (pid=4389) - net_names = ['net']
    (pid=4389) - optim = Adam (
    (pid=4389) Parameter Group 0
    (pid=4389)     amsgrad: False
    (pid=4389)     betas: (0.9, 0.999)
    (pid=4389)     eps: 1e-08
    (pid=4389)     lr: 0.002
    (pid=4389)     weight_decay: 0
    (pid=4389) )
    (pid=4389) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fcc10b9a400>
    (pid=4389) - global_net = None
    (pid=4389) [2020-01-30 11:38:58,354 PID:4458 INFO __init__.py __init__] Agent:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4389)                'action_policy': 'default',
    (pid=4389)                'center_return': False,
    (pid=4389)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4389)                                      'end_val': 0.001,
    (pid=4389)                                      'name': 'linear_decay',
    (pid=4389)                                      'start_step': 0,
    (pid=4389)                                      'start_val': 0.01},
    (pid=4389)                'explore_var_spec': None,
    (pid=4389)                'gamma': 0.99,
    (pid=4389)                'name': 'Reinforce',
    (pid=4389)                'training_frequency': 1},
    (pid=4389)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4389)  'name': 'Reinforce',
    (pid=4389)  'net': {'clip_grad_val': None,
    (pid=4389)          'hid_layers': [64],
    (pid=4389)          'hid_layers_activation': 'selu',
    (pid=4389)          'loss_spec': {'name': 'MSELoss'},
    (pid=4389)          'lr_scheduler_spec': None,
    (pid=4389)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4389)          'type': 'MLPNet'}}
    (pid=4389) - name = Reinforce
    (pid=4389) - body = body: {
    (pid=4389)   "agent": "<slm_lab.agent.Agent object at 0x7fcc10bddd68>",
    (pid=4389)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>",
    (pid=4389)   "a": 0,
    (pid=4389)   "e": 0,
    (pid=4389)   "b": 0,
    (pid=4389)   "aeb": "(0, 0, 0)",
    (pid=4389)   "explore_var": NaN,
    (pid=4389)   "entropy_coef": 0.01,
    (pid=4389)   "loss": NaN,
    (pid=4389)   "mean_entropy": NaN,
    (pid=4389)   "mean_grad_norm": NaN,
    (pid=4389)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "total_reward_ma": NaN,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e09ad30>"
    (pid=4388) }
    (pid=4388) - action_pdtype = default
    (pid=4388) - action_policy = <function default at 0x7fce304ad620>
    (pid=4388) - center_return = True
    (pid=4388) - explore_var_spec = None
    (pid=4388) - entropy_coef_spec = {'end_step': 20000,
    (pid=4388)  'end_val': 0.001,
    (pid=4388)  'name': 'linear_decay',
    (pid=4388)  'start_step': 0,
    (pid=4388)  'start_val': 0.01}
    (pid=4388) - policy_loss_coef = 1.0
    (pid=4388) - gamma = 0.99
    (pid=4388) - training_frequency = 1
    (pid=4388) - to_train = 0
    (pid=4388) - explore_var_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e09acf8>
    (pid=4388) - entropy_coef_scheduler = <slm_lab.agent.algorithm.policy_util.VarScheduler object at 0x7fce0e09ae10>
    (pid=4388) - net = MLPNet(
    (pid=4388)   (model): Sequential(
    (pid=4388)     (0): Linear(in_features=4, out_features=64, bias=True)
    (pid=4388)     (1): SELU()
    (pid=4388)   )
    (pid=4388)   (model_tail): Sequential(
    (pid=4388)     (0): Linear(in_features=64, out_features=2, bias=True)
    (pid=4388)   )
    (pid=4388)   (loss_fn): MSELoss()
    (pid=4388) )
    (pid=4388) - net_names = ['net']
    (pid=4388) - optim = Adam (
    (pid=4388) Parameter Group 0
    (pid=4388)     amsgrad: False
    (pid=4388)     betas: (0.9, 0.999)
    (pid=4388)     eps: 1e-08
    (pid=4388)     lr: 0.002
    (pid=4388)     weight_decay: 0
    (pid=4388) )
    (pid=4388) - lr_scheduler = <slm_lab.agent.net.net_util.NoOpLRScheduler object at 0x7fce0e05c0b8>
    (pid=4388) - global_net = None
    (pid=4388) [2020-01-30 11:38:58,350 PID:4440 INFO __init__.py __init__] Agent:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - agent_spec = {'algorithm': {'action_pdtype': 'default',
    (pid=4388)                'action_policy': 'default',
    (pid=4388)                'center_return': True,
    (pid=4388)                'entropy_coef_spec': {'end_step': 20000,
    (pid=4388)                                      'end_val': 0.001,
    (pid=4388)                                      'name': 'linear_decay',
    (pid=4388)                                      'start_step': 0,
    (pid=4388)                                      'start_val': 0.01},
    (pid=4388)                'explore_var_spec': None,
    (pid=4388)                'gamma': 0.99,
    (pid=4388)                'name': 'Reinforce',
    (pid=4388)                'training_frequency': 1},
    (pid=4388)  'memory': {'name': 'OnPolicyReplay'},
    (pid=4388)  'name': 'Reinforce',
    (pid=4388)  'net': {'clip_grad_val': None,
    (pid=4388)          'hid_layers': [64],
    (pid=4388)          'hid_layers_activation': 'selu',
    (pid=4388)          'loss_spec': {'name': 'MSELoss'},
    (pid=4388)          'lr_scheduler_spec': None,
    (pid=4388)          'optim_spec': {'lr': 0.002, 'name': 'Adam'},
    (pid=4388)          'type': 'MLPNet'}}
    (pid=4388) - name = Reinforce
    (pid=4388) - body = body: {
    (pid=4388)   "agent": "<slm_lab.agent.Agent object at 0x7fce0e09ac88>",
    (pid=4388)   "env": "<slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>",
    (pid=4388)   "a": 0,
    (pid=4388)   "e": 0,
    (pid=4388)   "b": 0,
    (pid=4388)   "aeb": "(0, 0, 0)",
    (pid=4388)   "explore_var": NaN,
    (pid=4388)   "entropy_coef": 0.01,
    (pid=4388)   "loss": NaN,
    (pid=4388)   "mean_entropy": NaN,
    (pid=4388)   "mean_grad_norm": NaN,
    (pid=4388)   "best_total_reward_ma": -Infinity,
    (pid=4389)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4389)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fcc10bc6a58>",
    (pid=4389)   "tb_actions": [],
    (pid=4389)   "tb_tracker": {},
    (pid=4389)   "observation_space": "Box(4,)",
    (pid=4389)   "action_space": "Discrete(2)",
    (pid=4389)   "observable_dim": {
    (pid=4389)     "state": 4
    (pid=4389)   },
    (pid=4389)   "state_dim": 4,
    (pid=4389)   "action_dim": 2,
    (pid=4389)   "is_discrete": true,
    (pid=4389)   "action_type": "discrete",
    (pid=4389)   "action_pdtype": "Categorical",
    (pid=4389)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4389)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fcc10b9a0b8>"
    (pid=4389) }
    (pid=4389) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fcc10b9a048>
    (pid=4389) [2020-01-30 11:38:58,354 PID:4458 INFO logger.py info] Session:
    (pid=4389) - spec = reinforce_baseline_cartpole
    (pid=4389) - index = 3
    (pid=4389) - agent = <slm_lab.agent.Agent object at 0x7fcc10bddd68>
    (pid=4389) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>
    (pid=4389) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fcc10c56fd0>
    (pid=4389) [2020-01-30 11:38:58,354 PID:4458 INFO logger.py info] Running RL loop for trial 1 session 3
    (pid=4389) [2020-01-30 11:38:58,355 PID:4456 INFO __init__.py log_summary] Trial 1 session 2 reinforce_baseline_cartpole_t1_s2 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4389) [2020-01-30 11:38:58,358 PID:4458 INFO __init__.py log_summary] Trial 1 session 3 reinforce_baseline_cartpole_t1_s3 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4388)   "total_reward_ma": NaN,
    (pid=4388)   "train_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "eval_df": "Empty DataFrame\nColumns: [epi, t, wall_t, opt_step, frame, fps, total_reward, total_reward_ma, loss, lr, explore_var, entropy_coef, entropy, grad_norm]\nIndex: []",
    (pid=4388)   "tb_writer": "<torch.utils.tensorboard.writer.SummaryWriter object at 0x7fce2b00a780>",
    (pid=4388)   "tb_actions": [],
    (pid=4388)   "tb_tracker": {},
    (pid=4388)   "observation_space": "Box(4,)",
    (pid=4388)   "action_space": "Discrete(2)",
    (pid=4388)   "observable_dim": {
    (pid=4388)     "state": 4
    (pid=4388)   },
    (pid=4388)   "state_dim": 4,
    (pid=4388)   "action_dim": 2,
    (pid=4388)   "is_discrete": true,
    (pid=4388)   "action_type": "discrete",
    (pid=4388)   "action_pdtype": "Categorical",
    (pid=4388)   "ActionPD": "<class 'torch.distributions.categorical.Categorical'>",
    (pid=4388)   "memory": "<slm_lab.agent.memory.onpolicy.OnPolicyReplay object at 0x7fce0e09ad30>"
    (pid=4388) }
    (pid=4388) - algorithm = <slm_lab.agent.algorithm.reinforce.Reinforce object at 0x7fce0e09acc0>
    (pid=4388) [2020-01-30 11:38:58,350 PID:4440 INFO logger.py info] Session:
    (pid=4388) - spec = reinforce_baseline_cartpole
    (pid=4388) - index = 0
    (pid=4388) - agent = <slm_lab.agent.Agent object at 0x7fce0e09ac88>
    (pid=4388) - env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>
    (pid=4388) - eval_env = <slm_lab.env.openai.OpenAIEnv object at 0x7fce28044cc0>
    (pid=4388) [2020-01-30 11:38:58,350 PID:4440 INFO logger.py info] Running RL loop for trial 0 session 0
    (pid=4388) [2020-01-30 11:38:58,354 PID:4440 INFO __init__.py log_summary] Trial 0 session 0 reinforce_baseline_cartpole_t0_s0 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.002  explore_var: nan  entropy_coef: 0.01  entropy: nan  grad_norm: nan
    (pid=4388) terminate called after throwing an instance of 'c10::Error'
    (pid=4388)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4388) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcf770dedc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4388) frame #1: <unknown function> + 0xca67 (0x7fcf6f2daa67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4388) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcf6f9fbb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4388) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcfa636128a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4388) frame #4: <unknown function> + 0xc8421 (0x7fcfbb3bd421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4388) frame #5: <unknown function> + 0x76db (0x7fcfc0c466db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4388) frame #6: clone + 0x3f (0x7fcfc096f88f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4388) 
    (pid=4388) Fatal Python error: Aborted
    (pid=4388) 
    (pid=4388) Stack (most recent call first):
    (pid=4389) terminate called after throwing an instance of 'c10::Error'
    (pid=4389)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4389) 
    (pid=4389) Fatal Python error: Aborted
    (pid=4389) 
    (pid=4389) Stack (most recent call first):
    (pid=4389) terminate called after throwing an instance of 'c10::Error'
    (pid=4389)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4389) 
    (pid=4389) Fatal Python error: Aborted
    (pid=4389) 
    (pid=4389) Stack (most recent call first):
    (pid=4389) terminate called after throwing an instance of 'c10::Error'
    (pid=4389)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4389) 
    (pid=4389) Fatal Python error: Aborted
    (pid=4389) 
    (pid=4389) Stack (most recent call first):
    (pid=4389) terminate called after throwing an instance of 'c10::Error'
    (pid=4389)   what():  CUDA error: initialization error (getDevice at /opt/conda/conda-bld/pytorch_1556653114079/work/c10/cuda/impl/CUDAGuardImpl.h:35)
    (pid=4389) frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7fcd68190dc5 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10.so)
    (pid=4389) frame #1: <unknown function> + 0xca67 (0x7fcd6038ca67 in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
    (pid=4389) frame #2: torch::autograd::Engine::thread_init(int) + 0x3ee (0x7fcd60aadb1e in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
    (pid=4389) frame #3: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7fcd9741328a in /home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
    (pid=4389) frame #4: <unknown function> + 0xc8421 (0x7fcdac471421 in /home/joe/anaconda3/envs/lab/bin/../lib/libstdc++.so.6)
    (pid=4389) frame #5: <unknown function> + 0x76db (0x7fcdb1cfa6db in /lib/x86_64-linux-gnu/libpthread.so.0)
    (pid=4389) frame #6: clone + 0x3f (0x7fcdb1a2388f in /lib/x86_64-linux-gnu/libc.so.6)
    (pid=4389) 
    (pid=4389) Fatal Python error: Aborted
    (pid=4389) 
    (pid=4389) Stack (most recent call first):
    (pid=4388) 2020-01-30 11:38:58,550	ERROR function_runner.py:96 -- Runner Thread raised error.
    (pid=4388) Traceback (most recent call last):
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
    (pid=4388)     self._entrypoint()
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
    (pid=4388)     return self._trainable_func(config, self._status_reporter)
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
    (pid=4388)     output = train_func(config, reporter)
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
    (pid=4388)     metrics = Trial(spec).run()
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
    (pid=4388)     metrics = analysis.analyze_trial(self.spec, session_metrics_list)
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
    (pid=4388)     trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
    (pid=4388)     frames = session_metrics_list[0]['local']['frames']
    (pid=4388) IndexError: list index out of range
    (pid=4388) Exception in thread Thread-1:
    (pid=4388) Traceback (most recent call last):
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
    (pid=4388)     self._entrypoint()
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
    (pid=4388)     return self._trainable_func(config, self._status_reporter)
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
    (pid=4388)     output = train_func(config, reporter)
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
    (pid=4388)     metrics = Trial(spec).run()
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
    (pid=4388)     metrics = analysis.analyze_trial(self.spec, session_metrics_list)
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
    (pid=4388)     trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
    (pid=4388)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
    (pid=4388)     frames = session_metrics_list[0]['local']['frames']
    (pid=4388) IndexError: list index out of range
    (pid=4388) 
    (pid=4388) During handling of the above exception, another exception occurred:
    (pid=4388) 
    (pid=4388) Traceback (most recent call last):
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    (pid=4388)     self.run()
    (pid=4388)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 102, in run
    (pid=4388)     err_tb = err_tb.format_exc()
    (pid=4388) AttributeError: 'traceback' object has no attribute 'format_exc'
    (pid=4388) 
    (pid=4389) 2020-01-30 11:38:58,570	ERROR function_runner.py:96 -- Runner Thread raised error.
    (pid=4389) Traceback (most recent call last):
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
    (pid=4389)     self._entrypoint()
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
    (pid=4389)     return self._trainable_func(config, self._status_reporter)
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
    (pid=4389)     output = train_func(config, reporter)
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
    (pid=4389)     metrics = Trial(spec).run()
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
    (pid=4389)     metrics = analysis.analyze_trial(self.spec, session_metrics_list)
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
    (pid=4389)     trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
    (pid=4389)     frames = session_metrics_list[0]['local']['frames']
    (pid=4389) IndexError: list index out of range
    (pid=4389) Exception in thread Thread-1:
    (pid=4389) Traceback (most recent call last):
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 90, in run
    (pid=4389)     self._entrypoint()
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 141, in entrypoint
    (pid=4389)     return self._trainable_func(config, self._status_reporter)
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 249, in _trainable_func
    (pid=4389)     output = train_func(config, reporter)
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 90, in ray_trainable
    (pid=4389)     metrics = Trial(spec).run()
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 181, in run
    (pid=4389)     metrics = analysis.analyze_trial(self.spec, session_metrics_list)
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 265, in analyze_trial
    (pid=4389)     trial_metrics = calc_trial_metrics(session_metrics_list, info_prepath)
    (pid=4389)   File "/home/joe/SLM-Lab/slm_lab/experiment/analysis.py", line 187, in calc_trial_metrics
    (pid=4389)     frames = session_metrics_list[0]['local']['frames']
    (pid=4389) IndexError: list index out of range
    (pid=4389) 
    (pid=4389) During handling of the above exception, another exception occurred:
    (pid=4389) 
    (pid=4389) Traceback (most recent call last):
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/threading.py", line 917, in _bootstrap_inner
    (pid=4389)     self.run()
    (pid=4389)   File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 102, in run
    (pid=4389)     err_tb = err_tb.format_exc()
    (pid=4389) AttributeError: 'traceback' object has no attribute 'format_exc'
    (pid=4389) 
    2020-01-30 11:38:59,690	ERROR trial_runner.py:497 -- Error processing event.
    Traceback (most recent call last):
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 446, in _process_trial
        result = self.trial_executor.fetch_result(trial)
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 316, in fetch_result
        result = ray.get(trial_future[0])
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/worker.py", line 2197, in get
        raise value
    ray.exceptions.RayTaskError: ray_worker (pid=4388, host=Gauss)
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trainable.py", line 151, in train
        result = self._train()
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 203, in _train
        ("Wrapped function ran until completion without reporting "
    ray.tune.error.TuneError: Wrapped function ran until completion without reporting results or raising an exception.
    
    2020-01-30 11:38:59,694	INFO ray_trial_executor.py:180 -- Destroying actor for trial ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
    2020-01-30 11:38:59,705	ERROR trial_runner.py:497 -- Error processing event.
    Traceback (most recent call last):
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 446, in _process_trial
        result = self.trial_executor.fetch_result(trial)
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 316, in fetch_result
        result = ray.get(trial_future[0])
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/worker.py", line 2197, in get
        raise value
    ray.exceptions.RayTaskError: ray_worker (pid=4389, host=Gauss)
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/trainable.py", line 151, in train
        result = self._train()
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/function_runner.py", line 203, in _train
        ("Wrapped function ran until completion without reporting "
    ray.tune.error.TuneError: Wrapped function ran until completion without reporting results or raising an exception.
    
    2020-01-30 11:38:59,707	INFO ray_trial_executor.py:180 -- Destroying actor for trial ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
    == Status ==
    Using FIFO scheduling algorithm.
    Resources requested: 0/8 CPUs, 0/1 GPUs
    Memory usage on this node: 2.5/16.7 GB
    Result logdir: /home/joe/ray_results/reinforce_baseline_cartpole
    Number of trials: 2 ({'ERROR': 2})
    ERROR trials:
     - ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0:	ERROR, 1 failures: /home/joe/ray_results/reinforce_baseline_cartpole/ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0_2020-01-30_11-38-57n2qc80ke/error_2020-01-30_11-38-59.txt
     - ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1:	ERROR, 1 failures: /home/joe/ray_results/reinforce_baseline_cartpole/ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1_2020-01-30_11-38-57unqmlqvg/error_2020-01-30_11-38-59.txt
    
    Traceback (most recent call last):
      File "run_lab.py", line 80, in <module>
        main()
      File "run_lab.py", line 72, in main
        read_spec_and_run(*args)
      File "run_lab.py", line 56, in read_spec_and_run
        run_spec(spec, lab_mode)
      File "run_lab.py", line 35, in run_spec
        Experiment(spec).run()
      File "/home/joe/SLM-Lab/slm_lab/experiment/control.py", line 203, in run
        trial_data_dict = search.run_ray_search(self.spec)
      File "/home/joe/SLM-Lab/slm_lab/experiment/search.py", line 124, in run_ray_search
        server_port=util.get_port(),
      File "/home/joe/anaconda3/envs/lab/lib/python3.7/site-packages/ray/tune/tune.py", line 265, in run
        raise TuneError("Trials did not complete", errored_trials)
    ray.tune.error.TuneError: ('Trials did not complete', [ray_trainable_0_agent.0.algorithm.center_return=True,trial_index=0, ray_trainable_1_agent.0.algorithm.center_return=False,trial_index=1])
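
    A minimal workaround sketch, offered as an assumption rather than a confirmed fix from this thread: the repeated "CUDA error: initialization error" above is the classic symptom of CUDA being initialized in a parent process and then touched again inside forked worker processes, and the log shows the sessions running under several child PIDs. Hiding the GPU before anything initializes torch keeps the whole search on CPU, which is plenty for CartPole-sized specs, and the later IndexError in calc_trial_metrics should then go away because the sessions presumably stop crashing before producing metrics.

    # hypothetical workaround sketch -- hide the GPU so forked workers never touch CUDA
    import os
    os.environ['CUDA_VISIBLE_DEVICES'] = ''   # must run before torch initializes CUDA

    import torch
    assert not torch.cuda.is_available()      # the search now runs purely on CPU

    # equivalently, from the shell:
    #   CUDA_VISIBLE_DEVICES="" python run_lab.py slm_lab/spec/benchmark/reinforce/reinforce_cartpole.json reinforce_baseline_cartpole search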
    
  • ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command

    ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command

    Describe the bug After successfully installing SLM-Lab and proceeding to the "Quick Start" portion, which involves running DQN on the CartPole environment, everything works well at first (i.e. final_return_ma increases).

    Command entered: python run_lab.py slm_lab/spec/demo.json dqn_cartpole dev

    After several log summary and metric lines, an OpenGL error code occurs:

    [101017:1015/191313.594764:ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command

    and then the process seems to end without showing any graphs.

    To Reproduce

    1. OS and environment: Ubuntu 20.04 LTS

    2. SLM Lab git SHA (run git rev-parse HEAD to get it): dda02d00031553aeda4c49c5baa7d0706c53996b

    3. spec file used: slm_lab/spec/demo.json

    Error logs

    [2020-10-15 19:13:09,800 PID:100781 INFO __init__.py log_summary] Trial 0 session 0 dqn_cartpole_t0_s0 [train_df] epi: 123  t: 120  wall_t: 153  opt_step: 398720  frame: 10000  fps: 65.3595  total_reward: 200  total_reward_ma: 142.7  loss: 5.46846  lr: 0.00774841  explore_var: 0.1  entropy_coef: nan  entropy: nan  grad_norm: 0.230459
    [2020-10-15 19:13:09,821 PID:100781 INFO __init__.py log_metrics] Trial 0 session 0 dqn_cartpole_t0_s0 [train_df metrics] final_return_ma: 142.7  strength: 120.84  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 0.00019783  training_efficiency: 5.02079e-06  stability: 0.926742
    [100946:1015/191310.923076:ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command
    [2020-10-15 19:13:12,794 PID:100781 INFO __init__.py log_metrics] Trial 0 session 0 dqn_cartpole_t0_s0 [eval_df metrics] final_return_ma: 142.7  strength: 120.84  max_strength: 178.14  final_strength: 178.14  sample_efficiency: 0.00019783  training_efficiency: 5.02079e-06  stability: 0.926742
    [2020-10-15 19:13:12,798 PID:100781 INFO logger.py info] Session 0 done
    [101017:1015/191313.594764:ERROR:buffer_manager.cc(488)] [.DisplayCompositor]GL ERROR :GL_INVALID_OPERATION : glBufferData: <- error from previous GL command
    [2020-10-15 19:13:15,443 PID:100781 INFO logger.py info] Trial 0 done
    
    
    
    
  • Error at end the execution

    Error at end the execution

    Hi, I get stuck at the end of the trial: when it finishes, it can't create the respective graphs, and I get the traceback below. What can it be?

    Traceback (most recent call last):
      File "run_lab.py", line 63, in <module>
        main()
      File "run_lab.py", line 59, in main
        run_by_mode(spec_file, spec_name, lab_mode)
      File "run_lab.py", line 38, in run_by_mode
        Trial(spec).run()
      File "/home/kelo/librerias/SLM-Lab/slm_lab/experiment/control.py", line 122, in run
        session_datas = util.parallelize_fn(self.init_session_and_run, info_spaces, num_cpus)
      File "/home/kelo/librerias/SLM-Lab/slm_lab/lib/util.py", line 533, in parallelize_fn
        results = pool.map(fn, args)
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
        raise self._value
    ValueError: Invalid property specified for object of type plotly.graph_objs.Layout: 'yaxis2'
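
    A small diagnostic sketch, under the assumption (not confirmed in this thread) that the rejected 'yaxis2' property points to a plotly version mismatch between the installed package and the one the repo's environment file expects; printing the installed version is a cheap first check before digging into the plotting code.

    # diagnostic sketch only -- compare against the plotly version pinned by the repo
    import plotly
    print(plotly.__version__)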

  • Arch Install

    Arch Install

    Hi, I'm having trouble with the installation because of the Linux distro. Can you indicate the packages required for a correct installation, so that the "yarn install" command runs?

    It looks like a great framework and I'd like to test it. Thanks and regards.

  • How to add a non-gym environment?

    How to add a non-gym environment?

    Hi, kengz, how do I add a non-gym environment, such as a Mahjong or Poker environment from the rlcard project (https://github.com/datamllab/rlcard)? Would you provide a simple demo for adding a new non-gym env, or give some suggestions on how to add one quickly?
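
    Not an answer from the maintainers, just a generic sketch: one common route is to wrap the non-gym environment behind the standard gym.Env interface (reset/step plus observation_space/action_space), since a gym-compatible env is what the lab's OpenAI env wrapper consumes. Every name below (CardGameEnv, game.reset, game.play) and the space shapes are hypothetical placeholders for whatever the rlcard environment actually exposes.

    # hypothetical gym-style adapter around a turn-based card game
    import gym
    import numpy as np
    from gym import spaces

    class CardGameEnv(gym.Env):
        def __init__(self, game):
            self.game = game  # assumed to expose reset() and play(action)
            self.observation_space = spaces.Box(low=0, high=1, shape=(52,), dtype=np.float32)
            self.action_space = spaces.Discrete(52)

        def reset(self):
            state = self.game.reset()
            return np.asarray(state, dtype=np.float32)

        def step(self, action):
            state, reward, done = self.game.play(action)  # assumed game API
            return np.asarray(state, dtype=np.float32), float(reward), bool(done), {}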

  • Why do I get "terminating"?

    Why do I get "terminating"?

    Hi!

    I get "terminating" when training in search mode and connecting to the env via gRPC. The log only shows "(pid=2023) terminating" with nothing else about it, and my process is killed at the same time. Why do I get that? @kengz @lgraesser

  • missing module cv2

    missing module cv2

    /SLM-Lab/slm_lab/lib/util.py", line 5, in <module>
        import cv2
    ModuleNotFoundError: No module named 'cv2'

    To Reproduce

    1. OS used: Ubuntu 18 LTS
    2. SLM-Lab git: git cloned
    3. demo.json not working

    Additional context had to add cmake libgcc manually

    Error logs

    (base) l*@l*-HP-Pavilion-dv7-PC:~/SLM-Lab$ python3 run_lab.py slm_lab/spec/demo.json dqn_cartpole dev
    Traceback (most recent call last):
      File "run_lab.py", line 10, in <module>
        from slm_lab.experiment import analysis, retro_analysis
      File "/home/l*/SLM-Lab/slm_lab/experiment/analysis.py", line 5, in <module>
        from slm_lab.agent import AGENT_DATA_NAMES
      File "/home/l*/SLM-Lab/slm_lab/agent/__init__.py", line 21, in <module>
        from slm_lab.agent import algorithm, memory
      File "/home/l*/SLM-Lab/slm_lab/agent/algorithm/__init__.py", line 8, in <module>
        from .actor_critic import *
      File "/home/l*/SLM-Lab/slm_lab/agent/algorithm/actor_critic.py", line 1, in <module>
        from slm_lab.agent import net
      File "/home/l*/SLM-Lab/slm_lab/agent/net/__init__.py", line 6, in <module>
        from slm_lab.agent.net.conv import *
      File "/home/l*/SLM-Lab/slm_lab/agent/net/conv.py", line 1, in <module>
        from slm_lab.agent.net import net_util
      File "/home/l*/SLM-Lab/slm_lab/agent/net/net_util.py", line 3, in <module>
        from slm_lab.lib import logger, util
      File "/home/l*/SLM-Lab/slm_lab/lib/logger.py", line 1, in <module>
        from slm_lab.lib import util
      File "/home/l*/SLM-Lab/slm_lab/lib/util.py", line 5, in <module>
        import cv2
    ModuleNotFoundError: No module named 'cv2'
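
    For what it's worth, a minimal sketch of the usual fix (assuming the rest of the environment setup completed): cv2 is provided by the opencv-python package, so installing it into the active conda env and re-checking the import should clear the error.

    # sketch: install the missing package into the active env, then verify the import
    #   pip install opencv-python
    import cv2
    print(cv2.__version__)   # if this prints, run_lab.py should get past the import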

  • Undefined names

    Undefined names

    Undefined names have the potential to raise NameError at runtime.

    flake8 testing of https://github.com/kengz/SLM-Lab on Python 3.6.3

    $ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics

    ./slm_lab/agent/algorithm/base.py:73:16: F821 undefined name 'action'
            return action
                   ^
    ./slm_lab/agent/algorithm/base.py:99:16: F821 undefined name 'batch'
            return batch
                   ^
    ./slm_lab/agent/algorithm/policy_util.py:43:13: F821 undefined name 'new_prob'
                new_prob[torch.argmax(probs, dim=0)] = 1.0
                ^
    ./slm_lab/env/__init__.py:97:49: F821 undefined name 'nvec'
            setattr(gym_space, 'low', np.zeros_like(nvec))
                                                    ^
    ./slm_lab/experiment/search.py:131:9: F821 undefined name 'config'
            config['trial_index'] = self.experiment.info_space.tick('trial')['trial']
            ^
    ./slm_lab/experiment/search.py:133:16: F821 undefined name 'config'
            return config
                   ^
    ./slm_lab/experiment/search.py:146:16: F821 undefined name 'trial_data_dict'
            return trial_data_dict
                   ^
    ./test/agent/net/test_nn.py:83:25: F821 undefined name 'net_util'
            before_params = net_util.copy_trainable_params(net)
                            ^
    ./test/agent/net/test_nn.py:88:24: F821 undefined name 'net_util'
            after_params = net_util.copy_trainable_params(net)
                           ^
    ./test/agent/net/test_nn.py:114:25: F821 undefined name 'net_util'
            before_params = net_util.copy_fixed_params(net)
                            ^
    ./test/agent/net/test_nn.py:118:24: F821 undefined name 'net_util'
            after_params = net_util.copy_fixed_params(net)
                           ^
    11    F821 undefined name 'action'
    11
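
    For context, a tiny illustration (not the library's actual code) of how an F821 typically arises and one way to resolve it: the flagged stubs return names that are never bound anywhere in scope, so calling them would raise NameError at runtime; raising NotImplementedError instead makes the abstract contract explicit.

    # illustration only -- not the real slm_lab code
    def sample():
        return batch               # F821: 'batch' is never defined; NameError if called

    def sample_fixed():
        raise NotImplementedError  # explicit abstract stub; subclasses must override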
    
  • docker gotchas

    docker gotchas

    Hi. I tried running this through Docker, and ran into a few gotchas following the gitbook instructions:

    • The files in bin somehow gave me permission errors despite being root; pasting these manually helped as a work-around.
    • The setup script used sudo a lot, but the docker container did not recognize it; removing these helped. FWIW, installing sudo helped as well.
    • source activate lab errored, stating source was not recognized. I then tried:
    # conda config --add channels anaconda
    # conda activate lab
    # conda env update
    (lab) # python3 --version
    Python 3.6.4
    (lab) # yarn start
    $ python3 run_lab.py
    Traceback (most recent call last):
      File "run_lab.py", line 6, in <module>
        from slm_lab.experiment.control import Session, Trial, Experiment
      File "/opt/SLM-Lab/slm_lab/__init__.py", line 12
        with open(os.path.join(ROOT_DIR, 'config', f'{config_name}.json')) as f:
                                                                       ^
    SyntaxError: invalid syntax
    error Command failed with exit code 1
    

    Trying this line in this python3 seemed not to yield syntax errors though, so f-strings do seem supported. Weird.

    I haven't fully gotten this to work, but hopefully some of this may be useful for the tutorial. I tried looking for the gitbook source in case I could add to the installation instructions based on this, but couldn't find it.
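
    One plausible explanation (an assumption, not something confirmed in this report) is that the traceback came from a different interpreter than the activated lab env, since Python versions before 3.6 reject f-strings with exactly this SyntaxError. A minimal check inside the container:

    # Hedged sanity check (not from the original report): confirm which interpreter
    # actually runs and whether it supports f-strings (Python >= 3.6).
    import sys

    print("running interpreter:", sys.executable)
    print("version:", sys.version_info)
    assert sys.version_info >= (3, 6), "f-strings need Python 3.6+"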

  • Potential Memory Leak

    Potential Memory Leak

    Hello,

    I am currently using SLM lab as the learning component of my custom Unity environments. I am using a modified UnityEnv wrapper and I run my experiments using a modified version of the starter code here.

    When running both PPO and SAC, I realized that my Unix kernel kills the job after a while due to running out of memory (RAM/swap).

    Given the custom nature of this bug, I don't expect you to replicate it, but rather, asking if you had ever faced a similar problem on your end.

    Some more detail:

    1. Initially, I assumed it was due to the size of the replay buffer. But even after the replay buffer was capped at a small number (1000) and got maxed out, the problem persisted.
    2. The memory increase is roughly 1 MB/s, which is relatively high.
    3. I managed to trace it to the "train step" in SAC. I can't confirm the memory is allocated there, but when the training steps aren't taken, there is no problem.
    4. I tested with the default Unity envs to ensure I didn't cause the problem with my custom env--this doesn't seem to be the cause.
    5. We will be testing with the provided Cartpole env to see if the problem persists.

    Any guidance or tips would be appreciated! And once again thank you for the great library!
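
    For anyone hitting something similar, a lightweight way to narrow this down is to log the process's resident memory around the training step. The sketch below is a generic diagnostic using psutil (which the lab environment already installs); where exactly to call it inside SAC's train step is an assumption, not actual SLM-Lab code.

    # Hedged diagnostic sketch: log resident memory (RSS) so growth per train step
    # becomes visible. Placement inside the algorithm's train step is an assumption.
    import os
    import psutil

    _proc = psutil.Process(os.getpid())

    def log_rss(tag=''):
        rss_mb = _proc.memory_info().rss / 1e6
        print(f'[mem]{tag} RSS = {rss_mb:.1f} MB')

    # A common PyTorch leak in training loops is accumulating tensors that still hold
    # the autograd graph, e.g. `total_loss += loss` instead of `total_loss += loss.item()`.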

  • Fail to save graphs

    Fail to save graphs

    I followed the book "Foundations of Deep Reinforcement Learning" to run the REINFORCE experiments. Although the algorithm runs successfully, its graphs fail to save, with an error from orca: "service unavailable".

    1. OS and environment: Ubuntu 16.04
    2. spec file used: reinforce_cartpole.json


    Error logs

    Failed to generate graph. Run retro-analysis to generate graphs later. The image request was rejected by the orca conversion utility with the following error: 503:

    503 Service Unavailable

    Service Unavailable

    The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

  • Docker build fails on environment.yml installation

    Docker build fails on environment.yml installation

    Describe the bug running docker build hits an error during the build process

    To Reproduce

    1. OS and environment: Windows 10, Docker for Windows
    2. SLM Lab git SHA (run git rev-parse HEAD to get it): 2890277c8d499dbc925a16bda40acd8c29cb6819
    3. spec file used: unknown

    Additional context this appears to be caused by a problem earlier in the Dockerfile, where the python-pyglet package is failing to install.

    Error logs

     > [7/9] RUN . ~/miniconda3/etc/profile.d/conda.sh &&     conda create -n lab python=3.7.3 -y &&     conda activate lab &&     conda env update -f environment.yml &&     conda clean -y --all &&     rm -rf ~/.cache/pip:
    #11 1.493 Collecting package metadata (current_repodata.json): ...working... done
    #11 5.422 Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
    #11 5.424 Collecting package metadata (repodata.json): ...working... done
    #11 15.34 Solving environment: ...working... done
    #11 15.85
    #11 15.85
    #11 15.85 ==> WARNING: A newer version of conda exists. <==
    #11 15.85   current version: 4.12.0
    #11 15.85   latest version: 4.14.0
    #11 15.85
    #11 15.85 Please update conda by running
    #11 15.85
    #11 15.85     $ conda update -n base -c defaults conda
    #11 15.85
    #11 15.85
    #11 15.93
    #11 15.93 ## Package Plan ##
    #11 15.93
    #11 15.93   environment location: /root/miniconda3/envs/lab
    #11 15.93
    #11 15.93   added / updated specs:
    #11 15.93     - python=3.7.3
    #11 15.93
    #11 15.93
    #11 15.93 The following packages will be downloaded:
    #11 15.93
    #11 15.93     package                    |            build
    #11 15.93     ---------------------------|-----------------
    #11 15.93     _openmp_mutex-5.1          |            1_gnu          21 KB
    #11 15.93     ca-certificates-2022.07.19 |       h06a4308_0         124 KB
    #11 15.93     certifi-2022.6.15          |   py37h06a4308_0         153 KB
    #11 15.93     libedit-3.1.20210910       |       h7f8727e_0         166 KB
    #11 15.93     libffi-3.2.1               |    hf484d3e_1007          48 KB
    #11 15.93     libgcc-ng-11.2.0           |       h1234567_1         5.3 MB
    #11 15.93     libgomp-11.2.0             |       h1234567_1         474 KB
    #11 15.93     libstdcxx-ng-11.2.0        |       h1234567_1         4.7 MB
    #11 15.93     ncurses-6.3                |       h5eee18b_3         781 KB
    #11 15.93     openssl-1.1.1q             |       h7f8727e_0         2.5 MB
    #11 15.93     pip-22.1.2                 |   py37h06a4308_0         2.4 MB
    #11 15.93     python-3.7.3               |       h0371630_0        32.1 MB
    #11 15.93     readline-7.0               |       h7b6447c_5         324 KB
    #11 15.93     setuptools-63.4.1          |   py37h06a4308_0         1.1 MB
    #11 15.93     sqlite-3.33.0              |       h62c20be_0         1.1 MB
    #11 15.93     tk-8.6.12                  |       h1ccaba5_0         3.0 MB
    #11 15.93     xz-5.2.5                   |       h7f8727e_1         339 KB
    #11 15.93     zlib-1.2.12                |       h7f8727e_2         106 KB
    #11 15.93     ------------------------------------------------------------
    #11 15.93                                            Total:        54.8 MB
    #11 15.93
    #11 15.93 The following NEW packages will be INSTALLED:
    #11 15.93
    #11 15.93   _libgcc_mutex      pkgs/main/linux-64::_libgcc_mutex-0.1-main
    #11 15.93   _openmp_mutex      pkgs/main/linux-64::_openmp_mutex-5.1-1_gnu
    #11 15.93   ca-certificates    pkgs/main/linux-64::ca-certificates-2022.07.19-h06a4308_0
    #11 15.93   certifi            pkgs/main/linux-64::certifi-2022.6.15-py37h06a4308_0
    #11 15.93   libedit            pkgs/main/linux-64::libedit-3.1.20210910-h7f8727e_0
    #11 15.93   libffi             pkgs/main/linux-64::libffi-3.2.1-hf484d3e_1007
    #11 15.93   libgcc-ng          pkgs/main/linux-64::libgcc-ng-11.2.0-h1234567_1
    #11 15.93   libgomp            pkgs/main/linux-64::libgomp-11.2.0-h1234567_1
    #11 15.93   libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-11.2.0-h1234567_1
    #11 15.93   ncurses            pkgs/main/linux-64::ncurses-6.3-h5eee18b_3
    #11 15.93   openssl            pkgs/main/linux-64::openssl-1.1.1q-h7f8727e_0
    #11 15.93   pip                pkgs/main/linux-64::pip-22.1.2-py37h06a4308_0
    #11 15.93   python             pkgs/main/linux-64::python-3.7.3-h0371630_0
    #11 15.93   readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
    #11 15.93   setuptools         pkgs/main/linux-64::setuptools-63.4.1-py37h06a4308_0
    #11 15.93   sqlite             pkgs/main/linux-64::sqlite-3.33.0-h62c20be_0
    #11 15.93   tk                 pkgs/main/linux-64::tk-8.6.12-h1ccaba5_0
    #11 15.93   wheel              pkgs/main/noarch::wheel-0.37.1-pyhd3eb1b0_0
    #11 15.93   xz                 pkgs/main/linux-64::xz-5.2.5-h7f8727e_1
    #11 15.93   zlib               pkgs/main/linux-64::zlib-1.2.12-h7f8727e_2
    #11 15.93
    #11 15.93
    #11 15.93
    #11 15.93 Downloading and Extracting Packages
    zlib-1.2.12          | 106 KB    | ########## | 100%
    xz-5.2.5             | 339 KB    | ########## | 100%
    libedit-3.1.20210910 | 166 KB    | ########## | 100%
    _openmp_mutex-5.1    | 21 KB     | ########## | 100%
    sqlite-3.33.0        | 1.1 MB    | ########## | 100%
    libstdcxx-ng-11.2.0  | 4.7 MB    | ########## | 100%
    ncurses-6.3          | 781 KB    | ########## | 100%
    python-3.7.3         | 32.1 MB   | ########## | 100%
    certifi-2022.6.15    | 153 KB    | ########## | 100%
    tk-8.6.12            | 3.0 MB    | ########## | 100%
    libgomp-11.2.0       | 474 KB    | ########## | 100%
    libffi-3.2.1         | 48 KB     | ########## | 100%
    ca-certificates-2022 | 124 KB    | ########## | 100%
    setuptools-63.4.1    | 1.1 MB    | ########## | 100%
    pip-22.1.2           | 2.4 MB    | ########## | 100%
    openssl-1.1.1q       | 2.5 MB    | ########## | 100%
    readline-7.0         | 324 KB    | ########## | 100%
    libgcc-ng-11.2.0     | 5.3 MB    | ########## | 100%
    #11 24.41 Preparing transaction: ...working... done
    #11 24.74 Verifying transaction: ...working... done
    #11 25.93 Executing transaction: ...working... done
    #11 28.11 #
    #11 28.11 # To activate this environment, use
    #11 28.11 #
    #11 28.11 #     $ conda activate lab
    #11 28.11 #
    #11 28.11 # To deactivate an active environment, use
    #11 28.11 #
    #11 28.11 #     $ conda deactivate
    #11 28.11
    #11 29.82 Collecting package metadata (repodata.json): ...working... done
    #11 101.5 Solving environment: ...working... done
    #11 148.2
    #11 148.2
    #11 148.2 ==> WARNING: A newer version of conda exists. <==
    #11 148.2   current version: 4.12.0
    #11 148.2   latest version: 4.14.0
    #11 148.2
    #11 148.2 Please update conda by running
    #11 148.2
    #11 148.2     $ conda update -n base -c defaults conda
    #11 148.2
    #11 148.2
    #11 148.3
    #11 148.3 Downloading and Extracting Packages
    libgfortran-ng-7.5.0 | 23 KB     | ########## | 100%
    colorlog-4.0.2       | 19 KB     | ########## | 100%
    lz4-c-1.9.3          | 179 KB    | ########## | 100%
    jdcal-1.4.1          | 9 KB      | ########## | 100%
    scipy-1.3.0          | 18.8 MB   | ########## | 100%
    ujson-1.35           | 28 KB     | ########## | 100%
    mkl-2022.0.1         | 127.7 MB  | ########## | 100%
    xlrd-1.2.0           | 108 KB    | ########## | 100%
    libopenblas-0.3.12   | 8.2 MB    | ########## | 100%
    regex-2019.05.25     | 365 KB    | ########## | 100%
    pytest-4.5.0         | 354 KB    | ########## | 100%
    libgcc-7.2.0         | 304 KB    | ########## | 100%
    libwebp-base-1.2.2   | 824 KB    | ########## | 100%
    six-1.16.0           | 14 KB     | ########## | 100%
    zipp-3.8.1           | 13 KB     | ########## | 100%
    cffi-1.14.4          | 224 KB    | ########## | 100%
    et_xmlfile-1.0.1     | 11 KB     | ########## | 100%
    liblapack-3.9.0      | 11 KB     | ########## | 100%
    olefile-0.46         | 32 KB     | ########## | 100%
    importlib-metadata-4 | 33 KB     | ########## | 100%
    cudatoolkit-10.1.243 | 427.6 MB  | ########## | 100%
    py-1.11.0            | 74 KB     | ########## | 100%
    backports.functools_ | 9 KB      | ########## | 100%
    wcwidth-0.2.5        | 33 KB     | ########## | 100%
    pydash-4.2.1         | 60 KB     | ########## | 100%
    retrying-1.3.3       | 11 KB     | ########## | 100%
    libgfortran4-7.5.0   | 1.2 MB    | ########## | 100%
    flaky-3.5.3          | 19 KB     | ########## | 100%
    ca-certificates-2022 | 149 KB    | ########## | 100%
    pluggy-0.13.1        | 29 KB     | ########## | 100%
    python-3.7.3         | 35.7 MB   | ########## | 100%
    libtiff-4.2.0        | 590 KB    | ########## | 100%
    typing_extensions-4. | 28 KB     | ########## | 100%
    autopep8-1.4.4       | 38 KB     | ########## | 100%
    psutil-5.6.2         | 320 KB    | ########## | 100%
    openssl-1.1.1o       | 2.1 MB    | ########## | 100%
    importlib_metadata-4 | 4 KB      | ########## | 100%
    libcblas-3.9.0       | 11 KB     | ########## | 100%
    pytorch-1.3.1        | 428.0 MB  | ########## | 100%
    python-dateutil-2.8. | 240 KB    | ########## | 100%
    zstd-1.5.0           | 490 KB    | ########## | 100%
    yaml-0.2.5           | 87 KB     | ########## | 100%
    libpng-1.6.37        | 306 KB    | ########## | 100%
    ninja-1.11.0         | 2.8 MB    | ########## | 100%
    attrs-22.1.0         | 48 KB     | ########## | 100%
    coverage-4.5.3       | 216 KB    | ########## | 100%
    pytest-cov-2.7.1     | 17 KB     | ########## | 100%
    certifi-2022.6.15    | 155 KB    | ########## | 100%
    pillow-6.2.0         | 634 KB    | ########## | 100%
    bzip2-1.0.8          | 484 KB    | ########## | 100%
    pyyaml-5.1.2         | 184 KB    | ########## | 100%
    numpy-1.16.3         | 4.3 MB    | ########## | 100%
    atomicwrites-1.4.1   | 12 KB     | ########## | 100%
    jpeg-9e              | 268 KB    | ########## | 100%
    pytz-2022.2.1        | 224 KB    | ########## | 100%
    libblas-3.9.0        | 11 KB     | ########## | 100%
    intel-openmp-2022.0. | 4.2 MB    | ########## | 100%
    pycparser-2.21       | 100 KB    | ########## | 100%
    pandas-0.24.2        | 8.6 MB    | ########## | 100%
    pip-19.1.1           | 1.8 MB    | ########## | 100%
    more-itertools-8.14. | 45 KB     | ########## | 100%
    plotly-4.9.0         | 5.8 MB    | ########## | 100%
    pycodestyle-2.5.0    | 36 KB     | ########## | 100%
    pytest-timeout-1.3.3 | 12 KB     | ########## | 100%
    freetype-2.10.4      | 890 KB    | ########## | 100%
    python_abi-3.7       | 4 KB      | ########## | 100%
    backports-1.0        | 4 KB      | ########## | 100%
    openpyxl-2.6.1       | 152 KB    | ########## | 100%
    #11 419.8 Preparing transaction: ...working... done
    #11 422.4 Verifying transaction: ...working... done
    #11 425.0 Executing transaction: ...working... By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html
    #11 436.6
    #11 436.6 done
    #11 437.1 Installing pip dependencies: ...working... Ran pip subprocess with arguments:
    #11 818.8 ['/root/miniconda3/envs/lab/bin/python', '-m', 'pip', 'install', '-U', '-r', '/root/SLM-Lab/condaenv.u9_zu190.requirements.txt']
    #11 818.8 Pip subprocess output:
    #11 818.8 Collecting box2d-py==2.3.8 (from -r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 1))
    #11 818.8   Downloading https://files.pythonhosted.org/packages/87/34/da5393985c3ff9a76351df6127c275dcb5749ae0abbe8d5210f06d97405d/box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448kB)
    #11 818.8 Collecting cloudpickle==0.5.2 (from -r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 2))
    #11 818.8   Downloading https://files.pythonhosted.org/packages/aa/18/514b557c4d8d4ada1f0454ad06c845454ad438fd5c5e0039ba51d6b032fe/cloudpickle-0.5.2-py2.py3-none-any.whl
    #11 818.8 Collecting colorlover==0.3.0 (from -r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 3))
    #11 818.8   Downloading https://files.pythonhosted.org/packages/9a/53/f696e4480b1d1de3b1523991dea71cf417c8b19fe70c704da164f3f90972/colorlover-0.3.0-py3-none-any.whl
    #11 818.8 Collecting future==0.18.2 (from -r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 4))
    #11 818.8   Downloading https://files.pythonhosted.org/packages/45/0b/38b06fd9b92dc2b68d58b75f900e97884c45bedd2ff83203d933cf5851c9/future-0.18.2.tar.gz (829kB)
    ...
    ...
    
    #11 818.8 Requirement already satisfied, skipping upgrade: zipp>=0.5 in /root/miniconda3/envs/lab/lib/python3.7/site-packages (from importlib-metadata; python_version < "3.8"->click->ray==0.7.0->-r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 8)) (3.8.1)
    #11 818.8 Requirement already satisfied, skipping upgrade: typing-extensions>=3.6.4; python_version < "3.8" in /root/miniconda3/envs/lab/lib/python3.7/site-packages (from importlib-metadata; python_version < "3.8"->click->ray==0.7.0->-r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 8)) (4.3.0)
    #11 818.8 Collecting pyasn1>=0.1.3 (from rsa<5,>=3.1.4; python_version >= "3.6"->google-auth<2,>=1.6.3->tensorboard==2.1.1->-r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 10))
    #11 818.8   Downloading https://files.pythonhosted.org/packages/62/1e/a94a8d635fa3ce4cfc7f506003548d0a2447ae76fd5ca53932970fe3053f/pyasn1-0.4.8-py2.py3-none-any.whl (77kB)
    #11 818.8 Collecting oauthlib>=3.0.0 (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard==2.1.1->-r /root/SLM-Lab/condaenv.u9_zu190.requirements.txt (line 10))
    #11 818.8   Downloading https://files.pythonhosted.org/packages/1d/46/5ee2475e1b46a26ca0fa10d3c1d479577fde6ee289f8c6aa6d7ec33e31fd/oauthlib-3.2.0-py3-none-any.whl (151kB)
    #11 818.8 Building wheels for collected packages: future, pyopengl, xvfbwrapper, gym, typing, grpcio, MarkupSafe
    #11 818.8   Building wheel for future (setup.py): started
    #11 818.8   Building wheel for future (setup.py): finished with status 'done'
    #11 818.8   Stored in directory: /root/.cache/pip/wheels/8b/99/a0/81daf51dcd359a9377b110a8a886b3895921802d2fc1b2397e
    #11 818.8   Building wheel for pyopengl (setup.py): started
    #11 818.8   Building wheel for pyopengl (setup.py): finished with status 'done'
    #11 818.8   Stored in directory: /root/.cache/pip/wheels/6c/00/7f/1dd736f380848720ad79a1a1de5272e0d3f79c15a42968fb58
    #11 818.8   Building wheel for xvfbwrapper (setup.py): started
    #11 818.8   Building wheel for xvfbwrapper (setup.py): finished with status 'done'
    #11 818.8   Stored in directory: /root/.cache/pip/wheels/10/f2/61/cacfaf84b352c223761ea8d19616e3b5ac5c27364da72863f0
    #11 818.8   Building wheel for gym (setup.py): started
    #11 818.8   Building wheel for gym (setup.py): finished with status 'done'
    #11 818.8   Stored in directory: /root/.cache/pip/wheels/57/b0/13/4153e1acab826fbe612c95b1336a63a3fa6416902a8d74a1b7
    #11 818.8   Building wheel for typing (setup.py): started
    #11 818.8   Building wheel for typing (setup.py): finished with status 'done'
    #11 818.8   Stored in directory: /root/.cache/pip/wheels/2d/04/41/8e1836e79581989c22eebac3f4e70aaac9af07b0908da173be
    #11 818.8   Building wheel for grpcio (setup.py): started
    #11 818.8   Building wheel for grpcio (setup.py): still running...
    #11 818.8   Building wheel for grpcio (setup.py): still running...
    #11 818.8   Building wheel for grpcio (setup.py): finished with status 'error'
    #11 818.8   Running setup.py clean for grpcio
    #11 818.8   Building wheel for MarkupSafe (setup.py): started
    #11 818.8   Building wheel for MarkupSafe (setup.py): finished with status 'done'
    #11 818.8   Stored in directory: /root/.cache/pip/wheels/f5/40/34/d60ef965622011684037ea53e53fd44ef58ed2062f26878ce2
    #11 818.8 Successfully built future pyopengl xvfbwrapper gym typing MarkupSafe
    #11 818.8 Failed to build grpcio
    #11 818.8 Installing collected packages: box2d-py, cloudpickle, colorlover, future, kaleido, opencv-python, pyopengl, typing, funcsigs, click, colorama, flatbuffers, redis, filelock, ray, absl-py, pyasn1, rsa, pyasn1-modules, cachetools, google-auth, markdown, MarkupSafe, werkzeug, charset-normalizer, idna, urllib3, requests, oauthlib, requests-oauthlib, google-auth-oauthlib, grpcio, protobuf, tensorboard, xvfbwrapper, pyglet, gym, pybullet, roboschool, atari-py
    #11 818.8   Running setup.py install for grpcio: started
    #11 818.8     Running setup.py install for grpcio: still running...
    #11 818.8     Running setup.py install for grpcio: still running...
    #11 818.8     Running setup.py install for grpcio: finished with status 'error'
    #11 818.8 Pip subprocess error:
    #11 818.8   ERROR: Complete output from command /root/miniconda3/envs/lab/bin/python -u -c 'import setuptools, tokenize;__file__='"'"'/tmp/pip-install-n1qdzi5c/grpcio/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-gt9n7xut --python-tag cp37:
    #11 818.8   ERROR: Found cython-generated files...
    #11 818.8   running bdist_wheel
    #11 818.8   running build
    #11 818.8   running build_py
    #11 818.8   running build_project_metadata
    #11 818.8   creating python_build
    #11 818.8   creating python_build/lib.linux-x86_64-cpython-37
    #11 818.8   creating python_build/lib.linux-x86_64-cpython-37/grpc
    #11 818.8   copying src/python/grpcio/grpc/_channel.py -> python_build/lib.linux-x86_64-cpython-37/grpc
    #11 818.8   copying src/python/grpcio/grpc/_utilities.py -> python_build/lib.linux-x86_64-cpython-37/grpc
    
    ...
    ...
                             ^
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/boringssl-with-bazel/src/crypto/base64/base64.c -o python_build/temp.linux-x86_64-cpython-37/third_party/boringssl-with-bazel/src/crypto/base64/base64.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1: warning: command line option ‘-std=c++14’ is valid for C++/ObjC++ but not for C
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/abseil-cpp/absl/strings/internal/str_format/bind.cc -o python_build/temp.linux-x86_64-cpython-37/third_party/abseil-cpp/absl/strings/internal/str_format/bind.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/abseil-cpp/absl/time/internal/cctz/src/time_zone_lookup.cc -o python_build/temp.linux-x86_64-cpython-37/third_party/abseil-cpp/absl/time/internal/cctz/src/time_zone_lookup.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/boringssl-with-bazel/src/crypto/cpu-aarch64-fuchsia.c -o python_build/temp.linux-x86_64-cpython-37/third_party/boringssl-with-bazel/src/crypto/cpu-aarch64-fuchsia.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1: warning: command line option ‘-std=c++14’ is valid for C++/ObjC++ but not for C
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/boringssl-with-bazel/src/crypto/cpu-aarch64-linux.c -o python_build/temp.linux-x86_64-cpython-37/third_party/boringssl-with-bazel/src/crypto/cpu-aarch64-linux.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1: warning: command line option ‘-std=c++14’ is valid for C++/ObjC++ but not for C
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/boringssl-with-bazel/src/crypto/cpu-aarch64-win.c -o python_build/temp.linux-x86_64-cpython-37/third_party/boringssl-with-bazel/src/crypto/cpu-aarch64-win.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1: warning: command line option ‘-std=c++14’ is valid for C++/ObjC++ but not for C
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/boringssl-with-bazel/src/crypto/cpu-arm-linux.c -o python_build/temp.linux-x86_64-cpython-37/third_party/boringssl-with-bazel/src/crypto/cpu-arm-linux.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1: warning: command line option ‘-std=c++14’ is valid for C++/ObjC++ but not for C
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/boringssl-with-bazel/src/crypto/cpu-arm.c -o python_build/temp.linux-x86_64-cpython-37/third_party/boringssl-with-bazel/src/crypto/cpu-arm.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   cc1: warning: command line option ‘-std=c++14’ is valid for C++/ObjC++ but not for C
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m -c third_party/boringssl-with-bazel/src/crypto/cpu-intel.c -o python_build/temp.linux-x86_64-cpython-37/third_party/boringssl-with-bazel/src/crypto/cpu-intel.o -std=c++14 -fvisibility=hidden -fno-wrapv -fno-exceptions -pthread
    #11 818.8   gcc -pthread -B /root/miniconda3/envs/lab/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -D_WIN32_WINNT=1536 -DGRPC_XDS_USER_AGENT_NAME_SUFFIX=\"Python\" -DGRPC_XDS_USER_AGENT_VERSION_SUFFIX=\"1.48.0\" -DGPR_BACKWARDS_COMPATIBILITY_MODE=1 -DHAVE_CONFIG_H=1 -DGRPC_ENABLE_FORK_SUPPORT=1 "-DPyMODINIT_FUNC=extern \"C\" __attribute__((visibility (\"default\"))) PyObject*" -DGRPC_POSIX_FORK_ALLOW_PTHREAD_ATFORK=1 -Isrc/python/grpcio -Iinclude -I. -Ithird_party/abseil-cpp -Ithird_party/address_sorting/include -Ithird_party/cares/cares/include -Ithird_party/cares -Ithird_party/cares/cares -Ithird_party/cares/config_linux -Ithird_party/re2 -Ithird_party/boringssl-with-bazel/src/include -Ithird_party/upb -Isrc/core/ext/upb-generated -Isrc/core/ext/upbdefs-generated -Ithird_party/xxhash -Ithird_party/zlib -I/root/miniconda3/envs/lab/include/python3.7m
    #11 818.8 [output clipped, log limit 1MiB reached]
    #11 818.8
    #11 818.8 failed
    ------
    executor failed running [/bin/bash -c . ~/miniconda3/etc/profile.d/conda.sh &&     conda create -n lab python=3.7.3 -y &&     conda activate lab &&     conda env update -f environment.yml &&     conda clean -y --all &&     rm -rf ~/.cache/pip]: exit code: 1
    
  • `optimizer.step()` before `lr_scheduler.step()` Warning Occurred

    `optimizer.step()` before `lr_scheduler.step()` Warning Occurred

    I really appreciate your book; it's been a great help for me in starting RL. ^^

    Describe the bug When executing example code 4.7 (vanilla_dqn, without any change), a warning message appears as below.

    To Reproduce

    1. OS and environment: Ubuntu 20.04
    2. SLM Lab git SHA (run git rev-parse HEAD to get it): 5fa5ee3d034a38d5644f6f96b4c02ec366c831d0 (from the file "SLM-lab/data/vanilla_dqn_boltzmann_cartpole_2022_07_15_092012/vanilla_dqn_boltzmann_cartpole_t0_spec.json")
    3. spec file used: SLM-lab/slm_lab/spec/benchmark/dqn/dqn_cartpole.json

    Additional context After the warning appeared, training proceeded much more slowly (it took over an hour) than other methods (15 minutes for SARSA), and the result is also strange: mean_returns_ma gradually decreases to about 50 after 30k frames. I wonder whether the result of this trial is related to the warning.

    Error logs

    [2022-07-15 09:20:14,002 PID:245693 INFO logger.py info] Running RL loop for trial 0 session 3
    [2022-07-15 09:20:14,006 PID:245693 INFO __init__.py log_summary] Trial 0 session 3 vanilla_dqn_boltzmann_cartpole_t0_s3 [train_df] epi: 0  t: 0  wall_t: 0  opt_step: 0  frame: 0  fps: 0  total_reward: nan  total_reward_ma: nan  loss: nan  lr: 0.01  explore_var: 5  entropy_coef: nan  entropy: nan  grad_norm: nan
    /home/eric/miniconda3/envs/lab/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:100: UserWarning:
    
    Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
    
    /home/eric/miniconda3/envs/lab/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:100: UserWarning:
    
    Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
    
    /home/eric/miniconda3/envs/lab/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:100: UserWarning:
    
    Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule.See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
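
    This warning comes from PyTorch's learning-rate scheduler ordering check rather than from SLM Lab itself: since PyTorch 1.1, optimizer.step() must be called before lr_scheduler.step() in each update, otherwise the first value of the schedule is skipped. A minimal standalone sketch of the expected ordering (generic PyTorch, not the lab's actual training loop):

    # Minimal sketch of the ordering PyTorch expects (generic example, not SLM-Lab code).
    import torch

    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.9)

    for step in range(30):
        loss = model(torch.randn(8, 4)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   # update parameters first ...
        scheduler.step()   # ... then advance the learning-rate schedule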
    
  • how to improve the convergence performance of training loss?

    how to improve the convergence performance of training loss?

    Hi kengz, I find that the convergence of the training loss (= value loss + policy loss) of the PPO algorithm applied to the game Pong is poor (see Fig. 1), but the corresponding mean_returns shows a good upward trend and reaches convergence (see Fig. 2). Why is that, and how can I improve the convergence of the training loss? I tried many improvement tricks with PPO, but none of them worked.

    [Fig. 1: ppo_pong_t0_s0_session_graph_eval_loss_vs_frame]
    [Fig. 2: ppo_pong_t0_s0_session_graph_eval_mean_returns_vs_frames]

  • How to run the code on multiple GPUs?

    How to run the code on multiple GPUs?

    Hi kengz, I have a problem with how to run on multiple GPUs. In the __init__ of class ConvNet in conv.py, the code assigns the device as follows: self.to(self.device). But how can this be extended to multiple GPUs here (in the __init__ of class ConvNet), or for an instantiation of class ConvNet? When I try to use torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0) to assign multiple GPUs, the problem is that some (public) methods and attributes defined in class ConvNet are lost after conv_mode = torch.nn.DataParallel(conv_mode, device_ids={1,2,3,4}).
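
    The "lost" methods are general DataParallel behavior rather than anything ConvNet-specific: the wrapper only forwards forward(), so custom attributes and methods of the wrapped module have to be reached through .module. A hedged, generic sketch (the tiny class below is a stand-in, not the lab's actual slm_lab.agent.net.conv.ConvNet):

    # Generic illustration with a hypothetical stand-in class: after wrapping with
    # DataParallel, custom attributes live on wrapper.module.
    import torch
    import torch.nn as nn

    class TinyConvNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 8, 3)
            self.hid_layers = [8]   # a custom attribute, like ConvNet's config fields

        def forward(self, x):
            return self.conv(x)

    net = TinyConvNet()
    if torch.cuda.device_count() > 1:
        net = nn.DataParallel(net)  # replicates forward() across GPUs

    inner = net.module if isinstance(net, nn.DataParallel) else net
    print(inner.hid_layers)         # custom attributes are reached via .module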

  • Book branch problem, need your help

    Book branch problem, need your help

    Describe the bug In the SLM-Lab book branch, I run reinforce_cartpole in search mode. An exception comes out. It happens in search.py at line 129, which is "for ray_trial in ray_trials".

    To Reproduce

    1. windows os
    2. on the book branch

    Error logs

    <class 'ray.tune.analysis.experiment_analysis.ExperimentAnalysis'>
    Traceback (most recent call last):
      File "D:/PythonProject/SLM-Lab-book/run_lab.py", line 99, in <module>
        main()
      File "D:/PythonProject/SLM-Lab-book/run_lab.py", line 91, in main
        get_spec_and_run(*args)
      File "D:/PythonProject/SLM-Lab-book/run_lab.py", line 75, in get_spec_and_run
        run_spec(spec, lab_mode)
      File "D:/PythonProject/SLM-Lab-book/run_lab.py", line 55, in run_spec
        Experiment(spec).run()
      File "D:\PythonProject\SLM-Lab-book\slm_lab\experiment\control.py", line 204, in run
        trial_data_dict = search.run_ray_search(self.spec)
      File "D:\PythonProject\SLM-Lab-book\slm_lab\experiment\search.py", line 130, in run_ray_search
        for ray_trial in ray_trials:
    TypeError: 'ExperimentAnalysis' object is not iterable

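    For context on the TypeError (a hedged observation, not a confirmed fix for the book branch): newer versions of ray return an ExperimentAnalysis object from tune.run() instead of a list of trials, so a loop like the one at search.py line 130 would need to iterate over its .trials attribute. A tiny, self-contained illustration of that API difference, with a throwaway trainable:

    # Hypothetical minimal example, not the book-branch fix itself: with newer ray,
    # tune.run() returns an ExperimentAnalysis, and the individual trials are on .trials.
    from ray import tune

    def trainable(config):
        tune.report(score=config["x"] ** 2)

    analysis = tune.run(trainable, config={"x": tune.grid_search([1, 2, 3])})
    for ray_trial in analysis.trials:   # iterating the analysis object itself raises TypeError
        print(ray_trial.config, ray_trial.last_result.get("score"))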
