Python Reinforcement Learning Practice: Implementing a Lunar Lander with the PPO Algorithm in PyTorch

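Overview

Starting today we open a new chapter in which we learn (or get swept into) reinforcement learning together. In reinforcement learning, an agent interacts with an environment, analyzes the data it observes, and takes actions so as to maximize its future reward.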

Types of Reinforcement Learning Algorithms

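On-policy vs. off-policy:

  • On-policy: the training data is produced by the current agent as it keeps interacting with the environment.
  • Off-policy: the agent being trained and the agent interacting with the environment are not the same agent; in other words, another agent interacts with the environment and supplies the training data.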

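The PPO Algorithm

PPO stands for Proximal Policy Optimization. PPO is an on-policy algorithm. It keeps each policy update small by clipping the ratio between the new and old policies' action probabilities, which addresses the problem that learning becomes difficult when the new policy drifts too far from the old one during training.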
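
To make the idea of a bounded policy update concrete, here is a minimal pure-Python sketch of the clipped surrogate objective for a single sample. The helper name `clipped_surrogate` and the default `eps_clip=0.2` are illustrative assumptions for this sketch, not part of any library.

```python
def clipped_surrogate(ratio, advantage, eps_clip=0.2):
    """Clipped surrogate objective for one sample (hypothetical helper).

    ratio: pi_new(a|s) / pi_old(a|s)
    advantage: estimated advantage of the action taken
    """
    # Keep the ratio inside [1 - eps_clip, 1 + eps_clip]
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    # Take the more pessimistic of the clipped and unclipped terms
    return min(ratio * advantage, clipped_ratio * advantage)


# For a positive advantage, pushing the ratio past 1.2 earns no extra credit
print(clipped_surrogate(1.5, 2.0))
# For a negative advantage, the unclipped (worse) value is kept
print(clipped_surrogate(1.5, -2.0))
```

Because the objective stops improving once the ratio leaves the clipping range, the optimizer has little incentive to move the new policy far from the old one in a single update.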

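The Actor-Critic Algorithm

An Actor-Critic algorithm has two parts. The first is the policy function, the Actor, which generates actions and interacts with the environment. The second is the value function, the Critic, which evaluates how well the Actor is doing.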
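
The division of labor can be sketched in a few lines of plain Python. This is only an illustration under made-up numbers: `actor_sample` and `advantage` are hypothetical helper names, and the probabilities and values below are invented for the example.

```python
import random


def actor_sample(action_probs, rng):
    """Actor: sample an action index according to its probabilities."""
    return rng.choices(range(len(action_probs)), weights=action_probs, k=1)[0]


def advantage(observed_return, critic_value):
    """Critic's verdict: how much better the outcome was than predicted."""
    return observed_return - critic_value


rng = random.Random(0)  # fixed seed so the sketch is reproducible
probs = [0.1, 0.2, 0.6, 0.1]  # made-up probabilities for 4 actions
action = actor_sample(probs, rng)
print("sampled action:", action)
print("advantage:", advantage(observed_return=12.0, critic_value=10.0))
```

A positive advantage suggests the sampled action did better than the Critic expected, so the Actor should make it more likely; a negative one suggests the opposite.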

Gym

Gym is a package used frequently in reinforcement learning; it collects environments for many games. Below we use LunarLander-v2 to build an automated "Apollo moon landing".

Installation:

pip install gym

If you run into this error:

AttributeError: module 'gym.envs.box2d' has no attribute 'LunarLander'

Fix:

pip install gym[box2d]
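
A quick way to check whether the Box2D dependency is present before creating the environment is sketched below. It assumes the physics bindings are importable under the module name `Box2D`; the helper name `box2d_available` is made up for this example.

```python
import importlib.util


def box2d_available():
    """Return True if a module named Box2D can be found (hypothetical helper)."""
    return importlib.util.find_spec("Box2D") is not None


if box2d_available():
    print("Box2D found; LunarLander-v2 should be able to load.")
else:
    print("Box2D missing; try: pip install gym[box2d]")
```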

LunarLander-v2

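LunarLander-v2 is a lunar lander environment. The landing pad is always at coordinates (0, 0), and the lander's coordinates are the first two numbers of the state vector. Moving from the top of the screen to the landing pad and reaching zero speed earns roughly 100 to 140 points. The episode ends when the lander crashes or comes to rest, yielding an additional -100 or +100 points respectively. Each leg in contact with the ground is worth +10, and firing the main engine costs 0.3 points per frame. The environment counts as solved at 200 points.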
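
As a rough worked example of how these terms add up, the sketch below tallies a score for one hypothetical episode. It is a simplification built only from the description above: the real environment computes its reward frame by frame, and the function name `episode_score` and all input numbers are invented for illustration.

```python
def episode_score(approach, landed, legs_down, main_engine_frames):
    """Add up the reward terms described above (illustrative only)."""
    terminal = 100 if landed else -100       # came to rest vs. crashed
    leg_bonus = 10 * legs_down               # +10 per leg touching the ground
    fuel_cost = 0.3 * main_engine_frames     # -0.3 per frame of main engine
    return approach + terminal + leg_bonus - fuel_cost


# Hypothetical episode: good approach, safe landing on both legs,
# main engine fired for 100 frames.
score = episode_score(approach=120, landed=True, legs_down=2, main_engine_frames=100)
print(score)
print("solved threshold reached:", score >= 200)
```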

Launching the Lander

Code:

import gym

# Create the environment
env = gym.make("LunarLander-v2")

# Reset the environment
env.reset()

# Run for 180 steps
for i in range(180):

    # Render the environment
    env.render()

    # Take a random action
    # (classic Gym API, gym < 0.26: step() returns a 4-tuple)
    # Note: `done` is not checked here, so the loop keeps stepping after a crash
    observation, reward, done, info = env.step(env.action_space.sample())

    if i % 10 == 0:
        # Debug output
        print("Observation:", observation)
        print("Reward:", reward)

Output:

Observation: [ 0.00861025 1.4061487 0.42930993 -0.11858992 -0.00789343 -0.05729095
0. 0. ]
Reward: 0.4097546298543773
Observation: [ 0.04917412 1.3876126 0.41002613 -0.13066985 -0.06578191 -0.12604967
0. 0. ]
Reward: -1.0858669952763478
Observation: [ 0.08917055 1.3429415 0.43598312 -0.2890789 -0.17471936 -0.23913136
0. 0. ]
Reward: -2.9339827504803666
Observation: [ 0.1326253 1.2450166 0.44708318 -0.5567949 -0.32039645 -0.28250334
0. 0. ]
Reward: -2.2779730990326357
Observation: [ 0.18323365 1.1110108 0.615291 -0.61922276 -0.43743232 -0.2921057
0. 0. ]
Reward: -3.107298313736037
Observation: [ 0.24544087 0.94960684 0.66677517 -0.7835077 -0.5929364 -0.2968613
0. 0. ]
Reward: -0.5472611013563438
Observation: [ 0.3148238 0.75122666 0.7238519 -0.98458177 -0.72915816 -0.26130882
0. 0. ]
Reward: -2.5665300894414416
Observation: [ 0.38628978 0.49828076 0.74157137 -1.2624744 -0.85754734 -0.37227553
0. 0. ]
Reward: -3.2562193227533087
Observation: [ 0.46820658 0.18855602 0.92624503 -1.4677961 -1.08614 -0.4508995
0. 0. ]
Reward: -4.017106927961208
Observation: [ 0.57930076 -0.09440845 1.4345247 -0.693939 -2.0783656 -5.4039164
1. 0. ]
Reward: -100
Observation: [ 0.7383894 -0.08930686 1.4662493 -0.13461255 -3.653495 -3.109081
0. 0. ]
Reward: -100
Observation: [ 0.859124 -0.08471288 0.9377837 0.21408719 -3.8998525 0.10151418
0. 0. ]
Reward: -100
Observation: [ 9.3801367e-01 -4.6761338e-02 6.5999150e-01 1.4583524e-01
-3.9281998e+00 -4.7179851e-06 0.0000000e+00 1.0000000e+00]
Reward: -100
Observation: [ 0.9879366 -0.04012476 0.33624884 0.08859511 -4.253908 -1.0233303
0. 0. ]
Reward: -100
Observation: [ 1.0056045 -0.03840658 0.0733737 0.01812508 -4.6796274 -0.6103991
0. 0. ]
Reward: -100
Observation: [ 1.0112988 -0.03921754 0.07890484 -0.00624387 -4.845023 -0.17111658
0. 0. ]
Reward: -100
Observation: [ 1.0234139 -0.04488504 0.15701209 -0.0331554 -4.829875 0.07602684
0. 0. ]
Reward: -100
Observation: [ 1.0306002e+00 -4.8987642e-02 -1.1189224e-02 8.7506004e-04
-4.8712435e+00 -1.5446089e-01 0.0000000e+00 0.0000000e+00]
Reward: -100
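
The eight numbers in each observation are easier to read once they are named. The sketch below assumes the usual LunarLander-v2 layout (position, velocity, angle, angular velocity, then the two leg-contact flags); the field labels and the helper `describe_observation` are names chosen for this example, not Gym identifiers.

```python
FIELDS = ("x", "y", "vx", "vy", "angle", "angular_velocity",
          "left_leg_contact", "right_leg_contact")


def describe_observation(obs):
    """Map the 8 numbers of an observation to readable names."""
    if len(obs) != len(FIELDS):
        raise ValueError("expected %d values, got %d" % (len(FIELDS), len(obs)))
    return dict(zip(FIELDS, (float(v) for v in obs)))


# First observation from the output above
first = [0.00861025, 1.4061487, 0.42930993, -0.11858992,
         -0.00789343, -0.05729095, 0.0, 0.0]
named = describe_observation(first)
print("height:", named["y"])
print("left leg touching:", bool(named["left_leg_contact"]))
```

Read this way, the output shows the height (the second number) falling steadily until a leg-contact flag flips to 1 and the reward drops to -100.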

Implementing the Lunar Lander with PPO

PPO

import torch
import torch.nn as nn
from torch.distributions import Categorical

# Use GPU acceleration if available
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)


class Memory:
    def __init__(self):
        """Initialize the rollout buffers."""
        self.actions = []  # actions taken (4 possible actions)
        self.states = []  # states, each made of 8 numbers
        self.logprobs = []  # log-probabilities of the actions taken
        self.rewards = []  # rewards
        self.is_terminals = []  # whether the episode has ended

    def clear_memory(self):
        """Clear the memory."""
        del self.actions[:]
        del self.states[:]
        del self.logprobs[:]
        del self.rewards[:]
        del self.is_terminals[:]


class ActorCritic(nn.Module):
    def __init__(self, state_dim, action_dim, n_latent_var):
        super(ActorCritic, self).__init__()

        # Actor
        self.action_layer = nn.Sequential(
            # [b, 8] => [b, 64]
            nn.Linear(state_dim, n_latent_var),
            nn.Tanh(),  # activation

            # [b, 64] => [b, 64]
            nn.Linear(n_latent_var, n_latent_var),
            nn.Tanh(),  # activation

            # [b, 64] => [b, 4]
            nn.Linear(n_latent_var, action_dim),
            nn.Softmax(dim=-1)
        )

        # Critic
        self.value_layer = nn.Sequential(
            # [b, 8] => [b, 64]
            nn.Linear(state_dim, n_latent_var),
            nn.Tanh(),  # activation

            # [b, 64] => [b, 64]
            nn.Linear(n_latent_var, n_latent_var),
            nn.Tanh(),  # activation

            # [b, 64] => [b, 1]
            nn.Linear(n_latent_var, 1)
        )

    def forward(self):
        """Forward pass; replaced by act()."""

        raise NotImplementedError

    def act(self, state, memory):
        """Choose an action for the given state."""

        # Convert to a tensor
        state = torch.from_numpy(state).float().to(device)

        # Compute the probabilities of the 4 actions
        action_probs = self.action_layer(state)

        # Sample the action from the categorical distribution over those probabilities
        dist = Categorical(action_probs)
        action = dist.sample()

        # Store in memory
        memory.states.append(state)
        memory.actions.append(action)
        memory.logprobs.append(dist.log_prob(action))

        # Return the action
        return action.item()

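    def evaluate(self, state, action):
        """
        Evaluate a batch of states and actions.
        :param state: states, in batches of 2000, shape [2000, 8]
        :param action: actions, in batches of 2000, shape [2000]
        :return: action log-probabilities, state values, distribution entropy
        """

        # Compute the action probabilities
        action_probs = self.action_layer(state)
        dist = Categorical(action_probs)  # convert to a categorical distribution

        # Log-probability of each action taken, log(probability)
        action_logprobs = dist.log_prob(action)

        # Compute the entropy
        dist_entropy = dist.entropy()

        # Critic
        state_value = self.value_layer(state)
        state_value = torch.squeeze(state_value)  # [2000, 1] => [2000]

        # Return the action log-probabilities, the critic's values, and the entropy of the action distribution
        return action_logprobs, state_value, dist_entropy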


class PPO:
    def __init__(self, state_dim, action_dim, n_latent_var, lr, betas, gamma, K_epochs, eps_clip):
        self.lr = lr  # learning rate
        self.betas = betas  # Adam betas
        self.gamma = gamma  # discount factor
        self.eps_clip = eps_clip  # clipping range for the probability ratio
        self.K_epochs = K_epochs  # number of optimization epochs per update

        # Initialize the policies
        self.policy = ActorCritic(state_dim, action_dim, n_latent_var).to(device)
        self.policy_old = ActorCritic(state_dim, action_dim, n_latent_var).to(device)
        self.policy_old.load_state_dict(self.policy.state_dict())

        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=lr, betas=betas)  # optimizer
        self.MseLoss = nn.MSELoss()  # loss function for the value estimate

    def update(self, memory):
        """Update the policy from the collected rollout."""

        # Monte Carlo estimate of the state returns
        rewards = []
        discounted_reward = 0
        for reward, is_terminal in zip(reversed(memory.rewards), reversed(memory.is_terminals)):
            # Episode ended
            if is_terminal:
                discounted_reward = 0

            # Update the discounted return (current reward + gamma * discounted return of the following step)
            discounted_reward = reward + (self.gamma * discounted_reward)

            # Insert at the front
            rewards.insert(0, discounted_reward)

        # Normalize the returns
        rewards = torch.tensor(rewards, dtype=torch.float32).to(device)
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-5)

        # Convert to tensors
        old_states = torch.stack(memory.states).to(device).detach()
        old_actions = torch.stack(memory.actions).to(device).detach()
        old_logprobs = torch.stack(memory.logprobs).to(device).detach()

        # Optimize for K epochs:
        for _ in range(self.K_epochs):
            # Evaluate the old states and actions under the current policy
            logprobs, state_values, dist_entropy = self.policy.evaluate(old_states, old_actions)

            # Compute the ratios pi_new / pi_old
            ratios = torch.exp(logprobs - old_logprobs.detach())

            # Compute the loss: clipped surrogate + value loss - entropy bonus
            advantages = rewards - state_values.detach()
            surr1 = ratios * advantages
            surr2 = torch.clamp(ratios, 1 - self.eps_clip, 1 + self.eps_clip) * advantages
            loss = -torch.min(surr1, surr2) + 0.5 * self.MseLoss(state_values, rewards) - 0.01 * dist_entropy

            # Zero the gradients
            self.optimizer.zero_grad()

            # Backpropagate
            loss.mean().backward()

            # Apply the gradient step
            self.optimizer.step()

        # Copy the new weights into the old policy
        self.policy_old.load_state_dict(self.policy.state_dict())
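
The Monte Carlo return computation at the top of `update` can be tried on its own. The following is a minimal pure-Python sketch of the same backward pass; the function name `discounted_returns` and the toy rewards are assumptions made for this example, and `gamma=0.5` is used only to keep the arithmetic easy to follow.

```python
def discounted_returns(rewards, is_terminals, gamma=0.99):
    """Reward-to-go for each step, restarting at episode boundaries."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it
    for reward, done in zip(reversed(rewards), reversed(is_terminals)):
        if done:
            running = 0.0  # do not leak returns across episodes
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()  # a single reverse instead of repeated insert(0, ...)
    return returns


# Two tiny episodes back to back: [1, 1, 1 (end)] and [2, 2 (end)]
rewards = [1.0, 1.0, 1.0, 2.0, 2.0]
dones = [False, False, True, False, True]
print(discounted_returns(rewards, dones, gamma=0.5))
# → [1.75, 1.5, 1.0, 3.0, 2.0]
```

The third value is exactly 1.0 because the terminal flag resets the running sum, so the second episode's rewards do not flow back into the first.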

main

import gym
import torch
from PPO import Memory, PPO

############## Hyperparameters ##############
env_name = "LunarLander-v2"  # name of the game
env = gym.make(env_name)
state_dim = 8  # state dimension
action_dim = 4  # action dimension
render = False  # visualization
solved_reward = 230  # stop condition (reward >= 230)
log_interval = 20  # print avg reward in the interval
max_episodes = 50000  # maximum number of episodes
max_timesteps = 300  # maximum number of steps per episode
n_latent_var = 64  # width of the fully connected hidden layers
update_timestep = 2000  # update the policy every 2000 steps
lr = 0.002  # learning rate
betas = (0.9, 0.999)  # Adam betas
gamma = 0.99  # discount factor
K_epochs = 4  # number of optimization epochs per policy update
eps_clip = 0.2  # PPO clipping range


#############################################

def main():
    # Instantiate
    memory = Memory()
    ppo = PPO(state_dim, action_dim, n_latent_var, lr, betas, gamma, K_epochs, eps_clip)

    # Running totals
    total_reward = 0
    total_length = 0
    timestep = 0

    # Training
    for i_episode in range(1, max_episodes + 1):

        # Reset the environment (start a new episode)
        # (classic Gym API, gym < 0.26: reset() returns only the observation)
        state = env.reset()

        # Step through the episode
        for t in range(max_timesteps):
            timestep += 1

            # Get an action from the old policy
            action = ppo.policy_old.act(state, memory)

            # Take the action
            state, reward, done, _ = env.step(action)  # returns (new state, reward, done flag, extra debug info)

            # Update memory (reward / whether the episode ended)
            memory.rewards.append(reward)
            memory.is_terminals.append(done)

            # Update the policy
            if timestep % update_timestep == 0:
                ppo.update(memory)

                # Clear the memory
                memory.clear_memory()

                # Reset the step counter
                timestep = 0

            # Accumulate the reward
            total_reward += reward

            # Visualization
            if render:
                env.render()

            # If the episode is over, leave the inner loop
            if done:
                break

        # Episode length
        total_length += t

        # If the target is reached (230 points), leave the loop
        if total_reward >= (log_interval * solved_reward):
            print("########## Solved! ##########")

            # Save the model
            torch.save(ppo.policy.state_dict(), './PPO_{}.pth'.format(env_name))

            # Leave the loop
            break

        # Print a log line every 20 episodes
        if i_episode % log_interval == 0:

            # Average length / reward over the last 20 episodes
            avg_length = int(total_length / log_interval)
            running_reward = int(total_reward / log_interval)

            # Debug output
            print('Episode {} \t avg length: {} \t average_reward: {}'.format(i_episode, avg_length, running_reward))

            # Reset the totals
            total_reward = 0
            total_length = 0

if __name__ == '__main__':
    main()
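
The stop condition compares a running sum against `log_interval * solved_reward`, which is easy to misread. The sketch below isolates that comparison and the interval averaging; `is_solved` and `interval_average` are hypothetical helper names, with defaults copied from the hyperparameters above.

```python
def is_solved(total_reward, log_interval=20, solved_reward=230):
    """Same comparison as the stop condition in the training loop."""
    return total_reward >= log_interval * solved_reward


def interval_average(total, log_interval=20):
    """Average over one logging interval, truncated to int as in the log."""
    return int(total / log_interval)


print(is_solved(20 * 230))      # the summed reward reaches the threshold
print(is_solved(20 * 180))      # an interval average of 180 is not enough
print(interval_average(-4860))  # a summed reward of -4860 logs as -243
```

Since `total_reward` is reset only when a log line is printed, the condition is met only if the rewards summed since the last log line reach 4600 with these settings.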

Output:

Episode 20 avg length: 93 reward: -243
Episode 40 avg length: 92 reward: -172
Episode 60 avg length: 79 reward: -192
Episode 80 avg length: 85 reward: -164
Episode 100 avg length: 90 reward: -179
Episode 120 avg length: 100 reward: -201
Episode 140 avg length: 91 reward: -175
Episode 160 avg length: 101 reward: -141
Episode 180 avg length: 86 reward: -153
Episode 200 avg length: 93 reward: -189
Episode 220 avg length: 96 reward: -221
Episode 240 avg length: 105 reward: -140
Episode 260 avg length: 94 reward: -121
Episode 280 avg length: 91 reward: -131
Episode 300 avg length: 91 reward: -122
Episode 320 avg length: 90 reward: -113
Episode 340 avg length: 100 reward: -110
Episode 360 avg length: 110 reward: -92
Episode 380 avg length: 110 reward: -75
Episode 400 avg length: 119 reward: -76
Episode 420 avg length: 162 reward: -77
Episode 440 avg length: 194 reward: -91
Episode 460 avg length: 144 reward: -28
Episode 480 avg length: 192 reward: -8
Episode 500 avg length: 244 reward: -25
Episode 520 avg length: 239 reward: -1
Episode 540 avg length: 269 reward: 21
Episode 560 avg length: 289 reward: 27
Episode 580 avg length: 270 reward: 65
Episode 600 avg length: 264 reward: 86
Episode 620 avg length: 256 reward: 66
Episode 640 avg length: 278 reward: 75
Episode 660 avg length: 235 reward: 11
Episode 680 avg length: 244 reward: 84
Episode 700 avg length: 253 reward: 73
Episode 720 avg length: 292 reward: 63
Episode 740 avg length: 293 reward: 104
Episode 760 avg length: 279 reward: 109
Episode 780 avg length: 246 reward: 86
Episode 800 avg length: 260 reward: 124
Episode 820 avg length: 276 reward: 131
Episode 840 avg length: 269 reward: 121
Episode 860 avg length: 194 reward: 67
Episode 880 avg length: 241 reward: 94
Episode 900 avg length: 259 reward: 98
Episode 920 avg length: 211 reward: 83
Episode 940 avg length: 260 reward: 105
Episode 960 avg length: 194 reward: 65
Episode 980 avg length: 202 reward: 68
Episode 1000 avg length: 243 reward: 79
Episode 1020 avg length: 260 reward: 66
Episode 1040 avg length: 289 reward: 117
Episode 1060 avg length: 252 reward: 94
Episode 1080 avg length: 262 reward: 114
Episode 1100 avg length: 272 reward: 112
Episode 1120 avg length: 263 reward: 97
Episode 1140 avg length: 256 reward: 93
Episode 1160 avg length: 274 reward: 120
Episode 1180 avg length: 256 reward: 117
Episode 1200 avg length: 241 reward: 105
Episode 1220 avg length: 238 reward: 103
Episode 1240 avg length: 267 reward: 121
Episode 1260 avg length: 283 reward: 124
Episode 1280 avg length: 299 reward: 149
Episode 1300 avg length: 281 reward: 126
Episode 1320 avg length: 266 reward: 102
Episode 1340 avg length: 282 reward: 128
Episode 1360 avg length: 275 reward: 114
Episode 1380 avg length: 285 reward: 105
Episode 1400 avg length: 294 reward: 123
Episode 1420 avg length: 293 reward: 132
Episode 1440 avg length: 248 reward: 85
Episode 1460 avg length: 281 reward: 115
Episode 1480 avg length: 291 reward: 152
Episode 1500 avg length: 279 reward: 130
Episode 1520 avg length: 267 reward: 103
Episode 1540 avg length: 270 reward: 137
Episode 1560 avg length: 269 reward: 120
Episode 1580 avg length: 260 reward: 113
Episode 1600 avg length: 282 reward: 147
Episode 1620 avg length: 259 reward: 125
Episode 1640 avg length: 240 reward: 90
Episode 1660 avg length: 284 reward: 125
Episode 1680 avg length: 282 reward: 123
Episode 1700 avg length: 274 reward: 123
Episode 1720 avg length: 273 reward: 130
Episode 1740 avg length: 260 reward: 117
Episode 1760 avg length: 243 reward: 106
Episode 1780 avg length: 241 reward: 90
Episode 1800 avg length: 290 reward: 144
Episode 1820 avg length: 258 reward: 131
Episode 1840 avg length: 283 reward: 142
Episode 1860 avg length: 262 reward: 100
Episode 1880 avg length: 273 reward: 132
Episode 1900 avg length: 255 reward: 92
Episode 1920 avg length: 251 reward: 117
Episode 1940 avg length: 220 reward: 103
Episode 1960 avg length: 221 reward: 111
Episode 1980 avg length: 205 reward: 83
Episode 2000 avg length: 227 reward: 102
Episode 2020 avg length: 251 reward: 123
Episode 2040 avg length: 227 reward: 100
Episode 2060 avg length: 255 reward: 135
Episode 2080 avg length: 273 reward: 136
Episode 2100 avg length: 256 reward: 126
Episode 2120 avg length: 273 reward: 141
Episode 2140 avg length: 280 reward: 109
Episode 2160 avg length: 266 reward: 112
Episode 2180 avg length: 249 reward: 88
Episode 2200 avg length: 247 reward: 119
Episode 2220 avg length: 270 reward: 143
Episode 2240 avg length: 257 reward: 65
Episode 2260 avg length: 250 reward: 30
Episode 2280 avg length: 261 reward: 112
Episode 2300 avg length: 270 reward: 139
Episode 2320 avg length: 275 reward: 128
Episode 2340 avg length: 290 reward: 149
Episode 2360 avg length: 269 reward: 139
Episode 2380 avg length: 272 reward: 137
Episode 2400 avg length: 232 reward: 105
Episode 2420 avg length: 242 reward: 127
Episode 2440 avg length: 241 reward: 134
Episode 2460 avg length: 249 reward: 113
Episode 2480 avg length: 287 reward: 154
Episode 2500 avg length: 289 reward: 149
Episode 2520 avg length: 258 reward: 129
Episode 2540 avg length: 250 reward: 101
Episode 2560 avg length: 287 reward: 158
Episode 2580 avg length: 271 reward: 145
Episode 2600 avg length: 253 reward: 120
Episode 2620 avg length: 255 reward: 127
Episode 2640 avg length: 254 reward: 122
Episode 2660 avg length: 238 reward: 123
Episode 2680 avg length: 243 reward: 115
Episode 2700 avg length: 241 reward: 93
Episode 2720 avg length: 232 reward: 90
Episode 2740 avg length: 215 reward: 83
Episode 2760 avg length: 241 reward: 112
Episode 2780 avg length: 273 reward: 129
Episode 2800 avg length: 269 reward: 133
Episode 2820 avg length: 246 reward: 91
Episode 2840 avg length: 261 reward: 130
Episode 2860 avg length: 261 reward: 136
Episode 2880 avg length: 289 reward: 128
Episode 2900 avg length: 271 reward: 131
Episode 2920 avg length: 277 reward: 145
Episode 2940 avg length: 251 reward: 117
Episode 2960 avg length: 253 reward: 120
Episode 2980 avg length: 270 reward: 133
Episode 3000 avg length: 240 reward: 85
Episode 3020 avg length: 284 reward: 141
Episode 3040 avg length: 255 reward: 117
Episode 3060 avg length: 299 reward: 134
Episode 3080 avg length: 263 reward: 122
Episode 3100 avg length: 259 reward: 126
Episode 3120 avg length: 270 reward: 125
Episode 3140 avg length: 299 reward: 150
Episode 3160 avg length: 256 reward: 116
Episode 3180 avg length: 264 reward: 124
Episode 3200 avg length: 271 reward: 128
Episode 3220 avg length: 259 reward: 122
Episode 3240 avg length: 261 reward: 125
Episode 3260 avg length: 271 reward: 129
Episode 3280 avg length: 242 reward: 126
Episode 3300 avg length: 218 reward: 93
Episode 3320 avg length: 230 reward: 116
Episode 3340 avg length: 223 reward: 109
Episode 3360 avg length: 249 reward: 122
Episode 3380 avg length: 224 reward: 104
Episode 3400 avg length: 261 reward: 131
Episode 3420 avg length: 280 reward: 140
Episode 3440 avg length: 264 reward: 125
Episode 3460 avg length: 247 reward: 105
Episode 3480 avg length: 276 reward: 141
Episode 3500 avg length: 282 reward: 149
Episode 3520 avg length: 282 reward: 141
Episode 3540 avg length: 290 reward: 152
Episode 3560 avg length: 282 reward: 141
Episode 3580 avg length: 291 reward: 151
Episode 3600 avg length: 289 reward: 166
Episode 3620 avg length: 266 reward: 142
Episode 3640 avg length: 277 reward: 91
Episode 3660 avg length: 272 reward: 114
Episode 3680 avg length: 281 reward: 159
Episode 3700 avg length: 287 reward: 160
Episode 3720 avg length: 254 reward: 78
Episode 3740 avg length: 296 reward: 174
Episode 3760 avg length: 267 reward: 124
Episode 3780 avg length: 273 reward: 148
Episode 3800 avg length: 275 reward: 147
Episode 3820 avg length: 276 reward: 145
Episode 3840 avg length: 283 reward: 151
Episode 3860 avg length: 275 reward: 142
Episode 3880 avg length: 290 reward: 142
Episode 3900 avg length: 290 reward: 154
Episode 3920 avg length: 283 reward: 141
Episode 3940 avg length: 273 reward: 145
Episode 3960 avg length: 290 reward: 161
Episode 3980 avg length: 268 reward: 145
Episode 4000 avg length: 270 reward: 142
Episode 4020 avg length: 283 reward: 156
Episode 4040 avg length: 283 reward: 149
Episode 4060 avg length: 299 reward: 172
Episode 4080 avg length: 292 reward: 158
Episode 4100 avg length: 274 reward: 143
Episode 4120 avg length: 299 reward: 163
Episode 4140 avg length: 290 reward: 153
Episode 4160 avg length: 299 reward: 165
Episode 4180 avg length: 290 reward: 160
Episode 4200 avg length: 299 reward: 157
Episode 4220 avg length: 299 reward: 171
Episode 4240 avg length: 271 reward: 148
Episode 4260 avg length: 265 reward: 139
Episode 4280 avg length: 258 reward: 137
Episode 4300 avg length: 280 reward: 137
Episode 4320 avg length: 262 reward: 133
Episode 4340 avg length: 255 reward: 110
Episode 4360 avg length: 275 reward: 134
Episode 4380 avg length: 282 reward: 154
Episode 4400 avg length: 264 reward: 128
Episode 4420 avg length: 299 reward: 150
Episode 4440 avg length: 275 reward: 151
Episode 4460 avg length: 257 reward: 116
Episode 4480 avg length: 256 reward: 104
Episode 4500 avg length: 263 reward: 134
Episode 4520 avg length: 299 reward: 164
Episode 4540 avg length: 265 reward: 137
Episode 4560 avg length: 265 reward: 147
Episode 4580 avg length: 283 reward: 138
Episode 4600 avg length: 299 reward: 152
Episode 4620 avg length: 281 reward: 154
Episode 4640 avg length: 289 reward: 161
Episode 4660 avg length: 264 reward: 143
Episode 4680 avg length: 285 reward: 138
Episode 4700 avg length: 291 reward: 143
Episode 4720 avg length: 280 reward: 154
Episode 4740 avg length: 284 reward: 125
Episode 4760 avg length: 296 reward: 136
Episode 4780 avg length: 254 reward: 127
Episode 4800 avg length: 281 reward: 147
Episode 4820 avg length: 282 reward: 143
Episode 4840 avg length: 243 reward: 119
Episode 4860 avg length: 280 reward: 139
Episode 4880 avg length: 270 reward: 137
Episode 4900 avg length: 278 reward: 150
Episode 4920 avg length: 203 reward: 83
Episode 4940 avg length: 272 reward: 153
Episode 4960 avg length: 289 reward: 151
Episode 4980 avg length: 289 reward: 157
Episode 5000 avg length: 299 reward: 168
Episode 5020 avg length: 292 reward: 136
Episode 5040 avg length: 290 reward: 158
Episode 5060 avg length: 286 reward: 157
Episode 5080 avg length: 282 reward: 154
Episode 5100 avg length: 278 reward: 121
Episode 5120 avg length: 291 reward: 138
Episode 5140 avg length: 297 reward: 143
Episode 5160 avg length: 290 reward: 165
Episode 5180 avg length: 290 reward: 157
Episode 5200 avg length: 276 reward: 150
Episode 5220 avg length: 278 reward: 149
Episode 5240 avg length: 287 reward: 153
Episode 5260 avg length: 274 reward: 145
Episode 5280 avg length: 299 reward: 176
Episode 5300 avg length: 299 reward: 173
Episode 5320 avg length: 299 reward: 164
Episode 5340 avg length: 271 reward: 157
Episode 5360 avg length: 299 reward: 180
Episode 5380 avg length: 279 reward: 156
Episode 5400 avg length: 268 reward: 133
Episode 5420 avg length: 279 reward: 136
Episode 5440 avg length: 278 reward: 130
Episode 5460 avg length: 268 reward: 137
Episode 5480 avg length: 273 reward: 152
Episode 5500 avg length: 299 reward: 168
Episode 5520 avg length: 266 reward: 95
Episode 5540 avg length: 294 reward: 146
Episode 5560 avg length: 289 reward: 165
Episode 5580 avg length: 288 reward: 139
Episode 5600 avg length: 299 reward: 174
Episode 5620 avg length: 291 reward: 168
Episode 5640 avg length: 281 reward: 147
Episode 5660 avg length: 270 reward: 126
Episode 5680 avg length: 263 reward: 153
Episode 5700 avg length: 283 reward: 161
Episode 5720 avg length: 271 reward: 154
Episode 5740 avg length: 281 reward: 154
Episode 5760 avg length: 281 reward: 144
Episode 5780 avg length: 272 reward: 145
Episode 5800 avg length: 275 reward: 128
Episode 5820 avg length: 290 reward: 159
Episode 5840 avg length: 274 reward: 142
Episode 5860 avg length: 243 reward: 122
Episode 5880 avg length: 236 reward: 124
Episode 5900 avg length: 255 reward: 139
Episode 5920 avg length: 288 reward: 140
Episode 5940 avg length: 271 reward: 140
Episode 5960 avg length: 254 reward: 108
Episode 5980 avg length: 299 reward: 149
Episode 6000 avg length: 289 reward: 149
Episode 6020 avg length: 258 reward: 109
Episode 6040 avg length: 289 reward: 129
Episode 6060 avg length: 238 reward: 94
Episode 6080 avg length: 270 reward: 87
Episode 6100 avg length: 268 reward: 96
Episode 6120 avg length: 279 reward: 142
Episode 6140 avg length: 233 reward: 112
Episode 6160 avg length: 268 reward: 142
Episode 6180 avg length: 260 reward: 133
Episode 6200 avg length: 210 reward: 109
Episode 6220 avg length: 248 reward: 111
Episode 6240 avg length: 229 reward: 92
Episode 6260 avg length: 210 reward: 98
Episode 6280 avg length: 218 reward: 102
Episode 6300 avg length: 225 reward: 117
Episode 6320 avg length: 235 reward: 112
Episode 6340 avg length: 259 reward: 124
Episode 6360 avg length: 252 reward: 113
Episode 6380 avg length: 239 reward: 119
Episode 6400 avg length: 242 reward: 95
Episode 6420 avg length: 249 reward: 111
Episode 6440 avg length: 257 reward: 136
Episode 6460 avg length: 259 reward: 123
Episode 6480 avg length: 259 reward: 112
Episode 6500 avg length: 259 reward: 129
Episode 6520 avg length: 215 reward: 101
Episode 6540 avg length: 249 reward: 137
Episode 6560 avg length: 245 reward: 121
Episode 6580 avg length: 259 reward: 127
Episode 6600 avg length: 267 reward: 142
Episode 6620 avg length: 257 reward: 86
Episode 6640 avg length: 278 reward: 141
Episode 6660 avg length: 255 reward: 92
Episode 6680 avg length: 289 reward: 145
Episode 6700 avg length: 259 reward: 133
Episode 6720 avg length: 247 reward: 116
Episode 6740 avg length: 243 reward: 56
Episode 6760 avg length: 274 reward: 114
Episode 6780 avg length: 279 reward: 133
Episode 6800 avg length: 269 reward: 152
Episode 6820 avg length: 252 reward: 105
Episode 6840 avg length: 254 reward: 123
Episode 6860 avg length: 253 reward: 98
Episode 6880 avg length: 273 reward: 132
Episode 6900 avg length: 249 reward: 108
Episode 6920 avg length: 248 reward: 84
Episode 6940 avg length: 250 reward: 107
Episode 6960 avg length: 279 reward: 99
Episode 6980 avg length: 279 reward: 140
Episode 7000 avg length: 270 reward: 105
Episode 7020 avg length: 250 reward: 109
Episode 7040 avg length: 202 reward: 87
Episode 7060 avg length: 188 reward: 56
Episode 7080 avg length: 229 reward: 93
Episode 7100 avg length: 248 reward: 105
Episode 7120 avg length: 218 reward: 105
Episode 7140 avg length: 213 reward: 77
Episode 7160 avg length: 279 reward: 128
Episode 7180 avg length: 247 reward: 110
Episode 7200 avg length: 269 reward: 124
Episode 7220 avg length: 217 reward: 64
Episode 7240 avg length: 258 reward: 140
Episode 7260 avg length: 279 reward: 116
Episode 7280 avg length: 244 reward: 97
Episode 7300 avg length: 245 reward: 104
Episode 7320 avg length: 213 reward: 81
Episode 7340 avg length: 268 reward: 126
Episode 7360 avg length: 277 reward: 124
Episode 7380 avg length: 251 reward: 122
Episode 7400 avg length: 234 reward: 108
Episode 7420 avg length: 267 reward: 127
Episode 7440 avg length: 218 reward: 89
Episode 7460 avg length: 199 reward: 80
Episode 7480 avg length: 154 reward: 55
Episode 7500 avg length: 228 reward: 114
Episode 7520 avg length: 197 reward: 49
Episode 7540 avg length: 147 reward: 59
Episode 7560 avg length: 139 reward: 49
Episode 7580 avg length: 181 reward: 74
Episode 7600 avg length: 191 reward: 61
Episode 7620 avg length: 176 reward: 78
Episode 7640 avg length: 160 reward: 35
Episode 7660 avg length: 159 reward: 50
Episode 7680 avg length: 143 reward: 68
Episode 7700 avg length: 227 reward: 103
Episode 7720 avg length: 192 reward: 59
Episode 7740 avg length: 248 reward: 118
Episode 7760 avg length: 250 reward: 128
Episode 7780 avg length: 261 reward: 110
Episode 7800 avg length: 279 reward: 157
Episode 7820 avg length: 249 reward: 153
Episode 7840 avg length: 212 reward: 78
Episode 7860 avg length: 249 reward: 144
Episode 7880 avg length: 257 reward: 107
Episode 7900 avg length: 271 reward: 136
Episode 7920 avg length: 244 reward: 129
Episode 7940 avg length: 262 reward: 145
Episode 7960 avg length: 224 reward: 94
Episode 7980 avg length: 247 reward: 110
Episode 8000 avg length: 190 reward: 81
Episode 8020 avg length: 157 reward: 67
Episode 8040 avg length: 171 reward: 67
Episode 8060 avg length: 203 reward: 96
Episode 8080 avg length: 225 reward: 87
Episode 8100 avg length: 166 reward: 84
Episode 8120 avg length: 196 reward: 82
Episode 8140 avg length: 249 reward: 120
Episode 8160 avg length: 216 reward: 112
Episode 8180 avg length: 178 reward: 97
Episode 8200 avg length: 221 reward: 120
Episode 8220 avg length: 265 reward: 122
Episode 8240 avg length: 240 reward: 125
Episode 8260 avg length: 266 reward: 146
Episode 8280 avg length: 253 reward: 116
Episode 8300 avg length: 233 reward: 129
Episode 8320 avg length: 260 reward: 126
Episode 8340 avg length: 264 reward: 138
Episode 8360 avg length: 196 reward: 88
Episode 8380 avg length: 189 reward: 60
Episode 8400 avg length: 227 reward: 66
Episode 8420 avg length: 257 reward: 114
Episode 8440 avg length: 254 reward: 99
Episode 8460 avg length: 268 reward: 127
Episode 8480 avg length: 263 reward: 131
Episode 8500 avg length: 246 reward: 107
Episode 8520 avg length: 281 reward: 127
Episode 8540 avg length: 273 reward: 146
Episode 8560 avg length: 290 reward: 124
Episode 8580 avg length: 261 reward: 103
Episode 8600 avg length: 294 reward: 140
Episode 8620 avg length: 236 reward: 110
Episode 8640 avg length: 261 reward: 125
Episode 8660 avg length: 284 reward: 108
Episode 8680 avg length: 278 reward: 141
Episode 8700 avg length: 256 reward: 124
Episode 8720 avg length: 245 reward: 95
Episode 8740 avg length: 258 reward: 136
Episode 8760 avg length: 289 reward: 147
Episode 8780 avg length: 229 reward: 98
Episode 8800 avg length: 277 reward: 138
Episode 8820 avg length: 237 reward: 129
Episode 8840 avg length: 276 reward: 141
Episode 8860 avg length: 224 reward: 102
Episode 8880 avg length: 220 reward: 108
Episode 8900 avg length: 277 reward: 137
Episode 8920 avg length: 259 reward: 120
Episode 8940 avg length: 242 reward: 124
Episode 8960 avg length: 275 reward: 119
Episode 8980 avg length: 256 reward: 140
Episode 9000 avg length: 263 reward: 110
Episode 9020 avg length: 247 reward: 101
Episode 9040 avg length: 251 reward: 99
Episode 9060 avg length: 266 reward: 128
Episode 9080 avg length: 247 reward: 119
Episode 9100 avg length: 227 reward: 95
Episode 9120 avg length: 242 reward: 95
Episode 9140 avg length: 234 reward: 120
Episode 9160 avg length: 271 reward: 145
Episode 9180 avg length: 234 reward: 106
Episode 9200 avg length: 230 reward: 102
Episode 9220 avg length: 217 reward: 111
Episode 9240 avg length: 182 reward: 68
Episode 9260 avg length: 225 reward: 111
Episode 9280 avg length: 224 reward: 110
Episode 9300 avg length: 195 reward: 97
Episode 9320 avg length: 245 reward: 110
Episode 9340 avg length: 249 reward: 87
Episode 9360 avg length: 238 reward: 105
Episode 9380 avg length: 231 reward: 83
Episode 9400 avg length: 245 reward: 60
Episode 9420 avg length: 251 reward: 81
Episode 9440 avg length: 218 reward: 86
Episode 9460 avg length: 177 reward: 62
Episode 9480 avg length: 212 reward: 64
Episode 9500 avg length: 213 reward: 96
Episode 9520 avg length: 267 reward: 121
Episode 9540 avg length: 195 reward: 89
Episode 9560 avg length: 259 reward: 140
Episode 9580 avg length: 246 reward: 116
Episode 9600 avg length: 266 reward: 122
Episode 9620 avg length: 255 reward: 104
Episode 9640 avg length: 203 reward: 116
Episode 9660 avg length: 239 reward: 117
Episode 9680 avg length: 239 reward: 118
Episode 9700 avg length: 254 reward: 137
Episode 9720 avg length: 269 reward: 144
Episode 9740 avg length: 274 reward: 136
Episode 9760 avg length: 259 reward: 123
Episode 9780 avg length: 230 reward: 102
Episode 9800 avg length: 268 reward: 139
Episode 9820 avg length: 258 reward: 120
Episode 9840 avg length: 271 reward: 111
Episode 9860 avg length: 260 reward: 130
Episode 9880 avg length: 280 reward: 135
Episode 9900 avg length: 269 reward: 126
Episode 9920 avg length: 290 reward: 159
Episode 9940 avg length: 286 reward: 129
Episode 9960 avg length: 259 reward: 117
Episode 9980 avg length: 299 reward: 139
Episode 10000 avg length: 298 reward: 141
Episode 10020 avg length: 294 reward: 115
Episode 10040 avg length: 284 reward: 117
Episode 10060 avg length: 299 reward: 156
Episode 10080 avg length: 290 reward: 145
Episode 10100 avg length: 280 reward: 151
Episode 10120 avg length: 299 reward: 163
Episode 10140 avg length: 290 reward: 151
Episode 10160 avg length: 269 reward: 133
Episode 10180 avg length: 259 reward: 134
Episode 10200 avg length: 272 reward: 137
Episode 10220 avg length: 260 reward: 121
Episode 10240 avg length: 259 reward: 103
Episode 10260 avg length: 260 reward: 126
Episode 10280 avg length: 279 reward: 150
Episode 10300 avg length: 268 reward: 128
Episode 10320 avg length: 261 reward: 140
Episode 10340 avg length: 243 reward: 111
Episode 10360 avg length: 236 reward: 113
Episode 10380 avg length: 219 reward: 112
Episode 10400 avg length: 267 reward: 140
Episode 10420 avg length: 279 reward: 146
Episode 10440 avg length: 285 reward: 137
Episode 10460 avg length: 255 reward: 107
Episode 10480 avg length: 249 reward: 115
Episode 10500 avg length: 241 reward: 106
Episode 10520 avg length: 219 reward: 102
Episode 10540 avg length: 200 reward: 52
Episode 10560 avg length: 267 reward: 124
Episode 10580 avg length: 235 reward: 111
Episode 10600 avg length: 223 reward: 86
Episode 10620 avg length: 220 reward: 90
Episode 10640 avg length: 269 reward: 145
Episode 10660 avg length: 255 reward: 133
Episode 10680 avg length: 277 reward: 130
Episode 10700 avg length: 280 reward: 142
Episode 10720 avg length: 278 reward: 128
Episode 10740 avg length: 260 reward: 90
Episode 10760 avg length: 288 reward: 145
Episode 10780 avg length: 238 reward: 94
Episode 10800 avg length: 278 reward: 136
Episode 10820 avg length: 288 reward: 150
Episode 10840 avg length: 280 reward: 148
Episode 10860 avg length: 240 reward: 117
Episode 10880 avg length: 257 reward: 124
Episode 10900 avg length: 261 reward: 130
Episode 10920 avg length: 229 reward: 115
Episode 10940 avg length: 259 reward: 144
Episode 10960 avg length: 238 reward: 138
Episode 10980 avg length: 230 reward: 112
Episode 11000 avg length: 254 reward: 126
Episode 11020 avg length: 281 reward: 141
Episode 11040 avg length: 270 reward: 120
Episode 11060 avg length: 297 reward: 174
Episode 11080 avg length: 261 reward: 138
Episode 11100 avg length: 259 reward: 125
Episode 11120 avg length: 292 reward: 173
Episode 11140 avg length: 275 reward: 146
Episode 11160 avg length: 299 reward: 165
Episode 11180 avg length: 299 reward: 175
Episode 11200 avg length: 289 reward: 161
Episode 11220 avg length: 299 reward: 166
Episode 11240 avg length: 278 reward: 160
Episode 11260 avg length: 290 reward: 142
Episode 11280 avg length: 299 reward: 164
Episode 11300 avg length: 279 reward: 155
Episode 11320 avg length: 299 reward: 178
Episode 11340 avg length: 299 reward: 150
Episode 11360 avg length: 265 reward: 110
Episode 11380 avg length: 288 reward: 156
Episode 11400 avg length: 278 reward: 146
Episode 11420 avg length: 268 reward: 141
Episode 11440 avg length: 291 reward: 130
Episode 11460 avg length: 299 reward: 161
Episode 11480 avg length: 284 reward: 142
Episode 11500 avg length: 262 reward: 132
Episode 11520 avg length: 287 reward: 149
Episode 11540 avg length: 288 reward: 150
Episode 11560 avg length: 288 reward: 157
Episode 11580 avg length: 288 reward: 156
Episode 11600 avg length: 284 reward: 133
Episode 11620 avg length: 287 reward: 152
Episode 11640 avg length: 249 reward: 130
Episode 11660 avg length: 240 reward: 106
Episode 11680 avg length: 271 reward: 131
Episode 11700 avg length: 271 reward: 117
Episode 11720 avg length: 286 reward: 143
Episode 11740 avg length: 293 reward: 150
Episode 11760 avg length: 289 reward: 155
Episode 11780 avg length: 290 reward: 137
Episode 11800 avg length: 289 reward: 133
Episode 11820 avg length: 273 reward: 121
Episode 11840 avg length: 274 reward: 109
Episode 11860 avg length: 261 reward: 147
Episode 11880 avg length: 210 reward: 114
Episode 11900 avg length: 245 reward: 143
Episode 11920 avg length: 210 reward: 115
Episode 11940 avg length: 218 reward: 102
Episode 11960 avg length: 214 reward: 102
Episode 11980 avg length: 269 reward: 133
Episode 12000 avg length: 262 reward: 144
Episode 12020 avg length: 235 reward: 131
Episode 12040 avg length: 253 reward: 149
Episode 12060 avg length: 227 reward: 120
Episode 12080 avg length: 202 reward: 98
Episode 12100 avg length: 240 reward: 117
Episode 12120 avg length: 231 reward: 108
Episode 12140 avg length: 230 reward: 122
Episode 12160 avg length: 228 reward: 108
Episode 12180 avg length: 233 reward: 96
Episode 12200 avg length: 252 reward: 123
Episode 12220 avg length: 272 reward: 154
Episode 12240 avg length: 251 reward: 122
Episode 12260 avg length: 273 reward: 147
Episode 12280 avg length: 239 reward: 111
Episode 12300 avg length: 287 reward: 126
Episode 12320 avg length: 278 reward: 121
Episode 12340 avg length: 258 reward: 120
Episode 12360 avg length: 265 reward: 104
Episode 12380 avg length: 279 reward: 118
Episode 12400 avg length: 254 reward: 72
Episode 12420 avg length: 187 reward: 74
Episode 12440 avg length: 244 reward: 90
Episode 12460 avg length: 228 reward: 116
Episode 12480 avg length: 258 reward: 125
Episode 12500 avg length: 247 reward: 118
Episode 12520 avg length: 244 reward: 101
Episode 12540 avg length: 267 reward: 135
Episode 12560 avg length: 253 reward: 99
Episode 12580 avg length: 285 reward: 135
Episode 12600 avg length: 259 reward: 113
Episode 12620 avg length: 256 reward: 108
Episode 12640 avg length: 238 reward: 114
Episode 12660 avg length: 265 reward: 128
Episode 12680 avg length: 289 reward: 145
Episode 12700 avg length: 287 reward: 147
Episode 12720 avg length: 283 reward: 139
Episode 12740 avg length: 255 reward: 108
Episode 12760 avg length: 299 reward: 150
Episode 12780 avg length: 277 reward: 138
Episode 12800 avg length: 290 reward: 151
Episode 12820 avg length: 284 reward: 159
Episode 12840 avg length: 299 reward: 150
Episode 12860 avg length: 289 reward: 146
Episode 12880 avg length: 299 reward: 158
Episode 12900 avg length: 299 reward: 144
Episode 12920 avg length: 279 reward: 129
Episode 12940 avg length: 282 reward: 132
Episode 12960 avg length: 280 reward: 132
Episode 12980 avg length: 278 reward: 108
Episode 13000 avg length: 284 reward: 136
Episode 13020 avg length: 289 reward: 128
Episode 13040 avg length: 291 reward: 149
Episode 13060 avg length: 299 reward: 140
Episode 13080 avg length: 292 reward: 141
Episode 13100 avg length: 290 reward: 139
Episode 13120 avg length: 299 reward: 139
Episode 13140 avg length: 291 reward: 151
Episode 13160 avg length: 291 reward: 141
Episode 13180 avg length: 299 reward: 169
Episode 13200 avg length: 299 reward: 162
Episode 13220 avg length: 299 reward: 170
Episode 13240 avg length: 299 reward: 170
Episode 13260 avg length: 299 reward: 155
Episode 13280 avg length: 299 reward: 153
Episode 13300 avg length: 299 reward: 163
Episode 13320 avg length: 281 reward: 131
Episode 13340 avg length: 289 reward: 153
Episode 13360 avg length: 285 reward: 133
Episode 13380 avg length: 280 reward: 134
Episode 13400 avg length: 282 reward: 134
Episode 13420 avg length: 268 reward: 114
Episode 13440 avg length: 290 reward: 142
Episode 13460 avg length: 270 reward: 145
Episode 13480 avg length: 257 reward: 127
Episode 13500 avg length: 272 reward: 139
Episode 13520 avg length: 270 reward: 129
Episode 13540 avg length: 279 reward: 149
Episode 13560 avg length: 269 reward: 95
Episode 13580 avg length: 270 reward: 113
Episode 13600 avg length: 258 reward: 125
Episode 13620 avg length: 217 reward: 88
Episode 13640 avg length: 157 reward: 59
Episode 13660 avg length: 132 reward: 41
Episode 13680 avg length: 220 reward: 92
Episode 13700 avg length: 241 reward: 109
Episode 13720 avg length: 252 reward: 127
Episode 13740 avg length: 253 reward: 104
Episode 13760 avg length: 269 reward: 128
Episode 13780 avg length: 230 reward: 96
Episode 13800 avg length: 258 reward: 127
Episode 13820 avg length: 290 reward: 151
Episode 13840 avg length: 299 reward: 135
Episode 13860 avg length: 280 reward: 111
Episode 13880 avg length: 268 reward: 124
Episode 13900 avg length: 255 reward: 93
Episode 13920 avg length: 258 reward: 128
Episode 13940 avg length: 244 reward: 127
Episode 13960 avg length: 238 reward: 117
Episode 13980 avg length: 237 reward: 104
Episode 14000 avg length: 251 reward: 123
Episode 14020 avg length: 267 reward: 114
Episode 14040 avg length: 271 reward: 109
Episode 14060 avg length: 247 reward: 117
Episode 14080 avg length: 282 reward: 129
Episode 14100 avg length: 266 reward: 144
Episode 14120 avg length: 256 reward: 132
Episode 14140 avg length: 267 reward: 140
Episode 14160 avg length: 289 reward: 149
Episode 14180 avg length: 262 reward: 95
Episode 14200 avg length: 278 reward: 128
Episode 14220 avg length: 279 reward: 136
Episode 14240 avg length: 249 reward: 105
Episode 14260 avg length: 235 reward: 112
Episode 14280 avg length: 273 reward: 131
Episode 14300 avg length: 278 reward: 130
Episode 14320 avg length: 259 reward: 123
Episode 14340 avg length: 234 reward: 78
Episode 14360 avg length: 268 reward: 125
Episode 14380 avg length: 294 reward: 153
Episode 14400 avg length: 299 reward: 150
Episode 14420 avg length: 278 reward: 129
Episode 14440 avg length: 297 reward: 155
Episode 14460 avg length: 247 reward: 106
Episode 14480 avg length: 289 reward: 154
Episode 14500 avg length: 270 reward: 133
Episode 14520 avg length: 259 reward: 133
Episode 14540 avg length: 280 reward: 151
Episode 14560 avg length: 268 reward: 129
Episode 14580 avg length: 299 reward: 159
Episode 14600 avg length: 279 reward: 131
Episode 14620 avg length: 242 reward: 100
Episode 14640 avg length: 236 reward: 114
Episode 14660 avg length: 253 reward: 132
Episode 14680 avg length: 272 reward: 134
Episode 14700 avg length: 297 reward: 175
Episode 14720 avg length: 278 reward: 148
Episode 14740 avg length: 289 reward: 154
Episode 14760 avg length: 288 reward: 148
Episode 14780 avg length: 278 reward: 140
Episode 14800 avg length: 266 reward: 128
Episode 14820 avg length: 288 reward: 161
Episode 14840 avg length: 278 reward: 145
Episode 14860 avg length: 290 reward: 161
Episode 14880 avg length: 279 reward: 139
Episode 14900 avg length: 284 reward: 155
Episode 14920 avg length: 245 reward: 136
Episode 14940 avg length: 269 reward: 137
Episode 14960 avg length: 262 reward: 146
Episode 14980 avg length: 299 reward: 154
Episode 15000 avg length: 273 reward: 172
Episode 15020 avg length: 278 reward: 142
Episode 15040 avg length: 277 reward: 150
Episode 15060 avg length: 232 reward: 119
Episode 15080 avg length: 280 reward: 141
Episode 15100 avg length: 260 reward: 137
Episode 15120 avg length: 285 reward: 167
Episode 15140 avg length: 280 reward: 149
Episode 15160 avg length: 237 reward: 118
Episode 15180 avg length: 223 reward: 111
Episode 15200 avg length: 243 reward: 134
Episode 15220 avg length: 269 reward: 138
Episode 15240 avg length: 251 reward: 127
Episode 15260 avg length: 289 reward: 157
Episode 15280 avg length: 229 reward: 107
Episode 15300 avg length: 277 reward: 143
Episode 15320 avg length: 288 reward: 154
Episode 15340 avg length: 289 reward: 149
Episode 15360 avg length: 288 reward: 145
Episode 15380 avg length: 260 reward: 134
Episode 15400 avg length: 246 reward: 126
Episode 15420 avg length: 244 reward: 132
Episode 15440 avg length: 272 reward: 129
Episode 15460 avg length: 267 reward: 134
Episode 15480 avg length: 263 reward: 135
Episode 15500 avg length: 280 reward: 141
Episode 15520 avg length: 254 reward: 126
Episode 15540 avg length: 275 reward: 133
Episode 15560 avg length: 271 reward: 120
Episode 15580 avg length: 270 reward: 130
Episode 15600 avg length: 299 reward: 144
Episode 15620 avg length: 254 reward: 88
Episode 15640 avg length: 271 reward: 126
Episode 15660 avg length: 289 reward: 153
Episode 15680 avg length: 231 reward: 104
Episode 15700 avg length: 227 reward: 127
Episode 15720 avg length: 174 reward: 82
Episode 15740 avg length: 214 reward: 92
Episode 15760 avg length: 190 reward: 89
Episode 15780 avg length: 159 reward: 49
Episode 15800 avg length: 222 reward: 100
Episode 15820 avg length: 269 reward: 133
Episode 15840 avg length: 243 reward: 100
Episode 15860 avg length: 191 reward: 68
Episode 15880 avg length: 221 reward: 86
Episode 15900 avg length: 206 reward: 109
Episode 15920 avg length: 228 reward: 89
Episode 15940 avg length: 250 reward: 108
Episode 15960 avg length: 229 reward: 110
Episode 15980 avg length: 263 reward: 139
Episode 16000 avg length: 250 reward: 125
Episode 16020 avg length: 270 reward: 140
Episode 16040 avg length: 251 reward: 131
Episode 16060 avg length: 258 reward: 124
Episode 16080 avg length: 268 reward: 130
Episode 16100 avg length: 263 reward: 125
Episode 16120 avg length: 280 reward: 150
Episode 16140 avg length: 267 reward: 132
Episode 16160 avg length: 284 reward: 137
Episode 16180 avg length: 275 reward: 128
Episode 16200 avg length: 269 reward: 132
Episode 16220 avg length: 280 reward: 132
Episode 16240 avg length: 279 reward: 145
Episode 16260 avg length: 299 reward: 152
Episode 16280 avg length: 238 reward: 112
Episode 16300 avg length: 284 reward: 159
Episode 16320 avg length: 280 reward: 136
Episode 16340 avg length: 271 reward: 120
Episode 16360 avg length: 281 reward: 139
Episode 16380 avg length: 267 reward: 141
Episode 16400 avg length: 299 reward: 164
Episode 16420 avg length: 239 reward: 113
Episode 16440 avg length: 276 reward: 143
Episode 16460 avg length: 268 reward: 144
Episode 16480 avg length: 269 reward: 134
Episode 16500 avg length: 273 reward: 148
Episode 16520 avg length: 247 reward: 97
Episode 16540 avg length: 266 reward: 129
Episode 16560 avg length: 267 reward: 119
Episode 16580 avg length: 270 reward: 124
Episode 16600 avg length: 262 reward: 101
Episode 16620 avg length: 257 reward: 121
Episode 16640 avg length: 233 reward: 99
Episode 16660 avg length: 268 reward: 114
Episode 16680 avg length: 261 reward: 126
Episode 16700 avg length: 278 reward: 143
Episode 16720 avg length: 278 reward: 117
Episode 16740 avg length: 266 reward: 135
Episode 16760 avg length: 282 reward: 140
Episode 16780 avg length: 299 reward: 154
Episode 16800 avg length: 279 reward: 144
Episode 16820 avg length: 281 reward: 124
Episode 16840 avg length: 280 reward: 132
Episode 16860 avg length: 278 reward: 148
Episode 16880 avg length: 280 reward: 113
Episode 16900 avg length: 268 reward: 133
Episode 16920 avg length: 291 reward: 147
Episode 16940 avg length: 274 reward: 150
Episode 16960 avg length: 281 reward: 137
Episode 16980 avg length: 251 reward: 126
Episode 17000 avg length: 261 reward: 135
Episode 17020 avg length: 267 reward: 105
Episode 17040 avg length: 274 reward: 176
Episode 17060 avg length: 262 reward: 131
Episode 17080 avg length: 186 reward: 184
Episode 17100 avg length: 225 reward: 150
Episode 17120 avg length: 201 reward: 218
Episode 17140 avg length: 211 reward: 220
Episode 17160 avg length: 221 reward: 218
Episode 17180 avg length: 232 reward: 210
Episode 17200 avg length: 216 reward: 220
Episode 17220 avg length: 226 reward: 203
Episode 17240 avg length: 198 reward: 170
Episode 17260 avg length: 196 reward: 222
Episode 17280 avg length: 214 reward: 196
Episode 17300 avg length: 229 reward: 205
Episode 17320 avg length: 183 reward: 192
Episode 17340 avg length: 212 reward: 186
Episode 17360 avg length: 192 reward: 164
########## Solved! ##########
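
The log above is noisy: the printed reward swings between roughly 40 and 220 before the run is declared solved. To see the trend, it can help to parse the log and smooth it. The following is a minimal sketch and not part of the original training script; the names `LOG_LINE`, `parse_log`, and `moving_average` are hypothetical helpers, and the regular expression assumes every line has exactly the `Episode N avg length: N reward: N` form shown above.

```python
import re

# Assumed line format, taken from the log above:
# "Episode 9440 avg length: 218 reward: 86"
LOG_LINE = re.compile(r"Episode (\d+) avg length: (\d+) reward: (-?\d+)")


def parse_log(text):
    """Return a list of (episode, avg_length, reward) tuples found in text."""
    return [tuple(int(g) for g in m.groups()) for m in LOG_LINE.finditer(text)]


def moving_average(values, window):
    """Trailing moving average; the first entries use a shorter window."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# The last three lines of the log above, used as sample input.
sample = """Episode 17320 avg length: 183 reward: 192
Episode 17340 avg length: 212 reward: 186
Episode 17360 avg length: 192 reward: 164"""

rows = parse_log(sample)
rewards = [reward for _, _, reward in rows]
print(rows[-1])
print(moving_average(rewards, window=2))
```

In practice the full console output would be passed to `parse_log` instead of `sample`, and a larger `window` would make the upward trend easier to see.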

