
Commit

johnjim0816 committed Jun 18, 2022
1 parent 4076b4f commit 88cb61c
Showing 30 changed files with 68 additions and 823 deletions.
5 changes: 0 additions & 5 deletions codes/.gitignore

This file was deleted.

4 changes: 2 additions & 2 deletions codes/DQN/README.md
@@ -2,13 +2,13 @@

## Principle Overview

-DQN is an optimization and extension of the Q-learning algorithm: where Q-learning stores value information in a finite Q-table, DQN replaces the Q-table with a neural network, which makes it better suited to high-dimensional problems. For background, see [EasyRL-DQN](https://datawhalechina.github.io/easy-rl/#/chapter6/chapter6)
+DQN is an optimization and extension of the Q-learning algorithm: where Q-learning stores value information in a finite Q-table, DQN replaces the Q-table with a neural network, which makes it better suited to high-dimensional problems. For background, see [Datawhale notes on Hung-yi Lee's lectures: Q-learning](https://datawhalechina.github.io/easy-rl/#/chapter6/chapter6)

Two papers are the main references: the 2013 Google DeepMind paper [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), and the team's later Nature paper [Human-level control through deep reinforcement learning](https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf). The latter adds a target Q-network at the algorithm level and is also referred to as Nature DQN.

Nature DQN uses two Q-networks: a current Q-network Q that selects actions and whose parameters are updated, and a target Q-network Q′ that computes the target Q-values. The parameters of the target Q-network are not updated iteratively; instead, they are copied from the current Q-network Q every so often (a delayed update), which reduces the correlation between the target Q-values and the current Q-values.
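To make the delayed update concrete, below is a minimal sketch of the hard parameter copy, assuming two PyTorch modules `policy_net` and `target_net` with identical structure and a `target_update` interval like the one configured in `task1.py`; the repo's actual `DQN` agent class may organize this differently.

```python
import torch.nn as nn

def sync_target(policy_net: nn.Module, target_net: nn.Module) -> None:
    # hard copy: overwrite the target network with the current network's weights
    target_net.load_state_dict(policy_net.state_dict())

# hypothetical use inside a training loop, e.g. once every cfg.target_update episodes:
# if i_ep % cfg.target_update == 0:
#     sync_target(policy_net, target_net)
```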

Note that the two Q-networks have exactly the same structure; only then can the network parameters be copied across. Compared with [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf), Nature DQN is essentially identical apart from using a new target Q-network of the same structure to compute the target Q-values. For details, see [强化学习(九)Deep Q-Learning进阶之Nature DQN](https://www.cnblogs.com/pinard/p/9756075.html).

See also: https://blog.csdn.net/JohnJim0/article/details/109557173
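As a companion to the description above, the target network enters the one-step TD target roughly as follows; this is a hedged sketch (the `dqn_loss` helper and the batch tensor layout are illustrative assumptions, not the code in this repo's `dqn.py`).

```python
import torch
import torch.nn.functional as F

def dqn_loss(policy_net, target_net, batch, gamma=0.99):
    """One Nature DQN update on a sampled mini-batch (sketch)."""
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) from the current network, for the actions actually taken
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # max_a' Q'(s', a') from the frozen target network (no gradient flows here)
    with torch.no_grad():
        next_q = target_net(next_states).max(1)[0]
    # TD target y = r + gamma * max_a' Q'(s', a'), zeroed at terminal states
    expected_q = rewards + gamma * next_q * (1 - dones)
    return F.mse_loss(q_values, expected_q)
```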

Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
40 changes: 30 additions & 10 deletions codes/DQN/task1.py
@@ -5,7 +5,7 @@
Email: johnjim0816@gmail.com
Date: 2021-12-22 11:14:17
LastEditor: JiangJi
-LastEditTime: 2022-02-10 06:17:41
+LastEditTime: 2022-06-18 20:12:20
Description: train CartPole-v1 with Nature DQN
'''
import sys
@@ -17,6 +17,9 @@
import gym
import torch
import datetime
import torch.nn as nn
import torch.nn.functional as F

from common.utils import save_results, make_dir
from common.utils import plot_rewards, plot_rewards_cn
from dqn import DQN
@@ -33,18 +36,18 @@ def __init__(self):
        self.env_name = env_name  # environment name
        self.device = torch.device(
            "cuda" if torch.cuda.is_available() else "cpu")  # detect GPU
-        self.train_eps = 200  # number of training episodes
-        self.test_eps = 30  # number of test episodes
+        self.train_eps = 300  # number of training episodes
+        self.test_eps = 20  # number of test episodes
        # hyperparameters
-        self.gamma = 0.95  # discount factor in RL
-        self.epsilon_start = 0.90  # initial epsilon for the e-greedy policy
-        self.epsilon_end = 0.01  # final epsilon for the e-greedy policy
+        self.gamma = 0.99  # discount factor in RL
+        self.epsilon_start = 0.99  # initial epsilon for the e-greedy policy
+        self.epsilon_end = 0.005  # final epsilon for the e-greedy policy
        self.epsilon_decay = 500  # decay rate of epsilon in the e-greedy policy
        self.lr = 0.0001  # learning rate
        self.memory_capacity = 100000  # capacity of the replay memory
-        self.batch_size = 64  # batch size for mini-batch SGD
+        self.batch_size = 128  # batch size for mini-batch SGD
        self.target_update = 4  # update frequency of the target network
-        self.hidden_dim = 256  # dimension of the network's hidden layers
+        self.hidden_dim = 512  # dimension of the network's hidden layers
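A note on the exploration schedule implied by these settings: `epsilon_decay = 500` is a time constant rather than a rate in [0, 1]. Code in this style typically anneals epsilon exponentially from `epsilon_start` toward `epsilon_end`, roughly as sketched below; this is an assumption about how `dqn.py` consumes the three values, not a quote of it.

```python
import math

def epsilon_by_frame(frame_idx, epsilon_start=0.99, epsilon_end=0.005, epsilon_decay=500):
    # exponential annealing: close to epsilon_start at the beginning,
    # approaches epsilon_end after a few multiples of epsilon_decay steps
    return epsilon_end + (epsilon_start - epsilon_end) * math.exp(-frame_idx / epsilon_decay)
```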
class PlotConfig:
    ''' settings for plotting
    '''
@@ -60,15 +63,32 @@ def __init__(self) -> None:
            '/' + curr_time + '/models/'  # path for saving the model
        self.save = True  # whether to save figures


class MLP(nn.Module):
    def __init__(self, n_states, n_actions, hidden_dim=128):
        """ Initialize the Q-network as a fully connected network
        n_states: number of input features, i.e. the state dimension of the environment
        n_actions: output dimension, i.e. the number of actions
        """
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(n_states, hidden_dim)  # input layer
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)  # hidden layer
        self.fc3 = nn.Linear(hidden_dim, n_actions)  # output layer

    def forward(self, x):
        # activation functions for each layer
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

def env_agent_config(cfg, seed=1):
    ''' create the environment and the agent
    '''
    env = gym.make(cfg.env_name)  # create the environment
    env.seed(seed)  # set the random seed
    n_states = env.observation_space.shape[0]  # state dimension
    n_actions = env.action_space.n  # action dimension
-    agent = DQN(n_states, n_actions, cfg)  # create the agent
+    model = MLP(n_states, n_actions)
+    agent = DQN(n_actions, model, cfg)  # create the agent
    return env, agent

def train(cfg, env, agent):
184 changes: 0 additions & 184 deletions codes/DQN/test copy.py

This file was deleted.

Binary file removed codes/Docs/assets/Qlearning_1.png
Binary file not shown.
Binary file removed codes/Docs/assets/cliffwalking_1.png
Binary file not shown.
Binary file removed codes/Docs/assets/eval_rewards_curve_cn-1689282.png
Binary file not shown.
Binary file removed codes/Docs/assets/eval_rewards_curve_cn-1760950.png
Binary file not shown.
Binary file removed codes/Docs/assets/eval_rewards_curve_cn.png
Binary file not shown.
Binary file removed codes/Docs/assets/image-20210915020027615.png
Binary file not shown.
Binary file removed codes/Docs/assets/pendulum_1.png
Binary file not shown.
Binary file removed codes/Docs/assets/poster.jpg
Binary file not shown.
Binary file removed codes/Docs/assets/train_rewards_curve_cn-1689150.png
Binary file not shown.
Binary file removed codes/Docs/assets/train_rewards_curve_cn-1760758.png
Binary file not shown.
Binary file removed codes/Docs/assets/train_rewards_curve_cn.png
Binary file not shown.
