add ex4.3
Ti-Ho committed Mar 25, 2021
1 parent 050f42f commit 5308c43
Showing 1 changed file with 15 additions and 1 deletion.
16 changes: 15 additions & 1 deletion 动态规划/习题解答.md
@@ -34,10 +34,12 @@ $$
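*Solution:* (1) Assume the transitions out of the original states are unchanged. The newly added state "15" then has no effect on the state values of the original states, which stay the same. We can compute the state value of the new state "15" from Eq. 4.4 in the book: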
$$
\text{Eq. 4.4:}\ v_{\pi}(s) = \sum_{a}\pi(a|s)\sum_{s',r}p(s',r|s,a)[r + \gamma v_{\pi}(s')] \\
\begin{align}
v_{\pi}(15) = \pi(left|15)p(12, -1|15,left)[-1 + 1 * v_{\pi}(12)] \\
+ \pi(up|15)p(13, -1|15,up)[-1 + 1 * v_{\pi}(13)] \\
+ \pi(right|15)p(14, -1|15,right)[-1 + 1 * v_{\pi}(14)] \\
+ \pi(down|15)p(15, -1|15,down)[-1 + 1 * v_{\pi}(15)]
\end{align}
$$
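From the table of converged $v_{\pi}$ values in the figure above, we read off $v_{\pi}(12) = -22$, $v_{\pi}(13) = -20$, and $v_{\pi}(14) = -14$. Because the policy is the equiprobable random policy, $\pi(left|15),\pi(up|15),\pi(right|15),\pi(down|15)$ all equal $\frac{1}{4}$. Substituting these into the equation above and simplifying gives: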
$$
@@ -61,5 +63,17 @@ $$

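As a numeric check of part (1) of the previous solution, the expansion of $v_{\pi}(15)$ is linear in $v_{\pi}(15)$, so it can be solved directly or by repeated substitution. The sketch below assumes the values quoted in that solution ($v_{\pi}(12) = -22$, $v_{\pi}(13) = -20$, $v_{\pi}(14) = -14$), a reward of $-1$ per step, $\gamma = 1$, and the equiprobable policy; the variable names are illustrative and not part of the original file.

```python
# Values read from the converged v_pi table, as quoted in the solution.
v = {12: -22.0, 13: -20.0, 14: -14.0}
prob = 0.25    # pi(a|15) for each of the four actions (equiprobable policy)
reward = -1.0  # reward on every transition
gamma = 1.0    # undiscounted task

# Contribution of left, up, right, which lead to states 12, 13, 14.
known = sum(prob * (reward + gamma * v[s]) for s in (12, 13, 14))

# The "down" action leads back to state 15, so
#   v15 = known + prob * (reward + gamma * v15)
# which rearranges to the closed form below.
v15_closed = (known + prob * reward) / (1.0 - prob * gamma)

# The same fixed point reached by repeated substitution.
v15_iter = 0.0
for _ in range(200):
    v15_iter = known + prob * (reward + gamma * v15_iter)

print(v15_closed)  # -20.0
```

Both routes agree because the self-loop enters the equation with coefficient $\frac{1}{4}$, which makes the substitution a contraction.
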
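> What are the equations analogous to (4.3), (4.4), and (4.5) for the action-value function $q_{\pi}$ and its successive approximation by a sequence of functions $q_0, q_1, q_2,...$?
*Solution:*
*Solution:* The equations analogous to (4.3) and (4.4) are the Bellman equation for the action-value function $q_{\pi}(s, a)$: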
$$
\begin{align}
q_{\pi}(s, a) &= \mathbb{E}_{\pi}[G_t|S_t = s, A_t = a] \\
&=\mathbb{E}_{\pi}[R_{t + 1}|S_t = s, A_t = a] + \gamma \mathbb{E}_{\pi}[G_{t + 1}|S_t = s, A_t = a] \\
&=\mathbb{E}_{\pi}[R_{t + 1} + \gamma q_{\pi}(S_{t + 1}, A_{t + 1})|S_t = s, A_t = a] \\
&=\sum_{s',r}p(s',r|s,a)[r + \gamma \sum_{a'}\pi(a'|s')q_{\pi}(s',a')]
\end{align}
$$
The analogue of (4.5), the iterative update that approximates the action-value function, is:
$$
q_{k + 1}(s, a) = \sum_{s',r}p(s',r|s,a)[r + \gamma \sum_{a'}\pi(a'|s')q_{k}(s',a')]
$$
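The update for $q_{k+1}$ can be tried out on the 4x4 gridworld used earlier in this file. The following is a minimal sketch rather than code from the book: it assumes deterministic moves, a reward of $-1$ per step, $\gamma = 1$, the equiprobable random policy, and terminal cells in the top-left and bottom-right corners. States are written as `(row, col)` pairs, so the book's states 12, 13, 14 correspond to `(3, 0)`, `(3, 1)`, `(3, 2)`. The names `step`, `evaluate_q`, and `state_values` are hypothetical helpers.

```python
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
GAMMA = 1.0
PI = 1.0 / len(ACTIONS)  # equiprobable random policy


def step(state, action):
    """Deterministic dynamics: return (next_state, reward)."""
    if state in TERMINALS:
        return state, 0.0  # terminal cells absorb with zero reward
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < N and 0 <= nc < N):
        nr, nc = r, c  # moves off the grid leave the state unchanged
    return (nr, nc), -1.0


def evaluate_q(theta=1e-10):
    """Apply q_{k+1}(s,a) = r + gamma * sum_a' pi(a'|s') q_k(s',a') until stable."""
    states = [(r, c) for r in range(N) for c in range(N)]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    while True:
        new_q = {}
        for s in states:
            for a in ACTIONS:
                s2, reward = step(s, a)
                expected_next = sum(PI * q[(s2, a2)] for a2 in ACTIONS)
                new_q[(s, a)] = reward + GAMMA * expected_next
        delta = max(abs(new_q[k] - q[k]) for k in q)
        q = new_q
        if delta < theta:
            return q


def state_values(q):
    """Recover v_pi(s) = sum_a pi(a|s) q_pi(s, a)."""
    states = {s for (s, _) in q}
    return {s: sum(PI * q[(s, a)] for a in ACTIONS) for s in states}


q = evaluate_q()
v = state_values(q)
print(round(v[(3, 0)], 3), round(v[(3, 1)], 3), round(v[(3, 2)], 3))
```

Because the dynamics are deterministic, the sum over $s', r$ collapses to a single term per state-action pair. Averaging the resulting $q$ over actions should reproduce the converged $v_{\pi}$ values quoted for states 12, 13, and 14.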

