
Have you tried using multiple cpu on the Example here in A2C? #25

Closed
toksis opened this issue Sep 27, 2020 · 8 comments

toksis commented Sep 27, 2020

I am trying to use multiple CPUs for the example provided here.

I changed the environment to a vectorized one with multiple copies:

env = DummyVecEnv([env_maker for i in range(16)])

But now I have a problem with done and info in stable-baselines: they are turned into arrays.

The code below raises an error. Any suggestions, or has anyone done this? It seems the LSTM policies in stable-baselines require a vectorized environment like this.

#env = env_maker()
#observation = env.reset()

while True:
    #observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done:
        print("info:", info)
        break

------------------------------

Error:

```python
ValueError                                Traceback (most recent call last)
<ipython-input-27-2d78acbb8800> in <module>
     10 
     11     # env.render()
---> 12     if done:
     13         print("info:", info)
     14         break

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

AminHP (Owner) commented Sep 28, 2020

Use done.all() in the if statement.
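For reference, a minimal sketch of the evaluation loop with that fix, reusing the env, model, and observation names from the snippet above:

```python
observation = env.reset()

while True:
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # with a vectorized env, `done` is an array with one flag per
    # sub-environment, so break only when all of them have finished
    if done.all():
        print("info:", info)
        break
```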

toksis (Author) commented Sep 29, 2020

It works, but render_all does not work, even when using env.env_method(method_name='render_all') to call the method.

AminHP (Owner) commented Sep 29, 2020

Use this code instead:

import matplotlib.pyplot as plt

for e in env.envs:
    plt.figure(figsize=(16, 6))
    e.render_all()
    plt.show()

AminHP (Owner) commented Sep 29, 2020

There is a fact you should consider: DummyVecEnv automatically resets each environment as soon as it is done.
To prevent this, you can apply the code below after importing DummyVecEnv; it replaces step_wait with a version that skips the automatic reset.

from copy import deepcopy
import numpy as np

def step_wait(self):
    for env_idx in range(self.num_envs):
        obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] =\
            self.envs[env_idx].step(self.actions[env_idx])
        if self.buf_dones[env_idx]:
            # save final observation where user can get it; skip the automatic reset
            self.buf_infos[env_idx]['terminal_observation'] = obs
            # obs = self.envs[env_idx].reset()
        self._save_obs(env_idx, obs)
    return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
            deepcopy(self.buf_infos))


DummyVecEnv.step_wait = step_wait

toksis (Author) commented Oct 1, 2020

Hello,

Removing the .reset in DummyVecEnv results in this error. It happens at timesteps = 32000:

   current_price = self.prices[self._current_tick]
IndexError: index 2335 is out of bounds for axis 0 with size 2335

I think 2335 is the length of the data frame.

Error:

 File "e:\ml\reinforcementlearning\tradeorig\stable-baselines\stable_baselines\common\vec_env\base_vec_env.py", line 150, in step
    return self.step_wait()
  File "e:\ML\reinforcementlearning\tradeorig\testorig.py", line 29, in step_wait
    self.envs[env_idx].step(self.actions[env_idx])
  File "C:\anaconda\envs\gymanytradingOrig\lib\site-packages\gym_anytrading\envs\trading_env.py", line 78, in step
    step_reward = self._calculate_reward(action)
  File "C:\anaconda\envs\gymanytradingOrig\lib\site-packages\gym_anytrading\envs\stocks_env.py", line 39, in _calculate_reward
    current_price = self.prices[self._current_tick]
IndexError: index 2335 is out of bounds for axis 0 with size 2335

AminHP (Owner) commented Oct 1, 2020

from copy import deepcopy
import numpy as np
import pandas as pd

import gym
import gym_anytrading
import quantstats as qs

from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

import matplotlib.pyplot as plt


df = gym_anytrading.datasets.STOCKS_GOOGL.copy()

window_size = 10
start_index = window_size
end_index = len(df)

env_maker = lambda: gym.make(
    'stocks-v0',
    df = df,
    window_size = window_size,
    frame_bound = (start_index, end_index)
)

env = DummyVecEnv([env_maker for _ in range(16)])

policy_kwargs = dict(net_arch=[64, 'lstm', dict(vf=[128, 128, 128], pi=[64, 64])])
model = A2C('MlpLstmPolicy', env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)


class DummyVecEnv2(DummyVecEnv):
    def step_wait(self):
        for env_idx in range(self.num_envs):
            obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = \
                self.envs[env_idx].step(self.actions[env_idx])
            if self.buf_dones[env_idx]:
                # save final observation where user can get it; skip the automatic reset
                self.buf_infos[env_idx]['terminal_observation'] = obs
                # obs = self.envs[env_idx].reset()
            self._save_obs(env_idx, obs)
        return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
                deepcopy(self.buf_infos))


env = DummyVecEnv2([env_maker for i in range(16)])
observation = env.reset()

while True:
    # observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done.all():
        print("info:", info)
        break

for e in env.envs:
    plt.figure(figsize=(16, 6))
    e.render_all()
    plt.show()

toksis (Author) commented Oct 2, 2020

You are a guru! It works now. What you did was, after learning, override DummyVecEnv by removing the reset. Am I correct?

AminHP (Owner) commented Oct 2, 2020

Thanks man :)

Yeah, somewhat, but I didn't override DummyVecEnv itself this time. I inherited a new class from it (DummyVecEnv2) and overrode its step_wait method so that the automatic reset is skipped.
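To make the distinction concrete, here is a minimal sketch of the two approaches from this thread (the step_wait body is the one shown in the script above):

```python
from stable_baselines.common.vec_env import DummyVecEnv

# Earlier comment: monkey-patch the class itself. This changes every
# DummyVecEnv instance, including the one used during model.learn(),
# which is presumably why the IndexError appeared while training.
# DummyVecEnv.step_wait = step_wait

# Final script: keep DummyVecEnv untouched for training and use a subclass,
# whose step_wait skips the automatic reset, only for evaluation.
class DummyVecEnv2(DummyVecEnv):
    def step_wait(self):
        ...  # same body as above, with the reset line commented out

env = DummyVecEnv2([env_maker for _ in range(16)])
```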

AminHP closed this as completed Oct 3, 2020