
Have you tried using multiple cpu on the Example here in A2C? #25

Closed
toksis opened this issue Sep 27, 2020 · 8 comments

toksis commented Sep 27, 2020

I am trying to use multiple CPUs for the example provided here.

I changed the environment to a vectorized one with multiple copies:

env = DummyVecEnv([env_maker for i in range(16)])

But now I have a problem with done and info in stable-baselines: they are turned into arrays.

The code below raises an error. Any suggestions, or has anyone done this? It seems the LSTM policies in stable-baselines require a vectorized environment like this.

#env = env_maker()
#observation = env.reset()

while True:
    #observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done:
        print("info:", info)
        break

------------------------------

Error:

```python
ValueError                                Traceback (most recent call last)
<ipython-input-27-2d78acbb8800> in <module>
     10 
     11     # env.render()
---> 12     if done:
     13         print("info:", info)
     14         break

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```

AminHP (Owner) commented Sep 28, 2020

Use done.all() in the if statement.
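For reference, a minimal sketch of the evaluation loop with that fix, reusing the env, model, and observation names from the snippet above:

```python
observation = env.reset()

while True:
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # with a vectorized env, `done` is an array with one flag per
    # sub-environment, so break only when all of them have finished
    if done.all():
        print("info:", info)
        break
```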

toksis (Author) commented Sep 29, 2020

It works, but render_all does not work, even when using env.env_method(method_name='render_all') to call the method.

AminHP (Owner) commented Sep 29, 2020

Use this code instead:

import matplotlib.pyplot as plt

for e in env.envs:
    plt.figure(figsize=(16, 6))
    e.render_all()
    plt.show()

AminHP (Owner) commented Sep 29, 2020

There is a fact you should consider: DummyVecEnv automatically resets each environment as soon as it is done.
To prevent this, you can apply the code below after importing DummyVecEnv; it replaces step_wait with a version that skips the automatic reset.

from copy import deepcopy
import numpy as np

def step_wait(self):
    for env_idx in range(self.num_envs):
        obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] =\
            self.envs[env_idx].step(self.actions[env_idx])
        if self.buf_dones[env_idx]:
            # save final observation where user can get it; skip the automatic reset
            self.buf_infos[env_idx]['terminal_observation'] = obs
            # obs = self.envs[env_idx].reset()
        self._save_obs(env_idx, obs)
    return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
            deepcopy(self.buf_infos))


DummyVecEnv.step_wait = step_wait

toksis (Author) commented Oct 1, 2020

Hello,

Removing the .reset in DummyVecEnv results in this error. It happens at timesteps = 32000:

   current_price = self.prices[self._current_tick]
IndexError: index 2335 is out of bounds for axis 0 with size 2335

I think 2335 is the length of the data frame.

Error:

 File "e:\ml\reinforcementlearning\tradeorig\stable-baselines\stable_baselines\common\vec_env\base_vec_env.py", line 150, in step
    return self.step_wait()
  File "e:\ML\reinforcementlearning\tradeorig\testorig.py", line 29, in step_wait
    self.envs[env_idx].step(self.actions[env_idx])
  File "C:\anaconda\envs\gymanytradingOrig\lib\site-packages\gym_anytrading\envs\trading_env.py", line 78, in step
    step_reward = self._calculate_reward(action)
  File "C:\anaconda\envs\gymanytradingOrig\lib\site-packages\gym_anytrading\envs\stocks_env.py", line 39, in _calculate_reward
    current_price = self.prices[self._current_tick]
IndexError: index 2335 is out of bounds for axis 0 with size 2335

AminHP (Owner) commented Oct 1, 2020

from copy import deepcopy
import numpy as np
import pandas as pd

import gym
import gym_anytrading
import quantstats as qs

from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

import matplotlib.pyplot as plt


df = gym_anytrading.datasets.STOCKS_GOOGL.copy()

window_size = 10
start_index = window_size
end_index = len(df)

env_maker = lambda: gym.make(
    'stocks-v0',
    df = df,
    window_size = window_size,
    frame_bound = (start_index, end_index)
)

env = DummyVecEnv([env_maker for _ in range(16)])

policy_kwargs = dict(net_arch=[64, 'lstm', dict(vf=[128, 128, 128], pi=[64, 64])])
model = A2C('MlpLstmPolicy', env, verbose=1, policy_kwargs=policy_kwargs)
model.learn(total_timesteps=1000)


class DummyVecEnv2(DummyVecEnv):
    def step_wait(self):
        for env_idx in range(self.num_envs):
            obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = \
                self.envs[env_idx].step(self.actions[env_idx])
            if self.buf_dones[env_idx]:
                # save final observation where user can get it; skip the automatic reset
                self.buf_infos[env_idx]['terminal_observation'] = obs
                # obs = self.envs[env_idx].reset()
            self._save_obs(env_idx, obs)
        return (self._obs_from_buf(), np.copy(self.buf_rews), np.copy(self.buf_dones),
                deepcopy(self.buf_infos))


env = DummyVecEnv2([env_maker for i in range(16)])
observation = env.reset()

while True:
    # observation = observation[np.newaxis, ...]

    # action = env.action_space.sample()
    action, _states = model.predict(observation)
    observation, reward, done, info = env.step(action)

    # env.render()
    if done.all():
        print("info:", info)
        break

for e in env.envs:
    plt.figure(figsize=(16, 6))
    e.render_all()
    plt.show()

toksis (Author) commented Oct 2, 2020

You are a guru! It works now. What you did was, after learning, override DummyVecEnv by removing the reset. Am I correct?

AminHP (Owner) commented Oct 2, 2020

Thanks man :)

Yeah, somewhat, but I didn't override DummyVecEnv itself this time. I inherited a new class from it (DummyVecEnv2) and overrode its step_wait method so that the automatic reset is skipped.
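To make the distinction concrete, here is a minimal sketch of the two approaches from this thread (the step_wait body is the one shown in the script above):

```python
from stable_baselines.common.vec_env import DummyVecEnv

# Earlier comment: monkey-patch the class itself. This changes every
# DummyVecEnv instance, including the one used during model.learn(),
# which is presumably why the IndexError appeared while training.
# DummyVecEnv.step_wait = step_wait

# Final script: keep DummyVecEnv untouched for training and use a subclass,
# whose step_wait skips the automatic reset, only for evaluation.
class DummyVecEnv2(DummyVecEnv):
    def step_wait(self):
        ...  # same body as above, with the reset line commented out

env = DummyVecEnv2([env_maker for _ in range(16)])
```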

AminHP closed this as completed Oct 3, 2020