[RLlib] Framework "tf2" raises error in `MLPEncoderConfig` #37413

simonsays1980 · 2023-07-14T09:56:37Z

What happened + What you expected to happen

What happened

I ran PPO with RLModule and _enable_learner_api=True using framework="tf2".

The following error occurred:

Failure # 1 (occurred at 2023-07-14_11-45-33)
The actor died because of an error raised in its creation task, ray::PPO.__init__() (pid=121731, ip=192.168.1.111, actor_id=c9691334abdff1dd36b4318b01000000, repr=PPO)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 242, in _setup
    self.add_workers(
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 635, in add_workers
    raise result.get()
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/utils/actor_manager.py", line 488, in __fetch_result
    result = ray.get(r)
ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=121801, ip=192.168.1.111, actor_id=72a53083634c71fe34135f9401000000, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x7f0e5855f2e0>)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 738, in __init__
    self._update_policy_map(policy_dict=self.policy_dict)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1985, in _update_policy_map
    self._build_policy_map(
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 2097, in _build_policy_map
    new_policy = create_policy_for_framework(
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/utils/policy.py", line 139, in create_policy_for_framework
    return policy_class(observation_space, action_space, merged_config)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/policy/eager_tf_policy.py", line 164, in __init__
    super(TracedEagerPolicy, self).__init__(*args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/ppo_tf_policy.py", line 81, in __init__
    base.__init__(
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/policy/eager_tf_policy_v2.py", line 115, in __init__
    self.model = self.make_rl_module()
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/policy/policy.py", line 421, in make_rl_module
    marl_module = marl_spec.build()
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/marl_module.py", line 452, in build
    return self.marl_module_class(module_config)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/rl_module.py", line 289, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/marl_module.py", line 56, in __init__
    super().__init__(config)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/rl_module.py", line 282, in __init__
    self.setup()
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/marl_module.py", line 63, in setup
    self._rl_modules[module_id] = module_spec.build()
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/rl_module.py", line 93, in build
    return self.module_class(module_config)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/rl_module.py", line 289, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/tf/ppo_tf_rl_module.py", line 20, in __init__
    TfRLModule.__init__(self, *args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/rl_module.py", line 289, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/tf/tf_rl_module.py", line 18, in __init__
    RLModule.__init__(self, *args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/rl_module/rl_module.py", line 282, in __init__
    self.setup()
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/ppo_rl_module.py", line 24, in setup
    self.encoder = catalog.build_actor_critic_encoder(framework=self.framework)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/algorithms/ppo/ppo_catalog.py", line 118, in build_actor_critic_encoder
    return self.actor_critic_encoder_config.build(framework=framework)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/configs.py", line 42, in checked_build
    return fn(self, framework, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/configs.py", line 767, in build
    return TfActorCriticEncoder(self)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/base.py", line 110, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/tf/encoder.py", line 42, in __init__
    ActorCriticEncoder.__init__(self, config)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/base.py", line 110, in new_init
    previous_init(self, *args, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/base.py", line 352, in __init__
    self.actor_encoder = config.base_encoder_config.build(
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/configs.py", line 42, in checked_build
    return fn(self, framework, **kwargs)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/configs.py", line 598, in build
    self._validate()
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/configs.py", line 589, in _validate
    super()._validate(framework)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/core/models/configs.py", line 102, in _validate
    get_activation_fn(self.hidden_layer_activation, framework=framework)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/models/utils.py", line 120, in get_activation_fn
    return nn.Tanh
AttributeError: '_NNStub' object has no attribute 'Tanh'

During handling of the above exception, another exception occurred:

ray::PPO.__init__() (pid=121731, ip=192.168.1.111, actor_id=c9691334abdff1dd36b4318b01000000, repr=PPO)
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 475, in __init__
    super().__init__(
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/tune/trainable/trainable.py", line 170, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 601, in setup
    self.workers = WorkerSet(
  File "/home/simon/git-projects/test-gym-experiments/.venv/lib/python3.9/site-packages/ray/rllib/evaluation/worker_set.py", line 194, in __init__
    raise e.args[0].args[2]
AttributeError: '_NNStub' object has no attribute 'Tanh'
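
The final frame suggests that `get_activation_fn` took its torch branch even though the run was configured for `tf2`, and that `nn` was a placeholder object rather than the real `torch.nn`. The following is a minimal, self-contained sketch of that mechanism. The `_NNStub` class here is a stand-in written for illustration, and `get_activation_fn_sketch` and `describe_lookup` are hypothetical simplifications, not RLlib's actual code.

```python
class _NNStub:
    """Stand-in for a module that could not be imported (illustrative only)."""


# Assumption: torch is not installed, so `nn` is a placeholder, not torch.nn.
nn = _NNStub()


def get_activation_fn_sketch(name, framework="tf2"):
    """Hypothetical, simplified activation lookup keyed on the framework string."""
    if framework == "torch":
        if name == "tanh":
            return nn.Tanh  # fails when `nn` is only a stub
    elif framework == "tf2":
        if name == "tanh":
            return "tf.nn.tanh"  # placeholder string instead of the real TF op
    raise ValueError(f"Unknown activation {name!r} for framework {framework!r}")


def describe_lookup(name, framework):
    """Return the lookup result, or the error message if the lookup fails."""
    try:
        return get_activation_fn_sketch(name, framework=framework)
    except AttributeError as err:
        return f"AttributeError: {err}"


print(describe_lookup("tanh", "tf2"))
print(describe_lookup("tanh", "torch"))
```

In this sketch the `tf2` lookup succeeds, while the `torch` lookup fails with the same kind of `AttributeError` as in the traceback, which is consistent with the wrong framework string reaching the lookup.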

What I expected to happen

That PPO with RLModule runs with framework="tf2" (TensorFlow 2) without raising an error.

Versions / Dependencies

Fedora 37
Python 3.9.12
Ray 2.5.1

Reproduction script

import ray
from ray.rllib.algorithms.ppo.ppo import PPOConfig
from ray import air, tune

config = (
    PPOConfig()
    .environment(
        env="CartPole-v1",
        disable_env_checking=True,
    )
    .framework(
        framework="tf2",
        eager_tracing=True,
    )
    .rollouts(
        rollout_fragment_length=200,
        num_envs_per_worker=2,
        num_rollout_workers=1,
        observation_filter="MeanStdFilter",
    )
    .resources(
        num_cpus_per_worker=2,
        num_cpus_for_local_worker=1,
    )
    .rl_module(
        _enable_rl_module_api=True,
    )
    .training(
        gamma=0.95,
        lr=5e-4,
        kl_coeff=0.2,
        train_batch_size=200*10,
        sgd_minibatch_size=240,
        _enable_learner_api=True,
    )
    .debugging(
        log_level="DEBUG",
    )
)

#ray.init(local_mode=True)
tuner = tune.Tuner(
    "PPO",
    param_space=config,
    run_config=air.RunConfig(
        stop={"training_iteration": 2},
    )
)
tuner.fit()

Issue Severity

Medium: It is a significant difficulty but I can work around it.

ArturNiederfahrenhorst · 2023-07-19T19:38:28Z

Thanks a lot for reporting this :)

ArturNiederfahrenhorst · 2023-07-19T19:45:05Z

@simonsays1980 Running the reproduction script does not result in an error on my side.
Can you have a look again?

simonsays1980 · 2023-07-20T07:20:49Z

> @simonsays1980 Running the reproduction script does not result in an error on my side. Can you have a look again?

I can reproduce it again after setting up a fresh virtual environment:

pyenv local 3.9.12
python -m venv .venv-2-5-1
source .venv-2-5-1/bin/activate
python -m pip install --upgrade pip
python -m pip install tensorflow tensorflow_probability
python -m pip install "ray[default,tune,rllib]"
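
These commands install TensorFlow but do not explicitly install PyTorch, so one difference from a development setup may be whether `torch` is importable. A small check like the one below shows which packages an environment provides, which can help when comparing two setups. This is only a sketch, and `installed_frameworks` is a hypothetical helper, not part of Ray.

```python
import importlib.util


def installed_frameworks(candidates=("torch", "tensorflow", "tensorflow_probability")):
    """Map each top-level package name to True if it can be found in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in candidates}


# Print one line per package so two environments can be compared side by side.
for name, present in installed_frameworks().items():
    print(f"{name}: {'installed' if present else 'missing'}")
```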

Are you already running a nightly build? It might already have been changed there.

ArturNiederfahrenhorst · 2023-07-20T21:01:09Z

That's what I'm thinking, too. I have not run this with 2.5.1, since the linked PR is targeted at master. On master, it appears to be fine.

simonsays1980 · 2023-07-21T06:00:19Z

> That's what I'm thinking, too. I have not run this with 2.5.1, since the linked PR is targeted at master. On master, it appears to be fine.

If this is fine on master there is nothing more to do. I will close this issue and the corresponding PR.

ArturNiederfahrenhorst · 2023-07-21T15:59:14Z

Thanks!

simonsays1980 · 2023-08-18T16:03:11Z

Sorry, I have to reopen. I do not see it fixed on master. With the latest nightly it still gives me the error. I can write a PR.

marcm-ml · 2023-09-06T09:18:19Z

I have the same error and passing framework to _validate (as done in the PR by @simonsays1980) fixes the issue!
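
The fix described here can be illustrated with a simplified sketch. Going by the traceback, `build()` calls `self._validate()` without arguments. The sketch assumes that `_validate` then falls back to a default of `"torch"`, which is a guess the thread does not confirm. The class names below are hypothetical and are not RLlib's actual classes.

```python
class ModelConfigSketch:
    """Hypothetical base config; the "torch" default is an assumption."""

    def __init__(self, hidden_layer_activation="tanh"):
        self.hidden_layer_activation = hidden_layer_activation

    def _validate(self, framework="torch"):
        # Return the framework used for validation so the effect is visible.
        return framework


class BuggyEncoderConfig(ModelConfigSketch):
    def build(self, framework="torch"):
        # The framework argument is dropped here, so validation uses the default.
        return self._validate()


class FixedEncoderConfig(ModelConfigSketch):
    def build(self, framework="torch"):
        # Forward the caller's framework so validation checks the right backend.
        return self._validate(framework)


print(BuggyEncoderConfig().build(framework="tf2"))
print(FixedEncoderConfig().build(framework="tf2"))
```

In the sketch, the buggy variant validates against `"torch"` even when asked to build for `"tf2"`, while the fixed variant validates against the requested framework.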

simonsays1980 added the bug and triage labels Jul 14, 2023

simonsays1980 mentioned this issue Jul 14, 2023

Fixed a bug with the '_validate()' method. It was missing the framewo… #37414

Closed

ArturNiederfahrenhorst self-assigned this Jul 19, 2023

ArturNiederfahrenhorst added the P0 and rllib labels and removed the triage label Jul 19, 2023

ArturNiederfahrenhorst added the @author-action-required label and removed the P0 label Jul 19, 2023

simonsays1980 closed this as completed Jul 21, 2023

simonsays1980 reopened this Aug 18, 2023

simonsays1980 mentioned this issue Aug 18, 2023

Added the framwork in the '_validate()' func call to enable framework tf2 #38610

Closed

marcm-ml mentioned this issue Sep 19, 2023

[RLLIB] Fix configs.py for other Frameworks as torch (e.g. TF2) #35975

Merged
