why there are two calls to the policy, also where is the non intrinsic characteristic of intrinsic reward? #13

mehdimashayekhi · 2019-01-10T17:22:31Z

Hi, thanks for sharing. I was wondering if you could explain why we need two calls to apply_policy in cnn_gru_policy_dynamics.py, here:

random-network-distillation/policies/cnn_gru_policy_dynamics.py, line 69 in f75c0f1:

self.apply_policy(self.ph_ob[None][:,:-1],

and here

random-network-distillation/policies/cnn_gru_policy_dynamics.py, line 83 in f75c0f1:

self.apply_policy(self.ph_ob[None],

Also, I have another question. According to the paper, the intrinsic reward should be treated as non-episodic while the extrinsic reward is treated as episodic, but I couldn't find where this non-episodic characteristic is handled for the intrinsic reward in the implementation. Shouldn't we also add this episodic reward (i.e., eprews) to the external reward (i.e., rews_ext)?

random-network-distillation/ppo_agent.py, line 241 in f75c0f1:

eprews = MPI.COMM_WORLD.allgather(np.mean(list(self.I.statlists["eprew"])))

I'd really appreciate your responses.


harri-edwards · 2019-02-01T18:02:20Z

There are two graphs created for the policy / predictor, one for rollout and one for optimization. This is because at rollout time the time dimension has size 1 and is better treated separately.
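The rollout/optimization split can be illustrated without TensorFlow. The following is a hypothetical NumPy toy, not the repository's code: it assumes a plain tanh RNN in place of the GRU, and the names `rnn_cell`, `rollout_step` and `optimize_unroll` are made up for illustration. It runs one set of weights through two paths, a single-step path that carries the hidden state between environment steps (time dimension of size 1) and an unrolled path over a whole stored segment, and checks that both produce the same logits.

```python
import numpy as np

# Hypothetical toy policy: a plain tanh RNN stands in for the GRU, and every
# name below is illustrative rather than taken from the repository.
rng = np.random.default_rng(0)
OBS_DIM, HID_DIM, N_ACTIONS = 4, 8, 3
W_x = rng.normal(scale=0.1, size=(OBS_DIM, HID_DIM))
W_h = rng.normal(scale=0.1, size=(HID_DIM, HID_DIM))
W_out = rng.normal(scale=0.1, size=(HID_DIM, N_ACTIONS))


def rnn_cell(x, h):
    """One recurrent step: x is (batch, OBS_DIM), h is (batch, HID_DIM)."""
    return np.tanh(x @ W_x + h @ W_h)


def rollout_step(ob, h):
    """Rollout path: ob is (batch, 1, OBS_DIM); the state is carried between env steps."""
    h = rnn_cell(ob[:, 0], h)
    return (h @ W_out)[:, None], h  # logits: (batch, 1, N_ACTIONS)


def optimize_unroll(obs, h0):
    """Optimization path: unroll over a whole (batch, T, OBS_DIM) segment."""
    h, logits = h0, []
    for t in range(obs.shape[1]):
        h = rnn_cell(obs[:, t], h)
        logits.append(h @ W_out)
    return np.stack(logits, axis=1), h  # logits: (batch, T, N_ACTIONS)


# Usage: feed the same segment through both paths with shared weights.
batch, T = 2, 5
obs = rng.normal(size=(batch, T, OBS_DIM))
h0 = np.zeros((batch, HID_DIM))

h, step_logits = h0, []
for t in range(T):  # one environment step at a time, as during rollout
    lg, h = rollout_step(obs[:, t:t + 1], h)
    step_logits.append(lg)
step_logits = np.concatenate(step_logits, axis=1)

seq_logits, h_seq = optimize_unroll(obs, h0)
print(np.allclose(step_logits, seq_logits))  # True
```

The single-step path never needs a sequence loop at acting time, while the unrolled path processes a stored segment in one call; sharing the weights is what keeps the two consistent.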

If you look at random-network-distillation/ppo_agent.py, line 294 in f75c0f1:

self.I.buf_advs = self.int_coeff*self.I.buf_advs_int + self.ext_coeff*self.I.buf_advs_ext

you'll see the intrinsic and extrinsic advantages are combined there.
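To connect this to the episodic/non-episodic question: the paper treats the intrinsic return as non-episodic and the extrinsic return as episodic, and one way to express that is in how each advantage stream is computed before the two are combined as in the line above. The sketch below is a hypothetical NumPy version of Generalized Advantage Estimation, not the repository's code; `gae`, `use_dones` and the `dones[t]` convention (True when the episode ends at step t) are assumptions for illustration, as are the discount factors and coefficients. The only difference between the two streams is whether an episode end cuts the recursion.

```python
import numpy as np


def gae(rews, vpreds, last_vpred, dones, gamma, lam, use_dones):
    """Generalized Advantage Estimation for a single environment.

    rews, vpreds, dones: length-T arrays; dones[t] is True if the episode ended
    at step t. last_vpred: value estimate of the state after step T-1.
    use_dones=False ignores episode ends, giving a non-episodic return.
    """
    T = len(rews)
    advs = np.zeros(T)
    lastgaelam = 0.0
    for t in reversed(range(T)):
        next_v = vpreds[t + 1] if t + 1 < T else last_vpred
        nonterminal = 1.0 - float(dones[t]) if use_dones else 1.0
        delta = rews[t] + gamma * next_v * nonterminal - vpreds[t]
        lastgaelam = delta + gamma * lam * nonterminal * lastgaelam
        advs[t] = lastgaelam
    return advs


# Hypothetical 4-step rollout: the episode ends after step 1 and a reward
# of 1.0 arrives at step 2, i.e. early in the next episode.
dones = np.array([False, True, False, False])
rews_int = np.array([0.0, 0.0, 1.0, 0.0])
rews_ext = np.array([0.0, 0.0, 1.0, 0.0])
vpred = np.zeros(4)  # zero value estimates keep the arithmetic easy to follow

# Intrinsic stream: non-episodic, so the reward still flows back across the boundary.
advs_int = gae(rews_int, vpred, 0.0, dones, gamma=0.99, lam=0.95, use_dones=False)
# advs_int -> approx [0.8845, 0.9405, 1.0, 0.0]

# Extrinsic stream: episodic, so the episode end cuts the recursion.
advs_ext = gae(rews_ext, vpred, 0.0, dones, gamma=0.999, lam=0.95, use_dones=True)
# advs_ext -> [0.0, 0.0, 1.0, 0.0]

int_coeff, ext_coeff = 1.0, 2.0  # illustrative weights
advs = int_coeff * advs_int + ext_coeff * advs_ext
```

With identical reward sequences, only the intrinsic stream assigns credit to steps 0 and 1, because the non-episodic return lets value from the next episode propagate back across the boundary. The weighted sum on the last line has the same form as the buf_advs line quoted above.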

mehdimashayekhi changed the title ~~apply policy call~~ why there is two call to the policy, also where is the non intrinsic characteristic of intrinsic reward? Jan 12, 2019

mehdimashayekhi changed the title ~~why there is two call to the policy, also where is the non intrinsic characteristic of intrinsic reward?~~ why there are two calls to the policy, also where is the non intrinsic characteristic of intrinsic reward? Jan 12, 2019
