
Model learns the opposite direction, worst possible reward #13

Closed
realiti4 opened this issue Jun 17, 2020 · 4 comments

Comments

@realiti4

Hi, this is not really an issue, but after days of trying to figure this out I wanted to ask in case someone has advice for me. I first ran into this on my own custom env. I tried DQN, A2C, and PPO, and none of them knows which way to go; the return just fluctuates between the best and the worst possible reward. In a sense the model learns perfectly, because when the reward goes negative it hits the worst possible outcome. Then I tried your env, which is very clean and easy to understand, and I am having the exact same issue. Do you have any experience with something like this? I'm probably doing something wrong but couldn't find it. Thanks.
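
A minimal setup of the kind described above might look roughly like the sketch below. This is a hypothetical example, assuming this repo's `stocks-v0` environment and stable-baselines' A2C; the exact script was not posted.

```python
import gym
import gym_anytrading  # registers the 'stocks-v0' and 'forex-v0' environments

from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

# Build the built-in stocks environment on its default dataset.
env_maker = lambda: gym.make('stocks-v0', frame_bound=(50, 300), window_size=10)
env = DummyVecEnv([env_maker])

# Train a small A2C agent. The symptom described above is that the episode
# reward sometimes converges toward its worst possible value instead of its best.
model = A2C('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)
```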

@AminHP (Owner) commented Jun 19, 2020

Hi, can you show me a picture of your results? And I don't exactly understand what you mean by "It learns perfectly but doesn't know which way to go".

@realiti4 (Author)

Hi, thank you for the response. Sorry if I wasn't clear. What I meant was that, usually at the start of training, there is a high chance the model decides to maximize toward negative rewards. Sometimes this also happens later in training: the model is doing great toward positive rewards, then it suddenly flips and tries to maximize the negative reward. I'm not sure what I am doing wrong. The same models run fine on other problems. I'll rerun and upload my results today.

@AminHP (Owner) commented Jun 21, 2020

I had the same issue once and, honestly, I don't know exactly how to fix it. Maybe a bad reward function or a lack of useful features leads to this issue. Or maybe your model is more (or less) complex than what is actually needed. It can even be something inside your neural network (like activation functions or other hyperparameters) that causes this behavior.
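
As a first sanity check on the reward side, a sketch like the following (hypothetical, assuming the `stocks-v0` environment from this repo) can help rule out a sign or scaling problem: roll out a purely random policy and look at the per-step rewards before blaming the agent.

```python
import gym
import gym_anytrading  # registers 'stocks-v0' / 'forex-v0'
import numpy as np

# Roll out a random policy and inspect the reward distribution.
# A heavily skewed or mis-scaled reward can push value-based and
# policy-gradient agents alike toward the wrong direction.
env = gym.make('stocks-v0', frame_bound=(50, 300), window_size=10)

rewards = []
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    rewards.append(reward)

rewards = np.array(rewards)
print("steps:", len(rewards))
print("mean reward:", rewards.mean())
print("min/max reward:", rewards.min(), rewards.max())
print("fraction of negative steps:", (rewards < 0).mean())
```

If even this random baseline is strongly skewed to one side, normalizing or re-scaling the reward before training is worth trying.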

@realiti4 (Author)

Thanks. I'll continue investigating. If I find something, I'll post an update here in case someone finds it useful.
