
Model learns the opposite direction, worst possible reward #13

Closed
realiti4 opened this issue Jun 17, 2020 · 4 comments

Comments

@realiti4

Hi, this is not really an issue, but after days of trying to figure this out I wanted to ask in case someone has advice for me. I first ran into this on my own custom env. I tried DQN, A2C, and PPO, and none of them knows which way to go; the return just fluctuates between the best and the worst possible reward. In a sense the model learns perfectly, because when the reward goes negative it hits the worst possible outcome. Then I tried your env, which is very clean and easy to understand, and I am having the exact same issue. Do you have any experience with something like this? I'm probably doing something wrong but couldn't find it. Thanks.
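
A minimal setup of the kind described above might look roughly like the sketch below. This is a hypothetical example, assuming this repo's `stocks-v0` environment and stable-baselines' A2C; the exact script was not posted.

```python
import gym
import gym_anytrading  # registers the 'stocks-v0' and 'forex-v0' environments

from stable_baselines import A2C
from stable_baselines.common.vec_env import DummyVecEnv

# Build the built-in stocks environment on its default dataset.
env_maker = lambda: gym.make('stocks-v0', frame_bound=(50, 300), window_size=10)
env = DummyVecEnv([env_maker])

# Train a small A2C agent. The symptom described above is that the episode
# reward sometimes converges toward its worst possible value instead of its best.
model = A2C('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=100000)
```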

@AminHP (Owner) commented Jun 19, 2020

Hi, can you show me a picture of your results? And I don't exactly understand what you mean by "It learns perfectly but doesn't know which way to go".

@realiti4 (Author)

Hi, thank you for the response. Sorry if I wasn't clear. What I meant was that, usually at the start of training, there is a high chance the model decides to maximize toward negative rewards. Sometimes this also happens later in training: the model is doing great toward positive rewards, then it suddenly flips and tries to maximize the negative reward. I'm not sure what I am doing wrong. The same models run fine on other problems. I'll rerun and upload my results today.

@AminHP (Owner) commented Jun 21, 2020

I had the same issue once and, honestly, I don't know exactly how to fix it. Maybe a bad reward function or a lack of useful features leads to this issue. Or maybe your model is more (or less) complex than what is actually needed. It can even be something inside your neural network (like activation functions or other hyperparameters) that causes this behavior.
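
As a first sanity check on the reward side, a sketch like the following (hypothetical, assuming the `stocks-v0` environment from this repo) can help rule out a sign or scaling problem: roll out a purely random policy and look at the per-step rewards before blaming the agent.

```python
import gym
import gym_anytrading  # registers 'stocks-v0' / 'forex-v0'
import numpy as np

# Roll out a random policy and inspect the reward distribution.
# A heavily skewed or mis-scaled reward can push value-based and
# policy-gradient agents alike toward the wrong direction.
env = gym.make('stocks-v0', frame_bound=(50, 300), window_size=10)

rewards = []
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    rewards.append(reward)

rewards = np.array(rewards)
print("steps:", len(rewards))
print("mean reward:", rewards.mean())
print("min/max reward:", rewards.min(), rewards.max())
print("fraction of negative steps:", (rewards < 0).mean())
```

If even this random baseline is strongly skewed to one side, normalizing or re-scaling the reward before training is worth trying.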

@realiti4 (Author)

Thanks. I'll continue investigating. If I find something, I'll post an update here in case someone finds it useful.
