Evaluation for die reorientation #26

Closed
P-Schumacher opened this issue Sep 19, 2022 · 4 comments

Comments

@P-Schumacher
Collaborator

I feel that the evaluation criteria for the die reorientation task are a bit restrictive. Some of my policies are able to solve the task, but only before or after the specific time window that would count as a success.

Would it be possible to relax this a bit?

I suggest an episode limit of 200 and measuring a success if the goal is reached for 5 consecutive time steps at any point in the episode.

This preserves the spirit of the task, but is a bit easier.
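
A minimal sketch of the proposed criterion, assuming the evaluation has access to a per-step flag that says whether the die is within the goal thresholds. The function name `solved_any_window` and the `at_goal` input are hypothetical and are not part of the environment's API.

```python
from typing import Sequence


def solved_any_window(
    at_goal: Sequence[bool], hold_steps: int = 5, horizon: int = 200
) -> bool:
    """Return True if the goal is held for `hold_steps` consecutive steps
    anywhere within the first `horizon` steps of the episode."""
    streak = 0
    for reached in at_goal[:horizon]:
        # Extend the streak while the goal is held, reset it otherwise.
        streak = streak + 1 if reached else 0
        if streak >= hold_steps:
            return True
    return False
```

With this check, an episode that holds the goal for steps 10 to 14 and then loses it would still count as a success.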

@NaturalGradient

Just chiming in to say that I agree here. Based on my experiments, the current die reorientation task doesn't seem generally solvable within 50 time steps.

@vikashplus
Collaborator

  • We are strongly considering boosting the horizon of the Die task. Stay tuned.
  • The goal of the die task is to stabilize the object at the specified goal location. "Success if the goal is reached for 5 consecutive time steps at any point" doesn't seem to capture the essence of this task. There are also a few corner cases for this criterion: (a) a policy that aggressively spins the object will succeed; (b) a policy that stabilizes the object will have no advantage over a policy that throws the object to the goal location; (c) a variable horizon length will introduce artifacts in the effort calculations.
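
Corner cases (a) and (b) can be illustrated with a toy example. This is only a sketch: the per-step `at_goal` flags and both function names are hypothetical, and the end-of-episode check is an assumed stand-in for a stabilization criterion, not the actual evaluation code of the task.

```python
def solved_any_window(at_goal, hold_steps=5):
    """Success if the goal is held for `hold_steps` consecutive steps at any point."""
    streak = 0
    for reached in at_goal:
        streak = streak + 1 if reached else 0
        if streak >= hold_steps:
            return True
    return False


def solved_final_window(at_goal, hold_steps=5):
    """Hypothetical stricter check: the goal must be held over the last
    `hold_steps` steps of the episode."""
    return len(at_goal) >= hold_steps and all(at_goal[-hold_steps:])


# A die that passes through the goal pose for 5 steps and then moves away again.
spinning = [False] * 20 + [True] * 5 + [False] * 25
# A die that reaches the goal pose and stays there until the episode ends.
stabilized = [False] * 20 + [True] * 30

print(solved_any_window(spinning), solved_any_window(stabilized))      # True True
print(solved_final_window(spinning), solved_final_window(stabilized))  # False True
```

Under the "any point" check both trajectories count as solved, so the stabilizing policy gains nothing over the one that only passes through the goal.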

@P-Schumacher
Collaborator Author

Thank you for the reply. I didn't consider how the variable horizon length affects the effort calculation.
A time interval slightly longer than 5 steps might prevent the corner cases, but it would not solve the effort issue.

Thinking about it, 200 steps might even be slightly too long then. In my experiments, it's very difficult for the policy to stabilize the object within the tight thresholds during the right time window. But it's hard to say.

@Vittorio-Caggiano
Collaborator

closed with #29
