[ML] Force-stop for a stopping datafeed is ignored #48931

Closed
droberts195 opened this issue Nov 11, 2019 · 2 comments · Fixed by #49191
Labels
>bug :ml Machine learning

droberts195 commented Nov 11, 2019

Currently, a request to force-stop one or more datafeeds ignores datafeeds that are in the stopping state. The rationale is that such a datafeed will soon stop by itself. However, there is a situation where this will not happen: when the node that the datafeed was running on is no longer in the cluster. A datafeed can only stop "normally" by redirecting the stop request to the node on which it is running, so if that node no longer exists the datafeed gets stuck in the stopping state.
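For reference, a force-stop request of the kind discussed here goes through the stop datafeeds endpoint with `force=true`. The helper below is only an illustrative sketch: the function name and the base URL are assumptions, and it builds the request line without sending anything.

```python
from urllib.parse import quote, urlencode


def force_stop_request(base_url, datafeed_ids):
    """Build (method, url) for a force-stop of one or more datafeeds.

    Hypothetical helper for illustration; it does not contact a cluster.
    """
    # The endpoint accepts a comma-separated list of datafeed IDs.
    ids = ",".join(quote(d, safe="*") for d in datafeed_ids)
    query = urlencode({"force": "true"})
    return "POST", f"{base_url.rstrip('/')}/_ml/datafeeds/{ids}/_stop?{query}"


method, url = force_stop_request("http://localhost:9200", ["feed-a", "feed-b"])
print(method, url)
```

With the behaviour described above, such a request would have no effect on any listed datafeed that is already in the stopping state.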

Certainly for stopping datafeeds that are unassigned or that have stale assignments, force-stop should remove the persistent task.

Possibly the logic for force-stop could be changed so that persistent tasks are removed unconditionally for all datafeeds listed in the request, regardless of current state. But this requires a little more thought.
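The two options above can be compared with a small model. This is a sketch under stated assumptions, not the Elasticsearch implementation: `DatafeedTask`, `is_stuck_stopping` and `tasks_to_remove` are hypothetical names, and an unassigned task is modelled as `assigned_node=None`.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Set


class DatafeedState(Enum):
    STARTED = "started"
    STOPPING = "stopping"
    STOPPED = "stopped"


@dataclass(frozen=True)
class DatafeedTask:
    datafeed_id: str
    state: DatafeedState
    assigned_node: Optional[str]  # None models an unassigned task


def is_stuck_stopping(task: DatafeedTask, live_nodes: Set[str]) -> bool:
    # A stopping datafeed cannot finish a normal stop if it is unassigned
    # or if its assignment points at a node that has left the cluster.
    return (task.state is DatafeedState.STOPPING
            and task.assigned_node not in live_nodes)


def tasks_to_remove(tasks: Iterable[DatafeedTask],
                    live_nodes: Set[str],
                    unconditional: bool = False) -> List[str]:
    """Return IDs whose persistent task a force-stop would remove."""
    removed = []
    for task in tasks:
        if task.state is DatafeedState.STOPPED:
            continue  # assumed: nothing left to remove for a stopped datafeed
        if (task.state is DatafeedState.STARTED
                or unconditional
                or is_stuck_stopping(task, live_nodes)):
            removed.append(task.datafeed_id)
    return removed
```

With `unconditional=False` only the stuck stopping datafeeds are added to the started ones; with `unconditional=True` every listed datafeed that is not already stopped has its task removed.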

droberts195 added >bug :ml Machine learning labels Nov 11, 2019

elasticmachine commented

Pinging @elastic/ml-core (:ml)

droberts195 commented Nov 13, 2019

I'll also try to cover #43670 (comment) in the fix, as it's in the same part of the code.

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Nov 15, 2019
The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer
   ignored.  Force-stop is an important recovery mechanism
   if normal stop doesn't work for some reason, and needs
   to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from
   the cluster during a normal stop then the stop request is
   retried (and will likely succeed on this retry by simply
   cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for
   the same datafeed we no longer fail the one that is
   processed second.  The previous behaviour was wrong as
   stopping a stopped datafeed is not an error, so stopping
   a datafeed twice simultaneously should not be either.

Fixes elastic#43670
Fixes elastic#48931
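The three edge cases in the commit message can be illustrated with a toy in-memory model. Everything here (`ToyCluster`, `NodeLeftCluster`, the dictionary layout) is a hypothetical sketch and not the actual fix; it only shows the intended behaviour.

```python
class NodeLeftCluster(Exception):
    """Raised when a stop is redirected to a node that is no longer present."""


class ToyCluster:
    def __init__(self, live_nodes):
        self.live_nodes = set(live_nodes)
        self.tasks = {}  # datafeed_id -> {"state": str, "node": str}

    def start(self, datafeed_id, node):
        self.tasks[datafeed_id] = {"state": "started", "node": node}

    def force_stop(self, datafeed_id):
        # Cases 1 and 3: remove the persistent task whatever its state, and
        # treat "already gone" as success so concurrent requests both succeed.
        self.tasks.pop(datafeed_id, None)
        return True

    def _stop_on_node(self, datafeed_id):
        task = self.tasks[datafeed_id]
        task["state"] = "stopping"
        if task["node"] not in self.live_nodes:
            raise NodeLeftCluster(task["node"])
        del self.tasks[datafeed_id]

    def normal_stop(self, datafeed_id):
        if datafeed_id not in self.tasks:
            return True  # stopping a stopped datafeed is not an error
        try:
            self._stop_on_node(datafeed_id)
        except NodeLeftCluster:
            # Case 2: retry; here the retry simply cancels the persistent task.
            self.tasks.pop(datafeed_id, None)
        return True


cluster = ToyCluster(live_nodes={"node-1"})
cluster.start("feed-a", "node-2")      # node-2 is not in the cluster
print(cluster.normal_stop("feed-a"))   # retried after the node is found missing
print(cluster.force_stop("feed-a"))    # second stop of the same datafeed
```

In this model a datafeed left in the stopping state is still removed by `force_stop`, which is the recovery path the first item describes.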
droberts195 added a commit that referenced this issue Nov 19, 2019
Fixes #43670
Fixes #48931