[ML] Force-stop for a stopping datafeed is ignored #48931

droberts195 · 2019-11-11T09:21:06Z

Currently a request to force-stop one or more datafeeds ignores datafeeds that are in the stopping state. The rationale for this is that the datafeed will soon stop by itself. However, there is a situation where this will not happen: when the node that the datafeed was running on is no longer in the cluster. A datafeed can only stop "normally" by redirecting the request to the node on which it is running, so if that node no longer exists, the datafeed gets stuck in the stopping state.
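A minimal sketch of why a normal stop can get stuck, under stated assumptions: `DatafeedState`, `Datafeed` and `normal_stop` are hypothetical names that model the behaviour described above. They are not Elasticsearch code. The only rule modelled is that a normal stop must be handled by the assigned node.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class DatafeedState(Enum):
    STARTED = "started"
    STOPPING = "stopping"
    STOPPED = "stopped"


@dataclass
class Datafeed:
    datafeed_id: str
    state: DatafeedState
    assigned_node: Optional[str]  # node running the persistent task (None = unassigned)


def normal_stop(datafeed: Datafeed, cluster_nodes: Set[str]) -> bool:
    """Hypothetical model: the stop request is redirected to the assigned node,
    and only that node can complete the stop. Returns True if it completed."""
    if datafeed.state is DatafeedState.STOPPED:
        return True  # already stopped, nothing to do
    datafeed.state = DatafeedState.STOPPING
    if datafeed.assigned_node not in cluster_nodes:
        # Nowhere to redirect the request, so the datafeed stays STOPPING.
        return False
    # The assigned node finishes its work and marks the datafeed stopped.
    datafeed.state = DatafeedState.STOPPED
    return True
```

For example, a datafeed assigned to `node-2` in a cluster that now contains only `node-1` is left in `STOPPING`, and nothing in this model will ever move it on.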

Certainly for stopping datafeeds that are unassigned or that have stale assignments, force-stop should remove the persistent task.

Possibly the logic for force-stop could be changed so that persistent tasks are removed unconditionally for all datafeeds listed in the request, regardless of current state. But this requires a little more thought.
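The difference between the current and proposed force-stop selection can be sketched as follows. This assumes a hypothetical model where `tasks` maps a datafeed id to its state only while a persistent task exists, so a stopped datafeed has no entry. The `skip_stopping` flag is an illustrative switch between the two behaviours, not a real API parameter.

```python
from enum import Enum
from typing import Dict, List


class DatafeedState(Enum):
    STARTED = "started"
    STOPPING = "stopping"


def force_stop(tasks: Dict[str, DatafeedState], datafeed_ids: List[str],
               skip_stopping: bool) -> List[str]:
    """Remove persistent tasks for the requested datafeeds (hypothetical model).

    Returns the ids whose persistent tasks were removed.
    """
    removed = []
    for datafeed_id in datafeed_ids:
        state = tasks.get(datafeed_id)
        if state is None:
            continue  # no persistent task: already stopped, nothing to remove
        if skip_stopping and state is DatafeedState.STOPPING:
            continue  # current behaviour: assume it will stop by itself
        del tasks[datafeed_id]
        removed.append(datafeed_id)
    return removed
```

With `skip_stopping=True`, a `STOPPING` datafeed whose node has left keeps its task indefinitely. With `skip_stopping=False`, every requested datafeed that is not already stopped has its task removed, whatever its assignment.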


elasticmachine · 2019-11-11T09:21:08Z

Pinging @elastic/ml-core (:ml)

droberts195 · 2019-11-13T14:01:20Z

I'll also try to cover #43670 (comment) in the fix, as it's in the same part of the code.

The following edge cases were fixed:

1. A request to force-stop a stopping datafeed is no longer ignored. Force-stop is an important recovery mechanism if normal stop doesn't work for some reason, and needs to operate on a datafeed in any state other than stopped.
2. If the node that a datafeed is running on is removed from the cluster during a normal stop, then the stop request is retried (and will likely succeed on this retry by simply cancelling the persistent task for the affected datafeed).
3. If there are multiple simultaneous force-stop requests for the same datafeed, we no longer fail the one that is processed second. The previous behaviour was wrong: stopping a stopped datafeed is not an error, so stopping a datafeed twice simultaneously should not be either.

Fixes elastic#43670
Fixes elastic#48931
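Edge cases 2 and 3 can be illustrated with a small sketch. This is a hypothetical model, not the code merged for this issue: `tasks` maps a datafeed id to its assigned node, `send_stop` stands in for redirecting the stop to that node, and `NodeNotConnected` stands in for the node leaving mid-request. On retry the assignment is stale, so the persistent task is cancelled directly. Removing a task that a concurrent request already removed is treated as success.

```python
from typing import Callable, Dict, Set


class NodeNotConnected(Exception):
    """Hypothetical stand-in for the assigned node leaving mid-request."""


def stop_datafeed(tasks: Dict[str, str], datafeed_id: str, live_nodes: Set[str],
                  send_stop: Callable[[str], None], max_retries: int = 1) -> None:
    """Edge case 2: retry a normal stop if the assigned node disappears."""
    failures = 0
    while True:
        node = tasks.get(datafeed_id)
        if node is None:
            return  # already stopped, e.g. by a concurrent request
        if node not in live_nodes:
            # Stale assignment: no node can receive the redirected request,
            # so cancel the persistent task directly.
            tasks.pop(datafeed_id, None)
            return
        try:
            send_stop(node)  # the assigned node stops the datafeed
            tasks.pop(datafeed_id, None)
            return
        except NodeNotConnected:
            failures += 1
            if failures > max_retries:
                raise


def force_stop(tasks: Dict[str, str], datafeed_id: str) -> None:
    """Edge case 3: removing an already-removed task is not an error."""
    tasks.pop(datafeed_id, None)
```

Making `force_stop` idempotent means two simultaneous requests both succeed. The second request simply finds nothing left to remove.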

droberts195 added >bug :ml Machine learning labels Nov 11, 2019

This was referenced Nov 11, 2019

[ML] Datafeed assignment should cancel datafeeds with failed jobs #48934

Closed

[ML] Failure persisting datafeed timing stats leaves datafeed in limbo #49032

Closed

droberts195 self-assigned this Nov 13, 2019

droberts195 mentioned this issue Nov 13, 2019

MlDistributedFailureIT.testCloseUnassignedJobAndDatafeed fails with NodeNotConnectedException #43670

Closed

droberts195 mentioned this issue Nov 15, 2019

[ML] Fixes for stop datafeed edge cases #49191

Merged

droberts195 closed this as completed in #49191 Nov 19, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed
