[ResponseOps] Discrepancy between the rule state and task manager when performing bulkEnable or bulkDisable rules #192207

js-jankisalvi · 2024-09-05T17:06:58Z

Related to: #181050

When enabling rules, the Alerting framework skips rules if their saved object has alert.attributes.enabled: true.
This behavior creates issues described here and in the attached SDH.

There might be a situation where a rule is marked as enabled, but no corresponding task exists in the Task Manager.
In the UI, these rules will appear enabled but will never run. Users expect that all rules affected by the bulk enable action will get a corresponding task created in the Task Manager and be scheduled for execution.
Therefore, it would be best to also check if the rules to be enabled have tasks in the Task Manager, instead of relying solely on the rule's current enabled state.
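
A minimal sketch of what that check could look like. The `RuleSummary` shape, the `selectRulesToEnable` helper, and the pre-fetched set of task ids are all hypothetical names used for illustration, not the Alerting framework's actual types or API:

```typescript
// Hypothetical, simplified view of a rule saved object (assumption, not the real Kibana type).
interface RuleSummary {
  id: string;
  enabled: boolean;
  scheduledTaskId?: string; // id of the backing task, if one was ever created
}

// Returns the rules that still need work during a bulk enable:
// rules that are disabled, plus rules marked enabled that have no backing task.
function selectRulesToEnable(
  rules: RuleSummary[],
  existingTaskIds: ReadonlySet<string>
): RuleSummary[] {
  return rules.filter((rule) => {
    const hasTask =
      rule.scheduledTaskId !== undefined && existingTaskIds.has(rule.scheduledTaskId);
    return !rule.enabled || !hasTask;
  });
}
```

With this filter, a rule that is marked enabled but has no task is no longer skipped, so the bulk enable gets a chance to create and schedule its task.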


elasticmachine · 2024-09-05T17:07:00Z

Pinging @elastic/response-ops (Team:ResponseOps)

pmuellr · 2024-09-05T18:50:26Z

From a comment in a previous attempted PR: #189041 ...

In order to make the double update (rule and task) more resilient, we determined that this should be the order of updates:

for enable, we should update the task and then the rule
for disable, we should update the rule and then the task

analysis

Trying to capture what happens when a request to enable/disable a rule is run, but one of the updates to the task doc or the rule doc fails. In every case below it's the second update that fails (for example, "task updated then rule update fails" means the task doc was updated, then the rule doc update failed).

Two tables for each of enable / disable, showing the difference in the ordering of the updates (rule then task, or task then rule). The first column is the input state of the task and rule docs, and the second column is the final state after the failed update.

The mismatched input states would be the result of a bad enable/disable update, or some other bad thing that happened.

Final state of "task: !enabled, rule: enabled" should be avoided if possible. The rule will look enabled in the UX, but actually will not be running. These final states have an - XXX appended to them.

Final state of "task: enabled, rule: !enabled" is acceptable. In this case, the rule is the source of truth for enablement, so when the task runs, it will check if the rule is enabled. If it isn't, the rule execution code will instead disable the task, so it won't be polled for till re-enabled. Even if that update fails, we'll get it again the next time.
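
A rough sketch of that self-healing check, using hypothetical in-memory stand-ins (`RuleDoc`, `TaskDoc`, `runRuleTask`) rather than the real rule execution code:

```typescript
// Hypothetical in-memory stand-ins for the rule and task documents (assumptions).
interface RuleDoc {
  enabled: boolean;
}
interface TaskDoc {
  enabled: boolean;
}

type RunOutcome = 'executed' | 'skipped-and-task-disabled';

// Called when the task backing a rule is run.
// The rule doc is the source of truth: if it says disabled, the task
// disables itself instead of executing the rule.
function runRuleTask(rule: RuleDoc, task: TaskDoc, execute: () => void): RunOutcome {
  if (!rule.enabled) {
    task.enabled = false; // repair the mismatch so the task is no longer polled
    return 'skipped-and-task-disabled';
  }
  execute();
  return 'executed';
}
```

There is no equivalent repair path in the other direction: a disabled task never runs, so nothing ever notices that its rule claims to be enabled.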

enable: task updated then rule update fails

| input state | rule update fails |
| --- | --- |
| task: enabled, rule: enabled | task: enabled, rule: enabled |
| task: !enabled, rule: !enabled | task: enabled, rule: !enabled |
| task: enabled, rule: !enabled | task: enabled, rule: !enabled |
| task: !enabled, rule: enabled | task: enabled, rule: enabled |

enable: rule updated then task update fails

| input state | task update fails |
| --- | --- |
| task: enabled, rule: enabled | task: enabled, rule: enabled |
| task: !enabled, rule: !enabled | task: !enabled, rule: enabled - XXX |
| task: enabled, rule: !enabled | task: enabled, rule: enabled |
| task: !enabled, rule: enabled | task: !enabled, rule: enabled - XXX |

disable: task updated then rule update fails

| input state | rule update fails |
| --- | --- |
| task: enabled, rule: enabled | task: !enabled, rule: enabled - XXX |
| task: !enabled, rule: !enabled | task: !enabled, rule: !enabled |
| task: enabled, rule: !enabled | task: !enabled, rule: !enabled |
| task: !enabled, rule: enabled | task: !enabled, rule: enabled - XXX |

disable: rule updated then task update fails

| input state | task update fails |
| --- | --- |
| task: enabled, rule: enabled | task: enabled, rule: !enabled |
| task: !enabled, rule: !enabled | task: !enabled, rule: !enabled |
| task: enabled, rule: !enabled | task: enabled, rule: !enabled |
| task: !enabled, rule: enabled | task: !enabled, rule: !enabled |
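
The four tables can be reproduced with a small model. This is only a sketch of the reasoning, with hypothetical names (`EnablementState`, `afterSecondUpdateFails`, `stuckCount`); it does not touch real rule or task docs:

```typescript
type Op = 'enable' | 'disable';
type Order = 'task-then-rule' | 'rule-then-task';

interface EnablementState {
  task: boolean; // true = task doc enabled
  rule: boolean; // true = rule doc enabled
}

// Applies the first update of the chosen ordering and assumes the second one fails.
function afterSecondUpdateFails(op: Op, order: Order, input: EnablementState): EnablementState {
  const target = op === 'enable';
  return order === 'task-then-rule'
    ? { task: target, rule: input.rule }
    : { task: input.task, rule: target };
}

// The state to avoid (marked XXX in the tables): rule looks enabled, task is not.
function isStuck(state: EnablementState): boolean {
  return state.rule && !state.task;
}

const ALL_INPUTS: EnablementState[] = [
  { task: true, rule: true },
  { task: false, rule: false },
  { task: true, rule: false },
  { task: false, rule: true },
];

// Counts how many of the four input states end up in the XXX state for an ordering.
function stuckCount(op: Op, order: Order): number {
  return ALL_INPUTS.filter((s) => isStuck(afterSecondUpdateFails(op, order, s))).length;
}
```

In this model, "task then rule" for enable and "rule then task" for disable never produce an XXX state, while the opposite orderings each produce two, which matches the ordering recommended at the top of this comment.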

pmuellr · 2024-09-05T18:53:42Z

In one of the previous related PRs, I believe I noticed we did NOT have a test where we created a rule in a disabled state, and then did a bulk update of it. We should add a FT for this case, and anything similar/related. You can repro this today, manually, by importing a rule that has been exported - it will be disabled on import (with no backing task doc). You can then enable it via the rule list, which will use the bulk enable/disable functionality.

pmuellr · 2024-09-09T20:33:36Z

Assuming this PR gets merged, it may change the way we fix this problem: Add bulk update function that directly updates using the esClient #191760.

Since this allows a partial update, what I'm wondering is if we just blast the enabled field in the task docs, as appropriate, without any OCC. Previously we were doing a full update (essentially). We were thinking we would need to OCC a bunch of the task doc updates, but not really sure that we do. We have conflict resolution in the task claimer, but I don't think there's really a need for it here, for the tasks anyway.
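
To make the idea concrete, here is a sketch of what such partial updates might look like as Elasticsearch bulk API action/document pairs. The `buildEnabledUpdates` helper, the index name passed in, and the `task.enabled` document path are assumptions for illustration; this is not the function added in #191760:

```typescript
// One action line followed by one partial-document line per task,
// in the shape the Elasticsearch bulk API expects for update actions.
type BulkLine =
  | { update: { _index: string; _id: string } }
  | { doc: { task: { enabled: boolean } } };

// Builds partial updates that only touch the enabled field.
// No if_seq_no / if_primary_term is set on the action, so there is no
// optimistic concurrency control: the last writer wins for this one field.
function buildEnabledUpdates(index: string, taskIds: string[], enabled: boolean): BulkLine[] {
  const lines: BulkLine[] = [];
  for (const id of taskIds) {
    lines.push({ update: { _index: index, _id: id } });
    lines.push({ doc: { task: { enabled } } });
  }
  return lines;
}
```

Because only one field is written, a concurrent change to other parts of the task doc is not overwritten, which is the reason OCC may be unnecessary here.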

js-jankisalvi added the bug and Team:ResponseOps labels on Sep 5, 2024

js-jankisalvi mentioned this issue Sep 5, 2024

[ResponseOps] Some rules are getting skipped when performing bulkEnableRules and bulkDisableRules #181050

Closed

pmuellr mentioned this issue Sep 6, 2024

[ResponseOps][Rules] Fix inconsistencies in bulk enable/disable rules #191492

Merged

1 task
