Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[8.6] [Fleet] cancel tasks when 3rd retry failed (#147190) #147230

Merged
merged 1 commit into from
Dec 8, 2022

Conversation

kibanamachine
Copy link
Contributor

Backport

This will backport the following commits from main to 8.6:

Questions ?

Please refer to the Backport tool documentation

## Summary

Related to elastic#144161

Found that on a bulk update tags task failure, the task didn't stop
after 3 retries (should be over in less then a minute), the retries kept
happening for 2 hours.
This change removes the retry task if 3 retries are reached.

Also testing in cloud deployment to see if the tags error can be
reproduced with this fix.
I could reproduce the reported error locally, and seeing it goes away
with this fix.

To verify:
- Add at least 50k agents with the `create_agents` script in kibana repo
- open Kibana, select the 50k agents, and open Actions / Add tags
- Try this in a few seconds: add 2 new tags, and remove one of them
- Wait about 30s, the agents should reflect the changes
- Check the logs to see that the tasks are removed after 3rd retry is
reached or successful.
- Check that there are no more running tasks. Any running task can be
found in Kibana Console by running this query: `GET
.kibana_task_manager/_search?q=task.taskType:"fleet:update_agent_tags:retry"`

Locally simulated an error to test that the retry (and check) task is
removed:

```
[2022-12-07T15:52:16.415+01:00][ERROR][plugins.fleet] Retry elastic#3 of task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b failed: failing task
[2022-12-07T15:52:16.416+01:00][WARN ][plugins.fleet] Stopping after 3rd retry. Error: failing task
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:check:848984ab-c11d-4ebe-8d1f-606143dd656b
[2022-12-07T15:52:16.416+01:00][INFO ][plugins.fleet] Removing task fleet:update_agent_tags:retry:848984ab-c11d-4ebe-8d1f-606143dd656b
```

(cherry picked from commit 431c32b)
@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Dec 8, 2022
@kibanamachine kibanamachine enabled auto-merge (squash) December 8, 2022 08:18
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 59 65 +6
osquery 108 113 +5
securitySolution 441 447 +6
total +19

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 68 74 +6
osquery 109 115 +6
securitySolution 518 524 +6
total +20

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @juliaElastic

@kibanamachine kibanamachine merged commit fd4a6fc into elastic:8.6 Dec 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants