[Fleet] Implement recovery steps and retries in packages install process #175597

Closed
5 tasks done
criamico opened this issue Jan 25, 2024 · 1 comment · Fixed by #191515
Assignees
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@criamico
Contributor

criamico commented Jan 25, 2024

Phase 1 of #169147

Follow-up to #175592.

This is the core functionality needed to make a package installation recoverable. #178657 introduced a new state machine for the install process, but the following functionality still needs to be implemented:

  • Introduce a mechanism to restart the install from the step where it previously left off, so that steps that already succeeded are not repeated. This information should be available in the installationInfo object under latest_executed_state.
  • Introduce a flag that forces the whole reinstall to run from scratch. We could reuse force for that.
  • For each state transition:
    • Define the initial conditions that must be verified before executing that step, if needed. These conditions could be added alongside each step. We need to define which assets, SOs, etc. must be installed for each step to be considered "successful".
    • Once #189353 ([Fleet] Implement executePreCondition for packages state machine functionality) is done, pass the newly defined conditions to executePreCondition of each step.
    • Implement retries and recovery steps to execute when a step fails, if needed.

Note: State transitions are defined here.
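
The resume behaviour described in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions: the step names, the `Step` and `InstallationInfo` shapes (including the `name` and `error` fields under `latest_executed_state`), and the `runInstall` helper are all hypothetical and are not Fleet's actual state machine code. Only `latest_executed_state`, `force`, and the idea of a per-step precondition come from the issue.

```typescript
// Hypothetical step names for illustration; not Fleet's real state definitions.
type StepName = 'create_assets' | 'install_pipelines' | 'install_templates' | 'finalize';

interface Step {
  name: StepName;
  // Assumed hook: returns true when everything this step depends on is in place.
  preCondition?: () => boolean;
  run: () => void;
}

// Assumed shape: the issue only says the info lives under latest_executed_state.
interface InstallationInfo {
  latest_executed_state?: { name: StepName; error?: string };
}

function runInstall(steps: Step[], info: InstallationInfo, force = false): StepName[] {
  const executed: StepName[] = [];
  const last = info.latest_executed_state;
  let startIndex = 0;
  if (!force && last) {
    const idx = steps.findIndex((s) => s.name === last.name);
    // Rerun the recorded step if it failed; otherwise continue with the next one.
    if (idx >= 0) startIndex = last.error ? idx : idx + 1;
  }
  for (const step of steps.slice(startIndex)) {
    if (step.preCondition && !step.preCondition()) {
      throw new Error(`Precondition failed for step ${step.name}`);
    }
    try {
      step.run();
    } catch (e) {
      // Record where the install stopped so a later call can resume here.
      info.latest_executed_state = {
        name: step.name,
        error: e instanceof Error ? e.message : String(e),
      };
      throw e;
    }
    info.latest_executed_state = { name: step.name };
    executed.push(step.name);
  }
  return executed;
}
```

In this sketch, passing `force = true` ignores the recorded state and runs every step again, which mirrors the "reinstall from scratch" flag proposed above.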

@criamico criamico added the Team:Fleet Team label for Observability Data Collection Fleet team label Jan 25, 2024
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@criamico criamico self-assigned this Apr 17, 2024
@criamico criamico removed their assignment Jul 29, 2024
@criamico criamico changed the title from "[Fleet] Enhancements on package install state machine" to "[Fleet] Implement recovery steps and retries in packages install process" Jul 29, 2024
criamico added a commit that referenced this issue Aug 27, 2024
…90986)

Closes #189353

## Summary

Small change that implements a precondition function for the package
install state machine. This is needed for the subsequent work planned in
#169147.

Note that this code is added and tested but not yet used; it will only
be used once #175597 is implemented.


### Checklist
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
gergoabraham pushed a commit to gergoabraham/kibana that referenced this issue Sep 13, 2024
…ess (elastic#191515)

Closes elastic#175597

## Summary
This PR adds a `retryFromLastState` flag to the package install process
that, when an installation fails, allows the install to restart from the
step that failed. This builds on the changes implemented in
elastic#178657.
When a retry happens, each step cleans up previously installed assets
before running again. This should help the package install cleanly.
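
As a rough illustration of the "clean up, then run again" idea, here is a
minimal sketch. The `RetryableStep` shape, the `rerunStep` helper, and the
asset ids are hypothetical and are not taken from the Fleet code; the
sketch only assumes that each step knows how to remove what a previous
failed attempt may have left behind.

```typescript
// Hypothetical shape: a step pairs its work with a cleanup of its own assets.
interface RetryableStep {
  name: string;
  cleanup: (installed: Set<string>) => void;
  run: (installed: Set<string>) => void;
}

// On retry, remove leftovers from the failed attempt before running the step again.
function rerunStep(step: RetryableStep, installed: Set<string>): void {
  step.cleanup(installed);
  step.run(installed);
}

// Example step using made-up asset ids of the form "<kind>:<name>".
const pipelinesStep: RetryableStep = {
  name: 'install_pipelines',
  cleanup: (installed) => {
    for (const id of Array.from(installed)) {
      if (id.startsWith('pipeline:')) installed.delete(id);
    }
  },
  run: (installed) => {
    installed.add('pipeline:logs');
    installed.add('pipeline:metrics');
  },
};
```

Because the cleanup only touches assets owned by the step, assets installed
by earlier, successful steps are left untouched in this sketch.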

Retry conditions:
- It happens only when the failed install is of type `reinstall` (so not
in case of updates or rollbacks)
- It is executed at most three times, so a persistent error does not
cause an infinite loop
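
The two retry conditions above could be expressed along these lines. This
is a sketch, not the PR's implementation: `InstallType`, `MAX_RETRIES`,
`shouldRetry`, and `installWithRetries` are hypothetical names, and the
sketch assumes that "three times" means three retries after the initial
attempt.

```typescript
// Hypothetical install types; the PR only names reinstall, update and rollback.
type InstallType = 'install' | 'reinstall' | 'update' | 'rollback';

// Assumption: up to three retries after the first failed attempt.
const MAX_RETRIES = 3;

function shouldRetry(installType: InstallType, retriesSoFar: number): boolean {
  return installType === 'reinstall' && retriesSoFar < MAX_RETRIES;
}

// Runs `install`, retrying while the conditions hold. Returns the number of retries used.
function installWithRetries(installType: InstallType, install: () => void): number {
  let retries = 0;
  for (;;) {
    try {
      install();
      return retries;
    } catch (err) {
      // Give up on non-reinstall failures or once the retry budget is spent.
      if (!shouldRetry(installType, retries)) throw err;
      retries += 1;
    }
  }
}
```

Bounding the loop with a counter is what prevents a persistent error from
retrying forever; the final error is rethrown so the caller still sees it.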

### Testing
I tested by hardcoding the following error anywhere inside the install
functions:
```
throw new Error('Error installing');
```
and then running
```
 POST kbn:/api/fleet/epm/packages/nginx/1.23.0
 {"force": true }
```
This way it is possible to see the three retries when
`handleInstallPackageFailure` is executed.

### Checklist

- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>