[Fleet] Improve recoverability and stability of package installation #169147
Comments
Pinging @elastic/fleet (Team:Fleet) |
Adding some considerations, as discussed with @kpollich. We could start by documenting the specific steps that the installation process covers. We have a complex state machine, and we go through those steps (and many side effects) every time an integration is installed, but it isn't documented anywhere and the whole install process is a little opaque. This brings me to the second point: whenever an integration ends up in a bad state (like failed_install), we have no way to restart from the failed step and must run the whole installation again. As highlighted in this comment, we could even implement retries on those steps, but currently we don't have step-level granularity. It's just a single endpoint, and what we ask users to do is usually this:
A third consideration is that we could reuse the new input template endpoint to simplify the installation process. The endpoint currently returns only the inputs, but we could reuse part of its logic and add an endpoint under the same namespace that returns the rest of the integration info, which would simplify the whole install flow. |
Adding some comments per discussion with @nchaulet:
|
@kpollich I split the items in phase 1 and added some further details in the descriptions as we discussed recently. |
…90986)
Closes #189353
## Summary
Small change that implements a precondition function for the package install state machine. This is needed for the subsequent work planned in #169147. Note that this code is added and tested but not currently used; it will be used only once #175597 is implemented.
### Checklist
- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
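To make the idea of a precondition function concrete, here is a minimal sketch of how a state could declare a check that must pass before its transition runs. Every name in it (`InstallContext`, `preconditions`, `checkPrecondition`, and the state names) is hypothetical and chosen for illustration; it is not the code from the change referenced above.

```typescript
// Hypothetical context passed between install states (illustrative only).
interface InstallContext {
  packageName?: string;
  packageVersion?: string;
  assetsInstalled: boolean;
}

// A precondition returns null when the state may run,
// or a human-readable reason when it may not.
type Precondition = (ctx: InstallContext) => string | null;

// Hypothetical state names mapped to their preconditions.
const preconditions: Record<string, Precondition> = {
  install_assets: (ctx) =>
    ctx.packageName && ctx.packageVersion
      ? null
      : "package name and version are required",
  finalize: (ctx) =>
    ctx.assetsInstalled ? null : "assets must be installed first",
};

// States without a registered precondition are always allowed to run.
function checkPrecondition(state: string, ctx: InstallContext): string | null {
  const check = preconditions[state];
  return check ? check(ctx) : null;
}
```

Returning a reason string instead of a boolean is a design choice in this sketch: it lets the caller log why a transition was refused, which helps with the visibility problem this issue describes.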
@kpollich what should we do with the remaining issues here? Should we split them into another meta issue to be dealt with later? |
I don't think there's any reason to split them out into a new issue. This has sort of just become a tracking issue for tech debt to fill out quality sprints which I think is fine. |
OK, I have moved it out to Q1 for now. |
Meta issue tracking the work for recoverability and stability of package installation
Currently, it's hard to recover from a failed package installation, as our only recourse is typically to reinstall the package. There are no granular recovery steps we can take, and we often lack visibility into which particular step failed. Ideally, we would build a more "state machine"-like implementation for packages, with specific recovery steps for each state transition along the way.
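As a rough illustration of what step-level recovery could look like, the sketch below runs an ordered list of steps, retries each one a bounded number of times, and reports which step failed so that a later call can resume from it instead of starting over. The step names, the `Step` and `InstallResult` types, and `runSteps` are all hypothetical, and steps are modelled as synchronous functions only to keep the sketch short; real install steps would be asynchronous.

```typescript
// Hypothetical step names; the real install steps are not listed in this issue.
type StepName = "create_assets" | "install_templates" | "update_saved_object";

interface Step {
  name: StepName;
  run: () => void; // throws on failure
}

interface InstallResult {
  status: "installed" | "failed";
  failedStep?: StepName; // set only when status is "failed"
  completed: StepName[]; // steps that succeeded during this call
}

// Runs steps in order, optionally resuming from a previously failed step.
// An unknown or missing resumeFrom starts from the first step.
function runSteps(
  steps: Step[],
  resumeFrom?: StepName,
  maxAttempts = 2
): InstallResult {
  const completed: StepName[] = [];
  const found = resumeFrom ? steps.findIndex((s) => s.name === resumeFrom) : 0;
  const startIndex = Math.max(found, 0);

  for (const step of steps.slice(startIndex)) {
    let succeeded = false;
    for (let attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
      try {
        step.run();
        succeeded = true;
      } catch {
        // Swallow the error and retry until maxAttempts is reached.
      }
    }
    if (!succeeded) {
      return { status: "failed", failedStep: step.name, completed };
    }
    completed.push(step.name);
  }
  return { status: "installed", completed };
}
```

A caller would persist `failedStep` alongside the package's install status and pass it back as `resumeFrom` on the next attempt. This assumes each step is idempotent, which would need to be verified for the real install steps.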
Ref #166857
Ref #166798