Delete index when cloud snapshotting needs esArchiver retry #39919

Closed
liza-mae opened this issue Jun 28, 2019 · 6 comments · Fixed by #50781
Labels: failed-test, high, Team:Operations, test_infra, test, test-cloud

@liza-mae (Contributor)

On Cloud, I see this error once during our test runs:

```json
{"path":"/.kibana_1%2C.kibana_2","query":{},"statusCode":400,"response":"{\"error\":{\"root_cause\":[{\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[.kibana_1/QgsxgBUARzWvK3c9daDqKw], [.kibana_2/adK-0Z5NSyqjShoJNOd2tQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"snapshot_in_progress_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[.kibana_1/QgsxgBUARzWvK3c9daDqKw], [.kibana_2/adK-0Z5NSyqjShoJNOd2tQ]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}"}
```

When this occurred in #39381, Lee mentioned it might be related to esArchiver PR #18624.

I noticed that the code checks for status code 500, but a 400 is being returned here. Maybe we should also check for status code 400?

See:

```js
// Excerpt: pause for 500 ms before checking again.
await new Promise(resolve => setTimeout(resolve, 500));
```
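The line above is a fixed 500 ms pause. A minimal sketch of how such a pause is typically used inside a polling loop, waiting until a snapshot is no longer pending before the delete is retried, is shown below. `waitUntilSnapshotDone`, the `isPending` callback, and the `maxPolls` cap are hypothetical names chosen for illustration; in a real test harness `isPending` would presumably ask Elasticsearch whether the snapshot is still running.

```javascript
// Hypothetical polling helper: wait until `isPending()` resolves to false,
// sleeping `intervalMs` between checks (500 ms mirrors the excerpt above).
// `maxPolls` bounds the wait so a stuck snapshot cannot hang the test run.
async function waitUntilSnapshotDone(isPending, { intervalMs = 500, maxPolls = 120 } = {}) {
  for (let poll = 0; poll < maxPolls; poll++) {
    if (!(await isPending())) {
      return poll; // Number of sleeps it took before the snapshot finished.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Snapshot still in progress after ${maxPolls} polls`);
}
```

With the defaults, the helper gives up after roughly 120 × 500 ms = 60 s; both values are placeholders, not settings taken from esArchiver.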

@liza-mae added the test, Team:Operations, failed-test, and test-cloud labels on Jun 28, 2019
@elasticmachine (Contributor)

Pinging @elastic/kibana-operations

@elasticmachine (Contributor)

Pinging @elastic/kibana-test-triage

@liza-mae (Contributor, Author)

@jinmu03, we need someone from the Ops team to look at this issue. Thanks.

@tylersmalley (Contributor) commented Nov 14, 2019

I don't know what we are going to do about this. Does Cloud have an API or any other way to disable snapshotting? The issue is that ES does not allow you to delete an index while it's being snapshotted, and we delete indices all the time in functional tests. It might be possible to re-architect things so that deleting isn't required, but that's a pretty massive change.
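Since Elasticsearch rejects the delete while any running snapshot covers the index, one option for a test helper is to inspect in-progress snapshots before deleting and report which target indices are blocked. A minimal sketch follows. The response shape (a `snapshots` array whose entries carry an `indices` object keyed by index name) is an assumption modelled on the output of `GET _snapshot/_status`, and `findSnapshottedIndices` is a hypothetical name; the shape should be checked against the Elasticsearch version in use.

```javascript
// Hypothetical helper: given a snapshot status response (assumed shape:
// { snapshots: [{ repository, snapshot, indices: { [indexName]: {...} } }] }),
// return the target indices that a running snapshot currently covers,
// together with the snapshot that is blocking each one.
function findSnapshottedIndices(statusResponse, targetIndices) {
  const blocked = [];
  const snapshots = (statusResponse && statusResponse.snapshots) || [];
  for (const snap of snapshots) {
    const covered = Object.keys(snap.indices || {});
    for (const index of targetIndices) {
      if (covered.includes(index)) {
        blocked.push({ index, repository: snap.repository, snapshot: snap.snapshot });
      }
    }
  }
  return blocked;
}
```

An empty result suggests the delete can proceed. Otherwise the caller could wait for the listed snapshots to finish and then retry.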

@liza-mae (Contributor, Author) commented Nov 14, 2019

As noted above, Spencer added handling for this in esArchiver (#18624). Since the original code was looking for status 500 but we now seem to be getting 400, I thought checking for 400 as well might fix it. I am not familiar with esArchiver, so I'm not sure; maybe @spalger can comment on whether it would fix it.

Possibly this file? https://github.com/elastic/kibana/blob/master/src/es_archiver/lib/indices/delete_index.js
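The idea of checking for 400 as well can be sketched as a small predicate that matches on the `snapshot_in_progress_exception` type rather than on the status code alone. It accepts both 400 (as in the error logged at the top of this issue) and 500 (what the original check expected). `isSnapshotInProgressError` and the `{ statusCode, response }` error shape, modelled on the logged error, are assumptions for illustration, not esArchiver's actual code.

```javascript
// Hypothetical helper: decide whether a failed index delete was rejected
// because the index is currently being snapshotted.
// Assumes an error shaped like the logged one: { statusCode, response },
// where `response` is the raw JSON body returned by Elasticsearch.
function isSnapshotInProgressError(error) {
  // Accept both 400 (seen on Cloud) and 500 (what the original check expected).
  if (!error || (error.statusCode !== 400 && error.statusCode !== 500)) {
    return false;
  }
  let body;
  try {
    body = typeof error.response === "string" ? JSON.parse(error.response) : error.response;
  } catch (_) {
    return false; // Not JSON, so the failure cannot be classified.
  }
  // Match on the error type rather than relying on the status code alone.
  return Boolean(body && body.error && body.error.type === "snapshot_in_progress_exception");
}
```

Matching on the error type means the check keeps working if the status code Elasticsearch uses for this condition changes again.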

@liza-mae (Contributor, Author)

I took a closer look, and it seems we just need to increase the number of retries done in the es_archiver delete_index logic. I have put up a PR with the update. Tested on 7.5.0.
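Increasing the retries amounts to giving the delete more attempts before giving up. Below is a minimal sketch of a bounded retry wrapper. `deleteWithRetries`, its `retries`/`delayMs` options, and the `shouldRetry` predicate are hypothetical, and the defaults are placeholders rather than the values chosen in the actual PR.

```javascript
// Hypothetical bounded retry: call `deleteFn` and, while `shouldRetry(error)`
// says the failure is transient (e.g. a snapshot is in progress), wait and try
// again, up to `retries` extra attempts. By default every error is retried;
// errors for which `shouldRetry` returns false are rethrown immediately.
async function deleteWithRetries(deleteFn, { retries = 10, delayMs = 500, shouldRetry = () => true } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await deleteFn();
    } catch (error) {
      if (attempt >= retries || !shouldRetry(error)) {
        throw error; // Out of retries, or not a retryable failure.
      }
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A fixed count with a fixed delay caps the total wait at roughly `retries × delayMs` (about 5 s with these placeholder defaults), so if a Cloud snapshot runs longer than that, raising the count alone only helps up to a point.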
