[esArchiver/deleteIndex] wait and retry if snapshot in progress #18624
💚 Build Succeeded |
I'm still testing, but either it's not working or I did something wrong. I cherry-picked your commits onto the 6.2 branch so that I could run the tests against Cloud Elasticsearch/Kibana instances. I'm getting a couple of these failures:
|
Welp, thanks for actually testing this! I only added the retry logic to the delete index stream, which is used by the unload task, but didn't share that logic with the create stream, which also recreates indices and is where you ran into this error. I recreated your test by running Kibana with:
then starting a snapshot manually in console:
and quickly starting the es_archiver from the command line:
Which produced the following output:
|
💚 Build Succeeded |
Yes, I started testing it last Friday and hope to finish up today... |
I haven't figured out why I still get errors sometimes. Here it worked:
And here it still failed:
|
💔 Build Failed |
jenkins, test this |
💚 Build Succeeded |
LGTM
I don't know whether you only added logging in the last commits, but I can't break it now.
I usually see only the STARTED and SUCCESS statuses of the snapshots:
-------------------------- UNloading ---------------
│ info [logstash_functional] Unloading indices from "mappings.json"
│ info [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
│ debg Snapshot backups/snapshot_2183 is STARTED
│ debg Snapshot backups/snapshot_2183 is SUCCESS
│ info [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
│ debg Snapshot backups/snapshot_2184 is STARTED
│ debg Snapshot backups/snapshot_2184 is SUCCESS
│ info [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
│ debg Snapshot backups/snapshot_2185 is STARTED
│ debg Snapshot backups/snapshot_2185 is SUCCESS
│ info [logstash_functional] Deleted existing index "logstash-2015.09.22"
│ info [logstash_functional] Deleted existing index "logstash-2015.09.20"
│ info [logstash_functional] Deleted existing index "logstash-2015.09.21"
but sometimes I also see the INIT status:
│ info [logstash_functional] Unloading indices from "mappings.json"
│ info [logstash_functional] Deleted existing index "logstash-2015.09.22"
│ info [logstash_functional] Waiting for snapshot of "logstash-2015.09.20" to complete
│ debg Snapshot backups/snapshot_2261 is INIT
│ debg Snapshot backups/snapshot_2261 is SUCCESS
│ info [logstash_functional] Deleted existing index "logstash-2015.09.20"
│ info [logstash_functional] Deleted existing index "logstash-2015.09.21"
│ info [logstash_functional] Unloading indices from "data.json.gz"
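The debug lines in the logs above are consistent with a polling loop that reports each observed snapshot status until the snapshot is no longer running. A minimal sketch of that idea follows; it is not the PR's actual code. The function name `waitForSnapshotCompletion`, the injected `getStatus` callback, and the `pollMs` and `maxPolls` options are hypothetical, and treating only INIT and STARTED as "in progress" is an assumption based on the statuses seen in these logs.

```javascript
// Hypothetical sketch: names and options here are illustrative, not taken from the PR.
// Assumption: only the INIT and STARTED statuses mean the snapshot is still running.
const IN_PROGRESS_STATUSES = new Set(['INIT', 'STARTED']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Poll a snapshot until it leaves the in-progress statuses.
 * `getStatus(repository, snapshot)` is supplied by the caller and must
 * resolve to a status string such as 'INIT', 'STARTED' or 'SUCCESS'.
 */
async function waitForSnapshotCompletion({
  getStatus,
  repository,
  snapshot,
  log = console.log,
  pollMs = 500,
  maxPolls = 120,
}) {
  for (let poll = 0; poll < maxPolls; poll += 1) {
    const status = await getStatus(repository, snapshot);
    log(`Snapshot ${repository}/${snapshot} is ${status}`);

    if (!IN_PROGRESS_STATUSES.has(status)) {
      // Any status outside the in-progress set is treated as finished.
      return status;
    }

    await sleep(pollMs);
  }

  throw new Error(
    `Snapshot ${repository}/${snapshot} still in progress after ${maxPolls} checks`
  );
}
```

Passing `getStatus` in as a callback keeps the loop independent of any particular Elasticsearch client, and the `maxPolls` cap stops a stuck snapshot from hanging the test run indefinitely.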
…tic#18624) * [esArchiver/deleteIndex] wait and retry if snapshot in progress * [esArchiver/deleteIndex] use recursion for retry * [esArchiver/waitForSnapshot] invert status check * [esArchiver] share delete-with-retry with create stream * [esArchiver/stats] include index name in message * [esArchiver/indexDelete] wait for snapshot completion up to three times * [esArchiver] log status of snapshot during checks
…#18624) (#18936) * [esArchiver/deleteIndex] wait and retry if snapshot in progress * [esArchiver/deleteIndex] use recursion for retry * [esArchiver/waitForSnapshot] invert status check * [esArchiver] share delete-with-retry with create stream * [esArchiver/stats] include index name in message * [esArchiver/indexDelete] wait for snapshot completion up to three times * [esArchiver] log status of snapshot during checks
…#18624) (#18935) * [esArchiver/deleteIndex] wait and retry if snapshot in progress * [esArchiver/deleteIndex] use recursion for retry * [esArchiver/waitForSnapshot] invert status check * [esArchiver] share delete-with-retry with create stream * [esArchiver/stats] include index name in message * [esArchiver/indexDelete] wait for snapshot completion up to three times * [esArchiver] log status of snapshot during checks
…#18624) (#18934) * [esArchiver/deleteIndex] wait and retry if snapshot in progress * [esArchiver/deleteIndex] use recursion for retry * [esArchiver/waitForSnapshot] invert status check * [esArchiver] share delete-with-retry with create stream * [esArchiver/stats] include index name in message * [esArchiver/indexDelete] wait for snapshot completion up to three times * [esArchiver] log status of snapshot during checks
6.x/6.4: 7d15352 |
Fixes #17416
The esArchiver currently fails if it attempts to delete an index that is being snapshotted. Automatic snapshots occur frequently on Elastic Cloud and store their data in S3, so they are not fast, which means there is a good chance that at some point during the functional tests the esArchiver will throw an error.
I can't think of any reason why consumers of the esArchiver should be concerned with this error, so rather than failing when Elasticsearch responds with it, this PR adds wait-and-retry logic for the specific "Cannot delete indices that are being snapshotted" error.
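As an illustration of the wait-and-retry approach described above, here is a minimal sketch, not the code from this PR. The function name `deleteIndexWithRetry`, the injected `deleteIndex` callback, and the `retriesLeft` and `delayMs` options are hypothetical. It also assumes the failure surfaces as an `Error` whose message contains the quoted Elasticsearch text, and it uses a fixed delay where a real implementation might wait for the snapshot itself to finish.

```javascript
// Hypothetical sketch: names and options here are illustrative, not taken from the PR.
// Assumption: the rejected error's message contains the Elasticsearch error text.
const SNAPSHOT_IN_PROGRESS = /Cannot delete indices that are being snapshotted/;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Delete an index, waiting and retrying (recursively) when the deletion is
 * rejected because the index is being snapshotted. Any other error, or
 * running out of retries, rethrows the original error.
 * `deleteIndex(index)` is supplied by the caller.
 */
async function deleteIndexWithRetry({
  deleteIndex,
  index,
  log = console.log,
  retriesLeft = 3,
  delayMs = 1000,
}) {
  try {
    await deleteIndex(index);
    log(`Deleted existing index "${index}"`);
  } catch (error) {
    const message = String(error && error.message);
    const isSnapshotError = SNAPSHOT_IN_PROGRESS.test(message);

    if (!isSnapshotError || retriesLeft <= 0) {
      throw error;
    }

    log(`Waiting for snapshot of "${index}" to complete`);
    await sleep(delayMs);

    // Retry with one fewer retry available.
    await deleteIndexWithRetry({
      deleteIndex,
      index,
      log,
      retriesLeft: retriesLeft - 1,
      delayMs,
    });
  }
}
```

Matching only the specific error message means unrelated failures still surface immediately, and the retry cap keeps a long-running snapshot from blocking the archiver forever.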