[esArchiver/deleteIndex] wait and retry if snapshot in progress #18624

Merged

Conversation

@spalger commented Apr 27, 2018

Fixes #17416

The esArchiver currently fails if it attempts to delete an index while that index is being snapshotted. Automatic snapshotting occurs pretty frequently on Elastic Cloud, and since it stores data in S3 it isn't super fast, which means there's a good chance that at some point in the functional tests the esArchiver will throw this error.

I can't think of any reason why consumers of the esArchiver should be concerned with this error, so rather than failing when ES responds with it, this PR adds wait-and-retry logic for the specific "Cannot delete indices that are being snapshotted" error.
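
For illustration only, here's a minimal sketch of that wait-and-retry idea (hypothetical helper names and a fixed delay, not the actual esArchiver code):

// Hypothetical sketch, not the actual esArchiver implementation.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function deleteWithRetry(client, index, attemptsLeft = 3) {
  try {
    await client.indices.delete({ index });
  } catch (error) {
    const snapshotInProgress =
      error &&
      /Cannot delete indices that are being snapshotted/.test(error.message);

    // Only swallow the specific "snapshot in progress" error, and only
    // for a bounded number of attempts; anything else is rethrown.
    if (!snapshotInProgress || attemptsLeft === 0) {
      throw error;
    }

    await delay(1000); // the real logic waits on the snapshot itself
    await deleteWithRetry(client, index, attemptsLeft - 1);
  }
}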

@spalger added the review, Team:Core, v7.0.0, v6.3.0, and v6.4.0 labels Apr 27, 2018
@spalger requested review from rhoboat and LeeDr April 27, 2018 00:23
@spalger force-pushed the fix/esArchiver-retryWhenSnapshotting branch from 312bce8 to 12554e3 April 27, 2018 01:33
@spalger force-pushed the fix/esArchiver-retryWhenSnapshotting branch from 12554e3 to f58e51f April 27, 2018 01:52
@elasticmachine

💚 Build Succeeded

@spalger commented May 2, 2018

@archanid @LeeDr this is ready for review when you have a chance

@LeeDr commented May 3, 2018

I'm still testing, but it's either not working or I did something wrong. I cherry-picked your commits to the 6.2 branch so that I could run the tests against Cloud Elasticsearch/Kibana instances.

I'm getting a couple of these failures:
(I think the screenshot fails because of the embedded double quotes in "before all")
(the output of this test is a little confusing because these dashboard tests load the .kibana and logstash data in parallel)

               └-: dashboard view edit mode
                 └-> "before all" hook
                 └-> "before all" hook
                   │ debg  load kibana index with visualizations and log data
                   │ info  [dashboard] Loading "mappings.json"
                   │ info  [logstash_functional] Loading "mappings.json"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.21"
                   │ info  [logstash_functional] Loading "data.json.gz"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.22"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.20"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.21"
                   │ info  [logstash_functional] Loading "data.json.gz"
                   │ debg  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\failure\dashboard app dashboard view edit mode "before all" hook.png"
                   │ERROR  SCREENSHOT FAILED
                   │ERROR  Error: ENOENT: no such file or directory, open 'C:\git\master\kibana\test\functional\screenshots\failure\dashboard app dashboard view edit mode "before all" hook.png'
                 └- × fail: "dashboard app dashboard view edit mode "before all" hook"
                 │        [illegal_argument_exception] Cannot delete indices that are being snapshotted: [[.kibana/7CGPZ3COQp67Dip3_iiofw]]. Try again after snapshot finishes or cancel the currently running snapshot.
                 │         :: {"path":"/.kibana","query":{},"statusCode":400,"response":"{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[.kibana/7CGPZ3COQp67Dip3_iiofw]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[.kibana/7CGPZ3COQp67Dip3_iiofw]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}"}
                 │         at respond (node_modules\elasticsearch\src\lib\transport.js:295:15)
                 │         at checkRespForFailure (node_modules\elasticsearch\src\lib\transport.js:254:7)
                 │         at HttpConnector.<anonymous> (node_modules\elasticsearch\src\lib\connectors\http.js:159:7)
                 │         at IncomingMessage.bound (node_modules\elasticsearch\node_modules\lodash\dist\lodash.js:729:21)
                 │         at endReadableNT (_stream_readable.js:1064:12)
                 │         at _combinedTickCallback (internal/process/next_tick.js:138:11)
                 │         at process._tickCallback (internal/process/next_tick.js:180:9)
                 │
                 │
                   └-> "after all" hook
                     │ debg  gotoDashboardLandingPage
                     │ debg  onDashboardLandingPage
                     │ debg  TestSubjects.exists(dashboardLandingPage)
                     │ debg  existsByDisplayedByCssSelector [data-test-subj~="dashboardLandingPage"]
                   └-> "after all" hook

@spalger commented May 3, 2018

Welp, thanks for actually testing this! I only added the retry logic to the delete index stream, which is used by the unload task, but didn't share that logic with the create stream, which also recreates indices and is the place you ran into this error.
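
A rough sketch of what sharing that logic looks like (hypothetical names; the create path deletes any existing index before recreating it, so it reuses the same guarded delete):

// Hypothetical sketch: the create stream reuses the same
// delete-with-retry helper before recreating the index.
async function createIndexFresh(client, index, body) {
  await deleteWithRetry(client, index); // shared helper from the sketch above
  await client.indices.create({ index, body });
}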

I recreated your test by running Kibana with:

./bin/kibana --dev --elasticsearch.url=*** --elasticsearch.username=*** --elasticsearch.password=***

then starting a snapshot manually in console:

PUT /_snapshot/found-snapshots/foo
{
  "indices": "logstash-*",
  "ignore_unavailable": true,
  "include_global_state": false
}

# in case you need to start over
DELETE _snapshot/found-snapshots/foo

and quickly starting the es_archiver from the command line:

node scripts/es_archiver.js load logstash_functional --es-url ***

This produced the following output:

 info  [logstash_functional] Loading "mappings.json"
 info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
...

@elasticmachine

💚 Build Succeeded

@spalger commented May 7, 2018

@LeeDr @archanid This is ready for another look when you have time.

@LeeDr commented May 7, 2018

Yes, I started testing it last Friday and hope to finish up today...

@LeeDr commented May 7, 2018

I haven't figured out why I still have errors sometimes. Here it worked:

[2018-05-07 15:42:28]      │ info  [logstash_functional] Unloading indices from "mappings.json"
[2018-05-07 15:42:28]      │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
[2018-05-07 15:42:29]      │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"

And here it still failed:

[2018-05-07 15:51:55]      │ info  [logstash_functional] Unloading indices from "mappings.json"
[2018-05-07 15:51:55]      │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
[2018-05-07 15:51:56]      │ debg  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\failure\context app _before all_ hook.png"
[2018-05-07 15:51:56]    └- × fail: "context app "before all" hook"
[2018-05-07 15:51:56]    │        [illegal_argument_exception] Cannot delete indices that are being snapshotted: [[logstash-2015.09.22/7XJpUddNRIO3-G2nsErJWA]]. Try again after snapshot finishes or cancel the currently running snapshot.
[2018-05-07 15:51:56]    │         :: {"path":"/logstash-2015.09.22","query":{},"statusCode":400,"response":"{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[logstash-2015.09.22/7XJpUddNRIO3-G2nsErJWA]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[logstash-2015.09.22/7XJpUddNRIO3-G2nsErJWA]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}"}
[2018-05-07 15:51:56]    │         at respond (node_modules\elasticsearch\src\lib\transport.js:295:15)
[2018-05-07 15:51:56]    │         at checkRespForFailure (node_modules\elasticsearch\src\lib\transport.js:254:7)
[2018-05-07 15:51:56]    │         at HttpConnector.<anonymous> (node_modules\elasticsearch\src\lib\connectors\http.js:159:7)
[2018-05-07 15:51:56]    │         at IncomingMessage.bound (node_modules\elasticsearch\node_modules\lodash\dist\lodash.js:729:21)
[2018-05-07 15:51:56]    │         at endReadableNT (_stream_readable.js:974:12)
[2018-05-07 15:51:56]    │         at _combinedTickCallback (internal/process/next_tick.js:80:11)
[2018-05-07 15:51:56]    │         at process._tickCallback (internal/process/next_tick.js:104:9)
[2018-05-07 15:51:56]    │
[2018-05-07 15:51:56]    │
[2018-05-07 15:51:56]      └-> "after all" hook: unloadMakelogs
[2018-05-07 15:51:56]        │ info  [logstash_functional] Unloading indices from "mappings.json"
[2018-05-07 15:51:56]        │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
[2018-05-07 15:51:57]        │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"
[2018-05-07 15:51:57]        │ info  [logstash_functional] Deleted existing index "logstash-2015.09.20"
[2018-05-07 15:51:57]        │ info  [logstash_functional] Deleted existing index "logstash-2015.09.21"
[2018-05-07 15:51:57]        │ info  [logstash_functional] Unloading indices from "data.json.gz"
[2018-05-07 15:51:58]      └-> "after all" hook

@elasticmachine

💔 Build Failed

@spalger commented May 8, 2018

jenkins, test this

@spalger added the v6.2.5 label May 8, 2018
@elasticmachine

💚 Build Succeeded

@LeeDr left a comment

LGTM
I don't know if you only added logging in the last commits, but I can't break it now.
I usually only see the STARTED and SUCCESS statuses for the snapshots:

 -------------------------- UNloading ---------------
     │ info  [logstash_functional] Unloading indices from "mappings.json"
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
     │ debg  Snapshot backups/snapshot_2183 is STARTED
     │ debg  Snapshot backups/snapshot_2183 is SUCCESS
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
     │ debg  Snapshot backups/snapshot_2184 is STARTED
     │ debg  Snapshot backups/snapshot_2184 is SUCCESS
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
     │ debg  Snapshot backups/snapshot_2185 is STARTED
     │ debg  Snapshot backups/snapshot_2185 is SUCCESS
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.20"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.21"

but sometimes I also see the INIT status:

     │ info  [logstash_functional] Unloading indices from "mappings.json"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.20" to complete
     │ debg  Snapshot backups/snapshot_2261 is INIT
     │ debg  Snapshot backups/snapshot_2261 is SUCCESS
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.20"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.21"
     │ info  [logstash_functional] Unloading indices from "data.json.gz"
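
Those debg lines come from the status checks added here. Roughly, the wait step finds the running snapshot that involves the index and polls it until it reaches a terminal state (a sketch under assumptions, using the _snapshot/_status API; helper names and polling interval are illustrative):

// Hypothetical sketch of the wait-for-snapshot polling.
async function waitForSnapshotCompletion(client, index, log) {
  // GET /_snapshot/_status lists snapshots that are currently running
  const { snapshots = [] } = await client.snapshot.status();
  const running = snapshots.find((s) =>
    Object.keys(s.indices || {}).includes(index)
  );
  if (!running) return; // no running snapshot touches this index

  // poll that snapshot until it reaches a terminal state
  for (;;) {
    const status = await client.snapshot.status({
      repository: running.repository,
      snapshot: running.snapshot,
    });
    const state = status.snapshots[0].state;
    log.debug(`Snapshot ${running.repository}/${running.snapshot} is ${state}`);
    if (state === 'SUCCESS' || state === 'FAILED' || state === 'ABORTED') return;
    await new Promise((resolve) => setTimeout(resolve, 500));
  }
}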

@spalger merged commit 8a2a11e into elastic:master May 8, 2018
spalger added a commit to spalger/kibana that referenced this pull request May 8, 2018
…tic#18624)

* [esArchiver/deleteIndex] wait and retry if snapshot in progress

* [esArchiver/deleteIndex] use recursion for retry

* [esArchiver/waitForSnapshot] invert status check

* [esArchiver] share delete-with-retry with create stream

* [esArchiver/stats] include index name in message

* [esArchiver/indexDelete] wait for snapshot completion up to three times

* [esArchiver] log status of snapshot during checks
spalger added a commit to spalger/kibana that referenced this pull request May 8, 2018
…tic#18624)

spalger added a commit to spalger/kibana that referenced this pull request May 8, 2018
…tic#18624)

spalger added a commit that referenced this pull request May 8, 2018
…#18624) (#18936)

spalger added a commit that referenced this pull request May 8, 2018
…#18624) (#18935)

spalger added a commit that referenced this pull request May 8, 2018
…#18624) (#18934)
@spalger commented May 8, 2018

6.x/6.4: 7d15352
6.3: 3afd4a1
6.2: 514c327
