[esArchiver/deleteIndex] wait and retry if snapshot in progress #18624

spalger · 2018-04-27T00:23:17Z

The esArchiver currently fails if it attempts to delete an index while that index is being snapshotted. Automatic snapshotting happens frequently on Elastic Cloud and stores data in S3, so it isn't particularly fast, which means there is a good chance that at some point during the functional tests the esArchiver will throw an error.

I can't think of any reason why consumers of the esArchiver should have to deal with this error, so rather than failing when ES responds with it, this PR adds wait-and-retry logic that kicks in when the specific "Cannot delete indices that are being snapshotted" error is received.
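A minimal sketch of the wait-and-retry idea, written for illustration and not taken from the PR's code. It assumes a hypothetical `client.deleteIndex(name)` that returns a Promise and rejects with an error whose message contains the Elasticsearch reason text (as in the failure output further down this thread), and a hypothetical `waitForSnapshot(index)` that resolves once the snapshot has finished. Only the snapshot error is retried, and the number of waits is capped at three, mirroring the later "wait for snapshot completion up to three times" commit.

```javascript
// Illustrative sketch only: the client shape and helper names are hypothetical.
const SNAPSHOT_IN_PROGRESS = 'Cannot delete indices that are being snapshotted';

// True only for the specific "index is being snapshotted" rejection.
function isSnapshotInProgressError(error) {
  return Boolean(
    error && typeof error.message === 'string' && error.message.includes(SNAPSHOT_IN_PROGRESS)
  );
}

// Deletes `index`. On the snapshot error it waits for the snapshot and retries
// recursively, rethrowing once `maxWaits` waits have been used up.
async function deleteIndexWithRetry(client, waitForSnapshot, index, maxWaits = 3) {
  try {
    await client.deleteIndex(index);
  } catch (error) {
    if (!isSnapshotInProgressError(error) || maxWaits <= 0) {
      throw error; // unrelated error, or we have already waited enough times
    }
    await waitForSnapshot(index);
    return deleteIndexWithRetry(client, waitForSnapshot, index, maxWaits - 1);
  }
}
```

Usage: `await deleteIndexWithRetry(client, waitForSnapshot, 'logstash-2015.09.22')`. The recursive retry matches the "use recursion for retry" commit; every other error still propagates immediately, so genuine failures are not masked.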

elasticmachine · 2018-05-02T22:45:02Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

spalger · 2018-05-02T23:02:23Z

@archanid @LeeDr this is ready for review when you have a chance

LeeDr · 2018-05-03T21:38:52Z

I'm still testing, but either it's not working or I did something wrong. I cherry-picked your commits to the 6.2 branch so that I could run the tests against Cloud Elasticsearch/Kibana instances.

I'm getting a couple of these failures:
(I think the screenshot fails because of the embedded double quotes in "before all".)
(The output of this test is a little confusing because these dashboard tests load the .kibana and logstash data in parallel.)

               └-: dashboard view edit mode
                 └-> "before all" hook
                 └-> "before all" hook
                   │ debg  load kibana index with visualizations and log data
                   │ info  [dashboard] Loading "mappings.json"
                   │ info  [logstash_functional] Loading "mappings.json"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.21"
                   │ info  [logstash_functional] Loading "data.json.gz"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.22"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.20"
                   │ info  [logstash_functional] Skipped restore for existing index "logstash-2015.09.21"
                   │ info  [logstash_functional] Loading "data.json.gz"
                   │ debg  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\failure\dashboard app dashboard view edit mode "before all" hook.png"
                   │ERROR  SCREENSHOT FAILED
                   │ERROR  Error: ENOENT: no such file or directory, open 'C:\git\master\kibana\test\functional\screenshots\failure\dashboard app dashboard view edit mode "before all" hook.png'
                 └- × fail: "dashboard app dashboard view edit mode "before all" hook"
                 │        [illegal_argument_exception] Cannot delete indices that are being snapshotted: [[.kibana/7CGPZ3COQp67Dip3_iiofw]]. Try again after snapshot finishes or cancel the currently running snapshot.
                 │         :: {"path":"/.kibana","query":{},"statusCode":400,"response":"{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[.kibana/7CGPZ3COQp67Dip3_iiofw]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[.kibana/7CGPZ3COQp67Dip3_iiofw]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}"}
                 │         at respond (node_modules\elasticsearch\src\lib\transport.js:295:15)
                 │         at checkRespForFailure (node_modules\elasticsearch\src\lib\transport.js:254:7)
                 │         at HttpConnector.<anonymous> (node_modules\elasticsearch\src\lib\connectors\http.js:159:7)
                 │         at IncomingMessage.bound (node_modules\elasticsearch\node_modules\lodash\dist\lodash.js:729:21)
                 │         at endReadableNT (_stream_readable.js:1064:12)
                 │         at _combinedTickCallback (internal/process/next_tick.js:138:11)
                 │         at process._tickCallback (internal/process/next_tick.js:180:9)
                 │
                 │
                   └-> "after all" hook
                     │ debg  gotoDashboardLandingPage
                     │ debg  onDashboardLandingPage
                     │ debg  TestSubjects.exists(dashboardLandingPage)
                     │ debg  existsByDisplayedByCssSelector [data-test-subj~="dashboardLandingPage"]
                   └-> "after all" hook

spalger · 2018-05-03T22:53:05Z

Welp, thanks for actually testing this! I only added the retry logic to the delete index stream, which is used by the unload task, and didn't share it with the create stream, which also deletes existing indices before recreating them. That is where you ran into this error.
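One way to share the logic is to route both code paths through a single delete helper. The sketch below is illustrative rather than the esArchiver's actual stream code: `client.indexExists`, `client.deleteIndex` and `client.createIndex` are an assumed minimal Promise-based interface, and `unloadIndex`/`loadIndex` are hypothetical stand-ins for the unload task and the create stream.

```javascript
// Hypothetical sketch: one delete-with-retry helper used by both code paths.
function isSnapshotInProgressError(error) {
  return Boolean(
    error && String(error.message).includes('Cannot delete indices that are being snapshotted')
  );
}

// Retries the delete after waiting for the snapshot, at most `maxWaits` times.
async function deleteWithRetry(client, waitForSnapshot, index, maxWaits = 3) {
  for (let waits = 0; ; waits += 1) {
    try {
      return await client.deleteIndex(index);
    } catch (error) {
      if (!isSnapshotInProgressError(error) || waits >= maxWaits) throw error;
      await waitForSnapshot(index);
    }
  }
}

// Stand-in for the unload task: just delete the index.
async function unloadIndex(client, waitForSnapshot, index) {
  await deleteWithRetry(client, waitForSnapshot, index);
}

// Stand-in for the create stream: replace an existing index before creating it.
async function loadIndex(client, waitForSnapshot, index, mappings) {
  if (await client.indexExists(index)) {
    await deleteWithRetry(client, waitForSnapshot, index);
  }
  await client.createIndex(index, mappings);
}
```

Because the retry lives in one function, a snapshot that blocks the delete in the load path is handled exactly like one in the unload path.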

I recreated your test by running kibana with:

./bin/kibana --dev --elasticsearch.url=*** --elasticsearch.username=*** --elasticsearch.password=***

then starting a snapshot manually in Console:

PUT /_snapshot/found-snapshots/foo
{
  "indices": "logstash-*",
  "ignore_unavailable": true,
  "include_global_state": false
}

# in case you need to start over
DELETE _snapshot/found-snapshots/foo

and quickly starting the es_archiver from the command line:

node scripts/es_archiver.js load logstash_functional --es-url ***

Which produced the following output:

 info  [logstash_functional] Loading "mappings.json"
 info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
...

elasticmachine · 2018-05-03T23:49:49Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

spalger · 2018-05-07T15:02:23Z

@LeeDr @archanid This is ready for another look when you have time.

LeeDr · 2018-05-07T16:51:34Z

Yes, I started testing it last Friday and hope to finish up today...

LeeDr · 2018-05-07T20:46:53Z

I haven't figured out why I still get errors sometimes. Here it worked:

[2018-05-07 15:42:28]      │ info  [logstash_functional] Unloading indices from "mappings.json"
[2018-05-07 15:42:28]      │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
[2018-05-07 15:42:29]      │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"

And here it still failed:

[2018-05-07 15:51:55]      │ info  [logstash_functional] Unloading indices from "mappings.json"
[2018-05-07 15:51:55]      │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
[2018-05-07 15:51:56]      │ debg  Taking screenshot "C:\git\master\kibana\test\functional\screenshots\failure\context app _before all_ hook.png"
[2018-05-07 15:51:56]    └- × fail: "context app "before all" hook"
[2018-05-07 15:51:56]    │        [illegal_argument_exception] Cannot delete indices that are being snapshotted: [[logstash-2015.09.22/7XJpUddNRIO3-G2nsErJWA]]. Try again after snapshot finishes or cancel the currently running snapshot.
[2018-05-07 15:51:56]    │         :: {"path":"/logstash-2015.09.22","query":{},"statusCode":400,"response":"{\"error\":{\"root_cause\":[{\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[logstash-2015.09.22/7XJpUddNRIO3-G2nsErJWA]]. Try again after snapshot finishes or cancel the currently running snapshot.\"}],\"type\":\"illegal_argument_exception\",\"reason\":\"Cannot delete indices that are being snapshotted: [[logstash-2015.09.22/7XJpUddNRIO3-G2nsErJWA]]. Try again after snapshot finishes or cancel the currently running snapshot.\"},\"status\":400}"}
[2018-05-07 15:51:56]    │         at respond (node_modules\elasticsearch\src\lib\transport.js:295:15)
[2018-05-07 15:51:56]    │         at checkRespForFailure (node_modules\elasticsearch\src\lib\transport.js:254:7)
[2018-05-07 15:51:56]    │         at HttpConnector.<anonymous> (node_modules\elasticsearch\src\lib\connectors\http.js:159:7)
[2018-05-07 15:51:56]    │         at IncomingMessage.bound (node_modules\elasticsearch\node_modules\lodash\dist\lodash.js:729:21)
[2018-05-07 15:51:56]    │         at endReadableNT (_stream_readable.js:974:12)
[2018-05-07 15:51:56]    │         at _combinedTickCallback (internal/process/next_tick.js:80:11)
[2018-05-07 15:51:56]    │         at process._tickCallback (internal/process/next_tick.js:104:9)
[2018-05-07 15:51:56]    │
[2018-05-07 15:51:56]    │
[2018-05-07 15:51:56]      └-> "after all" hook: unloadMakelogs
[2018-05-07 15:51:56]        │ info  [logstash_functional] Unloading indices from "mappings.json"
[2018-05-07 15:51:56]        │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
[2018-05-07 15:51:57]        │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"
[2018-05-07 15:51:57]        │ info  [logstash_functional] Deleted existing index "logstash-2015.09.20"
[2018-05-07 15:51:57]        │ info  [logstash_functional] Deleted existing index "logstash-2015.09.21"
[2018-05-07 15:51:57]        │ info  [logstash_functional] Unloading indices from "data.json.gz"
[2018-05-07 15:51:58]      └-> "after all" hook

elasticmachine · 2018-05-07T22:34:38Z

💔 Build Failed

continuous-integration/kibana-ci/pull-request

spalger · 2018-05-08T16:57:50Z

jenkins, test this

elasticmachine · 2018-05-08T18:21:51Z

💚 Build Succeeded

continuous-integration/kibana-ci/pull-request

LeeDr

LGTM
I don't know whether you only added logging in the last commits, but I can't break it now.
I usually only see the STARTED and SUCCESS statuses for the snapshots:

 -------------------------- UNloading ---------------
     │ info  [logstash_functional] Unloading indices from "mappings.json"
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
     │ debg  Snapshot backups/snapshot_2183 is STARTED
     │ debg  Snapshot backups/snapshot_2183 is SUCCESS
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
     │ debg  Snapshot backups/snapshot_2184 is STARTED
     │ debg  Snapshot backups/snapshot_2184 is SUCCESS
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.22" to complete
     │ debg  Snapshot backups/snapshot_2185 is STARTED
     │ debg  Snapshot backups/snapshot_2185 is SUCCESS
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.20"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.21"

but sometimes I also see the INIT status:

     │ info  [logstash_functional] Unloading indices from "mappings.json"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.22"
     │ info  [logstash_functional] Waiting for snapshot of "logstash-2015.09.20" to complete
     │ debg  Snapshot backups/snapshot_2261 is INIT
     │ debg  Snapshot backups/snapshot_2261 is SUCCESS
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.20"
     │ info  [logstash_functional] Deleted existing index "logstash-2015.09.21"
     │ info  [logstash_functional] Unloading indices from "data.json.gz"
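The debug lines above log the snapshot state on each check. A hedged sketch of such a polling loop follows; `getSnapshotStatus` is a hypothetical function (it could, for example, be backed by the snapshot status API, `GET _snapshot/_status`) that returns `{ repository, snapshot, state }` for the running snapshot, or `null` when none is running. Treating only `INIT` and `STARTED` as "in progress" is an assumption based on the states seen in these logs; any other state ends the wait, and the bounded delete retry covers the case where the index is still locked.

```javascript
// Illustrative sketch: `getSnapshotStatus` and the options are hypothetical.
const IN_PROGRESS_STATES = new Set(['INIT', 'STARTED']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the snapshot leaves an in-progress state, logging each check.
// Resolves with the final state, or null if no snapshot was running.
async function waitForSnapshotCompletion(
  getSnapshotStatus,
  { intervalMs = 500, timeoutMs = 60000, log = () => {} } = {}
) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const status = await getSnapshotStatus();
    if (!status) return null;
    log(`Snapshot ${status.repository}/${status.snapshot} is ${status.state}`);
    if (!IN_PROGRESS_STATES.has(status.state)) return status.state;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out waiting for snapshot ${status.repository}/${status.snapshot}`);
    }
    await sleep(intervalMs);
  }
}
```

The timeout keeps a stuck snapshot from hanging a test run indefinitely; the caller can then fail with a clear message instead of the raw 400 response.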

…tic#18624)

* [esArchiver/deleteIndex] wait and retry if snapshot in progress
* [esArchiver/deleteIndex] use recursion for retry
* [esArchiver/waitForSnapshot] invert status check
* [esArchiver] share delete-with-retry with create stream
* [esArchiver/stats] include index name in message
* [esArchiver/indexDelete] wait for snapshot completion up to three times
* [esArchiver] log status of snapshot during checks

spalger · 2018-05-08T23:04:13Z

6.x/6.4: 7d15352
6.3: 3afd4a1
6.2: 514c327

spalger added the review, Team:Core, v7.0.0, v6.3.0 and v6.4.0 labels Apr 27, 2018

spalger requested review from rhoboat and LeeDr April 27, 2018 00:23

spalger force-pushed the fix/esArchiver-retryWhenSnapshotting branch from 312bce8 to 12554e3 April 27, 2018 01:33

[esArchiver/deleteIndex] wait and retry if snapshot in progress

f58e51f

spalger force-pushed the fix/esArchiver-retryWhenSnapshotting branch from 12554e3 to f58e51f April 27, 2018 01:52

Merge branch 'master' of github.com:elastic/kibana into fix/esArchiver-retryWhenSnapshotting

7ab3340

spalger added 2 commits May 2, 2018 14:17

[esArchiver/deleteIndex] use recursion for retry

e924ed5

[esArchiver/waitForSnapshot] invert status check

07e428a

spalger added 2 commits May 3, 2018 15:47

[esArchiver] share delete-with-retry with create stream

0937b39

[esArchiver/stats] include index name in message

8e11070

spalger added 3 commits May 7, 2018 14:05

Merge branch 'master' of github.com:elastic/kibana into fix/esArchiver-retryWhenSnapshotting

864fd31

[esArchiver/indexDelete] wait for snapshot completion up to three times

b1beabb

[esArchiver] log status of snapshot during checks

5174107

spalger added the v6.2.5 label May 8, 2018

LeeDr approved these changes May 8, 2018

spalger merged commit 8a2a11e into elastic:master May 8, 2018

spalger mentioned this pull request May 8, 2018

[6.x] [esArchiver/deleteIndex] wait and retry if snapshot in progress (#18624) #18934

Merged

spalger mentioned this pull request May 8, 2018

[6.3] [esArchiver/deleteIndex] wait and retry if snapshot in progress (#18624) #18935

Merged

spalger mentioned this pull request May 8, 2018

[6.2] [esArchiver/deleteIndex] wait and retry if snapshot in progress (#18624) #18936

Merged

spalger deleted the fix/esArchiver-retryWhenSnapshotting branch May 8, 2018 23:04

LeeDr mentioned this pull request Jun 25, 2019

machine learning feature controls security "before all" hook #39381

Closed

liza-mae mentioned this pull request Jun 28, 2019

Delete index when cloud snapshotting needs esArchiver retry #39919

Closed
