Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Stack Monitoring] Use doc ID to deduplicate shard allocations #143963

Merged
merged 8 commits into from
Nov 17, 2022

Conversation

miltonhultgren
Copy link
Contributor

@miltonhultgren miltonhultgren commented Oct 25, 2022

Summary

This PR depends on Metricbeat changes introduced in this PR elastic/beats#33457 or the Internal collection changes introduced in elastic/elasticsearch#91153.

The above PR fixes a bug that makes Metricbeat properly report on all the shards in an index, but our shard legend had the same kind of bug, it only displayed the first replica because it didn't account for the replica "id" in the function that tries to deduplicate the results.

Since the above PR ensures we report each shard with a unique document ID, we change the Kibana code to use the document ID instead of trying to regenerate a unique ID.

How to test

Get the changes: git fetch git@github.com:miltonhultgren/kibana.git sm-shard-allocations:sm-shard-allocations && git switch sm-shard-allocations

Start Elasticsearch, create an index like this:

PUT /some-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,  
      "number_of_replicas": 3
    }
  }
}

Run Metricbeat with xpack.enabled: true.

Visit the Index details page for some-index in the Stack Monitoring app and verify that on a single node cluster you have 12 shards, 9 of which are unassigned.

Screenshot 2022-10-25 at 17 19 25

@miltonhultgren miltonhultgren added Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services release_note:skip Skip the PR/issue when compiling release notes Feature:Stack Monitoring backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) labels Oct 25, 2022
@miltonhultgren miltonhultgren requested a review from a team as a code owner October 25, 2022 15:43
@elasticmachine
Copy link
Contributor

Pinging @elastic/infra-monitoring-ui (Team:Infra Monitoring UI)

@crespocarlos
Copy link
Contributor

Interesting! I see the boxes and the numbers, but I still don't know how I would use this information. Do we match these numbers with something else to make more sense of this information?

image

@miltonhultgren
Copy link
Contributor Author

@crespocarlos

So the number (0, 1, 2) means the shard number. So you'll see 4 0s which is:
0 (primary 0 for index some-index)
0 (replica 1 of primary 0)
0 (replica 2 of primary 0)
0 (replica 3 of primary 0)

And so on for the other 2 primaries and their 6 replicas.
The fact that they are all unassigned replicas means they all get the color yellow but if they were primaries they would be orange.

So in your screenshot, the 3 primaries for shard 0, 1 and 2 are allocated to my machine. While the 9 replicas (3 for primary 0, 3 for primary 1, 3 for primary 2) are unassigned. If you add more nodes it becomes a bit more clear.

Screenshot 2022-10-27 at 21 18 51

Here you can see 1 replica for each shard (0, 1 and 2) are unassigned, while the 3 primaries are on main-1 while the other replicas are spread across main-0 and main-2.

@miltonhultgren miltonhultgren self-assigned this Oct 28, 2022
@miltonhultgren miltonhultgren marked this pull request as draft November 7, 2022 15:42
@miltonhultgren miltonhultgren marked this pull request as ready for review November 10, 2022 16:56
@crespocarlos
Copy link
Contributor

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id before after diff
osquery 1 2 +1

ESLint disabled line counts

id before after diff
enterpriseSearch 19 21 +2
fleet 59 65 +6
osquery 108 113 +5
securitySolution 441 447 +6
total +19

Total ESLint disabled count

id before after diff
enterpriseSearch 20 22 +2
fleet 67 73 +6
osquery 109 115 +6
securitySolution 518 524 +6
total +20

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @miltonhultgren

return {
_id: `STATE_UUID:${shard.node}:${shard.index}:s${shard.shard}:${
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this s in s${shard.shard} intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's to mirror the JSON data which looks like shard: 0, and to match with p or r1, "shard 0, primary", "shard 1, replica 2" etc.

Copy link
Contributor

@crespocarlos crespocarlos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@miltonhultgren miltonhultgren merged commit b7debc8 into elastic:main Nov 17, 2022
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 17, 2022
…ic#143963)

## Summary

This PR depends on Metricbeat changes introduced in this PR
elastic/beats#33457 or the Internal collection
changes introduced in
elastic/elasticsearch#91153.

The above PR fixes a bug that makes Metricbeat properly report on all
the shards in an index, but our shard legend had the same kind of bug,
it only displayed the first replica because it didn't account for the
replica "id" in the function that tries to deduplicate the results.

Since the above PR ensures we report each shard with a unique document
ID, we change the Kibana code to use the document ID instead of trying
to regenerate a unique ID.

### How to test
Get the changes: `git fetch git@github.com:miltonhultgren/kibana.git
sm-shard-allocations:sm-shard-allocations && git switch
sm-shard-allocations`

Start Elasticsearch, create an index like this:
```
PUT /some-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 3
    }
  }
}
```

Run Metricbeat with `xpack.enabled: true`.

Visit the Index details page for `some-index` in the Stack Monitoring
app and verify that on a single node cluster you have 12 shards, 9 of
which are unassigned.

<img width="753" alt="Screenshot 2022-10-25 at 17 19 25"
src="https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png">

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
(cherry picked from commit b7debc8)
@kibanamachine
Copy link
Contributor

💚 All backports created successfully

Status Branch Result
8.6

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

kibanamachine added a commit that referenced this pull request Nov 17, 2022
…143963) (#145520)

# Backport

This will backport the following commits from `main` to `8.6`:
- [[Stack Monitoring] Use doc ID to deduplicate shard allocations
(#143963)](#143963)

<!--- Backport version: 8.9.7 -->

### Questions ?
Please refer to the [Backport tool
documentation](https://github.com/sqren/backport)

<!--BACKPORT [{"author":{"name":"Milton
Hultgren","email":"milton.hultgren@elastic.co"},"sourceCommit":{"committedDate":"2022-11-17T09:44:58Z","message":"[Stack
Monitoring] Use doc ID to deduplicate shard allocations (#143963)\n\n##
Summary\r\n\r\nThis PR depends on Metricbeat changes introduced in this
PR\r\nhttps://github.com/elastic/beats/pull/33457 or the Internal
collection\r\nchanges introduced
in\r\nhttps://github.com/elastic/elasticsearch/pull/91153.\r\n\r\nThe
above PR fixes a bug that makes Metricbeat properly report on all\r\nthe
shards in an index, but our shard legend had the same kind of bug,\r\nit
only displayed the first replica because it didn't account for
the\r\nreplica \"id\" in the function that tries to deduplicate the
results.\r\n\r\nSince the above PR ensures we report each shard with a
unique document\r\nID, we change the Kibana code to use the document ID
instead of trying\r\nto regenerate a unique ID.\r\n\r\n### How to
test\r\nGet the changes: `git fetch
git@github.com:miltonhultgren/kibana.git\r\nsm-shard-allocations:sm-shard-allocations
&& git switch\r\nsm-shard-allocations`\r\n\r\nStart Elasticsearch,
create an index like this:\r\n```\r\nPUT /some-index\r\n{\r\n
\"settings\": {\r\n \"index\": {\r\n \"number_of_shards\": 3, \r\n
\"number_of_replicas\": 3\r\n }\r\n }\r\n}\r\n```\r\n\r\nRun Metricbeat
with `xpack.enabled: true`. \r\n\r\nVisit the Index details page for
`some-index` in the Stack Monitoring\r\napp and verify that on a single
node cluster you have 12 shards, 9 of\r\nwhich are
unassigned.\r\n\r\n<img width=\"753\" alt=\"Screenshot 2022-10-25 at 17
19
25\"\r\nsrc=\"https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png\">\r\n\r\nCo-authored-by:
Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"b7debc839ff6923923fe9a41355bd0318e04cdca","branchLabelMapping":{"^v8.7.0$":"main","^v(\\d+).(\\d+).\\d+$":"$1.$2"}},"sourcePullRequest":{"labels":["Team:Infra
Monitoring UI","release_note:skip","Feature:Stack
Monitoring","backport:prev-minor","v8.7.0"],"number":143963,"url":"https://github.com/elastic/kibana/pull/143963","mergeCommit":{"message":"[Stack
Monitoring] Use doc ID to deduplicate shard allocations (#143963)\n\n##
Summary\r\n\r\nThis PR depends on Metricbeat changes introduced in this
PR\r\nhttps://github.com/elastic/beats/pull/33457 or the Internal
collection\r\nchanges introduced
in\r\nhttps://github.com/elastic/elasticsearch/pull/91153.\r\n\r\nThe
above PR fixes a bug that makes Metricbeat properly report on all\r\nthe
shards in an index, but our shard legend had the same kind of bug,\r\nit
only displayed the first replica because it didn't account for
the\r\nreplica \"id\" in the function that tries to deduplicate the
results.\r\n\r\nSince the above PR ensures we report each shard with a
unique document\r\nID, we change the Kibana code to use the document ID
instead of trying\r\nto regenerate a unique ID.\r\n\r\n### How to
test\r\nGet the changes: `git fetch
git@github.com:miltonhultgren/kibana.git\r\nsm-shard-allocations:sm-shard-allocations
&& git switch\r\nsm-shard-allocations`\r\n\r\nStart Elasticsearch,
create an index like this:\r\n```\r\nPUT /some-index\r\n{\r\n
\"settings\": {\r\n \"index\": {\r\n \"number_of_shards\": 3, \r\n
\"number_of_replicas\": 3\r\n }\r\n }\r\n}\r\n```\r\n\r\nRun Metricbeat
with `xpack.enabled: true`. \r\n\r\nVisit the Index details page for
`some-index` in the Stack Monitoring\r\napp and verify that on a single
node cluster you have 12 shards, 9 of\r\nwhich are
unassigned.\r\n\r\n<img width=\"753\" alt=\"Screenshot 2022-10-25 at 17
19
25\"\r\nsrc=\"https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png\">\r\n\r\nCo-authored-by:
Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"b7debc839ff6923923fe9a41355bd0318e04cdca"}},"sourceBranch":"main","suggestedTargetBranches":[],"targetPullRequestStates":[{"branch":"main","label":"v8.7.0","labelRegex":"^v8.7.0$","isSourceBranch":true,"state":"MERGED","url":"https://github.com/elastic/kibana/pull/143963","number":143963,"mergeCommit":{"message":"[Stack
Monitoring] Use doc ID to deduplicate shard allocations (#143963)\n\n##
Summary\r\n\r\nThis PR depends on Metricbeat changes introduced in this
PR\r\nhttps://github.com/elastic/beats/pull/33457 or the Internal
collection\r\nchanges introduced
in\r\nhttps://github.com/elastic/elasticsearch/pull/91153.\r\n\r\nThe
above PR fixes a bug that makes Metricbeat properly report on all\r\nthe
shards in an index, but our shard legend had the same kind of bug,\r\nit
only displayed the first replica because it didn't account for
the\r\nreplica \"id\" in the function that tries to deduplicate the
results.\r\n\r\nSince the above PR ensures we report each shard with a
unique document\r\nID, we change the Kibana code to use the document ID
instead of trying\r\nto regenerate a unique ID.\r\n\r\n### How to
test\r\nGet the changes: `git fetch
git@github.com:miltonhultgren/kibana.git\r\nsm-shard-allocations:sm-shard-allocations
&& git switch\r\nsm-shard-allocations`\r\n\r\nStart Elasticsearch,
create an index like this:\r\n```\r\nPUT /some-index\r\n{\r\n
\"settings\": {\r\n \"index\": {\r\n \"number_of_shards\": 3, \r\n
\"number_of_replicas\": 3\r\n }\r\n }\r\n}\r\n```\r\n\r\nRun Metricbeat
with `xpack.enabled: true`. \r\n\r\nVisit the Index details page for
`some-index` in the Stack Monitoring\r\napp and verify that on a single
node cluster you have 12 shards, 9 of\r\nwhich are
unassigned.\r\n\r\n<img width=\"753\" alt=\"Screenshot 2022-10-25 at 17
19
25\"\r\nsrc=\"https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png\">\r\n\r\nCo-authored-by:
Kibana Machine
<42973632+kibanamachine@users.noreply.github.com>","sha":"b7debc839ff6923923fe9a41355bd0318e04cdca"}}]}]
BACKPORT-->

Co-authored-by: Milton Hultgren <milton.hultgren@elastic.co>
benakansara pushed a commit to benakansara/kibana that referenced this pull request Nov 17, 2022
…ic#143963)

## Summary

This PR depends on Metricbeat changes introduced in this PR
elastic/beats#33457 or the Internal collection
changes introduced in
elastic/elasticsearch#91153.

The above PR fixes a bug that makes Metricbeat properly report on all
the shards in an index, but our shard legend had the same kind of bug,
it only displayed the first replica because it didn't account for the
replica "id" in the function that tries to deduplicate the results.

Since the above PR ensures we report each shard with a unique document
ID, we change the Kibana code to use the document ID instead of trying
to regenerate a unique ID.

### How to test
Get the changes: `git fetch git@github.com:miltonhultgren/kibana.git
sm-shard-allocations:sm-shard-allocations && git switch
sm-shard-allocations`

Start Elasticsearch, create an index like this:
```
PUT /some-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,  
      "number_of_replicas": 3
    }
  }
}
```

Run Metricbeat with `xpack.enabled: true`. 

Visit the Index details page for `some-index` in the Stack Monitoring
app and verify that on a single node cluster you have 12 shards, 9 of
which are unassigned.

<img width="753" alt="Screenshot 2022-10-25 at 17 19 25"
src="https://user-images.githubusercontent.com/2564140/197819368-8ea45e1c-7472-4e15-9267-3b5d73378f2a.png">

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:prev-minor Backport to (8.x) the previous minor version (i.e. one version back from main) Feature:Stack Monitoring release_note:skip Skip the PR/issue when compiling release notes Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services v8.6.0 v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants