Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add deep storage segment metric #16072

Merged
merged 3 commits into from
Mar 11, 2024

Conversation

georgew5656
Copy link
Contributor

@georgew5656 georgew5656 commented Mar 7, 2024

Addressing some followup comments from the change in the metric made in #16020

Description

If a datasource has a replicaCount=0 rule that applies to some segments that are not covered by a replicaCount>0 rule, those segments will not be reported as unavailable in segment/unavailable/count. Since they are not queryable by native queries (only using the query from deep storage feature), we want a separate metric that lets us check how many segments fit these criteria.

Fixed the bug ...

Renamed the class ...

Added a forbidden-apis entry ...

Add metric segment/deepStorage/count that counts the number of segments that are only queryable through the query from deep storage metric.

Release note

New metric to support query from deep storage feature.

Key changed/added classes in this PR
  • DruidCoordinator
  • Stats

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@georgew5656 georgew5656 changed the title Add cold segment metric Add deep storage segment metric Mar 7, 2024
@kfaraz
Copy link
Contributor

kfaraz commented Mar 8, 2024

If a datasource has a replicaCount=0 rule that applies to some segments that are not covered by a replicaCount>0 rule, those segments will not be reported as available in segment/unavailable/count

After this patch, will such segments be reported as available or unavailable?
If they need to be reported as unavailable, would we need to remove the replicaCount.required() == 0 condition from the computation of unavailable/count?

@@ -252,6 +252,25 @@ public Object2IntMap<String> getDatasourceToUnavailableSegmentCount()
return datasourceToUnavailableSegments;
}

public Object2IntMap<String> getDatasourceToDeepStorageQueryOnlySegmentCount()
Copy link
Contributor

@kfaraz kfaraz Mar 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Some of the code in this method may be commoned out with getDatasourceToUnavailableSegmentCount.

@@ -332,6 +332,7 @@ These metrics are for the Druid Coordinator and are reset each time the Coordina
|`segment/unneededEternityTombstone/count`|Number of non-overshadowed eternity tombstones marked as unused.| |Varies|
|`segment/unavailable/count`|Number of unique segments left to load until all used segments are available for queries.|`dataSource`|0|
|`segment/underReplicated/count`|Number of segments, including replicas, left to load until all used segments are available for queries.|`tier`, `dataSource`|0|
|`segment/deepStorageOnly/count`|Number of unique segments that are only available for querying directly from deep storage.|`dataSource`|Varies|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the metric name deepStorageQueryOnly or better yet availableDeepStorageOnly as it aligns better with unavailable/count?

@georgew5656
Copy link
Contributor Author

If a datasource has a replicaCount=0 rule that applies to some segments that are not covered by a replicaCount>0 rule, those segments will not be reported as available in segment/unavailable/count

After this patch, will such segments be reported as available or unavailable? If they need to be reported as unavailable, would we need to remove the replicaCount.required() == 0 condition from the computation of unavailable/count?

that should say "not be reported as unavailable", my bad.

i think they should not be reported as unavailable b/c that metric is supposed to be 0 during steady state and enabling query from deep storage shouldn't change this

@georgew5656 georgew5656 requested a review from kfaraz March 8, 2024 16:22
@suneet-s
Copy link
Contributor

suneet-s commented Mar 8, 2024

What is the purpose of this metric? The doc says it's typical value is varies - is there a good or bad value?

I can imagine tracking the size of cold data as a useful metric, but I'm not sure about the use of seeing how the count of cold segments changes over time will be helpful. Maybe some doc updates about that would help clarify the use of the metric.

@kfaraz
Copy link
Contributor

kfaraz commented Mar 9, 2024

i think they should not be reported as unavailable b/c that metric is supposed to be 0 during steady state

Thanks, @georgew5656 , the steady state reasoning makes sense to me.

@suneet-s , one use of this metric would be to find out if the number of segments that are not available for querying using the native engine. For example, if there are some segments that are available for querying from deep storage using MSQ, they would not be counted towards the unavailable/count, but these would still not be available to the native engine. The new metric would thus be reported as non-zero for such cases.

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
As @suneet-s requested, you may add more info in the docs. But the current definition of the metric also seems sufficient to me.

@georgew5656 georgew5656 merged commit 94d2a28 into apache:master Mar 11, 2024
83 checks passed
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants