Proposal: cache label names/values API response in the query results cache #5395

Closed
pracucci opened this issue Jul 1, 2023 · 3 comments · Fixed by #5426

@pracucci (Collaborator) commented Jul 1, 2023

Problem

We see some customers repeatedly sending the same requests (with the same parameters) to the GET /api/v1/label/<label_name>/values endpoint (doc).

Such requests can be heavy, in particular when the start and end parameters are missing, because by default Prometheus queries the whole time range (from the beginning of the universe until the end of it).

Mimir supports the -store.max-labels-query-length limit, which limits the time range (end - start time) of series, label names and label values queries. The default is 0, which means "no limit". Moreover, this limit is currently enforced only when querying the store-gateways, not the ingesters. At Grafana Labs we set this limit to 768h (32 days).

Proposal

Similarly to #5212, I propose adding support for a (short-lived) query results cache for the label values API endpoint.

Cache key

The "label values" API parameters are (doc):

  • start: Start timestamp. Optional.
  • end: End timestamp. Optional.
  • match[]: Repeated series selector argument that selects the series from which to read the label values. Optional.

I propose to add all parameters to the cache key.

Both TSDB (so the Mimir ingester) and the Mimir store-gateway query label values out of the blocks overlapping the start and end time. This means that the maximum granularity is the block. Since the smallest blocks we have are 2h blocks, I propose to align the start and end time to 2h boundaries when computing the cache key. This means that if we receive two requests with different start/end times but within the same 2h range (aligned to block boundaries), the cache key for the two requests is the same (assuming the matchers are the same as well).
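
For illustration, here is a minimal Go sketch of how such an aligned cache key could be computed; the function name and key format are hypothetical, not the actual Mimir implementation:

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// The smallest blocks are 2h, so align cache keys to 2h boundaries.
const blockBoundary = 2 * time.Hour

// alignedLabelValuesCacheKey is a hypothetical helper building a cache key for
// a "label values" request. Start is aligned down and end is aligned up to 2h
// boundaries, so requests falling within the same block range share a key.
func alignedLabelValuesCacheKey(userID, labelName string, start, end time.Time, matchers []string) string {
	alignedStart := start.Truncate(blockBoundary)
	alignedEnd := end.Truncate(blockBoundary)
	if !alignedEnd.Equal(end) {
		// Align the end upwards so the aligned range always covers the requested one.
		alignedEnd = alignedEnd.Add(blockBoundary)
	}
	return fmt.Sprintf("label_values:%s:%s:%d:%d:%s",
		userID, labelName, alignedStart.Unix(), alignedEnd.Unix(), strings.Join(matchers, ","))
}

func main() {
	// Two requests with different start/end but within the same 2h-aligned
	// range (and the same matchers) produce the same cache key.
	a := alignedLabelValuesCacheKey("tenant-1", "job", time.Unix(3600, 0), time.Unix(7000, 0), []string{`up{cluster="dev"}`})
	b := alignedLabelValuesCacheKey("tenant-1", "job", time.Unix(3700, 0), time.Unix(7100, 0), []string{`up{cluster="dev"}`})
	fmt.Println(a == b) // true
}
```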

Cache TTL

Similarly to #5212, I propose to add a new per-tenant configuration option to set the caching TTL for label queries.
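
As an illustration only, the per-tenant option could look like the sketch below; the field and YAML names are hypothetical placeholders, modelled on the style of other per-tenant limits, not a decision on the final naming:

```go
package validation

import "github.com/prometheus/common/model"

// Limits sketches the proposed per-tenant option; names are illustrative only.
type Limits struct {
	// TTL applied to cached "label names" / "label values" responses.
	// A short TTL (for example 1m) bounds how stale a cached response can be.
	ResultsCacheTTLForLabelsQuery model.Duration `yaml:"results_cache_ttl_for_labels_query"`
}
```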

Extension to label names API endpoint

The same exact approach can be adopted for the "label names" API endpoint (docs).

@pracucci pracucci changed the title Proposal: cache label values API response in the query results cache Proposal: cache label names/values API response in the query results cache Jul 1, 2023
@pracucci pracucci self-assigned this Jul 1, 2023
@colega (Contributor) commented Jul 4, 2023

Since the smallest blocks we have are 2h blocks, I propose to align the start and end time to 2h boundaries when computing the cache key.

So, I understand that without start/end, this cached value will only be available for 2h? What about recent data, are we going to cache the data relative to the head block? And what would happen to OOO writes (especially for customers with months of OOO)?

Also see #457, as this cache will overlap with the one implemented in #590.

@pracucci (Collaborator, Author) commented Jul 5, 2023

What about recent data, are we going to cache the data relative to the head block?

We're going to cache the whole response (the caching is done by the query-frontend). Think of this cache as one with a 1m TTL, so it may return stale results for up to 1m. Same as #5212.

@colega (Contributor) commented Jul 5, 2023

Oh okay, it's a 1-minute TTL cache. Fine! 👍
