Dashboards: add 'Read path' selector to 'Mimir / Queries' dashboard #8878

Merged 4 commits on Aug 27, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -123,6 +123,7 @@
* [ENHANCEMENT] Dashboards: add Kafka end-to-end latency outliers panel in the "Mimir / Writes" dashboard. #8948
* [ENHANCEMENT] Dashboards: add "Out-of-order samples appended" panel to "Mimir / Tenants" dashboard. #8939
* [ENHANCEMENT] Alerts: `RequestErrors` and `RulerRemoteEvaluationFailing` have been enriched with a native histogram version. #9004
+* [ENHANCEMENT] Dashboards: add 'Read path' selector to 'Mimir / Queries' dashboard. #8878
* [BUGFIX] Dashboards: fix "current replicas" in autoscaling panels when HPA is not active. #8566
* [BUGFIX] Alerts: do not fire `MimirRingMembersMismatch` during the migration to experimental ingest storage. #8727


105 changes: 69 additions & 36 deletions operations/mimir-mixin-compiled/dashboards/mimir-queries.json

Large diffs are not rendered by default.


2 changes: 1 addition & 1 deletion operations/mimir-mixin-tools/serve/run.sh
@@ -5,7 +5,7 @@ set -e

SCRIPT_DIR=$(cd `dirname $0` && pwd)
# Ensure we run recent Grafana.
-GRAFANA_VERSION=11.0.0
+GRAFANA_VERSION=11.1.3
DOCKER_CONTAINER_NAME="mixin-serve-grafana"
DOCKER_OPTS=""

3 changes: 2 additions & 1 deletion operations/mimir-mixin/.lint
@@ -17,9 +17,10 @@ exclusions:
- dashboard: Mimir / Top tenants
panel: Top $limit users by received exemplars rate in last 5m
target-promql-rule:
-reason: Skipping in dashboards where the linter parses a Loki query as Prometheus one.
+reason: Skipping in dashboards where the linter parses a Loki query as Prometheus one, or we define label matchers as template variables.
entries:
- dashboard: Mimir / Slow queries
+- dashboard: Mimir / Queries
template-datasource-rule:
reason: We prefer to keep calling "datasource" the Prometheus datasource to keep consistency between dashboards.
entries:
4 changes: 4 additions & 0 deletions operations/mimir-mixin/config.libsonnet
@@ -88,6 +88,10 @@
alertmanager: ['alertmanager', 'cortex', 'mimir', 'mimir-backend.*'],
overrides_exporter: ['overrides-exporter', 'mimir-backend.*'],

+// The following are job matchers used to select all components in the read path.
+main_read_path: std.uniq(std.sort(self.query_frontend + self.query_scheduler + self.querier)),
+remote_ruler_read_path: std.uniq(std.sort(self.ruler_query_frontend + self.ruler_query_scheduler + self.ruler_querier)),
+
// The following are job matchers used to select all components in a given "path".
write: ['distributor.*', 'ingester.*', 'mimir-write.*'],
read: ['query-frontend.*', 'querier.*', 'ruler-query-frontend.*', 'ruler-querier.*', 'mimir-read.*'],
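
The new `main_read_path` and `remote_ruler_read_path` entries concatenate the per-component job lists, sort them, and drop duplicates. A minimal sketch of that pattern follows; the `job_names` values here are hypothetical and chosen only to make the deduplication visible, since the real lists for `query_frontend`, `query_scheduler`, and `querier` are not shown in this diff.

```jsonnet
// Hypothetical job lists, for illustration only (not the real Mimir defaults).
local job_names = {
  query_frontend: ['query-frontend.*', 'mimir-read.*'],
  query_scheduler: ['query-scheduler.*', 'mimir-backend.*'],
  querier: ['querier.*', 'mimir-read.*'],
};

{
  // std.sort makes equal entries adjacent; std.uniq then removes adjacent duplicates.
  main_read_path: std.uniq(std.sort(
    job_names.query_frontend + job_names.query_scheduler + job_names.querier
  )),
}
// Evaluates to:
// { "main_read_path": ["mimir-backend.*", "mimir-read.*", "querier.*", "query-frontend.*", "query-scheduler.*"] }
```

Sorting before `std.uniq` matters because `std.uniq` only collapses neighbouring duplicates; without the sort, the two `mimir-read.*` entries above would both survive.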
94 changes: 51 additions & 43 deletions operations/mimir-mixin/dashboards/dashboard-utils.libsonnet
@@ -126,30 +126,38 @@ local utils = import 'mixin-utils/utils.libsonnet';
addActiveUserSelectorTemplates()::
self.addTemplate('user', 'cortex_ingester_active_series{%s=~"$cluster", %s=~"$namespace"}' % [$._config.per_cluster_label, $._config.per_namespace_label], 'user', sort=sortAscending),

-addCustomTemplate(name, values, defaultIndex=0):: self {
+addCustomTemplate(label, name, options, defaultIndex=0):: self {
+// Escape the comma because it's used a separator in the options list.
+local escapeValue(v) = std.strReplace(v, ',', '\\,'),
+
templating+: {
-list+: [
-{
-name: name,
-options: [
-{
-selected: v == values[defaultIndex],
-text: v,
-value: v,
-}
-for v in values
-],
-current: {
-selected: true,
-text: values[defaultIndex],
-value: values[defaultIndex],
-},
-type: 'custom',
-hide: 0,
-includeAll: false,
-multi: false,
+list+: [{
+current: {
+selected: true,
+text: options[defaultIndex].label,
+value: escapeValue(options[defaultIndex].value),
},
-],
+hide: 0,
+includeAll: false,
+label: label,
+multi: false,
+name: name,
+query: std.join(',', [
+'%s : %s' % [option.label, escapeValue(option.value)]
+for option in options
+]),
+options: [
+{
+selected: option.label == options[defaultIndex].label,
+text: option.label,
+value: escapeValue(option.value),
+}
+for option in options
+],
+skipUrlSync: false,
+type: 'custom',
+useTags: false,
+}],
},
},
},
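
The reworked `addCustomTemplate` builds the variable's `query` string from `label : value` pairs and escapes commas inside values, because the comma is the separator between options. A minimal sketch of just that string construction is below; the option labels and matcher values are hypothetical and are not taken from this PR.

```jsonnet
// Same escaping helper as in the diff: a literal comma in a value becomes "\,".
local escapeValue(v) = std.strReplace(v, ',', '\\,');

// Hypothetical options, for illustration only.
local options = [
  { label: 'Main', value: 'job=~"(query-frontend.*|querier.*)"' },
  { label: 'Remote ruler', value: 'job=~"ruler-querier.*", cluster=~"$cluster"' },
];

{
  // One "label : value" pair per option, joined with unescaped commas.
  query: std.join(',', [
    '%s : %s' % [option.label, escapeValue(option.value)]
    for option in options
  ]),
}
```

In this sketch only the second option contains a comma, so only that value is altered by `escapeValue`; the commas that separate the two options are added by `std.join` and stay unescaped.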
@@ -1884,7 +1892,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
},

-ingestStorageFetchLastProducedOffsetRequestsPanel(jobName)::
+ingestStorageFetchLastProducedOffsetRequestsPanel(jobMatcher)::
$.timeseriesPanel('Fetch last produced offset requests / sec') +
$.panelDescription(
'Fetch last produced offset requests / sec',
@@ -1896,10 +1904,10 @@ local utils = import 'mixin-utils/utils.libsonnet';
sum(rate(cortex_ingest_storage_reader_last_produced_offset_requests_total{%s}[$__rate_interval]))
-
sum(rate(cortex_ingest_storage_reader_last_produced_offset_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName]), $.jobMatcher($._config.job_names[jobName])],
+||| % [jobMatcher, jobMatcher],
|||
sum(rate(cortex_ingest_storage_reader_last_produced_offset_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName])],
+||| % [jobMatcher],
],
[
'successful',
@@ -1913,7 +1921,7 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.aliasColors({ successful: $._colors.success, failed: $._colors.failed }) +
$.stack,

-ingestStorageFetchLastProducedOffsetLatencyPanel(jobName)::
+ingestStorageFetchLastProducedOffsetLatencyPanel(jobMatcher)::
$.timeseriesPanel('Fetch last produced offset latency') +
$.panelDescription(
'Fetch last produced offset latency',
@@ -1923,10 +1931,10 @@ local utils = import 'mixin-utils/utils.libsonnet';
) +
$.queryPanel(
[
-'histogram_avg(sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
+'histogram_avg(sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
+'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
+'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
+'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_reader_last_produced_offset_request_duration_seconds{%s}[$__rate_interval])))' % [jobMatcher],
],
[
'avg',
@@ -1940,10 +1948,10 @@ local utils = import 'mixin-utils/utils.libsonnet';
},
},

-ingestStorageStrongConsistencyRequestsPanel(jobName)::
-// The unit changes whether the metric is exposed from ingesters or other components. In the ingesters it's the
+ingestStorageStrongConsistencyRequestsPanel(component, jobMatcher)::
+// The unit changes whether the metric is exposed from ingesters (partition-reader) or other components. In the ingesters it's the
// requests issued by queriers to ingesters, while in other components it's the actual query.
-local unit = if jobName == 'ingester' then 'requests' else 'queries';
+local unit = if component == 'partition-reader' then 'requests' else 'queries';
local title = '%s with strong read consistency / sec' % (std.asciiUpper(std.substr(unit, 0, 1)) + std.substr(unit, 1, std.length(unit) - 1));

$.timeseriesPanel(title) +
@@ -1956,13 +1964,13 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.queryPanel(
[
|||
-sum(rate(cortex_ingest_storage_strong_consistency_requests_total{%s}[$__rate_interval]))
+sum(rate(cortex_ingest_storage_strong_consistency_requests_total{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))
-
-sum(rate(cortex_ingest_storage_strong_consistency_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName]), $.jobMatcher($._config.job_names[jobName])],
+sum(rate(cortex_ingest_storage_strong_consistency_failures_total{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))
+||| % { jobMatcher: jobMatcher, component: component },
|||
-sum(rate(cortex_ingest_storage_strong_consistency_failures_total{%s}[$__rate_interval]))
-||| % [$.jobMatcher($._config.job_names[jobName])],
+sum(rate(cortex_ingest_storage_strong_consistency_failures_total{component="%(component)s", %(jobMatcher)s}[$__rate_interval]))
+||| % { jobMatcher: jobMatcher, component: component },
],
[
'successful',
@@ -1976,18 +1984,18 @@ local utils = import 'mixin-utils/utils.libsonnet';
$.aliasColors({ successful: $._colors.success, failed: $._colors.failed }) +
$.stack,
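
The strong-consistency panels now take a `component` label value plus a ready-made `jobMatcher` string, and render both into the PromQL selector with named format placeholders. A minimal sketch of how the panel title and selector are derived is below; `titleFor` and `selectorFor` are hypothetical helper names, and the `jobMatcher` values are illustrative, since the real callers are not part of the rendered diff.

```jsonnet
// Hypothetical helper mirroring the unit/title logic in the diff.
local titleFor(component) =
  local unit = if component == 'partition-reader' then 'requests' else 'queries';
  '%s with strong read consistency / sec' % (
    std.asciiUpper(std.substr(unit, 0, 1)) + std.substr(unit, 1, std.length(unit) - 1)
  );

// Hypothetical helper mirroring the selector built with %(name)s placeholders.
local selectorFor(component, jobMatcher) =
  '{component="%(component)s", %(jobMatcher)s}' % { component: component, jobMatcher: jobMatcher };

{
  // Illustrative jobMatcher strings; any valid label matcher list would work.
  ingester_title: titleFor('partition-reader'),
  other_title: titleFor('some-other-component'),
  ingester_selector: selectorFor('partition-reader', 'job=~"ingester.*"'),
}
// ingester_title evaluates to "Requests with strong read consistency / sec"
// other_title evaluates to "Queries with strong read consistency / sec"
```

Passing the matcher as a plain string, instead of a job name that is looked up in `$._config.job_names`, is what lets a caller supply a Grafana template variable in place of a fixed matcher.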

-ingestStorageStrongConsistencyWaitLatencyPanel(jobName)::
+ingestStorageStrongConsistencyWaitLatencyPanel(component, jobMatcher)::
$.timeseriesPanel('Strong read consistency queries — wait latency') +
$.panelDescription(
'Strong read consistency queries — wait latency',
'How long does the request wait to guarantee strong read consistency.',
) +
$.queryPanel(
[
-'histogram_avg(sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
-'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{%s}[$__rate_interval])))' % [$.jobMatcher($._config.job_names[jobName])],
+'histogram_avg(sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
+'histogram_quantile(0.99, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
+'histogram_quantile(0.999, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
+'histogram_quantile(1.0, sum(rate(cortex_ingest_storage_strong_consistency_wait_duration_seconds{component="%(component)s", %(jobMatcher)s}[$__rate_interval])))' % { component: component, jobMatcher: jobMatcher },
],
[
'avg',