Make __meta_tenant_id available in metric_relabel_configs #4725

sepich · 2023-04-13T20:16:59Z

What this PR does

This PR adds a `__meta_tenant_id` meta label, carrying the tenant ID, to the distributor's metric_relabel_configs phase.

Which issue(s) this PR fixes or relates to

Fixes #4692

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

CLAassistant · 2023-04-13T20:17:05Z

All committers have signed the CLA.

pstibrany

Thank you. I think the approach of passing a builder to relabel.ProcessBuilder is smart, and it will not be as expensive as @bboreham mentioned in #4692 (comment).

pkg/distributor/distributor.go

pstibrany · 2023-04-14T09:18:30Z

pkg/distributor/distributor.go

@@ -818,12 +819,19 @@ func (d *Distributor) prePushRelabelMiddleware(next push.Func) push.Func {
 			ts := req.Timeseries[tsIdx]

 			if mrc := d.limits.MetricRelabelConfigs(userID); len(mrc) > 0 {
-				l, keep := relabel.Process(mimirpb.FromLabelAdaptersToLabels(ts.Labels), mrc...)
+				lb := labels.NewBuilder(mimirpb.FromLabelAdaptersToLabels(ts.Labels))


We could reuse the builder across timeseries to avoid creating a new one in each iteration.

Let's see if I understood you correctly.
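For readers following the thread, here is a minimal sketch of the "reuse the builder" idea. It assumes a simplified, hypothetical `Builder` type (not the Prometheus `labels.Builder`) and a hypothetical `processSeries` helper; only the shape of the loop is the point: the builder is created once, reset per series, the meta label is set before the rules run, and it is deleted before the labels are written back.

```go
package main

import (
	"fmt"
	"sort"
)

// Label is a simplified stand-in for a label pair.
type Label struct{ Name, Value string }

// Builder is a hypothetical, simplified label builder used for illustration.
// It only mimics a Reset/Set/Del/Labels workflow.
type Builder struct{ vals map[string]string }

func NewBuilder() *Builder { return &Builder{vals: make(map[string]string)} }

// Reset clears the builder and loads a new base label set, keeping the
// already allocated map so it can be reused for the next series.
func (b *Builder) Reset(base []Label) {
	for k := range b.vals {
		delete(b.vals, k)
	}
	for _, l := range base {
		b.vals[l.Name] = l.Value
	}
}

func (b *Builder) Set(name, value string) { b.vals[name] = value }
func (b *Builder) Del(name string)        { delete(b.vals, name) }
func (b *Builder) Get(name string) string { return b.vals[name] }

// Labels returns the current labels sorted by name.
func (b *Builder) Labels() []Label {
	out := make([]Label, 0, len(b.vals))
	for n, v := range b.vals {
		out = append(out, Label{n, v})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

const metaTenant = "__meta_tenant_id"

// processSeries applies a keep decision to every series using one shared
// builder. The meta label is visible to keep, but never stored in the result.
func processSeries(series [][]Label, tenant string, keep func(*Builder) bool) [][]Label {
	b := NewBuilder() // created once, outside the loop
	var out [][]Label
	for _, s := range series {
		b.Reset(s)
		b.Set(metaTenant, tenant)
		if !keep(b) {
			continue
		}
		b.Del(metaTenant)
		out = append(out, b.Labels())
	}
	return out
}

func main() {
	series := [][]Label{{{"__name__", "up"}, {"job", "a"}}}
	kept := processSeries(series, "tenant-1", func(b *Builder) bool {
		return b.Get(metaTenant) == "tenant-1"
	})
	fmt.Println(kept)
}
```

The real code additionally has to convert between `[]LabelAdapter` and `labels.Labels`, which this sketch leaves out.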

sepich · 2023-04-14T18:26:31Z

Here are some basic testing results:

First segment, until 19:16:
2x distributors running grafana/mimir:2.7.1, ~1M series on ingesters, with a flow of ~65k samples/s and no metric_relabel_configs defined.
Second segment, from 19:16:
I add the following to the runtime config for this tenant:
```
      metric_relabel_configs:
        - source_labels: [__name__, prometheus]
          regex: .+;(k8s|gke|eks)-.*
          action: keep
```
so basically all metrics pass through. The spike at 20:00 is a 2h block cut on the ingesters.
Third segment, from 20:02:
I change the image to sepa/mimir:meta_tenant_id-bec9bef35, built from this branch, and change the config to:
```
      metric_relabel_configs:
        - source_labels: [__meta_tenant_id, prometheus]
          regex: k8s;(k8s|gke|eks)-.*
          action: keep
```
This also affects all metrics of this single test tenant.
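As an illustration of how a `keep` rule like the one above is evaluated, here is a small sketch using only the Go standard library. It relies on two properties of Prometheus relabeling: the values of `source_labels` are joined with the default separator `;`, and the regex is anchored at both ends. The helper names `keep` and `anchored` and the sample label values (`gke-prod`, `other`) are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anchored compiles a pattern so that it must match the whole joined value.
func anchored(pattern string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + pattern + ")$")
}

// keep reports whether a series survives a "keep" rule: the values of
// sourceLabels are joined with ";" and matched against the anchored regex.
func keep(series map[string]string, sourceLabels []string, re *regexp.Regexp) bool {
	vals := make([]string, 0, len(sourceLabels))
	for _, name := range sourceLabels {
		vals = append(vals, series[name]) // a missing label yields ""
	}
	return re.MatchString(strings.Join(vals, ";"))
}

func main() {
	re := anchored(`k8s;(k8s|gke|eks)-.*`)
	src := []string{"__meta_tenant_id", "prometheus"}

	fromTenant := map[string]string{
		"__name__": "up", "prometheus": "gke-prod", "__meta_tenant_id": "k8s",
	}
	fromOther := map[string]string{
		"__name__": "up", "prometheus": "gke-prod", "__meta_tenant_id": "other",
	}
	fmt.Println(keep(fromTenant, src, re), keep(fromOther, src, re))
}
```

With these sample inputs, the series whose tenant is `k8s` is kept and the one from `other` is dropped, which is why the rule above passes every metric of the single test tenant.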

lamida · 2023-04-17T10:40:35Z

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

pracucci

Thanks for working on this! Looks like a nice addition to me.

About the performance impact, I'm not too worried, because it should apply only to tenants for which a relabelling config has been set. However, could you please add a test case with relabelling to BenchmarkDistributor_Push (without the tenant ID label), run it on both main and this PR branch with go test -count=3 ..., and then compare the results with benchstat? I would like to see the actual difference in a benchmark.

pracucci · 2023-04-17T14:41:51Z

pkg/distributor/distributor.go

 				if !keep {
 					removeTsIndexes = append(removeTsIndexes, tsIdx)
 					continue
 				}
-				ts.Labels = mimirpb.FromLabelsToLabelAdapters(l)
+				lb.Del(metaLabelTenantID)
+				ts.Labels = mimirpb.FromLabelsToLabelAdapters(lb.Labels(labels.EmptyLabels()))


This change should make it slightly more performant. Since we want to overwrite ts.Labels anyway, we can pass it:

Suggested change

- ts.Labels = mimirpb.FromLabelsToLabelAdapters(lb.Labels(labels.EmptyLabels()))
+ ts.Labels = mimirpb.FromLabelsToLabelAdapters(lb.Labels(ts.Labels))

But ts.Labels is a []LabelAdapter, and lb.Labels() expects Labels.

Don't worry about this too much. The new stringlabels version of the labels code, which we will use soon, doesn't even take this parameter anymore.

Sorry, it's not only the "new stringlabels version" that no longer has this method with a labels.Labels parameter, but also the latest Prometheus main (after merging prometheus/prometheus#12173), which will be in Mimir after #4759 gets merged.
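The idea behind the suggestion in this thread, passing a destination so that its backing array can be reused instead of allocating a new slice, can be sketched in plain Go. The `Label` type and the `appendLabels` helper are hypothetical stand-ins, not Mimir or Prometheus APIs; the sketch also sidesteps the type mismatch noted above by using a single label type.

```go
package main

import "fmt"

// Label is a simplified stand-in for a label pair.
type Label struct{ Name, Value string }

// appendLabels copies src into buf's backing array when its capacity
// allows, and only allocates when buf is too small.
func appendLabels(buf, src []Label) []Label {
	buf = buf[:0] // keep the backing array, drop the old contents
	return append(buf, src...)
}

func main() {
	buf := make([]Label, 0, 4)
	src := []Label{{"__name__", "up"}, {"job", "a"}}

	out := appendLabels(buf, src)
	// out shares buf's backing array because two labels fit in capacity 4.
	fmt.Println(len(out), cap(out) == cap(buf))
}
```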

sepich · 2023-04-17T16:38:18Z

OK, tested on main (commit b5519ef) and on this branch:

$ go test -count=3 -bench BenchmarkDistributor_Push -run='^$' ./pkg/distributor/ > /tmp/new.txt
$ ./bin/benchstat /tmp/old.txt /tmp/new.txt
goos: darwin
goarch: amd64
pkg: github.com/grafana/mimir/pkg/distributor
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
                                                             │ /tmp/old.txt  │             /tmp/new.txt             │
                                                             │    sec/op     │    sec/op     vs base                │
Distributor_Push/max_label_value_length_limit_reached-16        601.0µ ± ∞ ¹   633.0µ ± ∞ ¹       ~ (p=0.400 n=3) ²
Distributor_Push/timestamp_too_new-16                           586.7µ ± ∞ ¹   599.1µ ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16    1.818m ± ∞ ¹   2.202m ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            1033.0µ ± ∞ ¹   989.5µ ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                493.0µ ± ∞ ¹   450.0µ ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16               648.2µ ± ∞ ¹   674.8µ ± ∞ ¹       ~ (p=0.200 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16         3.202m ± ∞ ¹   3.451m ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                         945.9µ         979.6µ        +3.56%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt  │              /tmp/new.txt               │
                                                             │     B/op      │      B/op       vs base                 │
Distributor_Push/max_label_value_length_limit_reached-16       119.6Ki ± ∞ ¹    119.9Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/timestamp_too_new-16                          106.3Ki ± ∞ ¹    106.6Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   448.1Ki ± ∞ ¹   1200.4Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            126.0Ki ± ∞ ¹    126.4Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16               4.647Ki ± ∞ ¹    4.882Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16              98.02Ki ± ∞ ¹    98.23Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16        132.0Ki ± ∞ ¹    134.8Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
geomean                                                        88.70Ki          103.3Ki        +16.44%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt │             /tmp/new.txt              │
                                                             │  allocs/op   │   allocs/op    vs base                │
Distributor_Push/max_label_value_length_limit_reached-16       2.094k ± ∞ ¹    2.096k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/timestamp_too_new-16                          2.046k ± ∞ ¹    2.048k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   7.051k ± ∞ ¹   10.062k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16             47.00 ± ∞ ¹     49.00 ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                44.00 ± ∞ ¹     46.00 ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16              2.148k ± ∞ ¹    2.150k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16        2.084k ± ∞ ¹    2.098k ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                         833.6           889.1        +6.66%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

Should I leave the bench test code here?

sepich · 2023-04-17T23:15:54Z

I simplified this; the allocation results are now much better:

goos: darwin
goarch: amd64
pkg: github.com/grafana/mimir/pkg/distributor
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
                                                             │ /tmp/old.txt  │            /tmp/new2.txt             │
                                                             │    sec/op     │    sec/op     vs base                │
Distributor_Push/max_label_value_length_limit_reached-16        601.0µ ± ∞ ¹   596.4µ ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/timestamp_too_new-16                           586.7µ ± ∞ ¹   570.1µ ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16    1.818m ± ∞ ¹   2.014m ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            1033.0µ ± ∞ ¹   971.9µ ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                493.0µ ± ∞ ¹   453.2µ ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16               648.2µ ± ∞ ¹   654.4µ ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16         3.202m ± ∞ ¹   3.115m ± ∞ ¹       ~ (p=0.400 n=3) ²
geomean                                                         945.9µ         932.9µ        -1.38%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt  │             /tmp/new2.txt             │
                                                             │     B/op      │     B/op       vs base                │
Distributor_Push/max_label_value_length_limit_reached-16       119.6Ki ± ∞ ¹   119.7Ki ± ∞ ¹       ~ (p=0.400 n=3) ²
Distributor_Push/timestamp_too_new-16                          106.3Ki ± ∞ ¹   106.3Ki ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   448.1Ki ± ∞ ¹   447.2Ki ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            126.0Ki ± ∞ ¹   125.7Ki ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16               4.647Ki ± ∞ ¹   4.643Ki ± ∞ ¹       ~ (p=0.400 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16              98.02Ki ± ∞ ¹   98.01Ki ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16        132.0Ki ± ∞ ¹   134.6Ki ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                        88.70Ki         88.88Ki        +0.21%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt │            /tmp/new2.txt             │
                                                             │  allocs/op   │  allocs/op    vs base                │
Distributor_Push/max_label_value_length_limit_reached-16       2.094k ± ∞ ¹   2.094k ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/timestamp_too_new-16                          2.046k ± ∞ ¹   2.046k ± ∞ ¹       ~ (p=1.000 n=3) ³
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   7.051k ± ∞ ¹   7.051k ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16             47.00 ± ∞ ¹    46.00 ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                44.00 ± ∞ ¹    44.00 ± ∞ ¹       ~ (p=1.000 n=3) ³
Distributor_Push/too_many_labels_limit_reached-16              2.148k ± ∞ ¹   2.148k ± ∞ ¹       ~ (p=1.000 n=3) ³
Distributor_Push/max_label_name_length_limit_reached-16        2.084k ± ∞ ¹   2.096k ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                         833.6          831.7        -0.22%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
³ all samples are equal

pstibrany

Thank you.

CHANGELOG.md

pstibrany · 2023-04-19T12:57:53Z

Should I leave the bench test code here?

Yes, feel free to include this benchmark in the PR.

Note that there's a conflict in CHANGELOG that needs to be resolved before we can merge this.

pstibrany · 2023-04-19T13:24:15Z

Now that #4759 has been merged, could you please rebase this PR on top of Mimir main?

Distributor: make __meta_tenant_id available in `metric_relabel_con…

67a1602

…figs` grafana#4692

sepich requested review from a team as code owners April 13, 2023 20:17

Changelog: fix PR number

a467096

pstibrany reviewed Apr 14, 2023

sepich force-pushed the meta_tenant_id branch from 7242cac to 7b57260 Compare April 14, 2023 16:19

Improve performance

ca06f5b

sepich force-pushed the meta_tenant_id branch from 7b57260 to ca06f5b Compare April 14, 2023 17:20

Merge remote-tracking branch 'upstream/main' into meta_tenant_id

bec9bef

lamida added the release/notified-changelog-cut label Apr 17, 2023

Merge remote-tracking branch 'upstream/main' into meta_tenant_id

80dd2d3

pracucci self-requested a review April 17, 2023 13:55

pracucci approved these changes Apr 17, 2023

pracucci force-pushed the meta_tenant_id branch from f206381 to 80dd2d3 Compare April 17, 2023 14:52

sepich added 2 commits April 17, 2023 17:53

Merge remote-tracking branch 'upstream/main' into meta_tenant_id

3275998

Perf test for metric_relabel_configs

6259036

Mutate LabelAdapters directly

fb42dda

pstibrany reviewed Apr 18, 2023

Revert "Mutate LabelAdapters directly"

c564be2

This reverts commit fb42dda.

pstibrany approved these changes Apr 19, 2023

sepich and others added 3 commits April 19, 2023 16:52

Rephrase CHANGELOG.md

2cc5c4d

Co-authored-by: Peter Štibraný <pstibrany@gmail.com>

Merge remote-tracking branch 'upstream/main' into meta_tenant_id

c0ab383

Merge remote-tracking branch 'upstream/main' into meta_tenant_id

45ad0b3

sepich added 2 commits April 20, 2023 09:33

update for LabelBuilder.Lables

955f1a9

Merge remote-tracking branch 'upstream/main' into meta_tenant_id

01d07a5

pstibrany enabled auto-merge (squash) April 24, 2023 08:18

pstibrany merged commit 8593c88 into grafana:main Apr 24, 2023

kavin-kr mentioned this pull request May 23, 2023

Support multi-tenancy using labels grafana/loki#9499

Open

sepich mentioned this pull request Jan 24, 2024

Enable Receiver to extract Tenant from a label present in incoming timeseries thanos-io/thanos#7081

Open
