Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make __meta_tenant_id available in metric_relabel_configs #4725

Merged
merged 14 commits into from
Apr 24, 2023

Conversation

sepich
Copy link
Contributor

@sepich sepich commented Apr 13, 2023

What this PR does

This PR adds meta label with tenant_id to distributor metric_relabel_configs phase

Which issue(s) this PR fixes or relates to

Fixes #4692

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@sepich sepich requested review from a team as code owners April 13, 2023 20:17
@CLAassistant
Copy link

CLAassistant commented Apr 13, 2023

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I think approach by passing builder to relabel.ProcessBuilder is smart, and will not be as expensive as @bboreham mentioned in #4692 (comment).

pkg/distributor/distributor.go Outdated Show resolved Hide resolved
pkg/distributor/distributor.go Outdated Show resolved Hide resolved
pkg/distributor/distributor.go Outdated Show resolved Hide resolved
@@ -818,12 +819,19 @@ func (d *Distributor) prePushRelabelMiddleware(next push.Func) push.Func {
ts := req.Timeseries[tsIdx]

if mrc := d.limits.MetricRelabelConfigs(userID); len(mrc) > 0 {
l, keep := relabel.Process(mimirpb.FromLabelAdaptersToLabels(ts.Labels), mrc...)
lb := labels.NewBuilder(mimirpb.FromLabelAdaptersToLabels(ts.Labels))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could reuse the builder between timeseries, to avoid creating new one in each iteration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see if I understand you right)

@sepich
Copy link
Contributor Author

sepich commented Apr 14, 2023

Here are some basic testing results:
image

  • first segment till 19:16
    2x distributors running on grafana/mimir:2.7.1, ~1M series on ingesters, with flow of ~65k samples/s, no any metric_relabel_configs defined
  • second segment since 19:16
    I'm adding to runtime config for this tenant:
          metric_relabel_configs:
            - source_labels: [__name__, prometheus]
              regex: .+;(k8s|gke|eks)-.*
              action: keep
    
    so basically all the metrics are passed through. Spike at 20:00 is a 2h block cut on ingesters.
  • third segment since 20:02
    I'm changing image to sepa/mimir:meta_tenant_id-bec9bef35 made from this branch, and fixing config to be:
          metric_relabel_configs:
            - source_labels: [__meta_tenant_id, prometheus]
              regex: k8s;(k8s|gke|eks)-.*
              action: keep
    
    also affecting all metrics of this single test tenant

@lamida
Copy link
Contributor

lamida commented Apr 17, 2023

The CHANGELOG has just been cut to prepare for the next Mimir release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the CHANGELOG document. Thanks!

@pracucci pracucci self-requested a review April 17, 2023 13:55
Copy link
Collaborator

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Looks like a nice addition to me.

About the performance impact, I'm not too much worried because it should just apply to tenants for which the relabelling config has been set. However, you could add a test case with relabelling to BenchmarkDistributor_Push (without the tenant ID label), then run it both on main and this PR branch with go test -count=3 ... and the compare the benchmarks with benchstat, please? I would like to see the actual different from a benchmark.

if !keep {
removeTsIndexes = append(removeTsIndexes, tsIdx)
continue
}
ts.Labels = mimirpb.FromLabelsToLabelAdapters(l)
lb.Del(metaLabelTenantID)
ts.Labels = mimirpb.FromLabelsToLabelAdapters(lb.Labels(labels.EmptyLabels()))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should make it slightly more performant. Since we want to overwrite ts.Labels anyway then we can pass it:

Suggested change
ts.Labels = mimirpb.FromLabelsToLabelAdapters(lb.Labels(labels.EmptyLabels()))
ts.Labels = mimirpb.FromLabelsToLabelAdapters(lb.Labels(ts.Labels))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But ts.Labels is []LabelAdapter and lb.Labels() wants Labels

Copy link
Member

@pstibrany pstibrany Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about this too much. New stringlabels version of labels code that we will use soon doesn't even take this parameter anymore.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry about this too much. New stringlabels version of labels code that we will use soon doesn't even take this parameter anymore.

Sorry, it's not "new stringlabels version" that doesn't have this method with labels.Labels parameter anymore, but also latest Prometheus main (after merging prometheus/prometheus#12173) which will be in Mimir after #4759 gets merged.

@sepich
Copy link
Contributor Author

sepich commented Apr 17, 2023

Ok tested on main commit b5519ef and this branch:

$ go test -count=3 -bench BenchmarkDistributor_Push -run='^$' ./pkg/distributor/ > /tmp/new.txt
$ ./bin/benchstat /tmp/old.txt /tmp/new.txt
goos: darwin
goarch: amd64
pkg: github.com/grafana/mimir/pkg/distributor
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
                                                             │ /tmp/old.txt  │             /tmp/new.txt             │
                                                             │    sec/op     │    sec/op     vs base                │
Distributor_Push/max_label_value_length_limit_reached-16        601.0µ ± ∞ ¹   633.0µ ± ∞ ¹       ~ (p=0.400 n=3) ²
Distributor_Push/timestamp_too_new-16                           586.7µ ± ∞ ¹   599.1µ ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16    1.818m ± ∞ ¹   2.202m ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            1033.0µ ± ∞ ¹   989.5µ ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                493.0µ ± ∞ ¹   450.0µ ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16               648.2µ ± ∞ ¹   674.8µ ± ∞ ¹       ~ (p=0.200 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16         3.202m ± ∞ ¹   3.451m ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                         945.9µ         979.6µ        +3.56%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt  │              /tmp/new.txt               │
                                                             │     B/op      │      B/op       vs base                 │
Distributor_Push/max_label_value_length_limit_reached-16       119.6Ki ± ∞ ¹    119.9Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/timestamp_too_new-16                          106.3Ki ± ∞ ¹    106.6Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   448.1Ki ± ∞ ¹   1200.4Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            126.0Ki ± ∞ ¹    126.4Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16               4.647Ki ± ∞ ¹    4.882Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16              98.02Ki ± ∞ ¹    98.23Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16        132.0Ki ± ∞ ¹    134.8Ki ± ∞ ¹        ~ (p=0.100 n=3) ²
geomean                                                        88.70Ki          103.3Ki        +16.44%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt │             /tmp/new.txt              │
                                                             │  allocs/op   │   allocs/op    vs base                │
Distributor_Push/max_label_value_length_limit_reached-16       2.094k ± ∞ ¹    2.096k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/timestamp_too_new-16                          2.046k ± ∞ ¹    2.048k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   7.051k ± ∞ ¹   10.062k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16             47.00 ± ∞ ¹     49.00 ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                44.00 ± ∞ ¹     46.00 ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16              2.148k ± ∞ ¹    2.150k ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16        2.084k ± ∞ ¹    2.098k ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                         833.6           889.1        +6.66%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

Should I leave the bench test code here?

@sepich
Copy link
Contributor Author

sepich commented Apr 17, 2023

Simplified this, now results for allocs are much better:

goos: darwin
goarch: amd64
pkg: github.com/grafana/mimir/pkg/distributor
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
                                                             │ /tmp/old.txt  │            /tmp/new2.txt             │
                                                             │    sec/op     │    sec/op     vs base                │
Distributor_Push/max_label_value_length_limit_reached-16        601.0µ ± ∞ ¹   596.4µ ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/timestamp_too_new-16                           586.7µ ± ∞ ¹   570.1µ ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16    1.818m ± ∞ ¹   2.014m ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            1033.0µ ± ∞ ¹   971.9µ ± ∞ ¹       ~ (p=0.100 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                493.0µ ± ∞ ¹   453.2µ ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16               648.2µ ± ∞ ¹   654.4µ ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16         3.202m ± ∞ ¹   3.115m ± ∞ ¹       ~ (p=0.400 n=3) ²
geomean                                                         945.9µ         932.9µ        -1.38%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt  │             /tmp/new2.txt             │
                                                             │     B/op      │     B/op       vs base                │
Distributor_Push/max_label_value_length_limit_reached-16       119.6Ki ± ∞ ¹   119.7Ki ± ∞ ¹       ~ (p=0.400 n=3) ²
Distributor_Push/timestamp_too_new-16                          106.3Ki ± ∞ ¹   106.3Ki ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   448.1Ki ± ∞ ¹   447.2Ki ± ∞ ¹       ~ (p=0.700 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16            126.0Ki ± ∞ ¹   125.7Ki ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16               4.647Ki ± ∞ ¹   4.643Ki ± ∞ ¹       ~ (p=0.400 n=3) ²
Distributor_Push/too_many_labels_limit_reached-16              98.02Ki ± ∞ ¹   98.01Ki ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/max_label_name_length_limit_reached-16        132.0Ki ± ∞ ¹   134.6Ki ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                        88.70Ki         88.88Ki        +0.21%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                                             │ /tmp/old.txt │            /tmp/new2.txt             │
                                                             │  allocs/op   │  allocs/op    vs base                │
Distributor_Push/max_label_value_length_limit_reached-16       2.094k ± ∞ ¹   2.094k ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/timestamp_too_new-16                          2.046k ± ∞ ¹   2.046k ± ∞ ¹       ~ (p=1.000 n=3) ³
Distributor_Push/all_samples_go_to_metric_relabel_configs-16   7.051k ± ∞ ¹   7.051k ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/all_samples_successfully_pushed-16             47.00 ± ∞ ¹    46.00 ± ∞ ¹       ~ (p=1.000 n=3) ²
Distributor_Push/ingestion_rate_limit_reached-16                44.00 ± ∞ ¹    44.00 ± ∞ ¹       ~ (p=1.000 n=3) ³
Distributor_Push/too_many_labels_limit_reached-16              2.148k ± ∞ ¹   2.148k ± ∞ ¹       ~ (p=1.000 n=3) ³
Distributor_Push/max_label_name_length_limit_reached-16        2.084k ± ∞ ¹   2.096k ± ∞ ¹       ~ (p=0.100 n=3) ²
geomean                                                         833.6          831.7        -0.22%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
³ all samples are equal

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

CHANGELOG.md Outdated Show resolved Hide resolved
@pstibrany
Copy link
Member

Should I leave the bench test code here?

Yes, feel free to include this benchmark in the PR.

Note that there's a conflict in CHANGELOG that needs to resolved before we can merge this.

@pstibrany
Copy link
Member

After #4759 was merged, can you please rebase this PR on top of Mimir main?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tenant_id to metric_relabel_configs
5 participants