
compactor: change GatherIndexIssueStats validation logic on sorted labels #831

Closed · sgmitchell wants to merge 2 commits

Conversation

sgmitchell

Deliberately exclude the first element when checking that a label set is in sorted order.

The first element is always __name__ for prometheus, and while _ is lexicographically before all lowercase letters, it does not precede uppercase letters. This made compaction fail on valid metrics such as up{Z="one", a="two"}, since the expected sorted order was [Z, __name__, a].

This change brings the validation more in line with what prometheus is using in promparse.

An additional check may also be needed to make sure that the first value is always __name__.
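
For reference, a minimal Go snippet (not part of this PR) illustrating the byte ordering behind this:

package main

import "fmt"

func main() {
	// '_' (0x5F) sorts after uppercase letters ('Z' is 0x5A) but before
	// lowercase letters ('a' is 0x61), so in a fully byte-sorted label set
	// __name__ lands between uppercase and lowercase label names.
	fmt.Println("Z" < "__name__") // true
	fmt.Println("__name__" < "a") // true
	// Fully sorted: [Z, __name__, a]; promparse instead keeps __name__ first
	// and sorts only the remaining labels: [__name__, Z, a].
}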

Changes

  • Add test cases for GatherIndexIssueStats
  • Change sorted label validation to skip the first element

Verification

Here's a simple config that generates data that will crash the compactor. I validated that my fix no longer crashed the compactor on a similar dataset.

Use the following scrape config in prometheus to generate a metric up{Z="one", a="two", instance="localhost:9090", job="prometheus"}:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets:
      - 'localhost:9090'
      labels:
        Z: one
        a: two

Start the compactor using the improbable/thanos:v0.3.0 image and point it at the bucket that the thanos sidecar is writing to. When the container encounters this time series, it will crash with the following log line:

level=error ts=... caller=main.go:181 msg="running command failed" err="error executing compaction: compaction failed: compaction: gather index issues for block /var/thanos/store/compact/0@{monitor=\"prometheus\",replica=\"prometheus-0\"}/01D36C79GDX5DVXZ1PAD5ZT65M: out-of-order label set {__name__=\"up\",Z=\"one\",a=\"two\",instance=\"localhost:9090\",job=\"prometheus\"} for series 2739605"

@bwplotka (Member) left a comment

I think I am ok with this change, thanks for this!

But the problem is that uppercase label names really are quite wrong -> they quite massively increase lookup time/effort, so they should be discouraged. Anyway, if we allow them we should do it explicitly, not via an accidental sorting bug. Thanks (:

@sgmitchell (Author)

upper case label names are quite wrong

Agreed, although I didn't know about the perf hit aspect of it. The metrics that tickled this bug were changed as soon as we found them but unfortunately, we can't rewrite history, so the compactor needs to be flexible enough to deal with (and eventually erase) our bad decisions.

Related to this: I'd be interested if you could point me to any discussions or pages that talk about this perf hit?
I know that almost every metric I see conforms to lowercase snake case, but when I looked at the best practices, there was no explicit call-out.

Thanks for the quick reply

@bwplotka (Member) left a comment

Related to this: I'd be interested if you can you point me to any discussions or pages that talk about this perf hit?

I will be honest, there is nothing and I only remember @fabxc (co-author of Thanos and author of TSDB) mentioning some perf consequences. I cannot find any immediate answer why. Anyway, we should double check whether __name__ is actually first, because I thought it was not. I was thinking that the reason for the perf issues is exactly that __name__ sits somewhere between the uppercase and lowercase labels. Maybe for your block we remove Z and that's why this code works? ;p (your test does not cover this case)

l0 = l

// Ensure all but the first value (__name__) are sorted
if !sort.SliceIsSorted(lset[1:], func(i, j int) bool { return lset[i+1].Name <= lset[j+1].Name }) {
Member

Let's actually check if name is first maybe?

Member

Because I think that's the whole point... if the labels are ordered, __name__ is between the uppercase and lowercase labels already.
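
For illustration, a minimal, self-contained sketch of the kind of check being discussed here (editor's sketch, not the PR's actual diff; the label struct and error messages are stand-ins):

package main

import (
	"fmt"
	"sort"
)

// label is a stand-in for the tsdb label type, used only for this sketch.
type label struct{ Name, Value string }

// validate sketches the suggested rule: the first label must be __name__,
// and the remaining labels must be sorted by name.
func validate(lset []label) error {
	if len(lset) == 0 || lset[0].Name != "__name__" {
		return fmt.Errorf("label set %v does not start with __name__", lset)
	}
	rest := lset[1:]
	if !sort.SliceIsSorted(rest, func(i, j int) bool { return rest[i].Name < rest[j].Name }) {
		return fmt.Errorf("out-of-order label set %v", lset)
	}
	return nil
}

func main() {
	fmt.Println(validate([]label{{"__name__", "up"}, {"Z", "one"}, {"a", "two"}})) // <nil>
	fmt.Println(validate([]label{{"Z", "one"}, {"__name__", "up"}, {"a", "two"}})) // error
}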

}
l0 = l

// Ensure all but the first value (__name__) are sorted
Member

missing trailing period

@sgmitchell (Author)

I initially omitted the check for lset[0] == "__name__" because I only found a few occurrences of __name__ in prometheus/tsdb and none of them indicated to me that it had to be the first label. The only place I see that invariant is in prometheus/prometheus.
Since the blocks package seemed like it was much more closely aligned with the tsdb code, I was hesitant to add that logic.

Issue #32 in prometheus/tsdb was the best guidance I found on how this should be implemented, and after reading your comment, I've added another test case that checks that bad (at least in my eyes) label sets correctly throw an error. Unfortunately, the old code happily accepts these label sets while my new changes result in an error (the expected output). So now I'm not 100% sure this won't break anything 🤷‍♂️.

@sgmitchell (Author)

Yuck. Ensuring __name__ is first failed e2e testing, where the label set was {a="1"}.

So I guess the question becomes: when is __name__ required in the label set for a tsdb? If it's always, then we need to change the test data to reflect that. If it's anything less than always, I'm not sure what the best and safest path forward is.

@bwplotka (Member)

Let's focus on what we want to achieve here. Are you blocked? Do you use upper case label names? (:

}

// Ensure all but the first value (__name__) are sorted.
if !sort.SliceIsSorted(lset[1:], func(i, j int) bool { return lset[i+1].Name <= lset[j+1].Name }) {
@GiedriusS (Member), Feb 13, 2019

It's a bit different and I might raise a separate pull request, but what if we relaxed this check a bit and switched it to < here? Essentially, it seems like Prometheus accepts duplicate labels, and if that happens (it happened in our case) then the compactor doesn't work anymore due to this check. Case in point:

{"caller":"main.go:181","err":"error executing compaction: compaction failed: compaction: gather index issues for block /data/compact/0@{monitor=\"monitor\",replica=\"repl\"}/01D34EDQMSQ29RHAC47XGKHGC7: out-of-order label set {__name__=\"foo\",exported_job=\"vv\",host=\"172_16_226_56\",host=\"172_16_226_56\",region=\"lt\",subtask_index=\"5\",task_attempt_id=\"32e4b047bb768583ff57c709be3b1046\",task_attempt_num=\"8\",task_id=\"688c028a219ff3372f3eecb0ee5811f9\",task_name=\"Source:_foo\",tenant=\"abc\",tier=\"cooltier\",tm_id=\"53b2ed987b08f427dec4ee1465df91fa\"} for series 2594231","level":"error","msg":"running command failed","ts":"2019-02-11T13:30:33.901722306Z"}

@bwplotka (Member)

Moved to #848.

@sgmitchell (Author)

Let's focus on what we want to achieve here. Are you blocked? Do you use upper case label names? (:

I'm dogfooding my fork and it's running great so far. The metrics with capital labels are gone but unfortunately, I'll still need to use my fork until those troublesome metrics age out of the retention window (which by design will be a while).
This PR is probably blocked until there is a determination of how to reconcile the two differences.

I think the ultimate goal should be for compactor validation to be as close as possible to prometheus' validation, but no stricter than prometheus. This PR specifically should just cover the edge case of labels that start with a capital letter.

A temporary fix, until we can agree on the right implementation here, might be completing #711.

@bwplotka (Member)

Ok, so with #848 being merged, can we rebase this, or decide to close? (: There is definitely more work to do on this if we want to productionize it (:

@sgmitchell (Author)

I'm ok with closing this.
#848 won't fix this issue but unfortunately I don't currently have the bandwidth to contribute more.
I've corrected my metrics with uppercase labels to be lowercase, which has gotten me past this problem, but it seems like there needs to be some work to ensure thanos and prometheus don't differ in their checks.

@ipstatic (Contributor)

I just ran into this bug. @sgmitchell does this PR still fix the issue?

@bwplotka (Member)

bwplotka commented Mar 13, 2019 via email

@ipstatic (Contributor)

I just built this branch (after rebasing off master) and it does not solve my issue (some labels that are all uppercase). You can see my update here, maybe I did the merge wrong?

ipstatic@0dd3cb0

@jjneely (Contributor)

jjneely commented Mar 15, 2019

Ok, rebased myself to double check @ipstatic's work.

https://github.com/jjneely/thanos/tree/GatherIndexIssueStats-label-sorting

I updated my compact docker image and re-ran, and I get an error that __name__ isn't the first label in the set. From the error message, that looks totally correct.

Can we at least skip compaction of bad blocks? Or re-order the list correctly here?

Mar 15 20:31:41 thanos-store-global-dev-0.c.fit-prometheus-dev.internal docker[1301200]: level=error ts=2019-03-15T20:31:41.883189701Z caller=main.go:182 msg="running command failed" err="error executing compaction: compaction failed: compaction: gather index issues for block /data/compact/0@{monitor=\"global\",replica=\"0\"}/01D5Q95903XSD844YJTEEAHEGT: label set {DATA_CENTER=\"dal09\",__name__=\"DISCOVERED_ALERTS\",alertname=\"PuppetAgentOpenFdsHigh\",alertstate=\"healthy\",description=\"{{$labels.instance}} of job {{$labels.job}} has been elevated (over 80% of hard limit) for more than 5 minutes.\",instance=\"mesos-10-153-2-69-x2.prod.dal09.example.com:31255\",job=\"http_status-page_prod_site-svc-status-page_dal09\",monitor=\"dal09/sre/prod/prometheus\",runbook=\"https://wiki.example.com/display/siteops/number+of+open+files\",severity=\"page\",summary=\"{{$labels.instance}} puppet-agent open_fds high\"} does not start with __name__ for series 89244"

I actually think that this patch fixes some of my woes with Compact, but I do appear to have conditions where __name__ isn't first. What to do?

@GiedriusS (Member)

GiedriusS commented Mar 15, 2019

Ok, looked a bit more into this and it seems like in the Prometheus code there is a single package for everything related to labels: https://github.com/prometheus/prometheus/blob/9f903fb3f7841f55d1b634d5fb02986a8bfc5e0c/pkg/labels/labels.go. As you can see, everywhere in that package the label set is sorted before being returned to its users. However, at the place mentioned in the original issue, https://github.com/prometheus/prometheus/blob/9f903fb3f7841f55d1b634d5fb02986a8bfc5e0c/pkg/textparse/promparse.go#L230, the __name__ label is put at the front, which breaks the invariant if any of the other labels use uppercase letters. So it seems like it is a bug there in promparse... thoughts? I deleted the old comment since I was completely wrong. I need to play around with this myself a bit, but the invariant definitely seems to be that the label set is always ordered.
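
To make that invariant concrete, a small illustration (editor's note, using the pkg/labels package linked above, with the import path as of that commit):

package main

import (
	"fmt"

	"github.com/prometheus/prometheus/pkg/labels"
)

func main() {
	// labels.New sorts by name, so __name__ ends up between uppercase and
	// lowercase label names, which is the invariant of the labels package.
	ls := labels.New(
		labels.Label{Name: "__name__", Value: "wind_speed"},
		labels.Label{Name: "A", Value: "2"},
		labels.Label{Name: "c", Value: "3"},
	)
	fmt.Println(ls) // {A="2", __name__="wind_speed", c="3"}
}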

@GiedriusS (Member)

@jjneely indeed this seems like a Prometheus bug. I added another test to the parser in Prometheus and ran the tests with: wind_speed{A="2",c="3"} 12345.

Result:

--- FAIL: TestPromParse (0.00s)
    /Users/statkg/go/src/github.com/prometheus/prometheus/pkg/textparse/require.go:157: 
        	Error Trace:	promparse_test.go:193
        	Error:      	Not equal: 
        	            	expected: labels.Labels{labels.Label{Name:"A", Value:"2"}, labels.Label{Name:"__name__", Value:"wind_speed"}, labels.Label{Name:"c", Value:"3"}}
        	            	actual  : labels.Labels{labels.Label{Name:"__name__", Value:"wind_speed"}, labels.Label{Name:"A", Value:"2"}, labels.Label{Name:"c", Value:"3"}}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-(labels.Labels) (len=3) {A="2", __name__="wind_speed", c="3"}
        	            	+(labels.Labels) (len=3) {__name__="wind_speed", A="2", c="3"}
        	            	 
        	Test:       	TestPromParse

So yes, it seems that Prometheus parser itself isn't following the invariants laid out in the labels package. Needs fixing there, IMHO.


@jjneely (Contributor)

jjneely commented Mar 15, 2019

Yay! So labels should always be in sorted order, with no special treatment of __name__. So the bug/corruption I'm suffering from is that I have postings whose label sets are ordered like this:

{__name__=\"DISCOVERED_ALERTS\",DATA_CENTER=\"dal09\",alertname=\"PuppetAgentOpenFdsHigh\",alertstate=\"healthy\",description=\"{{$labels.instance}} of job {{$labels.job}} has been elevated (over 80% of hard limit) for more than 5 minutes.\",instance=\"mesos-10-153-2-69-x2.prod.dal09.example.com:31255\",job=\"http_status-page_prod_site-svc-status-page_dal09\",monitor=\"dal09/sre/prod/prometheus\",runbook=\"https://wiki.example.com/display/siteops/number+of+open+files\",severity=\"page\",summary=\"{{$labels.instance}} puppet-agent open_fds high\"}

Which was the original error I was facing that led me here. Is that correct?

@GiedriusS (Member)

Yes. So now we should add an extra flag to Thanos Compact to make it ignore this error, and print information about this in case it happens. We should explicitly note that all versions before and including Prometheus 2.8.0 are affected by this. Care to make a patch?
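
A rough sketch of what such an opt-out flag could look like (the flag name and wiring here are hypothetical, not the actual Thanos implementation), using kingpin, which the Thanos CLI was built on at the time:

package main

import (
	"fmt"
	"os"

	kingpin "gopkg.in/alecthomas/kingpin.v2"
)

func main() {
	app := kingpin.New("compact-sketch", "Sketch of an opt-out flag for index verification.")
	// Hypothetical flag name; any real flag added to Thanos may differ.
	acceptMalformed := app.Flag("debug.accept-malformed-index",
		"Do not treat out-of-order label sets in the index as fatal during compaction.").Bool()
	kingpin.MustParse(app.Parse(os.Args[1:]))

	if *acceptMalformed {
		fmt.Println("index issues will be logged but compaction will continue")
	}
}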

@GiedriusS (Member)

Closing this because this has been outlined already in #888. Will add the relevant info there.

@GiedriusS closed this Mar 17, 2019
@jjneely (Contributor)

jjneely commented Mar 18, 2019

Yes. I'll be glad to work on a PR for this. Thanks a bunch for the help, makes contributing easy.

@ashurov-os33

Is it possible to disable sorting?

@GiedriusS (Member)

No, probably not, because sorted label sets permit Prometheus to use binary search. What problem are you trying to solve?

@ashurov-os33

I use HTTP URL parameters (target, user, pass). Because of the sorting, I get the wrong request URL.
I have http://server/metrics?pass=123&target=1.1.1.1&user=foo
I need http://server/metrics?target=1.1.1.1&user=foo&pass=123

@GiedriusS (Member)

I'm sorry, but I think your issue is completely unrelated to this one and even to the Thanos project 😄 Please ask in https://github.com/prometheus/prometheus about this. But most likely you will be told that your application must not depend on such minute details, because IMHO it would be hard to implement any way to specify an arbitrary ordering of parameters when scraping.
