Receiver: Metrics with out-of-order labels get accepted #5499

Closed

matej-g opened this issue Jul 13, 2022 · 15 comments · Fixed by #5508
Comments

matej-g (Collaborator) commented Jul 13, 2022

Thanos, Prometheus and Golang version used:
Latest Thanos (0.27.0)

What happened:
Receiver happily accepts metrics with out-of-order labels. By itself this seemed acceptable (i.e. TSDB will append them without complaining), but it causes issues in the compactor, which expects labels to be in order and ends up in a crash loop.

This is similar to the situation that happened previously with prometheus/prometheus#5372. Although we're in remote write land and can assume that in most cases we are getting metrics from 'official' clients that adhere to the label ordering, this cannot be guaranteed for all clients (as evidenced by our related issue #5497, which we caused by running some custom tests with an 'unofficial' client from https://github.com/observatorium/up).

What you expected to happen:
I would expect the receiver to handle this, at least in a minimal fashion, to prevent potential issues with compactor down the line. Possible options:

  • Silently fix label ordering
  • Warn about wrong label order
  • Both of the above
  • Refuse the request as a bad request
  • Make this configurable
douglascamata (Contributor) commented:

Can Receive know if the labels are out of order without having to effectively copy, sort the copy, and then compare to the original? 🤔
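For illustration, a minimal sketch (not Thanos code) of how a receiver could answer that question in a single O(n) pass, with no copy and no sort. The `Label` type and `labelsStrictlySorted` name are stand-ins, assuming labels arrive as a slice of name/value pairs as in the remote write protobuf:

```go
package main

import "fmt"

// Label is a minimal stand-in for the remote write label type
// (e.g. prompb.Label); defined here only to keep the sketch self-contained.
type Label struct {
	Name  string
	Value string
}

// labelsStrictlySorted reports whether label names are in strictly increasing
// order, i.e. sorted and free of duplicate names, in one pass and without
// copying or sorting anything.
func labelsStrictlySorted(lset []Label) bool {
	for i := 1; i < len(lset); i++ {
		if lset[i].Name <= lset[i-1].Name {
			return false // out of order, or a duplicate label name
		}
	}
	return true
}

func main() {
	ok := []Label{{"__name__", "up"}, {"instance", "a"}, {"job", "b"}}
	bad := []Label{{"job", "b"}, {"instance", "a"}}
	fmt.Println(labelsStrictlySorted(ok), labelsStrictlySorted(bad)) // true false
}
```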

matej-g (Collaborator, author) commented Jul 15, 2022

After some more thought and taking a look at the remote write spec, I feel we should not be condoning clients' bad behavior, so instead of trying to fix wrong labels I'd suggest dropping series with invalid labels from the request and handling this as a bad request error. I gave it a stab in #5508.
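To make that flow concrete, a hedged sketch of a handler that drops series with unsorted labels and answers with 400. This is not the #5508 implementation; the names and types (handleWrite, TimeSeries, ingest) are stand-ins for illustration only:

```go
package receiveexample

import (
	"fmt"
	"net/http"
)

// Label and TimeSeries are stand-ins for the remote write protobuf types
// (prompb.Label / prompb.TimeSeries); hypothetical, for illustration only.
type Label struct{ Name, Value string }
type TimeSeries struct{ Labels []Label }

func labelsStrictlySorted(lset []Label) bool {
	for i := 1; i < len(lset); i++ {
		if lset[i].Name <= lset[i-1].Name {
			return false
		}
	}
	return true
}

// handleWrite sketches the approach described above: series whose labels are
// out of order (or duplicated) are dropped, the rest are ingested, and if
// anything was dropped the whole request is answered with 400 Bad Request so
// a spec-compliant client treats it as unrecoverable and does not retry.
func handleWrite(w http.ResponseWriter, series []TimeSeries, ingest func([]TimeSeries)) {
	var kept []TimeSeries
	dropped := 0
	for _, ts := range series {
		if !labelsStrictlySorted(ts.Labels) {
			dropped++
			continue
		}
		kept = append(kept, ts)
	}
	ingest(kept)
	if dropped > 0 {
		http.Error(w, fmt.Sprintf("%d series dropped: labels not sorted", dropped), http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}
```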

douglascamata (Contributor) commented:

@matej-g so we drop the series with bad labels, accept the others with good labels, and return a bad request? How does the Prometheus client react to a bad request? I am afraid it could send the same request again, duplicating the series with good labels.

matej-g (Collaborator, author) commented Jul 15, 2022

Prometheus considers bad requests to be unrecoverable errors (unlike e.g. server errors), so it should not retry on such a response. This is equivalent to how we handle requests with invalid samples or exemplars. I tried to describe it in more depth in the PR description.
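As a rough illustration of that retry rule (a simplification, not Prometheus' actual remote write queue code; real clients add nuances such as optional retries on 429):

```go
package remotewriteexample

// retryable sketches the behaviour described above: 5xx responses are retried
// with backoff, while 4xx responses such as 400 Bad Request are treated as
// unrecoverable, so the batch is dropped rather than resent and the series
// with good labels are not duplicated.
func retryable(statusCode int) bool {
	return statusCode/100 == 5
}
```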

douglascamata (Contributor) commented:

Awesome. It sounds like a good flow. Thanks for the explanation. 🙇

hanem100k commented Apr 9, 2024

Uuh, Google does not like the suffix of this; I found a lot of issues about out-of-order samples, but this one with labels was buried :)

Running 0.34 thanos-receivers, and it does work according to the above: it drops the requests :)

[screenshot omitted]

I'm sending writes from Prometheus instances; however, the remote_write config doesn't seem to have an option to send sorted labels, which is weird because the spec explicitly states that they should be sorted. It might make sense to provide the possibility to sort before egressing, or to sort inside the receiver before ingesting?

Either way, is there a way I can explicitly sort? Or just labelkeep blocks in order? @matej-g @douglascamata

Thanks!
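For reference, a minimal sketch of what the "sort before ingesting" option mentioned above would look like on the receiver side. This is an editorial illustration, not Thanos behaviour (the receiver rejects such series rather than fixing them), and the Label type is a hypothetical stand-in; note that sorting alone would not resolve duplicate label names:

```go
package receiveexample

import "sort"

// Label is a stand-in for the remote write label type; hypothetical.
type Label struct{ Name, Value string }

// sortLabels orders labels by name in place, which is what "sort before
// ingesting" on the receiver side would amount to. It does not deduplicate,
// so duplicate label names (seen later in this thread) would still slip through.
func sortLabels(lset []Label) {
	sort.Slice(lset, func(i, j int) bool { return lset[i].Name < lset[j].Name })
}
```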

MichaHoffmann (Contributor) commented:

Oh, I don't think Prometheus should write out-of-order labels in remote write; is there something else writing?

hanem100k commented:

> Oh, I don't think Prometheus should write out-of-order labels in remote write; is there something else writing?

It's only Prometheuses, hundreds of them. Most of them are on version 2.50, with a couple still on 2.26; let me see if the old ones are causing the issue.

MichaHoffmann (Contributor) commented:

Can you run the receivers with debug logging? I think they should log the offending labels then, IIRC.

hanem100k commented:

Checked the versions:

[screenshot of version counts omitted]

We run quite a few, unfortunately, though it doesn't seem to be version specific; the newest is having the same issues.

> Can you run the receivers with debug logging? I think they should log the offending labels then, IIRC.

Yes, they do log it, though it is very spammy; we're having this issue all over the place, hence wanting to defer manual labelkeep :)

MichaHoffmann (Contributor) commented:

Out of curiosity, how do the rejected label sets look?

hanem100k commented Apr 9, 2024

[screenshot of the rejected label set omitted]

*Hand-edited values, might have syntax errors.

Edit: added screenshot, text was rendered very weird

MichaHoffmann (Contributor) commented:

That label set also has a couple of duplicate labels after rule_group! That's really weird.

hanem100k commented Apr 9, 2024

It is really weird, because checking the values, those are different; it's even a different pod, from a different node.

One is RISC, the other is CISC.

It's like two long label strings were randomly concatenated.

hanem100k commented:

Looking at another one:

[screenshot of another rejected label set omitted]

This is not sorted to begin with, though it duplicates again, even the node.

Will debug tomorrow, but I think the logging might be wrong (we are currently consuming these with another layer of big Prometheus instances and did not notice any label duplication or weirdness like the above; of course, those are scraping).
