offset modifier causing OOM #678

Closed
arkbriar opened this issue Sep 10, 2024 · 6 comments · Fixed by #683

Comments

@arkbriar

Hi!

I have a promxy server set up as an HA query frontend in front of two identical VictoriaMetrics instances. It has worked perfectly!

However, I found that certain queries make promxy OOM, and I noticed something strange: when I remove the offset modifier from one of these queries, the OOM goes away. Here's the original query:

max(max_over_time(sum(rate(stream_executor_row_count{namespace=~"default"}[30s] offset 34m46s))[12h2m27s:30s]))
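
For comparison, here is the same query with only the offset modifier removed, which does not OOM:

max(max_over_time(sum(rate(stream_executor_row_count{namespace=~"default"}[30s]))[12h2m27s:30s]))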

The promxy process has 4 GiB available and normally uses ~200 MiB, so I suspect there is something special about how the offset is handled. Also, I saw this PR that handles the offset specifically.

I'd like to dig into the code later when I have time, but you folks know it far better and can definitely provide more insight, so I'm opening this issue first. I'll post more findings if I have any.

@arkbriar
Author

Update: I can reproduce it with a local build of the repo. The process used 3.19 GiB with the offset and 40 MiB without. Let me try digging into it.

@arkbriar
Author

arkbriar commented Sep 10, 2024

Without the offset, promxy sends QueryRange requests:

DEBU[2024-09-10T19:08:01+08:00] http://localhost:8428                         api=QueryRange query="sum(rate(stream_executor_row_count{namespace=~\"default\"}[30s]))" r="{2024-09-08 22:26:30 +0000 UTC 2024-09-10 10:28:40 +0000 UTC 30s}"
DEBU[2024-09-10T19:08:02+08:00] http://localhost:8428                         api=QueryRange query="sum(rate(stream_executor_row_count{namespace=~\"default\"}[30s]))" r="{2024-09-08 22:26:30 +0000 UTC 2024-09-10 10:28:40 +0000 UTC 30s}" took=1.196407416s

And with offset 30m, it sends GetValue requests:

DEBU[2024-09-10T19:08:29+08:00] http://localhost:8428                         api=GetValue end="2024-09-10 09:58:40 +0000 UTC" matchers="[namespace=~\"default\" __name__=\"stream_executor_row_count\"]" start="2024-09-08 21:55:43 +0000 UTC"
DEBU[2024-09-10T19:08:57+08:00] http://localhost:8428                         api=GetValue end="2024-09-10 09:58:40 +0000 UTC" matchers="[namespace=~\"default\" __name__=\"stream_executor_row_count\"]" start="2024-09-08 21:55:43 +0000 UTC" took=28.159786167s
DEBU[2024-09-10T19:08:57+08:00] Select                                        matchers="[namespace=~\"default\" __name__=\"stream_executor_row_count\"]" selectHints="&{1725832543000 1725962320000 345000 rate [] false 30000 false}" took=28.160525958s
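
If I'm reading these logs right, the difference amounts to this (a sketch of my understanding; the matchers and time ranges are taken from the logs above):

# Without the offset, the aggregation itself is pushed down (QueryRange), so each
# downstream only returns a single, already-aggregated series:
sum(rate(stream_executor_row_count{namespace=~"default"}[30s]))
# With the offset, promxy instead fetches the raw samples for the bare selector over
# the whole ~36h window (the GetValue call) and evaluates the query locally:
stream_executor_row_count{namespace=~"default"}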

I think that's the problem. It lies in how the query is rewritten at the AST level, but I don't understand the rationale behind the design. Could anyone please help?

@arkbriar
Author

Hi @jacksontj, I think I've fixed it in #681. Could you take a look when you have time? Thanks a lot!

Also, thanks for creating such a great tool! It's really amazing!

@jacksontj
Owner

Thanks for the report; this is actually a VERY interesting issue. Your PR was SUPER helpful in making sense of the report here -- amazing, thanks!

Now I have opened #683, which I believe will fix the issue. A longer description is in the PR, but the TL;DR is that when I added subquery support I failed to update the offset selection for subqueries -- so this should cover it.

I did some testing and it seems fine (and I added a test for it). Mind giving that PR build a spin to see if that alleviates your issue? If not, we can grab some new data and continue from there.

@arkbriar
Author

arkbriar commented Sep 16, 2024

Now I have opened #683, which I believe will fix the issue.

Thanks for the fix!

mind giving that PR build a spin to see if that alleviates your issue?

No problem! I've tested it. The logs look good!

DEBU[2024-09-16T15:30:33+08:00] http://localhost:8429                         api=QueryRange query="sum(rate(stream_executor_row_count{namespace=~\"default\"}[30s]))" r="{2024-09-15 17:53:44 +0000 UTC 2024-09-16 06:55:47.067 +0000 UTC 30s}"
DEBU[2024-09-16T15:30:33+08:00] http://localhost:8428                         api=QueryRange query="sum(rate(stream_executor_row_count{namespace=~\"default\"}[30s]))" r="{2024-09-15 17:53:44 +0000 UTC 2024-09-16 06:55:47.067 +0000 UTC 30s}"
DEBU[2024-09-16T15:30:33+08:00] http://localhost:8429                         api=QueryRange query="sum(rate(stream_executor_row_count{namespace=~\"default\"}[30s]))" r="{2024-09-15 17:53:44 +0000 UTC 2024-09-16 06:55:47.067 +0000 UTC 30s}" took=322.941083ms
DEBU[2024-09-16T15:30:33+08:00] http://localhost:8428                         api=QueryRange query="sum(rate(stream_executor_row_count{namespace=~\"default\"}[30s]))" r="{2024-09-15 17:53:44 +0000 UTC 2024-09-16 06:55:47.067 +0000 UTC 30s}" took=322.528792ms

I think #683 is good to go.

BTW, could you elaborate on why the sum(rate()) query is pushed down to the downstream servers? I found that if we want to fill in missing data (holes) from the sources and get an accurate result, the only way is to pull the raw data up to promxy and do the calculation there. Could you help me understand the design decision?

@jacksontj
Owner

could you elaborate on why the sum(rate()) query is pushed down to the downstream servers?

So this is a bit tricky -- and you do raise a good point here. To start off, let's talk about why we do query pushdown at all -- it pretty much all comes down to serialization and bandwidth. The initial PoC implementation of promxy (prior to open sourcing) was actually without NodeReplacer -- but the performance was SUPER SLOW. After a lot of digging, the issue basically boiled down to the downstream APIs having to serialize large amounts of data and ship it across the wire. So the "query pushdown" is intended to offload some compute, but primarily to reduce the amount of data we have to move around.

So with that in mind: assuming there are "no holes", pushing down a sum is safe, since a sum of sums is still the sum (effectively, sum is a re-entrant aggregator). That pushdown has been there since promxy v0.0.1. The subquery case uses "the same" mechanism, so I considered it "safe" as well.
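
As a concrete sketch (hypothetical; assuming the sharded setup described below, with series split across two downstreams A and B and no holes):

# Each downstream evaluates the pushed-down aggregation over its own shard:
sum(rate(stream_executor_row_count{namespace=~"default"}[30s]))
# If A returns S_A and B returns S_B, then S_A + S_B equals the sum computed over all
# raw series combined -- so promxy only has to add up the partial results instead of
# shipping every raw series across the wire.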

I found that if we want to fill in missing data (holes) from the sources and get an accurate result, the only way is to pull the raw data up to promxy and do the calculation there.

You do bring up an excellent point here -- and TBH this is a bit of a tricky one. The initial design for promxy was as an aggregating proxy in front of some sharded prometheus hosts. In that model (at least historically), if there are holes, it's generally "all series" that have holes (e.g. a prometheus restart/crash). But there could conceivably be scenarios where a single prometheus host has an issue getting data from a specific endpoint (e.g. a network policy that applies to only one replica of prometheus) -- and in that case we would see issues with the data calculation. Fundamentally, promxy is trying to strike a balance between correctness and performance -- and so far this mix has been good enough.

That being said, now that I'm thinking through it a bit more (and describing it here), we could in theory make the pushdown use some group-by mechanism (e.g. by instance) so that we can attempt to "do better" on data fill (at the expense of more serialization/bandwidth). I'll have to think on that a bit; I'm not sure there's a good/safe way to do it without risking a potentially massive/catastrophic increase in bandwidth/memory (the extreme case being a sum over all instances in a large deployment -- the cardinality there may be high).
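
To sketch the idea (purely hypothetical -- not something promxy does today):

# Today the full aggregation is pushed down:
sum(rate(stream_executor_row_count{namespace=~"default"}[30s]))
# The idea would be to push down only a partial, grouped aggregation instead:
sum by (instance) (rate(stream_executor_row_count{namespace=~"default"}[30s]))
# promxy would then merge the per-instance series across downstreams (filling
# per-instance holes where one downstream is missing data) and apply the final sum()
# itself, at the cost of an intermediate result whose cardinality is the number of
# distinct instances.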
