Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge hydrant runners flatly for realtime queries. #15757

Merged
merged 3 commits into from
Jan 25, 2024

Conversation

gianm
Copy link
Contributor

@gianm gianm commented Jan 25, 2024

Prior to this patch, we have two layers of mergeRunners for realtime queries: one for each Sink (a logical segment) and one across all Sinks. This is to keep metrics and results grouped by Sink, given that each FireHydrant within a Sink has its own separate storage adapter.

However, it costs extra memory usage due to the extra layer of materialization. This is especially pronounced for groupBy queries, which only use their merge buffers at the top layer of merging. The lower layer of merging materializes ResultRows directly into the heap, which can cause heap exhaustion if there are enough ResultRows.

This patch changes to a single layer of merging when bySegment: false, just like Historicals. To accommodate that, segment metrics like query/segment/time are now per-FireHydrant instead of per-Sink.

Two layers of merging are retained when bySegment: true. This isn't common, because it's typically only used when segment level caching is enabled on the Broker, which is off by default.

Prior to this patch, we have two layers of mergeRunners for realtime
queries: one for each Sink (a logical segment) and one across all
Sinks. This is done because to keep metrics and results grouped by Sink,
given that each FireHydrant within a Sink has its own separate storage
adapter.

However, it costs extra memory usage due to the extra layer of
materialization. This is especially pronounced for groupBy queries,
which only use their merge buffers at the top layer of merging. The
lower layer of merging materializes ResultRows directly into the heap,
which can cause heap exhaustion if there are enough ResultRows.

This patch changes to a single layer of merging when bySegment: false,
just like Historicals. To accommodate that, segment metrics like
query/segment/time are now per-FireHydrant instead of per-Sink.

Two layers of merging are retained when bySegment: true. This isn't
common, because it's typically only used when segment level caching
is enabled on the Broker, which is off by default.
@gianm
Copy link
Contributor Author

gianm commented Jan 25, 2024

The metric change should be called out in docs and release notes.

Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@asdf2014 asdf2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

@asdf2014 asdf2014 merged commit 01e9d96 into apache:master Jan 25, 2024
83 checks passed
@gianm gianm deleted the more-chill-merge-runners branch January 25, 2024 17:15
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants