-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WriteOutBytes improvements #16698
base: master
Are you sure you want to change the base?
WriteOutBytes improvements #16698
Conversation
processing/src/main/java/org/apache/druid/segment/writeout/FileWriteOutBytes.java
Show resolved
Hide resolved
file.toPath(), | ||
StandardOpenOption.READ, | ||
StandardOpenOption.WRITE | ||
return new LazilyAllocatingHeapWriteOutBytes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not implement a new SegmentWriteOutMedium
instead of changing the behavior of an existing one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it has broad-reaching performance improving impacts? The alternative would be to create a new thing, update everything to use it and then delete the old, which is basically the same as changing in place. The reason to leave the old one would be if we wanted to maintain configurability to do the old thing, but I don't know why that would be useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea im for the strategy in general, just kind of worried about the slightly larger memory footprint here without any form of escape hatch. This looks like it can grow up to 16k (up from the 4k heap buffer of the old FileWriteOutBytes
buffer that has been replaced with a direct buffer in this PR). Its probably cool, I'm just being nervous.
I was suggesting mostly just making a new thing and making it the default and leave the old thing just in case, but i can be convinced that this is close enough to not matter i think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
processing/src/main/java/org/apache/druid/segment/writeout/FileWriteOutBytes.java
Show resolved
Hide resolved
file.toPath(), | ||
StandardOpenOption.READ, | ||
StandardOpenOption.WRITE | ||
return new LazilyAllocatingHeapWriteOutBytes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it has broad-reaching performance improving impacts? The alternative would be to create a new thing, update everything to use it and then delete the old, which is basically the same as changing in place. The reason to leave the old one would be if we wanted to maintain configurability to do the old thing, but I don't know why that would be useful here.
Can you add please tests to cover the code changes? E.g. is there a test already that makes sure that |
This PR generally improves the working of WriteOutBytes and WriteOutMedium. Some analysis of usage of TmpFileSegmentWriteOutMedium shows that they periodically get used for very small things. The overhead of creating a tmp file is actually very large. To improve the performance in these cases, this PR modifies TmpFileSegmentWriteOutMedium to return a heap-based WriteOutBytes that falls back to making a tmp file when it actually fills up.
This PR has: