Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WriteOutBytes improvements #16698

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

adarshsanjeev
Copy link
Contributor

@adarshsanjeev adarshsanjeev commented Jul 8, 2024

This PR generally improves the working of WriteOutBytes and WriteOutMedium. Some analysis of usage of TmpFileSegmentWriteOutMedium shows that they periodically get used for very small things. The overhead of creating a tmp file is actually very large. To improve the performance in these cases, this PR modifies TmpFileSegmentWriteOutMedium to return a heap-based WriteOutBytes that falls back to making a tmp file when it actually fills up.

  • Increases the memory in FileWriteOutBytes and switches to using direct memory to improve performance.
  • Refactor TmpFileSegmentWriteOutMedium to use a heap based WriteOutBytes before falling back to a temporary file.
  • Modify FileWriteOutBytes's behavior on writing a value larger than its buffer. Previously, it used to throw an exception. This PR modifies it to write the value to the file directly instead, bypassing the buffer.
  • Adds some additional logging around the usage of write out bytes.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

file.toPath(),
StandardOpenOption.READ,
StandardOpenOption.WRITE
return new LazilyAllocatingHeapWriteOutBytes(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not implement a new SegmentWriteOutMedium instead of changing the behavior of an existing one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it has broad-reaching performance improving impacts? The alternative would be to create a new thing, update everything to use it and then delete the old, which is basically the same as changing in place. The reason to leave the old one would be if we wanted to maintain configurability to do the old thing, but I don't know why that would be useful here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea im for the strategy in general, just kind of worried about the slightly larger memory footprint here without any form of escape hatch. This looks like it can grow up to 16k (up from the 4k heap buffer of the old FileWriteOutBytes buffer that has been replaced with a direct buffer in this PR). Its probably cool, I'm just being nervous.

I was suggesting mostly just making a new thing and making it the default and leave the old thing just in case, but i can be convinced that this is close enough to not matter i think.

Copy link
Contributor

@imply-cheddar imply-cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

file.toPath(),
StandardOpenOption.READ,
StandardOpenOption.WRITE
return new LazilyAllocatingHeapWriteOutBytes(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it has broad-reaching performance improving impacts? The alternative would be to create a new thing, update everything to use it and then delete the old, which is basically the same as changing in place. The reason to leave the old one would be if we wanted to maintain configurability to do the old thing, but I don't know why that would be useful here.

@abhishekagarwal87
Copy link
Contributor

Can you add please tests to cover the code changes? E.g. is there a test already that makes sure that filesCreated is changed correctly in TmpFileSegmentWriteOutMedium class.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants