Avoid file rename when pushing segments with HDFSDataPusher with UUID #7532

Open
nishantmonu51 opened this issue Apr 23, 2019 · 2 comments

nishantmonu51 commented Apr 23, 2019

Description

Currently, HDFSDataSegmentPusher pushes segments to a temporary location and then uses a rename to move them to their final location.
This behavior was added to fix a race that occurs when two replica tasks try to write segment files to the same location.

However, when useUniquePath is set to true, a UUID is appended to the file name, so the write to a temporary location is not required.

The proposed change is to write directly to the final location and rely on the useUniquePath flag to avoid conflicts.
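
To make the difference between the two strategies concrete, here is a minimal sketch. It is not the Druid implementation: it uses `java.nio.file` on a local directory instead of the Hadoop `FileSystem` API, and the class name `SegmentPushSketch`, both method names, and the `tmp/<uuid>/index.zip` and `<uuid>_index.zip` layouts are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

// Hypothetical illustration only; names and path layout are assumptions.
final class SegmentPushSketch {

    // Temp-then-rename strategy: every replica targets the SAME final path,
    // so the data is first written to a private temp file and then moved.
    // On a store without a native rename, this move becomes a full copy.
    static Path pushViaTempAndRename(Path storageDir, String segmentId, byte[] data)
            throws IOException {
        Path tmp = storageDir.resolve("tmp")
                             .resolve(UUID.randomUUID().toString())
                             .resolve("index.zip");
        Files.createDirectories(tmp.getParent());
        Files.write(tmp, data);

        Path finalPath = storageDir.resolve(segmentId).resolve("index.zip");
        Files.createDirectories(finalPath.getParent());
        return Files.move(tmp, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }

    // Unique-path strategy: a UUID in the file name gives each task its own
    // final path, so the data is written once, directly in place.
    static Path pushDirectWithUniquePath(Path storageDir, String segmentId, byte[] data)
            throws IOException {
        Path finalPath = storageDir.resolve(segmentId)
                                   .resolve(UUID.randomUUID() + "_index.zip");
        Files.createDirectories(finalPath.getParent());
        return Files.write(finalPath, data);
    }
}
```

In this sketch, two calls to `pushDirectWithUniquePath` for the same segment return different paths, so the writers cannot collide, while two calls to `pushViaTempAndRename` both resolve to the same final path and need the intermediate move.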

Motivation

When the underlying file system is S3AFileSystem, which does not support a native rename, a copy is performed instead of a rename. This copy can be avoided when useUniquePath is set to true.
Link to S3A rename docs - https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1070


gianm commented Apr 23, 2019

👍 sounds like a solid improvement.

@nishantmonu51

Thanks @gianm. I looked at the code, and it seems that we can avoid the copy in all cases and rely on the useUniquePath flag. I have created #7537.
Please review.

nishantmonu51 added a commit to nishantmonu51/druid that referenced this issue Sep 2, 2019
Fixes apache#7532

A push to a temporary location followed by a copy is not required, as we can safely rely on the useUniquePath flag to avoid path conflicts between multiple tasks.