Avoid file rename when pushing segments with HDFSDataPusher with UUID #7532

nishantmonu51 · 2019-04-23T21:51:59Z

Description

Currently, HDFSDataSegmentPusher pushes segments to a temporary location and then uses a rename to move them to the final location.
This behavior was added to fix a race condition when two replica tasks try to write segment files to the same location.
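A minimal sketch of the temp-then-rename pattern, in Java. It uses a hypothetical in-memory `FakeFs` in place of Hadoop's `FileSystem` and assumes HDFS-like rename semantics (atomic, and failing rather than overwriting an existing destination). The class, method, and path names are illustrative only, not Druid's actual code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TempThenRenamePush {

  /** Hypothetical in-memory stand-in for an HDFS-like file system. */
  static final class FakeFs {
    private final Map<String, byte[]> files = new HashMap<>();

    synchronized void write(String path, byte[] data) {
      files.put(path, data.clone());
    }

    /** Atomic rename that refuses to overwrite an existing destination (returns false instead). */
    synchronized boolean rename(String src, String dst) {
      if (!files.containsKey(src) || files.containsKey(dst)) {
        return false;
      }
      files.put(dst, files.remove(src));
      return true;
    }

    synchronized boolean exists(String path) {
      return files.containsKey(path);
    }

    synchronized void delete(String path) {
      files.remove(path);
    }

    synchronized byte[] read(String path) {
      byte[] data = files.get(path);
      return data == null ? null : data.clone();
    }
  }

  /** Returns true if this task published the segment, false if a replica already had. */
  static boolean push(FakeFs fs, String finalPath, byte[] segment) {
    // Each task writes to its own temp path, so two replicas never write the same file concurrently.
    String tmpPath = "/tmp/" + UUID.randomUUID() + "/index.zip";
    fs.write(tmpPath, segment);
    if (fs.rename(tmpPath, finalPath)) {
      return true;
    }
    // Rename failed: acceptable only if a replica task already published the segment.
    fs.delete(tmpPath);
    if (fs.exists(finalPath)) {
      return false;
    }
    throw new IllegalStateException("rename failed and " + finalPath + " does not exist");
  }

  public static void main(String[] args) {
    FakeFs fs = new FakeFs();
    String finalPath = "/druid/segments/wiki/2019-04-23/0/index.zip";
    byte[] segment = "segment-bytes".getBytes(StandardCharsets.UTF_8);
    System.out.println(push(fs, finalPath, segment)); // true: first replica publishes
    System.out.println(push(fs, finalPath, segment)); // false: second replica finds it already pushed
  }
}
```

The rename acts as the arbiter between replicas: whichever task renames first wins, and the loser discards its temporary copy.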

However, when useUniquePath is set to true, a UUID is appended to the file name, so writing to a temporary location first is not required.

This change writes directly to the final location and relies on the useUniquePath flag to avoid conflicts.
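By contrast, with a unique path each replica can write straight to its own final location. The sketch below assumes a hypothetical layout of `{storageDir}/{partitionNum}_{UUID}_index.zip` and uses a plain `Map` as the file system. It is an illustration, not Druid's exact path format or pusher code.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class UniquePathPush {

  /** Builds the final index path; with useUniquePath, a random UUID makes it unique per push. */
  static String indexPath(String storageDir, int partitionNum, boolean useUniquePath) {
    String uniquePart = useUniquePath ? UUID.randomUUID() + "_" : "";
    return storageDir + "/" + partitionNum + "_" + uniquePart + "index.zip";
  }

  /** Writes straight to the final path: no temp file, no rename, no copy. */
  static String push(Map<String, byte[]> fs, String storageDir, int partitionNum, byte[] segment) {
    String path = indexPath(storageDir, partitionNum, true);
    // putIfAbsent guards the (astronomically unlikely) case of a UUID collision.
    if (fs.putIfAbsent(path, segment.clone()) != null) {
      throw new IllegalStateException("unexpected path collision: " + path);
    }
    return path;
  }

  public static void main(String[] args) {
    Map<String, byte[]> fs = new HashMap<>();
    byte[] segment = "segment-bytes".getBytes(StandardCharsets.UTF_8);
    String dir = "/druid/segments/wiki/2019-04-23";
    // Two replica tasks pushing the same segment land on different paths.
    String a = push(fs, dir, 0, segment);
    String b = push(fs, dir, 0, segment);
    System.out.println(a.equals(b)); // false
    System.out.println(fs.size());   // 2
  }
}
```

In this model, both replicas' files end up in storage; it is the uniqueness of the path, not a rename, that keeps them from clobbering each other.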

Motivation

When the underlying file system is S3AFileSystem, which does not support rename, a copy is performed instead; this copy can be avoided when useUniquePath is set to true.
Link to the S3A rename implementation: https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1070
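To illustrate why the copy matters, here is a toy model of an object store with no native rename, where rename is emulated as copy plus delete. `FakeObjectStore` and its `bytesWritten` counter are hypothetical; the real S3AFileSystem rename is more involved, and this only models the extra bytes written.

```java
import java.util.HashMap;
import java.util.Map;

public class ObjectStoreRenameCost {

  /** Hypothetical object store without rename; counts every byte it has to write. */
  static final class FakeObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();
    long bytesWritten = 0; // total bytes written, including copies made by rename

    void put(String key, byte[] data) {
      objects.put(key, data);
      bytesWritten += data.length;
    }

    /** Emulated rename: copy the whole object to the new key, then delete the old key. */
    void rename(String src, String dst) {
      byte[] data = objects.get(src);
      if (data == null) {
        throw new IllegalArgumentException("no such key: " + src);
      }
      put(dst, data);       // the copy: every byte is written a second time
      objects.remove(src);  // then the original is deleted
    }
  }

  /** Temp-then-rename push; returns bytes written by the store. */
  static long pushViaRename(int segmentSize) {
    FakeObjectStore store = new FakeObjectStore();
    store.put("tmp/index.zip", new byte[segmentSize]);
    store.rename("tmp/index.zip", "final/index.zip");
    return store.bytesWritten;
  }

  /** Direct push to a unique final key; returns bytes written by the store. */
  static long pushDirect(int segmentSize) {
    FakeObjectStore store = new FakeObjectStore();
    store.put("final/0_uuid_index.zip", new byte[segmentSize]);
    return store.bytesWritten;
  }

  public static void main(String[] args) {
    int size = 1024 * 1024; // a 1 MiB segment
    System.out.println(pushViaRename(size)); // 2097152
    System.out.println(pushDirect(size));    // 1048576
  }
}
```

In this model the temp-then-rename approach writes every segment byte twice (upload plus copy), while writing directly to a unique key writes it once.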

gianm · 2019-04-23T22:11:32Z

👍 sounds like a solid improvement.

nishantmonu51 · 2019-04-24T06:09:19Z

Thanks @gianm. I looked at the code, and it seems we can avoid the copy in all cases and rely on the useUniquePath flag. I have created #7537.
Please review.

Fixes apache#7532. A push to a temporary location followed by a copy is not required, since we can safely rely on the useUniquePath flag to avoid path conflicts between multiple tasks.

nishantmonu51 added the Feature/Change Description, Area - Deep Storage, and Improvement labels on Apr 23, 2019

nishantmonu51 mentioned this issue on Apr 24, 2019: Avoid file rename when pushing segments with HDFSDataPusher #7537 (Closed)
