Currently, HDFSDataSegmentPusher pushes segments to a temporary location and then uses rename to move them to the final location.
This behavior was added to fix a race that occurs when two replica tasks try to write segment files to the same location.
However, when useUniquePath is set to true, a UUID is appended to the file name, so the write to a temporary location is not required.
This change writes directly to the final location and relies on the useUniquePath flag to avoid conflicts.
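To illustrate the idea, here is a minimal sketch that uses plain `java.nio` on a local directory rather than the Hadoop `FileSystem` API. The class `UniquePathSketch`, the helpers `finalFileName` and `pushDirect`, and the file naming pattern are all hypothetical and are not taken from HDFSDataSegmentPusher; the sketch only shows why a UUID in the file name lets each task write straight to its final path without a temporary file and rename.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

public class UniquePathSketch {

    // Hypothetical helper: when useUniquePath is true, a random UUID is
    // embedded in the file name so that two tasks never target the same file.
    static String finalFileName(String baseName, boolean useUniquePath) {
        return useUniquePath
            ? baseName + "_" + UUID.randomUUID() + "_index.zip"
            : baseName + "_index.zip";
    }

    // Hypothetical helper: writes the segment bytes directly to the final
    // location. No temporary file and no rename (or copy) is involved.
    static Path pushDirect(Path storageDir, String baseName, byte[] data) throws IOException {
        Files.createDirectories(storageDir);
        Path target = storageDir.resolve(finalFileName(baseName, true));
        // CREATE_NEW makes the write fail instead of silently overwriting,
        // should the target unexpectedly exist already.
        return Files.write(target, data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segments");
        // Two "replica tasks" push the same logical segment.
        Path first = pushDirect(dir, "0", new byte[] {1, 2, 3});
        Path second = pushDirect(dir, "0", new byte[] {1, 2, 3});
        System.out.println("first:  " + first);
        System.out.println("second: " + second);
    }
}
```

Because each call generates its own UUID, the two writes land on different files even though they describe the same segment, which is the property the change relies on.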
Thanks @gianm. I looked at the code, and it seems that we can avoid the copy in all cases and rely on the useUniquePath flag. I have created #7537.
Please review.
Fixes apache#7532
A push to a temporary location followed by a copy is not required, because we can safely rely on the useUniquePath flag to avoid path conflicts between multiple tasks.
Motivation
When the underlying file system is S3AFileSystem, which does not support rename natively, a copy is performed instead of a rename. This copy can be avoided when useUniquePath is set to true.
Link to S3A rename docs - https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L1070