Move retries into DataSegmentPusher implementations. #15938
Conversation
The individual implementations know better when they should and should not retry. They can also generate better error messages. The inspiration for this patch was a situation where EntityTooLarge was generated by the S3DataSegmentPusher, and retried uselessly by the retry harness in PartialSegmentMergeTask.
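The core idea — let each pusher decide which failures are worth retrying, instead of a one-size-fits-all retry harness — can be sketched as follows. This is a hypothetical illustration, not Druid's actual code; `PusherRetry` and its parameters are invented for the example.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical sketch (not Druid's actual retry harness): each caller
// supplies its own predicate saying which failures are transient, so a
// permanent error like S3's EntityTooLarge fails fast instead of being
// retried uselessly.
public class PusherRetry
{
  public static <T> T call(Callable<T> task, Predicate<Throwable> isTransient, int maxTries) throws Exception
  {
    for (int attempt = 1; ; attempt++) {
      try {
        return task.call();
      }
      catch (Exception e) {
        // Fail fast on permanent errors: retrying cannot succeed, so
        // surface the original error (and its message) immediately.
        if (!isTransient.test(e) || attempt >= maxTries) {
          throw e;
        }
      }
    }
  }
}
```

With this shape, the S3 pusher could classify `EntityTooLarge` as non-transient and produce a useful error right away, while still retrying genuinely transient network failures.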
LGTM 🚀
The Azure client handles retries of transient exceptions. As far as I know, there's no limit on the size of uploads using the BlobClient.upload operation that we use to upload segments, so this wouldn't apply; but if it did, the client should fail fast as expected rather than retrying.
The only error is a CI failure. Anyway, I don't think it's related to this patch, so I'll merge it. Thanks for the reviews.
Ah, I saw this same error in a different PR and thought it was nothing, but looking closer, the real error here is a spellcheck failure:
I guess it's complaining about 5GB, which I hear it would prefer written another way.
Oops, fixed in #16040.
Most network-based deep storage implementations already have logic for retrying, except HDFS, which I added in this patch, and Azure. It looks like the Azure client itself may have some built-in retry stuff, but I'm not totally sure. That one might also need retry wrapping. If anyone has experience with Azure and knows the answer to this, please let me know. For now, I've left it without retry wrapping.
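The retry wrapping mentioned above for HDFS would typically pair a transient-failure check with backoff between attempts. As a rough sketch of the backoff part (this `Backoff` helper is hypothetical, not Druid's actual RetryUtils):

```java
// Illustrative sketch only: capped exponential backoff of the kind a retry
// wrapper for a network-based pusher (such as the HDFS one) might use
// between attempts. Names and parameters are invented for this example.
public class Backoff
{
  public static long delayMillis(int attempt, long baseMillis, long maxMillis)
  {
    // Double the delay on each attempt; clamp the shift so the left-shift
    // cannot overflow for reasonable bases, and cap the result at maxMillis.
    long d = baseMillis << Math.min(attempt, 30);
    return Math.min(d, maxMillis);
  }
}
```

The point of the cap is that repeated transient failures back off quickly at first but never wait longer than the configured maximum.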
Other related changes in this patch include: