Move retries into DataSegmentPusher implementations. #15938

Merged · 6 commits merged into apache:master · Mar 4, 2024

Conversation

@gianm (Contributor) commented on Feb 21, 2024:

The individual implementations know better when they should and should not retry. They can also generate better error messages.

Most network-based deep storage implementations already have logic for retrying, except HDFS, which I added in this patch, and Azure. It looks like the Azure client itself may have some built-in retry stuff, but I'm not totally sure. That one might also need retry wrapping. If anyone has experience with Azure and knows the answer to this, please let me know. For now, I've left it without retry wrapping.

The inspiration for this patch was a situation where EntityTooLarge was generated by the S3DataSegmentPusher, and retried uselessly by the retry harness in PartialSegmentMergeTask. Other related changes in this patch include:

  • Adds an error message that spells out the problem, and a potential fix, when an S3 upload attempt for a segment fails because it exceeds 5GB.
  • Stops unconditionally retrying on non-retryable S3 errors.
  • Updates the documentation to clarify that, in some cases, there is a cap on the maximum segment size.
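
To illustrate the idea, here is a minimal sketch of a pusher-level retry loop that distinguishes retryable transient failures from non-retryable ones such as an oversize upload. This is not the actual Druid code; the class names, exception types, and `pushWithRetries` helper below are hypothetical stand-ins for whatever each DataSegmentPusher implementation uses.

```java
// Illustrative sketch only -- not the actual Druid implementation.
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryingPushExample
{
  // Hypothetical marker exceptions for the two error classes discussed above.
  static class TooLargeException extends IOException {}          // e.g. S3 EntityTooLarge: never retry
  static class TransientStorageException extends IOException {}  // e.g. network hiccup: retry

  static <T> T pushWithRetries(Callable<T> push, int maxTries) throws Exception
  {
    for (int attempt = 1; ; attempt++) {
      try {
        return push.call();
      }
      catch (TooLargeException e) {
        // Non-retryable: fail fast with an actionable message instead of retrying uselessly.
        throw new IOException(
            "Segment exceeds the storage upload limit; reduce segment size "
            + "(e.g. by creating more, smaller partitions) and rerun the task.",
            e
        );
      }
      catch (TransientStorageException e) {
        if (attempt >= maxTries) {
          throw e;
        }
        // Retryable: back off briefly and try again.
        Thread.sleep(1000L * attempt);
      }
    }
  }
}
```

Because the loop lives inside the pusher, each deep storage implementation decides for itself which errors are worth retrying and what message to surface when they are not.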

@kfaraz (Contributor) left a comment:
LGTM 🚀

@georgew5656 (Contributor) commented:

The Azure client handles retries of transient exceptions. As far as I know, there's no size limit for uploads using the BlobClient.upload operation that we use to upload segments, so this wouldn't apply; but if it did, the client should fail fast as expected rather than retrying.
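
For reference, a small sketch of how retry behavior can be configured on the Azure SDK client itself via RequestRetryOptions. This is not necessarily how Druid's Azure pusher builds its client; the connection string, container, and blob names are placeholders.

```java
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobServiceClient;
import com.azure.storage.blob.BlobServiceClientBuilder;
import com.azure.storage.common.policy.RequestRetryOptions;
import com.azure.storage.common.policy.RetryPolicyType;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class AzureRetryExample
{
  public static void main(String[] args)
  {
    // The SDK retries transient failures internally according to these options;
    // non-retryable failures surface to the caller immediately.
    RequestRetryOptions retryOptions = new RequestRetryOptions(
        RetryPolicyType.EXPONENTIAL,
        3,        // max tries
        60,       // per-try timeout, seconds
        null,     // default base delay
        null,     // default max delay
        null      // no secondary host
    );

    BlobServiceClient serviceClient = new BlobServiceClientBuilder()
        .connectionString("<connection-string>")   // placeholder
        .retryOptions(retryOptions)
        .buildClient();

    byte[] data = "example segment bytes".getBytes(StandardCharsets.UTF_8);
    BlobClient blob = serviceClient
        .getBlobContainerClient("segments")        // placeholder container
        .getBlobClient("example/segment.zip");     // placeholder blob name

    // Same upload call the pusher uses; retries of transient errors happen inside the client.
    blob.upload(new ByteArrayInputStream(data), data.length, true);
  }
}
```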

@gianm (Contributor, Author) commented on Mar 4, 2024:

The only error is Invalid docusaurus-theme-mermaid version 2.4.3 in docs as part of Static Checks CI / web-checks (pull_request). I tried restarting the job, but it failed again. Possibly something is wrong with the build cache?

Anyway, I don't think it's related to this patch, so I'll merge it anyway. Thanks for the reviews.

@gianm merged commit 930655f into apache:master on Mar 4, 2024
82 of 83 checks passed
@gianm deleted the push-retries branch on March 4, 2024 at 18:36
@clintropolis (Member) commented:

> The only error is Invalid docusaurus-theme-mermaid version 2.4.3 in docs as part of Static Checks CI / web-checks (pull_request). I tried restarting the job, but it failed again. Possibly something is wrong with the build cache?

Ah, I saw this same error in a different PR and thought it was nothing, but looking closer, the real error here is a spellcheck failure:

> spellcheck
> mdspell --en-us --ignore-numbers --report '../docs/**/*.md' || (./script/notify-spellcheck-issues && false)

    ../docs/operations/segment-optimization.md
       54 | age imposes an upper limit of 5GB. 

>> 1 spelling error found in 251 files

I guess it's complaining about 5GB, which I hear it would prefer as 5 GB.

@gianm (Contributor, Author) commented on Mar 5, 2024:

> I guess it's complaining about 5GB, which I hear it would prefer as 5 GB.

Oops, fixed in #16040.

@adarshsanjeev added this to the 30.0.0 milestone on May 6, 2024.