Replace S3Boto3Storage._compress_content with a streaming implementation #1061

Merged 1 commit on Oct 7, 2021

Conversation

vainu-arto
Contributor

The original version reads the entire content in one go, compresses it,
and places the result in an in-memory buffer. This limits both the size of
the file that can be saved with compression and the performance of the
transfer.
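
To illustrate the idea, here is a minimal sketch of a streaming approach. It is not the code from this PR: the class name `GzipCompressingReader` and the `chunk_size` parameter are hypothetical, and it assumes the source is any object with a `read(n)` method that returns bytes. It wraps the source and compresses lazily with `zlib.compressobj`, so only about one chunk of input plus the pending compressed output is held in memory at a time.

```python
import gzip
import io
import zlib


class GzipCompressingReader:
    """Hypothetical file-like wrapper that gzip-compresses `source` on demand."""

    def __init__(self, source, chunk_size=64 * 1024):
        self._source = source
        self._chunk_size = chunk_size
        # wbits=31 (16 + 15) makes zlib emit a gzip header and trailer itself.
        self._compressor = zlib.compressobj(wbits=31)
        self._buffer = b""
        self._finished = False

    def read(self, size=-1):
        if size is None:
            size = -1
        # Pull input chunks until enough compressed bytes are buffered
        # (or until the source is exhausted).
        while not self._finished and (size < 0 or len(self._buffer) < size):
            chunk = self._source.read(self._chunk_size)
            if chunk:
                self._buffer += self._compressor.compress(chunk)
            else:
                # End of input: flush the remaining data and the gzip trailer.
                self._buffer += self._compressor.flush()
                self._finished = True
        if size < 0:
            size = len(self._buffer)
        out, self._buffer = self._buffer[:size], self._buffer[size:]
        return out


# Usage: read the compressed stream in small pieces, then verify it.
original = b"example payload " * 10000
reader = GzipCompressingReader(io.BytesIO(original), chunk_size=4096)
pieces = []
while True:
    piece = reader.read(1024)
    if not piece:
        break
    pieces.append(piece)
assert gzip.decompress(b"".join(pieces)) == original
```

A consumer that uploads in parts can call `read(n)` repeatedly and never needs the whole compressed file in memory.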

@vainu-arto vainu-arto force-pushed the streaming-compression branch 2 times, most recently from 34e5c95 to 6f273e9 on September 21, 2021 12:45
@jschneier
Owner

This handles setting mtime=0, right?

@vainu-arto
Contributor Author

Yes, mtime in the header will always be zero. As far as I understand, it isn't actually possible to set it to any other value when zlib generates the header itself.
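
A quick way to check the mtime claim, as a sketch: it assumes the header is produced by `zlib.compressobj(wbits=31)` (the PR's actual call may differ) and reads the first eight bytes according to the gzip header layout in RFC 1952.

```python
import struct
import zlib

compressor = zlib.compressobj(wbits=31)  # 31 = 16 + 15: gzip container
blob = compressor.compress(b"hello") + compressor.flush()

# gzip header (RFC 1952): ID1, ID2, CM, FLG, then a 4-byte little-endian MTIME.
magic, method, flags, mtime = struct.unpack("<2sBBI", blob[:8])

assert magic == b"\x1f\x8b"  # gzip magic bytes
assert method == 8           # CM = 8 means deflate
assert mtime == 0            # zlib's default header carries no timestamp
```

With a zero MTIME, compressing the same content twice with the same zlib build and settings produces identical bytes.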

@jschneier
Owner

How was this tested?

@vainu-arto
Contributor Author

I ran random sets of data through it and through the stdlib GzipFile, decompressed the results (with GzipFile and the system gzip), and compared the outputs with each other and with the original. I couldn't find a way to make it produce output that could not be decompressed or that differed from the original.
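
A round-trip check along those lines could look roughly like the sketch below. It is a reconstruction, not the script that was actually run: `compress_in_chunks` and `roundtrip_ok` are hypothetical helpers, the chunked `zlib` path stands in for the streaming implementation, and the comparison against the system `gzip` binary is left out.

```python
import gzip
import io
import os
import random
import zlib


def compress_in_chunks(data, chunk_size):
    """Stand-in for the streaming path: feed zlib one chunk at a time."""
    compressor = zlib.compressobj(wbits=31)  # 31 = gzip container
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(compressor.compress(data[start:start + chunk_size]))
    parts.append(compressor.flush())
    return b"".join(parts)


def roundtrip_ok(data, chunk_size):
    """Decompress the streamed output and a GzipFile reference; compare both to the original."""
    streamed = compress_in_chunks(data, chunk_size)
    reference = io.BytesIO()
    with gzip.GzipFile(fileobj=reference, mode="wb", mtime=0) as handle:
        handle.write(data)
    from_streamed = gzip.GzipFile(fileobj=io.BytesIO(streamed)).read()
    from_reference = gzip.GzipFile(fileobj=io.BytesIO(reference.getvalue())).read()
    return from_streamed == from_reference == data


# Random payload sizes and chunk sizes, including the empty payload case.
for _ in range(20):
    payload = os.urandom(random.randrange(0, 1 << 16))
    assert roundtrip_ok(payload, chunk_size=random.randrange(1, 8192))
```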

@jschneier
Owner

Thanks

@jschneier jschneier merged commit 544a9f9 into jschneier:master Oct 7, 2021
@vainu-arto vainu-arto deleted the streaming-compression branch October 8, 2021 04:08
mlazowik pushed a commit to qedsoftware/django-storages that referenced this pull request Mar 9, 2022
…#1061)
