Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decompress files in parallel #337

Merged
merged 1 commit into from
Jan 6, 2023
Merged

Decompress files in parallel #337

merged 1 commit into from
Jan 6, 2023

Conversation

vrmiguel
Copy link
Member

@vrmiguel vrmiguel commented Jan 6, 2023

Uses rayon to call decompress_file in parallel.

The gain in performance occurs only if more than one file/archive are being decompressed at the same time, e.g. ouch d videos.tar.gz photos.tar.bz2. If ouch takes 3 secs to decompress videos.tar.gz and 4 secs to decompress photos.tar.bz2 it's safe to assume it'd take the previous single-threaded code 7 secs to decompress both since the operations take place in sequence.

Using rayon it's generally safe to assume the runtime is clamped by the time it takes to decompress the biggest file, e.g. it'd multithreaded ouch 4 secs to decompress both videos.tar.gz and photos.tar.bz2

I guess it's not a very common usecase to decompress many archives at once but I don't think this feature costs much. The binary size, at least, did not seem to increase

As a next step we can try using gzp by @sstadick to use Writers that work in parallel

Copy link
Member

@figsoda figsoda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lgtm, but we might want to keep #244 open

@figsoda figsoda merged commit 986a6f6 into main Jan 6, 2023
@figsoda figsoda deleted the decompress-with-rayon branch January 6, 2023 01:20
@marcospb19
Copy link
Member

I'm afraid that one of our dependencies can't handle this, idk, previously we had some problems with parallel tests.

Feeling like we should have some tests, @vrmiguel what do you think about re-adding the Python integration tests? We can choose to rewrite it in Rust later, but that's not necessary, I'm just a little annoyed that I don't trust Ouch itself because of the lack of tests.

For example, we don't have any test that does compression of multiple archives at once to make sure this is working, and nobody breaks it in the future.

@marcospb19 marcospb19 mentioned this pull request Jan 11, 2023
2 tasks
@figsoda
Copy link
Member

figsoda commented Jan 31, 2023

we can try using gzp

#348

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants