clp-s: Correctly report uncompressed size of archives during archive-splitting (fixes #469). #463

Merged: 1 commit, Jul 3, 2024

gibber9809
Contributor

Description

This PR fixes a bug in archive splitting: if an archive is split partway through parsing a buffer of JSON objects, the entire buffer is attributed to the uncompressed size of the first archive instead of being divided between the archives before and after the split. We fix this by adding a new function to JsonFileIterator that reports the total number of bytes consumed by the caller, rather than the total number of bytes read by JsonFileIterator, which is what we used before.
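
To illustrate the difference between the two counters, here is a minimal, self-contained sketch. It is a toy model and not the clp-s code: `BufferedRecordReader`, `get_num_bytes_read`, `get_num_bytes_consumed`, `Counter`, and `archive_sizes` are hypothetical names, records are assumed to be newline-delimited, and archives are split after a fixed number of records purely to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Toy stand-in for a buffered JSON reader (hypothetical, not the clp-s class).
// It pulls the input in fixed-size buffers and hands out one newline-delimited
// record at a time. The caller must keep `input` alive while the reader is used.
class BufferedRecordReader {
public:
    BufferedRecordReader(std::string_view input, std::size_t buffer_size)
            : m_input{input},
              m_buffer_size{buffer_size} {
        assert(buffer_size > 0);  // a zero-sized buffer could never make progress
    }

    // Returns false once every record has been handed out.
    bool next(std::string_view& record) {
        if (m_bytes_consumed >= m_input.size()) {
            return false;
        }
        std::size_t end = m_input.find('\n', m_bytes_consumed);
        end = (std::string_view::npos == end) ? m_input.size() : end + 1;
        // Refill: read whole buffers until the record is fully buffered.
        while (m_bytes_read < end) {
            m_bytes_read = std::min(m_bytes_read + m_buffer_size, m_input.size());
        }
        record = m_input.substr(m_bytes_consumed, end - m_bytes_consumed);
        m_bytes_consumed = end;
        return true;
    }

    // Bytes pulled into buffers so far (may run ahead of the caller).
    std::size_t get_num_bytes_read() const { return m_bytes_read; }

    // Bytes of records actually returned to the caller so far.
    std::size_t get_num_bytes_consumed() const { return m_bytes_consumed; }

private:
    std::string_view m_input;
    std::size_t m_buffer_size;
    std::size_t m_bytes_read{0};
    std::size_t m_bytes_consumed{0};
};

enum class Counter { BytesRead, BytesConsumed };

// Splits the input into archives of `records_per_archive` records each and
// returns the uncompressed size attributed to each archive, measured with the
// chosen counter.
std::vector<std::size_t> archive_sizes(
        std::string const& input,
        std::size_t buffer_size,
        std::size_t records_per_archive,
        Counter counter
) {
    BufferedRecordReader reader{input, buffer_size};
    auto position = [&] {
        return Counter::BytesRead == counter ? reader.get_num_bytes_read()
                                             : reader.get_num_bytes_consumed();
    };

    std::vector<std::size_t> sizes;
    std::size_t last_split_pos{0};
    std::size_t records_in_archive{0};
    std::string_view record;
    while (reader.next(record)) {
        ++records_in_archive;
        if (records_in_archive == records_per_archive) {
            // Split: attribute everything since the previous split to this archive.
            sizes.push_back(position() - last_split_pos);
            last_split_pos = position();
            records_in_archive = 0;
        }
    }
    if (records_in_archive > 0) {
        sizes.push_back(position() - last_split_pos);  // final, partial archive
    }
    return sizes;
}
```

In this model, when a single buffer holds several records and a split happens partway through it, measuring with `Counter::BytesRead` charges the whole buffer to the archive that was open when the buffer was read, while `Counter::BytesConsumed` charges each archive only for the records it actually received.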

Validation performed

  • Validated that, when a split occurs partway through a buffer of JSON objects, each archive is attributed its correct share of that buffer
  • Validated that the uncompressed sizes of all archives sum to the total file size

Contributor

@wraymo wraymo left a comment

Do we want to create an issue for it? And can we use an affirmative title like "Correctly report..."?

@gibber9809 changed the title from "clp-s: Fix bug where reported uncompressed size for an archive can be incorrect" to "clp-s: Correctly report uncompressed size of archives during archive-splitting (fixes #469)." on Jul 3, 2024
@gibber9809 merged commit 3c1f0ad into y-scope:main on Jul 3, 2024
11 checks passed
Development

Successfully merging this pull request may close these issues.

clp-s: Reported uncompressed size can be incorrect after archive-splitting
2 participants