[BUG] The coalesced AVRO file may contain different sync markers if the sync marker varies in the avro files being coalesced. #5312

Closed
firestarman opened this issue Apr 26, 2022 · 2 comments · Fixed by #5428
Assignees
Labels
bug Something isn't working

Comments

@firestarman
Collaborator

firestarman commented Apr 26, 2022

Describe the bug
The coalescing AVRO reader, introduced in #5306, produces one large AVRO file that contains different sync markers when the sync markers differ across the original files. A single AVRO file, however, should use the same sync marker for every block.

This bug exists because the reader simply concatenates the blocks from the different files. We chose this approach because it keeps the initial implementation simple, and because the cuDF AVRO reader currently ignores the sync markers.

Expected behavior
We should write one consistent sync marker into the coalesced AVRO file, regardless of whether the original files use different sync markers.
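
To illustrate the expected behavior, here is a minimal Python sketch (not the plugin's actual implementation) that re-serializes already-parsed data blocks with one shared sync marker. It relies on the Avro object container layout, where each block is a record count and a byte size (both Avro longs) followed by the serialized records and a 16-byte sync marker. The names `encode_long` and `coalesce_blocks` and the `(count, payload, original_sync)` tuple shape are hypothetical. The sketch also assumes that all blocks share the same schema and codec, and it omits the file header, which would have to carry the same marker.

```python
from typing import Iterable, Tuple

SYNC_SIZE = 16  # Avro sync markers are 16 bytes long

# Hypothetical block shape: (record count, serialized records, original sync marker)
Block = Tuple[int, bytes, bytes]


def encode_long(n: int) -> bytes:
    """Encode an integer as an Avro long (zigzag, then variable-length)."""
    z = (n << 1) ^ (n >> 63)  # zigzag mapping for values in the 64-bit range
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def coalesce_blocks(blocks: Iterable[Block], sync: bytes) -> bytes:
    """Serialize blocks back to back, ending each one with the same sync marker.

    The original per-file sync marker of each block is dropped on purpose.
    """
    if len(sync) != SYNC_SIZE:
        raise ValueError(f"sync marker must be {SYNC_SIZE} bytes")
    out = bytearray()
    for count, payload, _original_sync in blocks:
        out += encode_long(count)         # number of records in the block
        out += encode_long(len(payload))  # size of the serialized records
        out += payload
        out += sync                       # shared marker replaces the original
    return bytes(out)


# Two blocks that came from files with different sync markers.
blocks = [(1, b"abc", b"A" * SYNC_SIZE), (2, b"de", b"B" * SYNC_SIZE)]
coalesced = coalesce_blocks(blocks, b"S" * SYNC_SIZE)
```

Choosing the marker of the first input file, or generating a fresh random one, would both satisfy the requirement. What matters is that the header and every block of the coalesced file agree.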

@firestarman firestarman added bug Something isn't working ? - Needs Triage Need team to review and classify labels Apr 26, 2022
@firestarman firestarman self-assigned this Apr 26, 2022
@sameerz sameerz removed the ? - Needs Triage Need team to review and classify label Apr 26, 2022
@sameerz sameerz modified the milestone: Apr 18 - Apr 29 Apr 26, 2022
@tgravescs
Collaborator

so to clarify this isn't really a problem right now because CUDF ignores the sync markers right now, but we should clean it up in case they do, correct?

@firestarman
Collaborator Author

so to clarify this isn't really a problem right now because CUDF ignores the sync markers right now, but we should clean it up in case they do, correct?

That's right.

@sameerz sameerz changed the title from "[BUG]The coalesced AVRO file may contain different sync markers if the sync marker varies in the avro files being coalesced." to "[BUG] The coalesced AVRO file may contain different sync markers if the sync marker varies in the avro files being coalesced." Jun 6, 2022