Use IPC StreamWriter / StreamReader rather than writing to old Arrow file format #577

Closed
Dandandan opened this issue Dec 22, 2022 · 1 comment
Labels: enhancement, performance

Comments

Dandandan (Contributor) commented Dec 22, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Currently we write to the (deprecated) Arrow file format, but when sending the data to workers we decode it to RecordBatch and then re-encode it in the IPC format.

The benefit of writing to the IPC format directly is that we can stream the data from disk without having to decode and re-encode it. We would also only have to compress the data once (and would benefit from smaller files too).

Describe the solution you'd like
Use the Arrow IPC stream format rather than the (old) file format.

Describe alternatives you've considered

Additional context

Dandandan added the enhancement and performance labels Dec 22, 2022
Dandandan changed the title from "Use IPC StreamWriter / StreamReader rather than writing to Arrow files" to "Use IPC StreamWriter / StreamReader rather than writing to old Arrow file format" Dec 22, 2022
Jefffrey (Contributor) commented Apr 5, 2024

Looks to have been resolved by #943 (and duplicated by #942)

Jefffrey closed this as not planned Apr 5, 2024