scan_parquet from io.BytesIO() #10413

Open
s-b90 opened this issue Aug 10, 2023 · 9 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@s-b90

s-b90 commented Aug 10, 2023

Problem description

Add the ability to accept io.BytesIO() as the source parameter for scan_parquet. For now, it accepts only a path to one or more files.
This feature would be useful when a program receives Parquet data through a REST API or a socket, directly into memory.

s-b90 added the enhancement label Aug 10, 2023
@ritchie46
Member

ritchie46 commented Aug 11, 2023

I am pretty sure that this is a duplicate. 🤔

@s-b90
Author

s-b90 commented Aug 11, 2023

True, I'm sorry. I've found some related issues: #4950 and #9511. They are all about scan_csv, but you can definitely close this as a duplicate. Just don't forget about Parquet too :)

@Object905
Contributor

Object905 commented Aug 11, 2023

It would also be great if the scan_* and read_* functions had a unified input "type" for files, bytes, etc.
It would also be nice if they accepted a list of BytesIO or path-like objects, so they could be processed in parallel, as with a glob pattern.

@adamgreg
Contributor

My application has Parquet embedded as BLOBs in SQL tables, and processes and combines them lazily. I would love to see support for this - at the moment I have to use read_parquet() and miss out on pushdown optimisations.

@aberres
Contributor

aberres commented Jan 10, 2024

A similar use case here. We have a bunch of Parquet files in memory I want to work with, without having all of them in memory at the same time.

@shoz

shoz commented Jan 24, 2024

I would be very happy with this improvement. I have about a million Parquet files stored as binaries in Redis, and I want to read them as LazyFrames to save memory.

@HWiese1980

This is still open. Has there been any progress? I need this functionality too.

@cmdlineluser
Contributor

@HWiese1980 Coincidentally, it was added on main a few hours ago #18532

@HWiese1980

Hah! That's quite the timing! :-D Thanks!


8 participants