Support datashader.spatial.points.to_parquet with pyarrow #67

Open
jonmmease opened this issue Feb 15, 2019 · 0 comments

PR holoviz/datashader#702 introduced support for spatially indexing Dask dataframes and writing them out as Parquet files with custom spatial metadata, using the datashader.spatial.points.to_parquet function.

To accomplish this, the Parquet file is first written out using Dask's dask.dataframe.io.to_parquet function. The file is then opened directly with fastparquet: the existing Parquet metadata is retrieved, the spatial metadata is added to it, and the updated metadata is written back to the file.

In order to support the creation of spatially partitioned Parquet files using pyarrow (rather than fastparquet), we would need to work out a similar approach to adding properties to the Parquet metadata using the pyarrow Parquet API.

@jbednar jbednar transferred this issue from holoviz/datashader May 26, 2021