Currently the readers (Parquet, for example) perform badly when there are many small files, and much better with a small number of large files. We should improve the performance of reading small files.
Note that one issue with this is that Spark has a feature that lets you ask for the name of the file you are reading. Perhaps we can recognize when that feature is being used.
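To make the file-name concern concrete, here is a minimal sketch of one possible approach: grouping small files into larger read batches, and falling back to one file per batch when the query asks for the file name. This is an illustration only, not how the RAPIDS Accelerator reads files. The function `plan_read_batches`, the `needs_file_name` flag, and the `target_bytes` threshold are all hypothetical names chosen for this example.

```python
from typing import List, Tuple

# (path, size in bytes); a hypothetical stand-in for real file metadata.
FileInfo = Tuple[str, int]


def plan_read_batches(
    files: List[FileInfo],
    target_bytes: int,
    needs_file_name: bool = False,
) -> List[List[FileInfo]]:
    """Group files, in order, into batches of roughly target_bytes each.

    Assumption for this sketch: if the query needs the per-row file name,
    files are not combined, so each batch maps to exactly one file.
    """
    if needs_file_name:
        return [[f] for f in files]

    batches: List[List[FileInfo]] = []
    current: List[FileInfo] = []
    current_bytes = 0
    for path, size in files:
        # Close the current batch if adding this file would exceed the target.
        if current and current_bytes + size > target_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append((path, size))
        current_bytes += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    small_files = [("a.parquet", 40), ("b.parquet", 40), ("c.parquet", 40)]
    print(plan_read_batches(small_files, target_bytes=100))
    print(plan_read_batches(small_files, target_bytes=100, needs_file_name=True))
```

A file larger than `target_bytes` simply ends up in a batch of its own. The single-file fallback is the simplest way to keep the file name correct; a real reader could instead keep a per-row mapping back to the source file.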
tgravescs changed the title from "[FEA] Better handling of lots of small files" to "[FEA] Better handling of reading lots of small files" on Jul 8, 2020
tgravescs changed the title from "[FEA] Better handling of reading lots of small files" to "[FEA] Better handling of reading lots of small Parquet files" on Aug 20, 2020