[FEA] Add python bindings in the parquet reader for num_rows/skiprows #15144
Labels
0 - Backlog: In queue waiting for assignment
cuIO: cuIO issue
feature request: New feature or request
good first issue: Good for newcomers
libcudf: Affects libcudf (C++/CUDA) code.
Python: Affects Python cuDF API.
Is your feature request related to a problem? Please describe.
Unfortunately there has been churn in libcudf around support for num_rows/skiprows in the Parquet and ORC readers. In 22.08 we deprecated these parameters in the parquet reader (#11218) and then in 22.10 we removed them from C++ (#11503) and python (#11480). We also deprecated num_rows/skiprows in the ORC reader (#11522, see issue #11519).

At this point, we realized that chunked parquet reading (#11867) would require adding num_rows/skiprows back to the C++ implementation (#11657).

Let's stabilize row selection APIs in libcudf by completing these tasks:
num_rows/skiprows
num_rows/skiprows ([REVIEW] Deprecate skiprows and num_rows in read_orc #11522)

Additional context
We also dropped num_rows/skiprows support in the cuDF-python fuzz tests (#11505). My preference is to not include any python fuzz testing changes in the scope of this issue.
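
For anyone picking this up, here is a rough sketch of the row selection semantics being discussed. It assumes that skiprows drops that many leading rows and that num_rows caps how many rows are read after the skip. The helper name select_row_range is hypothetical and is not part of cuDF or libcudf; the real bindings may handle edge cases differently.

```python
from typing import Optional, Tuple


def select_row_range(
    total_rows: int, skiprows: int = 0, num_rows: Optional[int] = None
) -> Tuple[int, int]:
    """Return the half-open [start, stop) row range picked by skiprows/num_rows.

    Hypothetical helper for illustration only; not a cuDF or libcudf API.
    Assumes skiprows is applied first and num_rows limits what remains.
    """
    if skiprows < 0 or (num_rows is not None and num_rows < 0):
        raise ValueError("skiprows and num_rows must be non-negative")
    # Skipping past the end of the file selects nothing rather than failing.
    start = min(skiprows, total_rows)
    # num_rows=None means "read everything after the skipped rows".
    stop = total_rows if num_rows is None else min(start + num_rows, total_rows)
    return start, stop


# Stand-in for a 10-row table: skip the first 3 rows, then read 4.
rows = list(range(10))
start, stop = select_row_range(len(rows), skiprows=3, num_rows=4)
selected = rows[start:stop]
```

Clamping to the table length is a design choice made for this sketch; an actual reader could just as reasonably raise an error when skiprows exceeds the row count.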