WIP: Add support for reading structs in GpuJsonScan #10325

Closed
wants to merge 1 commit into from

andygrove
Contributor

@andygrove andygrove commented Jan 30, 2024

Closes #10241

Creating this draft PR for discussion.

Pros:

  • Allows us to read JSON files containing structs on the GPU, making GpuJsonScan consistent with GpuJsonToStruct, which already supports this.

Cons:

  • cuDF infers the types of fields in nested structs. This differs from how we read top-level primitive fields, where we tell cuDF to read them as strings and then cast to the specific type in the plugin. To make this consistent, we will need a way to specify struct schemas in cuDF JNI (rather than just specifying DType.STRUCT).
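
To make the last point concrete, here is a minimal sketch of what a nested read schema could look like: every primitive leaf is requested as a string (so the plugin can cast it afterwards, as it does for top-level fields), while the struct nesting is preserved. `SparkField` and `ReadSchema` are hypothetical classes invented for this illustration; they are not part of cuDF JNI or the plugin, and the column name `person` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a Spark field: a primitive type name, or "struct" with children. */
final class SparkField {
    final String name;
    final String type;                 // e.g. "string", "int", "struct"
    final List<SparkField> children;   // empty for primitive fields

    SparkField(String name, String type, SparkField... children) {
        this.name = name;
        this.type = type;
        this.children = Arrays.asList(children);
    }
}

/** Hypothetical nested read schema. This is NOT an existing cuDF JNI class. */
final class ReadSchema {
    final String name;
    final String dtype;                // "STRING" for every leaf, "STRUCT" for nesting
    final List<ReadSchema> children;

    ReadSchema(String name, String dtype, List<ReadSchema> children) {
        this.name = name;
        this.dtype = dtype;
        this.children = children;
    }

    /** Map a Spark field to a read schema: leaves become STRING, structs keep their shape. */
    static ReadSchema from(SparkField field) {
        if (!field.type.equals("struct")) {
            // Primitive leaf: ask for a string and leave the cast to the plugin.
            return new ReadSchema(field.name, "STRING", new ArrayList<>());
        }
        List<ReadSchema> kids = new ArrayList<>();
        for (SparkField child : field.children) {
            kids.add(from(child));
        }
        return new ReadSchema(field.name, "STRUCT", kids);
    }

    @Override
    public String toString() {
        if (!dtype.equals("STRUCT")) {
            return name + ":" + dtype;
        }
        StringBuilder sb = new StringBuilder(name).append(":STRUCT<");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(children.get(i));
        }
        return sb.append(">").toString();
    }
}
```

For a struct column with a string `name` field and an integer `age` field, `ReadSchema.from(...)` yields `person:STRUCT<name:STRING,age:STRING>`, which is the information a bare DType.STRUCT cannot carry.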

Status:

Some of the tests are failing with errors such as:

Type conversion is not allowed from Table{columns=[ColumnVector{rows=3, type=STRING, nullCount=Optional.empty, offHeap=(ID: 77294 7f558d500920)}, ColumnVector{rows=3, type=STRUCT, nullCount=Optional.empty, offHeap=(ID: 77295 7f558d60e070)}], cudfTable=140005399531232, rows=3} to [StringType, StructType(StructField(name,StringType,true),StructField(age,IntegerType,true))] columns 0 to 2

@andygrove andygrove self-assigned this Jan 30, 2024
Signed-off-by: Andy Grove <andygrove@nvidia.com>

Successfully merging this pull request may close these issues.

[FEA] Add support for reading nested JSON in GpuJsonScan