Is your feature request related to a problem? Please describe.
In the Spark RAPIDS plugin, we typically want to read JSON primitives as strings and then cast to the required type in the plugin, to ensure compatibility with Spark.
This works for top-level primitives in a JSON file. However, there doesn't seem to be a way to specify the data types of fields within a struct.
Here is an example input file where I would like to read fields b and c as strings.
cuDF has inferred the type of column b as int64, and there seems to be no way for me to specify that it should be read as a string instead.
Describe the solution you'd like
There are two possible solutions:
1. Add the ability to fully specify struct types.
2. Add an option for reading structs as unparsed strings and then parsing the JSON strings in the plugin. This would be similar to the recently added support for reading mixed types as strings. The API for this could be one of the following:
   - Specify the type STRING rather than STRUCT for the column
   - Add a new JSON reader option structs_as_strings
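Option 2 amounts to splitting each record into fields while leaving nested values untouched. The Java sketch below illustrates those semantics outside of cuDF; the class `RawJsonFields` and its `split` method are hypothetical and not part of cuDF or the plugin. It assumes one well-formed JSON object per string and tracks nesting depth and string literals so that commas or braces inside a struct do not end the field early.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper, not part of cuDF or spark-rapids: splits one JSON object
 * into its top-level fields and keeps each value as unparsed JSON text, so a
 * nested struct such as {"b": 1, "c": "x"} is returned verbatim as a string.
 * Assumes a single, well-formed JSON object per input string.
 */
public final class RawJsonFields {

    public static Map<String, String> split(String json) {
        Map<String, String> fields = new LinkedHashMap<>();
        int i = skipWs(json, 0);
        expect(json, i, '{');
        i = skipWs(json, i + 1);
        if (peek(json, i) == '}') {
            return fields; // empty object
        }
        while (true) {
            // Key: a quoted string; escape sequences are kept as written.
            expect(json, i, '"');
            int keyEnd = endOfString(json, i);
            String key = json.substring(i + 1, keyEnd);
            i = skipWs(json, keyEnd + 1);
            expect(json, i, ':');
            i = skipWs(json, i + 1);

            // Value: scan to the next ',' or closing bracket at depth 0, skipping
            // over string literals so separators inside nested values are ignored.
            int start = i;
            int depth = 0;
            while (i < json.length()) {
                char c = json.charAt(i);
                if (c == '"') {
                    i = endOfString(json, i) + 1;
                    continue;
                }
                if (c == '{' || c == '[') {
                    depth++;
                } else if (c == '}' || c == ']') {
                    if (depth == 0) {
                        break; // closing brace of the outer object
                    }
                    depth--;
                } else if (c == ',' && depth == 0) {
                    break;
                }
                i++;
            }
            fields.put(key, json.substring(start, i).trim());
            if (peek(json, i) == '}') {
                return fields;
            }
            expect(json, i, ',');
            i = skipWs(json, i + 1);
        }
    }

    /** Returns the index of the closing quote of the string opened at {@code open}. */
    private static int endOfString(String s, int open) {
        for (int j = open + 1; j < s.length(); j++) {
            char c = s.charAt(j);
            if (c == '\\') {
                j++; // skip the escaped character
            } else if (c == '"') {
                return j;
            }
        }
        throw new IllegalArgumentException("unterminated string at " + open);
    }

    private static void expect(String s, int i, char c) {
        if (peek(s, i) != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + i);
        }
    }

    private static char peek(String s, int i) {
        return i < s.length() ? s.charAt(i) : '\0';
    }

    private static int skipWs(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        String line = "{\"a\": {\"b\": 1, \"c\": \"x,y\"}, \"d\": [1, {\"e\": 2}]}";
        Map<String, String> fields = split(line);
        System.out.println(fields.get("a")); // {"b": 1, "c": "x,y"}
        System.out.println(fields.get("d")); // [1, {"e": 2}]
    }
}
```

In cuDF such an option would be implemented inside the GPU reader; the sketch only shows the output a structs_as_strings-style option might produce, which the plugin could then parse with Spark-compatible rules.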
Describe alternatives you've considered
Additional context
Specifying nested data types is already supported in libcudf's json_reader_options. In JNI, however, the types are exposed only as a flat array of dtypes (jintArray j_types).
The JNI interface should be updated to allow nested column specifications (cudf::io::schema_element).
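A schema_element-style description is a tree: each column has a type and, for structs, named children. A flat jintArray of type ids cannot express that tree, so one possible encoding (an assumption, not the actual cuDF JNI interface) is to flatten it in pre-order into parallel arrays of names, type ids, and child counts. The `NestedSchema`, `Node`, and `Kind` names below are hypothetical, as is the struct column `a` holding the fields b and c from the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical Java mirror of a nested column schema, loosely modeled on
 * libcudf's cudf::io::schema_element (a type plus named child types).
 * This is NOT the actual cuDF Java API.
 */
public final class NestedSchema {

    /** Hypothetical subset of type ids; real code would use cuDF's DType ids. */
    public enum Kind { STRING, INT64, STRUCT }

    public static final class Node {
        public final Kind kind;
        public final Map<String, Node> children = new LinkedHashMap<>(); // keeps field order

        public Node(Kind kind) {
            this.kind = kind;
        }

        /** Adds a named child field and returns this node, for chaining. */
        public Node child(String name, Node node) {
            children.put(name, node);
            return this;
        }
    }

    /** Parallel arrays that could cross JNI as String[] / int[] / int[]. */
    public static final class Flattened {
        public final List<String> names = new ArrayList<>();
        public final List<Integer> typeIds = new ArrayList<>();
        public final List<Integer> numChildren = new ArrayList<>();
    }

    /** Flattens top-level columns and all of their descendants in pre-order. */
    public static Flattened flatten(Map<String, Node> columns) {
        Flattened out = new Flattened();
        for (Map.Entry<String, Node> e : columns.entrySet()) {
            visit(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    private static void visit(String name, Node node, Flattened out) {
        out.names.add(name);
        out.typeIds.add(node.kind.ordinal());
        out.numChildren.add(node.children.size());
        for (Map.Entry<String, Node> e : node.children.entrySet()) {
            visit(e.getKey(), e.getValue(), out);
        }
    }

    public static void main(String[] args) {
        // Read hypothetical struct column "a" with fields "b" and "c" forced to STRING.
        Map<String, Node> schema = new LinkedHashMap<>();
        schema.put("a", new Node(Kind.STRUCT)
                .child("b", new Node(Kind.STRING))
                .child("c", new Node(Kind.STRING)));
        Flattened f = flatten(schema);
        System.out.println(f.names);       // [a, b, c]
        System.out.println(f.numChildren); // [2, 0, 0]
    }
}
```

Because each entry records how many children follow it, the native side could rebuild the tree by consuming the arrays recursively in the same pre-order and populating the child types of each schema_element.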
andygrove changed the title from "[FEA] Add ability to read JSON structs as strings" to "[FEA] Add ability to read JSON structs as strings, or specify struct schema" on Jan 23, 2024.