Implement all the casting cases that GPU can support for ORC reading. #6149
Note that for the CHAR type, casting to a string requires stripping the trailing whitespace from the value to match the CPU behavior. See #6188 (comment). It would be nice if we could ask libcudf to load the CHAR column with trailing whitespace stripped instead of added, so that we don't have to perform a post-processing step on the CHAR columns.
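To make the post-processing step concrete, here is a minimal sketch in plain Scala of stripping the padding from a single CHAR value. The object and method names are hypothetical, and it assumes the padding consists only of ASCII spaces (U+0020); it is not the plugin's or libcudf's implementation, which would operate on whole columns on the GPU.

```scala
object CharPadding {
  // Hypothetical helper: drop only trailing ASCII spaces.
  // Leading and interior spaces are left untouched.
  def stripTrailingSpaces(value: String): String = {
    var end = value.length
    while (end > 0 && value.charAt(end - 1) == ' ') {
      end -= 1
    }
    value.substring(0, end)
  }
}
```

For example, `CharPadding.stripTrailingSpaces("abc  ")` yields `"abc"`, while `" a b"` is returned unchanged.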
I divide these casts into the following subcategories, according to the source type.
Two special cases:
Whitespace in char/varchar/string values needs attention, as mentioned above.
A Summary of Implementation Details
Casting from Integer Types
Casting from Float types
Casting from string
Casting from Date types (TODO)
However, there are still some issues. For more details, see the comments in #6357. Here is the code branch.
As discussed in apache/orc#1237,
That is, we can replace schema evolution with
For example, if we have an ORC file, it contains one column
# Read `date_str` in type of string, do not use schema evolution
scala> var df = spark.read.schema("date_str string").orc("/tmp/orc/data.orc");
scala> df.show()
+----------+
| date_str|
+----------+
|2002-01-01|
|2022-08-29|
|2022-08-31|
|2022-01-32|
|9808-02-30|
|2022-06-31|
+----------+
# Cast `date_str` to type of `date`, using SQL-CAST
scala> df.registerTempTable("table")
scala> df.sqlContext.sql("select CAST(date_str as date) from table").show()
+----------+
| date_str|
+----------+
|2002-01-01|
|2022-08-29|
|2022-08-31|
| null|
| null|
| null|
+----------+
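The null rows above correspond to strings that do not name a real calendar date. As a rough illustration in plain Scala (not Spark's cast implementation, which accepts more input formats), strict ISO date parsing with `java.time.LocalDate` rejects the same three values. The object and method names are hypothetical.

```scala
import java.time.LocalDate
import scala.util.Try

object StrictDateCast {
  // Hypothetical helper: LocalDate.parse uses ISO_LOCAL_DATE with strict
  // resolution, so impossible dates throw and are mapped to None here.
  def toDateOrNone(s: String): Option[LocalDate] =
    Try(LocalDate.parse(s)).toOption
}
```

With this sketch, `StrictDateCast.toDateOrNone("2022-08-29")` is defined, while `"2022-01-32"`, `"9808-02-30"`, and `"2022-06-31"` all map to `None`, mirroring the nulls in the table.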
There will be more than 100 cases. We may need multiple sub issues for this.