Fall back to CPU for try_cast in Spark 3.4.0 [databricks] #8179
nit: Why use `evalMode` to compute `ansiEnabled` and then turn around and use `ansiEnabled` to compute `evalMode`?
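For readers outside the thread, the round trip being questioned might look like the following. This is a hypothetical sketch only: the names `EvalMode`, `ansiEnabled`, and `evalMode` stand in for the plugin's actual code, which is not reproduced here.

```scala
// Illustrative only: collapsing an eval mode to a boolean and then
// reconstructing the mode from that boolean is a lossy round trip.
object EvalModeRoundTrip {
  sealed trait EvalMode
  case object LegacyMode extends EvalMode
  case object AnsiMode extends EvalMode
  case object TryMode extends EvalMode

  // Derive the flag from the mode...
  def ansiEnabled(mode: EvalMode): Boolean = mode == AnsiMode

  // ...then derive a mode back from the flag, which can never
  // yield TryMode again.
  def evalMode(ansi: Boolean): EvalMode =
    if (ansi) AnsiMode else LegacyMode
}
```

If the real code follows this shape, a TRY mode would silently come back as LEGACY after the round trip, which is one reason a derivation chain like this is worth avoiding even when it happens to work.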
I was trying to minimize changes to existing code, but I can revisit this.
It is just a nit so you can decide what to do. But if you delete lines 58 to 62 I would not complain about it.
Fixed
I am confused, and it is not clear how the Spark340Plus shims are going to know whether the cast is ANSI, TRY, or LEGACY. This inherits from Spark331PlusShims, but 331 does not do this; only Databricks 330 does, and I don't think this inherits from that.
We provide overrides for Cast in Spark320PlusShims and Spark31XShims. Spark340PlusShims indirectly extends Spark320PlusShims (via Spark331PlusShims, Spark330PlusNonDBShims, Spark330PlusShims, and Spark321PlusShims). These shims delegate to `AnsiCastShim`, which is shimmed for 311+ and 330db/340 as follows:

- sql-plugin/src/main/spark311/scala/com/nvidia/spark/rapids/shims/AnsiCastShim.scala (311+)
- sql-plugin/src/main/spark330db/scala/com/nvidia/spark/rapids/shims/AnsiCastShim.scala (330db + 340)

It is all very confusing, for sure.
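As a rough illustration of that layout: the two files define an object with the same fully qualified name in different version-specific source trees, and the build compiles exactly one of them per Spark version. Modeled with a common trait below; every name and method body here is invented for the sketch and is not the plugin's actual source.

```scala
// Hypothetical sketch: one interface, two version-specific
// implementations, selected at build time rather than at runtime.
trait AnsiCastShimLike {
  def evalModeName(ansiEnabled: Boolean, isTryCast: Boolean): String
}

// Analogue of the spark311 source tree: Spark before 3.4.0 has no
// TRY eval mode on Cast, so only ANSI vs LEGACY is distinguished.
object Shim311Plus extends AnsiCastShimLike {
  def evalModeName(ansiEnabled: Boolean, isTryCast: Boolean): String =
    if (ansiEnabled) "ANSI" else "LEGACY"
}

// Analogue of the spark330db source tree (also used for 340): the
// cast carries an eval mode that can additionally be TRY.
object Shim330dbAnd340 extends AnsiCastShimLike {
  def evalModeName(ansiEnabled: Boolean, isTryCast: Boolean): String =
    if (isTryCast) "TRY"
    else if (ansiEnabled) "ANSI"
    else "LEGACY"
}
```

The key point is that shared code can call through the one interface while each source tree answers the ANSI/TRY/LEGACY question with whatever its Spark version actually exposes.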
For me, it was especially confusing that we have a db shim that also shims for non-db. I understand that it makes sense, but it is not a pattern we have seen so far.
Yes, it works (the tests pass), but we should fix this. Can you file an issue so that we don't have 340 depend on 330db, unless we already have one?
There was a discussion about this in #8169 (comment) and it seems that it is correct that we have both 330db/340 in the same shim.
That is right. Not that it makes it any less confusing.
I filed #8188 to remove the dependency of 340 on 330db.
I think the best way to make the code clearer is to break up the trait inheritance and stop encoding version ranges in the class/trait/object names.
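One hedged sketch of what that could look like: rather than a trait per version range, each version declares its capabilities directly and shared planning code consults those flags. All names and the fallback rule below are invented for illustration, not the plugin's design.

```scala
// Hypothetical: flat per-version capability declarations instead of a
// deep inheritance chain like Spark340PlusShims -> Spark331PlusShims
// -> ... -> Spark320PlusShims.
final case class ShimCapabilities(hasTryEvalMode: Boolean)

object ShimCapabilities {
  // Each version states its own facts; no type name encodes a range.
  val spark331 = ShimCapabilities(hasTryEvalMode = false)
  val spark340 = ShimCapabilities(hasTryEvalMode = true)
}

object CastPlanner {
  // Illustrative rule only: try_cast can exist only on versions with a
  // TRY eval mode, and this sketch sends it to the CPU there.
  def fallsBackToCpu(caps: ShimCapabilities, isTryCast: Boolean): Boolean =
    isTryCast && caps.hasTryEvalMode
}
```

Adding a new Spark version then means adding one capabilities value rather than finding the right spot in an inheritance chain.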