-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34506][CORE] ADD JAR with ivy coordinates should be compatible with Hive transitive behavior #31623
Conversation
cc @AngersZhuuuu @wangyum FYI |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great finds @shardulm94 ! I'm glad we were able to catch this before it went out in a release (since #29966 missed the 3.1.1 release). Thanks a lot for doing this and PR #31591 to really polish up this feature!
// exclude org.pentaho:pentaho-aggdesigner-algorithm as this transitive dependency does | ||
// not exist on mavencentral and hence cannot be found in the test environment | ||
sql(s"ADD JAR ivy://org.apache.hive.hcatalog:hive-hcatalog-core:$hiveVersion" + | ||
"?exclude=org.pentaho:pentaho-aggdesigner-algorithm") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is odd. Any idea why hive-hcatalog-core
has a dependency on a package that doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very infamous dependency brought in by calcite. It does not exist on mavencentral but it does exist on conjars.org. Here is a Stack overflow asking about the same issue. https://stackoverflow.com/questions/32587769/dependency-resolution-issue-in-gradle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yikes.
Thanks for your double check, nice catch, I miss some point when check hive's behavior. |
ok to test |
cc @maropu and @dongjoon-hyun too FYI |
Oh, I see. Nice catch.
Just to check; all the released Hive versions having this feature have the same default behaviour? |
core/src/main/scala/org/apache/spark/util/DependencyUtils.scala
Outdated
Show resolved
Hide resolved
I left some minor comments though, the proposal itself seems okay. |
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #135400 has finished for PR 31623 at commit
|
Test build #135439 has finished for PR 31623 at commit
|
If no one has more comments, I'll merge this in a few days. |
Thanks! Merged to master. |
Thanks @maropu for taking care of this! |
Thank you, @shardulm94 and all. Like @xkrogen said, it's really nice to fix this at the early stage. |
… with Hive transitive behavior ### What changes were proposed in this pull request? SPARK-33084 added the ability to use ivy coordinates with `SparkContext.addJar`. PR apache#29966 claims to mimic Hive behavior although I found a few cases where it doesn't 1) The default value of the transitive parameter is false, both in case of parameter not being specified in coordinate or parameter value being invalid. The Hive behavior is that transitive is [true if not specified](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L169) in the coordinate and [false for invalid values](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L124). Also, regardless of Hive, I think a default of true for the transitive parameter also matches [ivy's own defaults](https://ant.apache.org/ivy/history/2.5.0/ivyfile/dependency.html#_attributes). 2) The parameter value for transitive parameter is regarded as case-sensitive [based on the understanding](apache#29966 (comment)) that Hive behavior is case-sensitive. However, this is not correct, Hive [treats the parameter value case-insensitively](https://github.com/apache/hive/blob/cb2ac3dcc6af276c6f64ee00f034f082fe75222b/ql/src/java/org/apache/hadoop/hive/ql/util/DependencyResolver.java#L122). I propose that we be compatible with Hive for these behaviors ### Why are the changes needed? To make `ADD JAR` with ivy coordinates compatible with Hive's transitive behavior ### Does this PR introduce _any_ user-facing change? The user-facing changes here are within master as the feature introduced in SPARK-33084 has not been released yet 1. Previously an ivy coordinate without `transitive` parameter specified did not resolve transitive dependency, now it does. 2. Previously an `transitive` parameter value was treated case-sensitively. e.g. `transitive=TRUE` would be treated as false as it did not match exactly `true`. Now it will be treated case-insensitively. ### How was this patch tested? Modified existing unit tests to test new behavior Add new unit test to cover usage of `exclude` with unspecified `transitive` Closes apache#31623 from shardulm94/spark-34506. Authored-by: Shardul Mahadik <smahadik@linkedin.com> Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
What changes were proposed in this pull request?
SPARK-33084 added the ability to use ivy coordinates with
SparkContext.addJar
. PR #29966 claims to mimic Hive behavior although I found a few cases where it doesn'tThe default value of the transitive parameter is false, both in case of parameter not being specified in coordinate or parameter value being invalid. The Hive behavior is that transitive is true if not specified in the coordinate and false for invalid values. Also, regardless of Hive, I think a default of true for the transitive parameter also matches ivy's own defaults.
The parameter value for transitive parameter is regarded as case-sensitive based on the understanding that Hive behavior is case-sensitive. However, this is not correct, Hive treats the parameter value case-insensitively.
I propose that we be compatible with Hive for these behaviors
Why are the changes needed?
To make
ADD JAR
with ivy coordinates compatible with Hive's transitive behaviorDoes this PR introduce any user-facing change?
The user-facing changes here are within master as the feature introduced in SPARK-33084 has not been released yet
transitive
parameter specified did not resolve transitive dependency, now it does.transitive
parameter value was treated case-sensitively. e.g.transitive=TRUE
would be treated as false as it did not match exactlytrue
. Now it will be treated case-insensitively.How was this patch tested?
Modified existing unit tests to test new behavior
Add new unit test to cover usage of
exclude
with unspecifiedtransitive