Dist jar build should not need to unshim and dedup external dependencies #5294
One potential issue with the above solution is that different Spark shims could still compile/build their aggregators against inconsistent jni libraries, while in the end we pick just one version in the dist package for all shims. Perhaps we could also pin a jni version at the beginning of the build and use it for all shim builds?
Agreed. For any dependency, cudf or otherwise, we should try to avoid pulling in dynamic versions that can change across multiple builds, like SNAPSHOT.
Let me check this issue.
The Maven CLI invocation used to inspect the dependency tree: `mvn -X dependency:tree -f sql_plugin`
When I tried to build rapids-plugin, the jni version recorded in the file rapids4spark-version-info.properties did not match the one recorded in the file spark-rapids-jni-version-info.properties. Then, when running the integration tests, a rapids/cudf version MISMATCH occurred.
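As a rough sketch of what such a consistency check could look like at test time (the file names come from the comment above, but the property keys and the example version values here are assumptions for illustration, not the real file contents):

```shell
# Sketch: compare the cudf/jni version recorded by the plugin build against
# the version recorded by the spark-rapids-jni jar. Keys and values below
# are illustrative assumptions.
cat > rapids4spark-version-info.properties <<'EOF'
cudf_version=22.06.0-SNAPSHOT
EOF
cat > spark-rapids-jni-version-info.properties <<'EOF'
version=22.08.0-SNAPSHOT
EOF

plugin_cudf=$(grep '^cudf_version=' rapids4spark-version-info.properties | cut -d= -f2)
jni_version=$(grep '^version=' spark-rapids-jni-version-info.properties | cut -d= -f2)

if [ "$plugin_cudf" != "$jni_version" ]; then
  echo "rapids/cudf version MISMATCH: $plugin_cudf vs $jni_version"
fi
```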
To fix issue NVIDIA#5294: build rapids against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jars during compilation. Signed-off-by: Tim Liu <timl@nvidia.com>
* Build against specified spark-rapids-jni snapshot jar. To fix issue #5294: build rapids against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jars during compilation.
* Support maven build options for the databricks nightly build and the build/buildall script.
* Update according to review:
  1. Test case for the cudf expected timestamp-seq version name vs. actual SNAPSHOT match, e.g. cudf version "7.0.1-20220101.001122-123" is satisfied by "7.0.1-SNAPSHOT".
  2. Use $MVN instead of "mvn" for every invocation of Maven, so maven options are included in all the mvn commands.
* Fix scalastyle check failure: line length exceeds 100 characters.
* Support modifying M2DIR, so it can be overridden outside the script.
* Copyright 2022.

Signed-off-by: Tim Liu <timl@nvidia.com>
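The version-match test from point 1 can be sketched roughly as follows, assuming Maven's unique-snapshot naming convention of base version plus a `YYYYMMDD.HHMMSS-N` suffix (this is an illustrative sketch, not the actual test code):

```shell
# Does a resolved timestamped snapshot satisfy the expected -SNAPSHOT version?
expected="7.0.1-SNAPSHOT"
actual="7.0.1-20220101.001122-123"

base_expected="${expected%-SNAPSHOT}"
# Strip Maven's unique-snapshot suffix (timestamp.seconds-buildnumber)
base_actual=$(echo "$actual" | sed -E 's/-[0-9]{8}\.[0-9]{6}-[0-9]+$//')

if [ "$base_expected" = "$base_actual" ]; then
  echo "MATCH"
else
  echo "MISMATCH"
fi
```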
We can now build rapids with a specified version of spark-rapids-jni. Still need to do:
Is your feature request related to a problem? Please describe.
Currently external dependencies such as jucx and spark-rapids-jni are pulled into every aggregator jar, and the dist jar build then has to ensure all the important parts of those jars are unshimmed and deduplicated to handle these dependencies properly. This can cause problems when the aggregator jars pull in slightly different versions of those dependencies (e.g.: the spark-rapids-jni snapshot changed during the build).
Describe the solution you'd like
We shouldn't pull external dependencies into the aggregator jar unpack process. Those should be handled separately. Wondering if we can mark them as
<scope>provided</scope>
in the aggregator poms and depend on them directly in the dist
pom, using the assembly plugin or something similar to bundle those dependencies into the dist jar directly, rather than needing to manually ensure the parallel worlds build doesn't accidentally munge or needlessly duplicate their contents.
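A minimal sketch of what that change might look like in an aggregator pom (the group/artifact coordinates and the version property name are assumptions for illustration, not the project's actual configuration):

```xml
<!-- aggregator pom sketch: mark the jni dependency as provided so it is not
     unpacked into the aggregator jar; the dist pom would declare the same
     dependency with default scope and bundle it via the assembly plugin -->
<dependency>
  <groupId>com.nvidia</groupId>
  <artifactId>spark-rapids-jni</artifactId>
  <version>${spark-rapids-jni.version}</version>
  <scope>provided</scope>
</dependency>
```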