Build against specified spark-rapids-jni snapshot jar [skip ci] #5501
Conversation
Allowing the spark-rapids-jni version to be manually anchored is nice, but this does not solve the issue described in #5294. Even after this change, each Spark version will unpack and attempt to dedup the spark-rapids-jni jar and require manual sync of the unshim text files with the contents of that jar (and any other non-Spark-specific jar, like jucx).
Also, shouldn't the buildall script be updated with similar changes? The nightly build should not be the only build that has the fix.
> Still dedup the spark-rapids-jni jar in aggregator spark-shims

@jlowe Sorry, I may not have understood this issue clearly before. The aggregator depends on sql_plugin, and I assumed sql_plugin compiles spark-rapids-jni into its jar. I will update the buildall script, and follow a similar approach for the other non-Spark-specific jars, like jucx.
sql_plugin needs spark-rapids-jni to compile, but it does not pull spark-rapids-jni into the sql_plugin jar artifact. The spark-rapids-jni jar gets pulled into the dist jar by the custom dist jar build process (see dist/pom.xml and the supporting build-parallel-worlds.xml and binary-dedupe.sh for details).

The problem is with the way non-Spark-specific dependencies are handled. They're essentially assumed to be Spark-specific dependencies, which requires them to go through the parallel-world, binary-dedupe, and unshim processing. Ideally we should only do this for dependencies that are Spark-specific (i.e.: the aggregator jars but not all their dependencies). One potential solution is to manually exclude the dependencies we know are not Spark-specific (like spark-rapids-jni, jucx, possibly others) and treat them just like a normal dependency that is included into the dist artifact.
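A minimal sketch of that suggestion, assuming hypothetical coordinates and version/classifier properties (this is not the actual dist/pom.xml): the known non-Spark-specific artifacts would be declared as ordinary dependencies of the dist module, so they bypass the parallel-world, binary-dedupe, and unshim processing entirely.

```xml
<!-- Hypothetical sketch, not the real dist/pom.xml: declare the known
     non-Spark-specific artifacts as plain dependencies of dist instead
     of routing them through parallel-world/binary-dedupe/unshim.
     The version/classifier properties here are assumed, not actual. -->
<dependency>
  <groupId>com.nvidia</groupId>
  <artifactId>spark-rapids-jni</artifactId>
  <version>${spark-rapids-jni.version}</version>
  <classifier>${jni.classifier}</classifier>
</dependency>
<dependency>
  <groupId>org.openucx</groupId>
  <artifactId>jucx</artifactId>
  <version>${jucx.version}</version>
</dependency>
```

The aggregator jars themselves would still go through the dedupe pipeline; only the excluded artifacts would be packaged as-is.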
To fix issue NVIDIA#5294: build RAPIDS against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jars during compilation. Signed-off-by: Tim Liu <timl@nvidia.com>
Force-pushed from c9f660e to 9cc0976
@jlowe Thanks a lot for your explanation. Yes, the duplicated class files of non-Spark-specific dependencies are mainly related to spark-rapids-jni and jucx. We cannot shade them into the Spark-specific aggregator jars, to avoid running binary-dedupe multiple times. Since we build sql_plugin multiple times when building against different Spark shim versions, the spark-rapids-jni dependency jar may change during the build, e.g. when new spark-rapids-jni jars are pushed to the Maven repo. Could we first land this patch to pull the specified spark-rapids-jni dependency jar? I will follow up on removing non-Spark-specific dependencies from the aggregator jars in other pull requests.
https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/tests/pom.xml#L56
https://github.com/NVIDIA/spark-rapids/blob/branch-22.06/api_validation/pom.xml#L148
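To illustrate the anchoring idea in this patch, a hedged sketch (the property name and versions are assumptions, not the plugin's actual build configuration): a version property with a SNAPSHOT default can be overridden on the command line with an exact timestamped snapshot, so every shim build resolves the same jar even if a newer SNAPSHOT lands mid-build.

```xml
<!-- Hypothetical sketch: a property anchoring the spark-rapids-jni
     version. The default tracks the latest SNAPSHOT; CI can pin an
     exact timestamped snapshot via -D on the mvn command line. -->
<properties>
  <spark-rapids-jni.version>22.06.0-SNAPSHOT</spark-rapids-jni.version>
</properties>
```

A nightly build could then pin the resolved snapshot once and reuse it for every shim, e.g. `mvn package -Dspark-rapids-jni.version=22.06.0-20220101.001122-123` (illustrative version string).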
…ildall script Signed-off-by: Tim Liu <timl@nvidia.com>
Force-pushed from 9cc0976 to 4f26f8e
Absolutely. I just wanted to call out that this PR should not state that it resolves the issue, since it alone does not.
Yes, that is expected. We cannot simply remove these dependencies; they are needed at compile time and runtime. We need to process these dependencies differently when constructing the jars, not remove them entirely.
1. Add a test case for matching the expected cudf timestamped snapshot version name against the actual SNAPSHOT version, e.g. cudf version "7.0.1-20220101.001122-123" is satisfied by "7.0.1-SNAPSHOT".
2. Use $MVN instead of "mvn" for every invocation of Maven, so the configured Maven options apply to all mvn commands.
Signed-off-by: Tim Liu <timl@nvidia.com>
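The matching rule described above can be sketched in shell (a hypothetical helper, not the plugin's actual version-check code): a requested base version like `7.0.1-SNAPSHOT` is satisfied either by the literal SNAPSHOT string or by a Maven timestamped deployment of that base version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual buildall/test code): check whether
# an installed timestamped snapshot, e.g. 7.0.1-20220101.001122-123,
# satisfies a requested base version, e.g. 7.0.1-SNAPSHOT.
snapshot_matches() {
  local expected="$1"   # e.g. 7.0.1-SNAPSHOT
  local actual="$2"     # e.g. 7.0.1-20220101.001122-123
  local base="${expected%-SNAPSHOT}"
  # Exact match, or base version followed by Maven's
  # yyyyMMdd.HHmmss-buildNumber deployment suffix.
  [[ "$actual" == "$expected" ]] || \
    [[ "$actual" =~ ^"$base"-[0-9]{8}\.[0-9]{6}-[0-9]+$ ]]
}

snapshot_matches 7.0.1-SNAPSHOT 7.0.1-20220101.001122-123 && echo match
```

The timestamp suffix format (`yyyyMMdd.HHmmss-buildNumber`) is the convention Maven uses when deploying unique snapshots to a remote repository.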
Yes, test cases should be added for the new behavior for SNAPSHOT matching.
Signed-off-by: Tim Liu <timl@nvidia.com>
Signed-off-by: Tim Liu <timl@nvidia.com>
tests/src/test/scala/com/nvidia/spark/rapids/RapidsExecutorPluginSuite.scala
Signed-off-by: Tim Liu <timl@nvidia.com>
build
part of #5294
Build RAPIDS against the specified spark-rapids-jni snapshot jar, to avoid pulling different versions of the spark-rapids-jni dependency jars during compilation.
Signed-off-by: Tim Liu <timl@nvidia.com>