[BUG] Unable to build rapids-4-spark jar from source due to missing 3.0.3-SNAPSHOT for spark-sql #3166

Closed
Niharikadutta opened this issue Aug 6, 2021 · 12 comments · Fixed by #3189
Assignees
Labels
bug (Something isn't working), documentation (Improvements or additions to documentation)

Comments

@Niharikadutta

Niharikadutta commented Aug 6, 2021

Describe the bug
I'm trying to build the rapids-4-spark jar from v21.06.1 tag and am getting the following error:

[ERROR] Failed to execute goal on project rapids-4-spark-shims-spark303_2.12: Could not resolve dependencies for project com.nvidia:rapids-4-spark-shims-spark303_2.12:jar:21.06.1: Could not find artifact org.apache.spark:spark-sql_2.12:jar:3.0.3-SNAPSHOT in snapshots-repo (https://oss.sonatype.org/content/repositories/snapshots)

Not sure if this is why, but it looks like there isn't a snapshot for 3.0.3 in the Apache repo (modified yesterday) - https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-sql_2.12/

I got it to build by removing spark303 module from the pom, but just wanted to check with the devs here if this is expected.

Steps/Code to reproduce bug
Ran the following command from root directory for tag v21.06.1:
mvn -DskipTests package

Expected behavior
Jar should be built without errors.

Environment details

  • OS: Linux
  • Java version: 1.8.0_292
  • Maven version: 3.8.1

Additional context
I need to build a local jar to remove the SLF4J dependency that the rapids jar brings with it (v1.7.30). I'm running into issues with LoggerFactory instantiation in my environment, possibly due to a different library bringing its own slf4j (v1.7.16).

Failed to instantiate SLF4J LoggerFactory
Reported exception:
java.lang.NullPointerException
at org.slf4j.LoggerFactory.reportActualBinding(LoggerFactory.java:349)

Any guidance here that could help unblock me (that wouldn't require me to remove these dependencies from source and re-build) would also be really appreciated. Thank you in advance!

@Niharikadutta Niharikadutta added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on Aug 6, 2021
@tgravescs
Collaborator

tgravescs commented Aug 9, 2021

Spark 3.0.3 was released, so the snapshot jars may have been removed by now. We didn't support it in the 21.06 release because it wasn't officially supported yet at that time.

If you build with -P!snapshot-shims, that will skip building any of the shims that were snapshots and not officially released.

Note you need to quote the !snapshot-shims part:

mvn -P'!snapshot-shims' ...
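
A minimal sketch of the full invocation, assuming the same `-DskipTests package` goals from the original report are kept. It only assembles and prints the command so it can be checked before running; the variable names are illustrative.

```shell
# Profile name taken from the comment above; the leading '!' deactivates it.
# Single quotes stop an interactive bash from treating '!' as history expansion.
PROFILE_FLAG='-P!snapshot-shims'

# Assumption: same goals as in the original report (mvn -DskipTests package).
BUILD_CMD="mvn ${PROFILE_FLAG} -DskipTests package"

# Print the command instead of running it, so it can be inspected first.
echo "${BUILD_CMD}"
```

Run the printed command from the repository root of the checked-out tag.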

@gerashegalov
Collaborator

We could add language to the doc for tagged/released versions that SNAPSHOT Spark versions are not supported since they had not been finalized as of the Plugin release time. We can also make sure that snapshot-shims are disabled by default in the released plugin version.

@jlowe
Member

jlowe commented Aug 9, 2021

We can also make sure that snapshot-shims are disabled by default in the released plugin version.

+1 for this approach.

@Niharikadutta
Author

Thanks @tgravescs and @gerashegalov for the clarification.
About the following error I'm seeing when including RAPIDS jar 21.06.0:

Failed to instantiate SLF4J LoggerFactory
Reported exception:
java.lang.NullPointerException
at org.slf4j.LoggerFactory.reportActualBinding(LoggerFactory.java:349)

Do you have any insights?

@tgravescs
Collaborator

I have not seen that; my first assumption is some incompatible jar.

Does this happen on startup of Spark, or when does it occur?

@gerashegalov
Collaborator

gerashegalov commented Aug 10, 2021

It seems you are hitting this line https://github.com/qos-ch/slf4j/blob/v_1.7.32/slf4j-api/src/main/java/org/slf4j/LoggerFactory.java#L349

The primary cause is probably either multiple slf4j bindings on the classpath, which you need to eliminate via exclusions in your deploy build, or a mix-up of slf4j versions. The secondary issue is a bug in slf4j that throws the NPE while it is reporting the primary issue.
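
One way to check for the multiple-bindings case is to count the "Found binding" lines SLF4J 1.7.x prints at startup. This is a sketch over a hypothetical log excerpt: the jar paths are made up for illustration, and only the message prefix is what SLF4J actually emits.

```shell
# Hypothetical startup output; paths are illustrative only.
startup_log='SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/app/lib/a.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]'

# Count how many bindings SLF4J reported.
binding_count=$(printf '%s\n' "$startup_log" | grep -c '^SLF4J: Found binding in')

# More than one binding means something should be excluded from the classpath.
if [ "$binding_count" -gt 1 ]; then
  echo "multiple bindings: $binding_count"
fi
```

In practice the same `grep -c` can be pointed at the captured driver or executor log instead of the inline sample.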

@Niharikadutta
Author

Niharikadutta commented Aug 10, 2021

Does this happen on startup of Spark, or when does it occur?

@tgravescs This happens when running a Spark job.

@gerashegalov Yes, it looks like a bug in slf4j v1.7.30. However, I've tried updating the version in pom.xml to the latest (v2.0.0-alpha2, which has a null check and falls back to NOP) and I still see it hitting the line https://github.com/qos-ch/slf4j/blob/v_1.7.30/slf4j-api/src/main/java/org/slf4j/LoggerFactory.java#L349. I also tried changing the scope of the slf4j dependencies in the pom file to provided, but I still see slf4j files in the compiled jar; I'm not sure what I'm doing wrong here.
As for multiple bindings, I have seen the multiple-bindings warning before, but it never caused an instantiation error (this has been working with the rapids 21.06.0 jar and our other internal slf4j jar for a couple of months, and only started failing recently).

@gerashegalov
Collaborator

gerashegalov commented Aug 10, 2021

@Niharikadutta I would suggest enabling verbose classloading via the JVM option '-verbose:class' to verify that you are getting classpath resources from the expected locations. If you see the error on the driver, then add the option to --driver-java-options / spark.driver.extraJavaOptions.

For 21.06.1 with the local REPL I see

[Loaded org.slf4j.Logger from file:/home/user/dist/spark-rapids-21.06.1/rapids-4-spark_2.12-21.06.1.jar]
...
[Loaded org.slf4j.helpers.Util from file:/home/user/dist/spark-rapids-21.06.1/rapids-4-spark_2.12-21.06.1.jar]

SLF4J: Found binding in [jar:file:/home/user/dist/spark-rapids-21.06.1/rapids-4-spark_2.12-21.06.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/user/dist/spark-3.1.1-bin-hadoop3.2/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

I can't see a notable difference between 21.06.01, although we could avoid bringing in our own copy of slf4j.
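
To make the `-verbose:class` output easier to read, the slf4j lines can be reduced to the distinct locations they were loaded from. A sketch, assuming the Java 8 `[Loaded <class> from <source>]` line format shown above; the two slf4j lines are the ones quoted in this comment, and the `java.lang.Object` line is a hypothetical filler.

```shell
# Sample -verbose:class output; the middle line is a made-up non-slf4j entry.
sample_log='[Loaded org.slf4j.Logger from file:/home/user/dist/spark-rapids-21.06.1/rapids-4-spark_2.12-21.06.1.jar]
[Loaded java.lang.Object from /usr/lib/jvm/java-8/jre/lib/rt.jar]
[Loaded org.slf4j.helpers.Util from file:/home/user/dist/spark-rapids-21.06.1/rapids-4-spark_2.12-21.06.1.jar]'

# Keep only slf4j classes, strip everything but the source, and de-duplicate.
slf4j_sources=$(printf '%s\n' "$sample_log" \
  | grep '^\[Loaded org\.slf4j\.' \
  | sed -e 's/^.* from //' -e 's/\]$//' \
  | sort -u)

echo "$slf4j_sources"
```

If more than one location is printed for a real log, slf4j classes are being loaded from several jars, which would support the version mix-up theory.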

@Salonijain27 Salonijain27 added the "documentation" (Improvements or additions to documentation) label and removed the "? - Needs Triage" (Need team to review and classify) label on Aug 10, 2021
@gerashegalov
Collaborator

@Niharikadutta if you want to try excluding slf4j from the rapids assembly jar you can do this:

diff --git a/dist/pom.xml b/dist/pom.xml
index b89e56eb5..e8d1df0e2 100644
--- a/dist/pom.xml
+++ b/dist/pom.xml
@@ -70,6 +70,9 @@
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-shade-plugin</artifactId>
         <configuration>
+          <artifactSet>
+            <excludes>org.slf4j:*</excludes>
+          </artifactSet>
          <transformers>
             <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          </transformers>
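
After rebuilding with a change like this, it may be worth confirming that the jar no longer carries slf4j classes. A sketch, assuming a jar listing (for example from `unzip -l` or `jar tf`) is piped in; the function name and the jar path in the comment are hypothetical.

```shell
# Reads a jar listing on stdin and succeeds only if no entry is under org/slf4j/.
no_slf4j_entries() {
  ! grep -q 'org/slf4j/'
}

# Hypothetical usage; adjust the path to wherever your build writes the jar:
#   unzip -l dist/target/rapids-4-spark_2.12-21.06.1.jar | no_slf4j_entries \
#     && echo "slf4j excluded"
```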

@Niharikadutta
Author

Thanks @gerashegalov let me try that

@jlowe
Member

jlowe commented Aug 10, 2021

There are two separate issues being discussed here, with two different fixes. I filed #3187 to track the NPE in SLF4J, leaving this issue to track old builds failing when Apache Spark releases a version that was previously shipped as a snapshot.

@Niharikadutta
Author

Thanks @gerashegalov for the suggestion, that worked!

5 participants