
Dynamically load hive and avro using reflection to avoid potential class not found exception [databricks] #5723

Merged: 7 commits merged into NVIDIA:branch-22.08 on Jun 24, 2022

Conversation

@res-life (Collaborator) commented Jun 2, 2022:

Fixes #5648

The potential ClassNotFoundException problem:

In ExternalSource.scala:

// This import is a static, compile-time reference to an Avro class.
import org.apache.spark.sql.v2.avro.AvroScan

  lazy val hasSparkAvroJar = {
    val loader = Utils.getContextOrSparkClassLoader
    // Runtime check for the spark-avro jar
    Try(loader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")) match {
      case Failure(_) => false
      case Success(_) => true
    }
  }

  def getScans: Map[Class[_ <: Scan], ScanRule[_ <: Scan]] = {
    if (hasSparkAvroJar) {
      GpuOverrides.scan[AvroScan]  // compile-time reference to AvroScan

(See the source code of ExternalSource.)

As the code above shows, ExternalSource statically references AvroScan even when AvroScan does not exist at runtime; the compiled ExternalSource bytecode carries that compile-time reference. Even though hasSparkAvroJar guards the call at runtime, the JVM may still resolve the AvroScan symbol while linking ExternalSource's bytecode (for example when verifying method signatures or bootstrapping lambdas), so a NoClassDefFoundError can occur when spark-avro is absent.

Resolution (see the sketch after the class list below):
Use the approach described in issue #5648.
Remove all references to Avro classes from the ExternalSource file.
Provide a provider trait and an implementing subclass.
Put all the compile-time references into the implementing subclass.
ExternalSource dynamically loads the provider using reflection to avoid the potential ClassNotFoundException.
If loader.loadClass detects no Avro jar, ExternalSource will not attempt to load the implementing subclass.

The same approach also applies to Hive.

The following Avro and Hive classes should move into the provider subclasses:
avro.{AvroFileFormat, AvroOptions}
avro.AvroScan
hive.{HiveGenericUDF, HiveSimpleUDF}
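
To make the pattern concrete, below is a minimal, self-contained Scala sketch of the provider approach. The trait name, the hasSparkAvroJar probe, and the probed class follow this PR; the simplified String-based isSupportedFormat signature, the impl's body, and the use of plain Class.forName in place of the plugin's ShimLoader are illustrative assumptions, not the merged code.

import scala.util.Try

// The trait has no compile-time references to Avro classes,
// so loading it can never trigger NoClassDefFoundError.
trait AvroProvider {
  def isSupportedFormat(formatClassName: String): Boolean
}

// The implementing subclass is the only place allowed to hold
// compile-time references to spark-avro classes. The real plugin
// matches on AvroFileFormat; a string comparison stands in here so
// the sketch compiles without spark-avro on the classpath.
class AvroProviderImpl extends AvroProvider {
  override def isSupportedFormat(formatClassName: String): Boolean =
    formatClassName == "org.apache.spark.sql.avro.AvroFileFormat"
}

object ExternalSource {
  // Fully qualified in the real code; unqualified here for the sketch.
  private val providerClassName = "AvroProviderImpl"

  // Runtime probe only: this file never imports AvroScan.
  lazy val hasSparkAvroJar: Boolean = {
    val loader = Thread.currentThread().getContextClassLoader
    Try(loader.loadClass("org.apache.spark.sql.v2.avro.AvroScan")).isSuccess
  }

  // Loaded reflectively and memoized, so at most one instance is created
  // and the impl's bytecode is only touched when the Avro jar exists.
  private lazy val avroProvider: AvroProvider =
    Class.forName(providerClassName)
      .getDeclaredConstructor()
      .newInstance()
      .asInstanceOf[AvroProvider]

  def isSupportedFormat(formatClassName: String): Boolean =
    hasSparkAvroJar && avroProvider.isSupportedFormat(formatClassName)
}

When spark-avro is absent, hasSparkAvroJar is false, the avroProvider lazy val is never forced, and AvroProviderImpl is never loaded, so no class-not-found error can escape ExternalSource.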

Signed-off-by: Chong Gao res_life@163.com

@res-life res-life changed the title Dynamically load hive and avro using reflection to avoid potential class not found exception Dynamically load hive and avro using reflection to avoid potential class not found exception [databricks] Jun 2, 2022
@res-life (Collaborator, Author) commented Jun 2, 2022:

build

Commit: Dynamically load hive and avro using reflection to avoid potential class not found exception
Signed-off-by: Chong Gao <res_life@163.com>

@res-life force-pushed the fix-compile-time-reference branch from d73e53e to c89b407 on June 2, 2022 08:45
@res-life (Collaborator, Author) commented Jun 2, 2022:

build

@res-life res-life marked this pull request as ready for review June 2, 2022 11:37
@res-life (Collaborator, Author) commented Jun 2, 2022:

Conflict with #5716

@razajafri (Collaborator) commented:

I have verified this fixes the bug reported by our QA.
@gerashegalov can you verify one more time before we re-target this PR for the 22.06 release?

Review comment on ExternalSource.scala:

object ExternalSource {
  val providerClassName = "org.apache.spark.sql.rapids.AvroSourceProvider"

  lazy val hasSparkAvroJar = {
    val loader = Utils.getContextOrSparkClassLoader

Collaborator:

We should do the same thing here as we are doing in GpuHiveOverrides, i.e. use the ShimLoader.

Collaborator Author:

Done

Comment on lines 49 to 50:

val className = "org.apache.spark.sql.hive.rapids.HiveSourceProvider"
ShimLoader.newInstanceOf[HiveProvider](className).getExprs

Collaborator:

nit: Our current pattern is to create dedicated methods in ShimLoader to reduce the spread of class-for-name code across the code base.

Consider creating methods in ShimLoader.scala, e.g.:

  def newHiveProvider(): HiveProvider = {
    newInstanceOf[HiveProvider]("org.apache.spark.sql.hive.rapids.HiveSourceProvider")
  }

Another nit about naming: the usual pattern is that when the interface/trait is called HiveProvider, the implementing class is HiveProviderImpl.

Naming is hard. Given this is called from GpuHiveOverrides, should the trait be GpuHiveOverridesProvider and the class GpuHiveOverridesImpl?

Similar considerations apply to Avro.

Collaborator Author:

Done
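
For symmetry with newHiveProvider above, the matching Avro loader would presumably look like the following sketch (the provider class name org.apache.spark.sql.rapids.AvroSourceProvider is taken from the earlier diff; this mirrors the suggested pattern rather than reproducing the merged code):

  def newAvroProvider(): AvroProvider = {
    newInstanceOf[AvroProvider]("org.apache.spark.sql.rapids.AvroSourceProvider")
  }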

Review comment:

      case _: AvroFileFormat => true
      case _ => false
    }
    ShimLoader.newInstanceOf[AvroProvider](providerClassName).isSupportedFormat(format)

Collaborator:

This can be called more than once, and will lead to creating garbage instances of AvroProvider. We should memoize the AvroProvider as a singleton in a lazy val, similar to hasSparkAvroJar.

Let us create a ShimLoader method def newAvroProvider(): AvroProvider. See the comment about Hive.

Collaborator Author:

Done.

@gerashegalov (Collaborator) commented Jun 2, 2022:

> I have verified this fixes the bug reported by our QA. @gerashegalov can you verify one more time before we re-target this PR for 22.06 release?

I will file a dedicated issue for this. It does not look like this PR fixes it yet.

To reproduce, invoke:

$SPARK_HOME/bin/pyspark \
  --driver-class-path dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar \
  --packages org.apache.spark:spark-avro_2.12:3.2.1 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.rapids.sql.explain=ALL
2022/06/02 23:22:34 ERROR RapidsExecutorPlugin: Exception in the executor plugin, shutting down!
java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: org/apache/spark/sql/v2/avro/AvroScan
        at org.apache.spark.sql.rapids.AvroSourceProvider.getScans(AvroSourceProvider.scala:99)
        at org.apache.spark.sql.rapids.ExternalSource$.getScans(ExternalSource.scala:127)
        at com.nvidia.spark.rapids.GpuOverrides$.<init>(GpuOverrides.scala:3555)
        at com.nvidia.spark.rapids.GpuOverrides$.<clinit>(GpuOverrides.scala)
        at com.nvidia.spark.rapids.TypeChecks$.areTimestampsSupported(TypeChecks.scala:797)
        at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:218)
        at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$executorPlugins$1(PluginContainer.scala:125)
        at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
        at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
        at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
        at org.apache.spark.internal.plugin.ExecutorPluginContainer.<init>(PluginContainer.scala:113)
        at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:211)
        at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:199)
        at org.apache.spark.executor.Executor.$anonfun$plugins$1(Executor.scala:253)
        at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:231)
        at org.apache.spark.executor.Executor.<init>(Executor.scala:253)
        at org.apache.spark.scheduler.local.LocalEndpoint.<init>(LocalSchedulerBackend.scala:64)
        at org.apache.spark.scheduler.local.LocalSchedulerBackend.start(LocalSchedulerBackend.scala:132)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:220)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:581)
        at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:238)
        at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
        at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/v2/avro/AvroScan
        ... 37 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.v2.avro.AvroScan
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)

If we substitute --jars for --driver-class-path, it works.

@res-life (Collaborator, Author) commented Jun 7, 2022:

I verified that the following test works:

env -u SPARK_HOME mvn clean install -DskipTests -Dbuildver=321

$SPARK_HOME/bin/pyspark \
  --driver-class-path dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar \
  --packages org.apache.spark:spark-avro_2.12:3.2.1 \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  --conf spark.rapids.sql.enabled=true \
  --conf spark.rapids.sql.explain=ALL

@res-life (Collaborator, Author) commented Jun 7, 2022:

build

@res-life (Collaborator, Author) commented:

@gerashegalov please help review.

@res-life (Collaborator, Author) commented:

build

1 similar comment

@res-life (Collaborator, Author) commented:

build

Review comment:

  /**
   * Singleton Avro Provider
   */
  lazy val avroProvider: AvroProvider = ShimLoader.newInstanceOf[AvroProvider](

Collaborator:

I'd rather have only methods in this trait, not introduce any constraints, and let the caller decide how to use it. Can we change this to:

  def newAvroProvider(): AvroProvider = {
      ...
  }

for consistency?

Collaborator Author:

Done

Review comment:

      case _: AvroFileFormat => true
      case _ => false
    }
    ShimLoader.avroProvider.isSupportedFormat(format)

Collaborator:

Add a lazy val to object ExternalSource:

  lazy val avroProvider = ShimLoader.newAvroProvider()

and here and elsewhere simply call methods on the lazy val.

Collaborator Author:

Done

@gerashegalov (Collaborator) commented:

build

@pxLi (Collaborator) commented Jun 22, 2022:

build

@pxLi (Collaborator) commented Jun 22, 2022:

This failed in https://github.com/NVIDIA/spark-rapids/blob/branch-22.08/jenkins/spark-premerge-build.sh#L81-L84 and caused the executors to restart continually, flooding the disk space.

04:05:28,149 INFO    CoarseGrainedExecutorBackend:2634 - Started daemon with process name: 21963@premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x
04:05:28,157 INFO                     SignalUtils:  57 - Registering signal handler for TERM
04:05:28,159 INFO                     SignalUtils:  57 - Registering signal handler for HUP
04:05:28,159 INFO                     SignalUtils:  57 - Registering signal handler for INT
04:05:28,473 WARN                NativeCodeLoader:  60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
04:05:28,565 INFO                 SecurityManager:  57 - Changing view acls to: root
04:05:28,566 INFO                 SecurityManager:  57 - Changing modify acls to: root
04:05:28,566 INFO                 SecurityManager:  57 - Changing view acls groups to: 
04:05:28,567 INFO                 SecurityManager:  57 - Changing modify acls groups to: 
04:05:28,568 INFO                 SecurityManager:  57 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
04:05:29,009 INFO          TransportClientFactory: 309 - Successfully created connection to premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x/10.233.110.180:34849 after 83 ms (0 ms spent in bootstraps)
04:05:29,124 INFO                 SecurityManager:  57 - Changing view acls to: root
04:05:29,124 INFO                 SecurityManager:  57 - Changing modify acls to: root
04:05:29,124 INFO                 SecurityManager:  57 - Changing view acls groups to: 
04:05:29,125 INFO                 SecurityManager:  57 - Changing modify acls groups to: 
04:05:29,125 INFO                 SecurityManager:  57 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
04:05:29,199 INFO          TransportClientFactory: 309 - Successfully created connection to premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x/10.233.110.180:34849 after 3 ms (0 ms spent in bootstraps)
04:05:29,263 INFO                DiskBlockManager:  57 - Created local directory at /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/blockmgr-c3ef6d65-9ee0-43a1-8225-20a6e70ff050
04:05:29,305 INFO                     MemoryStore:  57 - MemoryStore started with capacity 366.3 MiB
04:05:29,560 INFO    CoarseGrainedExecutorBackend:  57 - Connecting to driver: spark://CoarseGrainedScheduler@premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x:34849
04:05:29,561 INFO                   WorkerWatcher:  57 - Connecting to worker spark://Worker@10.233.110.180:33551
04:05:29,566 INFO          TransportClientFactory: 309 - Successfully created connection to /10.233.110.180:33551 after 2 ms (0 ms spent in bootstraps)
04:05:29,569 INFO                   WorkerWatcher:  57 - Successfully connected to spark://Worker@10.233.110.180:33551
04:05:29,578 INFO                   ResourceUtils:  57 - ==============================================================
04:05:29,578 INFO                   ResourceUtils:  57 - No custom resources configured for spark.executor.
04:05:29,578 INFO                   ResourceUtils:  57 - ==============================================================
04:05:29,604 INFO    CoarseGrainedExecutorBackend:  57 - Successfully registered with driver
04:05:29,609 INFO                        Executor:  57 - Starting executor ID 100 on host 10.233.110.180
04:05:29,714 INFO                           Utils:  57 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38777.
04:05:29,714 INFO       NettyBlockTransferService:  81 - Server created on 10.233.110.180:38777
04:05:29,716 INFO                    BlockManager:  57 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
04:05:29,725 INFO              BlockManagerMaster:  57 - Registering BlockManager BlockManagerId(100, 10.233.110.180, 38777, None)
04:05:29,733 INFO              BlockManagerMaster:  57 - Registered BlockManager BlockManagerId(100, 10.233.110.180, 38777, None)
04:05:29,733 INFO                    BlockManager:  57 - Initialized BlockManager: BlockManagerId(100, 10.233.110.180, 38777, None)
04:05:29,763 INFO                        Executor:  57 - Fetching spark://premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x:34849/jars/parquet-hadoop-1.10.1-tests.jar with timestamp 1655870138005
04:05:29,786 INFO          TransportClientFactory: 309 - Successfully created connection to premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x/10.233.110.180:34849 after 2 ms (0 ms spent in bootstraps)
04:05:29,788 INFO                           Utils:  57 - Fetching spark://premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x:34849/jars/parquet-hadoop-1.10.1-tests.jar to /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/spark-80861fbe-2f9a-4393-a6dc-3d073ee2443e/fetchFileTemp6203006980548759201.tmp
04:05:29,797 INFO                           Utils:  57 - Copying /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/spark-80861fbe-2f9a-4393-a6dc-3d073ee2443e/-16998640221655870138005_cache to /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/.download/spark-3.1.1-bin-hadoop3.2/work/app-20220622035539-0000/100/./parquet-hadoop-1.10.1-tests.jar
04:05:29,815 INFO                        Executor:  57 - Adding file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/.download/spark-3.1.1-bin-hadoop3.2/work/app-20220622035539-0000/100/./parquet-hadoop-1.10.1-tests.jar to class loader
04:05:29,816 INFO                        Executor:  57 - Fetching spark://premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x:34849/jars/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar with timestamp 1655870138005
04:05:29,817 INFO                           Utils:  57 - Fetching spark://premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x:34849/jars/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar to /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/spark-80861fbe-2f9a-4393-a6dc-3d073ee2443e/fetchFileTemp2347614777550848219.tmp
04:05:31,215 INFO                           Utils:  57 - Copying /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/spark-80861fbe-2f9a-4393-a6dc-3d073ee2443e/2012559801655870138005_cache to /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/.download/spark-3.1.1-bin-hadoop3.2/work/app-20220622035539-0000/100/./rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar
04:05:31,901 INFO                        Executor:  57 - Adding file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/.download/spark-3.1.1-bin-hadoop3.2/work/app-20220622035539-0000/100/./rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar to class loader
04:05:31,902 INFO                        Executor:  57 - Fetching spark://premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x:34849/jars/rapids-4-spark-integration-tests_2.12-22.08.0-SNAPSHOT-spark311.jar with timestamp 1655870138005
04:05:31,903 INFO                           Utils:  57 - Fetching spark://premerge-ci-1-jenkins-rapids-premerge-github-4977-hlxhw-hwp5x:34849/jars/rapids-4-spark-integration-tests_2.12-22.08.0-SNAPSHOT-spark311.jar to /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/spark-80861fbe-2f9a-4393-a6dc-3d073ee2443e/fetchFileTemp5348820776430585799.tmp
04:05:31,907 INFO                           Utils:  57 - Copying /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/spark-80861fbe-2f9a-4393-a6dc-3d073ee2443e/7258204081655870138005_cache to /home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/.download/spark-3.1.1-bin-hadoop3.2/work/app-20220622035539-0000/100/./rapids-4-spark-integration-tests_2.12-22.08.0-SNAPSHOT-spark311.jar
04:05:31,917 INFO                        Executor:  57 - Adding file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/.download/spark-3.1.1-bin-hadoop3.2/work/app-20220622035539-0000/100/./rapids-4-spark-integration-tests_2.12-22.08.0-SNAPSHOT-spark311.jar to class loader
04:05:31,945 INFO                      ShimLoader:  57 - Loading shim for Spark version: 3.1.1
04:05:31,946 INFO                      ShimLoader:  57 - Complete Spark build info: 3.1.1, https://github.com/apache/spark, HEAD, 1d550c4e90275ab418b9161925049239227f3dc9, 2021-02-22T01:33:19Z
04:05:31,985 INFO                      ShimLoader:  57 - Looking for a mutable classloader (defaultClassLoader) in SparkEnv.serializer org.apache.spark.serializer.JavaSerializer@3fff175f
04:05:31,996 INFO                      ShimLoader:  57 - Extracted Spark classloader from SparkEnv.serializer org.apache.spark.util.MutableURLClassLoader@18da6a97
04:05:31,996 INFO                      ShimLoader:  57 - findURLClassLoader found a URLClassLoader org.apache.spark.util.MutableURLClassLoader@18da6a97
04:05:31,997 INFO                      ShimLoader:  57 - Updating spark classloader org.apache.spark.util.MutableURLClassLoader@18da6a97 with the URLs: jar:file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar!/spark3xx-common/, jar:file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar!/spark311/
04:05:31,999 INFO                      ShimLoader:  57 - Spark classLoader org.apache.spark.util.MutableURLClassLoader@18da6a97 updated successfully
04:05:31,999 INFO                      ShimLoader:  57 - Updating spark classloader org.apache.spark.util.MutableURLClassLoader@18da6a97 with the URLs: jar:file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar!/spark3xx-common/, jar:file:/home/jenkins/agent/workspace/jenkins-rapids_premerge-github-4977/dist/target/rapids-4-spark_2.12-22.08.0-SNAPSHOT-cuda11.jar!/spark311/
04:05:32,000 INFO                      ShimLoader:  57 - Spark classLoader org.apache.spark.util.MutableURLClassLoader@18da6a97 updated successfully
04:05:32,227 INFO               RapidsPluginUtils:  57 - RAPIDS Accelerator build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-06-22T03:08:46Z, revision=8bdc5b2e16c7d0c508663f612eef9b4b4517e402, cudf_version=22.08.0-SNAPSHOT, branch=HEAD}
04:05:32,228 INFO               RapidsPluginUtils:  57 - RAPIDS Accelerator JNI build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/NVIDIA/spark-rapids-jni.git, date=2022-06-22T02:27:42Z, revision=175742fe1b46c7e293b2ac551e6ca93101692c49, branch=HEAD}
04:05:32,229 INFO               RapidsPluginUtils:  57 - cudf build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/rapidsai/cudf.git, date=2022-06-22T02:27:42Z, revision=40ec1903e8cfa894950b3e2a91ca05bfcb7fdb63, branch=HEAD}
04:05:32,229 WARN               RapidsPluginUtils:  69 - RAPIDS Accelerator 22.08.0-SNAPSHOT using cudf 22.08.0-SNAPSHOT.
04:05:32,230 INFO            RapidsExecutorPlugin:  57 - RAPIDS Accelerator build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/NVIDIA/spark-rapids.git, date=2022-06-22T03:08:46Z, revision=8bdc5b2e16c7d0c508663f612eef9b4b4517e402, cudf_version=22.08.0-SNAPSHOT, branch=HEAD}
04:05:32,231 INFO            RapidsExecutorPlugin:  57 - cudf build: {version=22.08.0-SNAPSHOT, user=, url=https://github.com/rapidsai/cudf.git, date=2022-06-22T02:27:42Z, revision=40ec1903e8cfa894950b3e2a91ca05bfcb7fdb63, branch=HEAD}
04:05:32,702 ERROR           RapidsExecutorPlugin:  94 - Exception in the executor plugin, shutting down!
java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/rapids/HiveProvider
	at com.nvidia.spark.rapids.ShimLoader$.newHiveProvider(ShimLoader.scala:452)
	at org.apache.spark.sql.hive.rapids.GpuHiveOverrides$.exprs(GpuHiveOverrides.scala:49)
	at com.nvidia.spark.rapids.GpuOverrides$.<init>(GpuOverrides.scala:3505)
	at com.nvidia.spark.rapids.GpuOverrides$.<clinit>(GpuOverrides.scala)
	at com.nvidia.spark.rapids.TypeChecks$.areTimestampsSupported(TypeChecks.scala:801)
	at com.nvidia.spark.rapids.RapidsExecutorPlugin.init(Plugin.scala:219)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.$anonfun$executorPlugins$1(PluginContainer.scala:125)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
	at org.apache.spark.internal.plugin.ExecutorPluginContainer.<init>(PluginContainer.scala:113)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:211)
	at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:199)
	at org.apache.spark.executor.Executor.$anonfun$plugins$1(Executor.scala:253)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
	at org.apache.spark.executor.Executor.<init>(Executor.scala:253)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:159)
	at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:115)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
	at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.rapids.HiveProvider
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 31 more
04:05:32,713 ERROR                          Utils:  94 - Uncaught exception in thread shutdown-hook-0
java.lang.NullPointerException
	at org.apache.spark.executor.Executor.$anonfun$stop$3(Executor.scala:332)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.withContextClassLoader(Utils.scala:222)
	at org.apache.spark.executor.Executor.stop(Executor.scala:332)
	at org.apache.spark.executor.Executor.$anonfun$new$2(Executor.scala:76)
	at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
04:05:32,714 INFO                DiskBlockManager:  57 - Shutdown hook called
04:05:32,721 INFO             ShutdownHookManager:  57 - Shutdown hook called
04:05:32,722 INFO             ShutdownHookManager:  57 - Deleting directory /tmp/spark-8e09fbbe-141a-4473-87c0-53b5705c8477/executor-8295398f-5cd7-4b44-b69f-d4f804e45b4d/spark-80861fbe-2f9a-4393-a6dc-3d073ee2443e

@res-life (Collaborator, Author) commented:

build

@res-life (Collaborator, Author) commented Jun 22, 2022:

I pushed the above 2 commits.
The first one fixes:

    PYSP_TEST_spark_rapids_force_caller_classloader=false \
        NUM_LOCAL_EXECS=1 \
        TEST_PARALLEL=0 \
        ./integration_tests/run_pyspark_from_build.sh -k 'test_cartesian_join_special_case_count[100]'

Error:
...
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.rapids.HiveProvider
...

The second one addresses the review comments.

@gerashegalov previously approved these changes Jun 22, 2022:

LGTM, modulo an optional nit about the unsorted unshimmed*.txt

Comment on lines 4 to 5:

com/nvidia/spark/rapids/HiveProvider.class
com/nvidia/spark/rapids/AvroProvider.class

Collaborator:

Let us keep the file sorted.
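
For illustration, sorted alphabetically the two entries above would read:

com/nvidia/spark/rapids/AvroProvider.class
com/nvidia/spark/rapids/HiveProvider.class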

@res-life
Copy link
Collaborator Author

build

@res-life
Copy link
Collaborator Author

build

@res-life res-life merged commit 85e3cf7 into NVIDIA:branch-22.08 Jun 24, 2022
@res-life res-life deleted the fix-compile-time-reference branch June 24, 2022 01:11
Labels: bug (Something isn't working)

Successfully merging this pull request may close these issues:

[BUG] compile-time references to classes potentially unavailable at run time

5 participants