
[BUG] java.lang.NullPointerException when using spark.sql.extensions=com.nvidia.spark.rapids.SQLExecPlugin #460

Closed
wbo4958 opened this issue Jul 29, 2020 · 5 comments
Labels
bug Something isn't working

Comments

@wbo4958
Collaborator

wbo4958 commented Jul 29, 2020

Describe the bug

If the user is using the configs as follows,

--conf spark.sql.extensions=com.nvidia.spark.rapids.SQLExecPlugin

Then RapidsExecutorPlugin will not be initialized, so GpuShuffleEnv will not be initialized either. Any code that calls a GpuShuffleEnv method may then hit a NullPointerException, because that code assumes GpuShuffleEnv has already been initialized. The exception looks like this:

20/07/29 15:51:29 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 10.19.183.93, executor 0): java.lang.NullPointerException
	at org.apache.spark.sql.rapids.GpuShuffleEnv$.isRapidsShuffleEnabled(GpuShuffleEnv.scala:128)
	at com.nvidia.spark.rapids.GpuPartitioning.sliceInternalGpuOrCpu(GpuPartitioning.scala:99)
	at com.nvidia.spark.rapids.GpuPartitioning.sliceInternalGpuOrCpu$(GpuPartitioning.scala:97)
	at com.nvidia.spark.rapids.GpuRoundRobinPartitioning.sliceInternalGpuOrCpu(GpuRoundRobinPartitioning.scala:34)
	at com.nvidia.spark.rapids.GpuRoundRobinPartitioning.columnarEval(GpuRoundRobinPartitioning.scala:81)
	at com.nvidia.spark.rapids.GpuShuffleExchangeExec$.$anonfun$prepareBatchShuffleDependency$2(GpuShuffleExchangeExec.scala:156)
	at com.nvidia.spark.rapids.GpuShuffleExchangeExec$$anon$1.partNextBatch(GpuShuffleExchangeExec.scala:177)
	at com.nvidia.spark.rapids.GpuShuffleExchangeExec$$anon$1.hasNext(GpuShuffleExchangeExec.scala:188)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:132)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
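The failure mode can be summarized as follows: the shared shuffle state is only populated by the executor plugin's init path, so when only the SQL extension is loaded, the accessor dereferences a null reference. A minimal sketch of the pattern (class and member names here are simplified stand-ins, not the actual plugin source):

```scala
// Illustrative sketch only; names are hypothetical simplifications.
class RapidsConf(val shuffleEnabled: Boolean)

class GpuShuffleEnv(conf: RapidsConf) {
  def rapidsShuffleEnabled: Boolean = conf.shuffleEnabled
}

object GpuShuffleEnv {
  // Set by the executor plugin's init(); stays null when only
  // spark.sql.extensions=...SQLExecPlugin is configured.
  private var env: GpuShuffleEnv = _

  def init(conf: RapidsConf): Unit = {
    env = new GpuShuffleEnv(conf)
  }

  // Dereferences `env` unconditionally, so this throws
  // NullPointerException at task time if init() never ran.
  def isRapidsShuffleEnabled: Boolean = env.rapidsShuffleEnabled
}
```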

Steps/Code to reproduce bug
Use

--conf spark.sql.extensions=com.nvidia.spark.rapids.SQLExecPlugin
@wbo4958 wbo4958 added bug Something isn't working ? - Needs Triage Need team to review and classify labels Jul 29, 2020
@jlowe
Member

jlowe commented Jul 29, 2020

I don't believe our intention is to support specifying the SQLExecPlugin exclusively. The getting started guide and other documentation specify to use: --conf spark.plugins=com.nvidia.spark.SQLPlugin. Besides the shuffle environment not being set up properly with this config, the RMM pool, pinned memory pool, and GPU semaphore won't be initialized either.

If there is documentation stating to configure Spark with --conf spark.sql.extensions=com.nvidia.spark.rapids.SQLExecPlugin then we need to update it.
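For reference, the documented invocation looks like the following (the jar path is a placeholder for whatever your deployment uses):

```shell
# Recommended configuration per the getting started guide.
# The jar path/name below is a placeholder, not an exact artifact name.
spark-submit \
  --jars /path/to/rapids-4-spark.jar \
  --conf spark.plugins=com.nvidia.spark.SQLPlugin \
  ...
```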

@jlowe jlowe removed the ? - Needs Triage Need team to review and classify label Jul 29, 2020
@abellina
Collaborator

Agree with @jlowe.

I think if this case could be detected and a user-friendly error thrown, that would be a good small fix. Thoughts?

For this particular NPE, some other, more user-friendly exception could be thrown instead.
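One possible shape for such a check (a sketch under assumed names, not the actual fix): have the accessor fail fast with a descriptive message instead of dereferencing null state:

```scala
// Sketch only; GpuShuffleEnv internals are simplified here.
class GpuShuffleEnv(val rapidsShuffleEnabled: Boolean)

object GpuShuffleEnv {
  // Set by the executor plugin's init(); null if the plugin never ran.
  private var env: GpuShuffleEnv = _

  def init(shuffleEnabled: Boolean): Unit = {
    env = new GpuShuffleEnv(shuffleEnabled)
  }

  private def checkInitialized(): Unit = {
    if (env == null) {
      throw new IllegalStateException(
        "GpuShuffleEnv is not initialized. The RAPIDS executor plugin " +
        "did not run; configure spark.plugins=com.nvidia.spark.SQLPlugin " +
        "instead of only spark.sql.extensions.")
    }
  }

  def isRapidsShuffleEnabled: Boolean = {
    checkInitialized()
    env.rapidsShuffleEnabled
  }
}
```

This replaces the opaque NullPointerException with an error that names the misconfiguration and the fix.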

@wbo4958
Collaborator Author

wbo4958 commented Jul 29, 2020

I also agree with @jlowe. But if we use spark.plugins in Spark Standalone mode, it seems we need to copy the rapids jar to every node, which is tedious.

Also, the ML use case may not need the RMM/GPU shuffle pieces, and the GPU semaphore could seemingly be initialized lazily when first used.
BTW, it worked well using spark.sql.extensions previously. Anyway, I'm OK if we require users to use spark.plugins.

@revans2
Collaborator

revans2 commented Jul 30, 2020

@wbo4958

If you want to be able to support that use case then file a feature request with what you want to support. We can then prioritize it on the backlog and try to figure out what is the right way to support it.

@wbo4958
Collaborator Author

wbo4958 commented Jul 31, 2020

Closing this issue and filing FEA #479.

@wbo4958 wbo4958 closed this as completed Jul 31, 2020
pxLi pushed a commit to pxLi/spark-rapids that referenced this issue May 12, 2022
tgravescs pushed a commit to tgravescs/spark-rapids that referenced this issue Nov 30, 2023
…#460)

Add ability to provide a seed value in config. Closes NVIDIA#452  
    
Signed-off-by: Gera Shegalov <gera@apache.org>