SPARK SQL does not work #628
Comments
Is AWS EMR 6.1.0 using Apache Spark 3.0? If not, Spark SQL on GPU won't work.

Yes, it is indeed.
The cluster does not even start without Hive enabled; the containers do not get allocated at all. The following is from the YARN logs:
There is something odd happening. The API that is throwing the …
The first thing in your list of jars is indeed cudf-0.9.2, which explains the issue.
Yeah, tried it using the below; still the same issue. The issue reported now is:

```
20/08/28 20:25:21 WARN ResourceRequestHelper: YARN doesn't know about resource yarn.io/gpu, your resource discovery has to handle properly discovering and isolating the resource! Error: The resource manager encountered a problem that should not occur under normal circumstances. Please report this error to the Hadoop community by opening a JIRA ticket at http://issues.apache.org/jira and including the following information:
```
That implies the YARN cluster has not been configured to schedule GPUs. Please check the YARN configuration files and verify the cluster is configured to support GPU scheduling: https://hadoop.apache.org/docs/r3.2.1/hadoop-yarn/hadoop-yarn-site/UsingGpus.html
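For reference, the Hadoop docs linked above describe roughly the following configuration to let YARN schedule the `yarn.io/gpu` resource. This is a sketch only; exact property values and which files apply depend on the Hadoop version and cluster setup:

```xml
<!-- resource-types.xml: declare the GPU resource type -->
<configuration>
  <property>
    <name>yarn.resource-types</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>

<!-- yarn-site.xml: enable the GPU resource plugin on the NodeManagers -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource-plugins</name>
    <value>yarn.io/gpu</value>
  </property>
</configuration>

<!-- capacity-scheduler.xml: use a resource calculator that accounts for GPUs -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>
```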
Done that; now I am getting the error "Could not load cudf jni library..". This has been mentioned in #149, which refers to cudf-0.15, but I cannot see it here: https://repo1.maven.org/maven2/ai/rapids/cudf/
If this message is occurring only on the driver node then it should be a benign message. The driver does not require a GPU or the cudf code to load in order to function.
cudf-0.15 has not yet been released. Once it has, the jar will be posted there. Note that cudf-0.15 is likely not compatible with version 0.1.0 of the plugin jar, so you should stick with cudf-0.14 as long as you are using plugin version 0.1.0.
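The pairing stated above (plugin 0.1.0 with cudf 0.14) can be captured in a small lookup. This is an illustrative helper only, not part of the plugin's API; the table holds just the pairing named in this thread:

```python
# Known-good pairing from the comment above: plugin 0.1.0 requires cudf 0.14.
# Illustrative helper, not part of the RAPIDS plugin itself.
PLUGIN_TO_CUDF = {
    "0.1.0": "0.14",
}

def compatible_cudf(plugin_version: str) -> str:
    """Return the cudf version known to work with a given plugin version."""
    try:
        return PLUGIN_TO_CUDF[plugin_version]
    except KeyError:
        raise ValueError(f"no known cudf pairing for plugin {plugin_version}")

print(compatible_cudf("0.1.0"))  # 0.14
```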
The current error: … We have the following file in EMR: …
This is the relevant portion of the error. The cudf jar is built for a specific CUDA runtime, 10.1 in this case. There is a version built for the CUDA 10.2 runtime at https://repo1.maven.org/maven2/ai/rapids/cudf/0.14/cudf-0.14-cuda10-2.jar. Typically the CUDA runtimes are installed under …
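The jar names above follow the Maven classifier scheme visible at repo1.maven.org (e.g. `cudf-0.14-cuda10-2.jar`). A small sketch of picking the right jar name for the installed CUDA runtime; the helper and its mapping are illustrative, covering only the two classifiers mentioned in this thread:

```python
# Maven classifiers for cudf builds, as seen on repo1.maven.org for cudf 0.14.
# Illustrative only; other CUDA versions may have other classifiers.
CLASSIFIERS = {
    "10.1": "cuda10-1",
    "10.2": "cuda10-2",
}

def cudf_jar_name(cudf_version: str, cuda_runtime: str) -> str:
    """Build the cudf jar filename matching the cluster's CUDA runtime."""
    classifier = CLASSIFIERS.get(cuda_runtime)
    if classifier is None:
        raise ValueError(f"no cudf build known for CUDA {cuda_runtime}")
    return f"cudf-{cudf_version}-{classifier}.jar"

print(cudf_jar_name("0.14", "10.2"))  # cudf-0.14-cuda10-2.jar
```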
Your team is absolutely brilliant; I am in fact surprised by your cooperation and support. Can this person from NVIDIA, who is causing all this confusion by not having done proper diligence, please be asked to update this article: https://aws.amazon.com/blogs/big-data/improving-rapids-xgboost-performance-and-reducing-costs-with-amazon-emr-running-amazon-ec2-g4-instances/?nc1=b_rp ? The article is clearly misleading: people who are using something like xgboost will try running SQL first to prepare their data, and this article is just frustratingly incomplete.
Just trying to help; glad you are trying out the software and are willing to work through issues! I took a quick look at the article, and it appears to be not using the RAPIDS Accelerator for Apache Spark (this project) but rather a custom solution for xgboost that was built earlier. The notebook referred to in that article uses an old …

It appears the updated getting started guide is now at https://github.com/NVIDIA/spark-xgboost-examples/blob/spark-3/getting-started-guides/csp/aws/ec2.md, which shows running with the RAPIDS Accelerator plugin and xgboost, but it does so under EC2 rather than EMR, running Spark in standalone mode rather than with Spark-on-YARN. There probably needs to be a getting started guide for EMR for those who would rather work in that environment. Would you be willing to file an issue in the https://github.com/NVIDIA/spark-xgboost-examples repo requesting an AWS EMR getting started guide?
Hi @jlowe, I am currently working on it. Is there a way I could contribute, write the notebook, and check it in?
You can definitely write up a notebook and submit a pull request against https://github.com/NVIDIA/spark-xgboost-examples; that would be great! I can't guarantee it would be accepted verbatim or at all, as it's up to the committers in that repository to review and ultimately decide to accept the contribution. However, in general contributions of all types (issues reported, features requested, pull requests posted, etc.) are welcome! I would recommend filing the issue first and posting a followup comment to the issue stating you are interested in working on it and plan on posting a pull request. Then you can fork the repo in GitHub and put your notebook changes on a branch off of the …

I'm going to close this issue since it's a documentation issue in the spark-xgboost-examples repo rather than a bug in the RAPIDS Accelerator plugin.
Describe the bug
In AWS EMR, Spark SQL does not work. The error thrown is: `cannot run on GPU because no GPU enabled version of operator class org.apache.spark.sql.execution.datasources.v2.SetCatalogAndNamespaceExec could be found`
Steps/Code to reproduce bug
NOTE that https://github.com/apache/spark/blob/master/examples/src/main/scripts/getGpusResources.sh was downloaded onto all the nodes as part of the bootstrap actions.
```python
from pyspark.sql import SparkSession
from pyspark import SparkConf

conf = SparkConf().setAppName("MortgageETL")
# Note: both cudf-0.9.2 and cudf-0.14 are on this list; the discussion above
# identifies that conflict as the root cause of the original error.
conf.set("spark.jars", "s3://gourav-bucket/gourav/gpu/cudf-0.9.2.jar,s3://gourav-bucket/gourav/gpu/rapids-4-spark_2.12-0.1.0.jar,s3://gourav-bucket/gourav/gpu/cudf-0.14-cuda10-1.jar")
conf.set("spark.rapids.sql.explain", "ALL")
conf.set("spark.executor.instances", "20")
conf.set("spark.executor.cores", "2")
conf.set("spark.task.cpus", "1")
conf.set("spark.rapids.sql.concurrentGpuTasks", "1")
conf.set("spark.executor.memory", "4g")
conf.set("spark.rapids.memory.pinnedPool.size", "1G")
conf.set("spark.executor.memoryOverhead", "2G")
conf.set("spark.executor.extraJavaOptions", "-Dai.rapids.cudf.prefer-pinned=true")
conf.set("spark.locality.wait", "0s")
conf.set("spark.sql.files.maxPartitionBytes", "512m")
conf.set("spark.executor.resource.gpu.amount", "1")
conf.set("spark.task.resource.gpu.amount", "0.25")
conf.set("spark.plugins", "com.nvidia.spark.SQLPlugin")
conf.set("spark.rapids.sql.hasNans", "false")
conf.set("spark.rapids.sql.batchSizeBytes", "512M")
conf.set("spark.rapids.sql.reader.batchSizeBytes", "768M")
conf.set("spark.rapids.sql.variableFloatAgg.enabled", "true")
conf.set("spark.sql.adaptive.enabled", "false")  # pass as a string, not a Python bool
conf.set("spark.executor.resource.gpu.discoveryScript", "/mnt/mapred/getGpusResources.sh")

# The builder calls must be one chained expression (or wrapped in parentheses);
# the original had them split across bare lines, which is a syntax error.
spark = (SparkSession.builder
         .enableHiveSupport()
         .config(conf=conf)
         .master("yarn")
         .getOrCreate())
```
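The discussion above traced the original failure to having two different cudf jars (0.9.2 and 0.14) on `spark.jars`. A minimal sanity check, as a sketch, that scans a jar list for that conflict before building the session; the helper is ours, not part of the plugin:

```python
def cudf_jars(spark_jars: str) -> list:
    """Return the cudf jars found in a comma-separated spark.jars value."""
    return [j for j in spark_jars.split(",")
            if j.split("/")[-1].startswith("cudf-")]

jars = ("s3://gourav-bucket/gourav/gpu/cudf-0.9.2.jar,"
        "s3://gourav-bucket/gourav/gpu/rapids-4-spark_2.12-0.1.0.jar,"
        "s3://gourav-bucket/gourav/gpu/cudf-0.14-cuda10-1.jar")

found = cudf_jars(jars)
if len(found) > 1:
    print("conflicting cudf jars:", found)
```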
Expected behavior
The code should work.
Environment details (please complete the following information)
Additional context
The code does not work either with or without Hive support enabled.