Extended the FAQ (NVIDIA#401)

* Extended the FAQ Signed-off-by: Robert (Bobby) Evans <bobby@apache.org> * Addressed review comments Signed-off-by: Robert (Bobby) Evans <bobby@apache.org>
nartal1 · Jul 22, 2020 · ce684f7 · ce684f7
1 parent e8cb4dd
commit ce684f7
Showing 1 changed file with 181 additions and 1 deletion.
diff --git a/docs/FAQ.md b/docs/FAQ.md
@@ -13,7 +13,187 @@ settings. This is true of all configs in Spark. If you changed
 `spark.sql.autoBroadcastJoinThreshold` after running `explain()` on a `DataFrame`, the resulting
 query would not change to reflect that config and still show a `SortMergeJoin` even though the
 new config might have changed to be a `BroadcastHashJoin` instead. When actually running something
-like with `collect`, `show` or `write` a new `DataFrame` is constructed causing spark to replan the
+like with `collect`, `show` or `write` a new `DataFrame` is constructed causing spark to re-plan the
 query. This is why `spark.rapids.sql.enabled` is still respected when running, even if explain
 shows stale results.
 
+### What versions of Apache Spark does the RAPIDS Accelerator for Apache Spark support?
+
+The RAPIDS Accelerator for Apache Spark requires version 3.0.0 of Apache Spark. Because the plugin
+replaces parts of the physical plan that Apache Spark considers to be internal the code for those
+plans can change even between bug fix releases. As a part of our process, we try to stay on top of
+these changes and release updates as quickly as possible.
+
+### Which distributions are supported?
+
+The RAPIDS Accelerator for Apache Spark officially supports
+[Apache Spark](get-started/getting-started.md),
+[Databricks Runtime 7.0](get-started/getting-started-with-rapids-accelerator-on-databricks.md)
+and [Google Cloud Dataproc](get-started/getting-started-gcp.md).
+Most distributions based off of Apache Spark 3.0.0 should work, but because the plugin replaces
+parts of the physical plan that Apache Spark considers to be internal the code for those plans
+can change from one distribution to another. We are working with most cloud service providers to
+set up testing and validation on their distributions.
+
+### What is the right hardware setup to run GPU accelerated Spark?
+
+Reference Architectures should be available around Q4 2020.
+
+### What CUDA versions are supported?
+
+CUDA 10.1 and 10.2 are currently supported, but you need to download the cudf jar that corresponds
+to the version you are using. Please look [here][version/stable-release.md] for download links
+for the stable release.
+
+### What parts of Apache Spark are accelerated?
+
+Currently a limited set of SQL and DataFrame operations are supported, please see the
+[configs](configs.md) for a more complete list of what is supported. Some of structured streaming
+is likely to be accelerated, but it has not been an area of focus right now. Other areas like
+MLLib, GraphX or RDDs are not accelerated.
+
+### What is the road-map like?
+
+Please look at the github repository https://github.com/nvidia/spark-rapids It contains
+issue tracking and planning for sprints and releases.
+
+### How much faster will my query run?
+
+Any single operator isn’t a fixed amount faster. So there is no simple algorithm to see how much
+faster a query will run. In addition, Apache Spark can store intermediate data to disk and send it
+across the network, both of which we typically see as bottlenecks in real world queries. Generally
+for complicated queries where all the processing can run on the GPU we see between 3x and 7x
+speedup, with a 4x speedup typical. We have seen as high as 100x in some specific cases.
+
+### What operators are best suited for the GPU?
+
+* Group by operations with high cardinality
+* Joins with a high cardinality
+* Sorts with a high cardinality
+* Window operations, especially for large windows
+* Complicated processing
+* Writing Parquet/ORC
+* Reading CSV
+* Transcoding (reading an input file and doing minimal processing before writing it out again,
+possibly in a different format, like CSV to parquet)
+
+### Are there initialization costs?
+
+From our tests the GPU typically takes about 2 to 3 seconds to initialize when an executor first
+starts. If you are only going to run a single query that only takes a few seconds to run this can
+be problematic. In general if you are going to do 30 seconds or more of processing within a single
+session the overhead can be amortized.
+
+### Why is the size of my output Parquet/ORC file different?
+
+This can come down to a number of factors.  The GPU version often compresses data in smaller chunks
+to get more parallelism and performance. This can result in larger files in some instances. We have
+also seen instances where the ordering of the data can have a big impact on the output size of the
+files.  Spark tends to prefer sort based joins, and in some cases sort based aggregations, whereas
+the GPU versions are all hash based. This means that the resulting data can come out in a different
+order for the CPU and the GPU. This is not wrong, but can make the size of the output data
+different because of compression. Users can turn on
+(spark.rapids.sql.hashOptimizeSort.enabled)[configs.md#sql.hashOptimizeSort.enabled] to have
+the GPU try to replicate more closely what the output ordering would have been if sort were used,
+like on the CPU.
+
+### How are failures handled?
+
+The RAPIDS accelerator does not change the way failures are normally handled by Apache Spark.
+
+### How does the Spark scheduler decide what to do on the GPU vs the CPU?
+
+Technically the Spark scheduler does not make those decisions. The plugin has a set of rules that
+decide if an operation can safely be replaced by a GPU enabled version. We are working on some cost
+based optimizations to try and improve performance for some situations where it might be more
+efficient to stay on the CPU instead of going back and forth.
+
+### Is Dynamic Partition Pruning (DPP) Supported?
+
+Yes, DPP still works.  It might not be as efficient as it could be, and we are working to improve it.
+
+### Is Adaptive Query Execution (AQE) Supported?
+
+We are in the process of making sure AQE works. Some parts work now, but other parts require some
+changes to the internals of Spark, that we are working with the community to be able to support.
+
+### Are cache and persist supported?
+
+Yes cache and persist are supported, but they are not GPU accelerated yet. We are working with
+the Spark community on changes that would allow us to accelerate compression when caching data.
+
+### Can I cache data into GPU memory?
+
+No, that is not currently supported. It would require much larger changes to Apache Spark to be able
+to support this.
+
+### Is pyspark supported?
+
+Yes
+
+### Are the R APIs for Spark supported?
+
+Yes, but we don't actively test them.
+
+## Are the Java APIs for Spark supported?
+
+Yes, but we don't actively test them.
+
+## Are the Scala APIs for Spark supported?
+
+Yes
+
+## Is the GPU needed on the driver?  Are there any benefits to having a GPU on the driver?
+
+The GPU is not needed on the driver and there is no benefit to having one available on the driver
+for the RAPIDS plugin.
+
+## How does the performance compare to DataBricks' DeltaEngine?
+
+We have not evaluated the performance yet. DeltaEngine is not open source, so any analysis needs to
+be done with Databricks in some form. When DeltaEngine is generally available and the terms of
+service allow it, we will look into doing a comparison.
+
+### How many tasks can I run per executor? How many should I run per executor?
+
+There is no limit on the number of tasks per executor that you can run.  Generally we recommend 2 to
+6 tasks per executor and 1 GPU per executor. The GPU typically benefits from having 2 tasks run
+in [parallel](configs.md#sql.concurrentGpuTasks) on it at a time, assuming your GPU has enough
+memory to support that. Having 2 to 3 times as many tasks off of the GPU as on the GPU allows for
+I/O to be run in parallel with the processing. If you increase the tasks too high you can overload
+the I/O and starting the initial processing can suffer.  But if you have a lot of processing that
+cannot be done on the GPU, like complex UDFs, the more tasks you have the more CPU processing you
+can throw at it.
+
+### How can I run custom expressions/UDFs on the GPU?
+
+We do not currently support this, but we are working on ways to make it possible.
+
+### Why am I getting an error when trying to use pinned memory?
+
+```
+Caused by: ai.rapids.cudf.CudaException: OS call failed or operation not supported on this OS
+	at ai.rapids.cudf.Cuda.hostAllocPinned(Native Method)
+	at ai.rapids.cudf.PinnedMemoryPool.<init>(PinnedMemoryPool.java:254)
+	at ai.rapids.cudf.PinnedMemoryPool.lambda$initialize$1(PinnedMemoryPool.java:185)
+	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
+```
+
+This is typically caused by the IOMMU being enabled.  Please see the
+[CUDA docs](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#iommu-on-linux)
+for this issue.
+
+To fix it you can either disable the IOMMU, or you can disable using pinned memory by setting
+[spark.rapids.memory.pinnedPool.size](configs.md#memory.pinnedPool.size) to 0.
+
+# Is speculative execution supported?
+
+Yes, speculative execution in Spark is fine with the RAPIDS accelerator plugin.
+
+As with all speculative execution, it may or may not be beneficial depending on the nature of why a
+particular task is slow and how easily speculation is triggered. You should monitor your Spark jobs
+to see how often task speculation occurs and how often the speculating task (i.e.: the one launched
+later) finishes before the slow task that triggered speculation. If the speculating task often
+finishes first then that's good, it is working as intended. If many tasks are speculating, but the
+original task always finishes first then this is a pure loss, the speculation is adding load to
+the Spark cluster with no benefit.