default rmm alloc fraction to the max to avoid unnecessary fragmentation (NVIDIA#2846)

* default rmm alloc fraction to the max to avoid unnecessary fragmentation

Signed-off-by: Rong Ou <rong.ou@gmail.com>

* initial allocation takes into account of the reserve

Signed-off-by: Rong Ou <rong.ou@gmail.com>
Signed-off-by: Raza Jafri <rjafri@nvidia.com>
razajafri · Aug 23, 2021 · c97a672
1 parent 03aab1f
commit c97a672

Showing 3 changed files with 6 additions and 5 deletions.
diff --git a/docs/configs.md b/docs/configs.md
@@ -31,7 +31,7 @@ Name | Description | Default Value
 -----|-------------|--------------
 <a name="alluxio.pathsToReplace"></a>spark.rapids.alluxio.pathsToReplace|List of paths to be replaced with corresponding alluxio scheme. Eg, when configureis set to "s3:/foo->alluxio://0.1.2.3:19998/foo,gcs:/bar->alluxio://0.1.2.3:19998/bar", which means:       s3:/foo/a.csv will be replaced to alluxio://0.1.2.3:19998/foo/a.csv and      gcs:/bar/b.csv will be replaced to alluxio://0.1.2.3:19998/bar/b.csv|None
 <a name="cloudSchemes"></a>spark.rapids.cloudSchemes|Comma separated list of additional URI schemes that are to be considered cloud based filesystems. Schemes already included: dbfs, s3, s3a, s3n, wasbs, gs. Cloud based stores generally would be total separate from the executors and likely have a higher I/O read cost. Many times the cloud filesystems also get better throughput when you have multiple readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type|None
-<a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|0.9
+<a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of available GPU memory that should be initially allocated for pooled memory. Extra memory will be allocated as needed, but it may result in more fragmentation. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction.|1.0
 <a name="memory.gpu.debug"></a>spark.rapids.memory.gpu.debug|Provides a log of GPU memory allocations and frees. If set to STDOUT or STDERR the logging will go there. Setting it to NONE disables logging. All other values are reserved for possible future expansion and in the mean time will disable logging.|NONE
 <a name="memory.gpu.direct.storage.spill.batchWriteBuffer.size"></a>spark.rapids.memory.gpu.direct.storage.spill.batchWriteBuffer.size|The size of the GPU memory buffer used to batch small buffers when spilling to GDS. Note that this buffer is mapped to the PCI Base Address Register (BAR) space, which may be very limited on some GPUs (e.g. the NVIDIA T4 only has 256 MiB), and it is also used by UCX bounce buffers.|8388608
 <a name="memory.gpu.direct.storage.spill.enabled"></a>spark.rapids.memory.gpu.direct.storage.spill.enabled|Should GPUDirect Storage (GDS) be used to spill GPU memory buffers directly to disk. GDS must be enabled and the directory `spark.local.dir` must support GDS. This is an experimental feature. For more information on GDS, see https://docs.nvidia.com/gpudirect-storage/.|false

diff --git a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuDeviceManager.scala b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuDeviceManager.scala
@@ -171,8 +171,11 @@ object GpuDeviceManager extends Logging {
     // Align workaround for https://github.com/rapidsai/rmm/issues/527
     def truncateToAlignment(x: Long): Long = x & ~511L
 
-    var initialAllocation = truncateToAlignment((conf.rmmAllocFraction * info.free).toLong)
     val minAllocation = truncateToAlignment((conf.rmmAllocMinFraction * info.total).toLong)
+    val maxAllocation = truncateToAlignment((conf.rmmAllocMaxFraction * info.total).toLong)
+    val reserveAmount = conf.rmmAllocReserve
+    var initialAllocation = truncateToAlignment(
+      (conf.rmmAllocFraction * (info.free - reserveAmount)).toLong)
     if (initialAllocation < minAllocation) {
       throw new IllegalArgumentException(s"The initial allocation of " +
         s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
@@ -181,7 +184,6 @@ object GpuDeviceManager extends Logging {
         s"${RapidsConf.RMM_ALLOC_MIN_FRACTION} (=${conf.rmmAllocMinFraction}) " +
         s"and ${toMB(info.total)} MB total memory)")
     }
-    val maxAllocation = truncateToAlignment((conf.rmmAllocMaxFraction * info.total).toLong)
     if (maxAllocation < initialAllocation) {
       throw new IllegalArgumentException(s"The initial allocation of " +
         s"${toMB(initialAllocation)} MB (calculated from ${RapidsConf.RMM_ALLOC_FRACTION} " +
@@ -190,7 +192,6 @@ object GpuDeviceManager extends Logging {
         s"${RapidsConf.RMM_ALLOC_MAX_FRACTION} (=${conf.rmmAllocMaxFraction}) " +
         s"and ${toMB(info.total)} MB total memory)")
     }
-    val reserveAmount = conf.rmmAllocReserve
     if (reserveAmount >= maxAllocation) {
       throw new IllegalArgumentException(s"RMM reserve memory (${toMB(reserveAmount)} MB) " +
           s"larger than maximum pool size (${toMB(maxAllocation)} MB). Check the settings for " +

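The GpuDeviceManager hunk above changes how the initial pool size is derived: the reserve is now subtracted from free memory before the fraction is applied. The following is a minimal stand-alone sketch of that arithmetic, not the plugin's code. The object name, the function names `initialAllocationOld` and `initialAllocationNew`, and the memory sizes used in the usage note are hypothetical; only `truncateToAlignment` and the two expressions are taken from the diff.

```scala
// Hypothetical stand-alone sketch; it mirrors the expressions in the diff
// but does not depend on RapidsConf or the CUDA memory info object.
object InitialAllocationSketch {
  // Same 512-byte alignment mask as truncateToAlignment in the diff.
  def truncateToAlignment(x: Long): Long = x & ~511L

  // Before this commit: the fraction is applied to all free memory.
  def initialAllocationOld(allocFraction: Double, free: Long): Long =
    truncateToAlignment((allocFraction * free).toLong)

  // After this commit: the reserve is subtracted from free memory first.
  def initialAllocationNew(allocFraction: Double, free: Long, reserve: Long): Long =
    truncateToAlignment((allocFraction * (free - reserve)).toLong)
}
```

For example, assuming 16 GiB free and a 1 GiB reserve (both values chosen only for illustration), `initialAllocationNew(1.0, 16L << 30, 1L << 30)` yields 15 GiB, whereas `initialAllocationOld(1.0, 16L << 30)` would request all 16 GiB. This suggests why the reserve adjustment accompanies the new default of 1.0: without it, the initial pool would leave no room for the reserve.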
diff --git a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala b/sql-plugin/src/main/scala/com/nvidia/spark/rapids/RapidsConf.scala
@@ -330,7 +330,7 @@ object RapidsConf {
       s"configured via $RMM_ALLOC_MAX_FRACTION_KEY.")
     .doubleConf
     .checkValue(v => v >= 0 && v <= 1, "The fraction value must be in [0, 1].")
-    .createWithDefault(0.9)
+    .createWithDefault(1)
 
   val RMM_ALLOC_MAX_FRACTION = conf(RMM_ALLOC_MAX_FRACTION_KEY)
     .doc("The fraction of total GPU memory that limits the maximum size of the RMM pool. " +