Add Alluxio auto mount feature #5925

Merged: 24 commits, Jul 26, 2022
docs/configs.md (5 changes: 4 additions & 1 deletion)
@@ -29,7 +29,10 @@ scala> spark.conf.set("spark.rapids.sql.incompatibleOps.enabled", true)

Name | Description | Default Value
-----|-------------|--------------
<a name="alluxio.pathsToReplace"></a>spark.rapids.alluxio.pathsToReplace|List of paths to be replaced with corresponding alluxio scheme. Eg, when configureis set to "s3:/foo->alluxio://0.1.2.3:19998/foo,gcs:/bar->alluxio://0.1.2.3:19998/bar", which means: s3:/foo/a.csv will be replaced to alluxio://0.1.2.3:19998/foo/a.csv and gcs:/bar/b.csv will be replaced to alluxio://0.1.2.3:19998/bar/b.csv|None
<a name="alluxio.automount.enabled"></a>spark.rapids.alluxio.automount.enabled|Enable the feature of auto mounting the cloud storage to Alluxio. It requires the Alluxio master is the same node of Spark driver node. When it's true, it requires an environment variable ALLUXIO_HOME be set properly. The default value of ALLUXIO_HOME is "/opt/alluxio-2.8.0". You can set it as an environment variable when running a spark-submit or you can use spark.yarn.appMasterEnv.ALLUXIO_HOME to set it on Yarn. The Alluxio master's host and port will be read from alluxio.master.hostname and alluxio.master.rpc.port(default: 19998) from ALLUXIO_HOME/conf/alluxio-site.properties, then replace a cloud path which matches spark.rapids.alluxio.bucket.regex like "s3://bar/b.csv" to "alluxio://0.1.2.3:19998/bar/b.csv", and the bucket "s3://bar" will be mounted to "/bar" in Alluxio automatically.|false
<a name="alluxio.bucket.regex"></a>spark.rapids.alluxio.bucket.regex|A regex to decide which bucket should be auto-mounted to Alluxio. E.g. when setting as "^s3://bucket.*", the bucket which starts with "s3://bucket" will be mounted to Alluxio and the path "s3://bucket-foo/a.csv" will be replaced to "alluxio://0.1.2.3:19998/bucket-foo/a.csv". It's only valid when setting spark.rapids.alluxio.automount.enabled=true. The default value matches all the buckets in "s3://" or "s3a://" scheme.|^s3a{0,1}://.*
tgravescs marked this conversation as resolved.
<a name="alluxio.cmd"></a>spark.rapids.alluxio.cmd|Provide the Alluxio command, which is used to mount or get information. The default value is "su,ubuntu,-c,/opt/alluxio-2.8.0/bin/alluxio", it means: run Process(Seq("su", "ubuntu", "-c", "/opt/alluxio-2.8.0/bin/alluxio fs mount --readonly /bucket-foo s3://bucket-foo")), to mount s3://bucket-foo to /bucket-foo. the delimiter "," is used to convert to Seq[String] when you need to use a special user to run the mount command.|List(su, ubuntu, -c, /opt/alluxio-2.8.0/bin/alluxio)
<a name="alluxio.pathsToReplace"></a>spark.rapids.alluxio.pathsToReplace|List of paths to be replaced with corresponding Alluxio scheme. E.g. when configure is set to "s3://foo->alluxio://0.1.2.3:19998/foo,gs://bar->alluxio://0.1.2.3:19998/bar", it means: "s3://foo/a.csv" will be replaced to "alluxio://0.1.2.3:19998/foo/a.csv" and "gs://bar/b.csv" will be replaced to "alluxio://0.1.2.3:19998/bar/b.csv". To use this config, you have to mount the buckets to Alluxio by yourself. If you set this config, spark.rapids.alluxio.automount.enabled won't be valid.|None
<a name="cloudSchemes"></a>spark.rapids.cloudSchemes|Comma separated list of additional URI schemes that are to be considered cloud based filesystems. Schemes already included: abfs, abfss, dbfs, gs, s3, s3a, s3n, wasbs. Cloud based stores generally would be total separate from the executors and likely have a higher I/O read cost. Many times the cloud filesystems also get better throughput when you have multiple readers in parallel. This is used with spark.rapids.sql.format.parquet.reader.type|None
<a name="gpu.resourceName"></a>spark.rapids.gpu.resourceName|The name of the Spark resource that represents a GPU that you want the plugin to use if using custom resources with Spark.|gpu
<a name="memory.gpu.allocFraction"></a>spark.rapids.memory.gpu.allocFraction|The fraction of available (free) GPU memory that should be allocated for pooled memory. This must be less than or equal to the maximum limit configured via spark.rapids.memory.gpu.maxAllocFraction, and greater than or equal to the minimum limit configured via spark.rapids.memory.gpu.minAllocFraction.|1.0
docs/get-started/getting-started-alluxio.md (42 changes: 42 additions & 0 deletions)
@@ -244,10 +244,14 @@ NM_hostname_2
```

For other filesystems, please refer to [this site](https://www.alluxio.io/).
We also provide an auto mount feature for easier usage.
Please refer to [Alluxio auto mount for AWS S3 buckets](#alluxio-auto-mount-for-aws-s3-buckets).

## RAPIDS Configuration

There are two ways to leverage Alluxio in RAPIDS.
We also provide an auto mount option for AWS S3 buckets if Alluxio is installed in your Spark cluster.
Please refer to [Alluxio auto mount for AWS S3 buckets](#alluxio-auto-mount-for-aws-s3-buckets).

1. Explicitly specify the Alluxio path

@@ -312,6 +316,44 @@ There are two ways to leverage Alluxio in RAPIDS.
--conf spark.executor.extraJavaOptions="-Dalluxio.conf.dir=${ALLUXIO_HOME}/conf" \
```
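
To recap this explicit approach, a minimal sketch of the relevant configs might look like the following (the Alluxio master address, bucket names, and application jar are placeholders):

``` shell
spark-submit \
  --conf spark.rapids.alluxio.pathsToReplace="s3://foo->alluxio://0.1.2.3:19998/foo,gs://bar->alluxio://0.1.2.3:19998/bar" \
  --conf spark.driver.extraJavaOptions="-Dalluxio.conf.dir=${ALLUXIO_HOME}/conf" \
  --conf spark.executor.extraJavaOptions="-Dalluxio.conf.dir=${ALLUXIO_HOME}/conf" \
  my-app.jar
```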

## Alluxio auto mount for AWS S3 buckets

There's a more user-friendly way to use Alluxio with RAPIDS when accessing S3 buckets.
tgravescs marked this conversation as resolved.
Suppose that a user has multiple buckets on AWS S3.
Using `spark.rapids.alluxio.pathsToReplace` requires mounting all the buckets beforehand
and adding the path replacements to this config one by one, which becomes tedious when there are many buckets.

To solve this problem, we added an Alluxio auto mount feature, which can mount the S3 buckets
automatically when they are found in the input paths on the Spark driver.
This feature requires that the node running the Spark driver has Alluxio installed,
which means the node is also the master of the Alluxio cluster. It uses the `alluxio fs mount` command to
mount the buckets in Alluxio, so the uid used to run the Spark application must be able to run the alluxio command.
For example, the uid of the Spark application is the same as the uid of the Alluxio service,
or the Spark application can use `su alluxio_uid` to run the alluxio command.
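
For illustration, with the default `spark.rapids.alluxio.cmd` value described in `docs/configs.md`, auto mounting a bucket is roughly equivalent to running a command like the following by hand (the bucket name `bucket-foo` is just an example):

``` shell
# Mount s3://bucket-foo read-only at /bucket-foo in Alluxio, as the "ubuntu" user
su ubuntu -c "/opt/alluxio-2.8.0/bin/alluxio fs mount --readonly /bucket-foo s3://bucket-foo"
```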

To enable the Alluxio auto mount feature, the simplest way is to set only the config below,
without setting `spark.rapids.alluxio.pathsToReplace`, which takes precedence over the auto mount feature.
``` shell
--conf spark.rapids.alluxio.automount.enabled=true
```
If Alluxio is not installed in /opt/alluxio-2.8.0, you should set the environment variable `ALLUXIO_HOME`.
Collaborator:

I think we should rephrase this just to state that Alluxio must be installed and ALLUXIO_HOME must be set to the installation location.

Collaborator Author:

Just hoping the user doesn't need to set it explicitly when Alluxio is installed in /opt/alluxio-2.8.0.
Otherwise, the user must remember to set it when creating the DB cluster.
More user-friendly, I think.
@viadea What do you think?

Collaborator:

I am fine with the current setting.

In the future, when the latest Alluxio is 2.9 for example, we can update the default value and docs as well.

Collaborator Author:

I removed this line, since now we only read the alluxio command path from the spark.rapids.alluxio.cmd config.
ALLUXIO_HOME is only used for reading Alluxio configurations.


Additional configs:
``` shell
--conf spark.rapids.alluxio.bucket.regex="^s3a{0,1}://.*"
```
The regex is used to match the S3 URIs, to decide which buckets should be auto mounted.
The default value matches all the URIs that start with `s3://` or `s3a://`.
For example, `^s3a{1,1}://foo.*` will match only the `s3a://` buckets whose names start with `foo`.
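
As a quick sanity check of how such a regex behaves, here is a sketch using `grep -E` with hypothetical URIs; the matched URIs are the ones that would be auto mounted:

``` shell
for uri in s3://bucket-foo/a.csv s3a://bar/b.csv gs://baz/c.csv; do
  # Test each URI against the default bucket regex
  echo "$uri" | grep -Eq '^s3a{0,1}://' && echo "$uri -> auto mounted" || echo "$uri -> skipped"
done
```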

```shell
--conf spark.rapids.alluxio.cmd="su,ubuntu,-c,/opt/alluxio-2.8.0/bin/alluxio"
```
This cmd config defines the command sequence used to run the alluxio command as a specific user,
usually a user with Alluxio permissions. By default, we run the command as user `ubuntu`.
If you have a different user or command path, you can redefine it.
The default value is suitable for running Alluxio with RAPIDS on Databricks.
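
For instance, if the alluxio command should run as a hypothetical user `alluxio` with Alluxio installed under `/usr/local/alluxio`, the config could be redefined as:

```shell
--conf spark.rapids.alluxio.cmd="su,alluxio,-c,/usr/local/alluxio/bin/alluxio"
```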

## Alluxio Troubleshooting

This section provides links on how to configure and tune Alluxio, along with some troubleshooting tips.