update docs (lakesoul-io#298)
Signed-off-by: chenxu <chenxu@dmetasoul.com>
Co-authored-by: chenxu <chenxu@dmetasoul.com>
xuchen-plus and dmetasoul01 authored Aug 22, 2023
1 parent bd4b341 commit 1289de0
Showing 4 changed files with 29 additions and 10 deletions.
8 changes: 4 additions & 4 deletions website/docs/01-Getting Started/01-setup-local-env.md
@@ -39,14 +39,14 @@ export lakesoul_home=/opt/soft/pg.property
You can put customized database configuration information in this file.
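
A minimal sketch of what `pg.property` can contain, assuming the `lakesoul.pg.*` key names and the local-test defaults used elsewhere in these docs; the values are placeholders to adapt to your PostgreSQL deployment:

```properties
# hypothetical local-test values; replace with your own deployment's settings
lakesoul.pg.driver=com.lakesoul.shaded.org.postgresql.Driver
lakesoul.pg.url=jdbc:postgresql://localhost:5432/lakesoul_test?stringtype=unspecified
lakesoul.pg.username=lakesoul_test
lakesoul.pg.password=lakesoul_test
```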

## Install an Apache Spark environment
- You can download a Spark distribution from https://spark.apache.org/downloads.html; choose Spark version 3.3.0 or above. Note that the official Apache Spark package does not include the hadoop-cloud component. We provide a Spark package with the Hadoop cloud dependencies; download it from https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop-3.3.5.tgz.
+ You can download a Spark distribution from https://spark.apache.org/downloads.html; choose Spark version 3.3.0 or above. Note that the official Apache Spark package does not include the hadoop-cloud component. We provide a Spark package with the Hadoop cloud dependencies; download it from https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz.

After unpacking the Spark package, you can find the LakeSoul distribution jar at https://github.com/lakesoul-io/LakeSoul/releases. Download the jar file and put it into the `jars` directory of your Spark environment.

```bash
- wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop-3.3.5.tgz
- tar xf spark-3.3.2-bin-hadoop-3.3.5.tgz
- export SPARK_HOME=${PWD}/spark-3.3.2-bin-dmetasoul
+ wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz
+ tar xf spark-3.3.2-bin-hadoop3.tgz
+ export SPARK_HOME=${PWD}/spark-3.3.2-bin-hadoop3
wget https://github.com/lakesoul-io/LakeSoul/releases/download/v2.3.0/lakesoul-spark-2.3.0-spark-3.3.jar -P $SPARK_HOME/jars
```
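
With the jars in place, a LakeSoul-enabled session can be started by pointing Spark at LakeSoul's SQL extension and catalog. A sketch, assuming the class names published in the LakeSoul docs; verify them against the release you downloaded:

```bash
$SPARK_HOME/bin/spark-shell \
  --conf spark.sql.extensions=com.dmetasoul.lakesoul.sql.LakeSoulSparkSessionExtension \
  --conf spark.sql.catalog.lakesoul=org.apache.spark.sql.lakesoul.catalog.LakeSoulCatalog \
  --conf spark.sql.defaultCatalog=lakesoul
```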

9 changes: 9 additions & 0 deletions website/docs/03-Usage Docs/02-setup-spark.md
@@ -126,6 +126,15 @@ export LAKESOUL_PG_PASSWORD=root
````
:::
:::tip
If you need to access S3, you also need to download the [`flink-s3-fs-hadoop`](https://mvnrepository.com/artifact/org.apache.flink/flink-s3-fs-hadoop) jar matching your Flink version and put it into the `$FLINK_HOME/lib` directory.
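
For example, pulling the jar from Maven Central (a sketch; the version below is a placeholder, substitute the Flink version you actually run):

```bash
FLINK_VERSION=1.14.6  # placeholder: match your Flink version
wget https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/${FLINK_VERSION}/flink-s3-fs-hadoop-${FLINK_VERSION}.jar \
  -P $FLINK_HOME/lib
```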

If access to a Hadoop environment is required, declare the `HADOOP_CLASSPATH` environment variable:
```bash
export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
```
For details, please refer to: [Flink on Hadoop](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/)
:::

:::tip
LakeSoul may use an extra amount of off-heap memory; consider increasing the off-heap memory size for the task manager:
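
A minimal `flink-conf.yaml` sketch; 3000m is a starting point rather than a tuned recommendation:

```yaml
taskmanager.memory.task.off-heap.size: 3000m
```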
@@ -19,12 +19,12 @@ PGPASSWORD=lakesoul_test psql -h localhost -p 5432 -U lakesoul_test -f script/meta_init.sql
```

## Install the Spark Environment
- Since the official Apache Spark download package does not include hadoop-cloud, AWS S3, and related dependencies, we provide a Spark package that bundles the necessary hadoop-cloud and S3 dependencies: https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-lakesoul-8e167b33.tgz
+ Since the official Apache Spark download package does not include hadoop-cloud, AWS S3, and related dependencies, we provide a Spark package that bundles the necessary hadoop-cloud and S3 dependencies: https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz

```bash
- wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop-3.3.5.tgz
- tar xf spark-3.3.2-bin-hadoop-3.3.5.tgz
- export SPARK_HOME=${PWD}/spark-3.3.2-bin-dmetasoul
+ wget https://dmetasoul-bucket.obs.cn-southwest-2.myhuaweicloud.com/releases/spark/spark-3.3.2-bin-hadoop3.tgz
+ tar xf spark-3.3.2-bin-hadoop3.tgz
+ export SPARK_HOME=${PWD}/spark-3.3.2-bin-hadoop3
```
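
To use LakeSoul from this Spark environment, the LakeSoul release jar also needs to be placed under `$SPARK_HOME/jars`; for example, for release 2.3.0:

```bash
wget https://github.com/lakesoul-io/LakeSoul/releases/download/v2.3.0/lakesoul-spark-2.3.0-spark-3.3.jar -P $SPARK_HOME/jars
```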

:::tip
@@ -116,11 +116,11 @@ containerized.taskmanager.env.LAKESOUL_PG_URL: jdbc:postgresql://localhost:5432/
Note that the environment variables need to be set for both the master and the taskmanager.
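
A sketch of setting one such variable on both sides; the `containerized.master.env.*` and `containerized.taskmanager.env.*` keys are standard Flink options, and the URL value below is a placeholder:

```yaml
# declare the variable for the JobManager (master) container ...
containerized.master.env.LAKESOUL_PG_URL: jdbc:postgresql://localhost:5432/lakesoul_test
# ... and again for the TaskManager containers
containerized.taskmanager.env.LAKESOUL_PG_URL: jdbc:postgresql://localhost:5432/lakesoul_test
```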
:::tip
The connection information, username, and password for the Postgres database need to be changed according to your actual deployment.
:::
:::caution
Note that if the job is started in Session mode, i.e. submitted as a client to a Flink Standalone Cluster, `flink run` acting as the client will not read the configuration above, so the environment variables need to be set separately, i.e.:

```bash
export LAKESOUL_PG_DRIVER=com.lakesoul.shaded.org.postgresql.Driver
@@ -143,6 +143,16 @@ taskmanager.memory.task.off-heap.size: 3000m

Put the jar file under `$FLINK_HOME/lib`. After that, you can start a Flink session cluster or application as usual.
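
For instance, with the standard Flink scripts (a sketch; the job jar path is a placeholder):

```bash
# bring up a standalone session cluster
$FLINK_HOME/bin/start-cluster.sh
# submit a job to it
$FLINK_HOME/bin/flink run path/to/your-job.jar
```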

:::tip
If you need to access S3, you also need to download the [`flink-s3-fs-hadoop`](https://mvnrepository.com/artifact/org.apache.flink/flink-s3-fs-hadoop) jar matching your Flink version and put it into the `$FLINK_HOME/lib` directory.

If access to a Hadoop environment is required, declare the `HADOOP_CLASSPATH` environment variable:
```bash
export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
```
For details, please refer to: [Flink on Hadoop](https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/deployment/resource-providers/yarn/)
:::

### Add the LakeSoul Flink Maven Dependency to Your Java Project

Add the following to your project's pom.xml:
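
A hedged sketch of such a dependency block; the coordinates and version below are assumptions, so check the LakeSoul release notes for the ones matching your Flink version:

```xml
<dependency>
    <!-- assumed coordinates: verify against the LakeSoul release you use -->
    <groupId>com.dmetasoul</groupId>
    <artifactId>lakesoul-flink</artifactId>
    <version>2.3.0</version>
</dependency>
```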
