From 3eadfccab4b0295b21fd523dc02aa5fcca3aacab Mon Sep 17 00:00:00 2001
From: "Wang, Yi A"
Date: Thu, 11 Aug 2022 02:46:49 -0700
Subject: [PATCH 1/3] update doc for perf_train_cpu_many, add mpi introduction

Signed-off-by: Wang, Yi A
---
 docs/source/en/perf_train_cpu_many.mdx | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/docs/source/en/perf_train_cpu_many.mdx b/docs/source/en/perf_train_cpu_many.mdx
index 5705517f5b1b4a..350b7a736b27bd 100644
--- a/docs/source/en/perf_train_cpu_many.mdx
+++ b/docs/source/en/perf_train_cpu_many.mdx
@@ -37,7 +37,14 @@ pip install oneccl_bind_pt=={pytorch_version} -f https://software.intel.com/ipex
 where `{pytorch_version}` should be your PyTorch version, for instance 1.12.0.
 Check more approaches for [oneccl_bind_pt installation](https://github.com/intel/torch-ccl).
 
-### Usage in Trainer
+## Intel® MPI library
+Use this standards-based MPI implementation to deliver flexible, efficient, scalable cluster messaging on Intel® architecture. This component is part of the Intel® oneAPI HPC Toolkit.
+it could be installed from [mpi installation](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#mpi).
+
+The following "Usage in Trainer" takes mpirun in Intel® MPI library as an example.
+
+
+## Usage in Trainer
 To enable multi CPU distributed training in the Trainer with the ccl backend, users should add **`--xpu_backend ccl`** in the command arguments.
 Let's see an example with the [question-answering example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering)
 

From 20cc8e339b717be39d753ab2b1ca2bf718a27328 Mon Sep 17 00:00:00 2001
From: "Wang, Yi"
Date: Fri, 12 Aug 2022 08:07:32 +0800
Subject: [PATCH 2/3] Update docs/source/en/perf_train_cpu_many.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---
 docs/source/en/perf_train_cpu_many.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/perf_train_cpu_many.mdx b/docs/source/en/perf_train_cpu_many.mdx
index 350b7a736b27bd..2263707863f2b2 100644
--- a/docs/source/en/perf_train_cpu_many.mdx
+++ b/docs/source/en/perf_train_cpu_many.mdx
@@ -39,7 +39,7 @@ Check more approaches for [oneccl_bind_pt installation](https://github.com/intel
 
 ## Intel® MPI library
 Use this standards-based MPI implementation to deliver flexible, efficient, scalable cluster messaging on Intel® architecture. This component is part of the Intel® oneAPI HPC Toolkit.
-it could be installed from [mpi installation](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#mpi).
+It can be installed via [MPI](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#mpi).
 
 The following "Usage in Trainer" takes mpirun in Intel® MPI library as an example.
 
From 69339512236d33f8a945e5453ff0c44c29eaef25 Mon Sep 17 00:00:00 2001
From: "Wang, Yi A"
Date: Thu, 11 Aug 2022 18:43:17 -0700
Subject: [PATCH 3/3] Update docs/source/en/perf_train_cpu_many.mdx

Signed-off-by: Wang, Yi A
---
 docs/source/en/perf_train_cpu_many.mdx | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/docs/source/en/perf_train_cpu_many.mdx b/docs/source/en/perf_train_cpu_many.mdx
index 2263707863f2b2..f4f77965748e3e 100644
--- a/docs/source/en/perf_train_cpu_many.mdx
+++ b/docs/source/en/perf_train_cpu_many.mdx
@@ -36,11 +36,18 @@ pip install oneccl_bind_pt=={pytorch_version} -f https://software.intel.com/ipex
 ```
 where `{pytorch_version}` should be your PyTorch version, for instance 1.12.0.
 Check more approaches for [oneccl_bind_pt installation](https://github.com/intel/torch-ccl).
+Versions of oneCCL and PyTorch must match.
 
 ## Intel® MPI library
 Use this standards-based MPI implementation to deliver flexible, efficient, scalable cluster messaging on Intel® architecture. This component is part of the Intel® oneAPI HPC Toolkit.
 It can be installed via [MPI](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#mpi).
 
+Please set the environment with the following command before using it.
+
+```
+source /opt/intel/oneapi/setvars.sh
+```
+
 The following "Usage in Trainer" takes mpirun in Intel® MPI library as an example.
 
 
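Put together, the doc change these patches build up describes an mpirun-based launch of the Trainer with the ccl backend. A minimal sketch of such a launch, assuming a single node with 2 processes; `run_qa.py` comes from the linked question-answering example, and the model name, dataset, output directory, and `MASTER_ADDR` value here are illustrative assumptions, not part of the patches:

```
# Set up the oneAPI / Intel MPI environment first, as the patch instructs
source /opt/intel/oneapi/setvars.sh

# Illustrative 2-process launch on one node; the script and its arguments
# are assumptions borrowed from the linked question-answering example
export MASTER_ADDR=127.0.0.1
mpirun -n 2 python3 run_qa.py \
  --model_name_or_path bert-large-uncased \
  --dataset_name squad \
  --do_train \
  --output_dir /tmp/debug_squad/ \
  --no_cuda \
  --xpu_backend ccl
```

The key piece is the trailing `--xpu_backend ccl`, which is exactly the flag the "Usage in Trainer" section tells users to add; mpirun handles spawning the processes that the ccl backend then connects.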