diff --git a/docs/source/en/perf_train_cpu_many.mdx b/docs/source/en/perf_train_cpu_many.mdx
index 5705517f5b1b4a..f4f77965748e3e 100644
--- a/docs/source/en/perf_train_cpu_many.mdx
+++ b/docs/source/en/perf_train_cpu_many.mdx
@@ -36,8 +36,22 @@
 pip install oneccl_bind_pt=={pytorch_version} -f https://software.intel.com/ipex
 ```
 where `{pytorch_version}` should be your PyTorch version, for instance 1.12.0.
 Check more approaches for [oneccl_bind_pt installation](https://github.com/intel/torch-ccl).
+Versions of oneCCL and PyTorch must match.
 
-### Usage in Trainer
+## Intel® MPI library
+Use this standards-based MPI implementation to deliver flexible, efficient, scalable cluster messaging on Intel® architecture. This component is part of the Intel® oneAPI HPC Toolkit.
+It can be installed via [MPI](https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html#mpi).
+
+Please set up the environment with the following command before using it.
+
+```
+source /opt/intel/oneapi/setvars.sh
+```
+
+The following "Usage in Trainer" section takes mpirun from the Intel® MPI library as an example.
+
+
+## Usage in Trainer
 
 To enable multi CPU distributed training in the Trainer with the ccl backend, users should add **`--xpu_backend ccl`** in the command arguments. Let's see an example with the [question-answering example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering)
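
The actual question-answering command sits outside this hunk, so the sketch below only illustrates what such an `mpirun` launch with the ccl backend could look like. The rank count, `OMP_NUM_THREADS` value, the exported environment variables, and all training hyperparameters are illustrative assumptions, not values taken from the patched documentation.

```
# Minimal sketch of an Intel MPI launch with the ccl backend
# (all values below are illustrative assumptions, not recommendations).

export CCL_WORKER_COUNT=1        # CCL communication worker threads per rank (example value)
export MASTER_ADDR=127.0.0.1     # rendezvous address for distributed init (example value)

# Launch 2 ranks on the local machine with Intel MPI's mpirun.
# --xpu_backend ccl tells the Trainer to use the oneCCL communication backend,
# and --no_cuda keeps the run on CPU.
mpirun -n 2 -genv OMP_NUM_THREADS=28 \
  python3 run_qa.py \
  --model_name_or_path bert-large-uncased \
  --dataset_name squad \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 12 \
  --max_seq_length 384 \
  --doc_stride 128 \
  --output_dir /tmp/debug_squad/ \
  --no_cuda \
  --xpu_backend ccl
```

For a multi-node run, a hostfile listing the participating machines would typically be passed to `mpirun` as well.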