- Threading
- Multiprocessing
- IPyparallel
- Pathos
- Numba
- PyTorch Multiprocessing (`torch.multiprocessing`)
- PyCUDA
- PyOpenCL
- Joblib (`Parallel`; CPU only)
- Ray
- Dask
- RAPIDS
- HuggingFace Model Parallel
- Data Parallel ("PyTorch Distributed: Experiences on Accelerating Data Parallel Training")
- Tensor Parallel ("Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM")
- Pipeline Parallel
- 3D Parallel ("DeepSpeed: Extreme-scale model training for everyone")
- Mixed Precision Training
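A minimal sketch of the first two entries above, contrasting `threading` and `multiprocessing` on a CPU-bound task via the standard-library `concurrent.futures` API. The task (`count_primes`) and worker counts are illustrative, not from the source list.

```python
import math
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, math.isqrt(n) + 1)):
            count += 1
    return count

def run(executor_cls, jobs):
    # Threads share one GIL, so CPU-bound work gains little from
    # ThreadPoolExecutor; ProcessPoolExecutor sidesteps the GIL at the
    # cost of process startup and pickling overhead.
    with executor_cls(max_workers=len(jobs)) as ex:
        return list(ex.map(count_primes, jobs))

if __name__ == "__main__":
    jobs = [20_000] * 4
    print(run(ThreadPoolExecutor, jobs))   # GIL-bound: little speedup
    print(run(ProcessPoolExecutor, jobs))  # true parallelism across cores
```

The same `executor_cls` swap works for I/O-bound work, where threads are usually the better fit.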
- API Solutions:
- DeepSpeed
- Megatron-LM
- PyTorch
- SageMaker
- FairScale
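A toy, dependency-free illustration of the data-parallel pattern the libraries above implement: shard the batch across workers, compute local gradients, all-reduce (here: average, valid because the shards are equal-sized), then update the replicated weights. The linear model, loss, and names are illustrative assumptions, not any library's API.

```python
from statistics import mean

def grad(w, shard):
    # d/dw of mean squared error for the toy model y ≈ w * x on one shard
    return mean(2 * (w * x - y) * x for x, y in shard)

def data_parallel_step(w, batch, n_workers=2, lr=0.01):
    # 1. shard the batch across workers (round-robin, equal sizes)
    shards = [batch[i::n_workers] for i in range(n_workers)]
    # 2. each worker computes a local gradient (run in parallel in practice)
    local_grads = [grad(w, s) for s in shards]
    # 3. "all-reduce": average gradients so every replica sees the same update
    g = mean(local_grads)
    return w - lr * g

w = 0.0
batch = [(x, 3.0 * x) for x in range(1, 9)]  # data generated with weight 3.0
for _ in range(200):
    w = data_parallel_step(w, batch)
print(round(w, 2))  # converges toward 3.0
```

Real frameworks (DDP, DeepSpeed, FairScale) follow the same shape but overlap the all-reduce with the backward pass and bucket gradients for bandwidth efficiency.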