PyTorch optimizations
Albert Zeyer edited this page Oct 17, 2023 · 9 revisions
This is just an overview and collection of references.
Potential optimizations (speed and/or memory):
- Distributed / multi-GPU training. RETURNN config: see `torch_distributed`
- Automatic mixed precision (AMP), e.g. to use float16 (fp16). RETURNN config: `torch_amp = "float16"`
- PyTorch scripting and tracing (https://github.com/rwth-i6/returnn/issues/1436)
- `torch.compile`
- `torch.optim._multi_tensor.AdamW`
- `apex.optimizers.FusedAdam` (might be integrated into PyTorch? https://github.com/pytorch/pytorch/issues/71274)
- Asynchronous data loading and augmentation. RETURNN config: `torch_dataloader_opts = {"num_workers": 1}`, maybe use together with `MultiProcDataset` if more workers are needed, see here
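A minimal sketch of enabling distributed training in a RETURNN config. It assumes that an empty `torch_distributed` dict selects the default DistributedDataParallel setup; the supported keys are not listed on this page, so check the RETURNN documentation. Training is then started once per GPU, e.g. with PyTorch's `torchrun` launcher. The process count and the config file name `my_config.py` are placeholders.

```python
# RETURNN config fragment (sketch).
# Assumption: an empty dict enables distributed training with default options.
torch_distributed = {}

# Launch one process per GPU, e.g. with PyTorch's torchrun (placeholder values):
#   torchrun --nproc_per_node=4 rnn.py my_config.py
```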
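For orientation, this is roughly what automatic mixed precision does in plain PyTorch. It is a standalone sketch, not RETURNN's implementation of `torch_amp`. Inside `torch.autocast` the linear layer runs in reduced precision while the parameters stay float32, and `GradScaler` scales the loss so that small fp16 gradients do not underflow. The model and data are toy placeholders; on a CPU-only machine the sketch falls back to bfloat16 (which CPU autocast supports) and disables loss scaling.

```python
import torch
import torch.nn as nn

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
# fp16 on GPU (as with torch_amp = "float16"); bfloat16 as the CPU fallback.
amp_dtype = torch.float16 if use_cuda else torch.bfloat16

model = nn.Linear(16, 4).to(device)  # toy model, parameters stay float32
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# Loss scaling is only needed for fp16; with enabled=False the scaler is a no-op.
# (Newer releases prefer torch.amp.GradScaler("cuda", ...).)
scaler = torch.cuda.amp.GradScaler(enabled=(amp_dtype == torch.float16))

x = torch.randn(8, 16, device=device)
target = torch.randn(8, 4, device=device)

with torch.autocast(device_type=device, dtype=amp_dtype):
    out = model(x)  # linear layer runs in reduced precision
    loss = nn.functional.mse_loss(out.float(), target)

optimizer.zero_grad()
scaler.scale(loss).backward()  # backward pass on the (possibly) scaled loss
scaler.step(optimizer)  # when enabled: unscales grads, skips the step on inf/NaN
scaler.update()  # when enabled: adjusts the scale factor for the next step
```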
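A small sketch of tracing with `torch.jit.trace`, which records the operations executed for one example input and returns a TorchScript module. The toy model is a placeholder. Tracing only captures the path taken for that input, so models with data-dependent control flow need `torch.jit.script` instead.

```python
import torch
import torch.nn as nn

# Toy model (placeholder) without data-dependent control flow.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
example = torch.randn(8, 16)

# Record the ops executed for `example` into a TorchScript graph.
traced = torch.jit.trace(model, example)

x = torch.randn(8, 16)
with torch.no_grad():
    ref = model(x)  # eager result
    out = traced(x)  # result from the traced graph
```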
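A sketch of the multi-tensor ("foreach") AdamW. In the PyTorch versions we have looked at, `torch.optim._multi_tensor.AdamW` is a private alias for `torch.optim.AdamW` with `foreach=True`, which updates all parameters with batched kernels instead of a Python loop over tensors. The sketch checks on a toy model that both variants produce the same update. Recent PyTorch 2.x releases also accept `fused=True` on `torch.optim.AdamW` (initially only for CUDA tensors), which seems to be the closest built-in counterpart to `apex.optimizers.FusedAdam`; verify this for your PyTorch version.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)
model_loop = nn.Linear(16, 4)
model_foreach = copy.deepcopy(model_loop)  # identical initial weights

# foreach=False: per-tensor loop; foreach=True: multi-tensor (batched) kernels.
opt_loop = torch.optim.AdamW(model_loop.parameters(), lr=1e-2, foreach=False)
opt_foreach = torch.optim.AdamW(model_foreach.parameters(), lr=1e-2, foreach=True)

x = torch.randn(8, 16)
for model, opt in ((model_loop, opt_loop), (model_foreach, opt_foreach)):
    opt.zero_grad()
    loss = model(x).pow(2).mean()  # toy loss
    loss.backward()
    opt.step()
```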
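A sketch of the data loading settings in a RETURNN config. `torch_dataloader_opts` is taken from the list above. The `MultiProcDataset` keys (`dataset`, `num_workers`, `buffer_size`) and the wrapped `HDFDataset` with the file name `train.hdf` are assumptions for illustration, so check them against the RETURNN dataset documentation before use.

```python
# RETURNN config fragment (sketch).
# One background DataLoader worker prepares data while training runs.
torch_dataloader_opts = {"num_workers": 1}

# Assumed MultiProcDataset options: wrap the actual dataset and run it
# in several subprocesses. "train.hdf" is a placeholder file name.
train = {
    "class": "MultiProcDataset",
    "dataset": {"class": "HDFDataset", "files": ["train.hdf"]},
    "num_workers": 4,
    "buffer_size": 10,
}
```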
References:
- PyTorch Tutorials > PyTorch Recipes > Performance Tuning Guide
- [Benchmark] HF Trainer on RTX-3090, many interesting benchmarks: fp16, bf16, tf32, gradient accumulation, gradient checkpointing, batch size, fused AdamW from apex