-
Build your environment using a Singularity container. Check the Singularity recipe in `singularity/Singularity.1.4.0`:
```bash
sudo apt-get install -y singularity-container
cd singularity
bash sing_build.sh
```
-
Check the Slurm configuration and paths in `submit.sh`.
The number of nodes and GPUs must match the Lightning `Trainer` parameters. For example, training on 2 nodes x 1 GPU:
```bash
#SBATCH --nodes=2
#SBATCH --gres=gpu:1
#SBATCH --ntasks-per-node=1
```
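```python
from pytorch_lightning import Trainer

# Must agree with submit.sh: num_nodes == --nodes, gpus == GPUs in --gres.
# Older PyTorch Lightning API; recent releases use
# accelerator="gpu", devices=..., strategy="ddp" instead.
trainer = Trainer(
    gpus=1,         # GPUs per node
    num_nodes=2,    # number of machines
    distributed_backend="ddp",
)
```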
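Because a mismatch between `submit.sh` and the `Trainer` arguments is easy to miss, a small pre-flight check can compare them. The sketch below is illustrative and not part of this repository: `parse_sbatch_directives`, `gpus_from_gres` and `check_trainer_matches` are hypothetical helpers, and the parser only understands the `--option=value` form used above (not `--nodes 2` or short flags such as `-N 2`). It assumes the Lightning DDP convention of one Slurm task per GPU, so `--ntasks-per-node` is expected to equal `gpus`.

```python
import re


def parse_sbatch_directives(script_text):
    """Collect `#SBATCH --key=value` options from a Slurm batch script.

    Hypothetical helper (not part of this repo); only the `--key=value`
    form is recognised.
    """
    opts = {}
    for line in script_text.splitlines():
        line = line.strip()
        if not line.startswith("#SBATCH"):
            continue
        for match in re.finditer(r"--([\w-]+)=(\S+)", line):
            opts[match.group(1)] = match.group(2)
    return opts


def gpus_from_gres(gres):
    """Return the GPU count from --gres=gpu[:type]:N (bare "gpu" means 1)."""
    parts = gres.split(":")
    if parts[0] != "gpu":
        return 0
    return int(parts[-1]) if parts[-1].isdigit() else 1


def check_trainer_matches(opts, gpus, num_nodes):
    """Return a list of mismatches between #SBATCH options and Trainer args.

    An empty list means the settings agree. Missing options are assumed
    to be 1 (an assumption of this sketch).
    """
    problems = []
    nodes = int(opts.get("nodes", 1))
    if nodes != num_nodes:
        problems.append(f"--nodes={nodes} but Trainer(num_nodes={num_nodes})")
    gres_gpus = gpus_from_gres(opts.get("gres", ""))
    if gres_gpus != gpus:
        problems.append(f"--gres gives {gres_gpus} GPU(s) but Trainer(gpus={gpus})")
    # Lightning's Slurm DDP expects one task (process) per GPU on each node.
    tasks = int(opts.get("ntasks-per-node", 1))
    if tasks != gpus:
        problems.append(f"--ntasks-per-node={tasks} but Trainer(gpus={gpus})")
    return problems
```

If `submit.sh` contains exactly the three directives above, `check_trainer_matches(parse_sbatch_directives(open("submit.sh").read()), gpus=1, num_nodes=2)` should return an empty list.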
-
Run the job:
```bash
sbatch submit.sh
```
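When jobs are launched from a script rather than by hand, it helps to keep the job ID for `scancel` and `squeue` later. A minimal sketch, assuming Python 3.7+ and `sbatch` on `PATH`; `submit` and `parse_job_id` are hypothetical helper names. `sbatch --parsable` prints only the job ID (optionally followed by `;cluster`), and the parser also accepts the default `Submitted batch job <id>` message.

```python
import subprocess


def parse_job_id(sbatch_stdout):
    """Extract the numeric job ID from sbatch output.

    Handles the default "Submitted batch job 12345" message as well as the
    "12345" or "12345;cluster" form printed by `sbatch --parsable`.
    """
    text = sbatch_stdout.strip()
    if text.startswith("Submitted batch job"):
        return int(text.split()[-1])
    return int(text.split(";")[0])


def submit(script="submit.sh"):
    """Hypothetical wrapper: run `sbatch --parsable <script>`, return the job ID."""
    result = subprocess.run(
        ["sbatch", "--parsable", script],
        capture_output=True,
        text=True,
        check=True,  # raise if sbatch rejects the script
    )
    return parse_job_id(result.stdout)
```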
-
Basic commands:
- Cancel a job: `scancel yourJobID`
- Check job status: `squeue -j yourJobID`
- Log into an allocated machine: `ssh hostname`
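To find the `hostname` to `ssh` into, you can ask `squeue` for the job's node list (`%N` format). Slurm often prints that list in compressed form, e.g. `gpu[01-03]`; `scontrol show hostnames` is the authoritative way to expand it. The sketch below is a simplified pure-Python expander with hypothetical helper names (`expand_nodelist`, `job_nodes`) that handles only one bracket group per host pattern.

```python
import re
import subprocess


def expand_nodelist(nodelist):
    """Expand a simple Slurm node list such as "gpu[01-03,07]" into host names.

    Simplified sketch: supports comma-separated patterns ("a1,b[1-2]") with
    at most one trailing bracket group each. Zero padding is preserved.
    """
    hosts = []
    # Split on commas that are not inside brackets.
    for pattern in re.findall(r"[^,\[]+(?:\[[^\]]*\])?", nodelist):
        m = re.fullmatch(r"(.*)\[(.*)\]", pattern)
        if not m:
            hosts.append(pattern)
            continue
        prefix, ranges = m.groups()
        for part in ranges.split(","):
            if "-" in part:
                start, end = part.split("-")
                width = len(start)  # e.g. "01" keeps two digits
                for i in range(int(start), int(end) + 1):
                    hosts.append(f"{prefix}{i:0{width}d}")
            else:
                hosts.append(f"{prefix}{part}")
    return hosts


def job_nodes(job_id):
    """Hypothetical helper: list the nodes held by a running job via squeue."""
    out = subprocess.run(
        ["squeue", "-j", str(job_id), "-h", "-o", "%N"],  # -h: no header
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()
    return expand_nodelist(out) if out else []
```

For example, `ssh` into `job_nodes(yourJobID)[0]` to reach the first allocated node.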
-
To copy logs to your local machine:
```bash
rsync -avh --info=progress2 /src /dst
```
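One detail worth knowing with the `rsync` command above: a trailing slash on the source copies the directory's contents into `/dst`, while no trailing slash creates `/dst/src`. The sketch below wraps the same command in Python; `rsync_cmd` and `pull_logs` are hypothetical names, not part of this repository.

```python
import subprocess


def rsync_cmd(src, dst, contents_only=True):
    """Build the rsync command used above.

    With contents_only=True a trailing slash is added to src, so the
    directory's contents land directly in dst. Otherwise src is passed
    through unchanged.
    """
    if contents_only and not src.endswith("/"):
        src += "/"
    return ["rsync", "-avh", "--info=progress2", src, dst]


def pull_logs(src, dst, contents_only=True):
    """Hypothetical wrapper: run rsync and raise if it fails."""
    subprocess.run(rsync_cmd(src, dst, contents_only), check=True)
```

For example, `pull_logs("user@cluster:/path/to/logs", "./logs")`, where the remote path is a placeholder for your own log directory.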
-
krokodilno/slurm_ddp