This repository contains an example of distributed training for music tagging using the Accelerate framework. It demonstrates how to set up multi-machine, multi-GPU distributed training.
Please follow the steps below on both machines to set up and run the training process:
-
Clone the repository:
git clone https://github.com/HarlandZZC/music_tagging_accelerate.git
cd music_tagging_accelerate
-
Set up the environment:
conda env create -f music_tagging_env.yaml
conda activate music_tagging
If some packages cannot be installed through the YAML file, please install them manually in the activated environment.
-
Adjust default_config.yaml and train.sh according to the server you are using.
-
Start training by running:
bash train.sh
-
To evaluate the training results, run:
python test.py
Please note that test.py runs only on a single GPU on a single machine.
For more information on how to use the Accelerate framework, please refer to https://huggingface.co/docs/accelerate/index.
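As a rough illustration of the configuration step above, the sketch below builds the kind of settings an Accelerate config file typically holds for a two-machine run and shows which value has to differ between the machines. It is a minimal sketch, not the contents of this repository's default_config.yaml: the helper names `build_accelerate_config` and `to_yaml`, the IP address, the port, and the GPU count are all assumptions chosen for illustration, and the real file may contain different or additional fields.

```python
# Hypothetical helpers for illustration only; the actual default_config.yaml
# in this repository may use different or additional fields.


def build_accelerate_config(machine_rank, main_process_ip,
                            main_process_port=29500,
                            num_machines=2, gpus_per_machine=4):
    """Return per-machine settings for a multi-machine, multi-GPU run."""
    if not 0 <= machine_rank < num_machines:
        raise ValueError("machine_rank must be in [0, num_machines)")
    return {
        "compute_environment": "LOCAL_MACHINE",
        "distributed_type": "MULTI_GPU",
        # The only value that differs between machines: 0 on the main node.
        "machine_rank": machine_rank,
        # Address and port of the main node, identical on every machine.
        "main_process_ip": main_process_ip,
        "main_process_port": main_process_port,
        "num_machines": num_machines,
        # Total number of processes across all machines, one per GPU.
        "num_processes": num_machines * gpus_per_machine,
    }


def to_yaml(config):
    """Render a flat dict as simple 'key: value' lines."""
    return "\n".join(f"{key}: {value}" for key, value in config.items())


if __name__ == "__main__":
    # Placeholder address; replace it with the IP of your main machine.
    for rank in range(2):
        print(f"# machine {rank}")
        print(to_yaml(build_accelerate_config(rank, "192.168.0.10")))
        print()
```

In this sketch every field is the same on both machines except machine_rank, which is a useful thing to check when editing the config by hand on each server.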