Please download the pretrained audio encoders from PANNs or HTSAT. We have also uploaded the audio encoders we used here.

Put them under pretrained_models/audio_encoders.
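Before training, it can help to confirm that the encoder checkpoints are where the scripts will look for them. The snippet below is a minimal sketch, not part of this repository: the helper name `list_encoder_checkpoints` and the file extensions it accepts are assumptions, and only the directory path comes from the instructions above.

```python
from pathlib import Path

# Directory named in this README.
ENCODER_DIR = Path("pretrained_models/audio_encoders")
# Assumed checkpoint extensions; adjust to match the files you downloaded.
CHECKPOINT_SUFFIXES = {".pt", ".pth", ".ckpt"}


def list_encoder_checkpoints(encoder_dir=ENCODER_DIR):
    """Return the sorted names of checkpoint files found in encoder_dir.

    Hypothetical helper: returns an empty list if the directory is missing.
    """
    encoder_dir = Path(encoder_dir)
    if not encoder_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in encoder_dir.iterdir()
        if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
    )


if __name__ == "__main__":
    found = list_encoder_checkpoints()
    if found:
        print("Found encoder checkpoints:", ", ".join(found))
    else:
        print(f"No checkpoints found under {ENCODER_DIR}")
```

If the list comes back empty, the checkpoints are either in a different folder or use an extension the sketch does not check for.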

You can configure the training settings in the yaml files under the settings directory.
Our dataloader reads json files, in which the audio key holds the path to the audio clip on your machine or server.
Run pretrain.py for pretraining, and train.py for finetuning or training from scratch.
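As an illustration of the json input, here is a rough sketch of how such a file could be generated for your own clips. Only the `audio` key is taken from the description above. The `caption` key, the flat list layout, the file name `my_dataset.json`, and the helpers `build_entries` and `missing_audio` are assumptions made for this example, so compare against the json files shipped with the repository before relying on it.

```python
import json
from pathlib import Path


def build_entries(audio_dir, captions):
    """Build one entry per clip.

    captions maps a file name to its caption text (hypothetical input format).
    The "audio" key holds the clip path; "caption" is an assumed key name.
    """
    audio_dir = Path(audio_dir)
    return [
        {"audio": str(audio_dir / name), "caption": text}
        for name, text in sorted(captions.items())
    ]


def missing_audio(entries):
    """Return the audio paths that do not exist on this machine."""
    return [e["audio"] for e in entries if not Path(e["audio"]).is_file()]


if __name__ == "__main__":
    entries = build_entries(
        "/data/my_clips",  # hypothetical location of your audio files
        {"dog.wav": "A dog barks twice.", "rain.wav": "Rain falls on a roof."},
    )
    for path in missing_audio(entries):
        print("Warning: audio file not found:", path)
    # Write the entries to a json file (assumed file name).
    with open("my_dataset.json", "w", encoding="utf-8") as f:
        json.dump(entries, f, indent=2)
```

Checking the paths up front avoids a dataloader failure partway through training when a clip has been moved or the file was written on a different machine.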

We provide pretrained audio-language retrieval models for reproducing our results.

The pretrained models can be downloaded from Google Drive.