
EasyVC

[demo-page]

A voice conversion framework for different types of encoders, decoders and vocoders.

The encoder-decoder framework is demonstrated in the following figure.

[figure: encoder-decoder framework]

More specifically, three encoders extract representations from speech: a linguistic encoder, a prosodic encoder and a speaker encoder. A decoder then reconstructs mel-spectrograms from these representations, and finally a vocoder converts the mel-spectrograms to waveforms. Note that this repo also supports decoders that reconstruct waveforms directly (e.g. VITS), in which case no vocoder is needed.

This repo covers all the steps of a voice conversion pipeline from dataset downloading to evaluation.

I am currently maintaining this repo on my own and plan to integrate more encoders and decoders.

Please be aware that this repo is still unstable and under rapid development.

Conda env

Create a conda env:

conda create --name torch_1.9 --file requirements.txt
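
Then activate the environment before running any of the scripts below (the env name matches the one created above):

conda activate torch_1.9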

Work in progress

How to run

Step1: Dataset download

This part of the code is mostly adapted from parallel_wavegan.

./bin/download_vctk_dataset.sh

Or

./bin/download_libritts_dataset.sh

Step2: Generate metadata.csv

./bin/preprocess_vctk.sh

Or

./bin/preprocess_libritts.sh

Step3: Extract features

An ESPnet-style bash script is provided for extracting features, including spectrograms and linguistic, speaker and prosodic representations. Before extracting features, you need to decide on the setup of your encoders, decoder and vocoder.

e.g.

./extract_features.sh --stage 1 \
                      --stop_stage 4 \
                      --dataset vctk \
                      --linguistic_encoder vqwav2vec \
                      --speaker_encoder utt_dvec \
                      --prosodic_encoder ppgvc_f0 \
                      --decoder fastspeech2 \
                      --vocoder ppgvc_hifigan

Options:

  • dataset:
    • vctk
    • libritts
  • speaker_encoder:
    • utt_dvec
    • utt_ecapa_tdnn
  • linguistic_encoder:
    • vqwav2vec
    • conformer_ppg
    • hubert_soft
    • contentvec_100
    • contentvec_500
    • whisper_ppg
  • prosodic_encoder:
    • ppgvc_f0
    • fastspeech2_pitch_energy
  • decoder:
    • fastspeech2
    • taco_ar
    • taco_mol
    • vits
  • vocoder:
    • ppgvc_hifigan
    • vctk_hifigan
    • libritts_hifigan
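
For example, a LibriTTS setup with a different combination of encoders can be extracted the same way; the values below are simply other entries from the option lists above, and not every combination is guaranteed to have a matching training config:

./extract_features.sh --stage 1 \
                      --stop_stage 4 \
                      --dataset libritts \
                      --linguistic_encoder hubert_soft \
                      --speaker_encoder utt_ecapa_tdnn \
                      --prosodic_encoder fastspeech2_pitch_energy \
                      --decoder taco_ar \
                      --vocoder libritts_hifigan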

Step4: Training

To run training, select a config file from configs/. The config files are named following the format ${dataset}_${linguistic_encoder}_${speaker_encoder}_${prosodic_encoder}_${decoder}_${vocoder}.yaml. E.g.

./bin/train.sh configs/vctk_vqwav2vec_uttdvec_ppgvcf0_fs2_ppgvchifigan.yaml
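
Any other config in configs/ that follows this naming scheme can be trained the same way; the file name below is only constructed from the pattern and assumes a corresponding config actually exists in configs/:

./bin/train.sh configs/libritts_hubertsoft_uttdvec_ppgvcf0_tacoar_ppgvchifigan.yaml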

Step5: Generate Eval List & Inference

To generate an eval list, run e.g.

./bin/generate_eval_list.sh --task vc \
                            --dataset vctk \
                            --split eval_all \
                            --eval_list eval_list.json \
                            --n_trg_spk_samples 10 \
                            --n_src_spk_samples 10 \
                            --n_eval_spks

Options:

  • split: name of the eval set
  • eval_list: name of the eval list file
  • n_trg_spk_samples: number of randomly selected samples per target speaker used to compute the averaged speaker embedding
  • n_src_spk_samples: number of randomly selected source speaker samples to test
  • n_eval_spks: number of randomly selected speakers from the eval set

Then run inference using the generated eval_list, e.g.

python inference.py \
          --exp_dir exp/${dataset}_${ling_enc}_${spk_enc}_${pros_enc}_${dec}_${vocoder}/${exp_name} \
          --eval_list data/$dataset/$split/$eval_list \
          --epochs 200 \
          --task a2a_vc \
          --device cpu

The task option can be a2a_vc, m2m_vc or oneshot_vc; it decides how many target speaker embeddings are used.
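
For the VCTK setup trained above, a concrete invocation could look like the following; the experiment name (first_run) is a placeholder for your own run, and the experiment directory is assumed to mirror the config file name:

python inference.py \
          --exp_dir exp/vctk_vqwav2vec_uttdvec_ppgvcf0_fs2_ppgvchifigan/first_run \
          --eval_list data/vctk/eval_all/eval_list.json \
          --epochs 200 \
          --task a2a_vc \
          --device cpu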

Step6: Evaluation

./submit_evaluation.sh

Authors

  • Mingjie Chen, University of Sheffield
  • Prof. Thomas Hain, University of Sheffield
