enc_dec_voice_conversion (EDVC)

Work in progress.

A voice conversion framework for different types of encoders, decoders and vocoders.

The encoder-decoder framework is illustrated in the following figure (enc_dec_voice_conversion.drawio.png).

More specifically, three encoders extract representations from speech: a linguistic encoder, a prosodic encoder and a speaker encoder. A decoder then reconstructs speech mel-spectrograms from these representations, and finally a vocoder converts the mel-spectrograms to waveforms. Note that this repo also supports decoders that directly reconstruct waveforms (e.g. VITS), in which case no vocoder is needed.
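The data flow described above can be sketched in a few lines of Python. This is only an illustration, not the repo's actual classes: `VoiceConversionPipeline` and its callables are hypothetical names, and it assumes the usual voice conversion setup where linguistic and prosodic features come from the source utterance and the speaker representation comes from a target-speaker utterance.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

Features = Sequence  # stand-in for whatever tensor type the real models use


@dataclass
class VoiceConversionPipeline:
    # All components are hypothetical callables, not EDVC classes.
    linguistic_encoder: Callable[[Features], Features]
    prosodic_encoder: Callable[[Features], Features]
    speaker_encoder: Callable[[Features], Features]
    decoder: Callable[[Features, Features, Features], Features]
    # None models decoders that output waveforms directly (e.g. VITS).
    vocoder: Optional[Callable[[Features], Features]] = None

    def convert(self, source_wav: Features, target_wav: Features) -> Features:
        # Assumption: content and prosody come from the source utterance,
        # speaker identity from the target utterance.
        ling = self.linguistic_encoder(source_wav)
        pros = self.prosodic_encoder(source_wav)
        spk = self.speaker_encoder(target_wav)
        decoded = self.decoder(ling, pros, spk)
        if self.vocoder is None:
            return decoded  # decoder already produced a waveform
        return self.vocoder(decoded)  # mel-spectrogram -> waveform
```

Leaving `vocoder` as `None` covers the waveform-decoder case, so the same `convert` call works for both kinds of decoder.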

This repo covers all the steps of a voice conversion pipeline from dataset downloading to evaluation.

I currently maintain this repo on my own and plan to integrate more encoders and decoders.

Please be aware that this repo is currently very unstable and under very fast development.

Conda env

Create a conda env:

conda create --name torch_1.9 --file requirements.txt
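After activating the environment, a quick import check can confirm that it was built as expected. The sketch below is a hypothetical helper; it assumes `torch` is among the packages in requirements.txt (suggested only by the env name), so adjust the list to match the actual file.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "torch" is an assumption based on the env name torch_1.9.
    missing = missing_packages(["torch"])
    print("missing packages:", missing if missing else "none")
```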

Working progress

Dataset
- VCTK
- LibriTTS
Linguistic Encoder
- conformer_ppg from ppg-vc
- vq-wav2vec from fairseq
- hubert_soft and hubert_discrete from soft-vc
Prosodic Encoder
- log-f0 from ppg-vc
- pitch + energy from fastspeech2
Speaker Encoder
- d-vector from ppg-vc
Decoder
- fastspeech2 from fastspeech2
- taco_ar from s3prl-vc
- taco_mol from ppg-vc
- vits from vits
Vocoder
- hifigan (vctk) from ppg-vc
Evaluation
- UTMOS22 MOS prediction from UTMOS22
- ASR WER
- ASV EER
- MCD, F0-RMSE, F0-CORR
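For readers unfamiliar with the objective metrics in the last item, here is a minimal pure-Python sketch of their common definitions. It is not necessarily how this repo computes them: it assumes the reference and converted sequences are already time-aligned (e.g. by DTW), that the 0th mel-cepstral coefficient is excluded from MCD, that F0 is given in Hz, and that frames with F0 of 0 are unvoiced.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned mel-cepstrum frames (0th coefficient skipped)."""
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need two non-empty, time-aligned sequences")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


def _voiced_pairs(ref_f0, conv_f0):
    # Keep only frames that are voiced (F0 > 0) in both sequences.
    pairs = [(r, c) for r, c in zip(ref_f0, conv_f0) if r > 0 and c > 0]
    if not pairs:
        raise ValueError("no frames are voiced in both sequences")
    return pairs


def f0_rmse(ref_f0, conv_f0):
    """Root-mean-square F0 error over frames voiced in both sequences."""
    pairs = _voiced_pairs(ref_f0, conv_f0)
    return math.sqrt(sum((r - c) ** 2 for r, c in pairs) / len(pairs))


def f0_corr(ref_f0, conv_f0):
    """Pearson correlation of F0 over frames voiced in both sequences."""
    pairs = _voiced_pairs(ref_f0, conv_f0)
    n = len(pairs)
    mean_r = sum(r for r, _ in pairs) / n
    mean_c = sum(c for _, c in pairs) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pairs)
    var_r = sum((r - mean_r) ** 2 for r, _ in pairs)
    var_c = sum((c - mean_c) ** 2 for _, c in pairs)
    if var_r == 0 or var_c == 0:
        raise ValueError("correlation is undefined for a constant F0 contour")
    return cov / math.sqrt(var_r * var_c)
```

Lower MCD and F0-RMSE indicate a closer match to the reference, while F0-CORR closer to 1 indicates that the converted pitch contour follows the reference contour.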

How to run

Step1: Dataset download

The code for this step is mostly adapted from parallel_wavegan.

./bin/download_vctk_dataset.sh

Or

./bin/download_libritts_dataset.sh

Step2: Generate metadata.csv

./bin/preprocess_vctk.sh

Or

./bin/preprocess_libritts.sh
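The README does not describe the layout of metadata.csv, so the following is only a rough sketch of what such a preprocessing step might do. Everything here is hypothetical: the column names (`ID`, `spk`, `wav_path`), the function name, and the assumption that each audio file sits directly in a folder named after its speaker. The scripts in `bin/` remain the authoritative way to generate the file.

```python
import csv
from pathlib import Path


def write_metadata(dataset_root, out_csv, extensions=(".wav", ".flac")):
    """Scan dataset_root for audio files and write one CSV row per utterance."""
    rows = []
    for path in sorted(Path(dataset_root).rglob("*")):
        if path.is_file() and path.suffix.lower() in extensions:
            rows.append(
                {
                    "ID": path.stem,          # hypothetical column: utterance id
                    "spk": path.parent.name,  # assumption: parent folder is the speaker
                    "wav_path": str(path),    # hypothetical column: audio location
                }
            )
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["ID", "spk", "wav_path"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```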

Step3: Extract features

An ESPnet-style bash script is provided for extracting features, including spectrograms and linguistic, speaker, and prosodic representations. Before you start extracting features, you need to decide on the setup of your encoders, decoder and vocoder.

For example:

./extract_features.sh --stage 1 \
                      --stop_stage 4 \
                      --dataset vctk \
                      --linguistic_encoder vqwav2vec \
                      --speaker_encoder utt_dvec \
                      --prosodic_encoder ppgvc_f0 \
                      --decoder fastspeech2 \
                      --vocoder ppgvc_hifigan

Options:

dataset:
- vctk
- libritts
speaker_encoder:
- utt_dvec
linguistic_encoder:
- vqwav2vec
- conformer_ppg
- hubert_soft
prosodic_encoder:
- ppgvc_f0
- fastspeech2_pitch_energy
decoder:
- fastspeech2
- taco_ar
- taco_mol
- vits
vocoder:
- ppgvc_hifigan
- vctk_hifigan
- libritts_hifigan
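When launching feature extraction from Python (for instance to sweep over several setups), the option lists above can be checked before the script is called. The sketch below is a hypothetical helper, not part of the repo: it only encodes the values listed in this README and the flags shown in the example command, and it does not check whether a particular combination is actually supported by the script.

```python
# Allowed values copied from the option lists in this README.
OPTIONS = {
    "dataset": ["vctk", "libritts"],
    "linguistic_encoder": ["vqwav2vec", "conformer_ppg", "hubert_soft"],
    "speaker_encoder": ["utt_dvec"],
    "prosodic_encoder": ["ppgvc_f0", "fastspeech2_pitch_energy"],
    "decoder": ["fastspeech2", "taco_ar", "taco_mol", "vits"],
    "vocoder": ["ppgvc_hifigan", "vctk_hifigan", "libritts_hifigan"],
}


def build_extract_command(stage=1, stop_stage=4, **choices):
    """Validate the choices and return the argument list for extract_features.sh."""
    unknown = sorted(set(choices) - set(OPTIONS))
    missing = [key for key in OPTIONS if key not in choices]
    if unknown or missing:
        raise ValueError(f"unknown options: {unknown}, missing options: {missing}")
    cmd = ["./extract_features.sh", "--stage", str(stage), "--stop_stage", str(stop_stage)]
    for key, allowed in OPTIONS.items():
        value = choices[key]
        if value not in allowed:
            raise ValueError(f"{key}={value!r} is not one of {allowed}")
        cmd += [f"--{key}", value]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.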

Step4: Training

To run training, a config file needs to be chosen. A config file is identified by its dataset, encoders, decoder and vocoder, e.g.:

./bin/train.sh configs/vctk_vqwav2vec_uttdvec_ppgvcf0_fs2_ppgvchifigan.yaml
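The config name in this example appears to join short names for the dataset, linguistic encoder, speaker encoder, prosodic encoder, decoder and vocoder with underscores. The sketch below reproduces that pattern for the one combination shown here. The helper and the `SHORT_NAMES` table are hypothetical, and the short names of other components are not documented in this README, so check the `configs` folder for the files that actually exist.

```python
from pathlib import PurePosixPath

# Short names inferred from the single example path above; entries for other
# components are unknown and deliberately left out.
SHORT_NAMES = {
    "vctk": "vctk",
    "vqwav2vec": "vqwav2vec",
    "utt_dvec": "uttdvec",
    "ppgvc_f0": "ppgvcf0",
    "fastspeech2": "fs2",
    "ppgvc_hifigan": "ppgvchifigan",
}


def config_path(dataset, linguistic_encoder, speaker_encoder,
                prosodic_encoder, decoder, vocoder, config_dir="configs"):
    """Compose a config path following the naming pattern of the example."""
    parts = [dataset, linguistic_encoder, speaker_encoder,
             prosodic_encoder, decoder, vocoder]
    unknown = [p for p in parts if p not in SHORT_NAMES]
    if unknown:
        raise KeyError(f"no known short name for: {unknown}")
    name = "_".join(SHORT_NAMES[p] for p in parts) + ".yaml"
    return str(PurePosixPath(config_dir) / name)
```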

Repository: MingjieChen/EasyVC
