Commit

add abstract into readme
MingjieChen committed Jun 13, 2023
1 parent 714bd17 commit 012ae04
Showing 3 changed files with 9 additions and 5 deletions.
9 changes: 6 additions & 3 deletions README.md
@@ -1,8 +1,12 @@
# **EasyVC**


> Current state-of-the-art voice conversion (VC) systems are typically developed within an encoder-decoder framework, in which encoders extract linguistic, speaker or prosodic features from speech and a decoder then generates speech from those features. Recently, increasingly advanced models have been deployed as encoders or decoders for VC. Although these models achieve good performance, their effects have not been fully studied. Meanwhile, VC technology has been applied in a variety of scenarios, which poses many challenges for VC techniques. Hence, studying and understanding encoders and decoders is becoming necessary and important. However, due to the complexity of VC systems, it is not always easy to compare and analyse these encoders and decoders. This paper introduces EasyVC, a toolkit built upon the encoder-decoder framework. EasyVC supports a number of encoders and decoders within a unified framework, which makes VC training, inference, evaluation and deployment easy and convenient. EasyVC provides step-wise recipes covering everything from dataset downloading to objective evaluation and online demo presentation. Furthermore, EasyVC focuses on challenging VC scenarios such as one-shot, emotional, singing and real-time conversion, which have not yet been fully studied. EasyVC can help researchers and developers investigate the modules of VC systems and promote the development of VC techniques.

***

[[demo-page](https://mingjiechen.github.io/easyvc/index.html)]

A voice conversion framework supporting different types of encoders, decoders and vocoders.

The encoder-decoder framework is illustrated in the following figure. ![figure](enc_dec_voice_conversion.drawio.png)

@@ -14,9 +18,8 @@ Note that this repo also supports decoders that directly reconstruct waveforms (

This repo covers all the steps of a voice conversion pipeline from dataset downloading to evaluation.

I am currently maintaining this repo on my own, and I am planning to integrate more encoders and decoders.
Trained models will be available soon.

Please be aware that this repo is currently very unstable and under very fast development.


# Conda env
@@ -7,7 +7,7 @@ dev_set: dev_clean


# encoder-decoder
-ling_enc: conformerppg
+ling_enc: conformer_ppg
spk_enc: utt_dvec
pros_enc: ppgvc_f0
decoder: GradTTS
3 changes: 2 additions & 1 deletion submit_train.sh
@@ -43,16 +43,17 @@ slots=8
#gputypes="GeForceGTXTITANX|GeForceGTX1080Ti|GeForceRTX3060"
gputypes="GeForceGTX1080Ti"

+. ./bin/parse_options.sh || exit 1;
model_name=${dataset}_${ling}_${spk}_${pros}_${dec}_${vocoder}
exp=$exp_dir/$model_name/$exp_name


config=configs/${dataset}_${ling}_${spk}_${pros}_${dec}_${vocoder}.yaml
if [ ! -e $config ] ; then
echo "can't find config file $config"
exit 1;
fi

-. ./bin/parse_options.sh || exit 1;


# create exp dir
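The hunk above shows how submit_train.sh composes one name string from the selected modules and uses it for both the experiment directory and the config path. A minimal sketch of that composition, using the module names from the config hunk above plus hypothetical `dataset` and `vocoder` values (in the real script these all come from command-line flags parsed by `./bin/parse_options.sh`):

```shell
# Hypothetical example values for illustration only; submit_train.sh
# reads these from command-line flags via ./bin/parse_options.sh.
dataset=libritts
ling=conformer_ppg
spk=utt_dvec
pros=ppgvc_f0
dec=GradTTS
vocoder=hifigan

# Same composition as in the script: the joined name selects both the
# experiment directory and the expected YAML config file.
model_name=${dataset}_${ling}_${spk}_${pros}_${dec}_${vocoder}
config=configs/${model_name}.yaml

# The script aborts early if no config exists for this combination.
echo "$config"
# prints configs/libritts_conformer_ppg_utt_dvec_ppgvc_f0_GradTTS_hifigan.yaml
```

This naming scheme means every encoder/decoder/vocoder combination needs its own YAML file under `configs/`, which is why the script checks for the file's existence before submitting a job.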
