Merge pull request PlayVoice#60 from ForsakenRei/patch
update en doc to latest, slightly modified zh doc
innnky committed Feb 16, 2023
2 parents 090de29 + b9e42a9 commit a7e4e76
Showing 2 changed files with 45 additions and 19 deletions.
30 changes: 28 additions & 2 deletions Eng_docs.md
@@ -1,4 +1,5 @@
# SoftVC VITS Singing Voice Conversion

## Updates
> According to incomplete statistics, training with multiple speakers seems to worsen **voice timbre leakage**. Training a model with more than 5 speakers is not recommended; the current advice is to train a single-speaker model if you want the output to sound closer to the target timbre.
> Fixed the issue with unwanted staccato, improving audio quality noticeably.\
@@ -8,11 +9,11 @@
## Model Overview
A singing voice conversion (SVC) model: the SoftVC encoder extracts speech features from the source audio, and these are fed into VITS together with the F0 in place of the original text input to achieve voice conversion. The vocoder has also been changed to [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to fix the unwanted staccato issue.

## Notice
+ The current branch is the 32 kHz version: it needs less VRAM and runs faster at inference, and its datasets take up less disk space, so it is the recommended branch.
+ If you want to train 48 kHz variant models, switch to the [main branch](https://github.com/innnky/so-vits-svc/tree/main).


## Required models
+ Soft VC HuBERT: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
@@ -32,6 +33,8 @@
```shell
wget -P logs/32k/ https://huggingface.co/innnky/sovits_pretrained/resolve/main/D_0.pth
```

## Colab notebook script for dataset creation and training
[colab training notebook](https://colab.research.google.com/drive/1rCUOOVG7-XQlVZuWRAj5IpGrMM8t07pE?usp=sharing)

## Dataset preparation
All that is required is to put the data under the `dataset_raw` folder in the structure shown below.
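For reference, a hedged sketch of a typical layout (speaker and file names are placeholders):

```
dataset_raw
├── speaker0
│   ├── clip001.wav
│   └── clip002.wav
└── speaker1
    └── clip001.wav
```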
@@ -81,3 +84,26 @@
Use [inference_main.py](inference_main.py)
+ Set `clean_names` to the name(s) of the audio file(s) to convert (placed under `raw`).
+ Set `trans` to the pitch shift in semitones.
+ Set `spk_list` to the target speaker name (see the sketch after this list).
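As an illustration, the options above are plain variables edited at the top of [inference_main.py](inference_main.py); the values below are placeholders, and the exact layout of the script may differ between revisions:

```python
# Illustrative sketch only -- adjust every value to your own setup.
model_path = "logs/32k/G_10000.pth"  # placeholder: your latest checkpoint
clean_names = ["song"]               # audio placed under raw/, named without ".wav"
trans = [0]                          # pitch shift in semitones, one entry per clip
spk_list = ["myspeaker"]             # target speaker from your dataset
```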

## Onnx Exporting
### **When exporting Onnx, please make sure you re-clone the whole repository!!!**
Use [onnx_export.py](onnx_export.py)
+ Create a new folder called `checkpoints`.
+ Inside `checkpoints`, create a project folder named after your project; we'll use `myproject` as an example, giving the structure `./checkpoints/myproject`.
+ Rename your model to `model.pth` and your config file to `config.json`, then move both into the `myproject` folder (see the sketch after this list).
+ In [onnx_export.py](onnx_export.py), change `path = "NyaruTaffy"` so that `NyaruTaffy` becomes your project name, here `path = "myproject"`.
+ Run [onnx_export.py](onnx_export.py).
+ Once it finishes, a `model.onnx` will be generated in the `myproject` folder; that is your exported model.
+ Note: to export a 48 kHz model, follow the instructions below or use `model_onnx_48k.py` directly.
+ Open [model_onnx.py](model_onnx.py) and change `hps={"sampling_rate": 32000...}` to `hps={"sampling_rate": 48000}` in the `SynthesizerTrn` class.
+ Open [nvSTFT](/vdecoder/hifigan/nvSTFT.py) and replace all `32000` with `48000`.
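A minimal sketch of the folder preparation above, assuming you run it from the repository root and your checkpoint is named `G_10000.pth` (a placeholder):

```python
# Mirrors the manual steps above; all file names are placeholders.
import os
import shutil

project = "myproject"  # must match `path = "myproject"` in onnx_export.py
os.makedirs(os.path.join("checkpoints", project), exist_ok=True)
shutil.copy("logs/32k/G_10000.pth", os.path.join("checkpoints", project, "model.pth"))
shutil.copy("configs/config.json", os.path.join("checkpoints", project, "config.json"))
```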
### Onnx Model UI Support
+ [MoeSS](https://github.com/NaruseMioShirakana/MoeSS)
+ All training functions and complex transformations have been removed; only with all of them removed can you be sure you are actually running Onnx.

## Gradio (WebUI)
Use [sovits_gradio.py](sovits_gradio.py) to run the Gradio WebUI.
+ Create a new folder called `checkpoints`.
+ Inside `checkpoints`, create a project folder named after your project; we'll use `myproject` as an example, giving the structure `./checkpoints/myproject`.
+ Rename your model to `model.pth` and your config file to `config.json`, then move both into the `myproject` folder.
+ Run [sovits_gradio.py](sovits_gradio.py) (see the sketch below).
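The checkpoint layout is identical to the Onnx export section above; a hedged end-to-end sketch (placeholder file names, assuming `python` is on your PATH):

```python
# Prepare the same placeholder layout as in the Onnx section, then launch the WebUI.
import os
import shutil
import subprocess

project = "myproject"
os.makedirs(os.path.join("checkpoints", project), exist_ok=True)
shutil.copy("logs/32k/G_10000.pth", os.path.join("checkpoints", project, "model.pth"))
shutil.copy("configs/config.json", os.path.join("checkpoints", project, "config.json"))
subprocess.run(["python", "sovits_gradio.py"], check=True)
```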
34 changes: 17 additions & 17 deletions README.md
@@ -1,9 +1,9 @@
# SoftVC VITS Singing Voice Conversion
## English docs
[Check here](Eng_docs.md)


## Updates
> According to incomplete statistics, multi-speaker training seems to worsen **voice timbre leakage**; training models with more than 5 speakers is not recommended. The current advice: if you want the result to sound more like the target, **train a single-speaker model whenever possible**.\
> The staccato issue has been fixed, and audio quality has improved considerably.\
> Version 2.0 has been moved to the sovits_2.0 branch.\
@@ -22,12 +22,12 @@

## Model files to download in advance
+ Soft VC HuBERT: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
  + Place it in the `hubert` directory
+ Pre-trained base model files [G_0.pth](https://huggingface.co/innnky/sovits_pretrained/resolve/main/G_0.pth) and [D_0.pth](https://huggingface.co/innnky/sovits_pretrained/resolve/main/D_0.pth)
  + Place them in the `logs/32k` directory
+ The pre-trained base model is required: testing shows that training from scratch may fail to converge, and the base model also speeds up training
+ The base model's training data includes 云灏, 即霜, 辉宇·星AI, 派蒙, and 绫地宁宁, covering common male and female vocal ranges, so it can be considered a relatively general-purpose base model
+ Weights not needed for initialization, such as the `optimizer` and `speaker_embedding`, have been stripped from the base model; it can only be used to initialize training and cannot be used for inference
+ The same base model works for both this and the 48 kHz version
```shell
# One-click download
```
@@ -90,24 +90,24 @@
```shell
python train.py -c configs/config.json -m 32k
```
## Inference

Use [inference_main.py](inference_main.py)
+ Change `model_path` to your latest trained checkpoint
+ Put the audio to be converted in the `raw` folder
+ Set `clean_names` to the names of the audio files to convert
+ Set `trans` to the pitch shift in semitones
+ Set `spk_list` to the name of the speaker to synthesize


## Onnx Export
### Important, so it is said three times: when exporting Onnx, re-clone the whole repository!!! When exporting Onnx, re-clone the whole repository!!! When exporting Onnx, re-clone the whole repository!!!
Use [onnx_export.py](onnx_export.py)
+ Create a new folder `checkpoints` and open it
+ Inside `checkpoints`, create a project folder named after your project, e.g. `aziplayer`
+ Rename your model to `model.pth` and your config file to `config.json`, then place them in the `aziplayer` folder you just created
+ In [onnx_export.py](onnx_export.py), change `"NyaruTaffy"` in `path = "NyaruTaffy"` to your project name: `path = "aziplayer"`
+ Run [onnx_export.py](onnx_export.py)
+ Wait for it to finish; the `model.onnx` generated in your project folder is the exported model
+ Note: to export a 48K model, modify the files as described below, or use `model_onnx_48k.py` directly
+ Open [model_onnx.py](model_onnx.py) and, in the last class, `SynthesizerTrn`, change the 32000 of `sampling_rate` in `hps` to 48000
+ Open [nvSTFT](/vdecoder/hifigan/nvSTFT.py) and change all 32000 to 48000 (see the sketch after this list)
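As a hedged convenience (a sketch, not part of the project's tooling), the blanket replacement above can be scripted:

```python
# Sketch: replace every 32000 with 48000 in nvSTFT.py, per the step above.
from pathlib import Path

path = Path("vdecoder/hifigan/nvSTFT.py")
path.write_text(path.read_text(encoding="utf-8").replace("32000", "48000"), encoding="utf-8")
```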
### UIs that support Onnx models
+ [MoeSS](https://github.com/NaruseMioShirakana/MoeSS)
