
Neural Formant Synthesis with Differentiable Resonant Filters

Neural formant synthesis using differentiable resonant filters and a source-filter model structure.

Authors: Lauri Juvela, Pablo Pérez Zarazaga, Gustav Eje Henter, Zofia Malisz

Table of contents

  1. Model overview
    1. Sound samples
  2. Repository installation
    1. Conda environment
    2. GlotNet
    3. HiFi-GAN
    4. Additional libraries
  3. Pre-trained models
  4. Inference
  5. Training
  6. Citation information

Model overview

We present a model that performs neural speech synthesis using the structure of the source-filter model, which allows the spectral envelope and the glottal excitation to be inspected and manipulated independently:

Figure: Neural formant pipeline following the source-filter model architecture.

Sound samples

A description of the presented model, along with sound samples compared against other synthesis/manipulation systems, can be found on the project's demo webpage.

Repository installation

Conda environment

First, create a conda environment with the required dependencies. Use mamba to speed up the process if possible.

mamba env create -n neuralformants -f environment.yml
conda activate neuralformants
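
If mamba is not available, plain conda accepts the same arguments (this is standard conda usage, not specific to this repository):

conda env create -n neuralformants -f environment.yml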

Pre-trained models are hosted on Hugging Face and can be downloaded using git-lfs. If you don't have git-lfs installed (it is included in environment.yml), you can find it here. Use the following command to download the pre-trained models:

git submodule update --init --recursive
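
If the downloaded checkpoint files show up as small text pointer files rather than actual model weights, the LFS content has not been fetched yet. The standard git-lfs commands below pull it in; the checkpoints directory is assumed here from the inference examples further down:

cd checkpoints
git lfs install
git lfs pull
cd ..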

Install the package in development mode:

pip install -e .

GlotNet

Parts of GlotNet are included for its WaveNet models and DSP functions. The full repository is available here.

HiFi-GAN

HiFi-GAN is included in the hifi_gan subdirectory. The original source code is available here.

Inference

We provide scripts to run inference on the end-to-end architecture: an audio file is given as input, and a wav file with the manipulated features is written as output.

Change the feature scaling to modify the pitch (F0) or the formants. The scales are provided as a list of five elements in the following order:

[F0, F1, F2, F3, F4]
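
For example, scaling F0 by 1.2 while keeping the formant scales at 1.0 raises the pitch by 20% without moving the formants. The flag below (the value is illustrative) can be passed to any of the inference scripts in this section:

--feature_scale "[1.2, 1.0, 1.0, 1.0, 1.0]"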

An example with the provided audio samples from the VCTK dataset can be run using:

HiFi-Glot

python inference_hifiglot.py \
    --input_path "./Samples" \
    --output_path "./output/hifi-glot" \
    --config "./checkpoints/HiFi-Glot/config_hifigan.json" \
    --fm_config "./checkpoints/HiFi-Glot/config_feature_map.json" \
    --checkpoint_path "./checkpoints/HiFi-Glot" \
    --feature_scale "[1.0, 1.0, 1.0, 1.0, 1.0]"

NFS

python inference_hifigan.py \
    --input_path "./Samples" \
    --output_path "./output/nfs" \
    --config "./checkpoints/NFS/config_hifigan.json" \
    --fm_config "./checkpoints/NFS/config_feature_map.json" \
    --checkpoint_path "./checkpoints/NFS" \
    --feature_scale "[1.0, 1.0, 1.0, 1.0, 1.0]"

NFS-E2E

python inference_hifigan.py \
    --input_path "./Samples" \
    --output_path "./output/nfs-e2e" \
    --config "./checkpoints/NFS-E2E/config_hifigan.json" \
    --fm_config "./checkpoints/NFS-E2E/config_feature_map.json" \
    --checkpoint_path "./checkpoints/NFS-E2E" \
    --feature_scale "[1.0, 1.0, 1.0, 1.0, 1.0]"

Training

The HiFi-GAN and HiFi-Glot models can be trained within the end-to-end architecture using the scripts train_e2e_hifigan.py and train_e2e_hifiglot.py.
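
The command-line interface of the training scripts is not documented in this README. Assuming they take config files in the same way as the inference scripts, a run might look like the sketch below; the flag names and paths are hypothetical and should be checked against each script's argument parser:

python train_e2e_hifiglot.py \
    --config "./checkpoints/HiFi-Glot/config_hifigan.json" \
    --fm_config "./checkpoints/HiFi-Glot/config_feature_map.json"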

Citation information

Citation information will be added when a pre-print is available.
