[week07] add homework
markovka17 committed Nov 7, 2020
1 parent f99f08f commit e1e513b
Showing 2 changed files with 201 additions and 0 deletions.
201 changes: 201 additions & 0 deletions week07/homework.ipynb
@@ -0,0 +1,201 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Homework №5\n",
"\n",
" This homework will be dedicated to the Text-to-Speech(TTS), specifically the neural vocoder."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data\n",
"\n",
" In this homework we will use only LJSpeech https://keithito.com/LJ-Speech-Dataset/.\n",
"\n",
" Use the following `featurizer` (his configuration is +- standard for this task):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"code_folding": [
13,
27
]
},
"outputs": [],
"source": [
"from IPython import display\n",
"from dataclasses import dataclass\n",
"\n",
"import torch\n",
"from torch import nn\n",
"\n",
"import torchaudio\n",
"\n",
"import librosa\n",
"from matplotlib import pyplot as plt\n",
"\n",
"\n",
"@dataclass\n",
"class MelSpectrogramConfig:\n",
" sr: int = 22050\n",
" win_length: int = 1024\n",
" hop_length: int = 256\n",
" n_fft: int = 1024\n",
" f_min: int = 0\n",
" f_max: int = 8000\n",
" n_mels: int = 80\n",
" power: float = 1.0\n",
" \n",
" # value of melspectrograms if we fed a silence into `MelSpectrogram`\n",
" pad_value: float = -11.5129251\n",
"\n",
"\n",
"class MelSpectrogram(nn.Module):\n",
"\n",
" def __init__(self, config: MelSpectrogramConfig):\n",
" super(MelSpectrogram, self).__init__()\n",
" \n",
" self.config = config\n",
"\n",
" self.mel_spectrogram = torchaudio.transforms.MelSpectrogram(\n",
" sample_rate=config.sr,\n",
" win_length=config.win_length,\n",
" hop_length=config.hop_length,\n",
" n_fft=config.n_fft,\n",
" f_min=config.f_min,\n",
" f_max=config.f_max,\n",
" n_mels=config.n_mels\n",
" )\n",
"\n",
" # The is no way to set power in constructor in 0.5.0 version.\n",
" self.mel_spectrogram.spectrogram.power = config.power\n",
"\n",
" # Default `torchaudio` mel basis uses HTK formula. In order to be compatible with WaveGlow\n",
" # we decided to use Slaney one instead (as well as `librosa` does by default).\n",
" mel_basis = librosa.filters.mel(\n",
" sr=config.sr,\n",
" n_fft=config.n_fft,\n",
" n_mels=config.n_mels,\n",
" fmin=config.f_min,\n",
" fmax=config.f_max\n",
" ).T\n",
" self.mel_spectrogram.mel_scale.fb.copy_(torch.tensor(mel_basis))\n",
" \n",
"\n",
" def forward(self, audio: torch.Tensor) -> torch.Tensor:\n",
" \"\"\"\n",
" :param audio: Expected shape is [B, T]\n",
" :return: Shape is [B, n_mels, T']\n",
" \"\"\"\n",
" \n",
" mel = self.mel_spectrogram(audio) \\\n",
" .clamp_(min=1e-5) \\\n",
" .log_()\n",
"\n",
" return mel"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"featurizer = MelSpectrogram(MelSpectrogramConfig())\n",
"wav, sr = torchaudio.load('../week01/audio.wav')\n",
"mels = featurizer(wav)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"_, axes = plt.subplots(2, 1, figsize=(15, 7))\n",
"axes[0].plot(wav.squeeze())\n",
"axes[1].imshow(mels.squeeze())\n",
"\n",
"plt.show()"
]
},
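{
"cell_type": "markdown",
"metadata": {},
"source": [
"    Below is a minimal sketch (an assumption of ours, not a required pipeline) of loading LJSpeech with the built-in `torchaudio.datasets.LJSPEECH` wrapper and featurizing one utterance. The dataset root `./data` is an arbitrary choice."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: download LJSpeech and featurize a single utterance.\n",
"# The root directory './data' is a placeholder; any path with enough free space works.\n",
"dataset = torchaudio.datasets.LJSPEECH(root='./data', download=True)\n",
"\n",
"# Each item is (waveform, sample_rate, transcript, normalized_transcript)\n",
"waveform, sample_rate, transcript, normalized_transcript = dataset[0]\n",
"print(transcript, waveform.shape, sample_rate)\n",
"\n",
"# LJSpeech audio is already at 22050 Hz, matching MelSpectrogramConfig.sr\n",
"mels = featurizer(waveform)\n",
"print(mels.shape)  # [1, n_mels, T']"
]
},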
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Model\n",
"\n",
" 1) In this homework you need to implement classical version of WaveNet.\n",
" Pay attention on:\n",
" 1.1) Causal convs. We recommend to implement it via padding.\n",
" 1.2) \"Condition Network\" which align mel with wav\n",
"\n",
" 2) (Bonus) If you have already implemented WaveNet, you can try to implement [Parallel WaveGAN](https://www.dropbox.com/s/bj25vnmkblr9y8v/PWG.pdf?dl=0).\n",
" This model is based on WaveNet and GAN.\n",
"\n",
" 3) (Bonus) Fast generation of WaveNet. https://arxiv.org/abs/1611.09482.\n",
" Don't forget to compare perfomance."
]
},
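{
"cell_type": "markdown",
"metadata": {},
"source": [
"    The cell below is a minimal sketch of the two building blocks from point 1, not a reference solution: a causal convolution implemented via explicit left padding, and a trivial nearest-neighbour upsampler that aligns the mel conditioning with the waveform. The class names (`CausalConv1d`, `ConditioningUpsampler`) and hyperparameters are placeholders of ours."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: causal convolution via left padding + naive mel upsampling.\n",
"import torch\n",
"import torch.nn.functional as F\n",
"from torch import nn\n",
"\n",
"\n",
"class CausalConv1d(nn.Module):\n",
"    # 1D convolution that only looks at the past:\n",
"    # pad (kernel_size - 1) * dilation zeros on the left, none on the right.\n",
"\n",
"    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):\n",
"        super().__init__()\n",
"        self.left_pad = (kernel_size - 1) * dilation\n",
"        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, dilation=dilation)\n",
"\n",
"    def forward(self, x):\n",
"        # x: [B, C, T]; after left padding, the output at time t depends only on inputs <= t\n",
"        x = F.pad(x, (self.left_pad, 0))\n",
"        return self.conv(x)\n",
"\n",
"\n",
"class ConditioningUpsampler(nn.Module):\n",
"    # Stretches a mel spectrogram [B, n_mels, T'] to waveform resolution [B, n_mels, T' * hop_length].\n",
"    # Nearest-neighbour repetition is the simplest alignment; learned transposed\n",
"    # convolutions are a common alternative.\n",
"\n",
"    def __init__(self, hop_length=256):\n",
"        super().__init__()\n",
"        self.hop_length = hop_length\n",
"\n",
"    def forward(self, mel):\n",
"        return F.interpolate(mel, scale_factor=self.hop_length, mode='nearest')\n",
"\n",
"\n",
"# Quick shape check on random tensors\n",
"conv = CausalConv1d(in_channels=1, out_channels=32, kernel_size=2, dilation=4)\n",
"upsampler = ConditioningUpsampler(hop_length=MelSpectrogramConfig.hop_length)\n",
"\n",
"wav_in = torch.randn(1, 1, 22050)\n",
"mel_in = torch.randn(1, 80, 87)\n",
"print(conv(wav_in).shape, upsampler(mel_in).shape)"
]
},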
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Code\n",
"\n",
" 1) In this homework you are allowed to use pytorch-lighting.\n",
"\n",
" 2) Try to write code more structurally and cleanly!\n",
"\n",
" 3) Good logging of experiments save your nerves and time, so we ask you to use W&B.\n",
" Log loss, generated and real wavs (in pair, i.e. real wav and wav from correspond mel). \n",
" Do not remove the logs until we have checked your work and given you a grade!\n",
"\n",
" 4) We also ask you to organize your code in github repo with (Bonus) Docker and setup.py. You can use my template https://github.com/markovka17/dl-start-pack.\n",
"\n",
" 5) Your work must be reproducable, so fix seed, save the weights of model, and etc.\n",
"\n",
" 6) In the end of your work write inference utils. Anyone should be able to take your weight, load it into the model and run it on some melspec."
]
},
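{
"cell_type": "markdown",
"metadata": {},
"source": [
"    The cell below sketches one possible way to address points 3, 5, and 6: a seed-fixing helper, a minimal inference utility that loads saved weights and runs the vocoder on a mel spectrogram, and a commented-out W&B logging snippet. The function names (`set_seed`, `load_vocoder`, `vocode`), the checkpoint path, and the W&B project name are illustrative, not required."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of reproducibility and inference helpers; names and paths are placeholders.\n",
"import random\n",
"\n",
"import numpy as np\n",
"import torch\n",
"\n",
"\n",
"def set_seed(seed: int = 42):\n",
"    # Fix the usual sources of randomness\n",
"    random.seed(seed)\n",
"    np.random.seed(seed)\n",
"    torch.manual_seed(seed)\n",
"    torch.cuda.manual_seed_all(seed)\n",
"\n",
"\n",
"def load_vocoder(model: torch.nn.Module, checkpoint_path: str, device: str = 'cpu') -> torch.nn.Module:\n",
"    # Load saved weights into an already constructed model and switch to eval mode\n",
"    state_dict = torch.load(checkpoint_path, map_location=device)\n",
"    model.load_state_dict(state_dict)\n",
"    return model.to(device).eval()\n",
"\n",
"\n",
"@torch.no_grad()\n",
"def vocode(model: torch.nn.Module, mel: torch.Tensor) -> torch.Tensor:\n",
"    # Run the vocoder on a mel spectrogram [B, n_mels, T'] and return a waveform\n",
"    return model(mel)\n",
"\n",
"\n",
"# Usage (assuming `vocoder` is your WaveNet and 'vocoder.pth' is a checkpoint you saved):\n",
"# set_seed(42)\n",
"# vocoder = load_vocoder(vocoder, 'vocoder.pth')\n",
"# wav = vocode(vocoder, mels)\n",
"\n",
"# W&B logging sketch (point 3): log the loss and paired real / generated audio.\n",
"# import wandb\n",
"# wandb.init(project='tts-vocoder')  # the project name is a placeholder\n",
"# wandb.log({\n",
"#     'loss': loss.item(),\n",
"#     'real_wav': wandb.Audio(real_wav.squeeze().cpu().numpy(), sample_rate=22050),\n",
"#     'generated_wav': wandb.Audio(generated_wav.squeeze().cpu().numpy(), sample_rate=22050),\n",
"# })"
]
},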
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Report\n",
"\n",
" Finally, you need to write a report in W&B https://www.wandb.com/reports. Add examples of generated mel and audio, compare with GT.\n",
" Don't forget to add link to your report."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
Empty file added week10/.gitkeep
Empty file.
