[week07] add homework

Ecovne · Nov 7, 2020 · e1e513b · e1e513b
1 parent f99f08f
commit e1e513b
Show file tree

Hide file tree

Showing 2 changed files with 201 additions and 0 deletions.
diff --git a/week07/homework.ipynb b/week07/homework.ipynb
@@ -0,0 +1,201 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Homework №5\n",
+    "\n",
+    "    This homework will be dedicated to the Text-to-Speech(TTS), specifically the neural vocoder."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Data\n",
+    "\n",
+    "    In this homework we will use only LJSpeech https://keithito.com/LJ-Speech-Dataset/.\n",
+    "\n",
+    "    Use the following `featurizer` (his configuration is +- standard for this task):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "code_folding": [
+     13,
+     27
+    ]
+   },
+   "outputs": [],
+   "source": [
+    "from IPython import display\n",
+    "from dataclasses import dataclass\n",
+    "\n",
+    "import torch\n",
+    "from torch import nn\n",
+    "\n",
+    "import torchaudio\n",
+    "\n",
+    "import librosa\n",
+    "from matplotlib import pyplot as plt\n",
+    "\n",
+    "\n",
+    "@dataclass\n",
+    "class MelSpectrogramConfig:\n",
+    "    sr: int = 22050\n",
+    "    win_length: int = 1024\n",
+    "    hop_length: int = 256\n",
+    "    n_fft: int = 1024\n",
+    "    f_min: int = 0\n",
+    "    f_max: int = 8000\n",
+    "    n_mels: int = 80\n",
+    "    power: float = 1.0\n",
+    "        \n",
+    "    # value of melspectrograms if we fed a silence into `MelSpectrogram`\n",
+    "    pad_value: float = -11.5129251\n",
+    "\n",
+    "\n",
+    "class MelSpectrogram(nn.Module):\n",
+    "\n",
+    "    def __init__(self, config: MelSpectrogramConfig):\n",
+    "        super(MelSpectrogram, self).__init__()\n",
+    "        \n",
+    "        self.config = config\n",
+    "\n",
+    "        self.mel_spectrogram = torchaudio.transforms.MelSpectrogram(\n",
+    "            sample_rate=config.sr,\n",
+    "            win_length=config.win_length,\n",
+    "            hop_length=config.hop_length,\n",
+    "            n_fft=config.n_fft,\n",
+    "            f_min=config.f_min,\n",
+    "            f_max=config.f_max,\n",
+    "            n_mels=config.n_mels\n",
+    "        )\n",
+    "\n",
+    "        # The is no way to set power in constructor in 0.5.0 version.\n",
+    "        self.mel_spectrogram.spectrogram.power = config.power\n",
+    "\n",
+    "        # Default `torchaudio` mel basis uses HTK formula. In order to be compatible with WaveGlow\n",
+    "        # we decided to use Slaney one instead (as well as `librosa` does by default).\n",
+    "        mel_basis = librosa.filters.mel(\n",
+    "            sr=config.sr,\n",
+    "            n_fft=config.n_fft,\n",
+    "            n_mels=config.n_mels,\n",
+    "            fmin=config.f_min,\n",
+    "            fmax=config.f_max\n",
+    "        ).T\n",
+    "        self.mel_spectrogram.mel_scale.fb.copy_(torch.tensor(mel_basis))\n",
+    "    \n",
+    "\n",
+    "    def forward(self, audio: torch.Tensor) -> torch.Tensor:\n",
+    "        \"\"\"\n",
+    "        :param audio: Expected shape is [B, T]\n",
+    "        :return: Shape is [B, n_mels, T']\n",
+    "        \"\"\"\n",
+    "        \n",
+    "        mel = self.mel_spectrogram(audio) \\\n",
+    "            .clamp_(min=1e-5) \\\n",
+    "            .log_()\n",
+    "\n",
+    "        return mel"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "featurizer = MelSpectrogram(MelSpectrogramConfig())\n",
+    "wav, sr = torchaudio.load('../week01/audio.wav')\n",
+    "mels = featurizer(wav)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "_, axes = plt.subplots(2, 1, figsize=(15, 7))\n",
+    "axes[0].plot(wav.squeeze())\n",
+    "axes[1].imshow(mels.squeeze())\n",
+    "\n",
+    "plt.show()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Model\n",
+    "\n",
+    "    1) In this homework you need to implement classical version of WaveNet.\n",
+    "        Pay attention on:\n",
+    "            1.1) Causal convs. We recommend to implement it via padding.\n",
+    "            1.2) \"Condition Network\" which align mel with wav\n",
+    "\n",
+    "    2) (Bonus) If you have already implemented WaveNet, you can try to implement [Parallel WaveGAN](https://www.dropbox.com/s/bj25vnmkblr9y8v/PWG.pdf?dl=0).\n",
+    "        This model is based on WaveNet and GAN.\n",
+    "\n",
+    "    3) (Bonus) Fast generation of WaveNet. https://arxiv.org/abs/1611.09482.\n",
+    "        Don't forget to compare perfomance."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Code\n",
+    "\n",
+    "    1) In this homework you are allowed to use pytorch-lighting.\n",
+    "\n",
+    "    2) Try to write code more structurally and cleanly!\n",
+    "\n",
+    "    3) Good logging of experiments save your nerves and time, so we ask you to use W&B.\n",
+    "       Log loss, generated and real wavs (in pair, i.e. real wav and wav from correspond mel). \n",
+    "       Do not remove the logs until we have checked your work and given you a grade!\n",
+    "\n",
+    "    4) We also ask you to organize your code in github repo with (Bonus) Docker and setup.py. You can use my template https://github.com/markovka17/dl-start-pack.\n",
+    "\n",
+    "    5) Your work must be reproducable, so fix seed, save the weights of model, and etc.\n",
+    "\n",
+    "    6) In the end of your work write inference utils. Anyone should be able to take your weight, load it into the model and run it on some melspec."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Report\n",
+    "\n",
+    "    Finally, you need to write a report in W&B https://www.wandb.com/reports. Add examples of generated mel and audio, compare with GT.\n",
+    "    Don't forget to add link to your report."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
diff --git a/week10/.gitkeep b/week10/.gitkeep