Added demo colab nb
brihijoshi committed May 4, 2020
1 parent 30ee7a1 commit fa02df6
Showing 1 changed file with 301 additions and 0 deletions.
301 changes: 301 additions & 0 deletions Demo_DL_Based_Emotional_TTS.ipynb
@@ -0,0 +1,301 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Demo - DL Based Emotional TTS",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Emotional-Text-to-Speech/dl-for-emo-tts/blob/master/Demo_DL_Based_Emotional_TTS.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f9ds3NJZ5T92",
"colab_type": "text"
},
"source": [
"# DL Based Emotional Text to Speech\n",
"\n",
"In this demo, we provide an interface to generate emotional speech from user inputs for both the emotional label and the text.\n",
"\n",
"The models that are trained are [Tacotron](https://github.com/Emotional-Text-to-Speech/tacotron_pytorch) and [DC-TTS](https://github.com/Emotional-Text-to-Speech/pytorch-dc-tts).\n",
"\n",
"Further information about our approaches and *exactly how* did we develop this demo can be seen [here](https://github.com/Emotional-Text-to-Speech/dl-for-emo-tts).\n",
"\n",
"---\n",
"---\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "_7gRpQLXQSID"
},
"source": [
"## Download the required code and install the dependences\n",
"\n",
"- Make sure you have clicked on ```Open in Playground``` to be able to run the cells. Set your runtime to ```GPU```. This can be done with the following steps:\n",
" - Click on ```Runtime``` on the menubar above \n",
" - Select ```Change runtime type```\n",
" - Select ```GPU``` from the ```Hardware accelerator``` dropdown and save.\n",
"- Run the cell below. It will automatically create the required directory structure. In order to run the cell, click on the **arrow** that is on the left column of the cell (hover over the ```[]``` symbol). Optionally, you can also press ```Shift + Enter ```\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "V4d2LXHbC-Es",
"colab_type": "code",
"colab": {}
},
"source": [
"! git clone https://github.com/Emotional-Text-to-Speech/pytorch-dc-tts\n",
"! git clone --recursive https://github.com/Emotional-Text-to-Speech/tacotron_pytorch.git\n",
"! cd \"tacotron_pytorch/\" && pip install -e .\n",
"! pip install unidecode\n",
"! pip install gdown\n",
"! mkdir trained_models\n",
"\n",
"import gdown\n",
"url = 'https://drive.google.com/uc?id=1rmhtEl3N3kAfnQM6J0vDGSCCHlHLK6kw'\n",
"output = 'trained_models/angry_dctts.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1bP0eJ6z4onr2klolzU17Y8SaNspxQjF-'\n",
"output = 'trained_models/neutral_dctts.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1WWE9zxS3FRgD0Y5yIdNmLY9-t5gnBsNt'\n",
"output = 'trained_models/ssrn.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1N6Ykrd1IaPiNdos_iv0J6JbY2gBDghod'\n",
"output = 'trained_models/disgust_tacotron.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=15m0PZ8xaBocb_6wDjAU6S4Aunbr3TKkM'\n",
"output = 'trained_models/amused_tacotron.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1D6HGWYWvhdvLWQt4uOYqdmuVO7ZVLWNa'\n",
"output = 'trained_models/sleepiness_tacotron.pth'\n",
"gdown.download(url, output, quiet=False)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "LaZ8INV0IOgH",
"colab_type": "text"
},
"source": [
"## Setup the required code\n",
"\n",
"- Run the cell below. It will automatically create the required directory structure. In order to run the cell, click on the **arrow** that is on the left column of the cell (hover over the ```[]``` symbol). Optionally, you can also press ```Shift + Enter ```"
]
},
{
"cell_type": "code",
"metadata": {
"id": "GnZPVzThIBkd",
"colab_type": "code",
"colab": {}
},
"source": [
"%tensorflow_version 1.x \n",
"%pylab inline\n",
"rcParams[\"figure.figsize\"] = (10,5)\n",
"\n",
"import os\n",
"import sys\n",
"import numpy as np\n",
"sys.path.append('pytorch-dc-tts/')\n",
"sys.path.append('pytorch-dc-tts/models')\n",
"sys.path.append(\"tacotron_pytorch/\")\n",
"sys.path.append(\"tacotron_pytorch/lib/tacotron\")\n",
"\n",
"# For the DC-TTS\n",
"import torch\n",
"from text2mel import Text2Mel\n",
"from ssrn import SSRN\n",
"from audio import save_to_wav, spectrogram2wav\n",
"from utils import get_last_checkpoint_file_name, load_checkpoint_test, save_to_png, load_checkpoint\n",
"from datasets.emovdb import vocab, get_test_data\n",
"\n",
"# For the Tacotron\n",
"from text import text_to_sequence, symbols\n",
"# from util import audio\n",
"\n",
"from tacotron_pytorch import Tacotron\n",
"from synthesis import tts as _tts\n",
"\n",
"# For Audio/Display purposes\n",
"import librosa.display\n",
"import IPython\n",
"from IPython.display import Audio\n",
"from IPython.display import display\n",
"from google.colab import widgets\n",
"from google.colab import output\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"\n",
"torch.set_grad_enabled(False)\n",
"text2mel = Text2Mel(vocab).eval()\n",
"\n",
"ssrn = SSRN().eval()\n",
"load_checkpoint('trained_models/ssrn.pth', ssrn, None)\n",
"\n",
"model = Tacotron(n_vocab=len(symbols),\n",
" embedding_dim=256,\n",
" mel_dim=80,\n",
" linear_dim=1025,\n",
" r=5,\n",
" padding_idx=None,\n",
" use_memory_mask=False,\n",
" )\n",
"\n",
"def visualize(alignment, spectrogram, Emotion):\n",
" label_fontsize = 16\n",
" tb = widgets.TabBar(['Alignment', 'Spectrogram'], location='top')\n",
" with tb.output_to('Alignment'):\n",
" imshow(alignment.T, aspect=\"auto\", origin=\"lower\", interpolation=None)\n",
" xlabel(\"Decoder timestamp\", fontsize=label_fontsize)\n",
" ylabel(\"Encoder timestamp\", fontsize=label_fontsize)\n",
" with tb.output_to('Spectrogram'):\n",
" if Emotion == 'Disgust' or Emotion == 'Amused' or Emotion == 'Sleepiness':\n",
" librosa.display.specshow(spectrogram.T, sr=fs,hop_length=hop_length, x_axis=\"time\", y_axis=\"linear\")\n",
" else:\n",
" librosa.display.specshow(spectrogram, sr=fs,hop_length=hop_length, x_axis=\"time\", y_axis=\"linear\")\n",
"\n",
" xlabel(\"Time\", fontsize=label_fontsize)\n",
" ylabel(\"Hz\", fontsize=label_fontsize)\n",
"\n",
"def tts_dctts(text2mel, ssrn, text):\n",
" sentences = [text]\n",
"\n",
" max_N = len(text)\n",
" L = torch.from_numpy(get_test_data(sentences, max_N))\n",
" zeros = torch.from_numpy(np.zeros((1, 80, 1), np.float32))\n",
" Y = zeros\n",
" A = None\n",
"\n",
" for t in range(210):\n",
" _, Y_t, A = text2mel(L, Y, monotonic_attention=True)\n",
" Y = torch.cat((zeros, Y_t), -1)\n",
" _, attention = torch.max(A[0, :, -1], 0)\n",
" attention = attention.item()\n",
" if L[0, attention] == vocab.index('E'): # EOS\n",
" break\n",
"\n",
" _, Z = ssrn(Y)\n",
" Y = Y.cpu().detach().numpy()\n",
" A = A.cpu().detach().numpy()\n",
" Z = Z.cpu().detach().numpy()\n",
"\n",
" return spectrogram2wav(Z[0, :, :].T), A[0, :, :], Y[0, :, :]\n",
"\n",
"\n",
"def tts_tacotron(model, text):\n",
" waveform, alignment, spectrogram = _tts(model, text)\n",
" return waveform, alignment, spectrogram \n",
"\n",
"def present(waveform, Emotion, figures=False):\n",
" if figures!=False:\n",
" visualize(figures[0], figures[1], Emotion)\n",
" IPython.display.display(Audio(waveform, rate=fs))\n",
"\n",
" \n",
"fs = 20000 #20000\n",
"hop_length = 250\n",
"model.decoder.max_decoder_steps = 200"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "wiSAxs7XIUO7",
"colab_type": "text"
},
"source": [
"## Run the Demo\n",
"\n",
"- Select an ```Emotion``` from the dropdown and enter the ```Text``` that you want to be generated. \n",
"- Run the cell below. It will automatically create the required directory structure. In order to run the cell, click on the **arrow** that is on the left column of the cell (hover over the ```[]``` symbol). Optionally, you can also press ```Shift + Enter ```\n",
"\n",
"**Play the speech with the generated audio player and view the required plots by clicking on their respective tabs!**\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"cellView": "form",
"colab_type": "code",
"id": "3jKM6GfzlgpS",
"colab": {}
},
"source": [
"#@title Select the emotion and type the text\n",
"\n",
"%pylab inline\n",
"\n",
"Emotion = \"Neutral\" #@param [\"Neutral\", \"Angry\", \"Disgust\", \"Sleepiness\", \"Amused\"]\n",
"Text = 'I am exhausted.' #@param {type:\"string\"}\n",
"\n",
"wav, align, mel = None, None, None\n",
"\n",
"if Emotion == \"Neutral\":\n",
" load_checkpoint('trained_models/'+Emotion.lower()+'_dctts.pth', text2mel, None)\n",
" wav, align, mel = tts_dctts(text2mel, ssrn, Text)\n",
"elif Emotion == \"Angry\":\n",
" load_checkpoint_test('trained_models/'+Emotion.lower()+'_dctts.pth', text2mel, None)\n",
" wav, align, mel = tts_dctts(text2mel, ssrn, Text)\n",
" # wav = wav.T\n",
"elif Emotion == \"Disgust\" or Emotion == \"Amused\" or Emotion == \"Sleepiness\":\n",
" checkpoint = torch.load('trained_models/'+Emotion.lower()+'_tacotron.pth', map_location=torch.device('cpu'))\n",
" model.load_state_dict(checkpoint[\"state_dict\"])\n",
" wav, align, mel = tts_tacotron(model, Text)\n",
"\n",
"present(wav, Emotion, (align,mel))\n",
"\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "AchHD0xLcaE1",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}
