Added demo colab nb
brihijoshi committed May 4, 2020
1 parent 30ee7a1 commit fa02df6
Showing 1 changed file with 301 additions and 0 deletions.
301 changes: 301 additions & 0 deletions Demo_DL_Based_Emotional_TTS.ipynb
@@ -0,0 +1,301 @@
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Demo - DL Based Emotional TTS",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"accelerator": "GPU"
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/github/Emotional-Text-to-Speech/dl-for-emo-tts/blob/master/Demo_DL_Based_Emotional_TTS.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f9ds3NJZ5T92",
"colab_type": "text"
},
"source": [
"# DL Based Emotional Text to Speech\n",
"\n",
"In this demo, we provide an interface to generate emotional speech from user inputs for both the emotional label and the text.\n",
"\n",
"The models that are trained are [Tacotron](https://github.com/Emotional-Text-to-Speech/tacotron_pytorch) and [DC-TTS](https://github.com/Emotional-Text-to-Speech/pytorch-dc-tts).\n",
"\n",
"Further information about our approaches and *exactly how* did we develop this demo can be seen [here](https://github.com/Emotional-Text-to-Speech/dl-for-emo-tts).\n",
"\n",
"---\n",
"---\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "_7gRpQLXQSID"
},
"source": [
"## Download the required code and install the dependences\n",
"\n",
"- Make sure you have clicked on ```Open in Playground``` to be able to run the cells. Set your runtime to ```GPU```. This can be done with the following steps:\n",
" - Click on ```Runtime``` on the menubar above \n",
" - Select ```Change runtime type```\n",
" - Select ```GPU``` from the ```Hardware accelerator``` dropdown and save.\n",
"- Run the cell below. It will automatically create the required directory structure. In order to run the cell, click on the **arrow** that is on the left column of the cell (hover over the ```[]``` symbol). Optionally, you can also press ```Shift + Enter ```\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "V4d2LXHbC-Es",
"colab_type": "code",
"colab": {}
},
"source": [
"! git clone https://github.com/Emotional-Text-to-Speech/pytorch-dc-tts\n",
"! git clone --recursive https://github.com/Emotional-Text-to-Speech/tacotron_pytorch.git\n",
"! cd \"tacotron_pytorch/\" && pip install -e .\n",
"! pip install unidecode\n",
"! pip install gdown\n",
"! mkdir trained_models\n",
"\n",
"import gdown\n",
"url = 'https://drive.google.com/uc?id=1rmhtEl3N3kAfnQM6J0vDGSCCHlHLK6kw'\n",
"output = 'trained_models/angry_dctts.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1bP0eJ6z4onr2klolzU17Y8SaNspxQjF-'\n",
"output = 'trained_models/neutral_dctts.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1WWE9zxS3FRgD0Y5yIdNmLY9-t5gnBsNt'\n",
"output = 'trained_models/ssrn.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1N6Ykrd1IaPiNdos_iv0J6JbY2gBDghod'\n",
"output = 'trained_models/disgust_tacotron.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=15m0PZ8xaBocb_6wDjAU6S4Aunbr3TKkM'\n",
"output = 'trained_models/amused_tacotron.pth'\n",
"gdown.download(url, output, quiet=False)\n",
"url = 'https://drive.google.com/uc?id=1D6HGWYWvhdvLWQt4uOYqdmuVO7ZVLWNa'\n",
"output = 'trained_models/sleepiness_tacotron.pth'\n",
"gdown.download(url, output, quiet=False)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "LaZ8INV0IOgH",
"colab_type": "text"
},
"source": [
"## Setup the required code\n",
"\n",
"- Run the cell below. It will automatically create the required directory structure. In order to run the cell, click on the **arrow** that is on the left column of the cell (hover over the ```[]``` symbol). Optionally, you can also press ```Shift + Enter ```"
]
},
{
"cell_type": "code",
"metadata": {
"id": "GnZPVzThIBkd",
"colab_type": "code",
"colab": {}
},
"source": [
"%tensorflow_version 1.x \n",
"%pylab inline\n",
"rcParams[\"figure.figsize\"] = (10,5)\n",
"\n",
"import os\n",
"import sys\n",
"import numpy as np\n",
"sys.path.append('pytorch-dc-tts/')\n",
"sys.path.append('pytorch-dc-tts/models')\n",
"sys.path.append(\"tacotron_pytorch/\")\n",
"sys.path.append(\"tacotron_pytorch/lib/tacotron\")\n",
"\n",
"# For the DC-TTS\n",
"import torch\n",
"from text2mel import Text2Mel\n",
"from ssrn import SSRN\n",
"from audio import save_to_wav, spectrogram2wav\n",
"from utils import get_last_checkpoint_file_name, load_checkpoint_test, save_to_png, load_checkpoint\n",
"from datasets.emovdb import vocab, get_test_data\n",
"\n",
"# For the Tacotron\n",
"from text import text_to_sequence, symbols\n",
"# from util import audio\n",
"\n",
"from tacotron_pytorch import Tacotron\n",
"from synthesis import tts as _tts\n",
"\n",
"# For Audio/Display purposes\n",
"import librosa.display\n",
"import IPython\n",
"from IPython.display import Audio\n",
"from IPython.display import display\n",
"from google.colab import widgets\n",
"from google.colab import output\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"\n",
"torch.set_grad_enabled(False)\n",
"text2mel = Text2Mel(vocab).eval()\n",
"\n",
"ssrn = SSRN().eval()\n",
"load_checkpoint('trained_models/ssrn.pth', ssrn, None)\n",
"\n",
"model = Tacotron(n_vocab=len(symbols),\n",
" embedding_dim=256,\n",
" mel_dim=80,\n",
" linear_dim=1025,\n",
" r=5,\n",
" padding_idx=None,\n",
" use_memory_mask=False,\n",
" )\n",
"\n",
"def visualize(alignment, spectrogram, Emotion):\n",
" label_fontsize = 16\n",
" tb = widgets.TabBar(['Alignment', 'Spectrogram'], location='top')\n",
" with tb.output_to('Alignment'):\n",
" imshow(alignment.T, aspect=\"auto\", origin=\"lower\", interpolation=None)\n",
" xlabel(\"Decoder timestamp\", fontsize=label_fontsize)\n",
" ylabel(\"Encoder timestamp\", fontsize=label_fontsize)\n",
" with tb.output_to('Spectrogram'):\n",
" if Emotion == 'Disgust' or Emotion == 'Amused' or Emotion == 'Sleepiness':\n",
" librosa.display.specshow(spectrogram.T, sr=fs,hop_length=hop_length, x_axis=\"time\", y_axis=\"linear\")\n",
" else:\n",
" librosa.display.specshow(spectrogram, sr=fs,hop_length=hop_length, x_axis=\"time\", y_axis=\"linear\")\n",
"\n",
" xlabel(\"Time\", fontsize=label_fontsize)\n",
" ylabel(\"Hz\", fontsize=label_fontsize)\n",
"\n",
"def tts_dctts(text2mel, ssrn, text):\n",
" sentences = [text]\n",
"\n",
" max_N = len(text)\n",
" L = torch.from_numpy(get_test_data(sentences, max_N))\n",
" zeros = torch.from_numpy(np.zeros((1, 80, 1), np.float32))\n",
" Y = zeros\n",
" A = None\n",
"\n",
" for t in range(210):\n",
" _, Y_t, A = text2mel(L, Y, monotonic_attention=True)\n",
" Y = torch.cat((zeros, Y_t), -1)\n",
" _, attention = torch.max(A[0, :, -1], 0)\n",
" attention = attention.item()\n",
" if L[0, attention] == vocab.index('E'): # EOS\n",
" break\n",
"\n",
" _, Z = ssrn(Y)\n",
" Y = Y.cpu().detach().numpy()\n",
" A = A.cpu().detach().numpy()\n",
" Z = Z.cpu().detach().numpy()\n",
"\n",
" return spectrogram2wav(Z[0, :, :].T), A[0, :, :], Y[0, :, :]\n",
"\n",
"\n",
"def tts_tacotron(model, text):\n",
" waveform, alignment, spectrogram = _tts(model, text)\n",
" return waveform, alignment, spectrogram \n",
"\n",
"def present(waveform, Emotion, figures=False):\n",
" if figures!=False:\n",
" visualize(figures[0], figures[1], Emotion)\n",
" IPython.display.display(Audio(waveform, rate=fs))\n",
"\n",
" \n",
"fs = 20000 #20000\n",
"hop_length = 250\n",
"model.decoder.max_decoder_steps = 200"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "wiSAxs7XIUO7",
"colab_type": "text"
},
"source": [
"## Run the Demo\n",
"\n",
"- Select an ```Emotion``` from the dropdown and enter the ```Text``` that you want to be generated. \n",
"- Run the cell below. It will automatically create the required directory structure. In order to run the cell, click on the **arrow** that is on the left column of the cell (hover over the ```[]``` symbol). Optionally, you can also press ```Shift + Enter ```\n",
"\n",
"**Play the speech with the generated audio player and view the required plots by clicking on their respective tabs!**\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"cellView": "form",
"colab_type": "code",
"id": "3jKM6GfzlgpS",
"colab": {}
},
"source": [
"#@title Select the emotion and type the text\n",
"\n",
"%pylab inline\n",
"\n",
"Emotion = \"Neutral\" #@param [\"Neutral\", \"Angry\", \"Disgust\", \"Sleepiness\", \"Amused\"]\n",
"Text = 'I am exhausted.' #@param {type:\"string\"}\n",
"\n",
"wav, align, mel = None, None, None\n",
"\n",
"if Emotion == \"Neutral\":\n",
" load_checkpoint('trained_models/'+Emotion.lower()+'_dctts.pth', text2mel, None)\n",
" wav, align, mel = tts_dctts(text2mel, ssrn, Text)\n",
"elif Emotion == \"Angry\":\n",
" load_checkpoint_test('trained_models/'+Emotion.lower()+'_dctts.pth', text2mel, None)\n",
" wav, align, mel = tts_dctts(text2mel, ssrn, Text)\n",
" # wav = wav.T\n",
"elif Emotion == \"Disgust\" or Emotion == \"Amused\" or Emotion == \"Sleepiness\":\n",
" checkpoint = torch.load('trained_models/'+Emotion.lower()+'_tacotron.pth', map_location=torch.device('cpu'))\n",
" model.load_state_dict(checkpoint[\"state_dict\"])\n",
" wav, align, mel = tts_tacotron(model, Text)\n",
"\n",
"present(wav, Emotion, (align,mel))\n",
"\n"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "AchHD0xLcaE1",
"colab_type": "code",
"colab": {}
},
"source": [
""
],
"execution_count": 0,
"outputs": []
}
]
}
