
Synpaflex French dataset Tacotron2 and MB-MelGAN support #640

Merged
merged 6 commits from sd/synpaflexSupport into TensorSpeech:master on Aug 10, 2021

Conversation

samuel-lunii
Contributor

Note that the files of the SynPaFlex dataset have to be reorganized in an LJSpeech manner. I made a script to do this automatically, which is not present here. I can also share it if you want :)
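For illustration, here is a minimal sketch of such a reorganization (not the actual script mentioned above), assuming the corpus provides per-utterance .wav files with transcriptions in sibling .txt files; the source and destination paths are hypothetical:

  # Hypothetical sketch: collect SynPaFlex wav/txt pairs into an LJSpeech-style
  # layout: <dst>/wavs/<id>.wav plus <dst>/metadata.csv with "id|text|text" rows.
  import shutil
  from pathlib import Path

  src = Path("synpaflex_corpus")  # hypothetical source root
  dst = Path("synpaflex_lj")
  (dst / "wavs").mkdir(parents=True, exist_ok=True)

  with open(dst / "metadata.csv", "w", encoding="utf-8") as f:
    for wav in sorted(src.rglob("*.wav")):
      txt = wav.with_suffix(".txt")  # assumes the transcription sits next to the audio
      if not txt.exists():
        continue
      text = " ".join(txt.read_text(encoding="utf-8").split())
      shutil.copy2(wav, dst / "wavs" / wav.name)
      f.write(f"{wav.stem}|{text}|{text}\n")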

@dathudeptrai dathudeptrai self-assigned this Aug 4, 2021
@dathudeptrai dathudeptrai added the enhancement 🚀 New feature or request label Aug 4, 2021
@dathudeptrai
Collaborator

@samuel-lunii Is this pull request complete?

@ZDisket
Collaborator

ZDisket commented Aug 4, 2021

@samuel-lunii Is there a trained model?

@samuel-lunii
Contributor Author

@dathudeptrai almost complete :)

  • I forgot to include Synpaflex in tensorflow_tts/configs/tacotron2.py
  • Still need to update README.md and create a notebook similar to this one to prepare SynPaFlex data before preprocessing.

@ZDisket I have trained models for Tacotron2 (150k steps) and MB-MelGAN (780k steps), but I am not sure where to include them?

@samuel-lunii
Contributor Author

@dathudeptrai it is complete now

@dathudeptrai
Collaborator

@samuel-lunii

I have trained models for Tacotron2 (150k steps) and MB-MelGAN (780k steps), but I am not sure where to include them?

You can make a Google Colab for the inference, with the model downloaded from Google Drive. Then I will fork and copy your model to upload to the Hugging Face Hub :D

@samuel-lunii
Contributor Author

@dathudeptrai
Here is a link to the Google Colab for inference; everything is there :)
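For readers without access to the notebook, below is a minimal sketch of what such an inference Colab typically contains with TensorFlowTTS: Tacotron2 as text2mel plus MB-MelGAN as vocoder. The from_pretrained ids are assumptions (the thread only says the checkpoints would later be uploaded to the Hugging Face Hub), not the notebook's actual code:

  # Minimal TensorFlowTTS inference sketch; model ids below are assumed Hub names.
  import tensorflow as tf
  from tensorflow_tts.inference import AutoProcessor, TFAutoModel

  processor = AutoProcessor.from_pretrained("tensorspeech/tts-tacotron2-synpaflex-fr")
  tacotron2 = TFAutoModel.from_pretrained("tensorspeech/tts-tacotron2-synpaflex-fr")
  mb_melgan = TFAutoModel.from_pretrained("tensorspeech/tts-mb_melgan-synpaflex-fr")

  # text -> phoneme/character ids -> mel spectrogram -> waveform
  input_ids = processor.text_to_sequence("Bonjour tout le monde.")
  _, mel_outputs, _, alignment_history = tacotron2.inference(
    input_ids=tf.expand_dims(tf.convert_to_tensor(input_ids, dtype=tf.int32), 0),
    input_lengths=tf.convert_to_tensor([len(input_ids)], dtype=tf.int32),
    speaker_ids=tf.convert_to_tensor([0], dtype=tf.int32),
  )
  audio = mb_melgan.inference(mel_outputs)[0, :, 0]  # waveform at the training rate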

@dathudeptrai
Collaborator

@samuel-lunii Many thanks :D, I will review and merge this weekend :D

dathudeptrai previously approved these changes Aug 6, 2021
@dathudeptrai
Collaborator

@samuel-lunii can you fix the failing check? :D

@dathudeptrai dathudeptrai self-requested a review August 10, 2021 10:16
@dathudeptrai dathudeptrai merged commit d7415ac into TensorSpeech:master Aug 10, 2021
@samuel-lunii samuel-lunii deleted the sd/synpaflexSupport branch August 11, 2021 07:58
@samuel-lunii
Contributor Author

@dathudeptrai here is a Colab for inference with FastSpeech2 (FS2) trained for 200k steps on the SynPaFlex dataset. Durations were exported with Tacotron2 trained for 150k steps.

@samuel-lunii
Contributor Author

samuel-lunii commented Aug 26, 2021

@dathudeptrai
I realized that in this Colab cell you set the sampling rate to 24000 Hz instead of the 22050 Hz I used for preprocessing.
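The fix is a one-line change; for instance, if the cell plays the waveform with IPython.display.Audio (an assumption, since the cell is only linked), the playback rate must match the preprocessing rate:

  # assumed shape of the cell: playback rate must match the preprocessing rate
  import IPython.display as ipd
  ipd.Audio(audio, rate=22050)  # was rate=24000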

Also, adding the piece of code below in this cell, before the # vocoder part comment, allows finding the end of the synthesized utterance from the alignment data, even if the synthesized mel spectrogram is much longer than it should be:

  # find the end of the sentence according to alignment data: stop at the first
  # decoder frame whose attention peaks on the last input token
  final_text_index = alignment_history[0].shape[0]  # number of input tokens
  final_frame_index = 0
  for frame in np.swapaxes(alignment_history[0], 0, 1):  # one row per decoder frame
    final_frame_index += 1
    if np.argmax(frame) == final_text_index - 1:
      break

and then replace
audio = vocoder_model.inference(mel_outputs)[0, :-remove_end, 0]
with:
audio = vocoder_model.inference(mel_outputs[:, :final_frame_index, :])[0, :-1, 0]

so that vocoder inference is only performed on actual speech :)

@dathudeptrai
Collaborator

dathudeptrai commented Aug 26, 2021

@samuel-lunii Many thanks, I will upload your model to the Hugging Face Hub soon :D
