
Lots of warnings when running prepare.sh #108

Open
gpawlowsky1979 opened this issue May 1, 2023 · 3 comments

@gpawlowsky1979

When running prepare.sh to prepare the LibriTTS dataset, I got lots of warnings like this:

2023-04-30 21:15:25,842 WARNING [words_mismatch.py:88] words count mismatch on 100.0% of the lines (1/1)
2023-04-30 21:15:25,842 WARNING [words_mismatch.py:88] words count mismatch on 100.0% of the lines (1/1)
2023-04-30 21:15:25,843 WARNING [words_mismatch.py:88] words count mismatch on 200.0% of the lines (2/1)
2023-04-30 21:15:25,843 WARNING [words_mismatch.py:88] words count mismatch on 200.0% of the lines (2/1)
2023-04-30 21:15:25,843 WARNING [words_mismatch.py:88] words count mismatch on 100.0% of the lines (1/1)
2023-04-30 21:15:25,844 WARNING [words_mismatch.py:88] words count mismatch on 200.0% of the lines (2/1)
2023-04-30 21:15:25,844 WARNING [words_mismatch.py:88] words count mismatch on 300.0% of the lines (3/1)
2023-04-30 21:15:25,844 WARNING [words_mismatch.py:88] words count mismatch on 100.0% of the lines (1/1)
2023-04-30 21:15:25,845 WARNING [words_mismatch.py:88] words count mismatch on 200.0% of the lines (2/1)
2023-04-30 21:15:25,845 WARNING [words_mismatch.py:88] words count mismatch on 100.0% of the lines (1/1)
2023-04-30 21:15:25,845 WARNING [words_mismatch.py:88] words count mismatch on 100.0% of the lines (1/1)
2023-04-30 21:15:25,845 WARNING [words_mismatch.py:88] words count mismatch on 100.0% of the lines (1/1)

It looks like every single line in the dataset has this kind of problem, so I don't think it can safely be ignored. I got similar warnings later when using infer.py.
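For reference, the warnings appear to report the ratio of lines whose word count changed during phonemization. A minimal sketch of that kind of check (a hypothetical helper, not phonemizer's actual code) can help locate which utterances trip it, assuming the check compares whitespace-separated token counts before and after phonemization:

```python
def count_mismatches(pairs):
    """Return (percent, mismatched, total) over (text, phonemes) pairs
    whose whitespace-token counts differ. Hypothetical helper for
    locating problem utterances; not phonemizer's internal logic."""
    mismatched = sum(
        1 for text, phones in pairs
        if len(text.split()) != len(phones.split())
    )
    total = len(pairs)
    return 100.0 * mismatched / total, mismatched, total

# "it's fine" is 2 words but phonemizes here to 3 tokens -> mismatch
pairs = [("hello world", "həloʊ wɜːld"),
         ("it's fine", "ɪt ɪz faɪn")]
print(count_mismatches(pairs))  # → (50.0, 1, 2)
```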
Despite these errors, I was able to train the model, and after 60 epochs (20 AR + 40 NAR) it is capable of generating intelligible speech, but the output doesn't closely resemble the input voices. This might be due to underfitting, but I'm concerned it may also be related to the dataset warnings mentioned above. I also had to reduce the max-duration parameter a bit in order to run on a 16 GB GPU.
Here's my tensorboard image after 40 epochs on NAR:
[TensorBoard screenshot: NAR training curves]

Has anybody had luck getting speech generation that really resembles the input voices after training on LibriTTS?
Also, what's the difference between the vall-e and vall-f models? I haven't found much information about vall-f. Is it any better than vall-e?

@debasishaimonk

@gpawlowsky1979 Hi, what changes did you make in order to train it, i.e. which hyperparameters? And how many distinct speakers did you use to train it?

@gpawlowsky1979
Author

After 70 epochs the results are better, and the voices now more closely resemble the ones used as input. Perhaps I had unrealistic expectations about how good the generated voices would sound.
However, I'm still concerned about the warning messages during dataset preparation, even though I just used the default command:

bash prepare.sh --stage -1 --stop-stage 3

I think the results might have been better had the dataset been prepared properly, without so many word count mismatches.

I trained it on the libritts dataset. Here are the parameters I used:

## Train AR model
python3 bin/trainer.py --max-duration 50 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 1 \
      --num-buckets 6 --dtype "bfloat16" --save-every-n 10000 --valid-interval 20000 \
      --model-name valle --share-embedding true --norm-first true --add-prenet false \
      --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 --prefix-mode 1 \
      --base-lr 0.05 --warmup-steps 200 --average-period 0 \
      --num-epochs 20 --start-epoch 1 --start-batch 0 --accumulate-grad-steps 4 \
      --exp-dir ${exp_dir} --tensorboard true

## Train NAR model
cp ${exp_dir}/best-valid-loss.pt ${exp_dir}/epoch-2.pt  # --start-epoch 3 resumes from epoch-2.pt (3 = 2 + 1)
python3 bin/trainer.py --max-duration 36 --filter-min-duration 0.5 --filter-max-duration 14 --train-stage 2 \
      --num-buckets 6 --dtype "float32" --save-every-n 10000 --valid-interval 20000 \
      --model-name valle --share-embedding true --norm-first true --add-prenet false \
      --decoder-dim 1024 --nhead 16 --num-decoder-layers 12 --prefix-mode 1 \
      --base-lr 0.05 --warmup-steps 200 --average-period 0 \
      --num-epochs 40 --start-epoch 3 --start-batch 0 --accumulate-grad-steps 4 \
      --exp-dir ${exp_dir} --tensorboard true
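As a rough sanity check on the memory-related settings above (assuming, as in lhotse's dynamic bucketing sampler, that --max-duration caps the total seconds of audio per batch), the effective audio seen per optimizer step with gradient accumulation works out to:

```python
# Back-of-envelope arithmetic, assuming --max-duration is seconds of
# audio per batch and gradients are accumulated over
# --accumulate-grad-steps batches before each parameter update.
max_duration_ar = 50           # AR stage batch cap (seconds)
max_duration_nar = 36          # NAR stage batch cap (seconds)
accumulate_grad_steps = 4

audio_per_step_ar = max_duration_ar * accumulate_grad_steps    # 200 s
audio_per_step_nar = max_duration_nar * accumulate_grad_steps  # 144 s
```

Lowering --max-duration to fit a 16 GB GPU reduces peak memory but also shrinks the effective batch per step, which can slow convergence; that may be part of why more epochs helped.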

@salaxieb

salaxieb commented Jun 8, 2023

@gpawlowsky1979 I'm also training the model.
Could you please share your checkpoint, so I won't have to start from scratch?
