Incorrect behaviour of generating OOV during validation. #363

AndreyBocharnikov · 2021-12-03T09:43:33Z

Hello, and thank you for your excellent work.

I have a paired txt/wav sample with the transcript "dear customer,welcome to our ship." Because there is no space after the comma, the "word" customer,welcome is not in the dictionary downloaded via mfa model download dictionary english, so it should be reported as OOV, but it is not. After running mfa validate wrong_sample english english, the log at /root/Documents/MFA/wrong_sample/validate.log says "There were no missing words from the dictionary", which seems to be a bug. You can also see that this "word" is not taken into account by running mfa align wrong_sample english english wrong_sample_result: the resulting phonemes look like "D IH1 R", then a blank interval of more than a second, and then "T UW1 AW1 ER0 SH IH1 P".

The misalignment itself can be worked around with mfa g2p english_g2p wrong_sample wrong_sample_g2p, which does generate the entry "customer,welcome K AH1 S T AH0 M ER0 W EH1 L K AH0 M". Still, the fact that mfa validate does not generate an OOV file for this sample seems wrong.
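To make the tokenization point concrete, here is a minimal Python sketch (not MFA's actual implementation) contrasting a whitespace-only split, which keeps "customer,welcome" as a single token that is missing from the dictionary, with a split that also breaks on punctuation, which yields two in-vocabulary words. The toy dictionary and all function names are hypothetical.

```python
from __future__ import annotations

import re

# Hypothetical toy dictionary containing only the words of the example sentence.
TOY_DICTIONARY = {"dear", "customer", "welcome", "to", "our", "ship"}


def split_on_whitespace(text: str) -> list[str]:
    """Lowercase, split on whitespace only, and strip punctuation at token edges."""
    return [tok.strip(".,!?") for tok in text.lower().split()]


def split_on_punctuation(text: str) -> list[str]:
    """Lowercase and treat whitespace and punctuation alike as word boundaries."""
    return [tok for tok in re.split(r"[\s.,!?]+", text.lower()) if tok]


def find_oovs(tokens: list[str], dictionary: set[str]) -> list[str]:
    """Return tokens missing from the dictionary, without duplicates, in order."""
    missing: list[str] = []
    for tok in tokens:
        if tok not in dictionary and tok not in missing:
            missing.append(tok)
    return missing


transcript = "dear customer,welcome to our ship."
print(find_oovs(split_on_whitespace(transcript), TOY_DICTIONARY))   # ['customer,welcome']
print(find_oovs(split_on_punctuation(transcript), TOY_DICTIONARY))  # []
```

With the whitespace-only split the comma-joined token is exactly the kind of entry one would expect to see in an OOV report.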

With love, looking forward to your reply :)


AndreyBocharnikov · 2021-12-03T17:46:03Z

I found that my dataset contains more words that are not in /root/Documents/MFA/pretrained_models/dictionary/english.dict, but oovs_found.txt was still not generated after running mfa validate dataset english english.
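As a cross-check that does not depend on mfa validate, a short Python sketch like the one below could list transcript words missing from english.dict. It assumes a plain-text dictionary with one "WORD PHONE PHONE ..." entry per line, transcripts stored as .lab or .txt files, case-insensitive matching, and whitespace tokenization. None of this is taken from MFA's own code, and the function and script names are hypothetical.

```python
from __future__ import annotations

import sys
from collections import Counter
from pathlib import Path
from typing import Iterable


def dictionary_headwords(lines: Iterable[str]) -> set[str]:
    """Return the lowercased first field of every non-empty dictionary line."""
    return {line.split()[0].lower() for line in lines if line.split()}


def count_oovs(transcripts: Iterable[str], headwords: set[str]) -> Counter[str]:
    """Count whitespace-separated transcript tokens that are not dictionary headwords."""
    oovs: Counter[str] = Counter()
    for text in transcripts:
        for token in text.lower().split():
            # Strip punctuation at the token edges; apostrophes are left alone,
            # so a word like "pigtail's" stays intact.
            token = token.strip('.,!?;:"()')
            if token and token not in headwords:
                oovs[token] += 1
    return oovs


def main(corpus_dir: str, dict_file: str) -> None:
    # Assumption: transcripts are .lab or .txt files somewhere under corpus_dir.
    transcripts = (
        p.read_text(encoding="utf-8")
        for p in sorted(Path(corpus_dir).rglob("*"))
        if p.is_file() and p.suffix in {".lab", ".txt"}
    )
    with open(dict_file, encoding="utf-8") as fh:
        headwords = dictionary_headwords(fh)
    for word, count in count_oovs(transcripts, headwords).most_common():
        print(f"{word}\t{count}")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python find_oovs.py CORPUS_DIR DICTIONARY_FILE
    main(sys.argv[1], sys.argv[2])
```

For this thread the call would look something like python find_oovs.py dataset /root/Documents/MFA/pretrained_models/dictionary/english.dict. If words such as PIGLIT show up here while validate reports no missing words, that points at the validator rather than the data.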

I installed MFA by following the installation page in the documentation:
conda create -n aligner -c conda-forge montreal-forced-aligner
mfa model download acoustic english
mfa model download dictionary english
and then aligned the dataset with mfa align dataset english english dataset_result.
The dataset is in the correct format, with the right number of speakers and utterances.
Am I doing something wrong?

Here is the list of words that are not in english.dict. They were not transcribed to phonemes; instead, the resulting .TextGrid file contains an empty ("") interval with a long duration (~1 second) in place of each OOV word:
LOUDING, PIGTAIL'S, BUNBURYED, TV, EEYORE, MIDYEAR, PEPER, PIGLIT
Hope it helps.

Please fix it or tell me what I am doing wrong.

mmcauliffe · 2021-12-03T18:36:34Z

Yeah, that seems weird; I'll take a look. Is this with the most recent version (released last night)?

Hocine958 · 2021-12-03T19:24:02Z

Same here: OOV words are not detected by mfa validate, and the aligned phonemes are just an empty string spanning the OOV word's time interval.
For the versions I tested, this problem is present in 2.0.0b7 and 2.0.0b8.
I also tested 2.0.0b4, where validation worked but alignment did not.

AndreyBocharnikov · 2021-12-04T11:54:31Z

I ran cat /root/miniconda3/envs/aligner/lib/python3.9/site-packages/montreal_forced_aligner/_version.py
and it returned:
# coding: utf-8
# file generated by setuptools_scm
# don't change, don't track in version control
version = '2.0.0b7'
version_tuple = (2, 0, 0)
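An alternative to reading _version.py by path is to ask Python's standard importlib.metadata module (Python 3.8+) for the installed version. The sketch below assumes the package's distribution name is montreal-forced-aligner; if it is registered under a different name, the lookup returns None. The helper name is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_mfa_version(dist_name: str = "montreal-forced-aligner") -> str | None:
    """Return the installed version string, or None if the distribution is not found.

    Assumption: MFA is installed under the distribution name given by dist_name.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run inside the "aligner" conda environment; prints e.g. the beta version string.
    print(installed_mfa_version())
```

This avoids hard-coding the site-packages path, which differs between environments and Python versions.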

mmcauliffe mentioned this issue Dec 9, 2021

Bug fixes #369

Merged

mmcauliffe closed this as completed in 947397e Dec 9, 2021
