Incorrect behaviour of generating OOV during validation. #363
Comments
I found out that my dataset contains more words that are not in I installed mfa via the installation page from the documentation: The list of words that are not in Please fix it or tell me what I am doing wrong.
Yeah, that seems weird, I'll take a look. Is this with the most recent version (released last night)?
Same here: OOV words are not detected by mfa validate, and the aligned phonemes are just an empty string spanning the OOV word's time interval.
I run |
Hello, and thank you for your excellent work.
I have a paired txt-wav sample, "dear customer,welcome to our ship." Because of the missing space around the comma, the "word" customer,welcome is not in the dictionary that was downloaded via
mfa model download dictionary english
so it should be reported as OOV, but it is not. After running
mfa validate wrong_sample english english
the log at
/root/Documents/MFA/wrong_sample/validate.log
says "There were no missing words from the dictionary", which seems to be a bug. That this "word" is not being taken into account can be seen by running
mfa align wrong_sample english english wrong_sample_result
The resulting phonemes look like "D IH1 R", then a blank for more than a second, and then "T UW1 AW1 ER0 SH IH1 P".
The wrong alignment itself can be worked around with
mfa g2p english_g2p wrong_sample wrong_sample_g2p
which does generate
customer,welcome K AH1 S T AH0 M ER0 W EH1 L K AH0 M
but the fact that
mfa validate
doesn't generate an OOV file for that sample seems wrong.
With love, looking forward to your reply :)
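For anyone who wants to cross-check a transcript by hand in the meantime, here is a minimal sketch of the check I expected to happen. It is not MFA's own code: the function names are hypothetical, the toy dictionary only contains the pronunciations quoted above, and I am assuming a dictionary format of one word followed by its phones per line, with tokens split on whitespace and punctuation stripped only from the edges. MFA's real tokenization may well differ.

```python
import string


def load_dictionary_words(lines):
    """Collect the first whitespace-separated field of each line as a known word."""
    words = set()
    for line in lines:
        fields = line.split()
        if fields:
            words.add(fields[0].lower())
    return words


def find_oovs(transcript, known_words):
    """Return tokens that are not in known_words.

    Assumption: tokens are split on whitespace and punctuation is stripped
    only from the start and end of each token, so an inner comma survives.
    """
    oovs = []
    for token in transcript.lower().split():
        token = token.strip(string.punctuation)
        if token and token not in known_words:
            oovs.append(token)
    return oovs


# Toy dictionary lines ("word phone phone ..."), using the phones quoted in this report.
toy_dictionary = [
    "dear D IH1 R",
    "customer K AH1 S T AH0 M ER0",
    "welcome W EH1 L K AH0 M",
    "to T UW1",
    "our AW1 ER0",
    "ship SH IH1 P",
]

known = load_dictionary_words(toy_dictionary)
print(find_oovs("dear customer,welcome to our ship.", known))
# prints ['customer,welcome']
```

Under these assumptions the glued token is the only one flagged, which is what I would have expected to see in the OOV file.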