You provided the "cs" option but are not calling with keys in sorted order #359

Closed
tshmak opened this issue Nov 24, 2021 · 4 comments

@tshmak
tshmak commented Nov 24, 2021

DEBUG - ERROR (gmm-est-fmllr[5.5.985]:FindKeyInternal():util/kaldi-table-inl.h:2106) You provided the
"cs" option but are not calling with keys in sorted order: utt1266-DCP302-DrRex1 <
utt985-DCP302-DrRex0: rspecifier is ark,s,cs:apply-cmvn
--utt2spk=ark:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/utt2spk.lexicon2.0.scp
scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/cmvn.lexicon2.0.scp
scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/feats.lexicon2.0.scp ark:- |
splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats
/home/tshmak/Documents/MFA/speakers/sat1/lda.mat ark:- ark:- |

I tried using the conda version of MFA downloaded on 2021/11/23 (mfa version shows only 2.0.0). It gave me the above error when running mfa train speakers lexicon.txt output --clean --verbose -j 1 --output_model_path output.zip. The error appeared right after "Initializing speaker-adapted triphone training...". Any idea why?

I can attach the full log if that's helpful.

Thanks
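
For readers unfamiliar with the message: in a Kaldi rspecifier, the "cs" option asserts that the program will request keys in sorted order, and the check fails as soon as a requested key sorts before the previously requested one. The sketch below only illustrates that comparison on the two keys quoted in the error. It assumes a plain bytewise (C-locale style) string comparison, and the helper name violates_called_sorted is hypothetical, not part of Kaldi or MFA.

```python
# Keys copied from the error message above.
previous_key = "utt985-DCP302-DrRex0"
requested_key = "utt1266-DCP302-DrRex1"


def violates_called_sorted(previous: str, requested: str) -> bool:
    """Return True if `requested` sorts before `previous` bytewise.

    Assumption: keys are compared as raw byte strings, so the character
    "1" (0x31) sorts before "9" (0x39) regardless of numeric value.
    """
    return requested.encode("utf-8") < previous.encode("utf-8")


# "utt1266..." sorts before "utt985..." because "1" < "9" at the
# first differing character, so the second request is out of order.
print(violates_called_sorted(previous_key, requested_key))  # True
```

The comparison is lexicographic, not numeric, which is why utt1266 counts as smaller than utt985.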

@mmcauliffe
Member

Sure, the full log would be nice! My guess is it's related to the dataset setup and the use of -j 1, so I'll try to replicate it with corpora that I have, but a couple of questions for you:

  1. How many speakers/utterances do you have?
  2. What's the naming format for your files?

In the meantime, you can try rerunning it with -j 10 --disable_mp. That will use a single process but still split up the corpus, which might get around this (it won't work if you have just one speaker).

@ai-zahran

I am currently facing the same issue as well. I am running version 2.0.0b4.dev8+g2403bd5.d20211108, installed via Conda. The data I am using is the L2-arctic data.

The directory structure looks as follows:

+-- Train
|   +-- ABA
|       --- arctic_a0003.wav
|       --- arctic_a0003.lab
|       --- arctic_a0005.wav
|       --- arctic_a0005.lab
|       --- ...
|   +-- ASI
|       --- arctic_a0001.wav
|       --- arctic_a0001.lab
|       --- arctic_a0002.wav
|       --- arctic_a0002.lab
|   --- ...
| +-- ...

where ABA and ASI are speakers, and the files under each are that speaker's utterances. The data has 12 speakers in total.
When I run mfa align --clean -j 10 --disable_mp Train english english Train_aligned, it outputs:

Cleaning old directory!
multilingual_ipa False
INFO - Setting up corpus information...
INFO - Number of speakers in corpus: 12, average number of utterances per speaker: 414.0833333333333
INFO - Parsing dictionary "english" without pronunciation probabilities without silence probabilities
INFO - Creating dictionary information...
INFO - Setting up training data...
INFO - Generating base features (mfcc)...
INFO - Calculating CMVN...
INFO - Setting up training data...
INFO - Setting up training data...
INFO - Done with setup!
INFO - Performing first-pass alignment...
INFO - Calculating fMLLR for speaker adaptation...
KaldiProcessingError: There were 2 job(s) with errors when running Kaldi binaries. For more details, please check /home/ai/Documents/MFA/Train/align.log

In the log file mentioned in the previous output, I can see the same error:

2021-11-25 01:36:45,940 - align - DEBUG -       WARNING (gmm-est-fmllr[5.5.985]:main():gmmbin/gmm-est-fmllr.cc:118) Did not find posteriors for utterance arctic-b0537-BWC
2021-11-25 01:36:45,940 - align - DEBUG -       LOG (apply-cmvn[5.5.985]:main():featbin/apply-cmvn.cc:162) Applied cepstral mean normalization to 1931 utterances, errors on 0
2021-11-25 01:36:45,940 - align - DEBUG -       WARNING (gmm-est-fmllr[5.5.985]:main():gmmbin/gmm-est-fmllr.cc:118) Did not find posteriors for utterance arctic-b0538-BWC
2021-11-25 01:36:45,940 - align - DEBUG -       WARNING (gmm-est-fmllr[5.5.985]:main():gmmbin/gmm-est-fmllr.cc:118) Did not find posteriors for utterance arctic-b0539-BWC
2021-11-25 01:36:45,940 - align - DEBUG -       LOG (gmm-est-fmllr[5.5.985]:ComputeFmllrMatrixDiagGmmFull():transform/fmllr-diag-gmm.cc:262) fMLLR objf improvement is 15.4656 per frame over 45883 frames.
2021-11-25 01:36:45,940 - align - DEBUG -       LOG (gmm-est-fmllr[5.5.985]:main():gmmbin/gmm-est-fmllr.cc:143) For speaker BWC, auxf-impr from fMLLR is 15.4656, over 45883 frames.
2021-11-25 01:36:45,940 - align - DEBUG -       ERROR (gmm-est-fmllr[5.5.985]:FindKeyInternal():util/kaldi-table-inl.h:2106) You provided the "cs" option but are not calling with keys in sorted order: arctic-a0003-ERMS < arctic-b0539-BWC: rspecifier is ark,s,cs:apply-cmvn --utt2spk=ark:/home/ai/Documents/MFA/Train/corpus_data/split3/utt2spk.english.0.scp scp:/home/ai/Documents/MFA/Train/corpus_data/split3/cmvn.english.0.scp scp:/home/ai/Documents/MFA/Train/corpus_data/split3/feats.english.0.scp ark:-

@tshmak
Author

tshmak commented Nov 25, 2021

I have 59 speakers/69881 utterances.

The utterances are often (but not always) named 'uttXXX.wav', where 'XXX' is a number. The naming of utterances is not consistent across speakers. Different speakers may have files with the same name (in different folders). In addition to alphanumeric characters, filenames can contain '.' or '_'. There is one speaker whose .wav files begin with the speaker's name.

Also note that I didn't have this problem when running with v2.0.0a17.

Incidentally, how do you get the detailed version when running the conda mfa? mfa version only gives me 2.0.0.

Here's the log:

$ ./aligned2.sh
Number of jobs starting: 1
Cleaning old directory!
DEBUG - Set up logger for MFA version: 2.0.0
DEBUG - TRAIN CONFIG:
DEBUG - !!python/object:montreal_forced_aligner.config.TrainingConfig training_configs: -
                 !!python/object:montreal_forced_aligner.trainers.MonophoneTrainer   _data_directory: null
                 _speaker_independent: true   _use_mp: true   _uses_cmvn: true   _uses_splices: false   _uses_voiced:
                 false   acoustic_scale: 0.1   architecture: gmm-hmm   beam: 10   boost_silence: 1.25   calc_pron_probs:
                 false   cleanup_textgrids: true   corpus: null   current_gaussians: null   debug: false   dictionary:
                 null   feature_config: !!python/object:montreal_forced_aligner.config.FeatureConfig
                 allow_downsample: true     allow_upsample: true     deltas: true     fmllr: false     frame_shift: 10
                 high_frequency: 7800     lda: false     low_frequency: 20     pitch: false     sample_frequency: 16000
                 snip_edges: true     splice_left_context: 3     splice_right_context: 3     type: mfcc     use_energy:
                 false     use_mp: true   identifier: null   initial_gaussians: 135   iteration: 0   logger: null
                 max_gaussians: 1000   num_iterations: 40   overwrite: false   power: 0.25   previous_trainer: null
                 realignment_iterations:   - 1   - 2   - 3   - 4   - 5   - 6   - 7   - 8   - 9   - 10   - 12   - 14   -
                 16   - 18   - 20   - 23   - 26   - 29   - 32   - 35   - 38   - 1   - 2   - 3   - 4   - 5   - 6   - 7
                 - 8   - 9   - 10   - 12   - 14   - 16   - 18   - 20   - 23   - 26   - 29   - 32   - 35   - 38   - 1   -
                 2   - 3   - 4   - 5   - 6   - 7   - 8   - 9   - 10   - 12   - 14   - 16   - 18   - 20   - 23   - 26   -
                 29   - 32   - 35   - 38   retry_beam: 40   self_loop_scale: 0.1   subset: 2000   temp_directory: null
                 training_complete: false   transition_scale: 1.0 -
                 !!python/object:montreal_forced_aligner.trainers.TriphoneTrainer   _data_directory: null
                 _speaker_independent: true   _use_mp: true   _uses_cmvn: true   _uses_splices: false   _uses_voiced:
                 false   acoustic_scale: 0.1   architecture: gmm-hmm   beam: 10   boost_silence: 1.25   calc_pron_probs:
                 false   cleanup_textgrids: true   cluster_threshold: -1   corpus: null   current_gaussians: 2000
                 debug: false   dictionary: null   feature_config:
                 !!python/object:montreal_forced_aligner.config.FeatureConfig     allow_downsample: true
                 allow_upsample: true     deltas: true     fmllr: false     frame_shift: 10     high_frequency: 7800
                 lda: false     low_frequency: 20     pitch: false     sample_frequency: 16000     snip_edges: true
                 splice_left_context: 3     splice_right_context: 3     type: mfcc     use_energy: false     use_mp:
                 true   identifier: null   initial_gaussians: 2000   iteration: 0   logger: null   max_gaussians: 10000
                 num_iterations: 35   num_leaves: 2000   overwrite: false   power: 0.25   previous_trainer: null
                 realignment_iterations:   - 10   - 20   - 30   - 10   - 20   - 30   - 10   - 20   - 30   retry_beam: 40
                 self_loop_scale: 0.1   subset: 5000   temp_directory: null   training_complete: false
                 transition_scale: 1.0 - !!python/object:montreal_forced_aligner.trainers.LdaTrainer   _data_directory:
                 null   _speaker_independent: true   _use_mp: true   _uses_cmvn: true   _uses_splices: true
                 _uses_voiced: false   acoustic_scale: 0.1   architecture: gmm-hmm   beam: 10   boost_silence: 1.0
                 calc_pron_probs: false   cleanup_textgrids: true   cluster_threshold: -1   corpus: null
                 current_gaussians: 2500   debug: false   dictionary: null   feature_config:
                 !!python/object:montreal_forced_aligner.config.FeatureConfig     allow_downsample: true
                 allow_upsample: true     deltas: true     fmllr: false     frame_shift: 10     high_frequency: 7800
                 lda: true     low_frequency: 20     pitch: false     sample_frequency: 16000     snip_edges: true
                 splice_left_context: 3     splice_right_context: 3     type: mfcc     use_energy: false     use_mp:
                 true   identifier: null   initial_gaussians: 2500   iteration: 0   lda_dimension: 40   logger: null
                 max_gaussians: 15000   mllt_iterations:   - 2   - 4   - 6   - 16   num_iterations: 35   num_leaves:
                 2500   overwrite: false   power: 0.25   previous_trainer: null   random_prune: 4.0
                 realignment_iterations:   - 10   - 20   - 30   - 10   - 20   - 30   - 10   - 20   - 30   retry_beam: 40
                 self_loop_scale: 0.1   subset: 10000   temp_directory: null   training_complete: false
                 transition_scale: 1.0 - !!python/object:montreal_forced_aligner.trainers.SatTrainer   _data_directory:
                 null   _speaker_independent: true   _use_mp: true   _uses_cmvn: true   _uses_splices: false
                 _uses_voiced: false   acoustic_scale: 0.1   architecture: gmm-hmm   beam: 10   boost_silence: 1.0
                 calc_pron_probs: false   cleanup_textgrids: true   cluster_threshold: -1   corpus: null
                 current_gaussians: 2500   debug: false   dictionary: null   ensure_train: true   feature_config:
                 !!python/object:montreal_forced_aligner.config.FeatureConfig     allow_downsample: true
                 allow_upsample: true     deltas: true     fmllr: true     frame_shift: 10     high_frequency: 7800
                 lda: false     low_frequency: 20     pitch: false     sample_frequency: 16000     snip_edges: true
                 splice_left_context: 3     splice_right_context: 3     type: mfcc     use_energy: false     use_mp:
                 true   fmllr_iterations:   - 2   - 4   - 6   - 16   fmllr_update_type: full   identifier: null
                 initial_fmllr: true   initial_gaussians: 2500   iteration: 0   logger: null   max_gaussians: 15000
                 num_iterations: 35   num_leaves: 2500   overwrite: false   power: 0.2   previous_trainer: null
                 realignment_iterations:   - 10   - 20   - 30   - 10   - 20   - 30   - 10   - 20   - 30   retry_beam: 40
                 self_loop_scale: 0.1   silence_weight: 0.0   subset: 10000   temp_directory: null   training_complete:
                 false   transition_scale: 1.0 - !!python/object:montreal_forced_aligner.trainers.SatTrainer
                 _data_directory: null   _speaker_independent: true   _use_mp: true   _uses_cmvn: true   _uses_splices:
                 false   _uses_voiced: false   acoustic_scale: 0.1   architecture: gmm-hmm   beam: 10   boost_silence:
                 1.0   calc_pron_probs: false   cleanup_textgrids: true   cluster_threshold: -1   corpus: null
                 current_gaussians: 4200   debug: false   dictionary: null   ensure_train: true   feature_config:
                 !!python/object:montreal_forced_aligner.config.FeatureConfig     allow_downsample: true
                 allow_upsample: true     deltas: true     fmllr: true     frame_shift: 10     high_frequency: 7800
                 lda: false     low_frequency: 20     pitch: false     sample_frequency: 16000     snip_edges: true
                 splice_left_context: 3     splice_right_context: 3     type: mfcc     use_energy: false     use_mp:
                 true   fmllr_iterations:   - 2   - 4   - 6   - 16   fmllr_update_type: full   identifier: null
                 initial_fmllr: true   initial_gaussians: 4200   iteration: 0   logger: null   max_gaussians: 40000
                 num_iterations: 35   num_leaves: 4200   overwrite: false   power: 0.2   previous_trainer: null
                 realignment_iterations:   - 10   - 20   - 30   - 10   - 20   - 30   - 10   - 20   - 30   retry_beam: 40
                 self_loop_scale: 0.1   silence_weight: 0.0   subset: null   temp_directory: null   training_complete:
                 false   transition_scale: 1.0 training_identifiers: - mono - tri - lda - sat1 - sat2 use_mp: true
DEBUG - ALIGN CONFIG:
DEBUG - !!python/object:montreal_forced_aligner.config.AlignConfig acoustic_scale: 0.1 beam: 10 boost_silence:
                 1.0 cleanup_textgrids: true data_directory: null debug: false disable_sat: false feature_config:
                 !!python/object:montreal_forced_aligner.config.FeatureConfig   allow_downsample: true   allow_upsample:
                 true   deltas: true   fmllr: true   frame_shift: 10   high_frequency: 7800   lda: false
                 low_frequency: 20   pitch: false   sample_frequency: 16000   snip_edges: true   splice_left_context: 3
                 splice_right_context: 3   type: mfcc   use_energy: false   use_mp: true fmllr_update_type: full
                 initial_fmllr: true iteration: null overwrite: false retry_beam: 40 self_loop_scale: 0.1
                 transition_scale: 1.0 use_fmllr_mp: false use_mp: true
INFO - Setting up corpus information...
DEBUG - Could not find /home/tshmak/Documents/MFA/speakers/corpus_data/speakers.yaml, cannot load from temp
DEBUG - Loading from source without multiprocessing
DEBUG - Parsed corpus directory in 769.9068269729614 seconds
INFO - Number of speakers in corpus: 59, average number of utterances per speaker: 1184.4237288135594
INFO - Parsing dictionary "lexicon2" without pronunciation probabilities without silence probabilities
INFO - Creating dictionary information...
INFO - Setting up training data...
INFO - Generating base features (mfcc)...
WARNING - There were some utterances ignored due to short duration, see the log file for full details or run
                   `mfa validate` on the corpus.
DEBUG - The following utterances were too short to run alignment: utt422-DCP309-TingFeng
                 ,utt422-DCP309-TingFeng
INFO - Calculating CMVN...
INFO - Setting up training data...
INFO - Setting up training data...
INFO - Initializing training for mono...
DEBUG - Setup for initialization took 28.67957901954651 seconds
DEBUG - Compiling training graphs...
DEBUG - Compiling training graphs took 2.673961877822876
INFO - Initialization complete!
DEBUG - Initialization took 5.312573671340942 seconds
  0%|                                                                                                                                                       | 0/40 [00:00<?, ?it/s]DEBUG - Alignment round took 12.285019636154175
  2%|███▌                                                                                                                                           | 1/40 [00:13<08:28, 13.05s/it]DEBUG - Alignment round took 4.532478332519531
  5%|███████▏                                                                                                                                       | 2/40 [00:18<05:23,  8.50s/it]DEBUG - Alignment round took 4.01016640663147
  8%|██████████▋                                                                                                                                    | 3/40 [00:23<04:11,  6.81s/it]DEBUG - Alignment round took 4.065117835998535
 10%|██████████████▎                                                                                                                                | 4/40 [00:27<03:35,  5.99s/it]DEBUG - Alignment round took 4.4452619552612305
 12%|█████████████████▉                                                                                                                             | 5/40 [00:33<03:20,  5.73s/it]DEBUG - Alignment round took 4.042255878448486
 15%|█████████████████████▍                                                                                                                         | 6/40 [00:38<03:05,  5.44s/it]DEBUG - Alignment round took 4.178578853607178
 18%|█████████████████████████                                                                                                                      | 7/40 [00:42<02:53,  5.27s/it]DEBUG - Alignment round took 4.353701829910278
 20%|████████████████████████████▌                                                                                                                  | 8/40 [00:48<02:47,  5.25s/it]DEBUG - Alignment round took 4.182608127593994
 22%|████████████████████████████████▏                                                                                                              | 9/40 [00:53<02:39,  5.14s/it]DEBUG - Alignment round took 4.172502756118774
 28%|███████████████████████████████████████                                                                                                       | 11/40 [00:59<01:51,  3.85s/it]DEBUG - Alignment round took 4.448259353637695
 32%|██████████████████████████████████████████████▏                                                                                               | 13/40 [01:05<01:27,  3.25s/it]DEBUG - Alignment round took 4.213992595672607
 38%|█████████████████████████████████████████████████████▎                                                                                        | 15/40 [01:11<01:13,  2.94s/it]DEBUG - Alignment round took 4.193673133850098
 42%|████████████████████████████████████████████████████████████▎                                                                                 | 17/40 [01:17<01:03,  2.76s/it]DEBUG - Alignment round took 4.220340013504028
 48%|███████████████████████████████████████████████████████████████████▍                                                                          | 19/40 [01:23<00:56,  2.70s/it]DEBUG - Alignment round took 4.535277366638184
 55%|██████████████████████████████████████████████████████████████████████████████                                                                | 22/40 [01:30<00:39,  2.17s/it]DEBUG - Alignment round took 4.150838851928711
 62%|████████████████████████████████████████████████████████████████████████████████████████▊                                                     | 25/40 [01:37<00:29,  1.94s/it]DEBUG - Alignment round took 4.120241641998291
 70%|███████████████████████████████████████████████████████████████████████████████████████████████████▍                                          | 28/40 [01:44<00:22,  1.85s/it]DEBUG - Alignment round took 4.427204370498657
 78%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████                                | 31/40 [01:51<00:17,  1.94s/it]DEBUG - Alignment round took 4.18611741065979
 85%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                     | 34/40 [01:58<00:11,  1.89s/it]DEBUG - Alignment round took 4.312897682189941
 92%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎          | 37/40 [02:05<00:05,  1.93s/it]DEBUG - Alignment round took 4.508893013000488
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:13<00:00,  3.33s/it]
INFO - Training complete!
DEBUG - Training took 133.36941695213318 seconds
INFO - Generating alignments using mono models using 5000 utterances...
DEBUG - Compiling training graphs...
DEBUG - Compiling training graphs took 6.516096353530884
DEBUG - Alignment round took 19.247758150100708
DEBUG - Compiling information took 0.09038186073303223
DEBUG - Average per frame likelihood (this might not actually mean anything) for mono: -102.107
DEBUG - Number of unaligned files for mono: 16
DEBUG - Alignment took 137.30383467674255 seconds
INFO - Initializing training for tri...
DEBUG - Setup for initialization took 112.74181485176086 seconds
DEBUG - Compiling training graphs...
DEBUG - Compiling training graphs took 9.029027223587036
INFO - Initialization complete!
DEBUG - Initialization took 29.600830793380737 seconds
 26%|████████████████████████████████████▊                                                                                                          | 9/35 [00:23<01:11,  2.73s/it]DEBUG - Alignment round took 19.803386688232422
 54%|█████████████████████████████████████████████████████████████████████████████                                                                 | 19/35 [01:13<00:52,  3.30s/it]DEBUG - Alignment round took 20.265465021133423
 83%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                        | 29/35 [02:08<00:22,  3.71s/it]DEBUG - Alignment round took 20.351600170135498
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [02:49<00:00,  4.86s/it]
INFO - Training complete!
DEBUG - Training took 169.9659013748169 seconds
INFO - Generating alignments using tri models using 10000 utterances...
DEBUG - Compiling training graphs...
DEBUG - Compiling training graphs took 22.89983320236206
DEBUG - Alignment round took 59.850199937820435
DEBUG - Compiling information took 0.12182116508483887
DEBUG - Average per frame likelihood (this might not actually mean anything) for tri: -99.7245
DEBUG - Number of unaligned files for tri: 7
DEBUG - Alignment took 367.6730623245239 seconds
INFO - Initializing training for lda...
DEBUG - Setup for initialization took 284.81506085395813 seconds
DEBUG - Compiling training graphs...
DEBUG - Compiling training graphs took 23.359941005706787
INFO - Initialization complete!
DEBUG - Initialization took 53.745129108428955 seconds
 26%|████████████████████████████████████▊                                                                                                          | 9/35 [01:54<04:31, 10.46s/it]DEBUG - Alignment round took 52.5507378578186
 54%|█████████████████████████████████████████████████████████████████████████████                                                                 | 19/35 [04:23<02:51, 10.73s/it]DEBUG - Alignment round took 55.01582622528076
 83%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋                        | 29/35 [06:48<00:59,  9.84s/it]DEBUG - Alignment round took 54.138418197631836
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [08:39<00:00, 14.83s/it]
INFO - Training complete!
DEBUG - Training took 519.0667488574982 seconds
INFO - Generating alignments using lda models using 10000 utterances...
DEBUG - Compiling training graphs...
DEBUG - Compiling training graphs took 23.24289321899414
DEBUG - Alignment round took 54.30949354171753
DEBUG - Compiling information took 0.13241314888000488
DEBUG - Average per frame likelihood (this might not actually mean anything) for lda: -49.4234
DEBUG - Number of unaligned files for lda: 8
DEBUG - Alignment took 353.37736463546753 seconds
INFO - Initializing training for sat1...
DEBUG - Setup for initialization took 274.0100133419037 seconds
INFO - Initializing speaker-adapted triphone training...
DEBUG - Compiling training graphs...
DEBUG - Compiling training graphs took 23.874430894851685
DEBUG - There were 1 kaldi processing files that had errors:

DEBUG - /home/tshmak/Documents/MFA/speakers/sat1/log/calc_fmllr.0.log
DEBUG -         /mnt/nas2/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/conda_gpu1_MFA_20211123/bin/ali-to-post
                 ark:/home/tshmak/Documents/MFA/speakers/sat1/ali.lexicon2.0.ark ark:-
DEBUG - /mnt/nas2/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/conda_gpu1_MFA_20211123/bin/weight-silence-post
                 0.0 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15 /home/tshmak/Documents/MFA/speakers/sat1/0.mdl
                 ark:- ark:-
DEBUG -         /mnt/nas2/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/conda_gpu1_MFA_20211123/bin/gmm-est-fmllr
                 --verbose=4 --fmllr-update-type=full
                 --spk2utt=ark:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/spk2utt.lexicon2.0.scp
                 /home/tshmak/Documents/MFA/speakers/sat1/0.mdl 'ark,s,cs:apply-cmvn
                 --utt2spk=ark:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/utt2spk.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/cmvn.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/feats.lexicon2.0.scp ark:- |
                 splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats
                 /home/tshmak/Documents/MFA/speakers/sat1/lda.mat ark:- ark:- |' ark,s,cs:-
                 ark:/home/tshmak/Documents/MFA/speakers/sat1/trans.lexicon2.0.ark
DEBUG -         apply-cmvn
                 --utt2spk=ark:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/utt2spk.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/cmvn.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/feats.lexicon2.0.scp ark:-
DEBUG -         transform-feats /home/tshmak/Documents/MFA/speakers/sat1/lda.mat ark:- ark:-
DEBUG -         splice-feats --left-context=3 --right-context=3 ark:- ark:-
DEBUG -         LOG (gmm-est-fmllr[5.5.985]:ComputeFmllrMatrixDiagGmmFull():transform/fmllr-diag-gmm.cc:262)
                 fMLLR objf improvement is 6.54043 per frame over 27039 frames.
DEBUG -         LOG (gmm-est-fmllr[5.5.985]:main():gmmbin/gmm-est-fmllr.cc:143) For speaker CMHK01F, auxf-impr
                 from fMLLR is 6.54043, over 27039 frames.
DEBUG -         LOG (gmm-est-fmllr[5.5.985]:ComputeFmllrMatrixDiagGmmFull():transform/fmllr-diag-gmm.cc:262)
                 fMLLR objf improvement is 4.1697 per frame over 79163 frames.
DEBUG -         LOG (gmm-est-fmllr[5.5.985]:main():gmmbin/gmm-est-fmllr.cc:143) For speaker DCP302_DrRex0,
                 auxf-impr from fMLLR is 4.1697, over 79163 frames.
DEBUG -         ERROR (gmm-est-fmllr[5.5.985]:FindKeyInternal():util/kaldi-table-inl.h:2106) You provided the
                 "cs" option but are not calling with keys in sorted order: utt1266-DCP302-DrRex1 <
                 utt985-DCP302-DrRex0: rspecifier is ark,s,cs:apply-cmvn
                 --utt2spk=ark:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/utt2spk.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/cmvn.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/feats.lexicon2.0.scp ark:- |
                 splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats
                 /home/tshmak/Documents/MFA/speakers/sat1/lda.mat ark:- ark:- |
DEBUG -         WARNING (gmm-est-fmllr[5.5.985]:Close():util/kaldi-io.cc:515) Pipe apply-cmvn
                 --utt2spk=ark:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/utt2spk.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/cmvn.lexicon2.0.scp
                 scp:/home/tshmak/Documents/MFA/speakers/corpus_data/subset_10000/feats.lexicon2.0.scp ark:- |
                 splice-feats --left-context=3 --right-context=3 ark:- ark:- | transform-feats
                 /home/tshmak/Documents/MFA/speakers/sat1/lda.mat ark:- ark:- | had nonzero return status 36096
DEBUG -         kaldi::KaldiFatalError
DEBUG -         XDG_SESSION_ID: 37676
DEBUG -         TERM: screen
DEBUG -         SHELL: /bin/bash
DEBUG -         SSH_CLIENT: 192.168.1.1 54681 22
DEBUG -         CONDA_SHLVL: 2
DEBUG -         CONDA_PROMPT_MODIFIER: (conda_gpu1_MFA_20211123)
DEBUG -         SSH_TTY: /dev/pts/3
DEBUG -         LC_ALL: en_US.UTF-8
DEBUG -         USER: tshmak
DEBUG -         BUP_DIR: /home/tshmak/storage/.bup
DEBUG -         CONDA_EXE: /home/tshmak/miniconda3/bin/conda
DEBUG -         SSH_AUTH_SOCK: /tmp/ssh-6QLybKxiOO/agent.165567
DEBUG -         TMUX: /tmp/tmux-5000/default,166661,0
DEBUG -         _CE_CONDA:
DEBUG -         CONDA_PREFIX_1: /home/tshmak/miniconda3
DEBUG -         MAIL: /var/mail/tshmak
DEBUG -         PATH: /mnt/nas2/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/conda_gpu1_MFA_20211123/bin:
                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/hom
                 e/tshmak/scripts:/home/tshmak/local/gpu2/bin:/home/tshmak/local/gpu1/bin
DEBUG -         CONDA_PREFIX:
                 /mnt/nas2/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/conda_gpu1_MFA_20211123
DEBUG -         PWD: /home/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/55CantoneseE
DEBUG -         CUDA_VISIBLE_DEVICES: 1
DEBUG -         EDITOR: vim
DEBUG -         LANG: en_US.UTF-8
DEBUG -         TMUX_PANE: %0
DEBUG -         _CE_M:
DEBUG -         HOME: /home/tshmak
DEBUG -         SHLVL: 3
DEBUG -         LANGUAGE: en_HK:en
DEBUG -         CONDA_PYTHON_EXE: /home/tshmak/miniconda3/bin/python
DEBUG -         LOGNAME: tshmak
DEBUG -         VISUAL: vim
DEBUG -         SSH_CONNECTION: 192.168.1.1 54681 192.168.1.238 22
DEBUG -         LC_CTYPE: UTF-8
DEBUG -         XDG_DATA_DIRS: /usr/local/share:/usr/share:/var/lib/snapd/desktop
DEBUG -         CONDA_DEFAULT_ENV:
                 /mnt/nas2/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/conda_gpu1_MFA_20211123
DEBUG -         DISPLAY: localhost:10.0
DEBUG -         XDG_RUNTIME_DIR: /run/user/5000
DEBUG -         default_PATH:
                 /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
DEBUG -         _: /mnt/nas2/tshmak/WORK/Projects/TTS/FanoFastSpeech2/train/MFA/conda_gpu1_MFA_20211123/bin/mfa
DEBUG -         OPENBLAS_NUM_THREADS: 1
DEBUG -         MKL_NUM_THREADS: 1
KaldiProcessingError: There were 1 job(s) with errors when running Kaldi binaries. For more details, please check /home/tshmak/Documents/MFA/speakers/train_acoustic_

@mmcauliffe
Member

I uploaded version 2.0.0b8 last night, which should have a fix for this. Can you try upgrading and rerunning with the --clean flag?
