[egs,script,src] x-vectors for diarization #2391
…g results to callhome_diarization/v2/run.sh
…cipe in callhome_diarization/v2
It looks like the build failed due to: ../matrix/libkaldi-matrix.so: file not recognized: File format not recognized. I don't think this is related to the PR.
Just restart the build. I think this is due to some weird concurrency issues
in Travis (or the VM in which Travis runs).
y.
…On Thu, May 3, 2018 at 7:11 PM david-ryan-snyder ***@***.***> wrote:
The build failed due to:
../matrix/libkaldi-matrix.so: file not recognized: File format not
recognized
clang: error: linker command failed with exit code 1 (use -v to see
invocation)
make[1]: *** [libkaldi-util.so] Error 1
make[1]: Leaving directory `/home/travis/build/kaldi-asr/kaldi/src/util'
make: *** [util] Error 2
I don't think this is related to the PR.
@jtrmal I don't think I have the permissions necessary to manually restart the build, but I just pushed an inconsequential commit to trigger it.
OK, checks passed now--should be good to go.
I ran into an error when running cluster.sh. My reco2num_spk file contains "93 2". The log reads:

agglomerative-cluster --threshold=0.5 --read-costs=false --reco2num-spk-rspecifier=ark,t:data/test_cmn_segmented/reco2num_spk --max-spk-fraction=1.0 --first-pass-max-utterances=32767 "scp:utils/filter_scp.pl exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores_num_spk/tmp/split1/1/spk2utt exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores/scores.scp |" ark,t:exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores_num_spk/tmp/split1/1/spk2utt ark,t:exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores_num_spk/labels.1
Started at Mon Sep 9 10:47:22 CST 2019
agglomerative-cluster --threshold=0.5 --read-costs=false --reco2num-spk-rspecifier=ark,t:data/test_cmn_segmented/reco2num_spk --max-spk-fraction=1.0 --first-pass-max-utterances=32767 'scp:utils/filter_scp.pl exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores_num_spk/tmp/split1/1/spk2utt exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores/scores.scp |' ark,t:exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores_num_spk/tmp/split1/1/spk2utt ark,t:exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores_num_spk/labels.1
This error tells you what is going on. There is no key called "test" in the left-hand column of reco2num_spk. If you look in scores.scp, or perhaps spk2utt, you will likely find a key called "test". If you provide reco2num_spk, you need to ensure it uses the same recording names as the other files that are keyed on the recording name (e.g., spk2utt, probably labels, or scores.scp).
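The key mismatch described above can be checked before clustering is run. The following is a minimal sketch and not part of the recipe: `read_keys` and `missing_keys` are hypothetical helper names, and the code assumes each file is a Kaldi-style text table whose first whitespace-separated field on every line is the recording key.

```python
from pathlib import Path


def read_keys(path):
    """Return the set of first-column keys in a Kaldi-style text table."""
    keys = set()
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if fields:  # ignore blank lines
            keys.add(fields[0])
    return keys


def missing_keys(reco2num_spk_path, other_path):
    """Return keys found in other_path (e.g. spk2utt) but not in reco2num_spk."""
    return sorted(read_keys(other_path) - read_keys(reco2num_spk_path))
```

Calling `missing_keys("data/test_cmn_segmented/reco2num_spk", "exp/xvector_nnet_1a/xvectors_test_segmented/plda_scores_num_spk/tmp/split1/1/spk2utt")` returns the recording names that have no speaker count. A non-empty result points to the kind of mismatch described above.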
This PR adds an x-vector-based diarization recipe in callhome_diarization/v2. The results are about 20% better than the traditional i-vector recipe in v1.
callhome_diarization/v1/diarization/nnet3/xvector/extract_xvectors.sh
-- Extracts x-vectors from a sliding window (e.g., 1.5 seconds with a 0.75 second overlap). Based on diarization/extract_ivectors.sh.
callhome_diarization/v1/diarization/nnet3/xvector/score_plda.sh
-- Computes the affinity matrix between pairs of x-vectors extracted by the previous script. This is equivalent to diarization/score_plda.sh, but works with archives of x-vectors instead of i-vectors.
src/nnet3bin/nnet3-xvector-compute.cc
-- Adds an option for input padding, used when the input features are shorter than the minimum chunk length.
callhome_diarization/v2/run.sh
-- Data prep and x-vector training similar to egs/sre16/v2/run.sh. The remainder of the script is almost exactly the same as the i-vector-based diarization system, but uses x-vectors instead of i-vectors.
callhome_diarization/v1/local/nnet3/xvector/run_xvector_1a.sh
-- Trains the x-vector DNN. Closely based on what we do for speaker recognition, but the DNN is a little smaller.
callhome_diarization/v1/local/nnet3/xvector/prepare_feats.sh
-- Applies sliding-window CMVN to the features. Unfortunately this can't be done in memory, since the sub-segmentation performed by extract_xvectors.sh is done on disk (and it doesn't make sense to apply the CMVN after sub-segmentation).
callhome_diarization/v1/local/nnet3/xvector/prepare_feats_for_egs.sh
-- Prepares the features that the nnet egs are computed from.
callhome_diarization/v2/conf
-- Standard x-vector mfcc.conf and vad.conf.
callhome_diarization/v1/diarization/extract_ivectors.sh
-- Adds missing option documentation.
egs/sre16/v2/run.sh
-- Updates comments to point to a more appropriate reference.
egs/sre16/v2/README.txt
-- Updates comments to point to a more appropriate reference.
egs/callhome_diarization/v1/local/make_swbd2_phase3.pl
-- Minor fix.
egs/callhome_diarization/v1/local/make_swbd2_phase2.pl
-- Minor fix.
egs/callhome_diarization/v1/local/make_swbd_cellular1.pl
-- Minor fix.
egs/callhome_diarization/v1/local/make_swbd_cellular2.pl
-- Minor fix.
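To illustrate the sliding window mentioned for extract_xvectors.sh (a 1.5 second window with a 0.75 second overlap, i.e. a 0.75 second shift), here is a minimal sketch of how a segment could be cut into overlapping subsegments. It is not the script's actual implementation: `sliding_windows` and its `window` and `period` parameters are hypothetical names, and truncating the final window at the segment end is an assumption that may differ from how the script treats short trailing pieces.

```python
def sliding_windows(start, end, window=1.5, period=0.75):
    """Split [start, end] (seconds) into overlapping subsegments.

    Each subsegment is at most `window` seconds long and begins `period`
    seconds after the previous one. The last subsegment is truncated at
    `end` (an assumption made for this sketch).
    """
    if end <= start:
        return []
    subsegments = []
    i = 0
    while True:
        sub_start = start + i * period
        sub_end = min(sub_start + window, end)
        subsegments.append((sub_start, sub_end))
        if sub_end >= end:  # reached the end of the segment
            break
        i += 1
    return subsegments
```

For a 3 second segment starting at 0, `sliding_windows(0.0, 3.0)` yields three subsegments: 0 to 1.5, 0.75 to 2.25, and 1.5 to 3.0 seconds. One x-vector would then be extracted per subsegment.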