[egs] Speeding up i-vector training in voxceleb v1 recipe #2421

david-ryan-snyder · 2018-05-14T14:20:03Z

The VoxCeleb training data consists of lots of (over 1.2 million) short recordings. As a result, the i-vector training takes an extremely long time to finish. Also, if the recordings are very short, I believe it is harmful to the i-vector extractor.

In this PR, we train the i-vector extractor on just the longest 100,000 recordings. This reduces the amount of training time by over 90%, and slightly improves performance, from 5.53% EER to 5.419% EER.

FYI, @entn-at

…able data, and uses a better performing wideband MFCC config

…e it takes an extremely long time to train

david-ryan-snyder · 2018-05-14T15:25:30Z

Maybe a better solution (to the problem of very short segments in the i-vector extractor training data) might be to include a mechanism that pools features across segments (based on a reco2seg or reco2utt file) before extracting i-vector stats.

But I think the proposed solution in this PR is fine for now, as it massively reduces training time without impacting performance negatively.

entn-at · 2018-05-14T15:54:20Z

I think your proposed solution is fine! I'll update my BNF PR.

One solution that I explored when I created the first version of the recipe is to concatenate individual utts to recordings (the code below would only work for voxceleb2, as it splits utt_ids using substrings):

for name in voxceleb2_train voxceleb2_test; do
    mkdir -p data/${name}_concat
    # discard segment portion of uttIds
    awk '{print substr($1,1,19), $2}' < data/${name}/utt2spk | sort | uniq > data/${name}_concat/utt2spk
    # update spk2utt file
    utils/utt2spk_to_spk2utt.pl < data/${name}_concat/utt2spk > data/${name}_concat/spk2utt
    # concatenate features
    awk '{print $1, substr($1,1,19)}' < data/${name}/utt2spk | utils/utt2spk_to_spk2utt.pl | \
      utils/apply_map.pl -f 2- data/${name}/feats.scp | \
      awk '{if (NF<=2){print;} else { $1 = $1 " concat-feats --print-args=false"; $NF = $NF " - |"; print; }}' > data/${name}_concat/feats.scp
done

david-ryan-snyder · 2018-05-14T16:16:07Z

Yes, I was thinking something along those lines. We could create a version of concat-feats that takes as input a feats archive (keyed on utterance ID) and a "reco2utt" file, concats the utterances together (if they belong to the same recording ID), and returns an archive of features, where the key is the recording ID.

But, if we do this, I think it should be in a separate PR that handles just that issue.

david-ryan-snyder · 2018-05-15T19:30:47Z

@danpovey pointed out that this script https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/utils/data/combine_short_segments.sh exists, and may help with this issue. Something to try later.

entn-at · 2018-05-15T21:27:05Z

I looked into that script some time ago in a different context. While it prefers to combine segments from the same speaker, it can use segments from other speakers to satisfy the minimum target segment length. Empirically, this may not make much of a difference, but I think it's not ideal. choose_utts_to_combine.py, the script that does the hard work, could be modified to be strict about only combining segments from the same speaker (specifically, the part after line 256 could be made optional).

In the case of VoxCeleb, I think combining all utterances that come from a single video into one recording makes the most sense, although I'd have to look into how wide the range of durations is. For VoxCeleb2 train, the number of videos per speaker ranges from 6 to 91.

…2421) * [egs]: updating the voxceleb recipe so that it uses more of the available data, and uses a better performing wideband MFCC config * [egs]: fixing comment error in mfcc.conf * [egs] updating voxceleb/v1/run.sh results * [egs] changing url to download voxceleb1 test set from, updating READMEs * [egs] fixing comment in voxceleb/v2/run.sh * [egs] adding check that ffmpeg exists in voxceleb2 data prep * [egs] subsampling the i-vector training data in voxceleb/v2, otherwise it takes an extremely long time to train

David Snyder added 8 commits May 3, 2018 22:00

[egs]: updating the voxceleb recipe so that it uses more of the avail…

f5dffd3

…able data, and uses a better performing wideband MFCC config

[egs]: fixing comment error in mfcc.conf

4aa206c

[egs] updating voxceleb/v1/run.sh results

cdb876d

[egs] changing url to download voxceleb1 test set from, updating READMEs

463d286

[egs] fixing comment in voxceleb/v2/run.sh

c6d692f

[egs] adding check that ffmpeg exists in voxceleb2 data prep

a350405

[egs] subsampling the i-vector training data in voxceleb/v2, otherwis…

6504840

…e it takes an extremely long time to train

[egs] merging upstream master

d38213b

david-ryan-snyder mentioned this pull request May 14, 2018

[egs] BNF i-vector based speaker verification recipe for VoxCeleb #2420

Closed

danpovey merged commit bce4336 into kaldi-asr:master May 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[egs] Speeding up i-vector training in voxceleb v1 recipe #2421

[egs] Speeding up i-vector training in voxceleb v1 recipe #2421

david-ryan-snyder commented May 14, 2018

david-ryan-snyder commented May 14, 2018 •

edited

Loading

entn-at commented May 14, 2018

david-ryan-snyder commented May 14, 2018

david-ryan-snyder commented May 15, 2018

entn-at commented May 15, 2018 •

edited

Loading

[egs] Speeding up i-vector training in voxceleb v1 recipe #2421

[egs] Speeding up i-vector training in voxceleb v1 recipe #2421

Conversation

david-ryan-snyder commented May 14, 2018

david-ryan-snyder commented May 14, 2018 • edited Loading

entn-at commented May 14, 2018

david-ryan-snyder commented May 14, 2018

david-ryan-snyder commented May 15, 2018

entn-at commented May 15, 2018 • edited Loading

david-ryan-snyder commented May 14, 2018 •

edited

Loading

entn-at commented May 15, 2018 •

edited

Loading