
updating librispeech recipe 2 #1727

Merged: 10 commits merged into kaldi-asr:master on Jul 9, 2017

Conversation

@LvHang commented Jun 29, 2017

This PR carries on from #1708.
It moves the original "tdnn" recipes into a tuning directory and adds new recipes written with xconfigs.
Thanks to @jonlnichols for the xconfig TDNN recipe (local/chain/tuning/run_tdnn_1b.sh).
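For context, the usual Kaldi convention for a tuning directory is to keep each experiment variant under local/chain/tuning/ and point a top-level symlink at the current best one. A minimal sketch of that layout (paths assumed for illustration, not copied from this PR):

```bash
# Hypothetical layout after moving recipes into the tuning directory:
# the top-level run_tdnn.sh becomes a symlink to the best tuning variant.
cd egs/librispeech/s5/local/chain
ln -sf tuning/run_tdnn_1b.sh run_tdnn.sh
```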

@danpovey commented

Thanks. But if the 1c turns out to be better than 1b, I'd rather not include the 1b, to avoid adding bulk to the repository.
Also, see if you can create a script, e.g. local/chain/compare_wer.sh, to automatically create that table of WERs, so that in future we don't have to do it manually. There is one in the WSJ setup that you might be able to adapt for this purpose.
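Such a script might look roughly like the following (a hypothetical sketch; the WSJ script it would be adapted from differs in detail, and the decode-directory naming is assumed from the tables in this thread):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of local/chain/compare_wer.sh: print one WER row
# per test-set/LM combination, one column per experiment directory.
# Usage: local/chain/compare_wer.sh exp/chain/tdnn_1a_sp exp/chain/tdnn_1b_sp
echo -n "System"
for dir in "$@"; do echo -n "  $(basename $dir)"; done
echo
for test in dev_clean dev_other test_clean test_other; do
  for lm in fglarge tglarge tgmed tgsmall; do
    echo -n "${test}(${lm})"
    for dir in "$@"; do
      # take the best WER over the decoding parameter sweep
      wer=$(grep -h WER $dir/decode_${test}_${lm}/wer_* 2>/dev/null \
              | utils/best_wer.sh | awk '{print $2}')
      echo -n "  ${wer:-NA}"
    done
    echo
  done
done
```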

@LvHang commented Jun 29, 2017

Sure, I will create the script, compare_wer.sh.
When the experiments finish, I will let you know and deal with "1b" and "1c".

@LvHang commented Jul 3, 2017

Hi Dan,
Here are the newest results.
"1a" is the original recipe (the result comes from the RESULTS file).
"1b" is jonlnichols's recipe with xconfigs.
"1c" is very similar to "1b", but one line shorter: it drops [relu-batchnorm-layer name=tdnn2 dim=512 input=Append(-1,0,1)].
For "1d", I think its topology is the same as "1a", so it is the xconfig version of "1a".
From the results table, "1d" is the worst. I suggest rerunning the "1a" recipe: even though the data is the same, there may be differences between the two GMM systems, so the alignments and other things could differ slightly.

System                 1a     1b     1c     1d
dev_clean(fglarge)     3.87   3.84   3.89   4.01
dev_clean(tglarge)     3.97   4.00   4.09   4.16
dev_clean(tgmed)       4.95   5.16   5.17   5.18
dev_clean(tgsmall)     5.57   5.80   5.81   5.84
dev_other(fglarge)    10.22  10.15  10.24  10.51
dev_other(tglarge)    10.79  10.78  10.87  11.20
dev_other(tgmed)      13.01  13.09  13.23  13.54
dev_other(tgsmall)    14.36  14.61  14.63  15.14
test_clean(fglarge)    4.17   4.38   4.41   4.42
test_clean(tglarge)    4.36   4.58   4.54   4.61
test_clean(tgmed)      5.33   5.54   5.56   5.59
test_clean(tgsmall)    5.93   6.15   6.21   6.32
test_other(fglarge)   10.62  10.64  10.62  10.95
test_other(tglarge)   10.96  11.13  11.20  11.41
test_other(tgmed)     13.24  13.45  13.64  13.87
test_other(tgsmall)   14.53  14.92  15.00  15.19

Hang

@danpovey commented Jul 3, 2017

I see the problem. You are setting relu_dim=725, but the config just hardcodes the dimension to 512.
Try the 1c architecture with relu_dim really at 725 this time.
Also, change the frames_per_eg from 150 to 150,140,100.
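Concretely, the fix amounts to letting the xconfig take its dimension from the relu_dim variable instead of a hard-coded 512, and widening frames_per_eg. A hedged sketch of the relevant fragments (variable and file names assumed from the discussion, not copied from the PR):

```bash
# Hypothetical fragment of the tuning script showing the two changes.
dir=exp/chain/tdnn_1e_sp    # hypothetical experiment directory
relu_dim=725                # previously the xconfig hard-coded dim=512
frames_per_eg=150,140,100   # was 150

mkdir -p $dir/configs
cat <<EOF > $dir/configs/network.xconfig
  # layers now take their dimension from relu_dim
  relu-batchnorm-layer name=tdnn1 dim=$relu_dim input=Append(-1,0,1)
EOF

# frames_per_eg is then passed through to the chain training script,
# e.g. as --egs.chunk-width $frames_per_eg
```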

@LvHang commented Jul 3, 2017

Oh, I see; sorry for the mistake. jonlnichols used it, so I copied it directly. My fault.
I will change "frames_per_eg".
I checked steps/nnet3/decode.sh: the "--frames-per-chunk" option defaults to 50. The librispeech recipes don't set it, but the swbd recipes set it to the first value of "frames_per_eg". Should we set it as well? (See the sketch below.)
Also, from the results table, "1b" is better than "1c". Why don't we try "1b"'s architecture?
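For reference, the swbd-style pattern of tying the decode chunk size to the first value of frames_per_eg looks roughly like this (a sketch; the directory variables are hypothetical placeholders):

```bash
# Hypothetical snippet: take the first element of frames_per_eg
# (e.g. 150 out of "150,140,100") and pass it to decoding.
frames_per_eg=150,140,100
frames_per_chunk=$(echo $frames_per_eg | cut -d, -f1)

graph_dir=exp/chain/tree_sp/graph_tgsmall   # hypothetical paths
data_dir=data/dev_clean_hires
decode_dir=exp/chain/tdnn_1e_sp/decode_dev_clean_tgsmall

steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 \
  --frames-per-chunk $frames_per_chunk \
  $graph_dir $data_dir $decode_dir
```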
Hang

@danpovey commented Jul 3, 2017 via email

@LvHang commented Jul 5, 2017

System                 1a     1b     1c     1d     1e
dev_clean(fglarge)     3.87   3.84   3.89   4.01   3.90
dev_clean(tglarge)     3.97   4.00   4.09   4.16   4.05
dev_clean(tgmed)       4.95   5.16   5.17   5.18   5.02
dev_clean(tgsmall)     5.57   5.80   5.81   5.84   5.64
dev_other(fglarge)    10.22  10.15  10.24  10.51  10.29
dev_other(tglarge)    10.79  10.78  10.87  11.20  10.88
dev_other(tgmed)      13.01  13.09  13.23  13.54  13.30
dev_other(tgsmall)    14.36  14.61  14.63  15.14  14.78
test_clean(fglarge)    4.17   4.38   4.41   4.42   4.28
test_clean(tglarge)    4.36   4.58   4.54   4.61   4.42
test_clean(tgmed)      5.33   5.54   5.56   5.59   5.46
test_clean(tgsmall)    5.93   6.15   6.21   6.32   6.05
test_other(fglarge)   10.62  10.64  10.62  10.95  10.90
test_other(tglarge)   10.96  11.13  11.20  11.41  11.45
test_other(tgmed)     13.24  13.45  13.64  13.87  13.82
test_other(tgsmall)   14.53  14.92  15.00  15.19  15.08

Hi Dan,
These are the new results for "1e", which is derived from "1c" by setting "relu_dim=725" and "frames_per_eg=150,140,100".
Comparing the results, "1e" is a little better than "1c" on the "dev_clean" sets, but looks worse on the other sets. It is still worse than "1a".

@LvHang commented Jul 9, 2017

System                 1a     1b     1c     1d     1e     1f
dev_clean(fglarge)     3.87   3.84   3.89   4.01   3.90   3.87
dev_clean(tglarge)     3.97   4.00   4.09   4.16   4.05   3.99
dev_clean(tgmed)       4.95   5.16   5.17   5.18   5.02   4.96
dev_clean(tgsmall)     5.57   5.80   5.81   5.84   5.64   5.42
dev_other(fglarge)    10.22  10.15  10.24  10.51  10.29  10.15
dev_other(tglarge)    10.79  10.78  10.87  11.20  10.88  10.77
dev_other(tgmed)      13.01  13.09  13.23  13.54  13.30  12.94
dev_other(tgsmall)    14.36  14.61  14.63  15.14  14.78  14.39
test_clean(fglarge)    4.17   4.38   4.41   4.42   4.28   4.14
test_clean(tglarge)    4.36   4.58   4.54   4.61   4.42   4.32
test_clean(tgmed)      5.33   5.54   5.56   5.59   5.46   5.28
test_clean(tgsmall)    5.93   6.15   6.21   6.32   6.05   5.88
test_other(fglarge)   10.62  10.64  10.62  10.95  10.90  10.80
test_other(tglarge)   10.96  11.13  11.20  11.41  11.45  11.13
test_other(tgmed)     13.24  13.45  13.64  13.87  13.82  13.37
test_other(tgsmall)   14.53  14.92  15.00  15.19  15.08  14.92

Hi Dan,
The above are the newest results.
For your convenience, here is a brief summary of the experiments:
"1a" is the original recipe (the result comes from the RESULTS file).
"1b" is jonlnichols's recipe with xconfigs.
"1c" is very similar to "1b", but one line shorter: it drops [relu-batchnorm-layer name=tdnn2 dim=512 input=Append(-1,0,1)].
"1d"'s topology is like "1a" but with "relu_dim=512".
"1e" is derived from "1c" by setting "relu_dim=725" and "frames_per_eg=150,140,100".
"1f" has the same TDNN structure as "1a", written with xconfigs, except for "frames_per_eg=150,140,100".

Except on the "test_other" set, I think "1f" is slightly better than "1a".
Do you have any suggestions?

Best,
Hang

@danpovey commented Jul 9, 2017 via email

@LvHang commented Jul 9, 2017

OK, I removed the unnecessary recipes.
Now local/chain/tuning/run_tdnn_1a.sh is the original recipe, and local/chain/tuning/run_tdnn_1b.sh is the xconfig recipe whose topology is the same as "1a" except for "frames_per_eg=150,140,100".
Also, local/chain/compare_wer.sh can be used to get the results easily.
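Invocation would be along these lines (experiment directory names assumed for illustration):

```bash
# Hypothetical usage of the new comparison script:
local/chain/compare_wer.sh exp/chain/tdnn_1a_sp exp/chain/tdnn_1b_sp
```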

danpovey merged commit 39c6dde into kaldi-asr:master on Jul 9, 2017