WIP: lexicon learning #976
Conversation
# decode
if [ $stage -le 3 ]; then
  cp -rT data/lang_learned data/lang_learned_rescore
  cp data/lang_nosp/G.fst data/lang_learned/
Unless the words.txt are exactly the same, copying the graphs isn't right.
At least check that they are the same with cmp, and die if not.
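A minimal sketch of that guard, using the paths from the diff above:

    # refuse to copy G.fst unless the two words.txt files match exactly
    cmp -s data/lang_nosp/words.txt data/lang_learned/words.txt || \
      { echo "$0: words.txt differ between lang dirs; not copying G.fst" >&2; exit 1; }
    cp data/lang_nosp/G.fst data/lang_learned/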
Force-pushed from 7fbabf9 to 723f9a8
Can you please show us where on the grid this has been run?
Here: /export/a11/xzhang/tedlium/s5_r2/exp/tri3_lex_0.4_work (I added the affix "_0.4" for tuning purposes.) Sure. @jtrmal, I'd like to explain the whole thing to you tomorrow face to face if you want.
OK, I'll have a look.
Samuel, do these scripts include yesterday's changes/fixes? I remember there was a script that was crashing.
@jtrmal Nope. That's Dan's debug_lexicon script, which I used to generate alternative pronunciations, and that problem only happens for Pashto. I'll suggest changes after I figure out the problem. You can start with my scripts, which are more relevant to lexicon learning: local/run_learn_lex.sh, steps/dict/learn_lexicon.sh, steps/dict/select_prons_bayesian.py
import imp
import math
import numpy
Samuel, I see a bunch of modules you are not actually using here and in the other python scripts. Please remove those. Also, as a general principle, if you are importing something outside the default python libs, you should be a little bit more helpful about what is wrong, i.e.:

    try:
        import numpy
    except ImportError:
        print("Error: you don't seem to have numpy installed, or the install is not functional")
        raise

BTW, please install flake8 and run your script through that program.
Please avoid using numpy for stuff like this that's not math-intensive.
I didn't forget about this; I want to run the scripts. The thing is, the main script is huge, so when I talked with Dan we were leaning towards splitting it into smaller chunks (or isolating some chunks of the code into separate scripts), but I want to get a feel for it first. BTW, @xiaohui-zhang, you can still address the other things I suggested.
Thanks @jtrmal. Yep, I'm addressing the issues you found.
Force-pushed from 6ee9947 to e9e1103
Addressed the issues @jtrmal mentioned earlier.
. cmd.sh
. path.sh
I think these should be written as
. ./cmd.sh
. ./path.sh
remove_tags=true
only_words=true
icu_transform="Any-Lower"
cmd=run.pl
I think it's OK to remove the whole icu transform mess for the kaldi-wide scripts and perhaps just rely on "tr", or make it the user's job?
@danpovey any opinion? I used that to correctly normalize the casing of the lexicon (because, for example, in Turkish the lowercase of I is ı, not i) and it handled utf-8 encoding well. But I'm not sure if/how this is important for kaldi in general.
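For illustration, ICU's uconv tool shows why lowercasing is locale-sensitive (a sketch; the "tr-Lower" transform assumes an ICU build that includes the Turkish casing rules):

    echo "KALDI" | uconv -f utf-8 -t utf-8 -x "Any-Lower"   # -> kaldi
    echo "KALDI" | uconv -f utf-8 -t utf-8 -x "tr-Lower"    # -> kaldı (Turkish dotless ı)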
I just talked with Dan, and let's go with ignoring the normalization issues and making it the user's job. The same goes for apply_g2p.sh; that will simplify the g2p scripts considerably.
[ -f ./path.sh ] && . ./path.sh; # source the path.
. parse_options.sh || exit 1;
Might be better to write
. utils/parse_options.sh
@jtrmal and @xiaohui-zhang, can you please remind me what the next TODOs are to get this in shape?
I want to run the scripts on graphemic babel just to see how the scripts behave.
It'll be great if someone can run it on tedlium first, since English is easier to sanity-check.
@xiaohui-zhang, I was asking you to remove the lexicon case-normalization for g2p training, as after discussion with @danpovey it seemed it should be the user's responsibility. I don't seem to be able to find that comment now, but can you please address that?
I.e. please remove all the normalization; just assume the lexicon/wordlist given to the g2p scripts is already normalized. That will simplify the scripts greatly.
Samuel, any progress on implementing the changes?
. utils/parse_options.sh # accept options
Either use
set -e -o pipefail
or make sure all calls end with
|| exit 1
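A minimal sketch of the first style (a hypothetical script skeleton, assuming it runs inside an egs/*/s5 directory):

    #!/usr/bin/env bash
    set -e -o pipefail   # abort on any error, including failures inside pipelines
    . ./path.sh
    . utils/parse_options.sh
    # from here on, a failing call aborts the script without needing "|| exit 1"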
# Begin configuration section.
iters=5
stage=5
stage should be set to 0
I played with the scripts and I think steps/learn_lexicon.sh should really be re-designed.
Sorry, I've been struggling with the g2p toolkit thing recently. I am addressing these issues now.
I agree with @jtrmal that we should definitely take train_g2p out of the recipe. However, I doubt whether we should take apply_g2p out of the recipe. The reason is that, when running phone decoding (debug_lexicon), we prefer a lexicon which covers most words in the acoustic training data; otherwise too many OOVs will make phone decoding very slow. More importantly, in many cases we want to include pronunciation candidates from g2p in the pronunciation selection procedure. But I think we can make apply_g2p optional here, assuming the user-provided reference lexicon already covers most words in the acoustic training data and the user doesn't care about any pron from g2p during pronunciation selection. So I think I can add an option like "use_g2p_for_oov" and change the script accordingly. @danpovey what do you think?
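A hedged sketch of how that option could gate the g2p stage inside learn_lexicon.sh (the option name comes from the comment above; the output dir $dir/g2p_oov is an assumption):

    use_g2p_for_oov=true   # set false if the reference lexicon already covers the training vocab

    . utils/parse_options.sh

    if $use_g2p_for_oov; then
      # get pronunciation candidates for OOV words from a pre-trained g2p model
      steps/dict/apply_g2p.sh --var-counts 4 $data/oov_train.txt \
        $g2p_mdl_dir $dir/g2p_oov || exit 1;
    fi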
Usage messages always need to make it easy to tell whether something is an input or an output parameter (e.g. does the script write to, or read, the file?)
Sure. I noted each output file as "Output file containing XXX".
Thanks for @danpovey's detailed suggestions. I've addressed them in my last commit. Please review when you have time :-)
steps/align_fmllr.sh --nj $nj --cmd "$train_cmd" \
  $data $dir/lang_expanded_target_vocab $src_mdl $alidir || exit 1;
# Train another SAT system on the given data and put it in ${src_mdl}_${affix}, |
this would give uninformative errors if num_leaves and num_gauss were not set. Either set defaults, or check that the user has set them if retrain_src_mdl == true.
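A minimal sketch of such a check (variable names follow the comment; where exactly it goes in the script is an assumption):

    if $retrain_src_mdl; then
      # fail early with a clear message rather than a cryptic downstream error
      [ -z "$num_leaves" ] && echo "$0: set --num-leaves when --retrain-src-mdl is true" >&2 && exit 1;
      [ -z "$num_gauss" ] && echo "$0: set --num-gauss when --retrain-src-mdl is true" >&2 && exit 1;
    fi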
mkdir -p $dir/dict_expanded_target_vocab
cp $ref_dict/{extra_questions.txt,optional_silence.txt,nonsilence_phones.txt,silence_phones.txt} \
  $dir/dict_expanded_target_vocab 2>/dev/null
rm $dir/dict_expanded_target_vocab/lexiconp.txt 2>/dev/null
Remove lexicon.txt too, prepare_lang.sh automatically creates it from lexiconp.txt. Do this wherever you remove lexiconp.txt.
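I.e., a sketch of the suggested change to the snippet above:

    # whenever lexiconp.txt is removed, remove lexicon.txt as well;
    # prepare_lang.sh will regenerate it from the new lexiconp.txt
    rm -f $dir/dict_expanded_target_vocab/{lexicon.txt,lexiconp.txt}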
echo " <target-vocab> the vocabulary we want the final learned lexicon to cover." | ||
echo " <data> acoustic training data we use to get alternative" | ||
echo " pronunciations and collet acoustic evidence." | ||
echo " <src-mdl> the acoustic model based on which we re-train an SAT model" |
"acoustic model" or src-mdl implies a file, but I think that is not what you mean. Please rename to srcdir and say that it's a directory containing GMM-based models.
echo " <ref-dict> the dir which contains the reference lexicon (most probably hand-derived)" | ||
echo " we want to expand/improve, and nonsilence_phones.txt,.etc which we need " | ||
echo " for building new dict dirs." | ||
echo " <target-vocab> the vocabulary we want the final learned lexicon to cover." |
mention the format, e.g. one word per line. [or is it a symbol table format, like words.txt?]
if [ $stage -le 1 ]; then
  steps/dict/learn_lexicon.sh --g2p-pron-candidates "$data/lexicon_oov_g2p.txt" \
    --min-prob $min_prob --variants-prob-mass $variants_prob_mass \
    --variants-prob-mass2 $variants_prob_mass2 \
this variable name does not exist any more (and this means you have not tested the script after your latest changes).
  $data $dir/lang_combined_iter1 $mdl $dir/lats || exit 1;

# Get arc level information from the lattice.
$cmd JOB=1:$nj $dir/lats/log/get_arc_info.JOB.log \
I think the names lats_pass1 and lats_pass2 would be better than lats and lats2.
if [ $stage -le 3 ]; then
  echo "$0: Generate pronunciation candidates from phone-level decoding on acoustic training data.."
  mdl=$src_mdl && $retrain_src_mdl && mdl=${src_mdl}_${affix}
  steps/cleanup/debug_lexicon.sh --nj $nj --cmd "$decode_cmd" $data $dir/lang_expanded_train \
Something that I think you should investigate at some point (not necessarily right now) is whether you can improve this recipe by using a higher-order phone language model in steps/cleanup/debug_lexicon.sh. Currently it uses an un-smoothed bigram (adding unigrams causes a blowup), but you could train a triphone LM using make_phone_lm.py with the same inputs (alignments); that also does not smooth below bigram. You'd have to make sure the disambiguation symbol was taken care of... make_phone_lm.py uses a disambiguation symbol. You could probably use #0; just make sure it's in phones.txt and L_disambig has the self-loop to consume it.
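A hedged sketch of that experiment; ali-to-phones is a standard Kaldi binary, but the make_phone_lm.py option names below are assumptions to check against the script's usage message, and $disambig_id is a hypothetical variable holding the integer id of #0:

    # dump phone sequences from the training alignments
    ali-to-phones $srcdir/final.mdl "ark:gunzip -c $srcdir/ali.*.gz|" ark,t:- | \
      cut -d' ' -f2- > $dir/phone_seqs.int
    # train a triphone-order phone LM over those sequences (option names are assumptions)
    utils/lang/make_phone_lm.py --ngram-order=3 --phone-disambig-symbol=$disambig_id \
      < $dir/phone_seqs.int > $dir/phone_lm.fst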
Sure, I've added it to my TODO list.
else:
    raise Exception("Unknown value {0} for --{1}".format(values, self.dest))

def GetArgs():
We are moving the python to a pep8-compliant style (e.g. lower_case function names), and validation with flake8, but you could do that later.
OK, thanks.
words.add(entry[0])
print("{0} {1}".format(entry[0], entry[1]),
      file = args.out_lexicon_handle)
print ("Before pruning, the total num. pronunciations is: {}".format(len(lexicon)))
I think it's best to print things like this to stderr, as a general rule. That way you can redirect the output to /dev/stdout and things will work.
break
if not all_ref_prons_accepted or flags[0] == '1' or flags[2] == '1':
    words_to_edit.append((word, counts[word]))
if count > thr:
Please use threshold, not thr; the Google style guide is against abbreviated names.
Guys, I probably won't have time for this until mid-December.
  $data/train_vocab.txt | sort > $data/oov_train.txt || exit 1;
steps/dict/apply_g2p.sh --var-counts 4 $data/oov_train.txt \
  $g2p_mdl_dir exp/g2p/oov_lex_train || exit 1;
cat exp/g2p/oov_lex_train/lexicon.lex | awk '{print $1,$3}' | \
I changed the awk command back to cut -f1,3, because the columns of lexicon.lex are separated by tabs while the phones are separated by spaces:
hello <tab> 0.8 <tab> h a l o
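For example (the output path is an assumption, taken from the --g2p-pron-candidates option earlier in the recipe):

    # cut splits on tabs only, so the space-separated phones in field 3 stay intact;
    # awk '{print $1,$3}' splits on any whitespace and would keep only the first phone
    cut -f1,3 exp/g2p/oov_lex_train/lexicon.lex > $data/lexicon_oov_g2p.txt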
So you think this is ready to merge?
Yep, as long as you don't have other suggestions.
@xiaohui-zhang Do you have results to commit too?
I'm asking for some big changes before we commit this, but Samuel, you ...
Since it looks like the alternative version we're looking into may be becoming a bigger and longer-term project, I'm considering committing this as a stopgap measure. Samuel, are there any small fixes that you need to apply to this PR?
@danpovey good to merge now
Hello @vince62s, sorry for the late reply. We've been working hard on improving WERs and it's hard to get consistent improvements, partly because the pronunciations we improved don't always appear in the dev/test sets. Also, my baseline was out-of-date. So I didn't bother putting numbers in RESULTS. I'll let you know when we have consistent WER improvements. Thanks!
@xiaohui-zhang I didn't try the scripts nor read them carefully, because I am not sure I fully understand what this does in the case of tedlium. I would appreciate some clarification. In the case of Tedlium, unless I am wrong, the training data do not have OOVs, since all words from the training data were included in the dictionary. So, if this is the case, am I right in saying that this script step will do nothing? From another general standpoint, if a user does the job on his side (using G2P to generate prons for OOV words in the training data), which I guess is what the tedlium team did when building their dict, will the script do some kind of redundant thing / detect some lexicon errors that were introduced by G2P in the first place? Thanks for your help.
Hi @vince62s, basically there are OOVs, because the vocab of data/local/dict/lexicon.txt was truncated to 200k. Also, this recipe is not showing how to deal with OOVs with lexicon learning. What this recipe does is debug the given reference lexicon (data/local/dict/lexicon.txt) so that the pronunciations agree more with the acoustics. The output lexicon is put in data/local/dict_learned_nosp. (Note that its vocab is the same as the reference lexicon's, because "target-vocab" in learn_lexicon.sh was set to ref-vocab, the vocab of the reference lexicon. The recipe also outputs a lexicon-edit file, which records the changes to the reference lexicon.)
The main script is egs/wsj/s5/steps/dict/learn_lexicon.sh.