
nnet3-rnnlm lattice rescoring draft #1906

Merged: 27 commits, merged Nov 23, 2017. Changes shown below are from 1 commit.

Commits:
0d839b0  draft (hainan-xv, Sep 18, 2017)
699c956  lattice-rescoring draft finished (hainan-xv, Sep 20, 2017)
ef09b62  lattice-rescoring runnable but buggy (hainan-xv, Sep 22, 2017)
390a1bb  making a PR (hainan-xv, Sep 24, 2017)
dc49709  small changes (hainan-xv, Sep 24, 2017)
8a33e77  include lmrescore_rnn_lat.sh (hainan-xv, Sep 24, 2017)
5965b87  Merge branch 'rnnlm' into rnnlm-rescoring (hainan-xv, Sep 24, 2017)
483450d  some aesthetic changes; not final yet (hainan-xv, Sep 25, 2017)
00912f7  Merge branch 'master' into rnnlm-rescoring (hainan-xv, Sep 25, 2017)
b1167a2  cached version of lattice rescoring; buggy it seems (hainan-xv, Sep 27, 2017)
a52da29  purely aesthetic changes (hainan-xv, Sep 27, 2017)
3bdaa4d  re-written some of the classes (hainan-xv, Sep 28, 2017)
2b08335  very small changes (hainan-xv, Sep 28, 2017)
7cf4af8  fix a typo (hainan-xv, Sep 28, 2017)
8f35242  make RNNLM share the same FST wordlist (hainan-xv, Oct 2, 2017)
705ecc8  fix small issue when running lattice-rescoring with normalize-probs o… (hainan-xv, Oct 2, 2017)
d19ecc1  minor changes (hainan-xv, Oct 6, 2017)
232ef04  fix small stylistic issues in code (hainan-xv, Oct 14, 2017)
bd9936b  fix wrong variable used in scripts/rnnlm/lmrescore_rnnlm_lat.sh (hainan-xv, Oct 14, 2017)
9cc7ba1  add rnnlm softlink in swbd/s5c (hainan-xv, Oct 15, 2017)
267177f  small style changes (hainan-xv, Oct 30, 2017)
87f2f6c  merge with latest upstream (hainan-xv, Nov 3, 2017)
c9bf5e0  move rescoring into rnnlm training scripts (hainan-xv, Nov 7, 2017)
091d4d5  move rescoring into rnnlm training scripts (hainan-xv, Nov 8, 2017)
a192ada  fix small issues mentioned by @danoneata (hainan-xv, Nov 9, 2017)
697f219  change SWBD script to accommodate s5_c; add paper link to RNNLM scrip… (hainan-xv, Nov 20, 2017)
acb5211  fix conflicts (hainan-xv, Nov 20, 2017)
Commit dc4970981377e926f979f094e51d7cd07636a085 ("small changes"), committed by hainan-xv on Sep 24, 2017.
19 changes: 2 additions & 17 deletions egs/swbd/s5/local/rnnlm/run_rescoring.sh
```diff
@@ -11,36 +11,21 @@
 id=rnn

 set -e

-#[ ! -f $rnndir/rnnlm ] && echo "Can't find RNNLM model" && exit 1;
-
 LM=fsh_sw1_tg
-rnndir=exp/rnnlm_lstm_h650_a
+rnndir=exp/rnnlm_lstm_d

-#ln -s final.raw $rnndir/rnnlm 2>/dev/null
 touch $rnndir/unk.probs

 for decode_set in eval2000; do
   dir=exp/chain/tdnn_lstm_1e_sp
   decode_dir=${dir}/decode_${decode_set}_$LM

-  # N-best rescoring
-  # steps/rnnlmrescore.sh \
-  #   --rnnlm-ver nnet3 \
-  #   --N $n --cmd "$decode_cmd --mem 16G" --inv-acwt 10 0.5 \
-  #   data/lang_$LM $rnndir \
-  #   data/$mic/$decode_set ${decode_dir} \
-  #   ${decode_dir}.$id.$n-best &
-  #
-  # continue
-
-  # will implement later
   # Lattice rescoring
   steps/lmrescore_rnnlm_lat.sh \
     --cmd "$decode_cmd --mem 16G" \
     --rnnlm-ver kaldirnnlm --weight 0.5 --max-ngram-order $ngram_order \
     data/lang_$LM $rnndir \
     data/${decode_set}_hires ${decode_dir} \
-    ${decode_dir}.rnnlm.keli.nnet3rnnlm.lat.${ngram_order}gram
+    ${decode_dir}.nnet3rnnlm.lat.${ngram_order}gram

 done
```
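The --max-ngram-order option passed to steps/lmrescore_rnnlm_lat.sh controls the usual n-gram approximation in RNNLM lattice rescoring: partial paths whose most recent (n - 1) words agree are treated as having the same history and share one RNNLM state, which bounds the number of states kept per lattice. A toy sketch of that idea, with made-up histories and plain awk rather than the actual Kaldi implementation:

```shell
#!/bin/sh
# Illustrative sketch (toy data, not the Kaldi code) of the n-gram history
# approximation: paths whose last (max_ngram_order - 1) words agree would
# share one cached RNNLM hidden state.
max_ngram_order=3

# Three partial paths through an imaginary lattice; two of them end in "b c".
histories="<s> a b c
<s> x b c
<s> a b d"

# Truncate each history to its last (n - 1) words and count distinct keys;
# each distinct key corresponds to one RNNLM state that must be kept.
n_states=$(printf '%s\n' "$histories" \
  | awk -v n=$((max_ngram_order - 1)) \
      '{ s = ""; for (i = NF - n + 1; i <= NF; i++) s = s " " $i; print s }' \
  | sort -u | wc -l | tr -d ' ')

echo "distinct RNNLM states kept: $n_states"
```

With order 3, the first two histories collapse to the single key "b c", so only two states survive instead of three.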
2 changes: 1 addition & 1 deletion egs/swbd/s5/local/score.sh
```diff
@@ -31,7 +31,7 @@ data=$1

 if [ -f $data/stm ]; then # use sclite scoring.
   echo "$data/stm exists: using local/score_sclite.sh"
-  eval local/score_sclite.sh $orig_args
+  eval local/score_sclite.sh "$orig_args"
```
Contributor:
I just noticed that all of these changes are in swbd/s5, which is super outdated; you should be using s5c. I doubt that this problem (if there was a problem) occurs in the latest script. In any case, let me know what the problem was, because I'd be surprised if this was really a bug, this script being so old.

Contributor (author):
IIRC, if I don't add the quotes and $orig_args contains something like --cmd "queue.pl --mem 8G", it'll complain.
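The fragility the author describes comes from how the shell handles quote characters embedded in a variable: plain word-splitting treats them as literal text, while eval re-parses them as shell quoting. A minimal sketch with a hypothetical argument string (not the actual score.sh flow):

```shell
#!/bin/sh
# Minimal sketch (hypothetical argument string) of why values such as
# --cmd "queue.pl --mem 8G" need care: unquoted expansion splits on every
# space and leaves the quote characters literal, whereas eval re-parses
# the string so the quoted part stays one word.
nargs() { echo $#; }

orig_args='--cmd "queue.pl --mem 8G"'

split_plain=$(nargs $orig_args)      # 4 words: --cmd  "queue.pl  --mem  8G"
split_eval=$(eval nargs $orig_args)  # 2 words: --cmd  queue.pl --mem 8G

echo "plain splitting: $split_plain args; with eval: $split_eval args"
```

This is why scripts that forward option strings containing quoted sub-commands tend to route them through eval and quote carefully.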

```diff
 else
   echo "$data/stm does not exist: using local/score_basic.sh"
   eval local/score_basic.sh $orig_args
```
8 changes: 7 additions & 1 deletion scripts/rnnlm/train_rnnlm.sh
```diff
@@ -64,16 +64,19 @@ num_splits=$(cat $dir/text/info/num_splits)
 num_repeats=$(cat $dir/text/info/num_repeats)
 text_files=$(for n in $(seq $num_splits); do echo $dir/text/$n.txt; done)
 vocab_size=$(tail -n 1 $dir/config/words.txt | awk '{print $NF + 1}')
+embedding_type=

 if [ -f $dir/feat_embedding.0.mat ]; then
   sparse_features=true
+  embedding_type=feat_embedding
```
Contributor:
Let's just make this either "feat" or "word"; remove the "_embedding".
```diff
   if [ -f $dir/word_embedding.0.mat ]; then
     echo "$0: error: $dir/feat_embedding.0.mat and $dir/word_embedding.0.mat both exist."
     exit 1;
   fi
   ! [ -f $dir/word_feats.txt ] && echo "$0: expected $dir/word_feats.txt to exist" && exit 1;
 else
   sparse_features=false
+  embedding_type=word_embedding
   ! [ -f $dir/word_embedding.0.mat ] && \
     echo "$0: expected $dir/word_embedding.0.mat to exist" && exit 1
 fi
```
```diff
@@ -192,7 +195,7 @@ while [ $x -lt $num_iters ]; do
   [ -f $dir/.train_error ] && \
     echo "$0: failure on iteration $x of training, see $dir/log/train.$x.*.log for details." && exit 1
   if [ $this_num_jobs -gt 1 ]; then
-    # average the models and the embedding matrces. Use run.pl as we don't
+    # average the models and the embedding matrces. Use run.pl as we don\'t
     # want this to wait on the queue (if there is a queue).
     src_models=$(for n in $(seq $this_num_jobs); do echo $dir/$[x+1].$n.raw; done)
     src_matrices=$(for n in $(seq $this_num_jobs); do echo $dir/${embedding_type}.$[x+1].$n.mat; done)
```
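The averaging that the comment above refers to can be pictured with plain awk arithmetic; the numbers below are made up and this is only an illustration, not Kaldi's actual nnet3 model-averaging tools:

```shell
#!/bin/sh
# Toy illustration (made-up parameters, not Kaldi's binaries) of the
# averaging step: each of this_num_jobs parallel jobs produces its own
# parameters, and the next iteration starts from their element-wise mean.
avg=$(printf '%s\n' "1.0 3.0" "3.0 5.0" \
  | awk '{ for (i = 1; i <= NF; i++) sum[i] += $i; if (NF > m) m = NF; n++ }
         END { for (i = 1; i <= m; i++) printf "%s ", sum[i] / n }')

echo "averaged parameters: $avg"
```

Here two jobs contribute (1.0, 3.0) and (3.0, 5.0), so the averaged parameters are (2, 4).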
```diff
@@ -218,8 +221,11 @@ if [ $stage -le $num_iters ]; then
   echo "$0: best iteration (out of $num_iters) was $best_iter, linking it to final iteration."
   ln -sf $embedding_type.$best_iter.mat $dir/$embedding_type.final.mat
   ln -sf $best_iter.raw $dir/final.raw
+  ln -sf $best_iter.raw $dir/rnnlm  # to make it consistent with other RNNLMs
 fi

+touch $dir/unk.probs
```
Contributor:
Once we modify this setup to have its own rescoring scripts, unk.probs may no longer be needed, but I may merge this as-is for now.

```diff
 # Now get some diagnostics about the evolution of the objective function.
 if [ $stage -le $[num_iters+1] ]; then
   (
```
8 changes: 4 additions & 4 deletions src/latbin/lattice-lmrescore-kaldi-rnnlm.cc
```diff
@@ -40,11 +40,11 @@ int main(int argc, char *argv[]) {
         "composing with the wrapped LM using a special type of composition\n"
         "algorithm. Determinization will be applied on the composed lattice.\n"
         "\n"
-        "Usage: lattice-lmrescore-nnet3-rnnlm [options] <rnnlm-wordlist> \\\n"
+        "Usage: lattice-lmrescore-kaldi-rnnlm [options] <embedding-file> <rnnlm-wordlist> \\\n"
         "       <word-symbol-table-rxfilename> <lattice-rspecifier> \\\n"
-        "       <rnnlm-rxfilename> <lattice-wspecifier>\n"
-        " e.g.: lattice-lmrescore-nnet3-rnnlm --lm-scale=-1.0 words.txt \\\n"
-        "       ark:in.lats rnnlm ark:out.lats\n";
+        "       <raw-rnnlm-rxfilename> <lattice-wspecifier>\n"
+        " e.g.: lattice-lmrescore-kaldi-rnnlm --lm-scale=-1.0 word_embedding.mat \\\n"
+        "       rnn_words.txt fst_words.txt ark:in.lats rnnlm ark:out.lats\n";

     ParseOptions po(usage);
     int32 max_ngram_order = 3;
```
1 change: 1 addition & 0 deletions src/rnnlm/rnnlm-decodable-simple-looped.cc
```diff
@@ -2,6 +2,7 @@

 // Copyright 2017 Johns Hopkins University (author: Daniel Povey)
 //                2017 Yiming Wang
+//                2017 Hainan Xu

 // See ../../COPYING for clarification regarding multiple authors
 //
```
1 change: 1 addition & 0 deletions src/rnnlm/rnnlm-decodable-simple-looped.h
```diff
@@ -2,6 +2,7 @@

 // Copyright 2017 Johns Hopkins University (author: Daniel Povey)
 //                2017 Yiming Wang
+//                2017 Hainan Xu

 // See ../../COPYING for clarification regarding multiple authors
 //
```