nnet3-rnnlm lattice rescoring draft #1906
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -64,16 +64,19 @@ num_splits=$(cat $dir/text/info/num_splits) | |
num_repeats=$(cat $dir/text/info/num_repeats) | ||
text_files=$(for n in $(seq $num_splits); do echo $dir/text/$n.txt; done) | ||
vocab_size=$(tail -n 1 $dir/config/words.txt | awk '{print $NF + 1}') | ||
embedding_type= | ||
|
||
if [ -f $dir/feat_embedding.0.mat ]; then | ||
sparse_features=true | ||
embedding_type=feat_embedding | ||
let's just make this either "feat" or "word", remove the "_embedding". |
||
if [ -f $dir/word_embedding.0.mat ]; then | ||
echo "$0: error: $dir/feat_embedding.0.mat and $dir/word_embedding.0.mat both exist." | ||
exit 1; | ||
fi | ||
! [ -f $dir/word_feats.txt ] && echo "$0: expected $dir/word_feats.txt to exist" && exit 1; | ||
else | ||
sparse_features=false | ||
embedding_type=word_embedding | ||
! [ -f $dir/word_embedding.0.mat ] && \ | ||
echo "$0: expected $dir/word_embedding.0.mat to exist" && exit 1 | ||
fi | ||
|
@@ -192,7 +195,7 @@ while [ $x -lt $num_iters ]; do | |
[ -f $dir/.train_error ] && \ | ||
echo "$0: failure on iteration $x of training, see $dir/log/train.$x.*.log for details." && exit 1 | ||
if [ $this_num_jobs -gt 1 ]; then | ||
# average the models and the embedding matrces. Use run.pl as we don't | ||
# average the models and the embedding matrces. Use run.pl as we don\'t | ||
# want this to wait on the queue (if there is a queue). | ||
src_models=$(for n in $(seq $this_num_jobs); do echo $dir/$[x+1].$n.raw; done) | ||
src_matrices=$(for n in $(seq $this_num_jobs); do echo $dir/${embedding_type}.$[x+1].$n.mat; done) | ||
|
@@ -218,8 +221,11 @@ if [ $stage -le $num_iters ]; then | |
echo "$0: best iteration (out of $num_iters) was $best_iter, linking it to final iteration." | ||
ln -sf $embedding_type.$best_iter.mat $dir/$embedding_type.final.mat | ||
ln -sf $best_iter.raw $dir/final.raw | ||
ln -sf $best_iter.raw $dir/rnnlm # to make it consistent with other RNNLMs | ||
fi | ||
|
||
touch $dir/unk.probs | ||
once we modify this setup to have its own rescoring scripts, unk.probs may no longer be needed. but I may merge this as-is for now. |
||
|
||
# Now get some diagnostics about the evolution of the objective function. | ||
if [ $stage -le $[num_iters+1] ]; then | ||
( | ||
|
I just noticed that all of these changes are in swbd/s5. This is super outdated. You should be using s5c. I doubt that this problem (if there was a problem) occurs in the latest script. In any case let me know what the problem was, because I'd be surprised if this was really a bug, this script being so old.
IIRC, if I don't add the double quotes and $orig_args contains something like --cmd "queue.pl --mem 8G", it'll complain.
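
The quoting issue described in the reply above can be reproduced in isolation. The following is only a minimal sketch of shell word splitting, not the code from the script under discussion (that hunk is not shown here); the helper `count_args` and the variable `cmd` are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports how many arguments it received.
count_args() { echo $#; }

# Hypothetical option value containing spaces, as in the reply above.
cmd="queue.pl --mem 8G"

# Unquoted expansion: the shell splits the value on whitespace, so the
# callee sees --cmd followed by three separate words.
unquoted=$(count_args --cmd $cmd)

# Quoted expansion: the value is passed through as a single argument.
quoted=$(count_args --cmd "$cmd")

echo "unquoted: $unquoted arguments"
echo "quoted:   $quoted arguments"
```

With the default `IFS`, the unquoted call passes four arguments and the quoted call passes two, which is why an option parser that expects `--cmd` to be followed by exactly one value would reject the unquoted form.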