Nnet1 dropout ivec #1090

Merged: 3 commits into kaldi-asr:master from nnet1_dropout_ivec, Oct 4, 2016

Conversation

KarelVesely84 (Contributor) commented on Oct 3, 2016

  • adds support for annealed dropout,
  • adds an example of how to prepare Kaldi i-vectors on fMLLR features (AMI, IHM),
  • nice gains on AMI IHM (dev, eval): without i-vectors 24.2 / 24.5, with per-speaker i-vectors 23.2 / 22.8 (Dan's lattice-free MMI: 22.4 / 22.4)

- changing the scripts to support the 'annealed dropout',
@@ -22,7 +22,7 @@ feature_transform=
max_iters=20
min_iters=0 # keep training, disable weight rejection, start learn-rate halving as usual,
keep_lr_iters=0 # fix learning rate for N initial epochs, disable weight rejection,
dropout_iters= # Disable dropout after 'N' initial epochs,
dropout_schedule= # Dropout schedule for N initial epochs, for example: 0.9,0.9,0.9,0.9,0.9,1.0
danpovey (Contributor) commented on the diff:

Are these probabilities the probability of dropout, or the probability of not dropping out?
I think when people normally describe dropout, it's the probability of setting the feature to zero, e.g. see
https://pdfs.semanticscholar.org/c2d7/8722ebac92766f1154497d8424108d906ae3.pdf
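
(For illustration, a small C++ sketch of the two conventions being discussed; the function name is made up for the example, this is not Kaldi code:)

#include <random>

// The same schedule value p means different things under the two conventions:
// drop convention: p = P(set the activation to zero); retention convention: p = P(keep it).
float SampleDropoutMask(float p, bool p_is_drop_prob, std::mt19937 &rng) {
  std::uniform_real_distribution<float> u(0.0f, 1.0f);
  float p_drop = p_is_drop_prob ? p : 1.0f - p;
  return (u(rng) < p_drop) ? 0.0f : 1.0f;  // 0 = unit dropped, 1 = unit kept
}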

danpovey (Contributor) commented on the diff:


Perhaps if you renamed this to dropout_retention_schedule it would lead to less confusion?

KarelVesely84 (Contributor, author) replied:


Dan, how do you have it in nnet2 and nnet3? As the probability that a neuron is dropped? It would be good to have it the same way; I will change it then...
(Actually, I was already thinking about this and discussed it with Harish.)

- there's backward compatibility in Dropout::Read()
mallidi (Contributor) commented on Oct 4, 2016

Hi Karel,
Sorry for complaining. dropout_schedule seems clear, but dropout_retention_schedule might be better, since you are converting the dropout schedule into a retention schedule.

Harish

KarelVesely84 (Contributor, author) commented on Oct 4, 2016

Hi Harish (@mallidi),
in the last commit I changed 'dropout-retention' to 'dropout-rate'.

This is the more standard formulation, as @danpovey pointed out. Since the values in 'dropout_schedule' are now the probabilities that neurons are dropped, we can keep the original variable name 'dropout_schedule'.

The C++ code knows how to read older models that use 'DropoutRetention' instead of 'DropoutRate', so there is backward compatibility...
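
(For illustration only, a simplified C++ sketch of that backward-compatible reading; the function name and the plain stream I/O are made up for the example, this is not the actual Kaldi Dropout code:)

#include <istream>
#include <stdexcept>
#include <string>

// Accept the old <DropoutRetention> token as well as the new <DropoutRate> token.
void ReadDropoutParam(std::istream &is, float *dropout_rate) {
  std::string token;
  is >> token;
  if (token == "<DropoutRate>") {
    is >> *dropout_rate;               // new models store the probability of dropping a unit
  } else if (token == "<DropoutRetention>") {
    float retention;
    is >> retention;                   // old models store the probability of keeping a unit
    *dropout_rate = 1.0f - retention;  // convert to the new convention
  } else {
    throw std::runtime_error("Unexpected token: " + token);
  }
}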

Is it okay for your needs?
K.

mallidi (Contributor) commented on Oct 4, 2016

Sure, @vesis84. Thanks a lot for the annealed dropout.

Harish.

KarelVesely84 (Contributor, author) commented:

You are welcome ;)

What seemed to work well on 'ami-ihm' was a dropout-rate of 0.2 for the 5 initial epochs, then switching it to 0.0 (no dropout). This starts the learning-rate decay, because without dropout the cross-entropy immediately increases on the 'cv' data (and decreases massively on the 'tr' data)...

This schedule was better than 0.5 0.4 0.3 0.2 0.1 0.0 and some other combinations I tried.
Harish, you might have more luck with the annealed version :)

One detail of the implementation: after applying the dropout mask, the output is up-scaled by 1/(1-p_drop), while the cross-validation is always run without dropout (hard-coded in the training binaries). The up-scaling does a good job here; there does not seem to be a severe mismatch caused by disabling dropout in the cross-validation step... [I tried exponentiating the 1/(1-p_drop) factor to make it a little larger or smaller, but that caused a mismatch visible in the 'cv' loss.]
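
(For illustration, a minimal C++ sketch of this inverted-dropout scaling; the function name and the plain CPU loop are made up for the example, the real code lives in the Kaldi nnet1 component:)

#include <random>
#include <vector>

// During training each activation is zeroed with probability p_drop and the survivors are
// up-scaled by 1/(1-p_drop), so the expected activation matches the no-dropout case; at
// cross-validation time dropout is simply disabled and values pass through unchanged.
void ApplyDropout(std::vector<float> *h, float p_drop, bool training, std::mt19937 &rng) {
  if (!training || p_drop == 0.0f) return;   // 'cv' pass: no dropout, no compensation needed
  std::bernoulli_distribution drop(p_drop);
  float scale = 1.0f / (1.0f - p_drop);      // keeps E[output] equal to the no-dropout case
  for (float &x : *h)
    x = drop(rng) ? 0.0f : x * scale;
}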

K.

KarelVesely84 (Contributor, author) commented:

@danpovey I am done with the changes; from my side it's ready.

danpovey merged commit 98ab7d2 into kaldi-asr:master on Oct 4, 2016.
KarelVesely84 deleted the nnet1_dropout_ivec branch on November 6, 2017.