RESULTS

Notes

  • In the Conformer-based experiments, nn.BatchNorm1d was not used in ConvolutionModule, which made training more stable.
  • To manually remove nn.BatchNorm1d, please modify this file:
    espnet/nets/pytorch_backend/conformer/convolution.py
    
    • Comment out the following line in __init__:
      self.norm = nn.BatchNorm1d(channels)
      
    • Modify the 1D depthwise convolution in forward as follows:
      # 1D Depthwise Conv
      x = self.depthwise_conv(x)
      # x = self.activation(self.norm(x))
      x = self.activation(x)
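
The two manual edits above can also be applied as a plain text substitution. The sketch below uses a hypothetical helper, `remove_batchnorm` (not part of ESPnet), and assumes that convolution.py contains the two lines exactly as quoted above. It comments out the BatchNorm line in `__init__`, keeps the old `forward` call as a comment, and adds the activation-only line, preserving indentation.

```python
# Hypothetical helper (not part of ESPnet). Assumes the two lines below appear
# verbatim in espnet/nets/pytorch_backend/conformer/convolution.py.
NORM_LINE = "self.norm = nn.BatchNorm1d(channels)"
OLD_ACTIVATION = "x = self.activation(self.norm(x))"
NEW_ACTIVATION = "x = self.activation(x)"


def remove_batchnorm(source: str) -> str:
    """Comment out BatchNorm1d in __init__ and bypass self.norm in forward."""
    out = []
    for line in source.splitlines(keepends=True):
        code = line.strip()
        indent = line[: len(line) - len(line.lstrip())]
        newline = line[len(line.rstrip("\r\n")):]
        sep = newline or "\n"  # separator if the matched line is the last one
        if code == NORM_LINE:
            # __init__: keep the line for reference, but disable it.
            out.append(f"{indent}# {code}{newline}")
        elif code == OLD_ACTIVATION:
            # forward: keep the old call as a comment, then apply activation only.
            out.append(f"{indent}# {code}{sep}")
            out.append(f"{indent}{NEW_ACTIVATION}{newline}")
        else:
            out.append(line)
    return "".join(out)
```

Running it twice is harmless: already-commented lines no longer match, so the output of a second pass equals the first. To use it, read the file into a string, pass it through `remove_batchnorm`, and write the result back.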
      

Dataset

Google Speech Commands Paper: https://arxiv.org/abs/1804.03209

Two versions are supported in this recipe: 12 commands and 35 commands. Select one by setting the variable num_commands in run.sh to 12 or 35.

  • 12 commands: 10 words + silence + unknown. Results are reported on two test sets: (1) test, the standard test set from the original paper, and (2) test_speechbrain, the test set used in SpeechBrain's recipe.
  • 35 commands: all 35 command words. The full test set from the original paper is used.
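
The difference between the two versions is in how a clip's word is mapped to a class. The sketch below illustrates this with a hypothetical function `map_label`; it is not the recipe's data preparation code, and the label strings `<silence>` and `<unknown>` are assumptions. The word lists are the 10 target words of the 12-command task and the 35-word vocabulary of Speech Commands v2.

```python
# Ten target words of the 12-command task.
TARGET_WORDS_12 = frozenset(
    {"yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"}
)
# Full 35-word vocabulary of Speech Commands v2.
WORDS_35 = frozenset({
    "backward", "bed", "bird", "cat", "dog", "down", "eight", "five", "follow",
    "forward", "four", "go", "happy", "house", "learn", "left", "marvin", "nine",
    "no", "off", "on", "one", "right", "seven", "sheila", "six", "stop", "three",
    "tree", "two", "up", "visual", "wow", "yes", "zero",
})
# Hypothetical label names; the recipe may spell these differently.
SILENCE = "<silence>"
UNKNOWN = "<unknown>"


def map_label(word, num_commands):
    """Map a spoken word (None for a clip without speech) to a class label."""
    if num_commands == 12:
        if word is None:
            return SILENCE  # 11th class: silence
        # Target words keep their own class; all other words become "unknown".
        return word if word in TARGET_WORDS_12 else UNKNOWN
    if num_commands == 35:
        # Every command word is its own class.
        if word not in WORDS_35:
            raise ValueError(f"not one of the 35 command words: {word!r}")
        return word
    raise ValueError("num_commands should be set to 12 or 35")
```

With num_commands=12, for example, "marvin" falls into the unknown class, while with num_commands=35 it keeps its own label.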

asr_conformer_noBatchNorm_warmup5k_lr2e-4_accum3_conv15_5speeds (12 commands)

Model: https://zenodo.org/record/5635530#.YcaCZBOZMVU

Environments

  • date: Sun Oct 3 05:20:21 UTC 2021
  • python version: 3.8.12 | packaged by conda-forge | (default, Sep 16 2021, 02:08:29) [GCC 9.4.0]
  • espnet version: espnet 0.10.3a3
  • pytorch version: pytorch 1.9.0
  • Git hash: 8536be6afc363bcf6b4fc6f41d612e42173de46c
    • Commit date: Sun Oct 3 04:15:48 2021 +0000

Classification Accuracy

| dataset | total | correct | accuracy |
|---|---|---|---|
| dev | 4605 | 4499 | 0.9770 |
| test | 4890 | 4785 | 0.9785 |
| test_speechbrain | 4886 | 4809 | 0.9842 |
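
The accuracy column is simply correct / total. The short check below recomputes it from the counts in the table above and prints each value next to the reported one, rounded to four decimals.

```python
# (dataset, total, correct, reported accuracy) copied from the table above.
ROWS_12 = [
    ("dev", 4605, 4499, "0.9770"),
    ("test", 4890, 4785, "0.9785"),
    ("test_speechbrain", 4886, 4809, "0.9842"),
]


def accuracy(total, correct):
    """Classification accuracy = correct / total, formatted to 4 decimals."""
    return f"{correct / total:.4f}"


for name, total, correct, reported in ROWS_12:
    print(f"{name}: {accuracy(total, correct)} (reported {reported})")
```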

WER

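| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| infer/dev | 4605 | 4605 | 97.7 | 2.3 | 0.0 | 0.0 | 2.3 | 2.3 |
| infer/test | 4890 | 4890 | 97.9 | 2.1 | 0.0 | 0.0 | 2.1 | 2.1 |
| infer/test_speechbrain | 4886 | 4886 | 98.4 | 1.6 | 0.0 | 0.0 | 1.6 | 1.6 |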
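
Because each utterance contains exactly one word (Snt = Wrd) and the Del and Ins columns are 0.0, the WER table restates the accuracy table: every misclassified utterance counts as one substitution. Treating Sub, Del and Ins as counts (the table shows them as percentages of N), the relationship is:

```latex
\begin{aligned}
\mathrm{Err} &= \frac{\mathrm{Sub} + \mathrm{Del} + \mathrm{Ins}}{N} \times 100,
  \qquad N = \mathrm{Snt} = \mathrm{Wrd} \\
&= \frac{\text{total} - \text{correct}}{\text{total}} \times 100
  \qquad (\mathrm{Del} = \mathrm{Ins} = 0) \\
&= (1 - \text{accuracy}) \times 100 \\[4pt]
\text{dev:}\quad \mathrm{Err} &= \frac{4605 - 4499}{4605} \times 100
  = \frac{106}{4605} \times 100 \approx 2.30
\end{aligned}
```

For the same reason, S.Err equals Err in every row: a sentence is wrong exactly when its single word is wrong.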

asr_35commands_conformer_noBatchNorm_warmup5k_lr2e-4_accum3_conv15_5speeds (35 commands)

Model: https://zenodo.org/record/5637586#.YcaCQhOZMVU

Environments

  • date: Mon Oct 4 20:07:28 UTC 2021
  • python version: 3.8.12 | packaged by conda-forge | (default, Sep 16 2021, 02:08:29) [GCC 9.4.0]
  • espnet version: espnet 0.10.3a3
  • pytorch version: pytorch 1.9.0
  • Git hash: 94a64d4037602b2a7944619075bbc04ebdcd963d
    • Commit date: Sun Oct 3 04:24:10 2021 +0000

Classification Accuracy

| dataset | total | correct | accuracy |
|---|---|---|---|
| dev | 9981 | 9725 | 0.9744 |
| test | 11005 | 10732 | 0.9752 |

WER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|---|---|---|---|---|---|---|---|---|
| infer/dev | 9981 | 9981 | 97.4 | 2.6 | 0.0 | 0.0 | 2.6 | 2.6 |
| infer/test | 11005 | 11005 | 97.5 | 2.5 | 0.0 | 0.0 | 2.5 | 2.5 |
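
As a consistency check for the 35-command results, the sketch below derives accuracy, Corr and Err from the total and correct counts alone. It assumes, as the tables show, that Del and Ins are zero, so Err is 100 × (total − correct) / total.

```python
# (dataset, total, correct, reported accuracy, reported Corr, reported Err)
# taken from the 35-command accuracy and WER tables above.
ROWS_35 = [
    ("dev", 9981, 9725, "0.9744", "97.4", "2.6"),
    ("test", 11005, 10732, "0.9752", "97.5", "2.5"),
]


def summarize(total, correct):
    """Return (accuracy, Corr %, Err %) assuming Del = Ins = 0."""
    acc = correct / total
    err = 100 * (total - correct) / total  # every error is one substitution
    return f"{acc:.4f}", f"{100 * acc:.1f}", f"{err:.1f}"


for name, total, correct, *reported in ROWS_35:
    print(name, summarize(total, correct), "reported", tuple(reported))
```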