LT-LM recipe for librispeech #4590
Conversation
@medbar, thanks for this new recipe. It is a really interesting approach (thank you, this is extremely intriguing), to the point that I want to run and play with it myself; I'm interested in one-shot rescoring.

Please do not be overwhelmed by the sheer number of review comments: the code is large indeed, and there will be a lot of changes requested. We generally follow the Google code style with a couple of exceptions. I understand that comments like "add space here" and "remove space here" might be annoying, and C++ reviews are especially persnickety, but this is what helps keep the quality of the code base high, so please do not be put off by them. It's the best part of keeping our codebase accessible to future collaborators. If you are comfortable with …

Oh, and feel free to ignore the failing CodeFactor check. You can skim through its suggestions, but more often than not they make no real difference to readability. Sometimes it finds important pessimizations to get rid of, but they drown in its noise.

Do you think the one C++ binary that you're adding would be beneficial for general use? Maybe just put it into latbin/, if it's not extremely specialized? I do not think we have recipes with their own binaries; generally, all things C++ go under src/. I would not worry too much about code bloat: we're already bloated to 15GB of binaries in a static build, so it's a drop in the sea anyway.

About the torch training part: does it use Hydra to orchestrate? Does it benefit from NCCL, i.e. is it faster on multi-GPU nodes, or would many single-GPU nodes work just as well, in your experience? What GPU would be the best? If fairseq uses lower-precision tensor cores in training, the T4 would be the king; in the NCCL scenario, V100s can be connected into a large collective; and if it scales well to multiple GPU nodes with a fast fabric network, P100s are likely the best bang for the buck.

In the paper (very start of section 5) a 50GB text dataset is mentioned, but without any citation or reference. I only skimmed the code, so I might have easily missed where it comes from. Can you give a pointer please?

I'll get to the review in the next couple of days.
@kkm000, thank you for your answer. I styled the C++ code by …
latgen-faster-mapped-fake-am can be used only with a FAM (Fake Acoustic Model) generated by …
In the paper, single-GPU training was used, but it took about two weeks, so I decided to use a multi-node setup in the new experiments. This recipe trains the LT-LM on a Slurm cluster using Distributed Data Parallel (DDP) with NCCL as the backend. With six 2080 Ti GPUs (two nodes, 3 GPUs per node), the model trained in 4 days. The main bottleneck of this setup is memory usage. I had to use …
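For illustration, a minimal sketch of the per-process DDP/NCCL setup described above. This is not the recipe's actual training code (training runs through fairseq under Slurm); it assumes one process per GPU and that the launcher exports RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT, and LOCAL_RANK, and the function name is a placeholder.

```python
# Illustrative sketch: one process per GPU, NCCL backend,
# as in the 2-node x 3-GPU setup described above.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_model_for_ddp(model: torch.nn.Module) -> DDP:
    # Reads RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT from the environment.
    dist.init_process_group(backend="nccl")
    # Pin this process to its local GPU before moving the model.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # Gradients are averaged across all participating GPUs on backward().
    return DDP(model, device_ids=[local_rank])
```

NCCL is the usual backend choice for multi-GPU and multi-node CUDA training; with one process per GPU, per-device memory becomes the limiting factor, consistent with the bottleneck mentioned above.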
We use an extra 5GB (not 50GB) text dataset, which is included in the LibriSpeech corpus. Looking forward to your review.
@medbar, thanks for your detailed comments. I think that you are doing it all correctly w.r.t. Kaldi data structures, but I'll give it a second brain pass when I have a breather. I'm really sorry about sitting on your recipe review for this long. As usually happens, unexpected (ha!) emergencies at work.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.
Please add the missing file header; otherwise LGTM!
I'm so sorry about dragging this on. My bad. Let's fix and commit that ASAP.
@@ -0,0 +1,197 @@
Please add a standard header. Borrow one from any file in bin/ and put in your name, your institution, or both. The file ./COPYING has instructions if you aren't sure. LGTM otherwise.
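For illustration, a header roughly in the style of existing Kaldi sources; the file path, year, and author line are placeholders, and the exact license wording should be copied from a file under bin/ (see ./COPYING):

```cpp
// latbin/latgen-faster-mapped-fake-am.cc   (placeholder path)

// Copyright 2021   Author Name (Institution)   -- placeholders

// See ../../COPYING for clarification regarding multiple authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// THIS CODE IS PROVIDED *AS IS* BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, EITHER EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION ANY IMPLIED
// WARRANTIES OR CONDITIONS OF TITLE, FITNESS FOR A PARTICULAR PURPOSE,
// MERCHANTABILITY OR NON-INFRINGEMENT.
// See the Apache 2 License for the specific language governing permissions and
// limitations under the License.
```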
Hello to all!
Code for LT-LM training on the LibriSpeech corpus.
Paper: "LT-LM: a novel non-autoregressive language model for single-shot lattice rescoring"
The model is trained using the fairseq framework.