Commit
mmcauliffe authored Apr 5, 2022
1 parent d5230fd commit 6ee755d
Showing 158 changed files with 10,522 additions and 7,074 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -31,13 +31,15 @@ in your environment of choice.
If you'd like to install a local version of MFA or want to use the development set up, the easiest way is first create the dev environment from the yaml in the repo root directory:

```
-conda create -n mfa-dev -f environment.yml # or environment_win.yml on Windows
+conda env create -n mfa-dev -f environment.yml # Linux or mac
+conda env create -n mfa-dev -f environment_win.yml # Windows
```

Alternatively, the dependencies can be installed via:

```
-conda install -c conda-forge python=3.8 kaldi sox librosa biopython praatio tqdm requests colorama pyyaml # and pynini if on Linux or Mac
+conda install -c conda-forge python=3.8 kaldi sox librosa biopython praatio tqdm requests colorama pyyaml # All platforms
+conda install -c conda-forge pynini openfst baumwelch ngram # Additional dependencies to install on Linux or Mac
```

MFA can be installed in develop mode via:
35 changes: 26 additions & 9 deletions docs/source/_static/css/style.css
@@ -12,12 +12,26 @@
white-space: normal;
}



a.external::after{
content: " \f35d";
font-size: 0.75em;
text-align: center;
vertical-align: middle;
padding-bottom: 0.45em;
}

:root {
--base-blue: 0, 53, 102;
--dark-blue: 0, 29, 61;
--light-blue: 14, 99, 179;
--base-yellow: 255, 195, 0;
--light-yellow: 255, 214, 10;
--sd-color-primary: #003566;
--sd-color-dark: #003566;
--sd-color-primary-text: #FFC300;
--sd-color-primary-highlight: #FFC300;
--pst-color-primary: var(--base-blue);
--pst-color-warning: var(--light-yellow);
--pst-color-info: var(--light-blue);
@@ -41,21 +41,21 @@
--pst-color-toc-link-hover: var(--pst-color-hover-navigation);
--pst-color-toc-link-active: var(--pst-color-active-navigation);
}
.btn-navigation{
background-color: #0E63B3;
border-color: #0E63B3;

.sd-btn-primary{
font-weight: bold;
}
.btn-navigation:hover {
background-color: #FFC300;
border-color: #FFC300;
color: #000814;

.sd-btn-primary:hover{
color: #003566 !important;
}
.i-navigation{
color: #003566;
padding: 20px;
}
.i-navigation:hover {
color: #FFC300;

.navbar-light .navbar-nav li a.nav-link {
font-size: 1.15em;
}

.rst-table-cell{
@@ -65,6 +79,9 @@ display: inline-block;
text-align: center;

}
div[class*="highlight-"] {
text-align: left;
}

.supported {
background-color: #E9F6EC;
6 changes: 3 additions & 3 deletions docs/source/_static/interrogate_badge.svg
38 changes: 33 additions & 5 deletions docs/source/changelog/changelog_2.0.rst
@@ -5,10 +5,33 @@
2.0 Changelog
*************

-.. _2.0b:
+.. _2.0r:

-Beta releases
-=============
+Release candidates
+==================

2.0.0rc4
--------

- Added ``--quiet`` flag to suppress printing output to the console
- Added ability to specify ``pronunciation_probabilities`` in training blocks, where probabilities of pronunciation variants and of their appearing before/after silence are calculated based on the alignment at that stage. The lexicon files are then regenerated with these probabilities for later training blocks
- Added a flag to export per-pronunciation silence probabilities to :ref:`training_dictionary`
- Added a flag to :ref:`transcribing` for specifying the language model weight and word insertion penalties to speed up evaluation of transcripts
- Added a final SAT training block equivalent to the :kaldi_steps:`train_quick` script
- Added early stopping of SAT training blocks if the corpus size is below the specified subset (at least two rounds of SAT training will be performed)
- Refactored how transcription parsing is done, so that you can specify word break characters other than whitespace (e.g., instances of ``.`` or ``?`` embedded in words as typos in the corpus)
- Refactored quotations and clitic markers, so that if there happens to be a word like ``kid'``, MFA can recover the word ``kid`` from it. If there is no entry for ``kid``, or ``kid'`` itself is in the dictionary, the apostrophe will be kept.
- Refactored the ``--test_transcription`` functionality of :ref:`validating_data` to use small language models built from all transcripts of a speaker, mixed with an even smaller language model per utterance, following :kaldi_steps:`cleanup/make_biased_lm_graphs`.
- Refactored how internal storage is done to use an SQLite database rather than keeping everything in memory, so bigger corpora should not need as much memory when aligning/training
- Fixed an issue in lexicon construction where explicit silences were not being respected
- Fixed an issue in training where initial Gaussians were not being properly used
- Changed the behavior of assigning speakers to jobs, so that it now tries to balance the number of utterances across jobs
- Changed the default topology to allow for more variable length phones (minimum duration is now one frame, 10ms by default)
- Changed how models and dictionaries are downloaded, following the changes to `MFA Models <https://mfa-models.readthedocs.io/>`_
- Added the ability to use pitch features for models, with the ``--use_pitch`` flag or configuration option
- Added a ``[bracketed]`` word that will capture any transcriptions like ``[wor-]`` or ``<hes->``, as these are typically restarts, hesitations, speech errors, etc., which have different characteristics from words that merely happen not to be in the dictionary. The same phone is used for both, but a separate word symbol allows silence probabilities to be modelled separately.
- Added words for ``[laugh]`` and ``[laughter]`` to capture laughter annotations as separate from both OOV ``<unk>`` items and ``[bracketed]`` words. As with ``[bracketed]``, the laughter words use the same ``spn`` phone, but allow for separate silence probabilities.
- Fixed a bug where models trained in earlier versions were not correctly reporting their phone set (:github_issue:`422`)
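The storage refactor described above (corpus data in an SQLite database rather than all in memory) can be sketched as follows; the table layout and column names here are hypothetical illustrations, not MFA's actual schema:

```python
import sqlite3

# Hypothetical schema: one row per utterance, keyed to its speaker.
conn = sqlite3.connect(":memory:")  # MFA would use an on-disk database file
conn.execute(
    "CREATE TABLE utterance (id INTEGER PRIMARY KEY, speaker TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO utterance (speaker, text) VALUES (?, ?)",
    [("spk1", "hello world"), ("spk1", "good morning"), ("spk2", "thanks")],
)
conn.commit()

# Only the rows needed for the current job are pulled into memory,
# so memory use no longer scales with total corpus size.
rows = conn.execute(
    "SELECT text FROM utterance WHERE speaker = ?", ("spk1",)
).fetchall()
print([r[0] for r in rows])  # → ['hello world', 'good morning']
```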

2.0.0rc3
--------
@@ -32,6 +55,11 @@ Beta releases
- Added file listing average per-frame log-likelihoods by utterance for alignment
- Fixed a bug where having "<s>" in a transcript would cause MFA to crash

.. _2.0b:

Beta releases
=============

2.0.0b11
--------

@@ -40,7 +68,7 @@ Beta releases
- Added better progress bars for corpus loading, acoustic modeling, G2P training, transcription and alignment
- Changed the default behavior of G2P generation to use a threshold system rather than returning a single top pronunciation. The threshold defaults to 0.99, but can be specified through ``--g2p_threshold``. Specifying number of pronunciations will override this behavior (use ``--num_pronunciation 1`` for the old behavior).
- Changed the behavior of G2P evaluation to check whether the generated hypothesis is in the golden pronunciation set, so languages with pronunciation variation will be less penalized in evaluation
-- Added :class:`~montreal_forced_aligner.data.Word` and :class:`~montreal_forced_aligner.data.Pronunciation` data classes
+- Added :class:`~montreal_forced_aligner.data.WordData` and :class:`~montreal_forced_aligner.data.Pronunciation` data classes
- Refactored and simplified TextGrid export process
- Removed the ``multilingual_ipa`` mode in favor of a more general approach to better modeling phones
- Added functionality to evaluate alignments against golden alignment set
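The ``--g2p_threshold`` behavior described above can be sketched as relative-probability pruning of candidate pronunciations; this is an illustration under that assumption, not MFA's actual implementation:

```python
def prune_pronunciations(candidates, threshold=0.99):
    """Keep pronunciations whose probability is within `threshold` (as a
    ratio) of the best candidate. Hypothetical rule for illustration only."""
    best = max(prob for _, prob in candidates)
    return [pron for pron, prob in candidates if prob >= best * threshold]

# Two near-tied hypotheses both survive; the distant third is dropped.
candidates = [("W ER1 D", 0.60), ("W ER1 D AH0", 0.595), ("W AO1 R D", 0.20)]
print(prune_pronunciations(candidates))  # → ['W ER1 D', 'W ER1 D AH0']
```

With ``--num_pronunciations 1``-style behavior, only the single best candidate would be returned instead.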
@@ -95,7 +123,7 @@ Beta releases
- Massive refactor to a proper class-based API for interacting with MFA corpora

- Sorry, I really do hope this is the last big refactor of 2.0
-- :class:`~montreal_forced_aligner.corpus.classes.Speaker`, :class:`~montreal_forced_aligner.corpus.classes.File`, and :class:`~montreal_forced_aligner.corpus.classes.Utterance` have dedicated classes rather than having their information split across dictionaries mimicking Kaldi files, so they should be more useful for interacting with outside of MFA
+- :class:`~montreal_forced_aligner.corpus.classes.Speaker`, :class:`~montreal_forced_aligner.corpus.classes.FileData`, and :class:`~montreal_forced_aligner.corpus.classes.UtteranceData` have dedicated classes rather than having their information split across dictionaries mimicking Kaldi files, so they should be more useful for interacting with outside of MFA
- Added :class:`~montreal_forced_aligner.corpus.multiprocessing.Job` class as well to make it easier to generate and keep track of information about different processes
- Updated installation style to be more dependent on conda-forge packages

33 changes: 26 additions & 7 deletions docs/source/conf.py
@@ -47,23 +47,44 @@
"external_links",
# "numpydoc",
"sphinx.ext.napoleon",
-"sphinx_panels",
+"sphinx_design",
"sphinx.ext.viewcode",
"sphinxcontrib.autoprogram",
"sphinxemoji.sphinxemoji",
# "sphinx_autodoc_typehints",
]
panels_add_bootstrap_css = False
intersphinx_mapping = {
"sqlalchemy": ("https://docs.sqlalchemy.org/en/14/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"python": ("https://docs.python.org/3", None),
"Bio": ("https://biopython.org/docs/latest/api/", None),
}


extlinks = {
"mfa_pr": ("https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/pull/%s", "PR #%s"),
}

xref_links = {
"mfa_models": ("MFA Models", "https://mfa-models.readthedocs.io/"),
"anchor": ("Anchor Annotator", "https://anchor-annotator.readthedocs.io/en/latest/"),
"pretrained_acoustic_models": (
"MFA acoustic models",
"https://mfa-models.readthedocs.io/en/latest/acoustic/index.html",
),
"pretrained_dictionaries": (
"MFA dictionaries",
"https://mfa-models.readthedocs.io/en/latest/dictionary/index.html",
),
"pretrained_g2p": (
"MFA G2P models",
"https://mfa-models.readthedocs.io/en/latest/g2p/index.html",
),
"pretrained_language_models": (
"MFA language models",
"https://mfa-models.readthedocs.io/en/latest/language_model/index.html",
),
"mfa_mailing_list": ("MFA mailing list", "https://groups.google.com/g/mfa-users"),
"mfa_github": (
"MFA GitHub Repo",
@@ -123,6 +144,10 @@
"MFA-reorganization-scripts repository",
"https://github.com/MontrealCorpusTools/MFA-reorganization-scripts",
),
"corpus_creation_scripts": (
"@mmcauliffe's corpus creation scripts",
"https://github.com/mmcauliffe/corpus-creation-scripts",
),
}

# -----------------------------------------------------------------------------
@@ -139,12 +164,9 @@
"MultispeakerDictionary": "montreal_forced_aligner.dictionary.MultispeakerDictionary",
"Trainer": "montreal_forced_aligner.abc.Trainer",
"Aligner": "montreal_forced_aligner.abc.Aligner",
-"Utterance": "montreal_forced_aligner.corpus.classes.Utterance",
-"File": "montreal_forced_aligner.corpus.classes.File",
"FeatureConfig": "montreal_forced_aligner.config.FeatureConfig",
"multiprocessing.context.Process": "multiprocessing.Process",
"mp.Process": "multiprocessing.Process",
-"Speaker": "montreal_forced_aligner.corpus.classes.Speaker",
"Namespace": "argparse.Namespace",
"MetaDict": "dict[str, Any]",
}
@@ -246,11 +268,8 @@
("py:class", "CtmErrorDict"),
("py:class", "kwargs"),
("py:class", "Labels"),
-("py:class", "ScpType"),
("py:class", "multiprocessing.Value"),
("py:class", "praatio.utilities.constants.Interval"),
-("py:class", "CorpusMappingType"),
-("py:class", "DictionaryEntryType"),
("py:class", "montreal_forced_aligner.abc.MetaDict"),
("py:class", "multiprocessing.context.Process"),
]
35 changes: 34 additions & 1 deletion docs/source/external_links.py
@@ -52,6 +52,22 @@ def model_role(
return [pnode], []


def github_issue_role(
typ: str,
rawtext: str,
text: str,
lineno: int,
inliner: Inliner,
options: dict = None,
content: List[str] = None,
) -> Tuple[List[Node], List[system_message]]:
text = utils.unescape(text)
full_url = f"https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues/{text}"
title = f"GitHub #{text}"
pnode = nodes.reference(title, title, internal=False, refuri=full_url)
return [pnode], []
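Outside of Sphinx, the link text and target the new ``github_issue`` role produces can be checked in isolation; this snippet simply mirrors the f-strings above and is not part of the commit:

```python
def github_issue_link(issue_number: str) -> tuple:
    """Build the (title, url) pair the github_issue role renders."""
    url = (
        "https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner"
        f"/issues/{issue_number}"
    )
    title = f"GitHub #{issue_number}"
    return title, url

# e.g. the :github_issue:`422` reference in the changelog above
title, url = github_issue_link("422")
print(title)  # → GitHub #422
```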


def kaldi_steps_role(
typ: str,
rawtext: str,
@@ -139,6 +155,22 @@ def openfst_src_role(
return [pnode], []


def ngram_src_role(
typ: str,
rawtext: str,
text: str,
lineno: int,
inliner: Inliner,
options: dict = None,
content: List[str] = None,
) -> Tuple[List[Node], List[system_message]]:
text = utils.unescape(text)
full_url = f"https://www.opengrm.org/doxygen/ngram/html/{text}-main_8cc_source.html"
title = f"OpenGrm NGram {text} source"
pnode = nodes.reference(title, title, internal=False, refuri=full_url)
return [pnode], []


def kaldi_src_role(
typ: str,
rawtext: str,
@@ -411,12 +443,13 @@ def get_refs(app):

def setup(app: Sphinx) -> Dict[str, Any]:
app.add_config_value("xref_links", {}, "env")
app.add_role("mfa_model", model_role)
app.add_role("github_issue", github_issue_role)
app.add_role("kaldi_steps", kaldi_steps_role)
app.add_role("kaldi_utils", kaldi_utils_role)
app.add_role("kaldi_steps_sid", kaldi_steps_sid_role)
app.add_role("kaldi_src", kaldi_src_role)
app.add_role("openfst_src", openfst_src_role)
app.add_role("ngram_src", ngram_src_role)
app.add_role("kaldi_docs", kaldi_docs_role)
app.add_role("xref", xref)
app.connect("builder-inited", get_refs)
6 changes: 3 additions & 3 deletions docs/source/first_steps/example.rst
@@ -33,7 +33,7 @@ Example 1: Aligning LibriSpeech (English)
Set up
------

-1. Ensure you have installed MFA via :ref:`installation_ref`.
+1. Ensure you have installed MFA via :ref:`installation`.
2. Ensure you have downloaded the pretrained model via :code:`mfa model download acoustic english`
3. Download the prepared LibriSpeech dataset (`LibriSpeech data set`_) and extract it somewhere on your computer
4. Download the LibriSpeech lexicon (`LibriSpeech lexicon`_) and save it somewhere on your computer
@@ -69,7 +69,7 @@ Example 2: Generate Mandarin dictionary
Set up
------

-1. Ensure you have installed MFA via :ref:`installation_ref`.
+1. Ensure you have installed MFA via :ref:`installation`.
2. Ensure you have downloaded the pretrained model via :code:`mfa model download g2p mandarin_pinyin_g2p`
3. Download the prepared Mandarin dataset from (`example Mandarin corpus`_) and extract it somewhere on your computer

@@ -102,7 +102,7 @@ Example 3: Train Mandarin G2P model
Set up
------

-1. Ensure you have installed MFA via :ref:`installation_ref`.
+1. Ensure you have installed MFA via :ref:`installation`.
2. Download the prepared Mandarin dictionary from (`example Mandarin dictionary`_)

In the same environment that you've installed MFA, enter the following command into the terminal: