Commit
mmcauliffe authored Apr 5, 2022
1 parent d5230fd commit 6ee755d
Showing 158 changed files with 10,522 additions and 7,074 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -31,13 +31,15 @@ in your environment of choice.
If you'd like to install a local version of MFA or want to use the development set up, the easiest way is first create the dev environment from the yaml in the repo root directory:

```
-conda create -n mfa-dev -f environment.yml # or environment_win.yml on Windows
+conda env create -n mfa-dev -f environment.yml # Linux or mac
+conda env create -n mfa-dev -f environment_win.yml # Windows
```

Alternatively, the dependencies can be installed via:

```
-conda install -c conda-forge python=3.8 kaldi sox librosa biopython praatio tqdm requests colorama pyyaml # and pynini if on Linux or Mac
+conda install -c conda-forge python=3.8 kaldi sox librosa biopython praatio tqdm requests colorama pyyaml # All platforms
+conda install -c conda-forge pynini openfst baumwelch ngram # Additional dependencies to install on Linux or Mac
```

MFA can be installed in develop mode via:
35 changes: 26 additions & 9 deletions docs/source/_static/css/style.css
@@ -12,12 +12,26 @@
white-space: normal;
}



a.external::after{
content: " \f35d";
font-size: 0.75em;
text-align: center;
vertical-align: middle;
padding-bottom: 0.45em;
}

:root {
--base-blue: 0, 53, 102;
--dark-blue: 0, 29, 61;
--light-blue: 14, 99, 179;
--base-yellow: 255, 195, 0;
--light-yellow: 255, 214, 10;
--sd-color-primary: #003566;
--sd-color-dark: #003566;
--sd-color-primary-text: #FFC300;
--sd-color-primary-highlight: #FFC300;
--pst-color-primary: var(--base-blue);
--pst-color-warning: var(--light-yellow);
--pst-color-info: var(--light-blue);
@@ -41,21 +41,21 @@
--pst-color-toc-link-hover: var(--pst-color-hover-navigation);
--pst-color-toc-link-active: var(--pst-color-active-navigation);
}
.btn-navigation{
background-color: #0E63B3;
border-color: #0E63B3;

.sd-btn-primary{
font-weight: bold;
}
.btn-navigation:hover {
background-color: #FFC300;
border-color: #FFC300;
color: #000814;

.sd-btn-primary:hover{
color: #003566 !important;
}
.i-navigation{
color: #003566;
padding: 20px;
}
.i-navigation:hover {
color: #FFC300;

.navbar-light .navbar-nav li a.nav-link {
font-size: 1.15em;
}

.rst-table-cell{
@@ -65,6 +79,9 @@ display: inline-block;
text-align: center;

}
div[class*="highlight-"] {
text-align: left;
}

.supported {
background-color: #E9F6EC;
6 changes: 3 additions & 3 deletions docs/source/_static/interrogate_badge.svg
38 changes: 33 additions & 5 deletions docs/source/changelog/changelog_2.0.rst
@@ -5,10 +5,33 @@
2.0 Changelog
*************

-.. _2.0b:
+.. _2.0r:

-Beta releases
-=============
+Release candidates
+==================

2.0.0rc4
--------

- Added ``--quiet`` flag to suppress printing output to the console
- Added ability to specify ``pronunciation_probabilities`` in training blocks, where probabilities of pronunciation variants and of their appearing before/after silence are calculated based on the alignment at that stage. The lexicon files are then regenerated with these probabilities for later training blocks
- Added a flag to export per-pronunciation silence probabilities to :ref:`training_dictionary`
- Added a flag to :ref:`transcribing` for specifying the language model weight and word insertion penalties to speed up evaluation of transcripts
- Added a final SAT training block equivalent to the :kaldi_steps:`train_quick` script
- Added early stopping of SAT training blocks if the corpus size is below the specified subset (at least two rounds of SAT training will be performed)
- Refactored how transcription parsing is done, so that you can specify word break characters other than whitespace (e.g., instances of ``.`` or ``?`` embedded in words as typos in the corpus)
- Refactored quotations and clitic markers, so that if there happens to be a word like ``kid'``, MFA can recover the word ``kid`` from it. If there is no entry for ``kid``, or ``kid'`` itself is in the dictionary, the apostrophe will be kept.
- Refactored the ``--test_transcription`` functionality of :ref:`validating_data` to use small language models built from all transcripts of a speaker, mixed with an even smaller language model per utterance, following :kaldi_steps:`cleanup/make_biased_lm_graphs`.
- Refactored how internal storage is done to use an SQLite database rather than keeping everything in memory, so bigger corpora should not need as much memory when aligning/training
- Fixed an issue in lexicon construction where explicit silences were not being respected
- Fixed an issue in training where initial Gaussians were not being properly used
- Changed the behavior of assigning speakers to jobs, so that it now tries to balance the number of utterances across jobs
- Changed the default topology to allow for more variable length phones (minimum duration is now one frame, 10ms by default)
- Changed how models and dictionaries are downloaded, following the changes to `MFA Models <https://mfa-models.readthedocs.io/>`_
- Added the ability to use pitch features for models, with the ``--use_pitch`` flag or configuration option
- Added a ``[bracketed]`` word that will capture any transcriptions like ``[wor-]`` or ``<hes->``, as these are typically restarts, hesitations, speech errors, etc., which have different characteristics from words that merely happen not to be in the dictionary. The same phone is used for both, but a separate word symbol allows silence probabilities to be modelled separately.
- Added words for ``[laugh]`` and ``[laughter]`` to capture laughter annotations as separate from both OOV ``<unk>`` items and ``[bracketed]`` words. As with ``[bracketed]``, the laughter words use the same ``spn`` phone, but allow for separate silence probabilities.
- Fixed a bug where models trained in earlier versions were not correctly reporting their phone set (:github_issue:`422`)
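The storage refactor described above (corpus data in an SQLite database rather than all in memory) can be sketched as follows; the table layout and column names here are hypothetical illustrations, not MFA's actual schema:

```python
import sqlite3

# Hypothetical schema: one row per utterance, keyed to its speaker.
conn = sqlite3.connect(":memory:")  # MFA would use an on-disk database file
conn.execute(
    "CREATE TABLE utterance (id INTEGER PRIMARY KEY, speaker TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO utterance (speaker, text) VALUES (?, ?)",
    [("spk1", "hello world"), ("spk1", "good morning"), ("spk2", "thanks")],
)
conn.commit()

# Only the rows needed for the current job are pulled into memory,
# so memory use no longer scales with total corpus size.
rows = conn.execute(
    "SELECT text FROM utterance WHERE speaker = ?", ("spk1",)
).fetchall()
print([r[0] for r in rows])  # → ['hello world', 'good morning']
```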

2.0.0rc3
--------
@@ -32,6 +55,11 @@ Beta releases
- Added file listing average per-frame log-likelihoods by utterance for alignment
- Fixed a bug where having "<s>" in a transcript would cause MFA to crash

.. _2.0b:

Beta releases
=============

2.0.0b11
--------

@@ -40,7 +68,7 @@ Beta releases
- Added better progress bars for corpus loading, acoustic modeling, G2P training, transcription and alignment
- Changed the default behavior of G2P generation to use a threshold system rather than returning a single top pronunciation. The threshold defaults to 0.99, but can be specified through ``--g2p_threshold``. Specifying number of pronunciations will override this behavior (use ``--num_pronunciation 1`` for the old behavior).
- Changed the behavior of G2P evaluation to check whether the generated hypothesis is in the golden pronunciation set, so languages with pronunciation variation will be less penalized in evaluation
-- Added :class:`~montreal_forced_aligner.data.Word` and :class:`~montreal_forced_aligner.data.Pronunciation` data classes
+- Added :class:`~montreal_forced_aligner.data.WordData` and :class:`~montreal_forced_aligner.data.Pronunciation` data classes
- Refactored and simplified TextGrid export process
- Removed the ``multilingual_ipa`` mode in favor of a more general approach to better modeling phones
- Added functionality to evaluate alignments against golden alignment set
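The ``--g2p_threshold`` behavior described above can be sketched as relative-probability pruning of candidate pronunciations; this is an illustration under that assumption, not MFA's actual implementation:

```python
def prune_pronunciations(candidates, threshold=0.99):
    """Keep pronunciations whose probability is within `threshold` (as a
    ratio) of the best candidate. Hypothetical rule for illustration only."""
    best = max(prob for _, prob in candidates)
    return [pron for pron, prob in candidates if prob >= best * threshold]

# Two near-tied hypotheses both survive; the distant third is dropped.
candidates = [("W ER1 D", 0.60), ("W ER1 D AH0", 0.595), ("W AO1 R D", 0.20)]
print(prune_pronunciations(candidates))  # → ['W ER1 D', 'W ER1 D AH0']
```

With ``--num_pronunciations 1``-style behavior, only the single best candidate would be returned instead.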
@@ -95,7 +123,7 @@ Beta releases
- Massive refactor to a proper class-based API for interacting with MFA corpora

- Sorry, I really do hope this is the last big refactor of 2.0
-- :class:`~montreal_forced_aligner.corpus.classes.Speaker`, :class:`~montreal_forced_aligner.corpus.classes.File`, and :class:`~montreal_forced_aligner.corpus.classes.Utterance` have dedicated classes rather than having their information split across dictionaries mimicking Kaldi files, so they should be more useful for interacting with outside of MFA
+- :class:`~montreal_forced_aligner.corpus.classes.Speaker`, :class:`~montreal_forced_aligner.corpus.classes.FileData`, and :class:`~montreal_forced_aligner.corpus.classes.UtteranceData` have dedicated classes rather than having their information split across dictionaries mimicking Kaldi files, so they should be more useful for interacting with outside of MFA
- Added :class:`~montreal_forced_aligner.corpus.multiprocessing.Job` class as well to make it easier to generate and keep track of information about different processes
- Updated installation style to be more dependent on conda-forge packages

33 changes: 26 additions & 7 deletions docs/source/conf.py
@@ -47,23 +47,44 @@
"external_links",
# "numpydoc",
"sphinx.ext.napoleon",
-"sphinx_panels",
+"sphinx_design",
"sphinx.ext.viewcode",
"sphinxcontrib.autoprogram",
"sphinxemoji.sphinxemoji",
# "sphinx_autodoc_typehints",
]
panels_add_bootstrap_css = False
intersphinx_mapping = {
"sqlalchemy": ("https://docs.sqlalchemy.org/en/14/", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"python": ("https://docs.python.org/3", None),
"Bio": ("https://biopython.org/docs/latest/api/", None),
}


extlinks = {
"mfa_pr": ("https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/pull/%s", "PR #%s"),
}

xref_links = {
"mfa_models": ("MFA Models", "https://mfa-models.readthedocs.io/"),
"anchor": ("Anchor Annotator", "https://anchor-annotator.readthedocs.io/en/latest/"),
"pretrained_acoustic_models": (
"MFA acoustic models",
"https://mfa-models.readthedocs.io/en/latest/acoustic/index.html",
),
"pretrained_dictionaries": (
"MFA dictionaries",
"https://mfa-models.readthedocs.io/en/latest/dictionary/index.html",
),
"pretrained_g2p": (
"MFA G2P models",
"https://mfa-models.readthedocs.io/en/latest/g2p/index.html",
),
"pretrained_language_models": (
"MFA language models",
"https://mfa-models.readthedocs.io/en/latest/language_model/index.html",
),
"mfa_mailing_list": ("MFA mailing list", "https://groups.google.com/g/mfa-users"),
"mfa_github": (
"MFA GitHub Repo",
@@ -123,6 +144,10 @@
"MFA-reorganization-scripts repository",
"https://github.com/MontrealCorpusTools/MFA-reorganization-scripts",
),
"corpus_creation_scripts": (
"@mmcauliffe's corpus creation scripts",
"https://github.com/mmcauliffe/corpus-creation-scripts",
),
}

# -----------------------------------------------------------------------------
@@ -139,12 +164,9 @@
"MultispeakerDictionary": "montreal_forced_aligner.dictionary.MultispeakerDictionary",
"Trainer": "montreal_forced_aligner.abc.Trainer",
"Aligner": "montreal_forced_aligner.abc.Aligner",
-"Utterance": "montreal_forced_aligner.corpus.classes.Utterance",
-"File": "montreal_forced_aligner.corpus.classes.File",
"FeatureConfig": "montreal_forced_aligner.config.FeatureConfig",
"multiprocessing.context.Process": "multiprocessing.Process",
"mp.Process": "multiprocessing.Process",
-"Speaker": "montreal_forced_aligner.corpus.classes.Speaker",
"Namespace": "argparse.Namespace",
"MetaDict": "dict[str, Any]",
}
@@ -246,11 +268,8 @@
("py:class", "CtmErrorDict"),
("py:class", "kwargs"),
("py:class", "Labels"),
-("py:class", "ScpType"),
("py:class", "multiprocessing.Value"),
("py:class", "praatio.utilities.constants.Interval"),
-("py:class", "CorpusMappingType"),
-("py:class", "DictionaryEntryType"),
("py:class", "montreal_forced_aligner.abc.MetaDict"),
("py:class", "multiprocessing.context.Process"),
]
35 changes: 34 additions & 1 deletion docs/source/external_links.py
@@ -52,6 +52,22 @@ def model_role(
return [pnode], []


def github_issue_role(
typ: str,
rawtext: str,
text: str,
lineno: int,
inliner: Inliner,
options: dict = None,
content: List[str] = None,
) -> Tuple[List[Node], List[system_message]]:
text = utils.unescape(text)
full_url = f"https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner/issues/{text}"
title = f"GitHub #{text}"
pnode = nodes.reference(title, title, internal=False, refuri=full_url)
return [pnode], []
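Outside of Sphinx, the link text and target the new ``github_issue`` role produces can be checked in isolation; this snippet simply mirrors the f-strings above and is not part of the commit:

```python
def github_issue_link(issue_number: str) -> tuple:
    """Build the (title, url) pair the github_issue role renders."""
    url = (
        "https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner"
        f"/issues/{issue_number}"
    )
    title = f"GitHub #{issue_number}"
    return title, url

# e.g. the :github_issue:`422` reference in the changelog above
title, url = github_issue_link("422")
print(title)  # → GitHub #422
```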


def kaldi_steps_role(
typ: str,
rawtext: str,
@@ -139,6 +155,22 @@ def openfst_src_role(
return [pnode], []


def ngram_src_role(
typ: str,
rawtext: str,
text: str,
lineno: int,
inliner: Inliner,
options: dict = None,
content: List[str] = None,
) -> Tuple[List[Node], List[system_message]]:
text = utils.unescape(text)
full_url = f"https://www.opengrm.org/doxygen/ngram/html/{text}-main_8cc_source.html"
title = f"OpenGrm NGram {text} source"
pnode = nodes.reference(title, title, internal=False, refuri=full_url)
return [pnode], []


def kaldi_src_role(
typ: str,
rawtext: str,
@@ -411,12 +443,13 @@ def get_refs(app):

def setup(app: Sphinx) -> Dict[str, Any]:
app.add_config_value("xref_links", {}, "env")
app.add_role("mfa_model", model_role)
app.add_role("github_issue", github_issue_role)
app.add_role("kaldi_steps", kaldi_steps_role)
app.add_role("kaldi_utils", kaldi_utils_role)
app.add_role("kaldi_steps_sid", kaldi_steps_sid_role)
app.add_role("kaldi_src", kaldi_src_role)
app.add_role("openfst_src", openfst_src_role)
app.add_role("ngram_src", ngram_src_role)
app.add_role("kaldi_docs", kaldi_docs_role)
app.add_role("xref", xref)
app.connect("builder-inited", get_refs)
6 changes: 3 additions & 3 deletions docs/source/first_steps/example.rst
@@ -33,7 +33,7 @@ Example 1: Aligning LibriSpeech (English)
Set up
------

-1. Ensure you have installed MFA via :ref:`installation_ref`.
+1. Ensure you have installed MFA via :ref:`installation`.
2. Ensure you have downloaded the pretrained model via :code:`mfa model download acoustic english`
3. Download the prepared LibriSpeech dataset (`LibriSpeech data set`_) and extract it somewhere on your computer
4. Download the LibriSpeech lexicon (`LibriSpeech lexicon`_) and save it somewhere on your computer
@@ -69,7 +69,7 @@ Example 2: Generate Mandarin dictionary
Set up
------

-1. Ensure you have installed MFA via :ref:`installation_ref`.
+1. Ensure you have installed MFA via :ref:`installation`.
2. Ensure you have downloaded the pretrained model via :code:`mfa model download g2p mandarin_pinyin_g2p`
3. Download the prepared Mandarin dataset from (`example Mandarin corpus`_) and extract it somewhere on your computer

@@ -102,7 +102,7 @@ Example 3: Train Mandarin G2P model
Set up
------

-1. Ensure you have installed MFA via :ref:`installation_ref`.
+1. Ensure you have installed MFA via :ref:`installation`.
2. Download the prepared Mandarin dictionary from (`example Mandarin dictionary`_)

In the same environment that you've installed MFA, enter the following command into the terminal: