Export refactor changes (MontrealCorpusTools#335)
mmcauliffe authored Oct 1, 2021
1 parent 0196c42 commit 85f75c0
Showing 113 changed files with 38,426 additions and 4,091 deletions.
6 changes: 6 additions & 0 deletions .coveragerc
@@ -18,6 +18,12 @@ exclude_lines =

except ImportError:

except KeyboardInterrupt:

except Exception as e:

except Exception:

if call_back
if stop_check

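The ``exclude_lines`` entries above are regular expressions, and coverage.py leaves out of its report any source line that contains a match for one of them. The minimal Python sketch below approximates that line check with the standard ``re`` module, purely for illustration; it is not coverage.py's own code, and the sample lines are invented.

```python
import re

# Patterns copied from the exclude_lines hunk of .coveragerc above.
EXCLUDE_PATTERNS = [
    r"except ImportError:",
    r"except KeyboardInterrupt:",
    r"except Exception as e:",
    r"except Exception:",
    r"if call_back",
    r"if stop_check",
]

# Combine the patterns into one alternation so each line is searched once.
EXCLUDE_RE = re.compile("|".join(f"(?:{p})" for p in EXCLUDE_PATTERNS))


def is_excluded(line: str) -> bool:
    """Return True if the line contains a match for any exclude pattern."""
    return EXCLUDE_RE.search(line) is not None


if __name__ == "__main__":
    for sample in [
        "    if call_back is not None:",
        "        except KeyboardInterrupt:",
        "    total += 1",
    ]:
        print(is_excluded(sample), repr(sample))
```

In coverage.py itself, a match on a line that opens a block (such as an ``except`` clause) excludes the whole block, which this sketch does not model.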
2 changes: 1 addition & 1 deletion .travis.yml
@@ -36,7 +36,7 @@ install:
- which python
- which sox
- conda list
- python -m montreal_forced_aligner.command_line.thirdparty download
- python -m montreal_forced_aligner.command_line.mfa thirdparty download
- ls $HOME/Documents/MFA/thirdparty/bin -al
- $HOME/Documents/MFA/thirdparty/bin/compute-mfcc-feats --help
- $HOME/Documents/MFA/thirdparty/bin/ivector-extractor-est --help
Binary file removed docs/source/_static/dictionary_annotation.png
Binary file removed docs/source/_static/speaker_annotation.png
132 changes: 14 additions & 118 deletions docs/source/annotator.rst
@@ -1,134 +1,30 @@

.. _`LAV filters`: https://github.com/Nevcairiel/LAVFilters/releases
.. _`Anchor Annotator documentation`: https://anchor-annotator.readthedocs.io/en/latest/

.. _annotator:

*********
Annotator
*********
****************
Anchor annotator
****************

The Anchor Annotator is a GUI utility for MFA that allows users to modify transcripts and add or change entries in the pronunciation dictionary to interactively fix out-of-vocabulary issues.

.. attention::

The GUI annotator is under development and is currently pre-alpha. Use at your own risk and please use version control
Anchor is under development and is currently pre-alpha. Use at your own risk and please use version control
or back up any critical data.

Currently the functionality of the Annotator GUI allows for users to modify transcripts and add/change
entries in the pronunciation dictionary to interactively fix out of vocabulary issues.

.. warning::

If you are trying to use the annotator from Windows, note that some issues will be present as native Windows use is not
fully supported. Specifically if you need G2P functionality, that does not function on Windows due to its dependencies
not being available (Pynini, Opengrm-ngram, OpenFst).

To use the annotator, first follow the instructions in :ref:`installation`. Once MFA is installed and thirdparty binaries
have been downloaded, run the following command:

.. code-block:: bash
mfa annotator
Initial setup
=============

To load a corpus for inspection, go to the Corpus drop down menu and select "Load a corpus". Navigate
to the desired corpus directory. Please note that it should follow one of the data formats outlined in :ref:`data_format`.

.. note::

Some setup of system codecs may be necessary to play back those types of files. For Windows, `LAV filters`_ has been
tested to work with :code:`.flac` files.

Next, dictionary files and G2P models should be loaded via their respective menus. If any pretrained
models have been installed via :ref:`pretrained_models`, these can be selected directly.

Fixing out of vocabulary issues
===============================

Once the corpus is loaded with a dictionary, utterances in the corpus will be parsed for whether they contain
an out of vocabulary (OOV) word. If they do, they will be marked in that column on the left with a red cell
(see number :code:`2` below).
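The OOV flag described above amounts to checking each transcript word against the pronunciation dictionary. The Python sketch below is a minimal illustration under stated assumptions (whitespace tokenization, lowercase dictionary entries, hypothetical helper names ``find_oovs`` and ``flag_utterances``); it is not the annotator's own parsing, which also has to deal with punctuation and clitics.

```python
from typing import Dict, List, Set


def find_oovs(transcript: str, dictionary_words: Set[str]) -> List[str]:
    """Return the distinct words in a transcript that the dictionary lacks.

    Assumes whitespace tokenization and lowercase dictionary entries.
    """
    oovs: List[str] = []
    for word in transcript.lower().split():
        if word not in dictionary_words and word not in oovs:
            oovs.append(word)
    return oovs


def flag_utterances(utterances: Dict[str, str], dictionary_words: Set[str]) -> Dict[str, bool]:
    """Map each utterance name to True if its transcript has any OOV word."""
    return {name: bool(find_oovs(text, dictionary_words)) for name, text in utterances.items()}


if __name__ == "__main__":
    words = {"the", "cat", "sat"}
    utts = {"utt1": "The cat sat", "utt2": "The dog sat"}
    print(flag_utterances(utts, words))  # {'utt1': False, 'utt2': True}
    print(find_oovs("The dog sat", words))  # ['dog']
```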

To fix a transcript, click on the utterance in the table. This will bring up a detail view of the utterance,
with a waveform window above and the transcript in the text field. Clicking the ``Play`` button (or ``Tab`` by default)
will allow you to listen to the audio. Pressing the ``Save current file`` button (see number :code:`10` below) will save the
utterance text to the .lab/.txt file or update the interval in the TextGrid.

.. warning::
To use the annotator, first install the anchor subpackage:

Clicking ``Save`` will overwrite the source file loaded, so use this software with caution.
Backing up your data and/or using version control is recommended to ensure that any data loss
during corpus creation is minimized.
.. code-block::
If the word causing the OOV warning is in fact a word you would like aligned, you can right click on
the word and select ``Add pronunciation for 'X'`` if a G2P model is loaded (see number :code:`7` below). This will run the G2P
model to generate a pronunciation in the dictionary which can then be modified if necessary and the dictionary
can be saved via the ``Save dictionary`` button. You can also look up any word in the pronunciation
dictionary by right clicking and selecting ``Look up 'X' in dictionary``. Any pronunciation can be modified
and saved. The ``Reset dictionary`` button will discard any changes made to the dictionary.
pip install montreal-forced-aligner[anchor]
Fixing segments
===============

.. figure:: _static/dictionary_annotation.png
:align: center
:alt: Image cannot be displayed in your browser

The file you want to fix up can be selected via the dropdown in the top left (number :code:`1` above).

To fix up intervals, you can select segments in the left table (number :code:`2` above) or click on
intervals in the plot window (i.e., number :code:`5` above).
You can edit the text in the center bottom box (number :code:`6` above), change the speaker via the dropdown next to the
text box (number :code:`12` below), and adjust
boundaries as necessary (green lines associated with number :code:`4` below). If you would like to add a new speaker,
then it can be accessed via the :code:`Speaker` tab
on the right pane, which will also list counts of utterances (see :code:`13` below). Entering a speaker name and clicking
"Add speaker" (:code:`14` below) will make that speaker available in the dropdown.

Single segments can be split via a keyboard shortcut (by default :code:`Ctrl+S`, but this can be changed, see
:ref:`configure_annotator` for more details). This will create two segments from one, split at the midpoint, but with all
the text in the first segment.
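The split behaviour described above (one segment becomes two at the midpoint, with all of the text kept in the first) can be sketched in Python as follows. The ``Segment`` record and ``split_segment`` function are hypothetical illustrations with times in seconds, not Anchor's own data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Segment:
    """Hypothetical utterance segment: speaker, times in seconds, transcript."""
    speaker: str
    begin: float
    end: float
    text: str


def split_segment(seg: Segment) -> Tuple[Segment, Segment]:
    """Split a segment at its midpoint, keeping all text in the first half."""
    midpoint = (seg.begin + seg.end) / 2
    first = Segment(seg.speaker, seg.begin, midpoint, seg.text)
    second = Segment(seg.speaker, midpoint, seg.end, "")
    return first, second


if __name__ == "__main__":
    a, b = split_segment(Segment("spk1", 1.0, 3.0, "hello world"))
    print(a)  # Segment(speaker='spk1', begin=1.0, end=2.0, text='hello world')
    print(b)  # Segment(speaker='spk1', begin=2.0, end=3.0, text='')
```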

Multiple segments can be selected by holding :code:`Ctrl` (with selections shown in the left pane, though not in the waveform panel),
and can be merged into single
segments via a keyboard shortcut (by default :code:`Ctrl+M`, but this can be changed, see :ref:`configure_annotator`
for more details). Any number of segments can be selected this way, and the resulting merged segment will concatenate
the transcriptions for them all. In general, be cautious about creating overly long utterances, as alignment
generally performs better on shorter utterances, and breath pauses often make good segment boundaries if
they're visible on the waveform.
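Merging, as described above, produces one segment that spans the selected segments and concatenates their transcriptions. The Python sketch below reuses a hypothetical ``Segment`` record and assumes two things the text does not specify: transcripts are joined in time order, and the merged segment takes the speaker of the earliest segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical utterance segment: speaker, times in seconds, transcript."""
    speaker: str
    begin: float
    end: float
    text: str


def merge_segments(segments: List[Segment]) -> Segment:
    """Merge segments into one spanning all of them, joining text in time order."""
    if not segments:
        raise ValueError("need at least one segment to merge")
    ordered = sorted(segments, key=lambda s: s.begin)
    text = " ".join(s.text for s in ordered if s.text)
    end = max(s.end for s in ordered)
    # Assumption: the earliest segment's speaker is kept for the result.
    return Segment(ordered[0].speaker, ordered[0].begin, end, text)


if __name__ == "__main__":
    merged = merge_segments([
        Segment("spk1", 2.0, 3.0, "world"),
        Segment("spk1", 0.5, 1.5, "hello"),
    ])
    print(merged)  # Segment(speaker='spk1', begin=0.5, end=3.0, text='hello world')
```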

.. figure:: _static/speaker_annotation.png
:align: center
:alt: Image cannot be displayed in your browser

Segments can be added by double clicking on a speaker's tier (i.e., number :code:`11`); however, this is disabled if a
segment already exists at that point. Any segment can also be deleted via a shortcut (by default :code:`Delete`). There is limited
restore functionality for deleted utterances, via a button on the bottom left.


.. _configure_annotator:

Configuring the annotator
=========================

By going to :code:`Preferences` in the :code:`Edit` menu, many aspects of the interface can be changed. The two primary
customizations currently implemented are for the appearance of the waveform/segment window and for keyboard shortcuts.

The current available shortcuts are:

.. csv-table::
:header: "Function", "Default keybind"

"Play audio", "Tab"
"Zoom in", "Ctrl+I"
"Zoom out", "Ctrl+O"
"Pan left", "Left arrow"
"Pan right", "Right arrow"
"Merge utterances", "Ctrl+M"
"Split utterances", "Ctrl+S"
"Delete utterances", "Del"
"Save current file", "By default not bound, but can be set"
"Create new segment", "Double click (currently not rebindable)"
This will install MFA (if it hasn't been installed already) along with all the packages that Anchor requires. Once installed, Anchor can be started with the following MFA subcommand:

.. code-block:: bash
mfa anchor
See the `Anchor Annotator documentation`_ for more information.
3 changes: 0 additions & 3 deletions docs/source/apireference.rst
@@ -111,9 +111,6 @@ Feature processing API
:template: function.rst

mfcc
apply_cmvn
add_deltas
apply_lda

.. _multiprocessing_api:

14 changes: 14 additions & 0 deletions docs/source/changelog.rst
@@ -7,6 +7,20 @@
Changelog
=========

2.0.0b0
-------

Beta release!

- Fixed an issue in transcription when using a .ARPA language model rather than one built in MFA
- Fixed an issue in parsing filenames containing spaces
- Added a ``mfa configure`` command to set global options. Users can now specify a new default for arguments like ``--num_jobs``, ``--clean`` or ``--temp_directory``; see :ref:`configuration` for more details.
- Added a new flag for overwriting output files. By default, MFA will now not overwrite output files that already exist and will instead write them to a directory in the temporary directory. You can revert this change by running ``mfa configure --always_overwrite``.
- Added a ``--disable_textgrid_cleanup`` flag to disable the post-processing that MFA has recently implemented (not outputting silence labels and recombining subwords that got split up as part of dictionary lookup). You can set this to be the default by running ``mfa configure --disable_textgrid_cleanup``.
- Refactored and optimized the TextGrid export process to use multiple processes by default, so you should see significant speed-ups.
- Removed shorthand flags for ``-c`` and ``-d`` since they could represent multiple different flags/arguments.


2.0.0a24
--------

5 changes: 4 additions & 1 deletion docs/source/commands.rst
@@ -1,3 +1,5 @@


.. _commands:

********
@@ -40,7 +42,7 @@ Corpus creation
"create_segments", "Use voice activity detection to create segments", :ref:`create_segments`
"train_ivector", "Train an ivector extractor for speaker classification", :ref:`train_ivector`
"classify_speakers", "Use ivector extractor to classify files or cluster them", :ref:`classify_speakers`
"annotator", "Run a GUI annotator program for editing and managing corpora", :ref:`annotator`
"anchor", "Run the Anchor annotator utility (if installed) for editing and managing corpora", :ref:`annotator`


Other utilities
@@ -52,6 +54,7 @@ Other utilities

"download", "Download a model trained by MFA developers", :ref:`pretrained_models`
"thirdparty", "Download and validate new third party binaries", :ref:`installation`
"configure", "Configure MFA to use customized defaults for command line arguments", :ref:`configuration`


Grapheme-to-phoneme
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -23,7 +23,8 @@
import mock

MOCK_MODULES = ['textgrid', 'textgrid.textgrid',
'praatio', 'praatio.tgio',
'praatio', 'praatio.tgio', 'praatio.utilities',
'praatio.utilities.constants',
'tqdm', 'yaml',
'numpy', 'resampy', 'audioread',
'scipy', 'scipy.signal', 'scipy.io',
76 changes: 74 additions & 2 deletions docs/source/configuration.rst
@@ -5,10 +5,82 @@
Configuration
*************

Contents:
Global configuration for MFA can be updated via the ``mfa configure`` subcommand. Once the command is called with a flag, it will set a default value for all future runs (though you can override most settings when calling other commands).
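The precedence described above (stored defaults apply unless a command is given its own value) can be pictured as a simple dictionary merge. The Python sketch below is hypothetical: ``resolve_options`` and the option keys are illustrative, and it does not reflect how or where MFA actually stores its global configuration.

```python
from typing import Any, Dict, Optional


def resolve_options(
    global_defaults: Dict[str, Any], cli_args: Dict[str, Optional[Any]]
) -> Dict[str, Any]:
    """Merge stored defaults with per-command arguments.

    A value of None means the flag was not given on the command line, so the
    stored default is kept; any other value overrides it.
    """
    resolved = dict(global_defaults)
    for key, value in cli_args.items():
        if value is not None:
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    # Hypothetical defaults, as if saved earlier by ``mfa configure``.
    defaults = {"num_jobs": 8, "clean": False, "temp_directory": "/data/mfa_tmp"}
    # Hypothetical arguments for a single run: only --num_jobs was passed.
    args = {"num_jobs": 2, "clean": None}
    print(resolve_options(defaults, args))
    # {'num_jobs': 2, 'clean': False, 'temp_directory': '/data/mfa_tmp'}
```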

Options available:

.. option:: -t
--temp_directory

Set the default temporary directory

.. option:: -j
--num_jobs

Set the number of processes to use by default

.. option:: --always_clean

Always remove files from previous runs by default

.. option:: --never_clean

Don't remove files from previous runs by default

.. option:: --always_verbose

Default to verbose output (outputs debug messages)

.. option:: --never_verbose

Default to non-verbose output

.. option:: --always_debug

Default to running debugging steps

.. option:: --never_debug

Default to not running debugging steps

.. option:: --always_overwrite

Always overwrite output files

.. option:: --never_overwrite

Never overwrite output files (if file already exists, the output will be saved in the temp directory)

.. option:: --disable_mp

Disable all multiprocessing (not recommended as it will usually increase processing times)

.. option:: --enable_mp

Enable multiprocessing (recommended and enabled by default)

.. option:: --disable_textgrid_cleanup

Disable postprocessing of TextGrids that cleans up silences and recombines compound words and clitics

.. option:: --enable_textgrid_cleanup

Enable postprocessing of TextGrids that cleans up silences and recombines compound words and clitics

.. option:: -h
--help

Display help message for the command

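The ``--always_overwrite`` and ``--never_overwrite`` options above decide whether an existing output file is replaced or the output is redirected into the temporary directory. The Python sketch below illustrates that decision under assumptions: ``choose_output_path`` is a hypothetical helper, and because the exact subdirectory MFA uses inside the temporary directory is not documented here, the sketch simply places the file name directly under it.

```python
import os


def choose_output_path(output_path: str, temp_directory: str, overwrite: bool) -> str:
    """Return where an output file should be written.

    If overwriting is allowed, or nothing exists at the target yet, write to
    the requested path; otherwise redirect into the temporary directory.
    """
    if overwrite or not os.path.exists(output_path):
        return output_path
    # Assumption: the real subdirectory layout under the temp directory differs.
    return os.path.join(temp_directory, os.path.basename(output_path))
```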


Configuration of commands
=========================

.. toctree::
:maxdepth: 3
:maxdepth: 1

configuration_align.rst
configuration_transcription.rst
6 changes: 3 additions & 3 deletions docs/source/configuration_align.rst
@@ -196,7 +196,7 @@ Default training config file
- sat:
num_leaves: 2500
max_gaussians: 15000
fmllr_power: 0.2
power: 0.2
silence_weight: 0.0
fmllr_update_type: "diag"
subset: 10000
@@ -206,7 +206,7 @@ Default training config file
- sat:
num_leaves: 4200
max_gaussians: 40000
fmllr_power: 0.2
power: 0.2
silence_weight: 0.0
fmllr_update_type: "diag"
subset: 30000
@@ -246,7 +246,7 @@ Training configuration for 1.0
- sat:
num_leaves: 3100
max_gaussians: 50000
fmllr_power: 0.2
power: 0.2
silence_weight: 0.0
cluster_threshold: 100
fmllr_update_type: "full"
2 changes: 1 addition & 1 deletion montreal_forced_aligner/__init__.py
@@ -1,6 +1,6 @@
__ver_major__ = 2
__ver_minor__ = 0
__ver_patch__ = '0a24'
__ver_patch__ = '0b0'
__version__ = "{}.{}.{}".format(__ver_major__, __ver_minor__, __ver_patch__)

__all__ = ['aligner', 'command_line', 'models', 'corpus', 'config', 'dictionary', 'exceptions',