Updates for 2.0.0a3
See changelog
mmcauliffe authored Feb 2, 2021
1 parent adc05e2 commit f29c6bc
Showing 101 changed files with 6,834 additions and 4,443 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -7,7 +7,7 @@ branches:
notifications:
  email: false

-dist: xenial
+dist: bionic

addons:
  apt:
10 changes: 10 additions & 0 deletions docs/source/changelog.rst
@@ -5,6 +5,15 @@
Changelog
=========

2.0.0a3
-------

- Further optimized corpus parsing algorithm to use multiprocessing and to load from saved files in temporary directories
- Revamped and fixed training using subsets of the corpora
- Fixed issue with training LDA systems
- Fixed a long-standing issue with words being marked as OOV due to improperly parsing clitics
- Updated logging to better capture errors raised by Kaldi binaries, making it easier to locate the source of issues

2.0.0
-----

@@ -17,6 +26,7 @@ Currently under development with major changes, see :ref:`whats_new_2_0`.
performance. This change should result in faster speaker adaptation.
- Optimized corpus parsing algorithm to be O(n log n) instead of O(n^2) (`PR #194`_)


1.1.0
-----

69 changes: 69 additions & 0 deletions docs/source/classify_speakers.rst
@@ -0,0 +1,69 @@
.. _classify_speakers:

**********************
Speaker classification
**********************

The Montreal Forced Aligner can use trained ivector models (see :ref:`train_ivector` for more information about training
these models) to classify or cluster utterances according to speakers.

Steps to classify speakers:


1. Ensure that the steps in :ref:`installation` have been completed and that you are in the same Conda/virtual
   environment that MFA was installed in.
2. Run the following command, substituting the arguments with your own paths:

   .. code-block:: bash

      mfa classify_speakers corpus_directory ivector_extractor_path output_directory

If the input uses TextGrids, the output TextGrids will have utterances sorted into tiers by each identified speaker. At
the moment, there is no way to retrain the classifier based on new data.

If the files in the input corpus directory do not have associated TextGrids, the speaker classifier will output
speaker directories with a text file that contains all the utterances that were classified.

Options available:

.. option:: -h
               --help

  Display help message for the command

.. option:: -t DIRECTORY
               --temp_directory DIRECTORY

  Temporary directory root to use for aligning, default is ``~/Documents/MFA``

.. option:: -j NUMBER
               --num_jobs NUMBER

  Number of jobs to use; defaults to 3, set higher if you have more
  processors available and would like to process faster

.. option:: -s NUMBER
               --num_speakers NUMBER

  Number of speakers to return. If ``--cluster`` is present, this specifies the number of clusters. Otherwise,
  MFA sorts speakers according to the first-pass classification, takes the top X speakers, and reclassifies
  the utterances using only those speakers.

.. option:: --cluster

  MFA will perform clustering of utterance ivectors into the number of speakers specified by ``--num_speakers``

.. option:: -v
               --verbose

  The aligner will print out more information if present

.. option:: -d
               --debug

  The aligner will run in debug mode

.. option:: -c
               --clean

  Forces removal of temporary files in ``~/Documents/MFA``
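
The two-pass behaviour described for ``--num_speakers`` (without ``--cluster``) can be sketched roughly as follows. This is only an illustration, not MFA's actual implementation: the function name ``reclassify_top_speakers``, the shape of the ``scores`` mapping, and the choice to rank speakers by how many utterances they win in the first pass are all assumptions made for the example.

```python
from collections import Counter


def reclassify_top_speakers(scores, num_speakers):
    """Hypothetical sketch of a two-pass speaker classification.

    scores: dict mapping utterance -> dict of speaker -> score
            (higher score means a better match; an assumed format).
    num_speakers: how many speakers to keep after the first pass.
    """
    # First pass: assign each utterance to its best-scoring speaker.
    first_pass = {utt: max(s, key=s.get) for utt, s in scores.items()}

    # Rank speakers by how many utterances they won in the first pass
    # (an assumption about what "sorting speakers" means) and keep the top ones.
    counts = Counter(first_pass.values())
    top = [spk for spk, _ in counts.most_common(num_speakers)]

    # Second pass: reassign every utterance, considering only the kept speakers.
    return {
        utt: max(top, key=lambda spk: s.get(spk, float("-inf")))
        for utt, s in scores.items()
    }


example_scores = {
    "utt1": {"A": 0.9, "B": 0.1, "C": 0.0},
    "utt2": {"A": 0.8, "B": 0.15, "C": 0.05},
    "utt3": {"A": 0.2, "B": 0.7, "C": 0.1},
    "utt4": {"A": 0.3, "B": 0.6, "C": 0.1},
    "utt5": {"A": 0.35, "B": 0.25, "C": 0.4},
}
result = reclassify_top_speakers(example_scores, 2)
print(result)
```

In this toy data, speaker ``C`` wins only one utterance in the first pass, so with two speakers kept, that utterance is reassigned to whichever of ``A`` or ``B`` scores higher for it.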
18 changes: 15 additions & 3 deletions docs/source/commands.rst
@@ -17,7 +17,6 @@ Forced Alignment
"train", "Train an acoustic model and export resulting alignment", :ref:`trained_alignment`
"validate", "Validate a corpus to ensure there are no issues with the data format", :ref:`validating_data`
"train_dictionary", "Estimate pronunciation probabilities from aligning a corpus", :ref:`training_dictionary`
-"train_ivector", "Train an ivector extractor for speaker diarization", ""


Transcription
@@ -30,6 +29,19 @@ Transcription
"transcribe", "Generate transcriptions using an acoustic model, dictionary, and language model", :ref:`transcribing`
"train_lm", "Train a language model from a text corpus or from an existing language model", :ref:`training_lm`

Corpus creation
===============

.. csv-table::
   :header: "Command", "Description", "Link"
   :widths: 10, 110, 40

   "create_segments", "Use voice activity detection to create segments", :ref:`create_segments`
   "train_ivector", "Train an ivector extractor for speaker classification", :ref:`train_ivector`
   "classify_speakers", "Use ivector extractor to classify files or cluster them", :ref:`classify_speakers`
   "annotator", "Run a GUI annotator program for editing and managing corpora", :ref:`annotator`


Other utilities
===============

@@ -39,7 +51,7 @@ Other utilities

"download", "Download a model trained by MFA developers", :ref:`pretrained_models`
"thirdparty", "Download and validate new third party binaries", :ref:`installation`
-"annotator", "Run a GUI annotator program for editing and managing corpora", :ref:`annotator`


Grapheme-to-phoneme
===================
@@ -49,4 +61,4 @@ Grapheme-to-phoneme
:widths: 10, 110, 40

"g2p", "Use a G2P model to generate a pronunciation dictionary", :ref:`g2p_dictionary_generating`
-"train_g2p", "Train a G2P model from a pronunciation dictionary", :ref:`g2p_model_training`
+"train_g2p", "Train a G2P model from a pronunciation dictionary", :ref:`g2p_model_training`
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -28,7 +28,8 @@
'scipy', 'scipy.signal', 'scipy.io',
'librosa', 'librosa.core.spectrum', 'matplotlib',
'soundfile',
-'pyqt5', 'pyqtgraph', 'requests', 'requests.exceptions']
+'pyqt5', 'pyqtgraph', 'requests', 'requests.exceptions',
+'sklearn', 'joblib', 'sklearn.naive_bayes']

for mod_name in MOCK_MODULES:
    sys.modules[mod_name] = mock.Mock()