Updates for 2.0.0a3

See changelog
Aditya514 · Feb 2, 2021 · f29c6bc
1 parent adc05e2
commit f29c6bc
Showing 101 changed files with 6,834 additions and 4,443 deletions.
diff --git a/.travis.yml b/.travis.yml
@@ -7,7 +7,7 @@ branches:
 notifications:
   email: false
 
-dist: xenial
+dist: bionic
 
 addons:
   apt:

diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst
@@ -5,6 +5,15 @@
 Changelog
 =========
 
+2.0.0a3
+-------
+
+- Further optimized corpus parsing algorithm to use multiprocessing and to load from saved files in temporary directories
+- Revamped and fixed training using subsets of the corpora
+- Fixed issue with training LDA systems
+- Fixed a long-standing issue with words being marked as OOV due to improperly parsing clitics
+- Updated logging to better capture errors caused by Kaldi binaries, making the sources of issues easier to locate
+
 2.0.0
 -----
 
@@ -17,6 +26,7 @@ Currently under development with major changes, see :ref:`whats_new_2_0`.
   performance.  This change should result in faster speaker adaptation.
 - Optimized corpus parsing algorithm to be O(n log n) instead of O(n^2) (`PR #194`_)
 
+
 1.1.0
 -----
 

diff --git a/docs/source/classify_speakers.rst b/docs/source/classify_speakers.rst
@@ -0,0 +1,69 @@
+.. _classify_speakers:
+
+**********************
+Speaker classification
+**********************
+
+The Montreal Forced Aligner can use trained ivector models (see :ref:`train_ivector` for more information about training
+these models) to classify or cluster utterances according to speakers.
+
+Steps to classify speakers:
+
+
+1. Complete the steps in :ref:`installation` and make sure you are in the same Conda/virtual environment that
+   MFA was installed in.
+2. Run the following command, substituting the arguments with your own paths:
+
+  .. code-block:: bash
+
+     mfa classify_speakers corpus_directory ivector_extractor_path output_directory
+
+If the input uses TextGrids, the output TextGrids will have utterances sorted into tiers by each identified speaker. At
+the moment, there is no way to retrain the classifier based on new data.
+
+If the input corpus directory does not contain TextGrids, then the speaker classifier will output
+speaker directories with a text file that contains all the utterances that were classified.
+
+Options available:
+
+.. option:: -h
+               --help
+
+  Display help message for the command
+
+.. option:: -t DIRECTORY
+               --temp_directory DIRECTORY
+
+   Temporary directory root to use for aligning, default is ``~/Documents/MFA``
+
+.. option:: -j NUMBER
+               --num_jobs NUMBER
+
+  Number of jobs to use; defaults to 3, set higher if you have more
+  processors available and would like to process faster
+
+.. option:: -s NUMBER
+               --num_speakers NUMBER
+
+  Number of speakers to return.  If ``--cluster`` is present, this specifies the number of clusters.  Otherwise,
+  MFA will sort speakers according to the first-pass classification, take the top ``NUMBER`` speakers, and reclassify
+  the utterances using only those speakers.
+
+.. option:: --cluster
+
+  MFA will perform clustering of utterance ivectors into the number of speakers specified by ``--num_speakers``
+
+.. option:: -v
+               --verbose
+
+  The aligner will print out more information if present
+
+.. option:: -d
+               --debug
+
+  The aligner will run in debug mode
+
+.. option:: -c
+               --clean
+
+  Forces removal of temporary files in ``~/Documents/MFA``
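The new classify_speakers page documents a command line only. As an illustration, here is a minimal Python sketch of how a script might assemble that command, using only the flags listed in the page above. The helper name `build_classify_speakers_command` and the example paths are hypothetical, and placing the options after the positional arguments is an assumption about how the CLI parses them.

```python
import shlex
from typing import List, Optional


def build_classify_speakers_command(
    corpus_directory: str,
    ivector_extractor_path: str,
    output_directory: str,
    num_speakers: Optional[int] = None,
    cluster: bool = False,
    num_jobs: Optional[int] = None,
) -> List[str]:
    """Build an argument list for `mfa classify_speakers` (hypothetical helper)."""
    command = [
        "mfa",
        "classify_speakers",
        corpus_directory,
        ivector_extractor_path,
        output_directory,
    ]
    # Optional flags are added only when the caller sets them.
    if num_speakers is not None:
        command += ["--num_speakers", str(num_speakers)]
    if cluster:
        command.append("--cluster")
    if num_jobs is not None:
        command += ["--num_jobs", str(num_jobs)]
    return command


# Example paths are placeholders, not files shipped with MFA.
command = build_classify_speakers_command(
    "corpus", "ivector_extractor.zip", "output", num_speakers=4, cluster=True
)
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from inside the environment where MFA is installed.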
diff --git a/docs/source/commands.rst b/docs/source/commands.rst
@@ -17,7 +17,6 @@ Forced Alignment
    "train", "Train an acoustic model and export resulting alignment", :ref:`trained_alignment`
    "validate", "Validate a corpus to ensure there are no issues with the data format", :ref:`validating_data`
    "train_dictionary", "Estimate pronunciation probabilities from aligning a corpus", :ref:`training_dictionary`
-   "train_ivector", "Train an ivector extractor for speaker diarization", ""
 
 
 Transcription
@@ -30,6 +29,19 @@ Transcription
    "transcribe", "Generate transcriptions using an acoustic model, dictionary, and language model", :ref:`transcribing`
    "train_lm", "Train a language model from a text corpus or from an existing language model", :ref:`training_lm`
 
+Corpus creation
+===============
+
+.. csv-table::
+   :header: "Command", "Description", "Link"
+   :widths: 10, 110, 40
+
+   "create_segments", "Use voice activity detection to create segments", :ref:`create_segments`
+   "train_ivector", "Train an ivector extractor for speaker classification", :ref:`train_ivector`
+   "classify_speakers", "Use ivector extractor to classify files or cluster them", :ref:`classify_speakers`
+   "annotator", "Run a GUI annotator program for editing and managing corpora", :ref:`annotator`
+
+
 Other utilities
 ===============
 
@@ -39,7 +51,7 @@ Other utilities
 
    "download", "Download a model trained by MFA developers", :ref:`pretrained_models`
    "thirdparty", "Download and validate new third party binaries", :ref:`installation`
-   "annotator", "Run a GUI annotator program for editing and managing corpora", :ref:`annotator`
+
 
 Grapheme-to-phoneme
 ===================
@@ -49,4 +61,4 @@ Grapheme-to-phoneme
    :widths: 10, 110, 40
 
    "g2p", "Use a G2P model to generate a pronunciation dictionary", :ref:`g2p_dictionary_generating`
-   "train_g2p", "Train a G2P model from a pronunciation dictionary", :ref:`g2p_model_training`
+   "train_g2p", "Train a G2P model from a pronunciation dictionary", :ref:`g2p_model_training`
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -28,7 +28,8 @@
                 'scipy', 'scipy.signal', 'scipy.io',
                 'librosa', 'librosa.core.spectrum', 'matplotlib',
                 'soundfile',
-                'pyqt5', 'pyqtgraph', 'requests', 'requests.exceptions']
+                'pyqt5', 'pyqtgraph', 'requests', 'requests.exceptions',
+                'sklearn', 'joblib', 'sklearn.naive_bayes']
 
 for mod_name in MOCK_MODULES:
     sys.modules[mod_name] = mock.Mock()
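The loop above is what lets the documentation build import MFA without the newly listed packages installed. A minimal, self-contained sketch of the same pattern follows. It uses the standard library's `unittest.mock` (the source does not show which `mock` module it imports), and the module names `heavy_dep` and `heavy_dep.submodule` are hypothetical stand-ins for entries such as 'sklearn' and 'sklearn.naive_bayes'.

```python
import sys
from unittest import mock

# Hypothetical module names standing in for dependencies that are
# not installed in the documentation build environment.
MOCK_MODULES = ["heavy_dep", "heavy_dep.submodule"]

for mod_name in MOCK_MODULES:
    # Registering a Mock under the module name makes later imports
    # find it in sys.modules instead of searching the file system.
    sys.modules[mod_name] = mock.Mock()

# These imports now resolve to Mock objects rather than raising ImportError.
import heavy_dep.submodule
from heavy_dep.submodule import SomeClassifier

print(type(SomeClassifier).__name__)
```

Both the parent package and each imported submodule need their own entry in the list, which is presumably why the diff adds 'sklearn.naive_bayes' alongside 'sklearn'.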