CHiME Utils

✅ Official data generation and data preparation scripts for CHiME-8 DASR.

We provide a more convenient standalone interface for downloading and preparing the core CHiME-8 DASR data.
This year we also support automatic downloading of CHiME-6.

⚠️ NOTE
For in-depth details about CHiME-8 DASR data and rules refer to chimechallenge.org/current/task1/data.

📩 Contact

For any issue, bug or question about this package, feel free to raise an issue here or reach us via the CHiME Slack.

Installation

We recommend creating a fresh conda environment first:

conda create --name chimeutils python=3.8
conda activate chimeutils

You can install with:

pip install git+https://github.com/chimechallenge/chime-utils

Usage

This package brings a new module:
from chime_utils import dgen, dprep, scoring, text_norm

And new CLI commands:

chime-utils dgen
- generates and downloads CHiME-8 data.
chime-utils lhotse-prep
- prepares CHiME-8 data as Lhotse manifests (which can then be converted to Kaldi- and ESPNet-compatible ones).
chime-utils speechbrain-prep
- prepares CHiME-8 data in Speechbrain-style JSON format.
chime-utils score
- scripts used for official scoring.

Hereafter we describe each command/function in detail.

Data generation

⚡ All DASR data in one go

You can generate all CHiME-8 DASR data in one go with:
chime-utils dgen dasr ./download /path/to/mixer6 ./chime8_dasr --part train,dev

This script will download CHiME-6, DiPCo and NOTSOFAR1 automatically to ./download.
Ensure you have at least 1TB of free space there. Once the full data preparation has finished, you can remove the .tar.gz archives to save space.
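
Before starting a long download it can help to check the free space programmatically. The snippet below is a minimal sketch using only the Python standard library; the function name has_enough_space and the decimal interpretation of 1TB (10**12 bytes) are assumptions made for illustration and are not part of chime-utils.

```python
import shutil
from pathlib import Path

# Assumption: "1TB" is read as 10**12 bytes (decimal terabyte).
REQUIRED_BYTES = 10**12


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem holding `path` has enough free bytes."""
    # Walk up to the nearest existing ancestor so that a download
    # directory that has not been created yet can still be checked.
    probe = Path(path).resolve()
    while not probe.exists():
        probe = probe.parent
    return shutil.disk_usage(probe).free >= required_bytes


if __name__ == "__main__":
    if has_enough_space("./download"):
        print("Enough free space for the download.")
    else:
        print("Less than 1TB free: consider another download location.")
```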

Mixer 6 Speech, instead, has to be obtained through LDC.
Refer to chimechallenge.org/current/task1/data for how to obtain Mixer 6 Speech.

🔐 You can check if the data has been successfully prepared with:
chime-utils dgen checksum ./chime8_dasr
We recommend running this check on the evaluation part as well, once it is released.
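
To illustrate the general idea behind such a check, here is a minimal sketch that hashes every file under a directory so that two copies of the data can be compared. It is not the implementation behind chime-utils dgen checksum: the choice of MD5 and the names file_md5 and tree_checksums are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks so large audio files are never fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root):
    """Map each file path (relative to root) to its MD5 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


if __name__ == "__main__":
    # Toy demonstration on a temporary directory (hypothetical content).
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "S02.json").write_text("[]")
        for name, md5 in tree_checksums(tmp).items():
            print(md5, name)
```

Two data directories prepared identically should then yield equal dictionaries.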

🐢 Single Dataset Scripts

We also provide scripts for obtaining each core dataset independently if needed.

CHiME-6
- chime-utils dgen chime6 /path/to/chime6 ./chime8_dasr/chime6 --part train,dev
- It can also be downloaded automatically to ./download/chime6 using:
  - chime-utils dgen chime6 ./download/chime6 ./chime8_dasr/chime6 --part train,dev --download
DiPCo
- chime-utils dgen dipco /path/to/dipco ./chime8_dasr/dipco --part dev
- It can also be downloaded automatically to ./download/dipco using:
  - chime-utils dgen dipco ./download/dipco ./chime8_dasr/dipco --part dev --download
Mixer 6 Speech
- chime-utils dgen mixer6 /path/to/mixer6 ./chime8_dasr/mixer6 --part train_call,train_intv,dev
NOTSOFAR1
- chime-utils dgen notsofar1 /path/to/notsofar1 ./chime8_dasr/notsofar1 --part dev
- It can also be downloaded automatically to ./download/notsofar1 using:
  - chime-utils dgen notsofar1 ./download/notsofar1 ./chime8_dasr/notsofar1 --part dev --download
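
If you want to drive the per-dataset commands from a script, the sketch below assembles the command lines listed above. It only builds and prints them; the helper names dgen_command and all_commands are hypothetical, while the subcommands, partitions and the --download flag are taken from the list above.

```python
import shlex

# (dataset, partitions, downloadable automatically), as listed above.
DATASETS = [
    ("chime6", "train,dev", True),
    ("dipco", "dev", True),
    ("mixer6", "train_call,train_intv,dev", False),
    ("notsofar1", "dev", True),
]


def dgen_command(name, parts, source_dir, out_root="./chime8_dasr", download=False):
    """Build the argument list for one `chime-utils dgen <dataset>` call."""
    cmd = ["chime-utils", "dgen", name, source_dir,
           f"{out_root}/{name}", "--part", parts]
    if download:
        cmd.append("--download")
    return cmd


def all_commands(mixer6_dir, download_root="./download"):
    """Build one command per core dataset; Mixer 6 uses a local LDC copy."""
    cmds = []
    for name, parts, downloadable in DATASETS:
        src = f"{download_root}/{name}" if downloadable else mixer6_dir
        cmds.append(dgen_command(name, parts, src, download=downloadable))
    return cmds


if __name__ == "__main__":
    for cmd in all_commands("/path/to/mixer6"):
        print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once chime-utils is installed.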

Data preparation

🚀 NVIDIA NeMo Official Baseline

This year's CHiME-8 DASR baseline is built directly upon the NVIDIA NeMo team's CHiME-7 DASR submission from last year [1].

It is available at FIXME

Other Toolkits

For convenience, we also offer here data preparation scripts for different toolkits:

⚠️ NOTE
In all manifest preparation scripts you can choose which text normalization to apply to each utterance by passing an additional argument:

--txt-norm chime8
- this is
chime7
chime6

K2/Icefall/Lhotse

You can prepare Lhotse manifests compatible with K2/Icefall for all core datasets easily.

For example, to prepare CHiME-6 manifests for the far-field arrays and the training and development partitions:
- chime-utils lhotse-prep chime6 ./chime8_dasr/chime6 ./manifests/lhotse/chime6 --dset-part train,dev --mic mdm
You can also prepare manifests for the speakers' close-talk microphones:
- chime-utils lhotse-prep chime6 ./chime8_dasr/chime6 ./manifests/lhotse/chime6 --dset-part train,dev --mic ihm

Similarly, you can use chime-utils lhotse-prep dipco, chime-utils lhotse-prep mixer6 and chime-utils lhotse-prep notsofar1 commands to prepare manifests for the other three scenarios.

ESPNet and Kaldi

You can prepare Kaldi and ESPNet manifests for all core datasets easily.

For example, to prepare CHiME-6 manifests for the far-field arrays and the training and development partitions:
- chime-utils espnet-prep chime6 ./chime8_dasr/chime6 ./manifests/espnet/chime6 --dset-part train,dev --mic mdm
You can also prepare manifests for the speakers' close-talk microphones:
- chime-utils espnet-prep chime6 ./chime8_dasr/chime6 ./manifests/espnet/chime6 --dset-part train,dev --mic ihm

Similarly, you can use chime-utils espnet-prep dipco, chime-utils espnet-prep mixer6 and chime-utils espnet-prep notsofar1 commands to prepare manifests for the other three scenarios.

Speechbrain

You can easily prepare Speechbrain-compatible JSON annotations (with multichannel support!).

For example, to prepare CHiME-6 manifests for the far-field arrays and the training and development partitions:
- chime-utils speechbrain-prep chime6 ./chime8_dasr/chime6 ./manifests/speechbrain/chime6 --dset-part train,dev --mic mdm
You can also prepare manifests for the speakers' close-talk microphones:
- chime-utils speechbrain-prep chime6 ./chime8_dasr/chime6 ./manifests/speechbrain/chime6 --dset-part train,dev --mic ihm
Or for both together:
- chime-utils speechbrain-prep chime6 ./chime8_dasr/chime6 ./manifests/speechbrain/chime6 --dset-part train,dev --mic all

Similarly, you can use chime-utils speechbrain-prep dipco, chime-utils speechbrain-prep mixer6 and chime-utils speechbrain-prep notsofar1 commands to prepare manifests for the other three scenarios.

You can also use chime-utils speechbrain-prep combine manifest1 manifest2 ... manifestN to combine Speechbrain manifests, so that you can train/validate on all scenarios simultaneously.

Scoring

Last but not least, we also provide scripts for scoring (the exact same scripts the organizers will use for ranking CHiME-8 DASR submissions).
To learn more about scoring and ranking in CHiME-8 DASR, please head over to the official CHiME-8 Challenge website.

Note that the following scripts expect the participants' predictions to be in the standard CHiME-style JSON format, also known as the SegLST (Segment-wise Long-form Speech Transcription) format (we adopt the MeetEval naming convention [2]).
Each SegLST file is a JSON file containing a list of dicts (one per utterance) with the following keys:

    {
        "end_time": "43.82",
        "start_time": "40.60",
        "words": "chime style json format",
        "speaker": "P05",
        "session_id": "S02"
    }
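
As an illustration, the sketch below checks a list of segments against the keys shown above and serializes it to JSON. The helper name find_problems and the extra start/end consistency check are assumptions made for this example; they are not part of chime-utils, whose scoring scripts may apply different checks.

```python
import json

# Keys taken from the SegLST example above.
REQUIRED_KEYS = {"end_time", "start_time", "words", "speaker", "session_id"}


def find_problems(segments):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    for i, seg in enumerate(segments):
        missing = REQUIRED_KEYS - seg.keys()
        if missing:
            problems.append(f"segment {i}: missing keys {sorted(missing)}")
            continue
        # Times are stored as strings in the example, so convert first.
        if float(seg["end_time"]) < float(seg["start_time"]):
            problems.append(f"segment {i}: end_time is before start_time")
    return problems


if __name__ == "__main__":
    segments = [
        {
            "end_time": "43.82",
            "start_time": "40.60",
            "words": "chime style json format",
            "speaker": "P05",
            "session_id": "S02",
        }
    ]
    assert not find_problems(segments)
    # A SegLST file is simply this list serialized as JSON.
    print(json.dumps(segments, indent=4))
```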

Please head over to the CHiME-8 DASR Submission instructions to learn more about scoring, text normalization and ranking.

The scripts accept either a single SegLST JSON file or a folder containing multiple SegLST JSON files,
e.g. one per scenario, as requested in the CHiME-8 DASR Submission instructions.
For example, for the development set:

dev
├── chime6.json
├── dipco.json
├── mixer6.json
└── notsofar1.json
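
Before scoring, you may want to confirm that a prediction folder follows this layout. The following is a minimal sketch; the function name missing_scenarios is hypothetical, and the expected file names are simply those in the tree above.

```python
import tempfile
from pathlib import Path

# One JSON file per scenario, as in the folder tree above.
SCENARIOS = ("chime6", "dipco", "mixer6", "notsofar1")


def missing_scenarios(folder):
    """Return the scenarios whose <scenario>.json is absent from `folder`."""
    folder = Path(folder)
    return [s for s in SCENARIOS if not (folder / f"{s}.json").is_file()]


if __name__ == "__main__":
    # Toy demonstration: a folder holding only two of the four files.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("chime6", "dipco"):
            Path(tmp, f"{name}.json").write_text("[]")
        print(missing_scenarios(tmp))  # prints ['mixer6', 'notsofar1']
```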

CHiME-8 DASR Ranking Score

Text Normalization

Text normalization is applied automatically to your predictions before scoring.
In CHiME-8 DASR we use a more complex text normalization, which is built on top of the Whisper text normalization but differs from it in one crucial respect: it is less "aggressive".
Examples are available here: ./tests/test_normalizer.py

ASR

In detail, we provide scripts to compute common ASR metrics for long-form meeting scenarios. These scores are computed through the awesome Meeteval [2] toolkit.

- time-constrained minimum-permutation word error rate (tcpWER) [2]
- concatenated minimum-permutation word error rate (cpWER) [3]
- diarization-assigned minimum-permutation word error rate (DA-WER) [4]
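
To give an intuition for cpWER, here is a toy sketch: all words of each speaker are concatenated, every reference-to-hypothesis speaker assignment is tried, and the assignment with the fewest word errors is kept. This is not the official scorer (use the chime-utils scoring scripts and MeetEval for actual results): it applies no text normalization, and the brute-force search over permutations is only practical for a handful of speakers. The function names are assumptions made for this example.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution (or match)
            ))
        prev = cur
    return prev[-1]


def cpwer(ref_by_speaker, hyp_by_speaker):
    """Toy cpWER. Inputs map speaker -> that speaker's concatenated words."""
    refs = [text.split() for text in ref_by_speaker.values()]
    hyps = [text.split() for text in hyp_by_speaker.values()]
    # Pad the shorter side with empty streams so speaker counts match.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]
    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")
    # Try every speaker assignment and keep the one with the fewest errors.
    best = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best / total_ref_words


if __name__ == "__main__":
    ref = {"P01": "hello world", "P02": "good morning all"}
    hyp = {"spk1": "good morning", "spk2": "hello world"}
    print(cpwer(ref, hyp))  # one deleted word out of five reference words
```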

You can also use chime-utils score segslt2ctm input-dir output-dir to automatically convert all SegLST JSON files in input-dir and its subfolders to .ctm files.
This makes it easy to use other ASR metric tools as well, such as NIST Asclite.

Diarization

DER
JER

Error Analysis

We also provide utilities to convert SegLST (aka CHiME-6 style) JSON annotations to other formats, such as .ctm and Audacity-compatible labels (.txt), so that system outputs can be analyzed in more depth.

Segment Time Marked .stm format conversion:
- chime-utils score segslt2stm input-dir output-dir
Conversation Time Mark .ctm format conversion:
- chime-utils score segslt2ctm input-dir output-dir
Rich Transcription Time Marked .rttm format conversion:
- chime-utils score segslt2rttm input-dir output-dir
  - this allows you to use other diarization scoring tools such as dscore.
Audacity labels (see Audacity manual page) format conversion:
- chime-utils score segslt2aud input-dir output-dir
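
To show what such a conversion involves, here is a minimal sketch that turns SegLST segments into the text of an Audacity label track, where each line holds a start time, an end time and a label separated by tabs. It is not the code behind segslt2aud: the function name and the "speaker: words" label layout are assumptions made for this example.

```python
def seglst_to_audacity(segments):
    """Return Audacity label-track text: 'start<TAB>end<TAB>label' per line."""
    lines = []
    # Sort by start time so the labels appear in chronological order.
    for seg in sorted(segments, key=lambda s: float(s["start_time"])):
        start = float(seg["start_time"])
        end = float(seg["end_time"])
        # Assumption: show the speaker and the words in the label text.
        label = f"{seg['speaker']}: {seg['words']}"
        lines.append(f"{start:.6f}\t{end:.6f}\t{label}\n")
    return "".join(lines)


if __name__ == "__main__":
    segments = [
        {
            "end_time": "43.82",
            "start_time": "40.60",
            "words": "chime style json format",
            "speaker": "P05",
            "session_id": "S02",
        }
    ]
    print(seglst_to_audacity(segments), end="")
```

In practice you would write one label file per session, since each Audacity project shows a single recording.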

🔍 MeetEval meeting recognition visualization (recommended)

For ASR+diarization error analysis we recommend this very useful MeetEval tool (to be presented at ICASSP 2024 in a show and tell session):

https://thequilo.github.io/meeteval_jupyterlite/lab/

To use this tool, all you need to do is convert the predictions and the ground truth to .stm format:

chime-utils score segslt2stm /path/to/your_JSON_predictions /path/to/output_folder
chime-utils score segslt2stm /path/to/chime8_dasr_ground_truth_JSON /path/to/output_folder_gt

Contribute

If you wish to contribute, clone this repo:

git clone https://github.com/chimechallenge/chime-utils
cd chime-utils

and then install with:

pip install -e .[dev]
pip install pre-commit
pre-commit install --install-hooks

References

[1] Park, T. J., Huang, H., Jukic, A., Dhawan, K., Puvvada, K. C., Koluguri, N., ... & Ginsburg, B. (2023). The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System. arXiv preprint arXiv:2310.12378.

[2] von Neumann, T., Boeddeker, C., Delcroix, M., & Haeb-Umbach, R. (2023). MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems. arXiv preprint arXiv:2307.11394.

[3] Watanabe, S., Mandel, M., Barker, J., Vincent, E., Arora, A., Chang, X., ... & Ryant, N. (2020). CHiME-6 challenge: Tackling multispeaker speech recognition for unsegmented recordings. arXiv preprint arXiv:2004.09249.

[4] Cornell, S., Wiesner, M., Watanabe, S., Raj, D., Chang, X., Garcia, P., ... & Khudanpur, S. (2023). The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios. arXiv preprint arXiv:2306.13734.
