NER benchmarking: monolingual vs. multilingual embeddings

This repository bundles a set of NER benchmark experiments that compare the performance of monolingual and multilingual embeddings on monolingual (Dutch, French and English) and multilingual datasets. The experiments were performed in the context of my master's dissertation. A bookdown website of the thesis can be found here (not everything is optimized for HTML, so downloading the PDF version of the book is recommended).

All experiments made use of the awesome Flair library and were run as individual scripts on Kaggle. A simple CLI was used to generate self-contained script and metadata files and to push these to Kaggle using the Kaggle API; some examples can be found below.

Datasets

Language	Dataset	Downsampled	Tokens (train)	Tokens (dev)	Tokens (test)
English	CoNLL2003	No	203621	51362	46435
Dutch	CoNLL2002	No	199969	37687	68466
French	WikiNER	Yes (0.10)	279729	34824	30991
Multilingual	All	Yes (0.33)	232173	40141	49444
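
The Downsampled column presumably gives the fraction of sentences that was kept. A minimal sketch of reproducible downsampling is shown below, assuming sentences are sampled uniformly at random per split with a fixed seed. The function name and seed are hypothetical, and Flair's Corpus class also offers a downsample() method, so this is not necessarily how the datasets above were produced.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def downsample(sentences: Sequence[T], fraction: float, seed: int = 42) -> List[T]:
    """Return a random subset containing `fraction` of the sentences, in original order."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not sentences:
        return []
    n_keep = max(1, round(len(sentences) * fraction))
    rng = random.Random(seed)  # local RNG with a fixed seed -> reproducible subset
    keep = sorted(rng.sample(range(len(sentences)), n_keep))
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    train = [f"sentence {i}" for i in range(1000)]  # stand-in for a real training split
    print(len(downsample(train, 0.10)), len(downsample(train, 0.33)))  # 100 330
```

Sorting the sampled indices keeps the original sentence order, which makes the subset easy to diff against the full split.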

Note: all datasets were converted to the standard CoNLL2002 (BIO2) format. Sentences containing more than 250 tokens were removed to allow the use of BERT embeddings, which have a limited input sequence length; this affected a total of 5 sentences in the CoNLL2002 dataset. Document delimiters, when present, were removed as well. All three datasets were combined to obtain a multilingual dataset. Since the CoNLL2003 dataset requires a license (free for research), only the preprocessed CoNLL2002 and WikiNER datasets are included in this repository.
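
The preprocessing steps in the note above can be sketched in plain Python as follows. This assumes whitespace-separated "token ... tag" lines with blank lines between sentences, -DOCSTART- lines as document delimiters, and IOB1 input tags (as in the original CoNLL2003 release); read_conll, iob1_to_bio2 and preprocess are hypothetical helpers, not the repository's code.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (token, tag) pairs
MAX_TOKENS = 250  # longer sentences are dropped (BERT input length limit)


def read_conll(lines: List[str]) -> List[Sentence]:
    """Group 'token ... tag' lines into sentences; blank lines separate sentences."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in lines:
        fields = line.split()
        if not fields:  # sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        if fields[0] == "-DOCSTART-":  # document delimiter, not part of any sentence
            continue
        current.append((fields[0], fields[-1]))  # first column = token, last = NER tag
    if current:
        sentences.append(current)
    return sentences


def iob1_to_bio2(tags: List[str]) -> List[str]:
    """Make every entity start with B-; BIO2 input passes through unchanged."""
    converted: List[str] = []
    previous = "O"
    for tag in tags:
        if tag.startswith("I-") and (previous == "O" or previous[2:] != tag[2:]):
            tag = "B-" + tag[2:]  # an I- tag that opens a new entity becomes B-
        converted.append(tag)
        previous = tag
    return converted


def preprocess(lines: List[str]) -> List[Sentence]:
    """Read sentences, drop over-long ones and convert the tags to BIO2."""
    result: List[Sentence] = []
    for sentence in read_conll(lines):
        if len(sentence) > MAX_TOKENS:
            continue
        tokens = [token for token, _ in sentence]
        tags = iob1_to_bio2([tag for _, tag in sentence])
        result.append(list(zip(tokens, tags)))
    return result
```

Because a valid BIO2 sequence never has an I- tag that opens an entity, running the conversion on already-BIO2 data (such as CoNLL2002) is a no-op.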

Embeddings

Different monolingual and multilingual embeddings were tested. The monolingual setups included contextualized Flair and BERT embeddings, static fastText (fastT) and BytePair embeddings (BPEmb), and task-specific word type embeddings (OneHotEmbeddings, OHE) and character-feature embeddings (CharacterEmbeddings, Char). The multilingual embeddings were multilingual BERT (mBERT), multilingual Flair (mFlair) and multilingual BytePair embeddings (mBPEmb).

For Flair embeddings, both the forward and backward representations were included. BERT-based embeddings (both monolingual and multilingual) were obtained with the following parameter configuration of Flair's TransformerWordEmbeddings class (i.e. the default configuration, except that only the last layer is selected).

parameter	value
layers	-1
pooling_operation	first
use_scalar_mix	False
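
To make the effect of layers = -1 and pooling_operation = first concrete, the toy sketch below picks, for every word, the last-layer vector of its first subtoken. It uses made-up numbers and plain lists instead of a real transformer, omits special tokens such as [CLS], and illustrates the idea rather than Flair's implementation. With a single selected layer, scalar mixing (use_scalar_mix) would have nothing to combine.

```python
from typing import List, Sequence

Vector = List[float]


def first_subtoken_embeddings(
    hidden_states: Sequence[Sequence[Vector]],  # [layer][subtoken position] -> vector
    word_to_subtokens: Sequence[Sequence[int]],  # word index -> its subtoken positions
    layer: int = -1,  # -1 selects the last layer
) -> List[Vector]:
    """Return one vector per word: the chosen layer's vector of the word's first subtoken."""
    states = hidden_states[layer]
    return [list(states[positions[0]]) for positions in word_to_subtokens]


if __name__ == "__main__":
    # Toy model output: 2 layers x 4 subtokens x 2 dimensions (made-up numbers)
    hidden = [
        [[0.0, 0.0], [0.1, 0.1], [0.2, 0.2], [0.3, 0.3]],  # first layer
        [[1.0, 1.0], [1.1, 1.1], [1.2, 1.2], [1.3, 1.3]],  # last layer
    ]
    # Two words, each split into two subtokens (hypothetical split)
    print(first_subtoken_embeddings(hidden, [[0, 1], [2, 3]]))  # [[1.0, 1.0], [1.2, 1.2]]
```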

A full overview of all evaluated configurations is provided below ("+" indicates concatenation into a single StackedEmbeddings object):

Monolingual English

No contextualized embeddings	BERT	Flair	BERT + Flair
	BERT	Flair	BERT + Flair
Char	BERT + Char	Flair + Char	BERT + Flair + Char
OHE	BERT + OHE	Flair + OHE	BERT + Flair + OHE
BPEmb (En)	BERT + BPEmb (En)	Flair + BPEmb (En)	BERT + Flair + BPEmb (En)
fastT (En)	BERT + fastT (En)	Flair + fastT (En)	BERT + Flair + fastT (En)
All	BERT + All	Flair + All	BERT + Flair + All

All refers to Char + OHE + fastT (En) + BPEmb (En)

Monolingual Dutch

No contextualized embeddings	BERT	Flair	BERT + Flair
	BERTje	Flair (Nl)	BERTje + Flair (Nl)
Char	BERTje + Char	Flair (Nl) + Char	BERTje + Flair (Nl) + Char
OHE	BERTje + OHE	Flair (Nl) + OHE	BERTje + Flair (Nl) + OHE
BPEmb (Nl)	BERTje + BPEmb (Nl)	Flair (Nl) + BPEmb (Nl)	BERTje + Flair (Nl) + BPEmb (Nl)
fastT (Nl)	BERTje + fastT (Nl)	Flair (Nl) + fastT (Nl)	BERTje + Flair (Nl) + fastT (Nl)
All	BERTje + All	Flair (Nl) + All	BERTje + Flair (Nl) + All

All refers to Char + OHE + fastT (Nl) + BPEmb (Nl)

Monolingual French

No contextualized embeddings	BERT	Flair	BERT + Flair
	CamemBERT	Flair (Fr)	CamemBERT + Flair (Fr)
Char	CamemBERT + Char	Flair (Fr) + Char	CamemBERT + Flair (Fr) + Char
OHE	CamemBERT + OHE	Flair (Fr) + OHE	CamemBERT + Flair (Fr) + OHE
BPEmb (Fr)	CamemBERT + BPEmb (Fr)	Flair (Fr) + BPEmb (Fr)	CamemBERT + Flair (Fr) + BPEmb (Fr)
fastT (Fr)	CamemBERT + fastT (Fr)	Flair (Fr) + fastT (Fr)	CamemBERT + Flair (Fr) + fastT (Fr)
All	CamemBERT + All	Flair (Fr) + All	CamemBERT + Flair (Fr) + All

All refers to Char + OHE + fastT (Fr) + BPEmb (Fr)

Multilingual

No contextualized embeddings	BERT	Flair	BERT + Flair
	mBERT	mFlair	mBERT + mFlair
Char	mBERT + Char	mFlair + Char	mBERT + mFlair + Char
OHE	mBERT + OHE	mFlair + OHE	mBERT + mFlair + OHE
mBPEmb	mBERT + mBPEmb	mFlair + mBPEmb	mBERT + mFlair + mBPEmb
All	mBERT + All	mFlair + All	mBERT + mFlair + All

All refers to Char + OHE + mBPEmb
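
Each table above is the Cartesian product of the contextualized options (none, BERT, Flair, BERT + Flair) and the static or task-specific options (none, Char, OHE, BPEmb, fastT, All), minus the empty combination. The hypothetical helper below regenerates the table labels to make the experiment count explicit: 23 configurations per monolingual language and 19 for the multilingual setting, where mBPEmb takes the place of BPEmb and there is no fastText option.

```python
from itertools import product
from typing import Dict, List, Tuple

# (BERT-type, Flair-type) embeddings per setting, as named in the tables above
CONTEXTUAL: Dict[str, Tuple[str, str]] = {
    "en": ("BERT", "Flair"),
    "nl": ("BERTje", "Flair (Nl)"),
    "fr": ("CamemBERT", "Flair (Fr)"),
    "multi": ("mBERT", "mFlair"),
}
# Static / task-specific options per setting ("All" = all of them stacked)
STATIC: Dict[str, List[str]] = {
    "en": ["Char", "OHE", "BPEmb (En)", "fastT (En)", "All"],
    "nl": ["Char", "OHE", "BPEmb (Nl)", "fastT (Nl)", "All"],
    "fr": ["Char", "OHE", "BPEmb (Fr)", "fastT (Fr)", "All"],
    "multi": ["Char", "OHE", "mBPEmb", "All"],
}


def configurations(setting: str) -> List[str]:
    """All table cells for one setting, e.g. 'BERT + Flair + Char'."""
    bert, flair = CONTEXTUAL[setting]
    contextual_options = [[], [bert], [flair], [bert, flair]]
    static_options = [[]] + [[name] for name in STATIC[setting]]
    labels = []
    for static, contextual in product(static_options, contextual_options):
        parts = contextual + static  # contextualized embeddings first, as in the tables
        if parts:  # skip the empty cell (no embeddings at all)
            labels.append(" + ".join(parts))
    return labels


if __name__ == "__main__":
    for setting in CONTEXTUAL:
        print(setting, len(configurations(setting)))
```

Summed over the four settings, this gives 3 x 23 + 19 = 88 configurations, each run n times.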

Command line interface

Since the Kaggle API does not allow importing utility scripts directly from the command line, every experiment was executed from a self-contained script containing all the code needed to load the dataset, initialize the embeddings, and train and evaluate the model. This script was automatically generated and pushed to Kaggle using a simple command line interface. Results were obtained by extracting the true positive (tp), false positive (fp) and false negative (fn) counts from Flair's training log and computing micro-averaged precision, recall and F1-score using sklearn. The Dropbox API was used to collect these results (an access token is required).
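
The scoring step can be sketched without sklearn: micro-averaging pools the tp, fp and fn counts of all entity classes before computing the ratios, so precision = TP/(TP + FP), recall = TP/(TP + FN) and F1 = 2PR/(P + R). The log-line pattern below is an assumption about the shape of Flair's per-class output (it differs between Flair versions), and parse_counts and micro_scores are hypothetical names.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed shape of a per-class line, e.g. "PER   tp: 8 - fp: 2 - fn: 0 - ...",
# possibly preceded by a timestamp; adjust to the log of your Flair version.
COUNT_LINE = re.compile(r"(?P<label>\S+)\s+tp: (?P<tp>\d+) - fp: (?P<fp>\d+) - fn: (?P<fn>\d+)")


def parse_counts(log_lines: Iterable[str]) -> Dict[str, Tuple[int, int, int]]:
    """Map each entity class to its (tp, fp, fn); later lines overwrite earlier ones."""
    counts: Dict[str, Tuple[int, int, int]] = {}
    for line in log_lines:
        match = COUNT_LINE.search(line)
        if match:
            counts[match["label"]] = (int(match["tp"]), int(match["fp"]), int(match["fn"]))
    return counts


def micro_scores(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 from counts pooled over all classes."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because later lines overwrite earlier ones, the last per-class block in the log (typically the final test evaluation) determines the scores.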

The following command creates the Python script and the kernel-metadata.json file (the kernel name is the concatenation of the dataset, the language and the different embeddings) and pushes the script to Kaggle. By default, the embeddings storage mode used by the ModelTrainer is set to 'cpu'. When the --storage option is set to 'gpu', the GPU is automatically enabled on Kaggle.

Important: the --data option must be the name of a dataset available on Kaggle (it is read from /kaggle/input/<data>).
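
A minimal sketch of the metadata-generation step, assuming the kernel slug joins dataset, language and embedding codes with hyphens and that the dataset is owned by the same Kaggle user; both are guesses, and kernel_metadata, write_metadata and the script.py file name are hypothetical. The field names follow the kernel-metadata.json format used by the Kaggle API.

```python
import json
import os
from typing import Any, Dict, Sequence


def kernel_slug(data: str, lang: str, codes: Sequence[str]) -> str:
    """Hypothetical naming scheme: dataset, language and embedding codes joined by '-'."""
    return "-".join([data, lang, *codes]).lower()


def kernel_metadata(user: str, data: str, lang: str, codes: Sequence[str],
                    storage: str = "cpu", code_file: str = "script.py") -> Dict[str, Any]:
    """Build the contents of kernel-metadata.json for one experiment."""
    slug = kernel_slug(data, lang, codes)
    return {
        "id": f"{user}/{slug}",
        "title": slug,
        "code_file": code_file,
        "language": "python",
        "kernel_type": "script",
        "is_private": True,
        "enable_gpu": storage == "gpu",  # GPU only when embeddings are stored on the GPU
        "enable_internet": True,  # assumed: pre-trained embeddings are downloaded at run time
        "dataset_sources": [f"{user}/{data}"],  # assumed owner; mounted at /kaggle/input/<data>
        "competition_sources": [],
        "kernel_sources": [],
    }


def write_metadata(folder: str, metadata: Dict[str, Any]) -> str:
    """Write kernel-metadata.json into `folder` and return its path."""
    path = os.path.join(folder, "kernel-metadata.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2)
    return path
```

Once the metadata file sits next to the generated script, the folder can be pushed with the Kaggle CLI command kaggle kernels push -p <folder>.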

$ python3 push_to_kaggle.py --help
Usage: push_to_kaggle.py [OPTIONS] [EMBEDDING_CODES]...

  Embedding codes can be passed as arguments, the appropriate classes are
  initiated according to the --lang option. Multiple embeddings are
  concatenated (StackedEmbeddings)

  Abbreviations:

  =====================================================================

      "bert":     BERTje (nl), CamemBERT (fr), BERT (en), mBERT (multi)

      "bpe":      BytePairEmbeddings (fr, nl, en, multi)

      "ohe":      OneHotEmbeddings

      "char":     CharacterEmbeddings

      "ft":       fastText WordEmbeddings (fr, nl, en)

      "flair":    flair fw + bw (fr, nl, en, multi)

      "elmo:      ELMo embeddings (all 3 layers of ELMo large) (en)

  =====================================================================

Options:
  --user TEXT
  --data TEXT       Dataset: [conll2002, conll2003, wikiner, trired]
  --lang TEXT       Language of embedding class: [en, nl, fr, multi]
  --epochs INTEGER  Max number of epochs  [default: 100]
  --storage TEXT    Embedding storage mode. When storage_mode is 'gpu', GPU
                    will be automatically enabled in the Kaggle environment.
                    [default: cpu]
  --help            Show this message and exit.

Example 1: To run an experiment using CamemBERT + French fastText + OneHotEmbeddings on the WikiNER dataset:

$ python3 push_to_kaggle.py --data wikiner --lang fr bert ft ohe

Example 2: To run an experiment using mBERT + mFlair + multilingual BytePair embeddings on the trilingual dataset:

$ python3 push_to_kaggle.py --data trired --lang multi bert flair bpe
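
The help text above reads like the output of a click command. The sketch below mirrors the same interface with the standard-library argparse module instead, and maps each embedding code to the embedding it stands for per --lang value, as listed in the help text. build_parser, resolve and the description strings are hypothetical, OHE and Char are assumed to be available for every language, and the real script instantiates Flair classes rather than returning names.

```python
import argparse
from typing import Dict, List, Sequence

ALL_LANGS = ("fr", "nl", "en", "multi")

# Embedding behind each code, per --lang value (transcribed from the help text)
EMBEDDINGS: Dict[str, Dict[str, str]] = {
    "bert": {"nl": "BERTje", "fr": "CamemBERT", "en": "BERT", "multi": "mBERT"},
    "bpe": {lang: f"BytePairEmbeddings ({lang})" for lang in ALL_LANGS},
    "ohe": {lang: "OneHotEmbeddings" for lang in ALL_LANGS},
    "char": {lang: "CharacterEmbeddings" for lang in ALL_LANGS},
    "ft": {lang: f"fastText WordEmbeddings ({lang})" for lang in ("fr", "nl", "en")},
    "flair": {lang: f"Flair fw + bw ({lang})" for lang in ALL_LANGS},
    "elmo": {"en": "ELMo (all 3 layers of ELMo large)"},
}


def build_parser() -> argparse.ArgumentParser:
    """Options mirroring the help text; embedding codes are variadic positionals."""
    parser = argparse.ArgumentParser(prog="push_to_kaggle.py")
    parser.add_argument("embedding_codes", nargs="*", help="e.g. bert flair bpe")
    parser.add_argument("--user")
    parser.add_argument("--data", help="conll2002, conll2003, wikiner or trired")
    parser.add_argument("--lang", required=True, choices=["en", "nl", "fr", "multi"])
    parser.add_argument("--epochs", type=int, default=100, help="max number of epochs")
    parser.add_argument("--storage", default="cpu", help="'gpu' also enables the Kaggle GPU")
    return parser


def resolve(codes: Sequence[str], lang: str) -> List[str]:
    """Translate embedding codes into embedding names for the given language."""
    names = []
    for code in codes:
        if code not in EMBEDDINGS:
            raise ValueError(f"unknown embedding code: {code!r}")
        if lang not in EMBEDDINGS[code]:
            raise ValueError(f"{code!r} is not available for --lang {lang}")
        names.append(EMBEDDINGS[code][lang])
    return names


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(resolve(args.embedding_codes, args.lang))
```

With the arguments of Example 1, the codes bert ft ohe resolve to CamemBERT, French fastText and OneHotEmbeddings; asking for ft with --lang multi raises an error, since no multilingual fastText option is listed.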

Results

An overview of the results can be found in the thesis, and the .csv files can be downloaded here. The most recent results (currently n=3 training runs per experiment) are also visualized below. Precision, recall and F1-score were micro-averaged over all entity classes, i.e. computed from the tp, fp and fn counts pooled across classes. Symbols indicate the mean and error bars the standard deviation over the n=3 runs; error bars are only shown when they exceed the symbol size.
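
The plotted means and standard deviations can be computed as follows. This sketch assumes the sample standard deviation (n - 1 denominator; statistics.pstdev would give the population version), since the text does not say which one was used; summarize and the example scores are hypothetical.

```python
import statistics
from typing import Dict, Sequence, Tuple


def summarize(runs: Dict[str, Sequence[float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and sample standard deviation of the scores of each experiment."""
    return {
        name: (statistics.mean(scores), statistics.stdev(scores))  # stdev needs >= 2 runs
        for name, scores in runs.items()
    }


if __name__ == "__main__":
    # Made-up F1-scores for n=3 runs, for illustration only
    runs = {"BERT + Flair": [0.90, 0.92, 0.94], "Flair + Char": [0.91, 0.91, 0.91]}
    for name, (mean, std) in summarize(runs).items():
        print(f"{name}: {mean:.3f} +/- {std:.3f}")
```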

CoNLL2003

NOTE: "All" refers to Char + OHE + fastT + BPEmb (for monolingual embeddings) and Char + OHE + mBPEmb (for multilingual embeddings)

CoNLL2002

NOTE: "All" refers to Char + OHE + fastT + BPEmb (for monolingual embeddings) and Char + OHE + mBPEmb (for multilingual embeddings)

WikiNER

NOTE: "All" refers to Char + OHE + fastT + BPEmb (for monolingual embeddings) and Char + OHE + mBPEmb (for multilingual embeddings)

arthur-arthur/NER
