Pretrained Sequence Tagging Models

The following pre-trained models are provided for several common sequence tagging tasks. A model can be run with:

python RunModel.py modelname.h5 input.txt
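To call the same command from a script, for example when tagging several files in a row, a minimal sketch could look like the following. It only assembles the argument list shown above; `build_command` is a hypothetical helper and not part of this repository, and `modelname.h5` and `input.txt` are placeholders.

```python
import sys


def build_command(model_path, input_path, script="RunModel.py"):
    """Return the argument list for running one pretrained model on one text file.

    Hypothetical helper: it mirrors `python RunModel.py modelname.h5 input.txt`
    and assumes RunModel.py is in the current working directory.
    """
    # sys.executable is the interpreter running this script, so the same
    # Python environment is reused for RunModel.py.
    return [sys.executable, script, model_path, input_path]


# Placeholder file names, for illustration only.
command = build_command("modelname.h5", "input.txt")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to execute the tagger.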

For the English models, we used the word embeddings by Levy et al. For the German models, we used the word embeddings by Reimers et al.

POS

We trained POS taggers on the Universal Dependencies v1.3 datasets for English and German:

| Language | Development (Accuracy) | Test (Accuracy) |
|----|----|----|
| English (UD) | 95.47% | 95.55% |
| German (UD) | 94.86% | 93.99% |

Further, we trained models on the Wall Street Journal:

| Language | Development (Accuracy) | Test (Accuracy) |
|----|----|----|
| English (WSJ) | 97.18% | 97.21% |

The reported performance is accuracy.

Chunking

Trained on the CoNLL 2000 chunking dataset. Performance is F1-score.

| Language | Development (F1) | Test (F1) |
|----|----|----|
| English (CoNLL 2000) | 95.40% | 94.70% |

NER

Trained on CoNLL 2003 and GermEval 2014. Performance is F1-score.

| Language | Development (F1) | Test (F1) |
|----|----|----|
| English (CoNLL 2003) | 94.29% | 90.87% |
| German (CoNLL 2003) | 80.80% | 77.49% |
| German (GermEval 2014) | 80.85% | 80.00% |

Entities

Trained on ACE 2005 (https://catalog.ldc.upenn.edu/LDC2006T06)

| Language | Development (F1) | Test (F1) |
|----|----|----|
| English | 82.46% | 85.78% |

Events

Trained on TempEval3 (https://www.cs.york.ac.uk/semeval-2013/task1/)

| Language | Development (F1) | Test (F1) |
|----|----|----|
| English | - | 82.28% |

Parameters

The parameters and configurations used for some of the pretrained models are listed below.

English NER:
Glove 6B 100 embeddings with params = {'dropout': [0.25, 0.25], 'classifier': 'CRF', 'LSTM-Size': [100,75], 'optimizer': 'nadam', 'charEmbeddings': 'CNN', 'miniBatchSize': 32}

German NER (CoNLL 2003 and GermEval 2014):
Reimers et al., 2014, GermEval embeddings with params = {'dropout': [0.25, 0.25], 'classifier': 'CRF', 'LSTM-Size': [100,75], 'optimizer': 'nadam', 'charEmbeddings': 'CNN', 'miniBatchSize': 32}

Entities:
Glove 6B 100 embeddings with params = {'dropout': [0.25, 0.25], 'classifier': 'CRF', 'LSTM-Size': [100,75], 'optimizer': 'nadam', 'charEmbeddings': 'CNN', 'miniBatchSize': 32}
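The three configurations above use identical hyperparameters and differ only in the word embeddings. As a minimal sketch, they could be collected in one Python structure like this. The names `shared_params` and `pretrained_configs` and the `'embeddings'` key are illustrative assumptions, not identifiers from this repository; the values are copied from the list above.

```python
import copy

# Hyperparameters as listed above; identical for all three models.
shared_params = {
    'dropout': [0.25, 0.25],
    'classifier': 'CRF',
    'LSTM-Size': [100, 75],
    'optimizer': 'nadam',
    'charEmbeddings': 'CNN',
    'miniBatchSize': 32,
}

# Embedding descriptions are copied verbatim from the list above.
# deepcopy gives each model its own lists, so changing one model's
# dropout or LSTM sizes does not silently change the others.
pretrained_configs = {
    'English NER': {
        'embeddings': 'Glove 6B 100',
        'params': copy.deepcopy(shared_params),
    },
    'German NER': {
        'embeddings': 'Reimers et al., 2014, GermEval',
        'params': copy.deepcopy(shared_params),
    },
    'Entities': {
        'embeddings': 'Glove 6B 100',
        'params': copy.deepcopy(shared_params),
    },
}

for name, config in pretrained_configs.items():
    print(name, config['embeddings'], config['params']['LSTM-Size'])
```

Keeping the shared values in one place makes it easy to see that only the embeddings vary between these models.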