The following pre-trained models are provided for several common sequence tagging tasks. They can be used by running:
`python RunModel.py modelname.h5 input.txt`
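If you want to call a model from another Python script rather than from the shell, a minimal sketch could wrap the command above with `subprocess`. The helpers `build_command` and `tag_file` are hypothetical names introduced here. The sketch assumes it is run from the directory containing `RunModel.py` and that the tagged result is printed to stdout; the README does not specify the output format.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_command(model_path: str, input_path: str) -> list[str]:
    """Assemble `python RunModel.py <model>.h5 <input>.txt` as an argument list."""
    # sys.executable reuses the current interpreter instead of whatever "python" resolves to.
    return [sys.executable, "RunModel.py", model_path, input_path]


def tag_file(model_path: str, input_path: str) -> str:
    """Run RunModel.py on one input file and return what it prints to stdout.

    Assumption: RunModel.py is in the current working directory.
    """
    if not Path(input_path).is_file():
        raise FileNotFoundError(input_path)
    result = subprocess.run(
        build_command(model_path, input_path),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

For example, `tag_file("ner.h5", "input.txt")` (with `ner.h5` as a hypothetical model filename) returns the tagger's printed output, which you can then parse or write to a file.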
For the English models, we used the word embeddings by Levy et al., except where other embeddings are listed in the configurations below. For the German models, we used the word embeddings by Reimers et al.
POS tagging: we trained POS taggers on the English and German Universal Dependencies v1.3 datasets:
Language | Development (Accuracy) | Test (Accuracy) |
---|---|---|
English (UD) | 95.47% | 95.55% |
German (UD) | 94.86% | 93.99% |
We also trained an English POS tagger on the Wall Street Journal (WSJ) corpus:
Language | Development (Accuracy) | Test (Accuracy) |
---|---|---|
English (WSJ) | 97.18% | 97.21% |
In both POS tables, the reported performance is accuracy.
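For reference, POS accuracy is presumably computed at the token level: the share of tokens whose predicted tag matches the gold tag, pooled over all sentences. The sketch below assumes that definition; the README does not say whether punctuation tokens are included, and `token_accuracy` is a hypothetical helper, not part of this repository.

```python
from typing import Sequence


def token_accuracy(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag, pooled over sentences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences differ in length")
        # Each matching position counts as one correct token.
        correct += sum(g == p for g, p in zip(g_sent, p_sent))
        total += len(g_sent)
    return correct / total if total else 0.0


gold = [["DET", "NOUN", "VERB"], ["PRON", "VERB"]]
pred = [["DET", "NOUN", "NOUN"], ["PRON", "VERB"]]
print(token_accuracy(gold, pred))  # 4 of 5 tokens correct -> 0.8
```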
Chunking: trained on the CoNLL 2000 chunking dataset. Performance is F1 score.
Language | Development (F1) | Test (F1) |
---|---|---|
English (CoNLL 2000) | 95.40% | 94.70% |
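The F1 scores for chunking, and presumably for the NER, entity, and TempEval tables as well, are span-level: a predicted chunk counts only if its type and boundaries exactly match a gold chunk. The sketch below is a simplified re-implementation of that idea in the spirit of the CoNLL `conlleval` script, assuming BIO tags. It is not the evaluation code used for the numbers above and may differ from `conlleval` in edge cases. `extract_spans` and `chunk_f1` are hypothetical helpers.

```python
from typing import Sequence, Set, Tuple

Span = Tuple[str, int, int]  # (chunk type, start index, end index exclusive)


def extract_spans(tags: Sequence[str]) -> Set[Span]:
    """Collect typed chunks from BIO tags; an I-X that does not continue an X chunk starts a new one."""
    spans: Set[Span] = set()
    start, kind = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing chunk
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and kind == label
        if kind is not None and not continues:
            spans.add((kind, start, i))  # close the currently open chunk
            start, kind = None, None
        if prefix == "B" or (prefix == "I" and not continues):
            start, kind = i, label  # open a new chunk
    return spans


def chunk_f1(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Micro-averaged F1 over exact (type, start, end) chunk matches."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [["B-NP", "I-NP", "B-VP", "B-NP"]]
pred = [["B-NP", "I-NP", "B-VP", "O"]]
print(chunk_f1(gold, pred))  # precision 1.0, recall 2/3 -> F1 0.8
```

Note that a single wrong boundary costs both a false positive and a false negative, which is why span F1 is usually lower than token accuracy on the same data.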
NER: trained on CoNLL 2003 (English and German) and GermEval 2014 (German). Performance is F1 score.
Language | Development (F1) | Test (F1) |
---|---|---|
English (CoNLL 2003) | 94.29% | 90.87% |
German (CoNLL 2003) | 80.80% | 77.49% |
German (GermEval 2014) | 80.85% | 80.00% |
Entities: trained on ACE 2005 (https://catalog.ldc.upenn.edu/LDC2006T06). Performance is F1 score.
Language | Development (F1) | Test (F1) |
---|---|---|
English | 82.46% | 85.78% |
Trained on TempEval3 (https://www.cs.york.ac.uk/semeval-2013/task1/). Performance is F1 score; no development score is reported.
Language | Development (F1) | Test (F1) |
---|---|---|
English | - | 82.28% |
The following lists the embeddings and hyperparameters used for some of the pre-trained models.
English NER:
GloVe 6B 100d embeddings with `params = {'dropout': [0.25, 0.25], 'classifier': 'CRF', 'LSTM-Size': [100,75], 'optimizer': 'nadam', 'charEmbeddings': 'CNN', 'miniBatchSize': 32}`
German NER (CoNLL 2003 and GermEval 2014):
Reimers et al., 2014, GermEval embeddings with `params = {'dropout': [0.25, 0.25], 'classifier': 'CRF', 'LSTM-Size': [100,75], 'optimizer': 'nadam', 'charEmbeddings': 'CNN', 'miniBatchSize': 32}`
Entities:
GloVe 6B 100d embeddings with `params = {'dropout': [0.25, 0.25], 'classifier': 'CRF', 'LSTM-Size': [100,75], 'optimizer': 'nadam', 'charEmbeddings': 'CNN', 'miniBatchSize': 32}`
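The three configurations listed above use identical hyperparameters and differ only in their word embeddings. The sketch below restates them as Python data to make that explicit. `SHARED_PARAMS`, `MODEL_EMBEDDINGS`, `model_configs`, and `summarize` are hypothetical names, not part of the training code. The reading of `LSTM-Size` as one BiLSTM layer per list entry (100 units, then 75) is an interpretation, not something stated here.

```python
import copy

# Hyperparameters copied verbatim from the listings above; all three configurations share them.
SHARED_PARAMS = {
    "dropout": [0.25, 0.25],
    "classifier": "CRF",
    "LSTM-Size": [100, 75],
    "optimizer": "nadam",
    "charEmbeddings": "CNN",
    "miniBatchSize": 32,
}

# Hypothetical mapping from each listed model to the embeddings it names.
MODEL_EMBEDDINGS = {
    "English NER": "GloVe 6B, 100d",
    "German NER (CoNLL 2003 and GermEval 2014)": "Reimers et al., 2014, GermEval",
    "Entities": "GloVe 6B, 100d",
}


def model_configs() -> dict:
    """Pair each listed model with its embeddings and its own copy of the shared params."""
    return {
        name: {"embeddings": emb, "params": copy.deepcopy(SHARED_PARAMS)}
        for name, emb in MODEL_EMBEDDINGS.items()
    }


def summarize(params: dict) -> str:
    """Readable summary; assumes one BiLSTM layer per LSTM-Size entry (an interpretation)."""
    sizes = params["LSTM-Size"]
    layers = " -> ".join(str(n) for n in sizes)
    return (f"{len(sizes)} BiLSTM layers ({layers}), "
            f"char {params['charEmbeddings']}, {params['classifier']} output, "
            f"{params['optimizer']}, batch {params['miniBatchSize']}")


print(summarize(SHARED_PARAMS))
```

Deep-copying the shared dictionary lets one model's settings be tweaked without silently changing the others.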