
# Pre-trained word vectors

We are publishing pre-trained word vectors for 90 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in [1] with default parameters.

## Format

The word vectors come in both the default binary and text formats of fastText. In the text format, each line contains a word followed by its embedding, with values separated by spaces. Words are ordered by frequency, in descending order.
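As a minimal sketch of parsing the text format described above: the distributed `.vec` files begin with a header line giving the vocabulary size and dimension, followed by one word and its space-separated values per line. The file name `demo.vec` and the tiny synthetic vocabulary here are illustrative only; real files (e.g. for English Wikipedia) are several gigabytes.

```python
import numpy as np

def load_vectors(path):
    """Parse fastText text-format vectors: a "count dim" header line,
    then one word per line followed by its space-separated values."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        count, dim = map(int, f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            # parts[0] is the word; the next dim fields are its embedding
            vectors[parts[0]] = np.array(parts[1:dim + 1], dtype=np.float32)
    return vectors

# Demo on a tiny synthetic file standing in for a real download.
with open("demo.vec", "w", encoding="utf-8") as f:
    f.write("2 3\n")
    f.write("the 0.1 0.2 0.3\n")
    f.write("of 0.4 0.5 0.6\n")

vecs = load_vectors("demo.vec")
print(sorted(vecs), vecs["the"].shape)  # ['of', 'the'] (3,)
```

For the binary format, the fastText library itself (or its Python bindings) should be used instead, since the `.bin` files also store the subword information needed to compute vectors for out-of-vocabulary words.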

## Models

The models can be downloaded from:

## References

If you use these word embeddings, please cite the following paper:

[1] P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, *Enriching Word Vectors with Subword Information*

```
@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}
```