
# Pre-trained word vectors

We are publishing pre-trained word vectors for 90 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in [1] with default parameters.

## Format

The word vectors come in both the default binary and text formats of fastText. In the text format, each line contains a word followed by its embedding, with values separated by spaces. Words are ordered by frequency, in descending order.
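As a minimal sketch of parsing the text format described above: the distributed `.vec` files begin with a header line giving the vocabulary size and dimension, followed by one word and its space-separated values per line. The file name `demo.vec` and the tiny synthetic vocabulary here are illustrative only; real files (e.g. for English Wikipedia) are several gigabytes.

```python
import numpy as np

def load_vectors(path):
    """Parse fastText text-format vectors: a "count dim" header line,
    then one word per line followed by its space-separated values."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        count, dim = map(int, f.readline().split())
        for line in f:
            parts = line.rstrip().split(" ")
            # parts[0] is the word; the next dim fields are its embedding
            vectors[parts[0]] = np.array(parts[1:dim + 1], dtype=np.float32)
    return vectors

# Demo on a tiny synthetic file standing in for a real download.
with open("demo.vec", "w", encoding="utf-8") as f:
    f.write("2 3\n")
    f.write("the 0.1 0.2 0.3\n")
    f.write("of 0.4 0.5 0.6\n")

vecs = load_vectors("demo.vec")
print(sorted(vecs), vecs["the"].shape)  # ['of', 'the'] (3,)
```

For the binary format, the fastText library itself (or its Python bindings) should be used instead, since the `.bin` files also store the subword information needed to compute vectors for out-of-vocabulary words.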

## Models

The models can be downloaded from:

## References

If you use these word embeddings, please cite the following paper:

[1] P. Bojanowski\*, E. Grave\*, A. Joulin, T. Mikolov, *Enriching Word Vectors with Subword Information*

```
@article{bojanowski2016enriching,
  title={Enriching Word Vectors with Subword Information},
  author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
  journal={arXiv preprint arXiv:1607.04606},
  year={2016}
}
```