"Average" Approximates "First Principal Component"? An Empirical Analysis on Representations from Neural Language Models
Our paper is available on arXiv (arXiv:2104.08673). Accepted as a short paper at EMNLP'21.
In progress.
Packages can be installed via `pip install -r requirements.txt`.
- First, run `embed_corpus.py` to obtain the embeddings for a given corpus with a given language model.
- Then, run `calculate_properties.py` to get the absolute cosine similarity between the first principal component (PC) and the average of the embeddings.
- Calculations for other properties in the paper are in progress.
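For orientation, the quantity reported in the second step can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in `calculate_properties.py`: the function name `abs_cos_avg_vs_first_pc` and the toy data are hypothetical, and it assumes the first PC is taken from an SVD of the uncentered embedding matrix, which may differ from what the repository does.

```python
import numpy as np


def abs_cos_avg_vs_first_pc(embeddings: np.ndarray) -> float:
    """Absolute cosine similarity between the mean embedding and the first PC.

    `embeddings` has shape (num_vectors, dim). Hypothetical helper for
    illustration only; not part of this repository.
    """
    avg = embeddings.mean(axis=0)
    # Assumption: the first PC is the top right singular vector of the
    # uncentered matrix. Centering first is a different, also common, choice.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    first_pc = vt[0]
    cos = avg @ first_pc / (np.linalg.norm(avg) * np.linalg.norm(first_pc))
    # The sign of a principal component is arbitrary, hence the absolute value.
    return float(abs(cos))


# Toy data standing in for real embeddings: vectors clustered around a
# shared non-zero offset, so the mean direction dominates.
rng = np.random.default_rng(0)
toy = rng.normal(loc=1.0, scale=0.1, size=(200, 16))
score = abs_cos_avg_vs_first_pc(toy)
print(f"|cos(avg, first PC)| = {score:.4f}")
```

A value close to 1 means the average and the first PC point in nearly the same (or exactly opposite) direction.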
Please cite the following paper if you find our dataset or framework useful. Thanks!
Zihan Wang, Chengyu Dong, and Jingbo Shang. "'Average' Approximates 'First Principal Component'? An Empirical Analysis on Representations from Neural Language Models." arXiv preprint arXiv:2104.08673 (2021).
@misc{wang2020xclass,
title={"Average" Approximates "First Principal Component"? An Empirical Analysis on Representations from Neural Language Models},
author={Zihan Wang and Chengyu Dong and Jingbo Shang},
year={2021},
eprint={2104.08673},
archivePrefix={arXiv},
primaryClass={cs.CL}
}