This repository contains the implementation of our paper "BinaryBERT: Pushing the Limit of BERT Quantization" in ACL 2021. The overall workflow of training BinaryBERT is shown below. We first train a half-sized ternary BERT model, then apply ternary weight splitting to initialize the full-sized BinaryBERT, and finally fine-tune BinaryBERT for further refinement.
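To make the splitting step concrete, below is a minimal, illustrative sketch of the idea: a ternary weight tensor (values in {-α, 0, +α}) is split into two binary tensors whose element-wise sum reproduces it. The function name split_ternary and the equal-scale choice are assumptions for illustration only; the actual routine in this repo follows the paper's ternary weight splitting equations, which may also involve the latent full-precision weights and per-layer scales, so treat this as a sketch rather than the repo's implementation.

```python
import torch

def split_ternary(w_ternary: torch.Tensor, alpha: float = 1.0):
    """Split a ternary tensor (values in {-alpha, 0, +alpha}) into two binary
    tensors b1, b2 (values in {-alpha/2, +alpha/2}) with b1 + b2 == w_ternary.
    Illustrative only; the repo's actual splitting may differ in details."""
    half = alpha / 2.0
    sign = torch.sign(w_ternary)  # -1, 0, or +1 per element
    # Non-zero entries: both halves carry the sign of the ternary weight.
    # Zero entries: the two halves cancel out (+alpha/2 and -alpha/2).
    b1 = torch.where(sign != 0, sign * half, torch.full_like(w_ternary, half))
    b2 = torch.where(sign != 0, sign * half, torch.full_like(w_ternary, -half))
    return b1, b2

# Toy check: the two binary halves sum back to the ternary weights.
w = torch.tensor([[-1.0, 0.0, 1.0], [0.0, 1.0, -1.0]])
b1, b2 = split_ternary(w)
assert torch.equal(b1 + b2, w)
```

In the full model, the two halves obtained this way initialize the full-sized BinaryBERT from the half-sized ternary model, which is then fine-tuned (step two below).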
pip install -r requirements.txt
We train and test BinaryBERT on the GLUE and SQuAD benchmarks. Both datasets are available online:
For data augmentation on GLUE, please follow the instructions in TinyBERT.
Our experiments are based on the fine-tuned full-precision DynaBERT, which can be found here.
Complete running scripts and more detailed tips are provided in ./scripts.
Execution consists of two steps, which we illustrate by training BinaryBERT with 4-bit activations on MRPC.
The first step, training the half-sized ternary BERT, corresponds to scripts/ternary_glue.sh. For example:
sh scripts/ternary_glue.sh mrpc data/mrpc/ models/dynabert_model/mrpc/width_0.5_depth_1.0/ models/dynabert_model/mrpc/width_0.5_depth_1.0/ 2 4
The second step, ternary weight splitting followed by fine-tuning BinaryBERT, corresponds to scripts/tws_glue.sh. Based on the model checkpoint of the ternary BERT, execute:
sh scripts/tws_glue.sh mrpc data/mrpc/ models/dynabert_model/mrpc/width_0.5_depth_1.0/ output/Ternary_W2A8/mrpc/kd_stage2/ 1 4
Go through each script for more detail.
If you find this repo helpful for your research, please cite our paper:
@inproceedings{bai2021binarybert,
title={BinaryBERT: Pushing the Limit of BERT Quantization},
author={Bai, H. and Zhang, W. and Hou, L. and Shang, L. and Jin, J. and Jiang, X. and Liu, Q. and Lyu, M. and King, I.},
booktitle={Annual Meeting of the Association for Computational Linguistics},
year={2021}
}