
Add model compression API #2777

Merged
merged 5 commits into from
Aug 10, 2022

Conversation

LiuChiachi
Contributor

@LiuChiachi LiuChiachi commented Jul 11, 2022

PR types

Function optimization

PR changes

APIs

Description

Update compression API

Done:

  • Updated the API interface following the paddleslim update (TODO: experiment with whether adaground outperforms round)
  • To avoid patch conflicts, we no longer import nlp_utils.py from paddleslim; that file's contents were moved into PaddleNLP as a new ofa_utils.py file
  • Moved the compress API into the library
  • Added CompressionArguments. Users can pass arguments directly via `--` flags or via a YAML file; there is no need to distinguish training from compression arguments, just pass whatever is needed.
  • Currently supports three strategies: dynabert+ptq, dynabert, and ptq, with the first as the default.
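As a rough illustration of the "pass everything via `--` flags" design, the sketch below uses the standard-library argparse as a stand-in for the actual CompressionArguments parser; the flag names mirror the usage example in this PR, but the parser itself is hypothetical:

```python
import argparse

# Hypothetical stand-in for CompressionArguments: training and compression
# options live in one flat namespace, with no need to separate them.
parser = argparse.ArgumentParser()
parser.add_argument("--strategy", default="dynabert+ptq",
                    choices=["dynabert+ptq", "dynabert", "ptq"])
parser.add_argument("--algo_list", nargs="+", default=["hist"])
parser.add_argument("--batch_size_list", nargs="+", type=int, default=[4, 8, 16])
parser.add_argument("--width_mult_list", nargs="+", type=float, default=[0.75])

# Parse as if the flags were given on the command line.
args = parser.parse_args(["--strategy", "dynabert+ptq",
                          "--algo_list", "hist", "mse"])
print(args.strategy)   # dynabert+ptq
print(args.algo_list)  # ['hist', 'mse']
```

Unset flags fall back to their defaults, which matches the behavior described later in the thread (defaults are used when the user sets nothing).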

TODO:

  • Generality 1: electra and similar models are not supported yet; unit tests are needed (models that do not support dynamic-to-static conversion cannot use the compression API, which relies on it; unsupported list: https://github.com/PaddlePaddle/PaddleNLP/issues/2793);
  • Generality 2: verify whether arbitrary classes and forward signatures can be supported
  • Revise the compression API documentation
  • Update the eval.py script and backend to provide examples that help developers quickly build inference scripts

Usage:
If using conf.yaml, the following needs to be added:

    strategy:  "dynabert+ptq"
    algo_list: ["hist", "mse"]
    batch_num_list: [1]
    batch_size_list: [4, 8, 16]
    width_mult_list: [0.75]

Alternatively, use the command-line form:

    # Supports 'dynabert+ptq', 'dynabert' and 'ptq' now.
    python compress_seq_cls.py \
    --dataset   "clue cluewsc2020"   \
    --model_name_or_path best_models/CLUEWSC2020/  \
    --output_dir ./  \
    --strategy "dynabert+ptq" \
    --algo_list "hist" "mse" \
    --width_mult_list 0.75 \
    --batch_size_list 4 8 16 \
    --batch_num_list 1 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32

@tianxin1860 tianxin1860 left a comment

Leave some comments

AutoModelForQuestionAnswering,
)
from compress_trainer import CompressConfig, PTQConfig
from paddlenlp.transformers import AutoTokenizer, AutoModelForQuestionAnswering
from paddlenlp.utils.log import logger
from datasets import load_metric, load_dataset

sys.path.append("../ernie-1.0/finetune")

Why is there an ernie-1.0 path here?

DataArguments,
ModelArguments,
)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
@tianxin1860 tianxin1860 Jul 12, 2022

A question here: why hasn't a downstream-task Trainer like QuestionAnsweringTrainer gone into the framework, instead of living under the model_zoo/ernie-1.0 directory? @wawltor

Collaborator

We can consider moving it into the framework; the main reason it wasn't moved is that we were following huggingface's example.

Possible considerations at the time:

  1. It provides users with examples of how to adapt the Trainer code.
  2. Whether it is general enough.

Collaborator

)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
from utils import ALL_DATASETS, DataArguments, ModelArguments
from compress_trainer import AutoCompressConfig

Per the earlier discussion, my understanding is that the implementation in compress_trainer.py should go into the paddlenlp framework, so there is no need to maintain a compress_trainer.py script in every scenario?

Contributor Author

It has been moved into paddlenlp, and a new ofa_utils.py file has been added under paddlenlp/transformers/. That file is essentially a copy of https://github.com/PaddlePaddle/PaddleSlim/blob/develop/paddleslim/nas/ofa/utils/nlp_utils.py
The reason for copying it into paddlenlp is that the paddleslim file ends with these lines:

nn.MultiHeadAttention.forward = _mha_forward
nn.MultiHeadAttention._prepare_qkv = _prepare_qkv
nn.TransformerEncoder.forward = _encoder_forward
nn.TransformerEncoderLayer.forward = _encoder_layer_forward

This pattern is not one we recommend. First, as soon as nlp_utils.py is imported, the forward functions of all the classes above are replaced, and paddleslim provides no interface to restore them, which can lead to unpredictable errors. Second, replacing a class's forward may also break our current model_outputs feature, which cannot cover these patches. The second point comes from a discussion with @guoshengCS, where we tracked down the cause.

)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
from utils import ALL_DATASETS, DataArguments, ModelArguments
from compress_trainer import AutoCompressConfig

From a user's perspective, wouldn't renaming AutoCompressConfig to CompressConfig be simpler? The Auto prefix mostly makes sense from an RD perspective, for consistency with the AutoModel convention.

Contributor Author

Thanks for pointing this out. Since Auto here differs from the AutoModel usage, I removed the Auto prefix.

@tianxin1860 tianxin1860 left a comment

Leave some comments

data_collator=data_collator,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
criterion=criterion) # Stratedy`dynabert` needs arguments `criterion`


Typo: Stratedy

Contributor Author

Thanks. Done.


# Example 2: ptq
# configs = AutoCompressConfig("ptq")
# configs.set_config(width_mult_list=[0.75, 2 / 3],

Why does standalone PTQ quantization need the width_mult_list parameter?

Contributor Author

This is a typo. Thanks. Done.

# Supports 'dynabert+ptq', 'dynabert' and 'ptq' now.
# Example 1: dynabert+ptq
configs = AutoCompressConfig()
configs.set_config(width_mult_list=[0.75, 2 / 3], batch_size_list=[4, 8])

Why doesn't the input_dir parameter need to be configured here?

Contributor Author

@LiuChiachi LiuChiachi Jul 13, 2022

The compression API's input is a dynamic-graph model, which is already passed in via the Trainer's existing model_name_or_path parameter.

# configs = AutoCompressConfig("ptq")
# configs.set_config(width_mult_list=[0.75, 2 / 3],
# batch_size_list=[4, 8],
# input_dir=os.path.join(model_args.model_name_or_path,

input_dir is a strategy-independent parameter. Would it be easier to understand if it were pulled out into the trainer.compress() interface to pair with output_dir? Alternatively, add output_dir to the config as well, so that trainer.compress() accepts only a single config argument. Which option is more reasonable?

Contributor Author

Confirmed again: input_dir is not needed, even when the user only does quantization. The compression API's input model is a dynamic-graph model whose path is already passed in via the Trainer API's model_name_or_path.

# "compress", str(2/3)))

# Example 3: dynabert
# configs = AutoCompressConfig("dynabert")

Doesn't the "dynabert" strategy's config need to be set explicitly here?

Contributor Author

If the user doesn't set it, the defaults are used. After startup, the program prompts which parameters can be customized and prints the final runtime config.

else:
pass

self.stratedy = stratedy


Typo

Contributor Author

Thanks:-)


self.stratedy = stratedy
self.config_dict = {}
for each_stratedy in stratedy.split("+"):

If the strategy parameter is defined as a str, parsing on the + sign is unavoidable. What about defining strategy as a list?

Contributor Author

The idea here is to make 'dynabert+ptq' a fixed recipe, packaged as a strategy on equal footing with 'dynabert' and 'ptq'.
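A sketch of that parsing choice (the SUPPORTED set and function name are illustrative, not the PR's actual code): keeping strategy as a string makes the '+'-joined recipe a single named option, and splitting it back into steps stays a one-liner.

```python
SUPPORTED = {"dynabert", "ptq"}

def parse_strategy(strategy: str):
    """Split a '+'-joined recipe like 'dynabert+ptq' into ordered steps."""
    steps = strategy.split("+")
    unknown = [s for s in steps if s not in SUPPORTED]
    if unknown:
        raise ValueError("Unsupported strategy step(s): {}".format(unknown))
    return steps

print(parse_strategy("dynabert+ptq"))  # ['dynabert', 'ptq']
print(parse_strategy("ptq"))           # ['ptq']
```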

paddle.version.commit))
for strategy in self.config_dict:
    logger.info('{}:'.format(strategy))
    for a in self.config_dict[strategy]:

  1. Why not iterate the dict with for config_key, config_value in self.config_dict[strategy].items()?
  2. Going forward, please follow variable naming conventions and avoid uninformative names like a.
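The suggested iteration style, sketched on a toy config dict (key names are illustrative):

```python
config_dict = {
    "dynabert": {"width_mult_list": [0.75]},
    "ptq": {"algo_list": ["hist", "mse"], "batch_size_list": [4, 8]},
}

for strategy in config_dict:
    print('{}:'.format(strategy))
    # .items() yields key/value pairs in one step, avoiding a second lookup,
    # and descriptive names replace the opaque loop variable `a`.
    for config_key, config_value in config_dict[strategy].items():
        print('  {}: {}'.format(config_key, config_value))
```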

Contributor Author

Got it, thanks for the reminder.

output_dir_width = os.path.join(output_dir, str(width_mult))
self.quant(output_dir_width, output_dir_width,

Does the quant interface take two output_dir_width arguments? Does that mean the quantized model is also written under output_dir_width?

Contributor Author

@LiuChiachi LiuChiachi Jul 14, 2022

Yes, the first argument is the input path and the second is the output path. This is quantization after pruning: the input path is built from the user-provided output_dir and the pruning width, and the quantized model is also placed in a subdirectory under that path. For example, if the user passes output_dir as best_models/CLUEWSC2020/compress, the pruned model goes to
best_models/CLUEWSC2020/compress/width_mult_0.75/float32, and the quantized model to
best_models/CLUEWSC2020/compress/width_mult_0.75/hist16/int8.pdmodel
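The directory layout in this reply can be sketched with os.path.join (the helper name is hypothetical; the float32 and hist16 subdirectory names follow the example paths above):

```python
import os

def compressed_model_paths(output_dir, width_mult, algo="hist", batch_size=16):
    """Build the pruned and quantized model paths for one pruning width."""
    pruned_dir = os.path.join(output_dir, "width_mult_" + str(width_mult))
    return {
        # Pruned (float32) model lives directly under the width subdirectory.
        "pruned": os.path.join(pruned_dir, "float32"),
        # Quantized model goes one level deeper, named after algo + batch size.
        "quantized": os.path.join(pruned_dir,
                                  "{}{}".format(algo, batch_size),
                                  "int8.pdmodel"),
    }

paths = compressed_model_paths("best_models/CLUEWSC2020/compress", 0.75)
print(paths["pruned"])
print(paths["quantized"])
```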

@@ -763,5 +733,4 @@ def soft_cross_entropy(inp, target):


Trainer.compress = compress
Trainer.prune = prune
Trainer.quant = quant

Why was prune removed while quant was kept?

Contributor Author

Because it's unclear whether merging other pruning methods with dynabert would save code if they are added later, whereas quantization probably can be merged. We also don't plan to expose this quant interface publicly.
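The Trainer.compress = compress pattern in the diff above attaches plain functions to a class after its definition; a generic sketch with toy names (not the real Trainer):

```python
class Trainer:
    def __init__(self, model_name):
        self.model_name = model_name

def compress(self, strategy="dynabert+ptq"):
    # `self` binds automatically once the function is assigned to the class.
    return "compressing {} with {}".format(self.model_name, strategy)

# Attach after the class definition, as the PR does for compress/quant.
Trainer.compress = compress

t = Trainer("ernie-3.0")
print(t.compress())  # compressing ernie-3.0 with dynabert+ptq
```

This keeps the compression logic in a separate module while still exposing it as a Trainer method.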

@tianxin1860 tianxin1860 changed the title Update compression API Add model compression API into PaddleNLP Jul 14, 2022
@tianxin1860 tianxin1860 changed the title Add model compression API into PaddleNLP Add model compression API into Jul 14, 2022
@tianxin1860 tianxin1860 changed the title Add model compression API into Add model compression API Jul 14, 2022
@tianxin1860 tianxin1860 left a comment

The Usage part of the PR description doesn't look like it has been updated?

Comment on lines 164 to 165
# Calling `set_config` is not necessary
# configs.set_config(batch_size_list=[4, 8])
@tianxin1860 tianxin1860 Jul 14, 2022

The first line says set_config isn't needed, but the second line sets the config?

Contributor Author

The intent was to show that this can be set or left unset; it's commented out here, so the code won't run. The clearer usage probably needs to be spelled out in the documentation.

Right, use comments to convey your intent clearly, so others aren't misled or confused.

@LiuChiachi LiuChiachi force-pushed the update-compress-api branch 3 times, most recently from 08c0783 to 0bb8600 Compare July 18, 2022 09:55
@tianxin1860 tianxin1860 left a comment

Leave some comments

ModelArguments,
)
from question_answering import QuestionAnsweringTrainer, CrossEntropyLossForSQuAD, prepare_train_features, prepare_validation_features
from utils import ALL_DATASETS, DataArguments, ModelArguments

A question: why haven't these three data types (ALL_DATASETS, DataArguments, ModelArguments) gone into the Trainer framework, instead of living in ernie1.0/finetune/utils? @ZHUI

Collaborator

These are custom, user-defined things.

Collaborator

They are closely tied to the data and task type. The ernie-3.0 and ernie-1.0 tasks here are fairly similar, so they are shared, but they may not apply to other models.

OK, that's clear now.

Comment on lines 42 to 48
if data_args.dataset in ALL_DATASETS:
    # If you customize your hyper-parameters in a yaml config, it will overwrite all args.
    config = ALL_DATASETS[data_args.dataset]
    for args in (model_args, data_args, compression_args):
        for arg in vars(args):
            if arg in config.keys():
                setattr(args, arg, config[arg])

Could you add a comment explaining this logic?

Contributor Author

There is a comment on line 43: if there is a custom yaml file, it overrides the parameters passed via args.
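The override logic under discussion, reduced to a self-contained sketch (SimpleNamespace stands in for the real argument dataclasses, and the values are made up):

```python
from types import SimpleNamespace

# Values loaded from a custom yaml config take precedence over CLI args.
yaml_config = {"strategy": "ptq", "batch_size_list": [4]}

model_args = SimpleNamespace(model_name_or_path="ernie-3.0")
compression_args = SimpleNamespace(strategy="dynabert+ptq",
                                   batch_size_list=[4, 8, 16])

for args in (model_args, compression_args):
    for arg in vars(args):          # vars() lists the namespace's attributes
        if arg in yaml_config:
            setattr(args, arg, yaml_config[arg])

print(compression_args.strategy)         # ptq
print(compression_args.batch_size_list)  # [4]
print(model_args.model_name_or_path)     # ernie-3.0 (not in the yaml, unchanged)
```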

# We use this argument because the texts in our dataset are lists of words (with a label for each word).
is_split_into_words=True,
return_length=True)
label_ids = example['ner_tags']

Is ner_tags universal across different NER tasks?

Contributor Author

Not necessarily; it depends on the structure of the dataset itself. Both the msra_ner and conll2002 datasets have the ner_tags key, and this example presumably just uses msra_ner as an illustration.

"""
Supports DynaBERT strategy now.
Supports pruning dynabert and post-training quantization. If both are
needed, pruning dynabertwould be performed before quantizaton.

  1. Typo: "dynabertwould" should be "DynaBERT would"
  2. Should code comments uniformly use the proper noun from the original paper, DynaBERT?

Contributor Author

Thanks for pointing this out; changed everything to DynaBERT as in the original paper.

@tianxin1860 tianxin1860 left a comment

LGTM

@LiuChiachi LiuChiachi merged commit 4c54f6d into PaddlePaddle:develop Aug 10, 2022
@LiuChiachi LiuChiachi mentioned this pull request Aug 12, 2022
3 tasks