
feat(src): add kv cache int8 quantization #22

Merged
merged 9 commits on Jun 28, 2023
Conversation

tpoisonooo
Collaborator

@tpoisonooo tpoisonooo commented Jun 27, 2023

Features

  • kv cache now supports an int8 option
  • cannot be used together with use_context_fmha, and cannot be used with fp32
  • adds quant_policy to the config file; the value is 4, and the default is 0 (disabled)
  • running with it requires executing the quantization script first
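
The int8 kv cache option can be understood as symmetric quantization with a per-tensor scale produced offline by the quantization script. The actual kernels live in the C++/CUDA source; the following is only a minimal numpy sketch of the quantize/dequantize math, with a hypothetical scale computed as absmax / 127:

```python
import numpy as np

def quantize_kv_int8(kv: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric int8 quantization: q = clip(round(x / scale), -128, 127)."""
    q = np.round(kv / scale)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_kv_int8(kv_q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values: x ≈ q * scale."""
    return kv_q.astype(np.float32) * scale

# The scale would come from the offline calibration/quantization script;
# here we derive it from the tensor itself for illustration.
kv = np.random.randn(2, 8, 128).astype(np.float32)
scale = float(np.abs(kv).max() / 127.0)

kv_q = quantize_kv_int8(kv, scale)
kv_hat = dequantize_kv_int8(kv_q, scale)

assert kv_q.dtype == np.int8
assert np.abs(kv - kv_hat).max() <= scale  # error bounded by one quant step
```

Storing the cache as int8 halves its memory footprint relative to fp16, at the cost of one dequantize step when the cached keys/values are read back.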

Removed

  • removed params.int8_mode == 2; added enum QuantPolicy, with values 1 and 2 reserved and new policies starting from 4 (the next ones will be 8, 16, 32, ...)
  • disabled FP8 and BF16, which are unused
  • removed ia3, which is unused
  • removed transpose_key_cache, which is unused
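
The power-of-two values make QuantPolicy usable as a bit mask, so several policies could later be combined in one config value. The real enum is in the C++ source; this is a hypothetical Python sketch of the flag layout described above:

```python
from enum import IntFlag

class QuantPolicy(IntFlag):
    NONE = 0           # default: quantization disabled (quant_policy = 0)
    RESERVED_1 = 1     # values 1 and 2 kept for the old int8_mode semantics
    RESERVED_2 = 2
    KV_CACHE_INT8 = 4  # int8 kv cache (quant_policy = 4 in the config)
    # future policies would continue with 8, 16, 32, ...

def kv_cache_int8_enabled(policy: int) -> bool:
    """Check the kv-cache-int8 bit regardless of any other flags set."""
    return bool(policy & QuantPolicy.KV_CACHE_INT8)

assert kv_cache_int8_enabled(4)
assert not kv_cache_int8_enabled(0)
```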

Screenshot of the 7B CN model running

[screenshot]

@tpoisonooo tpoisonooo changed the title from "WIP feat(src): add int8 and compile passed" to "feat(src): add int8 and compile passed" Jun 28, 2023
@tpoisonooo tpoisonooo requested a review from lvhan028 June 28, 2023 08:50
@tpoisonooo tpoisonooo changed the title from "feat(src): add int8 and compile passed" to "feat(src): add kv cache int8 quantization" Jun 28, 2023
@lvhan028 lvhan028 requested a review from lzhangzz June 28, 2023 09:17
@lvhan028 lvhan028 merged commit cc93136 into InternLM:main Jun 28, 2023