Commit: support codegeex4 (#1305)

Jintao-Huang committed Jul 6, 2024
1 parent fb8c4e1 · commit 9796af9
Showing 13 changed files with 121 additions and 59 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -47,6 +47,7 @@ SWIFT has rich documentation for users, please check [here](https://github.com/
SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

## 🎉 News
- 2024.07.06: Support codegeex4-9b-chat.
- 2024.07.04: Support internlm2_5-7b series: internlm2_5-7b, internlm2_5-7b-chat, internlm2_5-7b-chat-1m.
- 2024.07.02: Support for using vLLM for accelerating inference and deployment of multimodal large models such as the llava series and phi3-vision models. You can refer to the [Multimodal & vLLM Inference Acceleration Documentation](docs/source_en/Multi-Modal/vllm-inference-acceleration.md) for more information.
- 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, refer to [here](docs/source_en/Multi-Modal/llava-best-practice.md).
@@ -387,6 +388,7 @@ swift sft \

#### Multi-node Multi-GPU
```shell
# If multiple machines share a disk, please additionally specify `--save_on_each_node false`.
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=2 \
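NODE_RANK=0 \
MASTER_ADDR=127.0.0.1 \
NPROC_PER_NODE=8 \
swift sft \
    --model_type qwen-7b-chat \
    --dataset blossom-math-zh \
    --save_on_each_node false \
    --output_dir output
# The lines from NODE_RANK onward are a sketch completing the node0 command
# the diff truncates here: the environment variables follow SWIFT's usual
# multi-node convention, and the model_type/dataset values are placeholders.
# node1 repeats the same command with NODE_RANK=1.
```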
@@ -507,7 +509,7 @@ The complete list of supported models and datasets can be found at [Supported Mo
| Model Type | Model Introduction | Language | Model Size | Model Type |
|------------------------------------------------|------------------------------------------------------------------------|--------------------|----------------------------------------|------------------------------------------- |
| Qwen<br>Qwen1.5<br>Qwen2 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4<br>Codegeex4 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| Baichuan<br>Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
4 changes: 3 additions & 1 deletion README_CN.md
@@ -48,6 +48,7 @@ SWIFT has rich documentation for users; if you have any questions, please check [here](https:
You can try the SWIFT web-ui on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

## 🎉 News
- 2024.07.06: Support codegeex4-9b-chat.
- 2024.07.04: Support internlm2_5-7b series: internlm2_5-7b, internlm2_5-7b-chat, internlm2_5-7b-chat-1m.
- 2024.07.02: Support for using vLLM for accelerating inference and deployment of multimodal large models such as the llava series and phi3-vision models. You can refer to the [Multimodal & vLLM Inference Acceleration Documentation](docs/source/Multi-Modal/vLLM推理加速文档.md) for more information.
- 2024.07.02: Support for `llava1_6-vicuna-7b-instruct`, `llava1_6-vicuna-13b-instruct` and other llava-hf models. For best practices, refer to [here](docs/source/Multi-Modal/llava最佳实践.md).
@@ -384,6 +385,7 @@ swift sft \

#### Multi-node Multi-GPU
```shell
# If multiple machines share a disk, please additionally specify `--save_on_each_node false` in each machine's shell script.
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NNODES=2 \
@@ -503,7 +505,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
| Model Type | Model Introduction | Language | Model Size | Model Type |
|------------------------------------------------|------------------------------------------------------------------------|--------------------|----------------------------------------|------------------------------------------- |
| Qwen<br>Qwen1.5<br>Qwen2 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 | [Zhipu ChatGLM series models](https://github.com/THUDM/) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4<br>Codegeex4 | [Zhipu ChatGLM series models](https://github.com/THUDM/) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
| Baichuan<br>Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
| Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
| XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
2 changes: 2 additions & 0 deletions docs/source/LLM/LLM微调文档.md
@@ -100,6 +100,7 @@ swift sft \
--output_dir output \

# Multi-node Multi-GPU
# If multiple machines share a disk, please additionally specify `--save_on_each_node false` in each machine's shell script.
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NNODES=2 \
@@ -246,6 +247,7 @@ print(f'history: {history}')

Using **Dataset** for evaluation:
```bash
# If you want to infer all dataset samples, please additionally specify `--show_dataset_sample -1`.
# Direct inference
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' \
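    --load_dataset_config true \
    --show_dataset_sample -1
# The two flags above are a sketch completing the command the diff truncates:
# `--show_dataset_sample -1` applies the tip at the top of this block, while
# `--load_dataset_config true` (reuse the training-time dataset config) is an
# assumption, not part of this diff.
```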
1 change: 1 addition & 0 deletions docs/source/LLM/VLLM推理加速与部署.md
@@ -208,6 +208,7 @@ CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true

# Evaluate using dataset
# If you want to infer all dataset samples, please additionally specify `--show_dataset_sample -1`.
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged' \
--infer_backend vllm \
2 changes: 1 addition & 1 deletion docs/source/LLM/命令行参数.md
@@ -30,7 +30,7 @@
- `--add_output_dir_suffix`: Default is `True`, indicating that a suffix of `model_type` and the fine-tuning version number will be appended to the `output_dir` directory. Set to `False` to avoid this behavior.
- `--ddp_backend`: Backend support for distributed training, default is `None`. Options include: 'nccl', 'gloo', 'mpi', 'ccl'.
- `--seed`: Global seed, default is `42`. Used to reproduce training results.
- `--resume_from_checkpoint`: Used for resuming training from a checkpoint, default is `None`. You can set it to the path of the checkpoint, for example: `'output/qwen-7b-chat/vx-xxx/checkpoint-xxx'`, to resume training from that point. Supports adjusting `--resume_only_model` to only read the model file during checkpoint continuation.
- `--resume_from_checkpoint`: Used for resuming training from a checkpoint, default is `None`. You can set it to the path of the checkpoint, for example: `--resume_from_checkpoint output/qwen-7b-chat/vx-xxx/checkpoint-xxx`, to resume training from that point. Supports adjusting `--resume_only_model` to only read the model file during checkpoint continuation.
- `--resume_only_model`: Default is `False`, i.e. strict checkpoint continuation: this reads the weights of the model, optimizer, and lr_scheduler as well as the random seed stored on each device, and continues training from the step where the last run paused. If set to `True`, only the model weights are read.
- `--dtype`: torch_dtype when loading the base model, default is `'AUTO'`, i.e. intelligently select dtype: if the machine does not support bf16, use fp16; if `MODEL_MAPPING` specifies torch_dtype for the corresponding model, use its dtype; otherwise use bf16. Options include: 'bf16', 'fp16', 'fp32'.
- `--dataset`: Used to select the training dataset, default is `[]`. Available datasets are listed in [Supported Models and Datasets](支持的模型和数据集.md#数据集). To train with multiple datasets, separate them with ',' or ' ', for example: `--dataset alpaca-en,alpaca-zh` or `--dataset alpaca-en alpaca-zh`. Supports ModelScope Hub / HuggingFace Hub / local paths, subset selection, and dataset sampling; each dataset is specified in the format: `[HF or MS::]{dataset_name} or {dataset_id} or {dataset_path}[:subset1/subset2/...][#dataset_sample]`. In the simplest case, only dataset_name, dataset_id, or dataset_path needs to be specified. For custom datasets, see the [Customizing and Extending Datasets documentation](自定义与拓展.md#自定义数据集).
7 changes: 4 additions & 3 deletions docs/source/LLM/支持的模型和数据集.md
@@ -110,9 +110,10 @@
|chatglm3-6b-32k|[ZhipuAI/chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/chatglm3-6b-32k](https://huggingface.co/THUDM/chatglm3-6b-32k)|
|chatglm3-6b-128k|[ZhipuAI/chatglm3-6b-128k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/chatglm3-6b-128k](https://huggingface.co/THUDM/chatglm3-6b-128k)|
|codegeex2-6b|[ZhipuAI/codegeex2-6b](https://modelscope.cn/models/ZhipuAI/codegeex2-6b/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|transformers<4.34|coding|[THUDM/codegeex2-6b](https://huggingface.co/THUDM/codegeex2-6b)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|codegeex4-9b-chat|[ZhipuAI/codegeex4-all-9b](https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b/summary)|query_key_value|codegeex4|&#x2714;|&#x2714;|transformers<4.42|coding|[THUDM/codegeex4-all-9b](https://huggingface.co/THUDM/codegeex4-all-9b)|
|llama2-7b|[modelscope/Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|
|llama2-7b-chat|[modelscope/Llama-2-7b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|
|llama2-13b|[modelscope/Llama-2-13b-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)|
2 changes: 1 addition & 1 deletion docs/source_en/LLM/Command-line-parameters.md
@@ -29,7 +29,7 @@
- `--add_output_dir_suffix`: Default is `True`, indicating that a suffix of `model_type` and fine-tuning version number will be appended to the `output_dir` directory. Set to `False` to avoid this behavior.
- `--ddp_backend`: Backend support for distributed training, default is `None`. Options include: 'nccl', 'gloo', 'mpi', 'ccl'.
- `--seed`: Global seed, default is `42`. Used to reproduce training results.
- `--resume_from_checkpoint`: Used for resuming training from a checkpoint, default is `None`. You can set it to the path of the checkpoint, for example: `'output/qwen-7b-chat/vx-xxx/checkpoint-xxx'`, to resume training from that point. Supports adjusting `--resume_only_model` to only read the model file during checkpoint continuation.
- `--resume_from_checkpoint`: Used for resuming training from a checkpoint, default is `None`. You can set it to the path of the checkpoint, for example: `--resume_from_checkpoint output/qwen-7b-chat/vx-xxx/checkpoint-xxx`, to resume training from that point. Supports adjusting `--resume_only_model` to only read the model file during checkpoint continuation.
- `--resume_only_model`: Default is `False`, which means strict checkpoint continuation, this will read the weights of the model, optimizer, lr_scheduler, and the random seeds stored on each device, and continue training from the last paused steps. If set to `True`, it will only read the weights of the model.
- `--dtype`: torch_dtype when loading base model, default is `'AUTO'`, i.e. intelligently select dtype: if machine does not support bf16, use fp16; if `MODEL_MAPPING` specifies torch_dtype for corresponding model, use its dtype; otherwise use bf16. Options include: 'bf16', 'fp16', 'fp32'.
- `--dataset`: Used to select the training dataset, default is `[]`. You can see the list of available datasets [here](Supported-models-datasets.md#Datasets). If you need to train with multiple datasets, you can use ',' or ' ' to separate them, for example: `--dataset alpaca-en,alpaca-zh` or `--dataset alpaca-en alpaca-zh`. It supports Modelscope Hub/HuggingFace Hub/local paths, subset selection, and dataset sampling. The specified format for each dataset is as follows: `[HF or MS::]{dataset_name} or {dataset_id} or {dataset_path}[:subset1/subset2/...][#dataset_sample]`. The simplest case requires specifying only dataset_name, dataset_id, or dataset_path. Customizing datasets can be found in the [Customizing and Extending Datasets document](Customization.md#custom-dataset)
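To make the flag formats above concrete, here is a minimal sketch combining `--resume_from_checkpoint`, `--resume_only_model`, and a `--dataset` specification (the checkpoint path and dataset names are placeholders following the patterns documented in this list; not part of this diff):

```shell
# Resume from a checkpoint, loading only the model weights, and train on
# 500 sampled rows of alpaca-zh from ModelScope plus the full alpaca-en.
swift sft \
    --resume_from_checkpoint output/qwen-7b-chat/vx-xxx/checkpoint-xxx \
    --resume_only_model true \
    --dataset MS::alpaca-zh#500 alpaca-en
```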
2 changes: 2 additions & 0 deletions docs/source_en/LLM/LLM-fine-tuning.md
@@ -96,6 +96,7 @@ swift sft \
--output_dir output \

# Multi-node multi-GPU
# If multiple machines share a disk, please additionally specify `--save_on_each_node false`.
# node0
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NNODES=2 \
@@ -241,6 +242,7 @@ print(f'history: {history}')

Using **Dataset** for evaluation:
```bash
# If you want to infer all dataset samples, please additionally specify `--show_dataset_sample -1`.
# Direct inference
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' \
7 changes: 4 additions & 3 deletions docs/source_en/LLM/Supported-models-datasets.md
@@ -110,9 +110,10 @@ The table below introduces all models supported by SWIFT:
|chatglm3-6b-32k|[ZhipuAI/chatglm3-6b-32k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-32k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/chatglm3-6b-32k](https://huggingface.co/THUDM/chatglm3-6b-32k)|
|chatglm3-6b-128k|[ZhipuAI/chatglm3-6b-128k](https://modelscope.cn/models/ZhipuAI/chatglm3-6b-128k/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/chatglm3-6b-128k](https://huggingface.co/THUDM/chatglm3-6b-128k)|
|codegeex2-6b|[ZhipuAI/codegeex2-6b](https://modelscope.cn/models/ZhipuAI/codegeex2-6b/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|transformers<4.34|coding|[THUDM/codegeex2-6b](https://huggingface.co/THUDM/codegeex2-6b)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|&#x2718;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|glm4-9b|[ZhipuAI/glm-4-9b](https://modelscope.cn/models/ZhipuAI/glm-4-9b/summary)|query_key_value|chatglm-generation|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b](https://huggingface.co/THUDM/glm-4-9b)|
|glm4-9b-chat|[ZhipuAI/glm-4-9b-chat](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat](https://huggingface.co/THUDM/glm-4-9b-chat)|
|glm4-9b-chat-1m|[ZhipuAI/glm-4-9b-chat-1m](https://modelscope.cn/models/ZhipuAI/glm-4-9b-chat-1m/summary)|query_key_value|chatglm3|&#x2714;|&#x2714;|transformers<4.42|-|[THUDM/glm-4-9b-chat-1m](https://huggingface.co/THUDM/glm-4-9b-chat-1m)|
|codegeex4-9b-chat|[ZhipuAI/codegeex4-all-9b](https://modelscope.cn/models/ZhipuAI/codegeex4-all-9b/summary)|query_key_value|codegeex4|&#x2714;|&#x2714;|transformers<4.42|coding|[THUDM/codegeex4-all-9b](https://huggingface.co/THUDM/codegeex4-all-9b)|
|llama2-7b|[modelscope/Llama-2-7b-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf)|
|llama2-7b-chat|[modelscope/Llama-2-7b-chat-ms](https://modelscope.cn/models/modelscope/Llama-2-7b-chat-ms/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)|
|llama2-13b|[modelscope/Llama-2-13b-ms](https://modelscope.cn/models/modelscope/Llama-2-13b-ms/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;||-|[meta-llama/Llama-2-13b-hf](https://huggingface.co/meta-llama/Llama-2-13b-hf)|
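With the codegeex4-9b-chat entry above, the new model can be exercised like any other row in this table. A minimal sketch using the standard `swift infer` entry point (not part of this diff; the model is pulled from ModelScope on first use):

```shell
# Chat with the codegeex4 model added by this commit.
CUDA_VISIBLE_DEVICES=0 swift infer --model_type codegeex4-9b-chat
```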
1 change: 1 addition & 0 deletions docs/source_en/LLM/VLLM-inference-acceleration-and-deployment.md
@@ -205,6 +205,7 @@ CUDA_VISIBLE_DEVICES=0 swift export \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx' --merge_lora true

# Evaluate using dataset
# If you want to infer all dataset samples, please additionally specify `--show_dataset_sample -1`.
CUDA_VISIBLE_DEVICES=0 swift infer \
--ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged' \
--infer_backend vllm \
1 change: 1 addition & 0 deletions requirements/framework.txt
@@ -2,6 +2,7 @@ accelerate
aiohttp
binpacking
dacite
datasets<2.19
jieba
matplotlib
modelscope>=1.14
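For context, the new `datasets<2.19` pin takes effect through the usual requirements install; a standard pip invocation (not shown in this diff):

```shell
# Installing the framework requirements picks up the datasets<2.19 pin.
pip install -r requirements/framework.txt
```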