support modelscope-agent & fix bugs (modelscope#768)
Jintao-Huang committed Apr 22, 2024
1 parent 56940b5 commit e8fa3e0
Showing 8 changed files with 53 additions and 17 deletions.
8 changes: 5 additions & 3 deletions docs/source/LLM/支持的模型和数据集.md
@@ -30,6 +30,8 @@
|qwen-72b-chat|[qwen/Qwen-72B-Chat](https://modelscope.cn/models/qwen/Qwen-72B-Chat/summary)|c_attn|qwen|✔|✔||-|[Qwen/Qwen-72B-Chat](https://huggingface.co/Qwen/Qwen-72B-Chat)|
|qwen-72b-chat-int4|[qwen/Qwen-72B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int4/summary)|c_attn|qwen|✔|✔|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int4](https://huggingface.co/Qwen/Qwen-72B-Chat-Int4)|
|qwen-72b-chat-int8|[qwen/Qwen-72B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int8/summary)|c_attn|qwen|✔|✘|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int8](https://huggingface.co/Qwen/Qwen-72B-Chat-Int8)|
|modelscope-agent-7b|[iic/ModelScope-Agent-7B](https://modelscope.cn/models/iic/ModelScope-Agent-7B/summary)|c_attn|modelscope-agent|✔|✘||-|-|
|modelscope-agent-14b|[iic/ModelScope-Agent-14B](https://modelscope.cn/models/iic/ModelScope-Agent-14B/summary)|c_attn|modelscope-agent|✔|✘||-|-|
|qwen1half-0_5b|[qwen/Qwen1.5-0.5B](https://modelscope.cn/models/qwen/Qwen1.5-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)|
|qwen1half-1_8b|[qwen/Qwen1.5-1.8B](https://modelscope.cn/models/qwen/Qwen1.5-1.8B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B)|
|qwen1half-4b|[qwen/Qwen1.5-4B](https://modelscope.cn/models/qwen/Qwen1.5-4B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B)|
@@ -269,8 +271,8 @@

| Dataset Name | Dataset ID | Train Size | Val Size | Statistic (token) | Tags | HF Dataset ID |
| ------------ | ---------- | ---------- | -------- | ----------------- | ---- | ------------- |
|🔥ms-bench|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|316228|0|345.0±441.3, min=22, max=30960|chat, general, multi-round|-|
|🔥ms-bench-mini|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|19492|0|353.9±439.4, min=29, max=12078|chat, general, multi-round|-|
|🔥ms-bench|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|316931|0|347.3±444.1, min=22, max=30960|chat, general, multi-round|-|
|🔥ms-bench-mini|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|19960|0|356.6±443.3, min=29, max=12078|chat, general, multi-round|-|
|🔥alpaca-en|[AI-ModelScope/alpaca-gpt4-data-en](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-en/summary)|52002|0|176.2±125.8, min=26, max=740|chat, general|[vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4)|
|🔥alpaca-zh|[AI-ModelScope/alpaca-gpt4-data-zh](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-zh/summary)|48818|0|162.1±93.9, min=26, max=856|chat, general|[c-s-ale/alpaca-gpt4-data-zh](https://huggingface.co/datasets/c-s-ale/alpaca-gpt4-data-zh)|
|multi-alpaca-all|[damo/nlp_polylm_multialpaca_sft](https://modelscope.cn/datasets/damo/nlp_polylm_multialpaca_sft/summary)|131867|0|112.9±50.6, min=26, max=1226|chat, general, multilingual|-|
@@ -310,7 +312,7 @@
|🔥disc-med-sft-zh|[AI-ModelScope/DISC-Med-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Med-SFT/summary)|441767|0|354.1±193.1, min=25, max=2231|chat, medical|[Flmc/DISC-Med-SFT](https://huggingface.co/datasets/Flmc/DISC-Med-SFT)|
|lawyer-llama-zh|[AI-ModelScope/lawyer_llama_data](https://modelscope.cn/datasets/AI-ModelScope/lawyer_llama_data/summary)|21476|0|194.4±91.7, min=27, max=924|chat, law|[Skepsun/lawyer_llama_data](https://huggingface.co/datasets/Skepsun/lawyer_llama_data)|
|tigerbot-law-zh|[AI-ModelScope/tigerbot-law-plugin](https://modelscope.cn/datasets/AI-ModelScope/tigerbot-law-plugin/summary)|55895|0|109.9±126.4, min=37, max=18878|text-generation, law, pretrained|[TigerResearch/tigerbot-law-plugin](https://huggingface.co/datasets/TigerResearch/tigerbot-law-plugin)|
|🔥disc-law-sft-zh|[AI-ModelScope/DISC-Law-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Law-SFT/summary)|166758|0|533.7±495.4, min=30, max=15169|chat, law|-|
|🔥disc-law-sft-zh|[AI-ModelScope/DISC-Law-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Law-SFT/summary)|166758|0|533.7±495.4, min=30, max=15169|chat, law|[ShengbinYue/DISC-Law-SFT](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT)|
|🔥blossom-math-zh|[AI-ModelScope/blossom-math-v2](https://modelscope.cn/datasets/AI-ModelScope/blossom-math-v2/summary)|10000|0|169.3±58.7, min=35, max=563|chat, math|[Azure99/blossom-math-v2](https://huggingface.co/datasets/Azure99/blossom-math-v2)|
|school-math-zh|[AI-ModelScope/school_math_0.25M](https://modelscope.cn/datasets/AI-ModelScope/school_math_0.25M/summary)|248480|0|157.6±72.1, min=33, max=3450|chat, math|[BelleGroup/school_math_0.25M](https://huggingface.co/datasets/BelleGroup/school_math_0.25M)|
|open-platypus-en|[AI-ModelScope/Open-Platypus](https://modelscope.cn/datasets/AI-ModelScope/Open-Platypus/summary)|24926|0|367.9±254.8, min=30, max=3951|chat, math|[garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)|
8 changes: 5 additions & 3 deletions docs/source_en/LLM/Supported-models-datasets.md
@@ -30,6 +30,8 @@ The table below introduces all models supported by SWIFT:
|qwen-72b-chat|[qwen/Qwen-72B-Chat](https://modelscope.cn/models/qwen/Qwen-72B-Chat/summary)|c_attn|qwen|✔|✔||-|[Qwen/Qwen-72B-Chat](https://huggingface.co/Qwen/Qwen-72B-Chat)|
|qwen-72b-chat-int4|[qwen/Qwen-72B-Chat-Int4](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int4/summary)|c_attn|qwen|✔|✔|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int4](https://huggingface.co/Qwen/Qwen-72B-Chat-Int4)|
|qwen-72b-chat-int8|[qwen/Qwen-72B-Chat-Int8](https://modelscope.cn/models/qwen/Qwen-72B-Chat-Int8/summary)|c_attn|qwen|✔|✘|auto_gptq>=0.5|-|[Qwen/Qwen-72B-Chat-Int8](https://huggingface.co/Qwen/Qwen-72B-Chat-Int8)|
|modelscope-agent-7b|[iic/ModelScope-Agent-7B](https://modelscope.cn/models/iic/ModelScope-Agent-7B/summary)|c_attn|modelscope-agent|✔|✘||-|-|
|modelscope-agent-14b|[iic/ModelScope-Agent-14B](https://modelscope.cn/models/iic/ModelScope-Agent-14B/summary)|c_attn|modelscope-agent|✔|✘||-|-|
|qwen1half-0_5b|[qwen/Qwen1.5-0.5B](https://modelscope.cn/models/qwen/Qwen1.5-0.5B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)|
|qwen1half-1_8b|[qwen/Qwen1.5-1.8B](https://modelscope.cn/models/qwen/Qwen1.5-1.8B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-1.8B](https://huggingface.co/Qwen/Qwen1.5-1.8B)|
|qwen1half-4b|[qwen/Qwen1.5-4B](https://modelscope.cn/models/qwen/Qwen1.5-4B/summary)|q_proj, k_proj, v_proj|default-generation|✔|✔|transformers>=4.37|-|[Qwen/Qwen1.5-4B](https://huggingface.co/Qwen/Qwen1.5-4B)|
@@ -269,8 +271,8 @@ The table below introduces the datasets supported by SWIFT:

| Dataset Name | Dataset ID | Train Size | Val Size | Statistic (token) | Tags | HF Dataset ID |
| ------------ | ---------- | ---------- | -------- | ----------------- | ---- | ------------- |
|🔥ms-bench|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|316228|0|345.0±441.3, min=22, max=30960|chat, general, multi-round|-|
|🔥ms-bench-mini|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|19492|0|353.9±439.4, min=29, max=12078|chat, general, multi-round|-|
|🔥ms-bench|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|316931|0|347.3±444.1, min=22, max=30960|chat, general, multi-round|-|
|🔥ms-bench-mini|[iic/ms_bench](https://modelscope.cn/datasets/iic/ms_bench/summary)|19960|0|356.6±443.3, min=29, max=12078|chat, general, multi-round|-|
|🔥alpaca-en|[AI-ModelScope/alpaca-gpt4-data-en](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-en/summary)|52002|0|176.2±125.8, min=26, max=740|chat, general|[vicgalle/alpaca-gpt4](https://huggingface.co/datasets/vicgalle/alpaca-gpt4)|
|🔥alpaca-zh|[AI-ModelScope/alpaca-gpt4-data-zh](https://modelscope.cn/datasets/AI-ModelScope/alpaca-gpt4-data-zh/summary)|48818|0|162.1±93.9, min=26, max=856|chat, general|[c-s-ale/alpaca-gpt4-data-zh](https://huggingface.co/datasets/c-s-ale/alpaca-gpt4-data-zh)|
|multi-alpaca-all|[damo/nlp_polylm_multialpaca_sft](https://modelscope.cn/datasets/damo/nlp_polylm_multialpaca_sft/summary)|131867|0|112.9±50.6, min=26, max=1226|chat, general, multilingual|-|
@@ -310,7 +312,7 @@ The table below introduces the datasets supported by SWIFT:
|🔥disc-med-sft-zh|[AI-ModelScope/DISC-Med-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Med-SFT/summary)|441767|0|354.1±193.1, min=25, max=2231|chat, medical|[Flmc/DISC-Med-SFT](https://huggingface.co/datasets/Flmc/DISC-Med-SFT)|
|lawyer-llama-zh|[AI-ModelScope/lawyer_llama_data](https://modelscope.cn/datasets/AI-ModelScope/lawyer_llama_data/summary)|21476|0|194.4±91.7, min=27, max=924|chat, law|[Skepsun/lawyer_llama_data](https://huggingface.co/datasets/Skepsun/lawyer_llama_data)|
|tigerbot-law-zh|[AI-ModelScope/tigerbot-law-plugin](https://modelscope.cn/datasets/AI-ModelScope/tigerbot-law-plugin/summary)|55895|0|109.9±126.4, min=37, max=18878|text-generation, law, pretrained|[TigerResearch/tigerbot-law-plugin](https://huggingface.co/datasets/TigerResearch/tigerbot-law-plugin)|
|🔥disc-law-sft-zh|[AI-ModelScope/DISC-Law-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Law-SFT/summary)|166758|0|533.7±495.4, min=30, max=15169|chat, law|-|
|🔥disc-law-sft-zh|[AI-ModelScope/DISC-Law-SFT](https://modelscope.cn/datasets/AI-ModelScope/DISC-Law-SFT/summary)|166758|0|533.7±495.4, min=30, max=15169|chat, law|[ShengbinYue/DISC-Law-SFT](https://huggingface.co/datasets/ShengbinYue/DISC-Law-SFT)|
|🔥blossom-math-zh|[AI-ModelScope/blossom-math-v2](https://modelscope.cn/datasets/AI-ModelScope/blossom-math-v2/summary)|10000|0|169.3±58.7, min=35, max=563|chat, math|[Azure99/blossom-math-v2](https://huggingface.co/datasets/Azure99/blossom-math-v2)|
|school-math-zh|[AI-ModelScope/school_math_0.25M](https://modelscope.cn/datasets/AI-ModelScope/school_math_0.25M/summary)|248480|0|157.6±72.1, min=33, max=3450|chat, math|[BelleGroup/school_math_0.25M](https://huggingface.co/datasets/BelleGroup/school_math_0.25M)|
|open-platypus-en|[AI-ModelScope/Open-Platypus](https://modelscope.cn/datasets/AI-ModelScope/Open-Platypus/summary)|24926|0|367.9±254.8, min=30, max=3951|chat, math|[garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)|
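The documentation diff above adds `modelscope-agent-7b` / `modelscope-agent-14b` to the supported-model table and refreshes the `ms-bench` statistics. Below is a minimal sketch of fine-tuning the new model type with swift's Python API; `sft_main` and `SftArguments` are swift's standard entry points, and the argument values are illustrative rather than taken from the commit.

```python
# Hedged sketch (illustrative values, not part of this commit):
# LoRA fine-tuning of the newly registered model type via swift's Python API.
from swift.llm import DatasetName, ModelType, SftArguments, sft_main

sft_args = SftArguments(
    model_type=ModelType.modelscope_agent_7b,
    dataset=[DatasetName.ms_bench],  # the ms-bench dataset listed in the table above
    output_dir='output')             # illustrative output path
result = sft_main(sft_args)
```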
16 changes: 11 additions & 5 deletions swift/llm/infer.py
@@ -99,7 +99,7 @@ def merge_lora(args: InferArguments,
else:
model, template = prepare_model_template(
args, device_map=args.merge_device_map, verbose=False)
logger.info('Merge lora...')
logger.info('Merge LoRA...')
Swift.merge_and_unload(model)
model = model.model
logger.info('Saving merged weights...')
@@ -254,9 +254,15 @@ def llm_infer(args: InferArguments) -> None:
# Inference
result = []
jsonl_path = None
if args.save_result and args.ckpt_dir is not None:
time = dt.datetime.now().strftime('%Y%m%d-%H%M%S')
jsonl_path = os.path.join(args.ckpt_dir, f'infer_result_{time}.jsonl')
if args.save_result:
result_dir = args.ckpt_dir
if result_dir is None:
result_dir = model.model_dir
if result_dir is not None:
result_dir = os.path.join(result_dir, 'infer_result')
os.makedirs(result_dir, exist_ok=True)
time = dt.datetime.now().strftime('%Y%m%d-%H%M%S')
jsonl_path = os.path.join(result_dir, f'{time}.jsonl')
if args.eval_human:
input_mode: Literal['S', 'M'] = 'S'
logger.info('Input `exit` or `quit` to exit the conversation.')
@@ -466,7 +472,7 @@ def llm_infer(args: InferArguments) -> None:
if images is not None:
print(f'[IMAGES]{images}')
print('-' * 50)
if args.save_result and args.ckpt_dir is not None:
if jsonl_path is not None:
logger.info(f'save_result_path: {jsonl_path}')
if args.val_dataset_sample == 10: # is default
logger.info(
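The reworked block above no longer requires `--ckpt_dir` for saving results: it falls back to the model's own directory, writes a timestamped JSONL file under an `infer_result/` subfolder, and the final log-guard keys off `jsonl_path` instead of re-checking `ckpt_dir`. A standalone sketch of that path resolution follows, with `resolve_result_path` as a hypothetical helper name.

```python
import datetime as dt
import os
from typing import Optional

def resolve_result_path(ckpt_dir: Optional[str],
                        model_dir: Optional[str],
                        save_result: bool = True) -> Optional[str]:
    """Sketch of the new save-path logic: prefer ckpt_dir, fall back to the
    model directory, and place results in <dir>/infer_result/<timestamp>.jsonl."""
    if not save_result:
        return None
    result_dir = ckpt_dir if ckpt_dir is not None else model_dir
    if result_dir is None:
        return None
    result_dir = os.path.join(result_dir, 'infer_result')
    os.makedirs(result_dir, exist_ok=True)
    time = dt.datetime.now().strftime('%Y%m%d-%H%M%S')
    return os.path.join(result_dir, f'{time}.jsonl')

# e.g. resolve_result_path(None, '/cache/ModelScope-Agent-7B')
# -> '/cache/ModelScope-Agent-7B/infer_result/20240422-103000.jsonl'
```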
6 changes: 3 additions & 3 deletions swift/llm/sft.py
@@ -161,9 +161,6 @@ def llm_sft(args: SftArguments) -> Dict[str, Union[str, Any]]:
args.self_cognition_sample,
args.model_name,
args.model_author)
if val_dataset is None:
training_args.evaluation_strategy = IntervalStrategy.NO
training_args.do_eval = False
logger.info(f'train_dataset: {train_dataset}')
logger.info(f'val_dataset: {val_dataset}')
template_kwargs = {}
@@ -202,6 +199,9 @@ def llm_sft(args: SftArguments) -> Dict[str, Union[str, Any]]:
train_dataset = LazyLLMDataset(train_dataset, template)
if val_dataset is not None:
val_dataset = LazyLLMDataset(val_dataset, template)
if val_dataset is None:
training_args.evaluation_strategy = IntervalStrategy.NO
training_args.do_eval = False

padding_to = args.max_length if args.sft_type == 'longlora' else None
data_collator = partial(template.data_collator, padding_to=padding_to)
7 changes: 5 additions & 2 deletions swift/llm/utils/dataset.py
@@ -669,15 +669,18 @@ def map_row(row):
'damo/MSAgent-Bench', ['train'], ['validation'],
ConversationsPreprocessor(
repair_conversations=partial(
_repair_agent_conversations, use_mini=True)),
_repair_agent_conversations, use_mini=True),
error_strategy='delete'),
get_dataset_from_repo,
tags=['chat', 'agent', 'multi-round'])
register_dataset(
DatasetName.damo_agent_zh,
'damo/MSAgent-Bench', ['train'], ['validation'],
ConversationsPreprocessor(
repair_conversations=partial(
_repair_agent_conversations, use_mini=False)),
            _repair_agent_conversations,
            use_mini=False),
        error_strategy='delete'),
get_dataset_from_repo,
tags=['chat', 'agent', 'multi-round'])

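Both `damo-agent` registrations now pass `error_strategy='delete'` to the preprocessor, so conversations that the repair function cannot fix are dropped instead of aborting preprocessing. The sketch below illustrates that behaviour with simplified, hypothetical names; it is not the actual `ConversationsPreprocessor` implementation.

```python
from typing import Callable, List, Optional

def preprocess_conversations(rows: List[dict],
                             repair: Callable[[list], Optional[list]],
                             error_strategy: str = 'raise') -> List[dict]:
    """Illustration only: repair each row, then either drop it or re-raise on failure."""
    out = []
    for row in rows:
        try:
            fixed = repair(row['conversations'])
            if fixed is None:
                raise ValueError('conversation could not be repaired')
            out.append({'conversations': fixed})
        except Exception:
            if error_strategy == 'delete':
                continue  # silently skip the malformed sample
            raise
    return out
```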
16 changes: 16 additions & 0 deletions swift/llm/utils/model.py
@@ -56,6 +56,8 @@ class ModelType:
qwen_72b_chat = 'qwen-72b-chat'
qwen_72b_chat_int4 = 'qwen-72b-chat-int4'
qwen_72b_chat_int8 = 'qwen-72b-chat-int8'
modelscope_agent_7b = 'modelscope-agent-7b'
modelscope_agent_14b = 'modelscope-agent-14b'
# qwen1.5
qwen1half_0_5b = 'qwen1half-0_5b'
qwen1half_1_8b = 'qwen1half-1_8b'
@@ -2749,6 +2751,20 @@ def get_model_tokenizer_qwen(model_dir: str,
return model, tokenizer


@register_model(
ModelType.modelscope_agent_7b,
'iic/ModelScope-Agent-7B',
LoRATM.qwen,
TemplateType.modelscope_agent,
support_flash_attn=True,
support_vllm=False)
@register_model(
ModelType.modelscope_agent_14b,
'iic/ModelScope-Agent-14B',
LoRATM.qwen,
TemplateType.modelscope_agent,
support_flash_attn=True,
support_vllm=False)
@register_model(
ModelType.codefuse_qwen_14b_chat,
'codefuse-ai/CodeFuse-QWen-14B',
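With the two `@register_model` decorators above, the ModelScope-Agent checkpoints load through the same path as the Qwen family (LoRA target modules from `LoRATM.qwen`, the new `modelscope-agent` template, flash-attn supported, vLLM disabled). A hedged usage sketch, assuming swift's standard `get_model_tokenizer` loader:

```python
# Hedged usage sketch (not from the commit): load a newly registered model by its
# ModelType name; the checkpoint is pulled from iic/ModelScope-Agent-7B on ModelScope.
from swift.llm import ModelType, get_model_tokenizer

model, tokenizer = get_model_tokenizer(ModelType.modelscope_agent_7b)
```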
3 changes: 2 additions & 1 deletion swift/llm/utils/protocol.py
@@ -33,7 +33,8 @@ class XRequestConfig:
repetition_penalty = 1.
"""
max_tokens: Optional[int] = None # None: max_model_len - num_tokens
temperature: Optional[float] = None # None: use deploy_args
# None: use deploy_args
temperature: Optional[float] = None
top_p: Optional[float] = None
repetition_penalty: Optional[float] = None

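For context, `XRequestConfig` fields left as `None` defer to the values fixed at deployment time (`deploy_args`), and `max_tokens=None` means "up to `max_model_len` minus the prompt length". A small hedged example, assuming `XRequestConfig` is re-exported from `swift.llm`:

```python
# Hedged sketch: unspecified sampling fields fall back to the server's deploy-time defaults.
from swift.llm import XRequestConfig  # assumed re-export; defined in swift/llm/utils/protocol.py

cfg = XRequestConfig(max_tokens=512)  # temperature / top_p / repetition_penalty stay None
```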
6 changes: 6 additions & 0 deletions swift/llm/utils/template.py
@@ -30,6 +30,7 @@ class TemplateType:
default = 'default'
qwen = 'qwen'
qwen_audio = 'qwen-audio'
modelscope_agent = 'modelscope-agent'
baichuan = 'baichuan'
chatglm2 = 'chatglm2'
chatglm3 = 'chatglm3'
@@ -589,6 +590,11 @@ def __init__(self):
register_template(TemplateType.qwen, QwenTemplate())
register_template(TemplateType.chatml, QwenTemplate())

register_template(
TemplateType.modelscope_agent,
Template([], [' \n\n<|user|>:{{QUERY}} \n\n<|assistant|>:'], [],
[' \n\n</s>'], DEFAULT_SYSTEM, [' \n\n<|system|>:{{SYSTEM}}']))


class _QwenAudioTemplateMixin:

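From the `Template(...)` fields registered above, a single-turn `modelscope-agent` prompt renders roughly as sketched below. The rendering is inferred from the template definition rather than captured from the code, and `DEFAULT_SYSTEM` is shown with swift's usual default value for illustration.

```python
# Approximate single-turn rendering of the modelscope-agent template (illustration only).
DEFAULT_SYSTEM = 'You are a helpful assistant.'  # assumed value of swift's DEFAULT_SYSTEM
query = 'Help me check the weather in Hangzhou.'

prompt = (f' \n\n<|system|>:{DEFAULT_SYSTEM}'
          f' \n\n<|user|>:{query}'
          f' \n\n<|assistant|>:')
# The assistant's completion is expected to end with the suffix ' \n\n</s>'.
print(prompt)
```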
