Error during inference after fine-tuning the internvl-40b model #1881

Closed
ymlab opened this issue Aug 31, 2024 · 2 comments

Comments

ymlab commented Aug 31, 2024

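Describe the bug
What the bug is and how to reproduce it, preferably with screenshots
Training ran on multiple nodes and multiple GPUs with default-zero3, full-parameter fine-tuning, and max_length 8192. Training completed normally, but inference fails with an error that looks like a GPU out-of-memory failure. How can I get inference to run?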
[2 attached images not shown]

Your hardware and system info
Name: torch
Version: 2.1.2+cu121

ymlab commented Aug 31, 2024

Inference script:
swift infer \
    --ckpt_dir /xxx/checkpoint-7112/ \
    --dataset xxx.json \
    --dataset_test_ratio 1.0 \
    --show_dataset_sample -1 \
    --max_length 8192 \
    --infer_backend lmdeploy

ymlab commented Aug 31, 2024

Switching to the pt backend for inference works, and 2 GPUs are enough:
CUDA_VISIBLE_DEVICES=0,1 swift infer --ckpt_dir /xxx/checkpoint-7112/ --dataset xxx.json --dataset_test_ratio 1.0 --show_dataset_sample -1 --max_length 8192 --infer_backend pt
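
For anyone landing here with the same symptom, a rough back-of-envelope estimate may help explain why 2 GPUs can be enough. This is only a sketch under stated assumptions: the parameter count (40e9) is taken from the "40b" in the model name, 2 bytes per parameter assumes bf16/fp16 weights, and the 80 GiB card size and 10 GiB per-GPU reserve for KV cache and activations are hypothetical, since the thread does not say which GPUs were used. The helper names are made up for illustration and are not part of swift or lmdeploy.

```python
import math


def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


def min_gpus(num_params: float, gpu_mem_gib: float,
             reserve_gib: float = 0.0, bytes_per_param: int = 2) -> int:
    """Smallest GPU count whose usable memory can hold the weights.

    reserve_gib is memory set aside per GPU for everything that is not
    weights (KV cache, activations, CUDA context); it is a guess, not a
    measured value.
    """
    usable = gpu_mem_gib - reserve_gib
    if usable <= 0:
        raise ValueError("reserve_gib must be smaller than gpu_mem_gib")
    return math.ceil(weight_memory_gib(num_params, bytes_per_param) / usable)


if __name__ == "__main__":
    params = 40e9  # assumption: nominal size implied by "40b"
    print(f"weights: {weight_memory_gib(params):.1f} GiB")  # about 74.5 GiB
    # Hypothetical 80 GiB cards with 10 GiB reserved per GPU.
    print("GPUs needed:", min_gpus(params, gpu_mem_gib=80, reserve_gib=10))
```

Under these assumptions the weights alone take about 74.5 GiB, which leaves almost no headroom on a single 80 GiB card at max_length 8192 but fits comfortably once the model is spread across two cards. Whether this is the actual cause of the lmdeploy failure above is not confirmed in the thread.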

ymlab closed this as completed Aug 31, 2024