qwen2-vl fine-tuning with flash_attn raises an error #1887
The CUDA version is 12.1, and the GPUs are 64 A800s.

Try removing flash_attn.

There is a bug with flash attn & qwen2-vl.

See #1857 for a possible solution.
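The workaround suggested above (disabling FlashAttention) can be sketched as rerunning the same entrypoint with the flag flipped. This is a minimal sketch, assuming the other flags from the reporter's command stay unchanged; it is not a verified fix:

```shell
# Workaround sketch: turn off FlashAttention so swift falls back to the
# default attention implementation (all other flags as in the original command).
torchrun --nproc_per_node ${num_gpu_per_node} examples/pytorch/llm/llm_sft.py \
    --model_type qwen2-vl-7b-instruct \
    --use_flash_attn false
```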
Describe the bug
Your hardware and system info
```shell
torchrun --nproc_per_node ${num_gpu_per_node} --master_port $MASTER_PORT --master_addr $MASTER_ADDR --node_rank $RANK --nnodes $WORLD_SIZE examples/pytorch/llm/llm_sft.py \
    --model_cache_dir models/Qwen/Qwen2-VL-7B-Instruct \
    --model_type qwen2-vl-7b-instruct \
    --sft_type full \
    --freeze_vit true \
    --tuner_backend swift \
    --template_type AUTO \
    --output_dir output/-correction-0830 \
    --ddp_backend nccl \
    --custom_train_dataset_path homework_correction_train2.jsonl \
    --system "你是一位小学数学作业批改专家。" \
    --dataset_test_ratio 0.01 \
    --self_cognition_sample -1 \
    --preprocess_num_proc 60 \
    --dataloader_num_workers 60 \
    --train_dataset_sample -1 \
    --save_strategy epoch \
    --lr_scheduler_type cosine \
    --save_total_limit 5 \
    --num_train_epochs 5 \
    --eval_steps 50 \
    --logging_steps 10 \
    --max_length 2048 \
    --check_dataset_strategy warning \
    --gradient_checkpointing true \
    --batch_size 4 \
    --gradient_accumulation_steps 1 \
    --deepspeed_config_path ds_z2_config.json \
    --weight_decay 0.01 \
    --learning_rate 1e-5 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --use_flash_attn true \
    --save_only_model false \
    --save_on_each_node false \
    --lazy_tokenize true \
    --neftune_noise_alpha 5 \
    --dtype AUTO
```
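Since the error only appears when FlashAttention is enabled, one defensive pattern is to select the attention backend at runtime and fall back to PyTorch SDPA when the `flash_attn` package is unavailable. The helper name `pick_attn_implementation` below is hypothetical; the `attn_implementation` values mirror the strings transformers' `from_pretrained` accepts. A minimal sketch:

```python
import importlib.util

def pick_attn_implementation() -> str:
    """Return an attention-backend string, preferring FlashAttention 2
    when the flash_attn package is importable, else falling back to
    PyTorch's scaled-dot-product attention ("sdpa")."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"

# Usage sketch (model load shown as a comment, not executed here):
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "Qwen/Qwen2-VL-7B-Instruct",
#     attn_implementation=pick_attn_implementation(),
# )
print(pick_attn_implementation())
```

This keeps one training script usable on nodes with and without a working flash-attn build, at the cost of silently running the slower kernel when the fallback triggers.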