
[Re-appeared] DPO training error UnboundLocalError: local variable 'num_patches' referenced before assignment #2012

Closed
Lopa07 opened this issue Sep 11, 2024 · 1 comment



Lopa07 commented Sep 11, 2024

This bug has re-appeared in the latest ms-swift version. It was initially reported in this issue and was fixed promptly, but with the latest release it is back.

I am trying to fine-tune GLM-4V-9B with DPO, using the same command as in the original post here.
Command:

CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type dpo \
    --model_type glm4v-9b-chat \
    --beta 0.1 \
    --sft_beta 0.1 \
    --sft_type lora \
    --dataset rlaif-v#1000 \
    --num_train_epochs 2 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps 16 \
    --warmup_ratio 0.03 \
    --save_total_limit 2

The first error is "AttributeError: 'list' object has no attribute 'squeeze'". Full traceback:

Train:   0%|                                                          | 0/122 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/rlhf.py", line 5, in <module>
    rlhf_main()
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/rlhf.py", line 282, in llm_rlhf
    trainer.train(training_args.resume_from_checkpoint)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 101, in train
    res = super().train(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/mixin.py", line 426, in train
    res = super().train(resume_from_checkpoint, *args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 1932, in train
    return inner_training_loop(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 3307, in training_step
    loss = self.compute_loss(model, inputs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1520, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 115, in get_batch_loss_metrics
    forward_output = self.concatenated_forward(model, batch)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 189, in concatenated_forward
    concatenated_batch = self.concatenated_inputs(
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 369, in concatenated_inputs
    concatenated_batch['images'] = batch['vision_images'].squeeze(1).repeat(2, 1, 1, 1).to(device=device)
AttributeError: 'list' object has no attribute 'squeeze'
Train:   0%|                                                          | 0/122 [00:00<?, ?it/s]
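
For context, the failing line treats batch['vision_images'] as a tensor, but the collated batch apparently delivers a Python list of per-sample tensors. Here is a minimal sketch of the mismatch and one possible way to reconcile the shapes (stacking the list first); the shapes are assumptions for illustration, not the upstream fix:

```python
import torch

# Assumed collated batch: one sample, one image tensor of (1, 3, H, W).
batch = {'vision_images': [torch.randn(1, 3, 224, 224)]}

# What swift/trainers/dpo_trainer.py line 369 effectively does -- fails,
# because a Python list has no .squeeze():
# images = batch['vision_images'].squeeze(1).repeat(2, 1, 1, 1)

# One possible workaround: stack the list into a single tensor first.
images = torch.stack(batch['vision_images'], dim=0)  # (1, 1, 3, 224, 224)
images = images.squeeze(1).repeat(2, 1, 1, 1)        # (2, 3, 224, 224)
print(images.shape)
```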

To bypass the above error, I tried using only the first sample in the list instead; this produced a second error, "UnboundLocalError: local variable 'num_patches' referenced before assignment". Full traceback:

Train:   0%|                                                          | 0/122 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/rlhf.py", line 5, in <module>
    rlhf_main()
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/rlhf.py", line 282, in llm_rlhf
    trainer.train(training_args.resume_from_checkpoint)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 101, in train
    res = super().train(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/mixin.py", line 426, in train
    res = super().train(resume_from_checkpoint, *args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 1932, in train
    return inner_training_loop(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 3307, in training_step
    loss = self.compute_loss(model, inputs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1520, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 115, in get_batch_loss_metrics
    forward_output = self.concatenated_forward(model, batch)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 235, in concatenated_forward
    outputs = model(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 820, in forward
    return model_forward(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 808, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/peft_model.py", line 1577, in forward
    return self.base_model(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
    return self.model.forward(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1176, in forward
    transformer_outputs = self.transformer(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1024, in forward
    (attention_mask[i, :boi_token_pos + 1], torch.ones(num_patches).to(attention_mask.device),
UnboundLocalError: local variable 'num_patches' referenced before assignment
Train:   0%|                                                          | 0/122 [00:00<?, ?it/s]
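
For what it's worth, the second failure looks like a plain Python scoping pitfall: num_patches in modeling_chatglm.py is presumably bound only inside the branch that processes image features, so if the images never reach the model's forward in the expected form, the later read of num_patches raises. A minimal sketch of that pattern (the function and shapes are hypothetical, not the actual GLM-4V code):

```python
import torch

def expand_mask(attention_mask, image_features=None):
    if image_features is not None:
        # num_patches is bound only when image features are present
        num_patches = image_features.shape[1]
    # If image_features is None, this read raises
    # UnboundLocalError: local variable 'num_patches' referenced before assignment
    return torch.cat([attention_mask, torch.ones(num_patches)])

expand_mask(torch.ones(4))  # reproduces the UnboundLocalError
```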

Hardware and system info:
- CUDA Version: 12.4
- System: Ubuntu 22.04.3 LTS
- GPU
- torch==2.4.0
- transformers==4.44.0
- trl==0.10.1
- peft==0.12.0

Jintao-Huang (Collaborator) commented:

#1975

Lopa07 closed this as completed Sep 14, 2024