This bug has re-appeared in the latest ms-swift version.
This bug was initially reported in this issue, and was solved promptly. Now, with the latest version of ms-swift, it has re-appeared.
I am trying to fine-tune GLM-4V-9B with DPO, using the same command as in the original post here. Command:
The first error I get is "AttributeError: 'list' object has no attribute 'squeeze'". The full traceback is:
Train: 0%| | 0/122 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/rlhf.py", line 5, in <module>
rlhf_main()
File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/rlhf.py", line 282, in llm_rlhf
trainer.train(training_args.resume_from_checkpoint)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 101, in train
res = super().train(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/mixin.py", line 426, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 1932, in train
return inner_training_loop(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 3307, in training_step
loss = self.compute_loss(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1520, in compute_loss
loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 115, in get_batch_loss_metrics
forward_output = self.concatenated_forward(model, batch)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 189, in concatenated_forward
concatenated_batch = self.concatenated_inputs(
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 369, in concatenated_inputs
concatenated_batch['images'] = batch['vision_images'].squeeze(1).repeat(2, 1, 1, 1).to(device=device)
AttributeError: 'list' object has no attribute 'squeeze'
Train: 0%| | 0/122 [00:00<?, ?it/s]
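A minimal sketch of this first failure, outside of ms-swift. It assumes that `batch['vision_images']` arrives as a Python list with one `(1, C, H, W)` tensor per sample, which I have not confirmed; the helper `to_concatenated_images` is hypothetical and is not part of the library. Stacking only works if every image tensor has the same shape.

```python
import torch

# Assumed input: the collator returns a list of per-sample image tensors,
# each shaped (1, C, H, W), instead of one stacked tensor.
batch = {"vision_images": [torch.zeros(1, 3, 8, 8) for _ in range(2)]}

# Reproduce the failure: a list has no tensor methods such as .squeeze().
try:
    batch["vision_images"].squeeze(1)
    error = None
except AttributeError as exc:
    error = str(exc)


def to_concatenated_images(vision_images):
    """Hypothetical helper: accept a list or a tensor, return (2 * B, C, H, W)."""
    if isinstance(vision_images, (list, tuple)):
        # Stack the samples into one tensor of shape (B, 1, C, H, W).
        vision_images = torch.stack(vision_images)
    # Drop the singleton dimension, then duplicate the batch for chosen/rejected.
    return vision_images.squeeze(1).repeat(2, 1, 1, 1)


images = to_concatenated_images(batch["vision_images"])
print(error)
print(tuple(images.shape))
```

If the images in a batch can differ in size, stacking would fail and a different fix would be needed.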
To bypass the above error, I tried using only the first sample in the list instead. That raised "UnboundLocalError: local variable 'num_patches' referenced before assignment". The full traceback is as follows:
Train: 0%| | 0/122 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/rlhf.py", line 5, in <module>
rlhf_main()
File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/rlhf.py", line 282, in llm_rlhf
trainer.train(training_args.resume_from_checkpoint)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 101, in train
res = super().train(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/mixin.py", line 426, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 1932, in train
return inner_training_loop(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 3307, in training_step
loss = self.compute_loss(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1520, in compute_loss
loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 115, in get_batch_loss_metrics
forward_output = self.concatenated_forward(model, batch)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 235, in concatenated_forward
outputs = model(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
result = forward_call(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/peft_model.py", line 1577, in forward
return self.base_model(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
return self.model.forward(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1176, in forward
transformer_outputs = self.transformer(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1024, in forward
(attention_mask[i, :boi_token_pos + 1], torch.ones(num_patches).to(attention_mask.device),
UnboundLocalError: local variable 'num_patches' referenced before assignment
Train: 0%| | 0/122 [00:00<?, ?it/s]
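For reference, this kind of `UnboundLocalError` usually means a local variable is assigned in one branch and read in another. The sketch below shows only that general pattern; `build_mask` and its arguments are hypothetical and are not the actual code in `modeling_chatglm.py`. My guess, which I have not verified, is that the images did not reach the model in the form it expects, so the branch that sets `num_patches` was skipped.

```python
def build_mask(images, has_image_tokens):
    """Hypothetical function illustrating the failure pattern only."""
    if images is not None:
        # The local is assigned only when images are present.
        num_patches = 4
    if has_image_tokens:
        # Read here regardless of whether the branch above ran.
        return [1] * num_patches
    return []


# Image tokens in the text but no images passed in: the read fails.
try:
    build_mask(None, True)
    raised = False
except UnboundLocalError:
    raised = True

print(raised)
```

When both the images and the image tokens are present, the same function returns normally.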
Your hardware and system info
CUDA Version: 12.4
System: Ubuntu 22.04.3 LTS
GPU
torch==2.4.0
transformers==4.44.0
trl==0.10.1
peft==0.12.0