
self.balance = {3: [4.0, 1.0, 0.4], 4: [4.0, 1.0, 0.25, 0.06], 5: [4.0, 1.0, 0.25, 0.06, .02]}[det.nl] #2255

Closed
xiaowo1996 opened this issue Feb 20, 2021 · 22 comments · Fixed by #2256 or #2266
Labels
bug Something isn't working

Comments

@xiaowo1996
Contributor

🐛 Bug

Training yolov5s.yaml on the VOC2007 and VOC2012 datasets works fine, but after I edited yolov5s.yaml (with the same changes as in #1237 (comment)) and trained again, an error occurred.

[screenshots of the edited yolov5s.yaml]

The error is:
Traceback (most recent call last):
File "train.py", line 526, in
train(hyp, opt, device, tb_writer, wandb)
File "train.py", line 233, in train
compute_loss = ComputeLoss(model) # init loss class
File "/yolov5/utils/loss.py", line 108, in init
self.balance = {3: [4.0, 1.0, 0.4], 4: [4.0, 1.0, 0.25, 0.06], 5: [4.0, 1.0, 0.25, 0.06, .02]}[det.nl]
KeyError: 2
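For context, the crash is a plain dictionary lookup whose table has no entry for 2 detection layers. A minimal sketch in plain Python (no PyTorch; the `.get()` fallback here is a hypothetical defensive variant, not necessarily the exact upstream fix):

```python
# The balance table in loss.py only covers models with 3, 4 or 5 detection
# layers; a 2-layer (P3-P4) model therefore raises KeyError: 2.
balance_by_nl = {3: [4.0, 1.0, 0.4],
                 4: [4.0, 1.0, 0.25, 0.06],
                 5: [4.0, 1.0, 0.25, 0.06, 0.02]}

nl = 2  # number of detection layers in the edited model
try:
    balance = balance_by_nl[nl]
except KeyError as e:
    print(f"KeyError: {e}")  # reproduces the reported crash

# Hypothetical defensive lookup: fall back to a truncated default list so
# any layer count gets a weight list of the right length.
balance = balance_by_nl.get(nl, [4.0, 1.0, 0.4, 0.25, 0.06][:nl])
print(balance)  # [4.0, 1.0]
```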


@xiaowo1996 xiaowo1996 added the bug Something isn't working label Feb 20, 2021
@github-actions
Contributor

github-actions bot commented Feb 20, 2021

👋 Hello @xiaowo1996, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@glenn-jocher
Member

@xiaowo1996 thanks for the bug report! Yes your modifications are correct. It seems the balancing code we have in loss.py is not robust to 2 layer outputs. I will submit a PR with a fix.

@glenn-jocher
Member

@xiaowo1996 this problem should be resolved now in PR #2256. Please git pull to receive this update and try again!

@xiaowo1996
Contributor Author

@glenn-jocher thank you, sir. After I ran git pull to update the project, that error disappeared, but a new error appears during training:
RuntimeError: Sizes of tensors must match except in dimension 3. Got 41 and 42 (The offending index is 0)
My environment is the Docker image you provide. Training the original yolov5s.yaml on VOC2012 succeeds, but the edited yaml file does not work.

@xiaowo1996
Contributor Author

@glenn-jocher I tried again in Google Colab, with the dataset downloaded by the bash data/scripts/get_voc.sh script you provide. My training command is:
python train.py --data data/voc.yaml --cfg models/yolov5s.yaml --weights weights/yolov5s.pt --batch-size 16 --epochs 500
I edited yolov5s.yaml the same way as above, but the same issue appeared.

@glenn-jocher
Member

@xiaowo1996 you don't need to download VOC manually, it will download automatically on first use. I will try with a P3-P4 model.

@glenn-jocher
Member

@xiaowo1996 you can use the Colab notebook to get started easily: 1) run the setup cell, then 2) run the VOC training cell (in the Appendix). The VOC training cell contents are:

# VOC
for b, m in zip([64, 48, 32, 16], ['yolov5s', 'yolov5m', 'yolov5l', 'yolov5x']):  # zip(batch_size, model)
  !python train.py --batch {b} --weights {m}.pt --data voc.yaml --epochs 50 --cache --img 512 --nosave --hyp hyp.finetune.yaml --project VOC --name {m}

@glenn-jocher
Member

@xiaowo1996 everything works well for me training a P3-P4 model on VOC; I encountered no problems:

[screenshot of a successful training run, Feb 21 2021]

@xiaowo1996
Contributor Author

@glenn-jocher Thank you, sir. To be clear, this error occurs at test time. Could you let training run for a while and see what happens? I tried again exactly as you did, and the same issue appeared:

@glenn-jocher
Member

glenn-jocher commented Feb 22, 2021

Ah, test time. Yes I will try that, hold on.

@glenn-jocher
Member

@xiaowo1996 yes, I get the same result as you now!
RuntimeError: Sizes of tensors must match except in dimension 3. Got 33 and 34 (The offending index is 0)

I think this is caused by a stride too small to support the feature-map size reductions the models need. 32 may be the minimum image stride supported.

@glenn-jocher
Member

@xiaowo1996 this should be fixed in #2266, which I just merged now. I've enforced a minimum stride of 32 regardless of architecture now to fully support the downsample and upsample ops in the YOLOv5 models. Please git pull and try again, and let us know if the problem persists or if you have any other issues!
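The practical consequence of that minimum stride is that training image sizes must be rounded up to a multiple of 32. A small sketch of that rounding check (the helper name mirrors the repo's check_img_size utility, but this is a simplified stand-in, not the exact implementation):

```python
import math

def check_img_size(img_size: int, stride: int = 32) -> int:
    """Round img_size up to the nearest multiple of stride."""
    new_size = math.ceil(img_size / stride) * stride
    if new_size != img_size:
        print(f"WARNING: img size {img_size} is not a multiple of "
              f"stride {stride}, updating to {new_size}")
    return new_size

print(check_img_size(500))  # 512
print(check_img_size(640))  # 640 (already a multiple of 32)
```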

@xiaowo1996
Contributor Author

@glenn-jocher Thank you so much, you are very kind. After git pull, the issue is solved.

@glenn-jocher
Member

@xiaowo1996 great!

@GMN23362

GMN23362 commented May 9, 2022

@xiaowo1996 this should be fixed in #2266, which I just merged. I've enforced a minimum stride of 32 regardless of architecture to fully support the downsample and upsample ops in the YOLOv5 models. Please git pull and try again, and let us know if the problem persists or if you have any other issues!

Same problem! But I get an error: torch.nn.modules.module.ModuleAttributeError: 'Darknet' object has no attribute 'stride'.

@glenn-jocher
Member

@GMN23362 we don't have any modules or objects called Darknet in YOLOv5

@TheSole0

@glenn-jocher

I get the same problem as above in YOLOv8.
My command is: yolo detect train data=kitti.yaml model=yolov8s_test.yaml epochs=50 imgsz=640
This is my yaml file:

backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]    # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]   # 1-P2/4
  - [-1, 3, C2f, [128, True]]    # 2
  - [-1, 1, Conv, [256, 3, 2]]   # 3-P3/8
  - [-1, 6, C2f, [256, True]]    # 4
  - [-1, 1, Conv, [512, 3, 2]]   # 5-P4/16
  - [-1, 6, C2f, [512, True]]    # 6
  - [-1, 1, Conv, [768, 3, 2]]   # 7-P5/32
  - [-1, 6, C2f, [768, True]]    # 8
  - [-1, 1, Conv, [1024, 3, 2]]  # 9-P6/64
  - [-1, 3, C2f, [1024, True]]   # 10
  - [-1, 1, SPPF, [1024, 5]]     # 11

head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 12
  - [[-1, 8], 1, Concat, [1]]  # cat backbone P5 # 13
  - [-1, 3, C2f, [768]]  # 14
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 15
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4 # 16
  - [-1, 3, C2f, [512]]  # (P4/16-medium) # 17
  - [-1, 1, Conv, [512, 3, 2]]  # 18
  - [[-1, 14], 1, Concat, [1]]  # cat head P3 # 19
  - [-1, 3, C2f, [768]]  # (P5/32-large) # 20
  - [[17, 20], 1, Detect, [nc]]  # Detect(P4, P5, P6) # 24

Actually, I want to use only the medium and large heads; that is why I scaled up the backbone, but it didn't work.
Why is this not working? Can anybody help me?

Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs\detect\train11
Starting training for 50 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
       1/50      3.25G      4.309      3.989      3.242         11        640: 100%|██████████| 335/335 [00:58<00:00
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95):   0%|          | 0/43 [00:
Traceback (most recent call last):
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\Scripts\yolo.exe\__main__.py", line 7, in <module>
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\yolo\cfg\__init__.py", line 317, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\yolo\engine\model.py", line 325, in train
    self.trainer.train()
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 186, in train
    self._do_train(RANK, world_size)
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 357, in _do_train
    self.metrics, self.fitness = self.validate()
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\yolo\engine\trainer.py", line 453, in validate
    metrics = self.validator(self)
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\torch\autograd\grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\yolo\engine\validator.py", line 159, in __call__
    preds = model(batch['img'])
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\nn\tasks.py", line 199, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\nn\tasks.py", line 58, in _forward_once
    x = m(x)  # run
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\dlwlsgh\.conda\envs\yolov8\lib\site-packages\ultralytics\nn\modules.py", line 352, in forward
    return torch.cat(x, self.d)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 7 and 8 (The offending index is 0)


I already tried changing the stride from 32 to 64 in v5loader.py, build.py and trainer.py.
Please help me.

@glenn-jocher
Member

@TheSole0 hello,

It looks like your error is caused by a dimension mismatch issue between two tensors. The error message says "Sizes of tensors must match except in dimension 2. Got 7 and 8". This means that two tensors that should have matched sizes do not match in dimension 2, where one tensor has size 7 and the other has size 8.

From your command, it seems that you are using YOLOv8. Can you provide more information on what version of YOLOv8 you are using?

Also, I noticed that you have reduced the feature maps in the backbone, causing the mismatch between tensors. Given that the YOLOv8 architecture differs from the YOLOv5 architecture, I would not recommend reducing the feature maps in the backbone for YOLOv8. The architecture is designed to work best with the default feature map sizes.

Please let me know if this helps, or if you have any other questions.
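To see where the off-by-one comes from, trace a single spatial dimension through the network: each stride-2 3x3 conv with padding 1 (the `Conv, [c, 3, 2]` entries in the yaml) maps size s to ceil(s / 2), while nn.Upsample doubles it, so an input size that is not a multiple of the largest stride produces an upsampled map that no longer matches the skip connection it is concatenated with. A minimal sketch in plain Python (illustrative sizes only):

```python
import math

def down(s: int) -> int:
    """Spatial size after a 3x3, stride-2, pad-1 conv: ceil(s / 2)."""
    return math.ceil(s / 2)

def up(s: int) -> int:
    """Spatial size after nn.Upsample(scale_factor=2)."""
    return s * 2

def concat_sizes(x: int) -> tuple:
    """Sizes of the P4 skip connection vs. the upsampled P5 map."""
    p4 = down(down(down(down(x))))  # P4/16 feature map
    p5 = down(p4)                   # P5/32 feature map
    return p4, up(p5)

print(concat_sizes(320))  # (20, 20) -> multiple of 32, torch.cat succeeds
print(concat_sizes(330))  # (21, 22) -> mismatch, torch.cat raises
```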

@TheSole0

TheSole0 commented May 31, 2023

@glenn-jocher


First, I really thank you for your response.
Yesterday I upgraded to a new version (8.0.110), but it still did not work:

[screenshot of the error]

I saw the p34.yaml file in the yolov5 hub, which uses only the small and medium heads, right? I got the idea from that p34.yaml file.
What I really want is to use only two heads, medium and large; that is why I tried to reduce the feature maps.

With the image size fixed at 640, how can we use only the medium and large heads, without the small head?
The file shown below is my yolov8s_test.yaml:

[screenshot of yolov8s_test.yaml]

My command is: yolo detect train data=kitti.yaml model=yolov8s_test.yaml epochs=50 imgsz=640

[screenshot of the training output]

Please help me.

@glenn-jocher
Member

@TheSole0 hello,

Thank you for the additional information. It looks like you are trying to use only medium and large heads in YOLOv8 with a fixed pixel size of 640, and you have reduced the feature maps in the backbone to achieve this. However, this has caused a dimension mismatch issue between two tensors.

While it is possible to modify the architecture of YOLOv8, it is not recommended to reduce the feature maps in the backbone as it is designed to work best with the default feature map sizes. The p34.yaml file in the yolov5 hub can use only small and medium heads, as you have pointed out.

If you want to use only medium and large heads, you could try using the yolov5l model and modifying the configuration file to use only medium and large heads. You can also try using the default configuration file for yolov5l with a fixed pixel size of 640 and see how it performs on your data.

I hope this helps. Let me know if you have any other questions.

@TheSole0

@glenn-jocher


Thank you for your quick reply.
I will try it again now.

I really appreciate it, teacher!

@glenn-jocher
Member

@TheSole0 you're welcome! I'm glad I could help. Feel free to reach out if you have any further questions or concerns. Best of luck with your project!
