Unexpected interruption in training process #761

Open
1 task done
Aq114 opened this issue Jul 8, 2024 · 1 comment
Labels
bug Something isn't working

Comments


Aq114 commented Jul 8, 2024

Search before asking

  • I have searched the HUB issues and found no similar bug report.

HUB Component

No response

Bug

Using yolo detect train data=data/classify.yaml model=yolov8n.pt epochs=100 imgsz=640 to train a custom dataset from the local command line, training terminates with an error after approximately seven to eight iterations. The error message is as follows:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Scripts\yolo.exe\__main__.py", line 7, in <module>
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\cfg\__init__.py", line 591, in entrypoint
    getattr(model, mode)(**overrides)  # default args from model
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\model.py", line 650, in train
    self.trainer.train()
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\trainer.py", line 204, in train
    self._do_train(world_size)
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\trainer.py", line 429, in _do_train
    self.metrics, self.fitness = self.validate()
                                 ^^^^^^^^^^^^^^^
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\trainer.py", line 570, in validate
    metrics = self.validator(self)
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\validator.py", line 195, in __call__
    stats = self.get_stats()
            ^^^^^^^^^^^^^^^^
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\models\yolo\detect\val.py", line 172, in get_stats
    stats = {k: torch.cat(v, 0).cpu().numpy() for k, v in self.stats.items()}  # to numpy
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\models\yolo\detect\val.py", line 172, in <dictcomp>
    stats = {k: torch.cat(v, 0).cpu().numpy() for k, v in self.stats.items()}  # to numpy
                ^^^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors

Environment

Ultralytics YOLOv8.2.48 🚀 Python-3.11.9 torch-2.3.1 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)

Minimal Reproducible Example

No response

Additional

The yolov5m6 model has been successfully trained on this dataset

@Aq114 Aq114 added the bug Something isn't working label Jul 8, 2024
@pderrenger
Member

@Aq114 hi there! 👋

Thank you for providing detailed information about the issue you're encountering. It looks like you're running into a RuntimeError related to torch.cat() expecting a non-empty list of Tensors. Let's try to troubleshoot this together.

Steps to Resolve:

  1. Update Packages: First, please ensure that you are using the latest versions of all relevant packages. You can update Ultralytics and PyTorch using the following commands:

    pip install --upgrade ultralytics
    pip install --upgrade torch
  2. Check Dataset and Annotations: Ensure that your dataset and annotations are correctly formatted and that there are no empty annotations. Sometimes, an empty list of annotations can cause such errors.

  3. Validate Dataset: You can use the yolo command to validate your dataset before training:

    yolo detect val data=data/classify.yaml model=yolov8n.pt imgsz=640

    This will help identify any potential issues with the dataset itself.

  4. Debugging: If the issue persists, you can add some debugging statements in the ultralytics/models/yolo/detect/val.py file to print out the contents of self.stats before the torch.cat() operation. This can help identify if and why self.stats might be empty.

Example Debugging Code:

You can modify the get_stats method in val.py to include print statements:

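def get_stats(self):
    # Debug aid: show what each stats key holds; an empty list for any key
    # is what makes torch.cat() raise the RuntimeError above.
    print("Stats before torch.cat():", self.stats)
    stats = {k: torch.cat(v, 0).cpu().numpy() for k, v in self.stats.items()}  # to numpy
    return stats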

Additional Tips:

  • Ensure that your environment is correctly set up and that there are no conflicting versions of libraries.
  • If you have a large dataset, try training with a smaller subset to see if the issue persists.
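
As a rough illustration of the smaller-subset tip, the sketch below copies a random sample of image/label pairs into a new folder. It assumes images and labels live in two parallel directories with matching file stems; make_subset and its parameters are hypothetical names chosen for this example, not an Ultralytics utility.

```python
import random
import shutil
from pathlib import Path


def make_subset(images_dir, labels_dir, out_dir, n=50, seed=0):
    """Copy n randomly chosen image/label pairs into out_dir/images and out_dir/labels."""
    images_dir, labels_dir, out_dir = Path(images_dir), Path(labels_dir), Path(out_dir)
    # Keep only images that have a matching label file so the subset stays consistent.
    pairs = [
        (img, labels_dir / f"{img.stem}.txt")
        for img in sorted(images_dir.iterdir())
        if img.is_file() and (labels_dir / f"{img.stem}.txt").exists()
    ]
    # A seeded shuffle makes the same subset reproducible across runs.
    random.Random(seed).shuffle(pairs)
    chosen = pairs[:n]
    (out_dir / "images").mkdir(parents=True, exist_ok=True)
    (out_dir / "labels").mkdir(parents=True, exist_ok=True)
    for img, lbl in chosen:
        shutil.copy2(img, out_dir / "images" / img.name)
        shutil.copy2(lbl, out_dir / "labels" / lbl.name)
    return [img.name for img, _ in chosen]
```

To train on the result you would still need a copy of your dataset YAML that points at the subset folders. If the error disappears on the subset, that suggests the problem lies in specific files rather than in the environment.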

Please try these steps and let us know if the issue continues. We're here to help! 😊
