Unexpected interruption in training process #761

Aq114 · 2024-07-08T08:58:28Z

Search before asking

I have searched the HUB issues and found no similar bug report.

HUB Component

No response

Bug

Running yolo detect train data=data/classify.yaml model=yolov8n.pt epochs=100 imgsz=640 on the local command line to train on a custom dataset terminates with an error after approximately seven to eight iterations. The error message is as follows:
Traceback (most recent call last):
File "", line 198, in run_module_as_main
File "", line 88, in run_code
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Scripts\yolo.exe_main.py", line 7, in
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\cfg_init.py", line 591, in entrypoint
getattr(model, mode)(**overrides) # default args from model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\model.py", line 650, in train
self.trainer.train()
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\trainer.py", line 204, in train
self._do_train(world_size)
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\trainer.py", line 429, in _do_train
self.metrics, self.fitness = self.validate()
^^^^^^^^^^^^^^^
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\trainer.py", line 570, in validate
metrics = self.validator(self)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\engine\validator.py", line 195, in call
stats = self.get_stats()
^^^^^^^^^^^^^^^^
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\models\yolo\detect\val.py", line 172, in get_stats
stats = {k: torch.cat(v, 0).cpu().numpy() for k, v in self.stats.items()} # to numpy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Y\AppData\Local\anaconda3\envs\yolov5\Lib\site-packages\ultralytics\models\yolo\detect\val.py", line 172, in
stats = {k: torch.cat(v, 0).cpu().numpy() for k, v in self.stats.items()} # to numpy
^^^^^^^^^^^^^^^
RuntimeError: torch.cat(): expected a non-empty list of Tensors

Environment

Ultralytics YOLOv8.2.48 🚀 Python-3.11.9 torch-2.3.1 CUDA:0 (NVIDIA GeForce GTX 1050 Ti, 4096MiB)

Minimal Reproducible Example

No response

Additional

The yolov5m6 model was trained successfully on this same dataset.

pderrenger · 2024-07-08T15:13:14Z

@Aq114 hi there! 👋

Thank you for providing detailed information about the issue you're encountering. It looks like you're running into a RuntimeError related to torch.cat() expecting a non-empty list of Tensors. Let's try to troubleshoot this together.

Steps to Resolve:

Update Packages: First, please ensure that you are using the latest versions of all relevant packages. You can update Ultralytics and PyTorch using the following commands:
```
pip install --upgrade ultralytics
pip install --upgrade torch
```
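Check Dataset and Annotations: Ensure that your dataset and annotations are correctly formatted. The traceback shows torch.cat() receiving an empty list inside get_stats, which means no validation statistics were collected. One possible cause is that no labels were found for the validation images, so check that each image in the val split has a matching, non-empty label file.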
Validate Dataset: You can use the yolo command to validate your dataset before training:
```
yolo detect val data=data/classify.yaml model=yolov8n.pt imgsz=640
```
This will help identify any potential issues with the dataset itself.
Debugging: If the issue persists, you can add some debugging statements in the ultralytics/models/yolo/detect/val.py file to print out the contents of self.stats before the torch.cat() operation. This can help identify if and why self.stats might be empty.
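As a rough illustration of the dataset check in the steps above, the sketch below counts the images in one split and lists images whose label file is missing or empty. It assumes the common YOLO layout in which each image has a .txt label with the same file stem in a parallel labels folder. The function name scan_split and the example paths are hypothetical and are not part of Ultralytics.

```python
from pathlib import Path

# Image extensions considered by this sketch (an assumption; extend as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def scan_split(images_dir, labels_dir):
    """Count images in a split and list those with a missing or empty label file."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    images = []
    if images_dir.is_dir():
        images = sorted(
            p for p in images_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
        )
    missing, empty = [], []
    for img in images:
        label = labels_dir / (img.stem + ".txt")
        if not label.is_file():
            missing.append(img.name)  # no label file with the same stem
        elif not label.read_text().strip():
            empty.append(img.name)  # label file exists but has no annotations
    return {"images": len(images), "missing_labels": missing, "empty_labels": empty}


if __name__ == "__main__":
    # Hypothetical paths; replace them with the val paths from your dataset YAML.
    print(scan_split("datasets/custom/images/val", "datasets/custom/labels/val"))
```

Empty label files are legitimate for background images, so a few entries in empty_labels are not necessarily a problem. A split where every image is missing its label is worth a closer look.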

Example Debugging Code:

You can modify the get_stats method in val.py to include print statements:

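def get_stats(self):
    # Temporary debug output: an empty list for any key is what makes torch.cat() fail.
    print("Stats before torch.cat():", {k: len(v) for k, v in self.stats.items()})
    stats = {k: torch.cat(v, 0).cpu().numpy() for k, v in self.stats.items()}  # to numpy
    # ... leave the rest of the original method unchanged ...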
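Printing whole tensors can flood the console, so a compact summary may be easier to read. The sketch below is a standalone illustration of that idea: it reports how many entries each key holds and which keys are empty. The helper names summarize_stats and empty_keys are hypothetical, and the example dictionary is only a stand-in for self.stats with illustrative key names.

```python
def summarize_stats(stats):
    """Map each key to the number of entries collected for it."""
    return {key: len(values) for key, values in stats.items()}


def empty_keys(stats):
    """Return the keys whose lists are empty (the case torch.cat() rejects)."""
    return [key for key, values in stats.items() if len(values) == 0]


if __name__ == "__main__":
    # Stand-in for self.stats; key names are illustrative, values would be tensors.
    example = {"tp": [], "conf": [], "pred_cls": [], "target_cls": []}
    print(summarize_stats(example))
    print(empty_keys(example))
```

If every key comes back empty on the first validation pass, the problem most likely lies in what the validator was fed rather than in the concatenation itself.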

Additional Tips:

Ensure that your environment is correctly set up and that there are no conflicting versions of libraries.
If you have a large dataset, try training with a smaller subset to see if the issue persists.
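For the smaller-subset tip, one possible approach is to copy a random sample of images and their labels into a separate folder and point a copy of the dataset YAML at it. The sketch below assumes the same image/label pairing by file stem. The function name copy_subset, its parameters, and the output layout are hypothetical choices for illustration.

```python
import random
import shutil
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def copy_subset(images_dir, labels_dir, out_dir, n, seed=0):
    """Copy up to n randomly chosen images (and any matching labels) into out_dir."""
    images = sorted(
        p for p in Path(images_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    rng = random.Random(seed)  # fixed seed so the same subset is picked each run
    chosen = rng.sample(images, min(n, len(images)))

    out_images = Path(out_dir) / "images"
    out_labels = Path(out_dir) / "labels"
    out_images.mkdir(parents=True, exist_ok=True)
    out_labels.mkdir(parents=True, exist_ok=True)

    for img in chosen:
        shutil.copy2(img, out_images / img.name)
        label = Path(labels_dir) / (img.stem + ".txt")
        if label.is_file():  # images without a label are copied on their own
            shutil.copy2(label, out_labels / label.name)
    return sorted(p.name for p in chosen)
```

If the error reproduces on a subset of a few dozen images, it becomes much quicker to inspect those files by hand.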

Please try these steps and let us know if the issue continues. We're here to help! 😊

Aq114 added the bug Something isn't working label Jul 8, 2024
