problems for strip optimizer at last epoch #428

Closed
1 task done
omumbare7 opened this issue Oct 6, 2023 · 8 comments
Labels: bug (Something isn't working), Stale

Comments

@omumbare7

Search before asking

  • I have searched the HUB issues and found no similar bug report.

HUB Component

No response

Bug

I already had this issue with one of the models I trained recently: it trained through epoch 99 (out of 100) and then gave the following error:

Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM 🚀
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/models/58IFoVk7ISnpulKrxrQM/epoch-99.pt to 'epoch-99.pt'...
100%|██████████| 521M/521M [00:27<00:00, 20.2MB/s]
WARNING ⚠️ Unable to automatically guess model task, assuming 'task=detect'. Explicitly define task for your model, i.e. 'task=detect', 'segment', 'classify', or 'pose'.
Ultralytics YOLOv8.0.194 🚀 Python-3.10.12 torch-2.0.1+cu118 CUDA:0 (Tesla T4, 15102MiB)
engine/trainer: task=detect, mode=train, model=epoch-99.pt, data=https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/sM0ItvxDPp9ahuaNVVRP/weed.v1i.yolov8.zip, epochs=100, patience=100, batch=9, imgsz=640, save=True, save_period=-1, cache=ram, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.0, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=0.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs/detect/train
Downloading https://storage.googleapis.com/ultralytics-hub.appspot.com/users/C6ZyMlgkeIfubkqgBdcEZ6drOqt2/datasets/sM0ItvxDPp9ahuaNVVRP/weed.v1i.yolov8.zip to 'weed.v1i.yolov8.zip'...
100%|██████████| 1.64G/1.64G [01:17<00:00, 22.7MB/s]
Unzipping weed.v1i.yolov8.zip to /content/datasets/weed.v1i.yolov8...: 100%|██████████| 35660/35660 [00:13<00:00, 2705.23file/s]
Downloading https://ultralytics.com/assets/Arial.ttf to '/root/.config/Ultralytics/Arial.ttf'...
100%|██████████| 755k/755k [00:00<00:00, 14.4MB/s]
TensorBoard: Start with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/

                   from  n    params  module                                       arguments
  0                  -1  1      2320  ultralytics.nn.modules.conv.Conv             [3, 80, 3, 2]
  1                  -1  1    115520  ultralytics.nn.modules.conv.Conv             [80, 160, 3, 2]
  2                  -1  3    436800  ultralytics.nn.modules.block.C2f             [160, 160, 3, True]
  3                  -1  1    461440  ultralytics.nn.modules.conv.Conv             [160, 320, 3, 2]
  4                  -1  6   3281920  ultralytics.nn.modules.block.C2f             [320, 320, 6, True]
  5                  -1  1   1844480  ultralytics.nn.modules.conv.Conv             [320, 640, 3, 2]
  6                  -1  6  13117440  ultralytics.nn.modules.block.C2f             [640, 640, 6, True]
  7                  -1  1   3687680  ultralytics.nn.modules.conv.Conv             [640, 640, 3, 2]
  8                  -1  3   6969600  ultralytics.nn.modules.block.C2f             [640, 640, 3, True]
  9                  -1  1   1025920  ultralytics.nn.modules.block.SPPF            [640, 640, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  3   7379200  ultralytics.nn.modules.block.C2f             [1280, 640, 3]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  3   1948800  ultralytics.nn.modules.block.C2f             [960, 320, 3]
 16                  -1  1    922240  ultralytics.nn.modules.conv.Conv             [320, 320, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  3   7174400  ultralytics.nn.modules.block.C2f             [960, 640, 3]
 19                  -1  1   3687680  ultralytics.nn.modules.conv.Conv             [640, 640, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  3   7379200  ultralytics.nn.modules.block.C2f             [1280, 640, 3]
 22        [15, 18, 21]  1   8718931  ultralytics.nn.modules.head.Detect           [1, [320, 640, 640]]
Model summary: 365 layers, 68153571 parameters, 68153555 gradients, 258.1 GFLOPs

Transferred 595/595 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
Downloading https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt to 'yolov8n.pt'...
100%|██████████| 6.23M/6.23M [00:00<00:00, 76.9MB/s]
AMP: checks passed ✅
train: Scanning /content/datasets/weed.v1i.yolov8/train/labels... 15585 images, 1364 backgrounds, 0 corrupt: 100%|██████████| 15585/15585 [00:08<00:00, 1920.07it/s]
train: New cache created: /content/datasets/weed.v1i.yolov8/train/labels.cache
train: 26.8GB RAM required to cache images with 50% safety margin but only 7.8/12.7GB available, not caching images ⚠️
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /content/datasets/weed.v1i.yolov8/valid/labels... 1483 images, 127 backgrounds, 0 corrupt: 100%|██████████| 1483/1483 [00:01<00:00, 938.41it/s]
val: New cache created: /content/datasets/weed.v1i.yolov8/valid/labels.cache
val: Caching images (1.7GB ram): 100%|██████████| 1483/1483 [00:08<00:00, 169.14it/s]
Plotting labels to runs/detect/train/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: SGD(lr=0.01, momentum=0.9) with parameter groups 97 weight(decay=0.0), 104 weight(decay=0.0004921875), 103 bias(decay=0.0)
Resuming training from epoch-99.pt from epoch 101 to 100 total epochs
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
Ultralytics HUB: View model at https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM 🚀
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train
Starting training for 100 epochs...

1 epochs completed in 0.001 hours.

AssertionError Traceback (most recent call last)
in <cell line: 4>()
2
3 model = YOLO('https://hub.ultralytics.com/models/58IFoVk7ISnpulKrxrQM')
----> 4 model.train()

5 frames
/usr/local/lib/python3.10/dist-packages/ultralytics/utils/plotting.py in plot_results(file, dir, segment, pose, classify, on_plot)
534 ax = ax.ravel()
535 files = list(save_dir.glob('results*.csv'))
--> 536 assert len(files), f'No results.csv files found in {save_dir.resolve()}, nothing to plot.'
537 for f in files:
538 try:

AssertionError: No results.csv files found in /content/runs/detect/train, nothing to plot.
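
One possible reading of the log above: the run reports resuming "from epoch 101 to 100 total epochs", so there may be nothing left to train, no results CSV gets written, and the plotting step then asserts. The sketch below only illustrates that reasoning. The helper names `epochs_remaining` and `can_plot` are hypothetical and not part of Ultralytics, and the 1-based epoch numbering is an assumption taken from how the log prints epochs.

```python
from pathlib import Path


def epochs_remaining(start_epoch: int, total_epochs: int) -> int:
    """Epochs a resumed run still has to train, assuming 1-based numbering.

    Hypothetical helper: returns 0 when the checkpoint is already past the
    last epoch, as in "from epoch 101 to 100 total epochs".
    """
    return max(0, total_epochs - start_epoch + 1)


def can_plot(save_dir: Path) -> bool:
    """True if save_dir holds at least one results*.csv file.

    This mirrors the glob shown in the traceback (results*.csv), so a caller
    could skip plotting instead of hitting the assertion.
    """
    return any(save_dir.glob("results*.csv"))


if __name__ == "__main__":
    # Values taken from the log: resume at epoch 101 with epochs=100.
    left = epochs_remaining(start_epoch=101, total_epochs=100)
    print(f"epochs left to train: {left}")
    print(f"results available to plot: {can_plot(Path('runs/detect/train'))}")
```

A guard like this would only hide the symptom; it does not explain why the final weights were not marked as complete in the first place.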

I then stripped the optimizer myself after downloading epoch-99, and it worked. I am reporting this here because I have hit the same error with another model I trained (I stripped the optimizer for that one as well), so it may be a bug that affects other training runs too.
Also, @kalenmike fixed the previous one for me via Ultralytics HUB and I was able to download it from there, but the model I downloaded doesn't work and gives errors like:

Confidence ---> 0.85
Traceback (most recent call last):
File "c:\Users\om\Desktop\ugv proto\codes\yolo\yolov8\trial.py", line 77, in <module>
print("Class name -->", classNames[cls])
IndexError: list index out of range
Exception ignored in: <generator object BasePredictor.stream_inference at 0x0000023FB8C19310>
Traceback (most recent call last):
File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 52, in generator_context
File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 300, in clone
File "C:\Users\om\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 286, in __init__
AttributeError: 'NoneType' object has no attribute 'is_scripting'
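
The `IndexError` above comes from `classNames[cls]` in trial.py, which suggests the predicted class id is outside a hand-written list. The following is a minimal sketch of a defensive lookup, not the code from trial.py (which was not shared). `class_label` is a hypothetical helper. It accepts either a list or an id-to-name mapping, since a model may expose its own class names as a mapping.

```python
from collections.abc import Mapping, Sequence
from typing import Union

Names = Union[Sequence[str], Mapping[int, str]]


def class_label(names: Names, cls_id: int) -> str:
    """Return the name for cls_id, or a placeholder if the id is unknown.

    Hypothetical helper: avoids IndexError/KeyError when the model predicts
    a class id that the local name table does not cover.
    """
    if isinstance(names, Mapping):
        return names.get(cls_id, f"unknown({cls_id})")
    if 0 <= cls_id < len(names):
        return names[cls_id]
    return f"unknown({cls_id})"


if __name__ == "__main__":
    class_names = ["weed"]  # illustrative list, not taken from trial.py
    for cls in (0, 3):
        print("Class name -->", class_label(class_names, cls))
```

Seeing `unknown(...)` in the output would point to a mismatch between the downloaded model's classes and the local list, which is worth checking before suspecting the weights themselves.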

This error only occurs with the fixed model I downloaded from the HUB, which was 56 MB. The same model from before the fix (epoch 99), which I stripped the optimizer from myself, is 136 MB and works really well without any errors. I am writing here only to report the bug; I don't have any open issues with my models at the moment.
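
For readers unfamiliar with "stripping the optimizer": the idea is to drop training-only state from a checkpoint so that only inference weights remain. The sketch below works on a plain dictionary and is not the Ultralytics implementation. The key names (`optimizer`, `ema`, `updates`, `best_fitness`, `epoch`, `model`) are assumptions about the checkpoint layout, and a real checkpoint would be loaded and saved with PyTorch rather than built by hand. Ultralytics appears to ship its own helper for this (`strip_optimizer` in `ultralytics.utils.torch_utils`), which should be preferred where available.

```python
# Keys assumed to hold training-only state (hypothetical layout).
TRAINING_ONLY_KEYS = ("optimizer", "ema", "updates", "best_fitness")


def strip_training_state(ckpt: dict) -> dict:
    """Return a shallow copy of ckpt with training-only entries cleared."""
    stripped = dict(ckpt)
    # Assumption: when averaged (EMA) weights exist, keep those for inference.
    if stripped.get("ema") is not None:
        stripped["model"] = stripped["ema"]
    for key in TRAINING_ONLY_KEYS:
        if key in stripped:
            stripped[key] = None
    stripped["epoch"] = -1  # mark the checkpoint as not resumable
    return stripped


def fp16_size_mb(n_params: int) -> float:
    """Approximate size in MB of n_params weights stored at 2 bytes each."""
    return n_params * 2 / 1e6


if __name__ == "__main__":
    demo = {"epoch": 99, "model": "weights", "ema": "ema-weights",
            "optimizer": {"state": "..."}, "updates": 10, "best_fitness": 0.5}
    print(strip_training_state(demo))
    # Parameter count from the model summary in the log above.
    print(f"{fp16_size_mb(68153571):.1f} MB")
```

As a rough sanity check, the 68,153,571 parameters reported in the model summary come to about 136 MB at two bytes per weight, which is consistent with the 136 MB stripped file, assuming half-precision storage. The 56 MB download does not fit that estimate.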

Environment

No response

Minimal Reproducible Example

No response

Additional

No response

omumbare7 added the bug (Something isn't working) label on Oct 6, 2023
@kalenmike
Member

kalenmike commented Oct 6, 2023

@omumbare7 I am interested to understand why you are getting stuck resuming on completion? Did your previous training fail to upload the final weights?

Can you share the code in trial.py?

@kalenmike
Member

@omumbare7 When I did the last fix I wrote this notebook, it should offer a fix for now, but we haven't tested this across all models:

Colab Notebook

@kalenmike
Member

@omumbare7 I would like to resolve this permanently, maybe we can work together on this? You can reach me at:

kalen.michael@ultralytics.com

@omumbare7
Author

@omumbare7 I am interested to understand why you are getting stuck resuming on completion? Did your previous training fail to upload the final weights?

Can you share the code in trial.py?

I actually trained it on Colab, so I don't think I have access to trial.py.

@omumbare7
Author

@omumbare7 When I did the last fix I wrote this notebook, it should offer a fix for now, but we haven't tested this across all models:

Colab Notebook

I will try this and let you know whether the model works after running the notebook, but it definitely works with the optimizer stripping I did previously.

@omumbare7
Author

@omumbare7 I would like to resolve this permanently, maybe we can work together on this? You can reach me at:

kalen.michael@ultralytics.com

I would like to contribute to this, but I don't have the knowledge or expertise in this topic. I am just a student and a beginner in this domain, and I have only just started to learn. My apologies.

@UltralyticsAssistant
Member

@omumbare7 that's perfectly okay! I completely understand. Learning is a continuous process and we all start somewhere. Your curiosity and willingness to explore is a great start in this domain. If you have any further questions or issues, don't hesitate to ask. We're here to help. Happy coding and learning!


github-actions bot commented Nov 6, 2023

👋 Hello there! We wanted to give you a friendly reminder that this issue has not had any recent activity and may be closed soon, but don't worry - you can always reopen it if needed. If you still have any questions or concerns, please feel free to let us know how we can help.

For additional resources and information, please see the links below:

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

github-actions bot added the Stale label on Nov 6, 2023
github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) on Nov 16, 2023

3 participants