Training killed from the beginning #536
👋 Hello @eder1234, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more.
If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix. If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response. We try to respond to all issues as promptly as possible. Thank you for your patience!
@eder1234 hello! It looks like your training process is being terminated early, which could be due to a few reasons. The most common cause is the system running out of RAM: on Linux, the OOM killer terminates the offending process and leaves only a bare "Killed" message in the console, exactly as in your log. A too-large batch size or caching the dataset in memory (cache=ram) can also exhaust available RAM.
To troubleshoot, you can start by monitoring system resources during training, reducing the batch size, and checking for any system logs that might indicate why the process was killed. If you continue to face issues, please provide more details, such as any error messages or logs, so we can assist you further. For more detailed guidance on troubleshooting, you can refer to the Ultralytics HUB Docs. Good luck with your training! 🚀
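To monitor system resources as suggested above, a minimal sketch (assuming a Linux host, which matches the paths in the log below) is to watch the `MemAvailable` field of `/proc/meminfo` from a second terminal while training runs. The helper function and its name here are illustrative, not part of Ultralytics:

```python
# Minimal memory-pressure check for Linux. Run this in a separate terminal
# while training; a steadily shrinking value suggests the OOM killer is near.
def available_ram_gb(meminfo_text: str) -> float:
    """Parse the MemAvailable field (reported in kB) out of /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])
            return kb / 1024 ** 2  # kB -> GB
    raise ValueError("MemAvailable field not found")

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(f"Available RAM: {available_ram_gb(f.read()):.1f} GB")
```

After a kill, `dmesg` (or `journalctl -k`) will usually show an "Out of memory: Killed process ..." line confirming the cause.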
Thank you! Indeed, my system ran out of RAM. Therefore, I increased the swap memory and it works better now.
You're welcome, @eder1234! I'm glad to hear that increasing the swap memory resolved the issue. Remember that using swap memory can slow down the training process since it's not as fast as RAM, but it's a good workaround when physical RAM is limited. If you have any more questions or run into further issues, feel free to reach out. Happy training! 🎉
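For a back-of-the-envelope sense of why RAM ran out: with `cache=ram`, decoded images are held in memory, so a rough lower bound on the cache footprint is `n_images * H * W * 3` bytes (this uint8-per-pixel model is my assumption, not a statement about Ultralytics internals). Plugging in the dataset sizes from the log below:

```python
# Rough estimate (assumes images cached as uint8 HxWx3 arrays at imgsz=640;
# this is an approximation, not the exact Ultralytics caching behavior).
def cache_ram_gb(n_images: int, imgsz: int, channels: int = 3) -> float:
    """Approximate RAM in GB to cache n_images decoded square images."""
    return n_images * imgsz * imgsz * channels / 1024 ** 3

train_gb = cache_ram_gb(2450, 640)  # train split size from the log
val_gb = cache_ram_gb(525, 640)     # val split size from the log
print(f"~{train_gb + val_gb:.1f} GB just for cached images")  # roughly 3.4 GB
```

On a 15.3 GB machine also running a desktop, a browser, and the training process itself, a few extra GB of image cache can be enough to tip the system into the OOM killer; `cache=disk` or `cache=False` avoids this at the cost of slower data loading.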
Search before asking
Question
Hi, I would like to know why the training is being killed when I run it locally.
Additional
Ultralytics YOLOv8.1.1 🚀 Python-3.10.13 torch-2.1.2+cu121 CUDA:0 (NVIDIA GeForce RTX 4050 Laptop GPU, 5905MiB)
WARNING ⚠️ Skipping /home/rodriguez/datasets/classify.zip unzip as destination directory /home/rodriguez/datasets/classify is not empty.
Setup complete ✅ (12 CPUs, 15.3 GB RAM, 69.3/199.9 GB disk)
Ultralytics HUB: New authentication successful ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/HLv7cxztEUvk5eWJdJ9C 🚀
Downloading https://github.com/ultralytics/assets/releases/download/v8.1.0/yolov8s-cls.pt to 'yolov8s-cls.pt'...
100%|██████████████████████████████████████| 12.2M/12.2M [00:08<00:00, 1.54MB/s]
Ultralytics YOLOv8.1.1 🚀 Python-3.10.13 torch-2.1.2+cu121 CUDA:0 (NVIDIA GeForce RTX 4050 Laptop GPU, 5905MiB)
engine/trainer: task=classify, mode=train, model=yolov8s-cls.pt, data=https://storage.googleapis.com/ultralytics-hub.appspot.com/users/ZWUKwk47LeVGf1U0Cw0uiZmR8HQ2/datasets/F3xQK5zKgATriBNeyhOM/classify.zip, epochs=100, time=None, patience=100, batch=-1, imgsz=640, save=True, save_period=-1, cache=ram, device=0, workers=8, project=None, name=train6, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs/classify/train6
WARNING
train: /home/rodriguez/datasets/classify/train... found 2450 images in 5 classes ✅
val: /home/rodriguez/datasets/classify/val... found 525 images in 5 classes ✅
test: /home/rodriguez/datasets/classify/test... found 525 images in 5 classes ✅
Overriding model.yaml nc=1000 with nc=5
0 -1 1 928 ultralytics.nn.modules.conv.Conv [3, 32, 3, 2]
1 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
2 -1 1 29056 ultralytics.nn.modules.block.C2f [64, 64, 1, True]
3 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2]
4 -1 2 197632 ultralytics.nn.modules.block.C2f [128, 128, 2, True]
5 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2]
6 -1 2 788480 ultralytics.nn.modules.block.C2f [256, 256, 2, True]
7 -1 1 1180672 ultralytics.nn.modules.conv.Conv [256, 512, 3, 2]
8 -1 1 1838080 ultralytics.nn.modules.block.C2f [512, 512, 1, True]
9 -1 1 664325 ultralytics.nn.modules.head.Classify [512, 5]
YOLOv8s-cls summary: 99 layers, 5087141 parameters, 5087141 gradients, 12.6 GFLOPs
Transferred 156/158 items from pretrained weights
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
AutoBatch: Computing optimal batch size for imgsz=640
AutoBatch: CUDA:0 (NVIDIA GeForce RTX 4050 Laptop GPU) 5.77G total, 0.22G reserved, 0.07G allocated, 5.48G free
Params GFLOPs GPU_mem (GB) forward (ms) backward (ms) input output
5087141 12.58 0.392 34.26 20.78 (1, 3, 640, 640) (1, 5)
5087141 25.17 0.516 4.312 14.08 (2, 3, 640, 640) (2, 5)
5087141 50.34 0.761 9.301 26.13 (4, 3, 640, 640) (4, 5)
5087141 100.7 1.275 21.11 32 (8, 3, 640, 640) (8, 5)
5087141 201.4 2.282 47.01 63.57 (16, 3, 640, 640) (16, 5)
AutoBatch: Using batch-size 23 for CUDA:0 3.45G/5.77G (60%) ✅
train: Scanning /home/rodriguez/datasets/classify/train... 2450 images, 0 corrup
val: Scanning /home/rodriguez/datasets/classify/val... 525 images, 0 corrupt: 10
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000714, momentum=0.9) with parameter groups 26 weight(decay=0.0), 27 weight(decay=0.0005390625), 27 bias(decay=0.0)
100 epochs...
0%| | 0/107 [00:00<?, ?it/s]Killed