Model Ensembling Tutorial #318

glenn-jocher · 2020-07-07T22:39:30Z

📚 This guide explains how to use YOLOv5 🚀 model ensembling during testing and inference for improved mAP and Recall. UPDATED 25 September 2022.

From https://www.sciencedirect.com/topics/computer-science/ensemble-modeling:

Ensemble modeling is a process where multiple diverse models are created to predict an outcome, either by using many different modeling algorithms or using different training data sets. The ensemble model then aggregates the prediction of each base model and results in one final prediction for the unseen data. The motivation for using ensemble models is to reduce the generalization error of the prediction. As long as the base models are diverse and independent, the prediction error of the model decreases when the ensemble approach is used. The approach seeks the wisdom of crowds in making a prediction. Even though the ensemble model has multiple base models within the model, it acts and performs as a single model.

Before You Start

Clone repo and install requirements.txt in a Python>=3.7.0 environment, including PyTorch>=1.7. Models and datasets download automatically from the latest YOLOv5 release.

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

Test Normally

Before ensembling we want to establish the baseline performance of a single model. This command tests YOLOv5x on COCO val2017 at image size 640 pixels. yolov5x.pt is the largest and most accurate model available. Other options are yolov5s.pt, yolov5m.pt and yolov5l.pt, or your own checkpoint from training on a custom dataset, such as ./weights/best.pt. For details on all available models please see our README table.

python val.py --weights yolov5x.pt --data coco.yaml --img 640 --half

Output:

val: data=./data/coco.yaml, weights=['yolov5x.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.65, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True
YOLOv5 🚀 v5.0-267-g6a3ee7c torch 1.9.0+cu102 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Fusing layers... 
Model Summary: 476 layers, 87730285 parameters, 0 gradients

val: Scanning '../datasets/coco/val2017' images and labels...4952 found, 48 missing, 0 empty, 0 corrupted: 100% 5000/5000 [00:01<00:00, 2846.03it/s]
val: New cache created: ../datasets/coco/val2017.cache
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [02:30<00:00,  1.05it/s]
                 all       5000      36335      0.746      0.626       0.68       0.49
Speed: 0.1ms pre-process, 22.4ms inference, 1.4ms NMS per image at shape (32, 3, 640, 640)  # <--- baseline speed

Evaluating pycocotools mAP... saving runs/val/exp/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.504  # <--- baseline mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.688
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.546
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.351
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.644
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.382
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.628
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.681  # <--- baseline mAR
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.524
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.735
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.826

Ensemble Test

Multiple pretrained models may be ensembled together at test and inference time by simply appending extra models to the --weights argument in any existing val.py or detect.py command. This example tests an ensemble of 2 models together:

YOLOv5x
YOLOv5l6

python val.py --weights yolov5x.pt yolov5l6.pt --data coco.yaml --img 640 --half

Output:

val: data=./data/coco.yaml, weights=['yolov5x.pt', 'yolov5l6.pt'], batch_size=32, imgsz=640, conf_thres=0.001, iou_thres=0.6, task=val, device=, single_cls=False, augment=False, verbose=False, save_txt=False, save_hybrid=False, save_conf=False, save_json=True, project=runs/val, name=exp, exist_ok=False, half=True
YOLOv5 🚀 v5.0-267-g6a3ee7c torch 1.9.0+cu102 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Fusing layers... 
Model Summary: 476 layers, 87730285 parameters, 0 gradients  # Model 1
Fusing layers... 
Model Summary: 501 layers, 77218620 parameters, 0 gradients  # Model 2
Ensemble created with ['yolov5x.pt', 'yolov5l6.pt']  # Ensemble notice

val: Scanning '../datasets/coco/val2017.cache' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100% 5000/5000 [00:00<00:00, 49695545.02it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100% 157/157 [03:58<00:00,  1.52s/it]
                 all       5000      36335      0.747      0.637      0.692      0.502
Speed: 0.1ms pre-process, 39.5ms inference, 2.0ms NMS per image at shape (32, 3, 640, 640)  # <--- ensemble speed

Evaluating pycocotools mAP... saving runs/val/exp3/yolov5x_predictions.json...
...
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.515  # <--- ensemble mAP
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.699
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.557
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.356
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.563
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.668
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.387
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.638
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.689  # <--- ensemble mAR
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.526
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.743
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.844
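
To put the two runs side by side, the figures reported above can be compared with a few lines of arithmetic. This is only a convenience sketch: the numbers are copied from the two logs above, and the dictionary and variable names are made up for illustration.

```python
# Figures copied from the two val.py logs above (pycocotools summary and Speed lines)
baseline = {"mAP": 0.504, "mAR": 0.681, "inference_ms": 22.4}
ensemble = {"mAP": 0.515, "mAR": 0.689, "inference_ms": 39.5}

# Absolute gains in accuracy and the relative cost in inference time
map_gain = ensemble["mAP"] - baseline["mAP"]
mar_gain = ensemble["mAR"] - baseline["mAR"]
slowdown = ensemble["inference_ms"] / baseline["inference_ms"]

print(f"mAP@0.5:0.95 gain: {map_gain:+.3f} ({map_gain / baseline['mAP']:.1%} relative)")
print(f"mAR gain:          {mar_gain:+.3f}")
print(f"inference time:    {slowdown:.2f}x the single-model time")
```

On this hardware the ensemble buys roughly 0.011 mAP and 0.008 mAR for a bit under twice the inference time, so whether it is worthwhile depends on the latency budget.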

Ensemble Inference

Append extra models to the --weights argument to run ensemble inference:

python detect.py --weights yolov5x.pt yolov5l6.pt --img 640 --source data/images

Output:

detect: weights=['yolov5x.pt', 'yolov5l6.pt'], source=data/images, imgsz=640, conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project=runs/detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False
YOLOv5 🚀 v5.0-267-g6a3ee7c torch 1.9.0+cu102 CUDA:0 (Tesla P100-PCIE-16GB, 16280.875MB)

Fusing layers... 
Model Summary: 476 layers, 87730285 parameters, 0 gradients
Fusing layers... 
Model Summary: 501 layers, 77218620 parameters, 0 gradients
Ensemble created with ['yolov5x.pt', 'yolov5l6.pt']

image 1/2 /content/yolov5/data/images/bus.jpg: 640x512 4 persons, 1 bus, 1 tie, Done. (0.063s)
image 2/2 /content/yolov5/data/images/zidane.jpg: 384x640 3 persons, 2 ties, Done. (0.056s)
Results saved to runs/detect/exp2
Done. (0.223s)

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Notebooks with free GPU:
Google Cloud Deep Learning VM. See GCP Quickstart Guide
Amazon Deep Learning AMI. See AWS Quickstart Guide
Docker Image. See Docker Quickstart Guide

Status

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training, validation, inference, export and benchmarks on MacOS, Windows, and Ubuntu every 24 hours and on every commit.


ZeKunZhang1998 · 2020-08-16T02:08:01Z

Can I use it in version 1?

SISTMrL · 2020-08-22T01:48:09Z

what's influence of model ensemble

Zzh-tju · 2020-08-25T20:24:19Z

compare to test-time augmentation? which one will be better?

Zzh-tju · 2020-08-25T20:36:15Z

Well, I see, the model ensembling method is actually more like using a poor model to find missed detections for a good model. In contrast, TTA can also find missed detections by changing the input, while maintaining using the best model.

glenn-jocher · 2020-08-25T21:59:32Z

@Zzh-tju ensembling and TTA are not mutually exclusive. You can TTA a single model, and you can ensemble a group of models with or without TTA:

python detect.py --weights model1.pt model2.pt --augment

glenn-jocher · 2020-08-25T22:08:33Z

@Zzh-tju ensembling runs multiple models, while TTA tests a single model with different augmentations. Typically I've seen the best results when merging output grids directly (i.e. ensembling YOLOv5l and YOLOv5x), rather than simply appending boxes from multiple models for NMS to sort out. This is not always possible, however. For example, when ensembling an EfficientDet model with YOLOv5x you cannot merge grids; you must use NMS or WBF (or Merge NMS) to get a final result.
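
The difference between the two strategies mentioned here can be sketched in plain Python. This is an illustration only, not the repository's implementation: each model output is assumed to be a list of prediction rows (for example `[x, y, w, h, conf, ...]`), and the function names `mean_ensemble` and `concat_ensemble` are hypothetical.

```python
def mean_ensemble(outputs):
    """Merge aligned outputs by averaging them element by element.

    Only valid when every model emits the same number of rows in the same
    order, i.e. the models share the same output grid.
    """
    if len({len(out) for out in outputs}) != 1:
        raise ValueError("mean ensembling needs identically shaped outputs")
    n = len(outputs)
    # zip(*outputs) pairs up row i of every model; zip(*rows) pairs up columns
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*outputs)]


def concat_ensemble(outputs):
    """Append all rows from all models and leave de-duplication to NMS or WBF."""
    return [row for out in outputs for row in out]


# Two toy models with one prediction row each
model_a = [[0.0, 2.0]]
model_b = [[2.0, 4.0]]
merged = mean_ensemble([model_a, model_b])
pooled = concat_ensemble([model_a, model_b])
```

Averaging keeps the number of candidate boxes constant but requires matching architectures, while concatenation works for any mix of detectors at the cost of a larger candidate set for the post-processing step.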

Blaze-raf97 · 2020-10-16T10:39:19Z

How can I ensemble EfficientDet D7 with YOLO V5x?

glenn-jocher · 2020-10-16T11:07:02Z

@Blaze-raf97 with the right amount of coffee anything is possible.

LokedSher · 2020-10-23T07:17:48Z

How to solve this problem?
COCO mAP with pycocotools... saving detections_val2017__results.json...
ERROR: pycocotools unable to run: invalid literal for int() with base 10: 'Image_20200930140952222'

glenn-jocher · 2020-10-23T07:23:17Z

@LokedSher pycocotools is only intended for mAP on COCO data using coco.yaml. https://pypi.org/project/pycocotools/

LokedSher · 2020-10-23T07:31:14Z

Thanks for your reply!

ZwNSW · 2020-11-03T07:47:53Z

@LokedSher I also encountered the same problem as you, but after I read your Q&A, I still don't know how to improve to get the picture given by the author.

pathikg · 2022-10-26T11:13:24Z

I want to ensemble two yolov5x6 models trained on the same data with some variation
I saw the Ensemble() class mentioned in above issues but I was a bit confused on how to implement it in a python script

In other words, how do I exactly use that Ensemble in order to create an ensemble of my 2 models within a python script/function just by passing the models?

glenn-jocher · 2022-10-26T12:13:30Z

@pathikg YOLOv5 ensembling is automatically built into detect.py and val.py, so simply pass two weights:

python detect.py --weights yolov5s.pt yolov5m.pt

pathikg · 2022-10-26T14:49:02Z

Thanks @glenn-jocher for quick reply but I want to do this in a python script
At present, I am loading model from torch hub with my custom weights and then doing the inference on respective images.
At present I've two such models, and I want to make ensemble of the same so is there any way I can do that as well?

glenn-jocher · 2022-10-26T14:51:00Z

I’d follow the code in detect.py and use the Ensemble() module from models/common.py

pathikg · 2022-10-26T14:55:01Z

You mean from models/experimental.py?
cuz there's no Ensemble() module present in common.py :/

glenn-jocher · 2022-11-05T00:52:33Z

yes sorry in experimental.py
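
For readers who want this in a script, here is a minimal sketch of what following detect.py might look like. It assumes the script runs from the root of a yolov5 checkout, that `attempt_load` in models/experimental.py returns an `Ensemble` when given a list of weights, and that `non_max_suppression` lives in utils/general.py; the wrapper names `load_ensemble` and `run_ensemble` are hypothetical. As far as I know the second parameter of `attempt_load` was renamed between releases, which is why it is passed positionally here.

```python
from pathlib import Path


def load_ensemble(weight_paths, device="cpu"):
    """Load two or more YOLOv5 checkpoints as a single ensemble model.

    Assumes the current working directory is a yolov5 checkout so that
    `models.experimental` is importable.
    """
    paths = [str(Path(p)) for p in weight_paths]
    if len(paths) < 2:
        raise ValueError("an ensemble needs at least two weight files")
    # Imported lazily because this module only exists inside the yolov5 repo
    from models.experimental import attempt_load
    # Passing a list of weights is what makes attempt_load build an Ensemble
    return attempt_load(paths, device)


def run_ensemble(model, img, conf_thres=0.25, iou_thres=0.45):
    """Run the ensemble on a preprocessed image tensor and apply NMS.

    `img` is assumed to be a float tensor of shape (1, 3, H, W) scaled to 0-1
    and letterboxed the same way detect.py prepares its inputs.
    """
    import torch
    from utils.general import non_max_suppression
    with torch.no_grad():
        pred = model(img)[0]  # first element holds the pooled predictions
    return non_max_suppression(pred, conf_thres, iou_thres)
```

A call such as `load_ensemble(["best_a.pt", "best_b.pt"])` (file names are placeholders) would then give a model that can be passed to `run_ensemble`. The thresholds default to the values shown in the detect.py log earlier in this thread.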

AfiqueAye · 2023-01-02T20:28:52Z

@glenn-jocher I really need your help for one of my problems.
I have a dataset trained on cat, dog and pen, and after training, I have got dataset1.pt best file
Now, I trained another model with new data, for example I have taken just images of cow, and got dataset2.pt best file.
Both of the trained model can detect the images separately. But I want to make them one model, so that it will detect all the images (cat, dog, pen and cow) using a single weight file.
Can I do it using ensemble techniques?
will this work?
python detect.py --weights dataset1.pt dataset2.pt --img 640 --source data/images/cat
Or how can I do this? Is there any way other than creating a new dataset with all the images and training again? Please reply, thanks.

michael-mayo · 2023-02-08T20:01:01Z

@Zzh-tju ensembling and TTA are not mutually exclusive. You can TTA a single model, and you can ensemble a group of models with or without TTA:

python detect.py --weights model1.pt model2.pt --augment

Hi @glenn-jocher , what would be the best practice for deploying an ensemble model like this? I know we can export the individual models for different deployment frameworks, but how would I export an ensemble?

arbaz-pivotchain · 2023-04-03T08:57:54Z

Hello, I read a previous discussion about ensembling multiple trained networks by simply passing more than 1 weight file during inference.

My question is what technique is being used for fusing the predictions? Is it majority voting on the bounding boxes? Some weighted averaging?

michael-mayo · 2023-04-03T19:30:26Z

I ended up predicting each image with all of the ensemble members separately, and then combining the bounding box predictions together and doing a second stage NMS to generate a final combined prediction. Seems to work OK. Another possibility is to average the weights of the ensemble members into an averaged model like they do in federated learning, but I have not properly evaluated that method yet.
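
A dependency-free sketch of that two-stage idea is shown below: pool the boxes from every ensemble member, then run a greedy, class-aware NMS over the pooled set. The detection layout `(x1, y1, x2, y2, conf, cls)` and the function names are assumptions for illustration; in practice a library routine such as the NMS in torchvision would normally replace the hand-written loop.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def combine_with_nms(per_model_dets, iou_thres=0.45):
    """Pool detections from all models, then apply greedy class-aware NMS."""
    pooled = sorted(
        (det for dets in per_model_dets for det in dets),
        key=lambda det: det[4],  # highest confidence first
        reverse=True,
    )
    kept = []
    for det in pooled:
        # Keep a box unless a stronger box of the same class overlaps it too much
        if all(det[5] != k[5] or iou(det[:4], k[:4]) <= iou_thres for k in kept):
            kept.append(det)
    return kept


# Two members agree on one object; the second member also finds another object
member_a = [(10, 10, 50, 50, 0.9, 0)]
member_b = [(12, 12, 52, 52, 0.8, 0), (100, 100, 150, 150, 0.7, 1)]
final = combine_with_nms([member_a, member_b])
```

The overlapping pair collapses to the higher-confidence box, and the box found by only one member survives, which is how this scheme recovers missed detections.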

glenn-jocher · 2023-04-03T20:03:08Z

@michael-mayo that's a good approach! Typically, the ensembling technique involves averaging the model weights and biases instead of the predictions. Here, you are combining the predictions which can be achieved using the NMS algorithm. Keep in mind that it's important to experiment and choose the best method based on the specific problem and the performance of each approach.

michael-mayo · 2023-04-04T00:54:32Z

I should also add that for each ensemble member I trained using a different global random seed, and a different (5/6) subset of the training data, to improve ensemble diversity.
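
One way such subsets could be generated is sketched below. It assumes "5/6 subset" means splitting the data into six folds and leaving a different fold out for each member, which is an interpretation rather than something stated above; the function name is hypothetical.

```python
import random


def leave_one_fold_out_subsets(items, n_members=6, seed=0):
    """Return one training subset per ensemble member, each omitting a different fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    folds = [shuffled[i::n_members] for i in range(n_members)]
    return [
        [x for j, fold in enumerate(folds) if j != i for x in fold]
        for i in range(n_members)
    ]


# Example with 12 placeholder image ids: every member sees 10 of them
subsets = leave_one_fold_out_subsets(range(12))
member_seeds = list(range(len(subsets)))  # a distinct global seed per member
```

Each member would then be trained on its own subset with its own seed, so no two members see exactly the same data or initialization.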

glenn-jocher · 2023-04-04T03:48:37Z

@michael-mayo That's a great technique to improve ensemble diversity. It can help reduce the chances of overfitting (which can happen if all ensemble members are trained on exactly the same data) and increase the robustness of the final predictions.

mek651 · 2023-05-10T22:48:18Z

Hi @glenn-jocher I have a question: Suppose we train an object detection model separately on two completely separate datasets (datasets A and B) so that the classes are the same in both datasets (for example usask and avarlis in Wheat Head Detection).
After training the same model on 2 datasets, we will have 2 different best.pt files.
Question: How can we combine (or aggregate) these 2 model weights to have a single .pt file including the aggregated model weights?

michael-mayo · 2023-05-10T23:11:25Z

Hi @mek651 this can be done straightforwardly by loading each model, getting the state_dict for each model (which is a sequence of arrays or tensors I believe), doing a straightforward average of the two state_dicts, then deep copying one of the models, assigning the averaged state_dict and saving it. In YOLO this is only going to make sense if the classes are exactly the same though. You might get better results by training one larger model on both datasets at the same time.
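
The averaging step described here can be sketched generically. The function below works on any mapping from parameter names to values that support `+`, `*` and `/`, so it runs on plain floats as written and should also work on PyTorch tensors; the name `average_state_dicts` and the optional `weights` argument (for example per-dataset sample counts) are assumptions added for illustration.

```python
def average_state_dicts(state_dicts, weights=None):
    """Return the (optionally weighted) average of several state dicts.

    All dicts must contain exactly the same keys, which in practice means
    the models share one architecture and one class list.
    """
    if not state_dicts:
        raise ValueError("need at least one state dict")
    keys = list(state_dicts[0])
    if any(set(sd) != set(keys) for sd in state_dicts):
        raise ValueError("state dicts have different parameter names")
    if weights is None:
        weights = [1.0] * len(state_dicts)  # plain average
    total = float(sum(weights))
    return {
        k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) / total
        for k in keys
    }


# Toy example with scalar "parameters"
avg = average_state_dicts([{"w": 1.0, "b": 2.0}, {"w": 3.0, "b": 4.0}])
weighted = average_state_dicts([{"w": 0.0}, {"w": 4.0}], weights=[3, 1])
```

With real checkpoints the inputs would come from `model.state_dict()` and the result would go back in through `load_state_dict()` on a copy of one model. Two caveats to check before relying on it: integer buffers such as BatchNorm counters become floats after division, and checkpoints stored in half precision are probably best converted to float first.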

mek651 · 2023-05-10T23:22:29Z

Thanks @michael-mayo.
Actually, my goal is to use Federated Learning. The simplest version I am working on applies FedAvg to the output weights (two or more best.pt files). The classes of both datasets are exactly the same.

Do you have any idea about this?

andualemw1 · 2023-07-01T17:36:31Z

Thanks for the relevant discussion, I am new to YOLOv5 and this platform. I wonder if there is a way to make ensemble training on two different datasets, with one frozen pre-trained model and one from scratch?

bshakya77 · 2023-12-27T06:45:35Z

Hello,

I did not see the python implementation for detect tasks. Is it same as the predict task?

python detect.py --weights yolov5s.pt yolov5m.pt
If not, how can I run the detect task using python implementation?

Thank you.

Regards,
Bijay

kidcad1412 · 2024-04-15T12:06:06Z

@glenn-jocher I really need your help for one of my problems. I have a dataset trained on cat, dog and pen, and after training, I have got dataset1.pt best file Now, I trained another model with new data, for example I have taken just images of cow, and got dataset2.pt best file. Both of the trained model can detect the images separately. But I want to make them one model, so that it will detect all the images (cat, dog, pen and cow) using a single weight file. Can I do it using ensemble techniques? will this work? python detect.py --weights dataset1.pt dataset2.pt --img 640 --source data/images/cat or how can I do this, is there any way, rather than creating a new dataset will all the images, and trained again. Please reply, thanks

Did you solve this? Models with different numbers of classes raise this error:
assert all(model[0].nc == m.nc for m in model), f"Models have different class counts: {[m.nc for m in model]}"
AssertionError: Models have different class counts: [36, 80]
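
As the assertion shows, the built-in ensemble expects every model to have the same class count. One possible workaround, outside the built-in ensemble and offered only as a sketch, is to run each model separately and translate its class indices into a shared label list before pooling the detections. The detection layout `(x1, y1, x2, y2, conf, cls)` and both function names are hypothetical.

```python
def build_label_map(names_per_model):
    """Create a unified class list and, per model, a local-index to unified-index map."""
    unified = []
    maps = []
    for names in names_per_model:
        mapping = {}
        for local_idx, name in enumerate(names):
            if name not in unified:
                unified.append(name)
            mapping[local_idx] = unified.index(name)
        maps.append(mapping)
    return unified, maps


def merge_detections(per_model_dets, maps):
    """Rewrite the class id (last field of each detection) into the unified label space."""
    merged = []
    for dets, mapping in zip(per_model_dets, maps):
        for *box_and_conf, cls in dets:
            merged.append((*box_and_conf, mapping[int(cls)]))
    return merged


# Class lists taken from the question above: one model per dataset
unified, maps = build_label_map([["cat", "dog", "pen"], ["cow"]])
merged = merge_detections(
    [[(0, 0, 10, 10, 0.9, 1)], [(5, 5, 20, 20, 0.8, 0)]],
    maps,
)
```

This yields one combined set of detections but still requires both weight files at inference time. A single weight file covering all four classes would most likely still need retraining on a merged dataset.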

Twinkle2401 · 2024-05-23T08:05:05Z

I want to use majority voting for ensembling. What type of aggregation is used during inference or testing with the command below, and how can majority voting be used instead?

python val.py --weights yolov5x.pt yolov5l6.pt --data coco.yaml --img 640 --half
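
Majority voting over boxes is not something the commands in this thread expose as an option, so it would have to be applied to the per-model detections afterwards. The sketch below shows one way this could work, under the assumption that detections are `(x1, y1, x2, y2, conf, cls)` tuples and that two boxes "agree" when they share a class and overlap above an IoU threshold; the function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def majority_vote(per_model_dets, iou_thres=0.5):
    """Keep a detection only if more than half of the models report a matching box."""
    n_models = len(per_model_dets)
    tagged = sorted(
        ((det, m) for m, dets in enumerate(per_model_dets) for det in dets),
        key=lambda pair: pair[0][4],  # highest confidence first
        reverse=True,
    )
    clusters = []  # each cluster: representative box plus the set of voting models
    for det, m in tagged:
        for c in clusters:
            if c["rep"][5] == det[5] and box_iou(c["rep"][:4], det[:4]) > iou_thres:
                c["voters"].add(m)
                break
        else:
            clusters.append({"rep": det, "voters": {m}})
    return [c["rep"] for c in clusters if len(c["voters"]) > n_models / 2]


# Three models: two agree on one object, the third reports something else
voted = majority_vote([
    [(10, 10, 50, 50, 0.9, 0)],
    [(12, 12, 52, 52, 0.8, 0)],
    [(200, 200, 240, 240, 0.6, 0)],
])
```

With only two models, as in the command above, a strict majority means both models must agree, which trades recall for precision compared with pooling boxes and running NMS.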

glenn-jocher added documentation Improvements or additions to documentation enhancement New feature or request labels Jul 7, 2020

glenn-jocher self-assigned this Jul 7, 2020

glenn-jocher added a commit that referenced this issue Jul 7, 2020

Initial model ensemble capability #318

e8cf24b

