Output optimal confidence threshold based on PR curve #2048

Closed
decent-engineer-decent-datascientist opened this issue Jan 27, 2021 · 17 comments · Fixed by #2057
Labels
enhancement New feature or request

Comments

@decent-engineer-decent-datascientist

🚀 Feature

Could we print out the maximum F1 score and the associated confidence threshold at the end of training?

Motivation

This is something that should be done for every custom model, and the data is readily available after the final mAP calculation and PR-curve plotting.

Pitch

Filter detections at different score/confidence thresholds, calculate P/R/F1, and then print the optimal threshold (max F1), as in the sketch below.
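
A minimal sketch of the pitch, assuming per-detection confidences and true-positive flags (at a fixed IoU) are already available; the helper name and signature are hypothetical, not the repo's code:

import numpy as np

def best_f1_threshold(confidences, tp_flags, n_gt, thresholds=np.linspace(0.0, 1.0, 101)):
    # confidences: (N,) detection scores; tp_flags: (N,) 1/0 matched-to-ground-truth flags; n_gt: total GT boxes
    best_f1, best_t = 0.0, 0.0
    for t in thresholds:
        keep = confidences >= t
        tp = tp_flags[keep].sum()
        fp = keep.sum() - tp
        p = tp / (tp + fp + 1e-16)
        r = tp / (n_gt + 1e-16)
        f1 = 2 * p * r / (p + r + 1e-16)
        if f1 > best_f1:
            best_f1, best_t = f1, t
    return best_f1, best_t  # max F1 and the confidence threshold that achieves it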

Alternatives

Instead of printing the max P/R point, maybe write a CSV to the run directory containing the metrics at each threshold.

@decent-engineer-decent-datascientist decent-engineer-decent-datascientist added the enhancement New feature or request label Jan 27, 2021
@glenn-jocher
Member

glenn-jocher commented Jan 27, 2021

@decent-engineer-decent-datascientist yes, that's an interesting idea. Unlike P and R, F1 should have a stable maximum (P and R hit their unstable extremes at the 0.0 and 1.0 confidence thresholds). I'm not sure the currently saved metrics suffice for this sort of analysis: at the moment we only save metrics at a fixed confidence, given by --conf for mAP and hard-coded to 0.1 for P and R (and also F1):

pr_score = 0.1 # score to evaluate P and R https://github.com/ultralytics/yolov3/issues/898

@glenn-jocher
Member

@decent-engineer-decent-datascientist evaluations at different pr_score values might also enable an additional feature: an interactive P-R curve (1.0 conf at top left, 0.0 conf at bottom right).

@Ownmarc
Contributor

Ownmarc commented Jan 27, 2021

Oh yes, I like this idea. It would also be interesting to vary the IoU threshold and find the settings that maximize the F1 score.
Could also give the option to weight classes differently in the calculation of this score (a small sketch follows below).
Adding some flexibility in how "best.pt" is determined, without requiring the user to jump into the code, would be huge for this repo I think!
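
For the class-weighting idea, an illustrative sketch (hypothetical helper, assuming per-class F1 values have already been computed):

import numpy as np

def weighted_f1(f1_per_class, class_weights=None):
    # weighted mean of per-class F1; equal weights reproduce the plain macro average
    f1_per_class = np.asarray(f1_per_class, dtype=float)
    w = np.ones_like(f1_per_class) if class_weights is None else np.asarray(class_weights, dtype=float)
    return float((f1_per_class * w).sum() / w.sum())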

@decent-engineer-decent-datascientist

@Ownmarc love the expansion to IoU and best.pt. Definitely something I could use.

@glenn-jocher
Member

@decent-engineer-decent-datascientist @Ownmarc I reviewed the code and this seems feasible, though probably best implemented at a single IoU threshold (i.e. 0.5 only), rather than all 10 IoU thresholds (0.50 : 0.05 : 0.95).

The current implementation computes P and R at a fixed score/conf (0.1), only at the 0.50 IoU threshold. Luckily the operation that does this is vectorizable, so I can extend the P and R op to a vector of confidences, say np.linspace(0, 1, 100). This should allow for some really cool P, R and F1 plots as a function of confidence (rough sketch below).
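
A rough sketch of that vectorized idea (illustrative names, not the PR's exact code), assuming detections are sorted by descending confidence with 0/1 TP flags at the 0.50 IoU threshold:

import numpy as np

def pr_f1_vs_confidence(conf_sorted, tp_sorted, n_gt, px=np.linspace(0, 1, 100)):
    tpc = np.cumsum(tp_sorted)              # cumulative true positives
    fpc = np.cumsum(1 - tp_sorted)          # cumulative false positives
    recall = tpc / (n_gt + 1e-16)
    precision = tpc / (tpc + fpc + 1e-16)
    # interpolate onto a common confidence axis (negated so x is increasing for np.interp)
    p = np.interp(-px, -conf_sorted, precision)
    r = np.interp(-px, -conf_sorted, recall)
    f1 = 2 * p * r / (p + r + 1e-16)
    return p, r, f1                         # curves over the confidence vector px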

@glenn-jocher
Member

glenn-jocher commented Jan 27, 2021

About best.pt, these checkpoints are saved anytime a new best fitness is observed, with default fitness defined as:

yolov5/utils/metrics.py

Lines 12 to 16 in f59f801

def fitness(x):
    # Model fitness as a weighted combination of metrics
    w = [0.0, 0.0, 0.1, 0.9]  # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
    return (x[:, :4] * w).sum(1)

In the past we defined fitness as inverse loss, though interestingly the minimum-loss checkpoint rarely coincided with the highest-mAP checkpoint, so we switched to the current scheme following user feedback. It's possible that an array of 'best' checkpoints might better serve varying community needs, i.e.:

weights/
  last.pt
  best_map.pt
  best_f1.pt  # powered by our new all-confidence measurement
  best_loss.pt
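
Purely illustrative: alternative fitness definitions that could drive separate best_*.pt checkpoints (column layout follows the existing [P, R, mAP@0.5, mAP@0.5:0.95] convention; the F1 variant assumes P and R are reported at the max-F1 confidence):

import numpy as np

def fitness_map(x):
    # current default: mAP-weighted fitness
    w = np.array([0.0, 0.0, 0.1, 0.9])  # weights for [P, R, mAP@0.5, mAP@0.5:0.95]
    return (x[:, :4] * w).sum(1)

def fitness_f1(x):
    # hypothetical F1-based fitness for a best_f1.pt checkpoint
    p, r = x[:, 0], x[:, 1]
    return 2 * p * r / (p + r + 1e-16)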

@glenn-jocher
Member

@Ownmarc @decent-engineer-decent-datascientist I've opened PR #2057 on this topic. Initial YOLOv5m results on COCO are below, evaluated and plotted at 0.50 IoU:

[F1 Curve, Precision Curve and Recall Curve plots attached as images in the original comment]

@decent-engineer-decent-datascientist

Oh gosh, that's so pretty. Do you have any worries about merging this into master (performance, etc.)? The checks look good and there are no conflicts.

@glenn-jocher
Member

@decent-engineer-decent-datascientist the plotting time is only incurred on the last epoch and is not too material, but yes, I still need to profile the added computation, which runs every epoch.

We also need to figure out how best to integrate these new plots into the overall picture. We have 4 curves now (PR plus the 3 above), none of which are interactive, unfortunately. One simple change might be to update the P and R printouts in test.py to display values at the max-F1 confidence rather than at 0.1 confidence, and to add an output for F1 (sketched below).
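
A sketch of that printout change (hypothetical helper), picking the peak of the F1-vs-confidence curve so P and R can be reported there instead of at 0.1:

import numpy as np

def metrics_at_max_f1(px, p_curve, r_curve, f1_curve):
    i = np.argmax(f1_curve)  # index of peak F1 along the confidence axis
    return px[i], p_curve[i], r_curve[i], f1_curve[i]  # (confidence, P, R, F1) at that point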

@decent-engineer-decent-datascientist

Any reason we can't use this:
https://wandb.ai/lavanyashukla/visualize-predictions/reports/Visualize-Model-Predictions--Vmlldzo1NjM4OA#Plots-2

Seems like an easy enough addition, and it'd let us play around with the data in wandb.

@glenn-jocher
Member

I've merged #2057. All four plots now output by default, and P and R metrics are now logged (and printed to screen) at the optimal-F1 confidence during training (which I assume may vary over the training epochs).

@decent-engineer-decent-datascientist

Awesome, I'll test it out now. Is there a reason you upload the results as an image rather than passing the plots? I believe passing the matplotlib figure will actually convert it into an interactive Plotly plot in the wandb interface (minimal sketch below).
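
Something like this, assuming a hypothetical project name and placeholder curve data; wandb.log accepts a matplotlib figure object directly, and the UI can render it interactively when convertible:

import matplotlib.pyplot as plt
import numpy as np
import wandb

wandb.init(project='yolov5-curves')    # hypothetical project name
px = np.linspace(0, 1, 100)
f1_curve = np.random.rand(100)         # placeholder data for illustration
fig, ax = plt.subplots()
ax.plot(px, f1_curve)
ax.set_xlabel('confidence')
ax.set_ylabel('F1')
wandb.log({'F1 curve': fig})           # pass the figure itself instead of wandb.Image(path)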

@glenn-jocher
Member

Just lack of time. We'd want to transition the entire logging infrastructure (or all plots at least) to an interactive, logger-agnostic local environment (i.e. a local plotly/bokeh dashboard) first, and then transition remote logging to mirror it.

@glenn-jocher
Member

@decent-engineer-decent-datascientist if you'd like to take a stab at this feel free to! The main precedent for passing logger sources through plotting functions is here.

A loggers dict is constructed here with room for future growth:

yolov5/train.py

Line 134 in f639e14

loggers = {'wandb': wandb} # loggers dict

which is then passed down to lower level plotting functions:

yolov5/train.py

Line 204 in f639e14

plot_labels(labels, save_dir, loggers)

and run at the end of a given plotting function:

yolov5/utils/plots.py

Lines 295 to 299 in f639e14

# loggers
for k, v in loggers.items() or {}:
    if k == 'wandb' and v:
        v.log({"Labels": [v.Image(str(x), caption=x.name) for x in save_dir.glob('*labels*.jpg')]}, commit=False)

Like I was saying, though, we'd want to provide a consistent cross-platform experience, so we'd begin the transition locally (i.e. through all plots in utils/plots.py) and then migrate those changes to wandb etc. where possible.

@decent-engineer-decent-datascientist

@glenn-jocher ah, time. The things we could do without that constraint haha. I'll mess around with it though, thank you for the pointers!

@decent-engineer-decent-datascientist

@glenn-jocher also, would you like me to close this issue now that we've got a nice F1-vs-confidence graph?

@glenn-jocher
Member

Sure, sounds good!

Ah, the issue was closed automatically when the linked PR was merged.
