
Improving Confusion Matrix Interpretability: FP and FN vectors should be switched to align with Predicted and True axis #2071

Closed · rbavery opened this issue Jan 28, 2021 · 15 comments
Labels: enhancement (New feature or request), Stale

rbavery (Contributor) commented Jan 28, 2021

🚀 Feature

The current code to generate the confusion matrix produces something like this:

[Image: human_activities_0.5_0.85_confusion_matrix_counts]

If the confusion matrix is normalized so that counts become proportions, the proportions do not reflect standard metrics like class-specific recall.

For example, if we focus on the "boma" class and the proportion of TP, ideally we would want to show an interpretable proportion like recall: TP for boma / (TP for boma + FN for boma).

Instead, the proportion currently reflects TP for boma / (TP for boma + FN for boma - FN_background for boma + FP_background for boma), which is difficult to interpret.
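
A minimal numeric sketch of the difference being described (the counts below are hypothetical, not taken from the figures in this issue):

    # Hypothetical counts for a single class ("boma"); none of these numbers come from the figures above.
    tp = 40              # groundtruth boma predicted as boma
    fn = 60              # groundtruth boma not predicted as boma (misclassified or missed entirely)
    fn_background = 50   # the subset of fn with no detection at all (currently placed in the rightmost column)
    fp_background = 25   # detections of boma with no matching groundtruth (currently placed in the bottom row)

    recall = tp / (tp + fn)                                   # 0.40 -- the interpretable quantity
    current = tp / (tp + fn - fn_background + fp_background)  # ~0.53 -- what the normalized cell reflects today
    print(recall, current)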

[Image: human_activities_0.5_0.85_confusion_matrix]

I think this is a simple fix: switch the bottom row and its label with the rightmost column and its label. This way the "True" and "Predicted" axis labels act as axes for all the cells in the confusion matrix, including the False Positive background and False Negative background cells.

For example, this confusion matrix shows more clearly that 4% of the groundtruth "boma" instances are correctly detected.

[Image: human_activities_0.5_0.85_confusion_matrix]

This is the count matrix that swaps the FP and FN background vectors to show what we did to produce the proportion matrix above.

[Image: human_activities_0.5_0.85_confusion_matrix_counts]

Motivation

This will make normalized confusion matrices more interpretable. I'm not sure how folks interpret the current implementation of the confusion matrix when it is normalized, and I would appreciate guidance if there is some reasoning behind the current implementation. However, I think the change makes interpretation easier.

Pitch

Take the False Positive background row and swap it with the False Negative background column, so that the "Predicted" axis reflects the predicted category for every row and the "True" axis reflects the groundtruth for every column.
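
As a rough illustration of the pitch, here is a post-hoc sketch on a NumPy array (not the actual YOLOv5 ConfusionMatrix code; swap_background_vectors is a hypothetical helper):

    import numpy as np

    def swap_background_vectors(matrix: np.ndarray) -> np.ndarray:
        """Swap the last (background) row with the last (background) column of an
        (nc + 1) x (nc + 1) confusion matrix, leaving the class-vs-class cells untouched."""
        m = matrix.copy()
        nc = m.shape[0] - 1           # the last index is the background slot
        m[nc, :nc] = matrix[:nc, nc]  # old rightmost column becomes the new bottom row
        m[:nc, nc] = matrix[nc, :nc]  # old bottom row becomes the new rightmost column
        return m

The same swap would also need to be applied to the "background FP" / "background FN" tick labels so the axis titles stay truthful.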

Alternatives

Don't normalize the matrix and just use counts. But even in this case, I think swapping the row and column data and label positions makes sense and is easier to interpret, since the axis labels then apply to all rows and columns in the matrix.

Additional context

I spent some discussion time with @alkalait to confirm that the current implementation is not very interpretable when the matrix is normalized, and that swapping the row and column so the axes are valid for all rows and columns would be an improvement. Hopefully my explanation above isn't too confusing; I'm happy to clarify.

Thanks for considering this issue and for open sourcing this awesome project!

rbavery added the enhancement (New feature or request) label on Jan 28, 2021
github-actions bot commented Jan 28, 2021

👋 Hello @rbavery, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

[Badge: CI CPU testing]

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

rbavery (Contributor, Author) commented Jan 28, 2021

Additionally, it'd be nice to provide the option to normalize the confusion matrix along a row instead of along a column. This would show class-specific precision along the diagonal. I should also mention that I'm happy to submit these changes as PRs if the maintainers are open to them.
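
A hedged sketch of what that option could look like (normalize is a hypothetical helper, assuming the proposed layout with rows = predicted and columns = true, background last):

    import numpy as np

    def normalize(matrix: np.ndarray, by: str = "true") -> np.ndarray:
        """Column- or row-normalize a confusion matrix."""
        if by == "true":   # each column sums to 1: diagonal cells read as per-class recall
            return matrix / (matrix.sum(axis=0, keepdims=True) + 1e-9)
        if by == "pred":   # each row sums to 1: diagonal cells read as per-class precision
            return matrix / (matrix.sum(axis=1, keepdims=True) + 1e-9)
        raise ValueError(f"unknown normalization: {by}")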

glenn-jocher (Member) commented Jan 28, 2021

@rbavery thanks for the feedback. Normalization is the default as it allows for better results comparison across datasets. I'm not sure I understand your other suggestions, but I do agree the confusion matrix is challenging to interpret due to the background, which may dominate at lower confidence thresholds.

This is also probably due to the fact that people are much more used to seeing classification confusion matrices. Perhaps an option may be an ignore_background flag to plot the matrix without the background classes.

In any case, be advised there was a recent PR with a confusion matrix bug fix: #2046

rbavery (Contributor, Author) commented Jan 28, 2021

Hi @glenn-jocher thanks a bunch for the quick response.

I updated the issue to show what the swapped FN and FP background vectors look like in the new confusion matrix in units of counts, so you can compare with the non-swapped count matrix.

> Normalization is the default as it allows for better results comparison across datasets. I'm not sure I understand your other suggestions, but I do agree the confusion matrix is challenging to interpret due to the background

I agree proportions are easier to compare across datasets; it's something I'm doing with this and other models. I'm concerned about what the proportion represents when we consider just a single confusion matrix.

I think swapping the bottom row for the rightmost column allows us to convey more meaningful proportions when the confusion matrix is normalized. I'll try to state my concern a little more clearly with some questions:

  • What does a proportion represent in the current implementation?
  • For a true positive cell, shouldn't the proportion represent either recall or precision, depending on the axis the confusion matrix is normalized along?
  • If a TP proportion cell shouldn't represent recall or precision, what should this proportion represent if background is to be included?

> Perhaps an option may be an ignore_background flag to plot the matrix without the background classes.

I also think it could be a bit too dangerous/tempting to offer an option to not include the background class. Object detection models often have trouble either missing objects (predicting background when there should be a detection) or making bad detections on background. Taking this component out of the visualization would make the above example look a lot better than it actually performed, and folks who are more used to image classification matrices (as you mentioned) might be especially tempted to remove the background class from the evaluation.

This would lead to users of their models being very surprised when the model misses a lot of groundtruth or makes a lot of bad detections on background.

rbavery (Contributor, Author) commented Jan 28, 2021

At the core of my interpretation is that background is just another class in object detection. So on the Predicted axis (rows), we should have predicted background (False Negative). Along the True axis (columns), we should have the case where the truth was only background, i.e. no groundtruth object of another class (False Positive).

And I think it makes sense that the bottom right corner remains empty, because there are going to be too many cases where background was correctly predicted as background.

glenn-jocher (Member) commented:

@rbavery the confusion matrix is just an intersection of data, with the units as probabilities (when normalized). So if row, col = 0.6 then there's a 60% probability of col being predicted as row in similar future data.

Recall and Precision are a separate topic, and are computed within a single class, never across classes.

rbavery (Contributor, Author) commented Jan 29, 2021

By class specific precision I mean

Precision = Count of True Positives for a class / (Count of True Positives for a class + Count of False Positives for a class)

Apologies if I'm being creative with the terms. But I think you've stated my point:

> So if row, col = 0.6 then there's a 60% probability of col being predicted as row in similar future data.

In the current implementation, this rule does not hold for the background row (the last row). If you swap the columns and rows, the rule does hold. For example, we have a column for "boma" and a row for "boma". The probability should be the probability that the groundtruth "boma" (col) is predicted as "boma" (row). BUT for the last row, we instead have the probability that background (row) is predicted as "boma" (col).

This is not as intuitive to me as switching it so that the rule you stated holds for all columns and rows. If there's a justification for not switching I'm curious to hear it.

rbavery (Contributor, Author) commented Jan 29, 2021

In fact, the current normalization means you aren't summing across all groundtruth along the column, since the bottom-left background/boma cell contains no groundtruth "boma", yet it is included in the column sum that is used to normalize by the total amount of groundtruth for class "boma".

glenn-jocher (Member) commented:

@rbavery I've produced updated confusion matrices here to try to understand this better. The first is at 0.25 conf (default), the second at 0.9 conf, which should increase the 'background' row, but it does not. So perhaps a transpose is in order. Can you apply your proposed changes with the two COCO commands below to compare?

!python test.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --conf 0.25
!python test.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --conf 0.90

0.25 conf (default)

[Image: confusion matrix at 0.25 conf]

0.9 conf

[Image: confusion matrix at 0.9 conf]

rbavery (Contributor, Author) commented Jan 29, 2021

Yes, happily. I can take care of that over the weekend; I'm working toward a deadline right now. I appreciate you testing this and the feedback.

rbavery (Contributor, Author) commented Feb 1, 2021

I think the fix works. As expected, if you increase the confidence threshold to remove more bad detections, more good detections get removed as well (False Negatives increase). False Negatives are now the bottom row.

Old matrix, with False Positives on the bottom row and False Negatives as the last column.

This is normalized along a column, so False Negatives in the rightmost column are not factored into the normalization of the other columns, which is why the higher confidence threshold is not apparent. If this were a non-normalized matrix, you would see the rightmost column take on very dark values as the confidence threshold is increased.

[Image: confusion_matrix_90 (old layout)]

This uses the current code for assigning values to the last row and column:

            else:
                self.matrix[gc, self.nc] += 1  # background FP

        if n:
            for i, dc in enumerate(detection_classes):
                if not any(m1 == i):
                    self.matrix[self.nc, dc] += 1  # background FN

and for plot labels

                xticklabels=names + ["background FN"] if labels else "auto",
                yticklabels=names + ["background FP"] if labels else "auto",

New matrix: False Negatives are now the bottom row, so they are included in the normalization of each class column.

This is why the bottom row shows higher values: false negatives make up the largest proportion of each class's groundtruth for most classes at this confidence threshold. The rightmost column now shows the proportion of false positives for each class out of all false positives.
[Image: confusion_matrix_90 (new layout)]

New code

            else:
                self.matrix[self.nc, gc] += 1  # background FP

        if n:
            for i, dc in enumerate(detection_classes):
                if not any(m1 == i):
                    self.matrix[dc, self.nc] += 1  # background FN

and for plot labels

                xticklabels=names + ["background FP"] if labels else "auto",
                yticklabels=names + ["background FN"] if labels else "auto",

If this looks good, I can submit a PR.
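
For reference, a hedged sketch of how the swapped matrix could be normalized and plotted (not the exact plot() code in metrics.py; matrix and names are assumed to come from the ConfusionMatrix object, and seaborn/matplotlib are assumed available):

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Normalize along the True axis (columns), so each column sums over all
    # groundtruth of that class and the diagonal reads like per-class recall.
    array = matrix / (matrix.sum(0, keepdims=True) + 1e-6)

    fig, ax = plt.subplots(figsize=(12, 9), tight_layout=True)
    sns.heatmap(array, annot=True, fmt=".2f", square=True, cmap="Blues",
                xticklabels=names + ["background FP"],  # columns = True classes
                yticklabels=names + ["background FN"],  # rows = Predicted classes
                ax=ax)
    ax.set_xlabel("True")
    ax.set_ylabel("Predicted")
    fig.savefig("confusion_matrix.png", dpi=250)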

glenn-jocher (Member) commented:

@rbavery agree: as confidence is increased, more objects are missed, so the bottom row should increase. As confidence decreases, FPs should dominate, which would show as the last column increasing. Sure, submit a PR and I will experiment with this there. Thanks!

github-actions bot commented Mar 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

SaddamHosyn commented:

Can someone explain what the "background" column and row in the confusion matrix are, and how to interpret them?

TheArmbreaker commented:

> Can someone explain what the "background" column and row in the confusion matrix are, and how to interpret them?

@SaddamHosyn
When reading "background", think of the groundtruth according to IoU. This is why in object detection the matrix is not as easy to interpret as in a simple classification task (see the sketch after the list below).

  • Type I Error (False Positive): There is a detection despite missing groundtruth.
  • Type II Error (False Negative): There is groundtruth but no detection.
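
A minimal matching sketch to make that concrete (a hypothetical standalone helper, not YOLOv5 code; the boxes, box_iou and the 0.45 threshold are made-up examples):

    # Boxes are [x1, y1, x2, y2]; an unmatched prediction becomes a background FP,
    # an unmatched groundtruth becomes a background FN.
    def box_iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    pred, gt, iou_thres = [10, 10, 50, 50], [12, 12, 48, 52], 0.45
    if box_iou(pred, gt) >= iou_thres:
        print("matched: counted in the class-vs-class cells (TP if the classes agree)")
    else:
        print("unmatched prediction -> background FP; unmatched groundtruth -> background FN")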
