
Improving Confusion Matrix Interpretability: FP and FN vectors should be switched to align with Predicted and True axis #2071

Closed · rbavery opened this issue Jan 28, 2021 · 15 comments
Labels: enhancement (New feature or request), Stale

rbavery (Contributor) commented Jan 28, 2021

🚀 Feature

The current code to generate the confusion matrix produces something like this:

[Image: human_activities_0.5_0.85_confusion_matrix_counts]

If the confusion matrix is normalized so that counts become proportions, the proportions do not reflect standard metrics like class-specific recall.

For example, if we focus on the "boma" class and the proportion of TP, ideally we would want to show an interpretable proportion like recall: TP for boma / (TP for boma + FN for boma).

Instead, the proportion currently reflects TP for boma / (TP for boma + FN for boma - FN_background for boma + FP_background for boma), which is difficult to interpret.
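
A minimal numeric sketch of the difference being described (the counts below are hypothetical, not taken from the figures in this issue):

    # Hypothetical counts for a single class ("boma"); none of these numbers come from the figures above.
    tp = 40              # groundtruth boma predicted as boma
    fn = 60              # groundtruth boma not predicted as boma (misclassified or missed entirely)
    fn_background = 50   # the subset of fn with no detection at all (currently placed in the rightmost column)
    fp_background = 25   # detections of boma with no matching groundtruth (currently placed in the bottom row)

    recall = tp / (tp + fn)                                   # 0.40 -- the interpretable quantity
    current = tp / (tp + fn - fn_background + fp_background)  # ~0.53 -- what the normalized cell reflects today
    print(recall, current)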

[Image: human_activities_0.5_0.85_confusion_matrix]

I think this is a simple fix: switch the bottom row and its label with the rightmost column and its label. This way the "True" and "Predicted" axis labels act as axes for all the cells in the confusion matrix, including the False Positive background and False Negative background cells.

For example, this confusion matrix shows more clearly that 4% of the groundtruth "boma" instances are correctly detected.

[Image: human_activities_0.5_0.85_confusion_matrix]

This is the count matrix that swaps the FP and FN background vectors to show what we did to produce the proportion matrix above.

[Image: human_activities_0.5_0.85_confusion_matrix_counts]

Motivation

This will make normalized confusion matrices more interpretable. I'm not sure how folks interpret the current implementation of the confusion matrix when it is normalized, and I would appreciate guidance if there is some reasoning behind the current implementation. However, I think the change makes interpretation easier.

Pitch

Take the False Positive background row and swap it with the False Negative background column, so that the "Predicted" axis reflects the predicted category for every row and the "True" axis reflects the groundtruth for every column.
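
As a rough illustration of the pitch, here is a post-hoc sketch on a NumPy array (not the actual YOLOv5 ConfusionMatrix code; swap_background_vectors is a hypothetical helper):

    import numpy as np

    def swap_background_vectors(matrix: np.ndarray) -> np.ndarray:
        """Swap the last (background) row with the last (background) column of an
        (nc + 1) x (nc + 1) confusion matrix, leaving the class-vs-class cells untouched."""
        m = matrix.copy()
        nc = m.shape[0] - 1           # the last index is the background slot
        m[nc, :nc] = matrix[:nc, nc]  # old rightmost column becomes the new bottom row
        m[:nc, nc] = matrix[nc, :nc]  # old bottom row becomes the new rightmost column
        return m

The same swap would also need to be applied to the "background FP" / "background FN" tick labels so the axis titles stay truthful.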

Alternatives

Don't normalize the matrix and just use counts. But even in this case, I think swapping the row and column data and label positions makes sense and is easier to interpret, since the axis labels then apply to all rows and columns in the matrix.

Additional context

I spent some discussion time with @alkalait to confirm that the current implementation is not very interpretable when the matrix is normalized, and that swapping the row and column so the axes are valid for all rows and columns would be an improvement. Hopefully my explanation above isn't too confusing; I'm happy to clarify.

Thanks for considering this issue and for open sourcing this awesome project!

rbavery added the enhancement (New feature or request) label on Jan 28, 2021
github-actions bot commented Jan 28, 2021

👋 Hello @rbavery, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue; otherwise we cannot help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

[Badge: CI CPU testing]

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.

rbavery (Contributor, Author) commented Jan 28, 2021

Additionally, it'd be nice to provide the option to normalize the confusion matrix along a row instead of along a column. This would show class-specific precision along the diagonal. I should also mention that I'm happy to submit these changes as PRs if the maintainers are open to them.
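
A hedged sketch of what that option could look like (normalize is a hypothetical helper, assuming the proposed layout with rows = predicted and columns = true, background last):

    import numpy as np

    def normalize(matrix: np.ndarray, by: str = "true") -> np.ndarray:
        """Column- or row-normalize a confusion matrix."""
        if by == "true":   # each column sums to 1: diagonal cells read as per-class recall
            return matrix / (matrix.sum(axis=0, keepdims=True) + 1e-9)
        if by == "pred":   # each row sums to 1: diagonal cells read as per-class precision
            return matrix / (matrix.sum(axis=1, keepdims=True) + 1e-9)
        raise ValueError(f"unknown normalization: {by}")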

glenn-jocher (Member) commented Jan 28, 2021

@rbavery thanks for the feedback. Normalization is the default as it allows for better results comparison across datasets. I'm not sure I understand your other suggestions, but I do agree the confusion matrix is challenging to interpret due to the background, which may dominate at lower confidence thresholds.

This is also probably due to the fact that people are much more used to seeing classification confusion matrices. Perhaps an option may be an ignore_background flag to plot the matrix without the background classes.

In any case, be advised there was a recent PR with a confusion matrix bug fix: #2046

rbavery (Contributor, Author) commented Jan 28, 2021

Hi @glenn-jocher thanks a bunch for the quick response.

I updated the issue to show what the swapped FN and FP background vectors look like in the new confusion matrix in units of counts, so you can compare with the non-swapped count matrix.

> Normalization is the default as it allows for better results comparison across datasets. I'm not sure I understand your other suggestions, but I do agree the confusion matrix is challenging to interpret due to the background

I agree proportions are easier to compare across datasets; it's something I'm doing with this and other models. I'm concerned about what the proportion represents when we consider just a single confusion matrix.

I think swapping the bottom row for the rightmost column allows us to convey more meaningful proportions when the confusion matrix is normalized. I'll try to state my concern a little more clearly with some questions:

  • What does a proportion represent in the current implementation?
  • For a true positive cell, shouldn't the proportion represent either recall or precision, depending on the axis the confusion matrix is normalized along?
  • If a TP proportion cell shouldn't represent recall or precision, what should this proportion represent if background is to be included?

> Perhaps an option may be an ignore_background flag to plot the matrix without the background classes.

I also think it could be a bit too dangerous/tempting to offer an option to not include the background class. Object detection models often have trouble either missing objects (predicting background when there should be a detection) or making bad detections on background. Taking this component out of the visualization would make the above example look a lot better than it actually performed, and folks who are more used to image classification matrices (as you mentioned) might be especially tempted to remove the background class from the evaluation.

This would lead to users of their models being very surprised when the model misses a lot of groundtruth or makes a lot of bad detections on background.

rbavery (Contributor, Author) commented Jan 28, 2021

At the core of my interpretation is that background is just another class in object detection. So on the Predicted axis (rows), we should have predicted background (False Negative). Along the True axis (columns), we should have the case where the truth was only background, i.e. no groundtruth object of another class (False Positive).

And I think it makes sense that the bottom right corner remains empty, because there are going to be too many cases where background was correctly predicted as background.

glenn-jocher (Member) commented:

@rbavery the confusion matrix is just an intersection of data, with the units as probabilities (when normalized). So if row, col = 0.6 then there's a 60% probability of col being predicted as row in similar future data.

Recall and Precision are a separate topic, and are computed within a single class, never across classes.

rbavery (Contributor, Author) commented Jan 29, 2021

By class specific precision I mean

Precision = Count of True Positives for a class / (Count of True Positives for a class + Count of False Positives for a class)

Apologies if I'm being creative with the terms. But I think you've stated my point:

> So if row, col = 0.6 then there's a 60% probability of col being predicted as row in similar future data.

In the current implementation, this rule does not hold for the background row (the last row). If you swap the columns and rows, the rule does hold. For example, we have a column for "boma" and a row for "boma". The probability should be the probability that the groundtruth "boma" (col) is predicted as "boma" (row). BUT for the last row, we instead have the probability that background (row) is predicted as "boma" (col).

This is not as intuitive to me as switching it so that the rule you stated holds for all columns and rows. If there's a justification for not switching I'm curious to hear it.

rbavery (Contributor, Author) commented Jan 29, 2021

In fact, the current normalization means you aren't summing across all groundtruth along the column, since the bottom-left background/boma cell contains no groundtruth "boma", yet it is included in the column sum that is used to normalize by the total amount of groundtruth for class "boma".

glenn-jocher (Member) commented:

@rbavery I've produced updated confusion matrices here to try to understand this better. The first is at 0.25 conf (default), the second at 0.9 conf, which should increase the 'background' row, but it does not. So perhaps a transpose is in order. Can you apply your proposed changes with the two COCO commands below to compare?

!python test.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --conf 0.25
!python test.py --weights yolov5x.pt --data coco.yaml --img 640 --iou 0.65 --conf 0.90

0.25 conf (default)

[Image: confusion matrix at 0.25 conf]

0.9 conf

[Image: confusion matrix at 0.9 conf]

rbavery (Contributor, Author) commented Jan 29, 2021

Yes, happily. I can take care of that over the weekend; I'm working toward a deadline right now. I appreciate you testing this and the feedback.

rbavery (Contributor, Author) commented Feb 1, 2021

I think the fix works. As expected, if you increase the confidence threshold to remove more bad detections, more good detections get removed as well (False Negatives increase). False Negatives are now the bottom row.

Old matrix, with False Positives on the bottom row and False Negatives as the last column.

This is normalized along a column, so False Negatives in the rightmost column are not factored into the normalization of the other columns, which is why the higher confidence threshold is not apparent. If this were a non-normalized matrix, you would see the rightmost column take on very dark values as the confidence threshold is increased.

[Image: confusion_matrix_90 (old layout)]

This uses the current code for assigning values to the last row and column:

            else:
                self.matrix[gc, self.nc] += 1  # background FP

        if n:
            for i, dc in enumerate(detection_classes):
                if not any(m1 == i):
                    self.matrix[self.nc, dc] += 1  # background FN

and for plot labels

                xticklabels=names + ["background FN"] if labels else "auto",
                yticklabels=names + ["background FP"] if labels else "auto",

New matrix: False Negatives are now the bottom row, so they are included in the normalization of each class column.

This is why the bottom row shows higher values: false negatives make up the largest proportion of each class's groundtruth for most classes at this confidence threshold. The rightmost column now shows the proportion of false positives for each class out of all false positives.
[Image: confusion_matrix_90 (new layout)]

New code

            else:
                self.matrix[self.nc, gc] += 1  # background FP

        if n:
            for i, dc in enumerate(detection_classes):
                if not any(m1 == i):
                    self.matrix[dc, self.nc] += 1  # background FN

and for plot labels

                xticklabels=names + ["background FP"] if labels else "auto",
                yticklabels=names + ["background FN"] if labels else "auto",

If this looks good, I can submit a PR.
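
For reference, a hedged sketch of how the swapped matrix could be normalized and plotted (not the exact plot() code in metrics.py; matrix and names are assumed to come from the ConfusionMatrix object, and seaborn/matplotlib are assumed available):

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Normalize along the True axis (columns), so each column sums over all
    # groundtruth of that class and the diagonal reads like per-class recall.
    array = matrix / (matrix.sum(0, keepdims=True) + 1e-6)

    fig, ax = plt.subplots(figsize=(12, 9), tight_layout=True)
    sns.heatmap(array, annot=True, fmt=".2f", square=True, cmap="Blues",
                xticklabels=names + ["background FP"],  # columns = True classes
                yticklabels=names + ["background FN"],  # rows = Predicted classes
                ax=ax)
    ax.set_xlabel("True")
    ax.set_ylabel("Predicted")
    fig.savefig("confusion_matrix.png", dpi=250)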

glenn-jocher (Member) commented:

@rbavery agree: as confidence is increased, more objects are missed, so the bottom row should increase. As confidence decreases, FPs should dominate, which would show as the last column increasing. Sure, submit a PR and I will experiment with this there. Thanks!

github-actions bot commented Mar 4, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

SaddamHosyn commented:

Can someone explain what the "background" column and row in the confusion matrix are, and how to interpret them?

TheArmbreaker commented:

> Can someone explain what the "background" column and row in the confusion matrix are, and how to interpret them?

@SaddamHosyn
When reading "background", think of the groundtruth according to IoU. This is why in object detection the matrix is not as easy to interpret as in a simple classification task (see the sketch after the list below).

  • Type I Error (False Positive): There is a detection despite missing groundtruth.
  • Type II Error (False Negative): There is groundtruth but no detection.
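
A minimal matching sketch to make that concrete (a hypothetical standalone helper, not YOLOv5 code; the boxes, box_iou and the 0.45 threshold are made-up examples):

    # Boxes are [x1, y1, x2, y2]; an unmatched prediction becomes a background FP,
    # an unmatched groundtruth becomes a background FN.
    def box_iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    pred, gt, iou_thres = [10, 10, 50, 50], [12, 12, 48, 52], 0.45
    if box_iou(pred, gt) >= iou_thres:
        print("matched: counted in the class-vs-class cells (TP if the classes agree)")
    else:
        print("unmatched prediction -> background FP; unmatched groundtruth -> background FN")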
