
Adding polygon detection alongside bounding boxes #6506

Closed · wants to merge 47 commits

Conversation

@ahmad4633 commented Feb 2, 2022

I would like to thank XinzeLee for his amazing work, which inspired this pull request.

This PR adds 4-point polygon detection capability to this repo. The original bounding box detection is not modified. The new head outputs 9 values per detection (index x1 y1 x2 y2 x3 y3 x4 y4), and the training labels use the same format as this output.

Also, for training we need to install the polygon IoU CUDA extension to speed up the validation process and set "--polygon True". A full tutorial for training is provided in the Polygon Tutorial notebook.
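For illustration, a hedged sketch of the label/output layout described above; the values are made up and the parsing below is not code from this PR:

    # Each polygon label line: class_index x1 y1 x2 y2 x3 y3 x4 y4,
    # with coordinates normalized to the image size (values here are invented).
    label_line = "0 0.10 0.20 0.90 0.22 0.88 0.80 0.12 0.78"
    vals = label_line.split()
    cls = int(vals[0])
    corners = [(float(vals[i]), float(vals[i + 1])) for i in range(1, 9, 2)]  # four (x, y) pairs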

[image]

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Enhanced support for polygon labels in YOLOv5.

📊 Key Changes

  • Added polygon label plotting utilities.
  • Improved polygon label support in training and logging functions.
  • Integrated polygon-specific NMS (Non-Maximum Suppression).
  • Developed CUDA extensions for polygon IOU (Intersection Over Union) computation.

🎯 Purpose & Impact

  • These changes allow YOLOv5 to train and infer with polygon-shaped bounding boxes, which are more accurate for certain objects.
  • The CUDA extensions improve performance by offloading complex IOU calculations to the GPU.
  • Overall, this broadens the range of computer vision tasks YOLOv5 can handle effectively.

@github-actions bot left a comment

👋 Hello @ahmad4633, thank you for submitting a YOLOv5 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with upstream/master. If your PR is behind upstream/master an automatic GitHub Actions merge may be attempted by writing /rebase in a new comment, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
# git checkout feature  # <--- replace 'feature' with local branch name
git merge upstream/master
git push -u origin -f
  • ✅ Verify all Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@ahmad4633 closed this Feb 2, 2022
@ahmad4633 reopened this Feb 3, 2022
@ahmad4633 commented

I keep getting a "load_image is not defined" error even though it should be imported from utils/datasets. Any help? Also, I have installed the iou_cuda utils via setup.py, but I am getting Warning: "polygon_inter_union_cuda" and "polygon_b_inter_union_cuda" are not installed errors.

Hello Uygar, please refer to the Colab Tutorial.

@tujh2 commented Feb 15, 2022

Hi, @RoudyES

Unfortunately, there is no direct fix from our end for this.

I figured out that you can just comment out the y[..., 8:] = y[..., 8:].sigmoid() line in Polygon_Detect.forward(x) in models/yolo.py, and then OpenCV reads the model successfully. It does not fail on the Mul operator either.
The graph after exporting does not contain ScatterND operators.

[image]

Also, I have a question about the Polygon_Detect._make_grid() function. OpenCV fails at the forward() step with the same error as in opencv/opencv#20072. I just copied the original _make_grid() function from Detect and that helped. Now I can use the polygon model with OpenCV and it works fine (I have a custom inference pipeline in C++ and OpenCV, so I don't use detect.py).

Also, I don't know whether these changes are correct, but I'll leave that question to you 🤷

@RoudyES commented Feb 15, 2022

@tujh2 Careful about removing that y[..., 8:] = y[..., 8:].sigmoid() line; it is there to make sure that all the outputs are within the [0..1] range. Without it the model might output values outside that range, which will go out of bounds of the image. You should be able to remedy that in post-processing, but we chose to leave it there to include that step directly in the model and save the user an extra post-processing step. I think it is also present in the master branch for the same reason (clamping values between 0 and 1); @glenn-jocher can correct me if I'm wrong.
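For reference, if that line is commented out for export, the same step can be moved into post-processing; a minimal sketch, assuming the polygon head's raw output layout (8 coordinates, then objectness and class scores):

    import torch

    pred = torch.randn(1, 25200, 9 + 80)     # dummy stand-in for the raw network output
    pred[..., 8:] = pred[..., 8:].sigmoid()  # squash objectness + class scores back into [0, 1]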

As for the _make_grid() function, I'm glad it worked for you, and thank you for pointing out that swapping it with Detect's version fixed it! We'll do some further testing to make sure the swap doesn't have any hidden unwanted effects; if all goes well, we'll update the PR with the original _make_grid().

Also, if you're on C++, I recommend swapping your inference engine to TensorRT; your pre- and post-processing steps will remain almost the same, and you'll get a significant speedup in your pipeline.

@sctrueew commented Feb 15, 2022

@RoudyES Hi,

I've converted the model to ONNX, and when I run inference in C++ I get this error:

(-2:Unspecified error) Can't create layer "416" of type "Range" in function 'cv::dnn::dnn4_v20211004::LayerData::getLayerInstance'

Do you have a C++ sample for ONNX or TensorRT?

Thanks

@RoudyES commented Feb 15, 2022

@sctrueew it seems to me that you tried to run inference using OpenCV's DNN module. If so, please refer to @tujh2's comment above. They made some slight changes and were able to run the model with OpenCV's DNN.

However, I recommend running ONNX models using ONNX Runtime. Also, if you compile ONNX Runtime from source with the TensorRT execution provider, you will be able to run the same .onnx model on the TensorRT backend without changing your code (the runtime will automatically convert the ONNX model to a .engine). Or you can just export the .pt model to .engine using export.py.

As for the sample, I'm afraid we don't have a C++ snippet. We usually run everything in C#.
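For reference, a minimal sketch of running an exported model with ONNX Runtime in Python; "polygon.onnx" is a placeholder path, and the input name and shape should be checked against your own export:

    import numpy as np
    import onnxruntime as ort

    # With a TensorRT-enabled build, prepend "TensorrtExecutionProvider" to the providers list.
    session = ort.InferenceSession("polygon.onnx", providers=["CPUExecutionProvider"])
    inp = session.get_inputs()[0]
    img = np.zeros((1, 3, 640, 640), dtype=np.float32)    # letterboxed image, scaled to [0, 1]
    pred = session.run(None, {inp.name: img})[0]          # (1, num_detections, 9 + num_classes)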

@sctrueew commented

@RoudyES Thank you. @tujh2, have you successfully run it in C++?

@tujh2 commented Feb 15, 2022

@sctrueew
Yes, I have. I run it with OpenCV 4.5.4 successfully.
I think you exported your model without the --simplify option. Try that option together with my tips in the comment above.
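For reference, export.py's --simplify flag runs the model through onnx-simplifier; a hedged sketch of doing the same step manually, with "polygon.onnx" as a placeholder path:

    import onnx
    from onnxsim import simplify

    model = onnx.load("polygon.onnx")
    model_simplified, ok = simplify(model)
    assert ok, "onnx-simplifier check failed"
    onnx.save(model_simplified, "polygon-sim.onnx")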

@sctrueew commented


@RoudyES Hi, thank you for the reply. My final goal is to run it in C# as well; it would be nice to have a C# sample.

@sctrueew commented


Thanks, I've successfully exported the model to ONNX, but I still get the error when I call readNet.

OpenCV version: 4.5.4 (built with GPU)

	auto net = dnn::readNetFromONNX("polygon.onnx");

What changes did you make on the C++ side?

@tujh2 commented Feb 15, 2022

What changes did you make on the C++ side?

Nothing at all. Actually, I use GoCV (Go bindings for OpenCV), so I didn't modify any code inside the OpenCV library.

@sctrueew commented Feb 15, 2022


I've followed your tips and finally I can readNet, but I get an error in forward():

shape_utils.hpp:171: error: (-215:Assertion failed) start <= (int)shape.size() && end <= (int)shape.size() && start <= end in function 'cv::dnn::dnn4_v20211004::total'

@tujh2 commented Feb 15, 2022


This is a problem with the Polygon_Detect._make_grid() function.

    def forward(self, x):
        # x = x.copy()  # for profiling
        z = []  # inference output
        for i in range(self.nl):
            x[i] = self.m[i](x[i])  # conv
            bs, _, ny, nx = x[i].shape  # x(bs,267,20,20) to x(bs,3,20,20,89)
            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()

            if not self.training:  # inference
                if self.onnx_dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:
                    self.grid[i] = self._make_grid(nx, ny, i)

                y = x[i].clone()
                # y[..., 8:] = y[..., 8:].sigmoid()  # intentionally commented out so OpenCV's DNN can read the model; apply the sigmoid in post-processing
                # y[..., 8], y[..., 9:] = y[..., 8].sigmoid(), y[..., 9:].softmax(dim=-1)  # softmax loss for classes
                if self.inplace:
                    y[..., :8] = (y[..., :8] + self.grid[i].repeat((1, 1, 1, 1, 4))) * self.stride[i]  # xyxyxyxy
                else:
                    xyxyxyxy = (y[..., :8] + self.grid[i].repeat((1, 1, 1, 1, 4))) * self.stride[i]  # xyxyxyxy
                    y = torch.cat((xyxyxyxy, y[..., 8:]), -1)
                z.append(y.view(bs, -1, self.no))

        return x if self.training else (torch.cat(z, 1), x)

    def _make_grid(self, nx=20, ny=20, i=0):
        d = self.anchors[i].device
        if check_version(torch.__version__, '1.10.0'):  # torch>=1.10.0 meshgrid requires the indexing argument
            yv, xv = torch.meshgrid([torch.arange(ny, device=d), torch.arange(nx, device=d)], indexing='ij')
        else:
            yv, xv = torch.meshgrid([torch.arange(ny, device=d), torch.arange(nx, device=d)])
        return torch.stack((xv, yv), 2).expand((1, self.na, ny, nx, 2)).float()

Try replacing the code in Polygon_Detect with this; it works for me.

@sctrueew commented Feb 15, 2022

@tujh2 Thank you very much. It works for me, but I have to change how I read the forward output.

	cv::dnn::blobFromImage(input_image, blob, 1. / 255., cv::Size(INPUT_WIDTH, INPUT_HEIGHT), cv::Scalar(), true, false);

	net.setInput(blob);
	std::vector<cv::Mat> outputs;
	net.forward(outputs, net.getUnconnectedOutLayersNames());

	float x_factor = input_image.cols / INPUT_WIDTH;
	float y_factor = input_image.rows / INPUT_HEIGHT;

	float* data = (float*)outputs[0].data;

	const int dimensions = 85;  // note: for the polygon head this should be num_classes + 9, not the box model's 85
	const int rows = 25200;

	std::vector<int> class_ids;
	std::vector<float> confidences;
	std::vector<cv::Rect> boxes;

	for (int i = 0; i < rows; ++i) {
		float confidence = data[4];  // <<== error occurred here; the polygon objectness score is at index 8, not 4

@sctrueew commented

@tujh2 Do you have any ideas on how to resolve this issue?
Thanks

@tujh2 commented Feb 15, 2022

@tujh2 Do you have any ideas on how to resolve this issue? Thanks

I can share my Go inference code; just adapt it for C++.

	blob := gocv.BlobFromImage(src, 1/255.0, myNet.NetSize, myNet.netMean, true, false)
	defer blob.Close()
	myNet.net.SetInput(blob, "")

	netOutput := myNet.net.Forward(myNet.outputLayerName) // outputLayerName = "output"
	defer netOutput.Close()
	srcImgSize := src.Size()
	ratioW := float32(srcImgSize[1]) / float32(myNet.NetSize.Y) // NetSize is like --img (default 640x640)
	ratioH := float32(srcImgSize[0]) / float32(myNet.NetSize.X)
	var netWidth int = myNet.СlassesCount
	if myNet.isPolygon {
		netWidth += 9 // polygon rows: 8 coordinate fields + 1 box score; the rest are class scores
	} else {
		netWidth += 5
	}
	pData, _ := netOutput.DataPtrFloat32()
	var rects []image.Rectangle
	var polygonRects []PolygonOutput
	var scores []float32
	var indices []int
	var classes []int
	for i := 0; i < len(pData)-netWidth; i += netWidth {
		if myNet.isPolygon {
			fboxScore := pData[i+8]
			if fboxScore > myNet.BoxThreshold {
				x1 := pData[i] * ratioW
				y1 := pData[i+1] * ratioH
				x2 := pData[i+2] * ratioW
				y2 := pData[i+3] * ratioH
				x3 := pData[i+4] * ratioW
				y3 := pData[i+5] * ratioH
				x4 := pData[i+6] * ratioW
				y4 := pData[i+7] * ratioH

				var maxClassScore float32 = 0.0
				var classIndex = 0
				for k := i + 9; k < i+9+myNet.СlassesCount; k++ {
					classScore := pData[k]
					if classScore > maxClassScore {
						classIndex = k
						maxClassScore = classScore
					}
				}
				// save result
				// classId := classIndex - i - 9
				// ...
			}
		} else {
			// non-polygon yolo
			// ...
		}
	}

@sctrueew commented


Thanks for sharing. If we have more than 4 points, how can we set netWidth += 9? I don't know how many points there are; for example, one detection might have 4 (x, y) pairs and another 12.

@sctrueew commented

This is my code, but it's not working.

	Mat blob;
	int col = SrcImg.cols;
	int row = SrcImg.rows;
	int maxLen = MAX(col, row);
	Mat netInputImg = SrcImg.clone();
	if (maxLen > 1.2 * col || maxLen > 1.2 * row) {
		Mat resizeImg = Mat::zeros(maxLen, maxLen, CV_8UC3);
		SrcImg.copyTo(resizeImg(Rect(0, 0, col, row)));
		netInputImg = resizeImg;
	}
	// was cv::Scalar(104, 117, 123); YOLOv5 models are normally fed without mean subtraction
	blobFromImage(netInputImg, blob, 1 / 255.0, cv::Size(netWidth, netHeight), cv::Scalar(), true, false);
	net.setInput(blob);
	std::vector<cv::Mat> netOutputImg;
	net.forward(netOutputImg, net.getUnconnectedOutLayersNames());
	std::vector<int> classIds;
	std::vector<float> confidences;
	std::vector<cv::Rect> boxes;
	float ratio_h = (float)netInputImg.rows / netHeight;
	float ratio_w = (float)netInputImg.cols / netWidth;
	//int net_width = className.size() + 5;
	int net_width = className.size() + 9;
	float* pdata = (float*)netOutputImg[0].data;

	std::vector<polygon> pol;

	float classScore = 0;
	// bug fix: sizeof(pdata) is just the pointer size (8 bytes), so the original
	// condition `i < sizeof(pdata) - net_width` never let the loop run; iterate
	// over the actual number of output values instead
	int numVals = (int)netOutputImg[0].total();
	for (int i = 0; i + net_width <= numVals; i += net_width) {

		float confidence = pdata[i + 8];
		if (confidence >= 0.5) {

			float x1 = pdata[i] * ratio_w;
			float y1 = pdata[i + 1] * ratio_h;
			float x2 = pdata[i + 2] * ratio_w;
			float y2 = pdata[i + 3] * ratio_h;
			float x3 = pdata[i + 4] * ratio_w;
			float y3 = pdata[i + 5] * ratio_h;
			float x4 = pdata[i + 6] * ratio_w;
			float y4 = pdata[i + 7] * ratio_h;
			auto maxClassScore = 0.0;
			auto classIndex = 0;
			for (int k = i + 9; k < i + 9 + className.size(); k++) {
				classScore = pdata[k];
				if (classScore > maxClassScore) {
					classIndex = k;
					maxClassScore = classScore;
				}
			}
			classIds.push_back(classIndex - i - 9);  // bug fix: store the class id relative to this row, not the absolute buffer index
			confidences.push_back(maxClassScore);
			//boxes.push_back(Rect(x1, y1, x2, y2));
			pol.push_back({ x1, y1, x2, y2, x3, y3, x4, y4 });
		}
	}

@RoudyES commented Feb 16, 2022

@sctrueew Please note that the model can only predict 4-point polygons. You will never get a result with more than 4 points (if you did, your post-processing isn't correct).

@sctrueew commented

Yes, I know. What about this example?

[image]

There are more than four points for each object in the example above.

@RoudyES commented Feb 16, 2022

@sctrueew This can't be achieved with this PR. The repo can only detect bounding boxes (such as the rotated boxes around the persons in this example). The image you shared looks to me like the output of an instance segmentation model, where you get an object's bounding box along with its contour polygon; this is not yet supported in this PR (although we might look into it in the future if this PR gets merged into the master branch).

@GivanTsai commented

There is a bug when I want to save the predicted text. If I run the inference command

python detect.py --polygon True --data '/content/yolov5/yolov5/data/polygon.ymal' --weights '/content/gdrive/MyDrive/Polygon_Yolo/yolov5/lpr/s/exp/weights/polygon_best.pt' --img 320 --source '/content/yolov5/polygon_all/images/val/' --hide-labels

it is successful, but it fails when I add the --save-txt and --save-crop options:

python detect.py --polygon True --data '/content/yolov5/yolov5/data/polygon.ymal' --weights '/content/gdrive/MyDrive/Polygon_Yolo/yolov5/lpr/s/exp/weights/polygon_best.pt' --img 320 --source '/content/yolov5/polygon_all/images/val/' --hide-labels --save-txt --save-conf --save-crop

[image]

How to deal with this bug? @ahmad4633

@GivanTsai commented


It seems that gn has size torch.Size([4]) while torch.tensor(xyxyxyxy).view(1, 8) has size torch.Size([1, 8]). How can I fix it?
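A hedged sketch of one possible fix, assuming gn is built the same way detect.py builds it for boxes: extend it from 4 gains (w h w h) to 8 so the division broadcasts against the (1, 8) polygon tensor. The values below are dummies standing in for detect.py's variables:

    import torch

    im0_shape = (480, 640, 3)  # original image (h, w, c)
    xyxyxyxy = [10., 20., 110., 22., 108., 80., 12., 78.]  # one predicted polygon, pixels
    gn = torch.tensor(im0_shape)[[1, 0, 1, 0, 1, 0, 1, 0]]  # gains: w h w h w h w h
    norm = (torch.tensor(xyxyxyxy).view(1, 8) / gn).view(-1).tolist()  # normalized coords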

@github-actions bot commented

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐.

@github-actions bot added the Stale label Mar 21, 2022
@github-actions bot closed this Mar 27, 2022
@Aun0124 commented Jan 11, 2023

@sctrueew @tujh2 I have also converted my model to ONNX, and the output shape is (batch, x, 12) for polygon. Can I ask whether you implemented your own NMS or used another solution? I want to implement this in Java for Android apps.

@tujh2 commented Jan 11, 2023

Hi, no, I skipped the NMS step because my photos contain only one object, so I just pick the polygon with the max score. Also, as far as I know, polygon NMS is an expensive step.
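For readers who do need multiple objects, one cheap alternative is to run standard greedy NMS on each polygon's axis-aligned enclosing box. This is only an approximation (not the PR's exact polygon IoU), sketched here:

    import numpy as np

    def approx_polygon_nms(polys, scores, iou_thres=0.45):
        # polys: (N, 8) array of x1 y1 x2 y2 x3 y3 x4 y4; scores: (N,)
        xs, ys = polys[:, 0::2], polys[:, 1::2]
        boxes = np.stack([xs.min(1), ys.min(1), xs.max(1), ys.max(1)], axis=1)
        areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
        order = scores.argsort()[::-1]  # indices sorted by descending score
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(int(i))
            # intersection of the top box with all remaining boxes
            xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
            yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
            xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
            yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
            inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
            iou = inter / (areas[i] + areas[order[1:]] - inter + 1e-9)
            order = order[1:][iou <= iou_thres]  # drop boxes overlapping the kept one
        return keep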

@Aun0124 commented Jan 16, 2023

@tujh2 What about the letterbox resize preprocessing? Can I have a look at your code? I would be grateful.
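For reference, a minimal sketch of the usual YOLOv5-style letterbox resize; this is an assumption about the preprocessing, not tujh2's actual code:

    import cv2
    import numpy as np

    def letterbox(img, new_shape=640, color=(114, 114, 114)):
        # Resize keeping aspect ratio, then pad to new_shape x new_shape.
        # Returns the padded image plus the scale/offsets needed to map
        # predictions back onto the original image.
        h, w = img.shape[:2]
        r = min(new_shape / h, new_shape / w)
        nh, nw = round(h * r), round(w * r)
        resized = cv2.resize(img, (nw, nh), interpolation=cv2.INTER_LINEAR)
        top = (new_shape - nh) // 2
        left = (new_shape - nw) // 2
        padded = cv2.copyMakeBorder(resized, top, new_shape - nh - top,
                                    left, new_shape - nw - left,
                                    cv2.BORDER_CONSTANT, value=color)
        return padded, r, (left, top)

    img = np.zeros((480, 640, 3), np.uint8)  # dummy input image
    padded, scale, (pad_x, pad_y) = letterbox(img)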
