The goal of this project is to identify and classify a specific type of TV ad. In particular, we want to extract and cluster the rectangular ads that appear at the bottom of the screen so that they can be classified.
```mermaid
graph LR
subgraph Box Detection
A[Input Image] --> B(Bounding Box Extraction)
B-->C[Image Crop]
end
C-->F
C-->G
subgraph Clustering
D[Nearest Neighbours]
end
subgraph Feature Extraction
F[Color Histogram] --> X
G[VGG16 Latent Space] -->X
X[Concat features]-->D
end
```
Let's begin by understanding how the bounding box extraction process works.
```mermaid
graph LR
A[input image]-->B(Blur/LPF)-->C(Canny Edge extraction)-->D(Dilate - Erode)-->E(Extend Vertical/Horizontal lines)-->F(Box Detection)-->G[Bounding Box]
```
In the first step we apply a blur kernel to filter out the high-frequency noise present in low-quality images. This prevents noise amplification in the following step.

This is followed by classic edge detection with Canny. It can already be seen that the image contains a great amount of edges, most of which are not relevant to our case. It is also worth noting that the straight edges are far from perfect.

We continue with a dilate-erode operation, also known as a close operation. This is applied in order to fill in the gaps that some contours may have. As the images we are dealing with are of very low quality, this step provides stronger, more continuous edges for further processing.
Dilate
Erode
Mmm 🤔 this still looks quite noisy: there are many useless polygons and edges in this image, which makes polygon finding a lot harder. This can definitely be improved. Let's recall that our goal is to recover the outline of the ads. We therefore propose a simple method based on morphological operations to extract and extend the horizontal and vertical segments.
We split the image into its horizontal and vertical components by performing an Open operation with a large kernel and 2 iterations. This preserves the long vertical segments present in the image and discards the small noisy edges.
```python
import cv2 as cv

# Create a tall, thin kernel
vertical_kernel = cv.getStructuringElement(cv.MORPH_RECT, (1, 30))
# Extract the largest vertical segments
vertical_segments = cv.morphologyEx(
    binary_image, cv.MORPH_OPEN, vertical_kernel, iterations=2
)
# Extend those segments
vertical_segments_extended = cv.dilate(vertical_segments, vertical_kernel, iterations=10)
```
The same procedure is applied to the horizontal components.
```mermaid
graph LR
A[Binary Image]
subgraph Horizontal
C[Horizontal Open]-->D[Dilation]
end
subgraph Vertical
F[Vertical Open]-->G[Dilation]
end
H[Add]
P[Box Extraction]
A-->C
A-->F
D-->H
G-->H
H-->P
```
After applying the Open operation followed by a dilation we get:

| Horizontal segments (open) | Extended (dilate) |
|---|---|

| Vertical segments (open) | Extended (dilate) |
|---|---|
Removing all unwanted edges allows us to focus only on the polygons of interest. We can now clearly see two rectangles.
With the aid of OpenCV we can find all the polygons present in the image.
```mermaid
graph TD
A[1. Find Contours]-->B{2. Too many edges?}
B-->|yes| D[Discarded]
B-->|no| T[3. Approximate Contour to a Rectangle]-->C{4. Too big or too small?}
C-->|bad size| X[Discarded]
C-->|perfect size!| Y[5. Success! Return Bounding Box]
```
Read the following section to gain further understanding of the steps involved.
Find all contours:

```python
contours, hierarchies = cv.findContours(
    image, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE
)
```
Approximate valid contours to rectangles:

```python
epsilon = 0.05 * cv.arcLength(contour, True)
cnt = cv.approxPolyDP(contour, epsilon, True)
```
Discard complex shapes and remove contours with extreme surface area, either too big or too small. If the contour meets these criteria, we approximate it as a rectangle and save the extracted bounding box 👍
```python
if 4 <= len(cnt) < self.max_polig:
    if cv.isContourConvex(cnt):
        boundRect_temp = cv.boundingRect(cnt)
        if low_area_th < bxu.rect_area(boundRect_temp) < high_area_th:
            box = boundRect_temp
```
Finally, we just crop the images. We keep track of the parent-child relationship between crops and source frames by storing them in a dataframe.
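Cropping itself is a one-liner in NumPy; the frame and box below are made-up values for illustration.

```python
import numpy as np

# Hypothetical frame and detected box (x, y, w, h) from the previous step.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
x, y, w, h = 100, 900, 400, 120

# NumPy slicing is row-major: rows (y) come first, then columns (x).
crop = frame[y:y + h, x:x + w]
```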
# End result! 🥳

In order to cluster the ads we need to extract some kind of feature vector that allows us to compare them. In our case, this feature vector is the concatenation of the latent-space output of a pretrained VGG16 model and a color histogram. In this manner we are able to combine both classic and state-of-the-art approaches to compute a rich feature vector.
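The concatenation scheme can be sketched as below. The histogram helper is a generic per-channel descriptor, and `cnn_embedding` is a zero-filled stand-in for the pretrained VGG16 latent vector (512-d, assuming average pooling); bin count and embedding size are assumptions, not the post's actual settings.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Per-channel histogram, L1-normalized (bins=8 is illustrative)."""
    hist = [np.histogram(image[..., c], bins=bins, range=(0, 255))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype(np.float32)
    return hist / max(float(hist.sum()), 1.0)

def cnn_embedding(image: np.ndarray) -> np.ndarray:
    # Stand-in for the VGG16 latent vector; in the real pipeline this
    # comes from a pretrained network's feature output.
    return np.zeros(512, dtype=np.float32)

def feature_vector(image: np.ndarray) -> np.ndarray:
    # Concatenate classic (histogram) and learned (CNN) features.
    return np.concatenate([color_histogram(image), cnn_embedding(image)])
```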
There are many methods we considered to compare images and perform clustering.
- Pixel by Pixel comparison
- Matched Filters
- Color Histogram
- Traditional Perceptual Hashes (PHash, CHash, BMHash)
- Feature Vector extraction with pre-trained CNN
After thoroughly reviewing each of these options we opted to use the CNN approach as it provides the richest source of information for each image and allows blazing-fast comparisons between them.
We trained a Nearest Neighbours model in order to find the frames in which the same ad appears. The Euclidean metric was used to determine similarity.
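The lookup itself reduces to a Euclidean distance ranking; here is a minimal brute-force sketch standing in for a fitted nearest-neighbours model (names and shapes are illustrative).

```python
import numpy as np

def nearest_neighbours(query: np.ndarray, features: np.ndarray, k: int = 3):
    """Return the indices of the k feature vectors closest to the query,
    ranked by Euclidean distance."""
    dists = np.linalg.norm(features - query, axis=1)
    return np.argsort(dists)[:k]
```

A library implementation with a tree index gives the same ranking, just faster at scale.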
In this example, we can see the DENIM MARKET ad appearing in all of these images.