From 3a078309902ddcf62bed140e60ba95d087dfa982 Mon Sep 17 00:00:00 2001
From: Olzhas Zhumabek
Date: Sun, 1 Sep 2019 04:20:20 +0600
Subject: [PATCH 01/11] Docs for Harris and Hessian

The docs are written with a beginner in mind and have a basics section.
The pictures and paper links are to be inserted.
---
 doc/image_processing/basics.md | 27 +++++++
 .../space-extrema-detectors.md | 71 +++++++++++++++++++
 2 files changed, 98 insertions(+)
 create mode 100644 doc/image_processing/basics.md
 create mode 100644 doc/image_processing/space-extrema-detectors.md

diff --git a/doc/image_processing/basics.md b/doc/image_processing/basics.md
new file mode 100644
index 0000000000..4870448de3
--- /dev/null
+++ b/doc/image_processing/basics.md
@@ -0,0 +1,27 @@
+## Basics
+
+Here are basic concepts that might help to understand the documentation written in this folder:
+
+### Convolution
+
+---
+
+### Filters, kernels, weights
+
+Those three words usually mean the same thing, unless context is clear about a different usage. Simply put, they are matrices that are used to achieve certain effects on the image. Let's consider a simple one, a 3 by 3 Scharr filter
+
+$ScharrX = [1,0,-1][1,0,-1][1,0,-1]$
+
+The filter above, when convolved with a single channel image (intensity/luminance strength), will produce a gradient in the X (horizontal) direction. There is filtering that cannot be done with a kernel though, and one good example is the median filter (the mean is the arithmetic mean, whereas the median is the center element of a sorted array).
+
+---
+
+ ### Derivatives
+
+A derivative of an image is a gradient in one of two directions, e.g. x (horizontal) and y (vertical). To compute a derivative, one can use Scharr, \ and other gradient filters.
+
+---
+
+### Curvature
+
+The word, when used alone, means the curvature that would be generated if the values of an image were plotted in a 3D graph. The X and Z axes (which form the horizontal plane) correspond to the X and Y indices of an image, and the Y axis corresponds to the value at that pixel. By a little stretch of the imagination, filters (also called kernels or weights) could be considered an image (or any 2D matrix). A mean filter would draw a flat plane, whereas a Gaussian filter would draw a hill that gets sharper depending on its sigma value.
diff --git a/doc/image_processing/space-extrema-detectors.md b/doc/image_processing/space-extrema-detectors.md
new file mode 100644
index 0000000000..867160a993
--- /dev/null
+++ b/doc/image_processing/space-extrema-detectors.md
@@ -0,0 +1,71 @@
+## Space extrema detectors
+
+### What is being detected?
+
+A good feature is one that is repeatable, stable and can be recognized under affine transformations. Unfortunately, edges do not fit the description. Corners, on the other hand, fit well enough.
+
+---
+
+### Available detectors
+
+At the moment, the following detectors are implemented
+
+ - Harris corner detector
+
+ - Hessian detector
+
+---
+
+### Algorithm steps
+
+#### Harris and Hessian
+
+Both are derived from a concept called Moravec window. Let's have a look at the image below:
+
+\
+
+As can be noticed, moving the yellow window in any direction will cause a very big change in intensity. This is the key concept in understanding how the two corner detectors work.
+
+The algorithms have the same structure:
+
+ 1. Compute image derivatives
+
+ 2. Weighted convolution
+
+ 3. Compute response
+
+ 4. Threshold (optional)
+
+Harris and Hessian differ in what **derivatives they compute**.
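For readers who want to see step 1 in code, here is a minimal sketch in plain C++ (plain 2D arrays and made-up helper names, **not** GIL's actual interface) that builds the `dx` and `dy` derivative images by applying the `ScharrX` kernel from the basics section and its transpose:

```cpp
// Sketch only: compute the two derivative images of a single-channel image.
#include <vector>

using Image = std::vector<std::vector<float>>;

// Apply a 3x3 kernel over the interior of the image (borders left at zero).
Image apply3x3(Image const& src, float const (&k)[3][3])
{
    int const h = static_cast<int>(src.size());
    int const w = static_cast<int>(src[0].size());
    Image dst(h, std::vector<float>(w, 0.f));
    for (int y = 1; y + 1 < h; ++y)
        for (int x = 1; x + 1 < w; ++x)
        {
            float sum = 0.f;
            for (int i = -1; i <= 1; ++i)
                for (int j = -1; j <= 1; ++j)
                    sum += k[i + 1][j + 1] * src[y + i][x + j];
            dst[y][x] = sum;
        }
    return dst;
}

// Step 1 of both detectors: horizontal and vertical derivative images.
void derivatives(Image const& img, Image& dx, Image& dy)
{
    // The ScharrX kernel from the basics section, and its transpose for Y.
    float const scharr_x[3][3] = {{1, 0, -1}, {1, 0, -1}, {1, 0, -1}};
    float const scharr_y[3][3] = {{1, 1, 1}, {0, 0, 0}, {-1, -1, -1}};
    dx = apply3x3(img, scharr_x);
    dy = apply3x3(img, scharr_y);
}
```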
Harris computes the following derivatives: + +$HarrisMatrix = [(dx)^2, dxdy], [dxdy, (dy)^2]$ + +*(note that $d(x^2)$ and $(dy^2)$ are **numerical** powers, not gradient again).* + +The three distinct terms of a matrix can be separated into three images, to simplify implementation. Hessian, on the other hand, computes second order derivatives: + +$HessianMatrix = [dxdx, dxdy][dxdy, dydy]$ + +**Weighted convolution** is the same for both. Usually Gaussian blur matrix is used as weights, because corners should have hill like curvature in gradients, and other weights might be noisy. + +**Response computation** is a matter of choice. Given the general form of both matrices above + +$[a, b][c, d]$ + +One of the response functions is + +$response = det - k * trace^2 = a * c - b * d - k * (a + d)^2$ + +$k$ is called discrimination constant. Usual values are $0.04$ - $0.06$. + +The other is simply determinant + +$response = det = a * c - b * d$ + +**Thresholding** is optional, but without it the result will be extremely noisy. For complex images, like the ones of outdoors, for Harris it will be in order of \ and for Hessian will be in order of \. For simpler images values in order of 100s and 1000s should be enough. + +To get deeper explanation please refer to following **papers**: + +\ + +\ From e14f182d33cd2baca6db4c83e183b0f5b797f91b Mon Sep 17 00:00:00 2001 From: Olzhas Zhumabek Date: Sun, 1 Sep 2019 00:47:03 +0600 Subject: [PATCH 02/11] Fill values, links and images This commit fills in the template left by the previous commit --- .../Moravec-window-corner.png | Bin 0 -> 1323 bytes doc/image_processing/Moravec-window-edge.png | Bin 0 -> 1322 bytes doc/image_processing/basics.md | 8 +++++- .../space-extrema-detectors.md | 27 +++++++++++------- 4 files changed, 23 insertions(+), 12 deletions(-) create mode 100644 doc/image_processing/Moravec-window-corner.png create mode 100644 doc/image_processing/Moravec-window-edge.png diff --git a/doc/image_processing/Moravec-window-corner.png b/doc/image_processing/Moravec-window-corner.png new file mode 100644 index 0000000000000000000000000000000000000000..c0a8d207c21beb4dc367e23271c5f8bfe5454dd8 GIT binary patch literal 1323 zcmeAS@N?(olHy`uVBq!ia0y~yVAKI&4mO}jWo=(6kkgv!>>S|f?5t2wl%JNFlghxL zF|l@{t;b;piMIROyt+eJ-YO`~SZEdbfxqyImdHY<6{1n420Op-&oWWb)az@?K6r5c z(N)deo7eHJYf|{Z{OHlEB`=gz>!yoD9jPcN=YL=O{`B4QhW&AAt8N;yt(cX%M5s`4 za;#|6dX7Yny<7Uc&s=%#V#s;^V=2G<{r5>Wr+2?Qe{t;-AGRMXk0O^zi+yDK_+{DB z;(O|S8x}1Qm@nKfRpR)_ecfrj{e^lKZ=TN-RGvFay-$F<)oj}B8KS&XE4nOJt&n>9 z+gML~&UTB{Pqnh|P2O)X=W^JzoqojvoH|LIN-QTdZyz{X>OJ{;!)e(C|2?PkoSevH^7p6BhdQ_WQ&@lHHLf$=o`1G@ zjvYhhtI$tu78WdSTvPVG`}i)P`u@MQncv>Cw@fKo_dk2>EMUBF7I;J!Gcf2WgD_*o zQu{KXAbW|YuPggw4tXJd@%rbAlYl~!C9V-A&iT2ysd*&~&PAz-C8;S2<(VZJ3hti1 z0pX2&;tUMTBAzaeAr-gY-r308U?Ae`$i9*PcmF|`b4(&3f{x|N5f?A7H=1O^AZ_8p zz#-O&p{tEVCV$hZq<>S|f?5t2wl%JNFlghxL zF|l@{t;b;piMIROyt+eJ-YO`~SZEdbfxqyImdHY<6{1n420Op-&oWWb)az@?K6r5c z(N)deo7eHJYf|{Z{OHlEB`=gz>!yoD9jPcN=YL=O{`B4QhW&AAt8N;yt(cX%M5s`4 za;#|6dX7Yny<7Uc&s=%#V#s;^V=2G<{r5>Wr+2?Qe{t;-AGRMXk0O^zi+yDK_+{DB z;(O|S8x}1Qm@nKfRpR)_ecfrj{e^lKZ=TN-RGvFay-$F<)oj}B8KS&XE4nOJt&n>9 z+gML~&UTB{Pqnh|P2O)X=W^JzoqojvoH|LIN-QTdZyz{X>OJ{;!)e(C|2?PkoSevH^7p6BhdQ_WQ&@lHHLf$=o`1G@ zjvYhhtI$tu78WdSTvPVG`}i)P`u@MQncv>Cw@fKo_dk2>EMUBF7I;J!Gcf2WgD_*o zQu{KXAbW|YuPggw4tXJdhCe1tLV-e(C9V-A&iT2ysd*&~&PAz-C8;S2<(VZJ3hti1 z0pX2&;tUMT!k#XUAr-gYUUTGaFc5J$sO!l8FLyzPAp@tmmz8qj?LB`cE@Ax9c$TFh zfssQ@A%dZaRUrb*1`;4PP=G^>NJ*j<;5|9* literal 0 HcmV?d00001 diff --git a/doc/image_processing/basics.md 
b/doc/image_processing/basics.md index 4870448de3..69f3ef45d8 100644 --- a/doc/image_processing/basics.md +++ b/doc/image_processing/basics.md @@ -14,11 +14,17 @@ $ScharrX = [1,0,-1][1,0,-1][1,0,-1]$ The filter above, when convolved with a single channel image (intensity/luminance strength), will produce a gradient in X (horizontal) direction. There is filtering that cannot be done with a kernel though, and one good example is median filter (mean is the arithmetic mean, whereas median will be the center element of a sorted array). +--- + +### Affine transformation + + + --- ### Derivatives -A derivative of an image is a gradient in one of two directions, e.g. x (horizontal) and y (vertical). To compute a derivative, one can use Scharr, \ and other gradient filters. +A derivative of an image is a gradient in one of two directions: x (horizontal) and y (vertical). To compute a derivative, one can use Scharr, \ and other gradient filters. --- diff --git a/doc/image_processing/space-extrema-detectors.md b/doc/image_processing/space-extrema-detectors.md index 867160a993..11b702d003 100644 --- a/doc/image_processing/space-extrema-detectors.md +++ b/doc/image_processing/space-extrema-detectors.md @@ -1,4 +1,4 @@ -## Space extrema detectors +## Space extrema detectors ### What is being detected? @@ -10,7 +10,7 @@ A good feature is one that is repeatable, stable and can be recognized under aff At the moment, the following detectors are implemented - - Harris corner detector + - Harris detector - Hessian detector @@ -20,17 +20,23 @@ At the moment, the following detectors are implemented #### Harris and Hessian -Both are derived from a concept called Moravec window. Lets have a look at the image below: +Sometimes the kind of detectors is described as affine region detectors. Both are derived from a concept called Moravec window. Lets have a look at the image below: -\ +![Moravec window corner case](./Moravec-window-corner.png) -As can be noticed, moving the yellow window in any direction will cause very big change in intensity. This is the key concept in understanding how the two corner detectors work. +As can be noticed, moving the yellow window in any direction will cause very big change in intensity. Now, lets have a look at the edge case: + +![Moravec window edge case](./Moravec-window-edge.png) + +In this case, intensity change will happen only when moving in particular direction. + +This is the key concept in understanding how the two corner detectors work. The algorithms have the same structure: 1. Compute image derivatives - 2. Weighted convolution + 2. Compute Weighted convolution 3. Compute response @@ -62,10 +68,9 @@ The other is simply determinant $response = det = a * c - b * d$ -**Thresholding** is optional, but without it the result will be extremely noisy. For complex images, like the ones of outdoors, for Harris it will be in order of \ and for Hessian will be in order of \. For simpler images values in order of 100s and 1000s should be enough. - -To get deeper explanation please refer to following **papers**: +**Thresholding** is optional, but without it the result will be extremely noisy. For complex images, like the ones of outdoors, for Harris it will be in order of 100000000 and for Hessian will be in order of 10000. For simpler images values in order of 100s and 1000s should be enough. The numbers assume `uint8_t` gray image. -\ +To get deeper explanation please refer to following **paper**: -\ +[Harris, Christopher G., and Mike Stephens. "A combined corner and edge detector." 
In Alvey vision conference, vol. 15, no. 50, pp. 10-5244. 1988. +](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.4816&rep=rep1&type=pdf) From b7a7bb52a7676c9d1b2fda90c23c6b0839091d7b Mon Sep 17 00:00:00 2001 From: Olzhas Zhumabek Date: Sun, 1 Sep 2019 00:51:59 +0600 Subject: [PATCH 03/11] Replace Mathjax with code blocks Since GitHub doesn't allow Mathjax, the formula parts have been replaced with code blocks --- doc/image_processing/basics.md | 4 ++-- doc/image_processing/space-extrema-detectors.md | 14 +++++++------- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/doc/image_processing/basics.md b/doc/image_processing/basics.md index 69f3ef45d8..2c4a972cc0 100644 --- a/doc/image_processing/basics.md +++ b/doc/image_processing/basics.md @@ -10,7 +10,7 @@ Here are basic concepts that might help to understand documentation written in t Those three words usually mean the same thing, unless context is clear about a different usage. Simply put, they are matrices, that are used to achieve certain effects on the image. Lets consider a simple one, 3 by 3 Scharr filter -$ScharrX = [1,0,-1][1,0,-1][1,0,-1]$ +`ScharrX = [1,0,-1][1,0,-1][1,0,-1]` The filter above, when convolved with a single channel image (intensity/luminance strength), will produce a gradient in X (horizontal) direction. There is filtering that cannot be done with a kernel though, and one good example is median filter (mean is the arithmetic mean, whereas median will be the center element of a sorted array). @@ -24,7 +24,7 @@ The filter above, when convolved with a single channel image (intensity/luminanc ### Derivatives -A derivative of an image is a gradient in one of two directions: x (horizontal) and y (vertical). To compute a derivative, one can use Scharr, \ and other gradient filters. +A derivative of an image is a gradient in one of two directions: x (horizontal) and y (vertical). To compute a derivative, one can use Scharr, Sobel and other gradient filters. --- diff --git a/doc/image_processing/space-extrema-detectors.md b/doc/image_processing/space-extrema-detectors.md index 11b702d003..91bc2ff76a 100644 --- a/doc/image_processing/space-extrema-detectors.md +++ b/doc/image_processing/space-extrema-detectors.md @@ -44,29 +44,29 @@ The algorithms have the same structure: Harris and Hessian differ in what **derivatives they compute**. Harris computes the following derivatives: -$HarrisMatrix = [(dx)^2, dxdy], [dxdy, (dy)^2]$ +`HarrisMatrix = [(dx)^2, dxdy], [dxdy, (dy)^2]` -*(note that $d(x^2)$ and $(dy^2)$ are **numerical** powers, not gradient again).* +*(note that `d(x^2)` and `(dy^2)` are **numerical** powers, not gradient again).* The three distinct terms of a matrix can be separated into three images, to simplify implementation. Hessian, on the other hand, computes second order derivatives: -$HessianMatrix = [dxdx, dxdy][dxdy, dydy]$ +`HessianMatrix = [dxdx, dxdy][dxdy, dydy]` **Weighted convolution** is the same for both. Usually Gaussian blur matrix is used as weights, because corners should have hill like curvature in gradients, and other weights might be noisy. **Response computation** is a matter of choice. Given the general form of both matrices above -$[a, b][c, d]$ +`[a, b][c, d]` One of the response functions is -$response = det - k * trace^2 = a * c - b * d - k * (a + d)^2$ +`response = det - k * trace^2 = a * c - b * d - k * (a + d)^2` -$k$ is called discrimination constant. Usual values are $0.04$ - $0.06$. +`k` is called discrimination constant. Usual values are `0.04` - `0.06`. 
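As a small illustration of this response function (a sketch with hypothetical names and plain arrays rather than GIL types): since both matrices are symmetric — the `b` and `c` slots hold the same `dxdy` term — the determinant of `[a, b][c, d]` is `a * d - b * c`, which reduces to `a * d - b * b`, and that is what the sketch below computes.

```cpp
// Sketch only: Harris response from the three accumulated images, where
// a = sum of w*(dx)^2, b = sum of w*dx*dy, d = sum of w*(dy)^2 per pixel.
// The matrix [a, b][b, d] is symmetric, so det = a*d - b*b, trace = a + d.
#include <vector>

using Image = std::vector<std::vector<float>>;

Image harris_response(Image const& a, Image const& b, Image const& d,
                      float k = 0.04f) // discrimination constant, 0.04 - 0.06
{
    Image response(a.size(), std::vector<float>(a[0].size(), 0.f));
    for (std::size_t y = 0; y < a.size(); ++y)
        for (std::size_t x = 0; x < a[0].size(); ++x)
        {
            float const det   = a[y][x] * d[y][x] - b[y][x] * b[y][x];
            float const trace = a[y][x] + d[y][x];
            response[y][x] = det - k * trace * trace;
        }
    return response;
}
```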
The other is simply determinant -$response = det = a * c - b * d$ +`response = det = a * c - b * d` **Thresholding** is optional, but without it the result will be extremely noisy. For complex images, like the ones of outdoors, for Harris it will be in order of 100000000 and for Hessian will be in order of 10000. For simpler images values in order of 100s and 1000s should be enough. The numbers assume `uint8_t` gray image. From a4f56a3ed50ce5adb92bd9259b614b5f2dc1a59a Mon Sep 17 00:00:00 2001 From: Olzhas Zhumabek Date: Sun, 1 Sep 2019 00:53:58 +0600 Subject: [PATCH 04/11] Remove section on affine transformation It doesn't seem like the concept is used often, so postponed for now --- doc/image_processing/basics.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/doc/image_processing/basics.md b/doc/image_processing/basics.md index 2c4a972cc0..7aebf70906 100644 --- a/doc/image_processing/basics.md +++ b/doc/image_processing/basics.md @@ -14,12 +14,6 @@ Those three words usually mean the same thing, unless context is clear about a d The filter above, when convolved with a single channel image (intensity/luminance strength), will produce a gradient in X (horizontal) direction. There is filtering that cannot be done with a kernel though, and one good example is median filter (mean is the arithmetic mean, whereas median will be the center element of a sorted array). ---- - -### Affine transformation - - - --- ### Derivatives From 65b1ae74d14069e0b3d159f2c8efb35db35949ef Mon Sep 17 00:00:00 2001 From: Olzhas Zhumabek Date: Sun, 1 Sep 2019 01:49:28 +0600 Subject: [PATCH 05/11] Add basic explanation of convolution --- doc/image_processing/basics.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/image_processing/basics.md b/doc/image_processing/basics.md index 7aebf70906..2fb916ab92 100644 --- a/doc/image_processing/basics.md +++ b/doc/image_processing/basics.md @@ -4,6 +4,8 @@ Here are basic concepts that might help to understand documentation written in t ### Convolution +The simplest way to look at this is "tweaking the input so that it would look like the shape provided". What exact tweaking is applied depends on the kernel. + --- ### Filters, kernels, weights From fba0c7aae3b7b71277a5b8990e81de582c3a59f3 Mon Sep 17 00:00:00 2001 From: Olzhas Zhumabek Date: Mon, 2 Sep 2019 23:25:35 +0600 Subject: [PATCH 06/11] Convert markdown to rst --- doc/image_processing/basics.md | 29 ----- doc/image_processing/basics.rst | 54 +++++++++ .../space-extrema-detectors.md | 76 ------------- .../space-extrema-detectors.rst | 107 ++++++++++++++++++ 4 files changed, 161 insertions(+), 105 deletions(-) delete mode 100644 doc/image_processing/basics.md create mode 100644 doc/image_processing/basics.rst delete mode 100644 doc/image_processing/space-extrema-detectors.md create mode 100644 doc/image_processing/space-extrema-detectors.rst diff --git a/doc/image_processing/basics.md b/doc/image_processing/basics.md deleted file mode 100644 index 2fb916ab92..0000000000 --- a/doc/image_processing/basics.md +++ /dev/null @@ -1,29 +0,0 @@ -## Basics - -Here are basic concepts that might help to understand documentation written in this folder: - -### Convolution - -The simplest way to look at this is "tweaking the input so that it would look like the shape provided". What exact tweaking is applied depends on the kernel. - ---- - -### Filters, kernels, weights - -Those three words usually mean the same thing, unless context is clear about a different usage. 
Simply put, they are matrices that are used to achieve certain effects on the image. Let's consider a simple one, a 3 by 3 Scharr filter
-
-`ScharrX = [1,0,-1][1,0,-1][1,0,-1]`
-
-The filter above, when convolved with a single channel image (intensity/luminance strength), will produce a gradient in the X (horizontal) direction. There is filtering that cannot be done with a kernel though, and one good example is the median filter (the mean is the arithmetic mean, whereas the median is the center element of a sorted array).
-
----
-
- ### Derivatives
-
-A derivative of an image is a gradient in one of two directions: x (horizontal) and y (vertical). To compute a derivative, one can use Scharr, Sobel and other gradient filters.
-
----
-
-### Curvature
-
-The word, when used alone, means the curvature that would be generated if the values of an image were plotted in a 3D graph. The X and Z axes (which form the horizontal plane) correspond to the X and Y indices of an image, and the Y axis corresponds to the value at that pixel. By a little stretch of the imagination, filters (also called kernels or weights) could be considered an image (or any 2D matrix). A mean filter would draw a flat plane, whereas a Gaussian filter would draw a hill that gets sharper depending on its sigma value.
diff --git a/doc/image_processing/basics.rst b/doc/image_processing/basics.rst
new file mode 100644
index 0000000000..bbec6271cf
--- /dev/null
+++ b/doc/image_processing/basics.rst
@@ -0,0 +1,54 @@
+Basics
+------
+
+Here are basic concepts that might help to understand the documentation
+written in this folder:
+
+Convolution
+~~~~~~~~~~~
+
+The simplest way to look at this is "tweaking the input so that it would
+look like the shape provided". What exact tweaking is applied depends on
+the kernel. More precisely, convolution slides the kernel over the image
+and replaces every pixel with a weighted sum of the pixels under the
+kernel.
+
+--------------
+
+Filters, kernels, weights
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Those three words usually mean the same thing, unless context is clear
+about a different usage. Simply put, they are matrices that are used to
+achieve certain effects on the image. Let's consider a simple one, a
+3 by 3 Scharr filter
+
+``ScharrX = [1,0,-1][1,0,-1][1,0,-1]``
+
+The filter above, when convolved with a single channel image
+(intensity/luminance strength), will produce a gradient in the X
+(horizontal) direction. There is filtering that cannot be done with a
+kernel though, and one good example is the median filter (the mean is
+the arithmetic mean, whereas the median is the center element of a
+sorted array).
+
+--------------
+
+Derivatives
+~~~~~~~~~~~
+
+A derivative of an image is a gradient in one of two directions: x
+(horizontal) and y (vertical). To compute a derivative, one can use
+Scharr, Sobel and other gradient filters.
+
+--------------
+
+Curvature
+~~~~~~~~~
+
+The word, when used alone, means the curvature that would be generated
+if the values of an image were plotted in a 3D graph. The X and Z axes
+(which form the horizontal plane) correspond to the X and Y indices of
+an image, and the Y axis corresponds to the value at that pixel. By a
+little stretch of the imagination, filters (also called kernels or
+weights) could be considered an image (or any 2D matrix). A mean filter
+would draw a flat plane, whereas a Gaussian filter would draw a hill
+that gets sharper depending on its sigma value.
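To make the kernel idea concrete, here is a small, self-contained sketch (plain C++ with plain 2D arrays — an illustration only, not GIL's actual interface) contrasting a 3 by 3 mean filter, which is an ordinary weighted sum, with the median filter, which cannot be expressed as a fixed set of weights:

.. code-block:: cpp

    // Sketch: mean vs. median over a 3x3 neighbourhood (interior pixels only,
    // i.e. 1 <= y, x < size - 1).
    #include <algorithm>
    #include <vector>

    using Image = std::vector<std::vector<float>>;

    float mean3x3(Image const& img, int y, int x)   // a kernel: all weights 1/9
    {
        float sum = 0.f;
        for (int i = -1; i <= 1; ++i)
            for (int j = -1; j <= 1; ++j)
                sum += img[y + i][x + j] / 9.f;
        return sum;
    }

    float median3x3(Image const& img, int y, int x) // not expressible as weights
    {
        std::vector<float> window;
        for (int i = -1; i <= 1; ++i)
            for (int j = -1; j <= 1; ++j)
                window.push_back(img[y + i][x + j]);
        std::nth_element(window.begin(), window.begin() + 4, window.end());
        return window[4]; // center element of the sorted 3x3 neighbourhood
    }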
diff --git a/doc/image_processing/space-extrema-detectors.md b/doc/image_processing/space-extrema-detectors.md
deleted file mode 100644
index 91bc2ff76a..0000000000
--- a/doc/image_processing/space-extrema-detectors.md
+++ /dev/null
@@ -1,76 +0,0 @@
-## Space extrema detectors
-
-### What is being detected?
-
-A good feature is one that is repeatable, stable and can be recognized under affine transformations. Unfortunately, edges do not fit the description. Corners, on the hand, fit well enough.
-
----
-
-### Available detectors
-
-At the moment, the following detectors are implemented
-
- - Harris detector
-
- - Hessian detector
-
----
-
-### Algorithm steps
-
-#### Harris and Hessian
-
-Sometimes the kind of detectors is described as affine region detectors. Both are derived from a concept called Moravec window. Lets have a look at the image below:
-
-![Moravec window corner case](./Moravec-window-corner.png)
-
-As can be noticed, moving the yellow window in any direction will cause very big change in intensity. Now, lets have a look at the edge case:
-
-![Moravec window edge case](./Moravec-window-edge.png)
-
-In this case, intensity change will happen only when moving in particular direction.
-
-This is the key concept in understanding how the two corner detectors work.
-
-The algorithms have the same structure:
-
- 1. Compute image derivatives
-
- 2. Compute Weighted convolution
-
- 3. Compute response
-
- 4. Threshold (optional)
-
-Harris and Hessian differ in what **derivatives they compute**. Harris computes the following derivatives:
-
-`HarrisMatrix = [(dx)^2, dxdy], [dxdy, (dy)^2]`
-
-*(note that `d(x^2)` and `(dy^2)` are **numerical** powers, not gradient again).*
-
-The three distinct terms of a matrix can be separated into three images, to simplify implementation. Hessian, on the other hand, computes second order derivatives:
-
-`HessianMatrix = [dxdx, dxdy][dxdy, dydy]`
-
-**Weighted convolution** is the same for both. Usually Gaussian blur matrix is used as weights, because corners should have hill like curvature in gradients, and other weights might be noisy.
-
-**Response computation** is a matter of choice. Given the general form of both matrices above
-
-`[a, b][c, d]`
-
-One of the response functions is
-
-`response = det - k * trace^2 = a * c - b * d - k * (a + d)^2`
-
-`k` is called discrimination constant. Usual values are `0.04` - `0.06`.
-
-The other is simply determinant
-
-`response = det = a * c - b * d`
-
-**Thresholding** is optional, but without it the result will be extremely noisy. For complex images, like the ones of outdoors, for Harris it will be in order of 100000000 and for Hessian will be in order of 10000. For simpler images values in order of 100s and 1000s should be enough. The numbers assume `uint8_t` gray image.
-
-To get deeper explanation please refer to following **paper**:
-
-[Harris, Christopher G., and Mike Stephens. "A combined corner and edge detector." In Alvey vision conference, vol. 15, no. 50, pp. 10-5244. 1988.
-](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.4816&rep=rep1&type=pdf)
diff --git a/doc/image_processing/space-extrema-detectors.rst b/doc/image_processing/space-extrema-detectors.rst
new file mode 100644
index 0000000000..49f5801e2d
--- /dev/null
+++ b/doc/image_processing/space-extrema-detectors.rst
@@ -0,0 +1,107 @@
+Space extrema detectors
+-----------------------
+
+What is being detected?
+~~~~~~~~~~~~~~~~~~~~~~~
+
+A good feature is one that is repeatable, stable and can be recognized
+under affine transformations. Unfortunately, edges do not fit the
+description. Corners, on the other hand, fit well enough.
+
+--------------
+
+Available detectors
+~~~~~~~~~~~~~~~~~~~
+
+At the moment, the following detectors are implemented
+
+- Harris detector
+
+- Hessian detector
+
+--------------
+
+Algorithm steps
+~~~~~~~~~~~~~~~
+
+Harris and Hessian
+^^^^^^^^^^^^^^^^^^
+
+Sometimes the kind of detectors is described as affine region detectors.
+Both are derived from a concept called Moravec window. Let's have a look
+at the image below:
+
+.. figure:: ./Moravec-window-corner.png
+   :alt: Moravec window corner case
+
+   Moravec window corner case
+
+As can be noticed, moving the yellow window in any direction will cause
+a very big change in intensity. Now, let's have a look at the edge case:
+
+.. figure:: ./Moravec-window-edge.png
+   :alt: Moravec window edge case
+
+   Moravec window edge case
+
+In this case, intensity change will happen only when moving in a
+particular direction.
+
+This is the key concept in understanding how the two corner detectors
+work.
+
+The algorithms have the same structure:
+
+1. Compute image derivatives
+
+2. Compute Weighted convolution
+
+3. Compute response
+
+4. Threshold (optional)
+
+Harris and Hessian differ in what **derivatives they compute**. Harris
+computes the following derivatives:
+
+``HarrisMatrix = [(dx)^2, dxdy], [dxdy, (dy)^2]``
+
+*(note that ``d(x^2)`` and ``(dy^2)`` are **numerical** powers, not
+gradient again).*
+
+The three distinct terms of a matrix can be separated into three images,
+to simplify implementation. Hessian, on the other hand, computes second
+order derivatives:
+
+``HessianMatrix = [dxdx, dxdy][dxdy, dydy]``
+
+**Weighted convolution** is the same for both. Usually Gaussian blur
+matrix is used as weights, because corners should have hill like
+curvature in gradients, and other weights might be noisy.
+
+**Response computation** is a matter of choice. Given the general form
+of both matrices above
+
+``[a, b][c, d]``
+
+One of the response functions is
+
+``response = det - k * trace^2 = a * d - b * c - k * (a + d)^2``
+
+``k`` is called the discrimination constant. Usual values are ``0.04`` -
+``0.06``.
+
+The other is simply the determinant
+
+``response = det = a * d - b * c``
+
+**Thresholding** is optional, but without it the result will be
+extremely noisy. For complex images, like outdoor scenes, the Harris
+response will be on the order of 100000000 and the Hessian response on
+the order of 10000. For simpler images, values on the order of 100s and
+1000s should be enough. The numbers assume a ``uint8_t`` gray image.
+
+For a deeper explanation, please refer to the following **paper**:
+
+`Harris, Christopher G., and Mike Stephens. "A combined corner and edge
+detector." In Alvey vision conference, vol. 15, no. 50, pp. 10-5244.
+1988. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.434.4816&rep=rep1&type=pdf>
`__

From 7d24cc1924c6b83f02e64f9401b591471ceca3b3 Mon Sep 17 00:00:00 2001
From: Olzhas Zhumabek
Date: Mon, 2 Sep 2019 23:39:41 +0600
Subject: [PATCH 07/11] Add some more relevant papers

This commit cites and links the paper about the Hessian detector and a
review paper about affine region detectors.
---
 doc/image_processing/space-extrema-detectors.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/image_processing/space-extrema-detectors.rst b/doc/image_processing/space-extrema-detectors.rst
index 49f5801e2d..f132dfb111 100644
--- a/doc/image_processing/space-extrema-detectors.rst
+++ b/doc/image_processing/space-extrema-detectors.rst
@@ -105,3 +105,8 @@ To get deeper explanation please refer to following **paper**:
 `Harris, Christopher G., and Mike Stephens. "A combined corner and edge
 detector." In Alvey vision conference, vol. 15, no. 50, pp. 10-5244.
 1988. `__
+`Mikolajczyk, Krystian, and Cordelia Schmid. "An affine invariant interest point detector." In European conference on computer vision, pp. 128-142. Springer, Berlin, Heidelberg, 2002. `__
+
+`Mikolajczyk, Krystian, Tinne Tuytelaars, Cordelia Schmid, Andrew Zisserman, Jiri Matas, Frederik Schaffalitzky, Timor Kadir, and Luc Van Gool. "A comparison of affine region detectors." International journal of computer vision 65, no. 1-2 (2005): 43-72. `__
+

From 16d95aa46f3b0940c36e4a888f3cdc4f0a6bc4c2 Mon Sep 17 00:00:00 2001
From: Olzhas Zhumabek
Date: Mon, 2 Sep 2019 23:48:41 +0600
Subject: [PATCH 08/11] Move to new concept name

"Space extrema detector" does not seem to be a widespread name for this
class of detectors. There is a paper that uses the term "affine region
detector", which has quite a few citations.
---
 ...ce-extrema-detectors.rst => affine-region-detectors.rst} | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
 rename doc/image_processing/{space-extrema-detectors.rst => affine-region-detectors.rst} (95%)

diff --git a/doc/image_processing/space-extrema-detectors.rst b/doc/image_processing/affine-region-detectors.rst
similarity index 95%
rename from doc/image_processing/space-extrema-detectors.rst
rename to doc/image_processing/affine-region-detectors.rst
index f132dfb111..e1375b966e 100644
--- a/doc/image_processing/space-extrema-detectors.rst
+++ b/doc/image_processing/affine-region-detectors.rst
@@ -1,4 +1,4 @@
-Space extrema detectors
+Affine region detectors
 -----------------------

 What is being detected?
@@ -6,7 +6,8 @@ What is being detected?

 A good feature is one that is repeatable, stable and can be recognized
 under affine transformations. Unfortunately, edges do not fit the
-description. Corners, on the other hand, fit well enough.
+description. They will get warped under affine transformations,
+but corners, on the other hand, fit well enough.

 --------------

@@ -27,7 +28,6 @@ Algorithm steps
 Harris and Hessian
 ^^^^^^^^^^^^^^^^^^

-Sometimes the kind of detectors is described as affine region detectors.
 Both are derived from a concept called Moravec window.
Let's have a look at the image below:

From 512362d064e24cc23aa69ac650aeed65167cc960 Mon Sep 17 00:00:00 2001
From: Olzhas Zhumabek
Date: Tue, 3 Sep 2019 00:02:46 +0600
Subject: [PATCH 09/11] Fix mistakes in docs

Fix mistakes related to terminology and algorithm steps
---
 doc/image_processing/affine-region-detectors.rst | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/doc/image_processing/affine-region-detectors.rst b/doc/image_processing/affine-region-detectors.rst
index e1375b966e..412cbda9ba 100644
--- a/doc/image_processing/affine-region-detectors.rst
+++ b/doc/image_processing/affine-region-detectors.rst
@@ -4,10 +4,10 @@ Affine region detectors
 What is being detected?
 ~~~~~~~~~~~~~~~~~~~~~~~

-A good feature is one that is repeatable, stable and can be recognized
-under affine transformations. Unfortunately, edges do not fit the
-description. They will get warped under affine transformations,
-but corners, on the other hand, fit well enough.
+An affine region is basically any region of the image
+that is stable under affine transformations. It can be
+edges under affinity conditions, corners (a small patch of an image)
+or any other stable features.

 --------------

@@ -54,7 +54,7 @@ The algorithms have the same structure:

 1. Compute image derivatives

-2. Compute Weighted convolution
+2. Compute Weighted sum

 3. Compute response

@@ -74,9 +74,13 @@ order derivatives:

 ``HessianMatrix = [dxdx, dxdy][dxdy, dydy]``

-**Weighted convolution** is the same for both. Usually Gaussian blur
+**Weighted sum** is the same for both. Usually Gaussian blur
 matrix is used as weights, because corners should have hill like
 curvature in gradients, and other weights might be noisy.
+Basically, overlay the weights matrix over a corner and compute the sum
+of ``image[x + i, y + j] * weights[i, j]`` over ``i, j``
+from zero to the weight matrix dimensions, then move the window
+and compute again until the whole image is covered.

 **Response computation** is a matter of choice. Given the general form
 of both matrices above

From b5ff9f24f01720f3a4f1dd5935e8adeecad32b20 Mon Sep 17 00:00:00 2001
From: Olzhas Zhumabek
Date: Tue, 3 Sep 2019 00:10:03 +0600
Subject: [PATCH 10/11] Add ip docs to index.rst

Make sure ip docs are built and are in the final output
---
 doc/index.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/index.rst b/doc/index.rst
index a6897811cb..c40b0845e1 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -26,6 +26,8 @@ Documentation
   design_guide
   io
+  image_processing/basics.rst
+  image_processing/affine-region-detectors.rst
   toolbox
   numeric
   API Reference <./reference/index.html#://>

From 8aa79683d50259359eb38a88e81c163cf6665db1 Mon Sep 17 00:00:00 2001
From: Olzhas Zhumabek
Date: Tue, 3 Sep 2019 00:15:19 +0600
Subject: [PATCH 11/11] Fix formatting

It seems like some lines are not properly formatted and the rendered
file contains unreadable lines. Fixed by not formatting them.
---
 doc/image_processing/affine-region-detectors.rst | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/doc/image_processing/affine-region-detectors.rst b/doc/image_processing/affine-region-detectors.rst
index 412cbda9ba..3ecb69aaee 100644
--- a/doc/image_processing/affine-region-detectors.rst
+++ b/doc/image_processing/affine-region-detectors.rst
@@ -65,8 +65,7 @@ computes the following derivatives:

 ``HarrisMatrix = [(dx)^2, dxdy], [dxdy, (dy)^2]``

-*(note that ``d(x^2)`` and ``(dy^2)`` are **numerical** powers, not
-gradient again).*
+(note that ``(dx)^2`` and ``(dy)^2`` are **numerical** powers, not the gradient taken again).

 The three distinct terms of a matrix can be separated into three images,
 to simplify implementation. Hessian, on the other hand, computes second
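Taken together, the four algorithm steps map onto very little code. To complement the derivative and response sketches above, the following sketch covers the two remaining steps — the weighted sum of derivative products (step 2) and the optional thresholding (step 4). It again uses plain arrays and made-up helper names rather than GIL's real interface, and the 3x3 weight matrix is only an illustrative stand-in for a Gaussian blur kernel.

```cpp
#include <vector>

using Image = std::vector<std::vector<float>>;

// Illustrative 3x3 weight matrix standing in for a Gaussian blur kernel.
float const weights[3][3] = {{1.f, 2.f, 1.f},
                             {2.f, 4.f, 2.f},
                             {1.f, 2.f, 1.f}};

// Step 2: weighted sum of the product p*q of two derivative images around
// (y, x); call it with (dx, dx), (dx, dy) and (dy, dy) to build the three
// images a, b and d used by the response step. The caller keeps (y, x) in
// the interior so the window stays inside the image.
float weighted_sum(Image const& p, Image const& q, int y, int x)
{
    float sum = 0.f;
    for (int i = -1; i <= 1; ++i)
        for (int j = -1; j <= 1; ++j)
            sum += weights[i + 1][j + 1] * p[y + i][x + j] * q[y + i][x + j];
    return sum;
}

// Step 4: drop everything below a threshold; the value itself depends on
// the image content and on which detector produced the response.
void threshold(Image& response, float thresh)
{
    for (auto& row : response)
        for (auto& value : row)
            if (value < thresh)
                value = 0.f;
}
```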