Edge Detection Techniques - computer-vision

Does anyone know what the differences between the Prewitt, Sobel and Laplacian operators in edge detection algorithms are?
Are some better than others?
Are different operators used in different situations?

The Laplace operator is a 2nd order derivative operator, while the other two are 1st order derivative operators, so they're used in different situations: Sobel/Prewitt measure the slope, while the Laplacian measures the change of the slope.
Examples:
If you have a signal with a constant slope (a gradient):
Gradient signal: 1 2 3 4 5 6 7 8 9
a 1st derivative filter (Sobel/Prewitt) will measure the slope, so the filter response is
Sobel result: 2 2 2 2 2 2 2
The result of a Laplace filter is 0 for this signal, because the slope is constant.
Example 2: If you have an edge signal:
Edge: 0 0 0 0 1 1 1 1
The sobel filter result has one peak; the sign of the peak depends on the direction of the edge:
Sobel result: 0 0 0 1 1 0 0 0
The laplace filter produces two peaks; the location of the edge corresponds with the zero crossing of the laplace filter result:
Laplace result: 0 0 0 1 -1 0 0 0
So if you want to know the direction of an edge, you'd use a 1st order derivative filter. Also, a Laplace filter is more sensitive to noise than Sobel or Prewitt.
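To make this concrete in 2-D, here is a minimal sketch (assuming OpenCV/C++; the file name is only a placeholder) that applies both kinds of filters to the same image; the gradient magnitude highlights edges directly, while the Laplacian changes sign across them:

#include <opencv2/opencv.hpp>

int main()
{
    // Load a grayscale test image (the path is only an example).
    cv::Mat src = cv::imread("test.png", cv::IMREAD_GRAYSCALE);

    // 1st order: Sobel derivatives in x and y, combined into a gradient magnitude.
    // The gradient direction would be atan2(gy, gx).
    cv::Mat gx, gy, mag;
    cv::Sobel(src, gx, CV_32F, 1, 0);
    cv::Sobel(src, gy, CV_32F, 0, 1);
    cv::magnitude(gx, gy, mag);

    // 2nd order: Laplacian; edges sit at the zero crossings of the response.
    cv::Mat lap;
    cv::Laplacian(src, lap, CV_32F);
    cv::Mat lapAbs = cv::abs(lap);

    cv::imshow("gradient magnitude", mag / 255.0);
    cv::imshow("laplacian (abs)", lapAbs / 255.0);
    cv::waitKey();
    return 0;
}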
Sobel and Prewitt filters, on the other hand, are quite similar and are used for the same purposes. Important differences between 1st order derivative filters are:
Sensitivity to noise
Anisotropy: Ideally, the filter results for X/Y should be proportional to sin α and cos α, where α is the angle of the gradient, and the sum of the two squares should be the same for every angle.
Behavior at corners
These properties can be measured with artificial test images (like the famous Jähne test patterns, found in "Image Processing" by Bernd Jähne). Unfortunately, I didn't find anything about the Prewitt operator in that book, so you'd have to do your own experiments.
In the end, there's always a trade-off between these properties, and which of them is more important depends on the application.

Related

Why scale pixels between -1 and 1 sample-wise in the preprocess step for image classification

In the preprocess_input() function found at the link below, the pixels are scaled between -1 and 1. I have seen this used elsewhere as well. What is the reason for scaling between -1 and 1 as opposed to 0 and 1? I was under the impression that common ranges for pixels were 0-255, or 0-1 if normalized.
https://github.com/keras-team/keras/blob/master/keras/applications/imagenet_utils.py
The normalization between -1 and 1 aims to give the data a mean of 0 and a standard deviation close to 1 (i.e. roughly a normal distribution). Also, the choice of the activation function used in the network influences the kind of normalization, especially when using batch normalization.
For example, if sigmoid is used and the normalization is done between 0 and 1, then all the negative values produced inside the network (by multiplying weights with inputs and then adding biases) will be mapped towards zero, which leads to more vanishing gradients during backpropagation.
Whereas with tanh and normalization between -1 and 1, those negative values will be mapped to corresponding negative values between -1 and 0.
tanh is often used as the activation function in convolutional networks and GANs, and is generally preferred to sigmoid.
"Tanh. The tanh non-linearity is shown on the image above on the right. It squashes a real-valued number to the range [-1, 1]. Like the sigmoid neuron, its activations saturate, but unlike the sigmoid neuron its output is zero-centered. Therefore, in practice the tanh non-linearity is always preferred to the sigmoid nonlinearity." from the well-known Andrej Karpathy course, cs231n.github.io/neural-networks-1

How can I fill gaps in a binary image in OpenCV?

I have some thresholded images of hand-drawn figures (circuits) but there are some parts where I need to have gaps closed between two points, as I show in the following image:
Binary image
I tried closing (dilation followed by erosion), but it is not working. It doesn't fill the gaps, and it makes the resistors and other components unrecognizable. I couldn't find a value for the morph size and number of iterations that gives me a good result without affecting the rest of the picture. It's important not to affect the components too much.
I can't use Hough lines because the gaps are not always along straight lines.
Result after closing:
Result after closing
int morph_size1 = 2;
// 5x5 rectangular structuring element
Mat element1 = getStructuringElement(MORPH_RECT, Size(2 * morph_size1 + 1, 2 * morph_size1 + 1), Point(morph_size1, morph_size1));
Mat dst1; // result matrix
for (int i = 1; i < 3; i++)
{
    // closing with i iterations (each pass overwrites dst1)
    morphologyEx(binary, dst1, CV_MOP_CLOSE, element1, Point(-1, -1), i);
}
imshow("closing", dst1);
Any idea?
Thanks in advance.
My proposal:
find the endpoints of the breaks by means of morphological thinning (select the white pixels having only one white neighbor);
in a small neighborhood around every endpoint, find the closest other endpoint by searching in circles of increasing radius*, up to a limit radius;
draw a thick segment between them.
*In this step, it is very important to look for neighbors in a different connected component, to avoid linking a piece to itself; so you need blob labelling as well.
In this thinning, there are more breaks than in your original picture because I erased the boxes.
Of course, you draw the filling segments in the original image.
This process cannot be perfect, as sometimes endpoints will be missing, and sometimes unwanted endpoints will be considered.
As a refinement, you can try to estimate the direction at each endpoint and only search in an angular sector.
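A minimal sketch of the endpoint-detection step (assuming OpenCV/C++ and an already thinned image with foreground = 1 and background = 0; the thinning itself and the blob labelling, e.g. cv::connectedComponents, are not shown):

#include <opencv2/opencv.hpp>

// thin: CV_8U skeleton image, foreground = 1, background = 0.
// Returns a mask that is non-zero at skeleton pixels with exactly one foreground neighbour.
cv::Mat findEndpoints(const cv::Mat& thin)
{
    // 3x3 kernel that sums the 8 neighbours (centre excluded).
    cv::Mat kernel = (cv::Mat_<float>(3, 3) <<
        1, 1, 1,
        1, 0, 1,
        1, 1, 1);

    cv::Mat neighbourCount;
    cv::filter2D(thin, neighbourCount, CV_32F, kernel);

    // Endpoint: the pixel itself is foreground and it has exactly one neighbour.
    cv::Mat isForeground = (thin == 1);
    cv::Mat hasOneNeighbour = (neighbourCount == 1);
    return isForeground & hasOneNeighbour;
}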
My suggestion is to use a custom convolution filter (cv::filter2D) like the one below (can be larger):
0    0    1/12 0    0
0    0    2/12 0    0
1/12 2/12 0    2/12 1/12
0    0    2/12 0    0
0    0    1/12 0    0
The idea is to fill gaps when there are two line segments near each other. You can also use custom structuring elements to obtain the same effect.
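A rough sketch of that idea (OpenCV/C++; the threshold value is only an assumption and would need tuning): the kernel gives a high response at a background pixel only when there are line pixels on more than one side of it, and those responses are then merged back into the image.

#include <opencv2/opencv.hpp>

int main()
{
    // binary: 8-bit image, background = 0, lines = 255.
    cv::Mat binary = cv::imread("binary.png", cv::IMREAD_GRAYSCALE);

    // The kernel shown above.
    cv::Mat kernel = (cv::Mat_<float>(5, 5) <<
        0,      0,      1/12.f, 0,      0,
        0,      0,      2/12.f, 0,      0,
        1/12.f, 2/12.f, 0,      2/12.f, 1/12.f,
        0,      0,      2/12.f, 0,      0,
        0,      0,      1/12.f, 0,      0);

    cv::Mat response;
    cv::filter2D(binary, response, CV_32F, kernel);

    // Turn strong responses into line pixels and merge them with the original.
    cv::Mat filled = binary | (response > 100);   // 100 is a guess, tune it
    cv::imshow("filled", filled);
    cv::waitKey();
    return 0;
}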

Choose rectangles for maximizing the area

I've got a 2D binary matrix of arbitrary size. I want to find a set of rectangles in this matrix with maximum total area. The constraints are:
Rectangles may only cover "0"-fields in the matrix and no "1"-fields.
Each rectangle has to have a given minimum distance from every other rectangle.
So let me illustrate this a bit further by this matrix:
1 0 0 1
0 0 0 0
0 0 1 0
0 0 0 0
0 1 0 0
Let the minimal distance between two rectangles be 1. Consequently, the optimal solution is to choose the rectangles with corners (1,0)-(3,1) and (1,3)-(4,3). These rectangles are at least 1 field apart from each other and they do not lie on "1"-fields. Additionally, this solution has the maximum area (6+4=10).
If the minimal distance were 2, the optimum would be (1,0)-(4,0) and (1,3)-(4,3) with area 4+4=8.
So far, I have managed to find such rectangles analogously to this post:
Find largest rectangle containing only zeros in an N×N binary matrix
I saved all these rectangles in a list:
list<rectangle> rectangles;
with
struct rectangle {
    int i, j;          // bottom left corner of rectangle
    int width, length; // width = size in neg. i direction, length = size in pos. j direction
};
So far, I have only thought about brute-force methods, but of course I am not happy with that.
I hope you can give me some hints and tips on how to find the corresponding rectangles in my list, and I hope my problem is clear to you.
The following counterexample shows that even a brute-force checking of all combinations of maximal-area rectangles can fail to find the optimum:
110
000
110
In the above example, there are 2 maximal-area rectangles, each of area 3, one vertical and one horizontal. You can't pick both, so if you are restricted to choosing a subset of these rectangles, the best you can do is to pick (either) one for a total area of 3. But if you instead picked the vertical area-3 rectangle, and then also took the non-maximal 1x2 rectangle consisting of just the leftmost two 0s, you could get a better total area of 5. (That's for a minimum separation distance of 0; if the minimum separation distance is 1, as in your own example, then you could instead pick just the leftmost 0 as a 1x1 rectangle for a total area of 4, which is still better than 3.)
For the special case when the separation distance is 0, there's a trivial algorithm: you can simply put a 1x1 rectangle on every single 0 in the matrix. When the separation distance is strictly greater than 0, I don't yet see a fast algorithm, though I'm less sure that the problem is NP-hard now than I was a few minutes ago...
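For illustration, the separation-0 case really is just counting the zeros (a tiny C++ sketch of that observation):

#include <vector>

// matrix: entries are 0 or 1. For a minimum separation of 0, the maximum
// total area is simply the number of 0-cells (one 1x1 rectangle per 0).
int maxAreaWithZeroSeparation(const std::vector<std::vector<int>>& matrix)
{
    int area = 0;
    for (const auto& row : matrix)
        for (int cell : row)
            if (cell == 0)
                ++area;
    return area;
}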

The concept of bilinear interpolation when accessing a 2-D array like a 1-D array

I have a 2-D array containing the pixels of a BMP file. Its size is width (3*65536) * height (3*65536), because I scaled it up by a factor of 3.
It's like this.
1 2 3 4
5 6 7 8
9 10 11 12
Between 1 and 2 there are 2 holes, because I enlarged the original 2-D array (multiplied it by 3).
I use a 1-D array-like access method, like this:
array[y * width + x]
index: 0 1 2 3 4 5 6 7 8 9 10 11
value: 1 2 3 4 5 6 7 8 9 10 11 12
(this array is actually a 2-D array and is scaled by multiplying by 3)
Now I can patch the holes like this:
In a double for loop, when (j % 3 == 1):
Image[i*width+j] = Image[i*width+(j-1)]*(1-1/3) + Image[i*width+(j+2)]*(1-2/3)
And when (j % 3 == 2):
Image[i*width+j] = Image[i*width+(j-2)]*(1-2/3) + Image[i*width+(j+1)]*(1-1/3)
This is how I think I could patch the holes, which is what is called "bilinear interpolation".
I want to be sure about what I know before implementing this logic into my code. Thanks for reading.
Bilinear interpolation requires either 2 linear interpolation passes (horizontal and vertical) per interpolated pixel (well, some of them only require 1), or up to 4 source pixels per interpolated pixel.
Between 1 and 2 there are two holes. Between 1 and 5 there are 2 holes. Between 1 and 6 there are 4 holes. Your code, as written, only patches the holes between 1 and 2; it does not handle the other holes correctly.
In addition, your division is integer division (1/3 evaluates to 0), so it does not do what you want.
Generally you are far better off writing a function r = interpolate_between(a, b, x, y) that interpolates between a and b at step x out of y (see the sketch after these steps). Then test and fix it. Now scale your image horizontally using it, and check visually that you got it right (especially the edges!).
Now try using it to scale vertically only.
Now do both horizontal, then vertical.
Next, write the bilinear version, which you can test against the linear version applied three times (it should agree to within rounding error). Then try to bilinear-scale the image, checking visually.
Compare with the two-linear scale. It should differ only by rounding error.
At each of these stages you'll have a single "new" operation that can go wrong, with the previous code already validated.
Writing everything at once will lead to complex bug-ridden code.
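Here is a possible sketch of such a helper and of the horizontal pass for the factor-3 case (plain C++; all names are placeholders, and the vertical pass and the bilinear version follow the same pattern):

#include <cstdint>
#include <vector>

// Linear interpolation between a and b at step x out of y,
// using floating point so that 1/3 is not truncated to 0.
uint8_t interpolate_between(uint8_t a, uint8_t b, int x, int y)
{
    return static_cast<uint8_t>(a + (b - a) * static_cast<float>(x) / y + 0.5f);
}

// Fill the horizontal holes of an image scaled by a factor of 3:
// known pixels sit at x % 3 == 0 on rows with y % 3 == 0,
// and the two pixels between them are interpolated.
void fillHorizontal(std::vector<uint8_t>& image, int width, int height)
{
    for (int y = 0; y < height; y += 3)            // rows that contain known pixels
        for (int x = 0; x + 3 < width; x += 3)     // left neighbour of each gap
        {
            uint8_t a = image[y * width + x];
            uint8_t b = image[y * width + x + 3];
            image[y * width + x + 1] = interpolate_between(a, b, 1, 3);
            image[y * width + x + 2] = interpolate_between(a, b, 2, 3);
        }
}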

How does a projection Matrix work?

I have to write a paper for my A-Levels about 3D programming, but I have a serious problem understanding the perspective projection matrix, and I need to fully explain the matrix in detail. I've searched a lot of websites and YouTube videos on this topic, but very few even try to answer the question of why the matrix has those values in those places. Based on http://www.songho.ca/opengl/gl_projectionmatrix.html I was able to find out how the w-row works, but I don't understand the other three.
I decided to use the "simpler" version for symmetric viewports only (right-handed coordinates):
I am very thankful for every attempt to explain the first three rows to me!
The core reason for the matrix is to map the 3D coordinates to a 2D plane and have more distant objects be smaller.
For just this, a much simpler matrix suffices (assuming your camera is at the origin and looking along the Z axis):
1 0 0 0
0 1 0 0
0 0 0 0
0 0 1 0
After multiplying with this matrix and then renormalizing the w coordinate you have exactly that. Each x,y,z,1 point becomes x/z,y/z,0,1.
However, there is no depth information (Z is 0 for all points), so a depth buffer/test won't work. For that we can add a parameter to the matrix so the depth information remains available:
1 0 0 0
0 1 0 0
0 0 0 1
0 0 1 0
Now the resulting point contains the inverse depth in the Z coordinate. Each x,y,z,1 point becomes x/z,y/z,1/z,1.
The extra parameters in the full projection matrix are the result of mapping the coordinates into the (-1,-1,-1) - (1,1,1) device box (the bounding box outside of which points won't get drawn) using a scale and a translate.
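As a small numerical check (plain C++; this uses the second, inverse-depth matrix from above), multiplying a point by the matrix and then dividing by w indeed gives x/z, y/z, 1/z:

#include <cstdio>

int main()
{
    // The simple projection matrix that keeps inverse depth (row-major).
    const float m[4][4] = {
        { 1, 0, 0, 0 },
        { 0, 1, 0, 0 },
        { 0, 0, 0, 1 },
        { 0, 0, 1, 0 },
    };

    const float p[4] = { 3.0f, 2.0f, 4.0f, 1.0f };   // point (x, y, z, 1)

    // Matrix * point.
    float r[4] = { 0, 0, 0, 0 };
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row][col] * p[col];

    // Perspective divide: renormalise so that w == 1.
    for (int i = 0; i < 3; ++i)
        r[i] /= r[3];
    r[3] = 1.0f;

    // Prints 0.75 0.5 0.25 1, i.e. x/z, y/z, 1/z, 1.
    std::printf("%g %g %g %g\n", r[0], r[1], r[2], r[3]);
    return 0;
}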