I am studying image processing these days and I am a beginner to the subject. I got stuck on the subject of convolution and how to implement it for images. Let me explain briefly: there is a general formula of convolution for images, like so:
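As far as I can tell, it is the standard 2-D discrete convolution sum, something like

$$x(n_1, n_2) = \sum_{k_1=-\infty}^{\infty}\sum_{k_2=-\infty}^{\infty} h(k_1, k_2)\, y(n_1 - k_1, n_2 - k_2)$$

where y is the input image and h is the mask (kernel).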
x(n1,n2) represents a pixel in the output image, but I do not know what k1 and k2 stand for. Actually, that is what I would like to learn. In order to implement this in a programming language, I need to know what k1 and k2 stand for. Can someone explain this to me or point me to an article? I would really appreciate any help.
Convolution in this case deals with extracting out patches of image pixels that surround a target image pixel. When you perform image convolution, you perform this with what is known as a mask or point spread function or kernel and this is usually much smaller than the size of the image itself.
For each target pixel in the output image, you grab a neighbourhood of pixel values from the input, including the pixel that is at the same coordinates in the input. The size of this neighbourhood coincides exactly with the size of the mask. At that point, you rotate the mask by 180 degrees, then do an element-by-element multiplication of each value in the mask with the pixel value that coincides at each location in the neighbourhood. You add all of these up, and that is the output for the target pixel in the output image.
For example, let's say I had this small image:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
And let's say I wanted to perform an averaging within a 3 x 3 window, so my mask would be:
        [1 1 1]
(1/9) * [1 1 1]
        [1 1 1]
To perform 2D image convolution, we would rotate the mask by 180 degrees, but for this mask that gives us the same mask back. So let's say I wanted to find the output at row 2, column 2. The 3 x 3 neighbourhood I would extract is:
1 2 3
6 7 8
11 12 13
To find the output, I would multiply each value in the mask by the pixel value at the same location in the neighbourhood:
[ 1  2  3]             [1 1 1]
[ 6  7  8]  ** (1/9) * [1 1 1]
[11 12 13]             [1 1 1]
Performing a point-by-point multiplication and adding up the values gives us:
1(1/9) + 2(1/9) + 3(1/9) + 6(1/9) + 7(1/9) + 8(1/9) + 11(1/9) + 12(1/9) + 13(1/9) = 63/9 = 7
The output at location (2,2) in the output image would be 7.
Bear in mind that I didn't tackle the case where the mask would go out of bounds. Specifically, if I tried to find the output at row 1, column 1 for example, there would be five locations where the mask would go out of bounds. There are many ways to handle this. Some people consider those pixels outside to be zero. Other people like to replicate the image border so that the border pixels are copied outside of the image dimensions. Some people like to pad the image using more sophisticated techniques like doing symmetric padding where the border pixels are a mirror reflection of what's inside the image, or a circular padding where the border pixels are copied from the other side of the image.
That's beyond the scope of this post, but in your case, start with the simplest case: when you're collecting neighbourhoods, set any pixels that go outside the bounds of the image to zero.
Now, what do k1 and k2 mean? k1 and k2 denote the offsets with respect to the centre of the neighbourhood and mask. Notice that n1 - k1 and n2 - k2 are what matter in the sum. The output position is denoted by n1 and n2, so n1 - k1 and n2 - k2 are the offsets with respect to this centre in the horizontal sense (n1 - k1) and the vertical sense (n2 - k2). If we had a 3 x 3 mask, the centre would be k1 = k2 = 0, the top-left corner would be k1 = k2 = -1, and the bottom-right corner would be k1 = k2 = 1. The reason the sums go to infinity is to make sure we cover all elements in the mask. Masks are finite in size, so the infinite limits are just there to ensure that every mask element is covered. Therefore, the sum above simplifies to the point-by-point summation I was talking about earlier.
Here's a better illustration where the mask is a vertical Sobel filter which finds vertical gradients in an image:
Source: http://blog.saush.com/2011/04/20/edge-detection-with-the-sobel-operator-in-ruby/
As you can see, for each output pixel in the target image, we look at a neighbourhood of pixels at the same spatial location in the input image (3 x 3 in this case), perform a weighted element-by-element sum between the mask and the neighbourhood, and set the output pixel to the total of these weighted elements. Bear in mind that this example does not rotate the mask by 180 degrees, but that's what you do when it comes to convolution.
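If it helps, here is a minimal C++ sketch of this procedure using zero padding for the border; the function name, the row-major std::vector layout, and the assumption of an odd-sized mask are mine, not anything dictated by the formula.

#include <vector>

// Minimal sketch of 2-D convolution with zero padding. The image and mask are
// stored row-major in std::vector<double> with known width/height, and the mask
// has odd dimensions. Pixels that fall outside the image are treated as zero.
std::vector<double> convolve2d(const std::vector<double>& image, int imgW, int imgH,
                               const std::vector<double>& mask, int maskW, int maskH)
{
    std::vector<double> output(image.size(), 0.0);
    const int halfW = maskW / 2;
    const int halfH = maskH / 2;

    for (int y = 0; y < imgH; ++y) {
        for (int x = 0; x < imgW; ++x) {
            double sum = 0.0;
            for (int ky = -halfH; ky <= halfH; ++ky) {
                for (int kx = -halfW; kx <= halfW; ++kx) {
                    const int iy = y - ky;   // input row: n1 - k1
                    const int ix = x - kx;   // input column: n2 - k2
                    if (iy < 0 || iy >= imgH || ix < 0 || ix >= imgW)
                        continue;            // zero padding: skip out-of-bounds pixels
                    // The mask is indexed so that (kx, ky) = (0, 0) is its centre;
                    // indexing the input at n - k is what gives the 180-degree flip.
                    sum += mask[(ky + halfH) * maskW + (kx + halfW)] * image[iy * imgW + ix];
                }
            }
            output[y * imgW + x] = sum;
        }
    }
    return output;
}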
Hope this helps!
$k_1$ and $k_2$ are variables that should cover the whole definition area of your kernel.
Check out wikipedia for further description:
http://en.wikipedia.org/wiki/Kernel_%28image_processing%29
Related
I'm in a bit of a bind. I'm trying to write code in C++ that tests every possible combination of items in a matrix, with restrictions on the quadrants in which each item can be placed. Here is the problem:
I have a 4 (down) x 2 (across) matrix
Matrix Here
There are 4 quadrants in the matrix (1-4), and each quadrant can hold up to 2 items of the same type.
I have a list of x (say 4) of each type of item a - d, which can fit into their respective quadrants 1-4.
the items:
a1,a2,a3,a4 - Only Quadrant 1
b1,b2,b3,b4 - Only Quadrant 2
c1,c2,c3,c4 - Only Quadrant 3
d1,d2,d3,d4 - Only Quadrant 4
I need to write code that lists every possible way to combine the items a to d into their respective quadrants, subject to that restriction.
Any ideas? So far I think I have to find the combinations for each quadrant and multiply these across the quadrants to give the total number of combinations, but I am not sure how to develop code to list each possible combination.
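For what it's worth, here is a rough C++ sketch of that idea, under the assumption that each quadrant receives exactly 2 of its 4 items (adjust the per-quadrant enumeration if placing 0 or 1 items is also allowed); all of the names are made up for illustration.

#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main()
{
    // Items per quadrant (quadrant 1 takes only a's, quadrant 2 only b's, ...).
    const std::vector<std::vector<std::string>> quadrantItems = {
        {"a1", "a2", "a3", "a4"},
        {"b1", "b2", "b3", "b4"},
        {"c1", "c2", "c3", "c4"},
        {"d1", "d2", "d3", "d4"}
    };

    // Enumerate all ways to pick 2 items out of 4 for one quadrant (C(4,2) = 6 pairs).
    auto pairsOf = [](const std::vector<std::string>& items) {
        std::vector<std::pair<std::string, std::string>> pairs;
        for (size_t i = 0; i < items.size(); ++i)
            for (size_t j = i + 1; j < items.size(); ++j)
                pairs.push_back({items[i], items[j]});
        return pairs;
    };

    const auto q1 = pairsOf(quadrantItems[0]);
    const auto q2 = pairsOf(quadrantItems[1]);
    const auto q3 = pairsOf(quadrantItems[2]);
    const auto q4 = pairsOf(quadrantItems[3]);

    // Cartesian product across the four quadrants: 6 * 6 * 6 * 6 = 1296 combinations.
    long count = 0;
    for (const auto& p1 : q1)
        for (const auto& p2 : q2)
            for (const auto& p3 : q3)
                for (const auto& p4 : q4) {
                    ++count;
                    std::cout << "Q1: " << p1.first << "," << p1.second
                              << "  Q2: " << p2.first << "," << p2.second
                              << "  Q3: " << p3.first << "," << p3.second
                              << "  Q4: " << p4.first << "," << p4.second << "\n";
                }
    std::cout << "Total combinations: " << count << "\n";
    return 0;
}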
I have a 2-D array containing the pixels of a BMP file. Its size is width (3*65536) x height (3*65536), because I scaled the original up by a factor of 3.
It's like this.
1 2 3 4
5 6 7 8
9 10 11 12
Between 1 and 2 there are 2 holes, because I enlarged the original 2-D array (multiplied its dimensions by 3).
I access it like a 1-D array, like this:
array[y* width + x]
index:  0 1 2 3 4 5 6 7 8 9 ...
value:  1 2 3 4 5 6 7 8 9 10 11 12
(this array is actually a 2-D array, and it has been scaled by a factor of 3)
Now I can patch the holes with this solution:
In double for loop, in the condition (j%3==1)
Image[i*width+j] = Image[i*width+(j-1)]*(1-1/3) + Image[i*width+(j+2)]*(1-2/3)
In another condition ( j%3==2 )
Image[i*width+j] = Image[i*width+(j-2)]*(1-2/3) + Image[i*width+(j+1)]*(1-1/3)
This is the way I think I can patch the holes, using what is called "bilinear interpolation".
I want to be sure about what I know before implementing this logic into my code. Thanks for reading.
Bilinear interpolation requires either 2 linear interpolation passes (horizontal and vertical) per interpolated pixel (well, some of them require only 1), or up to 4 source pixels per interpolated pixel.
Between 1 and 2 there are two holes. Between 1 and 5 there are 2 holes. Between 1 and 6 there are 4 holes. Your code, as written, could only patch holes between 1 and 2, not the other holes correctly.
In addition, your division is integer division (in C and C++, 1/3 and 2/3 both evaluate to 0), so it does not do what you want.
Generally you are far better off writing an r = interpolate_between(a, b, x, y) function that interpolates between a and b at step x out of y. Then test and fix it. Now scale your image horizontally using it, and check visually that you got it right (especially the edges!).
Now try using it to scale vertically only.
Now do both horizontal, then vertical.
Next, write the bilinear version, which you can test by comparing against the linear version applied three times (the results will agree to within rounding error). Then try to bilinearly scale the image, checking visually.
Compare with the two-linear scale. It should differ only by rounding error.
At each of these stages you'll have a single "new" operation that can go wrong, with the previous code already validated.
Writing everything at once will lead to complex bug-ridden code.
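Here is a rough sketch of that interpolate_between idea and a horizontal fill pass, assuming 8-bit grayscale pixels stored row-major and a 3x enlargement where the known samples sit at every third row and column; the names and layout are mine, not from the question.

#include <cstdint>
#include <vector>

// Linear interpolation between a and b at step x out of y (0 <= x <= y).
// Uses floating point internally to avoid the integer-division trap.
static std::uint8_t interpolate_between(std::uint8_t a, std::uint8_t b, int x, int y)
{
    const double t = static_cast<double>(x) / static_cast<double>(y);
    return static_cast<std::uint8_t>(a * (1.0 - t) + b * t + 0.5); // +0.5 rounds to nearest
}

// Fill the horizontal holes of an image whose known samples sit every `step`
// columns (step = 3 for a 3x enlargement). The rows between known rows are
// handled the same way afterwards by a vertical pass.
static void fillHorizontalHoles(std::vector<std::uint8_t>& img, int width, int height, int step)
{
    for (int y = 0; y < height; y += step) {              // rows that contain known samples
        for (int x0 = 0; x0 + step < width; x0 += step) { // pairs of known samples
            const std::uint8_t a = img[y * width + x0];
            const std::uint8_t b = img[y * width + x0 + step];
            for (int dx = 1; dx < step; ++dx)
                img[y * width + x0 + dx] = interpolate_between(a, b, dx, step);
        }
    }
}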
According to the HOG process, as described in the paper Histogram of Oriented Gradients for Human Detection (see link below), the contrast normalization step is done after the binning and the weighted vote.
I don't understand something - If I already computed the cells' weighted gradients, how can the normalization of the image's contrast help me now?
As far as I understand, contrast normalization is done on the original image, whereas for computing the gradients, I already computed the X,Y derivatives of the ORIGINAL image. So, if I normalize the contrast and I want it to take effect, I should compute everything again.
Is there something I don't understand well?
Should I normalize the cells' values?
Is the normalization in HOG not about contrast anyway, but rather about the histogram values (the counts in each bin)?
Link to the paper:
http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf
The contrast normalization is achieved by normalization of each block's local histogram.
The whole HOG extraction process is well explained here: http://www.geocities.ws/talh_davidc/#cst_extract
When you normalize the block histogram, you actually normalize the contrast in this block, if your histogram really contains the sum of magnitudes for each direction.
The term "histogram" is confusing here, because you do not count how many pixels has direction k, but instead you sum the magnitudes of such pixels. Thus you can normalize the contrast after computing the block's vector, or even after you computed the whole vector, assuming that you know in which indices in the vector a block starts and a block ends.
The steps of the algorithm, as I understand them (this worked for me with a 95% success rate), are listed below; a rough code sketch of the gradient and normalization steps follows the list:
Define the following parameters (In this example, the parameters are like HOG for Human Detection paper):
A cell size in pixels (e.g. 6x6)
A block size in cells (e.g. 3x3 ==> Means that in pixels it is 18x18)
Block overlapping rate (e.g. 50% ==> Means that both block width and block height in pixels have to be even. It is satisfied in this example, because the cell width and cell height are even (6 pixels), making the block width and height also even)
Detection window size. The size must be divisible by half of the block size without remainder (so it is possible to exactly place the blocks within it with 50% overlap). For example, the block width is 18 pixels, so the window width must be a multiple of 9 (e.g. 9, 18, 27, 36, ...). Same for the window height. In our example, the window width is 63 pixels, and the window height is 126 pixels.
Calculate gradient:
Compute the X difference using convolution with the vector [-1 0 1]
Compute the Y difference using convolution with the transpose of the above vector
Compute the gradient magnitude in each pixel using sqrt(diffX^2 + diffY^2)
Compute the gradient direction in each pixel using atan(diffY / diffX). Note that atan will return values between -90 and 90, while you will probably want the values between 0 and 180. So just flip all the negative values by adding to them +180 degrees. Note that in HOG for Human Detection, they use unsigned directions (between 0 and 180). If you want to use signed directions, you should make a little more effort: If diffX and diffY are positive, your atan value will be between 0 and 90 - leave it as is. If diffX and diffY are negative, again, you'll get the same range of possible values - here, add +180, so the direction is flipped to the other side. If diffX is positive and diffY is negative, you'll get values between -90 and 0 - leave them the same (You can add +360 if you want it positive). If diffY is positive and diffX is negative, you'll again get the same range, so add +180, to flip the direction to the other side.
"Bin" the directions. For example, 9 unsigned bins: 0-20, 20-40, ..., 160-180. You can easily achieve that by dividing each value by 20 and flooring the result. Your new binned directions will be between 0 and 8.
Do for each block separately, using copies of the original matrix (because some blocks are overlapping and we do not want to destroy their data):
Split to cells
For each cell, create a vector with 9 members (one for each bin). For each bin index, store the sum of the magnitudes of all pixels in the cell with that binned direction. We have a total of 6x6 = 36 pixels in a cell. So, for example, if 2 pixels have binned direction 0, with magnitudes 0.231 and 0.13, you should write the value 0.361 (= 0.231 + 0.13) at index 0 of your vector.
Concatenate all the vectors of all the cells in the block into a large vector. This vector size should of course be NUMBER_OF_BINS * NUMBER_OF_CELLS_IN_BLOCK. In our example, it is 9 * (3 * 3) = 81.
Now, normalize this vector. Use k = sqrt(v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2) (I used eps = 1). After you computed k, divide each value in the vector by k - thus your vector will be normalized.
Create final vector:
Concatenate all the vectors of all the blocks into 1 large vector. In my example, the size of this vector was 6318 (78 block positions x 81 values per block).
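Here is a minimal sketch of the per-pixel gradient/binning step and the block normalization step described above, assuming a grayscale image stored as floats in row-major order; the function names, the atan2-based angle folding, and the eps default are my own choices.

#include <cmath>
#include <vector>

// Per-pixel gradient magnitude and unsigned direction bin (9 bins over 0-180 degrees),
// using the central-difference [-1 0 1] derivative filter.
void gradientAndBins(const std::vector<float>& img, int w, int h,
                     std::vector<float>& magnitude, std::vector<int>& bin)
{
    magnitude.assign(w * h, 0.0f);
    bin.assign(w * h, 0);
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            const float dx = img[y * w + (x + 1)] - img[y * w + (x - 1)];
            const float dy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
            magnitude[y * w + x] = std::sqrt(dx * dx + dy * dy);
            float deg = std::atan2(dy, dx) * 180.0f / 3.14159265f; // -180 .. 180
            if (deg < 0.0f) deg += 180.0f;                          // fold to unsigned 0 .. 180
            int b = static_cast<int>(deg / 20.0f);                  // 9 bins of 20 degrees
            if (b > 8) b = 8;                                       // guard the 180-degree edge
            bin[y * w + x] = b;
        }
    }
}

// L2 normalization of a block vector: v <- v / sqrt(||v||^2 + eps^2).
void normalizeBlock(std::vector<float>& v, float eps = 1.0f)
{
    float sum = eps * eps;
    for (float x : v) sum += x * x;
    const float k = std::sqrt(sum);
    for (float& x : v) x /= k;
}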
I'm trying to compute the perimeter of a region in binary images. When a region is simply connected, i.e. it has no "holes", everything is quite simple: I just check, for every pixel, whether it belongs to the region and has at least one neighbour which does not belong to the region. A counter keeps track of the number of pixels satisfying this condition.
In the case of a region with holes I use a different approach. I start from a pixel on the border and "jump" to a neighbour (increasing a counter) if it is itself a border pixel. The procedure, with some more quirks, ends when I get back to the initial pixel. Something like this:
int iPosCol = iStartCol, iPosRow = iStartRow;
do
{
    // check neighbours, pick the next point on the perimeter
    // condition: value == label, pixel at the border
    check8Neighbors(iPosCol, iPosRow);
    updatePixPosition(iPosCol, iPosRow);
}
while ( iPosCol != iStartCol || iPosRow != iStartRow );
The problem is that this method won't work if the holes in the region are close to the border (1-pixel distance).
Are there standard ways of computing perimeter of non simply connected regions, or am I approaching the problem in the wrong way?
As JCooper noted, connected component labeling, a.k.a. region labeling, a.k.a. contour detection, is an algorithm to find regions of connected pixels, typically in an image that has been binarized so that all pixels are black or white.
The Wikipedia entry for Connected-component labeling includes pseudocode for a "single pass" algorithm (http://en.wikipedia.org/wiki/Connected-component_labeling).
Another single-pass algorithm can be found in the paper "A Component-Labeling Algorithm Using Contour Tracing Technique" by Chang and Chen. This paper also includes a description of an edge-following algorithm you can use to find just the contour, if you'd like.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.3213
That paper describes edge-following well enough, but I'll describe the basic idea here.
Let's say the outer contour of a figure is represented by pixels a through f, and background pixels are represented by "-":
- - - - -
- a b - -
- - - c -
- - - d -
- f e - -
- - - - -
If we're scanning the image from top to bottom, and along each row from left to right, then the first pixel encountered is pixel a. To move from pixel a to pixel b, and then from b to c, we track the direction of each move using 8 directions defined relative to the current pixel, p:
6 7 8
5 p 1
4 3 2
The move from the background "-" to pixel "a" is along direction 1. Although we know where "b" is, the software doesn't, so we check all directions clockwise about "a" to find the next pixel along the contour. We don't need to check direction 5 (left) because we just came from the background pixel to the left of "a." What we do is check directions clockwise 6, 7, 8, 1, 2, etc., looking for the next contour pixel. We find "b" also along direction 1 after finding only background pixels in directions 6, 7, and 8 relative to "a."
If we look at the transition from c to d, we move in direction 3. To find the next contour pixel "e," we check directions 8, 1, 2, 3, 4, and we find contour pixel "e" by moving in direction 4.
The general rule is that if our last move was in direction d, the first direction we check for our next move is direction d - 3. If the last move was in direction 5 (moving left), then we start our next clockwise search at direction 2.
In code we would usually use directions 0 - 7, and clearly you'll make use of the modulo operation or similar math, but I hope the idea is clear. The Chang and Chen paper describes the basic contour-following algorithm reasonably well, and also mentions necessary checks if the algorithm needs to retrace certain pixels.
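As a small illustration of that clockwise search (not the full Chang and Chen algorithm), here is a sketch using directions 0-7, with 0 = right and the numbering increasing clockwise; the d - 3 rule then becomes (d + 5) mod 8, and the names are my own.

#include <vector>

// Eight directions, clockwise, starting from "right" (y grows downward):
// 0=right, 1=down-right, 2=down, 3=down-left, 4=left, 5=up-left, 6=up, 7=up-right.
static const int DX[8] = { 1, 1, 0, -1, -1, -1, 0, 1 };
static const int DY[8] = { 0, 1, 1,  1,  0, -1, -1, -1 };

// Given the current contour pixel (x, y) and the direction of the last move,
// search clockwise for the next foreground pixel. Returns the direction of the
// next move, or -1 if (x, y) has no foreground neighbour. `image` is a binary
// row-major image where nonzero means foreground.
int nextContourMove(const std::vector<int>& image, int width, int height,
                    int x, int y, int lastDir)
{
    // Start three steps "back" from the last move: (lastDir - 3) mod 8,
    // written as +5 to keep the value non-negative.
    const int start = (lastDir + 5) % 8;
    for (int i = 0; i < 8; ++i) {
        const int d = (start + i) % 8;
        const int nx = x + DX[d];
        const int ny = y + DY[d];
        if (nx >= 0 && nx < width && ny >= 0 && ny < height && image[ny * width + nx] != 0)
            return d;   // found the next contour pixel in direction d
    }
    return -1;          // isolated pixel
}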
An edge-following algorithm may be sufficient for your needs, but for a variety of reasons you might want to find connected regions of pixels, too.
For a connected component algorithm, one thing to keep in mind is what you want to consider a "neighbor" of a pixel. You can look at just the "4-neighbors":
- n -
n p n
- n -
where "p" is the center pixel, "n" marks four neighbors, and "-" marks pixels that are not considered neighbors. You can also consider the "8-neighbors," which are simply all pixels surrounding a given pixel:
n n n
n p n
n n n
Typically, 4-neighbors are the better choice when checking connectivity for foreground objects. If you select an 8-neighbor technique, then a checkerboard pattern like the following could be considered a single object:
p - p - p - p - p
- p - p - p - p -
p - p - p - p - p
- p - p - p - p -
p - p - p - p - p
- p - p - p - p -
Let's say you have a blob that looks like the one below, with foreground pixels labeled as "p" and background pixels labeled as "-":
- - - - - - - - - -
- - - - p p p - - -
- - p p p - p p - -
- - - p p - p p - -
- - - - p p p - - -
- p p p p p p - - -
- - - - - - - - - -
If you consider just the pixels of the outer contour, you'll see that it can be a little tricky calculating the perimeter. For the pixels 1, 2, 3, 4, and 5 below, you can calculate the perimeter using the pixels 1 - 5, moving in stepwise fashion from pixel 1 to 2, then 2 to 3, etc. Typically it's better to calculate the perimeter for this segment using only pixels 1, 3, and 5 along the diagonal. For the single row of pixels at bottom, you must be careful that the algorithm does not count those pixels twice.
- - - - - - - - - -
- - - - p p p - - -
- - 1 2 p - p p - -
- - - 3 4 - p p - -
- - - - 5 p p - - -
- p p p p p p - - -
- - - - - - - - - -
For relatively large connected regions without "peninsulas" jutting out that are a single pixel wide, calculating the perimeter is relatively straightforward. For very small objects it's hard to calculate the "true" perimeter in part because we have a limited number of discrete, square pixels representing a real-world object with a contour that is likely smooth and slightly curvy. The image representation of the object is chunky.
If you have an ordered list of the pixels found by an edge-tracing algorithm, then you can calculate the perimeter from the change in X and the change in Y between successive pixels in the list: the perimeter is the sum of the pixel-to-pixel distances along the contour.
For pixel N and pixel N + 1: if either X is the same or Y is the same then the direction from N to N + 1 is left, right, up, or down, and the distance is 1.
If both X and Y are different for pixels N and N + 1, then the direction moving from one pixel to the next is at a 45-degree angle to the horizontal, and the distance between pixel centers is the square root of 2.
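As a rough sketch, assuming the contour pixels are already in order (e.g. from an edge-following pass) and the contour is closed:

#include <cmath>
#include <cstdlib>
#include <vector>

struct Point { int x, y; };

// Sum of pixel-to-pixel distances along an ordered, closed contour:
// 1 for horizontal/vertical steps, sqrt(2) for diagonal steps.
double contourPerimeter(const std::vector<Point>& contour)
{
    double perimeter = 0.0;
    const size_t n = contour.size();
    for (size_t i = 0; i < n; ++i) {
        const Point& a = contour[i];
        const Point& b = contour[(i + 1) % n];   // wrap around to close the contour
        const int dx = std::abs(b.x - a.x);
        const int dy = std::abs(b.y - a.y);
        perimeter += (dx != 0 && dy != 0) ? std::sqrt(2.0) : 1.0;
    }
    return perimeter;
}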
Whatever algorithm you create, consider checking its accuracy against simple figures: a square, a rectangle, a circle, etc. A circle is particularly helpful for checking the perimeter calculation because the contour of a circle (especially a small circle) in an image will have jagged rather than smooth edges.
- - - - - - - - - -
- - - p p p p - - -
- - p p p p p p - -
- - p p p p p p - -
- - p p p p p p - -
- - - p p p p - - -
- - - - - - - - - -
There are techniques to find shapes and calculate perimeters in grayscale and color images that don't rely on binarization to make the image just black and white, but those techniques are trickier. For many applications the simple, standard technique will work:
Choose a threshold to binarize the image.
Run a connected component / region labeling algorithm on the binarized image
Use the contour pixels of a connected region to calculate perimeter
An image processing textbook used in many universities has the answer to many of your questions about image processing. If you're going to delve into image processing, you should have at least one textbook like this one handy; it'll save you hours of hunting online for answers.
Digital Image Processing (3rd edition) by Gonzalez and Woods
Book website:
http://www.imageprocessingplace.com/
You should be able to find an international edition for about $35 online.
If you end up writing a lot of code to perform geometric calculations, another handy reference book is Geometric Tools for Computer Graphics by Schneider and Eberly.
http://www.amazon.com/Geometric-Computer-Graphics-Morgan-Kaufmann/dp/1558605940
It's pricey, but you can find used copies cheap sometimes at multi-site search engines like
http://www.addall.com
Corrections, PDFs of theory, and code from the book can be found here:
http://www.geometrictools.com/
So here is my proposal:
Let's assume you want to find the border of a black region (for simplicity).
First, add an extra white row and an extra white column on each side of the image. This is done to simplify corner cases, and I will try to explain where it helps.
Next, do a breadth-first search starting from any pixel in your region. The edges in the graph connect neighbouring black cells. By doing this BFS you will find all the pixels in your region. Now select the bottom-most pixel (you can find it linearly), and if there are several bottom-most pixels, just select any of them. Select the pixel below it: this pixel is white for sure, because we selected the bottom-most of the pixels in our region, and if that pixel were black the BFS would have visited it. Also, there is a pixel below our bottom-most pixel because of the extra rows and columns we added.
Now do another BFS, this time passing through white neighbouring pixels (again, the fact that we added additional rows and columns helps here). This way we find a white region that surrounds the black region we are interested in from all sides. Now, all the pixels from the original black region that neighbour any of the pixels in the newly found white region are part of the border, and only they are part of it. So you count those pixels and there you go - you have the perimeter.
The solution is complicated by the fact that we do not want to count the borders of the holes as part of the perimeter - had this condition not been present, we could just count all the pixels in the initial black region that neighbour any white pixel or the border of the image (in which case we would not need to add the extra rows and columns).
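A compressed sketch of the whole idea (padding, BFS over the black region, BFS over the surrounding white region, then counting border pixels) might look like the following; the 0/1 pixel encoding, the 4-connectivity, and all names are my own choices, and switching the white BFS to 8-connectivity may be preferable when holes touch the border diagonally.

#include <queue>
#include <utility>
#include <vector>

// img: binary image, 1 = black (region), 0 = white, row-major, size w x h.
// Returns the number of outer-border pixels of the black region containing (sx, sy),
// assuming (sx, sy) is a black pixel.
int outerPerimeter(const std::vector<int>& img, int w, int h, int sx, int sy)
{
    // 1. Pad the image with a one-pixel white frame.
    const int W = w + 2, H = h + 2;
    std::vector<int> pad(W * H, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            pad[(y + 1) * W + (x + 1)] = img[y * w + x];

    const int dx4[4] = { 1, -1, 0, 0 }, dy4[4] = { 0, 0, 1, -1 };
    auto bfs = [&](int startX, int startY, int color, std::vector<char>& visited) {
        std::queue<std::pair<int, int>> q;
        visited.assign(W * H, 0);
        visited[startY * W + startX] = 1;
        q.push({startX, startY});
        while (!q.empty()) {
            auto [x, y] = q.front(); q.pop();
            for (int d = 0; d < 4; ++d) {
                const int nx = x + dx4[d], ny = y + dy4[d];
                if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
                if (visited[ny * W + nx] || pad[ny * W + nx] != color) continue;
                visited[ny * W + nx] = 1;
                q.push({nx, ny});
            }
        }
    };

    // 2. BFS over the black region; then pick its bottom-most pixel.
    std::vector<char> black, white;
    bfs(sx + 1, sy + 1, 1, black);
    int bx = -1, by = -1;
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            if (black[y * W + x]) { bx = x; by = y; }   // last hit lies in the bottom-most row

    // 3. BFS over the white pixels, starting just below the bottom-most black pixel.
    bfs(bx, by + 1, 0, white);

    // 4. Count black-region pixels that touch the surrounding white region.
    int count = 0;
    for (int y = 1; y < H - 1; ++y)
        for (int x = 1; x < W - 1; ++x) {
            if (!black[y * W + x]) continue;
            for (int d = 0; d < 4; ++d)
                if (white[(y + dy4[d]) * W + (x + dx4[d])]) { ++count; break; }
        }
    return count;
}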
Hope this answer helps.
Perhaps the simplest thing to do would be to run a connected component algorithm and then fill in the holes.
I am using the letter_recog example from OpenCV. It uses a dataset from UCI which has a structure like this:
Attribute Information:
1. lettr capital letter (26 values from A to Z)
2. x-box horizontal position of box (integer)
3. y-box vertical position of box (integer)
4. width width of box (integer)
5. high height of box (integer)
6. onpix total # on pixels (integer)
7. x-bar mean x of on pixels in box (integer)
8. y-bar mean y of on pixels in box (integer)
9. x2bar mean x variance (integer)
10. y2bar mean y variance (integer)
11. xybar mean x y correlation (integer)
12. x2ybr mean of x * x * y (integer)
13. xy2br mean of x * y * y (integer)
14. x-ege mean edge count left to right (integer)
15. xegvy correlation of x-ege with y (integer)
16. y-ege mean edge count bottom to top (integer)
17. yegvx correlation of y-ege with x (integer)
example:
T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8
I,5,12,3,7,2,10,5,5,4,13,3,9,2,8,4,10
Now I have a segmented image of a letter and want to transform it into data like this in order to recognize it, but I don't understand the meaning of all the values, such as "6. onpix total # on pixels". What does it mean? Can you please explain the meaning of these values? Thanks.
I am not familiar with OpenCV's letter_recog example, but this appears to be a feature vector, or set of statistics about the image of a letter that is used to classify the future occurrences of the letter. The results of your segmentation should leave you with a binary mask with 1's on the letter and 0's everywhere else. onpix is simply the total count of pixels that fall on the letter, or in other words, the sum of your binary mask.
Most of the remaining values in the list need to be calculated from the set of pixels with a value of 1 in your binary mask. x and y are just the positions of a pixel. For instance, x-bar is just the sample mean of the x positions of all pixels that have a 1 in the mask. You should be able to easily find references on the web for the mathematical definitions of mean, variance, covariance and correlation.
14-17 are a little different since they are based on edge pixels, but the calculations should be similar, just over a different set of pixels.
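As a rough illustration of how a few of these statistics fall out of the binary mask (onpix, x-bar, y-bar, and x2bar here), a sketch follows; the struct and function names are mine, and the UCI paper defines the exact scaling of each feature, so these are just the plain sample statistics.

#include <vector>

// mask: binary letter mask, 1 = letter pixel, 0 = background, row-major, size w x h.
struct LetterStats { int onpix; double xbar, ybar, x2bar; };

LetterStats computeStats(const std::vector<int>& mask, int w, int h)
{
    LetterStats s{0, 0.0, 0.0, 0.0};
    // onpix: total number of "on" pixels; xbar/ybar: mean x and y of the on pixels.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (mask[y * w + x]) {
                ++s.onpix;
                s.xbar += x;
                s.ybar += y;
            }
    if (s.onpix == 0) return s;
    s.xbar /= s.onpix;
    s.ybar /= s.onpix;
    // x2bar interpreted here as the sample variance of x over the on pixels.
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (mask[y * w + x])
                s.x2bar += (x - s.xbar) * (x - s.xbar);
    s.x2bar /= s.onpix;
    return s;
}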
My name is Antonio Bernal.
On page 3 of this article you will find a good description of each value.
Letter Recognition Using Holland-Style Adaptive Classifiers.
If you have any doubt let me know.
I am trying to make this algorithm work, but my problem is that I do not know how to scale the values to fit them to the range 0-15.
Do you have any idea how to do this?
Another link, from Google Scholar: Letter Recognition Using Holland-Style Adaptive Classifiers