I'm trying to compute the perimeter of a region in a binary image. When the region is simply connected, i.e. it has no "holes", everything is quite simple: for every pixel I check whether it belongs to the region and has at least one neighbor that does not belong to the region, and a counter tracks the number of pixels satisfying this condition.
In the case of a region with holes I use a different approach. I start from a pixel on the border and "jump" to a neighbor (incrementing a counter) if it is itself a border pixel. The procedure, with a few more quirks, ends when I get back to the initial pixel. Something like this:
int iPosCol = iStartCol, iPosRow = iStartRow;
do
{
    // Check the 8 neighbors and pick the next point on the perimeter.
    // Condition: value == label, pixel on the border.
    check8Neighbors(iPosCol, iPosRow);
    updatePixPosition(iPosCol, iPosRow);
}
while (iPosCol != iStartCol || iPosRow != iStartRow);
The problem is that this method won't work if a hole in the region comes close to the border (within 1-pixel distance).
Are there standard ways of computing the perimeter of non-simply-connected regions, or am I approaching the problem in the wrong way?
As JCooper noted, connected component a.k.a. region labeling a.k.a. contour detection is an algorithm to find regions of connected pixels, typically in an image that has been binarized so that all pixels are black or white.
The Wikipedia entry for Connected-component labeling includes pseudocode for a "single pass" algorithm (http://en.wikipedia.org/wiki/Connected-component_labeling).
Another single-pass algorithm can be found in the paper "A Component-Labeling Algorithm Using Contour Tracing Technique" by Chang and Chen. This paper also includes a description of an edge-following algorithm you can use to find just the contour, if you'd like.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.58.3213
That paper describes edge-following well enough, but I'll describe the basic idea here.
Let's say the outer contour of a figure is represented by pixels a through f, and background pixels are represented by "-":
- - - - -
- a b - -
- - - c -
- - - d -
- f e - -
- - - - -
If we're scanning the image from top to bottom, and along each row from left to right, then the first pixel encountered is pixel a. To move from pixel a to pixel b, and then from b to c, we track the direction of each move using 8 directions defined relative to the current pixel, p:
6 7 8
5 p 1
4 3 2
The move from the background "-" to pixel "a" is along direction 1. Although we know where "b" is, the software doesn't, so we check all directions clockwise about "a" to find the next pixel along the contour. We don't need to check direction 5 (left) because we just came from the background pixel to the left of "a." What we do is check directions clockwise 6, 7, 8, 1, 2, etc., looking for the next contour pixel. We find "b" also along direction 1 after finding only background pixels in directions 6, 7, and 8 relative to "a."
If we look at the transition from c to d, we move in direction 3. To find the next contour pixel "e," we check directions 8, 1, 2, 3, 4, and we find contour pixel "e" by moving in direction 4.
The general rule is that if our last move was in direction d, the first direction we check for our next move is direction d - 3. If the last move was in direction 5 (moving left), then we start our next clockwise search at direction 2.
In code we would usually use directions 0 - 7, and you'll clearly need the modulo operation or similar math, but I hope the idea is clear. The Chang and Chen paper describes the basic contour-following algorithm reasonably well, and also mentions the checks that become necessary if the algorithm needs to retrace certain pixels.
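Below is a minimal sketch of that clockwise search in C++. It is illustrative rather than Chang and Chen's exact algorithm: directions 0 - 7 run clockwise starting at "right," the image is assumed to be a vector of labeled rows, and a one-pixel background border is assumed so the offsets never leave the image.

#include <vector>

// Column/row steps for directions 0..7, clockwise, starting at "right."
const int dCol[8] = { 1, 1, 0, -1, -1, -1, 0, 1 };
const int dRow[8] = { 0, 1, 1, 1, 0, -1, -1, -1 };

// Advance (row, col) to the next contour pixel. lastDir is the direction of
// the previous move; the search starts 3 steps counter-clockwise from it
// (the "d - 3" rule). Returns false for an isolated pixel.
bool nextContourPixel(const std::vector<std::vector<int>>& img, int label,
                      int& row, int& col, int& lastDir)
{
    int start = (lastDir + 8 - 3) % 8;
    for (int i = 0; i < 8; ++i) {
        int d = (start + i) % 8;
        int r = row + dRow[d], c = col + dCol[d];
        if (img[r][c] == label) {   // found the next pixel along the contour
            row = r; col = c; lastDir = d;
            return true;
        }
    }
    return false;
}

Since the top-to-bottom scan reaches the first contour pixel moving rightward, lastDir can be initialized to 0 (right).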
An edge-following algorithm may be sufficient for your needs, but for a variety of reasons you might want to find connected regions of pixels, too.
For a connected component algorithm, one thing to keep in mind is what you want to consider a "neighbor" of a pixel. You can look at just the "4-neighbors":
- n -
n p n
- n -
where "p" is the center pixel, "n" marks four neighbors, and "-" marks pixels that are not considered neighbors. You can also consider the "8-neighbors," which are simply all pixels surrounding a given pixel:
n n n
n p n
n n n
Typically, 4-neighbors are the better choice when checking connectivity for foreground objects. If you select an 8-neighbor technique, then a checkerboard pattern like the following could be considered a single object:
p - p - p - p - p
- p - p - p - p -
p - p - p - p - p
- p - p - p - p -
p - p - p - p - p
- p - p - p - p -
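In code, the two neighbourhood definitions usually come down to a pair of offset tables; here is an illustrative C++ sketch:

// Row/column offsets for the 4-neighbourhood and the 8-neighbourhood.
const int n4[4][2] = { {-1, 0}, {0, -1}, {0, 1}, {1, 0} };
const int n8[8][2] = { {-1, -1}, {-1, 0}, {-1, 1}, {0, -1},
                       {0, 1},  {1, -1}, {1, 0},  {1, 1} };

Looping over one table or the other is often the only difference between a 4-connected and an 8-connected component search.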
Let's say you have a blob that looks like the one below, with foreground pixels labeled as "p" and background pixels labeled as "-":
- - - - - - - - - -
- - - - p p p - - -
- - p p p - p p - -
- - - p p - p p - -
- - - - p p p - - -
- p p p p p p - - -
- - - - - - - - - -
If you consider just the pixels of the outer contour, you'll see that it can be a little tricky calculating the perimeter. For the pixels 1, 2, 3, 4, and 5 below, you can calculate the perimeter using the pixels 1 - 5, moving in stepwise fashion from pixel 1 to 2, then 2 to 3, etc. Typically it's better to calculate the perimeter for this segment using only pixels 1, 3, and 5 along the diagonal. For the single row of pixels at bottom, you must be careful that the algorithm does not count those pixels twice.
- - - - - - - - - -
- - - - p p p - - -
- - 1 2 p - p p - -
- - - 3 4 - p p - -
- - - - 5 p p - - -
- p p p p p p - - -
- - - - - - - - - -
For relatively large connected regions without single-pixel-wide "peninsulas" jutting out, calculating the perimeter is relatively straightforward. For very small objects it's hard to calculate the "true" perimeter, in part because we have a limited number of discrete, square pixels representing a real-world object whose contour is likely smooth and slightly curvy. The image representation of the object is chunky.
If you have an ordered list of the pixels found by an edge-tracing algorithm, then you can calculate the perimeter by checking the change in X and the change in Y between successive pixels in the list, and summing the pixel-to-pixel distances along the contour.
For pixel N and pixel N + 1: if either X is the same or Y is the same then the direction from N to N + 1 is left, right, up, or down, and the distance is 1.
If both X and Y are different for pixels N and N + 1, then the direction moving from one pixel to the next is at a 45-degree angle to the horizontal, and the distance between pixel centers is the square root of 2.
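Here is a minimal C++ sketch of that sum, assuming the contour is an ordered, closed list of pixel coordinates (the last pixel adjacent to the first):

#include <cmath>
#include <vector>

struct Pixel { int x, y; };

double perimeter(const std::vector<Pixel>& contour)
{
    double sum = 0.0;
    const double diag = std::sqrt(2.0);
    for (std::size_t i = 0; i < contour.size(); ++i) {
        const Pixel& a = contour[i];
        const Pixel& b = contour[(i + 1) % contour.size()]; // wrap to close the loop
        // Axis-aligned step: distance 1. Diagonal step: sqrt(2).
        sum += (a.x == b.x || a.y == b.y) ? 1.0 : diag;
    }
    return sum;
}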
Whatever algorithm you create, consider checking its accuracy against simple figures: a square, a rectangle, a circle, etc. A circle is particularly helpful for checking the perimeter calculation, because the contour of a circle (especially a small circle) in an image will have jagged rather than smooth edges.
- - - - - - - - - -
- - - p p p p - - -
- - p p p p p p - -
- - p p p p p p - -
- - p p p p p p - -
- - - p p p p - - -
- - - - - - - - - -
There are techniques to find shapes and calculate perimeters in grayscale and color images that don't rely on binarization to make the image just black and white, but those techniques are trickier. For many applications the simple, standard technique will work:
Choose a threshold to binarize the image.
Run a connected component / region labeling algorithm on the binarized image.
Use the contour pixels of a connected region to calculate the perimeter.
An image processing textbook used in many universities will have answers to many of your questions about image processing. If you're going to delve into image processing, you should have at least one textbook like this one handy; it'll save you hours of hunting online for answers.
Digital Image Processing (3rd edition) by Gonzalez and Woods
Book website:
http://www.imageprocessingplace.com/
You should be able to find an international edition for about $35 online.
If you end up writing a lot of code to perform geometric calculations, another handy reference book is Geometric Tools for Computer Graphics by Schneider and Eberly.
http://www.amazon.com/Geometric-Computer-Graphics-Morgan-Kaufmann/dp/1558605940
It's pricey, but you can sometimes find used copies cheap at multi-site search engines like
http://www.addall.com
Corrections, PDFs of theory, and code from the book can be found here:
http://www.geometrictools.com/
So here is my proposition:
Let's assume you want to find the border of a black region (for simplicity).
First, add one extra white column and one extra white row on all sides of the image. This is done to simplify corner cases, and I will try to explain where it helps.
Next, do a breadth-first search from any pixel in your region. The edges in the graph connect neighbouring black cells. This BFS will find all the pixels in your region. Now select the bottom-most pixel (you can find it linearly); if there are several bottom-most pixels, just select any of them. Consider the pixel directly below it: this pixel is white for sure, because we selected the bottom-most pixel of our region, and if the pixel below were black, the BFS would have visited it. And there is a pixel below our bottom-most pixel at all because of the extra rows and columns we added.
Now do another BFS, this time passing through white neighbouring pixels (again, the extra rows and columns help here). This finds the white region that surrounds the black region we are interested in on all sides. Now exactly those pixels of the original black region that neighbour any pixel of the newly found white region are part of the border, and only they are part of it. Count those pixels and there you go - you have the perimeter.
The solution is complicated by the fact that we do not want to count the borders of the holes as part of the perimeter - had this condition not been present, we could just count all the pixels in the initial black region that neighbour any white pixel or the border of the image (in which case we would not need to add rows and columns).
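Here is a rough C++ sketch of the second search and the final count. It assumes img already carries the extra white border (0 = white, 1 = a pixel of the black region found by the first BFS) and that (startRow, startCol) is the white pixel just below the bottom-most region pixel; the 8-connected flood is my assumption - pick whatever complements your region's connectivity.

#include <queue>
#include <utility>
#include <vector>

int borderPixelCount(const std::vector<std::vector<int>>& img,
                     int startRow, int startCol)
{
    static const int d[8][2] = { {-1, -1}, {-1, 0}, {-1, 1}, {0, -1},
                                 {0, 1},  {1, -1}, {1, 0},  {1, 1} };
    int h = img.size(), w = img[0].size();
    std::vector<std::vector<bool>> white(h, std::vector<bool>(w, false));
    std::queue<std::pair<int, int>> q;
    q.push({startRow, startCol});
    white[startRow][startCol] = true;
    while (!q.empty()) {                 // flood the surrounding white area
        auto [r, c] = q.front(); q.pop();
        for (auto& o : d) {
            int nr = r + o[0], nc = c + o[1];
            if (nr >= 0 && nr < h && nc >= 0 && nc < w &&
                img[nr][nc] == 0 && !white[nr][nc]) {
                white[nr][nc] = true;
                q.push({nr, nc});
            }
        }
    }
    int count = 0;                       // region pixels touching that area
    for (int r = 1; r + 1 < h; ++r)      // the added border is never black
        for (int c = 1; c + 1 < w; ++c)
            if (img[r][c] == 1)
                for (auto& o : d)
                    if (white[r + o[0]][c + o[1]]) { ++count; break; }
    return count;
}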
Hope this answer helps.
Perhaps the simplest thing to do would be to run a connected component algorithm and then fill in the holes.
I am trying to render an outline using Vulkan's stencil buffers. This technique involves rendering the object twice, with the second one scaled up to account for said outline. Normally this is done in 3D space, where the normal vectors of each vertex can be used to scale the object correctly. I, however, am trying the same in 2D space, without pre-calculated normals.
An example: given the coordinates I, H and J, I need to find L, K and M, with the condition that the distance between each pair of parallel edges is the same.
I tried scaling up the object and then moving it to the correct location, but that got me nowhere.
I am looking for a solution that is ideally applicable to arbitrary shapes in 2D space and also reasonably efficient. I am also unsure whether this should be calculated on the GPU or the CPU.
Let's draw an example of a single corner point of some 2D polygon.
The position of point M depends only on the position of A and its two adjacent lines; I have added the normals too (green and blue). Points P and O lie on the intersections of the shifted and non-shifted lines.
If we know the adjacent points B and C of A, and the distances to O and P, then
M = A - d_p * normalize(B-A) - d_o * normalize(C-A)
this is true because P, O lie on the lines B-A and C-A.
The distances are easy to compute from the two right triangles:
d_p = s / sin(alfa)
d_o = s / sin(alfa)
where s is the desired stencil shift. They are of course the same.
So the whole computation, given the coordinates A, B, C of some polygon corner and the desired shift s, is:
b = normalize(B - A)             # unit vector along one edge
c = normalize(C - A)             # unit vector along the other edge
alfa = arccos(b . c)             # angle between them (dot product)
d = s / sin(alfa)
M = A - sign(b . c) * (b + c) * d
This also proves that M lies on the bisector of the angle alfa.
Anyway, the formula is generic and holds for any 2D polygon, and it is easily parallelizable since each point is shifted independently of the others. But for non-convex corners you need to use the opposite sign; the dot product lets us generalize, hence the sign(b . c) factor above.
It is not numerically stable when sin(alfa) is close to zero, i.e. when the b and c lines are almost parallel. In that case I would recommend just shifting A by s * n_b, where n_b is the normalized normal of the B-A line; in 2D that is normalize((B.y - A.y, A.x - B.x)).
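Here is a C++ sketch of the computation above, with a small illustrative vector type (assumed, not from any particular library):

#include <cmath>

struct Vec2 { double x, y; };

Vec2 operator-(Vec2 a, Vec2 b) { return { a.x - b.x, a.y - b.y }; }
Vec2 operator+(Vec2 a, Vec2 b) { return { a.x + b.x, a.y + b.y }; }
Vec2 operator*(Vec2 a, double k) { return { a.x * k, a.y * k }; }
double dot(Vec2 a, Vec2 b) { return a.x * b.x + a.y * b.y; }
Vec2 normalize(Vec2 a) { double n = std::sqrt(dot(a, a)); return { a.x / n, a.y / n }; }

// Shift corner A of a polygon by stencil shift s; B and C are the two
// vertices adjacent to A.
Vec2 offsetCorner(Vec2 A, Vec2 B, Vec2 C, double s)
{
    Vec2 b = normalize(B - A);
    Vec2 c = normalize(C - A);
    double alfa = std::acos(dot(b, c));            // angle between the edges
    double d = s / std::sin(alfa);                 // distance along each edge
    double sign = dot(b, c) < 0.0 ? -1.0 : 1.0;    // flips at non-convex corners
    return A - (b + c) * (sign * d);
}

Since each corner only needs its two neighbours, the same per-corner computation also maps naturally onto a vertex shader if you decide to do it on the GPU.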
According to the HOG process, as described in the paper Histogram of Oriented Gradients for Human Detection (see link below), the contrast normalization step is done after the binning and the weighted vote.
I don't understand something: if I have already computed the cells' weighted gradients, how can normalizing the image's contrast help me at that point?
As far as I understand, contrast normalization is done on the original image, whereas to compute the gradients I have already computed the X and Y derivatives of the ORIGINAL image. So if I normalize the contrast and want it to take effect, I would have to compute everything again.
Is there something I don't understand well?
Should I normalize the cells' values?
Or is the normalization in HOG not about contrast at all, but about the histogram values (the counts of cells in each bin)?
Link to the paper:
http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf
The contrast normalization is achieved by normalization of each block's local histogram.
The whole HOG extraction process is well explained here: http://www.geocities.ws/talh_davidc/#cst_extract
When you normalize the block histogram, you actually normalize the contrast in this block, provided that your histogram really contains the sum of magnitudes for each direction.
The term "histogram" is confusing here, because you do not count how many pixels have direction k; instead, you sum the magnitudes of such pixels. Thus you can normalize the contrast after computing the block's vector, or even after you have computed the whole vector, assuming that you know at which indices in the vector each block starts and ends.
The steps of the algorithm, to my understanding (this worked for me with a 95% success rate):
Define the following parameters (in this example, the parameters are as in the HOG for Human Detection paper):
A cell size in pixels (e.g. 6x6)
A block size in cells (e.g. 3x3 ==> Means that in pixels it is 18x18)
Block overlap rate (e.g. 50% ==> means that both the block width and the block height in pixels have to be even; this is satisfied in our example because the cell width and cell height are even (6 pixels), making the block width and height even as well)
Detection window size. The size must be divisible by half the block size without remainder (so the blocks can be placed exactly within the window with 50% overlap). For example, the block width is 18 pixels, so the window width must be a multiple of 9 (e.g. 9, 18, 27, 36, ...). The same goes for the window height. In our example, the window width is 63 pixels and the window height is 126 pixels.
Calculate gradient:
Compute the X difference using convolution with the vector [-1 0 1]
Compute the Y difference using convolution with the transpose of the above vector
Compute the gradient magnitude in each pixel using sqrt(diffX^2 + diffY^2)
Compute the gradient direction in each pixel using atan(diffY / diffX). Note that atan returns values between -90 and 90, while you will probably want values between 0 and 180, so just flip all the negative values by adding 180 degrees to them. Note that HOG for Human Detection uses unsigned directions (between 0 and 180). If you want to use signed directions, you need a little more effort: if diffX and diffY are both positive, the atan value is between 0 and 90 - leave it as is. If diffX and diffY are both negative, you get the same range of possible values - here, add 180, so the direction is flipped to the other side. If diffX is positive and diffY is negative, you get values between -90 and 0 - leave them as they are (you can add 360 if you want them positive). If diffY is positive and diffX is negative, you again get the same range, so add 180 to flip the direction to the other side.
"Bin" the directions. For example, 9 unsigned bins: 0-20, 20-40, ..., 160-180. You can easily achieve that by dividing each value by 20 and flooring the result. Your new binned directions will be between 0 and 8.
Do for each block separately, using copies of the original matrix (because some blocks are overlapping and we do not want to destroy their data):
Split to cells
For each cell, create a vector with 9 members (one for each bin). For each bin index, store the sum of the magnitudes of all the pixels with that direction. We have 6x6 = 36 pixels in a cell in total. So, for example, if 2 pixels have direction 0, with the magnitude of the first being 0.231 and the magnitude of the second being 0.13, you should write the value 0.361 (= 0.231 + 0.13) at index 0 of your vector.
Concatenate all the vectors of all the cells in the block into a large vector. This vector size should of course be NUMBER_OF_BINS * NUMBER_OF_CELLS_IN_BLOCK. In our example, it is 9 * (3 * 3) = 81.
Now, normalize this vector. Use k = sqrt(v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2) (I used eps = 1). After you have computed k, divide each value in the vector by k, and your vector is normalized (a small code sketch of this step follows the list below).
Create final vector:
Concatenate all the vectors of all the blocks into one large vector. In my example, the size of this vector was 6318.
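For the normalization in step 4, here is a minimal C++ sketch, assuming each block vector is a std::vector<double>:

#include <cmath>
#include <vector>

// Normalize one block vector in place: k = sqrt(v[0]^2 + ... + v[n]^2 + eps^2).
void normalizeBlock(std::vector<double>& v, double eps = 1.0)
{
    double sum = eps * eps;
    for (double x : v) sum += x * x;
    double k = std::sqrt(sum);
    for (double& x : v) x /= k;
}

The eps term keeps the division well behaved for blocks that contain almost no gradient energy.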
I am studying image processing these days and I am a beginner to the subject. I got stuck on the topic of convolution and how to implement it for images. To be brief, there is a general formula for the convolution of images, like so:
x(n1, n2) = sum over k1 and k2, each from -infinity to infinity, of h(k1, k2) * f(n1 - k1, n2 - k2)
x(n1,n2) represents a pixel in the output image, but I do not know what k1 and k2 stand for. Actually, this is what I would like to learn. In order to implement this in some programming language, I need to know what k1 and k2 stand for. Can someone explain this to me or point me to an article? I would really appreciate any help.
Convolution in this case deals with extracting out patches of image pixels that surround a target image pixel. When you perform image convolution, you perform this with what is known as a mask or point spread function or kernel and this is usually much smaller than the size of the image itself.
For each target pixel in the output image, you grab a neighbourhood of pixel values from the input, including the pixel at the same target coordinates in the input. The size of this neighbourhood coincides exactly with the size of the mask. Then you rotate the mask by 180 degrees and do an element-by-element multiplication of each value in the mask with the pixel value that coincides with it at each location in the neighbourhood. You add all of these up, and that is the output for the target pixel in the target image.
For example, let's say I had this small image:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
And let's say I wanted to perform an averaging within a 3 x 3 window, so my mask would be:

        [1 1 1]
(1/9) * [1 1 1]
        [1 1 1]
To perform 2D image convolution, rotating the mask by 180 degrees still gives us the same mask, and so let's say I wanted to find the output at row 2, column 2. The 3 x 3 neighbourhood I would extract is:
1 2 3
6 7 8
11 12 13
To find the output, I would multiply each value in the mask by the same location of the neighbourhood:
[ 1  2  3 ]           [1 1 1]
[ 6  7  8 ]  x  (1/9)*[1 1 1]
[11 12 13]            [1 1 1]
Performing a point-by-point multiplication and adding the values gives us:
1(1/9) + 2(1/9) + 3(1/9) + 6(1/9) + 7(1/9) + 8(1/9) + 11(1/9) + 12(1/9) + 13(1/9) = 63/9 = 7
The output at location (2,2) in the output image would be 7.
Bear in mind that I didn't tackle the case where the mask goes out of bounds. Specifically, if I tried to find the output at row 1, column 1, for example, there would be five locations where the mask goes out of bounds. There are many ways to handle this. Some people consider those outside pixels to be zero. Other people like to replicate the image border, so that the border pixels are copied outside of the image dimensions. Some people like to pad the image using more sophisticated techniques, such as symmetric padding, where the border pixels are a mirror reflection of what's inside the image, or circular padding, where the border pixels are copied from the other side of the image.
That's beyond the scope of this post; in your case, start with the simplest option: when you're collecting neighbourhoods, treat any pixels that fall outside the bounds of the image as zero.
Now, what do k1 and k2 mean? They denote the offset with respect to the centre of the neighbourhood and the mask. Notice that the n1 - k1 and n2 - k2 terms are important in the sum. The output position is denoted by n1 and n2, so n1 - k1 and n2 - k2 are the offsets with respect to this centre in the horizontal sense (n1 - k1) and the vertical sense (n2 - k2). If we had a 3 x 3 mask, the centre would be k1 = k2 = 0, the top-left corner would be k1 = k2 = -1, and the bottom-right corner would be k1 = k2 = 1. The reason the sums go to infinity is to make sure we cover all the elements in the mask; masks are finite in size, so the sum effectively reduces to the point-by-point summation I was talking about earlier.
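To make that concrete, here is a minimal C++ sketch of the sum for one output pixel, with zero padding for out-of-bounds reads; the vector-of-vector matrix type, odd mask dimensions, and treating n1 as the first matrix index are my assumptions:

#include <vector>

typedef std::vector<std::vector<double>> Matrix;

// Convolution at output position (n1, n2):
// sum of mask(k1, k2) * img(n1 - k1, n2 - k2) over the mask extent.
double convolveAt(const Matrix& img, const Matrix& mask, int n1, int n2)
{
    int ch = mask.size() / 2, cw = mask[0].size() / 2;   // mask centre
    double sum = 0.0;
    for (int k1 = -ch; k1 <= ch; ++k1)                   // finite k1, k2 range
        for (int k2 = -cw; k2 <= cw; ++k2) {
            int i = n1 - k1, j = n2 - k2;
            if (i < 0 || i >= (int)img.size() ||
                j < 0 || j >= (int)img[0].size())
                continue;                                // zero padding
            sum += mask[k1 + ch][k2 + cw] * img[i][j];
        }
    return sum;
}

Note that indexing the image at n - k rather than n + k is exactly what the 180-degree rotation of the mask amounts to.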
Here's a better illustration where the mask is a vertical Sobel filter which finds vertical gradients in an image:
Source: http://blog.saush.com/2011/04/20/edge-detection-with-the-sobel-operator-in-ruby/
As you can see, for each output pixel in the target image, we look at a neighbourhood of pixels at the same spatial location in the input image (3 x 3 in this case), perform a weighted element-by-element sum between the mask and the neighbourhood, and set the output pixel to the total of these weighted elements. Bear in mind that this example does not rotate the mask by 180 degrees, but that's what you do when it comes to convolution.
Hope this helps!
k1 and k2 are variables that should cover the whole definition area of your kernel.
Check out Wikipedia for a further description:
http://en.wikipedia.org/wiki/Kernel_%28image_processing%29
I've spent many frustrating hours and cannot figure this out. I understand collision, and I have it working until I try to implement gravity: I can't seem to set the player position after it hits the tile map, so falling through the ground is my problem. On the x axis a variation of the following code works fine:
if (background.colMap[tiles[i].y][tiles[i].x] == 1)
{
    playerSpeed.y = 0.f;
    playerSprite.setPosition(playerSprite.getPosition().x, playerSprite.getPosition().y - 1);
    inAir = false;
}
I thought that reducing the speed to 0 and bumping the player back 1 pixel would work, but all it does is make the player sprite bounce up and down.
Given the above information, I assume you're making a side-scrolling game and that your character is colliding with the top of a tile.
That said, the first thing you need to understand is that you're supposed to adjust the position of the character before it moves, not after it has moved. The character should never be in a position that is "illegal" in your game - not even for a fraction of a second.
You have the power to see the future (at least in your own game), so use it at will! Always be one step ahead.
How to find the right place?
Basic algebra!
Here's the situation.
The goal here is to find where the green and red dotted line crosses the small blue dotted line (which represents the ground).
First, we need to find the equation of our character's trajectory (the black dot), which should look like y = ax + b,
where the slope is a = (Y2 - Y1) / (X2 - X1), and b = y - ax.
In our example case, the equation is y = -2x + 10. We just need to know the value of X when Y = 3, which we can find with x = (y - b) / a; in our case, x = (3 - 10) / (-2) = 3.5.
So we now know that our character will intersect the floor at (3.5, 3), and that's where we will put the character.
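A minimal C++ sketch of that computation, assuming the ground is a horizontal line y = groundY (all names are illustrative):

struct Point { float x, y; };

// Where does the move from 'from' to 'to' cross the ground line? Call this
// only when 'from' is above the ground and 'to' is at or below it.
Point groundIntersection(Point from, Point to, float groundY)
{
    if (to.x == from.x)                             // straight vertical drop
        return { from.x, groundY };
    float a = (to.y - from.y) / (to.x - from.x);    // slope, as in y = ax + b
    float b = from.y - a * from.x;
    return { (groundY - b) / a, groundY };
}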
Flaws of your current technique
When you collide, you put the character up 1 pixel, if I understand your code correctly.
Imagine that your character is going really fast, and in one update of its position it gets from a valid position to an invalid one, 25 pixels below the ground. With your current technique, it will take at least 25 position updates (or maybe just 25 collision-detection loops) to get back onto the ground; either way, it's not very efficient.
Another thing: you seem to be looping over every possible tile in the level, which is probably mostly empty tiles and/or fully inaccessible ground tiles - a big overhead compared to what you really need.
The best option would be to store the coordinates of collidable tiles and just iterate those tiles.
If you have a screen of, let's say, 50 x 50 tiles, and there are only 25 collidable tiles, you're still checking 50 * 50 - 25 = 2475 tiles, and those are 2475 unnecessary checks. But obviously that's not the reason you are having trouble; even those 2475 unnecessary checks won't break the logic.
And just to play with the numbers: since our character is 25 pixels below, we'll loop 25 times over 2500 tiles, which is 62500 checks, instead of 25 * 25 = 625 with a collidable-tile collection, or just 25 checks with the math approach.
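Here is a sketch of that collidable-tile collection, built once when the level loads; colMap comes from your question, while mapWidth and mapHeight are illustrative names:

#include <utility>
#include <vector>

std::vector<std::pair<int, int>> collidable;   // (x, y) of each solid tile
for (int y = 0; y < mapHeight; ++y)
    for (int x = 0; x < mapWidth; ++x)
        if (background.colMap[y][x] == 1)
            collidable.push_back({x, y});
// Each frame, test the player only against the tiles in 'collidable'.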
This is quite complicated to explain, so I will do my best; sorry if there is anything I miss out - let me know and I will rectify it.
My question is, I have been tasked to draw this shape,
(source: learnersdictionary.com)
This is to be done using C++, writing code that will calculate the points on this shape.
Important details.
User Input - Centre Point (X, Y), number of points to be shown, Font Size (influences radius)
Output - List of co-ordinates on the shape.
The overall aim, once I have the points, is to put them into a graph in Excel, and it will hopefully draw the shape for me at the size the user entered!
I know that the maximum radius is 165mm and the minimum is 35mm. I have decided that my base font size shall be 20. I then did some thinking and came up with the equation
Radius = (Chosen Font Size / 20) * 130. This is just an estimate; I realise it's probably not right, but I thought it could at least work as a template.
I then decided that I should create two different circles, with two different centre points, and link them together to create the shape. I thought that the INSIDE line would have to have a larger radius and a centre point further along the X axis (Y staying constant), as then it could cut into the outside line.*
*(I know this is not what it looks like in the picture; it's just my chain of thought, and it will still give the same shape.)
So I defined the 2nd centre point as (X + 4, Y). (Again, just an estimate; I thought it doesn't really matter how far apart they are.)
I then decided Radius 2 = (Chosen Font Size/20)*165 (max radius)
So, I have my 2 Radii, and two centre points.
This is my code so far (it works, and everything is declared/inputted above)
for (int i = 0; i <= n; i++) // output displayed to user
{
    Xnew = -i * (Y + R1) / n;                                 // calculate x coordinate
    Ynew = pow((((Y + R1) * (Y + R1)) - (Xnew * Xnew)), 0.5); // calculate y coordinate
    cout << "\n(" << Xnew << ", " << Ynew << ")" << endl;     // print, as in the second loop
}
and this for the second circle:
for (int j = 0; j <= n; j++) // calculation for angles and output displayed to user
{
    Xnew2 = -j * (Y + R2) / ((n) + ((0.00001) * (n == 0))); // calculate x coordinate
    Ynew2 = Y * (pow(abs(1 - (pow((Xnew2 / X), 2))), 0.5));
    if (abs(Ynew2) <= R1)
        cout << "\n(" << Xnew2 << ", " << Ynew2 << ")" << endl;
}
The problem I am having in drawing the crescent moon is that I cannot get the two circles to have the same starting point.
I have managed to get the results into Excel, and everything in that regard works. But when I plot the points on a graph in Excel, they do not have the same starting points. It's essentially just two half circles, one smaller than the other (each stops at the Y axis, giving the half-doughnut shape).
If this makes sense: I am trying to get two circular arcs to draw the shape in such a way that they have the same start and end points.
If anyone has any suggestions on how to do this, it would be great; currently all I am getting is more of a 'half doughnut' shape, due to the circles not being connected.
So. Does anyone have any hints/tips/links they can share with me on how to fix this exactly?
Thanks again; if there are any problems with the question, sorry - I will do my best to rectify them if you let me know.
Cheers
Formula for the points on a circle:
(x - h)^2 + (y - k)^2 = r^2
The centre of the circle is at (h, k).
Solving for y:
y1,2 = k +/- sqrt(-x^2 + 2hx + r^2 - h^2)
So now, if the inner circle has its centre at (h, k), the half-moon will begin at x = h and will stretch to x = h - r2.
Now you need to solve the formula for the inner circle and the outer circle between the endpoints and plot it. Per x you should receive 4 points (solve the equation twice, each time with two solutions).
I did not implement it, but this would be my train of thought...
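To address the shared-endpoints problem directly, here is a small C++ sketch in the same spirit: fix the two cusp points first, then choose each circle's centre so that it passes through both cusps. The cusps are (cx, cy + a) and (cx, cy - a), and all the names and numbers are illustrative.

#include <cmath>
#include <iostream>

using namespace std;

int main()
{
    double cx = 0.0, cy = 0.0;               // midpoint between the two cusps
    double a  = 35.0;                        // half the cusp-to-cusp distance
    const double radii[2] = { 40.0, 60.0 };  // the two arc radii, both >= a
    int n = 50;                              // points per arc
    const double pi = acos(-1.0);

    for (double r : radii) {
        double half = sqrt(r * r - a * a);
        double h = cx + half;                // centre shifted right of the cusps
        double phi = atan2(a, half);         // half the angular span of the arc
        // Sweep the left-hand arc from the top cusp (angle pi - phi) to the
        // bottom cusp (angle pi + phi); both arcs share these exact endpoints.
        for (int i = 0; i <= n; ++i) {
            double t = (pi - phi) + 2.0 * phi * i / n;
            cout << h + r * cos(t) << ", " << cy + r * sin(t) << "\n";
        }
    }
}

Pasting the two point lists into Excel as an XY scatter should then give a crescent whose arcs meet at the cusps, because both circles pass through the same two fixed points by construction.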