I took the difference of two consecutive frames of a video. The result, as you would expect, is a black frame except for the moving objects, which appear white. I want to count the number of white pixels in this difference frame: go through the image row by row and, if the value of a pixel is greater than a specified threshold (say 50), store it in an array. Later I will use this array to check whether there is actually an object or just noise. For example, if a car is moving in the video, then after frame differencing I will check each pixel of the frames containing the car, row by row, to detect it; wherever the car moved, the pixel values are greater than 0. Any idea how I can sum all the pixels belonging to the moving car so that I can decide whether it is a car or just noise?
Thanks in advance :)
You'll probably find that the difference is non-trivial. For instance, you will probably find that the biggest difference is near the edges of the car, perpendicular to the movement of the car. One of those two edges will have negative values, one positive. Therefore, the biggest advantage of the "difference image" is that you restrict your search area. In isolation it's not very useful.
So, what should you do? Well, use an edge detection algorithm on the normal image and compare the edges found there with the two edges found in the difference image. The edges belonging to the car will connect the two edges from the difference image.
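That said, if you just want the raw count of changed pixels as a first noise filter, this is a one-liner with cv::absdiff, cv::threshold and cv::countNonZero. A minimal sketch (the video file name and the threshold of 50 are placeholders from the question):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::VideoCapture cap("video.avi");            // placeholder file name
    cv::Mat frame, prev, gray, diff, mask;

    if (!cap.read(frame)) return 1;
    cv::cvtColor(frame, prev, cv::COLOR_BGR2GRAY);

    while (cap.read(frame))
    {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::absdiff(gray, prev, diff);                          // frame difference
        cv::threshold(diff, mask, 50, 255, cv::THRESH_BINARY);  // threshold of 50, as in the question

        int movingPixels = cv::countNonZero(mask);              // number of white pixels
        std::cout << movingPixels << " changed pixels" << std::endl;

        prev = gray.clone();
    }
    return 0;
}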
You could use blob detection: http://www.labbookpages.co.uk/software/imgProc/blobDetection.html
to detect a blob of white pixels in each "difference image". Once you have the blobs you can find their center by finding the average of their pixel positions. Then you can find the path swept out by these centers and check it against some criterion.
Without knowing more about your images I cannot suggest a criterion, but for example if you are watching them move down a straight road you might expect all the points to be roughly collinear. In this case, you can take the gradient and a point where a blob is found and use the point-gradient form of a line to get the line's equation:
y - y_1 = m(x - x_1)
For example given a point (4, 2) and gradient 3 you would get
y - 2 = 3(x - 4)
y = 3x - 10
You can then check all points against this line to see if they lie along it.
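As a small illustration of that last check, assuming the blob centers are already available as cv::Point2f and using the point/gradient form from the example above (the tolerance value is just a guess):

#include <opencv2/core.hpp>
#include <cmath>
#include <vector>

// Check whether the blob centers lie roughly on the line through (x1, y1) with
// gradient m, i.e. y - y1 = m * (x - x1). The tolerance (in pixels) is a guess.
bool roughlyCollinear(const std::vector<cv::Point2f>& centers,
                      float m, float x1, float y1, float tolerance = 5.0f)
{
    for (const cv::Point2f& p : centers)
    {
        float expectedY = y1 + m * (p.x - x1);   // point-gradient form solved for y
        if (std::abs(p.y - expectedY) > tolerance)
            return false;
    }
    return true;
}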
[1 0 0 0; 0 1 0 0; 0 0 1/f 0] * [x y z 1]' = [x y z/f]'  ->  (f*x/z, f*y/z) = (u, v)
This converts 3D points (x,y,z) to pixels (u,v). How can I go from pixels to 3D points? Sorry, I'm not very smart.
Unfortunately, you lose depth information when you project a point. So you can recover the original 3d point only up to scale. Let's re-write your transformation like this:
calib_mat = [f 0 0;
             0 f 0;
             0 0 1]
I removed the last column since it doesn't have any impact, and rescaled the matrix by f (which changes nothing, because the projection is only defined up to scale). Then we have
calib_mat * [x y z]' = [f*x f*y z]' = z * [f*x/z f*y/z 1]' = z * [u v 1]'.
Now, assume you know [u v 1] and you want to recover the 3d point. But now, the depth information is lost, so what you know is
calib_mat * [x y z]' = unknown_depth * [u v 1]'
Therefore,
[x y z]' = unknown_depth * inverse(calib_mat) * [u v 1]'
So you have obtained what you wanted, but only up to scale. To recover the depth of the point, you either need multiple (at least two) views of the point in question (for triangulation, for example), or, if you are in a rendering context rather than a computer vision one, you can save the depth in some sort of z-buffer when you project the point.
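As a small illustration of that last equation (not a complete reconstruction pipeline), here is a sketch with made-up values for the focal length, the pixel and the depth you would have to recover from a second view or a z-buffer:

#include <opencv2/core.hpp>
#include <cstdio>

int main()
{
    // Hypothetical values: focal length f, pixel (u, v), and an assumed depth.
    double f = 800.0, u = 320.0, v = 240.0, depth = 5.0;

    cv::Matx33d calib_mat(f, 0, 0,
                          0, f, 0,
                          0, 0, 1);

    // [x y z]' = unknown_depth * inverse(calib_mat) * [u v 1]'
    cv::Matx31d p = depth * (calib_mat.inv() * cv::Matx31d(u, v, 1.0));
    std::printf("x = %f, y = %f, z = %f\n", p(0), p(1), p(2));
    return 0;
}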
When you project three-dimensional space onto a two-dimensional image, you lose information about depth, and it is difficult to recover that depth from a single frame. However, depth information can be regained if you have another image of the same scene taken from a different angle. Your brain does something similar to understand depth, using the "images" from your two eyes to give you an understanding of the depth of the world around you.
The underlying principle of stereo reconstruction is best explained this way: hold any object close to your eyes, then close one eye. Then open that eye and close the other. Then do the same thing again, but move the object farther away from your eyes. You will notice that the object appears to move much more when you switch which eye is open while the object is close to your eyes than when it is farther away. In the context of two images, the amount (in pixels) that a single feature moves between two images of the same scene is called "disparity." To calculate the relative depth of the scene, simply take (1.0/disparity). To obtain the absolute depth of the scene (e.g. in meters or some other unit of measurement), the focal length and baseline (distance between the two camera locations) are needed (the equations for doing so are discussed later).
Now that you know how the depth of each pixel is calculated, all that is left is to match features so that you can calculate disparity. If you were to search the whole second image for each pixel of the first image, it would quickly become unwieldy. However, the "search problem" is simplified by the fact that there exists an "epipolar line" between any two images that significantly reduces the possible locations in image2 where a feature from image1 can appear.

The easiest way to visualize this is to think of two cameras placed such that the only difference between them is that the second camera has been moved horizontally from the first (so the cameras are at the same height, and both are the same distance from the scene). Say there is a ball in image1 at a certain pixel (x1, y1). Since both cameras photographed the same ball from the same height, the ball may not appear at the same pixel location in image2, but it will at least have the same y coordinate as in image1. In that case, the epipolar line is completely horizontal. With knowledge of this epipolar line, one no longer needs to search all of image2 for a feature found in image1 -- only the epipolar line through the feature's location in image1 needs to be searched in image2. The cameras do not need to be placed so that they differ only by a horizontal translation, but doing so makes the computation much simpler and more intuitive, as otherwise the epipolar line would be sloped.

So, in order to match feature1 from image1 to feature2 in image2, one simply uses a feature comparison technique (normalized cross-correlation is often used) to determine the most likely location of feature2 in image2. Given the matched location of a feature in both images, the disparity is the distance between the two pixel locations.
After features are matched, the depth of the pixel is calculated from the disparity through the equations shown on page 7 of these lecture notes, where b is the baseline between the cameras and l is the focal length in the unit of measurement you wish to use (e.g. inches, meters, etc.). If you are only looking for the relative three-dimensional locations of the pixels and don't care about their absolute positions (i.e. a point on the left of the image will still be on the left in the reconstruction, and a point farther back in the image will be farther back in the reconstruction), arbitrary non-zero values can be chosen for the focal length and baseline. Those notes also explain some more of the intuition behind why this works if you are still curious.
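If you end up computing the disparity with OpenCV rather than by hand, a minimal sketch using the built-in block matcher could look like this; the file names, focal length and baseline are placeholders, and the image pair is assumed to be rectified (horizontal epipolar lines):

#include <opencv2/opencv.hpp>

int main()
{
    // Placeholder file names for a rectified stereo pair.
    cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);

    // Block-matching stereo: 64 disparity levels, 21x21 matching window.
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 21);
    cv::Mat disp16;                                   // CV_16S, disparity scaled by 16
    bm->compute(left, right, disp16);

    cv::Mat disparity;
    disp16.convertTo(disparity, CV_32F, 1.0 / 16.0);

    // Absolute depth needs the focal length f (in pixels) and the baseline b
    // (in the unit you want the depth in); both values below are made up.
    float f = 700.0f, b = 0.12f;
    cv::Mat depth = f * b / disparity;                // invalid matches (disparity <= 0) give garbage here
    return 0;
}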
Feel free to ask any questions, and there's no reason to be down on yourself -- either way you are seeking out knowledge and that is commendable.
With a group of friends, we are trying to accomplish a computer vision task on a Raspberry Pi, coding in C++ with the OpenCV library.
Let me explain the task first.
There is a pattern consisting of 16 separate squares, each colored red, yellow or blue. We are mounting the Raspberry Pi with its camera module on a quadcopter and gathering a video feed of the pattern.
We have to detect the colors of the squares, which was easy to accomplish with a little research on the web. The tricky part is that we also have to detect the order of the squares, so that we can save the colors in an array in that order.
So far we have managed to filter the desired colors (red, yellow, blue) to locate the squares.
example pattern to recognize and our process so far
In the second image, we know the colors and center points of each square. What we need is a way to write them, in order, to a file or to the screen.
To find the order, we tried several OpenCV methods that find corners. With the corner points at hand, we compared each point and determined the extreme points so we could draw a bounding rectangle and handle small distortions.
But since the quadcopter captures the video stream, there is always a chance of heavy distortion. That breaks our corner approach, resulting in the wrong order of colors. For example, it can capture an image like this:
highly distorted image
It is not reliable to find the order of these squares by simply comparing their center points. Finding the extreme points to draw a larger rectangle around them and flatten the pattern, and only then ordering, won't work either.
What I am asking for is algorithm suggestions. Are we going in completely the wrong direction by trying to find corners? Is it possible to determine the order without taking the distortion into account?
Thanks in advance.
Take the two centers that are the furthest apart and number them 1 and 16. Then find the two centers that are the furthest from the line 1-16, to the left (number 4) and to the right (number 13). Now you have the four corners.
Compute the affine transform that maps the coordinates of the corners 1, 4 and 13 to (0,0), (3,0) and (0,3). Apply this transform to the 16 centers and round to the nearest integers. If all goes well, you will obtain the "logical" coordinates of the squares, in range [0, 3] x [0, 3]. The mapping to the cell indexes is immediate.
Note that because of symmetry, a fourfold indeterminacy will remain, which you can probably resolve by checking the color patterns.
This procedure will be very robust to deformations. If there is extreme perspective, you can even exploit the four corners to determine a homographic transform instead of an affine one. In your case, I doubt this will be necessary. You can check that everything worked by verifying that all expected indexes have been assigned.
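A minimal sketch of the affine-mapping step, assuming the 16 centers and the three corner centers (1, 4 and 13) have already been found as described above:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Map the 16 detected centers to "logical" grid coordinates in [0, 3] x [0, 3].
// c1, c4 and c13 are the corner centers found as described above.
std::vector<cv::Point2f> logicalCoords(const std::vector<cv::Point2f>& centers,
                                       cv::Point2f c1, cv::Point2f c4, cv::Point2f c13)
{
    cv::Point2f src[3] = { c1, c4, c13 };
    cv::Point2f dst[3] = { cv::Point2f(0, 0), cv::Point2f(3, 0), cv::Point2f(0, 3) };

    // 2x3 affine transform from image coordinates to grid coordinates
    cv::Mat A = cv::getAffineTransform(src, dst);

    std::vector<cv::Point2f> logical;
    cv::transform(centers, logical, A);               // apply it to all 16 centers

    for (cv::Point2f& p : logical)                    // round to the nearest grid cell
        p = cv::Point2f(std::round(p.x), std::round(p.y));
    return logical;
}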
I want to obtain all the pixels in an image whose values are closest to those of certain reference pixels. For example, I have an image with a view of the ocean (deep blue), clear sky (light blue), a beach, and houses. I want to find all the pixels closest to deep blue in order to classify them as water. My problem is that the sky also gets classified as water. Someone suggested using the K-nearest-neighbor algorithm, but the few examples online use the old C style. Can anyone provide an example of K-NN using the OpenCV C++ API?
"Classify it as water" and "obtain all the pixels in an image with pixel values closest to certain pixels in an image" are not the same task. Color properties is not enough for classification you described. You will always have a number of same colored points on water and sky. So you have to use more detailed analysis. For instance if you know your object is self-connected you can use something like water-shred to fill this region and ignore distant and not connected regions in sky of the same color as water (suppose you will successfully detect by edge-detector horizon-line which split water and sky).
Also you can use more information about object you want to select like structure: calculate its entropy etc. Then you can use also K-nearest neighbor algorithm in multi-dimensional space where 1st 3 dimensions is color, 4th - entropy etc. But you can also simply check every image pixel if it is in epsilon-neighborhood of selected pixels area (I mean color-entropy 4D-space, 3 dimension from color + 1 dimension from entropy) using simple Euclidean metric -- it is pretty fast and could be accelerated by GPU .
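Regarding the K-NN part of the question, here is a minimal sketch with the OpenCV C++ API (cv::ml::KNearest, available since OpenCV 3), using only the three color channels as features; the tiny training set, the label meanings, the file name and the value of k are all placeholders you would replace with your own hand-labeled data:

#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>

int main()
{
    // Tiny hypothetical training set (BGR values): two "water" and two "sky" pixels.
    // In practice you would collect many labeled pixels (and extra features such as entropy).
    float sampleData[4][3] = { {120, 60, 10},  {140, 70, 20},     // deep blue  -> label 0 (water)
                               {230, 180, 120}, {240, 200, 140} };// light blue -> label 1 (sky)
    int labelData[4] = { 0, 0, 1, 1 };
    cv::Mat samples(4, 3, CV_32F, sampleData);
    cv::Mat labels(4, 1, CV_32S, labelData);

    cv::Ptr<cv::ml::KNearest> knn = cv::ml::KNearest::create();
    knn->train(samples, cv::ml::ROW_SAMPLE, labels);

    // Classify every pixel of a new image (placeholder file name).
    cv::Mat img = cv::imread("scene.jpg");
    cv::Mat features;
    img.reshape(1, img.rows * img.cols).convertTo(features, CV_32F);  // one BGR row per pixel

    cv::Mat results;                                    // predicted label per pixel
    knn->findNearest(features, 3, results);             // k = 3, a guess

    cv::Mat labelImage = results.reshape(1, img.rows);  // back to image layout
    return 0;
}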
I am trying to develop a box sorting application in Qt using OpenCV. I want to measure the width and length of a box.
As shown in the image above, I want to detect only the outermost lines (i.e. the box edges), which will give me the width and length of the box, regardless of whatever is printed inside it.
What I tried:
First I tried using findContours() and selected the contour with the maximum area, but the contour of the outer edge is often not closed (broken somewhere in the Canny output) and hence does not get detected as a contour.
The Hough line transform gives me too many lines; I don't know how to extract only the four lines I am interested in.
So I tried my own algorithm:
Convert image to gray scale.
Take one column of the image and compare every pixel with the next pixel in that column; if the difference in their values is greater than some threshold (say 100), that pixel belongs to an edge, so store it in an array. Do this for all columns and it will give the upper line of the box, parallel to the x axis.
Follow the same procedure, but starting from the last column and last row (i.e. from bottom to top); it will give the lower line, parallel to the x axis.
Likewise find the lines parallel to the y axis. Now I have four arrays of points, one for each side.
This gives me good results if the box is placed so that its sides are exactly parallel to the X and Y axes. If the box is even slightly rotated, it gives me diagonal lines, which is to be expected, as shown in the image below.
As shown in the next image, I removed the first 10 and last 10 points from all four arrays (which are responsible for the diagonal lines) and drew the lines, but this is not going to work when the box is tilted more, and the measurements will also go wrong.
Now my question is,
Is there any simpler way in OpenCV to get only the outer edges (rectangle) of the box and their dimensions, ignoring anything printed on the box and whatever its orientation?
I am not necessarily asking you to correct/improve my algorithm, but any suggestions on that are also welcome. Sorry for such a long post.
I would suggest the following steps:
1: Make a mask image by using cv::inRange() (documentation) to select the background color. Then use cv::bitwise_not() to invert this mask. This will give you only the box.
2: If you're not concerned about shadows or depth effects making your measurement inaccurate, you can proceed right away with cv::findContours() again. Select the biggest contour and store its cv::RotatedRect (obtained with cv::minAreaRect()).
3: This cv::RotatedRect has a size member that gives you the width and height of your box in pixels.
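A rough sketch of these three steps; the input file name and the background color range are assumptions you would have to adapt:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("box.png");               // placeholder file name

    // 1: mask the (here assumed bright) background, then invert so the box is white
    cv::Mat background, boxMask;
    cv::inRange(img, cv::Scalar(200, 200, 200), cv::Scalar(255, 255, 255), background);
    cv::bitwise_not(background, boxMask);

    // 2: find the biggest contour in the mask
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(boxMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty()) return 1;
    size_t best = 0;
    for (size_t i = 1; i < contours.size(); ++i)
        if (cv::contourArea(contours[i]) > cv::contourArea(contours[best]))
            best = i;

    // 3: its minimum-area rotated rectangle gives width and height in pixels
    cv::RotatedRect rect = cv::minAreaRect(contours[best]);
    std::cout << rect.size.width << " x " << rect.size.height << std::endl;
    return 0;
}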
Since the box is placed against a contrasting background, you should be able to use Otsu thresholding.
threshold the image (use Otsu method)
filter out any stray pixels that are outside the box region (let's hope you don't get many such pixels and can easily remove them with a median or a morphological filter)
find contours
combine all contour points and get their convex hull (idea here is to find the convex region that bounds all these contours in the box region regardless of their connectivity)
apply a polygon approximation (approxPolyDP) to this convex hull and check if you get a quadrangle
if there are no perspective distortions, you should get a rectangle, otherwise you will have to correct it
if you get a rectangle, you have its dimensions. You can also find the minimum area rectangle (minAreaRect) of the convexhull, which should directly give you a RotatedRect
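A sketch of the steps above under a few assumptions (placeholder file name, and a box bright enough that Otsu separates it from the background):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat gray = cv::imread("box.png", cv::IMREAD_GRAYSCALE);   // placeholder file name

    // Otsu threshold (use THRESH_BINARY_INV instead if the box comes out black),
    // then a median filter to remove stray pixels
    cv::Mat bin;
    cv::threshold(gray, bin, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    cv::medianBlur(bin, bin, 5);

    // collect all contour points and take their convex hull
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(bin, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    std::vector<cv::Point> all, hull;
    for (const auto& c : contours)
        all.insert(all.end(), c.begin(), c.end());
    if (all.empty()) return 1;
    cv::convexHull(all, hull);

    // polygon approximation: without perspective distortion this should give 4 corners
    std::vector<cv::Point> poly;
    cv::approxPolyDP(hull, poly, 0.02 * cv::arcLength(hull, true), true);

    // the minimum-area rectangle of the hull gives the dimensions directly
    cv::RotatedRect rect = cv::minAreaRect(hull);
    std::cout << poly.size() << " corners, "
              << rect.size.width << " x " << rect.size.height << std::endl;
    return 0;
}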
I want to compute the number of black pixels in arbitrary shapes in a picture. There might be several objects, like in the picture at the bottom.
I suspect that the problem is solvable with dynamic programming, i.e. traverse the pixels row-wise and add up the black pixels. I just don't know how to correctly merge the sizes of two parts.
I'm pretty sure there are algorithms that solve my problem, but I obviously use the wrong search terms.
Can you please provide me with a good (fast) algorithm to do so? Bonus points if the algorithm is written in C++ and compatible with Mat from the OpenCV library. ;)
The result for this (zoomed) picture would be something like: 15 for the object at the top left, 60 for the big blob, ...
I think I found a solution (better ones are obviously welcome!):
I integrated the size computation into a connected-component algorithm.
In the connected-component algorithm, we generate a new image in which the black pixels are replaced by labels (numbers). All pixels of one region get the same label.
What is new compared to the standard CC algorithm is a table in which the total number of pixels for each label is stored. That way I know the correct size of every connected component.
Process the image from left to right, top to bottom:
1.) If the next pixel to process is white:
do nothing
2.) If the next pixel to process is black:
i.) If only one of its neighbors (top or left) is black, copy its label and +1 in the size table for that label.
ii.) If both are black and have the same label, copy it and +1 in the size table for that label.
iii.) If they have different labels, copy the label from the left, update the equivalence table, and +1 in the size table for the left label.
iv.) Otherwise, assign a new label and update the size table with that label and value 1.
Finally, re-label with the smallest of the equivalent labels and update the size table accordingly (merging the counts of equivalent labels).
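For reference, recent OpenCV versions ship essentially this algorithm as cv::connectedComponentsWithStats, which returns the per-label pixel count directly. A minimal sketch (the file name is a placeholder; black shapes on a white background are assumed):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::Mat img = cv::imread("shapes.png", cv::IMREAD_GRAYSCALE);   // placeholder file name

    // connectedComponents labels the non-zero (white) pixels, so invert: black shapes become white
    cv::Mat binary;
    cv::threshold(img, binary, 128, 255, cv::THRESH_BINARY_INV);

    cv::Mat labels, stats, centroids;
    int n = cv::connectedComponentsWithStats(binary, labels, stats, centroids, 8);

    // label 0 is the background; CC_STAT_AREA is exactly the per-label pixel count
    for (int i = 1; i < n; ++i)
        std::cout << "component " << i << ": "
                  << stats.at<int>(i, cv::CC_STAT_AREA) << " pixels" << std::endl;
    return 0;
}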
The problem can be solved using flood fill in the following way:
Keep a 2-D boolean array, initially all false, to track whether each pixel has been visited.
Scan the image pixel by pixel.
If a pixel is unvisited and black, apply flood fill starting from it.
During the flood fill, count the number of calls and mark each pixel as visited; the call count is the number of connected pixels.
Stop the flood fill when white pixels are encountered.
The count is the size of the region containing the starting pixel.
Flood Fill
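A minimal iterative flood-fill sketch along these lines (4-connectivity and a black-shapes-on-white binary image are assumed):

#include <opencv2/core.hpp>
#include <stack>
#include <vector>

// Return the pixel count of every black region in a binary image (black = 0).
// Iterative flood fill with an explicit stack, 4-connectivity.
std::vector<int> regionSizes(const cv::Mat& binary)
{
    cv::Mat visited = cv::Mat::zeros(binary.size(), CV_8U);
    std::vector<int> sizes;

    for (int y = 0; y < binary.rows; ++y)
        for (int x = 0; x < binary.cols; ++x)
        {
            if (binary.at<uchar>(y, x) != 0 || visited.at<uchar>(y, x)) continue;

            int count = 0;
            std::stack<cv::Point> todo;
            todo.push(cv::Point(x, y));
            visited.at<uchar>(y, x) = 1;

            while (!todo.empty())
            {
                cv::Point p = todo.top(); todo.pop();
                ++count;                                        // one more pixel in this region

                const int dx[] = { 1, -1, 0, 0 }, dy[] = { 0, 0, 1, -1 };
                for (int i = 0; i < 4; ++i)
                {
                    int nx = p.x + dx[i], ny = p.y + dy[i];
                    if (nx < 0 || ny < 0 || nx >= binary.cols || ny >= binary.rows) continue;
                    if (binary.at<uchar>(ny, nx) != 0 || visited.at<uchar>(ny, nx)) continue;  // stop at white
                    visited.at<uchar>(ny, nx) = 1;
                    todo.push(cv::Point(nx, ny));
                }
            }
            sizes.push_back(count);
        }
    return sizes;
}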
If I understood correctly, for an image like your sample you want your algorithm to return 6 values, one for each black shape, each value being the number of black pixels in that shape.
The algorithm I would use for this is the following :
Invert the pixel colors of your image (so now you are looking for white pixels).
Find the contours in your image. Don't forget to retrieve only EXTERNAL contours.
For each contour found:
Draw the contour, filled, into a small cv::Mat with a pixel value of 1, then compute the moment of order 0 of this image. The moment of order 0 is the number of pixels in the shape.
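A small sketch of this approach; the file name is a placeholder and the shapes are assumed to be black on a white background:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::Mat img = cv::imread("shapes.png", cv::IMREAD_GRAYSCALE);   // placeholder file name

    // invert so the black shapes become white
    cv::Mat inverted;
    cv::threshold(img, inverted, 128, 255, cv::THRESH_BINARY_INV);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(inverted, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    for (size_t i = 0; i < contours.size(); ++i)
    {
        // draw this contour, filled, with value 1 into its own image
        cv::Mat shape = cv::Mat::zeros(img.size(), CV_8U);
        cv::drawContours(shape, contours, static_cast<int>(i), cv::Scalar(1), cv::FILLED);

        // the moment of order 0 of a 0/1 image is the number of shape pixels
        double pixels = cv::moments(shape, true).m00;
        std::cout << "shape " << i << ": " << pixels << " pixels" << std::endl;
    }
    return 0;
}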