With a group of friend, we are trying to accomplish a computer vision task on Raspberry Pi, coding with C++ using OpenCV library.
Let me explain the task first.
There is a pattern consisting of 16 seperate squares with each square being red, yellow or blue colored. We are mounting rasperry pi on a quadcopter with its camera module and gathering video feed of the pattern.
We have to detect colors of squares which was easy to accomplish with a little research on web. Tricky part is we have to detect order of the squares as well in order to save the colors in an array in an order.
So far we have accomplished filtering desired colors (red, yellow, blue) to determine squares.
example pattern to recognize and our process so far
In the second image, we know the colors and center points of each square. What we need is a way to write them in an order to a file or on screen.
And to find the order, we tried several OpenCV methods that find corners. With corner points at hand, we compared each point and determined end points so we could draw a boundingrectangle and overcome little distortions.
But since quadcopter gets the video stream, there is always a chance of high distortion. That messes up our corner theory, resulting in wrong order of colors. For example it can capture an image like this:
highly distorted image
It is not right to find order of these squares by comparing their center points. It also won't work finding endpoints to draw a larger rectangle around them to flatten pattern. And then order...
What I ask for is algorithm suggestions. Are we totally going in the wrong direction trying to find corners? Is it possible to determine the order without taking distortion into consideration?
Thanks in advance.
Take the two centers that are the furthest apart and number them 1 and 16. Then find the two centers that are the furthest from the line 1-16, to the left (number 4) and to the right (number 13). Now you have the four corners.
Compute the affine transform that maps the coordinates of the corners 1, 4 and 13 to (0,0), (3,0) and (0,3). Apply this transform to the 16 centers and round to the nearest integers. If all goes well, you will obtain the "logical" coordinates of the squares, in range [0, 3] x [0, 3]. The mapping to the cell indexes is immediate.
Note that because of symmetry, a fourfold undeterminacy will remain, which you can probably lift by checking the color patterns.
This procedure will be very robust to deformations. If there is extreme perspective, you can even exploit the four corners to determine an homographic transform instead of affine. In your case, I doubt this will be useful. You can assess proper working by checking that all expected indexes have been assigned.
I want to follow the rightmost edge in the following picture with a line following robot.
I tried simple "thresholding", but unfortunately, it includes the blurry white halo:
The reason I threshold is to obtain a clean line from the Sobel edge detector:
Is there a good algorithm which I can use to isolate this edge/move along this edge? The one I am using currently seems error prone, but it's the best one I've been able to figure out so far.
Note: The edge may curve or be aligned in any direction, but a point on the edge will always lie very close to the center of the image. Here's a video of what I'm trying to do. It doesn't follow the edge after (1:35) properly due to the halo screwing up the thresholding.
Here's another sample:
Here, I floodfill the centermost edge to separate it from the little bump in the bottom right corner:
Simplest method (vertical line)
If you know that your image will have black on the right side of the line, here's a simple method:
1) apply the Sobel operator to find the first derivative in the x direction. The result will be an image that is most negative where your gradient is strongest. (Use a large kernel size to average out the halo effect. You can even apply a Gaussian blur to the image first, to get even more averaging if the 7x7 kernel isn't enough.)
2) For each row of the image, find the index of the minimum (i.e. most negative) value. That's your estimate of line position in that row.
3) Do whatever you want with that. (Maybe take the median of those line positions, on the top half and the bottom half of the image, to get an estimate of 2 points that describe the line.)
Slightly more advanced (arbitrary line)
Use this if you don't know the direction of the line, but you do know that it's straight enough that you can approximate it with a straight line.
1)
dx = cv2.Sobel(grayscaleImg,cv2.cv.CV_32F,1,0,ksize=7)
dy = cv2.Sobel(grayscaleImg,cv2.cv.CV_32F,0,1,ksize=7)
angle = np.atan2(dy,dx)
magnitudeSquared = np.square(dx)+np.square(dy)
You now have the angle (in radians) and magnitude of the gradient at each point in your image.
2) From here you can use basic numpy operations to find the line: Filter the points to only keep points where magnitudeSquared > some threshold. Then grab the most common angle (np.bincount() is useful for that). Now you know your line's angle.
3) Further filter the points to only keep points that are close to that angle. You now have all the points on your line. Fit a line through the coordinates of those points.
Most advanced and brittle (arbitrary curve)
If you really need to handle a curve, here's one way:
1) Use your method above to threshold the image. Manually tune the threshold until the white/black division happens roughly where you want it. (Probably 127 is not the right threshold. But if your lighting conditions are consistent, you might be able to find a threshold that works. Confirm it works across multiple images.)
2) Use OpenCV's findcontours() to fit a curve to the white/black boundary. If it's too choppy, use approxPolyDP() to simplify it.
I have written an algorithm to process a camera capture and extract a binary image of two features I'm interested in. I'm trying to find the best (fastest) way of detecting when the two features intersect and where the lowest (y coordinate is greatest) point is (this will be the intersection).
I do not want to use a findContours() based method as this is too slow and, in my opinion, unnecessary. I also think blob detection libraries are too bloated for this.
I have two sample images (sorry for low quality):
(not touching: http://i.imgur.com/7bQ9qMo.jpg)
(touching: http://i.imgur.com/tuSmKw7.jpg)
Due to the way these images are created, there is often noise in the top right corner which looks like pixelated lines but methods such as dilation and erosion lose resolution around the features I'm trying to find.
My initial thought would be to use direct pixel access to form a width filter and a height filter. The lowest point in the image is therefore the intersection.
I have no idea how to detect when they touch... logically I can see that a triangle is formed when they intersect and otherwise there is no enclosed black area. Can I fill the image starting from the corner with say, red, and then calculate how much of the image is still black?
Does anyone have any suggestions?
Thanks
Your suggestion is a way more slow than finding contours. For binary images, finding contour is very easy and quick because you just need to find a black pixel followed by a white pixel or vice versa.
Anyway, if you don't want to use it, you can use the vertical projection or vertical profile you will see it the objects intersect or not.
For example, in the following image check the the letter "n" which is little similar to non-intersecting object, and the letter "o" which is similar to intersecting objects :
By analyzing the histograms you can recognize which one is intersecting or not.
I can successfully threshold images and find edges in an image. What I am struggling with is trying to extract the angle of the black edges accurately.
I am currently taking the extreme points of the black edge and calculating the angle with the atan2 function, but because of aliasing, depending on the point you choose the angle can come out with some degree of variation. Is there a reliable programmable way of choosing the points to calculate the angle from?
Example image:
For example, the Gimp Measure tool angle at 3.12°,
If you're writing your own library, then creating a robust solution for this problem will allow you to develop several independent chunks of code that you can string together to solve other problems, too. I'll assume that you want to find the corners of the checkerboard under arbitrary rotation, under varying lighting conditions, in the presence of image noise, with a little nonlinear pincushion/barrel distortion, and so on.
Although there are simple kernel-based techniques to find whole pixels as edge pixels, when working with filled polygons you'll want to favor algorithms that can find edges with sub-pixel accuracy so that you can perform accurate line fits. Even though the gradient from dark square to white square crosses several pixels, the "true" edge will be found at some sub-pixel point, and very likely not the point you'd guess by manually clicking.
I tried to provide a simple summary of edge finding in this older SO post:
what is the relationship between image edges and gradient?
For problems like yours, a robust solution is to find edge points along the dark-to-light transitions with sub-pixel accuracy, then fit lines to the edge points, and use the line angles. If you are processing a true camera image, and if there is an uncorrected radial distortion in the image, then there are some potential problems with measurement accuracy, but we'll ignore those.
If you want to find an accurate fit for an edge, then it'd be great to scan for sub-pixel edges in a direction perpendicular to that edge. That presupposes that we have some reasonable estimate of the edge direction to begin with. We can first find a rough estimate of the edge orientation, then perform an accurate line fit.
The algorithm below may appear to have too many steps, but my purpose is point out how to provide a robust solution.
Perform a few iterations of erosion on black pixels to separate the black boxes from one another.
Run a connected components algorithm (blob-finding algorithm) to find the eroded black squares.
Identify the center (x,y) point of each eroded square as well as the (x,y) end points defining the major and minor axes.
Maintain the data for each square in a structure that has the total area in pixels, the center (x,y) point, the (x,y) points of the major and minor axes, etc.
As needed, eliminate all components (blobs) that are too small. For example, you would want to exclude all "salt and pepper" noise blobs. You might also temporarily ignore checkboard squares that are cut off by the image edges--we can return to those later.
Then you'll loop through your list of blobs and do the following for each blob:
Determine the direction roughly perpendicular to the edges of the checkerboard square. How you accomplish this depends in part on what data you calculate when you run your connected components algorithm. In a general-purpose image processing library, a standard connected components algorithm will determine dozens of properties and measurements for each individual blob: area, roundness, major axis direction, minor axis direction, end points of the major and minor axis, etc. For rectangular figures, it can be sufficient to calculate the topmost, leftmost, rightmost, and bottommost points, as these will define the four corners.
Generate edge scans in the direction roughly perpendicular to the edges. These must be performed on the original, unmodified image. This generally assumes you have bilinear interpolation implemented to find the grayscale values of sub-pixel (x,y) points such as (100.35, 25.72) since your scan lines won't fall exactly on whole pixels.
Use a sub-pixel edge point finding technique. In general, you'll perform a curve fit to the edge points in the direction of the scan, then find the real-valued (x,y) point at maximum gradient. That's the edge point.
Store all sub-pixel edge points in a list/array/collection.
Generate line fits for the edge points. These can use Hough, RANSAC, least squares, or other techniques.
From the line equations for each of your four line fits, calculate the line angle.
That algorithm finds the angles independently for each black checkerboard square. It may be overkill for this one application, but if you're developing a library maybe it'll give you some ideas about what sub-algorithms to implement and how to structure them. For example, the algorithm would rely on implementations of these techniques:
Image morphology (e.g. erode, dilate, close, open, ...)
Kernel operations to implement morphology
Thresholding to binarize an image -- the Otsu method is worth checking out
Connected components algorithm (a.k.a blob finding, or the OpenCV contours function)
Data structure for blob
Moment calculations for blob data
Bilinear interpolation to find sub-pixel (x,y) values
A linear ray-scanning technique to find (x,y) gray values along a specific direction (which will also rely on bilinear interpolation)
A curve fitting technique and means to determine steepest tangent to find edge points
Robust line fit technique: Hough, RANSAC, and/or least squares
Data structure for line equation, related functions
All that said, if you're willing to settle for a slight loss of accuracy, and if you know that the image does not suffer from radial distortion, etc., and if you just need to find the angle of the parallel lines defined by all checkboard edges, then you might try..
Simple kernel-based edge point finding technique (Laplacian on Gaussian-smoothed image)
Hough line fit to edge points
Choose the two line fits with the greatest number of votes, which should be one set of horizontal-ish lines and the other set of vertical-ish lines
There are also other techniques that are less accurate but easier to implement:
Use a kernel-based corner-finding operator
Find the angles between corner points.
And so on and so on. As you're developing your library and creating robust implementations of standalone functions that you can string together to create application-specific solutions, you're likely to find that robust solutions rely on more steps than you would have guessed, but it'll also be more clear what the failure mode will be at each incremental step, and how to address that failure mode.
Can I ask, what C++ library are you using to code this?
Jerry is right, if you actually apply a threshold to the image it would be in 2bit, black OR white. What you may have applied is a kind of limiter instead.
You can make a threshold function (if you're coding the image processing yourself) by applying the limiter you may have been using and then turning all non-white pixels black. If you have the right settings, the squares should be isolated and you will be able to calculate the angle.
Once this is done you can use a path finding algorithm to find some edge, any edge will do. If you find a more or less straight path, you can use the extreme points as you are doing now to determine the angle. Since the checker-board rotation is only relevant within 90 degrees, your angle should be modulo 90 degrees or pi over 2 radians.
I'm not sure it's (anywhere close to) the right answer, but my immediate reaction would be to threshold twice: once where anything but black is treated as white, and once where anything but white is treated as black.
Find the angle for each, then interpolate between the two angles.
Your problem have few solutions but all have one very important issue which you seem to neglect. Note: When you are trying to make geometrical calculation in the image, the points you use must be as far as possible one from the other. You are taking 2 points inside a single square. Those points are very close one to another, so a slight error in pixel location of of the points leads to a large error in the angle. Why do you use only a single square, when you have many squares in the image?
Here are few solutions:
Find the line angle of every square. You have at least 9 squares in the image, 4 lines in each square which give you total of 36 angles (18 will be roughly at 3[deg] and 18 will be ~93[deg]). Remove the 90[degrees] and you get 36 different measurements of the angle. Sort them and take the average of the middle 30 (disregarding the lower 3 and higher 3 measurements). This will give you an accurate result
Second solution, find the left extreme point of the leftmost square and the right extreme point of the rightmost square. Now calculate the angle between them. The result will be much more accurate because the points are far away.
A third algorithm which will give you accurate results because it doesn't involve finding any points and no need for thresholding. Just smooth the image, calculate gradients in X and Y directions (gx,gy), calculate the angle of the gradient in each pixel atan(gy,gx) and make histogram of the angles. You will have 2 significant peaks near the 3[deg] and 93[deg]. Just find the peaks by searching the maximum in the histogram. This will work even if you have a lot of noise in the image, even with antialising and jpg artifacts, and even if you have other drawings on the image. But remember, you must smooth the image a lot before calculating the derivatives.
I am currently reading into the topic of stereo vision, using the book of Hartley&Zimmerman alongside some papers, as I am trying to develop an algorithm capable of creating elevation maps from two images.
I am trying to come up with the basic steps for such an algorithm. This is what I think I have to do:
If I have two images I somehow have to find the fundamental matrix, F, in order to find the actual elevation values at all points from triangulation later on. If the cameras are calibrated this is straightforward if not it is slightly more complex (plenty of methods for this can be found in H&Z).
It is necessary to know F in order to obtain the epipolar lines. These are lines that are used in order to find image point x in the first image back in the second image.
Now comes the part were it gets a bit confusing for me:
Now I would start taking a image point x_i in the first picture and try to find the corresponding point x_i’ in the second picture, using some matching algorithm. Using triangulation it is now possible to compute the real world point X and from that it’s elevation. This process will be repeated for every pixel in the right image.
In the perfect world (no noise etc) triangulation will be done based on
x1=P1X
x2=P2X
In the real world it is necessary to find a best fit instead.
Doing this for all pixels will lead to the complete elevation map as desired, some pixels will however be impossible to match and therefore can't be triangulated.
What confuses me most is that I have the feeling that Hartley&Zimmerman skip the entire discussion on how to obtain your point correspondences (matching?) and that the papers I read in addition to the book talk a lot about disparity maps which aren’t mentioned in H&Z at all. However I think I understood correctly that the disparity is simply the difference x1_i- x2_i?
Is this approach correct, and if not where did I make mistakes?
Your approach is in general correct.
You can think of a stereo camera system as two points in space where their relative orientation is known. This are the optical centers. In front of each optical center, you have a coordinate system. These are the image planes. When you have found two corresponding pixels, you can then calculate a line for each pixel, wich goes throug the pixel and the respectively optical center. Where the two lines intersect, there is the object point in 3D. Because of the not perfect world, they will probably not intersect and one may use the point where the lines are closest to each other.
There exist several algorithms to detect which points correspond.
When using disparities, the two image planes need to be aligned such that the images are parallel and each row in image 1 corresponds to the same row in image 2. Then correspondences only need to be searched on a per row basis. Then it is also enough to know about the differences on x-axis of the single corresponding points. This is then the disparity.