Perspective projection backwards - computer-vision

[1 0 0 0; 0 1 0 0; 0 0 1/f 0][x y z 1]' = [x y z/f]' -> (f*x/z, f*y/z) = (u,v)
This converts 3D points (x,y,z) to pixels (u,v). How can I go from pixels to 3D points? Sorry, I'm not very smart.

Unfortunately, you lose depth information when you project a point. So you can recover the original 3d point only up to scale. Let's re-write your transformation like this:
calib_mat=[f 0 0 ;
0 f 0 ;
0 0 1]
I removed the last column since it doesn't have any impact. Then we have
calib_mat*[x y z]' = [fx fy z]' = z * [fx/z fy/z 1]' = z * [u v 1]'.
Now, assume you know [u v 1]' and you want to recover the 3D point. The depth z has been lost, so all you know is
calib_mat*[x y z]' = unknown_depth * [u v 1]'
Therefore,
[x y z]' = unknown_depth * inverse(calib_mat) * [u v 1]'
So you have obtained what you wanted, but only up to scale. To recover the depth of the point, you need multiple (at least two) views of the point in question (for triangulation, for example). Alternatively, if you are not in a computer vision context but in a rendering context, you can save the depth in some sort of z-buffer when you project the point.
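As a minimal sketch of the recovery step, assuming the depth has been obtained some other way (triangulation or a z-buffer): the function names, the focal length and the sample point below are illustrative and not from the question.

```python
def project(point, f):
    """Pinhole projection of a 3D point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point
    return (f * x / z, f * y / z)


def back_project(pixel, f, depth):
    """Scale the viewing ray inverse(calib_mat) * [u v 1]' by a known depth.

    For calib_mat = diag(f, f, 1) the inverse is diag(1/f, 1/f, 1), so the
    ray is [u/f, v/f, 1] and the recovered point is depth times that ray.
    """
    u, v = pixel
    return (depth * u / f, depth * v / f, depth)


f = 500.0                     # assumed focal length, in pixels
original = (0.2, -0.1, 4.0)   # hypothetical 3D point in camera coordinates
u, v = project(original, f)

# With the true depth, the original point comes back.
recovered = back_project((u, v), f, depth=4.0)

# With any other depth we get a different point on the same viewing ray,
# which projects to exactly the same pixel. This is the scale ambiguity.
other = back_project((u, v), f, depth=8.0)
```

Every choice of depth gives a valid 3D point for the same pixel, which is why a single image cannot settle it.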

When you project three dimensional space onto a two dimensional image, you lose the depth information, and it is difficult to recover it from only one frame. However, depth information can be regained if you have another image of the same scene taken from a different angle. Your brain does something similar, using the "images" from your two eyes to give you an understanding of the depth of the world around you.
The underlying principle of stereo reconstruction is best explained this way: hold any object close to your eyes, then close one eye. Then open that eye and close the other. Then do the same thing again, but move the object farther away from your eyes. You will notice that the object appears to move a lot more when you switch eyes if it is close to your eyes than if it is farther away. In the context of two images, the amount (in pixels) by which a single feature moves between two images of the same scene is called "disparity." To calculate the relative depth of the scene, simply take (1.0/disparity). To obtain the absolute depth of the scene (e.g. in meters or some unit of measurement), the focal length and baseline (distance between the two camera locations) are needed (and equations for doing so are discussed later).
Now that you know how the depth of each pixel is calculated, all that is left is to match features so that you can calculate disparity. Searching the whole second image for every pixel of the first image would quickly become unwieldy. However, the "search problem" is simplified by the fact that, for any feature in image1, there is an "epipolar line" in image2 on which the matching feature must lie, which significantly decreases the possible locations to check. The easiest way to visualize this is to think of two cameras placed such that the only difference between the first and second camera is that the second camera has been moved horizontally from the first (so the cameras are at the same height, and both are the same depth away from the scene). Say there is a ball in image1 at a certain pixel (x1, y1). Since both cameras photographed the same ball from the same height, the ball may appear at a different x in image2, but it will have the same y as it did in image1. In that case, the epipolar line is completely horizontal.
With knowledge of this epipolar line, one no longer needs to search all of image2 for the location of a feature found in image1 -- instead only the epipolar line through the location of the feature in image1 needs to be searched in image2. The cameras do not have to differ only by a horizontal translation, but that arrangement makes the computation much simpler and more intuitive, as otherwise the epipolar line would be sloped. So, in order to match feature1 from image1 to feature2 in image2, one uses a feature comparison technique (normalized cross correlation is often used) to determine the most likely location of feature2 along that line in image2.
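A rough sketch of that search, assuming the images are already rectified so the epipolar line is simply the same row in image2. The scanlines, window size and function names are made up for illustration; real code would compare 2D patches rather than 1D windows.

```python
import math


def ncc(a, b):
    """Normalized cross correlation of two equal-length intensity windows."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [p - mean_a for p in a]
    db = [p - mean_b for p in b]
    denom = math.sqrt(sum(p * p for p in da) * sum(p * p for p in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation is undefined, treat as "no match"
    return sum(p * q for p, q in zip(da, db)) / denom


def match_along_row(row1, row2, x1, half_width):
    """Find the column in row2 whose window best matches the window at x1 in row1."""
    template = row1[x1 - half_width: x1 + half_width + 1]
    best_x, best_score = None, -2.0  # NCC scores lie in [-1, 1]
    for x2 in range(half_width, len(row2) - half_width):
        window = row2[x2 - half_width: x2 + half_width + 1]
        score = ncc(template, window)
        if score > best_score:
            best_x, best_score = x2, score
    return best_x, best_score


# Hypothetical scanlines: the bright feature in row1 sits 3 pixels further left in row2.
row1 = [10, 10, 10, 10, 10, 10, 40, 90, 60, 20, 10, 10, 10, 10]
row2 = [10, 10, 10, 40, 90, 60, 20, 10, 10, 10, 10, 10, 10, 10]

x2, score = match_along_row(row1, row2, x1=7, half_width=2)
disparity = 7 - x2
```

Only one row is scanned per feature instead of the whole image, which is the saving the epipolar constraint buys.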
Given the matched location of a feature in both images, the disparity is the magnitude of the difference between the two pixel positions.
After features are matched, the depth of the pixel is calculated from its disparity through some equations shown on page 7 of these lecture notes, where b is the baseline between the cameras, and l is the focal length in the unit of measurement you wish to use (e.g. inches, meters, etc.). If you are only looking for the relative three dimensional location of the pixels in the image, and don't care about their absolute location (i.e. a point on the left of an image will still be on the left in the reconstruction, and a point farther back in the image will be farther back in the reconstruction), arbitrary non-zero values can be chosen for focal length and baseline. Those notes also explain some more intuition as to why this works if you are still curious.
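The lecture-note equations are not reproduced here. A commonly used relation for a rectified stereo pair is depth = focal_length * baseline / disparity, and the sketch below assumes that form, with focal length and disparity both in pixels and the baseline in meters. The numbers are invented for illustration.

```python
def depth_from_disparity(disparity, focal_length, baseline):
    """Absolute depth for a rectified stereo pair: Z = f * b / d.

    focal_length and disparity must share a unit (pixels here);
    the result is in the unit of the baseline.
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_length * baseline / disparity


def relative_depth(disparity):
    """Depth up to an unknown scale, as when f and b are chosen arbitrarily."""
    return 1.0 / disparity


# Assumed rig: 800 px focal length, cameras 0.1 m apart.
near = depth_from_disparity(40.0, focal_length=800.0, baseline=0.1)
far = depth_from_disparity(10.0, focal_length=800.0, baseline=0.1)
```

A feature that shifts four times as far between the images is four times closer, regardless of the values chosen for focal length and baseline.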
Feel free to ask any questions, and there's no reason to be down on yourself -- either way you are seeking out knowledge and that is commendable.

Related

Finding order of objects with OpenCV 3

With a group of friends, we are trying to accomplish a computer vision task on a Raspberry Pi, coding in C++ with the OpenCV library.
Let me explain the task first.
There is a pattern consisting of 16 separate squares, each colored red, yellow or blue. We are mounting a Raspberry Pi with its camera module on a quadcopter and gathering a video feed of the pattern.
We have to detect the colors of the squares, which was easy to accomplish with a little research on the web. The tricky part is that we also have to detect the order of the squares, so that we can save the colors in an array in order.
So far we have accomplished filtering desired colors (red, yellow, blue) to determine squares.
example pattern to recognize and our process so far
In the second image, we know the colors and center points of each square. What we need is a way to write them in an order to a file or on screen.
To find the order, we tried several OpenCV methods that find corners. With the corner points at hand, we compared the points and determined the end points so we could draw a bounding rectangle and overcome small distortions.
But since quadcopter gets the video stream, there is always a chance of high distortion. That messes up our corner theory, resulting in wrong order of colors. For example it can capture an image like this:
highly distorted image
It is not right to find order of these squares by comparing their center points. It also won't work finding endpoints to draw a larger rectangle around them to flatten pattern. And then order...
What I ask for is algorithm suggestions. Are we totally going in the wrong direction trying to find corners? Is it possible to determine the order without taking distortion into consideration?
Thanks in advance.
Take the two centers that are the furthest apart and number them 1 and 16. Then find the two centers that are the furthest from the line 1-16, to the left (number 4) and to the right (number 13). Now you have the four corners.
Compute the affine transform that maps the coordinates of the corners 1, 4 and 13 to (0,0), (3,0) and (0,3). Apply this transform to the 16 centers and round to the nearest integers. If all goes well, you will obtain the "logical" coordinates of the squares, in range [0, 3] x [0, 3]. The mapping to the cell indexes is immediate.
Note that because of symmetry, a fourfold indeterminacy will remain, which you can probably lift by checking the color patterns.
This procedure will be very robust to deformations. If there is extreme perspective, you can even use the four corners to determine a homography instead of an affine transform. In your case, I doubt this will be useful. You can check that it is working properly by verifying that all expected indexes have been assigned.
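The procedure can be sketched in plain Python as follows. This is only an illustration under assumptions: the centers come from a synthetic sheared grid generated by a hypothetical `center` helper, and which corner ends up as "1" is arbitrary (the fourfold ambiguity mentioned above). In OpenCV the affine step would typically be done with its own routines instead of the hand-written 2x2 solve.

```python
from itertools import combinations


def find_corners(centers):
    """Corners 1 and 16 are the farthest pair; 4 and 13 are farthest from line 1-16 on each side."""
    c1, c16 = max(
        combinations(centers, 2),
        key=lambda pq: (pq[0][0] - pq[1][0]) ** 2 + (pq[0][1] - pq[1][1]) ** 2,
    )

    def side(p):
        # Cross product: the sign gives the side of line c1-c16,
        # the magnitude is proportional to the distance from it.
        return (c16[0] - c1[0]) * (p[1] - c1[1]) - (c16[1] - c1[1]) * (p[0] - c1[0])

    c4 = min(centers, key=side)
    c13 = max(centers, key=side)
    return c1, c4, c13, c16


def grid_coordinates(centers, c1, c4, c13):
    """Apply the affine map c1->(0,0), c4->(3,0), c13->(0,3) and round to integers."""
    ax, ay = (c4[0] - c1[0]) / 3.0, (c4[1] - c1[1]) / 3.0     # one grid step along edge 1-4
    bx, by = (c13[0] - c1[0]) / 3.0, (c13[1] - c1[1]) / 3.0   # one grid step along edge 1-13
    det = ax * by - ay * bx
    coords = {}
    for (x, y) in centers:
        dx, dy = x - c1[0], y - c1[1]
        col = (dx * by - dy * bx) / det   # Cramer's rule for the 2x2 system
        row = (ax * dy - ay * dx) / det
        coords[(x, y)] = (round(col), round(row))
    return coords


def center(i, j):
    """Hypothetical detected center of the square in column i, row j (sheared, rotated grid)."""
    return (100 + 30 * i + 8 * j, 50 - 5 * i + 28 * j)


centers = [center(i, j) for i in range(4) for j in range(4)]
c1, c4, c13, c16 = find_corners(centers)
coords = grid_coordinates(centers, c1, c4, c13)
```

Checking that `coords` contains all 16 distinct (column, row) pairs is the "all expected indexes have been assigned" test.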

aligning 2 face images based on their marker points

I am using OpenCV and C++. I have 2 face images which contain marker points on them. I have already found the coordinates of the marker points. Now I need to align those 2 face images based on those coordinates. The 2 images are not necessarily of the same height, which is why I can't figure out how to start aligning them, what should be done, etc.
In your case, you cannot apply the homography-based alignment procedure. Why not? Because it does not fit this use case. It was designed to align flat surfaces. Faces (3D objects) with markers at different places and depths are clearly not planar surfaces.
Instead, you can:
try to match the markers between images then interpolate the displacement field of the other pixels. Classical ways of doing it will include moving least squares interpolation or RBF's;
otherwise, a more "Face Processing" way of doing it would be to decompose the face images into a texture and a face model (like AAM does) and work with that decomposition of your faces.
Define "align".
Or rather, notice that there does not exist a unique warp of the face-side image that matches the overlapping parts of the frontal one - meaning that there are infinite such warps.
So you need to better specify what your goal is, and what extra information you have, in addition to the images and a few matched points on them. For example, is your camera setup calibrated? I.e., do you know the focal lengths of the cameras and their relative positions and poses?
Are you trying to build a texture map (e.g. a projective one) so you can plaster a "merged" face image on top of a 3d model that you already have? Then you may want to look into cylindrical or spherical maps, and build a cylindrical or spherical projection of your images from their calibrated poses.
Or are you trying to reconstruct the whole 3d shape of the head based on those 2 views? Obviously you can do this only over the small strip where the two images overlap, and the quality of the images you posted seems a little too poor for that.
Or...?

Edge detection / angle

I can successfully threshold images and find edges in an image. What I am struggling with is trying to extract the angle of the black edges accurately.
I am currently taking the extreme points of the black edge and calculating the angle with the atan2 function, but because of aliasing, depending on the point you choose the angle can come out with some degree of variation. Is there a reliable programmable way of choosing the points to calculate the angle from?
Example image:
For example, the GIMP Measure tool reports the angle as 3.12°.
If you're writing your own library, then creating a robust solution for this problem will allow you to develop several independent chunks of code that you can string together to solve other problems, too. I'll assume that you want to find the corners of the checkerboard under arbitrary rotation, under varying lighting conditions, in the presence of image noise, with a little nonlinear pincushion/barrel distortion, and so on.
Although there are simple kernel-based techniques to find whole pixels as edge pixels, when working with filled polygons you'll want to favor algorithms that can find edges with sub-pixel accuracy so that you can perform accurate line fits. Even though the gradient from dark square to white square crosses several pixels, the "true" edge will be found at some sub-pixel point, and very likely not the point you'd guess by manually clicking.
I tried to provide a simple summary of edge finding in this older SO post:
what is the relationship between image edges and gradient?
For problems like yours, a robust solution is to find edge points along the dark-to-light transitions with sub-pixel accuracy, then fit lines to the edge points, and use the line angles. If you are processing a true camera image, and if there is an uncorrected radial distortion in the image, then there are some potential problems with measurement accuracy, but we'll ignore those.
If you want to find an accurate fit for an edge, then it'd be great to scan for sub-pixel edges in a direction perpendicular to that edge. That presupposes that we have some reasonable estimate of the edge direction to begin with. We can first find a rough estimate of the edge orientation, then perform an accurate line fit.
The algorithm below may appear to have too many steps, but my purpose is to point out how to provide a robust solution.
Perform a few iterations of erosion on black pixels to separate the black boxes from one another.
Run a connected components algorithm (blob-finding algorithm) to find the eroded black squares.
Identify the center (x,y) point of each eroded square as well as the (x,y) end points defining the major and minor axes.
Maintain the data for each square in a structure that has the total area in pixels, the center (x,y) point, the (x,y) points of the major and minor axes, etc.
As needed, eliminate all components (blobs) that are too small. For example, you would want to exclude all "salt and pepper" noise blobs. You might also temporarily ignore checkerboard squares that are cut off by the image edges--we can return to those later.
Then you'll loop through your list of blobs and do the following for each blob:
Determine the direction roughly perpendicular to the edges of the checkerboard square. How you accomplish this depends in part on what data you calculate when you run your connected components algorithm. In a general-purpose image processing library, a standard connected components algorithm will determine dozens of properties and measurements for each individual blob: area, roundness, major axis direction, minor axis direction, end points of the major and minor axis, etc. For rectangular figures, it can be sufficient to calculate the topmost, leftmost, rightmost, and bottommost points, as these will define the four corners.
Generate edge scans in the direction roughly perpendicular to the edges. These must be performed on the original, unmodified image. This generally assumes you have bilinear interpolation implemented to find the grayscale values of sub-pixel (x,y) points such as (100.35, 25.72) since your scan lines won't fall exactly on whole pixels.
Use a sub-pixel edge point finding technique. In general, you'll perform a curve fit to the edge points in the direction of the scan, then find the real-valued (x,y) point at maximum gradient. That's the edge point.
Store all sub-pixel edge points in a list/array/collection.
Generate line fits for the edge points. These can use Hough, RANSAC, least squares, or other techniques.
From the line equations for each of your four line fits, calculate the line angle.
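The last two steps (line fit, then angle) might look like the sketch below. It uses a total least squares fit via second moments, which is one of several possible choices and not necessarily the one the steps above have in mind. The edge points are synthetic, generated along a 3.12° line with a small alternating offset standing in for sub-pixel noise.

```python
import math


def fit_line_angle(points):
    """Angle in degrees of the total least squares line through (x, y) points.

    Uses the principal axis of the point scatter:
    theta = 0.5 * atan2(2 * sxy, sxx - syy).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))


# Hypothetical sub-pixel edge points along one side of a square.
slope = math.tan(math.radians(3.12))
edge_points = [(x, x * slope + 0.1 * (-1) ** x) for x in range(50)]

angle = fit_line_angle(edge_points)
```

Because the fit uses all 50 points rather than two extreme ones, the individual offsets largely cancel out.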
That algorithm finds the angles independently for each black checkerboard square. It may be overkill for this one application, but if you're developing a library maybe it'll give you some ideas about what sub-algorithms to implement and how to structure them. For example, the algorithm would rely on implementations of these techniques:
Image morphology (e.g. erode, dilate, close, open, ...)
Kernel operations to implement morphology
Thresholding to binarize an image -- the Otsu method is worth checking out
Connected components algorithm (a.k.a. blob finding, or the OpenCV contours function)
Data structure for blob
Moment calculations for blob data
Bilinear interpolation to find sub-pixel (x,y) values
A linear ray-scanning technique to find (x,y) gray values along a specific direction (which will also rely on bilinear interpolation)
A curve fitting technique and means to determine steepest tangent to find edge points
Robust line fit technique: Hough, RANSAC, and/or least squares
Data structure for line equation, related functions
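Of the building blocks listed above, bilinear interpolation and ray scanning are small enough to sketch. The version below assumes a grayscale image stored as a list of rows and sample points that lie inside the image. The function names are illustrative, not taken from any library.

```python
import math


def bilinear(image, x, y):
    """Gray value at a sub-pixel point; x is the column coordinate, y the row coordinate."""
    height, width = len(image), len(image[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)  # clamp at the far border
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def scan_line(image, start, direction, count, step=1.0):
    """Sample gray values at `count` points from `start` along a unit `direction` vector."""
    sx, sy = start
    dx, dy = direction
    return [bilinear(image, sx + k * step * dx, sy + k * step * dy) for k in range(count)]


tiny = [[0, 10],
        [20, 30]]
middle = bilinear(tiny, 0.5, 0.5)                       # average of the four pixels
profile = scan_line(tiny, (0.0, 0.0), (1.0, 0.0), 3, step=0.5)
```

The `profile` list is the kind of 1D intensity profile on which a sub-pixel edge finder would then locate the steepest gradient.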
All that said, if you're willing to settle for a slight loss of accuracy, and if you know that the image does not suffer from radial distortion, etc., and if you just need to find the angle of the parallel lines defined by all checkerboard edges, then you might try:
Simple kernel-based edge point finding technique (Laplacian on Gaussian-smoothed image)
Hough line fit to edge points
Choose the two line fits with the greatest number of votes, which should be one set of horizontal-ish lines and the other set of vertical-ish lines
There are also other techniques that are less accurate but easier to implement:
Use a kernel-based corner-finding operator
Find the angles between corner points.
And so on and so on. As you're developing your library and creating robust implementations of standalone functions that you can string together to create application-specific solutions, you're likely to find that robust solutions rely on more steps than you would have guessed, but it'll also be more clear what the failure mode will be at each incremental step, and how to address that failure mode.
Can I ask, what C++ library are you using to code this?
Jerry is right: if you actually apply a threshold to the image, the result is two-level (1-bit), black OR white. What you may have applied is a kind of limiter instead.
You can make a threshold function (if you're coding the image processing yourself) by applying the limiter you may have been using and then turning all non-white pixels black. If you have the right settings, the squares should be isolated and you will be able to calculate the angle.
Once this is done you can use a path finding algorithm to find some edge, any edge will do. If you find a more or less straight path, you can use the extreme points as you are doing now to determine the angle. Since the checker-board rotation is only relevant within 90 degrees, your angle should be modulo 90 degrees or pi over 2 radians.
I'm not sure it's (anywhere close to) the right answer, but my immediate reaction would be to threshold twice: once where anything but black is treated as white, and once where anything but white is treated as black.
Find the angle for each, then interpolate between the two angles.
Your problem has a few solutions, but all share one very important issue which you seem to neglect. Note: when you make geometrical calculations in an image, the points you use must be as far apart as possible. You are taking 2 points inside a single square. Those points are very close to one another, so a slight error in the pixel location of the points leads to a large error in the angle. Why do you use only a single square, when you have many squares in the image?
Here are few solutions:
Find the line angle of every square. You have at least 9 squares in the image, with 4 lines in each square, which gives you a total of 36 angles (18 will be roughly at 3[deg] and 18 will be ~93[deg]). Subtract 90[deg] from the second group and you get 36 measurements of the same angle. Sort them and take the average of the middle 30 (disregarding the lowest 3 and highest 3 measurements). This will give you an accurate result.
Second solution, find the left extreme point of the leftmost square and the right extreme point of the rightmost square. Now calculate the angle between them. The result will be much more accurate because the points are far away.
A third algorithm will give you accurate results because it doesn't involve finding any points and needs no thresholding. Just smooth the image, calculate gradients in the X and Y directions (gx, gy), calculate the angle of the gradient at each pixel with atan2(gy, gx), and make a histogram of the angles. You will have 2 significant peaks near 3[deg] and 93[deg]. Just find the peaks by searching for the maxima in the histogram. This will work even if you have a lot of noise in the image, even with antialiasing and JPEG artifacts, and even if you have other drawings on the image. But remember, you must smooth the image a lot before calculating the derivatives.
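A small sketch of the third approach, under simplifying assumptions: the test image is a single synthetic, already-smooth edge (so no separate smoothing pass is shown and only one peak appears), gradients are central differences, and histogram votes are weighted by gradient magnitude. All names are illustrative.

```python
import math


def edge_image(width, height, angle_deg, softness=2.0):
    """Synthetic smooth dark-to-light edge whose line makes angle_deg with the x axis."""
    t = math.radians(angle_deg)
    nx, ny = -math.sin(t), math.cos(t)          # unit normal to the edge line
    cx, cy = width / 2.0, height / 2.0
    return [[1.0 / (1.0 + math.exp(-((x - cx) * nx + (y - cy) * ny) / softness))
             for x in range(width)] for y in range(height)]


def dominant_gradient_angle(image, bins=180):
    """Center of the strongest bin in a magnitude-weighted histogram of gradient angles (mod 180)."""
    height, width = len(image), len(image[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle * bins / 180.0) % bins] += math.hypot(gx, gy)
    peak = max(range(bins), key=hist.__getitem__)
    return (peak + 0.5) * 180.0 / bins


image = edge_image(64, 64, angle_deg=3.0)
gradient_angle = dominant_gradient_angle(image)   # perpendicular to the edge
edge_angle = gradient_angle - 90.0
```

The gradient points across the edge, so the peak sits 90 degrees away from the edge direction. With 1-degree bins the estimate is only as fine as the bin width; interpolating around the peak would refine it.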

What is the difference between a disparity map and a disparity image in stereo matching?

I am new to stereo matching. I couldn't understand the concept of disparity. What are a disparity map and disparity image, and what is the difference between them?
Disparity
Disparity refers to the distance between two corresponding points in the left and right image of a stereo pair. If you look at the image below you see a labelled point X (ignore X1, X2 & X3). By following the dotted line from X to OL you see the intersection point with the left hand plane at XL. The same principle applies to the right-hand image plane.
If X projects to a point in the left frame XL = (u,v) and to the right frame at XR = (p,q) you can find the disparity for this point as the magnitude of the vector between (u,v) and (p,q).
Obviously this process involves choosing a point in the left hand frame and then finding its match (often called the corresponding point) in the right hand image; often this is a particularly difficult task to do without making a lot of mistakes.
Disparity Map/Image
If you were to perform this matching process for every pixel in the left hand image, finding its match in the right hand frame and computing the distance between them you would end up with an image where every pixel contained the distance/disparity value for that pixel in the left image.
Example
Given a left image
And a right image
By matching every pixel in the left hand image with its corresponding pixel in the right hand image and computing the distance between the two pixel positions (the disparities) you should end up with images that look like this:
This bottom image is known as a disparity image/map. A useful topic to read about when performing stereo matching is rectification. This will make the process of matching pixels in the left and right image considerably faster as the search will be horizontal.
One of the easiest methods to understand the disparity would be to blink your eyes, one at a time, alternating between your left and right eye. If you observe, the objects closer to you would appear to jump about their position more than the objects further away. This shift would become negligible as the objects move away. Therefore, in the disparity map, the brighter shades represent more shift and lesser distance from the point of view (camera). The darker shades represent lesser shift and therefore greater distance from the camera.

Row by row pixel checking of an image

I took the difference of two consecutive frames of a video. What I got (as you know) is a black frame except for the moving objects, which are white. I want to count the number of white pixels in the frame. That is, I want to go through the image row by row, and if the value of the ith pixel is greater than a specified number (say 50), store it in an array. Later on I will use this array to check whether there is actually an object or just noise. For example, if a car is moving in the video, then after frame differencing I will check each pixel of the frames containing the car, row by row, to detect that car, since the pixel values of a moving car are greater than 0 after frame differencing. Any idea how I can sum all the pixels of the moving car so that I can decide whether it is a car or just noise?
Thanks in advance :)
You'll probably find that the difference is non-trivial. For instance, you will probably find that the biggest difference is near the edges of the car, perpendicular to the movement of the car. One of those two edges will have negative values, one positive. Therefore, the biggest advantage of the "difference image" is that you restrict your search area. In isolation it's not very useful.
So, what should you do? Well, use an edge detection algorithm on the normal image, and compare the edge found there with the 2 edges found in the difference image. The edges belonging to the car will connect the 2 edges from the difference image.
You could use blob detection: http://www.labbookpages.co.uk/software/imgProc/blobDetection.html
to detect a blob of white pixels in each "difference image". Once you have the blobs you can find their center by finding the average of their pixel positions. Then you can find the path swept out by these centers and check it against some criterion.
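As a rough illustration of that idea (not the method from the linked page), the sketch below thresholds a difference image, groups bright pixels with a 4-connected flood fill, drops blobs that are too small to be anything but noise, and reports each blob's area and center. The threshold of 50 comes from the question; the minimum area and the tiny frame are made up.

```python
from collections import deque


def find_blobs(diff, threshold=50, min_area=3):
    """Return area and center of each 4-connected group of pixels above threshold."""
    height, width = len(diff), len(diff[0])
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or diff[y][x] <= threshold:
                continue
            # Breadth-first flood fill starting from this unvisited bright pixel.
            seen[y][x] = True
            queue = deque([(x, y)])
            pixels = []
            while queue:
                px, py = queue.popleft()
                pixels.append((px, py))
                for qx, qy in ((px + 1, py), (px - 1, py), (px, py + 1), (px, py - 1)):
                    if (0 <= qx < width and 0 <= qy < height
                            and not seen[qy][qx] and diff[qy][qx] > threshold):
                        seen[qy][qx] = True
                        queue.append((qx, qy))
            if len(pixels) >= min_area:          # small blobs are treated as noise
                cx = sum(p[0] for p in pixels) / len(pixels)
                cy = sum(p[1] for p in pixels) / len(pixels)
                blobs.append({"area": len(pixels), "center": (cx, cy)})
    return blobs


# Hypothetical difference image: one 2x2 moving object and one isolated noise pixel.
frame = [
    [0,   0,   0, 0, 0,  0],
    [0, 200, 220, 0, 0,  0],
    [0, 210, 230, 0, 0, 90],
    [0,   0,   0, 0, 0,  0],
]
blobs = find_blobs(frame)
```

The `area` field is the white-pixel count the question asks about, restricted to one connected object, and the `center` values from successive frames give the path to test.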
Without knowing more about your images I cannot suggest a criterion, but for example if you are watching them move down a straight road you might expect all the points to be roughly collinear. In this case, you can take the gradient and a point where a blob is found and use the point-gradient form of a line to get the line's equation:
y - y_1 = m(x - x_1)
For example given a point (4, 2) and gradient 3 you would get
y - 2 = 3(x - 4)
y = 3x - 10
You can then check all points against this line to see if they lie along it.
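A minimal sketch of that check, using the point and gradient from the worked example. The tolerance and the test points are arbitrary illustrative values, and the perpendicular distance is one possible way of measuring "lies along the line".

```python
import math


def line_through(point, gradient):
    """Slope-intercept form (m, c) of the line y - y_1 = m(x - x_1)."""
    x1, y1 = point
    return gradient, y1 - gradient * x1


def lies_on_line(point, gradient, intercept, tolerance=1.0):
    """True if the perpendicular distance from point to y = m*x + c is within tolerance."""
    x, y = point
    distance = abs(gradient * x - y + intercept) / math.hypot(gradient, 1.0)
    return distance <= tolerance


m, c = line_through((4, 2), 3)

# Hypothetical blob centers from later frames.
on_track = lies_on_line((5, 5), m, c)      # exactly on the line
off_track = lies_on_line((5, 20), m, c)    # far from it
```

A tolerance of about a pixel allows for the jitter in blob centers from frame to frame.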