Finding the Z coordinate using a disparity map - computer-vision

I have found the disparity map of two stereoscopic images, and now I have to write OpenGL code to visualize it for 3D reconstruction.
OpenGL has the function glVertex3f(), which takes three coordinates.
Two of the dimensions are the point's coordinates on the image.
So how do I find the z dimension using the disparity map?
Please suggest something on this.

Since you have found the disparity map, I assume you are working with rectified images. In that case, the Z coordinate is given by a simple similar-triangles formulation:
z = B*f/d, where f is the focal length of the camera used (in pixels), d is the obtained disparity value for the pixel of interest, and B is the baseline between the two stereo images.
Note that the unit of z will be the same as that of B.
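As a minimal sketch of that formula (not part of the original answer), the conversion from a disparity map to per-pixel depth could look like this; the names disparity, focal_px and baseline are assumptions:

#include <opencv2/core.hpp>

// Convert a CV_32F disparity map (in pixels) to depth using z = B*f/d.
// The depth comes out in the same unit as the baseline.
cv::Mat disparityToDepth(const cv::Mat& disparity, float focal_px, float baseline)
{
    CV_Assert(disparity.type() == CV_32F);
    cv::Mat depth(disparity.size(), CV_32F, cv::Scalar(0));
    for (int y = 0; y < disparity.rows; ++y)
        for (int x = 0; x < disparity.cols; ++x)
        {
            float d = disparity.at<float>(y, x);
            if (d > 0.0f)                          // skip invalid/zero disparities
                depth.at<float>(y, x) = baseline * focal_px / d;
        }
    return depth;
}

The resulting depth can then be fed as the third argument of glVertex3f (possibly negated, since the OpenGL camera looks down the negative Z axis).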

Related

Film coordinate to world coordinate

I am working on building a 3D point cloud from feature matching using OpenCV 3.1 and OpenGL.
I have implemented 1) camera calibration (hence I have the intrinsic matrix of the camera) and 2) feature extraction (hence I have 2D points in pixel coordinates).
I went through a few websites, but generally all of them suggest the flow for converting 3D object points to pixel points, whereas I am doing completely backward projection. Here is the ppt that explains it well.
I have computed film coordinates (u,v) from pixel coordinates (x,y) (with the help of the intrinsic matrix). Can anyone shed light on how I can recover "Z" of the camera coordinates (X,Y,Z) from the film coordinates (u,v)?
Please guide me on how I can utilize OpenCV functions such as solvePnP, recoverPose, findFundamentalMat and findEssentialMat for the desired goal.
With a single camera and the object rotating on a fixed rotation platform, I would implement something like this:
Each camera has a resolution xs,ys and a field of view FOV defined by two angles FOVx,FOVy, so either check your camera's data sheet or measure it. From that and the perpendicular distance (z) you can convert any pixel position (x,y) to a 3D coordinate relative to the camera (x',y',z'). So first convert the pixel position to angles:
ax = (x - (xs/2)) * FOVx / xs
ay = (y - (ys/2)) * FOVy / ys
and then compute the Cartesian position in 3D:
x' = distance * tan(ax)
y' = distance * tan(ay)
z' = distance
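A small sketch that directly transcribes the formulas above (xs, ys, FOVx, FOVy in radians and distance are assumed to be known calibration values):

#include <cmath>

struct Point3 { double x, y, z; };

// Pixel (x,y) -> angles (ax,ay) -> camera-space point (x',y',z') at the given
// perpendicular distance.
Point3 pixelToCamera(double x, double y,
                     double xs, double ys,      // resolution
                     double FOVx, double FOVy,  // field of view [rad]
                     double distance)
{
    double ax = (x - xs / 2.0) * FOVx / xs;
    double ay = (y - ys / 2.0) * FOVy / ys;
    return { distance * std::tan(ax), distance * std::tan(ay), distance };
}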
That is nice, but on a common image we do not know the distance. Luckily, on such a setup, if we turn our object then any convex edge will reach a maximum ax angle at the sides when it crosses the plane perpendicular to the camera. So check a few frames, and if a maximal ax is detected you can assume it is an edge (or convex bump) of the object positioned at distance.
If you also know the rotation angle ang of your platform (relative to your camera), then you can compute the un-rotated position by using the rotation formula around the y axis (the Ay matrix in the link) and the known platform center position relative to the camera (just a subtraction before the un-rotation)... As I mentioned, all this is just simple geometry.
In a nutshell:
obtain calibration data
FOVx,FOVy,xs,ys,distance. Some camera datasheets list only FOVx, but if the pixels are square you can approximate FOVy from the resolution as
FOVx/FOVy = xs/ys
Beware: with multi-resolution camera modes the FOV can be different for each resolution !!!
extract the silhouette of your object in the video for each frame
you can subtract the background image to ease up the detection (see the sketch at the end of this answer)
obtain platform angle for each frame
so either use IRC data or place known markers on the rotation disc and detect/interpolate...
detect ax maximum
just inspect the x coordinate of the silhouette (for each y line of the image separately), and if a peak is detected add its 3D position to your model. Let's assume a rotating rectangular box and picture how its silhouette changes from frame to frame.
Inspect one horizontal line on all frames and find the maximal ax. To improve accuracy you can do a closed-loop regulation by turning the platform until the peak is found "exactly". Do this for all horizontal lines separately.
btw. if you detect no ax change over a few frames, that means a circular shape with the same radius ... so you can handle each such frame as an ax maximum.
Easy as pie, resulting in a 3D point cloud, which you can sort by platform angle to ease up the conversion to a mesh ... That angle can also be used as a texture coordinate ...
But do not forget that you will lose some concave details that are hidden in the silhouette !!!
If this approach is not enough, you can use this same setup for stereoscopic 3D reconstruction, because each rotation behaves as a new (known) camera position.
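A hedged sketch of the background-subtraction step mentioned in the list above (not from the original answer): a plain absolute difference against a frame of the empty platform, followed by a threshold. The frame names and the threshold value are assumptions to be tuned per setup.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Binary silhouette = threshold(|frame - background|).
cv::Mat extractSilhouette(const cv::Mat& frame, const cv::Mat& background)
{
    cv::Mat grayFrame, grayBg, diff, silhouette;
    cv::cvtColor(frame, grayFrame, cv::COLOR_BGR2GRAY);
    cv::cvtColor(background, grayBg, cv::COLOR_BGR2GRAY);
    cv::absdiff(grayFrame, grayBg, diff);
    cv::threshold(diff, silhouette, 30, 255, cv::THRESH_BINARY);   // threshold is a guess
    return silhouette;
}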
You can't, if all you have is 2D images from that single camera location.
In theory you could use heuristics to infer a Z stacking. But mathematically your problem is underdetermined and there are literally infinitely many different Z coordinates that would satisfy your constraints. You have to supply some extra information. For example, you could move your camera around over several frames (Google "structure from motion"), or you could use multiple cameras, or use a camera that has a depth sensor and gives you complete XYZ tuples (Kinect or similar).
Update due to comment:
For every pixel in a 2D image there is an infinite number of points that project to it. The technical term for that is a ray. If you have two 2D images of roughly the same volume of space, each image's set of rays (one for each pixel) intersects with the set of rays corresponding to the other image. Which is to say that if you determine the ray for a pixel in image #1, it maps to a line of pixels covered by that ray in image #2. Selecting a particular pixel along that line in image #2 will give you the XYZ tuple for that point.
Since you're rotating the object by a certain angle θ about a certain axis a between images, you actually have a lot of images to work with. All you have to do is derive the camera location by an additional transformation: inverse(translate(-a)·rotate(θ)·translate(a)).
Then do the following: select an image to start with. For the particular pixel you're interested in, determine the ray it corresponds to. For that, simply assume two Z values for the pixel; 0 and 1 work just fine. Transform them back into the space of your object, then project them into the view space of the next camera you chose to use; the result will be two points in the image plane (possibly outside the limits of the actual image, but that's not a problem). These two points define a line within that second image. Find the pixel along that line that matches the pixel in the first image you selected and project that back into space, as done with the first image. Due to numerical round-off errors you're not going to get a perfect intersection of the rays in 3D space, so find the point where the rays are closest to each other (this boils down to minimizing a quadratic, which is trivial).
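A minimal sketch of that last step (all names here are illustrative, not from the answer): given two rays as origin + t·direction, solve the small linear system that minimizes their squared distance and take the midpoint of the closest pair as the reconstructed point.

#include <opencv2/core.hpp>

// Closest-approach midpoint of two rays p1 + s*d1 and p2 + t*d2.
cv::Point3d closestPointOfRays(const cv::Point3d& p1, const cv::Point3d& d1,
                               const cv::Point3d& p2, const cv::Point3d& d2)
{
    cv::Point3d w0 = p1 - p2;
    double a = d1.dot(d1), b = d1.dot(d2), c = d2.dot(d2);
    double d = d1.dot(w0), e = d2.dot(w0);
    double denom = a * c - b * b;          // tends to 0 for (nearly) parallel rays
    double s = (b * e - c * d) / denom;
    double t = (a * e - b * d) / denom;
    cv::Point3d q1 = p1 + s * d1;          // closest point on ray 1
    cv::Point3d q2 = p2 + t * d2;          // closest point on ray 2
    return 0.5 * (q1 + q2);
}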
To select which pixel you want to match between images, you can use some feature/motion tracking algorithm, as used in video compression or similar. The basic idea is that for every pixel, a correlation of its surroundings is performed with the same region in the previous image; where the correlation peaks is where it most likely moved from.
With this pixel tracking in place you can then derive the structure of the object. This is essentially what structure from motion does.
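As a hedged illustration of that correlation-based matching (not the answer's own code), one could take a small patch around a pixel in the first image and search for its best match inside a window of the second image with cv::matchTemplate; the patch and window sizes are arbitrary assumptions.

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Find where the patch around 'pix' in img1 best correlates inside a search
// window of img2, and return the matched position in img2.
cv::Point2f trackPixel(const cv::Mat& img1, const cv::Mat& img2, cv::Point pix,
                       int patch = 15, int search = 61)
{
    cv::Rect tplRect(pix.x - patch / 2, pix.y - patch / 2, patch, patch);
    cv::Rect searchRect(pix.x - search / 2, pix.y - search / 2, search, search);
    tplRect &= cv::Rect(0, 0, img1.cols, img1.rows);   // clamp to image bounds
    searchRect &= cv::Rect(0, 0, img2.cols, img2.rows);

    cv::Mat result;
    cv::matchTemplate(img2(searchRect), img1(tplRect), result, cv::TM_CCOEFF_NORMED);
    cv::Point maxLoc;
    cv::minMaxLoc(result, nullptr, nullptr, nullptr, &maxLoc);   // correlation peak
    return cv::Point2f(searchRect.x + maxLoc.x + tplRect.width / 2.0f,
                       searchRect.y + maxLoc.y + tplRect.height / 2.0f);
}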

OpenCV Structure from Motion Reprojection Issue

I am currently facing an issue with my Structure from Motion program based on OpenCv.
I'm going to try to describe what it does, and what it is supposed to do.
This program relies on the classic "structure from motion" method.
The basic idea is to take a pair of images, detect their keypoints and compute the descriptors of those keypoints. Then the keypoint matching is done, with a certain number of tests to ensure the result is good. That part works perfectly.
Once this is done, the following computations are performed: fundamental matrix, essential matrix, SVD decomposition of the essential matrix, camera matrix computation and, finally, triangulation.
The result for a pair of images is a set of 3D coordinates, giving us points to be drawn in a 3D viewer. This works perfectly, for a pair.
Indeed, here is my problem: for a pair of images, the 3D point coordinates are calculated in the coordinate system of the first image of the pair, taken as the reference image. When working with more than two images, which is the objective of my program, I have to reproject the 3D points into the coordinate system of the very first image in order to get a consistent result.
My question is: how do I reproject 3D point coordinates given in one camera-related system into another camera-related system? With the camera matrices?
My idea was to take the 3D point coordinates and multiply them by the inverse of each preceding camera matrix.
Let me clarify:
Suppose I am working on the third and fourth images (hence, the third pair of images, because I am working like 1-2 / 2-3 / 3-4 and so on).
I get my 3D point coordinates in the coordinate system of the third image; how do I reproject them properly into the very first image's coordinate system?
I would have done the following :
Get the 3D point coordinate matrix, apply the inverse of the camera matrix for images 2 to 3, and then apply the inverse of the camera matrix for images 1 to 2.
Is that even correct ?
Because those camera matrices are non-square, I can't invert them.
I am surely mistaken somewhere, and I would be grateful if someone could enlighten me. I am pretty sure this is a relatively easy one, but I am obviously missing something...
Thanks a lot for reading :)
Let us say you have a 3×4 extrinsic parameter matrix called P. To match the notation of the OpenCV documentation, this is [R|t].
This matrix P describes the projection from world space coordinates to the camera space coordinates. To quote the documentation:
[R|t] translates coordinates of a point (X, Y, Z) to a coordinate system, fixed with respect to the camera.
You are wondering why this matrix is non-square. That is because in the usual context of OpenCV, you are not expecting homogeneous coordinates as output. Therefore, to make it square, just add a fourth row containing (0,0,0,1). Let's call this new square matrix Q.
You have one such matrix for each pair of cameras, that is, you have one Qk matrix for each pair of images {k,k+1} that describes the projection from the coordinate space of camera k to that of camera k+1. Those matrices are invertible because they describe isometries in homogeneous coordinates.
To go from the coordinate space of camera 3 to that of camera 1, just apply to your points the inverse of Q2 and then the inverse of Q1.
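A minimal sketch of that recipe with OpenCV types (the function names and the CV_64F assumption are mine, not the answer's): pad each [R|t] to a 4×4 matrix Q, then map a homogeneous point from the frame of camera 3 back to camera 1 by applying inv(Q2) followed by inv(Q1).

#include <opencv2/core.hpp>

// Pad a 3x4 [R|t] (CV_64F) with the row (0,0,0,1) to make it invertible.
cv::Mat makeSquare(const cv::Mat& P)
{
    cv::Mat Q = cv::Mat::eye(4, 4, CV_64F);
    P.copyTo(Q(cv::Rect(0, 0, 4, 3)));     // top 3x4 block = [R|t]
    return Q;
}

// Xcam3 is a 4x1 homogeneous point (CV_64F) in the frame of camera 3.
cv::Mat toFirstCamera(const cv::Mat& Xcam3, const cv::Mat& Q1, const cv::Mat& Q2)
{
    return Q1.inv() * (Q2.inv() * Xcam3);  // camera 3 -> camera 2 -> camera 1
}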

OpenCV - things to do between stereoCalibrate and collecting 2x2D coordinates

I've calibrated a pair of cameras, and I have a loop where markings on a person's legs are tracked and their positions saved. What now?
Do I have to cv::undistort the images before saving the pixel coordinates? Is there anything else that has to be done before I try to convert these pairs of 2D coordinates to 3D (I don't know how, but that's for later)?
Yes, you need to remove lens distortion before doing triangulation. If you are going to apply cv::undistort on the whole images though, you have to apply it prior to detecting/tracking the points of interest.
If you are not going to compute a disparity image and you are only interested in a few points, it is more efficient to detect/track the points of interest on the raw image and then apply cv::undistortPoints on these selected points.
After you remove distortion from the points, you can compose the projection matrices for the two cameras and carry out triangulation.
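A hedged sketch of that flow for tracked point pairs (K1, D1, K2, D2, R, T are assumed to come from stereoCalibrate; everything else, including the function name, is illustrative):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Undistort/normalise the tracked points, build the two projection matrices in
// normalised coordinates (camera 1 as reference), triangulate and return the
// first reconstructed point.
cv::Point3d triangulateTracked(const std::vector<cv::Point2f>& ptsL,
                               const std::vector<cv::Point2f>& ptsR,
                               const cv::Mat& K1, const cv::Mat& D1,
                               const cv::Mat& K2, const cv::Mat& D2,
                               const cv::Mat& R, const cv::Mat& T)
{
    std::vector<cv::Point2f> normL, normR;
    cv::undistortPoints(ptsL, normL, K1, D1);   // cheaper than cv::undistort on full images
    cv::undistortPoints(ptsR, normR, K2, D2);

    cv::Mat P1 = cv::Mat::eye(3, 4, CV_64F);    // [I|0]
    cv::Mat P2(3, 4, CV_64F);                   // [R|T]
    R.copyTo(P2(cv::Rect(0, 0, 3, 3)));
    T.reshape(1, 3).copyTo(P2(cv::Rect(3, 0, 1, 3)));

    cv::Mat X;                                  // 4xN homogeneous points
    cv::triangulatePoints(P1, P2, normL, normR, X);
    X.convertTo(X, CV_64F);
    X.col(0) /= X.at<double>(3, 0);             // de-homogenise the first point
    return { X.at<double>(0, 0), X.at<double>(1, 0), X.at<double>(2, 0) };
}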

Generate subpixel accuracy disparity map in OpenCV

The purpose is to generate a disparity map for a pair of calibrated stereo images.
A 3D model is projected onto a pair of calibrated stereo images (left/right) using the OpenCV function cv::projectPoints(). cv::projectPoints() returns points in 2D image coordinates as cv::Point2f, which has subpixel accuracy.
As some 3D points are projected onto the same pixel region, I only keep the point with the smaller depth/Z, because farther points are occluded by the closer point.
By doing this, I get two index images, for the left and right image respectively. Each pixel of an index image refers to the point's position in the 3D model (stored in a std::vector) or in 2D (a std::vector).
The following snippet should briefly explain the procedure:
std::vector<cv::Point3f> model_3D;
std::vector<cv::Point2f> projectedPointsL, projectedPointsR;
cv::projectPoints(model_3D, rvec, tvec, P1.colRange(0,3), cv::noArray(), projectedPointsL);
cv::projectPoints(model_3D, rvec, tvec, P2.colRange(0,3), cv::noArray(), projectedPointsR);
// Each pixel in indexImage is an index pointing to a position in the vector of projected points
cv::Mat indexImageL, indexImageR;
// This function filter projected points and return the index image
filterProjectedPoints(projectedPointsL, model_3D, indexImageL);
filterProjectedPoints(projectedPointsR, model_3D, indexImageR);
In order to generate the disparity map, I can either:
1. For each pixel in the disparity map, find the corresponding pixel positions in the left/right index images and subtract them. This gives integer disparity (not subpixel accuracy);
2. For each pixel in the disparity map, find its 2D (floating-point) position among the left/right projected points and take the difference along the x axis as the disparity. This gives subpixel-accuracy disparity (see the sketch below).
The first way is straightforward but introduces error by ignoring the subpixel positions of the projected points.
However, the second way also introduces error, as a pair of projected pixels (from the same 3D point) may be projected to different locations within a pixel. For example, a projected point in the left image is (115.289, 80.393) and in the right image it is (145.686, 79.883). Its position in the disparity map will be (115, 80) and the disparity can be: 145.686 - 115.289 = 30.397. As you can see, the two points may not be exactly row-aligned with the same y coordinate.
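A hedged sketch of option 2, reusing the names from the snippet above; it is assumed here that the index images store CV_32S indices into the projected-point vectors, with -1 marking pixels that received no 3D point.

#include <opencv2/core.hpp>
#include <vector>

// Subpixel disparity from the floating-point projections of the same 3D point.
cv::Mat subpixelDisparity(const cv::Mat& indexImageL,
                          const std::vector<cv::Point2f>& projectedPointsL,
                          const std::vector<cv::Point2f>& projectedPointsR)
{
    cv::Mat disparity(indexImageL.size(), CV_32F, cv::Scalar(-1.0f));  // -1 = invalid
    for (int y = 0; y < indexImageL.rows; ++y)
        for (int x = 0; x < indexImageL.cols; ++x)
        {
            int idx = indexImageL.at<int>(y, x);
            if (idx < 0) continue;                                     // no point projected here
            // x difference of the two subpixel projections (right minus left, as in the example)
            disparity.at<float>(y, x) = projectedPointsR[idx].x - projectedPointsL[idx].x;
        }
    return disparity;
}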
Questions are:
1. Are both ways correct (apart from the error they introduce)?
2. If the 2nd way is correct, is the error negligible when computing subpixel-accuracy disparity?
You can also tell me how you would calculate a subpixel disparity map in this scenario.
In a rectified image pair, the disparity is simply d = f*b/(z+f), where f is the focal length, b is the baseline between the two cameras and z is the distance to the object normal to the image planes (measured from the image plane, hence the z+f term). This assumes a basic pinhole camera model.
For z >> f this reduces to d = f*b/z, i.e. d is reciprocal to z.
So you can just calculate the disparity map analytically.
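A minimal sketch of that analytic computation under the z >> f approximation (the depth image depthL, aligned with the left view, is an assumption; it can be filled from the known 3D model):

#include <opencv2/core.hpp>

// Fill the disparity map directly from d = f*b/z; subpixel by construction.
cv::Mat analyticDisparity(const cv::Mat& depthL, float focal_px, float baseline)
{
    cv::Mat disp(depthL.size(), CV_32F, cv::Scalar(0));
    for (int y = 0; y < depthL.rows; ++y)
        for (int x = 0; x < depthL.cols; ++x)
        {
            float z = depthL.at<float>(y, x);
            if (z > 0.0f)
                disp.at<float>(y, x) = focal_px * baseline / z;
        }
    return disp;
}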

3d object overlay - augmented reality irrlicht + opencv

I am trying to develop an augmented reality program that overlays a 3D object on top of a marker. The model does not move along (proportionately) with the marker. Here is the list of things that I did:
1) Using OpenCV: a) I used the solvePnP method to find rvecs and tvecs. b) I also used the Rodrigues method to find the rotation matrix and appended the tvecs vector to get the projection matrix. c) Just for testing, I made some points and lines and projected them to make a cube. This works perfectly fine and I am getting a good output.
2) Using Irrlicht: a) I tried to place a 3D model (at position (0,0,0) and rotation (0,0,0)) with the camera feed running in the background. b) Using the rotation matrix found with Rodrigues in OpenCV, I calculated the pitch, yaw and roll values from this post (http://planning.cs.uiuc.edu/node103.html) and passed the values into the rotation field. In the position field I passed the tvecs values. The tvecs values are tvecs[0], -tvecs[1], tvecs[2].
The model is moving in the correct directions, but it is not moving proportionately. Meaning, if I move the marker 100 pixels in the x direction, the model only moves 20 pixels (the values 100 and 20 are not measured, I just took arbitrary values to illustrate the example). Similarly for the y axis and z axis. I do know I have to introduce another transformation matrix that maps the OpenCV camera coordinates to Irrlicht camera coordinates, and it's a 4x4 matrix. But I do not know how to find it. Also, OpenCV's projection matrix [R|t] is a 3x4 matrix, and it yields a 2D point that is to be projected. The 4x4 matrix mapping between OpenCV and Irrlicht requires a 3D point (made homogeneous) to be fed into a 4x4 matrix. How do I achieve that?
The 4x4 matrix you are writing about seems to be M = [R|t; 0 0 0 1], where t is the 3x1 translation vector. To get the transformed coordinates v' of the 4x1 point v ([x y z 1]^T), just compute v' = M·v.
Your problem with scaling may also be caused by a difference between the units used for camera calibration in OpenCV and those used by the other library.
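A minimal sketch of that answer with OpenCV types (the function name and the CV_64F assumption are mine): build M from the solvePnP output and apply it to a homogeneous point.

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// v is a 4x1 homogeneous point (CV_64F); the result is v' = M * v in camera space.
cv::Mat transformPoint(const cv::Mat& rvec, const cv::Mat& tvec, const cv::Mat& v)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);                               // 3x3 rotation from rvec

    cv::Mat M = cv::Mat::eye(4, 4, CV_64F);
    R.copyTo(M(cv::Rect(0, 0, 3, 3)));                    // upper-left 3x3 = R
    tvec.reshape(1, 3).copyTo(M(cv::Rect(3, 0, 1, 3)));   // last column = t

    return M * v;
}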