Get 3D coordinates from 2D image pixel if extrinsic and intrinsic parameters are known - c++

I am doing camera calibration using Tsai's algorithm. I obtained the intrinsic and extrinsic matrices, but how can I reconstruct the 3D coordinates from that information?
1) I can use Gaussian elimination to find X, Y, Z, W; the point is then X/W, Y/W, Z/W, treating it as a homogeneous system.
2) I can use the OpenCV documentation approach: since I know u, v, R, t, I can compute X, Y, Z.
However, the two methods give different results, and neither is correct.
What am I doing wrong?

If you have the extrinsic parameters, then you have everything you need. You can build a homography from the extrinsics (also called the camera pose). The pose is a 3x4 matrix and the homography is a 3x3 matrix, H, defined as
H = K*[r1, r2, t], //eqn 8.1, Hartley and Zisserman
with K being the camera intrinsic matrix, r1 and r2 being the first two columns of the rotation matrix, R; t is the translation vector.
Then normalize by dividing everything by t3.
What happens to column r3, don't we use it? No: points on the world plane have Z = 0, so r3 is always multiplied by zero. It is also redundant, as it is the cross product of the first two columns of the pose.
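Now that you have the homography, project the points. Your 2D points are (x, y). Append z = 1 to them so they become homogeneous 3-vectors, then project them as follows:
p = [x y 1]';
projection = H * p; //project
projnorm = projection / projection(3); //normalize by the third (homogeneous) component
Note that H as defined above maps points on the world plane Z = 0 to image pixels. To go from an image pixel back to that plane, use inv(H) in place of H.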
Hope this helps.
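To make the direction of the mapping concrete, here is a minimal C++ sketch (plain arrays, no OpenCV) of the homography above. It builds H = K*[r1, r2, t] and uses inv(H) to take a pixel back to the world plane Z = 0. The type aliases and the function names planeHomography and pixelToPlane are made up for this illustration, and the sketch assumes that the point really lies on the Z = 0 plane and that lens distortion has already been removed.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

Vec3 mul(const Mat3& a, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += a[i][k] * v[k];
    return r;
}

// Inverse of a 3x3 matrix via the adjugate; asserts that it is invertible.
Mat3 inverse(const Mat3& m) {
    const double det = m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                     - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                     + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    assert(std::fabs(det) > 1e-12);
    Mat3 inv{};
    inv[0][0] =  (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det;
    inv[0][1] = -(m[0][1] * m[2][2] - m[0][2] * m[2][1]) / det;
    inv[0][2] =  (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det;
    inv[1][0] = -(m[1][0] * m[2][2] - m[1][2] * m[2][0]) / det;
    inv[1][1] =  (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det;
    inv[1][2] = -(m[0][0] * m[1][2] - m[0][2] * m[1][0]) / det;
    inv[2][0] =  (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / det;
    inv[2][1] = -(m[0][0] * m[2][1] - m[0][1] * m[2][0]) / det;
    inv[2][2] =  (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det;
    return inv;
}

// H = K * [r1 r2 t]: maps a point (X, Y, 1) on the world plane Z = 0 to a pixel.
Mat3 planeHomography(const Mat3& K, const Mat3& R, const Vec3& t) {
    Mat3 Rt{};
    for (int i = 0; i < 3; ++i) Rt[i] = { R[i][0], R[i][1], t[i] };
    return mul(K, Rt);
}

// Pixel (u, v) -> point (X, Y, 0) on the world plane, using inv(H).
Vec3 pixelToPlane(const Mat3& H, double u, double v) {
    const Vec3 q = mul(inverse(H), Vec3{u, v, 1.0});
    return { q[0] / q[2], q[1] / q[2], 0.0 };  // divide by the homogeneous component
}
```

For a point that is not on the plane, the same pixel corresponds to a whole ray of 3D points, so this only recovers the intersection of that ray with Z = 0.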

As nicely stated in the comments above, projecting 2D image coordinates into 3D "camera space" inherently requires making up the z coordinates, as this information is totally lost in the image. One solution is to assign a dummy value (z = 1) to each of the 2D image space points before projection as answered by Jav_Rock.
p = [x y 1];
projection = H * p; //project
projnorm = projection / projection(3); //normalize by the third (homogeneous) component
One interesting alternative to this dummy solution is to train a model to predict the depth of each point prior to reprojection into 3D camera space. I tried this method and had a high degree of success using a PyTorch CNN trained on 3D bounding boxes from the KITTI dataset. Would be happy to provide code but it'd be a bit lengthy for posting here.
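Whichever way the depth is obtained (a dummy value or a predicted one), the back-projection step itself is short. Here is a rough C++ sketch of it; the Intrinsics struct and the backProject function are hypothetical names for this example, and it assumes a pinhole camera with zero skew, no lens distortion, and a depth z measured along the optical axis.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Pinhole intrinsics (assumed: zero skew, no distortion).
struct Intrinsics {
    double fx, fy;  // focal lengths in pixels
    double cx, cy;  // principal point in pixels
};

// Back-project pixel (u, v) with depth z into 3D camera space:
// X = (u - cx) / fx * z,  Y = (v - cy) / fy * z,  Z = z.
std::array<double, 3> backProject(const Intrinsics& k, double u, double v, double z) {
    assert(k.fx != 0.0 && k.fy != 0.0);
    return { (u - k.cx) / k.fx * z, (v - k.cy) / k.fy * z, z };
}
```

With z = 1 this reproduces the dummy-depth approach; the resulting point is then only a direction, not a true position.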

Related

Place a billboard at a given distance so that it occupies a certain size on screen

I have a rectangle that I place on the screen using a simple scale matrix (S). Now I would like to place this rectangle into "3D space", but have it appear just like before on the screen. I found that I can do so by applying the view and projection matrices in inverse. Something like:
S' = (V⁻¹ P⁻¹ S)
Matrix = P V (V⁻¹ P⁻¹ S)
This works fine so far. My rectangle is like a billboard now and I can treat it like any other object, apply P and V and it will show up correctly. However, there is a degeneracy here: I don't specify at which depth the object is placed. It could be twice as far away but x times bigger!
The reason that this is important is that I want to animate the rectangle, say rotate it around the Z axis or move in 3D space. Then I want it to come to a stop and be positioned pixel-perfect on the screen.
How can I place a flat object at a given z distance, such that it appears on screen in a certain way? I already have the scale matrix that I need to display it in OpenGL without any 3D transformation, that is, the matrix for displaying it in NDC or screen coordinates. I also have the projection and view matrices I want to use. How can I go from this to the desired model matrix?
How can I place a flat object at a given z distance, such that it appears on screen in a certain way [...]
Actually you want to draw the object in view space. Define a model matrix for the object that contains only one translation component (0, 0, -z) and skip the view matrix when drawing the object.
Usually the order to transform 3D vertex v to 2D point p is written as a matrix multiplication. [Depending on the API you are using, the order might be reversed. The notation I used here is glsl - friendly]
p = P * V * M * S * v
v = vertex (usually 3D of the form x,y,z,1)
P = projection matrix
V = view (camera) matrix
M = model matrix (or world transformation)
S is a 4x4 object-scaling matrix
matrices are usually 4x4 with the last line/column 0,0,0,1
The model matrix M can be decomposed into a number of sub components such as T translation, S scale and R rotation. Of course here the order matters.
To rotate an object or vertex around itself, first rotate it (as if it is at the origin already) and then translate it to the position it needs to be, for example using vector v with coordinates (x,y,z).
R is a typical 3x3 rotation matrix embedded in a 4x4 with 0,0,0,1 on the last line
v is a 3D vector with coordinates (x,y,z)
T is a 4x4 translation matrix, all zeroes except the last column where it has x,y,z,1
M = T * R (first R, then T)
To rotate an object around an arbitrary point q, first translate it so that it is relative to that point (i.e. q is at the origin, so you subtract q from v). Then rotate around q (now at the origin) by applying R. Lastly, place the object back where it should be (so add q again).
R is your typical 3x3 rotation matrix embedded in a 4x4 with 0,0,0,1 on the last line
v is a 3D vector with coordinates (x,y,z)
T is a 4x4 translation matrix, all zeroes except the last column where it has x,y,z,1
L is another 4x4 translation matrix, but now with q instead of v
L' is the inverse transformation of L
M = T * L * R * L'
Also, scaling the object first (i.e. S is the rightmost factor in the product) keeps translations in world units. Scaling after all the other transformations also scales the translations, so the object moves over scaled distances.
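As an illustration of the composition order, here is a small C++ sketch of the "rotate around an arbitrary point q" part, L * R * L', using plain 4x4 arrays and the column-vector convention (p' = M * p). All names here (Mat4, translation, rotationZ, rotateAboutPoint) are invented for the example rather than taken from any particular graphics library.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major storage, column-vector convention

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

// Translation matrix: identity with (x, y, z) in the last column.
Mat4 translation(double x, double y, double z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

// Rotation about the Z axis by `angle` radians, embedded in a 4x4 matrix.
Mat4 rotationZ(double angle) {
    Mat4 m = identity();
    m[0][0] = std::cos(angle); m[0][1] = -std::sin(angle);
    m[1][0] = std::sin(angle); m[1][1] =  std::cos(angle);
    return m;
}

Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

Vec4 mul(const Mat4& a, const Vec4& v) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += a[i][k] * v[k];
    return r;
}

// L * R * L': move q to the origin (L'), rotate (R), move back (L).
// The rightmost matrix is the one applied to the vertex first.
Mat4 rotateAboutPoint(const Mat4& R, double qx, double qy, double qz) {
    const Mat4 L = translation(qx, qy, qz);
    const Mat4 Linv = translation(-qx, -qy, -qz);
    return mul(L, mul(R, Linv));
}
```

The full model matrix from the text would then be mul(T, rotateAboutPoint(R, qx, qy, qz)), with the pivot q itself left unchanged by the rotation part.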

OpenCV undistortPoints and triangulatePoint give odd results (stereo)

I'm trying to get 3D coordinates of several points in space, but I'm getting odd results from both undistortPoints() and triangulatePoints().
Since both cameras have different resolution, I've calibrated them separately, got RMS errors of 0,34 and 0,43, then used stereoCalibrate() to get more matrices, got an RMS of 0,708, and then used stereoRectify() to get remaining matrices. With that in hand I've started the work on gathered coordinates, but I get weird results.
For example, input is: (935, 262), and the undistortPoints() output is (1228.709125, 342.79841) for one point, while for another it's (934, 176) and (1227.9016, 292.4686) respectively. Which is weird, because both of these points are very close to the middle of the frame, where distortions are the smallest. I didn't expect it to move them by 300 pixels.
When passed to triangulatePoints(), the results get even stranger - I've measured the distance between three points in real life (with a ruler), and calculated the distance between pixels on each picture. Because this time the points were on a pretty flat plane, these two lengths (pixel and real) matched, as in |AB|/|BC| in both cases was around 4/9. However, triangulatePoints() gives me results off the rails, with |AB|/|BC| being 3/2 or 4/2.
This is my code:
double pointsBok[2] = { bokList[j].toFloat()+xBok/2, bokList[j+1].toFloat()+yBok/2 };
cv::Mat imgPointsBokProper = cv::Mat(1,1, CV_64FC2, pointsBok);
double pointsTyl[2] = { tylList[j].toFloat()+xTyl/2, tylList[j+1].toFloat()+yTyl/2 };
//cv::Mat imgPointsTyl = cv::Mat(2,1, CV_64FC1, pointsTyl);
cv::Mat imgPointsTylProper = cv::Mat(1,1, CV_64FC2, pointsTyl);
cv::undistortPoints(imgPointsBokProper, imgPointsBokProper,
intrinsicOne, distCoeffsOne, R1, P1);
cv::undistortPoints(imgPointsTylProper, imgPointsTylProper,
intrinsicTwo, distCoeffsTwo, R2, P2);
cv::triangulatePoints(P1, P2, imgWutBok, imgWutTyl, point4D);
double wResult = point4D.at<double>(3,0);
double realX = point4D.at<double>(0,0)/wResult;
double realY = point4D.at<double>(1,0)/wResult;
double realZ = point4D.at<double>(2,0)/wResult;
The angles between points are kinda sorta good but usually not:
`7,16816 168,389 4,44275` vs `5,85232 170,422 3,72561` (degrees)
`8,44743 166,835 4,71715` vs `12,4064 158,132 9,46158`
`9,34182 165,388 5,26994` vs `19,0785 150,883 10,0389`
I've tried to use undistort() on the entire frame, but got results just as odd. The distance between B and C points should be pretty much unchanged at all times, and yet this is what I get:
7502,42
4876,46
3230,13
2740,67
2239,95
Frame by frame.
[Plot omitted: pixel distance (bottom) vs real distance (top); the two should be very similar]
[Plot omitted: angle]
Also, shouldn't both undistortPoints() and undistort() give the same results (another set of videos here)?
The function cv::undistortPoints() does undistortion and reprojection in one go. It performs the following operations:
undo camera projection (multiplication with the inverse of the camera matrix)
apply the distortion model to undo the distortion
rotate by the provided Rotation matrix R1/R2
project points to image using the provided Projection matrix P1/P2
If you pass the matrices R1, P1 resp. R2, P2 from cv::stereoRectify(), the input points will be undistorted and rectified. Rectification means that the images are transformed in such a way that corresponding points have the same y-coordinate. There is no unique solution for image rectification, as you can apply any translation or scaling to both images without changing the alignment of corresponding points.
That being said, cv::stereoRectify() can shift the center of projection quite a bit (e.g. 300 pixels). If you want pure undistortion you can pass an identity matrix (instead of R1) and the original camera matrix K (instead of P1). This should lead to pixel coordinates similar to the original ones.
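To see why pure undistortion barely moves points near the image center, here is a self-contained C++ sketch of the four steps listed above for the special case R = identity and P = K. This is not OpenCV's implementation: it assumes a simplified radial model with only two coefficients (k1, k2), and the Camera struct and both function names are invented for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Simplified camera: pinhole intrinsics plus two radial coefficients (assumption).
struct Camera {
    double fx, fy, cx, cy;
    double k1, k2;
};
using Pixel = std::array<double, 2>;

// Forward model: ideal normalized point (x, y) -> distorted pixel.
Pixel distortAndProject(const Camera& c, double x, double y) {
    const double r2 = x * x + y * y;
    const double s = 1.0 + c.k1 * r2 + c.k2 * r2 * r2;
    return { c.fx * x * s + c.cx, c.fy * y * s + c.cy };
}

// Distorted pixel -> undistorted pixel, with R = identity and P = K.
Pixel undistortPixel(const Camera& c, const Pixel& p, int iterations = 20) {
    assert(c.fx != 0.0 && c.fy != 0.0);
    // 1. Undo the camera projection (multiply by the inverse of K).
    const double xd = (p[0] - c.cx) / c.fx;
    const double yd = (p[1] - c.cy) / c.fy;
    // 2. Undo the distortion by fixed-point iteration on the radial factor.
    double x = xd, y = yd;
    for (int i = 0; i < iterations; ++i) {
        const double r2 = x * x + y * y;
        const double s = 1.0 + c.k1 * r2 + c.k2 * r2 * r2;
        x = xd / s;
        y = yd / s;
    }
    // 3. Rotation step skipped: R is the identity here.
    // 4. Project back to pixels with the original camera matrix K.
    return { c.fx * x + c.cx, c.fy * y + c.cy };
}
```

With a rectifying R and a new projection matrix P in steps 3 and 4 instead, the output pixel can land far from the input even when the distortion itself is small.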

transforming projection matrices computed from trifocal tensor to estimate 3D points

I am using this legacy code: http://fossies.org/dox/opencv-2.4.8/trifocal_8cpp_source.html
for estimating 3D points from the given corresponding 2D points from 3 different views. The problem I faced is same as stated here: http://opencv-users.1802565.n2.nabble.com/trifocal-tensor-icvComputeProjectMatrices6Points-icvComputeProjectMatricesNPoints-td2423108.html
I could compute the projection matrices successfully using icvComputeProjectMatrices6Points. I used 6 sets of corresponding points from 3 views. Results are shown below:
projMatr1 P1 =
[-0.22742541, 0.054754492, 0.30500898, -0.60233182;
-0.14346679, 0.034095913, 0.33134204, -0.59825808;
-4.4949986e-05, 9.9166318e-06, 7.106331e-05, -0.00014547621]
projMatr2 P2 =
[-0.17060626, -0.0076031247, 0.42357284, -0.7917347;
-0.028817834, -0.0015948272, 0.2217239, -0.33850163;
-3.3046148e-05, -1.3680664e-06, 0.0001002633, -0.00019192585]
projMatr3 P3 =
[-0.033748217, 0.099119112, -0.4576003, 0.75215244;
-0.001807699, 0.0035084449, -0.24180284, 0.39423448;
-1.1765103e-05, 2.9554356e-05, -0.00013438619, 0.00025332544]
Furthermore, I computed 3D points using icvReconstructPointsFor3View. The six 3D points are as follows:
4D points =
[-0.4999997, -0.26867214, -1, 2.88633e-07, 1.7766099e-07, -1.1447386e-07;
-0.49999994, -0.28693244, 3.2249036e-06, 1, 7.5971762e-08, 2.1956141e-07;
-0.50000024, -0.72402155, 1.6873783e-07, -6.8603946e-08, -1, 5.8393886e-07;
-0.50000012, -0.56681377, 1.202426e-07, -4.1603233e-08, -2.3659911e-07, 1]
While, actual 3D points are as following:
- { ID:1,X:500.000000, Y:800.000000, Z:3000.000000}
- { ID:2,X:500.000000, Y:800.000000, Z:4000.000000}
- { ID:3,X:1500.000000, Y:800.000000, Z:4000.000000}
- { ID:4,X:1500.000000, Y:800.000000, Z:3000.000000}
- { ID:5,X:500.000000, Y:1800.000000, Z:3000.000000}
- { ID:6,X:500.000000, Y:1800.000000, Z:4000.000000}
My question is now: how do I transform P1, P2 and P3 into a form that allows a meaningful triangulation? I need to compute the correct 3D points using the trifocal tensor.
The trifocal tensor won't help you, because like the fundamental matrix, it only enables projective reconstruction of the scene and camera poses. If X0_j and P0_i are the true 3D points and camera matrices, this means that the reconstructed points Xp_j = inv(H).X0_j and camera matrices Pp_i = P0_i.H are only defined up to a common 4x4 matrix H, which is unknown.
In order to obtain a metric reconstruction, you need to know the calibration matrices of your cameras. Whether you know these matrices (e.g. if you use virtual cameras for image rendering) or you estimated them using camera calibration (see OpenCV calibration tutorials), you can find a method to obtain a metric reconstruction in §7.4.5 of "Geometry, constraints and computation of the trifocal tensor", by C.Ressl (PDF).
Note that even when using this method, you can only obtain a 3D reconstruction up to scale, not one at true scale, unless you have some additional knowledge (such as knowledge of the actual distance between two fixed 3D points).
Sketch of the algorithm:
Inputs: the three camera matrices P1, P2, P3 (projective world coordinates, with the coordinate system chosen so that P1=[I|0]), the associated calibration matrices K1, K2, K3 and one point correspondence x1, x2, x3.
Outputs: the three camera matrices P1_E, P2_E, P3_E (metric reconstruction).
Set P1_E=K1.[I|0]
Compute the fundamental matrices F21, F31. Denoting P2=[A|a] and P3=[B|b], you have F21=[a]x.A and F31=[b]x.B (see table 9.1 in [HZ00]), where for a 3x1 vector e [e]x = [0,-e_3,e_2;e_3,0,-e_1;-e_2,e_1,0]
Compute the essential matrices E21 = K2'.F21.K1 and E31 = K3'.F31.K1
For i = 2,3, do the following
i. Compute the SVD Ei1=U.S.V'. If det(U)<0 set U=-U. If det(V)<0 set V=-V.
ii. Define W=[0,-1,0;1,0,0;0,0,1], Ri=U.W.V' and ti = third column of U
iii. Define M=[Ri'.ti]x, X1=M.inv(K1).x1 and Xi=M.Ri'.inv(Ki).xi
iv. If X1_3.Xi_3<0, set Ri=U.W'.V' and recompute M and X1
v. If X1_3<0 set ti = -ti
vi. Define Pi_E=Ki.[Ri|ti]
Do the following to retrieve the correct scale for t3 (consistent with the fact that ||t2||=1):
i. Define p2=R2'.inv(K2).x2 and p3=R3'.inv(K3).x3
ii. Define M=[p2]x
iii. Compute the scale s=(p3'.M.R2'.t2)/(p3'.M.R3'.t3)
iv. Set t3=t3*s
End of the algorithm: the camera matrices P1_E, P2_E, P3_E are valid up to an isotropic scaling of the scene and a change of 3D coordinate system (hence it is a metric reconstruction).
[HZ00] "Multiple view geometry in computer vision" , by R.Hartley and A.Zisserman, 2000.
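The fundamental-matrix step of the sketch (F21 = [a]x.A for P1=[I|0], P2=[A|a]) can be written in a few lines of C++. This is only a fragment of the algorithm, not the whole metric upgrade, and the helper names (skew, fundamentalFromSecondCamera, epipolarResidual) are made up for the illustration.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

// [e]x = [0,-e_3,e_2; e_3,0,-e_1; -e_2,e_1,0], so that [e]x * v = e cross v.
Mat3 skew(const Vec3& e) {
    Mat3 m{};
    m[0] = {  0.0,  -e[2],  e[1] };
    m[1] = {  e[2],  0.0,  -e[0] };
    m[2] = { -e[1],  e[0],  0.0  };
    return m;
}

Mat3 mul(const Mat3& a, const Mat3& b) {
    Mat3 c{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

Vec3 mul(const Mat3& a, const Vec3& v) {
    Vec3 r{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += a[i][k] * v[k];
    return r;
}

// F21 = [a]x * A, valid when P1 = [I|0] and P2 = [A|a].
Mat3 fundamentalFromSecondCamera(const Mat3& A, const Vec3& a) {
    return mul(skew(a), A);
}

// Epipolar constraint x2' * F * x1, which should be (close to) zero
// for a true correspondence in homogeneous image coordinates.
double epipolarResidual(const Mat3& F, const Vec3& x2, const Vec3& x1) {
    const Vec3 Fx1 = mul(F, x1);
    return x2[0] * Fx1[0] + x2[1] * Fx1[1] + x2[2] * Fx1[2];
}
```

Checking the residual on the six input correspondences is a cheap sanity test that P2 and P3 really are expressed relative to P1=[I|0] before the essential matrices are computed.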

Drawing Euler Angles rotational model on a 2d image

I'm currently attempting to draw a 3D representation of Euler angles within a 2D image (no OpenGL or 3D graphics windows). The output image can be similar to the one below.
Essentially I am looking for research or an algorithm which can take a rotation matrix or a set of Euler angles and then output them onto a 2D image, like above. This will be implemented in a C++ application that uses OpenCV. It will be used to output annotation information on an OpenCV window based on the state of the object.
I think I'm overthinking this, because I should be able to decompose the unit vectors from a rotation matrix, extract their x, y components and draw a line in Cartesian space from (0,0). Am I correct in this thinking?
EDIT: I'm looking for an Orthographic Projection. You can assume the image above has the correct camera/viewing angle.
Any help would be appreciated.
Thanks,
EDIT: The example source code can now be found in my repo.
Header: https://bitbucket.org/jluzwick/tennisspindetector/src/6261524425e8d80772a58fdda76921edb53b4d18/include/projection_matrix.h?at=master
Class Definitions: https://bitbucket.org/jluzwick/tennisspindetector/src/6261524425e8d80772a58fdda76921edb53b4d18/src/projection_matrix.cpp?at=master
It's not the best code but it works and shows the steps necessary to get the projection matrix described in the accepted answer.
Also here is a youtube vid of the projection matrix in action (along with scale and translation added): http://www.youtube.com/watch?v=mSgTFBFb_68
Here are my two cents. Hope it helps.
If I understand correctly, you want to rotate 3D system of coordinates and then project it orthogonally onto a given 2D plane (2D plane is defined with respect to the original, unrotated 3D system of coordinates).
"Rotating and projecting 3D system of coordinates" is "rotating three 3D basis vectors and projecting them orthogonally onto a 2D plane so they become 2D vectors with respect to 2D basis of the plane". Let the original 3D vectors be unprimed and the resulting 2D vectors be primed. Let {e1, e2, e3} = {e1..3} be 3D orthonormal basis (which is given), and {e1', e2'} = {e1..2'} be 2D orthonormal basis (which we have to define). Essentially, we need to find such operator PR that PR * v = v'.
While we can talk a lot about linear algebra, operators and matrix representation, it'd be too long of a post. It'll suffice to say that :
For both 3D rotation and 3D->2D projection operators there are real matrix representations (linear transformations; 2D is subspace of 3D).
These are two transformations applied consecutively, i.e. PR * v = P * R * v = v', so we need to find rotation matrix R and projection matrix P. Clearly, after we have rotated v using R, we can project the resulting vector vR using P.
You have the rotation matrix R already, so we consider it is a given 3x3 matrix. So for simplicity we will talk about projecting vector vR = R * v.
Projection matrix P is a 2x3 matrix with i-th column being a projection of i-th 3D basis vector ei onto {e1..2'} basis.
Let's find P projection matrix such as a 3D vector vR is linearly transformed into 2D vector v' on a 2D plane with an orthonormal basis {e1..2'}.
A 2D plane can be easily defined by a vector normal to it. For example, from the figures in the OP, it seems that our 2D plane (the plane of the paper) has unit normal vector n = 1/sqrt(3) * ( 1, 1, 1 ). We need to find a 2D basis in the plane defined by this n. Since any two linearly independent vectors lying in the plane form such a basis, there are infinitely many such bases. From the problem's geometry and for the sake of simplicity, let's impose two additional conditions: first, the basis should be orthonormal; second, it should be visually appealing (although this is a somewhat subjective condition). As can easily be seen, such a basis is formed trivially in the primed system by setting e1' = ( 1, 0 )' = x'-axis (horizontal, positive direction from left to right) and e2' = ( 0, 1 )' = y'-axis (vertical, positive direction from bottom to top).
Let's now find this {e1', e2'} 2D basis in {e1..3} 3D basis.
Let's denote e1' and e2' as e1" and e2" in the original basis. Noting that in our case e1" has no e3-component (z-component), and using the fact that n dot e1" = 0, we get that e1' = ( 1, 0 )' -> e1" = ( -1/sqrt(2), 1/sqrt(2), 0 ) in the {e1..3} basis. Here, dot denotes dot-product.
Then e2" = n cross e1" = ( -1/sqrt(6), -1/sqrt(6), 2/sqrt(6) ). Here, cross denotes cross-product.
The 2x3 projection matrix P for the 2D plane defined by n = 1/sqrt(3) * ( 1, 1, 1 ) is then given by:
( -1/sqrt(2) 1/sqrt(2) 0 )
( -1/sqrt(6) -1/sqrt(6) 2/sqrt(6) )
where first, second and third columns are transformed {e1..3} 3D basis onto our 2D basis {e1..2'}, i.e. e1 = ( 1, 0, 0 ) from 3D basis has coordinates ( -1/sqrt(2), -1/sqrt(6) ) in our 2D basis, and so on.
To verify the result we can check few obvious cases:
n is orthogonal to our 2D plane, so there should be no projection. Indeed, P * n = P * ( 1, 1, 1 ) = 0.
e1, e2 and e3 should be transformed into their representation in {e1..2'}, namely the corresponding column of the P matrix. Indeed, P * e1 = P * ( 1, 0, 0 ) = ( -1/sqrt(2), -1/sqrt(6) ) and so on.
To finalize the problem. We now constructed a projection matrix P from 3D into 2D for an arbitrarily chosen 2D plane. We now can project any vector, previously rotated by rotation matrix R, onto this plane. For example, rotated original basis {R * e1, R * e2, R * e3}. Moreover, we can multiply given P and R to get a rotation-projection transformation matrix PR = P * R.
P.S. C++ implementation is left as a homework exercise ;).
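For what it's worth, a minimal version of that homework could look like the C++ sketch below. It hard-codes the 2x3 matrix P derived above for n = 1/sqrt(3) * ( 1, 1, 1 ) and applies PR = P * R to a vector. The type aliases and function names are assumptions of this example; drawing the resulting 2D vectors (e.g. with OpenCV line calls) and flipping the y-axis for image coordinates are left out.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec2 = std::array<double, 2>;
using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;    // 3x3 rotation matrix, row-major
using Mat2x3 = std::array<Vec3, 2>;  // 2x3 projection matrix, row-major

// P for the plane with normal n = 1/sqrt(3) * (1, 1, 1):
// row 0 is e1" and row 1 is e2" expressed in the original 3D basis.
Mat2x3 projectionForDiagonalView() {
    const double s2 = std::sqrt(2.0);
    const double s6 = std::sqrt(6.0);
    Mat2x3 P{};
    P[0] = { -1.0 / s2,  1.0 / s2, 0.0 };
    P[1] = { -1.0 / s6, -1.0 / s6, 2.0 / s6 };
    return P;
}

// v' = P * R * v: rotate first, then project orthogonally onto the plane.
Vec2 rotateAndProject(const Mat2x3& P, const Mat3& R, const Vec3& v) {
    Vec3 vR{};
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            vR[i] += R[i][k] * v[k];
    Vec2 out{};
    for (int i = 0; i < 2; ++i)
        for (int k = 0; k < 3; ++k)
            out[i] += P[i][k] * vR[k];
    return out;
}
```

Calling rotateAndProject on the three basis vectors (1,0,0), (0,1,0), (0,0,1) gives the 2D end points of the rotated axes, which can then be scaled and offset to the image center before drawing.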
The rotation matrix will be easy to display,
A Rotation matrix can be constructed by using a normal, binormal and tangent.
You should be able to get them back out as follows:-
Bi-Normal (y') : matrix[0][0], matrix[0][1], matrix[0][2]
Normal (z') : matrix[1][0], matrix[1][1], matrix[1][2]
Tangent (x') : matrix[2][0], matrix[2][1], matrix[2][2]
Using a perspective transform you can then add perspective: (x,y) = (x/z, y/z)
To achieve an orthographic projection similar to that shown, you will need to multiply by another fixed rotation matrix to move to the "camera" view (45° right and then up)
You can then multiply your end points x(1,0,0),y(0,1,0),z(0,0,1) and center(0,0,0) by the final matrix, use only the x,y coordinates.
center should always transform to 0,0,0
You can then scale these values to draw to your 2D canvas.

How to calculate extrinsic parameters of one camera relative to the second camera?

I have calibrated 2 cameras with respect to some world coordinate system. I know rotation matrix and translation vector for each of them relative to the world frame. From these matrices how to calculate rotation matrix and translation vector of one camera with respect to the other??
Any help or suggestion please. Thanks!
Here is an easier solution, since you already have the 3x3 rotation matrices R1 and R2, and the 3x1 translation vectors t1 and t2.
These express the motion from the world coordinate frame to each camera, i.e. are the matrices such that, if p is a point expressed in world coordinate frame, then the same point expressed in, say, camera 1 frame is p1 = R1 * p + t1.
The motion from camera 1 to 2 is then simply the composition of (a) the motion FROM camera 1 TO the world frame, and (b) of the motion FROM the world frame TO camera 2. You can easily compute this composition as follows:
Form the 4x4 roto-translation matrices Qw1 = [R1 t1] and Qw2 = [ R2 t2 ], both with the 4th row equal to [0 0 0 1]. These matrices completely express the roto-translation FROM the world coordinate frame TO camera 1 and 2 respectively.
The motion FROM camera 1 TO the world frame is simply Q1w = inv(Qw1). Here inv() is the algebraic inverse matrix, i.e. the one such that inv(X) * X = X * inv(X) = IdentityMatrix, for every nonsingular matrix X.
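The roto-translation from camera 1 to 2 is then Q12 = Qw2 * Q1w = Qw2 * inv(Qw1), since a point is first taken from camera 1 to the world frame and then from the world frame to camera 2 (with column vectors, the rightmost matrix is applied first). Vice versa, the one from camera 2 to 1 is Q21 = Qw1 * Q2w = Qw1 * inv(Qw2).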
Once you have Q12 you can extract from it the rotation and translation parts, if you so wish, respectively from its upper 3x3 submatrix and right 3x1 sub-column.
First convert your rotation matrix into a rotation vector. Now you have 2 3d vectors for each camera, call them A1,A2,B1,B2. You have all 4 of them with respect to some origin O. The rule you need is
A relative to B = (A relative to O)- (B relative to O)
Apply that rule to your 2 vectors and you will have their pose relative to one another.
Some documentation on converting from rotation matrix to euler angles can be found here as well as many other places. If you are using openCV you can just use Rodrigues. Here is some matlab/octave code I found.
Here is a very simple solution. Suppose the 1st camera has rotation matrix R1 and translation vector T1, and the 2nd camera has R2 and T2, all with respect to a common reference point.
The rotation and translation from the 1st to the 2nd camera can be calculated with the following two lines of MATLAB code:
R=R2*R1';
T=T2-R*T1;
Note, however, that this holds only if you have a single R and T for each camera (i.e. rotations and translations with respect to one unique world reference). If you have several reference translations and rotations, you should calculate R and T for every single reference point. They will probably be very close to each other, but they might be slightly different. You can then take the mean of the translation vectors, convert all the rotation matrices you found to rotation vectors, calculate their mean, and convert that back to a rotation matrix.
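Since the question is tagged for C++ use, here is a small sketch of the same two-line formula (R = R2*R1', T = T2 - R*T1) without any library. It assumes both poses follow the convention p_cam = R * p_world + T; the Pose struct and the function names are invented for this example.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

// World-to-camera pose (assumed convention): p_cam = R * p_world + t.
struct Pose {
    Mat3 R;
    Vec3 t;
};

Vec3 apply(const Pose& pose, const Vec3& p) {
    Vec3 r = pose.t;
    for (int i = 0; i < 3; ++i)
        for (int k = 0; k < 3; ++k)
            r[i] += pose.R[i][k] * p[k];
    return r;
}

// Pose taking camera-1 coordinates to camera-2 coordinates:
// R = R2 * R1^T,  t = t2 - R * t1.
Pose relativePose(const Pose& cam1, const Pose& cam2) {
    Pose rel{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                rel.R[i][j] += cam2.R[i][k] * cam1.R[j][k];  // cam1.R[j][k] is R1^T[k][j]
    for (int i = 0; i < 3; ++i) {
        rel.t[i] = cam2.t[i];
        for (int k = 0; k < 3; ++k)
            rel.t[i] -= rel.R[i][k] * cam1.t[k];
    }
    return rel;
}
```

A quick way to check the convention is to take any world point, map it into both cameras with apply, and confirm that relativePose(cam1, cam2) maps the camera-1 coordinates onto the camera-2 coordinates.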