3D reconstruction using the projection matrices from the trifocal tensor - c++

I have computed the trifocal tensor and corresponding projection matrices P_0, P_1 and P_2 from line correspondences over 3 views, according to 'Multiple View Geometry by Hartley & Zisserman, 2nd edition', Chapter 16. The computed matrices are:
P_0 =
[1 0 0 0
0 1 0 0
0 0 1 0]
P_1 =
[-0.284955 -0.129918 -0.0276358 0.922516
0.122053 0.560496 0.061383 0.385913
0.00455229 -0.0114709 -0.607497 0.00589735]
P_2 =
[0.21558 -0.10182 0.00499782 0.998876
0.0079606 0.11325 0.0226247 0.047112
0.006613 -0.00260303 -0.130705 0.00512245]
Now I want to compute the 3D (Plücker) lines from these projection matrices. I know the intrinsic camera matrix K. What I don't understand is how to incorporate K into the normalized projection matrices P_0, P_1 and P_2 obtained from the trifocal tensor in order to get correct 3D information. More specifically, I want to follow the triangulation procedure described by Bartoli and Sturm (Section 4, Triangulation).
I appreciate your help.

What do you mean by correct 3D information? The whole coordinate system can only be computed up to a scale.
Which algorithm exactly did you use for the computation? Algorithm 16.2 in that chapter?
Why don't you use the triangulation algorithm here:
http://www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_line3d_from_lP_lin.m
http://www.robots.ox.ac.uk/~vgg/hzbook/code/vgg_multiview/vgg_line3d_from_lP_nonlin.m
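A minimal NumPy sketch of linear line triangulation, assuming each P_i is a 3x4 array and each image line l_i is a homogeneous 3-vector. It uses the standard back-projection idea (each image line defines the plane P_i^T l_i, and the 3D line is where these planes meet); it may differ in detail from the VGG functions above and it is not Bartoli and Sturm's algorithm. `normalize_image_line` is a hypothetical helper for the K question: if the P_i were estimated from lines in normalized coordinates, pixel lines have to be mapped through K^T before triangulating. The demo uses the P matrices from the question with synthetic, noise-free lines.

```python
import numpy as np

def normalize_image_line(K, l_pixel):
    # A pixel line satisfies l^T x = 0 for pixel points x = K x_n,
    # so the same line in normalized coordinates is K^T l.
    return K.T @ l_pixel

def triangulate_line(Ps, ls):
    """Linear triangulation of a 3D line from n >= 2 views.

    Each image line l_i back-projects to the plane pi_i = P_i^T l_i.
    The 3D line is the (approximate) 2D null space of the stacked planes.
    Returns two homogeneous points spanning the line and the
    Plücker matrix L = X1 X2^T - X2 X1^T (defined up to scale).
    """
    A = np.vstack([P.T @ l for P, l in zip(Ps, ls)])  # one plane per row
    _, _, Vt = np.linalg.svd(A)
    X1, X2 = Vt[-1], Vt[-2]  # singular vectors of the two smallest singular values
    L = np.outer(X1, X2) - np.outer(X2, X1)
    return X1, X2, L

P0 = np.hstack([np.eye(3), np.zeros((3, 1))])
P1 = np.array([[-0.284955, -0.129918, -0.0276358, 0.922516],
               [0.122053, 0.560496, 0.061383, 0.385913],
               [0.00455229, -0.0114709, -0.607497, 0.00589735]])
P2 = np.array([[0.21558, -0.10182, 0.00499782, 0.998876],
               [0.0079606, 0.11325, 0.0226247, 0.047112],
               [0.006613, -0.00260303, -0.130705, 0.00512245]])

# Synthetic 3D line through two points, projected into each view.
Xa = np.array([0.0, 0.0, 5.0, 1.0])
Xb = np.array([1.0, 2.0, 6.0, 1.0])
Ps = [P0, P1, P2]
ls = [np.cross(P @ Xa, P @ Xb) for P in Ps]  # image line through both projections

X1, X2, L = triangulate_line(Ps, ls)
```

With noisy lines the same code returns a least-squares estimate; a non-linear refinement (as in the second VGG function) would usually follow.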

Related

kitti dataset camera projection matrix

I am looking at the KITTI dataset, in particular at how to convert a world point into image coordinates. The README (quoted below) says that I need to transform to camera coordinates first and then multiply by the projection matrix. Coming from a non-computer-vision background, I have two questions:
I looked at the numbers in calib.txt; the matrix is 3x4 with non-zero values in the last column. I always thought this matrix was K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? For example, P2 is
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
After projecting into [u,v,w] and dividing u,v by w, are these values relative to an origin at the center of the image or at the top left of the image?
README:
calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:
x = Pi * Tr * X
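The README formula x = Pi * Tr * X can be sketched in NumPy as below. Tr in calib.txt is 3x4, so this sketch pads it to 4x4 before chaining. The identity `Tr_demo` is a hypothetical placeholder, not a real KITTI calibration; substitute the Tr read from your own calib.txt.

```python
import numpy as np

def project_velodyne_point(P, Tr, X_velo):
    """Map a 3D velodyne point to pixel coordinates via x = P * Tr * X.

    P      : 3x4 rectified projection matrix (e.g. P2 from calib.txt)
    Tr     : 3x4 velodyne-to-camera transform from calib.txt
    X_velo : 3-vector (x, y, z) in velodyne coordinates
    """
    Tr_h = np.vstack([Tr, [0.0, 0.0, 0.0, 1.0]])  # make Tr a 4x4 rigid transform
    X_h = np.append(X_velo, 1.0)                   # homogeneous 3D point
    u, v, w = P @ Tr_h @ X_h
    # Points with w <= 0 lie behind the camera and should be discarded.
    return np.array([u / w, v / w]), w

P2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
Tr_demo = np.hstack([np.eye(3), np.zeros((3, 1))])  # hypothetical, NOT real KITTI values

uv, depth = project_velodyne_point(P2, Tr_demo, np.array([0.0, 0.0, 10.0]))
```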
Refs:
How to understand the KITTI camera calibration files?
Format of parameters in KITTI's calibration file
http://www.cvlibs.net/publications/Geiger2013IJRR.pdf
Answer:
I strongly recommend you read those references above. They may solve most, if not all, of your questions.
For question 2: the projected points are relative to an origin at the top left of the image. See refs 2 & 3: a 3D point far away along the optical axis projects to (center_x, center_y), whose values are given in the P_rect matrices. You can also verify this with some simple code:
import numpy as np
p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = [0, 0, 1E8, 1] # A far 3D point
y = np.dot(p, x)
y[0] /= y[2]
y[1] /= y[2]
y = y[:2]
print(y)
You will see output very close to [601.8873, 183.1104],
which matches (p[0, 2], p[1, 2]), a.k.a. (center_x, center_y).
All the rectified P matrices (3x4) have, approximately, the form:
P(i)rect = [[fu 0 cx -fu*bx],
[0 fv cy -fv*by],
[0 0 1 0]]
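The last column encodes the baseline (bx, by, in meters) with respect to the reference camera 0, multiplied by the focal length. Strictly, the last column equals K*t, where t is the camera's translation relative to camera 0, so it also picks up a small z-offset term (the 6.203223e-03 in P2); the form above ignores that term. P0 has all zeros in the last column because it is the reference camera.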
This post has more details:
How Kitti calibration matrix was calculated?
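Assuming the rectified cameras share camera 0's orientation, so that each matrix has the form P = K[I | t], the translation can be recovered from the last column with K^-1. A short NumPy sketch using the P2 values above:

```python
import numpy as np

P2 = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
               [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])

K = P2[:, :3]                     # rectified intrinsics (left 3x3 block)
t = np.linalg.solve(K, P2[:, 3])  # P2 = K [I | t]  =>  t = K^-1 * last column
C = -t                            # camera centre in rectified camera-0 coordinates
print(t)
```

For this P2 it gives roughly t = (0.061, -0.0014, 0.0062) m, i.e. the offset is mainly along x with a small z component.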

Projective or Euclidean 3D- Reconstruction?

I have trouble understanding whether I get a Euclidean reconstruction or just a projective one. First, let me describe what I've done:
I have two stereo images. They are SEM images, eucentrically tilted with a tilt difference of 5°. Using SURF correspondences and RANSAC, I calculate the fundamental matrix with the normalized 8-point algorithm.
Then the images are rectified and I do a dense stereo-matching:
minDisp = -16
numDisp = 16-minDisp
stereo = cv2.StereoSGBM_create(minDisparity = minDisp,
numDisparities = numDisp)
disp = stereo.compute(imgL, imgR).astype(np.float32) / 16.0
That gives me a disparity map, e.g. this 5x4 excerpt (the values range from -16 to 16). I mask out the bad pixels (-17) and compute the z-component of my images from the flattened disp array.
disp =
-0.1875  -0.1250  -0.1250    0
-0.1250  -0.1250  -0.1250  -17
-0.0625  -0.0625  -0.1250  -17
-0.0625  -0.0625   0         0.0625
 0        0        0.0625    0.1250
#create mask that eliminates the bad pixel values ( = minimum values)
mask = disp != disp.min()
dispMasked = disp[mask]
#compute z-component
zWorld = np.float32(((dispMasked) * p) / (2 * np.sin(tilt)))
It's a simplified form of triangulation that assumes parallel projection and uses trigonometric relations. The pixel constant p was calculated with a calibration object, so I get the height in mm; the disparity is in pixels.
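A sketch of the masking and height step on the 5x4 excerpt above, assuming NumPy. The values of `p` and `tilt` are hypothetical: `np.sin` expects radians, and here `tilt` is taken as half of the 5° tilt difference, matching the common parallel-projection form z = d * p / (2 * sin(Δθ/2)); check which angle your own `tilt` variable holds.

```python
import numpy as np

# 5x4 disparity excerpt from above; -17 marks invalid pixels
# (StereoSGBM reports minDisparity - 1 where it finds no match).
disp = np.array([[-0.1875, -0.1250, -0.1250, 0.0],
                 [-0.1250, -0.1250, -0.1250, -17.0],
                 [-0.0625, -0.0625, -0.1250, -17.0],
                 [-0.0625, -0.0625, 0.0, 0.0625],
                 [0.0, 0.0, 0.0625, 0.1250]], dtype=np.float32)

p = 0.01                 # hypothetical pixel constant in mm/pixel
tilt = np.deg2rad(2.5)   # hypothetical: half of the 5 degree tilt difference, in radians

mask = disp != disp.min()   # drop the invalid (-17) pixels
disp_masked = disp[mask]    # flattened valid disparities
z_world = np.float32(disp_masked * p / (2 * np.sin(tilt)))  # height in mm
```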
The resulting point cloud looks quite good, but all points have a small constant tilt, so the point cloud (plane) is tilted.
My question now is: is this point cloud in real Euclidean coordinates, or do I have a projective (or affine?) reconstruction that still differs from a Euclidean one by an unknown transformation?
The reason I ask is that I don't have a real calibration matrix, and I didn't use a real triangulation method based on central projection with camera center coordinates, focal length and image point coordinates.
Any suggestions or literature are appreciated. :)
Best regards and thanks in advance!

What does the resultant matrix of Homography denote?

I have two frames of a shaky video. I estimated a homography from all the inlier points. The resulting matrices I get for different frames look like this:
0.2711 -0.0036 0.853
-0.0002 0.2719 -0.2247
0.0000 -0.0000 0.2704

0.4787 -0.0061 0.5514
0.0007 0.4798 -0.0799
0.0000 -0.0000 0.4797
What are those similar values on the diagonal, and how can I retrieve the translation component from this matrix?
Start with the following observation: a homography matrix is only defined up to scale. This means that if you divide or multiply all the matrix coefficients by the same number, you obtain a matrix that represents the same geometrical transformation. This is because, in order to apply the homography to a point at coordinates (x, y), you multiply its matrix H on the right by the column vector [x, y, 1]' (here I use the apostrophe symbol to denote transposition), and then divide the result H * x = [u, v, w]' by the third component w. Therefore, if instead of H you use a scaled matrix (s * H), you end up with [s*u, s*v, s*w]', which represents the same 2D point.
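A small NumPy check of the scale ambiguity: applying H and 5 * H to the same point gives the same result after the division by w. The test point (100, 50) is arbitrary.

```python
import numpy as np

def apply_homography(H, x, y):
    """Apply a 3x3 homography to the 2D point (x, y)."""
    u, v, w = H @ np.array([x, y, 1.0])
    return np.array([u / w, v / w])  # divide by the third component

H = np.array([[0.2711, -0.0036, 0.853],
              [-0.0002, 0.2719, -0.2247],
              [0.0, 0.0, 0.2704]])

p1 = apply_homography(H, 100.0, 50.0)
p2 = apply_homography(5.0 * H, 100.0, 50.0)  # scaled matrix, same mapping
```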
So, to understand what is going on with your matrices, start by dividing both of them by their bottom-right component:
octave:1> a = [
> 0.2711 -0.0036 0.853
> -0.0002 0.2719 -0.2247
> 0.0000 -0.0000 0.2704
> ];
octave:2> b=[
> 0.4787 -0.0061 0.5514
> 0.0007 0.4798 -0.0799
> 0.0000 -0.0000 0.4797];
octave:3> a/a(3,3)
ans =
1.00259 -0.01331 3.15459
-0.00074 1.00555 -0.83099
0.00000 -0.00000 1.00000
octave:4> b/b(3,3)
ans =
0.99792 -0.01272 1.14947
0.00146 1.00021 -0.16656
0.00000 -0.00000 1.00000
Now suppose, for the moment, that the third column elements in both matrices were [0, 0, 1]'. Then the effect of applying them to any point (x, y) would be to move it by roughly 1% of its coordinate values (e.g. about 1 pixel for a point 100 pixels from the origin). Basically, not changing it by much.
Plugging back the actual values of the third column shows that both matrices essentially translate the whole image by a constant amount: roughly (3.15, -0.83) pixels for the first and (1.15, -0.17) pixels for the second.
So, in conclusion: nearly equal values on the diagonal (close to 1 after normalization) and very small values at indices (1,2) and (2,1) mean that both homographies are, essentially, pure translations.
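The normalization and the read-off of the translation can be sketched in NumPy as follows, using the second matrix from the question. The sample points are arbitrary; their displacements show that the motion is nearly, but not exactly, a constant shift.

```python
import numpy as np

H = np.array([[0.4787, -0.0061, 0.5514],
              [0.0007, 0.4798, -0.0799],
              [0.0, 0.0, 0.4797]])

Hn = H / H[2, 2]            # fix the scale: bottom-right entry becomes 1
tx, ty = Hn[0, 2], Hn[1, 2]  # translation component

# Displacement of a few sample points: equal to (tx, ty) at the origin and
# drifting slightly with distance, because the 2x2 block is not exactly I.
displacements = {}
for x, y in [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]:
    u, v, w = Hn @ np.array([x, y, 1.0])
    displacements[(x, y)] = (u / w - x, v / w - y)
```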
Geometric transformations involve elementary operations such as addition, multiplication, division, and addition of a constant. Only the first two can be modeled by regular matrix multiplication: addition of a constant and, in the case of a homography, division cannot be represented by multiplication with a 2x2 matrix in 2D. Adding a third coordinate (that is, converting points to a homogeneous representation) solves this problem. For example, to add the constant 5 to x you can do this:
[1 0 5]   [x]   [x+5]
[0 1 0] * [y] = [ y ]
          [1]
Note that the matrix is 2x3, not 2x2, and the coordinates have three numbers even though they represent 2D points. In general (for instance with a homography), a final step converts back from the homogeneous to the Euclidean representation by dividing by the third coordinate. This achieves two things: first, all operations (multiplication, division, addition of variables and addition of constants) can be represented by matrix multiplication; second, multiple operations can be chained by multiplying their matrices (after appending a [0 0 1] row to make them square) and still yield a single matrix.
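A NumPy sketch of the idea, assuming the 2x3 matrices are extended to 3x3 by appending the row [0 0 1] so that they can be chained. `translation` and `scaling` are illustrative helpers, not library functions.

```python
import numpy as np

def translation(tx, ty):
    # The 2x3 matrix from the text plus a [0, 0, 1] row.
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def scaling(s):
    return np.diag([s, s, 1.0])

# Chain "add 5 to x" and then "scale by 2" into a single matrix.
M = scaling(2.0) @ translation(5.0, 0.0)
p = np.array([3.0, 4.0, 1.0])  # the 2D point (3, 4) in homogeneous form
q = M @ p                      # (2*(3+5), 2*4, 1) = (16, 8, 1)
```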
OK, now let's explain homographies. A homography is best considered in the context of the whole family of transformations, moving from simple ones to complex ones. In other words, it is easier to understand the meaning of homography coefficients by comparing them to the coefficients of the simpler Euclidean, similarity and affine transforms. The Euclidean transformation is the simplest and represents a rigid rotation and translation (note that the matrix is 2x3). For the 2D case:
cos(a) -sin(a) Tx
sin(a) cos(a) Ty
Similarity adds scaling s to the rotation coefficients, so the matrix now looks like this:
s*cos(a) -s*sin(a) Tx
s*sin(a) s*cos(a) Ty
The affine transformation adds shearing, so the rotation coefficients become unrestricted:
a11 a12 Tx
a21 a22 Ty
A homography adds another row that divides the output x and y (see how we explained the division during the transition from homogeneous to Euclidean coordinates above), and thus introduces projectivity: a non-uniform scaling that depends on the point coordinates. This is best understood by looking at the transition to Euclidean coordinates:
a11 a12 Tx      x     a11*x+a12*y+Tx           (a11*x+a12*y+Tx)/(a31*x+a32*y+a33)
a21 a22 Ty   *  y  =  a21*x+a22*y+Ty    ->     (a21*x+a22*y+Ty)/(a31*x+a32*y+a33)
a31 a32 a33     1     a31*x+a32*y+a33
Thus a homography has an extra row compared to other transformations such as affine or similarity. This extra row allows objects to be scaled depending on their coordinates, which is how projective distortion arises.
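To make the hierarchy concrete, here is a NumPy sketch that builds a similarity matrix and then turns it into a homography by changing only the last row. The projective term a31 = 0.001 is a hypothetical value chosen to make the position-dependent scaling visible.

```python
import numpy as np

def to_euclidean(h):
    """Homogeneous (u, v, w) -> Euclidean (u/w, v/w)."""
    return h[:2] / h[2]

def similarity(s, a, tx, ty):
    c, sn = s * np.cos(a), s * np.sin(a)
    return np.array([[c, -sn, tx],
                     [sn, c, ty],
                     [0.0, 0.0, 1.0]])

# A homography differs only in its last row: [a31, a32, a33] instead of [0, 0, 1].
H = similarity(1.0, 0.0, 0.0, 0.0)
H[2] = [0.001, 0.0, 1.0]  # hypothetical small projective term a31

# With a31 != 0 the divisor w grows with x, so points further along x shrink more.
near = to_euclidean(H @ np.array([10.0, 10.0, 1.0]))    # w = 1.01
far = to_euclidean(H @ np.array([1000.0, 10.0, 1.0]))   # w = 2
```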
Finally, speaking of your numbers:
0.4787 -0.0061 0.5514
0.0007 0.4798 -0.0799
0.0000 -0.0000 0.4797
This is not a general homography! Just look at the last row: its first two coefficients are 0, so there is no projectivity. Since a11 ≈ a22 and the off-diagonal terms are close to zero, it is not even a general affine transformation; it is rather a similarity transform. The translation is
Tx = 0.5514/0.4797 ≈ 1.149 and Ty = -0.0799/0.4797 ≈ -0.167

Drawing Euler Angles rotational model on a 2d image

I'm currently attempting to draw a 3D representation of Euler angles within a 2D image (no OpenGL or 3D graphics windows). The output image could look similar to the one below.
Essentially, I am looking for research or an algorithm that can take a rotation matrix or a set of Euler angles and draw them onto a 2D image, like above. This will be implemented in a C++ application that uses OpenCV. It will be used to output annotation information in an OpenCV window based on the state of the object.
I think I'm overthinking this, because I should be able to extract the unit vectors from a rotation matrix, take their x,y components and draw lines in Cartesian space from (0,0). Am I correct in this thinking?
EDIT: I'm looking for an Orthographic Projection. You can assume the image above has the correct camera/viewing angle.
Any help would be appreciated.
Thanks,
EDIT: The example source code can now be found in my repo.
Header: https://bitbucket.org/jluzwick/tennisspindetector/src/6261524425e8d80772a58fdda76921edb53b4d18/include/projection_matrix.h?at=master
Class Definitions: https://bitbucket.org/jluzwick/tennisspindetector/src/6261524425e8d80772a58fdda76921edb53b4d18/src/projection_matrix.cpp?at=master
It's not the best code but it works and shows the steps necessary to get the projection matrix described in the accepted answer.
Also, here is a YouTube video of the projection matrix in action (along with scale and translation added): http://www.youtube.com/watch?v=mSgTFBFb_68
Here are my two cents. Hope it helps.
If I understand correctly, you want to rotate a 3D coordinate system and then project it orthogonally onto a given 2D plane (the 2D plane is defined with respect to the original, unrotated 3D coordinate system).
"Rotating and projecting a 3D coordinate system" means "rotating the three 3D basis vectors and projecting them orthogonally onto a 2D plane, so that they become 2D vectors with respect to the plane's 2D basis". Let the original 3D vectors be unprimed and the resulting 2D vectors be primed. Let {e1, e2, e3} = {e1..3} be the 3D orthonormal basis (which is given), and {e1', e2'} = {e1..2'} be the 2D orthonormal basis (which we have to define). Essentially, we need to find an operator PR such that PR * v = v'.
While we could talk at length about linear algebra, operators and matrix representations, it would make this post too long. It suffices to say that:
For both 3D rotation and 3D->2D projection operators there are real matrix representations (linear transformations; 2D is subspace of 3D).
These are two transformations applied consecutively, i.e. PR * v = P * R * v = v', so we need to find the rotation matrix R and the projection matrix P. After rotating v using R, we project the resulting vector vR using P.
You already have the rotation matrix R, so we consider it a given 3x3 matrix. For simplicity, we will therefore talk about projecting the vector vR = R * v.
Projection matrix P is a 2x3 matrix with i-th column being a projection of i-th 3D basis vector ei onto {e1..2'} basis.
Let's find the projection matrix P such that a 3D vector vR is linearly transformed into a 2D vector v' on a 2D plane with an orthonormal basis {e1..2'}.
A 2D plane can be easily defined by a vector normal to it. For example, from the figures in the OP, it seems that our 2D plane (the plane of the paper) has the normal unit vector n = 1/sqrt(3) * ( 1, 1, 1 ). We need to find a 2D basis in the 2D plane defined by this n. Since any two linearly independent vectors lying in the plane form such a basis, there are infinitely many such bases. From the problem's geometry and for the sake of simplicity, let's impose two additional conditions: first, the basis should be orthonormal; second, it should be visually appealing (although this is a somewhat subjective condition). As can easily be seen, such a basis is formed trivially in the primed system by setting e1' = ( 1, 0 )' = x'-axis (horizontal, positive from left to right) and e2' = ( 0, 1 )' = y'-axis (vertical, positive from bottom to top).
Let's now find this {e1', e2'} 2D basis in {e1..3} 3D basis.
Let's denote e1' and e2' as e1" and e2" in the original basis. Noting that in our case e1" has no e3-component (z-component), and using the fact that n dot e1" = 0, we get that e1' = ( 1, 0 )' -> e1" = ( -1/sqrt(2), 1/sqrt(2), 0 ) in the {e1..3} basis. Here, dot denotes dot-product.
Then e2" = n cross e1" = ( -1/sqrt(6), -1/sqrt(6), 2/sqrt(6) ). Here, cross denotes cross-product.
The 2x3 projection matrix P for the 2D plane defined by n = 1/sqrt(3) * ( 1, 1, 1 ) is then given by:
( -1/sqrt(2) 1/sqrt(2) 0 )
( -1/sqrt(6) -1/sqrt(6) 2/sqrt(6) )
where the first, second and third columns are the 3D basis vectors e1, e2, e3 expressed in our 2D basis {e1..2'}, i.e. e1 = ( 1, 0, 0 ) from the 3D basis has coordinates ( -1/sqrt(2), -1/sqrt(6) ) in our 2D basis, and so on.
To verify the result, we can check a few obvious cases:
n is orthogonal to our 2D plane, so there should be no projection. Indeed, P * n = P * ( 1, 1, 1 ) = 0.
e1, e2 and e3 should be transformed into their representation in {e1..2'}, namely corresponding column in P matrix. Indeed, P * e1 = P * ( 1, 0 ,0 ) = ( -1/sqrt(2), -1/sqrt(6) ) and so on.
To finish: we have now constructed a projection matrix P from 3D to 2D for an arbitrarily chosen 2D plane, and we can project onto this plane any vector previously rotated by the rotation matrix R, for example the rotated original basis {R * e1, R * e2, R * e3}. Moreover, we can multiply P and R to get a single rotation-projection matrix PR = P * R.
P.S. C++ implementation is left as a homework exercise ;).
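As a port-ready outline of the construction above, here is a NumPy sketch (treat it as pseudocode for the C++/OpenCV version). `projection_onto_plane` builds P from n as described, assuming n is not parallel to the z-axis; the 30° rotation about z is a hypothetical stand-in for your own R. When drawing, remember that image rows grow downwards, so the y' coordinate usually needs a sign flip.

```python
import numpy as np

def projection_onto_plane(n):
    """2x3 orthographic projection onto the plane with normal n.

    Rows are the in-plane basis vectors e1'' and e2'' expressed in 3D:
    e1'' has no z-component (horizontal) and e2'' = n x e1''.
    Assumes n is not parallel to the z-axis.
    """
    n = n / np.linalg.norm(n)
    e1 = np.array([-n[1], n[0], 0.0])
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(n, e1)
    return np.vstack([e1, e2])

def rotation_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

P = projection_onto_plane(np.array([1.0, 1.0, 1.0]))
R = rotation_z(np.deg2rad(30))  # hypothetical rotation; use your own R here
PR = P @ R
axes_2d = PR @ np.eye(3)        # column i = 2D image of the rotated basis vector R e_i
```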
The rotation matrix is easy to display.
A rotation matrix can be constructed from a normal, a binormal and a tangent, and you should be able to get them back out as follows:
Bi-Normal (y') : matrix[0][0], matrix[0][1], matrix[0][2]
Normal (z') : matrix[1][0], matrix[1][1], matrix[1][2]
Tangent (x') : matrix[2][0], matrix[2][1], matrix[2][2]
Using a perspective transform, you can then add perspective: (x,y) = (x/z, y/z).
To achieve an orthographic projection similar to the one shown, you will need to multiply by another fixed rotation matrix to move to the "camera" view (45° right and then up).
You can then multiply your end points x(1,0,0),y(0,1,0),z(0,0,1) and center(0,0,0) by the final matrix, use only the x,y coordinates.
The center should always transform to (0,0,0).
You can then scale these values to draw them on your 2D canvas.
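A NumPy sketch of this second approach, under stated assumptions: the fixed "camera" rotation (45° about the vertical axis, then 30° about the horizontal axis) and the canvas scale and origin are hypothetical choices, and `R_obj` is whatever rotation your Euler angles produce.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]], dtype=float)

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]], dtype=float)

def axes_to_canvas(R_obj, scale=100.0, origin=(200.0, 200.0)):
    """Orthographic drawing coordinates for the centre and rotated x, y, z axes."""
    # Hypothetical fixed camera view: 45 degrees about y, then 30 degrees about x.
    R_cam = rot_x(np.deg2rad(30)) @ rot_y(np.deg2rad(45))
    M = R_cam @ R_obj
    pts = [np.zeros(3), np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0])]
    out = []
    for p in pts:
        x, y, _ = M @ p  # orthographic: keep x, y and drop z
        # Scale to pixels and flip y, because image rows grow downwards.
        out.append((origin[0] + scale * x, origin[1] - scale * y))
    return out

canvas_pts = axes_to_canvas(np.eye(3))  # centre, x end, y end, z end
```

Each axis can then be drawn as a line from `canvas_pts[0]` to the corresponding end point, e.g. with OpenCV's line drawing function.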

Model matrix in 3D graphics / OpenGL

I'm following some tutorials to learn OpenGL (from www.opengl-tutorial.org, if it makes any difference), and there is an exercise that asks me to draw a cube and a triangle on the screen; as a hint, it says I'm supposed to calculate two MVP matrices, one for each object. The MVP matrix is given by Projection*View*Model and, as far as I understand, the projection and view matrices are the same for all the objects on the screen (they are only affected by my choice of "camera" location and settings). However, the model matrix should change, since it's supposed to give me the position and rotation of the object in global coordinates. Following the tutorials, for my cube the model matrix is just the identity matrix, since it is located at the origin and there's no rotation or scaling. Then I draw my triangle so that its vertices are at (2,2,0), (2,3,0) and (3,2,0). Now my question is: what is the model matrix for my triangle?
My own reasoning says that if I don't want to rotate or scale it, the model matrix should be just translation matrix. But what gives the translation coordinates here? Should it include the location of one of the vertices or the center of the triangle or what? Or have I completely misunderstood what the model matrix is?
The model matrix, like the other matrices (projection, view), is a 4x4 matrix with the same layout. Depending on whether you're using column or row vectors, the matrix consists of the x, y, z axes of your local frame and a vector (t1, t2, t3) specifying the translation part.
so for a column vector p the transformation matrix (M) looks like
x1, x2, x3, t1,
y1, y2, y3, t2,
z1, z2, z3, t3,
0, 0, 0, 1
p' = M * p
For row vectors, the layout is transposed (the axes and the translation end up in the rows), and the multiplication order is reversed: p' = p * M.
If you have no rotational component, your local frame has the usual x, y, z axes as the columns of the upper-left 3x3 submatrix of the model matrix (first column: x axis, second: y axis, third: z axis):
1 0 0 t1
0 1 0 t2
0 0 1 t3
0 0 0 1
The fourth column specifies the translation vector (t1, t2, t3). If you have a point given by the column vector p = (1, 0, 0, 1) in a local coordinate system, and you want to translate it by +1 in the z direction to place it in the world coordinate system, the model matrix is simply:
1 0 0 0
0 1 0 0
0 0 1 1
0 0 0 1
p' = M * p, where p' is the transformed point in world coordinates.
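The +1-in-z example, written out in NumPy, also shows that the row-vector convention gives the same point when the matrix is transposed:

```python
import numpy as np

# Column-vector convention: p' = M @ p, translation in the last column.
M = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])
p = np.array([1.0, 0.0, 0.0, 1.0])
p_col = M @ p

# Row-vector convention: p' = p @ M_row with M_row = M transposed
# (the translation moves to the last row).
p_row = p @ M.T
```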
For your example above, you could simply specify the triangle at (2,2,0), (2,3,0) and (3,2,0) in your local coordinate system; then the model matrix is trivial (the identity). Otherwise, you have to work out the rotation, translation, etc. yourself. I recommend reading the first few chapters of "Mathematics for 3D Game Programming and Computer Graphics". It's a very accessible 3D math book that gives you the minimal information you need to handle most of the 3D graphics math.
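A minimal NumPy sketch of the two options for the triangle. Choosing the vertex (2,2,0) as the local origin is an arbitrary, illustrative choice; with the centroid or any other point as local origin, the translation in a pure-translation model matrix is simply that point's world position.

```python
import numpy as np

def translation_matrix(t):
    """4x4 model matrix that only translates (column-vector convention)."""
    M = np.eye(4)
    M[:3, 3] = t
    return M

# Option 1: vertices already given in world coordinates -> M is the identity.
tri_world = np.array([[2.0, 2.0, 0.0], [2.0, 3.0, 0.0], [3.0, 2.0, 0.0]])

# Option 2 (illustrative): define the triangle around its own local origin,
# here the vertex (2, 2, 0), and let the model matrix place it.
tri_local = tri_world - np.array([2.0, 2.0, 0.0])  # (0,0,0), (0,1,0), (1,0,0)
M = translation_matrix([2.0, 2.0, 0.0])

tri_h = np.hstack([tri_local, np.ones((3, 1))])    # homogeneous, one vertex per row
tri_placed = (M @ tri_h.T).T[:, :3]                # apply M to each vertex
```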