How to get KITTI camera calibration file? - computer-vision

I'm working with the KITTI dataset for camera-lidar calibration.
I also have other calibration files whose format differs from the KITTI calibration file.
Here is what I have (calib.yaml):
fx: 1065.575589
fy: 1064.775881
cx: 973.889063
cy: 614.194865
k1: -0.139457
k2: 0.071645
p1/k3: -0.000150
p2/k4: 0.000889
Quaternion: [-0.005329,-0.004166,-0.705518,0.708660]
translation_vector: [0.006560,-0.169077,-0.198306]
I can build the camera intrinsic matrix from fx, fy, cx, cy and the distortion coefficients from k1, k2, p1/k3, p2/k4,
and I can get the rotation matrix from the quaternion values.
Here's my question.
Is there any way to convert these values to match the KITTI calibration file data, like this?:
P2: 1056.437682 0.0 974.398942 0.0 0.0 1024.415886 583.178996 0.0 0.0 0.0 1.0 0.0
R0_rect: 0 1 0 0 0 -1 -1 0 0
Tr_velo_to_cam: -0.999069645887872 0.030746732104378723 -0.030240389058074302 0.9276477826710018 -0.030772133023495043 -0.9995263554968372 0.0003748284867846508 0.058829066929946106 -0.03021454111295518 0.0013050410383379344 0.999542584572174 0.21383619101949458
DistCoeff: -0.149976 0.070503 -0.000917 -0.000333 0.0
I already know that P2 is the camera projection matrix, R0_rect is the rectification rotation matrix, and Tr_velo_to_cam is the transformation from Velodyne coordinates to camera coordinates.
The reason I want to convert is simply that my calibration reference code expects the KITTI format, not the calib.yaml format.
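For reference, here is a minimal numpy/scipy sketch of how these values could be assembled into KITTI-style matrices. It assumes the yaml quaternion is in (x, y, z, w) order and that, together with the translation, it already describes the Velodyne-to-camera transform; neither assumption is stated in the yaml, so check them against your sensor setup.

import numpy as np
from scipy.spatial.transform import Rotation

# Intrinsic matrix from fx, fy, cx, cy
K = np.array([[1065.575589, 0.0, 973.889063],
              [0.0, 1064.775881, 614.194865],
              [0.0, 0.0, 1.0]])

# KITTI-style P2 for a single (unrectified) camera: K [I | 0]
P2 = np.hstack([K, np.zeros((3, 1))])

# Rotation from the quaternion (assumed (x, y, z, w) order) plus translation
R = Rotation.from_quat([-0.005329, -0.004166, -0.705518, 0.708660]).as_matrix()
t = np.array([0.006560, -0.169077, -0.198306]).reshape(3, 1)

# KITTI-style 3x4 Tr_velo_to_cam, assuming the extrinsics map Velodyne -> camera
Tr_velo_to_cam = np.hstack([R, t])

# With a single camera and no stereo rectification, R0_rect can be the identity
R0_rect = np.eye(3)

print(P2, R0_rect, Tr_velo_to_cam, sep="\n")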

Related

kitti dataset camera projection matrix

I am looking at the KITTI dataset, and in particular at how to convert a world point into image coordinates. I looked at the README, and it says (below) that I need to transform to camera coordinates first and then multiply by the projection matrix. I have two questions, coming from a non-computer-vision background.
I looked at the numbers in calib.txt, and in particular the matrix is 3x4 with non-zero values in the last column. I always thought this matrix = K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? E.g. P2 is
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
       [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
       [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
After applying the projection to get [u, v, w] and dividing u and v by w, are these values with respect to an origin at the center of the image, or an origin at the top left of the image?
README:
calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 projection matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:
x = Pi * Tr * X
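(To make that formula concrete, a minimal numpy sketch using the P2 values above; the 3x4 Tr would be padded with a [0, 0, 0, 1] row, and here it is replaced by an identity placeholder just to show the shapes.)

import numpy as np

P2 = np.array([[7.070912e+02, 0.0, 6.018873e+02, 4.688783e+01],
               [0.0, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.0, 0.0, 1.0, 6.203223e-03]])
Tr = np.eye(4)                       # placeholder for the padded 4x4 Tr

X = np.array([1.0, 0.5, 8.0, 1.0])   # homogeneous scanner point (x, y, z, 1)
x = P2 @ Tr @ X                      # homogeneous image point (u*w, v*w, w)
u, v = x[0] / x[2], x[1] / x[2]      # pixel coordinates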
Refs:
How to understand the KITTI camera calibration files?
Format of parameters in KITTI's calibration file
http://www.cvlibs.net/publications/Geiger2013IJRR.pdf
Answer:
I strongly recommend you read those references above. They may solve most, if not all, of your questions.
For question 2: the projected points in the image are with respect to an origin at the top left. See refs 2 & 3: the image coordinates of a far 3D point are (center_x, center_y), whose values are provided in the P_rect matrices. Or you can verify this with some simple code:
import numpy as np
p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
              [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
              [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = [0, 0, 1E8, 1] # A far 3D point
y = np.dot(p, x)
y[0] /= y[2]
y[1] /= y[2]
y = y[:2]
print(y)
You will see some output like:
array([6.018873e+02, 1.831104e+02])
which is quite near (p[0, 2], p[1, 2]), a.k.a. (center_x, center_y).
Each of the P matrices (3x4) has the form:
P(i)rect = [[fu   0  cx  -fu*bx],
            [ 0  fv  cy  -fv*by],
            [ 0   0   1       0]]
The last column encodes the baselines in meters w.r.t. the reference camera 0. You can see that P0 has all zeros in the last column because it is the reference camera.
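For instance, the baseline terms can be read back out of the P2 matrix from the question (a small sketch; for this P2 the recovered bx comes out around -0.066 m):

import numpy as np

P2 = np.array([[7.070912e+02, 0.0, 6.018873e+02, 4.688783e+01],
               [0.0, 7.070912e+02, 1.831104e+02, 1.178601e-01],
               [0.0, 0.0, 1.0, 6.203223e-03]])

fu, fv = P2[0, 0], P2[1, 1]
# last column = (-fu*bx, -fv*by, ~0), so:
bx = -P2[0, 3] / fu
by = -P2[1, 3] / fv
print(bx, by)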
This post has more details:
How Kitti calibration matrix was calculated?

OpenCV Equirectangular Rotation

I'm currently stuck on achieving an equirectangular rotation of a 360° image with OpenCV because of my (nearly zero) mathematical understanding of projections and rotation matrices.
The result of such a rotation would be exactly what you can see here: https://www.youtube.com/watch?v=l1N0lEKIeLA
I found some code here: https://github.com/FoxelSA/libgnomonic/wiki/Equirectangular-rotation_v0.1 but I didn't succeed in applying it to OpenCV.
If someone has any idea how to apply it to an OpenCV Mat with pitch, yaw and roll angles, it would be highly appreciated!
Thanks!
Instead of talking about yaw, pitch and roll, I'll talk here about Euler angles x, y and z.
To perform a rotation of your equirectangular mapping, you can follow this procedure:
Consider coordinates (i2, j2) in your result image. We'll try to find which color to put here. These coordinates correspond to a point on the sphere with latitude lat2 = 180 * i2 / image.height and longitude lon2 = 360 * j2 / image.width. Compute the corresponding 3D vector v2.
Compute the rotation matrix R with angles x, y and z (look at the formulas here). Take the transpose of this matrix to get the inverse rotation from the new image to the old one. We'll name this inverse rotation matrix Rt.
Compute v1 = Rt * v2. Then compute the latitude lat1 and longitude lon1 of v1.
Find the color in the original image at coordinates i1 = image.height * lat1 / 180 and j1 = image.width * lon1 / 360. This might not be integer coordinates. You might have to interpolate between several pixels to get your value. This is the color of the pixel at position (i2, j2) in your new image.
You'll need to look at how to convert between 3D vectors on a sphere and their latitude and longitude angles but this shouldn't be too hard to find. The algorithm described here should be rather straightforward to implement.
Let me know if I made any mistake as I haven't tested it myself.
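In case it helps, here is an untested numpy/OpenCV sketch of that procedure. It treats the latitude above as the colatitude in [0, π], uses scipy to build the Euler-angle rotation matrix, and lets cv2.remap do the interpolation; these choices are mine, not part of the original description.

import cv2
import numpy as np
from scipy.spatial.transform import Rotation

def rotate_equirectangular(img, x_deg, y_deg, z_deg):
    h, w = img.shape[:2]

    # Destination pixel grid (i2, j2) -> spherical angles
    j2, i2 = np.meshgrid(np.arange(w), np.arange(h))
    lat2 = np.pi * i2 / h          # colatitude, 0..pi
    lon2 = 2 * np.pi * j2 / w      # longitude, 0..2*pi

    # Unit vectors v2 on the sphere
    v2 = np.stack([np.sin(lat2) * np.cos(lon2),
                   np.sin(lat2) * np.sin(lon2),
                   np.cos(lat2)], axis=-1)

    # Inverse rotation Rt applied to every v2 (row-wise v2 @ R equals R.T @ v2)
    R = Rotation.from_euler('xyz', [x_deg, y_deg, z_deg], degrees=True).as_matrix()
    v1 = v2 @ R

    # Back to angles, then to source pixel coordinates (i1, j1)
    lat1 = np.arccos(np.clip(v1[..., 2], -1.0, 1.0))
    lon1 = np.arctan2(v1[..., 1], v1[..., 0]) % (2 * np.pi)
    map_x = (w * lon1 / (2 * np.pi)).astype(np.float32)
    map_y = (h * lat1 / np.pi).astype(np.float32)

    # Sample the original image at the computed coordinates (bilinear interpolation)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)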

Projective or Euclidean 3D Reconstruction?

I have problems understanding whether I get a Euclidean reconstruction result or just a projective one. So first, let me tell you what I've done:
I have two stereo images. They are SEM images and are eucentrically tilted; the difference in tilt is 5°. Using SURF correspondences and RANSAC, I calculate the fundamental matrix with the normalized 8-point algorithm.
Then the images are rectified and I do a dense stereo matching:
import cv2
import numpy as np

minDisp = -16
numDisp = 16 - minDisp
stereo = cv2.StereoSGBM_create(minDisparity=minDisp,
                               numDisparities=numDisp)
disp = stereo.compute(imgL, imgR).astype(np.float32) / 16.0
That gives me a disparity map, e.g. the excerpt below (the values range from -16 to 16). I mask out the bad pixels (-17) and compute the z-component of my images using the flattened disp array.
disp = [ -0.1875  -0.1250  -0.1250   0
         -0.1250  -0.1250  -0.1250  -17
         -0.0625  -0.0625  -0.1250  -17
         -0.0625  -0.0625   0        0.0625
          0        0        0.0625   0.1250 ]
# create mask that eliminates the bad pixel values ( = minimum values)
mask = disp != disp.min()
dispMasked = disp[mask]

# compute z-component; p is the pixel constant (mm per pixel),
# tilt is the tilt difference (np.sin expects radians)
zWorld = np.float32(dispMasked * p / (2 * np.sin(tilt)))
This is a simplified form of a real triangulation, assuming a parallel projection and using trigonometric equations. The pixel constant p was determined with a calibration object, so I get the height in mm; the disparity is measured in pixels.
The resulting point cloud looks quite good, but all points have a small constant tilt, so the reconstructed point-cloud plane is tilted.
My question now is: is this point cloud in real Euclidean coordinates, or do I have a projective reconstruction (equal to an affine reconstruction?) that still differs from a Euclidean result by an unknown transformation?
The reason I ask is that I don't have a real calibration matrix, and I didn't use a real triangulation method based on central projection with the camera center coordinates, focal length and image point coordinates.
Any suggestions or literature are appreciated. :)
Best regards and thanks in advance!

Field of view + Aspect Ratio + View Matrix from Projection Matrix (HMD OST Calibration)

I'm currently working on an augmented reality application. The targeted device being an optical see-through HMD, I need to calibrate its display to achieve a correct registration of virtual objects.
I used that implementation of SPAAM for Android to do it, and the results are precise enough for my purpose.
My problem is that the calibration application outputs a 4x4 projection matrix I could use directly with OpenGL, for example. But the augmented reality framework I use only accepts optical calibration parameters in the format: field of view (some parameter) + aspect ratio (some parameter) + 4x4 view matrix.
Here is what I have :
Correct calibration result, in the wrong format:
6.191399, 0.114267, -0.142429, -0.142144
-0.100027, 11.791289, 0.05604, 0.055928
0.217304,-0.486923, -0.990243, -0.988265
0.728104, 0.005347, -0.197072, 0.003122
You can take a look at the code that generates this result here.
What I understand is that the Single Point Active Alignment Method produces a 3x4 matrix, and the program then multiplies this matrix by an orthographic projection matrix to get the result above. Here are the parameters used to produce the orthographic matrix:
near : 0.1, far : 100.0, right : 960, left : 0, top : 540, bottom: 0
Bad calibration result, in the right format:
Param 1 : 12.465418
Param 2 : 1.535465
0.995903, -0.046072, 0.077501, 0.000000
0.050040, 0.994671, -0.047959, 0.000000
-0.075318, 0.051640, 0.992901, 0.000000
114.639359, -14.115030, -24.993097, 1.000000
I don't have any information on how these results are obtained.
I read these parameters from binary files, and I don't know whether the matrices are stored in row- or column-major order, so the two matrices may have to be transposed.
My question is: is it possible, and if so, how can I get these three parameters from the first (projection) matrix I have?
Is it possible, and if yes, how to get these three parameters from the projection matrix I have ?
The projection matrix and the view matrix describe completely different transformations. While the projection matrix describes the mapping from 3D points of a scene to 2D points of the viewport, the view matrix describes the direction and position from which the scene is looked at. The view matrix is defined by the camera position, the direction to the target of view, and the up vector of the camera (see Transform the modelMatrix).
This means it is not possible to get the view matrix from the projection matrix. But your camera defines a view matrix.
If the projection is perspective, then it is possible to get the field of view angle and the aspect ratio from the projection matrix.
The Perspective Projection Matrix looks like this:
r = right, l = left, b = bottom, t = top, n = near, f = far
2*n/(r-l)     0             0              0
0             2*n/(t-b)     0              0
(r+l)/(r-l)   (t+b)/(t-b)  -(f+n)/(f-n)   -1
0             0            -2*f*n/(f-n)    0
it follows:
aspect = w / h
tanFov = tan( fov_y * 0.5 );
p[0][0] = 2*n/(r-l) = 1.0 / (tanFov * aspect)
p[1][1] = 2*n/(t-b) = 1.0 / tanFov
The field of view angle along the Y-axis in degrees:
fov = 2.0*atan( 1.0/prjMatrix[1][1] ) * 180.0 / PI;
The aspect ratio:
aspect = prjMatrix[1][1] / prjMatrix[0][0];
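A minimal numpy sketch of these two formulas, applied to an example perspective matrix in the layout shown above (the values are made up for a 60° vertical field of view, a 4:3 aspect ratio, near = 0.1 and far = 100):

import math
import numpy as np

prjMatrix = np.array([[1.299038, 0.0,       0.0,        0.0],
                      [0.0,      1.732051,  0.0,        0.0],
                      [0.0,      0.0,      -1.002002,  -1.0],
                      [0.0,      0.0,      -0.2002002,  0.0]])

fov_y  = 2.0 * math.atan(1.0 / prjMatrix[1][1]) * 180.0 / math.pi  # ~60.0 degrees
aspect = prjMatrix[1][1] / prjMatrix[0][0]                         # ~1.333 (4:3)
print(fov_y, aspect)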
See further the answers to the following question:
How to render depth linearly in modern OpenGL with gl_FragCoord.z in fragment shader?
How to recover view space position given view space depth value and ndc xy

OpenGL: Change Camera's view port in 3D by given 4-by-4 matrix

I am now trying to change the camera's view using a given 4-by-4 matrix of the form
R11 R12 R13 transx
R21 R22 R23 transy
R31 R32 R33 transz
0 0 0 1
R is the 3-by-3 rotation matrix, while transx, transy, transz are translations along the x, y and z axes in 3D space.
My code is as follows:
eyex = transx;
eyey = transy;
eyez = transz;
atx = transx;
aty = transy;
atz = transz+1;
gluLookAt (R11*eyex + R12*eyey + R13*eyez, R21*eyex + R22*eyey + R23*eyez, R31*eyex + R32*eyey + R33*eyez, atx, aty, atz, 0.0, 1.0, 0.0);
However, I can't get the correct result. (Rotation seems OK, but problems occur with the translation.)
Could someone point out the error in my code?
Not really sure what you want to do, but you seem to be confusing issues. gluLookAt is a function that generates the camera (view) matrix for you from the eye position and a point you are looking at. So in short, either you use gluLookAt to build your matrix, or you build the camera matrix yourself and use it to set the OpenGL camera matrix.
In OpenGL 4, this would be done on the CPU (building up that matrix) and then loading it into the vertex shader. In the end you transform your object to world space, then transform the vertices from world space to camera space, and finally project using the perspective projection matrix.
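To illustrate the second option, a small numpy sketch that treats the given 4x4 matrix as a camera-to-world transform and derives both the view matrix (its inverse) and equivalent gluLookAt parameters. Whether your matrix really is camera-to-world (rather than world-to-camera) and whether the camera looks down -Z are assumptions you need to check; the numbers are placeholders.

import numpy as np

M = np.array([[1.0, 0.0, 0.0, 2.0],   # [ R | t ] with placeholder values
              [0.0, 1.0, 0.0, 0.5],
              [0.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0, 1.0]])
R = M[:3, :3]
t = M[:3, 3]

# If M is camera-to-world, the view (world-to-camera) matrix is its inverse:
view = np.eye(4)
view[:3, :3] = R.T
view[:3, 3] = -R.T @ t

# Equivalent gluLookAt parameters under the same assumption
# (the OpenGL camera looks down -Z in eye space):
eye = t
forward = -R[:, 2]        # viewing direction in world space
up = R[:, 1]              # up direction in world space
center = eye + forward
# gluLookAt(*eye, *center, *up) then reproduces `view`.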
You can find some info on this website: www.scratchapixel.com