I am taking a Computer Vision course and ran into a problem with an exercise.
I have the intrinsic matrix K and the extrinsic matrix [R|t] of a camera, as follows:
K =
478.989 2.67423 405.437
0 476.472 306.35
0 0 1
[R|t] =
0.681951 -0.00771052 -0.734232 -46.1881
-0.344648 0.882047 -0.331892 -42.4157
0.645105 0.479386 0.598855 118.637
The real-world coordinate system is shown in the picture.
I want to calculate the camera position relative to the world coordinate system,
and the answer is supposed to be
[X, Y, Z] = [74.18, 69.421, 50.904]
How do I get this answer? I have spent a lot of time on it but cannot figure it out.
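This OpenCV documentation page details how to convert from world to camera coordinates:
x = K[R T]X
where x is the 2D image point and X is the 3D world point, both in homogeneous coordinates. The equality holds only up to a scale factor. K[R T] is 3x4, so it has no ordinary inverse, and a pixel x = (u, v, 1) does not determine a single 3D point but a ray. In world coordinates that ray starts at the camera center C = -R^T T and runs along the direction R^T K^-1 x. The camera position relative to the world coordinate system, which is what you want, is therefore C = -R^T T.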
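A minimal pure-Python sketch of the camera center and the back-projected ray, assuming X_cam = R·X_world + T and no lens distortion, using the K and [R|t] from the question. The pixel (400, 300) is an arbitrary example. R^-1 is applied by solving a 3x3 system rather than by transposing, because the rounded R in the question is not exactly orthonormal (for an exact rotation the two agree). Whether the result reproduces the quoted answer depends on the world frame in the picture.

```python
# Intrinsics and extrinsics from the question.
K = [[478.989, 2.67423, 405.437],
     [0.0, 476.472, 306.35],
     [0.0, 0.0, 1.0]]
R = [[0.681951, -0.00771052, -0.734232],
     [-0.344648, 0.882047, -0.331892],
     [0.645105, 0.479386, 0.598855]]
T = [-46.1881, -42.4157, 118.637]


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A, b):
    """Solve A x = b for a 3x3 A using Cramer's rule."""
    d = det3(A)
    result = []
    for col in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][col] = b[r]
        result.append(det3(Ai) / d)
    return result


def camera_center(R, T):
    # The camera center maps to the camera origin: R @ C + T = 0,
    # so C = -R^-1 T (which equals -R^T T for an exact rotation).
    return solve3(R, [-t for t in T])


def pixel_ray(K, R, T, u, v):
    """Return (origin, direction) in world coordinates of the ray through pixel (u, v)."""
    # K^-1 [u, v, 1]^T, written out for an upper-triangular K with skew K[0][1].
    yn = (v - K[1][2]) / K[1][1]
    xn = (u - K[0][2] - K[0][1] * yn) / K[0][0]
    direction = solve3(R, [xn, yn, 1.0])  # R^-1 (K^-1 x)
    return camera_center(R, T), direction


C, d = pixel_ray(K, R, T, 400.0, 300.0)
print("camera position:", [round(c, 3) for c in C])
```

Every world point C + s·d with s > 0 projects back onto the same pixel, which is why the pixel alone cannot give X.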
I am looking at the KITTI dataset, and in particular at how to convert a world point into image coordinates. The README (quoted below) says that I need to transform the point into camera coordinates first and then multiply by the projection matrix. Coming from a non-computer-vision background, I have two questions.
I looked at the numbers in calib.txt, and the projection matrix is 3x4 with non-zero values in the last column. I always thought this matrix was K[I|0], where K is the camera's intrinsic matrix. So why is the last column non-zero, and what does it mean? For example, P2 is
array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
[0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
[0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
After projecting into [u, v, w] and dividing u and v by w, are the resulting values relative to an origin at the center of the image or at its top-left corner?
calib.txt: Calibration data for the cameras: P0/P1 are the 3x4 matrices after rectification. Here P0 denotes the left and P1 denotes the right camera. Tr transforms a point from velodyne coordinates into the left rectified camera coordinate system. In order to map a point X from the velodyne scanner to a point x in the i'th image plane, you thus have to transform it like:
x = Pi * Tr * X
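One practical detail: Tr is stored in calib.txt as a 3x4 matrix, so to chain it with a 3x4 Pi it has to be extended to 4x4 with a [0, 0, 0, 1] row. A minimal pure-Python sketch using the P2 shown above. The Tr below is hypothetical (a pure axis swap from velodyne x forward, y left, z up to camera x right, y down, z forward, with no translation), not a real calibration.

```python
# P2 from calib.txt, as printed above.
P2 = [[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
      [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
      [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]]

# Hypothetical Tr: axis swap only (cam_x = -velo_y, cam_y = -velo_z, cam_z = velo_x).
# Real Tr values must be read from calib.txt.
Tr = [[0.0, -1.0, 0.0, 0.0],
      [0.0, 0.0, -1.0, 0.0],
      [1.0, 0.0, 0.0, 0.0]]


def to_homogeneous_4x4(m34):
    """Append the row [0, 0, 0, 1] so a 3x4 rigid transform can be chained."""
    return [row[:] for row in m34] + [[0.0, 0.0, 0.0, 1.0]]


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def project_velodyne_point(P, Tr, X):
    """x = P * Tr * X, then divide by w to get pixel coordinates (u, v)."""
    M = matmul(P, to_homogeneous_4x4(Tr))  # 3x4 @ 4x4 -> 3x4
    Xh = [X[0], X[1], X[2], 1.0]
    u, v, w = (sum(M[i][j] * Xh[j] for j in range(4)) for i in range(3))
    return u / w, v / w


# A point far ahead of the scanner lands near the principal point (cx, cy).
print(project_velodyne_point(P2, Tr, (1e8, 0.0, 0.0)))
```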
How to understand the KITTI camera calibration files?
Format of parameters in KITTI's calibration file
I strongly recommend reading the references above. They may answer most, if not all, of your questions.
For question 2: the projected points are expressed relative to an origin at the top-left corner of the image. As refs 2 and 3 show, a very distant 3D point on the optical axis projects to (center_x, center_y), whose values are given in the P_rect matrices. You can also verify this with a few lines of code:
import numpy as np

# P2 from KITTI's calib.txt
p = np.array([[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
              [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
              [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]])
x = np.array([0, 0, 1e8, 1])  # a far 3D point on the optical axis, homogeneous
y = p @ x                     # [u*w, v*w, w]
y = y[:2] / y[2]              # perspective divide -> pixel coordinates (u, v)
print(y)
You will see output like:
array([6.018873e+02, 1.831104e+02])
which is essentially (p[0, 2], p[1, 2]), a.k.a. (center_x, center_y).
Each of the 3x4 P matrices has the form:
P(i)rect = [[fu 0 cx -fu*bx],
[0 fv cy -fv*by],
[0 0 1 0]]
The last column therefore holds the baseline (bx, by, in meters) with respect to the reference camera 0, multiplied by the focal length. P0 has all zeros in its last column because it is the reference camera.
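Reading the baseline back out is a division by the focal length. A minimal sketch, assuming the layout above; the small non-zero value in the third row of the P2 printed earlier is ignored here, and the helper name baseline_from_P is hypothetical.

```python
# P2 from calib.txt, as printed above.
P2 = [[7.070912e+02, 0.000000e+00, 6.018873e+02, 4.688783e+01],
      [0.000000e+00, 7.070912e+02, 1.831104e+02, 1.178601e-01],
      [0.000000e+00, 0.000000e+00, 1.000000e+00, 6.203223e-03]]


def baseline_from_P(P):
    """Invert P[0][3] = -fu*bx and P[1][3] = -fv*by (result in meters)."""
    fu, fv = P[0][0], P[1][1]
    bx = -P[0][3] / fu
    by = -P[1][3] / fv
    return bx, by


bx, by = baseline_from_P(P2)
print(f"bx = {bx:.4f} m, by = {by:.6f} m")
```

For the P2 above this gives bx of about -0.066 m, i.e. camera 2 sits roughly 6.6 cm from camera 0 along the x axis, which is why its last column is non-zero.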
This post has more details:
How Kitti calibration matrix was calculated?
I have two chessboard poses obtained with solvePnP:
Mat rotationVector1, translationVector1;
solvePnP(chess1WorldPoints, chess1ImagePoints, intrinsicMatrix, distortCoefficients, rotationVector1, translationVector1);
Mat rotationVector2, translationVector2;
solvePnP(chess2WorldPoints, chess2ImagePoints, intrinsicMatrix, distortCoefficients, rotationVector2, translationVector2);
How can I check if the planes of the poses are parallel, or find the angle between these planes?
More info
I tried obtaining the Euler angles and computing the difference between each alpha, beta and gamma, but I think that only tells me the relative rotation about each axis:
Vec3d eulerAngles1;
Mat rotationMatrix1;
Rodrigues(rotationVector1, rotationMatrix1);  // rotation vector -> 3x3 rotation matrix
getEulerAngles(rotationMatrix1, eulerAngles1);
Vec3d eulerAngles2;
Mat rotationMatrix2;
Rodrigues(rotationVector2, rotationMatrix2);
getEulerAngles(rotationMatrix2, eulerAngles2);
I used the getEulerAngles implementation from Camera Rotation SolvePnp:
void getEulerAngles(Mat &rotCamerMatrix, Vec3d &eulerAngles)
{
    Mat cameraMatrix, rotMatrix, transVect, rotMatrixX, rotMatrixY, rotMatrixZ;
    double* _r = rotCamerMatrix.ptr<double>();  // 3x3 CV_64F rotation, row-major
    // Build [R | 0]; eulerAngles receives the three rotation angles in degrees.
    double projMatrix[12] = {_r[0], _r[1], _r[2], 0,
                             _r[3], _r[4], _r[5], 0,
                             _r[6], _r[7], _r[8], 0};
    decomposeProjectionMatrix(Mat(3, 4, CV_64FC1, projMatrix), cameraMatrix, rotMatrix, transVect, rotMatrixX, rotMatrixY, rotMatrixZ, eulerAngles);
}
In my case, a rotation-translation pair (R, T) gives the correspondence between a coordinate system where the camera is at (0,0,0) (the camera coordinate system) and a coordinate system whose origin (0,0,0) is something I defined in the first two parameters of solvePnP (the world coordinate system). So I have two world coordinate systems relative to the same camera coordinate system.
If I could switch from coordinate system 2 to coordinate system 1, I could use the Z=0 plane of each to find the normals and solve my problem.
I think that switching from coordinate system 2 to the camera system, for example, should be done as in this post:
Rinv = R' (just the transpose as it's a rotation matrix)
Tinv = -Rinv * T (T is 3x1 column vector)
Then, if Pw = [X Y Z] is a point in world coordinate system 2, I can get its camera-system coordinates with:
Pc = [ Rinv Tinv] * [X Y Z 1] transposed.
Pc looks like [a b c d]
Following the same logic again, I can get the coordinates of Pc relative to coordinate system 1:
Pw1 = [ R1 T1] * Pc
Should I normalize Pc, or just normalize Pw1 at the end?
I found how to transform points between coordinate systems in this OpenCV demo.
The explanation in "Demo 3: Homography from the camera displacement" (the section from the title up to the first lines of code) shows how to change coordinate systems using matrix multiplication. I just had to apply it to my situation: I had cMo1 and cMo2 and needed to find o1Mo2 = (cMo1)^-1 * cMo2.
This way I can get both planes in the same coordinate system, compute their normals and find the angle between them.
It also helped to realize that the extrinsic matrix [R T] transforms a 3D point from the world coordinate system to the camera coordinate system (where the camera is at (0,0,0)), not the other way around.
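A minimal sketch of that last step, assuming each (rvec, tvec) from solvePnP maps chessboard coordinates into the camera frame and each board lies in its own Z = 0 plane. Each plane's normal in camera coordinates is then the third column of its rotation matrix, and the angle between the planes follows from the dot product of the two normals; the translations do not affect it. The Rodrigues conversion is written out by hand (same formula as cv::Rodrigues) so the snippet runs without OpenCV, and the example rotation vectors are made up.

```python
import math


def rodrigues(rvec):
    """Rotation vector -> 3x3 rotation matrix: R = cI + (1 - c) k k^T + s [k]_x."""
    theta = math.sqrt(sum(c * c for c in rvec))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in rvec)
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]


def plane_normal_in_camera(rvec):
    # The board's Z axis (0, 0, 1) expressed in camera coordinates is R[:, 2].
    R = rodrigues(rvec)
    return [R[0][2], R[1][2], R[2][2]]


def angle_between_planes_deg(rvec1, rvec2):
    """Angle between the two board planes, in degrees (0 means parallel)."""
    n1 = plane_normal_in_camera(rvec1)
    n2 = plane_normal_in_camera(rvec2)
    cos_a = sum(a * b for a, b in zip(n1, n2))
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding outside [-1, 1]
    angle = math.degrees(math.acos(cos_a))
    # Planes have no orientation: 0 and 180 degrees both mean "parallel".
    return min(angle, 180.0 - angle)


# Made-up example: second board rotated 30 degrees about the camera's X axis.
print(angle_between_planes_deg([0.0, 0.0, 0.0], [math.radians(30), 0.0, 0.0]))
```

The planes are parallel when the returned angle is (close to) zero; what counts as "close" depends on how noisy the solvePnP poses are.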
Here is my reasoning:
OpenGL draws everything within a 2x2x2 cube.
The x, y values inside this cube determine where the point is drawn on the screen. The z value is used for other things (e.g. depth testing).
If you want the z value to affect perspective, you need to transform the scene (usually with a matrix) so that it gives the illusion of distant objects being smaller.
The z values of the cube go from -1 to 1.
Now I want objects at z = 1 to be infinitely zoomed, objects at z = 0 to be normal size, and objects at z = -1 to be half size.
When I say an object is zoomed, I mean that the (x, y) coordinates of its points are multiplied by a scalar zoom factor that depends on its z coordinate.
If a point lies outside the 2x2x2 cube, I still want the calculation applied to it as long as it is between z = -1 and z = 1. Since z itself does not change, I don't care what happens to points outside this range, as long as their z value is not changed.
Generalized point transformation:
If I have a point P = (x, y, z), and -1 <= z <= 1 then:
the Zoom Factor, S = 1 / (1 - z)
so the transformation is:
(x, y, z) ==> (x * S, y * S, z)
Creating the matrix?
This is where I am having issues. I don't know how to create a matrix so that it will transform a generalized point to have the desired effect.
I am considering not using a matrix and applying this transformation via a function in GLSL...
If someone has insight on how to create such a matrix I would like to know.
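The per-point mapping is easy to write as a function, which is essentially the GLSL-function route mentioned above. A minimal Python sketch (function names are hypothetical), next to what a homogeneous 4x4 matrix whose last row is (0, 0, -1, 1) would do: setting w = 1 - z reproduces the x and y scaling after the perspective divide, but the divide also turns z into z / (1 - z). Keeping z unchanged would require z·(1 - z) in the third row, which no linear row can produce.

```python
def zoom_point(x, y, z):
    """Scale x and y by S = 1 / (1 - z) and leave z unchanged (valid for -1 <= z < 1)."""
    s = 1.0 / (1.0 - z)
    return (x * s, y * s, z)


def via_homogeneous_matrix(x, y, z):
    """Apply M = [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,-1,1]], then divide by w."""
    M = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, -1, 1]]
    p = [x, y, z, 1.0]
    xh, yh, zh, w = (sum(M[i][j] * p[j] for j in range(4)) for i in range(4))
    return (xh / w, yh / w, zh / w)


print(zoom_point(1.0, 2.0, -1.0))              # (0.5, 1.0, -1.0): half size
print(via_homogeneous_matrix(1.0, 2.0, -1.0))  # (0.5, 1.0, -0.5): same x, y, but z changed
```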
OpenCV's reprojectImageTo3D() outputs a "3-channel image representing a 3D surface".
You can access this data with:
Vec3f coordinates = _3dImage.at<Vec3f>(y,x);
float depth = _3dImage.at<Vec3f>(y,x)[2];
where coordinates holds the vector [X, Y, Z] and depth is its Z component.
In "Learning OpenCV" by Gary Bradski & Adrian Kaehler, it is explained that the depth is calculated by
Z = f T / (x_left - x_right)
where f = focal length, T = eye base/translation between cameras, (x_left - x_right) = disparity
This exact formula is implemented in OpenCV (I checked the source code; for some reason there is an additional negative sign). The question is: in which unit are the X, Y, Z values specified?
T is in your chosen unit (e.g. mm), x_left - x_right is in pixels, and the unit of f is... ?
When you calibrate the camera, you specify the chessboard's square size in real-world units (e.g. mm). Does the intrinsic matrix therefore have real-world units, or is it specified in pixels? Unfortunately, I cannot find the answer in the documentation.
The underlying equation that performs depth reconstruction is:
Z = fB/d, where
f is the focal length (in pixels)
B is the stereo baseline, which you called eye base/translation between cameras (in the same unit as the chessboard square size, e.g. meters)
d is the disparity (in pixels), which measures the difference in retinal position between corresponding points
Z is the distance along the camera Z axis
The 3D position (X,Y,Z) of an image point (e.g. (u,v) in pixels) can be given in meters, cm, mm or whatever you choose, because the 3D coordinates (X,Y,Z) are in the same units as the chessboard's square size. For example, if you define the square size to be 1 cm then the 3D coordinates will be in cm as well.
std::vector<cv::Point3f> corners;  // object points passed to calibration
cv::Size boardSize(4, 5);          // 4x5 inner-corner grid
float squareSize = 0.025F;         // 0.025 meters, so 3D results come out in meters
for (int i = 0; i < boardSize.height; i++)
    for (int j = 0; j < boardSize.width; j++)
        corners.push_back(cv::Point3f(j * squareSize, i * squareSize, 0.0F));
After Z is determined, X and Y can be calculated using the usual projective camera equations, with (u, v) measured relative to the principal point (cx, cy):
X = (u - cx)Z/f
Y = (v - cy)Z/f
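A small sketch of the whole chain, assuming rectified images, a single focal length f in pixels for both axes, a principal point (cx, cy), and a baseline B in the calibration unit, with u and v measured from the principal point. All numbers in the example are hypothetical; the point is that Z, X and Y come out in the unit of B.

```python
def depth_from_disparity(f_px, baseline, disparity_px):
    """Z = f * B / d; Z comes out in the same unit as the baseline B."""
    return f_px * baseline / disparity_px


def backproject(u, v, Z, f_px, cx, cy):
    """Pixel (u, v) plus depth Z -> (X, Y, Z) in the camera frame (pinhole, fx = fy = f)."""
    X = (u - cx) * Z / f_px
    Y = (v - cy) * Z / f_px
    return X, Y, Z


# Hypothetical numbers: f = 700 px, B = 0.12 m, disparity = 42 px.
Z = depth_from_disparity(700.0, 0.12, 42.0)  # about 2.0, in meters because B is in meters
print(backproject(400.0, 250.0, Z, 700.0, 320.0, 240.0))
```

Since f and d are both in pixels, the pixel units cancel in f·B/d, which is why the calibration unit of the square size carries through to X, Y and Z.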
I have a point in 3D space and two angles, I want to calculate the resulting line from this information. I have found how to do this with 2D lines, but not 3D. How can this be calculated?
If it helps: I'm using C++ and OpenGL. I have the location of the user's mouse click and the angle of the camera, and I want to trace this line to find intersections.
In trigonometric terms, two angles and a point are required to define a line in 3D space. Converting the angles to (x, y, z) is just a conversion from spherical to Cartesian coordinates; the equations are:
x = r sin(q) cos(f)
y = r sin(q) sin(f)
z = r cos(q)
where r is the distance from the point P to the origin O, q (the zenith) is the angle between the line OP and the positive polar axis (which can be thought of as the z axis), and f (the azimuth) is the angle between the initial ray and the projection of OP onto the equatorial plane (usually measured from the x axis).
Okay, that was the first part of what you asked. The rest of it, the real question after the updates, is much more complicated than creating a line from two angles and a point in 3D space. It involves a camera-to-world transformation matrix and has been covered in other SO questions. For convenience, here's one: How does one convert world coordinates to camera coordinates? Its answers cover both world-to-camera and camera-to-world conversion.
A line can be thought of as a point moving over "time". The equation must be vectorized, i.e. have a direction, to make sense, so time is a natural way to think of it. An equation of a line in three dimensions can therefore be written as three equations giving x, y and z as functions of time, such as:
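Combining this with the given point gives the line: take r = 1 to get a unit direction d from the two angles, then every point on the line is P + t·d. A minimal sketch using the zenith/azimuth convention described above, with angles in radians; the function names are hypothetical.

```python
import math


def direction_from_angles(zenith, azimuth):
    """Unit vector for zenith angle q (from +z) and azimuth f (from +x in the xy-plane)."""
    return (
        math.sin(zenith) * math.cos(azimuth),
        math.sin(zenith) * math.sin(azimuth),
        math.cos(zenith),
    )


def point_on_line(p, zenith, azimuth, t):
    """Point at parameter t on the line through p with the given direction angles."""
    d = direction_from_angles(zenith, azimuth)
    return tuple(pi + t * di for pi, di in zip(p, d))


# Zenith 90 degrees, azimuth 0 degrees points along +x.
print(point_on_line((1.0, 2.0, 3.0), math.pi / 2, 0.0, 5.0))
```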
x = ax*t + cx
y = ay*t + cy
z = az*t + cz
To find that set of equations, assuming the camera is at the origin (0,0,0) and your point is (x1,y1,z1):
ax = x1 - 0
ay = y1 - 0
az = z1 - 0
cx = cy = cz = 0
x = x1*t
y = y1*t
z = z1*t
Note: this also assumes that the "speed" of the line is such that it reaches your point (x1,y1,z1) after 1 second.
So to draw the line, just compute points as finely as you like for as long as required, for example every 1/1000 of a second for 10 seconds. That draws a "line" (really a series of points that look like a line from a distance) covering 10 seconds' worth of distance at the "speed" you chose.
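The sampling described above, as a minimal sketch with the camera at the origin. The step size and duration are arbitrary choices; the defaults of 1/1000 and 10 are hypothetical and just mirror the example in the text.

```python
def sample_line(x1, y1, z1, dt=0.001, duration=10.0):
    """Points (x1*t, y1*t, z1*t) for t = 0, dt, 2*dt, ..., duration."""
    steps = int(round(duration / dt))
    return [(x1 * t, y1 * t, z1 * t) for t in (k * dt for k in range(steps + 1))]


pts = sample_line(1.0, 2.0, 3.0)
print(len(pts), pts[1000])  # t = 1.0 lands on (x1, y1, z1)
```

For intersection tests it is usually cheaper to solve for t analytically against each surface than to sample densely, but sampling is the direct translation of the description above.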