I want to add a mesh model (let's say a cube) to a frame captured from a camera.
I also know all the information about where to put the cube:
Translation matrix - relative to the camera
Rotation matrix - relative to the camera
camera calibration matrix - focal length, principal point, etc. (intrinsic parameters)
How can I convert this information to model/view/projection matrices?
What should be the values to set to these matrices?
For example, let's say that I want to display the point [x, y, z, 1] on the screen,
then that should be something like: [u, v, 1] = K * [R | T] * [x, y, z, 1], where:
u, v are the coordinates on the screen (or in the camera capture), and:
K, R and T are the camera intrinsic matrix, rotation and translation, respectively.
How to convert K, R, T to model/view/projection matrices?
[R | T] would be your model-view matrix and K would be your projection matrix.
Model-view matrix is usually one matrix. The separation is only conceptual: Model translates from model coordinates to world coordinates and View from world coordinates to camera (not-yet-projected) coordinates. It makes sense in applications where the camera and the objects move independently from each other. In your case, on the other hand, the camera can be considered fixed and everything else is described relative to the camera. So you have to deal only with two matrices: model-view and projection.
Assuming that your camera intrinsic matrix comes from OpenCV (http://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html), how to initialize your OpenGL projection matrix is described there:
https://blog.noctua-software.com/opencv-opengl-projection-matrix.html
In that formula, cx, cy, width and height are in pixels.
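A minimal sketch of that kind of projection matrix using GLM (the helper name and parameters are mine, not the blog's); it assumes the model-view matrix already follows OpenGL's camera convention (camera looking down -Z, Y up), so an OpenCV-style pose has to have its Y and Z axes flipped first:

#include <glm/glm.hpp>

// fx, fy, cx, cy, width, height in pixels; znear/zfar are the clip planes.
glm::mat4 projectionFromIntrinsics(float fx, float fy, float cx, float cy,
                                   float width, float height,
                                   float znear, float zfar)
{
    glm::mat4 P(0.0f);                            // GLM is column-major: P[col][row]
    P[0][0] =  2.0f * fx / width;                 // focal length x
    P[1][1] =  2.0f * fy / height;                // focal length y
    P[2][0] =  1.0f - 2.0f * cx / width;          // principal point x
    P[2][1] =  2.0f * cy / height - 1.0f;         // principal point y (image y axis flipped)
    P[2][2] = -(zfar + znear) / (zfar - znear);   // usual [-1, 1] depth mapping
    P[3][2] = -2.0f * zfar * znear / (zfar - znear);
    P[2][3] = -1.0f;                              // w_clip = -z_eye for the perspective divide
    return P;
}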
As for your OpenGL model-view matrix, it's really simple: it is just [R | T] expanded to a 4x4 matrix with a bottom row of (0, 0, 0, 1).
So in the end your model-view-projection matrix is MVP = Projection * ModelView.
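A sketch of the model-view part under the same assumptions (GLM, column-major; R and T are the camera-relative rotation and translation from the question, already converted to OpenGL's camera convention, e.g. by pre-multiplying the OpenCV pose with diag(1, -1, -1, 1)):

#include <glm/glm.hpp>

glm::mat4 modelViewFromRT(const glm::mat3& R, const glm::vec3& T)
{
    glm::mat4 MV(1.0f);
    for (int col = 0; col < 3; ++col)
        for (int row = 0; row < 3; ++row)
            MV[col][row] = R[col][row];           // upper-left 3x3 block = rotation
    MV[3] = glm::vec4(T, 1.0f);                   // last column = translation
    return MV;
}

// glm::mat4 mvp = projectionFromIntrinsics(fx, fy, cx, cy, w, h, zn, zf)
//               * modelViewFromRT(R, T);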
I am trying to orient a 3D object at the world origin such that it doesn't change its position w.r.t. the camera when I move the camera or change its field of view. I tried doing this:
Object Transform = Inverse(CameraProjectionMatrix)
How do I undo the perspective divide? When I change the FOV, the object is affected by it.
In detail it looks like this:
origin(0.0, 0.0, 0.0, 1.0f);
projViewInverse = Camera.projViewMatrix().inverse();
projectionMatrix = Camera.projViewMatrix();
projectedOrigin = projectionMatrix * origin;
topRight(0.5f, 0.5f, 0.f);
scaleFactor = 1.0/projectedOrigin.z();
scale(scaleFactor,scaleFactor,scaleFactor);
finalMatrix = projViewInverse * Scaling(w) * Translation(topRight);
If you use a gfx pipeline where positions (w=1.0) and vectors (w=0.0) are transformed to NDC like this:
(x',y',z',w') = M*(x,y,z,w) // applying transforms
(x'',y'') = (x',y')/w' // perspective divide
where M is all your 4x4 homogeneous transform matrices multiplied together in their order. If you want to go back to the original (x,y,z) you need to know w', which can be computed from z'. The equation depends on your projection. In such a case you can do this:
w' = f(z') // z' is usually the value encoded in the depth buffer and can be obtained from it
(x',y') = (x'',y'')*w' // screen -> camera
(x,y) = Inverse(M)*(x',y',z',w') // camera -> world
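For the common case where M = projection * view and you have the full NDC position (including z''), a minimal GLM sketch of that back-transform (essentially what glm::unProject does, minus the viewport handling) could look like this:

#include <glm/glm.hpp>

// ndc = (x'', y'', z''), each in [-1, 1]
glm::vec3 unprojectNdc(const glm::vec3& ndc, const glm::mat4& proj, const glm::mat4& view)
{
    // Inverse(M) applied to the NDC point; dividing by w undoes the perspective
    // divide (this is where the w' discussed above is recovered implicitly).
    glm::vec4 p = glm::inverse(proj * view) * glm::vec4(ndc, 1.0f);
    return glm::vec3(p) / p.w;
}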
However, this can be used only if you know z' and can derive w' from it. So what is usually done (if we cannot) is to cast a ray from the camera focal point through (x'',y'') and stop at the wanted perpendicular distance to the camera. For a perspective projection you can look at it as triangle similarity.
So for each vertex you want to transform you need its projected x'',y'' position on the znear plane (screen), and then just scale x'',y'' by the ratio between the distances to the camera focal point (*z1/z0). Now all we need is the focal length z0. That one depends on the kind of projection matrix you use. I usually encounter 2 versions: when you are in the camera coordinate system, the point (0,0,0) is either the focal point or lies on the znear plane. However, the projection matrix can be anything, hence the focal point position can vary ...
Now when you have to deal with aspect ratio, the first method deals with it internally as it's inside M. The second method needs the inverse of the aspect-ratio correction applied before the conversion, so apply it directly on x'',y''.
I'm displaying an array of 3D points with OpenGL. The problem is the 3D points are from a sensor where X is forward, Y is to the left, Z is up. From my understanding OpenGL has X to the right, Y up, Z out of screen. So when I use a lot of the examples of projection matrices, and cameras the points are obviously not viewed the right way, or the way that makes sense.
So to compare the two (S for sensor, O for OpenGL):
Xs == -Zo, Ys == -Xo, Zs == Yo.
Now my questions are:
How can I rotate the points from S to O? I tried rotating by 90 degrees around X, then Z, but it doesn't appear to be working.
Do I even need to rotate to the OpenGL convention? Can I make up my own axes (use the sensor's orientation) and change the camera code, or will some assumptions break somewhere in the graphics pipeline?
My implementation based on the answer below:
glm::mat4 model = glm::mat4(0.0f);
model[0][1] = -1;
model[1][2] = 1;
model[2][0] = -1;
// My input to the shader was a mat4 for the model matrix so need to
// make sure the bottom right element is 1
model[3][3] = 1;
The one line in the shader:
// Note that the above matrix is OpenGL to Sensor frame conversion
// I want Sensor to OpenGL so I need to take the inverse of the model matrix
// In the real implementation I will change the code above to
// take inverse before sending to shader
" gl_Position = projection * view * inverse(model) * vec4(lidar_pt.x, lidar_pt.y, lidar_pt.z, 1.0f);\n"
In order to convert the sensor data's coordinate system into OpenGL's right-handed world space, where the X axis points to the right, Y points up and Z points towards the user in front of the screen (i.e. "out of the screen"), you can very easily come up with a 3x3 rotation matrix that will perform what you want.
Since you said that in the sensor's coordinate system X points into the screen (which is equivalent to OpenGL's -Z axis), we will map the sensor's (1, 0, 0) axis to (0, 0, -1).
And your sensor's Y axis points to the left (as you said), so that will be OpenGL's (-1, 0, 0). And likewise, the sensor's Z axis points up, so that will be OpenGL's (0, 1, 0).
With this information, we can build the rotation matrix:
/ 0 -1 0\
| 0 0 1|
\-1 0 0/
Simply multiply your sensor data vertices with this matrix before applying OpenGL's view and projection transformation.
So, when you multiply that out with a vector (Sx, Sy, Sz), you get:
Ox = -Sy
Oy = Sz
Oz = -Sx
(where Ox/y/z is the point in OpenGL coordinates and Sx/y/z is the sensor coordinates).
Now, you can just build a transformation matrix (right-multiply against your usual model-view-projection matrix) and let a shader transform the vertices by that or you simply pre-transform the sensor vertices before uploading to OpenGL.
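For reference, a small GLM sketch of that matrix built directly in the sensor-to-OpenGL direction, so no inverse() call is needed in the shader (the function name is just for illustration):

#include <glm/glm.hpp>

glm::mat4 sensorToOpenGL()
{
    // Each column is the image of one sensor axis (GLM is column-major).
    glm::mat4 m(0.0f);
    m[0] = glm::vec4( 0.0f, 0.0f, -1.0f, 0.0f);   // sensor X (forward) -> OpenGL -Z
    m[1] = glm::vec4(-1.0f, 0.0f,  0.0f, 0.0f);   // sensor Y (left)    -> OpenGL -X
    m[2] = glm::vec4( 0.0f, 1.0f,  0.0f, 0.0f);   // sensor Z (up)      -> OpenGL +Y
    m[3] = glm::vec4( 0.0f, 0.0f,  0.0f, 1.0f);
    return m;
}

In the shader this would simply be gl_Position = projection * view * model * vec4(lidar_pt.x, lidar_pt.y, lidar_pt.z, 1.0) with model = sensorToOpenGL(), matching the implementation above without the inverse().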
You hardly ever need angles in OpenGL when you know your linear algebra math.
I am trying to understand the math behind the transformation from world coordinates to view coordinates.
This is the formula to calculate the matrix in view coordinates, and here is an example that should normally be correct, where b = the width of the viewport and h = the height of the viewport.
But I just don't know how to calculate the R matrix. How do you get Ux, Uy, Uz, Vx, Vy, etc.? u, v and n form the coordinate system fixed to the camera, and the camera is at position X0, Y0, Z0.
The matrix T is applied first. It translates some world coordinate P by minus the camera coordinate (call it C), giving the relative coordinate of P (call this Q) with respect to the camera (Q = P - C), in the world axes orientation.
The matrix R is then applied to Q. It performs a rotation to obtain the coordinates of Q in the camera's axes.
u is the horizontal view axis
v is the vertical view axis
n is the view direction axis
(all three should be normalized)
Multiplying R with Q :
multiplying with the first line of R gives DOT(Q, u). This returns the component of Q projected onto u, which is the horizontal view coordinate.
the second line gives DOT(Q, v), which similar to above gives the vertical view coordinate.
the third line gives DOT(Q, n), which is the depth view coordinate.
BTW These are NOT screen/viewport coordinates! They are just the coordinates in the camera/view frame. To get the perspective-corrected coordinate another matrix (the projection matrix) needs to be applied.
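A small sketch of that construction, assuming GLM (C is the camera position, and u, v, n are the normalized camera axes described above):

#include <glm/glm.hpp>

glm::mat4 viewFromCamera(const glm::vec3& C,
                         const glm::vec3& u, const glm::vec3& v, const glm::vec3& n)
{
    glm::mat4 R(1.0f);                 // R has u, v, n as its rows (GLM is column-major: R[col][row])
    R[0][0] = u.x; R[1][0] = u.y; R[2][0] = u.z;
    R[0][1] = v.x; R[1][1] = v.y; R[2][1] = v.z;
    R[0][2] = n.x; R[1][2] = n.y; R[2][2] = n.z;

    glm::mat4 T(1.0f);
    T[3] = glm::vec4(-C, 1.0f);        // translate by -C (T is applied first)

    return R * T;
}

Multiplying this with a world point P gives exactly the DOT(Q, u), DOT(Q, v), DOT(Q, n) components described above.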
I am using the default OpenGL values like glDepthRangef(0.0, 1.0), glDepthFunc(GL_LESS) and glClearDepthf(1.f), because my projection matrices change the right-handed coordinates into left-handed ones. I mean, my near plane and far plane z-values are supposed to map to [-1, 1] in NDC.
The problem is that when I draw two objects into the one FBO (sharing the same RBOs), for example like the code below,
glEnable(GL_DEPTH_TEST);
glClearDepthf(1.f);
glClearColor(0.0,0.0,0.0,0.0);
glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT);
drawObj1(); // this uses 1) the orthogonal projection below
drawObj2(); // this uses 2) the perspective projection below
glDisable(GL_DEPTH_TEST);
object1 always ends up above object2.
1) orthogonal
2) perspective
However, when they use the same projection, whichever it is, it works fine.
Which part do you think I should go over?
--Updated--
Converting eye coordinates to NDC to screen coordinates, what really happens?
My understanding is that because after both projections the NDC shape is the same as in the images below, the z-value after multiplying by the 2) perspective matrix doesn't have to be distorted. However, according to derbass's good answer, if a z-value in view coordinates is multiplied by the perspective matrix, the z-value will be hyperbolically distorted in NDC.
If so, if one vertex position, for example, is [-240.0, 0.0, -100.0] in eye (view) coordinates with [w:480.0, h:320.0], and I clip it with [-0.01, -100], would it be [-1, 0, -1] or [something >= -1, 0, -1] in NDC? And is its z-value still -1 when it is distorted?
1) Orthogonal
2) Perspective
You can't expect the z values of your vertices to be projected to the same window-space z value just because you use the same near and far values for a perspective and an orthogonal projection matrix.
In the perspective case, the eye-space z value is hyperbolically distorted to the NDC z value. In the orthogonal case, it is just linearly scaled and shifted.
If your "Obj2" lies in a flat plane z_eye = const, you can pre-calculate the distorted depth it should have in the perspective case. But if it has a non-zero extent in depth, this will not work. I can think of different approaches to deal with the situation:
"Fix" the depth of object two in the fragment shader by adjusting the gl_FragDepth according to the hyperbolic distortion your z buffer expects.
Use a linear z-buffer, aka. a w buffer.
These approaches are conceptually the inverse of each other. In both cases, you have to play with gl_FragDepth so that it matches the conventions of the other render pass.
UPDATE
My understanding is that because after both projections the NDC shape is the same as in the images below, the z-value after multiplying by the 2) perspective matrix doesn't have to be distorted.
Well, these images show the conversion from clip space to NDC. And that transformation is what the projection matrix, followed by the perspective divide, does. When it is in normalized device coordinates, no further distortion occurs. It is just linearly transformed to window-space z according to the glDepthRange() setup.
However, according to derbass's good answer, if a z-value in view coordinates is multiplied by the perspective matrix, the z-value will be hyperbolically distorted in NDC.
The perspective matrix is applied to the complete 4D homogeneous eye space vector, so it is applied to z_eye as well as to x_eye, y_eye and also w_eye (which is typically just 1, but doesn't have to be).
So the resulting NDC coordinates for the perspective case are hyberbolically distorted to
f + n 2 * f * n B
z_ndc = ------- + ----------------- = A + -------
n - f (n - f) * z_eye z_eye
while, in the orthogonal case, they are just linearily transformed to
- 2 f + n
z_ndc = ------- z_eye - --------- = C * z_eye + D
f - n (f - n)
For n=1 and f=10, it will look like this (note that I plotted the range partly outside of the frustum. Clipping will prevent these values from occurring in the GL, of course).
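A minimal sketch that just evaluates both formulas for n = 1 and f = 10 (signed eye-space z, so visible points lie between z_eye = -1 and z_eye = -10):

#include <cstdio>

int main()
{
    const double n = 1.0, f = 10.0;
    for (double z_eye = -1.0; z_eye >= -10.0; z_eye -= 1.0) {
        const double persp = (f + n) / (f - n) + (2.0 * f * n) / ((f - n) * z_eye); // hyperbolic
        const double ortho = (-2.0 / (f - n)) * z_eye - (f + n) / (f - n);          // linear
        std::printf("z_eye = %6.1f   perspective z_ndc = %+.3f   orthogonal z_ndc = %+.3f\n",
                    z_eye, persp, ortho);
    }
    return 0;
}

Both mappings give -1 at the near plane and +1 at the far plane, but in between they differ a lot: at z_eye = -5 the perspective value is already about +0.78 while the orthogonal one is only about -0.11.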
If so, if one vertex position, for example, is [-240.0, 0.0, -100.0] in eye (view) coordinates with [w:480.0, h:320.0], and I clip it with [-0.01, -100], would it be [-1, 0, -1] or [something >= -1, 0, -1] in NDC? And is its z-value still -1 when it is distorted?
Points at the far plane are always transformed to z_ndc=1, and points at the near plane to z_ndc=-1. This is how the projection matrices were constructed, and this is exactly where the two graphs in the plot above intersect. So for these trivial cases, the different mappings do not matter at all. But for all other distances, they will.
I have a calibrated camera (intrinsic matrix and distortion coefficients) and I want to know the camera position knowing some 3d points and their corresponding points in the image (2d points).
I know that cv::solvePnP could help me, and after reading this and this I understand that the outputs of solvePnP, rvec and tvec, are the rotation and translation of the object in the camera coordinate system.
So I need to find out the camera rotation/translation in the world coordinate system.
From the links above it seems that the code is straightforward, in python:
found,rvec,tvec = cv2.solvePnP(object_3d_points, object_2d_points, camera_matrix, dist_coefs)
rotM = cv2.Rodrigues(rvec)[0]
cameraPosition = -np.matrix(rotM).T * np.matrix(tvec)
I don't know Python/NumPy stuff (I'm using C++), but this does not make a lot of sense to me:
rvec and tvec output from solvePnP are 3x1 matrices (3-element vectors)
cv2.Rodrigues(rvec) is a 3x3 matrix
cv2.Rodrigues(rvec)[0] is a 3x1 matrix, a 3-element vector
cameraPosition is a 3x1 * 1x3 matrix multiplication, which is a... 3x3 matrix. How can I use this in OpenGL with simple glTranslatef and glRotate calls?
If with "world coordinates" you mean "object coordinates", you have to get the inverse transformation of the result given by the pnp algorithm.
There is a trick to invert transformation matrices that allows you to save the inversion operation, which is usually expensive, and that explains the code in Python. Given a transformation [R|t], we have that inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. So, you can code (not tested):
cv::Mat rvec, tvec;
solvePnP(..., rvec, tvec, ...);
// rvec is 3x1, tvec is 3x1
cv::Mat R;
cv::Rodrigues(rvec, R); // R is 3x3
R = R.t(); // rotation of inverse
tvec = -R * tvec; // translation of inverse
cv::Mat T = cv::Mat::eye(4, 4, R.type()); // T is 4x4
T( cv::Range(0,3), cv::Range(0,3) ) = R * 1; // copies R into T
T( cv::Range(0,3), cv::Range(3,4) ) = tvec * 1; // copies tvec into T
// T is a 4x4 matrix with the pose of the camera in the object frame
Update: Later, to use T with OpenGL you have to keep in mind that the axes of the camera frame differ between OpenCV and OpenGL.
OpenCV uses the reference usually used in computer vision: X points to the right, Y down, Z to the front (as in this image). The frame of the camera in OpenGL is: X points to the right, Y up, Z to the back (as in the left hand side of this image). So, you need to apply a rotation around X axis of 180 degrees. The formula of this rotation matrix is in wikipedia.
// T is your 4x4 matrix in the OpenCV frame
cv::Mat RotX = ...; // 4x4 matrix with a 180 deg rotation around X
cv::Mat Tgl = T * RotX; // OpenGL camera in the object frame
These transformations are always confusing and I may be wrong at some step, so take this with a grain of salt.
Finally, take into account that matrices in OpenCV are stored in row-major order in memory, and OpenGL ones, in column-major order.
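A tiny sketch of that last step (assuming Tgl from above is a 4x4 CV_64F matrix; the helper name is mine): copy it into the column-major float array that e.g. glLoadMatrixf or glUniformMatrix4fv expects:

#include <opencv2/core.hpp>

void toColumnMajor(const cv::Mat& Tgl, float out[16])
{
    // OpenCV stores the matrix row-major as doubles; OpenGL expects column-major floats.
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col * 4 + row] = static_cast<float>(Tgl.at<double>(row, col));
}

Alternatively, glUniformMatrix4fv can do the transpose for you via its transpose parameter.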
If you want to turn it into a standard 4x4 pose matrix specifying the position of your camera, use rotM as the top-left 3x3 block, tvec as the 3 elements of the rightmost column, and 0, 0, 0, 1 as the bottom row:
pose = [ rotM(0,0) rotM(0,1) rotM(0,2) tvec(0)
         rotM(1,0) rotM(1,1) rotM(1,2) tvec(1)
         rotM(2,0) rotM(2,1) rotM(2,2) tvec(2)
         0         0         0         1       ]
then invert it (to get the pose of the camera instead of the pose of the world).
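A rough C++ sketch of this answer's approach with OpenCV (rvec and tvec are the CV_64F outputs of solvePnP; the helper name is mine):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

cv::Mat cameraPoseFromPnP(const cv::Mat& rvec, const cv::Mat& tvec)
{
    cv::Mat rotM;
    cv::Rodrigues(rvec, rotM);                            // 3x1 -> 3x3 rotation matrix

    cv::Mat pose = cv::Mat::eye(4, 4, CV_64F);
    rotM.copyTo(pose(cv::Range(0, 3), cv::Range(0, 3)));  // top-left 3x3 block
    tvec.copyTo(pose(cv::Range(0, 3), cv::Range(3, 4)));  // rightmost column

    return pose.inv();                                    // pose of the camera instead of the world
}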