Camera position in world coordinate from cv::solvePnP - c++

I have a calibrated camera (intrinsic matrix and distortion coefficients) and I want to know the camera position knowing some 3d points and their corresponding points in the image (2d points).
I know that cv::solvePnP could help me, and after reading this and this I understand that the outputs of solvePnP, rvec and tvec, are the rotation and translation of the object in the camera coordinate system.
So I need to find out the camera rotation/translation in the world coordinate system.
From the links above it seems that the code is straightforward, in python:
found,rvec,tvec = cv2.solvePnP(object_3d_points, object_2d_points, camera_matrix, dist_coefs)
rotM = cv2.Rodrigues(rvec)[0]
cameraPosition = -np.matrix(rotM).T * np.matrix(tvec)
I don't know Python/numpy (I'm using C++), but this does not make a lot of sense to me:
rvec and tvec output from solvePnP are 3x1 matrices, i.e. 3-element vectors
cv2.Rodrigues(rvec) is a 3x3 matrix
cv2.Rodrigues(rvec)[0] is a 3x1 matrix, a 3-element vector
cameraPosition is a 3x1 * 1x3 matrix multiplication, which is a... 3x3 matrix. How can I use this in OpenGL with simple glTranslatef and glRotate calls?

If with "world coordinates" you mean "object coordinates", you have to get the inverse transformation of the result given by the pnp algorithm.
There is a trick to invert transformation matrices that allows you to save the inversion operation, which is usually expensive, and that explains the code in Python. Given a transformation [R|t], we have that inv([R|t]) = [R'|-R'*t], where R' is the transpose of R. So, you can code (not tested):
cv::Mat rvec, tvec;
solvePnP(..., rvec, tvec, ...);
// rvec is 3x1, tvec is 3x1
cv::Mat R;
cv::Rodrigues(rvec, R); // R is 3x3
R = R.t(); // rotation of inverse
tvec = -R * tvec; // translation of inverse
cv::Mat T = cv::Mat::eye(4, 4, R.type()); // T is 4x4
T( cv::Range(0,3), cv::Range(0,3) ) = R * 1;    // copies R into T (the "* 1" makes it an expression, forcing a copy into the submatrix)
T( cv::Range(0,3), cv::Range(3,4) ) = tvec * 1; // copies tvec into T
// T is a 4x4 matrix with the pose of the camera in the object frame
Update: Later, to use T with OpenGL you have to keep in mind that the axes of the camera frame differ between OpenCV and OpenGL.
OpenCV uses the reference frame usually used in computer vision: X points to the right, Y down, Z to the front. The camera frame in OpenGL is: X points to the right, Y up, Z to the back. So you need to apply a rotation of 180 degrees around the X axis. The formula for this rotation matrix is on Wikipedia.
// T is your 4x4 matrix in the OpenCV frame
cv::Mat RotX = ...; // 4x4 matrix with a 180 deg rotation around X
cv::Mat Tgl = T * RotX; // OpenGL camera in the object frame
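For completeness, the RotX placeholder above could be written out like this (a sketch, assuming R, and therefore T, is CV_64F):
// 180-degree rotation around X: cos(180) = -1, sin(180) = 0,
// so the Y and Z axes simply change sign.
cv::Mat RotX = (cv::Mat_<double>(4, 4) <<
    1,  0,  0, 0,
    0, -1,  0, 0,
    0,  0, -1, 0,
    0,  0,  0, 1);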
These transformations are always confusing and I may be wrong at some step, so take this with a grain of salt.
Finally, take into account that matrices in OpenCV are stored in row-major order in memory, and OpenGL ones, in column-major order.
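If you are using the fixed-function pipeline (glTranslatef/glRotate as in the question), one way to hand the matrix over is to repack it explicitly (a sketch, assuming Tgl is CV_64F; the GL calls are left as comments):
// Repack the row-major cv::Mat into the column-major float[16] expected by glLoadMatrixf.
float glMat[16];
for (int r = 0; r < 4; ++r)
    for (int c = 0; c < 4; ++c)
        glMat[c * 4 + r] = static_cast<float>(Tgl.at<double>(r, c));
// glMatrixMode(GL_MODELVIEW);
// glLoadMatrixf(glMat);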

If you want to turn it into a standard 4x4 pose matrix specifying the position of your camera, use rotM as the top-left 3x3 block, tvec as the 3 elements of the right-hand column, and 0, 0, 0, 1 as the bottom row:
pose = [ rotM(0,0)  rotM(0,1)  rotM(0,2)  tvec(0)
         rotM(1,0)  rotM(1,1)  rotM(1,2)  tvec(1)
         rotM(2,0)  rotM(2,1)  rotM(2,2)  tvec(2)
         0          0          0          1       ]
Then invert it (to get the pose of the camera instead of the pose of the world).
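In OpenCV terms that could look like this (a minimal sketch, assuming rotM and tvec are the CV_64F outputs of cv::Rodrigues and cv::solvePnP):
cv::Mat pose = cv::Mat::eye(4, 4, CV_64F);
rotM.copyTo(pose(cv::Rect(0, 0, 3, 3)));   // top-left 3x3 rotation block
tvec.copyTo(pose(cv::Rect(3, 0, 1, 3)));   // right-hand 3x1 translation column
// pose maps world/object points into the camera frame;
// invert it to get the pose of the camera in the world frame.
cv::Mat cameraPose = pose.inv();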

Related

Translate and rotate a 3d point to origin with pcl transformcloud

Given a rotation matrix and translation vector obtained from camera pose estimation of a corner of a ChArUco board, I want to transform that particular corner to the origin using a homogeneous transformation. If I understand correctly, the position of the camera is (0,0,0) in the camera's own frame, and the rotation matrix and translation vector are also expressed in the camera's coordinate system.
The translation vector tells you where exactly the point is in the camera's coordinate system, whereas the rotation matrix gives you its orientation with respect to the camera.
I then construct my homogeneous transformation matrix as:
float transformation_matrix[] = {
    rmatrix.at<float>(0,0), rmatrix.at<float>(0,1), rmatrix.at<float>(0,2), tvec[0],
    rmatrix.at<float>(1,0), rmatrix.at<float>(1,1), rmatrix.at<float>(1,2), tvec[1],
    rmatrix.at<float>(2,0), rmatrix.at<float>(2,1), rmatrix.at<float>(2,2), tvec[2],
    0.f, 0.f, 0.f, 1.f};
where the rmatrix is my rotation matrix and tvec is the translation vector
and the output of transformation_matrix is
transformation matrix =
[-0.99144238, 0.12881841, 0.021162758, 0.18116128;
0.045751289, 0.49469706, -0.86786038, -0.037676446;
-0.12226554, -0.8594653, -0.49635723, 0.67637974;
0, 0, 0, 1]
and the rotation matrix is:
rodrigues matrix =
[-0.99144238, 0.12881841, 0.021162758;
0.045751289, 0.49469706, -0.86786038;
-0.12226554, -0.8594653, -0.49635723]
translation vector:
[0.181161, -0.0376764, 0.67638]
I then use pcl::transformPointCloud:
pcl::transformPointCloud (*cloud, *transformed_cloud, translation_matrix);
However, I cannot align the point to origin after transformation.
Does anyone have an insight on where I might get wrong?
aruco::drawAxis(imageCopy, Camera_matrix, distortion, rvec, tvec, 0.1);
cv::Rodrigues(rvec, rmatrix);
pcl::PointCloud<pcl::PointXYZ>::Ptr transformed_cloud (new pcl::PointCloud<pcl::PointXYZ> ());
float transformation_matrix[] = {
    rmatrix.at<float>(0,0), rmatrix.at<float>(0,1), rmatrix.at<float>(0,2), tvec[0],
    rmatrix.at<float>(1,0), rmatrix.at<float>(1,1), rmatrix.at<float>(1,2), tvec[1],
    rmatrix.at<float>(2,0), rmatrix.at<float>(2,1), rmatrix.at<float>(2,2), tvec[2],
    0.f, 0.f, 0.f, 1.f};
cv::Mat translation_m = cv::Mat(4,4,CV_32F, transformation_matrix);
Eigen::Matrix<float,Eigen::Dynamic,Eigen::Dynamic> translation_matrix;
cv::cv2eigen(translation_m, translation_matrix);
pcl::transformPointCloud (*cloud, *transformed_cloud, translation_matrix);
UPDATE:
I tested the transformation before and after and found that my transformation with the homogeneous matrix was incorrect.
(Image: depth data taken from the side of the board.)
The x, y, z axes are displayed in red, green and blue respectively; the white point cloud is before the transformation and the red point cloud is after. The image is flipped correctly to the direction I want, but the origin (the intersection of the red, green and blue axes) is sometimes not attached to one of the corners of the board, and the x axis is not perfectly aligned with the border of the board.
One thing I can confirm is that when the camera is parallel to the board, the x axis is parallel to the transformed image as well.
(Image: taken from an upright angle at the top of the board.)
I am speculating that I am missing one or two axes of rotation.
I made sure my rotation matrix and translation vector of the corner are correct by labeling another corner with the following code using the R*P + T formula:
aruco::drawAxis(imageCopy, Camera_matrix, distortion, rvec, tvec, 0.1);
cv::Rodrigues(rvec, rmatrix);
Vec3f corner1_0(0.4, 0, 0);
Mat matcorner1_0;
matcorner1_0 = FloatMatFromVec3f(corner1_0);
auto mat_tvec = FloatMatFromVec3f(tvec);
cv::Mat rcorner1_0 = rmatrix * matcorner1_0 + mat_tvec; // R*P + T, a 3x1 result
float cornerdata[3] = {rcorner1_0.at<float>(0,0), rcorner1_0.at<float>(1,0), rcorner1_0.at<float>(2,0)};
cv::Mat rcorner_10 = cv::Mat(3,1, CV_32F, cornerdata);
auto r1_0 = to_vec3f(rcorner_10);
aruco::drawAxis(imageCopy, Camera_matrix, distortion, rvec, r1_0, 0.1);
The length of the board is 0.4 meters, so I set corner1_0 to (0.4, 0, 0).
(Image: the two marker axes drawn for indication purposes.)
As you can see, the rotation matrix and translation vector are correct.
Any help/hint/speculation is greatly appreciated:)
I solved the problem.
The problem is that the homogeneous transformation matrix is transforming from camera's coordinate system to that of real world. However, I want the transformation matrix from real world coordinate system back to camera's coordinate system.
You can achieve this by taking the inverse of the transformation matrix.
So adding translation_m = translation_m.inv(); between
cv::Mat translation_m = cv::Mat(4,4,CV_32F, transformation_matrix);
and
Eigen::Matrix<float,Eigen::Dynamic,Eigen::Dynamic> translation_matrix;
cv::cv2eigen(translation_m, translation_matrix);
will give you the correct answer.
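For clarity, the corrected sequence from the snippets above then reads (a sketch of the fix, nothing more):
cv::Mat translation_m = cv::Mat(4, 4, CV_32F, transformation_matrix);
translation_m = translation_m.inv(); // world -> camera instead of camera -> world
Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic> translation_matrix;
cv::cv2eigen(translation_m, translation_matrix);
pcl::transformPointCloud(*cloud, *transformed_cloud, translation_matrix);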

convert a 3d point from camera space to object space, in c++ with opencv

In computer vision, when I need to convert a cv::Mat from object (world) space coordinates to camera space, i.e. a camera-centric coordinate system, I use the following
code (where rvec and tvec are the rotation and translation vectors of the camera):
cv::Mat R; //holds rotation matrix
cv::Rodrigues(rvec, R); // converts a rotation vector to a 3x3 matrix
R = R.t(); // rotation of inverse
tvec2 = -R * tvec; // translation of inverse
giving me tvec2 as the camera space coordinate.
My current problem is the opposite. I have an array of 3d points in camera space, and i need to convert them to world space. What does the inverse of the above function look like?
Thank you.
What I would advise you to do is to build a 4x4 projection matrix. It makes it much easier to transform your points from the World Coordinate System (WCS) to your Camera Coordinate System (CCS). For example, say you have a point in the WCS at {0, 0, 0}, and your camera at {10, 0, 0} with a rotation of 45 degrees around the Y axis. You can create your projection matrix like this:
rot = 45 degrees = 0.785398 rad
Rotation (3x3), around the Y axis:
[  cos(rot)  0  sin(rot)
   0         1  0
  -sin(rot)  0  cos(rot) ]
Translation (3x1):
[ 10, 0, 0 ]
Your projection matrix will be (4x4): the rotation in the top-left 3x3 block, the translation in the last column, and 0, 0, 0, 1 as the bottom row.
With the projection matrix, the point in your WCS at {0, 0, 0} just becomes the homogeneous point {0, 0, 0, 1}. Then, to change it to your CCS:
PointInCCS = projectionMatrix * PointInWCS;
to return in WCS just use the matrix inverse:
PointInWCS = projectionMatrix.inv() * PointInCCS;
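As a concrete sketch of the example above (45 degrees around Y, camera translation {10, 0, 0}), following the answer's convention and using double-precision cv::Mat:
#include <opencv2/core.hpp>
#include <cmath>

double rot = 0.785398; // 45 degrees in radians

// rotation around Y in the top-left 3x3 block, translation {10, 0, 0} in the last column
cv::Mat projectionMatrix = (cv::Mat_<double>(4, 4) <<
     std::cos(rot), 0, std::sin(rot), 10,
     0,             1, 0,              0,
    -std::sin(rot), 0, std::cos(rot),  0,
     0,             0, 0,              1);

// a point in the WCS, written as a homogeneous column vector
cv::Mat PointInWCS = (cv::Mat_<double>(4, 1) << 0, 0, 0, 1);

cv::Mat PointInCCS = projectionMatrix * PointInWCS;        // WCS -> CCS
cv::Mat backInWCS  = projectionMatrix.inv() * PointInCCS;  // CCS -> WCS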

How to decompose a 3x3 transformation matrix to a Camera position matrix, and back?

I want to project a 3D scene with a moving camera (animated over time). I have a 3x3 transformation matrix, calculated by A2 * (T * (R * A1)) where
A1 is 2D -> 3D matrix projection (I'm currently projecting a single 2D image but will extend to 3D points and remove this matrix)
T is Translation matrix
R is rotation matrix
A2 is 3D -> 2D matrix
I can use this in OpenCV's warpPerspective and get a decent result. Now I want to move the camera position in 3D space, but have no idea how. OpenCV has decomposeProjectionMatrix, which takes a 3x4 projection matrix, but I'm unclear how that relates to any of the matrices I have.

3D rendering in OpenGL: model/view/projection vs translation/rotation/camera matrices

I want to add a mesh model (let's say a cube) to a captured frame from a camera.
I also know all the information about where to put the cube:
Translation matrix - relative to the camera
Rotation matrix - relative to the camera
camera calibration matrix - focal length, principal point, etc. (intrinsic parameters)
How can I convert this information to model/view/projection matrices?
What should be the values to set to these matrices?
For example, let's say that I want to display the point [x, y, z, 1] on the screen,
then that should be something like: [u, v, 1] = K * [R | T] * [x, y, z, 1], while:
u, v are the coordinates in the screen (or camera capture) and:
K, R and T are intrinsic camera parameters, rotation and translation, respectively.
How to convert K, R, T to model/view/projection matrices?
[R | T] would be your model-view matrix and K would be your projection matrix.
Model-view matrix is usually one matrix. The separation is only conceptual: Model translates from model coordinates to world coordinates and View from world coordinates to camera (not-yet-projected) coordinates. It makes sense in applications where the camera and the objects move independently from each other. In your case, on the other hand, camera can be considered fixed and everything else described relative to the camera. So you have to deal only with two matrices: model-view and projection.
Assuming that your camera intrinsic matrix comes from OpenCV (http://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html), how to initialize your OpenGL projection matrix is described there:
https://blog.noctua-software.com/opencv-opengl-projection-matrix.html
(In that formula, cx, cy, width and height are in pixels.)
As for your OpenGL model-view matrix, it's really simple: it is just [R | T] extended to a 4x4 homogeneous matrix with a bottom row of 0, 0, 0, 1.
So in the end your model-view-projection matrix is projection * model-view.
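A minimal sketch of that model-view matrix with OpenCV types (my assumptions, not from the answer above: R is the 3x3 CV_64F matrix from cv::Rodrigues and T the 3x1 translation; the axis flip between the OpenCV and OpenGL camera frames is the one discussed in the earlier answer):
cv::Mat modelView = cv::Mat::eye(4, 4, CV_64F);
R.copyTo(modelView(cv::Rect(0, 0, 3, 3)));   // rotation block
T.copyTo(modelView(cv::Rect(3, 0, 1, 3)));   // translation column

// OpenCV's camera looks down +Z with +Y down, OpenGL's looks down -Z with +Y up,
// so flip the Y and Z axes before handing the matrix to OpenGL.
cv::Mat cvToGl = (cv::Mat_<double>(4, 4) <<
    1,  0,  0, 0,
    0, -1,  0, 0,
    0,  0, -1, 0,
    0,  0,  0, 1);
cv::Mat modelViewGL = cvToGl * modelView;
// The model-view-projection matrix is then: projection * modelViewGL
// (transpose before uploading, since OpenGL expects column-major storage).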

Converting Screen 2D to World 3D Coordinates

I want to convert 2D screen coordinates to 3D world coordinates. I have searched a lot but I did not get any satisfying result.
Note: I am not using OpenGL nor any other graphics library.
Data which I have:
Screen X
Screen Y
Screen Height
Screen Width
Aspect Ratio
If you have the Camera world Matrix and Projection Matrix this is pretty simple.
If you don't have the world matrix you can compute it from the camera's position and rotation.
worldMatrix = Translate(x, y, z) * RotateZ(z_angle) * RotateY(y_angle) * RotateX(x_angle);
Where Translate returns the 4x4 translation matrix and RotateX/Y/Z return the 4x4 rotation matrices around the given axes.
The projection matrix can be calculated from the aspect ratio, field of view angle, and near and far planes.
This blog has a good explanation of how to calculate the projection matrix.
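For illustration, here is a minimal sketch of such a projection matrix in the classic gluPerspective form (the helper name perspectiveMatrix is mine, and cv::Mat is used only for convenience; fovY is the vertical field of view in radians, zNear/zFar are the clip planes):
#include <opencv2/core.hpp>
#include <cmath>

cv::Mat perspectiveMatrix(double fovY, double aspect, double zNear, double zFar)
{
    double f = 1.0 / std::tan(fovY / 2.0); // focal scale from the field of view
    return (cv::Mat_<double>(4, 4) <<
        f / aspect, 0,  0,                               0,
        0,          f,  0,                               0,
        0,          0,  (zFar + zNear) / (zNear - zFar), 2.0 * zFar * zNear / (zNear - zFar),
        0,          0, -1,                               0);
}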
You can unproject the screen coordinates by doing:
mat = worldMatrix * inverse(ProjectionMatrix)
dir = transpose(mat) * <x_screen, y_screen, 0.5, 1>
dir /= mat[3] + mat[7] + mat[11] + mat[15]
dir -= camera.position
Your ray will point from the camera in the direction dir.
This should work, but it's not a super concrete example of how to do this.
Basically you just need to do the following steps:
calculate camera's worldMatrix
calculate camera's projection matrix
multiply worldMatrix with inverse projection matrix.
create a point <Screen_X_Value, Screen_Y_Value, SOME_POSITIVE_Z_VALUE, 1>
apply this "inverse" projection to your point.
then subtract the camera's position from this point.
The resulting vector is the direction from the camera. Any point along that ray is a 3D point corresponding to your 2D screen coordinate.
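Putting those steps into code, here is a minimal sketch using cv::Mat (my assumptions: the helper name screenToWorldRay is mine; screen x/y are first converted to normalized device coordinates in [-1, 1], with screen y measured downwards; worldMatrix is the camera's 4x4 world/pose matrix and projectionMatrix the 4x4 projection):
#include <opencv2/core.hpp>

// Returns a unit direction (3x1) from the camera through the given screen pixel.
cv::Mat screenToWorldRay(double screenX, double screenY,
                         double screenWidth, double screenHeight,
                         const cv::Mat& worldMatrix,       // 4x4 camera world matrix
                         const cv::Mat& projectionMatrix,  // 4x4 projection matrix
                         const cv::Mat& cameraPosition)    // 3x1 camera position
{
    // screen pixel -> normalized device coordinates
    double x = 2.0 * screenX / screenWidth - 1.0;
    double y = 1.0 - 2.0 * screenY / screenHeight;
    cv::Mat p = (cv::Mat_<double>(4, 1) << x, y, 0.5, 1.0); // some depth in front of the camera

    // worldMatrix * inverse(projection) is the inverse of the view-projection matrix,
    // because the view matrix is the inverse of the camera's world matrix.
    cv::Mat invViewProj = worldMatrix * projectionMatrix.inv();

    cv::Mat q = invViewProj * p;          // back into homogeneous world coordinates
    q = q / q.at<double>(3);              // perspective divide
    cv::Mat dir = q.rowRange(0, 3) - cameraPosition;
    return dir / cv::norm(dir);           // direction of the ray from the camera
}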