Head Pose Estimation with 2 cameras - opengl

I am working on a head pose estimation project using 2 cameras. For one camera the system works and returns the rotation matrix and translation vector of the head with respect to the camera coordinate system. I have a rendered object in an OpenGL scene which is rotated and translated to represent head movements. To display the computed rotation matrix and translation vector I simply use the following OpenGL commands.
glMatrixMode(GL_MODELVIEW);
glLoadMatrixd(pose_matrix);
where pose_matrix is the OpenGL ModelView matrix constructed from the rotation matrix and translation vector of the head.
Now I am trying to do this for 2 calibrated cameras. When the first camera loses track of the face but the second one estimates the head pose, I display the rotation and translation with respect to the second, and vice versa. I want to display one OpenGL object and move it in both cases. For that I need to transfer the pose matrices into a common coordinate frame.
I know the relative geometry of the 2 cameras with respect to each other. I assume one of the cameras is the world coordinate frame, and I transfer the head pose matrix of the second camera to the frame of the first camera by multiplying the pose matrix by the calibration matrix of the second camera with respect to the first camera. When I load this multiplied matrix into the OpenGL ModelView matrix I get wrong results. When the first camera captures the face the object moves correctly, but for the second camera the object is translated and rotated and is not in the same place as in the case of the first camera.
What could be the problem? Maybe the OpenGL display part is wrong, or something else?

Safe default assumption: OpenGL is right, your code is wrong.
Without seeing your code, I can only suggest that you printf your matrices and double-check the math with them in Matlab or Octave.
A common mistake is to forget that OpenGL expects matrices in COLUMN-major order: glLoadMatrixd reads the 16 values column by column, so the translation sits in elements 12, 13 and 14. The math itself follows the common convention of treating vectors as COLUMN ones and multiplying them as M * v_col. If your code stores M row-major (as a C 2D array does) and loads it unchanged, OpenGL ends up applying transpose(M) instead of M.
If you prefer to keep your matrices row-major, look up the GL_ARB_transpose_matrix extension (glLoadTransposeMatrixd), or transpose the matrix yourself before loading it.
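One way to check the math before touching OpenGL is to compose the transforms on the CPU and inspect the 16 values that would be passed to glLoadMatrixd. The sketch below is only an illustration: the function names and the example numbers are made up, and it assumes that cam1_from_cam2 maps points from camera 2 coordinates into camera 1 coordinates (if your calibration gives the opposite direction, invert it first).

```python
def make_pose(R, t):
    """Build a 4x4 homogeneous matrix (row-major nested lists) from R (3x3) and t (3)."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Plain 4x4 matrix product A * B (column-vector convention)."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def to_gl_column_major(M):
    """Flatten a row-major 4x4 matrix into the column-major order OpenGL expects."""
    return [M[row][col] for col in range(4) for row in range(4)]


# Hypothetical calibration: a 90 degree rotation about Y plus a 1 unit shift along X.
cam1_from_cam2 = make_pose([[0.0, 0.0, 1.0],
                            [0.0, 1.0, 0.0],
                            [-1.0, 0.0, 0.0]], [1.0, 0.0, 0.0])

# Hypothetical head pose seen by camera 2: no rotation, 5 units along Z.
cam2_from_head = make_pose([[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0],
                            [0.0, 0.0, 1.0]], [0.0, 0.0, 5.0])

# Order matters: the head pose is applied first, then the camera-to-camera transform.
cam1_from_head = matmul4(cam1_from_cam2, cam2_from_head)

# These 16 values are what would be handed to glLoadMatrixd.
pose_matrix = to_gl_column_major(cam1_from_head)
print(pose_matrix[12:15])  # translation part, elements 12..14
```

If the object jumps when tracking switches cameras, comparing these printed values against a hand calculation for a simple known pose usually shows whether the multiplication order or the memory layout is at fault.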

Related

Measure real size of object with Calibrated Camera opencv c++?

I am completing my thesis, which is related to OpenCV.
I want to measure the real size of an object (in mm) with a single camera, but I have a problem converting between the camera's natural units (pixels) and real-world units.
After calibrating the camera, I have:
Camera matrix (3x3)
Distortion coefficients
Extrinsic parameters [rotation vector(1x3) + translation vector(1x3)]
I have read the following link but I can't find a formula to convert the units.
https://docs.opencv.org/2.4/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html
Example of measuring the size of an object
Any suggestions?
Thanks so much.
As mentioned in the comments, you need the distance to the object to obtain 3D coordinates from pixels. A possible workflow would be:
Rectify the image using the distortion parameters, i.e., correct the distortion caused by the camera.
Deproject the pixels into 3D points in the camera coordinate frame using the camera matrix. For this you can multiply the inverse of the 3x3 camera matrix with a vector containing the pixel coordinates [pixel_x, pixel_y, 1]^T. If you multiply the result [x', y', 1]^T by the depth, i.e., the z-component, you obtain the 3D point in the camera coordinate frame.
Transform the point from the camera coordinate frame into the world coordinate frame using the extrinsic parameters. Note that [R|t] maps world coordinates to camera coordinates, so this step applies the inverse of that transform.
Obtaining the depth values from an image alone is not possible. The only option is to use some additional information. Maybe your object is placed on a table and you know the distance between the camera and the table.
To measure distances between the camera and a table or even the object itself you could use ArUco markers, which are also available within OpenCV.
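The deprojection step can be sketched in a few lines. This is a minimal illustration, not OpenCV code: it assumes the pixels are already undistorted, the camera matrix has zero skew, and the depth of both points is known (here the same depth, as for an object lying flat on a table). The intrinsics and pixel values are invented for the example.

```python
import math


def deproject(pixel, depth, fx, fy, cx, cy):
    """Turn an undistorted pixel plus a known depth into a 3D point in the camera frame.

    Equivalent to depth * K^-1 * [u, v, 1]^T for a zero-skew camera matrix K.
    """
    u, v = pixel
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical intrinsics (in pixels) and a hypothetical depth of 500 mm.
fx, fy, cx, cy = 800.0, 800.0, 320.0, 240.0
depth_mm = 500.0

left_edge = deproject((320.0, 240.0), depth_mm, fx, fy, cx, cy)
right_edge = deproject((480.0, 240.0), depth_mm, fx, fy, cx, cy)

# Object width in the same unit as the depth (mm here).
width_mm = distance(left_edge, right_edge)
print(width_mm)
```

The result is in whatever unit the depth is given in, which is why the distance to the object (or to the table it sits on) has to come from somewhere else.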

How to triangulate Points from a Single Camera multiple Images?

I have a single calibrated camera pointing at a checkerboard at different locations, with known:
Camera intrinsics: fx, fy, cx, cy
Distortion coefficients: K1, K2, K3, T1, T2, etc.
Camera rotation and translation (R, T) from an IMU
After undistortion, I have computed point correspondences of the checkerboard points in all the images, with known camera-to-camera rotation and translation vectors.
How can I estimate the 3D points of the checkerboard in all the images?
I think OpenCV has a function to do this but I'm not able to understand how to use it!
1) cv::sfm::triangulatePoints
2) triangulatePoints
How to compute the 3D Points using OpenCV?
Since you already have the matched points from the images, you can use findFundamentalMat() to get the fundamental matrix. Keep in mind you need at least 7 matched points to do this. If you have more than 8 points, CV_FM_RANSAC might be the best option.
Then use cv::sfm::projectionsFromFundamental() to find the projection matrix for each image, and check that the projection matrices are valid (e.g. check that the points are in front of the camera).
Then feed the projections and the points into cv::sfm::triangulatePoints().
Hope this helps :)
Edit
The rotation and translation matrices are needed to change reference frames, because the camera moves in SFM. The reference frame is at the position of the camera. Transforms are needed to make sure the positions of the points are coherent, i.e. all expressed in the same coordinate system (usually the reference frame of the camera in the first image).
I.e., to relate the points gathered in the second frame to the first frame, the third to the second frame, and so on.
So basically you can use the R and T vectors to construct a transform matrix for each frame and multiply it with your points to put them in the reference frame of the camera in the first frame.
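For the situation in the question, where the camera poses are already known, the geometry behind triangulation can be illustrated without OpenCV. The sketch below uses midpoint triangulation, which is a simple stand-in and not what the OpenCV functions named above are documented to do. It assumes each observation has already been turned into a viewing ray (camera centre plus direction) expressed in one common frame, for example the frame of the first camera; the function name and the numbers are hypothetical.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Return the midpoint of the shortest segment between two viewing rays.

    c1, c2: camera centres; d1, d2: ray directions, all in one common frame.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    w = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel, cannot triangulate")

    # Ray parameters of the closest points on each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((x + y) / 2.0 for x, y in zip(p1, p2))


# Two hypothetical camera centres 1 unit apart, both observing the same corner.
point = triangulate_midpoint((0.0, 0.0, 0.0), (0.5, 0.0, 2.0),
                             (1.0, 0.0, 0.0), (-0.5, 0.0, 2.0))
print(point)
```

With noisy correspondences the two rays do not intersect exactly, and the midpoint is one simple compromise between them.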

OpenCV Structure from Motion Reprojection Issue

I am currently facing an issue with my Structure from Motion program based on OpenCv.
I'll try to describe what it does and what it is supposed to do.
This program relies on the classic "structure from motion" method.
The basic idea is to take a pair of images, detect their keypoints and compute the descriptors of those keypoints. Then the keypoints are matched, with a certain number of tests to ensure the result is good. That part works perfectly.
Once this is done, the following computations are performed: fundamental matrix, essential matrix, SVD decomposition of the essential matrix, camera matrix computation and, finally, triangulation.
The result for a pair of images is a set of 3D coordinates, giving us points to be drawn in a 3D viewer. This works perfectly, for a pair.
Here is my problem: for a pair of images, the 3D point coordinates are calculated in the coordinate system of the first image of the pair, taken as the reference image. When working with more than two images, which is the objective of my program, I have to reproject the computed 3D points into the coordinate system of the very first image in order to get a consistent result.
My question is: how do I reproject 3D point coordinates given in one camera's coordinate system into another camera's coordinate system? With the camera matrices?
My idea was to take the 3D point coordinates and multiply them by the inverse of each preceding camera matrix.
To clarify:
Suppose I am working on the third and fourth images (hence the third pair of images, because I am working like 1-2 / 2-3 / 3-4 and so on).
I get my 3D point coordinates in the coordinate system of the third image. How do I reproject them properly into the coordinate system of the very first image?
I would have done the following:
Get the 3D point coordinates matrix, apply the inverse of the camera matrix for image 2 to 3, and then apply the inverse of the camera matrix for image 1 to 2.
Is that even correct?
Because those camera matrices are non-square matrices, and I can't invert them.
I am surely making a mistake somewhere, and I would be grateful if someone could enlighten me. I am pretty sure this is a relatively easy one, but I am obviously missing something...
Thanks a lot for reading :)
Let us say you have a 3x4 extrinsic parameter matrix called P. To match the notation of the OpenCV documentation, this is [R|t].
This matrix P describes the transformation from world space coordinates to camera space coordinates. To quote the documentation:
[R|t] translates coordinates of a point (X, Y, Z) to a coordinate system, fixed with respect to the camera.
You are wondering why this matrix is non-square. That is because in the usual context of OpenCV, you are not expecting homogeneous coordinates as output. Therefore, to make it square, just add a fourth row containing (0,0,0,1). Let's call this new square matrix Q.
You have one such matrix for each pair of cameras, that is, you have one Qk matrix for each pair of images {k,k+1} that describes the transformation from the coordinate space of camera k to that of camera k+1. Those matrices are invertible because they describe rigid transformations (isometries) in homogeneous coordinates.
To go from the coordinate space of camera 3 to that of camera 1, just apply to your points the inverse of Q2 and then the inverse of Q1.
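The chain of inverses can be sketched as follows. This is an illustration under assumptions, not code from the question: the matrices are plain nested lists, the example values for Q1 and Q2 are invented, and the inverse uses the closed form for a rigid transform (rotation R^T, translation -R^T t) rather than a general matrix inverse.

```python
def make_q(R, t):
    """Extend [R|t] (3x4) to a square 4x4 matrix by appending the row (0, 0, 0, 1)."""
    return [list(R[i]) + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def invert_rigid(Q):
    """Invert a rigid transform: rotation becomes R^T, translation becomes -R^T t."""
    Rt = [[Q[j][i] for j in range(3)] for i in range(3)]
    t = [Q[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_q(Rt, t_inv)


def apply(Q, p):
    """Apply a 4x4 rigid transform to a 3D point."""
    return tuple(sum(Q[i][k] * p[k] for k in range(3)) + Q[i][3] for i in range(3))


def camera3_to_camera1(p3, Q1, Q2):
    """Undo Q2 (camera 2 -> 3) first, then Q1 (camera 1 -> 2)."""
    return apply(invert_rigid(Q1), apply(invert_rigid(Q2), p3))


# Hypothetical transforms: Q1 is a pure shift, Q2 a 90 degree turn about Z plus a shift.
Q1 = make_q([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [-1.0, 0.0, 0.0])
Q2 = make_q([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 2.0])

p1 = (1.0, 2.0, 3.0)
p3 = apply(Q2, apply(Q1, p1))          # same point seen from camera 3
print(camera3_to_camera1(p3, Q1, Q2))  # should reproduce p1
```

Equivalently, the two inverses can be multiplied once into a single matrix, inverse(Q1) * inverse(Q2), and that matrix applied to every point of the third pair.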

Computer Vision: labelling camera pose

I am trying to create a dataset of images of objects at different poses, where each image is annotated with camera pose (or object pose).
For example, suppose I have a world coordinate system, place the object of interest at the origin, place the camera at a known position (x,y,z) and make it face the origin. Given this information, how can I calculate the pose (rotation matrix) for the camera or for the object?
I had one idea, which was to have a reference coordinate i.e. (0,0,z') where I can define the rotation of the object. i.e. its tilt, pitch and yaw. Then I can calculate the rotation from (0,0,z') and (x,y,z) to give me a rotation matrix. The problem is, how to now combine the two rotation matrices?
BTW, I know the world position of the camera as I am rendering these with OpenGL from a CAD model as opposed to physically moving a camera around.
A homography matrix maps homogeneous screen coordinates (i,j) to homogeneous coordinates (x,y) on a plane in the world.
Homogeneous coordinates are normal coordinates with a 1 appended. So (3,4) in screen coordinates is (3,4,1) as homogeneous screen coordinates.
If you have a homogeneous screen point s and its associated homogeneous point w on the world plane, the 3x3 homography matrix H satisfies
H * s = k * w (equality up to a scale factor k)
So it boils down to finding several features whose world coordinates you know and whose (i,j) positions you can also identify in screen coordinates, then computing a "best fit" homography matrix (OpenCV has a function for this, findHomography).
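To make the role of the scale factor concrete, here is a small sketch of applying a homography to one screen point. It assumes a planar target, for which the matrix is 3x3 (the shape findHomography returns); the matrix values and the function name are invented for illustration.

```python
def apply_homography(H, point):
    """Map a screen point (i, j) through a 3x3 homography and divide out the scale."""
    i, j = point
    x = H[0][0] * i + H[0][1] * j + H[0][2]
    y = H[1][0] * i + H[1][1] * j + H[1][2]
    w = H[2][0] * i + H[2][1] * j + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (x / w, y / w)


# Hypothetical homography: scale by 2 and shift by (10, 20) on the world plane.
H = [[2.0, 0.0, 10.0],
     [0.0, 2.0, 20.0],
     [0.0, 0.0, 1.0]]

print(apply_homography(H, (3.0, 4.0)))
```

The division by the third component is what turns the homogeneous result back into ordinary plane coordinates.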
Whilst knowing the camera's xyz provides helpful info, it's not enough to fully constrain the equation, and you will have to generate more screen-world pairs anyway. Thus I don't think it's worth your time integrating the camera's position into the mix.
I have done a similar experiment here: http://edinburghhacklab.com/2012/05/optical-localization-to-0-1mm-no-problemo/

OpenGl polygon rotation

I'm trying to implement a moving and rotating polygon in OpenGL and C++.
Movement and rotation are in the XZ plane (2D transformations only).
The polygon is defined by a centre point and a set of vertices whose coordinates are stored as offsets from the centre point.
The polygon is moved based on the user's key-press either in X or Z direction by simply adding the moved distance to the centre point and updating the vertices by adding the offset values to centre coordinates.
Rotation with respect to centre point is implemented by using the glRotatef() function.
But I need to know the coordinates of the vertices for collision detection calculations.
Is there any chance of just retrieving the vertex coordinates of the transformed polygon without performing matrix operations myself?
The glRotatef function creates a matrix which is multiplied with the current matrix that exists on the stack to get the rotation on screen. Even if you could obtain that matrix then you would still have to multiply it against your vectors to obtain the values you want, which is what you'd have to do if you did the maths yourself. Just like datenwolf said, it would be better for you to make a maths library yourself that will perform all the necessary things needed for manipulating objects in a 2d or 3d world.
Is there any chance of just retrieving the vertex coordinates of the transformed polygon...
OpenGL is not a math library. It's only meant for drawing. Also, the matrix manipulation functions of fixed-function OpenGL are obsolete and have been removed from the OpenGL 3 core profile and later.
without performing matrix operations myself?
In fact, performing the matrix operations yourself is the recommended way to do this. Remember: OpenGL is just your drawing tool, not a 3D-renderer-game-simulation-engine-math-geometry-toolkit.
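For the 2D case in the question the math is small enough to write by hand. The sketch below is a hedged illustration rather than the asker's code: it assumes the polygon is drawn by translating to the centre and then rotating about the Y axis, so the vertex offsets are rotated first and the centre is added afterwards, and it uses the same rotation sense as glRotatef(angle, 0, 1, 0). The function name and the values are hypothetical.

```python
import math


def transformed_vertices(centre, offsets, angle_deg):
    """Return world-space (x, z) vertices of a polygon rotated about its centre.

    centre: (x, z) of the polygon centre; offsets: list of (x, z) vertex offsets.
    The rotation matches a rotation about the Y axis by angle_deg degrees.
    """
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    cx, cz = centre
    return [(cx + c * x + s * z, cz - s * x + c * z) for x, z in offsets]


# Hypothetical square centred at (10, 5), rotated by 90 degrees.
square = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
for vertex in transformed_vertices((10.0, 5.0), square, 90.0):
    print(vertex)
```

The same angle and centre can still be passed to OpenGL for drawing, while the list returned here feeds the collision checks.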