Convert GL modelview matrix to world coordinates - c++

I'm using Qualcomm's AR SDK to track an object.
I have the following functions available: (specifically look at "Namespace List->QCAR::Tool").
I can get the tracked item's modelview matrix by using the convertPose2GLMatrix (const Matrix34F &pose) function, as I get a pose matrix for each tracked item.
My goal - to determine the marker's location in "the real world". You can assume my camera will be stationary.
I have read numerous articles online, and my general understanding is this:
I need to pick a modelview matrix from where I choose the axis' 0,0,0 point to be (i.e. - copy the matrix I get for that point).
I then need to transpose that matrix. Then, each model view matrix I extract should be multiplied by that matrix and then by an (x,y,z,1) vector, to obtain the coordinates (ignoring the 4th item).
Am I correct? Is this the way to go? And if not - what is?

I like to think of ortho matrices as moving from one coordinate space to another, so what you suggest is correct, yet I would do it the other way around:
1.) Use a reference coordinate system S relative to your camera (i.e. just one of your matrices determined by the pose estimation)
2.) For each pose estimation T calculate the following:
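W = S * T^-1 (T^-1 = transpose(T) only holds if T is a pure rotation; if T also contains a translation, invert it as a rigid transform)
3.) From matrix W pick the 4th column as your world position.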
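The product in step 2 could be sketched roughly as follows, assuming the pose matrices are rigid transforms (rotation plus translation, no scale) stored as 16 floats in OpenGL column-major order. The names Mat4, invertRigid and multiply are illustrative only and are not part of the QCAR SDK.

```cpp
#include <array>
#include <cmath>

// 4x4 matrix in OpenGL column-major order:
// element (row r, column c) is stored at m[c * 4 + r].
using Mat4 = std::array<float, 16>;

// Inverse of a rigid transform [R | t]: the rotation block is transposed and
// the translation becomes -R^T * t. Not valid if the matrix has scale or shear.
Mat4 invertRigid(const Mat4& m) {
    Mat4 inv{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            inv[c * 4 + r] = m[r * 4 + c];  // transpose the 3x3 rotation block
    for (int r = 0; r < 3; ++r) {
        float sum = 0.0f;
        for (int k = 0; k < 3; ++k)
            sum += inv[k * 4 + r] * m[12 + k];  // (R^T * t)[r]
        inv[12 + r] = -sum;
    }
    inv[15] = 1.0f;
    return inv;
}

// Standard matrix product a * b for column-major storage.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int c = 0; c < 4; ++c)
        for (int r = 0; r < 4; ++r)
            for (int k = 0; k < 4; ++k)
                out[c * 4 + r] += a[k * 4 + r] * b[c * 4 + k];
    return out;
}
```

Assuming the matrix data from convertPose2GLMatrix is copied into a Mat4, multiply(S, invertRigid(T)) forms the product from step 2, and elements 12, 13 and 14 of the result are its 4th column. Whether the inverse belongs on S or on T depends on which frame you treat as the world, so it is worth checking the result against a marker at a known position.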


How to do the 2D-3D point correspondence

I'm working with the OpenCV API on an augmented reality project using one camera. I have:
The 3D points of my 3D object (I got 4 points from MeshLab).
The 2D points which I want to follow (I have 4 points); these points are not the projection of the 3D points.
The intrinsic camera parameters.
Using these parameters, I computed the extrinsic parameters (rotation and translation, using the cvFindExtrinsicParam function), which I used to render my model and set the modelView matrix.
My problem is that the 3D model is not shown at a fixed position: it appears at different locations on my image. How can I fix the model location and then the modelView matrix?
In other forums I was told that I should establish the 2D-3D correspondence to get the extrinsic parameters, but I don't know how to match my 2D points with the 3D points.
Typically you would design the points you want to track in such a fashion that the 2D-3D correspondence is immediately clear. The easiest way to do this is to have points with different colors. You could also go with some sort of pattern (google augmented reality cards), which you would then have to analyze in order to find out how it is rotated in the image. The pattern, of course, must not be rotationally symmetric.
If you can't do that, you can try out all the different permutations of the points, plug them into OpenCV to get a matrix, then project your 3D points to 2D points with those matrices, and then see which one fits best.
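The brute-force search over permutations might look something like the sketch below. The cost callback is hypothetical: in a real pipeline it would run the OpenCV pose estimation for the given assignment, reproject the 3D points and return the reprojection error. The name bestAssignment is also just for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <numeric>
#include <vector>

// Tries every assignment of 2D points to 3D points and returns the index
// permutation with the lowest cost. order[i] is the index of the 2D point
// matched to 3D point i. `cost` is a caller-supplied (hypothetical) scoring
// function, e.g. the reprojection error for that assignment.
std::vector<int> bestAssignment(
    int pointCount,
    const std::function<double(const std::vector<int>&)>& cost) {
    std::vector<int> order(pointCount);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., n-1 (sorted start)
    std::vector<int> best = order;
    double bestCost = std::numeric_limits<double>::infinity();
    do {
        const double c = cost(order);
        if (c < bestCost) {
            bestCost = c;
            best = order;
        }
    } while (std::next_permutation(order.begin(), order.end()));
    return best;
}
```

With 4 points this is only 24 pose estimations, so the exhaustive search is cheap; it grows factorially, though, so it stops being practical beyond a handful of points.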

use homography to rotate around x/y axes

The Project
I am working on a texture tracking project for mobile. It exclusively tracks planar surfaces, so I have been using OpenCV's cv::findHomography() to calculate the homography between two frames. That function runs very slowly, however, and is the primary bottleneck in my pipeline. I decided that an algorithm that can take an initial estimate of the homography would run much faster, because my change in homography between frames is very small. Also, my outlier percentage is very small, so robust methods are optional. Unfortunately, to my knowledge OpenCV does not include a homography finder that takes an initial estimate. It does, however, include solvePnP(), which takes the original 3d world coordinates of the scene, the current 2d image coordinates, a camera matrix, distortion parameters, and, most importantly, an initial estimate. I am trying to replace findHomography with solvePnP. Since I use only 2d coordinates throughout the pipeline and solvePnP asks for 3d coordinates, I am trying to move from 2d->3d->3d_transform->2d_transform. Right now that process runs 6x faster than findHomography() if it is given a good initial guess, but it has issues.
The Problem
Something is wrong with how I am converting. My theory was that since a camera matrix is not required to find a homography it should not be required for this process since I only want the information contained in a homography in the end. I also assumed that since I throw out all z information in the end how I initialize z should not matter. My process is as follows
First I convert all my initial 2d coordinates to 3d by giving them a z pos of 1. I can assume that my original coordinates lie flat in the x-y plane. Then
cv::Mat rot_mat; //3x3 rotation matrix
cv::Mat pnp_rot; //3x1 rotation vector
cv::Mat pnp_tran; //3x1 translation vector
cv::Matx33f camera_matrix(1,0,0,
cv::Matx41f dist(0,0,0,0);
cv::solvePnP(original_cord, current_cord, camera_matrix, dist, pnp_rot, pnp_tran,true);
//Rodrigues converts from a rotation vector to a rotation matrix
cv::Rodrigues(pnp_rot, rot_mat);
cv::Matx33f homography(rot_mat(0,0),rot_mat(0,1),pnp_tran(0),
The conversion to a homography here is simple. The first two columns of the homography are from the 3x3 rotation matrix, the last column is the translation vector. The one trick is that homography(2,2) corresponds to scale while pnp_tran(2) corresponds to movement along the z axis. Given that I initialize my z coordinates to 1, scale is z_translation + 1. This process works perfectly for 4 of 6 degrees of freedom. Translation_x, translation_y, scale, and rotation about z all work. Rotation about x and y, however, displays significant error. I believe that this is due to initializing my points at z = 1, but I don't know how to fix it.
The Question
Was my assumption that I can get good results from solvePnP by using a faked camera matrix and initial z coord correct? If so, how should I set up my camera matrix and z coordinates to make x and y rotation work? Also if anyone knows where I could get a homography finding algorithm that takes an initial guess and works only in 2d, or information on techniques for writing my own it would be very helpful. I will most likely be moving in that direction once I get this working.
I built myself a test program which takes a homography, generates a set of coplanar points from that homography, and then runs the points through solvePnP to recover the specified homography. In the process of doing this I realized that I am fundamentally misunderstanding some part of how homographies are constructed. I have been assuming that a homography is constructed as follows.
hom(0,2) = x translation
hom(1,2) = y translation
hom(2,2) = scale, I can divide the entire matrix by this to normalize
I assumed the first two columns were the first two columns of a 3x3 rotation matrix. This essentially amounts to taking a 3x4 transform and throwing away column(2). I have discovered, however, that this is not true. The test case showing me the error of my ways was trying to make a homography which rotates points by some small angle around the y axis.
//rotate by .0175 rad about y axis
rot_mat = (1,0,.0174,
//my conversion method to make this a homography gives
homography = (1,0,0,
The above homography does not work at all. Take for example a point x,y,1 where x > 58. The result will be x,y,some_negative_number. When I convert from homogeneous coordinates back to cartesian my x and y values will both flip signs.
All that is to say, I now have a much simpler question that I think would let me solve everything. How do I construct a homography that rotates points by some angle around the x and y axes?
Homographies are not simple translation or rotation matrices. The aim is to map straight lines to straight lines rather than to map single points to other points. They take into account perspective matrices to achieve this and are explained here
Hence, homography matrices cannot be easily decomposed, but there are (complicated) ways to do so shown here. This may help you extract rotations and translations out of it.
This should help you better understand a homography, but the rest I am unfamiliar with.
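As a rough illustration of how rotation enters a homography, here is a sketch based on the standard planar-geometry result (not something stated in this thread) that for points lying on the plane z = 0 and an identity camera matrix, the homography is H = [r1 r2 t], where r1 and r2 are the first two columns of the rotation matrix and t is the translation. The function names are made up for this example.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Homography for points on the plane z = 0, assuming an identity camera
// matrix: H = [r1 r2 t]. The rotation here is about the y axis, whose first
// two columns are (cos a, 0, -sin a) and (0, 1, 0).
Mat3 homographyRotY(double angle, double tx, double ty, double tz) {
    const double c = std::cos(angle);
    const double s = std::sin(angle);
    Mat3 h = {{{c, 0.0, tx}, {0.0, 1.0, ty}, {-s, 0.0, tz}}};
    return h;
}

// Applies H to (x, y, 1) and converts back from homogeneous coordinates.
std::array<double, 2> applyHomography(const Mat3& h, double x, double y) {
    const double u = h[0][0] * x + h[0][1] * y + h[0][2];
    const double v = h[1][0] * x + h[1][1] * y + h[1][2];
    const double w = h[2][0] * x + h[2][1] * y + h[2][2];
    return {u / w, v / w};
}
```

Note that the bottom-left entry is -sin(angle), so the homogeneous coordinate w = tz - x * sin(angle) changes sign once x * sin(angle) exceeds tz. With an identity camera matrix and pixel-sized coordinates that happens quickly, which appears consistent with the sign flip described in the question; scaling the coordinates or using a realistic focal length in the camera matrix would be one way to avoid it.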

Confused by localize matrix - works when passed to OpenGL but not when doing my own arithmetic?

I'm very confused as to what my problem is here. I've set up a matrix which converts global/world coordinates into the local coordinate space of an object. This conversion matrix is constructed using object information from four vectors (forward, up, side and position). This localization matrix is then passed to glMultMatrixf() at draw time for each object so that I can draw simple axes around each object to visualize the local coordinate system. This works completely fine and as expected, and as the objects move and rotate in the world, so do their local coordinate axes.
The problem is that when I take this same matrix and multiply it by a column vector (to convert the global position of one object into the local coordinate system of another object) the result is not at all as I would expect. For example:
My localize matrix is as follows:
0.84155 0.138 0.5788 0
0.3020 0.8428 -0.5381 8.5335
0.4949 -0.5381 -0.6830 -11.6022
0.0 0.0 0.0 1.0
I input the position column vector:
And get the output of:
As my object's position at this point in time is (-50.8, 8.533, -11.602, 1), I know that the output for the x coordinate cannot possibly be as great as -99.2362. Furthermore, when I find the distance between two global points, and the distance between the localized point and the origin, they are different.
I've checked this in Matlab and it seems that my matrix multiplication is correct (Note: in Matlab you have to first transpose the localize matrix). So I'm left to think that my localize matrix is not being constructed correctly - but then OpenGL is successfully using this matrix to draw the local coordinate axes!
I've tried to not include unnecessary details in this question but if you feel that you need more please don't hesitate to ask! :)
Many thanks.
I have to guess, but I would like to point out two sources of problems with OpenGL-matrix multiplication:
the modelview matrix transforms to a coordinate system where the camera is always at the origin (0,0,0) looking along the z-axis. So if you made some transformations to "move the camera" before applying local->global transformations, you must compensate for the camera movement or you will get coordinates local to the camera's coordinate space. Did you include camera transformations when you constructed the matrix?
Matrices in OpenGL are COLUMN-major. If you have an array with 16 values, the elements are ordered this way:
[0][4][ 8][12]
[1][5][ 9][13]
[2][6][10][14]
[3][7][11][15]
Your matrix also seems strange. The first three columns tell me that you applied some rotation or scaling transformations. The last column shows the amount of translation applied to each coordinate element. The numbers are the same as your object's position. That means, if you want the output x coordinate to be -50.8, the first three elements in the first row should add up to zero:
-30*0.8154 -30*0.3020 -30*0.4939 + 1 * -50.8967
<---this should be zero--------> but is -48.339.
So I think, there really is a problem when constructing the matrix. Perhaps you can explain how you construct the matrix...
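To make the column-major point concrete, here is a minimal sketch of multiplying a 16-element array, laid out the way glMultMatrixf() expects it, by a column vector. The function name transformColumnMajor is made up for this example.

```cpp
#include <array>

// Multiplies a 4x4 matrix stored in OpenGL's column-major order by a column
// vector. Element (row r, column c) lives at m[c * 4 + r], so the translation
// occupies m[12], m[13] and m[14].
std::array<float, 4> transformColumnMajor(const std::array<float, 16>& m,
                                          const std::array<float, 4>& v) {
    std::array<float, 4> out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}
```

If the same 16 floats are read row by row instead (index r * 4 + c), the arithmetic silently uses the transposed matrix. That would produce the kind of mismatch described in the question even though OpenGL draws the axes correctly.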

3d geometry: how to align an object to a vector

I have an object in 3D space that I want to align according to a vector.
I already got the Y-rotation by doing an atan2 on the x and z components of the vector, but I would also like to have an X-rotation to make the object look downwards or upwards.
Imagine a plane that does its pitch, yaw and roll, just without the roll.
I am using OpenGL to set the rotations, so I will need a Y-angle and an X-angle.
I would not use Euler angles, but rather an Euler axis/angle. For that matter, this is what OpenGL's glRotate uses as input.
If all you want is to map a vector to another vector, there are an infinite number of rotations to do that. For the shortest one, (the one with the smallest angle of rotation), you can use the vector found by the cross product of your from and to unit vectors.
axis = from X to
From there, the angle of rotation can be found from the dot product: from . to = cos(theta) (assuming unit vectors)
theta = arccos(from . to)
glRotate(axis, theta) will then transform from to to.
But as I said, this is only one of many rotations that can do the job. You need a full reference frame to better define how you want the transform done.
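The cross-product and arccos steps above can be sketched as follows, assuming both input vectors are already normalized. The struct and function names are illustrative. The fallback for parallel or opposite vectors is an addition not mentioned in the answer: in that case the cross product is zero and an arbitrary perpendicular axis is picked.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

struct AxisAngle {
    Vec3 axis;            // unit rotation axis
    double angleDegrees;  // glRotate expects degrees, not radians
};

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

double length(const Vec3& v) {
    return std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
}

// Shortest rotation taking unit vector `from` onto unit vector `to`.
AxisAngle shortestRotation(const Vec3& from, const Vec3& to) {
    const double pi = std::acos(-1.0);
    double dot = from[0] * to[0] + from[1] * to[1] + from[2] * to[2];
    dot = std::max(-1.0, std::min(1.0, dot));  // keep acos inside its domain
    Vec3 axis = cross(from, to);
    if (length(axis) < 1e-12) {
        // Parallel or opposite vectors: the cross product vanishes, so use
        // any axis perpendicular to `from`.
        const Vec3 helper = std::fabs(from[0]) < 0.9 ? Vec3{1.0, 0.0, 0.0}
                                                     : Vec3{0.0, 1.0, 0.0};
        axis = cross(from, helper);
    }
    const double len = length(axis);
    return {{axis[0] / len, axis[1] / len, axis[2] / len},
            std::acos(dot) * 180.0 / pi};
}
```

The result can then be passed on as glRotated(r.angleDegrees, r.axis[0], r.axis[1], r.axis[2]), since glRotate takes the angle first and in degrees.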
You should use some form of quaternion interpolation (Spherical Linear Interpolation) to animate your object going from its current orientation to this new orientation.
If you store the orientations using Quaternions (vector space math), then you can get the shortest path between two orientations very easily. For a great article, please read Understanding Slerp, Then Not Using It.
If you use Euler angles, you will be subject to gimbal lock and some really weird edge cases.
Actually...take a look at this article. It describes Euler Angles which I believe is what you want here.

Coordinate Transformation C++

I have a webcam pointed at a table at a slant and with it I track markers.
I have a transformationMatrix in OpenSceneGraph and its translation part contains the relative coordinates from the tracked Object to the Camera.
Because the Camera is pointed at a slant, when I move the marker across the table both the Y and Z axes are updated, although all I want to be updated is the Z axis, because the height of the marker doesn't change, only its distance to the camera.
This has the effect that when I project a model on the marker in OpenSceneGraph, the model is slightly off, and when I move the marker around the Y and Z values are updated incorrectly.
So my guess is I need a Transformation Matrix with which I multiply each point so that I have a new coordinate System which lies orthogonal on the table surface.
Something like this: A * v1 = v2 v1 being the camera Coordinates and v2 being my "table Coordinates"
So what I did now was choose 4 points to "calibrate" my system. So I placed the marker at the top left corner of the Screen and defined v1 as the current camera coordinates and v2 as (0,0,0) and I did that for 4 different points.
And then taking the linear equations I get from having an unknown Matrix and two known vectors I solved the matrix.
I thought the values I would get for the matrix would be the values I needed to multiply the camera Coordinates with so the model would be updated correctly on the marker.
But when I multiply the known Camera Coordinates I gathered before with the matrix I didn't get anything close to what my "table coordinates" were supposed to be.
Is my approach completely wrong, did I just mess something up in the equations? (solved with the help of Is there an easier or better way of doing this?
Any help would be greatly appreciated, as I am kind of lost and under some time pressure :-/
when I move the marker across the table both the Y and Z axes are updated, although all I want to be updated is the Z axis, because the height of the marker doesn't change, only its distance to the camera.
Only true when your camera's view direction is aligned with your Y axis (or Z axis). If the camera is not aligned with Y, it means the transform will apply a rotation around the X axis, hence modifying both the Y and Z coordinates of the marker.
So my guess is I need a Transformation Matrix with which I multiply each point so that I have a new coordinate System which lies orthogonal on the table surface.
Yes it is. After that, you will have 2 transforms:
T_table to express marker's coordinates in the table referential,
T_camera to express table coordinates in the camera referential.
Finding T_camera from a single 2d image is hard because there's no depth information.
This is known as the pose problem. It has been studied by, among others, Daniel DeMenthon, who developed a fast and robust algorithm to find the pose of an object:
articles available on his research homepage, section 4 "Model Based Object Pose" (and particularly "Model-Based Object Pose in 25 Lines of Code", 1995);
code at the same place, section "POSIT (C and Matlab)".
Note that the OpenCV library offers an implementation of DeMenthon's algorithm. This library also offers a convenient and easy-to-use interface to grab images from a webcam. It's worth a try: OpenCV homepage
If you know the location in the physical world of your four markers and you've recorded the positions as they appear on the camera, you ought to be able to derive some sort of transform.
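One way such a transform could be derived is sketched below. It assumes you record the marker's camera-space position at three spots on the table: the chosen origin, a point along the table's x axis, and any third point on the table off that axis. All names here are hypothetical and the sketch uses plain arrays rather than OpenSceneGraph types.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;

Vec3 sub(const Vec3& a, const Vec3& b) {
    return {a[0] - b[0], a[1] - b[1], a[2] - b[2]};
}

double dot(const Vec3& a, const Vec3& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

Vec3 normalize(const Vec3& v) {
    const double len = std::sqrt(dot(v, v));
    return {v[0] / len, v[1] / len, v[2] / len};
}

struct TableFrame {
    Vec3 origin, xAxis, yAxis, zAxis;  // all expressed in camera coordinates
};

// Builds an orthonormal table frame from three calibration positions.
TableFrame makeTableFrame(const Vec3& origin, const Vec3& onX, const Vec3& onPlane) {
    TableFrame f;
    f.origin = origin;
    f.xAxis = normalize(sub(onX, origin));
    f.zAxis = normalize(cross(f.xAxis, sub(onPlane, origin)));  // table normal
    f.yAxis = cross(f.zAxis, f.xAxis);  // completes a right-handed frame
    return f;
}

// Converts a camera-space point into table coordinates by projecting the
// offset from the table origin onto the three frame axes.
Vec3 toTable(const TableFrame& f, const Vec3& p) {
    const Vec3 d = sub(p, f.origin);
    return {dot(f.xAxis, d), dot(f.yAxis, d), dot(f.zAxis, d)};
}
```

In the resulting coordinates the third component is the height above the table, so a marker sliding across the surface should keep it near zero while only the first two components change. A general 4-point linear solve also fits scale and shear, which is probably not wanted here; building an orthonormal frame avoids that.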
When you do the calibration, surely you'd want to put the marker at the four corners of the table, not the screen? If you're just doing the corners of the screen, I imagine you're probably not taking into account the slant of the table.
Is the table literally just slanted relative to the camera or is it also rotated at all?