Same Marker Position, Different Rotation and Translation Matrices - OpenCV - OpenGL

I'm working on an Augmented Reality marker detection program, using OpenCV and I'm getting two different rotation and translation values for the same marker.
The 3D model switches between these states on its own, without my control, when the camera is moved slightly. Screenshots of the two situations are added below. I want Image #1 to be the correct one. How and where do I correct this?
I have followed How to use an OpenCV rotation and translation vector with OpenGL ES in Android? to create the Projection Matrix for OpenGL.
ex:
// code to convert rotation, translation vector
glLoadMatrixf(ConvertedProjMatrix);
glColor3f(0,1,1) ;
glutSolidTeapot(50.0f);
Image #1
Image #2
Additional
I'd be glad if someone could suggest a way to make the teapot sit on the marker plane. I know I have to edit the rotation matrix, but what's the best way of doing that?

To rotate the teapot you can use glRotatef(). If you want to rotate your current matrix for example by 125° around the y-axis you can call:
glRotatef(125.0f, 0.0f, 1.0f, 0.0f);
I can't make out the current orientation of your teapot, but I guess you would need to rotate it by 90° around the x-axis.
I have no idea about your first problem; OpenCV seems unable to decide which of the shown positions is the "correct" one. It depends on what kind of features OpenCV is looking for (edges, high contrast, unique points...) and how you implemented it.

Have you tried swapping the pose algorithm (ITERATIVE, EPNP, P3P)? Or possibly reuse the values from the previous calculation as an initial guess - remember that it's just giving you its 'best guess'.
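A rough sketch of how swapping the flag might look, assuming you already have the marker correspondences and the camera calibration (the function and variable names here are mine, and the flags are spelled CV_ITERATIVE/CV_EPNP/CV_P3P in older OpenCV versions):

#include <opencv2/calib3d.hpp>
#include <vector>

void estimateMarkerPose(const std::vector<cv::Point3f>& objectPoints,
                        const std::vector<cv::Point2f>& imagePoints,
                        const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                        cv::Mat& rvec, cv::Mat& tvec)
{
    // Swap the last flag for cv::SOLVEPNP_EPNP or cv::SOLVEPNP_P3P
    // (P3P needs exactly 4 points) and compare the poses you get.
    cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs,
                 rvec, tvec, false, cv::SOLVEPNP_ITERATIVE);

    // Feeding the previous frame's rvec/tvec back in as an initial guess
    // (useExtrinsicGuess = true) often keeps the solution from flipping
    // between the two ambiguous planar poses:
    // cv::solvePnP(objectPoints, imagePoints, cameraMatrix, distCoeffs,
    //              rvec, tvec, true, cv::SOLVEPNP_ITERATIVE);
}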

Related

Another Perspective Camera issue

- SOLVED -
Warning: I'm not a native English speaker.
Hi,
I'm currently trying to make a 3D camera, and surely because of some mistakes or math basics I'm missing, I think I will go insane if I don't ask for someone's help.
OK, let's go.
First, I have a custom game engine that only lets you control the camera by setting up:
the projection parameters (according to an orthographic or perspective mode)
the view: a vector3 for the position and a quaternion for the orientation
(and no, we will not discuss this design right now)
Now I'm writing a camera in my gameplay code (which uses the functionality of the engine above).
My camera's environment has the following specs:
up_vector = (0, 1, 0)
forward_vector = (0, 0, 1)
angles are in degrees
glm as math lib
In my camera code I handle the player input and convert it into data that I send to my engine.
In the engine I only do:
glm::mat4 projection_view = glm::perspective(...parameters...) * glm::inverse(view_matrix)
And voila I have my matrix for the rendering step.
And now a little scenario with simple geometry.
In a 3D space we have 7 circles, drawn from z = -300 to 300.
The circle at z = -300 is red and the one at z = 300 is blue.
There are decorative shapes (triangles/boxes); they are there to make it easier to identify up and right.
When I run the scenario I get the following disco result!! Which is not what I want.
As you can see in my example of colorful potatoids above, the blue circle is the biggest even though it is set up to be the farthest on z. According to the perspective it should be the smallest. What happened?
On the other hand, when I use an orthographic camera everything works well.
Any ideas?
About the Perspective matrix
I generate my perspective matrix with the glm::perspective() function. After a quick check, I have confirmed that my parameters' values are always good, so I can easily imagine that my issue doesn't come from there.
About the View matrix
First, I think my problem must be around here, maybe... So, I have a vector3 for the position of the camera and 3 floats describing its rotation around each axis.
And here is the experimental part where I don't know what I'm doing!
I copy those three floats into a vector3 that I use as Euler angles, and use the glm quaternion constructor that can create a quat from Euler angles, like this:
glm::quat q(glm::radians(euler_angles));
Finally I send the quaternion to the engine like that, without having used my up and forward vectors (anyway, I do not currently see how to use them).
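To make this concrete, here is a minimal sketch of the whole chain as I understand it; the function and variable names are illustrative, not my engine's actual API:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/quaternion.hpp>

glm::mat4 buildProjectionView(const glm::vec3& position,
                              const glm::vec3& euler_angles_deg,
                              float fov_deg, float aspect,
                              float z_near, float z_far)
{
    // Orientation from Euler angles (GLM expects pitch/yaw/roll in radians).
    glm::quat q(glm::radians(euler_angles_deg));

    // The camera's world transform, inverted to get the world-to-camera matrix.
    glm::mat4 view_matrix = glm::translate(glm::mat4(1.0f), position) * glm::mat4_cast(q);

    return glm::perspective(glm::radians(fov_deg), aspect, z_near, z_far)
         * glm::inverse(view_matrix);
}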
I have worked on this for too long and I think my head will explode; the saddest part is that I think I'm really close.
PS0: Those who help me have my eternal gratitude
PS1: Please do not give me theory links: I no longer have any neurons left, and have already read two interesting but (for me) unhelpful books. Maybe because I have not understood everything yet.
(3D Math Primer for Graphics and Game Development / Mathematics for 3D Game Programming and Computer Graphics, Third Edition)
SOLUTION
It was a dumb mistake... at the very end of my rendering pipeline, I forgot to sort the graphical objects by their "z" according to the camera orientation.
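Roughly, the fix amounts to something like this (with a made-up Drawable struct, not my actual engine types): sort the objects back-to-front by their camera-space depth before drawing them.

#include <algorithm>
#include <vector>
#include <glm/glm.hpp>

struct Drawable { glm::vec3 world_position; /* ... */ };

void sortBackToFront(std::vector<Drawable>& objects, const glm::mat4& world_to_camera)
{
    // world_to_camera is glm::inverse(view_matrix) in the notation used above.
    std::sort(objects.begin(), objects.end(),
              [&](const Drawable& a, const Drawable& b) {
                  float za = (world_to_camera * glm::vec4(a.world_position, 1.0f)).z;
                  float zb = (world_to_camera * glm::vec4(b.world_position, 1.0f)).z;
                  // With forward = +z (as in this thread), the farthest object has
                  // the largest camera-space z and must be drawn first.
                  return za > zb;
              });
}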
You said:
In my camera code I handle the player input, convert them into data
that I send to my engine. In the engine I only do:
glm::mat4 projection_view = glm::perspective(...parameters...) *
glm::inverse(view_matrix)
And voila I have my matrix for the rendering step.
Are you using the projection matrix when you render the coloured circles?
Should you be using an identity model matrix to draw the circles, so that the model is then viewed according to the view/perspective matrices?
The triangles and squares look correct - do you have a different transform in effect when you render the circles?
Hi TonyWilk and thanks
Are you using the projection matrix when you render the coloured circles?
Yes, I generate my projection matrix with the glm::perspective() function and then use my projection_view matrix on my vertices when rendering, as indicated in the first block of code.
Should you be using an identity model matrix to draw the circles, so that the model is then viewed according to the view/perspective matrices?
I don't know if I have understood this question correctly, but here is an answer.
Theoretically, I do not apply the perspective matrix directly to the vertices. In pseudocode, I use:
rendering_matrix = projection_matrix * inverse_camera_view_matrix
The triangles and squares look correct - do you have a different transform in effect when you render the circles?
In the end, I always use the same matrix. And if the triangles and squares seem to be right, that is only due to an "optical effect": the biggest box is actually associated with the blue circle, and the smallest one with the red.

How do I re-project points in a camera-projector system (after calibration)?

I have seen many blog entries, videos and source code on the internet about how to carry out camera + projector calibration using OpenCV, in order to produce the camera.yml, projector.yml and projectorExtrinsics.yml files.
I have yet to see anyone discuss what to do with these files afterwards. Indeed I have done a calibration myself, but I don't know what the next step is in my own application.
Say I write an application that now uses the camera-projector system that I calibrated to track objects and project something on them. I will use findContours() to grab some points of interest from the moving objects, and now I want to project these points (from the projector!) onto the objects!
What I want to do is (for example) track the centre of mass (COM) of an object and show a point on the camera view of the tracked object (at its COM). Then a point should be projected on the COM of the object in real time.
It seems that projectPoints() is the OpenCV function I should use after loading the yml files, but I am not sure how I will account for all the intrinsic & extrinsic calibration values of both camera and projector. Namely, projectPoints() requires as parameters the
vector of points to re-project (duh!)
rotation + translation matrices. I think I can use the projectorExtrinsics here. Or I can use the composeRT() function to generate a final rotation & translation matrix from the projectorExtrinsics (which I have in the yml file) and the cameraExtrinsics (which I don't have. Side question: shouldn't I save them in a file too?).
intrinsics matrix. This is tricky now. Should I use the camera or the projector intrinsics matrix here?
distortion coefficients. Again, should I use the projector or the camera coefficients here?
other params...
So if I use either the projector or the camera (which one??) intrinsics + coefficients in projectPoints(), then I will only be 'correcting' for one of the two instruments. Where / how will I use the other instrument's intrinsics?
What else do I need to use apart from load()-ing the yml files and projectPoints()? (Perhaps undistortion?)
ANY help on the matter is greatly appreciated.
If there is a tutorial or a book (no, O'Reilly's "Learning OpenCV" does not talk about how to use the calibration yml files either - only about how to do the actual calibration), please point me in that direction. I don't necessarily need an exact answer!
First, you seem to be confused about the general role of a camera/projector model: its role is to map 3D world points to 2D image points. This sounds obvious, but it means that given extrinsics R,t (for orientation and position), distortion function D(.) and intrinsics K, you can infer for this particular camera the 2D projection m of a 3D point M as follows: m = K.D(R.M+t). The projectPoints function does exactly that (i.e. 3D to 2D projection), for each input 3D point, hence you need to give it the input parameters associated with the camera in which you want your 3D points projected (projector K&D if you want projector 2D coordinates, camera K&D if you want camera 2D coordinates).
Second, when you jointly calibrate your camera and projector, you do not estimate a set of extrinsics R,t for the camera and another for the projector, but only one R and one t, which represent the rotation and translation between the camera's and projector's coordinate systems. For instance, this means that your camera is assumed to have rotation = identity and translation = zero, and the projector has rotation = R and translation = t (or the other way around, depending on how you did the calibration).
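As an illustration, projecting 3D points expressed in the camera's coordinate frame into projector pixels could look like the sketch below. The yml node names ("camera_matrix", "dist_coeffs", "R", "T") are guesses on my side; use whatever names your calibration code actually wrote, and swap/invert R,T if your calibration stored them in the other direction.

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

std::vector<cv::Point2f> cameraFrameToProjectorPixels(
    const std::vector<cv::Point3f>& points_in_camera_frame)
{
    cv::Mat K_proj, dist_proj, R, T;
    cv::FileStorage fs_p("projector.yml", cv::FileStorage::READ);
    fs_p["camera_matrix"] >> K_proj;
    fs_p["dist_coeffs"]   >> dist_proj;
    cv::FileStorage fs_e("projectorExtrinsics.yml", cv::FileStorage::READ);
    fs_e["R"] >> R;   // assumed: camera -> projector rotation
    fs_e["T"] >> T;   // assumed: camera -> projector translation

    cv::Mat rvec;
    cv::Rodrigues(R, rvec);   // projectPoints wants a rotation vector, not a matrix

    std::vector<cv::Point2f> projector_pixels;
    cv::projectPoints(points_in_camera_frame, rvec, T, K_proj, dist_proj,
                      projector_pixels);
    return projector_pixels;
}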
Now, concerning the application you mentioned, the real problem is: how do you estimate the 3D coordinates of a given point?
Using two cameras and one projector, this would be easy: you could track the objects of interest in the two camera images, triangulate their 3D positions from the two 2D projections using the triangulatePoints function, and finally project this 3D point into projector 2D coordinates using projectPoints in order to know where to display things with your projector.
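A minimal sketch of that two-camera pipeline for a single tracked point; the names are illustrative, P1 and P2 are the two cameras' 3x4 projection matrices K·[R|t], and the projector parameters would come from the calibration files as above:

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

cv::Point2f triangulateAndReproject(const cv::Mat& P1, const cv::Mat& P2,
                                    cv::Point2f obs_cam1, cv::Point2f obs_cam2,
                                    const cv::Mat& rvec_proj, const cv::Mat& tvec_proj,
                                    const cv::Mat& K_proj, const cv::Mat& dist_proj)
{
    std::vector<cv::Point2f> pts1{obs_cam1}, pts2{obs_cam2};
    cv::Mat points4D;                                    // 4xN homogeneous result
    cv::triangulatePoints(P1, P2, pts1, pts2, points4D);

    std::vector<cv::Point3f> points3D;
    cv::convertPointsFromHomogeneous(points4D.t(), points3D);

    std::vector<cv::Point2f> projector_pixels;
    cv::projectPoints(points3D, rvec_proj, tvec_proj, K_proj, dist_proj,
                      projector_pixels);
    return projector_pixels[0];
}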
With only one camera and one projector, this is still possible but more difficult because you cannot triangulate the tracked points from only one observation. The basic idea is to approach the problem like a sparse stereo disparity estimation problem. A possible method is as follows:
project a non-ambiguous image (e.g. black and white noise) using the projector, in order to texture the scene observed by the camera.
as before, track the objects of interest in the camera image
for each object of interest, correlate a small window around its location in the camera image with the projector image, in order to find where it projects in the projector 2D coordinates (one way to do this is sketched just below)
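For example, a simplification that ignores epipolar constraints and photometric differences between the camera view and the original pattern (all names here are illustrative):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

cv::Point2f findInProjectorImage(const cv::Mat& camera_gray,
                                 const cv::Mat& projector_pattern_gray,
                                 cv::Point tracked_point, int half_window = 15)
{
    cv::Rect window(tracked_point.x - half_window, tracked_point.y - half_window,
                    2 * half_window + 1, 2 * half_window + 1);
    window &= cv::Rect(0, 0, camera_gray.cols, camera_gray.rows);  // clamp to the image

    cv::Mat patch = camera_gray(window);
    cv::Mat score;
    cv::matchTemplate(projector_pattern_gray, patch, score, cv::TM_CCOEFF_NORMED);

    double min_val, max_val;
    cv::Point min_loc, max_loc;
    cv::minMaxLoc(score, &min_val, &max_val, &min_loc, &max_loc);

    // Centre of the best-matching window, in projector pixel coordinates.
    return cv::Point2f(max_loc.x + half_window, max_loc.y + half_window);
}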
Another approach, which unlike the one above would use the calibration parameters, could be to do a dense 3D reconstruction using stereoRectify and StereoBM::operator() (or gpu::StereoBM_GPU::operator() for the GPU implementation), map the tracked 2D positions to 3D using the estimated scene depth, and finally project into the projector using projectPoints.
Anyhow, this is easier, and more accurate, using two cameras.
Hope this helps.

Augmented Reality OpenGL+OpenCV

I am very new to OpenCV and have limited experience with OpenGL. I want to overlay a 3D object on a calibrated image of a checkerboard. Any tips or guidance?
The basic idea is that you have 2 cameras: the physical one (the one you are retrieving images from with OpenCV) and the OpenGL one. You have to make those two match.
To do that, you need to calibrate the physical camera.
First: you need the distortion parameters (because every lens has more or less some optical distortion) and, from the same calibration, the so-called intrinsic parameters. You get them by printing a chessboard on paper, capturing some images of it and calibrating the camera. The internet is full of nice tutorials about that, and from your question it seems you already have them. That's nice.
Then: you have to calibrate the position of the camera. This is done with the so-called extrinsic parameters, which encode the position and rotation of the camera in the 3D world.
To get the extrinsic parameters you use the OpenCV functions cv::solvePnP and cv::Rodrigues, which need the intrinsic parameters. solvePnP takes as input two sets of corresponding points: some known 3D points and their 2D projections. That's why all augmented reality applications need markers: usually the markers are squares, so after detecting one you know the 2D projections of the points P1(0,0,0), P2(0,1,0), P3(1,1,0), P4(1,0,0) that form a square, and you can find the plane they lie on.
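A rough sketch of that step for a unit-square marker (the corner ordering and marker size are assumptions on my side, and the detected corners come from whatever detector you use):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

void markerPose(const std::vector<cv::Point2f>& detected_corners,  // from your detector
                const cv::Mat& K, const cv::Mat& distCoeffs,
                cv::Mat& R, cv::Mat& tvec)
{
    // Known 3D corners of the marker, in the marker's own coordinate system.
    std::vector<cv::Point3f> marker_corners = {
        {0, 0, 0}, {0, 1, 0}, {1, 1, 0}, {1, 0, 0}
    };

    cv::Mat rvec;
    cv::solvePnP(marker_corners, detected_corners, K, distCoeffs, rvec, tvec);
    cv::Rodrigues(rvec, R);   // 3x1 rotation vector -> 3x3 rotation matrix
}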
Once you have the extrinsic parameters the game is essentially won: you just have to set up a perspective projection in OpenGL with the field of view and aspect derived from the intrinsic parameters, and place the camera at the position given by the extrinsic parameters.
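A hedged sketch of that last step; the axis flip is needed because OpenCV's camera looks down +Z while OpenGL's looks down -Z, the principal point offset is ignored, and conventions vary, so treat this as a starting point rather than a drop-in solution:

#include <opencv2/core.hpp>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <cmath>

glm::mat4 projectionFromIntrinsics(const cv::Mat& K, float image_w, float image_h,
                                   float z_near, float z_far)
{
    // Assumes K is CV_64F as produced by OpenCV calibration.
    float fy = static_cast<float>(K.at<double>(1, 1));
    float fovy = 2.0f * std::atan(0.5f * image_h / fy);           // radians
    return glm::perspective(fovy, image_w / image_h, z_near, z_far);
}

glm::mat4 viewFromExtrinsics(const cv::Mat& R, const cv::Mat& t)
{
    glm::mat4 view(1.0f);
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c)
            view[c][r] = static_cast<float>(R.at<double>(r, c));  // glm is column-major
        view[3][r] = static_cast<float>(t.at<double>(r, 0));
    }
    // Flip Y and Z to go from OpenCV's camera convention to OpenGL's.
    const glm::mat4 cv_to_gl = glm::scale(glm::mat4(1.0f), glm::vec3(1.0f, -1.0f, -1.0f));
    return cv_to_gl * view;
}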
Of course, if you want to (and you should) understand and handle each step of this process correctly... there is a lot of math - matrices, angles, quaternions, matrices again, and... matrices again. You can find a reference in the famous Multiple View Geometry in Computer Vision by R. Hartley and A. Zisserman.
Moreover, to handle the OpenGL part correctly you have to deal with so-called "modern OpenGL" (remember that glLoadMatrix is deprecated) and a little bit of shader code to load the camera matrices (for me this was a problem because I didn't know anything about it).
I dealt with this some time ago and I have some code, so feel free to ask about any kind of problem you have. Here are some links I found interesting:
http://ksimek.github.io/2012/08/14/decompose/ (really good explanation)
Camera position in world coordinate from cv::solvePnP (a question I asked about that)
http://www.morethantechnical.com/2010/11/10/20-lines-ar-in-opencv-wcode/ (fabulous blog about computer vision)
http://spottrlabs.blogspot.it/2012/07/opencv-and-opengl-not-always-friends.html (nice tricks)
http://strawlab.org/2011/11/05/augmented-reality-with-OpenGL/
http://www.songho.ca/opengl/gl_projectionmatrix.html (very good explanation on opengl camera settings basics)
Some other random useful stuff:
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html (documentation, always look at the docs!!!)
Determine extrinsic camera with opencv to opengl with world space object
Rodrigues into Eulerangles and vice versa
Python Opencv SolvePnP yields wrong translation vector
http://answers.opencv.org/question/23089/opencv-opengl-proper-camera-pose-using-solvepnp/
Please read them before anything else. As usual, once you get the concept it is easy; you just need to bash your brain against the wall a little bit first. Just don't be scared of all that math :)

Movement of a surgical robot's arm OpenGL

I have a question concerning surgical robot arm's movements in OpenGL.
Our arm consists of 7 pieces that are supposed to be the arm's joints and are responsible for bending and twisting the arm. We draw the arm this way: first we create the element responsible for moving the shoulder "up and down", then we "move" with glTranslatef to the point where we draw the next element, responsible for twisting the shoulder (we control the movement with glRotatef), and so on with the next joints (elbow, wrist).
The point is to create an arm that can make human-like movements. Our mechanism works, but now our tutor wants us to draw a line strip with the end of the arm. We put the code responsible for drawing and moving the arm between push and pop matrix calls, so it works like a real arm: when we move the shoulder, all the other elements of the arm move too.
There are a lot of elements moving and rotating; we have several rotation matrices attached to different elements which we can control, and now we have no idea how to precisely find the new location of the end of the arm in space so we can add a new point to the line strip. Can anyone help?
glGetFloatv(GL_MODELVIEW_MATRIX,mvm2);
x=mvm2[12];
y=mvm2[13];
z=mvm2[14];
glPointSize(5.0f);
glColor3f(1.0f, 0.0f, 0.0f);
glBegin(GL_POINTS);
glVertex3f(x,y,z);
glEnd();
When I checked the x, y, z values with a watch, I got (0, -1.16e-12, 17.222222), which can't be true, as my arm has a length of about 9.0 (on the z-axis). I think only the last column of the modelview matrix is important and I don't have to multiply it by the local coordinates of the vertex, as they are (0,0,0) since I finish my drawing here.
we have no idea how to precisely find a new location of the end of an arm in space to be able to add a new point to a line strip.
You do this by performing the matrix math and transformations yourself.
(from comment)
To do this we are supposed to multiply the matrices and get some information out of glGetFloatv
Please don't do this. Especially not if you're supposed to build a pretransformed line strip geometry yourself. OpenGL is not a matrix math library and there's absolutely no benefit to using OpenGL's fixed-function pipeline matrix functions, but there are a lot of drawbacks. Better to use a real matrix math library.
Your robot arm technically consists of a number of connected segments, where each segment is transformed by the composition of the transformations of the segments upward in the transformation hierarchy:
M_i = M_{i-1} · (R_i · T_i)
where R_i and T_i are the respective rotation and translation of each segment. So for each segment you need the individual transform matrix to retrieve the point of the line segment.
Since you'll place each segment's origin at the tip of the previous segment, you'd transform the homogeneous point (0,0,0,1) with the segment's transformation matrix, which has the nice property of being just the 4th column of the transformation matrix.
This leaves you with the task of creating the transformation matrix chain. Doing this with OpenGL is tedious. Use a real math library for this. If your tutor insists on you using the OpenGL fixed function pipeline, please ask him to show you the reference for the functions in the specifications of a current OpenGL version (OpenGL-3 and later); he won't find them, because all the matrix math functions have been removed entirely from modern OpenGL.
For math libraries I can recommend GLM, Eigen (with the OpenGL extra module) and linmath.h (self advertisement). With each of these libraries building transformation hierarchies is simple, because you can create copies of each intermediate matrix without much effort.
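For example with GLM (a sketch under my own naming; the Segment struct is illustrative, not something your code already has): accumulate each segment's R·T and read the joint position off the 4th column, exactly as described above.

#include <vector>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

struct Segment {
    float angle_deg;        // joint rotation
    glm::vec3 axis;         // rotation axis
    glm::vec3 offset;       // translation to the next joint (segment length)
};

std::vector<glm::vec3> jointPositions(const std::vector<Segment>& segments)
{
    std::vector<glm::vec3> points;
    glm::mat4 M(1.0f);                               // M_0 = identity
    for (const Segment& s : segments) {
        glm::mat4 R = glm::rotate(glm::mat4(1.0f), glm::radians(s.angle_deg), s.axis);
        glm::mat4 T = glm::translate(glm::mat4(1.0f), s.offset);
        M = M * (R * T);                             // M_i = M_{i-1} * (R_i * T_i)
        points.push_back(glm::vec3(M[3]));           // 4th column = transformed origin
    }
    return points;                                   // feed these into your line strip
}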
If you're supposed to use glGetFloatv, then this refers to calling it with the GL_MODELVIEW_MATRIX argument, which returns the current model view matrix. You can then use this matrix to transform a point from the coordinate system of the hand to the world space CS.
However, calling glGetFloatv is bad practice, as it will probably result in reduced rendering performance. I think you should talk to your tutor about teaching outdated and even deprecated functionality, maybe he can get the prof to update the materials.
Edit: Your code for retrieving the translation is correct. However, you can't draw the point with the same modelview matrix applied. Before drawing it, you have to reset the modelview matrix with glLoadIdentity or by popping it.

Rigid motion estimation

Now what I have is the 3D point sets as well as the projection parameters of the camera. Given two 2D point sets projected from the 3D points using the camera and the transformed camera (by rotation and translation), there should be an intuitive way to estimate the camera motion... I read some parts of Zisserman's book "Multiple View Geometry in Computer Vision", but I still did not get the solution.
Are there any hints, how can the rigid motion be estimated in this case?
THANKS!!
What you are looking for is a solution to the PnP problem. OpenCV has a function which should work, called solvePnP. Just to be clear, for this to work you need point locations in world space, a camera matrix, and the points' projections onto the image plane. It will then tell you the rotation and translation of the camera or of the points, depending on how you choose to think of it.
Adding to the previous answer, Eigen has an implementation of Umeyama's method for estimating the rigid transformation between two sets of 3D points. You can use it to get an initial estimate, and then refine it with an optimization algorithm that also considers the projections of the 3D points onto the images. For example, you could try to minimize the reprojection error between the 2D points on the first image and the projections of the 3D points after you bring them from the reference frame of one camera to the reference frame of the other using the previously estimated transformation. You can do this both ways, using the transformation and its inverse, and try to minimize the bidirectional reprojection error. I'd recommend the paper "Stereo visual odometry for autonomous ground robots" by Andrew Howard, as well as some of its references, for a better explanation, especially if you are considering an outlier removal/inlier detection step before the actual motion estimation.
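A minimal illustration of the Eigen part (the refinement step is not shown, and the names are mine):

#include <Eigen/Core>
#include <Eigen/Geometry>

Eigen::Matrix4d rigidMotion(const Eigen::Matrix3Xd& points_a,
                            const Eigen::Matrix3Xd& points_b)
{
    // Columns are 3D points; the two sets must be in corresponding order.
    // Last argument false = no scale, i.e. a rigid (rotation + translation) fit.
    return Eigen::umeyama(points_a, points_b, false);
}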