While trying to understand the oblique clipping method, I've run into a problem with the theory. According to this article written by Eric Lengyel, at the end of chapter 2 we get the clip planes:
Near <0,0,1,1>
Far <0,0,-1,1>
...
And it is said that:
each camera-space plane is expressed as a sum or difference of two rows of the projection matrix
THIS is the moment I cannot understand. For example, if the near-plane value is said to be M4 + M3 (where M4 and M3 are the fourth and third rows of the projection matrix), and the other planes are calculated similarly, then it would follow that the projection matrix MUST be the identity (to get <0,0,1,1> from M4 + M3). But we know that it isn't. So can someone explain what matrix we actually use, and what its connection with the projection matrix is?
THIS is the moment I cannot understand. For example, if the near-plane value is said to be M4 + M3 (where M4 and M3 are the fourth and third rows of the projection matrix), and the other planes are calculated similarly, then it would follow that the projection matrix MUST be the identity (to get <0,0,1,1> from M4 + M3).
First of all, your logic is flawed here. To get a vector c = (0,0,1,1) as the sum of two vectors a + b, you can find an infinite number of vectors a and b fulfilling this, for example (7, -2pi, 0, 42) + (-7, 2pi, 1, -41) = (0,0,1,1).
However, this is completely beside the point, because you misunderstood crucial parts of that article. The clip planes you listed here are in clip space (for the special case w = 1, as explained in the article). If we wanted to find the equations of the clip planes in clip space, there would be no need to do any calculation at all, because in clip space the clip planes are defined by fixed equations. There is no point in calculating M4 + M3 if we already know it would yield (0,0,1,1).
The whole article is about efficiently calculating the clip planes in eye space, and Table 1 of that paper makes this extremely clear:
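To spell out the step that connects the two spaces (this is my own summary of the reasoning, using the article's row notation M1..M4, not a quote from it): points go from camera space to clip space via the projection matrix M, and planes go the other way via its transpose.

P_clip = M * P_camera
L_clip · P_clip >= 0   <=>   (M^T * L_clip) · P_camera >= 0,   so   L_camera = M^T * L_clip
Near:  L_camera = M^T * <0,0,1,1>  = M3 + M4
Far:   L_camera = M^T * <0,0,-1,1> = M4 - M3

So the fixed vectors <0,0,1,1>, <0,0,-1,1>, ... are the clip-space planes, and the row sums/differences are what those planes become back in camera space; only for an identity M would the two coincide.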
I'm following this explanation on the P3P problem and have a few questions.
In the heading labeled Section 1, they project the image plane points onto a unit sphere. I'm not sure why they do this. Is it to simulate a camera lens? I know that in OpenCV we first compute the intrinsics of the camera and factor them into solvePnP. Is this unit sphere serving a similar purpose?
Also in Section 1, where did $u^{'}_x$, $u^{'}_y$, and $u^{'}_z$ come from, and what are they? If we are projecting onto a 2D plane, then why do we need the third component? I know the standard answer is "because homogeneous coordinates", but I can't seem to find an explanation of why we use them or what they really are.
Also in Section 1, what does "normalize using the L2 norm" mean, and what does this step accomplish?
I'm hoping if I understand Section 1, I can understand the notation in the following sections.
Thanks!
Here are some hints
The projection onto the unit sphere has nothing to do with the camera lens. It is just a mathematical transformation intended to simplify the P3P equation system (whose solutions we are trying to compute).
$u'_x$ and $u'_y$ are the coordinates of $(u,v) - P$ (here $P=(c_x, c_y)$), normalized by the focal distances $f_x$ and $f_y$. The subtraction of the camera optical center $P$ is a translation of the origin to that point. The introduction of the $z$ coordinate $u'_z=1$ moves the 2D point $(u'_x, u'_y)$ onto the 3D plane defined by the equation $z=1$ (the 3D plane parallel to the $xy$ plane). Note that by moving points onto the plane $z=1$, you can now better visualize them as the intersections with that plane of the 3D lines that pass through $P$ and through them. In other words, these points become the projections onto a 2D plane of 3D points located somewhere on those lines (well, not merely "somewhere" but at the focal distance, which has now been "normalized" to 1 after dividing by $f_x$ and $f_y$). Again, all of these transformations are intended to make the equations easier to solve.
The so-called $L2$ norm is nothing but the usual Euclidean distance that comes from the Pythagorean theorem ($a^2 + b^2 = c^2$), only here it's being used to measure distances between points in 3D space.
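In case it helps, here is a minimal sketch of those two steps (pixel -> plane $z=1$ -> unit sphere). The names fx, fy, cx, cy are my own, taken to be the usual OpenCV-style intrinsics, not identifiers from the article:

#include <cmath>

struct Vec3 { double x, y, z; };

// Back-project a pixel (u, v) to a unit-length viewing direction.
// fx, fy are the focal lengths, (cx, cy) the optical center.
Vec3 pixelToUnitSphere(double u, double v,
                       double fx, double fy, double cx, double cy)
{
    // Step 1: move to the plane z = 1 (translate by the optical center,
    // divide by the focal lengths, append z = 1).
    double ux = (u - cx) / fx;
    double uy = (v - cy) / fy;
    double uz = 1.0;

    // Step 2: "normalize using the L2 norm" -- divide by the Euclidean length,
    // which drops the point onto the unit sphere centered at the camera.
    double len = std::sqrt(ux * ux + uy * uy + uz * uz);
    return { ux / len, uy / len, uz / len };
}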
I'm trying to understand how to restore the z value from the depth buffer, and I'm trying to do the math based on these two posts:
Getting the true z value from the depth buffer (stackoverflow)
http://ogldev.atspace.co.uk/www/tutorial46/tutorial46.html
The two posts use different projection matrices (specifically, the lower-right 2x2 part is different, one difference being the use of a -1 vs. a +1 in [3][4]). I'm not really sure why that is; as far as I know, the one with -1 is "the correct OpenGL projection matrix" (right?)
Now I tried to do the calculation for both, and the weird thing is that the SO post mentions -A - B/z (or, in my calculation, -S - T/z). Then the code shows
And solving this for A and B (or S and T) gives
Okay, now doing the calculation for both projection matrices from scratch (left = ogldev.atspace.co.uk, right = stackoverflow), it gets confusing: up to the -S - T/z part everything is fine, but when we compare the solved formula for what should have been the StackOverflow case (-1 projection matrix), it matches the one from ogldev.atspace.co.uk (+1 projection matrix) - colored in red...
this is confusing, any clues what I'm doing wrong?!
Updated calculations, see comments from "derhass" below:
The two posts use different projection matrices (specifically, the lower-right 2x2 part is different, one difference being the use of a -1 vs. a +1 in [3][4]). I'm not really sure why that is; as far as I know, the one with -1 is "the correct OpenGL projection matrix" (right?)
No. There is no "right" and "wrong" here, there are just conventions. The "correct" matrix is the one which does the right thing for the conventions you chose to use. Classic GL's glFrustum function did indeed use the matrix from that StackOverflow post. The convention there is that the projection center is at the origin, the view direction is -z, x is right, and y is up. But you can use any convention, with an arbitrary principal point and an arbitrary projection direction. The other matrix just uses +z as the projection direction, which can be interpreted as a flipped handedness of the coordinate space. It can also be interpreted as just looking in the opposing direction while still keeping the usual right-handed coordinate system.
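For reference, and as my own summary rather than a quote from either post, the lower-right 2x2 blocks of the two conventions differ only in the sign of the z column (n and f are the near/far distances):

-z convention (classic glFrustum), rows 3-4, columns 3-4:

-(f+n)/(f-n)   -2fn/(f-n)
     -1             0

+z convention (the other matrix):

 (f+n)/(f-n)   -2fn/(f-n)
      1             0

Flipping the viewing direction negates z_eye, which negates exactly that column of the matrix - hence the -1 vs. +1 in [3][4].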
this is confusing, any clues what I'm doing wrong?!
I'm not sure what you're trying to prove here, besides the fact that introducing small sign errors will give bogus results...
Your derivations for the "+z" projection matrix seem OK. It maps depth=0 to z=n, and depth=1 to z=f, which is the standard way of mapping these - and just another convention. You could also use a Reversed-Z mapping, where the near plane is mapped to depth 1, and far plane mapped to depth 0.
UPDATE
For the second matrix, you flipped the sign again, even after the corrections from my comment. When you substituted S and T back into that final formula, you actually substituted -S. If you did the correct substitutions, you would have gotten [now, after you fixed the calculation again, you have got] a formula which is exactly the negation of the one in the +z matrix case - depth = 0 is mapped to -n, and depth = 1 to -f, which is exactly how those parameters are defined in the classic GL convention, where n and f just describe the distances to those planes, in the viewing direction (-z).
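Putting the classic (-z) convention into code, here is a minimal sketch of the reconstruction; this is my own illustration of the mapping just described, and the names A and B are mine (they may not match the S/T in the screenshots):

// Recover eye-space z from a [0,1] depth-buffer value, classic GL convention:
// depth 0 corresponds to z_eye = -n, depth 1 to z_eye = -f.
double eyeZFromDepth(double depth, double n, double f)
{
    double zNdc = 2.0 * depth - 1.0;     // window depth [0,1] -> NDC [-1,1]
    double A = -(f + n) / (f - n);       // lower-right 2x2 of glFrustum:
    double B = -2.0 * f * n / (f - n);   // [ A  B ; -1  0 ]
    return -B / (A + zNdc);              // solves z_ndc = -A - B/z_eye for z_eye
}

Plugging in depth = 0 gives -n and depth = 1 gives -f, matching the mapping above.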
The Project
I am working on a texture tracking project for mobile. It exclusively tracks planar surfaces, so I have been using OpenCV's cv::findHomography() to calculate the homography between two frames. That function runs very slowly, however, and is the primary bottleneck in my pipeline. I decided that an algorithm that can take an initial estimate of the homography would run much faster, because my change in homography between frames is very small. Also, my outlier percentage is very small, so robust methods are optional. Unfortunately, to my knowledge OpenCV does not include a homography finder that takes an initial estimate. It does, however, include solvePnP(), which takes the original 3D world coordinates of the scene, the current 2D image coordinates, a camera matrix, distortion parameters, and, most importantly, an initial estimate. I am trying to replace findHomography with solvePnP. Since I use only 2D coordinates throughout the pipeline and solvePnP asks for 3D coordinates, I am trying to go 2d -> 3d -> 3d_transform -> 2d_transform. Right now that process runs 6x faster than findHomography() if it is given a good initial guess, but it has issues.
The Problem
Something is wrong with how I am converting. My theory was that since a camera matrix is not required to find a homography, it should not be required for this process either, since I only want the information contained in a homography in the end. I also assumed that since I throw out all z information in the end, how I initialize z should not matter. My process is as follows.
First I convert all my initial 2D coordinates to 3D by giving them a z position of 1. I can assume that my original coordinates lie flat in the x-y plane. Then:
cv::Mat rot_mat;                   // 3x3 rotation matrix (CV_64F), filled by Rodrigues
cv::Mat pnp_rot;                   // 3x1 rotation vector (holds the initial guess)
cv::Mat pnp_tran;                  // 3x1 translation vector (holds the initial guess)
cv::Matx33f camera_matrix(1, 0, 0, // identity intrinsics ("faked" camera matrix)
                          0, 1, 0,
                          0, 0, 1);
cv::Matx41f dist(0, 0, 0, 0);      // no lens distortion
// true = useExtrinsicGuess: start from the pose already stored in pnp_rot/pnp_tran
cv::solvePnP(original_cord, current_cord, camera_matrix, dist, pnp_rot, pnp_tran, true);
// Rodrigues converts from a rotation vector to a rotation matrix
cv::Rodrigues(pnp_rot, rot_mat);
// build the "homography" from the first two rotation columns and the translation
cv::Matx33f homography(rot_mat.at<double>(0,0), rot_mat.at<double>(0,1), pnp_tran.at<double>(0),
                       rot_mat.at<double>(1,0), rot_mat.at<double>(1,1), pnp_tran.at<double>(1),
                       rot_mat.at<double>(2,0), rot_mat.at<double>(2,1), pnp_tran.at<double>(2) + 1);
The conversion to a homography here is simple. The first two columns of the homography come from the 3x3 rotation matrix, and the last column is the translation vector. The one trick is that homography(2,2) corresponds to scale, while pnp_tran(2) corresponds to movement along the z axis. Given that I initialize my z coordinates to 1, scale is z_translation + 1. This process works perfectly for 4 of the 6 degrees of freedom: translation_x, translation_y, scale, and rotation about z all work. Rotation about x and y, however, shows significant error. I believe this is due to initializing my points at z = 1, but I don't know how to fix it.
The Question
Was my assumption correct that I can get good results from solvePnP by using a faked camera matrix and an initial z coordinate? If so, how should I set up my camera matrix and z coordinates to make x and y rotation work? Also, if anyone knows where I could get a homography-finding algorithm that takes an initial guess and works only in 2D, or information on techniques for writing my own, that would be very helpful. I will most likely be moving in that direction once I get this working.
Update
I built myself a test program which takes a homography, generates a set of coplanar points from that homography, and then runs the points through solvePnP to recover the specified homography. In the process of doing this I realized that I am fundamentally misunderstanding some part of how homographies are constructed. I have been assuming that a homography is constructed as follows.
hom(0,2) = x translation
hom(1,2) = y translation
hom(2,2) = scale, I can divide the entire matrix by this to normalize
The first two columns I assumed were the first two columns of a 3x3 rotation matrix. This essentially amounts to taking a 3x4 transform and throwing away column 2. I have discovered, however, that this is not true. The test case showing me the error of my ways was trying to make a homography which rotates points by some small angle around the y axis.
// rotate by .0175 rad about the y axis
cv::Matx33f rot_mat( 1,       0, .0174,
                     0,       1, 0,
                    -.0174,   0, 1);
// my conversion method to make this a homography gives
cv::Matx33f homography( 1,      0, 0,
                        0,      1, 0,
                       -.0174,  0, 1);
The above homography does not work at all. Take, for example, a point (x, y, 1) where x > 58. The result will be (x, y, some_negative_number): the third (w) component comes out as 1 - 0.0174x, which goes negative once x exceeds about 57.5. When I convert from homogeneous coordinates back to Cartesian, my x and y values both flip signs.
All that is to say, I now have a much simpler question that I think would let me solve everything. How do I construct a homography that rotates points by some angle around the x and y axis?
Homographies are not simple translation or rotation matrices. The aim is to map straight lines to straight lines rather than to map single points to other points. They take perspective into account to achieve this, and are explained here.
Hence, homography matrices cannot be easily decomposed, but there are (complicated) ways to do so shown here. This may help you extract rotations and translations out of it.
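As an aside (this is a sketch of one such decomposition that ships with OpenCV 3.0 and later, not the specific method from the linked answer), cv::decomposeHomographyMat will give you rotation/translation candidates from a homography, assuming you know the camera matrix K:

#include <opencv2/calib3d.hpp>
#include <vector>

// H: 3x3 homography between two views of a plane, K: 3x3 camera matrix.
// decomposeHomographyMat returns up to 4 candidate (R, t, plane normal) triples;
// picking the physically valid one is up to the caller.
void decomposeExample(const cv::Mat& H, const cv::Mat& K)
{
    std::vector<cv::Mat> rotations, translations, normals;
    int solutions = cv::decomposeHomographyMat(H, K, rotations, translations, normals);
    for (int i = 0; i < solutions; ++i)
    {
        // rotations[i]   : 3x3 rotation matrix candidate
        // translations[i]: 3x1 translation candidate (scaled by the plane distance)
        // normals[i]     : 3x1 plane normal candidate
    }
}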
This should help you better understand a homography, but the rest I am unfamiliar with.
I'm using Qualcomm's AR SDK to track an object.
I have the following functions available:
https://ar.qualcomm.at/qdevnet/api (specifically look at "Namespace List->QCAR::Tool").
I can get the tracked item's modelview matrix by using the convertPose2GLMatrix (const Matrix34F &pose) function, as I get a pose matrix for each tracked item.
My goal - to determine the marker's location in "the real world". You can assume my camera will be stationary.
I have read numerous articles online, and my general understanding is this:
I need to pick the modelview matrix of the point where I want the axes' origin (0,0,0) to be (i.e. copy the matrix I get for that point).
I then need to transpose that matrix. Then, each modelview matrix I extract should be multiplied by that matrix and then by an (x, y, z, 1) vector to obtain the coordinates (ignoring the 4th element).
Am I correct? Is this the way to go? And if not - what is?
I like to think of ortho matrices as moving from one coordinate space to another, so what you suggest is correct, yet I would do it the other way around:
1.) Use a reference coordinate system S relative to your camera (i.e. just one of your matrices determined by the pose estimation)
2.) For each pose estimation T calculate the following:
W = S * T^-1 (note: T^-1 equals transpose(T) only if T is a pure rotation; a modelview matrix that also contains a translation needs a proper inverse)
3.) From the matrix W, pick the 4th column as your world position.
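A minimal sketch of those steps, using OpenCV's small-matrix type purely for convenience (the names below are my own; remember that the GL-style arrays you get are column-major, so transpose them while copying into a row-major type):

#include <opencv2/core.hpp>

// S: reference pose (the one you picked as the origin), T: current pose,
// both already copied into row-major 4x4 matrices.
cv::Vec3f worldPosition(const cv::Matx44f& S, const cv::Matx44f& T)
{
    // Step 2: W = S * T^-1 (full inverse, valid even when T contains a translation).
    cv::Matx44f W = S * T.inv();

    // Step 3: the 4th column of W holds the translation, i.e. the world position.
    return cv::Vec3f(W(0, 3), W(1, 3), W(2, 3));
}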
I'm very confused as to what my problem is here. I've set up a matrix which converts global/world coordinates into the local coordinate space of an object. This conversion matrix is constructed from four vectors of object information (forward, up, side and position). This localization matrix is then passed to glMultMatrixf() at draw time for each object so that I can draw simple axes around each object to visualize its local coordinate system. This works completely fine and as expected, and as the objects move and rotate in the world, so do their local coordinate axes.
The problem is that when I take this same matrix and multiply it by a column vector (to convert the global position of one object into the local coordinate system of another object) the result is not at all as I would expect. For example:
My localize matrix is as follows:
0.84155 0.138 0.5788 0
0.3020 0.8428 -0.5381 8.5335
0.4949 -0.5381 -0.6830 -11.6022
0.0 0.0 0.0 1.0
I input the position column vector:
-30.0
-30.0
-30.0
1.0
And get the output of:
-99.2362
-1.0199
4.8909
1.0000
As my object's position at this point in time is (-50.8, 8.533, -11.602, 1), I know that the output for the x coordinate cannot possibly be as great as -99.2362. Furthermore, when I find the distance between two global points, and the distance between the localized point and the origin, they are different.
I've checked this in Matlab and it seems that my matrix multiplication is correct (Note: in Matlab you have to first transpose the localize matrix). So I'm left to think that my localize matrix is not being constructed correctly - but then OpenGL is successfully using this matrix to draw the local coordinate axes!
I've tried to not include unnecessary details in this question but if you feel that you need more please don't hesitate to ask! :)
Many thanks.
I have to guess, but I would like to point out two sources of problems with OpenGL-matrix multiplication:
the modelview matrix transforms to a coordinate system where the camera is always at the origin (0,0,0) looking along the z-axis. So if you made some transformations to "move the camera" before applying local->global transformations, you must compensate for the camera movement or you will get coordinates local to the camera's coordinate space. Did you include camera transformations when you constructed the matrix?
Matrices in OpenGL are COLUMN-major. If you have an array with 16 values, the elements will be ordered that way:
[0][4][ 8][12]
[1][5][ 9][13]
[2][6][10][14]
[3][7][11][15]
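To make that concrete, here is a small sketch (my own illustration, not the asker's code) of how those 16 floats are used when a column-major OpenGL matrix multiplies a column vector; note that the translation lives in elements 12, 13 and 14:

// Multiply a column-major 4x4 OpenGL matrix by a column vector (x, y, z, w).
// m[0..3] is the first COLUMN, m[12..14] holds the translation.
void multColumnMajor(const float m[16], const float v[4], float out[4])
{
    for (int row = 0; row < 4; ++row)
    {
        out[row] = m[row + 0]  * v[0]   // column 0
                 + m[row + 4]  * v[1]   // column 1
                 + m[row + 8]  * v[2]   // column 2
                 + m[row + 12] * v[3];  // column 3 (translation when w = 1)
    }
}
// If you stored your localize matrix row by row, as you printed it, you must
// transpose it (or swap the indexing) before treating it as an OpenGL matrix.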
Your matrix also seems strange. The first three columns tell me that you applied some rotation or scaling transformations. The last column shows the amount of translation applied to each coordinate element. The numbers are the same as your object's position. That means that if you want the output x coordinate to come out around -50.8, the first three terms of the first row's product should add up to zero:
-30 * 0.8154 + (-30) * 0.3020 + (-30) * 0.4939 + 1 * (-50.8967)
The first three terms should add up to zero, but they add up to -48.339.
So I think there really is a problem in how the matrix is constructed. Perhaps you can explain how you construct it...