OpenGL glut glTranslate glRotate glScale matrices - c++

I'm looking for an explanation (or an image) of the matrix and how it changes when you apply translate, rotate and scale to it... (one cell with sin(angle), and another cell with the x coordinate of the translation)

For now, ignore translation; it's a slightly trickier concept than rotation and scale.
The way to think about this is that each matrix defines a change in the basis vectors. Given a standard co-ordinate system, your basis vectors are (1,0,0), (0,1,0) and (0,0,1). For now, I'm just going to assume a 2D system, as the concepts carry through, but it's less work.
I'm also assuming column-major. I can't remember if OpenGL actually uses this though, so check this first, and optionally transpose the matrices if needed.
The basis vectors, as defined before, can be put in matrix form. This simply puts each vector as a column in the matrix. Therefore, to transform from the basis vectors to the basis vectors (i.e. no change), we would use the following matrix. This is also called the "identity matrix", since it doesn't do anything to its input (similar to how *1 is the identity of multiplication).
2D:
(1 0)
(0 1)

3D:
(1 0 0)
(0 1 0)
(0 0 1)
I've included the 3D version for completeness' sake, but that's as far as I'll be taking 3D.
A scale matrix can be seen as "stretching" the axes. If the axes are twice as large, the intervals on them will be twice as far apart, thus the contents will be larger. Take this as an example:
(2 0)
(0 2)
This will change the basis vectors from (1, 0) and (0, 1) to (2, 0) and (0, 2), thus making the whole shape represented twice as large. Diagrammatically, see below.
[ASCII diagram: "Before" shows a 1x1 box drawn against axes running up to 6 and 7; "After" shows the same box drawn against axes running up to 3, where it now spans 2 units on each axis, i.e. it appears twice as large.]
The same then happens for rotation, although instead we use different values. The values for a rotation matrix are as follows:
(cos(x) -sin(x))
(sin(x) cos(x))
This will effectively rotate each axis around the angle x. To really make sense of this, brush up on your trig and assume each column is a new basis vector ;).
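To see this in code, here is a tiny sketch of my own (plain C++, not an OpenGL call) that multiplies a 2D point by the rotation matrix above; the columns (cos(x), sin(x)) and (-sin(x), cos(x)) are exactly where the old basis vectors (1, 0) and (0, 1) end up:
#include <cmath>

// Rotate the point (x, y) by 'angle' radians about the origin.
void rotate2D(double angle, double x, double y, double& outX, double& outY)
{
    double c = std::cos(angle), s = std::sin(angle);
    // First column (c, s) is the image of (1, 0); second column (-s, c)
    // is the image of (0, 1).
    outX = c * x - s * y;
    outY = s * x + c * y;
}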
Now, translation is a little trickier. For this, we add an extra column at the end of the matrix, which for all other operations just has a 1 on the last row (i.e. it is an identity, of sorts). For translation, we fill this in as follows:
(1 0 x)
(0 1 y)
(0 0 1)
This is 3D in a form, but not in the form you will be used to. You can model this as moving the Z basis co-ordinate (and remember, we're working in 2D here!), assuming your model exists at Z=1. This effectively skews the shape, but again, as we're working in 2D, it is flattened so we don't perceive the third dimension. If we were working in 3D here, this would actually be the fourth dimension, as can be seen here:
(1 0 0 x)
(0 1 0 y)
(0 0 1 z)
(0 0 0 1)
Again, the "fourth dimension" isn't seen, but we instead move along it and flatten. It's easier to get your head around it in 2D space first, then try and extrapolate. In 3D space, this fourth dimension vector is called w, so your models implicitly lie at w=1.
Hope this helps!
EDIT: As an aside, this page is what helped me to understand translation matrices. It has some decent diagrams, so hopefully it will be more helpful:
http://www.blancmange.info/notes/maths/vectors/homo/

Related

How to calibrate camera focal length, translation and rotation given four points?

I'm trying to find the focal length, position and orientation of a camera in world space.
Because I need this to be resolution-independent, I normalized my image coordinates to be in the range [-1, 1] for x, and a somewhat smaller range for y (depending on aspect ratio). So (0, 0) is the center of the image. I've already corrected for lens distortion (using k1 and k2 coefficients), so this does not enter the picture, except sometimes throwing x or y slightly out of the [-1, 1] range.
As a given, I have a planar, fixed rectangle in world space of known dimensions (in millimeters). The four corners of the rectangle are guaranteed to be visible, and are manually marked in the image. For example:
std::vector<cv::Point3f> worldPoints = {
    cv::Point3f(0, 0, 0),
    cv::Point3f(2000, 0, 0),
    cv::Point3f(0, 3000, 0),
    cv::Point3f(2000, 3000, 0),
};
std::vector<cv::Point2f> imagePoints = {
    cv::Point2f(-0.958707, -0.219624),
    cv::Point2f(-1.22234, 0.577061),
    cv::Point2f(0.0837469, -0.1783),
    cv::Point2f(0.205473, 0.428184),
};
Effectively, the equation I think I'm trying to solve is (see the equivalent in the OpenCV documentation):
    [ xi ]     [ fx  0  0 ]   [            tx ]   [ Xi ]
s * [ yi ]  =  [  0 fy  0 ] * [   Rxyz     ty ] * [ Yi ]
    [  1 ]     [  0  0  1 ]   [            tz ]   [ Zi ]
                                                  [  1 ]
where:
i is 1, 2, 3, 4
xi, yi is the location of point i in the image (between -1 and 1)
fx, fy are the focal lengths of the camera in x and y direction
Rxyz is the 3x3 rotation matrix of the camera (has only 3 degrees of freedom)
tx, ty, tz is the translation of the camera
Xi, Yi, Zi is the location of point i in world space (millimeters)
So I have 8 equations (4 points of 2 coordinates each), and I have 8 unknowns (fx, fy, Rxyz, tx, ty, tz). Therefore, I conclude (barring pathological cases) that a unique solution must exist.
However, I can't seem to figure out how to compute this solution using OpenCV.
I have looked at the imgproc module:
getPerspectiveTransform works, but gives me a 3x3 matrix only (from 2D points to 2D points). If I could somehow extract the needed parameters from this matrix, that would be great.
I have also looked at the calib3d module, which contains a few promising functions that do almost, but not quite, what I need:
initCameraMatrix2D sounds almost perfect, but when I pass it my four points like this:
cv::Mat cameraMatrix = cv::initCameraMatrix2D(
    std::vector<std::vector<cv::Point3f>>({worldPoints}),
    std::vector<std::vector<cv::Point2f>>({imagePoints}),
    cv::Size2f(2, 2), -1);
it returns me a camera matrix that has fx, fy set to -inf, inf.
calibrateCamera seems to use a complicated solver to deal with overdetermined systems and outliers. I tried it anyway, but all I can get from it are assertion failures like this:
OpenCV(3.4.1) Error: Assertion failed (0 <= i && i < (int)vv.size()) in getMat_, file /build/opencv/src/opencv-3.4.1/modules/core/src/matrix_wrap.cpp, line 79
Is there a way to entice OpenCV to do what I need? And if not, how could I do it by hand?
3x3 rotation matrices have 9 elements but, as you said, only 3 degrees of freedom. One subtlety is that exploiting this property makes the equation non-linear in the angles you want to estimate, and non-linear equations are harder to solve than linear ones.
This kind of equation is usually solved by:
considering that the P=K.[R | t] matrix has 12 degrees of freedom and solving the resulting linear equation using the SVD decomposition (see Section 7.1 of 'Multiple View Geometry' by Hartley & Zisserman for more details; a rough sketch of this step follows the list)
decomposing this intermediate result into an initial approximate solution to your non-linear equation (see for example cv::decomposeProjectionMatrix)
refining the approximate solution using an iterative solver which is able to deal with non-linear equations and with the reduced degrees of freedom of the rotation matrix (e.g. the Levenberg-Marquardt algorithm). I am not sure if there is a generic implementation of this in OpenCV, however it is not too complicated to implement one yourself using the Ceres Solver library.
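Here is a rough sketch of steps 1 and 2, assuming you manage to collect at least 6 world/image correspondences (see the caveat below about having only 4). This is my own illustration of the standard DLT, not an official OpenCV recipe:
#include <opencv2/opencv.hpp>
#include <vector>

// Estimate the 3x4 projection matrix P = K[R|t] from >= 6 correspondences.
cv::Mat estimateProjectionDLT(const std::vector<cv::Point3f>& world,
                              const std::vector<cv::Point2f>& image)
{
    CV_Assert(world.size() == image.size() && world.size() >= 6);
    cv::Mat A(2 * (int)world.size(), 12, CV_64F, cv::Scalar(0));
    for (int i = 0; i < (int)world.size(); ++i) {
        double X = world[i].x, Y = world[i].y, Z = world[i].z;
        double u = image[i].x, v = image[i].y;
        double* r0 = A.ptr<double>(2 * i);      // [X Y Z 1 0 0 0 0 -uX -uY -uZ -u]
        double* r1 = A.ptr<double>(2 * i + 1);  // [0 0 0 0 X Y Z 1 -vX -vY -vZ -v]
        r0[0] = X; r0[1] = Y; r0[2] = Z; r0[3] = 1;
        r0[8] = -u * X; r0[9] = -u * Y; r0[10] = -u * Z; r0[11] = -u;
        r1[4] = X; r1[5] = Y; r1[6] = Z; r1[7] = 1;
        r1[8] = -v * X; r1[9] = -v * Y; r1[10] = -v * Z; r1[11] = -v;
    }
    cv::Mat p;
    cv::SVD::solveZ(A, p);      // null-space vector of A, i.e. A*p = 0 with |p| = 1
    return p.reshape(1, 3);     // reshape the 12x1 solution into the 3x4 matrix P
}

// Step 2: split P into an initial guess for the intrinsics and the pose,
// then hand these to an iterative refiner (step 3).
// cv::Mat K, R, c;             // c is the homogeneous camera centre (4x1)
// cv::decomposeProjectionMatrix(P, K, R, c);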
However, your case is a bit particular because you do not have enough point matches to solve the linear formulation (i.e. step 1) reliably. This means that, as you stated it, you have no way to initialize an iterative refining algorithm to get an accurate solution to your problem.
Here are a few work-arounds that you can try:
somehow get 2 additional point matches, leading to a total of 6 matches hence 12 constraints on your linear equation, allowing you to solve the problem using the steps 1, 2, 3 above.
somehow guess manually an initial estimate for your 8 parameters (2 focal lengths, 3 angles & 3 translations), and directly refine them using an iterative solver. Be aware that the iterative process might converge to a wrong solution if your initial estimate is too far off.
reduce the number of unknowns in your model. For instance, if you manage to fix two of the three angles (e.g. roll & pitch) the equations might simplify a lot. Also, the two focal lengths are probably related via the aspect ratio, so if you know it and if your pixels are square, then you actually have a single unknown there.
if all else fails, there might be a way to extract approximated values from the rectifying homography estimated by cv::getPerspectiveTransform.
Regarding the last bullet point, the opposite of what you want is clearly possible. Indeed, the rectifying homography can be expressed analytically knowing the parameters you want to estimate. See for instance this post and this post. There is also a full chapter on this in the Hartley & Zisserman book (chapter 13).
In your case, you want to go the other way around, i.e. to extract the intrinsic & extrinsic parameters from the homography. There is a somewhat related function in OpenCV (cv::decomposeHomographyMat), but it assumes the K matrix is known and it outputs 4 candidate solutions.
In the general case, this would be tricky. But maybe in your case you can guess a reasonable estimate for the focal length, hence for K, and use the point correspondences to select the good solution to your problem. You might also implement a custom optimization algorithm, testing many focal length values and keeping the solution leading to the lowest reprojection error.
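As a rough sketch of that last idea (my own illustration, with several assumptions: square pixels so fx = fy, a principal point at the origin to match the question's normalized coordinates, and an arbitrary focal range to scan):
#include <opencv2/opencv.hpp>
#include <iostream>
#include <limits>
#include <vector>

void bruteForceFocal(const std::vector<cv::Point3f>& worldPoints,
                     const std::vector<cv::Point2f>& imagePoints)
{
    double bestErr = std::numeric_limits<double>::max(), bestF = 0;
    cv::Mat bestRvec, bestTvec;

    for (double f = 0.5; f <= 5.0; f += 0.01) {            // candidate focal lengths
        cv::Mat K = (cv::Mat_<double>(3, 3) << f, 0, 0,
                                               0, f, 0,
                                               0, 0, 1);
        cv::Mat rvec, tvec;
        if (!cv::solvePnP(worldPoints, imagePoints, K, cv::noArray(), rvec, tvec))
            continue;

        std::vector<cv::Point2f> reproj;                    // reprojection error for this f
        cv::projectPoints(worldPoints, rvec, tvec, K, cv::noArray(), reproj);
        double err = cv::norm(imagePoints, reproj, cv::NORM_L2);
        if (err < bestErr) {
            bestErr = err; bestF = f;
            bestRvec = rvec; bestTvec = tvec;
        }
    }
    std::cout << "best f = " << bestF << ", reprojection error = " << bestErr << "\n";
}
The distortion coefficients are passed as cv::noArray() because the question states the lens distortion has already been corrected.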

The concept of bilinear interpolation when accessing a 2-d array like a 1-d array

In a 2-d array there are the pixels of a bmp file, and its size is width (3*65536) * height (3*65536), which I scaled up.
It's like this.
1 2 3 4
5 6 7 8
9 10 11 12
Between 1 and 2 there are 2 holes, because I enlarged the original 2-d array (multiplied by 3).
I use 1-d array-like access method like this.
array[y* width + x]
index
0 1 2 3 4 5 6 7 8 9...
1 2 3 4 5 6 7 8 9 10 11 12
(this array is actually 2-d array and is scaled by multiplying 3)
Now I can patch the holes like this:
In a double for loop, in the condition (j%3==1):
Image[i*width+j] = Image[i*width+(j-1)]*(1-1/3) + Image[i*width+(j+2)]*(1-2/3)
In another condition ( j%3==2 )
Image[i*width+j] = Image[i*width+(j-2)]*(1-2/3) + Image[i*width+(j+1)]*(1-1/3)
This is the way I believe I could patch the holes, which is the so-called "bilinear interpolation".
I want to be sure about what I know before implementing this logic into my code. Thanks for reading.
Bilinear interpolation requires either 2 linear interpolation passes (horizontal and vertical) per interpolated pixel (well, some of them only require 1), or up to 4 source pixels per interpolated pixel.
Between 1 and 2 there are two holes. Between 1 and 5 there are 2 holes. Between 1 and 6 there are 4 holes. Your code, as written, could only patch holes between 1 and 2, not the other holes correctly.
In addition your division is integer division, and does not do what you want.
Generally you are far better off writing an r = interpolate_between(a, b, x, y) function that interpolates between a and b at step x out of y (a sketch is given below). Then test and fix. Now scale your image horizontally using it, and check visually that you got it right (especially the edges!)
Now try using it to scale vertically only.
Now do both horizontal, then vertical.
Next, write the bilinear version, which you can test again using the linear version three times (will be within rounding error). Then try to bilinear scale the image, checking visually.
Compare with the two-linear scale. It should differ only by rounding error.
At each of these stages you'll have a single "new" operation that can go wrong, with the previous code already validated.
Writing everything at once will lead to complex bug-ridden code.
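A minimal sketch of such a helper and a horizontal-only pass, assuming an 8-bit grayscale image in a 1-d array and the scale factor 3 from the question (the names are illustrative, and edge handling is kept simple):
#include <algorithm>
#include <cstdint>
#include <vector>

// Linear interpolation between a and b at step x out of y.
static uint8_t interpolate_between(uint8_t a, uint8_t b, int x, int y)
{
    double t = static_cast<double>(x) / y;                  // floating point, so no 1/3 == 0 surprise
    return static_cast<uint8_t>(a * (1.0 - t) + b * t + 0.5);
}

// Fill the horizontal holes of an image scaled by 'scale' in x, where only
// columns that are multiples of 'scale' hold source pixels.
void fillHorizontal(std::vector<uint8_t>& Image, int width, int height, int scale)
{
    for (int i = 0; i < height; ++i) {
        for (int j = 0; j < width; ++j) {
            int r = j % scale;
            if (r == 0) continue;                           // source pixel, keep it
            int left  = j - r;                              // previous source column
            int right = std::min(left + scale, width - 1);  // next source column
            Image[i * width + j] = interpolate_between(Image[i * width + left],
                                                       Image[i * width + right],
                                                       r, scale);
        }
    }
}
A second, vertical pass over the result (interpolating along columns in the same way) then fills the remaining holes, which together gives the bilinear result.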

HOG: What is done in the contrast-normalization step?

According to the HOG process, as described in the paper Histogram of Oriented Gradients for Human Detection (see link below), the contrast normalization step is done after the binning and the weighted vote.
I don't understand something - If I already computed the cells' weighted gradients, how can the normalization of the image's contrast help me now?
As far as I understand, contrast normalization is done on the original image, whereas for computing the gradients, I already computed the X,Y derivatives of the ORIGINAL image. So, if I normalize the contrast and I want it to take effect, I should compute everything again.
Is there something I don't understand well?
Should I normalize the cells' values?
Is the normalization in HOG not about contrast anyway, but is about the histogram values (counts of cells in each bin)?
Link to the paper:
http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf
The contrast normalization is achieved by normalization of each block's local histogram.
The whole HOG extraction process is well explained here: http://www.geocities.ws/talh_davidc/#cst_extract
When you normalize the block histogram, you actually normalize the contrast in this block, if your histogram really contains the sum of magnitudes for each direction.
The term "histogram" is confusing here, because you do not count how many pixels has direction k, but instead you sum the magnitudes of such pixels. Thus you can normalize the contrast after computing the block's vector, or even after you computed the whole vector, assuming that you know in which indices in the vector a block starts and a block ends.
The steps of the algorithm, to my understanding (this worked for me with a 95% success rate):
Define the following parameters (In this example, the parameters are like HOG for Human Detection paper):
A cell size in pixels (e.g. 6x6)
A block size in cells (e.g. 3x3 ==> Means that in pixels it is 18x18)
Block overlapping rate (e.g. 50% ==> Means that both block width and block height in pixels have to be even. It is satisfied in this example, because the cell width and cell height are even (6 pixels), making the block width and height also even)
Detection window size. The size must be divisible by half of the block size without remainder (so it is possible to exactly place the blocks within it with 50% overlapping). For example, the block width is 18 pixels, so the window width must be a multiple of 9 (e.g. 9, 18, 27, 36, ...). Same for the window height. In our example, the window width is 63 pixels, and the window height is 126 pixels.
Calculate gradient:
Compute the X difference using convolution with the vector [-1 0 1]
Compute the Y difference using convolution with the transpose of the above vector
Compute the gradient magnitude in each pixel using sqrt(diffX^2 + diffY^2)
Compute the gradient direction in each pixel using atan(diffY / diffX). Note that atan will return values between -90 and 90, while you will probably want the values between 0 and 180. So just flip all the negative values by adding to them +180 degrees. Note that in HOG for Human Detection, they use unsigned directions (between 0 and 180). If you want to use signed directions, you should make a little more effort: If diffX and diffY are positive, your atan value will be between 0 and 90 - leave it as is. If diffX and diffY are negative, again, you'll get the same range of possible values - here, add +180, so the direction is flipped to the other side. If diffX is positive and diffY is negative, you'll get values between -90 and 0 - leave them the same (You can add +360 if you want it positive). If diffY is positive and diffX is negative, you'll again get the same range, so add +180, to flip the direction to the other side.
"Bin" the directions. For example, 9 unsigned bins: 0-20, 20-40, ..., 160-180. You can easily achieve that by dividing each value by 20 and flooring the result. Your new binned directions will be between 0 and 8.
Do for each block separately, using copies of the original matrix (because some blocks are overlapping and we do not want to destroy their data):
Split to cells
For each cell, create a vector with 9 members (one for each bin). For each bin index, store the sum of the magnitudes of all the pixels with that direction. We have a total of 6x6 pixels in a cell. So for example, if 2 pixels have direction 0 while the magnitude of the first one is 0.231 and the magnitude of the second one is 0.13, you should write in index 0 of your vector the value 0.361 (= 0.231 + 0.13).
Concatenate all the vectors of all the cells in the block into a large vector. This vector size should of course be NUMBER_OF_BINS * NUMBER_OF_CELLS_IN_BLOCK. In our example, it is 9 * (3 * 3) = 81.
Now, normalize this vector. Use k = sqrt(v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2) (I used eps = 1). After you have computed k, divide each value in the vector by k - thus your vector will be normalized (see the sketch after this list).
Create final vector:
Concatenate all the vectors of all the blocks into 1 large vector. In my example, the size of this vector was 6318
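As an illustration of the normalization step above (my own sketch, not the exact code from the linked page), operating on the concatenated 81-element block vector:
#include <cmath>
#include <vector>

// L2-normalize a block vector with an epsilon term, as in the "normalize this vector" step.
void normalizeBlockVector(std::vector<float>& blockVec, float eps = 1.0f)
{
    float sumSq = eps * eps;
    for (float v : blockVec)
        sumSq += v * v;                 // v[0]^2 + v[1]^2 + ... + v[n]^2 + eps^2
    float k = std::sqrt(sumSq);
    for (float& v : blockVec)
        v /= k;                         // this is the per-block contrast normalization
}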

Apply an OpenCV homography in OpenGL

I modified an algorithm to rectify images. It returns 2 OpenCV homographies (3x3 matrices). I can use cv::warpPerspective and get rectified images, so the algorithm works right. But I need to apply these homographies to textures in OpenGL. So I create a 4x4 matrix (HomoGl) and I use
glMultMatrixf(HomoGl);
to apply this transform. To fill HomoGl I use
for(int i = 0; i < 3; ++i){
    for(int j = 0; j < 3; ++j){
        HomoGL[i + j * 4] = HomoCV.at<double>(i, j);
    }
}
This method gives the best result... but it is still wrong. I tested some other methods [1] but they don't work.
My question: how can I convert the OpenCV homography so that I can use glMultMatrixf to get correctly transformed images?
[1]http://www.aiqus.com/questions/24699/from-2d-homography-of-2-planes-to-3d-rotation-of-opengl-camera
So an H matrix is the transformation of 1 point on plane one to another point on plane 2.
X1 = H*X2
When you use warpPerspective in OpenCV you are putting the points in the perspective of plane 2.
The matrix (or image mat) that you get out of that warping is the texture you should use when applying to the surface.
Your extension of the 3x3 homography to 4x4 is wrong. The most naive approach which will somewhat work would be an extension of the form
    h11 h12 h13            h11 h12  0  h13
H = h21 h22 h23   ->  H' = h21 h22  0  h23
    h31 h32 h33             0   0   1   0
                           h31 h32  0  h33
The problem with this approach is that while it gives the correct result for x and y, it will distort z, since the modified w component affects all coordinates. If the z coordinate matters, you need a different approach.
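For reference, here is a rough sketch of filling the naive H' above directly in the column-major order glMultMatrixf expects (HomoCV is assumed to be the question's 3x3 CV_64F homography; as explained, z still gets distorted):
#include <opencv2/opencv.hpp>
#include <GL/gl.h>

void multHomography(const cv::Mat& HomoCV)   // 3x3, CV_64F
{
    float m[16] = {
        (float)HomoCV.at<double>(0,0), (float)HomoCV.at<double>(1,0), 0.f, (float)HomoCV.at<double>(2,0), // column 0
        (float)HomoCV.at<double>(0,1), (float)HomoCV.at<double>(1,1), 0.f, (float)HomoCV.at<double>(2,1), // column 1
        0.f,                           0.f,                           1.f, 0.f,                           // column 2
        (float)HomoCV.at<double>(0,2), (float)HomoCV.at<double>(1,2), 0.f, (float)HomoCV.at<double>(2,2)  // column 3
    };
    glMultMatrixf(m);
}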
In this paper, an approximation is proposed which will minimize the effects on the depth (see equation 5; you will also need to normalize your homography so that h33=1). However, this approximation will only work well enough for small distortions. If you have some extreme trapezoid distortion, that approach will also fail. In that case, a 2-pass approach of rendering into the texture and applying the 2D distortion is possible.
With the modern programmable pipeline, one could also deal with this in one pass by undistorting the z coordinate in the fragment shader (but that can have some negative impact on performance on its own).

3x3 Matrix Rotation in C++

Alright, first off, I know similar questions are all over the web, I have looked at more than I'd care to count, I've been trying to figure it out for almost 3 weeks now (not constantly, just on and off, hoping for a spark of insight).
In the end, what I want is a function where you pass in how much you want to rotate by (currently I'm working in radians, but I can go degrees or radians) and it returns the rotation matrix, preserving any translations I had.
I understand the formula to rotate on the "Z" axis in a 2D cartesian plane, is:
[cos(radians) -sin(radians) 0]
[sin(radians) cos(radians) 0]
[0 0 1]
I do understand Matrix Maths (Addition, Subtraction, Multiplication and Determinant/Inverse) fairly well, but what I'm not understanding, is how to, step-by-step, make a matrix I can use for rotation, preserving any translation (and whatever else, like scale) that it has.
From what I've gathered from other examples, is to multiply my current Matrix (whatever that may be, let's just use an Identity Matrix for now), by a Matrix like this:
[cos(radians) - sin(radians)]
[sin(radians) + cos(radians)]
[1]
But then my original Matrix would end up as a 3x1 Matrix instead of a 3x3, wouldn't it? I'm not sure what I'm missing, but something just doesn't seem right to me. I'm not necessarily looking for code for someone to write for me, just to understand how to do this properly and then I can write it myself. (not to say I won't look at other's code :) )
(Not sure if it matters to anybody, but just in-case, using Windows 7 64-bit, Visual Studio 2010 Ultimate, and I believe OpenGL, this is for Uni)
While we're at it, can someone double check this for me? Just to make sure it seems right.
A translation Matrix (again, let's use Identity) is something like this:
[1, 0, X translation element]
[0, 1, Y translation element]
[0, 0, 1]
First, you cannot represent a translation with a 3x3 matrix for 3D space. You have to use homogeneous 4x4 matrices.
After that, create a separate matrix for each transformation (translation, rotation, scale) and multiply them to get the final transformation matrix (multiplying 4x4 matrices will give you a 4x4 matrix).
Let's clear up some points:
Your object consists of 3D points which are basically 3 by 1 matrices.
You need a 3 by 3 rotation matrix R to rotate your object, but if you also add translation terms the transformation matrix becomes 4 by 4:
[R11, R12, R13, tx]
[R21, R22, R23, ty]
[R31, R32, R33, tz]
[  0,   0,   0,  1]
For the R terms you can have a look at http://inside.mines.edu/~gmurray/ArbitraryAxisRotation/; they depend on the rotation angles of each axis.
In order to rotate your object, every 3D point is multiplied by this rotation matrix. For every 3 by 1 point you also need to add a 4th term (the scale factor), which is 1 assuming fixed scale:
[x y z 1]'
The resulting product vector will be 4 by 1, and the last term is the scale term, which is again 1 and can be removed.
The resulting rotated object points are these new 3D product points.
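A small sketch of that multiplication using plain arrays (my own illustration): the point is the homogeneous column [x y z 1]' and M is the 4x4 matrix in row-major order:
// Multiply the 4x4 matrix M (row-major) by the homogeneous point p = {x, y, z, 1}.
void transformPoint(const float M[16], const float p[4], float out[4])
{
    for (int row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (int col = 0; col < 4; ++col)
            out[row] += M[row * 4 + col] * p[col];
    }
    // out[3] stays 1 for a pure rotation + translation and can be dropped.
}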
I faced the same problem and found a satisfying formula in this SO question.
Let (cos0, sin0) be respectively the cosine and sine values of your angle, and (x0, y0) the coordinates of the center of your rotation.
To transform a 2d point of coordinates (x,y), you have to multiply its homogeneous 3x1 coordinates (x,y,1) by this 3x3 matrix:
[cos0, -sin0, x0-(cos0*x0 - sin0*y0)]
[sin0, cos0, y0-(sin0*x0 + cos0*y0)]
[ 0, 0, 1 ]
The values in the third column are the amount of translation to apply when your rotation center is not the origin of the system.
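A minimal sketch of that matrix in code (it applies the 3x3 above to (x, y, 1); the Point2 struct is just for illustration):
#include <cmath>

struct Point2 { double x, y; };

// Rotate the point p by 'radians' around the centre (x0, y0).
Point2 rotateAround(Point2 p, double radians, double x0, double y0)
{
    double c = std::cos(radians), s = std::sin(radians);
    // Third column of the matrix: the translation that keeps (x0, y0) fixed.
    double tx = x0 - (c * x0 - s * y0);
    double ty = y0 - (s * x0 + c * y0);
    return { c * p.x - s * p.y + tx,
             s * p.x + c * p.y + ty };
}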