This topic has been discussed quite a few times, and there is a lot of information on the memory layout of matrices in OpenGL on the internet. Sadly, different sources often contradict each other.
My question boils down to:
Suppose I have the three base vectors of my matrix, bx, by and bz. If I want to build a matrix out of them to pass to a shader, how are they laid out in memory?
Let's clarify what I mean by base vector, because I suspect this can also mean different things:
When I have a 3D model that is Z-up and I want to lay it down flat in world space along the X-axis, then bz is [1 0 0]. I.e., a vertex [0 0 2] in model space will be transformed to [2 0 0] when that vertex is multiplied by my matrix, which has bz as the base vector for the Z-axis.
Coming to the OpenGL matrix memory layout:
According to the GLSL spec (p. 110):
vec3 v, u;
mat3 m;
u = v * m;
is equivalent to
u.x = dot(v, m[0]); // m[0] is the left column of m
u.y = dot(v, m[1]); // dot(a,b) is the inner (dot) product of a and b
u.z = dot(v, m[2]);
So, in order to get the best performance, I should premultiply my vertices in the vertex shader (that way the GPU can use the dot product, and so on):
attribute vec4 vertex;
uniform mat4 mvp;
void main()
{
    gl_Position = vertex * mvp;
}
Now, OpenGL is said to be column-major (GLSL spec, p. 101), i.e. the columns are laid out contiguously in memory:
[ column 0 | column 1 | column 2 | column 3 ]
[ 0 1 2 3 | 4 5 6 7 | 8 9 10 11 | 12 13 14 15 ]
or:
[
0 4 8 12,
1 5 9 13,
2 6 10 14,
3 7 11 15,
]
This would mean that I have to store my base vectors in the rows like this:
bx.x bx.y bx.z 0
by.x by.y by.z 0
bz.x bz.y bz.z 0
0 0 0 1
So for my example with the 3D model that I want to lay flat down, it has the base vectors:
bx = [0 0 -1]
by = [0 1 0]
bz = [1 0 0]
The model vertex [0 0 2] from above would be transformed like this in the vertex shader:
// m[0] is [ 0 0 1 0]
// m[1] is [ 0 1 0 0]
// m[2] is [-1 0 0 0]
// v is [ 0 0 2 1]
u.x = dot([ 0 0 2 1], [ 0 0 1 0]);
u.y = dot([ 0 0 2 1], [ 0 1 0 0]);
u.z = dot([ 0 0 2 1], [-1 0 0 0]);
// u is [ 2 0 0]
Just as expected!
On the contrary:
This SO question, Correct OpenGL matrix format?, and consequently the OpenGL FAQ state:
For programming purposes, OpenGL matrices are 16-value arrays with base vectors laid out contiguously in memory. The translation components occupy the 13th, 14th, and 15th elements of the 16-element matrix, where indices are numbered from 1 to 16 as described in section 2.11.2 of the OpenGL 2.1 Specification.
This says that my base vectors should be laid out in columns like this:
bx.x by.x bz.x 0
bx.y by.y bz.y 0
bx.z by.z bz.z 0
0 0 0 1
To me, these two sources, both of which are official Khronos documentation, seem to contradict each other.
Can somebody explain this to me? Have I made a mistake? Is there indeed some wrong information?
The FAQ is correct; it should be:
bx.x by.x bz.x 0
bx.y by.y bz.y 0
bx.z by.z bz.z 0
0 0 0 1
and it's your reasoning that is flawed.
Assuming that your base vectors bx, by and bz are the model basis given in world coordinates, the transformation from a model-space vertex v to the world-space vertex B*v is the linear combination of the base vectors:
B*v = bx*v.x + by*v.y + bz*v.z
It is not a dot product of b with v; it is the matrix multiplication B*v, where B has the form above.
Taking the dot product of a vertex u with bx would answer the inverse question: given a world-space u, what are its coordinates in model space along the axis bx? Therefore, multiplying by the transposed matrix transpose(B) gives you the transformation from world space to model space.
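To make the FAQ layout concrete, here is a minimal C-style sketch; the helper name and the plain float[16] representation are my own, but glUniformMatrix4fv(loc, 1, GL_FALSE, m) consumes exactly this column-major order:

// Hypothetical helper: pack the base vectors bx, by, bz and a translation t
// into a column-major float[16]. Each base vector occupies one column,
// i.e. 4 consecutive floats in memory.
void buildModelMatrix(const float bx[3], const float by[3],
                      const float bz[3], const float t[3], float m[16])
{
    // column 0 = bx    column 1 = by    column 2 = bz     column 3 = t
    m[0] = bx[0];       m[4] = by[0];    m[8]  = bz[0];    m[12] = t[0];
    m[1] = bx[1];       m[5] = by[1];    m[9]  = bz[1];    m[13] = t[1];
    m[2] = bx[2];       m[6] = by[2];    m[10] = bz[2];    m[14] = t[2];
    m[3] = 0.0f;        m[7] = 0.0f;     m[11] = 0.0f;     m[15] = 1.0f;
}

Note that the translation lands at indices 12..14, which are the 13th, 14th and 15th elements in the FAQ's 1-based numbering.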
I'm trying to create a view matrix for my program to be able to move and rotate the camera in OpenGL.
I have a camera struct that holds the position and rotation vectors. From what I understood, to create the view matrix you need to multiply the translation matrix with the rotation matrices to get the expected result.
So far I tried creating matrices for rotation and for transformation and multiply them like this:
> Transformation Matrix T =
1 0 0 -x
0 1 0 -y
0 0 1 -z
0 0 0 1
> Rotation Matrix Rx =
1 0 0 0
0 cos(-x) -sin(-x) 0
0 sin(-x) cos(-x) 0
0 0 0 1
> Rotation Matrix Ry =
cos(-y) 0 sin(-y) 0
0 1 0 0
-sin(-y) 0 cos(-y) 0
0 0 0 1
> Rotation Matrix Rz =
cos(-z) -sin(-z) 0 0
sin(-z) cos(-z) 0 0
0 0 1 0
0 0 0 1
View matrix = Rz * Ry * Rx * T
Notice that the values are negated, because if we want to move the camera to one side, the entire world moves to the opposite side.
This solution seems to almost work. The problem I have is that when the camera is not at (0, 0, 0), rotating the camera changes its position. If the camera is positioned at, let's say, (0, 0, -20) and I rotate it, the position should remain at (0, 0, -20), right?
I feel like I'm missing something but I can't figure out what. Any help?
Edit 1:
It's an assignment for university, so I can't use any built-in functions!
Edit 2:
I tried changing the order of the operations and putting the translation on the left side, i.e. T * Rz * Ry * Rx, but then the models rotate around themselves, not around the camera.
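For reference, a minimal self-contained C++ sketch of the construction described above; the Mat4 type and the helper names are hypothetical, and row-major m[row][col] storage with angles in radians is assumed:

#include <cmath>

// Hypothetical minimal 4x4 matrix type, row-major storage: m[row][col].
struct Mat4 { float m[4][4]; };

Mat4 identity() {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}

Mat4 mul(const Mat4& a, const Mat4& b) {        // r = a * b
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// View matrix as described in the question: translate by the negated
// camera position, then rotate by the negated camera angles.
Mat4 viewMatrix(float px, float py, float pz,
                float rx, float ry, float rz)
{
    Mat4 T = identity();
    T.m[0][3] = -px; T.m[1][3] = -py; T.m[2][3] = -pz;

    Mat4 Rx = identity();                       // rotation about X by -rx
    Rx.m[1][1] =  std::cos(-rx); Rx.m[1][2] = -std::sin(-rx);
    Rx.m[2][1] =  std::sin(-rx); Rx.m[2][2] =  std::cos(-rx);

    Mat4 Ry = identity();                       // rotation about Y by -ry
    Ry.m[0][0] =  std::cos(-ry); Ry.m[0][2] =  std::sin(-ry);
    Ry.m[2][0] = -std::sin(-ry); Ry.m[2][2] =  std::cos(-ry);

    Mat4 Rz = identity();                       // rotation about Z by -rz
    Rz.m[0][0] =  std::cos(-rz); Rz.m[0][1] = -std::sin(-rz);
    Rz.m[1][0] =  std::sin(-rz); Rz.m[1][1] =  std::cos(-rz);

    return mul(mul(mul(Rz, Ry), Rx), T);        // V = Rz * Ry * Rx * T
}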
Iwillnotexist Idonotexist presented his code for image perspective transformation (rotations around 3 axes): link
I'm looking for a function (or the math) to perform the inverse perspective transformation.
Let's assume that my "input image" is the result of his warpImage() function, and that all angles (theta, phi and gamma), scale and fovy are known.
I want to compute the inverse transformation (the black border doesn't matter) to recover the original image.
How can I do this?
The basic idea is you need to find the inverse transformation. In the linked question they have F = P T R1 R2 where P is the projective transformation, T is a translation, and R1, R2 are two rotations.
Denote the inverse transformation by F*. We can write it as F* = R2* R1* T* P*; note that the order changes. Three of these are easy: R1* is just another rotation, but with the angle negated. So the first inverse rotation would be
      [  cos th   sin th   0   0 ]
R1* = [ -sin th   cos th   0   0 ]
      [    0        0      1   0 ]
      [    0        0      0   1 ]
Note the signs on the two sin terms are reversed.
The inverse of a translation is just a translation in the opposite direction.
     [ 1  0  0  0 ]
T* = [ 0  1  0  0 ]
     [ 0  0  1  h ]
     [ 0  0  0  1 ]
You can check these by calculating T* T, which should give the identity matrix.
The trickiest bit is the projective component. We have
    [ cot(fv/2)      0            0              0        ]
P = [     0      cot(fv/2)        0              0        ]
    [     0          0      -(f+n)/(f-n)   -2 f n/(f-n)   ]
    [     0          0           -1              0        ]
The inverse of this is
     [ tan(fv/2)      0             0               0         ]
P* = [     0      tan(fv/2)         0               0         ]
     [     0          0             0              -1         ]
     [     0          0       (n-f)/(2 f n)   (f+n)/(2 f n)   ]
Wolfram alpha inverse with v=fv
You then need to multiply these together in the reverse order to get the final matrix.
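As a sanity check, here is a small C++ sketch (the example values for n, f and fv are assumptions) that multiplies P by the P* above and prints the product, which should be the 4x4 identity:

#include <cmath>
#include <cstdio>

int main() {
    const double n = 1.0, f = 100.0, fv = 1.0;  // assumed example values
    const double c = 1.0 / std::tan(fv / 2.0);  // cot(fv/2)

    const double P[4][4] = {
        { c, 0,  0,                  0                },
        { 0, c,  0,                  0                },
        { 0, 0, -(f + n) / (f - n), -2*f*n / (f - n)  },
        { 0, 0, -1,                  0                }
    };
    const double Pinv[4][4] = {
        { 1/c, 0,   0,                  0                  },
        { 0,   1/c, 0,                  0                  },
        { 0,   0,   0,                 -1                  },
        { 0,   0,  (n - f) / (2*f*n),  (f + n) / (2*f*n)   }
    };

    // Print P * Pinv; the result should be 1 on the diagonal, 0 elsewhere.
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            double s = 0.0;
            for (int k = 0; k < 4; ++k) s += P[i][k] * Pinv[k][j];
            std::printf("%7.3f", s);
        }
        std::printf("\n");
    }
    return 0;
}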
I also had issues back-transforming my image.
You need to store the points
ptsInPt2f and ptsOutPt2f
which are computed in the 'warpMatrix' method.
To back-transform, simply use the same method
M = getPerspectiveTransform(ptsOutPt2f, ptsInPt2f);
but with reversed param order (output as first argument, input as second).
Afterwards a simple crop will get rid of all the black.
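Sketched in OpenCV C++ (the wrapper function and variable names are assumptions; only getPerspectiveTransform and warpPerspective are actual OpenCV calls):

#include <opencv2/imgproc.hpp>
#include <vector>

// Assumed wrapper; ptsInPt2f / ptsOutPt2f are the corner correspondences
// stored away from the linked warpMatrix() code.
cv::Mat unwarp(const cv::Mat& warped,
               const std::vector<cv::Point2f>& ptsInPt2f,
               const std::vector<cv::Point2f>& ptsOutPt2f,
               const cv::Size& originalSize)
{
    // Same call as the forward warp, but with the point sets swapped,
    // which yields the inverse homography.
    cv::Mat M = cv::getPerspectiveTransform(ptsOutPt2f, ptsInPt2f);
    cv::Mat restored;
    cv::warpPerspective(warped, restored, M, originalSize);
    return restored;
}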
Is there a way to perform convolution of two matrices (image and mask) by breaking up the mask into 2 smaller chunks and combining the result of the 2 convolutions to get the original single mask convolution result?
Yes, due to linearity of convolution, you can break things up as:
I * M = I * (M1 + M2) = I * M1 + I * M2
where M is your original mask and M1 and M2 are the two smaller chunks.
For example, M could be
M = [ 1 1 2
2 1 3
2 1 8 ]
and
M1 = [ 0 0 0
0 1 3
0 1 8 ]
M2 = [ 1 1 2
2 0 0
2 0 0 ]
Just be careful that if you do this, and you want to represent M1 as the smaller matrix,
M1 = [ 1 3
1 8 ]
that you'll have to align them properly before you add them back together.
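A small C++ sketch of the linearity check; the 5x5 toy image is an arbitrary assumption, M1 and M2 are kept at the full zero-padded 3x3 size so no re-alignment is needed, and a correlation-style loop is used (which is fine here, since M1 + M2 = M holds elementwise whether or not the kernel is flipped):

#include <cstdio>

int main() {
    // Toy 5x5 image (arbitrary values) and the masks from above.
    int I[5][5];
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 5; ++j)
            I[i][j] = i + 2 * j;

    const int M [3][3] = {{1,1,2},{2,1,3},{2,1,8}};
    const int M1[3][3] = {{0,0,0},{0,1,3},{0,1,8}};
    const int M2[3][3] = {{1,1,2},{2,0,0},{2,0,0}};

    // "Valid" correlation (no padding): each output position compares
    // I*M against I*M1 + I*M2 -- the two must be identical.
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            int full = 0, part1 = 0, part2 = 0;
            for (int u = 0; u < 3; ++u)
                for (int v = 0; v < 3; ++v) {
                    full  += I[i + u][j + v] * M [u][v];
                    part1 += I[i + u][j + v] * M1[u][v];
                    part2 += I[i + u][j + v] * M2[u][v];
                }
            std::printf("%d == %d  ", full, part1 + part2);
        }
        std::printf("\n");
    }
    return 0;
}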
I found this example of how to transform a unit cube into a frustum (truncated pyramid) via a non-affine transformation. I need a matrix that can be pushed onto my matrix stack and does the transform for me. How can this calculation
x' = (M11•x + M21•y + M31•z + OffsetX) ÷ (M14•x + M24•y + M34•z + M44)
y' = (M12•x + M22•y + M32•z + OffsetY) ÷ (M14•x + M24•y + M34•z + M44)
z' = (M13•x + M23•y + M33•z + OffsetZ) ÷ (M14•x + M24•y + M34•z + M44)
be expressed in a single matrix? Is that possible?
For now I am using an inverse projection matrix to transform a unit cube into a frustum, but I have to divide every 3D point by w whenever I want to pick something.
The homogeneous matrix representing those equations is simply
    [ M11 M12 M13 M14 ]          [ 1 0 0 0 ]
M = [ M21 M22 M23 M24 ]  ,  M0 = [ 0 1 0 0 ]
    [ M31 M32 M33 M34 ]          [ 0 0 1 0 ]
    [ M41 M42 M43 M44 ]          [ 0 0 1 0 ]
You can simply multiply your model data D of the cube with it to get the truncated pyramid, as well as continue stacking it with other matrices, such as camera and projection:
((M * D) * V) * P;
There's no need to worry about the division by w; working with 4x4 homogeneous matrices postpones it to the final stages of the rasterizer.
M0 here is the simplest projection matrix; however, to use it you must first translate your cube along the z-axis further away from the camera, multiply by M0, and then translate it back to its origin. Define a translation matrix T.
    [ 1 0 0 0 ]
T = [ 0 1 0 0 ]
    [ 0 0 1 4 ]
    [ 0 0 0 1 ]
Then (D * T * M0 * T^-1), where T^-1 is the reverse translation, is a truncated pyramid that just went through a perspective transform as if its center were 4 units away from the origin.
(Disclaimer: in OpenGL, m43 is most likely -1.)
To calculate the matrix, it would be wise to choose a math library that is already implemented. The matrix is usually composed as projection_matrix * view_matrix * world_transform_matrix. All three matrices can be created using a library such as GLM, and the usage would be:
glm::perspective(..args...) * glm::lookAt(..args..) * object_transformation
In your case you could ignore lookAt and object_transformation and use only the projection to view the cube.
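For example, a minimal GLM sketch along those lines (all parameter values are placeholder assumptions):

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 buildMvp()
{
    glm::mat4 projection = glm::perspective(glm::radians(60.0f), // vertical FOV
                                            16.0f / 9.0f,        // aspect ratio
                                            0.1f, 100.0f);       // near, far
    glm::mat4 view = glm::lookAt(glm::vec3(0.0f, 0.0f, 5.0f),    // eye
                                 glm::vec3(0.0f, 0.0f, 0.0f),    // center
                                 glm::vec3(0.0f, 1.0f, 0.0f));   // up
    glm::mat4 object_transformation = glm::mat4(1.0f);           // identity
    return projection * view * object_transformation;
}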
I'd like to be able to calculate the 'mean brightest point' in a line of pixels. It's for a primitive 3D scanner.
For testing, I simply stepped through the pixels, and if the current pixel was brighter than the one before, the brightest point of that line was set to the current pixel. This of course gives very jittery results throughout the image(s).
I'd like to get the 'average center of the brightness' instead, if that makes sense.
This has to be a common thing; I'm simply lacking the right words for a Google search.
Calculate the intensity-weighted average of the offset.
Given your example's intensities (guessed) and offsets:
0 0 0 0 1 3 2 3 1 0 0 0 0 0
1 2 3 4 5 6 7 8 9 10 11 12 13 14
this would give you (1*5 + 3*6 + 2*7 + 3*8 + 1*9) / (1+3+2+3+1) = 7
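In code, the weighted centroid of one row could look like this C++ sketch (the intensity values are the guessed ones from above; offsets are 1-based to match):

#include <cstdio>

int main() {
    // Guessed intensities from the example; offset i+1 corresponds to
    // intensity[i].
    const int intensity[14] = {0,0,0,0,1,3,2,3,1,0,0,0,0,0};

    double weightedSum = 0.0, totalWeight = 0.0;
    for (int i = 0; i < 14; ++i) {
        const int offset = i + 1;               // offsets are 1-based above
        weightedSum += offset * intensity[i];
        totalWeight += intensity[i];
    }
    if (totalWeight > 0.0)
        std::printf("center = %.2f\n", weightedSum / totalWeight); // 7.00
    return 0;
}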
You're looking for 1D convolution, which takes a filter with which you "convolve" the image. For example, you can use a median filter (borrowing an example from Wikipedia):
x = [2 80 6 3]
y[1] = Median[2 2 80] = 2
y[2] = Median[2 80 6] = Median[2 6 80] = 6
y[3] = Median[80 6 3] = Median[3 6 80] = 6
y[4] = Median[6 3 3] = Median[3 3 6] = 3
so
y = [2 6 6 3]
So here, the window size is 3, since you're looking at 3 pixels at a time and replacing the pixel at the center of the window with the median. A window of 3 means we look at one pixel before and one pixel after the pixel we're currently evaluating; 5 would mean 2 pixels before and after, etc.
For a mean filter, you do the same thing, except you replace the center pixel with the average of all the values in the window, i.e.
x = [2 80 6 3]
y[1] = Mean[2 2 80] = 28
y[2] = Mean[2 80 6] = 29.33
y[3] = Mean[80 6 3] = 29.667
y[4] = Mean[6 3 3] = 4
so
y = [28 29.33 29.667 4]
So for your problem, y[3] is the "mean brightest point".
Note how the borders are handled for y[1] (no pixels before it) and y[4] (no pixels after it): this example "replicates" the pixel near the border. Therefore, we generally "pad" an image with replicated or constant borders, convolve the image, and then remove those borders.
This is a standard operation which you'll find in many computational packages.
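A minimal C++17 sketch of such a mean filter with replicated borders (window size 3, reproducing the numbers above):

#include <algorithm>
#include <cstdio>
#include <vector>

// 1D mean filter with replicated borders; w must be odd.
std::vector<double> meanFilter(const std::vector<double>& x, int w)
{
    const int n = static_cast<int>(x.size()), r = w / 2;
    std::vector<double> y(n);
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = -r; k <= r; ++k)
            sum += x[std::clamp(i + k, 0, n - 1)]; // replicate at borders
        y[i] = sum / w;
    }
    return y;
}

int main() {
    const auto y = meanFilter({2.0, 80.0, 6.0, 3.0}, 3);
    for (double v : y) std::printf("%.3f ", v); // 28.000 29.333 29.667 4.000
    std::printf("\n");
    return 0;
}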
Your problem is like the longest-sequence problem. Once you are able to determine a sequence (its starting point and length), all that remains is finding the median, which is the central element.
To find the sequence, a definition of bright and dark has to be present: either relative (compared to the previous value or a couple of previous values) or absolute (a fixed threshold).