Convention of faces in OpenGL cubemapping - opengl

What is the convention OpenGL follows for cubemaps?
I followed this convention (found on a website) and used the corresponding GLenum values (GL_TEXTURE_CUBE_MAP_POSITIVE_X_EXT, etc.) to specify the 6 faces:
         _______
        |       |
        | pos y |
 _______|_______|________________
|       |       |       |       |
| neg x | pos z | pos x | neg z |
|_______|_______|_______|_______|
        |       |
        | neg y |
        |_______|

but I always get the wrong Y, so I have to swap the Positive Y and Negative Y faces. Why?
Ah, yes, this is one of the oddest things about cube maps. Rest assured, you're not the only one to fall for it. You see:
Cube Maps have been specified to follow the RenderMan specification (for whatever reason), and RenderMan assumes the images' origin being in the upper left, contrary to the usual OpenGL behaviour of having the image origin in the lower left. That's why things get swapped in the Y direction. It totally breaks with the usual OpenGL semantics and doesn't make sense at all. But now we're stuck with it.
Take note that "upper left" vs. "lower left" are defined in the context of the identity transformation from model space to NDC space.
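To make this concrete, here is a minimal upload sketch of my own (not from the question; pixels[i] and size are assumed to come from your image loader). The only point is that, because of the RenderMan convention, every face image has to be supplied with its top row first:

// Upload the six cube map faces.  Unlike ordinary 2D textures, each face
// image must be stored with its origin in the upper left, i.e. the first
// row of pixels[i] is the TOP row of that face.
static const GLenum faces[6] = {
    GL_TEXTURE_CUBE_MAP_POSITIVE_X, GL_TEXTURE_CUBE_MAP_NEGATIVE_X,
    GL_TEXTURE_CUBE_MAP_POSITIVE_Y, GL_TEXTURE_CUBE_MAP_NEGATIVE_Y,
    GL_TEXTURE_CUBE_MAP_POSITIVE_Z, GL_TEXTURE_CUBE_MAP_NEGATIVE_Z
};

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_CUBE_MAP, tex);
for (int i = 0; i < 6; ++i) {
    glTexImage2D(faces[i], 0, GL_RGBA8, size, size, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixels[i]);   // top row first
}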


How do I invert two axes of a quaternion

I have to convert poses (coordinates + a quaternion for the rotation) between two different APIs I'm using. More specifically, I get coordinates of objects relative to the camera's local position.
My detection library (for detecting those objects) has the coordinate system of the camera oriented with Z in the direction the camera is looking, X to the right of the camera, and Y down from the camera (if you look from the perspective of the camera itself). I will use ASCII art here to show what I mean:
Symbols:
+------+
|      | = camera from the back
+------+

+--+
|  +-+
|    | = camera from the right side (imagine the front part as the lens)
|  +-+
+--+

Detection Coordinate System from the back of the camera
+--------> x
|
|   +------+
|   |      |
V y +------+

Detection Coordinate System from the right side of the camera
+--------> z
|   +--+
|   |  +-+
|   |    |
V y |  +-+
    +--+
The library where I use the object poses, however, has X in the same direction, but Y and Z both inverted. So Z points opposite to the looking direction of the camera and Y points straight up. More ASCII sketches:
Usage Coordinate System from the back of the camera
^ y   +------+
|     |      |
|     +------+
|
+--------> x

Usage Coordinate System from the right side of the camera
+--+
|  +-+     ^ y
|    |     |
|  +-+     |
+--+       |
z <--------+
So now I get object poses (including rotation) in the detection coordinate system but want to use them in the usage coordinate system. I know I can transform the coordinates by just inverting the values for y and z, but how do I convert the quaternions for the rotation? I tried a few combinations but none seem to work.
In this case your change of basis is just a permutation of the axes, so to convert from one to the other you just have to replicate the same permutation in the imaginary (vector) part of the quaternion.
I.e. if your quaternion is (w,x,y,z) and the basis permutation is (z,y,x), your new quaternion is (w,z,y,x).
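Applied to the actual change of basis in the question (y and z are negated rather than permuted, i.e. a 180-degree rotation about x), the same idea reduces to negating the y and z components of the quaternion. A minimal sketch with plain structs (my own, not tied to either API):

struct Vec3 { double x, y, z; };
struct Quat { double w, x, y, z; };

// Detection frame: x right, y down, z forward.
// Usage frame:     x right, y up,   z backward.
// The change of basis R = diag(1, -1, -1) is a 180-degree rotation about x,
// so positions flip y and z, and the rotation becomes r * q * conj(r),
// which works out to negating the quaternion's y and z components.
Vec3 convertPosition(const Vec3& p) { return { p.x, -p.y, -p.z }; }
Quat convertRotation(const Quat& q) { return { q.w, q.x, -q.y, -q.z }; }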

3D Reconstruction: Solving Equations for 3D Points from Uncalibrated Images

This is a pretty straightforward question (I hope). The following is from 3D reconstruction from Multiple Images, Moons et al (Fig 2-13, p. 348):
Projective 3D reconstruction from two uncalibrated images
Given: A set of point correspondences m1 in I1 and m2 in I2 between two uncalibrated images I1 and I2 of a static scene.
Aim: A projective 3D reconstruction ^M of the scene.
Algorithm:
Compute an estimate ^F for the fundamental matrix
Compute the epipole e2 from ^F
Compute the 3x3-matrix
^A = -(1/||e2||^2) [e2]x ^F
For each pair of corresponding image points m1 and m2, solve the following system of linear equations for ^M :
^p1 m1 = ^M and ^p2 m2 = ^A ^M + e2
( ^p1 and ^p2 are non-zero scalars )
[I apologize for the formatting. I don't know how to put hats over characters.]
I'm pretty much OK up until step 4. But it's been 30+ years since my last linear algebra class, and even then I'm not sure I knew how to solve something like this. Any help or references would be greatly appreciated.
By the way, this is sort of a follow-on to another post of mine:
Detecting/correcting Photo Warping via Point Correspondences
This is just another way to try to solve the problem.
Given a pair of matching image points m1 and m2, the two corresponding rays from the optical centers are unlikely to intersect perfectly due to noise in the measurements. Consequently a solution to the provided system should instead be found in the (linear) least square sense i.e. find x = argmin_x | C x - d |^2 with (for instance):
        / I  -m1    0 \        / M  \
C x  =  |             |   x =  | p1 |
        \ A   0   -m2 /        \ p2 /

and

      /  0  \
d  =  |     |
      \ -e2 /
The problem has 5 unknowns for 6 equations.
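For illustration, here is a sketch of that first formulation using Eigen (my own code, not from the book; m1 and m2 are the homogeneous image points, A and e2 as defined in the algorithm):

#include <Eigen/Dense>

// Solve  p1*m1 = M  and  p2*m2 = A*M + e2  for (M, p1, p2) in the
// least-squares sense, i.e. minimise |C x - d|^2 with the 6x5 system above.
Eigen::Vector3d triangulateProjective(const Eigen::Vector3d& m1,
                                      const Eigen::Vector3d& m2,
                                      const Eigen::Matrix3d& A,
                                      const Eigen::Vector3d& e2)
{
    Eigen::MatrixXd C = Eigen::MatrixXd::Zero(6, 5);
    C.block<3, 3>(0, 0) = Eigen::Matrix3d::Identity();   //  I * M
    C.block<3, 1>(0, 3) = -m1;                           // -p1 * m1
    C.block<3, 3>(3, 0) = A;                             //  A * M
    C.block<3, 1>(3, 4) = -m2;                           // -p2 * m2

    Eigen::VectorXd d = Eigen::VectorXd::Zero(6);
    d.tail<3>() = -e2;

    // Linear least-squares solution x = (M, p1, p2) via SVD.
    Eigen::VectorXd x =
        C.jacobiSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(d);
    return x.head<3>();   // the reconstructed (projective) point M
}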
A possible alternative formulation exploits the fact that m1 and m2 are collinear with M so m1 x M = 0 and m2 x (A M + e2) = 0 yielding the linear least squares problem x = argmin_x | C x - d |^2 with:
      / [m1]x   \
C  =  |         |    with  x = M,
      \ [m2]x A /

and

      /     0    \
d  =  |          |
      \ -m2 x e2 /
where [v]x is the 3 x 3 matrix of the cross product with v. The problem has 3 unknowns for 6 equations, which can be reduced to 4 by keeping only the linearly independent ones.
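And a corresponding sketch of the cross-product formulation (same assumptions as above; skew() is a small helper I'm introducing to build [v]x):

// Skew-symmetric matrix [v]x such that [v]x * w == v.cross(w).
Eigen::Matrix3d skew(const Eigen::Vector3d& v)
{
    Eigen::Matrix3d S;
    S <<    0.0, -v.z(),  v.y(),
         v.z(),    0.0, -v.x(),
        -v.y(),  v.x(),    0.0;
    return S;
}

// Minimise |C M - d|^2 with C = [ [m1]x ; [m2]x A ] and d = (0, -m2 x e2).
Eigen::Vector3d triangulateCross(const Eigen::Vector3d& m1,
                                 const Eigen::Vector3d& m2,
                                 const Eigen::Matrix3d& A,
                                 const Eigen::Vector3d& e2)
{
    Eigen::MatrixXd C(6, 3);
    C.topRows<3>()    = skew(m1);
    C.bottomRows<3>() = skew(m2) * A;

    Eigen::VectorXd d = Eigen::VectorXd::Zero(6);
    d.tail<3>() = -m2.cross(e2);

    return C.jacobiSvd(Eigen::ComputeThinU | Eigen::ComputeThinV).solve(d);
}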

Explanation of the Perspective Projection Matrix (Second row)

I'm trying to figure out how the perspective projection matrix works.
According to this: https://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml
f = cotangent(fovy/2)
Logically I understand how it works (x- and y-values move further away from the bounding box, or vice versa), but I need a mathematical explanation of why this works. Maybe because of the intercept theorem (theorem of intersecting lines)?
I found an explanation here: http://www.songho.ca/opengl/gl_projectionmatrix.html
But I don't understand the relevant part of it.
In my opinion, the explanation of the perspective projection matrix at songho.ca is the best one.
I'll try to retell the main idea, without going into details. But, first of all, let's clarify why the cotangent is used in OpenGL docs.
What is the cotangent? According to Wikipedia:
The cotangent of an angle is the ratio of the length of the adjacent side to the length of the opposite side.
Consider a side view of the view frustum: near is the length of the adjacent side and top is the length of the opposite side.
The angle fov is the angle between the top and bottom planes of the frustum, so the angle we are interested in, fov/2, is the angle between the top (or bottom) plane and the axis of symmetry.
So, the [1,1] element of the projection matrix, which is defined as cotangent(fovy/2) in the OpenGL docs, is equivalent to the ratio near/top.
Now take a point A in view space. Let's find the y' coordinate of the point A', the projection of A onto the near plane.
Using the ratio of similar triangles, the following relation can be inferred:
y' / near = y / -z
Or:
y' = near * y / -z
The y coordinate in normalized device coordinates can be obtained by dividing by the value top (the range (-top, top) is mapped to the range (-1.0,1.0)), so:
y_ndc = near / top * y / -z
The coefficient near / top is a constant, but what about z? There is one very important detail about normalized device coordinates.
The output of the vertex shader is a four-component vector that is transformed into a three-component vector by dividing the first three components by the fourth:
(x, y, z, w) -> (x/w, y/w, z/w)
So we can assign the value -z to the fourth component. That can be done by setting the element [2,3] of the projection matrix to -1.
Similar reasoning can be done for the x coordinate.
We have found the following elements of projection matrix:
| near / right       0      0   0 |
|       0       near / top  0   0 |
|       0            0      ?   ? |
|       0            0     -1   0 |
There are two elements that we haven't found yet; they are marked with '?'.
To make things clear, let's project an arbitrary point (x,y,z) to normalized device coordinates:
| near / right       0      0   0 |   | x |     | near / right * x |
|       0       near / top  0   0 | X | y |  =  | near / top * y   |
|       0            0      ?   ? |   | z |     |        ?         |
|       0            0     -1   0 |   | 1 |     |       -z         |
And finally, after dividing by the w component we will get:
| - near / right * x / z |
| - near / top * y / z   |
|            ?           |
Note that the result matches the equation inferred earlier.
As for the third component marked with '?': more complex reasoning is needed to find out how to calculate it. Refer to songho.ca for more information.
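For completeness, here is a small sketch of my own (not from the OpenGL docs) that fills in the whole matrix in column-major order, including the two '?' elements, which map z into the [-1, 1] depth range:

#include <cmath>

// gluPerspective-style projection matrix, written into a column-major
// float[16] as OpenGL expects.  fovy is in radians here (gluPerspective
// itself takes degrees).
void perspective(float fovy, float aspect, float zNear, float zFar, float m[16])
{
    const float f = 1.0f / std::tan(fovy / 2.0f);    // cotangent(fovy/2) = near/top

    for (int i = 0; i < 16; ++i) m[i] = 0.0f;

    m[0]  = f / aspect;                              // near / right
    m[5]  = f;                                       // near / top
    m[10] = (zFar + zNear) / (zNear - zFar);         // the first '?' element
    m[14] = (2.0f * zFar * zNear) / (zNear - zFar);  // the second '?' element
    m[11] = -1.0f;                                   // puts -z into the w component
}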
I hope that my explanations make things a bit more clear.

How to interpret the VtkCamera viewTransformMatrix

I have an object at the origin and am moving the camera to (0,650,650) and setting the focal point to the origin i.e.:
vtkSmartPointer<vtkCamera> cam = vtkSmartPointer<vtkCamera>::New();
renderer->SetActiveCamera(cam);
cam->SetFocalPoint(0., 0., 0.);
cam->SetPosition(0., 650, 650);
cam->SetViewAngle(view_angle_);
cam->SetViewUp(0., 1., 0.);
However when I get the view transform matrix of the camera by:
vtkSmartPointer<vtkMatrix4x4> transform_view = cam->GetViewTransformMatrix();
And print it I get the following:
| 1 |    0    |    0     |    0     |
| 0 | cos(45) | -sin(45) |    0     |
| 0 | sin(45) |  cos(45) | -919.239 |
| 0 |    0    |    0     |    1     |
Where the rotation part seems correct (45 degrees around the x axis) but the translation seems all wrong. Should the last column not be:
|  0  |
| 650 |
| 650 |
Or am I doing something wrong?
It's an old question, but I'll give an answer for the record.
What you expect is the transform w_T_c, i.e. from the camera frame to the world frame.
What the GetViewTransformMatrix method returns is c_T_w, i.e. the transform from the world frame to the camera frame: given a point p_w in the world frame, its coordinates in the camera frame are p_c = c_T_w * p_w.
In your example, if you inverted your matrix, in the last column you would get the translation values you were looking for.
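In code, that inversion might look like this (a sketch continuing from the snippet in the question):

// GetViewTransformMatrix() returns c_T_w; inverting it gives w_T_c, whose
// last column holds the camera position in world coordinates.
vtkSmartPointer<vtkMatrix4x4> view = cam->GetViewTransformMatrix();
vtkSmartPointer<vtkMatrix4x4> camToWorld = vtkSmartPointer<vtkMatrix4x4>::New();
vtkMatrix4x4::Invert(view, camToWorld);

double tx = camToWorld->GetElement(0, 3);   // expect ~0
double ty = camToWorld->GetElement(1, 3);   // expect ~650
double tz = camToWorld->GetElement(2, 3);   // expect ~650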
The focal point is not the same as the "look at" point. The focal point may be in front of or behind the camera. It's the point through which all of the rays of your scene will pass to give your view perspective. Those rays are projected onto the view plane, which is what is rendered.
If you want to look at the origin, you need to set your View Plane Normal vector to be a normalized vector pointing from your camera location to the origin. So, if your camera is at location L, the View Plane Normal vector should be -L/||L||, where || || is the L2 norm.

Exactly in the Middle between Samples with GL_NEAREST Filtering: What Does It Return?

What value is going to be returned if 2D texture with GL_NEAREST mag and min filtering (and no mipmapping, i.e. there exists only 1 level) is sampled exactly in the middle between 4 texels?
Unfortunately, there is not a single note about that in the official documentation.
Update:
Actually, the whole cross looks pretty ambiguous to me. In the following figure I've marked the ambiguous points (lines) with x, and the 4 texels are denoted by o:
o---x---o
|   x   |
x x x x x
|   x   |
o---x---o
So which values are returned if the texture is sampled at x's?
If I'm reading Section 3.8.8 (page 175) correctly it looks like floor() is used on both axes.
o4--xA--o3
|   x    |
x x x x xB
|   x    |
o1--x---o2
So all the x's would sample o1, except for xA and xB, which would sample o4 and o2 respectively.
I read the sections of the spec in the selected answer and it seems that the selected answer is wrong.
According to the spec, when GL_NEAREST is used, the texture value returned for the specified (s,t,r) is that of the texel with i = floor(u), j = floor(v).
Note that the texel (i,j) is the one centered at coordinates (u,v) = (i+0.5, j+0.5). See fig. 3.10 on page 160.
This boils down to the simple rule that, given a sample point, the value of the closest texel is used. In the case where we sample exactly between 2 texel centers, the one on the right or top is used. For example, if we wanted to sample at u=1 (which is equidistant from the centers of texels 0 and 1), we would select i = floor(1) = 1, which is the texel to the right; its center is at u=1.5.
So this is what would happen:
o4----x3----o3
|     x3     |
x4 x4 x3 x3 x3
|     x2     |
o1----x2----o2
I also performed a test using OpenGL ES, by sampling a color-coded texture at predefined s,t coordinates from within the fragment shader and it behaved as expected.
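The rule from the spec is also easy to reproduce on the CPU; here is a small sketch of my own (one axis only, with a simple clamp at the borders instead of a real wrap mode):

#include <cmath>

// GL_NEAREST lookup along one axis as described above: u = s * width,
// i = floor(u).  A sample exactly between two texel centers has an integer
// u, and floor(u) picks the texel on the right (or top, for the t axis).
int nearestTexel(float s, int width)
{
    float u = s * width;
    int i = static_cast<int>(std::floor(u));
    if (i < 0)      i = 0;          // simple clamp at the borders (assumption)
    if (i >= width) i = width - 1;
    return i;
}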