How to interpret the vtkCamera ViewTransformMatrix - C++

I have an object at the origin and am moving the camera to (0,650,650) and setting the focal point to the origin i.e.:
vtkSmartPointer<vtkCamera> cam = vtkSmartPointer<vtkCamera>::New();
renderer->SetActiveCamera(cam);
cam->SetFocalPoint(0., 0., 0.);
cam->SetPosition(0., 650, 650);
cam->SetViewAngle(view_angle_);
cam->SetViewUp(0., 1., 0.);
However when I get the view transform matrix of the camera by:
vtkSmartPointer<vtkMatrix4x4> transform_view = cam->GetViewTransformMatrix();
And print it I get the following:
| 1 | 0 | 0 | 0 |
| 0 | cos(45) | -sin(45)| 0 |
| 0 | sin(45) | cos(45) | -919.239 |
| 0 | 0 | 0 | 1 |
Where the rotation part seems correct (45 degrees around the x axis) but the translation seems all wrong. Should the last column not be:
| 0 |
|650|
|650|
Or am I doing something wrong?

It's an old question, but I'll give an answer for the record.
What you expect is the transform w_T_c, i.e. from the camera frame to the world frame.
What the GetViewTransformMatrix method returns is c_T_w, i.e. the transform from the world frame to the camera frame: given a point p_w in the world frame, its coordinates in the camera frame are p_c = c_T_w * p_w.
In your example, if you inverted your matrix, the last column would contain the translation values you were looking for.
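For example, a minimal sketch (reusing the cam variable from the question) that inverts the view transform with vtkMatrix4x4::Invert to recover the camera pose in the world frame:
// Sketch: invert the view transform (c_T_w) to get the camera-to-world matrix (w_T_c).
vtkSmartPointer<vtkMatrix4x4> view = cam->GetViewTransformMatrix();
vtkSmartPointer<vtkMatrix4x4> cam_to_world = vtkSmartPointer<vtkMatrix4x4>::New();
vtkMatrix4x4::Invert(view, cam_to_world);
// The last column of cam_to_world now holds the camera position, i.e. (0, 650, 650).
double tx = cam_to_world->GetElement(0, 3);
double ty = cam_to_world->GetElement(1, 3);
double tz = cam_to_world->GetElement(2, 3);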

The focal point is not the same as the "look at" point. The focal point may be in front of or behind the camera. It's the point through which all of the rays of your scene will pass to give your view perspective. Those rays are projected onto the view plane, which is what is rendered.
If you want to look at the origin, you need to set your view plane normal to a normalized vector pointing from your camera location to the origin. So, if your camera is at location L, the view plane normal should be -L/||L||, where || || is the L2 norm.
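For concreteness, a tiny sketch of that vector using the camera position from the question (plain C++, nothing VTK-specific; the variable names are arbitrary):
#include <cmath>
// Sketch: normal pointing from the camera location L toward the origin, i.e. -L / ||L||.
double L[3] = {0.0, 650.0, 650.0};                      // camera location
double len = std::sqrt(L[0]*L[0] + L[1]*L[1] + L[2]*L[2]);
double normal[3] = {-L[0]/len, -L[1]/len, -L[2]/len};   // roughly (0, -0.707, -0.707)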

Related

How do I invert two axes of a quaternion

I have to convert poses (coordinates + quaternion for rotation) between two different APIs I'm using. More specifically, I get the coordinates of objects relative to the camera's local position.
My detection library (for detecting those objects) has the coordinate system of the camera oriented with Z in the direction the camera is looking, X to the right of the camera, and Y down from the camera (if you look from the perspective of the camera itself). I will use ASCII art here to show what I mean:
Symbols:
+------+
|      |   = camera from the back
+------+

+--+
|  +-+
|    |     = camera from the right side (imagine the front part as the lens)
|  +-+
+--+

Detection Coordinate System from the back of the camera
+--------> x
|
|   +------+
|   |      |
V y +------+

Detection Coordinate System from the right side of the camera
+--------> z
|   +--+
|   |  +-+
|   |    |
V y |  +-+
    +--+
The library where I use the object poses however has X in the same direction, but Y and Z are both inverted. So Z is pointing opposite the looking direction of the camera and Y is pointing straight up. More ASCII sketches:
Usage Coordinate System from the back of the camera
^ y   +------+
|     |      |
|     +------+
|
+--------> x

Usage Coordinate System from the right side of the camera
+--+
|  +-+        ^ y
|    |        |
|  +-+        |
+--+          |
z <-----------+
So now I get object poses (including rotation) in the detection coordinate system but want to use them in the usage coordinate system. I know I can transform the coordinates by just inverting the values for y and z, but how do I convert the quaternions for the rotation? I tried a few combinations but none seem to work.
In this case your change of basis is just a remapping of the axes, so to convert from one frame to the other you just have to replicate the same remapping in the imaginary (vector) part of the quaternion.
I.e. if your quaternion is (w, x, y, z) and the basis permutation is (z, y, x), your new quaternion is (w, z, y, x).
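For the specific case in the question (y and z both flipped, which is a proper rotation of the basis), the same sign change is applied to the vector part while w stays unchanged. A minimal C++ sketch of that idea; the Pose struct and field names are made up for illustration:
// Sketch: convert a pose from the detection frame to the usage frame,
// where the usage frame negates y and z.
struct Pose {
    double px, py, pz;      // position
    double qw, qx, qy, qz;  // rotation quaternion
};

Pose detectionToUsage(const Pose& d) {
    Pose u;
    u.px = d.px;  u.py = -d.py;  u.pz = -d.pz;                 // flip y and z of the position
    u.qw = d.qw;  u.qx = d.qx;   u.qy = -d.qy;  u.qz = -d.qz;  // same flip on the quaternion's (x, y, z)
    return u;
}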

C++ Rotating Cube in Coordinates (non-draw)

I've been looking for this for quite a long time without any results, and have been trying to figure out the math for it myself for about a week.
My goal is to set my cursor position(s) so that they trace out a rotating cube, much like an OpenGL rotating cube border box would.
Since OpenGL has a rotate function built in, it's not really something I can adapt from.
I just wonder if anyone has any ideas how I'd go about this.
If you're wondering what the point of this is: on each created frame (cube rotation step), a function erases anything drawn in MS Paint and then the next positions begin drawing, basically creating a spinning cube being drawn.
If you want to rotate a cube in C++ without the help of any specialized library, you should use matrix operations to transform the coordinates.
You should build a rotation matrix (let's call it M).
You then multiply M by your coordinate vector; the result is the new coordinates.
For a 2D rotation, for example (f is the rotation angle; swapping the signs of the two sine terms reverses the rotation direction):
| cos f  -sin f | |x|   |x'|
| sin f   cos f | |y| = |y'|
For a 3D rotation, you should use a 3x3 matrix. You also need to choose the rotation axis; depending on it you pick the matrix M:
Mx (rotate around the x axis):
| 1    0       0     | |x|   |x'|
| 0    cos f  -sin f | |y| = |y'|
| 0    sin f   cos f | |z|   |z'|
My (rotate around the y axis):
|  cos f  0  sin f | |x|   |x'|
|    0    1    0   | |y| = |y'|
| -sin f  0  cos f | |z|   |z'|
Mz (rotate around the z axis):
| cos f  -sin f  0 | |x|   |x'|
| sin f   cos f  0 | |y| = |y'|
|   0       0    1 | |z|   |z'|
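For example, a rough standalone sketch (standard C++ only, no OpenGL) that applies the My matrix above to the eight cube corners and turns them into 2D positions you could move the cursor to; the screen centre and scale values are arbitrary placeholders:
#include <cmath>
#include <cstdio>

// Sketch: rotate the 8 corners of a unit cube around the y axis by angle f,
// then drop the z coordinate (simple orthographic view) to get 2D positions.
int main() {
    double corners[8][3] = {
        {-1,-1,-1}, { 1,-1,-1}, { 1, 1,-1}, {-1, 1,-1},
        {-1,-1, 1}, { 1,-1, 1}, { 1, 1, 1}, {-1, 1, 1}
    };
    double f = 0.3;                  // rotation angle in radians (arbitrary)
    double cx = 400.0, cy = 300.0;   // hypothetical screen centre
    double scale = 100.0;            // hypothetical pixels per unit

    for (int i = 0; i < 8; ++i) {
        double x = corners[i][0], y = corners[i][1], z = corners[i][2];
        // My (rotation around the y axis), exactly as in the matrices above
        double xr =  std::cos(f) * x + std::sin(f) * z;
        double yr =  y;
        std::printf("corner %d -> screen (%.1f, %.1f)\n", i, cx + scale * xr, cy - scale * yr);
    }
    return 0;
}
Increasing f a little on every frame and redrawing the projected corners gives the spinning effect.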

Why does graphics pipeline need mapping to clip coordinates and normalized device coordinates?

On perspective projection, if I use simple projection matrix like:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 1/near 0
, which just projects onto the image plane, then the projected coordinates can easily be obtained from the view-space coordinates by normalizing and discarding the extra component, I think.
With an orthographic projection, a projection matrix is not even needed.
But the OpenGL graphics pipeline still goes through the process above, even though the perspective projection causes depth precision problems.
Why does it need mapping to clip coordinates and normalized device coordinates?
Added
If I use the above projection matrix,
    | 1  0  0    0 |
p = | 0  1  0    0 |
    | 0  0  1    0 |
    | 0  0  1/n  0 |
v_eye = (x y z 1)
v_clip = p * v_eye = (x y z z/n)
v_ndc = v_clip / v_clip.w = (nx/z ny/z n 1)
Then, v_ndc can be clipped by discarding values over top, bottom, left, right.
Values over far also can be clipped in the same way before multiplying the projection matrix.
Well, it may sound silly, but I think this is easier than the standard approach.
P.S. I noticed that the depth buffer can't be written this way. Couldn't it be written before the projection, then?
Sorry for silly question and gibberish...
In the case of orthographic projections, you are right: the perspective divide is not required, but it does not introduce any error, since it is a division by 1. (An orthographic projection matrix always contains [0, 0, 0, 1] in the last row.)
For perspective projection, this is a bit more complex:
Let's look at the simplest perspective projection:
    | 1  0  0  0 |
P = | 0  1  0  0 |
    | 0  0  1  0 |
    | 0  0  1  0 |
Then a vector v=[x,y,z,1] (in view space) gets projected to
v_p = P * v = [x, y, z, z],
which is in projective space.
Now the perspective divide is needed to get the perspective effect (objects closer to the viewer look larger):
v_ndc = v_p / v_p.w = [x/z, y/z, 1, 1]
I don't see how this could be achieved without the perspective divide.
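For illustration, a tiny C++ sketch of that projection and divide, using the simple P above and an arbitrary sample point:
#include <cstdio>

// Sketch: apply the simple projection matrix P from above to a view-space
// point and perform the perspective divide by w (which equals z here).
int main() {
    double v[4]     = {2.0, 1.0, 4.0, 1.0};        // view-space point (x, y, z, 1)
    double v_p[4]   = {v[0], v[1], v[2], v[2]};    // P * v = [x, y, z, z]
    double v_ndc[3] = {v_p[0] / v_p[3],            // x/z = 0.5
                       v_p[1] / v_p[3],            // y/z = 0.25
                       v_p[2] / v_p[3]};           // z/z = 1
    std::printf("%.2f %.2f %.2f\n", v_ndc[0], v_ndc[1], v_ndc[2]);
    return 0;
}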
Why does it need mapping to clip coordinates and normalized device coordinates?
The space where the programmer leaves the vertices to the GL to be taken care of is the clip space. It's the 4D homogeneous space where the vertices exist before normalization / perspective division. This division, useful to perform perspective projection, is the mapping needed to transform the vertices from clip space to NDC (3D). Why? Similar triangles.
View Space Point
*
/ |
Proj /- |
Y ^ Plane /-- |
| /-- |
| *-- |y
| /-- | |
| /-- |y' |
| /--- | |
<-----+------------+------------+-------
Z O |
|-----d------| |
|------------z------------|
Perspective projection is where rays from the eye/origin cut through a projection plane, hitting the points present in the space. The point where the ray intersects the plane is the projection of the point hit. Let's say we want to project point P onto the projection plane, where all points have z = d. The projected location of P, i.e. P', needs to be found. We know that z' will be d (since the projection plane lies there). To find y', we know
y ⁄ z = y' ⁄ z' (similar triangles)
y ⁄ z = y' ⁄ d (z' = d by defn. of proj. plane)
y' = (d * y) ⁄ z
This division by z is called the perspective division. It shows that in perspective projection objects farther away, with larger z, appear smaller, and objects closer, with smaller z, appear larger.
Another thing which is convenient to perform in clip space is, obviously, clipping. In 4D, clipping is just checking whether the points lie within a range, as opposed to performing the costlier division first.
In the case of orthographic projection, the projection volume isn't a frustum but a cuboid; parallel rays come from infinity and not the origin. Hence for a point P = (x, y, z), the Z value is just dropped, giving P' = (x, y). Thus the perspective division does nothing (divides by 1) in this case.
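As a sketch of that range check, assuming the usual OpenGL convention that a clip-space point is inside the view volume when -w <= x, y, z <= w:
// Sketch: trivial inside-the-view-volume test on a clip-space point,
// done before the divide, so no division is needed.
bool insideClipVolume(double x, double y, double z, double w) {
    return -w <= x && x <= w &&
           -w <= y && y <= w &&
           -w <= z && z <= w;
}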

Surface normal on depth image

How can I estimate the surface normal of a point I(i,j) in a depth image (pixel values in mm) without using the Point Cloud Library (PCL)? I've gone through (1), (2), and (3), but I'm looking for a simple estimation of the surface normal at each pixel using the C++ standard library or OpenCV.
You need to know the camera's intrinsic parameters, so that you can also know the distance between pixels in the same units (mm). This distance between pixels is obviously only valid at a certain distance from the camera (e.g. at the depth of the center pixel).
If the camera matrix is K, which is typically something like:
    | f  0  cx |
K = | 0  f  cy |
    | 0  0  1  |
then, taking pixel coordinates (x, y), a ray from the camera origin through that pixel (in camera coordinate space) is defined by:
             | x |
P = inv(K) * | y |
             | 1 |
Depending on whether the distance in your image is a projection onto the Z axis or a Euclidean distance from the camera center, you need to either normalize the vector P so that its magnitude is the distance to the pixel you want, or make sure the z component of P is this distance. For pixels around the center of the frame these are close to identical.
If you do the same operation to nearby pixels (say, left and right) you get Pl and Pr in units of mm
Then just find the norm of (Pl-Pr) which is twice the distance between adjacent pixels in mm.
Then, you calculate the gradients in X and Y:
gx = (P(i+1,j) - P(i-1,j)) / (2 * pixel_size)
gy = (P(i,j+1) - P(i,j-1)) / (2 * pixel_size)
Then, take the two gradients as direction vectors:
ax = atan(gx), ay=atan(gy)
     |  cos ax  0  sin ax |   | 1 |
dx = |    0     1    0    | * | 0 |
     | -sin ax  0  cos ax |   | 0 |
     | 1     0       0    |   | 0 |
dy = | 0  cos ay  -sin ay | * | 1 |
     | 0  sin ay   cos ay |   | 0 |
N = cross(dx,dy);
You may need to check whether the signs make sense, by looking at a certain gradient and seeing whether dx and dy point in the expected direction. You may need to negate none, one, or both of the angles, and the same goes for the N vector.
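To make that concrete, here is a rough sketch of a slightly simplified variant of the same idea using OpenCV: back-project the four neighbouring pixels with the intrinsics and take the cross product of the two tangent vectors directly, instead of going through the angles and rotation matrices above. The intrinsics fx, fy, cx, cy are assumed to be known, and the depth image is assumed to be a CV_64F cv::Mat holding Z depth in mm:
#include <opencv2/core.hpp>

// Sketch: back-project pixel (i, j) to a 3D point in mm using the pinhole model.
cv::Vec3d backproject(const cv::Mat& depth, int i, int j,
                      double fx, double fy, double cx, double cy) {
    double z = depth.at<double>(i, j);           // depth along the Z axis, in mm
    return cv::Vec3d((j - cx) * z / fx,          // X
                     (i - cy) * z / fy,          // Y
                     z);                         // Z
}

// Sketch: surface normal at (i, j) from the cross product of the two tangent vectors.
cv::Vec3d normalAt(const cv::Mat& depth, int i, int j,
                   double fx, double fy, double cx, double cy) {
    cv::Vec3d pl = backproject(depth, i, j - 1, fx, fy, cx, cy);
    cv::Vec3d pr = backproject(depth, i, j + 1, fx, fy, cx, cy);
    cv::Vec3d pu = backproject(depth, i - 1, j, fx, fy, cx, cy);
    cv::Vec3d pd = backproject(depth, i + 1, j, fx, fy, cx, cy);
    cv::Vec3d n = (pr - pl).cross(pd - pu);      // tangent in x crossed with tangent in y
    return cv::normalize(n);                     // flip the sign if it points away from the camera
}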

Explanation of the Perspective Projection Matrix (Second row)

I'm trying to figure out how the perspective projection matrix works.
According to this: https://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml
f = cotangent(fovy/2)
Logically I understand how it works (x and y values move further away from the bounding box, or vice versa), but I need a mathematical explanation of why this works. Maybe because of the intercept theorem (intersecting lines / similar triangles)?
I found an explanation here: http://www.songho.ca/opengl/gl_projectionmatrix.html
But I don't understand the relevant part of it.
In my opinion, the explanation of the perspective projection matrix at songho.ca is the best one.
I'll try to retell the main idea, without going into details. But, first of all, let's clarify why the cotangent is used in OpenGL docs.
What is the cotangent? According to Wikipedia:
The cotangent of an angle is the ratio of the length of the adjacent side to the length of the opposite side.
In the side view of the view frustum, near is the length of the adjacent side and top is the length of the opposite side.
The angle fovy/2 is the angle we are interested in.
The angle fovy is the angle between the top plane and the bottom plane; accordingly, fovy/2 is the angle between the top (or bottom) plane and the symmetry axis.
So the [1,1] element of the projection matrix, which is defined as cotangent(fovy/2) in the OpenGL docs, is equivalent to the ratio near/top.
Let's have a look at a point A in that side view, and find the y' coordinate of the point A', which is the projection of A onto the near plane.
Using the ratio of similar triangles, the following relation can be inferred:
y' / near = y / -z
Or:
y' = near * y / -z
The y coordinate in normalized device coordinates can be obtained by dividing by the value top (the range (-top, top) is mapped to the range (-1.0,1.0)), so:
yndc = near / top * y / -z
The coefficient near / top is a constant, but what about z? There is one very important detail about normalized device coordinates.
The output of the vertex shader is a four-component vector, which is transformed into a three-component vector in the interpolator by dividing the first three components by the fourth one:
(x_ndc, y_ndc, z_ndc) = (x_clip / w_clip, y_clip / w_clip, z_clip / w_clip)
So, we can assign to the fourth component the value of -z. It can be done by assigning to the element [2,3] of the projection matrix the value -1.
Similar reasoning can be done for the x coordinate.
We have found the following elements of the projection matrix:
| near / right 0 0 0 |
| 0 near / top 0 0 |
| 0 0 ? ? |
| 0 0 -1 0 |
There are two elements that we haven't found yet; they are marked with '?'.
To make things clear, let's project an arbitrary point (x, y, z) to normalized device coordinates:
| near / right 0 0 0 | | x |
| 0 near / top 0 0 | X | y | =
| 0 0 ? ? | | z |
| 0 0 -1 0 | | 1 |
| near / right * x |
= | near / top * y |
| ? |
| -z |
And finally, after dividing by the w component we will get:
| - near / right * x / z |
| - near / top * y / z |
| ? |
Note that the result matches the equation inferred earlier.
As for the third component marked with '?': more complex reasoning is needed to find out how to calculate it. Refer to songho.ca for more information.
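For reference, a small sketch that assembles the full gluPerspective-style matrix from fovy, aspect, near and far; the two '?' entries are filled in with the standard values from the complete derivation (taken on trust here, not re-derived):
#include <cmath>

// Sketch: build the gluPerspective-style projection matrix (written row-major here).
// cot(fovy/2) = near/top, and cot(fovy/2)/aspect = near/right.
void perspective(double fovyRadians, double aspect, double n, double far_p, double m[4][4]) {
    double f = 1.0 / std::tan(fovyRadians / 2.0);   // cotangent(fovy/2)
    double tmp[4][4] = {
        { f / aspect, 0.0,  0.0,                         0.0                             },
        { 0.0,        f,    0.0,                         0.0                             },
        { 0.0,        0.0,  (far_p + n) / (n - far_p),   2.0 * far_p * n / (n - far_p)   },
        { 0.0,        0.0, -1.0,                         0.0                             }
    };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            m[i][j] = tmp[i][j];
}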
I hope that my explanations make things a bit more clear.