Surface normal on depth image - c++

How can I estimate the surface normal of point I(i,j) on a depth image (pixel value in mm) without using the Point Cloud Library (PCL)? I've gone through (1), (2), and (3), but I'm looking for a simple estimation of the surface normal at each pixel using only the C++ standard library or OpenCV.

You need to know the camera's intrinsic parameters, so that you can express the distance between pixels in the same units as the depth (mm). That pixel spacing only holds at a given distance from the camera (i.e. the depth value of the center pixel).
If the camera matrix is K, which is typically something like:
    | f  0  cx |
K = | 0  f  cy |
    | 0  0  1  |
Then, for pixel coordinates (x,y), a ray from the camera origin through the pixel (in camera coordinate space) is defined by:
             | x |
P = inv(K) * | y |
             | 1 |
Depending on whether the distance in your image is a projection on the Z axis or a Euclidean distance from the camera center, you need to either make sure the z component of P equals this distance, or normalize the vector P so that its magnitude equals the distance to the pixel. For pixels around the center of the frame the two should be close to identical.
If you do the same operation for nearby pixels (say, left and right) you get Pl and Pr in units of mm.
Then just find the norm of (Pl-Pr), which is twice the distance between adjacent pixels in mm; half of it is pixel_size.
Then calculate the gradient in X and Y:
gx = (P(i+1,j) - P(i-1,j)) / (2*pixel_size)
gy = (P(i,j+1) - P(i,j-1)) / (2*pixel_size)
Then, take the two gradients as direction vectors:
ax = atan(gx), ay = atan(gy)
     |  cos ax   0   sin ax |   | 1 |
dx = |    0      1     0    | * | 0 |
     | -sin ax   0   cos ax |   | 0 |
     | 1     0        0     |   | 0 |
dy = | 0   cos ay  -sin ay  | * | 1 |
     | 0   sin ay   cos ay  |   | 0 |
N = cross(dx,dy);
You may need to check that the signs make sense, by looking at a known gradient and seeing whether dx and dy point in the expected direction. You may need to negate none, one, or both of the angles, and likewise the N vector.
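A minimal sketch of the idea with only the standard library. Note it is a variant, not the exact recipe above: instead of converting gradients to angles, it back-projects the four neighbors to 3D and takes the cross product of their differences. The DepthImage struct and the function names are hypothetical, and it assumes the depth value is the Z-axis projection (not the Euclidean range) and that all four neighbors hold valid depth.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Hypothetical container: row-major depth in mm plus pinhole intrinsics.
struct DepthImage {
    int width, height;
    std::vector<double> depth_mm;  // size = width * height
    double f, cx, cy;              // focal length and principal point, in pixels
    double at(int x, int y) const {
        return depth_mm[static_cast<std::size_t>(y) * width + x];
    }
};

// Back-project pixel (x, y) to a 3D point in mm: P = Z * inv(K) * (x, y, 1).
Vec3 backProject(const DepthImage& img, int x, int y) {
    const double z = img.at(x, y);
    return {(x - img.cx) / img.f * z, (y - img.cy) / img.f * z, z};
}

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]};
}

// Unit normal at an interior pixel from central differences of the 3D points.
// The sign convention may need flipping, depending on which way you want N to face.
Vec3 surfaceNormal(const DepthImage& img, int x, int y) {
    assert(x > 0 && y > 0 && x < img.width - 1 && y < img.height - 1);
    const Vec3 l = backProject(img, x - 1, y), r = backProject(img, x + 1, y);
    const Vec3 u = backProject(img, x, y - 1), d = backProject(img, x, y + 1);
    const Vec3 tx = {r[0] - l[0], r[1] - l[1], r[2] - l[2]};  // tangent along image x
    const Vec3 ty = {d[0] - u[0], d[1] - u[1], d[2] - u[2]};  // tangent along image y
    const Vec3 n = cross(tx, ty);
    const double len = std::sqrt(n[0] * n[0] + n[1] * n[1] + n[2] * n[2]);
    assert(len > 0.0);  // fails on degenerate or invalid (zero-depth) neighborhoods
    return {n[0] / len, n[1] / len, n[2] / len};
}
```

For a flat wall facing the camera (constant depth) this gives a normal along the Z axis; in practice you would probably smooth the depth first, since central differences amplify sensor noise.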

Related

Find (x,y) subpixel coordinates of a maximum using discrete quadratic interpolation

I have to find the subpixel (x,y) coordinates of the maximum value given a set of discrete points. In my case, I run the cv::matchTemplate function, which slides a model window along an image and returns a score value for each pixel position. The result is an image with the score for each position and the location (x0, y0) of the maximum value, like these values around the maximum found:
      x_1    x0     x1
y_1 | 0.91 | 0.89 | 0.90 |
y0  | 0.92 | 0.99 | 0.89 |
y1  | 0.95 | 0.95 | 0.90 |
I would like to use quadratic interpolation to find the subpixel coordinates of the interpolated maximum, using just the nearest neighbors.
In a 1d case, I use this formula (assuming x0 is the origin):
interpolated_x = (x_1-x1)/(2.*(x_1-2.*x0+x1));
For example:
  x_1    x0     x1
| 0.92 | 0.99 | 0.89 |
you get interpolated_x = -0.08823, which is correctly slightly to the left of x0.
Is there some C++ code for the 2d case?
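One simple option, sketched below, is to apply the 1D formula from the question separately along the row and the column through the maximum. This is a separable approximation, not a full 2D quadratic fit (it ignores the diagonal neighbors and any cross term), and the names parabolaOffset, SubpixelOffset, and subpixelPeak are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// 1D parabola vertex offset from the centre sample, in pixels.
// Same formula as in the question: (l - r) / (2 * (l - 2c + r)).
double parabolaOffset(double left, double centre, double right) {
    const double denom = 2.0 * (left - 2.0 * centre + right);
    assert(denom != 0.0);  // flat neighbourhood: no unique peak
    return (left - right) / denom;
}

struct SubpixelOffset {
    double dx, dy;  // offsets relative to (x0, y0)
};

// s is the 3x3 score neighbourhood around the maximum, indexed s[row][col],
// so s[1][1] is the score at (x0, y0).
SubpixelOffset subpixelPeak(const double s[3][3]) {
    return {parabolaOffset(s[1][0], s[1][1], s[1][2]),   // along x, through row y0
            parabolaOffset(s[0][1], s[1][1], s[2][1])};  // along y, through column x0
}
```

With the 3x3 values from the question this gives dx of about -0.088 (matching the 1D example) and dy of about +0.214, so the refined peak is at roughly (x0 - 0.088, y0 + 0.214).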

C++ Rotating Cube in Coordinates (non-draw)

I've been looking for this for quite a long time without any results, and have been trying to figure out the math myself for over a week.
My goal is to set my cursor position(s) so that they trace a rotating cube, much like an OpenGL rotating cube border box would.
Since OpenGL has a rotate function built in, it's not really something I can adapt.
I just wonder if anyone has any ideas how I'd go about this.
If you're wondering what the point of this is: on each created frame (cube rotation step) a function erases anything drawn in MS Paint and then drawing begins at the next positions, basically to create a spinning cube being drawn.
If you want to rotate a cube in C without the help of any specialized library, you should use matrix operations to transform the coordinates.
You should get a rotation matrix (let's call it M).
You should multiply M by your coordinate vector; the result is the new coordinates.
For 2D rotation, for example (f is the rotation angle; the sign of the sine terms sets the rotation direction, and the two always have opposite signs):
| cos f    -+sin f | | x |   | x' |
| +-sin f   cos f  | | y | = | y' |
For 3D rotation, you should use a 3x3 matrix. You should also pick a rotation axis; depending on it, you choose the matrix M:
Mx (rotate around x axis):
| 1    0       0     | | x |   | x' |
| 0  cos f  -sin f   | | y | = | y' |
| 0  sin f   cos f   | | z |   | z' |
My (rotate around y axis):
|  cos f   0   sin f | | x |   | x' |
|    0     1     0   | | y | = | y' |
| -sin f   0   cos f | | z |   | z' |
Mz (rotate around z axis):
| cos f  -sin f   0  | | x |   | x' |
| sin f   cos f   0  | | y | = | y' |
|   0       0     1  | | z |   | z' |
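As a rough sketch of how these matrices turn into code, the functions below write out each matrix-vector product by hand and apply them to the eight corners of a cube. The names (Vec3, rotateX, cubeCorners, rotatedCube) are made up for this example, and the cube is assumed to be centred at the origin.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

// Each function is the product of the corresponding matrix (Mx, My, Mz) with v.
Vec3 rotateX(const Vec3& v, double f) {
    const double c = std::cos(f), s = std::sin(f);
    return {v.x, c * v.y - s * v.z, s * v.y + c * v.z};
}

Vec3 rotateY(const Vec3& v, double f) {
    const double c = std::cos(f), s = std::sin(f);
    return {c * v.x + s * v.z, v.y, -s * v.x + c * v.z};
}

Vec3 rotateZ(const Vec3& v, double f) {
    const double c = std::cos(f), s = std::sin(f);
    return {c * v.x - s * v.y, s * v.x + c * v.y, v.z};
}

// The eight corners of a cube centred at the origin with half-size h.
std::array<Vec3, 8> cubeCorners(double h) {
    assert(h > 0.0);
    std::array<Vec3, 8> corners{};
    for (int i = 0; i < 8; ++i) {
        corners[i] = {(i & 1) ? h : -h, (i & 2) ? h : -h, (i & 4) ? h : -h};
    }
    return corners;
}

// Rotate every corner about x, then about y. Dropping z afterwards gives
// 2D positions (an orthographic view) that could be fed to a cursor.
std::array<Vec3, 8> rotatedCube(double h, double angleX, double angleY) {
    std::array<Vec3, 8> corners = cubeCorners(h);
    for (Vec3& v : corners) {
        v = rotateY(rotateX(v, angleX), angleY);
    }
    return corners;
}
```

Calling rotatedCube with slightly larger angles on each frame and connecting the corners that differ in exactly one index bit would give the spinning wireframe.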

Why does graphics pipeline need mapping to clip coordinates and normalized device coordinates?

For perspective projection, suppose I use a simple projection matrix like:
| 1  0  0       0 |
| 0  1  0       0 |
| 0  0  1       0 |
| 0  0  1/near  0 |
which just projects onto the image plane. I think the view space coordinates can then easily be obtained by discarding and normalizing.
For orthographic projection, the projection matrix isn't even needed.
But the OpenGL graphics pipeline includes the above stages, even though the perspective projection causes a depth precision error.
Why does it need mapping to clip coordinates and normalized device coordinates?
Added
If I use the above projection matrix,
    | 1  0  0    0 |
p = | 0  1  0    0 |
    | 0  0  1    0 |
    | 0  0  1/n  0 |
v_eye = (x y z 1)
v_clip = p * v_eye = (x y z z/n)
v_ndc = v_clip / v_clip.w = (nx/z ny/z n 1)
Then, v_ndc can be clipped by discarding values over top, bottom, left, right.
Values over far also can be clipped in the same way before multiplying the projection matrix.
It may look silly, but I think it's easier than before.
ps. I noticed that the depth buffer can't be written in this way. Then, can't it be written before the projection?
Sorry for silly question and gibberish...
In the case of orthographic projections, you are right: the perspective divide is not required, but it does not introduce any error, since it is a division by 1. (An orthographic projection matrix always contains [0, 0, 0, 1] in the last row.)
For perspective projection, this is a bit more complex:
Let's look at the simplest perspective projection:
    | 1  0  0  0 |
P = | 0  1  0  0 |
    | 0  0  1  0 |
    | 0  0  1  0 |
Then a vector v=[x,y,z,1] (in view space) gets projected to
v_p = P * v = [x, y, z, z],
which is in projective space.
Now the perspective divide is needed to get the perspective effect (objects closer to the viewer look larger):
v_ndc = v_p / v_p.w = [x/z, y/z, z/z, 1] = [x/z, y/z, 1, 1]
I don't see how this could be achieved without the perspective divide.
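To make the two steps concrete, here is a small sketch that multiplies a view-space point by the simplest perspective matrix above and then performs the divide. The type aliases and function names are made up for the example, and the matrix is stored row-major.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][col]

// Matrix-vector product: view space -> clip (projective) space.
Vec4 multiply(const Mat4& m, const Vec4& v) {
    Vec4 r{};  // zero-initialised
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

// The simplest perspective matrix from the text: identity, except that the
// last row is (0, 0, 1, 0), which copies z into w.
Mat4 simplestPerspective() {
    Mat4 p{};
    p[0][0] = 1.0;
    p[1][1] = 1.0;
    p[2][2] = 1.0;
    p[3][2] = 1.0;
    return p;
}

// Perspective divide: clip space -> normalized device coordinates.
Vec4 perspectiveDivide(const Vec4& clip) {
    assert(clip[3] != 0.0);
    return {clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3], 1.0};
}
```

For example, the view-space points (2, 4, 2, 1) and (2, 4, 4, 1) end up at x/z = 1 and x/z = 0.5 respectively: same x, but the farther point lands closer to the centre, which is the perspective effect.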
Why does it need mapping to clip coordinates and normalized device coordinates?
The space where the programmer leaves the vertices to the GL to be taken care of is the clip space. It's the 4D homogeneous space where the vertices exist before normalization / perspective division. This division, useful to perform perspective projection, is the mapping needed to transform the vertices from clip space to NDC (3D). Why? Similar triangles.
                             View Space Point
                                    *
                                  / |
              Proj             /-   |
   Y ^        Plane         /--     |
     |                   /--        |
     |                *--           |y
     |            /-- |             |
     |        /--     |y'           |
     |    /---        |             |
<----+----------------+-------------+-------
   Z O                |             |
     |-------d--------|             |
     |--------------z---------------|
Perspective projection is where rays from the eye/origin cut through a projection plane on their way to the points present in the space. The point where a ray intersects the plane is the projection of the point it hits. Let's say we want to project point P onto the projection plane, where all points have z = d. The projected location of P, i.e. P', needs to be found. We know that z' will be d (since the projection plane lies there). To find y', we know
y ⁄ z = y' ⁄ z' (similar triangles)
y ⁄ z = y' ⁄ d (z' = d by defn. of proj. plane)
y' = (d * y) ⁄ z
This division by z is called the perspective division. It shows that in perspective projection, objects farther away, with larger z, appear smaller, and objects closer, with smaller z, appear larger.
Another thing that is convenient to perform in clip space is, obviously, clipping. In 4D, clipping just checks whether the points lie within a range, as opposed to performing the costlier division.
In case of orthographic projection, the projection isn't a frustum but a cuboid — parallel rays come from infinity and not the origin. Hence for point P = (x, y, z), the Z values are just dropped, giving P' = (x, y). Thus the perspective division does nothing (divides by 1) in this case.
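A small sketch of both points, assuming the OpenGL convention that a clip-space vertex is inside the view volume when -w <= x, y, z <= w. The struct and function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>

struct Vec4 {
    double x, y, z, w;
};

// Clip-space test, done before the divide: -w <= x, y, z <= w.
// Only comparisons are needed; no division is performed.
bool insideClipVolume(const Vec4& c) {
    return std::fabs(c.x) <= c.w && std::fabs(c.y) <= c.w && std::fabs(c.z) <= c.w;
}

// Similar-triangles projection onto the plane z = d: y' = (d * y) / z.
double projectY(double y, double z, double d) {
    assert(z != 0.0);
    return d * y / z;
}
```

With d = 1, a point with y = 2 projects to y' = 1 at z = 2 but only to y' = 0.5 at z = 4. For an orthographic projection w stays 1, so the same test simply reduces to checking against the range [-1, 1].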

Explanation of the Perspective Projection Matrix (Second row)

I try to figure out how the Perspective Projection Matrix works.
According to this: https://www.opengl.org/sdk/docs/man2/xhtml/gluPerspective.xml
f = cotangent(fovy/2)
Logically I understand how it works (x and y values moving further away from the bounding box or vice versa), but I need a mathematical explanation of why this works. Maybe because of the intercept theorem (theorem of intersecting lines)?
I found an explanation here: http://www.songho.ca/opengl/gl_projectionmatrix.html
But I don't understand the relevant part of it.
In my opinion, the explanation of the perspective projection matrix at songho.ca is the best one.
I'll try to retell the main idea, without going into details. But first of all, let's clarify why the cotangent is used in the OpenGL docs.
What is the cotangent? According to Wikipedia:
The cotangent of an angle is the ratio of the length of the adjacent side to the length of the opposite side.
Look at the picture below: near is the length of the adjacent side and top is the length of the opposite side.
fov/2 is the angle we are interested in.
The angle fov is the angle between the top plane and the bottom plane, so fov/2 is the angle between the top (or bottom) plane and the symmetry axis.
So the [1,1] element of the projection matrix, which is defined as cotangent(fovy/2) in the OpenGL docs, is equivalent to the ratio near/top.
Let's have a look at the point A shown in the picture. Let's find the y' coordinate of the point A', which is the projection of the point A onto the near plane.
Using the ratio of similar triangles, the following relation can be inferred:
y' / near = y / -z
Or:
y' = near * y / -z
The y coordinate in normalized device coordinates can be obtained by dividing by the value top (the range (-top, top) is mapped to the range (-1.0,1.0)), so:
yndc = near / top * y / -z
The coefficient near / top is a constant, but what about z? There is one very important detail about normalized device coordinates.
The output of the vertex shader is a four-component vector, which is transformed to a three-component vector in the interpolator by dividing the first three components by the fourth component:
(x_ndc, y_ndc, z_ndc) = (x_clip / w_clip, y_clip / w_clip, z_clip / w_clip)
So, we can assign to the fourth component the value of -z. It can be done by assigning to the element [2,3] of the projection matrix the value -1.
Similar reasoning can be done for the x coordinate.
We have found the following elements of projection matrix:
| near / right       0       0   0 |
|      0        near / top   0   0 |
|      0             0       ?   ? |
|      0             0      -1   0 |
There are two elements that we haven't found yet; they are marked with '?'.
To make things clear, let's project an arbitrary point (x,y,z) to normalized device coordinates:
| near / right       0       0   0 |   | x |
|      0        near / top   0   0 | X | y | =
|      0             0       ?   ? |   | z |
|      0             0      -1   0 |   | 1 |
    | near / right * x |
  = | near / top * y   |
    | ?                |
    | -z               |
And finally, after dividing by the w component we will get:
| - near / right * x / z |
| - near / top * y / z   |
| ?                      |
Note that the result matches the equation inferred earlier.
As for the third component, marked with '?', more complex reasoning is needed to work out how to calculate it. Refer to songho.ca for more information.
I hope that my explanations make things a bit more clear.
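To check the reasoning numerically, here is a small sketch that applies only the rows of the matrix discussed above (the depth row marked '?' is left out). It assumes the gluPerspective form, where element [0,0] is f / aspect and element [1,1] is f = cotangent(fovy/2); the struct and function names are made up.

```cpp
#include <cassert>
#include <cmath>

// Only the clip-space components derived in the text; z is omitted.
struct ClipXYW {
    double x, y, w;
};

// Applies the x row, the y row, and the last row (0, 0, -1, 0).
ClipXYW projectXY(double x, double y, double z, double fovyRadians, double aspect) {
    assert(aspect > 0.0);
    const double f = 1.0 / std::tan(fovyRadians / 2.0);  // cotangent(fovy/2) = near / top
    return {f / aspect * x, f * y, -z};
}

// Perspective divide for the y component: y_ndc = (near / top) * y / -z.
double ndcY(const ClipXYW& clip) {
    assert(clip.w != 0.0);
    return clip.y / clip.w;
}
```

With fovy = 90 degrees, f is 1, so a point at (0, 1, -1) lands on the top edge (y_ndc = 1) and a point at (0, 1, -2) lands halfway up (y_ndc = 0.5).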

How to convert Euler angles to directional vector?

I have pitch, roll, and yaw angles. How would I convert these to a directional vector?
It'd be especially cool if you can show me a quaternion and/or matrix representation of this!
Unfortunately there are different conventions on how to define these things (and roll, pitch, yaw are not quite the same as Euler angles), so you'll have to be careful.
If we define pitch=0 as horizontal (z=0) and yaw as counter-clockwise from the x axis, then the direction vector will be
x = cos(yaw)*cos(pitch)
y = sin(yaw)*cos(pitch)
z = sin(pitch)
Note that I haven't used roll; this is a direction unit vector, and it doesn't specify attitude. It's easy enough to write a rotation matrix that will carry things into the frame of the flying object (if you want to know, say, where the left wing-tip is pointing), but it's really a good idea to specify the conventions first. Can you tell us more about the problem?
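In C++ the three formulas above come out as the short sketch below, assuming the same convention (pitch = 0 is horizontal, yaw is counter-clockwise from the x axis, angles in radians). The Vec3 struct and the function name are made up for the example.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

// Unit direction vector for the given yaw and pitch (radians); roll is not used.
Vec3 directionFromYawPitch(double yaw, double pitch) {
    assert(std::isfinite(yaw) && std::isfinite(pitch));
    return {std::cos(yaw) * std::cos(pitch),
            std::sin(yaw) * std::cos(pitch),
            std::sin(pitch)};
}
```

With yaw = 0 and pitch = 0 this returns (1, 0, 0), and the result has unit length for any pair of angles.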
EDIT:
(I've been meaning to get back to this question for two and a half years.)
For the full rotation matrix, if we use the convention above and we want the vector to yaw first, then pitch, then roll, in order to get the final coordinates in the world coordinate frame we must apply the rotation matrices in the reverse order.
First roll:
| 1 0 0 |
| 0 cos(roll) -sin(roll) |
| 0 sin(roll) cos(roll) |
then pitch:
| cos(pitch) 0 -sin(pitch) |
| 0 1 0 |
| sin(pitch) 0 cos(pitch) |
then yaw:
| cos(yaw) -sin(yaw) 0 |
| sin(yaw) cos(yaw) 0 |
| 0 0 1 |
Combine them, and the total rotation matrix is:
| cos(yaw)cos(pitch)   -cos(yaw)sin(pitch)sin(roll)-sin(yaw)cos(roll)   -cos(yaw)sin(pitch)cos(roll)+sin(yaw)sin(roll) |
| sin(yaw)cos(pitch)   -sin(yaw)sin(pitch)sin(roll)+cos(yaw)cos(roll)   -sin(yaw)sin(pitch)cos(roll)-cos(yaw)sin(roll) |
| sin(pitch)            cos(pitch)sin(roll)                              cos(pitch)cos(roll)                            |
So for a unit vector that starts at the x axis, the final coordinates will be:
x = cos(yaw)cos(pitch)
y = sin(yaw)cos(pitch)
z = sin(pitch)
And for the unit vector that starts at the y axis (the left wing-tip), the final coordinates will be:
x = -cos(yaw)sin(pitch)sin(roll)-sin(yaw)cos(roll)
y = -sin(yaw)sin(pitch)sin(roll)+cos(yaw)cos(roll)
z = cos(pitch)sin(roll)
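If you'd rather not type out the closed form, a sketch like the following builds the three matrices and multiplies them numerically as yaw * pitch * roll (so roll is applied to the vector first). It uses the same sign conventions as the matrices above; the type alias and function names are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using Mat3 = std::array<std::array<double, 3>, 3>;  // row-major: m[row][col]

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 r{};  // zero-initialised
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = 0; j < 3; ++j)
            for (std::size_t k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat3 rollMatrix(double a) {  // rotation about the x axis
    const double c = std::cos(a), s = std::sin(a);
    Mat3 m{};
    m[0] = {1.0, 0.0, 0.0};
    m[1] = {0.0, c, -s};
    m[2] = {0.0, s, c};
    return m;
}

Mat3 pitchMatrix(double a) {  // signs chosen so that positive pitch raises z
    const double c = std::cos(a), s = std::sin(a);
    Mat3 m{};
    m[0] = {c, 0.0, -s};
    m[1] = {0.0, 1.0, 0.0};
    m[2] = {s, 0.0, c};
    return m;
}

Mat3 yawMatrix(double a) {  // rotation about the z axis
    const double c = std::cos(a), s = std::sin(a);
    Mat3 m{};
    m[0] = {c, -s, 0.0};
    m[1] = {s, c, 0.0};
    m[2] = {0.0, 0.0, 1.0};
    return m;
}

// Total rotation: roll is applied first, then pitch, then yaw.
Mat3 yawPitchRoll(double yaw, double pitch, double roll) {
    assert(std::isfinite(yaw) && std::isfinite(pitch) && std::isfinite(roll));
    return multiply(yawMatrix(yaw), multiply(pitchMatrix(pitch), rollMatrix(roll)));
}
```

The first column of the result is where the x axis ends up (the pointing direction) and the second column is where the y axis ends up (the left wing-tip), so comparing them against the formulas above is a quick sanity check.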
There are six different ways to convert three Euler Angles into a Matrix depending on the Order that they are applied:
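#include <cmath>
// A struct wrapper (rather than a raw array typedef) so that the matrix can be
// returned by value and accessed as Mx.M[row][col] below.
struct Matrix { float M[3][3]; };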
struct EulerAngle { float X,Y,Z; };
// Euler Order enum.
enum EEulerOrder
{
ORDER_XYZ,
ORDER_YZX,
ORDER_ZXY,
ORDER_ZYX,
ORDER_YXZ,
ORDER_XZY
};
Matrix EulerAnglesToMatrix(const EulerAngle &inEulerAngle,EEulerOrder EulerOrder)
{
// Convert Euler Angles passed in a vector of Radians
// into a rotation matrix. The individual Euler Angles are
// processed in the order requested.
Matrix Mx;
const float Sx = sinf(inEulerAngle.X);
const float Sy = sinf(inEulerAngle.Y);
const float Sz = sinf(inEulerAngle.Z);
const float Cx = cosf(inEulerAngle.X);
const float Cy = cosf(inEulerAngle.Y);
const float Cz = cosf(inEulerAngle.Z);
switch(EulerOrder)
{
case ORDER_XYZ:
Mx.M[0][0]=Cy*Cz;
Mx.M[0][1]=-Cy*Sz;
Mx.M[0][2]=Sy;
Mx.M[1][0]=Cz*Sx*Sy+Cx*Sz;
Mx.M[1][1]=Cx*Cz-Sx*Sy*Sz;
Mx.M[1][2]=-Cy*Sx;
Mx.M[2][0]=-Cx*Cz*Sy+Sx*Sz;
Mx.M[2][1]=Cz*Sx+Cx*Sy*Sz;
Mx.M[2][2]=Cx*Cy;
break;
case ORDER_YZX:
Mx.M[0][0]=Cy*Cz;
Mx.M[0][1]=Sx*Sy-Cx*Cy*Sz;
Mx.M[0][2]=Cx*Sy+Cy*Sx*Sz;
Mx.M[1][0]=Sz;
Mx.M[1][1]=Cx*Cz;
Mx.M[1][2]=-Cz*Sx;
Mx.M[2][0]=-Cz*Sy;
Mx.M[2][1]=Cy*Sx+Cx*Sy*Sz;
Mx.M[2][2]=Cx*Cy-Sx*Sy*Sz;
break;
case ORDER_ZXY:
Mx.M[0][0]=Cy*Cz-Sx*Sy*Sz;
Mx.M[0][1]=-Cx*Sz;
Mx.M[0][2]=Cz*Sy+Cy*Sx*Sz;
Mx.M[1][0]=Cz*Sx*Sy+Cy*Sz;
Mx.M[1][1]=Cx*Cz;
Mx.M[1][2]=-Cy*Cz*Sx+Sy*Sz;
Mx.M[2][0]=-Cx*Sy;
Mx.M[2][1]=Sx;
Mx.M[2][2]=Cx*Cy;
break;
case ORDER_ZYX:
Mx.M[0][0]=Cy*Cz;
Mx.M[0][1]=Cz*Sx*Sy-Cx*Sz;
Mx.M[0][2]=Cx*Cz*Sy+Sx*Sz;
Mx.M[1][0]=Cy*Sz;
Mx.M[1][1]=Cx*Cz+Sx*Sy*Sz;
Mx.M[1][2]=-Cz*Sx+Cx*Sy*Sz;
Mx.M[2][0]=-Sy;
Mx.M[2][1]=Cy*Sx;
Mx.M[2][2]=Cx*Cy;
break;
case ORDER_YXZ:
Mx.M[0][0]=Cy*Cz+Sx*Sy*Sz;
Mx.M[0][1]=Cz*Sx*Sy-Cy*Sz;
Mx.M[0][2]=Cx*Sy;
Mx.M[1][0]=Cx*Sz;
Mx.M[1][1]=Cx*Cz;
Mx.M[1][2]=-Sx;
Mx.M[2][0]=-Cz*Sy+Cy*Sx*Sz;
Mx.M[2][1]=Cy*Cz*Sx+Sy*Sz;
Mx.M[2][2]=Cx*Cy;
break;
case ORDER_XZY:
Mx.M[0][0]=Cy*Cz;
Mx.M[0][1]=-Sz;
Mx.M[0][2]=Cz*Sy;
Mx.M[1][0]=Sx*Sy+Cx*Cy*Sz;
Mx.M[1][1]=Cx*Cz;
Mx.M[1][2]=-Cy*Sx+Cx*Sy*Sz;
Mx.M[2][0]=-Cx*Sy+Cy*Sx*Sz;
Mx.M[2][1]=Cz*Sx;
Mx.M[2][2]=Cx*Cy+Sx*Sy*Sz;
break;
}
return(Mx);
}
FWIW, some CPUs can compute sin and cos simultaneously (for example, fsincos on x86). If you do this, you can make it a bit faster, with three calls rather than six to compute the initial sin and cos values.
Update: There are actually 12 ways depending if you want right-handed or left-handed results -- you can change the "handedness" by negating the angles.
Beta saved my day. However, I'm using a slightly different reference coordinate system, and my definition of pitch is up/down (nodding your head in agreement), where a positive pitch results in a negative y-component. My reference vector is OpenGL style (down the -z axis), so with yaw=0, pitch=0 the resulting unit vector should equal (0, 0, -1).
If anyone comes across this post and has difficulties translating Beta's formulas to this particular system, the equations I use are:
vDir->X = sin(yaw);
vDir->Y = -(sin(pitch)*cos(yaw));
vDir->Z = -(cos(pitch)*cos(yaw));
Note the sign change and the yaw <-> pitch swap. Hope this will save someone some time.
You need to be clear about your definitions here - in particular, what is the vector you want? If it's the direction an aircraft is pointing, the roll doesn't even affect it, and you're just using spherical coordinates (probably with axes/angles permuted).
If on the other hand you want to take a given vector and transform it by these angles, you're looking for a rotation matrix. The wiki article on rotation matrices contains a formula for a yaw-pitch-roll rotation, based on the xyz rotation matrices. I'm not going to attempt to enter it here, given the greek letters and matrices involved.
In case someone stumbles upon this looking for an implementation in FreeCAD:
import FreeCAD, FreeCADGui
from FreeCAD import Vector
from math import sin, cos, pi
cr = FreeCADGui.ActiveDocument.ActiveView.getCameraOrientation().toEuler()
crx = cr[2] # Roll
cry = cr[1] # Pitch
crz = cr[0] # Yaw
crx = crx * pi / 180.0
cry = cry * pi / 180.0
crz = crz * pi / 180.0
x = sin(crz)
y = -(sin(crx) * cos(crz))
z = cos(crx) * cos(cry)
view = Vector(x, y, z)