Hi, I understand from the official Microsoft documentation that the CameraSpacePoints obtained are already expressed in meters. However, I'm still uncertain whether the values really are in meters, as the printed values run to 9 digits or more and are sometimes positive, sometimes negative.
Can anybody explain to me what these values are and whether they are already in meters?
Function to process the body (I placed the output of joints[j].Position in it):
Results
Do let me know if you require additional information. I apologize as I'm new to Kinect. By the way, this is Kinect v2!
Each Joint has a Position property, which is a CameraSpacePoint. A CameraSpacePoint object has three properties, X, Y and Z, which are float32 values expressed in meters.
As explained in the documentation:
Camera space refers to the 3D coordinate system used by Kinect. The coordinate system is defined as follows:
The origin (x=0, y=0, z=0) is located at the center of the IR sensor on Kinect
X grows to the sensor’s left
Y grows up (note that this direction is based on the sensor’s tilt)
Z grows out in the direction the sensor is facing
1 unit = 1 meter
Note also that you have used %d in the sprintf call to print a float value, while %d should only be used for signed decimal integers (see the table on this page). If you want to print the X, Y and Z values, you must use %f, so your sprintf should become:
sprintf(abc, "Head is at %f on X-axis\n", joints[j].Position.X);
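For example, a minimal sketch printing all three coordinates (assuming, as in your code, that the joints array was filled by IBody::GetJoints and abc is a sufficiently large char buffer):

const CameraSpacePoint head = joints[JointType_Head].Position;   // Kinect v2 camera-space position
sprintf(abc, "Head is at (%f, %f, %f) meters\n", head.X, head.Y, head.Z);

Printed with %f, the values should come out as plausible metric distances (typically a few meters at most, positive or negative) rather than the huge numbers %d was showing.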
I am currently working on a camera pose estimation project using only one marker with ARUCO.
I used Aruco's Marker Detector to detect markers and get the marker's Rvec and Tvec. I understand these two vectors represent the transform from the marker to the camera, which is the marker's pose w.r.t camera. I form a 4 by 4 matrix called T_marker_camera using these two vectors.
Then, I set up a world frame (left handed) and get the marker's world pose, which is a 4 by 4 transform matrix.
I want to calculate the pose of the camera w.r.t the world frame, and I use the following formula to calculate it:
T_camera_world = T_marker_world * T_marker_camera_inv
Before applying the above formula, I convert the OpenCV coordinates to the left-handed system (by flipping the sign of the x axis).
However, I didn't get the correct x, y, z of the camera w.r.t the world frame.
What did I miss to get the correct answer?
Thanks
The one equation you gave looks right, so the issue is probably somewhere that you didn't show/describe.
A fix in your notation will help clarify.
Write the pose/source frame on the right (input), the reference/destination frame on the left (output). Then your matrices "match up" like dominos.
rvec and tvec yield a matrix that should be called T_cam_marker.
If you want the pose of your camera in the world frame, that is
T_world_cam = T_world_marker * T_marker_cam
T_world_cam = T_world_marker * inv(T_cam_marker)
(equivalent to what you wrote, but in "domino" order)
Be sure that you do matrix multiplication, not element-wise multiplication.
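Here is a minimal sketch of that composition with OpenCV (a sketch only; toHomogeneous is a helper name I'm introducing for illustration, and T_world_marker is whatever 4x4 matrix you defined for the marker's world pose):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// Build the 4x4 homogeneous transform T_cam_marker from rvec/tvec.
cv::Mat toHomogeneous(const cv::Vec3d& rvec, const cv::Vec3d& tvec)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R);                 // 3x3 rotation matrix from the rotation vector
    cv::Mat T = cv::Mat::eye(4, 4, CV_64F);
    R.copyTo(T(cv::Rect(0, 0, 3, 3)));      // top-left 3x3 block
    T.at<double>(0, 3) = tvec[0];
    T.at<double>(1, 3) = tvec[1];
    T.at<double>(2, 3) = tvec[2];
    return T;
}

// Usage:
// cv::Mat T_cam_marker = toHomogeneous(rvec, tvec);
// cv::Mat T_world_cam  = T_world_marker * T_cam_marker.inv();   // matrix product, not per-element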
To move between left-handed and right-handed coordinate systems, insert a matrix that maps coordinates accordingly. Frames:
OpenCV camera/screen: right-handed, {X right, Y down, Z far}
ARUCO (in OpenCV anyway): right-handed, {X right, Y far, Z up}, first corner is top left (-X+Y quadrant)
whatever leftie frame you have, let's say {X right, Y up, Z far} and it's a screen or something
The hand-change matrix for typical frames on screens is an identity but with the entry for Y being a -1. I don't know why you would flip the X but that's "equivalent", ignoring any rotations.
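Concretely, a sketch of that hand-change (same OpenCV types as above) for the leftie {X right, Y up, Z far} frame: flip Y, leave X and Z alone.

cv::Mat S = cv::Mat::eye(4, 4, CV_64F);   // hand-change matrix
S.at<double>(1, 1) = -1.0;                // negate the Y row
// cv::Mat T_leftworld_cam = S * T_world_cam;   // world pose expressed in the left-handed frame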
Assume that I took two panoramic images with a vertical offset of H, and each image is presented in equirectangular projection with size Xm and Ym. To do this, I placed my panoramic camera at a position, say A, and took an image, then moved the camera H meters up and took another image.
I know that a point in image 1 with coordinates X1, Y1 is the same point in image 2 with coordinates X2, Y2 (X1 = X2, since we have only a vertical offset).
My question is: how can I calculate the range of the selected point (the point whose coordinates X1, Y1 I know in image 1 and whose position in image 2 is X2, Y2) from point A (where the camera was when image 1 was taken)?
Yes, you can do it - hold on!!!
Key thing: y = the focal length of your lens - now I can do it!!!
So, I think your question can be re-stated more simply by saying that if you move your camera (on the right in the diagram) up H metres, a point moves down p pixels in the image taken from the new location.
Like this, if you imagine looking at the setup from the side while you take the picture.
If you know the micron spacing of the camera's CCD from its specification, you can convert p from pixels to metres to match the units of H.
Your range from the camera to the plane of the scene is given by x + y (both in red at the bottom), and
x=H/tan(alpha)
y=p/tan(alpha)
so your range is
R = x + y = H/tan(alpha) + p/tan(alpha)
and
alpha = tan inverse(p/y)
where y is the focal length of your lens. As y is likely to be something like 50mm, it is negligible, so, to a pretty reasonable approximation, your range is
H/tan(alpha)
and
alpha = tan inverse(p in metres/focal length)
Or, by similar triangles
Range = H x focal length of lens
--------------------------------
(Y2-Y1) x CCD photosite spacing
being very careful to put everything in metres.
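As a sanity check, here is a small sketch of that last formula; every number below is a made-up assumption purely for illustration, not a value from the question:

#include <cstdio>

int main()
{
    const double H          = 2.0;      // vertical camera offset, metres (assumed)
    const double focal      = 0.050;    // 50 mm lens, metres (assumed)
    const double photosite  = 6.0e-6;   // CCD photosite spacing, metres (assumed)
    const double pixelShift = 40.0;     // Y2 - Y1, pixels (assumed)

    const double range = (H * focal) / (pixelShift * photosite);
    std::printf("Range: %.1f m\n", range);   // ~416.7 m for these numbers
    return 0;
}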
Here is a shot in the dark: given my understanding of the problem at hand, you want to do something similar to computer stereo vision, so I point you to http://en.wikipedia.org/wiki/Computer_stereo_vision to start. I'm not sure this is possible in exactly the manner you are suggesting, and it sounds like you may need some more physical constraints, but I do remember being able to correlate two 2D points in images related by a pure translation. Think:
lambda * [x, y, 1]^t = W * [r1, tx; r2, ty; r3, tz] * [x; y; z; 1]^t
where lambda is a scale factor, W is the 3x3 matrix of intrinsic parameters of your camera, r1, r2 and r3 are the row vectors that make up the 3x3 rotation matrix (in your case you can assume the identity matrix, since you have only applied a translation), and tx, ty, tz are your translation components.
Since the two 2D points are projections of the same 3D point [x, y, z], that 3D point is shared by both images. I cannot say whether you can recover the actual x, y and z values, particularly for your depth calculation, but this is where I would start.
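For what it's worth, a small sketch of that relation using OpenCV types; the intrinsics and the translation below are made-up values just to show the mechanics (identity rotation, pure translation):

#include <opencv2/core.hpp>
#include <iostream>

int main()
{
    cv::Matx33d W(800,   0, 320,        // hypothetical intrinsics: focal length in pixels
                    0, 800, 240,        //  and principal point; calibration gives real values
                    0,   0,   1);
    cv::Matx34d Rt(1, 0, 0, 0.0,        // [r1 tx; r2 ty; r3 tz] with R = identity
                   0, 1, 0, 2.0,        //  and a pure translation along Y
                   0, 0, 1, 0.0);
    cv::Vec4d M(1.0, 0.5, 10.0, 1.0);   // homogeneous 3D point [x, y, z, 1]

    cv::Vec3d m = W * (Rt * M);         // lambda * [u, v, 1]^t
    std::cout << "u = " << m[0] / m[2] << ", v = " << m[1] / m[2] << std::endl;
    return 0;
}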
In GLSL (specifically 3.00, which I'm using), there are two versions of atan(): atan(y_over_x) can only return angles between -PI/2 and PI/2, while atan(y, x) can take all 4 quadrants into account, so the angle range covers everything from -PI to PI, much like atan2() in C++.
I would like to use the second atan to convert XY coordinates to an angle.
However, atan() in GLSL, besides not being able to handle x = 0, is not very stable. Especially where x is close to zero, the division can overflow, giving an angle of the opposite sign (you get something close to -PI/2 where you are supposed to get approximately PI/2).
What is a good, simple implementation that we can build on top of GLSL atan(y,x) to make it more robust?
I'm going to answer my own question to share my knowledge. We first notice that the instability happens when x is near zero. However, we can also translate that as abs(x) << abs(y). So first we divide the plane (assuming we are on a unit circle) into two regions: one where |x| <= |y| and another where |x| > |y|, as shown below:
We know that atan(x,y) is much more stable in the green region -- when x is close to zero we simply have something close to atan(0.0) which is very stable numerically, while the usual atan(y,x) is more stable in the orange region. You can also convince yourself that this relationship:
atan(x,y) = PI/2 - atan(y,x)
holds for all (x, y) except the origin, where it is undefined. Here we are talking about the atan(y, x) that returns an angle in the entire range -PI to PI, not atan(y_over_x), which only returns angles between -PI/2 and PI/2. Therefore, our robust atan2() routine for GLSL is quite simple:
float atan2(in float y, in float x)
{
    bool s = (abs(x) > abs(y));
    return mix(PI/2.0 - atan(x,y), atan(y,x), s);
}
As a side note, the identity for mathematical function atan(x) is actually:
atan(x) + atan(1/x) = sgn(x) * PI/2
which is true because its range is (-PI/2, PI/2).
Depending on your targeted platform, this might be a solved problem. The OpenGL spec for atan(y, x) specifies that it should work in all quadrants, leaving behavior undefined only when x and y are both 0.
So one would expect any decent implementation to be stable near all axes, as this is the whole purpose behind 2-argument atan (or atan2).
The questioner/answerer is correct in that some implementations do take shortcuts. However, the accepted solution makes the assumption that a bad implementation will always be unstable when x is near zero: on some hardware (my Galaxy S4 for example) the value is stable when x is near zero, but unstable when y is near zero.
To test your GLSL renderer's implementation of atan(y,x), here's a WebGL test pattern. Follow the link below and as long as your OpenGL implementation is decent, you should see something like this:
Test pattern using native atan(y,x): http://glslsandbox.com/e#26563.2
If all is well, you should see 8 distinct colors (ignoring the center).
The linked demo samples atan(y,x) for several values of x and y, including 0, very large, and very small values. The central box is atan(0.,0.)--undefined mathematically, and implementations vary. I've seen 0 (red), PI/2 (green), and NaN (black) on hardware I've tested.
Here's a test page for the accepted solution. Note: the host's WebGL version lacks mix(float,float,bool), so I added an implementation that matches the spec.
Test pattern using atan2(y,x) from accepted answer: http://glslsandbox.com/e#26666.0
Your proposed solution still fails in the case x=y=0. Here both of the atan() functions return NaN.
Further, I would not rely on mix to switch between the two cases. I am not sure how it is implemented/compiled, but the IEEE float rules for x*NaN and x+NaN yield NaN again. So if your compiler really implemented mix as interpolation, the result should be NaN for x=0 or y=0.
Here is another fix which solved the problem for me:
float atan2(in float y, in float x)
{
    return x == 0.0 ? sign(y)*PI/2 : atan(y, x);
}
When x=0 the angle can be ±π/2; which of the two depends only on y. If y=0 too, the angle is arbitrary (the vector has length 0). sign(y) returns 0 in that case, which is just fine.
Sometimes the best way to improve the performance of a piece of code is to avoid calling it in the first place. For example, one of the reasons you might want to determine the angle of a vector is so that you can use this angle to construct a rotation matrix using combinations of the angle's sine and cosine. However, the sine and cosine of a vector (relative to the origin) are already hidden in plain sight inside the vector itself. All you need to do is to create a normalized version of the vector by dividing each vector coordinate by the total length of the vector. Here's the two-dimensional example to calculate the sine and cosine of the angle of vector [ x y ]:
double length = sqrt(x*x + y*y);   // vector magnitude
double cos = x / length;           // cosine of the vector's angle
double sin = y / length;           // sine of the vector's angle
Once you have the sine and cosine values, you can now directly populate a rotation matrix with them to perform a clockwise or counterclockwise rotation of arbitrary vectors by the same angle, or you can concatenate a second rotation matrix to rotate to an angle other than zero. In this case, you can think of the rotation matrix as "normalizing" the angle to zero for an arbitrary vector. This approach extends to the three-dimensional (or N-dimensional) case as well, although for a 3D rotation, for example, you will have three angles and six sine/cosine values to calculate (one angle per plane).
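To make that concrete, here is a small 2D sketch (rotateToZero is an illustrative helper of my own, not from the code above) that rotates an arbitrary point by the negative of a vector's angle, i.e. "normalizes" that angle to zero, without ever calling atan:

#include <cmath>

struct Vec2 { double x, y; };

// Rotate point p by minus the angle of v, using only v's normalized components.
Vec2 rotateToZero(const Vec2& v, const Vec2& p)
{
    const double len = std::sqrt(v.x * v.x + v.y * v.y);
    const double c = v.x / len;   // cos(theta)
    const double s = v.y / len;   // sin(theta)
    // Rotation matrix for -theta is [ c  s; -s  c ]
    return { c * p.x + s * p.y,
            -s * p.x + c * p.y };
}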
In situations where you can use this approach, you get a big win by bypassing the atan calculation completely, which is possible since the only reason you wanted to determine the angle was to calculate the sine and cosine values. By skipping the conversion to angle space and back, you not only avoid worrying about division by zero, but you also improve precision for angles which are near the poles and would otherwise suffer from being multiplied/divided by large numbers. I've successfully used this approach in a GLSL program which rotates a scene to zero degrees to simplify a computation.
It can be easy to get so caught up in an immediate problem that you can lose sight of why you need this information in the first place. Not that this works in every case, but sometimes it helps to think out of the box...
A formula that gives an angle in all four quadrants for any values of the coordinates x and y. For x=y=0 the result is undefined.
f(x,y) = pi() - pi()/2*(1+sign(x))*(1-sign(y^2)) - pi()/4*(2+sign(x))*sign(y) - sign(x*y)*atan((abs(x)-abs(y))/(abs(x)+abs(y)))
I'm using the Kinect for Windows SDK (v1.8)
I'm successfully reading motion from the Kinect but I'm now wondering how to get the absolute position of the joint I'm tracking (right hand)
I'm using the following function, NuiTransformSkeletonToDepthImage, but the values range from 100-200 in both x and y coordinates.
Any suggestions for how to transform the returned coordinates to screen coordinates?
Got a feeling I'm missing something really obvious though...
Thank you in advance
Ok, after quite a bit of looking around and trial and error; I found the solution.
When you call NuiTransformSkeletonToDepthImage(), you can specify a Kinect resolution using the NUI_IMAGE_RESOLUTION enum. I believe the values are 80x60, 320x240, 640x480 and 1280x960.
Depending on what resolution you decide to use, you'll need to divide the returned x and y coords by that resolution, then multiply them by your window size, and you should have the coordinates in your window.
I believe you can also specify the resolution when creating an instance of the Kinect Sensor interface.
Example as my explanation isn't too concise:
float x, y;
const Vector4 fromPos = skeleton.SkeletonPositions[NUI_SKELETON_POSITION_HAND_RIGHT];
// Map the skeleton-space position into 320x240 depth-image coordinates
NuiTransformSkeletonToDepthImage(fromPos, &x, &y, NUI_IMAGE_RESOLUTION_320x240);
// Scale from the depth-image resolution up to the window size
x = x * ((float)WINDOW_WIDTH / 320.0f);
y = y * ((float)WINDOW_HEIGHT / 240.0f);
Any questions; feel free to ask.
I am building a recognition program in C++ and to make it more robust, I need to be able to find the distance of an object in an image.
Say I have an image that was taken 22.3 inches away from an 8.5 x 11 picture. The system correctly identifies that picture in a box with the dimensions 319 pixels by 409 pixels.
What is an effective way for relating the actual Height and width (AH and AW) and the pixel Height and width (PH and PW) to the distance (D)?
I am assuming that when I actually go to use the equation, PH and PW will be inversely proportional to D and AH and AW are constants (as the recognized object will always be an object where the user can indicate width and height).
I don't know if you changed your question at some point, but my first answer is quite complicated for what you want. You can probably do something simpler.
1) Long and complicated solution (more general problems)
First, you need to know the size of the object.
You can look at computer vision algorithms. If you know the object (its dimensions and shape), your main problem is pose estimation (that is, finding the position of the object relative to the camera), and from this you can find the distance. You can look at [1] [2] (for example; you can find other articles on this if you are interested) or search for POSIT and SoftPOSIT. You can formulate the problem as an optimization problem: find the pose that minimizes the "difference" between the real image and the expected image (the projection of the object given the estimated pose). This difference is usually the sum of the (squared) distances between each image point Ni and the projection P(Mi) of the corresponding object (3D) point Mi for the current parameters.
From this you can extract the distance.
For this you need to calibrate your camera (roughly speaking, find the relation between pixel position and viewing angle).
Now, you may not want to code all of this by yourself; you can use computer vision libraries such as OpenCV or Gandalf [3].
Now, you may want to do something simpler (and approximate). If you can find the image distance between two points at the same "depth" (Z) from the camera, you can relate the image distance d to the real distance D with d = a*D/Z (where a is a parameter of the camera related to the focal length and the number of pixels, which you can find using camera calibration).
2) Short solution (for your simple problem)
But here is the (simple, short) answer: if your picture lies on a plane parallel to the "camera plane" (i.e. it is perfectly facing the camera), you can use:
PH = a AH / Z
PW = a AW / Z
where Z is the depth of the plane of the picture and a is an intrinsic parameter of the camera.
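For instance, inverting those two equations for Z (a rough sketch; the value of a below is a made-up calibration constant, and the pixel sizes are the ones from the question):

#include <cstdio>

int main()
{
    const double a  = 800.0;   // hypothetical focal-length-in-pixels value, from calibration
    const double AH = 11.0;    // actual height (in whatever unit you want Z in)
    const double AW = 8.5;     // actual width
    const double PH = 409.0;   // pixel height of the detected box
    const double PW = 319.0;   // pixel width

    const double Zh = a * AH / PH;   // Z from the height equation
    const double Zw = a * AW / PW;   // Z from the width equation
    std::printf("Z ~ %.1f / %.1f, average %.1f\n", Zh, Zw, 0.5 * (Zh + Zw));
    return 0;
}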
For reference, the pinhole camera model relates image coordinates m=(u,v) to world coordinates M=(X,Y,Z) with:
m ~ K M
[u]   [ au  as  u0 ] [X]
[v] ~ [  0  av  v0 ] [Y]
[1]   [  0   0   1 ] [Z]

i.e.  u = au*X/Z + as*Y/Z + u0
      v = av*Y/Z + v0
where "~" means "proportional to" and K is the matrix of intrinsic parameters of the camera. You need to do camera calibration to find the K parameters. Here I assumed au=av=a and as=0.
You can recover the Z parameter from either of those equations (or take the average of both). Note that the Z parameter is not the distance to the object (which varies over the different points of the object) but the depth of the object (the distance between the camera plane and the object plane), but I guess that is what you want anyway.
[1] Linear N-Point Camera Pose Determination, Long Quan and Zhongdan Lan
[2] A Complete Linear 4-Point Algorithm for Camera Pose Determination, Lihong Zhi and Jianliang Tang
[3] http://gandalf-library.sourceforge.net/
If you know the size of the real-world object and the angle of view of the camera, then, assuming you know the horizontal angle of view alpha (*) and the horizontal resolution of the image xres, the distance dw to an object in the middle of the image that is xp pixels wide in the image and xw meters wide in the real world can be derived as follows (how is your trigonometry?):
# Distance in "pixel space" relates to dinstance in the real word
# (we take half of xres, xw and xp because we use the half angle of view):
(xp/2)/dp = (xw/2)/dw
dw = ((xw/2)/(xp/2))*dp = (xw/xp)*dp (1)
# we know xp and xw, we're looking for dw, so we need to calculate dp:
# we can do this because we know xres and alpha
# (remember, tangent = opposite/adjacent):
tan(alpha) = (xres/2)/dp
dp = (xres/2)/tan(alpha) (2)
# combine (1) and (2):
dw = ((xw/xp)*(xres/2))/tan(alpha)
# pretty print:
dw = (xw*xres)/(xp*2*tan(alpha))
(*) alpha = The angle between the camera axis and a line going through the leftmost point on the middle row of the image that is just visible.
Link to your variables:
dw = D, xw = AW, xp = PW
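The same formula in code, as a sketch; the half angle of view and the resolution below are assumptions for illustration, not values from the question:

#include <cmath>
#include <cstdio>

int main()
{
    const double PI    = 3.14159265358979323846;
    const double alpha = 30.0 * PI / 180.0;   // assumed half angle of view
    const double xres  = 640.0;               // assumed horizontal resolution, pixels
    const double xw    = 8.5 * 0.0254;        // object width in metres (8.5 inches)
    const double xp    = 319.0;               // object width in the image, pixels

    const double dw = (xw * xres) / (xp * 2.0 * std::tan(alpha));
    std::printf("dw = %.3f m\n", dw);
    return 0;
}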
This may not be a complete answer, but it may push you in the right direction. Ever seen how NASA does it on those pictures from space? The way they have those tiny crosses all over the images. That's how they get a fair idea about the depth and size of the object, as far as I know. The solution might be to have an object whose correct size and depth you know in the picture and then calculate the others relative to that. Time for you to do some research. If that's the way NASA does it, then it should be worth checking out.
I have got to say this is one of the most interesting questions I have seen in a long time on Stack Overflow :D. I just noticed you have only two tags attached to this question; adding some more related to images might help you get better answers.