Kinect Joint Absolute coordinates - c++

I'm using the Kinect for Windows SDK (v1.8)
I'm successfully reading motion from the Kinect but I'm now wondering how to get the absolute position of the joint I'm tracking (right hand)
I'm using the following function, NuiTransformSkeletonToDepthImage, but the values range from 100-200 in both x and y coordinates.
Any suggestions for how to transform the returned coordinates to screen coordinates?
Got a feeling I'm missing something really obvious though...
Thank you in advance

Ok, after quite a bit of looking around and trial and error; I found the solution.
When you call NuiTransformSkeletonToDepthImage(); you can specify a Kinect resolution using the NUI_IMAGE_RESOLUTION enum. I believe the values are; 80x60, 320x240, 640x480, 1280x960
Depending on what resolution you decide to use; you'll need to divide the returned x and y coords by the corresponding resolution. Then multiply those by your window size and you should have the coordinates in your window.
I believe you can also specify the resolution when creating an instance of the Kinect Sensor interface.
Example as my explanation isn't too concise:
float x, y
const Vector4 fromPos = skeleton.SkeletonPositions[NUI_SKELETON_POSITION_HAND_RIGHT];
NuiTransformSkeletonToDepthImage(fromPos, &x, &y, NUI_IMAGE_RESOLUTION_320x240);
x = x * ((float)WINDOW_WIDTH / 320.0f);
y = y * ((float)WINDOW_HEIGHT / 240.0f);
Any questions; feel free to ask.

Related

Getting 3D world coordinate from (x,y) pixel coordinates

I'm absolutely new to ROS/Gazebo world; this is probably a simple question, but I cannot find a good answer.
I have a simulated depth camera (Kinect) in a Gazebo scene. After some elaborations, I get a point of interest in the RGB image in pixel coordinates, and I want to retrieve its 3D coordinates in the world frame.
I can't understand how to do that.
I have tried compensating the distortions, given by the CameraInfo msg. I have tried using PointCloud with pcl library, retrieving the point as cloud.at(x,y).
In both cases, the coordinates are not correct (I have put a small sphere in the coords given out by the program, so to check if it's correct or no).
Every help would be very appreciated. Thank you very much.
EDIT:
Starting from the PointCloud, I try to find the coords of the points doing something like:
point = cloud.at(xInPixel, yInPixel);
point.x = point.x + cameraPos.x;
point.y = point.y + cameraPos.y;
point.z = point.z - cameraPos.z;
but the x,y,z coords I get as point.x seems not to be correct.
The camera has a pitch angle of pi/2, so to points on the ground.
I am clearly missing something.
I assume you've seen the gazebo examples for the kinect (brief, full). You can get, as topics, the raw image, raw depth, and calculated pointcloud (by setting them in the config):
<imageTopicName>/camera/color/image_raw</imageTopicName>
<cameraInfoTopicName>/camera/color/camera_info</cameraInfoTopicName>
<depthImageTopicName>/camera/depth/image_raw</depthImageTopicName>
<depthImageCameraInfoTopicName>/camera/dept/camera_info</depthImageCameraInfoTopicName>
<pointCloudTopicName>/camera/depth/points</pointCloudTopicName>
Unless you need to do your own things with the image_raw for rgb and depth frames (ex ML over rgb frame & find corresponding depth point via camera_infos), the pointcloud topic should be sufficient - it's the same as the pcl pointcloud in c++, if you include the right headers.
Edit (in response):
There's a magical thing in ros called tf/tf2. Your pointcloud, if you look at the msg.header.frame_id, says something like "camera", indicating it's in the camera frame. tf, like the messaging system in ros, happens in the background, but it looks/listens for transformations from one frame of reference to another frame. You can then transform/query the data in another frame. For example, if the camera is mounted at a rotation to your robot, you can specify a static transformation in your launch file. It seems like you're trying to do the transformation yourself, but you can make tf do it for you; this allows you to easily figure out where points are in the world/map frame, vs in the robot/base_link frame, or in the actuator/camera/etc frame.
I would also look at these ros wiki questions which demo a few different ways to do this, depending on what you want: ex1, ex2, ex3

Is it possible to obtain float touch coordinates in Windows?

I am working on a project in which I need to use high accuracy input coordinates for further calculation.
I call GetTouchInputInfo function in WM_TOUCH message handler to acquire PTOUCHINPUT variables, but the coordinates provided in this struct are indicated in hundredths of a pixel of physical screen coordinates. So basically the last 2 digits are always 0, and after divided by 100, these coordinates are just integer screen coordinates. I don't know why did they make it in hundredth.
Besides, does the touchscreen sensor gives continuous signal instead of discreet signal? As I know, there is a thing called touch prediction in Window, it creates a small touch input lag as it holds a few latest input points and do some prediction algorithm to fix these fresh inputs smooth and pretty. Even if the touchscreen can only sense discreet signal, the input should be float anyway after the prediction algorithm right?
I really couldn't find out how to obtain float touch input. Is there any way or any function to achieve this?
Thank you!
And sorry about my English.
To convert the scaled long coordinates to float do:
PTOUCHINPUT pt;
float xf = pt->x / 100.0;
float yf = pt->y / 100.0;

Kinect: getting the joint coordinates (x,y) in meters

Hi I understand that under the official documentation from Microsoft is that the cameraspacepoints obtained are already expressed in meters. However, I'm still uncertain if the values are in meters as the values are varying in 9 digits or more and positive and negative at times.
Can anybody explain to me what these values are and are they already in meters?
Function to process body (placed the output of the joint[j].position in it
Results
Do let me know if you all require additional information. I apologize as I'm new to kinect. Btw, this is kinect v2!
Each Joint has a Position property, which is a CameraSpecePoint. A CameraSpacePoint object has three properties, X, Y and Z, which are float32 and are expressed in meters.
As explained in the documentation:
Camera space refers to the 3D coordinate system used by Kinect. The coordinate system is defined as follows:
The origin (x=0, y=0, z=0) is located at the center of the IR sensor on Kinect
X grows to the sensor’s left
Y grows up (note that this direction is based on the sensor’s tilt)
Z grows out in the direction the sensor is facing
1 unit = 1 meter
Note also that you have used a %d in the sprintf method to print a float value, while %d should be only used for signed decimal integer (see the table in this page). If you want to print X, Y and Z values, you must use %f; so your sprintf should become like this:
sprint(abc, "Head is at %f on X-axis\n", joints[j].Position.X);

'Ray' creation for raypicking not fully working

I'm trying to implement a 'raypicker' for selecting objects within my project. I do not fully understand how to implement this, but I understand conceptually how it should work. I've been trying to learn how to do this, but most tutorials I find go way over my head. My current code is based on one of the recent tutorials I found, here.
After several hours of revisions, I believe the problem I'm having with my raypicker is actually the creation of the ray in the first place. If I substitute/hardcode my near/far planes with a coordinate that would undisputably be located within the region of a triangle, the picker identifies it correctly.
My problem is this: my ray creation doesn't seem to fully take my current "camera" or perspective into account, so camera rotation won't affect where my mouse is.
I believe to remedy this I need something like using gluUnProject() or something, but whenever I used this the x,y,z coordinates returned would be incredibly small,
My current ray creation is a mess. I tried to use methods that others proposed initially, but it seemed like whatever method I tried it never worked with my picker/intersection function.
Here's the code for my ray creation:
void oglWidget::mousePressEvent(QMouseEvent *event)
{
QVector3D nearP = QVector3D(event->x()+camX, -event->y()-camY, -1.0);
QVector3D farP = QVector3D(event->x()+camX, -event->y()-camY, 1.0);
int i = -1;
for (int x = 0; x < tileCount; x++)
{
bool rayInter = intersect(nearP, farP, tiles[x]->vertices);
if (rayInter == true)
i = x;
}
if (i != -1)
{
tiles[i]->showSelection();
}
else
{
for (int x = 0; x < tileCount; x++)
tiles[x]->hideSelection();
}
//tiles[0]->showSelection();
}
To repeat, I used to load up the viewport, model & projection matrices, and unproject the mouse coordinates, but within a 1920x1080 window, all I get is values in the range of -2 to 2 for x y & z for each mouse event, which is why I'm trying this method, but this method doesn't work with camera rotation and zoom.
I don't want to do pixel color picking, because who knows I may need this technique later on, and I'd rather not give up after the amount of effort I put in so far
As you seem to have problems constructing your rays, here's how I would do it. This has not been tested directly. You could do it like this, making sure that all vectors are in the same space. If you use multiple model matrices (or stacks thereof) the calculation needs to be repeated separately with each of them.
use pos = gluUnproject(winx, winy, near, ...) to get the position of the mouse coordinate on the near plane in model space; near being the value given to glFrustum() or gluPerspective()
origin of the ray is the camera position in model space: rayorig = inv(modelmat) * camera_in_worldspace
the direction of the ray is the normalized vector from the position from 1. to the ray origin: raydir = normalize(pos - rayorig)
On the website linked they use two points for the ray and they don't seem to normalize the ray direction vector, so this is optional.
Ok, so this is the beginning of my trail of breadcrumbs.
I was somehow having issues with the QT datatypes for the matrices, and the logic pertaining to matrix transformations.
This particular problem in this question resulted from not actually performing any transformations whatsoever.
Steps to solving this problem were:
Converting mouse coordinates into NDC space (within the range of -1 to 1: x/screen width * 2 - 1, y - height / height * 2 - 1)
grabbing the 4x4 matrix for my view matrix (can be the one used when rendering, or re calculated)
In a new vector, have it equal the inverse view matrix multiplied by the inverse projection matrix.
In order to build the ray, I had to do the following:
Take the previously calculated value for the matrices that were multiplied together. This will be multiplied by a vector 4 (array of 4 spots), where it will hold the previously calculated x and y coordinates, as well as -1, then +1.
Then this vector will be divided by the last spot value of the entire vector
Create another vector 4 which was just like the last, but instead of -1, put "1" .
Once again divide that by its last spot value.
Now the coordinates for the ray have been created at the far and near planes, so it can intersect with anything along it in the scene.
I opened a series of questions (because of great uncertainty with my series of problems), so parts of my problem overlap in them too.
In here, I learned that I needed to take the screen height into consideration for switching the origin of the y axis for a Cartesian system, since windows has the y axis start at the top left. Additionally, retrieval of matrices was redundant, but also wrong since they were never declared "properly".
In here, I learned that unProject wasn't working because I was trying to pull the model and view matrices using OpenGL functions, but I never actually set them in the first place, because I built the matrices by hand. I solved that problem in 2 fold: I did the math manually, and I made all the matrices of the same data type (they were mixed data types earlier, leading to issues as well).
And lastly, in here, I learned that my order of operations was slightly off (need to multiply matrices by a vector, not the reverse), that my near plane needs to be -1, not 0, and that the last value of the vector which would be multiplied with the matrices (value "w") needed to be 1.
Credits goes to those individuals who helped me solve these problems:
srobins of facepunch, in this thread
derhass from here, in this question, and this discussion
Take a look at
http://www.realtimerendering.com/intersections.html
Lot of help in determining intersections between various kinds of geometry
http://geomalgorithms.com/code.html also has some c++ functions one of them serves your purpose

2d rotation on set of points causes skewing distortion

I'm writing an application in OpenGL (though I don't think this problem is related to that). I have some 2d point set data that I need to rotate. It later gets projected into 3d.
I apply my rotation using this formula:
x' = x cos f - y sin f
y' = y cos f + x sin f
Where 'f' is the angle. When I rotate the point set, the result is skewed. The severity of the effect varies with the angle.
It's hard to describe so I have pictures;
The red things are some simple geometry. The 2d point sets are the vertices for the white polylines you see around them. The first picture shows the undistorted pointsets, and the second picture shows them after rotation. It's not just skew that's occuring with the rotation; sometimes it seems like displacement occurs as well.
The code itself is trivial:
double cosTheta = cos(2.4);
double sinTheta = sin(2.4);
CalcSimplePolyCentroid(listHullVx,xlate);
for(size_t j=0; j < listHullVx.size(); j++) {
// translate
listHullVx[j] = listHullVx[j] - xlate;
// rotate
double xPrev = listHullVx[j].x;
double yPrev = listHullVx[j].y;
listHullVx[j].x = ((xPrev*cosTheta) - (yPrev*sinTheta));
listHullVx[j].y = ((yPrev*cosTheta) + (xPrev*sinTheta));
// translate
listHullVx[j] = listHullVx[j] + xlate;
}
If I comment out the code under '//rotate' above, the output of the application is the first image. And adding it back in gives the second image. There's literally nothing else that's going on (afaik).
The data types being used are all doubles so I don't think its a precision issue. Does anyone have any idea why rotation would cause skewing like the above pictures show?
EDIT
filipe's comment below was correct. This probably has nothing to do with the rotation and I hadn't provided enough information for the problem;
The geometry I've shown in the pictures represents buildings. They're generated from lon/lat map coordinates. In the point data I use to do the transform, I forgot to use an actual projection to cartesian coordinate space and just mapped x->lon, y->lat, and I think this is the reason I'm seeing the distortion. I'm going to request that this question be deleted since I don't think it'll be useful to anyone else.
Update:
As a result of your comments it tunred out the it is unlikely that the bug is in the presented code.
One final other hint: std transform formulars are only valid if the cooridnate system is cartesian,
on ios you sometimes have inverted y Achsis.