Body tracking has low precision on Azure Kinect - C++

I am currently working on a project where the Azure Kinect is meant to be used as an interaction method for a 360° screen room.
When I use Microsoft's Body Tracking Viewer, the camera recognizes me well and precisely, but in my application the calculated distance (using distanceTo on each joint position vector) between my hand tip joint and thumb joint is all over the place. Without moving my hand it jumps from 10 mm to 120 mm.
I absolutely need this to be more precise so that things can be selected with the hand.
I suspect a problem in my camera startup configuration that makes it less precise than the Body Tracking Viewer, but I don't know where to look.
As I wasn't sure whether the default tracker config uses the lite dnn_model_2_0 or the "big" one, I tried setting it manually with tracker_config.model_path = "dnn_model_2_0_op11.onnx"; but got the same result. Switching to the narrow field of view instead of wide helps a little, but it stays very jittery. The left hand especially is all over the place; the right one is almost precise.
Here is the startup/config code from my C++ project:
[...]
MQTTClient client;
MQTTClient_connectOptions conn_opts = MQTTClient_connectOptions_initializer;
MQTTClient_message pubmsg = MQTTClient_message_initializer;
int main()
{
    k4a_device_t device = NULL;
    VERIFY(k4a_device_open(0, &device), "\nOpen K4A Device failed! Is the camera connected?");

    // Start camera. Make sure depth camera is enabled.
    k4a_device_configuration_t deviceConfig = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
    deviceConfig.depth_mode = K4A_DEPTH_MODE_WFOV_UNBINNED;
    deviceConfig.color_resolution = K4A_COLOR_RESOLUTION_OFF;
    deviceConfig.camera_fps = K4A_FRAMES_PER_SECOND_15;
    VERIFY(k4a_device_start_cameras(device, &deviceConfig), "\nStart K4A cameras failed!");

    k4a_calibration_t sensor_calibration;
    VERIFY(k4a_device_get_calibration(device, deviceConfig.depth_mode, deviceConfig.color_resolution, &sensor_calibration),
           "Get depth camera calibration failed!");

    k4abt_tracker_t tracker = NULL;
    k4abt_tracker_configuration_t tracker_config = K4ABT_TRACKER_CONFIG_DEFAULT;
    VERIFY(k4abt_tracker_create(&sensor_calibration, tracker_config, &tracker), "\nBody tracker initialization failed!");
    [...]
}
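For context, the per-frame distance calculation is done roughly like this (a simplified sketch of that part, not the exact code from my project; it continues the snippet above, error handling is omitted and only the right hand is shown):

k4a_capture_t capture = NULL;
if (k4a_device_get_capture(device, &capture, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED)
{
    k4abt_tracker_enqueue_capture(tracker, capture, K4A_WAIT_INFINITE);
    k4a_capture_release(capture);

    k4abt_frame_t body_frame = NULL;
    if (k4abt_tracker_pop_result(tracker, &body_frame, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED)
    {
        if (k4abt_frame_get_num_bodies(body_frame) > 0)
        {
            k4abt_skeleton_t skeleton;
            k4abt_frame_get_body_skeleton(body_frame, 0, &skeleton);

            // Joint positions are in millimeters, in depth-camera space.
            // skeleton.joints[i].confidence_level may also be worth checking for the hand joints.
            k4a_float3_t tip   = skeleton.joints[K4ABT_JOINT_HANDTIP_RIGHT].position;
            k4a_float3_t thumb = skeleton.joints[K4ABT_JOINT_THUMB_RIGHT].position;

            float dx = tip.xyz.x - thumb.xyz.x;
            float dy = tip.xyz.y - thumb.xyz.y;
            float dz = tip.xyz.z - thumb.xyz.z;
            float distance_mm = sqrtf(dx * dx + dy * dy + dz * dz); // sqrtf from <math.h>
        }
        k4abt_frame_release(body_frame);
    }
}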

Related

Using a MotionController Component in Unreal with C++ instead of Blueprint

After iterating through an array of FMotionControllerSource of an OculusInputDevice IMotionController, I found connected Oculus Right and Left Touch Controllers based on their ETrackingStatus. With the left and right controllers, I can get the location and rotation using the IMotionController API, which returns the calibration-space orientation of the requested controller's hand.
Here's a reference to the IMotionController API:
https://docs.unrealengine.com/en-US/API/Runtime/HeadMountedDisplay/IMotionController/index.html
I want to apply the location/rotation to a PosableMesh, so that the mesh is shown where the Oculus controller is in reality. Currently, with the code below, the 3D model is displayed down from the camera, so the mapping scale is off. I think WorldToMetersScale might be off. When I use a small number the controller doesn't move the 3D model much, but this might be messing it up.
FVector position;
FRotator rotation;
int id = tracker.deviceIndex;
FName srcName = tracker.motionControllerSource;
bool success = tracker.motionController->GetControllerOrientationAndPosition(id, srcName, rotation, position, 250.0f);
if (success)
{
    poseMesh->SetWorldLocationAndRotation(position, rotation);
}
Adding the camera position to the controller position seemed to fix the issue:
// get camera reference during BeginPlay:
camManager = GetWorld()->GetFirstPlayerController()->PlayerCameraManager;
// TickComponent
poseMesh->SetWorldLocationAndRotation(camManager->GetCameraLocation() + position, rotation);
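For completeness, if the mapping scale still looks off, the hard-coded 250.0f can be replaced with the engine's actual world-to-meters scale. A rough sketch (untested here; tracker, poseMesh and camManager are the same variables as above, and this assumes the code runs inside an actor or component with a valid world):

// Use the world settings' scale instead of a hard-coded value
// (default UE4 worlds use 100.0f, i.e. 1 Unreal unit = 1 cm).
const float worldToMeters = GetWorld()->GetWorldSettings()->WorldToMetersScale;

FVector position;
FRotator rotation;
bool success = tracker.motionController->GetControllerOrientationAndPosition(
    tracker.deviceIndex, tracker.motionControllerSource, rotation, position, worldToMeters);

if (success)
{
    // Controller pose is relative to the tracking origin, so offset by the camera as above.
    poseMesh->SetWorldLocationAndRotation(camManager->GetCameraLocation() + position, rotation);
}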

How to improve accuracy of estimateAffine2D (or estimateRigidTransform) in OpenCV?

I have two sets of points, one from time t-1 and one from the current time t. The first set was generated using goodFeaturesToTrack, and the latter using calcOpticalFlowPyrLK(). Using these two sets of points, I then estimate a transformation matrix via estimateAffinePartial2D() in order to keep track of its scale & rotation. The code snippet is listed below:
// Precompute image pyramids
maxLvl = cv::buildOpticalFlowPyramid(_imgPrev, imPyr1, _winSize, maxLvl, true);
maxLvl = cv::buildOpticalFlowPyramid(tmpImg, imPyr2, _winSize, maxLvl, true);
// Optical flow call for tracking pixels
cv::calcOpticalFlowPyrLK(imPyr1, imPyr2, _currentPoints, nextPts, status, err, _winSize, maxLvl, _terminationCriteria, 0, 0.000001);
// Get transformation matrix between the two data sets
cv::Mat H = cv::estimateAffinePartial2D(_currentPoints, nextPts, inlier_mask, cv::RANSAC, 10.0, 2000, 0.99);
Using H, I then map my masking points using perspectiveTransform(). The result seems accurate for the first few dozen frames, until I notice some drift (in terms of rotation) occurring when the object I am tracking continues to rotate (usually when the rotation becomes > M_PI). I'm honestly stumped on where the culprit is, but my main suspicion is that my window size for optical flow might be too small, or too big. However, tweaking the window size did not seem to help: the position of my object is still accurate, but the estimated rotation (and scale) got worse. Can anyone shed some light on this?
Warm regards and thanks.
EDIT: Images attached to show the drift issue: the starting frame, the first few frames (rotation OK), and a later frame where Z-rotation drift occurs (the anchor line has drifted towards the red rectangle).
The Lucas-Kanade tracker needs more features. I guess the tracking template you provided is not good enough.
(1) Try with other feature-rich real images, e.g. the OpenCV feature tracking template image.
(2) Fix the scale. Since you are doing a simulation, you can try to anchor the size first.
calcOpticalFlowPyrLK is widely used in visual-inertial state estimation work, such as semi-direct visual odometry (SVO) or VINS-Mono. You can look at the code inside those projects to see how other people play with the features and parameters.
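If the feature count is the issue, one common pattern (a sketch of the general idea, not code from the original post; the threshold and detector parameters are illustrative) is to keep only the points that survived LK tracking and re-detect corners with goodFeaturesToTrack when too few remain:

#include <opencv2/imgproc.hpp>
#include <vector>

// Keep the points LK tracked successfully and, when too few remain,
// top the set up with fresh corners so the next estimateAffinePartial2D
// call has enough correspondences to work with.
std::vector<cv::Point2f> refreshFeatures(const cv::Mat& grayFrame,
                                         const std::vector<cv::Point2f>& trackedPts,
                                         const std::vector<uchar>& status)
{
    std::vector<cv::Point2f> survivors;
    for (size_t i = 0; i < status.size(); ++i)
        if (status[i])
            survivors.push_back(trackedPts[i]);

    if (survivors.size() < 50)  // illustrative threshold
    {
        std::vector<cv::Point2f> fresh;
        cv::goodFeaturesToTrack(grayFrame, fresh, 500 /*maxCorners*/,
                                0.01 /*qualityLevel*/, 10 /*minDistance*/);
        survivors.insert(survivors.end(), fresh.begin(), fresh.end());
    }
    return survivors;
}

// Usage after the snippet in the question (variable names from there):
// _currentPoints = refreshFeatures(tmpImg, nextPts, status);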

How to convert the depth data from a Kinect 2.0 to a distance value?

I would like to take advantage of the depth sensor via the Kinect 2.0 SDK, but not in the sense that the data is drawn or displayed as an image; rather, I want an integer or something similar. For example, if I have my hand very close to the Kinect, I would get an integer value telling me approximately the range between the camera and the obstacle.
Maybe something like this, where as the obstacle moves the Kinect recalculates the distance and updates it maybe every second or half a second:
The distance between the kinect and the obstacle is 20 cm
The distance between the kinect and the obstacle is 10 cm
The distance between the kinect and the obstacle is 100 cm
Is that possible?
I searched for tutorials, but all I could find is that the data is usually represented as a point cloud or a black-and-white depth image.
Even though this question was asked some time ago and the original poster most likely solved it on their own, I just wanted to give everyone else who might have the same problem/question the C++ code to solve it.
Note: the solution is based on the Kinect 2.0 SDK and, if I remember correctly, I took it from one of the examples provided in the SDK Browser 2.0, which comes with the Kinect 2.0 SDK. I removed all the special modifications I had made and kept only the most important aspects (so you will most likely have to change the void function and give it a return value of some kind).
m_pDepthFrameReader is an initialized IDepthFrameReader*:
void KinectFusionProcessor::getDistance() {
    IDepthFrame* frame = NULL;
    if (SUCCEEDED(m_pDepthFrameReader->AcquireLatestFrame(&frame))) {
        // Kinect 2.0 depth frames are 512 x 424 pixels
        int width = 512;
        int height = 424;
        unsigned int sz = (512 * 424);
        unsigned short buf[512 * 424] = {0};
        frame->CopyFrameDataToArray(sz, buf);
        const unsigned short* curr = (const unsigned short*)buf;
        const unsigned short* dataEnd = curr + (width * height);
        while (curr < dataEnd) {
            // Get depth in millimeters
            unsigned short depth = (*curr++);
        }
    }
    if (frame) frame->Release();
}
The Kinect does indeed use a point cloud, and each pixel in the point cloud has an unsigned short integer value that represents the distance in mm from the Kinect. From what I can tell, the camera already compensates for objects at the sides of its view being further away than what is directly in front, so all the data represents the depth of those objects from the plane the Kinect's camera is viewing from.
The Kinect will see up to about 8 m, but only has reliable data between 4-5 m and can't see anything closer than 0.5 m.
Since it sounds like you're using it as an obstacle detection system, I'd suggest monitoring all the data points in the point cloud, averaging grouped data points that stand apart from the others, and interpreting each group as its own object with an approximate distance. The Kinect also updates at 30 frames per second (assuming your hardware can handle the intensive data stream), so you'll simply be constantly monitoring your point cloud for objects whose distance changes.
If you start by downloading both the SDK and Kinect Studio you can use the depth/IR programming examples and the studio to get a better understanding of how the data can be used.
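As an illustration of the averaging idea (a sketch only; the 512x424 layout matches the depth frame from the answer above, while the region bounds and the helper itself are my own assumptions):

#include <cstddef>

// Average the valid depth values (in mm) inside a rectangular region of a
// 512x424 Kinect 2.0 depth frame. A value of 0 means "no reading" and is skipped.
double averageDepthInRegion(const unsigned short* buf,
                            int frameWidth, int frameHeight,
                            int x0, int y0, int x1, int y1)
{
    long long sum = 0;
    long long count = 0;
    for (int y = y0; y < y1 && y < frameHeight; ++y) {
        for (int x = x0; x < x1 && x < frameWidth; ++x) {
            unsigned short d = buf[y * frameWidth + x];
            if (d != 0) {   // 0 = invalid / no depth reading
                sum += d;
                ++count;
            }
        }
    }
    return count > 0 ? static_cast<double>(sum) / count : 0.0;
}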
This is pretty straightforward. I do not know the exact code in C++, but in C#, once you have the depth frame you need to do the following:
I'll assume that you already know the X and Y point where you want to evaluate the depth value.
Once you know that, you first need to convert each byte of the depth frame into ushort.
After that, you need to calculate the index inside the depth array that corresponds to your X and Y point. Typically this is the equation used:
// Get the depth for this pixel
ushort depth = frameData[y * depthFrameDescription.Width + x];
I hope this can be helpful.
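Since the question asked for C++, the same lookup on the buffer from the earlier C++ answer would be roughly this (a sketch; buf is the array filled by CopyFrameDataToArray above, and x/y is the pixel you care about):

// Depth in millimeters at pixel (x, y) of the 512 x 424 depth frame,
// using the row-major layout: index = y * width + x.
const int frameWidth = 512;
unsigned short depthAtPixel = buf[y * frameWidth + x];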

C++ - OpenCV - Circular switch between cameras with cvCaptureFromCAM

My app could be executed on a computer with 0 or 100 cameras connected. I need code that switches cameras until the computer has no more cameras to use; in that case, the source should be 0 again. To implement that, I have used the following code:
CvCapture * capture = cvCaptureFromCAM(_source);
// Try to open capture and if it fails go to first camera
if (!capture) {
    _source = 0;
    capture = cvCaptureFromCAM(_source);
}
With this code, I want to try one source (for example 3) and, if the computer does not have 3 cameras, go to the first camera (source 0). The issue is that, even when the source is 5, cvCaptureFromCAM always returns a valid capture for the last camera used, never NULL, so I can never switch to 0 and grab from camera 0. Any idea how to implement this "circular" switch?
An option would be to get the number of cameras and do a modulo operation over that range, but as far as I know OpenCV doesn't have a method to get the number of available cameras.
"Last camera used" suggests that you did not in fact release that camera. Try releasing the old camera before switching to a new camera.

2D Movement Theory

I have recently been getting into OpenGL/SDL and playing around with objects in 2D/3D space and, as I'm sure most newbies to this area do, I have a few queries about the 'best' way to do something. I quote 'best' because I'm assuming there isn't a single best way; it's personal preference.
So, I have an entity, a simple GL_QUAD, which I want to move around. I have keyboard events set up; I can get the key press/release events, no problem.
I have an entity class for my GL_QUAD; pseudo-implementation:
class Entity
{
    void SetVelocity(float x, float y);
};
I then have this event handler code...
if theEvent.Key = UPARROW AND theEvent.State = PRESSED
    Entity.SetVelocity(0.0f, -1.0f);
else if theEvent.Key = UPARROW AND theEvent.State = RELEASED
    Entity.SetVelocity(0.0f, 0.0f);
My question is: is this the best way to move my entity? It has led me to think we could make it a little more complex by having separate methods for adjusting the X and Y velocity, since SetVelocity would forget my X velocity if I started moving left, so I could never travel diagonally.
For Example...
class Entity
{
    void SetXVelocity(float x);
    void SetYVelocity(float y);
};
if theEvent.Key = UPARROW AND theEvent.State = PRESSED
    Entity.SetYVelocity(-1.0f);
else if theEvent.Key = UPARROW AND theEvent.State = RELEASED
    Entity.SetYVelocity(0.0f);

if theEvent.Key = LEFTARROW AND theEvent.State = PRESSED
    Entity.SetXVelocity(-1.0f);
else if theEvent.Key = LEFTARROW AND theEvent.State = RELEASED
    Entity.SetXVelocity(0.0f);
This way, if I have an XVelocity and I then press the UPARROW, I will still have my XVelocity, as well as a new YVelocity, thus moving diagonally.
Is there a better way? Am I missing something very simple here?
I am using SDL 1.2, OpenGL, C++. Not sure if there is something in SDL/OpenGL which would help?
Thanks in advance for your answers.
The question is really general since it depends on how you want to model the movement of your objects in your world.
Usually every object has a velocity that is calculated based on an acceleration and capped to a maximum. This means that a key press alters the acceleration of the object for that frame, which is then applied to the object's current velocity.
This is done through an update phase that goes through all the objects and calculates the velocity change according to each object's acceleration. That way you don't modify the velocity yourself; you let your engine do the calculation based on the state of every object.
Acceleration is applied over a period of time, so in the example by @jack, you would apply an acceleration of 10 m/s² over a time period of one second.
You should also modify your application to make it time-based, not frame-based; see the sketch below.
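A minimal sketch of such a time-based update step (the cap value and member names are illustrative, not from the post):

#include <cmath>

// Time-based update: velocity is integrated from acceleration, position
// from velocity, both scaled by the elapsed time in seconds.
struct Entity2D
{
    float x = 0.0f,  y = 0.0f;   // position
    float vx = 0.0f, vy = 0.0f;  // velocity (units per second)
    float ax = 0.0f, ay = 0.0f;  // acceleration set from key events
    float maxSpeed = 5.0f;       // illustrative cap

    void Update(float dtSeconds)
    {
        vx += ax * dtSeconds;
        vy += ay * dtSeconds;

        // Cap the speed (length of the velocity vector).
        float speed = std::sqrt(vx * vx + vy * vy);
        if (speed > maxSpeed && speed > 0.0f)
        {
            vx = vx / speed * maxSpeed;
            vy = vy / speed * maxSpeed;
        }

        x += vx * dtSeconds;
        y += vy * dtSeconds;
    }
};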
Have a look at this basic game physics introduction, and I would also really recommend the GameDev.net Physics Tutorials.
I assume the way you want movement to work is that you want the player to move only when a key is held.
In short: your solution is fine.
One potential gotcha to take into consideration: what happens if both left and right are pressed?
Well, what you describe here is a simple finite state machine. You have the different directions in which you can move (plus no movement at all) as the states, and the key-events as transitions. This can usually be implemented quite well using the state pattern, but this is often quite painful in C++ (lots of boilerplate code), and might be over the top for your scenario.
There are of course other ways to represent speed and direction of your entity, e.g. as a 2D-vector (where the length gives the speed). This would enable you to easily represent arbitrary directions and velocities.
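For example, a velocity stored as a single 2D vector might look like this (a sketch; the key handling shown in the comments is just one way to combine directions):

// Velocity as a single 2D vector: direction and speed in one value
// (the length of the vector is the speed).
struct Vec2
{
    float x = 0.0f, y = 0.0f;
};

struct MovingEntity
{
    Vec2 position;
    Vec2 velocity;

    void Update(float dtSeconds)
    {
        position.x += velocity.x * dtSeconds;
        position.y += velocity.y * dtSeconds;
    }
};

// Key events just set per-axis contributions, so diagonal movement falls out naturally:
//   on UP pressed:   entity.velocity.y = -1.0f;
//   on LEFT pressed: entity.velocity.x = -1.0f;
//   on UP released:  entity.velocity.y = 0.0f;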