I would like to run skeletal tracking simultaneously from two Kinect cameras in SkeletalViewer and obtain the skeleton results. As far as I understand, Nui_Init() only starts the threads for the first Kinect (which I suppose is index 0). Could I have both skeletal trackers run at the same time, so that I can write their results into two separate text files simultaneously?
(e.g. Kinect 0 writes to "cam0.txt" while Kinect 1 writes to "cam1.txt")
Does anyone have experience with this case, or is anyone able to help?
Regards,
Eva
PS: The Kinect SDK documentation states:
If you are using multiple Kinect sensors, skeleton tracking works only on the first device that you Initialize. To switch the device being used to track, uninitialize the old one and initialize the new one.
So is it even possible to acquire the coordinates simultaneously? Or, if I have to acquire them one at a time, how should I address the sensors separately? (As far as I can tell, the index of the active Kinect will always be 0, so I can't differentiate them.)
I assume you are using the MS SkeletalViewer example. The problem with SkeletalViewer is that the display and the skeleton tracking are closely tied together, which makes it difficult to change.
Using multiple Kinect sensors should be possible; you just need to initialize all the sensors the same way. The best approach would be to define a sensor class that wraps a Kinect sensor. If you don't need the display, you can simply write a new program. That's a bit of work, but not that much: you can probably get a fully working program for multiple sensors in less than 100 lines, as sketched below. If you need the display, you can rewrite the SkeletalViewer example to use your sensor class, but that's more tedious.
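Something along these lines might work (an untested sketch, assuming the C++ Kinect for Windows SDK 1.x and NuiApi.h; the KinectWriter class, the file naming and the naive polling loop are just illustrative, and whether skeleton tracking actually runs on more than the first sensor depends on your SDK version):

#include <windows.h>
#include <NuiApi.h>
#include <cstdio>
#include <vector>

// Hypothetical wrapper: one sensor, one output file (e.g. "cam0.txt").
class KinectWriter
{
public:
    bool Init(int index)
    {
        if (FAILED(NuiCreateSensorByIndex(index, &m_pSensor))) return false;
        if (FAILED(m_pSensor->NuiInitialize(NUI_INITIALIZE_FLAG_USES_SKELETON))) return false;
        if (FAILED(m_pSensor->NuiSkeletonTrackingEnable(NULL, 0))) return false;
        char name[32];
        sprintf_s(name, "cam%d.txt", index);
        return fopen_s(&m_pFile, name, "w") == 0;
    }

    void Poll()   // call this in a loop (or from an event-driven thread)
    {
        if (!m_pSensor || !m_pFile) return;
        NUI_SKELETON_FRAME frame = {0};
        if (FAILED(m_pSensor->NuiSkeletonGetNextFrame(0, &frame))) return;
        for (int i = 0; i < NUI_SKELETON_COUNT; ++i)
        {
            const NUI_SKELETON_DATA& s = frame.SkeletonData[i];
            if (s.eTrackingState != NUI_SKELETON_TRACKED) continue;
            for (int j = 0; j < NUI_SKELETON_POSITION_COUNT; ++j)
            {
                const Vector4& p = s.SkeletonPositions[j];
                fprintf(m_pFile, "%d %d %f %f %f\n", i, j, p.x, p.y, p.z);
            }
        }
    }

private:
    INuiSensor* m_pSensor = nullptr;
    FILE*       m_pFile   = nullptr;
};

int main()
{
    int count = 0;
    NuiGetSensorCount(&count);
    std::vector<KinectWriter> sensors(count);
    for (int i = 0; i < count; ++i)
        sensors[i].Init(i);               // each sensor writes to its own camN.txt
    while (true)
        for (auto& s : sensors) s.Poll(); // naive polling loop
    return 0;
}

In SkeletalViewer terms, each KinectWriter replaces the global Nui_Init()/skeleton-thread logic for one sensor, and the index passed to Init is what distinguishes cam0.txt from cam1.txt.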
I am building a small game engine and using the FBX SDK to import FBX meshes and animations, which means I want to store the animation in my own class. There are two ways to achieve this:
The first way is to store only the key frames. If I store only the key frames, I keep the raw animation data from the FBX file, and I can manipulate or adjust the animation whenever I want.
The second way is to sample frames at a fixed rate. Instead of storing key frames, I evaluate the frames I need from the FBX file and store those. This way, I may lose the raw animation data, because the sampled frames may not coincide with the key frames, resulting in a minor loss of detail.
From my perspective, the first way is the better one, but most of the tutorials on the Internet use the second way. For example, here. Below is a snippet of it:
// Sample the cluster (bone) transform once per frame at 24 fps,
// building a linked list of Keyframe nodes.
for (FbxLongLong i = start.GetFrameCount(FbxTime::eFrames24); i <= end.GetFrameCount(FbxTime::eFrames24); ++i)
{
    FbxTime currTime;
    currTime.SetFrame(i, FbxTime::eFrames24);
    *currAnim = new Keyframe();
    (*currAnim)->mFrameNum = i;
    // Global transform of the mesh node at this time, including the geometric offset.
    FbxAMatrix currentTransformOffset = inNode->EvaluateGlobalTransform(currTime) * geometryTransform;
    // The bone (cluster link) transform expressed relative to the mesh node at this time.
    (*currAnim)->mGlobalTransform = currentTransformOffset.Inverse() * currCluster->GetLink()->EvaluateGlobalTransform(currTime);
    currAnim = &((*currAnim)->mNext);
}
Notice that EvaluateGlobalTransform(..) is an FBX SDK function, and it seems to be the only safe interface between us and an FBX animation that we can rely on. It also seems that the second way (using EvaluateGlobalTransform to sample at a fixed rate) is the standard, commonly accepted way to do the job. And there is an explanation that says:
"The FBX SDK gives you a number of ways to get the data you might
want. Unfortunately due to different DCC tools (Max/Maya/etc) you may
not be able to get exactly the data you want. For instance, let's say
you find the root bone and it has translation on it in the animation.
You can access the transform in a number of ways. You can use the
LclTransform property and ask for the FbxAMatrix at various times. Or
you can call the evaluation functions with a time to get the matrix.
Or you can use the evaluator's EvaluateNode function to evaluate the
node at a time. And finally, the most complicated version is you can
get the curve nodes from the properties and look at the curve's keys.
Given all those options, you might think getting the curves would be
the way to go. Unfortunately Maya, for instance, bakes the animation
data to a set of keys which have nothing to do with the keys actually
setup in Maya. The reason for this is that the curves Maya uses are
not the same as those FBX supports. So, even if you get the curves
directly, they may have hundreds of keys in them since they might have
been baked.
What this means is that basically unless Max has curves supported by
FBX, it may be baking them and you won't have a way to find what the
original two poses in your terms were. Generally you will iterate
through time and sample the scene at a fixed rate. Yup, it kinda sucks
and generally you'll want to simplify the data after sampling."
To sum up:
The first way:
pros: easy to manipulate and adjust, accurate detail, less memory consumption (if you generate the vertex transformation matrices on the fly)
cons: difficult to extract the key frames, not applicable to some FBX files
The second way:
pros: easy to get the sampled frames, works with all FBX files
cons: difficult to change the animation, larger memory consumption, less accurate detail
So, my questions are:
Is the second way really the common way to do this?
Which way do well-known game engines, like Unreal and Unity, use?
If I want to use the first way, even though it may not work in some circumstances, how can I read only the key frames from an FBX file (i.e. not using EvaluateGlobalTransform, but working with FbxAnimStack, FbxAnimLayer, FbxAnimCurveNode, FbxAnimCurve, FbxAnimCurveKey)?
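For context, the kind of curve-level access I mean looks roughly like this (an untested sketch that only reads the X-translation keys of one node; the Y/Z, rotation and scaling channels would be read the same way, and composing bone-local transforms is left out):

#include <fbxsdk.h>

// Sketch: walk the first animation stack/layer and print the raw keys of a
// node's local-translation X curve, assuming scene and node are already loaded.
void PrintTranslationXKeys(FbxScene* scene, FbxNode* node)
{
    FbxAnimStack* stack = scene->GetSrcObject<FbxAnimStack>(0);
    if (!stack) return;                                   // no animation at all
    FbxAnimLayer* layer = stack->GetMember<FbxAnimLayer>(0);
    if (!layer) return;

    FbxAnimCurve* curve =
        node->LclTranslation.GetCurve(layer, FBXSDK_CURVENODE_COMPONENT_X);
    if (!curve) return;                                   // property not animated

    for (int k = 0; k < curve->KeyGetCount(); ++k)
    {
        FbxTime t = curve->KeyGetTime(k);
        float   v = curve->KeyGetValue(k);
        FBXSDK_printf("key %d: t = %f s, x = %f\n", k, t.GetSecondDouble(), v);
    }
}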
I want to build a depth camera, i.e. one that can tell how far away the objects in an image are. I have already read the following links:
http://www.i-programmer.info/news/194-kinect/7641-microsoft-research-shows-how-to-turn-any-camera-into-a-depth-camera.html
https://jahya.net/blog/how-depth-sensor-works-in-5-minutes/
But I couldn't clearly understand what hardware is required and how to integrate it all together.
Thanks
Certainly, a depth sensor normally needs an IR sensor, just as in the Kinect, the Asus Xtion, and other available cameras that provide a depth or range image. However, Microsoft came up with machine learning techniques and algorithmic modifications to work around this; you can find the research here. Also, here is a video link that shows a mobile camera modified to produce depth. Still, some hardware changes might be necessary to turn a standalone 2D camera into such a device, so I would suggest you also look at the hardware design of the devices already on the market.
One way or the other, you need two views of the same points to get depth. So search for depth sensors and examples, e.g. Kinect with ROS or OpenCV, or here.
You could also turn two camera streams into a point cloud, but that's another story; a rough stereo sketch follows below.
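To illustrate the two-view idea, here is a minimal block-matching sketch with OpenCV; it assumes the two views are already rectified, and the file names and matcher parameters are placeholders:

#include <opencv2/opencv.hpp>

int main()
{
    // Rectified left/right views of the same scene (placeholder file names).
    cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);
    if (left.empty() || right.empty()) return 1;

    // Block matcher: the number of disparities must be a multiple of 16.
    cv::Ptr<cv::StereoBM> bm = cv::StereoBM::create(64, 21);
    cv::Mat disparity;
    bm->compute(left, right, disparity);           // 16-bit fixed-point disparity

    cv::Mat disp8;
    disparity.convertTo(disp8, CV_8U, 255.0 / (64 * 16.0));
    cv::imwrite("disparity.png", disp8);

    // With the 4x4 reprojection matrix Q from stereo calibration you could go
    // on to a point cloud: cv::reprojectImageTo3D(disparity, xyz, Q);
    return 0;
}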
Here's what I know:
3D Cameras
RGBD and stereoscopic cameras are popular for these applications but are not always practical or available. I've prototyped with Kinects (v1, v2) and Intel cameras (R200, D435). Those are certainly still the preferred option today.
2D Cameras
If you want to use RGB data for depth information, then you need an algorithm that does the math for each frame; try an RGB SLAM approach. A good algorithm will not process all the data every frame: it processes all the data once and then looks for clues that support evidence of changes in your scene. A number of big companies have already done this (it's not that difficult if you have a big team with big money): think Google, Apple, Microsoft, etc.
Good luck out there, make something amazing!
The title already says it, and weirdly enough, I'm unable to find information on this.
The Kinect 1 used projected IR patterns to get its depth data, as shown here:
https://www.youtube.com/watch?v=uq9SEJxZiUg
There you had the problem that you couldn't combine overlapping views from two Kinects, because their IR patterns weren't distinguishable. This could be worked around by letting one Kinect vibrate while the other one stays still (http://dl.acm.org/citation.cfm?id=2207676.2208335&coll=DL&dl=ACM&CFID=625209301&CFTOKEN=54555397).
Is that still the case for the Kinect 2 or did they change the way this works?
I'm studying the use of multiple cameras for computer vision applications, e.g. a camera in every corner of a room with the task of tracking humans. I would like to simulate this kind of environment. What I need is:
The ability to define a dynamic 3D environment, e.g. a room and a moving object.
Options to place cameras at different positions and get a simulated data set for each camera.
Does anyone have any experience with this? I checked out Blender (http://www.blender.org), but for now I'm looking for a faster, easier-to-use solution.
Could you point me to suitable software or libraries (preferably C++ or MATLAB)?
You may find that ILNumerics fits your needs perfectly:
http://ilnumerics.net
If I understand correctly, you are looking to simulate camera feeds from multiple cameras at different positions in an environment.
I don't know of any site or ready-made working solution, but here is how I would proceed:
Procure 3D point clouds of a dynamic environment (see the Kinect 3D SLAM benchmark datasets) or generate one of your own with a Kinect (assuming you have an Xbox Kinect at hand).
Once you have the Kinect point clouds in PCL point cloud format, you can simulate video feeds from various cameras.
Pseudocode such as this will suffice:
#include <pcl/io/pcd_io.h>
#include <pcl/point_types.h>
#include <pcl/common/transforms.h>
#include <vector>

// makeImage just discards the 3D depth information and fills the pixels with
// the RGB values -- like a snapshot in pcd_viewer of PCL (Point Cloud Library).
// Image, makeImage and saveImage are placeholders for your own rendering code.
struct Image { /* pixel buffer */ };
void makeImage(const pcl::PointCloud<pcl::PointXYZRGB>& cloud, Image& image);
void saveImage(const Image& image);

void simulateViews()
{
    pcl::PointCloud<pcl::PointXYZRGB>::Ptr pcd(new pcl::PointCloud<pcl::PointXYZRGB>);
    pcl::io::loadPCDFile("scene.pcd", *pcd);      // read the point cloud

    // One affine transform per simulated camera position (to be filled in).
    std::vector<Eigen::Affine3f, Eigen::aligned_allocator<Eigen::Affine3f>> camera_positions;

    for (const Eigen::Affine3f& camera_position : camera_positions)
    {
        pcl::PointCloud<pcl::PointXYZRGB> cloud_out;
        pcl::transformPointCloud(*pcd, cloud_out, camera_position);
        // Now cloud_out contains the point cloud seen from that viewpoint.
        Image image;
        makeImage(cloud_out, image);
        saveImage(image);
    }
}
PCL provides a function to transform a point cloud given the appropriate parameters: pcl::transformPointCloud().
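For example, the makeImage placeholder could be a simple pinhole projection, here writing into an OpenCV cv::Mat in place of the Image stub above; the intrinsics (fx, fy, cx, cy) and image size are guessed, not calibrated:

#include <opencv2/core.hpp>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>

// Project each 3D point onto a 640x480 image plane with assumed intrinsics
// and paint its RGB color; no occlusion handling or hole filling.
void makeImage(const pcl::PointCloud<pcl::PointXYZRGB>& cloud, cv::Mat& image)
{
    const float fx = 525.f, fy = 525.f, cx = 319.5f, cy = 239.5f;  // guessed values
    image = cv::Mat::zeros(480, 640, CV_8UC3);
    for (const pcl::PointXYZRGB& p : cloud.points)
    {
        if (p.z <= 0.f) continue;                        // behind the camera
        int u = static_cast<int>(fx * p.x / p.z + cx);
        int v = static_cast<int>(fy * p.y / p.z + cy);
        if (u < 0 || u >= image.cols || v < 0 || v >= image.rows) continue;
        image.at<cv::Vec3b>(v, u) = cv::Vec3b(p.b, p.g, p.r);     // OpenCV is BGR
    }
}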
If you prefer not to use PCL, then you may want to check this post and then follow the remaining steps.
I am new to image processing and am writing a small application in which I need to count the number of people entering a store. The entry point is fixed, and there are 4 camera feeds combined in the same video to do the counting. What can I use to do this?
So far I have used running average and background subtraction, which gives me only the parts of the image that contain a person. How do I use this for counting? I am using OpenCV with C++.
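For reference, a simplified sketch of my current pipeline (using OpenCV's MOG2 subtractor here instead of my running-average code; the video name and the area threshold are arbitrary placeholders, and it only counts blobs per frame rather than entries):

#include <opencv2/opencv.hpp>
#include <vector>

int main()
{
    cv::VideoCapture cap("entrance.avi");                  // placeholder video file
    cv::Ptr<cv::BackgroundSubtractorMOG2> subtractor =
        cv::createBackgroundSubtractorMOG2(500, 16.0, true);

    cv::Mat frame, fgMask;
    while (cap.read(frame))
    {
        subtractor->apply(frame, fgMask);                  // foreground mask
        cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY);   // drop shadow pixels
        cv::morphologyEx(fgMask, fgMask, cv::MORPH_OPEN,
                         cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5)));

        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(fgMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        int blobs = 0;
        for (const auto& c : contours)
            if (cv::contourArea(c) > 2000.0)               // arbitrary "person-sized" area
                ++blobs;
        // blobs is only a per-frame count; counting *entries* still needs tracking,
        // e.g. following each blob's centroid across a virtual line at the door.
    }
    return 0;
}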
Thanks!
If you have multiple video streams at your disposal, you can calibrate your system to create a passive stereo framework.
I've already seen a lot of work on this topic, like this one:
http://www.dis.uniroma1.it/~iocchi/publications/iocchi-ie05.pdf
You can also take a look at this question:
People counting using OpenCV
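For the calibration step, a rough sketch of what pairing two views with OpenCV's checkerboard-based stereo calibration might look like; the image names, board geometry and number of pairs are placeholders:

#include <opencv2/opencv.hpp>
#include <cstdio>
#include <vector>

int main()
{
    const cv::Size boardSize(9, 6);       // inner corners of the checkerboard
    const float squareSize = 0.025f;      // 2.5 cm squares (placeholder)

    // 3D board corners in the board's own coordinate frame (z = 0 plane).
    std::vector<cv::Point3f> board;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            board.emplace_back(x * squareSize, y * squareSize, 0.f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> points1, points2;
    cv::Size imageSize;

    for (int i = 0; i < 20; ++i)          // 20 synchronized image pairs (placeholder)
    {
        cv::Mat im1 = cv::imread(cv::format("cam1_%02d.png", i), cv::IMREAD_GRAYSCALE);
        cv::Mat im2 = cv::imread(cv::format("cam2_%02d.png", i), cv::IMREAD_GRAYSCALE);
        if (im1.empty() || im2.empty()) continue;
        imageSize = im1.size();

        std::vector<cv::Point2f> c1, c2;
        if (cv::findChessboardCorners(im1, boardSize, c1) &&
            cv::findChessboardCorners(im2, boardSize, c2))
        {
            objectPoints.push_back(board);
            points1.push_back(c1);
            points2.push_back(c2);
        }
    }

    // Calibrate each camera individually, then solve for the relative pose:
    // R and T describe camera 2 with respect to camera 1.
    cv::Mat K1, D1, K2, D2, R, T, E, F;
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, points1, imageSize, K1, D1, rvecs, tvecs);
    cv::calibrateCamera(objectPoints, points2, imageSize, K2, D2, rvecs, tvecs);
    double rms = cv::stereoCalibrate(objectPoints, points1, points2,
                                     K1, D1, K2, D2, imageSize,
                                     R, T, E, F, cv::CALIB_FIX_INTRINSIC);
    std::printf("stereo calibration RMS error: %f\n", rms);
    return 0;
}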