Query maximum webcam resolution in OpenCV - c++

I'm dealing with several types of cameras and I need to know the maximum resolution each one is capable of.
Is there a way to query such a property in OpenCV?
If not, is there any other way? The application will run under Windows (for the moment) and the whole project is being developed in C++.

A trick that's working for me:
Just set a very high resolution (above the capabilities of any usual capture device), then get the current resolution.
You will see that the device automatically switches to its maximum value.
Code example in Python with OpenCV 3.0:
import cv2

# Request an absurdly large resolution; the driver clamps it to the maximum it supports.
HIGH_VALUE = 10000
WIDTH = HIGH_VALUE
HEIGHT = HIGH_VALUE

capture = cv2.VideoCapture(0)
fourcc = cv2.VideoWriter_fourcc(*'XVID')  # codec for later recording; not needed for the query itself
capture.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT)

# Read back the resolution the device actually switched to.
width = int(capture.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT))
Hope it helps.

FINAL SOLUTION
As the accepted answer by user2949634 was written in Python, I'm posting the equivalent implementation in C++ for completeness:
void query_maximum_resolution(cv::VideoCapture* camera, int& max_width, int& max_height)
{
    // Save the current resolution
    const int current_width  = static_cast<int>(camera->get(CV_CAP_PROP_FRAME_WIDTH));
    const int current_height = static_cast<int>(camera->get(CV_CAP_PROP_FRAME_HEIGHT));

    // Request an oversized resolution and read back what the device clamps it to
    camera->set(CV_CAP_PROP_FRAME_WIDTH, 10000);
    camera->set(CV_CAP_PROP_FRAME_HEIGHT, 10000);
    max_width  = static_cast<int>(camera->get(CV_CAP_PROP_FRAME_WIDTH));
    max_height = static_cast<int>(camera->get(CV_CAP_PROP_FRAME_HEIGHT));

    // Restore the original resolution
    camera->set(CV_CAP_PROP_FRAME_WIDTH, current_width);
    camera->set(CV_CAP_PROP_FRAME_HEIGHT, current_height);
}

VideoCapture::get(int propId)
Passing in CV_CAP_PROP_FRAME_WIDTH and CV_CAP_PROP_FRAME_HEIGHT will get you the resolution.
As for getting the maximum possible resolution: all of the functionality for cv::VideoCapture is documented at that link, and there does not seem to be a direct way to do it, probably because many cameras expect you to know the possible resolutions from the manual and to set some flags to toggle what you want. One thing you can try is to keep a list of common resolutions, try each of them on the camera with VideoCapture::set, and check whether the request actually took effect. There aren't many resolutions to search, so this is viable.

Searching on the same topic myself:
As previously answered, there is no direct property, but you can discover the supported resolutions by probing which ones the camera accepts:
trying all common resolutions
probing from a minimum resolution and increasing the width/height
Here is a C++ code sample tested with OpenCV 2.4.8 on Windows.
Trying common resolutions:
const CvSize CommonResolutions[] = {
    cvSize(120, 90),
    cvSize(352, 240),
    cvSize(352, 288),
    // and so on
    cvSize(8192, 4608)
};

vector<CvSize> getSupportedResolutions(VideoCapture camera)
{
    vector<CvSize> supportedVideoResolutions;
    int nbTests = sizeof(CommonResolutions) / sizeof(CommonResolutions[0]);
    for (int i = 0; i < nbTests; i++) {
        CvSize test = CommonResolutions[i];
        // Try to set the resolution
        camera.set(CV_CAP_PROP_FRAME_WIDTH, test.width);
        camera.set(CV_CAP_PROP_FRAME_HEIGHT, test.height);
        // Keep it only if the camera actually switched to it
        double width = camera.get(CV_CAP_PROP_FRAME_WIDTH);
        double height = camera.get(CV_CAP_PROP_FRAME_HEIGHT);
        if (test.width == width && test.height == height) {
            supportedVideoResolutions.push_back(test);
        }
    }
    return supportedVideoResolutions;
}
Probing solution based on width increments:
vector<CvSize> getSupportedResolutionsProbing(VideoCapture camera)
{
    vector<CvSize> supportedVideoResolutions;
    int step = 100;
    int minimumWidth = 16;           // Microvision
    int maximumWidth = 1920 + step;  // 1080p
    CvSize currentSize = cvSize(minimumWidth, 1);
    CvSize previousSize = currentSize;
    while (1) {
        // Request the probe size; the camera snaps to the nearest resolution it supports
        camera.set(CV_CAP_PROP_FRAME_WIDTH, currentSize.width);
        camera.set(CV_CAP_PROP_FRAME_HEIGHT, currentSize.height);
        CvSize cameraResolution = cvSize(
            (int)camera.get(CV_CAP_PROP_FRAME_WIDTH),
            (int)camera.get(CV_CAP_PROP_FRAME_HEIGHT));
        // A newly reported resolution is a supported mode we have not recorded yet
        if (cameraResolution.width != previousSize.width
            || cameraResolution.height != previousSize.height)
        {
            supportedVideoResolutions.push_back(cameraResolution);
            currentSize = previousSize = cameraResolution;
        }
        currentSize.width += step;
        if (currentSize.width > maximumWidth)
        {
            break;
        }
    }
    return supportedVideoResolutions;
}
I hope this will be helpful for future users.

On Ubuntu, install:
sudo apt install v4l-utils
and then run:
v4l2-ctl -d /dev/video0 --list-formats-ext
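If you need the same information from inside a C++ program, one option is simply to shell out to that tool and parse its output. A minimal sketch (it assumes v4l2-ctl is installed and /dev/video0 is the device you care about; listFormats is just an illustrative helper name, and parsing of the "Size: Discrete WxH" lines is left to you):
#include <cstdio>
#include <string>

// Run v4l2-ctl and return its raw output; parse the resolution lines as needed.
std::string listFormats(const std::string& device = "/dev/video0")
{
    std::string cmd = "v4l2-ctl -d " + device + " --list-formats-ext";
    std::string output;
    if (FILE* pipe = popen(cmd.c_str(), "r"))
    {
        char buffer[256];
        while (fgets(buffer, sizeof(buffer), pipe))
            output += buffer;
        pclose(pipe);
    }
    return output;
}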

These kinds of hardware capabilities can be queried from USB devices if the camera is UVC compliant. It depends on the driver/firmware of the device. See, for example, these Microsoft requirements to get an idea of what kind of support you can expect on Windows platforms.

Related

SDL2 How to position a window on a second monitor?

I am using SDL_SetWindowPosition to position my window. Can I use this function to position my window on another monitor?
UPDATE
Using SDL_GetDisplayBounds will not return the correct monitor positions when the text size is changed in Windows 10. Any ideas how to fix this?
SDL2 uses a global screen space coordinate system. Each display device has its own bounds inside this coordinate space. The following example places a window on a second display device:
// enumerate displays
int displays = SDL_GetNumVideoDisplays();
assert( displays > 1 ); // assume we have a secondary monitor

// get display bounds for all displays
std::vector< SDL_Rect > displayBounds;
for( int i = 0; i < displays; i++ ) {
    displayBounds.push_back( SDL_Rect() );
    SDL_GetDisplayBounds( i, &displayBounds.back() );
}

// window of dimensions 500 * 500, offset 100 pixels into the secondary monitor
int x = displayBounds[ 1 ].x + 100;
int y = displayBounds[ 1 ].y + 100;
int w = 500;
int h = 500;

// so now x and y are on the secondary display
SDL_Window * window = SDL_CreateWindow( "title", x, y, w, h, FLAGS... );
Looking at the definition of SDL_WINDOWPOS_CENTERED in SDL_video.h we see it is defined as
#define SDL_WINDOWPOS_CENTERED SDL_WINDOWPOS_CENTERED_DISPLAY(0)
so we could also use the macro SDL_WINDOWPOS_CENTERED_DISPLAY( n ) where n is the display index.
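For example, something along these lines should center a 500x500 window directly on the second display (index 1):
// Center a 500x500 window on display index 1 (the second monitor).
SDL_Window* window = SDL_CreateWindow(
    "title",
    SDL_WINDOWPOS_CENTERED_DISPLAY( 1 ),
    SDL_WINDOWPOS_CENTERED_DISPLAY( 1 ),
    500, 500,
    SDL_WINDOW_SHOWN );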
Update for Windows 10 - DPI scaling issue
It seems like there is indeed a bug with SDL2 and changing the DPI scale in Windows (i.e. text scale).
Here are two bug reports relevant to the problem. They are both still apparently unresolved.
https://bugzilla.libsdl.org/show_bug.cgi?id=3433
https://bugzilla.libsdl.org/show_bug.cgi?id=2713
Potential Solution
The OP could use the Win32 API to determine the DPI scale (for scale factors other than 100%) and then correct the bounds by that.
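A sketch of that idea (untested; it assumes Windows 8.1+, linking against Shcore.lib, and that matching an SDL display index to an HMONITOR via the top-left corner of its bounds is good enough, which SDL does not guarantee):
#include <windows.h>
#include <shellscalingapi.h>   // GetDpiForMonitor; link with Shcore.lib
#include <SDL.h>

// Return the DPI scale (1.0 == 100% text size) of the monitor an SDL display index maps to.
float displayScale(int displayIndex)
{
    SDL_Rect bounds;
    SDL_GetDisplayBounds(displayIndex, &bounds);

    POINT pt = { bounds.x, bounds.y };
    HMONITOR monitor = MonitorFromPoint(pt, MONITOR_DEFAULTTONEAREST);

    UINT dpiX = 96, dpiY = 96;
    GetDpiForMonitor(monitor, MDT_EFFECTIVE_DPI, &dpiX, &dpiY);
    return dpiX / 96.0f;   // e.g. 1.5f for 150% scaling
}
The returned factor could then be used to correct the bounds reported by SDL_GetDisplayBounds.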
DPI scaling issue ("will not return the correct monitor positions when the text size is changed")
It's a known issue with SDL2 (I encountered it in versions 2.0.6, 2.0.7, and 2.0.8; older versions probably have this issue as well).
Solutions:
1) Use a manifest file and set there:
<dpiAware>True/PM</dpiAware>
(you need to include the manifest file in your app distribution)
2) Try SetProcessDPIAware().
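A minimal sketch of option 2; the important part is that the call happens early, before SDL creates any windows:
#include <windows.h>
#include <SDL.h>

int main(int argc, char* argv[])
{
    // Opt the process out of DPI virtualization before SDL_Init / window creation.
    SetProcessDPIAware();
    SDL_Init(SDL_INIT_VIDEO);
    // ... create windows as usual ...
    SDL_Quit();
    return 0;
}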
Yes, you can use SDL_SetWindowPosition if you know the boundaries of the second monitor.
You can use the function SDL_GetDisplayBounds(int displayIndex, SDL_Rect* rect) to get them.

OpenCV 3.1 Stitch images in order they were taken

I am building an Android app to create panoramas. The user captures a set of images, and those images
are sent to my native stitch function, which was based on https://github.com/opencv/opencv/blob/master/samples/cpp/stitching_detailed.cpp.
Since the images are in order, I would like to match each image only to the next image in the vector.
I found an Intel article that was doing just that with the following code:
vector<MatchesInfo> pairwise_matches;
BestOf2NearestMatcher matcher(try_gpu, match_conf);
Mat matchMask(features.size(), features.size(), CV_8U, Scalar(0));
for (int i = 0; i < num_images - 1; ++i)
{
    matchMask.at<char>(i, i + 1) = 1;
}
matcher(features, pairwise_matches, matchMask);
matcher.collectGarbage();
Problem is, this won't compile. I'm guessing it's because I'm using OpenCV 3.1.
Then I found somewhere that this code would do the same:
int range_width = 2;
BestOf2NearestRangeMatcher matcher(range_width, try_cuda, match_conf);
matcher(features, pairwise_matches);
matcher.collectGarbage();
And for most of my samples this works fine. However, sometimes, especially when I'm stitching
a large set of images (around 15), some objects appear on top of each other and in places they shouldn't be.
I've also noticed that the "beginning" (left side) of the end result is not the first image in the vector either,
which is strange.
I am using "orb" as features_type and "ray" as ba_cost_func. It seems I can't use SURF on OpenCV 3.1.
The rest of my initial parameters look like this:
bool try_cuda = false;
double compose_megapix = -1; //keeps resolution for final panorama
float match_conf = 0.3f; //0.3 default for orb
string ba_refine_mask = "xxxxx";
bool do_wave_correct = true;
WaveCorrectKind wave_correct = detail::WAVE_CORRECT_HORIZ;
int blend_type = Blender::MULTI_BAND;
float blend_strength = 5;
double work_megapix = 0.6;
double seam_megapix = 0.08;
float conf_thresh = 0.5f;
int expos_comp_type = ExposureCompensator::GAIN_BLOCKS;
string seam_find_type = "dp_colorgrad";
string warp_type = "spherical";
So could anyone enlighten me as to why this is not working and how I should match my features? Any help or direction would be much appreciated!
TL;DR: I want to stitch images in the order they were taken, but the above code is not working for me. How can I do that?
So I found out that the issue here is not the order in which the images are stitched, but rather the rotation estimated for the camera parameters in the Homography Based Estimator and the Bundle Ray Adjuster.
Those rotation angles are estimated assuming a self-rotating camera, whereas my use case involves a user rotating the camera (which means there will be some translation too).
Because of that (I guess), the horizontal angles (around the Y axis) are highly overestimated, which means the algorithm considers the set of images to cover >= 360 degrees, resulting in overlapped areas that shouldn't be overlapped.
I still haven't found a solution for that problem though.
matcher() takes a UMat as the mask instead of a Mat object, so try the following code:
vector<MatchesInfo> pairwise_matches;
BestOf2NearestMatcher matcher(try_gpu, match_conf);
Mat matchMask(features.size(), features.size(), CV_8U, Scalar(0));
for (int i = 0; i < num_images - 1; ++i)
{
    matchMask.at<char>(i, i + 1) = 1;
}
UMat umask = matchMask.getUMat(ACCESS_READ);
matcher(features, pairwise_matches, umask);
matcher.collectGarbage();

NVenc's output bitstream is not readable

I have a question related to Nvidia's NVenc API. I want to use the API to encode some OpenGL graphics. My problem is that the API reports no error throughout the whole program; everything seems to be fine. But the generated output is not readable by, e.g., VLC. If I try to play the generated file, VLC flashes a black screen for about 0.5 s, then ends the playback.
The video has a length of 0, and the size of the file seems rather small, too.
The resolution is 1280*720 and the size of a 5 second recording is only 700 kB. Is this realistic?
The flow of the application is as following:
Render to secondary Framebuffer
Download Framebuffer to one of two PBOs (glReadPixels())
Map the PBO of the previous frame to get a pointer understandable by CUDA.
Call a simple CUDA kernel converting OpenGL's RGBA to ARGB, which should be understandable by NVenc according to this (p. 18). The kernel reads the content of the PBO and writes the converted content into a CUDA array (created with cudaMalloc) which is registered as an InputResource with NVenc.
The content of the converted Array gets encoded. A completion event plus the corresponding output bitstream buffer get queued.
A secondary thread listens on the queued output events, if one event is signaled, the Output Bitstream gets mapped and written to hdd.
The initialization of the NVenc encoder:
InitParams* ip = new InitParams();
m_initParams = ip;
memset(ip, 0, sizeof(InitParams));
ip->version = NV_ENC_INITIALIZE_PARAMS_VER;
ip->encodeGUID = m_encoderGuid; //Used Codec
ip->encodeWidth = width; // Frame Width
ip->encodeHeight = height; // Frame Height
ip->maxEncodeWidth = 0; // Zero means no dynamic res changes
ip->maxEncodeHeight = 0;
ip->darWidth = width; // Aspect Ratio
ip->darHeight = height;
ip->frameRateNum = 60; // 60 fps
ip->frameRateDen = 1;
ip->reportSliceOffsets = 0; // According to programming guide
ip->enableSubFrameWrite = 0;
ip->presetGUID = m_presetGuid; // Used Preset for Encoder Config
NV_ENC_PRESET_CONFIG presetCfg; // Load the Preset Config
memset(&presetCfg, 0, sizeof(NV_ENC_PRESET_CONFIG));
presetCfg.version = NV_ENC_PRESET_CONFIG_VER;
presetCfg.presetCfg.version = NV_ENC_CONFIG_VER;
CheckApiError(m_apiFunctions.nvEncGetEncodePresetConfig(m_Encoder,
m_encoderGuid, m_presetGuid, &presetCfg));
memcpy(&m_encodingConfig, &presetCfg.presetCfg, sizeof(NV_ENC_CONFIG));
// And add information about Bitrate etc
m_encodingConfig.rcParams.averageBitRate = 500000;
m_encodingConfig.rcParams.maxBitRate = 600000;
m_encodingConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_MODE::NV_ENC_PARAMS_RC_CBR;
ip->encodeConfig = &m_encodingConfig;
ip->enableEncodeAsync = 1; // Async Encoding
ip->enablePTD = 1; // Encoder handles picture ordering
Registration of the CUDA resource:
m_cuContext->SetCurrent(); // Make the clients cuCtx current
NV_ENC_REGISTER_RESOURCE res;
memset(&res, 0, sizeof(NV_ENC_REGISTER_RESOURCE));
NV_ENC_REGISTERED_PTR resPtr; // handle to the cuda resource for future use
res.bufferFormat = m_inputFormat; // Format is ARGB
res.height = m_height;
res.width = m_width;
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
res.resourceToRegister = (void*) (uintptr_t) resourceToRegister; //CUdevptr to resource
res.resourceType =
NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
res.version = NV_ENC_REGISTER_RESOURCE_VER;
CheckApiError(m_apiFunctions.nvEncRegisterResource(m_Encoder, &res));
m_registeredInputResources.push_back(res.registeredResource);
Encoding
m_cuContext->SetCurrent(); // Make Clients context current
MapInputResource(id); //Map the CudaInputResource
NV_ENC_PIC_PARAMS temp;
memset(&temp, 0, sizeof(NV_ENC_PIC_PARAMS));
temp.version = NV_ENC_PIC_PARAMS_VER;
unsigned int currentBufferAndEvent = m_counter % m_registeredEvents.size(); //Counter is inc'ed in every Frame
temp.bufferFmt = m_currentlyMappedInputBuffer.mappedBufferFmt;
temp.inputBuffer = m_currentlyMappedInputBuffer.mappedResource; //got set by MapInputResource
temp.completionEvent = m_registeredEvents[currentBufferAndEvent];
temp.outputBitstream = m_registeredOutputBuffers[currentBufferAndEvent];
temp.inputWidth = m_width;
temp.inputHeight = m_height;
temp.inputPitch = m_width;
temp.inputTimeStamp = m_counter;
temp.pictureStruct = NV_ENC_PIC_STRUCT_FRAME; // According to samples
temp.qpDeltaMap = NULL;
temp.qpDeltaMapSize = 0;
EventWithId latestEvent(currentBufferAndEvent,
m_registeredEvents[currentBufferAndEvent]);
PushBackEncodeEvent(latestEvent); // Store the Event with its ID in a Queue
CheckApiError(m_apiFunctions.nvEncEncodePicture(m_Encoder, &temp));
m_counter++;
UnmapInputResource(id); // Unmap
Every little hint about where to look is very much appreciated. I'm running out of ideas about what might be wrong.
Thanks a lot!
With the help of hall822 from the NVIDIA forums I managed to solve the issue.
The primary error was that I registered my CUDA resource with a pitch equal to the width of the frame (in pixels). I'm using a framebuffer renderbuffer to draw my content into; its data is a plain, unpitched array. My first thought, giving a pitch of zero, failed: the encoder did nothing. The next idea was to set it to the width of the frame; with that, only a quarter of the image was encoded.
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
To answer that question: yes, it is correct. But the pitch is measured in bytes, so because I'm encoding RGBA frames, the correct pitch has to be FRAME_WIDTH * 4.
The second error was that my colour channels were not right (see point 4 in my opening post). The NVIDIA enum says that the encoder expects the channels in ARGB format, but what is actually meant is BGRA, so the alpha channel, which is always 255, polluted the blue channel.
Edit: This may be due to the fact that NVIDIA uses little endian internally. I'm writing
my pixel data to a byte array; choosing another type like int32 may allow one to pass actual ARGB data.
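For illustration, here is roughly what the two fixes amount to. This is a sketch, not the exact code from my project: the kernel name, the launch configuration, and the uchar4-based layout are assumptions.
#include <cuda_runtime.h>

// glReadPixels delivers RGBA bytes, but the encoder's "ARGB" input format effectively
// expects BGRA bytes in memory, so swap the red and blue channels per pixel.
__global__ void rgbaToBgra(const uchar4* src, uchar4* dst, int numPixels)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < numPixels)
    {
        uchar4 p = src[idx];                          // p.x = R, p.y = G, p.z = B, p.w = A
        dst[idx] = make_uchar4(p.z, p.y, p.x, p.w);   // stored as B, G, R, A
    }
}

// And the pitch fix: NV_ENC_REGISTER_RESOURCE::pitch is in bytes, so for 32-bit pixels:
// res.pitch = m_width * 4;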

Why do I get different behaviour in the Codename One simulator than on a real Android device?

I am trying to figure out why I get different behaviour in the simulator (iPhone, Nexus, Nexus5, ... skins) vs. on a real Android device with the following code (my goal is to draw text over a background image and save the result at the background image's resolution):
Please note that the GUI was done with the Designer.
protected void beforeMain(Form f) {
    // The drawing label will contain the whole photo montage
    f.setLayout(new LayeredLayout());
    final Label drawing = new Label();
    f.addComponent(drawing);
    String nom = "Hello World";
    // Mutable image we will draw into (white background by default)
    // synthe is an Image
    Image mutableImage = Image.createImage(synthe.getWidth(), synthe.getHeight());
    drawing.getUnselectedStyle().setBgImage(mutableImage);
    drawing.getUnselectedStyle().setBackgroundType(Style.BACKGROUND_IMAGE_SCALED_FIT);
    // Draws over the background image and puts it all together on the mutable image.
    paintSyntheOnBackground(mutableImage.getGraphics(),
            synthe,
            nom,
            synthe.getWidth(),
            synthe.getHeight());
    long time = new Date().getTime();
    OutputStream os;
    try {
        os = Storage.getInstance().createOutputStream("screenshot_" + Long.toString(time) + ".png");
        ImageIO.getImageIO().save(mutableImage, os, ImageIO.FORMAT_PNG, 1.0f);
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
} // end of beforeMain
And here is the method I call to draw text over an image:
public void paintSyntheOnBackground(Graphics g,
        Image synthe,
        final String pNom,
        int width, int height) {
    Font myFont = g.getFont();
    g.setFont(myFont);
    int w = myFont.stringWidth(pNom);
    int h = myFont.getHeight();
    // Added just to see the background
    g.setColor(0x0000FF);
    g.fillRect(0, 0, width, height);
    g.setColor(0xff0000);
    int x = g.getTranslateX() + width / 2 - w / 2;
    int y = g.getTranslateY() + height / 2 - h / 2;
    g.drawRect(x, y, w, h);
    g.drawString(pNom, x, y);
} // end of paintSyntheOnBackground
Here is the outcome on the simulator (GoogleNexus7):
And here is the outcome on the device (Android 4.4):
My development system features Eclipse on Linux with Codename One V3-4.
I know the simulator cannot reproduce every specific case, but there is nothing special here, is there? What can I do to make the behaviour of the simulator reflect the real behaviour, since it would be much handier to test in the simulator?
EDIT: After upgrading each of my CN1 project libs from version 114 to 115 (see this question for details on how to do the upgrade), I am now able to get the same behaviour in both the simulator and the device! Great bug fixing job, CN1 team!
Please note: in my case (Eclipse - Linux) I had to upgrade the project libs in each and every Codename One project.
Any help would be greatly appreciated!
Cheers,
This was a really annoying bug that we just fixed now, so it can make it into today's release.
The problem only occurs when drawing on a mutable image while the simulator is in scale mode, both of which we don't do often, as scale mode is less accurate and mutable images are generally slower.
Thanks for keeping up with this.

Kinect for Windows v2 depth to color image misalignment

Currently I am developing a tool for the Kinect for Windows v2 (similar to the one in the XBOX ONE). I tried to follow some examples, and I have a working example that shows the camera image, the depth image, and an image that maps the depth to the RGB using OpenCV. But I see that it duplicates my hand when doing the mapping, and I think it is due to something wrong in the coordinate mapper part.
Here is an example of it:
And here is the code snippet that creates the image (rgbd image in the example):
void KinectViewer::create_rgbd(cv::Mat& depth_im, cv::Mat& rgb_im, cv::Mat& rgbd_im){
    HRESULT hr = m_pCoordinateMapper->MapDepthFrameToColorSpace(cDepthWidth * cDepthHeight, (UINT16*)depth_im.data, cDepthWidth * cDepthHeight, m_pColorCoordinates);
    rgbd_im = cv::Mat::zeros(depth_im.rows, depth_im.cols, CV_8UC3);
    double minVal, maxVal;
    cv::minMaxLoc(depth_im, &minVal, &maxVal);
    for (int i = 0; i < cDepthHeight; i++){
        for (int j = 0; j < cDepthWidth; j++){
            if (depth_im.at<UINT16>(i, j) > 0 && depth_im.at<UINT16>(i, j) < maxVal * (max_z / 100) && depth_im.at<UINT16>(i, j) > maxVal * min_z / 100){
                double a = i * cDepthWidth + j;
                ColorSpacePoint colorPoint = m_pColorCoordinates[i * cDepthWidth + j];
                int colorX = (int)(floor(colorPoint.X + 0.5));
                int colorY = (int)(floor(colorPoint.Y + 0.5));
                if ((colorX >= 0) && (colorX < cColorWidth) && (colorY >= 0) && (colorY < cColorHeight))
                {
                    rgbd_im.at<cv::Vec3b>(i, j) = rgb_im.at<cv::Vec3b>(colorY, colorX);
                }
            }
        }
    }
}
Does anyone have a clue of how to solve this? How to prevent this duplication?
Thanks in advance
UPDATE:
If I do a simple depth image thresholding I obtain the following image:
This is more or less what I expected to happen: no duplicate hand in the background. Is there a way to prevent this duplicate hand in the background?
I suggest you use the BodyIndexFrame to identify whether a specific value belongs to a player or not. This way, you can reject any RGB pixel that does not belong to a player and keep the rest of them. I do not think that CoordinateMapper is lying.
A few notes:
Include the BodyIndexFrame source in your frame reader
Use MapColorFrameToDepthSpace instead of MapDepthFrameToColorSpace; this way, you'll get the HD image for the foreground
Find the corresponding DepthSpacePoint and depthX, depthY, instead of ColorSpacePoint and colorX, colorY
Here is my approach when a frame arrives (it's in C#):
depthFrame.CopyFrameDataToArray(_depthData);
colorFrame.CopyConvertedFrameDataToArray(_colorData, ColorImageFormat.Bgra);
bodyIndexFrame.CopyFrameDataToArray(_bodyData);

_coordinateMapper.MapColorFrameToDepthSpace(_depthData, _depthPoints);

Array.Clear(_displayPixels, 0, _displayPixels.Length);

for (int colorIndex = 0; colorIndex < _depthPoints.Length; ++colorIndex)
{
    DepthSpacePoint depthPoint = _depthPoints[colorIndex];
    if (!float.IsNegativeInfinity(depthPoint.X) && !float.IsNegativeInfinity(depthPoint.Y))
    {
        int depthX = (int)(depthPoint.X + 0.5f);
        int depthY = (int)(depthPoint.Y + 0.5f);
        if ((depthX >= 0) && (depthX < _depthWidth) && (depthY >= 0) && (depthY < _depthHeight))
        {
            int depthIndex = (depthY * _depthWidth) + depthX;
            byte player = _bodyData[depthIndex];
            // Identify whether the point belongs to a player
            if (player != 0xff)
            {
                int sourceIndex = colorIndex * BYTES_PER_PIXEL;
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // B
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // G
                _displayPixels[sourceIndex] = _colorData[sourceIndex++]; // R
                _displayPixels[sourceIndex] = 0xff;                      // A
            }
        }
    }
}
Here is the initialization of the arrays:
BYTES_PER_PIXEL = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
_colorWidth = colorFrame.FrameDescription.Width;
_colorHeight = colorFrame.FrameDescription.Height;
_depthWidth = depthFrame.FrameDescription.Width;
_depthHeight = depthFrame.FrameDescription.Height;
_bodyIndexWidth = bodyIndexFrame.FrameDescription.Width;
_bodyIndexHeight = bodyIndexFrame.FrameDescription.Height;
_depthData = new ushort[_depthWidth * _depthHeight];
_bodyData = new byte[_depthWidth * _depthHeight];
_colorData = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_displayPixels = new byte[_colorWidth * _colorHeight * BYTES_PER_PIXEL];
_depthPoints = new DepthSpacePoint[_colorWidth * _colorHeight];
Notice that the _depthPoints array has a 1920x1080 size.
Once again, the most important thing is to use the BodyIndexFrame source.
Finally I got some time to write the long-awaited answer.
Let's start with some theory to understand what is really happening, and then a possible answer.
We should start by knowing the way to pass from a 3D point cloud, which has the depth camera as the coordinate system origin, to an image in the image plane of the RGB camera. To do that it is enough to use the camera pinhole model:
Here, u and v are the coordinates in the image plane of the RGB camera. The first matrix on the right side of the equation is the camera matrix, AKA the intrinsics of the RGB camera. The following matrix is the rotation and translation of the extrinsics, or better said, the transformation needed to go from the depth camera coordinate system to the RGB camera coordinate system. The last part is the 3D point.
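The equation itself was an image in the original post; a standard way to write the pinhole projection described above (notation assumed, not copied from the post) is:
s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} =
\begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} R & t \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
where the first matrix holds the RGB camera intrinsics (f_x, f_y, c_x, c_y), (R | t) is the depth-to-RGB transformation, and (X, Y, Z) is the 3D point in depth-camera coordinates.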
Basically, something like this is what the Kinect SDK does. So, what could go wrong that makes the hand get duplicated? Well, actually more than one point projects to the same pixel...
To put it in other words, and in the context of the problem in the question:
The depth image is a representation of an ordered point cloud, and I am querying the u, v values of each of its pixels, which in reality can easily be converted to 3D points. The SDK gives you the projection, but several points can map to the same pixel (usually, the larger the distance along the z axis between two neighbouring points, the more easily this problem appears).
Now, the big question: how can you avoid this? Well, I am not sure it is possible using the Kinect SDK, since you do not know the Z value of the points AFTER the extrinsics are applied, so it is not possible to use a technique like Z-buffering... However, you may assume the Z values will be quite similar and use those from the original point cloud (at your own risk).
If you were doing it manually, and not with the SDK, you could apply the extrinsics to the points and then project them into the image plane, marking in another matrix which point is mapped to which pixel; if there is already a point mapped there, compare the z values and always keep the point closest to the camera. Then you would have a valid mapping without any problems. This is a rather naive approach; you can probably find better ones, since the problem is now clear :)
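To make that naive approach concrete, here is a rough C++/OpenCV sketch. It is illustrative only: it assumes the 3D points are already expressed in the RGB camera's coordinate frame (extrinsics applied) and that the RGB intrinsics fx, fy, cx, cy are known; the function and variable names are made up.
#include <opencv2/core.hpp>
#include <limits>
#include <vector>

// For every 3D point, project it with the intrinsics and keep, per target pixel,
// only the point closest to the camera -- a simple Z-buffer.
// Returns, for every RGB pixel, the index of the winning point (-1 if none).
cv::Mat projectWithZBuffer(const std::vector<cv::Point3f>& points,
                           float fx, float fy, float cx, float cy,
                           int colorWidth, int colorHeight)
{
    cv::Mat pointIndex(colorHeight, colorWidth, CV_32S, cv::Scalar(-1));
    cv::Mat zBuffer(colorHeight, colorWidth, CV_32F,
                    cv::Scalar(std::numeric_limits<float>::max()));

    for (int i = 0; i < static_cast<int>(points.size()); ++i)
    {
        const cv::Point3f& p = points[i];
        if (p.z <= 0.0f) continue;                       // behind or on the camera plane

        int u = static_cast<int>(fx * p.x / p.z + cx + 0.5f);
        int v = static_cast<int>(fy * p.y / p.z + cy + 0.5f);
        if (u < 0 || u >= colorWidth || v < 0 || v >= colorHeight) continue;

        // Keep only the point closest to the camera for this pixel.
        if (p.z < zBuffer.at<float>(v, u))
        {
            zBuffer.at<float>(v, u) = p.z;
            pointIndex.at<int>(v, u) = i;
        }
    }
    return pointIndex;
}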
I hope it is clear enough.
P.S.:
I do not have a Kinect 2 at the moment, so I can't check whether there is an update related to this issue or whether the same thing still happens. I used the first released version (not the pre-release) of the SDK... so a lot of changes may have happened. If someone knows whether this was solved, just leave a comment :)