DirectShow: Rendering for YV12 - c++

I am trying to understand how rendering works for the YV12 format. For example, I took a simple sample. See this graph:
The webcam produces 640x480 frames in RGB24 or MJPEG. The LAV decoder then converts the frames to YV12 and sends them to the DirectShow renderer (EVR or VMR9).
The decoder changes the frame width (stride) from 640 to 1024. Hence, the output frame size is 1.5*1024*480=737280, whereas the normal size for YV12 is 1.5*640*480=460800. I know the stride can be larger than the width of the actual frame (https://learn.microsoft.com/en-us/windows/desktop/medfound/image-stride). My first question: why did the renderer select that value (1024) rather than another? Can I get it programmatically?
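Both sizes above follow from the YV12 layout: a Y plane of `height` rows, followed by the V and U planes, each with half as many rows and half the stride. A minimal sketch, assuming even dimensions and a chroma stride of exactly half the luma stride (true for the 640 and 1024 cases here); `yv12BufferSize` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a YV12 frame whose luma rows start `stride` bytes apart.
// Assumes a 4:2:0 layout (Y plane, then V plane, then U plane), an even
// height, and a chroma stride of stride / 2.
std::size_t yv12BufferSize(std::size_t stride, std::size_t height)
{
    assert(stride % 2 == 0 && height % 2 == 0);
    const std::size_t luma = stride * height;               // Y plane
    const std::size_t chroma = (stride / 2) * (height / 2); // one of V or U
    return luma + 2 * chroma;                               // = 1.5 * stride * height
}
```

`yv12BufferSize(640, 480)` gives 460800 and `yv12BufferSize(1024, 480)` gives 737280, the two sizes quoted in the question.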
When I replace the LAV decoder with my own RGB24-to-YV12 transform filter (https://gist.github.com/thedeemon/8052fb98f8ba154510d7), the renderer shows me a shifted image, although all the parameters are the same as in the first graph:
Why? I noticed that VIDEOINFOHEADER2 had the interlacing flag dwInterlaceFlags set. So my next question: do I have to add interlacing support to my filter for the renderer to work normally?

My first question: why did the renderer select that value (1024) rather than another? Can I get it programmatically?
The video renderer uses a Direct3D texture as the carrier for the image. When the texture is mapped into system memory to give the CPU write access, such an extended stride may be applied because of how the video hardware is implemented. You get the value 1024 through dynamic media type negotiation, as described in Handling Format Changes from the Video Renderer.
Your transform filter has to handle such updates if you want it to be able to connect to the video renderer directly.
Otherwise you are generally not interested in getting this extended stride value, because the one you receive through the media type update is the one to use, and you have to accept it.
When I replace the LAV decoder with my filter for transformation RGB24/YV12, the renderer shows me a shifted image, though all parameters are the same, as for the first graph...
Why?
Your filter does not handle the stride update correctly.
...I noted that VIDEOINFOHEADER2 had the set interlacing flag dwInterlaceFlags. Therefore my next question: Do I have to add interlacing into my filter for normal work of renderer?
You don't have interlaced video here. The problem is unrelated to interlaced video.

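My solution:
I must copy the YV12 frame into the video buffer plane by plane, honoring the stride: first the Y plane (full width and height), then the two chroma planes (V, then U; each half the width and half the height). Here is the code for a 640x480 frame: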
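HRESULT CVideoGrabberFilter::Transform(IMediaSample *pIn, IMediaSample *pOut)
{
    BYTE* pSrcBuf = 0;
    pIn->GetPointer(&pSrcBuf);
    BYTE* pDstBuf = 0;
    pOut->GetPointer(&pDstBuf);
    SIZE size;
    size.cx = 640;
    size.cy = 480;
    int nLen = pOut->GetActualDataLength();
    BYTE* pDstTmp = new BYTE[nLen];
    // Convert into a tightly packed YV12 buffer (Y, then V, then U; no row padding).
    // NOTE: the original call passed an undeclared pSrcBufEnd; pass whatever source
    // pointer YV12ConverterFromRGB24 expects (e.g. the last row of a bottom-up RGB24 image).
    YV12ConverterFromRGB24(pSrcBuf, pDstTmp, size.cx, size.cy);
    BYTE* pDst = pDstTmp;
    int stride = 1024; //the real video stride for 640x480. For other resolutions you need to use pOut->GetMediaType() for the stride defining.
    //Y: size.cy rows of size.cx bytes, each destination row starting stride bytes apart
    for (int y = 0; y < size.cy; ++y)
    {
        memcpy(pDstBuf, pDst, size.cx);
        pDst += size.cx;
        pDstBuf += stride;
    }
    stride /= 2;
    size.cy /= 2;
    size.cx /= 2;
    //V then U: each plane has size.cy rows of size.cx bytes, stride bytes apart.
    //Both planes are contiguous in the packed buffer and in the output, so copying
    //2 * size.cy consecutive rows (two per iteration) fills V and then U.
    for (int y = 0; y < size.cy; y++ )
    {
        memcpy(pDstBuf, pDst, size.cx );
        pDst += size.cx;
        pDstBuf += stride;
        memcpy(pDstBuf, pDst, size.cx);
        pDst += size.cx;
        pDstBuf += stride;
    }
    delete[] pDstTmp;
    return S_OK;
}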
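The same plane-by-plane copy can be written as a portable helper that takes the destination stride as a parameter instead of hard-coding 1024. A minimal sketch under stated assumptions: the source is tightly packed YV12 (Y, then V, then U), width, height and stride are even, and the chroma stride is half the luma stride, as in the 1024/512 case above. In the filter, `dstStride` would come from the media type the renderer attaches to the output sample; `copyPackedYV12ToStrided` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies a tightly packed YV12 frame (Y, then V, then U; no row padding) into a
// destination whose luma rows are dstStride bytes apart and whose chroma rows
// are dstStride / 2 bytes apart.
void copyPackedYV12ToStrided(const std::uint8_t* src, std::uint8_t* dst,
                             int width, int height, int dstStride)
{
    assert(width % 2 == 0 && height % 2 == 0);
    assert(dstStride % 2 == 0 && dstStride >= width);
    // Y plane: height rows of width bytes.
    for (int y = 0; y < height; ++y) {
        std::memcpy(dst, src, width);
        src += width;
        dst += dstStride;
    }
    // V plane, then U plane: each height / 2 rows of width / 2 bytes.
    const int cw = width / 2, ch = height / 2, cstride = dstStride / 2;
    for (int plane = 0; plane < 2; ++plane) {
        for (int y = 0; y < ch; ++y) {
            std::memcpy(dst, src, cw);
            src += cw;
            dst += cstride;
        }
    }
}
```

Keeping V and U as two explicit loops makes the plane order visible, where the original two-rows-per-iteration loop relies on both planes being contiguous.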

Related

NVencs Output Bitstream is not readable

I have a question about NVIDIA's NVenc API. I want to use the API to encode some OpenGL graphics. My problem is that the API reports no errors throughout the whole program; everything seems to be fine. But the generated output is not readable by, e.g., VLC. If I try to play the generated file, VLC flashes a black screen for about 0.5 s and then ends playback.
The video has a length of 0, and the file size seems rather small too.
The resolution is 1280*720, and a 5-second recording is only 700kb. Is this realistic?
The application flow is as follows:
1. Render to a secondary framebuffer.
2. Download the framebuffer to one of two PBOs (glReadPixels()).
3. Map the PBO of the previous frame to get a pointer usable by CUDA.
4. Call a simple CUDA kernel that converts OpenGL's RGBA to ARGB, which should be understandable by NVenc according to this (p. 18). The kernel reads the content of the PBO and writes the converted content into a CUDA array (created with cudaMalloc), which is registered as an input resource with NVenc.
5. The content of the converted array gets encoded. A completion event plus the corresponding output bitstream buffer get queued.
6. A secondary thread listens for the queued output events; when an event is signaled, the output bitstream is mapped and written to disk.
Initialization of the NVenc encoder:
InitParams* ip = new InitParams();
m_initParams = ip;
memset(ip, 0, sizeof(InitParams));
ip->version = NV_ENC_INITIALIZE_PARAMS_VER;
ip->encodeGUID = m_encoderGuid; //Used Codec
ip->encodeWidth = width; // Frame Width
ip->encodeHeight = height; // Frame Height
ip->maxEncodeWidth = 0; // Zero means no dynamic res changes
ip->maxEncodeHeight = 0;
ip->darWidth = width; // Aspect Ratio
ip->darHeight = height;
ip->frameRateNum = 60; // 60 fps
ip->frameRateDen = 1;
ip->reportSliceOffsets = 0; // According to programming guide
ip->enableSubFrameWrite = 0;
ip->presetGUID = m_presetGuid; // Used Preset for Encoder Config
NV_ENC_PRESET_CONFIG presetCfg; // Load the Preset Config
memset(&presetCfg, 0, sizeof(NV_ENC_PRESET_CONFIG));
presetCfg.version = NV_ENC_PRESET_CONFIG_VER;
presetCfg.presetCfg.version = NV_ENC_CONFIG_VER;
CheckApiError(m_apiFunctions.nvEncGetEncodePresetConfig(m_Encoder,
m_encoderGuid, m_presetGuid, &presetCfg));
memcpy(&m_encodingConfig, &presetCfg.presetCfg, sizeof(NV_ENC_CONFIG));
// And add information about Bitrate etc
m_encodingConfig.rcParams.averageBitRate = 500000;
m_encodingConfig.rcParams.maxBitRate = 600000;
m_encodingConfig.rcParams.rateControlMode = NV_ENC_PARAMS_RC_MODE::NV_ENC_PARAMS_RC_CBR;
ip->encodeConfig = &m_encodingConfig;
ip->enableEncodeAsync = 1; // Async Encoding
ip->enablePTD = 1; // Encoder handles picture ordering
Registration of CudaResource
m_cuContext->SetCurrent(); // Make the clients cuCtx current
NV_ENC_REGISTER_RESOURCE res;
memset(&res, 0, sizeof(NV_ENC_REGISTER_RESOURCE));
NV_ENC_REGISTERED_PTR resPtr; // handle to the cuda resource for future use
res.bufferFormat = m_inputFormat; // Format is ARGB
res.height = m_height;
res.width = m_width;
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
res.resourceToRegister = (void*) (uintptr_t) resourceToRegister; //CUdevptr to resource
res.resourceType =
NV_ENC_INPUT_RESOURCE_TYPE::NV_ENC_INPUT_RESOURCE_TYPE_CUDADEVICEPTR;
res.version = NV_ENC_REGISTER_RESOURCE_VER;
CheckApiError(m_apiFunctions.nvEncRegisterResource(m_Encoder, &res));
m_registeredInputResources.push_back(res.registeredResource);
Encoding
m_cuContext->SetCurrent(); // Make Clients context current
MapInputResource(id); //Map the CudaInputResource
NV_ENC_PIC_PARAMS temp;
memset(&temp, 0, sizeof(NV_ENC_PIC_PARAMS));
temp.version = NV_ENC_PIC_PARAMS_VER;
unsigned int currentBufferAndEvent = m_counter % m_registeredEvents.size(); //Counter is inc'ed in every Frame
temp.bufferFmt = m_currentlyMappedInputBuffer.mappedBufferFmt;
temp.inputBuffer = m_currentlyMappedInputBuffer.mappedResource; //got set by MapInputResource
temp.completionEvent = m_registeredEvents[currentBufferAndEvent];
temp.outputBitstream = m_registeredOutputBuffers[currentBufferAndEvent];
temp.inputWidth = m_width;
temp.inputHeight = m_height;
temp.inputPitch = m_width;
temp.inputTimeStamp = m_counter;
temp.pictureStruct = NV_ENC_PIC_STRUCT_FRAME; // According to samples
temp.qpDeltaMap = NULL;
temp.qpDeltaMapSize = 0;
EventWithId latestEvent(currentBufferAndEvent,
m_registeredEvents[currentBufferAndEvent]);
PushBackEncodeEvent(latestEvent); // Store the Event with its ID in a Queue
CheckApiError(m_apiFunctions.nvEncEncodePicture(m_Encoder, &temp));
m_counter++;
UnmapInputResource(id); // Unmap
Every little hint about where to look is very much appreciated. I'm running out of ideas about what might be wrong.
Thanks a lot!
With the help of hall822 from the nvidia forums I managed to solve the issue.
The primary error was that I registered my CUDA resource with a pitch equal to the width of the frame. I'm using a framebuffer/renderbuffer to draw my content into, and its data is a plain, unpitched array. My first attempt, a pitch of zero, failed: the encoder did nothing. The next idea was to set it to the width of the frame, and then a quarter of the image was encoded.
// NOTE: I've set the pitch to the width of the frame, because the resource is a non-pitched
//cudaArray. Is this correct? Pitch = 0 would produce no output.
res.pitch = pitch;
To answer this question: yes, it is correct. But the pitch is measured in bytes, so because I'm encoding RGBA frames, the correct pitch is FRAME_WIDTH * 4.
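The byte-based pitch from this answer, written as a small helper (hypothetical name, assuming a packed buffer with no row padding). For the 1280x720 RGBA frames in the question it gives 5120 bytes rather than 1280:

```cpp
#include <cstddef>

// Row pitch in bytes of a packed, unpadded image: width in pixels times bytes
// per pixel. For the 32-bit RGBA/ARGB frames discussed here, bytesPerPixel is 4.
constexpr std::size_t packedPitchBytes(std::size_t widthPixels, std::size_t bytesPerPixel)
{
    return widthPixels * bytesPerPixel;
}
```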
The second error was that my color channels were in the wrong order (see point 4 in my opening post). The NVIDIA enum says the encoder expects the channels in ARGB format, but what is actually meant is BGRA, so the alpha channel, which is always 255, polluted the blue channel.
Edit: This may be because NVIDIA uses little-endian byte order internally. I'm writing my pixel data to a byte array; choosing another type, like int32, may allow one to pass actual ARGB data.
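The endianness explanation can be checked directly: pack a pixel as a 32-bit ARGB word (alpha in the most significant byte) and look at the bytes a little-endian machine stores, which come out as B, G, R, A. A minimal sketch of converting OpenGL's RGBA bytes into that byte order; the helper names are hypothetical, and the bytes are written out with shifts so the result does not depend on the host's endianness. In the question's pipeline, this per-pixel swizzle would belong in the CUDA kernel (step 4) rather than on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs one pixel as a 32-bit ARGB word: A in the top byte, B in the bottom byte.
constexpr std::uint32_t packARGB(std::uint8_t a, std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return (std::uint32_t(a) << 24) | (std::uint32_t(r) << 16) | (std::uint32_t(g) << 8) | b;
}

// Stores the word least significant byte first, as a little-endian machine does,
// so an ARGB word lands in memory as B, G, R, A.
inline void storeLittleEndian(std::uint32_t word, std::uint8_t* out)
{
    for (int i = 0; i < 4; ++i)
        out[i] = static_cast<std::uint8_t>(word >> (8 * i));
}

// Converts RGBA bytes (as read back from OpenGL) into B, G, R, A byte order.
void rgbaToBgra(const std::uint8_t* rgba, std::uint8_t* bgra, std::size_t pixelCount)
{
    for (std::size_t p = 0; p < pixelCount; ++p) {
        const std::uint8_t* s = rgba + 4 * p;
        storeLittleEndian(packARGB(s[3], s[0], s[1], s[2]), bgra + 4 * p);
    }
}
```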

Error runtime update of DXT compressed textures with Directx11

Context:
I'm developing a native C++ Unity 5 plugin that reads in DXT compressed texture data and uploads it to the GPU for further use in Unity. The aim is to create a fast image-sequence player that updates image data on the fly. The textures are compressed with an offline console application.
Unity can work with different graphics engines; I'm targeting DirectX 11 and OpenGL 3.3+.
Problem:
The DirectX runtime texture update code, which goes through a mapped subresource, gives different output on different graphics drivers. Updating a texture through such a mapped resource means mapping a pointer to the texture data and memcpy'ing the data from the RAM buffer to the mapped GPU buffer. In doing so, different drivers seem to expect different row pitch values when copying bytes. I never had problems on the several NVIDIA GPUs I tested, but AMD and Intel GPUs seem to behave differently, and I get distorted output as shown below. Furthermore, I'm working with DXT1 pixel data (0.5 bytes per pixel) and DXT5 data (1 byte per pixel), and I can't seem to get the correct pitch parameter for these DXT textures.
Code:
The following initialization code, which creates the D3D11 texture and fills it with initial texture data (e.g. the first frame of an image sequence), works perfectly on all drivers. The player pointer points to a custom class that handles all file reads and contains getters for the currently loaded DXT compressed frame, its dimensions, etc.
if (s_DeviceType == kUnityGfxRendererD3D11)
{
HRESULT hr;
DXGI_FORMAT format = (compression_type == DxtCompressionType::DXT_TYPE_DXT1_NO_ALPHA) ? DXGI_FORMAT_BC1_UNORM : DXGI_FORMAT_BC3_UNORM;
// Create texture
D3D11_TEXTURE2D_DESC desc;
desc.Width = w;
desc.Height = h;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = format;
// no anti-aliasing
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DYNAMIC;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
// Initial data: first frame
D3D11_SUBRESOURCE_DATA data;
data.pSysMem = player->getBufferPtr();
data.SysMemPitch = 16 * (player->getWidth() / 4);
data.SysMemSlicePitch = 0; // just a 2d texture, no depth
// Init with initial data
hr = g_D3D11Device->CreateTexture2D(&desc, &data, &dxt_d3d_tex);
if (SUCCEEDED(hr) && dxt_d3d_tex != 0)
{
DXT_VERBOSE("Succesfully created D3D Texture.");
DXT_VERBOSE("Creating D3D SRV.");
D3D11_SHADER_RESOURCE_VIEW_DESC SRVDesc;
memset(&SRVDesc, 0, sizeof(SRVDesc));
SRVDesc.Format = format;
SRVDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
SRVDesc.Texture2D.MipLevels = 1;
hr = g_D3D11Device->CreateShaderResourceView(dxt_d3d_tex, &SRVDesc, &textureView);
if (FAILED(hr))
{
dxt_d3d_tex->Release();
return hr;
}
DXT_VERBOSE("Succesfully created D3D SRV.");
}
else
{
DXT_ERROR("Error creating D3D texture.")
}
}
The following update code that runs for each new frame has the error somewhere. Please note the commented line containing method 1 using a simple memcpy without any rowpitch specified which works well on NVIDIA drivers.
In method 2 you can also see that I log the different row pitch values. For instance, for a 1920x960 frame I get 1920 for the buffer stride and 2048 for the runtime stride. This 128-pixel difference probably has to be padded (as can be seen in the example picture below), but I can't figure out how. When I use mappedResource.RowPitch without dividing it by 4 (done by the bit shift), Unity crashes.
ID3D11DeviceContext* ctx = NULL;
g_D3D11Device->GetImmediateContext(&ctx);
if (dxt_d3d_tex && bShouldUpload)
{
if (player->gather_stats) before_upload = ns();
D3D11_MAPPED_SUBRESOURCE mappedResource;
ctx->Map(dxt_d3d_tex, 0, D3D11_MAP_WRITE_DISCARD, 0, &mappedResource);
/* 1: THIS CODE WORKS ON ALL NVIDIA DRIVERS BUT GENERATES DISTORTED OR NO OUTPUT ON AMD/INTEL: */
//memcpy(mappedResource.pData, player->getBufferPtr(), player->getBytesPerFrame());
/* 2: THIS CODE GENERATES OUTPUT BUT SEEMS TO NEED PADDING? */
BYTE* mappedData = reinterpret_cast<BYTE*>(mappedResource.pData);
BYTE* buffer = player->getBufferPtr();
UINT height = player->getHeight();
UINT buffer_stride = player->getBytesPerFrame() / player->getHeight();
UINT runtime_stride = mappedResource.RowPitch >> 2;
DXT_VERBOSE("Buffer stride: %d", buffer_stride);
DXT_VERBOSE("Runtime stride: %d", runtime_stride);
for (UINT i = 0; i < height; ++i)
{
memcpy(mappedData, buffer, buffer_stride);
mappedData += runtime_stride;
buffer += buffer_stride;
}
ctx->Unmap(dxt_d3d_tex, 0);
}
Example pic 1: distorted output on AMD/Intel when using memcpy to copy the whole buffer without a separate row pitch (method 1).
Example pic 2: better but still erroneous output on AMD/Intel when using the above code with mappedResource.RowPitch (method 2). The blue bars indicate the zone of error; they need to disappear so that all pixels align and form one image.
Thanks for any pointers!
Best,
Vincent
The mapped data row pitch is in bytes; dividing it by four is definitely an issue.
UINT runtime_stride = mappedResource.RowPitch >> 2;
...
mappedData += runtime_stride; // here you are only jumping one quarter of a row
With a BC format, it is the height (the number of rows you copy) that is divided by 4, because each row of 4x4 blocks covers four pixel rows.
Also, BC1 uses 8 bytes per 4x4 block, so the line below should use 8 * rather than 16 *. But as long as you handle the row stride properly on your side, D3D will understand; you just waste half the memory here.
data.SysMemPitch = 16 * (player->getWidth() / 4);
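Putting the answer's two points together (RowPitch is in bytes, and one row of blocks covers four pixel rows), the update loop copies one row of 4x4 blocks per iteration. A minimal sketch, assuming the source buffer holds tightly packed blocks with no row padding; the helper names are hypothetical, and in the plugin `dst` and `dstRowPitch` would be `mappedResource.pData` and `mappedResource.RowPitch` used as-is, without the `>> 2`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bytes per 4x4 block: 8 for BC1 (DXT1), 16 for BC3 (DXT5).
constexpr std::size_t kBc1BlockBytes = 8;
constexpr std::size_t kBc3BlockBytes = 16;

// Bytes in one row of 4x4 blocks for a width in pixels (rounded up to whole blocks).
constexpr std::size_t bcRowBytes(std::size_t width, std::size_t blockBytes)
{
    return ((width + 3) / 4) * blockBytes;
}

// Copies tightly packed BC data into a mapped subresource whose block rows start
// dstRowPitch bytes apart. There is one block row per four pixel rows, so the
// loop runs (height + 3) / 4 times, not height times.
void copyBcBlockRows(std::uint8_t* dst, std::size_t dstRowPitch,
                     const std::uint8_t* src, std::size_t width, std::size_t height,
                     std::size_t blockBytes)
{
    const std::size_t rowBytes = bcRowBytes(width, blockBytes);
    assert(dstRowPitch >= rowBytes);
    const std::size_t blockRows = (height + 3) / 4;
    for (std::size_t row = 0; row < blockRows; ++row) {
        std::memcpy(dst, src, rowBytes);
        dst += dstRowPitch; // advance by the full runtime pitch, in bytes
        src += rowBytes;    // source block rows are tightly packed
    }
}
```

For the question's 1920x960 BC3 frame, this copies 240 block rows of 7680 bytes each and advances the mapped pointer by the runtime pitch (about 8192 bytes, judging by the logged 2048 after the `>> 2`).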

How to load a RGB format texture from memory using dx?

Here is my attempt:
in = cvLoadImage("24bpp_1920x1200_1.bmp", 1);
HRESULT err;
IDirect3DTexture9 * texture = NULL;
///D3DFMT_L8, D3DFMT_R8G8B8
err = D3DXCreateTexture(g_pd3dDevice, in->width, in->height, 1, 0 , D3DFMT_R8G8B8, D3DPOOL_MANAGED, &g_pTexture);
D3DLOCKED_RECT lockRect;
RECT rect;
err = g_pTexture->LockRect(0, &lockRect, NULL, 0);//I have specified the format is RGB, so why is lockRect.Pitch = 7680?
memcpy(lockRect.pBits, in->imageData, in->widthStep*in->height);
if(FAILED(g_pTexture->UnlockRect(0)))
{
///
}
It can't display an image in RGB format. But it can display an image in grayscale or in RGBA format.
Also, I want to display a high-resolution image the way the texture sample in the DX SDK June 2010 (".\DXSDK\Samples\InstalledSamples\Textures") does.
But, again, it can't display an image larger than roughly 1920x1200 px.
How can I do this?
Two things to keep in mind:
(1) You need to check for caps that indicates support for the 24-bpp format. Not every Direct3D 9 device supports it.
(2) You have to respect the pitch returned by LockRect, so you need to copy scan line by scan line rather than doing the whole thing in a single memcpy. Something like:
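BYTE* sptr = reinterpret_cast<BYTE*>(in->imageData); // IplImage::imageData is char*
BYTE* dptr = static_cast<BYTE*>(lockRect.pBits);     // D3DLOCKED_RECT::pBits is void*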
for( int y = 0; y < in->height; ++y )
{
memcpy( dptr, sptr, in->widthStep );
sptr += in->widthStep;
dptr += lockRect.Pitch;
}
This assumes in->widthStep is the pitch of the source image in bytes, that in->widthStep is <= lockRect.Pitch, and that the format of the source image is identical to the format of the Direct3D resource.
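The pitch of 7680 for a 1920-pixel-wide texture works out to exactly 4 bytes per pixel, which suggests that a 32-bit format such as D3DFMT_X8R8G8B8 was created instead of D3DFMT_R8G8B8 (checking the format with IDirect3DTexture9::GetLevelDesc would confirm this). In that case the formats differ, so a per-row memcpy is not enough and each 3-byte pixel has to be expanded. A minimal sketch, assuming the source rows are BGR (OpenCV's default channel order) and the destination is X8R8G8B8, whose bytes in memory on little-endian Windows are B, G, R, X; the function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Expands 24-bit BGR rows into 32-bit X8R8G8B8 rows (memory order B, G, R, X),
// honoring both the source row step and the destination pitch, in bytes.
void bgr24ToX8R8G8B8(const std::uint8_t* src, std::size_t srcStep,
                     std::uint8_t* dst, std::size_t dstPitch,
                     std::size_t width, std::size_t height)
{
    assert(srcStep >= width * 3 && dstPitch >= width * 4);
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* s = src + y * srcStep; // like in->widthStep
        std::uint8_t* d = dst + y * dstPitch;      // like lockRect.Pitch
        for (std::size_t x = 0; x < width; ++x) {
            d[4 * x + 0] = s[3 * x + 0]; // B
            d[4 * x + 1] = s[3 * x + 1]; // G
            d[4 * x + 2] = s[3 * x + 2]; // R
            d[4 * x + 3] = 0xFF;         // X: unused, set to opaque
        }
    }
}
```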

Setting individual pixels of an RGB frame for ffmpeg encoding

I'm trying to change the test pattern of an ffmpeg streamer (Trouble syncing libavformat/ffmpeg with x264 and RTP) into the familiar RGB format. My broader goal is to compute the frames of a streamed video on the fly.
So I replaced its AV_PIX_FMT_MONOWHITE with AV_PIX_FMT_RGB24, which is "packed RGB 8:8:8, 24bpp, RGBRGB..." according to http://libav.org/doxygen/master/pixfmt_8h.html .
To stuff its pixel array called data, I've tried many variations on
for (int y=0; y<HEIGHT; ++y) {
for (int x=0; x<WIDTH; ++x) {
uint8_t* rgb = data + ((y*WIDTH + x) *3);
const double i = x/double(WIDTH);
// const double j = y/double(HEIGHT);
rgb[0] = 255*i;
rgb[1] = 0;
rgb[2] = 255*(1-i);
}
}
At HEIGHTxWIDTH = 80x60, this version yields a four-column pattern with horizontal stripes (screenshot not preserved), when I expect a single blue-to-red horizontal gradient.
640x480 yields the same 4-column pattern, but with far more horizontal stripes.
640x640, 160x160, etc, yield three columns, cyan-ish / magenta-ish / yellow-ish, with the same kind of horizontal stripiness.
Vertical gradients behave even more weirdly.
Appearance was unaffected by an AV_PIX_FMT_RGBA attempt (4 not 3 bytes per pixel, alpha=255). Also unaffected by a port from C to C++.
The argument srcStrides passed to sws_scale() is a length-1 array, containing the single int HEIGHT.
Access each Pixel of AVFrame asks the same question in less detail, so far unanswered.
The streamer emits one warning, which I doubt affects appearance:
[rtp # 0x269c0a0] Encoder did not produce proper pts, making some up.
So. How do you set the RGB value of a pixel in a frame to be sent to sws_scale() (and then to x264_encoder_encode() and av_interleaved_write_frame())?
Use avpicture_fill() as described in Encoding a screenshot into a video using FFMPEG.
Instead of passing data directly to sws_scale(), do this:
AVFrame* pic = avcodec_alloc_frame();
avpicture_fill((AVPicture *)pic, data, AV_PIX_FMT_RGB24, WIDTH, HEIGHT);
and then replace the 2nd and 3rd args of sws_scale() with
pic->data, pic->linesize,
Then the gradients above work properly, at many resolutions.
The argument srcStrides passed to sws_scale() is a length-1 array, containing the single int HEIGHT.
Stride (AKA linesize) is the distance in bytes between two lines. For various reasons having mostly to do with optimization it is often larger than simply width in bytes, so there is padding on the end of each line.
In your case, without any padding, stride should be width * 3.
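The pixel loop from the question only needs one change to work with any row padding: compute each row's start from the stride rather than from `WIDTH * 3`. A minimal sketch (hypothetical function name); for a tightly packed buffer, `linesize` is `width * 3`, which is also the value, not HEIGHT, that belongs in the single-entry srcStrides array passed to sws_scale():

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fills a packed RGB24 image with a horizontal blue-to-red gradient, addressing
// each row through its stride (linesize) in bytes. Padding bytes are not touched.
void fillGradientRGB24(std::uint8_t* data, int linesize, int width, int height)
{
    assert(linesize >= width * 3);
    for (int y = 0; y < height; ++y) {
        std::uint8_t* row = data + static_cast<std::size_t>(y) * linesize;
        for (int x = 0; x < width; ++x) {
            std::uint8_t* rgb = row + 3 * x;
            const double i = x / double(width);
            rgb[0] = static_cast<std::uint8_t>(255 * i);       // R rises left to right
            rgb[1] = 0;                                         // G
            rgb[2] = static_cast<std::uint8_t>(255 * (1 - i)); // B falls left to right
        }
    }
}
```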

Kinect SDK: align depth and color frames

I'm working with Kinect sensor and I'm trying to align depth and color frames so that I can save them as images which "fit" into each other. I've spent a lot of time going through msdn forums and modest documentation of Kinect SDK and I'm getting absolutely nowhere.
Based on this answer: Kinect: Converting from RGB Coordinates to Depth Coordinates
I have the following function, where depthData and colorData are obtained from NUI_LOCKED_RECT.pBits and mappedData is the output containing new color frame, mapped to depth coordinates:
bool mapColorFrameToDepthFrame(unsigned char *depthData, unsigned char* colorData, unsigned char* mappedData)
{
INuiCoordinateMapper* coordMapper;
// Get coordinate mapper
m_pSensor->NuiGetCoordinateMapper(&coordMapper);
NUI_DEPTH_IMAGE_POINT* depthPoints = new NUI_DEPTH_IMAGE_POINT[640 * 480];
HRESULT result = coordMapper->MapColorFrameToDepthFrame(NUI_IMAGE_TYPE_COLOR, NUI_IMAGE_RESOLUTION_640x480, NUI_IMAGE_RESOLUTION_640x480, 640 * 480, reinterpret_cast<NUI_DEPTH_IMAGE_PIXEL*>(depthData), 640 * 480, depthPoints);
if (FAILED(result))
{
return false;
}
int pos = 0;
int* colorRun = reinterpret_cast<int*>(colorData);
int* mappedRun = reinterpret_cast<int*>(mappedData);
// For each pixel of new color frame
for (int i = 0; i < 640 * 480; ++i)
{
// Find the corresponding pixel in original color frame from depthPoints
pos = (depthPoints[i].y * 640) + depthPoints[i].x;
// Set pixel value if it's within frame boundaries
if (pos < 640 * 480)
{
mappedRun[i] = colorRun[pos];
}
}
return true;
}
All I get when running this code is an unchanged color frame in which all pixels where the depth frame had no information are removed (white).
With the OpenNI framework there is an option called registration:
IMAGE_REGISTRATION_DEPTH_TO_IMAGE – The depth image is transformed to have the same apparent vantage point as the RGB image.
OpenNI 2.0 and Nite 2.0 work very well for capturing Kinect data, and there are a lot of tutorials.
You can have a look at this:
Kinect with OpenNI
OpenNI also has an example, SimpleViewer, that merges depth and color; maybe you can look at it and try it.
This might not be the quick answer you're hoping for, but this transformation is done successfully within the ofxKinectNui addon for openFrameworks (see here).
It looks like ofxKinectNui delegates to the GetColorPixelCoordinatesFromDepthPixel function defined here.
I think the problem is that you're calling MapColorFrameToDepthFrame, when you should actually call MapDepthFrameToColorFrame.
The smoking gun is this line of code:
mappedRun[i] = colorRun[pos];
Reading from pos and writing to i is backwards, since pos = depthPoints[i] represents the depth coordinates corresponding to the color coordinates at i. You actually want to iterate over all depth coordinates, writing each one, and read from the input color image at the corresponding color coordinates.
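The corrected direction (iterate over depth pixels, read color at the mapped coordinates, write at the depth index) can be sketched without the SDK. A minimal sketch under stated assumptions: `ColorPoint` is a hypothetical stand-in for NUI_COLOR_IMAGE_POINT, and `depthToColor` holds one entry per depth pixel, as an array filled by MapDepthFrameToColorFrame would; the function name is also hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for NUI_COLOR_IMAGE_POINT: where one depth pixel lands
// in the color image.
struct ColorPoint { long x; long y; };

// Builds a color image aligned to the depth frame. For every depth pixel i,
// read the color at its mapped position and write it at i. Pixels that map
// outside the color image keep the value `fill`.
std::vector<std::uint32_t> alignColorToDepth(const std::vector<ColorPoint>& depthToColor,
                                             const std::vector<std::uint32_t>& color,
                                             long colorWidth, long colorHeight,
                                             std::uint32_t fill = 0)
{
    assert(color.size() == static_cast<std::size_t>(colorWidth * colorHeight));
    std::vector<std::uint32_t> aligned(depthToColor.size(), fill);
    for (std::size_t i = 0; i < depthToColor.size(); ++i) {
        const ColorPoint& p = depthToColor[i];
        if (p.x >= 0 && p.x < colorWidth && p.y >= 0 && p.y < colorHeight)
            aligned[i] = color[static_cast<std::size_t>(p.y * colorWidth + p.x)]; // read at p, write at i
    }
    return aligned;
}
```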
I think there are several incorrect lines in your code.
First of all, which kind of depth map are you passing to your function?
Depth data is stored using two bytes for each value, which means that the correct pointer type for your depth data is unsigned short.
Second, from what I have understood, you want to map the depth frame to the color frame, so the correct Kinect SDK function to call is MapDepthFrameToColorFrame instead of MapColorFrameToDepthFrame.
Finally, the function will return a map of points where, for each depth value at position [i], you have the x and y position where that point should be mapped.
To do this you don't need the colorData pointer.
So your function should be modified as follows:
/** Method used to build a depth map aligned to the color frame
 @param [in] depthData : pointer to your depth data (packed 16-bit depth pixels);
 @param [out] mappedData : pointer to your aligned depth map;
 @return true if all is OK, false when something goes wrong
*/
bool DeviceManager::mapColorFrameToDepthFrame(unsigned short *depthData, unsigned short* mappedData){
    /** BE SURE THAT YOU ARE WORKING WITH THE RIGHT HEIGHT AND WIDTH*/
    unsigned long refWidth = 0;
    unsigned long refHeight = 0;
    NuiImageResolutionToSize( NUI_IMAGE_RESOLUTION_640x480, refWidth, refHeight );
    int width = static_cast<int>( refWidth ); //get the image width in a right way
    int height = static_cast<int>( refHeight ); //get the image height in a right way
    INuiCoordinateMapper* coordMapper = NULL;
    if (FAILED(m_pSensor->NuiGetCoordinateMapper(&coordMapper))) // get the coord mapper
        return false;
    NUI_COLOR_IMAGE_POINT* colorPoints = new NUI_COLOR_IMAGE_POINT[width * height]; //color points
    NUI_DEPTH_IMAGE_PIXEL* depthPoints = new NUI_DEPTH_IMAGE_PIXEL[width * height]; // depth pixels
    // Fill the depth pixels from the input (assumes packed pixels: depth << 3 | player index)
    for (int i = 0; i < width * height; i++) {
        depthPoints[i].depth = static_cast<USHORT>(depthData[i] >> NUI_IMAGE_PLAYER_INDEX_SHIFT);
        depthPoints[i].playerIndex = static_cast<USHORT>(depthData[i] & NUI_IMAGE_PLAYER_INDEX_MASK);
    }
    //Map your frame;
    HRESULT result = coordMapper->MapDepthFrameToColorFrame( NUI_IMAGE_RESOLUTION_640x480, width * height, depthPoints, NUI_IMAGE_TYPE_COLOR, NUI_IMAGE_RESOLUTION_640x480, width * height, colorPoints );
    if (SUCCEEDED(result)) {
        // apply map in terms of x and y (image coordinates);
        for (int i = 0; i < width * height; i++)
            if (colorPoints[i].x >= 0 && colorPoints[i].x < width && colorPoints[i].y >= 0 && colorPoints[i].y < height)
                mappedData[colorPoints[i].x + colorPoints[i].y * width] = depthData[i];
    }
    // free your memory!!! (arrays allocated with new[] need delete[])
    delete[] colorPoints;
    delete[] depthPoints;
    coordMapper->Release();
    return SUCCEEDED(result);
}
Make sure that your mappedData has been initialized correctly, for example as follows:
mappedData = (USHORT*)calloc(width*height, sizeof(USHORT));
Remember that the Kinect SDK does not provide an accurate alignment function between color and depth data.
If you want an accurate alignment between the two images, you should use a calibration model.
In that case I suggest using the Kinect Calibration Toolbox, based on the Heikkilä calibration model.
You can find it at the following link:
http://www.ee.oulu.fi/~dherrera/kinect/.
First of all, you must calibrate your device.
That means, you should calibrate the RGB and the IR sensor and then find the transformation between RGB and IR.
Once you know this information, you can apply the function:
RGBPoint = RotationMatrix * DepthPoint + TranslationVector
Check OpenCV or ROS projects for further details on it.
Extrinsic Calibration
Intrinsic Calibration
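Once calibration has produced the rotation and translation, the transform RGBPoint = RotationMatrix * DepthPoint + TranslationVector is a 3x3 matrix product plus a vector. A minimal sketch that applies it to a 3D point (already back-projected from the depth image, in metric units) and then projects the result into RGB pixel coordinates with a plain pinhole model. The projection step and the intrinsic parameters fx, fy, cx, cy are assumptions added here; a real calibration, like the toolbox above, also models lens distortion, which this ignores.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// RGBPoint = RotationMatrix * DepthPoint + TranslationVector
Vec3 depthToRgbCamera(const Mat3& R, const Vec3& t, const Vec3& p)
{
    Vec3 q{};
    for (int r = 0; r < 3; ++r)
        q[r] = R[r][0] * p[0] + R[r][1] * p[1] + R[r][2] * p[2] + t[r];
    return q;
}

// Pinhole projection of a point in the RGB camera frame to pixel coordinates,
// using the RGB intrinsics (focal lengths fx, fy; principal point cx, cy).
std::array<double, 2> projectPinhole(const Vec3& q, double fx, double fy, double cx, double cy)
{
    assert(q[2] > 0); // the point must be in front of the camera
    return { fx * q[0] / q[2] + cx, fy * q[1] / q[2] + cy };
}
```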