Flipping depth frame received from Kinect - C++

I use the following C++ code to read out the depth information from the Kinect:
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;

// end pixel is start + width*height - 1
const USHORT * pBufferEnd = pBufferRun + (Width * Height);

// process data for display in main window.
while ( pBufferRun < pBufferEnd )
{
    // discard the portion of the depth that contains only the player index
    USHORT depth = NuiDepthPixelToDepth(*pBufferRun);
    BYTE intensity = static_cast<BYTE>(depth % 256);

    // Write out blue byte
    *(rgbrun++) = intensity;

    // Write out green byte
    *(rgbrun++) = intensity;

    // Write out red byte
    *(rgbrun++) = intensity;

    // Skip the unused fourth byte (the X in RGBX)
    ++rgbrun;

    ++pBufferRun;
}
What I'd like to know is: what is the easiest way to implement frame flipping (horizontal & vertical)? I couldn't find any function in the Kinect SDK, but maybe I missed it?
EDIT1: I'd prefer not to use any external libraries, so any solution that explains the depth data layout and how to invert rows/columns is highly appreciated.

So, you're using a standard 16bpp single channel depth map with player data. This is a nice easy format to work with. An image buffer is arranged row-wise, and each pixel in the image data has the bottom 3 bits set to the player ID and the top 13 bits set to depth data.
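For reference, unpacking that layout by hand looks like this (roughly what the SDK's NuiDepthPixelToDepth helper in your code does for you, and what its NuiDepthPixelToPlayerIndex counterpart does for the player bits):

USHORT raw = *pBufferRun;
USHORT depth = raw >> 3;   // top 13 bits: depth in millimetres
USHORT player = raw & 0x7; // bottom 3 bits: player index, 0 = no player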
Here's a quick'n'dirty way to read each row in reverse, and write it out to an RGBWhatever image with a simple depth visualisation that's a little nicer to look at than the wrapping output you currently use.
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;

for (unsigned int y = 0; y < Height; y++)
{
    for (unsigned int x = 0; x < Width; x++)
    {
        // shift off the player bits, reading each row right-to-left
        USHORT depthIn = pBufferRun[(y * Width) + (Width - 1 - x)] >> 3;

        // valid depth is (generally) in the range 0 to 4095.
        // here's a simple visualisation to do a greyscale mapping, with white
        // being closest. Set 0 (invalid pixel) to black.
        BYTE intensity =
            depthIn == 0 || depthIn > 4095 ?
            0 : 255 - (BYTE)(((float)depthIn / 4095.0f) * 255.0f);

        *(rgbrun++) = intensity;
        *(rgbrun++) = intensity;
        *(rgbrun++) = intensity;
        ++rgbrun;
    }
}
Code untested, E&OE, etc ;-)
It is possible to parallelise the outer loop if, instead of using a single rgbrun pointer, you compute a pointer to the beginning of the current output row and write to that instead.
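For a vertical flip you only need to change the row index; a minimal sketch of the read expression, untested, under the same assumptions as the loop above:

// vertical flip: read rows bottom-up, columns left-to-right
USHORT depthIn = pBufferRun[((Height - 1 - y) * Width) + x] >> 3;
// or, for horizontal + vertical at once (a 180 degree rotation):
// USHORT depthIn = pBufferRun[((Height - 1 - y) * Width) + (Width - 1 - x)] >> 3;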

Related

How to convert CMSampleBufferRef/CIImage/UIImage into pixels e.g. uint8_t[]

I have input from a captured camera frame as a CMSampleBufferRef, and I need to get the raw pixels, preferably as a C-type uint8_t[].
I also need to find the color scheme of the input image.
I know how to convert CMSampleBufferRef to UIImage and then to NSData in PNG format, but I don't know how to get the raw pixels from there. Perhaps I could get them directly from the CMSampleBufferRef/CIImage?
This code shows the need and the missing bits.
Any thoughts where to start?
int convertCMSampleBufferToPixelArray (CMSampleBufferRef sampleBuffer)
{
    // inputs
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CIImage *ciImage = [CIImage imageWithCVPixelBuffer:imageBuffer];
    CIContext *imgContext = [CIContext new];
    CGImageRef cgImage = [imgContext createCGImage:ciImage fromRect:ciImage.extent];
    UIImage *uiImage = [UIImage imageWithCGImage:cgImage];
    NSData *nsData = UIImagePNGRepresentation(uiImage);

    // Need to fill this gap
    uint8_t* data = XXXXXXXXXXXXXXXX;
    ImageFormat format = XXXXXXXXXXXXXXXX; // one of: GRAY8, RGB_888, YV12, BGRA_8888, ARGB_8888

    // sample showing expected data values
    // this routine converts the image data to gray
    int width = uiImage.size.width;
    int height = uiImage.size.height;
    const int size = width * height;
    std::unique_ptr<uint8_t[]> new_data(new uint8_t[size]);
    for (int i = 0; i < size; ++i) {
        new_data[i] = uint8_t(data[i * 3] * 0.299f + data[i * 3 + 1] * 0.587f +
                              data[i * 3 + 2] * 0.114f + 0.5f);
    }
    return 1;
}
Here are some pointers you can use to search for more info; it's all nicely documented, so you shouldn't have an issue.
int convertCMSampleBufferToPixelArray (CMSampleBufferRef sampleBuffer) {
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    if (imageBuffer == NULL) {
        return -1;
    }

    // Lock and get the address of the image buffer
    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    uint8_t* data = (uint8_t*)CVPixelBufferGetBaseAddress(imageBuffer);

    // Get size
    size_t width = CVPixelBufferGetWidth(imageBuffer);
    size_t height = CVPixelBufferGetHeight(imageBuffer);

    // Get bytes per row
    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);

    // At `data` you have bytesPerRow * height bytes of the image data
    // To get pixel info you can call CVPixelBufferGetPixelFormatType, ...
    // you can call CVImageBufferGetColorSpace and inspect it, ...

    // When you're done, unlock the base address
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
    return 0;
}
There are a couple of things you should be aware of.
The first is that the buffer can be planar. Check CVPixelBufferIsPlanar, CVPixelBufferGetPlaneCount, CVPixelBufferGetBytesPerRowOfPlane, etc.
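A minimal sketch of walking the planes with those calls (do this between the lock/unlock pair shown above):

if (CVPixelBufferIsPlanar(imageBuffer)) {
    size_t planeCount = CVPixelBufferGetPlaneCount(imageBuffer);
    for (size_t plane = 0; plane < planeCount; plane++) {
        uint8_t* planeData = (uint8_t*)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, plane);
        size_t planeWidth = CVPixelBufferGetWidthOfPlane(imageBuffer, plane);
        size_t planeHeight = CVPixelBufferGetHeightOfPlane(imageBuffer, plane);
        size_t planeStride = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, plane);
        // each plane has its own dimensions and stride (e.g. the chroma
        // planes of YV12 are smaller than the luma plane)
    }
}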
The second is that you have to calculate the pixel size based on CVPixelBufferGetPixelFormatType. Something like:
OSType pixelFormat = CVPixelBufferGetPixelFormatType(imageBuffer);
size_t pixelSize;
switch (pixelFormat) {
    case kCVPixelFormatType_32BGRA:
    case kCVPixelFormatType_32ARGB:
    case kCVPixelFormatType_32ABGR:
    case kCVPixelFormatType_32RGBA:
        pixelSize = 4;
        break;
    // + other cases
}
Let's say that the buffer is not planar and:
CVPixelBufferGetWidth returns 200 (pixels)
Your pixelSize is 4 (calculated bytes per row is 200 * 4 = 800)
CVPixelBufferGetBytesPerRow can return anything >= 800
In other words, the pointer you have is not a pointer to a contiguous buffer. If you need row data, you have to do something like this:
uint8_t* data = (uint8_t*)CVPixelBufferGetBaseAddress(imageBuffer);

// Get size
size_t width = CVPixelBufferGetWidth(imageBuffer);
size_t height = CVPixelBufferGetHeight(imageBuffer);

size_t pixelSize = 4; // Let's pretend it's the calculated pixel size
size_t realRowSize = width * pixelSize;
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);

for (int row = 0 ; row < height ; row++) {
    // bytesPerRow acts as the stride: the offset where the next row starts
    // bytesPerRow can be >= realRowSize
    uint8_t *rowData = data + row * bytesPerRow;
    // realRowSize = how many bytes are meaningful in this row
    // copy them somewhere
}
If you'd like a contiguous buffer, you have to allocate one and copy the row data into it. How many bytes to allocate? At most CVPixelBufferGetDataSize (the whole strided buffer); realRowSize * height if you want it tightly packed.
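A sketch of that packing step, reusing the names from the loop above (needs <cstring> for memcpy and <memory> for std::unique_ptr):

size_t packedSize = realRowSize * height; // tightly packed, no row padding
std::unique_ptr<uint8_t[]> packed(new uint8_t[packedSize]);
for (size_t row = 0; row < height; row++) {
    memcpy(packed.get() + row * realRowSize, // dest: packed rows
           data + row * bytesPerRow,         // src: strided rows
           realRowSize);
}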

LibPNG segmentation fault on png_read_image

I'm having a segmentation fault on png_read_image() and I can't figure out why.
Here's the code:
/*
  Initializing pngReadStruct & pngInfoStruct...
*/

// Getting image's width & height
png_uint_32 imgWidth = png_get_image_width(pngReadStruct, pngInfoStruct);
png_uint_32 imgHeight = png_get_image_height(pngReadStruct, pngInfoStruct);

// Getting bits per channel (not per pixel)
png_uint_32 bitDepth = png_get_bit_depth(pngReadStruct, pngInfoStruct);

// Getting number of channels
png_uint_32 channels = png_get_channels(pngReadStruct, pngInfoStruct);

// Getting color type (RGB, RGBA, luminance, alpha, palette, etc)
png_uint_32 colorType = png_get_color_type(pngReadStruct, pngInfoStruct);

// Refining color type (if colored or grayscale)
switch (colorType) {
    case PNG_COLOR_TYPE_PALETTE:
        png_set_palette_to_rgb(pngReadStruct);
        // If an RGB image, setting channel number to 3
        channels = 3;
        break;

    case PNG_COLOR_TYPE_GRAY:
        if (bitDepth < 8)
            png_set_expand_gray_1_2_4_to_8(pngReadStruct);
        // Updating bitdepth info
        bitDepth = 8;
        break;

    default:
        break;
}

// Adding full alpha channel to the image if it possesses transparency
if (png_get_valid(pngReadStruct, pngInfoStruct, PNG_INFO_tRNS)) {
    png_set_tRNS_to_alpha(pngReadStruct);
    channels += 1;
}

// Defining an array to contain image's rows of pixels
std::vector<png_bytep> rowPtrs(imgHeight);

// Defining an array to contain image's pixels (data's type is 'std::unique_ptr<char[]>')
data = std::make_unique<char[]>(imgWidth * imgHeight * bitDepth * channels / 8);

const unsigned long int rowLength = imgWidth * bitDepth * channels / 8;

// Adding every pixel into previously allocated rows
for (unsigned int i = 0; i < imgHeight; ++i) {
    // Preparing the rows to handle image's data
    rowPtrs[i] = (png_bytep)&data + ((imgHeight - i - 1) * rowLength);
}

// Recovering image data
png_read_image(pngReadStruct, rowPtrs.data()); // /!\ Segfault here

png_destroy_read_struct(&pngReadStruct, static_cast<png_infopp>(0), static_cast<png_infopp>(0));
Every characteristic taken from the file seems fine to me, and it worked without error just a while ago; it's probably a stupid error I made while refactoring.
Thanks for the help, feel free to ask about anything I've missed, and sorry for the long code!
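For what it's worth, one suspicious line in the code above is the row-pointer setup: (png_bytep)&data takes the address of the std::unique_ptr object itself, not of the buffer it owns, so every row pointer is garbage and png_read_image writes through invalid memory. A likely fix (untested against the rest of the code) is to use data.get():

for (unsigned int i = 0; i < imgHeight; ++i) {
    // data.get() yields the owned buffer; &data was the address of the smart pointer
    rowPtrs[i] = reinterpret_cast<png_bytep>(data.get()) + ((imgHeight - i - 1) * rowLength);
}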

DirectX 11 and FreeType

Has anyone ever integrated FreeType with DirectX 11 for font rendering? The only article I seem to find is DirectX 11 Font Rendering. I can't seem to match the correct DXGI_FORMAT for rendering the grayscale bitmap that FreeType creates for a glyph.
There are three ways to handle greyscale textures in Direct3D 11:
Option (1): You can use an RGB format and replicate the channels. For example, you'd use DXGI_FORMAT_R8G8B8A8_UNORM and set R, G, B to the single monochrome channel and A to all opaque (0xFF). You can handle Monochrome + Alpha (2-channel) data the same way.
This conversion is supported when loading .DDS luminance formats (D3DFMT_L8, D3DFMT_L8A8) by the DirectXTex library and the texconv command-line tool with the -xlum switch.
This makes the texture up to 4 times larger in memory, but easily integrates using standard shaders.
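As a sketch of what that channel replication looks like on the CPU (assuming an 8-bit grayscale glyph bitmap that is already tightly packed; the names are hypothetical):

// Expand a width x height single-channel bitmap to RGBA for
// DXGI_FORMAT_R8G8B8A8_UNORM: replicate the gray value into R, G, B
// and set A to opaque.
std::vector<uint8_t> rgba(size_t(width) * height * 4);
for (size_t i = 0; i < size_t(width) * height; i++) {
    uint8_t g = gray[i];
    rgba[i * 4 + 0] = g;    // R
    rgba[i * 4 + 1] = g;    // G
    rgba[i * 4 + 2] = g;    // B
    rgba[i * 4 + 3] = 0xFF; // A: all opaque
}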
Option (2): You keep the monochrome texture as a single channel using DXGI_FORMAT_R8_UNORM as your format. You then render using a custom shader which replicates the red channel to RGB at runtime.
This is in fact what the tutorial blog post you linked to is doing:
///////// PIXEL SHADER
float4 main(float2 uv : TEXCOORD0) : SV_Target0
{
    return float4(Decal.Sample(Bilinear, uv).rrr, 1.f);
}
For Monochrome + Alpha (2-channel) you'd use DXGI_FORMAT_R8G8_UNORM and then your custom shader would use .rrrg as the swizzle.
Option (3): You can compress the monochrome data to the DXGI_FORMAT_BC2_UNORM format using a custom encoder. This is implemented in the DirectX Tool Kit's MakeSpriteFont tool when using /TextureFormat:CompressedMono:
// CompressBlock (16 pixels (4x4 block) stored as 16 bytes)
long alphaBits = 0;
int rgbBits = 0;
int pixelCount = 0;

for (int y = 0; y < 4; y++)
{
    for (int x = 0; x < 4; x++)
    {
        long alpha;
        int rgb;

        // This is the single monochrome channel
        int value = bitmapData[blockX + x, blockY + y];

        if (options.NoPremultiply)
        {
            // If we are not premultiplied, RGB is always white and we have 4-bit alpha.
            alpha = value >> 4;
            rgb = 0;
        }
        else
        {
            // For premultiplied encoding, quantize the source value to 2-bit precision.
            if (value < 256 / 6)
            {
                alpha = 0;
                rgb = 1;
            }
            else if (value < 256 / 2)
            {
                alpha = 5;
                rgb = 3;
            }
            else if (value < 256 * 5 / 6)
            {
                alpha = 10;
                rgb = 2;
            }
            else
            {
                alpha = 15;
                rgb = 0;
            }
        }

        // Add this pixel to the alpha and RGB bit masks.
        alphaBits |= alpha << (pixelCount * 4);
        rgbBits |= rgb << (pixelCount * 2);

        pixelCount++;
    }
}

// The resulting BC2 block is:
//   uint64_t = alphaBits
//   uint16_t = 0xFFFF
//   uint16_t = 0x0
//   uint32_t = rgbBits
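That comment corresponds to a 16-byte block structure along these lines (a sketch; the field names are made up, but the layout follows the comment above):

#pragma pack(push, 1)
struct BC2Block
{
    uint64_t alphaBits; // 4 bits of alpha per pixel, 16 pixels
    uint16_t color0;    // RGB565 endpoint 0: 0xFFFF (white)
    uint16_t color1;    // RGB565 endpoint 1: 0x0 (black)
    uint32_t rgbBits;   // 2-bit colour index per pixel
};
#pragma pack(pop)
static_assert(sizeof(BC2Block) == 16, "BC2 blocks are 16 bytes");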
The resulting texture is then rendered using a standard alpha-blending shader. Since it uses 1 byte per pixel, this is effectively the same size as if you were using DXGI_FORMAT_R8_UNORM.
This technique does not work for 2-channel data, but works great for alpha-blended monochrome images like font glyphs.

C++AMP Computing gradient using texture on a 16 bit image

I am working with depth images retrieved from a Kinect, which are 16 bits. I ran into some difficulties making my own filters, due to the indexing or the size of the images.
I am working with textures because they allow working with images of any bit depth.
So, I am trying to compute a simple gradient to understand what is wrong, or why it doesn't work as I expected.
You can see that there is something wrong when I use the y direction.
For x: [screenshot]
For y: [screenshot]
Here's my code:
typedef concurrency::graphics::texture<unsigned int, 2> TextureData;
typedef concurrency::graphics::texture_view<unsigned int, 2> Texture;

cv::Mat image = cv::imread("Depth247.tiff", CV_LOAD_IMAGE_ANYDEPTH);
// just a copy from another image
cv::Mat image2(image.clone());

concurrency::extent<2> imageSize(640, 480);
int bits = 16;
const unsigned int nBytes = imageSize.size() * 2; // 614400

{
    uchar* data = image.data;

    // Result data
    TextureData texDataD(imageSize, bits);
    Texture texR(texDataD);

    parallel_for_each(
        imageSize,
        [=](concurrency::index<2> idx) restrict(amp)
        {
            int x = idx[0];
            int y = idx[1];

            // 65535 is the maximum value a pixel with 16 bits can take (2^16 - 1)
            int valX = (x / (float)imageSize[0]) * 65535;
            int valY = (y / (float)imageSize[1]) * 65535;

            texR.set(idx, valX);
        });

    //concurrency::graphics::copy(texR, image2.data, imageSize.size() * (bits / 8u));
    concurrency::graphics::copy_async(texR, image2.data, imageSize.size() * (bits));

    cv::imshow("result", image2);
    cv::waitKey(50);
}
Any help will be much appreciated.
Your indexes are swapped in two places.
int x = idx[0];
int y = idx[1];
Remember that C++AMP uses row-major indices for arrays. Thus idx[0] refers to the row (the y axis). This is why the picture you have for "For x" looks like what I would expect for texR.set(idx, valY).
Similarly the extent of image is also using swapped values.
int valX = (x / (float)imageSize[0]) * 65535;
int valY = (y / (float)imageSize[1]) * 65535;
Here imageSize[0] is 640, the number of columns (the x extent), not the number of rows, even though C++AMP treats the first extent dimension as rows.
I'm not familiar with OpenCV, but I'm assuming that cv::Mat also uses a row-major format. It might also put the origin (0, 0) at the top-left rather than the bottom-left. The Kinect data may do similar things, but again, it's row-major.
There may be other places in your code that have the same issue but I think if you double check how you are using index and extent you should be able to fix this.
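Putting both fixes together, a sketch of the corrected kernel (same names as the question, extent declared row-major, untested):

concurrency::extent<2> imageSize(480, 640); // row-major: (rows, columns)

parallel_for_each(
    imageSize,
    [=](concurrency::index<2> idx) restrict(amp)
    {
        int y = idx[0]; // row
        int x = idx[1]; // column
        int valX = (x / (float)imageSize[1]) * 65535; // normalise by column count
        int valY = (y / (float)imageSize[0]) * 65535; // normalise by row count
        texR.set(idx, valX); // now a horizontal gradient as intended
    });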

binary file bit manipulation

I have a binary file of image data where each pixel is exactly 4 bits. The image data is laid out as follows:
There are N images, where the first image is 1x1, the second image is 2x2, the third is 4x4, and so on (they are mipmaps, if you care to know).
Given a pointer to the start of the data buffer, I want to skip to the biggest image.
Now I know how many bytes I want to skip, but there is this annoying 1x1 image at the start, which is 4 bits. I am not aware of any way to increment a pointer by a bit.
How can I successfully retrieve the data without everything being off by 4 bits?
Assuming you can change your file format you can do either of the following:
Add padding to the 1x1 image
Store the images in reverse order (effectively the same as above, but not ideal for mip-maps because you don't necessarily know how many images you will have)
If you can't change your format, you have these choices:
Convert the data
Accept that the buffer is offset by half a byte and work with it accordingly
You said:
How can I successfully retrieve the data without everything being off
by 4 bits?
So that means you need to convert. When you calculate your offset in bytes, you will find that the first byte contains half a byte of the previous image. So in a pinch you can shift everything up a nibble like this:
for( i = start; i < end; i++ ) {
    p[i] = (p[i] << 4) | (p[i+1] >> 4);
}
That's assuming the first pixel is bits 4-7 and the second pixel is bits 0-3, and so on... If it's the other way around, just invert those two shifts.
// this assumes pixels points to bytes (unsigned chars)
typedef unsigned char byte_t;

index = ?; // your index to the pixel
byte_t b = pixels[index / 2];
if (index % 2) pixel = b >> 4;
else           pixel = b & 15;

// Or you can use
byte_t b = pixels[index >> 1];
if (index & 1) pixel = b >> 4;
else           pixel = b & 15;
Either way, just compute the logical index into the file. Dividing the index by two takes you to the byte containing the pixel; then you just read the correct half of the byte.
So make a function
byte_t GetMyPixel(unsigned char* pixels, unsigned index) {
    byte_t b = pixels[index >> 1];
    byte_t pixel;
    if (index & 1) pixel = b >> 4;
    else           pixel = b & 15;
    return pixel;
}
To read the first images:
Image1x1   = GetMyPixel(pixels, 0);
Image2x2_1 = GetMyPixel(pixels, 1); // Top left pixel of second image
Image2x2_2 = GetMyPixel(pixels, 2); // Top right pixel of second image
Image2x2_3 = GetMyPixel(pixels, 3); // Bottom left pixel of second image
... etc
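Applying the same idea to the original problem of skipping to the biggest image: image k in the chain starts at pixel index 1 + 4 + 16 + ... = (4^k - 1) / 3, so you can feed that straight into GetMyPixel. A sketch (untested):

unsigned FirstPixelOfImage(unsigned k) {
    // image i is 2^i x 2^i = 4^i pixels; the sum of 4^i for i < k is (4^k - 1) / 3
    return ((1u << (2 * k)) - 1) / 3;
}

// top-left pixel of the biggest (last) of N mipmap images:
byte_t pixel = GetMyPixel(pixels, FirstPixelOfImage(N - 1));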
So that is one way to go about it. You might need to take the nibble order (the "endianness" of pixels within a byte) into account, so if the output seems wrong, switch the logic for the pixel read thusly...
byte_t GetMyPixel(unsigned char* pixels, unsigned index) {
    byte_t b = pixels[index >> 1];
    byte_t pixel;
#if OTHER_ENDIAN
    if (index & 1) pixel = b >> 4;
    else           pixel = b & 15;
#else
    if (index & 1) pixel = b & 15;
    else           pixel = b >> 4;
#endif
    return pixel;
}