I've read a lot of articles and code but I still cannot get this to work. I've read all 128 bytes of the header in my texture and then read 65536 bytes of compressed data for the actual texture (the texture's resolution is 256x256 and each compressed pixel uses 1 byte). I tried to write my own decompression algorithm with no success, so I decided to use someone else's and found this code here. These are the arguments I was trying to pass to it so it would decompress my DDS texture:
BlockDecompressImageDXT5(textureHeader.dwWidth, textureHeader.dwHeight, temp, packedData)
Note: textureHeader is a valid struct with the DDS texture's header data loaded into it, temp is an unsigned char array holding all the DDS data that was read from the DDS texture, and packedData is an unsigned long array that I expected to receive the final decompressed data. In the code I've linked, the RGBA channels of each pixel are packed by the PackRGBA function, one byte per channel, into packedData. Before setting D3D11_SUBRESOURCE_DATA::pSysMem to point at the texture's data, I distributed the bytes of each unsigned long in packedData across 4 unsigned chars of m_DDSData this way:
for (int i{ 0 }, iData{ 0 }; i < textureHeader.dwPitchOrLinearSize; i++, iData += 4) //dwPitchOrLinearSize is the size in bytes of the compressed data.
{
    m_DDSData[iData] = ((packedData[i] << 24) >> 24); //first char receives the 1st byte, representing the red color.
    m_DDSData[iData + 1] = ((packedData[i] << 16) >> 24); //second char receives the 2nd byte, representing the green color.
    m_DDSData[iData + 2] = ((packedData[i] << 8) >> 24); //third char receives the 3rd byte, representing the blue color.
    m_DDSData[iData + 3] = (packedData[i] >> 24); //fourth char receives the 4th byte, representing the alpha color.
}
Note: m_DDSData should be the final data array used by D3D11_SUBRESOURCE_DATA to point to the texture's data, but when I use it, this is the kind of result I get: only a frame with random colors instead of my actual texture. I also have algorithms for other types of textures and they work properly, so I can assure you the problem is only with the compressed DDS format.
EDIT: Another example, this is a model of a chest and the program should be rendering the chest's texture:

For a full description of the BC3 compression scheme, see Microsoft Docs. BC3 is just the modern name for DXT4/DXT5 compression a.k.a. S3TC. In short, it compresses a 4x4 block of pixels at a time into the following structures resulting in 16 bytes per block:
#include <cstdint>
struct BC1
{
    uint16_t rgb[2]; // 565 colors
    uint32_t bitmap; // 2bpp rgb bitmap
};
static_assert(sizeof(BC1) == 8, "Mismatch block size");
struct BC3
{
    uint8_t alpha[2];  // alpha values
    uint8_t bitmap[6]; // 3bpp alpha bitmap
    BC1 bc1;           // BC1 rgb data
};
static_assert(sizeof(BC3) == 16, "Mismatch block size");
CPU decompression
For the color portion, it's the same as the "BC1" a.k.a. DXT1 compressed block. This is pseudo-code, but should get the point across:
auto pBC = &pBC3->bc1;
clr0 = pBC->rgb[0]; // 5:6:5 RGB
clr0.a = 255;
clr1 = pBC->rgb[1]; // 5:6:5 RGB
clr1.a = 255;
clr2 = lerp(clr0, clr1, 1.f / 3.f); // floating-point fraction, not integer 1 / 3 (== 0)
clr2.a = 255;
clr3 = lerp(clr0, clr1, 2.f / 3.f);
clr3.a = 255;
uint32_t dw = pBC->bitmap;
for (size_t i = 0; i < NUM_PIXELS_PER_BLOCK; ++i, dw >>= 2)
{
    switch (dw & 3)
    {
    case 0: pColor[i] = clr0; break;
    case 1: pColor[i] = clr1; break;
    case 2: pColor[i] = clr2; break;
    case 3: pColor[i] = clr3; break;
    }
}
Note while a BC3 contains a BC1 block, the decoding rules for BC1 are slightly modified. When decompressing BC1, you normally check the order of the colors as follows:
if (pBC->rgb[0] <= pBC->rgb[1])
{
    /* BC1 with 1-bit alpha */
    clr2 = lerp(clr0, clr1, 0.5);
    clr2.a = 255;
    clr3 = 0; // alpha of zero
}
BC2 and BC3 already include the alpha channel, so this extra logic is not used, and you always have 4 opaque colors.
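To make the color half concrete, here is a runnable C++ sketch of it, assuming the little-endian byte layout of the BC1 struct above. The helper names (RGBA8, expand565, mix, decodeBC3Color) are hypothetical, and the rounding used for the 1/3 and 2/3 interpolation is an assumption: decoders and GPUs may differ by one unit.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

struct RGBA8 { uint8_t r, g, b, a; };

// Expand a 5:6:5 color to 8 bits per channel by replicating the high bits.
RGBA8 expand565(uint16_t c) {
    const unsigned r5 = (c >> 11) & 0x1f, g6 = (c >> 5) & 0x3f, b5 = c & 0x1f;
    return { uint8_t((r5 << 3) | (r5 >> 2)), uint8_t((g6 << 2) | (g6 >> 4)),
             uint8_t((b5 << 3) | (b5 >> 2)), 255 };
}

// Rounded integer lerp per channel: (a * (n - k) + b * k) / n, alpha opaque.
RGBA8 mix(RGBA8 a, RGBA8 b, int k, int n) {
    auto ch = [&](int x, int y) { return uint8_t((x * (n - k) + y * k + n / 2) / n); };
    return { ch(a.r, b.r), ch(a.g, b.g), ch(a.b, b.b), 255 };
}

// Decode the 8-byte BC1 part of a BC3 block: always 4 opaque colors,
// one 2-bit index per pixel, pixel i = row * 4 + column, low bits first.
std::array<RGBA8, 16> decodeBC3Color(const uint8_t* bc1) {
    const uint16_t c0 = uint16_t(bc1[0] | (bc1[1] << 8));
    const uint16_t c1 = uint16_t(bc1[2] | (bc1[3] << 8));
    uint32_t bits = uint32_t(bc1[4]) | (uint32_t(bc1[5]) << 8) |
                    (uint32_t(bc1[6]) << 16) | (uint32_t(bc1[7]) << 24);
    RGBA8 palette[4] = { expand565(c0), expand565(c1), {}, {} };
    palette[2] = mix(palette[0], palette[1], 1, 3);  // 2/3 c0 + 1/3 c1
    palette[3] = mix(palette[0], palette[1], 2, 3);  // 1/3 c0 + 2/3 c1
    std::array<RGBA8, 16> out{};
    for (int i = 0; i < 16; ++i, bits >>= 2)
        out[i] = palette[bits & 3];
    return out;
}
```

Unlike a plain BC1 decoder, this never checks c0 <= c1, because inside BC3 the color block is always in 4-color mode.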
For the alpha portion, BC3 uses two alpha values and then generates a look-up table based on those values:
alpha[0] = alpha0 = pBC3->alpha[0];
alpha[1] = alpha1 = pBC3->alpha[1];
if (alpha0 > alpha1)
{
    // 6 interpolated alpha values.
    alpha[2] = lerp(alpha0, alpha1, 1.f / 7.f);
    alpha[3] = lerp(alpha0, alpha1, 2.f / 7.f);
    alpha[4] = lerp(alpha0, alpha1, 3.f / 7.f);
    alpha[5] = lerp(alpha0, alpha1, 4.f / 7.f);
    alpha[6] = lerp(alpha0, alpha1, 5.f / 7.f);
    alpha[7] = lerp(alpha0, alpha1, 6.f / 7.f);
}
else
{
    // 4 interpolated alpha values.
    alpha[2] = lerp(alpha0, alpha1, 1.f / 5.f);
    alpha[3] = lerp(alpha0, alpha1, 2.f / 5.f);
    alpha[4] = lerp(alpha0, alpha1, 3.f / 5.f);
    alpha[5] = lerp(alpha0, alpha1, 4.f / 5.f);
    alpha[6] = 0;
    alpha[7] = 255;
}
uint32_t dw = uint32_t(pBC3->bitmap[0]) | uint32_t(pBC3->bitmap[1] << 8)
    | uint32_t(pBC3->bitmap[2] << 16);
for (size_t i = 0; i < 8; ++i, dw >>= 3)
    pColor[i].a = alpha[dw & 0x7];
dw = uint32_t(pBC3->bitmap[3]) | uint32_t(pBC3->bitmap[4] << 8)
    | uint32_t(pBC3->bitmap[5] << 16);
for (size_t i = 8; i < NUM_PIXELS_PER_BLOCK; ++i, dw >>= 3)
    pColor[i].a = alpha[dw & 0x7];
DirectXTex includes functions for doing all the compression/decompression for all BC formats.
If you want to know what the pseudo-function lerp does, see Wikipedia or the HLSL docs.
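The alpha half can be written as runnable C++ in the same spirit. This is a sketch assuming the byte layout of the BC3 struct above; decodeBC3Alpha is a hypothetical name, and the integer rounding in the interpolation is an assumption (real decoders may differ by one). Reading the six index bytes as one 48-bit value is equivalent to the two 24-bit halves in the pseudo-code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Decode the first 8 bytes of a BC3 block into 16 alpha values
// (pixel i = row * 4 + column).
std::array<uint8_t, 16> decodeBC3Alpha(const uint8_t* blk) {
    const int a0 = blk[0], a1 = blk[1];
    int pal[8] = { a0, a1 };
    if (a0 > a1) {
        for (int k = 1; k <= 6; ++k)   // 6 interpolated values
            pal[k + 1] = (a0 * (7 - k) + a1 * k + 3) / 7;
    } else {
        for (int k = 1; k <= 4; ++k)   // 4 interpolated values
            pal[k + 1] = (a0 * (5 - k) + a1 * k + 2) / 5;
        pal[6] = 0;
        pal[7] = 255;
    }
    // 16 x 3-bit indices, packed little-endian in bytes 2..7 (48 bits).
    uint64_t bits = 0;
    for (int b = 0; b < 6; ++b)
        bits |= uint64_t(blk[2 + b]) << (8 * b);
    std::array<uint8_t, 16> out{};
    for (int i = 0; i < 16; ++i, bits >>= 3)
        out[i] = uint8_t(pal[bits & 7]);
    return out;
}
```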
Rendering with a compressed texture
If you are going to be rendering with Direct3D, you do not need to decompress the texture. All Direct3D hardware feature levels include support for BC1 - BC3 texture compression. You just create the texture with the DXGI_FORMAT_BC3_UNORM format and initialize it with the compressed block data as-is. Something like this:
#include <d3d11.h>
#include <wrl/client.h>

D3D11_TEXTURE2D_DESC desc = {};
desc.Width = textureHeader.dwWidth;
desc.Height = textureHeader.dwHeight;
desc.MipLevels = desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_BC3_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
D3D11_SUBRESOURCE_DATA initData = {};
initData.pSysMem = temp; // the compressed BC3 blocks, not decompressed pixels
// For BC compressed textures pitch is the number of bytes in a ROW of blocks
// (16 bytes per BC3 block, partial blocks rounded up)
initData.SysMemPitch = 16 * ((textureHeader.dwWidth + 3) / 4);
Microsoft::WRL::ComPtr<ID3D11Texture2D> pTexture;
HRESULT hr = device->CreateTexture2D( &desc, &initData, &pTexture );
if (FAILED(hr))
{
    // error
}
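The pitch rule generalizes to any texture size: round the dimensions up to whole 4x4 blocks and multiply by the block size. Here is a small sketch of that arithmetic; the BCLayout struct and bcLayout name are hypothetical. For the 256x256 BC3 texture in the question it gives 64 rows of 1024 bytes, i.e. exactly the 65536 bytes that were read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct BCLayout { size_t rowPitch, numRows, totalBytes; };

// bytesPerBlock: 8 for BC1/BC4, 16 for BC2/BC3/BC5/BC6H/BC7.
BCLayout bcLayout(uint32_t width, uint32_t height, size_t bytesPerBlock) {
    const size_t blocksWide = std::max<size_t>(1, (size_t(width) + 3) / 4);
    const size_t blocksHigh = std::max<size_t>(1, (size_t(height) + 3) / 4);
    const size_t rowPitch = blocksWide * bytesPerBlock;  // bytes in one ROW of 4x4 blocks
    return { rowPitch, blocksHigh, rowPitch * blocksHigh };
}
```

The rowPitch value is what goes into SysMemPitch; totalBytes is how much compressed data one mip level occupies.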
For a full-featured DDS loader that supports arbitrary DXGI formats, mipmaps, texture arrays, volume maps, cubemaps, cubemap arrays, etc., see DDSTextureLoader. This code is included in DirectX Tool Kit for DX11 / DX12. There are standalone versions for DirectX 9, DirectX 10, and DirectX 11 in DirectXTex.
If loading legacy DDS files (i.e. those that do not map directly to DXGI formats), use the DDS functions in DirectXTex, which do all the various pixel format conversions required (3:3:2, 3:3:2:8, 4:4, 8:8:8, P8, A8P8, etc.)


UE4 capture frame using ID3D11Texture2D and convert to R8G8B8 bitmap

I'm working on a streaming prototype using UE4.
My goal here (in this post) is solely about capturing frames and saving one as a bitmap, just to visually ensure frames are correctly captured.
I'm currently capturing frames by getting the backbuffer as an ID3D11Texture2D and then mapping it.
Note: I tried the ReadSurfaceData approach in the render thread, but it performed very poorly (FPS dropped to 15, and I'd like to capture at 60 FPS), whereas mapping the DirectX texture from the backbuffer currently takes 1 to 3 milliseconds.
When debugging, I can see the D3D11_TEXTURE2D_DESC's format is DXGI_FORMAT_R10G10B10A2_UNORM, so red/green/blues are stored on 10 bits each, and alpha on 2 bits.
My questions:
How do I convert the texture's data (using the D3D11_MAPPED_SUBRESOURCE pData pointer) to R8G8B8(A8), that is, 8 bits per color (R8G8B8 without the alpha would also be fine for me)?
Also, am I doing anything wrong when capturing the frame?
What I've tried:
All the following code is executed in a callback function registered to OnBackBufferReadyToPresent (code below).
void* NativeResource = BackBuffer->GetNativeResource();
if (NativeResource == nullptr)
{
    UE_LOG(LogTemp, Error, TEXT("Couldn't retrieve native resource"));
    return;
}
ID3D11Texture2D* BackBufferTexture = static_cast<ID3D11Texture2D*>(NativeResource);
D3D11_TEXTURE2D_DESC BackBufferTextureDesc;
BackBufferTexture->GetDesc(&BackBufferTextureDesc);
// Get the device context
ID3D11Device* d3dDevice;
BackBufferTexture->GetDevice(&d3dDevice);
ID3D11DeviceContext* d3dContext;
d3dDevice->GetImmediateContext(&d3dContext);
// Staging resource
ID3D11Texture2D* StagingTexture;
D3D11_TEXTURE2D_DESC StagingTextureDesc = BackBufferTextureDesc;
StagingTextureDesc.Usage = D3D11_USAGE_STAGING;
StagingTextureDesc.BindFlags = 0;
StagingTextureDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
StagingTextureDesc.MiscFlags = 0;
HRESULT hr = d3dDevice->CreateTexture2D(&StagingTextureDesc, nullptr, &StagingTexture);
if (FAILED(hr))
{
    UE_LOG(LogTemp, Error, TEXT("CreateTexture failed"));
    return;
}
// Copy the texture to the staging resource
d3dContext->CopyResource(StagingTexture, BackBufferTexture);
// Map the staging resource
D3D11_MAPPED_SUBRESOURCE mapInfo;
hr = d3dContext->Map(StagingTexture, 0, D3D11_MAP_READ, 0, &mapInfo);
if (FAILED(hr))
{
    UE_LOG(LogTemp, Error, TEXT("Map failed"));
    return;
}
// See the BMP article mentioned below for the struct definitions & the initialization of bmpHeader and bmpInfoHeader
// I didn't copy that code here to avoid overloading this post, as it's identical to the article's code
// Just making clear the reassigned values below
bmpHeader.sizeOfBitmapFile = 54 + StagingTextureDesc.Width * StagingTextureDesc.Height * 4;
bmpInfoHeader.width = StagingTextureDesc.Width;
bmpInfoHeader.height = StagingTextureDesc.Height;
std::ofstream fout("output.bmp", std::ios::binary);
fout.write((char*)&bmpHeader, 14);
fout.write((char*)&bmpInfoHeader, 40);
// TODO : convert to R8G8B8 (see below for my attempt at this)
d3dContext->Unmap(StagingTexture, 0);
(As mentioned in the code comments, I followed this article about the BMP headers for saving the bitmap to a file)
Texture data
One thing I'm concerned about is the data retrieved with this method.
I used a temporary array to check with the debugger what's inside.
// Just noted which width and height had the texture and hardcoded it here to allocate the right size
uint32_t data[1936 * 1056];
// Multiply by 4 as there are 4 bytes (32 bits) per pixel
memcpy(data, mapInfo.pData, StagingTextureDesc.Width * StagingTextureDesc.Height * 4);
It turns out the first 1935 uint32 values in this array all contain the same value: 3595933029. After that, the same values are often repeated hundreds of times in a row.
This makes me think the frame isn't captured as it should be, because the UE4 editor's window doesn't have exactly the same color all along its first row (whether that's the top or the bottom one).
R10G10B10A2 to R8G8B8(A8)
So I tried to work out how to convert from R10G10B10A2 to R8G8B8, starting from the value that appears 1935 times in a row at the beginning of the data buffer: 3595933029.
When I color-pick a screenshot of the editor's window (taken with the Windows tool, which gives me an image with exactly the same dimensions as the DirectX texture, 1936x1056), I get the following colors:
R=56, G=57, B=52 (top left & bottom left)
R=0, G=0, B=0 (top right)
R=46, G=40, B=72 (bottom right - it overlaps the task bar, thus the color)
So I tried to manually convert the color to check if it matches any of those I color picked.
I thought about bit shifting to simply compare the values
3595933029 (the value in the retrieved buffer) in binary: 11010110010101011001010101100101
I can already see the pattern: 11 followed 3 times by the 10-bit value 0101100101, and none of the picked colors follow this (except the black corner, which would be made only of zeros).
Anyway, assuming the order RRRRRRRRRR GGGGGGGGGG BBBBBBBBBB AA (most significant bit first) and keeping the top 8 bits of each 10-bit color:
R=214, G=86, B=86: doesn't match
Assuming the reverse order, AA BBBBBBBBBB GGGGGGGGGG RRRRRRRRRR:
R=89, G=89, B=89: doesn't match
If it helps, here's the editor window that should be captured (it really is a Third Person template; I didn't add anything to it except this capture code).
Here's the generated bitmap when shifting bits:
Code to generate the bitmap's pixel data:
struct Pixel {
uint8_t blue = 0;
uint8_t green = 0;
uint8_t red = 0;
} pixel;
uint32_t* pointer = (uint32_t*)mapInfo.pData;
size_t numberOfPixels = bmpInfoHeader.width * bmpInfoHeader.height;
for (int i = 0; i < numberOfPixels; i++) {
uint32_t value = *pointer;
// Ditch the color's 2 last bits, keep the 8 first = value >> 2; = value >> 12; = value >> 22;
fout.write((char*)&pixel, 3);
The colors present look somewhat similar, but the image doesn't look at all like the editor.
What am I missing ?
First of all, you are assuming that mapInfo.RowPitch is exactly StagingTextureDesc.Width * 4. This is often not true. When copying to/from Direct3D resources, you need to do 'row-by-row' copies. Also, allocating about 8 MBytes (1936 * 1056 uint32_t values) on the stack is not good practice.
#include <cstdint>
#include <cstring>
#include <memory>
// Assumes our staging texture is 4 bytes-per-pixel
// Allocate temporary memory
auto data = std::unique_ptr<uint32_t[]>(
    new uint32_t[StagingTextureDesc.Width * StagingTextureDesc.Height]);
auto src = static_cast<uint8_t*>(mapInfo.pData);
uint32_t* dest = data.get();
for(UINT y = 0; y < StagingTextureDesc.Height; ++y)
{
    // Multiply by 4 as there are 4 bytes (32 bits) per pixel
    memcpy(dest, src, StagingTextureDesc.Width * sizeof(uint32_t));
    src += mapInfo.RowPitch;
    dest += StagingTextureDesc.Width;
}
For C++11, using std::unique_ptr ensures the memory is eventually released automatically. You can transfer ownership of the memory to something else with uint32_t* ptr = data.release(). See cppreference.
With C++14, the better way to write the allocation is: auto data = std::make_unique<uint32_t[]>(StagingTextureDesc.Width * StagingTextureDesc.Height);. This assumes you are fine with a C++ exception being thrown for out-of-memory.
If you want to return an error code for out-of-memory instead of a C++ exception, use: auto data = std::unique_ptr<uint32_t[]>(new (std::nothrow) uint32_t[StagingTextureDesc.Width * StagingTextureDesc.Height]); if (!data) // return error
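The row-by-row copy can be checked without a D3D device by factoring it into a function that takes the mapped pointer and pitch as plain parameters. This is a sketch; copyPitchedRows is a hypothetical name, and a std::vector stands in for the unique_ptr buffer above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy a mapped 32-bit-per-pixel surface whose rows may be padded out to
// rowPitch bytes (as D3D11_MAPPED_SUBRESOURCE::RowPitch allows) into a
// tightly packed width * height buffer.
std::vector<uint32_t> copyPitchedRows(const void* pData, size_t rowPitch,
                                      uint32_t width, uint32_t height) {
    std::vector<uint32_t> packed(size_t(width) * height);
    auto src = static_cast<const uint8_t*>(pData);
    for (uint32_t y = 0; y < height; ++y) {
        std::memcpy(packed.data() + size_t(y) * width, src, width * sizeof(uint32_t));
        src += rowPitch;  // advance by the driver's pitch, not width * 4
    }
    return packed;
}
```

In the capture code this would be called as copyPitchedRows(mapInfo.pData, mapInfo.RowPitch, StagingTextureDesc.Width, StagingTextureDesc.Height) before Unmap.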
Converting 10:10:10:2 content to 8:8:8:8 content can be done efficiently on the CPU with bit-shifting.
The tricky bit is dealing with the up-scaling of the 2-bit alpha to 8-bits. For example, you want the Alpha of 11 to map to 255, not 192.
Here's a replacement for the loop above
// Assumes our staging texture is DXGI_FORMAT_R10G10B10A2_UNORM
for(UINT y = 0; y < StagingTextureDesc.Height; ++y)
{
    auto sptr = reinterpret_cast<uint32_t*>(src);
    for(UINT x = 0; x < StagingTextureDesc.Width; ++x)
    {
        uint32_t t = *(sptr++);
        uint32_t r = (t & 0x000003ff) >> 2;
        uint32_t g = (t & 0x000ffc00) >> 12;
        uint32_t b = (t & 0x3ff00000) >> 22;
        // Upscale alpha
        // 11xxxxxx -> 11111111 (255)
        // 10xxxxxx -> 10101010 (170)
        // 01xxxxxx -> 01010101 (85)
        // 00xxxxxx -> 00000000 (0)
        t &= 0xc0000000;
        uint32_t a = (t >> 24) | (t >> 26) | (t >> 28) | (t >> 30);
        // Convert to DXGI_FORMAT_R8G8B8A8_UNORM
        *(dest++) = r | (g << 8) | (b << 16) | (a << 24);
    }
    src += mapInfo.RowPitch;
}
Of course, we can combine the shifting operations, since the previous loop moves the bits down and then back up. We do need to update the masks to clear the low bits that the full shifts would otherwise discard. This replaces the inner body of the loop above:
// Convert from 10:10:10:2 to 8:8:8:8
uint32_t t = *(sptr++);
uint32_t r = (t & 0x000003fc) >> 2;
uint32_t g = (t & 0x000ff000) >> 4;
uint32_t b = (t & 0x3fc00000) >> 6;
t &= 0xc0000000;
uint32_t a = t | (t >> 2) | (t >> 4) | (t >> 6);
*(dest++) = r | g | b | a;
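Here is the combined per-pixel conversion wrapped as a standalone function, so it can be tested on known values. The name r10g10b10a2ToRgba8 is hypothetical. Applied to the value from the question, 3595933029 (0xD6559565), it gives R = G = B = 89 with fully opaque alpha, which is the second interpretation tried in the question.

```cpp
#include <cassert>
#include <cstdint>

// One DXGI_FORMAT_R10G10B10A2_UNORM pixel to DXGI_FORMAT_R8G8B8A8_UNORM:
// each 10-bit channel keeps its top 8 bits, and the 2-bit alpha is
// replicated to fill 8 bits (11 -> 255, 10 -> 170, 01 -> 85, 00 -> 0).
inline uint32_t r10g10b10a2ToRgba8(uint32_t t) {
    const uint32_t r = (t & 0x000003fcu) >> 2;  // bits 2..9   -> 0..7
    const uint32_t g = (t & 0x000ff000u) >> 4;  // bits 12..19 -> 8..15
    const uint32_t b = (t & 0x3fc00000u) >> 6;  // bits 22..29 -> 16..23
    uint32_t a = t & 0xc0000000u;               // bits 30..31
    a = a | (a >> 2) | (a >> 4) | (a >> 6);     // replicate into bits 24..31
    return r | g | b | a;
}
```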
Any time you reduce the bit-depth you will introduce error. Techniques like ordered dithering and error-diffusion dithering are commonly used in pixel conversions of this nature. These add a bit of noise to the image to reduce the visual impact of the lost low bits.
For examples of conversions for all DXGI_FORMAT types, see DirectXTex which makes use of DirectXMath for all the various packed vector types. DirectXTex also implements both 4x4 ordered dithering and Floyd-Steinberg error-diffusion dithering when reducing bit-depth.

LibPNG segmentation fault on png_read_image

I'm having a segmentation fault on png_read_image() and I can't figure out why.
Here's the code:
Initializing pngReadStruct & pngInfoStruct...
// Getting image's width & height
png_uint_32 imgWidth = png_get_image_width(pngReadStruct, pngInfoStruct);
png_uint_32 imgHeight = png_get_image_height(pngReadStruct, pngInfoStruct);
// Getting bits per channel (not per pixel)
png_uint_32 bitDepth = png_get_bit_depth(pngReadStruct, pngInfoStruct);
// Getting number of channels
png_uint_32 channels = png_get_channels(pngReadStruct, pngInfoStruct);
// Getting color type (RGB, RGBA, luminance, alpha, palette, etc)
png_uint_32 colorType = png_get_color_type(pngReadStruct, pngInfoStruct);
// Refining color type (if colored or grayscale)
switch (colorType) {
// If RGB image, setting channel number to 3
channels = 3;
if (bitDepth < 8)
// Updating bitdepth info
bitDepth = 8;
// Adding full alpha channel to the image if it possesses transparency
if (png_get_valid(pngReadStruct, pngInfoStruct, PNG_INFO_tRNS)) {
channels += 1;
// Defining an array to contain image's rows of pixels
std::vector<png_bytep> rowPtrs(imgHeight);
// Defining an array to contain image's pixels (data's type is 'std::unique_ptr<char[]>')
data = std::make_unique<char[]>(imgWidth * imgHeight * bitDepth * channels / 8);
const unsigned long int rowLength = imgWidth * bitDepth * channels / 8;
// Adding every pixel into previously allocated rows
for (unsigned int i = 0; i < imgHeight; ++i) {
// Preparing the rows to handle image's data
rowPtrs[i] = (png_bytep)&data + ((imgHeight - i - 1) * rowLength);
// Recovering image data
png_read_image(pngReadStruct, rowPtrs.data()); // /!\ Segfault here
png_destroy_read_struct(&pngReadStruct, static_cast<png_infopp>(0), static_cast<png_infopp>(0));
Every characteristic taken from the file seems fine to me, and it worked without error just a while ago; it's probably a stupid error I made while refactoring.
Thanks for the help, feel free to ask about anything I may have missed, and sorry for the long code!

DirectX 11 and FreeType

Has anyone ever integrated FreeType with DirectX 11 for font rendering? The only article I seem to find is DirectX 11 Font Rendering. I can't seem to match the correct DXGI_FORMAT for rendering the grayscale bitmap that FreeType creates for a glyph.
There are three ways to handle greyscale textures in Direct3D 11:
Option (1): You can use an RGB format and replicate the channels. For example, you'd use DXGI_FORMAT_R8G8B8A8_UNORM and set R, G, B to the single monochrome channel and A to fully opaque (0xFF). You can handle Monochrome + Alpha (2-channel) data the same way.
This conversion is supported when loading .DDS luminance formats (D3DFMT_L8, D3DFMT_L8A8) by DirectXTex library and the texconv command-line tool with the -xlum switch.
This makes the texture up to 4 times larger in memory, but easily integrates using standard shaders.
Option (2): You keep the monochrome texture as a single channel using DXGI_FORMAT_R8_UNORM as your format. You then render using a custom shader which replicates the red channel to RGB at runtime.
This is in fact what the tutorial blog post you linked to is doing:
///////// PIXEL SHADER
float4 main(float2 uv : TEXCOORD0) : SV_Target0
{
    return float4(Decal.Sample(Bilinear, uv).rrr, 1.f);
}
For Monochrome + Alpha (2-channel) you'd use DXGI_FORMAT_R8G8_UNORM and then your custom shader would use .rrrg as the swizzle.
Option (3): You can compress the monochrome data to the DXGI_FORMAT_BC2_UNORM format using a custom encoder. This is implemented in DirectX Tool Kit's MakeSpriteFont tool when using /TextureFormat:CompressedMono.
// CompressBlock (16 pixels (4x4 block) stored as 16 bytes)
long alphaBits = 0;
int rgbBits = 0;
int pixelCount = 0;
for (int y = 0; y < 4; y++)
{
    for (int x = 0; x < 4; x++)
    {
        long alpha;
        int rgb;
        // This is the single monochrome channel
        int value = bitmapData[blockX + x, blockY + y];
        if (options.NoPremultiply)
        {
            // If we are not premultiplied, RGB is always white and we have 4 bit alpha.
            alpha = value >> 4;
            rgb = 0;
        }
        else
        {
            // For premultiplied encoding, quantize the source value to 2 bit precision.
            if (value < 256 / 6)
            {
                alpha = 0;
                rgb = 1;
            }
            else if (value < 256 / 2)
            {
                alpha = 5;
                rgb = 3;
            }
            else if (value < 256 * 5 / 6)
            {
                alpha = 10;
                rgb = 2;
            }
            else
            {
                alpha = 15;
                rgb = 0;
            }
        }
        // Add this pixel to the alpha and RGB bit masks.
        alphaBits |= alpha << (pixelCount * 4);
        rgbBits |= rgb << (pixelCount * 2);
        pixelCount++;
    }
}
// The resulting BC2 block is:
// uint64_t = alphaBits
// uint16_t = 0xFFFF
// uint16_t = 0x0
// uint32_t = rgbBits
The resulting texture is then rendered using a standard alpha-blending shader. Since it uses 1 byte per pixel, this is effectively the same size as if you were using DXGI_FORMAT_R8_UNORM.
This technique does not work for 2-channel data, but works great for alpha-blended monochrome images like font glyphs.
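For a C++ codebase, the premultiplied path of the encoder above could be ported roughly as follows. This is a sketch, not MakeSpriteFont's code: encodeMonoBC2 is a hypothetical name, the input is assumed to be one 4x4 block stored row-major, and the output is the 16-byte block in little-endian order. Endpoint 0xFFFF is white and 0x0000 is black, so color index 0 = white, 1 = black, 2 = 2/3 white and 3 = 1/3 white, which matches the premultiplied alpha levels 15, 0, 10 and 5.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode a 4x4 block of 8-bit coverage values (block[y * 4 + x]) as a
// premultiplied BC2 block.
std::array<uint8_t, 16> encodeMonoBC2(const uint8_t block[16]) {
    uint64_t alphaBits = 0;
    uint32_t rgbBits = 0;
    for (int i = 0; i < 16; ++i) {
        const int value = block[i];
        uint64_t alpha;
        uint32_t rgb;
        if (value < 256 / 6)          { alpha = 0;  rgb = 1; }
        else if (value < 256 / 2)     { alpha = 5;  rgb = 3; }
        else if (value < 256 * 5 / 6) { alpha = 10; rgb = 2; }
        else                          { alpha = 15; rgb = 0; }
        alphaBits |= alpha << (i * 4);  // 4-bit explicit alpha per pixel
        rgbBits |= rgb << (i * 2);      // 2-bit color index per pixel
    }
    std::array<uint8_t, 16> out{};
    for (int b = 0; b < 8; ++b) out[b] = uint8_t(alphaBits >> (8 * b));
    out[8] = 0xFF; out[9] = 0xFF;    // color0 = 0xFFFF (white in 5:6:5)
    out[10] = 0x00; out[11] = 0x00;  // color1 = 0x0000 (black)
    for (int b = 0; b < 4; ++b) out[12 + b] = uint8_t(rgbBits >> (8 * b));
    return out;
}
```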

How to compress YUYV raw data to JPEG using libjpeg?

I'm looking for an example of how to save a YUYV format frame to a JPEG file using the libjpeg library.
In typical computer APIs, "YUV" actually means YCbCr, and "YUYV" means "YCbCr 4:2:2" stored as Y0, Cb01, Y1, Cr01, Y2 ...
Thus, if you have a "YUV" image, you can save it to libjpeg using the JCS_YCbCr color space.
When you have a 4:2:2 image (YUYV), you have to duplicate the Cb/Cr values to the two pixels that need them before writing the scanline to libjpeg. This write loop will do it for you:
// "base" is an unsigned char const * with the YUYV data
// jrow is a libjpeg row of samples array of 1 row pointer
cinfo.image_width = width & -1;
cinfo.image_height = height & -1;
cinfo.input_components = 3;
cinfo.in_color_space = JCS_YCbCr;
jpeg_set_defaults(&cinfo); // must follow in_color_space, precede set_quality
jpeg_set_quality(&cinfo, 92, TRUE);
jpeg_start_compress(&cinfo, TRUE);
unsigned char *buf = new unsigned char[width * 3];
while (cinfo.next_scanline < height) {
    for (int i = 0; i < cinfo.image_width; i += 2) {
        buf[i*3] = base[i*2];
        buf[i*3+1] = base[i*2+1];
        buf[i*3+2] = base[i*2+3];
        buf[i*3+3] = base[i*2+2];
        buf[i*3+4] = base[i*2+1];
        buf[i*3+5] = base[i*2+3];
    }
    jrow[0] = buf;
    base += width * 2;
    jpeg_write_scanlines(&cinfo, jrow, 1);
}
jpeg_finish_compress(&cinfo);
delete[] buf;
Use your favorite auto-ptr to avoid leaking "buf" if your error or write function can throw / longjmp.
Providing YCbCr to libjpeg directly is preferable to converting to RGB, because libjpeg will store it directly in that format, saving a lot of conversion work. When the image comes from a webcam or other video source, it's also usually most efficient to get it in some form of YCbCr (such as YUYV).
Finally, "U" and "V" mean something slightly different in analog component video, so the naming of YUV in computer APIs that really mean YCbCr is highly confusing.
libjpeg also has a raw data mode, whereby you can directly supply the raw downsampled data (which is almost what you have in the YUYV format). This is more efficient than duplicating the UV values only to have libjpeg downscale them again internally.
To do so, you use jpeg_write_raw_data instead of jpeg_write_scanlines, and by default it will process exactly 16 scanlines at a time. JPEG expects the U and V planes to be 2x downsampled by default. YUYV format already has the horizontal dimension downsampled but not the vertical, so I skip U and V every other scanline.
cinfo.image_width = /* width in pixels */;
cinfo.image_height = /* height in pixels */;
cinfo.input_components = 3;
cinfo.in_color_space = JCS_YCbCr;
jpeg_set_defaults(&cinfo); // resets raw_data_in, so call it first
cinfo.raw_data_in = true;
JSAMPLE y_plane[16][cinfo.image_width];
JSAMPLE u_plane[8][cinfo.image_width / 2];
JSAMPLE v_plane[8][cinfo.image_width / 2];
JSAMPROW y_rows[16];
JSAMPROW u_rows[8];
JSAMPROW v_rows[8];
for (int i = 0; i < 16; ++i)
    y_rows[i] = &y_plane[i][0];
for (int i = 0; i < 8; ++i)
    u_rows[i] = &u_plane[i][0];
for (int i = 0; i < 8; ++i)
    v_rows[i] = &v_plane[i][0];
JSAMPARRAY rows[] { y_rows, u_rows, v_rows };
jpeg_start_compress(&cinfo, true);
while (cinfo.next_scanline < cinfo.image_height)
{
    for (JDIMENSION i = 0; i < 16; ++i)
    {
        auto offset = (cinfo.next_scanline + i) * cinfo.image_width * 2;
        for (JDIMENSION j = 0; j < cinfo.image_width; j += 2)
        {
            y_plane[i][j] = image_data[offset + j * 2 + 0];
            y_plane[i][j + 1] = image_data[offset + j * 2 + 2];
            if (i % 2 == 0)
            {
                u_plane[i / 2][j / 2] = image_data[offset + j * 2 + 1];
                v_plane[i / 2][j / 2] = image_data[offset + j * 2 + 3];
            }
        }
    }
    jpeg_write_raw_data(&cinfo, rows, 16);
}
jpeg_finish_compress(&cinfo);
I was able to get about a 33% decrease in compression time with this method compared to the one in Jon Watte's answer. This solution isn't for everyone though; some caveats:
You can only compress images whose dimensions are a multiple of 16 with this code, since it writes 16 scanlines per call and halves the chroma width. If you have different-sized images, you will have to write code to pad the edges. If you're getting the images from a camera, though, they will most likely be this size.
The quality is somewhat impaired by the fact that I simply skip color values for alternating scanlines instead of something fancier like averaging them. For my application though, speed was more important than quality.
The way it's written right now it allocates a ton of memory on the stack. This was acceptable for me because my images were small (640x480) and enough memory was available.
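The caveats mention averaging instead of skipping the chroma of alternate scanlines. A minimal sketch of that alternative: build one downsampled Cb row and one Cr row from two consecutive YUYV rows by taking their rounded mean. averageChromaRows is a hypothetical helper (width assumed even); in the loop above it would fill u_plane[i / 2] and v_plane[i / 2] from rows i and i + 1 instead of copying row i only.

```cpp
#include <cassert>
#include <cstdint>

// Average the Cb and Cr samples of two consecutive YUYV rows into one
// half-width Cb row and one half-width Cr row.
void averageChromaRows(const uint8_t* row0, const uint8_t* row1, int width,
                       uint8_t* cb, uint8_t* cr) {
    for (int j = 0; j < width; j += 2) {
        const int o = j * 2;  // byte offset of this Y0 Cb Y1 Cr group
        cb[j / 2] = uint8_t((row0[o + 1] + row1[o + 1] + 1) / 2);  // rounded mean
        cr[j / 2] = uint8_t((row0[o + 3] + row1[o + 3] + 1) / 2);
    }
}
```

This costs one add and shift per chroma sample, so it should keep most of the speed advantage while avoiding the aliasing of simply dropping rows.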
Documentation for libjpeg-turbo:

flipping depth frame received from Kinect

I use the following C++ code to read out the depth information from the Kinect:
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;
// end pixel is start + width*height - 1
const USHORT * pBufferEnd = pBufferRun + (Width * Height);
// process data for display in main window.
while ( pBufferRun < pBufferEnd )
{
    // discard the portion of the depth that contains only the player index
    USHORT depth = NuiDepthPixelToDepth(*pBufferRun);
    BYTE intensity = static_cast<BYTE>(depth % 256);
    // Write out blue byte
    *(rgbrun++) = intensity;
    // Write out green byte
    *(rgbrun++) = intensity;
    // Write out red byte
    *(rgbrun++) = intensity;
    // Skip the unused 4th byte (m_depthRGBX assumed to be BGRX, 4 bytes per pixel)
    ++rgbrun;
    // Advance to the next depth pixel
    ++pBufferRun;
}
What I'd like to know is: what is the easiest way to implement frame flipping (horizontal and vertical)? I couldn't find any function in the Kinect SDK, but maybe I missed it?
EDIT1: I'd prefer not to use any external libraries, so any solution that explains the depth data layout and how to invert rows/columns is highly appreciated.
So, you're using a standard 16bpp single channel depth map with player data. This is a nice easy format to work with. An image buffer is arranged row-wise, and each pixel in the image data has the bottom 3 bits set to the player ID and the top 13 bits set to depth data.
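The packing described above can be expressed as a tiny helper. This is a sketch; DepthPixel and unpackDepth are hypothetical names, and the depth value is left in whatever units the SDK reports.

```cpp
#include <cassert>
#include <cstdint>

// Split a 16-bit depth-with-player-index pixel:
// low 3 bits = player index, high 13 bits = depth.
struct DepthPixel { uint16_t depth; uint8_t player; };

inline DepthPixel unpackDepth(uint16_t raw) {
    return { uint16_t(raw >> 3), uint8_t(raw & 0x7) };
}
```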
Here's a quick'n'dirty way to read each row in reverse and write it out to an RGBWhatever image, with a simple depth visualisation that's a little nicer to look at than the wrapping output you currently use.
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;
for (unsigned int y = 0; y < Height; y++)
{
    for (unsigned int x = 0; x < Width; x++)
    {
        // shift off the player bits
        USHORT depthIn = pBufferRun[(y * Width) + (Width - 1 - x)] >> 3;
        // valid depth is (generally) in the range 0 to 4095.
        // here's a simple visualisation to do a greyscale mapping, with white
        // being closest. Set 0 (invalid pixel) to black.
        BYTE intensity =
            depthIn == 0 || depthIn > 4095 ?
            0 : 255 - (BYTE)(((float)depthIn / 4095.0f) * 255.0f);
        *(rgbrun++) = intensity;
        *(rgbrun++) = intensity;
        *(rgbrun++) = intensity;
        // skip the unused 4th byte if the output is BGRX, as m_depthRGBX suggests
        ++rgbrun;
    }
}
Code untested, E&OE, etc ;-)
It is possible to parallelise the outer loop, if instead of using a single rgbrun pointer you get a pointer to the beginning of the current row and write the output to that instead.