DirectX 11 and FreeType - c++

Has anyone ever integrated FreeType with DirectX 11 for font rendering? The only article I seem to find is DirectX 11 Font Rendering. I can't seem to match the correct DXGI_FORMAT for rendering the grayscale bitmap that FreeType creates for a glyph.

There are three ways to handle greyscale textures in Direct3D 11:
Option (1): You can use an RGB format and replicate the channels. For example, you'd use DXGI_FORMAT_R8G8B8A8_UNORM and set R, G, and B to the single monochrome channel and A to fully opaque (0xFF). You can handle Monochrome + Alpha (2-channel) data the same way.
This conversion is supported when loading .DDS luminance formats (D3DFMT_L8, D3DFMT_L8A8) by DirectXTex library and the texconv command-line tool with the -xlum switch.
This makes the texture up to 4 times larger in memory, but easily integrates using standard shaders.
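As a rough illustration of Option (1), the channel replication can be done on the CPU before the texture is created. This is only a sketch: the function name ExpandGrayToRGBA and its parameters are made up for this example, and it assumes an 8-bit-per-pixel source whose rows are `pitch` bytes apart (FreeType's FT_Bitmap exposes buffer, width, rows and pitch for this purpose) with a positive pitch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands an 8-bit monochrome bitmap into tightly packed RGBA8 pixels.
// `pitch` is the byte distance between source rows (assumed positive here).
// ExpandGrayToRGBA is a hypothetical helper, not part of FreeType or Direct3D.
std::vector<std::uint8_t> ExpandGrayToRGBA(const std::uint8_t* src,
                                           std::size_t width,
                                           std::size_t rows,
                                           std::size_t pitch)
{
    std::vector<std::uint8_t> rgba(width * rows * 4);
    for (std::size_t y = 0; y < rows; ++y)
    {
        const std::uint8_t* srcRow = src + y * pitch;
        std::uint8_t* dstRow = rgba.data() + y * width * 4;
        for (std::size_t x = 0; x < width; ++x)
        {
            const std::uint8_t v = srcRow[x];
            dstRow[x * 4 + 0] = v;    // R
            dstRow[x * 4 + 1] = v;    // G
            dstRow[x * 4 + 2] = v;    // B
            dstRow[x * 4 + 3] = 0xFF; // A: fully opaque
        }
    }
    return rgba;
}
```

The resulting buffer has a row pitch of width * 4 bytes, which is the value you would pass as SysMemPitch when creating a DXGI_FORMAT_R8G8B8A8_UNORM texture from it.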
Option (2): You keep the monochrome texture as a single channel using DXGI_FORMAT_R8_UNORM as your format. You then render using a custom shader which replicates the red channel to RGB at runtime.
This is in fact what the tutorial blog post you linked to is doing:
///////// PIXEL SHADER
float4 main(float2 uv : TEXCOORD0) : SV_Target0
{
    return float4(Decal.Sample(Bilinear, uv).rrr, 1.f);
}
For Monochrome + Alpha (2-channel) you'd use DXGI_FORMAT_R8G8_UNORM and then your custom shader would use .rrrg as the swizzle.
Option (3): You can compress the monochrome data to the DXGI_FORMAT_BC2_UNORM format using a custom encoder. This is implemented in DirectX Tool Kit's MakeSpriteFont tool when using /TextureFormat:CompressedMono.
// CompressBlock (16 pixels (4x4 block) stored as 16 bytes)
long alphaBits = 0;
int rgbBits = 0;
int pixelCount = 0;
for (int y = 0; y < 4; y++)
{
    for (int x = 0; x < 4; x++)
    {
        long alpha;
        int rgb;
        // This is the single monochrome channel
        int value = bitmapData[blockX + x, blockY + y];
        if (options.NoPremultiply)
        {
            // If we are not premultiplied, RGB is always white and we have 4 bit alpha.
            alpha = value >> 4;
            rgb = 0;
        }
        else
        {
            // For premultiplied encoding, quantize the source value to 2 bit precision.
            if (value < 256 / 6)
            {
                alpha = 0;
                rgb = 1;
            }
            else if (value < 256 / 2)
            {
                alpha = 5;
                rgb = 3;
            }
            else if (value < 256 * 5 / 6)
            {
                alpha = 10;
                rgb = 2;
            }
            else
            {
                alpha = 15;
                rgb = 0;
            }
        }
        // Add this pixel to the alpha and RGB bit masks.
        alphaBits |= alpha << (pixelCount * 4);
        rgbBits |= rgb << (pixelCount * 2);
        pixelCount++;
    }
}
// The resulting BC2 block is:
// uint64_t = alphaBits
// uint16_t = 0xFFFF
// uint16_t = 0x0
// uint32_t = rgbBits
The resulting texture is then rendered using a standard alpha-blending shader. Since it uses 1 byte per pixel, this is effectively the same size as if you were using DXGI_FORMAT_R8_UNORM.
This technique does not work for 2-channel data, but works great for alpha-blended monochrome images like font glyphs.


Why does my bitmap image have a color overlay after converting from 32-bit to 8-bit?

I'm working on resizing a bitmap image and converting it to 8-bit (grayscale). The problem is that when I convert a 32-bit image to 8-bit, the result has a color overlay, while the same code works perfectly on 24-bit input. I guess the cause is the alpha channel, but I don't know exactly where the problem is.
This is my code to generate the 8-bit color palette and write it after the DIB header:
char* palette = new char[1024];
for (int i = 0; i < 256; i++) {
    palette[i * 4] = palette[i * 4 + 1] = palette[i * 4 + 2] = (char)i;
    palette[i * 4 + 3] = 255;
}
fout.write(palette, 1024);
delete[] palette;
As I said, my code works perfectly on 24-bit. In 32-bit the color is still kept after resizing, but when converting to 8-bit, it will look like this:
expected image (when converted from 24-bit) //
unexpected image (when converted from 32-bit)
This is how I get the colors and save it to srcPixel[]:
int i = 0;
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        int index = getIndex(width, x, y);
        srcPixel[index].A = srcBMP.pImageData[i];
        i += alpha;
        srcPixel[index].B = srcBMP.pImageData[i++];
        srcPixel[index].G = srcBMP.pImageData[i++];
        srcPixel[index].R = srcBMP.pImageData[i++];
    }
    i += padding;
}
And this is the code that converts it by averaging the 4 channels A, B, G and R from srcPixel[]:
int i = 0;
for (int y = 0; y < dstHeight; y++) {
    for (int x = 0; x < dstWidth; x++) {
        int index = getIndex(dstWidth, x, y);
        dstBMP.pImageData[i++] = (srcPixel[index].A + srcPixel[index].B + srcPixel[index].G + srcPixel[index].R) / 4;
    }
    i += dstPadding;
}
If I skip all the alpha bytes when reading, the converted image still looks the same, and I get an additional problem: the resized image also gets a color overlay, just like the one from the 8-bit conversion (resizing without the alpha channel).
If I skip the alpha channel while averaging (changing the line to dstBMP.pImageData[i++] = (srcPixel[index].B + srcPixel[index].G + srcPixel[index].R) / 3), almost nothing changes; the overlay is still there.
If I remove palette[i * 4 + 3] = 255; or change it in any way, the result is not affected.
Thank you very much.
You add the alpha channel to the color average, and that's why the result becomes brighter. Opaque is 255 and transparent is 0, so for an opaque image you are adding another channel that is set to 'white' to your result.
Remove the alpha channel from your equation and see if I'm right.
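To make the suggestion concrete, here is a minimal sketch of a per-pixel conversion that ignores alpha. It assumes the common uncompressed 32-bit BMP layout in which each pixel is stored as blue, green, red and then the alpha (or unused) byte; the helper name GrayFromBGRA is made up for this example. If your file uses that layout, also note that reading A before B, G and R would shift every channel by one byte, which is worth checking.

```cpp
#include <cstdint>

// Converts one 32-bit pixel stored as B, G, R, A bytes to an 8-bit gray value.
// The alpha byte (px[3]) is deliberately left out of the average.
// GrayFromBGRA is a hypothetical helper; the B, G, R, A order is an assumption.
std::uint8_t GrayFromBGRA(const std::uint8_t* px)
{
    const unsigned b = px[0];
    const unsigned g = px[1];
    const unsigned r = px[2];
    return static_cast<std::uint8_t>((b + g + r) / 3);
}
```

With an identity grayscale palette like the one in the question, the returned value can be written directly as the 8-bit palette index.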

How to decompress a BC3_UNORM DDS texture format?

I've read a lot of articles and code, but I still cannot get this to work. I read all 128 bytes of the header in my texture and then read the 65536 bytes of compressed texture data (the texture's resolution is 256x256 and each compressed pixel uses 1 byte). I tried to write my own decompression algorithm with no success, so I decided to use someone else's and found this code. These are the arguments I pass to it to decompress my DDS texture: BlockDecompressImageDXT5(textureHeader.dwWidth, textureHeader.dwHeight, temp, packedData)
Note: textureHeader is a valid struct with the DDS texture's header data loaded into it, temp is an unsigned char array holding all the DDS data that was read from the DDS texture, and packedData is an unsigned long array that I expect to receive the final decompressed data. In the code I linked, the RGBA channels for each pixel are packed by the PackRGBA function, one byte per channel, into packedData. Before pointing D3D11_SUBRESOURCE_DATA::pSysMem at the texture data, I distribute each byte of the unsigned long packedData into 4 separate unsigned char entries of m_DDSData this way:
for (int i{ 0 }, iData{ 0 }; i < textureHeader.dwPitchOrLinearSize; i++, iData += 4) //dwPitchOrLinearSize is the size in bytes of the compressed data.
{
    m_DDSData[iData] = ((packedData[i] << 24) >> 24); //first char receives the 1st byte, representing the red color.
    m_DDSData[iData + 1] = ((packedData[i] << 16) >> 24); //second char receives the 2nd byte, representing the green color.
    m_DDSData[iData + 2] = ((packedData[i] << 8) >> 24); //third char receives the 3rd byte, representing the blue color.
    m_DDSData[iData + 3] = (packedData[i] >> 24); //fourth char receives the 4th byte, representing the alpha color.
}
Note: m_DDSData should be the final data array that D3D11_SUBRESOURCE_DATA points to for the texture's data, but when I use it I only get a frame with random colors instead of my actual texture. I also have algorithms for other texture types and they work properly, so I'm fairly sure the problem is only with the DDS compressed format.
EDIT: Another example, this is a model of a chest and the program should be rendering the chest's texture:
For a full description of the BC3 compression scheme, see Microsoft Docs. BC3 is just the modern name for DXT4/DXT5 compression a.k.a. S3TC. In short, it compresses a 4x4 block of pixels at a time into the following structures resulting in 16 bytes per block:
struct BC1
{
    uint16_t rgb[2]; // 565 colors
    uint32_t bitmap; // 2bpp rgb bitmap
};
static_assert(sizeof(BC1) == 8, "Mismatch block size");
struct BC3
{
    uint8_t alpha[2]; // alpha values
    uint8_t bitmap[6]; // 3bpp alpha bitmap
    BC1 bc1; // BC1 rgb data
};
static_assert(sizeof(BC3) == 16, "Mismatch block size");
CPU decompression
For the color portion, it's the same as the "BC1" a.k.a. DXT1 compressed block. This is pseudo-code, but should get the point across:
auto pBC = &pBC3->bc1;
clr0 = pBC->rgb[0]; // 5:6:5 RGB
clr0.a = 255;
clr1 = pBC->rgb[1]; // 5:6:5 RGB
clr1.a = 255;
clr2 = lerp(clr0, clr1, 1.0f / 3.0f);
clr2.a = 255;
clr3 = lerp(clr0, clr1, 2.0f / 3.0f);
clr3.a = 255;
uint32_t dw = pBC->bitmap;
for (size_t i = 0; i < NUM_PIXELS_PER_BLOCK; ++i, dw >>= 2)
{
    switch (dw & 3)
    {
    case 0: pColor[i] = clr0; break;
    case 1: pColor[i] = clr1; break;
    case 2: pColor[i] = clr2; break;
    case 3: pColor[i] = clr3; break;
    }
}
Note while a BC3 contains a BC1 block, the decoding rules for BC1 are slightly modified. When decompressing BC1, you normally check the order of the colors as follows:
if (pBC->rgb[0] <= pBC->rgb[1])
{
    /* BC1 with 1-bit alpha */
    clr2 = lerp(clr0, clr1, 0.5f);
    clr2.a = 255;
    clr3 = 0; // alpha of zero
}
BC2 and BC3 already include the alpha channel, so this extra logic is not used, and you always have 4 opaque colors.
For the alpha portion, BC3 uses two alpha values and then generates a look-up table based on those values:
alpha[0] = alpha0 = pBC3->alpha[0];
alpha[1] = alpha1 = pBC3->alpha[1];
if (alpha0 > alpha1)
{
    // 6 interpolated alpha values.
    alpha[2] = lerp(alpha0, alpha1, 1.0f / 7.0f);
    alpha[3] = lerp(alpha0, alpha1, 2.0f / 7.0f);
    alpha[4] = lerp(alpha0, alpha1, 3.0f / 7.0f);
    alpha[5] = lerp(alpha0, alpha1, 4.0f / 7.0f);
    alpha[6] = lerp(alpha0, alpha1, 5.0f / 7.0f);
    alpha[7] = lerp(alpha0, alpha1, 6.0f / 7.0f);
}
else
{
    // 4 interpolated alpha values.
    alpha[2] = lerp(alpha0, alpha1, 1.0f / 5.0f);
    alpha[3] = lerp(alpha0, alpha1, 2.0f / 5.0f);
    alpha[4] = lerp(alpha0, alpha1, 3.0f / 5.0f);
    alpha[5] = lerp(alpha0, alpha1, 4.0f / 5.0f);
    alpha[6] = 0;
    alpha[7] = 255;
}
// First 24 bits of the 3bpp bitmap cover pixels 0-7
uint32_t dw = uint32_t(pBC3->bitmap[0]) | uint32_t(pBC3->bitmap[1] << 8)
    | uint32_t(pBC3->bitmap[2] << 16);
for (size_t i = 0; i < 8; ++i, dw >>= 3)
    pColor[i].a = alpha[dw & 0x7];
// Remaining 24 bits cover pixels 8-15
dw = uint32_t(pBC3->bitmap[3]) | uint32_t(pBC3->bitmap[4] << 8)
    | uint32_t(pBC3->bitmap[5] << 16);
for (size_t i = 8; i < NUM_PIXELS_PER_BLOCK; ++i, dw >>= 3)
    pColor[i].a = alpha[dw & 0x7];
DirectXTex includes functions for doing all the compression/decompression for all BC formats.
If you want to know what the pseudo-function lerp does, see wikipedia or HLSL docs.
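For reference, the color and alpha rules described above can be combined into one small, self-contained block decoder. Treat it as a sketch rather than a reference implementation: the names RGBA8, Expand565, Mix and DecodeBC3Block are made up for this example, the interpolation uses truncating integer division (hardware and DirectXTex may round slightly differently), and the block is read byte by byte in the little-endian layout used by DDS files.

```cpp
#include <array>
#include <cstdint>

struct RGBA8 { std::uint8_t r, g, b, a; };

// Expands a packed 5:6:5 color to 8 bits per channel by bit replication.
static RGBA8 Expand565(std::uint16_t c)
{
    const unsigned r = (c >> 11) & 0x1Fu;
    const unsigned g = (c >> 5) & 0x3Fu;
    const unsigned b = c & 0x1Fu;
    return RGBA8{ static_cast<std::uint8_t>((r << 3) | (r >> 2)),
                  static_cast<std::uint8_t>((g << 2) | (g >> 4)),
                  static_cast<std::uint8_t>((b << 3) | (b >> 2)),
                  255 };
}

// Weighted blend (a * wa + b * wb) / (wa + wb), truncating.
static std::uint8_t Mix(unsigned a, unsigned wa, unsigned b, unsigned wb)
{
    return static_cast<std::uint8_t>((a * wa + b * wb) / (wa + wb));
}

// Decodes one 16-byte BC3 block into 16 RGBA pixels (row-major within the 4x4 block).
std::array<RGBA8, 16> DecodeBC3Block(const std::uint8_t* block)
{
    // Alpha look-up table built from the two endpoint alphas.
    unsigned alpha[8];
    const unsigned a0 = block[0];
    const unsigned a1 = block[1];
    alpha[0] = a0;
    alpha[1] = a1;
    if (a0 > a1) {
        for (unsigned i = 1; i <= 6; ++i)   // 6 interpolated values
            alpha[i + 1] = Mix(a0, 7 - i, a1, i);
    } else {
        for (unsigned i = 1; i <= 4; ++i)   // 4 interpolated values
            alpha[i + 1] = Mix(a0, 5 - i, a1, i);
        alpha[6] = 0;
        alpha[7] = 255;
    }

    // 48-bit alpha index bitmap, 3 bits per pixel.
    std::uint64_t alphaBits = 0;
    for (unsigned i = 0; i < 6; ++i)
        alphaBits |= static_cast<std::uint64_t>(block[2 + i]) << (8 * i);

    // Color palette: always 4 opaque colors in BC3.
    RGBA8 clr[4];
    clr[0] = Expand565(static_cast<std::uint16_t>(block[8] | (block[9] << 8)));
    clr[1] = Expand565(static_cast<std::uint16_t>(block[10] | (block[11] << 8)));
    clr[2] = RGBA8{ Mix(clr[0].r, 2, clr[1].r, 1), Mix(clr[0].g, 2, clr[1].g, 1),
                    Mix(clr[0].b, 2, clr[1].b, 1), 255 };
    clr[3] = RGBA8{ Mix(clr[0].r, 1, clr[1].r, 2), Mix(clr[0].g, 1, clr[1].g, 2),
                    Mix(clr[0].b, 1, clr[1].b, 2), 255 };

    // 32-bit color index bitmap, 2 bits per pixel.
    std::uint32_t colorBits = static_cast<std::uint32_t>(block[12])
                            | (static_cast<std::uint32_t>(block[13]) << 8)
                            | (static_cast<std::uint32_t>(block[14]) << 16)
                            | (static_cast<std::uint32_t>(block[15]) << 24);

    std::array<RGBA8, 16> out{};
    for (unsigned i = 0; i < 16; ++i) {
        out[i] = clr[colorBits & 3u];
        out[i].a = static_cast<std::uint8_t>(alpha[alphaBits & 7u]);
        colorBits >>= 2;
        alphaBits >>= 3;
    }
    return out;
}
```

To decode a whole image you would call this once per 16-byte block and copy the 16 pixels into the matching 4x4 region of the destination, walking the blocks row by row.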
Rendering with a compressed texture
If you are going to be rendering with Direct3D, you do not need to decompress the texture. All Direct3D hardware feature levels include support for BC1 - BC3 texture compression. You just create the texture with the DXGI_FORMAT_BC3_UNORM format and create the texture as normal. Something like this:
D3D11_TEXTURE2D_DESC desc = {};
desc.Width = textureHeader.dwWidth;
desc.Height = textureHeader.dwHeight;
desc.MipLevels = desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_BC3_UNORM;
desc.SampleDesc.Count = 1;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
D3D11_SUBRESOURCE_DATA initData = {};
initData.pSysMem = temp;
// For BC compressed textures pitch is the number of bytes in a ROW of blocks
initData.SysMemPitch = 16 * (textureHeader.dwWidth / 4);
Microsoft::WRL::ComPtr<ID3D11Texture2D> pTexture;
HRESULT hr = device->CreateTexture2D( &desc, &initData, &pTexture );
if (FAILED(hr))
{
    // error
}
For a full-featured DDS loader that supports arbitrary DXGI formats, mipmaps, texture arrays, volume maps, cubemaps, cubemap arrays, etc., see DDSTextureLoader. This code is included in DirectX Tool Kit for DX11 / DX12. There are standalone versions for DirectX 9, DirectX 10, and DirectX 11 in DirectXTex.
If loading legacy DDS files (i.e. those that do not map directly to DXGI formats), then use the DDS functions in DirectXTex, which do all the various pixel format conversions required (3:3:2, 3:3:2:8, 4:4, 8:8:8, P8, A8P8, etc.)
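The pitch expression in the texture-creation snippet assumes a width that is a multiple of 4. A slightly more general sketch of the same arithmetic, rounding partial blocks up, is shown here; the helper names BlockRowPitch and BlockSurfaceSize are made up for this example, and bytesPerBlock is 16 for BC3 (8 for BC1).

```cpp
#include <algorithm>
#include <cstddef>

// Number of bytes in one ROW of 4x4 blocks for a block-compressed texture.
// BlockRowPitch is a hypothetical helper name used for illustration only.
std::size_t BlockRowPitch(std::size_t width, std::size_t bytesPerBlock)
{
    return std::max<std::size_t>(1, (width + 3) / 4) * bytesPerBlock;
}

// Total number of bytes for one mip level.
std::size_t BlockSurfaceSize(std::size_t width, std::size_t height,
                             std::size_t bytesPerBlock)
{
    return BlockRowPitch(width, bytesPerBlock)
         * std::max<std::size_t>(1, (height + 3) / 4);
}
```

For the 256x256 BC3 texture from the question this gives a pitch of 1024 bytes and a total of 65536 bytes, which matches the amount of compressed data that was read from the file.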

LibPNG segmentation fault on png_read_image

I'm having a segmentation fault on png_read_image() and I can't figure out why.
Here's the code:
Initializing pngReadStruct & pngInfoStruct...
// Getting image's width & height
png_uint_32 imgWidth = png_get_image_width(pngReadStruct, pngInfoStruct);
png_uint_32 imgHeight = png_get_image_height(pngReadStruct, pngInfoStruct);
// Getting bits per channel (not per pixel)
png_uint_32 bitDepth = png_get_bit_depth(pngReadStruct, pngInfoStruct);
// Getting number of channels
png_uint_32 channels = png_get_channels(pngReadStruct, pngInfoStruct);
// Getting color type (RGB, RGBA, luminance, alpha, palette, etc)
png_uint_32 colorType = png_get_color_type(pngReadStruct, pngInfoStruct);
// Refining color type (if colored or grayscale)
switch (colorType) {
// If RBG image, setting channel number to 3
channels = 3;
if (bitDepth < 8)
// Updating bitdepth info
bitDepth = 8;
// Adding full alpha channel to the image if it possesses transparency
if (png_get_valid(pngReadStruct, pngInfoStruct, PNG_INFO_tRNS)) {
channels += 1;
// Defining an array to contain image's rows of pixels
std::vector<png_bytep> rowPtrs(imgHeight);
// Defining an array to contain image's pixels (data's type is 'std::unique_ptr<char[]>')
data = std::make_unique<char[]>(imgWidth * imgHeight * bitDepth * channels / 8);
const unsigned long int rowLength = imgWidth * bitDepth * channels / 8;
// Adding every pixel into previously allocated rows
for (unsigned int i = 0; i < imgHeight; ++i) {
// Preparing the rows to handle image's data
rowPtrs[i] = (png_bytep)&data + ((imgHeight - i - 1) * rowLength);
// Recovering image data
png_read_image(pngReadStruct,; // /!\ Segfault here
png_destroy_read_struct(&pngReadStruct, static_cast<png_infopp>(0), static_cast<png_infopp>(0));
Every characteristic read from the file seems fine to me, and it worked without error just a while ago; it is probably a stupid mistake I made while refactoring.
Thanks for the help, feel free to ask about anything I may have missed, and sorry for the long code!
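Not a definitive diagnosis, but one thing that stands out in the row setup is (png_bytep)&data: data is a std::unique_ptr<char[]>, so &data is the address of the smart pointer object itself, not of the pixel buffer it owns. The sketch below shows the same bottom-up row layout built from the buffer address instead. MakeRowPointers is a made-up helper name, and the buffer is assumed to be unsigned char so no cast is needed.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Builds bottom-up row pointers into one contiguous pixel buffer.
// `pixels` must point at the first byte of the buffer (e.g. data.get()),
// not at the smart pointer that owns it.
// MakeRowPointers is a hypothetical helper used for illustration only.
std::vector<unsigned char*> MakeRowPointers(unsigned char* pixels,
                                            std::size_t height,
                                            std::size_t rowLength)
{
    std::vector<unsigned char*> rows(height);
    for (std::size_t i = 0; i < height; ++i)
        rows[i] = pixels + (height - i - 1) * rowLength;
    return rows;
}
```

With a buffer created as std::make_unique<unsigned char[]>(height * rowLength), you would call MakeRowPointers(data.get(), height, rowLength) and hand rows.data() to png_read_image. It may also be worth comparing the hand-computed rowLength with what png_get_rowbytes reports after png_read_update_info, since the requested transformations change the row size.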

What's the best way to read texture info from an image file and get to the pixels using SDL2?

You can create a texture in SDL2 using SDL_CreateTexture() and then get access to the pixels in that texture using SDL_LockTexture(). But in order to do so you need to have passed the SDL_TEXTUREACCESS_STREAMING flag to the SDL_CreateTexture() call.
There's a fairly standard helper library for loading images called SDL_image. I use it to read image files into textures (textures are graphics card resident images for the casual observer). I'm currently loading my textures using IMG_LoadTexture(). My problem is I can't see how to set the SDL_TEXTUREACCESS_STREAMING flag in this case. So I can't get pixel data for textures loaded with SDL_image?
The reason I want to get to the pixels is to extract nine-patch data from them. (I may well end up having 9 textures). So I only need this info once at the start and I only need to read the texture data, not write it. I'd also like to use preexisting image file reading libraries if at all possible.
So the question is: What's the best way to read texture info from an image file and get to the pixels using SDL2?
Decided it was better to do it with surfaces and then convert to textures. This code works. It only does the top left corner of the nine-patch to keep it simple. (This also ignores the complication of the nine patch sizing info in the first row and column of the nine patch image).
typedef unsigned char byte_t;
NinePatch::NinePatch(const string fname, SDL_Renderer * renderer) {
// get the surface and the bits per pixel
SDL_Surface * surface = IMG_Load(fname.c_str());
int bytes_per_pixel = surface->format->BytesPerPixel;
// keep things simple by only looking at 4 byte/pixel nine-patches
if (bytes_per_pixel != 4) {
log_msg("Loading " + fname +
" expecting pixel data to be 4 but it has: " +
// offsets into the surface that divide the surface into a nine-patch
unsigned int left, right, top, bottom;
// find the widths we need by looking at the top row of pixels
byte_t * ptr = (byte_t*)surface->pixels;
uint32_t pixel, last_pixel = 0;
for (int i = 0; i < surface->w; i++) {
    // we know they're 4 byte pixels cause otherwise we don't get here.
    pixel = *(uint32_t*)ptr;
    // look for "edges" in the top row of pixel data
    if (pixel > last_pixel) {
        left = i;
    }
    else if (pixel < last_pixel) {
        right = i;
    }
    last_pixel = pixel;
    // get the next pixel across
    ptr += bytes_per_pixel;
}
// find the heights we need by looking at the left column of pixels
ptr = (byte_t*)surface->pixels;
last_pixel = 0;
for (int i = 0; i < surface->h; i++) {
    // we know they're 4 byte pixels cause otherwise we don't get here.
    pixel = *(uint32_t*)ptr;
    // look for "edges" in the left column of pixel data
    if (pixel > last_pixel) {
        top = i;
    }
    else if (pixel < last_pixel) {
        bottom = i;
    }
    last_pixel = pixel;
    // get the next pixel down (rows may be padded, so step by the pitch)
    ptr += surface->pitch;
}
// SDL interprets each pixel as a 32-bit number, so our masks
// must depend on the endianness (byte order) of the machine
Uint32 rmask, gmask, bmask, amask;
#if SDL_BYTEORDER == SDL_BIG_ENDIAN
rmask = 0xff000000;
gmask = 0x00ff0000;
bmask = 0x0000ff00;
amask = 0x000000ff;
#else
rmask = 0x000000ff;
gmask = 0x0000ff00;
bmask = 0x00ff0000;
amask = 0xff000000;
#endif
const uint32_t unused_flags = 0;
const int pixel_size = 32; // in bits
// scratch surface we use for breaking the nine-patch
// surface into little textures.
SDL_Surface * s;
SDL_Rect src_rect;
// create a surface to hold the top left corner
s = SDL_CreateRGBSurface(unused_flags, left, top,
pixel_size, rmask, gmask, bmask, amask);
// copy part of the nine-patch image surface into the new surface
src_rect.x = 0;
src_rect.y = 0;
src_rect.w = left;
src_rect.h = top;
SDL_BlitSurface(surface, &src_rect, s, NULL);
// convert the new corner surface into a texture
top_left_texture = SDL_CreateTextureFromSurface(renderer, s);
// free the scratch surface
SDL_FreeSurface(s);

flipping depth frame received from Kinect

I use the following c++ code to read out the depth information from the kinect:
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;
// end pixel is start + width*height - 1
const USHORT * pBufferEnd = pBufferRun + (Width * Height);
// process data for display in main window.
while ( pBufferRun < pBufferEnd )
{
    // discard the portion of the depth that contains only the player index
    USHORT depth = NuiDepthPixelToDepth(*pBufferRun);
    BYTE intensity = static_cast<BYTE>(depth % 256);
    // Write out blue byte
    *(rgbrun++) = intensity;
    // Write out green byte
    *(rgbrun++) = intensity;
    // Write out red byte
    *(rgbrun++) = intensity;
    // Skip the unused fourth byte of the RGBX pixel
    ++rgbrun;
    // Advance to the next depth pixel
    ++pBufferRun;
}
What I'd like to know is: what is the easiest way to implement frame flipping (horizontal & vertical)? I couldn't find any function for it in the Kinect SDK, but maybe I missed it?
EDIT1: I'd prefer not to use any external libraries, so any solution that explains the depth data layout and how to invert rows / columns is highly appreciated.
So, you're using a standard 16bpp single channel depth map with player data. This is a nice easy format to work with. An image buffer is arranged row-wise, and each pixel in the image data has the bottom 3 bits set to the player ID and the top 13 bits set to depth data.
Here's a quick'n'dirty way to read each row in reverse, and write it out to an RGBWhatever image with a simple depth visualisation that's a little nicer to look at than the wrapping output you currently use.
BYTE * rgbrun = m_depthRGBX;
const USHORT * pBufferRun = (const USHORT *)LockedRect.pBits;
for (unsigned int y = 0; y < Height; y++)
{
    for (unsigned int x = 0; x < Width; x++)
    {
        // shift off the player bits
        USHORT depthIn = pBufferRun[(y * Width) + (Width - 1 - x)] >> 3;
        // valid depth is (generally) in the range 0 to 4095.
        // here's a simple visualisation to do a greyscale mapping, with white
        // being closest. Set 0 (invalid pixel) to black.
        BYTE intensity =
            depthIn == 0 || depthIn > 4095 ?
            0 : 255 - (BYTE)(((float)depthIn / 4095.0f) * 255.0f);
        *(rgbrun++) = intensity;
        *(rgbrun++) = intensity;
        *(rgbrun++) = intensity;
        // skip the unused fourth byte of the RGBX pixel
        ++rgbrun;
    }
}
Code untested, E&OE, etc ;-)
It is possible to parallelise the outer loop, if instead of using a single rgbrun pointer you get a pointer to the beginning of the current row and write the output to that instead.