Currently implementing a D3D11 renderer, and the majority of my normal maps are in a 3-channel, 8-bits-per-channel RGB format.
Looking through the DXGI_FORMAT MSDN page, I've noticed that there's no option for this. What's the reasoning behind this? How would I use this texture format then?
No hardware has supported a 24-bit texture format for a good decade now.
You have a few options:
Expand to 32 bits. You retain the full quality of the bitmap at the price of extra memory, with no extra processing needed (a sketch follows below).
Use a compressed format. This is where you should head.
GPUs are not at their best when reading a lot of uncompressed texture data. Of course, if you are only rendering a few models it won't matter, but if you start pushing a larger amount of data, you have to go down the compressed road.
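Regarding the first option, here is a minimal sketch of the 24-bit to 32-bit expansion, assuming tightly packed source rows (real bitmaps often pad rows to 4-byte boundaries); the function name is illustrative:

#include <cstdint>
#include <vector>

// Expand tightly packed 24-bit RGB pixels to 32-bit RGBA so the data can be
// uploaded as DXGI_FORMAT_R8G8B8A8_UNORM.
std::vector<uint8_t> ExpandRGB24ToRGBA32(const uint8_t* src, size_t pixelCount)
{
    std::vector<uint8_t> dst(pixelCount * 4);
    for (size_t i = 0; i < pixelCount; ++i)
    {
        dst[i * 4 + 0] = src[i * 3 + 0]; // R
        dst[i * 4 + 1] = src[i * 3 + 1]; // G
        dst[i * 4 + 2] = src[i * 3 + 2]; // B
        dst[i * 4 + 3] = 0xFF;           // opaque alpha
    }
    return dst;
}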
Your compression options if you must support very old hardware:
BC1: an RGB format at 4 bits per pixel. Worst quality; use it only for giant textures where the saving outweighs the quality loss, or better, never use it.
BC3 with a red/alpha swap: 8 bits per pixel, just slightly better.
You should consider these instead:
BC5: 8 bits per pixel, 2 channels; you will have to rebuild Z with sqrt(1 - dot(n.xy, n.xy)) or an equivalent trick (see the sketch after this list).
BC7: 8 bits per pixel, 3/4 channels; the best choice if you need to store something else with your normal, like a variance. This format is great, but it is also very CPU-intensive to compress at its best quality.
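To illustrate the BC5 reconstruction, here is the math as a CPU-side C++ sketch (in an HLSL pixel shader the equivalent one-liner would be sqrt(saturate(1 - dot(n.xy, n.xy)))); it assumes x and y have already been remapped to [-1, 1]:

#include <algorithm>
#include <cmath>

// Rebuild the Z component of a unit normal from its X and Y components,
// as you would after sampling a two-channel BC5 normal map.
float ReconstructNormalZ(float x, float y)
{
    // The max() guards against values that drift slightly outside the
    // unit circle due to compression error.
    float zSquared = std::max(0.0f, 1.0f - (x * x + y * y));
    return std::sqrt(zSquared);
}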
On a side note, do not forget to generate a mip chain down to 1x1; it matters for both performance and visual quality. Do not apply an sRGB conversion as you would for your albedo textures (0.5 is really 0.5), and with some formats you have the option to use an SNORM type to skip the typical 2*n-1 decode, but be careful with the zero case.
I have written a C/C++ implementation of what I term a "compositor" (I come from a video background) to composite/overlay video/graphics on top of a video source. My current compositor implementation is rather naive, and there is room for CPU optimization (e.g. SIMD, threading, etc.).
I've created a high-level diagram of what I am currently doing:
The diagram is self-explanatory. Nonetheless, I'll elaborate on some of the constraints:
The main video always comes served in an 8-bit YUV 4:2:2 packed format
The secondary video (optional) will come served in either an 8-bit YUV 4:2:2 or YUVA 4:2:2:4 packed format.
The output from the overlay must come out in an 8-bit YUV 4:2:2 packed format
Some other bits of information:
The number of graphics inputs will vary; it may (or may not) be a constant value.
The colour format of the graphics can be pinned to either ARGB or YUVA (i.e. I can provide it in whichever format fits best). At the moment, I pin it to YUVA to keep a consistent colour format.
The potential of using OpenGL and accompanying shaders is rather appealing:
No need to reinvent the wheel (in terms of actually performing the composition)
The possibility of using GPU where available.
My concern with using OpenGL is performance. Looking around on the web, my understanding is that a YUV surface would be converted to RGB internally; I would like to minimize the number of colour format conversions and ensure optimal performance. Without prior OpenGL experience, I hope someone can shed some light and suggest whether I'm about to venture down the wrong path.
Perhaps my concern relating to performance is less of an issue when using a dedicated GPU? Do I need to consider separate code paths:
Hardware with GPU(s)
Hardware with only CPU(s)?
Additionally, am I going to struggle when I need to process 10-bit YUV?
You should be able to treat YUV as independent channels throughout. OpenGL shaders will call them r, g, and b, but it's just data that can be treated as whatever you want.
Most GPUs will support 10 bits per channel (plus 2 alpha bits). Some will support 16 bits per channel for all 4 channels, but I'm a little rusty here, so I have no idea how common support for this is. Not sure about the 4:2:2 data, but you can always treat it as 3 separate surfaces (a sketch follows below).
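As an illustration of the "separate surfaces" idea, here is a sketch that uploads one 8-bit plane as a single-channel texture; it assumes a GL 3.0+ context with an extension loader such as GLEW providing the GL_R8 enum (on older contexts GL_LUMINANCE would be the analogue):

#include <GL/glew.h>

// Upload one plane of 8-bit samples as a single-channel texture. For 4:2:2
// data the Y plane is full resolution and the U/V planes are half width.
GLuint UploadPlane(const unsigned char* samples, int width, int height)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glPixelStorei(GL_UNPACK_ALIGNMENT, 1); // rows are tightly packed
    glTexImage2D(GL_TEXTURE_2D, 0, GL_R8, width, height, 0,
                 GL_RED, GL_UNSIGNED_BYTE, samples);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    return tex;
}

The shader then samples all three textures at the same coordinate and treats the resulting values as Y, U, and V rather than R, G, and B.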
The number of graphics inputs will vary; it may (or may not) be a constant value.
This is something I'm a little less sure about. Shaders like this to be predictable. If your implementation allows you to add each input iteratively then you should be fine.
As an alternative suggestion, have you looked into OpenCL?
I have several 32-bit (with alpha channel) bitmap images which I'm using as essential information in my game. The slightest change in the RGBA values breaks everything, so I can't use lossy compression methods like S3TC.
Are there any feasible lossless compression algorithms I can use with OpenGL? I'm using fragment shaders and I want to use the glCompressedTexImage2D() method to define the texture. I haven't tried compressing the texture with OpenGL using the GL_COMPRESSED_RGBA parameter; is there any chance I can get lossless compression that way?
Texture compression, as opposed to regular image compression, is designed for one specific purpose: being a texture. And that means fast random access of data.
Lossless compression formats do not tend to do well when it comes to random access patterns. The major lossless compression formats are some form of RLE or table-based encoding. These are adequate for decompressing the entire dataset at once, but they're terrible at being able to know in which memory location the value for texel (U,V) is.
And that question gets asked a lot when accessing textures.
As such, there are no lossless hardware texture compression formats.
Your options are limited to the following:
Use texture memory as a kind of cache. That is, when you determine that you will need a particular image in this frame, decompress it. This could be done on the CPU or GPU (via compute shaders or the like). Note that for fast GPU decompression, you will have to come up with a compression scheme that takes advantage of parallel execution. Most lossless compression formats are not particularly parallel.
If a particular image has not been used in some time, you put it in a "subject to be reused" pile. And if you need to decompress a new image, you can take the least-recently-used image off of that pile, rather than constantly creating/destroying OpenGL texture objects.
Build your own lossless compression scheme, designed for your specific needs. If you absolutely need exact texel values from the texture, I assume that you aren't using linear filtering when accessing these textures. So these aren't really colors; they're arbitrary information about a texel.
I might suggest field compression (improved packing of your bits in the available space); a sketch follows below. But without knowing what your data actually is or means, I can't say whether your particular use case is amenable to it.
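As a purely hypothetical example of what field compression can look like, suppose each texel logically holds a 5-bit material ID, a 3-bit rotation, and an 8-bit height (these fields and widths are made up for illustration); they pack into one 16-bit value instead of three separate bytes:

#include <cstdint>

// Pack three small fields into a single 16-bit texel value.
uint16_t PackTexel(uint16_t materialId, uint16_t rotation, uint16_t height)
{
    return static_cast<uint16_t>((materialId & 0x1F) |      // bits 0-4
                                 ((rotation & 0x07) << 5) | // bits 5-7
                                 ((height & 0xFF) << 8));   // bits 8-15
}

// Recover the fields; the same shifts and masks work in a shader.
void UnpackTexel(uint16_t packed, uint16_t& materialId,
                 uint16_t& rotation, uint16_t& height)
{
    materialId = packed & 0x1F;
    rotation = (packed >> 5) & 0x07;
    height = (packed >> 8) & 0xFF;
}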
Does anyone know of an efficient way to push 2vuy non-planar data onto a GPU in a way that doesn't require swizzling?
I am grabbing the raw 2vuy data from an h264 video file and successfully loading it into a texture that I map to an OpenGL object. I notice that my code spends a fair amount of time in glgProcessPixelsWithProcessor. My glTexImage2D call looks like the following:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_YCBCR_422_APPLE, GL_UNSIGNED_SHORT_8_8_APPLE, data);
Apple says in its OpenGL guide that GL_YCBCR_422_APPLE provides "acceptable" performance (p. 103), but that:
Note: If your data needs only to be swizzled, glgProcessPixels performs the swizzling reasonably fast although not as fast as if the data didn't need swizzling. But non-native data formats are converted one byte at a time and incurs a performance cost that is best to avoid.
I assume that there is some kind of internal format conversion going on on the CPU. I noticed in another thread that glgProcessPixels is running a block method as well.
Is my path the most efficient? If not, what is?
Your code, as it stands right now, depends on Apple-specific extensions, so I can't tell what's happening inside.
However, what I suggest is that you create three 2D textures, each with exactly one channel, where each texture receives one of the color planes; using independent textures makes supporting chroma subsampling (that 4:2:2) simpler.
In a shader you'd then perform the colorspace conversion (a sketch of the basic arithmetic follows below). When writing down the math, I suggest you go via a connection color space such as XYZ, as this allows you to take the color profile of the output device into account; ICC profiles provide the conversion data from XYZ color space coordinates to device color space (RGB) coordinates.
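For reference, here is the basic per-pixel arithmetic such a shader would perform, written as a C++ sketch. The full-range BT.601 coefficients are an assumption; real footage may be BT.709 and/or limited range, and the XYZ/ICC route described above adds a further matrix on top:

#include <algorithm>
#include <cstdint>

// Convert one Y'CbCr sample to R'G'B' using full-range BT.601 coefficients.
void YCbCrToRGB(uint8_t y, uint8_t cb, uint8_t cr,
                uint8_t& r, uint8_t& g, uint8_t& b)
{
    float fy = static_cast<float>(y);
    float fb = static_cast<float>(cb) - 128.0f; // centre chroma on zero
    float fr = static_cast<float>(cr) - 128.0f;
    auto clamp255 = [](float v) {
        return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v)));
    };
    r = clamp255(fy + 1.402f * fr);
    g = clamp255(fy - 0.344136f * fb - 0.714136f * fr);
    b = clamp255(fy + 1.772f * fb);
}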
I need to convert 24bppRGB to 16bppRGB, 8bppRGB, 4bppRGB, 8bpp grayscale and 4bpp grayscale. Any good links or other suggestions?
Preferably using Windows/GDI+.
[EDIT] Speed is more critical than quality. The source images are screenshots.
[EDIT1] The color conversion is required to minimize space.
You're better off getting yourself a library, as others have suggested. Aside from ImageMagick, there are others, such as OpenCV. The benefits of leaving this to a library are:
Save yourself some time -- by cutting out dev and testing time for the algorithm
Speed. Most libraries out there are optimized to a level far greater than a standard developer (such as ourselves) could achieve
Standards compliance. There are many image formats out there, and using a library cuts the problem of standards compliance out of the equation.
If you're doing this yourself, then your problem can be divided into the following sub-problems:
Simple color quantization. As @Alf P. Steinbach pointed out, this is just "downscaling" the number of colors. RGB24 has 8 bits for each of the R, G, B channels. For RGB16 you can do a number of conversions:
Equal number of bits for each of R, G, B. This typically means 4 or 5 bits each.
Favor the green channel (human eyes are more sensitive to green) and give it 6 bits. R and B get 5 bits.
You can even do the same thing going from RGB24 to RGB8, but the results won't be as pretty as a palettized image:
4 bits green, 2 red, 2 blue.
3 bits green, 5 bits split between red and blue
Palettization (indexed color). This is for going from RGB24 to RGB8 and RGB4. This is a hard problem to solve by yourself.
Color to grayscale conversion. Very easy. Convert your RGB24 to the Y'UV color space and keep the Y' channel. That will give you 8bpp grayscale (a sketch follows below). If you want 4bpp grayscale, you either quantize or palettize.
Also be sure to check out chroma subsampling. Often, you can decrease the bitrate by a third without visible loss of image quality.
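Here is a sketch of the two easy conversions (Problems 1 and 3), with illustrative names; the 5-6-5 packing favors green as suggested above, and the grayscale uses an integer approximation of the Rec. 601 luma weights:

#include <cstdint>

// Problem 1: straight quantization from 8-8-8 RGB to 5-6-5.
uint16_t PackRGB565(uint8_t r, uint8_t g, uint8_t b)
{
    return static_cast<uint16_t>(((r >> 3) << 11) | // 5 bits red
                                 ((g >> 2) << 5) |  // 6 bits green
                                 (b >> 3));         // 5 bits blue
}

// Problem 3: 8bpp grayscale, Y' = 0.299 R + 0.587 G + 0.114 B,
// approximated with integer weights that sum to 256.
uint8_t ToGray8(uint8_t r, uint8_t g, uint8_t b)
{
    return static_cast<uint8_t>((77 * r + 150 * g + 29 * b) >> 8);
}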
With that breakdown, you can divide and conquer. Problems 1 and 3 you can solve pretty quickly. That will let you see the quality you can get simply from coarser color quantization.
Whether or not you want to solve Problem 2 will depend on the result from above. You said that speed is more important, so if the quality from color quantization alone is good enough, don't bother with palettization.
Finally, you never mentioned WHY you are doing this. If this is for reducing storage space, then you should be looking at image compression. Even lossless compression will give you better results than reducing the color depth alone.
EDIT
If you're set on using PNG as the final format, then your options are quite limited, because neither RGB16 nor RGB8 is a valid combination in the PNG header.
So what this means is: regardless of bit depth, you will have to switch to indexed color if you want RGB color images below 24bpp (8 bits per channel). This also means you will NOT be able to take advantage of the color quantization and chroma decimation that I mentioned above; they're not supported in PNG. So you will have to solve Problem 2: palettization.
But before you think about that, some more questions:
What are the dimensions of your images?
What sort of ideal file-size are you after?
How close to that ideal file size do you get with straight RGB24 + PNG compression?
What is the source of your images? You've mentioned screenshots, but since you're so concerned about disk space, I'm beginning to suspect that you might be dealing with image sequences (video). If this is so, then you could do better than PNG compression.
Oh, and if you're serious about doing things with PNG, then definitely have a look at this library.
Find yourself a copy of the ImageMagick library. It's very configurable, so you can teach it about the details of some binary format that you need to process...
See: ImageMagick, which has a very practical license.
I received acceptable (preliminary) results with GDI+ v1.1, which ships with Vista and Win7. It allows conversion to 16bpp (I used PixelFormat16bppRGB565) and to 8bpp and 4bpp using standard palettes. Better quality could be achieved with an "optimal palette": GDI+ will calculate an optimal palette for each screenshot, but the conversion is twice as slow. Grayscale was achieved by specifying a simple custom palette, e.g. as demonstrated here, except that I didn't need to modify the pixels manually; Bitmap::ConvertFormat() did it for me (a sketch follows below).
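For reference, a minimal sketch of that ConvertFormat call; the dither/palette parameter choices here are illustrative, and it assumes GdiplusStartup has already been called and that the GDI+ 1.1 headers are available:

#define GDIPVER 0x0110 // expose the GDI+ 1.1 API (Bitmap::ConvertFormat)
#include <windows.h>
#include <gdiplus.h>   // link with gdiplus.lib
using namespace Gdiplus;

// Convert a loaded screenshot to 16bpp RGB565 in place.
bool ConvertTo565(Bitmap& bmp)
{
    Status s = bmp.ConvertFormat(
        PixelFormat16bppRGB565,
        DitherTypeErrorDiffusion, // or DitherTypeNone for maximum speed
        PaletteTypeCustom,        // ignored for a direct (non-indexed) format
        NULL,                     // no palette needed for RGB565
        0.0f);                    // alpha threshold (unused here)
    return s == Ok;
}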
[EDIT] The results were really acceptable until I decided to check the solution on WinXP. Surprisingly, Microsoft decided not to ship GDI+ v1.1 (required for Bitmap::ConvertFormat) with WinXP. Nice move! So I continue researching...
[EDIT] I had to reimplement this on plain GDI, hardcoding the palettes from GDI+.
Ok, so I'm trying to weigh up the pros and cons of various texture compression techniques. I spend 99.999% of my time coding 2D sprite games for Windows machines using DirectX.
So far I have looked at texture packing (SpriteSheets) with alpha-trimming and that seems like a decent way to get a bit more performance. Now I am starting to look at the texture format that they are stored in; currently everything is stored as *.PNGs.
I have heard that *.DDS files are good, especially when used with DXT5 (/3/1 depending on the task) compression, as the texture remains compressed in VRAM. People also say that, as they are already DirectDraw Surfaces, they load much, much quicker too.
So I created an application to test this out; I run the loop below, releasing the texture between each load.
for (int i = 0; i < 20; i++)
{
    if (FAILED(D3DXCreateTextureFromFile(g_pd3dDevice, L"Test.dds", &g_pTexture)))
    {
        return E_FAIL;
    }
    g_pTexture->Release();
    g_pTexture = NULL;
}
Now if I try this with a DXT5 texture, it takes 5x longer to complete than loading a simple *.PNG. I've heard that it can be slower if you don't generate mipmaps, so I double-checked that. Then I changed the program I was using to generate the *.DDS file, switching to NVIDIA's own nvcompress.exe, but none of it had any effect.
EDIT: I forgot to mention that the files (both *.png and *.dds) are both the same image, just saved in different formats. (Same size, amount of alpha, everything!)
EDIT 2: When using the following parameters it loads almost 2.5x faster AND consumes a LOT less VRAM!
D3DXCreateTextureFromFileEx(g_pd3dDevice, L"Test.dds",
                            D3DX_DEFAULT_NONPOW2, D3DX_DEFAULT_NONPOW2,
                            D3DX_FROM_FILE, 0, D3DFMT_FROM_FILE,
                            D3DPOOL_MANAGED, D3DX_FILTER_NONE,
                            D3DX_FILTER_NONE, 0, NULL, NULL, &g_pTexture)
However, I'm now losing all the transparency in the texture. I've looked at the DXT5 texture and it looks fine in Paint.NET and DirectX DDS Viewer, yet when loaded in, all the transparency turns to solid black. A ColorKey issue?
EDIT 3: Ignore that last bit; I was being idiotic, and in my "quick example" haste I'd forgotten to enable alpha-blending on the D3DXSprite->Begin(). Doh!
You need to distinguish between the format your files are stored in on disk and the format the textures ultimately use in video memory. DXT compressed textures offer a good balance between memory usage and quality in video memory, but other compression techniques like PNG or JPEG generally produce smaller files and/or better quality on disk.
DDS files have the advantage that they support DXT formats directly and are laid out on disk in the same way that DirectX expects the data to be laid out in memory so there is minimal CPU time required after they are loaded to convert them into a format the hardware can use. They also support pre-generated mipmap chains which formats like PNG do not support. Compressing an image to DXT formats is a fairly time consuming process so you generally want to avoid doing it on load if possible.
A DDS file with pre-generated mipmaps that is the same size as and uses the same format as the video memory texture you plan to create from it will use the least CPU time of any standard format. You need to make sure you tell D3DX not to perform any scaling, filtering, format conversion or mipmap generation to guarantee that though. D3DXCreateTextureFromFileEx allows you to specify flags that prevent any internal conversions happening (D3DX_DEFAULT_NONPOW2 for image width and height if your hardware supports non power of two textures, D3DFMT_FROM_FILE to prevent mipmap generation or format conversion, D3DX_FILTER_NONE to prevent any filtering or scaling).
CPU time is only half the story though. These days CPUs are pretty fast and hard drives are relatively slow, so sometimes your total load time can be shorter if you load a smaller compressed file format like PNG or JPG and then do lots of CPU work to convert it, than if you load a larger file like a DDS and just do a memcpy into video memory. A common approach that gives good results is to zip DDS files and decompress them, for fast loading from disk and minimal CPU cost for format conversion (a sketch follows below).
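As a sketch of that zip-the-DDS approach: assume (purely for illustration) the file was written with zlib's compress() and prefixed with its uncompressed size; the texture is then created with the same "no conversion" flags as above so the DXT data is copied as-is:

#include <cstdio>
#include <vector>
#include <zlib.h>
#include <d3dx9.h>

HRESULT LoadZippedDDS(LPDIRECT3DDEVICE9 device, const char* path,
                      LPDIRECT3DTEXTURE9* texture)
{
    FILE* f = std::fopen(path, "rb");
    if (!f) return E_FAIL;
    uLongf rawSize = 0;
    std::fread(&rawSize, sizeof(rawSize), 1, f); // stored uncompressed size
    std::fseek(f, 0, SEEK_END);
    size_t packedSize = static_cast<size_t>(std::ftell(f)) - sizeof(rawSize);
    std::fseek(f, static_cast<long>(sizeof(rawSize)), SEEK_SET);
    std::vector<Bytef> packed(packedSize);
    std::fread(packed.data(), 1, packed.size(), f);
    std::fclose(f);

    std::vector<Bytef> dds(rawSize);
    if (uncompress(dds.data(), &rawSize, packed.data(),
                   static_cast<uLong>(packed.size())) != Z_OK)
        return E_FAIL;

    return D3DXCreateTextureFromFileInMemoryEx(
        device, dds.data(), static_cast<UINT>(rawSize),
        D3DX_DEFAULT_NONPOW2, D3DX_DEFAULT_NONPOW2, D3DX_FROM_FILE,
        0, D3DFMT_FROM_FILE, D3DPOOL_MANAGED,
        D3DX_FILTER_NONE, D3DX_FILTER_NONE, 0, NULL, NULL, texture);
}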
Compression formats like PNG and JPG will compress some images more effectively than others, whereas DXT has a fixed compression ratio: a given image resolution and format will always compress to the same size (this is why it is more suitable for decompression in hardware). If you're using simple, non-representative images for testing (e.g. a uniform colour or a simple pattern), then your PNG file is likely to be very small and so will load from disk faster than a typical game image would.
Compare loading a standard PNG and then compressing it, against the time it takes to load a DDS file.
Still, I can't see why a PNG would load any faster than the same texture DXT5-compressed. For one, it will be a fair bit smaller, so it should load from disk faster! Is this DXT5 texture the same as the PNG texture, i.e. are they the same size?
Have you tried playing with D3DXCreateTextureFromFileEx? You have far more control over what is going on. It may help you out.