Lossless texture compression for OpenGL

I have several 32-bit (RGBA, with alpha channel) bitmap images which I'm using as essential information in my game. The slightest change in the RGBA values breaks everything, so I can't use lossy compression methods like S3TC.
Are there any feasible lossless compression algorithms I can use with OpenGL? I'm using fragment shaders and I want to use the glCompressedTexImage2D() method to define the texture. I haven't tried compressing the texture with OpenGL using the GL_COMPRESSED_RGBA parameter; is there any chance I can get lossless compression that way?

Texture compression, as opposed to regular image compression, is designed for one specific purpose: being a texture. And that means fast random access of data.
Lossless compression formats do not tend to do well when it comes to random access patterns. The major lossless compression formats are some form of RLE or table-based encoding. These are adequate for decompressing the entire dataset at once, but they make it very hard to determine which memory location holds the value for texel (U,V).
And that question gets asked a lot when accessing textures.
As such, there are no lossless hardware texture compression formats.
Your options are limited to the following:
Use texture memory as a kind of cache. That is, when you determine that you will need a particular image in this frame, decompress it. This could be done on the CPU or GPU (via compute shaders or the like). Note that for fast GPU decompression, you will have to come up with a compression scheme that takes advantage of parallel execution. Most lossless compression formats are not particularly parallel.
If a particular image has not been used in some time, you put it in a "subject to be reused" pile. And if you need to decompress a new image, you can take the least-recently-used image off of that pile, rather than constantly creating/destroying OpenGL texture objects.
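The "subject to be reused" pile described above is essentially a least-recently-used pool. A minimal sketch of the idea, with everything hypothetical: the `TextureCache` name, and the `unsigned` handles standing in for GL texture objects that real code would allocate with glGenTextures and fill by decompressing the image data:

```cpp
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical LRU pool: when full, the least-recently-used entry is
// evicted and its handle reused, instead of destroying and recreating
// OpenGL texture objects every time.
class TextureCache {
public:
    explicit TextureCache(size_t capacity) : capacity_(capacity) {}

    // Returns the handle for `name`, "decompressing" on a miss.
    unsigned acquire(const std::string& name) {
        auto it = index_.find(name);
        if (it != index_.end()) {                 // hit: mark most recent
            order_.splice(order_.begin(), order_, it->second);
            return it->second->second;
        }
        unsigned handle;
        if (order_.size() < capacity_) {
            handle = next_handle_++;              // would be glGenTextures
        } else {                                  // evict LRU, reuse handle
            auto& lru = order_.back();
            handle = lru.second;
            index_.erase(lru.first);
            order_.pop_back();
        }
        // ... decompress image data into `handle` here ...
        order_.emplace_front(name, handle);
        index_[name] = order_.begin();
        return handle;
    }

    size_t size() const { return order_.size(); }

private:
    size_t capacity_;
    unsigned next_handle_ = 1;
    std::list<std::pair<std::string, unsigned>> order_;   // front = most recent
    std::unordered_map<std::string,
        std::list<std::pair<std::string, unsigned>>::iterator> index_;
};
```

Acquiring an image already in the pool is a cheap lookup; acquiring a new image while the pool is full recycles the oldest texture object rather than creating a fresh one.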
Build your own lossless compression scheme, designed for your specific needs. If you absolutely need exact texel values from the texture, I assume that you aren't using linear filtering when accessing these textures. So these aren't really colors; they're arbitrary information about a texel.
I might suggest field compression (improved packing of your bits in the available space). But without knowing what your data actually is or means, I can't say whether your particular use case is amenable to it.
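"Field compression" here just means packing several small fields into fewer bits than a naive one-channel-per-field layout would use. A hypothetical sketch, assuming a texel carries a 5-bit material ID, a 3-bit rotation, and two 12-bit values (the layout and names are made up for illustration); the shader side would unpack with the same shifts and masks:

```cpp
#include <cstdint>

// Hypothetical layout: 5-bit material | 3-bit rotation | 12-bit x | 12-bit y,
// i.e. four fields in one 32-bit texel instead of four wider channels.
inline uint32_t pack(uint32_t mat, uint32_t rot, uint32_t x, uint32_t y) {
    return (mat & 0x1Fu) << 27 | (rot & 0x7u) << 24
         | (x & 0xFFFu) << 12 | (y & 0xFFFu);
}

inline uint32_t mat_of(uint32_t p) { return p >> 27; }
inline uint32_t rot_of(uint32_t p) { return (p >> 24) & 0x7u; }
inline uint32_t x_of(uint32_t p)   { return (p >> 12) & 0xFFFu; }
inline uint32_t y_of(uint32_t p)   { return p & 0xFFFu; }
```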

Related

Using OpenGL to perform video compositing with YUV color format

I have written a C/C++ implementation of what I term a "compositor" (I come from a video background) to composite/overlay video/graphics on the top of a video source. My current compositor implementation is rather naive and there is room for CPU optimization improvements (ex: SIMD, threading, etc).
I've created a high-level diagram of what I am currently doing:
The diagram is self-explanatory. Nonetheless, I'll elaborate on some of the constraints:
The main video always comes served in an 8-bit YUV 4:2:2 packed format
The secondary video (optional) will come served in either an 8-bit YUV 4:2:2 or YUVA 4:2:2:4 packed format.
The output from the overlay must come out in an 8-bit YUV 4:2:2 packed format
Some other bits of information:
The number of graphics inputs will vary; it may (or may not) be a constant value.
The colour format of the graphics can be pinned to either ARGB or YUVA format (i.e. I can provide it as you see fit). At the moment, I pin it to YUVA to keep a consistent colour format.
The potential of using OpenGL and accompanying shaders is rather appealing:
No need to reinvent the wheel (in terms of actually performing the composition)
The possibility of using GPU where available.
My concern with using OpenGL is performance. Looking around on the web, it is my understanding that a YUV surface would be converted to RGB internally; I would like to minimize the number of colour format conversions and ensure optimal performance. Without prior OpenGL experience, I hope someone can shed some light and suggest if I'm about to venture down the wrong path.
Perhaps my concern relating to performance is less of an issue when using a dedicated GPU? Do I need to consider separate code paths:
Hardware with GPU(s)
Hardware with only CPU(s)?
Additionally, am I going to struggle when I need to process 10-bit YUV?
You should be able to treat YUV as independent channels throughout. OpenGL shaders will be calling them r, g, and b, but it's just data that can be treated as whatever you want.
Most GPUs will support 10 bits per channel (+ 2 alpha bits). Many will support 16 bits per channel for all 4 channels, but I'm a little rusty here, so I have no idea how common support for this is. Not sure about the 4:2:2 data, but you can always treat it as 3 separate surfaces.
The number of graphics inputs will vary; it may (or may not) be a constant value.
This is something I'm a little less sure about. Shaders like this to be predictable. If your implementation allows you to add each input iteratively then you should be fine.
As an alternative suggestion, have you looked into OpenCL?

Compressed Textures in OpenGL

I have read that compressed textures are not readable and are not color-renderable.
Though I have some idea of why it's not allowed, can someone explain in a little more detail?
What exactly does "not readable" mean? I cannot read from them in a shader using, say, imageLoad? Or I can't even sample from them?
What does "not renderable" mean? Is it because the user is going to see garbage anyway, so it's not allowed?
I have not tried using compressed textures.
Compressed textures are "readable", by most useful definitions of that term. You can read from them via samplers. However, you can't use imageLoad operations on them. Why? Because reading such memory is not a simple memory fetch. It involves fetching lots of memory and doing a decompression operation.
Compressed images are not color-renderable, which means they cannot be attached to an FBO and used as a render target. One might think the reason for this was obvious, but in case you need it spelled out: writing to a compressed image requires doing image compression on the fly. And most texture compression formats (or compressed formats of any kind) are not designed to easily deal with changing a few values. Not to mention, most compressed texture formats are lossy, so every time you do a decompress/write/recompress operation, you lose image fidelity.
From the OpenGL Wiki:
Despite being color formats, compressed images are not color-renderable, for obvious reasons. Therefore, attaching a compressed image to a framebuffer object will cause that FBO to be incomplete and thus unusable. For similar reasons, no compressed formats can be used as the internal format of renderbuffers.
So "not color render-able" means that they can't be used in FBOs.
I'm not sure what "not readable" means; it may mean that you can't bind them to an FBO and read from the FBO (since you can't bind them to an FBO in the first place).

Full HD 2D Texture Memory OpenGL

I am in the process of writing a full HD capable 2D engine for a company of artists which will hopefully be cross platform and is written in OpenGL and C++.
The main problem I've been having is how to deal with all those HD sprites. The artists have drawn the graphics at 24 fps, and they are exported as PNG sequences. I have converted them into DDS (not ideal, because it needs the DirectX header to load) with DXT5, which reduces the file size a lot. Some scenes in the game can have 5 or 6 animated sprites at a time, and these can consist of 200+ frames each. Currently I am loading sprites into an array of pointers, but this is taking too long to load, even with compressed textures, and uses quite a bit of memory (approx. 500 MB for a full scene).
So my question is: do you have any ideas or tips on how to handle such high volumes of frames? There are a couple of ideas I've thought of:
Use the swf format for storing the frames from Flash
Implement a 2D skeletal animation system, replacing the PNG sequences (I have concerns about the joints being visible, though)
How do games like Castle Crashers load so quickly with great HD graphics?
Well, the first thing to bear in mind is that not all platforms support DXT5 (mobiles specifically).
Beyond that have you considered using something like zlib to compress the textures? The textures will likely have a fair degree of self similarity which will mean that they will compress down a lot. In this day and age decompression is cheap due to the speed of processors and the time saved getting the data off the disk can be far far more useful than the time lost to decompression.
I'd start there if I were you.
24 fps hand-drawn animations? Have you considered reducing the framerate? Even cinema-quality cel animation is rarely drawn at the full 24 fps. Even going down to 18 fps will get rid of 25% of your data.
In any case, you didn't specify where your load times were long. Is the load from harddisk to memory the problem, or is it the memory to texture load that's the issue? Are you frequently swapping sets of texture data into the GPU, or do you just build a bunch of textures out of it at load time?
If it's a disk load issue, then your only real choice is to compress the texture data on the disk and decompress it into memory. S3TC-style compression is not that compressed; it's designed to be a usable compression technique for texturing hardware. You can usually make it smaller by using a standard compression library on it, such as zlib, bzip2, or 7z. Of course, this means having to decompress it, but CPUs are getting faster than hard disks, so this is usually a win overall.
If the problem is in texture upload bandwidth, then there aren't very many solutions to that. Well, depending on your hardware of interest. If your hardware of interest supports OpenCL, then you can always transfer compressed data to the GPU, and then use an OpenCL program to decompress it on the fly directly into GPU memory. But requiring OpenCL support will impact the minimum level of hardware you can support.
Don't dismiss 2D skeletal animations so quickly. Games like Odin Sphere are able to achieve better animation of 2D skeletons by having several versions of each of the arm positions. The one that gets drawn is the one that matches up the closest to the part of the body it is attached to. They also use clever art to hide any defects, like flared clothing and so forth.

OpenGL texture compression

I'm a novice in OpenGL.
Is there ever a need to do texture compression at runtime?
Surely, the way it works is a big texture file is compressed at build time. At runtime, you expand portions of the compressed texture file, as needed, to apply to a surface.
Are there any (credible) circumstances where you have expanded texture data, and you need to compress it at runtime?
Thanks!
Are you talking about compressed image formats (like JPEG, or even a zip file containing an image) or compressed texture formats (like DXT1, etc.)? When you have a compressed texture (such as DXT) you don't have to decompress it at runtime; the graphics card can do it on the fly as it samples the texture.
For games, where you can precompile all your assets ahead of time, it's generally a good idea to apply something like DXT compression at (asset) build time so you get all the benefits of texture compression (faster load time, less memory bandwidth usage, etc) without the cost of actually performing the compression at runtime. That said, in any circumstance where you wanted to render with compressed textures, but you don't have access to images you'll be using ahead of time (maybe you let the user pick image files from their machine or something) you would have no choice but to do the compression at runtime.
EDIT:
The way you would do DXT compression at runtime would be to call glTexImage2D, specifying the actual format of the source image you have (GL_RGBA, etc.) for the 'format' parameter and a compressed format for the 'internal format' parameter, such as GL_COMPRESSED_RGBA_S3TC_DXT1_EXT for DXT1, assuming your card supports the GL_EXT_texture_compression_s3tc extension.
If you have pre-compressed texture data then you can load it directly with glCompressedTexImage2D.
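When loading pre-compressed data, glCompressedTexImage2D wants an explicit imageSize, and for the S3TC formats that follows directly from the block layout: every 4×4 texel block is 8 bytes for DXT1 and 16 bytes for DXT3/DXT5, with partial blocks rounding up. A sketch of the computation (the function name is made up):

```cpp
#include <cstddef>

// imageSize for one S3TC mip level: number of 4x4 blocks (rounded up in
// each dimension) times bytes per block (8 for DXT1, 16 for DXT3/DXT5).
size_t s3tcImageSize(size_t width, size_t height, size_t bytesPerBlock) {
    return ((width + 3) / 4) * ((height + 3) / 4) * bytesPerBlock;
}
```

You would then pass it along with the data, something like `glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_S3TC_DXT1_EXT, w, h, 0, s3tcImageSize(w, h, 8), data)`.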

I thought *.DDS files were meant to be quick to load?

OK, so I'm trying to weigh up the pros and cons of using various different texture compression techniques. I spend 99.999% of my time coding 2D sprite games for Windows machines using DirectX.
So far I have looked at texture packing (SpriteSheets) with alpha-trimming and that seems like a decent way to get a bit more performance. Now I am starting to look at the texture format that they are stored in; currently everything is stored as *.PNGs.
I have heard that *.DDS files are good, especially when used with DXT5 (/3/1 depending on the task) compression as the texture remains compressed in VRAM? Also people say that as they are already DirectDraw Surfaces they load in much, much quicker too.
So I created an application to test this out; I call the line below 20 times, releasing the texture between each call.
for (int i = 0; i < 20; i++)
{
    if (FAILED(D3DXCreateTextureFromFile(g_pd3dDevice, L"Test.dds", &g_pTexture)))
    {
        return E_FAIL;
    }
    g_pTexture->Release();
    g_pTexture = NULL;
}
Now if I try this with a DXT5 texture, it takes 5x longer to complete than with loading in a simple *.PNG. I've heard that if you don't generate Mipmaps it can go slower, so I double checked that. Then I changed the program that I was using to generate the *.DDS file, switching to NVIDIA's own nvcompress.exe, but none of it had any effect.
EDIT: I forgot to mention that the files (both *.png and *.dds) are both the same image, just saved in different formats. (Same size, amount of alpha, everything!)
EDIT 2: When using the following parameters it loads in almost 2.5x faster AND consumes a LOT less VRAM!
D3DXCreateTextureFromFileEx( g_pd3dDevice, L"Test.dds",
    D3DX_DEFAULT_NONPOW2, D3DX_DEFAULT_NONPOW2, D3DX_FROM_FILE,
    0, D3DFMT_FROM_FILE, D3DPOOL_MANAGED,
    D3DX_FILTER_NONE, D3DX_FILTER_NONE, 0, NULL, NULL, &g_pTexture )
However, I'm now losing all my transparency in the texture, I've looked at the DXT5 texture and it looks fine in Paint.NET and DirectX DDS Viewer. However when loaded in all the transparency turns to solid black. ColorKey issue?
EDIT 3: Ignore that last bit, I was being idiotic and in my "quick example" haste I'd forgotten to enable Alpha-Blending on the D3DXSprite->Begin(). Doh!
You need to distinguish between the format that your files are stored in on disk and the format that the textures ultimately use in video memory. DXT compressed textures offer a good balance between memory usage and quality in video memory but other compression techniques like PNG or Jpeg compression generally result in smaller files and/or better quality on disk.
DDS files have the advantage that they support DXT formats directly and are laid out on disk in the same way that DirectX expects the data to be laid out in memory so there is minimal CPU time required after they are loaded to convert them into a format the hardware can use. They also support pre-generated mipmap chains which formats like PNG do not support. Compressing an image to DXT formats is a fairly time consuming process so you generally want to avoid doing it on load if possible.
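The "laid out on disk the way the API expects" point is concrete in the DDS layout: a 4-byte "DDS " magic, then a 124-byte header whose pixel-format fourCC ("DXT1", "DXT5", ...) identifies the payload, which can be handed to the API unconverted. A minimal probe of those fields, assuming a little-endian host (the struct and function names are made up):

```cpp
#include <cstdint>
#include <cstring>

// Minimal DDS probe: magic "DDS " + 124-byte DDS_HEADER. Per the
// documented layout, dwHeight is at file offset 12, dwWidth at 16, and
// the pixel-format dwFourCC at 84. Assumes a little-endian host.
struct DdsInfo {
    uint32_t width = 0, height = 0;
    char fourCC[5] = {0};
    bool valid = false;
};

DdsInfo probeDds(const uint8_t* data, size_t size) {
    DdsInfo info;
    if (size < 128 || std::memcmp(data, "DDS ", 4) != 0)
        return info;                               // too small or bad magic
    std::memcpy(&info.height, data + 12, 4);
    std::memcpy(&info.width, data + 16, 4);
    std::memcpy(info.fourCC, data + 84, 4);        // e.g. "DXT5"
    info.valid = true;
    return info;
}
```

A real loader would also check the header's flag bits and mipmap count, but the point stands: nothing here needs re-encoding, just a straight copy into the texture.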
A DDS file with pre-generated mipmaps that is the same size as and uses the same format as the video memory texture you plan to create from it will use the least CPU time of any standard format. You need to make sure you tell D3DX not to perform any scaling, filtering, format conversion or mipmap generation to guarantee that though. D3DXCreateTextureFromFileEx allows you to specify flags that prevent any internal conversions happening (D3DX_DEFAULT_NONPOW2 for image width and height if your hardware supports non power of two textures, D3DFMT_FROM_FILE to prevent mipmap generation or format conversion, D3DX_FILTER_NONE to prevent any filtering or scaling).
CPU time is only half the story though. These days CPUs are pretty fast and hard drives are relatively slow so sometimes your total load time can be shorter if you load a smaller compressed file format like PNG or JPG and then do lots of CPU work to convert it than if you load a larger file like a DDS and just do a memcpy into video memory. A common approach that gives good results is to zip DDS files and decompress them for fast loading from disk and minimal CPU cost for format conversion.
Compression formats like PNG and JPG will compress some images more effectively than others. DDS is a fixed compression ratio - a given image resolution and format will always compress to the same size (this is why it is more suitable for decompression in hardware). If you're using simple non-representative images for testing (e.g. a uniform colour or simple pattern) then your PNG file is likely to be very small and so will load from disk faster than a typical game image would.
Compare loading a standard PNG and then compressing it to the time it takes to load a DDS file.
Still, I can't see why a PNG would load any faster than the same texture DXT5-compressed. For one, the DXT5 file will be a fair bit smaller, so it should load from disk faster! Is this DXT5 texture the same as the PNG texture, i.e. are they the same image at the same dimensions?
Have you tried playing with D3DXCreateTextureFromFileEx? You have far more control over what is going on. It may help you out.