Here's what I want to do: I want to load a plain image file (.png, .tga, .bmp, etc), upload this image to OpenGL as a texture, tell OpenGL to generate mipmaps for the texture, tell OpenGL to compress the image (with S3TC/RGTC), then download the entire compressed/mipmapped texture, save it into a file, and later be able to load the entire texture into OpenGL at once.
I've already managed the first 3 steps. I use SDL2_Image to handle image loading, I can upload said image via glTexImage2D(), and I can create mipmaps using glGenerateMipmap(). From there, I'm pretty much lost, but I can figure out how to compress the images without much trouble.
What I need help with is the final bit - downloading the entire compressed+mipmapped texture as a single, contiguous block of data, saving it to file (at the content authoring stage), and later uploading the whole thing at once (at runtime). Any advice for where I can start?
PS. I'm using OpenGL 3.3 as my minimum version.
You can compress to S3TC or DXT using rygDXT real time compressor, because there is no DXT compression embedded in the driver. Most DXT compressors are offline (not real time) and needs seconds or minutes to run.
You also have nvdxt, nvidia SDK provides full code sample to compress textures in DXT but its slow I warn you. ryg DXT should be faster.
Then for upward copy you should be able to copy the texture into a dynamic ("staging" in DX terms) buffer and then map the dynamic buffer to memory using some lock (glMapBuffer?) then memcopy.
Related
I am in the process of writing a full HD capable 2D engine for a company of artists which will hopefully be cross platform and is written in OpenGL and C++.
The main problem i've been having is how to deal with all those HD sprites. The artists have drawn the graphics at 24fps and they are exported as png sequences. I have converted them into DDS (not ideal, because it needs the directx header to load) DXT5 which reduces filesize alot. Some scenes in the game can have 5 or 6 animated sprites at a time, and these can consist of 200+ frames each. Currently I am loading sprites into an array of pointers, but this is taking too long to load, even with compressed textures, and uses quite a bit of memory (approx 500mb for a full scene).
So my question is do you have any ideas or tips on how to handle such high volumes of frames? There are a couple of ideas i've thought've of:
Use the swf format for storing the frames from Flash
Implement a 2D skeletal animation system, replacing the png sequences (I have concerns about the joints being visible tho)
How do games like Castle Crashers load so quickly with great HD graphics?
Well the first thing to bear in mind is that not all platforms support DXT5 (mobiles specifically).
Beyond that have you considered using something like zlib to compress the textures? The textures will likely have a fair degree of self similarity which will mean that they will compress down a lot. In this day and age decompression is cheap due to the speed of processors and the time saved getting the data off the disk can be far far more useful than the time lost to decompression.
I'd start there if i were you.
24 fps hand-drawn animations? Have you considered reducing the framerate? Even cinema-quality cel animation is only rarely drawn at the full 24-fps. Even going down to 18 fps will get rid of 25% of your data.
In any case, you didn't specify where your load times were long. Is the load from harddisk to memory the problem, or is it the memory to texture load that's the issue? Are you frequently swapping sets of texture data into the GPU, or do you just build a bunch of textures out of it at load time?
If it's a disk load issue, then your only real choice is to compress the texture data on the disk and decompress it into memory. S3TC-style compression is not that compressed; it's designed to be a useable compression technique for texturing hardware. You can usually make it smaller by using a standard compression library on it, such as zlib, bzip2, or 7z. Of course, this means having to decompress it, but CPUs are getting faster than harddisks, so this is usually a win overall.
If the problem is in texture upload bandwidth, then there aren't very many solutions to that. Well, depending on your hardware of interest. If your hardware of interest supports OpenCL, then you can always transfer compressed data to the GPU, and then use an OpenCL program to decompress it on the fly directly into GPU memory. But requiring OpenCL support will impact the minimum level of hardware you can support.
Don't dismiss 2D skeletal animations so quickly. Games like Odin Sphere are able to achieve better animation of 2D skeletons by having several versions of each of the arm positions. The one that gets drawn is the one that matches up the closest to the part of the body it is attached to. They also use clever art to hide any defects, like flared clothing and so forth.
Is it possible to allocate some memory on the GPU without cuda?
i'm adding some more details...
i need to get the video frame decoded from VLC and have some compositing functions on the video; I'm doing so using the new SDL rendering capabilities.
All works fine until i have to send the decoded data to the sdl texture... that part of code is handled by standard malloc which is slow for video operations.
Right now i'm not even sure that using gpu video will actually help me
Let's be clear: are you are trying to accomplish real time video processing? Since your latest update changed the problem considerably, I'm adding another answer.
The "slowness" you are experiencing could be due to several reasons. In order get the "real-time" effect (in the perceptual sense), you must be able to process the frame and display it withing 33ms (approximately, for a 30fps video). This means you must decode the frame, run the compositing functions (as you call) on it, and display it on the screen within this time frame.
If the compositing functions are too CPU intensive, then you might consider writing a GPU program to speed up this task. But the first thing you should do is determine where the bottleneck of your application is exactly. You could strip your application momentarily to let it decode the frames and display them on the screen (do not execute the compositing functions), just to see how it goes. If its slow, then the decoding process could be using too much CPU/RAM resources (maybe a bug on your side?).
I have used FFMPEG and SDL for a similar project once and I was very happy with the result. This tutorial shows to do a basic video player using both libraries. Basically, it opens a video file, decodes the frames and renders them on a surface for displaying.
You can do this via Direct3D 11 Compute Shaders or OpenCL. These are similar in spirit to CUDA.
Yes, it is. You can allocate memory in the GPU through OpenGL textures.
Only indirectly through a graphics framework.
You can use OpenGL which is supported by virtually every computer.
You could use a vertex buffer to store your data. Vertex buffers are usually used to store points for rendering, but you can easily use it to store an array of any kind. Unlike textures, their capacity is only limited by the amount of graphics memory available.
http://www.songho.ca/opengl/gl_vbo.html has a good tutorial on how to read and write data to vertex buffers, you can ignore everything about drawing the vertex buffer.
I'm a novice in Open GL.
Is there ever a need to do texture compression at runtime?
Surely, the way it works is a big texture file is compressed at build time. At runtime, you expand portions of the compressed texture file, as needed, to apply to a surface.
Are there any (credible) circumstances where you have expanded texture data, and you need to compress it at runtime?
Thanks!
Are you talking about compressed image formats (like JPEG or even a zip file containing an image) or compressed texture formats (like DXT1, etc)? When you have a compressed texture (such as DXT) you don't have to decompress it at runtime, the graphics card can do it on the fly as it samples the texture.
For games, where you can precompile all your assets ahead of time, it's generally a good idea to apply something like DXT compression at (asset) build time so you get all the benefits of texture compression (faster load time, less memory bandwidth usage, etc) without the cost of actually performing the compression at runtime. That said, in any circumstance where you wanted to render with compressed textures, but you don't have access to images you'll be using ahead of time (maybe you let the user pick image files from their machine or something) you would have no choice but to do the compression at runtime.
EDIT:
The way you would do DXT compression at runtime would be to call glTexImage2D, specifying the actual format of the source image you have (GL_RGBA, etc) for the 'format' parameter and a compressed format for the 'internal format' parameter, such as GL_COMPRESSED_RGBA_S3TC_DXT1_EXT for DXT1, assuming your card supports the gl_ext_texture_compression_s3tc extension.
If you have pre-compressed texture data then you can load it directly with glCompressedTexImage2D.
I need to load PNGs and JPGs to textures. I also need to save textures to PNGs. When an image exceeds GL_MAX_TEXTURE_SIZE I need to split the image into separate textures.
I want to do this with C++.
What could I do?
Thank you.
I need to load PNGs and JPGs to textures
SDL_Image
Qt 4
or use libpng and libjpeg directly (you don't really want to do that, though).
When an image exceeds GL_MAX_TEXTURE_SIZE I need to split the image into separate textures.
You'll have to code it yourself. It isn't difficult.
DevIL can load and save many image formats including PNG and JPEG. It comes with helper functions that upload these images to OpenGL textures (ilutGLBindTexImage, ilutGLLoadImage) and functions to copy only parts of an image to a new image (ilCopyPixels, can be used to split large textures).
For the loading part SOIL looks rather self-contained.
Ok, so I'm trying to weigh up the pro's and con's of using various different texture compression techniques. I spend 99.999% of my time coding 2D sprite games for Windows machines using DirectX.
So far I have looked at texture packing (SpriteSheets) with alpha-trimming and that seems like a decent way to get a bit more performance. Now I am starting to look at the texture format that they are stored in; currently everything is stored as *.PNGs.
I have heard that *.DDS files are good, especially when used with DXT5 (/3/1 depending on the task) compression as the texture remains compressed in VRAM? Also people say that as they are already DirectDraw Surfaces they load in much, much quicker too.
So I created an application to test this out; I call the line below 20 times, releasing the texture between each call.
for (int i = 0; i < 20; i++)
{
if( FAILED( D3DXCreateTextureFromFile( g_pd3dDevice, L"Test.dds", &g_pTexture ) ) )
{
return E_FAIL;
}
g_pTexture->Release();
g_pTexture = NULL;
}
Now if I try this with a DXT5 texture, it takes 5x longer to complete than with loading in a simple *.PNG. I've heard that if you don't generate Mipmaps it can go slower, so I double checked that. Then I changed the program that I was using to generate the *.DDS file, switching to NVIDIA's own nvcompress.exe, but none of it had any effect.
EDIT: I forgot to mention that the files (both *.png and *.dds) are both the same image, just saved in different formats. (Same size, amount of alpha, everything!)
EDIT 2: When using the following parameters it loads in almost 2.5x faster AND consumes a LOT less VRAM!
D3DXCreateTextureFromFileEx( g_pd3dDevice, L"Test.dds", D3DX_DEFAULT_NONPOW2, D3DX_DEFAULT_NONPOW2, D3DX_FROM_FILE, 0, D3DFMT_FROM_FILE, D3DPOOL_MANAGED, D3DX_FILTER_NONE, D3DX_FILTER_NONE, 0, NULL, NULL, &g_pTexture )
However, I'm now losing all my transparency in the texture, I've looked at the DXT5 texture and it looks fine in Paint.NET and DirectX DDS Viewer. However when loaded in all the transparency turns to solid black. ColorKey issue?
EDIT 3: Ignore that last bit, I was being idiotic and in my "quick example" haste I'd forgotten to enable Alpha-Blending on the D3DXSprite->Begin(). Doh!
You need to distinguish between the format that your files are stored in on disk and the format that the textures ultimately use in video memory. DXT compressed textures offer a good balance between memory usage and quality in video memory but other compression techniques like PNG or Jpeg compression generally result in smaller files and/or better quality on disk.
DDS files have the advantage that they support DXT formats directly and are laid out on disk in the same way that DirectX expects the data to be laid out in memory so there is minimal CPU time required after they are loaded to convert them into a format the hardware can use. They also support pre-generated mipmap chains which formats like PNG do not support. Compressing an image to DXT formats is a fairly time consuming process so you generally want to avoid doing it on load if possible.
A DDS file with pre-generated mipmaps that is the same size as and uses the same format as the video memory texture you plan to create from it will use the least CPU time of any standard format. You need to make sure you tell D3DX not to perform any scaling, filtering, format conversion or mipmap generation to guarantee that though. D3DXCreateTextureFromFileEx allows you to specify flags that prevent any internal conversions happening (D3DX_DEFAULT_NONPOW2 for image width and height if your hardware supports non power of two textures, D3DFMT_FROM_FILE to prevent mipmap generation or format conversion, D3DX_FILTER_NONE to prevent any filtering or scaling).
CPU time is only half the story though. These days CPUs are pretty fast and hard drives are relatively slow so sometimes your total load time can be shorter if you load a smaller compressed file format like PNG or JPG and then do lots of CPU work to convert it than if you load a larger file like a DDS and just do a memcpy into video memory. A common approach that gives good results is to zip DDS files and decompress them for fast loading from disk and minimal CPU cost for format conversion.
Compression formats like PNG and JPG will compress some images more effectively than others. DDS is a fixed compression ratio - a given image resolution and format will always compress to the same size (this is why it is more suitable for decompression in hardware). If you're using simple non-representative images for testing (e.g. a uniform colour or simple pattern) then your PNG file is likely to be very small and so will load from disk faster than a typical game image would.
Compare loading a standard PNG and then compressing it to the time it takes to load a DDS file.
Still I can't see why a PNG would load any faster than the same texture DXT5 compressed. For one it will be a fair bit smaller so it should load form disk faster! Is this DXt5 texture the same as the PNG texture? ie are they the same size?
Have you tried playing with D3DXCreateTextureFromFileEx? You have far more control over what is going on. It may help you out.