I have an object type which renders (correctly) a texture onto a 2D mesh depending on rotation (simulating 3D). However, it is quite slow to load/bind a new texture image for each view; disabling the view-dependent texture loading results in very quick performance.
Buffering all views/textures of the object may not be a good option: it could contain on the order of 720 views (separate images), each of which may be 600x1000 pixels. There are no guarantees about end-user system specs either, and this is a peripheral application.
Are there any good intermediate OpenGL suggestions between loading textures on demand and buffering all view textures at once?
This is where a texture cache would be useful: load the lowest-resolution MIP levels of all 720 images. These would be your 1x1, 2x2, and so on, resolution images.
As you detect changes in the view, you update the texture cache, prioritizing the most recently used textures so that the one currently in view has the highest priority and the ones that haven't been used for a long time have the lowest priority.
As textures increase in priority, you bring in their higher-detail MIP levels and rebind the textures when they finish loading. The texture cache loads them asynchronously in a separate thread and then notifies your main thread when they can be prepared, as that needs to happen in the same thread as the GL context.
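As a rough illustration of the idea, a minimal sketch might look like the following. The names (ViewTextureCache, TextureEntry, finestLoaded) are made up for the example; it assumes a loader thread has already decoded the requested mip level into an RGBA buffer and that this call happens on the thread that owns the GL context.

```cpp
// Minimal sketch of a priority-driven texture cache (names are illustrative).
#include <GL/glew.h>   // or any loader/header exposing GL 1.2+ enums
#include <vector>
#include <cstdint>

struct TextureEntry {
    GLuint id = 0;            // GL texture object
    int    finestLoaded = -1; // -1 = nothing uploaded yet; 0 = full resolution
    float  priority = 0.0f;   // raised when this view is (nearly) current
};

class ViewTextureCache {
public:
    // Called on the main (GL-context) thread when a mip level has been decoded.
    void uploadMipLevel(TextureEntry& e, int level, int w, int h,
                        const std::vector<uint8_t>& rgba) {
        if (e.id == 0) glGenTextures(1, &e.id);
        glBindTexture(GL_TEXTURE_2D, e.id);
        glTexImage2D(GL_TEXTURE_2D, level, GL_RGBA, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, rgba.data());
        // Restrict sampling to the levels that are actually resident so the
        // texture is usable immediately and sharpens as finer levels arrive.
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_BASE_LEVEL, level);
        e.finestLoaded = level;
    }
};
```

Clamping GL_TEXTURE_BASE_LEVEL to the finest resident level is what lets a view be drawn blurry at first and then refine without ever binding an incomplete texture.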
There are other ways of doing this with newer extensions such as AMD's Partially Resident Textures, but that extension has some limitations that make it a bit cumbersome to use.
If the rotation is smooth and slow, you can stream the data from disk depending on the view and prefetch data for the surrounding views.
If you can afford lossy compression, you can keep a lot of data in RAM with aggressive compression and then move some of it to VRAM (with DXT/BC compression if possible).
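For the VRAM side, uploading data that is already in DXT/BC layout is a single call. A minimal sketch, assuming DXT5/BC3 data and that the GL_EXT_texture_compression_s3tc extension is available on the target hardware:

```cpp
// Sketch: upload a pre-compressed DXT5/BC3 image directly to the GPU.
#include <GL/glew.h>   // or any loader exposing glCompressedTexImage2D

GLuint uploadDxt5(const unsigned char* blocks, int width, int height)
{
    // DXT5 stores 4x4 texel blocks at 16 bytes per block.
    GLsizei size = ((width + 3) / 4) * ((height + 3) / 4) * 16;

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                           width, height, 0, size, blocks);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}
```

Because the data stays block-compressed on the GPU, both the upload and the VRAM footprint are roughly a quarter of the uncompressed RGBA equivalent.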
You should check these articles:
J.M.P. van Waveren. Real-Time Texture Streaming & Decompression. Intel Software Network, 2006.
J.M.P. van Waveren. Geospatial Texture Streaming from Slow Storage Devices. Intel Software Network, 2008.
J.M.P. van Waveren. id Tech 5 Challenges: From Texture Virtualization to Massive Parallelization. SIGGRAPH Talk, 2009.
A Maya promo video explains how the GPU Cache makes the application run faster for the user. In frameworks like Cinder we redraw all the geometry we want in the scene on each frame update, sending it to the video card. So I wonder: what is behind GPU caching from a programmer's perspective? What OpenGL/DirectX APIs are behind such technology? How do I "cache" my mesh in GPU memory?
There is, to my knowledge, no way in OpenGL or DirectX to directly specify what is and is not to be stored and tracked in the GPU cache. There are, however, methodologies that should be followed in order to make the best use of the cache. Some of these include:
Batch, batch, batch.
Upload data directly to the GPU.
Order indices to maximize vertex locality across the mesh.
Keep state changes to a minimum.
Keep shader changes to a minimum.
Keep texture changes to a minimum.
Use maximum texture compression whenever possible.
Use mipmapping whenever possible (to maximize texel sampling locality).
It is also important to keep in mind that there is no single GPU cache. There are multiple (vertex, texture, etc.) independent caches.
Sources:
OpenGL SuperBible - Memory Bandwidth and Vertices
GPU Gems - Graphics Pipeline Performance
GDC 2012 - Optimizing DirectX Graphics
First off, the "GPU cache" terminology that Maya uses probably refers to graphics data that is simply stored on the card refers to optimizing a mesh for device-independent storage and rendering in Maya . For card manufacturer's the notion of a "GPU cache" is different (in this case it means something more like the L1 or L2 CPU caches).
To answer your final question: using OpenGL terminology, you generally create vertex buffer objects (VBOs). These store the data on the card. Then, when you want to draw, you simply instruct the card to use those buffers.
This will avoid the overhead of copying the mesh data from main (CPU) memory into graphics (GPU) memory. If you need to draw the mesh many times without changing the mesh data, it performs much better.
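A minimal sketch of that idea is below; the vertex layout (three floats per position, attribute index 0) and the use of GLEW as the extension loader are just assumptions for the example.

```cpp
// Sketch: keep the mesh resident on the GPU in a VBO and reuse it every frame.
#include <GL/glew.h>   // or any other loader that exposes the buffer API
#include <vector>

GLuint createMeshVbo(const std::vector<float>& positions)  // xyz triples
{
    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    // GL_STATIC_DRAW hints that the data is uploaded once and drawn many times.
    glBufferData(GL_ARRAY_BUFFER,
                 positions.size() * sizeof(float),
                 positions.data(), GL_STATIC_DRAW);
    return vbo;
}

void drawMesh(GLuint vbo, GLsizei vertexCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
    glDrawArrays(GL_TRIANGLES, 0, vertexCount);  // no per-frame copy from CPU memory
}
```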
I have started to study OpenGL and have learned how to display figures and such using vertex and fragment shaders.
I created a (very) stylized man, and now I want to make it move its arms and legs.
The question is: should I change vertex data in the VBO, directly in a timer function called in the main (as I did), or is it a job that should be done in the vertex shader, without touching vertex data?
I suppose the answer is the first one, but I feel like it overloads the CPU instead of making the GPU work.
Both ways will work fine: if your aim is to utilise the GPU more, then do the transformations in vertex shaders; otherwise you can use the CPU. Bear in mind, however, that if you are checking for collisions you need the data present on the CPU side.
In essence:
Doing the manipulation on the GPU means you only need to send the mesh data once; after that you just send the matrix transformations to deform or animate it.
This is ideal as it greatly reduces the bandwidth of data transmission between CPU and GPU.
It also means you can upload just one copy of the mesh to the GPU and apply transforms for many different instances of it to achieve varied but similar models (e.g. send a bear mesh to the GPU and make instances at scale 2, scale 1 and scale 0.5 for Daddy Bear, Mummy Bear and Baby Bear, then send a Goldilocks mesh; now you have 2 meshes in memory producing 4 distinct models). A short sketch of this follows below.
The transformed meshes however are not immediately available on the CPU side, so mesh-perfect Collision Detection will be more intensive.
Animating on the CPU means you have access to the transformed mesh, with the major caveat that you must upload the whole mesh to the GPU each frame and for each instance: more work, more data and more memory used up on both the CPU and GPU sides.
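To make the bear example concrete, here is a rough sketch of drawing one uploaded mesh three times with different model matrices. The uModel uniform name, the GLM maths library and the simple position-only layout are assumptions, not anything from your code.

```cpp
// Sketch: one mesh uploaded once, drawn three times with different model
// matrices (the uModel uniform name is illustrative).
#include <GL/glew.h>
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>

void drawBears(GLuint program, GLuint bearVbo, GLsizei vertexCount)
{
    const float scales[3] = { 2.0f, 1.0f, 0.5f };   // Daddy, Mummy, Baby
    GLint loc = glGetUniformLocation(program, "uModel");

    glUseProgram(program);
    glBindBuffer(GL_ARRAY_BUFFER, bearVbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);

    for (float s : scales) {
        glm::mat4 model = glm::scale(glm::mat4(1.0f), glm::vec3(s));
        glUniformMatrix4fv(loc, 1, GL_FALSE, glm::value_ptr(model));
        glDrawArrays(GL_TRIANGLES, 0, vertexCount);  // same vertex data each time
    }
}
```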
CPU Advantages
Current accurate meshes are available for whatever purpose you require at all times.
CPU Disadvantages
Massive data transfer between CPU and GPU: one transfer per instance per frame per model.
GPU Advantages:
Upload less mesh data (once if you only need variations of the one model)
Transform individual instances in a fast parallelised way
CPU->GPU bandwidth minimised: you only need to send transformations.
GPU is parallelised and can handle mesh data far more efficiently than a CPU
GPU Disadvantages
Meshes are not readily available for mesh-perfect collision detection
Mitigations to offset the overheads of transferring GPU data back to CPU:
Utilise bounding boxes (axis-aligned or not, as per your preference): this allows a small dataset to represent the model on the CPU side (8 points per box, as opposed to millions of points per mesh). If the bounding boxes collide, then transfer the mesh from GPU to CPU and do the refined calculation to get exact mesh-to-mesh collision detection. This gives the best of both worlds for the least overhead; a minimal sketch of the coarse test follows below.
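Here is that coarse test in plain C++ (nothing OpenGL-specific); only when it passes would you pull the transformed meshes back for the expensive per-triangle check.

```cpp
// Sketch: coarse axis-aligned bounding-box overlap test done entirely on the CPU.
struct Aabb {
    float minX, minY, minZ;
    float maxX, maxY, maxZ;
};

bool aabbOverlap(const Aabb& a, const Aabb& b)
{
    // Boxes overlap only if they overlap on every axis.
    return a.minX <= b.maxX && a.maxX >= b.minX &&
           a.minY <= b.maxY && a.maxY >= b.minY &&
           a.minZ <= b.maxZ && a.maxZ >= b.minZ;
}
```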
As the performance of the GPU can be tens, hundreds or even thousands of times higher than that of a CPU at processing meshes, it quickly becomes evident why, for performant code, as much work as possible in this area is farmed out to the GPU.
Hope this helps:)
Depending on the platform and OpenGL version, you can do the animations by changing the data in the vertex buffer directly (software animation) or by associating groups of vertices with corresponding animation matrices (hardware animation).
If you choose the second approach (recommended where possible), you can send one or more of these matrices to the vertex shader as uniforms, possibly associating a "weight" factor with each matrix.
Please keep in mind that software animation will overload the CPU when you have a very high number of vertices, while hardware animation is almost free: you just multiply the vertex by the correct matrix, along with the model-view-projection one, in the shader. Also, GPUs are highly optimized for math operations and are very fast compared to CPUs.
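As an illustration of that second approach, a vertex shader with a small matrix palette and per-vertex weights might look roughly like this (GLSL embedded as a C++ string; the attribute and uniform names such as aWeights and uBones are made up for the sketch, and the bone indices are an extra detail most skinning setups need).

```cpp
// Sketch: hardware skinning with a small matrix palette.
const char* kSkinningVertexShader = R"(
#version 120
attribute vec3 aPosition;
attribute vec4 aWeights;        // blend weights for up to 4 bones
attribute vec4 aBoneIndices;    // which palette entries this vertex uses

uniform mat4 uBones[32];        // animation matrices uploaded each frame
uniform mat4 uModelViewProjection;

void main()
{
    // Blend the animation matrices by their weights...
    mat4 skin =
        aWeights.x * uBones[int(aBoneIndices.x)] +
        aWeights.y * uBones[int(aBoneIndices.y)] +
        aWeights.z * uBones[int(aBoneIndices.z)] +
        aWeights.w * uBones[int(aBoneIndices.w)];

    // ...then apply the usual model-view-projection transform.
    gl_Position = uModelViewProjection * skin * vec4(aPosition, 1.0);
}
)";
```

Each frame you only upload the handful of matrices in uBones (e.g. with glUniformMatrix4fv); the mesh itself stays untouched on the GPU.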
It is generally advised not to use vector graphics in mobile games, or to pre-rasterize them, for performance. Why is that? I thought that OpenGL was at least as good at drawing lines/triangles as it is at rendering images on screen...
Rasterizing them caches them as images, so there is less overhead compared with calculating every coordinate of the vector and drawing it (more draw cycles and more CPU usage). Drawing a vector is exactly that: you are drawing arcs from point to point on every single call, versus displaying a cached image file at a certain coordinate.
Although using impostors is a great optimization trick, depending on the impostor's shape, how much overdraw is involved and whether you need blending in the process, the trick can leave you fill-rate bound. Also, in some scenarios where shapes may change, caching the graphics into impostors may not be feasible or may incur other overheads. It is a matter of balancing your rendering pipeline.
The answer depends on the hardware. Are you using a GPU or not?
Today, modern mobile devices with Android and iOS have a GPU embedded in the chipset.
These GPUs are very good with vector graphics. To prove the point, most GPUs have a dedicated geometry processor in addition to one or more pixel processors (for example, the Mali-400 GPU).
For example, let's say you want to draw 200 transparent circles of different colors.
If you do it with modern OpenGL, you only need one set of geometry (a list of triangles forming a circle) and a list of parameters for each circle, say position and color. If you provide this information to the GPU, it will draw them in parallel very quickly.
If you do it using different textures for each color, your program will be very heavy (in storage size) and will probably be slower due to memory-bandwidth problems.
It depends on what you want to do and on the hardware. If your hardware doesn't have a GPU, you should probably pre-render your graphics.
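As a rough sketch of the 200-circle example above using instanced rendering (OpenGL 3.3+ or GLES 3.0+; the buffer contents and attribute indices here are assumptions):

```cpp
// Sketch: draw 200 circles from one shared fan of triangles using instancing.
#include <GL/glew.h>

void drawCircles(GLuint circleVbo, GLsizei circleVertexCount,
                 GLuint instanceVbo)   // 200 x (vec2 position, vec4 color)
{
    // Attribute 0: shared circle geometry.
    glBindBuffer(GL_ARRAY_BUFFER, circleVbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, 0, nullptr);

    // Attributes 1 and 2: per-instance position and color, advancing once per instance.
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 6 * sizeof(float), nullptr);
    glVertexAttribDivisor(1, 1);
    glEnableVertexAttribArray(2);
    glVertexAttribPointer(2, 4, GL_FLOAT, GL_FALSE, 6 * sizeof(float),
                          (const void*)(2 * sizeof(float)));
    glVertexAttribDivisor(2, 1);

    glDrawArraysInstanced(GL_TRIANGLE_FAN, 0, circleVertexCount, 200);
}
```

A matching vertex shader would read the per-instance position and color from attributes 1 and 2 and offset the shared circle geometry accordingly.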
I am in the process of writing a full-HD-capable 2D engine for a company of artists; it will hopefully be cross-platform and is written in OpenGL and C++.
The main problem I've been having is how to deal with all those HD sprites. The artists have drawn the graphics at 24 fps and they are exported as PNG sequences. I have converted them into DDS (not ideal, because it needs the DirectX header to load) with DXT5, which reduces file size a lot. Some scenes in the game can have 5 or 6 animated sprites at a time, and these can consist of 200+ frames each. Currently I am loading sprites into an array of pointers, but this is taking too long to load, even with compressed textures, and it uses quite a bit of memory (approx. 500 MB for a full scene).
So my question is: do you have any ideas or tips on how to handle such high volumes of frames? There are a couple of ideas I've thought of:
Use the swf format for storing the frames from Flash
Implement a 2D skeletal animation system, replacing the PNG sequences (I have concerns about the joints being visible, though)
How do games like Castle Crashers load so quickly with great HD graphics?
Well, the first thing to bear in mind is that not all platforms support DXT5 (mobiles specifically).
Beyond that, have you considered using something like zlib to compress the textures? The textures will likely have a fair degree of self-similarity, which means they will compress down a lot. In this day and age decompression is cheap thanks to the speed of processors, and the time saved getting the data off the disk can be far more useful than the time lost to decompression.
I'd start there if I were you.
24 fps hand-drawn animations? Have you considered reducing the framerate? Even cinema-quality cel animation is only rarely drawn at a full 24 fps. Even going down to 18 fps will get rid of 25% of your data.
In any case, you didn't specify where your load times are long. Is the load from hard disk to memory the problem, or is it the memory-to-texture upload that's the issue? Are you frequently swapping sets of texture data into the GPU, or do you just build a bunch of textures out of it at load time?
If it's a disk-load issue, then your only real choice is to compress the texture data on disk and decompress it into memory. S3TC-style compression is not that compact; it's designed to be a usable compression format for texturing hardware. You can usually make it smaller by running a standard compression library over it, such as zlib, bzip2, or 7z. Of course, this means having to decompress it, but CPUs are getting faster than hard disks, so this is usually a win overall.
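A rough sketch of that path, assuming the DXT5 block data was compressed with zlib at export time and the uncompressed size was stored alongside it (the function name and parameters are illustrative):

```cpp
// Sketch: inflate zlib-compressed DXT5 data loaded from disk, then hand the
// still-DXT5-compressed blocks straight to the GPU.
#include <GL/glew.h>   // or any loader exposing glCompressedTexImage2D
#include <zlib.h>
#include <vector>

GLuint loadZlibDxt5(const std::vector<unsigned char>& fileData,
                    uLong uncompressedSize, int width, int height)
{
    std::vector<unsigned char> dxtBlocks(uncompressedSize);
    uLongf destLen = uncompressedSize;
    // zlib decompression on the CPU; cheap relative to the disk time saved.
    if (uncompress(dxtBlocks.data(), &destLen,
                   fileData.data(), fileData.size()) != Z_OK)
        return 0;

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    // Upload stays in S3TC form, so the GPU-side memory cost is unchanged.
    glCompressedTexImage2D(GL_TEXTURE_2D, 0,
                           GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                           width, height, 0,
                           (GLsizei)destLen, dxtBlocks.data());
    return tex;
}
```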
If the problem is texture-upload bandwidth, then there aren't very many solutions, and it depends on your hardware of interest. If your hardware supports OpenCL, you can always transfer compressed data to the GPU and then use an OpenCL program to decompress it on the fly directly into GPU memory. But requiring OpenCL support will raise the minimum level of hardware you can target.
Don't dismiss 2D skeletal animations so quickly. Games like Odin Sphere are able to achieve better animation of 2D skeletons by having several versions of each of the arm positions. The one that gets drawn is the one that matches up the closest to the part of the body it is attached to. They also use clever art to hide any defects, like flared clothing and so forth.
I'm developing some C++ code that can do some fancy 3D transition effects between two images, for which I thought OpenGL would be the best option.
I start with a DIB section and set it up for OpenGL, and I create two textures from input images.
Then for each frame I draw just two OpenGL quads, with the corresponding image texture.
The DIB content is then saved to file.
For example, one effect is to locate the two quads (in 3D space) like two billboards, one in front of the other (obscuring it), and then swoop the camera up, forward and down so you can see the second one.
My input images are 1024x768 or so and it takes a really long time to render (100 milliseconds) when the quads cover most of the view. It speeds up if the camera is far away.
I tried rendering each image quad as hundreds of individual tiles, but it takes just the same time; it seems to depend on the number of visible textured pixels.
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Would I be better off using some other approach?
Thanks in advance...
Edit:
The GL strings for the DIB version show up as:
Vendor: Microsoft Corporation
Version: 1.1.0
Renderer: GDI Generic
The onscreen version shows:
Vendor: ATI Technologies Inc.
Version: 3.2.9756 Compatibility Profile Context
Renderer: ATI Mobility Radeon HD 3400 Series
So I guess I'll have to use FBOs. I'm a bit confused as to how to get the rendered data out of the FBO and into a DIB; any pointers (pun intended) on that?
It sounds like rendering to a DIB is forcing the rendering to happen in software. I'd render to a frame buffer object, and then extract the data from the generated texture. Gamedev.net has a pretty decent tutorial.
Keep in mind, however, that graphics hardware is oriented primarily toward drawing on the screen. Capturing rendered data will usually be slower than displaying it, even when you do get the hardware to do the rendering -- though it should still be quite a bit faster than software rendering.
Edit: Dominik Göddeke has a tutorial that includes code for reading back texture data to CPU address space.
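A minimal sketch of that readback path, assuming an OpenGL 3.0+ context (or the equivalent EXT_framebuffer_object entry points) and that dibBits points at the DIB section's pixel buffer:

```cpp
// Sketch: render into an FBO-attached texture, then read the pixels back into
// CPU memory (e.g. the DIB section's bits).
#include <GL/glew.h>

void renderToDib(int width, int height, void* dibBits /* BGRA buffer */)
{
    GLuint colorTex = 0, fbo = 0;
    glGenTextures(1, &colorTex);
    glBindTexture(GL_TEXTURE_2D, colorTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, nullptr);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, colorTex, 0);

    glViewport(0, 0, width, height);
    // ... draw the two textured quads here, exactly as before ...

    // Pull the rendered image back to the CPU side.
    glReadBuffer(GL_COLOR_ATTACHMENT0);
    glReadPixels(0, 0, width, height, GL_BGRA, GL_UNSIGNED_BYTE, dibBits);

    glBindFramebuffer(GL_FRAMEBUFFER, 0);
    glDeleteFramebuffers(1, &fbo);
    glDeleteTextures(1, &colorTex);
}
```

Note that glReadPixels returns rows bottom-up, which happens to match a bottom-up DIB; a top-down DIB would need the rows flipped.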
One problem with your question:
You provided no actual rendering/texture generation code.
Would I be better off using some other approach?
The simplest thing you can do is to make sure your textures have power-of-two sizes. I.e. instead of 1024x768, use 1024x1024 and use only part of that texture. Explanation: although most modern hardware supports non-power-of-two textures, they are sometimes treated as a "special case", and using such a texture MAY produce a performance drop on some hardware.
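A small sketch of that padding trick, assuming the source image is plain RGBA:

```cpp
// Sketch: allocate a 1024x1024 power-of-two texture, upload the 1024x768
// image into part of it, and sample only that region.
#include <GL/gl.h>

GLuint createPaddedTexture(const unsigned char* rgba1024x768)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Allocate the full power-of-two surface without providing data...
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 1024, 1024, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    // ...then fill just the 1024x768 portion that is actually used.
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 1024, 768,
                    GL_RGBA, GL_UNSIGNED_BYTE, rgba1024x768);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}
// When texturing the quad, use v coordinates in [0, 768.0f / 1024.0f]
// instead of [0, 1] so only the valid region is sampled.
```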
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Yes, you're missing one important thing. There are a few things that limit GPU performance:
1. System memory to video memory transfer rate (probably not your case - only relevant for dynamic textures/geometry where the data changes every frame).
2. Computation cost (if you write a shader with heavy computations, it will be slow).
3. Fill rate (how many pixels the program can put on screen per second); AFAIK this depends on memory speed on modern GPUs.
4. Vertex processing rate (not your case) - how many vertices the GPU can process per second.
5. Texture read rate (how many texels per second the GPU can read); on modern GPUs this depends on GPU memory speed.
6. Texture read caching (not your case) - i.e. in a fragment shader you can read a texture a few hundred times per pixel with little performance drop IF the coordinates are very close to each other (i.e. almost the same texel in each read), because the results are cached. But performance will drop significantly if you try to access 100 randomly located texels for every pixel.
All those characteristics are hardware dependent.
I.e., depending on the hardware, you may be able to render 1,500,000 polygons per frame (if they take up a small amount of screen space), but you can bring the fps to its knees with 100 polygons if each polygon fills the entire screen, uses alpha blending and is textured with a highly detailed texture.
If you think about it, you may notice that there are a lot of videocards that can draw a landscape, but fps drops when you're doing framebuffer effects (like blur, HDR, etc).
Also, you may get a performance drop with textured surfaces if you have an integrated GPU. When I fried the PCIe slot on my previous motherboard, I had to work with the integrated GPU (an NVIDIA 6800 or something). The results weren't pleasant. While the GPU supported Shader Model 3.0 and could use relatively computationally expensive shaders, the fps dropped rapidly whenever there was a textured object on screen. This obviously happened because the integrated GPU used part of system memory as video memory, and the transfer rates of "normal" GPU memory and system memory are different.