OpenGL - Are animations done by shaders? - c++

I started to study OpenGL and I learned how to show figures and stuff using vertex and fragment shaders.
I created a (very) stylized man, and now I want to move his arms and legs.
The question is: should I change vertex data in the VBO, directly in a timer function called in the main (as I did), or is it a job that should be done in the vertex shader, without touching vertex data?
I suppose the answer is the first one, but I feel like it overloads the CPU instead of making the GPU work.

Both ways will work fine: if your aim is to utilise the GPU more, do the transformations in vertex shaders; otherwise you could use the CPU. Bear in mind, however, that if you are checking for collisions you need the data present on the CPU side.
In essence:
Doing the manipulation on the GPU means you only need to send the mesh data once, then you can send the matrix transformations to deform or animate it.
This is ideal as it greatly reduces the bandwidth of data transmission between CPU->GPU.
It can also mean that you can upload just one copy of the mesh to the GPU and apply transforms for many different instances of the mesh to achieve varied but similar models (e.g. send a bear mesh to the GPU and draw instances at scale *2, scale *1 and scale *0.5 for Daddy bear, Mummy bear and Baby bear, then send a Goldilocks mesh: now you have 2 meshes in memory and 4 distinct models).
The transformed meshes however are not immediately available on the CPU side, so mesh-perfect Collision Detection will be more intensive.
Animating on the CPU means you have access to the transformed mesh, with the major caveat that you must upload that whole mesh to the GPU each frame and for each instance: more work, more data and more memory used up on both CPU and GPU sides.
CPU Advantages
Current accurate meshes are available for whatever purpose you require at all times.
CPU Disadvantages
Massive data transfer between CPU and GPU: one transfer per instance per frame per model
GPU Advantages
Upload less mesh data (once if you only need variations of the one model)
Transform individual instances in a fast parallelised way
CPU->GPU bandwidth minimised: only need to send transformations
GPU is parallelised and can handle mesh data far more efficiently than a CPU
GPU Disadvantages
Meshes are not readily available for mesh-perfect collision detection
Mitigations to offset the overheads of transferring GPU data back to CPU:
Utilise bounding boxes (axis aligned or non axis aligned as per your preference): this allows a small dataset to represent the model on the CPU side (8 points per box, as opposed to millions of points per mesh). If the bounding boxes collide, then transfer the mesh from GPU -> CPU and do the refined calculation to get an exact mesh-to-mesh collision detection. This gives the best of both worlds for the least overhead.
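The bounding-box pre-check described above can be sketched on the CPU as follows. This is only a minimal illustration for the axis-aligned case; the `Vec3` and `AABB` types and the function names are made up for this example and do not come from any library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Axis-aligned bounding box stored as its min and max corners (hypothetical type).
struct AABB { Vec3 min, max; };

// Build the box that encloses every vertex of a non-empty mesh.
AABB computeAABB(const std::vector<Vec3>& vertices) {
    assert(!vertices.empty());
    AABB box{vertices.front(), vertices.front()};
    for (const Vec3& v : vertices) {
        box.min = {std::min(box.min.x, v.x), std::min(box.min.y, v.y), std::min(box.min.z, v.z)};
        box.max = {std::max(box.max.x, v.x), std::max(box.max.y, v.y), std::max(box.max.z, v.z)};
    }
    return box;
}

// Two boxes overlap only if their intervals overlap on all three axes.
bool overlaps(const AABB& a, const AABB& b) {
    return a.min.x <= b.max.x && a.max.x >= b.min.x &&
           a.min.y <= b.max.y && a.max.y >= b.min.y &&
           a.min.z <= b.max.z && a.max.z >= b.min.z;
}
```

Only when `overlaps` returns true would the more expensive mesh-to-mesh test (and any GPU -> CPU transfer) be needed. Note that a box computed from the rest-pose mesh has to be transformed or recomputed when the model animates.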
As the performance of the GPU could be tens, hundreds or even thousands of times higher than that of a CPU at processing meshes, it quickly becomes evident why, for performant code, as much as possible in this area is farmed out to the GPU.
Hope this helps:)

Depending on the platform and OpenGL version you can do the animations by changing the data in the vertex buffer directly (software animation) or by associating groups of vertices with corresponding animation matrices (hardware animation).
If you choose the second approach (recommended where it is possible) you can send one or more of these matrices to the vertex shader as uniforms, maybe associating some "weight" factor with each matrix.
Please keep in mind that software animation will overload the CPU when you have a very high number of vertices, while hardware animation is almost free: in the shader you just multiply the vertex by the correct matrix instead of the model-view-projection one. Also, GPUs are highly optimized for math operations and they are very fast compared to CPUs.
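To make the "matrices with weights" idea concrete, here is a CPU-side sketch of the arithmetic a vertex shader would perform for one vertex. It is an illustration only: the row-major `Mat4` layout, the `Influence` struct and all function names are assumptions made for this example, and in a real shader the matrices would arrive as uniforms.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
using Mat4 = std::array<float, 16>;  // row-major 4x4 (assumed layout)

// Transform a point (implicit w = 1) by a row-major 4x4 matrix.
Vec3 transformPoint(const Mat4& m, const Vec3& p) {
    return {m[0] * p.x + m[1] * p.y + m[2] * p.z + m[3],
            m[4] * p.x + m[5] * p.y + m[6] * p.z + m[7],
            m[8] * p.x + m[9] * p.y + m[10] * p.z + m[11]};
}

// One animation matrix (by index) and how strongly it affects the vertex.
struct Influence { std::size_t matrix; float weight; };

// Weighted blend: result = sum_i weight_i * (matrix_i * restPosition).
Vec3 skinVertex(const Vec3& rest, const std::vector<Influence>& influences,
                const std::vector<Mat4>& matrices) {
    Vec3 out{0.0f, 0.0f, 0.0f};
    for (const Influence& inf : influences) {
        assert(inf.matrix < matrices.size());
        const Vec3 p = transformPoint(matrices[inf.matrix], rest);
        out.x += inf.weight * p.x;
        out.y += inf.weight * p.y;
        out.z += inf.weight * p.z;
    }
    return out;
}

// Helper for the example: a pure translation matrix.
Mat4 translation(float tx, float ty, float tz) {
    return Mat4{{1, 0, 0, tx,  0, 1, 0, ty,  0, 0, 1, tz,  0, 0, 0, 1}};
}
```

A vertex influenced half by a matrix that does nothing and half by one that moves it 2 units along x ends up moved 1 unit, which is why weights that sum to 1 give smooth bending at joints such as elbows and knees.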


Large 3D scene streaming

I'm working on a 3D engine suitable for very large scene display.
Apart from the rendering itself (frustum culling, occlusion culling, etc.), I'm wondering what the best solution for scene management is.
Data is given as a huge list of 3D meshes, with no relation between them, so I can't generate portals, I think...
The main goal is to be able to run this engine on systems with low RAM (500MB-1GB), and the scenes loaded into it are very large and can contain millions of triangles, which leads to very intensive memory usage. I'm currently working with a loose octree, constructed at load time; it works well on small and medium scenes, but many scenes are just too huge to fit entirely in memory, so here comes my question:
How would you handle scenes to load and unload chunks dynamically (and ideally seamlessly), and what would you base the decision to load or unload a chunk on? If needed, I can create a custom file format, as scenes are being exported using a custom exporter on known 3D authoring tools.
Important information: Many scenes can't be effectively occluded, because of their construction.
Example: A very huge pipe network, so there isn't so much occlusion but very high number of elements.
I think that the best solution will be a "solution pack", a pack of different techniques.
Level of detail(LOD) can reduce memory footprint if unused levels are not loaded. It can be changed more or less seamlessly by using an alpha mix between the old and the new detail. The easiest controller will use mesh distance to camera.
Free the host memory (RAM) once the object has been uploaded to the GPU (device), and obviously free all unused memory (OpenGL resources too). Valgrind can help you with this one.
Use low quality meshes and use tessellation to increase visual quality.
Use VBO indexing; this should reduce VRAM usage and increase performance.
Don't use meshes if possible, terrain can be rendered using heightmaps. Some things can be procedurally generated.
Use bump and/or normal maps. This will improve quality, so you can then reduce the vertex count.
Divide those "pipes" into different meshes.
Fake 3D meshes with 2D images: impostors, skydomes...
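As a rough sketch of the distance-based LOD controller and alpha mix mentioned in the list above, something like the following could work. The threshold layout, the transition band and all names here are assumptions for illustration, not a prescribed design.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

float distanceBetween(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// thresholds[i] is the distance at which level i hands over to level i + 1
// (level 0 = most detailed). The values must be sorted in ascending order.
std::size_t selectLod(const std::vector<float>& thresholds, float distance) {
    std::size_t level = 0;
    while (level < thresholds.size() && distance >= thresholds[level]) ++level;
    return level;
}

// Blend weight of the coarser level inside a transition band that ends at
// `threshold`: 0 = only the finer mesh, 1 = only the coarser mesh.
float crossFadeAlpha(float distance, float threshold, float band) {
    assert(band > 0.0f);
    const float t = (distance - (threshold - band)) / band;
    return std::fmax(0.0f, std::fmin(1.0f, t));
}
```

With thresholds of 10 and 50 units, a mesh 30 units from the camera uses level 1, and only levels that can currently be selected (or are about to be) need to stay in memory.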
If the vast amount of RAM is going to be used by textures, there are commercial packages available such as the GraniteSDK that offer seamless LOD-based texture streaming using a virtual texture cache.
In fact you can use the same technique to construct polygons on the fly from texture data in the shader, but it's going to be a bit more complicated.
For voxels there are techniques to construct octrees entirely in GPU memory, and page in/out the parts you really need. The rendering can then be done using raycasting. See this post: Use octree to organize 3D volume data in GPU.
It comes down to how static the scene is going to be and, following from that, how well you can pre-bake the data according to your visualization needs. It would already help if you can determine visibility constraints up front (e.g. search for Potentially Visible Sets) and organize the data so that you can stream it on request. Since the visualizer will have limits, you always end up with a strategy to fit a section of the data into GPU memory as quickly and accurately as possible.
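One possible shape for such a strategy, sketched under strong assumptions: every chunk knows its size and its distance to the camera, and the nearest chunks are kept resident until a memory budget is used up. The `Chunk` struct, the greedy policy and the names below are hypothetical; a real engine would likely also weigh visibility and screen-space size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-chunk bookkeeping kept in RAM even when the chunk is unloaded.
struct Chunk {
    int id;
    float distanceToCamera;
    std::size_t sizeBytes;
};

// Returns the ids of the chunks that should be resident, nearest first.
// Everything not in the result is a candidate for unloading.
std::vector<int> selectResidentChunks(std::vector<Chunk> chunks, std::size_t budgetBytes) {
    std::sort(chunks.begin(), chunks.end(), [](const Chunk& a, const Chunk& b) {
        return a.distanceToCamera < b.distanceToCamera;
    });
    std::vector<int> resident;
    std::size_t used = 0;
    for (const Chunk& c : chunks) {
        // Skip chunks that do not fit; a smaller, farther chunk may still fit.
        if (used + c.sizeBytes > budgetBytes) continue;
        used += c.sizeBytes;
        resident.push_back(c.id);
    }
    return resident;
}
```

Running this selection every few frames and diffing the result against the currently loaded set gives the load and unload requests for a background streaming thread.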

What is "GPU Cache" from an OpenGL/DirectX programmer perspective?

A Maya promo video explains how GPU Cache makes the application run faster for the user. In frameworks like Cinder we redraw all the geometry we want in the scene on each frame update, sending it to the video card. So I wonder what is behind GPU caching from a programmer perspective? What OpenGL/DirectX APIs are behind such technology? How do I "cache" my mesh in GPU memory?
There is, to my knowledge, no way in OpenGL or DirectX to directly specify what is to be, and not to be, stored and tracked on the GPU cache. There are however methodologies that should be followed and maintained in order to make best use of the cache. Some of these include:
Batch, batch, batch.
Upload data directly to the GPU.
Order indices to maximize vertex locality across the mesh.
Keep state changes to a minimum.
Keep shader changes to a minimum.
Keep texture changes to a minimum.
Use maximum texture compression whenever possible.
Use mipmapping whenever possible (to maximize texel sampling locality).
It is also important to keep in mind that there is no single GPU cache. There are multiple (vertex, texture, etc.) independent caches.
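To see why index order matters for the vertex cache, the following sketch simulates a small FIFO cache of recently processed vertices and counts misses per triangle. The FIFO model and the cache size are simplifying assumptions; real hardware caches differ between vendors and generations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Simulate a FIFO vertex cache over a triangle list and return the average
// number of cache misses per triangle (lower is better, 3.0 is the worst case).
double averageCacheMissRatio(const std::vector<unsigned>& indices, std::size_t cacheSize) {
    assert(!indices.empty() && indices.size() % 3 == 0 && cacheSize > 0);
    std::deque<unsigned> cache;
    std::size_t misses = 0;
    for (unsigned index : indices) {
        if (std::find(cache.begin(), cache.end(), index) == cache.end()) {
            ++misses;                       // vertex must be processed again
            cache.push_back(index);
            if (cache.size() > cacheSize) cache.pop_front();
        }
    }
    return static_cast<double>(misses) / static_cast<double>(indices.size() / 3);
}
```

For the same four triangles and a 4-entry cache, the strip-like order `0,1,2, 2,1,3, 2,3,4, 4,3,5` gives 1.5 misses per triangle, while the shuffled order `0,1,2, 4,3,5, 2,1,3, 2,3,4` gives 2.25.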
OpenGL SuperBible - Memory Bandwidth and Vertices
GPU Gems - Graphics Pipeline Performance
GDC 2012 - Optimizing DirectX Graphics
First off, the "GPU cache" terminology that Maya uses refers to optimizing a mesh for device-independent storage and rendering in Maya. For card manufacturers the notion of a "GPU cache" is different (in this case it means something more like the L1 or L2 CPU caches).
To answer your final question: Using OpenGL terminology, you generally create vertex buffer objects (VBOs). These will store the data on the card. Then, when you want to draw, you can simply instruct the card to use those buffers.
This will avoid the overhead of copying the mesh data from main (CPU) memory into graphics (GPU) memory. If you need to draw the mesh many times without changing the mesh data, it performs much better.

OpenGL vector graphics rendering performance on mobile devices

It is generally advised not to use vector graphics in mobile games, or to pre-rasterize them, for performance. Why is that? I thought that OpenGL is at least as good at drawing lines / triangles as rendering images on screen...
Rasterizing caches them as images, so there is less overhead than calculating every coordinate of the vector shape and drawing it (more draw cycles and more CPU usage). Drawing a vector shape is exactly that: you draw arcs from point to point on every single call, versus displaying a cached image at a certain coordinate.
Although using impostors is a great optimization trick, depending on the impostor's shape, how much overdraw is involved and whether you need blending in the process, the trick can make you fillrate bound. Also, in some scenarios where shapes may change, caching the graphics into impostors may not be feasible or may incur other overheads. It is a matter of balancing your rendering pipeline.
The answer depends on the hardware. Are you using a GPU or NOT?
Today, modern mobile devices running Android and iOS have a GPU embedded in the chipset.
These GPUs are very good with vector graphics. To prove this point, most GPUs have a dedicated geometry processor in addition to one or more pixel processors (for example, the Mali-400 GPU).
For example, let's say you want to draw 200 transparent circles of different colors.
If you do it with modern OpenGL, you will only need one set of geometry (a list of triangles forming a circle) and a list of parameters for each circle, let's say position and color. If you provide this information to the GPU, it will draw it in parallel very quickly.
If you do it using a different texture for each color, your program will be very heavy (in storage size) and will probably be slower due to memory bandwidth problems.
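A sketch of the "one set of geometry plus a list of parameters" layout for the circle example might look like this. The structs and function names are hypothetical, and `placeVertex` only mimics on the CPU what a vertex shader would do with the per-circle parameters.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec2 { float x, y; };

// Per-circle parameters: this is all that differs between the 200 circles.
struct CircleInstance {
    float x, y, radius;
    std::uint32_t rgba;
};

// Shared geometry: a unit circle as a triangle list (3 vertices per segment).
std::vector<Vec2> makeUnitCircle(std::size_t segments) {
    assert(segments >= 3);
    const float kTwoPi = 6.28318530718f;
    std::vector<Vec2> vertices;
    vertices.reserve(segments * 3);
    for (std::size_t i = 0; i < segments; ++i) {
        const float a0 = kTwoPi * static_cast<float>(i) / static_cast<float>(segments);
        const float a1 = kTwoPi * static_cast<float>(i + 1) / static_cast<float>(segments);
        vertices.push_back({0.0f, 0.0f});                    // centre
        vertices.push_back({std::cos(a0), std::sin(a0)});    // rim, start of segment
        vertices.push_back({std::cos(a1), std::sin(a1)});    // rim, end of segment
    }
    return vertices;
}

// What the vertex shader would compute: scale by the radius, then translate.
Vec2 placeVertex(const Vec2& unit, const CircleInstance& instance) {
    return {instance.x + instance.radius * unit.x, instance.y + instance.radius * unit.y};
}
```

With 32 segments the shared mesh is 96 vertices, and each additional circle costs only one small `CircleInstance` record rather than a new texture.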
It depends on what you want to do, and the hardware. If your hardware doesn't have a GPU you probably should pre-render your graphics.

Per-line texture processing accelerated with OpenGL/OpenCL

I have a rendering step which I would like to perform on a dynamically-generated texture.
The algorithm can operate on rows independently in parallel. For each row, the algorithm will visit each pixel in left-to-right order and modify it in situ (no distinct output buffer is needed, if that helps). Each pass uses state variables which must be reset at the beginning of each row and persist as we traverse the columns.
Can I set up OpenGL shaders, or OpenCL, or whatever, to do this? Please provide a minimal example with code.
If you have access to GL 4.x-class hardware that implements EXT_shader_image_load_store or ARB_shader_image_load_store, I imagine you could pull it off. Otherwise, in-situ read/write of an image is generally not possible (though there are ways with NV_texture_barrier).
That being said, once you start wanting pixels to share state the way you do, you kill off most of your potential gains from parallelism. If the value you compute for a pixel is dependent on the computations of the pixel to its left, then you cannot actually execute each pixel in parallel. Which means that the only parallelism your algorithm actually has is per-row.
That's not going to buy you much.
If you really want to do this, use OpenCL. It's much friendlier to this kind of thing.
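The per-row structure can be sketched in plain C++ as below, where each call to `processRow` corresponds to what a single OpenCL work-item (one per row) would do. The running sum is only a placeholder for the real per-pixel operation, which the question does not specify, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Process one row in place, left to right. The state is reset at the start
// of the row and carried from pixel to pixel. Placeholder operation: running sum.
void processRow(float* row, std::size_t width) {
    float state = 0.0f;                 // reset at the beginning of each row
    for (std::size_t x = 0; x < width; ++x) {
        state += row[x];                // state depends on everything to the left
        row[x] = state;                 // modify the pixel in situ
    }
}

// Rows do not depend on each other, so this outer loop is the only part that
// can run in parallel (for example, one work-item or thread per row).
void processImage(std::vector<float>& pixels, std::size_t width, std::size_t height) {
    assert(pixels.size() == width * height);
    for (std::size_t y = 0; y < height; ++y) {
        processRow(&pixels[y * width], width);
    }
}
```

For an image with H rows this exposes H independent tasks, which matches the point made above: the available parallelism is per-row, not per-pixel.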
Yes, you can do it. No, you don't need 4.X hardware for that, you need fragment shaders (with flow control), framebuffer objects and floating point texture support.
You need to encode your data into 2D texture.
Store "state variable" in 1st pixel for each row, and encode the rest of the data into the rest of the pixels. It goes without saying that it is recommended to use floating point texture format.
Use two framebuffers, and render them onto each other in a loop using a fragment shader that updates the "state variable" in the first column and performs whatever operation you need on another column, which is the "current" one. To reduce the amount of wasted resources you can limit rendering to the columns you want to process. The NVidia OpenGL SDK examples had "game of life", "GPGPU fluid" and "GPU particles" demos that work in a similar fashion: by encoding data into a texture and then using shaders to update it.
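The two-framebuffer loop can be simulated on the CPU to show the data flow. In this sketch each row is stored as `[state, p1, p2, ...]`, `shadeTexel` plays the role of the fragment shader (it only reads from the source buffer), and the running sum is again just a placeholder operation. All names and the layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Texture = std::vector<float>;  // rows * stride floats; column 0 holds the state

// "Fragment shader": compute one output texel from the source texture only.
float shadeTexel(const Texture& src, std::size_t stride,
                 std::size_t row, std::size_t col, std::size_t current) {
    const float* r = &src[row * stride];
    const float updated = r[0] + r[current];   // state combined with the current pixel
    if (col == 0 || col == current) return updated;
    return r[col];                              // every other texel is copied through
}

// Ping-pong between two buffers, one pass per data column.
void runPasses(Texture& texture, std::size_t stride, std::size_t rows) {
    assert(stride >= 1 && texture.size() == stride * rows);
    Texture other(texture.size());
    Texture* src = &texture;
    Texture* dst = &other;
    for (std::size_t current = 1; current < stride; ++current) {
        for (std::size_t row = 0; row < rows; ++row)
            for (std::size_t col = 0; col < stride; ++col)
                (*dst)[row * stride + col] = shadeTexel(*src, stride, row, col, current);
        std::swap(src, dst);                    // the output becomes the next input
    }
    if (src != &texture) texture = *src;        // make sure the result ends up in `texture`
}
```

This version rewrites the whole texture in every pass for clarity; on the GPU, rendering could be limited to column 0 and the current column, provided the untouched columns are kept in sync between the two buffers.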
However, just because you can do it doesn't mean you should, and it doesn't mean that it is guaranteed to be fast. Some GPUs might have a very high texture memory read speed but relatively slow computation speed (and vice versa), and not all GPUs have many pipelines for processing things in parallel.
Also, depending on your app, CUDA or OpenCL might be more suitable.

How to speed up offscreen OpenGL rendering with large textures on Win32?

I'm developing some C++ code that can do some fancy 3D transition effects between two images, for which I thought OpenGL would be the best option.
I start with a DIB section and set it up for OpenGL, and I create two textures from input images.
Then for each frame I draw just two OpenGL quads, with the corresponding image texture.
The DIB content is then saved to file.
For example, one effect is to place the two quads (in 3D space) like two billboards, one in front of the other (obscuring it), and then swoop the camera up, forward and down so you can see the second one.
My input images are 1024x768 or so and it takes a really long time to render (100 milliseconds) when the quads cover most of the view. It speeds up if the camera is far away.
I tried rendering each image quad as hundreds of individual tiles, but it takes just the same time, it seems like it depends on the number of visible textured pixels.
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Would I be better off using some other approach?
Thanks in advance...
Edit :
The GL strings show up for the DIB version as :
Vendor : Microsoft Corporation
Version: 1.1.0
Renderer : GDI Generic
The Onscreen version shows :
Vendor : ATI Technologies Inc.
Version : 3.2.9756 Compatibility Profile Context
Renderer : ATI Mobility Radeon HD 3400 Series
So I guess I'll have to use FBOs. I'm a bit confused as to how to get the rendered data out from the FBO onto a DIB, any pointers (pun intended) on that?
It sounds like rendering to a DIB is forcing the rendering to happen in software. I'd render to a frame buffer object, and then extract the data from the generated texture. has a pretty decent tutorial.
Keep in mind, however, that graphics hardware is oriented primarily toward drawing on the screen. Capturing rendered data will usually be slower than displaying it, even when you do get the hardware to do the rendering -- though it should still be quite a bit faster than software rendering.
Edit: Dominik Göddeke has a tutorial that includes code for reading back texture data to CPU address space.
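Once the pixels have been read back (for example with glReadPixels using GL_RGBA and GL_UNSIGNED_BYTE), one remaining step for a 32-bit DIB is the channel order, since a DIB stores each pixel as blue, green, red, unused. A minimal sketch of that swap, assuming a tightly packed 4-bytes-per-pixel buffer and a hypothetical function name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Convert a tightly packed RGBA8 buffer to BGRA8 in place by swapping the
// red and blue bytes of every pixel.
void rgbaToBgra(std::vector<std::uint8_t>& pixels) {
    assert(pixels.size() % 4 == 0);
    for (std::size_t i = 0; i < pixels.size(); i += 4) {
        std::swap(pixels[i], pixels[i + 2]);
    }
}
```

Where the GL_BGRA format is available, asking glReadPixels for it directly avoids this pass. Row order usually needs no change for a bottom-up DIB (positive height), because glReadPixels also returns rows from bottom to top.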
One problem with your question:
You provided no actual rendering/texture generation code.
Would I be better off using some other approach?
The simplest thing you can do is to make sure your textures have power-of-two sizes, i.e. instead of 1024x768 use 1024x1024 and use only part of that texture. Explanation: although most modern hardware supports non-power-of-two textures, they are sometimes treated as a "special case", and using such a texture MAY cause a performance drop on some hardware.
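Padding to a power of two also means the quad's texture coordinates must stop short of 1.0 so that only the used part of the texture is shown. A small sketch of that bookkeeping, with a hypothetical `PaddedTexture` struct:

```cpp
#include <cassert>

// Smallest power of two that is >= n (for n >= 1).
unsigned nextPowerOfTwo(unsigned n) {
    assert(n >= 1 && n <= (1u << 31));
    unsigned p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Padded texture size plus the maximum texture coordinates that still
// address the original image inside it.
struct PaddedTexture {
    unsigned width, height;
    float uMax, vMax;
};

PaddedTexture padToPowerOfTwo(unsigned imageWidth, unsigned imageHeight) {
    const unsigned w = nextPowerOfTwo(imageWidth);
    const unsigned h = nextPowerOfTwo(imageHeight);
    return {w, h,
            static_cast<float>(imageWidth) / static_cast<float>(w),
            static_cast<float>(imageHeight) / static_cast<float>(h)};
}
```

For a 1024x768 image this yields a 1024x1024 texture with texture coordinates running from 0 to 1.0 horizontally and 0 to 0.75 vertically.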
I assumed OpenGL could do zillions of polygons a second. Is there something I am missing here?
Yes, you're missing one important thing. There are a few things that limit GPU performance:
1. System memory to video memory transfer rate (probably not your case - only for dynamic textures/geometry when data changes every frame).
2. Computation cost. (If you write a shader with heavy computations, it will be slow).
3. Fill rate (how many pixels program can put on screen per second), AFAIK depends on memory speed on modern GPUs.
4. Vertex processing rate (not your case) - how many vertices GPU can process per second.
5. Texture read rate (how many texels per second GPU can read), on modern GPUs depends on GPU memory speed.
6. Texture read caching (not your case) - i.e. in a fragment shader you can read a texture a few hundred times per pixel with little performance drop IF the coordinates are very close to each other (i.e. almost the same texel in each read) - because the results are cached. But performance will drop significantly if you try to access 100 randomly located texels for every pixel.
All those characteristics are hardware dependent.
For example, depending on the hardware, you may be able to render 1500000 polygons per frame (if they take a small amount of screen space), but you can bring the fps to its knees with 100 polygons if each polygon fills the entire screen, uses alpha-blending and is textured with a highly-detailed texture.
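The difference becomes clearer with some back-of-the-envelope arithmetic on how many pixels have to be shaded per frame. The sketch below uses the 1024x768 size from the question; the figure of 4 pixels per small polygon is purely an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Rough count of pixel writes per frame, ignoring clipping and early rejection.
std::uint64_t shadedPixelsPerFrame(std::uint64_t polygons, std::uint64_t pixelsPerPolygon) {
    return polygons * pixelsPerPolygon;
}

// How many times, on average, each screen pixel is written.
double overdrawFactor(std::uint64_t shadedPixels, std::uint64_t screenPixels) {
    assert(screenPixels > 0);
    return static_cast<double>(shadedPixels) / static_cast<double>(screenPixels);
}
```

With these numbers, 100 full-screen polygons write 100 * 1024 * 768 = 78,643,200 pixels per frame (an overdraw factor of 100), whereas 1,500,000 polygons of about 4 pixels each write only 6,000,000.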
If you think about it, you may notice that there are a lot of videocards that can draw a landscape, but fps drops when you're doing framebuffer effects (like blur, HDR, etc).
Also, you may get a performance drop with textured surfaces if you have a built-in GPU. When I fried the PCIe slot on my previous motherboard, I had to work with the built-in GPU (NVidia 6800 or something). The results weren't pleasant. While the GPU supported shader model 3.0 and could use relatively computationally expensive shaders, fps rapidly dropped each time there was a textured object on screen. This obviously happened because the built-in GPU used part of system memory as video memory, and the transfer rates of "normal" GPU memory and system memory are different.