What tasks in 3d program does CPU / GPU handle? - opengl

When rotating a scene in a 3d modeling interface, which part of such task is CPU responsible for and which part the GPU takes on? (mesh verticies moving, shading, keeping track of UV coords - perhaps offsetting them, lighting the triangles and rendering transparency correctly)
What rendering mode is normally used via such modeling programs (realtime) - immediate or retained?

First and formost the GPU is responsible for putting points, lines and triangles to the screen. However this involves a certain amount of calculations.
The usual pipeline is, that for each vertex (which is a combination of attributes, that usually includes, but is not limited to position, normals, texture coordinates and so on), the vertex position is transformed from model local space into normalized device coordinates. This is in most implementations a 3-stage process.
transformation from model local space into view=eye space – eye space coordinates are later reused for things like illumination calculations
transformation from view space to clip space, also called projection; this is determined which part of the view space will later be visible in the viewport; it's also where affine perspective is introduced
mapping into normalized device coordinates, by coordinate homogenization (this later step actually creates perspective if an affine projection is used).
The above calculations are normally carried out by the GPU.
When rotating a scene in a 3d modeling interface, which part of such task is CPU responsible for and which part the GPU takes on?
Well, that depends on what kind of rotation you mean. If you mean an alteration of the viewport but nothing in the scene input data is actually changed. The only thing that gets altered is a parameter used in the first transformation step. This parameter is normally a 4×4 matrix. When rotating the viewport a new modelview transformation matrix is calculated on the CPU. This matrix is then passed to the GPU and the whole scene redrawn.
If however a model is actually modified in a modeller, then the calculations are usually carried out on the CPU.
(mesh verticies moving, shading, keeping track of UV coords - perhaps offsetting them, lighting the triangles and rendering transparency correctly)
In an online renderer this normally done mostly by the GPU, but certain parts may be precalculated by the CPU.
It's impossible to make a definitive statement, because how the workload is shared depends on the actual application.

This question is really I mean REALLY vague.
Generally whatever comes, because it really depends on the program and it's makers. Older programs were mostly all CPU since GPUs where non existent or too weak to handle massive scenes.
However today GPUs are powerful enough to handle massive scenes, the programs creators can offer various solutions but its usually an abstracted system where you have your data and your view and thus they allow you to specify what you want with in the editing viewport : realtime / immediate, preprocessed , high or low detail. Programs usually sacrifice accuracy for speed so you can edit with ease.
For example 3ds max uses rendering devices, viewport handlers and renderers. Renderers handle the production quality output but today they are not limited to CPU since they can take advantage of the GPU ( just think about OpenCL or CUDA ) while maintaining the quality and lowering rendering times. Secondly anyone can make plugins to implement a viewport renderer the way they want it let it be CPU or GPU or a mixed renderer.
So abstraction being common in modelling tools the scene information is fed into a viewport renderer which is usually very similar to a game engine's renderer. If you think about it , because it's a tool and has an UI and various systems that it needs to handle the try to offload the CPU as much as possible by doing as much rendering work as possible on the GPU, so editing takes place in memory on the in the "general" data structure than that is displayed to you, I'm not sure but I think they may also use the graphics api ( let it be OpenGL or DirectX ) to do the picking ( selection ).
The rendering mode is as I would describe it "on-demand" it usually renders when it needs to, so in a scene where realtime preview is off, it would only render if you modify something or move the camera, and as soon as you do something that needs constant updating it will do exactly that.
On top of all this there are hybrid methods, where the users desire production like quality which even to this day is difficult with GPUs so they settle with a fast watered down version of a real renderer to get as close to production quality as possible, one simple example is 3dsMax's 'Realistic' viewport it does Ambient Occlusion its not "realtime" but it does it fast enough so it's actually useful. In more advanced cases they make special extension cards to handle fast raytracing to be able to do fast / good quality graphics but still even in these cases, the main thing is the same they store the editable data in a generic internal format and feed that into some sort of renderer that outputs something, not necessarily the same that you would get from very high quality offline renderer but still it gives a good outline of what it will be like.

Related

Efficiently providing geometry for terrain physics

I have been researching different approaches to terrain systems in game engines for a bit now, trying to familiarize myself with the work. A number of the details seem straightforward, but I am getting hung up on a single detail.
For performance reasons many terrain solutions utilize shaders to generate parts or all of the geometry, such as vertex shaders to generate positions or tessellation shaders for LoD. At first I figured those approaches were exclusively for renders that weren't concerned about physics simulations.
The reason I say that is because as I understand shaders at the moment, the results of a shader computation generally are discarded at the end of the frame. So if you rely on shaders heavily then the geometry information will be gone before you could access it and send it off to another system (such as physics running on the CPU).
So, am I wrong about shaders? Can you store the results of them generating geometry to be accessed by other systems? Or am I forced to keep the terrain geometry on CPU and leave the shaders to the other details?
Shaders
You understand parts of the shaders correctly, that is: after a frame, the data is stored as a final composed image in the backbuffer.
BUT: Using transform feedback it is possible to capture transformed geometry into a vertex buffer and reuse it. Transform Feedback happens AFTER the vertex/geometry/tessellation shader, so you could use the geometry shader to generate a terrain (or visible parts of it once), push it through transform-feedback and store it.
This way, you potentially could use CPU collision detection with your terrain! You can even combine this with tessellation.
You will love this: A Framework for Real-Time, Deformable Terrain.
For the LOD and tessellation: LOD is not the prerequisite of tessellation. You can use tessellation to allow some more sophisticated effects such as adding a detail by recursive subdivision of rough geometry. Linking it with LOD is simply a very good optimization avoiding RAM-memory based LOD-mesh-levels, since you just have your "base mesh" and subdivide it (Although this will be an unsatisfying optimization imho).
Now some deeper info on GPU and CPU exclusive terrain.
GPU Generated Terrain (Procedural)
As written in the NVidia article Generating Complex Procedural Terrains Using the GPU:
1.2 Marching Cubes and the Density Function Conceptually, the terrain surface can be completely described by a single function, called the
density function. For any point in 3D space (x, y, z), the function
produces a single floating-point value. These values vary over
space—sometimes positive, sometimes negative. If the value is
positive, then that point in space is inside the solid terrain.
If the value is negative, then that point is located in empty space
(such as air or water). The boundary between positive and negative
values—where the density value is zero—is the surface of the terrain.
It is along this surface that we wish to construct a polygonal mesh.
Using Shaders
The density function used for generating the terrain, must be available for the collision-detection shader and you have to fill an output buffer containing the collision locations, if any...
CUDA
See: https://www.youtube.com/watch?v=kYzxf3ugcg0
Here someone used CUDA, based on the NVidia article, which however implies the same:
In CUDA, performing collision detection, the density function must be shared.
This will however make the transform feedback techniques a little harder to implement.
Both, Shaders and CUDA, imply resampling/recalculation of the density at at least one location, just for the collision detection of a single object.
CPU Terrain
Usually, this implies a RAM-memory stored set of geometry in the form of vertex/index-buffer pairs, which are regularly processed by the shader-pipeline. As you have the data available here, you will also most likely have a collision mesh, which is a simplified representation of your terrain, against which you perform collision.
Alternatively you could spend your terrain a set of colliders, marking the allowed paths, which is imho performed in the early PS1 Final Fantasy games (which actually don't really have a terrain in the sense we understand terrain today).
This short answer is neither extensively deep nor complete. I just tried to give you some insight into some concepts used in dozens of solutions.
Some more reading: http://prideout.net/blog/?tag=opengl-transform-feedback.

How lighting in building games with unlimited number of lights works?

In Minecraft for example, you can place torches anywhere and each one effects the light level in the world and there is no limit to the amount of torches / light sources you can put down in the world. I am 99% sure that the lighting for the torches is taken care of on the CPU and stored for each block and so when rendering the light value at that certain block just needs to be passed into the shader, but light sources cannot move for this reason. If you had a game where you could place light sources that could move around (arrow on fire, minecart with a light on it, glowing ball of energy) and the lighting wasn't as simple (color was included) what are the most efficient ways to calculate the lighting effects.
From my research I have found differed rendering, differed lighting, dynamically creating shaders with different amounts of lights available and using a for loop (can't use uniforms due to unrolling), and static light maps (these would probably only be used for the still lights). Are there any other ways to do lighting calculations such as doing what minecraft does except allowing moving lights, or is it possible to take an infinite amount of lights and mathematically combine them into an approximation that only involves a few lights (this is an idea I came up with but I can't figure out how it could be done)?
If it helps, I am a programmer with decent experience in OpenGL (legacy and modern) so you can give me code snippets although I have not done too much with lighting so brief explanations would be appreciated. I am also willing to do research if you can point me in the right direction!
Your title is a bit misleading infinite light implies directional light in infinite distance like Sun. I would use unlimited number of lights instead. Here some approaches for this I know of:
(back) ray-tracers
they can handle any number of light sources natively. Light is just another object in engine. If ray hits the light source it just take the light intensity and stop the recursion. Unfortunately current gfx hardware is not suited for this kind of rendering. There are GPU enhanced engines for this but the specialized gfx HW is still in development and did not hit the market yet. Memory requirements are not much different then standard BR rendering and You can still use BR meshes but mathematical (analytical) meshes are natively supported and are better for this.
Standard BR rendering
BR means boundary representation such engines (Like OpenGL fixed function) can handle only limited number of lights. This is because each primitive/fragment needs the complete list of lights and the computations are done for all light on per primitive or per fragment basis. If you got many light this would be slow.
GLSL example of fixed number of light sources see the fragment shader
Also the current GPU's have limited memory for uniforms (registers) in which the lights and other rendering parameters are stored so there are possible workarounds like have light parameters stored in a texture and iterate over all of them per primitive/fragment inside GLSL shader but the number of lights affect performance of coarse so you are limited by target frame-rate and computational power. Additional memory requirements for this is just the texture with light parameters which is not so much (few vectors per light).
light maps
they can be computed even for moving objects. Complex light maps can be computed slowly (not per frame). This leads to small lighting artifacts but you need to know what to look for to spot it. Light maps and shadow maps are very similar and often computed at once. There are simple light maps and complex radiation maps models out there
look Shading mask algorithm for radiation calculations
These are either:
projected 2D maps (hard to implement/use and often less precise)
3D Voxel maps (Memory demanding but easier to compute/use)
Some approaches uses pre-rendered Z-Buffer as geometry source and then fill the lights via Radiosity or any other technique. These can handle any number of lights as these maps can be computation demanding they are often computed in the background and updated once in a while.
fast moving light sources are usually updated more often or excluded from maps and rendered as transparent geometry to make impression of light. The computational power needed for this depends on the computation method the basic are done like:
set a camera to the larges visible surfaces
render scene and handle the result as light/shadow map
store it as 2D or 3D texture or voxel map
and then continue with normal rendering from camera view
So you need to render scene more then once per frame/map update and also need additional buffers to store the rendered result which for high resolution or Voxel maps can be a big chunk of memory.
multi pass light layer
there are cases when light is added after rendering of the scene for example I used it for
Atmospheric scattering in GLSL
Here comes all multi pass rendering techniques you need additional buffers to store the sub results and usually the multi pass rendering is done on the same view/scene so pre-rendered geometry is used which significantly speeds this up either as locked VAO or as already rendered Z-buffer Color and Index buffers from first pass. After this handle next passes as single or few Quads (like in the Atmospheric scattering link) so the computational power needed for this is not much bigger in comparison to basic BR rendering
forward rendering vs. deferred-rendering
in a google this forward rendering vs. deferred-rendering is first relevant hit I found. It is not very good one (a bit to vague for my taste) but for starters it is enough
forward rendering techniques are usually standard single pass BR renders
deffered rendering is standard multi pass renders. In first pass is rendered all the geometries of the scene into Z buffer, Color buffer and some auxiliary buffers just to know which fragment of the result belongs to which object,material,... And then in the next passes are added effects,light,shadows,... but the geometry is not rendered again instead just single or few overlay QUADs/per pass are rendered so the next passes are usually pretty fast ...
The link suggest that for high lights number is the deffered rendering more suited but that strongly depends on which of the previous technique is used. Usually the multi pass light layer is used (with is one of the standard deffered rendering techniques) so in that case it is true, and the memory and computational power demands are the same see the previous section.

Why does stereo 3D rendering require software written especially for it?

Given a naive take on 3D graphics rendering it seems that stereo 3D rendering should be essentially transparent to the developer and be entirely a feature of the graphics hardware and drivers. Wherever an OpenGL window is displaying a scene, it takes the geometry, lighting, camera and texture etc. information to render a 2D image of the scene.
Adding stereo 3D to the scene seems to essentially imply using two laterally offset cameras where there was originally one, and all other scene variables stay the same. The only additional information then would be how far apart to make the cameras and how far out to to make their central rays converge. Given this it would seem trivial to take a GL command sequence and interleave the appropriate commands at driver level to drive a 3D rendering.
It seems though applications need to be specially written to make use of special 3D hardware architectures making it cumbersome and prohibitive to implement. Would we expect this to be the future of stereo 3D implementations or am I glossing over too many important details?
In my specific case we are using a .net OpenGL viewport control. I originally hoped that simply having stereo enabled hardware and drivers would be enough to enable stereo 3D.
Your assumptions are wrong. OpenGL does not "take geometry, lighting camera and texture information to render a 2D image". OpenGL takes commands to manipulate its state machine and commands to execute draw calls.
As Nobody mentions in his comment, the core profile does not even care about transformations at all. The only thing it really provides you with now is ways to provide arbitrary data to a vertex shader, and an arbitrary 3D cube to do rendering to. Wether that corresponds or not to the actual view, GL does not care, nor should it.
Mind you, some people have noticed that a driver can try to guess what's the view and what's not, and this is what the nvidia driver tries to do when doing automatic stereo rendering. This requires some specific guess-work, which amounts to actual analysis of game rendering to tweak the algorithms so that the driver guesses right. So it's typically a per-title, in-driver change. And some developers have noticed that the driver can guess wrong, and when that happens, it starts to get confusing. See some first-hand account of those questions.
I really recommend you read that presentation, because it makes some further points as to where the camera should be pointing towards (should the 2 view directions be parallel and such).
Also, It turns out that is essentially costs twice as much rendering for everything that is view dependent. Some developers (including, for example, the Crytek guys, see Part 2), figured out that to a great extent, you can do a single render, and fudge the picture with additional data to generate the left and right eye pictures.
The amount of saved work here is worth a lot by itself, for the developer to do this themselves.
Stereo 3D rendering is unfortunately more complex than just adding a lateral camera offset.
You can create stereo 3D from an original 'mono' rendered frame and the depth buffer. Given the range of (real world) depths in the scene, the depth buffer for each value tells you how far away the corresponding pixel would be. Given a desired eye separation value, you can slide each pixel left or right depending on distance. But...
Do you want parallel axis stereo (offset asymmetrical frustums) or 'toe in' stereo where the two cameras eventually converge? If the latter, you will want to tweak the camera angles scene by scene to avoid 'reversing' bits of geometry beyond the convergence point.
For objects very close to the viewer, the left and right eyes see quite different images of the same object, even down to the left eye seeing one side of the object and the right eye the other side - but the mono view will have averaged these out to just the front. If you want an accurate stereo 3D image, it really does have to be rendered from different eye viewpoints. Does this matter? FPS shooter game, probably not. Human surgery training simulator, you bet it does.
Similar problem if the viewer tilts their head to one side, so one eye is higher than the other. Again, probably not important for a game, really important for the surgeon.
Oh, and do you have anti-aliasing or transparency in the scene? Now you've got a pixel which really represents two pixel values at different depths. Move an anti-aliased pixel sideways and it probably looks worse because the 'underneath' color has changed. Move a mostly-transparent pixel sideways and the rear pixel will be moving too far.
And what do you do with gunsight crosses and similar HUD elements? If they were drawn with depth buffer disabled, the depth buffer values might make them several hundred metres away.
Given all these potential problems, OpenGL sensibly does not try to say how stereo 3D rendering should be done. In my experience modifying an OpenGL program to render in stereo is much less effort than writing it in the first place.
Shameless self promotion: this might help
http://cs.anu.edu.au/~Hugh.Fisher/3dteach/stereo3d-devel/index.html

Is it possible to reuse glsl vertex shader output later?

I have a huge mesh(100k triangles) that needs to be drawn a few times and blend together every frame. Is it possible to reuse the vertex shader output of the first pass of mesh, and skip the vertex stage on later passes? I am hoping to save some cost on the vertex pipeline and rasterization.
Targeted OpenGL 3.0, can use features like transform feedback.
I'll answer your basic question first, then answer your real question.
Yes, you can store the output of vertex transformation for later use. This is called Transform Feedback. It requires OpenGL 3.x-class hardware or better (aka: DX10-hardware).
The way it works is in two stages. First, you have to set your program up to have feedback-based varyings. You do this with glTransformFeedbackVaryings. This must be done before linking the program, in a similar way to things like glBindAttribLocation.
Once that's done, you need to bind buffers (given how you set up your transform feedback varyings) to GL_TRANSFORM_FEEDBACK_BUFFER with glBindBufferRange, thus setting up which buffers the data are written into. Then you start your feedback operation with glBeginTransformFeedback and proceed as normal. You can use a primitive query object to get the number of primitives written (so that you can draw it later with glDrawArrays), or if you have 4.x-class hardware (or AMD 3.x hardware, all of which supports ARB_transform_feedback2), you can render without querying the number of primitives. That would save time.
Now for your actual question: it's probably not going to help buy you any real performance.
You're drawing terrain. And terrain doesn't really get any transformation. Typically you have a matrix multiplication or two, possibly with normals (though if you're rendering for shadow maps, you don't even have that). That's it.
Odds are very good that if you shove 100,000 vertices down the GPU with such a simple shader, you've probably saturated the GPU's ability to render them all. You'll likely bottleneck on primitive assembly/setup, and that's not getting any faster.
So you're probably not going to get much out of this. Feedback is generally used for either generating triangle data for later use (effectively pseudo-compute shaders), or for preserving the results from complex transformations like matrix palette skinning with dual-quaternions and so forth. A simple matrix multiply-and-go will barely be a blip on the radar.
You can try it if you like. But odds are you won't have any problems. Generally, the best solution is to employ some form of deferred rendering, so that you only have to render an object once + X for every shadow it casts (where X is determined by the shadow mapping algorithm). And since shadow maps require different transforms, you wouldn't gain anything from feedback anyway.

OpenGL voxel engine slow

I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get decent fps on my 3GHz with ATI X1600... I'm all out of ideas.
When I have about 12000 cubes on the screen it falls to under 20fps - pathetic.
So far the optimizations I have are: frustum culling, back face culling (via OpenGL's glEnable(GL_CULL_FACE)), the engine draws only the visible faces (except the culled ones of course) and they're in an octree.
I've tried VBO's, I don't like them and they do not significantly increase the fps.
How can Minecraft's engine be so fast... I struggle with a 10000 cubes, whereas Minecraft can easily draw much more at higher fps.
Any ideas?
#genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d
I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.
Instead what you should do is generate faces for all of the outer visible surface, put that in a VBO, and send it to the card and continue to render that VBO until the geometry changes. This saves you a lot of the time your card is actually waiting on your processor to send it the geometry information.
You should profile your code to find out if the bottleneck in your application is on the CPU or GPU. For instance it might be that your culling/octtree algorithms are slow and in that case it is not an OpenGL-problem at all.
I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.
Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.
gDEBugger is a great tool that will help you find bottlenecks with OpenGL.
I don't know if it's ok here to "bump" an old question but a few things came up my mind:
If your voxels are static you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore you can also compile a static scene into a potential-visibility-set in the octree. The main principle of PVS is to precompute for evere node in the tree which other nodes are potential visible from it and store pointers to them in a vector. When it comes to rendering you first check in which node the camera is placed and then run frustum culling against all nodes in the PVS-vector of the node.(Carmack used something like that in the Quake engines, but with Binary Space Partitioning trees)
If the shading of your voxels is kindalike complex it is also fast to do a pre-Depth-Only-Pass, without writing into the colorbuffer,just to fill the Depthbuffer. After that you render a 2nd pass: disable writing to the Depthbuffer and render only to the Colorbuffer while checking the Depthbuffer. So you avoid expensive shader-computations which are later overwritten by a new fragment which is closer to the viewer.(Carmack used that in Quake3)
Another thing which will definitely speed up things is the use of Instancing. You store only the position of each voxel and, if nescessary, its scale and other parameters into a texturebufferobject. In the vertexshader you can then read the positions of the voxels to be spawned and create an instance of the voxel(i.e. a cube which is given to the shader in a vertexbufferobject). So you send the 8 Vertices + 8 Normals (3 *sizeof(float) *8 +3 *sizeof(float) *8 + floats for color/texture etc...) only once to the card in the VBO and then only the positions of the instances of the Cube(3*sizeof(float)*number of voxels) in the TBO.
Maybe it is possibile to parallelize things between GPU and CPU by combining all 3 steps in 2 threads, in the CPU-thread you check the octrees pvs and update a TBO for instancing in the next frame, the GPU-thread does meanwhile render the 2 passes while using an TBO for instancing which was created by the CPU thread in the previous step. After that you switch TBOs. If the Camera has not moved you don't even have to do the CPU-calculations again.
Another kind of tree you me be interested in is the so called k-d-tree, which is more general than octrees.
PS: sorry for my english, it's not the clearest....
There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.
Have you used a common display list for all your cubes ?
Do you skip calling drawing code of cubes which are not visible to the user ?