Imagine Im having my camera, and two squares in a 3D openGL context (using perspective) as follows :
From Top:
/
/
/ Square 1
Camera -> + V Square 2
\ | V
\ | |
\ |
So, what I will do is draw both using glBegin() and glEnd() and let the Z buffer do it's job. So far so good.
Now, Imagine I want to draw 1 million of those squares someones will be behind others of course. What will be faster, doing the last mentioned proccess for all or maybe I can make some math and discard the ones I DONT need to draw. Example:
if (should_I_Draw_It)
{
glBegin();
/*Draw*/
glEnd();
}
EDIT:
It's a dynamic scene, objects may be created, destroyed, moved and/or modified.
What you want to do is called occlusion culling. Simple algorithms are very inefficient on the CPU and they should be used only when there are big objects in the foreground and small objects in the background.
Nvidia describes in the GPU Gems Chapter 29 an efficient way for occlusion culling. You can try this to improve the efficiency of your rendering
Occlusion culling for so many dynamic objects would almost always be slower than just drawing everything. Your best bet in a dynamic scene may be to just do very simple view frutrum culling. The issue is that since you are only drawing boxes with 6 quads each, it probably is still faster to just draw them all instead of spending the time to decide if you should draw them.
Regardless the simplest test you can do is check to see if the box is directly behind/perpendicular to the camera relative to the direction you are looking and if it is far enough away (bounding radius) from the camera to not intersect it.
Modern graphics drivers will automatically optimize what it does/does not draw to some degree so you can rely on that to help a bit.
Related
Is there any difference in performance between drawing a scene with full triangles (GL_TRIANGLES) instead of just drawing their vertices (GL_POINTS), on modern hardware?
Where GL_POINTS is initialized like this:
glPointSize(1.0);
glDisable(GL_POINT_SMOOTH);
I have a somewhat low-end graphics card (9600gt) and drawing vertices-only can bring a 2x fps increase on certain sceneries. Not sure if it applies too on more recent gpus.
2x fps increase on
You lose 98% of picture and get only 2x fps increase. That's not impressive. If you take into account that you should be able to easily render 300..500 fps on any decent hardware (with vsync disabled and minor optimizations), that's probably not worth it.
Is there any difference in performance between drawing a scene with full triangles (GL_TRIANGLES) instead of just drawing their vertices (GL_POINTS), on modern hardware?
Well, if your scene has a LOT of alpha-blending and very "heavy" pixel shaders, then, obviously, displaying scene as point cloud will speed things up, because there's less pixels to fill.
On other hand, this kind of "optimization" will be completely useless for any practical task. I mean, if you're using blending and shaders, you probably wouldn't want to display your scene as pointlist in the first place, unless you're doing some kind of debug render (using glPolygonMode), and in case of debug render, you'll probably turn shaders off (because shaded/lit point will be hard to see) and disable lighting.
Even if you're using point sprites as particles or something, I'd stick with triangles - they give more control and do not have maximum size limit (compared to point sprites).
I can display more objects?
If you want more objects, you should probably try to optimzie things elsewhere first. If you stop trying to draw invisible objects (outside of field of view, etc), that'll be a start that can improve performance.
you have a mesh which is very far away from the camera. 1 million triangles and you know it is always in view. At this density ratio, triangles can't be bigger than a pixel,
When triangles are smaller than a pixel, and there are many of them, your mesh start looking like garbage and turns into pixelated mess of points. It will be ugly. Roughly same effect as when you disable mippimapping and texture filters and then render checkboard pattern. Using points instead of triangles might even aggravate effect.
: If you have 1mil triangle mesh that is always visible, you already need different kind of optimization. Reduce number of triangles (level of detail, dynamic tesselation or some solution that can simplify geometry on the fly), use bump mapping(maybe parallax mapping) to simulate extra geometry details that aren't even here, or even turn it into static background or a sprite. That'll work much better. Trying to render it using points will simply make it look ugly.
No, if the number of triangles is similar to the number of their shared vertices (considering the glDrawElements rendering command being used) in both modes the geometry-wise part of the rendering pipeline will be evaluated at roughly the same speed. The only benefit you can get from drawing GL_POINTS relies solely on the percentage of empty screen space you get from not drawing faces, thus only at fragment shader level.
I would like to implement a moving scene in OpenGL.
Scene description: terrain is static but all other objects are moving towards the -x axis.
Terrain is a plane in xz plane.
I have a mesh that will appear a lot of times on the terrain in several places.
But all of them will be moving towards -x axis at a specific speed.
I've thought of these possible implementations:
Create one mesh only and display it several times (I prefer this one)
Create several meshes, save them to a vector and then move them. After they leave the viewport, maybe destroy them?
The problem with the 1st way, is that I'll create meshes with a x% possibility, so this entails not knowing the number of meshes that will be needed. So how can I display them?
In example if I knew I would create 3 meshes I would do this:
glPushMatrix();
glTranslatef(mesh1 position + speed)
mesh.dray();
glPopMatrix();
glPushMatrix();
glTranslatef(mesh2 position + speed)
mesh.dray();
glPopMatrix();
glPushMatrix();
glTranslatef(mesh3 position + speed)
mesh.dray();
glPopMatrix();
Now in case we need to create meshes as long as the animation continues, how would I implement that? And secondly, what about the meshes that left the viewport? Do they continue to exist?
This answer is useless if you intend to code this in pure openGL.
However, if you are willing to try a 3rd party library, try www.ogre3d.org - I would find this really very easy to do in ogre.
In fact the challenges on 'intermediate tutorial one', if I remember correctly, should address the equivalent ogre concept of the problem you are having with openGL.
http://www.ogre3d.org/tikiwiki/tiki-index.php?page=Intermediate+Tutorial+1&structure=Tutorials
(Would've made this as a comment, only recently became active!)
Use option 2. Just dont delete them just move them back and use them again. Like for example if I wanted to count sheep... I wouldnt create 1,000,000 meshs of sheep I would create maybe 1 or 2 and just rotate between using those.
I'm writing a Minecraft like static 3d block world in C++ / openGL. I'm working at improving framerates, and so far I've implemented frustum culling using an octree. This helps, but I'm still seeing moderate to bad frame rates. The next step would be to cull cubes that are hidden from the viewpoint by closer cubes. However I haven't been able to find many resources on how to accomplish this.
Create a render target with a Z-buffer (or "depth buffer") enabled. Then make sure to sort all your opaque objects so they are rendered front to back, i.e. the ones closest to the camera first. Anything using alpha blending still needs to be rendered back to front, AFTER you rendered all your opaque objects.
Another technique is occlusion culling: You can cheaply "dry-render" your geometry and then find out how many pixels failed the depth test. There is occlusion query support in DirectX and OpenGL, although not every GPU can do it.
The downside is that you need a delay between the rendering and fetching the result - depending on the setup (like when using predicated tiling), it may be a full frame. That means that you need to be creative there, like rendering a bounding box that is bigger than the object itself, and dismissing the results after a camera cut.
And one more thing: A more traditional solution (that you can use concurrently with occlusion culling) is a room/portal system, where you define regions as "rooms", connected via "portals". If a portal is not visible from your current room, you can't see the room connected to it. And even it is, you can click your viewport to what's visible through the portal.
The approach I took in this minecraft level renderer is essentially a frustum-limited flood fill. The 16x16x128 chunks are split into 16x16x16 chunklets, each with a VBO with the relevant geometry. I start a floodfill in the chunklet grid at the player's location to find chunklets to render. The fill is limited by:
The view frustum
Solid chunklets - if the entire side of a chunklet is opaque blocks, then the floodfill will not enter the chunklet in that direction
Direction - the flood will not reverse direction, e.g.: if the current chunklet is to the north of the starting chunklet, do not flood into the chunklet to the south
It seems to work OK. I'm on android, so while a more complex analysis (antiportals as noted by Mike Daniels) would cull more geometry, I'm already CPU-limited so there's not much point.
I've just seen your answer to Alan: culling is not your problem - it's what and how you're sending to OpenGL that is slow.
What to draw: don't render a cube for each block, render the faces of transparent blocks that border an opaque block. Consider a 3x3x3 cube of, say, stone blocks: There is no point drawing the center block because there is no way that the player can see it. Likewise, the player will never see the faces between two adjacent stone blocks, so don't draw them.
How to draw: As noted by Alan, use VBOs to batch geometry. You will not believe how much faster they make things.
An easier approach, with minimal changes to your existing code, would be to use display lists. This is what minecraft uses.
How many blocks are you rendering and on what hardware? Modern hardware is very fast and is very difficult to overwhelm with geometry (unless we're talking about a handheld platform). On any moderately recent desktop hardware you should be able to render hundreds of thousands of cubes per frame at 60 frames per second without any fancy culling tricks.
If you're drawing each block with a separate draw call (glDrawElements/Arrays, glBegin/glEnd, etc) (bonus points: don't use glBegin/glEnd) then that will be your bottleneck. This is a common pitfall for beginners. If you're doing this, then you need to batch together all triangles that share texture and shading parameters into a single call for each setup. If the geometry is static and doesn't change frame to frame, you want to use one Vertex Buffer Object for each batch of triangles.
This can still be combined with frustum culling with an octree if you typically only have a small portion of your total game world in the view frustum at one time. The vertex buffers are still loaded statically and not changed. Frustum cull the octree to generate only the index buffers for the triangles in the frustum and upload those dynamically each frame.
If you have surfaces close to the camera, you can create a frustum which represents an area that is not visible, and cull objects that are entirely contained in that frustum. In the diagram below, C is the camera, | is a flat surface near the camera, and the frustum-shaped region composed of . represents the occluded area. The surface is called an antiportal.
.
..
...
....
|....
|....
|....
|....
C |....
|....
|....
|....
....
...
..
.
(You should of course also turn on depth testing and depth writing as mentioned in other answer and comments -- it's very simple to do in OpenGL.)
The use of a Z-Buffer ensures that polygons overlap correctly.
Enabling the depth test makes every drawing operation check the Z-buffer before placing pixels onto the screen.
If you have convex objects you must (for performance) enable backface culling!
Example code:
glEnable(GL_CULL_FACE);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
You can change the behaviour of glCullFace() passing GL_FRONT or GL_BACK...
glCullFace(...);
// Draw the "game world"...
I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get decent fps on my 3GHz with ATI X1600... I'm all out of ideas.
When I have about 12000 cubes on the screen it falls to under 20fps - pathetic.
So far the optimizations I have are: frustum culling, back face culling (via OpenGL's glEnable(GL_CULL_FACE)), the engine draws only the visible faces (except the culled ones of course) and they're in an octree.
I've tried VBO's, I don't like them and they do not significantly increase the fps.
How can Minecraft's engine be so fast... I struggle with a 10000 cubes, whereas Minecraft can easily draw much more at higher fps.
Any ideas?
#genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d
I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.
Instead what you should do is generate faces for all of the outer visible surface, put that in a VBO, and send it to the card and continue to render that VBO until the geometry changes. This saves you a lot of the time your card is actually waiting on your processor to send it the geometry information.
You should profile your code to find out if the bottleneck in your application is on the CPU or GPU. For instance it might be that your culling/octtree algorithms are slow and in that case it is not an OpenGL-problem at all.
I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.
Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.
gDEBugger is a great tool that will help you find bottlenecks with OpenGL.
I don't know if it's ok here to "bump" an old question but a few things came up my mind:
If your voxels are static you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore you can also compile a static scene into a potential-visibility-set in the octree. The main principle of PVS is to precompute for evere node in the tree which other nodes are potential visible from it and store pointers to them in a vector. When it comes to rendering you first check in which node the camera is placed and then run frustum culling against all nodes in the PVS-vector of the node.(Carmack used something like that in the Quake engines, but with Binary Space Partitioning trees)
If the shading of your voxels is kindalike complex it is also fast to do a pre-Depth-Only-Pass, without writing into the colorbuffer,just to fill the Depthbuffer. After that you render a 2nd pass: disable writing to the Depthbuffer and render only to the Colorbuffer while checking the Depthbuffer. So you avoid expensive shader-computations which are later overwritten by a new fragment which is closer to the viewer.(Carmack used that in Quake3)
Another thing which will definitely speed up things is the use of Instancing. You store only the position of each voxel and, if nescessary, its scale and other parameters into a texturebufferobject. In the vertexshader you can then read the positions of the voxels to be spawned and create an instance of the voxel(i.e. a cube which is given to the shader in a vertexbufferobject). So you send the 8 Vertices + 8 Normals (3 *sizeof(float) *8 +3 *sizeof(float) *8 + floats for color/texture etc...) only once to the card in the VBO and then only the positions of the instances of the Cube(3*sizeof(float)*number of voxels) in the TBO.
Maybe it is possibile to parallelize things between GPU and CPU by combining all 3 steps in 2 threads, in the CPU-thread you check the octrees pvs and update a TBO for instancing in the next frame, the GPU-thread does meanwhile render the 2 passes while using an TBO for instancing which was created by the CPU thread in the previous step. After that you switch TBOs. If the Camera has not moved you don't even have to do the CPU-calculations again.
Another kind of tree you me be interested in is the so called k-d-tree, which is more general than octrees.
PS: sorry for my english, it's not the clearest....
There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.
Have you used a common display list for all your cubes ?
Do you skip calling drawing code of cubes which are not visible to the user ?
I'm loading 3D objects (obj or 3ds or collada files) into my openGL application. The 3 environment is quite large (a few hundred metres in all axis').
My problem is that smaller 3D objects (i.e. in the order of ~< 1-2m ) don't appear to be depth-tested properly. Depending on the zoom of the camera, I can sometimes see the back face of the object (I have been using a simple cube for testing) or other faces becoming visible/invisible/torn. Please see the attached images for a better explanation.
I am led to believe the problem is due to mipmapping being enabled. I would either like to disable mipmapping (can someone suggest a simple, fast way to do this) or set the resolution to be greater for the mipmapped objects. Or am I barking up the wrong tree completely?
Thanks
Chris
That's the result of insufficient z-buffer precission, which is an issue in games that have huge worlds but (relatively) small objects. The immediate solution would be to try using a 24 bit z-buffer instead of a 16 bit one. Another way to tackle this would be to render the game world it two steps, first the big distant objects, then clearing the zbuffer and then drawing the closer objects.
This specific problem is called z-fighting by the way, here's a great resource on this issue: http://www.codermind.com/articles/Depth-buffer-tutorial.html
The take-away is the last paragraph of the article above:
the true issue is that you can't draw
both objects that are very far and
objects that are very near with the
same depth buffer equations. If you
want to draw very far objects then you
need to sacrifice your near view by
pushing it further. To avoid clipping
artifacts you can make your collision
envelope large enough so that your
clip plane will never intercept an
existing object within your frustum.
Or you can make object gradually
disappear with transparency as they
come near your clip plane.
If you want to keep near objects and
at the same time draw mountains (or
planets) in the far distance, then you
can cut your rendering in parts. First
drawing your far objects, then
clearing the depth buffer and
rendering the near objects with a
different z buffer.
Like Julio, I believe that this is a depth precision issues, not something related to mip-mapping. However, I suggest you start by adjusting your near and far clipping plane before changing anything else (You are probably already using a 24-bit depth buffer anyways, as that is the default on most drivers/cards). Particularly the near plane should be as far away as possible for your scene. Look for calls to glFrustum or gluPerspective.