3d Occlusion Culling - opengl

I'm writing a Minecraft-like static 3D block world in C++/OpenGL. I'm working on improving frame rates, and so far I've implemented frustum culling using an octree. This helps, but I'm still seeing moderate to bad frame rates. The next step would be to cull cubes that are hidden from the viewpoint by closer cubes. However, I haven't been able to find many resources on how to accomplish this.

Create a render target with a Z-buffer (or "depth buffer") enabled. Then make sure to sort all your opaque objects so they are rendered front to back, i.e. the ones closest to the camera first. Anything using alpha blending still needs to be rendered back to front, AFTER you rendered all your opaque objects.
Another technique is occlusion culling: you can cheaply "dry-render" your geometry (or its bounding box) and then find out how many pixels passed the depth test; if none did, the object is hidden. There is occlusion query support in DirectX and OpenGL, although not every GPU can do it.
The downside is that you need a delay between the rendering and fetching the result - depending on the setup (like when using predicated tiling), it may be a full frame. That means that you need to be creative there, like rendering a bounding box that is bigger than the object itself, and dismissing the results after a camera cut.
And one more thing: a more traditional solution (that you can use concurrently with occlusion culling) is a room/portal system, where you define regions as "rooms", connected via "portals". If a portal is not visible from your current room, you can't see the room connected to it. And even if it is, you can clip your viewport to what's visible through the portal.
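The draw order described in this answer could be sketched as follows. This is a minimal illustration, not code from the answer: the `Drawable` struct and `sortForDrawing` function are hypothetical names, and distance is measured from the camera to each object's centre.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical renderable: a world-space centre plus a flag for alpha blending.
struct Drawable {
    int id;
    Vec3 centre;
    bool transparent;
};

static float distSq(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Reorders `items` so that opaque objects come first, nearest to farthest,
// followed by transparent objects, farthest to nearest.
void sortForDrawing(std::vector<Drawable>& items, const Vec3& camera) {
    auto firstTransparent = std::stable_partition(
        items.begin(), items.end(),
        [](const Drawable& d) { return !d.transparent; });
    std::sort(items.begin(), firstTransparent,
              [&](const Drawable& a, const Drawable& b) {
                  return distSq(a.centre, camera) < distSq(b.centre, camera);
              });
    std::sort(firstTransparent, items.end(),
              [&](const Drawable& a, const Drawable& b) {
                  return distSq(a.centre, camera) > distSq(b.centre, camera);
              });
}
```

Drawing the opaque part in this order lets the depth test reject most hidden fragments early; the transparent tail still has to be blended back to front.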

The approach I took in this minecraft level renderer is essentially a frustum-limited flood fill. The 16x16x128 chunks are split into 16x16x16 chunklets, each with a VBO with the relevant geometry. I start a floodfill in the chunklet grid at the player's location to find chunklets to render. The fill is limited by:
The view frustum
Solid chunklets - if the entire side of a chunklet is opaque blocks, then the floodfill will not enter the chunklet in that direction
Direction - the flood will not reverse direction, e.g.: if the current chunklet is to the north of the starting chunklet, do not flood into the chunklet to the south
It seems to work OK. I'm on Android, so while a more complex analysis (antiportals, as noted by Mike Daniels) would cull more geometry, I'm already CPU-limited, so there's not much point.
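A rough sketch of such a frustum-limited flood fill is shown here. It is not the renderer's actual code: `Grid`, `Chunklet`, `solidSide` and the `inFrustum` callback are hypothetical names, and the frustum test itself is left to the caller.

```cpp
#include <array>
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

struct Cell { int x, y, z; };

// Hypothetical chunklet record: solidSide[f] is true when every block on face f
// of the chunklet is opaque. Faces are ordered -x, +x, -y, +y, -z, +z.
struct Chunklet {
    std::array<bool, 6> solidSide{};
};

struct Grid {
    int nx, ny, nz;
    std::vector<Chunklet> cells;  // nx * ny * nz entries
    int index(const Cell& c) const { return (c.z * ny + c.y) * nx + c.x; }
    bool contains(const Cell& c) const {
        return c.x >= 0 && c.x < nx && c.y >= 0 && c.y < ny && c.z >= 0 && c.z < nz;
    }
};

// Breadth-first fill from `start`; returns the chunklets that should be drawn.
std::vector<Cell> floodVisible(const Grid& grid, const Cell& start,
                               const std::function<bool(const Cell&)>& inFrustum) {
    static const int step[6][3] = {{-1, 0, 0}, {1, 0, 0}, {0, -1, 0},
                                   {0, 1, 0},  {0, 0, -1}, {0, 0, 1}};
    std::vector<Cell> visible;
    if (!grid.contains(start)) return visible;
    std::vector<bool> seen(grid.cells.size(), false);
    std::queue<Cell> open;
    seen[grid.index(start)] = true;
    open.push(start);
    while (!open.empty()) {
        const Cell cur = open.front();
        open.pop();
        visible.push_back(cur);
        const int offset[3] = {cur.x - start.x, cur.y - start.y, cur.z - start.z};
        for (int f = 0; f < 6; ++f) {
            const int axis = f / 2;
            // Direction rule: never step back towards the start along an axis.
            if (offset[axis] * step[f][axis] < 0) continue;
            const Cell next{cur.x + step[f][0], cur.y + step[f][1], cur.z + step[f][2]};
            if (!grid.contains(next) || seen[grid.index(next)]) continue;
            // Leaving `cur` through face f enters `next` through the opposite face.
            if (grid.cells[grid.index(next)].solidSide[f ^ 1]) continue;
            if (!inFrustum(next)) continue;
            seen[grid.index(next)] = true;
            open.push(next);
        }
    }
    return visible;
}
```

A chunklet is only marked as seen once the fill actually enters it, so one that is blocked from one side can still be reached through another.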
I've just seen your answer to Alan: culling is not your problem - it's what and how you're sending to OpenGL that is slow.
What to draw: don't render a cube for each block; render only the faces where an opaque block borders a transparent one (such as air). Consider a 3x3x3 cube of, say, stone blocks: there is no point drawing the center block because there is no way that the player can see it. Likewise, the player will never see the faces between two adjacent stone blocks, so don't draw them.
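To make the saving concrete, here is a small sketch that counts the faces worth drawing. The `Volume` type is a hypothetical dense block grid, and everything outside it is assumed to be air.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dense block volume: true means an opaque block, false means air.
struct Volume {
    int nx, ny, nz;
    std::vector<bool> solid;  // nx * ny * nz entries
    bool at(int x, int y, int z) const {
        // Anything outside the volume counts as air, so boundary faces are kept.
        if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz) return false;
        return solid[(z * ny + y) * nx + x];
    }
};

// Counts the faces that separate an opaque block from a transparent neighbour.
int countVisibleFaces(const Volume& v) {
    static const int dir[6][3] = {{-1, 0, 0}, {1, 0, 0}, {0, -1, 0},
                                  {0, 1, 0},  {0, 0, -1}, {0, 0, 1}};
    int faces = 0;
    for (int z = 0; z < v.nz; ++z)
        for (int y = 0; y < v.ny; ++y)
            for (int x = 0; x < v.nx; ++x) {
                if (!v.at(x, y, z)) continue;
                for (const auto& d : dir)
                    if (!v.at(x + d[0], y + d[1], z + d[2])) ++faces;
            }
    return faces;
}
```

For the 3x3x3 stone cube this gives 54 faces instead of the 162 that drawing all 27 cubes in full would submit. A real mesher would emit a quad into the vertex data at the point where this sketch increments the counter.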
How to draw: As noted by Alan, use VBOs to batch geometry. You will not believe how much faster they make things.
An easier approach, with minimal changes to your existing code, would be to use display lists. This is what Minecraft uses.

How many blocks are you rendering and on what hardware? Modern hardware is very fast and is very difficult to overwhelm with geometry (unless we're talking about a handheld platform). On any moderately recent desktop hardware you should be able to render hundreds of thousands of cubes per frame at 60 frames per second without any fancy culling tricks.
If you're drawing each block with a separate draw call (glDrawElements, glDrawArrays, glBegin/glEnd, etc.), then that will be your bottleneck (and, for bonus points, don't use glBegin/glEnd at all). This is a common pitfall for beginners. If you're doing this, then you need to batch together all triangles that share texture and shading parameters into a single call for each setup. If the geometry is static and doesn't change from frame to frame, you want to use one Vertex Buffer Object for each batch of triangles.
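As an illustration of the batching step, the sketch below groups quads by texture on the CPU so that each group can go into one vertex buffer and be drawn with one call. `Quad`, `Batch` and `buildBatches` are hypothetical names, and the actual buffer upload and draw call are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

struct Vertex { float x, y, z, u, v; };

// Hypothetical face record: four corners plus the texture it uses.
struct Quad {
    std::uint32_t textureId;
    Vertex corner[4];
};

// One batch per texture: all its vertices and indices, ready for a single
// vertex buffer upload and a single indexed draw call.
struct Batch {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;
};

std::map<std::uint32_t, Batch> buildBatches(const std::vector<Quad>& quads) {
    std::map<std::uint32_t, Batch> batches;
    for (const Quad& q : quads) {
        Batch& b = batches[q.textureId];
        const std::uint32_t base = static_cast<std::uint32_t>(b.vertices.size());
        b.vertices.insert(b.vertices.end(), q.corner, q.corner + 4);
        // Two triangles per quad: (0, 1, 2) and (0, 2, 3).
        for (std::uint32_t i : {0u, 1u, 2u, 0u, 2u, 3u}) b.indices.push_back(base + i);
    }
    return batches;
}
```

With static geometry this only has to run when the world changes; each frame then costs one bind and one draw per texture rather than one per block.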
This can still be combined with frustum culling with an octree if you typically only have a small portion of your total game world in the view frustum at one time. The vertex buffers are still loaded statically and not changed. Frustum cull the octree to generate only the index buffers for the triangles in the frustum and upload those dynamically each frame.

If you have surfaces close to the camera, you can create a frustum which represents an area that is not visible, and cull objects that are entirely contained in that frustum. In the diagram below, C is the camera, | is a flat surface near the camera, and the frustum-shaped region composed of . represents the occluded area. The surface is called an antiportal.
        .
       ..
      ...
     ....
    |....
    |....
    |....
    |....
C   |....
    |....
    |....
    |....
     ....
      ...
       ..
        .
(You should of course also turn on depth testing and depth writing as mentioned in the other answers and comments -- it's very simple to do in OpenGL.)
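A minimal sketch of the antiportal test follows, under simplifying assumptions that are not part of the answer: the occluder is an axis-aligned rectangle in a plane x = planeX, and the camera sits on the x < planeX side. All type and function names are hypothetical.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
struct Box { Vec3 min, max; };

// Hypothetical occluder: a rectangle in the plane x = planeX covering
// [minY, maxY] x [minZ, maxZ]. The camera is assumed to sit at x < planeX.
struct Antiportal {
    float planeX, minY, maxY, minZ, maxZ;
};

// True when the segment from the camera to p passes through the rectangle.
bool pointOccluded(const Antiportal& a, const Vec3& camera, const Vec3& p) {
    if (p.x <= a.planeX) return false;  // not behind the occluder
    const float t = (a.planeX - camera.x) / (p.x - camera.x);
    const float y = camera.y + t * (p.y - camera.y);
    const float z = camera.z + t * (p.z - camera.z);
    return y >= a.minY && y <= a.maxY && z >= a.minZ && z <= a.maxZ;
}

// The occluded region is convex, so a box lies entirely inside it exactly
// when all eight of its corners do.
bool boxOccluded(const Antiportal& a, const Vec3& camera, const Box& b) {
    for (int i = 0; i < 8; ++i) {
        const Vec3 corner{(i & 1) ? b.max.x : b.min.x,
                          (i & 2) ? b.max.y : b.min.y,
                          (i & 4) ? b.max.z : b.min.z};
        if (!pointOccluded(a, camera, corner)) return false;
    }
    return true;
}
```

An arbitrarily oriented occluder would replace the ray-rectangle test with plane tests against the frustum built from the camera position and the occluder's edges.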

The use of a Z-Buffer ensures that polygons overlap correctly.
Enabling the depth test makes every drawing operation check the Z-buffer before placing pixels onto the screen.
If you have convex objects you should also enable backface culling, for performance!
Example code:
glEnable(GL_CULL_FACE);
// glCullFace() selects which faces are culled: GL_BACK (the default) or GL_FRONT.
glCullFace(GL_BACK);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
// Draw the "game world"...

Related

Terrain Object collision detection

I've written my own 3D Game Engine in the past few years and wanted to actually use it for a game.
I stumbled across the following problem:
I have multiple planes in my game, but let's talk about one single plane.
Naturally, planes are not able to dive into the ground and fly under the terrain.
Therefore, I need to implement something that detects the collision between a plane/jet and my ground.
The information given is the following:
Grid of terrain [2-dimensional array; stores the height at the corresponding x,z coordinate]
Hitbox of my plane (it moves with my plane, so the bounds etc. are all already calculated and given)
So about the hitboxes:
I thought about which method to use. The best one in terms of performance seems to be simple spheres with different radii.
About the ground: graphically, the ground is subdivided into triangles.
So what I need now is the optimal type of hitbox (sphere, AABB, ...) and the most efficient calculations to go with it.
My attempt was to get every surrounding triangle and calculate the distance from it to the center of each of my hitbox spheres. If the distance is less than the radius, a collision has been detected. But when I have up to 10 or 20 spheres in my plane and about 100 triangles to check, it takes too much time.
Another attempt was to get the vertical distance to the ground from each hitbox sphere. This one needs far fewer calculations but fails near steep surfaces.
I would be very happy if someone could help me implement an efficient version of plane/terrain collision detection :)
1. render terrain
Maybe you could try a linear depth buffer to improve accuracy.
2. read depth texture
You can use glReadPixels with GL_DEPTH_COMPONENT and GL_FLOAT. That will copy the depth buffer into CPU-side memory, so you can now also do the collision test, or any other computation related to the ground in view, on the CPU side.
3. use the depth buffer as texture
Copy it back to the GPU with glTexImage2D. I know this is slow (but most likely much faster than your current collision computation). If you are not using Intel HD Graphics, you can replace #2 and #3 with an FBO that renders the depth buffer directly to a texture. On Intel this does not work reliably (or at all).
4. now render your objects (off screen) with GLSL
Inside the fragment shader, just compare the rendered position with the depth (attached as a texture). If it is below, output the collision somewhere. If this is done in compute shaders, you can store the results in some texture. Or you could use some attachment or FBO for this.
In case you cannot use an FBO, you could render to the "screen" with specifically color-encoded collisions. Then read it with glReadPixels and scan it to handle whatever collision logic you have on the CPU side.
Do not write to the depth buffer in this pass! Also do not use CULL_FACE, because that could miss a collision on the back side of your object.
5. now render the objects normally
If you do not render in #4, or you encode collisions into the screen buffer, you need to overwrite/render the stuff. Otherwise this step is not needed. But rendering after collision detection is good, because in case of a collision you most likely change the object position/orientation/mesh, and an already rendered object could hinder the altered one.
[Notes]
Copying image between CPU and GPU is slow so use FBO and render to texture if you can instead.
If you are not familiar with multiple pass rendering see some QAs for inspiration:
OpenGL Scale Single Pixel Line
Render filled complex polygons with large number of vertices with OpenGL
This works only for what is in view, but you can also do a dedicated collision rendering pass (per object): render with the camera looking straight down (bird's-eye view) and covering only the area around your object. You do not need a high resolution for this, so it should be relatively fast. You can also divide your screen into square areas (using glViewport) to test several objects in a single frame, which lowers the sync slowdowns as much as possible (fewer glReadPixels calls). You also do not need any vertex colors or textures for this.
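Depth values read back with glReadPixels are non-linear under a perspective projection, so they are awkward to compare with object heights directly. The conversion below is a sketch that assumes a standard perspective projection matrix and the default depth range of 0 to 1; `linearizeDepth` is a hypothetical helper name. With a top-down orthographic camera the stored depth is already linear and no conversion is needed.

```cpp
#include <cassert>
#include <cmath>

// Converts a depth-buffer value d in [0, 1] (as read with GL_DEPTH_COMPONENT)
// to the positive eye-space distance along the view axis.
// Assumes a standard perspective projection and the default depth range.
float linearizeDepth(float d, float zNear, float zFar) {
    const float ndc = 2.0f * d - 1.0f;  // back to normalized device coordinates
    return (2.0f * zNear * zFar) / (zFar + zNear - ndc * (zFar - zNear));
}
```

A stored depth of 0 maps to the near plane and 1 maps to the far plane; most of the precision sits close to the near plane.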

Frustum culling with VBOs

I am planning on writing a 3D game that will be using VBOs for rendering. Let's say, for example, that the terrain is a set of tiles and their vertices are all in the same VBO. The player should be able to scroll through the tiles, and at all times would see only a part of them.
I would like to perform frustum culling on those tiles. I have already found some sources on the maths part of frustum culling, but I am not sure how I would go about implementing this with a VBO: do people do that somehow in the vertex shader, or do they just call the rendering function to draw a subset of the VBO?
Given that your camera acts like in Diablo (whether isometric or with perspective):
If you have a fixed map size, you can use one VBO for the base geometry of your map, assuming you will use a heightmap-based solution. The quads that are not visible will be discarded by your graphics card after the vertex shader, without affecting your pixel fill rate. They are not worth the overhead of culling on your side. Details like rocks, houses, etc. will have their own VBOs anyway.
If you aim for a streaming content engine with a huge seamless world, create chunks; the size of a chunk depends on your game. Divide your terrain into those chunks and test the camera frustum against their bounding boxes before drawing.
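The bounding-box test for chunks can be sketched like this. It assumes the six frustum planes have already been extracted with normals pointing inwards; `Plane`, `Frustum` and `boxIntersectsFrustum` are hypothetical names.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Box { Vec3 min, max; };

// Plane in the form dot(n, p) + d = 0, with n pointing into the frustum.
struct Plane { Vec3 n; float d; };

using Frustum = std::array<Plane, 6>;

// Returns false only when the box lies completely outside at least one plane.
// This is conservative: a box near a frustum corner may be reported as
// visible although it is not, which costs a draw call but never hides a chunk.
bool boxIntersectsFrustum(const Frustum& f, const Box& b) {
    for (const Plane& p : f) {
        // Pick the box corner that lies farthest along the plane normal.
        const Vec3 v{p.n.x >= 0 ? b.max.x : b.min.x,
                     p.n.y >= 0 ? b.max.y : b.min.y,
                     p.n.z >= 0 ? b.max.z : b.min.z};
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0) return false;
    }
    return true;
}
```

Each chunk that passes is then drawn from its own VBO, or as a sub-range of a shared one.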
About drawing chunks:
The simplest way, which is enough for most games, is to give each chunk its own geometry, VBO, and so on. You can optimize later, and your terrain implementation should not drive your engine API design (you will have to implement many different ways to draw things in your engine, for instance particles, post-processing effects, etc.).
One way to optimize is to use only one VBO for the geometry together with instanced drawing. Just like in particle systems, you then take some of your data, such as the global transformation and the height of each vertex, from another source.
But keep in mind, most games don't really need that much optimization in just the terrain. Other systems more worthy of optimization will come across your path.

How to efficiently implement "point-on-heightmap" picking in OpenGL?

I want to track the mouse coordinates in my OpenGL scene on the ground surface of the world which is modeled as a height map. Currently there is no fancy stuff like hardware tessellation. Note that this question is not about object picking.
Currently I'm doing the following which is clearly dropping the performance because of a read-back operation:
Render the world (the ground surface)
Read back the depth value at the mouse coordinates
Render the rest of the scene
Swap buffers and render the next frame
The read back is between the two render steps because I want the depth value of the ground surface without any objects in front of it. It is done using the following command:
GLfloat depth;
glReadPixels(x, y, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &depth);
My application limits the frame rate to 60 frames per second. When rendering the scene without the read back operation, I experience a CPU usage of less than 5%, but when doing the read back, it increases to about 75% although I'm not doing much to render the scene or update any game model or such things.
A temporary solution is to cache the depth value of the pixel under the mouse and update it only every 5th or 10th frame, which brings the CPU usage back down below 10%. But this clearly can't be the best solution to the problem.
How can I implement picking (not object picking since I want the (floating point) coordinates on the surface) efficiently?
I already thought of reading back the depth value of the front buffer instead of the back buffer, but when googling how to do so, I only find people saying that glRead* methods are best avoided altogether. But how can I read something (do picking) without reading something (using glRead*)?
I'm confused. How do other people implement picking?
A totally different approach would be implementing the world surface picking in software. It should be no big deal to reconstruct a 3D ray from the camera "into the depth", representing the points in space which are rendered at the target pixel. Then I could implement an intersection algorithm to find the front-most point on the surface.
You typically implement it on the CPU! Find your picking ray in heightmap coordinates and do a simple line-trace across the heightmap. This is very similar to line-drawing. In each cell you intersect, test against the triangles you used to triangulate it.
It is important to avoid reading from the GPU until it's done. Since you normally schedule drawing commands several frames ahead (GL does this automatically), this means that you will also only get the results then, or stall the CPU until the GPU has caught up. But don't do that for simple things like this!
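A simplified CPU-side sketch of this idea is given below. Note that it swaps the exact cell-by-cell walk with per-triangle tests for something shorter: it marches along the ray in fixed steps, samples a bilinearly interpolated height, and refines the first hit by bisection. `Heightmap` and `pickHeightmap` are hypothetical names, y is taken as up, and the map is assumed to have at least 2x2 samples.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical heightmap: heights at integer (x, z) grid points, y is up.
struct Heightmap {
    int width, depth;
    std::vector<float> h;  // width * depth samples, row-major in z
    bool inside(float x, float z) const {
        return x >= 0 && z >= 0 && x <= width - 1 && z <= depth - 1;
    }
    // Bilinear interpolation between the four surrounding samples.
    float heightAt(float x, float z) const {
        int x0 = static_cast<int>(x), z0 = static_cast<int>(z);
        if (x0 >= width - 1) x0 = width - 2;
        if (z0 >= depth - 1) z0 = depth - 2;
        const float fx = x - x0, fz = z - z0;
        const float a = h[z0 * width + x0], b = h[z0 * width + x0 + 1];
        const float c = h[(z0 + 1) * width + x0], d = h[(z0 + 1) * width + x0 + 1];
        return (a * (1 - fx) + b * fx) * (1 - fz) + (c * (1 - fx) + d * fx) * fz;
    }
    bool below(const Vec3& p) const { return inside(p.x, p.z) && p.y <= heightAt(p.x, p.z); }
};

static Vec3 along(const Vec3& o, const Vec3& d, float t) {
    return Vec3{o.x + d.x * t, o.y + d.y * t, o.z + d.z * t};
}

// Walks along origin + t * dir (dir assumed normalised) and reports the first
// point found on or below the surface, refined by bisection.
bool pickHeightmap(const Heightmap& map, const Vec3& origin, const Vec3& dir,
                   float maxDist, Vec3& hit) {
    const float step = 0.25f;  // a quarter of a cell; thin ridges may be missed
    float prevT = 0.0f;
    for (float t = 0.0f; t <= maxDist; t += step) {
        if (map.below(along(origin, dir, t))) {
            float lo = prevT, hi = t;
            for (int i = 0; i < 16; ++i) {
                const float mid = 0.5f * (lo + hi);
                if (map.below(along(origin, dir, mid))) hi = mid; else lo = mid;
            }
            hit = along(origin, dir, hi);
            return true;
        }
        prevT = t;
    }
    return false;
}
```

The picking ray itself would come from unprojecting the mouse position with the camera matrices; that part is not shown here.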

OpenGL voxel engine slow

I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get decent fps on my 3GHz with ATI X1600... I'm all out of ideas.
When I have about 12000 cubes on the screen it falls to under 20fps - pathetic.
So far the optimizations I have are: frustum culling, back face culling (via OpenGL's glEnable(GL_CULL_FACE)), the engine draws only the visible faces (except the culled ones of course) and they're in an octree.
I've tried VBOs; I don't like them, and they do not significantly increase the fps.
How can Minecraft's engine be so fast? I struggle with 10000 cubes, whereas Minecraft can easily draw many more at a higher fps.
Any ideas?
@genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d.
I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.
Instead what you should do is generate faces for all of the outer visible surface, put that in a VBO, and send it to the card and continue to render that VBO until the geometry changes. This saves you a lot of the time your card is actually waiting on your processor to send it the geometry information.
You should profile your code to find out whether the bottleneck in your application is on the CPU or the GPU. For instance, it might be that your culling/octree algorithms are slow, in which case it is not an OpenGL problem at all.
I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.
Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.
gDEBugger is a great tool that will help you find bottlenecks with OpenGL.
I don't know if it's ok here to "bump" an old question but a few things came up my mind:
If your voxels are static, you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore, you can also compile a static scene into a potentially visible set (PVS) in the octree. The main principle of a PVS is to precompute, for every node in the tree, which other nodes are potentially visible from it, and to store pointers to them in a vector. When it comes to rendering, you first check which node the camera is in and then run frustum culling against all nodes in the PVS vector of that node. (Carmack used something like that in the Quake engines, but with Binary Space Partitioning trees.)
If the shading of your voxels is fairly complex, it is also fast to do a depth-only pre-pass, without writing to the color buffer, just to fill the depth buffer. After that you render a second pass: disable writing to the depth buffer and render only to the color buffer while testing against the depth buffer. That way you avoid expensive shader computations that are later overwritten by a new fragment closer to the viewer. (Carmack used that in Quake3.)
Another thing which will definitely speed things up is the use of instancing. You store only the position of each voxel and, if necessary, its scale and other parameters in a texture buffer object (TBO). In the vertex shader you can then read the positions of the voxels to be spawned and create an instance of the voxel (i.e. a cube which is given to the shader in a vertex buffer object). So you send the 8 vertices + 8 normals (3 * sizeof(float) * 8 + 3 * sizeof(float) * 8 + floats for color/texture etc.) only once to the card in the VBO, and then only the positions of the instances of the cube (3 * sizeof(float) * number of voxels) in the TBO.
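The size estimate in the instancing paragraph can be worked through in a few lines. This is only the arithmetic from that paragraph, with colour and texture attributes left out; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Bytes for one cube's shared geometry: 8 positions + 8 normals, 3 floats each
// (colour and texture attributes are left out of this estimate).
constexpr std::size_t cubeGeometryBytes() {
    return 3 * sizeof(float) * 8 + 3 * sizeof(float) * 8;
}

// Instance data: one 3-float position per voxel.
constexpr std::size_t instanceBytes(std::size_t voxelCount) {
    return 3 * sizeof(float) * voxelCount;
}

// For comparison: sending the full cube geometry once per voxel.
constexpr std::size_t naiveBytes(std::size_t voxelCount) {
    return cubeGeometryBytes() * voxelCount;
}
```

With 4-byte floats, the cube geometry is 192 bytes. For the 12000 cubes mentioned in the question, the instance positions take 144,000 bytes, compared with 2,304,000 bytes if every cube's geometry were sent in full.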
Maybe it is possible to parallelize things between the GPU and the CPU by combining all 3 steps in 2 threads: in the CPU thread you check the octree's PVS and update a TBO for instancing in the next frame, while the GPU thread renders the 2 passes using a TBO for instancing that was created by the CPU thread in the previous step. After that you switch TBOs. If the camera has not moved, you don't even have to do the CPU calculations again.
Another kind of tree you may be interested in is the so-called k-d tree, which is more general than an octree.
PS: sorry for my english, it's not the clearest....
There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.
Have you used a common display list for all your cubes?
Do you skip calling the drawing code for cubes which are not visible to the user?

OpenGL 2D game question

I want to make a game with Worms-like destructible terrain in 2D, using OpenGL.
What is the best approach for this?
Draw pixel per pixel? (Uh, not good?)
Have the world as a texture and manipulate it (is that possible?)
Thanks in advance
Thinking about the way Worms terrain looked, I came up with this idea. But I'm not sure how you would implement it in OpenGL. It's more of a layered 2D drawing approach. I'm posting the idea anyway. I've emulated the approach using Paint.NET.
First, you have a background sky layer.
And you have a terrain layer.
The terrain layer is masked so the top portion isn't drawn. Draw the terrain layer on top of the sky layer to form the scene.
Now for the main idea. Any time there is an explosion or other terrain-deforming event, you draw a circle or other shape on the terrain layer, using the terrain layer itself as a drawing mask (so only the part of the circle that overlaps existing terrain is drawn), to wipe out part of the terrain. Use a transparent/mask-color brush for the fill and some color similar to the terrain for the thick pen.
You can repeat this process to add more deformations. You could keep this layer in memory and add deformations as they occur or you could even render them in memory each frame if there aren't too many deformations to render.
I guess you'd better use texture-filled polygons with the correct mapping (a linear one that doesn't stretch the texture to use all the texels, but leaves the cropped areas out), and then reshape them as they get destroyed.
I'm assuming your problem will be to implement the collision between characters/weapons/terrain.
As long as you aren't doing this on OpenGL ES, you might be able to get away with using the stencil buffer to do per-pixel collision detection and have your terrain be a single modifiable texture.
This page will give an idea:
http://kometbomb.net/2007/07/11/hardware-accelerated-2d-collision-detection-in-opengl/
The way I imagine it is this:
a plane with the texture applied
a path (a vector of points/segments) used for ground collisions.
When something explodes, you do a boolean operation (rectangle minus circle) for the texture (revealing the background) and for the 'walkable' path.
What I'm trying to say is that you do a geometric boolean operation and use the result to update the texture (with an alpha mask or something) and to update the data structure you use to keep track of the walkable area (whichever that might be).
Split things up, instead of relying only on gl draw methods
I think I would start by drawing the foreground into the stencil buffer so the stencil buffer is set to 1 bits anywhere there's foreground, and 0 elsewhere (where you want your sky to show).
Then to draw a frame, you draw your sky, enable the stencil buffer, and draw the foreground. For the initial frame (before any explosion has destroyed part of the foreground) the stencil buffer won't really be doing anything.
When you do have an explosion, however, you draw it to the stencil buffer (clearing the stencil buffer for that circle). Then you re-draw your data as before: draw the sky, enable the stencil buffer, and draw the foreground.
This lets you get the effect you want (the foreground disappears where desired) without having to modify the foreground texture at all. If you prefer not to use the stencil buffer, the alternative that seems obvious to me would be to enable blending, and just manipulate the alpha channel of your foreground texture -- set the alpha to 0 (transparent) where it's been affected by an explosion. IMO, the stencil buffer is a bit cleaner approach, but manipulating the alpha channel is pretty simple as well.
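The alpha-channel variant could be sketched on the CPU side like this. `TerrainMask` is a hypothetical name for a copy of the foreground's alpha channel; after carving, the changed region would still have to be re-uploaded to the texture (for example with glTexSubImage2D), which is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side copy of the foreground alpha channel:
// 255 = solid terrain, 0 = destroyed (the sky shows through).
struct TerrainMask {
    int width, height;
    std::vector<std::uint8_t> alpha;

    TerrainMask(int w, int h)
        : width(w), height(h), alpha(static_cast<std::size_t>(w) * h, 255) {}

    bool solid(int x, int y) const {
        if (x < 0 || y < 0 || x >= width || y >= height) return false;
        return alpha[y * width + x] != 0;
    }

    // Clears every pixel whose centre lies within `radius` of (cx, cy).
    void carveCircle(int cx, int cy, int radius) {
        for (int y = cy - radius; y <= cy + radius; ++y)
            for (int x = cx - radius; x <= cx + radius; ++x) {
                if (x < 0 || y < 0 || x >= width || y >= height) continue;
                const int dx = x - cx, dy = y - cy;
                if (dx * dx + dy * dy <= radius * radius) alpha[y * width + x] = 0;
            }
    }
};
```

Keeping the mask on the CPU also gives a cheap per-pixel collision test: `solid(x, y)` answers whether a character or projectile at that pixel touches terrain.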
I think, but this is just a quick idea, that a good way might be to draw a Very Large Number of Lines.
I'm thinking that you represent the landscape as a bunch of line segments, for each column of the screen you have 0..n vertical lines, that make up the ground:
 12    789
0123  6789
0123456789
0123456789
In the above awesomeness, the column of "0":s makes up a single line, and so on. I didn't try to illustrate the case where a single pixel column has more than one line, since it's a bit hard in this coarse format.
I'm not sure this will be efficient, but it at least makes some sense since lines are an OpenGL primitive.
You can color and texture the lines by enabling texture-mapping and specifying the desired texture coordinates for each line segment.
Typically the way I have seen it done is to have each entity be a textured quad, then update the texture for animation. For destructible terrain it might be best to break the terrain into tiles, so that you only have to update the ones that have changed. Don't use glDrawPixels; it is probably the slowest approach possible (short of reloading textures from disk every frame, though it would be close).