I want to track the mouse coordinates in my OpenGL scene on the ground surface of the world which is modeled as a height map. Currently there is no fancy stuff like hardware tessellation. Note that this question is not about object picking.
Currently I'm doing the following which is clearly dropping the performance because of a read-back operation:
Render the world (the ground surface)
Read back the depth value at the mouse coordinates
Render the rest of the scene
Swap buffers and render the next frame
The read back is between the two render steps because I want the depth value of the ground surface without any objects in front of it. It is done using the following command:
GLfloat depth;
glReadPixels(x, y, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &depth);
My application limits the frame rate to 60 frames per second. When rendering the scene without the read back operation, I experience a CPU usage of less than 5%, but when doing the read back, it increases to about 75% although I'm not doing much to render the scene or update any game model or such things.
A temporary solution is to cache the depth value of the pixel under the mouse and update it only every 5th or 10th frame which causes the CPU usage going back down below 10%. But clearly can't be the best solution to the problem.
How can I implement picking (not object picking since I want the (floating point) coordinates on the surface) efficiently?
I already thought of reading back the depth value of the front buffer instead of the back buffer, but when googling on how to do so, I only find people complaining about glRead* methods to be best avoided at all. But how can I read something (do picking) without reading something (using glRead*)?
I'm confused. How do other people implement picking?
A totally different approach would be implementing the world surface picking in software. It should be no big deal to reconstruct a 3D ray from the camera "into the depth", representing the points in space which are rendered at the target pixel. Then I could implement an intersection algorithm to find the front-most point on the surface.
You typically implement it on the CPU! Find your picking ray in heightmap coordinates and do a simple line-trace across the heightmap. This is very similar to line-drawing. In each cell you intersect, test against the triangles you used to triangulate it.
It is important to avoid reading from the GPU until it's done. Since you normally schedule drawing commands several frames ahead (GL does this automatically), this means that you will also only get the results then - or stall the CPU until the GPU caught up. But don't do that for simple things like this!
Related
I've written my own 3D Game Engine in the past few years and wanted to actually use it for a game.
I stumbled accros the following problem:
I have multiple planes in my game but lets talk about one single plane.
Naturally, planes are not able to dive into the ground and fly under the terrain.
Therefor, I need to implement something that detects the collision between a plane/jet and my ground.
The informations given are the following:
Grid of terrain [2- dimensional array; stores height at according x,z coordinate]
Hitbox of my plane (it moves with my plane, so the bounds etc. are all already calculated and given)
So about the hitboxes:
I though about which method to use. The best one in terms of performance seems to be simple spheres with different radius.
About the ground: Graphically, the ground is subdivided into triangles:
So what I need now is the optimal type of hitbox (sphere, AABB,...) and the according most efficient calculations.
My attempt was to get every surrounding triangle and calculate the distance from that one to each center of my hitbox spheres. If the distance is less than the radius, it has successfully detected a collision. But when I have up to 10/20 spheres in my plane and like 100 triangles to check, it will take to much time.
Another attempt was to get the vertical distance to the ground from each hitbox sphere. This one needs way less calculations but fails when getting near steep surfaces.
I would be very happy if someone could help me implementing an efficient version of plane/terrain collision detection :)
render terrain
May be you could try liner depth buffer to improve accuracy.
read depth texture
you can use glReadPixels with GL_DEPTH_COMPONENT and GL_FLOAT. That will copy depth buffer into CPU side memory. So now you can do also collision on CPU side or any computation related to ground in view...
use the depth buffer as texture
so copy it back GPU with glTexImage2D. I know this is slow (but most likely much faster then your current computation of collision. In case you are not using Intel HD Graphics You can instead #2,#3 use FBO for depth which will render depth buffer directly to texture. But on Intel this does not work reliably (or at all).
now render your objects (off screen) with GLSL
inside fragment shader just compare rendered position with depth (attached as texture). If bellow output the collision somewhere. If done in compute shaders than you can store results in some texture. Or you could use some attachment or FBO for this.
In case you can not use FBO you could render to "screen" with specifically color encoded collisions. Then read it with glReadPixels and scan for it to handle what ever collision logic you have on CPU side...
Do not write to Depth buffer in this pass !!! And also do not use CULL_FACE because that could miss some collision of the back side of your object.
now render the objects normally
in case you do not render in #4 or you encode collision to screen buffer you need to overwrite/render the stuff. Otherwise this step is not needed. But rendering after collision detection is good because in case of collision you most likely change the object position/orientation/mesh and already rendered object could be hindering the altered one.
[Notes]
Copying image between CPU and GPU is slow so use FBO and render to texture if you can instead.
If you are not familiar with multiple pass rendering see some QAs for inspiration:
OpenGL Scale Single Pixel Line
Render filled complex polygons with large number of vertices with OpenGL
This works only in view ... but you can do just collision rendering pass (per object). Render with camera set to view from top to down (birdseye) and covering only area around your object... Also you do not need too big resolution for this so it should be relatively fast ... So you can divide your screen to square areas (using glViewport) testing more objects in single frame to lover the sync time slowdowns as much as possible (use less glReadPixel calls). Also you do not need any vertex colors or textures for this.
I've implemented Game of Life using OpenGL buffers (as specified here: http://www.glprogramming.com/red/chapter14.html#name20). In this implementation each pixel is a cell in the game.
My program receives the initial state of the game (2d array). The size array ,in my implementation, is the size of the window. This of course makes it "unplayable" if the array is 5x5 or some other small values.
At each iteration I'm reading the content of the framebuffer into a 2D array (its size is the window size):
glReadPixels(0, 0, win_x, win_y, GL_RGB, GL_UNSIGNED_BYTE, image);
Then, I'm doing the necessary steps to calculate the living and dead cells, and then draw a rectangle which covers the whole window, using:
glRectf(0, 0, win_x, win_y);
I want to zoom (or enlarge) the window without affecting the correctness of my code. If I resize the window, then the framebuffer content won't fit inside image(the array). Is there a way of zooming the window(so that each pixel be drawn as several pixels) without affecting the framebuffer?
First, you seem to be learning opengl 2, I would suggest instead learning a newer version, as it is more powerful and efficient. A good tutorial can be found here http://www.opengl-tutorial.org/
If i understand this correctly, you read in an initial state and draw it, then continuously read in the pixels on the screen, update the array based on the game of life logic then draw it back? this seems overly complicated.
The reading of the pixels on the screen is unnecessary, and will cause complications if you try to enlarge the rects to more than a pixel.
I would say a good solution would be to keep a bit array (1 is a organism, 0 is not), possibly as a 2d array in memory, updating the logic every say 30 iterations (for 30 fps), then drawing all the rects to the screen, black for 1, white for 0 using glColor(r,g,b,a) tied to an in statement in a nested for loop.
Then, if you give your rects a negative z coord, you can "zoom in" using glTranslate(x,y,z) triggered by a keyboard button.
Of course in a newer version of opengl, vertex buffers would make the code much cleaner and efficient.
You can't store your game state directly the window framebuffer and then resize it for rendering, since what is stored in the framebuffer is by definition what is about to be rendered. (You could overwrite it, but then you lose your game state...) The simplest solution would just to store the game state in an array (on the client side) and then update a texture based on that. Thus for each block that was set, you could set a pixel in a texture to be the appropriate color. Each frame, you then render a full screen quad with that texture (with GL_NEAREST filtering).
However, if you want to take advantage of your GPU there are some tricks that could massively speed up the simulation by using a fragment shader to generate the texture. In this case you would actually have two textures that you ping-pong between: one containing the current game state, and the other containing the next game state. Each frame you would use your fragment shader (along with a FBO) to generate the next state texture from the current state texture. Afterwards, the two textures are swapped, making the next state become the current state. The current state texture would then be rendered to the screen the same way as above.
I tried to give an overview of how you might be able to offload the computation onto the GPU, but if I was unclear anywhere just ask! For a more detailed explanation feel free to ask another question.
I'm writing a Minecraft like static 3d block world in C++ / openGL. I'm working at improving framerates, and so far I've implemented frustum culling using an octree. This helps, but I'm still seeing moderate to bad frame rates. The next step would be to cull cubes that are hidden from the viewpoint by closer cubes. However I haven't been able to find many resources on how to accomplish this.
Create a render target with a Z-buffer (or "depth buffer") enabled. Then make sure to sort all your opaque objects so they are rendered front to back, i.e. the ones closest to the camera first. Anything using alpha blending still needs to be rendered back to front, AFTER you rendered all your opaque objects.
Another technique is occlusion culling: You can cheaply "dry-render" your geometry and then find out how many pixels failed the depth test. There is occlusion query support in DirectX and OpenGL, although not every GPU can do it.
The downside is that you need a delay between the rendering and fetching the result - depending on the setup (like when using predicated tiling), it may be a full frame. That means that you need to be creative there, like rendering a bounding box that is bigger than the object itself, and dismissing the results after a camera cut.
And one more thing: A more traditional solution (that you can use concurrently with occlusion culling) is a room/portal system, where you define regions as "rooms", connected via "portals". If a portal is not visible from your current room, you can't see the room connected to it. And even it is, you can click your viewport to what's visible through the portal.
The approach I took in this minecraft level renderer is essentially a frustum-limited flood fill. The 16x16x128 chunks are split into 16x16x16 chunklets, each with a VBO with the relevant geometry. I start a floodfill in the chunklet grid at the player's location to find chunklets to render. The fill is limited by:
The view frustum
Solid chunklets - if the entire side of a chunklet is opaque blocks, then the floodfill will not enter the chunklet in that direction
Direction - the flood will not reverse direction, e.g.: if the current chunklet is to the north of the starting chunklet, do not flood into the chunklet to the south
It seems to work OK. I'm on android, so while a more complex analysis (antiportals as noted by Mike Daniels) would cull more geometry, I'm already CPU-limited so there's not much point.
I've just seen your answer to Alan: culling is not your problem - it's what and how you're sending to OpenGL that is slow.
What to draw: don't render a cube for each block, render the faces of transparent blocks that border an opaque block. Consider a 3x3x3 cube of, say, stone blocks: There is no point drawing the center block because there is no way that the player can see it. Likewise, the player will never see the faces between two adjacent stone blocks, so don't draw them.
How to draw: As noted by Alan, use VBOs to batch geometry. You will not believe how much faster they make things.
An easier approach, with minimal changes to your existing code, would be to use display lists. This is what minecraft uses.
How many blocks are you rendering and on what hardware? Modern hardware is very fast and is very difficult to overwhelm with geometry (unless we're talking about a handheld platform). On any moderately recent desktop hardware you should be able to render hundreds of thousands of cubes per frame at 60 frames per second without any fancy culling tricks.
If you're drawing each block with a separate draw call (glDrawElements/Arrays, glBegin/glEnd, etc) (bonus points: don't use glBegin/glEnd) then that will be your bottleneck. This is a common pitfall for beginners. If you're doing this, then you need to batch together all triangles that share texture and shading parameters into a single call for each setup. If the geometry is static and doesn't change frame to frame, you want to use one Vertex Buffer Object for each batch of triangles.
This can still be combined with frustum culling with an octree if you typically only have a small portion of your total game world in the view frustum at one time. The vertex buffers are still loaded statically and not changed. Frustum cull the octree to generate only the index buffers for the triangles in the frustum and upload those dynamically each frame.
If you have surfaces close to the camera, you can create a frustum which represents an area that is not visible, and cull objects that are entirely contained in that frustum. In the diagram below, C is the camera, | is a flat surface near the camera, and the frustum-shaped region composed of . represents the occluded area. The surface is called an antiportal.
.
..
...
....
|....
|....
|....
|....
C |....
|....
|....
|....
....
...
..
.
(You should of course also turn on depth testing and depth writing as mentioned in other answer and comments -- it's very simple to do in OpenGL.)
The use of a Z-Buffer ensures that polygons overlap correctly.
Enabling the depth test makes every drawing operation check the Z-buffer before placing pixels onto the screen.
If you have convex objects you must (for performance) enable backface culling!
Example code:
glEnable(GL_CULL_FACE);
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
You can change the behaviour of glCullFace() passing GL_FRONT or GL_BACK...
glCullFace(...);
// Draw the "game world"...
I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get decent fps on my 3GHz with ATI X1600... I'm all out of ideas.
When I have about 12000 cubes on the screen it falls to under 20fps - pathetic.
So far the optimizations I have are: frustum culling, back face culling (via OpenGL's glEnable(GL_CULL_FACE)), the engine draws only the visible faces (except the culled ones of course) and they're in an octree.
I've tried VBO's, I don't like them and they do not significantly increase the fps.
How can Minecraft's engine be so fast... I struggle with a 10000 cubes, whereas Minecraft can easily draw much more at higher fps.
Any ideas?
#genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d
I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.
Instead what you should do is generate faces for all of the outer visible surface, put that in a VBO, and send it to the card and continue to render that VBO until the geometry changes. This saves you a lot of the time your card is actually waiting on your processor to send it the geometry information.
You should profile your code to find out if the bottleneck in your application is on the CPU or GPU. For instance it might be that your culling/octtree algorithms are slow and in that case it is not an OpenGL-problem at all.
I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.
Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.
gDEBugger is a great tool that will help you find bottlenecks with OpenGL.
I don't know if it's ok here to "bump" an old question but a few things came up my mind:
If your voxels are static you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore you can also compile a static scene into a potential-visibility-set in the octree. The main principle of PVS is to precompute for evere node in the tree which other nodes are potential visible from it and store pointers to them in a vector. When it comes to rendering you first check in which node the camera is placed and then run frustum culling against all nodes in the PVS-vector of the node.(Carmack used something like that in the Quake engines, but with Binary Space Partitioning trees)
If the shading of your voxels is kindalike complex it is also fast to do a pre-Depth-Only-Pass, without writing into the colorbuffer,just to fill the Depthbuffer. After that you render a 2nd pass: disable writing to the Depthbuffer and render only to the Colorbuffer while checking the Depthbuffer. So you avoid expensive shader-computations which are later overwritten by a new fragment which is closer to the viewer.(Carmack used that in Quake3)
Another thing which will definitely speed up things is the use of Instancing. You store only the position of each voxel and, if nescessary, its scale and other parameters into a texturebufferobject. In the vertexshader you can then read the positions of the voxels to be spawned and create an instance of the voxel(i.e. a cube which is given to the shader in a vertexbufferobject). So you send the 8 Vertices + 8 Normals (3 *sizeof(float) *8 +3 *sizeof(float) *8 + floats for color/texture etc...) only once to the card in the VBO and then only the positions of the instances of the Cube(3*sizeof(float)*number of voxels) in the TBO.
Maybe it is possibile to parallelize things between GPU and CPU by combining all 3 steps in 2 threads, in the CPU-thread you check the octrees pvs and update a TBO for instancing in the next frame, the GPU-thread does meanwhile render the 2 passes while using an TBO for instancing which was created by the CPU thread in the previous step. After that you switch TBOs. If the Camera has not moved you don't even have to do the CPU-calculations again.
Another kind of tree you me be interested in is the so called k-d-tree, which is more general than octrees.
PS: sorry for my english, it's not the clearest....
There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.
Have you used a common display list for all your cubes ?
Do you skip calling drawing code of cubes which are not visible to the user ?
I'm loading 3D objects (obj or 3ds or collada files) into my openGL application. The 3 environment is quite large (a few hundred metres in all axis').
My problem is that smaller 3D objects (i.e. in the order of ~< 1-2m ) don't appear to be depth-tested properly. Depending on the zoom of the camera, I can sometimes see the back face of the object (I have been using a simple cube for testing) or other faces becoming visible/invisible/torn. Please see the attached images for a better explanation.
I am led to believe the problem is due to mipmapping being enabled. I would either like to disable mipmapping (can someone suggest a simple, fast way to do this) or set the resolution to be greater for the mipmapped objects. Or am I barking up the wrong tree completely?
Thanks
Chris
That's the result of insufficient z-buffer precission, which is an issue in games that have huge worlds but (relatively) small objects. The immediate solution would be to try using a 24 bit z-buffer instead of a 16 bit one. Another way to tackle this would be to render the game world it two steps, first the big distant objects, then clearing the zbuffer and then drawing the closer objects.
This specific problem is called z-fighting by the way, here's a great resource on this issue: http://www.codermind.com/articles/Depth-buffer-tutorial.html
The take-away is the last paragraph of the article above:
the true issue is that you can't draw
both objects that are very far and
objects that are very near with the
same depth buffer equations. If you
want to draw very far objects then you
need to sacrifice your near view by
pushing it further. To avoid clipping
artifacts you can make your collision
envelope large enough so that your
clip plane will never intercept an
existing object within your frustum.
Or you can make object gradually
disappear with transparency as they
come near your clip plane.
If you want to keep near objects and
at the same time draw mountains (or
planets) in the far distance, then you
can cut your rendering in parts. First
drawing your far objects, then
clearing the depth buffer and
rendering the near objects with a
different z buffer.
Like Julio, I believe that this is a depth precision issues, not something related to mip-mapping. However, I suggest you start by adjusting your near and far clipping plane before changing anything else (You are probably already using a 24-bit depth buffer anyways, as that is the default on most drivers/cards). Particularly the near plane should be as far away as possible for your scene. Look for calls to glFrustum or gluPerspective.