Voxel rendering: pass data from CPU to GPU? - opengl

I'm trying to make a voxel-style game, like Voxatron but I think that my method of rendering is not efficient. The idea is to set the position and color of each voxel and send this data to the GPU each frame. The problem is that if I have a big square of 50x50x50 voxels, I have 125000 * 4 floats * 3 ints to send and it's clearly overloading the CPU-GPU bus (because if I only send the data once for all, I get much more FPS with the same scene). I could do some spatial optimization but my requirement is that each voxel can change arbitrarily each frame so I think it's not worth calculating.
The rendering itself uses the instancied draw of OpenGL so I only draw the voxels set. How can I do that efficiently (I repeat, don't talk me about chunk or whatever, I want the voxels to change arbitrarily each frame anyway). Maybe should I drop OpenGL and do everything in software (like Voxatron)?

Related

Terrain Object collision detection

I've written my own 3D Game Engine in the past few years and wanted to actually use it for a game.
I stumbled accros the following problem:
I have multiple planes in my game but lets talk about one single plane.
Naturally, planes are not able to dive into the ground and fly under the terrain.
Therefor, I need to implement something that detects the collision between a plane/jet and my ground.
The informations given are the following:
Grid of terrain [2- dimensional array; stores height at according x,z coordinate]
Hitbox of my plane (it moves with my plane, so the bounds etc. are all already calculated and given)
So about the hitboxes:
I though about which method to use. The best one in terms of performance seems to be simple spheres with different radius.
About the ground: Graphically, the ground is subdivided into triangles:
So what I need now is the optimal type of hitbox (sphere, AABB,...) and the according most efficient calculations.
My attempt was to get every surrounding triangle and calculate the distance from that one to each center of my hitbox spheres. If the distance is less than the radius, it has successfully detected a collision. But when I have up to 10/20 spheres in my plane and like 100 triangles to check, it will take to much time.
Another attempt was to get the vertical distance to the ground from each hitbox sphere. This one needs way less calculations but fails when getting near steep surfaces.
I would be very happy if someone could help me implementing an efficient version of plane/terrain collision detection :)
render terrain
May be you could try liner depth buffer to improve accuracy.
read depth texture
you can use glReadPixels with GL_DEPTH_COMPONENT and GL_FLOAT. That will copy depth buffer into CPU side memory. So now you can do also collision on CPU side or any computation related to ground in view...
use the depth buffer as texture
so copy it back GPU with glTexImage2D. I know this is slow (but most likely much faster then your current computation of collision. In case you are not using Intel HD Graphics You can instead #2,#3 use FBO for depth which will render depth buffer directly to texture. But on Intel this does not work reliably (or at all).
now render your objects (off screen) with GLSL
inside fragment shader just compare rendered position with depth (attached as texture). If bellow output the collision somewhere. If done in compute shaders than you can store results in some texture. Or you could use some attachment or FBO for this.
In case you can not use FBO you could render to "screen" with specifically color encoded collisions. Then read it with glReadPixels and scan for it to handle what ever collision logic you have on CPU side...
Do not write to Depth buffer in this pass !!! And also do not use CULL_FACE because that could miss some collision of the back side of your object.
now render the objects normally
in case you do not render in #4 or you encode collision to screen buffer you need to overwrite/render the stuff. Otherwise this step is not needed. But rendering after collision detection is good because in case of collision you most likely change the object position/orientation/mesh and already rendered object could be hindering the altered one.
[Notes]
Copying image between CPU and GPU is slow so use FBO and render to texture if you can instead.
If you are not familiar with multiple pass rendering see some QAs for inspiration:
OpenGL Scale Single Pixel Line
Render filled complex polygons with large number of vertices with OpenGL
This works only in view ... but you can do just collision rendering pass (per object). Render with camera set to view from top to down (birdseye) and covering only area around your object... Also you do not need too big resolution for this so it should be relatively fast ... So you can divide your screen to square areas (using glViewport) testing more objects in single frame to lover the sync time slowdowns as much as possible (use less glReadPixel calls). Also you do not need any vertex colors or textures for this.

How do I get started with a GPU voxelizer?

I've been reading various articles about how to write a GPU voxelizer. From my understanding the process goes like this:
Inspect the triangles individually and decide the axis that displays the triangle in the largest way. Call this the dominant axis.
Render the triangle on its dominant axis and sample the texels that come out.
Write that texel data onto a 3D texture and then do what you will with the data
Disregarding conservative rasterization, I have a lot of questions regarding this process.
I've gotten as far as rendering each triangle, choosing a dominant axis and orthogonally projecting it. What should the values of the orthogonal projection be? Should it be some value based around the size of the voxels or how large of an area the map should cover?
What am I supposed to do in the fragment shader? How do I write to my 3D texture such that it stores the voxel data? From my understanding, due to choosing the dominant axis we can't have more than a depth of 1 voxel for each fragment. However, since we projected orthogonally I don't see how that would reflect onto the 3D texture.
Finally, I am wondering on where to store the texture data. I know it's a bad idea to store data CPU side since you have to pass it all in to use it on the GPU, however the sourcecode I am kind of following chooses to store all its texture on the CPU side, such as those for a light map. My assumption is that data that will only be used on the GPU should be stored there and data used on both should be stored on the CPU side of things. So, from this I store my data on the CPU side. Is that correct?
My main sources have been: https://www.seas.upenn.edu/~pcozzi/OpenGLInsights/OpenGLInsights-SparseVoxelization.pdf OpenGL Insights
https://github.com/otaku690/sparsevoxeloctree A SVO using a voxelizer. The issue is that the shader code is not in the github.
In my own implementation, the whole scene is positioned and scaled into one unit cube centered on world origin. The modelview-project matrices are straightforward then. And the viewport is simply the desired voxel resolution.
I use 2-pass approach to output those voxel fragments: the 1st pass calculate the number of output voxel fragments by accumulating a single variable using atomic counter. Then I use the info to allocate a linear buffer.
In the 2nd pass the rasterized voxel fragments are stored into the allocated linear buffer, using atomic counter to avoid write conflict.

Height Map of accoustic data

I have the following problem (no code yet):
We have a data set of 4000 x 256 with a 16 bit resolution, and I need to code a program to display this data.
I wanted to use DirectX or OpenGL to do so, but I don't know what the proper approach is.
Do I create a buffer with 4000 x 256 triangles with the resolution being the y axis, or would I go ahead and create a single quad and then manipulate the data by using tesselation?
When would I use a big vertex buffer over tesselation and vice versa?
It really depends on a lot of factors.
You want to render a map of about 1million pixels\vertices. Depending on your hardware this could be doable with the most straight forward technique.
Out of my head I can think of 3 techniques:
1) Create a grid of 4000x256 vertices and set their height according to the height map image of your data.
You set the data once upon creation. The shaders will just draw the static buffer and a apply a single transform matrix(world\view\projection) to all the vertices.
2) Create a grid of 4000x256 vertices with height 0 and translate each vertex's height inside the vertex shader by the sampled height map data.
3) The same as 2) only you add a tessellation phase.
The advantage of doing tessellation is that you can use a smaller vertex buffer AND you can dynamically tessellate in run time.
This mean you can make part of your grid more tessellated and part of it less tessellated. For instance maybe you want to tessellate more only where the user is viewing the grid.
btw, you can't tesselate one quad into a million quads, there is a limit how much a single quad can tessellate. But you can tessellate it quite a lot, in any case you will gain several factors of reduced grid size.
If you never used DirectX or OpenGL I would go with 1. See if it's fast enough and only if it's not fast enough go with 2 and last go to 3.
The fact that you know the theory behind 3D graphics rendering doesn't mean it will be easy for you to learn DirectX or OpenGL. They are difficult to understand and learn because they are quite complex as an API.
If you want you can take a look at some tessellation stuff I did using DirectX11:
http://pompidev.net/2012/09/25/tessellation-simplified/
http://pompidev.net/2012/09/29/tessellation-update/

How to efficiently implement "point-on-heightmap" picking in OpenGL?

I want to track the mouse coordinates in my OpenGL scene on the ground surface of the world which is modeled as a height map. Currently there is no fancy stuff like hardware tessellation. Note that this question is not about object picking.
Currently I'm doing the following which is clearly dropping the performance because of a read-back operation:
Render the world (the ground surface)
Read back the depth value at the mouse coordinates
Render the rest of the scene
Swap buffers and render the next frame
The read back is between the two render steps because I want the depth value of the ground surface without any objects in front of it. It is done using the following command:
GLfloat depth;
glReadPixels(x, y, 1, 1, GL_DEPTH_COMPONENT, GL_FLOAT, &depth);
My application limits the frame rate to 60 frames per second. When rendering the scene without the read back operation, I experience a CPU usage of less than 5%, but when doing the read back, it increases to about 75% although I'm not doing much to render the scene or update any game model or such things.
A temporary solution is to cache the depth value of the pixel under the mouse and update it only every 5th or 10th frame which causes the CPU usage going back down below 10%. But clearly can't be the best solution to the problem.
How can I implement picking (not object picking since I want the (floating point) coordinates on the surface) efficiently?
I already thought of reading back the depth value of the front buffer instead of the back buffer, but when googling on how to do so, I only find people complaining about glRead* methods to be best avoided at all. But how can I read something (do picking) without reading something (using glRead*)?
I'm confused. How do other people implement picking?
A totally different approach would be implementing the world surface picking in software. It should be no big deal to reconstruct a 3D ray from the camera "into the depth", representing the points in space which are rendered at the target pixel. Then I could implement an intersection algorithm to find the front-most point on the surface.
You typically implement it on the CPU! Find your picking ray in heightmap coordinates and do a simple line-trace across the heightmap. This is very similar to line-drawing. In each cell you intersect, test against the triangles you used to triangulate it.
It is important to avoid reading from the GPU until it's done. Since you normally schedule drawing commands several frames ahead (GL does this automatically), this means that you will also only get the results then - or stall the CPU until the GPU caught up. But don't do that for simple things like this!

Do VBOs boost performance even when all data changes frequently

I'm doing a 2D turn based RTS game with 32x32 tiles (400-500 tiles per frame). I could use a VBO for this, but I may have to change almost all the VBO data each frame, as the background is a scrolling one and the visible tiles will change every time the map scrolls. Will using VBOs rather than client side vertex arrays still yield a performance benefit here? Also if using VBOs which data format is most efficient (float, or int16, or ...)?
If you are simply scrolling, you can use the vertex shader to manipulate the position rather than update the vertices themselves. Pass in a 'scroll' value as a uniform to your background and simply add that value to the x (or y, or whatever applies to your case) value of each vertex.
Update:
If you intend to modify the VBO often, you can tell the driver this using the usage param of glBufferData. This page has a good description of how that works: http://www.opengl.org/wiki/Vertex_Buffer_Object, under Accessing VBOs. In your case, it looks like you should specify GL_DYNAMIC_DRAW to glBufferData so that the driver puts your VBO in the best place in memory for your application.
The regular approach is to move the camera and perform culling instead of updating the content of the VBOs. For a 2d game culling will use simple rectangle intersection algorithm, which you will need anyway for unit selection in the game. As a bonus, manipulating the camera will allow to rate the camera and zoom in and zoom out. Also you could combine several tiles (4, 9 or 16) into one VBO.
I would strongly advise against writing logic to move the tiles instead of the camera. It will take you longer, have more bugs, and be less flexible.
The format will depend on what data you are storing in the VBOs. When in doubt, just use uint8 for color and float32 for everything else. Though for a 2d game your VBOs or vertex array are going to be very small compared to 3d applications, so it's highly unlikely VBO will make any difference.