What is the most efficient way to draw voxels (cubes) in opengl? - opengl

I would like to draw voxels using OpenGL, but it doesn't seem to be supported directly. I made a cube-drawing function with 24 vertices (4 vertices per face), but the frame rate drops when drawing 2500 cubes. I was hoping there was a better way. Ideally I would just like to send a position, edge size, and color to the graphics card. I'm not sure whether I can do this with GLSL in the vertex or fragment shader.
I searched Google and found point sprites and billboard sprites (are those the same thing?). Could they be used to draw a cube more quickly? If I used six, one per face, it seems I would be sending much less data to the graphics card and hopefully get a better frame rate.
Another thought: maybe I can draw multiple cubes with a single glDrawElements call?
Maybe there is a better method altogether that I don't know about? Any help is appreciated.

Drawing voxels with cubes is almost always the wrong way to go (the exceptional case is ray-tracing). What you usually want to do is put the data into a 3D texture and render slices depending on camera position. See this page: https://developer.nvidia.com/gpugems/GPUGems/gpugems_ch39.html and you can find other techniques by searching for "volume rendering gpu".
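As a very rough sketch of the slicing idea (axis-aligned slices and the fixed-function pipeline are used purely to keep it short; W, H, D, the volume bounds, and voxelData are placeholders - the GPU Gems chapter covers proper view-aligned slicing and transfer functions):

// Sketch: upload the voxel densities as a 3D texture, then blend slices through it.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_3D, tex);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage3D(GL_TEXTURE_3D, 0, GL_INTENSITY, W, H, D, 0,
             GL_LUMINANCE, GL_UNSIGNED_BYTE, voxelData);   // intensity feeds RGB and alpha

// Draw slices back to front; here the camera is assumed to be on the +Z side of the
// volume (reverse the loop, or slice along another axis, for other view directions).
glEnable(GL_TEXTURE_3D);
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
for (int i = 0; i < D; ++i) {
    float r = (i + 0.5f) / D;                 // texture coordinate of this slice
    float z = minZ + r * (maxZ - minZ);       // world-space position of this slice
    glBegin(GL_QUADS);
    glTexCoord3f(0, 0, r); glVertex3f(minX, minY, z);
    glTexCoord3f(1, 0, r); glVertex3f(maxX, minY, z);
    glTexCoord3f(1, 1, r); glVertex3f(maxX, maxY, z);
    glTexCoord3f(0, 1, r); glVertex3f(minX, maxY, z);
    glEnd();
}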
EDIT: When writing the above answer I didn't realize that the OP was most likely interested in how Minecraft does it. For techniques to speed up Minecraft-style rasterization, check out Culling techniques for rendering lots of cubes. Though with recent advances in graphics hardware, rendering Minecraft through ray tracing may become a reality.

What you're looking for is called instancing. You could take a look at glDrawElementsInstanced and glDrawArraysInstanced for a couple of possibilities. Note that these were only added as core operations relatively recently (OGL 3.1), but have been available as extensions quite a while longer.
nVidia's OpenGL SDK has an example of instanced drawing in OpenGL.
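As a rough sketch of what that looks like in GL 3.3 (cubeVAO, cubeIndexCount, and buildCubeOffsets() are placeholder names, and a context plus function loader are assumed to be set up already): one cube mesh, one per-instance attribute, one draw call.

// Per-instance data: x, y, z offset for every cube (3 floats per instance).
std::vector<float> offsets = buildCubeOffsets();   // hypothetical helper

glBindVertexArray(cubeVAO);                        // the single shared cube mesh

GLuint instanceVBO;
glGenBuffers(1, &instanceVBO);
glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
glBufferData(GL_ARRAY_BUFFER, offsets.size() * sizeof(float),
             offsets.data(), GL_STATIC_DRAW);

glEnableVertexAttribArray(3);                      // attribute 3 = per-instance offset
glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(float), (void*)0);
glVertexAttribDivisor(3, 1);                       // advance once per instance, not per vertex

// One call draws every cube; the vertex shader adds the instance offset to each vertex.
glDrawElementsInstanced(GL_TRIANGLES, cubeIndexCount, GL_UNSIGNED_INT, 0,
                        (GLsizei)(offsets.size() / 3));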

First you really should be looking at OpenGL 3+ using GLSL. This has been the standard for quite some time. Second, most Minecraft-esque implementations use mesh creation on the CPU side. This technique involves looking at all of the block positions and creating a vertex buffer object that renders the triangles of all of the exposed faces. The VBO is only generated when the voxels change and is persisted between frames. An ideal implementation would combine coplanar faces of the same texture into larger faces.
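A minimal sketch of that idea (solid(), pushFace(), SIZE, and the FACE_* constants are made up for illustration): walk the grid and emit a face only where the neighbouring cell is empty, then upload the result to a VBO.

std::vector<float> vertices;                 // receives the exposed-face geometry
for (int z = 0; z < SIZE; ++z)
    for (int y = 0; y < SIZE; ++y)
        for (int x = 0; x < SIZE; ++x) {
            if (!solid(x, y, z)) continue;   // empty cell: nothing to draw
            // A face is exposed only if the neighbour in that direction is empty
            // (out-of-range cells count as empty).
            if (!solid(x + 1, y, z)) pushFace(vertices, x, y, z, FACE_POS_X);
            if (!solid(x - 1, y, z)) pushFace(vertices, x, y, z, FACE_NEG_X);
            if (!solid(x, y + 1, z)) pushFace(vertices, x, y, z, FACE_POS_Y);
            if (!solid(x, y - 1, z)) pushFace(vertices, x, y, z, FACE_NEG_Y);
            if (!solid(x, y, z + 1)) pushFace(vertices, x, y, z, FACE_POS_Z);
            if (!solid(x, y, z - 1)) pushFace(vertices, x, y, z, FACE_NEG_Z);
        }
// Upload "vertices" to a VBO once; regenerate only when the voxel data changes.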

Related

OpenGL Geometry Extrusion with geometry Shader

With the GLE Tubing and Extrusion Library (http://www.linas.org/gle/) I am able to extrude 2D contours into 3D objects using OpenGL. The library does all the work on the CPU and uses OpenGL immediate mode.
I guess doing the extrusion on the GPU using geometry shaders might be faster, especially when rendering a lot of geometry. Since I do not yet have any experience with geometry shaders in OpenGL, I would like to know if that is possible and what I have to pay attention to. Do you think it is a good idea to move those computations to the GPU, and that it will increase performance? It should also be possible to get the rendered geometry back to the CPU from the GPU, possibly using "render to VBO".
If the geometry indeed changes every frame, you should do it on the GPU.
Keep in mind that every other solution that doesn't rely on the immediate mode will be faster than what you have right now. You might not even have to do it on the GPU.
But maybe you want to use shadow mapping instead, which is more efficient in some cases. It will also make it possible to render shadows for alpha tested objects like grass.
But it seems like you really need the resulting shadow geometry, so I'm not sure if that's an option for you.
Now back to the shadow volumes.
Extracting the shadow silhouette from a mesh using geometry shaders is a pretty complex process. But there's enough information about it on the internet.
Here's an article by Nvidia, which explains the process in detail:
Efficient and Robust Shadow Volumes Using Hierarchical Occlusion Culling and Geometry Shaders.
Here's another approach (from 2003) which doesn't even require geometry shaders, which could be interesting on low-end hardware:
http://de.slideshare.net/stefan_b/shadow-volumes-on-programmable-graphics-hardware
If you don't need the most efficient solution (using the shadow silhouette), you can also simply extrude every triangle of the mesh on its own. This is very easy with a geometry shader. I'd try that first before implementing silhouette extraction on the GPU.
About the "render to VBO" part of your question:
Transform feedback (core since OpenGL 3.0) can capture the primitives emitted by the geometry shader into a buffer object, which you can then map and read back on the CPU or reuse directly as a VBO - so "render to VBO" is possible.

Deferred Rendering with Tile-Based culling Concept Problems

EDIT: I'm still looking for some help about the use of OpenCL or compute shaders. I would prefer to keep using OGL 3.3 and not have to deal with the bad driver support for OGL 4.3 and OpenCL 1.2, but I can't think of any way to do this type of shading without using one of the two (to match lights to tiles). Is it possible to implement tile-based culling without using GPGPU?
I wrote a deferred renderer in OpenGL 3.3. Right now I don't do any culling for the light pass (I just render a full-screen quad for every light). This (obviously) has a ton of overdraw (sometimes ~100%). Because of this I've been looking into ways to improve performance during the light pass. It seems like the best way, in (almost) everyone's opinion, is to cull the scene using screen-space tiles. This was the method used in Frostbite 2. I read the presentation by Andrew Lauritzen from SIGGRAPH 2010 (http://download-software.intel.com/sites/default/files/m/d/4/1/d/8/lauritzen_deferred_shading_siggraph_2010.pdf), and I'm not sure I fully understand the concept (and, for that matter, why it's better than anything else, and whether it is better for me).
In the presentation Lauritzen goes over deferred shading with light volumes, quads, and tiles for culling the scene. According to his data, the tile-based deferred renderer was the fastest (by far). I don't understand why it is, though. I'm guessing it has something to do with the fact that for each tile, all the lights are batched together. In the presentation it says to read the G-Buffer once and then compute the lighting, but this doesn't make sense to me. In my mind, I would implement it like this:
for each tile {
    for each light affecting the tile {
        render quad (the tile) and compute lighting
        blend with previous tiles (GL_ONE, GL_ONE)
    }
}
This would still involve sampling the G-Buffer a lot. I would think that doing that would give the same (if not worse) performance as rendering a screen-aligned quad for every light. From the way it's worded, though, it seems like this is what's happening:
for each tile {
    render quad (the tile) and compute all lights
}
But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs. Can anyone help me with this? It also seems like almost every tile-based deferred renderer uses compute shaders or OpenCL (to batch the lights). Why is this, and what would happen if I didn't use them?
But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs.
It rather depends on how many lights you have. The "instruction limits" are pretty high; it's generally not something you need to worry about outside of degenerate cases. Even if 100+ lights affect a tile, odds are fairly good that your lighting computations aren't going to exceed the instruction limits.
Modern GL 3.3 hardware can run at least 65536 dynamic instructions in a fragment shader, and likely more. For 100 lights, that's still 655 instructions per light. Even if you take 2000 instructions to compute the camera-space position, that still leaves 635 instructions per light. Even if you were doing Cook-Torrance directly in the GPU, that's probably still sufficient.
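As for doing the tile culling without compute shaders or OpenCL: one possibility (just a sketch, with made-up types and helpers, not something from the presentation) is to bin the lights into screen-space tiles on the CPU every frame, then render one tile-sized quad per tile with its own light list.

#include <algorithm>
#include <vector>

struct Rect { int minX, minY, maxX, maxY; };          // screen-space bounds, in pixels

// "Light" and projectToScreenRect() are hypothetical: the helper projects a light's
// bounding sphere with the view-projection matrix and returns its pixel rectangle.
void binLights(const std::vector<Light>& lights, int screenWidth, int screenHeight,
               std::vector<std::vector<int>>& tileLights, int& tilesX, int& tilesY)
{
    const int TILE = 16;                              // 16x16-pixel tiles
    tilesX = (screenWidth  + TILE - 1) / TILE;
    tilesY = (screenHeight + TILE - 1) / TILE;
    tileLights.assign(tilesX * tilesY, {});

    for (int i = 0; i < (int)lights.size(); ++i) {
        Rect r = projectToScreenRect(lights[i]);
        int x0 = std::max(0, r.minX / TILE), x1 = std::min(tilesX - 1, r.maxX / TILE);
        int y0 = std::max(0, r.minY / TILE), y1 = std::min(tilesY - 1, r.maxY / TILE);
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                tileLights[ty * tilesX + tx].push_back(i);   // light i touches this tile
    }
}
// Per tile, upload tileLights[...] as a uniform array (or pack every list into a
// texture buffer) and loop over it in the fragment shader while shading that tile.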

OpenGL voxel engine slow

I'm making a voxel engine in C++ and OpenGL (à la Minecraft) and can't get a decent fps on my 3 GHz machine with an ATI X1600... I'm all out of ideas.
When I have about 12000 cubes on the screen it falls to under 20 fps - pathetic.
So far the optimizations I have are: frustum culling, back-face culling (via OpenGL's glEnable(GL_CULL_FACE)), and the engine draws only the visible faces (except the culled ones, of course) and keeps them in an octree.
I've tried VBOs; I don't like them, and they do not significantly increase the fps.
How can Minecraft's engine be so fast? I struggle with 10000 cubes, whereas Minecraft can easily draw many more at a higher fps.
Any ideas?
#genpfault: I analyze the connectivity and just generate faces for the outer, visible surface. The VBO had a single cube that I glTranslate()d
I'm not an expert at OpenGL, but as far as I understand this is going to save very little time because you still have to send every cube to the card.
Instead, what you should do is generate faces for the entire outer visible surface, put them in a VBO, send that to the card, and keep rendering that VBO until the geometry changes. This saves a lot of the time the card would otherwise spend waiting on the CPU to send geometry data.
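Concretely (a sketch; "vertices" stands for the merged face data as a flat x,y,z array with four corners per face), the per-frame cost then collapses to a bind and a single draw call, which suits the fixed-function hardware in question:

// Build once, whenever the chunk's voxels change:
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(float),
             vertices.data(), GL_STATIC_DRAW);

// Every frame, drawing the whole chunk is one call:
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, (void*)0);           // positions come from the bound VBO
glDrawArrays(GL_QUADS, 0, (GLsizei)(vertices.size() / 3));
glDisableClientState(GL_VERTEX_ARRAY);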
You should profile your code to find out whether the bottleneck is on the CPU or the GPU. For instance, it might be that your culling/octree algorithms are slow, in which case it is not an OpenGL problem at all.
I would also keep count of the number of cubes you draw on each frame and display that on screen. Just so you know your culling routines work as expected.
Finally you don't mention if your cubes are textured. Try using smaller textures or disable textures and see how much the framerate increases.
gDEBugger is a great tool that will help you find bottlenecks with OpenGL.
I don't know if it's OK to "bump" an old question here, but a few things came to mind:
If your voxels are static you can speed up the whole rendering process by using an octree for frustum culling, etc. Furthermore, you can also compile a static scene into a potentially visible set (PVS) in the octree. The main principle of a PVS is to precompute, for every node in the tree, which other nodes are potentially visible from it, and store pointers to them in a vector. When it comes to rendering, you first check which node the camera is in and then run frustum culling against all nodes in that node's PVS vector. (Carmack used something like that in the Quake engines, but with binary space partitioning trees.)
If the shading of your voxels is somewhat complex, it is also fast to do a depth-only pre-pass, without writing into the color buffer, just to fill the depth buffer. After that you render a second pass: disable writing to the depth buffer and render only to the color buffer while testing against the depth buffer. That way you avoid expensive shader computations that would later be overwritten by a new fragment closer to the viewer. (Carmack used that in Quake 3.)
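In GL calls that two-pass scheme looks roughly like this (a sketch; drawScene() stands in for submitting your opaque geometry with whatever shader is bound):

// Pass 1: depth only - no colour writes, cheapest possible shader.
glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
glDepthMask(GL_TRUE);
glDepthFunc(GL_LESS);
drawScene();                      // hypothetical helper: draws all opaque geometry

// Pass 2: full shading, but only for fragments that survived pass 1.
glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
glDepthMask(GL_FALSE);            // the depth buffer is already final
glDepthFunc(GL_LEQUAL);           // accept fragments at exactly the stored depth
drawScene();                      // same geometry and transforms, expensive shader bound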
Another thing which will definitely speed things up is the use of instancing. You store only the position of each voxel and, if necessary, its scale and other parameters in a texture buffer object (TBO). In the vertex shader you can then read the positions of the voxels to be spawned and create an instance of the voxel (i.e. a cube which is given to the shader in a vertex buffer object). So you send the 8 vertices + 8 normals (3 * sizeof(float) * 8 + 3 * sizeof(float) * 8 + floats for color/texture etc.) only once to the card in the VBO, and then only the positions of the cube instances (3 * sizeof(float) * number of voxels) in the TBO.
Maybe it is possible to parallelize things between GPU and CPU by combining all three steps in two threads: in the CPU thread you check the octree's PVS and update a TBO for instancing in the next frame, while the GPU thread renders the two passes using the TBO for instancing that the CPU thread created in the previous step. After that you swap TBOs. If the camera has not moved, you don't even have to do the CPU calculations again.
Another kind of tree you may be interested in is the so-called k-d tree, which is more general than an octree.
PS: Sorry for my English, it's not the clearest.
There are 3rd-party libraries you could use to make the rendering more efficient. For example the C++ PolyVox library can take a volume and generate the mesh for you in an efficient way. It has built-in methods for reducing triangle count and helping to generate things like ambient occlusion. It's got a good community around it so getting support on the forum should be easy.
Have you used a common display list for all your cubes?
Do you skip the drawing code for cubes which are not visible to the user?

Using Vertex Buffer Objects for a tile-based game and texture atlases

I'm creating a tile-based game in C# with OpenGL and I'm trying to optimize my code as best as possible.
I've read several articles and sections in books and all come to the same conclusion (as you may know) that use of VBOs greatly increases performance.
I'm not quite sure, however, how they work exactly.
My game will have tiles on the screen, some will change and some will stay the same. To use a VBO for this, I would need to add the coordinates of each tile to an array, correct?
Also, to texture these tiles, would I have to create a separate VBO?
I'm not quite sure what the code would look like for tiling these coordinates if I've got tiles that are animated and tiles that will be static on the screen.
Could anyone give me a quick rundown of this?
I plan on using a texture atlas of all of my tiles. I'm not sure where to begin to use this atlas for the textured tiles.
Would I need to compute the coordinates of the tile in the atlas to be applied? Is there any way I could simply use the coordinates of the atlas to apply a texture?
If anyone could clear up these questions it would be greatly appreciated. I could even possibly reimburse someone for their time & help if wanted.
Thanks,
Greg
OK, so let's split this into parts. You didn't specify which version of OpenGL you want to use - I'll assume GL 3.3.
VBO
Vertex buffer objects, when considered as an alternative to client-side vertex arrays, mostly save bus bandwidth to the GPU. A tile map is not really a lot of geometry. However, in recent GL versions vertex buffer objects are the only way of specifying vertices (which makes a lot of sense), so we cannot really talk about "increasing performance" here. If you mean "compared to deprecated vertex specification methods like immediate mode or client-side arrays", then yes, you'll get a performance boost, but you'd probably only feel it with 10k+ vertices per frame, I suppose.
Texture atlases
Texture atlases are indeed a nice feature for saving on texture switching. However, on GL3 (and DX10)-capable GPUs you can save yourself a LOT of the trouble characteristic of this technique, because a more modern and convenient approach is available. Check the GL reference docs for TEXTURE_2D_ARRAY - you'll like it. If GL3 cards are your target, forget texture atlases. If not, look up which older cards support texture arrays as an extension; I'm not familiar with the details.
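For reference, building such a texture array from same-sized tile images might look roughly like this (a sketch; tileW, tileH, tileCount, and tilePixels() are placeholder names):

// One GL_TEXTURE_2D_ARRAY holding every tile image (GL 3.0+). All layers must share
// the same size and format.
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D_ARRAY, tex);
glTexImage3D(GL_TEXTURE_2D_ARRAY, 0, GL_RGBA8, tileW, tileH, tileCount,
             0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);         // allocate all layers
for (int i = 0; i < tileCount; ++i)
    glTexSubImage3D(GL_TEXTURE_2D_ARRAY, 0, 0, 0, i,         // zoffset = layer index
                    tileW, tileH, 1, GL_RGBA, GL_UNSIGNED_BYTE, tilePixels(i));
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D_ARRAY, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
// In the fragment shader, sample with a sampler2DArray: texture(tiles, vec3(uv, layer)).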
Rendering
So how do you draw a tile map efficiently? Let's focus on the data. There are lots of tiles and each tile has the following information:
grid position (x,y)
material (let's call it "material" not "texture" because as you said the image might be animated and change in time; the "material" would then be interpreted as "one texture or set of textures which change in time" or anything you want).
That should be all the "per-tile" data you'd need to send to the GPU. You want to render each tile as a quad or triangle strip, so you have two alternatives:
send 4 vertices (x,y),(x+w,y),(x+w,y+h),(x,y+h) instead of (x,y) per tile,
use a geometry shader to calculate the 4 points along with texture coords for every 1 point sent.
Pick your favourite. Also note that this directly corresponds to what your VBO is going to contain - the latter solution would make it 4x smaller.
For the material, you can pass it as a symbolic integer, and in your fragment shader - based on the current time (passed as a uniform variable) and the material ID for a given tile - you can decide which texture ID from the texture array to use. In this way you can make a simple texture animation.
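Putting it together, a possible per-tile vertex layout for the first alternative above (a sketch; the struct, attribute locations, and tileVBO are illustrative, not prescribed by the answer):

#include <cstddef>   // offsetof
#include <vector>

struct TileVertex {
    float x, y;       // corner position on the grid
    float u, v;       // texture coordinate within the tile image
    int   material;   // index used to pick the texture-array layer / animation set
};

std::vector<TileVertex> verts = buildTileVertices();          // hypothetical helper

glBindBuffer(GL_ARRAY_BUFFER, tileVBO);                       // assumed created earlier
glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(TileVertex),
             verts.data(), GL_STATIC_DRAW);

glEnableVertexAttribArray(0);                                 // position
glVertexAttribPointer(0, 2, GL_FLOAT, GL_FALSE, sizeof(TileVertex),
                      (void*)offsetof(TileVertex, x));
glEnableVertexAttribArray(1);                                 // texcoord
glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, sizeof(TileVertex),
                      (void*)offsetof(TileVertex, u));
glEnableVertexAttribArray(2);                                 // material id (integer attribute)
glVertexAttribIPointer(2, 1, GL_INT, sizeof(TileVertex),
                       (void*)offsetof(TileVertex, material));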

OpenGL, applying texture from image to isosurface

I have a program in which I need to apply a 2-dimensional texture (simple image) to a surface generated using the marching-cubes algorithm. I have access to the geometry and can add texture coordinates with relative ease, but the best way to generate the coordinates is eluding me.
Each point in the volume represents a single unit of data, and each unit of data may have different properties. To simplify things, I'm looking at sorting them into "types" and assigning each type a texture (or portion of a single large texture atlas).
My problem is I have no idea how to generate the appropriate coordinates. I can store the location of the type's texture in the type class and use that, but then seams will be horribly stretched (if two neighboring points use different parts of the atlas). If possible, I'd like to blend the textures on seams, but I'm not sure the best manner to do that. Blending is optional, but I need to texture the vertices in some fashion. It's possible, but undesirable, to split the geometry into parts for each type, or to duplicate vertices for texturing purposes.
I'd like to avoid using shaders if possible, but if necessary I can use a vertex and/or fragment shader to do the texture blending. If I do use shaders, what would be the most efficient way of telling them which texture or portion to sample? It seems like passing the type through a parameter would be the simplest way, but possibly slow.
My volumes are relatively small, 8-16 points in each dimension (I'm keeping them smaller to speed up generation, but there are many on-screen at a given time). I briefly considered making the isosurface twice the resolution of the volume, so each point has more vertices (8, in theory), which may simplify texturing. It doesn't seem like that would make blending any easier, though.
To build the surfaces, I'm using the Visualization Library for OpenGL and its marching cubes and volume system. I have the geometry generated fine, just need to figure out how to texture it.
Is there a way to do this efficiently, and if so what? If not, does anyone have an idea of a better way to handle texturing a volume?
Edit: Just to note, the texture isn't simply a gradient of colors. It's actually a texture, usually with patterns. Hence the difficulty in mapping it, a gradient would've been trivial.
Edit 2: To help clarify the problem, I'm going to add some examples. They may just confuse things, so consider everything above definite fact and these just as help if they can.
My geometry is in cubes, always (loaded, generated and saved in cubes). If shape influences possible solutions, that's it.
I need to apply textures, consisting of patterns and/or colors (unique ones depending on the point's "type") to the geometry, in a technique similar to the splatting done for terrain (this isn't terrain, however, so I don't know if the same techniques could be used).
Shaders are a quick and easy solution, although I'd like to avoid them if possible, as I mentioned before. Something usable in a fixed-function pipeline is preferable, mostly for the minor increase in compatibility and development time. Since it's only a minor increase, I will go with shaders and multipass rendering if necessary.
Not sure if any other clarification is necessary, but I'll update the question as needed.
On the texture combination part of the question:
Have you looked into 3D textures? As we're talking about marching cubes, I should probably say immediately that I'm explicitly not talking about volumetric textures. Instead you stack all your 2D textures into a 3D texture. You then encode each texture coordinate so that the first two components are the 2D position it would normally have, with the texture it references as the third coordinate. It works best if your textures are generally of the type where, logically, to transition from one kind of pattern to another you have to go through the intermediaries.
An obvious use example is texture mapping to a simple height map — you might have a snow texture on top, a rocky texture below that, a grassy texture below that and a water texture at the bottom. If a vertex that references the water is next to one that references the snow then it is acceptable for the geometry fill to transition through the rock and grass texture.
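Setting that up is mostly a matter of allocating a GL_TEXTURE_3D whose depth is the number of stacked 2D textures and letting linear filtering on the third coordinate do the blending (a sketch; texW, texH, layerCount, and layerPixels() are made-up names):

GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_3D, tex);
glTexImage3D(GL_TEXTURE_3D, 0, GL_RGBA8, texW, texH, layerCount,
             0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
for (int i = 0; i < layerCount; ++i)                 // copy each 2D texture into a slice
    glTexSubImage3D(GL_TEXTURE_3D, 0, 0, 0, i, texW, texH, 1,
                    GL_RGBA, GL_UNSIGNED_BYTE, layerPixels(i));
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);  // blends within a slice...
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);  // ...and across slices (r axis)
glTexParameteri(GL_TEXTURE_3D, GL_TEXTURE_WRAP_R, GL_CLAMP_TO_EDGE);
// A vertex referencing "water" gets r near 0 and one referencing "snow" gets r near 1;
// intermediate r values fade through the layers in between, as described above.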
An alternative is to do it in multiple passes using additive blending. For each texture, draw every face that uses that texture and draw a fade to transparent extending across any faces that switch from one texture to another.
You'll probably want to prep the depth buffer with a complete draw (with the colour masks all set to reject changes to the colour buffer) then switch to a GL_EQUAL depth test and draw again with writing to the depth buffer disabled. Drawing exactly the same geometry through exactly the same transformation should produce exactly the same depth values irrespective of issues of accuracy and precision. Use glPolygonOffset if you have issues.
On the coordinates part:
Popular and easy mappings are cylindrical, box and spherical. Conceptualise that your shape is bounded by a cylinder, box or sphere with a well defined mapping from surface points to texture locations. Then for each vertex in your shape, start at it and follow the normal out until you strike the bounding geometry. Then grab the texture location that would be at that position on the bounding geometry.
I guess there's a potential problem that normals tend not to be brilliant after marching cubes, but I'll wager you know more about that problem than I do.
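For the spherical case the mapping is just a couple of trig calls; a sketch, assuming the bounding sphere is centred on the mesh so the normalised vertex normal can be used as the direction:

#include <cmath>

// Convert a direction (e.g. the vertex normal) into spherical texture coordinates.
void sphericalUV(float nx, float ny, float nz, float& u, float& v)
{
    const float pi = 3.14159265358979f;
    float len = std::sqrt(nx * nx + ny * ny + nz * nz);
    nx /= len; ny /= len; nz /= len;                 // normalise the direction
    u = 0.5f + std::atan2(nz, nx) / (2.0f * pi);     // longitude mapped to [0, 1]
    v = 0.5f - std::asin(ny) / pi;                   // latitude mapped to [0, 1]
}
// Note the usual seam where u wraps from 1 back to 0; vertices straddling it may need
// to be duplicated, just as with any spherical mapping.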
This is a hard and interesting problem.
The simplest way is to avoid the issue completely by using 3D texture maps, especially if you just want to add some random surface detail to your isosurface geometry. Perlin noise based procedural textures implemented in a shader work very well for this.
The difficult way is to look into various algorithms for conformal texture mapping (also known as conformal surface parametrization), which aim to produce a mapping between 2D texture space and the surface of the 3D geometry that is in some sense optimal (least distorting). This paper has some good pictures. Be aware that the topology of the geometry is very important; it's easy to generate a conformal mapping for a closed surface like a brain, but considerably more complex for higher-genus objects where it's necessary to introduce cuts/tears/joins.
You might want to try making a UV Map of a mesh in a tool like Blender to see how they do it. If I understand your problem, you have a 3D field which defines a solid volume as well as a (continuous) color. You've created a mesh from the volume, and now you need to UV-map the mesh to a 2D texture with texels extracted from the continuous color space. In a tool you would define "seams" in the 3D mesh which you could cut apart so that the whole mesh could be laid flat to make a UV map. There may be aliasing in your texture at the seams, so when you render the mesh it will also be discontinuous at those seams (ie a triangle strip can't cross over the seam because it's a discontinuity in the texture).
I don't know any formal methods for flattening the mesh, but you could imagine cutting it along the seams and then treating the whole thing as a spring/constraint system that you drop onto a flat surface. I'm all about solving things the hard way. ;-)
Due to the issues with texturing and some of the constraints I have, I've chosen to write a different algorithm to build the geometry and handle texturing directly in that as it produces surfaces. It's somewhat less smooth than the marching cubes, but allows me to apply the texcoords in a way that works for my project (and is a bit faster).
For anyone interested in texturing marching cubes, or just blending textures, Tommy's answer is a very interesting technique and the links timday posted are excellent resources on flattening meshes for texturing. Thanks to both of them for their answers, hopefully they can be of use to others. :)