OpenGL Geometry Extrusion with Geometry Shader

With the GLE Tubing and Extrusion Library (http://www.linas.org/gle/) I am able to extrude 2D contours into 3D objects using OpenGL. The library does all the work on the CPU and uses OpenGL immediate mode.
I guess doing the extrusion on the GPU using geometry shaders might be faster, especially when rendering a lot of geometry. Since I do not yet have any experience with geometry shaders in OpenGL, I would like to know if that is possible and what I have to pay attention to. Do you think it is a good idea to move those computations to the GPU, and that it will increase performance? It should also be possible to get the rendered geometry back from the GPU to the CPU, possibly using "Render to VBO".
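For readers new to the term, here is a minimal CPU sketch of straight extrusion, assuming a closed 2D contour pushed along +z: every contour edge becomes one quad, split into two triangles. This is not GLE's API (GLE also handles end caps, normals and curved paths), and the types and the name `extrude_contour` are hypothetical. A geometry shader doing the same job would emit these two triangles per input edge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch. */
typedef struct { float x, y; } Vec2;
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 v[3]; } Tri;

/* Extrudes a closed 2D contour of n points from z = 0 to z = depth.
   Writes 2*n side-wall triangles into out (caller provides room for 2*n)
   and returns the number written. End caps are not generated. */
size_t extrude_contour(const Vec2 *pts, size_t n, float depth, Tri *out)
{
    assert(n >= 3);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        Vec2 a = pts[i];
        Vec2 b = pts[(i + 1) % n];                 /* wrap to close the contour */
        Vec3 a0 = { a.x, a.y, 0.0f }, b0 = { b.x, b.y, 0.0f };
        Vec3 a1 = { a.x, a.y, depth }, b1 = { b.x, b.y, depth };
        /* One quad per edge, split into two triangles with the same winding. */
        out[count++] = (Tri){ { a0, b0, b1 } };
        out[count++] = (Tri){ { a0, b1, a1 } };
    }
    return count;
}
```

A contour with n points yields 2n side triangles, so a 4-point square yields 8.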

If the geometry indeed changes every frame, you should do it on the GPU.
Keep in mind that any solution that doesn't rely on immediate mode (for example, storing the extruded geometry in a VBO) will be faster than what you have right now. You might not even have to do it on the GPU.
But maybe you want to use shadow mapping instead, which is more efficient in some cases. It will also make it possible to render shadows for alpha tested objects like grass.
But it seems like you really need the resulting shadow geometry, so I'm not sure if that's an option for you.
Now back to the shadow volumes.
Extracting the shadow silhouette from a mesh using geometry shaders is a pretty complex process. But there's enough information about it on the internet.
Here's an article by Nvidia, which explains the process in detail:
Efficient and Robust Shadow Volumes Using Hierarchical Occlusion Culling and Geometry Shaders.
Here's another approach (from 2003) that doesn't even require geometry shaders, which could be interesting for low-end hardware:
http://de.slideshare.net/stefan_b/shadow-volumes-on-programmable-graphics-hardware
If you don't need the most efficient solution (using the shadow silhouette), you can also simply extrude every triangle of the mesh on its own. This is very easy to do in a geometry shader. I'd try that first before trying to implement silhouette extraction on the GPU.
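Here is a CPU sketch of what such a per-triangle geometry shader would emit. It assumes a directional light travelling along `light_dir` and a finite extrusion distance (many implementations instead extrude to infinity using w = 0). Each light-facing triangle becomes a closed prism made of a front cap, a reversed back cap, and two triangles per edge, 8 triangles in total. All names here are hypothetical, and the winding assumes counter-clockwise front faces.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 v[3]; } Tri;

static Vec3 sub(Vec3 a, Vec3 b) { return (Vec3){ a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 add_scaled(Vec3 a, Vec3 d, float s) { return (Vec3){ a.x + d.x * s, a.y + d.y * s, a.z + d.z * s }; }
static Vec3 cross(Vec3 a, Vec3 b) { return (Vec3){ a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* Builds a closed shadow-volume prism for one CCW triangle lit by a
   directional light travelling along light_dir. Triangles facing away from
   the light emit nothing. Returns the number of triangles written (0 or 8). */
size_t extrude_triangle(const Tri *t, Vec3 light_dir, float dist, Tri out[8])
{
    Vec3 n = cross(sub(t->v[1], t->v[0]), sub(t->v[2], t->v[0]));
    if (dot(n, light_dir) >= 0.0f)
        return 0;                                     /* not facing the light */

    Vec3 back[3];                                     /* extruded copies */
    for (int i = 0; i < 3; ++i)
        back[i] = add_scaled(t->v[i], light_dir, dist);

    size_t k = 0;
    out[k++] = *t;                                    /* front cap */
    out[k++] = (Tri){ { back[0], back[2], back[1] } };/* back cap, reversed */
    for (int i = 0; i < 3; ++i) {                     /* one quad per edge */
        int j = (i + 1) % 3;
        out[k++] = (Tri){ { t->v[i], back[i], back[j] } };
        out[k++] = (Tri){ { t->v[i], back[j], t->v[j] } };
    }
    assert(k == 8);
    return k;
}
```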
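About the "render to VBO" part of your question:
This is possible with transform feedback (core since OpenGL 3.0), which writes the output of the vertex or geometry shader into a buffer object. You can draw from that buffer again, or read it back to the CPU, for example with glGetBufferSubData or by mapping the buffer.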

Related

Possibility of a mirroring shader in OpenGL?

Until today, when I wanted to create reflections (a mirror) in opengl, I rendered a view into a texture and displayed that texture on the mirroring surface.
What I want to know is:
1. Are there any other methods to create a mirror in OpenGL?
2. Can this be done solely in shaders (e.g. a geometry shader)?
Ray-tracing. You can write a ray-tracer in the fragment shader (every fragment follows a ray). Ray-tracers can perfectly deal with reflection (mirroring) on all kinds of surfaces.
You can find an OpenGL example here and a WebGL example including mirroring here.
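To make the "every fragment follows a ray" idea concrete, here is a minimal sketch of the mirror step such a fragment-shader ray tracer performs, written in C for illustration: intersect the ray with a flat mirror plane, then continue along the reflected direction. The plane representation (unit normal `n`, offset `d`) and the function name are assumptions; the reflection formula is the same one GLSL's `reflect()` uses.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 madd(Vec3 a, Vec3 b, float s) { return (Vec3){ a.x + b.x * s, a.y + b.y * s, a.z + b.z * s }; }

/* Mirror plane: dot(n, p) + d = 0, with n of unit length.
   On a hit, stores the hit point and the reflected direction and returns true. */
bool mirror_bounce(Vec3 origin, Vec3 dir, Vec3 n, float d, Vec3 *hit, Vec3 *reflected)
{
    assert(hit && reflected);
    float denom = dot3(n, dir);
    if (denom == 0.0f)
        return false;                              /* ray parallel to mirror */
    float t = -(dot3(n, origin) + d) / denom;
    if (t <= 0.0f)
        return false;                              /* mirror is behind the ray */
    *hit = madd(origin, dir, t);
    /* Same as GLSL reflect(I, N) = I - 2 * dot(N, I) * N */
    *reflected = madd(dir, n, -2.0f * denom);
    return true;
}
```

A full tracer would then trace the reflected ray from the hit point against the rest of the scene.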
There is no universal way to do that in any 3D API I know of.
Depending on your case there are several possible techniques with different downsides.
Planar reflections: That's what you are doing already.
Note that your mirror needs to be flat, and you have to clip so that anything closer than the mirror isn't rendered into the texture.
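For the planar case, the reflected view is usually obtained by multiplying the view matrix with a reflection matrix for the mirror plane. Here is a minimal sketch, assuming a unit-length plane normal and OpenGL's column-major layout (function names are hypothetical):

```c
#include <assert.h>

/* Column-major (OpenGL-style) 4x4 matrix that mirrors points across the plane
   nx*x + ny*y + nz*z + d = 0. (nx, ny, nz) must be unit length. */
void make_reflection(float m[16], float nx, float ny, float nz, float d)
{
    float len2 = nx * nx + ny * ny + nz * nz;
    assert(len2 > 0.99f && len2 < 1.01f);          /* expects a unit normal */
    const float n[3] = { nx, ny, nz };
    for (int col = 0; col < 3; ++col)              /* upper 3x3: I - 2 n n^T */
        for (int row = 0; row < 3; ++row)
            m[col * 4 + row] = (row == col ? 1.0f : 0.0f) - 2.0f * n[row] * n[col];
    for (int i = 0; i < 3; ++i) {
        m[3 * 4 + i] = -2.0f * d * n[i];           /* translation column */
        m[i * 4 + 3] = 0.0f;                       /* bottom row: 0 0 0 */
    }
    m[15] = 1.0f;
}

/* Applies m to the point (p[0], p[1], p[2], 1). */
void transform_point(const float m[16], const float p[3], float out[3])
{
    for (int row = 0; row < 3; ++row)
        out[row] = m[0 * 4 + row] * p[0] + m[1 * 4 + row] * p[1]
                 + m[2 * 4 + row] * p[2] + m[3 * 4 + row];
}
```

Because a reflection flips triangle winding, you will typically also need to swap the front-face setting (e.g. `glFrontFace(GL_CW)`) while rendering the mirrored view, and the clipping mentioned above can be done with a user clip plane (`gl_ClipDistance`).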
Good old cubemaps: attach a cubemap to each mirror, then sample it in the reflection direction. This works for any surface, but you will need to render the cubemaps (which can be done only once if you don't care about moving objects being reflected). I don't think you can do this without shaders, but only the mirror will need one. It's a very common technique because it's easy to implement, can be dynamic, and is fairly cheap, while being easy to integrate into an existing engine.
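In a shader the cubemap lookup is simply `texture(cubeSampler, reflect(I, N))`. The sketch below mirrors that on the CPU to show what happens: reflect the view vector about the surface normal, then pick the cube face from the component with the largest magnitude, in the `GL_TEXTURE_CUBE_MAP_POSITIVE_X + k` order. Tie handling is simplified here and may differ from real hardware; names are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* GLSL-style reflect(): I - 2 * dot(N, I) * N, with N of unit length. */
Vec3 reflect3(Vec3 i, Vec3 n)
{
    float d = n.x * i.x + n.y * i.y + n.z * i.z;
    return (Vec3){ i.x - 2.0f * d * n.x, i.y - 2.0f * d * n.y, i.z - 2.0f * d * n.z };
}

static float absf(float v) { return v < 0.0f ? -v : v; }

/* Face index for a lookup direction, in GL order: +X, -X, +Y, -Y, +Z, -Z.
   Ties between equal magnitudes are broken arbitrarily in this sketch. */
int cube_face(Vec3 r)
{
    assert(r.x != 0.0f || r.y != 0.0f || r.z != 0.0f);
    float ax = absf(r.x), ay = absf(r.y), az = absf(r.z);
    if (ax >= ay && ax >= az) return r.x > 0.0f ? 0 : 1;
    if (ay >= az)             return r.y > 0.0f ? 2 : 3;
    return r.z > 0.0f ? 4 : 5;
}
```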
Screen space ray-marching: this is what danny-ruijters suggested. It's somewhat like SSAO: for each pixel, sample the depth buffer along the reflection vector until you hit something. This has the advantage of being applicable anywhere (on arbitrarily complex surfaces), but it can only reflect things that appear on screen, which can introduce lots of small artifacts. On the other hand, it's completely dynamic and very simple to implement. Note that you will need an additional pass (or to render normals into a buffer) so that you can access the scene's final color while computing the reflections. You absolutely need shaders for that, but it's a post-process, so it won't interfere with the scene rendering if that's what you fear.
Some modern game engines use this to add small details to reflective surfaces without the burden of having to compute/store cubemaps.
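Here is a deliberately simplified sketch of the screen-space march. It assumes the reflection ray has already been projected to screen space as a start pixel, a per-step pixel offset and a per-step depth change, and that larger depth values are farther away. Real implementations add a thickness test, binary-search refinement and fading near screen edges; none of that is shown, and all names are hypothetical. The "miss" result (-1) corresponds to the limitation above: if the reflected object is off screen, there is nothing to sample.

```c
#include <assert.h>

/* Marches over a w x h depth buffer (row-major, larger = farther). The ray
   starts at pixel position (x, y) with depth z and advances by (dx, dy, dz)
   per step. Returns the linear index of the first pixel whose stored depth is
   in front of the ray, or -1 if the ray leaves the screen or runs out of steps. */
int ssr_march(const float *depth, int w, int h,
              float x, float y, float z,
              float dx, float dy, float dz, int max_steps)
{
    assert(depth && w > 0 && h > 0);
    for (int i = 0; i < max_steps; ++i) {
        x += dx; y += dy; z += dz;
        if (x < 0.0f || y < 0.0f || x >= (float)w || y >= (float)h)
            return -1;                             /* left the screen: no data */
        int idx = (int)y * w + (int)x;
        if (z >= depth[idx])
            return idx;                            /* ray passed behind geometry */
    }
    return -1;
}
```

The returned pixel would then be used to fetch the color from the previous pass.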
There are probably many other ways to render mirrors, but these are the three main ways of doing reflections (at least that I know of).

Deferred Rendering with Tile-Based culling Concept Problems

EDIT: I'm still looking for some help about the use of OpenCL or compute shaders. I would prefer to keep using OGL 3.3 and not have to deal with the bad driver support for OGL 4.3 and OpenCL 1.2, but I can't think of any way to do this type of shading without using one of the two (to match lights and tiles). Is it possible to implement tile-based culling without using GPGPU?
I wrote a deferred renderer in OpenGL 3.3. Right now I don't do any culling for the light pass (I just render a full-screen quad for every light). This (obviously) has a ton of overdraw (sometimes it is ~100%). Because of this I've been looking into ways to improve performance during the light pass. It seems like the best way, in (almost) everyone's opinion, is to cull the scene using screen-space tiles. This was the method used in Frostbite 2. I read the presentation by Andrew Lauritzen from SIGGRAPH 2010 (http://download-software.intel.com/sites/default/files/m/d/4/1/d/8/lauritzen_deferred_shading_siggraph_2010.pdf), and I'm not sure I fully understand the concept (or, for that matter, why it's better than anything else, and whether it is better for me).
In the presentation Lauritzen goes over deferred shading with light volumes, quads, and tiles for culling the scene. According to his data, the tile-based deferred renderer was the fastest (by far). I don't understand why, though. I'm guessing it has something to do with the fact that for each tile, all the lights are batched together. The presentation says to read the G-Buffer once and then compute the lighting, but this doesn't make sense to me. In my mind, I would implement it like this:
for each tile {
    for each light affecting the tile {
        render quad (the tile) and compute lighting
        blend with previous tiles (GL_ONE, GL_ONE)
    }
}
This would still involve sampling the G-Buffer a lot. I would think that doing that would have the same (if not worse) performance than rendering a screen aligned quad for every light. From how it's worded though, it seems like this is what's happening:
for each tile {
    render quad (the tile) and compute all lights
}
But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs. Can anyone help me with this? It also seems like almost every tile-based deferred renderer uses compute shaders or OpenCL (to batch the lights). Why is this, and if I didn't use these, what would happen?
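The second loop in the question is usually preceded by a binning step that produces, for every tile, the list of lights that can affect it; each pixel then reads the G-Buffer once and loops over only its tile's list. That binning is what compute shaders or OpenCL are typically used for. Here is a CPU sketch, assuming each light has already been projected to a screen-space bounding circle (the struct names, the per-tile cap and the function name are hypothetical):

```c
#include <assert.h>

#define MAX_LIGHTS_PER_TILE 64                      /* hypothetical cap */

typedef struct { float x, y, radius; } ScreenLight; /* screen-space bounds */
typedef struct { int count; int index[MAX_LIGHTS_PER_TILE]; } TileLights;

static float clampf(float v, float lo, float hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* Bins n lights into a tiles_x x tiles_y grid of tile_size-pixel tiles.
   A light is added to a tile when its bounding circle overlaps the tile
   rectangle. tiles must hold tiles_x * tiles_y entries. */
void bin_lights(const ScreenLight *lights, int n, int tile_size,
                int tiles_x, int tiles_y, TileLights *tiles)
{
    assert(tile_size > 0 && n >= 0);
    for (int ty = 0; ty < tiles_y; ++ty)
        for (int tx = 0; tx < tiles_x; ++tx) {
            TileLights *t = &tiles[ty * tiles_x + tx];
            float x0 = (float)(tx * tile_size), x1 = x0 + (float)tile_size;
            float y0 = (float)(ty * tile_size), y1 = y0 + (float)tile_size;
            t->count = 0;
            for (int i = 0; i < n && t->count < MAX_LIGHTS_PER_TILE; ++i) {
                /* Closest point of the tile rectangle to the light center. */
                float cx = clampf(lights[i].x, x0, x1);
                float cy = clampf(lights[i].y, y0, y1);
                float ddx = lights[i].x - cx, ddy = lights[i].y - cy;
                if (ddx * ddx + ddy * ddy <= lights[i].radius * lights[i].radius)
                    t->index[t->count++] = i;
            }
        }
}
```

Without compute shaders, one option is to run a loop like this on the CPU each frame and upload the per-tile index lists (for example in a texture buffer) for the fragment shader to read; whether that is fast enough depends on the light and tile counts.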
"But I don't see how one would do this without exceeding the instruction limit for the fragment shader on some GPUs."
It rather depends on how many lights you have. The "instruction limits" are pretty high; it's generally not something you need to worry about outside of degenerate cases. Even if 100+ lights affect a tile, odds are fairly good that your lighting computations aren't going to exceed instruction limits.
Modern GL 3.3 hardware can run at least 65536 dynamic instructions in a fragment shader, and likely more. For 100 lights, that's still 655 instructions per light. Even if you take 2000 instructions to compute the camera-space position, that still leaves 635 instructions per light. Even if you were doing Cook-Torrance directly on the GPU, that's probably still sufficient.
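The per-light figures come from dividing the assumed budget B = 65536 among N = 100 lights after subtracting a fixed setup cost S:

```latex
\text{per light} = \frac{B - S}{N}: \qquad
\frac{65536 - 0}{100} \approx 655, \qquad
\frac{65536 - 2000}{100} = \frac{63536}{100} \approx 635
```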

Is it possible to reuse glsl vertex shader output later?

I have a huge mesh (100k triangles) that needs to be drawn a few times and blended together every frame. Is it possible to reuse the vertex shader output of the first pass of the mesh, and skip the vertex stage on later passes? I am hoping to save some cost on the vertex pipeline and rasterization.
I'm targeting OpenGL 3.0, so I can use features like transform feedback.
I'll answer your basic question first, then answer your real question.
Yes, you can store the output of vertex transformation for later use. This is called Transform Feedback. It requires OpenGL 3.x-class hardware or better (aka: DX10-hardware).
The way it works is in two stages. First, you have to set your program up to have feedback-based varyings. You do this with glTransformFeedbackVaryings. This must be done before linking the program, in a similar way to things like glBindAttribLocation.
Once that's done, you need to bind buffers (matching how you set up your transform feedback varyings) to GL_TRANSFORM_FEEDBACK_BUFFER with glBindBufferRange (or glBindBufferBase), which sets up the buffers the data are written into. Then you start your feedback operation with glBeginTransformFeedback and render as normal. You can use a GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query object to get the number of primitives written (so that you can draw them later with glDrawArrays), or, if you have 4.x-class hardware (or AMD 3.x hardware, all of which supports ARB_transform_feedback2), you can render with glDrawTransformFeedback without querying the number of primitives. That avoids waiting on the query result, which saves time.
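The workflow above can be modelled on the CPU to show where the savings come from. This is not OpenGL code: `vertex_stage`, `capture_pass` and `replay_pass` are hypothetical stand-ins for the vertex shader, the capturing draw and the later pass-through draws. The vertex work runs only once, but every replay still pushes the same number of triangles through setup and rasterization, which is why the gain for a cheap vertex shader is expected to be small.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;

static size_t vertex_stage_calls = 0;     /* counts "vertex shader" runs */

/* Stand-in for a vertex shader: a hypothetical scale + offset. */
static Vec4 vertex_stage(Vec4 v)
{
    ++vertex_stage_calls;
    return (Vec4){ v.x * 2.0f, v.y * 2.0f, v.z * 2.0f + 1.0f, 1.0f };
}

/* First pass, analogous to drawing with glBeginTransformFeedback(GL_TRIANGLES)
   active: transform every vertex once into the capture buffer and return the
   triangle count, i.e. what a GL_TRANSFORM_FEEDBACK_PRIMITIVES_WRITTEN query
   would report. */
size_t capture_pass(const Vec4 *in, size_t n, Vec4 *capture)
{
    assert(n % 3 == 0);
    for (size_t i = 0; i < n; ++i)
        capture[i] = vertex_stage(in[i]);
    return n / 3;
}

/* Later passes read the already-transformed vertices with a pass-through
   shader, so vertex_stage() is not called again. The sum stands in for the
   per-vertex work that rasterization still has to do on every pass. */
float replay_pass(const Vec4 *capture, size_t triangles)
{
    float sum = 0.0f;
    for (size_t i = 0; i < triangles * 3; ++i)
        sum += capture[i].z;
    return sum;
}
```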
Now for your actual question: it's probably not going to buy you any real performance.
You're drawing terrain. And terrain doesn't really get any transformation. Typically you have a matrix multiplication or two, possibly with normals (though if you're rendering for shadow maps, you don't even have that). That's it.
Odds are very good that if you shove 100,000 vertices down the GPU with such a simple shader, you've probably saturated the GPU's ability to render them all. You'll likely bottleneck on primitive assembly/setup, and that's not getting any faster.
So you're probably not going to get much out of this. Feedback is generally used for either generating triangle data for later use (effectively pseudo-compute shaders), or for preserving the results from complex transformations like matrix palette skinning with dual-quaternions and so forth. A simple matrix multiply-and-go will barely be a blip on the radar.
You can try it if you like. But odds are you won't have any problems. Generally, the best solution is to employ some form of deferred rendering, so that you only have to render an object once + X for every shadow it casts (where X is determined by the shadow mapping algorithm). And since shadow maps require different transforms, you wouldn't gain anything from feedback anyway.

Are triangles a gpu restriction or are there other rendering pathways?

To preface this question, I have a competent understanding of OpenGL and the maths behind it, and while I have never touched anything related to DirectX I imagine the concepts are similar.
There is plenty of information around about why triangles are used for 3D graphics (they are necessarily planar, are indivisible except into smaller triangles, etc). However, I would like to know if triangles are merely a convenient way of storing and manipulating 3D data (simpler maths regarding interpolation, etc), or if there is a hardware limitation in the graphics card that only realistically allows the rendering of triangles (e.g. instructions that can essentially ONLY be applied to triangles).
Following on from this, is there any way to achieve pixel-by-pixel control of graphics rendering (as outlined briefly by the answer to this question)? While I appreciate that direct control over individual pixels is done through a driver, is there any way I can get this kind of control over a rendering environment? Is there a way to 'avoid triangles' completely?
Yes and no. Kind of.
Current GPUs are designed to render triangles because triangles are nice to work with. And because current GPUs are designed to work with triangles, people use triangles and so GPUs only need to process triangles, and so they're designed to process only triangles.
As you say, triangles just have advantages that make them convenient to use. GPUs can be made (and have been made) to render other primitives natively, but it's just not really worth it. If you tell a modern GPU to render a quad, it splits it up into two triangles and renders those.
Not because there's a technical reason why a GPU can't render quads natively, but because it's not worth spending transistors on. It's much more useful to focus the GPU on doing triangles as fast as possible, and then just emulate other primitives if they're needed.
So yes, modern GPUs have hardware limitations so they don't work with quads, for example, but not because it's impossible to design a GPU which works with quads. It'd just be less efficient to do so. :)
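Here is a sketch of how a quad (or any convex N-gon) can be emulated with triangles. It assumes a simple fan around the first vertex; which diagonal a given driver actually uses is not specified here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Triangulates a convex polygon whose vertices are first .. first+n-1 as a
   fan around the first vertex, so a quad (0,1,2,3) becomes (0,1,2) and
   (0,2,3). Writes 3*(n-2) indices to out and returns the triangle count.
   Only valid for convex polygons. */
size_t fan_triangulate(unsigned first, unsigned n, unsigned *out)
{
    assert(n >= 3);
    size_t k = 0;
    for (unsigned i = 1; i + 1 < n; ++i) {
        out[k++] = first;
        out[k++] = first + i;
        out[k++] = first + i + 1;
    }
    return k / 3;
}
```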
As for "avoiding triangles", sure, that's basically what the fragment shader does: it fills in one single pixel. The GPU just runs it a few million times in parallel to fill in the entire screen. You could draw two big triangles, which form a quad filling the entire screen, and then just specify a fragment shader which fills that with the content you like.
If you want more control over the process, do it in software instead: paint one pixel at a time to a memory surface, and then load that as a texture on the GPU. But it's slow. :)
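Here is a minimal sketch of that software path. A loop plays the role of the fragment shader and writes each pixel of an RGBA8 buffer, which could then be uploaded with `glTexImage2D(..., GL_RGBA, GL_UNSIGNED_BYTE, pixels)` and drawn on a full-screen quad. The gradient "shader" and the function name are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Fills a tightly packed w x h RGBA8 image one pixel at a time, the way a
   fragment shader fills a full-screen quad, but on the CPU. Returns a
   malloc'd buffer (caller frees) or NULL on allocation failure. */
unsigned char *paint(int w, int h)
{
    assert(w > 0 && h > 0);
    unsigned char *pixels = malloc((size_t)w * h * 4);
    if (!pixels)
        return NULL;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            unsigned char *p = pixels + ((size_t)y * w + x) * 4;
            p[0] = (unsigned char)(255 * x / (w > 1 ? w - 1 : 1));  /* red ramp */
            p[1] = (unsigned char)(255 * y / (h > 1 ? h - 1 : 1));  /* green ramp */
            p[2] = 0;                                               /* blue */
            p[3] = 255;                                             /* opaque */
        }
    return pixels;
}
```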
As far as I know, every modern GPU CAN render quads, and some even N-gons, but comparing the render time of a quad to that of two triangles shows the triangle advantage.
This is mainly because GPUs have been optimized to render triangles, and the actual hardware has way more "stream processors" (for triangles) than other kinds, such as texture units. Some other processor types on the GPU can render quads directly, but normally you would find a thousand stream processors to a few texture processors.
Note that getting a texture unit to render a quad is EXTREMELY difficult. It is possible in theory, but no one has used the principle in a serious case.
Unless you work very close to the hardware, the software will take care of the triangles (e.g. auto-converting quads into them).

What is the most efficient way to draw voxels (cubes) in opengl?

I would like to draw voxels by using opengl but it doesn't seem like it is supported. I made a cube drawing function that had 24 vertices (4 vertices per face) but it drops the frame rate when you draw 2500 cubes. I was hoping there was a better way. Ideally I would just like to send a position, edge size, and color to the graphics card. I'm not sure if I can do this by using GLSL to compile instructions as part of the fragment shader or vertex shader.
I searched google and found out about point sprites and billboard sprites (same thing?). Could those be used as an alternative to drawing a cube quicker? If I use 6, one for each face, it seems like that would be sending much less information to the graphics card and hopefully gain me a better frame rate.
Another thought: maybe I can draw multiple cubes using one glDrawElements call?
Maybe there is a better method altogether that I don't know about? Any help is appreciated.
Drawing voxels with cubes is almost always the wrong way to go (the exceptional case is ray-tracing). What you usually want to do is put the data into a 3D texture and render slices depending on camera position. See this page: https://developer.nvidia.com/gpugems/GPUGems/gpugems_ch39.html and you can find other techniques by searching for "volume rendering gpu".
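The slices are typically blended so that the samples along each view ray are combined with the standard "over" operator. Here is a CPU sketch of front-to-back compositing for one ray, assuming non-premultiplied colors and an early-termination threshold (names are hypothetical). GPU slice renderers often draw back-to-front with `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` instead, which produces the same color in the opposite order.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } RGBA;

/* Front-to-back "over" compositing of n samples along one view ray
   (e.g. one sample per rendered slice). Colors are not premultiplied.
   Stops early once the accumulated opacity reaches max_alpha. */
RGBA composite_front_to_back(const RGBA *s, int n, float max_alpha)
{
    assert(n >= 0);
    RGBA acc = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < n && acc.a < max_alpha; ++i) {
        float w = (1.0f - acc.a) * s[i].a;   /* remaining visibility * opacity */
        acc.r += w * s[i].r;
        acc.g += w * s[i].g;
        acc.b += w * s[i].b;
        acc.a += w;
    }
    return acc;
}
```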
EDIT: When writing the above answer I didn't realize that the OP was, most likely, interested in how Minecraft does that. For techniques to speed-up Minecraft-style rasterization check out Culling techniques for rendering lots of cubes. Though with recent advances in graphics hardware, rendering Minecraft through raytracing may become the reality.
What you're looking for is called instancing. You could take a look at glDrawElementsInstanced and glDrawArraysInstanced for a couple of possibilities. Note that these were only added as core operations relatively recently (OGL 3.1), but have been available as extensions quite a while longer.
nVidia's OpenGL SDK has an example of instanced drawing in OpenGL.
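Here is a sketch of what instancing changes, modelled on the CPU: the cube geometry is stored once, and each voxel contributes only a small per-instance record (position, edge size and color, as the question wanted). On the GPU the record would be a vertex attribute with `glVertexAttribDivisor(location, 1)` (core since OpenGL 3.3), and all cubes would be drawn with one `glDrawElementsInstanced` call. The struct and function names are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

/* Hypothetical per-instance record: one per voxel. */
typedef struct { Vec3 pos; float size; unsigned char rgb[3]; } CubeInstance;

/* Unit cube corners, shared by every instance (uploaded once). */
static const Vec3 kCubeCorners[8] = {
    {0, 0, 0}, {1, 0, 0}, {1, 1, 0}, {0, 1, 0},
    {0, 0, 1}, {1, 0, 1}, {1, 1, 1}, {0, 1, 1}
};

/* What the vertex shader would compute for corner `vertex` of instance
   `inst`: scale the shared corner by the edge size, then offset it. */
Vec3 instanced_vertex(const CubeInstance *instances, int inst, int vertex)
{
    assert(inst >= 0 && vertex >= 0 && vertex < 8);
    const CubeInstance *c = &instances[inst];
    Vec3 v = kCubeCorners[vertex];
    return (Vec3){ c->pos.x + c->size * v.x,
                   c->pos.y + c->size * v.y,
                   c->pos.z + c->size * v.z };
}
```

Eight shared corners are enough for positions; per-face normals or texture coordinates would still need the 24-vertex cube from the question, shared across all instances in the same way.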
First you really should be looking at OpenGL 3+ using GLSL. This has been the standard for quite some time. Second, most Minecraft-esque implementations use mesh creation on the CPU side. This technique involves looking at all of the block positions and creating a vertex buffer object that renders the triangles of all of the exposed faces. The VBO is only generated when the voxels change and is persisted between frames. An ideal implementation would combine coplanar faces of the same texture into larger faces.
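Here is a sketch of the face-culling part of such a mesher. It assumes a dense boolean grid for one chunk and treats everything outside the chunk as empty (a real mesher would check neighboring chunks). It only counts exposed faces; emitting their quads into a VBO follows the same loop. Names and grid layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dense grid layout: solid[(z * ny + y) * nx + x]. */
static bool solid_at(const bool *solid, int nx, int ny, int nz, int x, int y, int z)
{
    if (x < 0 || y < 0 || z < 0 || x >= nx || y >= ny || z >= nz)
        return false;                  /* outside the chunk counts as empty */
    return solid[(z * ny + y) * nx + x];
}

/* Counts the faces a CPU mesher would emit: a face of a solid voxel is
   exposed only if the neighbor on that side is empty. Each exposed face
   would become one quad (two triangles) in the chunk's VBO. */
int count_exposed_faces(const bool *solid, int nx, int ny, int nz)
{
    static const int dir[6][3] = {
        { 1, 0, 0 }, { -1, 0, 0 }, { 0, 1, 0 }, { 0, -1, 0 }, { 0, 0, 1 }, { 0, 0, -1 } };
    assert(nx > 0 && ny > 0 && nz > 0);
    int faces = 0;
    for (int z = 0; z < nz; ++z)
        for (int y = 0; y < ny; ++y)
            for (int x = 0; x < nx; ++x) {
                if (!solid_at(solid, nx, ny, nz, x, y, z))
                    continue;
                for (int d = 0; d < 6; ++d)
                    if (!solid_at(solid, nx, ny, nz,
                                  x + dir[d][0], y + dir[d][1], z + dir[d][2]))
                        ++faces;
            }
    return faces;
}
```

For a solid 2x2x2 block this yields 24 faces instead of the 48 a naive cube-per-voxel approach would draw.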