Prune Primitives using Geometry Shaders in OpenGL

I have a large 2D triangle mesh (not a triangle strip) with about 2+ million polygons. Many of these polygons are redundant, which can be determined from one of the attribute variables in the vertex shader. But discarding a vertex in the vertex shader is not possible, so I am exploring the idea of discarding primitives in the geometry shader as an optimization. It is guaranteed that all three vertices of such a primitive will have the same attribute value.
I have a few doubts here:
Will this really optimize rendering in terms of speed and GPU memory?
Is this even possible in a geometry shader, and are geometry shaders suitable for this?

But discarding a vertex in the vertex shader is not possible.
Nonsense. Oh sure, the VS cannot "discard" a triangle, but that doesn't mean it has no power.
Triangles which appear wholly off-screen will generally get minimal processing done on them. So at a minimum, you can have the VS adjust the final gl_Position value to be off screen based on whether the attribute in question has the property you are looking for.
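As a minimal sketch of that idea (the attribute name aDiscard and the uniform uMVP are made up for illustration):

    #version 330 core

    layout(location = 0) in vec2 aPosition;
    layout(location = 1) in float aDiscard; // hypothetical flag: non-zero means "redundant"

    uniform mat4 uMVP; // assumed model-view-projection matrix

    void main()
    {
        if (aDiscard != 0.0) {
            // All three vertices of a redundant triangle carry the flag,
            // so the whole triangle lands outside clip space and is
            // clipped away before rasterization.
            gl_Position = vec4(2.0, 2.0, 2.0, 1.0);
        } else {
            gl_Position = uMVP * vec4(aPosition, 0.0, 1.0);
        }
    }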
If you have access to OpenGL 4.5 (or the ARB_cull_distance extension), you can get even fancier by employing cull planes, which allow the VS to cull triangles more directly.
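A sketch of the cull-distance variant, with the same hypothetical inputs (note that gl_CullDistance must be redeclared with an explicit size; a triangle is culled when all of its vertices produce a negative value for the same cull distance):

    #version 450 core

    layout(location = 0) in vec2 aPosition;
    layout(location = 1) in float aDiscard; // same hypothetical flag as above

    uniform mat4 uMVP;

    // One cull plane.
    out float gl_CullDistance[1];

    void main()
    {
        gl_CullDistance[0] = (aDiscard != 0.0) ? -1.0 : 1.0;
        gl_Position = uMVP * vec4(aPosition, 0.0, 1.0);
    }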
Will this really optimize rendering in terms of speed and GPU memory?
Broadly speaking, you should never assume Geometry Shaders will increase the performance of any algorithm you apply it to. There may be cases where a GS could be used in optimizing an algorithm, but absent actual performance data, you should start from the assumption that employing it will only make performance worse.
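That said, to answer the "is it possible" part: yes, a geometry shader discards a primitive simply by not emitting it. A sketch, assuming the flag is forwarded from the VS as vDiscard (a hypothetical name):

    #version 330 core

    layout(triangles) in;
    layout(triangle_strip, max_vertices = 3) out;

    in float vDiscard[]; // hypothetical flag forwarded by the vertex shader

    void main()
    {
        // All three vertices are guaranteed to agree, so one test suffices.
        if (vDiscard[0] != 0.0)
            return; // emit nothing: the triangle is discarded

        for (int i = 0; i < 3; ++i) {
            gl_Position = gl_in[i].gl_Position;
            EmitVertex();
        }
        EndPrimitive();
    }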

Related

OpenGL indicesBuffer are they worth using?

I am currently learning OpenGL for 3D rendering and I can't quite wrap my head around some things regarding shaders and VBOs. I get that all VBOs share one index, and therefore you need to duplicate some data.
But when you create more VBOs, there are nearly no faces whose vertices share the same position, normal, and texture coordinates, so the indices are, at least from my point of view, pretty useless; it is basically just an array of consecutive numbers.
Is there an aspect of index buffers I don't see?
The utility of index buffers is, as with the utility of all vertex specification features, dependent on your mesh data.
Most of the meshes that get used in high-performance graphics, particularly those with significant polygon density, are smooth. Normals across such meshes are mostly smooth, since the modeller is usually approximating a curved surface. Oh yes, there can be some sharp edges here and there, but for the most part, each position in such models has a single normal.
Texture coordinates usually vary smoothly across meshes too. There are certainly texture coordinate edges; well-optimized UV unwrapping often produces these kinds of things. But if you have a mesh of real size (10K+ vertices), most positions have a single texture coordinate. And tangents/bitangents are based on the changes in texture coordinates, so those will match the texture topology.
Are there meshes where the normal topology is highly disjoint with position topology? Yes. Cubes being the obvious example. But there are oftentimes needs for highly faceted geometry, either to achieve a specific look or for low-polygon uses. In these cases, normal indexed rendering may not be of benefit to you.
But that does not change the fact that these cases are the exception, generally speaking, rather than the rule. Even if your code always involves these cases, that simply isn't true for the majority of high-performance graphics applications out there.
In a closed mesh, every vertex is shared by at least two faces. (The only time a vertex will be used fewer than three times is in a double-sided mesh, where two faces have the same vertices but opposite normals and winding order.) Not using indices, and just duplicating vertices, is not only inefficient, but, at minimum, doubles the amount of vertex data required.
There's also the potential for cache thrashing that could otherwise be avoided, along with the related pipeline stalls and other insanity.
Indices are your friend. Get to know them.
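A minimal illustration in plain C against desktop GL (buffer names are hypothetical): two triangles share an edge, so four vertices plus six small indices replace six full vertices.

    /* Four unique vertices describe a quad made of two triangles. */
    static const float positions[] = {
        -1.0f, -1.0f,   1.0f, -1.0f,
         1.0f,  1.0f,  -1.0f,  1.0f,
    };
    /* Six 16-bit indices instead of two more full vertices. */
    static const unsigned short indices[] = { 0, 1, 2,  2, 3, 0 };

    GLuint vbo, ebo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(positions), positions, GL_STATIC_DRAW);

    glGenBuffers(1, &ebo);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ebo);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(indices), indices, GL_STATIC_DRAW);

    /* ... set up vertex attributes as usual, then: */
    glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, (const void *)0);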
Update
Typically, normals, etc. are stored in a normal map, or interpolated between vertices.
If you just want a faceted or "flat shaded" render, use the cross product of dFdx() and dFdy() (or, in HLSL, ddx() and ddy()) to generate the per-pixel normal in your fragment shader. Duplicating data is bad, and only necessary under very special and unusual circumstances. Nothing you've mentioned leads me to believe that this is necessary for your use case.
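A sketch of that trick in GLSL (the varying name vWorldPos is made up; any interpolated position works):

    #version 330 core

    in vec3 vWorldPos; // hypothetical: position passed down from the vertex shader
    out vec4 fragColor;

    void main()
    {
        // The screen-space derivatives of the position span the triangle's
        // plane, so their cross product is the faceted face normal.
        vec3 n = normalize(cross(dFdx(vWorldPos), dFdy(vWorldPos)));
        fragColor = vec4(n * 0.5 + 0.5, 1.0); // visualize the normal
    }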

OpenGL: Are degenerate triangles in a Triangle Strip acceptable outside of OpenGL-ES?

In this tutorial for OpenGL ES, techniques for optimizing models are explained, and one of them is to use triangle strips to define your mesh, using "degenerate" triangles to end one strip and begin another without ending the primitive. http://www.learnopengles.com/tag/degenerate-triangles/
However, this guide is very specific to mobile platforms, and I wanted to know if this technique holds for modern desktop hardware. Specifically, would it hurt? Would it cause graphical artifacts or degrade performance (as opposed to splitting the strips into separate primitives)?
If it causes no artifacts and performs at least as well, I aim to use it solely because it makes organizing vertices in a certain mesh I want to draw easier.
Degenerate triangles work pretty well on all platforms. I'm aware of an old fixed-function console that struggled with degenerate triangles, but anything vaguely modern will be fine. Reducing the number of draw calls is always good and I would certainly use degenerates rather than multiple calls to glDrawArrays.
However, an alternative that usually performs better is indexed draws of triangle lists. With a triangle list you have a lot of flexibility to reorder the triangles to take maximum advantage of the post-transform cache. The post-transform cache is a hardware cache of the last few vertices that went through the vertex shader; the GPU can spot that you've re-issued the same vertex and skip the entire vertex shader for that vertex.
In addition to the above answers (no, it shouldn't hurt at all unless you do something mad in terms of the ratio of real triangles to degenerates), also note that newer versions of the OpenGL and OpenGL ES APIs (3.x or higher) support a means to insert breaks into index lists without needing an actual degenerate triangle, called primitive restart.
https://www.khronos.org/opengles/sdk/docs/man3/html/glEnable.xhtml
When enabled, you encode the maximum value for the index type (e.g. 0xFFFF for GL_UNSIGNED_SHORT), and when the GPU detects it, it restarts building a new triangle strip from the next index value.
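In desktop GL that looks roughly like this (index data is assumed to be uploaded to a bound GL_ELEMENT_ARRAY_BUFFER; GL 4.3+ and ES 3.0 also offer the GL_PRIMITIVE_RESTART_FIXED_INDEX behaviour, where the marker is always the index type's maximum value):

    /* Desktop OpenGL: pick the restart marker explicitly. */
    glEnable(GL_PRIMITIVE_RESTART);
    glPrimitiveRestartIndex(0xFFFF); /* max value for GL_UNSIGNED_SHORT */

    /* Two strips submitted in a single draw call. */
    static const unsigned short indices[] = {
        0, 1, 2, 3,   /* first strip  */
        0xFFFF,       /* restart marker */
        4, 5, 6, 7,   /* second strip */
    };
    glDrawElements(GL_TRIANGLE_STRIP, 9, GL_UNSIGNED_SHORT, (const void *)0);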
It will not cause artifacts. As to "degrading performance"... relative to what? Relative to a random assortment of triangles with no indexing? Yes, it will be faster than that.
But there are plenty of other things one can do. For example, primitive restarting, which removes the need for degenerate triangles. Then there's using ordered lists of triangles for improved cache coherency. Will triangle strips be faster than that?
It rather depends on what you're rendering, how expensive your vertex shaders are, and various other things.
But at the end of the day, if you care about maximum performance on particular platforms, then you should profile for each platform and pick the vertex data based on what platform you're running on. If performance is really that important to you, then you're going to have to put forth some effort.

Is sharing vertices between faces worth it?

I am currently working on a WebGL project, although I imagine this question is generic across many graphics APIs.
Let me use the example of a simple cube to demonstrate what I am asking.
A cube has 6 faces with 4 vertices per face, so in total we have 24 vertices that make up the cube. However, we could reduce the total number of vertices to only 8 if we share vertices between faces. As I have been reading, this can save a lot of precious GPU memory, especially when working with complex models and scenes.
On the other hand though, I have experienced first-hand some of the drawbacks with sharing vertices between faces. These include:
Complex vertex normal calculations as we must find the 'average' normal for each vertex, taking into account the face normals of each face that said vertex is a part of.
Some vertices must be duplicated anyway to 'match up' with their corresponding UV coordinates.
As a vertex may be shared by many faces, we are not able to specify a different colour per face using per vertex colouring.
The book I have been reading really stresses the importance of vertex sharing to minimise memory usage, so when I came across some of the disadvantages of vertex sharing I was unsure how viable/helpful it really is. As the author did not mention any of the downsides, I would like to get your opinions. So is the memory saving produced by vertex sharing really that important?
The disadvantages you named are indeed very real, especially for shapes with lots of sharp edges or different textures. A cube is the worst possible example for vertex sharing: each vertex has 3 different normals and possibly different texture coordinates, so it is essentially impossible to share the vertices.
However, think of some organic shape: a ball, the body of some animal, cars, trees, or even something as simple as a desert. These shapes probably need a high number of vertices to look like anything decent, but a lot of those vertices are shared between faces. They need the exact same normals, texture coordinates, and whatever else, in order to look smooth.
Furthermore, the first disadvantage is not really that important. Calculating the vertex normals can be done in preprocessing, in most cases even by the modeller; it is basically never done in realtime. Instead, you simply already have the mesh in this format. If it does need to be done in realtime, you can imagine this becoming an actual issue, and you need to start thinking about the trade-offs and profiling. Even then, it can probably be dealt with using geometry shaders; if the visual fidelity is needed, this can be a preferable solution.
In conclusion, it heavily depends on what you're doing. In some cases vertex sharing isn't really viable because of the reasons you mentioned. Regardless, in many, many cases it can potentially save a lot of memory.
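To put rough numbers on the memory side, here is a back-of-the-envelope sketch in C (the 32-byte interleaved vertex and the mesh size are assumptions):

    /* Closed, smooth triangle mesh with T = 10,000 triangles.
     * Euler's formula gives roughly V = T/2 + 2 unique vertices.
     * Assumed layout: position (12) + normal (12) + UV (8) = 32 bytes. */
    unsigned T = 10000;
    unsigned non_indexed = T * 3 * 32;                    /* 960,000 bytes   */
    unsigned indexed = (T / 2 + 2) * 32 + T * 3 * 2;      /* ~220,064 bytes,
                                                             with 16-bit indices */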

Should I omit vertex normals when there is no lighting calculations?

I have an OpenGL program that doesn't use lighting or shading of any kind; the illusion of shadow is done completely through textures, since the meshes are low-poly. Faces are not back-culled, and I wouldn't use normal-mapping, of course.
My question is, should I define the vertex normals anyway? Would excluding them use fewer resources and speed rendering, or would excluding them negatively impact the performance/visuals in some way?
My question is, should I define the vertex normals anyway?
There is no need to, if they are not used.
Would excluding them use fewer resources and speed rendering, or would excluding them negatively impact the performance/visuals in some way?
It definitely wouldn't impact the visuals if they are not used.
You do not mention whether you use the old fixed-function pipeline or the modern programmable pipeline. In the old fixed-function pipeline, the normals are only used for the lighting calculation. They have nothing to do with face culling; the front/back sides are determined solely by the primitive winding order in screen space.
If you use the programmable pipeline, the normals are used for whatever you use them for. The GL itself will not care about them at all.
So excluding them should result in less memory needed to store the object. Whether rendering actually gets faster is hard to predict. If the normals aren't used, they shouldn't even be fetched, no matter whether they are provided or not. But caching will also have an impact here, so the improvement from not fetching them might not be noticeable at all.
Only if you are using immediate mode (glBegin()/glEnd()) to specify geometry (which you really should never do) will excluding the normals save you one GL function call per vertex, and this can give a significant improvement (but will still be orders of magnitude slower than using vertex arrays).
If normals are not used for lighting, you don't need them (they are not used for back-face culling either).
The impact on performance is more about how this changes your vertex layout and the resulting effect on the pre-transform cache (assuming you have an interleaved vertex format). Like CPUs, GPUs fetch data in cache lines, and if without (or with) normals you get better alignment with cache lines, it can have an impact on performance. For example, if your vertex size is 32 bytes and removing the normal gets it down to 20 bytes, the GPU will fetch 2 cache lines for some vertices, while with the 32-byte vertex format it always fetches only one cache line. However, if your vertex size is 44 bytes and removing the normal gets it down to 32 bytes, then it's an improvement for sure (better alignment and less data).
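A sketch of that alignment argument in struct form (the 64-byte cache line is an assumption; real hardware varies):

    /* 44-byte vertex: at 64-byte cache lines, many vertices straddle
     * a line boundary, costing two fetches. */
    struct VertexFat {
        float position[3]; /* 12 bytes */
        float normal[3];   /* 12 bytes */
        float tangent[3];  /* 12 bytes */
        float uv[2];       /*  8 bytes */
    };                     /* 44 bytes total */

    /* Dropping the normal yields 32 bytes: exactly two vertices per
     * 64-byte line, so no vertex ever spans two lines. */
    struct VertexLean {
        float position[3]; /* 12 bytes */
        float tangent[3];  /* 12 bytes */
        float uv[2];       /*  8 bytes */
    };                     /* 32 bytes total */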
However, this is quite a fine-grained optimization in the end, and it is unlikely to have any significant impact either way unless you are really pushing a huge amount of geometry through the pipeline with very lightweight vertex/pixel shaders (e.g. a shadow pass).

Geometry Shader Additional Primitives

I wanted to use a GLSL geometry shader to look at a line strip and determine the place to put a textured annotation, taking into account the current ModelView. It seems I'm limited to only getting 4 vertices per invocation (using GL_LINE_STRIP_ADJACENCY), but what I need is the entire line strip to evaluate.
I could use some other primitive type (such as a Multi-point, if there is an equivalent in GL), but the important point is I want to consider all the geometry, not just a portion.
Is there an extension of some kind that would provide additional vertices to the Geometry shader? Or is there a better way to do this other than using the Geometry shader?
There is no mechanism that will give you access to an entire rendered primitive stream. Primitives can be arbitrarily large, so they can easily blow past any reasonable internal buffer sizes that GPUs have. Thus implementing this would be impractical.
You could bind your array as a buffer texture and just read the data from there. But that's going to be exceedingly slow, since every GS invocation is going to have to process hundreds of vertices. That's not exactly taking advantage of GPU parallelism.
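For reference, such a geometry shader would look something like this (the uniform names, strip length, and the centroid placement policy are all made up):

    #version 330 core

    layout(points) in;
    layout(points, max_vertices = 1) out;

    uniform samplerBuffer uStripVertices; // buffer texture over the vertex data
    uniform int uStripLength;             // vertex count of the line strip
    uniform mat4 uMVP;

    void main()
    {
        // Every invocation walks the whole strip; this serial loop is
        // exactly why the approach wastes the GPU's parallelism.
        vec3 sum = vec3(0.0);
        for (int i = 0; i < uStripLength; ++i)
            sum += texelFetch(uStripVertices, i).xyz;

        // Place the annotation anchor at the strip's centroid (one possible policy).
        gl_Position = uMVP * vec4(sum / float(uStripLength), 1.0);
        EmitVertex();
        EndPrimitive();
    }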
If you just want to put a text tag near something, you should designate a vertex or something as being where annotations should go.