How should I organize a shader system with OpenGL?

I was thinking about:
Having a main shader which will be applied to every object of my application; it will be used for projection, transformation, positioning, coloring, etc.
And each object could have its own extra shader for extra stuff; for example, a water object definitely needs an extra shader.
But there is a problem: how would I apply two or more shaders to one object? Because I'll need to apply the main shader plus the object's own shader.

It would be really nice if OpenGL (or Direct3D!) allowed you to have multiple shaders at each vertex / fragment / whatever stage, but alas we are stuck with existing systems.
Assume you've written a bunch of GLSL functions. Some are general-purpose for all objects, like applying the modelview transformation and copying texture coords to the next stage. Some are specific to particular classes of object, such as water or rock.
What you then write is the ubershader, a program in which the main() functions at the vertex / fragment / whatever stages do nothing much other than call all these functions. This is a template or prototype from which you generate more specialised programs.
The most common way is to use the preprocessor and lots of #ifdefs around function calls inside main(). Maybe if you compile without any #defines you get the standard transform and Gouraud shading. Add in #define WATER to get the water effect, #define DISTORT for some kind of free-form deformation algorithm, both if you want free-form deformed water, #define FOG to add in a fog effect, ...
You don't even need to have more than one copy of the ubershader source, since you can generate the #define strings at runtime and pass them to glShaderSource as an extra source string ahead of the main source, then call glCompileShader.
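A minimal sketch of that, assuming a current GL context and an extension loader such as GLEW (the feature names and helper are made up):

    // Prepend feature #defines to a single ubershader source before compiling.
    // The ubershader body itself must not contain its own #version line,
    // since #version has to come first and is supplied here.
    #include <GL/glew.h>
    #include <string>
    #include <vector>

    GLuint CompileUbershader(GLenum stage, const std::string& body,
                             const std::vector<std::string>& features)
    {
        std::string prefix = "#version 330 core\n";
        for (const std::string& f : features)          // e.g. "WATER", "FOG"
            prefix += "#define " + f + "\n";

        const char* strings[] = { prefix.c_str(), body.c_str() };
        GLuint shader = glCreateShader(stage);
        glShaderSource(shader, 2, strings, nullptr);   // prefix + ubershader body
        glCompileShader(shader);                       // check GL_COMPILE_STATUS in real code
        return shader;
    }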
What you end up with is a lot of shader programs, one for each type of rendering. If for any reason you'd rather have just one program throughout, you can do something similar on newer systems with GLSL subroutines.
These are basically function pointers in GLSL which you can set much like uniforms. Now your ubershader's main() functions do nothing but make function-pointer calls #1, #2, ... Your program just sets up #1 to be the standard transform, #2 to be rock/water/whatever, #3 to be fog, ... If you don't want to use one of the slots, just assign a no-op function to it.
While this has the advantage of only using one program, it is not as flexible as the #define approach because any given pointer has to use the same function prototype. It's also more work if say WATER needs processing in multiple shaders, because you have to remember to set the function pointers in every one rather than just a single #define.
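A rough sketch of that approach, assuming GL 4.0+ or ARB_shader_subroutine (the function and uniform names are made up):

    // GLSL subroutines as "function pointers": one program, selectable behaviour.
    #include <GL/glew.h>

    static const char* kVertexShader = R"(
        #version 400 core
        subroutine vec4 SurfaceFunc(vec4 position);
        subroutine uniform SurfaceFunc surfaceStage;   // the "function pointer"

        subroutine(SurfaceFunc) vec4 standardSurface(vec4 p) { return p; }
        subroutine(SurfaceFunc) vec4 waterSurface(vec4 p) {
            return p + vec4(0.0, 0.05 * sin(p.x), 0.0, 0.0);   // toy water wobble
        }

        in vec4 vertex_position;
        void main() { gl_Position = surfaceStage(vertex_position); }
    )";

    void SelectWater(GLuint program)
    {
        // All subroutine uniforms of a stage must be set in one call, in
        // location order, and again every time the program is re-bound.
        GLuint index = glGetSubroutineIndex(program, GL_VERTEX_SHADER, "waterSurface");
        glUseProgram(program);
        glUniformSubroutinesuiv(GL_VERTEX_SHADER, 1, &index);
    }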
Hope this helps.

Related

Is there a faster alternative to geometry shaders that can render points as a specific number of triangles?

I'm currently using OpenGL with a geometry shader to take points and convert them to triangles during rendering.
I have n lists of points, where each point in the i-th list is rendered as i triangles (each point in the first list becomes one triangle, each point in the second becomes two triangles, etc.). I've tried swapping in a different geometry shader for each of these lists, with max_vertices set to the minimum needed for that list. With OpenGL I seemingly have no control over how the geometry shader is ultimately implemented on the GPU, and some drivers seem to handle it very slowly while others are very fast.
Is there any way to perform this specific task optimally, ideally taking advantage of the fact that I know the exact number of desired output triangles per element and in total? I would be happy to use some alternative to geometry shaders for this if possible. I would also be happy to try Vulkan if it can do the trick.
What you want is arbitrary amplification of geometry: taking one point primitive and producing arbitrarily many entirely separate primitives from it. And the tool GPUs have for that is geometry shaders (or just using a compute shader to generate your vertex data manually, but that's probably not faster and definitely more memory consuming).
While GS's are not known for performance, there is one way you might be able to speed up what you're doing. Since every input primitive in a particular draw call generates the same number of output primitives, you can eschew having each GS invocation output more than one primitive by employing vertex instanced rendering.
Here, you use glDrawArraysInstanced. Your VS needs to pass gl_InstanceID to the GS, which can use that to figure out which triangle to generate from the vertex. That is, instead of having a loop over n to generate n triangles, the GS only generates one triangle. But it gets called instanceCount times, and each call should generate the gl_InstanceIDth triangle.
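A minimal sketch of that variant (the offsets and triangle shape are made up; the vertex shader declares `flat out int vsInstanceID;` and assigns gl_InstanceID in main()):

    // Geometry shader: emits exactly one triangle, chosen by the instance index
    // that the vertex shader forwarded.
    static const char* kGeometryShader = R"(
        #version 330 core
        layout(points) in;
        layout(triangle_strip, max_vertices = 3) out;

        flat in int vsInstanceID[];                 // from the vertex shader

        void main() {
            int tri = vsInstanceID[0];              // which triangle of this point to emit
            vec4 p  = gl_in[0].gl_Position;
            float o = 0.1 * float(tri);             // hypothetical per-triangle offset
            gl_Position = p + vec4(o,        0.0,  0.0, 0.0); EmitVertex();
            gl_Position = p + vec4(o + 0.05, 0.0,  0.0, 0.0); EmitVertex();
            gl_Position = p + vec4(o,        0.05, 0.0, 0.0); EmitVertex();
            EndPrimitive();
        }
    )";

    // One instance per desired triangle:
    //     glDrawArraysInstanced(GL_POINTS, 0, pointCount, trianglesPerPoint);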
Now, one downside of this is that the order of triangles generated will be different. In your original GS code, where each GS generates all of the triangles from a point, all of the triangles from one point will be rendered before rendering any triangles from another point. With vertex instancing, you get one triangle from all of the points, then it produces another triangle from all the points, etc. If rendering order matters to you, then this won't work.
If that's important, then you can try geometry shader instancing instead. This works similarly to vertex instancing, except that the instance count is part of the GS. Each GS invocation is only responsible for a single triangle, and you use gl_InvocationID to decide which triangle to use it on. This will ensure that all primitives from one set of GS instances will be rendered before any primitives from a different set of GS instances.
The downside is what I said: the instance count is part of the GS. Unlike instanced rendering, the number of instances is baked into the GS code itself. So you will need a separate program for every count of triangles you work with. SPIR-V specialization constants make it a bit easier on you to build those programs, but you still need to maintain (and swap) multiple programs.
Also, while instanced rendering has no limit on the number of instances, GS's do have a limit. And that limit can be as small as 32 (which is a very popular number).
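And a corresponding sketch of the geometry-shader-instancing variant described above (GL 4.0+; the triangle count and offsets are again made up):

    // Instanced geometry shader: the instance count is baked into the layout,
    // and gl_InvocationID selects which triangle this invocation emits.
    static const char* kInstancedGeometryShader = R"(
        #version 400 core
        layout(points, invocations = 4) in;         // 4 triangles per point, fixed at compile time
        layout(triangle_strip, max_vertices = 3) out;

        void main() {
            int tri = gl_InvocationID;
            vec4 p  = gl_in[0].gl_Position;
            float o = 0.1 * float(tri);             // hypothetical per-triangle offset
            gl_Position = p + vec4(o,        0.0,  0.0, 0.0); EmitVertex();
            gl_Position = p + vec4(o + 0.05, 0.0,  0.0, 0.0); EmitVertex();
            gl_Position = p + vec4(o,        0.05, 0.0, 0.0); EmitVertex();
            EndPrimitive();
        }
    )";

    // Drawn with a plain, non-instanced call:
    //     glDrawArrays(GL_POINTS, 0, pointCount);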

glMultiDraw functions and gl_InstanceID

When I look at the documentation of glMultiDrawElementsIndirect (or in the Wiki) it says that a single call to glMultiDrawElementsIndirect is equivalent to repeatedly calling glDrawElementsIndirect (just with different parameters).
Does that mean that gl_InstanceID will reset for each of these "internal" calls? And if so, how am I able to tell all these calls apart in my vertex shader?
Background: I'm trying to draw all my different meshes all at once. But I need some way to know which mesh the vertex I'm processing in my vertex shader belongs to.
The documentation says "similarly to". "Equivalent" isn't the same thing. It also points to glDrawElementsInstancedBaseVertexBaseInstance, not glDrawElementsInstanced.
But yes, gl_InstanceID for any draw will start at zero, no matter what base instance you provide. That's how gl_InstanceID works, unfortunately.
Besides, that's not the question you want answered. You're not looking to ask which instance you're rendering, since each draw in the multi-draw can be rendering multiple instances. You're asking which draw in the multi-draw you are in. An instance ID isn't going to help.
And if so, how am I able to tell all these calls apart in my vertex shader?
Unless you have OpenGL 4.6 or ARB_shader_draw_parameters, you can't. Well, not directly.
That is, multidraw operations are expected to produce different results based on rendering from different parts of the current buffer objects, not based on computations in the shader. You're rendering with a different base vertex that selects different vertices from the arrays, or you're using different ranges of indices or whatever.
The typical pre-shader_draw_parameters solution would have been to use a unique base instance on each of the individual draws. Of course, since gl_InstanceID doesn't track the base instance (as previously stated), you would need to employ instanced arrays instead. So you'd get the mesh index from that.
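A sketch of that trick, with illustrative names: each indirect command draws one instance and carries its own position in the multi-draw as baseInstance, so an instanced attribute fed from a small index buffer resolves to exactly that index.

    // Pre-gl_DrawID workaround: per-draw mesh index via baseInstance.
    #include <GL/glew.h>

    struct DrawElementsIndirectCommand {
        GLuint count;
        GLuint instanceCount;
        GLuint firstIndex;
        GLint  baseVertex;
        GLuint baseInstance;
    };

    // meshIndexBuffer holds the uints 0, 1, 2, ... (one entry per draw).
    void SetupMeshIndexAttribute(GLuint meshIndexBuffer, GLuint attribLocation)
    {
        glBindBuffer(GL_ARRAY_BUFFER, meshIndexBuffer);
        glVertexAttribIPointer(attribLocation, 1, GL_UNSIGNED_INT, 0, nullptr);
        glVertexAttribDivisor(attribLocation, 1);   // fetched at baseInstance + instance/divisor
        glEnableVertexAttribArray(attribLocation);
    }

    // Each draw renders a single instance and uses its position in the
    // multi-draw as baseInstance. If a draw needs several instances, raise the
    // divisor so the attribute does not advance within that draw.
    DrawElementsIndirectCommand MakeCommand(GLuint indexCount, GLuint firstIndex,
                                            GLint baseVertex, GLuint drawIndex)
    {
        return { indexCount, 1u, firstIndex, baseVertex, drawIndex };
    }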
Of course, 4.6/shader_draw_parameters gives you gl_DrawID (gl_DrawIDARB in the extension), which just tells you what the index of the current draw is within the multidraw command. It's also dynamically uniform, so you can use it to access arrays of opaque types in shaders.
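For instance, a vertex-shader sketch along those lines (the uniform block and array size are made up):

    // gl_DrawID indexes per-mesh data directly; it is dynamically uniform.
    static const char* kVertexShader = R"(
        #version 460 core
        layout(location = 0) in vec3 vertex_position;

        layout(std140, binding = 0) uniform PerMesh {
            mat4 modelMatrix[256];                  // hypothetical per-mesh data
        };

        void main() {
            gl_Position = modelMatrix[gl_DrawID] * vec4(vertex_position, 1.0);
        }
    )";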

Under what conditions does a multi-pass approach become strictly necessary?

I'd like to enumerate those general, fundamental circumstances under which multi-pass rendering becomes an unavoidable necessity, as opposed to keeping everything within the same shader program. Here's what I've come up with so far.
When a result requires non-local fragment information (i.e. context) around the current fragment, e.g. for box filters, then a previous pass must have supplied this;
When a result needs hardware interpolation done by a prior pass;
When a result acts as pre-cache of some set of calculations that enables substantially better performance than simply (re-)working through the entire set of calculations in those passes that use them, e.g. transforming each fragment of the depth buffer in a particular and costly way, which multiple later-pass shaders can then share, rather than each repeating those calculations. So, calculate once, use more than once.
I note from my own (naive) deductions above that vertex and geometry shaders don't really seem to come into the picture of deferred rendering, and so are probably usually done in the first pass; to me this seems sensible, but either affirmation or negation of this, with detail, would be of interest.
P.S. I am going to leave this question open to gather good answers, so don't expect quick wins!
Nice topic. For me, since I'm a beginner, I would say it's to avoid the unnecessary calculations in the pixel/fragment shader that you get when you use forward rendering.
With forward rendering you have to do a pass for every light you have in your scene, even if the pixel colors aren't affected.
But that's just a comparison between forward rendering and deferred rendering.
As opposed to keeping everything in the same shader program, the simplest benefit I can think of is that you aren't restricted to some fixed number N of lights in your scene, since in GLSL, for instance, you can use either separate light uniforms or store the lights in a uniform array. Then again, you can also use forward rendering, but if you have a lot of lights in your scene, forward rendering's pixel/fragment shader becomes too expensive.
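To illustrate the uniform-array part, a made-up fragment-shader sketch (the struct, names, and light count are illustrative):

    // Lights stored in a uniform array and accumulated in one loop.
    static const char* kLightingFragmentShader = R"(
        #version 330 core
        struct Light { vec3 position; vec3 color; };
        uniform Light lights[8];
        uniform int lightCount;

        in vec3 fragPos;
        in vec3 fragNormal;
        out vec4 outColor;

        void main() {
            vec3 result = vec3(0.0);
            for (int i = 0; i < lightCount; ++i) {
                vec3 L = normalize(lights[i].position - fragPos);
                result += max(dot(normalize(fragNormal), L), 0.0) * lights[i].color;
            }
            outColor = vec4(result, 1.0);
        }
    )";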
That's all I really know so I would like to hear other theories as well.
Deferred / multi-pass approaches are used when the contents of the depth buffer (produced by rendering basic geometry) are needed in order to produce complex pixel / fragment shading effects based on depth, such as:
Edge / silhouette detection
Lighting
And also application logic:
GPU picking, which requires the depth buffer for ray calculation, and uniquely-coloured / ID'ed geometries in another buffer for identification of "who" was hit.

Multiple meshes or multiple objects in a single mesh?

I'm trying to load multiple objects into a VBO in OpenGL. If I want to be able to move these objects independently, should I use a mesh for each object or should I load all the objects into a single mesh?
Also in my code I have...
loc1 = glGetAttribLocation(shaderP, "vertex_position");
Now I understand that this gets the vertex position attribute in my current program, but if I want to load another object I load its mesh, and then how can I get the vertex positions again, but only for that mesh?
The answer is, as often, "it depends". Having one "mesh" (i.e. one buffer) per object is arguably "cleaner", but it is also likely slower. One buffer per object will make you bind a different buffer much more often. Tiny vertex buffer objects (a few dozen vertices) are as bad as huge ones (gigabytes of data). You should try to find a "reasonable" size in between.
glDrawElementsBaseVertex, readily available as of version 3.2 (and also existing in an instanced version), will allow you to seamlessly draw several objects or pieces from one buffer without having to fiddle with renumbering indices, and without having to switch buffers.
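A sketch of drawing two sub-meshes out of one buffer that way (offsets and counts are placeholders):

    // Two objects packed into one vertex buffer and one index buffer,
    // drawn back to back without rebinding anything. Assumes GLEW or similar.
    #include <GL/glew.h>

    void DrawBothObjects(GLuint vao,
                         GLsizei indexCountA,
                         GLsizei indexCountB, GLsizeiptr indexByteOffsetB, GLint baseVertexB)
    {
        glBindVertexArray(vao);   // one VAO covering the shared buffers

        // Object A sits at the start of both buffers.
        glDrawElementsBaseVertex(GL_TRIANGLES, indexCountA, GL_UNSIGNED_INT,
                                 (const void*)0, 0);

        // Object B's indices were stored starting from 0; baseVertexB is added
        // to each index at draw time, so no renumbering was needed.
        glDrawElementsBaseVertex(GL_TRIANGLES, indexCountB, GL_UNSIGNED_INT,
                                 (const void*)indexByteOffsetB, baseVertexB);
    }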
You should preferably (presuming OpenGL 3.3 availability) not use glGetAttribLocation at all, but assign the attribute to a location using the layout specifier. That way you know the location, you don't need to ask every time, and you don't have to worry that "weird, unexpected stuff" might happen.
If you can't do it in the shader, use glBindAttribLocation (available since version 2.0) instead. It is somewhat less comfortable, but does the same thing. You get to decide the location instead of asking for it and worrying whether the compiler changed the order between two different shaders.
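Both variants in a small sketch (attribute names beyond vertex_position are made up):

    // Option 1: fix the location in the shader itself (GL 3.3+):
    //     layout(location = 0) in vec3 vertex_position;
    //     layout(location = 1) in vec3 vertex_normal;
    //
    // Option 2: fix it from the application before linking (GL 2.0+):
    #include <GL/glew.h>

    void FixAttribLocations(GLuint program)
    {
        glBindAttribLocation(program, 0, "vertex_position");
        glBindAttribLocation(program, 1, "vertex_normal");   // hypothetical second attribute
        glLinkProgram(program);   // the bindings only take effect at link time
    }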
It's generally cleaner if you use different buffers for different objects.
According to:
http://www.opengl.org/sdk/docs/man/xhtml/glGetAttribLocation.xml
This only returns the location of the attribute, not anything mesh-specific. You use this location to bind your vertex info to the program. When you render your other object, you will bind the VBO that stores the other object's vertices.

Behavior of uniforms after glUseProgram() and speed

How fast is glUseProgram()? Is there anything better (faster)?
Here are my thoughts:
Use 1 universal shader program, but with many input settings and attributes (settings for each graphics class)
Use more than 1 shader for each graphics class
What state are uniforms in after changing the shader program? Do they save values (for example, values of matrices)?
Here are what I consider the benefits of #1 to be:
Doesn't use glUseProgram()
And the benefits of #2:
No matrix changes (for example, if class Menu and class Scene3D have different Projection matrices)
Which of the two options is better largely depends on what those shaders do, how different they are, and how many attributes/uniforms you set and how often they are changed. There is no one right answer for all cases.
That said: keep in mind that there is not only the cost of state changes, but also a shader runtime cost, and it is paid per vertex and per fragment. So keeping the complexity of the shader low is always a good idea, and a universal shader is more complex than specialised ones.
Minimize state change. If you have objects A, C, E using Program X and B, D, F using Program Y then, all else being equal, render in order ACEBDF, not ABCDEF.
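A sketch of that sorting idea (the DrawItem struct and draw call are placeholders):

    // Sort the draw list by program so glUseProgram is called once per group.
    #include <GL/glew.h>
    #include <algorithm>
    #include <vector>

    struct DrawItem { GLuint program; GLuint vao; GLsizei vertexCount; };

    void RenderSorted(std::vector<DrawItem>& items)
    {
        std::sort(items.begin(), items.end(),
                  [](const DrawItem& a, const DrawItem& b) { return a.program < b.program; });

        GLuint current = 0;
        for (const DrawItem& item : items) {
            if (item.program != current) {        // switch programs only between groups
                glUseProgram(item.program);
                current = item.program;
            }
            glBindVertexArray(item.vao);
            glDrawArrays(GL_TRIANGLES, 0, item.vertexCount);
        }
    }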
Regarding the last question: programs retain their state, and thus the values of their uniforms, over their lifetime, unless you relink them. But uniforms are per-program state, which means that if you have two uniforms with the same name and type in different programs, values won't carry over from one program to another.