Why is a simple shader slower than the standard pipeline?

I want to write a very simple shader which is equivalent to (or faster than) the standard pipeline. However, even the simplest shader possible:
Vertex Shader
void main(void)
{
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
Fragment Shader
uniform sampler2D Texture0;
void main(void)
{
    gl_FragColor = texture2D(Texture0, gl_TexCoord[0].xy);
}
Cuts my framerate in half in my game compared to the standard pipeline, and performs horribly when some transparent images are displayed. I don't understand this, because the standard pipeline (glUseProgram(0)) does lighting and alpha blending, while this shader only draws flat textures. What makes it so slow?

It looks like this massive slowdown of custom shaders is a problem with old Intel graphics chips, which seem to emulate the shaders on the CPU.
I tested the same program on recent hardware, and the frame-rate drop with the custom shader activated is only about 2-3 percent.
EDIT: wrong theory. See new answer below

I think you might be bumping into overdraw.
I don't know what engine you are using your shader in, but if you have alpha blending on, you might end up overdrawing a lot.
Think about it this way:
If you have an 800x600 screen and a 2D quad covering the whole screen, that 2D quad generates 480,000 fragment shader invocations, even though it has only 4 vertices.
Now, moving further, let's assume you have 10 such quads, one on top of another. If you don't sort your geometry front to back, or if you are using alpha blending with no depth test, you will end up with 10 x 800 x 600 = 4,800,000 fragment calls.
2D is usually quite expensive in OpenGL because of the overdraw; 3D rejects many fragments. Even though the shaders are more complicated, the number of fragment shader calls is greatly reduced for 3D objects compared to 2D objects.
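For the opaque quads, a minimal C-style sketch of letting the depth test reject hidden fragments, assuming the framebuffer has a depth attachment and you can sort your quads by depth:
// Opaque pass: sorted front to back so the depth test rejects hidden fragments.
glDisable(GL_BLEND);
glEnable(GL_DEPTH_TEST);
glDepthFunc(GL_LESS);
// ... draw opaque quads here, nearest first ...

// Transparent pass: blended, sorted back to front, with depth writes disabled.
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glDepthMask(GL_FALSE);
// ... draw blended quads here, farthest first ...
glDepthMask(GL_TRUE);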

After a long investigation, it turned out the slowdown of the simple shader was caused by the shader being too simple.
In my case, the slowdown came from the text rendering engine, which made heavy use of glBitmap, which is very slow with texturing enabled (for whatever reason, which I cannot understand; these letters are tiny).
However, this did not affect the standard pipeline, because it honours glDisable(GL_LIGHTING) and glDisable(GL_TEXTURE_2D), which circumvents the slowdown, whereas the simple shader ignores these states and thus ends up doing even more work than the standard pipeline. After introducing these two switches to the custom shader, it is as fast as the standard pipeline, plus it can add arbitrary effects without any performance impact!
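This is not the poster's exact fix, but one way to mirror glDisable(GL_TEXTURE_2D) in a custom shader is an application-set uniform flag (useTexture here is an assumed name; the matching vertex shader also has to forward the vertex colour with gl_FrontColor = gl_Color;):
uniform sampler2D Texture0;
uniform bool useTexture;   // hypothetical flag, set by the application whenever it
                           // would otherwise call glEnable/glDisable(GL_TEXTURE_2D)
void main(void)
{
    vec4 color = gl_Color;                              // interpolated vertex colour
    if (useTexture)
        color *= texture2D(Texture0, gl_TexCoord[0].xy);
    gl_FragColor = color;
}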

Related

OpenGL Lighting Shader

I can't understand the concept of smaller shaders in OpenGL. How does it work? For example: do I need to create one shader for positioning an object in space and then another shader for lighting, or what? Could someone explain this to me? Thanks in advance.
This is a very complex topic, especially since your question isn't very specific. First, there are various shader stages (vertex shader, pixel shader, and so on). A shader program consists of different shader stages, at least a vertex and a pixel shader (except for compute shader programs, which each consist of a single compute shader). The vertex shader calculates the position of the points on screen, so this is where the objects are moved. The pixel shader calculates the color of each pixel that is covered by the geometry your vertex shader produced. Now, in terms of lighting, there are different ways of doing it:
Forward Shading
This is the straightforward way, where you simply calculate the lighting in the pixel shader of the same shader program that moves the objects. This is the oldest way of calculating lighting, and the easiest one. However, its abilities are very limited.
Deferred Shading
For ages, this was the go-to variant in games. Here you have one shader program (vertex + pixel shader) that renders the geometry into one (or multiple) textures (so it moves the objects, but instead of storing a lit color it stores things like the base color and the surface normals in the textures), and then another shader program that renders a screen quad for each light you want to render. The pixel shader of this second program reads the information previously written to the textures by the first program and uses it to render the lit objects into another texture (which then becomes the final image). In contrast to forward shading, this allows (in theory) any number of lights in the scene and makes shadow maps easier to use.
Tiled/Clustered Shading
This is a rather new and very complex way of calculating lighting that can be built on top of deferred or forward shading. It basically uses compute shaders to build an acceleration structure on the GPU, which is then used to draw huge numbers of lights very fast. This allows rendering thousands of lights in a scene in real time, but using shadow maps for these lights is very hard, and the algorithm is far more complex than the previous ones.
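To make the deferred idea concrete, here is a minimal sketch (GLSL 3.30; the variable names are illustrative) of the geometry pass writing unlit surface data into two colour attachments of a G-buffer; a later lighting pass would read these textures:
#version 330 core
// Geometry pass of a deferred renderer: store unlit surface data, not a lit colour.
in vec3 vNormal;                          // interpolated from the vertex shader
in vec2 vUV;
uniform sampler2D albedoTex;

layout (location = 0) out vec4 gAlbedo;   // written to colour attachment 0
layout (location = 1) out vec4 gNormal;   // written to colour attachment 1

void main()
{
    gAlbedo = texture(albedoTex, vUV);
    gNormal = vec4(normalize(vNormal), 0.0);
}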
Writing smaller shaders means separating some of your shader functionality into other files. If you are writing a big shader which contains lighting algorithms, antialiasing algorithms, and other shader computations, you can split them into smaller shader files (light.glsl, fxaa.glsl, and so on...) and link these files into your main shader file (the shader file which contains the void main() function), since in OpenGL only one shader program (a composition of vertex shader, fragment shader, geometry shader, etc.) can be active for a draw call in the rendering pipeline.
The way of writing smaller shaders also depends on your rendering algorithm (forward rendering, deferred rendering, or forward+ rendering).
It's important to note that writing a lot of shaders will increase the shader compilation time, and writing a big shader with a lot of uniforms will also slow things down...
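One concrete way to "link these files into your main shader file" is to compile each file as a separate shader object of the same stage and attach them all to one program; only one of them defines main(), and the main file simply declares the functions the other files provide. A C-style sketch (all names are illustrative):
/* Links light.glsl (a fragment-stage file without main()) together with the main
   fragment shader into one program. The main file declares what light.glsl defines,
   e.g. "vec3 applyLight(vec3 c);". */
GLuint LinkProgram(const char* vsSrc, const char* mainFsSrc, const char* lightFsSrc)
{
    GLuint vs      = glCreateShader(GL_VERTEX_SHADER);
    GLuint mainFs  = glCreateShader(GL_FRAGMENT_SHADER);
    GLuint lightFs = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(vs,      1, &vsSrc,      NULL);
    glShaderSource(mainFs,  1, &mainFsSrc,  NULL);
    glShaderSource(lightFs, 1, &lightFsSrc, NULL);
    glCompileShader(vs);
    glCompileShader(mainFs);
    glCompileShader(lightFs);

    GLuint program = glCreateProgram();
    glAttachShader(program, vs);
    glAttachShader(program, mainFs);
    glAttachShader(program, lightFs);   /* two fragment-stage objects, one main() */
    glLinkProgram(program);             /* check the link log in real code */
    return program;
}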

How hard would it be to write this shader? Multipoint shadow lighting

So, I have a simple project right now. Basically it's just a bunch of cuboids that are all axis-aligned... so it has really simple geometry.
Anyway, I am considering adding a better shader to it. Currently I am using the "flat shader" that is a stock shader in GLShaderManager. It colors everything with a flat color. However, I would love it if I could build a shader like the following.
Basically I want a shader that has an array of point lights at various positions with varying intensities.
Probably defined like this.
struct Light {
    float x;
    float y;
    float z;
    float intensity;
};

Light Lighting[20];
And basically, based on the level geometry and the lights, I would love to simulate basic lighting and shadows; it would also be cool to have a circle under the player (like the player is actually there).
How hard would this be to make? How would I pass it my level geometry and light array? (Note: even though each cuboid is its own QUADS batch, it would be easy to make any kind of variable that stores the data.)
I am using GLEW, GLTools, GLShaderManager, GLBatch, Visual Studio 2010, and probably whatever "GLSL" is.
If you could just let me know how complicated a shader like this would be, that would be great. Also, if a shader that works like this is easy to find online, a link would be appreciated.
Also, what is the difference between the two types of shaders (vertex and fragment)?
I would say it's relatively simple, but the thing about modern GL is that the initial learning curve is quite steep. At first it seems like you have to roll up your sleeves and learn how to do everything (essentially true) but later, it starts to seem like it made things easier than ever before with much more predictable behaviors since you're in the driver's seat.
One of the first things you want to learn is how to pass data from the CPU to the GPU. For values which don't vary on a per-vertex or per-fragment basis, such as your light positions and intensities, you want uniforms. Check out examples that use the glUniform* functions to see how to do this. That will let you experiment, passing values from the CPU side to the GPU side and seeing how they affect the shader, which accelerates your learning.
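For example, a sketch of uploading the Light[20] array from the question as a single uniform array, assuming the shader declares uniform vec4 lights[20]; with xyz = position and w = intensity (the uniform name is an assumption):
GLfloat lightData[20 * 4];
for (int i = 0; i < 20; ++i) {
    lightData[i * 4 + 0] = Lighting[i].x;         // pack each Light into a vec4
    lightData[i * 4 + 1] = Lighting[i].y;
    lightData[i * 4 + 2] = Lighting[i].z;
    lightData[i * 4 + 3] = Lighting[i].intensity;
}
glUseProgram(program);                            // the linked shader program
glUniform4fv(glGetUniformLocation(program, "lights"), 20, lightData);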
After that, it's worth learning how direct lighting is computed for a ray bouncing off a surface with Phong shading, separating the ambient, diffuse, and specular terms.
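A minimal GLSL sketch of that per-fragment Phong sum over the light array (the uniform and varying names are assumptions, and the constants are arbitrary):
#version 330 core
in vec3 vWorldPos;                         // interpolated from the vertex shader
in vec3 vNormal;
uniform vec4 lights[20];                   // xyz = position, w = intensity
uniform vec3 cameraPos;
uniform vec3 baseColor;
out vec4 fragColor;

void main()
{
    vec3 n = normalize(vNormal);
    vec3 v = normalize(cameraPos - vWorldPos);
    vec3 color = 0.1 * baseColor;          // ambient term
    for (int i = 0; i < 20; ++i) {
        vec3 l = normalize(lights[i].xyz - vWorldPos);
        float diff = max(dot(n, l), 0.0);                            // diffuse
        float spec = pow(max(dot(reflect(-l, n), v), 0.0), 32.0);    // specular
        color += lights[i].w * (diff * baseColor + spec * vec3(1.0));
    }
    fragColor = vec4(color, 1.0);
}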
Later you might even want to store this light data into an environment map. That'll give you the ability to use as many lights as you want without affecting the speed of the shader.
About vertex vs. fragment shaders, vertex shaders compute things on a vertex-by-vertex basis, including data for the fragment shader to then use. The fragment shader is kind of like a pixel shader (in HLSL, it's actually just called a 'pixel shader'). It deals with rasterizing what's in between those vertices and is operating on a pixel-by-pixel basis (however with some potential overdraw). Often for lighting, the real heart of the logic will be in the fragment shader, while the vertex shader serves as an intermediary step to compute all the relevant values for the fragment shader to interpolate and use. The vertex shader is part of the 3D geometry pipeline, while the fragment shader is part of 2D rasterization.
It shouldn't take too long or be too hard to get the hang of this, but you want to approach it slowly and in baby steps. There's a lot of setup work involved in establishing a lighting/shading pipeline for your software with the precise characteristics you want, and for the final version you want to plan ahead a bit. So it's good to set up a separate scrap project and start experimenting to figure out how things work.

OpenGL - Is switching shaders and uniforms often a serious bottleneck?

I'm currently planning out a renderer and I have 2 different ways I could handle shaders. I've written pseudocode for them:
Example A
for all models {
    bind all vertex data for the model
    for each shader on this model {
        set shader
        upload uniforms
        draw indices for this shader
    }
}
Example B
for all models {
    collect geometry and separate it by shader
}
sort geometry by shader into groups
for all shaders {
    set shader
    upload uniforms
    draw geometry group for this shader
}
The advantage of Example A is that we only have to upload the vertex data once, and it's shared by all the geometry. The downside is that I have to constantly change the shader and upload uniforms to it.
The advantage of Example B is that I can sort all the geometry in the entire scene by shader, so I only have to bind each shader once for the whole draw. This also means that I will have more on the draw stack at any given time, so there is less "idle" time after the drawing for a shader is finished.
Which way would end up being better for performance? Based on what I've read, I'm leaning towards Example B, but I'd like to learn a little more about it.
Like so often with OpenGL, it depends on the implementation (i.e. the driver) in question.
For example, early NVIDIA drivers were prone to complete shader recompilation when uniform values changed (so changing a uniform could be much more expensive than switching a texture or a shader). Later this bottleneck was removed, and changing a uniform value became a rather cheap operation (cheaper than switching a texture or a shader).
As with all performance-related questions: you have to profile your program and test out your options.
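As a starting point for Example B, here is a sketch in C++ (types and names are illustrative) of bucketing draw calls by shader program so each program is bound once per frame; as always, profile it against the naive loop before committing to it:
// Assumes an OpenGL loader header (e.g. GLEW) is already included.
#include <map>
#include <vector>

struct DrawCall {
    GLuint vao;          // vertex data for this piece of geometry
    GLsizei indexCount;  // number of indices to draw
};

// program handle -> all draws that use it; filled during the collection pass,
// e.g. byProgram[programForThisPiece].push_back(call);
std::map<GLuint, std::vector<DrawCall>> byProgram;

void DrawFrame()
{
    for (auto& [program, draws] : byProgram) {
        glUseProgram(program);            // one shader switch per group
        // upload per-program uniforms here
        for (const DrawCall& d : draws) {
            glBindVertexArray(d.vao);
            glDrawElements(GL_TRIANGLES, d.indexCount, GL_UNSIGNED_INT, 0);
        }
    }
}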

Is it possible to have face culling before the geometry shader stage?

Is it possible for only front facing triangles to be sent to the geometry shader? I believe that culling only happens to emitted triangles after the geometry shader by default.
Yes it is; back in the ancient days of Quake, face culling was done on the CPU using a simple dot product per triangle. Any triangle that failed the test was not included in the list of indices drawn.
This is not a viable optimization on most hardware these days, but you still see it employed from time to time in specialized applications. One such application I have seen a lot of is using the PS3's Cell SPEs to cull out triangles during batching to save vertex transform workload on the PS3's RSX GPU - keep in mind, the PS3 still uses a basic shader architecture where there are a fixed number of specialized vertex shader units and fragment shader units. Balancing shader workload is important on that GPU.
I may be missing the point of your question though; what benefit do you expect/want to get out of culling the primitives early?
Update:
What I was trying to say is that on modern hardware and software, vertex transform / primitive assembly is usually not a bottleneck. Fragment processing is much more expensive these days, so having primitives culled during rasterization is usually the extent to which you have to worry about things for performance. The PS3's RSX is a special case, it has very poor vertex performance and a CPU architecture that is hard to keep busy, so it makes sense to offload primitive culling to the CPU.
You can still cull triangles before the vertex shader/tessellation/geometry shader on the CPU, but storing per-triangle normals somewhere and transferring a new set of indices to draw each frame hardly makes this a wise use of resources. You may spend more time and memory setting up the reduced list of triangles than you would have if you processed them on the GPU and let GL throw the back-facing primitives out during rasterization.
There is at least one use-case that comes to mind where this actually could still be a useful thing to do. I am referring to tessellated patches. If you can determine on the CPU before tessellation occurs that the entire patch faces the wrong way, you can skip having to tessellate them on the GPU. Ordinarily rendering will not be vertex-bound these days, but tessellation is one case where it may be.
It is actually possible. As the other answer mentioned, you can do it on the software side. But there are also stages in between the vertex shader and the geometry shader: namely the hull shader (programmable; the tessellation control shader in OpenGL terms), the primitive generator (fixed), and the domain shader (programmable; the tessellation evaluation shader). In the hull shader you can specify tessellation levels for your patch.
If you set any of the outer tessellation levels to 0.0 (or a negative value), the patch will be discarded and will never reach the geometry shader!
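A sketch of that trick as an OpenGL tessellation control shader, reusing the clip-space determinant test from the geometry-shader answer further down (so it likewise assumes a positive w for each vertex; the levels and the test are illustrative):
#version 400 core
layout (vertices = 3) out;

void main()
{
    // Pass the control points through unchanged.
    gl_out[gl_InvocationID].gl_Position = gl_in[gl_InvocationID].gl_Position;

    if (gl_InvocationID == 0) {
        mat3 M = mat3(gl_in[0].gl_Position.xyz,
                      gl_in[1].gl_Position.xyz,
                      gl_in[2].gl_Position.xyz);
        float level = (determinant(M) < 0.0) ? 0.0 : 4.0;   // 0.0 culls the whole patch
        gl_TessLevelOuter[0] = level;
        gl_TessLevelOuter[1] = level;
        gl_TessLevelOuter[2] = level;
        gl_TessLevelInner[0] = level;
    }
}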
Hope this helps :)
Although it doesn't technically satisfy the question, because the triangle still reaches the geometry shader, you can cull triangles within the geometry shader itself. This might be the solution people reading this question are looking for; it was for me.
I used the shader below to implement wireframe drawing of quads with culling, where each quad is drawn as two triangles. The obvious approach of using
glPolygonMode(GL_FRONT_AND_BACK, GL_LINE)
with culling includes the diagonal of each quad, which isn't desired, so I use a geometry shader to convert each triangle into just two lines. However, culling then no longer works, because only lines are emitted by the geometry shader. Instead, the culling is done at the beginning of the geometry shader; just set the uniform culling to -1, 0 or 1 for the desired behaviour. (Note: this culling test assumes that the w coordinate is positive for each vertex.)
#version 430 core

layout (triangles) in;
layout (line_strip, max_vertices = 3) out;

uniform int culling;

void main()
{
    // Perform culling here with an early return
    mat3 M = mat3(gl_in[0].gl_Position.xyz,
                  gl_in[1].gl_Position.xyz,
                  gl_in[2].gl_Position.xyz);
    if (culling * determinant(M) < 0)
    {
        EndPrimitive(); // Necessary?
        return;
    }
    gl_Position = gl_in[0].gl_Position;
    EmitVertex();
    gl_Position = gl_in[1].gl_Position;
    EmitVertex();
    gl_Position = gl_in[2].gl_Position;
    EmitVertex();
    EndPrimitive();
}

Understanding the shader workflow in OpenGL?

I'm having a little bit of trouble conceptualizing the workflow used in a shader-based OpenGL program. While I've never really done any major projects using either the fixed-function or shader-based pipelines, I've started learning and experimenting, and it's become quite clear to me that shaders are the way to go.
However, the fixed-function pipeline makes much more sense to me from an intuitive perspective. Rendering a scene with that method is simple and procedural—like painting a picture. If I want to draw a box, I tell the graphics card to draw a box. If I want a lot of boxes, I draw my box in a loop. The fixed-function pipeline fits well with my established programming tendencies.
These all seem to go out the window with shaders, and this is where I'm hitting a block. A lot of shader-based tutorials show how to, for example, draw a triangle or a cube on the screen, which works fine. However, they don't seem to go into at all how I would apply these concepts in, for example, a game. If I wanted to draw three procedurally generated triangles, would I need three shaders? Obviously not, since that would be infeasible. Still, it's clearly not as simple as just sticking the drawing code in a loop that runs three times.
Therefore, I'm wondering what the "best practices" are for using shaders in game development environments. How many shaders should I have for a simple game? How do I switch between them and use them to render a real scene?
I'm not looking for specifics, just a general understanding. For example, if I had a shader that rendered a circle, how would I reuse that shader to draw different sized circles at different points on the screen? If I want each circle to be a different color, how can I pass some information to the fragment shader for each individual circle?
There is really no conceptual difference between the fixed-function pipeline and the programmable pipeline. The only thing shaders introduce is the ability to program certain stages of the pipeline.
On current hardware you have (for the most part) control over the vertex, primitive assembly, tessellation and fragment stages. Some operations that occur in between and after these stages are still fixed-function, such as depth/stencil testing, blending, perspective divide, etc.
Because shaders are actually nothing more than programs that you drop-in to define the input and output of a particular stage, you should think of input to a fragment shader as coming from the output of one of the previous stages. Vertex outputs are interpolated during rasterization and these are often what you're dealing with when you have an in variable in a fragment shader.
You can also have program-wide variables, known as uniforms. These variables can be accessed by any stage simply by using the same name in each stage. They do not vary across invocations of a shader, hence the name uniform.
Now you should have enough information to figure out this circle example... you can use a uniform to scale and position your circle (likely a simple scaling matrix, or just a center and a radius), and you can either rely on per-vertex color or a uniform that defines the color.
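A minimal sketch of that idea, assuming one shared unit-circle mesh that is reused for every circle (all uniform names are illustrative); both stages are shown together:
// Vertex shader: one unit-circle mesh, moved and scaled per draw through uniforms.
#version 330 core
layout (location = 0) in vec2 inPos;   // vertices of a unit circle (e.g. a triangle fan)
uniform vec2  center;                  // set with glUniform2f before each circle
uniform float radius;                  // set with glUniform1f before each circle
void main()
{
    gl_Position = vec4(inPos * radius + center, 0.0, 1.0);
}

// Fragment shader: the per-circle colour is also a uniform.
#version 330 core
uniform vec4 circleColor;              // set with glUniform4f before each circle
out vec4 fragColor;
void main()
{
    fragColor = circleColor;
}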
You don't have shaders that draw circles (OK, you can with the right tricks, but let's forget that for now, because it is misleading and has very rare and specific uses). Shaders are little programs you write to take care of certain stages of the graphics pipeline, and they operate at a lower level than "drawing a circle".
Generally speaking, every time you make a draw call, you have to tell OpenGL which shaders to use (with a call to glUseProgram). You have to use at least a vertex shader and a fragment shader. The resulting pipeline will be something like this:
Vertex Shader: the code that is executed for each of the vertices you send to OpenGL. It is executed for each index you put in the element array, and it uses the corresponding vertex attributes as input data, such as the vertex position, its normal, its uv coordinates, maybe its tangent (if you are doing normal mapping), or whatever else you are sending to it. Generally you want to do your geometric calculations here. You can also access the uniform variables you set up for your draw call, which are global variables that do not change per vertex. A typical uniform variable you might want to use in a vertex shader is the PVM matrix. If you don't use tessellation, the vertex shader writes gl_Position, the position the rasterizer uses to create fragments. You can also have the vertex shader output other things (such as the uv coordinates and the normals, once you have dealt with their geometry), hand them to the rasterizer, and use them later (see the sketch after this list).
Rasterization
Fragment Shader: the code that is executed for each fragment (for each pixel, if that is clearer). Generally you do texture sampling and lighting calculations here. You will use the data coming from the vertex shader and the rasterizer, such as the normals (to evaluate the diffuse and specular terms) and the uv coordinates (to fetch the right colors from the textures). The textures are going to be uniforms, and probably also the parameters of the lights you are evaluating.
Depth test and stencil test (which you can move before the fragment shader with the early fragment test optimization: http://www.opengl.org/wiki/Early_Fragment_Test ).
Blending.
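A minimal sketch of the two programmable stages described above (attribute and uniform names are illustrative):
// Vertex shader: applies the PVM matrix and forwards the attributes
// the rasterizer should interpolate.
#version 330 core
layout (location = 0) in vec3 inPosition;
layout (location = 1) in vec3 inNormal;
layout (location = 2) in vec2 inUV;
uniform mat4 pvm;                      // projection * view * model, a per-draw uniform
out vec3 normal;
out vec2 uv;
void main()
{
    normal = inNormal;
    uv = inUV;
    gl_Position = pvm * vec4(inPosition, 1.0);
}

// Fragment shader: samples the texture with the interpolated uv coordinates;
// lighting would be evaluated here using the interpolated normal.
#version 330 core
in vec3 normal;
in vec2 uv;
uniform sampler2D diffuseTex;
out vec4 fragColor;
void main()
{
    fragColor = texture(diffuseTex, uv);
}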
I suggest you look at this nice program for developing simple shaders, http://sourceforge.net/projects/quickshader/ , which has very good examples, including some more advanced things you won't find in every tutorial.