I'm currently planning a renderer, and I have two different ways I could handle shaders. I've written pseudo-code for them:
Example A
for all models {
    bind all vertex data for the model
    for each shader on this model {
        set shader
        upload uniforms
        draw indices for this shader
    }
}
Example B
for all models {
    collect geometry and separate it by shader
}
sort geometry by shader into groups
for all shaders {
    set shader
    upload uniforms
    draw geometry group for this shader
}
The advantage of Example A is that I only have to bind the vertex data once, and it is shared by all the geometry of that model. The downside is that I have to constantly switch shaders and upload uniforms to them.
The advantage of Example B is that I can sort all the geometry in the entire scene by shader, so I only have to apply each shader once for the whole scene. It also means that more work is queued at any given time, so there is less "idle" time after the drawing for one shader is finished.
Which way would end up performing better? Based on what I've read I'm leaning towards Example B, but I'd like to learn a little more about it.
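Example B's idea can be sketched on the CPU side as a sort over draw records. This is a minimal C++ sketch under assumptions: `DrawItem`, `countChanges` and `sortByShader` are hypothetical names, the integer fields only stand in for GL program and buffer names, and no GL calls are made.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical draw record; the fields stand in for GL object names.
struct DrawItem {
    unsigned shader;      // program that draws this item
    unsigned model;       // model whose vertex data must be bound
    unsigned firstIndex;  // start of the index range
    unsigned indexCount;  // number of indices to draw
};

// Counts how often the value returned by `key` changes along the submission
// order, i.e. how many binds (glUseProgram, buffer binds, ...) would be issued.
template <typename KeyFn>
int countChanges(const std::vector<DrawItem>& items, KeyFn key) {
    int changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i)
        if (i == 0 || key(items[i]) != key(items[i - 1])) ++changes;
    return changes;
}

// Example B: group by shader first, then by model inside each shader group.
void sortByShader(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        if (a.shader != b.shader) return a.shader < b.shader;
        return a.model < b.model;
    });
}
```

With three models that each use shaders 1 and 2, listed model by model (Example A order), the shader changes six times and the model three times. After `sortByShader`, the shader changes only twice, but the bound model now changes six times, which is exactly the trade-off between the two examples.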

As so often with OpenGL, it depends on the implementation (i.e. the driver) in question.
For example, early NVIDIA drivers were prone to completely recompiling the shader when uniform values changed (so changing a uniform could be much more expensive than switching a texture or shader). Later this bottleneck was removed, and changing a uniform value became a rather cheap operation (cheaper than switching a texture or a shader).
As with all performance-related questions, you have to profile your program and test out your options.


OpenGL Lighting Shader

I can't understand the concept of smaller shaders in OpenGL. How does it work? For example, do I need to create one shader for positioning an object in space and then another shader for lighting, or what? Could someone explain this to me? Thanks in advance.
This is a very complex topic, especially since your question isn't very specific. First, there are various shader stages (vertex shader, pixel shader, and so on). A shader program consists of several shader stages, at least a vertex and a pixel shader (except for compute shader programs, which each consist of a single compute shader). The vertex shader calculates the position of the vertices on screen, so this is where the objects are moved. The pixel shader (called the fragment shader in OpenGL) calculates the color of each pixel covered by the geometry your vertex shader produced. Now, in terms of lighting, there are different ways of doing it:
Forward Shading
This is the straightforward way, where you simply calculate the lighting in the pixel shader of the same shader program that moves the objects. It is the oldest and easiest way of calculating lighting. However, its abilities are very limited: every shaded fragment has to loop over every light, so the cost grows with the number of lights times the number of shaded fragments.
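To make the per-fragment cost of forward shading concrete, here is a CPU-side C++ model of what a forward-shading fragment shader does for one fragment. It is a sketch, not GLSL: `Vec3`, `PointLight` and `shadeFragment` are hypothetical names, and it assumes plain Lambert diffuse with no attenuation.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 normalize(Vec3 v) { float l = std::sqrt(dot(v, v)); return {v.x / l, v.y / l, v.z / l}; }

struct PointLight { Vec3 position; float intensity; };

// Forward shading: the fragment's own shader loops over every light
// and accumulates Lambert diffuse (no attenuation, no shadows).
float shadeFragment(Vec3 pos, Vec3 normal, const std::vector<PointLight>& lights) {
    float sum = 0.0f;
    for (const PointLight& l : lights) {
        Vec3 toLight = normalize(sub(l.position, pos));
        sum += l.intensity * std::max(0.0f, dot(normal, toLight));
    }
    return sum;
}
```

The inner loop runs once per light for every fragment that reaches the shader, including fragments that are later overwritten, which is where the limitation described above comes from.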
Deferred Shading
For years, this has been the go-to variant in games. Here, you have one shader program (vertex + pixel shader) that renders the geometry into one (or multiple) textures. It moves the objects, but instead of the lit color it stores things like the base color and surface normals in the textures. A second shader program then renders a quad on screen for each light you want to render; its pixel shader reads the information previously written into the textures by the first shader program and uses it to render the lit objects into another texture (which is then the final image). In contrast to forward shading, this allows (in theory) any number of lights in the scene, and makes shadow maps easier to use.
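The lighting pass of deferred shading can be modelled on the CPU as a loop over lights, each adding its contribution to every covered texel of the G-buffer. This is a simplified C++ sketch under assumptions: the `GBufferTexel` layout is hypothetical (real engines pack attributes into several render targets), colors are single-channel, and only directional lights are handled.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One G-buffer texel written by the geometry pass (hypothetical layout).
struct GBufferTexel {
    float albedo;       // base color, single channel for brevity
    float nx, ny, nz;   // surface normal
    bool covered;       // false where no geometry was rasterized
};

// Direction points towards the light.
struct DirectionalLight { float dx, dy, dz; float intensity; };

// Lighting pass: one "screen quad" per light, additively blended into the
// output. The geometry is never rendered again, only the stored data is read.
std::vector<float> lightingPass(const std::vector<GBufferTexel>& gbuffer,
                                const std::vector<DirectionalLight>& lights) {
    std::vector<float> image(gbuffer.size(), 0.0f);
    for (const DirectionalLight& l : lights) {
        for (std::size_t i = 0; i < gbuffer.size(); ++i) {
            const GBufferTexel& t = gbuffer[i];
            if (!t.covered) continue;
            float ndotl = t.nx * l.dx + t.ny * l.dy + t.nz * l.dz;
            image[i] += t.albedo * l.intensity * std::max(0.0f, ndotl);
        }
    }
    return image;
}
```

Adding a light adds one more pass over the stored data rather than re-rendering every object, which is why the number of lights is (in theory) unbounded.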
Tiled/Clustered Shading
This is a rather new and very complex way of calculating lighting that can be built on top of either deferred or forward shading. It basically uses compute shaders to build an acceleration structure on the GPU (a list of the lights affecting each screen tile or cluster), which is then used to draw huge numbers of lights very fast. This makes it possible to render thousands of lights in a scene in real time, but using shadow maps for these lights is very hard, and the algorithm is much more complex than the previous ones.
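The core of the tiled variant is a light-culling step that builds a light list per screen tile. Below is a CPU-side C++ sketch of that step under assumptions: lights are already projected to a screen-space center and radius in pixels, each light is tested conservatively with its bounding square, and the per-tile depth bounds that real implementations also use are omitted. In practice this runs in a compute shader, one work group per tile.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Projected light: center and radius in pixels (hypothetical input format).
struct ScreenLight { float x, y, radius; };

// Returns, for each tile (row-major), the indices of lights whose bounding
// square overlaps it. Shading a pixel then only loops over its tile's list.
std::vector<std::vector<int>> cullLightsToTiles(const std::vector<ScreenLight>& lights,
                                                int width, int height, int tileSize) {
    int tilesX = (width + tileSize - 1) / tileSize;
    int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<int>> tiles(tilesX * tilesY);
    for (int i = 0; i < (int)lights.size(); ++i) {
        const ScreenLight& l = lights[i];
        int x0 = std::max(0, (int)std::floor((l.x - l.radius) / tileSize));
        int x1 = std::min(tilesX - 1, (int)std::floor((l.x + l.radius) / tileSize));
        int y0 = std::max(0, (int)std::floor((l.y - l.radius) / tileSize));
        int y1 = std::min(tilesY - 1, (int)std::floor((l.y + l.radius) / tileSize));
        for (int ty = y0; ty <= y1; ++ty)         // empty range if the light is off-screen
            for (int tx = x0; tx <= x1; ++tx)
                tiles[ty * tilesX + tx].push_back(i);
    }
    return tiles;
}
```

Each pixel then evaluates only the few lights in its tile's list instead of all lights in the scene, which is what makes thousands of lights affordable.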
Writing smaller shaders means moving some of your shader functionality into separate files. If you are writing a big shader that contains lighting algorithms, antialiasing algorithms, and other shader computations, you can split them into smaller shader files (light.glsl, fxaa.glsl, and so on) and then link them together with your main shader file (the file that contains the void main() function), because in OpenGL each draw call uses exactly one shader program (the combination of vertex shader, fragment shader, geometry shader, etc.).
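Core GLSL has no #include directive, so the files have to be combined by the application: for example by attaching several shader objects of the same stage to one program before linking (desktop GL), by passing several strings to glShaderSource (which concatenates them), or through the ARB_shading_language_include extension. A fourth common approach is a tiny preprocessor run before glShaderSource. The C++ sketch below shows that approach under assumptions: `expandIncludes` is a hypothetical helper, the "files" live in an in-memory map instead of on disk, and only the exact form `#include "name"` at the start of a line is recognized.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Replaces lines of the form  #include "name"  with the contents of files[name],
// recursively. The result is a single GLSL source string.
std::string expandIncludes(const std::string& source,
                           const std::map<std::string, std::string>& files,
                           int depth = 0) {
    if (depth > 16) throw std::runtime_error("include nesting too deep");
    const std::string directive = "#include \"";
    std::istringstream in(source);
    std::string line, out;
    while (std::getline(in, line)) {
        if (line.compare(0, directive.size(), directive) == 0) {
            std::size_t end = line.find('"', directive.size());
            if (end == std::string::npos) throw std::runtime_error("malformed include: " + line);
            std::string name = line.substr(directive.size(), end - directive.size());
            auto it = files.find(name);
            if (it == files.end()) throw std::runtime_error("missing include: " + name);
            out += expandIncludes(it->second, files, depth + 1);
        } else {
            out += line;
            out += '\n';
        }
    }
    return out;
}
```

The expanded string is what would then be handed to glShaderSource and glCompileShader. Keeping the #version line in the main file matters, because it must come before any other code.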
How you split your shaders also depends on your rendering algorithm (forward rendering, deferred rendering, or forward+ rendering).
It's important to note that writing a lot of shaders increases the total shader compilation time, and writing one big shader with a lot of uniforms can also slow things down.

A few questions about shaders

I am using OpenGL shaders.
Does the number of uniforms affect shader performance? Does it matter whether I pass 5 uniforms or 50?
Does each shader have its own area that it works on? Or can each shader draw anywhere in my application?
I often create a vertex shader just to pass attributes to the fragment shader. What is the benefit of a vertex shader, and why not just pass the attributes directly to the fragment shader?
I would guess it doesn't (and if it does, only a very minor one). But I don't have any evidence for that, so I might be wrong. This is almost certainly driver-specific.
A shader does not draw anything. A shader just processes data. In the pipeline, the rasterizer produces the fragments that are covered by your shape. And these are the fragments that you can potentially draw to. The fragment shader calculates the color (and possibly depth) and the rest of the pipeline decides what to do with the result (either updating the frame buffer, blending, or discarding it altogether). Each draw call can potentially produce a framebuffer update everywhere, not just at some specific locations.
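To illustrate how the rasterizer decides which fragments a shape produces, here is a simplified CPU model of triangle coverage using edge functions. It is a C++ sketch under stated assumptions: pixel centers sit at (x + 0.5, y + 0.5), edges are treated as inclusive instead of using the tie-breaking rules real rasterizers apply, and there is no multisampling or clipping. Each counted pixel corresponds to one fragment-shader invocation.

```cpp
// 2D screen-space point (hypothetical helper type).
struct P2 { float x, y; };

// Signed-area test: positive if p lies to the left of the edge a->b.
static float edge(P2 a, P2 b, P2 p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Counts the pixel centers of a width x height grid covered by triangle v0 v1 v2,
// accepting either winding order.
int countCoveredFragments(P2 v0, P2 v1, P2 v2, int width, int height) {
    int count = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            P2 p{x + 0.5f, y + 0.5f};
            float e0 = edge(v0, v1, p), e1 = edge(v1, v2, p), e2 = edge(v2, v0, p);
            bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                          (e0 <= 0 && e1 <= 0 && e2 <= 0);
            if (inside) ++count;   // the fragment shader would run here
        }
    }
    return count;
}
```

For the triangle (0,0), (4,0), (0,4) on a 4x4 grid this covers 10 pixel centers: the vertex shader runs 3 times, the fragment shader 10 times, and only those 10 pixels can be written.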
This is perfectly fine if the application requires it. The main difference is that vertex shaders process vertices and fragment shaders process fragments. Usually, there are many more fragments than vertices, so the fragment shader is invoked far more often than the vertex shader. Therefore, you should do as much work in the vertex shader as possible. Of course, there are things that you just cannot calculate in a vertex shader.
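The reason attributes go through the vertex shader is that its outputs are interpolated across the primitive before each fragment shader invocation sees them. A CPU model of that step, for the default perspective-correct ("smooth") interpolation, is sketched below in C++; `interpolateVarying` is a hypothetical name, b0..b2 are assumed to be screen-space barycentric weights summing to 1, and w0..w2 the clip-space w of each vertex.

```cpp
// Perspective-correct interpolation of one vertex-shader output across a
// triangle: weight each value by b_i / w_i, then renormalize.
float interpolateVarying(float a0, float a1, float a2,
                         float b0, float b1, float b2,
                         float w0, float w1, float w2) {
    float num = b0 * a0 / w0 + b1 * a1 / w1 + b2 * a2 / w2;
    float den = b0 / w0 + b1 / w1 + b2 / w2;
    return num / den;
}
```

With all w equal this reduces to plain barycentric blending. Midway between two vertices with w = 1 and w = 3 and values 0 and 1, it yields 0.25 instead of the naive 0.5, which is why the rasterizer, not the application, has to do it.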

Why is the geometry shader processed after the vertex shader?

In both the OpenGL and Direct3D rendering pipelines, the geometry shader is processed after the vertex shader and before the fragment/pixel shader. Now obviously processing the geometry shader after the fragment/pixel shader makes no sense, but what I'm wondering is why not put it before the vertex shader?
From a software/high-level perspective, at least, it seems to make more sense that way: first you run the geometry shader to create all the vertices you want (and dump any data only relevant to the geometry shader), then you run the vertex shader on all the vertices thus created. There's an obvious drawback in that the vertex shader now has to be run on each of the newly-created vertices, but any logic that needs to be done there would, in the current pipelines, need to be run for each vertex in the geometry shader, presumably; so there's not much of a performance hit there.
I'm assuming, since the geometry shader is in this position in both pipelines, that there's either a hardware reason, or a non-obvious pipeline reason that it makes more sense.
(I am aware that polygon linking needs to take place before running a geometry shader (possibly not if it takes single points as inputs?) but I also know it needs to run after the geometry shader as well, so wouldn't it still make sense to run the vertex shader between those stages?)
It is basically because "geometry shader" was a pretty stupid choice of words on Microsoft's part. It should have been called "primitive shader."
Geometry shaders make the primitive assembly stage programmable, and you cannot assemble primitives before you have an input stream of vertices computed. There is some overlap in functionality since you can take one input primitive type and spit out a completely different type (often requiring the calculation of extra vertices).
These extra emitted vertices do not require a trip backwards in the pipeline to the vertex shader stage - they are completely calculated during an invocation of the geometry shader. This concept should not be too foreign, because tessellation control and evaluation shaders also look very much like vertex shaders in form and function.
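As an illustration of vertices being produced entirely inside one geometry shader invocation, here is a CPU-side C++ model of a point-to-quad (billboard) geometry shader. It is a sketch under assumptions: `ClipVertex` and `expandPointToQuad` are hypothetical, the input is taken to be a clip-space vertex already produced by the vertex shader, and the GLSL built-ins it mirrors (EmitVertex, EndPrimitive) are only referenced in comments.

```cpp
#include <vector>

// Clip-space vertex as it leaves the vertex shader, plus a texture coordinate.
struct ClipVertex { float x, y, z, w; float u, v; };

// Models a GS declared as
//   layout(points) in; layout(triangle_strip, max_vertices = 4) out;
// The four corners are computed in this single invocation; the vertex shader
// is not run again for them.
std::vector<ClipVertex> expandPointToQuad(const ClipVertex& p, float halfSize) {
    const float offsets[4][2] = {{-1.0f, -1.0f}, {1.0f, -1.0f}, {-1.0f, 1.0f}, {1.0f, 1.0f}};
    std::vector<ClipVertex> out;
    for (const auto& o : offsets) {          // triangle-strip order
        ClipVertex v = p;
        v.x += o[0] * halfSize * p.w;        // scaled by w: constant size after the divide
        v.y += o[1] * halfSize * p.w;
        v.u = (o[0] + 1.0f) * 0.5f;
        v.v = (o[1] + 1.0f) * 0.5f;
        out.push_back(v);                    // corresponds to EmitVertex()
    }
    return out;                              // corresponds to EndPrimitive()
}
```

One input point becomes one output primitive of a different type, which is the "primitive shader" role described above.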
There are a lot of stages of vertex transform, and what we call vertex shaders are just the tip of the iceberg. In a modern application you can expect the output of a vertex shader to go through multiple additional stages before you have a finalized vertex for rasterization and pixel shading (which is also poorly named).

Why is a simple shader slower than the standard pipeline?

I want to write a very simple shader that is equivalent to (or faster than) the standard pipeline. However, even the simplest possible shader:
Vertex Shader
void main(void)
{
    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}
Fragment Shader
uniform sampler2D Texture0;
void main(void)
{
    gl_FragColor = texture2D(Texture0, gl_TexCoord[0].xy);
}
cuts the framerate of my game in half compared to the standard pipeline, and performs horribly when some transparent images are displayed. I don't understand this, because the standard pipeline (glUseProgram(0)) does lighting and alpha blending, while this shader only draws flat textures. What makes it so slow?
It looks like this massive slowdown of custom shaders is a problem with old Intel Graphics chips, which seem to emulate the shaders on the CPU.
I tested the same program on recent hardware, and the frame drop with the custom shader activated is only about 2-3 percent.
EDIT: Wrong theory. See the new answer below.
I think you might be running into overdraw.
I don't know which engine you are using your shader in, but if you have alpha blending enabled, you might end up overdrawing a lot.
Think about it this way:
If you have an 800x600 screen and a 2D quad covering the whole screen, that quad causes 480,000 fragment shader invocations, although it has only 4 vertices.
Now let's assume you have 10 such quads, one on top of another. If you don't sort your geometry front to back, or if you are using alpha blending with no depth test, you will end up with 10 x 800 x 600 = 4,800,000 fragment shader invocations.
2D is usually quite expensive in OpenGL because of overdraw. In 3D, the depth test rejects many fragments, so even though the shaders are more complicated, the number of fragment shader invocations is greatly reduced compared to 2D.
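The overdraw arithmetic above can be written down as two small C++ functions. This is a sketch with hypothetical names that assumes every quad covers the whole viewport; the second function is an idealized best case for opaque quads drawn front to back with depth testing and perfect early depth rejection. With blending, as with the transparent images in the question, only the first applies.

```cpp
#include <cstdint>

// Every stacked full-screen layer reaches the fragment shader
// (blending, or opaque layers drawn back to front).
std::uint64_t fragmentsNoRejection(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t layers) {
    return width * height * layers;
}

// Idealized: opaque layers drawn front to back, early depth test rejects
// everything behind the nearest layer before the fragment shader runs.
std::uint64_t fragmentsFrontToBack(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t layers) {
    return layers == 0 ? 0 : width * height;
}
```

For 10 layers at 800x600 this gives 4,800,000 versus 480,000 invocations, while the vertex count stays at 40 in both cases.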
After a long investigation, it turned out that the slowdown of the simple shader was caused by the shader being too simple.
In my case, the slowdown was caused by the text rendering engine, which made heavy use of glBitmap, which is very slow with textures enabled (for whatever reason I cannot understand; these letters are tiny).
However, this did not affect the standard pipeline, because it honored glDisable(GL_LIGHTING) and glDisable(GL_TEXTURE_2D), which circumvents the slowdown, whereas the simple shader ignored those states and thus did even more work than the standard pipeline. After adding handling for these two states to the custom shader, it is as fast as the standard pipeline, with the added ability to apply arbitrary effects without any performance impact!

Performance difference between geometry shader and vertex shader

I am currently rendering a model of around 1 million vertices, and inside the vertex shader I am doing some complex computation for each vertex. Now I would like to increase the resolution of the model.
I have two questions regarding this:
Is it advisable to use a geometry shader to increase the resolution by very large factors, like 64 times?
If I introduce a geometry shader, I might need to move my computation from the vertex shader to the geometry shader. Is doing an operation in the vertex shader the same as doing it in the geometry shader, in terms of performance?
Is it advisable to use a geometry shader to increase the resolution by very large factors, like 64 times?
Absolutely not. While GSs can amplify geometry and perform tessellation, that's not really what they're for. Their main purposes are handling transform feedback data (particularly on hardware that can handle multi-stream output) and layered rendering.
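To get a feel for what a 64x amplification asks of a single GS invocation, the worked count below assumes each input triangle is split into n x n sub-triangles (n = 8 gives 64) and emitted as one triangle strip per row. The function name is hypothetical. Row r holds 2r + 1 triangles, and a strip of k triangles needs k + 2 vertices, so the total is n² + 2n.

```cpp
// Vertices one geometry shader invocation must emit to split a triangle into
// n*n sub-triangles, using one triangle strip per row of the subdivision.
int stripVerticesForSubdivision(int n) {
    int total = 0;
    for (int row = 0; row < n; ++row) {
        int trianglesInRow = 2 * row + 1;
        total += trianglesInRow + 2;   // a strip of k triangles needs k + 2 vertices
    }
    return total;                      // equals n*n + 2*n
}
```

For n = 8 that is 80 vertices from a single invocation; with a vec4 position, vec3 normal and vec2 texture coordinate per vertex this is 720 output components, a sizeable share of the minimums GLSL guarantees (gl_MaxGeometryOutputVertices = 256, gl_MaxGeometryTotalOutputComponents = 1024).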
If I introduce a geometry shader, I might need to move my computation from the vertex shader to the geometry shader. Is doing an operation in the vertex shader the same as doing it in the geometry shader, in terms of performance?
Do as little work in the GS as is reasonable. The GS happens after the post-T&L cache, and you want to get as much out of that as possible. So do as much of your real transformation work as is reasonable in the vertex shader.
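Why the post-T&L cache matters can be seen with a small simulation that counts vertex shader invocations for an indexed draw. This is a simplified C++ model under assumptions: the cache is treated as a strict FIFO of `cacheSize` entries (real GPUs use different, often batch-based schemes), and `vertexShaderInvocations` is a hypothetical name. Vertices emitted by a geometry shader get no such reuse; each one is computed inside its invocation.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Counts vertex shader runs for an index buffer, modelling the post-transform
// cache as a FIFO: a hit reuses the cached result, a miss runs the shader.
int vertexShaderInvocations(const std::vector<unsigned>& indices, std::size_t cacheSize) {
    std::deque<unsigned> cache;
    int invocations = 0;
    for (unsigned idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end()) continue; // hit
        ++invocations;                       // miss: the vertex shader runs
        cache.push_back(idx);
        if (cache.size() > cacheSize) cache.pop_front();
    }
    return invocations;
}
```

For two triangles sharing an edge (indices 0,1,2, 2,1,3), a cache of any reasonable size reduces six invocations to four; work done in the vertex shader benefits from this reuse, work done per emitted GS vertex does not.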