Why is the Graphics Pipeline so highly specialized? (OpenGL) - c++

The OpenGL graphics pipeline changes every year, and the programmable part of it keeps growing. In the end, as OpenGL programmers we write many small programs (vertex, fragment, geometry, tessellation, ...).
Why is there such a high degree of specialization between the stages? Do they all run on different parts of the hardware? Why not write a single block of code that describes what should come out at the end, instead of juggling between the stages?
http://www.g-truc.net/doc/OpenGL%204.3%20Pipeline%20Map.pdf
In this Pipeline PDF we see the beast.

In the days of "Quake" (the game), developers rendered on the CPU and had the freedom to do anything they wanted; they were in control of everything in the "pipeline".
With the introduction of the fixed pipeline and GPUs, you got "better" performance but lost a lot of that freedom. Graphics developers have been pushing to get it back, hence the pipeline becomes more customizable with every release. GPUs are even "fully" programmable now through technologies such as CUDA and OpenCL, even if those are not strictly about graphics.
On the other hand, GPU vendors cannot replace the whole pipeline with a fully programmable one overnight. In my opinion, this boils down to several reasons:
GPU capabilities and cost: GPUs evolve with each iteration. It makes no sense to throw away the whole architecture you have and replace it overnight; instead you add new features and enhancements with every iteration, especially when developers ask for them (example: the tessellation stage). Think of CPUs: Intel tried to replace the x86 architecture with Itanium, sacrificing backward compatibility. After that failed, it eventually adopted what AMD had done with the AMD64 (x86-64) architecture.
Legacy application support: vendors also can't fully replace the pipeline because of legacy applications, which are more widely used than one might expect.

Historically, there were actually different processing units for the different programmable parts - there were Vertex Shader processors and Fragment Shader processors, for example. Nowadays, GPUs employ a "unified shader architecture" where all types of shaders are executed on the same processing units. That's why non-graphic use of GPUs such as CUDA or OpenCL is possible (or at least easy).
Notice that the different shaders have different inputs/outputs - a vertex shader is executed for each vertex, a geometry shader for each primitive, a fragment shader for each fragment. I don't think this could be easily captured in one big block of code.
And last but definitely far from least, performance. There are still fixed-function stages between the programmable parts (such as rasterisation). And for some of these, it's simply impossible to make them programmable (or callable outside of their specific time in the pipeline) without reducing performance to a crawl.
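To make the "different inputs/outputs" point concrete, here is a minimal CPU-side sketch in C++. It is not how a driver or GPU works; the names `VertexStage`, `FragmentStage`, `drawTriangle` and `DrawStats` are made up for illustration, and the coverage test is simplified (no fill rule, no clipping). It only shows that the two programmable stages have different signatures and run at different rates, with a fixed step in between.

```cpp
#include <array>
#include <cassert>
#include <functional>

// Hypothetical types for illustration; none of these names come from OpenGL.
struct Vec2 { float x, y; };
struct Fragment { int px, py; };

using VertexStage   = std::function<Vec2(const Vec2&)>;      // runs once per vertex
using FragmentStage = std::function<void(const Fragment&)>;  // runs once per covered pixel

// Signed area of the parallelogram spanned by (b - a) and (p - a).
inline float edge(const Vec2& a, const Vec2& b, const Vec2& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

struct DrawStats {
    int vertexInvocations = 0;
    int fragmentInvocations = 0;
};

// Draws one triangle into a width x height pixel grid and counts stage invocations.
DrawStats drawTriangle(const std::array<Vec2, 3>& in, int width, int height,
                       const VertexStage& vs, const FragmentStage& fs) {
    assert(width > 0 && height > 0);
    DrawStats stats;
    std::array<Vec2, 3> v{};
    for (int i = 0; i < 3; ++i) {  // programmable: one call per vertex
        v[i] = vs(in[i]);
        ++stats.vertexInvocations;
    }
    // Simplified stand-in for the fixed rasterizer: test every pixel centre
    // against the three edges (either winding order is accepted).
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const Vec2 p{x + 0.5f, y + 0.5f};
            const float e0 = edge(v[0], v[1], p);
            const float e1 = edge(v[1], v[2], p);
            const float e2 = edge(v[2], v[0], p);
            const bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                                (e0 <= 0 && e1 <= 0 && e2 <= 0);
            if (inside) {  // programmable: one call per covered pixel
                fs(Fragment{x, y});
                ++stats.fragmentInvocations;
            }
        }
    }
    return stats;
}
```

With an identity vertex stage and the triangle (0,0), (4,0), (0,4) on a 4x4 grid, the vertex stage runs 3 times while the fragment stage runs once for each of the 10 covered pixel centres, which is the mismatch that is hard to express in one block of code.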

Because each stage has a different purpose:
The vertex shader transforms the points to where they should be on the screen.
The fragment shader runs for each fragment (read: each pixel covered by a triangle) and applies lighting and color.
The geometry and tessellation shaders do things the classic vertex and fragment shaders cannot (replacing the drawn primitives with other primitives), and both are optional.
If you look carefully at that PDF, you'll see different inputs and outputs for each shader.

Separating each shader stage also allows you to mix and match shaders beginning with OpenGL 4.1. For example, you can use one vertex shader with multiple different fragment shaders, and swap out the fragment shaders as needed. Doing that when shaders are specified as a single code block would be tricky, if not impossible.
More info on the feature: http://www.opengl.org/wiki/GLSL_Object#Program_separation

Mostly because nobody wants to re-invent the wheel if they do not have to.
Many of the specialized things that are still fixed-function would simply make life more difficult for developers if they had to be programmed from scratch to draw a single triangle. Rasterization, for instance, would truly suck if you had to implement primitive coverage yourself or handle attribute interpolation. It might add some novel flexibility, but the vast majority of software does not require that flexibility and developers benefit tremendously from never thinking about this sort of stuff unless they have some specialized application in mind.
Truth be told, you can implement the entire graphics pipeline yourself using compute shaders if you are so inclined. Performance generally will not be competitive with pushing vertices through the traditional render pipeline and the amount of work necessary would be quite daunting, but it is doable on existing hardware. Realistically, this approach does not offer a lot of benefits for rasterized graphics, but implementing a ray-tracing based pipeline using compute shaders could be a worthwhile use of time.
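As a rough idea of what "primitive coverage" and "attribute interpolation" would mean if you had to write them yourself, here is a small C++ sketch using barycentric weights. The names `Point`, `signedArea2` and `interpolate` are hypothetical, and the interpolation is plain screen-space linear (no perspective correction, no fill rule), so treat it as an illustration rather than what the hardware actually does.

```cpp
#include <cassert>

struct Point { float x, y; };

// Twice the signed area of triangle (a, b, p).
inline float signedArea2(const Point& a, const Point& b, const Point& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Coverage + interpolation for one sample position p.
// Returns false if p lies outside triangle (a, b, c); otherwise writes the
// attribute value blended from the three per-vertex values into `out`.
bool interpolate(const Point& a, const Point& b, const Point& c,
                 float attrA, float attrB, float attrC,
                 const Point& p, float& out) {
    const float area = signedArea2(a, b, c);
    assert(area != 0.0f && "degenerate triangle");
    // Barycentric weights: each is the sub-triangle area opposite a vertex,
    // divided by the full area, so they sum to 1 inside the triangle.
    const float wA = signedArea2(b, c, p) / area;
    const float wB = signedArea2(c, a, p) / area;
    const float wC = signedArea2(a, b, p) / area;
    if (wA < 0.0f || wB < 0.0f || wC < 0.0f) {
        return false;  // sample is not covered by the primitive
    }
    out = wA * attrA + wB * attrB + wC * attrC;
    return true;
}
```

Every varying of every fragment needs this kind of work, which is part of why leaving it to the fixed-function rasterizer is attractive.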

Related

Compatibility issues on GLSL fragment shaders

A few times I have found differences in how GPUs handle fragment shaders. One example was calling pow(x, y) with a negative x: one GPU handled it well while another one failed.
Another situation was where I rewrote if() statements as step() calls and the shader then worked well. I blamed this on a branching limit or something similar.
Now I am in a situation where my fragment shader works on some GPUs and not on others. I have tried to search for GPU/shader limits and similar information but found nothing.
The current test, which works everywhere I tried except on my GTX 780, is online on Shadertoy.
I am asking for any directions or a link to shader limitations and most common issues in compatibility.
Shader limitations and specifications are vendor-specific, and they change along with GPU architecture versions.
Simply put, there's no unified way to "rule them all". Certain GPUs handle branching differently, some better, some worse. Some GPUs allow negative values in math functions, some don't. It quite depends on the architecture that's been used, version of the shading language and instructions that are allowed in the compiled version of the shader.
Instead of reading and trying to learn what works on which card, it's best to try/test shader on the specific GPU. That's probably the most reasonable decision regarding the resources spent trying to "fix the issue".
To answer the question directly, there's no (easy-to-find) resource that lists the entire specification of the compiler used on a specific architecture. You simply follow the tips and tricks learned along the way and apply suggestions and observations made by others.
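For the two cases from the question, it may help to know that the GLSL specification leaves pow(x, y) undefined for x < 0 (and for x = 0 with y <= 0), so different results on different GPUs are allowed there. The sketch below mirrors the GLSL built-ins step() and mix() in plain C++ so the rewrite can be checked on the CPU. The `glsl_like` namespace, `powNonNegative`, `shadeBranch` and `shadeStep` are hypothetical names, and clamping the base to zero is just one possible guard (using the absolute value is another), not a recommendation from the answer.

```cpp
#include <cassert>
#include <cmath>

namespace glsl_like {

// Mirrors GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0.
inline float step(float edge, float x) { return x < edge ? 0.0f : 1.0f; }

// Mirrors GLSL mix(a, b, t): linear blend a * (1 - t) + b * t.
inline float mix(float a, float b, float t) { return a * (1.0f - t) + b * t; }

// Hypothetical guard: clamp the base so it never enters the range where
// GLSL pow() is undefined. Requires y > 0 because pow(0, y) is also
// undefined in GLSL for y <= 0.
inline float powNonNegative(float x, float y) {
    assert(y > 0.0f);
    return std::pow(std::fmax(x, 0.0f), y);
}

// Branching form, as one might first write it in a shader.
inline float shadeBranch(float x) {
    if (x < 0.5f) {
        return 0.2f;
    }
    return 0.8f;
}

// Same selection expressed with step() and mix() instead of if().
inline float shadeStep(float x) {
    return mix(0.2f, 0.8f, step(0.5f, x));
}

}  // namespace glsl_like
```

Porting these helpers back to GLSL is mechanical, since step(), mix(), max() and pow() exist there with the same argument order.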

Modern OpenGL for 2d graphics

I wanted to use OpenGL for 2D graphics because of its hardware acceleration. Would you recommend modern OpenGL or the fixed pipeline for this? This whole shader writing seems like too much overhead to me. I just want to draw some 2D primitives.
Even for trivial 2D graphics, the programmable pipeline rather than the fixed-function pipeline is what you want. In the end, the programmable pipeline gives you more freedom in expressing your graphics. How you decide to program the pipeline is up to you and is driven by your graphical needs. It could be that you only need a single shader. There is no written rule that you need hundreds of shaders for it to be 'modern OpenGL'.
In that respect it's debatable whether modern OpenGL really is that much effort at all. It's a shader, a vertex buffer, an index buffer and a few textures. Compared to the fixed-function pipeline, has it really changed so much that you even have to consider sticking to the fixed-function pipeline?
A more compelling reason to prefer the programmable pipeline is that the fixed-function pipeline is deprecated. In other words, it is pending removal. In principle an IHV could decide to drop support for it at any moment.
Modern OpenGL is better. Don't be afraid of shaders. Without them, you can't do anything besides draw the image using some blending... You can't go too far...
With shaders, you can do pretty much everything (effects like Photoshop, for example).

What is the difference between OpenCL and OpenGL's compute shader?

I know OpenCL gives control of the GPU's memory architecture and thus allows better optimization, but, leaving this aside, can we use Compute Shaders for vector operations (addition, multiplication, inversion, etc.)?
In contrast to the other OpenGL shader types, compute shaders are not directly related to computer graphics and provide a much more direct abstraction of the underlying hardware, similar to CUDA and OpenCL. They provide customizable work group sizes, shared memory, intra-group synchronization and all those things known and loved from CUDA and OpenCL.
The main differences are basically:
It uses GLSL instead of OpenCL C. While there isn't a huge difference between those programming languages, you can use all the graphics-related GLSL functions not available to OpenCL, like advanced texture types (e.g. cube map arrays), advanced filtering (e.g. mipmapping, although you will probably need to compute the mip level yourself), and little conveniences like 4x4 matrices or geometric functions.
It is an OpenGL shader program like any other GLSL shader. This means accessing OpenGL data (like buffers, textures, images) is just trivial, while interfacing between OpenGL and OpenCL/CUDA can get tedious, with possible manual synchronization effort from your side. In the same way integrating it into an existing OpenGL workflow is also trivial, while setting up OpenCL is a book on its own, not to speak of its integration into an existing graphics pipeline.
So what this comes down to is that compute shaders are really intended for use within existing OpenGL applications. They follow the usual (OpenCL/CUDA-like) compute approach to GPU programming, in contrast to the graphics approach of the other shader stages, which don't have the compute flexibility of OpenCL/CUDA (while offering other advantages, of course). Doing compute tasks this way is more flexible, direct and easy than either squeezing them into other shader stages not intended for general computing or introducing an additional computing framework you have to synchronize with.
Compute shaders should be able to do nearly anything achievable with OpenCL, with the same flexibility and control over hardware resources and with the same programming approach. So if you have a good GPU-suitable algorithm (one that would work well with CUDA or OpenCL) for the task you want to do, then yes, you can do it with compute shaders too. But it wouldn't make much sense to use OpenGL (which still is and will probably always be a framework for real-time computer graphics in the first place) only because of compute shaders. For that you can just use OpenCL or CUDA. The real strength of compute shaders comes into play when mixing graphics and compute capabilities.
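As an illustration of the vector addition asked about, the following C++ sketch emulates a one-dimensional compute dispatch on the CPU. It assumes a local work group size of 64, and `groupCount`, `addKernel` and `dispatchAdd` are hypothetical names. In a real compute shader the index would come from gl_GlobalInvocationID (work group ID times work group size plus local invocation ID) and the invocations would run in parallel rather than in a loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed local work group size, standing in for `layout(local_size_x = 64) in;`.
constexpr std::size_t kLocalSize = 64;

// Number of work groups needed to cover n elements (ceiling division).
constexpr std::size_t groupCount(std::size_t n) {
    return (n + kLocalSize - 1) / kLocalSize;
}

// Body of the "shader": one invocation handles one element.
inline void addKernel(std::size_t globalId,
                      const std::vector<float>& a,
                      const std::vector<float>& b,
                      std::vector<float>& out) {
    if (globalId >= out.size()) {
        return;  // the last group may be padded past the end of the data
    }
    out[globalId] = a[globalId] + b[globalId];
}

// CPU emulation of a 1D dispatch: loops over groups and local invocations.
std::vector<float> dispatchAdd(const std::vector<float>& a,
                               const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    const std::size_t groups = groupCount(a.size());
    for (std::size_t group = 0; group < groups; ++group) {
        for (std::size_t local = 0; local < kLocalSize; ++local) {
            addKernel(group * kLocalSize + local, a, b, out);
        }
    }
    return out;
}
```

For 100 elements this launches 2 groups of 64 invocations, and the bounds check discards the 28 surplus invocations in the second group.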
Look here for another perspective.
Summarizing:
Yes, OpenCL already existed, but it targets heavyweight applications (think CFD, FEM, etc), and it is much more universal than OpenGL (think beyond GPUs... Intel's Xeon Phi architecture supports >50 x86 cores).
Also, sharing buffers between OpenGL/CUDA and OpenCL is not fun.

When should I use GLSL?

I have used OpenGL for a semester, but in the traditional way, i.e. glBegin...glEnd.
I heard someone say that GLSL is the future of OpenGL, so I was wondering: do I need to jump into GLSL instead of the traditional OpenGL?
Moreover, does GLSL only work well on a good GPU?
Short answer: Yes, you do need to update your OpenGL usage as you will generally get lousy performance from glBegin/glEnd and limit what you can do by constraining yourself to the old fixed pipe behavior.
Long answer:
You're mixing up two different problems. One of them is immediate mode (glBegin glVertex ... glEnd, etc.) vs. batched mode (glVertexPointer, etc.). To get full performance out of modern GPUs you need to use batches. See this SO discussion: When are VBOs faster than "simple" OpenGL primitives (glBegin())?.
The other one is fixed pipe vs. programmable shaders (glEnable states, etc. vs. GLSL). This can be a performance issue in many cases, but more importantly it's a flexibility issue. With GLSL you have far more control over how things are rendered, so you can accomplish things that weren't really possible using the fixed pipe, at least not at a usable frame rate. Programmable shaders are also a better reflection of how modern GPUs really work. In fact, if you use the fixed pipe it is probably just being emulated with a shader under the hood.
GLSL is not the future of OpenGL, it's the current way of programming. As Aeluned states, glBegin and glEnd are deprecated (and not even supported in OpenGL ES.)
And what do you mean by a good GPU? Even Intel integrated graphics support shaders, and using GLSL is not slower just for being GLSL. You might get slow performance when doing heavy work, but if you implement the equivalent of the fixed pipeline I think you will get the same performance.
I'd say learning GLSL is the way to go.

Loading and using an HLSL shader?

I've been looking everywhere and all I can find are tutorials on writing the shaders. None of them showed me how to incorporate them into my scene.
So essentially:
Given an hlsl shader, if I were to have a function called drawTexturedQuad() and I wanted the shader to be applied to the result, how exactly could I do this?
Thanks
ID3DXEffect provides Begin() and BeginPass() methods. Simply call your draw function (drawTexturedQuad() in your case) between BeginPass() and EndPass(). Any basic tutorial on shaders should show such a sample.
Just an additional note- if in doubt, ask MSDN.
The answer to this is surprisingly complex, and has been getting more difficult as the GPU hardware has been getting more and more powerful. The D3DX FX system is an example of all the work that needs to be done, so using that is a good step to just getting things working for short-term usage.
Shaders are code, but they live on another machine from the CPU, so all of their data needs to be marshalled over. The fixed parts (basic render states like depth states, stencil states, blending modes, and drawing commands) are extremely easy to implement. The hard part is making a bridge for the programmable parts: shaders, buffers, samplers, and textures.
Index buffers just work, since you can only have one, or none in the case of rendering un-indexed geometry.
Vertex buffers are fairly easy to deal with, since the hardware can be programmed to read the vertex buffers procedurally. Your vertex buffers only need to provide at least as much information as the vertex shader wants to access. Modifications to the shader's vertex input or to the vertex format require editing both sides at the same time, and so are reasonably easy to work with.
Samplers and textures are the next 'easier' of the hard parts to hook up: they have a variable name and a rather rigid type. For instance, when compiling shader 'foo', texture 'myNormalMap' is assigned texture slot 3. You need to look up (via the reflection APIs) which slot the texture was assigned, and set the texture your engine considers 'myNormalMap' to slot 3 at runtime, and of course also use the API to determine if the texture is even needed in the first place. This is where naming conventions for shader variables start to matter, so that multiple shaders can be made compatible with the same C++ code.
Constant buffers (or raw shader constants in D3D9) are a lot trickier, especially with a programmable shader framework like the ones found in engines such as Unreal. The constants any given shader uses are a subset of the full list, but the C++ side must generally be written as if all of them are needed. The reflection APIs are again needed to determine not only which variables are actually referenced in a shader, but also where they are located. This became a bit more manageable in D3D10 and newer, as the cbuffers are structs and less fluid than the D3D9 system, which was heavily limited by register count. However, it also adds the step of using the reflection APIs to determine the order of cbuffer bindings (and which cbuffers are referenced at all).
In the end there is one design that makes it all work:
Make a class that drives a specific shader archetype. For each variable this class exposes to a shader (be it a texture, a constant buffer, etc.), look up in the reflection info whether it is used, find out its location, and set it. Keeping this fast, flexible, and extensible is a difficult challenge.
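The lookup step of that design could be sketched roughly as follows. This is plain C++ with no Direct3D calls: `ReflectionInfo` is a hypothetical stand-in for whatever the real reflection API reports (variable name to slot), and `ShaderArchetype`, `Binding` and the resource names are invented for illustration. A real implementation would issue the API's bind calls instead of returning a list.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for reflection output: shader variable name -> slot.
using ReflectionInfo = std::unordered_map<std::string, int>;

// One resolved binding: which engine resource goes into which slot.
struct Binding {
    int slot;
    std::string resource;
};

class ShaderArchetype {
public:
    // `exposed` maps shader variable names (by naming convention) to the
    // engine-side resources this archetype is able to provide.
    explicit ShaderArchetype(std::map<std::string, std::string> exposed)
        : exposed_(std::move(exposed)) {}

    // For every exposed variable, check whether this particular shader
    // references it and, if so, record the slot it was assigned.
    std::vector<Binding> resolve(const ReflectionInfo& reflection) const {
        std::vector<Binding> bindings;
        for (const auto& entry : exposed_) {
            const auto it = reflection.find(entry.first);
            if (it == reflection.end()) {
                continue;  // variable not used by this shader, nothing to bind
            }
            assert(it->second >= 0);
            bindings.push_back(Binding{it->second, entry.second});
        }
        return bindings;
    }

private:
    std::map<std::string, std::string> exposed_;
};
```

For the 'foo' example above, an archetype exposing 'myNormalMap' and a second, unused texture would resolve to a single binding at slot 3, and the unused texture would be skipped.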