I have run into differences between GPUs handling fragment shaders a few times. One example was calling pow(x, y) where x is negative: one GPU handled it fine while the other failed.
Another case was where I rewrote if() statements as step() calls and the shader started working; I put that down to some branching limit or similar.
Now I am in a situation where my fragment shader works on some GPUs and not on others. I have tried searching for GPU/shader limits and similar information, but found nothing.
The current test case, which works everywhere I have tried except on my GTX 780, is online (Shadertoy).
I am asking for any pointers, or a link covering shader limitations and the most common compatibility issues.
Shader limitations and behavior are vendor-specific and change along with GPU architecture generations.
Simply put, there is no unified way to "rule them all". Certain GPUs handle branching differently, some better, some worse. Some GPUs tolerate out-of-range inputs to math functions (such as a negative base passed to pow(), which the GLSL spec leaves undefined), some don't. It depends heavily on the architecture in use, the version of the shading language, and the instructions allowed in the compiled shader.
Instead of trying to learn from documentation what works on which card, it's best to test the shader on the specific GPUs you target. That's probably the most reasonable use of the resources you would otherwise spend trying to "fix the issue".
To answer the question directly: there is no (easy-to-find) resource that lists the entire specification of the compiler used on a specific architecture; you simply follow the tips and tricks learned along the way and apply suggestions and observations made by others.
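As a concrete example of the kind of defensive rewrite mentioned in the question (a sketch, not tied to any particular GPU): the GLSL spec leaves pow(x, y) undefined for x < 0, so take abs() of or clamp the base, and an if() can often be replaced with step() or mix(). The shader is embedded as a C++ raw string only to keep the snippet in one language; the uniform names (uThreshold, uIntensity) are made up for illustration.

    // Defensive GLSL patterns, embedded as a C++ raw string literal.
    // Uniform names are hypothetical; nothing here is vendor-specific.
    const char* kDefensiveFrag = R"GLSL(
    #version 130
    uniform float uThreshold;   // hypothetical
    uniform float uIntensity;   // hypothetical
    in  vec2 vUV;
    out vec4 fragColor;

    void main()
    {
        // pow() is undefined for a negative base in the GLSL spec;
        // take abs() (or clamp) so every driver agrees on the result.
        float x = vUV.x * 2.0 - 1.0;                // may go negative
        float safePow = sign(x) * pow(abs(x), 2.2); // defined everywhere

        // Branch-free version of: if (safePow >= uThreshold) c = 1.0; else c = 0.0;
        float c = step(uThreshold, safePow);

        fragColor = vec4(vec3(c * uIntensity), 1.0);
    }
    )GLSL";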
Related
The OpenGL graphics pipeline changes every year, and the programmable stages keep growing. In the end, as OpenGL programmers we write many little programs (vertex, fragment, geometry, tessellation, ...).
Why is there such strong specialization between the stages? Are they all running on different parts of the hardware? Why not just write one block of code describing what should come out at the end, instead of juggling between the stages?
http://www.g-truc.net/doc/OpenGL%204.3%20Pipeline%20Map.pdf
In this Pipeline PDF we see the beast.
In the days of "Quake" (the game), developers had the freedom to do anything with their CPU rendering implementations, they were in control of everything in the "pipeline".
With the introduction of fixed pipeline and GPUs, you get "better" performance, but lose a lot of the freedom. Graphics developers are pushing to get that freedom back. Hence, more customization pipeline everyday. GPUs are even "fully" programmable now using tech such as CUDA/OpenCL, even if it's not strictly about graphics.
On the other hand, GPU vendors cannot replace the whole pipeline with fully programmable one overnight. In my opinion, this boils down to multiple reasons;
GPU capabilities and cost: GPUs evolve with each iteration, so it makes no sense to throw away the entire architecture you have and replace it overnight. Instead, new features and enhancements are added every iteration, especially when developers ask for them (example: the tessellation stage). Think of CPUs: Intel tried to replace the x86 architecture with Itanium, losing backward compatibility; having failed, they eventually adopted what AMD did with the AMD64 architecture.
Legacy support: they also can't fully replace the pipeline because of legacy applications, which are more widely used than one might expect.
Historically, there were actually different processing units for the different programmable parts - there were vertex shader processors and fragment shader processors, for example. Nowadays, GPUs employ a "unified shader architecture" where all types of shaders are executed on the same processing units. That's why non-graphics use of GPUs such as CUDA or OpenCL is possible (or at least much easier).
Notice that the different shaders have different inputs/outputs - a vertex shader is executed for each vertex, a geometry shader for each primitive, a fragment shader for each fragment. I don't think this could be easily captured in one big block of code.
And last but definitely far from least, performance. There are still fixed-function stages between the programmable parts (such as rasterisation). And for some of these, it's simply impossible to make them programmable (or callable outside of their specific time in the pipeline) without reducing performance to a crawl.
Because each stage has a different purpose:
The vertex shader transforms points to where they should be on the screen.
The fragment shader runs for each fragment (read: pixel covered by the triangles) and applies lighting and color.
Geometry and tessellation shaders both do things the classic vertex and fragment shaders cannot (replacing the drawn primitives with other primitives), and both are optional.
If you look carefully at that PDF you'll see different inputs and outputs for each shader.
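To make the "different inputs and outputs" point concrete, here is a minimal GLSL 3.30 vertex/fragment pair (as C++ string literals; the attribute, uniform, and varying names are placeholders): the vertex stage consumes per-vertex attributes plus a matrix, while the fragment stage only sees the interpolated result and writes a color, so neither stage could simply absorb the other's job.

    // Minimal GLSL 3.30 vertex + fragment pair; names are placeholders.
    const char* kVert = R"GLSL(
    #version 330 core
    layout(location = 0) in vec3 aPosition;  // input: one per vertex
    layout(location = 1) in vec3 aColor;
    uniform mat4 uMVP;
    out vec3 vColor;                         // output: handed to the rasterizer
    void main() {
        vColor = aColor;
        gl_Position = uMVP * vec4(aPosition, 1.0);
    }
    )GLSL";

    const char* kFrag = R"GLSL(
    #version 330 core
    in  vec3 vColor;     // input: interpolated per fragment, not per vertex
    out vec4 fragColor;  // output: one color per covered pixel/sample
    void main() {
        fragColor = vec4(vColor, 1.0);
    }
    )GLSL";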
Separating each shader stage also allows you to mix and match shaders beginning with OpenGL 4.1. For example, you can use one vertex shader with multiple different fragment shaders, and swap out the fragment shaders as needed. Doing that when shaders are specified as a single code block would be tricky, if not impossible.
More info on the feature: http://www.opengl.org/wiki/GLSL_Object#Program_separation
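For reference, the mix-and-match workflow with separate shader objects looks roughly like this (a sketch assuming an OpenGL 4.1+ context with function pointers already loaded; vsSrc, fsASrc, and fsBSrc stand in for your shader source strings):

    // One vertex program reused with two interchangeable fragment programs.
    GLuint vs  = glCreateShaderProgramv(GL_VERTEX_SHADER,   1, &vsSrc);
    GLuint fsA = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsASrc);
    GLuint fsB = glCreateShaderProgramv(GL_FRAGMENT_SHADER, 1, &fsBSrc);

    GLuint pipeline;
    glGenProgramPipelines(1, &pipeline);
    glBindProgramPipeline(pipeline);

    // Same vertex stage for everything; only the fragment stage is swapped.
    glUseProgramStages(pipeline, GL_VERTEX_SHADER_BIT,   vs);
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fsA);
    // ... draw ...
    glUseProgramStages(pipeline, GL_FRAGMENT_SHADER_BIT, fsB);
    // ... draw again with the other fragment shader ...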
Mostly because nobody wants to re-invent the wheel if they do not have to.
Many of the specialized things that are still fixed-function would simply make life more difficult for developers if they had to be programmed from scratch to draw a single triangle. Rasterization, for instance, would truly suck if you had to implement primitive coverage yourself or handle attribute interpolation. It might add some novel flexibility, but the vast majority of software does not require that flexibility and developers benefit tremendously from never thinking about this sort of stuff unless they have some specialized application in mind.
Truth be told, you can implement the entire graphics pipeline yourself using compute shaders if you are so inclined. Performance generally will not be competitive with pushing vertices through the traditional render pipeline and the amount of work necessary would be quite daunting, but it is doable on existing hardware. Realistically, this approach does not offer a lot of benefits for rasterized graphics, but implementing a ray-tracing based pipeline using compute shaders could be a worthwhile use of time.
I have used OpenGL for a semester, but in the traditional way, like glBegin ... glEnd.
I heard someone say that GLSL is the future of OpenGL, so I was wondering: do I need to jump into GLSL instead of traditional OpenGL?
Moreover, does GLSL only work well on a good GPU?
Short answer: yes, you do need to update your OpenGL usage. You will generally get lousy performance from glBegin/glEnd, and you limit what you can do by constraining yourself to the old fixed-pipeline behavior.
Long answer:
You're mixing up two different problems. One of them is immediate mode (glBegin, glVertex, ..., glEnd) vs. batched submission (glVertexPointer, VBOs, etc.). To get full performance out of modern GPUs you need to use batches. See this SO discussion: When are VBOs faster than "simple" OpenGL primitives (glBegin())?
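To make the contrast concrete, here is a sketch of the same triangle drawn both ways (assuming a current context with function pointers loaded; the immediate-mode half needs a compatibility profile, and a core profile would additionally require a VAO to be bound):

    // Immediate mode: the driver is called for every single vertex, every frame.
    glBegin(GL_TRIANGLES);
    glVertex3f(-0.5f, -0.5f, 0.0f);
    glVertex3f( 0.5f, -0.5f, 0.0f);
    glVertex3f( 0.0f,  0.5f, 0.0f);
    glEnd();

    // Batched: upload once into a VBO, then one draw call per frame.
    static const GLfloat tri[] = { -0.5f, -0.5f, 0.0f,
                                    0.5f, -0.5f, 0.0f,
                                    0.0f,  0.5f, 0.0f };
    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, sizeof(tri), tri, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);
    glDrawArrays(GL_TRIANGLES, 0, 3);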
The other one is the fixed pipeline vs. programmable shaders (glEnable states, etc. vs. GLSL). This can be a performance issue in many cases, but more importantly it's a flexibility issue. With GLSL you have far more control over how things are rendered, so you can accomplish things that weren't really possible using the fixed pipeline, at least not at a usable frame rate. Programmable shaders are also a better reflection of how modern GPUs really work; in fact, if you use the fixed pipeline it is probably just being emulated with a shader under the hood.
GLSL is not the future of OpenGL; it's the current way of programming it. As Aeluned states, glBegin and glEnd are deprecated (and not even supported in OpenGL ES).
And what do you mean by a good GPU? Even Intel integrated graphics chips support shaders; using GLSL is not slower just for being GLSL. You might get slow performance when doing heavy work, but if you implemented the fixed-pipeline functionality yourself in GLSL, I think you would get the same performance.
I'd say learning GLSL is the way to go.
Why do people tend to mix deprecated fixed-function pipeline features like the matrix stack, gluPerspective(), glMatrixMode() and whatnot into shader-based code, when this is meant to be done manually and passed into GLSL as uniforms?
Are there any benefits to this approach?
There is a legitimate reason to do this, in terms of user sanity. Fixed-function matrices (and other fixed-function state tracked in GLSL) are global state, shared among all programs. If you want to change the projection matrix used by every shader, you can do that by simply changing it in one place.
Doing this in GLSL without fixed function requires the use of uniform buffers. Either that, or you have to build some system that will farm state information to every shader that you want to use. The latter is perfectly doable, but a huge hassle. The former is relatively new, only introduced in 2009, and it requires DX10-class hardware.
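The uniform-buffer route looks roughly like this (a sketch assuming GL 3.1+/DX10-class hardware; the block name Matrices, binding point 0, and the projectionMatrix/newProjectionMatrix pointers are placeholders): every program declares the same block, and updating the single buffer updates them all.

    // GLSL side (1.40+), declared identically in every shader that needs it:
    //   layout(std140) uniform Matrices { mat4 projection; };

    // C++ side: one buffer bound to one binding point, shared by all programs.
    GLuint ubo;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, 16 * sizeof(float), projectionMatrix, GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);          // binding point 0

    // Once per program: tie its "Matrices" block to binding point 0.
    GLuint blockIndex = glGetUniformBlockIndex(program, "Matrices");
    glUniformBlockBinding(program, blockIndex, 0);

    // Later: change the projection for every shader in one place.
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferSubData(GL_UNIFORM_BUFFER, 0, 16 * sizeof(float), newProjectionMatrix);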
It's much simpler to just use fixed-function and GLSL state tracking.
No benefits as far as I'm aware (unless you consider not having to re-implement the functionality a benefit).
Most likely just laziness, or a lack of knowledge of the alternative method.
Essentially because those applications require shaders to run, but the programmers are too lazy or too pressed for time to re-implement features that are already available through the OpenGL compatibility profile.
Notable features that are "difficult" to replace are line widths greater than 1, line stipple, and separate front and back polygon modes.
Most tutorials teach deprecated OpenGL, so maybe people don't know better.
The benefit is that you are using well-known, thoroughly tested, and reliable code. On MS Windows or with Linux proprietary drivers, it was written by the people who built your GPU, who can therefore be assumed to know how to make it really fast.
An additional benefit for group projects is that There Is Only One Way To Do It: no arguments about whether you should write your own C++ matrix class, what it should be called, which operators to overload, or whether the internal implementation should be a 1D or 2D array...
I'm currently programming a scientific imaging application using OpenGL.
I would like to know whether OpenGL rendering (in terms of the pixels retrieved from an FBO) is supposed to be fully deterministic when my code (C++ / OpenGL and simple GLSL) is executed on different hardware (ATI vs. NVidia, various NVidia generations, and various OSes).
More precisely, I would need the exact same pixel buffer every time I run my code on any hardware (that can run basic GLSL and OpenGL 3.0)...
Is that possible? Is there some advice I should consider?
If it's not possible, is there a specific brand of video card (perhaps Quadro?) that could do it while varying the host OS?
From the OpenGL spec (version 2.1 appendix A):
The OpenGL specification is not pixel exact. It therefore does not guarantee an exact match between images produced by different GL implementations. However, the specification does specify exact matches, in some cases, for images produced by the same implementation.
If you disable all anti-aliasing and texturing, you stand a good chance of getting consistent results across platforms. However, if you need antialiasing or texturing or a 100% pixel-perfect guarantee, use software rendering only: http://www.mesa3d.org/
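If you need to check consistency in practice, one approach (a sketch; FramebufferChecksum is a made-up helper, and it assumes the framebuffer you care about is currently bound and its size is known) is to read the pixels back and reduce them to a checksum you can compare across machines and drivers:

    #include <cstdint>
    #include <vector>
    // (GL headers / a loader such as GLEW are assumed to be included already.)

    // Read back the currently bound framebuffer and reduce it to a single
    // FNV-1a checksum that can be compared across machines and drivers.
    std::uint64_t FramebufferChecksum(int width, int height)
    {
        std::vector<unsigned char> pixels(static_cast<size_t>(width) * height * 4);
        glPixelStorei(GL_PACK_ALIGNMENT, 1);
        glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());

        std::uint64_t hash = 0xcbf29ce484222325ull;  // FNV-1a offset basis
        for (unsigned char byte : pixels) {
            hash ^= byte;
            hash *= 0x100000001b3ull;                // FNV-1a prime
        }
        return hash;
    }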
By "Deterministic", I'm going to assume you mean what you said (rather than what the word actually means): that you can get pixel identical results cross-platform.
No. Not a chance.
You can change the pixel results you get from rendering just by playing with settings in your graphics driver's control panel. Driver revisions for the same hardware can also change what you get.
The OpenGL specification has never required pixel-perfect results. Antialiasing and texture filtering especially are nebulous parts.
If you read through the OpenGL specification, there are a number of deterministic conditions that must be met in order for the implementation to comply with the standard, but there are also a significant number of implementation details that are left entirely up to the hardware vendor / driver developer. Unless you render with incredibly basic techniques that fall under the deterministic / invariant categories (which I believe will keep you from using filtered texturing, antialiasing, lighting, shaders, etc), the standard allows for pretty significant differences between different hardware and even different drivers on the same hardware.
I am trying to optimize some OpenGL code and I was wondering if someone knows of a table that would give a rough approximation of the relative costs of various OpenGL functions?
Something like (these numbers are probably completely wrong):
method                        cost
glDrawElements(100 indices)   1
glBindTexture(512x512)        2
glGenBuffers(1 buffer)        1.2
If that doesn't exist, would it be possible to build one, or are the various hardware/OS combinations too different for that to even be meaningful?
There certainly is no such list. One of the problems in creating such a list is answering the question, "what kind of cost?"
All rendering functions have a GPU-time cost. That is, the GPU has to do rendering. How much of a cost depends on the shaders in use, the number of vertices provided, and the textures being used.
Even for CPU time cost, the values are not clear. Take glDrawElements. If you changed the vertex attribute bindings before calling it, then it can take more CPU time than if you didn't. Similarly, if you changed uniform values in a program since you last used it, then rendering with that program may take longer. And so forth.
The main problem with assembling such a list is that it encourages premature optimization. If you have such a list, then users will be encouraged to take steps to avoid using functions that cost more. They may take too many steps along this route. No, it's better to just avoid the issue entirely and encourage users to actually profile their applications before optimizing them.
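When you do profile, the GPU-side cost of a particular stretch of GL work can be measured portably with a timer query rather than looked up in a table (a sketch assuming a GL 3.3+ context and <cstdio> for the printout):

    // Measure the GPU time spent in a block of GL commands with a timer query.
    GLuint query;
    glGenQueries(1, &query);

    glBeginQuery(GL_TIME_ELAPSED, query);
    // ... the draw calls / state changes you want to measure ...
    glEndQuery(GL_TIME_ELAPSED);

    // Reading the result immediately forces a CPU/GPU sync; real code would
    // poll GL_QUERY_RESULT_AVAILABLE or read the value a frame or two later.
    GLuint64 elapsedNs = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
    printf("GPU time: %.3f ms\n", elapsedNs / 1.0e6);

    glDeleteQueries(1, &query);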
The relative costs of different OpenGL functions will depend heavily on the arguments to the function, the active OpenGL environment when they are called, and the GPU, drivers, and OS you're running on. There's really no good way to do a comparison like what you're describing -- your best bet is simply to test out the different possibilities and see what performs best for you.