I'm currently programming a scientific imaging application using OpenGL.
I would like to know whether OpenGL rendering (in terms of the pixels read back from an FBO) is supposed to be fully deterministic when my code (C++ / OpenGL and simple GLSL) is executed on different hardware (ATI vs NVidia, various NVidia generations and various OSes)?
More precisely, I need the exact same pixel buffer every time I run my code on any hardware (that can run basic GLSL and OpenGL 3.0)...
Is that possible? Is there some advice I should consider?
If it's not possible, is there a specific brand of video card (perhaps Quadro?) that could do it while varying the host OS?
From the OpenGL spec (version 2.1 appendix A):
The OpenGL specification is not pixel exact. It therefore does not guarantee an exact match between images produced by different GL implementations. However, the specification does specify exact matches, in some cases, for images produced by the same implementation.
If you disable all anti-aliasing and texturing, you stand a good chance of getting consistent results across platforms. However, if you need antialiasing or texturing or a 100% pixel-perfect guarantee, use software rendering only: http://www.mesa3d.org/
By "Deterministic", I'm going to assume you mean what you said (rather than what the word actually means): that you can get pixel identical results cross-platform.
No. Not a chance.
You can change the pixel results you get from rendering just by playing with settings in your graphics driver's application. Different driver revisions on the same hardware can also change what you get.
The OpenGL specification has never required pixel-perfect results. Antialiasing and texture filtering especially are nebulous parts.
If you read through the OpenGL specification, there are a number of deterministic conditions that must be met in order for the implementation to comply with the standard, but there are also a significant number of implementation details that are left entirely up to the hardware vendor / driver developer. Unless you render with incredibly basic techniques that fall under the deterministic / invariant categories (which I believe will keep you from using filtered texturing, antialiasing, lighting, shaders, etc), the standard allows for pretty significant differences between different hardware and even different drivers on the same hardware.
I have an application that uses OpenGL to draw output images. For testing purposes I'm trying to create reference images and then use a precision hash to compare them to the program output. While this works flawlessly within the context of a single computer, I've encountered strange problems when using the same approach with computers running different GPUs. While the images generated on different GPUs appear absolutely identical to the human eye, they cannot pass the precision hash test when compared to one another, and a per-pixel comparison reveals that there are several pixels that are "off". I've been trying to find problems in my code for several days to no avail, and this behaviour manifests itself with all the shaders that I use. Could this possibly be due to differences in OpenGL implementation from different hardware manufacturers? Is it a valid approach to compare images generated on different GPUs with phash for testing purposes?
Could this possibly be due to differences in OpenGL implementation
from different hardware manufacturers? Is it a valid approach to
compare images generated on different GPUs with phash for testing
purposes?
No, it is not. Quoting the OpenGL 4.6 core profile specification, Appendix A "Invariance" (emphasis mine):
The OpenGL specification is not pixel exact. It therefore does not
guarantee an exact match between images produced by different GL
implementations. However, the specification does specify exact
matches, in some cases, for images produced by the same
implementation. The purpose of this appendix is to identify and
provide justification for those cases that require exact matches.
The guarantees for exact matches are made only within the same implementation, under very strict limits. They are useful, for example, for multi-pass approaches where you need to get exactly the same fragments in different passes.
Note that other 3D rendering APIs are not pixel-exact either. The actual hardware implementations vary between individual GPUs, and the specifications typically only lay down broader rules that every implementation must fulfill and that you can rely on.
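Since exact matches are not guaranteed across implementations, one option for cross-GPU tests is to compare against the reference image with a per-channel tolerance instead of an exact hash. The following is only a minimal sketch, assuming both images are tightly packed 8-bit channel buffers of equal size (for example, as read back with glReadPixels). The function names `countMismatches` and `imagesMatch` and both thresholds are hypothetical and would need tuning for a real test suite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Counts the channel values whose absolute difference exceeds `tolerance`.
// Assumption: both buffers hold tightly packed 8-bit channels and have equal size.
std::size_t countMismatches(const std::vector<std::uint8_t>& reference,
                            const std::vector<std::uint8_t>& actual,
                            int tolerance)
{
    assert(reference.size() == actual.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const int diff = std::abs(static_cast<int>(reference[i]) -
                                  static_cast<int>(actual[i]));
        if (diff > tolerance) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Hypothetical pass criterion: at most `maxMismatches` channel values
// may differ from the reference by more than `tolerance`.
bool imagesMatch(const std::vector<std::uint8_t>& reference,
                 const std::vector<std::uint8_t>& actual,
                 int tolerance,
                 std::size_t maxMismatches)
{
    if (reference.size() != actual.size()) {
        return false;
    }
    return countMismatches(reference, actual, tolerance) <= maxMismatches;
}
```

A test would then call something like `imagesMatch(reference, rendered, 2, 16)` rather than comparing hashes. Which tolerance is acceptable depends on the shaders involved and on how much deviation the application can live with.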
I know OpenCL gives control of the GPU's memory architecture and thus allows better optimization, but, leaving this aside, can we use Compute Shaders for vector operations (addition, multiplication, inversion, etc.)?
In contrast to the other OpenGL shader types, compute shaders are not directly related to computer graphics and provide a much more direct abstraction of the underlying hardware, similar to CUDA and OpenCL. They provide customizable work group sizes, shared memory, intra-group synchronization and all those things known and loved from CUDA and OpenCL.
The main differences are basically:
It uses GLSL instead of OpenCL C. While there isn't such a huge difference between those programming languages, you can use all the graphics-related GLSL functions not available to OpenCL, like advanced texture types (e.g. cube map arrays), advanced filtering (e.g. mipmapping, though you will probably need to compute the mip level yourself), and little convenience things like 4x4 matrices or geometric functions.
It is an OpenGL shader program like any other GLSL shader. This means accessing OpenGL data (like buffers, textures, images) is just trivial, while interfacing between OpenGL and OpenCL/CUDA can get tedious, with possible manual synchronization effort from your side. In the same way integrating it into an existing OpenGL workflow is also trivial, while setting up OpenCL is a book on its own, not to speak of its integration into an existing graphics pipeline.
So what this comes down to is, that compute shaders are really intended for use within existing OpenGL applications, though exhibiting the usual (OpenCL/CUDA-like) compute-approach to GPU programming, in contrast to the graphics-approach of the other shader stages, which didn't have the compute-flexibility of OpenCL/CUDA (while offering other advantages, of course). So doing compute tasks is more flexible, direct and easy than either squeezing them into other shader stages not intended for general computing or introducing an additional computing framework you have to synchronize with.
Compute shaders should be able to do nearly anything achievable with OpenCL, with the same flexibility and control over hardware resources and with the same programming approach. So if you have a good GPU-suitable algorithm (one that would work well with CUDA or OpenCL) for the task you want to do, then yes, you can also do it with compute shaders. But it wouldn't make much sense to use OpenGL (which still is, and will probably always be, a framework for real-time computer graphics in the first place) only because of compute shaders. For that you can just use OpenCL or CUDA. The real strength of compute shaders comes into play when mixing graphics and compute capabilities.
Look here for another perspective.
Summarizing:
Yes, OpenCL already existed, but it targets heavyweight applications (think CFD, FEM, etc), and it is much more universal than OpenGL (think beyond GPUs... Intel's Xeon Phi architecture supports >50 x86 cores).
Also, sharing buffers between OpenGL/CUDA and OpenCL is not fun.
I recently read this list and I noticed that almost everything I studied from the OpenGL Red Book is considered deprecated.
I'm talking about pixel transfer operations, pixel drawings, accumulation buffer, Begin/End functions (!?), automatic mipmap generation and current raster position.
Why did they flag these features as deprecated? Will it be okay to still use them? What are the workarounds?
In my opinion it's for the better. This so-called immediate mode is indeed deprecated in OpenGL 3.0, mainly because its performance is not optimal.
In immediate mode you use calls like glBegin and glEnd. The rendering of primitives therefore depends on the program's commands: OpenGL can't advance until it gets the appropriate command from the CPU. Instead, you can use buffer objects to store all your vertices and data, and then tell OpenGL to render its primitives from this buffer with commands like glDrawArrays or glDrawElements, or even more specialized commands like glDrawElementsInstanced. While the GPU is busy executing those commands and drawing the buffer to the target framebuffer (basically a render target), the program can go off and issue some other commands. This way both the CPU and the GPU are busy at the same time, and no time is wasted.
Not the best explanation ever, but my advice: try to learn this new rendering pipeline instead. It's superior to immediate mode by far. I recommend tutorials like:
http://www.arcsynthesis.org/gltut/index.html
http://www.opengl-tutorial.org/
http://ogldev.atspace.co.uk/
Literally try to forget what you know so far, immediate mode is long deprecated and shouldn't be used anymore, instead, focus on the new technology ;)
Edit: Excuse me if I used 'intermediate' instead of 'immediate'. I think it's actually called 'immediate'; I tend to mix them up.
Why did they flag these features as deprecated?
First, some terminology: they aren't just deprecated. In OpenGL 3.0, they are deprecated (meaning "may be removed in later versions"); in 3.1 and above, most of them are removed. The compatibility profile brings the removed features back. And while it is widely implemented on Windows and Linux, Apple's 3.2 implementation only implements the core profile.
As to the reasoning behind the removal, it depends on which feature you're talking about. We can really only speculate as to why the ARB removed any specific feature:
pixel transfer operations
Pixel transfer operations have not been removed. If you're talking about glDrawPixels, that is a pixel transfer operation, but it is only one of them, not all of them.
Speaking of which:
pixel drawings
Because it was a horrible idea to begin with. glDrawPixels is a performance trap; it sounds nice and neat, but it performs terribly and because it's simple, people will try to use it.
Having something that is easy to do but terrible in performance encourages people to write terrible OpenGL applications.
accumulation buffer
Shaders can do this just fine. Better in fact; they have a lot more options than accumulation buffers cover.
Begin/End functions (!?),
It's another performance trap. Immediate mode rendering is terribly slow.
automatic mipmap generation
Because it was a terrible idea to begin with. Having OpenGL decide when to do a heavyweight operation like generate mipmaps of a texture is not a good idea. The much better idea the ARB had was to just let you say, "OK, OpenGL, generate some mipmaps for this texture right now."
current raster position.
Another performance trap/bad idea.
Will it be okay to still use them?
That's up to you. NVIDIA has effectively pledged to support the compatibility profile in perpetuity. Which means that AMD and Intel probably will have to as well. So that covers Windows and Linux.
On MacOSX, Apple controls the GL implementations more rigidly, and they seem committed to not supporting the compatibility profile. However, they seem to have little interest in advancing OpenGL, since they stopped with 3.2. Even Mountain Lion didn't update the OpenGL version.
What are the workarounds?
Stop using performance traps. Use buffer objects for your vertex data like everyone else. Use shaders. Use glGenerateMipmap.
For an application I'm developing, I need to be able to:
draw lines of different widths and colours
draw solid color filled triangles
draw textured (no alpha) quads
Very easy...but...
All coordinates are integers in pixel space and, very importantly, reading back all the pixels from the framebuffer (glReadPixels) on two different machines, with two different graphics cards, running two different OSes (Linux and FreeBSD), must result in exactly the same sequence of bits (given an appropriate constant format conversion).
I think this is impossible to achieve safely using OpenGL and hardware acceleration, since I bet different graphics cards (from different vendors) may implement different algorithms for rasterization. (The OpenGL specs are clear about this: they propose an algorithm, but they also state that implementations may differ under certain circumstances.)
Also, I don't really need hardware acceleration, since I will be rendering very simple graphics at very low speed.
Do you think I can achieve this by just disabling hardware acceleration? What happens in that case under Linux: will I fall back to the Mesa software rasterizer? And in that case, can I be sure it will always work, or am I missing something?
That you're reading back rendered pixels and strongly depend on their mathematical exactness/reproducibility sounds like a design flaw. What's the purpose of this action? If you, for example, need to extract some information from the image, why don't you try to extract this information from the abstract, vectorized information prior to rendering?
Anyhow, if you depend on external rendering code and there's no way to make your reading code more robust to small errors, you're signing up for lots of pain and maintenance work. Other people could break your code with every tiny patch, because that kind of pixel exactness to the bit-level is usually a non-issue when they're doing their unit tests etc. Let alone the infinite permutations of hard- and software layers that are possible, and all might have influence on the exact pixel bits.
If you only need those two operations, lines (with different widths and colors) and quads (with/without texture), I recommend writing your own rendering/rasterizer code which operates on an 8-bit uint array representing the image pixels (R8G8B8). The operations you're proposing aren't too nasty, so if performance is unimportant, this might actually be the better way to go in the long run.
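To give an idea of how small such a rasterizer can start out, here is a minimal sketch of the suggestion above in C++, assuming an R8G8B8 buffer stored row by row. All names (`Color`, `Image`, `drawLine`, `fillRect`) are hypothetical. It only covers one-pixel-wide lines (integer Bresenham) and axis-aligned filled rectangles; wide lines, triangles and textured quads would have to be added along the same lines. Because it uses integer arithmetic only, the output does not depend on the GPU or driver.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Color {
    std::uint8_t r, g, b;
};

// R8G8B8 image stored row by row, three bytes per pixel, initially black.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> data;

    Image(int w, int h)
        : width(w), height(h),
          data(static_cast<std::size_t>(w) * static_cast<std::size_t>(h) * 3, 0) {}

    // Writes one pixel; coordinates outside the image are silently ignored.
    void set(int x, int y, Color c) {
        if (x < 0 || y < 0 || x >= width || y >= height) return;
        const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 3;
        data[i] = c.r;
        data[i + 1] = c.g;
        data[i + 2] = c.b;
    }

    // Reads one pixel; the caller must pass coordinates inside the image.
    Color get(int x, int y) const {
        const std::size_t i = (static_cast<std::size_t>(y) * width + x) * 3;
        return Color{data[i], data[i + 1], data[i + 2]};
    }
};

// One-pixel-wide line with integer-only Bresenham; both endpoints are drawn.
void drawLine(Image& img, int x0, int y0, int x1, int y1, Color c) {
    const int dx = std::abs(x1 - x0);
    const int dy = -std::abs(y1 - y0);
    const int sx = x0 < x1 ? 1 : -1;
    const int sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    while (true) {
        img.set(x0, y0, c);
        if (x0 == x1 && y0 == y1) break;
        const int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }
        if (e2 <= dx) { err += dx; y0 += sy; }
    }
}

// Axis-aligned filled rectangle covering x in [x0, x1) and y in [y0, y1).
void fillRect(Image& img, int x0, int y0, int x1, int y1, Color c) {
    for (int y = y0; y < y1; ++y) {
        for (int x = x0; x < x1; ++x) {
            img.set(x, y, c);
        }
    }
}
```

The resulting `data` vector can be hashed or compared byte for byte across machines, which is exactly the property the question asks for.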
I'm just learning about them, and find it discouraging that they have been deprecated. Should I keep investing into learning them? Would I learn something useful for the current model?
I think, though I may be wrong, that since most high-performance graphics apps (mostly games) pretty much only used vertex buffers and the like (in order to squeeze every drop of performance out of the card), that there was pressure to stop worrying about "frivolous" items such as display lists (and even good-old glVertex calls). IMHO, this provides a huge barrier to people learning to write OpenGL code, and (for my own purposes) is a big impediment to whipping up some quick, legible, and reasonably well performing code.
Note that these features were deprecated in 3.0 and actually removed in 3.1 (though still available for compatibility via an ARB extension). In OpenGL 3.2, they were moved into a 'compatibility' profile that is optional for driver writers to implement.
So what does this mean? NVidia, at least, has vowed to continue support for the old-school compatibility mode for the foreseeable future, since there is a large wealth of legacy code out there that they need to support. You can find the discussion of their support in a slideshow at:
http://www.slideshare.net/Mark_Kilgard/opengl-32-and-more
starting at about slide #32. I don't know ATI/AMD's stance on this, but I would assume that it would be similar.
So, while display lists are technically removed from the required portion of the OpenGL 3.2 standard, I think that you are safe using them for quite a while. Eventually, you may wish to learn the buffer/shader-centric interface to OpenGL, especially if your end-goal is envelope-pushing game writing, but it really is a lot less intuitive (no glRotate, even!), so I would recommend starting with good old OpenGL 2.x.
-matt
Display lists were removed because with OpenGL 3+ all vertex, texture and lighting data are stored on the graphics card, in what is called retained mode rendering (the data is retained, allowing you to send a single command to the card to draw a mesh, rather than sending vertex data to the card every frame). A major bottleneck in computer graphics is data bandwidth between RAM and GPU RAM. By generating meshes once and retaining that data, we can transform it using homogeneous transform matrices and draw it easily. This effectively reduces the bottleneck, with the drawback of longer loading times.
Immediate mode, however (pre 3.0) uses massive amounts of graphics bandwidth to send vertex data every frame, pre-transformed, with recalculated normals etc.
The problems with this approach are twofold:
1) excessive bandwidth use, and too much gpu idle time.
2) Excessive use of cpu time for calculations that could be done in parallel on 100+ cores, on the gpu
The simple solution to this, is retained mode.
With retained mode, display lists are no longer necessary. Hence their removal from the core profile.
Immediate mode is still very good for learning the theory of computer graphics. (and it's loads of fun, to boot) It just suffers in terms of maximum possible performance.
VBOs & VAOs may be, at first, less intuitive, but in terms of speed they are far superior.
There are several easy-to-understand OpenGL 3.0 tutorials on the internet. Once you have OpenGL 2.0 down, you should consider moving on to 3.0+, as it allows you to build very fast 3D graphics applications.
While Matthew Hall has a good answer and covers most things, there are a few things I'll add.
If you look at what's been deprecated, you'll see it's a lot of client side and fixed functionality. So it's obvious that they're trying to move people away from client side centered code and have people do everything possible server side on the GPU instead.
When it comes to which context to use, well, that's up to you. Though if performance is a major concern, then 3.x is probably the way to go. I personally definitely want to learn OpenGL 3.x, but I doubt I'll be giving up 1.x/2.x. It's just so much easier to put together a quick app with what's available in a 1.x or 2.x context.
If you want a list of what's been deprecated, download the 3.0 specification and look under "The Deprecation Model"
A note from the future: the latest DirectX, Metal, and Vulkan APIs have command buffers and command queues, which allow you to record commands on the CPU and then send them to the GPU for execution. So perhaps display lists were not such an old-fashioned idea. In fact, compiling a display list is orthogonal to the use of shaders and VBOs, and display lists can improve performance further... I wonder if a Vulkan or Metal to OpenGL translator could use display lists for command buffers...
Because VBOs (vertex buffer objects) are much more efficient and can do everything display lists can do. They're not really any more complex, either, just a little different. Unless you're already more familiar with the old style glBegin/glEnd stuff, you're probably best off learning about buffers from the get go.