I have read that instancing in OpenGL makes drawing thousands of objects faster. But, if you use instancing and only draw one object, is it much slower? If so, what order of magnitude of objects do you need for instancing to be an improvement? Just a few? Tens? Hundreds?
Some context (in case I have an X-Y problem); if I have to write code for instancing anyway, it would be easier to just leave it on all the time.
Answers to these types of questions tend to be somewhat repetitive: Try different options, and benchmark them on the platform(s) you care about. There's really no way to give a definitive answer that will necessarily apply to every possible platform.
That being said, I would not expect instanced rendering to add significant overhead on hardware that fully supports it. Instanced rendering is not a very recent feature. Based on the history I could find, it was part of DX10 (released in 2006) and OpenGL 3.1 (released in 2009). So it seems very likely that any moderately recent hardware (DX10 level and later) can support it efficiently.
On recent hardware, non-instanced rendering could be just a special case of instanced rendering where only a single instance is drawn. There might be a little more state setup, but overall it could be basically no additional overhead.
In general, it's not uncommon that features are supported on hardware that does not really have full support for the feature. In those cases, the driver will sometimes have to jump through hoops to provide the feature, often with lower efficiency and additional CPU overhead. It's not impossible that this could be the case for instanced rendering on some platform, which brings us back to the start: Benchmark!
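As a starting point for that kind of benchmark, here is a minimal CPU-side timing sketch. It is not tied to any particular engine: `medianMicroseconds` and the `draw` callable are hypothetical names, and the callable is assumed to issue the draw calls you want to compare (for example one instanced draw versus one plain draw) and then call glFinish so that the GPU work is included in the measurement.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `draw` `runs` times and returns the median wall-clock time in
// microseconds. `draw` is a hypothetical callable supplied by the caller.
double medianMicroseconds(const std::function<void()>& draw, int runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(static_cast<std::size_t>(runs));
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        draw();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    // The median is less sensitive to one-off stalls than the mean.
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}
```

You would call it once per variant with the same scene and compare the two medians on each platform you care about.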
Now I'm facing a problem plotting some curves in a Qt and Qwt application for embedded Linux (see more details about the problem in this link).
One of the proposed solutions was to use OpenGL together with QwtPlot, but my boss fears that OpenGL would achieve its graphical optimization at a higher processing cost, essentially improving one area only to cause a problem in another. I must say that this reasoning seems convincing.
I haven't checked exactly how large the improvement would be, nor do I know how much extra processing OpenGL would take, but this led me to a more general question (whose answer may actually refute my boss's thesis): what are the disadvantages of using OpenGL, particularly in an embedded Linux situation? I tried to find something on the web, but Google wouldn't help me with disadvantages apart from the issues related to the fight between OpenGL and DirectX.
but my boss fears that OpenGL would achieve its graphical optimization at a higher processing cost,
Your boss is speculating without having actual knowledge on the subject. This is akin to premature optimization.
OpenGL is not a library; it's an API for accessing graphics systems, and it has been deliberately designed to have very little overhead and to provide nothing beyond what GPUs can actually do. There are no higher-level kinds of "objects" in OpenGL. All OpenGL does is make the GPU draw points, lines or triangles in exactly the order and way you tell it to.
what are the disadvantages of using OpenGL, particularly in an embedded Linux situation?
If your target embedded device has an OpenGL-capable GPU: zero. In fact, using OpenGL will then greatly improve performance and reduce load on the CPU. More likely on an embedded system you'll have to deal with OpenGL ES, though. In your other post you mention you're using a TI OMAP. Which one exactly? Because some of them come with PowerVR GPUs.
I recently read this list and I noticed that almost everything I studied from the OpenGL Red Book is considered deprecated.
I'm talking about pixel transfer operations, pixel drawings, accumulation buffer, Begin/End functions (!?), automatic mipmap generation and current raster position.
Why did they flag these features as deprecated? Will it be okay to still use them? What are the workarounds?
In my opinion it's for the better. This so-called immediate mode is indeed deprecated in OpenGL 3.0, mainly because its performance is not optimal.
In immediate mode you use calls like glBegin and glEnd, so the rendering of primitives depends on the program's commands: OpenGL can't advance until it gets the appropriate command from the CPU. Instead, you can use buffer objects to store all your vertices and data, and then tell OpenGL to render its primitives from those buffers with commands like glDrawArrays or glDrawElements, or even more specialized commands like glDrawElementsInstanced. While the GPU is busy executing those commands and drawing into the target framebuffer (basically a render target), the program can go off and issue other commands. This way both the CPU and the GPU are busy at the same time, and no time is wasted.
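To make the buffer-object approach a bit more concrete, here is a small sketch of the CPU-side half: building the vertex data once, in the interleaved layout a buffer object would hold. The `Vertex` struct and the helper names are assumptions for illustration. The GL calls appear only in comments because they need a live context and a bound buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One interleaved vertex: position followed by color (hypothetical layout).
struct Vertex {
    float position[3];
    float color[3];
};

// Builds the vertex data for a single triangle once, up front.
std::vector<Vertex> makeTriangle() {
    return {
        {{-0.5f, -0.5f, 0.0f}, {1.0f, 0.0f, 0.0f}},
        {{ 0.5f, -0.5f, 0.0f}, {0.0f, 1.0f, 0.0f}},
        {{ 0.0f,  0.5f, 0.0f}, {0.0f, 0.0f, 1.0f}},
    };
}

// Size in bytes that would be handed to glBufferData.
std::size_t byteSize(const std::vector<Vertex>& vertices) {
    return vertices.size() * sizeof(Vertex);
}

// Number of triangles a GL_TRIANGLES draw over this data would produce.
std::size_t triangleCount(const std::vector<Vertex>& vertices) {
    assert(vertices.size() % 3 == 0);
    return vertices.size() / 3;
}

// With a live context, upload and draw would look roughly like this:
//   glBufferData(GL_ARRAY_BUFFER, byteSize(v), v.data(), GL_STATIC_DRAW);
//   glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
//                         (void*)offsetof(Vertex, position));
//   glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
//                         (void*)offsetof(Vertex, color));
//   glDrawArrays(GL_TRIANGLES, 0, 3);
```

The upload happens once; after that, each frame only needs the draw call, which is what lets the CPU move on while the GPU works.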
Not the best explanation ever, but my advice: try to learn this new rendering pipeline instead. It's superior to immediate mode by far. I recommend tutorials like:
http://www.arcsynthesis.org/gltut/index.html
http://www.opengl-tutorial.org/
http://ogldev.atspace.co.uk/
Literally try to forget what you know so far. Immediate mode is long deprecated and shouldn't be used anymore; instead, focus on the new technology ;)
Edit: Excuse me if I used 'intermediate' instead of 'immediate'. I think it's actually called 'immediate'; I tend to mix them up.
Why did they flag these features as deprecated?
First, some terminology: they aren't just deprecated. In OpenGL 3.0, they are deprecated (meaning "may be removed in later versions"); in 3.1 and above, most of them are removed. The compatibility profile brings the removed features back. And while it is widely implemented on Windows and Linux, Apple's 3.2 implementation only implements the core profile.
As to the reasoning behind the removal, it depends on which feature you're talking about. We can really only speculate as to why the ARB removed any specific feature:
pixel transfer operations
Pixel transfer operations have not been removed. If you're talking about glDrawPixels, that is a pixel transfer operation, but it is only one of them, not all of them.
Speaking of which:
pixel drawings
Because it was a horrible idea to begin with. glDrawPixels is a performance trap; it sounds nice and neat, but it performs terribly and because it's simple, people will try to use it.
Having something that is easy to do but terrible in performance encourages people to write terrible OpenGL applications.
accumulation buffer
Shaders can do this just fine. Better in fact; they have a lot more options than accumulation buffers cover.
Begin/End functions (!?),
It's another performance trap. Immediate mode rendering is terribly slow.
automatic mipmap generation
Because it was a terrible idea to begin with. Having OpenGL decide when to do a heavyweight operation like generate mipmaps of a texture is not a good idea. The much better idea the ARB had was to just let you say, "OK, OpenGL, generate some mipmaps for this texture right now."
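To give a sense of why mipmap generation counts as heavyweight, this sketch counts how many levels a full mipmap chain has for a given texture size. The function name `mipLevelCount` is made up for illustration. The explicit request itself would be a call such as glGenerateMipmap(GL_TEXTURE_2D) on the bound texture, which is not shown here because it needs a live context.

```cpp
#include <algorithm>
#include <cassert>

// Number of levels in a full mipmap chain for a width x height texture:
// the larger dimension is halved (rounding down) until it reaches 1.
int mipLevelCount(int width, int height) {
    assert(width > 0 && height > 0);
    int largest = std::max(width, height);
    int levels = 1;  // level 0 is the base image
    while (largest > 1) {
        largest /= 2;
        ++levels;
    }
    return levels;
}
```

A 512x512 texture, for example, has 10 levels, and every one of them has to be recomputed each time the base image changes.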
current raster position.
Another performance trap/bad idea.
Will it be okay to still use them?
That's up to you. NVIDIA has effectively pledged to support the compatibility profile in perpetuity. Which means that AMD and Intel probably will have to as well. So that covers Windows and Linux.
On MacOSX, Apple controls the GL implementations more rigidly, and they seem committed to not supporting the compatibility profile. However, they seem to have little interest in advancing OpenGL, since they stopped with 3.2. Even Mountain Lion didn't update the OpenGL version.
What are the workarounds?
Stop using performance traps. Use buffer objects for your vertex data like everyone else. Use shaders. Use glGenerateMipmap.
Why do people tend to mix deprecated fixed-function pipeline features like the matrix stack, gluPerspective(), glMatrixMode() and whatnot into their code, when this is meant to be done manually and shoved into GLSL as a uniform?
Are there any benefits to this approach?
There is a legitimate reason to do this, in terms of user sanity. Fixed-function matrices (and other fixed-function state tracked in GLSL) are global state, shared among all shaders. If you want to change the projection matrix in every shader, you can do that by simply changing it in one place.
Doing this in GLSL without fixed function requires the use of uniform buffers. Either that, or you have to build some system that will farm state information to every shader that you want to use. The latter is perfectly doable, but a huge hassle. The former is relatively new, only introduced in 2009, and it requires DX10-class hardware.
It's much simpler to just use fixed-function and GLSL state tracking.
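For comparison, here is roughly what the manual route involves for the projection matrix alone. This is a sketch, not a drop-in replacement: the `Mat4` alias and the `perspective` function are names chosen here, and the matrix follows the formula documented for gluPerspective, stored column-major.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;  // column-major, as OpenGL expects

// Builds the same projection matrix that gluPerspective documents.
Mat4 perspective(float fovyDegrees, float aspect, float zNear, float zFar) {
    assert(aspect > 0.0f && zNear > 0.0f && zFar > zNear);
    const float pi = 3.14159265358979f;
    // f is the cotangent of half the vertical field of view.
    const float f = 1.0f / std::tan(fovyDegrees * pi / 360.0f);
    Mat4 m{};  // all elements start at zero
    m[0] = f / aspect;
    m[5] = f;
    m[10] = (zFar + zNear) / (zNear - zFar);
    m[11] = -1.0f;
    m[14] = 2.0f * zFar * zNear / (zNear - zFar);
    return m;
}

// With a live context and a linked program, the upload would look like:
//   glUniformMatrix4fv(location, 1, GL_FALSE, m.data());
// where `location` is a hypothetical uniform location, and the call has
// to be repeated for every program that uses the matrix.
```

The math is short; the hassle described above is in distributing the result to every shader that needs it.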
No benefits as far as I'm aware of (unless you consider not having to recode the functionality a benefit).
Most likely just laziness, or a lack of knowledge of the alternative method.
Essentially because those applications require shaders to run, but programmers are too lazy/stressed to re-implement features that are already available through the OpenGL compatibility profile.
Notable features that are "difficult" to replace are the line width (greater than 1), the line stipple and separate front and back polygon mode.
Most tutorials teach deprecated OpenGL, so maybe people don't know better.
The benefit is that you are using well-known, thoroughly tested and reliable code. If it's for MS Windows or Linux proprietary drivers, it was written by the people who built your GPU, who can therefore be assumed to know how to make it really fast.
An additional benefit for group projects is that There Is Only One Way To Do It. No arguments about whether you should be writing your own C++ matrix class and what it should be called and which operators to overload and whether the internal implementation should be a 1D or 2D array...
I am trying to optimize some OpenGL code and I was wondering if someone knows of a table that would give a rough approximation of the relative costs of various OpenGL functions?
Something like (these numbers are probably completely wrong):
method                        cost
glDrawElements(100 indices)   1
glBindTexture(512x512)        2
glGenBuffers(1 buffer)        1.2
If that doesn't exist, would it be possible to build one, or are the various hardware/OS combinations too different for that to even be meaningful?
There certainly is no such list. One of the problems in creating such a list is answering the question, "what kind of cost?"
All rendering functions have a GPU-time cost. That is, the GPU has to do rendering. How much of a cost depends on the shaders in use, the number of vertices provided, and the textures being used.
Even for CPU time cost, the values are not clear. Take glDrawElements. If you changed the vertex attribute bindings before calling it, then it can take more CPU time than if you didn't. Similarly, if you changed uniform values in a program since you last used it, then rendering with that program may take longer. And so forth.
The main problem with assembling such a list is that it encourages premature optimization. If you have such a list, then users will be encouraged to take steps to avoid using functions that cost more. They may take too many steps along this route. No, it's better to just avoid the issue entirely and encourage users to actually profile their applications before optimizing them.
The relative costs of different OpenGL functions will depend heavily on the arguments to the function, the active OpenGL environment when they are called, and the GPU, drivers, and OS you're running on. There's really no good way to do a comparison like what you're describing -- your best bet is simply to test out the different possibilities and see what performs best for you.
I'm just learning about them, and find it discouraging that they have been deprecated. Should I keep investing in learning them? Would I learn something useful for the current model?
I think, though I may be wrong, that since most high-performance graphics apps (mostly games) pretty much only used vertex buffers and the like (in order to squeeze every drop of performance out of the card), that there was pressure to stop worrying about "frivolous" items such as display lists (and even good-old glVertex calls). IMHO, this provides a huge barrier to people learning to write OpenGL code, and (for my own purposes) is a big impediment to whipping up some quick, legible, and reasonably well performing code.
Note that these features were deprecated in 3.0, and actually removed in 3.1 (though still available through the ARB_compatibility extension). In OpenGL 3.2, they moved these features into a 'compatibility' profile that is optional for driver writers to implement.
So what does this mean? NVidia, at least, has vowed to continue support for the old-school compatibility mode for the foreseeable future - there is a large wealth of legacy code out there that they need to support. You can find the discussion of their support in a slideshow at:
http://www.slideshare.net/Mark_Kilgard/opengl-32-and-more
starting at about slide #32. I don't know ATI/AMD's stance on this, but I would assume that it would be similar.
So, while display lists are technically removed from the required portion of the OpenGL 3.2 standard, I think that you are safe using them for quite a while. Eventually, you may wish to learn the buffer/shader-centric interface to OpenGL, especially if your end-goal is envelope-pushing game writing, but it really is a lot less intuitive (no glRotate, even!), so I would recommend starting with good old OpenGL 2.x.
-matt
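For readers wondering what "no glRotate" means in practice, here is a small sketch of the matrix you would build yourself for a rotation about the z axis, the equivalent of glRotatef(angle, 0, 0, 1). The names `Mat4`, `rotateZ` and `transformXY` are made up for this example, and the matrix is stored column-major.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<float, 16>;  // column-major

// Rotation about the z axis by angleDegrees, counter-clockwise.
Mat4 rotateZ(float angleDegrees) {
    assert(std::isfinite(angleDegrees));
    const float radians = angleDegrees * 3.14159265358979f / 180.0f;
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    Mat4 m{};  // all elements start at zero
    m[0] = c;
    m[1] = s;
    m[4] = -s;
    m[5] = c;
    m[10] = 1.0f;
    m[15] = 1.0f;
    return m;
}

// Applies m to the point (x, y, 0, 1) and returns the transformed (x, y).
std::array<float, 2> transformXY(const Mat4& m, float x, float y) {
    return {m[0] * x + m[4] * y + m[12],
            m[1] * x + m[5] * y + m[13]};
}
```

In the shader-centric interface this matrix would be passed to the vertex shader as a uniform instead of being multiplied onto a built-in matrix stack.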
Display lists were removed because with OpenGL 3+ all vertex, texture and lighting data are stored on the graphics card, in what is called retained mode rendering (the data is retained, allowing you to send a single command to the card to draw a mesh, rather than sending vertex data to the card every frame). A major bottleneck in computer graphics is data bandwidth between system RAM and GPU RAM. By generating meshes once and retaining that data, we can transform it using homogeneous transform matrices and draw it easily. This effectively reduces the bottleneck, with the drawback of longer loading times.
Immediate mode (pre-3.0), however, uses massive amounts of graphics bandwidth to send vertex data every frame, pre-transformed, with recalculated normals etc.
The problems with this approach are twofold:
1) Excessive bandwidth use, and too much GPU idle time.
2) Excessive use of CPU time for calculations that could be done in parallel on 100+ cores on the GPU.
The simple solution to this is retained mode.
With retained mode, display lists are no longer necessary. Hence their removal from the core profile.
Immediate mode is still very good for learning the theory of computer graphics. (and it's loads of fun, to boot) It just suffers in terms of maximum possible performance.
VBOs & VAOs may be less intuitive at first, but in terms of speed they are far superior.
There are several easy-to-understand OpenGL 3.0 tutorials on the internet. Once you have OpenGL 2.0 down, you should consider moving on to 3.0+, as it allows you to build very fast 3D graphics applications.
While Matthew Hall has a good answer and covers most things, there are a few things I'll add.
If you look at what's been deprecated, you'll see it's a lot of client side and fixed functionality. So it's obvious that they're trying to move people away from client side centered code and have people do everything possible server side on the GPU instead.
When it comes to which context to use, well, that's up to you. Though if performance is a major concern, then 3.x is probably the way to go. I personally definitely want to learn OpenGL 3.x, but I doubt I'll be giving up 1.x/2.x. It's just so much easier to put together a quick app with what's available in a 1.x or 2.x context.
If you want a list of what's been deprecated, download the 3.0 specification and look under "The Deprecation Model"
A note from the future: the latest DirectX, Metal, and Vulkan APIs have command buffers and command queues, which let you record commands on the CPU and then send them to the GPU for execution. So perhaps display lists were not such an old-fashioned idea. In fact, compiling a display list is orthogonal to the use of shaders and VBOs, and display lists can improve performance further. I wonder if a Vulkan or Metal to OpenGL translator could use display lists for command buffers.
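The record-then-replay idea shared by display lists and command buffers can be shown with a toy sketch. This is only an analogy under loud assumptions: `CommandList` is a made-up class that stores closures, whereas real display lists and Vulkan or Metal command buffers are encoded by the driver into a GPU-friendly form.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A toy "command buffer": commands are recorded once as closures and can
// then be replayed any number of times without being re-specified.
class CommandList {
public:
    void record(std::function<void()> command) {
        assert(static_cast<bool>(command));  // reject empty callables
        commands_.push_back(std::move(command));
    }

    // Executes every recorded command in recording order.
    void replay() const {
        for (const auto& command : commands_) {
            command();
        }
    }

    std::size_t size() const { return commands_.size(); }

private:
    std::vector<std::function<void()>> commands_;
};
```

Recording happens once, and each frame only pays for the replay, which is the property that display lists offered and that the newer APIs make explicit.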
Because VBOs (vertex buffer objects) are much more efficient and can do everything display lists can do. They're not really any more complex, either, just a little different. Unless you're already more familiar with the old style glBegin/glEnd stuff, you're probably best off learning about buffers from the get go.