Should I call glEnable and glDisable every time I draw something? - opengl

How often should I call OpenGL functions like glEnable() or glEnableClientState() and their corresponding glDisable counterparts? Are they meant to be called once at the beginning of the application, or should I keep them disabled and only enable those features I immediately need for drawing something? Is there a performance difference?

Warning: glPushAttrib / glPopAttrib are DEPRECATED in the modern OpenGL programmable pipeline.
If you find that you are checking the value of state variables often and subsequently calling glEnable/glDisable you may be able to clean things up a bit by using the attribute stack (glPushAttrib / glPopAttrib).
The attribute stack lets you isolate regions of your code, so that attribute changes in one section do not affect the attribute state in other sections.
void drawObject1() {
    glPushAttrib(GL_ENABLE_BIT);
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_LIGHTING);
    /* Isolated region 1 */
    glPopAttrib();
}

void drawObject2() {
    glPushAttrib(GL_ENABLE_BIT);
    glEnable(GL_FOG);
    glEnable(GL_POINT_SMOOTH);
    /* Isolated region 2 */
    glPopAttrib();
}

void drawScene() {
    drawObject1();
    drawObject2();
}
Although GL_LIGHTING and GL_DEPTH_TEST are enabled in drawObject1, that state does not carry over into drawObject2. In the absence of glPushAttrib this would not be the case. Also note that there is no need to call glDisable at the end of each function; glPopAttrib does the job.
As far as performance, the overhead of individual calls to glEnable/glDisable is minimal. If you need to manage lots of state you will probably have to create your own state manager or make numerous calls to glGetInteger... and then act accordingly. The added machinery and control flow can make the code less transparent, harder to debug, and more difficult to maintain. These issues may make other, more fruitful, optimizations more difficult.
The attribute stack can aid in maintaining layers of abstraction and create regions of isolation.
glPushAttrib manpage

"That depends".
If your entire app only uses one combination of enable/disable states, then by all means just set it up at the beginning and go.
Most real-world apps need to mix, and then you're forced to call glEnable() to enable some particular state(s), do the draw calls, then glDisable() them again when you're done to "clear the stage".
State-sorting, state-tracking, and many optimization schemes stem from this, as state switching is sometimes expensive.
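A minimal sketch of the state-sorting idea (all names here are hypothetical, and the GL calls are only indicated in comments so the logic can stand alone): draw items are keyed by their expensive state, sorted so items sharing a shader and texture end up adjacent, and the submission loop only switches state when the key actually differs.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical draw item: keyed by the expensive state it needs.
struct DrawItem {
    unsigned shader;   // program object to bind
    unsigned texture;  // texture object to bind
    unsigned mesh;     // which mesh to draw
};

// Sort so that items sharing a shader (and then a texture) are adjacent,
// minimizing glUseProgram/glBindTexture calls during submission.
void sortByState(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(),
              [](const DrawItem& a, const DrawItem& b) {
                  return std::tie(a.shader, a.texture) <
                         std::tie(b.shader, b.texture);
              });
}

// Submission loop: only change state when the key actually differs.
// Returns the number of state changes that would have been issued.
int submit(const std::vector<DrawItem>& items) {
    unsigned boundShader = 0, boundTexture = 0;
    int stateChanges = 0;
    for (const DrawItem& d : items) {
        if (d.shader != boundShader) {   // a real glUseProgram would go here
            boundShader = d.shader;
            ++stateChanges;
        }
        if (d.texture != boundTexture) { // a real glBindTexture would go here
            boundTexture = d.texture;
            ++stateChanges;
        }
        // drawMesh(d.mesh);
    }
    return stateChanges;
}
```

Sorting by (shader, texture) rather than the other way around reflects the usual cost ordering: program binds tend to be more expensive than texture binds.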

First of all, which OpenGL version do you use? And which generation of graphics hardware does your target group have? Knowing this would make it easier to give a more correct answer. My answer assumes OpenGL 2.1.
OpenGL is a state machine, meaning that whenever a state is changed, that state is made "current" until changed again explicitly by the programmer with a new OpenGL API call. Exceptions to this rule exist, like client state array calls making the current vertex color undefined. But those are the exceptions which define the rule.
"once at the beginning of the application" doesn't make much sense, because there are times you need to destroy your OpenGL context while the application is still running. I assume you mean just after every window creation. That works for state you don't need to change later. Example: If all your draw calls use the same vertex array data, you don't need to disable them with glDisableClientState afterwards.
There is a lot of enable/disable state associated with the old fixed-function pipeline. The easy remedy for this is: use shaders! If you target a generation of cards at most five years old, it probably mimics the fixed-function pipeline with shaders anyway. By using shaders you are in more or less total control of what happens during the transform and rasterization stages, and you can make your own "states" with uniforms, which are very cheap to change/update.
Knowing that OpenGL is a state machine, as I said above, should make it clear that one should strive to keep state changes to a minimum wherever possible. However, there are most probably other things that impact performance much more than enable/disable state calls. If you want to know about them, read on.
The expense of state not associated with the old fixed-function state calls, and which is not simple enable/disable state, can differ widely. Notably, linking shaders and binding names ("names" of textures, programs, buffer objects) are usually fairly expensive. This is why a lot of games and applications used to sort the draw order of their meshes according to texture; that way, they didn't have to bind the same texture twice. Nowadays the same applies to shader programs: you don't want to bind the same shader program twice if you don't have to. Also, not all the features in a particular OpenGL version are hardware accelerated on all cards, even if the vendors of those cards claim they are OpenGL compliant. Being compliant means that they follow the specification, not that they necessarily run all the features efficiently. Some functions, like glHistogram and glMinmax from GL_ARB_imaging, should be remembered in this regard.
Conclusion: Unless there is an obvious reason not to, use shaders! It saves you from a lot of unnecessary state calls, since you can use uniforms instead. OpenGL shaders have been around for about six years now, you know. Also, the overhead of enable/disable state changes can be an issue, but usually there is a lot more to gain from optimizing other, more expensive state changes, like glUseProgram, glCompileShader, glLinkProgram, glBindBuffer and glBindTexture.
P.S.: OpenGL 3.0 removed the client-state enable/disable calls. They are implicitly enabled, as drawing from vertex arrays is the only way to draw in this version; immediate mode was removed. The fixed-function gl*Pointer calls were removed as well, since one really just needs glVertexAttribPointer.

A rule of thumb that I was taught said that it's almost always cheaper to just enable/disable at will rather than checking the current state and changing only if needed.
That said, Marc's answer is something that should definitely work.

Related

Does it still make sense to cache OpenGL states?

There used to be a time when people said that 'managing' the OpenGL states was useful (just saw an article on it from 2001). Like this (C++):
void CStateManager::setCulling(bool enabled)
{
    if (m_culling != enabled)
    {
        m_culling = enabled;
        if (m_culling)
            glEnable(GL_CULL_FACE);
        else
            glDisable(GL_CULL_FACE);
    }
}
I can see that this might still be useful in a situation where the OpenGL server is not on the same machine as the OpenGL client. But that certainly isn't the case in my 'game' engine, so let's assume the OpenGL client is always on the same machine as the OpenGL server.
Is it still (it's 2017 now) worth it to have all this checking-code take up cycles instead of just always calling the driver?
One could say that I should profile it myself, but I don't think the results would matter, because there are so many different graphics adapters, drivers, CPUs and OSes out there that my personal test cannot be representative enough.
EDIT: And how about things like bound buffers, framebuffers, textures, ...
Is it still (it's 2017 now) worth it to have all this checking-code take up cycles instead of just always calling the driver?
You say that as though it ever were "worth it". It wasn't, and it still isn't. At least, not in general.
State caching is useful in the cases where it was before: when you don't have direct control over what's going on.
For example, if you're writing a game engine, you have firm knowledge over what rendering operations you intend to do, and when you intend to do them. You know that all of your meshes are going to use face culling, and you make your artists/tool pipeline deal with that accordingly. You might turn off face culling for GUI rendering or something like that, but those would be specific cases.
By contrast, if you're writing a generalized rendering system, where the user has near total control over the nature of meshes, state caching might help. If your code is being told which meshes use face culling and which do not, then you have no control over that sort of thing. And since the higher-level code/data isn't able to talk to OpenGL directly, you can do some state caching to smooth things over.
So if you're in control of how things get rendered, if you control the broad nature of your data and the way it gets drawn, then you don't need a cache. Your code does that job adequately. And in good data-driven designs, you can order the rendering of the data so that you still don't need a cache. You only need one in a completely free-wheeling system, where the outside world has near-total control over all state and has little regard for the performance impact of its state changes.
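For that "free-wheeling" case, the CStateManager idea above generalizes beyond enable/disable flags to bindings. A sketch (class and method names hypothetical; the real GL call is only indicated in a comment): skip a bind when the requested object is already bound to that target, and invalidate the cache on deletion, since GL resets bindings of a deleted texture to 0.

```cpp
#include <unordered_map>

// Hypothetical binding cache: skips a bind when the same object is
// already bound to that target.
class BindingCache {
public:
    // Returns true if a real glBindTexture(target, name) would be issued.
    bool bindTexture(unsigned target, unsigned name) {
        auto it = m_bound.find(target);
        if (it != m_bound.end() && it->second == name)
            return false;                // already bound: skip the GL call
        m_bound[target] = name;
        // glBindTexture(target, name); // real call would go here
        return true;
    }

    // glDeleteTextures resets bindings of the deleted texture to 0,
    // so the cache must be invalidated to stay in sync with the context.
    void notifyTextureDeleted(unsigned name) {
        for (auto& kv : m_bound)
            if (kv.second == name)
                kv.second = 0;
    }

private:
    std::unordered_map<unsigned, unsigned> m_bound; // target -> texture name
};
```

The deletion hook is the part that is easy to forget, and exactly the kind of indirect state change that makes shadowing caches drift out of sync.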

When does it make sense to turn off the rasterization step?

In vulkan there is a struct which is required for pipeline creation, named VkPipelineRasterizationStateCreateInfo. In this struct there is a member named rasterizerDiscardEnable. If this member is set to VK_TRUE then all primitives are discarded before the rasterization step. This disables any output to the framebuffer.
I cannot think of a scenario where this might make any sense. In which cases could it be useful?
It would be for any case where you're executing the rendering pipeline solely for the side effects of the vertex processing stage(s). For example, you could use a GS to feed data into a buffer, which you later render from.
Now in many cases you could use a compute shader to do something similar. But you can't use a CS to efficiently implement tessellation; that's best done by the hardware tessellator. So if you want to capture data generated by tessellation (presumably because you'll be rendering with it multiple times), you have to use a rendering process.
A useful side-effect (though not necessarily the intended use case) of this parameter is for benchmarking / determining the bottle-neck of your Vulkan application: If discarding all primitives before the rasterization stage (and thus before any fragment shaders are ever executed) does not improve your frame-rate then you can rule out that your application performance is fragment stage-bound.
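For reference, setting the flag during pipeline creation looks roughly like this (a fragment, not a complete pipeline setup; structure and field names are from the Vulkan specification, the other rasterization members are elided):

```c
/* Fragment of graphics pipeline creation: discard all primitives before
 * rasterization, e.g. for a transform-feedback-only or benchmarking
 * pipeline. No fragment shaders run, and nothing reaches the framebuffer. */
VkPipelineRasterizationStateCreateInfo raster = {0};
raster.sType = VK_STRUCTURE_TYPE_PIPELINE_RASTERIZATION_STATE_CREATE_INFO;
raster.rasterizerDiscardEnable = VK_TRUE;
raster.polygonMode = VK_POLYGON_MODE_FILL;
raster.lineWidth = 1.0f;
/* ... then point VkGraphicsPipelineCreateInfo::pRasterizationState at &raster */
```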

Is it possible to preserve all the state at once in OpenGL?

If we have several OpenGL contexts, each in its own process, the driver somehow virtualises the device, so that each program thinks it exclusively runs the GPU. That is, if one program calls glEnable, the other one will never notice that.
This could be otherwise done with a ton of glGet calls to save state and its counterparts to restore it afterwards. Obviously, the driver does it more efficiently. However, in userspace we need to track which changes we made to the state and handle them selectively. Maybe it's just me missing something, but I thought it would be nice, for one, to adjust Viewport for a Framebuffer, and then just undo those changes to whatever state they were before.
Maybe there is a way of achieving the effect of a context switch yet within a single program?
Maybe there is a way of achieving the effect of a context switch yet within a single program?
You may create as many OpenGL contexts in a single process as you like and switch between them. Also, with modern GPUs the state of the OpenGL context bears little resemblance to what's actually happening on the GPU.
For pre-Core OpenGL there's glPushAttrib()/glPopAttrib() that will let you store off some GL state.
You're probably better off writing your own client-side state shadowing though.
The state machine (and command queue, discussed below) are unique to each context. It is much, much higher-level than you are thinking; the state is generally wrapped up nicely in usermode.
As for context-switching in a single process, be aware that each render context in GL is unsynchronized. An implicit flush is generated during a context switch in order to help alleviate this problem. As long as a context is only used by a single thread, this is generally adequate but probably going to negatively impact performance.
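As for the client-side state shadowing suggested above, one minimal shape is a scope guard that snapshots a single capability on entry and restores it on exit, mimicking glPushAttrib/glPopAttrib for one flag but usable in core profiles. This is only a sketch: the query/set functions are injected so the logic can be shown without a live context; in real code they would be glIsEnabled, glEnable and glDisable.

```cpp
#include <functional>

// RAII guard: records whether a capability was enabled, forces a desired
// value, and restores the original state on destruction.
class EnableGuard {
public:
    EnableGuard(unsigned cap, bool desired,
                std::function<bool(unsigned)> isEnabled,   // e.g. glIsEnabled
                std::function<void(unsigned)> enable,      // e.g. glEnable
                std::function<void(unsigned)> disable)     // e.g. glDisable
        : m_cap(cap), m_enable(enable), m_disable(disable) {
        m_saved = isEnabled(cap);                // snapshot current state
        if (desired != m_saved)
            (desired ? enable : disable)(cap);   // force the desired state
    }
    ~EnableGuard() {
        (m_saved ? m_enable : m_disable)(m_cap); // restore previous state
    }
private:
    unsigned m_cap;
    bool m_saved;
    std::function<void(unsigned)> m_enable, m_disable;
};
```

Note that glIsEnabled is itself a round trip to the driver, so a shadowing version would read from a cached copy instead of querying each time.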

Storing OpenGL state

Suppose I'm trying to make some kind of small OpenGL graphics engine in C++. I've read that accessing OpenGL state via the glGet* functions can be quite expensive (and accessing OpenGL state is a frequent operation), so it's strongly recommended to store a copy of the OpenGL state somewhere with fast read/write access.
I'm currently thinking of storing the opengl state as a global thread_local variable of some appropriate type. How bad is that design? Are there any pitfalls?
If you want to stick with OpenGL's design (where your context pointer could be considered "thread_local") I guess it's a valid option... Obviously, you will need to have full control over all OpenGL calls in order to keep your state copy in sync with the current context's state.
I personally prefer to wrap the OpenGL state of interest using an "OpenGLState" class with a bunch of settable/gettable properties each mapping to some part of the state. You can then also avoid setting the same state twice. You could make it thread_local, but I couldn't (Visual C++ only supports thread_local for POD types).
You will need to be very careful, as some OpenGL calls indirectly change seemingly unrelated parts of the context's state. For example, glDeleteTextures will reset any binding of the deleted texture(s) to 0. Also, some toolkits are very "helpful" in changing OpenGL state behind your back (for example, QtOpenGLContext on OSX changes your viewport for you when made current).
Since you can only (reasonably) use a GL context with one thread, why do you need thread local? Yes, you can make a context current in different threads at different times, but this is not a wise design.
You will usually have one context and one thread accessing it. In rare cases, you will have two contexts (often shared) with two threads. In that case, you can simply put any additional state you wish to save into your context class, of which each instance is owned by exactly one thread.
But most of the time, you need not explicitly "remember" states anyway. All states have well-documented initial states, and they only change when you change them (exception being changes made by a "super smart" toolkit, but storing a wrong state doesn't help in that case either).
You will usually try to batch states together and issue many "similar" draw calls with one set of states, the reason being that state changes stall the pipeline and require expensive validation before the next draw calls.
So, start off with the defaults, and set everything that needs to be non-default before drawing a batch. Then change what needs to be different for the next batch.
If you can't be bothered to dig through the specs for default values and keep track, you can redundantly set everything all the time. Then run your application in GDebugger, which will tell you which state changes are redundant, so you can eliminate them.

Which OpenGL functions are not GPU-accelerated?

I was shocked when I read this (from the OpenGL wiki):
glTranslate, glRotate, glScale — are these hardware accelerated?
No, there are no known GPUs that execute these. The driver computes the matrix on the CPU and uploads it to the GPU.
All the other matrix operations are done on the CPU as well: glPushMatrix, glPopMatrix, glLoadIdentity, glFrustum, glOrtho.
This is the reason why these functions are considered deprecated in GL 3.0. You should have your own math library, build your own matrix, and upload your matrix to the shader.
For a very, very long time I thought most of the OpenGL functions use the GPU to do computation. I'm not sure if this is a common misconception, but after a while of thinking, this makes sense. Old OpenGL functions (2.x and older) are really not suitable for real-world applications, due to too many state switches.
This makes me realise that, possibly, many OpenGL functions do not use the GPU at all.
So, the question is:
Which OpenGL function calls don't use the GPU?
I believe knowing the answer to the above question would help me become a better programmer with OpenGL. Please do share some of your insights.
Edit:
I know this question easily leads to optimisation level. It's good, but it's not the intention of this question.
If anyone knows a set of GL functions on a certain popular implementation (as AshleysBrain suggested, nVidia/ATI, and possibly OS-dependent) that don't use the GPU, that's what I'm after!
Plausible optimisation guides come later. Let's focus on the functions, for this topic.
Edit2:
This topic isn't about how matrix transformations work. There are other topics for that.
Boy, is this a big subject.
First, I'll start with the obvious: Since you're calling the function (any function) from the CPU, it has to run at least partly on the CPU. So the question really is, how much of the work is done on the CPU and how much on the GPU.
Second, in order for the GPU to get to execute some command, the CPU has to prepare a command description to pass down. The minimal set here is a command token describing what to do, as well as the data for the operation to be executed. How the CPU triggers the GPU to do the command is also somewhat important. Since most of the time, this is expensive, the CPU does not do it often, but rather batches commands in command buffers, and simply sends a whole buffer for the GPU to handle.
All this to say that passing work down to the GPU is not a free exercise. That cost has to be pitted against just running the function on the CPU (no matter what we're talking about).
Taking a step back, you have to ask yourself why you need a GPU at all. The fact is, a pure CPU implementation does the job (as AshleysBrain mentions). The power of the GPU comes from its design to handle:
specialized tasks (rasterization, blending, texture filtering, blitting, ...)
heavily parallel workloads (DeadMG points to that in his answer), whereas a CPU is designed more for single-threaded work.
And those are the guiding principles to follow in order to decide what goes in the chip. Anything that can benefit from those ought to run on the GPU. Anything else ought to be on the CPU.
It's interesting, by the way, that some functionality of the GL (prior to deprecation, mostly) is really not clearly delineated. Display lists are probably the best example of such a feature. Each driver is free to push as much as it wants from the display list stream to the GPU (typically in some command-buffer form) for later execution, as long as the semantics of GL display lists are kept (and that is somewhat hard in general). So some implementations choose to push only a limited subset of the calls in a display list to a compiled format, and simply replay the rest of the command stream on the CPU.
Selection is another one where it's unclear whether there is value to executing on the GPU.
Lastly, I have to say that, in general, there is little correlation between API calls and the amount of work done on either the CPU or the GPU. A state-setting API call tends to only modify a structure somewhere in the driver data; its effect is only visible when a Draw, or some such, is called.
A lot of the GL API works like that. At that point, asking whether glEnable(GL_BLEND) is executed on the CPU or GPU is rather meaningless. What matters is whether the blending will happen on the GPU when Draw is called. So, in that sense, most GL entry points are not accelerated at all.
I could also expand a bit on data transfer but Danvil touched on it.
I'll finish with the little "s/w path". Historically, GL had to work to spec no matter what the hardware special cases were. Which meant that if the h/w was not handling a specific GL feature, then it had to emulate it, or implement it fully in software. There are numerous cases of this, but one that struck a lot of people is when GLSL started to show up.
Since there was no practical way to estimate the code size of a GLSL shader, it was decided that the GL was supposed to accept any shader length as valid. The implication was fairly clear: either build h/w that could take arbitrary-length shaders (not realistic at the time), or implement a s/w shader emulation (or, as some vendors chose to, simply fail to be compliant). So, if you triggered this condition on a fragment shader, chances were the whole of your GL ended up being executed on the CPU, even with a GPU sitting idle, at least for that draw.
The question should perhaps be "What functions eat an unexpectedly high amount of CPU time?"
Keeping a matrix stack for projection and view is not a thing the GPU can handle better than a CPU would (on the contrary ...). Another example would be shader compilation. Why should this run on the GPU? There is a parser, a compiler, ..., which are just normal CPU programs like the C++ compiler.
Potentially "dangerous" function calls are for example glReadPixels, because data can be copied from host (=CPU) memory to device (=GPU) memory over the limited bus. In this category are also functions like glTexImage_D or glBufferData.
So generally speaking, if you want to know how much CPU time an OpenGL call eats, try to understand its functionality. And beware of all functions, which copy data from host to device and back!
Typically, if an operation is per-something, it will occur on the GPU. An example is the actual transformation - this is done once per vertex. On the other hand, if it occurs only once per large operation, it'll be on the CPU - such as creating the transformation matrix, which is only done once for each time the object's state changes, or once per frame.
That's just a general answer, and some functionality will occur the other way around, as well as being implementation dependent. However, typically it shouldn't matter to you, the programmer. As long as you allow the GPU plenty of time to do its work while you're off doing the game sim or whatever, or have a solid threading model, you shouldn't need to worry about it that much.
Regarding sending data to the GPU: as far as I know (I've only used Direct3D), it's all done in-shader; that's what shaders are for.
glTranslate, glRotate and glScale change the currently active transformation matrix. This is of course a CPU operation. The model-view and projection matrices just describe how the GPU should transform vertices when a rendering command is issued.
So by calling glTranslate, for example, nothing is translated at all yet. Before rendering, the current projection and model-view matrices are multiplied (MVP = projection * modelview), then this single matrix is copied to the GPU, and the GPU does the matrix * vertex multiplications ("T&L") for each vertex. So the translation/scaling/projection of the vertices is done by the GPU.
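The division of labor can be sketched in a few lines of plain matrix math (no GL calls; column-major layout as OpenGL uses it): the CPU builds the translation matrix, and the per-vertex matrix * vertex multiply is the part the GPU later performs.

```cpp
#include <array>

using Mat4 = std::array<float, 16>; // column-major, as in OpenGL
using Vec4 = std::array<float, 4>;

// CPU side: the matrix glTranslatef effectively builds (and multiplies
// into the current matrix) before upload.
Mat4 translation(float x, float y, float z) {
    return Mat4{1, 0, 0, 0,
                0, 1, 0, 0,
                0, 0, 1, 0,
                x, y, z, 1};
}

// GPU side: the per-vertex work, m * v, done once per vertex in hardware.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[col * 4 + row] * v[col];
    return r;
}
```

Building the matrix happens once per object or frame on the CPU; the transform runs per vertex, which is exactly the heavily parallel workload the GPU is built for.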
Also you really should not be worried about the performance if you don't use these functions in an inner loop somewhere. glTranslate results in three additions. glScale and glRotate are a bit more complex.
My advice is that you should learn a bit more about linear algebra. This is essential for working with 3D APIs.
There are software rendered implementations of OpenGL, so it's possible that no OpenGL functions run on the GPU. There's also hardware that doesn't support certain render states in hardware, so if you set a certain state, switch to software rendering, and again, nothing will run on the GPU (even though there's one there). So I don't think there's any clear distinction between 'GPU-accelerated functions' and 'non-GPU accelerated functions'.
To be on the safe side, keep things as simple as possible. The straightforward rendering-with-vertices and basic features like Z buffering are most likely to be hardware accelerated, so if you can stick to that with the minimum state changing, you'll be most likely to keep things hardware accelerated. This is also the way to maximize performance of hardware-accelerated rendering - graphics cards like to stay in one state and just crunch a bunch of vertices.