Does it still make sense to cache OpenGL states?

There used to be a time when people said that 'managing' the OpenGL states was useful (just saw an article on it from 2001). Like this (C++):
void CStateManager::setCulling(bool enabled)
{
    if (m_culling != enabled)
    {
        m_culling = enabled;
        if (m_culling)
            glEnable(GL_CULL_FACE);
        else
            glDisable(GL_CULL_FACE);
    }
}
I can see that this might still be useful in a situation where the OpenGL server is not on the same machine as the OpenGL client. But that certainly isn't the case in my 'game' engine, so let's assume the OpenGL client is always on the same machine as the OpenGL server.
Is it still (it's 2017 now) worth it to have all this checking-code take up cycles instead of just always calling the driver?
One could say that I should profile it myself, but I don't think the results would matter, because there are so many different graphics adapters, drivers, CPUs and OSes out there that my personal test cannot be representative enough.
EDIT: And how about things like bound buffers, framebuffers, textures, ...

Is it still (it's 2017 now) worth it to have all this checking-code take up cycles instead of just always calling the driver?
You say that as though it ever were "worth it". It wasn't, and it still isn't. At least, not in general.
State caching is useful in the cases where it was before: when you don't have direct control over what's going on.
For example, if you're writing a game engine, you have firm knowledge of what rendering operations you intend to do, and when you intend to do them. You know that all of your meshes are going to use face culling, and you make your artists/tool pipeline deal with that accordingly. You might turn off face culling for GUI rendering or something like that, but those would be specific cases.
By contrast, if you're writing a generalized rendering system, where the user has near total control over the nature of meshes, state caching might help. If your code is being told which meshes use face culling and which do not, then you have no control over that sort of thing. And since the higher-level code/data isn't able to talk to OpenGL directly, you can do some state caching to smooth things over.
So if you're in control of how things get rendered, if you control the broad nature of your data and the way it gets drawn, then you don't need a cache. Your code does that job adequately. And in good data-driven designs, you can order the rendering of the data so that you still don't need a cache. You only need one in a completely free-wheeling system, where the outside world has near-total control over all state and has little regard for the performance impact of its state changes.
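As for the edit about bound buffers, framebuffers and textures: the same reasoning applies. If you do end up in that free-wheeling case, the cache is just the culling example extended to bind points. A minimal sketch (the member names here are made up, not from the question's code):
void CStateManager::bindTexture2D(GLuint unit, GLuint texture)
{
    // m_boundTexture2D is a hypothetical shadow copy of the per-unit binding state.
    if (m_boundTexture2D[unit] != texture)
    {
        m_boundTexture2D[unit] = texture;
        glActiveTexture(GL_TEXTURE0 + unit);
        glBindTexture(GL_TEXTURE_2D, texture);
    }
}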

Related

What can Vulkan do specifically that OpenGL 4.6+ cannot?

I'm looking into whether it's better for me to stay with OpenGL or consider a Vulkan migration for intensive bottlenecked rendering.
However I don't want to make the jump without being informed about it. I was looking up what benefits Vulkan offers me, but with a lot of googling I wasn't able to come across exactly what gives performance boosts. People will throw around terms like "OpenGL is slow, Vulkan is way faster!" or "Low power consumption!" and say nothing more on the subject.
Because of this, it makes it difficult for me to evaluate whether or not the problems I face are something Vulkan can help me with, or if my problems are due to volume and computation (and Vulkan would in such a case not help me much).
I'm assuming Vulkan does not magically make things in the pipeline faster (as in, shading triangles is going to be approximately the same between OpenGL and Vulkan for the same buffers, uniforms, and shader). I'm assuming all the things in OpenGL that cause grief (e.g. framebuffer and shader program changes) are going to be equally painful in either API.
There are a few things off the top of my head that I think Vulkan offers, based on reading through countless things online (and I'm guessing this certainly is not all the advantages, nor am I sure whether these are even true):
Texture rendering without [much? any?] binding (or rather a better version of 'bindless textures'). I noticed a significant performance boost when I switched to bindless textures, but this might not even be worth mentioning as a point if bindless textures already effectively do this, so I'm not sure whether Vulkan adds anything here
Reduced CPU/GPU communication by composing some kind of command list that you can execute on the GPU without needing to send much data
Being able to interface in a multithreaded way that OpenGL can't somehow
However I don't know exactly what cases people run into in the real world that demand these, and how OpenGL limits these. All the examples so far online say "you can run faster!" but I haven't seen how people have been using it to run faster.
Where can I find information that answers this question? Or do you know some tangible examples that would answer this for me? Maybe a better question would be where are the typical pain points that people have with OpenGL (or D3D) that caused Vulkan to become a thing in the first place?
An example of an answer that would not be satisfying would be a response like
You can multithread and submit things to Vulkan quicker.
but a response that would be more satisfying would be something like
In Vulkan you can multithread your submissions to the GPU. In OpenGL you can't do this because you rely on the implementation to do the appropriate locking and placing fences on your behalf which may end up creating a bottleneck. A quick example of this would be [short example here of a case where OpenGL doesn't cut it for situation X] and in Vulkan it is solved by [action Y].
The last paragraph above may not be accurate whatsoever, but I was trying to give an example of what I'd be looking for without trying to write something egregiously wrong.
Vulkan really has four main advantages in terms of run-time behavior:
Lower CPU load
Predictable CPU load
Better memory interfaces
Predictable memory load
Specifically, lower GPU load isn't one of the advantages; the same content using the same GPU features will have very similar GPU performance with both APIs.
In my opinion it also has many advantages in terms of developer usability - the programmer's model is a lot cleaner than OpenGL, but there is a steeper learning curve to get to the "something working correctly" stage.
Let's look at each of the advantages in more detail:
Lower CPU load
The lower CPU load in Vulkan comes from multiple areas, but the main ones are:
The API encourages up-front construction of descriptors, so you're not rebuilding state on a draw-by-draw basis (see the sketch just after this list).
The API is asynchronous and can therefore move some responsibilities, such as tracking resource dependencies, to the application. A naive application implementation here will be just as slow as OpenGL, but the application has more scope to apply high level algorithmic optimizations because it can know how resources are used and how they relate to the scene structure.
The API moves error checking out to layer drivers, so the release drivers are as lean as possible.
The API encourages multithreading, which is always a great win (especially on mobile where e.g. four threads running slowly will consume a lot less energy than one thread running fast).
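As a concrete sketch of the first point, descriptor state in Vulkan is typically built once at load time and only bound at draw time. The snippet below creates a layout for a single uniform buffer; error handling is omitted and device is assumed to be a valid VkDevice:
#include <vulkan/vulkan.h>

VkDescriptorSetLayout makeUniformLayout(VkDevice device)
{
    // One uniform buffer visible to the vertex stage, described once up front
    // instead of being re-specified on every draw call.
    VkDescriptorSetLayoutBinding binding{};
    binding.binding         = 0;
    binding.descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER;
    binding.descriptorCount = 1;
    binding.stageFlags      = VK_SHADER_STAGE_VERTEX_BIT;

    VkDescriptorSetLayoutCreateInfo info{};
    info.sType        = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
    info.bindingCount = 1;
    info.pBindings    = &binding;

    VkDescriptorSetLayout layout = VK_NULL_HANDLE;
    vkCreateDescriptorSetLayout(device, &info, nullptr, &layout);
    return layout;
}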
Predictable CPU load
OpenGL drivers do various kinds of "magic", either for performance (specializing shaders based on state only known late at draw time), or to maintain the synchronous rendering illusion (creating resource ghosts on the fly to avoid stalling the pipeline when the application modifies a resource which is still referenced by a pending command).
The Vulkan design philosophy is "no magic". You get what you ask for, when you ask for it. Hopefully this means no random slowdowns because the driver is doing something you didn't expect in the background. The downside is that the application takes on the responsibility for doing the right thing ;)
Better memory interfaces
Many parts of the OpenGL design are based on distinct CPU and GPU memory pools, which require a programming model that gives the driver enough information to keep them in sync. Most modern hardware can do better with hardware-backed coherency protocols, so Vulkan enables a model where you can just map a buffer once, then modify it ad hoc with a guarantee that the "other process" will see the changes. No more "map" / "unmap" / "invalidate" overhead (provided the platform supports coherent buffers, of course; it's still not universal).
Secondly Vulkan separates the concept of the memory allocation and how that memory is used (the memory view). This allows the same memory to be recycled for different things in the frame pipeline, reducing the amount of intermediate storage you need allocated.
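To make the "map once" point concrete, here is a minimal sketch; it assumes the memory was allocated from a HOST_VISIBLE | HOST_COHERENT heap, in which case CPU writes become visible to the GPU without any unmap or explicit flush:
#include <vulkan/vulkan.h>
#include <cstring>

// Map the buffer's memory once at startup and keep the pointer around.
void* mapOnce(VkDevice device, VkDeviceMemory memory, VkDeviceSize size)
{
    void* mapped = nullptr;
    vkMapMemory(device, memory, 0, size, 0, &mapped);
    return mapped;   // no per-frame map/unmap/invalidate needed
}

// Later, per frame: just write into the mapped pointer.
void updatePerFrame(void* mapped, const void* frameData, size_t bytes)
{
    std::memcpy(mapped, frameData, bytes);   // coherent memory: the GPU sees this as-is
}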
Predictable memory load
Related to the "no magic" comment for CPU performance, Vulkan won't generate random resources (e.g. ghosted textures) on the fly to hide application problems. No more random fluctuations in resource memory footprint, but again the application has to take on the responsibility to do the right thing.
This is at risk of being opinion based. I suppose I will just reiterate the Vulkan advantages that are written on the box, and hopefully uncontested.
You can disable validation in Vulkan. It obviously uses less CPU (or battery/power/noise) that way. In some cases this can be significant.
OpenGL has poorly defined multi-threading. Vulkan has well-defined multi-threading in the specification, meaning you do not immediately lose your mind trying to code with multiple threads, and you get better performance where a single thread would otherwise be a CPU bottleneck.
Vulkan is more explicit; it does not (or tries not to) expose big magic black boxes. That means e.g. you can do something about micro-stutter and hitching, and other micro-optimizations.
Vulkan has a cleaner interface to windowing systems. No more odd contexts and default framebuffers. Vulkan does not even require a window to draw (or it can achieve that without weird hacks).
Vulkan is a cleaner and more conventional API. For me that means it is easier to learn (despite the other things) and more satisfying to use.
Vulkan takes binary intermediate-code shaders, while OpenGL used not to. That should mean faster compilation of such code.
Vulkan has mobile GPUs as a first-class citizen. No more ES.
Vulkan has open source and conventional (GitHub) public tracker(s), meaning you can improve the ecosystem without jumping through hoops. E.g. you can improve/implement a validation check for an error that often trips you up. Or you can improve the specification so it makes sense for people who are not insiders.

Performance of WebGL and OpenGL

For the past month I've been messing with WebGL, and found that if I create and draw a large vertex buffer it causes low FPS. Does anyone know if it would be the same if I used OpenGL with C++?
Is that a bottleneck with the language used (JavaScript in the case of WebGL) or the GPU?
WebGL examples like this show that you can draw 150,000 cubes using one buffer with good performance, but with anything more than this I get FPS drops. Would that be the same with OpenGL, or would it be able to handle a larger buffer?
Basically, I've got to make a decision: continue using WebGL and try to optimise my code, or - if you tell me OpenGL would perform better and it's a language speed bottleneck - switch to C++ and use OpenGL.
If you only have a single drawArrays call, there should not be much of a difference between OpenGL and WebGL for the call itself. However, setting up the data in JavaScript might be a lot slower, so it really depends on your problem. If the bulk of your data is static (landscape, rooms), WebGL might work well for you. Otherwise, setting up the data in JS might be too slow for your purpose.
p.s. If you include more details of what you are trying to do, you'll probably get more detailed / specific answers.
Anecdotally, I wrote a tile-based game in the early 2000s using the old glVertex() style API, and it ran perfectly smoothly. I recently started porting it to WebGL and glDrawArrays(), and now on my modern PC that is at least 10 times faster it gets terrible performance.
The reason seems to be that I was faking a call to glBegin(GL_QUADS); glVertex()*4; glEnd(); by using glDrawArrays(). Using glDrawArrays() to draw one polygon is much, much slower in WebGL than doing the same with glVertex() was in C++.
I don't know why this is. Maybe it is because JavaScript is dog slow. Maybe it is because of some context-switching issues in JavaScript. Anyway, I can only do around 500 one-polygon glDrawArrays() calls while still getting 60 FPS.
Everybody seems to work around this by doing as much on the GPU as possible, and doing as few glDrawArrays() calls per frame as possible. Whether you can do this depends on what you are trying to draw. In the example of cubes you linked, they can do everything on the GPU, including moving the cubes, which is why it is fast. Essentially they cheated - typically WebGL apps won't be like that.
Google had a talk where they explained this technique (they also unrealistically calculate the object motion on the GPU): https://www.youtube.com/watch?v=rfQ8rKGTVlg
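For reference, the usual workaround is to build one big vertex array on the CPU and issue a single draw call for all the quads. A rough sketch in C++ (positions only; attribute layout and shader setup are omitted, and the function names are made up):
#include <GL/gl.h>   // or whatever loader you use (glad, GLEW, ...)
#include <vector>

// Append one axis-aligned quad as two triangles (6 vertices, xy only).
void pushQuad(std::vector<float>& verts, float x, float y, float w, float h)
{
    const float quad[12] = { x, y,     x + w, y,     x + w, y + h,
                             x, y,     x + w, y + h, x,     y + h };
    verts.insert(verts.end(), quad, quad + 12);
}

// Upload everything pushed this frame and draw it with one call instead of one call per quad.
void drawQuads(GLuint vbo, const std::vector<float>& verts)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, verts.size() * sizeof(float), verts.data(), GL_STREAM_DRAW);
    glDrawArrays(GL_TRIANGLES, 0, GLsizei(verts.size() / 2));
}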
OpenGL is more flexible and more optimized because of the newer versions of the API used.
It is true that OpenGL is faster and more capable, but it also depends on your needs.
If you need one cube mesh with a texture, WebGL would be sufficient. However, if you intend to build large-scale projects with lots of vertices, post-processing effects and different rendering techniques (any kind of displacement or parallax mapping, per-vertex effects, or maybe tessellation), then OpenGL might actually be a better and wiser choice.
Optimizing buffers into a single call and optimizing their updates can be done, but it has its limits, of course - and yes, OpenGL would most likely perform better anyway.
To answer: it is not a language bottleneck, but one of the API version used.
WebGL is based upon OpenGL ES, which has some pros but also runs a bit slower and has more abstraction levels for code handling than pure OpenGL, and that is the reason for the lower performance - more code needs to be evaluated.
If your project doesn't require a web-based solution, and doesn't care which devices are supported, then OpenGL would be a better and smarter choice.
Hope this helps.
WebGL is much slower on the same hardware compared to equivalent OpenGL, because of the high overhead of each WebGL call.
On desktop OpenGL, this overhead is at least limited, if still relatively expensive.
But in browsers like Chrome, WebGL requires that not only do we cross the FFI barrier to access those native OpenGL API calls (which still incur the same overhead), but we also have the cost of security checks to prevent the GPU being hijacked for computation.
If you are looking at something like glDraw* calls, which are issued every frame, this means we are talking about affording perhaps an order of magnitude (or more) fewer calls. All the more reason to opt for something like instancing, where the number of calls is drastically reduced - see the sketch below.
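As a sketch of the instancing idea in desktop GL (WebGL 2 exposes the same calls), assuming per-instance offsets are already stored in instanceVbo and that attribute 1 is the offset attribute in the bound shader (both hypothetical):
// Advance attribute 1 once per instance rather than once per vertex, then
// draw every instance with one call instead of one glDrawArrays per object.
void drawInstanced(GLuint instanceVbo, GLsizei vertexCountPerMesh, GLsizei instanceCount)
{
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glEnableVertexAttribArray(1);
    glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
    glVertexAttribDivisor(1, 1);

    glDrawArraysInstanced(GL_TRIANGLES, 0, vertexCountPerMesh, instanceCount);
}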

Which OpenGL functions are not GPU-accelerated?

I was shocked when I read this (from the OpenGL wiki):
glTranslate, glRotate, glScale
Are these hardware accelerated?
No, there are no known GPUs that execute this. The driver computes the matrix on the CPU and uploads it to the GPU.
All the other matrix operations are done on the CPU as well: glPushMatrix, glPopMatrix, glLoadIdentity, glFrustum, glOrtho.
This is the reason why these functions are considered deprecated in GL 3.0. You should have your own math library, build your own matrix, and upload your matrix to the shader.
For a very, very long time I thought most of the OpenGL functions use the GPU to do computation. I'm not sure if this is a common misconception, but after a while of thinking, this makes sense. Old OpenGL functions (2.x and older) are really not suitable for real-world applications, due to too many state switches.
This makes me realise that, possibly, many OpenGL functions do not use the GPU at all.
So, the question is:
Which OpenGL function calls don't use the GPU?
I believe knowing the answer to the above question would help me become a better programmer with OpenGL. Please do share some of your insights.
Edit:
I know this question easily leads down to the level of optimisation. That's good, but it's not the intention of this question.
If anyone knows a set of GL functions on a certain popular implementation (as AshleysBrain suggested, nVidia/ATI, and possibly OS-dependent) that don't use the GPU, that's what I'm after!
Plausible optimisation guides come later. Let's focus on the functions, for this topic.
Edit2:
This topic isn't about how matrix transformations work. There are other topics for that.
Boy, is this a big subject.
First, I'll start with the obvious: Since you're calling the function (any function) from the CPU, it has to run at least partly on the CPU. So the question really is, how much of the work is done on the CPU and how much on the GPU.
Second, in order for the GPU to get to execute some command, the CPU has to prepare a command description to pass down. The minimal set here is a command token describing what to do, as well as the data for the operation to be executed. How the CPU triggers the GPU to do the command is also somewhat important. Since most of the time, this is expensive, the CPU does not do it often, but rather batches commands in command buffers, and simply sends a whole buffer for the GPU to handle.
All this to say that passing work down to the GPU is not a free exercise. That cost has to be pitted against just running the function on the CPU (no matter what we're talking about).
Taking a step back, you have to ask yourself why you need a GPU at all. The fact is, a pure CPU implementation does the job (as AshleysBrain mentions). The power of the GPU comes from its design to handle:
specialized tasks (rasterization, blending, texture filtering, blitting, ...)
heavily parallel workloads (DeadMG is pointing to that in his answer), whereas a CPU is designed more for single-threaded work.
And those are the guiding principles to follow in order to decide what goes in the chip. Anything that can benefit from those ought to run on the GPU. Anything else ought to be on the CPU.
It's interesting, by the way. Some functionality of the GL (prior to deprecation, mostly) is really not clearly delineated. Display lists are probably the best example of such a feature. Each driver is free to push as much as it wants from the display list stream to the GPU (typically in some command buffer form) for later execution, as long as the semantics of the GL display lists are kept (and that is somewhat hard in general). So some implementations choose to push only a limited subset of the calls in a display list to a computed format, and choose to simply replay the rest of the command stream on the CPU.
Selection is another one where it's unclear whether there is value to executing on the GPU.
Lastly, I have to say that in general, there is little correlation between the API calls and the amount of work on either the CPU or the GPU. A state-setting call tends to only modify a structure somewhere in the driver data. Its effect is only visible when a Draw, or some such, is called.
A lot of the GL API works like that. At that point, asking whether glEnable(GL_BLEND) is executed on the CPU or GPU is rather meaningless. What matters is whether the blending will happen on the GPU when Draw is called. So, in that sense, most GL entry points are not accelerated at all.
I could also expand a bit on data transfer but Danvil touched on it.
I'll finish with the little "s/w path". Historically, GL had to work to spec no matter what the hardware special cases were, which meant that if the h/w was not handling a specific GL feature, then it had to emulate it, or implement it fully in software. There are numerous cases of this, but one that struck a lot of people is when GLSL started to show up.
Since there was no practical way to estimate the code size of a GLSL shader, it was decided that the GL was supposed to take any shader length as valid. The implication was fairly clear: either implement h/w that could take arbitrary-length shaders (not realistic at the time), or implement a s/w shader emulation (or, as some vendors chose to, simply fail to be compliant). So, if you triggered this condition on a fragment shader, chances were the whole of your GL ended up being executed on the CPU, even when you had a GPU sitting idle, at least for that draw.
The question should perhaps be "What functions eat an unexpectedly high amount of CPU time?"
Keeping a matrix stack for projection and view is not a thing the GPU can handle better than a CPU would (on the contrary ...). Another example would be shader compilation. Why should this run on the GPU? There is a parser, a compiler, ..., which are just normal CPU programs like the C++ compiler.
Potentially "dangerous" function calls are for example glReadPixels, because data can be copied from host (=CPU) memory to device (=GPU) memory over the limited bus. In this category are also functions like glTexImage_D or glBufferData.
So generally speaking, if you want to know how much CPU time an OpenGL call eats, try to understand its functionality. And beware of all functions, which copy data from host to device and back!
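To make the direction of that copy concrete: a typical readback pulls the whole framebuffer back across the bus and also stalls until the GPU has finished rendering into it. A minimal sketch:
#include <GL/gl.h>
#include <vector>

// Reads the current framebuffer into CPU memory. The GPU must finish all pending
// rendering first, so this is both a bus transfer and a forced sync point.
std::vector<unsigned char> readBackFramebuffer(int width, int height)
{
    std::vector<unsigned char> pixels(size_t(width) * height * 4);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels.data());
    return pixels;
}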
Typically, if an operation is per-something, it will occur on the GPU. An example is the actual transformation - this is done once per vertex. On the other hand, if it occurs only once per large operation, it'll be on the CPU - such as creating the transformation matrix, which is only done once for each time the object's state changes, or once per frame.
That's just a general answer, and some functionality will occur the other way around - as well as being implementation dependent. However, typically, it shouldn't matter to you, the programmer. As long as you allow the GPU plenty of time to do its work while you're off doing the game sim or whatever, or have a solid threading model, you shouldn't need to worry about it that much.
Regarding sending data to the GPU: as far as I know (I've only used Direct3D), it's all done in-shader; that's what shaders are for.
glTranslate, glRotate and glScale change the currently active transformation matrix. This is of course a CPU operation. The modelview and projection matrices just describe how the GPU should transform vertices when a rendering command is issued.
So, e.g., calling glTranslate does not actually translate anything yet. Before rendering, the current projection and modelview matrices are multiplied (MVP = projection * modelview), then this single matrix is copied to the GPU, and the GPU does the matrix * vertex multiplication ("T&L") for each vertex. So the translation/scaling/projection of the vertices is done by the GPU.
Also, you really should not be worried about the performance if you don't use these functions in an inner loop somewhere. glTranslate results in three additions. glScale and glRotate are a bit more complex.
My advice is that you should learn a bit more about linear algebra. This is essential for working with 3D APIs.
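For illustration, the modern replacement does exactly what is described above: build the matrix on the CPU (here with the GLM library) and upload only the finished product. uMVP is a hypothetical uniform name in your vertex shader:
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>
// plus your GL loader header of choice

// CPU-side equivalent of glTranslate + the matrix stack: plain matrix math,
// followed by a single uniform upload. The per-vertex multiply stays on the GPU.
void uploadMVP(GLuint program, const glm::mat4& projection, const glm::mat4& view,
               const glm::vec3& objectPosition)
{
    glm::mat4 model = glm::translate(glm::mat4(1.0f), objectPosition);
    glm::mat4 mvp   = projection * view * model;

    glUseProgram(program);
    glUniformMatrix4fv(glGetUniformLocation(program, "uMVP"), 1, GL_FALSE, glm::value_ptr(mvp));
}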
There are software rendered implementations of OpenGL, so it's possible that no OpenGL functions run on the GPU. There's also hardware that doesn't support certain render states in hardware, so if you set a certain state, switch to software rendering, and again, nothing will run on the GPU (even though there's one there). So I don't think there's any clear distinction between 'GPU-accelerated functions' and 'non-GPU accelerated functions'.
To be on the safe side, keep things as simple as possible. The straightforward rendering-with-vertices and basic features like Z buffering are most likely to be hardware accelerated, so if you can stick to that with the minimum state changing, you'll be most likely to keep things hardware accelerated. This is also the way to maximize performance of hardware-accelerated rendering - graphics cards like to stay in one state and just crunch a bunch of vertices.

How expensive are OpenGL operations?

I'm curious how expensive functions like:
glViewport
glLoadIdentity
glOrtho
are in terms of both the work done on the CPU and the work done on the GPU.
Where is this documented?
This kind of thing is probably pretty dependent on your platform. Your best bet is probably to use a profiler yourself if you're worried about it.
As Alex O'Konski mentions, this is highly dependent on the platform.
That said, if you're interested in recent PC graphics cards, you should know that most of these calls don't "do work" on the GPU; they set up state for future draw calls.
This is important because their cost is related more to how well the GPU can pipeline them between the various draw calls that flow through the chip than to how much time it takes to change a register from one value to the next.
Most platform vendors do not document at all what the costs of various state changes are. They don't document how OpenGL state maps to their hardware state, for that matter.
Last, state changes like matrix state (glLoadIdentity and glOrtho) are a remnant of the past. In modern graphics cards, they are simply helper (CPU) functions to set up uniforms (and this is why they are deprecated in core GL 3.1). And all the math they require (usually not much) is done on the CPU, inside the driver.

Should I call glEnable and glDisable every time I draw something?

How often should I call OpenGL functions like glEnable() or glEnableClientState() and their corresponding glDisable counterparts? Are they meant to be called once at the beginning of the application, or should I keep them disabled and only enable those features I immediately need for drawing something? Is there a performance difference?
Warning: glPushAttrib / glPopAttrib are DEPRECATED in the modern OpenGL programmable pipeline.
If you find that you are checking the value of state variables often and subsequently calling glEnable/glDisable you may be able to clean things up a bit by using the attribute stack (glPushAttrib / glPopAttrib).
The attribute stack allows you to isolate areas of your code, so that changes to attribute state in one section do not affect the attribute state in other sections.
void drawObject1(){
    glPushAttrib(GL_ENABLE_BIT);
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_LIGHTING);
    /* Isolated Region 1 */
    glPopAttrib();
}
void drawObject2(){
    glPushAttrib(GL_ENABLE_BIT);
    glEnable(GL_FOG);
    glEnable(GL_POINT_SMOOTH);
    /* Isolated Region 2 */
    glPopAttrib();
}
void drawScene(){
    drawObject1();
    drawObject2();
}
Although GL_LIGHTING and GL_DEPTH_TEST are set in drawObject1, their state is not preserved in drawObject2. In the absence of glPushAttrib this would not be the case. Also, note that there is no need to call glDisable at the end of the function calls; glPopAttrib does the job.
As far as performance, the overhead of individual function calls to glEnable/glDisable is minimal. If you need to handle lots of state, you will probably need to create your own state manager or make numerous calls to glGetInteger... and then act accordingly. The added machinery and control flow can make the code less transparent, harder to debug, and more difficult to maintain. These issues may make other, more fruitful optimizations more difficult.
The attribute stack can aid in maintaining layers of abstraction and creating regions of isolation.
glPushAttrib manpage
"That depends".
If your entire app only uses one combination of enable/disable states, then by all means just set it up at the beginning and go.
Most real-world apps need to mix, and then you're forced to call glEnable() to enable some particular state(s), do the draw calls, then glDisable() them again when you're done to "clear the stage".
State-sorting, state-tracking, and many optimization schemes stem from this, as state switching is sometimes expensive.
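In code, that pattern is just bracketing the draws that need the state (a trivial sketch; drawTransparentObjects is a placeholder for your own draw calls):
void drawTransparentPass()
{
    // Enable only what this batch needs, draw, then put the stage back how we found it.
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);

    drawTransparentObjects();

    glDisable(GL_BLEND);
}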
First of all, which OpenGL version do you use? And which generation of graphics hardware does your target group have? Knowing this would make it easier to give a more correct answer. My answer assumes OpenGL 2.1.
OpenGL is a state machine, meaning that whenever a state is changed, that state is made "current" until changed again explicitly by the programmer with a new OpenGL API call. Exceptions to this rule exist, like client state array calls making the current vertex color undefined. But those are the exceptions which define the rule.
"once at the beginning of the application" doesn't make much sense, because there are times you need to destroy your OpenGL context while the application is still running. I assume you mean just after every window creation. That works for state you don't need to change later. Example: If all your draw calls use the same vertex array data, you don't need to disable them with glDisableClientState afterwards.
There is a lot of enable/disable state associated with the old fixed-function pipeline. The easy redemption for this is: Use shaders! If you target a generation of cards at most five years old, it probably mimics the fixed-function pipeline with shaders anyway. By using shaders you are in more or less total control of what's happening during the transform and rasterization stages, and you can make your own "states" with uniforms, which are very cheap to change/update.
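As a small sketch of the "uniforms as state" idea: instead of toggling a fixed-function switch, flip a uniform in your own shader (uUseFog is a made-up uniform name):
// Replaces glEnable(GL_FOG)/glDisable(GL_FOG); the shader branches on the uniform instead.
void setFogEnabled(GLuint program, bool enabled)
{
    glUseProgram(program);
    glUniform1i(glGetUniformLocation(program, "uUseFog"), enabled ? 1 : 0);
}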
Knowing that OpenGL is a state machine like I said above should make it clear that one should strive to keep the state changes to a minimum, as long as it's possible. However, there are most probably other things which impacts performance much more than enable/disable state calls. If you want to know about them, read down.
The expense of state not associated with the old fixed-function state calls, and which is not simple enable/disable state, can differ widely in cost. Notably, linking shaders and binding names ("names" of textures, programs, buffer objects) are usually fairly expensive. This is why a lot of games and applications used to sort the draw order of their meshes according to the texture; that way, they didn't have to bind the same texture twice. Nowadays, however, the same applies to shader programs: you don't want to bind the same shader program twice if you don't have to. Also, not all the features in a particular OpenGL version are hardware accelerated on all cards, even if the vendors of those cards claim they are OpenGL compliant. Being compliant means that they follow the specification, not that they necessarily run all the features efficiently. Some of the functions like glHistogram and glMinMax from GL_ARB_imaging should be remembered in this regard.
Conclusion: Unless there is an obvious reason not to, use shaders! It saves you from a lot of unnecessary state calls since you can use uniforms instead. OpenGL shaders have only been around for about six years, you know. Also, the overhead of enable/disable state changes can be an issue, but usually there is a lot more to gain from optimizing other, more expensive state changes, like glUseProgram, glCompileShader, glLinkProgram, glBindBuffer and glBindTexture.
P.S.: OpenGL 3.0 removed the client-state enable/disable calls. They are implicitly enabled, as drawing from arrays is the only way to draw in this version. Immediate mode was removed. The gl*Pointer calls were removed as well, since one really just needs glVertexAttribPointer.
A rule of thumb that I was taught said that it's almost always cheaper to just enable/disable at will rather than checking the current state and changing only if needed.
That said, Marc's answer is something that should definitely work.