Should I prefer glMapBufferRange over glMapBuffer? - c++

The documentation for glMapBuffer says it can only use the enum access specifiers of GL_READ_ONLY, GL_WRITE_ONLY, or GL_READ_WRITE.
The documentation for glMapBufferRange says it uses bitflag access specifiers instead, which include a way to persistently map the buffer with GL_MAP_PERSISTENT_BIT.
I want to map the buffer persistently, so should I just always use glMapBufferRange even though I want to map the entire buffer? I haven't seen anyone point out this rather important distinction between the two functions, so I'm wondering: is glMapBufferRange a complete replacement for glMapBuffer, or should I be prepared to use both in certain situations?
(I guess I'm just confused because, given the naming, I would think only the sub-range would be the difference between the two calls.)

glMapBufferRange was first introduced in OpenGL 3. OpenGL has evolved to provide more control to developers while keeping backwards compatibility as much as possible. So glMapBuffer remained unchanged, and glMapBufferRange introduced the explicitness that developers wanted (not only the subrange part, but also other bits).
glMapBufferRange reminds me of the options that are available nowadays in Vulkan (e.g. cache invalidation and explicit synchronization). For some use cases, you might get better performance by using the right flags with the newer function. If you don't use any of the optional flags, the behaviour should be equivalent to the old function (except for the subrange part).
I think I would always use glMapBufferRange as it can do everything that the other does. Plus you can tweak for performance later on. Just my humble opinion :)
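For completeness: GL_MAP_PERSISTENT_BIT also requires the buffer's storage to have been allocated with glBufferStorage (OpenGL 4.4 / ARB_buffer_storage) with matching flags; storage created with glBufferData cannot be mapped persistently. A minimal sketch of mapping a whole buffer persistently; the buffer size and the coherent flag are just illustrative choices:

// Persistent mapping of a whole buffer via glMapBufferRange.
// Assumes an OpenGL 4.4 context (or ARB_buffer_storage); size is arbitrary.
GLuint buf = 0;
const GLsizeiptr size = 1 << 20;

glGenBuffers(1, &buf);
glBindBuffer(GL_ARRAY_BUFFER, buf);

// Immutable storage must be created with flags matching the later map call.
const GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
glBufferStorage(GL_ARRAY_BUFFER, size, nullptr, flags);

// Map the entire buffer once; the pointer stays valid until glUnmapBuffer.
void* ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);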

Related

Is there a `std::` equivalent to libdispatch's `dispatch_data_t`?

I'm fond of dispatch_data_t. It provides a useful abstraction on top of a range of memory: it provides reference counting, allows consumers to create arbitrary sub-ranges (which participate in the ref counting of the parent range), concatenate sub-ranges, etc. (I won't bother to get into the gory details -- the docs are right over here: Managing Dispatch Data Objects)
I've been trying to find out if there's a C++11 equivalent, but the terms "range", "memory" and "reference counting" are pretty generic, which is making googling for this a bit of a challenge. I suspect that someone who spends more time with the C++ Standard Library than I do might know off the top of their head.
Yes, I'm aware that I can use the dispatch_data_t API from C++ code, and yes, I'm aware that it would not be difficult to crank out a naive first pass implementation of such a thing, but I'm specifically looking for something idiomatic to C++, and with a high degree of polish/reliability. (Boost maybe?)
No.
Range views are being proposed for future standard revisions, but they are non-owning.
dispatch_data_t is highly tied to GCD in that cleanup occurs in a specified queue determined at creation: to duplicate that behaviour, we would need thread pools and queues in std, which we do not have.
As you have noted, an owning, overlapping, immutable range type into sparse or contiguous memory would not be hard to write up. Fully polished, it would have to support allocators, some kind of raw input buffer system (type erasure on the owning/destruction mechanism?), have utilities for asynchronous iteration by block (with a tuned block size), deal with errors and exceptions carefully, and offer some way to efficiently turn refcount-1 views into mutable versions.
Something that complex would first have to show up in a library like boost and go through iterative improvements. And as it is quite many-faceted, something with enough of its properties for your purposes may already be there.
If you roll your own I encourage you to submit it for boost consideration.
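For illustration only, a minimal sketch of that "naive first pass": sub-ranges share ownership of the parent storage through a shared_ptr. The class name and interface are made up, and it deliberately skips allocators, concatenation, asynchronous iteration and the rest of the polish listed above.

#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Naive owning, ref-counted, immutable byte range (illustrative only).
class shared_bytes {
public:
    explicit shared_bytes(std::vector<unsigned char> data)
        : storage_(std::make_shared<const std::vector<unsigned char>>(std::move(data))),
          offset_(0), size_(storage_->size()) {}

    // A sub-range shares the parent's storage; no copy, the refcount goes up.
    shared_bytes subrange(std::size_t offset, std::size_t size) const {
        shared_bytes r(*this);
        r.offset_ += offset;
        r.size_ = size;
        return r;
    }

    const unsigned char* data() const { return storage_->data() + offset_; }
    std::size_t size() const { return size_; }

private:
    std::shared_ptr<const std::vector<unsigned char>> storage_;
    std::size_t offset_;
    std::size_t size_;
};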

Proper usage of OpenGL method glTexImage2D()?

The specification for the OpenGL method glTexImage2D() gives a large table of accepted internalFormat parameters. I'm wondering, though, if it really matters what I set this parameter to, since the doc says:
If an application wants to store the texture at a certain resolution or in a certain format, it can request the resolution and format with internalFormat. The GL will choose an internal representation that closely approximates that requested by internalFormat, but it may not match exactly.
which makes it seem as though OpenGL is just going to pick what it wants anyways. Should I bother getting an image's bit depth and setting the internalFormat to something like GL_RGBA8 or GL_RGBA16? All the code examples I've seen just use GL_RGBA...
which makes it seem as though OpenGL is just going to pick what it wants anyways.
This is very misleading.
There are a number of formats that implementations are required to support more or less exactly as described. Implementations are indeed permitted to store them in larger storage. But they're not permitted to lose precision compared to them. And there are advantages to using them (besides the obvious knowledge of exactly what you're getting).
First, it allows you to use specialized formats like GL_RGB10_A2, which is handy in certain situations (storing linear color values for deferred rendering, etc). Second, FBOs are required to support any combination of image formats, but only if all of those image formats come from the list of required color formats for textures/renderbuffers (but not from the texture-only list). If you're using any other internal formats, FBOs can throw GL_FRAMEBUFFER_UNSUPPORTED at you.
Third, immutable texture storage functions require the use of sized internal formats. And you should use those whenever they're available.
In general, you should always use sized internal formats. There's no reason to use the generic ones.
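As a sketch of the difference (width, height and pixels are assumed to exist elsewhere), requesting an explicit 8-bit-per-channel format instead of the generic GL_RGBA, plus the immutable-storage variant that only accepts sized formats:

// Sized internal format: the implementation may store it wider, but may not
// drop below 8 bits per channel.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);

// Alternatively, immutable storage (GL 4.2 / ARB_texture_storage) requires
// a sized format up front.
glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_RGBA, GL_UNSIGNED_BYTE, pixels);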
Using a generic internal format tells OpenGL that you don't care, and it will choose whatever it "likes" best. With an explicit internal format, you're telling OpenGL that you actually care about the internal representation (most likely because you need the precision). While an implementation is free to up- or downgrade if an exact match cannot be made, the usual fallback is to upgrade to the next higher format precision that can satisfy the requested demands.
Should I bother getting an image's bit depth and setting the internalFormat
If you absolutely require the precision, then yes. If your concerns are more about performance, then no, as the usual default of the OpenGL implementations out there is to choose the internal format that gives the best performance when no specific format has been requested.

Deprecated OpenGL functions still used in shader-oriented applications

Why do people tend to mix deprecated fixed-function pipeline features like the matrix stack, gluPerspective(), glMatrixMode() and whatnot, when this is meant to be done manually and shoved into GLSL as uniforms?
Are there any benefits to this approach?
There is a legitimate reason to do this, in terms of user sanity. Fixed-function matrices (and other fixed-function state tracked in GLSL) are global state, shared among all programs. If you want to change the projection matrix in every shader, you can do that by simply changing it in one place.
Doing this in GLSL without fixed function requires the use of uniform buffers. Either that, or you have to build some system that will farm state information to every shader that you want to use. The latter is perfectly doable, but a huge hassle. The former is relatively new, only introduced in 2009, and it requires DX10-class hardware.
It's much simpler to just use fixed-function and GLSL state tracking.
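For context, a rough sketch of the "manual" route the question describes, for a single program; GLM and the uniform name u_projection are assumptions here, not anything GL mandates:

#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>
#include <glm/gtc/type_ptr.hpp>

// Replaces glMatrixMode(GL_PROJECTION) + gluPerspective(): build the matrix
// on the CPU and hand it to the shader as an ordinary uniform.
glm::mat4 projection = glm::perspective(glm::radians(45.0f),
                                        aspectRatio,       // assumed to exist
                                        0.1f, 100.0f);

glUseProgram(program);                                      // program: your linked shader
GLint loc = glGetUniformLocation(program, "u_projection"); // uniform name is made up
glUniformMatrix4fv(loc, 1, GL_FALSE, glm::value_ptr(projection));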
No benefits as far as I'm aware of (unless you consider not having to recode the functionality a benefit).
Most likely just laziness, or a lack of knowledge of the alternative method.
Essentially because those applications require shaders to run, but the programmers are too lazy/stressed to re-implement features that are already available through the OpenGL compatibility profile.
Notable features that are "difficult" to replace are the line width (greater than 1), the line stipple and separate front and back polygon mode.
Most tutorials teach deprecated OpenGL, so maybe people don't know better.
The benefit is that you are using well-known, thoroughly tested and reliable code. If it's the MS Windows or Linux proprietary drivers, it was written by the people who built your GPU, who can therefore be assumed to know how to make it really fast.
An additional benefit for group projects is that There Is Only One Way To Do It. No arguments about whether you should be writing your own C++ matrix class, what it should be called, which operators to overload, or whether the internal implementation should be a 1D or 2D array...

Relative cost of various OpenGL functions?

I am trying to optimize some OpenGL code and I was wondering if someone knows of a table that would give a rough approximation of the relative costs of various OpenGL functions ?
Something like (these numbers are probably completely wrong):
method                        cost
glDrawElements(100 indices)   1
glBindTexture(512x512)        2
glGenBuffers(1 buffer)        1.2
If that doesn't exist, would it be possible to build one or are the various hardware/OS too different for that to be even meaningful ?
There certainly is no such list. One of the problems in creating such a list is answering the question, "what kind of cost?"
All rendering functions have a GPU-time cost. That is, the GPU has to do rendering. How much of a cost depends on the shaders in use, the number of vertices provided, and the textures being used.
Even for CPU time cost, the values are not clear. Take glDrawElements. If you changed the vertex attribute bindings before calling it, then it can take more CPU time than if you didn't. Similarly, if you changed uniform values in a program since you last used it, then rendering with that program may take longer. And so forth.
The main problem with assembling such a list is that it encourages premature optimization. If you have such a list, then users will be encouraged to take steps to avoid using functions that cost more. They may take too many steps along this route. No, it's better to just avoid the issue entirely and encourage users to actually profile their applications before optimizing them.
The relative costs of different OpenGL functions will depend heavily on the arguments to the function, the active OpenGL environment when they are called, and the GPU, drivers, and OS you're running on. There's really no good way to do a comparison like what you're describing -- your best bet is simply to test out the different possibilities and see what performs best for you.
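If you do want hard numbers for your own workload rather than a generic table, here is a minimal sketch using a GL timer query (GL 3.3 / ARB_timer_query); note that reading the result immediately stalls until the GPU has finished, which is acceptable for coarse profiling:

// Rough GPU timing of a block of GL calls with a timer query.
GLuint query = 0;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
// ... the draw calls you want to measure ...
glEndQuery(GL_TIME_ELAPSED);

// Blocks until the result is available.
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);

glDeleteQueries(1, &query);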

C/C++ bitfields versus bitwise operators to single out bits, which is faster, better, more portable?

I need to pack some bits in a byte in this fashion:
struct
{
    char bit0 : 1;
    char bit1 : 1;
} a;
if( a.bit1 ) /* etc */
or:
if( a & 0x2 ) /* etc */
From the source code clarity it's pretty obvious to me that bitfields are neater. But which option is faster? I know the speed difference won't be much, if any, but as I can use either of them, if one is faster, so much the better.
On the other hand, I've read that bitfields are not guaranteed to arrange bits in the same order across platforms, and I want my code to be portable.
Notes: If you plan to answer 'Profile' ok, I will, but as I'm lazy, if someone already has the answer, much better.
The code may be wrong, you can correct me if you want, but remember what the point to this question is and please try and answer it too.
Bitfields make the code much clearer if they are used appropriately. I would use bitfields as a space saving device only. One common place I've seen them used is in compilers: Often type or symbol information consists of a bunch of true/false flags. Bitfields are ideal here since a typical program will have many thousands of these nodes created when it is compiled.
I wouldn't use bitfields to do a common embedded programming job: reading and writing device registers. I prefer using shifts and masks here because you get exactly the bits the documentation tells you you need, and you don't have to worry about differences between various compilers' implementations of bitfields.
As for speed, a good compiler will give the same code for bitfields that masking will.
I would rather use the second example, in preference, for maximum portability. As Neil Butterworth pointed out, using bitfields ties you to the native processor. Ok, think about this: what happens if Intel's x86 went out of business tomorrow? The code would be stuck, which means having to re-implement the bitfields for another processor, say a RISC chip.
You have to look at the bigger picture and ask how OpenBSD managed to port their BSD system to so many platforms using one codebase. Ok, I'll admit that is a bit over the top, debatable and subjective, but realistically speaking, if you want to port the code to another platform, the way to do it is by using the second example from your question.
Not only that, compilers for different platforms have their own ways of padding and aligning bitfields for the processor the compiler targets. And furthermore, what about the endianness of the processor?
Never rely on bitfields as a magic bullet. If you want speed on one specific processor and have no intention of porting, then feel free to use bitfields. You cannot have both!
C bitfields were stillborn from the moment they were invented - for reasons unknown. People didn't like them and used bitwise operators instead. You have to expect other developers not to understand C bitfield code.
With regard to which is faster: Irrelevant. Any optimizing compiler (which means practically all) will make the code do the same thing in whatever notation. It's a common misconception of C programmers that the compiler will just search-and-replace keywords into assembly. Modern compilers use the source code as a blueprint to what shall be achieved and then emit code that often looks very different but achieves the intended result.
The first one is explicit, and whatever the speed, the second expression is error-prone, because any change to your struct might make the second expression wrong.
So use the first.
If you want portability, avoid bitfields. And if you are interested in performance of specific code, there is no alternative to writing your own tests. Remember, bitfields will be using the processor's bitwise instructions under the hood.
I think a C programmer would tend toward the second option, using bit masks and logic operations to deduce the value of each bit. Rather than having the code littered with hex values, enumerations would be set up, or, usually when more complex operations are involved, macros to get/set specific bits. I've heard on the grapevine that struct-implemented bitfields are slower.
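For instance (all names here are illustrative), the mask style with named constants next to the bitfield version:

// Bitfield version: the compiler generates the masking for you.
struct Flags {
    unsigned char ready : 1;
    unsigned char error : 1;
};

// Mask version: you write the masking yourself, but the bit positions are
// exactly the ones you spell out.
enum : unsigned char {
    FLAG_READY = 1u << 0,
    FLAG_ERROR = 1u << 1
};

inline bool has_error(unsigned char flags) { return (flags & FLAG_ERROR) != 0; }
inline void set_error(unsigned char& flags) { flags |= FLAG_ERROR; }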
Don't read too much into the "non-portable bit-fields" warnings. There are two aspects of bit fields that are implementation-defined (the signedness and the layout) and one that is unspecified (the alignment of the allocation unit in which they are packed). If all you need is the packing effect, using them is as portable (provided you explicitly specify signed or unsigned where needed) as function calls, which also have unspecified properties.
Concerning the performance, profiling is the best answer you can get. In a perfect world, there would be no difference between the two forms. In practice there can be some, but I can think of as many reasons in one direction as in the other, and it can be very sensitive to context (the logically meaningless difference between unsigned and signed, for instance), so measure in context...
To sum up, the difference is mostly a style difference in cases where you truly have the choice (i.e. not when the precise layout matters). In those cases it is an optimization in size, not in speed, so I'd tend to write the code without it first and add it afterwards if needed. At that point bit fields are the obvious choice: the modifications required are the smallest ones that achieve the result, and they are contained in the single place of definition instead of being spread over all the places of use.