OpenGL optimised representation of textures

I have read in many places that one should avoid OpenGL textures whose texels are 3 bytes wide and should always use 4-byte, bus-aligned representations. Keeping that in mind, I have a couple of questions about the glTexImage2D API.
From the documentation, the API signature is:
void glTexImage2D(GLenum target, GLint level, GLint internalformat,
                  GLsizei width, GLsizei height, GLint border,
                  GLenum format, GLenum type, const GLvoid * data);
Am I correct in assuming that if I have an RGB image and I want an RGBA representation, it suffices to specify the internalformat parameter as GL_RGBA and the format parameter as GL_RGB? Would there be an internal conversion between the formats when generating the texture?
My second question is: what if I have grayscale data (so just one channel)? Is it OK to represent this as GL_RED, or is it also better to have a 4-byte representation for this?

I disagree with the recommendation you found to avoid RGB formats; I can't think of a good reason to avoid RGB textures. Some GPUs support them natively, many others do not. You have two scenarios:
1. The GPU does not support RGB textures. It will use a format that is largely equivalent to RGBA for storage, and ignore the A component during sampling.
2. The GPU supports RGB textures.
In scenario 1, you end up with pretty much the same thing as you get when specifying RGBA as the internal format. In scenario 2, you save 25% of memory (and the corresponding bandwidth) by actually using a RGB internal format.
Therefore, using RGB for the internal format is never worse, and can be better on some systems.
What you describe is completely legal in desktop OpenGL. The RGB data you provide will be expanded to RGBA by filling the A component with 1.0. This would not be the case in OpenGL ES, where you can only use a tightly controlled set of formats for each internal format, which mostly avoids format conversions during TexImage and TexSubImage operations.
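A minimal sketch of that case (a current GL context is assumed; width, height and rgbPixels are placeholders, not names from the question):
GLuint tex = 0;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);               // rows of 3-byte texels are often not 4-byte aligned
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA,              // internal format: RGBA on the GPU
             width, height, 0,
             GL_RGB, GL_UNSIGNED_BYTE, rgbPixels);    // client data: tightly packed RGB, A filled with 1.0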
It is generally beneficial to match the internal format and the format/type to avoid conversions. But even that is not as clear cut as it might seem. Say you compare loading RGB data or RGBA data into a texture with an internal RGBA format. Loading RGBA has a clear advantage by not using a format conversion. Since on the other hand the RGB data is smaller, loading it requires less memory bandwidth, and might cause less cache pollution.
Now, the memory bandwidth of modern computer systems is so high that you can't really saturate it with sequential access from a single core. So the option that avoids conversion is likely to be better. But it's going to be very platform and situation dependent. For example, if intermediate copies are needed, the smaller amount of data could win. Particularly if the actual RGB to RGBA expansion can be done as part of a copy performed by the GPU.
One thing I would definitely avoid is doing conversions in your own code. Say you get RGB data from somewhere, and you really do need to load it into a RGBA texture. Drivers have highly optimized code for these conversions, or they might even happen as part of a GPU blit. And it's always going to be better to perform the conversion as part of a copy, compared to your code creating another copy of the data.
Sounds confusing? These kinds of tradeoffs are very common when looking at performance. It's often the case that to get the optimal performance with OpenGL, you need to use a different code path for different GPUs/platforms.

Am I correct in assuming that if I have an RGB image and I want an RGBA representation, it suffices to specify the internalformat parameter as GL_RGBA and the format parameter as GL_RGB? Would there be an internal conversion between the formats when generating the texture?
This will work, and GL will assign a constant 1.0 to the alpha component for every texel. OpenGL is required to convert compatible image data into the GPU's native format during pixel transfer, and this includes things like adding extra image channels, converting from floating-point to fixed-point, swapping bytes for endian differences between CPU and GPU.
For best pixel transfer performance, you need to eliminate all of that extra work that GL would do. That means if you used GL_RGBA8 as your internal format your data type should be GL_UNSIGNED_BYTE and the pixels should be some variant of RGBA (often GL_BGRA is the fast path -- the data can be copied straight from CPU to GPU unaltered).
Storing only 3 components on the CPU side will obviously prevent a direct copy, and will make the driver do more work. Whether this matters really depends on how much and how frequently you transfer pixel data.
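A sketch of that fast path (placeholders again; whether GL_BGRA really is the zero-conversion path depends on the driver, so treat this as illustrative):
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8,
             width, height, 0,
             GL_BGRA, GL_UNSIGNED_BYTE, bgraPixels);  // 4-byte texels in the driver's preferred order, no repacking needed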
My second question is: what if I have grayscale data (so just one channel)? Is it OK to represent this as GL_RED, or is it also better to have a 4-byte representation for this?
GL_RED is not a representation of your data. It only tells GL which channels the pixel transfer data contains; you have to combine it with a data type (e.g. GL_UNSIGNED_BYTE) to make any sense of it.
GL_R8 would be a common internal format for a grayscale image, and that is perfectly fine. The rule of thumb you need to concern yourself with is actually that texel sizes need to be aligned to a power of two. So 1-, 2-, 4- and 8-byte image formats are fine. The oddball is 3, which happens when you try to use something like GL_RGB8 (the driver is going to have to pad that image format for alignment).
Unless you actually need 32 bits' worth of gradation (4.29 billion shades of gray!), stick with either GL_R8 (256 shades) or GL_R16 (65536 shades) for the internal format.
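A minimal sketch for the single-channel case (width, height and grayPixels are placeholders; the swizzle lines are only needed if you want the old GL_LUMINANCE-style broadcast when sampling):
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);                // 1-byte texels, rows may not be 4-byte aligned
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8,
             width, height, 0,
             GL_RED, GL_UNSIGNED_BYTE, grayPixels);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_G, GL_RED);   // replicate R into G and B when sampling
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_B, GL_RED);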

Related

OpenGL: alpha-to-coverage cross-fade

If using alpha-to-coverage without explicitly setting the sample mask from the shader (a GL 4.x hardware feature?), is the coverage mask for alpha value 'a' then guaranteed to be the bit-flip of the coverage mask for alpha value '1.f - a'?
Or in other words: if I render two objects in the same location, and the pixel alphas of the two objects sum up to 1.0, is it then guaranteed that all samples of the pixel get written to (assuming both objects fully cover the pixel)?
The reason why I ask is that I want to cross-fade two objects, and during the cross-fade each object should still properly depth-sort with respect to itself (without interacting with the depth values of the other object and without becoming 'see-through').
If not, how can I realize such a 'perfect' cross-fade in a single render pass?
The logic for alpha-to-coverage computation is required to have the same invariance and proportionality guarantees as GL_SAMPLE_COVERAGE (which allows you to specify a floating-point coverage value applied to all fragments in a given rendering command).
However, said guarantees are not exactly specific:
It is intended that the number of 1’s in this value be proportional to the sample coverage value, with all 1’s corresponding to a value of 1.0 and all 0’s corresponding to 0.0.
Note the use of the word "intended" rather than "required". The spec is deliberately super-fuzzy on all of this.
Even the invariance is really fuzzy:
The algorithm can and probably should be different at different pixel locations. If it does differ, it should be defined relative to window, not screen, coordinates, so that rendering results are invariant with respect to window position.
Again, note the word "should". There are no actual requirements here.
So basically, the answer to all of your questions are "the OpenGL specification provides no guarantees for that".
That being said, the general thrust of your question suggests that you're trying to (ab)use multisampling to do cross-fading between two overlapping things without having to do a render-to-texture operation. That's just not going to work well, even if the standard actually guaranteed something about the alpha-to-coverage behavior.
Basically, what you're trying to do is multisample-based, dither-style transparency. But as with standard dithering methods, the quality is based entirely on the number of samples. A 16x multisample buffer (which is a huge amount of multisampling) would only give you an effective 16 levels of cross-fade. This would make any kind of animated fading effect not smooth at all.
And the cost of doing 16x multisampling is going to be substantially greater than the cost of doing render-to-texture cross-fading. Both in terms of rendering time and memory overhead (16x multisample buffers are gigantic).
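For reference, enabling the feature under discussion is only a state toggle (a minimal sketch; it assumes you are rendering into a multisampled framebuffer):
glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE);
// draw the fading geometry; each fragment's alpha is converted into a per-sample
// coverage mask, so with N samples you get at most roughly N fade levels
glDisable(GL_SAMPLE_ALPHA_TO_COVERAGE);   // back to normal opaque/blended rendering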
If not, how can I realize such a 'perfect' cross-fade in a single render pass?
You can't; not in the general case. Rasterizers accumulate values, with new pixels doing math against the accumulated value of all of the prior values. You want to have an operation do math against a specific previous operation, then combine those results and blend against the rest of the previous operations.
That's simply not the kind of math a rasterizer does.

High performance OpenCV matrix conversion from 16UC3 to 8UC3

I have an OpenCV CV_16UC3 matrix in which only the lower 8 bits per channel are occupied. I want to create a CV_8UC3 matrix from it. Currently I use this method:
cv::Mat mat8uc3_rgb(imgHeight, imgWidth, CV_8UC3);   // cv::Mat takes (rows, cols, type)
mat16uc3_rgb.convertTo(mat8uc3_rgb, CV_8UC3);
This has the desired result, but I wonder if it can be faster or more performant somehow.
Edit:
The entire processing chain consists of only 4 sub-steps (per-frame computation time determined by QueryPerformanceCounter measurements on a video scene):
1. Mount the raw byte buffer in an OpenCV Mat:
   cv::Mat mat16uc1_bayer(imgHeight, RawImageWidth, CV_16UC1, (uint8*)payload);
2. Demosaicing:
   cv::cvtColor(mat16uc1_bayer, mat16uc3_rgb, cv::COLOR_BayerGR2BGR);
   needs 0.008808 s
3. Pixel shift (only 12 of the 16 bits are occupied, but we only need 8 of them):
   uses OpenCV's parallel pixel access via mat16uc3_rgb.forEach<>
   needs 0.004927 s
4. Conversion from CV_16UC3 to CV_8UC3:
   mat16uc3_rgb.convertTo(mat8uc3_rgb, CV_8UC3);
   needs 0.006913 s
I think I won't be able to do without wrapping the raw buffer in a cv::Mat or without the demosaicing. The pixel shift probably can't be accelerated any further (the parallelized forEach() is already used there). I had hoped that for the conversion from CV_16UC3 to CV_8UC3 an update of the matrix header info or something similar would be possible, because the pixel data is already correct and doesn't need to be scaled any further.
I think you can safely assume that cv::Mat::convertTo is the fastest possible implementation of that operation.
Seeing that you are going from one element type (16-bit) to another (8-bit), it will likely not be a zero-cost operation. A memory copy is required for the rearranging.
If you are designing a very high-performance system, you should do an in-depth analysis of your bottlenecks and redesign your system to minimize them. Ask yourself: is this conversion really required at this point? Can I solve it by making a custom function that integrates multiple operations in one? Can I use CPU parallelism extensions, multithreading or GPU acceleration? Etc.
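As one concrete instance of "integrating multiple operations in one" for the pipeline above: cv::Mat::convertTo accepts a scale factor, so the right-shift by 4 (keeping the top 8 of the 12 occupied bits) and the 16U-to-8U narrowing can happen in a single pass over the data, replacing steps 3 and 4. This is only a sketch and assumes the 12 valid bits sit in the low bits of each 16-bit channel; shiftAndNarrow is a made-up helper name:
#include <opencv2/opencv.hpp>

cv::Mat shiftAndNarrow(const cv::Mat& mat16uc3_rgb)
{
    cv::Mat mat8uc3_rgb;
    // Scaling by 1/16 is roughly equivalent to >> 4 (convertTo rounds rather than
    // truncates) and the result is saturated to 8 bits per channel.
    mat16uc3_rgb.convertTo(mat8uc3_rgb, CV_8UC3, 1.0 / 16.0);
    return mat8uc3_rgb;
}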

Supplying data to an immutable texture storage

Is there a way to directly (i.e., not a copy or a fill) supply data to a texture with immutable storage that isn't glTexSubImage*? I know there might not be an actual need for a separate function to fulfill this role, but I do wonder why you can't specify the data in the same line you allocate your memory (in the same manner as immutable buffer objects).
I do wonder why you can't specify the data in the same line you allocate your memory (in the same manner as immutable buffer objects).
Textures are far more complex than buffer objects.
The immutable storage APIs for textures allocate all of the specified mipmap levels for that texture. However, the various pixel transfer functions only ever upload, at most, a single mipmap level's worth of pixel data in a single operation. There is no provision in any pixel transfer operation to transfer more than a single mipmap level of data. Indeed, the process of pixel transfer only makes sense for a single image of a given dimensionality (array textures are treated as a higher-dimensional image). Mipmap levels change their size from level to level, which changes the meaning of the pixel transfer operation, particularly with regard to some of the pixel transfer parameters like sub-image selectors.
So if you're uploading to a texture with multiple mipmaps, you're going to need to make multiple calls anyway. So what's one more?
Furthermore, note that glTexImage* and glCompressedTexImage* take different parameters. The former takes pixel data via pixel transfer functionality, and the latter takes pre-compressed data. But they both allocate storage. If you're going to make glTexStorage* more analogous to glTexImage*, then you have to also add glCompressedTexStorage*. So now you have another series of functions, and the only difference is what kind of upload they do.
By contrast, buffers only contain a single array of bytes, and there's only one way to upload to them.
Overall, it's better to just use the existing SubImage infrastructure for uploading, and have allocation be entirely separate from that.
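A minimal sketch of that split, assuming a current GL 4.2+ context; tex, levels, width, height and pixelsForLevel() are placeholders:
glBindTexture(GL_TEXTURE_2D, tex);
glTexStorage2D(GL_TEXTURE_2D, levels, GL_RGBA8, width, height);   // allocate the whole mip chain, no data yet
for (GLint level = 0; level < levels; ++level)
{
    GLsizei w = (width  >> level) > 0 ? (width  >> level) : 1;
    GLsizei h = (height >> level) > 0 ? (height >> level) : 1;
    glTexSubImage2D(GL_TEXTURE_2D, level, 0, 0, w, h,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixelsForLevel(level));   // upload one level at a time
}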

What is the purpose of OpenGL texture buffer objects?

We use buffer objects to reduce copy operations between CPU and GPU, and with texture buffer objects we can change the use of a buffer object from vertex data to texture data. Is there any other advantage of texture buffer objects? Also, they do not allow filtering; is there any disadvantage to this?
A buffer texture is similar to a 1D texture but has a backing data store that's not part of the texture object (in contrast to any other texture type) but is realized with an actual buffer object bound to GL_TEXTURE_BUFFER. Using a buffer texture has several implications and, AFAIK, one use-case that can't be mapped to any other type of texture.
Note that a buffer texture is not a buffer object - a buffer texture is merely associated with a buffer object using glTexBuffer.
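A minimal sketch of that association (a current GL context is assumed; the size and usage hint are arbitrary placeholders):
GLuint buf = 0, tex = 0;

glGenBuffers(1, &buf);
glBindBuffer(GL_TEXTURE_BUFFER, buf);
glBufferData(GL_TEXTURE_BUFFER, 4 * 1024 * 1024, nullptr, GL_STATIC_DRAW);   // 4 MiB backing store

glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_BUFFER, tex);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA8, buf);    // interpret the buffer contents as RGBA8 texels

GLint maxTexels = 0;
glGetIntegerv(GL_MAX_TEXTURE_BUFFER_SIZE, &maxTexels);   // implementation-dependent limit discussed below

// In GLSL the texture is declared as a samplerBuffer and read with texelFetch(sampler, index).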
By comparison, buffer textures can be huge. Table 23.53 and following of the core OpenGL 4.4 spec define a minimum maximum (i.e. the minimal value that implementations must provide) number of texels, MAX_TEXTURE_BUFFER_SIZE. The potential number of texels stored in your buffer object is computed as follows (as found in GL_ARB_texture_buffer_object):
floor(<buffer_size> / (<components> * sizeof(<base_type>)))
The resulting value clamped to MAX_TEXTURE_BUFFER_SIZE is the number of addressable texels.
Example:
You have a buffer object storing 4MiB of data. What you want is a buffer texture for addressing RGBA texels, so you choose an internal format RGBA8. The addressable number of texels is then
floor(4MiB / (4 * sizeof(UNSIGNED_BYTE))) == 1024^2 texels == 2^20 texels
If your implementation supports this number, you can address the full range of values in your buffer object. The above isn't too impressive and can simply be achieved with any other texture on current implementations. However, the machine on which I'm writing this answer supports 2^28 == 268435456 texels.
With OpenGL 4.4 (and 4.3, and possibly earlier 4.x versions), the minimum required MAX_TEXTURE_SIZE is 2^14 texels for a 1D texture, while the minimum MAX_TEXTURE_BUFFER_SIZE is 2^16 texels, so a buffer texture can still be 4 times as large. On my local machine I can allocate a 2GiB buffer texture (even larger, actually), but only a 1GiB 1D texture when using RGBA32F texels.
A use-case for buffer textures is random (and atomic, if desired) read-/write-access (the latter via image load/store) to a large data store inside a shader. Yes, you can do random read-access on arrays of uniforms inside one or multiple blocks, but it gets very tedious if you have to process a lot of data and have to work with multiple blocks, and even then, looking at the maximum combined size of all uniform components (where a single float component has a size of 4 bytes) in all uniform blocks for a single stage,
MAX_(stage)_UNIFORM_BLOCKS *
MAX_UNIFORM_BLOCK_SIZE +
MAX_(stage)_UNIFORM_COMPONENTS * 4
isn't really a lot of space to work with in a shader stage (depending on how large your implementation allows the above number to be).
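For a rough sense of scale, plugging in values an implementation might report (purely illustrative, check your own limits): 12 fragment-stage uniform blocks * 16384 bytes per block + 1024 default-block components * 4 bytes = 200704 bytes, i.e. a bit under 200 KiB, versus the hundreds of megabytes that can be addressed through a buffer texture on current hardware.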
An important difference between textures and buffer textures is that the data store, as a regular buffer object, can be used in operations where a texture simply does not work. The extension mentions:
The use of a buffer object to provide storage allows the texture data to
be specified in a number of different ways: via buffer object loads
(BufferData), direct CPU writes (MapBuffer), framebuffer readbacks
(EXT_pixel_buffer_object extension). A buffer object can also be loaded
by transform feedback (NV_transform_feedback extension), which captures
selected transformed attributes of vertices processed by the GL. Several
of these mechanisms do not require an extra data copy, which would be
required when using conventional TexImage-like entry points.
An implication of using buffer textures is that look-ups inside a shader can only be done via texelFetch. Buffer textures also aren't mip-mapped and, as you already mentioned, during fetches there is no filtering.
Addendum:
Since OpenGL 4.3, we have what is called a Shader Storage Buffer. These too provide random (atomic) read-/write-access to a large data store, but don't need to be accessed with texelFetch() or image load/store functions as is the case for buffer textures. Using buffer textures also implies having to deal with gvec4 return values, both with texelFetch() and imageLoad() / imageStore(). This becomes very tedious as soon as you want to work with structures (or arrays thereof) and you don't want to think of some stupid packing scheme using multiple instances of vec4 or using multiple buffer textures to achieve something similar. With a buffer accessed as shader storage, you can simply index into the data store and pull one or more instances of some struct {} directly from the buffer.
Also, since they are very similar to uniform blocks, using them should be fairly straightforward - if you know how to use uniform buffers, it's not a long way to learn how to use shader storage buffers.
It's also absolutely worth browsing the Issues section of the corresponding ARB extension.
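To make the contrast with the gvec4-only access of buffer textures concrete, here is a minimal sketch of setting up a shader storage buffer; the Particle struct, binding index, dataSizeInBytes and particleData are made up for illustration:
// GLSL side (for reference):
//   struct Particle { vec4 position; vec4 velocity; };
//   layout(std430, binding = 0) buffer Particles { Particle particles[]; };
//   ... particles[index].position ...          // direct struct indexing, no gvec4 unpacking

GLuint ssbo = 0;
glGenBuffers(1, &ssbo);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, ssbo);
glBufferData(GL_SHADER_STORAGE_BUFFER, dataSizeInBytes, particleData, GL_DYNAMIC_COPY);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, ssbo);   // matches binding = 0 in the shader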
Performance Implications
Daniel Rakos did some performance analysis years ago, both as a comparison of uniform buffers and buffer textures, and also on a more general note based on information from AMD's OpenCL programming guide. There is now a very recent version, specifically targeting OpenCL optimization on AMD platforms.
There are many factors influencing performance:
access patterns and resulting caching behavior
cache line sizes and memory layout
what kind of memory is accessed (registers, local, global, L1/L2 etc.) and its respective memory bandwidth
how well memory fetching latency is hidden by doing something else in the meantime
what kind of hardware you're on, i.e. a dedicated graphics card with dedicated memory or some unified memory architecture
etc., etc.
As always when worrying about performance: implement something that works and see if that solution is fast enough for your needs. Otherwise, implement two or more approaches to solving the problem, profile them and compare.
Also, vendor-specific guides can offer a great deal of insight. The above-mentioned OpenCL user and optimization guides provide a high-level architectural perspective and specific hints on how to optimize your CL kernels - stuff that's also relevant when developing shaders.
One use case I have found was to store per-primitive attributes (accessed in the fragment shader with the help of gl_PrimitiveID) while still maintaining unique vertices in the indexed mesh.

What does glTexStorage do?

The documentation indicates that this "allocates" storage for a texture and its levels. The pseudocode provided seems to indicate that this is for the mipmap levels.
How does usage of glTexStorage relate to glGenerateMipmap? glTexStorage seems to "lock" a texture's storage size. It seems to me this would only serve to make things less flexible. Are there meant to be performance gains to be had here?
It's pretty new and only available in 4.2 so I'm going to try to avoid using it, but I'm confused because its description makes it sound kind of important.
How is storage for textures managed in earlier GL versions? When I call glTexImage2D, I effectively erase and free the storage previously associated with the texture handle, yes? And generating mipmaps automatically handles storage for me as well.
I remember using the old-school glTexSubImage2D method to implement OpenGL 2-style render-to-texture to do some post-process effects in my previous engine experiment. It makes sense that glTexStorage will bring about a more sensible way of managing texture-related resources, now that we have better ways to do RTT.
To understand what glTexStorage does, you need to understand what the glTexImage* functions do.
glTexImage2D does three things:
It allocates OpenGL storage for a specific mipmap level, with a specific size. For example, you could allocate a 64x64 2D image as mipmap level 2.
It sets the internal format for the mipmap level.
It uploads pixel data to the texture. The last step is optional; if you pass NULL as the pointer value (and no buffer object is bound to GL_PIXEL_UNPACK_BUFFER), then no pixel transfer takes place.
Creating a mipmapped texture by hand requires a sequence of glTexImage calls, one for each mipmap level, and each mipmap level needs to be the proper size based on the previous level's size, as sketched below.
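A sketch of that sequence (baseWidth, baseHeight, levels and pixelsForLevel() are placeholders; each level is half the previous one, rounded down, minimum 1):
for (GLint level = 0; level < levels; ++level)
{
    GLsizei w = (baseWidth  >> level) > 0 ? (baseWidth  >> level) : 1;
    GLsizei h = (baseHeight >> level) > 0 ? (baseHeight >> level) : 1;
    glTexImage2D(GL_TEXTURE_2D, level, GL_RGBA8, w, h, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, pixelsForLevel(level));   // each level allocated separately
}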
Now, if you look at section 3.9.14 of the GL 4.2 specification, you will see two pages of rules that a texture object must follow to be "complete". A texture object that is incomplete cannot be sampled from.
Among those rules are things like, "mipmaps must have the appropriate size". Take the example I gave above: a 64x64 2D image, which is mipmap level 2. It would be perfectly valid OpenGL code to allocate a mipmap level 1 that used a 256x256 texture. Or a 16x16. Or a 10x345. All of those would be perfectly functional as far as source code is concerned. Obviously they would produce nonsense as a texture, since the texture would be incomplete.
Again consider the 64x64 mipmap 2. I create that as my first image. Now, I could create a 128x128 mipmap 1. But I could also create a 128x129 mipmap 1. Both of these are completely consistent with the 64x64 mipmap level 2 (mipmap sizes always round down). While they are both consistent, they're also both different sizes. If a driver has to allocate the full mipmap chain at once (which is entirely possible), which size does it allocate? It doesn't know. It can't know until you explicitly allocate the rest.
Here's another problem. Let's say I have a texture with a full mipmap chain. It is completely texture complete, according to the rules. And then I call glTexImage2D on it again. Now what? I could accidentally change the internal format. Each mipmap level has a separate internal format; if they don't all agree, then the texture is incomplete. I could accidentally change the size of the texture, again making the texture incomplete.
glTexStorage prevents all of these possible errors. It creates all the mipmaps you want up-front, given the base level's size. It allocates all of those mipmaps with the same image format, so you can't screw that up. It makes the texture immutable, so you can't come along and try to break the texture with a bad glTexImage2D call. And it prevents other errors I didn't even bother to cover.
The question isn't "what does glTexStorage do?" The question is "why did we go so long without it."
glTexStorage has no relation to glGenerateMipmap; they are orthogonal functionality. glTexStorage does exactly what it says: it allocates texture storage space. It does not fill that space with anything. So it creates a texture with a given size filled with uninitialized data. Much like glRenderbufferStorage allocates a renderbuffer with a given size filled with uninitialized data. If you use glTexStorage, you need to upload data with glTexSubImage (since glTexImage is forbidden on an immutable texture).
glTexStorage creates space for mipmaps. glGenerateMipmap creates the mipmap data itself (the smaller versions of the base layer). It can also create space for mipmaps if that space doesn't already exist. They're used for two different things.
Before calling glGenerateMipmap, the base mipmap level must be established (with either mutable or immutable storage). So you can also just use glTexImage2D + glGenerateMipmap, which is simpler!
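A sketch of that simpler path (tex, width, height and pixels are placeholders):
glBindTexture(GL_TEXTURE_2D, tex);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixels);   // establish the base level
glGenerateMipmap(GL_TEXTURE_2D);                   // derive and fill the smaller levels from level 0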