Loading multi-component bytes into shader in Vulkan - glsl

Vulkan allows you to specify vertex attributes as multi-component byte formats, such as VK_FORMAT_R8G8B8_UINT. I am, however, unsure what input variable type I should use in my GLSL shader. Using an ivec3 produces an error, as I would expect.
Do I need to load them into a uint and then do bitwise operations to extract the values? What are the speed implications of this?
If I want to do these bitwise operations, how can I be sure they will be endian-independent? To my understanding, the first byte on my CPU side could be stored in the first or last byte of the integer on the GPU side.

There is nothing to "extract". You asked to pass 3 unsigned integer values per-vertex. That's what the format defines, and that's what the shader should receive. The fact that each unsigned integer value is 8 bits doesn't need to be reflected in your shader; only that they're unsigned integers and that there are 3 of them.
There are no endian issues; not unless you create them in your CPU code. The format specifies that each value of the attribute comes from an array of 3 8-bit values. The three components are read left-to-right, and that's the order in which the components are expected to be laid out in memory.
Bytes don't have endian problems. Endianness is only an issue when reading a single value that spans multiple bytes. You asked to read 3 bytes, so that's what it will do. And that's what the CPU should write.
BTW, you should avoid misaligned formats like this. Pad it out to four 8-bit integers rather than three.
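For illustration, here is a minimal sketch, assuming the padded four-component format suggested above (the location, binding and offset values are placeholders), of how the attribute description and the matching GLSL input line up:

#include <vulkan/vulkan.h>

// Vertex attribute delivered as four 8-bit unsigned integers; the matching
// GLSL declaration would simply be: layout(location = 1) in uvec4 inData;
VkVertexInputAttributeDescription attr{
    /*location*/ 1,                        // placeholder shader location
    /*binding*/  0,                        // placeholder vertex buffer binding
    /*format*/   VK_FORMAT_R8G8B8A8_UINT,  // four 8-bit unsigned components, padded from three
    /*offset*/   0,                        // byte offset of the data within the vertex
};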

Related

Why does QColor use 32-bit signed int to represent e.g. rgba values?

QColor can return rgba values of type int (32-bit signed integer). Why is that? The color values range from 0-255, don't they? Is there any situation where this might not be the case?
I'm considering implicitly casting each of the rgba values returned by QColor.red()/green()/blue()/alpha() to quint8. It seems to work, but I don't know if this will lead to problems in some cases. Any ideas?
I assume you are talking about QColor::rgba() which returns a QRgb.
QRgb is an alias for unsigned int. In these 32 bits all four channels are encoded as #AARRGGBB, 8 bits each (0-255, as you mentioned). So, a color like alpha=32, red=255, green=127, blue=0 would be 0x20FF7F00 (553615104 in decimal).
Now, regarding your question about casting to quint8, there should be no problem, since each channel is guaranteed to be in the range 0..255 (reference). In general, Qt usually uses int as a general-purpose integer and does not pay too much attention to the width of the data type, except in some specific situations (such as when it is necessary for a given memory access). So, do not worry about that.
Now, if these operations are done frequently in a high-performance context, think about retrieving the 32 bits once using QColor::rgba and then extracting the components from it. You can access the individual channels using bitwise operations, or through the convenience functions qAlpha, qRed, qBlue and qGreen.
For completeness, just to mention that the sibling QColor::rgb method returns the same structure, but with the alpha channel opaque (0xFF). You also have QColor::rgba64, which returns a QRgba64. It uses 16 bits per channel, for higher precision. There are 64-bit equivalents of qAlpha and friends, such as qAlpha64 and so on.
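A minimal sketch of that (the color matches the example value above; the quint8 narrowing is fine because each channel is 0..255):

#include <QColor>

// Fetch the packed 32-bit value once, then pull the channels out of it.
QColor color(255, 127, 0, 32);     // red, green, blue, alpha
QRgb packed = color.rgba();        // 0x20FF7F00, laid out as #AARRGGBB
quint8 a = qAlpha(packed);         // or (packed >> 24) & 0xFF
quint8 r = qRed(packed);           // or (packed >> 16) & 0xFF
quint8 g = qGreen(packed);         // or (packed >>  8) & 0xFF
quint8 b = qBlue(packed);          // or  packed        & 0xFF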

Are there any performance differences between representing a number using a (4 byte) `int` and a 4 element unsigned char array?

Assuming an int in C++ is represented by 4 bytes, and an unsigned char is represented by 1 byte, you could represent an int with an array of unsigned char with 4 elements, right?
My question is, are there any performance downsides to representing a number with an array of unsigned char? Like if you wanted to add two numbers together would it be just as fast to do int + int compared to adding each element in the array and dealing with carries manually?
This is just me trying to experiment and to practice working with bytes rather than some practical application.
There will be many performance downsides on any kind of manipulation using the 4-byte array. For example, take simple addition: almost any CPU these days will have a single instruction that adds two 32-bit integers, in one (maybe two) CPU cycle(s). To emulate that with your 4-byte array, you would need at least 4 separate CPU instructions.
Further, many CPUs actually work faster with 32- or 64-bit data than they do with 8-bit data - because their internal registers are optimized for 32- and 64-bit operands.
Let's scale your question up. Is there any performance difference between a single addition of two 16-byte variables compared to four separate additions of 4-byte variables? And here comes the concept of vector registers and vector instructions (MMX, SSE, AVX). It's pretty much the same story: SIMD is always faster, because there are literally fewer instructions to execute and the whole operation is done by dedicated hardware. On top of that, in your question you also have to take into account that modern CPUs don't work with 1-byte variables; they still process 32 or 64 bits at once anyway. So effectively you would do 4 individual additions using 4-byte registers, only to use a single low byte each time and then manually handle the carry bit. Yeah, that will be very slow.
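To make that concrete, here is a small sketch (names are made up) contrasting a plain int addition with the byte-array version and its manual carry handling:

// The int version compiles down to a single add instruction on virtually any CPU.
int add_ints(int a, int b) { return a + b; }

// The byte-array version (little-endian byte order here) needs a loop,
// a widening add per byte, and manual carry propagation.
void add_bytes(const unsigned char a[4], const unsigned char b[4], unsigned char out[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; ++i)
    {
        unsigned sum = static_cast<unsigned>(a[i]) + b[i] + carry;
        out[i] = static_cast<unsigned char>(sum & 0xFF);
        carry  = sum >> 8;   // carry into the next, more significant byte
    }
}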

GL_MAX_VERTEX_UNIFORM_COMPONENTS and component sizes

As far as I understand glGet() with GL_MAX_VERTEX_UNIFORM_COMPONENTS returns the maximum number of available uniform components.
Is there any indication of how large these components can be (1 byte? 4 bytes?)? Can I address more than GL_MAX_VERTEX_UNIFORM_COMPONENTS components if the components are used with low precision?
My question now is: Is there any indication of how large these components can be (1 byte? 4 bytes?)?
No. A component is just a component of a vector, no matter the data type.
Can I address more than GL_MAX_VERTEX_UNIFORM_COMPONENTS components if the components are used with low precision?
No.
You might be able to manually pack multiple data elements into a component, for example 4 bytes or 2 shorts into one 32-bit integer (assuming your implementation supports 32-bit integers; OpenGL ES 2.0 implementations are not required to). Modern GLSL also has functions like packHalf2x16 and unpackHalf2x16, so you can store two half-precision floats in one 32-bit uint component.
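A rough sketch of the CPU-side packing (the shader would do the mirror-image unpacking with bit shifts, or with unpackHalf2x16 if the two halves are half-precision floats):

#include <cstdint>

// Pack two 16-bit values into one 32-bit uniform component.
uint32_t pack2x16(uint16_t lo, uint16_t hi)
{
    return static_cast<uint32_t>(lo) | (static_cast<uint32_t>(hi) << 16);
}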
Another option to consider (alternatively or additionally to manual packing) is using Uniform Buffer Objects, which allow you to supply larger amounts of uniform data.
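A rough sketch of that route (the Params struct, binding point and GL loader are assumptions; the matching GLSL block would be declared with layout(std140) and bound to the same binding point):

// Assumes a loader header providing the GL 3.1+ UBO entry points.
struct Params { float mvp[16]; float tint[4]; };   // hypothetical contents: a mat4 and a vec4 in std140

GLuint create_params_ubo(const Params& params)
{
    GLuint ubo = 0;
    glGenBuffers(1, &ubo);
    glBindBuffer(GL_UNIFORM_BUFFER, ubo);
    glBufferData(GL_UNIFORM_BUFFER, sizeof(Params), &params, GL_STATIC_DRAW);
    glBindBufferBase(GL_UNIFORM_BUFFER, 0, ubo);   // attach to binding point 0
    return ubo;
}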

Clearing a double-precision buffer in OpenGL

Is there a fast way to clear an OpenGL buffer with a double-precision data type or set a default value with an API call to avoid using a compute shader?
For half- and single-precision types, glClearBufferData/glClearNamedBufferData can be used, but it appears that there is no internal format enum for 64-bit types, which makes the switch from single- to double-precision data in scientific computing applications more cumbersome. Or am I missing an extension?
I am looking for a solution that works with OpenGL 4.6, Nvidia-specific extensions are fine.
At the end of the day, a "double" is just a way of interpreting 64 bits of data. Your goal is to get the right 64 bits into your buffer.
As far as buffer clearing is concerned, the image format and pixel transfer parameters are just an explanation of how to interpret the data you pass. If the internal format of the clearing operation is GL_RG32UI, then each "pixel" in the buffer is 64-bits of data.
Given that, all you need to do is to get the clearing function to take a block of 64-bits and copy it exactly as you provide it. To do this, you have to use the right pixel transfer parameters.
See, pixel transfer operations can perform data conversion, taking the data pointer you pass and converting it to match the internal format. You don't want that; you want a direct copy. So your pixel transfer parameters need to exactly match the internal format. Which is quite easy.
A format of GL_RG_INTEGER represents a two-component pixel that stores integer data, in red-green order. And a type of GL_UNSIGNED_INT means that each component is a 32-bit unsigned integer. This exactly matches the internal format of GL_RG32UI, so the copying algorithm won't mess with the bytes of your data.
So, given some 64-bit double value in C or C++, clearing a buffer to that double ought to be as simple as:
void clear_buffer_to_double(GLuint buffer, double dbl)
{
    // GL_RG32UI with GL_RG_INTEGER/GL_UNSIGNED_INT is an exact match, so the
    // 64 bits of 'dbl' are copied into every element without any conversion.
    glClearNamedBufferData(buffer, GL_RG32UI, GL_RG_INTEGER, GL_UNSIGNED_INT, &dbl);
}
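For example (the helper and buffer names here are hypothetical, assuming the buffer is meant to hold an array of doubles), setting every element to 1.0 is then just:

// Create an immutable buffer of 'count' doubles and clear every element to 1.0.
GLuint make_ones_buffer(GLsizeiptr count)
{
    GLuint buf = 0;
    glCreateBuffers(1, &buf);
    glNamedBufferStorage(buf, count * sizeof(double), nullptr, 0);
    clear_buffer_to_double(buf, 1.0);
    return buf;
}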

How do I represent an LZW output in bytes?

I found an implementation of the LZW algorithm and I was wondering how I can represent its output, which is a list of ints, as a byte array.
I tried using one byte per value, but for long inputs the dictionary has more than 256 entries and thus I cannot convert.
Then I tried to add an extra byte to indicate how many bytes are used to store the values, but in this case I have to use 2 bytes for each value, which doesn't compress enough.
How can I optimize this?
As bits, not bytes. You just need a simple routine that writes an arbitrary number of bits to a stream of bytes. It simply keeps a one-byte buffer into which you put bits until you have eight. Then write that byte, clear the buffer, and start over. The process is reversed on the other side.
When you get to the end, just write out the last byte buffer, if it isn't empty, with the remaining bits set to zero.
You only need to figure out how many bits are required for each symbol at the current state of the compression. That same determination can be made on the other side when pulling bits from the stream.
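A minimal sketch of such a bit writer (the names and interface are made up; the reader mirrors it by pulling bits back out in the same order):

#include <cstdint>
#include <vector>

// Accumulates bits into bytes, least-significant bit first.
struct BitWriter
{
    std::vector<uint8_t> bytes;   // completed output bytes
    uint8_t buffer = 0;           // partially filled byte
    int filled = 0;               // number of bits already in 'buffer'

    // Write the 'count' low bits of 'value' to the stream.
    void writeBits(uint32_t value, int count)
    {
        for (int i = 0; i < count; ++i)
        {
            buffer |= static_cast<uint8_t>(((value >> i) & 1u) << filled);
            if (++filled == 8) { bytes.push_back(buffer); buffer = 0; filled = 0; }
        }
    }

    // Emit the last partial byte with the remaining bits left as zero.
    void flush()
    {
        if (filled > 0) { bytes.push_back(buffer); buffer = 0; filled = 0; }
    }
};

// Usage: writer.writeBits(code, bitsNeededAtCurrentDictionarySize); ... writer.flush();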
In his 1984 article on LZW, T. A. Welch did not actually state how to "encode codes", but described mapping "strings of input characters into fixed-length codes", continuing that "use of 12-bit codes is common". (Twelve-bit codes allow a bijective mapping between three octets and two codes.)
The BSD compress(1) command didn't follow that literally, but introduced a header, the interesting part being a specification of the maximum number of bits used to encode an LZW output code, allowing decompressors to size decompression tables appropriately or fail early and in a controlled way. Except for the very first, codes were encoded with just the integral number of bits necessary, starting with 9.
An alternative would be to use Arithmetic Coding, especially if using a model other than "every code is equally probable".