Vulkan unpack uvec4 from integer - glsl

I've observed some strange behavior. I have an array of unsigned 32-bit integers, where each integer encodes 4 values, each one byte in size. I'd like to pass such a buffer to the vertex shader as
layout (location = 0) in uvec4 coords;
To achieve this, I use a VkVertexInputAttributeDescription with format set to VK_FORMAT_R8G8B8A8_UINT. I have defined this handy struct:
struct PackedUVec4{
    unsigned char x;
    unsigned char y;
    unsigned char z;
    unsigned char w;
};
Then I build my buffer as a PackedUVec4[] and send it to the GPU. However, what I have observed is that the order of the bytes gets swapped. For example, if I have
layout (location = 0) in uvec4 coords;
void main(){
    debugPrintfEXT("%v4d", coords);
}
it seems to print the correct output. But if I change the format to VK_FORMAT_R32_UINT and try to run
layout (location = 0) in uint coords;
void main(){
    uint w = coords & 255u;
    uint z = coords/256 & 255u;
    uint y = coords/(256*256) & 255u;
    uint x = coords/(256*256*256) & 255u;
    debugPrintfEXT("%v4d", uvec4(x,y,z,w));
}
I get the bytes in opposite order. Do the vector types use different endianness?

The problem is not with Vulkan, but with your code's interpretation of what's going on. Both sending and receiving.
Recall that endianness is about the (potential) difference between the logical location of a byte within a multi-byte value and the relative address of a byte within a multi-byte value. In little endian, if you write a four-byte value to memory, the first byte will be the least significant byte of the value.
Endianness applies to both reading and writing, but only when reading/writing multi-byte values as multi-byte values. Your PackedUVec4 is not a multi-byte value; it's a struct containing bytes with a specific layout. Therefore, if you write to the x component of a PackedUVec4, you are writing to the first byte of that structure, regardless of your CPU's endian.
When you told Vulkan to read this data as a single 4-byte value (VK_FORMAT_R32_UINT), it does so as defined by your CPU's endian. But your code didn't generate that data in accord with your CPU's endian; it generated it in terms of the layout of a PackedUVec4. So the first byte in memory is x. If the GPU reads those 4 bytes as a little endian 4-byte value, then the first byte will map to the least significant byte of the 4-byte value.
But your code that manually decodes the data is decoding it wrong. It expects the least significant byte to be w.
If you want your code to be endian-independent, then you need the GPU to read the data as 4 individual bytes, in the order stored in memory. Which is what the VK_FORMAT_R8G8B8A8_UINT represents. If you want the GPU to read it as an endian-based ordering within a single 32-bit integer, then it needs to be written that way by the CPU.
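To make that concrete, here is a minimal CPU-side sketch (my own illustration; pack_u32 is a made-up name, not from the question) contrasting the two ways of producing the data:

#include <cstdint>

// Byte-ordered layout: the first byte in memory feeds .x of the uvec4
// when the attribute is read as VK_FORMAT_R8G8B8A8_UINT, independent of
// CPU endianness.
struct PackedUVec4 {
    unsigned char x, y, z, w;
};

// Value-ordered packing for VK_FORMAT_R32_UINT: compose the 32-bit value
// so the shader's divisions/shifts recover the components. This matches
// the shader's decode as long as the CPU and GPU agree on endianness
// (both little-endian in practice).
std::uint32_t pack_u32(std::uint32_t x, std::uint32_t y, std::uint32_t z, std::uint32_t w) {
    return (x << 24) | (y << 16) | (z << 8) | w;
}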

Related

Copying big endian float data directly into a vector<float> and byte swapping in place. Is it safe?

I'd like to be able to copy big endian float arrays directly from an unaligned network buffer into a std::vector<float> and perform the byte swapping back to host order "in place", without involving an intermediate std::vector<uint32_t>. Is this even safe? I'm worried that the big endian float data may accidentally be interpreted as NaNs and trigger unexpected behavior. Is this a valid concern?
For the purposes of this question, assume that the host machine receiving the data is little endian.
Here's some code that demonstrates what I'm trying to do:
std::vector<float> source{1.0f, 2.0f, 3.0f, 4.0f};
std::size_t number_count = source.size();
// Simulate big-endian float values being received from network and stored
// in a byte buffer. A temporary uint32_t vector is used to transform the
// source data to network byte order (big endian) before being copied
// to a byte buffer.
std::vector<uint32_t> temp(number_count, 0);
std::size_t byte_length = number_count * sizeof(float);
std::memcpy(temp.data(), source.data(), byte_length);
for (uint32_t& datum: temp)
    datum = ::htonl(datum);
std::vector<uint8_t> buffer(byte_length, 0);
std::memcpy(buffer.data(), temp.data(), byte_length);
// buffer now contains the big endian float data, and is not aligned at word boundaries
// Copy the received network buffer data directly into the destination float vector
std::vector<float> numbers(number_count, 0.0f);
std::memcpy(numbers.data(), buffer.data(), byte_length); // IS THIS SAFE??
// Perform the byte swap back to host order (little endian) in place,
// to avoid needing to allocate an intermediate uint32_t vector.
auto ptr = reinterpret_cast<uint8_t*>(numbers.data());
for (size_t i=0; i<number_count; ++i)
{
    // IS THIS SAFE??
    uint32_t datum;
    std::memcpy(&datum, ptr, sizeof(datum));
    datum = ::ntohl(datum);
    std::memcpy(ptr, &datum, sizeof(datum));
    ptr += sizeof(datum);
}
assert(numbers == source);
Note the two "IS THIS SAFE??" comments above.
Motivation: I'm writing a CBOR serialization library with support for typed arrays. CBOR allows typed arrays to be transmitted as either big endian or little endian.
EDIT: Replaced illegal reinterpret_cast<uint32_t*> type punning in endian swap loop with memcpy.
After your edit:
Regarding auto datum = reinterpret_cast<uint32_t*>(numbers.data()): this is not allowed in C++. One can only safely type-pun to uint8_t, and only if CHAR_BIT == 8; more precisely, this type-punning exception holds only for the char types.
Old answer:
Below is for the question before the edit (the one with bit_cast).
This is safe, provided sizeof(float) == sizeof(uint32_t)
Don't worry about signaling NaNs. Floating-point exceptions are usually disabled, and even when they are enabled, they are only raised when a signaling NaN is actually used in an arithmetic operation; move instructions do not generate exceptions.
Accessing the vector elements via the data() pointer is supported (for both reading and writing): vector is guaranteed to have contiguous storage.
But why aren't you doing it all in a single loop, without the temp buffers?
Just have the float vector (input or output) and the data buffer (a uint8_t vector).
For sending, just iterate over the float input vector; for each element, perform the byte swap and write the 4 bytes to the data buffer, one element at a time. Then you do not need any intermediate buffers, and it will probably not be any slower. For receiving, do the reverse.
Use std::bit_cast for conversion of float from/to std::array<uint8_t,4>. This would be the "correct" way in C++20 (you can't use C arrays directly with bit_cast).
With this approach you do not need to invoke ntohl, just copy the bytes in correct order from/to buffer.
ntohl() will probably interpret the data as integers (Network TO Host Long). But to be sure, I recommend byte-swapping first using only integer operations, then copying the buffer into a float vector.
Based on Andreas' suggestion of a single loop, the copy & swap code would look something like this (not tested):
std::vector<float> numbers(number_count, 0.0f); // Destination
auto ptr = buffer.data();
for (auto& number: numbers)
{
    uint32_t datum;
    std::memcpy(&datum, ptr, sizeof(datum));
    number = std::bit_cast<float>(endian_swap(datum));
    ptr += sizeof(datum);
}
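Note that endian_swap above is assumed rather than shown; a minimal 32-bit version (my own sketch; std::byteswap only arrives in C++23) could be:

#include <cstdint>

// Reverse the four bytes of a 32-bit value using shifts and masks.
std::uint32_t endian_swap(std::uint32_t v)
{
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}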

Writing bits to file?

I'm trying to implement a Huffman tree.
Content of the simple .txt file I want to use for a quick test:
aaaaabbbbccd
Character frequencies: a:5, b:4, c:2, d:1
Code table (the 1s and 0s are stored as strings):
a:0
d:100
c:101
b:11
The result I want to write as binary (22 bits):
0000011111111101101100
How can I write bit-by-bit each character of this result as a binary to ".dat" file? (not as string)
Answer: You can't.
The minimum amount you can write to a file (or read from it) is a char or unsigned char. For all practical purposes, a char has exactly eight bits.
You are going to need a one-char buffer and a count of the number of bits it holds. When that number reaches 8, you need to write it out and reset the count to 0. You will also need a way to flush the buffer at the end. (Note that you cannot write exactly 22 bits to a file - you can only write 16 or 24 - so you will need some way to mark which bits at the end are unused.)
Something like:
struct BitBuffer {
    FILE* file; // Initialization skipped.
    unsigned char buffer = 0;
    unsigned count = 0;
    void outputBit(unsigned char bit) {
        buffer <<= 1;         // Make room for next bit.
        if (bit) buffer |= 1; // Set if necessary.
        count++;              // Remember we have added a bit.
        if (count == 8) {
            fwrite(&buffer, sizeof(buffer), 1, file); // Error handling elided.
            buffer = 0;
            count = 0;
        }
    }
};
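The flush is left out above; a minimal sketch of it (my own addition, picking one possible policy: left-align the pending bits and zero-pad the rest) would be another member of BitBuffer:

    void flush() {
        if (count == 0) return;   // Nothing buffered.
        buffer <<= (8 - count);   // Left-align pending bits; zero-pad the rest.
        fwrite(&buffer, sizeof(buffer), 1, file); // Error handling elided.
        buffer = 0;
        count = 0;
    }

The reader of the file then needs to know how many of the final bits are valid, e.g. from a bit count stored in a header.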
The OP asked:
How can I write bit-by-bit each character of this result as a binary to ".dat" file? (not as string)
You can not and here is why...
Memory model
Defines the semantics of computer memory storage for the purpose of the C++ abstract machine.
The memory available to a C++ program is one or more contiguous sequences of bytes. Each byte in memory has a unique address.
Byte
A byte is the smallest addressable unit of memory. It is defined as a contiguous sequence of bits, large enough to hold the value of any UTF-8 code unit (256 distinct values) and of (since C++14) any member of the basic execution character set (the 96 characters that are required to be single-byte). Similar to C, C++ supports bytes of sizes 8 bits and greater.
The types char, unsigned char, and signed char use one byte for both storage and value representation. The number of bits in a byte is accessible as CHAR_BIT or std::numeric_limits<unsigned char>::digits.
Compliments of cppreference.com
You can find this page here: cppreference:memory model
This comes from the 2017-03-21 working draft of the standard:
©ISO/IEC N4659
4.4 The C++ memory model [intro.memory]
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set (5.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits,4 the number of which is implementation-defined. The least significant bit is called the low-order bit; the most significant bit is called the high-order bit. The memory available to a C++ program consists of one or more sequences of contiguous bytes. Every byte has a unique address.
[ Note: The representation of types is described in 6.9. —end note ]
A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having nonzero width. [ Note: Various features of the language, such as references and virtual functions, might involve additional memory locations that are not accessible to programs but are managed by the implementation. —end note ] Two or more threads of execution (4.7) can access separate memory locations without interfering with each other.
[ Note: Thus a bit-field and an adjacent non-bit-field are in separate memory locations, and therefore can be concurrently updated by two threads of execution without interference. The same applies to two bit-fields, if one is declared inside a nested struct declaration and the other is not, or if the two are separated by a zero-length bit-field declaration, or if they are separated by a non-bit-field declaration. It is not safe to concurrently update two bit-fields in the same struct if all fields between them are also bit-fields of nonzero width. —end note ]
[ Example: A structure declared as
struct {
    char a;
    int b:5,
        c:11,
        :0,
        d:8;
    struct {int ee:8;} e;
}
contains four separate memory locations: The field a and bit-fields d and e.ee are each separate memory locations, and can be modified concurrently without interfering with each other. The bit-fields b and c together constitute the fourth memory location. The bit-fields b and c cannot be concurrently modified, but b and a, for example, can be. —end example ]
4) The number of bits in a byte is reported by the macro CHAR_BIT in the header <climits>.
This version of the standard can be found here:
www.open-std.org section § 4.4 on pages 8 & 9.
The smallest unit of memory that can be written to in a program is a byte of 8 (or, on some architectures, more) contiguous bits. Even with bit-fields, the 1-byte requirement still holds. You can manipulate, toggle, or set individual bits within a byte, but you cannot write individual bits.
What can be done is to keep a byte buffer along with a count of the bits written. Once all required bits are written, the rest of the unused bits must be treated as padding.
Edit
Note: when using bit-fields or unions, one thing you must take into consideration is the endianness of the specific architecture.
Answer: You can, in a way.
From my experience, I have found a simple way to do this. You define an array of characters (it can be as small as 1 byte, or bigger). Then you define functions to access a specific bit of any element. For example, here is how to get the value of the 3rd bit of a char in C++:
/* position is in [1,..,8]; returns nonzero if the bit is set */
int bit_at(int position, unsigned char byte)
{
    return (byte & (1 << (position - 1)));
}
Now you can picture the array of bytes as
[b1,...,bn]
What we actually have in memory is 8 * n bits. We can visualize it like so (note: the array is zeroed):
|0000 0000|0000 0000|...|0000 0000|
From this you can work out how to manipulate the array to get or set any specific bit. Of course some index conversion is involved, but that is not a big problem.
In the end, for the encoding you provide, that is:
a:0
d:100
c:101
b:11
We can encode the message "abcd" and build an array that holds the bits of the message, using each array element as an array of bits, like so:
|0111 0110|0000 0000|
You can write this out and you will have an excess of at most 7 unused bits.
This is a simple example, but it can be extended into much more.
I hope this gave some answers to your question.
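For completeness, here is a small self-contained sketch (my own, and only lightly checked) that packs such code strings into bytes, most significant bit first, zero-padding the final byte:

#include <cstdio>
#include <map>
#include <string>

int main() {
    std::map<char, std::string> code{{'a',"0"},{'b',"11"},{'c',"101"},{'d',"100"}};
    std::string message = "aaaaabbbbccd";           // 22 code bits in total
    std::FILE* f = std::fopen("huffman.dat", "wb"); // Error handling elided.
    unsigned char buffer = 0;
    unsigned count = 0;
    for (char ch : message) {
        for (char bit : code[ch]) {
            buffer = (buffer << 1) | (bit == '1');  // Append one bit.
            if (++count == 8) { std::fputc(buffer, f); buffer = 0; count = 0; }
        }
    }
    if (count) std::fputc(buffer << (8 - count), f); // Flush: left-align the rest.
    std::fclose(f);
}

The decoder must know the total bit count (22 here) to ignore the padding bits in the last byte.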

Pointer Conception

Here I get 4225440 as the address of arr[0]; as it is an integer array, each address increases by 4, so the next one is 4225444.
Now, what happens with those addresses? If I manually use one of them, it shows an absurd value. Where does that value come from?
This is the code under discussion
#include <stdio.h>
int arr[10], i, a, *j;
void del(int a);
int main()
{
    for (i = 0; i < 4; i++)
        scanf("%d", &arr[i]);
    j = (int*)4225443;
    for (i = 0; i < 4; i++)
    {
        printf("\n%d ", arr[i]);
        printf(" %d ", &arr[i]);
    }
    printf(" %d ", *j);
}
j=(int*)4225443;
/* ... */
printf(" %d ",*j);
C has its word to say:
(C11, 6.3.2.3p5) "An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation."
In your case you can add to that you are also violating aliasing rules.
Most of the CPUs that we use today have either a 32-bit or a 64-bit wide bus between the CPU and the memory. Let's use the 32-bit wide bus for demonstration purposes.
In general, each memory access will read (or write) 32 bits, where the first address of those 32 bits is aligned on a 4-byte boundary (i.e., the byte address is evenly divisible by 4). On such an architecture, an int will start on a 4-byte-aligned address and be 4 bytes (32 bits) long.
In general, when the address of 'something' is NOT on such a boundary, the CPU will:
for a read: read the whole 32 bits from memory, starting at the aligned boundary, then, within the CPU, using the registers and the logic and math operations, extract the desired byte;
for a write: read the whole 32 bits from memory, starting at the aligned boundary, then, within the CPU, using the registers and the logic and math operations, modify the desired byte, then write the whole 32 bits back to memory.
In other words, accessing memory other than on aligned boundaries is SLOW. Unfortunately, some CPUs, if requested to read/write a value at an unaligned address, will instead raise a bus error.
Regarding the 'unbelievable' value of the int when a single byte of the int is modified: an int (let's use a little-endian architecture) is 4 bytes, aligned on a 32-bit boundary (i.e., the lowest address of the int is on a 4-byte boundary). Say the int contains 5; then its representation in memory, lowest address first, is 0x05, 0x00, 0x00, 0x00. Now set the third byte (address of the int + 2) to some value, say 3. The int's bytes are then 0x05, 0x00, 0x03, 0x00, which on little endian is the value 0x00030005. When that int is printed, it will display 196613.
Note: the order of the bytes in memory is reversed on a big-endian architecture.
It will print the value located at address 4225443, if that address is accessible; otherwise it will produce a memory-violation exception.
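To see the effect safely, here is a small sketch (my own) that reproduces the arithmetic described above, using unsigned char* (which the aliasing rules do allow) to modify a single byte:

#include <cstdio>

int main()
{
    int n = 5;
    unsigned char* p = reinterpret_cast<unsigned char*>(&n); // char access is permitted
    p[2] = 3;               // set the third byte (offset 2)
    std::printf("%d\n", n); // prints 196613 (0x00030005) on a little-endian machine
}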

dealing with endianness in c++

I am working on translating a system from Python to C++. I need to be able to perform actions in C++ that are generally performed using Python's struct.unpack (interpreting binary strings as numerical values). For integer values, I am able to get this to (sort of) work using the data types in stdint.h:
struct.unpack("i", str) ==> *(int32_t*) str; //str is a char* containing the data
This works properly for little-endian binary strings, but fails on big-endian binary strings. Basically, I need an equivalent to using the > tag in struct.unpack:
struct.unpack(">i", str) ==> ???
Please note: if there is a better way to do this, I am all ears. However, I cannot use C++11, nor any 3rd-party libraries other than Boost. I will also need to be able to interpret floats and doubles, as in struct.unpack(">f", str) and struct.unpack(">d", str), but I'll get to that when I solve this.
NOTE: I should point out that the endianness of my machine is irrelevant in this case. I know that the bitstream I receive in my code will ALWAYS be big-endian, and that's why I need a solution that will always cover the big-endian case. The article pointed out by BoBTFish in the comments seems to offer a solution.
For 32 and 16-bit values:
This is exactly the problem you have with network data, which is big-endian. You can use ntohl to turn a 32-bit value into host order, little-endian in your case.
The ntohl() function converts the unsigned integer netlong from network byte order to host byte order.
int res = ntohl(*((int32_t*)str));
This also takes care of the case where your host is big-endian, in which case ntohl does nothing.
For 64-bit values
Non-standardly, on Linux/BSD you can take a look at 64 bit ntohl() in C++?, which points to htobe64.
These functions convert the byte encoding of integer values from the byte order that the current CPU (the "host") uses, to and from little-endian and big-endian byte order.
For Windows, try: How do I convert between big-endian and little-endian values in C++?
which points to _byteswap_uint64, as well as 16- and 32-bit solutions and the GCC-specific __builtin_bswap32/__builtin_bswap64 calls.
Other Sizes
Most systems don't have values that aren't 16/32/64 bits long. If you hit one, I might try storing it in a 64-bit value, shifting it, and then translating. I'd write some good tests. I suspect it is an uncommon situation, and more details would help.
Unpack the string one byte at a time.
unsigned char *str;
unsigned int result;
result = (unsigned int)*str++ << 24; // cast avoids shifting into the sign bit of int
result |= *str++ << 16;
result |= *str++ << 8;
result |= *str++;
First, the cast you're doing:
char *str = ...;
int32_t i = *(int32_t*)str;
results in undefined behavior due to the strict aliasing rule (unless str is initialized with something like int32_t x; char *str = (char*)&x;). In practical terms that cast can result in an unaligned read which causes a bus error (a crash) on some platforms and slow performance on others.
Instead you should be doing something like:
int32_t i;
std::memcpy(&i, c, sizeof(i));
There are a number of functions for swapping bytes between the host's native byte ordering and a host-independent ordering: ntoh*() and hton*(), where * is l or s for the different sizes supported. Since different hosts may have different byte orderings, this may be what you want to use if the data you're reading uses a consistent serialized form on all platforms.
i = ntohl(i);
You can also manually move bytes around in str before copying it into the integer.
std::swap(str[0],str[3]);
std::swap(str[1],str[2]);
std::memcpy(&i,str,sizeof(i));
Or you can manually manipulate the integer's value using shifts and bitwise operators.
std::memcpy(&i, str, sizeof(i));
i = (i & 0xFFFF0000u) >> 16 | (i & 0x0000FFFFu) << 16;
i = (i & 0xFF00FF00u) >> 8 | (i & 0x00FF00FFu) << 8;
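Since the question also mentions struct.unpack(">f", str): a sketch along the same lines (my own, pre-C++11-friendly, assuming an IEEE-754 float and CHAR_BIT == 8) combines byte-at-a-time reads with memcpy:

#include <cstring>
#include <stdint.h> // <cstdint> is C++11

// Equivalent of Python's struct.unpack(">f", str): read a big-endian
// 32-bit float from a possibly unaligned byte buffer.
float unpack_be_float(const unsigned char* str)
{
    uint32_t bits = 0;
    for (int k = 0; k < 4; ++k)
        bits = (bits << 8) | str[k]; // first byte becomes the most significant
    float f;
    std::memcpy(&f, &bits, sizeof f); // type-pun safely via memcpy
    return f;
}

The double version (">d") is analogous with uint64_t and 8 bytes.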
This falls in the realm of bit twiddling.
for (i = 0; i < sizeof(struct foo); i++) dst[i] = src[i ^ mask];
where mask == sizeof(type) - 1 if the stored and native endianness differ.
With this technique one can convert a struct to bit masks:
struct foo {
    byte a, b; // mask = 0,0
    short e;   // mask = 1,1
    int g;     // mask = 3,3,3,3
    double i;  // mask = 7,7,7,7,7,7,7,7
} s; // notice that all members must be aligned according to their native size
Again, these masks can be encoded with two bits per byte, decoded as (1<<n)-1, meaning that on a 64-bit machine one can encode the necessary masks for a 32-byte struct in a single constant (covering 1-, 2-, 4- and 8-byte alignments).
unsigned int mask = 0xffffaa50; // or zero if the endianness matches
for (i = 0; i < 16; i++) {
    dst[i] = src[i ^ ((1 << (mask & 3)) - 1)];
    mask >>= 2;
}
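As a concrete check (my own sketch): for a single aligned 4-byte int the mask is sizeof(uint32_t) - 1 == 3, and dst[i] = src[i ^ 3] simply reverses the byte order:

#include <cstdio>
#include <cstring>
#include <stdint.h>

int main()
{
    uint32_t value = 0x11223344u;
    unsigned char src[4], dst[4];
    std::memcpy(src, &value, sizeof src);
    for (int k = 0; k < 4; ++k)
        dst[k] = src[k ^ 3]; // swap bytes 0<->3 and 1<->2
    uint32_t swapped;
    std::memcpy(&swapped, dst, sizeof swapped);
    std::printf("%08x\n", (unsigned)swapped); // prints 44332211
}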
If the values you receive are truly strings (char* or std::string) and you know their format, sscanf() and atoi() (well, really the whole ato*() family) will be your friends. They take well-formatted strings and convert them per passed-in formats (a kind of reverse printf).

why unsigned char for RGB pixel data?

I'm approaching C++ through some basic computer graphics.
Pixel data is usually represented as:
unsigned char *pixels
An unsigned char is good because it is a value between 0 and 255 (256 = 2^8, because a char is 1 byte and 1 byte is 8 bits?), and this is good because in RGB each color is represented by a number between 0 and 255.
But I understand this as a monochromatic image. In a normal image I have RGB, so I would expect 3 arrays of unsigned char, one for red, one for green, one for blue. Something like:
unsigned char *pixels[3]
But I have never found anything similar for RGB pixel data.
RGB images are usually stored in interleaved order (R1, G1, B1, R2, G2, B2, ...), so one pointer (to R1) is enough.
This makes it a bit harder to address individual pixels: pixel with index N is stored at pixels[3*N+0], pixels[3*N+1] and pixels[3*N+2] instead of just red[N], green[N], blue[N].
However, this has the advantage of allowing faster access: fewer pointers lead to simpler programs, improving their speed, and interleaved order also makes memory caching more effective.
unsigned char *pixels[3];
declares an array of three pointers to unsigned char. I'm not sure if that's what you wanted.
There are several different ways to represent pixels. The simplest is probably something like:
struct Pixel
{
    unsigned char red;
    unsigned char green;
    unsigned char blue;
};
But you may have to (or want to) conform to some external format. Another frequent possibility is to put all three colors in a uint32_t. Also, in some graphics systems there may be a fourth element, an alpha channel, representing transparency.
Really, whenever you refer to a block of bytes, it's going to be of type unsigned char*, because the C specification guarantees that unsigned char has no padding bits: every bit is used for the value. Pixel data is going to be some block of X bytes with no internal padding (there may be padding at the end of the buffer for alignment purposes), and it will most likely be allocated on the heap somewhere. So no matter whether it's monochrome data, color data, etc., you will often find that a pixel buffer is pointed to via an unsigned char pointer. You may then cast it to some struct like James mentioned in order to access the pixel information easily, or index into the buffer like anatolyg mentions. But in the end, a buffer of pixels is just a buffer of data, and a general buffer of data bytes should be accessed in C/C++ using type unsigned char*.
With *pixels[3] you've got separate arrays for the three colour components, whereas in files the three colour components for a single pixel are usually stored together. Interleaved storage also means you can use a single fread()/fwrite() for the whole block of image data.
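A minimal sketch (my own; the file name and dimensions are made up) of reading an interleaved RGB block with a single fread() and then addressing pixel N:

#include <cstdio>
#include <vector>

int main()
{
    const std::size_t width = 640, height = 480;
    std::vector<unsigned char> pixels(width * height * 3); // R1,G1,B1,R2,G2,B2,...
    std::FILE* f = std::fopen("image.rgb", "rb"); // hypothetical raw RGB file; error handling elided
    std::fread(pixels.data(), 1, pixels.size(), f); // one call for the whole block
    std::fclose(f);
    std::size_t n = 42; // pixel index
    int r = pixels[3 * n + 0];
    int g = pixels[3 * n + 1];
    int b = pixels[3 * n + 2];
    std::printf("pixel %d: r=%d g=%d b=%d\n", (int)n, r, g, b);
}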