Is the msb of hex representation of binary the left side or right side? - bit-manipulation

In an answer to a bit padding related question here the respondent made the following statement
A uint32 value of 69 is an integer value; if you want to pad it to 32 bytes
(the EVM-native length), then you have to consider the "endianness" (Read more here):
* Big-endian: 0x0000.....0045 (32 bytes)
* Little-endian: 0x4500.....0000 (32 bytes)
Apologies if this is a trite question, but what determines that
in case of big-endian the most significant bit is on the left side? hence padding goes on the left, but
in case of little-endian, the most significant bit is on right side? hence padding goes on the right?
Because my understanding is that with big-endian the most significant bits get read first into the least significant memory address but in little-endian, the least significant bits get read first into the least significant memory address.
The above padding scheme seems to suggest that, when represented in hex, the left side gets read in first with big-endian, but with little-endian the right side gets read in first.
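A minimal sketch of the two paddings from the quote may make the layout easier to see. It assumes the hex strings are byte dumps written in address order (lowest address on the left), which is the usual convention. The names pad_big_endian, pad_little_endian and WORD_BYTES are hypothetical and not taken from the quoted answer.

```c
#include <stdint.h>
#include <string.h>

/* 32 is the word length mentioned in the quote; the macro name is made up. */
#define WORD_BYTES 32

/* Big-endian: the most significant byte sits at index 0 (lowest address),
   so the zero padding fills the low indices and the value lands at the end. */
void pad_big_endian(uint32_t value, uint8_t out[WORD_BYTES]) {
    memset(out, 0, WORD_BYTES);
    for (int i = 0; i < 4; i++) {
        out[WORD_BYTES - 1 - i] = (uint8_t)(value >> (8 * i));
    }
}

/* Little-endian: the least significant byte sits at index 0,
   so the zero padding follows the value. */
void pad_little_endian(uint32_t value, uint8_t out[WORD_BYTES]) {
    memset(out, 0, WORD_BYTES);
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}
```

For the value 69 (0x45), the big-endian buffer ends with the byte 0x45 and the little-endian buffer starts with it. In both cases the padding is added at the most significant end of the number; only the address at which that end is stored differs.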

Related

Is this the correct way of writing bits to big endian?

This is for a Huffman compression algorithm that assigns binary codes to the characters used in a text file: fewer bits for more frequent characters and more bits for less frequent ones.
Currently, I'm trying to save the binary code big-endian in a byte.
So let's say I'm using an unsigned char to hold it.
00000000
And I want to store some binary code that's 1101.
In advance, I want to apologize if this seems trivial or is a dupe but I've browsed dozens of other posts and can't seem to find what I need. If anyone could link or quickly explain, it'd be greatly appreciated.
Would this be the correct syntax?
I'll have some external method like
int length = 0;
unsigned char byte = (some default value);
void pushBit(unsigned int bit){
    if (bit == 1){
        byte |= 1;
    }
    byte <<= 1;
    length++;
    if (length == 8) {
        //Output the byte
        length = 0;
    }
}
I've seen some videos explaining endianness and my understanding is the most significant bit (the first one) is placed in the lowest memory address.
Some videos showed the byte from left to right, which makes me think I need to left shift everything over, but whenever I set, toggle, or clear a bit, it's the rightmost one, is it not? I'm sorry once again if this is trivial.
So after my method finishes pushing the 1101 into this method, byte would be something like 00001101. Is this big endian? My knowledge of address locations is very weak and I'm not sure whether
-->00001101 or 00001101<--
location is considered the most significant.
Would I need to left shift the remaining amount?
So since I used 4 bits, I would left shift 4 bits to make 11010000. Is this big endian?
First off, as the Killzone Kid noted, endianness and the bit ordering of a binary code are two entirely different things. Endianness refers to the order in which a multi-byte integer is stored in the bytes of memory. For little endian, the least significant byte is stored first. For big endian, the most significant byte is stored first. The bits in the bytes don't change order. Endianness has nothing to do with what you're asking.
As for accumulating bits until you have a byte's worth to write, you have the basic idea, but your code is incorrect. You need to shift first, and then OR in the bit. The way you're doing it, you are losing the first bit you put in off the top, and the low bit of what you write is always zero. Just put the byte <<= 1; before the if.
You also need to deal with ending the stream somehow, writing out the last bits if there are fewer than eight left. So you'll need a flushBits() to write out your bit buffer if it has at least one bit in it. Your bit stream would need to be self-terminating, or you need to first send the number of bits, so that you don't misinterpret the filler bits in the last byte as a code or codes.
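The shift-then-OR fix and the flushBits() idea could look roughly like the sketch below. It writes into a caller-provided memory buffer rather than a file so that it stays self-contained. The BitWriter struct, its fields, and emitByte are hypothetical names introduced for illustration, and an 8-bit unsigned char is assumed.

```c
#include <stddef.h>

/* Hypothetical in-memory bit writer; the struct and its fields are
   illustrative and not taken from the question. */
typedef struct {
    unsigned char *out;  /* caller-provided output buffer */
    size_t cap;          /* capacity of out in bytes */
    size_t pos;          /* number of bytes written so far */
    unsigned char byte;  /* bits accumulated for the current byte */
    int length;          /* how many bits of byte are in use (0..7) */
} BitWriter;

/* Store one finished byte, silently dropping it if the buffer is full
   (an assumption made to keep the sketch short). */
static void emitByte(BitWriter *w) {
    if (w->pos < w->cap) {
        w->out[w->pos++] = w->byte;
    }
    w->byte = 0;
    w->length = 0;
}

/* Shift first, then OR in the new bit, so the first bit pushed ends up
   as the most significant bit of the finished byte. */
void pushBit(BitWriter *w, unsigned int bit) {
    w->byte = (unsigned char)((w->byte << 1) | (bit & 1u));
    w->length++;
    if (w->length == 8) {
        emitByte(w);
    }
}

/* Write out a partially filled last byte, left-aligned and zero-padded. */
void flushBits(BitWriter *w) {
    if (w->length > 0) {
        w->byte = (unsigned char)(w->byte << (8 - w->length));
        emitByte(w);
    }
}
```

Pushing 1, 1, 0, 1 and then flushing produces the single byte 11010000 (0xD0). The four trailing zeros are filler, which is why the decoder needs either a bit count or a self-terminating code.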
There are two types of endianness, big-endian and little-endian (technically there are more, like middle-endian, but big and little are the most common). If you want the big-endian format (as it seems like you do), then the most significant byte comes first; with little-endian, the least significant byte comes first.
Wikipedia has some good examples.
It looks like what you are trying to do is store the bits themselves within the byte in reverse order, which is not what you want. A byte is endian-agnostic and does not need to be flipped. Multi-byte types such as uint32_t may need their byte order changed, depending on what endianness you want to achieve.
Maybe what you are referring to is bit numbering, in which case the code you have should largely work (although you should compare length to 7, not 8). The order you place the bits in pushBit would end up with the first bit you pass being the most significant bit.
Bits aren't addressable by definition (if we're talking about C++, not C51 or its C++ successor), so from the point of view of a high-level language, and even of assembler pseudo-code, it does not matter which physical direction LSB -> MSB runs: the bitwise << always shifts from the LSB toward the MSB. Bit order is referred to as bit numbering and is a separate feature from endianness, related to the hardware implementation.
Bit fields in C++ may appear in a different order because in many common use cases, e.g. network communication, the bits do have the opposite order. In fact, how bit fields are packed into a byte is implementation-dependent: there is no guarantee that there are no gaps or that the order is preserved.
The minimal addressable unit of memory in C++ has the size of char, and that's where your concern with endianness ends. In the rare case where you actually need to change the bit order (when? working with some incompatible hardware?), you have to do so explicitly.
Note that when working with Ethernet or another network protocol you should not do so; the order change is done by hardware (the first bit sent over the wire is the least significant one on the platform).

Bitstream parsing and Endianness

I am trying to parse a bitstream, and I am having trouble getting my head around endianness. I have a byte buffer, and I need to be able to read bitfields out which are of varying lengths, anywhere from 1 bit to 8 bits mostly.
My problem comes with the endianness of the bytes. When I step through with a debugger, the bottom 4 bits appear to be in the top portion of the byte. That is, I am expecting the first two bits to be 10 (they must be 10); however, the first byte in the bitstream is 0xA3, or 1010 0011, when checking with the debugger. This means that, assuming the bits are in the "correct" order, the first two bits are in fact 11 (reading right to left).
It would seem, however, that if the bits were not in the right order, and should be 0x3A, or 0011 1010, I then have 10 as my expected first two bits.
This confuses me, because it doesn't seem to be a matter of bit order, MSb to LSb/LSb to MSb, but rather nibble order. How does this happen? That seems to just be the way it came out of the file. There is a possibility this is an invalid bitstream, but I have seen this kind of thing before when reading files in Hex Editors, nibbles seemingly in the "wrong" order.
I am just confused and would like some help understanding what's going on. I don't often deal with things at this level.
You don't need to be concerned with the bit order, because in C/C++ there is no way for you to iterate through the bits using pointer arithmetic. You can only manipulate the bits using bitwise operators, which are independent of the bit order of the local machine. What you mentioned in the OP is just a matter of visualization. Different debuggers may choose different ways to visualize the bits in a byte. There is no right or wrong in this matter; there is just preference. What really matters is the byte order.
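As an illustration of reading bitfields with bitwise operators only, here is a small sketch of a reader that takes bits starting from the most significant bit of each byte. That the stream in the question is MSB-first is an assumption, and BitReader and read_bits are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state; names are illustrative only. */
typedef struct {
    const uint8_t *data;  /* the byte buffer */
    size_t size;          /* length of data in bytes */
    size_t bit_pos;       /* absolute position of the next bit to read */
} BitReader;

/* Read n bits (1..32), most significant bit of each byte first.
   Bits past the end of the buffer are read as 0 (an assumption made
   to keep the sketch short). */
uint32_t read_bits(BitReader *r, unsigned n) {
    uint32_t value = 0;
    while (n-- > 0) {
        size_t byte_index = r->bit_pos / 8;
        unsigned shift = 7u - (unsigned)(r->bit_pos % 8);
        uint32_t bit = 0;
        if (byte_index < r->size) {
            bit = (uint32_t)(r->data[byte_index] >> shift) & 1u;
        }
        value = (value << 1) | bit;
        r->bit_pos++;
    }
    return value;
}
```

With a buffer whose first byte is 0xA3 (1010 0011), reading two bits this way yields binary 10, which is the value the question expects. Reading "right to left" instead corresponds to an LSB-first format, which would yield 11.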

Byte order for packed images

So from http://en.wikipedia.org/wiki/RGBA_color_space, I learned that the byte order for ARGB is, from lowest address to highest address, BGRA, on a little endian machine in certain interpretations.
How does this affect the naming convention of packed data, e.g. a uint8_t ar[]={R,G,B,R,G,B,R,G,B}?
Little endian by definition stores the bytes of a number in reverse order. This is not strictly necessary if you are treating them as byte arrays; however, any vaguely efficient code base will actually treat the 4 bytes as a 32-bit unsigned integer. This will speed up software blitting by a factor of almost 4.
Now the real question is why. This comes from the fact that when treating a pixel as a 32 bit int as described above coders want to be able to run arithmetic and shifts in a predictable way. This relies on the bytes being in reverse order.
In short, this is not actually odd as in little endian machines the last byte (highest address) is actually the most significant byte and the first the least significant. Thus a field like this will naturally be in reverse order so it is the correct way around when treated as a number (as a number it will appear ARGB but as a byte array it will appear BGRA).
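The "ARGB as a number, BGRA as bytes" point can be sketched as follows. To keep the result independent of the host machine, the little-endian memory layout is written out explicitly with shifts instead of being observed through a pointer cast. The helper names pack_argb and store_le32 are hypothetical.

```c
#include <stdint.h>

/* Pack four 8-bit channels into one 32-bit value with A in the most
   significant byte and B in the least significant byte. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) |
           ((uint32_t)g << 8)  | (uint32_t)b;
}

/* Lay the value out the way a little-endian machine stores it:
   least significant byte at the lowest address. */
void store_le32(uint32_t v, uint8_t out[4]) {
    out[0] = (uint8_t)(v);
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

For a pixel with A=0xAA, R=0x11, G=0x22, B=0x33, the number is 0xAA112233, while the four stored bytes in address order are 0x33, 0x22, 0x11, 0xAA, that is B, G, R, A.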
Sorry if this is unclear, but I hope it helps. If you do not understand or I have missed something please comment.
If you are storing data in a byte array like you have specified, you are using BGR format which is basically RGB reversed.

Difference between byte flip and byte swap

I am trying to find the difference because of the byte flip functionality I see in Calculator on Mac in Programmer's view.
So I wrote a program to byte swap a value, which is what we do to go from little to big endian or the other way round, and I call it byte swap. But when I see byte flip I do not understand what exactly it is and how it is different from byte swap. I did confirm that the results are different.
For example, for an int with value 12976128
Byte Flip gives me 198;
Byte swap gives me 50688.
I want to implement an algorithm for byte flip, since 198 is the value I want to get while reading something. Everything I find on Google says byte flip is the same as byte swap, which isn't the case for me.
Byte flip and byte swap are synonyms.
The results you see are just two different ways of swapping the bytes, depending on whether you look at the number as a 32bit number (consisting of 4 bytes), or as the smallest size of a number that can hold 12976128, which is 24 bits or 3 bytes.
The 4byte swap is more usual in computer culture, because 32bit processors are currently predominant (even 64bit architectures still do most of their mathematics in 32bit numbers, partly because of backward compatible software infrastructure, partly because it is enough for many practical purposes). But the Mac Calculator seems to use the minimum-width swap, in this case a 3 byte swap.
12976128, when converted to hexadecimal, gives you 0xC60000. That's 3 bytes total; each hexadecimal digit is 4 bits, or half a byte wide. The bytes to be swapped are 0xC6, zero, and another zero.
After 3-byte swap: 0x0000C6 = 198
After 4-byte swap: 0x0000C600 = 50688
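Both results can be reproduced with one routine that reverses the lowest `width` bytes of a value. This is only a sketch of the idea described above, not the Calculator's actual implementation. The name swap_bytes is hypothetical, and width is assumed to be between 1 and 4.

```c
#include <stdint.h>

/* Reverse the order of the lowest `width` bytes of value (width in 1..4).
   Any bytes above that width are discarded. */
uint32_t swap_bytes(uint32_t value, unsigned width) {
    uint32_t result = 0;
    for (unsigned i = 0; i < width; i++) {
        result = (result << 8) | (value & 0xFFu);  /* take the lowest byte */
        value >>= 8;                               /* move on to the next one */
    }
    return result;
}
```

With these definitions, swap_bytes(12976128, 3) gives 198 and swap_bytes(12976128, 4) gives 50688, matching the two values in the question.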

Bit order in C/C++

I have to implement a protocol which defines data in 8-bit words, which are sent starting with the least significant bit (LSB). I want to represent this data with unsigned char, but I don't know the bit order of the LSB and the most significant bit (MSB) in C/C++, which could possibly require swapping the bits.
Can anybody explain to me how to find out how an unsigned char is encoded: MSB-LSB or LSB-MSB?
Example:
unsigned char b = 1;
MSB-LSB: 0000 0001
LSB-MSB: 1000 0000
Endian-ness is platform dependent. Anyway, you don't have to worry about actual bit order unless you are serializing the bytes, which you may be trying to do. In which case, you still don't need to worry about how individual bytes are stored while they're on the machine, since you will have to dig the bits out individually anyway. Fortunately, if you bitwise AND with 1, you get the LSB, regardless of storage order; bit-AND with 2 and you get the next most significant bit, and so on. The compiler will sort out what constants to generate in the machine code, so that level of detail is abstracted away.
There is no such thing in C/C++. The least significant bit is -- well -- the least significant bit. Since the bits don't have addresses, there is no other ordering.
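For the serialization case, the bits can be pulled out by value with shifts and masks, so no knowledge of the storage order is needed. The sketch below assumes an 8-bit unsigned char. The function names are hypothetical, and whether the protocol needs a list of bits or a bit-reversed byte depends on the transmitting hardware, which the question does not specify.

```c
/* Extract the bits of one byte in transmission order, LSB first.
   out[0] receives the least significant bit, out[7] the most significant. */
void bits_lsb_first(unsigned char b, unsigned char out[8]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (unsigned char)((b >> i) & 1u);
    }
}

/* Reverse the bit order of one byte, for the case where a device
   shifts bytes out MSB first but the protocol wants LSB first. */
unsigned char reverse_bits(unsigned char b) {
    unsigned char r = 0;
    for (int i = 0; i < 8; i++) {
        r = (unsigned char)((r << 1) | (b & 1u));  /* append the lowest bit of b */
        b >>= 1;
    }
    return r;
}
```

For unsigned char b = 1, bits_lsb_first yields 1 followed by seven zeros, and reverse_bits(1) is 0x80. Neither result depends on how the machine stores the byte internally.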