I saw a question on Stack Overflow about how to convert from one endianness to another, and the solution was like this:
#include <algorithm> // for std::reverse

template <typename T>
void swap_endian(T& pX)
{
    // View the object as a sequence of bytes and reverse those bytes in place.
    char& raw = reinterpret_cast<char&>(pX);
    std::reverse(&raw, &raw + sizeof(T));
}
My question is: will this solution convert the endianness correctly? It swaps the bytes into the correct order, but it does not swap the bits.
Yes it will, because there is no need to swap the bits.
Edit:
Endianness affects the order in which the bytes are written for values of 2 bytes or more. Little-endian means the least significant byte comes first; big-endian is the other way around.
If you receive a big-endian stream of bytes written by a little-endian system, there is no debate about what the most significant bit is within the bytes. If the bit order were affected, you could not read each other's byte streams reliably (even if it were just plain 8-bit ASCII).
Byte order, however, cannot be automatically determined for 2-byte or bigger values, as the file system (or network layer) does not know whether you are sending data a byte at a time or sending ints that are (e.g.) 4 bytes long.
If you have a direct 1-bit serial connection with another system, you will have to agree on little or big endian bit ordering at the transport layer.
Big-endian vs. little-endian concerns how bytes are ordered within a larger unit, such as an int, long, etc. The ordering of bits within a byte is the same either way.
"Endianness" generally refers to byte order, not the order of the bits within those bytes. In this case, you don't have to reverse the bits.
You are correct: that function would only swap the byte order, not individual bits. This is usually sufficient for networking. Depending on your needs, you may also find the htons() family of functions useful.
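For illustration, a minimal sketch (assuming a POSIX system, where these functions live in <arpa/inet.h>) of the htons()/htonl() family, which converts 16- and 32-bit integers between host byte order and network (big-endian) byte order:

#include <arpa/inet.h> // htons, htonl, ntohs, ntohl (POSIX; on Windows they come from Winsock)
#include <cstdint>
#include <cstdio>

int main()
{
    std::uint16_t port = 8080;            // host byte order
    std::uint32_t addr = 0xC0A80001;      // 192.168.0.1, host byte order

    std::uint16_t net_port = htons(port); // host-to-network, 16-bit
    std::uint32_t net_addr = htonl(addr); // host-to-network, 32-bit

    // ntohs()/ntohl() undo the conversion on the receiving side.
    std::printf("%u %u\n", (unsigned)ntohs(net_port), (unsigned)ntohl(net_addr));
    return 0;
}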
From Wikipedia:
Most modern computer processors agree on bit ordering "inside" individual bytes (this was not always the case). This means that any single-byte value will be read the same on almost any computer one may send it to.
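To make that concrete, here is a minimal sketch (not from the original thread) that runs the swap_endian template from the question on a 32-bit value; only whole bytes move, and the bits inside each byte are untouched:

#include <algorithm> // std::reverse
#include <cstdint>
#include <cstdio>

template <typename T>
void swap_endian(T& pX)
{
    char& raw = reinterpret_cast<char&>(pX);
    std::reverse(&raw, &raw + sizeof(T));
}

int main()
{
    std::uint32_t value = 0x12345678;
    swap_endian(value);
    // Prints 0x78563412 on any host: the bytes 12 34 56 78 are reversed,
    // but each byte (e.g. 0x12) keeps its own bit pattern.
    std::printf("0x%08X\n", (unsigned)value);
    return 0;
}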
Currently, this is for a Huffman compression algorithm that assigns binary codes to characters used in a text file: fewer bits for more frequent characters and more bits for less frequent ones.
Currently, I'm trying to save the binary code big-endian in a byte.
So let's say I'm using an unsigned char to hold it.
00000000
And I want to store some binary code that's 1101.
In advance, I want to apologize if this seems trivial or is a dupe but I've browsed dozens of other posts and can't seem to find what I need. If anyone could link or quickly explain, it'd be greatly appreciated.
Would this be the correct syntax?
I'll have some external method like
int length = 0;
unsigned char byte = (some default value);

void pushBit(unsigned int bit){
    if (bit == 1){
        byte |= 1;
    }
    byte <<= 1;
    length++;
    if (length == 8) {
        // Output the byte
        length = 0;
    }
}
I've seen some videos explaining endianness, and my understanding is that the most significant bit (the first one) is placed at the lowest memory address.
Some videos showed the byte from left to right, which makes me think I need to left-shift everything over, but whenever I set, toggle, or erase a bit, it's from the rightmost position, is it not? I'm sorry once again if this is trivial.
So after I finish pushing 1101 into this method, byte would be something like 00001101. Is this big-endian? My knowledge of address locations is very weak, and I'm not sure which end,
-->00001101 or 00001101<--
is considered the most significant.
Would I need to left-shift by the remaining amount?
So since I used 4 bits, I would left-shift by 4 bits to make 11010000. Is this big-endian?
First off, as the Killzone Kid noted, endianness and the bit ordering of a binary code are two entirely different things. Endianness refers to the order in which a multi-byte integer is stored in the bytes of memory. For little-endian, the least significant byte is stored first; for big-endian, the most significant byte is stored first. The bits in the bytes don't change order. Endianness has nothing to do with what you're asking.
As for accumulating bits until you have a byte's worth to write, you have the basic idea, but your code is incorrect. You need to shift first and then OR in the bit. The way you're doing it, you are losing the first bit you put in off the top, and the low bit of what you write is always zero. Just put the byte <<= 1; before the if.
You also need to deal with ending the stream somehow, writing out the last bits if there are fewer than eight left. So you'll need a flushBits() to write out your bit buffer if it has any bits in it. Your bit stream needs to be self-terminating, or you need to first send the number of bits, so that you don't misinterpret the filler bits in the last byte as a code or codes.
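A minimal sketch of the corrected accumulator along those lines; flushBits and outputByte are illustrative names, not from the original post, and outputByte here just prints the byte:

#include <cstdio>

static int length = 0;
static unsigned char byte = 0;

// Stand-in for writing the byte to the compressed output.
static void outputByte(unsigned char b) { std::printf("%02X ", (unsigned)b); }

void pushBit(unsigned int bit)
{
    byte <<= 1;              // shift first...
    if (bit == 1) byte |= 1; // ...then OR the new bit into the low position
    if (++length == 8) {
        outputByte(byte);    // the first pushed bit ends up as the most significant bit
        length = 0;
        byte = 0;
    }
}

// Write out any leftover bits, padded with zeros in the low positions.
void flushBits()
{
    if (length > 0) {
        byte <<= (8 - length);
        outputByte(byte);
        length = 0;
        byte = 0;
    }
}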
There are two main types of endianness, big-endian and little-endian (technically there are more, like middle-endian, but big and little are the most common). If you want the big-endian format (as it seems you do), then the most significant byte comes first; with little-endian, the least significant byte comes first.
Wikipedia has some good examples
It looks like what you are trying to do is store the bits themselves within the byte in reverse order, which is not what you want. A byte is endian-agnostic and does not need to be flipped. Multi-byte types such as uint32_t may need their byte order changed, depending on what endianness you want to achieve.
Maybe what you are referring to is bit numbering, in which case the code you have should largely work (although you should compare length to 7, not 8). With the order you place the bits in pushBit, the first bit you pass ends up being the most significant bit.
Bits aren't addressable by definition (if we're talking about C++, not C51 or its C++ successor), so from the point of view of a high-level language, or even of assembler pseudo-code, no matter which direction LSB -> MSB physically runs, a bit-wise << performs a shift from LSB towards MSB. Bit order is referred to as bit numbering and is a separate feature from endianness, related to the hardware implementation.
Bit fields in C++ may appear to change order because in the most common use cases, e.g. network communication, the bits are expected in the opposite order; but in fact the way bit fields are packed into a byte is implementation-defined, and there is no guarantee that there are no gaps or that the order is preserved.
The minimal addressable unit of memory in C++ is char-sized, and that's where your concern with endianness ends. In the rare case where you actually have to change bit order (when? working with some incompatible hardware?), you have to do so explicitly.
Note that when working with Ethernet or another network protocol you should not do so; the order change is done by the hardware (the first bit sent over the wire is the least significant one on the platform).
I cannot understand the concept of "network byte order". I have read Network byte order and endianness issues, but I still cannot.
Now I have written a formal network protocol description for communication between two computers over TCP sockets. It contains the phrase "...use little-endian byte order". But the standard network byte order is big-endian.
Should I think about byte order at all if the byte order is fully defined on both sides of the network and I write, roughly speaking, a void* and a size? How can the network "know" about my data? What about float types?
For example, can't I write on my side:
stream.setDevice(tcpSocket);
stream.setByteOrder(QDataStream::LittleEndian);
...
struct SomeType
{
    int32_t a;
    int32_t b;
    double c;

    friend QDataStream& operator << (
        QDataStream& stream, const SomeType& x)
    {
        stream << x.a
               << x.b
               << x.c;
        return stream;
    }
};
or maybe just:
SomeType x;
tcpSocket.write(&x, size); // if the byte order and the data structure alignment are known on both sides
A 32-bit value represented as little-endian (Intel, etc.):

    address offset:   0           1           2           3
    contents:         bits 0-7    bits 8-15   bits 16-23  bits 24-31

And represented in network byte order, i.e. big-endian (Motorola CPUs, etc.):

    address offset:   0           1           2           3
    contents:         bits 24-31  bits 16-23  bits 8-15   bits 0-7
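A small sketch (not part of the original answer) that makes the layout above visible by copying a 32-bit value into a byte array and printing the bytes in address order:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    std::uint32_t value = 0x12345678;
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);

    // Little-endian machine: prints 78 56 34 12 (least significant byte at offset 0).
    // Big-endian machine:    prints 12 34 56 78 (network byte order).
    for (unsigned char b : bytes)
        std::printf("%02X ", (unsigned)b);
    std::printf("\n");
    return 0;
}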
Depending on which architecture you first learned to write machine code on (if you did), one will make more sense to you than the other. For almost anyone under 45 years old, it will be little-endian, which is the opposite of network byte order.
If you learned to write machine code on a Texas Instruments TMS9900 architecture like me, it's even more confusing, because in TI-land bit 0 is the most significant bit (!)
Update:
In general it is better to encode data on the wire in a way that is independent of hardware or compiler implementation choices, or even languages.
Here's an example of such an encoding from Google's protocol buffers:
https://developers.google.com/protocol-buffers/docs/encoding
The advantages here are:
generally less traffic transmitted, so faster networking
each end of the connection will understand the data regardless of hardware, compiler version, or even language.
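As a rough illustration of that idea (a sketch written for this answer, not code from the protobuf library), the varints used in the protobuf wire format pack an integer 7 bits at a time, least significant group first, with the top bit of each byte flagging whether another byte follows, so the result is the same no matter what the host's endianness is:

#include <cstdint>
#include <cstdio>
#include <vector>

// Encode an unsigned integer as a base-128 varint (least significant 7-bit group first).
std::vector<unsigned char> encodeVarint(std::uint64_t value)
{
    std::vector<unsigned char> out;
    do {
        unsigned char byte = value & 0x7F; // take the low 7 bits
        value >>= 7;
        if (value != 0)
            byte |= 0x80;                  // continuation bit: more bytes follow
        out.push_back(byte);
    } while (value != 0);
    return out;
}

int main()
{
    // The protobuf encoding docs use 300 as an example; it encodes as AC 02.
    for (unsigned char b : encodeVarint(300))
        std::printf("%02X ", (unsigned)b);
    std::printf("\n");
    return 0;
}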
A TCP socket is just a stream of bytes and doesn't care at all about the endianness of the data you send. Thus, for your own private network protocols you can use any byte order you like. If all the computers that use the protocol have the same natural byte order, it is probably a good idea to use that as the serialization order, as this allows you to write code like your second example.
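For example, the receiving side can mirror the operator<< from the question with an operator>> as long as it sets the same byte order; a sketch assuming the SomeType struct shown earlier (real code would also have to wait until enough bytes have actually arrived on the socket):

#include <QDataStream>
#include <QTcpSocket>
#include <cstdint>

struct SomeType
{
    int32_t a;
    int32_t b;
    double c;
};

// Mirror of the serialization operator from the question.
QDataStream& operator >> (QDataStream& stream, SomeType& x)
{
    stream >> x.a >> x.b >> x.c;
    return stream;
}

// Hypothetical helper: read one SomeType from a connected socket.
void readOne(QTcpSocket& tcpSocket, SomeType& x)
{
    QDataStream stream(&tcpSocket);
    stream.setByteOrder(QDataStream::LittleEndian); // must match the sender
    stream >> x;
}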
So... wrestling with bits and bytes, it occurred to me that if I say "first bit of the nth byte", it might not mean what I think it means. So far I have assumed that if I have some data like this:
00000000 00000001 00001000
then the
First byte is the leftmost of the groups and has the value of 0
First bit is the leftmost of all 0's and has the value of 0
Last byte is the rightmost of the groups and has the value of 8
Last bit of the second byte is the rightmost of the middle group and has the value of 1
Then I learned that the byte order in a typed collection of bytes is determined by the endianness of the system. In my case it should be little-endian (Windows, Intel, right?), which would mean that something like 01 10 as a 16-bit unsigned integer should be 2551, while in most programs dealing with memory it would be represented as 265... no idea what's going on there.
I also learned that bits in a byte could be ordered in whatever way, and there seems to be no clear answer as to which bit is the actual first one, since they could also be subject to bit endianness and people's definitions of what is first differ. For me it's left to right; for somebody else it might be whatever first appears when you add 1 to 0, or right to left.
Why does any of this matter? Well, curiosity mostly, but I was also trying to write a class that would be able to extract X number of bits, starting from bit-address Y. I envisioned it sort of like a .NET string where I can go and type ".SubArray(12(position), 5(length))"; then, in the case of data like at the top of this post, it would retrieve "0001 0" or 2.
So could somebody clarify what is first and last in terms of bits and bytes in my environment? Does it go right to left or left to right, or both, wut? And why does this question exist in the first place, why couldn't the coding ancestors have agreed on something and stuck with it?
A shift is an arithmetic operation, not a memory-based operation: it is intended to work on the value, rather than on its representation. Shifting left by one is equivalent to a multiplication by two, and shifting right by one is equivalent to a division by two. These rules hold first, and if they conflict with the arrangement of the bits of a multibyte type in memory, then so much for the arrangement in memory. (Since shifts are the only way to examine bits within one byte, this is also why there is no meaningful notion of bit order within one byte.)
As long as you keep your operations to within a single data type (rather than byte-shifting long integers and then examining them as character sequences), the results will stay predictable. Examining the same chunk of memory through different integer types is, in this case, a bit like performing integer operations and then reading the bits as a float; there will be some change, but it's not the place of the integer arithmetic definitions to say exactly what. It's out of their scope.
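A tiny sketch of that point: the result of a shift is defined purely by the value, so it is identical on little- and big-endian machines even though the operand's bytes sit in memory in a different order:

#include <cstdint>
#include <cstdio>

int main()
{
    std::uint32_t x = 1;
    std::uint32_t y = x << 8; // arithmetic on the value: always 256,
                              // no matter which byte of x lives at the lowest address
    std::printf("%u\n", (unsigned)y);
    return 0;
}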
You have some understanding, but a couple misconceptions.
First off, arithmetic operations such as shifting are not concerned with the representation of the bits in memory; they deal with the value. Where memory representation comes into play is usually in distributed environments where you have cross-platform communication in the mix, and the data on one system is represented differently on another.
Your first comment...
I also learned that bits in a byte could be ordered in whatever way, and there seems to be no clear answer as to which bit is the actual first one, since they could also be subject to bit endianness and people's definitions of what is first differ
This isn't entirely true. Though the bits are only given meaning by the reader and the writer of the data, bits within an 8-bit byte are generally read from left (MSB) to right (LSB). The byte order is what is determined by the endianness of the system architecture; it has to do with the representation of the data in memory, not with the arithmetic operations.
Second...
And why does this question exist in the first place, why couldn't the coding ancestors have agreed on something and stuck with it?
From Wikipedia:
The initial endianness design choice was (is) mostly arbitrary, but later technology revisions and updates perpetuate the same endianness (and many other design attributes) to maintain backward compatibility. As examples, the Intel x86 processor represents a common little-endian architecture, and IBM z/Architecture mainframes are all big-endian processors. The designers of these two processor architectures fixed their endiannesses in the 1960s and 1970s with their initial product introductions to the market. Big-endian is the most common convention in data networking (including IPv6), hence its pseudo-synonym network byte order, and little-endian is popular (though not universal) among microprocessors in part due to Intel's significant historical influence on microprocessor designs. Mixed forms also exist, for instance the ordering of bytes within a 16-bit word may differ from the ordering of 16-bit words within a 32-bit word. Such cases are sometimes referred to as mixed-endian or middle-endian. There are also some bi-endian processors which can operate either in little-endian or big-endian mode.
Finally...
Why does any of this matter? Well, curiosity mostly, but I was also trying to write a class that would be able to extract X number of bits, starting from bit-address Y. I envisioned it sort of like a .NET string where I can go and type ".SubArray(12(position), 5(length))"; then, in the case of data like at the top of this post, it would retrieve "0001 0" or 2.
Many programming languages and libraries offer functions that allow you to convert to/from network (big endian) and host order (system dependent) so that you can ensure data you're dealing with is in the proper format, if you need to care about it. Since you're asking specifically about bit shifting, it doesn't matter in this case.
Read this post for more info
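If the SubArray-style access from the question is still wanted, something like the following sketch would do it (extractBits is a hypothetical helper, and bit 0 is taken to be the most significant bit of the first byte, matching the left-to-right reading used in the question):

#include <cstddef>
#include <cstdint>
#include <cstdio>

// Extract `count` bits (count <= 32) starting at bit index `pos`,
// where bit 0 is the most significant bit of data[0].
std::uint32_t extractBits(const unsigned char* data, std::size_t pos, std::size_t count)
{
    std::uint32_t result = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::size_t bit = pos + i;
        unsigned int value = (data[bit / 8] >> (7 - bit % 8)) & 1u; // MSB-first within each byte
        result = (result << 1) | value;
    }
    return result;
}

int main()
{
    const unsigned char data[] = {0x00, 0x01, 0x08}; // 00000000 00000001 00001000
    std::printf("%u\n", (unsigned)extractBits(data, 12, 5)); // prints 2, as in the question
    return 0;
}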
So from http://en.wikipedia.org/wiki/RGBA_color_space, I learned that the byte order for ARGB is, from lowest address to highest address, BGRA, on a little endian machine in certain interpretations.
How does this affect the naming convention of packed data, e.g. a uint8_t ar[] = {R,G,B,R,G,B,R,G,B}?
Little-endian by definition stores the bytes of a number in reverse order. This is not strictly necessary if you are treating them as byte arrays; however, any vaguely efficient code base will actually treat the 4 bytes as a 32-bit unsigned integer. This can speed up software blitting by a factor of almost 4.
Now the real question is why. This comes from the fact that, when treating a pixel as a 32-bit int as described above, coders want to be able to run arithmetic and shifts in a predictable way. This relies on the bytes being in reverse order.
In short, this is not actually odd, as on little-endian machines the last byte (highest address) is actually the most significant byte and the first the least significant. Thus a field like this will naturally be in reverse order, so that it is the correct way around when treated as a number (as a number it will appear ARGB, but as a byte array it will appear BGRA).
Sorry if this is unclear, but I hope it helps. If you do not understand or I have missed something please comment.
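A minimal sketch of what that answer describes (not from the original post): pack the pixel as a 32-bit ARGB number, then look at its bytes in memory, which come out B, G, R, A on a little-endian machine:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    std::uint8_t a = 0xFF, r = 0x11, g = 0x22, b = 0x33;

    // As a number the pixel reads ARGB: 0xAARRGGBB.
    std::uint32_t pixel = (std::uint32_t(a) << 24) |
                          (std::uint32_t(r) << 16) |
                          (std::uint32_t(g) << 8)  |
                           std::uint32_t(b);

    unsigned char bytes[4];
    std::memcpy(bytes, &pixel, sizeof bytes);

    // Little-endian memory order: 33 22 11 FF, i.e. B G R A.
    for (unsigned char c : bytes)
        std::printf("%02X ", (unsigned)c);
    std::printf("\n");
    return 0;
}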
If you are storing data in a byte array like you have specified, you are using BGR format which is basically RGB reversed:
bgr-color-space
I have a program where I simply copy a byte array into a long long array. There are a total of 20 bytes, so I just needed an array of 3 long longs. The reason I copied the bytes into long longs was to make it portable on 64-bit systems.
I now just need to byte-swap before I populate that array, so that the values that go into it are reversed.
There is a byteswap.h which has an _int64 bswap_64(_int64) function that I think I can use. I was hoping for some help with the usage of that function given my long long array. Would I just simply pass in the name of the long long array and read it out into another long long array?
I am using C++, not .NET or C#.
Update:
Clearly there are issues I am still confused about. For example, working with byte arrays that just happen to be populated with a 160-bit hex string, which then has to be output in decimal form, made me think that if I just did a simple assignment to a long (4-byte) array my worries would be over. Then I found out that this code would have to run on a 64-bit Sun box. Then I thought that since data sizes can change from one environment to another, a simple assignment would not cut it. This made me think about just using a long long to make the code sort of immune to that size issue. However, then I read about endianness and how 64-bit reads MSB vs. 32-bit which is LSB. So, taking my data and reversing it so that it is stored in my long long MSB-first was the only solution that came to mind. Of course, there is the matter of the 4 extra bytes, which in this case does not matter, and I will simply take the decimal output and display any six digits I choose. However, programmatically, I guess it would be better to just work with 4-byte longs and not deal with that whole wasted-4-bytes issue.
Between this and your previous questions, it sounds like there are several fundamental confusions here:
If your program is going to be run on a 64-bit machine, it sounds like you should compile and unit-test it on a 64-bit machine. Running unit tests on a 32-bit machine can give you confidence the program is correct in that environment, but doesn't necessarily mean the code is correct for a 64-bit environment.
You seem to be confused about how 32- and 64-bit architectures relate to endianness. 32-bit machines are not always little-endian, and 64-bit machines are not always big-endian. They are two separate concepts and can vary independently.
Endianness only matters for single values consisting of multiple bytes; for example, the integer 305,419,896 (0x12345678) requires 4 bytes to represent, or a UTF-16 character (usually) requires 2 bytes to represent. For these, the order of storage matters because the bytes are interpreted as a single unit. It sounds like what you are working with is a sequence of raw bytes (like a checksum or hash). Values like this, where multiple bytes are not interpreted in groups, are not affected by the endianness of the processor. In your case, casting the byte array to a long long * actually creates a potential endianness problem (on a little-endian architecture, your bytes will now be interpreted in the opposite order), not the other way around.
Endianness also doesn't matter unless the little-endian and big-endian versions of your program actually have to communicate with each other. For example, if the little-endian program writes a file containing multi-byte integers without swapping and the big-endian program reads it in, the big-endian program will probably misinterpret the data. It sounds like you think your code that works on a little-endian platform will suddenly break on a big-endian platform even if the two never exchange data. You generally don't need to be worried about the endianness of the architecture if the two versions don't need to talk to each other.
Another point of confusion (perhaps a bit pedantic): a byte does not store a "hex value" versus a "decimal value"; it stores an integer. Decimal and hexadecimal are just two different ways of representing (printing) a particular integer value. It's all binary in the computer's memory anyway; hexadecimal is just an easy conversion to and from binary, and decimal is convenient to our brains since we have ten fingers.
Assuming what you're trying to do is print the value of each byte of the array as decimal, you could do this:
unsigned char bytes[] = {0x12, 0x34, 0x56, 0x78};
for (int i = 0; i < sizeof(bytes) / sizeof(unsigned char); ++i)
{
printf("%u ", (unsigned int)bytes[i]);
}
printf("\n");
Output should be something like:
18 52 86 120
I think you should look at htonl() and family:
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
This family of functions is used to encode/decode integers for transport between machines that have different sizes/endianness of integers.
Write your program in the clearest, simplest way. You shouldn't need to do anything to make it "portable."
Byte-swapping is done to translate data of one endianness to another. bswap_64 is for resolving incompatibility between different 64-bit systems such as Power and X86-64. It isn't for manipulating your data.
If you want to reverse bytes in C++, try searching the STL for "reverse." You will find std::reverse, a function which takes pointers or iterators to the first and one-past-last bytes of your 20-byte sequence and reverses it. It's in the <algorithm> header.
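A short sketch of that std::reverse call on a 20-byte buffer; the contents here are made-up placeholder values:

#include <algorithm> // std::reverse
#include <cstdio>

int main()
{
    unsigned char bytes[20];
    for (int i = 0; i < 20; ++i)
        bytes[i] = static_cast<unsigned char>(i); // placeholder data

    // Reverse the whole 20-byte sequence in place.
    std::reverse(bytes, bytes + 20);

    for (unsigned char b : bytes)
        std::printf("%u ", (unsigned)b); // prints 19 18 ... 1 0
    std::printf("\n");
    return 0;
}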