Should I think about Network Byte Order? - c++

I cannot understand the concept of "network byte order". I have read Network byte order and endianness issues, but I still don't get it.
I have written a formal network protocol description for communication between two computers over TCP sockets. It contains the phrase "...use little-endian byte order", but the standard network byte order is big-endian.
Do I need to think about byte order at all if the byte order is fully defined on both sides of the connection and I write, roughly speaking, a void* and a size? How can the network "know" anything about my data? What about float types?
For example, can't I simply write this on my side:
stream.setDevice(tcpSocket);
stream.setByteOrder(QDataStream::LittleEndian);
...

struct SomeType
{
    int32_t a;
    int32_t b;
    double  c;

    friend QDataStream& operator << (QDataStream& stream, const SomeType& x)
    {
        stream << x.a
               << x.b
               << x.c;
        return stream;
    }
};
or maybe just:
SomeType x;
tcpSocket.write(reinterpret_cast<const char*>(&x), sizeof(x)); // if the byte order and structure alignment are known on both sides

A 32-bit value represented as little-endian (Intel, etc.):

address offset:   0           1           2           3
                  bits 0-7    bits 8-15   bits 16-23  bits 24-31

And represented in network byte order, i.e. big-endian (Motorola CPUs, etc.):

address offset:   0           1           2           3
                  bits 24-31  bits 16-23  bits 8-15   bits 0-7
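As an aside (my own addition, not part of the original answer), a minimal sketch like the following shows which of the two layouts above your machine uses at run time:

#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    // Inspect the in-memory byte order of a 32-bit value.
    std::uint32_t value = 0x01020304;
    unsigned char bytes[sizeof(value)];
    std::memcpy(bytes, &value, sizeof(value));   // copy the object representation

    // Little-endian machines print: 04 03 02 01
    // Big-endian machines print:    01 02 03 04
    for (unsigned char b : bytes)
        std::printf("%02X ", b);
    std::printf("\n");
}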
Depending on which architecture you first learned to write machine code on (if you did), one will make more sense to you than the other. For almost anyone under 45 years old, that will be little-endian, which is the opposite of network byte order.
If, like me, you learned to write machine code on a Texas Instruments TMS9900 architecture, it's even more confusing, because in Texas-land bit 0 is the most significant bit (!)
Update:
In general it is better to encode data on the wire in a way that is independent of hardware, compiler implementation choices, or even the programming language.
Here's an example of such an encoding, from Google's protocol buffers:
https://developers.google.com/protocol-buffers/docs/encoding
The advantages here are:
- generally less traffic transmitted, so faster networking
- each end of the connection will understand the data regardless of hardware, compiler version, or even language.
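If a full serialization library is more than you need, the same idea can be hand-rolled by defining the wire format explicitly in terms of bytes. A minimal sketch (my own illustration, not protocol buffers; the function names are arbitrary) for a 32-bit little-endian field:

#include <cstdint>
#include <vector>

// Append a 32-bit value to a buffer in little-endian order,
// independent of the host machine's own byte order.
void put_u32_le(std::vector<unsigned char>& out, std::uint32_t v)
{
    out.push_back(static_cast<unsigned char>( v        & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 8)  & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 16) & 0xFF));
    out.push_back(static_cast<unsigned char>((v >> 24) & 0xFF));
}

// Read it back the same way on the receiving side.
std::uint32_t get_u32_le(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}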

A TCP socket is just a stream of bytes and doesn't care at all about the endianness of the data you send. Thus, for your own private network protocols you can use any byte order you like. If all the computers that use the protocol have the same natural byte order, it is probably a good idea to use that as the serialization order, as it allows you to write code like your second example.
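To make that concrete, here is a rough sketch of the sending side built on the question's own SomeType and QDataStream setup (my example, not from the original post; a matching operator>> would be needed on the receiving end). QDataStream also lets you pin down the floating-point width, which addresses the "what about float types?" concern:

// Sending side sketch; tcpSocket is assumed to be a connected QTcpSocket.
QDataStream stream;
stream.setDevice(&tcpSocket);
stream.setByteOrder(QDataStream::LittleEndian);                  // matches the protocol description
stream.setFloatingPointPrecision(QDataStream::DoublePrecision);  // 'double c' always travels as 8 bytes

SomeType x{1, 2, 3.0};
stream << x;   // serialized field by field in a well-defined byte order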

Related

Is this the correct way of writing bits to big endian?

Currently, it's for a Huffman compression algorithm that assigns binary codes to characters used in a text file: fewer bits for more frequent characters and more bits for less frequent ones.
Currently, I'm trying to save the binary code big-endian in a byte.
So let's say I'm using an unsigned char to hold it.
00000000
And I want to store some binary code that's 1101.
In advance, I want to apologize if this seems trivial or is a dupe but I've browsed dozens of other posts and can't seem to find what I need. If anyone could link or quickly explain, it'd be greatly appreciated.
Would this be the correct syntax?
I'll have some external method like
int length = 0;
unsigned char byte = (some default value);

void pushBit(unsigned int bit){
    if (bit == 1){
        byte |= 1;
    }
    byte <<= 1;
    length++;
    if (length == 8) {
        // Output the byte
        length = 0;
    }
}
I've seen some videos explaining endianness, and my understanding is that the most significant bit (the first one) is placed at the lowest memory address.
Some videos showed the byte from left to right, which makes me think I need to left-shift everything over, but whenever I set, toggle, or erase a bit, it's from the rightmost position, is it not? I'm sorry once again if this is trivial.
So after my method finishes pushing 1101, byte would be something like 00001101. Is this big endian? My knowledge of address locations is very weak and I'm not sure whether
-->00001101   or   00001101<--
is the location considered the most significant.
Would I need to left shift the remaining amount?
So since I used 4 bits, I would left shift 4 bits to make 11010000. Is this big endian?
First off, as the Killzone Kid noted, endianness and the bit ordering of a binary code are two entirely different things. Endianness refers to the order in which a multi-byte integer is stored in the bytes of memory. For little-endian, the least significant byte is stored first. For big-endian, the most significant byte is stored first. The bits in the bytes don't change order. Endianness has nothing to do with what you're asking.
As for accumulating bits until you have a byte's worth to write, you have the basic idea, but your code is incorrect. You need to shift first, and then OR in the bit. The way you're doing it, you are losing the first bit you put in off the top, and the low bit of what you write is always zero. Just put the byte <<= 1; before the if.
You also need to deal with ending the stream somehow, writing out the last bits if there are fewer than eight left. So you'll need a flushBits() to write out your bit buffer if it has any bits in it. Your bit stream would need to be self-terminating, or you need to first send the number of bits, so that you don't misinterpret the filler bits in the last byte as a code or codes.
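Putting that together, here is a hedged sketch of the corrected accumulator (shift first, then OR in the bit; flush whatever is left at the end, padding the low bits of the last byte with zeros). The output vector is only a stand-in for however the real code writes finished bytes:

#include <vector>

std::vector<unsigned char> output;   // stand-in for the real byte sink
int length = 0;
unsigned char byte = 0;

void pushBit(unsigned int bit)
{
    byte <<= 1;            // shift first...
    if (bit == 1)
        byte |= 1;         // ...then OR in the new bit
    if (++length == 8) {
        output.push_back(byte);
        byte = 0;
        length = 0;
    }
}

// Write out any remaining bits, padding the low end of the last byte with zeros.
void flushBits()
{
    if (length > 0) {
        byte <<= (8 - length);
        output.push_back(byte);
        byte = 0;
        length = 0;
    }
}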
There are two types of endianness, big-endian and little-endian (technically there are more, like middle-endian, but big and little are the most common). If you want the big-endian format (as it seems you do), then the most significant byte comes first; with little-endian, the least significant byte comes first.
Wikipedia has some good examples
It looks like what you are trying to do is store the bits themselves within the byte in reverse order, which is not what you want. A byte is endian-agnostic and does not need to be flipped. Multi-byte types such as uint32_t may need their byte order changed, depending on what endianness you want to achieve.
Maybe what you are referring to is bit numbering, in which case the code you have should largely work (although you should compare length to 7, not 8). The order you place the bits in pushBit would end up with the first bit you pass being the most significant bit.
Bits aren't addressable by definition (at least if we're talking about standard C++, not something like C51 or its C++ successor), so from the point of view of a high-level language, or even of assembler pseudo-code, it doesn't matter what physical direction LSB -> MSB takes: a bit-wise << always shifts from LSB towards MSB. Bit order is referred to as bit numbering and is a separate feature from endianness, related to the hardware implementation.
Bit fields in C++ may appear to change order because in common use cases, e.g. network communication, bits are described in the opposite order, but in fact the way bit fields are packed into a byte is implementation-defined; there is no guarantee that there are no gaps or that the order is preserved.
The minimal addressable unit of memory in C++ is of char size, and that's where your concern with endianness ends. In the rare case where you actually have to change the bit order (when? working with some incompatible hardware?), you have to do so explicitly.
Note that when working with Ethernet or another network protocol you should not do so; any bit-order change is done by the hardware (the first bit sent over the wire is the least significant one).

c++ memcpy a struct into a byte array

I have a problem copying the data of a struct into my byte array. This byte array is used to pass information through an interface. For plain data types I have to use byte swapping.
But now I have a struct. When I use memcpy, the values of the struct end up byte-swapped.
How can I copy the struct easily and "correctly" into the byte array?
memcpy(byteArray, &stData, sizeof(stData));
stData contains simple integers; 0x0001 ends up in the byte array as 0x01 0x00, i.e. byte-swapped relative to what I need.
If you are on an x86 architecture machine, then integers are stored in "Little Endian" order with the least significant bytes first. That is why 0x0001 will appear as 0x01 0x00 in a byte array. As long as you also unpack on a machine with the same architecture, this will work OK, but this is one of the (many) reasons that binary serialization is non-trivial.
If you need to exchange binary data between machines in a safe manner, then you can either decide on a standard (e.g. convert all binary data to little-endian or big-endian; network wire protocols generally convert to big-endian, though many high-performance proprietary systems stick with little-endian, since today this is the native format on most machines), or look for a portable binary file format such as HDF or BSON (these store metadata about the binary data being stored). Finally, you can convert to ASCII (XML, JSON). Also, note that "big" and "little" aren't the only choices; "every machine" is a tall order since they haven't all been invented yet. :)
See wikipedia or search for "endian" on SO for many examples.
Your problem is that your machine is little-endian and you want to store the data as big-endian.
In the standard C library you have functions to do this:
htons, htonl: host (your little-endian machine) to network standard (big-endian).
s is for 16 bits and l is for 32 bits (http://linux.die.net/man/3/htons).
For a 4-byte integer you can do:
#include <arpa/inet.h>
#include <stdint.h>
...
*(uint32_t*)byteArray = htonl((uint32_t)stData);
For an 8-byte int you can use bswap_64 (https://www.gnu.org/software/gnulib/manual/html_node/bswap_005f64.html).
But it only exists in GNU libc. Otherwise you have to swap manually; there are lots of examples on the web.
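Since the answer leaves the manual swap as an exercise, here is one possible sketch (mine, with an arbitrary function name) that writes a 64-bit value in big-endian order using only shifts, so it works regardless of the host's byte order:

#include <cstdint>

// Store a 64-bit host value into a byte array in big-endian order
// by writing the bytes explicitly, most significant byte first.
void store_u64_be(unsigned char* out, std::uint64_t v)
{
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(v >> (56 - 8 * i));
}

// Usage, e.g.: store_u64_be(byteArray, someValue);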

How to write EXACTLY n bits to UDP packet c++

As a way to learn the material from my university course, I decided to start writing a networking library in C++11 using UDP and the new threading memory model. Here is my biggest roadblock at the moment.
The size of a byte is platform specific. memcpy copies and writes with respect to the size of a byte on that platform. So how do I go about making platform-agnostic code that writes exactly N bits to my UDP packets? N will typically be a multiple of 8, but I need to be sure that when I deal with N bytes, the same number of bits is modified, regardless of platform.
The closest I think I have come to a solution is to make a base struct of 32 bits, and access it as groups of 8. E.g.
struct Data
{
    char a : 8;
    char b : 8;
    char c : 8;
    char d : 8;
};
This way, I know that each char will be limited to 8 bits. My messages would all be a multiple of 32; that is no problem (this struct would actually help when dealing with endianness). But how can I be sure that the compiler won't pad the structure to the platform's native boundary? Will this work?
Your data structure is not adequate. Whether plain char behaves as signed char (7 value bits, 1 sign bit) or unsigned char (8 value bits) is defined by your compiler.
Most computers sending out packets use unsigned char or uint8_t for the octets.
Also, beware of multi-byte ordering, also known as endianness. Big-endian means the most significant byte comes first; little-endian means the least significant byte comes first.
Many messaging schemes will pad remaining bits with zeros to make them come out to 8, 16 or 32 bit quantities.
You're probably looking for the #pragma pack(1) directive. As someone pointed out, though, you're still going to be dealing with multiples of 8 bits.
char should always be exactly 1 byte, AFAIK although I could be wrong.
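A small sketch (mine, not from the answers) combining the suggestions above: force tight packing with the pack pragma (supported by MSVC, GCC and Clang) and let a C++11 static_assert verify the resulting size at compile time:

#include <cstdint>

#pragma pack(push, 1)          // ask the compiler not to insert padding
struct Data
{
    std::uint8_t a;
    std::uint8_t b;
    std::uint8_t c;
    std::uint8_t d;
};
#pragma pack(pop)

// Fail the build instead of silently producing a different wire layout.
static_assert(sizeof(Data) == 4, "Data must occupy exactly 4 bytes");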

big endian little endian conversion

I saw a question on Stack Overflow about how to convert from one endianness to another, and the solution was like this:
#include <algorithm>  // std::reverse

template <typename T>
void swap_endian(T& pX)
{
    char& raw = reinterpret_cast<char&>(pX);
    std::reverse(&raw, &raw + sizeof(T));
}
My question is: will this solution swap the bits correctly? It swaps the bytes in the correct order, but it will not swap the bits.
Yes it will, because there is no need to swap the bits.
Edit:
Endianness affects the order in which the bytes are written for values of 2 bytes or more. Little-endian means the least significant byte comes first, big-endian the other way around.
If a big-endian machine receives a stream of bytes written by a little-endian system, there is no debate about what the most significant bit is within each byte. If the bit order were affected, you could not read each other's byte streams reliably (even if it were just plain 8-bit ASCII).
This cannot be determined automatically for 2-byte or bigger values, as the file system (or network layer) does not know whether you are sending data a byte at a time or sending ints that are (e.g.) 4 bytes long.
If you have a direct 1-bit serial connection with another system, you will have to agree on little- or big-endian bit ordering at the transport layer.
Big-endian vs little-endian concerns how bytes are ordered within a larger unit, such as an int, long, etc. The ordering of bits within a byte is the same.
"Endianness" generally refers to byte order, not the order of the bits within those bytes. In this case, you don't have to reverse the bits.
You are correct, that function would only swap the byte order, not individual bits. This is usually sufficient for networking. Depending on your needs, you may also find the htons() family of functions useful.
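For completeness, a short usage sketch (my own example) of the swap_endian template from the question alongside the htons() approach mentioned above; it assumes a POSIX system for <arpa/inet.h>:

#include <algorithm>    // std::reverse
#include <arpa/inet.h>  // htons (POSIX)
#include <cstdint>
#include <cstdio>

// swap_endian repeated from the question so this sketch compiles on its own.
template <typename T>
void swap_endian(T& pX)
{
    char& raw = reinterpret_cast<char&>(pX);
    std::reverse(&raw, &raw + sizeof(T));
}

int main()
{
    std::uint16_t a = 0x1234;
    swap_endian(a);                   // bytes reversed, bits untouched: a is now 0x3412
    std::printf("%04X\n", static_cast<unsigned>(a));

    std::uint16_t n = htons(0x1234);  // host-to-network (big-endian); a no-op on big-endian hosts
    std::printf("%04X\n", static_cast<unsigned>(n));
}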
From Wikipedia:
Most modern computer processors agree on bit ordering "inside" individual bytes (this was not always the case). This means that any single-byte value will be read the same on almost any computer one may send it to.

How to guarantee bits of char and short for communication to external device

Hello, I am writing a library for communicating with an external device via an RS-232 serial connection.
Often I have to send a command that includes an 8-bit = 1-byte character or a 16-bit = 2-byte number. How do I do this in a portable way?
Main problem
From reading other questions it seems that the standard does not guarantee 1 byte = 8 bits (defined in the Standard §1.7/1):
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementation-defined.
How can I guarantee the number of bits in a char? My device expects exactly 8 bits, rather than at least 8 bits.
I realise that almost all implementations have 1 byte = 8 bits, but I am curious as to how to guarantee it.
Short->2 byte check
I hope you don't mind, I would also like to run my proposed solution for the short -> 2 byte conversion by you. I am new to byte conversions and cross-platform portability.
To guarantee the number of bytes in the short, I guess I am going to need to
check sizeof(short). If sizeof(short) == 2, convert to bytes and check the byte ordering (as here);
if sizeof(short) > 2, convert the short to bytes, check the byte ordering (as here), then check that the most significant bytes are empty and remove them?
Is this the right thing to do? Is there a better way?
Many thanks
AFAIK, communication with the serial port is somewhat platform/OS dependent, so when you write the low-level part of it, you'll know the platform, its endianness and CHAR_BIT very well. Seen this way, the question doesn't really make sense.
Also, don't forget that UART hardware is able to transmit 7- or 8-bit words, so it does not depend on the system architecture.
EDIT: As I mentioned, the UART's word size is fixed (let's consider mode 3 with 8 bits, as that is the most standard); the hardware itself won't send more than 8 bits, so with one send command it will send exactly 8 bits, regardless of the machine's CHAR_BIT. In this way, by using one single send per byte and
unsigned short i;
send(i);
send(i>>8);
you can be sure it will do the right thing.
Also, a good idea would be to see what exactly boost.asio is doing.
This thread seems to suggest you can use CHAR_BIT from <climits>. This page even suggests 8 is the minimum number of bits in a char... I don't know how the quote from the standard relates to this.
For fixed-size integer types, if using MSVC2010 or GCC, you can rely on C99's <stdint.h> (even in C++) to define (u)int8_t and (u)int16_t which are guaranteed to be exactly 8 and 16 bits wide respectively.
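Building on that, a hedged sketch (mine, assuming <cstdint> is available) of splitting a 16-bit value into two explicit octets before handing them to whatever routine writes to the serial port:

#include <cstdint>

// Split a 16-bit value into two octets in a well-defined order
// (here little-endian: low byte first), independent of the host.
void pack_u16_le(std::uint8_t out[2], std::uint16_t value)
{
    out[0] = static_cast<std::uint8_t>(value & 0xFF);         // low byte
    out[1] = static_cast<std::uint8_t>((value >> 8) & 0xFF);  // high byte
}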
CHAR_BIT from the <climits> header tells you the number of bits in a char. This is at least 8. Also, a short int uses at least 16 bits for its value representation. This is guaranteed by the minimum value ranges:
type              can at least represent
----------------------------------------
unsigned char     0 ... 255
signed char       -127 ... 127
unsigned short    0 ... 65535
signed short      -32767 ... 32767
unsigned int      0 ... 65535
signed int        -32767 ... 32767
see here
Regarding portability, whenever I write code that relies on CHAR_BIT==8 I simply write this:
#include <climits>
#if CHAR_BIT != 8
#error "I expect CHAR_BIT==8"
#endif
As you said, this is true for almost all platforms and if it's not in a particular case, it won't compile. That's enough portability for me. :-)
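If a C++11 compiler is available, the same guard can be written without the preprocessor; a tiny alternative sketch:

#include <climits>

// C++11 alternative to the #error check above: fails at compile time
// on any platform where char is not exactly 8 bits wide.
static_assert(CHAR_BIT == 8, "I expect CHAR_BIT == 8");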