As a way to learn my university course content, I decided to start writing a networking library in C++11 using UDP and the new threading memory model. Here is my biggest roadblock at the moment.
The size of a byte is platform specific, and memcpy copies and writes with respect to the size of a byte on that platform. So how do I go about making platform-agnostic code that writes exactly N bits to my UDP packets? N will typically be a multiple of 8, but I need to be sure that when I deal with N bytes, the same number of bits is modified, regardless of platform.
The closest I think I have come to a solution is to make a base struct of 32 bits, and access it as groups of 8. E.g.
struct Data
{
    char a : 8;
    char b : 8;
    char c : 8;
    char d : 8;
};
This way, I know that each char will be limited to 8 bits. My messages would all be a multiple of 32 bits - that is no problem (this struct would actually help when dealing with endianness). But how can I be sure that the compiler won't pad the structure to the platform's native boundary? Will this work?
Your data structure is not adequate. Plain char may behave as signed char (7 value bits plus a sign bit on the usual 8-bit platforms) or as unsigned char (8 value bits), depending on your compiler.
Most computers sending out packets use unsigned char or uint8_t for the octets.
Also, beware of multibyte ordering, also known as endianness. Big endian is where the most significant byte comes first; little endian is where the least significant byte comes first.
Many messaging schemes will pad remaining bits with zeros to make them come out to 8, 16 or 32 bit quantities.
You're probably looking for the #pragma pack(1) directive (or #pragma pack(push, 1) / #pragma pack(pop) to scope it). As someone pointed out though, you're still going to be dealing with multiples of 8 bits.
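For illustration, here is a minimal sketch of that suggestion, assuming a compiler that supports #pragma pack(push/pop) and <cstdint> (the struct name Data is simply reused from the question):
#include <cstdint>

// Sketch only: four unsigned octets. The uint8_t members need no internal
// padding anyway; the pragma and the static_assert just document the intent
// that the struct occupies exactly four octets on any supported compiler.
#pragma pack(push, 1)
struct Data
{
    std::uint8_t a;
    std::uint8_t b;
    std::uint8_t c;
    std::uint8_t d;
};
#pragma pack(pop)

static_assert(sizeof(Data) == 4, "Data must be exactly four octets");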
char is always exactly 1 byte by definition (sizeof(char) == 1), but note that a byte is not guaranteed to be 8 bits.
Related
When I look up examples for htonl, it always returns a uint32_t. However, when I call htonl in VS2015 using Winsock2.h, it returns a u_long.
On my machine, when I compile for 32-bit and for 64-bit, I get that the size of a u_long is 4 bytes. I read online that on a 64-bit architecture a long should be 8 bytes. Will this ever be the case? I am worried that I will have compatibility issues if a u_long is not the same number of bytes as uint32_t when the data is sent over the socket.
TL;DR - Will a u_long always be 4 bytes? If not, how should you reliably send a 32 bit integer over a socket?
u_long is a typedef for unsigned long, and long in turn is guaranteed to be at least 32 bits - that is, at least 4 bytes. On some systems it may be larger - but there's really no way to know in advance.
So, when you do network communication and want to send integers larger than one byte, you have to take care to restrict the size yourself. Don't just send sizeof(u_long) bytes; restrict it to exactly four bytes.
Also when dealing with integers you have the little matter of which byte order is used to send/receive the data.
If you have the same OS on both sides, this won't be an issue - but if you switch between Windows and Linux, for example, it could be.
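For illustration, a minimal sketch of that idea, assuming nothing beyond <cstdint> (the pack_uint32 name is made up; send the resulting four bytes with whatever socket call you already use):
#include <cstdint>

// Sketch: pack a 32-bit value into exactly four bytes in network (big-endian)
// order, independent of the host's long size and byte order.
void pack_uint32(std::uint32_t value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>(value >> 24); // most significant byte first
    out[1] = static_cast<unsigned char>(value >> 16);
    out[2] = static_cast<unsigned char>(value >> 8);
    out[3] = static_cast<unsigned char>(value);
}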
I saw a question on Stack Overflow about how to convert from one endianness to another, and the solution was like this:
#include <algorithm>  // std::reverse

template <typename T>
void swap_endian(T& pX)
{
    // Treat the object as a raw sequence of bytes and reverse their order.
    char& raw = reinterpret_cast<char&>(pX);
    std::reverse(&raw, &raw + sizeof(T));
}
My question is: will this solution convert the endianness correctly? It swaps the bytes in the correct order, but it will not swap the bits.
Yes it will, because there is no need to swap the bits.
Edit:
Endianness affects the order in which the bytes are written for values of 2 bytes or more. Little endian means the least significant byte comes first; big endian means the other way around.
If you receive a big-endian stream of bytes written by a little-endian system, there is no debate about what the most significant bit is within the bytes. If the bit order were affected, you could not read each other's byte streams reliably (even if it was just plain 8-bit ASCII).
This cannot be automatically determined for 2-byte or bigger values, as the file system (or network layer) does not know whether you are sending data a byte at a time or sending ints that are (e.g.) 4 bytes long.
If you have a direct 1-bit serial connection with another system, you will have to agree on little or big endian bit ordering at the transport layer.
Big endian vs. little endian concerns how bytes are ordered within a larger unit, such as an int, a long, etc. The ordering of bits within a byte is the same.
"Endianness" generally refers to byte order, not the order of the bits within those bytes. In this case, you don't have to reverse the bits.
You are correct, that function would only swap the byte order, not individual bits. This is usually sufficient for networking. Depending on your needs, you may also find the htons() family of functions useful.
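For example, a minimal sketch of the htons()/ntohs() pair applied to a 16-bit port number, assuming a POSIX-style <arpa/inet.h> (on Windows the declarations live in <winsock2.h> and you link against Ws2_32.lib):
#include <cstdint>
#include <arpa/inet.h>   // on Windows: <winsock2.h>

// Sketch: convert a 16-bit port number to network byte order before sending,
// and back to host byte order after receiving.
std::uint16_t to_wire(std::uint16_t port)   { return htons(port); }
std::uint16_t from_wire(std::uint16_t wire) { return ntohs(wire); }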
From Wikipedia:
Most modern computer processors agree on bit ordering "inside" individual bytes (this was not always the case). This means that any single-byte value will be read the same on almost any computer one may send it to.
Hello, I am writing a library for communicating with an external device via an RS-232 serial connection.
Often I have to communicate a command that includes an 8-bit (1-byte) character or a 16-bit (2-byte) number. How do I do this in a portable way?
Main problem
From reading other questions, it seems that the standard does not guarantee 1 byte = 8 bits (defined in the Standard, §1.7/1):
The fundamental storage unit in the C++ memory model is the byte. A byte is at least large enough to contain any member of the basic execution character set and is composed of a contiguous sequence of bits, the number of which is implementation-defined.
How can I guarantee the number of bits of char? My device expects 8-bits exactly, rather than at least 8 bits.
I realise that almost all implementations have 1 byte = 8 bits, but I am curious as to how to guarantee it.
Short->2 byte check
I hope you don't mind, but I would also like to run my proposed solution for the short -> 2-byte conversion by you. I am new to byte conversions and cross-platform portability.
To guarantee the number of bytes of the short, I guess I am going to need to do a sizeof(short).
If sizeof(short) == 2, convert to bytes and check the byte ordering (as here).
If sizeof(short) > 2, convert the short to bytes, check the byte ordering (as here), then check that the most significant bytes are empty and remove them?
Is this the right thing to do? Is there a better way?
Many thanks
AFAIK, communication with the serial port is platform/OS dependent anyway, so when you write the low-level part of it you will already know the platform, its endianness and its CHAR_BIT very well. Seen that way, the question is somewhat moot.
Also, don't forget that UART hardware is able to transmit 7 or 8 bit words, so it does not depend on the system architecture.
EDIT: I mentioned that the UART's word size is fixed (let's consider mode 3 with 8 bits, as the most common): the hardware itself won't send more than 8 bits, so one send command sends exactly 8 bits, regardless of the machine's CHAR_BIT. In this way, by using one send call per byte, as in
unsigned short i;
send(i);        // low byte: the UART transmits only the low 8 bits
send(i >> 8);   // high byte
you can be sure it will do the right thing.
Also, a good idea would be to see what exactly Boost.Asio is doing.
This thread seems to suggest you can use CHAR_BIT from <climits>. This page even suggests 8 is the minimum number of bits in a char... I don't know how the quote from the standard relates to this.
For fixed-size integer types, if using MSVC2010 or GCC, you can rely on C99's <stdint.h> (even in C++) to define (u)int8_t and (u)int16_t which are guaranteed to be exactly 8 and 16 bits wide respectively.
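For the 16-bit value from the question, here is a minimal sketch using those fixed-width types; the exact byte order is only an assumption that you must agree on with the device:
#include <stdint.h>

// Sketch: split a 16-bit value into exactly two octets, least significant byte
// first (agree on this order with the device beforehand), ready to be handed
// to whatever serial write call your platform provides.
void encode_u16(uint16_t v, uint8_t out[2])
{
    out[0] = static_cast<uint8_t>(v & 0xFF);  // low byte
    out[1] = static_cast<uint8_t>(v >> 8);    // high byte
}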
CHAR_BIT from the <climits> header tells you the number of bits in a char. This is at least 8. Also, a short int uses at least 16 bits for its value representation. This is guaranteed by the minimum value ranges:
type             can at least represent
---------------------------------------
unsigned char    0 ... 255
signed char      -127 ... 127
unsigned short   0 ... 65535
signed short     -32767 ... 32767
unsigned int     0 ... 65535
signed int       -32767 ... 32767
see here
Regarding portability, whenever I write code that relies on CHAR_BIT==8 I simply write this:
#include <climits>
#if CHAR_BIT != 8
#error "I expect CHAR_BIT==8"
#endif
As you said, this is true for almost all platforms and if it's not in a particular case, it won't compile. That's enough portability for me. :-)
Problem: I cannot understand the number 256 (2^8) in the extract of the IBM article:
On the other hand, if it's a
big-endian system, the high byte is 1
and the value of x is 256.
Assume each element in an array consumes 4 bits; then the processor should read somehow: 1000 0000. If it is big endian, it is 0001 0000, because endianness does not affect bits inside bytes. [2] Contradiction to the 256 in the article!?
Question: Why is the number 256_dec (=1000 0000_bin) and not 32_dec (=0001 0000_bin)?
[2] Endian issues do not affect sequences that have single bytes, because "byte" is considered an atomic unit from a storage point of view.
Because a byte is 8 bits, not 4. The 9th least significant bit in an unsigned int will have value 2^(9-1)=256. (the least significant has value 2^(1-1)=1).
From the IBM article:
unsigned char endian[2] = {1, 0};
short x;
x = *(short *) endian;
They're correct; the value is (short)256 on big-endian, or (short)1 on little-endian.
Writing out the bits, it's an array of {00000001_{base2}, 00000000_{base2}}. Big endian would interpret that byte array reading left to right; little endian would swap the two bytes.
256dec is not 1000_0000bin, it's 0000_0001_0000_0000bin.
With swapped bytes (1 byte = 8 bits) this looks like 0000_0000_0000_0001bin, which is 1dec.
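For illustration, a minimal sketch of the same test without the pointer cast, using memcpy; it returns true on a big-endian machine and false on a little-endian one:
#include <cstdint>
#include <cstring>

// Sketch of the article's test: on a big-endian machine the byte pair {1, 0}
// reads back as 256 (0x0100); on a little-endian machine it reads back as 1.
bool is_big_endian()
{
    unsigned char endian[2] = {1, 0};
    std::uint16_t x;
    std::memcpy(&x, endian, sizeof x);
    return x == 256;
}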
Answering your followup question: briefly, there is no "default size of an element in an array" in most programming languages.
In C (perhaps the most popular programming language), the size of an array element -- or anything, really -- depends on its type. For an array of char, the elements are usually 1 byte. But for other types, the size of each element is whatever the sizeof() operator gives. For example, many C implementations give sizeof(short) == 2, so if you make an array of short, it will then occupy 2*N bytes of memory, where N is the number of elements.
Many high-level languages discourage you from even attempting to discover how many bytes an element of an array requires. Guaranteeing a fixed number of bytes ties the designers' hands to always using that many bytes, which is good for transparency and for code that relies on the binary representation, but bad for backward compatibility whenever some reason comes along to change the representation.
Hope that helps. (I didn't see the other comments until after I wrote the first version of this.)
I have a byte array that holds hex values, and I initially put those values into an unsigned long.
I am using a 32-bit processor via Ubuntu at the moment, but I might have to port this program to a 64-bit processor.
Now, I am aware of the strtoul function, but since I was able to convert the values without any issues via a direct assignment, I did not bother with that function. The reason I put them in an unsigned long was that I was thinking about little/big endian issues, and so using a register-sized type like signed long would just take care of that problem for me regardless of processor. Now, however, I have been thinking about how my program would work on a 64-bit processor.
Since I am on a 32-bit processor it might only recognize a 32-bit long, whereas a 64-bit processor might only recognize a 64-bit long, which would put my signed long array in jeopardy. So, to fix this issue, I just made that signed array into long long. Would that address my concerns, or do I need to do something else?
Some help and explanation would be appreciated. All my code is in C++.
Instead of using long or long long you should use a fixed-width typedef like uint32_t, or something similar, so it is 32 bits on all platforms - unless this isn't what you want?
It seems you do have a potential problem with endianness though, if you are simply doing:
unsigned char bytes[4] = {0x12, 0x23, 0xff, 0xed};  // unsigned char: 0xff and 0xed would not fit in a signed char
uint32_t* p = reinterpret_cast<uint32_t*>(bytes);   // a long* here would over-read on platforms where long is 8 bytes
std::cout << std::hex << *p << std::endl;           // prints edff2312 on a little endian platform, 1223ffed on a big endian one.
since the actual value of the bytes when interpreted as an integer will change depending on endianness. There is a good answer on converting endianness here.
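For illustration, a minimal sketch of an endian-independent alternative that assembles the value explicitly from the bytes (the read_be32 name is made up), so the result is 0x1223ffed on every platform:
#include <cstdint>

// Sketch: build the 32-bit value explicitly from four bytes taken in
// big-endian order, so the result does not depend on the host's endianness.
std::uint32_t read_be32(const unsigned char* b)
{
    return (std::uint32_t(b[0]) << 24) |
           (std::uint32_t(b[1]) << 16) |
           (std::uint32_t(b[2]) << 8)  |
            std::uint32_t(b[3]);
}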
1) Signed vs. unsigned does not make you immune to endian issues. The only endian-agnostic data type is a byte (char). With everything else you need to swap the byte order if the two machines have different endianness.
2) A 64-bit machine will always provide you with some 32-bit integer type which you can use to pull values out of your array. So that shouldn't be an issue, as long as you're sure that both machines are using a 32-bit int (and you properly handle the endianness of the data).
You might want to look at SO 2032744 for an example of big-endian vs little-endian issues.
I'm not sure what you mean by saying that using a register would resolve your endianness issues; we'd need to see the code to know. However, if you need to transfer integer values over the wire between different machines, you need to be sure that you are handling both the size and the byte order correctly. That means both ends must agree on how to handle it - even if they actually do things differently.
Copying a byte array into a 'long' on an Intel platform will produce different results from copying the same array into a 'long' on a SPARC platform. To go via a register, you'd have to use code similar to:
#include <stdint.h>

typedef uint32_t Uint4;   /* exactly 32 bits wide */

/* Store a 4-byte value into a byte array in big-endian order (MSB first). */
void st_uint4(Uint4 l, char *s)
{
    s += sizeof(Uint4) - 1;
    *s-- = l & 0xFF;      /* least significant byte goes last */
    l >>= 8;
    *s-- = l & 0xFF;
    l >>= 8;
    *s-- = l & 0xFF;
    l >>= 8;
    *s = l & 0xFF;        /* most significant byte goes first */
}

/* Load a 4-byte value from a big-endian byte array. */
Uint4 ld_uint4(const char *s)
{
    int i;
    Uint4 j = 0;
    for (i = 0; i < 4; i++)
    {
        j = (j << 8) | (*s++ & 0xFF);
    }
    return j;
}
There are multiple ways to write that code.
Addressing the comments:
When dealing with data across machines, you have to be very careful. The two functions shown are inverses of each other. The ld_uint4() function takes a byte array and loads it into a 4-byte unsigned integer (assuming you have a typedef for Uint4 that maps to a 4-byte unsigned integer - uint32_t from inttypes.h or stdint.h is a good bet). The st_uint4() function does the reverse operation. This code uses a big-endian storage format (the MSB is first in the byte array), but the same code is used on both types of platform (no performance advantage to either - and no conditional compilation, which is probably more important). You could write the code to work with little-endian storage; you could also write the code so that there is less penalty on one type of machine versus the other.
Understanding data layouts on disk is crucial - defining them carefully and in a platform neutral way is also crucial. Handling (single-byte code set) strings is easy; handling wide character strings (UTF-16 or UTF-32) is like handling integers - and you can use code similar to the code above for Uint2 and Uint8 if you wish (I have such functions pre-packaged, for example - I just copied the Uint4 versions; I also have SintN functions - for the copying stuff, the difference is not crucial, but for memory comparisons, the comparison techniques for signed and unsigned values are different).
Handling float and double is trickier still - though if you can safely assume IEEE 754 format, it is primarily a big-endian vs little-endian issue that you face (that and perhaps some skulduggery with a union). The code-base I work with leaves double/float platform dependent (a nuisance, but a decision dating back to the days before IEEE 754 was ubiquitous) so I don't have platform neutral code for that. Also beware of alignments; Intel chips allow misaligned access but other chips (SPARC, PowerPC) do not, or incur large overheads. That means if you copy a 4-byte value, the source and target addresses must be 4-byte aligned if you do a simple copy; the store/load functions above do not have that as a problem and can deal with arbitrary alignments. Again, be wary of over-optimization (premature optimization).
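For completeness, a minimal usage sketch of the two functions above; the value is arbitrary, and the round trip yields the same result whatever the endianness of the machines involved, because the wire format is fixed as big-endian:
#include <cassert>

// Sketch: serialize a value into a 4-byte buffer and read it back.
void round_trip_example()
{
    char buffer[4];
    st_uint4(0x12345678u, buffer);   // buffer now holds 12 34 56 78
    Uint4 value = ld_uint4(buffer);
    assert(value == 0x12345678u);
}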