Converting network byte order (big endian) to little endian - c++

I have found the following function on MSDN which converts an unsigned long from network byte order to an unsigned long in host byte order, i.e. little-endian. It is declared as:
u_long WSAAPI ntohl(
_In_ u_long netlong
);
The MSDN documentation says that it converts a 32-bit number. But in C++, as I have read, long and int are not the same, i.e. long is not guaranteed to be 32 bits or the same size as an int (INT_MAX).
So I wonder: is there a similar function which takes a 32-bit value such as unsigned int instead of unsigned long?

The documentation specifically says that ntohl's netlong parameter is a 32-bit value:
netlong [in]
A 32-bit number in TCP/IP network byte order.
I have read that long and int are not the same i.e. long is not
guaranteed to be 32 bits or the same size as an int (INT_MAX).
You're right -- in Standard C++ a long is not guaranteed to be any particular size, except that it must be at least 32 bits.
However, since we're talking about the endian conversion functions, we're talking about platform specifics here. We need to drill down into what a long is under Windows. And under Windows, a long is 32 bits:
LONG
A 32-bit signed integer. The range is –2147483648 through 2147483647
decimal. This type is declared in WinNT.h as follows:
typedef long LONG;

ntohl() and htonl() always operate on 32-bit values on all platforms. That is part of the BSD standard. On platforms where (u_)long is larger than 32-bits, a different 32-bit data type will be used instead. For instance, some platforms, including Linux and FreeBSD, use uint32_t to avoid any issues with the size of (u_)long.
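If you would rather spell the conversion in terms of uint32_t and not depend on the width of (u_)long at all, a hand-rolled big-endian-to-host conversion is a few lines of standard C++. This is a minimal sketch; the name ntoh32 is made up for illustration and is not part of Winsock or the BSD sockets API:

#include <cstdint>

std::uint32_t ntoh32(std::uint32_t netvalue)
{
    // View the 32-bit value as raw bytes in memory order.
    const unsigned char* b = reinterpret_cast<const unsigned char*>(&netvalue);
    // Network byte order is big endian: byte 0 is the most significant byte.
    return (std::uint32_t(b[0]) << 24) |
           (std::uint32_t(b[1]) << 16) |
           (std::uint32_t(b[2]) << 8)  |
            std::uint32_t(b[3]);
}

Because it assembles the result from individual bytes, the same code produces the correct host value on both little-endian and big-endian machines.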

Related

fastest/smallest signed integer type

I am reading about the fixed-width integer types (cppreference) and came across the types int_fast8_t, int_fast16_t, int_fast32_t and int_least8_t, int_least16_t, int_least32_t, etc. My questions are the following:
What does it mean to say, for example, that int_fast32_t is the fastest signed integer type (with at least 32 bits)? Is the more common type unsigned int slow?
What does it mean to say, for example, that int_least32_t is the smallest signed integer type?
What are the differences between int_fast32_t, int_least32_t and unsigned int?
int_fast32_t means that it is the fastest type of at least 32 bits for the processor. For most processors it is probably a 32-bit int. But imagine a 48-bit processor without a 32-bit add instruction; keeping everything 48 bits is faster. int_least32_t is the smallest type for the target that can hold 32 bits. On the hypothetical 48-bit processor, there might be a 32-bit data type supported, with library support to implement it. Or int_least32_t might also be 48 bits. int is usually the fastest integer type for the target, but there is no guarantee as to the number of bits you'll get.
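A quick way to see what a given toolchain chose for these typedefs is to print their sizes; the values in the comment are only what a common x86-64 glibc build happens to produce, not guarantees:

#include <cstdint>
#include <cstdio>

int main()
{
    // Commonly prints 4, 8, 4 on x86-64 Linux (glibc makes int_fast32_t a long),
    // but only the "at least 32 bits" part is guaranteed by the standard.
    std::printf("int_least32_t: %zu bytes\n", sizeof(std::int_least32_t));
    std::printf("int_fast32_t:  %zu bytes\n", sizeof(std::int_fast32_t));
    std::printf("int:           %zu bytes\n", sizeof(int));
}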

Why are there different names for same type of data unit?

As far as I know, in C++ on a 32-bit compiler, int = __int32 = long = DWORD. But why have so many? Why not just one?
If I were to pick a name, int32 seems most appropriate, since there is no confusion as to what it could be.
int is a pre-C99 type which is guaranteed to be at least 16 bits, but is 32 bits on most modern architectures. (It was originally intended to be the "native" word size, but even on 64-bit architectures it is usually still 32 bits, largely for backwards compatibility reasons.)
long is a pre-C99 type which is guaranteed to be at least 32 bits, but is allowed to be wider. (Few compilers make it longer, even on 64-bit architectures, largely for backwards compatibility reasons.)
__int32/__int32_t is a nonstandard typedef which was implemented by many C compilers and runtime libraries, to guarantee a fixed width pre-C99.
int32_t is a C99 type which is guaranteed to be exactly 32 bits.
DWORD is a typedef from the original Windows API which is guaranteed to be exactly 32 bits, from the days when there was no language-defined type of exactly 32 bits.
So basically, the large number of ways to say "32-bit integer" come from how C dragged its feet on standardizing fixed-width types, and from the long tenure of 32-bit processors dominating the field, causing everyone to standardize on 32 bits as the "normal" integer size.
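If code does assume particular widths, it is cheap to make that assumption explicit. A small sketch of the kind of compile-time checks one might add (the first assertion merely restates what int32_t already guarantees; the others restate the language minimums):

#include <climits>
#include <cstdint>

// Compile-time sanity checks; they fail loudly if the code is ported
// to a platform that breaks the width assumptions.
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "int32_t is exactly 32 bits");
static_assert(sizeof(int)  * CHAR_BIT >= 16, "int is at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long is at least 32 bits");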
Because of legacy applications. An int doesn't describe how big it is at all. It's an integer. Big deal.
In the 16-bit era, an int was not a long. DWORD, being a double word, was precise: a word is 2 bytes, and therefore a DWORD must be two of them.
The __intXX types are Microsoft-specific.
So there are lots of different reasons why different projects (e.g. Microsoft Windows) use different types.
While compilers today are typically 32-bit, this has not always been the case, and there are compilers that are 64-bit.
The term DWORD originates from way back when Windows was a 16-bit segmented mode application (many members here have probably never worked on a 16-bit segmented mode environment). It is "two 16-bit words", treated, at least these days, as an unsigned 32-bit value.
The type int32_t is defined by the C standard document (and through inheritance, also in C++). It is GUARANTEED to only exist if it is actually exactly 32 bits. On a machine with 36-bit words, there is no int32_t (there is an int_least32_t, which should exist on all systems that support AT LEAST 32 bits).
long is 32 bits in a Windows 32- or 64-bit compiler, but 64 bits in a Linux 64-bit compiler, and 32 bits in a Linux 32-bit compiler. So it's definitely "variable size".
It is also often a good idea to pick your OWN name for types. That is assuming you do care at all - it's also fine to use int, long, etc, as long as you are not RELYING on them being some size - for(i = 0; i < 10; i++) x += i; will work with i and x being any integer type - the sum is even below 128, so char would work. Using int here will be fine, since it's likely to be a "fast" type. In some architectures, using long may make the code slower - especially in 16-bit architectures where long takes up two 16-bit words and needs to be dealt with using (typically) two or more operations for addition and subtraction for example. This can really slow code down in sensitive places.
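A minimal sketch of that "pick your OWN name" advice, with hypothetical alias names (u32, i16) that are not taken from any particular codebase:

#include <cstdint>

// Fix the widths once, in one header, and use the aliases wherever size matters.
using u32 = std::uint32_t;   // exactly 32 bits, unsigned
using i16 = std::int16_t;    // exactly 16 bits, signed

int sum_to_ten()
{
    int x = 0;                    // plain int is fine here: the values stay tiny
    for (int i = 0; i < 10; ++i)  // and int is normally a "fast" type
        x += i;
    return x;
}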
It is because they represent different types which can be translated to different sizes.
int is a default 'integer' and its size is not specified.
int32 says it is 32 bits (a four-byte integer).
long is a 'longer' integer which can occupy more bytes. On your 32-bit compiler it is still a 4-byte integer. A 'long long' type, which on Windows, as I remember, was __int64, is 64 bits.
DWORD is a Microsoft-introduced type. It is a 'double word', where a word, at that time, meant 'two bytes'.
Your choice of int32 is good when you know that you need a 32-bit integer.

When/where/why is a size_t not a uint?

I've been getting a lot of criticism for using uint instead of size_t, but every time I check, the toolchain I am working with turns out to define size_t as a uint.
Are there any compiler implementations where size_t is actually not a uint? What are the grounds for that criticism?
size_t is the "size matching the largest possible address range you can use in the machine" (or some words to roughly that effect).
In particular, size_t will be 64 bits on a 64-bit machine, and 32 bits on a 32-bit system.
I'm assuming uint is short for unsigned int, which is pretty much universally 32 bits these days (some older systems used 16-bit ints). So on a 64-bit system, an unsigned int will still be 32 bits, although memory allocations, strings, etc. can be larger than 32 bits in size - which would cause problems if you are trying to use uint for the size.
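A two-line program makes the mismatch visible; on a typical 64-bit Linux or Windows build it prints 4 for unsigned int and 8 for size_t, while a 32-bit build usually prints 4 for both, which is why size_t can look like a plain uint on some toolchains:

#include <cstddef>
#include <cstdio>

int main()
{
    // The actual values are platform-dependent; the point is that they can differ.
    std::printf("unsigned int: %zu bytes, size_t: %zu bytes\n",
                sizeof(unsigned int), sizeof(std::size_t));
}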

Long Vs. Int C/C++ - What's The Point?

As I've learned recently, a long in C/C++ is the same length as an int. To put it simply, why? It seems almost pointless to even include the datatype in the language. Does it have any uses specific to it that an int doesn't have? I know we can declare a 64-bit int like so:
long long x = 0;
But why does the language choose to do it this way, rather than just making a long well...longer than an int? Other languages such as C# do this, so why not C/C++?
When writing in C or C++, every datatype is architecture- and compiler-specific. On one system int is 32 bits, but you can find ones where it is 16 or 64; it's not defined, so it's up to the compiler.
As for long and int, it comes from the times when the standard integer was 16 bits and long was a 32-bit integer - and it indeed was longer than int.
The specific guarantees are as follows:
char is at least 8 bits (1 byte by definition, however many bits it is)
short is at least 16 bits
int is at least 16 bits
long is at least 32 bits
long long (in versions of the language that support it) is at least 64 bits
Each type in the above list is at least as wide as the previous type (but may well be the same).
Thus it makes sense to use long if you need a type that's at least 32 bits, int if you need a type that's reasonably fast and at least 16 bits.
Actually, at least in C, these lower bounds are expressed in terms of ranges, not sizes. For example, the language requires that INT_MIN <= -32767, and INT_MAX >= +32767. The 16-bit requirement follows from this and from the requirement that integers are represented in binary.
C99 adds <stdint.h> and <inttypes.h>, which define types such as uint32_t, int_least32_t, and int_fast16_t; these are typedefs, usually defined as aliases for the predefined types.
(There isn't necessarily a direct relationship between size and range. An implementation could make int 32 bits, but with a range of only, say, -2^23 .. +2^23-1, with the other 8 bits (called padding bits) not contributing to the value. It's theoretically possible (but practically highly unlikely) that int could be larger than long, as long as long has at least as wide a range as int. In practice, few modern systems use padding bits, or even representations other than 2's-complement, but the standard still permits such oddities. You're more likely to encounter exotic features in embedded systems.)
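Those minimum ranges can be written down as compile-time checks; on any conforming implementation the assertions below are tautologies, which is exactly the guarantee being described:

#include <climits>

// Always true on a conforming compiler; useful only as documentation of the minimums.
static_assert(INT_MAX >= 32767, "int covers at least -32767..32767");
static_assert(LONG_MAX >= 2147483647L, "long covers at least -2147483647..2147483647");
static_assert(sizeof(long long) * CHAR_BIT >= 64, "long long is at least 64 bits");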
long is not the same length as an int. According to the specification, long is at least as large as int. For example, on Linux x86_64 with GCC, sizeof(long) = 8, and sizeof(int) = 4.
long is not the same size as int, it is at least the same size as int. To quote the C++03 standard (3.9.1-2):
There are four signed integer types: “signed char”, “short int”,
“int”, and “long int.” In this list, each type provides at least as
much storage as those preceding it in the list. Plain ints have the
natural size suggested by the architecture of the execution
environment; the other signed integer types are provided to meet special needs.
My interpretation of this is "just use int, but if for some reason that doesn't fit your needs and you are lucky to find another integral type that's better suited, be our guest and use that one instead". One way that long might be better is if you're on an architecture where it is... longer.
I was looking for something completely unrelated and stumbled across this and needed to answer. Yeah, this is old, so for people who surf on in later...
Frankly, I think all the answers on here are incomplete.
The size of a long is the number of bits your processor can operate on at one time; it is also called a "word". A "half-word" is a short. A "doubleword" is a long long and is twice as large as a long (and was originally only implemented by vendors, not standard), and even bigger than a long long is a "quadword", which is twice the size of a long long but had no formal name (and is not really standard).
Now, where does the int come in? In part from the registers on your processor, and in part from your OS. Your registers define the native sizes the CPU handles, which in turn define the size of things like the short and long. Processors are also designed with a data size that is the most efficient size for them to operate on. That should be an int.
On today's 64-bit machines you'd assume, since a long is a word and a word on a 64-bit machine is 64 bits, that a long would be 64 bits and an int whatever the processor is designed to handle, but it might not be. Why? Your OS has chosen a data model and defined these data sizes for you (pretty much by how it's built). Ultimately, if you're on Windows (and using Win64) it's 32 bits for both long and int. Solaris and Linux use different definitions (long is 64 bits there). These definitions are called things like ILP64, LP64, and LLP64. Windows uses LLP64 and Solaris and Linux use LP64:
Model       ILP64   LP64   LLP64
int           64      32      32
long          64      64      32
pointer       64      64      64
long long     64      64      64
Where, e.g., ILP means int-long-pointer, and LLP means long-long-pointer.
To get around this most compilers seem to support setting the size of an integer directly with types like int32 or int64.
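A quick way to see which data model a particular build uses is to print the four sizes from the table; the comment shows the outputs one would expect under LLP64 (64-bit Windows), LP64 (64-bit Linux/Solaris), and ILP32 (32-bit builds):

#include <cstdio>

int main()
{
    // LLP64 typically prints 4/4/8/8, LP64 prints 4/8/8/8, ILP32 prints 4/4/4/8.
    std::printf("int: %zu, long: %zu, pointer: %zu, long long: %zu\n",
                sizeof(int), sizeof(long), sizeof(void*), sizeof(long long));
}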

unsigned int vs. size_t

I notice that modern C and C++ code seems to use size_t instead of int/unsigned int pretty much everywhere - from parameters for C string functions to the STL. I am curious as to the reason for this and the benefits it brings.
The size_t type is the unsigned integer type that is the result of the sizeof operator (and the offsetof operator), so it is guaranteed to be big enough to contain the size of the biggest object your system can handle (e.g., a static array of 8 GB).
The size_t type may be bigger than, equal to, or smaller than an unsigned int, and your compiler might make assumptions about it for optimization.
You may find more precise information in the C99 standard, section 7.17, a draft of which is available on the Internet in pdf format, or in the C11 standard, section 7.19, also available as a pdf draft.
Classic C (the early dialect of C described by Brian Kernighan and Dennis Ritchie in The C Programming Language, Prentice-Hall, 1978) didn't provide size_t. The C standards committee introduced size_t to eliminate a portability problem, explained in detail at embedded.com (with a very good example).
In short, size_t is never negative, and it maximizes performance because it's typedef'd to be the unsigned integer type that's big enough -- but not too big -- to represent the size of the largest possible object on the target platform.
Sizes should never be negative, and indeed size_t is an unsigned type. Also, because size_t is unsigned, you can store numbers that are roughly twice as big as in the corresponding signed type, because we can use the sign bit to represent magnitude, like all the other bits in the unsigned integer. When we gain one more bit, we are multiplying the range of numbers we can represent by a factor of about two.
So, you ask, why not just use an unsigned int? It may not be able to hold big enough numbers. On an IP16L32 platform, for example, unsigned int is only 16 bits, so the biggest number it can represent is 65535, yet objects can be larger than 65535 bytes.
So, you ask, why not use an unsigned long int? It exacts a performance toll on some platforms. Standard C requires that a long occupy at least 32 bits. An IP16L32 platform implements each 32-bit long as a pair of 16-bit words. Almost all 32-bit operators on these platforms require two instructions, if not more, because they work with the 32 bits in two 16-bit chunks. For example, moving a 32-bit long usually requires two machine instructions -- one to move each 16-bit chunk.
Using size_t avoids this performance toll. According to this fantastic article, "Type size_t is a typedef that's an alias for some unsigned integer type, typically unsigned int or unsigned long, but possibly even unsigned long long. Each Standard C implementation is supposed to choose the unsigned integer that's big enough--but no bigger than needed--to represent the size of the largest possible object on the target platform."
The size_t type is the type returned by the sizeof operator. It is an unsigned integer capable of expressing the size in bytes of any memory range supported on the host machine. It is (typically) related to ptrdiff_t in that ptrdiff_t is a signed integer value such that sizeof(ptrdiff_t) and sizeof(size_t) are equal.
When writing C code you should always use size_t whenever dealing with memory ranges.
The int type on the other hand is basically defined as the size of the (signed) integer value that the host machine can use to most efficiently perform integer arithmetic. For example, on many older PC-type computers the value of sizeof(size_t) would be 4 (bytes) but sizeof(int) would be 2 (bytes). 16-bit arithmetic was faster than 32-bit arithmetic, though the CPU could handle a (logical) memory space of up to 4 GiB.
Use the int type only when you care about efficiency as its actual precision depends strongly on both compiler options and machine architecture. In particular the C standard specifies the following invariants: sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) placing no other limitations on the actual representation of the precision available to the programmer for each of these primitive types.
Note: This is NOT the same as in Java (which actually specifies the bit precision for each of the types 'char', 'byte', 'short', 'int' and 'long').
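To illustrate the "use size_t whenever dealing with memory ranges" advice, here is a small sketch; the function name total_bytes is hypothetical:

#include <cstddef>
#include <vector>

// Sizes and indexes of in-memory objects are size_t, so the loop variable is too.
std::size_t total_bytes(const std::vector<std::vector<char>>& chunks)
{
    std::size_t total = 0;   // std::vector::size() returns an unsigned size_type (size_t in practice)
    for (std::size_t i = 0; i < chunks.size(); ++i)
        total += chunks[i].size();
    return total;
}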
Type size_t must be big enough to store the size of any possible object. Unsigned int doesn't have to satisfy that condition.
For example, on 64-bit systems int and unsigned int may be 32 bits wide, but size_t must be big enough to store numbers bigger than 4G.
This excerpt from the glibc manual 0.02 may also be relevant when researching the topic:
There is a potential problem with the size_t type and versions of GCC prior to release 2.4. ANSI C requires that size_t always be an unsigned type. For compatibility with existing systems' header files, GCC defines size_t in `stddef.h' to be whatever type the system's `sys/types.h' defines it to be. Most Unix systems that define size_t in `sys/types.h' define it to be a signed type. Some code in the library depends on size_t being an unsigned type, and will not work correctly if it is signed.
The GNU C library code which expects size_t to be unsigned is correct. The definition of size_t as a signed type is incorrect. We plan that in version 2.4, GCC will always define size_t as an unsigned type, and the `fixincludes' script will massage the system's `sys/types.h' so as not to conflict with this.
In the meantime, we work around this problem by telling GCC explicitly to use an unsigned type for size_t when compiling the GNU C library. `configure' will automatically detect what type GCC uses for size_t and arrange to override it if necessary.
If my compiler is set to 32 bit, size_t is nothing other than a typedef for unsigned int. If my compiler is set to 64 bit, size_t is nothing other than a typedef for unsigned long long.
size_t is the size of a pointer.
So on 32-bit targets, with the common ILP32 (int, long, pointer) model, size_t is 32 bits, and on 64-bit targets, with the common LP64 (long, pointer) model, size_t is 64 bits (int is still 32 bits).
There are other models, but these are the ones that g++ uses (at least by default).
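A quick check of that claim on your own toolchain (the equality holds under the ILP32 and LP64/LLP64 models above, though the standard itself does not require size_t to match the pointer width):

#include <cstddef>

// Passes on the common data models; this is a platform observation, not a language guarantee.
static_assert(sizeof(std::size_t) == sizeof(void*),
              "size_t matches pointer width on this platform");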