
How does endianness affect enumeration values in C++?
Is the size of an enum dependent upon how many enumerations there are and thus some enums are 1 byte, while others are 2 or 4 bytes?
How do I put an enumeration value into network byte order from host byte order?

Endianness affects enumerations no more or less than it does other integer types.
The only guarantee on size is that it must be possible for an enum to hold values of int; the compiler is free to choose the actual size based on the defined members (disclaimer: this is certainly the case for C, not 100% sure for C++; I'll check...)
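For illustration, here is a minimal sketch (Color and Pixel are hypothetical names; the C++11 enum-base syntax is assumed) showing that the size of a plain enum is whatever the compiler picks, while a fixed underlying type pins it down:

#include <cstdint>

enum Color { Red, Green, Blue };              // size chosen by the compiler
enum class Pixel : std::uint8_t { R, G, B };  // C++11: underlying type fixed to one byte

static_assert(sizeof(Pixel) == 1, "fixed underlying type");
// sizeof(Color) is implementation-defined (commonly 4), so don't rely on it.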

Enums depend on the compiler. They can be 1, 2, or 4 bytes (see here). They should have the same endianness as the platform they are used on.
To put an enum value into a specific byte order you need to know the byte order of the system you are on and the byte order the network expects. Then treat it as you would an int. See here for help on conversions.
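A minimal sketch of that conversion, assuming POSIX htonl()/ntohl() from <arpa/inet.h> (Windows gets them from <winsock2.h>) and a hypothetical Command enum with a fixed 32-bit underlying type:

#include <cstdint>
#include <arpa/inet.h>  // htonl / ntohl

enum class Command : std::uint32_t { Start = 1, Stop = 2 };

std::uint32_t to_wire(Command c) {
    return htonl(static_cast<std::uint32_t>(c));  // host -> network byte order
}

Command from_wire(std::uint32_t w) {
    return static_cast<Command>(ntohl(w));        // network -> host byte order
}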

Same way it affects everything else.
The compiler is free to choose the minimum required space.
htons(), or if you know you have more than 64k values, htonl().

Related

Capacity of fundamental types across different platforms

I know that sizeof(type) will return different values, depending on the platform and the compiler.
However, I know that whenever talking about ints (int32), it is said that they can hold one of 2^32 values.
If I'm on a platform where int32 is 8 bytes, its theoretical maximum is 2^64. Can it really store that much data, or does it always store 4 bytes and use 4 bytes for padding?
The question really is: while I know that the sizes of types will differ, I want to know whether asking for max_int on various platforms will give a constant value, or a value that depends on the type's size.
Particularly when dealing with files. If I write int32 to file, will it always store 4 bytes, or will it depend?
EDIT:
Given all the comments and the fact that I'm trying to create an equivalent of C#'s BinaryReader, I think that using fixed-size types is the best choice, since it would delegate all this to whoever uses it (making it more flexible). Right?
std::int32_t always has a size of 32 bits (usually 4 bytes).
The size of int can vary and depends on the platform you compile for, but it is at least 16 bits (on most modern platforms it is 4 bytes).
You can check the max value of your type in C++:
#include <cstdint>
#include <limits>

std::numeric_limits<std::int32_t>::max()  // always 2147483647
std::numeric_limits<int>::max()           // at least 32767; value depends on the platform
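Regarding the edit about building an equivalent of C#'s BinaryReader: a minimal sketch (write_int32_le is a hypothetical helper) of writing a 32-bit value as exactly 4 bytes in little-endian order, the order BinaryWriter uses, independent of the host's own endianness:

#include <cstdint>
#include <fstream>

void write_int32_le(std::ofstream& out, std::int32_t value) {
    const std::uint32_t u = static_cast<std::uint32_t>(value);
    const unsigned char bytes[4] = {
        static_cast<unsigned char>(u & 0xFF),          // least significant byte first
        static_cast<unsigned char>((u >> 8) & 0xFF),
        static_cast<unsigned char>((u >> 16) & 0xFF),
        static_cast<unsigned char>((u >> 24) & 0xFF),
    };
    out.write(reinterpret_cast<const char*>(bytes), 4);  // always exactly 4 bytes
}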

Fortran storage_size intrinsic function

I am looking at the storage_size intrinsic function introduced in Fortran 2008 to obtain the size of a user-defined type. It returns the size in bits, not bytes. I am wondering what the rationale is behind returning the size in bits instead of bytes.
Since I need the size in bytes, I am simply going to divide the result by 8. Is it safe to assume that the size returned will always be divisible by 8?
It is not even safe to expect that a byte is always 8 bits (see CHARACTER_STORAGE_SIZE in module iso_fortran_env)! For the rationale behind storage_size(), contact someone from SC22/WG5 or X3J3, but one of the former members always says (on comp.lang.fortran) that these questions don't have a single clear answer; there was often just someone pushing one variant and not the other.
My guess would be that symmetry with the older function bit_size() is one of the reasons. And why is there bit_size() and not byte_size()? I would guess so that you do not have to multiply by the byte size (and check how large one byte is) and can apply the bit-manipulation procedures directly.
To your last question: yes, on a machine with 8-bit bytes (other machines do not have Fortran 2008 compilers, AFAIK) the bit size will always be divisible by 8, as one byte is the smallest addressable piece of memory and structures cannot use just part of a byte.

Why do the sizes of data types change as the Operating System changes?

This question was asked to me in an interview: the size of char is 2 bytes on some operating systems, while on others it is 4 bytes or something else entirely.
Why is that so?
Why is it different from other fundamental types, such as int?
That was probably a trick question. The sizeof(char) is always 1.
If the size differs, it's probably because of a non-conforming compiler, in which case the question should be about the compiler itself, not about the C or C++ language.
5.3.3 Sizeof [expr.sizeof]
1 The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id. The sizeof operator shall not be applied to an expression that has function or incomplete type, or to an enumeration type before all its enumerators have been declared, or to the parenthesized name of such types, or to an lvalue that designates a bit-field. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined. (emphasis mine)
The sizes of types other than the ones pointed out are implementation-defined, and they vary for various reasons. An int has better range if it's represented in 64 bits instead of 32, but it's also more efficient as 32 bits on a 32-bit architecture.
The physical sizes (in terms of the number of bits) of types are usually dictated by the target hardware.
For example, some CPUs can access memory only in units no smaller than 16 bits. For the best performance, char can then be defined as a 16-bit integer. If you want 8-bit chars on this CPU, the compiler has to generate extra code for packing and unpacking of 8-bit values into and from 16-bit memory cells. That extra packing/unpacking code will make your code bigger and slower.
And that's not the end of it. If you subdivide 16-bit memory cells into 8-bit chars, you effectively introduce an extra bit in addresses/pointers. If normal addresses are 16-bit in the CPU, where do you stick this extra, 17th bit? There are two options:
make pointers bigger (32-bit, of which 15 are unused) and waste memory and reduce the speed further
reduce the range of the addressable address space by half, wasting memory and losing speed
The latter option can sometimes be practical. For example, if the entire address space is divided in halves, one of which is used by the kernel and the other by user applications, then application pointers will never use one bit in their addresses. You can use that bit to select an 8-bit byte in a 16-bit memory cell.
C was designed to run on as many different CPUs as possible. This is why the physical sizes of char, short, int, long, long long, void*, void(*)(), float, double, long double, wchar_t, etc can vary.
Now, when we're talking about different physical sizes in different compilers producing code for the same CPU, this becomes more of an arbitrary choice. However, it may not be as arbitrary as it seems. For example, many compilers for Windows define int = long = 32 bits. They do that to avoid programmers' confusion when using Windows APIs, which expect INT = LONG = 32 bits. Defining int and long as something else would contribute to bugs caused by lapses in the programmer's attention. So, compilers have to follow suit in this case.
And lastly, the C (and C++) standard operates with chars and bytes. They are the same concept size-wise. But C's bytes aren't your typical 8-bit bytes; they can legally be bigger than that, as explained earlier. To avoid confusion you may use the term octet, whose name implies the number 8. A number of protocols use this word for this very purpose.
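If you want to see what your own platform does, CHAR_BIT from <climits> and sizeof report it directly; a small sketch (the values in the comments are only what mainstream desktop platforms typically print):

#include <climits>
#include <iostream>

int main() {
    std::cout << "bits per byte: " << CHAR_BIT << '\n';      // 8 on mainstream platforms
    std::cout << "sizeof(short): " << sizeof(short) << '\n'; // typically 2
    std::cout << "sizeof(int):   " << sizeof(int) << '\n';   // typically 4
    std::cout << "sizeof(long):  " << sizeof(long) << '\n';  // 4 on Windows, 8 on most 64-bit Unix
    std::cout << "sizeof(void*): " << sizeof(void*) << '\n'; // 4 or 8
}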

Naming convention used in `<cstdint>`

The <cstdint> (<stdint.h>) header defines several integral types and their names follow this pattern: intN_t, where N is the number of bits, not bytes.
Given that a byte is not strictly defined as being 8 bits in length, why aren't these types defined as, for example, int1_t instead of int8_t? Wouldn't that be more appropriate since it takes into account machines that have bytes of unusual lengths?
On machines that don't have exactly those sizes the types are not defined. That is, if your machine doesn't have an 8-bit byte then int8_t would not be available. You would, however, still have the least-width versions available, such as int_least16_t.
The reason, one suspects, is that if you want a precise size you usually want a bit size and not really an abstract byte size. For example, all internet protocols deal with an 8-bit byte, so you'd want to have 8 bits whether or not that is the native byte size.
This answer is also quite informative in this regard.
int32_t could be a 4-byte 8-bits-per-byte type, or it could be a 2-byte 16-bits-per-byte type, or it could be a 1-byte 32-bits-per-byte type. It doesn't matter for the values you can store in it.
The idea of using those types is to make explicit the number of bits you can store in the variable. As you pointed out, different architectures may have different byte sizes, so specifying the number of bytes doesn't guarantee the number of bits your variable can handle.
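A small sketch of the practical difference (counter and index are just illustrative names): the exact-width types are only defined where the hardware can supply them without padding, while the least/fast variants always exist:

#include <climits>
#include <cstdint>

static_assert(sizeof(std::int32_t) * CHAR_BIT == 32, "exactly 32 bits, no padding");

std::int_least16_t counter = 0;  // smallest type with at least 16 bits; may be wider
std::int_fast32_t  index   = 0;  // at least 32 bits, chosen for speed on this platform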

Is the byte alignment requirement of a given data type guaranteed to be a power of 2?

Is the byte alignment requirement of a given data type guaranteed to be a power of 2?
Is there something that provides this guarantee other than it "not making sense otherwise" because it wouldn't line up with system page sizes?
(background: C/C++, so feel free to assume data type is a C or C++ type and give C/C++ specific answers.)
Alignment requirements are based on the hardware. Most, if not all, "modern" chips have word sizes that are divisible by 8, not just a power of 2. In the past there were chips whose word sizes were not divisible by 8 (I know of a 36-bit architecture).
Things you can assume about alignment, per the C standard:
The alignment requirement of any type divides the size of that type (as determined by sizeof).
The character types char, signed char, and unsigned char have no alignment requirement. (This is actually just a special case of the first point.)
In the modern real world, integer and pointer types have sizes that are powers of two, and their alignment requirements are usually equal to their sizes (the only exception being long long on 32-bit machines). Floating point is a bit less uniform. On 32-bit machines, all floating point types typically have an alignment of 4, whereas on 64-bit machines, the alignment requirement of floating point types is typically equal to the size of the type (4, 8, or 16).
The alignment requirement of a struct should be the least common multiple of the alignment requirements of its members, but a compiler is allowed to impose stricter alignment. However, normally each cpu architecture has an ABI standard that includes alignment rules, and compilers which do not adhere to the standard will generate code that cannot be linked with code built by compilers which follow the ABI standard, so it would be very unusual for a compiler to break from the standard except for very special-purpose use.
By the way, a useful macro that will work on any sane compiler is:
#define alignof(T) ((char *)&((struct { char x; T t; } *)0)->t - (char *)0)
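For what it's worth, C11 and C++11 added a built-in alignof (so the macro above is mainly for older compilers, and its name collides with the C++11 keyword). A small sketch with a hypothetical Record type, also checking the divides-the-size property mentioned above:

#include <type_traits>

struct Record { char tag; double value; };

static_assert(alignof(char) == 1, "char has no alignment requirement");
static_assert(sizeof(Record) % alignof(Record) == 0, "alignment divides the size");
static_assert(alignof(Record) == std::alignment_of<Record>::value, "same result either way");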
The alignment of a field inside a "struct", optimized for size, could very well be on an odd boundary. Other than that your "it wouldn't make sense" would probably apply, but I think there is NO guarantee, especially if the program is small-model and optimized for size. - Joe
The standard doesn't require alignment, but allows structs/unions/bit-fields to silently add padding bytes to achieve correct alignment. The compiler is also free to align all your data types on even addresses should it desire.
That being said, this is CPU dependent, and I don't believe there exists a CPU with an alignment requirement of an odd address. There are plenty of CPUs with no alignment requirements, however, and the compiler may then place variables at any address.
In short, no. It depends on the hardware.
However, most modern CPUs either do byte alignment (e.g., Intel x86 CPUs), or word alignment (e.g., Motorola, IBM/390, RISC, etc.).
Even with word alignment, it can be complicated. For example, a 16-bit word would be aligned on a 2-byte (even) address, a 32-bit word on a 4-byte boundary, but a 64-bit value may only require 4-byte alignment instead of an 8-byte aligned address.
For byte-aligned CPUs, it's also a function of the compiler options. The default alignment for struct members can usually be specified (usually also with a compiler-specific #pragma).
For basic data types (ints, floats, doubles) the alignment usually matches the size of the type. For classes/structs, the alignment is at least the least common multiple of the alignments of all its members (that's what the standard says).
In Visual Studio you can set your own alignment for a type, but it has to be a power of 2, between 1 and 8192.
In GCC there is a similar mechanism, but it has no such requirement (at least in theory)
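In portable terms, the standard mechanism behind both of those is C++11's alignas, and the standard itself requires every alignment value to be a power of two. A minimal sketch with a hypothetical cache-line-padded struct (assuming the implementation supports 64-byte extended alignment):

// alignas(N): N must be a power of two and at least the type's natural alignment.
struct alignas(64) CacheLinePadded {
    int counter;
};

static_assert(alignof(CacheLinePadded) == 64, "over-aligned to 64 bytes");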