Naming convention used in `<cstdint>` - c++

The <cstdint> (<stdint.h>) header defines several integral types and their names follow this pattern: intN_t, where N is the number of bits, not bytes.
Given that a byte is not strictly defined as being 8 bits in length, why aren't these types defined as, for example, int1_t instead of int8_t? Wouldn't that be more appropriate since it takes into account machines that have bytes of unusual lengths?

On machines that don't have exactly those sizes, the types are not defined. That is, if your machine doesn't have an 8-bit byte, then int8_t is not available. You would, however, still have the least-width versions available, such as int_least16_t.
The reason, one suspects, is that when you want a precise size you usually want a size in bits, not an abstract byte count. For example, Internet protocols are generally specified in terms of 8-bit octets, so you want exactly 8 bits whether or not that is the native byte size.
This answer is also quite informative in this regard.
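For illustration, here is a minimal sketch (not from the original answer) of how code can detect the situation at compile time: the exact-width macro INT8_MAX is only defined when int8_t exists, and the alias name byte_like is just an example.

#include <cstdint>

#if defined(INT8_MAX)
using byte_like = std::int8_t;        // the exact 8-bit type is available
#else
using byte_like = std::int_least8_t;  // always available, at least 8 bits wide
#endif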

int32_t could be a 4-byte 8-bits-per-byte type, or it could be a 2-byte 16-bits-per-byte type, or it could be a 1-byte 32-bits-per-byte type. It doesn't matter for the values you can store in it.

The idea of using those types is to make explicit the number of bits you can store in the variable. As you pointed out, different architectures may have different byte sizes, so specifying a number of bytes doesn't guarantee the number of bits your variable can handle.
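To make the point concrete, here is a small sketch (assuming only that the platform provides std::int32_t at all) showing that the guarantee is expressed in bits, while the byte count follows from CHAR_BIT:

#include <climits>
#include <cstdint>
#include <limits>

// std::int32_t has exactly 32 bits and no padding, so both checks pass on
// any implementation that defines the type, whatever CHAR_BIT happens to be.
static_assert(std::numeric_limits<std::int32_t>::digits == 31,
              "31 value bits plus 1 sign bit");
static_assert(sizeof(std::int32_t) * CHAR_BIT == 32,
              "exactly 32 bits, however many bytes that takes");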

Related

Capacity of fundamental types across different platforms

I know that sizeof(type) will return different values, depending on the platform and the compiler.
However, I know that whenever an int (int32) is discussed, it is said that it can hold one of 2^32 values.
If I'm on a platform where int32 is 8 bytes, its theoretical maximum is 2^64. Can it really store that much data, or does it always store 4 bytes and use the other 4 bytes for padding?
The question really is: while I know that the sizes of types will differ, will asking for the maximum int value on various platforms give a constant, or a value that depends on the type's size?
Particularly when dealing with files: if I write an int32 to a file, will it always store 4 bytes, or will it depend?
EDIT:
Given all the comments and the fact that I'm trying to create an equivalent of C#'s BinaryReader, I think that using fixed-size types is the best choice, since it delegates all this to whoever uses it (making it more flexible). Right?
std::int32_t always has a size of exactly 32 bits (usually 4 bytes).
The size of int can vary and depends on the platform you compile for, but it is at least 16 bits (usually meaning at least 2 bytes).
You can check the max value of your type in C++:
#include <cstdint>
#include <limits>

std::numeric_limits<std::int32_t>::max()  // always 2147483647
std::numeric_limits<int>::max()           // implementation-defined, at least 32767
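And since the question is ultimately about serialization, here is a minimal sketch (the function name is just an example) of why the fixed-size type is the right choice for a BinaryReader-style class: writing a std::int32_t as raw bytes always writes exactly 4 bytes, whatever sizeof(int) is on the platform. Byte order is a separate concern and still needs handling explicitly if the file is shared across machines.

#include <cstdint>
#include <fstream>

void write_value(std::ofstream& out, std::int32_t value) {
    // Always writes exactly 4 bytes; the in-memory byte order is the host's.
    out.write(reinterpret_cast<const char*>(&value), sizeof(value));
}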

In new code, why would you use `int` instead of `int_fast16_t` or `int_fast32_t` for a counting variable?

If you need a counting variable, surely there must be an upper and a lower limit that your integer must support. So why wouldn't you specify those limits by choosing an appropriate (u)int_fastxx_t data type?
The simplest reason is that people are more used to int than to the additional types introduced in C++11, and that int is the language's "default" integral type (insofar as C++ has one); the standard specifies, in [basic.fundamental/2], that:
Plain ints have the natural size suggested by the architecture of the execution environment [footnote 46]; the other signed integer types are provided to meet special needs.
46) That is, large enough to contain any value in the range of INT_MIN and INT_MAX, as defined in the header <climits>.
Thus, whenever a generic integer is needed that isn't required to have a specific range or size, programmers tend to just use int. Other types can communicate intent more clearly (for example, using int8_t indicates that the value should never exceed 127), but int also communicates that these details aren't crucial to the task at hand, while providing a little leeway to catch values that exceed your required range (with an int8_t, if the implementation handles the out-of-range conversion with modulo arithmetic, 313 would silently become 57, making the invalid value harder to troubleshoot). Typically, in modern programming, int indicates either that the value can be represented within the system's word size (which int is supposed to represent), or that the value can be represented within 32 bits (which is nearly always the size of int on x86 and x64 platforms).
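A quick sketch of that last point (since C++20 the out-of-range conversion is defined to wrap modulo 2^N; before that the result was implementation-defined, though implementations in practice behaved the same way):

#include <cstdint>
#include <iostream>

int main() {
    int big = 313;
    std::int8_t small = static_cast<std::int8_t>(big);  // 313 mod 256 = 57
    std::cout << static_cast<int>(small) << '\n';        // prints 57
}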
Sized types also have the issue that the (theoretically) best-known ones, the intX_t line, are only defined on platforms which support sizes of exactly X bits. While the int_leastX_t types are guaranteed to be defined on all platforms, and guaranteed to be at least X bits wide, a lot of people wouldn't want to type that much if they don't have to, since it adds up when you need to spell out types often. [You can't use auto either, because it deduces integer literals as int. This can be mitigated by writing user-defined literal operators, but that still takes more time to type.] Thus, they'll typically use int if it's safe to do so.
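For completeness, a sketch of the user-defined-literal workaround mentioned above (the suffix name _l32 is just an illustration):

#include <cstdint>

constexpr std::int_least32_t operator"" _l32(unsigned long long v)
{
    return static_cast<std::int_least32_t>(v);
}

auto counter = 0_l32;  // counter is deduced as std::int_least32_t, not int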
Or in short, int is intended to be the go-to type for normal operation, with the other types intended to be used in extranormal circumstances. Many programmers stick to this mindset out of habit, and only use sized types when they explicitly require specific ranges and/or sizes. This also communicates intent relatively well; int means "number", and intX_t means "number that always fits in X bits".
It doesn't help that int has evolved to unofficially mean "32-bit integer", due to both 32- and 64-bit platforms usually using 32-bit ints. It's very likely that many programmers expect int to always be at least 32 bits in the modern age, to the point where it can very easily bite them in the rear if they have to program for platforms that don't support 32-bit ints.
Conversely, the sized types are typically used when a specific range or size is explicitly required, such as when defining a struct that needs to have the same layout on systems with different data models. They can also prove useful when working with limited memory, using the smallest type that can fully contain the required range.
A struct intended to have the same layout on 16- and 32-bit systems, for example, would use either int16_t or int32_t instead of int, because int is 16 bits in most 16-bit data models and the LP32 32-bit data model (used by the Win16 API and Apple Macintoshes), but 32 bits in the ILP32 32-bit data model (used by the Win32 API and *nix systems, effectively making it the de facto "standard" 32-bit model).
Similarly, a struct intended to have the same layout on 32- and 64-bit systems would use int/int32_t or long long/int64_t over long, due to long having different sizes in different models (64 bits in LP64 (used by 64-bit *nix), 32 bits in LLP64 (used by Win64 API) and the 32-bit models).
Note that there is also a third 64-bit model, ILP64, where int is 64 bits; this model is very rarely used (to my knowledge, it was only used on early 64-bit Unix systems), but would mandate the use of a sized type over int if layout compatibility with ILP64 platforms is required.
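A sketch of such a struct (the field names are illustrative): the fixed-width members keep the layout identical across the data models named above, and the static_assert is just a sanity check against unexpected padding on common ABIs.

#include <cstdint>

struct FileHeader {
    std::int32_t magic;    // always 32 bits, unlike int (16 or 32 depending on model)
    std::int32_t version;
    std::int64_t length;   // always 64 bits, unlike long (32 or 64 depending on model)
};

static_assert(sizeof(FileHeader) == 16, "unexpected padding in FileHeader");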
There are several reasons. One, these long names make the code less readable. Two, you might introduce really hard-to-find bugs. Say you used int_fast16_t but you really need to count up to 40,000. The implementation might use 32 bits and the code works just fine. Then you run the code on an implementation that uses 16 bits and you get hard-to-find bugs. A compile-time guard for this is sketched after the note below.
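One way to defuse that particular trap, sketched here with a hypothetical alias, is to state the required range once and let the compiler reject implementations that cannot satisfy it:

#include <cstdint>
#include <limits>

using counter_t = std::int_fast16_t;

// Fails to compile on an implementation where int_fast16_t is only 16 bits,
// instead of silently overflowing at run time.
static_assert(std::numeric_limits<counter_t>::max() >= 40000,
              "counter_t cannot reach 40,000; use int_fast32_t instead");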
A note: in C and C++ you have the types char, short, int, long and long long, which between them must cover 8 to 64 bits, so in practice int cannot be 64 bits (because then char and short alone could not cover 8, 16 and 32 bits), even if 64 bits is the natural word size. In Swift, for example, Int is the natural integer size, either 32 or 64 bits, and you have Int8, Int16, Int32 and Int64 for explicit sizes. Int is the best type unless you absolutely need 64 bits, in which case you use Int64, or unless you need to save space.

how is word size in computer related to int or long

I have seen the link "What does it mean by word size in computer?". It defines what word size is.
I am trying to represent a very long string in bits, where each character is represented by 4 bits, and to save it in a long or integer array so that I can extract my string when required.
I can save the bits either in an integer array or in a long array.
If I use a long array (8 bytes per element) I will be able to save 8*4=32 bits in one long element.
But if I use int I will be able to save only 4*4=16 bits.
Now, if I am told my word size is 32, is it the case that I should use int only and not long?
To answer your direct question: there is no guaranteed relationship between the natural word size of the processor and the C and C++ types int or long. Yes, quite often int will be the same size as a register in the processor, but most 64-bit platforms do not follow this rule, as making int 64 bits would make data unnecessarily large. On the other hand, an 8-bit processor has a register size of 8 bits, but int according to the C and C++ standards needs to be at least 16 bits in size, so the compiler has to use more than one register to represent one integer [in some fashion].
In general, if you want to KNOW how many bits or bytes some type is, it's best NOT to rely on int, long, size_t or void *, since they are all likely to be different for different processor architectures or even for different compilers on the same architecture. An int and a long may be the same size or different sizes. The only firm rules the standard gives are that int is at least 16 bits and long is at least 32 bits.
So, to have control over the number of bits, use #include <cstdint> (or, in C, stdint.h) and use types such as uint16_t or uint32_t; then you KNOW that they hold a given number of bits.
On a processor that has a 36-bit word size, the type uint32_t, for example, will most likely not exist, since there is no type that holds exactly 32 bits. Alternatively, the compiler may add extra instructions to behave as if it were a 32-bit type (in other words, sign-extending if necessary and masking off the top bits as needed).
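Applied to the original question, here is a sketch (names are illustrative) of packing the 4-bit character codes into std::uint32_t, which guarantees 8 codes per element on any platform that provides the type, regardless of the machine's word size:

#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<std::uint32_t> pack_nibbles(const std::vector<std::uint8_t>& codes) {
    std::vector<std::uint32_t> packed((codes.size() + 7) / 8, 0u);
    for (std::size_t i = 0; i < codes.size(); ++i) {
        // Place each 4-bit code in the next free nibble of the current element.
        packed[i / 8] |= static_cast<std::uint32_t>(codes[i] & 0x0Fu) << ((i % 8) * 4);
    }
    return packed;
}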

Why do the sizes of data types change as the Operating System changes?

I was asked this question in an interview: the size of char is 2 bytes in some operating systems, but in others it is 4 bytes or something different.
Why is that so?
Why is it different from other fundamental types, such as int?
That was probably a trick question. sizeof(char) is always 1.
If the size differs, it's probably because of a non-conforming compiler, in which case the question should be about the compiler itself, not about the C or C++ language.
5.3.3 Sizeof [expr.sizeof]
1 The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id. The sizeof operator shall not be applied to an expression that has function or incomplete type, or to an enumeration type before all its enumerators have been declared, or to the parenthesized name of such types, or to an lvalue that designates a bit-field. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. The result of sizeof applied to any other fundamental type (3.9.1) is implementation-defined. (emphasis mine)
The sizeof of types other than the ones pointed out is implementation-defined, and it varies for various reasons. An int has a better range if it's represented in 64 bits instead of 32, but it's also more efficient as 32 bits on a 32-bit architecture.
The physical sizes (in terms of the number of bits) of types are usually dictated by the target hardware.
For example, some CPUs can access memory only in units no smaller than 16 bits. For the best performance, char can then be defined as a 16-bit integer. If you want 8-bit chars on such a CPU, the compiler has to generate extra code for packing and unpacking 8-bit values into and from 16-bit memory cells. That extra packing/unpacking code makes your code bigger and slower.
And that's not the end of it. If you subdivide 16-bit memory cells into 8-bit chars, you effectively introduce an extra bit in addresses/pointers. If normal addresses are 16-bit in the CPU, where do you stick this extra, 17th bit? There are two options:
make pointers bigger (32-bit, of which 15 are unused) and waste memory and reduce the speed further
reduce the range of addressable memory by half, wasting memory and losing speed
The latter option can sometimes be practical. For example, if the entire address space is divided in halves, one of which is used by the kernel and the other by user applications, then application pointers will never use one bit in their addresses. You can use that bit to select an 8-bit byte in a 16-bit memory cell.
C was designed to run on as many different CPUs as possible. This is why the physical sizes of char, short, int, long, long long, void*, void(*)(), float, double, long double, wchar_t, etc can vary.
Now, when we're talking about different physical sizes in different compilers producing code for the same CPU, the choice becomes more arbitrary. However, it may not be as arbitrary as it seems. For example, many compilers for Windows define int = long = 32 bits. They do that to avoid programmer confusion when using the Windows APIs, which expect INT = LONG = 32 bits. Defining int and long as something else would contribute to bugs caused by lapses of the programmer's attention. So compilers have to follow suit in this case.
And lastly, the C (and C++) standard operates with chars and bytes. They are the same concept size-wise. But C's bytes aren't your typical 8-bit bytes; they can legally be bigger than that, as explained earlier. To avoid confusion you may use the term octet, whose name implies the number 8. A number of protocols use this word for this very purpose.
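If you want to see what your own implementation chose, a small probe like this will do (the output differs between data models, which is exactly the point):

#include <climits>
#include <iostream>

int main() {
    std::cout << "bits per byte (CHAR_BIT): " << CHAR_BIT << '\n'
              << "sizeof(short): " << sizeof(short) << '\n'
              << "sizeof(int):   " << sizeof(int) << '\n'
              << "sizeof(long):  " << sizeof(long) << '\n'
              << "sizeof(void*): " << sizeof(void*) << '\n';
}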

How does endianness affect enumeration values in C++?

How does endianness affect enumeration values in C++?
Is the size of an enum dependent upon how many enumerations there are and thus some enums are 1 byte, while others are 2 or 4 bytes?
How do I put an enumeration value into network byte order from host byte order?
Endianness affects enumerations no more or less than it does other integer types.
The only guarantee on size is that the underlying type must be able to represent all of the enumerator values; the compiler is free to choose the actual size based on the defined members (this is the case for both C and C++, and since C++11 you can also specify the underlying type explicitly).
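A short sketch of that C++11 option (the enum itself is just an example); fixing the underlying type pins the size and removes the implementation-defined question entirely:

#include <cstdint>

enum class Flags : std::uint16_t { None = 0, A = 1, B = 2 };

static_assert(sizeof(Flags) == sizeof(std::uint16_t),
              "size is pinned by the explicit underlying type");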
Enums depend on the compiler. They can be 1, 2, or 4 bytes (see here). They have the same endianness as the platform they are used on.
To put an enum value into a specific byte order, you need to know what the system you are on is and what the network expects. Then treat it as you would an int. See here for help with the conversions.
Same way it affects everything else.
The compiler is free to choose the minimum required space.
htons(), or if you know you have more than 64k values, htonl().
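Putting the pieces together, a minimal sketch of the conversion (assuming a POSIX system for htonl/ntohl; on Windows the same functions live in <winsock2.h>, and the enum itself is just an example):

#include <arpa/inet.h>  // htonl, ntohl
#include <cstdint>

enum class Color : std::uint32_t { Red = 1, Green = 2, Blue = 3 };

std::uint32_t to_wire(Color c) {
    return htonl(static_cast<std::uint32_t>(c));  // host -> network byte order
}

Color from_wire(std::uint32_t wire) {
    return static_cast<Color>(ntohl(wire));       // network -> host byte order
}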