Understanding fixed width integer types - c++

I understand the idea of fixed width types, but I am a little confused by the explanation provided by the reference:
signed integer type with width of exactly 8, 16, 32 and 64 bits respectively
with no padding bits and using 2's complement for negative values
(provided only if the implementation directly supports the type)
So as far as I understand, if I am able to compile an application, it should work on any platform that is able to run it. Here are my questions:
What if some platform does not provide support for those types? Is some kind of alignment used, or can the application not run at all?
If we have a guarantee that sizeof(char) is exactly one byte on every platform, regardless of the byte size, which can differ among platforms, does it mean that int8_t and uint8_t are guaranteed to be available everywhere?

If the implementation does not provide the type you used, it will not exist and your code will not compile. Manual porting will be needed in this case.
Regarding your second question: while we know that sizeof(char) == 1, it is not guaranteed that char has exactly eight bits; it can have more than that. If that is the case, int8_t and friends will not exist.
Note that there are other types that might provide sufficient guarantees for your use case if you don't need to know the exact width, such as int_least8_t or int_fast8_t. Those leave the implementation some more freedom, making them more portable.
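As a minimal sketch of that fallback, using only standard <cstdint> facilities: the limit macro INT8_MAX is defined exactly when int8_t is, so you can detect the exact-width type and fall back to the "least" type otherwise.
#include <cstdint>

// INT8_MAX is only defined when std::int8_t exists, so it doubles as
// a feature test for the exact-width type.
#ifdef INT8_MAX
using small_int = std::int8_t;        // exactly 8 bits, two's complement
#else
using small_int = std::int_least8_t;  // always present, at least 8 bits
#endif

int main() {
    small_int counter = 0;
    ++counter;
    return static_cast<int>(counter);
}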
However, if you are targeting a platform on which common integer types do not exist, you should know that in advance anyway; so it is not worth spending too much time working around those most likely irrelevant issues. Those platforms are relatively exotic.

Related

In new code, why would you use `int` instead of `int_fast16_t` or `int_fast32_t` for a counting variable?

If you need a counting variable, surely there must be an upper and a lower limit that your integer must support. So why wouldn't you specify those limits by choosing an appropriate (u)int_fastxx_t data type?
The simplest reason is that people are more used to int than the additional types introduced in C++11, and that it's the language's "default" integral type (insofar as C++ has one); the standard specifies, in [basic.fundamental/2], that:
Plain ints have the natural size suggested by the architecture of the execution environment [46]; the other signed integer types are provided to meet special needs.
[46] That is, large enough to contain any value in the range of INT_MIN and INT_MAX, as defined in the header <climits>.
Thus, whenever a generic integer is needed that isn't required to have a specific range or size, programmers tend to just use int. While using other types can communicate intent more clearly (for example, using int8_t indicates that the value should never exceed 127), using int also communicates that these details aren't crucial to the task at hand, while simultaneously providing a little leeway to catch values that exceed your required range (if a system handles signed overflow with modulo arithmetic, for example, an int8_t would treat 313 as 57, making the invalid value harder to troubleshoot). Typically, in modern programming, int either indicates that the value can be represented within the system's word size (which int is supposed to represent), or that the value can be represented within 32 bits (which is nearly always the size of int on x86 and x64 platforms).
Sized types also have the issue that the (theoretically) most well-known ones, the intX_t line, are only defined on platforms which support sizes of exactly X bits. While the int_leastX_t types are guaranteed to be defined on all platforms, and guaranteed to be at least X bits, a lot of people wouldn't want to type that much if they don't have to, since it adds up when you need to specify types often. (You can't use auto either, because it deduces integer literals as int. This can be mitigated with user-defined literal operators, as sketched below, but that still takes more time to type.) Thus, they'll typically use int if it's safe to do so.
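For illustration, here is a rough sketch of the user-defined-literal workaround just mentioned; the _i32 suffix is my own invention, not anything standard:
#include <cstdint>

// Hypothetical suffix so that auto deduces int_least32_t rather than int.
constexpr std::int_least32_t operator"" _i32(unsigned long long value) {
    return static_cast<std::int_least32_t>(value);
}

int main() {
    auto n = 40000_i32;  // n is std::int_least32_t, not int
    return n > 0 ? 0 : 1;
}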
Or in short, int is intended to be the go-to type for normal operation, with the other types intended to be used in extranormal circumstances. Many programmers stick to this mindset out of habit, and only use sized types when they explicitly require specific ranges and/or sizes. This also communicates intent relatively well; int means "number", and intX_t means "number that always fits in X bits".
It doesn't help that int has evolved to unofficially mean "32-bit integer", due to both 32- and 64-bit platforms usually using 32-bit ints. It's very likely that many programmers expect int to always be at least 32 bits in the modern age, to the point where it can very easily bite them in the rear if they have to program for platforms that don't support 32-bit ints.
Conversely, the sized types are typically used when a specific range or size is explicitly required, such as when defining a struct that needs to have the same layout on systems with different data models. They can also prove useful when working with limited memory, using the smallest type that can fully contain the required range.
A struct intended to have the same layout on 16- and 32-bit systems, for example, would use either int16_t or int32_t instead of int, because int is 16 bits in most 16-bit data models and the LP32 32-bit data model (used by the Win16 API and Apple Macintoshes), but 32 bits in the ILP32 32-bit data model (used by the Win32 API and *nix systems, effectively making it the de facto "standard" 32-bit model).
Similarly, a struct intended to have the same layout on 32- and 64-bit systems would use int/int32_t or long long/int64_t over long, due to long having different sizes in different models (64 bits in LP64 (used by 64-bit *nix), 32 bits in LLP64 (used by Win64 API) and the 32-bit models).
Note that there is also a third 64-bit model, ILP64, where int is 64 bits; this model is very rarely used (to my knowledge, it was only used on early 64-bit Unix systems), but would mandate the use of a sized type over int if layout compatibility with ILP64 platforms is required.
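A minimal sketch of such a layout-stable struct, with illustrative field names; the static_assert only holds on platforms with 8-bit bytes:
#include <cstdint>

struct Record {
    std::int32_t id;      // 32 bits under ILP32, LLP64 and LP64 alike
    std::int64_t offset;  // 64 bits everywhere the type exists, unlike long
};

// Holds wherever bytes are 8 bits wide (the overwhelmingly common case).
static_assert(sizeof(std::int32_t) == 4 && sizeof(std::int64_t) == 8,
              "unexpected byte size");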
There are several reasons. One, these long names make the code less readable. Two, you might introduce really hard-to-find bugs. Say you used int_fast16_t but you really need to count up to 40,000. The implementation might use 32 bits and the code works just fine. Then you try to run the code on an implementation that uses 16 bits and you get hard-to-find bugs.
A note: in C/C++ you have the types char, short, int, long and long long, which must cover 8 to 64 bits, so int cannot be 64 bits (because char and short alone cannot cover 8, 16 and 32 bits), even if 64 bits is the natural word size. In Swift, for example, Int is the natural integer size, either 32 or 64 bits, and you have Int8, Int16, Int32 and Int64 for explicit sizes. Int is the best type unless you absolutely need 64 bits, in which case you use Int64, or if you need to save space.

Uses and when to use int16_t, int32_t, int64_t and respectively short int, int, long int, long

Uses and when to use int16_t, int32_t, int64_t and respectively short, int, long.
There are too many damn types in C++. For integers when is it correct to use one over the other?
Use the well-defined types when the precision is important. Use the less-determinate ones when it is not. It's never wrong to use the more precise ones. It sometimes leads to bugs when you use the flexible ones.
Use the exact-width types when you actually need an exact width. For example, int32_t is guaranteed to be exactly 32 bits wide, with no padding bits, and with a two's-complement representation. If you need all those requirements (perhaps because they're imposed by an external data format), use int32_t. Likewise for the other [u]intN_t types.
If you merely need a signed integer type of at least 32 bits, use int_least32_t or int_fast32_t, depending on whether you want to optimize for size or speed. (They're likely to be the same type.)
Use the predefined types short, int, long, et al when they're good enough for your purposes and you don't want to use the longer names. short and int are both guaranteed to be at least 16 bits, long at least 32 bits, and long long at least 64 bits. int is normally the "natural" integer type suggested by the system's architecture; you can think of it as int_fast16_t, and long as int_fast32_t, though they're not guaranteed to be the same.
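If you want to see what your own platform does with those guarantees, a small sketch like the following prints the actual widths (CHAR_BIT is the number of bits per byte):
#include <climits>
#include <cstdio>

int main() {
    std::printf("short:     %zu bits\n", sizeof(short) * CHAR_BIT);
    std::printf("int:       %zu bits\n", sizeof(int) * CHAR_BIT);
    std::printf("long:      %zu bits\n", sizeof(long) * CHAR_BIT);
    std::printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
}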
I haven't given firm criteria for using the built-in vs. the [u]int_leastN_t and [u]int_fastN_t types because, frankly, there are no such criteria. If the choice isn't imposed by the API you're using or by your organization's coding standard, it's really a matter of personal taste. Just try to be consistent.
This is a good question, but hard to answer.
In one line: it depends on the context.
My rule of thumb:
I'd prefer code performance (speed: less time, then less complexity)
When using existing library I'd follow the library coding style (context).
When coding in a team I'd follow the team coding style (context).
When coding new things I'd use int16_t, int32_t, int64_t, ... whenever possible.
Explanation:
Using int (which is the system word size) gives you performance in some contexts, but not in others.
I'd use uint64_t over unsigned long long because it is more concise, whenever possible.
So it depends on the context.
A use that I have found for them is when I am bitpacking data for, say, an image compressor. Using these types that precisely specify the number of bits can save a lot of headaches, since the C++ standard does not explicitly define the number of bits in its types, only the MIN and MAX ranges.
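To illustrate the point (with made-up field names, not any real compressor's format): when the layout is defined in exact bit widths, the code spells those widths out rather than relying on int.
#include <cstdint>
#include <cstring>

// Pack a 1-byte version field followed by a 4-byte length field.
void pack_header(std::uint8_t out[5], std::uint8_t version, std::uint32_t length) {
    out[0] = version;
    std::memcpy(out + 1, &length, sizeof length);  // 4 bytes, host byte order
}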
In MISRA-C 2004 and MISRA-C++ 2008 guidelines, the advisory is to prefer specific-length typedefs:
typedefs that indicate size and signedness should be used in place of
the basic numerical types. [...]
This rule helps to clarify the size
of the storage, but does not guarantee portability because of the
asymmetric behaviour of integral promotion. [...]
An exception is made for the plain char type:
The plain char type shall be used only for the storage and use of character values.
However, keep in mind that the MISRA guidelines are for critical systems.
Personally, I follow these guidelines for embedded systems, not for computer applications where I simply use an int when I want an integer, letting the compiler optimize as it wants.
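A sketch of the kind of project-wide typedef header those guidelines lead to (the short names here are illustrative, not mandated by MISRA):
#include <cstdint>

typedef std::int16_t  s16;   // size and signedness visible in the name
typedef std::uint16_t u16;
typedef std::int32_t  s32;
typedef std::uint32_t u32;

u16  adc_raw_value;          // storage width is explicit
char device_name[16];        // plain char reserved for character data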

Is there any advantage of using non-fixed integers (int, long) instead of fixed-size ones (int64_t, int32_t)?

Maybe performance? I feel that using non-fixed integers just makes programs more complicated and prone to fail when porting to another architecture.
The std::intN_t types are provided only if the implementation can directly support them, so porting code that uses them can fail.
I would prefer std::int_fastN_t for general use because they have fewer restrictions and should be as fast as or faster than int.
Also, most C++ code uses int everywhere so you might run into promotion weirdness when passing a std::int32_t into a function accepting an int, especially if sizeof(int) is only 16 bits.
Many APIs accept or return values of non-fixed types. For example, file descriptors are of type int, file offsets or sizes are of type off_t and strtol() returns a long. Blindly converting such values from or to fixed-size types is likely to cause overflow on some machine.
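As a concrete sketch of that last point: strtol() returns a long, so the result should be range-checked before it is narrowed into a fixed-size type. The helper name parse_i32 is my own.
#include <cerrno>
#include <cstdint>
#include <cstdlib>

bool parse_i32(const char* text, std::int32_t& out) {
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (errno != 0 || end == text)
        return false;                             // not a number, or out of long's range
    if (value < INT32_MIN || value > INT32_MAX)
        return false;                             // would overflow the fixed-size type
    out = static_cast<std::int32_t>(value);
    return true;
}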
The guaranteed-width types (intN_t) are just typedefs for the appropriate 'standard' integer types. If a platform does not have an appropriate type (for example, it uses 36-bit integers), then it can't and mustn't provide the guaranteed-width typedefs.
This means that performance can hardly be an argument.
The general guideline for maximum portability (in this regard) is to use the 'standard' integer types by default and the guaranteed-width types only if your algorithm demands an exact number of bits.
The 'standard' integer types should be assumed to be only as wide as guaranteed by the relevant standards (if you only look at the C++ standard, that would be: an 8-bit char, a 16-bit int, a 32-bit long and, if your compiler supports it, a 64-bit long long).
If you have data where the size of the type is critical to its functionality, then you should use types with defined sizes. However, for code that stays comfortably within the range you can reasonably expect int to cover (say, a 1 ... 1000 loop counter), there is no reason to use int32_t just to pin down the size of your variable. It will work just fine with a 16, 32, 64, 36, 18 or 49 bit integer, all the same. So let the compiler pick the size that is best.
There is a possibility that the compiler generates worse code for fixed-size integers that aren't the "best choice" for the architecture.
Obviously, any data that is sent over a network or stored in a file needs to have a fixed size. Likewise, if you have interfaces that require binary compatibility across the interface boundary, then using defined-size types is very useful to avoid the size becoming a problem.

How to guarantee a C++ type's number of bits

I am looking to typedef my own arithmetic types (e.g. Byte8, Int16, Int32, Float754, etc) with the intention of ensuring they comprise a specific number of bits (and in the case of the float, adhere to the IEEE754 format). How can I do this in a completely cross-platform way?
I have seen snippets of the C/C++ standards here and there and there is a lot of:
"type is at least x bytes"
and not very much of:
"type is exactly x bytes".
Given that typedef unsigned short int Int16 may not necessarily result in a 16-bit Int16, is there a cross-platform way to guarantee my types will have specific sizes?
You can use the exact-width integer types int8_t, int16_t, int32_t, int64_t declared in <cstdint>. This way the sizes are fixed on all platforms that provide them.
The only available way to truly guarantee an exact number of bits is to use a bit-field:
struct X {
    int abc : 14; // exactly 14 bits, regardless of platform
};
There is some upper limit on the size you can specify this way -- at least 16 bits for int, and 32 bits for long (but a modern platform may easily allow up to 64 bits for either). Note, however, that while this guarantees that arithmetic on X::abc will use (or at least emulate) exactly 14 bits, it does not guarantee that the size of a struct X is the minimum number of bytes necessary to provide 14 bits (e.g., given 8-bit bytes, its size could easily be 4 or 8 instead of the 2 that are absolutely necessary).
The C and C++ standards both now include a specification for fixed-size types (e.g., int8_t, int16_t), but no guarantee that they'll be present. They're required if the platform provides the right type, but otherwise won't be present. If memory serves, these are also required to use a 2's complement representation, so a platform with a 16-bit 1's complement integer type (for example) still won't define int16_t.
Have a look at the types declared in stdint.h. The header itself is part of the standard library and expected to be available everywhere, though the exact-width types it declares are only present when the platform supports them. Among the types declared here are int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, and uint64_t. Local implementations will map these types to the appropriate-width type for the given compiler and architecture.
This is not possible.
There are platforms where char is 16 or even 32 bits.
Note that I'm not saying such platforms exist only in theory... this is a real and quite concrete possibility (e.g. DSPs).
On that kind of hardware there is simply no way to use only 8 bits for an operation; if you need 8-bit modular arithmetic, for example, the only way is to do the masking operation yourself.
The C language doesn't provide this kind of emulation for you...
With C++ you could try to build a class that behaves like the expected native elementary type in most cases (with the exception of sizeof, obviously). The result will, however, have truly horrible performance.
I can think of no use case in which forcing the hardware this way against its nature would be a good idea.
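For what the masking mentioned above looks like in practice, here is a minimal sketch, assuming hardware whose smallest integer is wider than 8 bits:
// Emulate 8-bit modular (wrap-around) addition by masking manually.
unsigned add_mod_256(unsigned a, unsigned b) {
    return (a + b) & 0xFFu;  // keep only the low 8 bits
}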
It is possible to use C++ templates at compile time to check for, and select, types that fit your requirements, specifically that sizeof() of the type is the size you want.
Take a look at this code: Compile time "if".
Do note that if the requested type is not available then it is entirely possible that your program will simply not compile. It simply depends on whether or not that works for you!
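A rough sketch of that compile-time selection idea (my own formulation, using std::conditional rather than the linked code): pick the first built-in type whose width matches, and fail the build otherwise, which is exactly the trade-off described above.
#include <climits>
#include <type_traits>

template <unsigned Bits>
using exact_int = typename std::conditional<
    sizeof(signed char) * CHAR_BIT == Bits, signed char,
    typename std::conditional<
        sizeof(short) * CHAR_BIT == Bits, short,
        typename std::conditional<
            sizeof(int) * CHAR_BIT == Bits, int,
            typename std::conditional<
                sizeof(long) * CHAR_BIT == Bits, long,
                typename std::conditional<
                    sizeof(long long) * CHAR_BIT == Bits, long long,
                    void>::type>::type>::type>::type>::type;

// If no built-in type is exactly 32 bits wide, this line refuses to compile.
static_assert(!std::is_void<exact_int<32> >::value,
              "no built-in integer type is exactly 32 bits on this platform");

exact_int<32> counter = 0;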

Why is uint_8 etc. used in C/C++?

I've seen some code where they don't use primitive types int, float, double etc. directly.
They usually typedef it and use it or use things like
uint_8 etc.
Is it really necessary even these days? Or is C/C++ standardized enough that it is preferable to use int, float etc directly.
Because the types like char, short, int, long, and so forth, are ambiguous: they depend on the underlying hardware. Back in the days when C was basically considered an assembler language for people in a hurry, this was okay. Now, in order to write programs that are portable -- which means "programs that mean the same thing on any machine" -- people have built special libraries of typedefs and #defines that allow them to make machine-independent definitions.
The secret code is really quite straight-forward. Here, you have uint_8, which is interpreted
u for unsigned
int to say it's treated as a number
_8 for the size in bits.
In other words, this is an unsigned integer with 8 bits (minimum) or what we used to call, in the mists of C history, an "unsigned char".
uint8_t is rather useless, because due to other requirements in the standard, it exists if and only if unsigned char is 8-bit, in which case you could just use unsigned char. The others, however, are extremely useful. int is (and will probably always be) 32-bit on most modern platforms, but on some ancient stuff it's 16-bit, and on a few rare early 64-bit systems, int is 64-bit. It could also of course be various odd sizes on DSPs.
If you want a 32-bit type, use int32_t or uint32_t, and so on. It's a lot cleaner and easier than all the nasty legacy hacks of detecting the sizes of types and trying to use the right one yourself...
Most code I read, and write, uses the fixed-size typedefs only when the size is an important assumption in the code.
For example if you're parsing a binary protocol that has two 32-bit fields, you should use a typedef guaranteed to be 32-bit, if only as documentation.
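A small sketch of that (the Message fields are illustrative): the two 32-bit wire fields are read into uint32_t, so the widths in the code document the widths in the protocol.
#include <cstdint>
#include <cstring>

struct Message {
    std::uint32_t sequence;        // first 32-bit field on the wire
    std::uint32_t payload_length;  // second 32-bit field on the wire
};

Message parse(const unsigned char* wire) {
    Message m;
    std::memcpy(&m.sequence, wire, 4);
    std::memcpy(&m.payload_length, wire + 4, 4);
    return m;  // byte-order conversion omitted for brevity
}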
I'd only use int16 or int64 when the size must be that, say for a binary protocol or to avoid overflow or keep a struct small. Otherwise just use int.
If you're just doing "int i" to use i in a for loop, then I would not write "int32" for that. I would never expect any "typical" (meaning "not weird embedded firmware") C/C++ code to see a 16-bit "int," and the vast majority of C/C++ code out there would implode if faced with 16-bit ints. So if you start to care about "int" being 16 bit, either you're writing code that cares about weird embedded firmware stuff, or you're sort of a language pedant. Just assume "int" is the best int for the platform at hand and don't type extra noise in your code.
The sizes of types in C are not particularly well standardized. 64-bit integers are one example: a 64-bit integer could be long long, __int64, or even int on some systems. To get better portability, C99 introduced the <stdint.h> header, which has types like int32_t to get a signed type that is exactly 32 bits; many programs had their own, similar sets of typedefs before that.
C and C++ purposefully don't define the exact size of an int. There are a number of reasons for this, but they're not important in considering this problem.
Since int isn't set to a standard size, those who want a standard size must do a bit of work to guarantee a certain number of bits. The code that defines uint_8 does that work, and without it (or a technique like it) you wouldn't have a means of defining an unsigned 8 bit number.
The width of primitive types often depends on the system, not just the C++ standard or compiler. If you want true consistency across platforms when you're doing scientific computing, for example, you should use the specific uint_8 or whatever so that the same errors (or precision errors for floats) appear on different machines, so that the memory overhead is the same, etc.
C and C++ don't restrict the exact size of the numeric types, the standards only specify a minimum range of values that has to be represented. This means that int can be larger than you expect.
The reason for this is that often a particular architecture will have a size for which arithmetic works faster than other sizes. Allowing the implementor to use this size for int and not forcing it to use a narrower type may make arithmetic with ints faster.
This isn't going to go away any time soon. Even once servers and desktops are all fully transitioned to 64-bit platforms, mobile and embedded platforms may well be operating with a different integer size. Apart from anything else, you don't know what architectures might be released in the future. If you want your code to be portable, you have to use a fixed-size typedef anywhere that the type size is important to you.