What is int8_t if a machine has > 8 bits per byte? - c++

I was reading the C++ FAQ and it says
The C++ language guarantees a byte must always have at least 8 bits
So what does that mean for the <cstdint> types?
Side question - if I want an array of bytes should I use int8_t or char and why?

C++ (and C as well) defines the intX_t typedefs (i.e. the exact-width integer types) as optional. So int8_t just won't be there if there is no addressable unit that's exactly 8 bits wide.
If you want an array of bytes, you should use char, as sizeof(char) (and sizeof(signed char) and sizeof(unsigned char)) is well-defined to always be 1 byte.

To add to what Cat Plus Plus has already said (that the type is
optional), you can test whether it is present by using something like:
#ifdef INT8_MAX
// type int8_t exists.
#endif
or more likely:
#ifndef INT8_MAX
#error Machines with bytes that don't have 8 bits aren't supported
#endif
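In C++11 and later the same check can also be written as a static_assert on CHAR_BIT from <climits>. This is only a minimal sketch; the names Byte and kBitsPerByte are made up for illustration and do not come from any library:

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // INT8_MAX and std::int8_t, only if the platform provides them

// Refuse to compile on platforms whose bytes are not 8 bits wide.
static_assert(CHAR_BIT == 8, "bytes that are not 8 bits wide are not supported");

#ifdef INT8_MAX
// The exact-width type exists, so it is exactly 8 bits with no padding.
static_assert(sizeof(std::int8_t) == 1, "int8_t occupies exactly one byte");
#endif

// Hypothetical alias for raw storage: unsigned char is always one byte.
using Byte = unsigned char;
constexpr int kBitsPerByte = CHAR_BIT;
static_assert(sizeof(Byte) == 1, "guaranteed by the language");
```

An array of bytes would then be something like Byte buffer[64];, which works whether or not int8_t exists.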

Related

Why is the result of sizeof implementation defined? [closed]

In C99, §6.5.3.4:
2 The sizeof operator yields the size (in bytes) of its operand,
which may be an expression or the parenthesized name of a type. ...
4 The value of the result is implementation-defined, and its type (an
unsigned integer type) is size_t, defined in <stddef.h> (and other
headers).
In C++14, §5.3.3:
1 The sizeof operator yields the number of bytes in the object
representation of its operand. ... The result of sizeof applied to any
other fundamental type (3.9.1) is implementation-defined.
The only guaranteed values are sizeof(char), sizeof(unsigned char) and sizeof(signed char), which are all 1.
However, "the number of bytes in the object representation" seems pretty iron-clad to me. For example, in C99 §6.2.6.1:
4 Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object
of that type, in bytes. ...
So why is it implementation-defined if it seems pretty defined?
Many of you seem to be misinterpreting my question. I never claimed that:
A) the sizes of types are defined or the same on all systems,
B) implementation-defined means it can return "random values".
What I'm getting at here is that n * CHAR_BIT is a fixed formula. The formula itself can't change between implementations. Yes, an int may be 4 bytes or 8 bytes. I get that. But across all implementations, the value must be n * CHAR_BIT.
The result of sizeof is implementation defined because the sizes of the various basic types are implementation defined. The only guarantees we have on the sizes of the types in C++ are that
sizeof(char) == 1 and sizeof(char) <= sizeof(short) <= sizeof(int) <=
sizeof(long) <= sizeof(long long)
and that each type has a minimum range it must support; see C11 [Annex E (informative) Implementation limits]/1:
[...]The minimum magnitudes shown shall be replaced by implementation-defined magnitudes with the same sign.[...]
#define CHAR_BIT 8
#define CHAR_MAX UCHAR_MAX or SCHAR_MAX
#define CHAR_MIN 0 or SCHAR_MIN
#define INT_MAX +32767
#define INT_MIN -32767
#define LONG_MAX +2147483647
#define LONG_MIN -2147483647
#define LLONG_MAX +9223372036854775807
#define LLONG_MIN -9223372036854775807
#define MB_LEN_MAX 1
#define SCHAR_MAX +127
#define SCHAR_MIN -127
#define SHRT_MAX +32767
#define SHRT_MIN -32767
#define UCHAR_MAX 255
#define USHRT_MAX 65535
#define UINT_MAX 65535
#define ULONG_MAX 4294967295
#define ULLONG_MAX 18446744073709551615
So per the standard an int has to be able to store any number that fits in 16 bits, but it can be bigger, and on most of today's systems it is 32 bits.
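The minimum magnitudes can be restated as compile-time checks against <climits>. A small sketch; the helper int_exceeds_minimum is a made-up name, and the assertions only repeat the values listed above:

```cpp
#include <climits>

// These hold on every conforming implementation.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");
static_assert(INT_MAX >= 32767 && INT_MIN <= -32767, "int covers at least 16 bits");
static_assert(LONG_MAX >= 2147483647L, "long covers at least 32 bits");
static_assert(ULLONG_MAX >= 18446744073709551615ULL,
              "unsigned long long covers at least 64 bits");

// Hypothetical helper: true when this implementation's int is wider
// than the 16-bit minimum.
constexpr bool int_exceeds_minimum() { return INT_MAX > 32767; }
```

On a typical desktop compiler int_exceeds_minimum() is true, but nothing in the standard requires that.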
What I'm getting at here is that n * CHAR_BIT is a fixed formula. The formula itself can't change between implementations. Yes, an int may be 4 bytes or 8 bytes. I get that. But across all implementations, the value must be n * CHAR_BIT.
You are correct but n is defined per C99 §6.2.6.1 as
where n is the size of an object of that type
emphasis mine
So the formula may be fixed but n is not fixed and different implementations on the same system can use a different value of n.
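To make the split between the fixed formula and the variable n concrete, here is a sketch; object_bits is a hypothetical helper name, not a standard function:

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// Number of bits in the object representation of T: n * CHAR_BIT,
// where n = sizeof(T) is chosen by the implementation.
template <typename T>
constexpr std::size_t object_bits() {
    return sizeof(T) * CHAR_BIT;
}

// The formula is the same everywhere; only the inputs differ per implementation.
static_assert(object_bits<char>() == CHAR_BIT, "sizeof(char) is always 1");
static_assert(object_bits<int>() >= 16, "int needs at least 16 bits");
```

The two assertions hold on any conforming implementation, while the actual value of object_bits<int>() is whatever that implementation documents.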
The result of sizeof is not implementation defined. The standard does not say that; it says:
The value of the result is implementation-defined, [...]
That is semantically different. The result of sizeof is well defined:
[...] the size (in bytes) of its operand [...]
Both the bit width of a byte in this context and the number of bytes in non-char types are implementation defined.
Because the sizes of basic types are defined in terms of efficiency, not in terms of exact number of bits. An "int" must be something that the CPU can manipulate efficiently. For most modern systems, this quantity turns out to be 32 bits (or 64 bits). For older systems, it was quite often 16 bits. However, if a 35-bit CPU were to exist, an int on such a system would be 35 bits. In other words, C++ does not apply a penalty to enforce a bit-width a CPU might not support at all.
Of course, one could argue that notions of exotic bit widths for basic types have been overtaken by history. I cannot think of any modern CPU that does not support the standard set of 8, 16, and 32 bits (feel free to disagree with this statement, but at least be so kind to give an example!), and 64 bits is also pretty common (and not a big deal to support in software if hardware support is unavailable).
Arguably the C++ language has already moved away from having variable numbers of bits for char; as far as I know, u8"..." is an array of char (char8_t since C++20), but UTF-8 requires 8-bit code units.
If a char of 8 bits is size 1, then an int of 32 bits is size 4. If a char of 16 bits is size 1, then an int of 32 bits is only size 2. Both situations are equally valid in C++, if such sizes happen to be good choices for their respective hardware.
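The arithmetic can be sketched as a tiny function. bytes_needed is a made-up helper, and it assumes the type has no padding bits:

```cpp
// Bytes needed to hold width_bits bits when one byte has char_bit bits,
// assuming no padding bits. Hypothetical helper, for illustration only.
constexpr unsigned bytes_needed(unsigned width_bits, unsigned char_bit) {
    return (width_bits + char_bit - 1) / char_bit;  // round up
}

static_assert(bytes_needed(32, 8) == 4, "32-bit int with 8-bit char");
static_assert(bytes_needed(32, 16) == 2, "32-bit int with 16-bit char");
static_assert(bytes_needed(32, 32) == 1, "32-bit int with 32-bit char");
```

On a real implementation the second argument is simply CHAR_BIT, and the result for a padding-free type matches sizeof.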
Padding bits are "unspecified" not "implementation-defined".
Wrong. Very, very wrong. The values of padding bytes are unspecified. The intention here is that the values of these bits may represent trap values, but not necessarily.
The standard tells you that an object occupies sizeof * CHAR_BIT bits, but doesn't specify the sizes themselves (other than for the character types and the exact-width types). The number of bytes a type occupies is implementation-defined, hence the value of sizeof must be as well.
Implementation-defined is described as:
unspecified value where each implementation documents how the choice
is made
When you declare a new variable like this:
size_t a;
it is the same as declaring it with whichever unsigned integer type the implementation chose for size_t, for example:
unsigned long a; // one common choice; the exact type is implementation-defined
On most 32-bit and 64-bit platforms the size of an int is 4 bytes.
The size of a short int is usually 2 bytes.
In C, size_t is the type of the result of the sizeof operator. sizeof gives you the size of its operand in bytes. The operand may be an expression or a parenthesized type name; it does not have to be an lvalue. The size of an object cannot be negative and it must be an integer.
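What the language does promise about size_t and sizeof can be checked at compile time. A short sketch; the two constant names are invented for the example:

```cpp
#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_same, std::is_unsigned

// sizeof always yields a std::size_t, which is some unsigned integer type.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields a std::size_t");
static_assert(std::is_unsigned<std::size_t>::value,
              "size_t is an unsigned integer type");

// The operand can be a parenthesized type name or an expression;
// an expression operand is not evaluated.
constexpr std::size_t size_from_type = sizeof(int);
constexpr std::size_t size_from_expression = sizeof(1 + 1);  // 1 + 1 has type int
```

Which unsigned type sits behind size_t is left to the implementation, so code should not assume unsigned short, unsigned int or unsigned long.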

Is there an actual 8-bit integer data type in C++

In C++, specifically the cstdint header file, there are types for 8-bit integers which turn out to be of the char data type with a typedef. Could anyone suggest an actual 8-bit integer type?
Yes, you are right. int8_t and uint8_t are typedefs for character types (typically signed char and unsigned char) on platforms where 1 byte is 8 bits. On platforms where it is not, these optional types are simply not defined.
The following is based on the assumption that char is 8 bits.
char holds 1 byte, and may be signed or unsigned depending on the implementation.
So int8_t is signed char and uint8_t is unsigned char, and you can safely use int8_t/uint8_t as actual 8-bit integers without relying too much on the implementation.
From an implementer's point of view, using typedefs where char is 8 bits makes sense.
Having seen all this, it is safe to use int8_t or uint8_t as a real 8-bit integer.
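One practical consequence of the typedef: because int8_t is usually a character type, streams tend to print it as a character rather than a number. A sketch of the usual workaround, relying on integral promotion; as_number is a hypothetical helper name:

```cpp
#include <cstdint>
#include <type_traits>

// Integral promotion turns an 8-bit value into an int,
// which streams print as a number.
constexpr int as_number(std::int8_t v) { return v; }
constexpr int as_number(std::uint8_t v) { return v; }

// Unary plus triggers the same promotion.
static_assert(std::is_same<decltype(+std::int8_t{}), int>::value,
              "unary + promotes int8_t to int");
```

With std::int8_t x = 65;, std::cout << x typically prints A, while std::cout << as_number(x) or std::cout << +x prints 65.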

is it possible to get definitive/absolute sized types in C/C++?

I've glossed over some documentation and it seems like the spec only requires 'int' or 'long' or whatever to be able to hold "at least some range of values" (often corresponding to the max range afforded by n bytes).
Anyways, is there a reasonable way to ask for an integer of exactly n bits/bytes? I don't even need a way to specify arbitrary length or anything weird, I'd just want a type with definitively 2 bytes, or definitively 4 bytes. like "int32" or something.
Currently, the way I'm dealing with this is by having a char array of n length, then casting it to an int * and dereferencing.
(My reasoning for wanting this has to do with reading/writing to files directly from structs- and I acknowledge that with this I'll have to worry about struct packing and endianness and stuff with that, but that's another issue...)
Also, "compatibility" with like super limited embedded systems is not a particular concern.
Thanks!
The C++11 standard defines integer types of definite size, provided they are available on the target architecture.
#include <cstdint>
std::int8_t c; // 8-bit signed integer
std::int16_t s; // 16-bit signed integer
std::int32_t i; // 32-bit signed integer
std::int64_t l; // 64-bit signed integer
and the corresponding unsigned types with
std::uint8_t uc; // 8-bit unsigned integer
std::uint16_t us; // 16-bit unsigned integer
std::uint32_t ui; // 32-bit unsigned integer
std::uint64_t ul; // 64-bit unsigned integer
As noted in the comments, these types are also available in C from the stdint.h header without the std:: namespace prefix:
#include <stdint.h>
uint32_t ui;
In addition to the types of definite size, these header files also define types
that are at least n bits wide but may be larger, e.g. int_least16_t with at least 16 bits
that provide the fastest implementation of integers with at least n bits but may be larger, e.g. std::int_fast32_t with at least 32 bits.
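The "at least n bits" guarantee of the least and fast types can be checked with std::numeric_limits. A minimal sketch; width_bits is a made-up helper:

```cpp
#include <cstdint>
#include <limits>

// Width in bits (value bits plus the sign bit, if any) of an integer type.
template <typename T>
constexpr int width_bits() {
    return std::numeric_limits<T>::digits +
           (std::numeric_limits<T>::is_signed ? 1 : 0);
}

static_assert(width_bits<std::int_least16_t>() >= 16, "at least 16 bits");
static_assert(width_bits<std::int_fast32_t>() >= 32, "at least 32 bits");
// The least type is the narrowest candidate, so the fast type is never narrower.
static_assert(width_bits<std::int_fast16_t>() >= width_bits<std::int_least16_t>(),
              "fast is at least as wide as least");
```

The fast types can be noticeably wider than their name suggests; on some platforms (x86-64 with glibc, for instance) int_fast16_t is 64 bits wide.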
The types declared in <cstdint>, such as int32_t, will either be exactly that number of bits [32 in this example], or not exist if the architecture doesn't support values of that size. There are also types such as int_fast32_t, which is guaranteed to hold a 32-bit value but could be larger, and int_least32_t, which has a similar guarantee.
The current C++ standard provides fixed-width integer types like std::int16_t and std::uint16_t, where 16 is the type's size in bits.
You can use the types from <stdint.h>, but you cannot be sure that there is exactly the type you want.
If your architecture does have exact 16-bit and 32-bit types, which is highly likely, then you can use int16_t, uint16_t, int32_t and uint32_t; if not, the types int_fast32_t and uint_fast32_t as well as int_least32_t and uint_least32_t, etc. are always available.
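As for the char-array-and-cast approach mentioned in the question: dereferencing a char buffer through an int * can run into alignment and aliasing problems. A safer sketch decodes the bytes explicitly. It assumes 8-bit bytes and a little-endian file layout, and read_u32_le is a hypothetical name:

```cpp
#include <cstdint>

// Decode a 32-bit little-endian value from four octets without casting
// the buffer to an integer pointer. Assumes CHAR_BIT == 8.
constexpr std::uint32_t read_u32_le(const unsigned char* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Example input: the bytes 78 56 34 12 as they might appear in a file.
static constexpr unsigned char example_bytes[] = {0x78, 0x56, 0x34, 0x12};
static_assert(read_u32_le(example_bytes) == 0x12345678u, "decoded value");
```

Because the byte order is spelled out in the shifts, the result does not depend on the endianness of the machine running the code.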

size guarantee for integral/arithmetic types in C and C++

I know that the C++ standard explicitly guarantees the size of only char, signed char and unsigned char. Also it gives guarantees that, say, short is at least as big as char, int as big as short etc. But no explicit guarantees about absolute value of, say, sizeof(int). This was the info in my head and I lived happily with it. Some time ago, however, I came across a comment in SO (can't find it) that in C long is guaranteed to be at least 4 bytes, and that requirement is "inherited" by C++. Is that the case? If so, what other implicit guarantees do we have for the sizes of arithmetic types in C++? Please note that I am absolutely not interested in practical guarantees across different platforms in this question, just theoretical ones.
18.2.2 guarantees that <climits> has the same contents as the C library header <limits.h>.
The ISO C90 standard is tricky to get hold of, which is a shame considering that C++ relies on it, but the section "Numerical limits" (numbered 2.2.4.2 in a random draft I tracked down on one occasion and have lying around) gives minimum values for the INT_MAX etc. constants in <limits.h>. For example ULONG_MAX must be at least 4294967295, from which we deduce that the width of long is at least 32 bits.
There are similar restrictions in the C99 standard, but of course those aren't the ones referenced by C++03.
This does not guarantee that long is at least 4 bytes, since in C and C++ "byte" is basically defined to mean "char", and it is not guaranteed that CHAR_BIT is 8 in C or C++. CHAR_BIT == 8 is guaranteed by both POSIX and Windows.
Don't know about C++. In C you have
Annex E
(informative)
Implementation limits
[#1] The contents of the header <limits.h> are given below,
in alphabetical order. The minimum magnitudes shown shall
be replaced by implementation-defined magnitudes with the
same sign. The values shall all be constant expressions
suitable for use in #if preprocessing directives. The
components are described further in 5.2.4.2.1.
#define CHAR_BIT 8
#define CHAR_MAX UCHAR_MAX or SCHAR_MAX
#define CHAR_MIN 0 or SCHAR_MIN
#define INT_MAX +32767
#define INT_MIN -32767
#define LONG_MAX +2147483647
#define LONG_MIN -2147483647
#define LLONG_MAX +9223372036854775807
#define LLONG_MIN -9223372036854775807
#define MB_LEN_MAX 1
#define SCHAR_MAX +127
#define SCHAR_MIN -127
#define SHRT_MAX +32767
#define SHRT_MIN -32767
#define UCHAR_MAX 255
#define USHRT_MAX 65535
#define UINT_MAX 65535
#define ULONG_MAX 4294967295
#define ULLONG_MAX 18446744073709551615
So char <= short <= int <= long <= long long
and
CHAR_BIT * sizeof (char) >= 8
CHAR_BIT * sizeof (short) >= 16
CHAR_BIT * sizeof (int) >= 16
CHAR_BIT * sizeof (long) >= 32
CHAR_BIT * sizeof (long long) >= 64
Yes, C++ type sizes are inherited from C89.
I can't find the specification right now. But it's in the Bible.
Be aware that the guaranteed ranges of these types are one value narrower than what most machines provide:
signed char: -127 ... +127 is guaranteed, but most two's complement machines have -128 ... +127
Likewise for the larger types.
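Whether a given implementation has the extra negative value can be tested with std::numeric_limits. A sketch with a made-up helper, has_extra_negative_value, meant for signed types only:

```cpp
#include <limits>

// The portable guarantee for signed char.
static_assert(std::numeric_limits<signed char>::min() <= -127, "guaranteed minimum");
static_assert(std::numeric_limits<signed char>::max() >= 127, "guaranteed maximum");

// True when the most negative value has no positive counterpart,
// as on two's complement machines (e.g. -128 ... +127).
template <typename T>
constexpr bool has_extra_negative_value() {
    static_assert(std::numeric_limits<T>::is_signed, "signed types only");
    return std::numeric_limits<T>::min() + std::numeric_limits<T>::max() == -1;
}
```

On a two's complement machine has_extra_negative_value<signed char>() is true; portable code should still only rely on the -127 ... +127 range.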
There are several inaccuracies in what you read. These inaccuracies were either present in the source, or maybe you remembered it all incorrectly.
Firstly, a pedantic remark about one peculiar difference between C and C++. C language does not make any guarantees about the relative sizes of integer types (in bytes). C language only makes guarantees about their relative ranges. It is true that the range of int is always at least as large as the range of short and so on. However, it is formally allowed by C standard to have sizeof(short) > sizeof(int). In such case the extra bits in short would serve as padding bits, not used for value representation. Obviously, this is something that is merely allowed by the legal language in the standard, not something anyone is likely to encounter in practice.
In C++, on the other hand, the language specification makes guarantees about both the relative ranges and the relative sizes of the types, so in C++, in addition to the above range relationship inherited from C, it is guaranteed that sizeof(int) is greater than or equal to sizeof(short).
Secondly, the C language standard guarantees minimum range for each integer type (these guarantees are present in both C and C++). Knowing the minimum range for the given type, you can always say how many value-forming bits this type is required to have (as minimum number of bits). For example, it is true that type long is required to have at least 32 value-forming bits in order to satisfy its range requirements. If you want to recalculate that into bytes, it will depend on what you understand under the term byte. If you are talking specifically about 8-bit bytes, then indeed type long will always consist of at least four 8-bit bytes. However, that does not mean that sizeof(long) is always at least 4, since in C/C++ terminology the term byte refers to char objects. char objects are not limited to 8-bits. It is quite possible to have 32-bit char type in some implementation, meaning that sizeof(long) in C/C++ bytes can legally be 1, for example.
The C standard does not explicitly say that long has to be at least 4 bytes, but it does specify a minimum range for each integral type, which implies a minimum size.
For example, the minimum range of an unsigned long is 0 to 4,294,967,295. You need at least 32 bits to represent every single number in that range. So yes, the standard guarantees (indirectly) that a long is at least 32 bits.
C++ inherits the data types from C, so you have to go look at the C standard. The C++ standard actually refers to parts of the C standard in this case.
Just be careful about the fact that some machines have chars that are more than 8 bits. For example, IIRC on the TI C5x, a long is 32 bits, but sizeof(long)==2 because chars, shorts and ints are all 16 bits with sizeof(char)==1.

Relation between word length, character size, integer size and byte

What is the relation between word length, character size, integer size, and byte in C++?
The standard requires that certain types have minimum sizes (short is at least 16 bits, int is at least 16 bits, etc), and that some groups of type are ordered (sizeof(int) >= sizeof(short) >= sizeof(char)).
In C++ a char must be large enough to hold any character in the implementation's basic character set.
int has the "natural size suggested by the architecture of the execution environment". Note that this means that an int does not need to be at least 32 bits in size. Implementations where int is 16 bits are common (think embedded or MS-DOS).
The following are taken from various parts of the C++98 and C99 standards:
long int has to be at least as large as int
int has to be at least as large as short
short has to be at least as large as char
Note that they could all be the same size.
Also (assuming a two's complement implementation):
long int has to be at least 32-bits
int has to be at least 16-bits
short has to be at least 16-bits
char has to be at least 8 bits
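Both lists can be written down as compile-time checks. A sketch in C++; short_int_long_same_size is a hypothetical helper that shows the "all the same size" case is allowed:

```cpp
#include <climits>

// Ordering of sizes.
static_assert(sizeof(char) == 1, "char is one byte by definition");
static_assert(sizeof(short) >= sizeof(char), "short is at least as large as char");
static_assert(sizeof(int) >= sizeof(short), "int is at least as large as short");
static_assert(sizeof(long) >= sizeof(int), "long is at least as large as int");

// Minimum widths in bits.
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");
static_assert(sizeof(short) * CHAR_BIT >= 16, "short has at least 16 bits");

// Hypothetical helper: true on an implementation where short, int and long coincide.
constexpr bool short_int_long_same_size() {
    return sizeof(short) == sizeof(int) && sizeof(int) == sizeof(long);
}
```

None of these assertions can fail on a conforming C++ implementation, whereas the helper's result varies from platform to platform.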
The Standard doesn't know about this "word" thingy used by processors. But it says the type "int" should have the natural size for an execution environment. But even for 64-bit environments, int is usually only 32 bits. So "word" in Standard terms has pretty much no common meaning (except for the common English "word" of course).
Character size is the size of a character, and it depends on which character type you are talking about. The character types are char, unsigned char and signed char. There is also wchar_t, which is used to store characters and can have any size (determined by the implementation, but it must use one of the integer types as its underlying type, much like enumerations), while char, signed char and unsigned char each occupy exactly one byte. That means that one byte has as many bits as one char has. If an implementation says one object of type char has 16 bits, then a byte has 16 bits too.
Now, a byte is the amount of storage that one char occupies. It's a unit, not some specific type. There is not much more to it, just that it is the smallest unit in which you can address memory. I.e., you do not have pointer access to bit-fields, but you have access to units starting at one byte.
"Integer size" is a pretty broad term. What do you mean? All of bool, char, short, int, long and their unsigned counterparts are integers. Their range is what I would call "integer size", and it is documented in the C standard and taken over by the C++ Standard. For signed char the minimum range is -127 to 127; for short and int it is the same, -2^15+1 to 2^15-1; and for long it is -2^31+1 to 2^31-1. Their unsigned counterparts range from 0 up to 2^8-1, 2^16-1 and 2^32-1 respectively. Those are, however, minimum ranges. That is, an int may not have a maximum value of 2^14 on any platform, because that is less than 2^15-1. It follows from those values that a minimum number of bits is required: for char that is 8, for short/int that is 16 and for long that is 32. Two's-complement representation for negative numbers is not required, which is why the guaranteed minimum for signed char, for example, is -127 rather than -128.
Standard C++ doesn't have a fundamental datatype called word or byte. The rest are well defined as ranges. The base is a char, which has CHAR_BIT bits. The most commonly used value of CHAR_BIT is 8.
sizeof( char ) == 1 ( one byte ) (in both C and C++)
sizeof( int ) >= sizeof( char )
word: not a C++ type; in computer architecture it usually means the processor's natural unit of data (in x86 terminology, 2 bytes)
Kind of depends on what you mean by relation. The size of numeric types is generally related to the machine word size. In everyday usage a byte is 8 bits, but in C++ a byte is the storage occupied by one char, which is CHAR_BIT bits (at least 8). Whether plain char is signed or unsigned is implementation-defined (check your ARM for details).
The general rule is, don't make any assumptions about the actual size of data types. The standard specifies relationships between the types such as a "long" integer will be either the same size or larger than an "int". Individual implementations of the language will pick specific sizes for the types that are convenient for them. For example, a compiler for a 64-bit processor will choose different sizes than a compiler for a 32-bit processor.
You can use the sizeof() operator to examine the specific sizes for the compiler you are using (on the specific target architecture).