Using constants and their associated modifiers with gcc (C++)

I was not sure what to call these flags, but what I am referring to is:
#define TEST_DEF 50000U //<- the "U" here
Google searching when you are not familiar with the jargon used to describe your question is futile.
What I am trying to do is use these constant definitions and make sure the value is only of a certain length, namely 8 or 16 bits.
How can I do this and what is it referred to as?

For integers, the section of the standard (ISO/IEC 9899:2011 — aka C2011 or C11) defining these suffixes is:
§6.4.4.1 Integer constants
Where it defines the integer-suffixes:
integer-suffix:
    unsigned-suffix long-suffix_opt
    unsigned-suffix long-long-suffix
    long-suffix unsigned-suffix_opt
    long-long-suffix unsigned-suffix_opt
unsigned-suffix: one of
    u U
long-suffix: one of
    l L
long-long-suffix: one of
    ll LL
The corresponding suffixes for floating point numbers are f, F, l and L (for float and long double).
Note that it would be perverse to use l because it is far too easily confused with 1, so the suffixes are most often written with upper-case letters.
If you want to create integer literals that are of a given size, then the facilities to do so are standardized by <stdint.h> (added in C99).
The header (conditionally) defines fixed-size types such as int8_t and uint16_t. It also (unconditionally) provides minimum-sized types such as int_least8_t and uint_least16_t. If it cannot provide exact types (perhaps because the word size is 36 bits, so sizes 9, 18 and 36 are handled), it can still provide the least types.
It also provides macros such as INT8_C which ensure that the argument is an int_least8_t value.
Hence, you could use:
#include <stdint.h>
#define TEST_DEF UINT16_C(50000)
and you are guaranteed that the value will be at least 16 bits of unsigned integer, and formatted/qualified correctly.
§7.20.4 Macros for integer constants
¶1 The following function-like macros expand to integer constants suitable for initializing
objects that have integer types corresponding to types defined in <stdint.h>. Each
macro name corresponds to a similar type name in 7.20.1.2 or 7.20.1.5.
¶2 The argument in any instance of these macros shall be an unsuffixed integer constant (as
defined in 6.4.4.1) with a value that does not exceed the limits for the corresponding type.
¶3 Each invocation of one of these macros shall expand to an integer constant expression
suitable for use in #if preprocessing directives. The type of the expression shall have
the same type as would an expression of the corresponding type converted according to
the integer promotions. The value of the expression shall be that of the argument.
§7.20.4.1 Macros for minimum-width integer constants
¶1 The macro INTN_C(value) shall expand to an integer constant expression
corresponding to the type int_leastN_t. The macro UINTN_C(value) shall expand
to an integer constant expression corresponding to the type uint_leastN_t. For
example, if uint_least64_t is a name for the type unsigned long long int,
then UINT64_C(0x123) might expand to the integer constant 0x123ULL.
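As a companion sketch (my own illustration, not part of the quoted standard text): <inttypes.h> provides format macros matching these least-width types, so a constant created with UINT16_C can also be printed portably, assuming a C99 compiler.

#include <inttypes.h>
#include <stdio.h>

#define TEST_DEF UINT16_C(50000)

int main(void)
{
    /* PRIuLEAST16 matches uint_least16_t, the type family behind UINT16_C. */
    printf("TEST_DEF = %" PRIuLEAST16 "\n", (uint_least16_t) TEST_DEF);
    return 0;
}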

There are five integer literal suffixes in C: u, l, ul, ll, and ull. Unlike nearly everything else in C they are case insensitive; also, ul and ull can be written as lu and llu respectively (however, lul is not acceptable).
They control the type of the constant. They work approximately like this:
literal │ type
────────┼───────────────────────
500 │ int
500u │ unsigned int
500l │ long int
500ul │ unsigned long int
500ll │ long long int
500ull │ unsigned long long int
This is only an approximation, because if the constant is too large for the indicated type, it is "promoted" to a larger type. The rules for this are sufficiently complicated that I'm not going to try to describe them. The rules for "promoting" hexadecimal and octal literals are slightly different than the rules for "promoting" decimal literals, and they are also slightly different in C99 versus C90 and different again in C++.
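To make the decimal-versus-hexadecimal difference concrete, here is a small sketch (my own illustration, using C11's _Generic and assuming 32-bit int and long): an unsuffixed decimal constant never acquires an unsigned type, but a hexadecimal one can.

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), \
    unsigned int: "unsigned int", \
    long long: "long long", \
    default: "something else")

int main(void)
{
    /* Same mathematical value, different types under the C99 rules: */
    printf("4294967295 -> %s\n", TYPE_NAME(4294967295)); /* long long */
    printf("0xFFFFFFFF -> %s\n", TYPE_NAME(0xFFFFFFFF)); /* unsigned int */
    return 0;
}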
Because of the promotion effect, it is not possible to use these suffixes to limit constants to any size. If you write 281474976710656 on a system where int and long are both 32 bits wide, the constant will be given type long long even though you didn't say to do that. Moreover, there are no suffixes to force a constant to have type short or char. You can indicate your intent with the [U]INT{8,16,32,64,MAX}_C macros from <stdint.h>, but those do not impose any upper limit either, and on all systems I can conveniently get at right now (OSX, Linux), *INT8_C and *INT16_C actually produce values with type (unsigned) int.
Your compiler may, but is not required to, warn if you write ((uint8_t) 512) or similar (where 512 is a compile-time constant value outside the range of the type). In C11 you can use static_assert (from <assert.h>) to force the issue, but it might be a bit tedious to write; a minimal sketch follows.
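For instance, assuming a C11 compiler (TEST_DEF as in the question):

#include <assert.h>
#include <stdint.h>

#define TEST_DEF 50000U

/* Compilation fails if TEST_DEF does not fit in 16 unsigned bits. */
static_assert(TEST_DEF <= UINT16_MAX, "TEST_DEF does not fit in 16 bits");

int main(void) { return 0; }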

This is an unsigned literal (the U is the suffix). See: http://en.cppreference.com/w/cpp/language/integer_literal

Related

How compilers identify the length of byte shift operators

Consider the following line:
int mask = 1 << shift_amount;
we know that mask is 4 bytes because it was explicitly declared int, but this 1 that is to be shifted has no explicit type. If the compiler chose char it would be 8 bits, or it could be unsigned short with size 16 bits, so the result of the shift really depends on how the compiler decides to treat that 1. How does the compiler decide here? And is it safe to leave the code this way, or should it instead be:
int flag = 1;
int mask = flag << shift_amount;
1 is an int (typically 4 bytes). If you wanted it to be a type other than int you'd use a suffix, like 1L for long. For more details see https://en.cppreference.com/w/cpp/language/integer_literal.
You can also use a cast like (long)1 or if you want a known fixed length, (int32_t)1.
As Eric Postpischil points out in a comment, values smaller than int like (short)1 are not useful because the left-hand argument to << is promoted to int anyway.
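A short sketch of why the literal's type matters in shift expressions (my own illustration, assuming 32-bit int):

#include <stdio.h>

int main(void)
{
    int shift_amount = 40;
    /* 1 has type int; shifting a 32-bit int by 40 is undefined behavior:
       long long bad = 1 << shift_amount; */
    long long mask = 1LL << shift_amount; /* OK: left operand is long long */
    printf("%lld\n", mask); /* 1099511627776 */
    return 0;
}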
The 2018 C standard says in 6.4.4 ¶3:
Each constant has a type, determined by its form and value, as detailed later.
This means we can always tell what the type of a constant is just from the text of the constant itself, without regard to the expression it appears in. (Here, “constant” actually means a literal: A thing whose value is given by its text. For example 34 and 'A' literally represent the number 34 and the character A, in contrast to an identifier foo that refers to some object.)
(This answer addresses C specifically. The rules described below are different in C++.)
The subclauses of 6.4.4 detail the various kinds of constants (integers, floating-point, enumerations, and characters). An integer constant without a suffix that can be represented in an int is an int, so 1 is an int.
If an integer constant has a suffix or does not fit in an int, then its type is affected by its suffix, its value, and whether it is decimal, octal, or hexadecimal, according to a table in 6.4.4.1 ¶5.
Floating-point constants are double if they have no suffix, float with f or F, and long double with l or L.
Enumeration constants (declared with enum) have type int. (And these are not directly literals as I describe above, because they are names for values, but the name does indicate the value by way of the enum declaration.)
Character constants without a prefix have type int. Constants with prefixes L, u, or U have type wchar_t, char16_t, or char32_t, respectively.
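As an illustrative sketch (mine, not from the standard): C11's _Generic can make these typing rules visible at compile time.

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), \
    int: "int", \
    unsigned int: "unsigned int", \
    long: "long", \
    double: "double", \
    float: "float", \
    default: "something else")

int main(void)
{
    printf("1    -> %s\n", TYPE_NAME(1));    /* int */
    printf("1U   -> %s\n", TYPE_NAME(1U));   /* unsigned int */
    printf("1L   -> %s\n", TYPE_NAME(1L));   /* long */
    printf("1.0  -> %s\n", TYPE_NAME(1.0));  /* double */
    printf("1.0f -> %s\n", TYPE_NAME(1.0f)); /* float */
    printf("'A'  -> %s\n", TYPE_NAME('A'));  /* int, in C */
    return 0;
}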

What type is used in C++ to define an array size?

Compiling some test code with avr-gcc for an 8-bit microcontroller, the lines
const uint32_t N = 65537;
uint8_t values[N];
I got the following compilation warning (which really should be an error by default):
warning: conversion from 'long unsigned int' to 'unsigned int' changes value from '65537' to '1' [-Woverflow]
uint8_t values[N];
Note that when compiling for this target, sizeof(int) is 2.
So it seems that an array size cannot exceed the range of an unsigned int.
Am I correct? Is this GCC-specific or is it part of some C or C++ standard?
Before somebody remarks that an 8-bit microcontroller generally does not have enough memory for an array so large, let me just say in advance that this is beside the point.
size_t is conventionally considered the type to use, despite not being formally mandated for this by either the C or C++ standard.
The rationale for this is that sizeof(values) will be of that type (that is mandated by the C and C++ standards), and the number of elements necessarily cannot be greater than this, since sizeof for an object is at least 1.
So it seems that an array size cannot exceed the range of an unsigned int.
That seems to be the case in your particular C[++] implementation.
Am I correct? Is this GCC-specific or is it part of some C or C++ standard?
It is not a characteristic of GCC in general, nor is it specified by either the C or C++ standard. It is a characteristic of your particular implementation: a version of GCC for your specific computing platform.
The C standard requires the expression designating the number of elements of an array to have an integer type, but it does not specify a particular one. I do think it's strange that your GCC seems to claim it's giving you an array with a different number of elements than you specified. I don't think that conforms to the standard, and I don't think it makes much sense as an extension. I would prefer to see it reject the code instead.
I'll dissect the issue with the rules in the "incorrekt and incomplet" ISO CPP standard draft n4659. Emphasis is added by me.
11.3.4 defines array declarations. Paragraph one contains
If the constant-expression [between the square brackets] (8.20) is present, it shall be a converted constant expression of type std::size_t [...].
std::size_t is from <cstddef> and defined as
[...] an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object.
Since it is imported via the C standard library headers the C standard is relevant for the properties of size_t. The ISO C draft N2176 prescribes in 7.20.3 the "minimal maximums", if you want, of integer types. For size_t that maximum is 65535. In other words, a 16 bit size_t is entirely conformant.
A "converted constant expression" is defined in 8.20/4:
A converted constant expression of type T is an expression, implicitly converted to type T, where the converted expression is a constant expression and the implicit conversion sequence contains only [any of 10 distinct conversions, one of which concerns integers (par. 4.7):]
— integral conversions (7.8) other than narrowing conversions (11.6.4)
An integral conversion (as opposed to a promotion which changes the type to equivalent or larger types) is defined as follows (7.8/3):
A prvalue of an integer type can be converted to a prvalue of another integer type.
7.8/5 then excludes the integral promotions from the integral conversions. This means that the conversions are usually narrowing type changes.
Narrowing conversions (which, as you'll remember, are excluded from the list of allowed conversions in converted constant expressions used for array sizes) are defined in the context of list-initialization, 11.6.4, par. 7
A narrowing conversion is an implicit conversion
[...]
(7.3)¹ — from an integer type [...] to an integer type that cannot represent all the values of the original type, except where the source is a constant expression whose value after integral promotions will fit into the target type.
This is effectively saying that the array size must be exactly the constant value as written, which is an entirely reasonable requirement for avoiding surprises.
Now let's cobble it all together. The working hypothesis is that std::size_t is a 16 bit unsigned integer type with a value range of 0..65535. The integer literal 65537 is not representable in the system's 16 bit unsigned int and thus has type long. Therefore it will undergo an integer conversion. This will be a narrowing conversion because the value is not representable in the 16 bit size_t², so that the exception condition in 11.6.4/7.3, "value fits anyway", does not apply.
So what does this mean?
11.6.4/3.11 is the catch-all rule for the failure to produce an initializer value from an item in an initializer list. Because the initializer-list rules are used for array sizes, we can assume that the catch-all for conversion failure applies to the array size constant:
(3.11) — Otherwise, the program is ill-formed.
A conformant compiler is required to produce a diagnostic, which it does. Case closed.
¹ Yes, they sub-divide paragraphs.
² Converting an integer value of 65537 (in whatever type can hold the number — here probably a long) to a 16 bit unsigned integer is a defined operation. 7.8/2 details:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source
integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s
complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is
no truncation). —end note ]
The binary representation of 65537 is 1_0000_0000_0000_0001, i.e. only the least significant bit of the lower 16 bits is set. The conversion to a 16 bit unsigned value (which circumstantial evidence indicates size_t is) computes the [expression value] modulo 2^16, i.e. simply takes the lower 16 bits. This results in the value of 1 mentioned in the compiler diagnostics.
In your implementation size_t is defined as unsigned int and uint32_t is defined as a long unsigned int. When you create a C array the argument for the array size gets implicitly converted to size_t by the compiler.
This is why you're getting a warning. You're specifying the array size argument with a uint32_t that gets converted to size_t, and these types don't match.
This is probably not what you want. Use size_t instead.
The value returned by sizeof will be of type size_t.
It is generally used for the number of elements in an array, because it will be of sufficient size. size_t is always unsigned, but it is implementation-defined which type this is. Lastly, it is implementation-defined whether the implementation can support objects of even SIZE_MAX bytes... or even close to it.
[This answer was written when the question was tagged with C and C++. I have not yet re-examined it in light of OP’s revelation they are using C++ rather than C.]
size_t is the type the C standard designates for working with object sizes. However, it is not a cure-all for getting sizes correct.
size_t should be defined in the <stddef.h> header (and also in other headers).
The C standard does not require that expressions for array sizes, when specified in declarations, have the type size_t, nor does it require that they fit in a size_t. It is not specified what a C implementation ought to do when it cannot satisfy a request for an array size, especially for variable length arrays.
In your code:
const uint32_t N = 65537;
uint8_t values[N];
values is declared as a variable length array. (Although we can see the value of N could easily be known at compile time, it does not fit C’s definition of a constant expression, so uint8_t values[N]; qualifies as a declaration of a variable length array.) As you observed, GCC warns you that the 32-bit unsigned integer N is narrowed to a 16-bit unsigned integer. This warning is not required by the C standard; it is a courtesy provided by the compiler. More than that, the conversion is not required at all—since the C standard does not specify the type for an array dimension, the compiler could accept any integer expression here. So the fact that it has inserted an implicit conversion to the type it needs for array dimensions and warned you about it is a feature of the compiler, not of the C standard.
Consider what would happen if you wrote:
size_t N = 65537;
uint8_t values[N];
Now there would be no warning in uint8_t values[N];, as a 16-bit integer (the width of size_t in your C implementation) is being used where a 16-bit integer is needed. However, in this case, your compiler likely warns in size_t N = 65537;, since 65537 will have a 32-bit integer type, and a narrowing conversion is performed during the initialization of N.
However, the fact that you are using a variable length array suggests you may be computing array sizes at run-time, and this is only a simplified example. Possibly your actual code does not use constant sizes like this; it may calculate sizes during execution. For example, you might use:
size_t N = NumberOfGroups * ElementsPerGroup + Header;
In this case, there is a possibility that the wrong result will be calculated. If the variables all have type size_t, the result may easily wrap (effectively overflow the limits of the size_t type). In this case, the compiler will not give you any warning, because the values are all the same width; there is no narrowing conversion, just overflow.
Therefore, using size_t is insufficient to guard against errors in array dimensions.
An alternative is to use a type you expect to be wide enough for your calculations, perhaps uint32_t. Given that NumberOfGroups and the rest have type uint32_t, then:
const uint32_t N = NumberOfGroups * ElementsPerGroup + Header;
will produce a correct value for N. Then you can test it at run-time to guard against errors:
if ((size_t) N != N)
    Report error…
uint8_t values[(size_t) N];
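Here is a compilable sketch of that guard (my own elaboration; the variable names are carried over from the illustrative example above):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    uint32_t NumberOfGroups = 300, ElementsPerGroup = 256, Header = 16;
    const uint32_t N = NumberOfGroups * ElementsPerGroup + Header;

    /* If size_t is narrower than uint32_t, the cast can change the value;
       detect that before using N as an array size. */
    if ((size_t) N != N)
    {
        fprintf(stderr, "array size %lu does not fit in size_t\n",
                (unsigned long) N);
        return EXIT_FAILURE;
    }

    uint8_t values[(size_t) N]; /* variable length array (C99) */
    values[0] = 0;
    return values[0];
}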

Integer overflow with UDL (user defined literal) for __int128 # min negative value

For clarity and simplicity I will shorten the following numbers as follows:
−170,141,183,460,469,231,731,687,303,715,884,105,728 as -170…728
170,141,183,460,469,231,731,687,303,715,884,105,727 as 170…727
These numbers represent the minimum and maximum values of a 128-bit signed integer (__int128 in gcc).
I implemented user-defined literals (raw literals) for this data type, since gcc doesn’t offer a way of defining constants of this type: _u128 for unsigned __int128 and _i128 for __int128.
The minus character is not part of the UDL, but a unary minus operator applied to the result of the UDL.
So for a -ddddd_i128 (where d is a digit) the UDL computes a signed __int128 with the positive value ddddd and then the compiler will apply the unary minus operator to it. So far so good.
The problem is with -170…728_i128 (which should be a valid value for __int128):
the UDL computes the signed __int128 positive number 170…728, which is just outside of the range of __int128, resulting in Undefined Behavior (signed integer overflow).
Any solution to represent this number constant with a UDL?
My UDLs are declared as follows (just a non-constexpr, loopy version for now; they are raw literals):
unsigned __int128 operator"" _u128(char const *str);
__int128 operator"" _i128(char const *str);
Some usages:
1000000000000000000000000000000000_i128
-1000000000000000000000000000000000_i128
-170141183460469231731687303715884105728_i128 // <-- this has UB
170141183460469231731687303715884105727_u128
340282366920938463463374607431768211455_u128
I know that there are ways of defining the constant -170…728 with various ways, like bit shifts, mathematical operations, but I want to be able to create it in a consistent way, e.g. I don’t want this situation: you can create any constant using this UDL, except for -170…728_i128, for which you have to use extra tricks.
This is essentially the same problem that implementors have when implementing <limits.h>: INT_MIN cannot be defined (on a typical 32-bit system) as -2147483648. It can be (and commonly is) defined as (-2147483647 - 1) instead. You'll have to do something similar. There may not be any way to represent the most negative number with a single negation operator and literal, but that's okay: there is simply no need for it.
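A sketch of that trick in plain C (MY_INT_MIN is an illustrative name; the same shape works for the UDL case, e.g. -170…727_i128 - 1):

#include <limits.h>
#include <stdio.h>

/* 2147483648 has no representation as an int constant, so -2147483648 is
   "unary minus applied to a larger-typed constant". Subtracting 1 from
   -2147483647 stays within int (assumes a 32-bit two's-complement int). */
#define MY_INT_MIN (-2147483647 - 1)

int main(void)
{
    printf("%d %d\n", INT_MIN, MY_INT_MIN);
    return 0;
}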

Constant enum size no matter the number of enumerated values

Why is the size of an enum always 2 or 4 bytes (on a 16- or 32-bit architecture respectively), regardless of the number of enumerators in the type?
Does the compiler treat an enum like it does a union?
In both C and C++, the size of an enum type is implementation-defined, and is the same as the size of some integer type.
A common approach is to make all enum types the same size as int, simply because that's typically the type that makes for the most efficient access. Making it a single byte, for example, would save a very minor amount of space, but could require bigger and slower code to access it, depending on the CPU architecture.
In C, enumeration constants are by definition of type int. So given:
enum foo { zero, one, two };
enum foo obj;
the expression zero is of type int, but obj is of type enum foo, which may or may not have the same size as int. Given that the constants are of type int, it tends to be easier to make the enumerated type the same size.
In C++, the rules are different; the constants are of the enumerated type. But again, it often makes the most sense for each enum type to be one "word", which is typically the size of int, for efficiency reasons.
And the 2011 ISO C++ standard added the ability to specify the underlying integer type for an enum type. For example, you can now write:
enum foo: unsigned char { zero, one, two };
which guarantees that both the type foo and the constants zero, one, and two have a size of 1 byte. C does not have this feature, and it's not supported by older pre-2011 C++ compilers (unless they provide it as a language extension).
(Digression follows.)
So what if you have an enumeration constant too big to fit in an int? You don't need 2^31, or even 2^15, distinct constants to do this:
#include <limits.h>
enum huge { big = INT_MAX, bigger };
The value of big is INT_MAX, which is typically 2^31-1, but can be as small as 2^15-1 (32767). The value of bigger is implicitly big + 1.
In C++, this is ok; the compiler will simply choose an underlying type for huge that's big enough to hold the value INT_MAX + 1. (Assuming there is such a type; if int is 64 bits and there's no integer type bigger than that, that won't be possible.)
In C, since enumeration constants are of type int, the above is invalid. It violates the constraint stated in N1570 6.7.2.2p2:
The expression that defines the value of an enumeration constant shall
be an integer constant expression that has a value representable as an
int.
and so a compiler must reject it, or at least warn about it. gcc, for example, says:
error: overflow in enumeration values
An enum is not a structure, it's just a way of giving names to a set of integers. The size of a variable with this type is just the size of the underlying integer type. This will be a type big enough to hold the largest value in the enum. So as long as all the values fit in the same integer type, the size won't change.
The size of an enum is implementation-defined -- the compiler is allowed to choose whatever size it wants, as long as it's large enough to fit all of the values. Some compilers choose to use 4-byte enums for all enum types, while some compilers will choose the smallest type (e.g. 1, 2, or 4 bytes) which can fit the enum values. The C and C++ language standards allow both of these behaviors.
From C99 §6.7.2.2/4:
Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined,110) but shall be
capable of representing the values of all the members of the enumeration.
From C++03 §7.2/5:
The underlying type of an enumeration is an integral type that can represent all the enumerator values
defined in the enumeration. It is implementation-defined which integral type is used as the underlying type
for an enumeration except that the underlying type shall not be larger than int unless the value of an enumerator
cannot fit in an int or unsigned int. If the enumerator-list is empty, the underlying type is
as if the enumeration had a single enumerator with value 0. The value of sizeof() applied to an enumeration
type, an object of enumeration type, or an enumerator, is the value of sizeof() applied to the
underlying type.
It seems to me that the OP has assumed that an enum is some kind of collection which stores the values declared in it. This is incorrect.
An enumeration in C/C++ is simply a numeric variable with a strictly defined value range. The names in the enum are essentially aliases for numbers.
The storage size is not influenced by the number of values in the enumeration. The storage size is implementation-defined, but usually it is sizeof(int).
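A quick sketch (my own) to see what a given implementation picks; the output is implementation-defined:

#include <stdio.h>

enum color { RED, GREEN, BLUE };

int main(void)
{
    /* Commonly both print the same number (typically 4) on desktop ABIs. */
    printf("sizeof(enum color) = %zu\n", sizeof(enum color));
    printf("sizeof(int)        = %zu\n", sizeof(int));
    return 0;
}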
The size of an enum is "an integral type at least large enough to contain any of the values specified in the declaration". Many compilers will just use an int (possibly unsigned), but some will use a char or short, depending on optimization or other factors. An enum with less than 128 possible values would fit in a char (256 for unsigned char), and you would have to have 32768 (or 65536) values to overflow a short, and either 2 or 4 billion values to outgrow an int on most modern systems.
An enum is essentially just a better way of defining a bunch of different constants. Instead of this:
#define FIRST 0
#define SECOND 1
...
you just:
enum myenum
{ FIRST,
SECOND,
...
};
It helps avoid assigning duplicate values by mistake, and removes your need to even care what the particular values are (unless you really need to).
The big problem with making an enum type smaller than int when a smaller type could fit all the values is that it would make the ABI for a translation unit dependent on the number of enumeration constants. For instance, suppose you have a library that uses an enum type with 256 constants as part of its public interface, and the compiler chooses to represent the type as a single byte. Now suppose you add a new feature to the library and now need 257 constants. The compiler would have to switch to a new size/representation, and now all object files compiled for the old interface would be incompatible with your updated library; you would have to recompile everything to make it work again.
Thus, any sane implementation always uses int for enum types.

Unsigned vs signed range guarantees

I've spent some time poring over the standard references, but I've not been able to find an answer to the following:
is it technically guaranteed by the C/C++ standard that, given a signed integral type S and its unsigned counterpart U, the absolute value of each possible value of S is always less than or equal to the maximum value of U?
The closest I've gotten is from section 6.2.6.2 of the C99 standard (the wording of the C++ is more arcane to me, I assume they are equivalent on this):
For signed integer types, the bits of the object representation shall be divided into three
groups: value bits, padding bits, and the sign bit. (...) Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N).
So, in hypothetical 4-bit signed/unsigned integer types, is anything preventing the unsigned type to have 1 padding bit and 3 value bits, and the signed type having 3 value bits and 1 sign bit? In such a case the range of unsigned would be [0,7] and for signed it would be [-8,7] (assuming two's complement).
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that, with a 32-bit unsigned type for instance, -3 becomes 4294967293 via the cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
EDIT: thanks for the replies below Keith and Potatoswatter. Now, my last point of doubt is on the meaning of "subrange" in the wording of the standard. If it means a strictly "less-than" inclusion, then my example above and Keith's below are not standard-compliant. If the subrange is intended to be potentially the whole range of unsigned, then they are.
For C, the answer is no, there is no such guarantee.
I'll discuss types int and unsigned int; this applies equally to any corresponding pair of signed and unsigned types (other than char and unsigned char, neither of which can have padding bits).
The standard, in the section you quoted, implicitly guarantees that UINT_MAX >= INT_MAX, which means that every non-negative int value can be represented as an unsigned int.
But the following would be perfectly legal (I'll use ** to denote exponentiation):
CHAR_BIT == 8
sizeof (int) == 4
sizeof (unsigned int) == 4
INT_MIN = -2**31
INT_MAX = +2**31-1
UINT_MAX = +2**31-1
This implies that int has 1 sign bit (as it must) and 31 value bits, an ordinary 2's-complement representation, and unsigned int has 31 value bits and one padding bit. unsigned int representations with that padding bit set might either be trap representations, or extra representations of values with the padding bit unset.
This might be appropriate for a machine with support for 2's-complement signed arithmetic, but poor support for unsigned arithmetic.
Given these characteristics, -INT_MIN (the mathematical value) is outside the range of unsigned int.
On the other hand, I seriously doubt that there are any modern systems like this. Padding bits are permitted by the standard, but are very rare, and I don't expect them to become any more common.
You might consider adding something like this:
#if -INT_MIN > UINT_MAX
#error "Nope"
#endif
to your source, so it will compile only if you can do what you want. (You should think of a better error message than "Nope", of course.)
You got it. In C++11 the wording is clearer. §3.9.1/3:
The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.
But, what really is the significance of the connection between the two corresponding types? They are the same size, but that doesn't matter if you just have local variables.
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that, with a 32-bit unsigned type for instance, -3 becomes 4294967293 via the cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
You need to deal with whatever numeric ranges the machine supports. Instead of casting to the unsigned counterpart, cast to whatever unsigned type is sufficient: one larger than the counterpart if necessary. If no large enough type exists, then the machine may be incapable of doing what you want.
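For completeness, a sketch of the cast-then-negate technique the question describes; it is well defined whenever the guarantee discussed above holds (no padding bits in unsigned int, as on all common platforms):

#include <limits.h>
#include <stdio.h>

/* Magnitude of x as an unsigned int. Converting first and then negating in
   unsigned (modular) arithmetic avoids signed overflow even for INT_MIN,
   provided unsigned int can represent the result. */
unsigned int magnitude(int x)
{
    return (x < 0) ? -(unsigned int) x : (unsigned int) x;
}

int main(void)
{
    printf("%u\n", magnitude(-3));      /* 3 */
    printf("%u\n", magnitude(INT_MIN)); /* 2147483648 on typical systems */
    return 0;
}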