Consider the following code to set all bits of x
unsigned int x = -1;
Is this portable ? It seems to work on at least Visual Studio 2005-2010
The citation-heavy answer:
I know there are plenty of correct answers in here, but I'd like to add a few citations to the mix. I'll cite two standards: C99 n1256 draft (freely available) and C++ n1905 draft (also freely available). There's nothing special about these particular standards, they're just both freely available and whatever happened to be easiest to find at the moment.
The C++ version:
§5.3.2 ¶9: According to this paragraph, the value ~(type)0 is guaranteed to have all bits set, if (type) is an unsigned type.
The operand of ~ shall have integral or enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.
§3.9.1 ¶4: This explains how overflow works with unsigned numbers.
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer.
§3.9.1 ¶7, plus footnote 49: This explains that numbers must be binary. From this, we can infer that ~(type)0 must be the largest number representable in type (since it has all bits turned on, and each bit is additive).
The representations of integral types shall define values by use of a pure
binary numeration system49.
49) A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin
with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position. (Adapted from the American National
Dictionary for Information Processing Systems.)
Since arithmetic is done modulo 2n, it is guaranteed that (type)-1 is the largest value representable in that type. It is also guaranteed that ~(type)0 is the largest value representable in that type. They must therefore be equal.
The C99 version:
The C99 version spells it out in a much more compact, explicit way.
§6.5.3 ¶3:
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent
to the maximum value representable in that type minus E.
As in C++, unsigned arithmetic is guaranteed to be modular (I think I've done enough digging through standards for now), so the C99 standard definitely guarantees that ~(type)0 == (type)-1, and we know from §6.5.3 ¶3 that ~(type)0 must have all bits set.
The summary:
Yes, it is portable. unsigned type x = -1; is guaranteed to have all bits set according to the standard.
Footnote: Yes, we are talking about value bits and not padding bits. I doubt that you need to set padding bits to one, however. You can see from a recent Stack Overflow question (link) that GCC was ported to the PDP-10 where the long long type has a single padding bit. On such a system, unsigned long long x = -1; may not set that padding bit to 1. However, you would only be able to discover this if you used pointer casts, which isn't usually portable anyway.
Apparently it is:
(4.7) If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2n where n is
the number of bits used to represent the unsigned type). [Note: In a
two’s complement representation, this conversion is conceptual and
there is no change in the bit pattern (if there is no truncation).
It is guaranteed to be the largest amount possible for that type due to the properties of modulo.
C99 also allows it:
Otherwise, if the newtype is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that
can be represented in the newtype until the value is in the range of
the newtype. 49)
Which wold also be the largest amount possible.
Largest amount possible may not be all bits set. Use ~static_cast<unsigned int>(0) for that.
I was sloppy in reading the question, and made several comments that might be misleading because of that. I'll try to clear up the confusion in this answer.
The declaration
unsigned int x = -1;
is guaranteed to set x to UINT_MAX, the maximum value of type unsigned int. The expression -1 is of type int, and it's implicitly converted to unsigned int. The conversion (which is defined in terms of values, not representations) results in the maximum value of the target unsigned type.
(It happens that the semantics of the conversion are optimized for two's-complement systems; for other schemes, the conversion might involve something more than just copying the bits.)
But the question referred to setting all bits of x. So, is UINT_MAX represented as all-bits-one?
There are several possible representations for signed integers (two's-complement is most common, but ones'-complement and sign-and-magnitude are also possible). But we're dealing with an unsigned integer type, so the way that signed integers are represented is irrelevant.
Unsigned integers are required to be represented in a pure binary format. Assuming that all the bits of the representation contribute to the value of an unsigned int object, then yes, UINT_MAX must be represented as all-bits-one.
On the other hand, integer types are allowed to have padding bits, bits that don't contribute to the representation. For example, it's legal for unsigned int to be 32 bits, but for only 24 of those bits to be value bits, so UINT_MAX would be 2*24-1 rather than 2*32-1. So in the most general case, all you can say is that
unsigned int x = -1;
sets all the value bits of x to 1.
In practice, very very few systems have padding bits in integer types. So on the vast majority of systems, unsigned int has a size of N bits, and a maximum value of 2**N-1, and the above declaration will set all the bits of x to 1.
This:
unsigned int x = ~0U;
will also set x to UINT_MAX, since bitwise complement for unsigned types is defined in terms of subtraction.
Beware!
This is implementation-defined, as how a negative integer shall be represented, whether two's complement or what, is not defined by the C++ Standard. It is up to the compiler which makes the decision, and has to document it properly.
In short, it is not portable. It may not set all bits of x.
Related
Suppose I assign an eleven digits number to an int, what will happen? I played around with it a little bit and I know it's giving me some other numbers within the int range. How is this new number created?
It is implementation-defined behaviour. This means that your compiler must provide documentation saying what happens in this scenario.
So, consult that documentation to get your answer.
A common way that implementations define it is to truncate the input integer to the number of bits of int (after reinterpreting unsigned as signed if necessary).
C++14 Standard references: [expr.ass]/3, [conv.integral]/3
In C++20, this behavior will still be implementation-defined1 but the requirements are much stronger than before. This is a side-consequence of the requirement to use two's complement representation for signed integer types coming in C++20.
This kind of conversion is specified by [conv.integral]:
A prvalue of an integer type can be converted to a prvalue of another integer type. [...]
[...]
Otherwise, the result is the unique value of the destination type that is congruent to the source integer modulo 2N, where N is the width of the destination type.
[...]
This behaviour is the same as truncating the representation of the number to the width of the integer type you are assigning to, e.g.:
int32_t u = 0x6881736df7939752;
...will kept the 32 right-most bits, i.e., 0xf7939752, and "copy" these bits to u. In two's complement, this corresponds to -141322414.
1 This will still be implementation-defined because the size of int is implementation-defined. If you assign to, e.g., int32_t, then the behaviour is fully defined.
Prior to C++20, there were two different rules for unsigned and signed types:
A prvalue of an integer type can be converted to a prvalue of another integer type. [...]
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [...]
If the destination type is signed, the value is unchanged if it can be represented in the destination type; otherwise, the value is implementation-defined.
INT_MAX on a 32 bit system is 2,147,483,647 (231 − 1), UINT_MAX is 4,294,967,295 (232 − 1).
int thing = 21474836470;
What happens is implementation-defined, it's up to the compiler. Mine appears to truncate the higher bits. 21474836470 is 0x4fffffff6,
warning: implicit conversion from 'long' to 'int' changes
value from 21474836470 to -10 [-Wconstant-conversion]
int thing = 21474836470;
The C and C++ standards both allow signed and unsigned variants of the same integer type to alias each other. For example, unsigned int* and int* may alias. But that's not the whole story because they clearly have a different range of representable values. I have the following assumptions:
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
If an int is read through an unsigned int*, negative values wrap around as if they were casted to unsigned int. Is this correct?
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
Additionally, what about compatible but not equivalent integer types?
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
If an unsigned int is read through an int*, the value must be
within the range of int or an integer overflow occurs and the
behaviour is undefined. Is this correct?
Why would it be undefined? there is no integer overflow since no conversion or computation is done. We take an object representation of an unsigned int object and see it through an int. In what way the value of the unsigned int object transposes to the value of an int is completely implementation defined.
If an int is read through an unsigned int*, negative values wrap
around as if they were casted to unsigned int. Is this correct?
Depends on the representation. With two's complement and equivalent padding, yes. Not with signed magnitude though - a cast from int to unsigned is always defined through a congruence:
If the destination type is unsigned, the resulting value is the
least unsigned integer congruent to the source integer (modulo
2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this
conversion is conceptual and there is no change in the bit pattern (if
there is no truncation). — end note ]
And now consider
10000000 00000001 // -1 in signed magnitude for 16-bit int
This would certainly be 215+1 if interpreted as an unsigned. A cast would yield 216-1 though.
If the value is within the range of both int and unsigned int,
accessing it through a pointer of either type is fully defined and
gives the same value. Is this correct?
Again, with two's complement and equivalent padding, yes. With signed magnitude we might have -0.
On systems where int and long have the same range, alignment,
etc., can int* and long* alias? (I assume not.)
No. They are independent types.
Can char16_t* and uint_least16_t* alias?
Technically not, but that seems to be an unneccessary restriction of the standard.
Types char16_t and char32_t denote distinct types with the same
size, signedness, and alignment as uint_least16_t and
uint_least32_t, respectively, in <cstdint>, called the underlying
types.
So it should be practically possible without any risks (since there shouldn't be any padding).
If an int is read through an unsigned int*, negative values wrap around as if they were casted to unsigned int. Is this correct?
For a system using two's complement, type-punning and signed-to-unsigned conversion are equivalent, for example:
int n = ...;
unsigned u1 = (unsigned)n;
unsigned u2 = *(unsigned *)&n;
Here, both u1 and u2 have the same value. This is by far the most common setup (e.g. Gcc documents this behaviour for all its targets). However, the C standard also addresses machines using ones' complement or sign-magnitude to represent signed integers. In such an implementation (assuming no padding bits and no trap representations), the result of a conversion of an integer value and type-punning can yield different results. As an example, assume sign-magnitude and n being initialized to -1:
int n = -1; /* 10000000 00000001 assuming 16-bit integers*/
unsigned u1 = (unsigned)n; /* 11111111 11111111
effectively 2's complement, UINT_MAX */
unsigned u2 = *(unsigned *)&n; /* 10000000 00000001
only reinterpreted, the value is now INT_MAX + 2u */
Conversion to an unsigned type means adding/subtracting one more than the maximum value of that type until the value is in range. Dereferencing a converted pointer simply reinterprets the bit pattern. In other words, the conversion in the initialization of u1 is a no-op on 2's complement machines, but requires some calculations on other machines.
If an unsigned int is read through an int*, the value must be within the range of int or an integer overflow occurs and the behaviour is undefined. Is this correct?
Not exactly. The bit pattern must represent a valid value in the new type, it doesn't matter if the old value is representable. From C11 (n1570) [omitted footnotes]:
6.2.6.2 Integer types
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2N-1, so that objects of that type shall be capable of representing values from 0 to 2N-1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; signed char shall not have any padding bits. There shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M≤N). If the sign bit is zero, it shall not affect the resulting value. If the sign bit is one, the value shall be modified in one of the following ways:
the corresponding value with sign bit 0 is negated (sign and magnitude);
the sign bit has the value -2M (two's complement);
the sign bit has the value -2M-1 (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
E.g., an unsigned int could have value bits, where the corresponding signed type (int) has a padding bit, something like unsigned u = ...; int n = *(int *)&u; may result in a trap representation on such a system (reading of which is undefined behaviour), but not the other way round.
If the value is within the range of both int and unsigned int, accessing it through a pointer of either type is fully defined and gives the same value. Is this correct?
I think, the standard would allow for one of the types to have a padding bit, which is always ignored (thus, two different bit patterns can represent the same value and that bit may be set on initialization) but be an always-trap-if-set bit for the other type. This leeway, however, is limited at least by ibid. p5:
The values of any padding bits are unspecified. A valid (non-trap) object representation of a signed integer type where the sign bit is zero is a valid object representation of the corresponding unsigned type, and shall represent the same value. For any integer type, the object representation where all the bits are zero shall be a representation of the value zero in that type.
On systems where int and long have the same range, alignment, etc., can int* and long* alias? (I assume not.)
Sure they can, if you don't use them ;) But no, the following is invalid on such platforms:
int n = 42;
long l = *(long *)&n; // UB
Can char16_t* and uint_least16_t* alias? I suspect this differs between C and C++. In C, char16_t is a typedef for uint_least16_t (correct?). In C++, char16_t is its own primitive type, which compatible with uint_least16_t. Unlike C, C++ seems to have no exception allowing compatible but distinct types to alias.
I'm not sure about C++, but at least for C, char16_t is a typedef, but not necessarily for uint_least16_t, it could very well be a typedef of some implementation-specific __char16_t, some type incompatible with uint_least16_t (or any other type).
It is not defined that happens since the c standard does not exactly define how singed integers should be stored. so you can not rely on the internal representation. Also there does no overflow occur. if you just typecast a pointer nothing other happens then another interpretation of the binary data in the following calculations.
Edit
Oh, i misread the phrase "but not equivalent integer types", but i keep the paragraph for your interest:
Your second question has much more trouble in it. Many machines can only read from correctly aligned addresses there the data has to lie on multiples of the types width. If you read a int32 from a non-by-4-divisable address (because you casted a 2-byte int pointer) your CPU may crash.
You should not rely on the sizes of types. If you chose another compiler or platform your long and int may not match anymore.
Conclusion:
Do not do this. You wrote highly platform dependent (compiler, target machine, architecture) code that hides its errors behind casts that suppress any warnings.
Concerning your questions regarding unsigned int* and int*: if the
value in the actual type doesn't fit in the type you're reading, the
behavior is undefined, simply because the standard neglects to define
any behavior in this case, and any time the standard fails to define
behavior, the behavior is undefined. In practice, you'll almost always
obtain a value (no signals or anything), but the value will vary
depending on the machine: a machine with signed magnitude or 1's
complement, for example, will result in different values (both ways)
from the usual 2's complement.
For the rest, int and long are different types, regardless of their
representations, and int* and long* cannot alias. Similarly, as you
say, in C++, char16_t is a distinct type in C++, but a typedef in
C (so the rules concerning aliasing are different).
I can't find the exact specification of how int value is converted to unsigned long long in the standard. Various similar conversions, such as int -> unsigned, unsigned -> int (UB if negative), unsigned long long -> int, etc. are specified
For example GCC, -1 is converted to 0xffffffffffffffff, not to 0x00000000ffffffff. Can I rely on this behavior?
Yes, this is well defined, it is basically adding max unsigned long long + 1 to -1 which will always be max unsigned long long. This is covered in the draft C++ standard section 4.7 Integral conversions which says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
it does the same thing as C99 but the draft C99 standard is easier to understand, from section 6.3.1.3 Signed and unsigned integers:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.49)
where footnote 49 says:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
Yes, it's defined:
C++11 § 4.7 [conv.integral]/2 says this:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type).
The least unsigned integer congruent to -1 (modulo 2sizeof(unsigned long long)) is the largest value of unsigned long long possible.
Unsigned integers have guaranteed modulo arithmetic. Thus any int value v is converted to the unsigned long value u such that u = K*2n+v, where K is either 0 or 1, and where n is the number of value representation bits for unsigned long. In other words, if v is negative, just add 2n.
The power of 2 follows from the C++ standard's requirement that integers be represented with a pure binary system. With n value representation bits the number of possible values is 2n. There is not such a requirement for floating point types (you can use std::numeric_limits to check the radix of the representation of floating point values).
Also note that in order to cater to some now archaic platforms, as well as one popular compiler that does things its own way, the standard leaves the opposite conversion as undefined behavior when the unsigned value is not directly representable as a signed value. In practice, on modern systems all compilers can be told to make that reverse conversion the exact opposite of the conversion to unsigned type, and e.g. Visual C++ does that by default. However, it's worth keeping in mind that there's no formal support, so that portable code incurs a slight (now with modern computers needless) inefficiency.
I know that the following
unsigned short b=-5u;
evaluates to b being 65531 due to an underflow, but I don't understand if 5u is converted to a signed int before being transformed into -5 and then re-converted back to unsigned to be stored in b or -5u is equal to 0 - 5u (this should not be the case, -x is a unary operator)
5u is a literal unsigned integer, -5u is its negation.. Negation for unsigned integers is defined as subtraction from 2**n, which gets the same result as wrapping the result of subtraction from zero.
5u is a single token, an rvalue expression which has type unsigned int.
The unary - operator is applied to it according to the rules of unsigned
arithmetic (arithmetic modulo 2^n, where n is the number of bits in the
unsigned type). The results are converted to unsigned short; if they don't
fit (and they won't if sizeof(int) > sizeof(short)), the conversion will
be done using modulo arithmetic as well (modulo 2^n, where n is the number of
bits in the target type).
It's probably worth noting that if the original argument has type unsigned
short, the actual steps are different (although the results will always be
the same). Thus, if you'd have written:
unsigned short k = 5;
unsigned short b = -k;
the first operation would depend on the size of short. If shorts are smaller
than ints (often, but not always the case), the first step would be to promote
k to an int. (If the size of short and int are identical, then the first
step would be to promote k to unsigned int; from then on, everything
happens as above.) The unary - would be applied to this int, according to
the rules of signed integer arithmetic (thus, resulting in a value of -5). The
resulting -5 will be implicitly converted to unsigned short, using modulo
arithmetic, as above.
In general, these distinctions don't make a difference, but in cases where you
might have an integral value of INT_MIN, they could; on 2's complement
machines, -i, where i has type int and value INT_MIN is implementation
defined, and may result in strange values when later converted to unsigned.
ISO/IEC 14882-2003 section 4.7 says:
"If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
Technically not an underflow, just the representation of the signed value -5 when shown as an unsigned number. Note that signed and unsigned numbers have the same "bits" - they just display differently. If you were to print the value as a signed value [assuming it's extended using the sign bit to fill the remaining bits], it would show -5. [This assumes that it's a typical machine using 2s complement. The C standard doesn't require that signed and unsigned types are the same number of bits, nor that the computer uses 2s complement for representing signed numbers - obviously, if it's not using 2s complement, it won't match up to the value you've shown, so I made the assumption that yours IS a 2s complement machine - which is all common processors, such as x86, 68K, 6502, Z80, PDP-11, VAX, 29K, 8051, ARM, MIPS. But technically, it is not necessary for C to function correctly]
And when you use the unary operator -x, it has the same effect as 0-x [this applies for computers as well as math - it has the same result].
I've spent some time poring over the standard references, but I've not been able to find an answer to the following:
is it technically guaranteed by the C/C++ standard that, given a signed integral type S and its unsigned counterpart U, the absolute value of each possible S is always less than or equal to the maximum value of U?
The closest I've gotten is from section 6.2.6.2 of the C99 standard (the wording of the C++ is more arcane to me, I assume they are equivalent on this):
For signed integer types, the bits of the object representation shall be divided into three
groups: value bits, padding bits, and the sign bit. (...) Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and Nin the unsigned type, then M≤N).
So, in hypothetical 4-bit signed/unsigned integer types, is anything preventing the unsigned type to have 1 padding bit and 3 value bits, and the signed type having 3 value bits and 1 sign bit? In such a case the range of unsigned would be [0,7] and for signed it would be [-8,7] (assuming two's complement).
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that for instance -3 becomes 4 via cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
EDIT: thanks for the replies below Keith and Potatoswatter. Now, my last point of doubt is on the meaning of "subrange" in the wording of the standard. If it means a strictly "less-than" inclusion, then my example above and Keith's below are not standard-compliant. If the subrange is intended to be potentially the whole range of unsigned, then they are.
For C, the answer is no, there is no such guarantee.
I'll discuss types int and unsigned int; this applies equally to any corresponding pair of signed and unsigned types (other than char and unsigned char, neither of which can have padding bits).
The standard, in the section you quoted, implicitly guarantees that UINT_MAX >= INT_MAX, which means that every non-negative int value can be represented as an unsigned int.
But the following would be perfectly legal (I'll use ** to denote exponentiation):
CHAR_BIT == 8
sizeof (int) == 4
sizeof (unsigned int) == 4
INT_MIN = -2**31
INT_MAX = +2**31-1
UINT_MAX = +2**31-1
This implies that int has 1 sign bit (as it must) and 31 value bits, an ordinary 2's-complement representation, and unsigned int has 31 value bits and one padding bit. unsigned int representations with that padding bit set might either be trap representations, or extra representations of values with the padding bit unset.
This might be appropriate for a machine with support for 2's-complement signed arithmetic, but poor support for unsigned arithmetic.
Given these characteristics, -INT_MIN (the mathematical value) is outside the range of unsigned int.
On the other hand, I seriously doubt that there are any modern systems like this. Padding bits are permitted by the standard, but are very rare, and I don't expect them to become any more common.
You might consider adding something like this:
#if -INT_MIN > UINT_MAX
#error "Nope"
#endif
to your source, so it will compile only if you can do what you want. (You should think of a better error message than "Nope", of course.)
You got it. In C++11 the wording is more clear. §3.9.1/3:
The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.
But, what really is the significance of the connection between the two corresponding types? They are the same size, but that doesn't matter if you just have local variables.
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that for instance -3 becomes 4 via cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
You need to deal with whatever numeric ranges the machine supports. Instead of casting to the unsigned counterpart, cast to whatever unsigned type is sufficient: one larger than the counterpart if necessary. If no large enough type exists, then the machine may be incapable of doing what you want.