Possible results of integer xor in C++

Is it guaranteed that (2 ^ 32) == 34?

In C++20, yes.
Here's how [expr.xor] defines it:
Given the coefficients x_i and y_i of the base-2 representation ([basic.fundamental]) of the converted operands x and y, the coefficient r_i of the base-2 representation of the result r is 1 if either (but not both) of x_i and y_i is 1, and 0 otherwise.
And [basic.fundamental] covers what a base-2 representation means:
Each value x of an unsigned integer type with width N has a unique representation x = x_0·2^0 + x_1·2^1 + … + x_(N−1)·2^(N−1), where each coefficient x_i is either 0 or 1; this is called the base-2 representation of x. The base-2 representation of a value of signed integer type is the base-2 representation of the congruent value of the corresponding unsigned integer type.
In short, it doesn't really matter how it's done "physically": the operation must satisfy the more abstract, arithmetic notion of base-2 (whether this matches the bits in memory or not; of course in reality it will) and so XOR is entirely well-defined.
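As a quick illustration, here is a compile-time check of exactly the expression from the question (the static_assert message is mine, not the standard's):

// 2 is 0b000010 and 32 is 0b100000; the operands have disjoint bits,
// so XOR yields 0b100010, i.e. 34, per the base-2 definition above.
static_assert((2 ^ 32) == 34, "well-defined in C++20");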
However, this was not always the case. The wording was introduced by P1236R1, to make it crystal clear how integer operations behave and to abstract away the kind of wooly notion of a "bit".
In C++11, all we knew is that signed integers must follow "A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position" (footnote 49; be advised that this is non-normative).
This gets us most of the way there, actually, but the specific wording in [expr.xor] wasn't there: all we knew is that "the result is the bitwise exclusive OR function of the operands". At this juncture, whether that refers to a sufficiently commonly understood operation is really up to you. You'll be hard-pressed to find a dissenting opinion on what this operation was permitted to do, mind you.
So:
In C++11, YMMV.

yes
Or at least for the unedited version of the question when it was written as:
2 ^ 32 == 34
Given that the equality operator == has higher precedence than bitwise XOR ^, the expression is evaluated as:
2 ^ (32 == 34)
that is: 2 ^ 0
which is by definition 2 and thus true (non-zero)
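A minimal demonstration of the precedence difference (the expected output is noted in comments; this is just an illustrative sketch):

#include <iostream>

int main() {
    // ^ binds more loosely than ==, so the unparenthesized form parses
    // as 2 ^ (32 == 34), i.e. 2 ^ 0, i.e. 2, which is truthy.
    std::cout << (2 ^ 32 == 34) << '\n';   // prints 2
    std::cout << ((2 ^ 32) == 34) << '\n'; // prints 1, since 34 == 34
}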

No matter how the values are represented internally, the result of 2 ^ 32 is 34. The ^ operator means a binary XOR and the result you must get if you do that operation correctly is independent of how you do the operation.
The same is true of 2 + 32. You can represent 2 and 32 in binary, in decimal, or any other way you want, but the result you get had better be the way you represent 34, whatever that is.

I don't know if the standard formally defines exclusive or, but it's a well-known operation with a consistent definition. The one thing that is explicitly left out of the standard is the mapping of integer numbers to bits. Your assertion would hold for the commonly used two's complement representation as well as for the uncommon ones' complement.

Related

Does the C++ standard require the maximum of unsigned integer numbers to be of the form 2^N-1?

For T such that std::is_integral<T>::value && std::is_unsigned<T>::value is true, does the C++ standard guarantee that:
std::numeric_limits<T>::max() == 2^(std::numeric_limits<T>::digits)-1
in the mathematical sense? I am looking for a proof of that based on quotes from the standard.
C++ specifies the ranges of the integral types by reference to the C standard. The C standard says:
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
Moreover, C++ requires:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
Putting all this together, we find that an unsigned integral type has n value bits, represents the values in the range [0, 2^n), and obeys the laws of arithmetic modulo 2^n.
I believe that this is implied by [basic.fundamental]/4 (N3337):
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number
of bits in the value representation of that particular size of integer.
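For what it's worth, that conclusion can be spot-checked at compile time. A minimal sketch (the helper name is mine; it builds 2^digits - 1 in two halves so it never shifts by the full width):

#include <limits>

template <typename T>
constexpr bool max_is_two_pow_digits_minus_one() {
    // T(1) << (digits - 1) is the value of the top value bit; the max is
    // that bit's value plus (that bit's value - 1), i.e. 2^digits - 1.
    return std::numeric_limits<T>::max()
        == (T(T(1) << (std::numeric_limits<T>::digits - 1)) - 1)
         + T(T(1) << (std::numeric_limits<T>::digits - 1));
}

static_assert(max_is_two_pow_digits_minus_one<unsigned int>(), "");
static_assert(max_is_two_pow_digits_minus_one<unsigned long long>(), "");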

Behaviour of negative zero on a one's complement architecture?

Consider the following code on a one's complement architecture:
int zero = 0;
int negzero = -0;
std::cout<<(negzero < zero)<<std::endl;
std::cout<<(negzero <= zero)<<std::endl;
std::cout<<(negzero == zero)<<std::endl;
std::cout<<(~negzero)<<(~zero)<<std::endl;
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
What output would the code produce?
What lines are defined by the standard, what lines are implementation dependent, and what lines are undefined behaviour?
Based on my interpretation of the standard:
The C++ standard in §3.9.1/p3 Fundamental types [basic.fundamental] actually defers to the C standard:
The signed and unsigned integer types shall satisfy the constraints
given in the C standard, section 5.2.4.2.1.
Now if we go to ISO/IEC 9899:2011, section 5.2.4.2.1 gives a forward reference to §6.2.6.2/p2 Integer types (emphasis mine):
If the sign bit is zero, it shall not affect the resulting value. If
the sign bit is one, the value shall be modified in one of the
following ways:
- the corresponding value with sign bit 0 is negated (sign and magnitude);
- the sign bit has the value −(2^M) (two's complement);
- the sign bit has the value −(2^M − 1) (ones' complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two),
or with sign bit and all value bits 1 (for ones’ complement), is a
trap representation or a normal value. In the case of sign and
magnitude and ones’ complement, if this representation is a normal
value it is called a negative zero.
Consequently, the existence of negative zero is implementation-defined.
If we proceed further in paragraph 3:
If the implementation supports negative zeros, they shall be generated
only by:
- the &, |, ^, ~, <<, and >> operators with operands that produce such a value;
- the +, -, *, /, and % operators where one operand is a negative zero and the result is zero;
- compound assignment operators based on the above cases.
It is unspecified whether these cases actually generate a negative
zero or a normal zero, and whether a negative zero becomes a normal
zero when stored in an object.
Consequently, it is unspecified whether the related cases that you displayed are going to generate a negative zero at all.
Now proceeding to paragraph 4:
If the implementation does not support negative zeros, the behavior of
the &, |, ^, ~, <<, and >> operators with operands that would produce
such a value is undefined.
Consequently, whether the related operations result in undefined behaviour depends on whether the implementation supports negative zeros.
First of all, your first premise is wrong:
int negzero = -0;
should produce a normal zero on any conformant architecture.
References for that were given in #101010's answer:
3.9.1 Fundamental types [basic.fundamental] §3:
... The signed and unsigned integer
types shall satisfy the constraints given in the C standard, section 5.2.4.2.1.
Later, in the C standard:
5.2.4.2.1 Sizes of integer types
... Forward references: representations of types (6.2.6)
and (still C):
6.2.6 Representations of types / 6.2.6.2 Integer types § 3
If the implementation supports negative zeros, they shall be generated only by:
- the &, |, ^, ~, <<, and >> operators with operands that produce such a value;
- the +, -, *, /, and % operators where one operand is a negative zero and the result is zero;
- compound assignment operators based on the above cases.
So negzero = -0 is not such a construct and shall not produce a negative 0.
For the following lines, I will assume that the negative 0 was produced in a bitwise manner, on an implementation that supports it.
The C++ standard does not speak at all of negative zeros, and the C standard says only that their existence is implementation-defined. I could not find any paragraph explicitly saying whether a negative zero should or should not compare equal to a normal zero under the relational or equality operators.
So I will just cite the C standard, 6.5.8 Relational operators §6:
Each of the operators < (less than), > (greater than), <= (less than or equal to), and >=
(greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is false.92)
The result has type int.
and in C++ 5.9 Relational operators [expr.rel] §5
If both operands (after conversions) are of arithmetic or enumeration type, each of the operators shall yield
true if the specified relationship is true and false if it is false.
My interpretation of the standard is that an implementation may allow an alternate representation of the integer value 0 (negative zero), but it is still a representation of the value 0 and must behave accordingly in any arithmetic expression, because C 6.2.6.2 Integer types §3 says:
negative zeros [...] shall be generated only by [...] the +, -, *, /, and % operators where one operand is a negative zero and the result is zero
That means that in any operation whose result is not zero, a negative 0 must behave as a normal zero.
So these two lines at least are perfectly defined and should produce 1:
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
This line is clearly defined to be implementation dependant:
std::cout<<(~negzero)<<(~zero)<<std::endl;
because an implementation could have padding bits. If there are no padding bits, on a one's complement architecture ~zero is negzero, so ~negzero should produce a 0; but I could not find in the standard whether a negative zero should display as 0 or as -0. A negative floating-point 0 should be displayed with a minus sign, but nothing seems explicit for a negative integer zero.
For the three remaining lines involving relational and equality operators, there is nothing explicit in the standard, so I would say they are implementation-defined.
TL/DR:
Implementation-dependent:
std::cout<<(negzero < zero)<<std::endl;
std::cout<<(negzero <= zero)<<std::endl;
std::cout<<(negzero == zero)<<std::endl;
std::cout<<(~negzero)<<(~zero)<<std::endl;
Perfectly defined and should produce 1:
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
First of all, architectures that use one's complement (or that even distinguish a negative zero) are rather rare, and there's a reason for that: it's basically easier, hardware-wise, to add two's complement numbers than one's complement numbers.
The code you've posted doesn't seem to have undefined behavior or even implementation-defined behavior; it should probably not result in a negative zero (or at least the result shouldn't be distinguishable from normal zero).
Negative zeros should not be that easy to produce (and if you manage to do so, it's implementation-defined behavior at best). On a one's complement architecture they would be produced by ~0 (bit-wise inversion) rather than -0.
The C++ standard is rather vague about the actual representation and the requirements on the behavior of basic types (the specification only deals with the actual meaning of the number). This means you are basically out of luck in relating the internal representation of a number to its actual value. So even if you did this "right" and used ~0 (or whatever way is proper for the implementation), the standard still doesn't seem to bother with the representation, as the value of negative zero is still zero.
#define zero (0)
#define negzero (~0)
std::cout<<(negzero < zero)<<std::endl;
std::cout<<(negzero <= zero)<<std::endl;
std::cout<<(negzero == zero)<<std::endl;
std::cout<<(~negzero)<<(~zero)<<std::endl;
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
The first three lines should produce the same output as if negzero were defined the same as zero. The fourth line should output two zeros (as the standard requires that a 0 be rendered as 0, without a minus sign). The last two lines should output ones.
There are some hints (on how to produce negative zeros) to be found in the C standard, which actually mentions negative zero, but I don't think there's any mention that they should compare less than normal zero. The C standard also suggests that a negative zero might not survive storage in an object (that's why I avoided storing it in the above example).
Given the way C and C++ are related, it's reasonable to think that a negative zero would be produced the same way in C++ as in C, and the standard seems to allow for that. The C++ standard allows for other ways (via undefined behavior), but no others seem to be available via defined behavior. So it's rather certain that if a C++ implementation is able to produce negative zeros in a reasonable way, it would be the same way as for a similar C implementation.

Unsigned vs signed range guarantees

I've spent some time poring over the standard references, but I've not been able to find an answer to the following:
is it technically guaranteed by the C/C++ standard that, given a signed integral type S and its unsigned counterpart U, the absolute value of each possible S is always less than or equal to the maximum value of U?
The closest I've gotten is from section 6.2.6.2 of the C99 standard (the wording of the C++ is more arcane to me, I assume they are equivalent on this):
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. (...) Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M ≤ N).
So, in hypothetical 4-bit signed/unsigned integer types, is anything preventing the unsigned type to have 1 padding bit and 3 value bits, and the signed type having 3 value bits and 1 sign bit? In such a case the range of unsigned would be [0,7] and for signed it would be [-8,7] (assuming two's complement).
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that for instance -3 becomes 4 via cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
EDIT: thanks for the replies below Keith and Potatoswatter. Now, my last point of doubt is on the meaning of "subrange" in the wording of the standard. If it means a strictly "less-than" inclusion, then my example above and Keith's below are not standard-compliant. If the subrange is intended to be potentially the whole range of unsigned, then they are.
For C, the answer is no, there is no such guarantee.
I'll discuss types int and unsigned int; this applies equally to any corresponding pair of signed and unsigned types (other than char and unsigned char, neither of which can have padding bits).
The standard, in the section you quoted, implicitly guarantees that UINT_MAX >= INT_MAX, which means that every non-negative int value can be represented as an unsigned int.
But the following would be perfectly legal (I'll use ** to denote exponentiation):
CHAR_BIT == 8
sizeof (int) == 4
sizeof (unsigned int) == 4
INT_MIN = -2**31
INT_MAX = +2**31-1
UINT_MAX = +2**31-1
This implies that int has 1 sign bit (as it must) and 31 value bits, an ordinary 2's-complement representation, and unsigned int has 31 value bits and one padding bit. unsigned int representations with that padding bit set might either be trap representations, or extra representations of values with the padding bit unset.
This might be appropriate for a machine with support for 2's-complement signed arithmetic, but poor support for unsigned arithmetic.
Given these characteristics, -INT_MIN (the mathematical value) is outside the range of unsigned int.
On the other hand, I seriously doubt that there are any modern systems like this. Padding bits are permitted by the standard, but are very rare, and I don't expect them to become any more common.
You might consider adding something like this:
#if -INT_MIN > UINT_MAX
#error "Nope"
#endif
to your source, so it will compile only if you can do what you want. (You should think of a better error message than "Nope", of course.)
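In C++11 the same guard can be written as a static_assert. A sketch, assuming -INT_MIN is representable in unsigned long long (true wherever int is at most 64 bits wide):

#include <climits>

// 0ULL - (unsigned long long)INT_MIN computes the mathematical value
// -INT_MIN via modular arithmetic, without any signed overflow.
static_assert(0ULL - static_cast<unsigned long long>(INT_MIN) <= UINT_MAX,
              "unsigned int cannot represent -INT_MIN");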
You got it. In C++11 the wording is clearer. §3.9.1/3:
The range of non-negative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the value representation of each corresponding signed/unsigned type shall be the same.
But, what really is the significance of the connection between the two corresponding types? They are the same size, but that doesn't matter if you just have local variables.
In case anyone is curious, I'm relying at the moment on a technique for extracting the absolute value of a negative integer consisting of first a cast to the unsigned counterpart, and then the application of the unary minus operator (so that for instance -3 becomes 4 via cast and then 3 via unary minus). This would break on the example above for -8, which could not be represented in the unsigned type.
You need to deal with whatever numeric ranges the machine supports. Instead of casting to the unsigned counterpart, cast to whatever unsigned type is sufficient: one larger than the counterpart if necessary. If no large enough type exists, then the machine may be incapable of doing what you want.
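A sketch of that advice for int, widening to long long first (the function name is mine; it assumes long long is wider than int, which holds on common platforms):

// Returns |x| as an unsigned value; safe even for x == INT_MIN.
unsigned long long abs_widened(int x) {
    long long wide = x;  // widening conversion preserves the value exactly
    unsigned long long u = static_cast<unsigned long long>(wide);
    return wide < 0 ? 0ULL - u : u;  // unsigned negation is modular
}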

converting -1 to unsigned types

Consider the following code to set all bits of x
unsigned int x = -1;
Is this portable? It seems to work on at least Visual Studio 2005-2010.
The citation-heavy answer:
I know there are plenty of correct answers in here, but I'd like to add a few citations to the mix. I'll cite two standards: C99 n1256 draft (freely available) and C++ n1905 draft (also freely available). There's nothing special about these particular standards, they're just both freely available and whatever happened to be easiest to find at the moment.
The C++ version:
§5.3.2 ¶9: According to this paragraph, the value ~(type)0 is guaranteed to have all bits set, if (type) is an unsigned type.
The operand of ~ shall have integral or enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.
§3.9.1 ¶4: This explains how overflow works with unsigned numbers.
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
§3.9.1 ¶7, plus footnote 49: This explains that numbers must be binary. From this, we can infer that ~(type)0 must be the largest number representable in type (since it has all bits turned on, and each bit is additive).
The representations of integral types shall define values by use of a pure binary numeration system.49)
49) A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position. (Adapted from the American National Dictionary for Information Processing Systems.)
Since arithmetic is done modulo 2^n, it is guaranteed that (type)-1 is the largest value representable in that type. It is also guaranteed that ~(type)0 is the largest value representable in that type. They must therefore be equal.
The C99 version:
The C99 version spells it out in a much more compact, explicit way.
§6.5.3 ¶3:
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent
to the maximum value representable in that type minus E.
As in C++, unsigned arithmetic is guaranteed to be modular (I think I've done enough digging through standards for now), so the C99 standard definitely guarantees that ~(type)0 == (type)-1, and we know from §6.5.3 ¶3 that ~(type)0 must have all bits set.
The summary:
Yes, it is portable. unsigned type x = -1; is guaranteed to have all bits set according to the standard.
Footnote: Yes, we are talking about value bits and not padding bits. I doubt that you need to set padding bits to one, however. You can see from a recent Stack Overflow question (link) that GCC was ported to the PDP-10 where the long long type has a single padding bit. On such a system, unsigned long long x = -1; may not set that padding bit to 1. However, you would only be able to discover this if you used pointer casts, which isn't usually portable anyway.
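Both guarantees are easy to spot-check (a trivial compile-time sketch):

#include <climits>

// Both expressions must equal UINT_MAX per the passages cited above.
static_assert(static_cast<unsigned int>(-1) == UINT_MAX, "conversion is modular");
static_assert(~0u == UINT_MAX, "complement of 0 is the maximum value");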
Apparently it is:
(4.7) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). - end note]
It is guaranteed to be the largest value possible for that type due to the properties of modular arithmetic.
C99 also allows it:
Otherwise, if the newtype is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that
can be represented in the newtype until the value is in the range of
the newtype. 49)
Which would also be the largest value possible.
The largest value possible may not be all bits set. Use ~static_cast<unsigned int>(0) for that.
I was sloppy in reading the question, and made several comments that might be misleading because of that. I'll try to clear up the confusion in this answer.
The declaration
unsigned int x = -1;
is guaranteed to set x to UINT_MAX, the maximum value of type unsigned int. The expression -1 is of type int, and it's implicitly converted to unsigned int. The conversion (which is defined in terms of values, not representations) results in the maximum value of the target unsigned type.
(It happens that the semantics of the conversion are optimized for two's-complement systems; for other schemes, the conversion might involve something more than just copying the bits.)
But the question referred to setting all bits of x. So, is UINT_MAX represented as all-bits-one?
There are several possible representations for signed integers (two's-complement is most common, but ones'-complement and sign-and-magnitude are also possible). But we're dealing with an unsigned integer type, so the way that signed integers are represented is irrelevant.
Unsigned integers are required to be represented in a pure binary format. Assuming that all the bits of the representation contribute to the value of an unsigned int object, then yes, UINT_MAX must be represented as all-bits-one.
On the other hand, integer types are allowed to have padding bits, bits that don't contribute to the representation. For example, it's legal for unsigned int to be 32 bits, but for only 24 of those bits to be value bits, so UINT_MAX would be 2**24-1 rather than 2**32-1. So in the most general case, all you can say is that
unsigned int x = -1;
sets all the value bits of x to 1.
In practice, very very few systems have padding bits in integer types. So on the vast majority of systems, unsigned int has a size of N bits, and a maximum value of 2**N-1, and the above declaration will set all the bits of x to 1.
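If you want to verify the no-padding assumption on a given platform, comparing value bits against object-representation bits is enough (a sketch):

#include <climits>
#include <limits>

// digits counts value bits; sizeof * CHAR_BIT counts every bit of the
// object representation. If they match, unsigned int has no padding bits.
static_assert(std::numeric_limits<unsigned int>::digits
                  == sizeof(unsigned int) * CHAR_BIT,
              "unsigned int has padding bits on this platform");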
This:
unsigned int x = ~0U;
will also set x to UINT_MAX, since bitwise complement for unsigned types is defined in terms of subtraction.
Beware!
This is implementation-defined, as how a negative integer is represented (two's complement or otherwise) is not defined by the C++ Standard. It is up to the compiler, which makes the decision and has to document it properly.
In short, it is not portable. It may not set all bits of x.

What's a portable value for UINT_MIN?

In limits.h, there are #defines for INT_MAX and INT_MIN (and SHRT_* and LONG_* and so on), but for the unsigned types there is only UINT_MAX.
Should I define UINT_MIN myself? Is 0 (positive zero) a portable value?
It's an unsigned integer - by definition its smallest possible value is 0. If you want some justification besides just common sense, the standard says:
6.2.6.2 Integer types
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N−1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
You could use std::numeric_limits<unsigned int>::min().
If you want to be "typesafe" you could use 0U, so if you use it in an expression you will have the correct promotions to unsigned.
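Both suggestions name the same value, and the agreement can be checked at compile time:

#include <limits>

// The minimum of any unsigned type is zero; the spellings are interchangeable.
static_assert(std::numeric_limits<unsigned int>::min() == 0U,
              "unsigned int's minimum is 0");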