Consider the following code on a one's complement architecture:
int zero = 0;
int negzero = -0;
std::cout<<(negzero < zero)<<std::endl;
std::cout<<(negzero <= zero)<<std::endl;
std::cout<<(negzero == zero)<<std::endl;
std::cout<<(~negzero)<<(~zero)<<std::endl;
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
What output would the code produce?
Which lines are defined by the standard, which are implementation-dependent, and which are undefined behaviour?
Based on my interpretation of the standard:
The C++ standard, in §3.9.1/p3 Fundamental types [basic.fundamental], actually passes the ball to the C standard:
The signed and unsigned integer types shall satisfy the constraints
given in the C standard, section 5.2.4.2.1.
Now if we go to ISO/IEC 9899:2011, section 5.2.4.2.1 gives a forward reference to §6.2.6.2/p2 Integer types (emphasis mine):
If the sign bit is zero, it shall not affect the resulting value. If
the sign bit is one, the value shall be modified in one of the
following ways:
the corresponding value with sign bit 0 is negated (sign and
magnitude);
the sign bit has the value −(2^M) (two’s complement);
the sign bit has the value −(2^M − 1) (ones’ complement).
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two),
or with sign bit and all value bits 1 (for ones’ complement), is a
trap representation or a normal value. In the case of sign and
magnitude and ones’ complement, if this representation is a normal
value it is called a negative zero.
Consequently, the existence of negative zero is implementation defined.
If we proceed further in paragraph 3:
If the implementation supports negative zeros, they shall be generated
only by:
the &, |, ^, ~, <<, and >> operators with operands that produce such
a value;
the +, -, *, /, and % operators where one operand is a negative zero
and the result is zero;
compound assignment operators based on the above cases.
It is unspecified whether these cases actually generate a negative
zero or a normal zero, and whether a negative zero becomes a normal
zero when stored in an object.
Consequently, it is unspecified whether the cases you displayed will generate a negative zero at all.
Now proceeding in paragraph 4:
If the implementation does not support negative zeros, the behavior of
the &, |, ^, ~, <<, and >> operators with operands that would produce
such a value is undefined.
Consequently, whether the related operations result in undefined behaviour depends on whether the implementation supports negative zeros.
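As a hedged sketch (the variable names are mine), assuming a ones' complement implementation that does support negative zeros, these would be the only kinds of constructs that may generate one:

int nz = ~0;       // ~ may produce a negative zero (all bits set)
int nz2 = nz + 0;  // one operand is a negative zero and the result is zero,
                   // so this may (it is unspecified) also be a negative zero
int z = -1 + 1;    // neither operand is a negative zero: a normal zero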
First of all, your first premise is wrong:
int negzero = -0;
should produce a normal zero on any conformant architecture.
References for that were given in #101010's answer:
3.9.1 Fundamental types [basic.fundamental] §3:
... The signed and unsigned integer
types shall satisfy the constraints given in the C standard, section 5.2.4.2.1.
Later in C reference:
5.2.4.2.1 Sizes of integer types
... Forward references: representations of types (6.2.6)
and (still C):
6.2.6 Representations of types / 6.2.6.2 Integer types § 3
If the implementation supports negative zeros, they shall be generated only by:
the &, |, ^, ~, <<, and >> operators with arguments that produce such a value;
the +, -, *, /, and % operators where one argument is a negative zero and the result is
zero;
compound assignment operators based on the above cases.
So negzero = -0 is not such a construct and shall not produce a negative 0.
For following lines, I will assume that the negative 0 was produced in a bitwise manner, on an implementation that supports it.
The C++ standard does not speak of negative zeros at all, and the C standard says only that their existence is implementation-defined. I could not find any paragraph explicitly saying whether a negative zero should or should not compare equal to a normal zero under the relational or equality operators.
So I will just cite the C standard, 6.5.8 Relational operators §6:
Each of the operators < (less than), > (greater than), <= (less than or equal to), and >=
(greater than or equal to) shall yield 1 if the specified relation is true and 0 if it is false.92)
The result has type int.
and in C++ 5.9 Relational operators [expr.rel] §5
If both operands (after conversions) are of arithmetic or enumeration type, each of the operators shall yield
true if the specified relationship is true and false if it is false.
My interpretation of the standard is that an implementation may allow an alternate representation of the integer value 0 (negative zero), but it is still a representation of the value 0 and must behave accordingly in any arithmetic expression, because C 6.2.6.2 Integer types §3 says:
negative zeros[...] shall be generated only by [...] the +, -, *, /, and % operators where one argument is a negative zero and the result is
zero
That means that a negative 0 always carries the value 0: in any expression whose result is not zero, it behaves exactly as a normal zero would.
So these two lines at least are perfectly defined and should produce 1:
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
This line is clearly implementation-dependent:
std::cout<<(~negzero)<<(~zero)<<std::endl;
because an implementation could have padding bits. If there are no padding bits, on a one's complement architecture ~zero is negzero, so ~negzero should produce a 0, but I could not find anything in the standard saying whether a negative zero should display as 0 or as -0. A negative floating-point 0 should be displayed with a minus sign, but nothing seems explicit for a negative integer zero.
For the 3 lines involving relational and equality operators, there is nothing explicit in the standard, so I would say they are implementation-defined.
TL;DR:
Implementation-dependent:
std::cout<<(negzero < zero)<<std::endl;
std::cout<<(negzero <= zero)<<std::endl;
std::cout<<(negzero == zero)<<std::endl;
std::cout<<(~negzero)<<(~zero)<<std::endl;
Perfectly defined and should produce 1:
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
First of all, one's complement architectures (or any that even distinguish a negative zero) are rather rare, and there's a reason for that: it's basically easier, hardware-wise, to add two's complement numbers than one's complement ones.
The code you've posted doesn't seem to have undefined behavior or even implementation-defined behavior; it should probably not result in a negative zero (or the result shouldn't be distinguishable from a normal zero).
Negative zeros are not that easy to produce (and if you manage to do it, it's implementation-defined behavior at best). On a ones'-complement architecture they would be produced by ~0 (bitwise inversion) rather than -0.
The C++ standard is rather vague about the actual representation of basic types and the requirements on their behavior (the specification only deals with the meaning of the number). What this means is that you are basically out of luck in relating the internal representation of a number to its actual value. So even if you did this right and used ~0 (or whatever way is proper for the implementation), the standard still doesn't concern itself with the representation, as the value of a negative zero is still zero.
#define zero (0)
#define negzero (~0)
std::cout<<(negzero < zero)<<std::endl;
std::cout<<(negzero <= zero)<<std::endl;
std::cout<<(negzero == zero)<<std::endl;
std::cout<<(~negzero)<<(~zero)<<std::endl;
std::cout<<(1 << negzero)<<std::endl;
std::cout<<(1 >> negzero)<<std::endl;
The first three lines should produce the same output as if negzero were defined the same as zero. The fourth line should output two zeros (as the standard requires that a value of 0 be rendered as 0, without a sign). The last two should output ones.
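Under those assumptions (an implementation that supports negative zeros, has no padding bits, and prints an integer negative zero without a sign), the expected output would be:

0
1
1
00
1
1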
There are some hints (on how to produce negative zeros) to be found in the C standard, which actually mentions negative zero, but I don't think there's any mention that it should compare less than a normal zero. The C standard suggests that a negative zero might not survive being stored in an object (which is why I avoided that in the above example).
Given the way C and C++ are related, it's reasonable to think that a negative zero would be produced the same way in C++ as in C, and the standard seems to allow for that. The C++ standard allows for other ways (via undefined behavior), but none seem to be available via defined behavior. So it's rather certain that if a C++ implementation is able to produce negative zeros in a reasonable way, it does so the same way as a similar C implementation.
Related
I was reading Setting an int to Infinity in C++. I understand that when one needs true infinity, one is supposed to use numeric_limits<float>::infinity(); I guess the rationale is that integral types usually have no values designated for representing special states like NaN, Inf, etc. the way IEEE 754 floats do (again, C++ mandates neither; the int and float used are left to the implementation). But it is still misleading that max > infinity for a given type. I'm trying to understand the rationale behind this call in the standard. If having an infinity doesn't make sense for a type, then shouldn't it be disallowed, instead of having a flag that must be checked for its validity?
The function numeric_limits<T>::infinity() makes sense for those T for which numeric_limits<T>::has_infinity returns true.
In the case of T = int, it returns false, so that comparison doesn't make sense: numeric_limits<int>::infinity() does not return any meaningful value to compare with.
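A minimal sketch (mine, not from the question) of guarding the call behind that flag:

#include <iostream>
#include <limits>

int main() {
    // has_infinity is false for int, and infinity() then returns a
    // meaningless default value (zero), so check the flag first.
    if (std::numeric_limits<int>::has_infinity)
        std::cout << std::numeric_limits<int>::infinity() << '\n';
    else
        std::cout << "no infinity; max is "
                  << std::numeric_limits<int>::max() << '\n';
}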
If you read e.g. this reference you will see a table showing infinity to be zero for integer types. That's because integer types in C++ can't, by definition, be infinite.
Suppose, conversely, the Standard did reserve some value to represent infinity, and that numeric_limits<int>::infinity() > numeric_limits<int>::max(). That would mean there is some value of int which is greater than max(), that is, some representable value of int is greater than the greatest representable value of int.
Clearly, whichever way the Standard specifies it, some natural understanding is violated. Either infinity() <= max(), or there exists x such that int(x) > max(). The Standard must choose which rule of nature to violate.
I believe they chose wisely.
numeric_limits<int>::infinity() returns the representation of positive infinity, if available.
In the case of integers, positive infinity does not exist:
cout << "int has infinity: " << numeric_limits<int>::has_infinity << endl;
prints
int has infinity: false
Does the C++11 standard guarantee that the unary minus of a zero-valued signed integer is zero?
For example:
int zero = 0;
int n = -zero;
int m = -0;
assert(memcmp(&n, &zero, sizeof(int)) == 0);
assert(memcmp(&m, &zero, sizeof(int)) == 0);
I know that -0 and 0 are identical in two's complement representation, but I'd like to know if the standard allows the negation of a signed integer zero to be negative zero in other representations, such as one's complement or sign-magnitude.
All I could find in the C++11 draft is §5.3.1, paragraph 8:
The operand of the unary − operator shall have arithmetic or unscoped
enumeration type and the result is the negation of its operand.
Integral promotion is performed on integral or enumeration operands.
The negative of an unsigned quantity is computed by subtracting its
value from 2^n, where n is the number of bits in the promoted operand.
The type of the result is the type of the promoted operand.
I can't find a definition of negation within the draft.
Motivation: I'm writing a specialized integer number parser for a library (that may be open-sourced eventually) and I want to know if I should be concerned about the possibility of "-0" being interpreted as a negative-zero signed integer on uncommon architectures.
Note: I already know about negative-zero floating-point numbers.
The standard does not mandate bit patterns for integers, because it is meant to be applicable to the widest possible range of machines. It is entirely possible for a C++ compiler to use ones-complement, where zero and negative zero would be different.
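A hedged run-time probe (mine, and assuming no padding bits) that would distinguish the schemes; note the caveat that ~0 is undefined on a ones'-complement machine that does not support negative zeros (C 6.2.6.2/4):

#include <iostream>

int main() {
    int allOnes = ~0; // all bits set
    if (allOnes == -1)
        std::cout << "two's complement: all-ones is -1\n";
    else if (allOnes == 0)
        std::cout << "ones' complement: all-ones is a (negative) zero\n";
    else
        std::cout << "sign-magnitude: all-ones is -(2^M - 1)\n";
}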
Is it guaranteed that (2 ^ 32) == 34?
In C++20, yes.
Here's how [expr.xor] defines it:
Given the coefficients x_i and y_i of the base-2 representation ([basic.fundamental]) of the converted operands x and y, the coefficient r_i of the base-2 representation of the result r is 1 if either (but not both) of x_i and y_i are 1, and 0 otherwise.
And [basic.fundamental] covers what a base-2 representation means:
Each value x of an unsigned integer type with width N has a unique representation x = x_0·2^0 + x_1·2^1 + … + x_{N−1}·2^{N−1}, where each coefficient x_i is either 0 or 1; this is called the base-2 representation of x. The base-2 representation of a value of signed integer type is the base-2 representation of the congruent value of the corresponding unsigned integer type.
In short, it doesn't really matter how it's done "physically": the operation must satisfy the more abstract, arithmetic notion of base-2 (whether this matches the bits in memory or not; of course in reality it will) and so XOR is entirely well-defined.
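A tiny compile-time check (the static_assert is mine) that the arithmetic base-2 definition gives the expected result:

// 2 is binary 000010, 32 is binary 100000; XOR of the coefficients is 100010 == 34.
static_assert((2 ^ 32) == 34, "guaranteed by [expr.xor] in C++20");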
However, this was not always the case. The wording was introduced by P1236R1, to make it crystal clear how integer operations behave and to abstract away the kind of wooly notion of a "bit".
In C++11, all we knew is that signed integers must follow "A positional representation for integers that uses the binary digits 0 and 1, in which the values represented by successive bits are additive, begin with 1, and are multiplied by successive integral power of 2, except perhaps for the bit with the highest position" (footnote 49; be advised that this is non-normative).
This gets us most of the way there, actually, but the specific wording in [expr.xor] wasn't there: all we knew is that "the result is the bitwise exclusive OR function of the operands". At this juncture, whether that refers to a sufficiently commonly understood operation is really up to you. You'll be hard-pressed to find a dissenting opinion on what this operation was permitted to do, mind you.
So:
In C++11, YMMV.
Yes.
Or at least for the unedited version of the question when it was written as:
2 ^ 32 == 34
Given that the equality operator == has higher precedence than bitwise XOR ^, the expression is evaluated as:
2 ^ (32 == 34)
that is: 2 ^ 0
which is by definition 2, and thus truthy.
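A short demonstration (mine) of how much the parentheses matter here:

#include <iostream>

int main() {
    std::cout << ((2 ^ 32) == 34) << '\n'; // prints 1: XOR first, then compare
    std::cout << (2 ^ 32 == 34) << '\n';   // prints 2: parsed as 2 ^ (32 == 34)
}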
No matter how the values are represented internally, the result of 2 ^ 32 is 34. The ^ operator means a binary XOR and the result you must get if you do that operation correctly is independent of how you do the operation.
The same is true of 2 + 32. You can represent 2 and 32 in binary, in decimal, or any other way you want, but the result you get had better be the way you represent 34, whatever that is.
I don't know if the standard formally defines exclusive or, but it's a well known operation with a consistent definition. The one thing that is explicitly left out of the standard is the mapping of integer numbers to bits. Your assertion would hold for the commonly used twos-complement representation and the uncommon ones-complement.
TCPL 3rd Edition, C.6.2.1 Integral Conversions gives the following example:
signed char c = 1023; // implementation defined
Plausible results are 255 and -1 (§C.3.4).
The -1 result is obtained if the target machine uses two's complement.
What implementation of 'signedness' would result in 255?
A compiler could trivially replace your static initialiser with = 255 in this case, on a whim.
But I'd just like to say that looking for an answer to this question is the wrong approach to programming C++. The book is teaching you about C++, not about specific computers. Code to standards and you won't have any problems.
Honestly, I know of no implementation that would result in 255. It is theoretically possible, but for a signed char to hold the value 255, the char type would need at least 9 bits: 8 for the value and one for the sign.
But as you are converting a value that cannot be represented to a signed type, the result is implementation-dependent(*), so it could legitimately be SCHAR_MAX (i.e. 127) for any integer value greater than that, or an implementation could choose to convert all unrepresentable values to 0.
In any case, it has nothing to do with how negative values are represented, except that common implementations use 2's complement and simply keep the lowest-order bits, which yields -1.
But what is true with common implementations is that:
char c = 1023; // plausible values are -1 (default char is signed) or 255 (default unsigned)
(*) Refs:
From n4296 draft for C++14
4.7 Integral conversions [conv.integral]
...
If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
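A minimal sketch (mine) of what common implementations do with the quoted rule; since C++20 the conversion is no longer implementation-defined but must wrap modulo 2^N:

#include <iostream>

int main() {
    signed char c = static_cast<signed char>(1023); // 1023 is 0x3FF
    // Keeping the low 8 bits stores 0xFF, which is -1 in two's complement;
    // since C++20 this wrapping behaviour is mandatory.
    std::cout << static_cast<int>(c) << '\n'; // prints -1 on such implementations
}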
I'm reading C++ Primer and I'm slightly confused by a few comments which talk about how Bitwise operators deal with signed types. I'll quote:
Quote #1
(When talking about Bitwise operators) "If the operand is signed and
its value is negative, then the way that the “sign bit” is handled in
a number of the bitwise operations is machine dependent. Moreover,
doing a left shift that changes the value of the sign bit is
undefined"
Quote #2
(When talking about the rightshift operator) "If that operand is
unsigned, then the operator inserts 0-valued bits on the left; if it
is a signed type, the result is implementation defined—either copies
of the sign bit or 0-valued bits are inserted on the left."
The bitwise operators promote small integers (such as char) to signed ints. Isn't there an issue with this promotion to signed ints when the bitwise operators often give undefined or implementation-defined behaviour on signed operand types? Why wouldn't the standard promote char to unsigned int?
Edit: Here is the question I took out, but I've placed it back for context with some answers below.
An exercise later asks
"What is the value of ~'q' << 6 on a machine with 32-bit ints and 8 bit chars, that uses Latin-1 character set in which 'q' has the bit pattern 01110001?"
Well, 'q' is a character literal and would be promoted to int, giving
~'q' == ~00000000 00000000 00000000 01110001 == 11111111 11111111 11111111 10001110
The next step is to apply a left shift operator to the bits above, but as quote #1 mentions
"doing a left shift that changes the value of the sign bit is
undefined"
Well, I don't know exactly which bit is the sign bit, but surely the answer is undefined?
You're quite correct -- the expression ~'q' << 6 is undefined behavior according to the standard. It's even worse than you state, as the ~ operator is defined as computing "the one's complement" of the value, which is meaningless for a signed (two's complement) integer -- the term "one's complement" only really means anything for an unsigned integer.
When doing bitwise operations, if you want strictly well-defined (according to the standard) results, you generally have to ensure that the values being operated on are unsigned. You can do that either with explicit casts, or by using explicitly unsigned constants (U-suffix) in binary operations. Doing a binary operation with a signed and unsigned int is done as unsigned (the signed value is converted to unsigned).
C and C++ are subtly different with the integer promotions, so you need to be careful here -- C++ will convert a smaller-than-int unsigned value to int (signed) before comparing with the other operand to see what should be done, while C will compare operands first.
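A hedged sketch (mine) of the cast-to-unsigned approach applied to the exercise; the printed value assumes a 32-bit int:

#include <iostream>

int main() {
    unsigned int q = static_cast<unsigned char>('q'); // 0x71 in Latin-1
    unsigned int r = ~q << 6; // fully defined: unsigned ~ and << wrap modulo 2^32
    std::cout << std::hex << r << '\n'; // prints ffffe380 with 32-bit int
}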
It might be simplest to read the exact text of the Standard, instead of a summary like the one in C++ Primer. (The summary has to leave out detail by virtue of being a summary!)
The relevant portions are:
[expr.shift]
The shift operators << and >> group left-to-right.
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
[expr.unary.op]/10
The operand of ˜ shall have integral or unscoped enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.
Note that neither of these performs the usual arithmetic conversions (which is the conversion to a common type that is done by most of the binary operators).
The integral promotions:
[conv.prom]/1
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.
(There are other entries for the types in the "other than" list, I have omitted them here but you can look it up in a Standard draft).
The thing to remember about the integer promotions is that they are value-preserving: if you have a char of value -30, then after promotion it will be an int of value -30. You don't need to think about things like "sign extension".
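A one-line illustration (mine) of that value preservation, assuming char is signed on the platform:

#include <iostream>

int main() {
    char c = -30;
    std::cout << +c << '\n'; // unary + performs the integral promotion; prints -30
}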
Your initial analysis of ~'q' is correct, and the result has type int (because int can represent all the values of char on normal systems).
It turns out that any int whose most significant bit is set represents a negative value (there are rules about this in another part of the standard that I haven't quoted here), so ~'q' is a negative int.
Looking at [expr.shift]/2 we see that this means left-shifting it causes undefined behaviour (it's not covered by any of the earlier cases in that paragraph).
Of course, since the question was edited, my answer now partly addresses a different question than the one originally posed, so here goes an attempt to answer the "new" question:
The promotion rules (what gets converted to what) are well defined in the standard. The type char may be either signed or unsigned - in some compilers you can even give a flag to the compiler to say "I want unsigned char type" or "I want signed char type" - but most compilers just define char as either signed or unsigned.
A constant such as 6 is signed by default. When an operation such as 'q' << 6 is written in the code, the compiler will convert any smaller type to a larger type [in general, for any arithmetic, char is converted to int], so 'q' becomes the integer value of 'q'. If you want to avoid that, use an explicit cast, such as static_cast<unsigned>('q') << 6 - that way, you are ensured that the left operand is converted to unsigned rather than signed.
The operations are undefined because different hardware behaves differently, and there are architectures with "strange" numbering systems, which means the standards committee had to choose between ruling such machines out (or making the operations extremely inefficient on them) and defining the behaviour in a way that isn't very clear. On a few architectures, overflowing integers may also trap, and shifting so that the sign of the number changes typically counts as an overflow - and since trapping typically means "your code no longer runs", that would not be what your average programmer expects, so it falls under the umbrella of "undefined behaviour". Most processors don't trap, and nothing really bad will happen if you do this.
Old answer:
So the solution to avoid this is to always cast your signed values (including char) to unsigned before shifting them (or accept that your code may not work on another compiler, the same compiler with different options, or the next release of the same compiler).
It is also worth noting that the resulting value is "nearly always what you expect" (in that the compiler/processor will just perform the left or right shift on the value, on right shifts using the sign bit to shift down), it's just undefined or implementation defined because SOME machine architectures may not have hardware to "do this right", and C compilers still need to work on those systems.
The sign bit is the highest bit in a two's complement representation, and you are not changing it by shifting this particular number:
11111111 11111111 11111111 10001110 << 6 =
111111 11111111 11111111 11100011 10000000
^^^^^^--- goes away.
result=11111111 11111111 11100011 10000000
Or as a hex number: 0xffffe380.