The C++11 draft n3337.pdf, §5.3.1/8, states:
The operand of the unary - operator shall have arithmetic or unscoped enumeration type and the result is the negation of its operand. Integral promotion is performed on integral or enumeration operands. The negative of an unsigned quantity is computed by subtracting its value from 2ⁿ, where n is the number of bits in the promoted operand. The type of the result is the type of the promoted operand.
For some cases that is enough. Suppose unsigned int is 32 bits wide; then (-(0x80000000u)) == 0x80000000u, isn't it?
Still, I cannot find anything about unary minus on the unsigned value 0x80000000. Also, the C99 standard draft n1336.pdf, §6.5.3.3, seems to say nothing about it:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
UPDATE2: Let us suppose that unsigned int is 32 bits wide. So, the question is: what about unary minus in C (signed and unsigned), and unary minus in C++ (signed only)?
UPDATE1: both run-time behavior and compile-time behavior (i.e. constant-folding) are interesting.
(related: Why is abs(0x80000000) == 0x80000000?)
For your question, the important part of the quote you've included is this:
The negative of an unsigned quantity is computed by subtracting its
value from 2ⁿ, where n is the number of bits in the promoted operand.
So, to know what the value of -0x80000000u is, we need to know n, the number of bits in the type of 0x80000000u. This is at least 32, but this is all we know (without further information about the sizes of types in your implementation). Given some values of n, we can calculate what the result will be:
n | -0x80000000u
----+--------------
32 | 0x80000000
33 | 0x180000000
34 | 0x380000000
48 | 0xFFFF80000000
64 | 0xFFFFFFFF80000000
(For example, an implementation where unsigned int is 16 bits and unsigned long is 64 bits would have an n of 64).
C99 has equivalent wording hidden away in §6.2.5 Types p9:
A computation involving unsigned operands can never overflow, because
a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting type.
The result of the unary - operator on an unsigned operand other than zero will always be caught by this rule.
With a 32 bit int, the type of 0x80000000 will be unsigned int, regardless of the lack of a u suffix, so the result will still be the value 0x80000000 with type unsigned int.
If instead you use the decimal constant 2147483648, it will have a signed type (long, on an implementation where long is 64 bits) and the calculation will be signed. The result will be the value -2147483648 with type long.
In n1336, 6.3.1.3 Signed and Unsigned Integers, paragraph 2 defines the conversion to an unsigned integer:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So for 32-bit unsigned int, -0x80000000u==-0x80000000 + 0x100000000==0x80000000u.
Related
I came across some C++ code like this:
if(((unsigned int)0-1)>0)
{
//do something
}
and the program executed the statements in the if block. Being curious, I tried the same in C, which behaved the same way. My understanding is that the statements in the if block are executed if the expression in the if condition evaluates to true. This means ((unsigned int)0-1)>0 must be true. Why is this happening?
For (unsigned int)0-1, the operands of operator- are an unsigned int (0) and an int (1). The type of the result (i.e. the common type) is therefore unsigned int:
Otherwise, if the unsigned operand's conversion rank is greater or equal to the conversion rank of the signed operand, the signed operand is converted to the unsigned operand's type.
For unsigned int, the result couldn't be 0-1 == -1; instead:
Unsigned integer arithmetic is always performed modulo 2ⁿ
where n is the number of bits in that particular integer type. E.g. for unsigned int, adding one to UINT_MAX gives 0, and subtracting one from 0 gives UINT_MAX.
That means if(((unsigned int)0-1)>0) is equivalent to if(UINT_MAX>0), which is true.
if(((unsigned int)0-1)>0)
With ordinary arithmetic, 0-1 is negative one, which is not greater than zero. We're not dealing with ordinary arithmetic here.
The C++ (and also C) precedence rules says that casting has precedence over subtraction, so (unsigned int)0-1 is equivalent to ((unsigned int) 0)-1. In other words, the 0 is treated as an unsigned int.
Next in line we have an unsigned int minus a signed int. The C++ (and C) rule for such operations is that the signed value is treated as if it were unsigned. The C++ (and C) rule for unsigned operations is to perform the computation modulo 2ⁿ, where n is the number of bits in the common type (typically 32 for an int, but no guarantee of that). 0-1 modulo 2ⁿ is 2ⁿ-1, which is a (large) positive number.
Quoting from the standard, [basic.fundamental] paragraph 4,
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2ⁿ where n is the number of bits in the value representation of that particular size of integer.46
and the footnote:
46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
So I was playing around with types and ended up with the weird result below. Debugging it made no sense, so the only option left was to check the C++ spec, which didn't help much. I was wondering if you might know what exactly is happening here, and whether it is a 32-bit and/or 64-bit specific issue.
#include <iostream>
using namespace std;

int main() {
    unsigned int u = 1;
    signed int i = 1;
    long long lu = -1 * u;
    long long li = -1 * i;
    std::cout << "this is a weird " << lu << " " << li << std::endl;
    return 0;
}
Where the output is
this is a weird 4294967295 -1
The key observation is that the expression -1 * u is of type unsigned int. That is because the rules for arithmetic conversions* say that if one operand is unsigned int and the other is signed int, then the latter operand is converted to unsigned int. The arithmetic expressions are ultimately only defined for homogeneous operands, so the conversions happen before the operation proper.
The result of the conversion of -1 to unsigned int is a large, positive number, which is representable as a long long int, and which is the number you see in the output.
Currently, that's [expr]/(11.5.3).
The type of -1 is signed int. When you perform an arithmetic operation between objects of different fundamental type, one or both of the arguments will be converted so that both have the same type. (For non-fundamental types, there may be operator overloads for mixed operands). In this case, the signed value is converted to unsigned, following the conversion rules †.
So, -1 was converted to unsigned. But negative numbers cannot be represented by unsigned types. What happens is that the resulting value is the unique value representable by the unsigned type that is congruent to the original signed value modulo one more than the maximum value representable by the unsigned type. Which on your platform happens to be 4294967295.
†The rules ([expr], standard draft):
... rules that apply to non-integers ...
Otherwise, the integral promotions (4.5) shall be performed on both operands.61 Then the following
rules shall be applied to the promoted operands:
— If both operands have the same type, no further conversion is needed.
— Otherwise, if both operands have signed integer types or both have unsigned integer types, the
operand with the type of lesser integer conversion rank shall be converted to the type of the
operand with greater rank.
— Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type. (this applies to your case)
— Otherwise, if the type of the operand with signed integer type can represent all of the values of
the type of the operand with unsigned integer type, the operand with unsigned integer type shall
be converted to the type of the operand with signed integer type.
— Otherwise, both operands shall be converted to the unsigned integer type corresponding to the
type of the operand with signed integer type.
The evaluation of
-1 * i
is trivial multiplication of two int types: nothing strange there. And a long long must be capable of holding any int.
First, note that there is no such thing as a negative literal in C++, so
-1 * u
is evaluated as (-1) * u due to operator precedence. The type of (-1) must be int. But this will be converted to unsigned int, due to C++'s usual arithmetic conversions, as the other operand is an unsigned int. In doing that it is converted modulo UINT_MAX + 1, so you end up with UINT_MAX multiplied by 1, which is the number you observe, albeit converted to a long long type.
As a final note, the behaviour of this conversion is subject to the rules of conversion from an unsigned to a signed type: if unsigned int and long long were both 64 bits on your platform then the behaviour is implementation-defined.
The bit pattern 0xFFFFFFFF corresponds to -1 when interpreted as a 32-bit signed integer, and to 4294967295 when interpreted as a 32-bit unsigned integer.
With -2 the result is 4294967294.
With -3 the result is 4294967293.
With -4 the result is 4294967292.
....
The following results make me really confused:
int i1 = 20-80u; // -60
int i2 = 20-80; // -60
int i3 =(20-80u)/2; // 2147483618
int i4 =(20-80)/2; // -30
int i5 =i1/2; // -30
i3 seems to be computed as (20u-80u)/2 instead of (20-80u)/2.
Supposedly, i3 should be the same as i5.
IIRC, an arithmetic operation between signed and unsigned int will produce an unsigned result.
Thus, 20 - 80u produces the unsigned result equivalent to -60: if unsigned int is a 32-bit type, that result is 4294967236.
Incidentally, assigning that to i1 produces an implementation-defined result because the number is too large to fit. Getting -60 is typical, but not guaranteed.
int i1 = 20-80u; // -60
This has subtle demons! The operands have different types, so a conversion is necessary. Both operands are converted to a common type (unsigned int, in this case). The result, a large unsigned int value (60 less than UINT_MAX + 1, if my calculations are correct), will be converted to an int before it's stored in i1. Since that value is out of range of int, the result is implementation-defined, might be a trap representation, and thus might cause undefined behaviour when you attempt to use it. In your case, however, it happens to convert to -60.
int i3 =(20-80u)/2; // 2147483618
Continuing on from the first example, my guess was that the result of 20-80u would be 60 less than UINT_MAX + 1. If UINT_MAX is 4294967295 (a common value for UINT_MAX), that would mean 20-80u is 4294967236... and 4294967236 / 2 is 2147483618.
As for i2 and the others, there should be no surprises. They follow conventional mathematical calculations with no conversions, truncations, overflows or other implementation-defined behaviour whatsoever.
The binary arithmetic operators will perform the usual arithmetic conversions on their operands to bring them to a common type.
In the case of i1, i3 and i5 the common type will be unsigned int and so the result will also be unsigned int. Unsigned numbers will wrap via modulo arithmetic and so subtracting a slightly larger unsigned value will result in a number close to unsigned int max which can not be represented by an int.
So in the case of i1 we end up with an implementation defined conversion since the value can not be represented. In the case of i3 dividing by 2 brings the unsigned value back into the range of int and so we end up with a large signed int value after conversion.
The relevant sections from the C++ draft standard are as follows. Section 5.7 [expr.add]:
The additive operators + and - group left-to-right. The usual arithmetic conversions are performed for
operands of arithmetic or enumeration type.
The usual arithmetic conversions are covered in section 5 and it says:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
[...]
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
and for the conversion from a value that can not be represented for a signed type, section 4.7 [conv.integral]:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
and for unsigned integers obeys modulo arithmetic section 3.9.1 [basic.fundamental]:
Unsigned integers shall obey the laws of arithmetic modulo 2ⁿ where n is the number of bits in the value
representation of that particular size of integer.48
I was wondering what this function actually performs.
To my understanding it should return pSrc[1].
So why does it bother left-shifting pSrc[0] by 8 bits, which zeroes out those 8 bits?
And when these zeroes are ORed with pSrc[1], pSrc[1] is not affected, so you get pSrc[1] anyway, as if the bitwise OR had never happened.
/*
* Get 2 big-endian bytes.
*/
INLINE u2 get2BE(unsigned char const* pSrc)
{
return (pSrc[0] << 8) | pSrc[1];
}
This function is from the source code of the dalvik virtual machine.
https://android.googlesource.com/platform/dalvik/+/android-4.4.4_r1/vm/Bits.h
Update:
OK, now I got it thanks to all the answers here.
(1) pSrc[0] is originally an unsigned char (1 byte).
(2) When it is left-shifted (pSrc[0] << 8) with the literal 8 of type int, pSrc[0] is therefore int-promoted to a signed int (4 bytes).
(3) The result of pSrc[0] << 8 is that the interesting 8 bits of pSrc[0] are shifted over to the second byte of the 4 bytes of the signed int, leaving zeroes in the other bytes (1st, 3rd and 4th).
(4) And when it is ORed ( intermediate result from step (3) | pSrc[1]), pSrc[1] is then int-promoted to a signed int (4 bytes).
(5) The result of ( intermediate result from step (3) | pSrc[1]) leaves the first two least significant bytes the way we want with zeroes all in the two most significant bytes.
(6) return only the first two least significant bytes to get the 2 big-endian bytes by returning the result as a u2 type.
For arithmetic operations like this, the unsigned char is converted via a process called integral promotions.
C++11 - N3485 §5.8 [expr.shift]/1:
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand.
And §13.6 [over.built]/17:
For every pair of promoted integral types L and R, there exist candidate operator functions of the form
LR operator%(L , R );
LR operator&(L , R );
LR operator^(L , R );
LR operator|(L , R );
L operator<<(L , R );
L operator>>(L , R );
where LR is the result of the usual arithmetic conversions between types L and R.
When integral promotions are done (§4.5 [conv.prom]/1):
A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion
rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all
the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned
int.
By integral promotions, the unsigned char will be promoted to int. The other operand is already int, so no changes in type are made to it. The return type then becomes int as well.
Thus, what you have is the first unsigned char's bits shifted left, but still in the now-bigger int, and then the second unsigned char's bits at the end.
You'll notice that the return type of operator| is the result of usual arithmetic conversions between the two operands. At this point, those are the int from the shift and the second unsigned char.
This conversion is defined as follows (§5 [expr]/10):
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
…
Otherwise, the integral promotions (4.5) shall be performed on both operands. Then the following
rules shall be applied to the promoted operands:
…
If both operands have the same type, no further conversion is needed.
Since L and R, being promoted before this, are already int, the promotion leaves them the same and the overall return type of the expression is thus int, which is then converted to u2, whatever that happens to be.
There are no operations (other than type conversions) on unsigned char. Before any operation, integral promotion occurs, which converts the unsigned char to an int. So the operation is shifting an int left, not an unsigned char.
C11 6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type
of the result is that of the promoted left operand. If the value of
the right operand is negative or is greater than or equal to the
width of the promoted left operand, the behavior is undefined.
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
So pSrc[0] is integer promoted to an int. The literal 8 is already an int, so no integer promotion takes place. The usual arithmetic conversions do not apply to shift operators: they are a special case.
Since the original variable was an unsigned char which gets left-shifted 8 bits, we also encounter the issue where "E1" (our promoted variable) is signed and the result potentially cannot be represented in the result type, which leads to undefined behavior if this is a 16-bit system.
In plain English: if you shift something into the sign bits of a signed variable, anything can happen. In general: relying on implicit type promotions is bad programming and dangerous practice.
You should fix the code to this:
((unsigned int)pSrc[0] << 8) | (unsigned int)pSrc[1]
I know that the following
unsigned short b=-5u;
evaluates to b being 65531 due to an underflow, but I don't understand whether 5u is converted to a signed int before being turned into -5 and then converted back to unsigned to be stored in b, or whether -5u is equal to 0 - 5u (this should not be the case, since - here is a unary operator)
5u is a literal unsigned integer; -5u is its negation. Negation for unsigned integers is defined as subtraction from 2ⁿ, which gives the same result as wrapping the result of subtraction from zero.
5u is a single token, an rvalue expression which has type unsigned int.
The unary - operator is applied to it according to the rules of unsigned
arithmetic (arithmetic modulo 2^n, where n is the number of bits in the
unsigned type). The results are converted to unsigned short; if they don't
fit (and they won't if sizeof(int) > sizeof(short)), the conversion will
be done using modulo arithmetic as well (modulo 2^n, where n is the number of
bits in the target type).
It's probably worth noting that if the original argument has type unsigned
short, the actual steps are different (although the results will always be
the same). Thus, if you'd have written:
unsigned short k = 5;
unsigned short b = -k;
the first operation would depend on the size of short. If shorts are smaller
than ints (often, but not always the case), the first step would be to promote
k to an int. (If the size of short and int are identical, then the first
step would be to promote k to unsigned int; from then on, everything
happens as above.) The unary - would be applied to this int, according to
the rules of signed integer arithmetic (thus, resulting in a value of -5). The
resulting -5 will be implicitly converted to unsigned short, using modulo
arithmetic, as above.
In general, these distinctions don't make a difference, but in cases where you
might have an integral value of INT_MIN, they could: -i, where i has type int
and value INT_MIN, overflows the signed type, which is undefined behaviour,
and may result in strange values when later converted to unsigned.
ISO/IEC 14882-2003 section 4.7 says:
"If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [ Note: In a two’s complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). —end note ]
Technically this is not an underflow, just the representation of the signed value -5 when shown as an unsigned number. Note that signed and unsigned numbers have the same "bits" - they just display differently. If you were to print the value as a signed value [assuming it's sign-extended to fill the remaining bits], it would show -5. [This assumes a typical machine using 2's complement. The C standard doesn't require that signed and unsigned types have the same number of bits, nor that the computer uses 2's complement to represent signed numbers - obviously, if it's not using 2's complement, it won't match up to the value you've shown, so I assumed that yours IS a 2's complement machine - which covers all common processors, such as x86, 68K, 6502, Z80, PDP-11, VAX, 29K, 8051, ARM, MIPS. But technically, it is not necessary for C to function correctly.]
And when you use the unary operator -x, it has the same effect as 0-x [this applies for computers as well as math - it has the same result].