Integer arithmetic when overflow exists - c++

Two 32-bit integer values A and B are processed to give the 32-bit integers C and D according to the following rules. Which of the rules are reversible,
i.e. is it possible to recover A and B from C and D under all conditions?
A. C = (int32)(A+B), D = (int32)(A-B)
B. C = (int32)(A+B), D = (int32)((A-B)>>1)
C. C = (int32)(A+B), D = B
D. C = (int32)(A+B), D = (int32)(A+2*B)
E. C = (int32)(A*B), D = (int32)(A/B)
A few questions about integer arithmetic. Modular addition forms a mathematical structure known as an abelian group. What about signed addition? It is also commutative (that’s where the “abelian” part comes in) and associative, so does it form an abelian group as well?
Given that integer addition is commutative and associative, C is apparently reversible, because we can retrieve A as (A+B)-B = A+(B-B). What about D? Can we assume that 2*B = B+B, so that B = (A+B+B)-(A+B)?
Multiplication is more complicated, but I know that A cannot be retrieved if there is an overflow.

This is a quote from 5 [expr] paragraph 4:
If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined.
What makes overflow for unsigned integers work is defined in 3.9.1 [basic.fundamental] paragraph 4:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
Basically this says that you shall not overflow when using signed integer arithmetic. If you do, all bets are off. This implies that signed integers do not form an Abelian group in C++.
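As a minimal sketch of what this means for the question's rules C and D: if the arithmetic is done in std::uint32_t, where wrap-around is defined, both rules can be inverted even when the sums wrap. The function names and the sample constants below are made up for illustration; signed inputs would have to be converted to unsigned first.

```cpp
#include <cstdint>

// Rule C: C = A + B, D = B   (all arithmetic modulo 2^32)
constexpr std::uint32_t recover_a_rule_c(std::uint32_t c, std::uint32_t d) {
    return c - d;                       // (A + B) - B == A  (mod 2^32)
}

// Rule D: C = A + B, D = A + 2*B
constexpr std::uint32_t recover_b_rule_d(std::uint32_t c, std::uint32_t d) {
    return d - c;                       // (A + 2B) - (A + B) == B  (mod 2^32)
}
constexpr std::uint32_t recover_a_rule_d(std::uint32_t c, std::uint32_t d) {
    return c - recover_b_rule_d(c, d);  // (A + B) - B == A  (mod 2^32)
}

// Sample operands chosen so that both sums wrap around.
constexpr std::uint32_t kA = 0xFFFFFFF0u;
constexpr std::uint32_t kB = 0x00000123u;
constexpr std::uint32_t kC = kA + kB;        // wraps to 0x113
constexpr std::uint32_t kD = kA + 2u * kB;   // wraps to 0x236

static_assert(recover_a_rule_c(kC, kB) == kA, "rule C round-trips");
static_assert(recover_b_rule_d(kC, kD) == kB, "rule D recovers B");
static_assert(recover_a_rule_d(kC, kD) == kA, "rule D recovers A");
```

By contrast, in the same modulo-2^32 arithmetic rule A cannot be inverted in general: A = B = 0x80000000 gives C = 0 and D = 0, exactly like A = B = 0.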

Related

Does compound assignment of two unsigned integers of the same type always operate as if using that type's modular arithmetic?

Here is a conjecture:
For expression a op b where a and b are of the same unsigned integral type U, and op is one of the compound assignment operators (+=,-=,*=,/=,%=,&=,|=,^=,<<=,>>=), the result is computed directly in the value domain of U using modular arithmetic, as if no integral promotions or usual arithmetic conversions, etc. are performed at all.
Is this true? What about signed integral types?
To clarify: By definition integral promotions and usual arithmetic conversions do apply here. I'm asking if the result is the same as not applying them.
I'm looking for an answer in C++, but if you can point out the difference with C, it would also be nice.
Counter example:
int has width 31 plus one bit for sign, unsigned short has width 16. With a and b of type unsigned short, after integral promotions, the operation is performed in int.
If a and b have value 2^16 - 1, then the mathematical exact result of a * b in the natural numbers would be 2^32 - 2^17 + 1. This is larger than 2^31 - 1 and therefore cannot be represented by int.
Arithmetic overflow in signed integral types results in undefined behavior. Therefore a *= b has undefined behavior. It would not have this undefined behavior if unsigned arithmetic modulo 2^width(unsigned short) was used.
(Applies to all C and C++ versions.)
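One common way to sidestep this, sketched here under the counter example's assumptions (16-bit unsigned short, 32-bit int), is to force the multiplication into unsigned int before it happens. The helper name mul_u16 is hypothetical.

```cpp
#include <cstdint>

// Multiplies two 16-bit unsigned values modulo 2^16 without any signed
// arithmetic: multiplying by 1u first converts the operands to unsigned int
// (or something wider), where wrap-around is well defined.
constexpr std::uint16_t mul_u16(std::uint16_t a, std::uint16_t b) {
    return static_cast<std::uint16_t>(1u * a * b);
}

// (2^16 - 1)^2 = 2^32 - 2^17 + 1, which is congruent to 1 modulo 2^16.
static_assert(mul_u16(0xFFFF, 0xFFFF) == 1, "(2^16 - 1)^2 == 1 (mod 2^16)");
```

Written as a plain a *= b on unsigned short operands, the same product would be evaluated in int and overflow.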

Why is the sign different after subtracting unsigned and signed?

unsigned int t = 10;
int d = 16;
float c = t - d;
int e = t - d;
Why is the value of c positive but e negative?
Let's start by analysing the result of t - d.
t is an unsigned int while d is an int, so to do arithmetic on them, the value of d is converted to an unsigned int (C++ rules say unsigned gets preference here). So we get 10u - 16u, which (assuming 32-bit int) wraps around to 4294967290u.
This value is then converted to float in the first declaration, and to int in the second one.
Assuming the typical implementation of float (32-bit single-precision IEEE), its highest representable value is roughly 1e38, so 4294967290u is well within that range. There will be rounding errors, but the conversion to float won't overflow.
For int, the situation's different. 4294967290u is too big to fit into an int, so wrap-around happens and we arrive back at the value -6. Note that such wrap-around is not guaranteed by the standard: the resulting value in this case is implementation-defined(1), which means it's up to the compiler what the result value is, but it must be documented.
(1) C++17 (N4659), [conv.integral] 7.8/3:
If the destination type is signed, the value is unchanged if it can be represented in the destination type;
otherwise, the value is implementation-defined.
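The first half of this explanation can be checked at compile time. The sketch below only restates the question's two variables as constants; the name difference is introduced for illustration.

```cpp
#include <limits>
#include <type_traits>

constexpr unsigned int t = 10;
constexpr int d = 16;

// The usual arithmetic conversions turn d into unsigned int, so the
// subtraction is an unsigned one, whatever the result is assigned to later.
static_assert(std::is_same<decltype(t - d), unsigned int>::value,
              "t - d has type unsigned int");

// 10u - 16u wraps to UINT_MAX - 5 (4294967290 for a 32-bit unsigned int).
constexpr unsigned int difference = t - d;
static_assert(difference == std::numeric_limits<unsigned int>::max() - 5u,
              "the difference wraps around");
```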
First, you have to understand "usual arithmetic conversions" (that link is for C, but the rules are the same in C++). In C++, if you do arithmetic with mixed types (you should avoid that when possible, by the way), there's a set of rules that decides which type the calculation is done in.
In your case, you are subtracting a signed int from an unsigned int. The promotion rules say that the actual calculation is done using unsigned int.
So your calculation is 10 - 16 in unsigned int arithmetic. Unsigned arithmetic is modulo arithmetic, meaning that it wraps around. So, assuming your typical 32-bit int, the result of this calculation is 2^32 - 6.
This is the same for both lines. Note that the subtraction is completely independent from the assignment; the type on the left side has absolutely no influence on how the calculation happens. It is a common beginner mistake to think that the type on the left side somehow influences the calculation; but float f = 5 / 6 is zero, because the division still uses integer arithmetic.
The difference, then, is what happens during the assignment. The result of the subtraction is implicitly converted to float in one case, and int in the other.
The conversion to float tries to find the closest value to the actual one that the type can represent. This will be some very large value; not quite the one the original subtraction yielded though.
The conversion to int says that if the value fits into the range of int, the value will be unchanged. But 2^32 - 6 is far larger than the 2^31 - 1 that a 32-bit int can hold, so you get the other part of the conversion rule, which says that the resulting value is implementation-defined. This is a term in the standard that means "different compilers can do different things, but they have to document what they do".
For all practical purposes, all compilers that you'll likely encounter say that the bit pattern stays the same and is just interpreted as signed. Because of the way 2's complement arithmetic works (the way that almost all computers represent negative numbers), the result is the -6 you would expect from the calculation.
But all this is a very long way of repeating the first point, which is "don't do mixed type arithmetic". Cast the types first, explicitly, to types that you know will do the right thing.
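A minimal sketch of that advice, assuming long long is wider than unsigned int (true on common platforms, but not guaranteed by the standard). The helper name signed_difference is hypothetical.

```cpp
// Converts both operands to a signed type that can hold every value of
// either one, then subtracts. No unsigned wrap-around takes place.
constexpr long long signed_difference(unsigned int t, int d) {
    return static_cast<long long>(t) - static_cast<long long>(d);
}

static_assert(signed_difference(10u, 16) == -6, "10 - 16 == -6");

// The same idea applies to the division mentioned above: the type on the
// left-hand side does not change how the right-hand side is computed.
constexpr float f = 5 / 6;      // integer division, then conversion: 0.0f
constexpr float g = 5.0f / 6;   // floating-point division: about 0.8333f

static_assert(f == 0.0f, "5 / 6 is evaluated in int");
static_assert(g > 0.8f && g < 0.9f, "5.0f / 6 is evaluated in float");
```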

Does the C++ standard require the maximum of unsigned integer numbers to be of the form 2^N-1?

For T so that std::is_integral<T>::value && std::is_unsigned<T>::value is true, does the C++ standard guarantee that :
std::numeric_limits<T>::max() == 2^(std::numeric_limits<T>::digits)-1
in the mathematical sense? I am looking for a proof of that based on quotes from the standard.
C++ specifies the ranges of the integral types by reference to the C standard. The C standard says:
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N − 1), so that objects of that type shall be capable of representing values from 0 to 2^N − 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.
Moreover, C++ requires:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
Putting all this together, we find that an unsigned integral type has n value bits, represents the values in the range [0, 2^n) and obeys the laws of arithmetic modulo 2^n.
I believe that this is implied by [basic.fundamental]/4 (N3337):
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number
of bits in the value representation of that particular size of integer.
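This is not a proof from the standard, but the property can at least be checked on a given implementation. In the sketch below, the names all_ones and max_is_two_to_digits_minus_one are made up; 2^digits − 1 is built bit by bit in unsigned long long so that no intermediate step overflows.

```cpp
#include <limits>
#include <type_traits>

// Returns 2^bits - 1 for 0 <= bits <= 64 (on platforms where
// unsigned long long has 64 value bits).
constexpr unsigned long long all_ones(int bits) {
    return bits == 0 ? 0ull : ((all_ones(bits - 1) << 1) | 1ull);
}

template <typename T>
constexpr bool max_is_two_to_digits_minus_one() {
    static_assert(std::is_integral<T>::value && std::is_unsigned<T>::value,
                  "T must be an unsigned integral type");
    return std::numeric_limits<T>::max() ==
           all_ones(std::numeric_limits<T>::digits);
}

static_assert(max_is_two_to_digits_minus_one<unsigned char>(), "unsigned char");
static_assert(max_is_two_to_digits_minus_one<unsigned short>(), "unsigned short");
static_assert(max_is_two_to_digits_minus_one<unsigned int>(), "unsigned int");
static_assert(max_is_two_to_digits_minus_one<unsigned long>(), "unsigned long");
static_assert(max_is_two_to_digits_minus_one<unsigned long long>(),
              "unsigned long long");
```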

C++ Arithmetic With Mixed Integral Types That Causes Overflow

I have done some tests in VC++2010 mixing operands of different sizes that cause overflow in add operation:
int _tmain(int argc, _TCHAR* argv[])
{
__int8 a=127;
__int8 b=1;
__int16 c=b+a;
__int8 d=b+a;
printf("c=%d,d=%d\n",c,d);
return 0;
}
//result is: c=128, d=-128
I don't understand why c==128! My understanding is that in both additions, b+a are still considered addition of 2 signed 8 bit variables. So the result is an overflow i.e. -128. After that, the result is then promoted to 16 bit signed int for the first assignment operation and c should still get a 16 bit -128 value. Is my understanding correct? The C++ standard is a bit difficult to read. Chapter 4 seems to talk about integral promotion but I can't find anything related to this specific example.
My understanding is that in both additions, b+a are still considered addition of 2 signed 8 bit variables. So the result is an overflow i.e. -128.
No, the promotion happens before the + is evaluated, not after it. The addition happens when both a and b are positive. Both numbers are promoted to ints for an addition, added as two positive numbers, and then converted to a 16-bit short. At no point in the process does the result become negative due to an overflow, hence the end result of 128.
Arguably, this makes sense: the behavior of a and b matches that of two numbers in mathematics, making it more intuitive to language practitioners.
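The same experiment can be written with the standard fixed-width types in place of the MSVC-specific __int8 and __int16 (an assumption of this sketch), which lets the compiler confirm each step:

```cpp
#include <cstdint>
#include <type_traits>

constexpr std::int8_t a = 127;
constexpr std::int8_t b = 1;

// Both operands are promoted to int before the addition, so the sum is an
// int with value 128; no 8-bit overflow ever takes place.
static_assert(std::is_same<decltype(b + a), int>::value, "b + a is an int");
static_assert(b + a == 128, "the promoted sum is 128");

// 128 fits into 16 bits, so c keeps the value.
constexpr std::int16_t c = b + a;
static_assert(c == 128, "c == 128");

// 128 does not fit into 8 signed bits. The narrowing conversion gives -128
// on two's complement implementations; before C++20 the result is
// implementation-defined.
constexpr std::int8_t d = static_cast<std::int8_t>(b + a);
```

The -128 therefore comes from the conversion on assignment to d, not from the addition.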
It's integral promotion.
1 A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank (4.13) is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int. [§ 4.5]
In this statement
__int16 c=b+a;
First, all char and short int values are automatically elevated to int. This process is called integral promotion. Next, all operands are converted up to the type of the largest operand, which is called type promotion. [Herbert Schildt]
The values of variables b and a will be promoted to int and then the operation applies on them.
In two's complement integer representation, a negative value is indicated by the highest bit being set. This allows the machine to add and subtract binary integers with the same instructions, regardless of whether the integer is signed.
a = 127 == 0x7F == 0b01111111
+ b = 1 == 0x01 == 0b00000001
-------------------------------
c = 128 == 0x80 == 0b10000000
d =-128 == 0x80 == 0b10000000
The variables c and d may have different types, but different types of integers are merely different interpretations of a single binary value. As you can see above, the binary value fits in 8 bits just fine. Since the standard requires operands narrower than int to be zero- or sign-extended (promoted) to int before any arithmetic is done, and neither operand is negative here, the sum is always 0b10000000 no matter what type the operands are.
In summary, the difference between the results is that, to a 16 bit integer the sign bit is 0b1000000000000000 (which a+b doesn't have), and to an 8 bit integer the sign bit is 0b10000000 (which a+b does have).

int multiplication overflow in LCG

The following is copied from an open-source project's rand(), which uses an LCG:
rand_next = rand_next * 1103515245L + 12345L; //unsigned long rand_next
The classic LCG is:
Next = (Next * a + c) mod M
Obviously, here M is 2^32.
What confuses me is rand_next * 1103515245L, where I am pretty sure overflow will occur! I have looked at several rand() implementations, and all of them do it this way, only with different a and c.
Is that overflow harmful? If not, why?
Thanks
This is fine. According to the C99 specification, for unsigned long operations, the result is the same but reduced modulo 2^32 (§6.2.5):
A computation involving unsigned operands can never overflow, because a result
that cannot be represented by the resulting unsigned integer type is reduced
modulo the number that is one greater than the largest value that can be
represented by the resulting type.
So this behaviour isn't actually "overflow", but I'll call it that for simplicity in this answer. Since for modular arithmetic we have
a1 ≡ b1 (mod m)
a2 ≡ b2 (mod m)
implies
a1 + a2 ≡ b1 + b2 (mod m)
We have
Next * a ≡ k (mod 2^32)
where k is Next * a with "overflow". So since M = 2^32,
Next * a + c ≡ k + c (mod M)
The result with "overflow" is equivalent to the one without "overflow" under modular arithmetic, so the formula is fine. Once we reduce modulo M = 2^32, it will give the same result.
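The argument can be checked with a short sketch. It uses std::uint32_t rather than unsigned long so that the modulus is 2^32 on every platform; that substitution and the function names are assumptions of this example, not part of the quoted rand().

```cpp
#include <cstdint>

// One LCG step with M = 2^32: the unsigned multiplication and addition
// simply wrap around.
constexpr std::uint32_t lcg_next(std::uint32_t state) {
    return state * 1103515245u + 12345u;
}

// Reference: the same step computed in 64 bits, where nothing wraps, and
// only then reduced modulo 2^32.
constexpr std::uint32_t lcg_next_wide(std::uint32_t state) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(state) * 1103515245u + 12345u)
        % 4294967296ull);
}

static_assert(lcg_next(1u) == 1103527590u, "no wrap for a small state");
static_assert(lcg_next(123456789u) == lcg_next_wide(123456789u),
              "wrapping and reducing afterwards agree");
static_assert(lcg_next(0xFFFFFFFFu) == lcg_next_wide(0xFFFFFFFFu),
              "also for the largest state");
```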
You multiply a signed long with an unsigned long. So, both operands of * have the same integer conversion rank. In this case, the rule below (C++11, §5/9, item 5, sub-item 3) applies:
[...] if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
So both operands are implicitly converted to unsigned long before the multiplication is evaluated. Hence you get unsigned arithmetic and an unsigned result, and the same rule applies again for the addition operation.
Integer overflow for unsigned is well-defined (see Zong Zhen Li's answer, which has just been expanded to cover this in detail), so there is no problem.
Regarding C (as opposed to C++), C11 has an identical rule in §6.3.1.8/1:
[...] if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.
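The conversion rule quoted above can be confirmed with decltype. In this sketch rand_next is redeclared as a constant only for the check, and step is a hypothetical wrapper around the question's expression.

```cpp
#include <type_traits>

constexpr unsigned long rand_next = 1;  // same type as in the question

// unsigned long * long: the long operand is converted to unsigned long.
static_assert(
    std::is_same<decltype(rand_next * 1103515245L), unsigned long>::value,
    "the product is an unsigned long");

// The same conversion applies again for the addition.
static_assert(
    std::is_same<decltype(rand_next * 1103515245L + 12345L),
                 unsigned long>::value,
    "the whole expression is an unsigned long");

// The question's update expression as a function.
constexpr unsigned long step(unsigned long s) {
    return s * 1103515245L + 12345L;
}
```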