int multiplication overflow in LCG - c++

The following is copied from an open-source project's rand(), which uses an LCG:
rand_next = rand_next * 1103515245L + 12345L; //unsigned long rand_next
The classic LCG is:
Next = (Next * a + c) mod M
Obviously, here M is 2^32.
What confuses me is rand_next * 1103515245L, where I am pretty sure overflow will occur! I looked at several rand() implementations, and all of them do it this way, differing only in a and c.
Is that overflow harmful? If not, why not?
Thanks

This is fine. According to the C99 specification, the result of an unsigned long operation is the mathematical result reduced modulo 2^32, assuming unsigned long is 32 bits wide (§6.2.5/9):
A computation involving unsigned operands can never overflow, because a result
that cannot be represented by the resulting unsigned integer type is reduced
modulo the number that is one greater than the largest value that can be
represented by the resulting type.
So this behaviour isn't actually "overflow", but I'll call it that for simplicity in this answer. Since for modular arithmetic we have
a1 ≡ b1 (mod m)
a2 ≡ b2 (mod m)
implies
a1 + a2 ≡ b1 + b2 (mod m)
We have
Next * a ≡ k (mod 2^32)
where k is Next * a with "overflow". So since M = 2^32,
Next * a + c ≡ k + c (mod M)
The result with "overflow" is equivalent to the one without "overflow" under modular arithmetic, so the formula is fine. Once we reduce modulo M = 2^32, it will give the same result.

You multiply a signed long with an unsigned long. So, both operands of * have the same integer conversion rank. In this case, the rule below (C++11, §5/9, item 5, sub-item 3) applies:
[...] if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
So both operands are implicitly converted to unsigned long before the multiplication is evaluated. Hence you get unsigned arithmetic and an unsigned result, and the same rule applies again for the addition operation.
Integer overflow for unsigned is well-defined (see Zong Zhen Li's answer, which has just been expanded to cover this in detail), so there is no problem.
Regarding C (as opposed to C++), C11 has an identical rule in §6.3.1.8/1:
[...] if the operand that has unsigned integer type has rank greater or
equal to the rank of the type of the other operand, then the operand with
signed integer type is converted to the type of the operand with unsigned
integer type.

Related

Does compound assignment of two unsigned integers of the same type always operate as if using that type's modular arithmetic?

Here is a conjecture:
For expression a op b where a and b are of the same unsigned integral type U, and op is one of the compound assignment operators (+=,-=,*=,/=,%=,&=,|=,^=,<<=,>>=), the result is computed directly in the value domain of U using modular arithmetic, as if no integral promotions or usual arithmetic conversions, etc. are performed at all.
Is this true? What about signed integral types?
To clarify: By definition integral promotions and usual arithmetic conversions do apply here. I'm asking if the result is the same as not applying them.
I'm looking for an answer in C++, but if you can point out the difference with C, it would also be nice.
Counter example:
int has width 31 plus one bit for sign, unsigned short has width 16. With a and b of type unsigned short, after integral promotions, the operation is performed in int.
If a and b have value 2^16 - 1, then the mathematical exact result of a * b in the natural numbers would be 2^32 - 2^17 + 1. This is larger than 2^31 - 1 and therefore cannot be represented by int.
Arithmetic overflow in signed integral types results in undefined behavior. Therefore a *= b has undefined behavior. It would not have this undefined behavior if unsigned arithmetic modulo 2^width(unsigned short) was used.
(Applies to all C and C++ versions.)

returns true for ((unsigned int)0-1)>0

I came across some c++ code which was like
if(((unsigned int)0-1)>0)
{
//do something
}
and the program executed the statements in the if block. Being curious, I tried the same in c, which did the same. My understanding is that the statements in the if block get executed if the expression in the if condition returns a bool value true. This means that ((unsigned int)0-1)>0 must be returning true. Why is this happening?
For (unsigned int)0-1, the operands of operator- are the unsigned int 0 and the int 1. The type of the result (i.e. the common type) is therefore unsigned int:
Otherwise, if the unsigned operand's conversion rank is greater or equal to the conversion rank of the signed operand, the signed operand is converted to the unsigned operand's type.
An unsigned int cannot hold the mathematical value of 0-1; instead,
Unsigned integer arithmetic is always performed modulo 2^n
where n is the number of bits in that particular integer. E.g. for unsigned int, adding one to UINT_MAX gives 0, and subtracting one from 0 gives UINT_MAX.
That means if(((unsigned int)0-1)>0) is equivalent to if(UINT_MAX>0), which is true.
if(((unsigned int)0-1)>0)
With ordinary arithmetic, 0-1 is negative one, which is not greater than zero. We're not dealing with ordinary arithmetic here.
The C++ (and also C) precedence rules say that casting has precedence over subtraction, so (unsigned int)0-1 is equivalent to ((unsigned int) 0)-1. In other words, the 0 is treated as an unsigned int.
Next in line we have an unsigned int minus a signed int. The C++ (and C) rule for such operations is that the signed value is converted to unsigned. The C++ (and C) rule for unsigned operations is to perform the computation modulo 2^N, where N is the number of bits in the common type (typically 32 for an int, but there is no guarantee of that). 0-1 modulo 2^N is 2^N - 1, which is a (large) positive number.
Quoting from the standard, [basic.fundamental] paragraph 4,
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer. [46]
and the footnote:
46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.

Type promotion and conversion in c

double dVal;
int iVal = -7;
unsigned long ulVal = 1000;
dVal = iVal * ulVal;
printf("iVal * ulVal = %lf\n", dVal);
Can someone explain step by step how to get 4294960296.000000?
What comes first, changing the sign of iVal to unsigned or promotion
to ulVal type before multiplication with ulVal?
Also, if we multiply iVal and ulVal we are out of range for the long type
and we store that value of multiplication to double variable (so we have conversion
again). But how we can know to which value to round, when double type
is the most precise around 0 and as far we go from 0 the distance
between adjacent numbers is bigger?
It is really quite simple:
iVal is converted from int to unsigned long. So its value of -7 becomes the positive value 0xFFFFFFF9 (i.e. 4294967289), at least on your particular system, where unsigned long is 32 bits wide.
This, when multiplied by 1000, overflows the unsigned long, so instead of being 4,294,967,289,000 (0x3E7 FFFF E4A8) the result wraps to 0xFFFFE4A8 (4294960296).
This is then converted to a double, giving your final answer. 4294960296 is exactly representable in a double, so nothing is rounded; the trailing zeros appear only because %lf prints six digits after the decimal point by default.
Referring to http://en.cppreference.com/w/c/language/conversion as mentioned by Joachim,
First the multiplication of the two integers occurs. The result is then stored in the double.
So we look at iVal * ulVal. Here we refer to the section on Usual arithmetic conversions. Both operands are integers so case 4. applies.
First the integer Promotions occur. In this, both operands are ints or greater so they are unchanged.
If the types after promotion are the same, that type is the common type
This is not applicable, as the types are int and unsigned long respectively.
Otherwise, if both operands after promotion have the same signedness (both signed or both unsigned), the operand with the lesser conversion rank (see below) is implicitly converted to the type of the operand with the greater conversion rank
This too, does not apply, as one type is signed and the second is unsigned
Otherwise, the signedness is different: If the operand with the unsigned type has conversion rank greater or equal than the rank of the type of the signed operand, then the operand with the signed type is implicitly converted to the unsigned type
Here the unsigned operand is unsigned long and the signed one is int. The rank of long is greater than that of int, so this part applies. The signed int is converted to unsigned long.
So, we have the multiplication of two numbers 4294967289 (unsigned long) and 1000 (unsigned long). Doing the multiplication there is an overflow but if you calculate 4294967289000 % 2^32, you get 4294960296.
This is then converted to a double by the assignment and then printed.

Subtraction between signed and unsigned followed by division

The following results make me really confused:
int i1 = 20-80u; // -60
int i2 = 20-80; // -60
int i3 =(20-80u)/2; // 2147483618
int i4 =(20-80)/2; // -30
int i5 =i1/2; // -30
i3 seems to be computed as (20u-80u)/2, instead of (20-80u)/2
supposedly i3 is the same as i5.
IIRC, an arithmetic operation between signed and unsigned int will produce an unsigned result.
Thus, 20 - 80u produces the unsigned result equivalent to -60: if unsigned int is a 32-bit type, that result is 4294967236.
Incidentally, assigning that to i1 produces an implementation-defined result because the number is too large to fit. Getting -60 is typical, but not guaranteed.
int i1 = 20-80u; // -60
This has subtle demons! The operands have different types, so a conversion is necessary. Both operands are converted to a common type (an unsigned int, in this case). The result, which will be a large unsigned int value (60 less than UINT_MAX + 1 if my calculations are correct), will be converted to an int before it's stored in i1. Since that value is out of range of int, the result is implementation-defined (in C, an implementation-defined signal may be raised instead). In your case it converts to -60, which is what two's complement implementations typically produce.
int i3 =(20-80u)/2; // 2147483618
Continuing on from the first example, my guess was that the result of 20-80u would be 60 less than UINT_MAX + 1. If UINT_MAX is 4294967295 (a common value for UINT_MAX), that would mean 20-80u is 4294967236... and 4294967236 / 2 is 2147483618.
As for i2 and the others, there should be no surprises. They follow conventional mathematical calculations with no conversions, truncations, overflows or other implementation-defined behaviour what-so-ever.
The binary arithmetic operators will perform the usual arithmetic conversions on their operands to bring them to a common type.
In the case of i1 and i3 the common type will be unsigned int and so the result will also be unsigned int (i5 divides two ints, so it stays in int). Unsigned numbers wrap via modulo arithmetic, so subtracting a slightly larger unsigned value results in a number close to the unsigned int maximum, which cannot be represented by an int.
So in the case of i1 we end up with an implementation defined conversion since the value can not be represented. In the case of i3 dividing by 2 brings the unsigned value back into the range of int and so we end up with a large signed int value after conversion.
The relevant sections from the C++ draft standard are as follows. Section 5.7 [expr.add]:
The additive operators + and - group left-to-right. The usual arithmetic conversions are performed for
operands of arithmetic or enumeration type.
The usual arithmetic conversions are covered in section 5 and it says:
Many binary operators that expect operands of arithmetic or enumeration type cause conversions and yield
result types in a similar way. The purpose is to yield a common type, which is also the type of the result.
This pattern is called the usual arithmetic conversions, which are defined as follows:
[...]
Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the
rank of the type of the other operand, the operand with signed integer type shall be converted to
the type of the operand with unsigned integer type.
and for the conversion of a value that cannot be represented in a signed type, section 4.7 [conv.integral]:
If the destination type is signed, the value is unchanged if it can be represented in the destination type (and
bit-field width); otherwise, the value is implementation-defined.
and unsigned integers obey modular arithmetic, per section 3.9.1 [basic.fundamental]:
Unsigned integers shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value
representation of that particular size of integer.

unary minus for 0x80000000 (signed and unsigned)

The n3337.pdf draft, §5.3.1/8, states that:
The operand of the unary - operator shall have arithmetic or unscoped enumeration type and the result is the negation of its operand. Integral promotion is performed on integral or enumeration operands. The negative of an unsigned quantity is computed by subtracting its value from 2ⁿ, where n is the number of bits in the promoted operand. The type of the result is the type of the promoted operand.
For some cases it is enough. Suppose unsigned int is 32 bits wide, then (-(0x80000000u)) == 0x80000000u, isn't it?
Still, I cannot find anything about unary minus on signed 0x80000000. Also, the C99 standard draft n1336.pdf, 6.5.3.3, seems to say nothing about it:
The result of the unary - operator is the negative of its (promoted) operand. The integer promotions are performed on the operand, and the result has the promoted type.
UPDATE2: Let us suppose that unsigned int is 32 bits wide. So, the question is: what about unary minus in C (signed and unsigned), and unary minus in C++ (signed only)?
UPDATE1: both run-time behavior and compile-time behavior (i.e. constant-folding) are interesting.
(related: Why is abs(0x80000000) == 0x80000000?)
For your question, the important part of the quote you've included is this:
The negative of an unsigned quantity is computed by subtracting its
value from 2ⁿ, where n is the number of bits in the promoted operand.
So, to know what the value of -0x80000000u is, we need to know n, the number of bits in the type of 0x80000000u. This is at least 32, but this is all we know (without further information about the sizes of types in your implementation). Given some values of n, we can calculate what the result will be:
n | -0x80000000u
----+--------------
32 | 0x80000000
33 | 0x180000000
34 | 0x380000000
48 | 0xFFFF80000000
64 | 0xFFFFFFFF80000000
(For example, an implementation where unsigned int is 16 bits and unsigned long is 64 bits would have an n of 64).
C99 has equivalent wording hidden away in §6.2.5 Types p9:
A computation involving unsigned operands can never overflow, because
a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting type.
The result of the unary - operator on an unsigned operand other than zero will always be caught by this rule.
With a 32 bit int, the type of 0x80000000 will be unsigned int, regardless of the lack of a u suffix, so the result will still be the value 0x80000000 with type unsigned int.
If instead you use the decimal constant 2147483648, it will have type long (assuming long is wider than 32 bits; otherwise long long in C99 and C++11) and the calculation will be signed. The result will be the value -2147483648 with that signed type.
In n1336, 6.3.1.3 Signed and Unsigned Integers, paragraph 2 defines the conversion to an unsigned integer:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
So for 32-bit unsigned int, -0x80000000u == -0x80000000 + 0x100000000 == 0x80000000u.