Correctness of multiplication with overflow detection - c++

The following C++ template detects overflows from multiplying two unsigned integers.
#include <iostream>

template<typename UInt> UInt safe_multiply(UInt a, UInt b) {
    UInt x = a * b; // x := ab mod n, for n := 2^#bits > 0
    if (a != 0 && x / a != b)
        std::cerr << "Overflow for " << a << " * " << b << "." << std::endl;
    return x;
}
Can you give a proof that this algorithm detects every potential overflow, regardless of how many bits UInt uses?
The case a = 0 cannot result in overflows, so we can consider a > 0.
It seems that the correctness proof boils down to leading a*b >= n together with floor(x/a) = b to a contradiction, since x / a actually means floor(x/a).
When assuming
, this leads to the straightforward consequence
thus
which contradicts n > 0.
So it remains to show
or there must be another way.
If the last equation is true, WolframAlpha fails to confirm that (also with exponents).
However, it asserts that the original assumptions have no integer solutions, so the algorithm seems to be correct indeed.
But it doesn't provide an explanation. So why is it correct?
I am looking for the smallest possible explanation that is still mathematically profound, ideally one that fits in a single-line comment. Maybe I am missing something trivial, or the problem is not as easy as it looks.
On a side note, I used Codecogs Equation Editor for the LaTeX markup images, which apparently looks bad in dark mode, so consider switching to light mode or, if you know, please tell me how to use different images depending on the client settings. It is just \bg{white} vs. \bg{black} as part of the image URLs.

To be clear, I'll use the multiplication and division symbols (*, /) mathematically.
Also, for convenience let's name the set N = {0, 1, ..., n - 1}.
Let's clear up what unsigned multiplication is:
Unsigned multiplication for some magnitude, n, is a modular n operation on unsigned-n inputs (inputs that are in N) that results in an unsigned-n output (ie. also in N).
In other words, the result of unsigned multiplication, x, is x = a*b (mod n), and, additionally, we know that x,a,b are in N.
It's important to be able to expand many modular formulas like so: x = a*b - k*n, where k is an integer - but in our case x,a,b are in N so this implies that k is in N.
Now, let's mathematically say what we're trying to prove:
Given positive integers, a,n, and non-negative integers x,b, where x,a,b are in N, and x = a*b (mod n), then a*b >= n (overflow) implies floor(x/a) != b.
Proof:
If overflow (a*b >= n) then x >= n - k*n = (1 - k)*n (for k in N),
As x < n then (1 - k)*n < n, so k > 0.
This means x <= a*b - n.
So, floor(x/a) <= floor([a*b - n]/a) = floor(b - n/a) = b - ceil(n/a) <= b - 1 (because b is an integer and n/a > 0, so ceil(n/a) >= 1),
Abbreviated: floor(x/a) <= b - 1
Therefore floor(x/a) != b
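The claim is easy to check by brute force for a small word size. The sketch below specialises the question's test to 8 bits (n = 256) and compares it with exact arithmetic for every pair (a, b); the function names are made up for this illustration. It also checks the converse, that the test never fires when there is no overflow. The explicit cast is there because a * b on uint8_t operands is computed in int after integer promotion.

```cpp
#include <cassert>
#include <cstdint>

// The question's test, specialised to 8 bits: true if a * b wraps around.
bool check_flags_overflow(std::uint8_t a, std::uint8_t b) {
    std::uint8_t x = static_cast<std::uint8_t>(a * b); // a * b mod 256
    return a != 0 && x / a != b;
}

// Compares the test with exact arithmetic for all 65536 input pairs.
bool check_matches_exact_arithmetic() {
    for (unsigned a = 0; a < 256; ++a) {
        for (unsigned b = 0; b < 256; ++b) {
            bool truly_overflows = a * b > 255u; // exact, no wraparound here
            bool flagged = check_flags_overflow(static_cast<std::uint8_t>(a),
                                                static_cast<std::uint8_t>(b));
            if (flagged != truly_overflows) return false;
        }
    }
    return true;
}
```

A caller could simply run assert(check_matches_exact_arithmetic()); this is a sanity check for one width, not a replacement for the proof.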

The multiplication gives either the mathematically correct result, or it is off by some multiple of 2^64. Since you check for a=0, the division always gives the correct result for its input. But in the case of overflow, the input is off by 2^64 or more, so the test will fail as you hoped.
The last bit is that unsigned operations don’t have undefined behaviour except for division by zero, so your code is fine.


Modulo Multiplication Function: Multiplying two integers under a modulus

I came across this modulo multiplication function in code for the Miller-Rabin primality test. It is supposed to eliminate the integer overflow that occurs when calculating ( a * b ) % m.
I need some help in understanding what is going on here. Why does this work, and what is the significance of the number literal 0x8000000000000000ULL?
unsigned long long mul_mod(unsigned long long a, unsigned long long b, unsigned long long m) {
    unsigned long long d = 0, mp2 = m >> 1;
    if (a >= m) a %= m;
    if (b >= m) b %= m;
    for (int i = 0; i < 64; i++)
    {
        d = (d > mp2) ? (d << 1) - m : d << 1;
        if (a & 0x8000000000000000ULL)
            d += b;
        if (d >= m) d -= m;
        a <<= 1;
    }
    return d;
}
This code, which currently appears on the modular arithmetic Wikipedia page, only works for arguments of up to 63 bits -- see bottom.
Overview
One way to compute an ordinary multiplication a * b is to add left-shifted copies of b -- one for each 1-bit in a. This is similar to how most of us did long multiplication in school, but simplified: Since we only ever need to "multiply" each copy of b by 1 or 0, all we need to do is either add the shifted copy of b (when the corresponding bit of a is 1) or do nothing (when it's 0).
This code does something similar. However, to avoid overflow (mostly; see below), instead of shifting each copy of b and then adding it to the total, it adds an unshifted copy of b to the total, and relies on later left-shifts performed on the total to shift it into the correct place. You can think of these shifts "acting on" all the summands added to the total so far. For example, the first loop iteration checks whether the highest bit of a, namely bit 63, is 1 (that's what a & 0x8000000000000000ULL does), and if so adds an unshifted copy of b to the total; by the time the loop completes, the previous line of code will have shifted the total d left 1 bit 63 more times.
The main advantage of doing it this way is that we are always adding two numbers (namely b and d) that we already know are less than m, so handling the modulo wraparound is cheap: We know that b + d < 2 * m, so to ensure that our total so far remains less than m, it suffices to check whether b + d < m, and if not, subtract m. If we were to use the shift-then-add approach instead, we would need a % modulo operation per bit, which is as expensive as division -- and usually much more expensive than subtraction.
One of the properties of modulo arithmetic is that, whenever we want to perform a sequence of arithmetic operations modulo some number m, performing them all in usual arithmetic and taking the remainder modulo m at the end always yields the same result as taking remainders modulo m for each intermediate result (provided no overflows occur).
Code
Before the first line of the loop body, we have the invariants d < m and b < m.
The line
d = (d > mp2) ? (d << 1) - m : d << 1;
is a careful way of shifting the total d left by 1 bit, while keeping it in the range 0 .. m and avoiding overflow. Instead of first shifting it and then testing whether the result is m or greater, we test whether it is currently strictly above RoundDown(m/2) -- because if so, after doubling, it will surely be strictly above 2 * RoundDown(m/2) >= m - 1, and so require a subtraction of m to get back in range. Note that even though the (d << 1) in (d << 1) - m may overflow and lose the top bit of d, this does no harm as it does not affect the lowest 64 bits of the subtraction result, which are the only ones we are interested in. (Also note that if d == m/2 exactly, we wind up with d == m afterward, which is slightly out of range -- but changing the test from d > mp2 to d >= mp2 to fix this would break the case where m is odd and d == RoundDown(m/2), so we have to live with this. It doesn't matter, because it will be fixed up below.)
Why not simply write d <<= 1; if (d >= m) d -= m; instead? Suppose that, in infinite-precision arithmetic, d << 1 >= m, so we should perform the subtraction -- but the high bit of d is on and the rest of d << 1 is less than m: In this case, the initial shift will lose the high bit and the if will fail to execute.
Restriction to inputs of 63 bits or fewer
The above edge case can only occur when d's high bit is on, which can only occur when m's high bit is also on (since we maintain the invariant d < m). So it looks like the code is taking pains to work correctly even with very high values of m. Unfortunately, it turns out that it can still overflow elsewhere, resulting in incorrect answers for some inputs that set the top bit. For example, when a = 3, b = 0x7FFFFFFFFFFFFFFFULL and m = 0xFFFFFFFFFFFFFFFFULL, the correct answer should be 0x7FFFFFFFFFFFFFFEULL, but the code will return 0x7FFFFFFFFFFFFFFDULL (an easy way to see the correct answer is to rerun with the values of a and b swapped). Specifically, this behaviour occurs whenever the line d += b overflows and leaves the truncated d less than m, causing a subtraction to be erroneously skipped.
Provided this behaviour is documented (as it is on the Wikipedia page), this is just a limitation, not a bug.
Removing the restriction
If we replace the lines
if (a & 0x8000000000000000ULL)
d += b;
if (d >= m) d -= m;
with
unsigned long long x = -(a >> 63) & b;
if (d >= m - x) d -= m;
d += x;
the code will work for all inputs, including those with top bits set. The cryptic first line is just a conditional-free (and thus usually faster) way of writing
unsigned long long x = (a & 0x8000000000000000ULL) ? b : 0;
The test d >= m - x operates on d before it has been modified -- it's like the old d >= m test, but b (when the top bit of a is on) or 0 (otherwise) has been subtracted from both sides. This tests whether d would be m or larger once x is added to it. We know that the RHS m - x never underflows, because the largest x can be is b and we have established that b < m at the top of the function.
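Putting the replacement together, the whole function might look like the sketch below. It is just the question's code with the three lines swapped as described; the name mul_mod_full and the use of std::uint64_t in place of unsigned long long are my own choices, and m is assumed to be nonzero.

```cpp
#include <cassert>
#include <cstdint>

// Computes (a * b) % m for any 64-bit inputs. Assumes m != 0.
std::uint64_t mul_mod_full(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    assert(m != 0);
    std::uint64_t d = 0, mp2 = m >> 1;
    if (a >= m) a %= m;
    if (b >= m) b %= m;
    for (int i = 0; i < 64; ++i) {
        // Double d modulo m; afterwards d is in 0 .. m (m itself is possible).
        d = (d > mp2) ? (d << 1) - m : d << 1;
        // x is b if the top bit of a is set, otherwise 0.
        std::uint64_t x = -(a >> 63) & b;
        // Decide on the subtraction before adding, so d + x cannot be lost
        // to truncation; a temporary wraparound here cancels out.
        if (d >= m - x) d -= m;
        d += x;
        a <<= 1;
    }
    return d;
}
```

With the inputs from the failing example, mul_mod_full(3, 0x7FFFFFFFFFFFFFFFULL, 0xFFFFFFFFFFFFFFFFULL) gives 0x7FFFFFFFFFFFFFFEULL, and swapping a and b gives the same value.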

Why is my program for calculating the woodall numbers producing wrong results after n >47?

I have this function that calculates the Woodall numbers up to n = 64.
The formula for a Woodall number is W_n = n ⋅ 2^n - 1:
for (int n = 1; n <= 64; ++n)
{
a[n - 1] = (n * (exp2(n))) - 1;
}
But after n is greater than 47 the results are wrong in that it seems like it is forgetting to - 1 the result of n * (exp2(n)).
Here is what the output is if i cout the values via
std::cout << i << ":\t" << std::setprecision(32) << a[i - 1] << std::endl;
... before is correct
n
45: 1583296743997439
46: 3236962232172543
47: 6614661952700415
48: 13510798882111488
49: 27584547717644288
50: 56294995342131200
... after is incorrect
Here a[] is an array of unsigned long int.
The function produces correct results if I separate the - 1 operation out to its own for loop though:
for (int n = 1; n <= 64; ++n)
{
a[n - 1] = (n * (exp2(n)));
}
for (int n = 1; n <= 64; ++n)
{
a[n - 1] = a[n - 1] - 1;
}
exp2(n) returns a double.
In IEEE754 (a very common specification for floating point types), that only gives you exact integers up to the 53rd power of 2. Thereafter you get approximations.
You observe issues from n = 48 onwards because 48 * 2^48 is already larger than 2^53, and the entire expression n * exp2(n) - 1 is a double due to implicit type conversion. By a computational quirk, it's the -1 that causes the problem. It just happens that the other term is a small multiple of a power of 2, which allows it to be represented as a double without precision loss! This is the reason behind your second snippet working but your first snippet not.
On a system with a 64 bit int, you'll hit integer limits (and undefined behaviour) on the 63rd power of 2.
Your best bet is to generate the Woodall numbers purely in unsigned arithmetic (note the relationship between << and a power of 2), perhaps even using a recurrence relation for successive Woodall numbers.
double has precision limitations. It does use a binary base to work, though, meaning most numbers finishing with a series of zero bits in binary can be represented exactly, which is the case for multiples of exp2(int).
50 * exp2(50) which is 56294995342131200 for example, is C8000000000000 in hexadecimal. Even though the number of digits exceeds the precision limitations of double, it can be represented exactly. However, if I try to sum or subtract 1 from this number, that is no longer the case.
double can't represent 56294995342131199 nor 56294995342131201, so when you try to do it, it simply gets rounded back to 56294995342131200.
This is why your - 1 is failing: the subtraction is still carried out in double. You'd have to cast the rest of the expression to int64_t before performing this subtraction.
But another solution is to not use exp2() at all. Since we are working with integers, you can simply use bitwise operations to perform the same task. (1ULL << n) will yield the same results as exp2() except it is now in integer format, and because you are just multiplying this by n, you can actually just do (n << n), provided n itself has a 64-bit type.
Of course, this will still break down the line. int64_t can only hold a number as big as 2^63 - 1 and uint64_t 2^64 - 1, so n * 2^n stops fitting at n = 58 and n = 59 respectively.
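As a rough sketch of the integer-only approach (the function name woodall is my own, and the range check reflects that n * 2^n only fits in an unsigned 64-bit integer for n up to 58):

```cpp
#include <cassert>
#include <cstdint>

// W_n = n * 2^n - 1, computed purely in 64-bit unsigned arithmetic.
// Assumption: 1 <= n <= 58, the largest n for which n * 2^n < 2^64.
std::uint64_t woodall(unsigned n) {
    assert(n >= 1 && n <= 58);
    return (static_cast<std::uint64_t>(n) << n) - 1; // n << n == n * 2^n
}
```

For n = 48 this gives 13510798882111487, one less than the value the double version printed.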

Multiplication overflow test [duplicate]

How to correctly check if overflow occurs in integer multiplication?
int i = X(), j = Y();
i *= j;
How to check for overflow, given values of i, j and their type? Note that the check must work correctly for both signed and unsigned types. Can assume that both i and j are of the same type. Can also assume that the type is known while writing the code, so different solutions can be provided for signed / unsigned cases (no need for template juggling, if it works in "C", it is a bonus).
EDIT:
The answer by @pmg is the correct one. I just couldn't wrap my head around its simplicity for a while, so I will share my reasoning here. Suppose we want to check:
i * j > MAX
But we can't really check because i * j would cause overflow and the result would be incorrect (and always less or equal to MAX). So we modify it like this:
i > MAX / j
But this is not quite correct, as in the division, there is some rounding involved. Rather, we want to know the result of this:
i > floor(MAX / j) + float(MAX % j) / j
So we have the division itself, which is implicitly rounded down by integer arithmetic (the floor is a no-op there, shown merely as an illustration), and we have the remainder of the division which was missing in the previous inequality (and which evaluates to less than 1).
Assume that i and j are two numbers at the limit and if any of them increases by 1, an overflow will occur. Assuming none of them is zero (in which case no overflow would occur anyway), (i + 1) * j and i * (j + 1) are both at least 1 + (i * j). We can therefore safely ignore the roundoff error of the division, which is less than 1.
Alternately, we can reorganize as such:
i - floor(MAX / j) > float(MAX % j) / j
Basically, this tells us that i - floor(MAX / j) must be greater than a number in a [0, 1) interval. That can be written exactly, as:
i - floor(MAX / j) >= 1
Because 1 is just after the interval. We can rewrite as:
i - floor(MAX / j) > 0
Or as:
i > floor(MAX / j)
So we have shown equivalence of the simple test and the floating-point version. It is because the division does not cause significant roundoff error. We can now use the simple test and live happily ever after.
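For unsigned types the equivalence can also be confirmed exhaustively on a small type. A sketch with MAX = 255 (8 bits), where both function names are mine:

```cpp
#include <cassert>
#include <cstdint>

// Pre-check with MAX = 255: true if i * j would not fit in 8 bits.
bool would_overflow_u8(std::uint8_t i, std::uint8_t j) {
    return j != 0 && i > UINT8_MAX / j; // integer division rounds down
}

// Compares the pre-check with the exact product for all 65536 pairs.
bool precheck_is_exact() {
    for (unsigned i = 0; i < 256; ++i) {
        for (unsigned j = 0; j < 256; ++j) {
            bool expected = i * j > 255u; // exact product in a wider type
            bool got = would_overflow_u8(static_cast<std::uint8_t>(i),
                                         static_cast<std::uint8_t>(j));
            if (got != expected) return false;
        }
    }
    return true;
}
```

Running assert(precheck_is_exact()) exercises every pair, including the j = 0 case that the division has to skip.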
You cannot test afterwards. If the multiplication overflows, it triggers Undefined Behaviour which can render tests inconclusive.
You need to test before doing the multiplication
if (x > 0 && y > INT_MAX / x) /* multiplication of positive x and y will overflow */;
If your compiler has a type that is at least twice as big as int then you can do this:
long long r = 1LL * x * y;
if ( r > INT_MAX || r < INT_MIN )
// overflowed...
else
x = r;
For portability you should STATIC_ASSERT( sizeof(long long) >= 2 * sizeof(int) ); or something similar but more extreme if you're worried about padding bits!
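If no wider type is available, the signed case can still be tested before multiplying, but the signs have to be looked at first. The sketch below follows the usual case split on the signs of the operands; the name mul_would_overflow is made up here, and no division in it can itself overflow (INT_MIN is never divided by -1).

```cpp
#include <cassert>
#include <climits>

// True if x * y does not fit in an int. Performs no multiplication.
bool mul_would_overflow(int x, int y) {
    if (x == 0 || y == 0) return false;
    if (x > 0) {
        if (y > 0) return x > INT_MAX / y; // product too large
        return y < INT_MIN / x;            // x > 0, y < 0: product too small
    }
    if (y > 0) return x < INT_MIN / y;     // x < 0, y > 0: product too small
    return y < INT_MAX / x;                // both negative: product too large
}
```

Typical use would be if (!mul_would_overflow(x, y)) x *= y; with the error handling left to the caller.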
Try this
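bool willoverflow(uint32_t a, uint32_t b) {
    // highestOneBitPosition(v) is assumed to return the 1-based index of the
    // highest set bit of v (0 for v == 0); the helper is not shown here.
    size_t a_bits=highestOneBitPosition(a),
           b_bits=highestOneBitPosition(b);
    // Conservative: may report an overflow for some products that still fit.
    return (a_bits+b_bits>32);
}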
It is possible to see if overflow occurred post facto by using a division. In the case of unsigned values, the multiplication z=x*y has overflowed if y!=0 and:
bool overflow_occured = (y!=0)? z/y!=x : false;
(if y did equal zero, no overflow occurred). For the case of signed values, it is a little trickier.
bool overflow_occured = false;
if(y!=0){
    overflow_occured = (y==-1 && x==INT32_MIN) || (z/y != x);
}
We need the first part of the expression because the division test will fail if x=-2^31 and y=-1. In this case the multiplication overflows, but the machine may give a result of -2^31. Therefore we test for it separately. (In standard C and C++ signed overflow is undefined behaviour, so this only applies where the multiplication is known to wrap.)
This is true for 32 bit values. Extending the code to the 64 bit case is left as an exercise for the reader.

Optimising code for modular arithmetic

I am trying to calculate the expression N!/((N/2)!^2) for large numbers.
Since the value of this expression will be very large, I just need the value of this expression modulus some prime number. Suppose the value of this expression is x and I choose the prime number 1000000007; I'm looking for x % 1000000007.
Here is my code.
#include<iostream>
#define MOD 1000000007
using namespace std;
int main()
{
    unsigned long long A[1001];
    A[2]=2;
    for(int i=4;i<=1000;i+=2)
    {
        A[i]=((4*A[i-2])/i)%MOD;
        A[i]=(A[i]*(i-1))%MOD;
    }
    while(1)
    {
        int N;
        cin>>N;
        cout<<A[N];
    }
}
But even this much optimisation is failing for large values of N. For example if N is 50, the correct output is 605552882, but this gives me 132924730. How can I optimise it further to get the correct output?
Note : I am only considering N as even.
When you do modular arithmetic, there is no such operation as division. Instead, you take the modular inverse of the denominator and multiply. The modular inverse is computed using the extended Euclidean algorithm, discovered by Etienne Bezout in 1779:
# return y such that x * y == 1 (mod m)
function inverse(x, m)
    a, b, u := 0, m, 1
    while x > 0
        q, r := divide(b, x)
        x, a, b, u := b % x, u, x, a - q * u
    if b == 1 return a % m
    error "must be coprime"
The divide function returns both quotient and remainder. All of the assignment operators given above are simultaneous assignment, where all of the right hand sides are computed first, then all of the left hand sides are assigned simultaneously. You can see more about modular arithmetic at my blog.
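In C++ the pseudocode could be translated roughly as follows, and plugged into the recurrence from the question in place of the division by i. This is a sketch under a few assumptions of mine: the names mod_inverse and central_binomial_mod are invented, the simultaneous assignment is spelled out with temporaries, and the final result is normalised into 0 .. m - 1 because % can return a negative value in C++.

```cpp
#include <cassert>
#include <cstdint>

const std::int64_t MOD = 1000000007;

// Returns y in [0, m) with x * y == 1 (mod m). Requires gcd(x, m) == 1.
std::int64_t mod_inverse(std::int64_t x, std::int64_t m) {
    std::int64_t a = 0, b = m, u = 1;
    x %= m;
    while (x > 0) {
        std::int64_t q = b / x, r = b % x;
        std::int64_t next_u = a - q * u; // uses the old values of a and u
        b = x; x = r; a = u; u = next_u;
    }
    assert(b == 1); // otherwise x and m are not coprime
    return ((a % m) + m) % m;
}

// N! / ((N/2)!)^2 mod MOD for even n >= 2, using
// A[i] = A[i-2] * 4 * (i-1) / i with the division done via the inverse.
std::int64_t central_binomial_mod(int n) {
    assert(n >= 2 && n % 2 == 0);
    std::int64_t result = 2; // the value for n = 2
    for (int i = 4; i <= n; i += 2) {
        result = result * 4 % MOD * (i - 1) % MOD;
        result = result * mod_inverse(i, MOD) % MOD;
    }
    return result;
}
```

For N = 50 this returns 605552882, the value the question expects.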
For starters, no modulo division is needed at all; your formula can be rewritten as follows:
N!/((N/2)!^2)
=(1.2.3...N)/((1.2.3...N/2)*(1.2.3...N/2))
=((N/2+1)...N)/(1.2.3...N/2)
OK, now you are dividing a bigger number by a smaller one,
so you can build up the result by multiplying the divisor and the dividend step by step
so that both sub-results have similar magnitude.
Any time both numbers are divisible by 2, shift them right;
this will ensure that they do not overflow.
If you reach the end of (N/2)!, then continue the multiplication only for the rest.
Any time both sub-results are divisible by anything, divide them,
until you are left with division by 1.
After this you can multiply with modulo arithmetic till the end normally.
For a more advanced approach see this.
N! and (N/2)! are decomposable much further than it seems at first look.
I solved that some time ago;
here is what I found: Fast exact bigint factorial
In short, your terms N! and ((N/2)!)^2 will disappear completely.
Only a simple prime decomposition + 4N <-> 1N correction will remain.
solution:
I. (4N!)=((2N!)^2) . mul(i=all primes<=4N) of [i^sum(j=1,2,3,4,5,...4N>=i^j) of [(4N/(i^j))%2]]
II. (4N)!/((4N/2)!^2) = (4N)!/((2N)!^2)
----------------------------------------
I.=II. (4N)!/((2N)!^2)=mul(i=all primes<=4N) of [i^sum(j=1,2,3,4,5,...4N>=i^j) of [(4N/(i^j))%2]]
The only thing is that N must be divisible by 4 ... therefore 4N in all terms.
If you have N%4!=0 then solve for N-N%4 and correct the result by the missing 1-3 numbers.
Hope it helps.

Need an efficient subtraction algorithm modulo a number

For given numbers x,y and n, I would like to calculate x-y mod n in C. Look at this example:
int substract_modulu(int x, int y, int n)
{
return (x-y) % n;
}
As long as x>y, we are fine. In the other case, however, the modulo operation is undefined.
You can think of x,y,n>0. I would like the result to be positive, so if (x-y)<0, then ((x-y)-substract_modulu(x,y,n))/ n shall be an integer.
What is the fastest algorithm you know for that? Is there one which avoids any use of if and the ?: operator?
As many have pointed out, in current C and C++ standards, x % n is no longer implementation-defined for any values of x and n. It is undefined behaviour in the cases where x / n is undefined [1]. Also, x - y is undefined behaviour in the case of integer overflow, which is possible if the signs of x and y might differ.
So the main problem for a general solution is avoiding integer overflow, either in the division or the subtraction. If we know that x and y are non-negative and n is positive, then overflow and division by zero are not possible, and we can confidently say that (x - y) % n is defined. Unfortunately, x - y might be negative, in which case so will be the result of the % operator.
It's easy to correct for the result being negative if we know that n is positive; all we have to do is unconditionally add n and do another modulo operation. That's unlikely to be the best solution, unless you have a computer where division is faster than branching.
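If a conditional load instruction is available (pretty common these days), then the compiler will probably do well with the following code, which is portable and well-defined, subject to the constraints that x,y ≥ 0 ∧ n > 0 (one caveat: when x < y and x - y is an exact multiple of n, it yields n rather than 0, so reduce x and y modulo n first if that matters):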
((x - y) % n) + ((x >= y) ? 0 : n)
For example, gcc produces this code for my core I5 (although it's generic enough to work on any non-Paleozoic intel chip):
idivq %rcx
cmpq %rsi, %rdi
movl $0, %eax
cmovge %rax, %rcx
leaq (%rdx,%rcx), %rax
which is cheerfully branch-free. (Conditional move is usually a lot faster than branching.)
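If x and y are not known to be below n, one variation is to test the sign of the remainder instead of comparing x with y, so that exact multiples of n come out as 0. A minimal sketch under the same constraints (x, y ≥ 0 and n > 0); the name sub_mod is hypothetical, and whether the compiler emits a conditional move for it is something to check on your own target:

```cpp
#include <cassert>

// (x - y) mod n with a result in [0, n). Assumes x >= 0, y >= 0, n > 0,
// so neither the subtraction nor the % can overflow.
long sub_mod(long x, long y, long n) {
    assert(x >= 0 && y >= 0 && n > 0);
    long r = (x - y) % n;       // r is in (-n, n), with the sign of x - y
    return r + (r < 0 ? n : 0); // shift negative remainders into [0, n)
}
```

For example, sub_mod(3, 5, 7) is 5 and sub_mod(3, 10, 7) is 0.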
Another way of doing this would be (except that the function sign needs to be written):
((x - y) % n) + (sign(x - y) & (unsigned long)n)
where sign is all 1s if its argument is negative, and otherwise 0. One possible implementation of sign (adapted from bithacks) is
unsigned long sign(unsigned long x) {
    /* the top bit is 1 for a negative argument; negating it gives all 1s */
    return -(x >> (sizeof(long) * CHAR_BIT - 1));
}
This is portable (casting negative integer values to unsigned is defined), but it may be slow on architectures which lack high-speed shift. It's unlikely to be faster than the previous solution, but YMMV. TIAS.
Neither of these produce correct results for the general case where integer overflow is possible. It's very difficult to deal with integer overflow. (One particularly annoying case is n == -1, although you can test for that and return 0 without any use of %.) Also, you need to decide your preference for the result of modulo of negative n. I personally prefer the definition where x%n is either 0 or has the same sign as n -- otherwise why would you bother with a negative divisor -- but applications differ.
The three-modulo solution proposed by Tom Tanner will work if n is not -1 and n + n does not overflow. n == -1 will fail if either x or y is INT_MIN, and the simple fix of using abs(n) instead of n will fail if n is INT_MIN. The cases where n has a large absolute value could be replaced with comparisons, but there are a lot of corner cases, and made more complicated by the fact that the standard does not require 2's complement arithmetic, so it's not easily predictable what the corner cases are [2].
As a final note, some tempting solutions do not work. You cannot just take the absolute value of (x - y):
(-z) % n == -(z % n) == n - (z % n) ≠ z % n (unless z % n happens to be n / 2)
And, for the same reason, you cannot just take the absolute value of the result of modulo.
Also, you cannot just cast (x - y) to unsigned:
(unsigned)z == z + 2^k (for some k) if z < 0
(z + 2^k) % n == (z % n) + (2^k % n) ≠ z % n unless (2^k % n) == 0
[1] x/n and x%n are both undefined if n==0. But x%n is also undefined if x/n is "not representable" (i.e. there was integer overflow), which will happen on twos-complement machines (that is, all the ones you care about) if x is the most negative representable number and n == -1. It's clear why x/n should be undefined in this case, but slightly less so in the case of x%n, since that value is (mathematically) 0.
[2] Most people who complain about the difficulty of predicting the results of floating-point arithmetic haven't spent much time trying to write truly portable integer arithmetic code :)
If you want to avoid undefined behaviour, without an if, the following would work
return (x % n - y % n + n) % n;
The efficiency depends on the implementation of the modulo operation, but I'd suspect algorithms involving if would be rather faster.
Alternatively you could treat x and y as unsigned. In which case there are no negative numbers involved and no undefined behaviour.
With C++11 the undefined behavior was removed. Depending on the exact behavior you want, you can then just stick with
return (x-y) % n;
For a full explanation read this answer:
https://stackoverflow.com/a/13100805/1149664
You still get undefined behavior for n==0 or if x-y can not be stored in the type you are using.
Whether branching is going to matter will depend on the CPU to some degree. According to the documentation abs (on MSDN) has intrinsic behavior and it might not be a bottleneck at all. This you'll have to test.
If you want to compute things unconditionally, there are several nice methods that can be adapted from the Bit Twiddling Hacks site.
int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
r = (v + mask) ^ mask;
However, I don't know if this will be helpful to your situation without more information about hardware targets and testing.
Just out of curiosity I had to test this myself and when you look at the assembly generated by the compiler we can see there's no real overhead in the use of abs.
unsigned r = abs(i);
====
00381006 cdq
00381007 xor eax,edx
00381009 sub eax,edx
The following is just an alternate form of the above example which according to the Bit Twiddling Site is not patented (while the version used by the Visual C++ 2008 compiler is).
Throughout my answer I have been using MSDN and Visual C++ but I would assume that any sane compiler has similar behavior.
Assuming 0 <= x < n and 0 <= y < n, how about (x + n - y) % n? Then x + n will certainly be larger than y, subtracting y will always result in a positive integer, and the final mod n reduces the result if necessary.
I'm going to guess that it's not really the case here, but I'd like to mention that if the value you are taking modulo with is a power of two, then using the "AND" method is a lot quicker (I'm going to ignore the x-y, and just show how it works for a single x, as x-y is not part of the equation here):
int modpow2(int x, int n)
{
return x & (n-1);
}
If you want to ensure that your code doesn't do anything daft, you could add ASSERT(!(n & n-1)); - this checks that there is only a single bit set in n (so, n is a power of two).
Here is the CPP Code I use in competitive programming:
#include <iostream>
#include<bits/stdc++.h>
using namespace std;
#define ll long long
#define mod 1000000007
ll subtraction_modulo(ll x, ll y ){
return ( ( (x - y) % mod ) + mod ) % mod;
}
Here,
ll -> long long int
mod -> globally defined mod value to be used.