Modulus Function to Avoid Integer Overflow in C++

If I have 2 int or long long variables, call them a and b, and I want to compute the sum (a + b) mod p, where p is a large prime integer, how can I utilize the modulo operator in C++ to achieve the desired result?
I have tried (a + b) % p, but this gives overflow sometimes, since a + b will overflow before the mod is applied.
Other similar approaches I have tried seem to avoid overflow, but give an incorrect result.
How can I use the modulo operator in this case to correctly compute the desired sum, while avoiding overflow?

a %= p;
b %= p;
c = p - a;            // headroom below p; safe because a < p after the reduction
if (b == c)
    sum = 0;
else if (b < c)
    sum = a + b;
else
    sum = b - c;
EDIT: The trick is to avoid any calculation that might cause overflow, without knowing where the limit is. All we know is that the given a, b and p are below the limit -- maybe just below the limit.
After the first two steps (a%=p;b%=p;) we know a<p and b<p. We still daren't add a+b, because that sum might exceed p and break the limit*. But we can see how much room we have left with c = p-a, which is safe because we know that c<=p and c>0. (The stated types are unsigned, but we may as well avoid negative numbers, if only because their limits are sometimes off by one from the negatives of the positive limits, in ways I can never remember.)
If b=c, then b=p-a, so a+b=p, so the sum (mod p) is zero.
If b<c, then a+b<p, so we can safely compute a+b (and need not apply the modulo).
If b>c, then it is not safe to compute a+b, but we know that the number we're looking for is a+b-p, which we can rewrite as b-(p-a), and we already have b and p-a, so we can safely perform that subtraction.
(*) That's right, I said "daren't". It's a perfectly good word.
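Wrapped up as a reusable function (the name mod_add and the unsigned long long signature are my choices, not from the question), the whole recipe looks like this:

```cpp
// Sketch of the answer's algorithm; assumes p > 0. No intermediate value
// ever exceeds p, so the function is safe even when a + b itself would
// overflow the type.
unsigned long long mod_add(unsigned long long a, unsigned long long b,
                           unsigned long long p) {
    a %= p;                          // now a < p
    b %= p;                          // now b < p
    unsigned long long c = p - a;    // headroom below p; 0 < c <= p
    if (b == c) return 0;            // a + b == p exactly
    if (b < c)  return a + b;        // a + b < p, no reduction needed
    return b - c;                    // a + b - p, computed without overflow
}
```

For example, mod_add(3, 4, 5) takes the b > c branch and returns 2.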

Related

How to properly avoid SIGFPE and overflow on arithmetic operations

I've been trying to create a Fraction class as complete as possible, to learn C++, classes and related stuff on my own. Among other things, I wanted to ensure some level of "protection" against floating point exceptions and overflows.
Objective:
Avoid overflow and floating point exceptions in common arithmetic operations, spending the least time/memory. If avoiding is not possible, then at least detect it.
Also, the idea is to not cast to some bigger type. That creates a handful of problems (like there might be no bigger type)
Cases I've found:
Overflow on +, -, *, /, pow, root
Operations are mostly straightforward (a and b are Long):
a+b: if a > LONG_MAX - b then there's overflow. (not enough: a or b might be negative)
a-b: if a > LONG_MAX + b then there's overflow. (idem)
a*b: if a > LONG_MAX / b then there's overflow. (if b != 0)
a/b: may raise SIGFPE if b == 0, or overflow if a == LONG_MIN and b == -1
pow(a,b): if a > pow(LONG_MAX, 1.0/b) then there's overflow.
pow(a,1.0/b): similar to a/b
Overflow on abs(x) when x = LONG_MIN (or equivalent)
This is funny. Every signed type has a range [-x-1, x] of possible values. abs(-x-1) should be x+1, but that doesn't fit in the type, so it wraps back to -x-1 because of overflow. This means there is a case where abs(x) < 0.
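One way around the abs(LONG_MIN) trap is to return the magnitude as an unsigned type, where the result always fits; this is a sketch under my own naming (safe_abs), not part of the question:

```cpp
// Returns |x| as an unsigned long. Converting to unsigned first and then
// negating in unsigned arithmetic (well-defined, modulo 2^N) avoids the
// signed overflow that abs(LONG_MIN) would trigger.
unsigned long safe_abs(long x) {
    unsigned long u = (unsigned long)x;
    return x < 0 ? 0UL - u : u;
}
```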
SIGFPE with big numbers divided by -1
Found when applying numerator/gcd(numerator,denominator). Sometimes gcd returned -1 and I got a floating point exception.
Easy fixes:
On some operations it is easy to check for overflow. If that's the case, I can always cast to double (with the risk of losing precision over big integers). The idea is to find a better solution, without casting.
In Fraction arithmetics, sometimes I can do extra checking for simplifications: to solve a/b * c/d (co-primes), I can reduce to co-primes a/d and c/b first.
I can always do cascading if's asking whether a or b are < 0 or > 0. Not the prettiest. Besides that awful choice, I can create a function neg() that avoids that overflow:
template <typename T> T neg(T x) { return x > 0 ? -x : x; }
I can take abs(x) of gcd and any similar situation (anywhere x > LONG_MIN)
I'm not sure if 2. and 3. are the best solutions, but seems good enough. I'm posting those here so maybe anyone has a better answer.
Ugliest fixes
In most operations I need to do a lot of extra work to check and avoid overflow. Here is where I'm pretty sure I can learn a thing or two.
Example:
Fraction Fraction::operator+(Fraction f) {
    double lcm = max(den, f.den);
    lcm /= gcd(den, f.den);
    lcm *= min(den, f.den);
    // a/c + b/d = [a*(lcm/c) + b*(lcm/d)] / lcm        // use to create normal fractions
    // a/c + b/d = [a/lcm * (lcm/c)] + [b/lcm * (lcm/d)] // use to create fractions through double
    double p = (double)num;
    p *= lcm / (double)den;
    double q = (double)f.num;
    q *= lcm / (double)f.den;
    if (lcm >= LONG_MAX || (p + q) >= LONG_MAX || (p + q) <= LONG_MIN) {
        //cerr << "Approximating " << num << "/" << den << " + " << f.num << "/" << f.den << endl;
        p = (double)num / lcm;
        p *= lcm / (double)den;
        q = (double)f.num / lcm;
        q *= lcm / (double)f.den;
        return Fraction(p + q);
    }
    else
        return normal(p + q, (long)lcm);
}
Which is the best way to avoid overflow on these arithmetic operations?
Edit: There are a handful of questions on this site that are quite similar, but those are not the same (detect instead of avoid, unsigned instead of signed, SIGFPE in specific unrelated situations).
Checking all of them I found some answers that, upon modification, might be useful to give a proper answer, like:
Detect overflow in unsigned addition (not my case, I'm working with signed):
uint32_t x, y;
uint32_t value = x + y;
bool overflow = value < x; // Alternatively "value < y" should also work
Detect overflow in signed operations. This might be a bit too general, with a lot of branches, and doesn't discuss how to avoid overflow.
The CERT rules mentioned in an answer, are a good starting point, but again only discuss how to detect.
Other answers are too general and I wonder if there are any answer more specific for the cases I'm looking at.
You need to differentiate between floating point operations and integral operations.
Concerning the latter, operations on unsigned types do not overflow: they wrap around modulo 2^N. The one exception is division by zero, which is undefined behaviour by definition. This is closely related to the fact that the C(++) standard mandates a binary representation for unsigned numbers, which virtually makes them a ring.
In contrast, the C(++) standard allows for multiple implementations of signed numbers (sign+magnitude, 1's complement or, most widely used, 2's complement). So signed overflow is defined to be undefined behaviour, possibly to give compiler implementers more freedom to generate efficient code for their target machines. Also this is the reason for your worries with abs(): At least in 2's complement representation, there is no positive number that is equal in magnitude to the largest negative number in magnitude. Refer to CERT rules for elaboration.
On the floating point side, SIGFPE was historically coined for signalling floating point exceptions. However, given the variety of implementations of the arithmetic units in processors nowadays, SIGFPE should be considered a generic signal that reports arithmetic errors. For instance, the glibc reference manual gives a list of possible reasons, explicitly including integral division by zero.
It is worth noting that floating point operations as per ANSI/IEEE Std 754, which is most commonly used today, are specifically designed to be a kind of error-proof. This means, for example, that when an addition overflows it gives a result of infinity and typically sets a flag that you can check later. It is perfectly legal to use this infinite value in further calculations, as the floating point operations have been defined for affine arithmetic. This once was meant to allow long-running computations (on slow machines) to continue even with intermediate overflows etc. Note that certain operations are forbidden even in affine arithmetic, for example dividing infinity by infinity or subtracting infinity from infinity.
So the bottom line is that floating point computations should not normally cause floating point exceptions. Yet you can have so-called traps which cause SIGFPE (or a similar mechanism) to be triggered whenever the above mentioned flags become raised.
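To make the "signed overflow is undefined behaviour, so detect it beforehand" point concrete, here is a hedged sketch of a portable pre-check for addition (the function name is mine; GCC and Clang additionally offer __builtin_add_overflow for the same purpose):

```cpp
#include <climits>

// True if a + b would overflow a long. The comparisons themselves cannot
// overflow, so the check is safe to perform before the addition.
bool add_would_overflow(long a, long b) {
    if (b > 0) return a > LONG_MAX - b;   // would exceed LONG_MAX
    if (b < 0) return a < LONG_MIN - b;   // would fall below LONG_MIN
    return false;                         // b == 0 never overflows
}
```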

Safest and most efficient way to compute an integer operation that may overflow

Suppose we have 2 constants A & B and a variable i, all 64 bits integers. And we want to compute a simple common arithmetic operation such as:
i * A / B (1)
To simplify the problem, let's assume that variable i is always in the range [INT64_MIN*B/A, INT64_MAX*B/A], so that the final result of the arithmetic operation (1) does not overflow (i.e.: fits in the range [INT64_MIN, INT64_MAX]).
In addition, i is assumed to be more likely in the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A] (i.e.: close to 0), however i may be (less likely) outside this range. In the first case, a trivial integer computation of i * A would not overflow (that's why we called the range friendly); and in the latter case, a trivial integer computation of i * A would overflow, leading to an erroneous result in computation of (1).
What would be the "safest" and "most efficient" way to compute operation (1) (where "safest" means: preserving exactness or at least a decent precision, and where "most efficient" means: lowest average computation time), provided i is more likely in the friendly range Range1.
The solution currently implemented in the code is the following:
(int64_t)((double)A / B * i)
This solution is quite safe (no overflow) though inaccurate (precision loss due to the 53-bit limit of the double significand), and quite fast because the double division (double)A / B is precomputed at compile time, leaving only a double multiplication to be computed at runtime.
If you cannot get better bounds on the ranges involved then you're best off following iammilind's advice to use __int128.
The reason is that otherwise you would have to implement the full logic of word to double-word multiplication and double-word by word division. The Intel and AMD processor manuals contain helpful information and ready-made code, but it gets quite involved, and doing it in C/C++ instead of assembler makes things doubly complicated.
All good compilers expose useful primitives as intrinsics. Microsoft's list doesn't seem to include a muldiv-like primitive but the __mul128 intrinsic gives you the two halves of the 128-bit product as two 64-bit integers. Based on that you can perform long division of two digits by one digit, where one 'digit' would be a 64-bit integer (usually called 'limb' because bigger than a digit but still only part of the whole). Still quite involved but lots better than using pure C/C++. However, portability-wise it is no better than using __int128 directly. At least that way the compiler implementers have already done all the hard work for you.
If your application domain can give you useful bounds, like that (u % d) * v will not overflow then you can use the identity
(u * v) / d = (u / d) * v + ((u % d) * v) / d
where / signifies integer division, as long as u is non-negative and d is positive (otherwise you might run afoul of the leeway allowed for the semantics of operator %).
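As a sketch (the helper name is mine), the identity can be computed entirely in the narrow type; it stays overflow-free exactly when the precondition above holds, i.e. both (u / d) * v and (u % d) * v fit:

```cpp
#include <cstdint>

// (u * v) / d via the identity, using only 32-bit arithmetic.
// Requires u >= 0 and d > 0 (guaranteed here by the unsigned types) and
// that the two partial products fit in uint32_t.
uint32_t muldiv_identity(uint32_t u, uint32_t v, uint32_t d) {
    return (u / d) * v + ((u % d) * v) / d;
}
```

For instance, muldiv_identity(12345, 678901, 100) returns 83810328 even though the full product 12345 * 678901 overflows 32 bits.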
In any case you may have to separate out the signs of the operands and use unsigned operations in order to find more useful mechanisms that you can exploit - or to circumvent sabotage by the compiler, like the saturating multiplication that you mentioned. Overflow of signed integer operations invokes undefined behaviour, compilers are free to do whatever they please. By contrast, overflow for unsigned types is well-defined.
Also, with unsigned types you can fall back on rules like that with s = a (+) b (where (+) is possibly-overflowing unsigned addition) you will have either s == a + b or s < a && s < b, which lets you detect overflow after the fact with cheap operations.
However, it is unlikely that you will get much farther on this road because the effort required quickly approaches - or even exceeds - the effort of implementing the double-limb operations I alluded to earlier. Only a thorough analysis of the application domain could give the information required for planning/deploying such shortcuts. In the general case and with the bounds you have given you're pretty much out of luck.
In order to provide a quantified answer to the question, I made a benchmark of different solutions, among them the ones proposed here in this post (thanks to comments and answers).
The benchmark measures computation time of different implementations, when i is inside the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A], and when i is outside the friendly range (yet in the safe range Range2 = [INT64_MIN*B/A, INT64_MAX*B/A]).
Each implementation performs a "safe" (i.e. with no overflow) computation of the operation i * A / B (except the 1st implementation, given as a reference computation time). However, some implementations may occasionally return an inaccurate result (noted where applicable).
Some solutions proposed have not been tested or are not listed hereafter. These are: the solution using __int128 (unsupported by the MS VC compiler; boost int128_t has been used instead); solutions using the 80-bit extended long double (unsupported by the MS VC compiler); and the solution using InfInt (working and tested, though too slow to be a decent competitor).
Time measurements are specified in ps/op (picoseconds per operation). The benchmark platform is an Intel Q6600 @ 3 GHz under Windows 7 x64, with the executable compiled with MS VC14, x64/Release target. The variables, constants and function referenced hereafter are defined as:
int64_t i;
const int64_t A = 1234567891;
const int64_t B = 4321987;
inline bool in_safe_range(int64_t i) { return (INT64_MIN/A <= i) && (i <= INT64_MAX/A); }
(1) (i * A / B) [reference]
i in Range1: 1469 ps/op, i outside Range1: irrelevant (overflows)
(2) ((int64_t)((double)i * A / B))
i in Range1: 10613 ps/op, i outside Range1: 10606 ps/op
Note: infrequent inaccurate result (max error = 1 bit) in the whole range Range2
(3) ((int64_t)((double)A / B * i))
i in Range1: 1073 ps/op, i outside Range1: 1071 ps/op
Note: infrequent inaccurate result (max error = 1 bit) in the whole range Range2
Note: the compiler likely precomputes (double)A / B, resulting in the observed performance boost vs the previous solution.
(4) (!in_safe_range(i) ? (int64_t)((double)A / B * i) : (i * A / B))
i in Range1: 2009 ps/op, i outside Range1: 1606 ps/op
Note: rare inaccurate result (max error = 1 bit) outside Range1
(5) ((int64_t)((int128_t)i * A / B)) [boost int128_t]
i in Range1: 89924 ps/op, i outside Range1: 89289 ps/op
Note: boost int128_t performs dramatically poorly on the bench platform (I have no idea why)
(6) ((i / B) * A + ((i % B) * A) / B)
i in Range1: 5876 ps/op, i outside Range1: 5879 ps/op
(7) (!in_safe_range(i) ? ((i / B) * A + ((i % B) * A) / B) : (i * A / B))
i in Range1: 1999 ps/op, i outside Range1: 6135 ps/op
Conclusion
a) If slight computation errors are acceptable in the whole range Range2, then solution (3) is the fastest one, even faster than the direct integer computation given as reference.
b) If computation errors are unacceptable in the friendly range Range1, yet acceptable outside this range, then solution (4) is the fastest one.
c) If computation errors are unacceptable in the whole range Range2, then solution (7) performs as well as solution (4) in the friendly range Range1, and remains decently fast outside this range.
I think you can detect the overflow before it happens. In your case of i * A / B, you are only worried about the i * A part, because the division cannot overflow.
You can detect the overflow by testing bool overflow = i > INT64_MAX / A. You will have to modify this depending on the signs of the operands and the result.
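For the constants in the question (where A > 0), that pre-check boils down to the same range test used by in_safe_range; a hedged sketch with my own naming:

```cpp
#include <cstdint>

// True if i * A would overflow int64_t; assumes A > 0, as in the question.
// The divisions here cannot overflow, so the test itself is safe.
bool mul_would_overflow(int64_t i, int64_t A) {
    return i > INT64_MAX / A || i < INT64_MIN / A;
}
```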
Some implementations permit __int128_t. Check if your implementation allows it, so that you may use it as a placeholder instead of double. Refer to the post below:
Why isn't there int128_t?
If not very concerned about "fast"-ness, then for good portability I would suggest to use header only C++ library "InfInt".
It is pretty straightforward to use the library. Just create an instance of the InfInt class and start using it:
InfInt myint1 = "15432154865413186646848435184100510168404641560358";
InfInt myint2 = 156341300544608LL;
myint1 *= --myint2 - 3;
std::cout << myint1 << std::endl;
Not sure about value bounds, will (i / B) * A + (i % B) * A / B help?

real modulo operator in C/C++? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to code a modulo (%) operator in C/C++/Obj-C that handles negative numbers
From what I understand (see Modulo operator with negative values and Modulo operation) C & C++ have a "remainder" operator a % b but no operator that actually does modular arithmetic when the LHS is negative.
Several languages do have such a function. Is it possible to build an efficient function in C/C++ (or is there no efficient way to do it on i686/x64 CPUs)?
Currently I use (n * b + a) % b where n is picked such that I'm fairly sure the entire LHS is non-negative, but inevitably code gets changed and bugs sometimes occur.
Note: in case it's not clear, by modular arithmetic I mean an operator mod such that (a + b) mod b = a mod b for all integers a and all positive integers b.
There is no simple built-in way to do it; however, a two-line solution is more efficient than your approach, sparing a multiplication and the need to determine n:
inline int modulo(int a, int b) {
    const int result = a % b;
    return result >= 0 ? result : result + b;
}
Also, if you need to work correctly for negative b numbers as well, add to the beginning:
if(b < 0) return modulo(-a, -b);
I would suggest a function like the one above, declared as inline int modulo(int a, int b) {} (just as if the operator existed in C++). Personally, I don't use negative numbers often, and still think you should keep % whenever your code doesn't use negative numbers.
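A quick self-contained check of the suggested function's behaviour on negative operands (reproducing the answer's code, with the optional negative-b line folded in):

```cpp
// Mathematical modulo built on C++'s remainder operator: for positive b,
// the result is always in [0, b).
inline int modulo(int a, int b) {
    if (b < 0) return modulo(-a, -b);   // optional extension for negative b
    const int result = a % b;
    return result >= 0 ? result : result + b;
}
```

So modulo(-7, 3) yields 2, whereas the raw remainder -7 % 3 yields -1 under C++11's truncating division.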

Is this multiply-divide function correct?

I'm trying to avoid long longs and integer overflow in some calculations, so I came up with the function below to calculate (a * b) / c (order is important due to truncating integer division).
unsigned muldiv(unsigned a, unsigned b, unsigned c)
{
    return a * (b / c) + (a * (b % c)) / c;
}
Are there any edge cases for which this won't work as expected?
EDITED: This is correct for a superset of values for which the original obvious logic was correct. It still buys you nothing if c > b and possibly in other conditions. Perhaps you know something about your values of c but this may not help as much as you expect. Some combinations of a, b, c will still overflow.
EDIT: Assuming you're avoiding long long for strict C++98 portability reasons, you can get about 52 bits of precision by promoting your unsigned to doubles that happen to have integral values to do the math. Using double math may in fact be faster than doing three integral divisions.
This fails on quite a few cases. The most obvious is when a is large, so a * (b % c) overflows. You might try swapping a and b in that case, but that still fails if a, b, and c are all large. Consider a = b = 2^25-1 and c = 2^24 with a 32 bit unsigned. The correct result is 2^26-4, but both a * (b % c) and b * (a % c) will overflow. Even (a % c) * (b % c) would overflow.
By far the easiest way to solve this in general is to have a widening multiply, so you can get the intermediate product in higher precision. If you don't have that, you need to synthesize it out of smaller multiplies and divides, which is pretty much the same thing as implementing your own biginteger library.
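When a wider type is available, the widening-multiply route is a one-liner; this sketch assumes 32-bit operands and a 64-bit intermediate (the name muldiv_wide is mine):

```cpp
#include <cstdint>

// Exact (a * b) / c for 32-bit operands: the full 64-bit product cannot
// overflow, and the result is correct whenever the quotient fits in 32 bits.
uint32_t muldiv_wide(uint32_t a, uint32_t b, uint32_t c) {
    return (uint32_t)((uint64_t)a * b / c);
}
```

On the failing case above (a = b = 2^25-1, c = 2^24) this returns 2^26-4 as required.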
If you can guarantee that c is small enough that (c-1)*(c-1) will not overflow an unsigned, you could use:
unsigned muldiv(unsigned a, unsigned b, unsigned c) {
    return (a/c)*(b/c)*c + (a%c)*(b/c) + (a/c)*(b%c) + (a%c)*(b%c)/c;
}
This will actually give you the "correct" answer for ALL a and b -- (a * b)/c % (UINT_MAX+1)
To avoid overflow you have to pre-divide and then post-multiply by some factor.
The best factor to use is c, as long as one (or both) of a and b is greater than c. This is what Chris Dodd's function does. It has a greatest intermediate of ((a % c) * (b % c)), which, as Chris identifies, is less than or equal to ((c-1)*(c-1)).
If you could have a situation where both a and b are less than c, but (a * b) could still overflow, (which might be the case when c approaches the limit of the word size) then the best factor to use is a large power of two, to turn the multiply and divides into shifts. Try shifting by half the word size.
Note that using pre-divide and then post-multiplying is the equivalent of using longer words when you don't have longer words available. Assuming you don't discard the low order bits but just add them as another term, then you are just using several words instead of one larger one.
I'll let you fill the code in.

Another double type trick in C++?

First, I understand that the double type in C++ has been discussed lots of time, but I wasn't able to answer my question after searching. Any help or idea is highly appreciated.
The simplified version of my question is: I got three different results (a=-0.926909, a=-0.926947 and a=-0.926862) when I computed a=b-c+d with three different approaches and the same values of b, c and d, and I don't know which one to trust.
The detailed version of my question is:
I was recently writing a program (in C++ on Ubuntu 10.10) to handle some data. One function looks like this:
void calc() {
    double a, b;
    ...
    a = b - c + d; // c, d are global variables of type double
    ...
}
When I was using GDB to debug the above code, during a call to calc(), I recorded the values of b, c and d before the statement a = b - c + d as follows:
b = 54.7231
c = 55.4051
d = -0.244947
After the statement a = b - c + d executed, I found that a=-0.926909 instead of the -0.926947 calculated by a calculator. Well, so far it is not too confusing, as I guess this might just be a precision problem. Later on I re-implemented another version of calc() for some reason. Let's call this new version calc_new(). calc_new() is almost the same as calc(), except for how and where b, c and d are calculated:
void calc_new() {
    double a, b;
    ...
    a = b - c + d; // c, d are global variables of type double
    ...
}
This time when I was debugging, the values of b, c and d before the statement a = b - c + d were the same as when calc() was debugged: b = 54.7231, c = 55.4051, d = -0.244947. However, this time after the statement a = b - c + d executed, I got a=-0.926862. That is, I got three different values of a when computing a = b - c + d with the same values of b, c and d. I don't think the differences between a=-0.926862, a=-0.926909 and a=-0.926947 are small, but I cannot figure out the cause. And which one is correct?
With Many Thanks,
Tom
If you expect the answer to be accurate in the 5th and 6th decimal place, you need to know exactly what the inputs to the calculation are in those places. You are seeing inputs with only 4 decimal places, you need to display their 5th and 6th place as well. Then I think you would see a comprehensible situation that matches your calculator to 6 decimal places. Double has more than sufficient precision for this job, there would only be precision problems here if you were taking the difference of two very similar numbers (you're not).
Edit: Unsurprisingly, increasing the display precision would have also shown you that calc() and calc_new() were supplying different inputs to the calculation. Credit to Mike Seymour and Dietmar Kuhl in the comments who were the first to see your actual problem.
Let me try to answer the question I suspect that you meant to ask. If I have mistaken your intent, then you can disregard the answer.
Suppose that I have the numbers u = 500.1 and v = 5.001, each to four decimal places of accuracy. What then is w = u + v? Answer, w = 505.101, but to four decimal places, it's w = 505.1.
Now consider x = w - u = 5.000, which should equal v, but doesn't quite.
If I only change the order of operations however, I can get x to equal v exactly, not by x = w - u or by x = (u + v) - u, but by x = v + (u - u).
Is that trivial? Yes, in my example, it is; but the same principle applies in your example, except that they aren't really decimal places but bits of precision.
In general, to maintain precision, if you have some floating-point numbers to sum, you should try to add the small ones together first, and only bring the larger ones into the sum later.
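A minimal illustration of that advice: IEEE-754 double addition is commutative but not associative, so the grouping (and therefore the order of summation) matters. Around 1e16 the spacing between adjacent doubles is 2.0, so a lone +1.0 gets rounded away. The helper names are mine:

```cpp
// Sums big + s1 + s2 with two different groupings.
inline double sum_big_first(double big, double s1, double s2) {
    return (big + s1) + s2;   // each small term is absorbed (rounded away)
}
inline double sum_small_first(double big, double s1, double s2) {
    return big + (s1 + s2);   // small terms combine before meeting big
}
```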
We're arguing about smoke here. If nothing changed in the environment, an expression like:
a = b + c + d
MUST ALWAYS RETURN THE SAME VALUE IF INPUTS AREN'T CHANGED.
No rounding errors. No esoteric pragmas, nothing at all.
If you check your bank account today and tomorrow (and nothing changed in that time) I suspect you'll go crazy if you see something different. We're speaking about programs, not random number generators!!!
The correct one is -0.926947.
The differences you see are far too large for rounding errors (even in single precision) as one can check in this encoder.
When using the encoder, you need to enter them like this: -55.926909 (to account for the potential effect of the operator commutativity effects nicely described in previously submitted answers.) Additionally, a difference in just the last significant bit may well be due to rounding effects, but you will not see any with your values.
When using the tool, the 64-bit format (Binary64) corresponds to your implementation's double type.
Rational numbers do not always have a terminating expansion in a given base. 1/3rd cannot be expressed in a finite number of digits in base ten. In base 2, rational numbers with a denominator that is a power of two will have a terminating expansion. The rest won't. So 1/2, 1/4, 3/8, 7/16.... any number that looks like x/(2^n) can be represented accurately. That turns out to be a fairly sparse subset of the infinite series of rational numbers. Everything else will be subject to the errors introduced by trying to represent an infinite number of binary digits within a finite container.
But addition is commutative, right? Yes. But when you start introducing rounding errors things change a little. With a = b + c + d as an example, let's say that d cannot be expressed in a finite number of binary digits. Neither can c. So adding them together will give us some inaccurate value, which itself may also be incapable of being represented in a finite number of binary digits. So error on top of error. Then we add that value to b, which may also not be a terminating expansion in binary. So taking one inaccurate result and adding it to another inaccurate number results in another inaccurate number. And because we're throwing away precision at every step, we potentially lose associativity at each step, which is why regrouping or reordering the sum can change the result.
There's a post I made (Perl-related, but it's a universal topic): Re: Shocking Imprecision (PerlMonks), and of course the canonical What Every Computer Scientist Should Know About Floating-Point Arithmetic, both of which discuss the topic. The latter is far more detailed.