I'm trying to avoid long longs and integer overflow in some calculations, so I came up with the function below to calculate (a * b) / c (order is important due to truncating integer division).
unsigned muldiv(unsigned a, unsigned b, unsigned c)
{
    return a * (b / c) + (a * (b % c)) / c;
}
Are there any edge cases for which this won't work as expected?
EDITED: This is correct for a superset of values for which the original obvious logic was correct. It still buys you nothing if c > b (since then b / c == 0 and b % c == b, which reduces it to the naive form), and possibly in other conditions. Perhaps you know something about your values of c, but this may not help as much as you expect. Some combinations of a, b, c will still overflow.
EDIT: Assuming you're avoiding long long for strict C++98 portability reasons, you can get about 52 bits of precision by promoting your unsigned values to doubles that happen to hold integral values and doing the math on those. Using double math may in fact be faster than doing three integral divisions.
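As a minimal sketch of that double-based fallback (my own illustration, not the answer's code): it stays exact while the product a * b fits in the 53-bit double significand, and even then the final division can occasionally be off by one unit, as the benchmarks later in this thread note.

// Hedged sketch: compute (a * b) / c via doubles to dodge 32-bit overflow.
// Exact only while a * b is exactly representable in a double (roughly a * b < 2^53).
unsigned muldiv_double(unsigned a, unsigned b, unsigned c)
{
    return (unsigned)((double)a * (double)b / (double)c);
}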
This fails on quite a few cases. The most obvious is when a is large, so a * (b % c) overflows. You might try swapping a and b in that case, but that still fails if a, b, and c are all large. Consider a = b = 2^25-1 and c = 2^24 with a 32 bit unsigned. The correct result is 2^26-4, but both a * (b % c) and b * (a % c) will overflow. Even (a % c) * (b % c) would overflow.
By far the easiest way to solve this in general is to have a widening multiply, so you can get the intermediate product in higher precision. If you don't have that, you need to synthesize it out of smaller multiplies and divides, which is pretty much the same thing as implementing your own biginteger library.
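To illustrate that synthesis (a sketch of mine, assuming 32-bit unsigned; the names are illustrative), a 32x32 -> 64 widening multiply can be built out of 16-bit halves using only 32-bit arithmetic:

#include <cstdint>

// Split a and b into 16-bit halves, form the four partial products (each fits in
// 32 bits), then recombine them into the high and low words of the 64-bit product.
void widening_mul(uint32_t a, uint32_t b, uint32_t &hi, uint32_t &lo)
{
    uint32_t a_lo = a & 0xFFFF, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFF, b_hi = b >> 16;

    uint32_t ll = a_lo * b_lo;   // contributes to bits 0..31
    uint32_t lh = a_lo * b_hi;   // contributes to bits 16..47
    uint32_t hl = a_hi * b_lo;   // contributes to bits 16..47
    uint32_t hh = a_hi * b_hi;   // contributes to bits 32..63

    uint32_t mid = (ll >> 16) + (lh & 0xFFFF) + (hl & 0xFFFF);  // column at bit 16, with carries
    lo = (ll & 0xFFFF) | (mid << 16);
    hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}

The division step then has to be performed against that (hi, lo) pair, which is where most of the "biginteger library" effort goes.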
If you can guarantee that c is small enough that (c-1)*(c-1) will not overflow an unsigned, you could use:
unsigned muldiv(unsigned a, unsigned b, unsigned c) {
    return (a/c)*(b/c)*c + (a%c)*(b/c) + (a/c)*(b%c) + (a%c)*(b%c)/c;
}
This will actually give you the "correct" answer for ALL a and b, namely (a * b) / c reduced modulo (UINT_MAX + 1).
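For instance (my own check, assuming 32-bit unsigned), with a = 2^31, b = 3, c = 2, so that (c-1)*(c-1) is safely small: the naive a * b / c wraps around and yields 2^30, while this version returns the exact quotient 3 * 2^30 = 3221225472.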
To avoid overflow you have to pre-divide and then post-multiply by some factor.
The best factor to use is c, as long as one (or both) of a and b is greater than c. This is what Chris Dodd's function does. Its largest intermediate value is (a % c) * (b % c), which, as Chris identifies, is at most (c-1)*(c-1).
If you could have a situation where both a and b are less than c, but (a * b) could still overflow, (which might be the case when c approaches the limit of the word size) then the best factor to use is a large power of two, to turn the multiply and divides into shifts. Try shifting by half the word size.
Note that using pre-divide and then post-multiplying is the equivalent of using longer words when you don't have longer words available. Assuming you don't discard the low order bits but just add them as another term, then you are just using several words instead of one larger one.
I'll let you fill the code in.
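The answer leaves the code as an exercise; purely as one hypothetical way to fill it in (my sketch, not necessarily what the author had in mind), the double-word product from a widening multiply like the one sketched above can be divided by c with plain shift-and-subtract long division, so every step uses only single-word operations:

#include <cstdint>

// Divide the double-word value (hi:lo) by c. Assumes hi < c so the quotient fits
// in one 32-bit word. Restoring division: one quotient bit per iteration.
uint32_t divide_64_by_32(uint32_t hi, uint32_t lo, uint32_t c)
{
    uint32_t q = 0, r = hi;                    // invariant: r < c
    for (int i = 31; i >= 0; --i) {
        uint32_t carry = r >> 31;              // bit about to be shifted out of r
        r = (r << 1) | ((lo >> i) & 1);        // bring down the next dividend bit
        if (carry || r >= c) {                 // the true 33-bit value is >= c
            r -= c;                            // wraparound keeps this correct even when carry is set
            q |= 1u << i;
        }
    }
    return q;                                  // r holds the remainder on exit
}

Paired with the widening multiply, (a * b) / c then works for any operands whose true quotient fits in a single word.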
Related question:
If I have 2 int or long long variables, call them a and b, and I want to compute the sum (a + b) mod p, where p is a large prime integer, how can I utilize the modulo operator in C++ to achieve the desired result?
I have tried (a + b) % p, but this gives overflow sometimes, since a + b will overflow before the mod is applied.
Other similar approaches I have tried seem to avoid overflow, but give an incorrect result.
How can I use the modulo operator in this case to correctly compute the desired sum, while avoiding overflow?
a %= p;
b %= p;
c = p - a;
if (b == c)
    sum = 0;
else if (b < c)
    sum = a + b;
else            // b > c
    sum = b - c;
EDIT: The trick is to avoid any calculation that might cause overflow, without knowing where the limit is. All we know is that the given a, b and p are below the limit -- maybe just below the limit.
After the first two steps (a %= p; b %= p;) we know a < p and b < p. We still daren't add a+b, because that sum might exceed p and break the limit*. But we can see how much room we have left with c = p-a, which is safe because we know that c <= p and c > 0. (The stated types are signed, but we may as well avoid negative numbers, if only because their limits are sometimes off by one from the negatives of the positive limits, in ways I can never remember.)
If b=c, then b=p-a, so a+b=p, so the sum (mod p) is zero.
If b<c, then a+b<p, so we can safely compute a+b (and need not apply the modulo).
If b>c, then it is not safe to compute a+b, but we know that the number we're looking for is a+b-p, which we can rewrite as b-(p-a); we already have b and p-a, so we can safely perform that subtraction.
(*) That's right, I said "daren't". It's a perfectly good word.
Suppose we have 2 constants A & B and a variable i, all 64-bit integers. And we want to compute a simple common arithmetic operation such as:
i * A / B (1)
To simplify the problem, let's assume that variable i is always in the range [INT64_MIN*B/A, INT64_MAX*B/A], so that the final result of the arithmetic operation (1) does not overflow (i.e.: fits in the range [INT64_MIN, INT64_MAX]).
In addition, i is assumed to be more likely in the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A] (i.e.: close to 0), however i may be (less likely) outside this range. In the first case, a trivial integer computation of i * A would not overflow (that's why we called the range friendly); and in the latter case, a trivial integer computation of i * A would overflow, leading to an erroneous result in computation of (1).
What would be the "safest" and "most efficient" way to compute operation (1) (where "safest" means: preserving exactness or at least a decent precision, and where "most efficient" means: lowest average computation time), provided i is more likely to be in the friendly range Range1?
As of now, the solution currently implemented in the code is the following:
(int64_t)((double)A / B * i)
This solution is quite safe (no overflow), though inaccurate (precision loss due to the 53-bit double significand), and quite fast because the double division (double)A / B is precomputed at compile time, leaving only a double multiplication to be performed at runtime.
If you cannot get better bounds on the ranges involved then you're best off following iammilind's advice to use __int128.
The reason is that otherwise you would have to implement the full logic of word-to-double-word multiplication and double-word-by-word division. The Intel and AMD processor manuals contain helpful information and ready-made code, but it gets quite involved, and doing it in C/C++ instead of assembler makes things doubly complicated.
All good compilers expose useful primitives as intrinsics. Microsoft's list doesn't seem to include a muldiv-like primitive but the __mul128 intrinsic gives you the two halves of the 128-bit product as two 64-bit integers. Based on that you can perform long division of two digits by one digit, where one 'digit' would be a 64-bit integer (usually called 'limb' because bigger than a digit but still only part of the whole). Still quite involved but lots better than using pure C/C++. However, portability-wise it is no better than using __int128 directly. At least that way the compiler implementers have already done all the hard work for you.
If your application domain can give you useful bounds, such as that (u % d) * v will not overflow, then you can use the identity
(u * v) / d = (u / d) * v + ((u % d) * v) / d
where / signifies integer division, as long as u is non-negative and d is positive (otherwise you might run afoul of the leeway allowed for the semantics of operator %).
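A minimal sketch of that identity in code (my illustration; it assumes the stated bound that (u % d) * v fits, that u and v are non-negative and d is positive, and that the final result itself fits in the type):

#include <cstdint>

// (u * v) / d without ever forming the full product u * v.
int64_t muldiv_identity(int64_t u, int64_t v, int64_t d)
{
    return (u / d) * v + ((u % d) * v) / d;
}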
In any case you may have to separate out the signs of the operands and use unsigned operations in order to find more useful mechanisms that you can exploit - or to circumvent sabotage by the compiler, like the saturating multiplication that you mentioned. Overflow of signed integer operations invokes undefined behaviour, compilers are free to do whatever they please. By contrast, overflow for unsigned types is well-defined.
Also, with unsigned types you can fall back on rules like that with s = a (+) b (where (+) is possibly-overflowing unsigned addition) you will have either s == a + b or s < a && s < b, which lets you detect overflow after the fact with cheap operations.
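A small sketch of that after-the-fact check (my own, with an illustrative name):

#include <cstdint>

// Unsigned addition wraps modulo 2^64, which is well defined, so overflow can be
// detected by comparing the sum against either operand afterwards.
bool add_overflows(uint64_t a, uint64_t b, uint64_t &s)
{
    s = a + b;        // may wrap
    return s < a;     // true iff the addition wrapped (equivalently s < b)
}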
However, it is unlikely that you will get much farther on this road because the effort required quickly approaches - or even exceeds - the effort of implementing the double-limb operations I alluded to earlier. Only a thorough analysis of the application domain could give the information required for planning/deploying such shortcuts. In the general case and with the bounds you have given you're pretty much out of luck.
In order to provide a quantified answer to the question, I made a benchmark of different solutions, covering some of those proposed here in this post (thanks to the comments and answers).
The benchmark measures computation time of different implementations, when i is inside the friendly range Range1 = [INT64_MIN/A, INT64_MAX/A], and when i is outside the friendly range (yet in the safe range Range2 = [INT64_MIN*B/A, INT64_MAX*B/A]).
Each implementation performs a "safe" (i.e. with no overflow) computation of the operation i * A / B (except the 1st implementation, given as a reference computation time). However, some implementations may return infrequent inaccurate results (this behavior is noted below).
Some solutions proposed have not been tested or are not listed hereafter; these are: the solution using __int128 (unsupported by the MSVC compiler; boost int128_t has been used instead); solutions using extended 80-bit long double (unsupported by the MSVC compiler); and the solution using InfInt (working and tested, though too slow to be a decent competitor).
Time measurements are given in ps/op (picoseconds per operation). The benchmark platform is an Intel Q6600 @ 3GHz under Windows 7 x64, with the executable compiled with MSVC 14, x64/Release target. The variables, constants and function referenced hereafter are defined as:
int64_t i;
const int64_t A = 1234567891;
const int64_t B = 4321987;
inline bool in_safe_range(int64_t i) { return (INT64_MIN/A <= i) && (i <= INT64_MAX/A); }
(1) (i * A / B) [reference]
    i in Range1: 1469 ps/op; i outside Range1: irrelevant (overflows)
(2) ((int64_t)((double)i * A / B))
    i in Range1: 10613 ps/op; i outside Range1: 10606 ps/op
    Note: infrequent inaccurate results (max error = 1 bit) over the whole range Range2
(3) ((int64_t)((double)A / B * i))
    i in Range1: 1073 ps/op; i outside Range1: 1071 ps/op
    Note: infrequent inaccurate results (max error = 1 bit) over the whole range Range2
    Note: the compiler likely precomputes (double)A / B, which explains the performance boost vs. the previous solution.
(4) (!in_safe_range(i) ? (int64_t)((double)A / B * i) : (i * A / B))
    i in Range1: 2009 ps/op; i outside Range1: 1606 ps/op
    Note: rare inaccurate results (max error = 1 bit) outside Range1
(5) ((int64_t)((int128_t)i * A / B)) [boost int128_t]
    i in Range1: 89924 ps/op; i outside Range1: 89289 ps/op
    Note: boost int128_t performs dramatically badly on the bench platform (I have no idea why)
(6) ((i / B) * A + ((i % B) * A) / B)
    i in Range1: 5876 ps/op; i outside Range1: 5879 ps/op
(7) (!in_safe_range(i) ? ((i / B) * A + ((i % B) * A) / B) : (i * A / B))
    i in Range1: 1999 ps/op; i outside Range1: 6135 ps/op
Conclusion
a) If slight computation errors are acceptable in the whole range Range2, then solution (3) is the fastest one, even faster than the direct integer computation given as reference.
b) If computation errors are unacceptable in the friendly range Range1, yet acceptable outside this range, then solution (4) is the fastest one.
c) If computation errors are unacceptable in the whole range Range2, then solution (7) performs as well as solution (4) in the friendly range Range1, and remains decently fast outside this range.
I think you can detect the overflow before it happens. In your case of i * A / B, you are only worried about the i * A part because the division cannot overflow.
You can detect the overflow by performing the test bool overflow = i > INT64_MAX / A. You will have to modify this depending on the signs of the operands and the result.
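A hedged sketch of that pre-check (positive operands assumed, as the answer says the sign handling is up to you; the fallback shown is the pre-divide identity from elsewhere in this thread, itself exact only while (i % B) * A does not overflow):

#include <cstdint>

int64_t muldiv_checked(int64_t i, int64_t A, int64_t B)
{
    // Assumes i >= 0, A > 0, B > 0.
    if (i <= INT64_MAX / A)
        return i * A / B;                       // fast path: i * A cannot overflow
    return (i / B) * A + ((i % B) * A) / B;     // fallback: pre-divide identity
}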
Some implementations provide __int128_t. Check whether your implementation supports it, so that you may use it as a placeholder instead of double. Refer to the post below:
Why isn't there int128_t?
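A minimal sketch assuming a compiler that does provide the __int128 extension (GCC and Clang do; the function name is mine):

#include <cstdint>

int64_t muldiv_i128(int64_t i, int64_t A, int64_t B)
{
    return (int64_t)((__int128)i * A / B);   // 128-bit intermediate product cannot overflow
}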
If you are not very concerned about speed, then for good portability I would suggest using the header-only C++ library "InfInt".
It is pretty straightforward to use the library. Just create an instance of the InfInt class and start using it:
InfInt myint1 = "15432154865413186646848435184100510168404641560358";
InfInt myint2 = 156341300544608LL;
myint1 *= --myint2 - 3;
std::cout << myint1 << std::endl;
Not sure about value bounds, will (i / B) * A + (i % B) * A / B help?
What order does C++ use for chained multiplication?
int a, b, c, d;
// set values
int m = a*b*c*d;
operator * has left to right associativity:
int m = ((a * b) * c) * d;
While in math it doesn't matter (multiplication is associative), in both C and C++ we may or may not get overflow depending on the order:
0 * INT_MAX * INT_MAX // 0
INT_MAX * INT_MAX * 0 // overflow
And things get even more complex if we consider floating-point types or operator overloading. See the comments of @delnan and @melpomene.
Yes, the order is from left to right.
int m = a * b * c * d;
If you are more interested in the topic of evaluation order or operator precedence, then you might be surprised how some ++ operations behave, even differently depending on the version of C.
http://en.cppreference.com/w/c/language/eval_order
http://en.cppreference.com/w/c/language/operator_precedence
The order is from left to right in this case:
int m = a * b * c * d;
Here, (a * b) is computed first, then the result is multiplied by c, and then by d, as shown with parentheses:
int m = ((a * b) * c) * d;
The order is nominally left to right. But the optimizers in C++ compilers I've used feel free to change that order for data types they think they understand. If you overload operator* and the optimizer can't see through your overload, then it can't change the order. But when you multiply a sequence of things (variables, constants, function results, etc.) whose type is double, the optimizer might use the associative and commutative property of real number multiplication as if it were true in float or double multiplication. That can lead to some surprises when least significant bits matter to you.
So far as I understand, the standard allows that optimization, but I am far from a "language lawyer" so my best guess on the standard is not a pronouncement to be trusted (as compared with my experience on what compilers actually do).
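As a small illustration of why such reordering can matter for doubles (values chosen by me so that one grouping overflows an intermediate to infinity):

#include <cstdio>

int main()
{
    double a = 1e308, b = 10.0, c = 0.1;
    std::printf("%g\n", (a * b) * c);   // inf: a * b already overflows to infinity
    std::printf("%g\n", a * (b * c));   // 1e308: b * c rounds to exactly 1.0
    return 0;
}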
There isn't any special "order"; it multiplies as shown, left to right.
int m = a * b * c * d;
Operator precedence comes into effect when mixing addition/subtraction with multiplication/division. Otherwise evaluation is always left to right. Regardless, with this example, the result would be the same no matter which order they were in (ignoring the overflow issues discussed above).
I wanted to see if GCC would reduce a - (b - c) to (a + c) - b with signed and unsigned integers so I created two tests
//test1.c
unsigned fooau(unsigned a, unsigned b, unsigned c) { return a - (b - c); }
signed fooas(signed a, signed b, signed c) { return a - (b - c); }
signed fooms(signed a) { return a*a*a*a*a*a; }
unsigned foomu(unsigned a) { return a*a*a*a*a*a; }
//test2.c
unsigned fooau(unsigned a, unsigned b, unsigned c) { return (a + c) - b; }
signed fooas(signed a, signed b, signed c) { return (a + c) - b; }
signed fooms(signed a) { return (a*a*a)*(a*a*a); }
unsigned foomu(unsigned a) { return (a*a*a)*(a*a*a); }
I first compiled with gcc -O3 test1.c test2.c -S and looked at the assembly. Between the two tests, fooau was identical, however fooas was not.
As far as I understand, unsigned arithmetic can be described by the following formula
(a%n + b%n)%n = (a+b)%n
which can be used to show that unsigned addition is associative. But since signed overflow is undefined behavior, this equality does not necessarily hold for signed addition (i.e. signed addition is not associative), which explains why GCC did not reduce a - (b - c) to (a + c) - b for signed integers. But we can tell GCC to use this formula with -fwrapv. With this option, fooas is identical for both tests.
But what about multiplication? For both tests fooms and foomu were simplified to three multiplications (a*a*a*a*a*a to (a*a*a)*(a*a*a)). But multiplication can be written as repeated addition so using the formula above I think it can be shown that
((a%n)*(b%n))%n = (a*b)%n
which I think can also show that unsigned modular multiplication is associative as well. But since GCC used only three multiplications for foomu this shows that GCC assumes signed integer multiplication is associative.
This seems like a contradiction to me. For addition signed arithmetic was not associative but for multiplication it is.
Two questions:
Is it true that addition is not associative with signed integers but multiplication is in C/C++?
If signed overflow is used for optimization, isn't the fact that GCC does not reduce the algebraic expression a failure to optimize? Wouldn't it be better for optimization to use -fwrapv (I understand that a - (b - c) to (a + c) - b is not much of a reduction, but I'm worried about more complicated cases)? Does this mean that for optimization sometimes using -fwrapv is more efficient and sometimes it's not?
No, multiplication is not associative in signed integers. Consider (0 * x) * x vs. 0 * (x * x) - the latter has potentially undefined behavior while the former is always defined.
The potential for undefined behavior only ever introduces new optimization opportunities, the classic example being optimizing x + 1 > x to true for signed x, an optimization that is not available for unsigned integers.
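A tiny illustration of that classic example (my own framing):

// With signed int the compiler may assume x + 1 never wraps (overflow would be UB),
// so this can be folded to "return true". The unsigned version cannot be folded,
// because x + 1 wraps to 0 when x == UINT_MAX.
bool gt_signed(int x)        { return x + 1 > x; }   // may compile to: return true
bool gt_unsigned(unsigned x) { return x + 1 > x; }   // false for x == UINT_MAX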
I don't think you can assume that gcc failing to change a - (b - c) to (a + c) - b represents a missed optimization opportunity; the two calculations compile to the same two instructions on x86-64 (leal and subl), just in a different order.
Indeed, the implementation is entitled to assume that arithmetic is associative, and use that for optimizations, since anything can happen on UB including modulo arithmetic or infinite-range arithmetic. However, you as the programmer are not entitled to assume associativity unless you can guarantee that no intermediate result overflows.
As another example, try (a + a) - a - gcc will optimize this to a for signed a as well as for unsigned.
Algebraic reduction of signed integer expressions can be performed provided it has the same result for any defined set of inputs. So if the expression
a * a * a * a * a * a
is defined -- that is, a is small enough that no signed overflow occurs during the computation -- then any regrouping of the multiplications will produce the same value, because no product of fewer than six a's can overflow when the full product of six does not.
The same would be true for a + a + a + a + a + a.
Things change if the variables multiplied (or added) are not all the same, or if the additions are intermingled with subtractions. In those cases, regrouping and rearranging the computation could lead to a signed overflow which did not occur in the canonical computation.
For example, take the expression
a - (b - c)
Algebraically, that's equivalent to
(a + c) - b
But the compiler can not do that rearrangement because it is possible that the intermediate value a+c will overflow with inputs which would not cause an overflow in the original. Suppose we had a=INT_MAX-1; b=1; c=2; then a+c results in an overflow, but a - (b - c) is computed as a - (-1), which is INT_MAX, without overflow.
If the compiler can assume that signed overflow does not trap but instead wraps around (i.e. is computed modulo 2^N for N-bit integers), then these rearrangements are possible. The -fwrapv option allows gcc to make that assumption.
I have some C++ code that is in this format:
void calculate(int a, float b, long c) {
long long result = (a / b) * (1000 / c);
assert(result > 0);
// more code...
}
However, the assert statement fails. It is a given that a, b, and c are all values that are positive. Then I take it that 'result' ends up becoming 'result == 0' for some reason. Is there something inherent in the logic of the type conversions presented here that is causing my code to fail?
I am new to C++ so the answer may seem obvious...
If c > 1000, then the expression (1000/c) becomes zero due to integer division. Integer division is being performed here because both operands are integral types.
As a result, the expression (a / b) * (1000 / c) becomes zero, so the value of result becomes zero and the assert fails.
Something similar happens if b is sufficiently larger than a: the quotient a / b (this one is floating-point division, since b is a float) becomes a small fraction, and the product can end up truncating to zero when converted to long long.
If integer division is not what you want, rewrite your expression like this:
long long result = static_cast<long long>((float(a) / b) * (1000.0f / c));
This invokes floating-point division instead of integer division. The values of a and c will be converted to a floating-point type for the purposes of calculating the result.