Related
I am looking for an efficient (optionally standard, elegant and easy to implement) solution to multiply relatively large numbers, and store the result into one or several integers :
Let say I have two 64 bits integers declared like this :
uint64_t a = xxx, b = yyy;
When I do a * b, how can I detect if the operation results in an overflow and in this case store the carry somewhere?
Please note that I don't want to use any large-number library since I have constraints on the way I store the numbers.
1. Detecting the overflow:
x = a * b;
if (a != 0 && x / a != b) {
// overflow handling
}
Edit: Fixed division by 0 (thanks Mark!)
2. Computing the carry is quite involved. One approach is to split both operands into half-words, then apply long multiplication to the half-words:
uint64_t hi(uint64_t x) {
return x >> 32;
}
uint64_t lo(uint64_t x) {
return ((1ULL << 32) - 1) & x;
}
void multiply(uint64_t a, uint64_t b) {
// actually uint32_t would do, but the casting is annoying
uint64_t s0, s1, s2, s3;
uint64_t x = lo(a) * lo(b);
s0 = lo(x);
x = hi(a) * lo(b) + hi(x);
s1 = lo(x);
s2 = hi(x);
x = s1 + lo(a) * hi(b);
s1 = lo(x);
x = s2 + hi(a) * hi(b) + hi(x);
s2 = lo(x);
s3 = hi(x);
uint64_t result = s1 << 32 | s0;
uint64_t carry = s3 << 32 | s2;
}
To see that none of the partial sums themselves can overflow, we consider the worst case:
x = s2 + hi(a) * hi(b) + hi(x)
Let B = 1 << 32. We then have
x <= (B - 1) + (B - 1)(B - 1) + (B - 1)
<= B*B - 1
< B*B
I believe this will work - at least it handles Sjlver's test case. Aside from that, it is untested (and might not even compile, as I don't have a C++ compiler at hand anymore).
The idea is to use following fact which is true for integral operation:
a*b > c if and only if a > c/b
/ is integral division here.
The pseudocode to check against overflow for positive numbers follows:
if (a > max_int64 / b) then "overflow" else "ok".
To handle zeroes and negative numbers you should add more checks.
C code for non-negative a and b follows:
if (b > 0 && a > 18446744073709551615 / b) {
// overflow handling
}; else {
c = a * b;
}
Note, max value for 64 type:
18446744073709551615 == (1<<64)-1
To calculate carry we can use approach to split number into two 32-digits and multiply them as we do this on the paper. We need to split numbers to avoid overflow.
Code follows:
// split input numbers into 32-bit digits
uint64_t a0 = a & ((1LL<<32)-1);
uint64_t a1 = a >> 32;
uint64_t b0 = b & ((1LL<<32)-1);
uint64_t b1 = b >> 32;
// The following 3 lines of code is to calculate the carry of d1
// (d1 - 32-bit second digit of result, and it can be calculated as d1=d11+d12),
// but to avoid overflow.
// Actually rewriting the following 2 lines:
// uint64_t d1 = (a0 * b0 >> 32) + a1 * b0 + a0 * b1;
// uint64_t c1 = d1 >> 32;
uint64_t d11 = a1 * b0 + (a0 * b0 >> 32);
uint64_t d12 = a0 * b1;
uint64_t c1 = (d11 > 18446744073709551615 - d12) ? 1 : 0;
uint64_t d2 = a1 * b1 + c1;
uint64_t carry = d2; // needed carry stored here
Although there have been several other answers to this question, I several of them have code that is completely untested, and thus far no one has adequately compared the different possible options.
For that reason, I wrote and tested several possible implementations (the last one is based on this code from OpenBSD, discussed on Reddit here). Here's the code:
/* Multiply with overflow checking, emulating clang's builtin function
*
* __builtin_umull_overflow
*
* This code benchmarks five possible schemes for doing so.
*/
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>
#ifndef BOOL
#define BOOL int
#endif
// Option 1, check for overflow a wider type
// - Often fastest and the least code, especially on modern compilers
// - When long is a 64-bit int, requires compiler support for 128-bits
// ints (requires GCC >= 3.0 or Clang)
#if LONG_BIT > 32
typedef __uint128_t long_overflow_t ;
#else
typedef uint64_t long_overflow_t;
#endif
BOOL
umull_overflow1(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
long_overflow_t prod = (long_overflow_t)lhs * (long_overflow_t)rhs;
*result = (unsigned long) prod;
return (prod >> LONG_BIT) != 0;
}
// Option 2, perform long multiplication using a smaller type
// - Sometimes the fastest (e.g., when mulitply on longs is a library
// call).
// - Performs at most three multiplies, and sometimes only performs one.
// - Highly portable code; works no matter how many bits unsigned long is
BOOL
umull_overflow2(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul;
unsigned long lhs_high = lhs >> LONG_BIT/2;
unsigned long lhs_low = lhs & HALFSIZE_MAX;
unsigned long rhs_high = rhs >> LONG_BIT/2;
unsigned long rhs_low = rhs & HALFSIZE_MAX;
unsigned long bot_bits = lhs_low * rhs_low;
if (!(lhs_high || rhs_high)) {
*result = bot_bits;
return 0;
}
BOOL overflowed = lhs_high && rhs_high;
unsigned long mid_bits1 = lhs_low * rhs_high;
unsigned long mid_bits2 = lhs_high * rhs_low;
*result = bot_bits + ((mid_bits1+mid_bits2) << LONG_BIT/2);
return overflowed || *result < bot_bits
|| (mid_bits1 >> LONG_BIT/2) != 0
|| (mid_bits2 >> LONG_BIT/2) != 0;
}
// Option 3, perform long multiplication using a smaller type (this code is
// very similar to option 2, but calculates overflow using a different but
// equivalent method).
// - Sometimes the fastest (e.g., when mulitply on longs is a library
// call; clang likes this code).
// - Performs at most three multiplies, and sometimes only performs one.
// - Highly portable code; works no matter how many bits unsigned long is
BOOL
umull_overflow3(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
const unsigned long HALFSIZE_MAX = (1ul << LONG_BIT/2) - 1ul;
unsigned long lhs_high = lhs >> LONG_BIT/2;
unsigned long lhs_low = lhs & HALFSIZE_MAX;
unsigned long rhs_high = rhs >> LONG_BIT/2;
unsigned long rhs_low = rhs & HALFSIZE_MAX;
unsigned long lowbits = lhs_low * rhs_low;
if (!(lhs_high || rhs_high)) {
*result = lowbits;
return 0;
}
BOOL overflowed = lhs_high && rhs_high;
unsigned long midbits1 = lhs_low * rhs_high;
unsigned long midbits2 = lhs_high * rhs_low;
unsigned long midbits = midbits1 + midbits2;
overflowed = overflowed || midbits < midbits1 || midbits > HALFSIZE_MAX;
unsigned long product = lowbits + (midbits << LONG_BIT/2);
overflowed = overflowed || product < lowbits;
*result = product;
return overflowed;
}
// Option 4, checks for overflow using division
// - Checks for overflow using division
// - Division is slow, especially if it is a library call
BOOL
umull_overflow4(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
*result = lhs * rhs;
return rhs > 0 && (SIZE_MAX / rhs) < lhs;
}
// Option 5, checks for overflow using division
// - Checks for overflow using division
// - Avoids division when the numbers are "small enough" to trivially
// rule out overflow
// - Division is slow, especially if it is a library call
BOOL
umull_overflow5(unsigned long lhs, unsigned long rhs, unsigned long* result)
{
const unsigned long MUL_NO_OVERFLOW = (1ul << LONG_BIT/2) - 1ul;
*result = lhs * rhs;
return (lhs >= MUL_NO_OVERFLOW || rhs >= MUL_NO_OVERFLOW) &&
rhs > 0 && SIZE_MAX / rhs < lhs;
}
#ifndef umull_overflow
#define umull_overflow2
#endif
/*
* This benchmark code performs a multiply at all bit sizes,
* essentially assuming that sizes are logarithmically distributed.
*/
int main()
{
unsigned long i, j, k;
int count = 0;
unsigned long mult;
unsigned long total = 0;
for (k = 0; k < 0x40000000 / LONG_BIT / LONG_BIT; ++k)
for (i = 0; i != LONG_MAX; i = i*2+1)
for (j = 0; j != LONG_MAX; j = j*2+1) {
count += umull_overflow(i+k, j+k, &mult);
total += mult;
}
printf("%d overflows (total %lu)\n", count, total);
}
Here are the results, testing with various compilers and systems I have (in this case, all testing was done on OS X, but results should be similar on BSD or Linux systems):
+------------------+----------+----------+----------+----------+----------+
| | Option 1 | Option 2 | Option 3 | Option 4 | Option 5 |
| | BigInt | LngMult1 | LngMult2 | Div | OptDiv |
+------------------+----------+----------+----------+----------+----------+
| Clang 3.5 i386 | 1.610 | 3.217 | 3.129 | 4.405 | 4.398 |
| GCC 4.9.0 i386 | 1.488 | 3.469 | 5.853 | 4.704 | 4.712 |
| GCC 4.2.1 i386 | 2.842 | 4.022 | 3.629 | 4.160 | 4.696 |
| GCC 4.2.1 PPC32 | 8.227 | 7.756 | 7.242 | 20.632 | 20.481 |
| GCC 3.3 PPC32 | 5.684 | 9.804 | 11.525 | 21.734 | 22.517 |
+------------------+----------+----------+----------+----------+----------+
| Clang 3.5 x86_64 | 1.584 | 2.472 | 2.449 | 9.246 | 7.280 |
| GCC 4.9 x86_64 | 1.414 | 2.623 | 4.327 | 9.047 | 7.538 |
| GCC 4.2.1 x86_64 | 2.143 | 2.618 | 2.750 | 9.510 | 7.389 |
| GCC 4.2.1 PPC64 | 13.178 | 8.994 | 8.567 | 37.504 | 29.851 |
+------------------+----------+----------+----------+----------+----------+
Based on these results, we can draw a few conclusions:
Clearly, the division-based approach, although simple and portable, is slow.
No technique is a clear winner in all cases.
On modern compilers, the use-a-larger-int approach is best, if you can use it
On older compilers, the long-multiplication approach is best
Surprisingly, GCC 4.9.0 has performance regressions over GCC 4.2.1, and GCC 4.2.1 has performance regressions over GCC 3.3
A version that also works when a == 0:
x = a * b;
if (a != 0 && x / a != b) {
// overflow handling
}
Easy and fast with clang and gcc:
unsigned long long t a, b, result;
if (__builtin_umulll_overflow(a, b, &result)) {
// overflow!!
}
This will use hardware support for overflow detection where available. By being compiler extensions it can even handle signed integer overflow (replace umul with smul), eventhough that is undefined behavior in C++.
If you need not just to detect overflow but also to capture the carry, you're best off breaking your numbers down into 32-bit parts. The code is a nightmare; what follows is just a sketch:
#include <stdint.h>
uint64_t mul(uint64_t a, uint64_t b) {
uint32_t ah = a >> 32;
uint32_t al = a; // truncates: now a = al + 2**32 * ah
uint32_t bh = b >> 32;
uint32_t bl = b; // truncates: now b = bl + 2**32 * bh
// a * b = 2**64 * ah * bh + 2**32 * (ah * bl + bh * al) + al * bl
uint64_t partial = (uint64_t) al * (uint64_t) bl;
uint64_t mid1 = (uint64_t) ah * (uint64_t) bl;
uint64_t mid2 = (uint64_t) al * (uint64_t) bh;
uint64_t carry = (uint64_t) ah * (uint64_t) bh;
// add high parts of mid1 and mid2 to carry
// add low parts of mid1 and mid2 to partial, carrying
// any carry bits into carry...
}
The problem is not just the partial products but the fact that any of the sums can overflow.
If I had to do this for real, I would write an extended-multiply routine in the local assembly language. That is, for example, multiply two 64-bit integers to get a 128-bit result, which is stored in two 64-bit registers. All reasonable hardware provides this functionality in a single native multiply instruction—it's not just accessible from C.
This is one of those rare cases where the solution that's most elegant and easy to program is actually to use assembly language. But it's certainly not portable :-(
The GNU Portability Library (Gnulib) contains a module intprops, which has macros that efficiently test whether arithmetic operations would overflow.
For example, if an overflow in multiplication would occur, INT_MULTIPLY_OVERFLOW (a, b) would yield 1.
Perhaps the best way to solve this problem is to have a function, which multiplies two UInt64 and results a pair of UInt64, an upper part and a lower part of the UInt128 result. Here is the solution, including a function, which displays the result in hex. I guess you perhaps prefer a C++ solution, but I have a working Swift-Solution which shows, how to manage the problem:
func hex128 (_ hi: UInt64, _ lo: UInt64) -> String
{
var s: String = String(format: "%08X", hi >> 32)
+ String(format: "%08X", hi & 0xFFFFFFFF)
+ String(format: "%08X", lo >> 32)
+ String(format: "%08X", lo & 0xFFFFFFFF)
return (s)
}
func mul64to128 (_ multiplier: UInt64, _ multiplicand : UInt64)
-> (result_hi: UInt64, result_lo: UInt64)
{
let x: UInt64 = multiplier
let x_lo: UInt64 = (x & 0xffffffff)
let x_hi: UInt64 = x >> 32
let y: UInt64 = multiplicand
let y_lo: UInt64 = (y & 0xffffffff)
let y_hi: UInt64 = y >> 32
let mul_lo: UInt64 = (x_lo * y_lo)
let mul_hi: UInt64 = (x_hi * y_lo) + (mul_lo >> 32)
let mul_carry: UInt64 = (x_lo * y_hi) + (mul_hi & 0xffffffff)
let result_hi: UInt64 = (x_hi * y_hi) + (mul_hi >> 32) + (mul_carry >> 32)
let result_lo: UInt64 = (mul_carry << 32) + (mul_lo & 0xffffffff)
return (result_hi, result_lo)
}
Here is an example to verify, that the function works:
var c: UInt64 = 0
var d: UInt64 = 0
(c, d) = mul64to128(0x1234567890123456, 0x9876543210987654)
// 0AD77D742CE3C72E45FD10D81D28D038 is the result of the above example
print(hex128(c, d))
(c, d) = mul64to128(0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF)
// FFFFFFFFFFFFFFFE0000000000000001 is the result of the above example
print(hex128(c, d))
There is a simple (and often very fast solution) which has not been mentioned yet. The solution is based on the fact that n-Bit times m-Bit multiplication does never overflow for a product width of n+m-bit or higher but overflows for all result widths smaller than n+m-1.
Because my old description might have been too difficult to read for some people, I try it again:
What you need is checking the sum of leading-zeroes of both operands. It would be very easy to prove mathematically.
Let x be n-Bit and y be m-Bit. z = x * y is k-Bit. Because the product can be n+m bit large at most it can overflow. Let's say. x*y is p-Bit long (without leading zeroes). The leading zeroes of the product are clz(x * y) = n+m - p. clz behaves similar to log, hence:
clz(x * y) = clz(x) + clz(y) + c with c = either 1 or 0.
(thank you for the c = 1 advice in the comment!)
It overflows when k < p <= n+m <=> n+m - k > n+m - p = clz(x * y).
Now we can use this algorithm:
if max(clz(x * y)) = clz(x) + clz(y) +1 < (n+m - k) --> overflow
if max(clz(x * y)) = clz(x) + clz(y) +1 == (n+m - k) --> overflow if c = 0
else --> no overflow
How to check for overflow in the middle case? I assume, you have a multiplication instruction. Then we easily can use it to see the leading zeroes of the result, i.e.:
if clz(x * y / 2) == (n+m - k) <=> msb(x * y/2) == 1 --> overflow
else --> no overflow
You do the multiplication by treating x/2 as fixed point and y as normal integer:
msb(x * y/2) = msb(floor(x * y / 2))
floor(x * y/2) = floor(x/2) * y + (lsb(x) * floor(y/2)) = (x >> 1)*y + (x & 1)*(y >> 1)
(this result never overflows in case of clz(x)+clz(y)+1 == (n+m -k))
The trick is using builtins/intrinsics. In GCC it looks this way:
static inline int clz(int a) {
if (a == 0) return 32; //only needed for x86 architecture
return __builtin_clz(a);
}
/**#fn static inline _Bool chk_mul_ov(uint32_t f1, uint32_t f2)
* #return one, if a 32-Bit-overflow occurs when unsigned-unsigned-multipliying f1 with f2 otherwise zero. */
static inline _Bool chk_mul_ov(uint32_t f1, uint32_t f2) {
int lzsum = clz(f1) + clz(f2); //leading zero sum
return
lzsum < sizeof(f1)*8-1 || ( //if too small, overflow guaranteed
lzsum == sizeof(f1)*8-1 && //if special case, do further check
(int32_t)((f1 >> 1)*f2 + (f1 & 1)*(f2 >> 1)) < 0 //check product rightshifted by one
);
}
...
if (chk_mul_ov(f1, f2)) {
//error handling
}
...
Just an example for n = m = k = 32-Bit unsigned-unsigned-multiplication. You can generalize it to signed-unsigned- or signed-signed-multiplication. And even no multiple-bit-shift is required (because some microcontrollers implement one-bit-shifts only but sometimes support product divided by two with a single instruction like Atmega!). However, if no count-leading-zeroes instruction exists but long multiplication, this might not be better.
Other compilers have their own way of specifying intrinsics for CLZ operations.
Compared to checking upper half of the multiplication the clz-method should scale better (in worst case) than using a highly optimized 128-Bit multiplication to check for 64-Bit overflow. Multiplication needs over linear overhead while count bits needs only linear overhead.
This code worked out-of-the box for me when tried.
I've been working with this problem this days and I have to say that it has impressed me the number of times I have seen people saying the best way to know if there has been an overflow is to divide the result, thats totally inefficient and unnecessary. The point for this function is that it must be as fast as possible.
There are two options for the overflow detection:
1º- If possible create the result variable twice as big as the multipliers, for example:
struct INT32struct {INT16 high, low;};
typedef union
{
struct INT32struct s;
INT32 ll;
} INT32union;
INT16 mulFunction(INT16 a, INT16 b)
{
INT32union result.ll = a * b; //32Bits result
if(result.s.high > 0)
Overflow();
return (result.s.low)
}
You will know inmediately if there has been an overflow, and the code is the fastest possible without writing it in machine code. Depending on the compiler this code can be improved in machine code.
2º- Is impossible to create a result variable twice as big as the multipliers variable:
Then you should play with if conditions to determine the best path. Continuing with the example:
INT32 mulFunction(INT32 a, INT32 b)
{
INT32union s_a.ll = abs(a);
INT32union s_b.ll = abs(b); //32Bits result
INT32union result;
if(s_a.s.hi > 0 && s_b.s.hi > 0)
{
Overflow();
}
else if (s_a.s.hi > 0)
{
INT32union res1.ll = s_a.s.hi * s_b.s.lo;
INT32union res2.ll = s_a.s.lo * s_b.s.lo;
if (res1.hi == 0)
{
result.s.lo = res1.s.lo + res2.s.hi;
if (result.s.hi == 0)
{
result.s.ll = result.s.lo << 16 + res2.s.lo;
if ((a.s.hi >> 15) ^ (b.s.hi >> 15) == 1)
{
result.s.ll = -result.s.ll;
}
return result.s.ll
}else
{
Overflow();
}
}else
{
Overflow();
}
}else if (s_b.s.hi > 0)
{
//Same code changing a with b
}else
{
return (s_a.lo * s_b.lo);
}
}
I hope this code helps you to have a quite efficient program and I hope the code is clear, if not I'll put some coments.
best regards.
Here is a trick for detecting whether multiplication of two unsigned integers overflows.
We make the observation that if we multiply an N-bit-wide binary number with an M-bit-wide binary number, the product does not have more than N + M bits.
For instance, if we are asked to multiply a three-bit number with a twenty-nine bit number, we know that this doesn't overflow thirty-two bits.
#include <stdlib.h>
#include <stdio.h>
int might_be_mul_oflow(unsigned long a, unsigned long b)
{
if (!a || !b)
return 0;
a = a | (a >> 1) | (a >> 2) | (a >> 4) | (a >> 8) | (a >> 16) | (a >> 32);
b = b | (b >> 1) | (b >> 2) | (b >> 4) | (b >> 8) | (b >> 16) | (b >> 32);
for (;;) {
unsigned long na = a << 1;
if (na <= a)
break;
a = na;
}
return (a & b) ? 1 : 0;
}
int main(int argc, char **argv)
{
unsigned long a, b;
char *endptr;
if (argc < 3) {
printf("supply two unsigned long integers in C form\n");
return EXIT_FAILURE;
}
a = strtoul(argv[1], &endptr, 0);
if (*endptr != 0) {
printf("%s is garbage\n", argv[1]);
return EXIT_FAILURE;
}
b = strtoul(argv[2], &endptr, 0);
if (*endptr != 0) {
printf("%s is garbage\n", argv[2]);
return EXIT_FAILURE;
}
if (might_be_mul_oflow(a, b))
printf("might be multiplication overflow\n");
{
unsigned long c = a * b;
printf("%lu * %lu = %lu\n", a, b, c);
if (a != 0 && c / a != b)
printf("confirmed multiplication overflow\n");
}
return 0;
}
A smattering of tests: (on 64 bit system):
$ ./uflow 0x3 0x3FFFFFFFFFFFFFFF
3 * 4611686018427387903 = 13835058055282163709
$ ./uflow 0x7 0x3FFFFFFFFFFFFFFF
might be multiplication overflow
7 * 4611686018427387903 = 13835058055282163705
confirmed multiplication overflow
$ ./uflow 0x4 0x3FFFFFFFFFFFFFFF
might be multiplication overflow
4 * 4611686018427387903 = 18446744073709551612
$ ./uflow 0x5 0x3FFFFFFFFFFFFFFF
might be multiplication overflow
5 * 4611686018427387903 = 4611686018427387899
confirmed multiplication overflow
The steps in might_be_mul_oflow are almost certainly slower than just doing the division test, at least on mainstream processors used in desktop workstations, servers and mobile devices. On chips without good division support, it could be useful.
It occurs to me that there is another way to do this early rejection test.
We start with a pair of numbers arng and brng which are initialized to 0x7FFF...FFFF and 1.
If a <= arng and b <= brng we can conclude that there is no overflow.
Otherwise, we shift arng to the right, and shift brng to the left, adding one bit to brng, so that they are 0x3FFF...FFFF and 3.
If arng is zero, finish; otherwise repeat at 2.
The function now looks like:
int might_be_mul_oflow(unsigned long a, unsigned long b)
{
if (!a || !b)
return 0;
{
unsigned long arng = ULONG_MAX >> 1;
unsigned long brng = 1;
while (arng != 0) {
if (a <= arng && b <= brng)
return 0;
arng >>= 1;
brng <<= 1;
brng |= 1;
}
return 1;
}
}
When your using e.g. 64 bits variables, implement 'number of significant bits' with nsb(var) = { 64 - clz(var); }.
clz(var) = count leading zeros in var, a builtin command for GCC and Clang, or probably available with inline assembly for your CPU.
Now use the fact that nsb(a * b) <= nsb(a) + nsb(b) to check for overflow. When smaller, it is always 1 smaller.
Ref GCC: Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
I was thinking about this today and stumbled upon this question, my thoughts led me to this result. TLDR, while I find it "elegant" in that it only uses a few lines of code (could easily be a one liner), and has some mild math that simplifies to something relatively simple conceptually, this is mostly "interesting" and I haven't tested it.
If you think of an unsigned integer as being a single digit with radix 2^n where n is the number of bits in the integer, then you can map those numbers to radians around the unit circle, e.g.
radians(x) = x * (2 * pi * rad / 2^n)
When the integer overflows, it is equivalent to wrapping around the circle. So calculating the carry is equivalent to calculating the number of times multiplication would wrap around the circle. To calculate the number of times we wrap around the circle we divide radians(x) by 2pi radians. e.g.
wrap(x) = radians(x) / (2*pi*rad)
= (x * (2*pi*rad / 2^n)) / (2*pi*rad / 1)
= (x * (2*pi*rad / 2^n)) * (1 / 2*pi*rad)
= x * 1 / 2^n
= x / 2^n
Which simplifies to
wrap(x) = x / 2^n
This makes sense. The number of times a number, for example, 15 with radix 10, wraps around is 15 / 10 = 1.5, or one and a half times. However, we can't use 2 digits here (assuming we are limited to a single 2^64 digit).
Say we have a * b, with radix R, we can calculate the carry with
Consider that: wrap(a * b) = a * wrap(b)
wrap(a * b) = (a * b) / R
a * wrap(b) = a * (b / R)
a * (b / R) = (a * b) / R
carry = floor(a * wrap(b))
Take for example a = 9 and b = 5, which are factors of 45 (i.e. 9 * 5 = 45).
wrap(5) = 5 / 10 = 0.5
a * wrap(5) = 9 * 0.5 = 4.5
carry = floor(9 * wrap(5)) = floor(4.5) = 4
Note that if the carry was 0, then we would not have had overflow, for example if a = 2, b=2.
In C/C++ (if the compiler and architecture supports it) we have to use long double.
Thus we have:
long double wrap = b / 18446744073709551616.0L; // this is b / 2^64
unsigned long carry = (unsigned long)(a * wrap); // floor(a * wrap(b))
bool overflow = carry > 0;
unsigned long c = a * b;
c here is the lower significant "digit", i.e. in base 10 9 * 9 = 81, carry = 8, and c = 1.
This was interesting to me in theory, so I thought I'd share it, but one major caveat is with the floating point precision in computers. Using long double, there may be rounding errors for some numbers when we calculate the wrap variable depending on how many significant digits your compiler/arch uses for long doubles, I believe it should be 20 more more to be sure. Another issue with this result, is that it may not perform as well as some of the other solutions simply by using floating points and division.
If you just want to detect overflow, how about converting to double, doing the multiplication and if
|x| < 2^53, convert to int64
|x| < 2^63, make the multiplication using int64
otherwise produce whatever error you want?
This seems to work:
int64_t safemult(int64_t a, int64_t b) {
double dx;
dx = (double)a * (double)b;
if ( fabs(dx) < (double)9007199254740992 )
return (int64_t)dx;
if ( (double)INT64_MAX < fabs(dx) )
return INT64_MAX;
return a*b;
}
/*
* isLessOrEqual - if x <= y then return 1, else return 0
* Example: isLessOrEqual(4,5) = 1.
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 24
* Rating: 3
*/
int isLessOrEqual(int x, int y)
{
int msbX = x>>31;
int msbY = y>>31;
int sum_xy = (y+(~x+1));
int twoPosAndNegative = (!msbX & !msbY) & sum_xy; //isLessOrEqual is FALSE.
// if = true, twoPosAndNegative = 1; Overflow true
// twoPos = Negative means y < x which means that this
int twoNegAndPositive = (msbX & msbY) & !sum_xy;//isLessOrEqual is FALSE
//We started with two negative numbers, and subtracted X, resulting in positive. Therefore, x is bigger.
int isEqual = (!x^!y); //isLessOrEqual is TRUE
return (twoPosAndNegative | twoNegAndPositive | isEqual);
}
Currently, I am trying to work through how to carry bits in this operator.
The purpose of this function is to identify whether or not int y >= int x.
This is part of a class assignment, so there are restrictions on casting and which operators I can use.
I'm trying to account for a carried bit by applying a mask of the complement of the MSB, to try and remove the most significant bit from the equation, so that they may overflow without causing an issue.
I am under the impression that, ignoring cases of overflow, the returned operator would work.
EDIT: Here is my adjusted code, still not working. But, I think this is progress? I feel like I'm chasing my own tail.
int isLessOrEqual(int x, int y)
{
int msbX = x >> 31;
int msbY = y >> 31;
int sign_xy_sum = (y + (~x + 1)) >> 31;
return ((!msbY & msbX) | (!sign_xy_sum & (!msbY | msbX)));
}
I figured it out with the assistance of one of my peers, alongside the commentators here on StackOverflow.
The solution is as seen above.
The asker has self-answered their question (a class assignment), so providing alternative solutions seems appropriate at this time. The question clearly assumes that integers are represented as two's complement numbers.
One approach is to consider how CPUs compute predicates for conditional branching by means of a compare instruction. "signed less than" as expressed in processor condition codes is SF ≠ OF. SF is the sign flag, a copy of the sign-bit, or most significant bit (MSB) of the result. OF is the overflow flag which indicates overflow in signed integer operations. This is computed as the XOR of the carry-in and the carry-out of the sign-bit or MSB. With two's complement arithmetic, a - b = a + ~b + 1, and therefore a < b = a + ~b < 0. It remains to separate computation on the sign bit (MSB) sufficiently from the lower order bits. This leads to the following code:
int isLessOrEqual (int a, int b)
{
int nb = ~b;
int ma = a & ((1U << (sizeof(a) * CHAR_BIT - 1)) - 1);
int mb = nb & ((1U << (sizeof(b) * CHAR_BIT - 1)) - 1);
// for the following, only the MSB is of interest, other bits are don't care
int cyin = ma + mb;
int ovfl = (a ^ cyin) & (a ^ b);
int sign = (a ^ nb ^ cyin);
int lteq = sign ^ ovfl;
// desired predicate is now in the MSB (sign bit) of lteq, extract it
return (int)((unsigned int)lteq >> (sizeof(lteq) * CHAR_BIT - 1));
}
The casting to unsigned int prior to the final right shift is necessary because right-shifting of signed integers with negative value is implementation-defined, per the ISO-C++ standard, section 5.8. Asker has pointed out that casts are not allowed. When right shifting signed integers, C++ compilers will generate either a logical right shift instruction, or an arithmetic right shift instruction. As we are only interested in extracting the MSB, we can isolate ourselves from the choice by shifting then masking out all other bits besides the LSB, at the cost of one additional operation:
return (lteq >> (sizeof(lteq) * CHAR_BIT - 1)) & 1;
The above solution requires a total of eleven or twelve basic operations. A significantly more efficient solution is based on the 1972 MIT HAKMEM memo, which contains the following observation:
ITEM 23 (Schroeppel): (A AND B) + (A OR B) = A + B = (A XOR B) + 2 (A AND B).
This is straightforward, as A AND B represent the carry bits, and A XOR B represent the sum bits. In a newsgroup posting to comp.arch.arithmetic on February 11, 2000, Peter L. Montgomery provided the following extension:
If XOR is available, then this can be used to average
two unsigned variables A and B when the sum might overflow:
(A+B)/2 = (A AND B) + (A XOR B)/2
In the context of this question, this allows us to compute (a + ~b) / 2 without overflow, then inspect the sign bit to see if the result is less than zero. While Montgomery only referred to unsigned integers, the extension to signed integers is straightforward by use of an arithmetic right shift, keeping in mind that right shifting is an integer division which rounds towards negative infinity, rather than towards zero as regular integer division.
int isLessOrEqual (int a, int b)
{
int nb = ~b;
// compute avg(a,~b) without overflow, rounding towards -INF; lteq(a,b) = SF
int lteq = (a & nb) + arithmetic_right_shift (a ^ nb, 1);
return (int)((unsigned int)lteq >> (sizeof(lteq) * CHAR_BIT - 1));
}
Unfortunately, C++ itself provides no portable way to code an arithmetic right shift, but we can emulate it fairly efficiently using this answer:
int arithmetic_right_shift (int a, int s)
{
unsigned int mask_msb = 1U << (sizeof(mask_msb) * CHAR_BIT - 1);
unsigned int ua = a;
ua = ua >> s;
mask_msb = mask_msb >> s;
return (int)((ua ^ mask_msb) - mask_msb);
}
When inlined, this adds just a couple of instructions to the code when the shift count is a compile-time constant. If the compiler documentation indicates that the implementation-defined handling of signed integers of negative value is accomplished via arithmetic right shift instruction, it is safe to simplify to this six-operation solution:
int isLessOrEqual (int a, int b)
{
int nb = ~b;
// compute avg(a,~b) without overflow, rounding towards -INF; lteq(a,b) = SF
int lteq = (a & nb) + ((a ^ nb) >> 1);
return (int)((unsigned int)lteq >> (sizeof(lteq) * CHAR_BIT - 1));
}
The previously made comments regarding use of a cast when converting the sign bit into a predicate apply here as well.
Imagine I have two unsigned bytes b and x. I need to calculate bsub as b - x and badd as b + x. However, I don't want underflow/overflow occur during these operations. For example (pseudo-code):
b = 3; x = 5;
bsub = b - x; // bsub must be 0, not 254
and
b = 250; x = 10;
badd = b + x; // badd must be 255, not 4
The obvious way to do this includes branching:
bsub = b - min(b, x);
badd = b + min(255 - b, x);
I just wonder if there are any better ways to do this, i.e. by some hacky bit manipulations?
The article Branchfree Saturating Arithmetic provides strategies for this:
Their addition solution is as follows:
u32b sat_addu32b(u32b x, u32b y)
{
u32b res = x + y;
res |= -(res < x);
return res;
}
modified for uint8_t:
uint8_t sat_addu8b(uint8_t x, uint8_t y)
{
uint8_t res = x + y;
res |= -(res < x);
return res;
}
and their subtraction solution is:
u32b sat_subu32b(u32b x, u32b y)
{
u32b res = x - y;
res &= -(res <= x);
return res;
}
modified for uint8_t:
uint8_t sat_subu8b(uint8_t x, uint8_t y)
{
uint8_t res = x - y;
res &= -(res <= x);
return res;
}
A simple method is to detect overflow and reset the value accordingly as below
bsub = b - x;
if (bsub > b)
{
bsub = 0;
}
badd = b + x;
if (badd < b)
{
badd = 255;
}
GCC can optimize the overflow check into a conditional assignment when compiling with -O2.
I measured how much optimization comparing with other solutions. With 1000000000+ operations on my PC, this solution and that of #ShafikYaghmour averaged 4.2 seconds, and that of #chux averaged 4.8 seconds. This solution is more readable as well.
For subtraction:
diff = (a - b)*(a >= b);
Addition:
sum = (a + b) | -(a > (255 - b))
Evolution
// sum = (a + b)*(a <= (255-b)); this fails
// sum = (a + b) | -(a <= (255 - b)) falis too
Thanks to #R_Kapp
Thanks to #NathanOliver
This exercise shows the value of simply coding.
sum = b + min(255 - b, a);
If you are using a recent enough version of gcc or clang (maybe also some others) you could use built-ins to detect overflow.
if (__builtin_add_overflow(a,b,&c))
{
c = UINT_MAX;
}
For addition:
unsigned temp = a+b; // temp>>8 will be 1 if overflow else 0
unsigned char c = temp | -(temp >> 8);
For subtraction:
unsigned temp = a-b; // temp>>8 will be 0xFF if neg-overflow else 0
unsigned char c = temp & ~(temp >> 8);
No comparison operators or multiplies required.
All can be done in unsigned byte arithmetic
// Addition without overflow
return (b > 255 - a) ? 255 : a + b
// Subtraction without underflow
return (b > a) ? 0 : a - b;
If you want to do this with two bytes, use the simplest code possible.
If you want to do this with twenty billion bytes, check what vector instructions are available on your processor and whether they can be used. You might find that your processor can do 32 of these operations with a single instruction.
You could also use the safe numerics library at Boost Library Incubator. It provides drop-in replacements for int, long, etc... which guarantee that you'll never get an undetected overflow, underflow, etc.
If you are willing to use assembly or intrinsics, I think I have an optimal solution.
For subtraction:
We can use the sbb instruction
In MSVC we can use the intrinsic function _subborrow_u64 (also available in other bit sizes).
Here is how it is used:
// *c = a - (b + borrow)
// borrow_flag is set to 1 if (a < (b + borrow))
borrow_flag = _subborrow_u64(borrow_flag, a, b, c);
Here is how we could apply it to your situation
uint64_t sub_no_underflow(uint64_t a, uint64_t b){
uint64_t result;
borrow_flag = _subborrow_u64(0, a, b, &result);
return result * !borrow_flag;
}
For addition:
We can use the adcx instruction
In MSVC we can use the intrinsic function _addcarry_u64 (also available in other bit sizes).
Here is how it is used:
// *c = a + b + carry
// carry_flag is set to 1 if there is a carry bit
carry_flag = _addcarry_u64(carry_flag, a, b, c);
Here is how we could apply it to your situation
uint64_t add_no_overflow(uint64_t a, uint64_t b){
uint64_t result;
carry_flag = _addcarry_u64(0, a, b, &result);
return !carry_flag * result - carry_flag;
}
I don't like this one as much as the subtraction one, but I think it is pretty nifty.
If the add overflows, carry_flag = 1. Not-ing carry_flag yields 0, so !carry_flag * result = 0 when there is overflow. And since 0 - 1 will set the unsigned integral value to its max, the function will return the result of the addition if there is no carry and return the max of the chosen integral value if there is carry.
what about this:
bsum = a + b;
bsum = (bsum < a || bsum < b) ? 255 : bsum;
bsub = a - b;
bsub = (bsub > a || bsub > b) ? 0 : bsub;
If you will call those methods a lot, the fastest way would be not bit manipulation but probably a look-up table. Define an array of length 511 for each operation.
Example for minus (subtraction)
static unsigned char maxTable[511];
memset(maxTable, 0, 255); // If smaller, emulates cutoff at zero
maxTable[255]=0; // If equal - return zero
for (int i=0; i<256; i++)
maxTable[255+i] = i; // If greater - return the difference
The array is static and initialized only once. Now your subtraction can be defined as inline method or using pre-compiler:
#define MINUS(A,B) maxTable[A-B+255];
How it works? Well you want to pre-calculate all possible subtractions for unsigned chars. The results vary from -255 to +255, total of 511 different result. We define an array of all possible results but because in C we cannot access it from negative indices we use +255 (in [A-B+255]). You can remove this action by defining a pointer to the center of the array.
const unsigned char *result = maxTable+255;
#define MINUS(A,B) result[A-B];
use it like:
bsub = MINUS(13,15); // i.e 13-15 with zero cutoff as requested
Note that the execution is extremely fast. Only one subtraction and one pointer deference to get the result. No branching. The static arrays are very short so they will be fully loaded into CPU's cache to further speed up the calculation
Same would work for addition but with a bit different table (first 256 elements will be the indices and last 255 elements will be equal to 255 to emulate the cutoff beyond 255.
If you insist on bits operation, the answers that use (a>b) are wrong. This still might be implemented as branching. Use the sign-bit technique
// (num1>num2) ? 1 : 0
#define is_int_biggerNotEqual( num1,num2) ((((__int32)((num2)-(num1)))&0x80000000)>>31)
Now you can use it for calculation of subtraction and addition.
If you want to emulate the functions max(), min() without branching use:
inline __int32 MIN_INT(__int32 x, __int32 y){ __int32 d=x-y; return y+(d&(d>>31)); }
inline __int32 MAX_INT(__int32 x, __int32 y){ __int32 d=x-y; return x-(d&(d>>31)); }
My examples above use 32 bits integers. You can change it to 64, though I believe that 32 bits calculations run a bit faster. Up to you
I want to calculate ab mod n for use in RSA decryption. My code (below) returns incorrect answers. What is wrong with it?
unsigned long int decrypt2(int a,int b,int n)
{
unsigned long int res = 1;
for (int i = 0; i < (b / 2); i++)
{
res *= ((a * a) % n);
res %= n;
}
if (b % n == 1)
res *=a;
res %=n;
return res;
}
You can try this C++ code. I've used it with 32 and 64-bit integers. I'm sure I got this from SO.
template <typename T>
T modpow(T base, T exp, T modulus) {
base %= modulus;
T result = 1;
while (exp > 0) {
if (exp & 1) result = (result * base) % modulus;
base = (base * base) % modulus;
exp >>= 1;
}
return result;
}
You can find this algorithm and related discussion in the literature on p. 244 of
Schneier, Bruce (1996). Applied Cryptography: Protocols, Algorithms, and Source Code in C, Second Edition (2nd ed.). Wiley. ISBN 978-0-471-11709-4.
Note that the multiplications result * base and base * base are subject to overflow in this simplified version. If the modulus is more than half the width of T (i.e. more than the square root of the maximum T value), then one should use a suitable modular multiplication algorithm instead - see the answers to Ways to do modulo multiplication with primitive types.
In order to calculate pow(a,b) % n to be used for RSA decryption, the best algorithm I came across is Primality Testing 1) which is as follows:
int modulo(int a, int b, int n){
long long x=1, y=a;
while (b > 0) {
if (b%2 == 1) {
x = (x*y) % n; // multiplying with base
}
y = (y*y) % n; // squaring the base
b /= 2;
}
return x % n;
}
See below reference for more details.
1) Primality Testing : Non-deterministic Algorithms – topcoder
Usually it's something like this:
while (b)
{
if (b % 2) { res = (res * a) % n; }
a = (a * a) % n;
b /= 2;
}
return res;
The only actual logic error that I see is this line:
if (b % n == 1)
which should be this:
if (b % 2 == 1)
But your overall design is problematic: your function performs O(b) multiplications and modulus operations, but your use of b / 2 and a * a implies that you were aiming to perform O(log b) operations (which is usually how modular exponentiation is done).
Doing the raw power operation is very costly, hence you can apply the following logic to simplify the decryption.
From here,
Now say we want to encrypt the message m = 7, c = m^e mod n = 7^3 mod 33
= 343 mod 33 = 13. Hence the ciphertext c = 13.
To check decryption we compute m' = c^d mod n = 13^7 mod 33 = 7. Note
that we don't have to calculate the full value of 13 to the power 7
here. We can make use of the fact that a = bc mod n = (b mod n).(c mod
n) mod n so we can break down a potentially large number into its
components and combine the results of easier, smaller calculations to
calculate the final value.
One way of calculating m' is as follows:- Note that any number can be
expressed as a sum of powers of 2. So first compute values of 13^2,
13^4, 13^8, ... by repeatedly squaring successive values modulo 33. 13^2
= 169 ≡ 4, 13^4 = 4.4 = 16, 13^8 = 16.16 = 256 ≡ 25. Then, since 7 = 4 + 2 + 1, we have m' = 13^7 = 13^(4+2+1) = 13^4.13^2.13^1 ≡ 16 x 4 x 13 = 832
≡ 7 mod 33
Are you trying to calculate (a^b)%n, or a^(b%n) ?
If you want the first one, then your code only works when b is an even number, because of that b/2. The "if b%n==1" is incorrect because you don't care about b%n here, but rather about b%2.
If you want the second one, then the loop is wrong because you're looping b/2 times instead of (b%n)/2 times.
Either way, your function is unnecessarily complex. Why do you loop until b/2 and try to multiply in 2 a's each time? Why not just loop until b and mulitply in one a each time. That would eliminate a lot of unnecessary complexity and thus eliminate potential errors. Are you thinking that you'll make the program faster by cutting the number of times through the loop in half? Frankly, that's a bad programming practice: micro-optimization. It doesn't really help much: You still multiply by a the same number of times, all you do is cut down on the number of times testing the loop. If b is typically small (like one or two digits), it's not worth the trouble. If b is large -- if it can be in the millions -- then this is insufficient, you need a much more radical optimization.
Also, why do the %n each time through the loop? Why not just do it once at the end?
Calculating pow(a,b) mod n
A key problem with OP's code is a * a. This is int overflow (undefined behavior) when a is large enough. The type of res is irrelevant in the multiplication of a * a.
The solution is to ensure either:
the multiplication is done with 2x wide math or
with modulus n, n*n <= type_MAX + 1
There is no reason to return a wider type than the type of the modulus as the result is always represent by that type.
// unsigned long int decrypt2(int a,int b,int n)
int decrypt2(int a,int b,int n)
Using unsigned math is certainly more suitable for OP's RSA goals.
Also see Modular exponentiation without range restriction
// (a^b)%n
// n != 0
// Test if unsigned long long at least 2x values bits as unsigned
#if ULLONG_MAX/UINT_MAX - 1 > UINT_MAX
unsigned decrypt2(unsigned a, unsigned b, unsigned n) {
unsigned long long result = 1u % n; // Insure result < n, even when n==1
while (b > 0) {
if (b & 1) result = (result * a) % n;
a = (1ULL * a * a) %n;
b >>= 1;
}
return (unsigned) result;
}
#else
unsigned decrypt2(unsigned a, unsigned b, unsigned n) {
// Detect if UINT_MAX + 1 < n*n
if (UINT_MAX/n < n-1) {
return TBD_code_with_wider_math(a,b,n);
}
a %= n;
unsigned result = 1u % n;
while (b > 0) {
if (b & 1) result = (result * a) % n;
a = (a * a) % n;
b >>= 1;
}
return result;
}
#endif
int's are generally not enough for RSA (unless you are dealing with small simplified examples)
you need a data type that can store integers up to 2256 (for 256-bit RSA keys) or 2512 for 512-bit keys, etc
Here is another way. Remember that when we find modulo multiplicative inverse of a under mod m.
Then
a and m must be coprime with each other.
We can use gcd extended for calculating modulo multiplicative inverse.
For computing ab mod m when a and b can have more than 105 digits then its tricky to compute the result.
Below code will do the computing part :
#include <iostream>
#include <string>
using namespace std;
/*
* May this code live long.
*/
long pow(string,string,long long);
long pow(long long ,long long ,long long);
int main() {
string _num,_pow;
long long _mod;
cin>>_num>>_pow>>_mod;
//cout<<_num<<" "<<_pow<<" "<<_mod<<endl;
cout<<pow(_num,_pow,_mod)<<endl;
return 0;
}
long pow(string n,string p,long long mod){
long long num=0,_pow=0;
for(char c: n){
num=(num*10+c-48)%mod;
}
for(char c: p){
_pow=(_pow*10+c-48)%(mod-1);
}
return pow(num,_pow,mod);
}
long pow(long long a,long long p,long long mod){
long res=1;
if(a==0)return 0;
while(p>0){
if((p&1)==0){
p/=2;
a=(a*a)%mod;
}
else{
p--;
res=(res*a)%mod;
}
}
return res;
}
This code works because ab mod m can be written as (a mod m)b mod m-1 mod m.
Hope it helped { :)
use fast exponentiation maybe..... gives same o(log n) as that template above
int power(int base, int exp,int mod)
{
if(exp == 0)
return 1;
int p=power(base, exp/2,mod);
p=(p*p)% mod;
return (exp%2 == 0)?p:(base * p)%mod;
}
This(encryption) is more of an algorithm design problem than a programming one. The important missing part is familiarity with modern algebra. I suggest that you look for a huge optimizatin in group theory and number theory.
If n is a prime number, pow(a,n-1)%n==1 (assuming infinite digit integers).So, basically you need to calculate pow(a,b%(n-1))%n; According to group theory, you can find e such that every other number is equivalent to a power of e modulo n. Therefore the range [1..n-1] can be represented as a permutation on powers of e. Given the algorithm to find e for n and logarithm of a base e, calculations can be significantly simplified. Cryptography needs a tone of math background; I'd rather be off that ground without enough background.
For my code a^k mod n in php:
function pmod(a, k, n)
{
if (n==1) return 0;
power = 1;
for(i=1; i<=k; $i++)
{
power = (power*a) % n;
}
return power;
}
#include <cmath>
...
static_cast<int>(std::pow(a,b))%n
but my best bet is you are overflowing int (IE: the number is two large for the int) on the power I had the same problem creating the exact same function.
I'm using this function:
int CalculateMod(int base, int exp ,int mod){
int result;
result = (int) pow(base,exp);
result = result % mod;
return result;
}
I parse the variable result because pow give you back a double, and for using mod you need two variables of type int, anyway, in a RSA decryption, you should just use integer numbers.
One of my pet hates of C-derived languages (as a mathematician) is that
(-1) % 8 // comes out as -1, and not 7
fmodf(-1,8) // fails similarly
What's the best solution?
C++ allows the possibility of templates and operator overloading, but both of these are murky waters for me. examples gratefully received.
First of all I'd like to note that you cannot even rely on the fact that (-1) % 8 == -1. the only thing you can rely on is that (x / y) * y + ( x % y) == x. However whether or not the remainder is negative is implementation-defined.
Reference: C++03 paragraph 5.6 clause 4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined.
Here it follows a version that handles both negative operands so that the result of the subtraction of the remainder from the divisor can be subtracted from the dividend so it will be floor of the actual division. mod(-1,8) results in 7, while mod(13, -8) is -3.
int mod(int a, int b)
{
if(b < 0) //you can check for b == 0 separately and do what you want
return -mod(-a, -b);
int ret = a % b;
if(ret < 0)
ret+=b;
return ret;
}
Here is a C function that handles positive OR negative integer OR fractional values for BOTH OPERANDS
#include <math.h>
float mod(float a, float N) {return a - N*floor(a/N);} //return in range [0, N)
This is surely the most elegant solution from a mathematical standpoint. However, I'm not sure if it is robust in handling integers. Sometimes floating point errors creep in when converting int -> fp -> int.
I am using this code for non-int s, and a separate function for int.
NOTE: need to trap N = 0!
Tester code:
#include <math.h>
#include <stdio.h>
float mod(float a, float N)
{
float ret = a - N * floor (a / N);
printf("%f.1 mod %f.1 = %f.1 \n", a, N, ret);
return ret;
}
int main (char* argc, char** argv)
{
printf ("fmodf(-10.2, 2.0) = %f.1 == FAIL! \n\n", fmodf(-10.2, 2.0));
float x;
x = mod(10.2f, 2.0f);
x = mod(10.2f, -2.0f);
x = mod(-10.2f, 2.0f);
x = mod(-10.2f, -2.0f);
return 0;
}
(Note: You can compile and run it straight out of CodePad: http://codepad.org/UOgEqAMA)
Output:
fmodf(-10.2, 2.0) = -0.20 == FAIL!
10.2 mod 2.0 = 0.2
10.2 mod -2.0 = -1.8
-10.2 mod 2.0 = 1.8
-10.2 mod -2.0 = -0.2
I have just noticed that Bjarne Stroustrup labels % as the remainder operator, not the modulo operator.
I would bet that this is its formal name in the ANSI C & C++ specifications, and that abuse of terminology has crept in. Does anyone know this for a fact?
But if this is the case then C's fmodf() function (and probably others) are very misleading. they should be labelled fremf(), etc
The simplest general function to find the positive modulo would be this-
It would work on both positive and negative values of x.
int modulo(int x,int N){
return (x % N + N) %N;
}
For integers this is simple. Just do
(((x < 0) ? ((x % N) + N) : x) % N)
where I am supposing that N is positive and representable in the type of x. Your favorite compiler should be able to optimize this out, such that it ends up in just one mod operation in assembler.
The best solution ¹for a mathematician is to use Python.
C++ operator overloading has little to do with it. You can't overload operators for built-in types. What you want is simply a function. Of course you can use C++ templating to implement that function for all relevant types with just 1 piece of code.
The standard C library provides fmod, if I recall the name correctly, for floating point types.
For integers you can define a C++ function template that always returns non-negative remainder (corresponding to Euclidian division) as ...
#include <stdlib.h> // abs
template< class Integer >
auto mod( Integer a, Integer b )
-> Integer
{
Integer const r = a%b;
return (r < 0? r + abs( b ) : r);
}
... and just write mod(a, b) instead of a%b.
Here the type Integer needs to be a signed integer type.
If you want the common math behavior where the sign of the remainder is the same as the sign of the divisor, then you can do e.g.
template< class Integer >
auto floor_div( Integer const a, Integer const b )
-> Integer
{
bool const a_is_negative = (a < 0);
bool const b_is_negative = (b < 0);
bool const change_sign = (a_is_negative != b_is_negative);
Integer const abs_b = abs( b );
Integer const abs_a_plus = abs( a ) + (change_sign? abs_b - 1 : 0);
Integer const quot = abs_a_plus / abs_b;
return (change_sign? -quot : quot);
}
template< class Integer >
auto floor_mod( Integer const a, Integer const b )
-> Integer
{ return a - b*floor_div( a, b ); }
… with the same constraint on Integer, that it's a signed type.
¹ Because Python's integer division rounds towards negative infinity.
Here's a new answer to an old question, based on this Microsoft Research paper and references therein.
Note that from C11 and C++11 onwards, the semantics of div has become truncation towards zero (see [expr.mul]/4). Furthermore, for D divided by d, C++11 guarantees the following about the quotient qT and remainder rT
auto const qT = D / d;
auto const rT = D % d;
assert(D == d * qT + rT);
assert(abs(rT) < abs(d));
assert(signum(rT) == signum(D) || rT == 0);
where signum maps to -1, 0, +1, depending on whether its argument is <, ==, > than 0 (see this Q&A for source code).
With truncated division, the sign of the remainder is equal to the sign of the dividend D, i.e. -1 % 8 == -1. C++11 also provides a std::div function that returns a struct with members quot and rem according to truncated division.
There are other definitions possible, e.g. so-called floored division can be defined in terms of the builtin truncated division
auto const I = signum(rT) == -signum(d) ? 1 : 0;
auto const qF = qT - I;
auto const rF = rT + I * d;
assert(D == d * qF + rF);
assert(abs(rF) < abs(d));
assert(signum(rF) == signum(d));
With floored division, the sign of the remainder is equal to the sign of the divisor d. In languages such as Haskell and Oberon, there are builtin operators for floored division. In C++, you'd need to write a function using the above definitions.
Yet another way is Euclidean division, which can also be defined in terms of the builtin truncated division
auto const I = rT >= 0 ? 0 : (d > 0 ? 1 : -1);
auto const qE = qT - I;
auto const rE = rT + I * d;
assert(D == d * qE + rE);
assert(abs(rE) < abs(d));
assert(signum(rE) >= 0);
With Euclidean division, the sign of the remainder is always non-negative.
Oh, I hate % design for this too....
You may convert dividend to unsigned in a way like:
unsigned int offset = (-INT_MIN) - (-INT_MIN)%divider
result = (offset + dividend) % divider
where offset is closest to (-INT_MIN) multiple of module, so adding and subtracting it will not change modulo. Note that it have unsigned type and result will be integer. Unfortunately it cannot correctly convert values INT_MIN...(-offset-1) as they cause arifmetic overflow. But this method have advandage of only single additional arithmetic per operation (and no conditionals) when working with constant divider, so it is usable in DSP-like applications.
There's special case, where divider is 2N (integer power of two), for which modulo can be calculated using simple arithmetic and bitwise logic as
dividend&(divider-1)
for example
x mod 2 = x & 1
x mod 4 = x & 3
x mod 8 = x & 7
x mod 16 = x & 15
More common and less tricky way is to get modulo using this function (works only with positive divider):
int mod(int x, int y) {
int r = x%y;
return r<0?r+y:r;
}
This just correct result if it is negative.
Also you may trick:
(p%q + q)%q
It is very short but use two %-s which are commonly slow.
I believe another solution to this problem would be use to variables of type long instead of int.
I was just working on some code where the % operator was returning a negative value which caused some issues (for generating uniform random variables on [0,1] you don't really want negative numbers :) ), but after switching the variables to type long, everything was running smoothly and the results matched the ones I was getting when running the same code in python (important for me as I wanted to be able to generate the same "random" numbers across several platforms.
For a solution that uses no branches and only 1 mod, you can do the following
// Works for other sizes too,
// assuming you change 63 to the appropriate value
int64_t mod(int64_t x, int64_t div) {
return (x % div) + (((x >> 63) ^ (div >> 63)) & div);
}
/* Warning: macro mod evaluates its arguments' side effects multiple times. */
#define mod(r,m) (((r) % (m)) + ((r)<0)?(m):0)
... or just get used to getting any representative for the equivalence class.
Example template for C++
template< class T >
T mod( T a, T b )
{
T const r = a%b;
return ((r!=0)&&((r^b)<0) ? r + b : r);
}
With this template, the returned remainder will be zero or have the same sign as the divisor (denominator) (the equivalent of rounding towards negative infinity), instead of the C++ behavior of the remainder being zero or having the same sign as the dividend (numerator) (the equivalent of rounding towards zero).
define MOD(a, b) ((((a)%(b))+(b))%(b))
unsigned mod(int a, unsigned b) {
return (a >= 0 ? a % b : b - (-a) % b);
}
This solution (for use when mod is positive) avoids taking negative divide or remainder operations all together:
int core_modulus(int val, int mod)
{
if(val>=0)
return val % mod;
else
return val + mod * ((mod - val - 1)/mod);
}
I would do:
((-1)+8) % 8
This adds the latter number to the first before doing the modulo giving 7 as desired. This should work for any number down to -8. For -9 add 2*8.