Why does C++ output negative numbers when using modulo? - c++

Math:
If you have an equation like this:
x = 3 mod 7
x could be ... -4, 3, 10, 17, ..., or more generally:
x = 3 + k * 7
where k can be any integer. I don't know of a modulo operation is defined for math, but the factor ring certainly is.
Python:
In Python, you will always get non-negative values when you use % with a positive m:
#!/usr/bin/python
# -*- coding: utf-8 -*-
m = 7
for i in xrange(-8, 10 + 1):
print(i % 7)
Results in:
6 0 1 2 3 4 5 6 0 1 2 3 4 5 6 0 1 2 3
C++:
#include <iostream>
using namespace std;
int main(){
int m = 7;
for(int i=-8; i <= 10; i++) {
cout << (i % m) << endl;
}
return 0;
}
Will output:
-1 0 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 0 1 2 3
ISO/IEC 14882:2003(E) - 5.6 Multiplicative operators:
The binary / operator yields the quotient, and the binary % operator
yields the remainder from the division of the first expression by the
second. If the second operand of / or % is zero the behavior is
undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are
nonnegative then the remainder is nonnegative; if not, the sign of the
remainder is implementation-defined 74).
and
74) According to work underway toward the revision of ISO C, the
preferred algorithm for integer division follows the rules defined in
the ISO Fortran standard, ISO/IEC 1539:1991, in which the quotient is
always rounded toward zero.
Source: ISO/IEC 14882:2003(E)
(I couldn't find a free version of ISO/IEC 1539:1991. Does anybody know where to get it from?)
The operation seems to be defined like this:
Question:
Does it make sense to define it like that?
What are arguments for this specification? Is there a place where the people who create such standards discuss about it? Where I can read something about the reasons why they decided to make it this way?
Most of the time when I use modulo, I want to access elements of a datastructure. In this case, I have to make sure that mod returns a non-negative value. So, for this case, it would be good of mod always returned a non-negative value.
(Another usage is the Euclidean algorithm. As you could make both numbers positive before using this algorithm, the sign of modulo would matter.)
Additional material:
See Wikipedia for a long list of what modulo does in different languages.

On x86 (and other processor architectures), integer division and modulo are carried out by a single operation, idiv (div for unsigned values), which produces both quotient and remainder (for word-sized arguments, in AX and DX respectively). This is used in the C library function divmod, which can be optimised by the compiler to a single instruction!
Integer division respects two rules:
Non-integer quotients are rounded towards zero; and
the equation dividend = quotient*divisor + remainder is satisfied by the results.
Accordingly, when dividing a negative number by a positive number, the quotient will be negative (or zero).
So this behaviour can be seen as the result of a chain of local decisions:
Processor instruction set design optimises for the common case (division) over the less common case (modulo);
Consistency (rounding towards zero, and respecting the division equation) is preferred over mathematical correctness;
C prefers efficiency and simplicitly (especially given the tendency to view C as a "high level assembler"); and
C++ prefers compatibility with C.

Back in the day, someone designing the x86 instruction set decided it was right and good to round integer division toward zero rather than round down. (May the fleas of a thousand camels nest in his mother's beard.) To keep some semblance of math-correctness, operator REM, which is pronounced "remainder", had to behave accordingly. DO NOT read this: https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_73/rzatk/REM.htm
I warned you. Later someone doing the C spec decided it would be conforming for a compiler to do it either the right way or the x86 way. Then a committee doing the C++ spec decided to do it the C way. Then later yet, after this question was posted, a C++ committee decided to standardize on the wrong way. Now we are stuck with it. Many a programmer has written the following function or something like it. I have probably done it at least a dozen times.
inline int mod(int a, int b) {int ret = a%b; return ret>=0? ret: ret+b; }
There goes your efficiency.
These days I use essentially the following, with some type_traits stuff thrown in. (Thanks to Clearer for a comment that gave me an idea for an improvement using latter day C++. See below.)
<strike>template<class T>
inline T mod(T a, T b) {
assert(b > 0);
T ret = a%b;
return (ret>=0)?(ret):(ret+b);
}</strike>
template<>
inline unsigned mod(unsigned a, unsigned b) {
assert(b > 0);
return a % b;
}
True fact: I lobbied the Pascal standards committee to do mod the right way until they relented. To my horror, they did integer division the wrong way. So they do not even match.
EDIT: Clearer gave me an idea. I am working on a new one.
#include <type_traits>
template<class T1, class T2>
inline T1 mod(T1 a, T2 b) {
assert(b > 0);
T1 ret = a % b;
if constexpr ( std::is_unsigned_v<T1>)
{
return ret;
} else {
return (ret >= 0) ? (ret) : (ret + b);
}
}

What are arguments for this specification?
One of the design goals of C++ is to map efficiently to hardware. If the underlying hardware implements division in a way that produces negative remainders, then that's what you'll get if you use % in C++. That's all there is to it really.
Is there a place where the people who create such standards discuss about it?
You will find interesting discussions on comp.lang.c++.moderated and, to a lesser extent, comp.lang.c++

Others have described the why well enough and unfortunately the question which asks for a solution is marked a duplicate of this one and a comprehensive answer on that aspect seems to be missing. There seem to be 2 commonly used general solutions and one special-case I would like to include:
// 724ms
inline int mod1(int a, int b)
{
const int r = a % b;
return r < 0 ? r + b : r;
}
// 759ms
inline int mod2(int a, int b)
{
return (a % b + b) % b;
}
// 671ms (see NOTE1!)
inline int mod3(int a, int b)
{
return (a + b) % b;
}
int main(int argc, char** argv)
{
volatile int x;
for (int i = 0; i < 10000000; ++i) {
for (int j = -argc + 1; j < argc; ++j) {
x = modX(j, argc);
if (x < 0) return -1; // Sanity check
}
}
}
NOTE1: This is not generally correct (i.e. if a < -b). The reason I included it is because almost every time I find myself taking the modulus of a negative number is when doing math with numbers that are already modded, for example (i1 - i2) % n where the 0 <= iX < n (e.g. indices of a circular buffer).
As always, YMMV with regards to timing.

Related

Dividing two integers and rounding up the result, without using floating point

I need to divide two numbers and round it up. Are there any better way to do this?
int myValue = (int) ceil( (float)myIntNumber / myOtherInt );
I find an overkill to have to cast two different time. (the extern int cast is just to shut down the warning)
Note I have to cast internally to float otherwise
int a = ceil(256/11); //> Should be 24, but it is 23
^example
Assuming that both myIntNumber and myOtherInt are positive, you could do:
int myValue = (myIntNumber + myOtherInt - 1) / myOtherInt;
With help from DyP, came up with the following branchless formula:
int idiv_ceil ( int numerator, int denominator )
{
return numerator / denominator
+ (((numerator < 0) ^ (denominator > 0)) && (numerator%denominator));
}
It avoids floating-point conversions and passes a basic suite of unit tests, as shown here:
http://ideone.com/3OrviU
Here's another version that avoids the modulo operator.
int idiv_ceil ( int numerator, int denominator )
{
int truncated = numerator / denominator;
return truncated + (((numerator < 0) ^ (denominator > 0)) &&
(numerator - truncated*denominator));
}
http://ideone.com/Z41G5q
The first one will be faster on processors where IDIV returns both quotient and remainder (and the compiler is smart enough to use that).
Maybe it is just easier to do a:
int result = dividend / divisor;
if(dividend % divisor != 0)
result++;
Benchmarks
Since a lot of different methods are shown in the answers and none of the answers actually prove any advantages in terms of performance I tried to benchmark them myself. My plan was to write an answer that contains a short table and a definite answer which method is the fastest.
Unfortunately it wasn't that simple. (It never is.) It seems that the performance of the rounding formulas depend on the used data type, compiler and optimization level.
In one case there is an increase of speed by 7.5× from one method to another. So the impact can be significant for some people.
TL;DR
For long integers the naive version using a type cast to float and std::ceil was actually the fastest. This was interesting for me personally since I intended to use it with size_t which is usually defined as unsigned long.
For ordinary ints it depends on your optimization level. For lower levels #Jwodder's solution performs best. For the highest levels std::ceil was the optimal one. With one exception: For the clang/unsigned int combination #Jwodder's was still better.
The solutions from the accepted answer never really outperformed the other two. You should keep in mind however that #Jwodder's solution doesn't work with negatives.
Results are at the bottom.
Implementations
To recap here are the four methods I benchmarked and compared:
#Jwodder's version (Jwodder)
template<typename T>
inline T divCeilJwodder(const T& numerator, const T& denominator)
{
return (numerator + denominator - 1) / denominator;
}
#Ben Voigt's version using modulo (VoigtModulo)
template<typename T>
inline T divCeilVoigtModulo(const T& numerator, const T& denominator)
{
return numerator / denominator + (((numerator < 0) ^ (denominator > 0))
&& (numerator%denominator));
}
#Ben Voigt's version without using modulo (VoigtNoModulo)
inline T divCeilVoigtNoModulo(const T& numerator, const T& denominator)
{
T truncated = numerator / denominator;
return truncated + (((numerator < 0) ^ (denominator > 0))
&& (numerator - truncated*denominator));
}
OP's implementation (TypeCast)
template<typename T>
inline T divCeilTypeCast(const T& numerator, const T& denominator)
{
return (int)std::ceil((double)numerator / denominator);
}
Methodology
In a single batch the division is performed 100 million times. Ten batches are calculated for each combination of Compiler/Optimization level, used data type and used implementation. The values shown below are the averages of all 10 batches in milliseconds. The errors that are given are standard deviations.
The whole source code that was used can be found here. Also you might find this script useful which compiles and executes the source with different compiler flags.
The whole benchmark was performed on a i7-7700K. The used compiler versions were GCC 10.2.0 and clang 11.0.1.
Results
Now without further ado here are the results:
DataTypeAlgorithm
GCC-O0
-O1
-O2
-O3
-Os
-Ofast
-Og
clang-O0
-O1
-O2
-O3
-Ofast
-Os
-Oz
unsigned
Jwodder
264.1 ± 0.9 🏆
175.2 ± 0.9 🏆
153.5 ± 0.7 🏆
175.2 ± 0.5 🏆
153.3 ± 0.5
153.4 ± 0.8
175.5 ± 0.6 🏆
329.4 ± 1.3 🏆
220.0 ± 1.3 🏆
146.2 ± 0.6 🏆
146.2 ± 0.6 🏆
146.0 ± 0.5 🏆
153.2 ± 0.3 🏆
153.5 ± 0.6 🏆
VoigtModulo
528.5 ± 2.5
306.5 ± 1.0
175.8 ± 0.7
175.2 ± 0.5 🏆
175.6 ± 0.7
175.4 ± 0.6
352.0 ± 1.0
588.9 ± 6.4
408.7 ± 1.5
164.8 ± 1.0
164.0 ± 0.4
164.1 ± 0.4
175.2 ± 0.5
175.8 ± 0.9
VoigtNoModulo
375.3 ± 1.5
175.7 ± 1.3 🏆
192.5 ± 1.4
197.6 ± 1.9
200.6 ± 7.2
176.1 ± 1.5
197.9 ± 0.5
541.0 ± 1.8
263.1 ± 0.9
186.4 ± 0.6
186.4 ± 1.2
187.2 ± 1.1
197.2 ± 0.8
197.1 ± 0.7
TypeCast
348.5 ± 2.7
231.9 ± 3.9
234.4 ± 1.3
226.6 ± 1.0
137.5 ± 0.8 🏆
138.7 ± 1.7 🏆
243.8 ± 1.4
591.2 ± 2.4
591.3 ± 2.6
155.8 ± 1.9
155.9 ± 1.6
155.9 ± 2.4
214.6 ± 1.9
213.6 ± 1.1
unsigned long
Jwodder
658.6 ± 2.5
546.3 ± 0.9
549.3 ± 1.8
549.1 ± 2.8
540.6 ± 3.4
548.8 ± 1.3
486.1 ± 1.1
638.1 ± 1.8
613.3 ± 2.1
190.0 ± 0.8 🏆
182.7 ± 0.5
182.4 ± 0.5
496.2 ± 1.3
554.1 ± 1.0
VoigtModulo
1,169.0 ± 2.9
1,015.9 ± 4.4
550.8 ± 2.0
504.0 ± 1.4
550.3 ± 1.2
550.5 ± 1.3
1,020.1 ± 2.9
1,259.0 ± 9.0
1,136.5 ± 4.2
187.0 ± 3.4 🏆
199.7 ± 6.1
197.6 ± 1.0
549.4 ± 1.7
506.8 ± 4.4
VoigtNoModulo
768.1 ± 1.7
559.1 ± 1.8
534.4 ± 1.6
533.7 ± 1.5
559.5 ± 1.7
534.3 ± 1.5
571.5 ± 1.3
879.5 ± 10.8
617.8 ± 2.1
223.4 ± 1.3
231.3 ± 1.3
231.4 ± 1.1
594.6 ± 1.9
572.2 ± 0.8
TypeCast
353.3 ± 2.5 🏆
267.5 ± 1.7 🏆
248.0 ± 1.6 🏆
243.8 ± 1.2 🏆
154.2 ± 0.8 🏆
154.1 ± 1.0 🏆
263.8 ± 1.8 🏆
365.5 ± 1.6 🏆
316.9 ± 1.8 🏆
189.7 ± 2.1 🏆
156.3 ± 1.8 🏆
157.0 ± 2.2 🏆
155.1 ± 0.9 🏆
176.5 ± 1.2 🏆
int
Jwodder
307.9 ± 1.3 🏆
175.4 ± 0.9 🏆
175.3 ± 0.5 🏆
175.4 ± 0.6 🏆
175.2 ± 0.5
175.1 ± 0.6
175.1 ± 0.5 🏆
307.4 ± 1.2 🏆
219.6 ± 0.6 🏆
146.0 ± 0.3 🏆
153.5 ± 0.5
153.6 ± 0.8
175.4 ± 0.7 🏆
175.2 ± 0.5 🏆
VoigtModulo
528.5 ± 1.9
351.9 ± 4.6
175.3 ± 0.6 🏆
175.2 ± 0.4 🏆
197.1 ± 0.6
175.2 ± 0.8
373.5 ± 1.1
598.7 ± 5.1
460.6 ± 1.3
175.4 ± 0.4
164.3 ± 0.9
164.0 ± 0.4
176.3 ± 1.6 🏆
460.0 ± 0.8
VoigtNoModulo
398.0 ± 2.5
241.0 ± 0.7
199.4 ± 5.1
219.2 ± 1.0
175.9 ± 1.2
197.7 ± 1.2
242.9 ± 3.0
543.5 ± 2.3
350.6 ± 1.0
186.6 ± 1.2
185.7 ± 0.3
186.3 ± 1.1
197.1 ± 0.6
373.3 ± 1.6
TypeCast
338.8 ± 4.9
228.1 ± 0.9
230.3 ± 0.8
229.5 ± 9.4
153.8 ± 0.4 🏆
138.3 ± 1.0 🏆
241.1 ± 1.1
590.0 ± 2.1
589.9 ± 0.8
155.2 ± 2.4
149.4 ± 1.6 🏆
148.4 ± 1.2 🏆
214.6 ± 2.2
211.7 ± 1.6
long
Jwodder
758.1 ± 1.8
600.6 ± 0.9
601.5 ± 2.2
601.5 ± 2.8
581.2 ± 1.9
600.6 ± 1.8
586.3 ± 3.6
745.9 ± 3.6
685.8 ± 2.2
183.1 ± 1.0
182.5 ± 0.5
182.6 ± 0.6
553.2 ± 1.5
488.0 ± 0.8
VoigtModulo
1,360.8 ± 6.1
1,202.0 ± 2.1
600.0 ± 2.4
600.0 ± 3.0
607.0 ± 6.8
599.0 ± 1.6
1,187.2 ± 2.6
1,439.6 ± 6.7
1,346.5 ± 2.9
197.9 ± 0.7
208.2 ± 0.6
208.0 ± 0.4
548.9 ± 1.4
1,326.4 ± 3.0
VoigtNoModulo
844.5 ± 6.9
647.3 ± 1.3
628.9 ± 1.8
627.9 ± 1.6
629.1 ± 2.4
629.6 ± 4.4
668.2 ± 2.7
1,019.5 ± 3.2
715.1 ± 8.2
224.3 ± 4.8
219.0 ± 1.0
219.0 ± 0.6
561.7 ± 2.5
769.4 ± 9.3
TypeCast
366.1 ± 0.8 🏆
246.2 ± 1.1 🏆
245.3 ± 1.8 🏆
244.6 ± 1.1 🏆
154.6 ± 1.6 🏆
154.3 ± 0.5 🏆
257.4 ± 1.5 🏆
591.8 ± 4.1 🏆
590.4 ± 1.3 🏆
154.5 ± 1.3 🏆
135.4 ± 8.3 🏆
132.9 ± 0.7 🏆
132.8 ± 1.2 🏆
177.4 ± 2.3 🏆
Now I can finally get on with my life :P
Integer division with round-up.
Only 1 division executed per call, no % or * or conversion to/from floating point, works for positive and negative int. See note (1).
n (numerator) = OPs myIntNumber;
d (denominator) = OPs myOtherInt;
The following approach is simple. int division rounds toward 0. For negative quotients, this is a round up so nothing special is needed. For positive quotients, add d-1 to effect a round up, then perform an unsigned division.
Note (1) The usual divide by 0 blows things up and MININT/-1 fails as expected on 2's compliment machines.
int IntDivRoundUp(int n, int d) {
// If n and d are the same sign ...
if ((n < 0) == (d < 0)) {
// If n (and d) are negative ...
if (n < 0) {
n = -n;
d = -d;
}
// Unsigned division rounds down. Adding d-1 to n effects a round up.
return (((unsigned) n) + ((unsigned) d) - 1)/((unsigned) d);
}
else {
return n/d;
}
}
[Edit: test code removed, see earlier rev as needed]
Just use
int ceil_of_division = ((dividend-1)/divisor)+1;
For example:
for (int i=0;i<20;i++)
std::cout << i << "/8 = " << ((i-1)/8)+1 << std::endl;
A small hack is to do:
int divideUp(int a, int b) {
result = (a-1)/b + 1;
}
// Proof:
a = b*N + k (always)
if k == 0, then
(a-1) == b*N - 1
(a-1)/b == N - 1
(a-1)/b + 1 == N ---> Good !
if k > 0, then
(a-1) == b*N + l
(a-1)/b == N
(a-1)/b + 1 == N+1 ---> Good !
Instead of using the ceil function before casting to int, you can add a constant which is very nearly (but not quite) equal to 1 - this way, nearly anything (except a value which is exactly or incredibly close to an actual integer) will be increased by one before it is truncated.
Example:
#define EPSILON (0.9999)
int myValue = (int)(((float)myIntNumber)/myOtherInt + EPSILON);
EDIT: after seeing your response to the other post, I want to clarify that this will round up, not away from zero - negative numbers will become less negative, and positive numbers will become more positive.

Why is this Ruby code far faster than the equivalent C++ code?

Recently I've been going through some easy project Euler problems and solving them in Ruby and C++. But for Problem 14 concerning the Collatz conjecture, my C++ code went on for about half an hour before I terminated it, though when I translated the code into Ruby, it solved it in nine seconds.
That difference is quite unbelievable to me - I had always been led to believe that C++ was almost always faster than Ruby, especially for mathematical process.
My code is as follows.
C++:
#include <iostream>
using namespace std;
int main ()
{
int a = 2;
int b = 2;
int c = 0;
while (b < 1000000)
{
a = b;
int d = 2;
while (a != 4)
{
if (a % 2 == 0)
a /= 2;
else
a = 3*a + 1;
d++;
}
if (d > c)
{
cout << b << ' ' << d << endl;
c=d;
}
b++;
}
cout << c;
return 0;
}
Run time - I honestly don't know, but it's a really REALLY long time.
and Ruby:
#!/usr/bin/ruby -w
a = 0
b = 2
c = 0
while b < 1000000
a = b;
d = 2
while a != 4
if a % 2 == 0
a /= 2
else
a = 3*a + 1
end
d+=1
end
if d > c
p b,d
c=d
end
b+=1
end
p c
Run time - approximately 9 seconds.
Any idea what's going on here?
P.S. the C++ code runs a good deal faster than the Ruby code until it hits 100,000.
You're overflowing int, so it's not terminating. Use int64_t instead of int in your c++ code.
You'll probably need to include stdint.h for that..
In your case the problem was a bug in C++ implementation (numeric overflow).
Note however that trading in some memory you can get the result much faster than you're doing...
Hint: suppose you find that from number 231 you need 127 steps to end the computation, and suppose that starting from another number you get to 231 after 22 steps... how many more steps do you need to do?
With 32-bit arithmetic, the C++ code overflows on a = 3*a + 1. With signed 32-bit arithmetic, the problem is compounded, because the a /= 2 line will preserve the sign bit.
This makes it much harder for a to ever equal 4, and indeed when b reaches 113383, a overflows and the loop never ends.
With 64-bit arithmetic there is no overflow, because a maxes out at 56991483520, when b is 704511.
Without looking at the math, I speculate that unsigned 32-bit arithmetic will "probably" work, because the multiplication and unsigned division will both work modulo 2^32. And given the short running time of the program, values aren't going to cover too much of the 64-bit spectrum, so if a value is equal to 4 modulo 2^32, it's "probably" actually equal to 4.

Need an efficient subtraction algorithm modulo a number

For given numbers x,y and n, I would like to calculate x-y mod n in C. Look at this example:
int substract_modulu(int x, int y, int n)
{
return (x-y) % n;
}
As long as x>y, we are fine. In the other case, however, the modulu operation is undefined.
You can think of x,y,n>0. I would like the result to be positive, so if (x-y)<0, then ((x-y)-substract_modulu(x,y,n))/ n shall be an integer.
What is the fastest algorithm you know for that? Is there one which avoids any calls of if and operator??
As many have pointed out, in current C and C++ standards, x % n is no longer implementation-defined for any values of x and n. It is undefined behaviour in the cases where x / n is undefined [1]. Also, x - y is undefined behaviour in the case of integer overflow, which is possible if the signs of x and y might differ.
So the main problem for a general solution is avoiding integer overflow, either in the division or the subtraction. If we know that x and y are non-negative and n is positive, then overflow and division by zero are not possible, and we can confidently say that (x - y) % n is defined. Unfortunately, x - y might be negative, in which case so will be the result of the % operator.
It's easy to correct for the result being negative if we know that n is positive; all we have to do is unconditionally add n and do another modulo operation. That's unlikely to be the best solution, unless you have a computer where division is faster than branching.
If a conditional load instruction is available (pretty common these days), then the compiler will probably do well with the following code, which is portable and well-defined, subject to the constraints that x,y ≥ 0 ∧ n > 0:
((x - y) % n) + ((x >= y) ? 0 : n)
For example, gcc produces this code for my core I5 (although it's generic enough to work on any non-Paleozoic intel chip):
idivq %rcx
cmpq %rsi, %rdi
movl $0, %eax
cmovge %rax, %rcx
leaq (%rdx,%rcx), %rax
which is cheerfully branch-free. (Conditional move is usually a lot faster than branching.)
Another way of doing this would be (except that the function sign needs to be written):
((x - y) % n) + (sign(x - y) & (unsigned long)n)
where sign is all 1s if its argument is negative, and otherwise 0. One possible implementation of sign (adapted from bithacks) is
unsigned long sign(unsigned long x) {
return x >> (sizeof(long) * CHAR_BIT - 1);
}
This is portable (casting negative integer values to unsigned is defined), but it may be slow on architectures which lack high-speed shift. It's unlikely to be faster than the previous solution, but YMMV. TIAS.
Neither of these produce correct results for the general case where integer overflow is possible. It's very difficult to deal with integer overflow. (One particularly annoying case is n == -1, although you can test for that and return 0 without any use of %.) Also, you need to decide your preference for the result of modulo of negative n. I personally prefer the definition where x%n is either 0 or has the same sign as n -- otherwise why would you bother with a negative divisor -- but applications differ.
The three-modulo solution proposed by Tom Tanner will work if n is not -1 and n + n does not overflow. n == -1 will fail if either x or y is INT_MIN, and the simple fix of using abs(n) instead of n will fail if n is INT_MIN. The cases where n has a large absolute value could be replaced with comparisons, but there are a lot of corner cases, and made more complicated by the fact that the standard does not require 2's complement arithmetic, so it's not easily predictable what the corner cases are [2].
As a final note, some tempting solutions do not work. You cannot just take the absolute value of (x - y):
(-z) % n == -(z % n) == n - (z % n) ≠ z % n (unless z % n happens to be n / 2)
And, for the same reason, you cannot just take the absolute value of the result of modulo.
Also, you cannot just cast (x - y) to unsigned:
(unsigned)z == z + 2k (for some k) if z < 0
(z + 2k) % n == (z % n) + (2k % n) ≠ z % n unless (2k % n) == 0
[1] x/n and x%n are both undefined if n==0. But x%n is also undefined if x/n is "not representable" (i.e. there was integer overflow), which will happen on twos-complement
machines (that is, all the ones you care about) if x is most negative representable number and n == -1. It's clear why x/n should be undefined in this case, but slightly less so in the case of x%n, since that value is (mathematically) 0.
[2] Most people who complain about the difficulty of predicting the results of floating-point arithmetic haven't spent much time trying to write truly portable integer arithmetic code :)
If you want to avoid undefined behaviour, without an if, the following would work
return (x % n - y % n + n) % n;
The efficiency depends on the implementation of the modulo operation, but I'd suspect algorithms involving if would be rather faster.
Alternatively you could treat x and y as unsigned. In which case there are no negative numbers involved and no undefined behaviour.
With C++11 the undefined behavior was removed. Depending on the the exact behavior you want you can there just stick with
return (x-y) % n;
For a full explanation read this answer:
https://stackoverflow.com/a/13100805/1149664
You still get undefined behavior for n==0 or if x-y can not be stored in the type you are using.
Whether branching is going to matter will depend on the CPU to some degree. According to the documentation abs (on MSDN) has intrinsic behavior and it might not be a bottleneck at all. This you'll have to test.
If you wan't unconditionally compute things there are several nice methods that can be adapted from the Bit Twiddling Hacks site.
int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> sizeof(int) * CHAR_BIT - 1;
r = (v + mask) ^ mask;
However, I don't know if this will be helpful to your situation without more information about hardware targets and testing.
Just out of curiosity I had to test this myself and when you look at the assembly generated by the compiler we can see there's no real overhead in the use of abs.
unsigned r = abs(i);
====
00381006 cdq
00381007 xor eax,edx
00381009 sub eax,edx
The following is just an alternate form of the above example which according to the Bit Twiddling Site is not patented (while the version used by the Visual C++ 2008 compiler is).
Throughout my answer I have been using MSDN and Visual C++ but I would assume that any sane compiler has similar behavior.
Assuming 0 <= x < n and 0 <= y < n, how about (x + n - y) % n? Then x + n will certainly be larger than y, subtracting y will always result in a positive integer, and the final mod n reduces the result if necessary.
I'm going to guess that it's not really the case here, but I'd like to mention that if the value you are taking modulo with is a power of two, then using the "AND" method is a lot quicker (I'm going to ignore the x-y, and just show how it works for a single x, as x-y is not part of the equation here):
int modpow2(int x, int n)
{
return x & (n-1);
}
If you want to ensure that your code doesn't do anything daft, you could add ASSERT(!(n & n-1)); - this checks that there is only a single bit set in n (so, n is a power of two).
Here is the CPP Code I use in competitive programming:
#include <iostream>
#include<bits/stdc++.h>
using namespace std;
#define ll long long
#define mod 1000000007
ll subtraction_modulo(ll x, ll y ){
return ( ( (x - y) % mod ) + mod ) % mod;
}
Here,
ll -> long long int
mod -> globally defined mod value to be used.

How to code a modulo (%) operator in C/C++/Obj-C that handles negative numbers

One of my pet hates of C-derived languages (as a mathematician) is that
(-1) % 8 // comes out as -1, and not 7
fmodf(-1,8) // fails similarly
What's the best solution?
C++ allows the possibility of templates and operator overloading, but both of these are murky waters for me. examples gratefully received.
First of all I'd like to note that you cannot even rely on the fact that (-1) % 8 == -1. the only thing you can rely on is that (x / y) * y + ( x % y) == x. However whether or not the remainder is negative is implementation-defined.
Reference: C++03 paragraph 5.6 clause 4:
The binary / operator yields the quotient, and the binary % operator yields the remainder from the division of the first expression by the second. If the second operand of / or % is zero the behavior is undefined; otherwise (a/b)*b + a%b is equal to a. If both operands are nonnegative then the remainder is nonnegative; if not, the sign of the remainder is implementation-defined.
Here it follows a version that handles both negative operands so that the result of the subtraction of the remainder from the divisor can be subtracted from the dividend so it will be floor of the actual division. mod(-1,8) results in 7, while mod(13, -8) is -3.
int mod(int a, int b)
{
if(b < 0) //you can check for b == 0 separately and do what you want
return -mod(-a, -b);
int ret = a % b;
if(ret < 0)
ret+=b;
return ret;
}
Here is a C function that handles positive OR negative integer OR fractional values for BOTH OPERANDS
#include <math.h>
float mod(float a, float N) {return a - N*floor(a/N);} //return in range [0, N)
This is surely the most elegant solution from a mathematical standpoint. However, I'm not sure if it is robust in handling integers. Sometimes floating point errors creep in when converting int -> fp -> int.
I am using this code for non-int s, and a separate function for int.
NOTE: need to trap N = 0!
Tester code:
#include <math.h>
#include <stdio.h>
float mod(float a, float N)
{
float ret = a - N * floor (a / N);
printf("%f.1 mod %f.1 = %f.1 \n", a, N, ret);
return ret;
}
int main (char* argc, char** argv)
{
printf ("fmodf(-10.2, 2.0) = %f.1 == FAIL! \n\n", fmodf(-10.2, 2.0));
float x;
x = mod(10.2f, 2.0f);
x = mod(10.2f, -2.0f);
x = mod(-10.2f, 2.0f);
x = mod(-10.2f, -2.0f);
return 0;
}
(Note: You can compile and run it straight out of CodePad: http://codepad.org/UOgEqAMA)
Output:
fmodf(-10.2, 2.0) = -0.20 == FAIL!
10.2 mod 2.0 = 0.2
10.2 mod -2.0 = -1.8
-10.2 mod 2.0 = 1.8
-10.2 mod -2.0 = -0.2
I have just noticed that Bjarne Stroustrup labels % as the remainder operator, not the modulo operator.
I would bet that this is its formal name in the ANSI C & C++ specifications, and that abuse of terminology has crept in. Does anyone know this for a fact?
But if this is the case then C's fmodf() function (and probably others) are very misleading. they should be labelled fremf(), etc
The simplest general function to find the positive modulo would be this-
It would work on both positive and negative values of x.
int modulo(int x,int N){
return (x % N + N) %N;
}
For integers this is simple. Just do
(((x < 0) ? ((x % N) + N) : x) % N)
where I am supposing that N is positive and representable in the type of x. Your favorite compiler should be able to optimize this out, such that it ends up in just one mod operation in assembler.
The best solution ¹for a mathematician is to use Python.
C++ operator overloading has little to do with it. You can't overload operators for built-in types. What you want is simply a function. Of course you can use C++ templating to implement that function for all relevant types with just 1 piece of code.
The standard C library provides fmod, if I recall the name correctly, for floating point types.
For integers you can define a C++ function template that always returns non-negative remainder (corresponding to Euclidian division) as ...
#include <stdlib.h> // abs
template< class Integer >
auto mod( Integer a, Integer b )
-> Integer
{
Integer const r = a%b;
return (r < 0? r + abs( b ) : r);
}
... and just write mod(a, b) instead of a%b.
Here the type Integer needs to be a signed integer type.
If you want the common math behavior where the sign of the remainder is the same as the sign of the divisor, then you can do e.g.
template< class Integer >
auto floor_div( Integer const a, Integer const b )
-> Integer
{
bool const a_is_negative = (a < 0);
bool const b_is_negative = (b < 0);
bool const change_sign = (a_is_negative != b_is_negative);
Integer const abs_b = abs( b );
Integer const abs_a_plus = abs( a ) + (change_sign? abs_b - 1 : 0);
Integer const quot = abs_a_plus / abs_b;
return (change_sign? -quot : quot);
}
template< class Integer >
auto floor_mod( Integer const a, Integer const b )
-> Integer
{ return a - b*floor_div( a, b ); }
… with the same constraint on Integer, that it's a signed type.
¹ Because Python's integer division rounds towards negative infinity.
Here's a new answer to an old question, based on this Microsoft Research paper and references therein.
Note that from C11 and C++11 onwards, the semantics of div has become truncation towards zero (see [expr.mul]/4). Furthermore, for D divided by d, C++11 guarantees the following about the quotient qT and remainder rT
auto const qT = D / d;
auto const rT = D % d;
assert(D == d * qT + rT);
assert(abs(rT) < abs(d));
assert(signum(rT) == signum(D) || rT == 0);
where signum maps to -1, 0, +1, depending on whether its argument is <, ==, > than 0 (see this Q&A for source code).
With truncated division, the sign of the remainder is equal to the sign of the dividend D, i.e. -1 % 8 == -1. C++11 also provides a std::div function that returns a struct with members quot and rem according to truncated division.
There are other definitions possible, e.g. so-called floored division can be defined in terms of the builtin truncated division
auto const I = signum(rT) == -signum(d) ? 1 : 0;
auto const qF = qT - I;
auto const rF = rT + I * d;
assert(D == d * qF + rF);
assert(abs(rF) < abs(d));
assert(signum(rF) == signum(d));
With floored division, the sign of the remainder is equal to the sign of the divisor d. In languages such as Haskell and Oberon, there are builtin operators for floored division. In C++, you'd need to write a function using the above definitions.
Yet another way is Euclidean division, which can also be defined in terms of the builtin truncated division
auto const I = rT >= 0 ? 0 : (d > 0 ? 1 : -1);
auto const qE = qT - I;
auto const rE = rT + I * d;
assert(D == d * qE + rE);
assert(abs(rE) < abs(d));
assert(signum(rE) >= 0);
With Euclidean division, the sign of the remainder is always non-negative.
Oh, I hate % design for this too....
You may convert dividend to unsigned in a way like:
unsigned int offset = (-INT_MIN) - (-INT_MIN)%divider
result = (offset + dividend) % divider
where offset is closest to (-INT_MIN) multiple of module, so adding and subtracting it will not change modulo. Note that it have unsigned type and result will be integer. Unfortunately it cannot correctly convert values INT_MIN...(-offset-1) as they cause arifmetic overflow. But this method have advandage of only single additional arithmetic per operation (and no conditionals) when working with constant divider, so it is usable in DSP-like applications.
There's special case, where divider is 2N (integer power of two), for which modulo can be calculated using simple arithmetic and bitwise logic as
dividend&(divider-1)
for example
x mod 2 = x & 1
x mod 4 = x & 3
x mod 8 = x & 7
x mod 16 = x & 15
More common and less tricky way is to get modulo using this function (works only with positive divider):
int mod(int x, int y) {
int r = x%y;
return r<0?r+y:r;
}
This just correct result if it is negative.
Also you may trick:
(p%q + q)%q
It is very short but use two %-s which are commonly slow.
I believe another solution to this problem would be use to variables of type long instead of int.
I was just working on some code where the % operator was returning a negative value which caused some issues (for generating uniform random variables on [0,1] you don't really want negative numbers :) ), but after switching the variables to type long, everything was running smoothly and the results matched the ones I was getting when running the same code in python (important for me as I wanted to be able to generate the same "random" numbers across several platforms.
For a solution that uses no branches and only 1 mod, you can do the following
// Works for other sizes too,
// assuming you change 63 to the appropriate value
int64_t mod(int64_t x, int64_t div) {
return (x % div) + (((x >> 63) ^ (div >> 63)) & div);
}
/* Warning: macro mod evaluates its arguments' side effects multiple times. */
#define mod(r,m) (((r) % (m)) + ((r)<0)?(m):0)
... or just get used to getting any representative for the equivalence class.
Example template for C++
template< class T >
T mod( T a, T b )
{
T const r = a%b;
return ((r!=0)&&((r^b)<0) ? r + b : r);
}
With this template, the returned remainder will be zero or have the same sign as the divisor (denominator) (the equivalent of rounding towards negative infinity), instead of the C++ behavior of the remainder being zero or having the same sign as the dividend (numerator) (the equivalent of rounding towards zero).
define MOD(a, b) ((((a)%(b))+(b))%(b))
unsigned mod(int a, unsigned b) {
return (a >= 0 ? a % b : b - (-a) % b);
}
This solution (for use when mod is positive) avoids taking negative divide or remainder operations all together:
int core_modulus(int val, int mod)
{
if(val>=0)
return val % mod;
else
return val + mod * ((mod - val - 1)/mod);
}
I would do:
((-1)+8) % 8
This adds the latter number to the first before doing the modulo giving 7 as desired. This should work for any number down to -8. For -9 add 2*8.

How can I safely average two unsigned ints in C++?

Using integer math alone, I'd like to "safely" average two unsigned ints in C++.
What I mean by "safely" is avoiding overflows (and anything else that can be thought of).
For instance, averaging 200 and 5000 is easy:
unsigned int a = 200;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2600 as intended
But in the case of 4294967295 and 5000 then:
unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a + b) / 2; // Equals: 2499 instead of 2147486147
The best I've come up with is:
unsigned int a = 4294967295;
unsigned int b = 5000;
unsigned int average = (a / 2) + (b / 2); // Equals: 2147486147 as expected
Are there better ways?
Your last approach seems promising. You can improve on that by manually considering the lowest bits of a and b:
unsigned int average = (a / 2) + (b / 2) + (a & b & 1);
This gives the correct results in case both a and b are odd.
If you know ahead of time which one is higher, a very efficient way is possible. Otherwise you're better off using one of the other strategies, instead of conditionally swapping to use this.
unsigned int average = low + ((high - low) / 2);
Here's a related article: http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
Your method is not correct if both numbers are odd eg 5 and 7, average is 6 but your method #3 returns 5.
Try this:
average = (a>>1) + (b>>1) + (a & b & 1)
with math operators only:
average = a/2 + b/2 + (a%2) * (b%2)
And the correct answer is...
(A&B)+((A^B)>>1)
If you don't mind a little x86 inline assembly (GNU C syntax), you can take advantage of supercat's suggestion to use rotate-with-carry after an add to put the high 32 bits of the full 33-bit result into a register.
Of course, you usually should mind using inline-asm, because it defeats some optimizations (https://gcc.gnu.org/wiki/DontUseInlineAsm). But here we go anyway:
// works for 64-bit long as well on x86-64, and doesn't depend on calling convention
unsigned average(unsigned x, unsigned y)
{
unsigned result;
asm("add %[x], %[res]\n\t"
"rcr %[res]"
: [res] "=r" (result) // output
: [y] "%0"(y), // input: in the same reg as results output. Commutative with next operand
[x] "rme"(x) // input: reg, mem, or immediate
: // no clobbers. ("cc" is implicit on x86)
);
return result;
}
The % modifier to tell the compiler the args are commutative doesn't actually help make better asm in the case I tried, calling the function with y being a constant or pointer-deref (memory operand). Probably using a matching constraint for an output operand defeats that, since you can't use it with read-write operands.
As you can see on the Godbolt compiler explorer, this compiles correctly, and so does a version where we change the operands to unsigned long, with the same inline asm. clang3.9 makes a mess of it, though, and decides to use the "m" option for the "rme" constraint, so it stores to memory and uses a memory operand.
RCR-by-one is not too slow, but it's still 3 uops on Skylake, with 2 cycle latency. It's great on AMD CPUs, where RCR has single-cycle latency. (Source: Agner Fog's instruction tables, see also the x86 tag wiki for x86 performance links). It's still better than #sellibitze's version, but worse than #Sheldon's order-dependent version. (See code on Godbolt)
But remember that inline-asm defeats optimizations like constant-propagation, so any pure-C++ version will be better in that case.
What you have is fine, with the minor detail that it will claim that the average of 3 and 3 is 2. I'm guessing that you don't want that; fortunately, there's an easy fix:
unsigned int average = a/2 + b/2 + (a & b & 1);
This just bumps the average back up in the case that both divisions were truncated.
In C++20, you can use std::midpoint:
template <class T>
constexpr T midpoint(T a, T b) noexcept;
The paper P0811R3 that introduced std::midpoint recommended this snippet (slightly adopted to work with C++11):
#include <type_traits>
template <typename Integer>
constexpr Integer midpoint(Integer a, Integer b) noexcept {
using U = std::make_unsigned<Integer>::type;
return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}
For completeness, here is the unmodified C++20 implementation from the paper:
constexpr Integer midpoint(Integer a, Integer b) noexcept {
using U = make_unsigned_t<Integer>;
return a>b ? a-(U(a)-b)/2 : a+(U(b)-a)/2;
}
If the code is for an embedded micro, and if speed is critical, assembly language may be helpful. On many microcontrollers, the result of the add would naturally go into the carry flag, and instructions exist to shift it back into a register. On an ARM, the average operation (source and dest. in registers) could be done in two instructions; any C-language equivalent would likely yield at least 5, and probably a fair bit more than that.
Incidentally, on machines with shorter word sizes, the differences can be even more substantial. On an 8-bit PIC-18 series, averaging two 32-bit numbers would take twelve instructions. Doing the shifts, add, and correction, would take 5 instructions for each shift, eight for the add, and eight for the correction, so 26 (not quite a 2.5x difference, but probably more significant in absolute terms).
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++){
avg = (array[i] - avg) / (i+1) + avg;
}
expects avg == 5.0 for this test
(((a&b << 1) + (a^b)) >> 1) is also a nice way.
Courtesy: http://www.ragestorm.net/blogs/?p=29