Is there a more efficient way to compute positive modulo? [duplicate] - c++

Modulo in C and C++ does not behave in a mathematically correct manner, as it returns a negative result when performing the modulo of a negative number. After doing some research, it seems the classic way of implementing a correctly behaving one is:
int positive_modulo(int i, int n)
{
return ( ( (i % n) + n ) % n );
}
Considering modulo is computationally expensive, is there a more efficient way to compute positive modulo for any number (I already saw the solution for powers of 2, but I need something generic)?

This may be slower or faster, depending on the compiler, the optimization level and the architecture:
static inline int modulo(int i, int n) {
const int k = i % n;
return k < 0 ? k + n : k;
}
The reason it can be slower is that the conditional operator may compile to a branch, and mispredicted branches are slow.

The solution in pts's answer is probably the best one. It often compiles to branchless code, and even when the branch costs something, it may well still be faster than the second division. But in case you really need to avoid branching:
inline int modulo(int i, int n)
{
int k = i % n;
int a = -(k < 0); // assuming 2's complement
// or int a = ((k < 0) << (INT_SIZE - 1)) >> (INT_SIZE - 1); if your system doesn't use 2's complement
return k + (n & a); // + binds tighter than &, so the parentheses are required
}

The second modulo in your formula is necessary only to subtract n again if the result of the first modulo was already non-negative. So it should be at least as fast to add n only conditionally:
auto m = (i % n);
return (m < 0) ? m+n : m;
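As a quick sanity check, the double-modulo form and the conditional form can be compared side by side. This is only a sketch: the names `mod_double` and `mod_branch` are made up for illustration, and both functions assume n > 0.

```cpp
#include <cassert>

// Classic form from the question: two modulo operations.
// Assumes n > 0.
int mod_double(int i, int n) {
    assert(n > 0);
    return ((i % n) + n) % n;
}

// Conditional form from the answers: one modulo plus a compare.
// Assumes n > 0.
int mod_branch(int i, int n) {
    assert(n > 0);
    const int k = i % n;       // lies strictly between -n and n
    return k < 0 ? k + n : k;  // shift negative remainders into [0, n)
}
```

Which of the two is faster depends on the compiler and target, so it is worth measuring both in the actual hot loop.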

Related

Ways to go from a number to 0 the fastest way

So, I have a homework problem like this:
Given two numbers n and k, each of which can be as large as the long long limit, we repeatedly apply this operation:
assign n = n / k if n is divisible by k
reduce n by 1 if n is not divisible by k
Find the smallest number of operations to go from n to 0.
This is my solution
#define ll long long
ll smallestSteps(ll n, ll k) {
int steps = 0;
if (n < k) return n;
else if (n == k) return 2;
else {
while (n != 0) {
if (n % k == 0) {
n /= k;
steps++;
}
else {
n--;
steps++;
}
}
return (ll)steps;
}
}
This solution is O(n/k) I think?
But I think that n and k could be extremely big, and thus the program could exceed the time limit of 1s. Is there any better way to do this?
Edit 1: I use ll for it to be shorter
The algorithm can be improved given these observations:
If n<k then k|(n-m) will never hold for any m with 0 < m < n. So the answer is n steps.
If (k|n) does not hold then the biggest number m, m<n, for which it does is n - (n%k). So it takes n%k steps until (k|m) holds again.
Actually all that you need is to keep doing division with remainder using std::div (or rely on compiler to optimize) and increase steps by remainder+1.
steps = 0
while (n >= k)
    mod = n % k
    n = n / k
    steps += mod + 1
return steps + n   // once n < k, only n decrements remain
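The division-with-remainder idea can be sketched in C++ as follows. The function name `smallest_steps` is illustrative, and the sketch assumes n >= 0 and k >= 2 (with k == 1 the division never makes progress). The loop stops once n < k and adds the remaining n decrements, so the final step from 1 to 0 is not counted as a division.

```cpp
#include <cassert>

// Counts the operations needed to reduce n to 0, where each operation is
// either n /= k (when k divides n) or n -= 1 (otherwise).
// Assumes n >= 0 and k >= 2.
long long smallest_steps(long long n, long long k) {
    assert(n >= 0 && k >= 2);
    long long steps = 0;
    while (n >= k) {
        steps += n % k + 1;  // n % k decrements, then one division
        n /= k;
    }
    return steps + n;  // below k, only decrements remain
}
```

The loop runs once per base-k digit of n, so the work is O(log n / log k) rather than O(n/k).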
This can be done with an even simpler main program.
Convert n to base k. Let d be the number of digits in this number.
To get to 0, you will divide by k (d-1) times.
The number of times you subtract 1 is the digital sum of this number.
For instance, consider n=314, k=3.
314 in base 3 is 102122. This has 6 digits; the digital sum is 8.
You will have 6-1+8 steps ... 13 steps to 0.
Use your C++ packages to convert to the new base, convert the digits to integers, and do the array sum. This pushes all the shift-count work into module methods.
Granted this won't work for weird values of k, but you can also steal available conversion packages instead of writing your own.
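The base-k view can be written out directly. The sketch below uses a hand-rolled conversion (the helper names `to_base` and `steps_via_digits` are hypothetical, and k >= 2 is assumed) instead of a library routine, so that it works for any k.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Returns the base-k digits of n, least significant first.
// Assumes n >= 0 and k >= 2. Zero yields an empty digit list.
std::vector<long long> to_base(long long n, long long k) {
    assert(n >= 0 && k >= 2);
    std::vector<long long> digits;
    while (n > 0) {
        digits.push_back(n % k);
        n /= k;
    }
    return digits;
}

// Steps = (number of digits - 1) divisions + (digit sum) decrements.
long long steps_via_digits(long long n, long long k) {
    const std::vector<long long> digits = to_base(n, k);
    if (digits.empty()) return 0;  // n == 0 needs no operations
    const long long digit_sum =
        std::accumulate(digits.begin(), digits.end(), 0LL);
    return static_cast<long long>(digits.size()) - 1 + digit_sum;
}
```

For n=314, k=3 this reproduces the count from the worked example: 6 digits with digit sum 8, hence 5 + 8 = 13 steps.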

Is there an expression using modulo to do backwards wrap-around ("reverse overflow")?

For any whole number input W restricted to the range R = [x,y], the "overflow," for lack of a better term, of W over R is (W - x) % (y-x+1) + x. This causes it to wrap back around if W exceeds y.
As an example of this principle, suppose we iterate over a calendar's months:
int this_month = 5;
int next_month = (this_month + 1) % 12;
where both integers will be between 0 and 11, inclusive. Thus, the expression above "clamps" the integer to the range R = [0,11]. This approach of using an expression is simple, elegant, and advantageous as it omits branching.
Now, what if we want to do the same thing, but backwards? The following expression works:
int last_month = ((this_month - 1) % 12 + 12) % 12;
but it's abstruse. How can it be beautified?
tl;dr - Can the expression ((x-1) % k + k) % k be simplified further?
Note: C++ tag specified because other languages handle negative operands for the modulo operator differently.
Your expression should be ((x-1) + k) % k. This will properly wrap x=0 around to 11. In general, if you want to step back more than 1, you need to make sure that you add enough so that the first operand of the modulo operation is >= 0.
Here is an implementation in C++:
int wrapAround(int v, int delta, int minval, int maxval)
{
const int mod = maxval + 1 - minval;
if (delta >= 0) {return (v + delta - minval) % mod + minval;}
else {return ((v + delta) - delta * mod - minval) % mod + minval;}
}
This also lets you use months labeled from 0 to 11 or from 1 to 12 by setting minval and maxval accordingly.
Since this answer is so highly appreciated, here is an improved version without branching, which also handles the case where the initial value v is smaller than minval. I keep the other example because it is easier to understand:
int wrapAround(int v, int delta, int minval, int maxval)
{
const int mod = maxval + 1 - minval;
v += delta - minval;
v += (1 - v / mod) * mod;
return v % mod + minval;
}
The only issue remaining is if minval is larger than maxval. Feel free to add an assertion if you need it.
k % k will always be 0. I'm not 100% sure what you're trying to do but it seems you want the last month to be clamped between 0 and 11 inclusive.
(this_month + 11) % 12
Should suffice.
The general solution is to write a function that computes the value that you want:
//Returns floor(a/n) (with the division done exactly).
//Let ÷ be mathematical division, and / be C++ division.
//We know
// a÷b = a/b + f (f is the remainder, not all
// divisions have exact Integral results)
//and
// (a/b)*b + a%b == a (from the standard).
//Together, these imply (through algebraic manipulation):
// sign(f) == sign(a%b)*sign(b)
//We want the remainder (f) to always be >=0 (by definition of flooredDivision),
//so when sign(f) < 0, we subtract 1 from a/n to make f > 0.
template<typename Integral>
Integral flooredDivision(Integral a, Integral n) {
Integral q(a/n);
if ((a%n < 0 && n > 0) || (a%n > 0 && n < 0)) --q;
return q;
}
//flooredModulo: Modulo function for use in the construction of
//looping topologies. The result will always be between 0 and the
//denominator, and will loop in a natural fashion (rather than swapping
//the looping direction over the zero point (as in C++11),
//or being unspecified (as in earlier C++)).
//Returns x such that:
//
//Real a = Real(numerator)
//Real n = Real(denominator)
//Real r = a - n*floor(a/n)
//x = Integral(r)
template<typename Integral>
Integral flooredModulo(Integral a, Integral n) {
return a - n * flooredDivision(a, n);
}
Easy peasy: do not use the first modulo operator, it is superfluous:
int last_month = (this_month - 1 + 12) % 12;
which is the general form, (x - 1 + k) % k.
In this instance you can write + 11 directly, but I would still write -1 + 12, as it states more clearly what you want to achieve.
Note that normal mod causes the pattern 0...11 to repeat at 12...23, 24...35, etc. but doesn't wrap on -11...-1. In other words, it has two sets of behaviors. One from -infinity...-1, and a different set of behavior from 0...infinity.
The expression (x - 1 + k) % k fixes -11...-1 but has the same problem as normal mod with -23...-12. I.e. while it fixes 12 additional numbers, it doesn't wrap around infinitely. It still has one set of behavior from -infinity...-12, and a different behavior from -11...+infinity.
This means that if you're using the function for offsets, it could lead to buggy code.
If you want a truly wrap around mod, it should handle the entire range, -infinity...infinity in exactly the same way.
There is probably a better way to implement this, but here is an easy to understand implementation:
// n must be greater than 0
func wrapAroundMod(a: Int, n: Int) -> Int {
var offsetTimes: Int = 0
if a < 0 {
offsetTimes = (-a / n) + 1
}
return (a + n * offsetTimes) % n
}
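Since the question is tagged C++, here is a sketch of the same idea in that language. It avoids computing an offset count, because in C++11 and later a % n already lies strictly between -n and n, so a single conditional add is enough. The name `wrap_around_mod` is illustrative and n > 0 is assumed.

```cpp
#include <cassert>

// Mathematical (floored) modulo: the result is always in [0, n),
// and negative inputs wrap the same way at every multiple of n.
// Assumes n > 0.
int wrap_around_mod(int a, int n) {
    assert(n > 0);
    const int r = a % n;       // in (-n, n) under C++11 truncating division
    return r < 0 ? r + n : r;  // shift negative remainders up by n
}
```

With n = 12, both -1 and -13 map to 11, and both -12 and -24 map to 0.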
Not sure if you were having the same problem as me, but my problem was essentially that I wanted to constrain all numbers to a certain range. Say that range was 0-6, so using %7 means that any number higher than 6 will wrap back around to 0 or above. The actual problem is that numbers less than zero didn't wrap back around to 6. I have a solution to that (where X is the size of your number range, 7 in this example, and 0 is the minimum):
int wrapped;
if (inputNumber < 0) // If this is a negative number
{
    wrapped = (X + inputNumber % X) % X; // reduce first, so inputs below -X also wrap
}
else
{
    wrapped = inputNumber % X;
}

Is there any way to reduce time complexity to find this matrix to the power n?

I am working on a problem where I need to find the nth power of a 4x4 matrix, where n can be as large as 10^15. Since the values in the answer can be very large, they are to be reported modulo 10^9+7.
The given matrix is:
A =
2 1 -2 -1
1 0 0 0
0 1 0 0
0 0 1 0
I have written code for this, but its running time exceeds the allowed time. Can anyone please help me reduce the time complexity?
#define FOR(k,a,b) for(typeof(a) k=(a); k < (b); ++k)
typedef long long ll;
#define dim 4
struct matrix {
long long a[dim][dim];
};
#define MOD 1000000007
matrix mul(matrix x, matrix y)
{
matrix res;
FOR(a, 0, dim) FOR(b, 0, dim) res.a[a][b] = 0;
FOR(a, 0, dim) FOR(b, 0, dim) FOR(c, 0, dim) {
ll temp = x.a[a][b] * y.a[b][c];
if (temp <= -MOD || temp >= MOD)
temp %= MOD;
res.a[a][c] += temp;
if (res.a[a][c] <= -MOD || res.a[a][c] >= MOD)
res.a[a][c] %= MOD;
}
return res;
}
matrix power(matrix m, ll n)
{
if (n == 1)
return m;
matrix u = mul(m, m);
u = power(u, n / 2);
if (n & 1)
u = mul(u, m);
return u;
}
matrix M, RP;
int main()
{
FOR(a, 0, dim) FOR(b, 0, dim) M.a[a][b] = 0;
M.a[0][0] = 2;
M.a[0][1] = 1;
M.a[0][2] = -2;
M.a[0][3] = -1;
M.a[1][0] = 1;
M.a[2][1] = 1;
M.a[3][2] = 1;
int nt;
scanf("%d", &nt);
while (nt--) {
ll n;
scanf("%lld", &n);
RP = power(M, n);
FOR(a, 0, dim)
FOR(b, 0, dim)
printf("%lld\n", RP.a[a][b]);
}
return 0;
}
[Commenters have shown that this answer is incomplete. The answer is retained here for reference, but wants no more upvotes. Would the commenters add more complete answers, at their discretion?]
Yes. An excellent way to do exactly what you want is known. You must diagonalize the matrix.
Diagonalization will require some programming. The theory is explained here, in sect. 14.6.
Fortunately, existing matrix-algebra libraries like LAPACK already include diagonalization routines.
@Haile correctly and interestingly observes that not all matrices are diagonalizable, that there exist degenerate cases. I do not have much practical experience with such cases. There is the Schur decomposition (see sect. 14.10 of the previously linked source), but I have normally seen Schur used only to make theoretical points, not to do practical calculations. Still, I believe that Schur would work. It would take a lot of effort to implement it, I suspect, but it would work, even in the case of the strictly nondiagonalizable matrix.
You could take advantage of the multiple test cases to reduce the total computation.
Note that every time you call power you are recomputing all the powers of 2 of your original matrix. So for a number like 10^15 (roughly 2^50) you will end up squaring a matrix 50 times, and also calculating a multiply for each nonzero bit in the number (perhaps 25 times).
If you simply precompute the 50 powers of 2, then each test case would only require on average 25 multiplications instead of 75.
You can take this idea a little further and use a different base for your exponentiation. This would result in more precomputation, but fewer final matrix multiplications for each test value.
For example, instead of precomputing M^2, M^4, M^8, M^16 you could precompute [M^1,M^2,M^3],[M^4,M^8,M^12],[M^16,M^32,M^48] and so M^51 would be (M^3)*(M^48) instead of M*M^2*M^16*M^32
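A minimal sketch of the base-2 precomputation is shown below. The names (`Matrix`, `precompute_squares`, `power_from_table`) are hypothetical, and unlike the code in the question, this version maps negative entries into [0, MOD) up front so that every intermediate product stays non-negative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using ll = long long;
constexpr ll MOD = 1000000007LL;
constexpr int DIM = 4;
using Matrix = std::array<std::array<ll, DIM>, DIM>;

// Multiplies two matrices modulo MOD; expects entries in [0, MOD).
Matrix mul(const Matrix& x, const Matrix& y) {
    Matrix res{};  // value-initialized to all zeros
    for (int i = 0; i < DIM; ++i)
        for (int j = 0; j < DIM; ++j)
            for (int k = 0; k < DIM; ++k)
                res[i][j] = (res[i][j] + x[i][k] * y[k][j]) % MOD;
    return res;
}

// Builds table[i] = m^(2^i) for i in [0, bits).
// Negative entries of m are first mapped into [0, MOD).
std::vector<Matrix> precompute_squares(Matrix m, int bits) {
    for (auto& row : m)
        for (auto& v : row) v = ((v % MOD) + MOD) % MOD;
    std::vector<Matrix> table;
    table.reserve(bits);
    for (int i = 0; i < bits; ++i) {
        table.push_back(m);
        m = mul(m, m);
    }
    return table;
}

// Computes m^n from the table, one multiplication per set bit of n.
// Assumes 0 <= n < 2^table.size().
Matrix power_from_table(const std::vector<Matrix>& table, ll n) {
    Matrix result{};
    for (int i = 0; i < DIM; ++i) result[i][i] = 1;  // identity matrix
    for (std::size_t bit = 0; n > 0; ++bit, n >>= 1) {
        assert(bit < table.size());
        if (n & 1) result = mul(result, table[bit]);
    }
    return result;
}
```

The table would be built once before reading the test cases, for example with bits = 50, which covers every n up to 10^15 because 10^15 < 2^50.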
This is not really an idea about exponentiating matrices faster, but about speeding up the entire program.
If you are asked to perform 10^4 exponentiations, this doesn't mean they should be done independently. You can sort requests and reuse previous result for each next computation.
Also you can store intermediate results from previous computations.

Coding Competitions: How to store large numbers and find its all combination modulus P

I have started doing competitive programming, and most of the time I find that the input size of numbers is something like
1 <= n <= 10^(500).
I understand that this means numbers with around 500 digits, which cannot be stored in a plain int. I know C and C++.
I think I should use an array. But then I get confused about how I would check
if ( (nCr % P) == 0 ) //for all (0<=r<=n)//
I think I would store n in an array and then compute nCr, which would require coding multiplication and division on digit arrays. But what about modulus?
Is there any other way?
Thanks.
I think you don't want to code the multiplication and division yourself, but use something like the GNU MP Bignum library http://gmplib.org/
Regarding large number libraries, I have used ttmath, which provides arbitrary length integers, floats, etc, and some really good operations, all with relatively little bulk.
However, if you are only trying to figure out what (n^e) mod m is, you can do this for very large values of e even without extremely large number calculation. Below is a function I added to my local ttmath lib to do just that:
/*!
mod power this = (this ^ pow) % m
binary algorithm (r-to-l)
return values:
0 - ok
1 - carry
2 - incorrect argument (0^0)
*/
uint PowMod(UInt<value_size> pow, UInt<value_size> mod)
{
if(pow.IsZero() && IsZero())
// we don't define zero^zero
return 2;
UInt<value_size> remainder;
UInt<value_size> x = 1;
uint c = 0;
while (pow != 0)
{
remainder = (pow & 1 == 1);
pow /= 2;
if (remainder != 0)
{
c += x.Mul(*this);
x = x % mod;
}
c += Mul(*this);
*this = *this % mod;
}
*this = x;
return (c==0)? 0 : 1;
}
I don't believe you ever need to store a number larger than n^2 for this algorithm. It should be easy to modify such that it removes the ttmath related aspects, if you don't want to use those headers.
You can find the details of the mathematics online by looking up modular exponentiation, if you care about it.
If we have to calculate nCr mod p (where p is a prime), we can calculate the factorials mod p and then use the modular inverse to find nCr mod p. If we have to find nCr mod m (where m is not prime), we can factorize m into primes and then use the Chinese Remainder Theorem (CRT) to find nCr mod m.
#include <cstdio>
#include <iostream>
#include <vector>
using namespace std;
/* This function calculates (a^b)%MOD */
long long pow(int a, int b, int MOD)
{
long long x=1,y=a;
while(b > 0)
{
if(b%2 == 1)
{
x=(x*y);
if(x>=MOD) x%=MOD;
}
y = (y*y);
if(y>=MOD) y%=MOD;
b /= 2;
}
return x;
}
/* Modular Multiplicative Inverse
Using Euler's Theorem with phi(m) = m-1, which holds only for prime m
a^(phi(m)) = 1 (mod m)
a^(-1) = a^(m-2) (mod m) */
long long InverseEuler(int n, int MOD)
{
return pow(n,MOD-2,MOD);
}
long long C(int n, int r, int MOD)
{
vector<long long> f(n + 1,1);
for (int i=2; i<=n;i++)
f[i]= (f[i-1]*i) % MOD;
return (f[n]*((InverseEuler(f[r], MOD) * InverseEuler(f[n-r], MOD)) % MOD)) % MOD;
}
int main()
{
int n,r,p;
while (~scanf("%d%d%d",&n,&r,&p))
{
printf("%lld\n",C(n,r,p));
}
}
Here, I've used long long int to store the numbers.
In many, many cases in these coding competitions, the idea is that you don't actually calculate these big numbers, but figure out how to answer the question without calculating them. For example:
What are the last ten digits of 1,000,000! (factorial)?
It's a number with over five million digits. However, I can answer that question without a computer, not even using pen and paper: 1,000,000! contains far more than ten factors of 10, so its last ten digits are all zeros. Or take the question: What is (2014^2014) modulo 153? Here's a simple way to calculate this in C:
int modulo = 1;
for (int i = 0; i < 2014; ++i) modulo = (modulo * 2014) % 153;
Again, you avoided doing a calculation with a 6,000 digit number. (You can actually do this considerably faster, but I'm not trying to enter a competition).
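The "considerably faster" route is usually exponentiation by repeated squaring, which needs about log2(exp) iterations instead of exp. The sketch below is one way to write it, with illustrative names; it assumes mod > 0 and that (mod - 1)^2 fits in a 64-bit unsigned integer. The plain loop is kept alongside for comparison.

```cpp
#include <cassert>
#include <cstdint>

// Computes (base^exp) % mod by repeated squaring.
// Assumes mod > 0 and (mod - 1)^2 fits in std::uint64_t.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    assert(mod > 0);
    std::uint64_t result = 1 % mod;  // also correct when mod == 1
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % mod;  // use this bit of exp
        base = (base * base) % mod;                   // square for the next bit
        exp >>= 1;
    }
    return result;
}

// The straightforward loop: one multiplication per unit of exp.
std::uint64_t pow_mod_naive(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    assert(mod > 0);
    std::uint64_t result = 1 % mod;
    for (std::uint64_t i = 0; i < exp; ++i) result = (result * (base % mod)) % mod;
    return result;
}
```

For 2014^2014 modulo 153, the squaring version performs 11 loop iterations where the plain loop performs 2014.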

c++ how to express a mathematical term

I have the following:
I only want (for now) to express the (s 1), (s 2) terms.
For example, (s 1) = s, (s 2) = s(s-1)/2!, (s 3) = s(s-1)(s-2)/3!.
I created a factorial function:
//compute factorial
int fact(int x){
if (x==0)
return 1;
else
return fact(x-1)*x;
}
and I have a problem with how to implement the above correctly.
.....
double s=(z-x[1])/h;
double s_term=0;
for (int p=1;p<=n;p++){
if (p==1)
s_term=s;
else
s_term=s*(s-p)/fact(p+1);
}
Also, s is defined as s = (x - x0)/h.
I don't know if I have declared s correctly above (I use x1 in the declaration because this is my starting point).
Thank you!
You can calculate the binomial coefficient for integer arguments simply using this function (it needs no recursion and only constant memory):
#include <cassert>
unsigned long long ComputeBinomialCoefficient( int n, int k )
{
    // Run-time assert to ensure valid arguments: 0 <= k <= n
    assert( k >= 0 && n >= k );
    // Exploit the symmetry about the line x = n/2:
    if( k > n - k )
        k = n - k;
    unsigned long long c(1);
    // Perform the product over the space i = [1...k]
    for( int i = 1; i < k+1; i++ )
    {
        c *= n - (k - i); // multiply first: the running product is divisible by i
        c /= i;
    }
    return c;
}
You can then just call this when you see the brackets. (I'm assuming that is the Binomial Coefficient, rather than a 2D column vector?). This technique only uses 2 variables internally (taking up a grand total of 12 bytes), and uses no recursion.
Hope this helps! :)
EDIT: I'm curious how you're going to do the (I assume laplacian) operator? Are you intending to do the forward difference method for discrete values of x, and then calculate the 2nd derivative using the results from the first, then take the quotient?
The factorial part will be much more efficient using a loop rather than recursion.
As for the binomial coefficients, the line:
s_term=s*(s-p)/fact(p+1);
isn't going to have the desired effect, as you're only setting the first and last terms correctly and missing out the (s-1), (s-2), ..., (s-p+1) terms. It's easier to just use:
s_term = fact(s) / (fact(p) * fact(s-p))
for s choose p, provided s is a non-negative integer (fact takes an int, so this does not cover a fractional s).
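Because s in the question is a double computed from (z - x[1])/h, it will generally not be an integer, so a factorial of s is not available. One way around this, sketched below under that assumption, is to build the product s(s-1)...(s-p+1)/p! one factor at a time. The function name `binomial_term` is made up for illustration.

```cpp
#include <cassert>

// Generalized binomial coefficient "s choose p" for real s and integer p >= 0:
//   s (s-1) ... (s-p+1) / p!
// built up one factor at a time, so no factorial is ever formed.
double binomial_term(double s, int p) {
    assert(p >= 0);
    double term = 1.0;  // "s choose 0" is 1
    for (int i = 0; i < p; ++i) {
        term *= (s - i) / (i + 1);  // next numerator factor over next denominator factor
    }
    return term;
}
```

Inside the original loop over p, each s_term could then be obtained as binomial_term(s, p), or updated incrementally from the previous term by one more multiplication.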
As others have pointed out, implementing factorial and binomial coefficient functions is not easy (e.g. overflows lurk everywhere).
If you are interested in reasonable implementations, as opposed to implementing all this yourself, have a look at what is available in GSL (the GNU Scientific Library), which everybody dealing with numerical problems should know of.
#include <gsl/gsl_sf_gamma.h>
double factorial_10 = gsl_sf_fact(10);
double ten_over_four = gsl_sf_choose(10, 4);
Also have a look at the documentation. There are numerous functions returning the log instead of the value to avoid overflow problems.