I do a simulation with a lot of particles (up to 100000) in periodic domain(box), and in order particles to stay inside the box, I use modulo function with float or double numbers.
In Matlab everything works great with mod function. However in C++ I found out, that function fmod is not completely equal to Matlab's mod function:
mod(-0.5,10)=9.5 - I want this result in C++
fmod(-0.5,10)=-0.5 - I don't want this.
I, of course, can solve my problem with if statements. However, i think, it will affect efficiency (if statement in critical loop). Is there a way to implement this function without if statement? May be some other function?
Thanks.
Just use a conditional. It will not meaningfully affect efficiency.
inline double realmod (x, y)
{
result = fmod(x, y);
return result >= 0 ? result : result + y;
}
fmod() calls assembly instruction FPREM which takes 16-64 cycles (according to the Pentium manual, http://www.intel.com/design/pentium/manuals/24143004.pdf). The jump instructions for the conditional and the floating point addition only amount to 5 or so.
When your code has floating point division, you don't need to sweat the small stuff.
Either use floor and regular division:
float modulo(float a, float q)
{
float b = a / q;
return (b - floor(b)) * q;
}
or you can add the divisor to the result of fmod without branching:
float modulo(float a, float q)
{
float m = fmod(a, q);
return m + q * (m < 0.f);
}
Based on Matlab mod(a, m) documentation and #QuestionC's answer -
A general solution that behaves exactly like Matlab - also for negative and zero divisor.
Tested against multiple values :
static inline double MatlabMod(double q, double m)
{
if(m == 0)
return q;
double result = fmod(q, m);
return ((result >= 0 && m > 0) || (q <= 0 && m < 0)) ? result : (result + m);
}
Tested with matlab for :
(q, m) -> result
(54, 321) -> 54
(-50, 512) -> 462
(54, -152) -> -98
(-53, -500) -> -53
(-500, 300) -> 100
(-5000, 400) -> 200
(-1000, -360) -> -280
(500, 360) -> 140
(1000, 360) -> 280
(-1000, 360) -> 80
(-5051, 0) -> -5051
(512, 0) -> 512
(0, 52) -> 0
(0, -58) -> 0
Just add the divisor to the number you want to keep in the interval before you apply the modulo operator:
return fmod(a+q,q);
this requires no branching at all.
If you have to worry about a exeeding -q between two updates, you can make it more robust by e.g.:
return fmod(a+q*10,q);
which would work for a >= -10*q
The most straightforward with working for both floats and ints without any branching.
// b = MOD(a, m)
int a = a - m * floor(a / m) + m;
int b = a - m * floor(a / m);
Related
I want to compute nCk mod m with following constraints:
n<=10^18
k<=10^5
m=10^9+7
I have read this article:
Calculating Binomial Coefficient (nCk) for large n & k
But here value of m is 1009. Hence using Lucas theorem, we need only to calculate 1009*1009 different values of aCb where a,b<=1009
How to do it with above constraints.
I cannot make a array of O(m*k) space complexity with given constraints.
Help!
The binominal coefficient of (n, k) is calculated by the formula:
(n, k) = n! / k! / (n - k)!
To make this work for large numbers n and k modulo m observe that:
Factorial of a number modulo m can be calculated step-by-step, in
each step taking the result % m. However, this will be far too slow with n up to 10^18. So there are faster methods where the complexity is bounded by the modulo, and you can use some of those.
The division (a / b) mod m is equal to (a * b^-1) mod m, where b^-1 is the inverse of b modulo m (that is, (b * b^-1 = 1) mod m).
This means that:
(n, k) mod m = (n! * (k!)^-1 * ((n - k)!)^-1) mod m
The inverse of a number can be efficiently found using the Extended Euclidean algorithm. Assuming you have the factorial calculation sorted out, the rest of the algorithm is straightforward, just watch out for integer overflows on multiplication. Here's reference code that works up to n=10^9. To handle for larger numbers the factorial computation should be replaced with a more efficient algorithm and the code should be slightly adapted to avoid integer overflows, but the main idea will remain the same:
#define MOD 1000000007
// Extended Euclidean algorithm
int xGCD(int a, int b, int &x, int &y) {
if (b == 0) {
x = 1;
y = 0;
return a;
}
int x1, y1, gcd = xGCD(b, a % b, x1, y1);
x = y1;
y = x1 - (long long)(a / b) * y1;
return gcd;
}
// factorial of n modulo MOD
int modfact(int n) {
int result = 1;
while (n > 1) {
result = (long long)result * n % MOD;
n -= 1;
}
return result;
}
// multiply a and b modulo MOD
int modmult(int a, int b) {
return (long long)a * b % MOD;
}
// inverse of a modulo MOD
int inverse(int a) {
int x, y;
xGCD(a, MOD, x, y);
return x;
}
// binomial coefficient nCk modulo MOD
int bc(int n, int k)
{
return modmult(modmult(modfact(n), inverse(modfact(k))), inverse(modfact(n - k)));
}
Just use the fact that
(n, k) = n! / k! / (n - k)! = n*(n-1)*...*(n-k+1)/[k*(k-1)*...*1]
so you actually have just 2*k=2*10^5 factors. For the inverse of a number you can use suggestion of kfx since your m is prime.
First, you don't need to pre-compute and store all the possible aCb values! they can be computed per case.
Second, for the special case when (k < m) and (n < m^2), the Lucas theorem easily reduces to the following result:
(n choose k) mod m = ((n mod m) choose k) mod m
then since (n mod m) < 10^9+7 you can simply use the code proposed by #kfx.
We want to compute nCk (mod p). I'll handle when 0 <= k <= p-2, because Lucas's theorem handles the rest.
Wilson's theorem states that for prime p, (p-1)! = -1 (mod p), or equivalently (p-2)! = 1 (mod p) (by division).
By division: (k!)^(-1) = (p-2)!/(k!) = (p-2)(p-3)...(k+1) (mod p)
Thus, the binomial coefficient is n!/(k!(n-k)!) = n(n-1)...(n-k+1)/(k!) = n(n-1)...(n-k+1)(p-2)(p-3)...(k+1) (mod p)
Voila. You don't have to do any inverse computations or anything like that. It's also fairly easy to code. A couple optimizations to consider: (1) you can replace (p-2)(p-3)... with (-2)(-3)...; (2) nCk is symmetric in the sense that nCk = nC(n-k) so choose the half that requires you to do less computations.
I recently wrote a Computer Science exam where they asked us to give a recursive definition for the cos taylor series expansion. This is the series
cos(x) = 1 - x^2/2! + x^4/4! + x^6/6! ...
and the function signature looks as follows
float cos(int n , float x)
where n represents the number in the series the user would like to calculate till and x represents the value of x in the cos function
I obviously did not get that question correct and I have been trying to figure it out for the past few days but I have hit a brick wall
Would anyone be able to help out getting me started somewhere ?
All answers so far recompute the factorial every time. I surely wouldn't do that. Instead you can write :
float cos(int n, float x)
{
if (n > MAX)
return 1;
return 1 - x*x / ((2 * n - 1) * (2 * n)) * cos(n + 1, x);
}
Consider that cos returns the following (sorry for the dots position) :
You can see that this is true for n>MAX, n=MAX, and so on. The sign alternating and powers of x are easy to see.
Finally, at n=1 you get 0! = 1, so calling cos(1, x) gets you the first MAX terms of the Taylor expansion of cos.
By developing (easier to see when it has few terms), you can see the first formula is equivalent to the following :
For n > 0, you do in cos(n-1, x) a division by (2n-3)(2n-2) of the previous result, and a multiplication by x². You can see that when n=MAX+1 this formula is 1, with n=MAX then it is 1-x²/((2MAX-1)2MAX) and so on.
If you allow yourself helper functions, then you should change the signature of the above to float cos_helper(int n, float x, int MAX) and call it like so :
float cos(int n, float x) { return cos_helper(1, x, n); }
Edit : To reverse the meaning of n from degree of the evaluated term (as in this answer so far) to number of terms (as in the question, and below), but still not recompute the total factorial every time, I would suggest using a two-term relation.
Let us define trivially cos(0,x) = 0 and cos(1,x) = 1, and try to achieve generally cos(n,x) the sum of the n first terms of the Taylor series.
Then for each n > 0, we can write, cos(n,x) from cos(n-1,x) :
cos(n,x) = cos(n-1,x) + x2n / (2n)!
now for n > 1, we try to make the last term of cos(n-1,x) appear (because it is the closest term to the one we want to add) :
cos(n,x) = cos(n-1,x) + x² / ((2n-1)2n) * ( x2n-2 / (2n-2)! )
By combining this formula with the previous one (adapting it to n-1 instead of n) :
cos(n,x) = cos(n-1,x) + x² / ((2n-1)2n) * ( cos(n-1,x) - cos(n-2,x) )
We now have a purely recursive definition of cos(n,x), without helper function, without recomputing the factorial, and with n the number of terms in the sum of the Taylor decomposition.
However, I must stress that the following code will perform terribly :
performance wise, unless some optimization allows to not re-evaluate a cos(n-1,x) that was evaluated at the previous step as cos( (n-1) - 1, x)
precision wise, because of cancellation effects : the precision with which we get x2n-2 / (2n-2)! is very bad
Now this disclaimer is in place, here comes the code :
float cos(int n, float x)
{
if (n < 2)
return n;
float c = x * x / (2 * (n - 1) * 2 * n);
return (1-c) * cos(n-1, x) + c * cos(n-2, x);
}
cos(x)=1 - x^2/2! + x^4/4! - x^6/6! + x^8/8!.....
=1-x^2/2 (1 - x^2/3*4 + x^4/3*4*5*6 -x^6/3*4*5*6*7*8)
=1 - x^2/2 {1- x^2/3*4 (1- x^2/5*6 + x^4/5*6*7*8)}
=1 - x^2/2 [1- x^2/3*4 {1- x^2/5*6 ( 1- x^2/7*8)}]
double cos_series_recursion(double x, int n, double r=1){
if(n>0){
r=1-((x*x*r)/(n*(n-1)));
return cos_series_recursion(x,n-2,r);
}else return r;
}
A simple approach that makes use of static variables:
double cos(double x, int n) {
static double p = 1, f = 1;
double r;
if(n == 0)
return 1;
r = cos(x, n-1);
p = (p*x)*x;
f = f*(2*n-1)*2*n;
if(n%2==0) {
return r+p/f;
} else {
return r-p/f;
}
}
Notice that I'm multiplying 2*n in the operation to get the next factorial.
Having n align to the factorial we need makes this easy to do in 2 operations: f = f * (n - 1) then f = f * n.
when n = 1, we need 2!
when n = 2, we need 4!
when n = 3, we need 6!
So we can safely double n and work from there. We could write:
n = 2*n;
f = f*(n-1);
f = f*n;
If we did this, we would need to update our even/odd check to if((n/2)%2==0) since we're doubling the value of n.
This can instead be written as f = f*(2*n-1)*2*n; and now we don't have to divide n when checking if it's even/odd, since n is not being altered.
You can use a loop or recursion, but I would recommend a loop. Anyway, if you must use recursion you could use something like the code below
#include <iostream>
using namespace std;
int fact(int n) {
if (n <= 1) return 1;
else return n*fact(n-1);
}
float Cos(int n, float x) {
if (n == 0) return 1;
return Cos(n-1, x) + (n%2 ? -1 : 1) * pow (x, 2*n) / (fact(2*n));
}
int main()
{
cout << Cos(6, 3.14/6);
}
Just do it like the sum.
The parameter n in float cos(int n , float x) is the l and now just do it...
Some pseudocode:
float cos(int n , float x)
{
//the sum-part
float sum = pow(-1, n) * (pow(x, 2*n))/faculty(2*n);
if(n <= /*Some predefined maximum*/)
return sum + cos(n + 1, x);
return sum;
}
The usual technique when you want to recurse but the function arguments don't carry the information that you need, is to introduce a helper function to do the recursion.
I have the impression that in the Lisp world the convention is to name such a function something-aux (short for auxiliary), but that may have been just a limited group in the old days.
Anyway, the main problem here is that n represents the natural ending point for the recursion, the base case, and that you then also need some index that works itself up to n. So, that's one good candidate for extra argument for the auxiliary function. Another candidate stems from considering how one term of the series relates to the previous one.
I need to find n!%1000000009.
n is of type 2^k for k in range 1 to 20.
The function I'm using is:
#define llu unsigned long long
#define MOD 1000000009
llu mulmod(llu a,llu b) // This function calculates (a*b)%MOD caring about overflows
{
llu x=0,y=a%MOD;
while(b > 0)
{
if(b%2 == 1)
{
x = (x+y)%MOD;
}
y = (y*2)%MOD;
b /= 2;
}
return (x%MOD);
}
llu fun(int n) // This function returns answer to my query ie. n!%MOD
{
llu ans=1;
for(int j=1; j<=n; j++)
{
ans=mulmod(ans,j);
}
return ans;
}
My demand is such that I need to call the function 'fun', n/2 times. My code runs too slow for values of k around 15. Is there a way to go faster?
EDIT:
In actual I'm calculating 2*[(i-1)C(2^(k-1)-1)]*[((2^(k-1))!)^2] for all i in range 2^(k-1) to 2^k. My program demands (nCr)%MOD caring about overflows.
EDIT: I need an efficient way to find nCr%MOD for large n.
The mulmod routine can be speeded up by a large factor K.
1) '%' is overkill, since (a + b) are both less than N.
- It's enough to evaluate c = a+b; if (c>=N) c-=N;
2) Multiple bits can be processed at once; see optimization to "Russian peasant's algorithm"
3) a * b is actually small enough to fit 64-bit unsigned long long without overflow
Since the actual problem is about nCr mod M, the high level optimization requires using the recurrence
(n+1)Cr mod M = (n+1)nCr / (n+1-r) mod M.
Because the left side of the formula ((nCr) mod M)*(n+1) is not divisible by (n+1-r), the division needs to be implemented as multiplication with the modular inverse: (n+r-1)^(-1). The modular inverse b^(-1) is b^(M-1), for M being prime. (Otherwise it's b^(phi(M)), where phi is Euler's Totient function.)
The modular exponentiation is most commonly implemented with repeated squaring, which requires in this case ~45 modular multiplications per divisor.
If you can use the recurrence
nC(r+1) mod M = nCr * (n-r) / (r+1) mod M
It's only necessary to calculate (r+1)^(M-1) mod M once.
Since you are looking for nCr for multiple sequential values of n you can make use of the following:
(n+1)Cr = (n+1)! / ((r!)*(n+1-r)!)
(n+1)Cr = n!*(n+1) / ((r!)*(n-r)!*(n+1-r))
(n+1)Cr = n! / ((r!)*(n-r)!) * (n+1)/(n+1-r)
(n+1)Cr = nCr * (n+1)/(n+1-r)
This saves you from explicitly calling the factorial function for each i.
Furthermore, to save that first call to nCr you can use:
nC(n-1) = n //where n in your case is 2^(k-1).
EDIT:
As Aki Suihkonen pointed out, (a/b) % m != a%m / b%m. So the method above so the method above won't work right out of the box. There are two different solutions to this:
1000000009 is prime, this means that a/b % m == a*c % m where c is the inverse of b modulo m. You can find an explanation of how to calculate it here and follow the link to the Extended Euclidean Algorithm for more on how to calculate it.
The other option which might be easier is to recognize that since nCr * (n+1)/(n+1-r) must give an integer, it must be possible to write n+1-r == a*b where a | nCr and b | n+1 (the | here means divides, you can rewrite that as nCr % a == 0 if you like). Without loss of generality, let a = gcd(n+1-r,nCr) and then let b = (n+1-r) / a. This gives (n+1)Cr == (nCr / a) * ((n+1) / b) % MOD. Now your divisions are guaranteed to be exact, so you just calculate them and then proceed with the multiplication as before. EDIT As per the comments, I don't believe this method will work.
Another thing I might try is in your llu mulmod(llu a,llu b)
llu mulmod(llu a,llu b)
{
llu q = a * b;
if(q < a || q < b) // Overflow!
{
llu x=0,y=a%MOD;
while(b > 0)
{
if(b%2 == 1)
{
x = (x+y)%MOD;
}
y = (y*2)%MOD;
b /= 2;
}
return (x%MOD);
}
else
{
return q % MOD;
}
}
That could also save some precious time.
I often smooth values by blending percentages and inverse percentages with the below:
current_value = (current_value * 0.95f) + (goal_value * 0.05f)
I'm running into a situation where I would like to perform this action n times, and n is a floating point value.
What would be the proper way of performing the above, say 12.5 times for example?
One way of doing this would be to handle the integer amount, and then approximate the remaining amount. For example (I assume valid inputs, you would want to check for those):
void Smooth(float& current, float goal, float times, float factor){
// Handle the integer steps;
int numberIterations = (int)times;
for (int i = 0; i < numberIterations; ++i){
current = (factor * current) + (goal * (1 - factor));
}
// Aproximate the rest of the step
float remainingIteration = times - numberIterations;
float adjusted_factor = factor + ((1 - factor) * (1 - remainingIteration));
current = (adjusted_factor * current) + (goal * (1 - adjusted_factor));
}
Running for the following values I get:
current=1 goal=2 factor=0.95
12.0 times - 1.45964
12.5 times - 1.47315
13.0 times - 1.48666
I appreciate the help! I have been trying several things related to compound interest, and I believe I may have solved this with the following. My ultimate goal in mind (which was actually unstated here) was to actually do this with very little iterative processing. powf() may be the most time consuming piece here.
float blend_n(float c, float g, float f, float n)
{
if (g != 0.0f)
return c + ((g - c) / g) * (g - g * powf(1.0f - f, n));
else
return c * powf(1.0 - f, n);
}
It's late here, and my redbull is wearing off so there may be some parts that could be factored out.
Usage would be setting c to the return value of blend_n ...
Thanks again!
[EDIT]
I should explain here that c is the (current) value, g is the (goal) value, f is the (factor), and n is the (number of steps)
[/EDIT]
[EDIT2]
An exception has to be made for goal values of 0, as it will result in a NaN (Not a Number) ... Change done to the code above
[/EDIT2]
Given integer values x and y, C and C++ both return as the quotient q = x/y the floor of the floating point equivalent. I'm interested in a method of returning the ceiling instead. For example, ceil(10/5)=2 and ceil(11/5)=3.
The obvious approach involves something like:
q = x / y;
if (q * y < x) ++q;
This requires an extra comparison and multiplication; and other methods I've seen (used in fact) involve casting as a float or double. Is there a more direct method that avoids the additional multiplication (or a second division) and branch, and that also avoids casting as a floating point number?
For positive numbers where you want to find the ceiling (q) of x when divided by y.
unsigned int x, y, q;
To round up ...
q = (x + y - 1) / y;
or (avoiding overflow in x+y)
q = 1 + ((x - 1) / y); // if x != 0
For positive numbers:
q = x/y + (x % y != 0);
Sparky's answer is one standard way to solve this problem, but as I also wrote in my comment, you run the risk of overflows. This can be solved by using a wider type, but what if you want to divide long longs?
Nathan Ernst's answer provides one solution, but it involves a function call, a variable declaration and a conditional, which makes it no shorter than the OPs code and probably even slower, because it is harder to optimize.
My solution is this:
q = (x % y) ? x / y + 1 : x / y;
It will be slightly faster than the OPs code, because the modulo and the division is performed using the same instruction on the processor, because the compiler can see that they are equivalent. At least gcc 4.4.1 performs this optimization with -O2 flag on x86.
In theory the compiler might inline the function call in Nathan Ernst's code and emit the same thing, but gcc didn't do that when I tested it. This might be because it would tie the compiled code to a single version of the standard library.
As a final note, none of this matters on a modern machine, except if you are in an extremely tight loop and all your data is in registers or the L1-cache. Otherwise all of these solutions will be equally fast, except for possibly Nathan Ernst's, which might be significantly slower if the function has to be fetched from main memory.
You could use the div function in cstdlib to get the quotient & remainder in a single call and then handle the ceiling separately, like in the below
#include <cstdlib>
#include <iostream>
int div_ceil(int numerator, int denominator)
{
std::div_t res = std::div(numerator, denominator);
return res.rem ? (res.quot + 1) : res.quot;
}
int main(int, const char**)
{
std::cout << "10 / 5 = " << div_ceil(10, 5) << std::endl;
std::cout << "11 / 5 = " << div_ceil(11, 5) << std::endl;
return 0;
}
There's a solution for both positive and negative x but only for positive y with just 1 division and without branches:
int div_ceil(int x, int y) {
return x / y + (x % y > 0);
}
Note, if x is positive then division is towards zero, and we should add 1 if reminder is not zero.
If x is negative then division is towards zero, that's what we need, and we will not add anything because x % y is not positive
How about this? (requires y non-negative, so don't use this in the rare case where y is a variable with no non-negativity guarantee)
q = (x > 0)? 1 + (x - 1)/y: (x / y);
I reduced y/y to one, eliminating the term x + y - 1 and with it any chance of overflow.
I avoid x - 1 wrapping around when x is an unsigned type and contains zero.
For signed x, negative and zero still combine into a single case.
Probably not a huge benefit on a modern general-purpose CPU, but this would be far faster in an embedded system than any of the other correct answers.
I would have rather commented but I don't have a high enough rep.
As far as I am aware, for positive arguments and a divisor which is a power of 2, this is the fastest way (tested in CUDA):
//example y=8
q = (x >> 3) + !!(x & 7);
For generic positive arguments only, I tend to do it like so:
q = x/y + !!(x % y);
This works for positive or negative numbers:
q = x / y + ((x % y != 0) ? !((x > 0) ^ (y > 0)) : 0);
If there is a remainder, checks to see if x and y are of the same sign and adds 1 accordingly.
simplified generic form,
int div_up(int n, int d) {
return n / d + (((n < 0) ^ (d > 0)) && (n % d));
} //i.e. +1 iff (not exact int && positive result)
For a more generic answer, C++ functions for integer division with well defined rounding strategy
For signed or unsigned integers.
q = x / y + !(((x < 0) != (y < 0)) || !(x % y));
For signed dividends and unsigned divisors.
q = x / y + !((x < 0) || !(x % y));
For unsigned dividends and signed divisors.
q = x / y + !((y < 0) || !(x % y));
For unsigned integers.
q = x / y + !!(x % y);
Zero divisor fails (as with a native operation). Cannot cause overflow.
Corresponding floored and modulo constexpr implementations here, along with templates to select the necessary overloads (as full optimization and to prevent mismatched sign comparison warnings):
https://github.com/libbitcoin/libbitcoin-system/wiki/Integer-Division-Unraveled
Compile with O3, The compiler performs optimization well.
q = x / y;
if (x % y) ++q;