I am doing this coding question where they ask you to enter numbers N and M, and you are supposed to output the Nth fibonacci number mod M. My code runs rather slowly and I would like to learn how to speed it up.
#include<bits/stdc++.h>
using namespace std;
long long fib(long long N)
{
if (N <= 1)
return N;
return fib(N-1) + fib(N-2);
}
int main ()
{
long long N;
cin >> N;
long long M;
cin >> M;
long long b;
b = fib(N) % M;
cout << b;
getchar();
return 0;
}
While the program you wrote is pretty much the go-to example of recursion in education, it is really a pretty damn bad algorithm as you have found out. Try to write up the call tree for fib(7) and you will find that the number of calls you make balloons dramatically.
There are many ways of speeding it up and keeping it from recalculating the same values over and over. Somebody already linked to a bunch of algorithms in the comments - a simple loop can easily make it linear in N instead of exponential.
One problem with this, though, is that Fibonacci numbers grow pretty fast: you can hold fib(93) in an unsigned 64-bit integer, but fib(94) overflows it (for a signed 64-bit integer the limit is fib(92)).
However, you don't want the N'th fibonacci number - you want the N'th mod M. This changes the challenge a bit, because as long as M is smaller than MAX_INT_64 / 2 then you can calculate fib(N) mod M for any N.
Turn your attention to Modular arithmetic and the congruence relations. Specifically the one for addition, which says (changed to C++ syntax and simplified a bit):
If a1 % m == b1 and a2 % m == b2 then (a1 + a2) % m == (b1 + b2) % m
Or, to give an example: 17 % 3 == 2, 22 % 3 == 1 => (17 + 22) % 3 == (2 + 1) % 3 == 3 % 3 == 0
This means that you can put the modulo operator into the middle of your algorithm, so that you never add big numbers together and never overflow. This way you can easily calculate, for example, fib(10000) mod 237.
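A minimal sketch of that idea, assuming fib(0) = 0, fib(1) = 1 and a modulus between 1 and 2^63 so that the sum of two residues still fits in 64 bits. The name fib_mod is hypothetical and not part of the question's code:

```cpp
#include <cstdint>

// Hypothetical helper: returns fib(n) % m, with fib(0) = 0 and fib(1) = 1.
// Assumes 1 <= m <= 2^63, so the sum of two residues fits in 64 bits.
std::uint64_t fib_mod(std::uint64_t n, std::uint64_t m) {
    std::uint64_t a = 0;      // fib(0) mod m
    std::uint64_t b = 1 % m;  // fib(1) mod m (0 when m == 1)
    for (std::uint64_t i = 0; i < n; ++i) {
        std::uint64_t next = (a + b) % m;  // both operands are below m
        a = b;
        b = next;
    }
    return a;  // after n steps, a holds fib(n) mod m
}
```

Because every stored value stays below m, the loop never overflows no matter how large n is; it is still linear in n.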
There is one simple optimization: call fib without calculating duplicate values. Using a loop instead of recursion may also speed up the process:
int fib(int N) {
    int f0 = 0; // fib(0)
    int f1 = 1; // fib(1)
    for (int i = 0; i < N; i++) {
        int tmp = f0 + f1;
        f0 = f1;
        f1 = tmp;
    }
    // After N steps f0 holds fib(N) and f1 holds fib(N + 1).
    return f0;
}
You can apply the modulo operator suggested by @Frodyne on top of this.
The first observation is that you can turn the recursion into a simple loop:
#include <cstdint>
std::uint64_t fib(std::uint16_t n) {
if (!n)
return 0;
std::uint64_t result[]{ 0,1 };
bool select = 1;
for (auto i = 1; i < n; ++i , select=!select)
{
result[!select] += result[select];
};
return result[select];
};
Next, you can memoize it:
#include <cstdint>
#include <vector>
std::uint64_t fib(std::uint16_t n) {
static std::vector<std::uint64_t> result{0,1};
if (result.size()>n)
return result[n];
std::uint64_t back[]{ result.crbegin()[1],result.back() };
bool select = 1;
result.reserve(n + 1);
for (auto i=result.size(); i < result.capacity();++i, select = !select)
result.push_back(back[!select] += back[select]);
return result[n];
};
Another option would be an algebraic formula.
cheers,
FM.
Related
I am new to C++ and I have to optimise this code so that it gets executed within 1.5 secs for Input values upto 10^6.
But this code takes 3.52 seconds for input 10^6 to get executed.
I had tried a lot and came up with this code.
#include <iostream>
#include <iostream>
#include <vector>
#include <bits/stdc++.h>
#include <iterator>
#include <utility>
#include <boost/multiprecision/cpp_int.hpp>
using boost::multiprecision::cpp_int;
using namespace std;
cpp_int gcd(cpp_int a, cpp_int b)
{
// Everything divides 0
if (a == 0)
return b;
if (b == 0)
return a;
// base case
if (a == b)
return a;
// a is greater
if (a > b)
return gcd(a - b, b);
return gcd(a, b - a);
}
int main()
{
ios_base::sync_with_stdio(false);
cin.tie(NULL);
cpp_int t, ans;
std::cin >> t;
while (t-- > 0) {
cpp_int k;
std::cin >> k;
cpp_int limit = (2 * k) + 1;
ans = 0;
vector<cpp_int> g1;
for (int i = 1; i <= limit; ++i)
g1.push_back(k + (i * i));
for (int i = 0; i <= (2 * k) - 1; ++i) {
ans += gcd(g1[i], g1[i + 1]);
}
std::cout << ans << std::endl;
}
return 0;
}
Constraints --> 1 ≤ t ≤ 10^6
& 1 ≤ k ≤ 10^6
Replace std::endl with '\n'. std::endl flushes the output stream, which is not a cheap operation and throws away the advantages of buffered output.
Call g1.reserve(MAX_G1_SIZE) right after declaring g1. A vector is a dynamic array: as soon as its capacity becomes insufficient, a larger block of memory is allocated on the heap and the existing elements are copied there. Each such reallocation is linear in the current size, although it happens rarely. You can avoid it by telling the vector up front how much to reserve, where MAX_G1_SIZE is the final size of g1. In this program that is 2 * k + 1, since one element is pushed for each i from 1 to limit.
Make gcd not recursive. This usually doesn't speed things up much, but it may help.
You do not reserve memory in advance, so you suffer from continuous re-allocations, especially as you have a locally created std::vector.
Keeping the vector outside of the loop lets you reuse previously allocated memory (you'd simply clear the vector). However, it is pretty simple to get along without an additional vector at all:
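cpp_int n = k + 1; // k + 1 * 1
for (int i = 2; i <= limit; ++i)
{
    // Convert i before multiplying: with a 32-bit int, i * i overflows once i exceeds 46340.
    // If multiplication is costly, m = n + 2 * i - 1 gives the same value.
    cpp_int m = k + cpp_int(i) * i;
    ans += gcd(n, m);
    n = m;
}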
I'm not sure how expensive the modulo operator is in Boost's multiprecision library, but I'd expect a further improvement from using it instead of subtraction:
return b == 0 ? a : gcd(b, a % b);
Modulo reaches the gcd in far fewer steps, and you also save some ifs (if a is 0 or smaller than b on the first call, you get one additional recursion that only swaps the two values).
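Putting these suggestions together, here is a sketch that uses built-in 64-bit integers instead of cpp_int. That swap is an assumption on my part: with k ≤ 10^6 the largest term k + (2k+1)^2 is about 4 * 10^12 and the sum stays below 2^64, so arbitrary precision does not seem to be needed. The names gcd_iter and gcd_sum are hypothetical:

```cpp
#include <cstdint>

// Iterative Euclidean algorithm using the modulo operator.
std::uint64_t gcd_iter(std::uint64_t a, std::uint64_t b) {
    while (b != 0) {
        std::uint64_t r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Sum of gcd(k + i*i, k + (i+1)*(i+1)) for i = 1 .. 2k, as in the question.
// Assumes k <= 10^6 so that every term and the total fit in 64 bits.
std::uint64_t gcd_sum(std::uint64_t k) {
    std::uint64_t total = 0;
    std::uint64_t prev = k + 1;  // k + 1*1
    for (std::uint64_t i = 2; i <= 2 * k + 1; ++i) {
        std::uint64_t cur = prev + 2 * i - 1;  // k + i*i without multiplying
        total += gcd_iter(prev, cur);
        prev = cur;
    }
    return total;
}
```

No vector is allocated at all, and each gcd call takes a logarithmic number of steps instead of a number proportional to the operands.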
In my code I am trying to multiply two numbers. The formula is simply k * (k-1)^(n-1). I store the power (k-1)^(n-1) in variable p1 and then multiply it by k. For n=10, k=10, (k-1)^(n-1) should be 387420489, and that is what I get in p1, but after multiplying by k I get a negative number. I used modulus, but instead of 3874204890 I get some other large positive number. What is the correct approach?
#include <iostream>
using namespace std;
typedef long long ll;
ll big = 1000000000 + 7;
ll multiply(ll a, ll b)
{
ll ans = 1;
for (int i = 1; i <= b; i++)
ans = ans * a;
return ans % big;
}
int main()
{
int t;
scanf("%d", &t);
while (t--)
{
ll n, k;
cin >> n >> k;
ll p1 = multiply(k - 1, n - 1);
cout << p1 << endl; // this gives correct value
ll p2 = (k % big) * (p1 % big);
cout << ((p2 + big) % big) % big << endl;
}
}
What is the ll type? If it is just int (and I'm pretty sure it is), it overflows, because a 32-bit signed type can't store values greater than 2^31 - 1, which is roughly 2 * 10^9. You can use long long int instead; then your code will work for results below 2^63.
It's not surprising you get an overflow. I plugged your equation into wolfram alpha, fixing n at 10 and iterating over k from 0 to 100.
The curve gets very vertical, very quickly at around k = 80.
10^21 requires 70 binary bits to represent it, and you only have 63 in a long long.
You're going to have to decide what the limits of this algorithm's parameters are and pick data types accordingly. Perhaps a double would be more suitable?
link to plot is here
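If the answer is wanted modulo 10^9 + 7, as the question's code suggests, one way to keep every intermediate value in range is to reduce after each multiplication rather than once at the end. The sketch below is only an illustration: mod_pow and count_mod are hypothetical names, the square-and-multiply loop is swapped in for the question's linear loop, and n ≥ 1 is assumed.

```cpp
#include <cstdint>

const std::uint64_t MOD = 1000000007ULL;

// (base^exp) % MOD by repeated squaring. Every product is reduced
// immediately, so intermediates stay below MOD^2, which is less than 2^64.
std::uint64_t mod_pow(std::uint64_t base, std::uint64_t exp) {
    std::uint64_t result = 1;
    base %= MOD;
    while (exp > 0) {
        if (exp & 1) {
            result = result * base % MOD;
        }
        base = base * base % MOD;
        exp >>= 1;
    }
    return result;
}

// k * (k-1)^(n-1) mod MOD, the expression from the question. Assumes n >= 1.
std::uint64_t count_mod(std::uint64_t n, std::uint64_t k) {
    std::uint64_t k_minus_1 = (k % MOD + MOD - 1) % MOD;  // avoids underflow for k == 0
    return (k % MOD) * mod_pow(k_minus_1, n - 1) % MOD;
}
```

For n = 10 and k = 10 the exact value 3874204890 does not fit in 32 bits, and this returns its residue, 874204869.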
I am trying to solve a problem in which I need to find the number of possible ways to make a team of two members (note: a team can have at most two people).
My code works properly, but in some test cases it shows a floating point error and I can't find out what exactly it is.
Input: 1st line : Number of test cases
2nd line: number of total person
Thank you
#include<iostream>
using namespace std;
long C(long n, long r)
{
long f[n + 1];
f[0] = 1;
for (long i = 1; i <= n; i++)
{
f[i] = i * f[i - 1];
}
return f[n] / f[r] / f[n - r];
}
int main()
{
long n, r, m,t;
cin>>t;
while(t--)
{
cin>>n;
r=1;
cout<<C(n, min(r, n - r))+1<<endl;
}
return 0;
}
You aren't hitting a floating-point problem. You are dividing by zero: your code attempts an integer division by 0, which is undefined. On many POSIX systems an integer division by zero raises SIGFPE, which is reported as a "floating point exception", hence the misleading message.
When you invoke C(100, 1), the values that the main loop stores in the f array inside C grow factorially. Eventually two values are multiplied such that i * f[i-1] wraps to zero because of overflow. Every subsequent f[i] is then zero as well, and the division that follows the loop divides by zero.
Although purists on these forums will say this is undefined, here's what's really happening on most 2's complement architectures. Or at least on my computer....
At i==21:
f[20] is already equal to 2432902008176640000
21 * 2432902008176640000 overflows for 64-bit signed, and will typically become -4249290049419214848 So at this point, your program is bugged and is now in undefined behavior.
At i==66
f[65] is equal to 0x8000000000000000. So 66 * f[65] gets calculated as zero for reasons that make sense to me, but should be understood as undefined behavior.
With f[66] assigned to 0, all subsequent assignments of f[i] become zero as well. After the main loop inside C is over, the f[n-r] is zero. Hence, divide by zero error.
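To reproduce that observation without relying on undefined behavior, the sketch below repeats the factorial loop with an unsigned 64-bit type, whose arithmetic is defined to wrap modulo 2^64. The function name first_zero_factorial is hypothetical:

```cpp
#include <cstdint>

// Returns the smallest i for which i! is congruent to 0 modulo 2^64.
// Unsigned arithmetic wraps modulo 2^64 by definition, so unlike the
// signed version this loop has well-defined behavior.
int first_zero_factorial() {
    std::uint64_t f = 1;
    for (int i = 1; i <= 100; ++i) {
        f *= static_cast<std::uint64_t>(i);
        if (f == 0) {
            return i;
        }
    }
    return -1;  // not reached with 64-bit integers
}
```

It returns 66, because 66! is the first factorial containing at least 64 factors of 2.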
Update
I went back and reverse engineered your problem. It seems like your C function is just trying to compute this expression:
N!
-------------
R! * (N-R)!
Which is the "number of unique sorted combinations"
In which case, instead of computing the large factorial N!, we can reduce that expression to this:
 (n-r+1) * (n-r+2) * ... * n
-----------------------------
             r!
This won't eliminate overflow, but will allow your C function to be able to take on larger values of N and R to compute the number of combinations without error.
But we can also take advantage of simple reduction before trying to do a big long factorial expression
For example, let's say we were trying to compute C(15,5). Mathematically that is:
15!
--------
10! 5!
Or as we expressed above:
1*2*3*4*5*6*7*8*9*10*11*12*13*14*15
-----------------------------------
1*2*3*4*5*6*7*8*9*10 * 1*2*3*4*5
The first 10 factors of the numerator and denominator cancel each other out:
11*12*13*14*15
-----------------------------------
1*2*3*4*5
But intuitively, you can see that "12" in the numerator is already evenly divisible by denominators 2 and 3. And that 15 in the numerator is evenly divisible by 5 in the denominator. So simple reduction can be applied:
11*2*13*14*3
-----------------------------------
1 * 4
There's even more room for greatest common divisor reduction, but this is a great start.
Let's start with a helper function that computes the product of all the values in a list.
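#include <climits>
#include <iostream>
#include <vector>

// Returns the product of all values (assumed non-negative),
// or 0 if the product would overflow a long long.
long long multiply_vector(const std::vector<int>& values)
{
    long long result = 1;
    for (long long v : values)
    {
        // Check before multiplying: signed overflow is undefined behavior,
        // so testing for a negative result afterwards is not reliable.
        if (v > 0 && result > LLONG_MAX / v)
        {
            std::cout << "ERROR - multiply_vector hit overflow" << std::endl;
            return 0;
        }
        result = result * v;
    }
    return result;
}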
Now let's implement C using the above function after doing the reduction operation:
long long C(int n, int r)
{
if ((r >= n) || (n < 0) || (r < 0))
{
std::cout << "invalid parameters passed to C" << std::endl;
return 0;
}
// compute
// n!
// -------------
// r! * (n-r)!
//
// assume (r < n)
// Which maps to
//   (n-r+1) * (n-r+2) * ... * n
//   ---------------------------
//               r!
int end = n;
int start = n - r + 1;
std::vector<int> numerators;
std::vector<int> denominators;
long long numerator = 1;
long long denominator = 1;
for (int i = start; i <= end; i++)
{
numerators.push_back(i);
}
for (int i = 2; i <= r; i++)
{
denominators.push_back(i);
}
size_t n_length = numerators.size();
size_t d_length = denominators.size();
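for (size_t ni = 0; ni < n_length; ni++)
{
    for (size_t di = 0; di < d_length; di++)
    {
        int dval = denominators[di];
        // Re-read numerators[ni] on every pass: it may already have been
        // reduced by an earlier denominator. Skip denominators already used.
        if ((dval > 1) && ((numerators[ni] % dval) == 0))
        {
            denominators[di] = 1;
            numerators[ni] /= dval;
        }
    }
}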
numerator = multiply_vector(numerators);
denominator = multiply_vector(denominators);
if ((numerator == 0) || (denominator == 0))
{
std::cout << "Giving up. Can't resolve overflow" << std::endl;
return 0;
}
long long result = numerator / denominator;
return result;
}
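As an alternative to the vector-based reduction, here is a sketch of the multiplicative formula, which multiplies and divides in alternation so the running value is always itself a binomial coefficient. This is a different technique from the code above, the name choose is hypothetical, and it assumes both the answer and the intermediate product fit in 64 bits:

```cpp
#include <cstdint>

// n choose r using result = result * (n - r + i) / i.
// After step i, result equals C(n - r + i, i), so each division is exact.
// Assumes the answer and the intermediate product fit in 64 bits.
std::uint64_t choose(std::uint64_t n, std::uint64_t r) {
    if (r > n) {
        return 0;
    }
    if (r > n - r) {
        r = n - r;  // C(n, r) == C(n, n - r), so use the shorter loop
    }
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= r; ++i) {
        result = result * (n - r + i) / i;
    }
    return result;
}
```

With the symmetry step, a call such as choose(1000000000, 999999999) runs a single iteration instead of building a billion-element array.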
You are not using floating-point. And you seem to be using variable sized arrays, which is a C feature and possibly a C++ extension but not standard.
Anyway, you will get overflow and therefore undefined behaviour even for rather small values of n.
In practice the overflow will lead to array elements becoming zero for not much larger values of n.
Your code will then divide by zero and crash.
They also might have a test case like (1000000000, 999999999) which is trivial to solve, but not for your code which I bet will crash.
You don't specify what you mean by "floating point error" - I reckon you are referring to the fact that you are doing an integer division rather than a floating point one so that you will always get integers rather than floats.
int a, b;
a = 7;
b = 2;
std::cout << a / b << std::endl;
This prints 3, not 3.5. If you want a floating-point result, you should use floats instead, like this:
float a, b;
a = 7;
b = 2;
std::cout << a / b << std::endl;
So the solution to your problem would simply be to use float instead of long long int.
Note also that you are using variable-length arrays, which are not standard C++ (some compilers accept them as an extension). Why not use std::vector instead?
Array syntax:
type name[size]
Note: size must be a constant, not a variable.
Example #1:
int name[10];
Example #2:
const int asize = 10;
int name[asize];
I have found a formula that solves a problem but I can't make it work for large numbers. The n-th factor would be the (n-1)-th factor + (n-1)*(n-1) + n * n
So I wrote this function:
inline long long int formula(long long int n)
{
if(n==1)return 1;
return formula(n-1)+(n-1)*(n-1)+n*n;
}
and since the answer has to be calculated modulo 666013, I added this (MOD=666013):
inline long long int formula(long long int n)
{
if(n==1)return 1;
return ((formula(n-1)%MOD+(1LL*(n-1)*(n-1))%MOD)%MOD+(1LL*n*n)%MOD)%MOD;
}
I probably didn't use modulo correctly. My function has to work for numbers as large as 2.000.000.000 and it stops working at about 30.000
EDIT: I've tried using a loop and I still can't make it work for numbers larger than 20.000.000. This is what I'm using:
ans=1;
for(i=2;i<=n;i++)
{
ans=(ans%MOD+1LL*(i-1)*(i-1)%MOD+1LL*i*i%MOD)%MOD;
}
I don't understand why you are using a recursive function for this. It works for a small number of calls, but if you recurse a few million times it will not. Each call stays on the call stack until the calls it makes return, so a chain that deep exhausts the stack and the program crashes. This is known as a stack overflow.
The best way to overcome this is to use a loop: just iterate up to n (n being the index of the term you want).
Simplify as much as possible in order to be able to see the requirements:
typedef long long val_t;
#define MOD ((val_t) 666013)
// for really big numbers, change #if to 1
#if 0
#define MODOF(_x) ((_x) % MOD)
#else
#define MODOF(_x) (_x)
#endif
#define SQR(_i) MODOF((_i) * (_i))
val_t
formula(val_t n)
{
val_t i;
val_t ans;
ans = 0;
for (i = 1; i <= n; ++i) {
ans += SQR(i-1);
ans += SQR(i);
ans %= MOD;
}
return ans;
}
UPDATE: I'm so used to seeing factorial herein that I wrote the wrong formula. Now corrected.
An iterative version of your code is below:
inline long long int formula(long long int n)
{
long long f = 1;
for (int i = 2; i <= n; i++)
{
f = ((f % MOD + (1LL * (i - 1)*(i - 1)) % MOD) % MOD + (1LL * i*i) % MOD) % MOD;
}
return f;
}
The loop will take quite a long time if you need to calculate it for size of 2 billion. However, the recursive equation leads trivially to
sum [i * i+(i-1)*(i-1)] = sum [2* i * i - 2*i + 1].
You can use the formulas for the sum of the first n squares and for an arithmetic series to simplify this to:
n * (2 * n * n + 1) / 3
Now you can reduce this further using a * b % c == ((a % c) * (b % c)) % c. However, division by 3 does not commute with the modulus operation, so you need to multiply by the modular inverse of 3 instead:
((n % MOD) * ((2 * ((n % MOD) * (n % MOD) % MOD) + 1) % MOD) % MOD) * 444009 % MOD,
where 444009 is the modular inverse of 3 mod MOD, i.e. 3 * 444009 % MOD == 1.
EDIT: Added the discussion about the modulus and division operators not commuting, as pointed out by Raymond Chen.
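A sketch of the constant-time evaluation. Summing 2i^2 - 2i + 1 for i = 1..n gives n(n+1)(2n+1)/3 - n(n+1) + n = n(2n^2 + 1)/3, which is the expression evaluated here. The names formula_closed and formula_loop are hypothetical; formula_loop restates the question's loop and is included only to cross-check small inputs:

```cpp
#include <cstdint>

const std::uint64_t MOD = 666013;
const std::uint64_t INV3 = 444009;  // 3 * 444009 % 666013 == 1

// Closed form n * (2n^2 + 1) / 3 evaluated modulo MOD; the division by 3
// is replaced by a multiplication with the modular inverse of 3.
std::uint64_t formula_closed(std::uint64_t n) {
    std::uint64_t r = n % MOD;
    std::uint64_t inner = (2 * (r * r % MOD) + 1) % MOD;
    return r * inner % MOD * INV3 % MOD;
}

// Reference loop from the question (valid for n >= 1), for cross-checking.
std::uint64_t formula_loop(std::uint64_t n) {
    std::uint64_t ans = 1;
    for (std::uint64_t i = 2; i <= n; ++i) {
        ans = (ans + (i - 1) * (i - 1) % MOD + i * i % MOD) % MOD;
    }
    return ans;
}
```

Since every operand is reduced below MOD before it is multiplied, no intermediate product exceeds roughly 3 * 10^17, so n = 2,000,000,000 is handled without overflow.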
I have made a recursive function in c++ which deals with very large integers.
long long int findfirst(int level)
{
if(level==1)
return 1;
else if(level%2==0)
return (2*findfirst(--level));
else
return (2*findfirst(--level)-1);
}
when the input variable(level) is high,it reaches the limit of long long int and gives me wrong output.
i want to print (output%mod) where mod is 10^9+7(^ is power) .
int main()
{
long long int first = findfirst(143)%1000000007;
cout << first;
}
It prints -194114669 .
Online judge problems almost never require big integers; if your solution needs them, it is probably not the intended way to solve the problem.
Some notes about modular arithmetic
if a1 = b1 mod n and a2 = b2 mod n then:
a1 + a2 = b1 + b2 mod n
a1 - a2 = b1 - b2 mod n
a1 * a2 = b1 * b2 mod n
This means that reduction mod n is compatible with addition, subtraction, and multiplication, so (a + b * c) mod n can be calculated as (((b mod n) * (c mod n)) mod n + (a mod n)) mod n. That is a lot of parentheses and sub-expressions, but reducing at every step keeps the intermediate values small and avoids integer overflow.
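As a small illustration of that expression, here is a sketch with a hypothetical helper add_mul_mod. It assumes 1 ≤ n ≤ 2^32, so that the product of two residues fits in an unsigned 64-bit integer:

```cpp
#include <cstdint>

// (a + b * c) mod n, reducing at every step.
// Assumes 1 <= n <= 2^32 so the product of two residues fits in 64 bits.
std::uint64_t add_mul_mod(std::uint64_t a, std::uint64_t b,
                          std::uint64_t c, std::uint64_t n) {
    std::uint64_t prod = (b % n) * (c % n) % n;
    return (prod + a % n) % n;
}
```

For example, add_mul_mod(17, 22, 3, 5) gives 3, the same as (17 + 22 * 3) % 5.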
As long as I understand your program you don't need recursion at all:
#include <iostream>
using namespace std;
const long long int mod_value = 1000000007;
long long int findfirst(int level) {
long long int res = 1;
for (int lev = 1; lev <= level; lev++) {
if (lev % 2 == 0)
res = (2*res) % mod_value;
else
res = (2*res - 1 + mod_value) % mod_value; // add mod_value so the result is never negative
}
return res;
}
int main() {
for (int i = 1; i < 143; i++) {
cout << findfirst(i) << endl;
}
return 0;
}
If you need to use recursion, modify your solution to:
long long int findfirst(int level) {
if (level == 1)
return 1;
else if (level % 2 == 0)
return (2 * findfirst(--level)) % mod_value;
else
return (2 * findfirst(--level) - 1 + mod_value) % mod_value; // never negative
}
Here mod_value is the same as before.
Please study modular arithmetic carefully and apply it to this online challenge (the reward of discovering the solution yourself is too high to pass up). Most online challenges have a mathematical background.
If the problem is (as you say) it overflows long long int, then use an arbitrary precision Integer library. Examples are here.