The following code is a solution to a problem statement from a contest.
The time limit was 1 s. The code passed 5 of the 7 test cases; the remaining two exceeded the time limit.
How can the time complexity for the code below be reduced?
Edit:
The problem is to return either the number n itself or the sum of n/2, n/3 and n/4 (each of which may be exchanged again in the same way), whichever is larger.
For example, if input is 24
it can be reduced or exchanged for
12+8+6=26
Further, 12 can be reduced to 6+4+3=13.
8 and 6 should not be reduced as it may decrease the value.
So final answer is 13+8+6=27
#include <cmath>
#include <cstdio>
#include <vector>
#include <iostream>
#include <algorithm>
#define lli long long int
using namespace std;
lli exchange(lli n){
if(n<12)
return n;
else{
lli sum=0;
sum+=max(n/2,exchange(n/2));
sum+=max(n/3,exchange(n/3));
sum+=max(n/4,exchange(n/4));
return sum;
}
}
int main() {
lli t;
cin>>t;
while(t--){
lli n;
cin>>n;
lli ans;
ans=max(n,exchange(n));
cout<<ans<<endl;
}
return 0;
}
Just trying some ideas out. First, the "true" branch of an if-statement is the branch that the compiler pre-loads instructions on. By making the high-n branch the default, it's a little faster.
EDITED: Some of the other ideas didn't work out (no faster). However, one that is promising is to unroll two levels of the recursion.
exchange(n) = exchange(n/2) + exchange(n/3) + exchange(n/4)
exchange(n/2) = exchange(n/2/2) + exchange(n/3/2) + exchange(n/4/2)
exchange(n/3) = exchange(n/2/3) + exchange(n/3/3) + exchange(n/4/3)
exchange(n/4) = exchange(n/2/4) + exchange(n/3/4) + exchange(n/4/4)
However, these become untrue if n/4 < 12. So we can unroll one full level of recursion and only use the fallback for n < 48. Here's an "opt" version of exchange:
lli exchange_opt(lli n)
{
if (n >= 12 * 4) // for n=48+ none of the terms would trigger n < 12
{
lli sum = 0;
sum += exchange_opt(n / 4);
sum += exchange_opt(n / 6) * 2;
sum += exchange_opt(n / 8) * 2;
sum += exchange_opt(n / 9);
sum += exchange_opt(n / 12) * 2;
sum += exchange_opt(n / 16);
return sum;
}
if (n > 11)
{
lli sum = 0;
sum += exchange_opt(n / 2);
sum += exchange_opt(n / 3);
sum += exchange_opt(n / 4);
return sum;
}
return n;
}
This is 4 times faster on my machine than the original implementation, and the idea is extensible: you could unroll three levels of recursion, as long as you raise the threshold for n at which it applies. Here's a version which unrolls three levels of recursion from the base case, and combines like terms to reduce function calls. It's 8 times faster now:
lli exchange_opt(lli n)
{
if (n >= 12 * 4 * 4)
// for n=192+ none of the second-level terms would trigger n < 12
{
lli sum = 0;
sum += exchange_opt(n / 8);
sum += exchange_opt(n / 12) * 3;
sum += exchange_opt(n / 16) * 3;
sum += exchange_opt(n / 18) * 3;
sum += exchange_opt(n / 24) * 6;
sum += exchange_opt(n / 27);
sum += exchange_opt(n / 32) * 3;
sum += exchange_opt(n / 36) * 3;
sum += exchange_opt(n / 48) * 3;
sum += exchange_opt(n / 64);
return sum;
}
if (n >= 12 * 4)
// for n=48+ none of the core terms would trigger n < 12
{
lli sum = 0;
sum += exchange_opt(n / 4);
sum += exchange_opt(n / 6) * 2;
sum += exchange_opt(n / 8) * 2;
sum += exchange_opt(n / 9);
sum += exchange_opt(n / 12) * 2;
sum += exchange_opt(n / 16);
return sum;
}
if (n >= 12)
{
lli sum = 0;
sum += exchange_opt(n / 2);
sum += exchange_opt(n / 3);
sum += exchange_opt(n / 4);
return sum;
}
return n;
}
BTW for testing, I ran all the numbers from 0 ... 9999 through both versions, added up the time for the OP's original function and for my functions, and checked that the results were equal. As this is optimized for large numbers, the results might be even better on very large numbers.
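The comparison described above could be set up roughly as follows. This is only a sketch: the names exchange_ref, exchange_unrolled, results_match and seconds_for_range are made up for illustration, the unrolled function is the two-level version from earlier in this answer, and the timing helper uses std::chrono::steady_clock, which is an assumption about how the measurement was done.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

typedef long long lli;

// The OP's original recursion, kept as the reference implementation.
lli exchange_ref(lli n) {
    if (n < 12) return n;
    lli sum = 0;
    sum += std::max(n / 2, exchange_ref(n / 2));
    sum += std::max(n / 3, exchange_ref(n / 3));
    sum += std::max(n / 4, exchange_ref(n / 4));
    return sum;
}

// Two levels unrolled: valid for n >= 48 because n/4 >= 12 there.
lli exchange_unrolled(lli n) {
    if (n >= 12 * 4) {
        return exchange_unrolled(n / 4) + exchange_unrolled(n / 6) * 2 +
               exchange_unrolled(n / 8) * 2 + exchange_unrolled(n / 9) +
               exchange_unrolled(n / 12) * 2 + exchange_unrolled(n / 16);
    }
    if (n >= 12) {
        return exchange_unrolled(n / 2) + exchange_unrolled(n / 3) +
               exchange_unrolled(n / 4);
    }
    return n;
}

// True if both versions agree for every n in [0, limit].
bool results_match(lli limit) {
    assert(limit >= 0);
    for (lli n = 0; n <= limit; ++n) {
        if (exchange_ref(n) != exchange_unrolled(n)) return false;
    }
    return true;
}

// Total wall-clock seconds spent calling f on every n in [0, limit].
template <typename F>
double seconds_for_range(F f, lli limit) {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    volatile lli sink = 0;  // keeps the calls from being optimized away
    for (lli n = 0; n <= limit; ++n) sink = sink + f(n);
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling results_match(9999) first and then comparing seconds_for_range(exchange_ref, 9999) with seconds_for_range(exchange_unrolled, 9999) gives the kind of ratio quoted above; the exact numbers will depend on compiler and machine.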
I'm guessing that every level of recursion that's unrolled will roughly double the speed of this algorithm. Instead of computing the unrolling by hand as I have done, it might actually be possible to write a program that outputs the correct equation for the level of unrolling needed. Basically to compute exchange(n) in the minimum time you want to unroll to the nearest level of recursion "k" where n >= 12 * 4^k
But enough of manually unrolling the recursion. Here's code that generates the unrolled function for whatever level of unrolling is needed. It uses std::vector and std::map, so you'll need to include the right headers:
std::vector<std::map<lli, lli>> map1;
map1.push_back(std::map<lli, lli>());
map1[0][2] = 1;
map1[0][3] = 1;
map1[0][4] = 1;
const int unrolled_levels = 20;
for (int level = 1; level < unrolled_levels; ++level)
{
map1.push_back(std::map<lli, lli>());
for (auto i = map1[level - 1].begin(); i != map1[level - 1].end(); ++i)
{
map1[level][(*i).first * 2] += map1[level - 1][(*i).first];
map1[level][(*i).first * 3] += map1[level - 1][(*i).first];
map1[level][(*i).first * 4] += map1[level - 1][(*i).first];
}
}
int level = unrolled_levels - 1;
std::cout << "\tlli exchange_opt(lli n) // unroll" << level << "\n\t{\n\n";
for (int inner_level = level; inner_level >= 0; --inner_level)
{
lli mult = 12;
std::cout << "\t\tif (n >= 12LL ";
for (auto i = 0; i < inner_level; ++i)
{
std::cout << " * 4LL";
mult *= 4LL;
}
std::cout << ") // " << (mult) << "\n\t\t{\n";
std::cout << "\t\t\tlli sum = 0;\n";
for (auto i = map1[inner_level].begin(); i != map1[inner_level].end(); ++i)
{
std::cout << "\t\t\tsum += exchange_opt(n/" << (*i).first << "LL)";
if ((*i).second > 1) std::cout << " * " << (*i).second;
std::cout <<"; \n";
}
std::cout << "\t\t\treturn sum;\n";
std::cout << "\t\t}\n";
}
std::cout << "\t\treturn n;\n";
std::cout << "\n\t}\n\n";
Basically, you set unrolled_levels to whatever you want. Each level unrolls the equation for numbers 4 times bigger. Just be aware that the output function is going to be huge: it tests the range of n and then short-circuits the sub-levels as much as possible. For some higher numbers it works out a partial value and multiplies it by thousands or millions, effectively short-circuiting millions of function calls.
Copy and paste the output from this code and use it as the function for calculating exchange(n). For numbers around 1 million, it's 200 times faster than the original formula (0.5% running time). For numbers around 100 million, it took 1/70 of 1% of the original equation, 7000 times faster.
BTW this could be even faster. I haven't gone through and collected terms which are multiplied by like-constants in the same branch.
One of the standard trade-offs in any algorithm is time vs. space; the more memory or disk space you have, the more time you can save, or vice versa. In this case, you need to run within a specific time, but you appear to be allowed to use the full memory of the machine. Therefore, noting that this particular algorithm frequently requests values that it has already calculated, it should be worth saving them all for quick lookup.
Indeed, Python, not often regarded as particularly fast, can calculate the result for 10^295 in about a second, though 10^300 runs into a maximum recursion depth error if run with an empty result cache:
exchanged = {}
def exchange(n):
    if n in exchanged:
        value = exchanged[n]
    elif n < 12:
        exchanged[n] = value = n
    else:
        exchanged[n] = value = exchange(n//2) + exchange(n//3) + exchange(n//4)
    return value
exchange(10**295)
For C++, a static std::map<lli, lli> should work in place of the exchanged dict. Don't try using an array, though, because you don't need nearly as many values as the largest one calculated; 10^295, for example, uses fewer than 300,000 results.
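A minimal C++ sketch of that cache, assuming the lli typedef from the question and a made-up function name exchange_memo; values below 12 are returned directly rather than stored, which is a small deviation from the Python version:

```cpp
#include <cassert>
#include <map>

typedef long long lli;

// Memoized exchange: each distinct n >= 12 is computed only once.
lli exchange_memo(lli n) {
    assert(n >= 0);
    if (n < 12) return n;  // base case: never worth exchanging

    static std::map<lli, lli> cache;  // persists across calls and test cases
    std::map<lli, lli>::iterator it = cache.find(n);
    if (it != cache.end()) return it->second;

    lli value = exchange_memo(n / 2) + exchange_memo(n / 3) + exchange_memo(n / 4);
    cache[n] = value;
    return value;
}
```

Because the cache is static, later test cases in the same run reuse results from earlier ones.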
And yes, the max part can be omitted, because it's handled by the n < 12 check. We can prove this by noting that it would only be necessary if (n-1)/2 + (n-2)/3 + (n-3)/4 < n (accounting for the way integer division throws away the remainder), which rules out all cases above 23; the remaining cases are easy to check by hand, and the largest exception happens to be 11. But that's a minor optimization compared with the caching: the original recursion makes a number of calls that grows polynomially in n, while the cached version only ever sees arguments of the form n/(2^a * 3^b), so it needs on the order of log(n)^2 distinct values.
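The "check by hand" step can also be done by brute force. The sketch below uses a hypothetical helper, largest_exception, that scans a range and reports the largest n for which n/2 + n/3 + n/4 is smaller than n:

```cpp
#include <cassert>

typedef long long lli;

// Largest n in [0, limit] with n/2 + n/3 + n/4 < n (integer division),
// or -1 if there is no such n in the range.
lli largest_exception(lli limit) {
    assert(limit >= 0);
    lli largest = -1;
    for (lli n = 0; n <= limit; ++n) {
        if (n / 2 + n / 3 + n / 4 < n) largest = n;
    }
    return largest;
}
```

Scanning up to 23 is enough given the bound above; scanning further just confirms that nothing larger than 11 turns up.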
I'm working on a code that calculates PI with n terms. However, my code only works correctly with some values of n.
With this code, even values of n do not work, and when I flip the negative sign, odd values of n do not work.
double PI(int n, double y=2){
double sum = 0;
if (n==0){
return 3;
}else if (n % 2 != 0){
sum = (4/(y*(y+1)*(y+2)))+(PI (n - 1 ,y+2)) ;
}else{
sum= -(4/(y*(y+1)*(y+2)))+PI (n - 1,y+2) ;
}
return sum;
}
int main(int argc, const char * argv[]) {
double n = PI (2,2);
cout << n << endl;
}
For n = 2 I expected a result of 3.1333 but I got a value of 2.86667
This is the formula for calculating PI , y is the denominator and n is the number of terms
Firstly, I will assume that a complete runnable case of your code looks like
#include <iostream>
using namespace std;
double PI(int n, double y=2){
double sum = 0;
if (n==0){
return 3;
}else if (n % 2 != 0){
sum = (4/(y*(y+1)*(y+2)))+(PI (n - 1 ,y+2)) ;
}else{
sum= -(4/(y*(y+1)*(y+2)))+PI (n - 1,y+2) ;
}
return sum;
}
int main(int argc, const char * argv[]) {
double n = PI (2,2);
cout << n << endl;
}
I believe that you are attempting to compute pi through the formula
(pi - 3)/4 = \sum_{k = 1}^{\infty} (-1)^{k+1} / (2k(2k+1)(2k+2)),
(where here and elsewhere I use LaTeX code to represent mathy things). This is a good formula that converges pretty quickly despite being so simple. If you were to use the first two terms of the sum, you would find that
(pi - 3)/4 \approx 1/(2*3*4) - 1/(4*5*6) ==> pi \approx 3.13333,
which you seem to indicate in your question.
To see what's wrong, you might trace through your first function call with PI(2, 2). This produces three terms.
n=2: 2 % 2 == 0, so the first term is -4/(2*3*4) + PI(1, 4). This is the wrong sign.
n=1: 1 % 2 == 1, so the second term is 4/(4*5*6), which is also the wrong sign.
n=0: n == 0, so the third term is 3, which is the correct sign.
So you have computed
3 - 4/(2*3*4) + 4/(4*5*6)
and we can see that there are many sign errors.
The underlying reason is that you are determining the sign based on n, but if you examine the formula, the sign depends on y. In particular, it depends on whether y/2 is odd or even (in your formulation, where you are apparently only going to provide even y values to your sum).
You should change y and n appropriately. Or you might recognize that there is no reason to decouple them, and use something like the following code. In this code, n represents the number of terms to use and we compute y accordingly.
#include <iostream>
using namespace std;
double updatedPI(int n)
{
int y = 2*n;
if (n == 0) { return 3; }
else if (n % 2 == 1)
{
return 4. / (y*(y + 1)*(y + 2)) + updatedPI(n-1);
}
else
{
return -4. / (y*(y + 1)*(y + 2)) + updatedPI(n-1);
}
}
int main() {
double n = updatedPI(3);
cout << n << endl;
}
The only problem with your code is that y is calculated incorrectly. It has to be equal to 2 * n. Simply modifying your code that way gives correct results:
Live demo: https://wandbox.org/permlink/3pZNYZYbtHm7k1ND
That is, get rid of the y function parameter and set int y = 2 * n; in your function.
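Since y is fully determined by the term index, the same sum can also be written as a plain loop. This is only a sketch, with a made-up function name pi_series, that uses the first n terms of the formula quoted above and flips the sign on every term:

```cpp
#include <cassert>

// Approximates pi as 3 + 4/(2*3*4) - 4/(4*5*6) + ... using `terms` terms.
double pi_series(int terms) {
    assert(terms >= 0);
    double sum = 3.0;
    double sign = 1.0;  // the first term of the series is positive
    for (int k = 1; k <= terms; ++k) {
        double y = 2.0 * k;  // denominator starts at 2, 4, 6, ...
        sum += sign * 4.0 / (y * (y + 1.0) * (y + 2.0));
        sign = -sign;
    }
    return sum;
}
```

Doing the arithmetic in double also sidesteps the int overflow that y*(y+1)*(y+2) would eventually hit for large n.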
I am calculating combination(15, 7) in C++.
I first used the following code and get the wrong answer due to a type promotion error.
#include <iostream>
int main()
{
int a = 15;
double ans = 1;
for(int i = 1; i <= 7; i++)
ans *= (a + 1 - i) / i;
std::cout << (int) ans;
return 0;
}
Output: 2520
So I changed ans *= (a + 1 - i) / i; to ans *= (double)(a + 1 - i) / i; and still get the wrong answer.
#include <iostream>
int main()
{
int a = 15;
double ans = 1;
for(int i = 1; i <= 7; i++)
ans *= (double) (a + 1 - i) / i;
std::cout << (int) ans;
return 0;
}
Output: 6434
Finally, I tried ans = ans * (a + 1 - i) / i, which gives the right answer.
#include <iostream>
int main()
{
int a = 15;
double ans = 1;
for(int i = 1; i <= 7; i++)
ans = ans * (a + 1 - i) / i;
std::cout << (int) ans;
return 0;
}
Output: 6435
Could someone tell me why the second one did not work?
If you print out ans without casting it to (int) you'll see the second result is 6434.9999999999990905052982270717620849609375. That's pretty darn close to the right answer of 6435, so it's clearly not a type promotion error any more.
No, this is classic floating point inaccuracy. When you write ans *= (double) (a + 1 - i) / i you are doing the equivalent of:
ans = ans * ((double) (a + 1 - i) / i);
Compare this to the third version:
ans = ans * (a + 1 - i) / i;
The former performs division first followed by multiplication. The latter operates left to right and so the multiplication precedes the division. This change in order of operations causes the results of the two to be slightly different. Floating point calculations are extremely sensitive to order of operations.
Quick fix: Don't truncate the result; round it.
Better fix: Don't use floating point for integral arithmetic. Save the divisions until after all the multiplications are done. Use long, long long, or even a big number library.
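One way to stay in integer arithmetic, sketched under the assumption that the result fits in long long: multiply first and divide immediately afterwards in each step. The division is always exact because the running value after step i is C(n, i). The function name binomial is made up for illustration, and this interleaves the divisions rather than saving them all for the end, which keeps intermediate values small.

```cpp
#include <cassert>

// Computes C(n, k) using only integer arithmetic.
long long binomial(int n, int k) {
    assert(0 <= k && k <= n);
    long long result = 1;
    for (int i = 1; i <= k; ++i) {
        // result holds C(n, i-1); result * (n+1-i) equals i * C(n, i),
        // so dividing by i leaves no remainder.
        result = result * (n + 1 - i) / i;
    }
    return result;
}
```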
The first one did not work because you have integer division there.
The difference between the second one and the third one is this:
ans = ans * (double(a + 1 - i) / i); // second is equal to this
vs:
ans = (ans * (a + 1 - i)) / i; // third is equal to this
So the difference is in the order of multiplication and division. If you round the double to an integer instead of simply dropping the fractional part, you will get the same result:
std::cout << int( ans + 0.5 ) << std::endl;
I was given a task to write a program that displays:
I coded this:
#include<iostream.h>
#include<conio.h>
void main()
{
clrscr();
int a, n = 1, f = 1;
float s = 0;
cin >> a;
while(n <= a)
{
f = f * n;
s += 1 / (float)f;
n = n + 1;
}
cout << s;
getch();
}
So this displays -
s = 1 + 1/2! + 1/3! + 1/4! .... + 1/a!, including odd and even factorials.
For the past two hours I am trying to figure out how can I modify this code so that it displays the desired result. But I couldn't figure it out yet.
Question:
What changes should I make to my code?
You need to accumulate the sum while checking the counter n and only calculate the even factorials:
int n;
double sum = 1;
cin >> n;
for(int i = 2; i < n; ++i){
if(i % 2 == 0) sum += 1.0 / factorial(i); // factorial(i) is assumed to be defined elsewhere; 1.0 avoids integer division
}
In your code:
while(n <= a)
{
f = f * n;
// checks if n is even;
// n even if the remainder of the division by 2 is zero
if(n % 2 == 0){
s += 1 / (float)f;
}
n = n + 1;
}
12! is the largest factorial that fits in a 32-bit integer. You should use double for all the numbers. For even factorials, start with f = 1 (0!) and use f = f * (n-1) * n, where n = 2, 4, 6, 8, ... .
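Put together, that suggestion might look like the sketch below. It assumes the wanted series is 1/2! + 1/4! + ... up to 1/a! with no leading 1 (the original formula is not shown, so that is a guess), and the function name is made up:

```cpp
#include <cassert>

// Sum of 1/n! over even n from 2 up to a, computed entirely in double.
double even_factorial_reciprocal_sum(int a) {
    assert(a >= 0);
    double f = 1.0;  // 0!
    double s = 0.0;
    for (int n = 2; n <= a; n += 2) {
        f = f * (n - 1) * n;  // steps from (n-2)! to n!
        s += 1.0 / f;
    }
    return s;
}
```

If the series is supposed to start with 1 (that is, 1/0!), initialise s to 1.0 instead.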
You have almost everything you need in place (assuming you don't want to make design changes based on the issues brought up in the comments).
All you need to change is what you multiply f by in each step. To build up n! you are multiplying by n in each step. To build up (2n)! you would multiply by 2*n*(2*n-1)
Edit: Your second theory about what the instructor wants would need only slightly more of a change. Your inner loop could be replaced by
while(n < a)
{
f = f * n * (n+1);
s += 1 / f;
n = n + 2;
}
Edit2: To run your program I made several changes for I/O things you did that don't work in my copy of GCC. Hopefully those won't distract from the main point of the following code. I also added a second, more complicated and more accurate method of computing the answer to see how much was lost in floating point rounding.
So this code computes the answer twice, once by the method I suggested you change your code to and once by a more accurate method (using double instead of float and adding the numbers in the more accurate sequence via a recursive function). Then it displays your answer and the difference between the two answers.
Running that shows the version I suggested gets all the displayed digits correct and is only wrong for the values of a I tried by tiny amounts that would need more display precision to notice:
#include<iostream>
using namespace std;
double fac_sum(int n, int a, double f)
{
if ( n > a )
return 0;
f *= n * (n-1);
return fac_sum(n+2, a, f) + 1 / f;
}
int main()
{
int a, n = 1;
float f = 1;
float s = 0;
cin >> a;
while(n < a)
{
f = f * n * (n+1);
s += 1 / f;
n = n + 2;
}
cout << s;
cout << " approx error was " << fac_sum( 2, a, 1.0)-s;
return 0;
}
For 8 that displays 0.54308 approx error was -3.23568e-08
I hope you understand the e-08 notation, meaning the error is in the 8th digit to the right of the decimal point.
Edit3: I changed f to float in this post because I had copied/tested thinking f was float, so parts of my answer didn't make sense when f was int
I am looking to implement Fermat's little theorem for prime testing. Here's the code I have written:
lld expo(lld n, lld p) //2^p mod n
{
if(p==0)
return 1;
lld exp=expo(n,p/2);
if(p%2==0)
return (exp*exp)%n;
else
return (((exp*exp)%n)*2)%n;
}
bool ifPseudoPrime(lld n)
{
if(expo(n,n)==2)
return true;
else
return false;
}
NOTE: I took the value of a(<=n-1) as 2.
Now, the number n can go as large as 10^18. This means that the variable exp can reach values near 10^18, which further implies that the expression (exp*exp) can reach as high as 10^36, hence causing overflow. How do I avoid this?
I tested this and it ran fine till 10^9. I am using C++
If the modulus is close to the limit of the largest integer type you can use, things get somewhat complicated. If you can't use a library that implements biginteger arithmetic, you can roll a modular multiplication yourself by splitting the factors in low-order and high-order parts.
If the modulus m is so large that 2*(m-1) overflows, things get really fussy, but if 2*(m-1) doesn't overflow, it's bearable.
Let us suppose you have and use a 64-bit unsigned integer type.
You can calculate the modular product by splitting the factors into low and high 32 bits, the product then splits into
a = a1 + (a2 << 32) // 0 <= a1, a2 < (1 << 32)
b = b1 + (b2 << 32) // 0 <= b1, b2 < (1 << 32)
a*b = a1*b1 + (a1*b2 << 32) + (a2*b1 << 32) + (a2*b2 << 64)
To calculate a*b (mod m) with m <= (1 << 63), reduce each of the four products modulo m,
p1 = (a1*b1) % m;
p2 = (a1*b2) % m;
p3 = (a2*b1) % m;
p4 = (a2*b2) % m;
and the simplest way to incorporate the shifts is
for(i = 0; i < 32; ++i) {
p2 *= 2;
if (p2 >= m) p2 -= m;
}
the same for p3 and with 64 iterations for p4. Then
s = p1+p2;
if (s >= m) s -= m;
s += p3;
if (s >= m) s -= m;
s += p4;
if (s >= m) s -= m;
return s;
That way is not very fast, but for the few multiplications needed here, it may be fast enough. A small speedup should be obtained by reducing the number of shifts; first calculate (p4 << 32) % m,
for(i = 0; i < 32; ++i) {
p4 *= 2;
if (p4 >= m) p4 -= m;
}
then all of p2, p3 and the current value of p4 need to be multiplied by 2^32 modulo m,
p4 += p3;
if (p4 >= m) p4 -= m;
p4 += p2;
if (p4 >= m) p4 -= m;
for(i = 0; i < 32; ++i) {
p4 *= 2;
if (p4 >= m) p4 -= m;
}
s = p4+p1;
if (s >= m) s -= m;
return s;
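Assembled into one function, the faster variant above could look like the following sketch. The helper names (add_mod, shift_mod, mul_mod, pow_mod) are made up for illustration, the code assumes a 64-bit unsigned type from <cstdint>, and it relies on m <= 2^63 so that doubling and adding reduced values never overflow.

```cpp
#include <cassert>
#include <cstdint>

// (a + b) mod m for a, b < m; a + b < 2^64 because m <= 2^63.
static std::uint64_t add_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    std::uint64_t s = a + b;
    if (s >= m) s -= m;
    return s;
}

// (x * 2^times) mod m for x < m, one doubling at a time.
static std::uint64_t shift_mod(std::uint64_t x, int times, std::uint64_t m) {
    for (int i = 0; i < times; ++i) {
        x *= 2;
        if (x >= m) x -= m;
    }
    return x;
}

// (a * b) mod m by splitting both factors into 32-bit halves.
std::uint64_t mul_mod(std::uint64_t a, std::uint64_t b, std::uint64_t m) {
    assert(m > 0 && m <= (1ULL << 63));
    a %= m;
    b %= m;
    std::uint64_t a1 = a & 0xFFFFFFFFULL, a2 = a >> 32;
    std::uint64_t b1 = b & 0xFFFFFFFFULL, b2 = b >> 32;
    std::uint64_t p1 = (a1 * b1) % m;  // each product is below 2^64
    std::uint64_t p2 = (a1 * b2) % m;
    std::uint64_t p3 = (a2 * b1) % m;
    std::uint64_t p4 = (a2 * b2) % m;
    p4 = shift_mod(p4, 32, m);  // p4 * 2^32
    p4 = add_mod(p4, p3, m);
    p4 = add_mod(p4, p2, m);
    p4 = shift_mod(p4, 32, m);  // (p4 * 2^32 + p2 + p3) * 2^32
    return add_mod(p4, p1, m);
}

// (base ^ exponent) mod m by repeated squaring on top of mul_mod.
std::uint64_t pow_mod(std::uint64_t base, std::uint64_t exponent, std::uint64_t m) {
    std::uint64_t result = 1 % m;
    base %= m;
    while (exponent > 0) {
        if (exponent & 1) result = mul_mod(result, base, m);
        base = mul_mod(base, base, m);
        exponent >>= 1;
    }
    return result;
}
```

With these, the Fermat check with base 2 for an odd n > 2 becomes pow_mod(2, n - 1, n) == 1, and no intermediate value exceeds 64 bits.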
You can perform your multiplications in several stages. For example, say you want to compute X*Y mod n. Write X and Y as X = 10^9*X_1 + X_0, Y = 10^9*Y_1 + Y_0. Then compute all four products X_i*Y_j mod n, and finally combine them as X*Y = 10^18*(X_1*Y_1 mod n) + 10^9*((X_0*Y_1 + X_1*Y_0) mod n) + X_0*Y_0 (mod n). Note that in this case, you are operating on numbers with half as many digits as the maximum allowed.
If splitting into two parts does not suffice (I suspect this is the case), split into three parts using the same scheme. Splitting into three should work.
A simpler approach is just to multiply the school way. It corresponds to the previous approach, but writing one number in as many parts as digits it has.
Good luck!
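Taken to the extreme of one binary digit per part, the school method becomes the familiar double-and-add loop. This is a sketch rather than the exact decimal scheme described above; the name mul_mod_school is made up, and it assumes a 64-bit unsigned type with n <= 2^63 so that doubling a reduced value cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// (x * y) mod n, processing y one binary digit at a time.
std::uint64_t mul_mod_school(std::uint64_t x, std::uint64_t y, std::uint64_t n) {
    assert(n > 0 && n <= (1ULL << 63));
    x %= n;
    y %= n;
    std::uint64_t result = 0;
    while (y > 0) {
        if (y & 1) {  // this digit of y is 1: add the current multiple of x
            result += x;
            if (result >= n) result -= n;
        }
        x += x;  // move to the next binary place
        if (x >= n) x -= n;
        y >>= 1;
    }
    return result;
}
```

It needs up to 64 iterations per product, which is slow compared with a native multiply but avoids any wide intermediate.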
The following code is used to print an int. How can I modify it to print a long long int? Please explain.
For pc, read putchar_unlocked
inline void writeInt (int n)
{
int N = n, rev, count = 0;
rev = N;
if (N == 0) { pc('0'); pc('\n'); return ;}
while ((rev % 10) == 0) { count++; rev /= 10;}
rev = 0;
while (N != 0) { rev = (rev<<3) + (rev<<1) + N % 10; N /= 10;}
while (rev != 0) { pc(rev % 10 + '0'); rev /= 10;}
while (count--) pc('0');
pc('\n');
return ;
}
There's nothing specific about int in the code. Just replace both occurrences of "int" by "long long int", and you're done.
(I find the "optimization" of *10 via shift and add quite ridiculous with all the divisions that remain. Any decent C compiler will do that (and much more) automatically. And don't forget to profile this "fast" version against the stdlib routine, to be sure it really was worth the effort).
This code is a bit more complex than it needs to be:
inline void writeLongLong (long long n)
{
char buffer[sizeof(n) * 8 * 3 / 10 + 3]; // 3 digits per 10 bits + two extra and space for terminating zero.
int index = sizeof(buffer)-1;
int end = index;
buffer[index--] = 0;
do {
buffer[index--] = (n % 10) + '0';
n /= 10;
} while(n);
puts(&buffer[index+1]);
}
This does the same job, with about half as many divide/modulo operations and at least I can follow it better. Note that stdio/stdlib functions are probably better than this, and this function does not cope with negative numbers (neither does the one posted above).
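Negative numbers can be handled by converting to an unsigned magnitude first, which also covers the most negative value. The sketch below builds the digits in a buffer as above but returns a std::string so the formatting can be checked separately from the output; both function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Decimal representation of n, including a leading '-' for negatives.
std::string format_long_long(long long n) {
    char buffer[21];  // 19 digits + sign + terminating zero
    int index = sizeof(buffer) - 1;
    buffer[index] = '\0';
    // Unsigned negation is well defined even for the most negative value.
    unsigned long long magnitude =
        n < 0 ? 0ULL - static_cast<unsigned long long>(n)
              : static_cast<unsigned long long>(n);
    do {
        buffer[--index] = static_cast<char>('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);
    if (n < 0) buffer[--index] = '-';
    assert(index >= 0);
    return std::string(&buffer[index]);
}

// Prints n followed by a newline, like the functions above.
void write_long_long(long long n) {
    std::puts(format_long_long(n).c_str());
}
```

The std::string round trip costs a little compared with writing straight from the buffer, so for a speed-critical judge submission the buffer could be passed to puts directly.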