I'm working on a Fibonacci algorithm for really big numbers (the 100,000th number). I need to make it run faster, only by a couple of seconds, but I've run out of ideas. Is there any way to make it faster? Thanks for the help.
#include <iostream>
#include <string>
using namespace std;

int main() {
    // digits are stored in strings, least significant digit first
    string elem_major = "1";
    string elem_minor = "0";
    short elem_maj_int;
    short elem_min_int;
    short sum;
    int length = 1;
    int ten = 0;    // carry
    int n;
    cin >> n;
    for (int i = 1; i < n; i++)
    {
        for (int j = 0; j < length; j++)
        {
            elem_maj_int = short(elem_major[j] - 48);   // 48 == '0'
            elem_min_int = short(elem_minor[j] - 48);
            sum = elem_maj_int + elem_min_int + ten;
            ten = 0;
            if (sum > 9)
            {
                sum -= 10;
                ten = 1;
                if (elem_major[j + 1] == '\0')   // carry needs a new digit
                {
                    elem_major += "0";
                    elem_minor += "0";
                    length++;
                }
            }
            elem_major[j] = char(sum + 48);
            elem_minor[j] = char(elem_maj_int + 48);
        }
    }
    for (int i = length - 1; i >= 0; i--)
    {
        cout << elem_major[i];
    }
    return 0;
}
No matter how well you optimize a given piece of code, without changing the underlying algorithm you can only gain marginal improvements. Your approach performs a linear number of big-number additions, and for large n it quickly becomes slow. A faster way to compute Fibonacci numbers is matrix exponentiation by squaring on the matrix:
0 1
1 1
This approach needs only a logarithmic number of matrix multiplications, which is asymptotically better. Compute a few powers of this matrix and you'll notice that the (n+1)-st Fibonacci number sits in its lower right corner.
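To make the idea concrete, here is a minimal sketch of exponentiation by squaring on that matrix. It uses plain unsigned 64-bit arithmetic, so it only works up to about F(93); for the 100000th number you would plug in a big-integer type instead. The Mat struct and the fib_matpow name are just illustrative choices, not part of any library.

#include <cstdint>
#include <iostream>

// 2x2 matrix [[a, b], [c, d]]
struct Mat { std::uint64_t a, b, c, d; };

Mat mul(const Mat& x, const Mat& y) {
    return { x.a * y.a + x.b * y.c,  x.a * y.b + x.b * y.d,
             x.c * y.a + x.d * y.c,  x.c * y.b + x.d * y.d };
}

// Returns F(n) with F(0) = 0, F(1) = 1 using O(log n) matrix multiplications.
// After the loop, result == [[F(n-1), F(n)], [F(n), F(n+1)]].
std::uint64_t fib_matpow(unsigned n) {
    Mat result = { 1, 0, 0, 1 };   // identity matrix
    Mat base   = { 0, 1, 1, 1 };   // the matrix shown above
    while (n > 0) {                // exponentiation by squaring
        if (n & 1) result = mul(result, base);
        base = mul(base, base);
        n >>= 1;
    }
    return result.b;               // F(n); result.d holds F(n + 1)
}

int main() {
    std::cout << fib_matpow(90) << '\n';   // prints 2880067194370816120
}

Each squaring doubles the exponent, so only about log2(n) matrix multiplications are needed, although each individual multiplication gets more expensive as the numbers grow.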
I suggest you use something like cpp-bigint (http://sourceforge.net/projects/cpp-bigint/) for your big numbers.
Even with the simple iterative approach, switching to a proper big-integer library already helps; the code would then look like this:
#include <iostream>
#include "bigint.h"
using namespace std;

int main() {
    BigInt::Rossi num1(0);
    BigInt::Rossi num2(1);
    BigInt::Rossi num_next(1);
    int n = 100000;
    for (int i = 0; i < n - 1; ++i)
    {
        num_next = num1 + num2;
        num1 = std::move(num2);
        num2 = std::move(num_next);
    }
    cout << num_next.toStrDec() << endl;
    return 0;
}
Quick benchmark on my machine:
time ./yourFib
real 0m8.310s
user 0m8.301s
sys 0m0.005s
time ./cppBigIntFib
real 0m2.004s
user 0m1.993s
sys 0m0.006s
I would save some precomputed checkpoints, especially since you are looking for really big numbers.
For example, say I saved the 500th and 501st Fibonacci numbers. Then if someone asks for the 600th, I can start computing from 502 rather than from 1, which really saves time.
Now, how many points should you save, and how should you select them? That depends entirely on the application and on the probable distribution of the queries.
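A rough sketch of the idea (the checkpoint table, the fib_from_checkpoint name and the use of unsigned 64-bit values are all just for illustration; a real implementation would store big integers computed offline):

#include <cstdint>
#include <map>
#include <utility>

// Hypothetical checkpoint table: index -> (F(index), F(index + 1)).
static const std::map<unsigned, std::pair<std::uint64_t, std::uint64_t>> checkpoints = {
    { 50, { 12586269025ULL, 20365011074ULL } },                 // F(50), F(51)
    { 80, { 23416728348467685ULL, 37889062373143906ULL } },     // F(80), F(81)
};

// Resume iterating from the largest saved checkpoint that does not exceed n.
std::uint64_t fib_from_checkpoint(unsigned n) {
    std::uint64_t a = 0, b = 1;       // F(0), F(1)
    unsigned start = 0;
    auto it = checkpoints.upper_bound(n);
    if (it != checkpoints.begin()) {
        --it;                         // largest checkpoint <= n
        start = it->first;
        a = it->second.first;
        b = it->second.second;
    }
    for (unsigned i = start; i < n; ++i) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;                         // F(n)
}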
Digressing a bit on the site, I came across this question. It is clear to me that generating prime numbers is complicated, and some of the solutions given in that question to make the problem easier are good and ingenious. It occurs to me that perhaps giving the program some leading digits of the large primes would make the search easier. Is this correct? For example, perhaps finding 10-digit primes starting with 111 is easier than generating all 10-digit primes (and the more leading digits provided, the less complex it should get).
Searching on the net (I must clarify that I am neither a mathematician nor a computer scientist) I found the following code to generate primes of d digits:
#include <bits/stdc++.h>
using namespace std;

const int sz = 1e5;
bool isPrime[sz + 1];

// Function for Sieve of Eratosthenes
void sieve() {
    memset(isPrime, true, sizeof(isPrime));
    isPrime[0] = isPrime[1] = false;
    for (int i = 2; i * i <= sz; i++) {
        if (isPrime[i]) {
            for (int j = i * i; j <= sz; j += i) {
                isPrime[j] = false;
            }
        }
    }
}

// Function to print all the prime numbers with d digits
// (the sieve only goes up to sz = 100000, so d can be at most 5)
void findPrimesD(int d) {
    // Range of d-digit integers to check
    int left = pow(10, d - 1);
    int right = pow(10, d) - 1;
    // For every integer in the range
    for (int i = left; i <= right; i++) {
        // If the current integer is prime
        if (isPrime[i]) {
            cout << i << " ";
        }
    }
}

// Driver code
int main() {
    // Generate primes
    sieve();
    int d = 1;
    findPrimesD(d);
    return 0;
}
My question is: how could I take advantage of this code so that I can give it the first m digits and thus make the search easier and smaller?
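To make the question concrete, the kind of adaptation I imagine is something like the function below. It is my own untested sketch: it reuses the isPrime array and the sieve limit from the code above (so it only works for d <= 5), and the name findPrimesWithPrefix is made up.

// Sketch: print d-digit primes whose decimal representation starts with the
// given prefix, e.g. prefix = 11 and d = 4 prints 1103, 1109, 1117, ...
void findPrimesWithPrefix(int d, long long prefix) {
    long long left = 1;
    for (int k = 1; k < d; k++) left *= 10;      // smallest d-digit number
    long long right = left * 10 - 1;             // largest d-digit number

    // Count the digits of the prefix
    int prefixDigits = 0;
    for (long long p = prefix; p > 0; p /= 10) prefixDigits++;

    // All d-digit numbers starting with the prefix form one contiguous block
    long long blockSize = 1;
    for (int k = 0; k < d - prefixDigits; k++) blockSize *= 10;
    long long lo = prefix * blockSize;
    long long hi = lo + blockSize - 1;

    for (long long i = max(lo, left); i <= min(hi, right); i++) {
        if (isPrime[i]) cout << i << " ";
    }
}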
Given an array of N numbers (not necessarily sorted). We can merge any two numbers into one and the cost of merging the two numbers is equal to the sum of the two values. The task is to find the total minimum cost of merging all the numbers.
Example:
Let the array A = [1,2,3,4]
Then, we can remove 1 and 2, add them, and put the sum back in the array. The cost of this step would be (1+2) = 3.
Now, A = [3,3,4], Cost = 3
In the second step, we can remove 3 and 3, add them, and put the sum back in the array. The cost of this step would be (3+3) = 6.
Now, A = [4,6], Cost = 6
In the third step, we can remove both elements from the array and put the sum back in the array again. The cost of this step would be (4+6) = 10.
Now, A = [10], Cost = 10
So, the total cost turns out to be 19 (10+6+3).
We will have to pick the 2 smallest elements at each step to minimize our total cost. A simple way to do this is using a min-heap structure: we can read the minimum element in O(1), and insertion and removal are O(log n).
The time complexity of this approach is O(n log n).
But I tried another approach, and wasn't able to find a case where it fails. The basic idea is that the sum of the two smallest elements chosen at any step is always greater than or equal to the sum of the pair chosen before it. So the "temp" array will always be sorted, and we will be able to read its minimum element in O(1).
As I am sorting the input array and then simply traversing the array, the complexity of my approach is O(n log n).
int minCost(vector<int>& arr) {
    sort(arr.begin(), arr.end());
    // temp array will contain the sums of the pairs of minimum elements
    vector<int> temp;
    // index for arr
    int i = 0;
    // index for temp
    int j = 0;
    int cost = 0;
    // while more than 1 element remains across the input and temp arrays
    while(arr.size() - i + temp.size() - j > 1) {
        int num1, num2;
        // selecting num1 (minimum element)
        if(i < arr.size() && j < temp.size()) {
            if(arr[i] <= temp[j])
                num1 = arr[i++];
            else
                num1 = temp[j++];
        }
        else if(i < arr.size())
            num1 = arr[i++];
        else if(j < temp.size())
            num1 = temp[j++];
        // selecting num2 (second minimum element)
        if(i < arr.size() && j < temp.size()) {
            if(arr[i] <= temp[j])
                num2 = arr[i++];
            else
                num2 = temp[j++];
        }
        else if(i < arr.size())
            num2 = arr[i++];
        else if(j < temp.size())
            num2 = temp[j++];
        // appending the sum of the two minimum elements to the temp array
        int sum = num1 + num2;
        temp.push_back(sum);
        cost += sum;
    }
    return cost;
}
Is this approach correct? If not, please let me know what I am missing, and the test cases in which this algorithm fails.
SPOJ Link for the same problem
The logic seems very solid to me... the computed sums are never decreasing, so at every step you only need to consider the oldest two computed sums, the next two input elements, or the oldest sum and the next input element.
I would just simplify the code:
#include <vector>
#include <algorithm>
#include <stdio.h>

int hsum(std::vector<int> arr) {
    int ni = arr.size(), nj = 0, i = 0, j = 0, res = 0;
    std::sort(arr.begin(), arr.end());
    std::vector<int> temp;
    auto get = [&]()->int {
        if (j == nj || (i < ni && arr[i] < temp[j])) return arr[i++];
        return temp[j++];
    };
    while ((ni-i)+(nj-j) > 1) {
        int a = get(), b = get();
        res += a+b;
        temp.push_back(a + b); nj++;
    }
    return res;
}

int main() {
    fprintf(stderr, "%i\n", hsum(std::vector<int>{1,4,2,3}));
    return 0;
}
Very nice idea!
Another improvement comes from noting that the total number of elements still to be processed (across the original array and the temporary one holding the sums) shrinks at every step: each step consumes two values and produces only one.
Because of that, and because the very first step consumes two input elements, the write position of a "walking queue" kept inside the array itself can never catch up with the reading pointer.
This means there is no need for a temporary array at all; the space for the sums can be found in the array itself...
int hsum(std::vector<int> arr) {
    int ni = arr.size(), nj = 0, i = 0, j = 0, res = 0;
    std::sort(arr.begin(), arr.end());
    auto get = [&]()->int {
        if (j == nj || (i < ni && arr[i] < arr[j])) return arr[i++];
        return arr[j++];
    };
    while ((ni-i)+(nj-j) > 1) {
        int a = get(), b = get();
        res += a+b;
        arr[nj++] = a + b;
    }
    return res;
}
About the error on SPOJ... I tried briefly to find the problem but didn't succeed. I did, however, generate random arrays of random lengths and check this solution against a brute-force one implemented directly from the specs, and I'm reasonably confident that the algorithm is correct.
I know at least one programming arena (Topcoder) where sometimes the problems are carefully crafted so that the computation gives correct results if using unsigned but not if using int (or if using unsigned long long but not if using long long) because of integer overflow.
I don't know if SPOJ also does this kind of nonsense(1)... may be that is the reason some hidden test case fails...
EDIT
Checking with SPOJ the algorithm passes if using long long values... this is the entry I used:
#include <stdio.h>
#include <algorithm>
#include <vector>

int main(int argc, const char *argv[]) {
    int n;
    scanf("%i", &n);
    for (int testcase=0; testcase<n; testcase++) {
        int sz; scanf("%i", &sz);
        std::vector<long long> arr(sz);
        for (int i=0; i<sz; i++) scanf("%lli", &arr[i]);
        int ni = arr.size(), nj = 0, i = 0, j = 0;
        long long res = 0;
        std::sort(arr.begin(), arr.end());
        auto get = [&]() -> long long {
            if (j == nj || (i < ni && arr[i] < arr[j])) return arr[i++];
            return arr[j++];
        };
        while ((ni-i)+(nj-j) > 1) {
            long long a = get(), b = get();
            res += a+b;
            arr[nj++] = a + b;
        }
        printf("%lli\n", res);
    }
    return 0;
}
PS: This very kind of computation is also what is needed to build a Huffman tree for entropy coding given the symbol frequency table, so it's not merely a random exercise but has practical applications.
(1) I'm saying "nonsense" because in Topcoder they never give problems that require 65 bits; thus it's not a genuine care about overflows, but just setting traps for novices.
Another practice I consider bad that I saw on TC is that some problems are carefully designed so that the correct algorithm, when implemented in C++, barely fits within the time limit: just use another language (and get, say, a 2x slowdown) and you cannot solve the problem.
First of all, think simple!
When using a priority queue, the problem is easy!
In the first test case :
1 6 3 20
// after pushing to Q
1 3 6 20
// and sum two top items and pop and push!
(1 + 3) 6 20 cost = 4
(4 + 6) 20 cost = 10 + 4
(10 + 20) cost = 30 + 14
30 cost = 44
#include <iostream>
#include <queue>
using namespace std;

int main()
{
    int t;
    cin >> t;
    while (t--) {
        int n;
        cin >> n;
        priority_queue<long long int, vector<long long int>, greater<long long int>> q;
        for (int i = 0; i < n; ++i) {
            int k;
            cin >> k;
            q.push(k);
        }
        long long int sum = 0;
        while (q.size() > 1) {
            long long int a = q.top();
            q.pop();
            long long int b = q.top();
            q.pop();
            q.push(a + b);
            sum += a + b;
        }
        cout << sum << "\n";
    }
}
Basically we need to sort the list in descending order and then compute its cost like this:
def min_cost(A):
    A.sort(reverse=True)
    cost = 0
    for i in range(len(A)):
        cost += A[i] * (i + 1)
    return cost
I am playing with the travelling salesman problem and am looking at the version where:
the towns are points in 2D space, there is a path between every pair of towns, and the path lengths are the distances between the points. So it's very easy to implement the naive solution where you check all permutations of the n points and calculate the length of each path.
I've found, however, that for n >= 10 the compiler does some magic and prints a value that is certainly not the actual shortest path. I compile with the Microsoft Visual Studio compiler in release mode with the default settings. For values in the range (10, 30) it thinks for 30 seconds and then returns a number that seems like it could be correct but isn't (I check in different ways). And for n > 40 it produces a result immediately, and that result is always 2.14748e+09.
I am looking for an explanation of what the compiler does in these different situations (the (10, 30) case is really interesting), and for an example where these optimizations are more useful than the program just spinning until the end of the world.
vector<pair<int,int>> points;

void min_len()
{
    // n is a global variable with the number of points (towns)
    double min = INT_MAX;
    // there are n! permutations of n elements
    for (auto j = 0; j < factorial(n); ++j)
    {
        double sum = 0;
        for (auto i = 0; i < n - 1; ++i)
        {
            sum += distance_points(points[i], points[i + 1]);
        }
        if (sum < min)
        {
            min = sum;
            s_path = points;
        }
        next_permutation(points.begin(), points.end());
    }
    for (auto i = 0; i < n; ++i)
    {
        cout << s_path[i].first << " " << s_path[i].second << endl;
    }
    cout << min << endl;
}

unsigned int factorial(unsigned int n)
{
    int res = 1, i;
    for (i = 2; i <= n; i++)
        res *= i;
    return res;
}
Your factorial function is overflowing. unsigned int is typically 32 bits, and 13! already exceeds that. Worse, from 34! onward the product contains at least 32 factors of two, so the wrapped result is exactly 0, the outer loop never runs, and min stays at its initial value of INT_MAX, which is the 2.14748e+09 you see. Try replacing the function with one returning int64_t and watch your code take years to terminate for n > 20.
constexpr uint64_t factorial(unsigned int n) {
    return n ? n * factorial(n-1) : 1;
}
Also, you don't need to calculate the permutation count at all: std::next_permutation returns false once all permutations have been produced (starting from the sorted order), so you can use its return value to terminate the loop.
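As a sketch of what that looks like (this is a rewrite of min_len reusing the question's own globals points, s_path and n and its distance_points function, so it is not a standalone program):

// Same search, driven by next_permutation's return value, so factorial()
// is not needed at all.
void min_len()
{
    sort(points.begin(), points.end());   // permutations are enumerated from sorted order
    double min = INT_MAX;
    do
    {
        double sum = 0;
        for (auto i = 0; i < n - 1; ++i)
        {
            sum += distance_points(points[i], points[i + 1]);
        }
        if (sum < min)
        {
            min = sum;
            s_path = points;
        }
    } while (next_permutation(points.begin(), points.end()));
    for (auto i = 0; i < n; ++i)
    {
        cout << s_path[i].first << " " << s_path[i].second << endl;
    }
    cout << min << endl;
}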
I'm trying to understand possible optimization methods for the bubble sort algorithm. I know there are better sorting methods, but I'm just curious.
To test the efficiency I'm using std::chrono. The program sorts an int array of 10000 numbers 30 times and prints the average sorting time. The numbers are picked randomly (up to 10000) in every iteration. Here is the code, with no optimization:
#include <iostream>
#include <ctime>
#include <chrono>
using namespace std;

int main() {
    //bubble sort
    srand(time(NULL));
    chrono::time_point<chrono::steady_clock> start, end;
    const int n = 10000;
    int i, j, last, tests = 30, arr[n];
    long long total = 0;
    bool out;
    while (tests-- > 0) {
        for (i = 0; i < n; i++) {
            arr[i] = rand() % 1000;
        }
        j = n;
        start = chrono::high_resolution_clock::now();
        while (1) {
            out = 0;
            for (i = 0; i < j - 1; i++) {
                if (arr[i + 1] < arr[i]) {
                    swap(arr[i + 1], arr[i]);
                    out = 1;
                }
            }
            if (!out) {
                break;
            }
            //j--;
        }
        end = chrono::high_resolution_clock::now();
        total += chrono::duration_cast<chrono::nanoseconds>(end - start).count();
        cout << "Remaining :" << tests << endl;
    }
    cout << "Average :" << total / static_cast<double>(30) / 1000000000 << " seconds"; // tests(30) + nanosec -> sec
    cin.sync();
    cin.ignore();
    return 0;
}
I get 0.17 seconds average sorting time.
If I uncomment the j--; line to avoid comparing numbers that are already sorted, I get a 0.12-second sorting time, which is understandable.
If I remember the last position where a swap took place, I know that the elements after that index are already sorted, and can thus sort only up to that position in further iterations. It's better explained in the second part of this post: https://stackoverflow.com/a/16196115/1967496.
This is the code that implements the new possible optimization:
#include <iostream>
#include <ctime>
#include <chrono>
using namespace std;

int main() {
    //bubble sort
    srand(time(NULL));
    chrono::time_point<chrono::steady_clock> start, end;
    const int n = 10000;
    int i, j, last, tests = 30, arr[n];
    long long total = 0;
    bool out;
    while (tests-- > 0) {
        for (i = 0; i < n; i++) {
            arr[i] = rand() % 1000;
        }
        j = n;
        start = chrono::high_resolution_clock::now();
        while (1) {
            out = 0;
            for (i = 0; i < j - 1; i++) {
                if (arr[i + 1] < arr[i]) {
                    swap(arr[i + 1], arr[i]);
                    out = 1;
                    last = i;
                }
            }
            if (!out) {
                break;
            }
            j = last + 1;
        }
        end = chrono::high_resolution_clock::now();
        total += chrono::duration_cast<chrono::nanoseconds>(end - start).count();
        cout << "Remaining :" << tests << endl;
    }
    cout << "Average :" << total / static_cast<double>(30) / 1000000000 << " seconds"; // tests(30) + nanosec -> sec
    cin.sync();
    cin.ignore();
    return 0;
}
Note the last = i; assignment and the j = last + 1; line. And here comes the problem: the average time is now again around 0.17 seconds.
Is there a problem in my code, or am I missing something ?
Update:
I did the sorting with 10 times more numbers and now get the following results:
No optimization: 19.3 seconds
First optimization (j--): 14.5 seconds
Second (supposed) optimization (j = last + 1): 17.4 seconds
From my understanding, the second method should be in any case better than the first, but the numbers tell something else.
Well... there might not be a right or wrong answer to this question.
First of all, when you're comparing only 10000 elements, you cannot really call it an efficiency test. Try comparing a much higher number of elements, maybe 500000 (although you will probably need to allocate the array dynamically for that).
Second of all, it might be the compiler. Compilers often try to optimize things so that the program execution will run smoother and faster.
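For instance, a bigger test array can be put on the heap with std::vector (500000 here is just an arbitrary example size; a plain int arr[500000] would likely overflow the stack):

#include <vector>
#include <cstdlib>

int main() {
    const int n = 500000;
    std::vector<int> arr(n);          // heap-allocated, freed automatically
    for (int i = 0; i < n; ++i) {
        arr[i] = std::rand() % 1000;
    }
    // ... run and time the bubble sort variants on arr here ...
    return 0;
}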
I know there are multiple topics regarding Project Euler #8, but I am using a different approach, with no STL.
#include <iostream>
#include <cstring>
using namespace std;
int main(){
char str[] = "7316717653133062491922511967442657474235534919493496983520312774506326239578318016984801869478851843858615607891129494954595017379583319528532088055111254069874715852386305071569329096329522744304355766896648950445244523161731856403098711121722383113622298934233803081353362766142828064444866452387493035890729629049156044077239071381051585930796086670172427121883998797908792274921901699720888093776657273330010533678812202354218097512545405947522435258490771167055601360483958644670632441572215539753697817977846174064955149290862569321978468622482839722413756570560574902614079729686524145351004748216637048440319989000889524345065854122758866688116427171479924442928230863465674813919123162824586178664583591245665294765456828489128831426076900422421902267105562632111110937054421750694165896040807198403850962455444362981230987879927244284909188845801561660979191338754992005240636899125607176060588611646710940507754100225698315520005593572972571636269561882670428252483600823257530420752963450";
int size = strlen(str);
int number = 1;
int max = 0;
int product = 0;
int lowerBound = 0;
int upperBound = 4;
for (int i = 0; i <= size/5; i++)
{
for (int j = lowerBound; j <= upperBound; j++)
{
number = number * str[j];
}
product = number;
number = 1;
lowerBound += 5;
upperBound += 5;
if (product > max)
{
max = product;
}
}
cout << "the largest product: " << max << endl;
return 0;
}
The answer I get is 550386080, which is way too big and incorrect.
Please tell me what's wrong with my code. No advanced pointer or template techniques, please, just control flow statements and some basic stuff.
Part of your problem is the expression
number = number * str[j];
The str[j] is an ASCII character and you are incorrectly assuming it's a numeric value in the range 0..9. A cheap way to convert a single numeric character to a number would be to say
number = number * (str[j] - '0');
That gets you closer to the correct answer but there is another problem. You are testing each index range like [0..4], [5..9], [10..14], [15..19], etc. You should instead be testing indices [0..4], [1..5], [2..6], [3..7], etc. I'll leave that for you to correct.
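For what it's worth, the windowed scan hinted at above might look roughly like this; it's only a sketch reusing str and size from the question, with the digit conversion folded in, not a verified drop-in fix.

// Sketch: check every window of 5 adjacent digits instead of disjoint
// blocks of 5 (the product of five digits fits easily in an int, but
// long long is used to stay on the safe side).
long long best = 0;
for (int i = 0; i + 5 <= size; i++)          // windows [0..4], [1..5], [2..6], ...
{
    long long product = 1;
    for (int j = i; j < i + 5; j++)
    {
        product *= str[j] - '0';             // numeric value of the digit
    }
    if (product > best)
    {
        best = product;
    }
}
cout << "the largest product: " << best << endl;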