optimize c++ query to calculate Nmin - c++

I am trying to optimize a program that calculates Nmin, the smallest N for which the approximation error falls below a tolerance, by evaluating the approximation for increasing values of N.
I do not come from a programming background and have only just started learning.
My first version was inefficient because it kept calculating even after Nmin had been found.
To reduce the run time I made the changes below to cut down on function calls, but saw no improvement:
#include<iostream>
#include<cmath>
#include<time.h>
#include<iomanip>
using namespace std;
double f(int);
int main(void)
{
double err;
double pi = 4.0*atan(1.0);
cout<<fixed<<setprecision(7);
clock_t start = clock();
for (int n=1;;n++)
{
if((f(n)-pi)>= 1e-6)
{
cout<<"n_min is "<< n <<"\t"<<f(n)-pi<<endl;
}
else
{
break;
}
}
clock_t stop = clock();
//double elapsed = (double)(stop - start) * 1000.0 / CLOCKS_PER_SEC; //this one in ms
cout << "time: " << (stop-start)/double(CLOCKS_PER_SEC)*1000 << endl; // elapsed time in ms (seconds * 1000)
return 0;
}
double f(int n)
{
double sum=0;
for (int i=1;i<=n;i++)
{
sum += 1/(1+pow((i-0.5)/n,2));
}
return (4.0/n)*sum;
}
Is there any way to reduce the run time and make this second version more efficient?
Any help would be greatly appreciated.

I do not see any immediate way of optimizing the algorithm itself. You could however reduce the time significantly by not writing to the standard output for every iteration. Also, do not calculate f(n) more than once per iteration.
for (int n=1;;n++)
{
double val = f(n);
double diff = val-pi;
if(diff < 1e-6)
{
cout<<"n_min is "<< n <<"\t"<<diff<<endl;
break;
}
}
Note however that this will yield a higher n_min (increased by 1 compared to the result of your version) since we changed the condition to diff < 1e-6.
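As a rough sketch of the advice above (evaluate f(n) once per iteration and print nothing inside the loop), the search could be wrapped in a function like the one below. The names midpoint_pi and find_n_min are made up for this example, and replacing pow(x, 2) with x*x is an extra change that the answer does not mention.

```cpp
#include <cassert>
#include <cmath>

// Same formula as f(n) in the question (midpoint rule for the integral of
// 4/(1+x^2) on [0, 1]), but with x*x instead of pow(x, 2).
// The name midpoint_pi is an assumption made for this sketch.
double midpoint_pi(int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 1; i <= n; ++i) {
        const double x = (i - 0.5) / n;
        sum += 1.0 / (1.0 + x * x);
    }
    return (4.0 / n) * sum;
}

// Returns the first n whose error (approximation minus pi) is below tol.
// The approximation is evaluated exactly once per n and nothing is printed
// inside the loop. The name find_n_min is an assumption made for this sketch.
int find_n_min(double tol)
{
    const double pi = 4.0 * std::atan(1.0);
    for (int n = 1;; ++n) {
        if (midpoint_pi(n) - pi < tol) {
            return n;
        }
    }
}
```

Calling find_n_min(1e-6) and printing the result once after the loop keeps the output cost out of the timed region. Note that a tolerance below the rounding noise of double would keep the loop running for a very long time, so this is only meant for tolerances like the one in the question.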

Related

C++ function to approximate sine using taylor series expansion

Hi, I am trying to calculate the Taylor series expansion of sine to a specified number of terms, but I am running into some problems.
Your task is to implement makeSineToOrder(k)
This is templated by the type of values used in the calculation.
It must yield a function that takes a value of the specified type and
returns the sine of that value (in the specified type again)
double factorial(double long order){
#include <iostream>
#include <iomanip>
#include <cmath>
double fact = 1;
for(int i = 1; i <= num; i++){
fact *= i;
}
return fact;
}
void makeSineToOrder(long double order,long double precision = 15){
double value = 0;
for(int n = 0; n < precision; n++){
value += pow(-1.0, n) * pow(num, 2*n+1) / factorial(2*n + 1);
}
return value;
int main()
{
using namespace std;
long double pi = 3.14159265358979323846264338327950288419716939937510L;
for(int order = 1;order < 20; order++) {
auto sine = makeSineToOrder<long double>(order);
cout << "order(" << order << ") -> sine(pi) = " << setprecision(15) << sine(pi) << endl;
}
return 0;
}
I tried debugging
Here is a version that at least compiles and gives some output:
#include <iostream>
#include <iomanip>
#include <cmath>
using namespace std;
double factorial(double long num) {
double fact = 1;
for (int i = 1; i <= num; i++) {
fact *= i;
}
return fact;
}
double makeSineToOrder(double num, double precision = 15) {
double value = 0;
for (int n = 0; n < precision; n++) {
value += pow(-1.0, n) * pow(num, 2 * n + 1) / factorial(2 * n + 1);
}
return value;
}
int main(){
long double pi = 3.14159265358979323846264338327950288419716939937510L;
for (int order = 1; order < 20; order++) {
auto sine = makeSineToOrder(order);
cout << "order(" << order << ") -> sine(pi) = " << setprecision(15) << sine << endl;
}
return 0;
}
I am not sure what that odd sine(pi) was supposed to be doing.
Apart from the obvious syntax errors in your code (the includes should come before your factorial function):
I see no templates in your code, although your assignment clearly states to use them
so I would expect a template like:
template <class T> T mysin(T x,int n=15){ ... }
using pow for a generic datatype is not safe
because the built-in pow will use float or double instead of your generic type, so you can expect rounding/casting problems or even an unresolved function in case of an incompatible type.
To remedy that you can rewrite the code to not use pow, as it is just repeated multiplication in a loop, so why compute pow again and again?
using a factorial function is wasteful
you can compute it in the same loop, just like the power, with no need to redo the already computed multiplications again and again. Also, not using a template for your factorial causes the same problems as using pow
so putting it all together using this formula:
sin(x) = x - x^3/3! + x^5/5! - x^7/7! + ...
along with templates and replacing the pow and factorial functions with running products, I got this:
template <class T> T mysin(T x,int n=15)
{
int i;
T y=0; // result
T x2=x*x; // x^2
T xi=x; // x^i
T ii=1; // i!
if (n>0) for(i=1;;)
{
y+=xi/ii; xi*=x2; i++; ii*=i; i++; ii*=i; n--; if (!n) break;
y-=xi/ii; xi*=x2; i++; ii*=i; i++; ii*=i; n--; if (!n) break;
}
return y;
}
so the factorial ii is multiplied by i+1 and i+2 every iteration and the power xi is multiplied by x^2 every iteration ... the sign change is hard-coded, so the for loop adds 2 terms per pass (that is the reason for the break;)
As you can see this does not use anything fancy, so you do not need any includes for this, not even math ...
You might want to add x=fmod(x,6.283185307179586476925286766559) at the start of mysin in order to use more than just the first period; however, in that case you have to ensure that the fmod implementation uses T or a type compatible with it ... Also, the 2*pi constant should be in the target precision or higher
beware that too big an n will overflow both int and the generic type T (so you might want to limit n based on the type used, or just use it wisely).
Also note that on 32-bit floats you cannot get better than 5 decimal places no matter what n is with this kind of computation.
Btw. there are faster and more accurate methods of computing trigonometric functions, like Chebyshev approximation and CORDIC
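Since the assignment asks for makeSineToOrder(k) to yield a function, here is one possible sketch along the lines of mysin above. It assumes that "order" means the number of series terms to sum (the assignment text does not say), and returning std::function is just one choice among several.

```cpp
#include <cassert>
#include <functional>

// Sketch only: returns a callable that sums the first `terms` terms of the
// Taylor series of sine using type T throughout.
// Assumption: the "order" in the assignment is the number of terms.
template <class T>
std::function<T(T)> makeSineToOrder(int terms)
{
    assert(terms >= 0);
    return [terms](T x) {
        T sum = 0;           // running result
        T term = x;          // current term, starts at x^1 / 1!
        const T x2 = x * x;
        for (int k = 0; k < terms; ++k) {
            sum += term;
            // Next term: multiply by -x^2 / ((2k+2)(2k+3)), which updates
            // the sign, the power and the factorial in one step.
            term *= -x2 / static_cast<T>((2 * k + 2) * (2 * k + 3));
        }
        return sum;
    };
}
```

With this, the loop in the question's main (auto sine = makeSineToOrder<long double>(order); followed by sine(pi)) would compile as written. Updating the term by a ratio instead of keeping separate power and factorial values means no huge factorial is ever stored, at the cost of one division per term.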

How can I test two algorithms and determine which is faster?

Whenever I work on a specific problem, I may come across different solutions, and I'm not sure how to choose the better of two options. The first idea is to compute the complexity of the two solutions, but sometimes they share the same complexity, or they differ but the range of the input is so small that the constant factor matters.
The second idea is to benchmark both solutions. However, I'm not sure how to time them using c++. I have found this question:
How to Calculate Execution Time of a Code Snippet in C++ , but I don't know how to properly deal with compiler optimizations or processor inconsistencies.
In short: is the code provided in the question above sufficient for everyday tests? Are there options that I should enable in the compiler before I run the tests? (I'm using Visual C++.) How many tests should I run, and how large must the time difference between the two benchmarks be to matter?
Here is an example of a code I want to test. Which of these is faster? How can I calculate that myself?
unsigned long long fiborecursion(int rank){
if (rank == 0) return 1;
else if (rank < 0) return 0;
return fiborecursion(rank-1) + fiborecursion(rank-2);
}
double sq5 = sqrt(5);
unsigned long long fiboconstant(int rank){
return pow((1 + sq5) / 2, rank + 1) / sq5 + 0.5;
}
Using the clock from this answer
#include <iostream>
#include <chrono>
class Timer
{
public:
Timer() : beg_(clock_::now()) {}
void reset() { beg_ = clock_::now(); }
double elapsed() const {
return std::chrono::duration_cast<second_>
(clock_::now() - beg_).count(); }
private:
typedef std::chrono::high_resolution_clock clock_;
typedef std::chrono::duration<double, std::ratio<1> > second_;
std::chrono::time_point<clock_> beg_;
};
You can write a program to time both of your functions.
int main() {
const int N = 10000;
Timer tmr;
tmr.reset();
for (int i = 0; i < N; i++) {
auto value = fiborecursion(i%50);
}
double time1 = tmr.elapsed();
tmr.reset();
for (int i = 0; i < N; i++) {
auto value = fiboconstant(i%50);
}
double time2 = tmr.elapsed();
std::cout << "Recursion"
<< "\n\tTotal: " << time1
<< "\n\tAvg: " << time1 / N
<< "\n"
<< "\nConstant"
<< "\n\tTotal: " << time2
<< "\n\tAvg: " << time2 / N
<< "\n";
}
I would try compiling with no compiler optimizations (-O0) and with max compiler optimizations (-O3), or /Od and /O2 in Visual C++, just to see what the differences are. At max optimizations the compiler may well eliminate the loops entirely, because value is never used.
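One common way to keep an optimizer from discarding the timed calls is to write every result to a volatile variable. The sketch below does that; time_calls and sink are names invented for this example, the two Fibonacci functions are copied from the question, and the argument range of 0..19 is an assumption chosen so the recursive version stays quick.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// The two candidates from the question.
unsigned long long fiborecursion(int rank)
{
    if (rank == 0) return 1;
    else if (rank < 0) return 0;
    return fiborecursion(rank - 1) + fiborecursion(rank - 2);
}

unsigned long long fiboconstant(int rank)
{
    const double sq5 = std::sqrt(5.0);
    return std::pow((1 + sq5) / 2, rank + 1) / sq5 + 0.5;
}

// Hypothetical helper: calls fn(i % 20) reps times and returns the elapsed
// seconds. Each result is accumulated into a volatile variable so the
// calls stay observable; the sum is handed back as a checksum.
template <class Fn>
double time_calls(Fn fn, int reps, unsigned long long& checksum)
{
    assert(reps > 0);
    volatile unsigned long long sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        sink = sink + fn(i % 20);
    }
    const auto stop = std::chrono::steady_clock::now();
    checksum = sink;
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing the two checksums also doubles as a sanity check that both functions return the same values over the tested range. Running each measurement several times and looking at the spread gives a feel for how much of a difference is just noise.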

Are loops really faster than recursion?

According to my professor, loops are faster and more efficient than recursion, yet I came up with this C++ code that calculates the Fibonacci series using both recursion and loops, and the results show that they perform very similarly. So I maxed out the possible input to see if there was a difference in performance, and for some reason recursion clocked in better than the loop. Does anyone know why? Thanks in advance.
Here's the code:
#include "stdafx.h"
#include "iostream"
#include <time.h>
using namespace std;
double F[200000000];
//double F[5];
/*int Fib(int num)
{
if (num == 0)
{
return 0;
}
if (num == 1)
{
return 1;
}
return Fib(num - 1) + Fib(num - 2);
}*/
double FiboNR(int n) // array of size n
{
for (int i = 2; i <= n; i++)
{
F[i] = F[i - 1] + F[i - 2];
}
return (F[n]);
}
double FibMod(int i,int n) // array of size n
{
if (i==n)
{
return F[i];
}
F[i] = F[i - 1] + F[i - 2];
return (F[n]);
}
int _tmain(int argc, _TCHAR* argv[])
{
/*cout << "----------------Recursion--------------"<<endl;
for (int i = 0; i < 36; i=i+5)
{
clock_t tStart = clock();
cout << Fib(i);
printf("Time taken: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
cout << " : Fib(" << i << ")" << endl;
}*/
cout << "----------------Linear--------------"<<endl;
for (int i = 0; i < 200000000; i = i + 20000000)
//for (int i = 0; i < 50; i = i + 5)
{
clock_t tStart = clock();
F[0] = 0; F[1] = 1;
cout << FiboNR(i);
printf("Time taken: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
cout << " : Fib(" << i << ")" << endl;
}
cout << "----------------Recursion Modified--------------" << endl;
for (int i = 0; i < 200000000; i = i + 20000000)
//for (int i = 0; i < 50; i = i + 5)
{
clock_t tStart = clock();
F[0] = 0; F[1] = 1;
cout << FibMod(0,i);
printf("Time taken: %.2fs\n", (double)(clock() - tStart) / CLOCKS_PER_SEC);
cout << " : Fib(" << i << ")" << endl;
}
std::cin.ignore();
return 0;
}
If you go by the conventional programming approach, loops are faster. But there is a category of languages, called functional programming languages, that do not have loops. I am a big fan of functional programming and an avid Haskell user. Haskell is a functional programming language; instead of loops you use recursion. To make recursion fast there is a technique known as tail recursion. Basically, to avoid pushing a lot of extra information onto the system stack, you write the function in such a way that all intermediate results are passed as function parameters, so that nothing needs to be kept on the stack other than the return address. Once the final recursive call has finished, the program does not need to unwind the stack frame by frame; it can go straight back to the first function call's stack entry. Compilers for functional programming languages are designed to deal with this. Nowadays even non-functional programming languages are implementing tail recursion.
For example, consider a recursive solution for finding the factorial of a positive number. The basic implementation in C would be
int fact(int n)
{
if(n == 1 || n== 0)
return 1;
return n*fact(n-1);
}
In the above approach, each time the function is called, n is stored on the stack so that it can be multiplied with the result of fact(n-1). This multiplication happens during stack unwinding. Now check out the following implementation.
int fact(int n,int result)
{
if(n == 1 || n== 0)
return result;
return fact(n-1,n*result);
}
In this approach we pass the running result along in the parameter result, so when the recursion ends the answer is already in result. The only thing you have to do is pass a value of 1 for result in the initial call. The stack can then be unwound directly to its first entry. Of course, I am not sure whether C or C++ compilers detect tail recursion, but functional programming language compilers do.
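To make the accumulator idea concrete, here is a small C++ sketch with a wrapper that supplies the initial value of 1, next to the loop that a tail call can be reduced to. The names fact_acc and fact_loop are made up for this example. The C++ standard does not require tail calls to be eliminated, although mainstream compilers commonly do it when optimizations are enabled.

```cpp
#include <cassert>

// Tail-recursive helper: acc carries the product computed so far, so
// nothing is left to do after the recursive call returns.
unsigned long long fact_acc(unsigned n, unsigned long long acc)
{
    if (n <= 1) return acc;
    return fact_acc(n - 1, acc * n);
}

// Wrapper so that callers do not have to pass the initial accumulator.
unsigned long long fact(unsigned n)
{
    assert(n <= 20);  // 21! no longer fits in 64 bits
    return fact_acc(n, 1ULL);
}

// The equivalent loop: same accumulator, same updates, no call stack growth.
unsigned long long fact_loop(unsigned n)
{
    unsigned long long acc = 1;
    for (; n > 1; --n) acc *= n;
    return acc;
}
```

Both fact(5) and fact_loop(5) give 120. Whether the recursive form actually runs in constant stack space depends on the compiler and its optimization settings, so the loop is the safer choice when deep recursion is possible.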
Your "recursion modified" version doesn't have recursion at all.
In fact, the only thing enabling a non-recursive version that fills in exactly one new entry of the array is the for-loop in your main function -- so it is actually a solution using iteration also (props to immibis and BlastFurnace for noticing that).
But your version doesn't even do that correctly. Rather, since it is always called with i == 0, it illegally reads F[-1] and F[-2]. You are lucky (?)[1] the program didn't crash.
The reason you are getting correct results is that the entire F array is prefilled by the correct version.
Your attempt to calculate Fib(2000....) isn't successful anyway, since you overflow a double. Did you even try running that code?
Here's a version that works correctly (to the precision of double, anyway) and doesn't use a global array (it really is iteration vs recursion and not iteration vs memoization).
#include <cstdio>
#include <ctime>
#include <utility>
double FiboIterative(int n)
{
double a = 0.0, b = 1.0;
if (n <= 0) return a;
for (int i = 2; i <= n; i++)
{
b += a;
a = b - a;
}
return b;
}
std::pair<double,double> FiboRecursive(int n)
{
if (n <= 0) return {};
if (n == 1) return {0, 1};
auto rec = FiboRecursive(n-1);
return {rec.second, rec.first + rec.second};
}
int main(void)
{
const int repetitions = 1000000; // 1e6 repetitions, so the total in seconds equals the average per call in microseconds
const int n = 100;
volatile double result;
std::puts("----------------Iterative--------------");
std::clock_t tStart = std::clock();
for( int i = 0; i < repetitions; ++i )
result = FiboIterative(n);
std::printf("[%d] = %f\n", n, result);
std::printf("Time taken: %.2f us\n", (std::clock() - tStart) / 1.0 / CLOCKS_PER_SEC);
std::puts("----------------Recursive--------------");
tStart = std::clock();
for( int i = 0; i < repetitions; ++i )
result = FiboRecursive(n).second;
std::printf("[%d] = %f\n", n, result);
std::printf("Time taken: %.2f us\n", (std::clock() - tStart) / 1.0 / CLOCKS_PER_SEC);
return 0;
}
--
[1] Arguably anything that hides a bug is actually unlucky.
I don't think this is a good question. But maybe the answer to why is somewhat interesting.
First, let me say that in general the statement is probably true. But ...
Questions about the performance of C++ programs are very localized. It's never possible to give a good general answer. Every example should be profiled and analyzed separately, and that involves lots of technicalities. C++ compilers are allowed to modify a program practically as they wish as long as the visible side effects stay the same (whatever precisely that means). So as long as your computation gives the same result, it is fine. This technically allows a compiler to transform one version of your program into an equivalent one, even from a recursive version into a loop-based one and vice versa. So it depends on the compiler's optimizations and effort.
Also, to compare one version to another you would need to prove that the versions you compare are actually equivalent.
It might also happen that somehow a recursive implementation of algorithm is faster than a loop based one if it's easier to optimize for the compiler. Usually iterative versions are more complex, and generally the simpler the code is, the easier it is for the compiler to optimize because it can make assumptions about invariants, etc.

Calculating Running Time of Binary Search

The following Binary Search program is returning a running time of 0 milliseconds using GetTickCount() no matter how big the search item is set in the given list of values.
Is there any other way to get the running time for comparison?
Here's the code :
#include <iostream>
#include <windows.h>
using namespace std;
int main(int argc, char **argv)
{
long int i = 1, max = 10000000;
long int *data = new long int[max];
long int initial = 1;
long int final = max, mid, loc = -5;
for(i = 1; i<=max; i++)
{
data[i] = i;
}
int range = final - initial + 1;
long int search_item = 8800000;
cout<<"Search Item :- "<<search_item<<"\n";
cout<<"-------------------Binary Search-------------------\n";
long int start = GetTickCount();
cout<<"Start Time : "<<start<<"\n";
while(initial<=final)
{
mid=(initial+final)/2;
if(data[mid]==search_item)
{
loc=mid;
break;
}
if(search_item<data[mid])
final=mid-1;
if(search_item>data[mid])
initial=mid+1;
}
long int end = GetTickCount();
cout<<"End Time : "<<end<<"\n";
cout << "time: " << double(end - start)<<" milliseconds \n";
if(loc==-5)
cout<<" Required number not found "<<endl;
else
cout<<" Required number is found at index "<<loc<<endl;
return 0;
}
Your code looks like this:
int main()
{
// Some code...
while (some_condition)
{
// Some more code...
// Print timing result
return 0;
}
}
That's why your code prints zero time, you only do one iteration of the loop then you exit the program.
Try to use the clock_t object from the time.h header:
clock_t START, END;
START = clock();
// YOUR CODE GOES HERE
END = clock();
float clocks = END - START;
cout <<"running time : " << clocks/CLOCKS_PER_SEC << " seconds" << endl;
CLOCKS_PER_SEC is a predefined constant used to convert clock ticks to seconds.
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724408(v=vs.85).aspx
This article says that the result of GetTickCount will wrap around to zero if your system runs continuously for 49.7 days.
See "Easily measure elapsed time" for how to measure time in C++.
You can use time.h header
and do something like this in your code :
clock_t Start, Stop;
double sec;
Start = clock();
//call your BS function
Stop = clock();
sec = ((double) (Stop - Start) / CLOCKS_PER_SEC);
and print the sec!
I hope this helps you!
The complexity of binary search is O(log2(N)), which is about 23 steps for N = 10000000.
I think that is not enough work to measure on a real-time scale, or even with clock().
In this case you should use unsigned long long __rdtsc(), which returns the number of processor ticks since the last reset. Call it before and after your binary search, and move cout << start; to after you have obtained the end time. Otherwise the time spent on output would be included.
There is also memory corruption around the data array. Indexes in C run from 0 to size - 1, so there is no data[max] element.
And call delete [] data; before returning.
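A portable alternative to reading the tick counter is to repeat the search many times and divide the total by the repetition count. The sketch below takes that route with std::chrono. It is a swapped-in technique, not the __rdtsc approach described above, and the names binary_search_index and average_search_ns are invented for this example. It also uses 0-based indexing and a std::vector, which sidesteps the data[max] and delete [] issues.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Plain binary search over a sorted vector; returns the index or -1.
long binary_search_index(const std::vector<long>& data, long key)
{
    long lo = 0;
    long hi = static_cast<long>(data.size()) - 1;
    while (lo <= hi) {
        const long mid = lo + (hi - lo) / 2;
        const long value = data[static_cast<std::size_t>(mid)];
        if (value == key) return mid;
        if (key < value) hi = mid - 1;
        else lo = mid + 1;
    }
    return -1;
}

// Hypothetical helper: average nanoseconds per search over reps searches.
// The key changes on every repetition and each result goes to a volatile
// variable, so the work is less likely to be optimized away.
double average_search_ns(const std::vector<long>& data, int reps)
{
    assert(reps > 0 && !data.empty());
    volatile long sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        const long key = data[static_cast<std::size_t>(i) % data.size()];
        sink = binary_search_index(data, key);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / reps;
}
```

With a million or so repetitions the total time is large enough for an ordinary clock to resolve, and the per-search average then falls out of a single division. The measured figure includes the cost of picking the key, so it is best used for comparing variants rather than as an absolute number.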

Is pow() function slower than simple multiplication when exponent is integer? [duplicate]

This question already has answers here:
What is more efficient? Using pow to square or just multiply it with itself?
I have a question: for calculating simple integer powers of a double, is the pow() function slower than simple multiplication? For example, for 2.71828^4, is pow(2.71828, double(4)) slower than multiplying in a for loop?
I have tried to compare the durations of both approaches, but the durations are not stable: sometimes pow() wins and sometimes simple multiplication wins. Can anyone give me a definitive answer?
My code is as follows:
#include <iostream>
#include <cmath>
#include <ctime>
using namespace std;
double myFunction(double a) {
double c = 1;
for (int i = 1; i <= 4; i++)
c *= a;
return c;
}
int main() {
// Calculate the time used by pow function
clock_t start = clock();
for (double i = 0; i < 1000000; i = i + 0.001)
pow(i, 4);
clock_t durationP = double(clock() - start);
cout << "the duration for pow function is: " << durationP << "s" << endl;
// Calculate the time used by simple multiplication
start = clock();
for (double i = 0; i < 1000000; i = i + 0.001)
myFunction(i);
double durationS = double(clock() - start);
cout << "the duration for simple multiplication is:" << durationS << "s"
<< endl;
}
thanks a lot!
Yes, pow is generally slower than multiplication, and multiplication is slower than addition. The takeaway: for a simple power like pow(x, 2), use x*x instead.
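For integer exponents larger than 2, the same idea can be generalized without calling pow. The sketch below uses exponentiation by squaring; ipow is a made-up name, and whether it actually beats pow on a given compiler and machine has to be measured.

```cpp
#include <cassert>

// Hypothetical helper: x raised to a non-negative integer power using
// exponentiation by squaring, which needs O(log n) multiplications.
double ipow(double x, int n)
{
    assert(n >= 0);
    double result = 1.0;
    while (n > 0) {
        if (n & 1) result *= x;  // this bit of the exponent is set
        n >>= 1;
        if (n > 0) x *= x;       // square the base for the next bit
    }
    return result;
}
```

For the exponent 4 from the question this performs three multiplications (one by result and two squarings), compared with the four in the question's loop. For a fixed small exponent, writing x*x*x*x or (x*x)*(x*x) directly is simpler still.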