C++ optimizer calls function after delay

I'm trying to benchmark a recursive Fibonacci sequence calculator in C++. But surprisingly, the program outputs 0 nanoseconds and starts the calculation only after printing the result (CPU usage increases after it prints 0 nanoseconds).
I think this is an optimization feature of the compiler.
#include <iostream>
#include <chrono>

int fib2(int n) {
    return (n < 2) ? n : fib2(n - 1) + fib2(n - 2);
}

int main(int argc, char* argv[])
{
    auto tbegin = std::chrono::high_resolution_clock::now();
    int a = fib2(50);
    auto tend = std::chrono::high_resolution_clock::now();
    std::cout << (tend - tbegin).count() << " nanoseconds" << std::endl;
    std::cout << "fib => " << a << std::endl;
}
Output:
0 nanoseconds
Is this a feature? If so, how can I disable it?

The problem is that the result of this function called with a value of 50 doesn't fit in the int type; it's just too big. Try using int64_t instead.
Live demo
Note that I replaced the original Fibonacci function with a more optimized one, as the execution took too long and the online tool cuts off execution after some period of time. That is not a fault of the program or the code; it's just a protection of the online tool.
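The exact code behind the Live demo link isn't reproduced here, but a minimal sketch of the idea (an iterative Fibonacci accumulating into int64_t, so fib(50) no longer overflows) might look like this:

#include <cstdint>
#include <iostream>
#include <chrono>

// Iterative Fibonacci; int64_t comfortably holds fib(50) = 12586269025.
int64_t fib_iter(int n) {
    int64_t a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        int64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}

int main() {
    auto tbegin = std::chrono::high_resolution_clock::now();
    int64_t a = fib_iter(50);
    auto tend = std::chrono::high_resolution_clock::now();
    std::cout << (tend - tbegin).count() << " nanoseconds\n";
    std::cout << "fib => " << a << '\n';
}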

Chrono C++ timings not correct

I'm just comparing the speed of a couple of Fibonacci functions. One gives an output almost immediately and reports that it got done in 500 nanoseconds, while the other, depending on the depth, may sit there loading for many seconds; yet when it is done, it reports that it took only 100 nanoseconds, even though I just sat there and waited about 20 seconds for it.
It's not a big deal, as I can prove the other is slower just with raw human perception, but why would chrono not be working? Something to do with recursion?
PS: I know that fibonacci2() doesn't give the correct output on odd-numbered depths; I'm just testing some things, and the output is really just there so the compiler doesn't optimize it away or something. Go ahead and copy this code and you'll see fibonacci2() output immediately, but you'll have to wait about 5 seconds for fibonacci(). Thank you.
#include <iostream>
#include <chrono>

int fibonacci2(int depth) {
    static int a = 0;
    static int b = 1;
    if (b > a) {
        a += b; //std::cout << a << '\n';
    } else {
        b += a; //std::cout << b << '\n';
    }
    if (depth > 1) {
        fibonacci2(depth - 1);
    }
    return a;
}

int fibonacci(int n) {
    if (n <= 1) {
        return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}

int main() {
    int f = 0;
    auto start2 = std::chrono::steady_clock::now();
    f = fibonacci2(44);
    auto stop2 = std::chrono::steady_clock::now();
    std::cout << f << '\n';
    auto duration2 = std::chrono::duration_cast<std::chrono::nanoseconds>(stop2 - start2);
    std::cout << "faster function time: " << duration2.count() << '\n';
    auto start = std::chrono::steady_clock::now();
    f = fibonacci(44);
    auto stop = std::chrono::steady_clock::now();
    std::cout << f << '\n';
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    std::cout << "way slower function with incorrect time: " << duration.count() << '\n';
}
I don't know what compiler you are using and with which options, but I tested x64 MSVC v19.28 with /O2 on Godbolt. There the compiled instructions are reordered such that it queries the performance counter twice before invoking the fibonacci(int) function, which in code would look like:
auto start = ...;
auto stop = ...;
f = fibonacci(44);
A solution to disallow this reordering might be to use std::atomic_thread_fence just before and after the fibonacci function call.
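A minimal sketch of that idea (hedged: the standard only defines fences for atomics, so treating them as compiler barriers here is a common practical workaround rather than a guarantee):

#include <atomic>
#include <chrono>
#include <iostream>

int fibonacci(int n) {
    return (n <= 1) ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

int main() {
    auto start = std::chrono::steady_clock::now();
    std::atomic_thread_fence(std::memory_order_seq_cst); // discourage hoisting the clock read
    int f = fibonacci(44);
    std::atomic_thread_fence(std::memory_order_seq_cst); // discourage sinking the clock read
    auto stop = std::chrono::steady_clock::now();
    std::cout << f << '\n'
              << std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count()
              << " ns\n";
}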
As Mestkon answered, the compiler can reorder your code.
For examples of how to prevent the compiler from reordering, see Memory Ordering - Compile Time Memory Barrier.
It would be beneficial in the future if you provided information on which compiler you were using.
gcc 7.5 with -O2, for example, does not reorder the timer instructions in this given scenario.

C++ process time [duplicate]

This question already has answers here:
Measuring execution time of a function in C++
(14 answers)
Closed 4 years ago.
In C# I would fire up the Stopwatch class to do some quick-and-dirty timing of how long certain methods take.
What is the equivalent of this in C++? Is there a high precision timer built in?
I used boost::timer for measuring the duration of an operation. It provides a very easy way to do the measurement while being platform independent. Here is an example:
boost::timer myTimer;
doOperation();
std::cout << myTimer.elapsed();
P.S. To overcome precision errors, it is best to measure operations that take a few seconds, especially when you are trying to compare several alternatives. If you want to measure something that takes very little time, try putting it into a loop: for example, run the operation 1000 times and then divide the total time by 1000.
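A minimal sketch of that loop-and-divide idea, using std::chrono rather than boost::timer (doOperation here is just a stand-in for the code under test):

#include <chrono>
#include <iostream>

void doOperation() {
    volatile int sink = 0;           // volatile keeps the stand-in work from being elided
    for (int i = 0; i < 1000; ++i)
        sink += i;
}

int main() {
    const int runs = 1000;
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i)
        doOperation();
    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double> total = t1 - t0;
    std::cout << "per call: " << total.count() / runs << " s\n";
}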
I've implemented a timer for situations like this before: I actually ended up with a class with two different implementations, one for Windows and one for POSIX.
The reason was that Windows has the QueryPerformanceCounter() function, which gives you access to a very accurate clock that is ideal for such timings.
On POSIX, however, this isn't available, so I just used boost.datetime's classes to store the start and end times and then calculated the duration from those. It offers a "high resolution" timer, but the resolution is undefined and varies from platform to platform.
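For reference, a minimal Windows-only sketch of the QueryPerformanceCounter approach mentioned above (not the answerer's actual class, which isn't shown):

#include <windows.h>
#include <cstdio>

int main()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);    // counter ticks per second
    QueryPerformanceCounter(&t0);
    volatile long sink = 0;
    for (long i = 0; i < 1000000; ++i)   // stand-in for the work being measured
        sink += i;
    QueryPerformanceCounter(&t1);
    double seconds = double(t1.QuadPart - t0.QuadPart) / double(freq.QuadPart);
    printf("elapsed: %.9f s\n", seconds);
    return 0;
}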
I use my own version of Python's time_it function. The advantage of this function is that it repeats a computation as many times as necessary to obtain meaningful results. If the computation is very fast, it will be repeated many times. In the end you obtain the average time of all the repetitions. It does not use any non-standard functionality:
#include <ctime>

double clock_diff_to_sec(long clock_diff)
{
    return double(clock_diff) / CLOCKS_PER_SEC;
}

template<class Proc>
double time_it(Proc proc, int N = 1) // returns time in microseconds
{
    std::clock_t const start = std::clock();
    for(int i = 0; i < N; ++i)
        proc();
    std::clock_t const end = std::clock();
    if(clock_diff_to_sec(end - start) < .2)
        return time_it(proc, N * 5);
    return clock_diff_to_sec(end - start) * (1e6 / N);
}
The following example uses the time_it function to measure the performance of different STL containers:
// Additional headers needed by this example
// (VS2008 SP1 provides std::tr1::unordered_set in <unordered_set>; GCC uses <tr1/unordered_set>):
#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>
#include <list>
#include <set>
#include <vector>
#include <unordered_set>
#include <boost/bind.hpp>

void dummy_op(int i)
{
    if(i == -1)
        std::cout << i << "\n";
}

template<class Container>
void test(Container const & c)
{
    std::for_each(c.begin(), c.end(), &dummy_op);
}

template<class OutIt>
void init(OutIt it)
{
    for(int i = 0; i < 1000; ++i)
        *it = i;
}

int main(int argc, char ** argv)
{
    {
        std::vector<int> c;
        init(std::back_inserter(c));
        std::cout << "vector: "
                  << time_it(boost::bind(&test<std::vector<int> >, c)) << "\n";
    }
    {
        std::list<int> c;
        init(std::back_inserter(c));
        std::cout << "list: "
                  << time_it(boost::bind(&test<std::list<int> >, c)) << "\n";
    }
    {
        std::deque<int> c;
        init(std::back_inserter(c));
        std::cout << "deque: "
                  << time_it(boost::bind(&test<std::deque<int> >, c)) << "\n";
    }
    {
        std::set<int> c;
        init(std::inserter(c, c.begin()));
        std::cout << "set: "
                  << time_it(boost::bind(&test<std::set<int> >, c)) << "\n";
    }
    {
        std::tr1::unordered_set<int> c;
        init(std::inserter(c, c.begin()));
        std::cout << "unordered_set: "
                  << time_it(boost::bind(&test<std::tr1::unordered_set<int> >, c)) << "\n";
    }
}
In case anyone is curious here is the output I get (compiled with VS2008 in release mode):
vector: 8.7168
list: 27.776
deque: 91.52
set: 103.04
unordered_set: 29.76
You can use the ctime library to get the time in seconds. Getting the time in milliseconds is implementation-specific. Here is a discussion exploring some ways to do that.
See also: How to measure time in milliseconds using ANSI C?
High-precision timers are platform-specific and so aren't specified by the C++ standard, but there are libraries available. See this question for a discussion.
I humbly submit my own micro-benchmarking mini-library (on Github). It's super simple -- the only advantage it has over rolling your own is that it already has the high-performance timer code implemented for Windows and Linux, and abstracts away the annoying boilerplate.
Just pass in a function (or lambda), the number of times it should be called per test run (default: 1), and the number of test runs (default: 100). The fastest test run (measured in fractional milliseconds) is returned:
// Example that times the compare-and-swap atomic operation from C++11
// Sample GCC command: g++ -std=c++11 -DNDEBUG -O3 -lrt main.cpp microbench/systemtime.cpp -o bench
#include "microbench/microbench.h"
#include <cstdio>
#include <atomic>

int main()
{
    std::atomic<int> x(0);
    int y = 0;
    printf("CAS takes %.4fms to execute 100000 iterations\n",
        moodycamel::microbench(
            [&]() { x.compare_exchange_strong(y, 0); }, /* function to benchmark */
            100000, /* iterations per test run */
            100     /* test runs */
        )
    );
    // Result: Clocks in at 1.2ms (12ns per CAS operation) in my environment
    return 0;
}
#include <stdio.h>
#include <time.h>

int main(void)
{
    clock_t start, end;
    start = clock();
    // Do stuff
    end = clock();
    printf("Took: %f\n", (float)((end - start) / (float)CLOCKS_PER_SEC));
    return 0;
}
This might be an OS-dependent issue rather than a language issue.
If you're on Windows, you can access a millisecond-granularity timer (with typically 10 to 16 ms of actual resolution) through GetTickCount() or GetTickCount64(). Just call it once at the start and once at the end, and subtract.
That was what I used before if I recall correctly. The linked page has other options as well.
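A minimal hedged sketch of that approach (Windows-only; Sleep just stands in for the work being measured):

#include <windows.h>
#include <cstdio>

int main()
{
    ULONGLONG t0 = GetTickCount64();     // milliseconds since boot
    Sleep(50);                           // stand-in for the work being measured
    ULONGLONG t1 = GetTickCount64();
    printf("elapsed: %llu ms\n", (unsigned long long)(t1 - t0));
    return 0;
}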
You may find this class useful.
Using the RAII idiom, it prints the text given at construction when the destructor is called, filling the elapsed-time placeholder with the proper value.
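The linked class isn't reproduced in this answer; a minimal sketch of an RAII timer in the same spirit might look like the following (the class name trace_elapsed_time and the %t placeholder are inferred from the usage below, so treat the details as assumptions):

#include <chrono>
#include <cstdio>
#include <string>

class trace_elapsed_time {
    std::string fmt_;
    std::chrono::steady_clock::time_point start_;
public:
    explicit trace_elapsed_time(std::string fmt)
        : fmt_(std::move(fmt)), start_(std::chrono::steady_clock::now()) {}
    ~trace_elapsed_time() {
        std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        // Replace the "%t" placeholder with the elapsed seconds.
        std::string::size_type pos = fmt_.find("%t");
        if (pos != std::string::npos)
            fmt_.replace(pos, 2, std::to_string(elapsed.count()));
        std::fputs(fmt_.c_str(), stdout);
    }
};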
Example of use:
#include <unistd.h> // for usleep

int main()
{
    trace_elapsed_time t("Elapsed time: %ts.\n");
    usleep(1.005 * 1e6); // sleep for ~1.005 seconds
}
Output:
Elapsed time: 1.00509s.

std::chrono::duration_cast - any unit more precise than nanoseconds?

I wanted to ask how I can calculate time in units like picoseconds, femtoseconds, and beyond, with more precision. I am measuring running times for functions using nanoseconds, but the running time returns 0 when I use milliseconds or nanoseconds. I think the chrono library only supports down to nanoseconds; it was the most precise unit that appeared when I pressed Ctrl+Space after typing chrono::
#include <chrono>
#include <iostream>

void f(); // the function being timed, defined elsewhere

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    f();
    auto t2 = std::chrono::high_resolution_clock::now();
    std::cout << "f() took "
              << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
              << " milliseconds\n";
}
code source: http://en.cppreference.com/w/cpp/chrono/duration/duration_cast
Thanks.
You can calculate time in more precise durations (picoseconds...)
www.stroustrup.com/C++11FAQ.html
See the following definition:
typedef ratio<1, 1000000000000> pico;
then use:
duration<double, pico> d{1.23}; //...1.23 picoseconds
UPDATE
Your question has three parts:
How to use std::chrono and do calculations with std::chrono::duration
How to get higher-precision timestamps
How to do performance measurements of your code
Above, I partially answered only the first question (how to define a "picoseconds" duration). Consider the following code as an example:
#include <chrono>
#include <iostream>
#include <typeinfo>

using namespace std;
using namespace std::chrono;

void f()
{
    enum { max_count = 10000 };
    cout << '|';
    for(volatile size_t count = 0; count != max_count; ++count)
    {
        if(!(count % (max_count / 10)))
            cout << '.';
    }
    cout << "|\n";
}

int main()
{
    typedef std::ratio<1l, 1000000000000l> pico;
    typedef duration<long long, pico> picoseconds;
    typedef std::ratio<1l, 1000000l> micro;
    typedef duration<long long, micro> microseconds;
    const auto t1 = high_resolution_clock::now();
    enum { number_of_test_cycles = 10 };
    for(size_t count = 0; count != number_of_test_cycles; ++count)
    {
        f();
    }
    const auto t2 = high_resolution_clock::now();
    cout << number_of_test_cycles << " times f() took:\n"
         << duration_cast<milliseconds>(t2 - t1).count() << " milliseconds\n"
         << duration_cast<microseconds>(t2 - t1).count() << " microseconds\n"
         << duration_cast<picoseconds>(t2 - t1).count() << " picoseconds\n";
}
It produces this output:
$ ./test
|..........|
|..........|
|..........|
|..........|
|..........|
|..........|
|..........|
|..........|
|..........|
|..........|
10 times f() took:
1 milliseconds
1084 microseconds
1084000000 picoseconds
As you can see, in order to get a 1 millisecond result I had to repeat f() 10 times. Repeating your test is the general approach when your timer doesn't have enough precision. There is one problem associated with repetition: repeating your test N times does not necessarily take a proportional period of time. You need to verify that first.
Another thing: although I can do calculations using picosecond durations, my high_resolution_clock can't give me higher precision than microseconds.
To get higher precision you can use the time stamp counter; see wiki/Time_Stamp_Counter. But this is tricky and platform-specific.
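For illustration, a minimal hedged sketch of reading the TSC with the __rdtsc intrinsic (assumes an x86-64 machine; GCC/Clang expose it via <x86intrin.h>, MSVC via <intrin.h>):

#include <x86intrin.h> // __rdtsc on GCC/Clang; use <intrin.h> on MSVC
#include <cstdint>
#include <cstdio>

int main()
{
    uint64_t t0 = __rdtsc();             // read the time stamp counter
    volatile int sink = 0;
    for (int i = 0; i < 1000; ++i)       // work being measured
        sink += i;
    uint64_t t1 = __rdtsc();
    // The result is in CPU reference cycles, not wall-clock time; converting
    // to seconds requires the platform-specific TSC frequency.
    printf("elapsed: %llu cycles\n", (unsigned long long)(t1 - t0));
    return 0;
}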
A "standard" PC has a resolution of around 100 nanoseconds, so trying to measure time at resolutions greater than that is not really possible unless you have custom hardware of some kind. See How precise is the internal clock of a modern PC? for a related question and check out the second answer: https://stackoverflow.com/a/2615977/1169863.

High Resolution Timing Part of Your Code

I want to measure the speed of a function within a loop. But why does my way of doing it always print "0" instead of a high-resolution timing with 9 digits of decimal precision (i.e. in nano/microseconds)?
What's the correct way to do it?
#include <iomanip>
#include <iostream>
#include <ctime>

int main() {
    for (int i = 0; i < 100; i++) {
        std::clock_t startTime = std::clock();
        // a very fast function in the middle
        std::cout << "Time: " << std::setprecision(9)
                  << (std::clock() - startTime + 0.00) / CLOCKS_PER_SEC << std::endl;
    }
    return 0;
}
Related Questions:
How to overcome clock()'s low resolution
High Resolution Timer with C++ and linux
Equivalent of Windows’ QueryPerformanceCounter on OSX
Move your time-calculation functions outside the for () { ... } statement, then divide the total execution time by the number of operations in your testing loop.
#include <iostream>
#include <ctime>

#define NUMBER 10000 // the number of operations

// get the difference between start and end time and divide by
// the number of operations to obtain milliseconds per operation
double diffclock(clock_t clock1, clock_t clock2)
{
    double diffticks = clock1 - clock2;
    // ticks -> seconds -> milliseconds, averaged over NUMBER operations
    double diffms = diffticks * 1000.0 / (double(CLOCKS_PER_SEC) * NUMBER);
    return diffms;
}

void func(); // the very fast function under test, defined elsewhere

int main() {
    // start a timer here
    clock_t begin = clock();
    // execute your functions several times (at least 10'000)
    for (int i = 0; i < NUMBER; i++) {
        // a very fast function in the middle
        func();
    }
    // stop timer here
    clock_t end = clock();
    // display results here
    std::cout << "Execution time: " << diffclock(end, begin) << " ms." << std::endl;
    return 0;
}
Note: std::clock() lacks sufficient precision for profiling. Reference.
A few pointers:
Be careful with the optimizer: it might throw away all your code if it thinks it doesn't do anything (see the sketch after this list).
You might want to run the loop 100,000 times.
Before doing the total time calculation, store the current time in a variable.
Run your program several times.
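On the first pointer, a minimal sketch of one common countermeasure: accumulate into a volatile sink so the optimizer cannot discard the timed work (a sketch, not the asker's code):

#include <ctime>
#include <cstdio>

int main()
{
    volatile int sink = 0;               // volatile keeps the loop from being elided
    std::clock_t t0 = std::clock();
    for (int i = 0; i < 100000; ++i)
        sink += i;                       // stand-in for the very fast function
    std::clock_t t1 = std::clock();
    printf("total: %f s\n", double(t1 - t0) / CLOCKS_PER_SEC);
    return 0;
}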
If you need higher resolution, the only way to go is platform-dependent.
On Windows, check out the QueryPerformanceCounter/QueryPerformanceFrequency APIs.
On Linux, look up clock_gettime().
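A minimal Linux-only sketch of the clock_gettime() route (CLOCK_MONOTONIC; older glibc may need linking with -lrt):

#include <time.h>
#include <stdio.h>

int main()
{
    timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    volatile long sink = 0;
    for (long i = 0; i < 1000000; ++i)   // work being measured
        sink += i;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
                 + (t1.tv_nsec - t0.tv_nsec);
    printf("elapsed: %lld ns\n", ns);
    return 0;
}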
See a question I asked about the same thing: apparently clock()'s resolution is not guaranteed to be so high.
C++ obtaining milliseconds time on Linux -- clock() doesn't seem to work properly
Try the gettimeofday function, or Boost.
If you need platform independence you need to use something like ACE_High_Res_Timer (http://www.dre.vanderbilt.edu/Doxygen/5.6.8/html/ace/a00244.html)
You might want to look into using OpenMP.
#include <omp.h>

int main(int argc, char* argv[])
{
    double start = omp_get_wtime();
    // code to be checked
    double end = omp_get_wtime();
    double result = end - start;
    return 0;
}