I'm comparing the speed of a couple of Fibonacci functions. One gives an output almost immediately and reports that it finished in 500 nanoseconds, while the other, depending on the depth, may sit there computing for many seconds, yet when it is done it reports that it took only 100 nanoseconds... after I just sat there and waited some 20 seconds for it.
It's not a big deal, as I can prove the other one is slower just with raw human perception, but why would chrono not be working? Something to do with recursion?
PS: I know that fibonacci2() doesn't give the correct output on odd-numbered depths; I'm just testing some things, and the output is really only there so the compiler doesn't optimize the call away. Go ahead and copy this code and you'll see fibonacci2() output immediately, but you'll have to wait around 5 seconds for fibonacci(). Thank you.
#include <iostream>
#include <chrono>

int fibonacci2(int depth) {
    static int a = 0;
    static int b = 1;
    if (b > a) {
        a += b; //std::cout << a << '\n';
    }
    else {
        b += a; //std::cout << b << '\n';
    }
    if (depth > 1) {
        fibonacci2(depth - 1);
    }
    return a;
}

int fibonacci(int n) {
    if (n <= 1) {
        return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
}

int main() {
    int f = 0;

    auto start2 = std::chrono::steady_clock::now();
    f = fibonacci2(44);
    auto stop2 = std::chrono::steady_clock::now();
    std::cout << f << '\n';
    auto duration2 = std::chrono::duration_cast<std::chrono::nanoseconds>(stop2 - start2);
    std::cout << "faster function time: " << duration2.count() << '\n';

    auto start = std::chrono::steady_clock::now();
    f = fibonacci(44);
    auto stop = std::chrono::steady_clock::now();
    std::cout << f << '\n';
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
    std::cout << "way slower function with incorrect time: " << duration.count() << '\n';
}
I don't know which compiler and options you are using, but I tested x64 MSVC v19.28 with /O2 in godbolt. There the compiled instructions are reordered such that the performance counter is queried twice before the fibonacci(int) function is even invoked, which in code would look like
auto start = ...;
auto stop = ...;
f = fibonacci(44);
A solution to disallow this reordering might be to use an atomic_thread_fence just before and after the fibonacci function call.
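A minimal sketch of that idea, reusing the fibonacci() and timing code from the question (std::atomic_thread_fence lives in <atomic>; whether the fences actually inhibit this particular reordering is implementation-dependent, so treat this as a heuristic, not a guarantee):

#include <atomic>

auto start = std::chrono::steady_clock::now();
std::atomic_thread_fence(std::memory_order_seq_cst); // fence before the call
f = fibonacci(44);
std::atomic_thread_fence(std::memory_order_seq_cst); // fence after the call
auto stop = std::chrono::steady_clock::now();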
As Mestkon answered, the compiler can reorder your code.
Examples of how to prevent the compiler from reordering are given in Memory Ordering - Compile Time Memory Barrier.
It would be beneficial in the future if you provided information on which compiler you were using. gcc 7.5 with -O2, for example, does not reorder the timer instructions in this scenario.
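For illustration, the compile-time barrier described in that article is usually written as an empty asm statement with a "memory" clobber (GCC/Clang syntax; MSVC needs a different mechanism). A sketch applied to the question's timing code:

auto start = std::chrono::steady_clock::now();
asm volatile("" ::: "memory"); // compiler must not move memory accesses across this
f = fibonacci(44);
asm volatile("" ::: "memory");
auto stop = std::chrono::steady_clock::now();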
Related
I want to measure the time it takes to call a function.
Here is the code:
for (int i = 0; i < 5; ++i) {
    std::cout << "Pass : " << i << "\n";
    const auto t0 = std::chrono::high_resolution_clock::now();
    system1.euler_intregration(0.0166667);
    const auto t1 = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << "\n";
}
but the compiler keeps optimising the loop away, so the time is not measured and comes back as zero.
I have tried using asm("") and __asm__("") as advised here, but nothing works for me.
I must admit that I don't really know how these asm() statements work, so I might be using them in the wrong way.
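For what it's worth, the usual form of that trick on GCC/Clang is an empty asm statement that claims to read a value, so the compiler has to actually compute it. A sketch (do_not_optimize is an illustrative name, not a standard function):

inline void do_not_optimize(void const* p) {
    // empty asm that pretends to read whatever p points to,
    // so the compiler cannot discard the computation behind it
    asm volatile("" : : "g"(p) : "memory");
}

Inside the loop you would then pass the address of whatever state euler_intregration updates (for example, do_not_optimize(&system1);) so the computation has an observable consumer.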
In the following code, I have 2 nested for loops. The second one swaps the order of the for loops, and runs significantly faster.
Is this purely a cache locality issue (the first code loops over a vector many times, whereas the second code loops over the vector once), or is there something else that I'm not understanding?
#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    using namespace std::chrono;
    using std::vector;

    auto n = 1 << 12;
    vector<int> v(n);

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    for (int i = 0; i < (1 << 16); ++i)
    {
        for (const auto val : v) i & val;
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    duration<double> time_span = duration_cast<duration<double>>(t2 - t1);
    std::cout << "It took me " << time_span.count() << " seconds.";
    std::cout << std::endl;

    t1 = high_resolution_clock::now();
    for (const auto val : v)
    {
        for (int i = 0; i < (1 << 16); ++i) i & val;
    }
    t2 = high_resolution_clock::now();
    time_span = duration_cast<duration<double>>(t2 - t1);
    std::cout << "It took me " << time_span.count() << " seconds.";
    std::cout << std::endl;
}
As written, the second version needs to read each val from vector v only once. The first version needs to read each val from vector v once in the inner loop for every i, so 65536 times in total.
So without any optimisation, this makes the second loop several times faster. With optimisation turned on high enough, the compiler will figure out that all these calculations achieve nothing and throw them all away; your execution times will then go down to zero.
If you change the code to do something with the results (like adding up all the values of i & val, then printing the total), a really good compiler may figure out that both pieces of code produce the same result and use the faster method for both cases.
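A sketch of that suggested change, giving the loops an observable result:

long long total = 0;
for (int i = 0; i < (1 << 16); ++i)
{
    for (const auto val : v)
        total += i & val;
}
std::cout << "total: " << total << '\n'; // printing makes the work observable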
I'm trying to benchmark a recursive Fibonacci sequence calculator in C++. But surprisingly, the program outputs 0 nanoseconds and starts the calculation after printing the result (CPU usage increases after printing the 0 nanoseconds).
I think this is an optimization feature of the compiler.
#include <iostream>
#include <chrono>

int fib2(int n) {
    return (n < 2) ? n : fib2(n - 1) + fib2(n - 2);
}

int main(int argc, char* argv[])
{
    auto tbegin = std::chrono::high_resolution_clock::now();
    int a = fib2(50);
    auto tend = std::chrono::high_resolution_clock::now();
    std::cout << (tend - tbegin).count() << " nanoseconds" << std::endl;
    std::cout << "fib => " << a << std::endl;
}
Output:
0 nanoseconds
Is this a feature? If yes, how can I disable it?
The problem is that the result of this function called with a value of 50 doesn't fit in the int type; it's just too big. Try using int64_t instead.
Live demo
Note that I replaced the original Fibonacci function with a more optimized one, as the execution took too long and the online tool cuts off execution after some period of time. That is not a fault of the program or the code; it's just a protection in the online tool.
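For reference, an iterative 64-bit version along those lines might look like this (fib_iter is an illustrative name, not necessarily what the linked demo uses):

#include <cstdint>

int64_t fib_iter(int n) {
    // fib(50) = 12586269025, which overflows a 32-bit int
    // but fits comfortably in a 64-bit integer
    int64_t a = 0, b = 1;
    for (int i = 0; i < n; ++i) {
        int64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}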
I am writing an in-depth test program for a data structure I had to write for a class. I am trying to time how long it takes functions to execute and store the times in an array for later printing. To double-check that it was working, I decided to print immediately, and I found out it is not working.
Here is the code where I get the times and store them in an array that is in a struct.
template <class ItemType>
void test1(ArrayLinkedBag<ItemType> &bag, TestAnalytics &analytics) {
    clock_t totalStart;
    clock_t incrementalStart;
    clock_t stop; // Both timers stop at the same time

    // Start TEST 1
    totalStart = clock();
    bag.debugPrint();
    cout << "Bag Should Be Empty, Checking..." << endl;
    incrementalStart = clock();
    checkEmpty<ItemType>(bag);
    stop = clock();
    analytics.test1Times[0] = analytics.addTimes(incrementalStart, stop);
    analytics.test1Times[1] = analytics.addTimes(totalStart, stop);
    cout << analytics.test1Times[0] << setprecision(5) << "ms" << endl;
    std::cout << "Time: " << setprecision(5) << (stop - totalStart) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
    cout << "===========================================" << endl; // So I can find the line easier
}
Here is the code where I am doing the calculation that I am putting in the array; this function is located in a TestAnalytics struct:
double addTimes(double start, double stop) {
    return (stop - start) / (double)(CLOCKS_PER_SEC / 1000);
}
Here is a snippet of the output I am getting:
Current Head: -1
Current Size: 0
Cell: 1, Index: 0, Item: 6317568, Next Index: -2
Cell: 2, Index: 1, Item: 4098, Next Index: -2
Cell: 3, Index: 2, Item: 6317544, Next Index: -2
Cell: 4, Index: 3, Item: -683175280, Next Index: -2
Cell: 5, Index: 4, Item: 4201274, Next Index: -2
Cell: 6, Index: 5, Item: 6317536, Next Index: -2
Bag Should Be Empty, Checking...
The Bag Is Empty
0ms
Time: 0 ms
===========================================
I am trying to calculate the time as per a different post on this site.
I am using the clang compiler on a UNIX system. Is it possible that the number is still too small to show above 0?
Unless you're stuck with an old (pre-C++11) compiler/library, I'd use the functions from the <chrono> header:
template <class ItemType>
void test1(ArrayLinkedBag<ItemType> &bag) {
    using namespace std::chrono;

    auto start = high_resolution_clock::now();
    bag.debugPrint();
    auto first = high_resolution_clock::now();
    checkEmpty(bag);
    auto stop = high_resolution_clock::now();

    std::cout << " first time: " << duration_cast<microseconds>(first - start).count() << " us\n";
    std::cout << "second time: " << duration_cast<microseconds>(stop - start).count() << " us\n";
}
Some parts are a bit verbose (to put it nicely) but it still works reasonably well. duration_cast supports difference types down to (at least) nanoseconds, which is typically sufficient for timing even relatively small/fast pieces of code (though it's not guaranteed that it uses a timer with nanosecond precision).
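For instance, if microseconds turn out to be too coarse for something as quick as checkEmpty, the same pattern extends directly to nanoseconds:

std::cout << "checkEmpty: " << duration_cast<nanoseconds>(stop - first).count() << " ns\n";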
In addition to Jerry's good answer (which I've upvoted), I wanted to add just a little more information that might be helpful.
For timing I recommend steady_clock over high_resolution_clock because steady_clock is guaranteed to not be adjusted (especially backwards) during your timing. Now on Visual Studio and clang, this can't possibly happen because high_resolution_clock and steady_clock are exactly the same type. However if you're using gcc, high_resolution_clock is the same type as system_clock, which is subject to being adjusted at any time (say by an NTP correction).
But if you use steady_clock, then on every platform you have a stop-watch-like timer: Not good for telling you the time of day, but not subject to being corrected at an inopportune moment.
Also, if you use my free, open-source, header-only <chrono> extension library, it can stream out durations in a much more friendly manner, without having to use duration_cast nor .count(). It will print out the duration units right along with the value.
Finally, if you call steady_clock::now() multiple times in a row (with nothing in between), and print out that difference, then you can get a feel for how precisely your implementation is able to time things. Can it time something as short as femtoseconds? Probably not. Is it as coarse as milliseconds? We hope not.
Putting this all together, the following program was compiled like this:
clang++ test.cpp -std=c++14 -O3 -I../date/include
The program:
#include "date/date.h"
#include <iostream>
int
main()
{
using namespace std::chrono;
using date::operator<<;
for (int i = 0; i < 100; ++i)
{
auto t0 = steady_clock::now();
auto t1 = steady_clock::now();
auto t2 = steady_clock::now();
auto t3 = steady_clock::now();
auto t4 = steady_clock::now();
auto t5 = steady_clock::now();
auto t6 = steady_clock::now();
std::cout << t1-t0 << '\n';
std::cout << t2-t1 << '\n';
std::cout << t3-t2 << '\n';
std::cout << t4-t3 << '\n';
std::cout << t5-t4 << '\n';
std::cout << t6-t5 << '\n';
}
}
And output for me on macOS:
150ns
80ns
69ns
53ns
63ns
64ns
88ns
54ns
66ns
66ns
59ns
56ns
59ns
69ns
76ns
74ns
73ns
73ns
64ns
60ns
58ns
...
This question already has answers here:
What are the uses of std::chrono::high_resolution_clock?
(2 answers)
Closed 6 years ago.
So I was trying to use std::chrono::high_resolution_clock to time how long something takes to execute. I figured that you can just find the difference between the start time and end time...
To check my approach works, I made the following program:
#include <iostream>
#include <chrono>
#include <vector>

void long_function();

int main()
{
    std::chrono::high_resolution_clock timer;
    auto start_time = timer.now();
    long_function();
    auto end_time = timer.now();
    auto diff_millis = std::chrono::duration_cast<std::chrono::duration<int, std::milli>>(end_time - start_time);
    std::cout << "It took " << diff_millis.count() << "ms" << std::endl;
    return 0;
}

void long_function()
{
    // Should take a while to execute.
    // This is calculating the first 100 million
    // fib numbers and storing them in a vector.
    // Well, it doesn't actually, because it
    // overflows very quickly, but the point is it
    // should take a few seconds to execute.
    std::vector<unsigned long> numbers;
    numbers.push_back(1);
    numbers.push_back(1);
    for (int i = 2; i < 100000000; i++)
    {
        numbers.push_back(numbers[i-2] + numbers[i-1]);
    }
}
The problem is, it just outputs 3000ms exactly, when it clearly wasn't actually that.
On shorter problems, it just outputs 0ms... What am I doing wrong?
EDIT: If it's of any use, I'm using the GNU GCC compiler with the -std=c++0x flag on.
The resolution of the high_resolution_clock depends on the platform.
Printing the following will give you an idea of the resolution of the implementation you use
std::cout << "It took " << std::chrono::nanoseconds(end_time - start_time).count() << std::endl;
I had a similar problem with g++ (rev5, Built by MinGW-W64 project) 4.8.1 under Windows 7.
int main()
{
    auto start_time = std::chrono::high_resolution_clock::now();
    int temp(1);
    const int n(1e7);
    for (int i = 0; i < n; i++)
        temp += temp;
    auto end_time = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(end_time - start_time).count() << " ns.";
    return 0;
}
If n = 1e7 it displays 19999800 ns, but if n = 1e6 it displays 0 ns.
The precision seems weak.