Measuring execution time - gettimeofday versus clock() versus chrono - c++

I have a subroutine that should be executed once every millisecond, and I wanted to check that this is indeed what happens. But I get different execution times from different functions. I've been trying to understand the differences between these functions (there are several SO questions on the subject), but I cannot get my head around the results I got. Please ignore the global variables etc. This is legacy code, written in C and ported to C++, which I'm trying to improve, so it is messy.
< header stuff>
std::chrono::high_resolution_clock::time_point tchrono;
int64_t tgettime;
float tclock;
void myfunction(){
<all kinds of calculations>
using ms = std::chrono::duration<double, std::milli>;
std::chrono::high_resolution_clock::time_point tmpchrono = std::chrono::high_resolution_clock::now();
printf("chrono %f (ms): \n",std::chrono::duration_cast<ms>(tmpchrono-tchrono).count());
tchrono = tmpchrono;
struct timeval tv;
gettimeofday (&tv, NULL);
int64_t tmpgettime = (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
printf("gettimeofday: %lld\n",(long long)(tmpgettime-tgettime)); // cast so the argument matches %lld
tgettime = tmpgettime;
float tmpclock = 1000.0f*((float)clock())/CLOCKS_PER_SEC;
printf("clock %f (ms)\n",tmpclock-tclock);
tclock = tmpclock;
<more stuff>
}
and the output is:
chrono 0.998352 (ms):
gettimeofday: 999
clock 0.544922 (ms)
Why the difference? I'd expect clock to be at least as large as the others, or not?

std::chrono::high_resolution_clock::now() is not even working.
std::chrono::milliseconds represents the milliseconds as integers. When you convert to that representation, time representations of higher granularity are truncated to whole milliseconds. Then you assign it to a duration that has a double representation and seconds-ratio. Then you pass the duration object - instead of a double - to printf. All of those steps are wrong.
To get the milliseconds as a floating point, do this:
using ms = std::chrono::duration<double, std::milli>;
std::chrono::duration_cast<ms>(tmpchrono-tchrono).count();
clock() returns the processor time the process has used. That will depend on how much time the OS scheduler has given to your process. Unless the process is the only one on the system, this will be different from the passed wall clock time.
gettimeofday() returns the wall clock time.
What's the difference between using high_resolution_clock::now() and gettimeofday() ?
Both measure the wall clock time. The internal representation of both is implementation defined. The granularity of both is implementation defined as well.
gettimeofday is part of the POSIX standard and therefore available on all operating systems that comply with that standard (POSIX.1-2001). gettimeofday is not monotonic, i.e. it is affected by adjustments to the system time (by ntpd or by an administrator). It counts from the UTC epoch, so daylight saving changes affect only the conversion to local time, not the returned value.
high_resolution_clock represents the clock with the smallest tick period provided by the implementation. It may be an alias of std::chrono::system_clock or std::chrono::steady_clock, or a third, independent clock.
high_resolution_clock is part of the C++ standard library and therefore available in all compilers that comply with that standard (C++11). high_resolution_clock may or may not be monotonic. This can be tested with high_resolution_clock::is_steady.

The simplest way to use std::chrono to measure execution time is this:
using namespace std::chrono;
auto start = high_resolution_clock::now();
/*
* multiple iterations of the code you want to benchmark -
* make sure the optimizer doesn't eliminate the whole code
*/
auto end = high_resolution_clock::now();
std::cout << "Execution time (us): " << duration_cast<microseconds>(end - start).count() << std::endl;

Related

Time-stamping using std::chrono - How to 'filter' data based on relative time?

I want to time-tag a stream of data I produce, for which I want to use std::chrono::steady_clock.
These time-stamps are stored with the data ( as array of uint64 values?), and I will later need to process these time-stamps again.
Now, I haven't been using the std::chrono library at all so far, so I do need a bit of help on the syntax and best practices with this library.
I can get & store values using:
uint64_t timestamp = std::chrono::steady_clock::now().time_since_epoch().count();
but how do I best:
On reading the data create a timepoint from the uint64 ?
Get the ticks-per-second (uint64) value for the steady_clock?
Find a "cut-off" timepoint (as uint64) that lies a certain time (in seconds) prior a given timepoint?
Code snippets for the above would be appreciated.
I want to combine the three above essentially to do the following: Having an array of (increasing) time-stamp values (as uint64), I want to truncate it such that all data 'older' than last-time-stamp minus X seconds is thrown away.
Let's have a look at the features you might use in the cppreference documentation for chrono.
First off, you need to decide which clock you want to use. There is the steady_clock which you suggested, the high_resolution_clock and the system_clock.
high_resolution_clock is implementation dependent, so let's put it aside unless we really need it. The steady_clock is guaranteed to be monotonic, but there is no guarantee about the meaning of the value you get. It's ideal for ordering events or measuring the intervals between them, but you can't get a calendar time out of it.
On the other hand, system_clock has a meaning: its epoch is in practice the UNIX epoch, so you can get a calendar time out of it, but it is not guaranteed to be monotonic.
To get the period (duration of one tick) of a steady_clock, you have the period member:
auto period = std::chrono::steady_clock::period();
std::cout << "Clock period " << period.num << " / " << period.den << " seconds" << std::endl;
std::cout << "Clock period " << static_cast<double>(period.num) / period.den << " seconds" << std::endl;
Assuming you want to filter events that happened in the last few seconds, you first need to compute the cutoff by subtracting the time window from now. With system_clock, something along the lines of:
std::chrono::system_clock::time_point now = std::chrono::system_clock::now();
std::time_t t_c = std::chrono::system_clock::to_time_t(now - std::chrono::seconds(10));
And use t_c as cutoff point.
However, do not rely on std::chrono::steady_clock::now().time_since_epoch().count(); to get something meaningful; it is just a number. The epoch for the steady_clock is usually the boot time. If you need a calendar time, you should use system_clock (keeping in mind that it is not monotonic).
C++20 introduces some more clocks, which are convertible to time.
As it took me far too long to figure it out from various sources today, I'm going to post my solution here as self-answer. ( I would appreciate comments on it, in case something is not correct or could be done better.)
Getting a clock's period in seconds and ticks-per-second value
using namespace std::chrono;
auto period = system_clock::period();
double period_s = (double) period.num / period.den;
std::uint64_t tps = period.den / period.num; // std::uint64_t comes from <cstdint>
Getting a clock's timepoint (now) as uint64 value for time-stamping a data stream
using namespace std::chrono;
system_clock::time_point tp_now = system_clock::now();
std::uint64_t nowAsTicks = tp_now.time_since_epoch().count(); // std::uint64_t comes from <cstdint>
Getting a clock's timepoint given a stored uint64 value
using namespace std::chrono;
std::uint64_t givenTicks = 12345; // Whatever the value was
system_clock::time_point tp_recreated = system_clock::time_point{} + system_clock::duration(givenTicks);
std::uint64_t recreatedTicks = tp_recreated.time_since_epoch().count(); // read back from the recreated time point
assert( givenTicks == recreatedTicks ); // has to be true now (assert comes from <cassert>)
The last ( uint64 to timepoint ) was troubling me the most. The key-insights needed were:
(On Win10) The system_clock uses a time resolution of 100 nanoseconds. Therefore one cannot directly add std::chrono::nanoseconds to its native time points (std::chrono::system_clock::time_point) and get the same time point type back.
However, because the ticks are hundreds of nanoseconds, one also cannot store them in the next coarser duration unit (microseconds), as a tick cannot be represented as an integer number of microseconds.
One could use an explicit cast to microseconds, but that would lose the 0.1 us resolution of the tick.
The proper way is to use the system_clock's own duration and directly initialize it with the stored tick value.
In my search I found the following resources most helpful:
Lecture of Howard Hinnant on YouTube - extremely helpful. I wish I would have started here.
cppreference.com on time_point and duration and time_since_epoch
cplusplus.com on steady clock and time_point
A nice place to look as usual is the reference manual :
https://en.cppreference.com/w/cpp/chrono
In this case you are looking for :
https://en.cppreference.com/w/cpp/chrono/clock_time_conversion
Since really you are using a clock with "epoch" 1/1/70 as origin and ms as unit.
Then just use arithmetic on durations to do the cutoff things you want :
https://en.cppreference.com/w/cpp/chrono/duration
There are code examples at bottom of each linked page.

Correct way of portably timing code using C++11

I'm in the midst of writing some timing code for a part of a program that has a low latency requirement.
Looking at what's available in the std::chrono library, I'm finding it a bit difficult to write timing code that is portable.
std::chrono::high_resolution_clock
std::chrono::steady_clock
std::chrono::system_clock
The system_clock is useless as it's not steady, the remaining two clocks are problematic.
The high_resolution_clock isn't necessarily stable on all platforms.
The steady_clock does not necessarily support fine-grained time periods (e.g. nanoseconds).
For my purposes having a steady clock is the most important requirement and I can sort of get by with microsecond granularity.
My question is if one wanted to time code that could be running on different h/w architectures and OSes - what would be the best option?
Use steady_clock. On all implementations its precision is nanoseconds. You can check this yourself for your platform by printing out steady_clock::period::num and steady_clock::period::den.
Now that doesn't mean that it will actually measure nanosecond precision. But platforms do their best. For me, two consecutive calls to steady_clock (with optimizations enabled) will report times on the order of 100ns apart.
#include "chrono_io.h"
#include <chrono>
#include <iostream>
int
main()
{
using namespace std::chrono;
using namespace date;
auto t0 = steady_clock::now();
auto t1 = steady_clock::now();
auto t2 = steady_clock::now();
auto t3 = steady_clock::now();
std::cout << t1-t0 << '\n';
std::cout << t2-t1 << '\n';
std::cout << t3-t2 << '\n';
}
The above example uses this free, open-source, header-only library only for convenience of formatting the duration. You can format things yourself (I'm lazy). For me this just output:
287ns
116ns
75ns
YMMV.

Persisting std::chrono time_point instances

What is the correct way to persist std::chrono time_point instances and then read them back into another instance of the same type?
typedef std::chrono::time_point<std::chrono::high_resolution_clock> time_point_t;
time_point_t tp = std::chrono::high_resolution_clock::now();
serializer.write(tp);
.
.
.
time_point_t another_tp;
serializer.read(another_tp);
The calls to write/read, assume that the instance of type time_point_t, can be somehow converted to a byte representation, which can then be written to or read from a disk or a socket etc.
A possible solution suggested by Alf is as follows:
std::chrono::high_resolution_clock::time_point t0 = std::chrono::high_resolution_clock::now();
//Generate POD to write to disk
unsigned long long ns0 = t0.time_since_epoch().count();
//Read POD from disk and attempt to instantiate time_point
std::chrono::high_resolution_clock::duration d(ns0);
std::chrono::high_resolution_clock::time_point t1(d);
unsigned long long ns1 = t1.time_since_epoch().count();
if ((t0 != t1) || (ns0 != ns1))
{
std::cout << "Error time points don't match!\n";
}
Note: The above code has a bug as the final instantiated time point does not match the original.
In the case of the old-style time_t, one typically just writes the entire entity to disk based on its sizeof and then reads it back the same way. In short, what would be the equivalent for the new std::chrono types?
Reading from a disk or socket implies that you might be reading in an instance of the application that did not do the write. And in this case, serializing the duration alone is not sufficient.
A time_point is a duration amount of time since an unspecified epoch. The epoch could be anything. On my computer the epoch of std::chrono::high_resolution_clock is whenever the computer booted. I.e. this clock reports the number of nanoseconds since boot.
If one application writes the time_since_epoch().count(), the computer is rebooted, and then another (or even the same) application reads it back in, the read in value has no meaning whatsoever, unless you happen to somehow know the amount of time between boots.
To reliably serialize a time_point one has to arrange for the writer and the reader to agree upon some epoch, and then ensure that the time_point written and read is with respect to that epoch. For example one might arrange to use the POSIX epoch: New Years 1970 UTC.
As it turns out, every std::chrono::system_clock implementation I'm aware of uses Unix time, a close approximation of UTC measured from New Years 1970. However I know of no common epoch for std::chrono::high_resolution_clock.
Only if you can somehow ensure that the reader and writer clocks agree upon a common epoch, can you serialize a time_point as a duration.
the time_point constructor takes a duration, and you can get a duration from member time_since_epoch. thus the question reduces to serialize a duration value. and duration has a constructor that takes a number of ticks, and a member function count that produces the number of ticks.
all this just by googling std::chrono::time_point and looking at the cppreference documentation google landed me on.
it's often a good idea to read the documentation.
Addendum: an example.
#include <chrono>
#include <iostream>
#include <typeinfo>
using namespace std;
auto main() -> int
{
using Clock = chrono::high_resolution_clock;
using Time_point = Clock::time_point;
using Duration = Clock::duration;
Time_point const t0 = Clock::now();
//Generate POD to write to disk
Duration::rep const ns0 = t0.time_since_epoch().count();
//Read POD from disk and attempt to instantiate time_point
Duration const d(ns0);
Time_point const t1(d);
cout << "Basic number type is " << typeid( ns0 ).name() << "." << endl;
if( t0 != t1 )
{
cout << "Error time points don't match!" << endl;
}
else
{
cout << "Reconstituted time is OK." << endl;
}
}
With Visual C++ 12.0 the reported basic type is __int64, i.e. long long, while with g++ 4.8.2 in Windows the reported type is x, which presumably means the same.
With both compilers the reconstituted time is identical to the original.
Addendum: As noted by Dina in the comments, as of C++14 the C++ standard doesn't specify the epoch, and so to make this work across machines or with different clocks it's necessary to add additional steps that normalize the epoch for the serialized data, e.g. and most naturally to POSIX time, i.e. time since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970.

Programme execution time counter

What is the most accurate way to calculate the elapsed time in C++? I used clock() to calculate this, but I have a feeling this is wrong as I get 0 ms 90% of the time and 15 ms the rest of it which makes little sense to me.
Even if it is really small and very close to 0 ms, is there a more accurate method that will give me the exact value rather than a rounded-down 0 ms?
clock_t tic = clock();
/*
main programme body
*/
clock_t toc = clock();
double time = (double)(toc-tic);
cout << "\nTime taken: " << (1000*(time/CLOCKS_PER_SEC)) << " (ms)";
Thanks
With C++11, I'd use
#include <chrono>
auto t0 = std::chrono::high_resolution_clock::now();
...
auto t1 = std::chrono::high_resolution_clock::now();
auto dt = 1.e-9*std::chrono::duration_cast<std::chrono::nanoseconds>(t1-t0).count();
for the elapsed time in seconds.
For pre 2011 C++, you can use QueryPerformanceCounter() on windows or gettimeofday() with linux/OSX. For example (this is actually C, not C++):
timeval oldCount,newCount;
gettimeofday(&oldCount, NULL);
...
gettimeofday(&newCount, NULL);
double t = double(newCount.tv_sec -oldCount.tv_sec )
+ double(newCount.tv_usec-oldCount.tv_usec) * 1.e-6;
for the elapsed time in seconds.
std::chrono::high_resolution_clock is as portable a solution as you can get, however it may not actually be higher resolution than what you already saw.
Pretty much any function which returns system time is going to jump forward whenever the system time is updated by the timer interrupt handler, and 10ms is a typical interval for that on modern OSes.
For better precision timing, you need to access either a CPU cycle counter or high precision event timer (HPET). Compiler library vendors ought to use these for high_resolution_clock, but not all do. So you may need OS-specific APIs.
(Note: specifically Visual C++ high_resolution_clock uses the low resolution system clock. But there are likely others.)
On Win32, for example, the QueryPerformanceFrequency() and QueryPerformanceCounter() functions are a good choice. For a wrapper that conforms to the C++11 timer interface and uses these functions, see
Mateusz's answer to "Difference between std::system_clock and std::steady_clock?"
If you have C++11 available, use the chrono library.
Also, different platforms provide access to high precision clocks.
For example, in linux, use clock_gettime. In Windows, use the high performance counter api.
Example:
C++11:
auto start=high_resolution_clock::now();
... // do stuff
auto diff=duration_cast<milliseconds>(high_resolution_clock::now()-start);
clog << diff.count() << "ms elapsed" << endl;

Estimating time left in C++11

I'm writing a progress bar class that outputs an updated progress bar every n ticks to an std::ostream:
class progress_bar
{
public:
progress_bar(uint64_t ticks)
: _total_ticks(ticks), _ticks_occurred(0),
_begin(std::chrono::steady_clock::now())
...
void tick()
{
// test to see if enough progress has elapsed
// to warrant updating the progress bar
// that way we aren't wasting resources printing
// something that hasn't changed
if (/* should we update */)
{
...
}
}
private:
std::uint64_t _total_ticks;
std::uint64_t _ticks_occurred;
std::chrono::steady_clock::time_point _begin;
...
}
I would like to also output the time remaining. I found a formula on another question that states time remaining is (variable names changed to fit my class):
time_left = (time_taken / _total_ticks) * (_total_ticks - _ticks_occurred)
The parts I would like to fill in for my class are the time_left and the time_taken, using C++11's new <chrono> header.
I know I need to use a std::chrono::steady_clock, but I'm not sure how to integrate it into code. I assume the best way to measure the time would be a std::uint64_t as nanoseconds.
My questions are:
Is there a function in <chrono> that will convert the nanoseconds into an std::string, say something like "3m12s"?
Should I use the std::chrono::steady_clock::now() each time I update my progress bar, and subtract that from _begin to determine time_left?
Is there a better algorithm to determine time_left
Is there a function in <chrono> that will convert the nanoseconds into
an std::string, say something like "3m12s"?
No. But I'll show you how you can easily do this below.
Should I use the std::chrono::steady_clock::now() each time I update
my progress bar, and subtract that from _begin to determine time_left?
Yes.
Is there a better algorithm to determine time_left
Yes. See below.
Edit
I had originally misinterpreted "ticks" as "clock ticks", when in actuality "ticks" has units of work and _ticks_occurred/_total_ticks can be interpreted as %job_done. So I've changed the proposed progress_bar below accordingly.
I believe the equation:
time_left = (time_taken / _total_ticks) * (_total_ticks - _ticks_occurred)
is incorrect. It doesn't pass a sanity check: If _ticks_occurred == 1 and _total_ticks is large, then time_left approximately equals (ok, slightly less than) time_taken. That doesn't make sense.
I am rewriting the above equation to be:
time_left = time_taken * (1/percent_done - 1)
where
percent_done = _ticks_occurred/_total_ticks
Now as percent_done approaches zero, time_left approaches infinity, and when percent_done approaches 1, time_left approaches 0. When percent_done is 10%, time_left is 9*time_taken. This meets my expectations, assuming a roughly linear time cost per work-tick.
class progress_bar
{
public:
progress_bar(uint64_t ticks)
: _total_ticks(ticks), _ticks_occurred(0),
_begin(std::chrono::steady_clock::now())
// ...
{}
void tick()
{
using namespace std::chrono;
// test to see if enough progress has elapsed
// to warrant updating the progress bar
// that way we aren't wasting resources printing
// something that hasn't changed
if (/* should we update */)
{
// somehow _ticks_occurred is updated here and is not zero
duration time_taken = Clock::now() - _begin;
float percent_done = (float)_ticks_occurred/_total_ticks;
duration time_left = time_taken * static_cast<rep>(1/percent_done - 1);
minutes minutes_left = duration_cast<minutes>(time_left);
seconds seconds_left = duration_cast<seconds>(time_left - minutes_left);
}
}
private:
typedef std::chrono::steady_clock Clock;
typedef Clock::time_point time_point;
typedef Clock::duration duration;
typedef Clock::rep rep;
std::uint64_t _total_ticks;
std::uint64_t _ticks_occurred;
time_point _begin;
//...
};
Traffic in std::chrono::durations whenever you can. That way <chrono> does all the conversions for you. typedefs can ease the typing with the long names. And breaking down the time into minutes and seconds is as easy as shown above.
As bames53 notes in his answer, if you want to use my <chrono_io> facility, that's cool too. Your needs may be simple enough that you don't want to. It is a judgement call. bames53's answer is a good answer. I thought these extra details might be helpful too.
Edit
I accidentally left a bug in the code above. And instead of just patching the code above, I thought it would be a good idea to point out the bug and show how to use <chrono> to fix it.
The bug is here:
duration time_left = time_taken * static_cast<rep>(1/percent_done - 1);
and here:
typedef Clock::duration duration;
In practice steady_clock::duration is usually based on an integral type. <chrono> calls this the rep (short for representation). And when percent_done is greater than 50%, the factor being multiplied by time_taken is going to be less than 1. And when rep is integral, that gets cast to 0. So this progress_bar only behaves well during the first 50% and predicts 0 time left during the last 50%.
The key to fixing this is to traffic in durations that are based on floating point instead of integers. And <chrono> makes this very easy to do.
typedef std::chrono::steady_clock Clock;
typedef Clock::time_point time_point;
typedef Clock::period period;
typedef std::chrono::duration<float, period> duration;
duration now has the same tick period as steady_clock::duration but uses a float for the representation. And now the computation for time_left can leave off the static_cast:
duration time_left = time_taken * (1/percent_done - 1);
Here's the whole package again with these fixes:
class progress_bar
{
public:
progress_bar(uint64_t ticks)
: _total_ticks(ticks), _ticks_occurred(0),
_begin(std::chrono::steady_clock::now())
// ...
{}
void tick()
{
using namespace std::chrono;
// test to see if enough progress has elapsed
// to warrant updating the progress bar
// that way we aren't wasting resources printing
// something that hasn't changed
if (/* should we update */)
{
// somehow _ticks_occurred is updated here and is not zero
duration time_taken = Clock::now() - _begin;
float percent_done = (float)_ticks_occurred/_total_ticks;
duration time_left = time_taken * (1/percent_done - 1);
minutes minutes_left = duration_cast<minutes>(time_left);
seconds seconds_left = duration_cast<seconds>(time_left - minutes_left);
std::cout << minutes_left.count() << "m " << seconds_left.count() << "s\n";
}
}
private:
typedef std::chrono::steady_clock Clock;
typedef Clock::time_point time_point;
typedef Clock::period period;
typedef std::chrono::duration<float, period> duration;
std::uint64_t _total_ticks;
std::uint64_t _ticks_occurred;
time_point _begin;
//...
};
Nothing like a little testing... ;-)
The chrono library includes types for representing durations. You shouldn't convert them to a flat integer of some 'known' unit. When you want a known unit just use the chrono types, e.g. 'std::chrono::nanoseconds', and duration_cast. Or create your own duration type using a floating point representation and one of the SI ratios, e.g. std::chrono::duration<double,std::nano>. Without duration_cast or a floating-point duration, conversions that would lose precision are rejected at compile time.
The IO facilities for chrono didn't make it into C++11, but you can get source from here. Using this you can just ignore the duration type, and it will print the right units. I don't think there's anything there that will show the time in minutes, seconds, etc., but such a thing shouldn't be too hard to write.
I don't know that there's too much reason to be concerned about calling steady_clock::now() frequently, if that's what you're asking. I'd expect most platforms to have a pretty fast timer for just that sort of thing. It does depend on the implementation though. If it does cause an issue for you, you could call steady_clock::now() only inside the if (/* should we update */) block, which should put a reasonable limit on the call frequency.
Obviously there are other ways to estimate the time remaining. For example instead of taking the average over the progress so far (which is what the formula you show does), you could take the average from the last N ticks. Or do both and take a weighted average of the two estimates.