Program execution time counter - C++

What is the most accurate way to calculate elapsed time in C++? I used clock() to calculate this, but I have a feeling it is wrong, as I get 0 ms 90% of the time and 15 ms the rest of the time, which makes little sense to me.
Even if the elapsed time really is small and very close to 0 ms, is there a more accurate method that will give me the exact value rather than a rounded-down 0 ms?
clock_t tic = clock();
/*
main programme body
*/
clock_t toc = clock();
double time = (double)(toc-tic);
cout << "\nTime taken: " << (1000*(time/CLOCKS_PER_SEC)) << " (ms)";
Thanks

With C++11, I'd use
#include <chrono>
auto t0 = std::chrono::high_resolution_clock::now();
...
auto t1 = std::chrono::high_resolution_clock::now();
auto dt = 1.e-9*std::chrono::duration_cast<std::chrono::nanoseconds>(t1-t0).count();
for the elapsed time in seconds.
For pre-2011 C++, you can use QueryPerformanceCounter() on Windows or gettimeofday() on Linux/OS X. For example (this is actually C, not C++):
#include <sys/time.h>  // gettimeofday, struct timeval
timeval oldCount, newCount;
gettimeofday(&oldCount, NULL);
...
gettimeofday(&newCount, NULL);
double t = double(newCount.tv_sec  - oldCount.tv_sec )
         + double(newCount.tv_usec - oldCount.tv_usec) * 1.e-6;
for the elapsed time in seconds.

std::chrono::high_resolution_clock is as portable a solution as you can get; however, it may not actually have higher resolution than what you have already seen.
Pretty much any function that returns system time is going to jump forward whenever the system time is updated by the timer interrupt handler, and 10 ms is a typical interval for that on modern OSes.
For higher-precision timing, you need to access either a CPU cycle counter or the High Precision Event Timer (HPET). Standard-library vendors ought to use these for high_resolution_clock, but not all do, so you may need OS-specific APIs.
(Note: Visual C++ in particular has implemented high_resolution_clock with the low-resolution system clock; Visual Studio 2015 switched it to QueryPerformanceCounter. There are likely other implementations with the same problem.)
On Win32, for example, the QueryPerformanceFrequency() and QueryPerformanceCounter() functions are a good choice. For a wrapper that conforms to the C++11 clock interface and uses these functions, see Mateusz's answer to "Difference between std::system_clock and std::steady_clock?".
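A rough sketch of what such a wrapper can look like (Windows only; the name qpc_clock and the conversion details are my own, the linked answer gives a fuller version):
#include <windows.h>
#include <chrono>
// Minimal clock modeled on the C++11 clock interface, backed by
// QueryPerformanceCounter. Sketch only, not production-hardened.
struct qpc_clock
{
    using rep        = long long;
    using period     = std::nano;
    using duration   = std::chrono::nanoseconds;
    using time_point = std::chrono::time_point<qpc_clock>;
    static constexpr bool is_steady = true;

    static time_point now()
    {
        static const long long freq = [] {
            LARGE_INTEGER f;
            QueryPerformanceFrequency(&f);   // counts per second
            return f.QuadPart;
        }();
        LARGE_INTEGER count;
        QueryPerformanceCounter(&count);
        // Split into whole seconds and remainder to avoid overflow
        // when converting counts to nanoseconds.
        const long long secs = count.QuadPart / freq;
        const long long rem  = count.QuadPart % freq;
        return time_point(duration(secs * 1000000000LL
                                   + rem * 1000000000LL / freq));
    }
};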

If you have C++11 available, use the chrono library.
Different platforms also provide access to high-precision clocks: on Linux, use clock_gettime; on Windows, use the high-performance counter API (QueryPerformanceCounter). A sketch of the clock_gettime route follows the C++11 example below.
Example:
C++11:
#include <chrono>
#include <iostream>
using namespace std::chrono;
auto start = high_resolution_clock::now();
... // do stuff
auto diff = duration_cast<milliseconds>(high_resolution_clock::now() - start);
std::clog << diff.count() << " ms elapsed" << std::endl;
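For the clock_gettime route on Linux mentioned above, a minimal sketch (pre-C++11 style, using the monotonic clock):
#include <time.h>
#include <stdio.h>
struct timespec t0, t1;
clock_gettime(CLOCK_MONOTONIC, &t0);
... // do stuff
clock_gettime(CLOCK_MONOTONIC, &t1);
double ms = (t1.tv_sec - t0.tv_sec) * 1e3
          + (t1.tv_nsec - t0.tv_nsec) * 1e-6;
printf("%f ms elapsed\n", ms);
(On older glibc you may need to link with -lrt.)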

Related

Time-stamping using std::chrono - How to 'filter' data based on relative time?

I want to time-tag a stream of data I produce, for which I want to use std::chrono::steady_clock.
These time-stamps are stored with the data (as an array of uint64 values?), and I will later need to process these time-stamps again.
Now, I haven't been using the std::chrono library at all so far, so I do need a bit of help on the syntax and best practices with this library.
I can get & store values using:
uint64_t timestamp = std::chrono::steady_clock::now().time_since_epoch().count();
but how do I best:
On reading the data, create a timepoint from the uint64?
Get the ticks-per-second (uint64) value for the steady_clock?
Find a "cut-off" timepoint (as uint64) that lies a certain time (in seconds) prior to a given timepoint?
Code snippets for the above would be appreciated.
I want to combine the three above essentially to do the following: Having an array of (increasing) time-stamp values (as uint64), I want to truncate it such that all data 'older' than last-time-stamp minus X seconds is thrown away.
Let's have a look at the features you might use in the cppreference documentation for chrono.
First off, you need to decide which clock you want to use. There is the steady_clock which you suggested, the high_resolution_clock and the system_clock.
high_resolution_clock is implementation dependent, so let's set it aside unless we really need it. The steady_clock is guaranteed to be monotonic, but there is no guarantee about the meaning of the value you get from it. It's ideal for ordering events or measuring intervals between them, but you can't get a calendar time out of it.
On the other hand, system_clock does have a meaning: its epoch is the UNIX epoch, so you can get a calendar time out of it, but it is not guaranteed to be monotonic.
To get the period (duration of one tick) of a steady_clock, you have the period member:
auto period = std::chrono::steady_clock::period();
std::cout << "Clock period " << period.num << " / " << period.den << " seconds" << std::endl;
std::cout << "Clock period " << static_cast<double>(period.num) / period.den << " seconds" << std::endl;
Assuming you want to filter events that happened in the last few seconds using steady_clock values, you first need to compute the number of ticks in the time period you want and subtract it from now. Something along the lines of:
std::chrono::system_clock::time_point now = std::chrono::system_clock::now();
std::time_t t_c = std::chrono::system_clock::to_time_t(now - std::chrono::seconds(10));
And use t_c as cutoff point.
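If you prefer to stay with raw steady_clock ticks, as in the question, a sketch of the same idea (here timestamps is assumed to be a sorted std::vector<uint64_t> of values produced by steady_clock::now().time_since_epoch().count(); needs <algorithm>, <chrono>, <cstdint>, <vector>):
using namespace std::chrono;
// Number of steady_clock ticks in the look-back window (here 10 seconds).
const uint64_t windowTicks =
    duration_cast<steady_clock::duration>(seconds(10)).count();
const uint64_t nowTicks =
    steady_clock::now().time_since_epoch().count();
const uint64_t cutoffTicks = nowTicks - windowTicks;
// Drop every stored time-stamp older than the cut-off.
timestamps.erase(timestamps.begin(),
                 std::lower_bound(timestamps.begin(), timestamps.end(), cutoffTicks));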
However, do not rely on std::chrono::steady_clock::now().time_since_epoch().count() to mean anything on its own; it is just a number. The epoch of steady_clock is usually the boot time. If you need a calendar time, you should use system_clock (keeping in mind that it is not monotonic).
C++20 introduces some more clocks, which are convertible to calendar time.
As it took me far too long to figure this out from various sources today, I'm going to post my solution here as a self-answer. (I would appreciate comments on it, in case something is incorrect or could be done better.)
Getting a clock's period in seconds and ticks-per-second value
using namespace std::chrono;
auto period = system_clock::period();
double period_s = (double) period.num / period.den;  // seconds per tick
uint64_t tps = period.den / period.num;              // ticks per second
Getting a clock's timepoint (now) as uint64 value for time-stamping a data stream
using namespace std::chrono;
system_clock::time_point tp_now = system_clock::now();
uint64_t nowAsTicks = tp_now.time_since_epoch().count();
Getting a clock's timepoint given a stored uint64 value
using namespace std::chrono;
uint64_t givenTicks = 12345; // whatever the stored value was
system_clock::time_point tp_recreated = system_clock::time_point{} + system_clock::duration(givenTicks);
uint64_t recreatedTicks = tp_recreated.time_since_epoch().count();
assert( givenTicks == recreatedTicks ); // has to be true now
The last one (uint64 to timepoint) was troubling me the most. The key insights needed were:
(On Windows 10) system_clock uses a tick resolution of 100 nanoseconds. Therefore one cannot directly add std::chrono::nanoseconds to its native time points (std::chrono::system_clock::time_point).
However, because the ticks are hundreds of nanoseconds, one also cannot use the next coarser duration unit (microseconds), since a tick cannot be represented as a whole number of microseconds.
One could use an explicit duration_cast to microseconds, but that would lose the 0.1 µs resolution of the tick.
The proper way is to use the system_clock's own duration and directly initialize it with the stored tick value.
In my search I found the following resources most helpful:
Howard Hinnant's lecture on YouTube - extremely helpful. I wish I had started here.
cppreference.com on time_point and duration and time_since_epoch
cplusplus.com on steady clock and time_point
A nice place to look, as usual, is the reference manual:
https://en.cppreference.com/w/cpp/chrono
In this case you are looking for:
https://en.cppreference.com/w/cpp/chrono/clock_time_conversion
since what you are really using is a clock with the "epoch" of 1/1/1970 as its origin and milliseconds as its unit.
Then just use arithmetic on durations to do the cut-off you want:
https://en.cppreference.com/w/cpp/chrono/duration
There are code examples at bottom of each linked page.

Correct way of portably timing code using C++11

I'm in the midst of writing some timing code for a part of a program that has a low latency requirement.
Looking at what's available in the std::chrono library, I'm finding it a bit difficult to write timing code that is portable.
std::chrono::high_resolution_clock
std::chrono::steady_clock
std::chrono::system_clock
The system_clock is useless as it's not steady; the remaining two clocks are problematic.
The high_resolution_clock isn't necessarily steady on all platforms.
The steady_clock does not necessarily support fine-grained resolution time periods (e.g. nanoseconds).
For my purposes, having a steady clock is the most important requirement, and I can sort of get by with microsecond granularity.
My question is: if one wanted to time code that could be running on different hardware architectures and OSes, what would be the best option?
Use steady_clock. On all implementations its precision is nanoseconds. You can check this yourself for your platform by printing out steady_clock::period::num and steady_clock::period::den.
Now that doesn't mean that it will actually measure nanosecond precision. But platforms do their best. For me, two consecutive calls to steady_clock (with optimizations enabled) will report times on the order of 100ns apart.
#include "chrono_io.h"
#include <chrono>
#include <iostream>
int
main()
{
using namespace std::chrono;
using namespace date;
auto t0 = steady_clock::now();
auto t1 = steady_clock::now();
auto t2 = steady_clock::now();
auto t3 = steady_clock::now();
std::cout << t1-t0 << '\n';
std::cout << t2-t1 << '\n';
std::cout << t3-t2 << '\n';
}
The above example uses a free, open-source, header-only library ("chrono_io.h" from Howard Hinnant's date library) only for the convenience of formatting the duration. You can format things yourself (I'm lazy). For me this just output:
287ns
116ns
75ns
YMMV.

Measuring execution time - gettimeofday versus clock() versus chrono

I have a subroutine that should be executed once every millisecond. I wanted to check that this is indeed what's happening, but I get different execution times from different functions. I've been trying to understand the differences between these functions (there are several SO questions about the subject), but I cannot get my head around the results I got. Please ignore the global variables etc.; this is legacy code, written in C and ported to C++, which I'm trying to improve, so it is messy.
< header stuff>
std::chrono::high_resolution_clock::time_point tchrono;
int64_t tgettime;
float tclock;
void myfunction(){
<all kinds of calculations>
using ms = std::chrono::duration<double, std::milli>;
std::chrono::high_resolution_clock::time_point tmpchrono = std::chrono::high_resolution_clock::now();
printf("chrono %f (ms): \n",std::chrono::duration_cast<ms>(tmpchrono-tchrono).count());
tchrono = tmpchrono;
struct timeval tv;
gettimeofday (&tv, NULL);
int64_t tmpgettime = (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
printf("gettimeofday: %lld\n",tmpgettime-tgettime);
tgettime = tmpgettime;
float tmpclock = 1000.0f*((float)clock())/CLOCKS_PER_SEC;
printf("clock %f (ms)\n",tmpclock-tclock);
tclock = tmpclock;
<more stuff>
}
and the output is:
chrono 0.998352 (ms):
gettimeofday: 999
clock 0.544922 (ms)
Why the difference? I'd expect clock to be at least as large as the others, or not?
std::chrono::high_resolution_clock::now() is not even working.
std::chrono::milliseconds represents the milliseconds as integers. When you convert to that representation, time representations of higher granularity are truncated to whole milliseconds. Then you assign it to a duration that has a double representation and seconds-ratio. Then you pass the duration object - instead of a double - to printf. All of those steps are wrong.
To get the milliseconds as a floating point, do this:
using ms = std::chrono::duration<double, std::milli>;
std::chrono::duration_cast<ms>(tmpchrono-tchrono).count();
clock() returns the processor time the process has used. That will depend on how much time the OS scheduler has given to your process. Unless the process is the only one on the system, this will be different from the passed wall clock time.
gettimeofday() returns the wall clock time.
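A quick way to see the difference is to sleep: sleeping consumes wall-clock time but almost no CPU time, so clock() and a wall-clock source diverge (a sketch; it uses steady_clock for the wall-clock side):
#include <chrono>
#include <cstdio>
#include <ctime>
#include <thread>
int main()
{
    std::clock_t c0 = std::clock();
    auto w0 = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(200)); // idle, no CPU work
    double cpu_ms  = 1000.0 * (std::clock() - c0) / CLOCKS_PER_SEC;
    double wall_ms = std::chrono::duration<double, std::milli>(
                         std::chrono::steady_clock::now() - w0).count();
    std::printf("CPU time: %.3f ms, wall time: %.3f ms\n", cpu_ms, wall_ms);
}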
What's the difference between using high_resolution_clock::now() and gettimeofday()?
Both measure the wall clock time. The internal representation of both is implementation defined. The granularity of both is implementation defined as well.
gettimeofday is part of the POSIX standard and is therefore available in all operating systems that comply with that standard (POSIX.1-2001). gettimeofday is not monotonic, i.e. it's affected by things like setting the time (by ntpd or by an administrator) and changes in daylight saving time.
high_resolution_clock represents the clock with the smallest tick period provided by the implementation. It may be an alias of std::chrono::system_clock or std::chrono::steady_clock, or a third, independent clock.
high_resolution_clock is part of the C++ standard library and is therefore available with all compilers that comply with that standard (C++11). high_resolution_clock may or may not be monotonic; this can be tested with high_resolution_clock::is_steady.
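A quick check of what your implementation actually provides (a sketch):
#include <chrono>
#include <iostream>
int main()
{
    using clk = std::chrono::high_resolution_clock;
    std::cout << "steady: " << clk::is_steady << '\n'
              << "period: " << clk::period::num << '/' << clk::period::den
              << " seconds per tick\n";
}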
The simplest way to use std::chrono to measure execution time is this:
using namespace std::chrono;
auto start = high_resolution_clock::now();
/*
 * multiple iterations of the code you want to benchmark -
 * make sure the optimizer doesn't eliminate the whole code
 */
auto end = high_resolution_clock::now();
std::cout << "Execution time (us): " << duration_cast<microseconds>(end - start).count() << std::endl;

Execute a function periodically in 10 milliseconds in C++ [duplicate]

Given a while loop and the function ordering as follows:
int k=0;
int total=100;
while(k<total){
doSomething();
if(approx. t milliseconds elapsed) { measure(); }
++k;
}
I want to perform 'measure' every t milliseconds. However, since 'doSomething' can finish close to the t-th millisecond after the last execution, it is acceptable to perform the measurement after approximately t milliseconds have elapsed since the last measure.
My question is: how could this be achieved?
One solution would be to set a timer to zero and check it after every 'doSomething'. When it is within the acceptable range, I perform the measurement and reset the timer. However, I'm not sure which C++ function I should use for such a task. As far as I can see, there are several candidate functions, but the debate on which one is the most appropriate is outside of my understanding. Note that some of the functions actually take into account the time taken by other processes, but I want my timer to measure only the execution time of my C++ code (I hope that is clear). Another issue is the resolution of the measurements, as pointed out below. Suppose the medium option of those suggested.
High-resolution timing is platform specific, and you have not specified a platform in the question. The standard library clock() function returns a count that increments at CLOCKS_PER_SEC counts per second. On some platforms this may be fast enough to give you the resolution you need, but you should check your system's tick rate, since it is implementation defined. If you find it is high enough, then:
#define SAMPLE_PERIOD_MS 100
#define SAMPLE_PERIOD_TICKS ((CLOCKS_PER_SEC * SAMPLE_PERIOD_MS) / 1000)
int k=0;
int total=100;
clock_t measure_time = clock() + SAMPLE_PERIOD_TICKS ;
while(k<total)
{
doSomething();
if( clock() - measure_time > 0 )
{
measure();
measure_time += SAMPLE_PERIOD_TICKS ;
++k;
}
}
You might replace clock() with some other high-resolution clock source if necessary.
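For instance, a C++11 sketch of the same loop using std::chrono::steady_clock (doSomething() and measure() are the functions from the question):
#include <chrono>
int k = 0;
int total = 100;
const auto sample_period = std::chrono::milliseconds(100);
auto next_measure = std::chrono::steady_clock::now() + sample_period;
while (k < total)
{
    doSomething();
    if (std::chrono::steady_clock::now() >= next_measure)
    {
        measure();
        next_measure += sample_period;   // advance by a full period to avoid drift
        ++k;
    }
}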
However, note a couple of issues. This method is a "busy loop": unless either doSomething() or measure() yields the CPU, the process will take all the CPU cycles it can. If this is the only code running on a target, that may not matter. On the other hand, if this is running on a general-purpose OS such as Windows or Linux, which are not real-time, the process may be pre-empted by other processes, and this may affect the accuracy of the sampling periodicity. If you need accurate timing, using an RTOS and performing doSomething() and measure() in separate threads would be better. Even on a GPOS that would be better. For example, a general pattern (using a made-up API in the absence of any specification) would be:
int main()
{
StartThread( measure_thread, HIGH_PRIORITY ) ;
for(;;)
{
doSomething() ;
}
}
void measure_thread()
{
for(;;)
{
measure() ;
sleep( SAMPLE_PERIOD_MS ) ;
}
}
The code for measure_thread() is only accurate if measure() takes a negligible time to run. If it takes significant time, you may need to account for that. If it is non-deterministic, you may even have to measure its execution time in order to subtract it from the sleep period.
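One portable way to account for that with C++11 is to sleep until an absolute deadline rather than for a fixed interval, so the time spent in measure() does not accumulate as drift (a sketch; measure() and SAMPLE_PERIOD_MS are as above):
#include <chrono>
#include <thread>
void measure_thread()
{
    const auto period = std::chrono::milliseconds(SAMPLE_PERIOD_MS);
    auto next = std::chrono::steady_clock::now() + period;
    for (;;)
    {
        measure();
        std::this_thread::sleep_until(next);  // absolute deadline: measure()'s
        next += period;                       // duration does not shift the period
    }
}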

Calculate Clocks Per Sec

Am I doing it correctly? At times, my program will print 2000+ for the chrono solution, and it always prints 1000 for CLOCKS_PER_SEC.
What is that value I'm actually calculating? Is it Clocks Per Sec?
#include <iostream>
#include <chrono>
#include <thread>
#include <ctime>
std::chrono::time_point<std::chrono::high_resolution_clock> SystemTime()
{
return std::chrono::high_resolution_clock::now();
}
std::uint32_t TimeDuration(std::chrono::time_point<std::chrono::high_resolution_clock> Time)
{
return std::chrono::duration_cast<std::chrono::nanoseconds>(SystemTime() - Time).count();
}
int main()
{
auto Begin = std::chrono::high_resolution_clock::now();
std::this_thread::sleep_for(std::chrono::milliseconds(1));
std::cout<< (TimeDuration(Begin) / 1000.0)<<std::endl;
std::cout<<CLOCKS_PER_SEC;
return 0;
}
In order to get the correct ticks per second on Linux, you need to use the return value of ::sysconf(_SC_CLK_TCK) (declared in the header unistd.h), rather than the macro CLOCKS_PER_SEC.
The latter is a constant defined in the POSIX standard – it is unrelated to the actual ticks per second of your CPU clock. For example, see the man page for clock:
C89, C99, POSIX.1-2001. POSIX requires that CLOCKS_PER_SEC equals 1000000 independent of the actual resolution.
However, note that even when using the correct ticks-per-second constant, you still won't get the number of actual CPU cycles per second. "Clock tick" is a special unit used by the CPU clock. There is no standardized definition of how it relates to actual CPU cycles.
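For example, to print both values on Linux (a sketch):
#include <unistd.h>   // ::sysconf, _SC_CLK_TCK
#include <ctime>      // CLOCKS_PER_SEC
#include <iostream>
int main()
{
    std::cout << "CLOCKS_PER_SEC:       " << CLOCKS_PER_SEC << '\n'           // 1000000, per POSIX
              << "sysconf(_SC_CLK_TCK): " << ::sysconf(_SC_CLK_TCK) << '\n';  // kernel tick rate, often 100
}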
In Boost there is a timer class that uses CLOCKS_PER_SEC to calculate the maximum time the timer can measure. Its documentation says that on Windows CLOCKS_PER_SEC is 1000, while on Mac OS X and Linux it is 1000000, so on the latter OSes the accuracy is higher.
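That maximum comes from clock_t overflowing; a sketch of the same calculation (assumes clock_t is an arithmetic type, which the standard guarantees):
#include <ctime>
#include <iostream>
#include <limits>
int main()
{
    // Longest interval clock() can represent before clock_t wraps around.
    double max_seconds =
        static_cast<double>(std::numeric_limits<std::clock_t>::max()) / CLOCKS_PER_SEC;
    std::cout << "clock() can measure at most ~" << max_seconds << " seconds\n";
}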