What is the difference between std::system_clock and std::steady_clock? (An example case that illustrates different results/behaviours would be great.)
If my goal is to precisely measure execution time of functions (like a benchmark), what would be the best choice between std::system_clock, std::steady_clock and std::high_resolution_clock?
From N3376:
20.11.7.1 [time.clock.system]/1:
Objects of class system_clock represent wall clock time from the system-wide realtime clock.
20.11.7.2 [time.clock.steady]/1:
Objects of class steady_clock represent clocks for which values of time_point never decrease as physical time advances and for which values of time_point advance at a steady rate relative to real time. That is, the clock may not be adjusted.
20.11.7.3 [time.clock.hires]/1:
Objects of class high_resolution_clock represent clocks with the shortest tick period. high_resolution_clock may be a synonym for system_clock or steady_clock.
For instance, the system-wide clock might be affected by something like daylight saving time, at which point the actual time listed at some point in the future can actually be a time in the past. (E.g. in the US, in the fall, time moves back one hour, so the same hour is experienced "twice".) However, steady_clock is not allowed to be affected by such things.
Another way of thinking about "steady" in this case is in the requirements defined in the table of 20.11.3 [time.clock.req]/2:
In Table 59 C1 and C2 denote clock types. t1 and t2 are values returned by C1::now() where the call returning t1 happens before the call returning t2 and both of these calls occur before C1::time_point::max(). [ Note: this means C1 did not wrap around between t1 and t2. —end note ]
Expression: C1::is_steady
Returns: const bool
Operational Semantics: true if t1 <= t2 is always true and the time between clock ticks is constant, otherwise false.
That's all the standard has on their differences.
If you want to do benchmarking, your best bet is probably going to be std::high_resolution_clock, because it is likely that your platform uses a high resolution timer (e.g. QueryPerformanceCounter on Windows) for this clock. However, if you're benchmarking, you should really consider using platform specific timers for your benchmark, because different platforms handle this differently. For instance, some platforms might give you some means of determining the actual number of clock ticks the program required (independent of other processes running on the same CPU). Better yet, get your hands on a real profiler and use that.
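As a minimal, portable sketch of that first option (before reaching for platform-specific timers or a profiler), timing a function with <chrono> can look like this; benchmark_me() is just a placeholder for whatever you want to measure:
#include <chrono>
#include <iostream>

void benchmark_me();  // placeholder for the code under test

int main()
{
    auto start = std::chrono::high_resolution_clock::now();
    benchmark_me();
    auto end = std::chrono::high_resolution_clock::now();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::cout << "took " << ns.count() << " ns\n";
}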
Billy provided a great answer based on the ISO C++ standard that I fully agree with. However, there is another side of the story: real life. It seems that right now there is really no difference between those clocks in the implementations of popular compilers:
gcc 4.8:
#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC
...
#else
typedef system_clock steady_clock;
#endif
typedef system_clock high_resolution_clock;
Visual Studio 2012:
class steady_clock : public system_clock
{   // wraps monotonic clock
public:
    static const bool is_monotonic = true;  // retained
    static const bool is_steady = true;
};
typedef system_clock high_resolution_clock;
In the case of gcc you can check whether you are dealing with a steady clock simply by checking is_steady and behave accordingly. However, VS2012 seems to cheat a bit here :-)
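For a quick look at what your own standard library reports, you can print the is_steady flags; this is only a small sketch, and the output will of course differ between implementations:
#include <chrono>
#include <iostream>

int main()
{
    std::cout << std::boolalpha
              << "system_clock::is_steady          = " << std::chrono::system_clock::is_steady << '\n'
              << "steady_clock::is_steady          = " << std::chrono::steady_clock::is_steady << '\n'
              << "high_resolution_clock::is_steady = " << std::chrono::high_resolution_clock::is_steady << '\n';
}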
If you need a high-precision clock, I recommend for now writing your own clock that conforms to the official C++11 clock interface and waiting for implementations to catch up. It will be a much better approach than using an OS-specific API directly in your code.
For Windows you can do it like this:
#include <chrono>
#include <stdexcept>
#include <string>
#include <windows.h>

// Self-made Windows QueryPerformanceCounter based C++11 API compatible clock
struct qpc_clock {
    typedef std::chrono::nanoseconds duration;      // nanoseconds resolution
    typedef duration::rep rep;
    typedef duration::period period;
    typedef std::chrono::time_point<qpc_clock, duration> time_point;
    static bool is_steady;                           // = true

    static time_point now()
    {
        if(!is_inited) {
            init();
            is_inited = true;
        }
        LARGE_INTEGER counter;
        QueryPerformanceCounter(&counter);
        return time_point(duration(static_cast<rep>((double)counter.QuadPart / frequency.QuadPart *
                                                    period::den / period::num)));
    }

private:
    static bool is_inited;                           // = false
    static LARGE_INTEGER frequency;

    static void init()
    {
        if(QueryPerformanceFrequency(&frequency) == 0)
            throw std::logic_error("QueryPerformanceCounter not supported: " + std::to_string(GetLastError()));
    }
};

// Note: is_steady, is_inited and frequency still need out-of-class definitions
// in exactly one translation unit.
For Linux it is even easier. Just read the man page of clock_gettime and modify the code above.
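As a rough illustration only (a sketch assuming clock_gettime with CLOCK_MONOTONIC is available; the name monotonic_clock is just an example), the Linux variant could look like this:
#include <chrono>
#include <ctime>

struct monotonic_clock {
    typedef std::chrono::nanoseconds duration;
    typedef duration::rep rep;
    typedef duration::period period;
    typedef std::chrono::time_point<monotonic_clock, duration> time_point;
    static const bool is_steady = true;

    static time_point now()
    {
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return time_point(std::chrono::seconds(ts.tv_sec) +
                          std::chrono::nanoseconds(ts.tv_nsec));
    }
};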
GCC 5.3.0 implementation
The C++ stdlib is inside the GCC source:
- high_resolution_clock is an alias for system_clock
- system_clock forwards to the first of the following that is available:
  - clock_gettime(CLOCK_REALTIME, ...)
  - gettimeofday
  - time
- steady_clock forwards to the first of the following that is available:
  - clock_gettime(CLOCK_MONOTONIC, ...)
  - system_clock
Then CLOCK_REALTIME vs CLOCK_MONOTONIC is explained at: Difference between CLOCK_REALTIME and CLOCK_MONOTONIC?
Maybe the most significant difference is that the starting point of std::chrono::system_clock is 1 January 1970, the so-called UNIX epoch.
On the other hand, std::chrono::steady_clock typically starts at the boot time of your PC, and it is most suitable for measuring intervals.
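A short sketch that makes this visible (the exact numbers are implementation- and platform-dependent) prints how far each clock's now() lies from its own epoch:
#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono;
    auto sys_since_epoch    = system_clock::now().time_since_epoch();
    auto steady_since_epoch = steady_clock::now().time_since_epoch();

    std::cout << duration_cast<hours>(sys_since_epoch).count()
              << " hours since the system_clock epoch (the UNIX epoch on most implementations)\n"
              << duration_cast<hours>(steady_since_epoch).count()
              << " hours since the steady_clock epoch (often the last boot)\n";
}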
Relevant talk about chrono by Howard Hinnant, author of chrono:
don't use high_resolution_clock, as it's an alias for one of these:
system_clock: it's like a regular clock, use it for time/date related stuff
steady_clock: it's like a stopwatch, use it for timing things.
Related
I need to find the time taken to execute a piece of code, and the method should be independent of system time, i.e. chrono and all wouldn't work.
My use case looks somewhat like this.
int main()
{
    //start
    function();
    //end
    time_take = end - start;
}
I am working on an embedded platform that doesn't have the right time at start-up. In my case, the start of function() happens before the actual time is set from the NTP server and the end happens after the exact time is obtained. So any method that compares the time difference between two points wouldn't work. Also, counting CPU ticks wouldn't work for me, since my program won't necessarily be running actively throughout.
I tried the conventional methods and they didn't work for me.
On Linux, clock_gettime() has an option to return the current CLOCK_MONOTONIC time, which is unaffected by system time changes. Measuring CLOCK_MONOTONIC at the beginning and the end, and then doing your own math to subtract the two values, will measure the elapsed time while ignoring any system time changes.
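For illustration, a minimal sketch of that manual approach, assuming a POSIX system with clock_gettime, could look like this:
#include <ctime>
#include <cstdio>

int main()
{
    timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    // ... the code being measured ...
    clock_gettime(CLOCK_MONOTONIC, &end);

    long long elapsed_ns = (end.tv_sec - start.tv_sec) * 1000000000LL
                         + (end.tv_nsec - start.tv_nsec);
    std::printf("elapsed: %lld ns\n", elapsed_ns);
}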
If you don't want to dip down to C-level abstractions, <chrono> has this covered for you with steady_clock:
int main()
{
    //start
    auto t0 = std::chrono::steady_clock::now();
    function();
    auto t1 = std::chrono::steady_clock::now();
    //end
    auto time_take = t1 - t0;
}
steady_clock is generally a wrapper around clock_gettime used with CLOCK_MONOTONIC, except that it is portable across all platforms. I.e. some platforms don't have clock_gettime, but do have an API for getting a monotonic clock time.
Above, the type of time_take will be steady_clock::duration. On all platforms I'm aware of, this type is an alias for nanoseconds. If you want an integral count of nanoseconds you can:
using namespace std::literals;
int64_t i = time_take/1ns;
The above works on all platforms, even if steady_clock::duration is not nanoseconds.
The minor advantage of <chrono> over a C-level API is that you don't have to deal with computing timespec subtraction manually. And of course it is portable.
In my projects I am using some time_point<steady_clock> variables in order to do operations at a specific interval. I want to serialize/deserialize those values in a file.
But it seems that the time_since_epoch from a steady_clock is not reliable, whereas time_since_epoch from a system_clock is quite OK: it always calculates the time from 1970/1/1 (Unix time).
What's the best solution for me? It seems that I have to convert somehow from steady_clock to system_clock but I don't think this is achievable.
P.S. I already read the topic here: Persisting std::chrono time_point instances
On the cppreference page for std::chrono::steady_clock, it says:
This clock is not related to wall clock time (for example, it can be time since last reboot), and is most suitable for measuring intervals.
The page for std::chrono::system_clock says that most implementations use UTC as an epoch:
The epoch of system_clock is unspecified, but most implementations use Unix Time (i.e., time since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970, not counting leap seconds).
If you're trying to compare times across machines or hoping to correlate the recorded times to real-world events (e.g. at 3pm today there was an issue), then you'll want to switch your code over to using the system clock. Any time you reboot, the steady clock will reset, and it doesn't relate to wall time at all.
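As a rough sketch of what persisting a system_clock timestamp could look like (the helper names and the millisecond granularity are only illustrative choices, not part of any library):
#include <chrono>
#include <cstdint>

using sys_tp = std::chrono::system_clock::time_point;

// Store the timestamp as an integral count since the system_clock epoch...
std::int64_t serialize(sys_tp tp)
{
    using namespace std::chrono;
    return duration_cast<milliseconds>(tp.time_since_epoch()).count();
}

// ...and rebuild the time_point from that count when reading the file back.
sys_tp deserialize(std::int64_t ms)
{
    using namespace std::chrono;
    return sys_tp(milliseconds(ms));
}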
Edit: if you wanted to do an approximate conversion between steady and system timestamps you could do something like this:
#include <chrono>

template <typename To, typename FromTimePoint>
typename To::time_point approximate_conversion(const FromTimePoint& from) {
    const auto to_now   = To::now().time_since_epoch();
    const auto from_now = FromTimePoint::clock::now().time_since_epoch();

    // compute an approximate offset between the clocks and apply that to the input timestamp
    const auto approx_offset = to_now - from_now;

    // duration_cast is needed because the two clocks may use different duration types
    return typename To::time_point{
        std::chrono::duration_cast<typename To::duration>(from.time_since_epoch() + approx_offset)};
}

int main() {
    auto steady = std::chrono::steady_clock::now();
    auto system = approximate_conversion<std::chrono::system_clock>(steady);
}
This assumes the clocks don't drift apart very quickly, and that there are no large discontinuities in either clock (both of which are false assumptions over long periods of time).
Taking minimal steps to measure how a given piece of code performs (a fast one), isn't this the smallest unit, the most fine-grained measurement?
#include <windows.h>  // ULONGLONG
#include <intrin.h>   // __rdtsc
#include <iostream>

#pragma intrinsic(__rdtsc)

void work();          // the code being measured

int main(void)
{
    ULONGLONG t1, t2;
    t1 = __rdtsc();
    work();
    t2 = __rdtsc();
    std::cout << t2 - t1 << std::endl;
}
The man page, found at http://linux.die.net/man/3/clock_gettime, gives all the details.
You want to be calling the clock_gettime() function.
To get just the time for your process, use:
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, struct timespec *);
or, for the current thread, use:
clock_gettime(CLOCK_THREAD_CPUTIME_ID, struct timespec *);
It returns 0 for success, or -1 for failure (in which case errno is set appropriately).
The struct timespec is defined as:
struct timespec
{
    time_t tv_sec;  /* seconds */
    long   tv_nsec; /* nanoseconds */
};
All of the above is defined in the header file time.h.
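A short sketch of the call described above, assuming a POSIX system:
#include <ctime>
#include <cstdio>

int main()
{
    timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) == -1) {
        std::perror("clock_gettime");
        return 1;
    }
    std::printf("process CPU time: %ld s %ld ns\n",
                static_cast<long>(ts.tv_sec), ts.tv_nsec);
}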
It depends on what you want to measure. rdtsc does not necessarily give you any information about elapsed wall-clock time; that depends on the concrete x86 implementation. It gives you the number of "ticks" elapsed, with differing definitions of "tick": it might be the more or less constant maximum clock frequency, or the actually used frequency.
To make rdtsc usable for performance measurements of small code fragments, you also have to make sure that the OS does not preempt your thread or move it to another core that might have a different TSC value. Use CPU binding and CPU shielding for your performance measurement thread. Also consider the difference between cold and warm performance testing, and choose wisely between the two depending on your use case.
In a team I worked in a few years ago, we used it that way and it gave us good, rather stable and reproducible results, since we took care of all these other issues as well: warm tests with CPU binding and CPU shielding.
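A rough sketch of the CPU-binding part on Linux, assuming sched_setaffinity is available (the CPU shielding itself is configured outside the program, e.g. with cset):
#include <sched.h>   // sched_setaffinity; needs _GNU_SOURCE (defined by default with g++ on Linux)

// Pin the calling thread to one CPU so it always reads the same core's TSC.
void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);  // pid 0 = the calling thread
}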
I have been experimenting with all kinds of timers on Linux and OSX, and would like to try and wrap some of them with the same interface used by std::chrono.
That's easy to do for timers that have a well-defined "period" at compile time, e.g. the POSIX clock_gettime() family, the clock_get_time() family on OSX, or gettimeofday().
However, there are some useful timers for which the "period" - while constant - is only known at runtime.
For example:
- POSIX states the period of clock(), CLOCKS_PER_SEC, may be a variable on non-XSI systems
- on Linux, the period of times() is given at runtime by sysconf(_SC_CLK_TCK)
- on OSX, the period of mach_absolute_time() is given at runtime by mach_timebase_info()
- on recent Intel processors, the TSC register ticks at a constant rate, but of course that rate can only be determined at runtime
To wrap these timers in the std::chrono interface, one possibility would be to use a period of std::chrono::nanoseconds and convert the value of each timer to nanoseconds. Another approach could be to use a floating-point representation. However, both approaches would introduce a (very small) overhead to the now() function, and a (probably small) loss in precision.
The solution I'm trying to pursue is to define a set of classes to represent such "run-time constant" periods, built along the same lines as the std::ratio class.
However I expect that will require rewriting all the related template classes and functions (as they assume constexpr values).
How do I wrap these kinds of timers a la std::chrono?
Or use non-constexpr values for the time period of a clock?
Does anyone have any experience with wrapping these kinds of timers a la std::chrono?
Actually I do. And on OSX, one of your platforms of interest. :-)
You mention:
on OSX, the period of mach_absolute_time() is given at runtime by mach_timebase_info()
Absolutely correct. Also on OSX, the libc++ implementation of high_resolution_clock and steady_clock is actually based on mach_absolute_time. I'm the author of this code, which is open source with a generous license (do anything you want with it as long as you retain the copyright).
Here is the source for libc++'s steady_clock::now(). It is built pretty much the way you surmised. The run time period is converted to nanoseconds prior to returning. On OS X the conversion factor is very often 1, and the code takes advantage of that fact with an optimization. However the code is general enough to handle non-1 conversion factors.
On the first call to now() there's a small cost of querying the run time conversion factor to nanoseconds. In the general case a floating point conversion factor is computed. In the common case (conversion factor == 1) the subsequent cost is calling through a function pointer. I've found that the overhead is really quite reasonable.
On OS X the conversion factor, although not determined until run time, is still a constant (i.e. does not vary as the program executes), so it only needs to be computed once.
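A simplified sketch of that approach (not the actual libc++ source), assuming macOS's mach_absolute_time / mach_timebase_info, might look like this:
#include <chrono>
#include <cstdint>
#include <mach/mach_time.h>

struct mach_steady_clock {
    using duration   = std::chrono::nanoseconds;
    using rep        = duration::rep;
    using period     = duration::period;
    using time_point = std::chrono::time_point<mach_steady_clock, duration>;
    static const bool is_steady = true;

    static time_point now()
    {
        // The numer/denom conversion factor is queried once, at run time.
        static const mach_timebase_info_data_t tb = [] {
            mach_timebase_info_data_t info;
            mach_timebase_info(&info);
            return info;
        }();
        const std::uint64_t ticks = mach_absolute_time();
        return time_point(duration(ticks * tb.numer / tb.denom));
    }
};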
If you're in a situation where your period is actually varying dynamically, you'll need more infrastructure to handle this. Essentially you would need to integrate (calculus) the period vs time curve and then compute an average period between two points in time. That would require a constant monitoring of the period as it changes with time, and <chrono> isn't the right tool for that. Such tools are typically handled at the OS level.
[Does anyone have any experience] with using non-constexpr values for the time period of a clock?
After reading through the standard (20.11.5, Class template duration), "period" is expected to be "a specialization of ratio":
Remarks: If Period is not a specialization of ratio, the program is ill-formed.
and all chrono templates rely heavily on constexpr functionality.
Does anyone have any experience with wrapping these kinds of timers a la std::chrono?
I've found here a suggestion to use a duration with period = 1 and boost::rational as rep, though without any concrete examples.
I have done a similar thing for my purposes, only for Linux though. You can find the code here; feel free to use it in whatever way you want.
The challenges my implementation addresses overlap partially with the ones mentioned in your question. Specifically:
The tick factor (required to convert from clock ticks to a time unit based on seconds) is retrieved at run time, but only the first time now() is used‡. If you are concerned about the small overhead this causes, you may call the now() function once at start-up before you measure any actual intervals. The tick factor is stored in a static variable, which means there is still some overhead as – on the lowest level – each call of the now() function implies checking whether the static variable has been initialized. However, this overhead will be the same in each call of now(), so it shouldn't impact measuring time intervals.
I do not convert to nanoseconds by default, because when measuring relatively long periods of time (e.g. a few seconds) this causes overflows very quickly. This is in fact the main reason why I don't use the boost implementation. Instead of converting to nanoseconds, I implement the base unit as a template parameter (called Precision in the code). I use std::ratio from C++11 as template arguments. So I can choose, for example, a clock<micro>, which implies that calling the now() function will internally convert to microseconds rather than nanoseconds, which means I can measure periods of many seconds or minutes without overflows and still with good precision. (This is independent of the unit used to produce output. You can have a clock<micro> and display the result in seconds, etc.)
My clock type, which is called combined_clock, combines user time, system time and wall-clock time. There is a boost clock type for this, too, but it's not compatible with the ratio types and units from std, whereas mine is.
‡The tick factor is retrieved using the ::sysconf() call you suggest, and that is guaranteed to return one and the same value throughout the life time of the process.
So the way you use it is as follows:
#include "util/proctime.hpp"
#include <ratio>
#include <chrono>
#include <thread>
#include <utility>
#include <iostream>
int main()
{
using std::chrono::duration_cast;
using millisec = std::chrono::milliseconds;
using clock_type = rlxutil::combined_clock<std::micro>;
auto tp1 = clock_type::now();
/* Perform some random calculations. */
unsigned long step1 = 1;
unsigned long step2 = 1;
for (int i = 0 ; i < 50000000 ; ++i) {
unsigned long step3 = step1 + step2;
std::swap(step1,step2);
std::swap(step2,step3);
}
/* Sleep for a while (this adds to real time, but not CPU time). */
std::this_thread::sleep_for(millisec(1000));
auto tp2 = clock_type::now();
std::cout << "Elapsed time: "
<< duration_cast<millisec>(tp2 - tp1)
<< std::endl;
return 0;
}
The usage above involves a pretty-print function that generates output like this:
Elapsed time: [user 40, system 0, real 1070 millisec]
I'm interested in measuring a specific point in time down to the nanosecond using C++ on Windows. Is this possible? If it isn't, is it possible to get the specific time in microseconds at least? Any library should do, unless I suppose it's possible with managed code.
If you have a threaded application running on a multicore computer, QueryPerformanceCounter can (and will) return different values depending on which core the code is executing on. See this MSDN article. (rdtsc has the same problem.)
This is not just a theoretical problem; we ran into it with our application and had to conclude that the only reliable time source is timeGetTime, which only has ms precision (which fortunately was sufficient in our case). We also tried fixing the thread affinity for our threads to guarantee that each thread always got a consistent value from QueryPerformanceCounter; this worked, but it absolutely killed the performance in the application.
To sum things up, there isn't a reliable timer on Windows that can be used to time things with microsecond precision (at least not when running on a multicore computer).
Windows has a high-performance counter API.
You need to get the ticks from QueryPerformanceCounter and divide by the counter frequency, provided by QueryPerformanceFrequency.
#include <windows.h>

LARGE_INTEGER frequency;
if (::QueryPerformanceFrequency(&frequency) == FALSE)
    throw "foo";

LARGE_INTEGER start;
if (::QueryPerformanceCounter(&start) == FALSE)
    throw "foo";

// Calculation.

LARGE_INTEGER end;
if (::QueryPerformanceCounter(&end) == FALSE)
    throw "foo";

double interval = static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;
This interval should be in seconds.
For future reference: with Windows Vista, 2008 and higher, Windows requires hardware support for "HPET" (the High Precision Event Timer). This operates independently of the CPU and its clock and frequency. It is possible to obtain times with sub-microsecond accuracy.
In order to implement this, you DO need to use QPC/QPF. The problem is that QPF (frequency) is a NOMINAL value, so using the raw calls will cause time drifts that can exceed minutes per day. In order to account for this, you have to measure the actual frequency and check for its drift over time, as heat and other physical operating conditions will affect it.
An article that describes this can be found on MSDN (circa 2004!) at this link.
http://msdn.microsoft.com/en-us/magazine/cc163996.aspx
I did implement something similar to this myself (and just found the above link today!) but prefer not to use "microsecond time" because the QPC call itself is rather lengthy compared to other Windows calls such as GetSystemTimeAsFileTime, and synchronization adds more overhead. So I prefer to use millisecond timestamps (approx 70% less call time than using QPC) especially when I'm trying to get the time hundreds of thousands of times per second.
The best choices are the functions QueryPerformanceCounter and QueryPerformanceFrequency.
Microsoft has just recently (2014) released more detailed information about QueryPerformanceCounter:
See Acquiring high-resolution time stamps (MSDN 2014) for the details.
This is a comprehensive article with lots of examples and detailed description. A must read for users of QPC.
I think microseconds is a bit unreasonable (without hardware assistance). Milliseconds is doable, but even then not that accurate due to various nefarious counter resolution issues. Regardless, I include my own timer class (based on std::chrono) for your consideration:
#include <type_traits>
#include <chrono>

class Stopwatch final
{
public:
    using elapsed_resolution = std::chrono::milliseconds;

    Stopwatch()
    {
        Reset();
    }

    void Reset()
    {
        reset_time = clock.now();
    }

    elapsed_resolution Elapsed()
    {
        return std::chrono::duration_cast<elapsed_resolution>(clock.now() - reset_time);
    }

private:
    std::chrono::high_resolution_clock clock;
    std::chrono::high_resolution_clock::time_point reset_time;
};
Note that under the hood on Windows std::chrono::high_resolution_clock is using QueryPerformanceCounter, so it's just the same but portable.
MSDN claims that -
A Scenario object is a highly-accurate timer that logs ETW events
(Event Tracing for Windows) when you start and stop it. It's designed
to be used for performance instrumentation and benchmarking, and comes
in both C# and C++ versions. ... As a rule of thumb on modern
hardware, a call to Begin() or End() takes on the order of a
microsecond, and the resulting timestamps are accurate to 100ns (i.e.
0.1 microseconds). ... Versions are available for both .NET 3.5 (written in C#), and native C++, and run on both x86 and x64
platforms. The Scenario class was originally developed using Visual
Studio 2008, but is now targeted at developers using Visual Studio
2010.
From the Scenario Home Page. As far as I know, it was provided by the same people as PPL.
Additionally, you can read this: High Resolution Clocks and Timers for Performance Measurement in Windows.
In newer Windows versions you probably want GetSystemTimePreciseAsFileTime. See Acquiring high resolution timestamps.
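A minimal sketch of using it, assuming Windows 8 / Server 2012 or later where the function exists:
#include <windows.h>
#include <cstdio>

int main()
{
    FILETIME ft;
    GetSystemTimePreciseAsFileTime(&ft);  // 100-ns units since 1601-01-01 (UTC)

    ULARGE_INTEGER t;
    t.LowPart  = ft.dwLowDateTime;
    t.HighPart = ft.dwHighDateTime;
    std::printf("%llu (100-ns ticks)\n", static_cast<unsigned long long>(t.QuadPart));
}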
Lots of this varies a rather unfortunate amount based on hardware and OS version.
If you can use the Visual Studio 2012 compiler or higher, you can simply use the std::chrono standard library.
#include <chrono>
::std::chrono::steady_clock::time_point time = std::chrono::steady_clock::now();
Note that the MSVC 2012 version may only be about 1 ms accurate. Newer versions should be accurate down to a microsecond.
You can use the Performance Counter API as Konrad Rudolph proposed, but be warned that it is based on the CPU frequency. This frequency is not stable when e.g. a power-save mode is enabled. If you want to use this API, make sure the CPU is at a constant frequency.
Otherwise, you can create some kind of 'statistical' system, correlating the CPU ticks to the PC BIOS clock. The latter is way less precise, but constant.
Use QueryPerformanceCounter (for Windows).
With respect to Konrad Rudolph's answer, note that in my experience the frequency of the performance counter is around 3.7MHz, so sub-microsecond, but certainly not nanosecond precision. The actual frequency is hardware (and power-save mode) dependent. Nanosecond precision is somewhat unreasonable in any case since interrupt latencies and process/thread context switching times are far longer than that, and that is also the order of magnitude of individual machine instructions.
The rdtsc instruction is the most accurate.
Here is a Timer class that will work for both Windows and Linux:
#ifndef INCLUDE_CTIMER_HPP_
#define INCLUDE_CTIMER_HPP_

#if defined(_MSC_VER)
#  define NOMINMAX // workaround a bug in windows.h
#  include <windows.h>
#else
#  include <sys/time.h>
#endif

namespace Utils
{
    class CTimer
    {
    private:
#if defined(_MSC_VER)
        LARGE_INTEGER m_depart;
#else
        timeval m_depart;
#endif

    public:
        inline void start()
        {
#if defined(_MSC_VER)
            QueryPerformanceCounter(&m_depart);
#else
            gettimeofday(&m_depart, 0);
#endif
        }

        inline float GetSecondes() const
        {
#if defined(_MSC_VER)
            LARGE_INTEGER now;
            LARGE_INTEGER freq;

            QueryPerformanceCounter(&now);
            QueryPerformanceFrequency(&freq);

            return (now.QuadPart - m_depart.QuadPart) / static_cast<float>(freq.QuadPart);
#else
            timeval now;
            gettimeofday(&now, 0);

            return now.tv_sec - m_depart.tv_sec + (now.tv_usec - m_depart.tv_usec) / 1000000.0f;
#endif
        }
    };
}

#endif // INCLUDE_CTIMER_HPP_
Thanks for the input... though I couldn't get nano- or microsecond resolution, which would have been nice, I was however able to come up with this... maybe someone else will find it useful.
// Uses timeGetTime() from <windows.h> (link with winmm.lib); resolution is on the order of milliseconds.
class N_Script_Timer
{
public:
    N_Script_Timer()
    {
        running = false;
        milliseconds = 0;
        seconds = 0;
        start_t = 0;
        end_t = 0;
    }

    void Start()
    {
        if(running) return;
        running = true;
        start_t = timeGetTime();
    }

    void End()
    {
        if(!running) return;
        running = false;
        end_t = timeGetTime();
        milliseconds = end_t - start_t;
        seconds = milliseconds / (float)1000;
    }

    float milliseconds;
    float seconds;

private:
    unsigned long start_t;
    unsigned long end_t;
    bool running;
};