C++ high precision time measurement in Windows - c++

I'm interested in measuring a specific point in time down to the nanosecond using C++ on Windows. Is this possible? If it isn't, is it at least possible to get the current time in microseconds? Any library should do, unless, I suppose, it's possible with managed code.
Thanks.

If you have a threaded application running on a multicore computer, QueryPerformanceCounter can (and will) return different values depending on which core the code is executing on. See this MSDN article. (rdtsc has the same problem.)
This is not just a theoretical problem; we ran into it with our application and had to conclude that the only reliable time source is timeGetTime, which only has millisecond precision (fortunately sufficient in our case). We also tried fixing the thread affinity of our threads to guarantee that each thread always got a consistent value from QueryPerformanceCounter; this worked, but it absolutely killed the performance of the application.
To sum things up, there isn't a reliable timer on Windows that can be used to time things with microsecond precision (at least not when running on a multicore computer).
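For reference, a minimal sketch of the kind of affinity pinning described above (the helper name is mine, not from the answer); as noted, this approach can hurt performance badly:
#include <windows.h>

// Pin the calling thread to one core, sample the counter, then restore the
// original affinity so that all samples come from the same core.
LONGLONG SampleQpcOnSingleCore()
{
    HANDLE thread = GetCurrentThread();
    DWORD_PTR oldMask = SetThreadAffinityMask(thread, 1); // pin to core 0

    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);

    if (oldMask != 0)
        SetThreadAffinityMask(thread, oldMask);            // restore previous affinity
    return counter.QuadPart;
}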

Windows has a high-performance counter API.
You need to get the ticks from QueryPerformanceCounter and divide by the counter frequency, provided by QueryPerformanceFrequency.
LARGE_INTEGER frequency;
if (::QueryPerformanceFrequency(&frequency) == FALSE)
    throw "foo";

LARGE_INTEGER start;
if (::QueryPerformanceCounter(&start) == FALSE)
    throw "foo";

// Calculation.

LARGE_INTEGER end;
if (::QueryPerformanceCounter(&end) == FALSE)
    throw "foo";

double interval = static_cast<double>(end.QuadPart - start.QuadPart) / frequency.QuadPart;
This interval should be in seconds.

For future reference: with Windows Vista, Windows Server 2008 and later, Windows requires hardware support for the High Precision Event Timer (HPET). This operates independently of the CPU, its clock and frequency, and makes it possible to obtain timestamps with sub-microsecond accuracy.
In order to implement this, you DO need to use QPC/QPF. The problem is that QPF (the frequency) is a NOMINAL value, so using the raw calls will cause time drift that can exceed minutes per day. To account for this, you have to measure the actual frequency and check its drift over time, as heat and other physical operating conditions will affect it.
An article that describes this can be found on MSDN (circa 2004!) at this link.
http://msdn.microsoft.com/en-us/magazine/cc163996.aspx
I did implement something similar to this myself (and just found the above link today!), but I prefer not to use "microsecond time" because the QPC call itself is rather lengthy compared to other Windows calls such as GetSystemTimeAsFileTime, and synchronization adds more overhead. So I prefer to use millisecond timestamps (approximately 70% less call time than using QPC), especially when I'm trying to get the time hundreds of thousands of times per second.
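For illustration, a hedged sketch of such a millisecond timestamp via GetSystemTimeAsFileTime (the helper name is mine, not from the article):
#include <windows.h>
#include <cstdint>

// Wall-clock timestamp in milliseconds. FILETIME counts 100-nanosecond
// intervals since January 1, 1601 (UTC).
uint64_t MillisecondTimestamp()
{
    FILETIME ft;
    GetSystemTimeAsFileTime(&ft);

    ULARGE_INTEGER ticks;
    ticks.LowPart  = ft.dwLowDateTime;
    ticks.HighPart = ft.dwHighDateTime;
    return ticks.QuadPart / 10000; // 100 ns units -> milliseconds
}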

The best choice is the pair of functions QueryPerformanceCounter and QueryPerformanceFrequency.
Microsoft has just recently (2014) released more detailed information about QueryPerformanceCounter:
See Acquiring high-resolution time stamps (MSDN 2014) for the details.
This is a comprehensive article with lots of examples and detailed description. A must read for users of QPC.

I think microseconds is a bit unreasonable (without hardware assistance). Milliseconds is doable, but even then not that accurate due to various nefarious counter resolution issues. Regardless, I include my own timer class (based on std::chrono) for your consideration:
#include <type_traits>
#include <chrono>

class Stopwatch final
{
public:
    using elapsed_resolution = std::chrono::milliseconds;

    Stopwatch()
    {
        Reset();
    }

    void Reset()
    {
        reset_time = clock.now();
    }

    elapsed_resolution Elapsed()
    {
        return std::chrono::duration_cast<elapsed_resolution>(clock.now() - reset_time);
    }

private:
    std::chrono::high_resolution_clock clock;
    std::chrono::high_resolution_clock::time_point reset_time;
};
Note that under the hood on Windows std::chrono::high_resolution_clock is using QueryPerformanceCounter, so it's just the same but portable.
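A possible usage of the Stopwatch class above (sleep_for stands in for real work):
#include <iostream>
#include <thread>

int main()
{
    Stopwatch watch; // starts timing at construction
    std::this_thread::sleep_for(std::chrono::milliseconds(50)); // stand-in for real work
    std::cout << "elapsed: " << watch.Elapsed().count() << " ms\n";

    watch.Reset(); // restart the measurement
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    std::cout << "elapsed: " << watch.Elapsed().count() << " ms\n";
}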

MSDN claims that -
A Scenario object is a highly-accurate timer that logs ETW events
(Event Tracing for Windows) when you start and stop it. It's designed
to be used for performance instrumentation and benchmarking, and comes
in both C# and C++ versions. ... As a rule of thumb on modern
hardware, a call to Begin() or End() takes on the order of a
microsecond, and the resulting timestamps are accurate to 100ns (i.e.
0.1 microseconds). ... Versions are available for both .NET 3.5 (written in C#), and native C++, and run on both x86 and x64
platforms. The Scenario class was originally developed using Visual
Studio 2008, but is now targeted at developers using Visual Studio
2010.
From the Scenario Home Page. As far as I know, it was provided by the same people as the PPL.
Additionally, you can read High Resolution Clocks and Timers for Performance Measurement in Windows.

In newer Windows versions you probably want GetSystemTimePreciseAsFileTime. See Acquiring high resolution timestamps.
Lots of this varies a rather unfortunate amount based on hardware and OS version.
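A minimal sketch of reading it (Windows 8 / Server 2012 and later; the helper name is mine):
#include <windows.h>
#include <cstdint>

// Current system time in 100-nanosecond units since January 1, 1601 (UTC),
// combining QPC-level precision with system-clock accuracy.
uint64_t PreciseSystemTime100ns()
{
    FILETIME ft;
    GetSystemTimePreciseAsFileTime(&ft);

    ULARGE_INTEGER t;
    t.LowPart  = ft.dwLowDateTime;
    t.HighPart = ft.dwHighDateTime;
    return t.QuadPart;
}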

If you can use the Visual Studio 2012 compiler or higher, you can use the std::chrono standard library.
#include <chrono>
::std::chrono::steady_clock::time_point time = std::chrono::steady_clock::now();
Note that the MSVC 2012 version may be only 1ms accurate. Newer versions should be accurate up to a microsecond.
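For measuring an interval rather than a single point in time, something along these lines should work:
#include <chrono>
#include <iostream>

int main()
{
    auto start = std::chrono::steady_clock::now();
    // ... code to measure ...
    auto end = std::chrono::steady_clock::now();

    auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << us.count() << " us\n";
}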

You can use the Performance Counter API as Konrad Rudolph proposed, but you should be warned that it is based on the CPU frequency. This frequency is not stable when e.g. a power save mode is enabled. If you want to use this API, make sure the CPU is at a constant frequency.
Otherwise, you can create some kind of 'statistical' system, correlating the CPU ticks to the PC BIOS clock. The latter is way less precise, but constant.

Use QueryPerformanceCounter (for Windows).

With respect to Konrad Rudolph's answer, note that in my experience the frequency of the performance counter is around 3.7MHz, so sub-microsecond, but certainly not nanosecond precision. The actual frequency is hardware (and power-save mode) dependent. Nanosecond precision is somewhat unreasonable in any case since interrupt latencies and process/thread context switching times are far longer than that, and that is also the order of magnitude of individual machine instructions.

The rdtsc instruction is the most accurate.
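If you want to experiment with it despite the multicore and frequency-drift caveats discussed above, the __rdtsc intrinsic is the usual way to read it; note it returns CPU cycles, not time units:
#include <cstdint>
#ifdef _MSC_VER
#include <intrin.h>     // __rdtsc on MSVC
#else
#include <x86intrin.h>  // __rdtsc on GCC/Clang
#endif

int main()
{
    uint64_t start = __rdtsc();
    // ... code to measure ...
    uint64_t cycles = __rdtsc() - start; // elapsed CPU cycles, not seconds
    (void)cycles;
}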

Here is a Timer class that will work for both Windows and Linux:
#ifndef INCLUDE_CTIMER_HPP_
#define INCLUDE_CTIMER_HPP_

#if defined(_MSC_VER)
#  define NOMINMAX // workaround a bug in windows.h
#  include <windows.h>
#else
#  include <sys/time.h>
#endif

namespace Utils
{
    class CTimer
    {
    private:
#if defined(_MSC_VER)
        LARGE_INTEGER m_depart;
#else
        timeval m_depart;
#endif

    public:
        inline void start()
        {
#if defined(_MSC_VER)
            QueryPerformanceCounter(&m_depart);
#else
            gettimeofday(&m_depart, 0);
#endif
        }

        inline float GetSecondes() const
        {
#if defined(_MSC_VER)
            LARGE_INTEGER now;
            LARGE_INTEGER freq;

            QueryPerformanceCounter(&now);
            QueryPerformanceFrequency(&freq);

            return (now.QuadPart - m_depart.QuadPart) / static_cast<float>(freq.QuadPart);
#else
            timeval now;
            gettimeofday(&now, 0);

            return now.tv_sec - m_depart.tv_sec + (now.tv_usec - m_depart.tv_usec) / 1000000.0f;
#endif
        }
    };
}

#endif // INCLUDE_CTIMER_HPP_

Thanks for the input. Though I couldn't get nanosecond or microsecond resolution, which would have been nice, I was able to come up with this; maybe someone else will find it useful.
class N_Script_Timer
{
public:
    N_Script_Timer()
    {
        running = false;
        milliseconds = 0;
        seconds = 0;
        start_t = 0;
        end_t = 0;
    }

    void Start()
    {
        if (running) return;
        running = true;
        start_t = timeGetTime();
    }

    void End()
    {
        if (!running) return;
        running = false;
        end_t = timeGetTime();
        milliseconds = end_t - start_t;
        seconds = milliseconds / (float)1000;
    }

    float milliseconds;
    float seconds;

private:
    unsigned long start_t;
    unsigned long end_t;
    bool running;
};

Related

Time taken between two points in code independent of system clock CPP Linux

I need to find the time taken to execute a piece of code, and the method should be independent of the system time, i.e. chrono and the like wouldn't work.
My use case looks somewhat like this:
int main() {
    //start
    function();
    //end
    time_take = end - start;
}
I am working on an embedded platform that doesn't have the right time at start-up. In my case, the start of function() happens before the actual time is set from the NTP server, and the end happens after the exact time is obtained. So any method that compares the time difference between two points wouldn't work. Also, counting CPU ticks wouldn't work for me, since my program won't necessarily be running actively throughout.
I tried the conventional methods and they didn't work for me.
On Linux, clock_gettime() has an option to return the current CLOCK_MONOTONIC time, which is unaffected by system time changes. Measuring CLOCK_MONOTONIC at the beginning and the end, and then doing your own math to subtract the two values, will measure the elapsed time while ignoring any system time changes.
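A minimal sketch of that C-level approach (the diff_ns helper is mine):
#include <time.h>
#include <cstdint>

// Difference between two timespecs in nanoseconds.
int64_t diff_ns(const timespec& a, const timespec& b)
{
    return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

int main()
{
    timespec start{}, end{};
    clock_gettime(CLOCK_MONOTONIC, &start);
    // ... code being measured ...
    clock_gettime(CLOCK_MONOTONIC, &end);

    int64_t elapsed = diff_ns(start, end); // unaffected by system time changes
    (void)elapsed;
}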
If you don't want to dip down to C-level abstractions, <chrono> has this covered for you with steady_clock:
#include <chrono>

int main() {
    //start
    auto t0 = std::chrono::steady_clock::now();
    function();
    auto t1 = std::chrono::steady_clock::now();
    //end
    auto time_take = t1 - t0;
}
steady_clock is generally a wrapper around clock_gettime used with CLOCK_MONOTONIC, except it is portable across all platforms. I.e. some platforms don't have clock_gettime, but do have an API for getting a monotonic clock time.
Above, the type of time_take will be steady_clock::duration. On all platforms I'm aware of, this type is an alias for nanoseconds. If you want an integral count of nanoseconds you can:
using namespace std::literals;
int64_t i = time_take/1ns;
The above works on all platforms, even if steady_clock::duration is not nanoseconds.
The minor advantage of <chrono> over a C-level API is that you don't have to deal with computing timespec subtraction manually. And of course it is portable.

Difference between std::system_clock and std::steady_clock?

What is the difference between std::system_clock and std::steady_clock? (An example case that illustrates different results/behaviours would be great.)
If my goal is to precisely measure execution time of functions (like a benchmark), what would be the best choice between std::system_clock, std::steady_clock and std::high_resolution_clock?
From N3376:
20.11.7.1 [time.clock.system]/1:
Objects of class system_clock represent wall clock time from the system-wide realtime clock.
20.11.7.2 [time.clock.steady]/1:
Objects of class steady_clock represent clocks for which values of time_point never decrease as physical time advances and for which values of time_point advance at a steady rate relative to real time. That is, the clock may not be adjusted.
20.11.7.3 [time.clock.hires]/1:
Objects of class high_resolution_clock represent clocks with the shortest tick period. high_resolution_clock may be a synonym for system_clock or steady_clock.
For instance, the system-wide clock might be affected by something like daylight saving time, at which point the actual time listed at some point in the future can actually be a time in the past. (E.g. in the US, in the fall, time moves back one hour, so the same hour is experienced "twice".) However, steady_clock is not allowed to be affected by such things.
Another way of thinking about "steady" in this case is in the requirements defined in the table of 20.11.3 [time.clock.req]/2:
In Table 59 C1 and C2 denote clock types. t1 and t2 are values returned by C1::now() where the call returning t1 happens before the call returning t2 and both of these calls occur before C1::time_point::max(). [ Note: this means C1 did not wrap around between t1 and t2. —end note ]
Expression: C1::is_steady
Returns: const bool
Operational Semantics: true if t1 <= t2 is always true and the time between clock ticks is constant, otherwise false.
That's all the standard has on their differences.
If you want to do benchmarking, your best bet is probably going to be std::high_resolution_clock, because it is likely that your platform uses a high resolution timer (e.g. QueryPerformanceCounter on Windows) for this clock. However, if you're benchmarking, you should really consider using platform specific timers for your benchmark, because different platforms handle this differently. For instance, some platforms might give you some means of determining the actual number of clock ticks the program required (independent of other processes running on the same CPU). Better yet, get your hands on a real profiler and use that.
Billy provided a great answer based on the ISO C++ standard that I fully agree with. However, there is another side of the story - real life. It seems that right now there is really no difference between those clocks in the implementations of popular compilers:
gcc 4.8:
#ifdef _GLIBCXX_USE_CLOCK_MONOTONIC
...
#else
typedef system_clock steady_clock;
#endif
typedef system_clock high_resolution_clock;
Visual Studio 2012:
class steady_clock : public system_clock
{ // wraps monotonic clock
public:
static const bool is_monotonic = true; // retained
static const bool is_steady = true;
};
typedef system_clock high_resolution_clock;
In the case of gcc you can check whether you are dealing with a steady clock simply by checking is_steady and behave accordingly. However, VS2012 seems to cheat a bit here :-)
If you need a high precision clock, I recommend for now writing your own clock that conforms to the official C++11 clock interface and waiting for implementations to catch up. It will be a much better approach than using an OS-specific API directly in your code.
For Windows you can do it like this:
// Self-made Windows QueryPerformanceCounter based C++11 API compatible clock
struct qpc_clock {
    typedef std::chrono::nanoseconds duration;      // nanoseconds resolution
    typedef duration::rep rep;
    typedef duration::period period;
    typedef std::chrono::time_point<qpc_clock, duration> time_point;
    static bool is_steady;                          // = true

    static time_point now()
    {
        if (!is_inited) {
            init();
            is_inited = true;
        }
        LARGE_INTEGER counter;
        QueryPerformanceCounter(&counter);
        return time_point(duration(static_cast<rep>((double)counter.QuadPart / frequency.QuadPart *
                                                    period::den / period::num)));
    }

private:
    static bool is_inited;                          // = false
    static LARGE_INTEGER frequency;

    static void init()
    {
        if (QueryPerformanceFrequency(&frequency) == 0)
            throw std::logic_error("QueryPerformanceCounter not supported: " + std::to_string(GetLastError()));
    }
};
For Linux it is even easier. Just read the man page of clock_gettime and modify the code above.
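For example, a hedged sketch of a Linux counterpart with the same interface as qpc_clock above (untested):
#include <chrono>
#include <time.h>

// Self-made Linux clock_gettime(CLOCK_MONOTONIC) based C++11 compatible clock
struct monotonic_clock {
    typedef std::chrono::nanoseconds duration;
    typedef duration::rep rep;
    typedef duration::period period;
    typedef std::chrono::time_point<monotonic_clock, duration> time_point;
    static const bool is_steady = true;

    static time_point now()
    {
        timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return time_point(duration(static_cast<rep>(ts.tv_sec) * 1000000000LL + ts.tv_nsec));
    }
};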
GCC 5.3.0 implementation
C++ stdlib is inside GCC source:
high_resolution_clock is an alias for system_clock
system_clock forwards to the first of the following that is available:
clock_gettime(CLOCK_REALTIME, ...)
gettimeofday
time
steady_clock forwards to the first of the following that is available:
clock_gettime(CLOCK_MONOTONIC, ...)
system_clock
Then CLOCK_REALTIME vs CLOCK_MONOTONIC is explained at: Difference between CLOCK_REALTIME and CLOCK_MONOTONIC?
Maybe the most significant difference is the fact that the starting point of std::chrono::system_clock is 1 January 1970, the so-called UNIX epoch.
On the other hand, the starting point of std::chrono::steady_clock is typically the boot time of your PC, which makes it most suitable for measuring intervals.
Relevant talk about chrono by Howard Hinnant, author of chrono:
don't use high_resolution_clock, as it's an alias for one of these:
system_clock: it's like a regular clock, use it for time/date related stuff
steady_clock: it's like a stopwatch, use it for timing things.

timespec equivalent for windows

I am porting my application to Windows from Unix and I have run into a wall. In my application I need to find the time in microseconds (the whole application heavily depends on it due to it being a high precision application).
Previously I was using the timespec structure, but Windows contains no such thing. The function GetTickCount does not suffice because it returns the time in milliseconds. I was also thinking of QueryPerformanceFrequency.
Would anyone happen to know something that is as close to timespec as possible?
In the future I might even require nanoseconds too, which nothing I have found on Windows supports.
See, for example, How to realise long-term high-resolution timing on windows using C++? and C++ Timer function to provide time in nano seconds.
I have done some testing with Cygwin under Windows XP: on my machine the granularity of gettimeofday() is about 15 msecs (~1/64 sec), which is quite coarse. And so is the granularity of:
* clock_t clock(void) (divisor CLOCKS_PER_SEC)
* clock_t times(struct tms *) (divisor sysconf(_SC_CLK_TCK))
Both divisors are 1000 (POSIX requires 1000000 for the first).
Also, clock_getres(CLOCK_REALTIME, ...) returns 15 msecs, so clock_gettime() is unlikely to help. And CLOCK_MONOTONIC and CLOCK_PROCESS_CPUTIME_ID don't work.
Other possibilities for Windows might be RDTSC; see the Wikipedia article. And HPET, which isn't available with Windows XP.
Also note in Linux, clock() is the process time, while in Windows it is the wall time.
So some sample code, both for standard Unix, and for CYGWIN code running under Windows, which gives a granularity of about 50 microsecs (on my machine). The return value is in seconds, and gives the number of seconds elapsed since the function was first called. (I belatedly realized this was in an answer I gave over a year ago).
#ifndef __CYGWIN32__
#include <sys/time.h>

double RealElapsedTime(void) {      // returns 0 seconds first time called
    static struct timeval t0;
    struct timeval tv;
    gettimeofday(&tv, 0);
    if (!t0.tv_sec)
        t0 = tv;
    return tv.tv_sec - t0.tv_sec + (tv.tv_usec - t0.tv_usec) / 1000000.;
}
#else
#include <windows.h>

double RealElapsedTime(void) {      // granularity about 50 microsecs on my machine
    static LARGE_INTEGER freq, start;
    LARGE_INTEGER count;
    if (!QueryPerformanceCounter(&count))
        FatalError("QueryPerformanceCounter");
    if (!freq.QuadPart) {           // one time initialization
        if (!QueryPerformanceFrequency(&freq))
            FatalError("QueryPerformanceFrequency");
        start = count;
    }
    return (double)(count.QuadPart - start.QuadPart) / freq.QuadPart;
}
#endif
Portable between Windows, UNIX, Linux and anything vaguely modern: std::chrono::high_resolution_clock. Resolution may vary, but you can find out at compile time what it is. Nanoseconds is certainly possible on modern hardware.
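For example, you can inspect the clock's period and steadiness like this:
#include <chrono>
#include <iostream>

int main()
{
    using clock = std::chrono::high_resolution_clock;
    // period is a std::ratio of seconds per tick; 1/1000000000 means nanosecond ticks.
    std::cout << "tick: " << clock::period::num << "/" << clock::period::den << " s\n";
    std::cout << "steady: " << clock::is_steady << "\n";
}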
Keep in mind that nanosecond precision really means sub-meter precision: light travels only about 30 centimeters in a nanosecond. Moving your computer from the top of a rack to the bottom literally moves it by several nanoseconds.

How to create an effecient portable timer in C++?

For a school project I need to (re)create a fully functional multi-player version of R-Type without the use of the following external libraries:
Boost
SFML/SDL
Qt
Use of C++11 not allowed
Moreover, this game must be fully portable between Fedora (Linux) and Windows. I am in charge of the server, so the use of any graphics library is strictly prohibited.
In order to create a correct game loop I need a correct Timer class, similar to those found in SDL, which implement GetTicks() or GetElapsedTime() methods. But I asked myself what would be the best way to create such a class; so far this is how I would start:
Creating a threaded class using pthread (which is portable)
Using the functions time() and difftime() in a loop to determine how much time has elapsed since the last tick
Knowing that this class will be used by dozens of instances playing at the same time, should I use the Singleton design pattern? Will these methods be accurate?
EDIT: Changed the explanation of my question to fit more my needs and to be more accurate on what I am allowed to use or not.
There's not an easy way to do what you're thinking. Luckily, there are easy ways to do what you want.
First: "Using the functions time() and difftime() in a loop to determine how much time has elapsed." That's a terrible idea. It will use 100% of one of your CPUs and thus slow your program to a crawl. If you want to wait a specific amount of time (a "tick" of 1/60 of a second, or 1/10 of a second), then just wait. Don't spin a thread.
header:
long long get_time();
long long get_freq();
void wait_for(long long nanoseconds);
cpp:
#ifdef _MSC_VER // Windows compiler for Windows machines
#include <windows.h>

long long get_time() {
    LARGE_INTEGER r;
    QueryPerformanceCounter(&r);
    return r.QuadPart;
}
long long get_freq() {
    LARGE_INTEGER r;
    QueryPerformanceFrequency(&r);
    return r.QuadPart;
}
void wait_for(long long nanoseconds)
{
    Sleep(static_cast<DWORD>(nanoseconds / 1000000));
}
#endif
#ifdef __GNUC__ // Linux compiler for Linux machines
#include <time.h>

long long get_time() {
    timespec r;
    clock_gettime(CLOCK_MONOTONIC, &r);
    return static_cast<long long>(r.tv_sec) * 1000000000LL + r.tv_nsec;
}
long long get_freq() {
    timespec r;
    clock_getres(CLOCK_MONOTONIC, &r);
    return r.tv_nsec;
}
void wait_for(long long nanoseconds)
{
    timespec r;
    r.tv_sec  = nanoseconds / 1000000000;
    r.tv_nsec = nanoseconds % 1000000000;
    nanosleep(&r, NULL);
}
#endif
None of this is perfect (especially since I don't code for Linux), but this is the general concept whenever you have to deal with the OS (since it isn't in the standard and you can't use libraries). The Windows and GCC implementations can go in separate files if you like.
Given the spec, pthreads are out: they're not going to run on Windows and are not included in the standard.
If you can use C++11, you can use std::chrono for the timer. It is a high precision timer with a fairly intuitive interface. It has basically been lifted from Boost (as has thread), so most of the documentation for Boost translates to std::chrono.
(Or, for low precision, just use the C time library.) For threads you can use std::thread.
N.B. these are recent additions to the standard library, so just create a test on your platforms to make sure the stdlib you are using supports them (you will need to enable C++11, usually with --std=c++0x).
I know for sure that gcc 4.6 has the majority of thread and chrono in place and seems to be stable.
You probably want to create a wrapper around gettimeofday for Linux which returns the number of microseconds since the Epoch, and GetTickCount for Windows which returns the number of milliseconds since the system was started.
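A hedged sketch of such a wrapper (the function name is mine; note the two branches use different epochs, so only differences between two calls are meaningful):
#ifdef _WIN32
#include <windows.h>

unsigned long long TickMicroseconds()
{
    // GetTickCount: milliseconds since the system was started.
    return static_cast<unsigned long long>(GetTickCount()) * 1000ULL;
}
#else
#include <sys/time.h>

unsigned long long TickMicroseconds()
{
    // gettimeofday: microseconds since the Epoch.
    timeval tv;
    gettimeofday(&tv, 0);
    return static_cast<unsigned long long>(tv.tv_sec) * 1000000ULL + tv.tv_usec;
}
#endif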
You can also use clock() on Windows which will return seconds * CLOCKS_PER_SEC (yes, wall-clock time not CPU time) since the process started.
To get a wall time you could use QueryPerformanceCounter on Windows and clock_gettime() with CLOCK_MONOTONIC on POSIX systems (CLOCK_MONOTONIC_RAW Linux 2.6.28+).

clock() vs getsystemtime()

I developed a class for calculations on multiple threads, and only one instance of this class is used per thread. I also want to measure the duration of the calculations by iterating over a container of these objects from another thread. The application is Win32. The thing is, I have read that QueryPerformanceCounter is useful when comparing measurements on a single thread. Because I cannot use it for my problem, I'm thinking of clock() or GetSystemTime(). It is sad that both methods have a 'resolution' of milliseconds (since CLOCKS_PER_SEC is 1000 on Win32). Which method should I use, or to generalize, is there a better option for me?
As a rule I have to take the measurements outside the working thread.
Here is some code as an example.
unsigned long GetCounter()
{
    SYSTEMTIME ww;
    GetSystemTime(&ww);
    return ww.wMilliseconds + 1000 * ww.wSeconds;
    // or
    return clock();
}
class WorkClass
{
    bool is_working;
    unsigned long counter;
    HANDLE threadHandle;
public:
    void DoWork()
    {
        threadHandle = GetCurrentThread();
        is_working = true;
        counter = GetCounter();
        // Do some work
        is_working = false;
    }
};
void CheckDurations() // will work on another thread
{
    for (size_t i = 0; i < vector_of_workClass.size(); ++i)
    {
        WorkClass & wc = vector_of_workClass[i];
        if (wc.is_working)
        {
            unsigned long dur = GetCounter() - wc.counter;
            ReportDuration(wc, dur);
            if (dur > someLimitValue)
                TerminateThread(wc.threadHandle, 0);
        }
    }
}
QueryPerformanceCounter is fine for multithreaded applications. The processor instruction that may be used (rdtsc) can potentially provide invalid results when called on different processors.
I recommend reading "Game Timing and Multicore Processors".
For your specific application, the problem it appears you are trying to solve is using a timeout on some potentially long-running threads. The proper solution to this would be to use the WaitForMultipleObjects function with a timeout value. If the time expires, then you can terminate any threads that are still running - ideally by setting a flag that each thread checks, but TerminateThread may be suitable.
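A hedged sketch of that timeout pattern (assuming you collect the worker thread handles in an array):
#include <windows.h>

// Wait up to timeoutMs for all worker threads to finish; returns false on
// timeout or failure so the caller can decide how to stop the stragglers.
bool WaitForWorkers(HANDLE* handles, DWORD count, DWORD timeoutMs)
{
    DWORD result = WaitForMultipleObjects(count, handles, TRUE /* wait for all */, timeoutMs);
    return result != WAIT_TIMEOUT && result != WAIT_FAILED;
}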
both methods have a precision of milliseconds
They don't. They have a resolution of a millisecond; the precision is far worse. Most machines increment the value only at intervals of 15.625 msec. That's a heck of a lot of CPU cycles, usually not good enough to get any reliable indication of code efficiency.
QPC/QPF does much better; no idea why you couldn't use it. A profiler is the standard tool for measuring code efficiency, and beats taking dependencies you don't want.
QueryPerformanceCounter should give you the best precision, but there are issues when the function is run on different processors (you can get a different result for each processor). So when running in a thread you will experience shifts when the thread switches processors. To solve this you can set processor affinity for the thread that measures time.
GetSystemTime gets an absolute time, clock is a relative time but both measure elapsed time, not CPU time related to the actual thread/process.
Of course clock() is more portable. Having said that, I use clock_gettime on Linux because I can get both elapsed and thread CPU time with that call.
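For reference, a small sketch of getting both elapsed and thread CPU time with clock_gettime on Linux:
#include <time.h>
#include <cstdio>

int main()
{
    timespec wall, cpu;
    clock_gettime(CLOCK_MONOTONIC, &wall);        // elapsed (wall-clock) time base
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &cpu); // CPU time used by the calling thread

    std::printf("wall: %ld.%09ld s, thread cpu: %ld.%09ld s\n",
                (long)wall.tv_sec, wall.tv_nsec, (long)cpu.tv_sec, cpu.tv_nsec);
}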
Boost has some time functions that you could use, and they will run on multiple platforms if you want platform-independent code.