QueryPerformanceCounter throwing incorrect numbers - c++

I am using QueryPerformanceCounter to measure the time of some functions/operations. It used to give me correct numbers, for instance, I could test Sleep(1000) and it would return a time very close to 1 second. Now, it returns a time very different. I am not sure what the issue is as the code hasn't changed at all. Here is the code:
Expected Output:
duration : 1.000 seconds
Actual Output:
duration : 187.988 seconds
Code:
#include <windows.h>
#pragma comment (lib, "winmm.lib")
struct Clock{
Clock(){}
virtual ~Clock(){}
void start();
void stop();
long double duration();
LARGE_INTEGER _start, _end, _freq;
};
void Clock::start(){
_start.QuadPart = 0;
QueryPerformanceCounter(&_freq);
QueryPerformanceCounter(&_start);
}
void Clock::stop(){
QueryPerformanceCounter(&_end);
}
long double Clock::duration(){
//microseconds
LARGE_INTEGER delta;
delta.QuadPart = (_end.QuadPart - _start.QuadPart) * 1000000;
long double the_duration = ((long double)delta.QuadPart) / _freq.QuadPart;
std::cout << "duration : " << the_duration << " seconds" << std::endl;
return the_duration;
}
void main(){
Clock clock;
clock.start();
Sleep(1000);
clock.stop();
clock.duration();
std::cin.get();
}

You are assuming that the frequency from the performance counter is exactly 1000000 Hz.
You need to call QueryPerformanceFrequency instead, as the frequency can vary (some kernels use the motherboard's 1.024 MHz timer, others use the CPUs time-stamp-counter, which runs at approximately the CPU's clock frequency).

I use it in VBA. Sometimes I get 8976543566 ms instead of 40
I deal with it by calling it twice in the beginning of procedure:
If gbolShowRunTime = True Then
QueryPerformanceCounter curFrequencyStartLoop
End If
If gbolShowBFTime = True Then
QueryPerformanceCounter curFrequencyEndLoop 'Get the end time
QueryPerformanceCounter curFrequencyStartLoop ' reseed it
End If
Thereafter it works fine.

Related

How to get the difference between two time periods using the local system time [duplicate]

What's the best way to calculate a time difference in C++? I'm timing the execution speed of a program, so I'm interested in milliseconds. Better yet, seconds.milliseconds..
The accepted answer works, but needs to include ctime or time.h as noted in the comments.
See std::clock() function.
const clock_t begin_time = clock();
// do something
std::cout << float( clock () - begin_time ) / CLOCKS_PER_SEC;
If you want calculate execution time for self ( not for user ), it is better to do this in clock ticks ( not seconds ).
EDIT:
responsible header files - <ctime> or <time.h>
I added this answer to clarify that the accepted answer shows CPU time which may not be the time you want. Because according to the reference, there are CPU time and wall clock time. Wall clock time is the time which shows the actual elapsed time regardless of any other conditions like CPU shared by other processes. For example, I used multiple processors to do a certain task and the CPU time was high 18s where it actually took 2s in actual wall clock time.
To get the actual time you do,
#include <chrono>
auto t_start = std::chrono::high_resolution_clock::now();
// the work...
auto t_end = std::chrono::high_resolution_clock::now();
double elapsed_time_ms = std::chrono::duration<double, std::milli>(t_end-t_start).count();
if you are using c++11, here is a simple wrapper (see this gist):
#include <iostream>
#include <chrono>
class Timer
{
public:
Timer() : beg_(clock_::now()) {}
void reset() { beg_ = clock_::now(); }
double elapsed() const {
return std::chrono::duration_cast<second_>
(clock_::now() - beg_).count(); }
private:
typedef std::chrono::high_resolution_clock clock_;
typedef std::chrono::duration<double, std::ratio<1> > second_;
std::chrono::time_point<clock_> beg_;
};
Or for c++03 on *nix:
#include <iostream>
#include <ctime>
class Timer
{
public:
Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }
double elapsed() {
clock_gettime(CLOCK_REALTIME, &end_);
return end_.tv_sec - beg_.tv_sec +
(end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
}
void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }
private:
timespec beg_, end_;
};
Example of usage:
int main()
{
Timer tmr;
double t = tmr.elapsed();
std::cout << t << std::endl;
tmr.reset();
t = tmr.elapsed();
std::cout << t << std::endl;
return 0;
}
I would seriously consider the use of Boost, particularly boost::posix_time::ptime and boost::posix_time::time_duration (at http://www.boost.org/doc/libs/1_38_0/doc/html/date_time/posix_time.html).
It's cross-platform, easy to use, and in my experience provides the highest level of time resolution an operating system provides. Possibly also very important; it provides some very nice IO operators.
To use it to calculate the difference in program execution (to microseconds; probably overkill), it would look something like this [browser written, not tested]:
ptime time_start(microsec_clock::local_time());
//... execution goes here ...
ptime time_end(microsec_clock::local_time());
time_duration duration(time_end - time_start);
cout << duration << '\n';
boost 1.46.0 and up includes the Chrono library:
thread_clock class provides access to the real thread wall-clock, i.e.
the real CPU-time clock of the calling thread. The thread relative
current time can be obtained by calling thread_clock::now()
#include <boost/chrono/thread_clock.hpp>
{
...
using namespace boost::chrono;
thread_clock::time_point start = thread_clock::now();
...
thread_clock::time_point stop = thread_clock::now();
std::cout << "duration: " << duration_cast<milliseconds>(stop - start).count() << " ms\n";
In Windows: use GetTickCount
//GetTickCount defintition
#include <windows.h>
int main()
{
DWORD dw1 = GetTickCount();
//Do something
DWORD dw2 = GetTickCount();
cout<<"Time difference is "<<(dw2-dw1)<<" milliSeconds"<<endl;
}
You can also use the clock_gettime. This method can be used to measure:
System wide real-time clock
System wide monotonic clock
Per Process CPU time
Per process Thread CPU time
Code is as follows:
#include < time.h >
#include <iostream>
int main(){
timespec ts_beg, ts_end;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_beg);
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_end);
std::cout << (ts_end.tv_sec - ts_beg.tv_sec) + (ts_end.tv_nsec - ts_beg.tv_nsec) / 1e9 << " sec";
}
`
just in case you are on Unix, you can use time to get the execution time:
$ g++ myprog.cpp -o myprog
$ time ./myprog
For me, the most easy way is:
#include <boost/timer.hpp>
boost::timer t;
double duration;
t.restart();
/* DO SOMETHING HERE... */
duration = t.elapsed();
t.restart();
/* DO OTHER STUFF HERE... */
duration = t.elapsed();
using this piece of code you don't have to do the classic end - start.
Enjoy your favorite approach.
Just a side note: if you're running on Windows, and you really really need precision, you can use QueryPerformanceCounter. It gives you time in (potentially) nanoseconds.
Get the system time in milliseconds at the beginning, and again at the end, and subtract.
To get the number of milliseconds since 1970 in POSIX you would write:
struct timeval tv;
gettimeofday(&tv, NULL);
return ((((unsigned long long)tv.tv_sec) * 1000) +
(((unsigned long long)tv.tv_usec) / 1000));
To get the number of milliseconds since 1601 on Windows you would write:
SYSTEMTIME systime;
FILETIME filetime;
GetSystemTime(&systime);
if (!SystemTimeToFileTime(&systime, &filetime))
return 0;
unsigned long long ns_since_1601;
ULARGE_INTEGER* ptr = (ULARGE_INTEGER*)&ns_since_1601;
// copy the result into the ULARGE_INTEGER; this is actually
// copying the result into the ns_since_1601 unsigned long long.
ptr->u.LowPart = filetime.dwLowDateTime;
ptr->u.HighPart = filetime.dwHighDateTime;
// Compute the number of milliseconds since 1601; we have to
// divide by 10,000, since the current value is the number of 100ns
// intervals since 1601, not ms.
return (ns_since_1601 / 10000);
If you cared to normalize the Windows answer so that it also returned the number of milliseconds since 1970, then you would have to adjust your answer by 11644473600000 milliseconds. But that isn't necessary if all you care about is the elapsed time.
If you are using:
tstart = clock();
// ...do something...
tend = clock();
Then you will need the following to get time in seconds:
time = (tend - tstart) / (double) CLOCKS_PER_SEC;
This seems to work fine for intel Mac 10.7:
#include <time.h>
time_t start = time(NULL);
//Do your work
time_t end = time(NULL);
std::cout<<"Execution Time: "<< (double)(end-start)<<" Seconds"<<std::endl;

Computing CPU time in C++ on Windows

Is there any way in C++ to calculate how long does it take to run a given program or routine in CPU time?
I work with Visual Studio 2008 running on Windows 7.
If you want to know the total amount of CPU time used by a process, neither clock nor rdtsc (either directly or via a compiler intrinsic) is really the best choice, at least IMO. If you need the code to be portable, about the best you can do is use clock, test with the system as quiescent as possible, and hope for the best (but if you do, be aware that the resolution of clock is CLOCKS_PER_SEC, which may or may not be 1000, and even if it is, your actual timing resolution often won't be that good -- it may give you times in milliseconds, but at least normally advance tens of milliseconds at a time).
Since, however, you don't seem to mind the code being specific to Windows, you can do quite a bit better. At least if my understanding of what you're looking for is correctly, what you really want is probably GetProcessTimes, which will (separately) tell you both kernel-mode and user-mode CPU usage of the process (as well as the start time and exit time, from which you can compute wall time used, if you care). There's also QueryProcessCycleTime, which will tell you the total number of CPU clock cycles used by the process (total of both user and kernel mode in all threads). Personally, I have a hard time imagining much use for the latter though -- counting individual clock cycles can be useful for small sections of code subject to intensive optimization, but I'm less certain about how you'd apply it to a complete process. GetProcessTimes uses FILETIME structures, which support resolutions of 100 nanoseconds, but in reality most times you'll see will be multiples of the scheduler's time slice (which varies with the version of windows, but is on the order of milliseconds to tens of milliseconds).
In any case, if you truly want time from beginning to end, GetProcessTimes will let you do that -- if you spawn the program (e.g., with CreateProcess), you'll get a handle to the process which will be signaled when the child process exits. You can then call GetProcessTimes on that handle, and retrieve the times even though the child has already exited -- the handle will remain valid as long as at least one handle to the process remains open.
Here's one way. It measures routine exeution time in milliseconds.
clock_t begin=clock(); starts before the route is executed and clock_t end=clock(); starts right after the routine exits.
The two time sets are then subtracted from each other and the result is a millisecod value.
#include <stdio.h>
#include <iostream>
#include <time.h>
using namespace std;
double get_CPU_time_usage(clock_t clock1,clock_t clock2)
{
double diffticks=clock1-clock2;
double diffms=(diffticks*1000)/CLOCKS_PER_SEC;
return diffms;
}
void test_CPU_usage()
{
cout << "Standby.. measuring exeution time: ";
for (int i=0; i<10000;i++)
{
cout << "\b\\" << std::flush;
cout << "\b|" << std::flush;
cout << "\b/" << std::flush;
cout << "\b-" << std::flush;
}
cout << " \n\n";
}
int main (void)
{
clock_t begin=clock();
test_CPU_usage();
clock_t end=clock();
cout << "Time elapsed: " << double(get_CPU_time_usage(end,begin)) << " ms ("<<double(get_CPU_time_usage(end,begin))/1000<<" sec) \n\n";
return 0;
}
The __rdtscp intrinsic will give you the time in CPU cycles with some caveats.
Here's the MSDN article
It depends really what you want to measure. For better results take the average of a few million (if not billion) iterations.
The clock() function [as provided by Visual C++ 2008] doesn't return processor time used by the program, while it should (according to the C standard and/or C++ standard). That said, to measure CPU time on Windows, I have this helper class (which is inevitably non-portable):
class ProcessorTimer
{
public:
ProcessorTimer() { start(); }
void start() { ::GetProcessTimes(::GetCurrentProcess(), &ft_[3], &ft_[2], &ft_[1], &ft_[0]); }
std::tuple<double, double> stop()
{
::GetProcessTimes(::GetCurrentProcess(), &ft_[5], &ft_[4], &ft_[3], &ft_[2]);
ULARGE_INTEGER u[4];
for (size_t i = 0; i < 4; ++i)
{
u[i].LowPart = ft_[i].dwLowDateTime;
u[i].HighPart = ft_[i].dwHighDateTime;
}
double user = (u[2].QuadPart - u[0].QuadPart) / 10000000.0;
double kernel = (u[3].QuadPart - u[1].QuadPart) / 10000000.0;
return std::make_tuple(user, kernel);
}
private:
FILETIME ft_[6];
};
class ScopedProcessorTimer
{
public:
ScopedProcessorTimer(std::ostream& os = std::cerr) : timer_(ProcessorTimer()), os_(os) { }
~ScopedProcessorTimer()
{
std::tuple<double, double> t = timer_.stop();
os_ << "user " << std::get<0>(t) << "\n";
os_ << "kernel " << std::get<1>(t) << "\n";
}
private:
ProcessorTimer timer_;
std::ostream& os_;
}
For example, one can measure how long it takes a block to execute, by defining a ScopedProcessorTimer at the beginning of that {} block.
This Code is Process Cpu Usage
ULONGLONG LastCycleTime = NULL;
LARGE_INTEGER LastPCounter;
LastPCounter.QuadPart = 0; // LARGE_INTEGER Init
// cpu get core number
SYSTEM_INFO sysInfo;
GetSystemInfo(&sysInfo);
int numProcessors = sysInfo.dwNumberOfProcessors;
HANDLE hProcess = OpenProcess(PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, FALSE, Process::pid);
if (hProcess == NULL)
nResult = 0;
int count = 0;
while (true)
{
ULONG64 CycleTime;
LARGE_INTEGER qpcLastInt;
if (!QueryProcessCycleTime(hProcess, &CycleTime))
nResult = 0;
ULONG64 cycle = CycleTime - LastCycleTime;
if (!QueryPerformanceCounter(&qpcLastInt))
nResult = 0;
double Usage = cycle / ((double)(qpcLastInt.QuadPart - LastPCounter.QuadPart));
// Scaling
Usage *= 1.0 / numProcessors;
Usage *= 0.1;
LastPCounter = qpcLastInt;
LastCycleTime = CycleTime;
if (count > 3)
{
printf("%.1f", Usage);
break;
}
Sleep(1); // QueryPerformanceCounter Function Resolution is 1 microsecond
count++;
}
CloseHandle(hProcess);

algorithm speed tester in C/C++

I have to calculate speed of my algorithm in base of milliseconds. In C++/C , how can i do this? I need to write smth before input and after output but what exactly??
You could use clock() function from <time.h>
clock() shows how many ticks have passed since your program started. The macro CLOCKS_PER_SEC contains the number of ticks per second, so you can actually get time.
//We start measuring here. Remember what was the amount of ticks in the
//beginning of the part of code you want to test:
int start = clock();
//<...>
//Do your stuff here
//<...>
int end = clock();//Now check what amount of ticks we have now.
//To get the time, just subtract start from end, and divide by CLOCKS_PER_SEC.
std::cout << "it took " << end - start << "ticks, or " << ((float)end - start)/CLOCKS_PER_SEC << "seconds." << std::endl;
There is no general way to measure the exact time or ticks. The method of measurement, the operating system and the other things happening on your computer (other application, graphical output, background processes) will influence the result. There is different ways to do "good-enough" (in many cases) measurements:
library functions
clock(...), clock_gettime(...)
from the standard lib (in time.h) and
gettimeofday(..) // for elapsed (wallclock) time
times(..) // process times
for linux and other unix systems (in sys/time.h) (edited according to Oleg's comment)
hardware counter:
__inline__ uint64_t rdtsc(void) {
uint32_t lo, hi;
__asm__ __volatile__( // serialize
"xorl %%eax,%%eax \n cpuid":::"%rax",
"%rbx", "%rcx", "%rdx");
__asm__ __volatile__("rdtsc":"=a"(lo), "=d"(hi));
return (uint64_t) hi << 32 | lo;
}
/*...*/
uint64_t t0 = rdtsc();
code_to_be_tested();
uint64_t t1 = rdtsc();
I prefer this method as it reads directly the hardware counter.
for C++11: std:chrono::highresolution_clock
typedef std::chrono::high_resolution_clock Clock;
auto t0 = Clock::now();
code_to_be_tested();
auto t1 = Clock::now();
Keep in mind, that the measurements will not be exact to the clockcycle. i.e. nanosecond. I always calculate microseconds (10e-6 s) as the smallest reasonable time unit.
Note that you can use date and time utilities from C++11 chrono library. From cppreference.com:
The chrono library defines three main types (durations, clocks, and time points) as well as utility functions and common typedefs.
See the sample from the article compiled in GCC 4.5.1 here
You can use this function library :
// clock.c
#include <time.h>
#include "clock.h"
struct clock { clock_t c1, c2; };
void start(clock *this) { this->c1 = clock(); }
void stop (clock *this) { this->c2 = clock(); }
double print(clock *this) { return (double)(c1 - c2) / CLOCKS_PER_SEC; }
// clock.h
#ifndef CLOCK_H_INCLUDED
# define CLOCK_H_INCLUDED
typedef struct clock clock;
extern void start(clock *);
extern void stop (clock *);
extern double print(clock *);
#endif // CLOCK_H_INCLUDED
But sometimes clock isn't very adapted: you can use your system functions, which can be more accurate.

Why is my c++ clock() based function returning a negative value?

I am still new to C++, is the clock function absolute (meaning it counts how long you sleep for), or is it how much time the application actually executes for?
I want a reliable way to produce exact intervals of 1 second. I am saving files, so I need to account for that. I was returning the runtime for that in milliseconds, and then sleeping for the remainder.
Is there a more accurate or simpler way to do this?
EDIT:
The main problem I am having is that I am getting a negative number:
double FCamera::getRuntime(clock_t* end, clock_t* start)
{
return((double(end - start)/CLOCKS_PER_SEC)*1000);
}
clock_t start = clock();
doWork();
clock_t end = clock();
double runtimeInMilliseconds = getRuntime(&end, &start);
It's giving me a negative number, what's up with that?
Walter
clock() returns the number of clock ticks elapsed since the program was launched. If you want to convert the value returned by clock into seconds divide by CLOCKS_PER_SEC (and multiply for the other way around).
There is just one pitfall, the initial moment of reference used by clock as the beginning of the program execution may vary between platforms. To calculate the actual processing times of a program, the value returned by clock should be compared to a value returned by an initial call to clock.
EDIT
larsman has been so kind to post other pitfalls in the comments. I have included them here for future reference.
On several other implementations, the value returned by clock() also includes the times of any children whose status has been collected via wait(2) (or another wait-type call). Linux does not include the times of waited-for children in the value returned by clock().
Note that the time can wrap around. On a 32-bit system where CLOCKS_PER_SEC equals 1000000 [as mandated by POSIX] this function will return the same value approximately every 72 minutes.
EDIT2
After messing around a while here is my portable (Linux/Windows) msleep. Be wary though, I'm not experienced with C/C++ and will most likely contain the stupidest error ever.
#ifdef _WIN32
#include <windows.h>
#define msleep(ms) Sleep((DWORD) ms)
#else
#include <unistd.h>
inline void msleep(unsigned long ms) {
while (ms--) usleep(1000);
}
#endif
You missed * (pointer) ,
Your argument is pointer (address of clock_t variable)
so, Your code must be modified::
return((double(*end - *start)/CLOCKS_PER_SEC)*1000);
Under windows, you can use:
VOID WINAPI Sleep(
__in DWORD dwMilliseconds
);
In linux, you will want to use:
#include <unistd.h>
unsigned int sleep(unsigned int seconds);
Notice the parameter difference - milliseconds under windows and seconds under linux.
My approach relies on:
int gettimeofday(struct timeval *tv, struct timezone *tz);
which gives the number of seconds and microseconds since the Epoch. According to the man pages:
The tv argument is a struct timeval (as specified in <sys/time.h>):
struct timeval {
time_t tv_sec; /* seconds */
suseconds_t tv_usec; /* microseconds */
};
So here we go:
#include <sys/time.h>
#include <iostream>
#include <iomanip>
static long myclock()
{
struct timeval tv;
gettimeofday(&tv, NULL);
return (tv.tv_sec * 1000000) + tv.tv_usec;
}
double getRuntime(long* end, long* start)
{
return (*end - *start);
}
void doWork()
{
sleep(3);
}
int main(void)
{
long start = myclock();
doWork();
long end = myclock();
std::cout << "Time elapsed: " << std::setprecision(6) << getRuntime(&end, &start)/1000.0 << " miliseconds" << std::endl;
std::cout << "Time elapsed: " << std::setprecision(3) << getRuntime(&end, &start)/1000000.0 << " seconds" << std::endl;
return 0;
}
Outputs:
Time elapsed: 3000.08 miliseconds
Time elapsed: 3 seconds

Timer function to provide time in nano seconds using C++

I wish to calculate the time it took for an API to return a value.
The time taken for such an action is in the space of nanoseconds. As the API is a C++ class/function, I am using the timer.h to calculate the same:
#include <ctime>
#include <iostream>
using namespace std;
int main(int argc, char** argv) {
clock_t start;
double diff;
start = clock();
diff = ( std::clock() - start ) / (double)CLOCKS_PER_SEC;
cout<<"printf: "<< diff <<'\n';
return 0;
}
The above code gives the time in seconds. How do I get the same in nano seconds and with more precision?
What others have posted about running the function repeatedly in a loop is correct.
For Linux (and BSD) you want to use clock_gettime().
#include <sys/time.h>
int main()
{
timespec ts;
// clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
For windows you want to use the QueryPerformanceCounter. And here is more on QPC
Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have those chipset. Additionally some dual core AMDs may also cause a problem. See the second post by sebbbi, where he states:
QueryPerformanceCounter() and
QueryPerformanceFrequency() offer a
bit better resolution, but have
different issues. For example in
Windows XP, all AMD Athlon X2 dual
core CPUs return the PC of either of
the cores "randomly" (the PC sometimes
jumps a bit backwards), unless you
specially install AMD dual core driver
package to fix the issue. We haven't
noticed any other dual+ core CPUs
having similar issues (p4 dual, p4 ht,
core2 dual, core2 quad, phenom quad).
EDIT 2013/07/16:
It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx
...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for
multiple processors, bugs in the BIOS or drivers may result in these routines returning
different values as the thread moves from one processor to another...
However this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.
This article shows that Windows 7 can determine if the processor(s) have an invariant TSC and falls back to an external timer if they don't. http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.
Other fine reading related to timers:
https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
http://lwn.net/Articles/209101/
http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html
QueryPerformanceCounter Status?
See the comments for more details.
This new answer uses C++11's <chrono> facility. While there are other answers that show how to use <chrono>, none of them shows how to use <chrono> with the RDTSC facility mentioned in several of the other answers here. So I thought I would show how to use RDTSC with <chrono>. Additionally I'll demonstrate how you can templatize the testing code on the clock so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime() and/or QueryPerformanceCounter.
Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I introduce two new clocks: std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are now cross-platform.
First, here is how you create a C++11-compatible clock out of the Intel rdtsc assembly instruction. I'll call it x::clock:
#include <chrono>
namespace x
{
struct clock
{
typedef unsigned long long rep;
typedef std::ratio<1, 2'800'000'000> period; // My machine is 2.8 GHz
typedef std::chrono::duration<rep, period> duration;
typedef std::chrono::time_point<clock> time_point;
static const bool is_steady = true;
static time_point now() noexcept
{
unsigned lo, hi;
asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
return time_point(duration(static_cast<rep>(hi) << 32 | lo));
}
};
} // x
All this clock does is count CPU cycles and store it in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. now() {return __rdtsc();}).
To build a clock you have to give it the representation (storage type). You must also supply the clock period, which must be a compile time constant, even though your machine may change clock speed in different power modes. And from those you can easily define your clock's "native" time duration and time point in terms of these fundamentals.
If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. And in that case, the more accurate you are able to supply the clock speed, the more accurate will be the conversion to nanoseconds, (milliseconds, whatever).
Below is example code which shows how to use x::clock. Actually I've templated the code on the clock as I'd like to show how you can use many different clocks with the exact same syntax. This particular test is showing what the looping overhead is when running what you want to time under a loop:
#include <iostream>
template <class clock>
void
test_empty_loop()
{
// Define real time units
typedef std::chrono::duration<unsigned long long, std::pico> picoseconds;
// or:
// typedef std::chrono::nanoseconds nanoseconds;
// Define double-based unit of clock tick
typedef std::chrono::duration<double, typename clock::period> Cycle;
using std::chrono::duration_cast;
const int N = 100000000;
// Do it
auto t0 = clock::now();
for (int j = 0; j < N; ++j)
asm volatile("");
auto t1 = clock::now();
// Get the clock ticks per iteration
auto ticks_per_iter = Cycle(t1-t0)/N;
std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";
// Convert to real time units
std::cout << duration_cast<picoseconds>(ticks_per_iter).count()
<< "ps per iteration\n";
}
The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any units you like, either integral or floating point based. As an example there is a pre-made std::chrono::nanoseconds unit I could have used.
As another example I want to print out the average number of clock cycles per iteration as a floating point, so I create another duration, based on double, that has the same units as the clock's tick does (called Cycle in the code).
The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function it is:
typename clock::time_point t0 = clock::now();
(as clearly shown in the x::clock example, and is also true of the system-supplied clocks).
To get a duration in terms of floating point clock ticks one merely subtracts the two time points, and to get the per iteration value, divide that duration by the number of iterations.
You can get the count in any duration by using the count() member function. This returns the internal representation. Finally I use std::chrono::duration_cast to convert the duration Cycle to the duration picoseconds and print that out.
To use this code is simple:
int main()
{
std::cout << "\nUsing rdtsc:\n";
test_empty_loop<x::clock>();
std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
test_empty_loop<std::chrono::high_resolution_clock>();
std::cout << "\nUsing std::chrono::system_clock:\n";
test_empty_loop<std::chrono::system_clock>();
}
Above I exercise the test using our home-made x::clock, and compare those results with using two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:
Using rdtsc:
1.72632 clock ticks per iteration
616ps per iteration
Using std::chrono::high_resolution_clock:
0.620105 clock ticks per iteration
620ps per iteration
Using std::chrono::system_clock:
0.00062457 clock ticks per iteration
624ps per iteration
This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).
Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example:
The clock speed of my machine in order to define x::clock.
The number of iterations to test over. If changing this number makes your results vary greatly, then you should probably make the number of iterations higher, or empty your computer of competing processes while testing.
With that level of accuracy, it would be better to reason in CPU tick rather than in system call like clock(). And do not forget that if it takes more than one nanosecond to execute an instruction... having a nanosecond accuracy is pretty much impossible.
Still, something like that is a start:
Here's the actual code to retrieve number of 80x86 CPU clock ticks passed since the CPU was last started. It will work on Pentium and above (386/486 not supported). This code is actually MS Visual C++ specific, but can be probably very easy ported to whatever else, as long as it supports inline assembly.
inline __int64 GetCpuClocks()
{
// Counter
struct { int32 low, high; } counter;
// Use RDTSC instruction to get clocks count
__asm push EAX
__asm push EDX
__asm __emit 0fh __asm __emit 031h // RDTSC
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX
// Return result
return *(__int64 *)(&counter);
}
This function has also the advantage of being extremely fast - it usually takes no more than 50 cpu cycles to execute.
Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call, QueryPerformanceFrequency().
To do this correctly you can use one of two ways, either go with RDTSC or with clock_gettime().
The second is about 2 times faster and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors)
inline uint64_t rdtsc()
{
uint32_t lo, hi;
__asm__ __volatile__ (
"xorl %%eax, %%eax\n"
"cpuid\n"
"rdtsc\n"
: "=a" (lo), "=d" (hi)
:
: "%ebx", "%ecx" );
return (uint64_t)hi << 32 | lo;
}
and for clock_gettime: (I chose microsecond resolution arbitrarily)
#include <time.h>
#include <sys/timeb.h>
// needs -lrt (real-time lib)
// 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}
the timing and values produced:
Absolute values:
rdtsc = 4571567254267600
clock_gettime = 1278605535506855
Processing time: (10000000 runs)
rdtsc = 2292547353
clock_gettime = 1031119636
I am using the following to get the desired results:
#include <time.h>
#include <iostream>
using namespace std;
int main (int argc, char** argv)
{
// reset the clock
timespec tS;
tS.tv_sec = 0;
tS.tv_nsec = 0;
clock_settime(CLOCK_PROCESS_CPUTIME_ID, &tS);
...
... <code to check for the time to be put here>
...
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tS);
cout << "Time taken is: " << tS.tv_sec << " " << tS.tv_nsec << endl;
return 0;
}
For C++11, here is a simple wrapper:
#include <iostream>
#include <chrono>
class Timer
{
public:
Timer() : beg_(clock_::now()) {}
void reset() { beg_ = clock_::now(); }
double elapsed() const {
return std::chrono::duration_cast<second_>
(clock_::now() - beg_).count(); }
private:
typedef std::chrono::high_resolution_clock clock_;
typedef std::chrono::duration<double, std::ratio<1> > second_;
std::chrono::time_point<clock_> beg_;
};
Or for C++03 on *nix,
class Timer
{
public:
Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }
double elapsed() {
clock_gettime(CLOCK_REALTIME, &end_);
return end_.tv_sec - beg_.tv_sec +
(end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
}
void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }
private:
timespec beg_, end_;
};
Example of usage:
int main()
{
Timer tmr;
double t = tmr.elapsed();
std::cout << t << std::endl;
tmr.reset();
t = tmr.elapsed();
std::cout << t << std::endl;
return 0;
}
From https://gist.github.com/gongzhitaao/7062087
In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions and you don't know how long that takes.
For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
You can use the following function with gcc running under x86 processors:
unsigned long long rdtsc()
{
#define rdtsc(low, high) \
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))
unsigned int low, high;
rdtsc(low, high);
return ((ulonglong)high << 32) | low;
}
with Digital Mars C++:
unsigned long long rdtsc()
{
_asm
{
rdtsc
}
}
which reads the high performance timer on the chip. I use this when doing profiling.
If you need subsecond precision, you need to use system-specific extensions, and will have to check with the documentation for the operating system. POSIX supports up to microseconds with gettimeofday, but nothing more precise since computers didn't have frequencies above 1GHz.
If you are using Boost, you can check boost::posix_time.
I'm using Borland code here is the code ti_hund gives me some times a negativnumber but timing is fairly good.
#include <dos.h>
void main()
{
struct time t;
int Hour,Min,Sec,Hun;
gettime(&t);
Hour=t.ti_hour;
Min=t.ti_min;
Sec=t.ti_sec;
Hun=t.ti_hund;
printf("Start time is: %2d:%02d:%02d.%02d\n",
t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);
....
your code to time
...
// read the time here remove Hours and min if the time is in sec
gettime(&t);
printf("\nTid Hour:%d Min:%d Sec:%d Hundreds:%d\n",t.ti_hour-Hour,
t.ti_min-Min,t.ti_sec-Sec,t.ti_hund-Hun);
printf("\n\nAlt Ferdig Press a Key\n\n");
getch();
} // end main
Using Brock Adams's method, with a simple class:
int get_cpu_ticks()
{
LARGE_INTEGER ticks;
QueryPerformanceFrequency(&ticks);
return ticks.LowPart;
}
__int64 get_cpu_clocks()
{
struct { int32 low, high; } counter;
__asm cpuid
__asm push EDX
__asm rdtsc
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX
return *(__int64 *)(&counter);
}
class cbench
{
public:
cbench(const char *desc_in)
: desc(strdup(desc_in)), start(get_cpu_clocks()) { }
~cbench()
{
printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
if(desc) free(desc);
}
private:
char *desc;
__int64 start;
};
Usage Example:
int main()
{
{
cbench c("test");
... code ...
}
return 0;
}
Result:
test took: 0.0002 ms
Has some function call overhead, but should be still more than fast enough :)
You can use Embedded Profiler (free for Windows and Linux) which has an interface to a multiplatform timer (in a processor cycle count) and can give you a number of cycles per seconds:
EProfilerTimer timer;
timer.Start();
... // Your code here
const uint64_t number_of_elapsed_cycles = timer.Stop();
const uint64_t nano_seconds_elapsed =
mumber_of_elapsed_cycles / (double) timer.GetCyclesPerSecond() * 1000000000;
Recalculation of cycle count to time is possibly a dangerous operation with modern processors where CPU frequency can be changed dynamically. Therefore to be sure that converted times are correct, it is necessary to fix processor frequency before profiling.
If this is for Linux, I've been using the function "gettimeofday", which returns a struct that gives the seconds and microseconds since the Epoch. You can then use timersub to subtract the two to get the difference in time, and convert it to whatever precision of time you want. However, you specify nanoseconds, and it looks like the function clock_gettime() is what you're looking for. It puts the time in terms of seconds and nanoseconds into the structure you pass into it.
What do you think about that:
int iceu_system_GetTimeNow(long long int *res)
{
static struct timespec buffer;
//
#ifdef __CYGWIN__
if (clock_gettime(CLOCK_REALTIME, &buffer))
return 1;
#else
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &buffer))
return 1;
#endif
*res=(long long int)buffer.tv_sec * 1000000000LL + (long long int)buffer.tv_nsec;
return 0;
}
Here is a nice Boost timer that works well:
//Stopwatch.hpp
#ifndef STOPWATCH_HPP
#define STOPWATCH_HPP
//Boost
#include <boost/chrono.hpp>
//Std
#include <cstdint>
class Stopwatch
{
public:
Stopwatch();
virtual ~Stopwatch();
void Restart();
std::uint64_t Get_elapsed_ns();
std::uint64_t Get_elapsed_us();
std::uint64_t Get_elapsed_ms();
std::uint64_t Get_elapsed_s();
private:
boost::chrono::high_resolution_clock::time_point _start_time;
};
#endif // STOPWATCH_HPP
//Stopwatch.cpp
#include "Stopwatch.hpp"
Stopwatch::Stopwatch():
_start_time(boost::chrono::high_resolution_clock::now()) {}
Stopwatch::~Stopwatch() {}
void Stopwatch::Restart()
{
_start_time = boost::chrono::high_resolution_clock::now();
}
std::uint64_t Stopwatch::Get_elapsed_ns()
{
boost::chrono::nanoseconds nano_s = boost::chrono::duration_cast<boost::chrono::nanoseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(nano_s.count());
}
std::uint64_t Stopwatch::Get_elapsed_us()
{
boost::chrono::microseconds micro_s = boost::chrono::duration_cast<boost::chrono::microseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(micro_s.count());
}
std::uint64_t Stopwatch::Get_elapsed_ms()
{
boost::chrono::milliseconds milli_s = boost::chrono::duration_cast<boost::chrono::milliseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(milli_s.count());
}
std::uint64_t Stopwatch::Get_elapsed_s()
{
boost::chrono::seconds sec = boost::chrono::duration_cast<boost::chrono::seconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(sec.count());
}
Minimalistic copy&paste-struct + lazy usage
If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste anywhere in your C++ file right after the #include's. This is the only instance in which I sacrifice Allman-style formatting.
You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.
#include <chrono>
struct MeasureTime
{
using precision = std::chrono::microseconds;
std::vector<std::chrono::steady_clock::time_point> times;
std::chrono::steady_clock::time_point oneLast;
void p() {
std::cout << "Mark "
<< times.size()/2
<< ": "
<< std::chrono::duration_cast<precision>(times.back() - oneLast).count()
<< std::endl;
}
void m() {
oneLast = times.back();
times.push_back(std::chrono::steady_clock::now());
}
void t() {
m();
p();
m();
}
MeasureTime() {
times.push_back(std::chrono::steady_clock::now());
}
};
Usage
MeasureTime m; // first time is already in memory
doFnc1();
m.t(); // Mark 1: next time, and print difference with previous mark
doFnc2();
m.t(); // Mark 2: next time, and print difference with previous mark
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.t(); // prints 'Mark 3: 123123' etc...
Standard output result
Mark 1: 123
Mark 2: 32
Mark 3: 433234
If you want summary after execution
If you want the report afterwards, because for example your code in between also writes to standard output. Then add the following function to the struct (just before MeasureTime()):
void s() { // summary
int i = 0;
std::chrono::steady_clock::time_point tprev;
for(auto tcur : times)
{
if(i > 0)
{
std::cout << "Mark " << i << ": "
<< std::chrono::duration_cast<precision>(tprev - tcur).count()
<< std::endl;
}
tprev = tcur;
++i;
}
}
So then you can just use:
MeasureTime m;
doFnc1();
m.m();
doFnc2();
m.m();
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.m();
m.s();
Which will list all the marks just like before, but then after the other code is executed. Note that you shouldn't use both m.s() and m.t().
plf::nanotimer is a lightweight option for this, works in Windows, Linux, Mac and BSD etc. Has ~microsecond accuracy depending on OS:
#include "plf_nanotimer.h"
#include <iostream>
int main(int argc, char** argv)
{
plf::nanotimer timer;
timer.start()
// Do something here
double results = timer.get_elapsed_ns();
std::cout << "Timing: " << results << " nanoseconds." << std::endl;
return 0;
}