What's the best way to calculate a time difference in C++? I'm timing the execution speed of a program, so I'm interested in milliseconds. Better yet, seconds.milliseconds.
The accepted answer works, but you need to include <ctime> or <time.h>, as noted in the comments.
See the std::clock() function.
const clock_t begin_time = clock();
// do something
std::cout << float( clock () - begin_time ) / CLOCKS_PER_SEC;
If you want to measure execution time for yourself (not for the user), it is better to do this in clock ticks (not seconds).
EDIT:
Required header file: <ctime> or <time.h>
I added this answer to clarify that the accepted answer shows CPU time, which may not be the time you want. According to the reference, there are CPU time and wall clock time. Wall clock time is the actual elapsed time, regardless of other conditions such as the CPU being shared by other processes. For example, when I used multiple processors to do a certain task, the CPU time was a high 18 s, while the actual wall clock time was only 2 s.
To get the actual time you do,
#include <chrono>
auto t_start = std::chrono::high_resolution_clock::now();
// the work...
auto t_end = std::chrono::high_resolution_clock::now();
double elapsed_time_ms = std::chrono::duration<double, std::milli>(t_end-t_start).count();
If you are using C++11, here is a simple wrapper (see this gist):
#include <iostream>
#include <chrono>
class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const {
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};
Or for C++03 on *nix:
#include <iostream>
#include <ctime>
class Timer
{
public:
    Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

    double elapsed() {
        clock_gettime(CLOCK_REALTIME, &end_);
        return end_.tv_sec - beg_.tv_sec +
            (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
    }

    void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

private:
    timespec beg_, end_;
};
Example of usage:
int main()
{
    Timer tmr;
    double t = tmr.elapsed();
    std::cout << t << std::endl;

    tmr.reset();
    t = tmr.elapsed();
    std::cout << t << std::endl;

    return 0;
}
I would seriously consider the use of Boost, particularly boost::posix_time::ptime and boost::posix_time::time_duration (at http://www.boost.org/doc/libs/1_38_0/doc/html/date_time/posix_time.html).
It's cross-platform, easy to use, and in my experience provides the highest level of time resolution an operating system provides. Possibly also very important; it provides some very nice IO operators.
To use it to calculate the difference in program execution (to microseconds; probably overkill), it would look something like this [browser written, not tested]:
ptime time_start(microsec_clock::local_time());
//... execution goes here ...
ptime time_end(microsec_clock::local_time());
time_duration duration(time_end - time_start);
cout << duration << '\n';
boost 1.46.0 and up includes the Chrono library:
thread_clock class provides access to the real thread wall-clock, i.e.
the real CPU-time clock of the calling thread. The thread relative
current time can be obtained by calling thread_clock::now()
#include <boost/chrono/thread_clock.hpp>
{
    ...
    using namespace boost::chrono;
    thread_clock::time_point start = thread_clock::now();
    ...
    thread_clock::time_point stop = thread_clock::now();
    std::cout << "duration: " << duration_cast<milliseconds>(stop - start).count() << " ms\n";
}
In Windows: use GetTickCount
// GetTickCount definition
#include <windows.h>
#include <iostream>
using namespace std;

int main()
{
    DWORD dw1 = GetTickCount();
    // Do something
    DWORD dw2 = GetTickCount();
    cout << "Time difference is " << (dw2 - dw1) << " milliseconds" << endl;
}
You can also use clock_gettime. This method can be used to measure:
System wide real-time clock
System wide monotonic clock
Per Process CPU time
Per process Thread CPU time
Code is as follows:
#include <time.h>
#include <iostream>

int main(){
    timespec ts_beg, ts_end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_beg);
    // ... code to measure goes here ...
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_end);
    std::cout << (ts_end.tv_sec - ts_beg.tv_sec) + (ts_end.tv_nsec - ts_beg.tv_nsec) / 1e9 << " sec";
}
just in case you are on Unix, you can use time to get the execution time:
$ g++ myprog.cpp -o myprog
$ time ./myprog
For me, the easiest way is:
#include <boost/timer.hpp>
boost::timer t;
double duration;
t.restart();
/* DO SOMETHING HERE... */
duration = t.elapsed();
t.restart();
/* DO OTHER STUFF HERE... */
duration = t.elapsed();
Using this piece of code, you don't have to do the classic end - start.
Enjoy your favorite approach.
Just a side note: if you're running on Windows, and you really really need precision, you can use QueryPerformanceCounter. It gives you time in (potentially) nanoseconds.
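For reference, here is a minimal sketch of how QueryPerformanceCounter is typically paired with QueryPerformanceFrequency to turn raw ticks into milliseconds (Windows only, error checking omitted; not part of the note above):
#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);   // ticks per second
    QueryPerformanceCounter(&t0);
    // ... code to time ...
    QueryPerformanceCounter(&t1);

    double elapsed_ms = (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart;
    std::cout << elapsed_ms << " ms\n";
}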
Get the system time in milliseconds at the beginning, and again at the end, and subtract.
To get the number of milliseconds since 1970 in POSIX you would write:
struct timeval tv;
gettimeofday(&tv, NULL);
return ((((unsigned long long)tv.tv_sec) * 1000) +
(((unsigned long long)tv.tv_usec) / 1000));
To get the number of milliseconds since 1601 on Windows you would write:
SYSTEMTIME systime;
FILETIME filetime;
GetSystemTime(&systime);
if (!SystemTimeToFileTime(&systime, &filetime))
return 0;
unsigned long long ns_since_1601;
ULARGE_INTEGER* ptr = (ULARGE_INTEGER*)&ns_since_1601;
// copy the result into the ULARGE_INTEGER; this is actually
// copying the result into the ns_since_1601 unsigned long long.
ptr->u.LowPart = filetime.dwLowDateTime;
ptr->u.HighPart = filetime.dwHighDateTime;
// Compute the number of milliseconds since 1601; we have to
// divide by 10,000, since the current value is the number of 100ns
// intervals since 1601, not ms.
return (ns_since_1601 / 10000);
If you cared to normalize the Windows answer so that it also returned the number of milliseconds since 1970, then you would have to adjust your answer by 11644473600000 milliseconds. But that isn't necessary if all you care about is the elapsed time.
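As a small sketch of that adjustment (ms_since_1601 is assumed to hold the millisecond value computed by the Windows snippet above):
// Convert a millisecond count since 1601 to milliseconds since 1970.
unsigned long long to_unix_epoch_ms(unsigned long long ms_since_1601)
{
    const unsigned long long EPOCH_DIFF_MS = 11644473600000ULL; // 1601-01-01 .. 1970-01-01
    return ms_since_1601 - EPOCH_DIFF_MS;
}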
If you are using:
tstart = clock();
// ...do something...
tend = clock();
Then you will need the following to get time in seconds:
time = (tend - tstart) / (double) CLOCKS_PER_SEC;
This seems to work fine on an Intel Mac running OS X 10.7:
#include <time.h>
time_t start = time(NULL);
//Do your work
time_t end = time(NULL);
std::cout<<"Execution Time: "<< (double)(end-start)<<" Seconds"<<std::endl;
#include <iostream>
#include <chrono>
using namespace std;
class MyTimer {
private:
    std::chrono::time_point<std::chrono::steady_clock> starter;
    std::chrono::time_point<std::chrono::steady_clock> ender;

public:
    void startCounter() {
        starter = std::chrono::steady_clock::now();
    }

    double getCounter() {
        ender = std::chrono::steady_clock::now();
        return double(std::chrono::duration_cast<std::chrono::nanoseconds>(ender - starter).count()) /
               1000000; // millisecond output
    }

    // the timer needs to have nanosecond precision
    int64_t getCounterNs() {
        return std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::steady_clock::now() - starter).count();
    }
};
MyTimer timer1, timer2, timerMain;
volatile int64_t dummy = 0, res1 = 0, res2 = 0;
// time run without any time measure
void func0() {
    dummy++;
}

// we're trying to measure the cost of startCounter() and getCounterNs(), not "dummy++"
void func1() {
    timer1.startCounter();
    dummy++;
    res1 += timer1.getCounterNs();
}

void func2() {
    // start your counter here
    dummy++;
    // res2 += end your counter here
}
int main()
{
    int i, ntest = 1000 * 1000 * 100;
    int64_t runtime0, runtime1, runtime2;

    timerMain.startCounter();
    for (i = 1; i <= ntest; i++) func0();
    runtime0 = timerMain.getCounter();
    cout << "Time0 = " << runtime0 << "ms\n";

    timerMain.startCounter();
    for (i = 1; i <= ntest; i++) func1();
    runtime1 = timerMain.getCounter();
    cout << "Time1 = " << runtime1 << "ms\n";

    timerMain.startCounter();
    for (i = 1; i <= ntest; i++) func2();
    runtime2 = timerMain.getCounter();
    cout << "Time2 = " << runtime2 << "ms\n";

    return 0;
}
I'm trying to profile a program where certain critical parts have execution time measured in < 50 nanoseconds. I found that my timer class using std::chrono is too expensive (code with timing takes 40% more time than code without). How can I make a faster timer class?
I think some OS-specific system calls would be the fastest solution. The platform is Linux Ubuntu.
Edit: all code is compiled with -O3. It's ensured that each timer is only initialized once, so the measured cost is due to the startMeasure/stopMeasure functions only. I'm not doing any text printing.
Edit 2: the accepted answer doesn't include the method to actually convert number-of-cycles to nanoseconds. If someone can do that, it'd be very helpful.
What you want is called "micro-benchmarking". It can get very complex. I assume you are using Ubuntu Linux on x86_64; none of this is valid for ARM, ARM64, or other platforms.
std::chrono is implemented in libstdc++ (gcc) and libc++ (clang) on Linux as a thin wrapper around glibc, the C library, which does all the heavy lifting. If you look at std::chrono::steady_clock::now() you will see calls to clock_gettime().
clock_gettime() is a vDSO call, i.e., kernel code that runs in userspace. It should be very fast, but from time to time it has to do some housekeeping and can take a long time on every n-th call. So I would not recommend it for micro-benchmarking.
Almost every platform has a cycle counter and x86 has the assembly instruction rdtsc. This instruction can be inserted in your code by crafting asm calls or by using the compiler-specific builtins __builtin_ia32_rdtsc() or __rdtsc().
These calls return a 64-bit integer representing the number of clocks since the machine powered up. rdtsc is not instantaneous, but it is fast; it takes roughly 15-40 cycles to complete.
It is not guaranteed in all platforms that this counter will be the same for each core so beware when the process gets moved from core to core. In modern systems this should not be a problem though.
Another problem with rdtsc is that compilers will often reorder instructions if they find they don't have side effects and unfortunately rdtsc is one of them. So you have to use fake barriers around these counter reads if you see that the compiler is playing tricks on you - look at the generated assembly.
Another big problem is CPU out-of-order execution itself. Not only can the compiler change the order of execution, the CPU can as well. Since the 486, Intel x86 CPUs have been pipelined, so several instructions can be executed at the same time, roughly speaking. So you might end up measuring spurious execution.
I recommend you to get familiar with the quantum-like problems of micro-benchmarking. It is not straightforward.
Notice that rdtsc() will return the number of cycles. You have to convert to nanoseconds using the timestamp counter frequency.
Here is one example:
#include <iostream>
#include <cstdint>   // for uint32_t / uint64_t

void dosomething() {
    // yada yada
}

int main() {
    double sum = 0;
    const uint32_t numloops = 100000000;

    for ( uint32_t j = 0; j < numloops; ++j ) {
        uint64_t t0 = __builtin_ia32_rdtsc();
        dosomething();
        uint64_t t1 = __builtin_ia32_rdtsc();
        uint64_t elapsed = t1 - t0;
        sum += elapsed;
    }

    std::cout << "Average:" << sum / numloops << std::endl;
}
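The example above reports averages in cycles, not nanoseconds (the gap noted in the question's second edit). One rough way to convert, assuming an invariant (constant-rate) TSC as found on modern x86_64 CPUs, is to calibrate the counter against clock_gettime() once and divide cycle counts by the resulting ticks-per-nanosecond ratio; this is only a sketch, not part of the answer above:
#include <time.h>
#include <cstdint>

// Estimate TSC ticks per nanosecond by sampling the TSC and CLOCK_MONOTONIC
// across a ~10 ms busy-wait.
double tsc_ticks_per_ns()
{
    timespec ts0, ts1;
    clock_gettime(CLOCK_MONOTONIC, &ts0);
    uint64_t c0 = __builtin_ia32_rdtsc();

    do {
        clock_gettime(CLOCK_MONOTONIC, &ts1);
    } while ((ts1.tv_sec - ts0.tv_sec) * 1000000000LL
             + (ts1.tv_nsec - ts0.tv_nsec) < 10000000LL);   // ~10 ms

    uint64_t c1 = __builtin_ia32_rdtsc();
    double ns = (ts1.tv_sec - ts0.tv_sec) * 1e9 + (ts1.tv_nsec - ts0.tv_nsec);
    return (c1 - c0) / ns;
}

// Usage: double elapsed_ns = elapsed_cycles / tsc_ticks_per_ns();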
This paper is a bit outdated (2010) but it is sufficiently up to date to give you a good introduction to micro-benchmarking:
How to Benchmark Code Execution Times on Intel® IA-32 and IA-64 Instruction Set Architectures
I am still new to C++, is the clock function absolute (meaning it counts how long you sleep for), or is it how much time the application actually executes for?
I want a reliable way to produce exact intervals of 1 second. I am saving files, so I need to account for that. I was returning the runtime for that in milliseconds, and then sleeping for the remainder.
Is there a more accurate or simpler way to do this?
EDIT:
The main problem I am having is that I am getting a negative number:
double FCamera::getRuntime(clock_t* end, clock_t* start)
{
return((double(end - start)/CLOCKS_PER_SEC)*1000);
}
clock_t start = clock();
doWork();
clock_t end = clock();
double runtimeInMilliseconds = getRuntime(&end, &start);
It's giving me a negative number, what's up with that?
Walter
clock() returns the number of clock ticks elapsed since the program was launched. If you want to convert the value returned by clock into seconds divide by CLOCKS_PER_SEC (and multiply for the other way around).
There is just one pitfall: the initial moment of reference used by clock as the beginning of the program execution may vary between platforms. To calculate the actual processing time of a program, the value returned by clock should be compared to a value returned by an initial call to clock.
EDIT
larsman has been so kind as to post other pitfalls in the comments. I have included them here for future reference.
On several other implementations, the value returned by clock() also includes the times of any children whose status has been collected via wait(2) (or another wait-type call). Linux does not include the times of waited-for children in the value returned by clock().
Note that the time can wrap around. On a 32-bit system where CLOCKS_PER_SEC equals 1000000 [as mandated by POSIX] this function will return the same value approximately every 72 minutes.
EDIT2
After messing around a while, here is my portable (Linux/Windows) msleep. Be wary though: I'm not experienced with C/C++, and it will most likely contain the stupidest error ever.
#ifdef _WIN32
#include <windows.h>
#define msleep(ms) Sleep((DWORD) ms)
#else
#include <unistd.h>
inline void msleep(unsigned long ms) {
while (ms--) usleep(1000);
}
#endif
You missed the * (dereference operator); your argument is a pointer (the address of a clock_t variable).
So your code must be modified:
return((double(*end - *start)/CLOCKS_PER_SEC)*1000);
Under Windows, you can use:
VOID WINAPI Sleep(
__in DWORD dwMilliseconds
);
In Linux, you will want to use:
#include <unistd.h>
unsigned int sleep(unsigned int seconds);
Notice the parameter difference - milliseconds under Windows and seconds under Linux.
My approach relies on:
int gettimeofday(struct timeval *tv, struct timezone *tz);
which gives the number of seconds and microseconds since the Epoch. According to the man pages:
The tv argument is a struct timeval (as specified in <sys/time.h>):
struct timeval {
time_t tv_sec; /* seconds */
suseconds_t tv_usec; /* microseconds */
};
So here we go:
#include <sys/time.h>
#include <unistd.h>   // for sleep()
#include <iostream>
#include <iomanip>

static long myclock()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (tv.tv_sec * 1000000) + tv.tv_usec;
}

double getRuntime(long* end, long* start)
{
    return (*end - *start);
}

void doWork()
{
    sleep(3);
}

int main(void)
{
    long start = myclock();
    doWork();
    long end = myclock();

    std::cout << "Time elapsed: " << std::setprecision(6) << getRuntime(&end, &start)/1000.0 << " milliseconds" << std::endl;
    std::cout << "Time elapsed: " << std::setprecision(3) << getRuntime(&end, &start)/1000000.0 << " seconds" << std::endl;

    return 0;
}
Outputs:
Time elapsed: 3000.08 milliseconds
Time elapsed: 3 seconds
I need some way in c++ to keep track of the number of milliseconds since program execution. And I need the precision to be in milliseconds. (In my googling, I've found lots of folks that said to include time.h and then multiply the output of time() by 1000 ... this won't work.)
clock has been suggested a number of times. This has two problems. First of all, it often doesn't have a resolution even close to a millisecond (10-20 ms is probably more common). Second, some implementations of it (e.g., Unix and similar) return CPU time, while others (E.g., Windows) return wall time.
You haven't really said whether you want wall time or CPU time, which makes it hard to give a really good answer. On Windows, you could use GetProcessTimes. That will give you the kernel and user CPU times directly. It will also tell you when the process was created, so if you want milliseconds of wall time since process creation, you can subtract the process creation time from the current time (GetSystemTime). QueryPerformanceCounter has also been mentioned. This has a few oddities of its own -- for example, in some implementations it retrieves time from the CPU's cycle counter, so its frequency varies when/if the CPU speed changes. Other implementations read from the motherboard's 1.024 MHz timer, which does not vary with the CPU speed (and the conditions under which each is used aren't entirely obvious).
On Unix, you can use gettimeofday() to just get the wall time with (at least the possibility of) relatively high precision. If you want time for a process, you can use times() or getrusage() (the latter is newer and gives more complete information that may also be more precise).
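To illustrate the getrusage() route on Unix (a rough sketch, not part of the original answer), this reports the user and system CPU time the process has consumed so far:
#include <sys/resource.h>
#include <iostream>

int main()
{
    // ... run the work you want to account for ...

    rusage usage;
    getrusage(RUSAGE_SELF, &usage);

    double user_ms = usage.ru_utime.tv_sec * 1000.0 + usage.ru_utime.tv_usec / 1000.0;
    double sys_ms  = usage.ru_stime.tv_sec * 1000.0 + usage.ru_stime.tv_usec / 1000.0;
    std::cout << "user CPU: " << user_ms << " ms, system CPU: " << sys_ms << " ms\n";
}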
Bottom line: as I said in my comment, there's no way to get what you want portably. Since you haven't said whether you want CPU time or wall time, even for a specific system, there's not one right answer. The one you've "accepted" (clock()) has the virtue of being available on essentially any system, but what it returns also varies just about the most widely.
See std::clock()
Include time.h, and then use the clock() function. It returns the number of clock ticks elapsed since the program was launched. Just divide it by "CLOCKS_PER_SEC" to obtain the number of seconds, you can then multiply by 1000 to obtain the number of milliseconds.
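In code, that calculation looks roughly like this (a minimal sketch of what the paragraph above describes):
#include <ctime>
#include <iostream>

int main()
{
    std::clock_t start = std::clock();
    // ... code to time ...
    double elapsed_ms = (std::clock() - start) * 1000.0 / CLOCKS_PER_SEC;
    std::cout << elapsed_ms << " ms\n";
}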
Some cross platform solution. This code was used for some kind of benchmarking:
#ifdef WIN32
LARGE_INTEGER g_llFrequency = {0};
BOOL g_bQueryResult = QueryPerformanceFrequency(&g_llFrequency);
#endif
//...
long long osQueryPerfomance()
{
#ifdef WIN32
LARGE_INTEGER llPerf = {0};
QueryPerformanceCounter(&llPerf);
return llPerf.QuadPart * 1000ll / ( g_llFrequency.QuadPart / 1000ll);
#else
struct timeval stTimeVal;
gettimeofday(&stTimeVal, NULL);
return stTimeVal.tv_sec * 1000000ll + stTimeVal.tv_usec;
#endif
}
The most portable way is using the clock function. It usually reports the time that your program has been using the processor, or an approximation thereof. Note however the following:
The resolution is not very good for GNU systems. That's really a pity.
Take care of casting everything to double before doing divisions and assignations.
The counter is held as a 32-bit number on 32-bit GNU systems, which can be pretty annoying for long-running programs.
There are alternatives using "wall time" which give better resolution, both in Windows and Linux. But as the libc manual states: If you're trying to optimize your program or measure its efficiency, it's very useful to know how much processor time it uses. For that, calendar time and elapsed times are useless because a process may spend time waiting for I/O or for other processes to use the CPU.
Here is a C++0x solution and an example why clock() might not do what you think it does.
#include <chrono>
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <unistd.h>   // for sleep()

int main()
{
    // std::chrono::monotonic_clock was renamed std::chrono::steady_clock in C++11
    auto start1 = std::chrono::monotonic_clock::now();
    auto start2 = std::clock();

    sleep(1);
    for( int i=0; i<100000000; ++i);

    auto end1 = std::chrono::monotonic_clock::now();
    auto end2 = std::clock();

    auto delta1 = end1-start1;
    auto delta2 = end2-start2;

    std::cout << "chrono: " << std::chrono::duration_cast<std::chrono::duration<float>>(delta1).count() << std::endl;
    std::cout << "clock: " << static_cast<float>(delta2)/CLOCKS_PER_SEC << std::endl;
}
On my system this outputs:
chrono: 1.36839
clock: 0.36
You'll notice the clock() method is missing a second. An astute observer might also notice that clock() looks to have less resolution. On my system it's ticking by in 12 millisecond increments, terrible resolution.
If you are unable or unwilling to use C++0x, take a look at Boost.DateTime's ptime microsec_clock::universal_time().
This isn't C++ specific (nor portable), but you can do:
SYSTEMTIME systemDT;
In Windows.
From there, you can access each member of the systemDT struct.
You can record the time when the program started and compare the current time to the recorded time (systemDT versus systemDTtemp, for instance).
To refresh, you can call GetLocalTime(&systemDT);
To access each member, you would do systemDT.wHour, systemDT.wMinute, systemDT.wMilliseconds.
See the SYSTEMTIME documentation for more information.
Do you want wall clock time, CPU time, or some other measurement? Also, what platform is this? There is no universally portable way to get more precision than time() and clock() give you, but...
on most Unix systems, you can use gettimeofday() and/or clock_gettime(), which give at least microsecond precision and access to a variety of timers;
I'm not nearly as familiar with Windows, but one of these functions probably does what you want.
You can try this code (taken from the Stockfish chess engine source code (GPL)):
#include <iostream>
#include <cstdint>   // for uint64_t
#if !defined(_WIN32) && !defined(_WIN64) // Linux - Unix
# include <sys/time.h>
typedef timeval sys_time_t;
inline void system_time(sys_time_t* t) {
gettimeofday(t, NULL);
}
inline long long time_to_msec(const sys_time_t& t) {
return t.tv_sec * 1000LL + t.tv_usec / 1000;
}
#else // Windows and MinGW
# include <sys/timeb.h>
typedef _timeb sys_time_t;
inline void system_time(sys_time_t* t) { _ftime(t); }
inline long long time_to_msec(const sys_time_t& t) {
return t.time * 1000LL + t.millitm;
}
#endif
struct Time {
void restart() { system_time(&t); }
uint64_t msec() const { return time_to_msec(t); }
long long elapsed() const {
return (long long)(current_time().msec() - time_to_msec(t));
}
static Time current_time() { Time t; t.restart(); return t; }
private:
sys_time_t t;
};
int main() {
sys_time_t t;
system_time(&t);
long long currentTimeMs = time_to_msec(t);
std::cout << "currentTimeMs:" << currentTimeMs << std::endl;
Time time = Time::current_time();
for (int i = 0; i < 1000000; i++) {
//Do something
}
long long e = time.elapsed();
std::cout << "time elapsed:" << e << std::endl;
getchar(); // wait for keyboard input
}
I wish to calculate the time it took for an API to return a value.
The time taken for such an action is in the space of nanoseconds. As the API is a C++ class/function, I am using <ctime> to calculate it:
#include <ctime>
#include <iostream>
using namespace std;
int main(int argc, char** argv) {
clock_t start;
double diff;
start = clock();
diff = ( std::clock() - start ) / (double)CLOCKS_PER_SEC;
cout<<"printf: "<< diff <<'\n';
return 0;
}
The above code gives the time in seconds. How do I get the same in nano seconds and with more precision?
What others have posted about running the function repeatedly in a loop is correct.
For Linux (and BSD) you want to use clock_gettime().
#include <time.h>   // clock_gettime() and timespec are declared here; link with -lrt on older glibc

int main()
{
    timespec ts;
    // clock_gettime(CLOCK_MONOTONIC, &ts); // Works on FreeBSD
    clock_gettime(CLOCK_REALTIME, &ts); // Works on Linux
}
For Windows you want to use QueryPerformanceCounter. And here is more on QPC
Apparently there is a known issue with QPC on some chipsets, so you may want to make sure you do not have one of those chipsets. Additionally, some dual-core AMDs may also cause a problem. See the second post by sebbbi, where he states:
QueryPerformanceCounter() and
QueryPerformanceFrequency() offer a
bit better resolution, but have
different issues. For example in
Windows XP, all AMD Athlon X2 dual
core CPUs return the PC of either of
the cores "randomly" (the PC sometimes
jumps a bit backwards), unless you
specially install AMD dual core driver
package to fix the issue. We haven't
noticed any other dual+ core CPUs
having similar issues (p4 dual, p4 ht,
core2 dual, core2 quad, phenom quad).
EDIT 2013/07/16:
It looks like there is some controversy on the efficacy of QPC under certain circumstances as stated in http://msdn.microsoft.com/en-us/library/windows/desktop/ee417693(v=vs.85).aspx
...While QueryPerformanceCounter and QueryPerformanceFrequency typically adjust for
multiple processors, bugs in the BIOS or drivers may result in these routines returning
different values as the thread moves from one processor to another...
However this StackOverflow answer https://stackoverflow.com/a/4588605/34329 states that QPC should work fine on any MS OS after Win XP service pack 2.
This article shows that Windows 7 can determine if the processor(s) have an invariant TSC and falls back to an external timer if they don't. http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html Synchronizing across processors is still an issue.
Other fine reading related to timers:
https://blogs.oracle.com/dholmes/entry/inside_the_hotspot_vm_clocks
http://lwn.net/Articles/209101/
http://performancebydesign.blogspot.com/2012/03/high-resolution-clocks-and-timers-for.html
QueryPerformanceCounter Status?
See the comments for more details.
This new answer uses C++11's <chrono> facility. While there are other answers that show how to use <chrono>, none of them shows how to use <chrono> with the RDTSC facility mentioned in several of the other answers here. So I thought I would show how to use RDTSC with <chrono>. Additionally I'll demonstrate how you can templatize the testing code on the clock so that you can rapidly switch between RDTSC and your system's built-in clock facilities (which will likely be based on clock(), clock_gettime() and/or QueryPerformanceCounter.
Note that the RDTSC instruction is x86-specific. QueryPerformanceCounter is Windows only. And clock_gettime() is POSIX only. Below I introduce two new clocks: std::chrono::high_resolution_clock and std::chrono::system_clock, which, if you can assume C++11, are now cross-platform.
First, here is how you create a C++11-compatible clock out of the Intel rdtsc assembly instruction. I'll call it x::clock:
#include <chrono>
namespace x
{

struct clock
{
    typedef unsigned long long                 rep;
    typedef std::ratio<1, 2800000000>          period; // My machine is 2.8 GHz
    typedef std::chrono::duration<rep, period> duration;
    typedef std::chrono::time_point<clock>     time_point;
    static const bool is_steady = true;

    static time_point now() noexcept
    {
        unsigned lo, hi;
        asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
        return time_point(duration(static_cast<rep>(hi) << 32 | lo));
    }
};

}  // x
All this clock does is count CPU cycles and store it in an unsigned 64-bit integer. You may need to tweak the assembly language syntax for your compiler. Or your compiler may offer an intrinsic you can use instead (e.g. now() {return __rdtsc();}).
To build a clock you have to give it the representation (storage type). You must also supply the clock period, which must be a compile time constant, even though your machine may change clock speed in different power modes. And from those you can easily define your clock's "native" time duration and time point in terms of these fundamentals.
If all you want to do is output the number of clock ticks, it doesn't really matter what number you give for the clock period. This constant only comes into play if you want to convert the number of clock ticks into some real-time unit such as nanoseconds. And in that case, the more accurate you are able to supply the clock speed, the more accurate will be the conversion to nanoseconds, (milliseconds, whatever).
Below is example code which shows how to use x::clock. Actually I've templated the code on the clock as I'd like to show how you can use many different clocks with the exact same syntax. This particular test is showing what the looping overhead is when running what you want to time under a loop:
#include <iostream>

template <class clock>
void
test_empty_loop()
{
    // Define real time units
    typedef std::chrono::duration<unsigned long long, std::pico> picoseconds;
    // or:
    // typedef std::chrono::nanoseconds nanoseconds;

    // Define double-based unit of clock tick
    typedef std::chrono::duration<double, typename clock::period> Cycle;
    using std::chrono::duration_cast;
    const int N = 100000000;

    // Do it
    auto t0 = clock::now();
    for (int j = 0; j < N; ++j)
        asm volatile("");
    auto t1 = clock::now();

    // Get the clock ticks per iteration
    auto ticks_per_iter = Cycle(t1-t0)/N;
    std::cout << ticks_per_iter.count() << " clock ticks per iteration\n";

    // Convert to real time units
    std::cout << duration_cast<picoseconds>(ticks_per_iter).count()
              << "ps per iteration\n";
}
The first thing this code does is create a "real time" unit to display the results in. I've chosen picoseconds, but you can choose any units you like, either integral or floating point based. As an example there is a pre-made std::chrono::nanoseconds unit I could have used.
As another example I want to print out the average number of clock cycles per iteration as a floating point, so I create another duration, based on double, that has the same units as the clock's tick does (called Cycle in the code).
The loop is timed with calls to clock::now() on either side. If you want to name the type returned from this function it is:
typename clock::time_point t0 = clock::now();
(as clearly shown in the x::clock example, and is also true of the system-supplied clocks).
To get a duration in terms of floating point clock ticks one merely subtracts the two time points, and to get the per iteration value, divide that duration by the number of iterations.
You can get the count in any duration by using the count() member function. This returns the internal representation. Finally I use std::chrono::duration_cast to convert the duration Cycle to the duration picoseconds and print that out.
To use this code is simple:
int main()
{
    std::cout << "\nUsing rdtsc:\n";
    test_empty_loop<x::clock>();

    std::cout << "\nUsing std::chrono::high_resolution_clock:\n";
    test_empty_loop<std::chrono::high_resolution_clock>();

    std::cout << "\nUsing std::chrono::system_clock:\n";
    test_empty_loop<std::chrono::system_clock>();
}
Above I exercise the test using our home-made x::clock, and compare those results with using two of the system-supplied clocks: std::chrono::high_resolution_clock and std::chrono::system_clock. For me this prints out:
Using rdtsc:
1.72632 clock ticks per iteration
616ps per iteration
Using std::chrono::high_resolution_clock:
0.620105 clock ticks per iteration
620ps per iteration
Using std::chrono::system_clock:
0.00062457 clock ticks per iteration
624ps per iteration
This shows that each of these clocks has a different tick period, as the ticks per iteration is vastly different for each clock. However when converted to a known unit of time (e.g. picoseconds), I get approximately the same result for each clock (your mileage may vary).
Note how my code is completely free of "magic conversion constants". Indeed, there are only two magic numbers in the entire example:
The clock speed of my machine in order to define x::clock.
The number of iterations to test over. If changing this number makes your results vary greatly, then you should probably make the number of iterations higher, or empty your computer of competing processes while testing.
With that level of accuracy, it would be better to reason in CPU ticks rather than in system calls like clock(). And do not forget that if it takes more than one nanosecond to execute an instruction... having nanosecond accuracy is pretty much impossible.
Still, something like that is a start:
Here's the actual code to retrieve the number of 80x86 CPU clock ticks elapsed since the CPU was last started. It will work on Pentium and above (the 386/486 are not supported). This code is MS Visual C++ specific, but can probably be ported very easily to anything else, as long as it supports inline assembly.
inline __int64 GetCpuClocks()
{
// Counter
struct { int32 low, high; } counter;
// Use RDTSC instruction to get clocks count
__asm push EAX
__asm push EDX
__asm __emit 0fh __asm __emit 031h // RDTSC
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX
// Return result
return *(__int64 *)(&counter);
}
This function also has the advantage of being extremely fast - it usually takes no more than 50 CPU cycles to execute.
Using the Timing Figures:
If you need to translate the clock counts into true elapsed time, divide the results by your chip's clock speed. Remember that the "rated" GHz is likely to be slightly different from the actual speed of your chip. To check your chip's true speed, you can use several very good utilities or the Win32 call, QueryPerformanceFrequency().
To do this correctly you can go one of two ways: either use RDTSC or use clock_gettime().
The second is about 2 times faster and has the advantage of giving the right absolute time. Note that for RDTSC to work correctly you need to use it as indicated (other comments on this page have errors, and may yield incorrect timing values on certain processors)
inline uint64_t rdtsc()
{
uint32_t lo, hi;
__asm__ __volatile__ (
"xorl %%eax, %%eax\n"
"cpuid\n"
"rdtsc\n"
: "=a" (lo), "=d" (hi)
:
: "%ebx", "%ecx" );
return (uint64_t)hi << 32 | lo;
}
and for clock_gettime: (I chose microsecond resolution arbitrarily)
#include <time.h>
#include <sys/timeb.h>
// needs -lrt (real-time lib)
// 1970-01-01 epoch UTC time, 1 mcs resolution (divide by 1M to get time_t)
uint64_t ClockGetTime()
{
timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return (uint64_t)ts.tv_sec * 1000000LL + (uint64_t)ts.tv_nsec / 1000LL;
}
the timing and values produced:
Absolute values:
rdtsc = 4571567254267600
clock_gettime = 1278605535506855
Processing time: (10000000 runs)
rdtsc = 2292547353
clock_gettime = 1031119636
I am using the following to get the desired results:
#include <time.h>
#include <iostream>
using namespace std;
int main (int argc, char** argv)
{
// reset the clock
timespec tS;
tS.tv_sec = 0;
tS.tv_nsec = 0;
clock_settime(CLOCK_PROCESS_CPUTIME_ID, &tS);
...
... <code to check for the time to be put here>
...
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &tS);
cout << "Time taken is: " << tS.tv_sec << " " << tS.tv_nsec << endl;
return 0;
}
For C++11, here is a simple wrapper:
#include <iostream>
#include <chrono>
class Timer
{
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const {
        return std::chrono::duration_cast<second_>
            (clock_::now() - beg_).count(); }

private:
    typedef std::chrono::high_resolution_clock clock_;
    typedef std::chrono::duration<double, std::ratio<1> > second_;
    std::chrono::time_point<clock_> beg_;
};
Or for C++03 on *nix,
class Timer
{
public:
    Timer() { clock_gettime(CLOCK_REALTIME, &beg_); }

    double elapsed() {
        clock_gettime(CLOCK_REALTIME, &end_);
        return end_.tv_sec - beg_.tv_sec +
            (end_.tv_nsec - beg_.tv_nsec) / 1000000000.;
    }

    void reset() { clock_gettime(CLOCK_REALTIME, &beg_); }

private:
    timespec beg_, end_;
};
Example of usage:
int main()
{
    Timer tmr;
    double t = tmr.elapsed();
    std::cout << t << std::endl;

    tmr.reset();
    t = tmr.elapsed();
    std::cout << t << std::endl;

    return 0;
}
From https://gist.github.com/gongzhitaao/7062087
In general, for timing how long it takes to call a function, you want to do it many more times than just once. If you call your function only once and it takes a very short time to run, you still have the overhead of actually calling the timer functions and you don't know how long that takes.
For example, if you estimate your function might take 800 ns to run, call it in a loop ten million times (which will then take about 8 seconds). Divide the total time by ten million to get the time per call.
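A minimal sketch of that idea using std::chrono; my_function here is only a stand-in for whatever you are measuring, and note that a call with no observable work may be elided by the optimizer:
#include <chrono>
#include <iostream>

void my_function() { /* placeholder for the ~800 ns function under test */ }

int main()
{
    const long long N = 10000000;   // enough iterations to dwarf the timer overhead
    auto t0 = std::chrono::steady_clock::now();
    for (long long i = 0; i < N; ++i)
        my_function();
    auto t1 = std::chrono::steady_clock::now();

    auto total_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::cout << total_ns / N << " ns per call (average)\n";
}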
You can use the following function with gcc running under x86 processors:
unsigned long long rdtsc()
{
#define rdtsc(low, high) \
__asm__ __volatile__("rdtsc" : "=a" (low), "=d" (high))
unsigned int low, high;
rdtsc(low, high);
return ((unsigned long long)high << 32) | low;
}
with Digital Mars C++:
unsigned long long rdtsc()
{
_asm
{
rdtsc
}
}
which reads the high performance timer on the chip. I use this when doing profiling.
If you need subsecond precision, you need to use system-specific extensions, and will have to check with the documentation for the operating system. POSIX supports up to microseconds with gettimeofday, but nothing more precise since computers didn't have frequencies above 1GHz.
If you are using Boost, you can check boost::posix_time.
I'm using Borland; here is the code. ti_hund sometimes gives me a negative number, but the timing is fairly good.
#include <dos.h>
#include <stdio.h>   // printf
#include <conio.h>   // getch
void main()
{
struct time t;
int Hour,Min,Sec,Hun;
gettime(&t);
Hour=t.ti_hour;
Min=t.ti_min;
Sec=t.ti_sec;
Hun=t.ti_hund;
printf("Start time is: %2d:%02d:%02d.%02d\n",
t.ti_hour, t.ti_min, t.ti_sec, t.ti_hund);
....
your code to time
...
// read the time here remove Hours and min if the time is in sec
gettime(&t);
printf("\nTid Hour:%d Min:%d Sec:%d Hundreds:%d\n",t.ti_hour-Hour,
t.ti_min-Min,t.ti_sec-Sec,t.ti_hund-Hun);
printf("\n\nAlt Ferdig Press a Key\n\n");
getch();
} // end main
Using Brock Adams's method, with a simple class:
int get_cpu_ticks()
{
LARGE_INTEGER ticks;
QueryPerformanceFrequency(&ticks);
return ticks.LowPart;
}
__int64 get_cpu_clocks()
{
struct { int32 low, high; } counter;
__asm push EAX
__asm push EDX
__asm cpuid   // serialize the instruction stream before reading the counter
__asm rdtsc
__asm mov counter.low, EAX
__asm mov counter.high, EDX
__asm pop EDX
__asm pop EAX
return *(__int64 *)(&counter);
}
class cbench
{
public:
cbench(const char *desc_in)
: desc(strdup(desc_in)), start(get_cpu_clocks()) { }
~cbench()
{
printf("%s took: %.4f ms\n", desc, (float)(get_cpu_clocks()-start)/get_cpu_ticks());
if(desc) free(desc);
}
private:
char *desc;
__int64 start;
};
Usage Example:
int main()
{
{
cbench c("test");
... code ...
}
return 0;
}
Result:
test took: 0.0002 ms
Has some function call overhead, but should be still more than fast enough :)
You can use Embedded Profiler (free for Windows and Linux) which has an interface to a multiplatform timer (in a processor cycle count) and can give you a number of cycles per seconds:
EProfilerTimer timer;
timer.Start();
... // Your code here
const uint64_t number_of_elapsed_cycles = timer.Stop();
const uint64_t nano_seconds_elapsed =
    number_of_elapsed_cycles / (double) timer.GetCyclesPerSecond() * 1000000000;
Recalculation of cycle count to time is possibly a dangerous operation with modern processors where CPU frequency can be changed dynamically. Therefore to be sure that converted times are correct, it is necessary to fix processor frequency before profiling.
If this is for Linux, I've been using the function "gettimeofday", which returns a struct that gives the seconds and microseconds since the Epoch. You can then use timersub to subtract the two to get the difference in time, and convert it to whatever precision of time you want. However, you specify nanoseconds, and it looks like the function clock_gettime() is what you're looking for. It puts the time in terms of seconds and nanoseconds into the structure you pass into it.
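A rough sketch of the gettimeofday()/timersub() combination described above (timersub is a glibc/BSD macro from <sys/time.h>, so this is not strictly portable):
#include <sys/time.h>
#include <iostream>

int main()
{
    timeval start, end, diff;
    gettimeofday(&start, NULL);
    // ... code to time ...
    gettimeofday(&end, NULL);

    timersub(&end, &start, &diff);   // diff = end - start
    long long us = diff.tv_sec * 1000000LL + diff.tv_usec;
    std::cout << us << " microseconds\n";
}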
What do you think about that:
int iceu_system_GetTimeNow(long long int *res)
{
static struct timespec buffer;
//
#ifdef __CYGWIN__
if (clock_gettime(CLOCK_REALTIME, &buffer))
return 1;
#else
if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &buffer))
return 1;
#endif
*res=(long long int)buffer.tv_sec * 1000000000LL + (long long int)buffer.tv_nsec;
return 0;
}
Here is a nice Boost timer that works well:
//Stopwatch.hpp
#ifndef STOPWATCH_HPP
#define STOPWATCH_HPP
//Boost
#include <boost/chrono.hpp>
//Std
#include <cstdint>
class Stopwatch
{
public:
    Stopwatch();
    virtual ~Stopwatch();
    void Restart();
    std::uint64_t Get_elapsed_ns();
    std::uint64_t Get_elapsed_us();
    std::uint64_t Get_elapsed_ms();
    std::uint64_t Get_elapsed_s();

private:
    boost::chrono::high_resolution_clock::time_point _start_time;
};
#endif // STOPWATCH_HPP
//Stopwatch.cpp
#include "Stopwatch.hpp"
Stopwatch::Stopwatch():
_start_time(boost::chrono::high_resolution_clock::now()) {}
Stopwatch::~Stopwatch() {}
void Stopwatch::Restart()
{
_start_time = boost::chrono::high_resolution_clock::now();
}
std::uint64_t Stopwatch::Get_elapsed_ns()
{
boost::chrono::nanoseconds nano_s = boost::chrono::duration_cast<boost::chrono::nanoseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(nano_s.count());
}
std::uint64_t Stopwatch::Get_elapsed_us()
{
boost::chrono::microseconds micro_s = boost::chrono::duration_cast<boost::chrono::microseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(micro_s.count());
}
std::uint64_t Stopwatch::Get_elapsed_ms()
{
boost::chrono::milliseconds milli_s = boost::chrono::duration_cast<boost::chrono::milliseconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(milli_s.count());
}
std::uint64_t Stopwatch::Get_elapsed_s()
{
boost::chrono::seconds sec = boost::chrono::duration_cast<boost::chrono::seconds>(boost::chrono::high_resolution_clock::now() - _start_time);
return static_cast<std::uint64_t>(sec.count());
}
Minimalistic copy&paste-struct + lazy usage
If the idea is to have a minimalistic struct that you can use for quick tests, then I suggest you just copy and paste anywhere in your C++ file right after the #include's. This is the only instance in which I sacrifice Allman-style formatting.
You can easily adjust the precision in the first line of the struct. Possible values are: nanoseconds, microseconds, milliseconds, seconds, minutes, or hours.
#include <chrono>
#include <vector>
#include <iostream>

struct MeasureTime
{
    using precision = std::chrono::microseconds;
    std::vector<std::chrono::steady_clock::time_point> times;
    std::chrono::steady_clock::time_point oneLast;

    void p() {
        std::cout << "Mark "
                  << times.size()/2
                  << ": "
                  << std::chrono::duration_cast<precision>(times.back() - oneLast).count()
                  << std::endl;
    }

    void m() {
        oneLast = times.back();
        times.push_back(std::chrono::steady_clock::now());
    }

    void t() {
        m();
        p();
        m();
    }

    MeasureTime() {
        times.push_back(std::chrono::steady_clock::now());
    }
};
Usage
MeasureTime m; // first time is already in memory
doFnc1();
m.t(); // Mark 1: next time, and print difference with previous mark
doFnc2();
m.t(); // Mark 2: next time, and print difference with previous mark
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.t(); // prints 'Mark 3: 123123' etc...
Standard output result
Mark 1: 123
Mark 2: 32
Mark 3: 433234
If you want summary after execution
If you want the report afterwards, because for example your code in between also writes to standard output. Then add the following function to the struct (just before MeasureTime()):
void s() { // summary
    int i = 0;
    std::chrono::steady_clock::time_point tprev;
    for(auto tcur : times)
    {
        if(i > 0)
        {
            std::cout << "Mark " << i << ": "
                      << std::chrono::duration_cast<precision>(tcur - tprev).count()
                      << std::endl;
        }
        tprev = tcur;
        ++i;
    }
}
So then you can just use:
MeasureTime m;
doFnc1();
m.m();
doFnc2();
m.m();
doStuff = doMoreStuff();
andDoItAgain = doStuff.aoeuaoeu();
m.m();
m.s();
Which will list all the marks just like before, but then after the other code is executed. Note that you shouldn't use both m.s() and m.t().
plf::nanotimer is a lightweight option for this. It works on Windows, Linux, Mac, and BSD, etc., and has ~microsecond accuracy depending on the OS:
#include "plf_nanotimer.h"
#include <iostream>
int main(int argc, char** argv)
{
    plf::nanotimer timer;

    timer.start();

    // Do something here

    double results = timer.get_elapsed_ns();
    std::cout << "Timing: " << results << " nanoseconds." << std::endl;

    return 0;
}