I've been writing an OpenCL program that I first wrote on my MacBook Pro, and since my desktop computer is stronger I wanted to port the code and see if there is any improvement.
The same code ran for:
Mac: 0.055452s
Win7: 0.359s
The specifications of both computers are:
Mac : 2.6GHz Intel Core i5, 8GB 1600MHz DDR3, Intel Iris 1536MB
PC : 3.3GHz Intel Core i5-2500k, 8GB 1600MHz DDR3, AMD Radeon HD 6900 Series
Now as you can see the code ran on my Mac almost 10x faster than on my desktop PC.
I timed the code using
#include <ctime>
clock_t begin = clock();
....// Entire main file
float timeTaken = (float)(clock() - begin) / CLOCKS_PER_SEC;
cout << "Time taken: " << timeTaken << endl;
If I am not mistaken, both the CPU and the GPU are stronger on the PC. I was able to run Battlefield 3 on Ultra settings with this desktop computer.
The only difference might be that Visual Studio on the PC compiles with a different compiler? I used g++ on my Mac; I'm not sure what Visual Studio uses.
These results don't make sense to me. What do you guys think? If you want to check out the code, I can post the GitHub link.
EDIT: The following github link shows the code https://github.com/Batkow/OpenCL/tree/master/OpenCL . PSO_V2 uses the type of coding used in the tutorial from : https://www.fixstars.com/en/opencl/book/OpenCLProgrammingBook/introduction-to-parallelization/
And PSO simplifies the coding using the custom headers from this github repo: https://github.com/HandsOnOpenCL/Exercises-Solutions ..
I ran the code on my friend's new i7 laptop with an NVIDIA GeForce 950M and the code executed even slower than on my desktop PC.
I do realize that the code isn't optimized, so if I'm doing anything stupid, please call it out. For instance, having a while loop spanning three different kernel functions is kind of stupid, right? I'm working on implementing all of it inside a single kernel and looping inside it, which should improve performance?
UPDATE: Ran the OpenCL/PSO code on the Windows machine at home again. Timing only the section from just before to just after the while loop gives WINDOWS the faster performance, yay!
With clock_t: Win7 = 0.027 s and Mac = 0.036 s. Using the external .hpp with the Util::Timer class: Win7 ran in 0.026 s while the Mac took 0.085 s.
Timing from the start of the main file to right before the while loop (all of the initialization), the Mac scores better than Windows by almost 10x, using both clock_t and Util::Timer. So the bottleneck seems to be the initialization of the device?
It could be dozens of things: what the CL kernel does would be key, and how well it maps to different types of GPUs. Or which compiler is used.
However, I think the problem is how you are measuring time. clock() on Windows measures wall-clock time (in other words, elapsed time); on OS X (and all other sane OSes) it reports CPU time for your process. If the OpenCL work runs on the graphics processor [or in a separate process], it won't count as CPU time, whereas Windows measures the overall elapsed time.
Either measure CPU time on Windows with an appropriate API (GetProcessTimes, for example), or use C++ std::chrono to measure wall-clock time in both places.
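For example, a minimal std::chrono sketch; where you call clFinish is just an assumption about where your OpenCL work ends, so adjust it to your code:
#include <chrono>
#include <iostream>

int main()
{
    auto t0 = std::chrono::steady_clock::now();
    // ... host setup, kernel launches, then e.g. clFinish(queue) so the GPU work is done ...
    auto t1 = std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = t1 - t0; // wall-clock seconds
    std::cout << "Elapsed: " << elapsed.count() << " s\n";
}
steady_clock measures wall-clock time on every platform, so the Mac and Windows numbers become comparable.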
Maybe the problem is the way you measure elapsed time. For example, I implemented three different methods for different OSes in my project:
#include <cstdio>
#include <iostream>
#if defined(_WIN32) || defined(_WIN64)
#include <windows.h>
#elif defined(__linux__)
#include <time.h>
#elif defined(__APPLE__)
#include <mach/mach.h>
#include <mach/mach_time.h>
#include <stddef.h>
#endif
#if defined(_WIN32) || defined(_WIN64)
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t0, t1, t2; // ticks
LARGE_INTEGER t3, t4;
#elif defined(__linux__)
timespec t0, t1, t2;
timespec t3, t4;
double us;
#elif defined(__APPLE__)
uint64_t t0, t1, t2; // mach_absolute_time() returns uint64_t
#endif
double elapsedTime;
void refreshTime() {
#if defined(_WIN32) || defined(_WIN64)
    QueryPerformanceFrequency(&frequency); // get ticks per second
    QueryPerformanceCounter(&t1);          // start timer
    t0 = t1;
#elif defined(__linux__)
    clock_gettime(CLOCK_MONOTONIC_RAW, &t1);
    t0 = t1;
#elif defined(__APPLE__)
    t1 = mach_absolute_time();
    t0 = t1;
#endif
}

void watch_report(const char *str) {
#if defined(_WIN32) || defined(_WIN64)
    QueryPerformanceCounter(&t2);
    printf(str, (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart);
    t1 = t2;
    elapsedTime = (t2.QuadPart - t0.QuadPart) * 1000.0 / frequency.QuadPart;
#elif defined(__linux__)
    clock_gettime(CLOCK_MONOTONIC_RAW, &t2);
    time_t sec = t2.tv_sec - t1.tv_sec;
    long nsec;
    if (t2.tv_nsec >= t1.tv_nsec) {
        nsec = t2.tv_nsec - t1.tv_nsec;
    } else {
        nsec = 1000000000 - (t1.tv_nsec - t2.tv_nsec);
        sec -= 1;
    }
    printf(str, (float)sec * 1000.f + (float)nsec / 1000000.f);
    t1 = t2;
    elapsedTime = (float)(t2.tv_sec - t0.tv_sec) * 1000.f +
                  (float)(t2.tv_nsec - t0.tv_nsec) / 1000000.f;
#elif defined(__APPLE__)
    uint64_t elapsedNano;
    static mach_timebase_info_data_t sTimebaseInfo;
    if (sTimebaseInfo.denom == 0) {
        (void)mach_timebase_info(&sTimebaseInfo);
    }
    t2 = mach_absolute_time();
    elapsedNano = (t2 - t1) * sTimebaseInfo.numer / sTimebaseInfo.denom;
    printf(str, (float)elapsedNano / 1000000.f);
    t1 = t2;
    elapsedNano = (t2 - t0) * sTimebaseInfo.numer / sTimebaseInfo.denom;
    elapsedTime = (float)elapsedNano / 1000000.f;
#endif
}

/* This function will work till you press q */
void someFunction() {
    while (1) {
        char ch = std::cin.get();
        if (ch == 'q')
            break;
    }
}

int main() {
    refreshTime();
    someFunction();
    watch_report("some function was working: \t%9.3f ms\n");
}
It's because the clock() function counts only the CPU time used by your process. Your host code could be invoking the kernel and then going to sleep, and that sleep time won't be counted by clock() even while your kernel is executing. In other words, clock() considers only the execution time of the host code and not the OpenCL kernel. You need to use a function that measures wall-clock time rather than CPU time.
Visual Studio uses the MSVC compiler unless you specify otherwise.
I think the answer is hidden in your CPUs' generations. On your PC it is Sandy Bridge (2nd gen) and on the Mac it is Haswell (4th gen). That is a two-generation difference.
OpenCL is one of the things that evolved significantly across these generations of Intel processors (extensive hardware support in Haswell).
Just to get proof, find a friend with a desktop equipped with a Haswell CPU and run your tests. A desktop Haswell processor should beat your Mac's (provided, of course, that the other hardware specs and the overall system load match).
You've posted your timing functions but not the OpenCL code. What exactly is being timed? A big factor might be the kernel compilation time (clCreateProgramWithSource and clBuildProgram). Your PC might be using cached kernels while your Mac isn't. The proper way to measure OpenCL kernel execution time is to use OpenCL events.
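As a rough sketch of the event approach (the queue must be created with CL_QUEUE_PROFILING_ENABLE; queue, kernel and global here are placeholders for objects your code already sets up):
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

// Times a single 1-D kernel launch using event profiling info.
double kernel_time_ms(cl_command_queue queue, cl_kernel kernel, size_t global)
{
    cl_event evt;
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, &evt);
    clWaitForEvents(1, &evt);

    cl_ulong t_start = 0, t_end = 0;
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START, sizeof(t_start), &t_start, NULL);
    clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END, sizeof(t_end), &t_end, NULL);
    clReleaseEvent(evt);
    return (t_end - t_start) * 1e-6; // timestamps are in nanoseconds
}
This times only the kernel itself, separately from program build and buffer setup.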
Another possible reason for these results: maybe your program compiles for OpenCL 2.0.
On Windows, your GPU is from 2010 and only supports OpenCL 1.2.
On OSX, your Intel GPU supports OpenCL 2.0.
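If you want to verify this, a small sketch along these lines (the device and program handles are assumed to already exist in your code):
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif
#include <cstdio>

void report_version_and_build(cl_device_id device, cl_program program)
{
    char version[256] = {0};
    clGetDeviceInfo(device, CL_DEVICE_VERSION, sizeof(version), version, NULL);
    printf("Device reports: %s\n", version); // e.g. "OpenCL 1.2 AMD-APP (...)"

    // Request the same kernel language on both machines (both must support it):
    clBuildProgram(program, 1, &device, "-cl-std=CL1.2", NULL, NULL);
}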
Related
I'm looking to implement a simple timer mechanism in C++. The code should work in Windows and Linux. The resolution should be as precise as possible (at least millisecond accuracy). This will be used to simply track the passage of time, not to implement any kind of event-driven design. What is the best tool to accomplish this?
Updated answer for an old question:
In C++11 you can portably get to the highest resolution timer with:
#include <iostream>
#include <chrono>
#include "chrono_io"
int main()
{
typedef std::chrono::high_resolution_clock Clock;
auto t1 = Clock::now();
auto t2 = Clock::now();
std::cout << t2-t1 << '\n';
}
Example output:
74 nanoseconds
"chrono_io" is an extension to ease I/O issues with these new types and is freely available here.
There is also an implementation of <chrono> available in boost (might still be on tip-of-trunk, not sure it has been released).
Update
This is in response to Ben's comment below that subsequent calls to std::chrono::high_resolution_clock take several milliseconds in VS11. Below is a <chrono>-compatible workaround. However, it only works on Intel hardware, you need to dip into inline assembly (the syntax for which varies with the compiler), and you have to hardwire the machine's clock speed into the clock:
#include <chrono>
#include <cassert>       // assert() in check_invariants()
#include <sys/types.h>
#include <sys/sysctl.h>  // sysctl(), CTL_HW, HW_CPU_FREQ (OS X)
struct clock
{
typedef unsigned long long rep;
typedef std::ratio<1, 2800000000> period; // My machine is 2.8 GHz
typedef std::chrono::duration<rep, period> duration;
typedef std::chrono::time_point<clock> time_point;
static const bool is_steady = true;
static time_point now() noexcept
{
unsigned lo, hi;
asm volatile("rdtsc" : "=a" (lo), "=d" (hi));
return time_point(duration(static_cast<rep>(hi) << 32 | lo));
}
private:
static
unsigned
get_clock_speed()
{
int mib[] = {CTL_HW, HW_CPU_FREQ};
const std::size_t namelen = sizeof(mib)/sizeof(mib[0]);
unsigned freq;
size_t freq_len = sizeof(freq);
if (sysctl(mib, namelen, &freq, &freq_len, nullptr, 0) != 0)
return 0;
return freq;
}
static
bool
check_invariants()
{
static_assert(1 == period::num, "period must be 1/freq");
assert(get_clock_speed() == period::den);
static_assert(std::is_same<rep, duration::rep>::value,
"rep and duration::rep must be the same type");
static_assert(std::is_same<period, duration::period>::value,
"period and duration::period must be the same type");
static_assert(std::is_same<duration, time_point::duration>::value,
"duration and time_point::duration must be the same type");
return true;
}
static const bool invariants;
};
const bool clock::invariants = clock::check_invariants();
So it isn't portable. But if you want to experiment with a high resolution clock on your own intel hardware, it doesn't get finer than this. Though be forewarned, today's clock speeds can dynamically change (they aren't really a compile-time constant). And with a multiprocessor machine you can even get time stamps from different processors. But still, experiments on my hardware work fairly well. If you're stuck with millisecond resolution, this could be a workaround.
This clock has a duration in terms of your cpu's clock speed (as you reported it). I.e. for me this clock ticks once every 1/2,800,000,000 of a second. If you want to, you can convert this to nanoseconds (for example) with:
using std::chrono::nanoseconds;
using std::chrono::duration_cast;
auto t0 = clock::now();
auto t1 = clock::now();
nanoseconds ns = duration_cast<nanoseconds>(t1-t0);
The conversion will truncate fractions of a cpu cycle to form the nanosecond. Other rounding modes are possible, but that's a different topic.
For me this will return a duration as low as 18 clock ticks, which truncates to 6 nanoseconds.
I've added some "invariant checking" to the above clock, the most important of which is checking that the clock::period is correct for the machine. Again, this is not portable code, but if you're using this clock, you've already committed to that. The private get_clock_speed() function shown here gets the maximum cpu frequency on OS X, and that should be the same number as the constant denominator of clock::period.
Adding this will save you a little debugging time when you port this code to your new machine and forget to update the clock::period to the speed of your new machine. All of the checking is done either at compile-time or at program startup time. So it won't impact the performance of clock::now() in the least.
For C++03:
Boost.Timer might work, but it depends on the C function clock and so may not have good enough resolution for you.
Boost.Date_Time includes a ptime class that's been recommended on Stack Overflow before. See its docs on microsec_clock::local_time and microsec_clock::universal_time, but note its caveat that "Win32 systems often do not achieve microsecond resolution via this API."
STLsoft provides, among other things, thin cross-platform (Windows and Linux/Unix) C++ wrappers around OS-specific APIs. Its performance library has several classes that would do what you need. (To make it cross platform, pick a class like performance_counter that exists in both the winstl and unixstl namespaces, then use whichever namespace matches your platform.)
For C++11 and above:
The std::chrono library has this functionality built in. See this answer by #HowardHinnant for details.
Matthew Wilson's STLSoft libraries provide several timer types, with congruent interfaces so you can plug-and-play. Amongst the offerings are timers that are low-cost but low-resolution, and ones that are high-resolution but have high cost. There are also ones for measuring per-thread times and for measuring per-process times, as well as ones that measure elapsed times.
There's an exhaustive article covering it in Dr. Dobb's from some years ago, although it only covers the Windows ones, those defined in the WinSTL sub-project. STLSoft also provides for UNIX timers in the UNIXSTL sub-project, and you can use the "PlatformSTL" one, which includes the UNIX or Windows one as appropriate, as in:
#include <platformstl/performance/performance_counter.hpp>
#include <iostream>
int main()
{
platformstl::performance_counter c;
c.start();
for(int i = 0; i < 1000000000; ++i);
c.stop();
std::cout << "time (s): " << c.get_seconds() << std::endl;
std::cout << "time (ms): " << c.get_milliseconds() << std::endl;
std::cout << "time (us): " << c.get_microseconds() << std::endl;
}
HTH
The STLSoft open source library provides quite a good timer on both Windows and Linux platforms. If you want to implement one on your own, just have a look at their sources.
The ACE library has portable high resolution timers also.
Doxygen for high res timer:
http://www.dre.vanderbilt.edu/Doxygen/5.7.2/html/ace/a00244.html
I have seen this implemented a few times as closed-source in-house solutions .... which all resorted to #ifdef solutions around native Windows hi-res timers on the one hand and Linux kernel timers using struct timeval (see man timeradd) on the other hand.
You can abstract this and a few Open Source projects have done it -- the last one I looked at was the CoinOR class CoinTimer but there are surely more of them.
I highly recommend the boost::posix_time library for that. It supports timers in various resolutions, down to microseconds I believe.
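For example, a minimal sketch with Boost.Date_Time (assuming the Boost headers are available):
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main()
{
    using namespace boost::posix_time;
    ptime t0 = microsec_clock::universal_time();
    // ... work to be measured ...
    ptime t1 = microsec_clock::universal_time();
    time_duration d = t1 - t0;
    std::cout << d.total_microseconds() << " us\n";
}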
SDL2 has an excellent cross-platform high-resolution timer. If however you need sub-millisecond accuracy, I wrote a very small cross-platform timer library here.
It is compatible with both C++03 and C++11/higher versions of C++.
I found this, which looks promising and is extremely straightforward; I'm not sure if there are any drawbacks:
https://gist.github.com/ForeverZer0/0a4f80fc02b96e19380ebb7a3debbee5
/* ----------------------------------------------------------------------- */
/*
Easy embeddable cross-platform high resolution timer function. For each
platform we select the high resolution timer. You can call the 'ns()'
function in your file after embedding this.
*/
#include <stdint.h>
#if defined(__linux)
# define HAVE_POSIX_TIMER
# include <time.h>
# ifdef CLOCK_MONOTONIC
# define CLOCKID CLOCK_MONOTONIC
# else
# define CLOCKID CLOCK_REALTIME
# endif
#elif defined(__APPLE__)
# define HAVE_MACH_TIMER
# include <mach/mach_time.h>
#elif defined(_WIN32)
# define WIN32_LEAN_AND_MEAN
# include <windows.h>
#endif
static uint64_t ns() {
static uint64_t is_init = 0;
#if defined(__APPLE__)
static mach_timebase_info_data_t info;
if (0 == is_init) {
mach_timebase_info(&info);
is_init = 1;
}
uint64_t now;
now = mach_absolute_time();
now *= info.numer;
now /= info.denom;
return now;
#elif defined(__linux)
static struct timespec linux_rate;
if (0 == is_init) {
clock_getres(CLOCKID, &linux_rate);
is_init = 1;
}
uint64_t now;
struct timespec spec;
clock_gettime(CLOCKID, &spec);
now = (uint64_t) spec.tv_sec * 1000000000ULL + (uint64_t) spec.tv_nsec; /* keep the math in 64-bit integers */
return now;
#elif defined(_WIN32)
static LARGE_INTEGER win_frequency;
if (0 == is_init) {
QueryPerformanceFrequency(&win_frequency);
is_init = 1;
}
LARGE_INTEGER now;
QueryPerformanceCounter(&now);
return (uint64_t) ((1e9 * now.QuadPart) / win_frequency.QuadPart);
#endif
}
/* ----------------------------------------------------------------------- */
The first answer to C++ library questions is generally BOOST: http://www.boost.org/doc/libs/1_40_0/libs/timer/timer.htm. Does this do what you want? Probably not but it's a start.
The problem is you want portable and timer functions are not universal in OSes.
STLSoft have a Performance Library, which includes a set of timer classes, some that work for both UNIX and Windows.
I am not sure about your requirement. If you want to calculate a time interval, please see the thread below:
Calculating elapsed time in a C program in milliseconds
Late to the party here, but I'm working in a legacy codebase that can't be upgraded to c++11 yet. Nobody on our team is very skilled in c++, so adding a library like STL is proving difficult (on top of potential concerns others have raised about deployment issues). I really needed an extremely simple cross platform timer that could live by itself without anything beyond bare-bones standard system libraries. Here's what I found:
http://www.songho.ca/misc/timer/timer.html
Reposting the entire source here just so it doesn't get lost if the site ever dies:
//////////////////////////////////////////////////////////////////////////////
// Timer.cpp
// =========
// High Resolution Timer.
// This timer is able to measure the elapsed time with 1 micro-second accuracy
// in both Windows, Linux and Unix system
//
// AUTHOR: Song Ho Ahn (song.ahn@gmail.com) - http://www.songho.ca/misc/timer/timer.html
// CREATED: 2003-01-13
// UPDATED: 2017-03-30
//
// Copyright (c) 2003 Song Ho Ahn
//////////////////////////////////////////////////////////////////////////////
#include "Timer.h"
#include <stdlib.h>
///////////////////////////////////////////////////////////////////////////////
// constructor
///////////////////////////////////////////////////////////////////////////////
Timer::Timer()
{
#if defined(WIN32) || defined(_WIN32)
QueryPerformanceFrequency(&frequency);
startCount.QuadPart = 0;
endCount.QuadPart = 0;
#else
startCount.tv_sec = startCount.tv_usec = 0;
endCount.tv_sec = endCount.tv_usec = 0;
#endif
stopped = 0;
startTimeInMicroSec = 0;
endTimeInMicroSec = 0;
}
///////////////////////////////////////////////////////////////////////////////
// destructor
///////////////////////////////////////////////////////////////////////////////
Timer::~Timer()
{
}
///////////////////////////////////////////////////////////////////////////////
// start timer.
// startCount will be set at this point.
///////////////////////////////////////////////////////////////////////////////
void Timer::start()
{
stopped = 0; // reset stop flag
#if defined(WIN32) || defined(_WIN32)
QueryPerformanceCounter(&startCount);
#else
gettimeofday(&startCount, NULL);
#endif
}
///////////////////////////////////////////////////////////////////////////////
// stop the timer.
// endCount will be set at this point.
///////////////////////////////////////////////////////////////////////////////
void Timer::stop()
{
stopped = 1; // set timer stopped flag
#if defined(WIN32) || defined(_WIN32)
QueryPerformanceCounter(&endCount);
#else
gettimeofday(&endCount, NULL);
#endif
}
///////////////////////////////////////////////////////////////////////////////
// compute elapsed time in micro-second resolution.
// the other getElapsedTime methods call this first, then convert to the corresponding resolution.
///////////////////////////////////////////////////////////////////////////////
double Timer::getElapsedTimeInMicroSec()
{
#if defined(WIN32) || defined(_WIN32)
if(!stopped)
QueryPerformanceCounter(&endCount);
startTimeInMicroSec = startCount.QuadPart * (1000000.0 / frequency.QuadPart);
endTimeInMicroSec = endCount.QuadPart * (1000000.0 / frequency.QuadPart);
#else
if(!stopped)
gettimeofday(&endCount, NULL);
startTimeInMicroSec = (startCount.tv_sec * 1000000.0) + startCount.tv_usec;
endTimeInMicroSec = (endCount.tv_sec * 1000000.0) + endCount.tv_usec;
#endif
return endTimeInMicroSec - startTimeInMicroSec;
}
///////////////////////////////////////////////////////////////////////////////
// divide elapsedTimeInMicroSec by 1000
///////////////////////////////////////////////////////////////////////////////
double Timer::getElapsedTimeInMilliSec()
{
return this->getElapsedTimeInMicroSec() * 0.001;
}
///////////////////////////////////////////////////////////////////////////////
// divide elapsedTimeInMicroSec by 1000000
///////////////////////////////////////////////////////////////////////////////
double Timer::getElapsedTimeInSec()
{
return this->getElapsedTimeInMicroSec() * 0.000001;
}
///////////////////////////////////////////////////////////////////////////////
// same as getElapsedTimeInSec()
///////////////////////////////////////////////////////////////////////////////
double Timer::getElapsedTime()
{
return this->getElapsedTimeInSec();
}
and the header file:
//////////////////////////////////////////////////////////////////////////////
// Timer.h
// =======
// High Resolution Timer.
// This timer is able to measure the elapsed time with 1 micro-second accuracy
// in both Windows, Linux and Unix system
//
// AUTHOR: Song Ho Ahn (song.ahn@gmail.com) - http://www.songho.ca/misc/timer/timer.html
// CREATED: 2003-01-13
// UPDATED: 2017-03-30
//
// Copyright (c) 2003 Song Ho Ahn
//////////////////////////////////////////////////////////////////////////////
#ifndef TIMER_H_DEF
#define TIMER_H_DEF
#if defined(WIN32) || defined(_WIN32) // Windows system specific
#include <windows.h>
#else // Unix based system specific
#include <sys/time.h>
#endif
class Timer
{
public:
Timer(); // default constructor
~Timer(); // default destructor
void start(); // start timer
void stop(); // stop the timer
double getElapsedTime(); // get elapsed time in second
double getElapsedTimeInSec(); // get elapsed time in second (same as getElapsedTime)
double getElapsedTimeInMilliSec(); // get elapsed time in milli-second
double getElapsedTimeInMicroSec(); // get elapsed time in micro-second
protected:
private:
double startTimeInMicroSec; // starting time in micro-second
double endTimeInMicroSec; // ending time in micro-second
int stopped; // stop flag
#if defined(WIN32) || defined(_WIN32)
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER startCount; //
LARGE_INTEGER endCount; //
#else
timeval startCount; //
timeval endCount; //
#endif
};
#endif // TIMER_H_DEF
If one is using the Qt framework in the project, the best solution is probably to use QElapsedTimer.
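For reference, a minimal sketch (requires Qt; elapsed() reports milliseconds, nsecsElapsed() nanoseconds):
#include <QElapsedTimer>
#include <QDebug>

int main()
{
    QElapsedTimer timer;
    timer.start();
    // ... work to be measured ...
    qDebug() << "elapsed:" << timer.elapsed() << "ms,"
             << timer.nsecsElapsed() << "ns";
    return 0;
}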
I am trying to calculate the number of ticks a function uses to run and to do so using the clock() function like so:
unsigned long time = clock();
myfunction();
unsigned long time2 = clock() - time;
printf("time elapsed : %lu",time2);
But the problem is that the value it returns is a multiple of 10000, which I think is CLOCKS_PER_SEC. Is there a way, or an equivalent function, to get a more precise value?
I am using Ubuntu 64-bit, but would prefer if the solution can work on other systems like Windows & Mac OS.
There are a number of more accurate timers in POSIX.
gettimeofday() - officially obsolescent, but very widely available; microsecond resolution.
clock_gettime() - the replacement for gettimeofday() (but not necessarily so widely available; on an old version of Solaris, requires -lposix4 to link), with nanosecond resolution.
There are other sub-second timers of greater or lesser antiquity, portability, and resolution, including:
ftime() - millisecond resolution (marked 'legacy' in POSIX 2004; not in POSIX 2008).
clock() - which you already know about. Note that it measures CPU time, not elapsed (wall clock) time.
times() - reports in clock ticks, at CLK_TCK or HZ ticks per second. Note that this measures CPU time for parent and child processes.
Do not use ftime() or times() unless there is nothing better. The ultimate fallback, but not meeting your immediate requirements, is
time() - one second resolution.
The clock() function reports in units of CLOCKS_PER_SEC, which is required to be 1,000,000 by POSIX, but the increment may happen less frequently (100 times per second was one common frequency). The return value must be divided by CLOCKS_PER_SEC to get time in seconds.
The most precise (but highly not portable) way to measure time is to count CPU ticks.
For instance on x86
#include <fstream>
#include <sstream>
#include <string>
using namespace std;

unsigned long long int asmx86Time ()
{
unsigned long long int realTimeClock = 0;
asm volatile ( "rdtsc\n\t"
"salq $32, %%rdx\n\t"
"orq %%rdx, %%rax\n\t"
"movq %%rax, %0"
: "=r" ( realTimeClock )
: /* no inputs */
: "%rax", "%rdx" );
return realTimeClock;
}
double cpuFreq ()
{
ifstream file ( "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq" );
string sFreq; if ( file ) file >> sFreq;
stringstream ssFreq ( sFreq ); double freq = 0.;
if ( ssFreq ) { ssFreq >> freq; freq *= 1000; } // kHz to Hz
return freq;
}
// Timing
unsigned long long int asmStart = asmx86Time ();
doStuff ();
unsigned long long int asmStop = asmx86Time ();
float asmDuration = ( asmStop - asmStart ) / cpuFreq ();
If you don't have an x86, you'll have to rewrite the assembler code according to your CPU. If you need maximum precision, that's unfortunately the only way to go... otherwise use clock_gettime().
Per the clock() manpage, on POSIX platforms the value of the CLOCKS_PER_SEC macro must be 1000000. As you say that the return value you're getting from clock() is a multiple of 10000, that would imply that the resolution is 10 ms.
Also note that clock() on Linux returns an approximation of the processor time used by the program. On Linux, again, scheduler statistics are updated when the scheduler runs, at CONFIG_HZ frequency. So if the periodic timer tick is 100 Hz, you get process CPU time consumption statistics with 10 ms resolution.
Walltime measurements are not bound by this, and can be much more accurate. clock_gettime(CLOCK_MONOTONIC, ...) on a modern Linux system provides nanosecond resolution.
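For example, a minimal wall-time measurement with it (on older glibc you may need to link with -lrt):
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... work to be timed ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ms = (t1.tv_sec - t0.tv_sec) * 1e3 + (t1.tv_nsec - t0.tv_nsec) / 1e6;
    printf("elapsed: %.6f ms\n", ms);
    return 0;
}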
I agree with Jonathan's solution. Here is an example using clock_gettime() with nanosecond precision.
//Import
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/time.h>
int main(int argc, char *argv[])
{
struct timespec ts;
int ret;
while(1)
{
ret = clock_gettime (CLOCK_MONOTONIC, &ts);
if (ret)
{
perror ("clock_gettime");
return -1;
}
ts.tv_nsec += 20000; // sleep for 20000 ns
printf("Print before sleep tid%ld %ld\n",ts.tv_sec,ts.tv_nsec );
// printf("going to sleep tid%d\n",turn );
ret = clock_nanosleep (CLOCK_MONOTONIC, TIMER_ABSTIME,&ts, NULL);
}
}
Although it's difficult to achieve ns precision, this can be used to get precision below a microsecond (700-900 ns). The printf above is used just to print the values; it will easily take 2-3 microseconds just to print a statement.
If I had the following code
clock_t t;
t = clock();
//algorithm
t = clock() - t;
t would equal the number of ticks it took to run the program. Is this the same as CPU time? Are there any other ways to measure CPU time in C++?
OS -- Debian GNU/Linux
I am open to anything that will work. I am wanting to compare the CPU time of two algorithms.
clock() is specified to measure CPU time however not all implementations do this. In particular Microsoft's implementation in VS does not count additional time when multiple threads are running, or count less time when the program's threads are sleeping/waiting.
Also note that clock() should measure the CPU time used by the entire program, so while CPU time used by multiple threads in //algorithm will be measured, other threads that are not part of //algorithm also get counted.
clock() is the only method specified in the standard to measure CPU time, however there are certainly other, platform specific, methods for measuring CPU time.
std::chrono does not include any clock for measuring CPU time. It only has a clock synchronized to the system time, a clock that advances at a steady rate with respect to real time, and a clock that is 'high resolution' but which does not necessarily measure CPU time.
How to measure the CPU time used?
#include <ctime>
std::clock_t c_start = std::clock();
// your_algorithm
std::clock_t c_end = std::clock();
long double time_elapsed_ms = 1000.0 * (c_end-c_start) / CLOCKS_PER_SEC;
std::cout << "CPU time used: " << time_elapsed_ms << " ms\n";
Of course, if you want to display the time in seconds:
std::cout << "CPU time used: " << time_elapsed_ms / 1000.0 << " s\n";
Source : http://en.cppreference.com/w/cpp/chrono/c/clock
A non-standard way to do this using pthreads is:
#include <pthread.h>
#include <time.h>
timespec cpu_time()
{
thread_local bool initialized(false);
thread_local clockid_t clock_id;
if (!initialized)
{
pthread_getcpuclockid(pthread_self(), &clock_id);
initialized = true;
}
timespec result;
clock_gettime(clock_id, &result);
return result;
}
Addition: A more standard way is:
#include <sys/resource.h>
#include <stdio.h>
int main(int argc, char* argv[])
{
struct rusage r;
getrusage(RUSAGE_SELF, &r);
printf("CPU usage: %d.%06d\n", r.ru_utime.tv_sec, r.ru_utime.tv_usec);
}
C++ has a chrono library. See http://en.cppreference.com/w/cpp/chrono. There are also often platform dependent ways to get at high resolution timers (obviously varies by platform).
I need some way in c++ to keep track of the number of milliseconds since program execution. And I need the precision to be in milliseconds. (In my googling, I've found lots of folks that said to include time.h and then multiply the output of time() by 1000 ... this won't work.)
clock has been suggested a number of times. This has two problems. First of all, it often doesn't have a resolution even close to a millisecond (10-20 ms is probably more common). Second, some implementations of it (e.g., Unix and similar) return CPU time, while others (e.g., Windows) return wall time.
You haven't really said whether you want wall time or CPU time, which makes it hard to give a really good answer. On Windows, you could use GetProcessTimes. That will give you the kernel and user CPU times directly. It will also tell you when the process was created, so if you want milliseconds of wall time since process creation, you can subtract the process creation time from the current time (GetSystemTime). QueryPerformanceCounter has also been mentioned. This has a few oddities of its own -- for example, in some implementations it retrieves time from the CPU's cycle counter, so its frequency varies when/if the CPU speed changes. Other implementations read from the motherboard's 1.024 MHz timer, which does not vary with the CPU speed (and the conditions under which each are used aren't entirely obvious).
On Unix, you can use gettimeofday() to just get the wall time with (at least the possibility of) relatively high precision. If you want time for a process, you can use times or getrusage (the latter is newer and gives more complete information that may also be more precise).
Bottom line: as I said in my comment, there's no way to get what you want portably. Since you haven't said whether you want CPU time or wall time, even for a specific system, there's not one right answer. The one you've "accepted" (clock()) has the virtue of being available on essentially any system, but what it returns also varies just about the most widely.
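For completeness, a sketch of the GetProcessTimes route mentioned above (Windows only; FILETIME values are in 100-nanosecond units):
#include <windows.h>
#include <iostream>

// Convert a FILETIME (100 ns units) to milliseconds.
static double filetime_to_ms(const FILETIME& ft)
{
    ULARGE_INTEGER u;
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart / 10000.0;
}

int main()
{
    // ... work to be measured ...
    FILETIME ftCreation, ftExit, ftKernel, ftUser;
    if (GetProcessTimes(GetCurrentProcess(), &ftCreation, &ftExit, &ftKernel, &ftUser))
    {
        std::cout << "kernel CPU: " << filetime_to_ms(ftKernel) << " ms, "
                  << "user CPU: "   << filetime_to_ms(ftUser)   << " ms\n";
    }
}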
See std::clock()
Include time.h, and then use the clock() function. It returns the number of clock ticks elapsed since the program was launched. Just divide it by "CLOCKS_PER_SEC" to obtain the number of seconds, you can then multiply by 1000 to obtain the number of milliseconds.
Some cross platform solution. This code was used for some kind of benchmarking:
#ifdef WIN32
LARGE_INTEGER g_llFrequency = {0};
BOOL g_bQueryResult = QueryPerformanceFrequency(&g_llFrequency);
#endif
//...
long long osQueryPerfomance()
{
#ifdef WIN32
LARGE_INTEGER llPerf = {0};
QueryPerformanceCounter(&llPerf);
return llPerf.QuadPart * 1000ll / ( g_llFrequency.QuadPart / 1000ll);
#else
struct timeval stTimeVal;
gettimeofday(&stTimeVal, NULL);
return stTimeVal.tv_sec * 1000000ll + stTimeVal.tv_usec;
#endif
}
The most portable way is using the clock function.It usually reports the time that your program has been using the processor, or an approximation thereof. Note however the following:
The resolution is not very good for GNU systems. That's really a pity.
Take care of casting everything to double before doing divisions and assignations.
The counter is held as a 32 bit number in GNU 32 bits, which can be pretty annoying for long-running programs.
There are alternatives using "wall time" which give better resolution, both in Windows and Linux. But as the libc manual states: If you're trying to optimize your program or measure its efficiency, it's very useful to know how much processor time it uses. For that, calendar time and elapsed times are useless because a process may spend time waiting for I/O or for other processes to use the CPU.
Here is a C++0x solution and an example why clock() might not do what you think it does.
#include <chrono>
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <unistd.h> // for sleep() on POSIX
int main()
{
auto start1 = std::chrono::steady_clock::now(); // named monotonic_clock in pre-C++11 drafts
auto start2 = std::clock();
sleep(1);
for( int i=0; i<100000000; ++i);
auto end1 = std::chrono::steady_clock::now();
auto end2 = std::clock();
auto delta1 = end1-start1;
auto delta2 = end2-start2;
std::cout << "chrono: " << std::chrono::duration_cast<std::chrono::duration<float>>(delta1).count() << std::endl;
std::cout << "clock: " << static_cast<float>(delta2)/CLOCKS_PER_SEC << std::endl;
}
On my system this outputs:
chrono: 1.36839
clock: 0.36
You'll notice the clock() method is missing a second. An astute observer might also notice that clock() looks to have less resolution. On my system it's ticking by in 12 millisecond increments, terrible resolution.
If you are unable or unwilling to use C++0x, take a look at Boost.DateTime's ptime microsec_clock::universal_time().
This isn't C++ specific (nor portable), but you can do:
SYSTEMTIME systemDT;
In Windows.
From there, you can access each member of the systemDT struct.
You can record the time when the program started and compare the current time to the recorded time (systemDT versus systemDTtemp, for instance).
To refresh, you can call GetLocalTime(&systemDT);
To access each member, you would do systemDT.wHour, systemDT.wMinute, systemDT.wMilliseconds.
To get more information on SYSTEMTIME.
Do you want wall clock time, CPU time, or some other measurement? Also, what platform is this? There is no universally portable way to get more precision than time() and clock() give you, but...
on most Unix systems, you can use gettimeofday() and/or clock_gettime(), which give at least microsecond precision and access to a variety of timers;
I'm not nearly as familiar with Windows, but one of these functions probably does what you want.
You can try this code (get from StockFish chess engine source code (GPL)):
#include <iostream>
#include <cstdio>
#include <cstdint>
#if !defined(_WIN32) && !defined(_WIN64) // Linux - Unix
# include <sys/time.h>
typedef timeval sys_time_t;
inline void system_time(sys_time_t* t) {
gettimeofday(t, NULL);
}
inline long long time_to_msec(const sys_time_t& t) {
return t.tv_sec * 1000LL + t.tv_usec / 1000;
}
#else // Windows and MinGW
# include <sys/timeb.h>
typedef _timeb sys_time_t;
inline void system_time(sys_time_t* t) { _ftime(t); }
inline long long time_to_msec(const sys_time_t& t) {
return t.time * 1000LL + t.millitm;
}
#endif
struct Time {
void restart() { system_time(&t); }
uint64_t msec() const { return time_to_msec(t); }
long long elapsed() const {
return (long long)(current_time().msec() - time_to_msec(t));
}
static Time current_time() { Time t; t.restart(); return t; }
private:
sys_time_t t;
};
int main() {
sys_time_t t;
system_time(&t);
long long currentTimeMs = time_to_msec(t);
std::cout << "currentTimeMs:" << currentTimeMs << std::endl;
Time time = Time::current_time();
for (int i = 0; i < 1000000; i++) {
//Do something
}
long long e = time.elapsed();
std::cout << "time elapsed:" << e << std::endl;
getchar(); // wait for keyboard input
}
I am doing a performance comparison test. I want to record the run time for my c++ test application and compare it under different circumstances. The two cases to be compare are: 1) a file system driver is installed and active and 2) also when that same file system driver is not installed and active.
A series of tests will be conducted on several operating systems and the two runs described above will be done for each operating system and it's setup. Results will only be compared between the two cases for a given operating system and setup.
I understand that when running a c/c++ application within an operating system that is not a real-time system there is no way to get the real time it took for the application to run. I don't think this is a big concern as long as the test application runs for a fairly long period of time, therefore making the scheduling, priorities, switching, etc of the CPU negligible.
Edited: For Windows platform only
How can I generate some accurate application run time results within my test application?
If you're on a POSIX system you can use the time command, which will give you the total "wall clock" time as well as the actual CPU times (user and system).
Edit: Apparently there's an equivalent for Windows systems in the Windows Server 2003 Resource Kit called timeit.exe (not verified).
I think what you are asking is "How do I measure the time it takes for the process to run, irrespective of the 'external' factors, such as other programs running on the system?" In that case, the easiest thing would be to run the program multiple times, and get an average time. This way you can have a more meaningful comparison, hoping that various random things that the OS spends the CPU time on will average out. If you want to get real fancy, you can use a statistical test, such as the two-sample t-test, to see if the difference in your average timings is actually significant.
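A sketch of that idea within one process, with run_test() standing in as a placeholder for the code under test and steady_clock providing wall-clock samples:
#include <chrono>
#include <iostream>

void run_test()
{
    // placeholder workload; replace with the code under test
    volatile double x = 0.0;
    for (int i = 0; i < 1000000; ++i) x += i * 0.5;
}

int main()
{
    const int runs = 10;
    double total = 0.0;
    for (int i = 0; i < runs; ++i)
    {
        auto t0 = std::chrono::steady_clock::now();
        run_test();
        auto t1 = std::chrono::steady_clock::now();
        total += std::chrono::duration<double>(t1 - t0).count();
    }
    std::cout << "mean over " << runs << " runs: " << total / runs << " s\n";
}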
You can put this
#if _DEBUG
time_t start = time(NULL);
#endif
and finish with this
#if _DEBUG
time_t end = time(NULL);
#endif
in your int main() method. Naturally, you'll have to output the difference, either to a log or with cout.
Just to expand on ezod's answer.
You run the program with the time command to get the total time - there are no changes to your program
If you're on a Windows system you can use the high-performance counters by calling QueryPerformanceCounter():
#include <windows.h>
#include <string>
#include <iostream>
std::string format_elapsed(double d); // defined below (bonus function)
int main()
{
LARGE_INTEGER li = {0}, li2 = {0};
QueryPerformanceFrequency(&li);
__int64 freq = li.QuadPart;
QueryPerformanceCounter(&li);
// run your app here...
QueryPerformanceCounter(&li2);
__int64 ticks = li2.QuadPart-li.QuadPart;
cout << "Reference Implementation Ran In " << ticks << " ticks" << " (" << format_elapsed((double)ticks/(double)freq) << ")" << endl;
return 0;
}
...and just as a bonus, here's a function that converts the elapsed time (in seconds, floating point) to a descriptive string:
#include <cmath>  // floor, fmod
#include <cstdio> // sprintf
std::string format_elapsed(double d)
{
char buf[256] = {0};
if( d < 0.00000001 )
{
// show in ps with 4 digits
sprintf(buf, "%0.4f ps", d * 1000000000000.0);
}
else if( d < 0.00001 )
{
// show in ns
sprintf(buf, "%0.0f ns", d * 1000000000.0);
}
else if( d < 0.001 )
{
// show in us
sprintf(buf, "%0.0f us", d * 1000000.0);
}
else if( d < 0.1 )
{
// show in ms
sprintf(buf, "%0.0f ms", d * 1000.0);
}
else if( d <= 60.0 )
{
// show in seconds
sprintf(buf, "%0.2f s", d);
}
else if( d < 3600.0 )
{
// show in min:sec
sprintf(buf, "%01.0f:%02.2f", floor(d/60.0), fmod(d,60.0));
}
// show in h:min:sec
else
sprintf(buf, "%01.0f:%02.0f:%02.2f", floor(d/3600.0), floor(fmod(d,3600.0)/60.0), fmod(d,60.0));
return buf;
}
Download Cygwin and run your program by passing it as an argument to the time command. When you're done, spend some time to learn the rest of the Unix tools that come with Cygwin. This will be one of the best investments for your career you'll ever make; the Unix toolchest is a timeless classic.
QueryPerformanceCounter can have problems on multicore systems, so I prefer to use timeGetTime(), which gives the result in milliseconds.
You need a timeBeginPeriod(1) before and a timeEndPeriod(1) afterwards to reduce the granularity as far as you can, but I find it works nicely for my purposes (regulating timesteps in games), so it should be okay for benchmarking.
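A sketch of that approach (Windows only; link against winmm.lib):
#include <windows.h>
#include <mmsystem.h> // timeGetTime, timeBeginPeriod, timeEndPeriod
#include <iostream>
#pragma comment(lib, "winmm.lib")

int main()
{
    timeBeginPeriod(1);              // request 1 ms timer granularity
    DWORD start = timeGetTime();
    // ... work to be measured ...
    DWORD elapsedMs = timeGetTime() - start;
    timeEndPeriod(1);
    std::cout << "elapsed: " << elapsedMs << " ms\n";
}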
You can also use the program very sleepy to get a bunch of runtime information about your program. Here's a link: http://www.codersnotes.com/sleepy