How to interpret clock()? - c++

In my C++ program, I measure CPU time with the clock() function. As the code is executed on a cluster of different computers (all running the same OS, but with different hardware configurations, i.e. different CPUs), I am wondering about measuring actual execution time. Here is my scenario:
As far as I read, clock() gives the number of CPU clock ticks that have passed since a fixed date. I measure the relative duration by calling clock() a second time and taking the difference.
Now what defines the internal clock() in C++? If I have CPU A at 1.0 GHz and CPU B at 2.0 GHz and run the same code on both, how many clock ticks will CPU A and B take to finish? Does the clock() value correspond to "work done"? Or is it really a "time"?
Edit: As CLOCKS_PER_SEC is not set, I cannot use it to convert clock ticks to runtime in seconds. As the manual says, CLOCKS_PER_SEC depends on the hardware/architecture. That means the clock ticks depend on the hardware. So I really need to know what clock() gives me, without any additional calculation.

The clock() function should return the closest possible
representation of the CPU time used, regardless of the clock
speed of the CPU. Where the clock speed of the CPU might
intervene (but not necessarily) is in the granularity; more
often, however, the clock's granularity depends on some external
time source. (In the distant past, it was often based on the
power line frequency, with a granularity of 1/50 or 1/60 of
a second, depending on where you were.)
To get the time in seconds, you divide by CLOCKS_PER_SEC. Be
aware, however, that both clock() and CLOCKS_PER_SEC are usually
integral values, so the division is integer division. You might want to
convert one to double before doing the division. In the past,
CLOCKS_PER_SEC also corresponded to the granularity, but
modern systems seem to just choose some large value (Posix
requires 1000000, regardless of the granularity); this means
that successive return values from clock() will "jump".
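A minimal sketch of the conversion described above, assuming only the standard <ctime> header; the helper names `ticks_to_seconds` and `cpu_seconds` are hypothetical. The cast to double before dividing is what prevents a short interval from truncating to 0 seconds.

```cpp
#include <ctime>

// Converts a clock() difference to seconds. Casting to double first avoids
// integer division, which would truncate e.g. 999999 / 1000000 to 0 on a
// system where clock_t is an integer type.
double ticks_to_seconds(std::clock_t ticks) {
    return static_cast<double>(ticks) / CLOCKS_PER_SEC;
}

// Measures the CPU time (in seconds) consumed by calling f once.
template <typename F>
double cpu_seconds(F&& f) {
    const std::clock_t start = std::clock();
    f();
    const std::clock_t end = std::clock();
    return ticks_to_seconds(end - start);
}
```

Keep in mind that the result still "jumps" in steps of the clock's real granularity, whatever CLOCKS_PER_SEC happens to be.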
Finally, it's probably worth noting that in VC++, clock() is
broken, and returns wall clock time, rather than CPU time.
(This is probably historically conditioned; in the early days,
wall clock time was all that was available, and the people at
Microsoft probably think that there is code which depends on
it.)

You can convert clock ticks to seconds (of CPU time) by dividing the count by CLOCKS_PER_SEC.
Note that since C++11 a more appropriate way of measuring elapsed time is std::chrono::steady_clock.
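A sketch of timing with std::chrono::steady_clock, assuming C++11 or later; `elapsed_microseconds` is a hypothetical helper name. Note that this measures elapsed (wall-clock) time, not CPU time, so it is not a drop-in replacement for clock() when CPU time is what you want.

```cpp
#include <chrono>

// Wall-clock (elapsed) time of calling f once, in microseconds.
// steady_clock is monotonic: it never jumps backwards when the system time
// is adjusted, which makes it suitable for measuring intervals.
template <typename F>
long long elapsed_microseconds(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```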

From man clock:
The value returned is the CPU time used so far as a clock_t; to get
the number of seconds used, divide by CLOCKS_PER_SEC. If the processor
time used is not available or its value cannot be represented, the
function returns the value (clock_t) -1.
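A small sketch of honouring the failure value from the man page quoted above, assuming C++17 for std::optional; `cpu_time_seconds` is a hypothetical name.

```cpp
#include <ctime>
#include <optional>

// Returns the processor time used so far, in seconds, or std::nullopt if
// clock() reports that the value is unavailable ((clock_t)-1).
std::optional<double> cpu_time_seconds() {
    const std::clock_t t = std::clock();
    if (t == static_cast<std::clock_t>(-1)) {
        return std::nullopt;
    }
    return static_cast<double>(t) / CLOCKS_PER_SEC;
}
```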

Related

How accurate is std::chrono?

std::chrono advertises that it can report results down to the nanosecond level. On a typical x86_64 Linux or Windows machine, how accurate would one expect this to be? What would be the error bars for a measurement of 10 ns, 10 µs, 10 ms, and 10 s, for example?
It's most likely hardware and OS dependent. For example, when I ask Windows for the clock frequency using QueryPerformanceFrequency() I get 3903987; taking the inverse of that gives a clock period, or resolution, of about 256 nanoseconds. This is the value that my operating system reports.
With std::chrono according to the docs the minimum representable duration is high_resolution_clock::period::num / high_resolution_clock::period::den.
The num and den are numerator and denominator. std::chrono::high_resolution_clock tells me the numerator is 1, and the denominator is 1 billion, supposedly corresponding to 1 nanosecond:
std::cout << (double)std::chrono::high_resolution_clock::period::num /
std::chrono::high_resolution_clock::period::den; // Results in a nanosecond.
So according to std::chrono I have one-nanosecond resolution, but I don't believe it, because the native OS call is more likely to report the true frequency/period.
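One way to check this empirically is to compare the advertised period with the smallest step actually observed between successive now() calls. The sketch below uses hypothetical helper names; the observed step also includes the cost of calling now() itself, so treat it as a rough upper bound on the resolution, not an exact value.

```cpp
#include <chrono>

// Advertised tick period of a chrono clock, in seconds (num/den of Clock::period).
template <typename Clock>
double advertised_period_seconds() {
    return static_cast<double>(Clock::period::num) / Clock::period::den;
}

// Smallest non-zero step between successive Clock::now() readings seen over
// `samples` attempts, in nanoseconds.
template <typename Clock>
long long observed_step_ns(int samples = 1000) {
    long long best = -1;
    for (int i = 0; i < samples; ++i) {
        const auto t0 = Clock::now();
        auto t1 = Clock::now();
        while (t1 == t0) {  // spin until the reading changes
            t1 = Clock::now();
        }
        const long long step =
            std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        if (best < 0 || step < best) best = step;
    }
    return best;
}
```

For example, `observed_step_ns<std::chrono::high_resolution_clock>()` can be compared against `advertised_period_seconds<std::chrono::high_resolution_clock>()` on the machine in question.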
The accuracy will depend upon the application and how this application interacts with the operating system. I am not familiar with chrono specifically, but there are limitations at a lower level you must account for.
For example, if you timestamp network packets using the CPU, the measurement accuracy is very noisy. Even though the precision of the time measurement may be 1 nanosecond, the context switch time for the interrupt corresponding to the packet arrival may be ~1 microsecond. You can accurately measure when your application processes the packet, but not what time the packet arrived.
Short answer: Not accurate in microseconds and below.
Long answer:
I was interested in how much time my two DP programs take to execute. So I used the chrono library, but when I ran it, it reported 0 microseconds, so I was unable to compare them. I can't increase the array size, because it cannot be extended to 1e8.
So I wrote a sort program to test it and ran it 100 times.
The timings were clearly not consistent for the same input, so I would recommend not using it for higher precision.

Measuring an algorithms running time in terms of clock ticks

I'm using the following code snippet to measure my algorithm's running time in terms of clock ticks:
#include <stdio.h>
#include <time.h>
clock_t t;
t = clock();
//run algorithm
t = clock() - t;
printf("It took me %ld clicks (%f seconds).\n", (long)t, ((double)t) / CLOCKS_PER_SEC);
However, this returns 0 when the input size is small. How is this even possible?
The clock has some granularity, which depends on several factors such as your OS.
Therefore, your algorithm may run so fast that the clock does not have time to update, hence the measured duration of 0.
You can try to run your algorithm n times and divide the measured time by n to get a better idea of the time taken on small inputs.
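A sketch of that idea; `average_cpu_seconds` is a hypothetical name. One caveat not in the answer: if the timed call has no observable side effects, the optimizer may remove it entirely, so make sure the result is used (e.g. written to a volatile variable).

```cpp
#include <ctime>

// Average CPU time per call, in seconds, obtained by timing n back-to-back
// calls of f with clock() and dividing by n. This spreads the clock's coarse
// granularity over n calls instead of charging it to a single call.
template <typename F>
double average_cpu_seconds(F&& f, long n) {
    const std::clock_t start = std::clock();
    for (long i = 0; i < n; ++i) {
        f();
    }
    const std::clock_t end = std::clock();
    return static_cast<double>(end - start) / CLOCKS_PER_SEC / n;
}
```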
The resolution of the standard C clock() function can vary heavily between systems and is likely too coarse to measure your algorithm. You have 2 options:
1) Use operating-system-specific functions
2) Repeat your algorithm several times until it takes long enough to be measured with clock()
For 1), you can use QueryPerformanceCounter() and QueryPerformanceFrequency() if your program runs under Windows, or clock_gettime() if it runs on Linux.
Refer to these pages for further details:
QueryPerformanceCounter()
clock_gettime()
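For the Linux route, a minimal sketch using the POSIX clock_gettime() and clock_getres() calls, assuming a POSIX system (it will not compile on plain Windows); `now_ns` and `resolution_ns` are hypothetical helper names.

```cpp
#include <time.h>  // POSIX: clock_gettime, clock_getres, CLOCK_MONOTONIC, ...

// Reads the given POSIX clock in nanoseconds, or returns -1 on error.
// CLOCK_MONOTONIC measures elapsed time that never jumps backwards;
// CLOCK_PROCESS_CPUTIME_ID measures CPU time consumed by this process.
long long now_ns(clockid_t id) {
    timespec ts{};
    if (clock_gettime(id, &ts) != 0) return -1;
    return static_cast<long long>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Resolution reported by the system for the given clock, in nanoseconds,
// or -1 on error.
long long resolution_ns(clockid_t id) {
    timespec res{};
    if (clock_getres(id, &res) != 0) return -1;
    return static_cast<long long>(res.tv_sec) * 1000000000LL + res.tv_nsec;
}
```

Elapsed time is then `now_ns(CLOCK_MONOTONIC)` after the algorithm minus the same call before it.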
For 2), you have to execute your algorithm a given number of times sequentially so that the time reported by clock() is several orders of magnitude above the minimum granularity of clock(). Let's say clock() only works in steps of 12 microseconds; then the total test run should take at least 1.2 milliseconds so that your time measurement has at most 1% deviation. Otherwise, if you measure a time of 12 micros, you never know whether it ran for 12.0 micros or maybe 23.9 micros and the next clock() tick just didn't happen. The more often your algorithm executes sequentially inside the time measurement, the more exact your time measurement will be. Also be sure to copy-paste the call to your algorithm for sequential executions; if you just use a loop counter in a for-loop, this may severely influence your measurement!
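The rule "total time at least granularity / tolerated error" (12 µs / 0.01 = 1.2 ms in the example) can be automated by doubling the repetition count until the total is long enough. This is a hypothetical helper, and the granularity is an input you must supply (measure it or look it up). It uses a plain loop, which the answer above warns may distort very short timings, so the result includes the loop overhead.

```cpp
#include <ctime>

// Doubles the number of back-to-back calls of f until the total CPU time
// measured with clock() is at least granularity_s / max_rel_error, then
// returns the average time per call in seconds (or -1 if never reached).
template <typename F>
double time_per_call(F&& f, double granularity_s, double max_rel_error = 0.01) {
    const double min_total = granularity_s / max_rel_error;
    for (long long reps = 1; reps <= (1LL << 40); reps *= 2) {
        const std::clock_t start = std::clock();
        for (long long i = 0; i < reps; ++i) f();
        const double total =
            static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
        if (total >= min_total) return total / reps;
    }
    return -1.0;  // the body was probably optimized away
}
```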

Difference between clock() and MPI_Wtime()

Quick question. For the MPI implementation of my code, I am getting a huge difference between the two. I know MPI_Wtime is the real time elapsed on each processor and clock() gives a rough idea of the expected time. Does anyone want to add anything?
The clock function is utterly useless. It measures cpu time, not real time/wall time, and moreover it has the following serious issues:
On most implementations, the resolution is extremely bad, for example, 1/100 of a second. CLOCKS_PER_SEC is not the resolution, just the scale.
With typical values of CLOCKS_PER_SEC (Unix standards require it to be 1 million, for example), clock will overflow in a matter of minutes on 32-bit systems. After overflow, it returns -1.
Most historical implementations don't actually return -1 on overflow, as the C standard requires, but instead wrap. As clock_t is usually a signed type, attempting to perform arithmetic with the wrapped values will produce either meaningless results or undefined behavior.
On Windows it does the completely wrong thing and measures elapsed real time, rather than cpu time.
The official definition of clock is that it gives you CPU time. On Windows, for historical reasons (changing it to report CPU time now would break some applications), the time is just elapsed time.
MPI_Wtime gives, as you say, the "current time on this processor", which is quite different. If you do something that sleeps for 1 minute, MPI_Wtime will move 60 seconds forward, where clock (except for Windows) would be pretty much unchanged.
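A sketch of the difference, using std::chrono::steady_clock as a stand-in for MPI_Wtime (MPI itself is not assumed to be available); `Timing` and `measure_both` are hypothetical names. On POSIX systems sleeping advances the wall time but hardly the CPU time; per the answers above, on Windows both numbers would be close.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing {
    double wall_s;  // elapsed real time, via std::chrono::steady_clock
    double cpu_s;   // processor time, via std::clock()
};

// Runs f once and reports both wall-clock and CPU time.
template <typename F>
Timing measure_both(F&& f) {
    const auto w0 = std::chrono::steady_clock::now();
    const std::clock_t c0 = std::clock();
    f();
    const std::clock_t c1 = std::clock();
    const auto w1 = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(w1 - w0).count(),
            static_cast<double>(c1 - c0) / CLOCKS_PER_SEC};
}
```

For example, `measure_both([] { std::this_thread::sleep_for(std::chrono::seconds(1)); })` should report about 1 s of wall time and close to 0 s of CPU time on Linux.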

Behaviour of CLOCKS_PER_SEC in different Operating Systems

I was running some C++ code, and I noticed that on Windows 7, CLOCKS_PER_SEC gives 1000, while on Linux Fedora 16 it gives 1000000. Can anyone explain this behaviour?
What's to justify? CLOCKS_PER_SEC is implementation defined, and can
be anything. All it indicates is the units returned by the function
clock(). It doesn't even indicate the resolution of clock(): Posix
requires it to be 1000000, regardless of the actual resolution. If
Windows is returning 1000, that's probably not the actual resolution
either. (I find that my Linux box has a resolution of 10ms, and my Windows box 15ms.)
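The actual resolution can be estimated by busy-waiting until clock() changes and recording the smallest jump, as in this hypothetical helper. It burns CPU while spinning, which is exactly what makes a CPU-time clock advance.

```cpp
#include <ctime>

// Estimates the real granularity of clock(), in seconds, as the smallest
// jump between successive distinct return values over `samples` attempts.
// This can be much coarser than 1 / CLOCKS_PER_SEC, which only sets the units.
double clock_granularity_seconds(int samples = 5) {
    std::clock_t best = 0;
    for (int i = 0; i < samples; ++i) {
        const std::clock_t t0 = std::clock();
        std::clock_t t1 = std::clock();
        while (t1 == t0) t1 = std::clock();  // spin until the value changes
        const std::clock_t step = t1 - t0;
        if (best == 0 || step < best) best = step;
    }
    return static_cast<double>(best) / CLOCKS_PER_SEC;
}
```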
Basically the implementation of the clock() function has some leeway for different operating systems. On Linux Fedora, the clock ticks faster. It ticks 1 million times a second.
This clock tick is distinct from the clock rate of your CPU, on a different layer of abstraction. Windows tries to make the number of clock ticks equal to the number of milliseconds.
This macro expands to an expression representing the number of clock
ticks in a second, as returned by the function clock.
Dividing a count of clock ticks by this expression yields the number
of seconds.
CLK_TCK is an obsolete alias of this macro.
Reference: http://www.cplusplus.com/reference/clibrary/ctime/CLOCKS_PER_SEC/
You should also know that the Windows implementation is not for true real-time applications. The 1000 tick clock is derived by dividing a hardware clock by a power of 2. That means that they actually get a 1024 tick clock. To convert it to a 1000 tick clock, Windows will skip certain ticks, meaning some ticks are slower than others!
A separate hardware clock (not the CPU clock) is normally used for timing. Reference: http://en.wikipedia.org/wiki/Real-time_clock

How to calculate an operation's time with microsecond precision

I want to measure the performance of a function with microsecond precision on the Windows platform.
Windows itself has millisecond granularity, so how can I achieve this?
I tried the following sample, but I am not getting correct results.
#include <windows.h>
LARGE_INTEGER ticksPerSecond = {0};
LARGE_INTEGER tick_1 = {0};
LARGE_INTEGER tick_2 = {0};
double uSec = 1000000;
// Get the frequency (ticks per second)
QueryPerformanceFrequency(&ticksPerSecond);
// Calculate ticks per microsecond
double uFreq = ticksPerSecond.QuadPart / uSec;
// Get counter before start of op
QueryPerformanceCounter(&tick_1);
// The op itself
Sleep(10);
// Get counter after op finished
QueryPerformanceCounter(&tick_2);
// And now the op time in uSec (subtract the raw counts first to keep precision)
double diff = (tick_2.QuadPart - tick_1.QuadPart) / uFreq;
Run the operation in a loop a million times or so and divide the result by that number. That way you'll get the average execution time over that many executions. Timing one (or even a hundred) executions of a very fast operation is very unreliable, due to multitasking and whatnot.
compile it
look at the assembler output
count the number of each instruction in your function
apply the cycles per instruction on your target processor
end up with a cycle count
divide by the clock speed you are running at (i.e. multiply by the clock period)
apply arbitrary scaling factors to account for cache misses and branch mis-predictions lol
(man I am so going to get down-voted for this answer)
No, you are probably getting an accurate result; QueryPerformanceCounter() works well for timing short intervals. What's wrong is your expectation of the accuracy of Sleep(). It has a resolution of 1 millisecond, but its accuracy is far worse: no better than about 15.625 milliseconds on most Windows machines.
To get it anywhere close to 1 millisecond, you'll have to call timeBeginPeriod(1) first. That probably will improve the match, ignoring the jitter you'll get from Windows being a multi-tasking operating system.
If you're doing this for offline profiling, a very simple way is to run the function 1000 times, measure to the closest millisecond and divide by 1000.
To get finer resolution than 1 ms, you will have to consult your OS documentation. There may be APIs that provide timers with microsecond resolution. If so, run your application many times and take the average.
I like Matti Virkkunen's answer. Check the time, call the function a large number of times, check the time when you finish, and divide by the number of times you called the function. He did mention you might be off due to OS interrupts. You might vary the number of times you make the call and see a difference. Can you raise the priority of the process? Can you get all the calls within a single OS time slice?
Since you don't know when the OS might swap you out, you can put this all inside a larger loop to do the whole measurement a large number of times, and save the smallest number as that is the one that had the fewest OS interrupts. This still may be greater than the actual time for the function to execute because it may still contain some OS interrupts.
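A sketch of that "repeat the whole measurement and keep the smallest" strategy. It swaps in std::chrono::steady_clock for QueryPerformanceCounter so it stays portable; `min_batch_seconds_per_call` is a hypothetical name.

```cpp
#include <chrono>

// Times `calls_per_batch` back-to-back calls of f, repeats that `batches`
// times, and keeps the smallest per-call average, assuming the fastest batch
// was the one least disturbed by OS interrupts and context switches.
template <typename F>
double min_batch_seconds_per_call(F&& f, int batches, long calls_per_batch) {
    double best = -1.0;
    for (int b = 0; b < batches; ++b) {
        const auto start = std::chrono::steady_clock::now();
        for (long i = 0; i < calls_per_batch; ++i) f();
        const auto end = std::chrono::steady_clock::now();
        const double per_call =
            std::chrono::duration<double>(end - start).count() / calls_per_batch;
        if (best < 0.0 || per_call < best) best = per_call;
    }
    return best;
}
```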
Sanjeet,
It looks (to me) like you're doing this exactly right. QueryPerformanceCounter is a perfectly good way to measure short periods of time with a high degree of precision. If you're not seeing the result you expected, it's most likely because the sleep isn't sleeping for the amount of time you expected it to! However, it is likely being measured correctly.
I want to go back to your original question about how to measure the time on windows with microsecond precision. As you already know, the high performance counter (i.e. QueryPerformanceCounter) "ticks" at the frequency reported by QueryPerformanceFrequency. That means that you can measure time with precision equal to:
1/frequency seconds
On my machine, QueryPerformanceFrequency reports 2337910 (counts/sec). That means that my computer's QPC can measure with precision 4.277e-7 seconds, or 0.427732 microseconds. That means that the smallest bit of time I can measure is 0.427732 microseconds. This, of course, gives you the precision that you originally asked for :) Your machine's frequency should be similar, but you can always do the math and check it.
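The arithmetic can be written as portable helpers (hypothetical names, no Windows API calls): on Windows, `ticks` would be `tick_2.QuadPart - tick_1.QuadPart` and `freq` the value from QueryPerformanceFrequency. Splitting into whole seconds and a remainder avoids overflowing `ticks * 1000000` for large counts.

```cpp
#include <cstdint>

// Converts a tick count from a counter running at `freq` ticks per second
// into whole microseconds without overflowing for large tick counts.
std::int64_t ticks_to_microseconds(std::int64_t ticks, std::int64_t freq) {
    const std::int64_t whole_seconds = ticks / freq;
    const std::int64_t remainder = ticks % freq;
    return whole_seconds * 1000000 + remainder * 1000000 / freq;
}

// Duration of one tick (the smallest measurable interval), in microseconds.
double tick_period_microseconds(std::int64_t freq) {
    return 1e6 / static_cast<double>(freq);
}
```

With the frequencies quoted in this thread, `tick_period_microseconds(2337910)` is about 0.4277 µs and `tick_period_microseconds(3903987)` about 0.256 µs.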
Or you can use gettimeofday() which gives you a timeval struct that is a timestamp (down to µs)