Difference between clock() and MPI_Wtime() - c++

Quick question: for the MPI implementation of my code, I am getting a huge difference between the two. I know MPI_Wtime gives the real (wall-clock) time elapsed on each processor, while clock() only gives a rough idea of the time used. Does anyone want to add some clarification?

The clock function is utterly useless. It measures cpu time, not real time/wall time, and moreover it has the following serious issues:
On most implementations, the resolution is extremely bad, for example 1/100 of a second. CLOCKS_PER_SEC is not the resolution, just the scale.
With typical values of CLOCKS_PER_SEC (POSIX requires it to be 1 million, for example), clock will overflow in a matter of minutes on 32-bit systems. After overflow, it is supposed to return -1.
Most historical implementations don't actually return -1 on overflow, as the C standard requires, but instead wrap. As clock_t is usually a signed type, attempting to perform arithmetic with the wrapped values will produce either meaningless results or undefined behavior.
On Windows it does the completely wrong thing and measures elapsed real time, rather than cpu time.

The official definition of clock is that it gives you CPU time. On Windows, for historical reasons - changing it to reflect CPU time now would break some apps - the time returned is just elapsed wall-clock time.
MPI_Wtime gives, as you say, the elapsed wall-clock time on the calling processor, which is quite different. If you do something that sleeps for 1 minute, MPI_Wtime will move 60 seconds forward, where clock (except on Windows) would be pretty much unchanged.
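For reference, here is a minimal sketch of that difference, assuming an MPI installation and POSIX sleep() (compile with mpicxx or similar):

#include <mpi.h>
#include <ctime>
#include <cstdio>
#include <unistd.h>   // sleep(), POSIX

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    double w0 = MPI_Wtime();          // wall-clock time
    std::clock_t c0 = std::clock();   // CPU time (except on Windows)

    sleep(5);                         // burns wall time but almost no CPU time

    double w1 = MPI_Wtime();
    std::clock_t c1 = std::clock();

    std::printf("MPI_Wtime elapsed: %.3f s\n", w1 - w0);      // roughly 5 s
    std::printf("clock()   elapsed: %.3f s\n",
                double(c1 - c0) / CLOCKS_PER_SEC);             // close to 0 s on POSIX

    MPI_Finalize();
    return 0;
}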

Related

How to get very precise elapsed time in C++

I'm running my code on Ubuntu, and I need to get the elapsed time of a function in my program. I need a very accurate time, like nanoseconds or at least microseconds.
I read about <chrono>, but it uses system time, and I would prefer to use CPU time.
Is there a way to do that, and with that granularity (nanoseconds)?
std::chrono does have a high_resolution_clock, though please bear in mind that the precision is limited by the processor.
If you want to use functions directly from libc, you can use gettimeofday, but as before there is no guarantee that this will be nanosecond accurate (it only has microsecond accuracy).
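For example, a minimal sketch of timing a function with std::chrono, assuming C++11 or later (work() is just a placeholder for whatever you want to measure):

#include <chrono>
#include <iostream>

void work() { /* placeholder for the function being measured */ }

int main() {
    auto t0 = std::chrono::high_resolution_clock::now();
    work();
    auto t1 = std::chrono::high_resolution_clock::now();

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
    std::cout << "elapsed: " << ns << " ns\n";   // reported in ns, but resolution is hardware-limited
    return 0;
}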
The achievable precision of the clock is one of the properties of different hardware/OS combinations that still leak into virtually every language, and, to be honest, having been in the same situation, I find that building your own abstraction that is good enough for your case is often the only choice.
That being said, I would avoid the STL for high-precision timing. Since it is a library standard with no one true implementation, it has to create an abstraction, which implies one of:
use a least common denominator
leak hardware/OS details through platform-dependent behavior
In the second case you are essentially back to where you started, if you want to have uniform behavior. If you can afford the possible loss of precision or the deviations of a standard clock, then by all means use it. Clocks are hard and subtle.
If you know your target environment you can choose the appropriate clocks the old-school way (#ifdef PLATFORM_ID ..., e.g. clock_gettime(), QPC) and implement the most precise abstraction you can get; a rough sketch follows after the next paragraph. Of course you are limited by the same choices the STL has to make, but by reducing the set of platforms, you can generally improve on the least-common-denominator requirement.
If you need a more theoretical way to convince yourself of this argumentation, you can consider the set of clocks with their maximum precision, and a sequence of accesses to the current time. For clocks advancing uniformly in uniform steps, if two accesses happen faster than the maximum precision of one clock, but slower than the maximum precision of another clock, you are bound to get different behavior. If on the other hand you ensure that two accesses are at least the maximum precision of the slowest clock apart the behavior is the same. Now of course real clocks are not advancing uniformly (clock drift), and also not in unit-steps.
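For illustration, a rough sketch of such an #ifdef-based abstraction, assuming QueryPerformanceCounter on Windows and clock_gettime(CLOCK_MONOTONIC) on POSIX (the name now_ns() is made up for this example):

#include <cstdint>

#ifdef _WIN32
  #include <windows.h>
  // Wall-clock timestamp in nanoseconds via QueryPerformanceCounter.
  inline std::int64_t now_ns() {
      LARGE_INTEGER freq, count;
      QueryPerformanceFrequency(&freq);
      QueryPerformanceCounter(&count);
      return static_cast<std::int64_t>(count.QuadPart * (1000000000.0 / freq.QuadPart));
  }
#else
  #include <time.h>
  // Wall-clock timestamp in nanoseconds via clock_gettime(CLOCK_MONOTONIC).
  inline std::int64_t now_ns() {
      timespec ts;
      clock_gettime(CLOCK_MONOTONIC, &ts);
      return std::int64_t(ts.tv_sec) * 1000000000 + ts.tv_nsec;
  }
#endif

// Usage: std::int64_t t0 = now_ns(); /* ...work... */ std::int64_t dt = now_ns() - t0;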
While there is a standard function that should return the CPU time (std::clock), in reality there's no fully portable way to do this.
On POSIX systems (which Linux attempts to be), std::clock should do the right thing, though. Just don't expect it to work the same on non-POSIX platforms if you ever want to make your application portable.
The values returned by std::clock are also approximate, and the precision and resolution are system dependent.

Is there a standard library implementation where high_resolution_clock is not a typedef?

The C++ Draft par 20.12.7.3 reads:
high_resolution_clock may be a synonym for system_clock or steady_clock
Of course this "may" mandates nothing, but I wonder:
Is there any point in high_resolution_clock being something other than a typedef?
Are there such implementations?
If a clock with a shorter tick period is devised, it can either be steady or not steady. So if such a mechanism exists, wouldn't we want to "improve" system_clock and high_resolution_clock as well, defaulting to the typedef solution once more?
The reason that specs use wording such as "may" and "can", and other vague words that allow for other possibilities, is that the spec writers do not want to (unnecessarily) rule out a "better" implementation of something.
Imagine a system where time in general is counted in seconds, and the system_clock is just that - the system_clock::period will return 1 second. This time is stored as a single 64-bit integer.
Now, in the same system, there is also a time in nanoseconds, but it's stored as a 128-bit integer. The resulting time calculations are slightly more complex due to this large integer format, and for someone who only needs 1 s precision (in a system where a large number of calculations on time are made), you wouldn't want the extra penalty of using high_resolution_clock when the system doesn't need it.
As to whether there are such things in real life, I'm not sure. The key is that it's not a violation of the standard if you care to implement it that way.
Note that steady is very much a property of "what happens when the system changes time" (e.g. if the outside network has been down for several days, and the internal clock in the system has drifted off from the atomic clock that network time updates to). Using steady_clock guarantees that time doesn't go backwards or suddenly jump forward by 25 seconds. Likewise, there is no problem when there is a "leap second" or similar time adjustment in the computer system. On the other hand, a system_clock is guaranteed to give you the correct new time if you give it a forward duration past a daylight-savings change, or some such, where steady_clock will just tick along hour after hour, regardless. So choosing the right one of those will affect the recording of your favourite program in the digital TV recorder - steady_clock will record at the wrong time [my DTV recorder did this wrong a few years back, but they appear to have fixed it now].
system_clock should also take into account the user (or sysadmin) changing the clock in the system; steady_clock should NOT do so.
Again, high_resolution_clock may or may not be steady - it's up to the implementor of the C++ library to give the appropriate response to is_steady.
In the GCC 4.9.2 version of <chrono>, we find using high_resolution_clock = system_clock;, so in this case it's a direct typedef (well, a type alias, which amounts to the same thing). But the spec doesn't REQUIRE this.
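If you want to see what your own standard library does, a small probe like this (assuming C++11) prints whether high_resolution_clock is an alias of one of the other clocks and whether it is steady:

#include <chrono>
#include <iostream>
#include <type_traits>

int main() {
    using hrc = std::chrono::high_resolution_clock;
    std::cout << std::boolalpha
              << "same type as system_clock: "
              << std::is_same<hrc, std::chrono::system_clock>::value << "\n"
              << "same type as steady_clock: "
              << std::is_same<hrc, std::chrono::steady_clock>::value << "\n"
              << "is_steady: " << hrc::is_steady << "\n"
              << "tick period: " << hrc::period::num << "/" << hrc::period::den << " s\n";
    return 0;
}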

How to interpret clock()?

In my C++ program, I measure CPU time with the clock() function. As the code is executed on a cluster of different computers (all running the same OS, but with different hardware configurations, i.e. different CPUs), I am wondering about measuring actual execution time. Here is my scenario:
As far as I have read, clock() gives the number of CPU clock ticks that have passed since a fixed date. I measure the relative duration by calling clock() a second time and taking the difference.
Now what defines the internal clock() in C++? If I have CPU A with 1.0 GHz and CPU B with 2.0 GHz and run the same code on them, how many ticks will CPUs A and B take to finish? Does clock() correspond to "work done"? Or is it really a "time"?
Edit: As CLOCKS_PER_SEC is not fixed across systems, I cannot use it to convert ticks to runtime in seconds. As the manual says, CLOCKS_PER_SEC depends on the hardware/architecture. That means there is a dependency of the ticks on the hardware. So I really need to know what clock() gives me, without any additional calculation.
The clock() function should return the closest possible representation of the CPU time used, regardless of the clock speed of the CPU. Where the clock speed of the CPU might intervene (but not necessarily) is in the granularity; more often, however, the clock's granularity depends on some external time source. (In the distant past, it was often based on the power line frequency, with a granularity of 1/50 or 1/60 of a second, depending on where you were.)
To get the time in seconds, you divide by CLOCKS_PER_SEC. Be aware, however, that both clock() and CLOCKS_PER_SEC are integral values, so the division is integral. You might want to convert one to double before doing the division. In the past, CLOCKS_PER_SEC also corresponded to the granularity, but modern systems seem to just choose some large value (POSIX requires 1000000, regardless of the granularity); this means that successive return values from clock() will "jump".
Finally, it's probably worth noting that in VC++, clock() is broken and returns wall-clock time rather than CPU time. (This is probably historically conditioned; in the early days, wall-clock time was all that was available, and the people at Microsoft probably assume that there is code which depends on it.)
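To make the integral-division point concrete, a minimal sketch (the work being measured is just a placeholder comment):

#include <ctime>
#include <cstdio>

int main() {
    std::clock_t start = std::clock();
    // ... work being measured ...
    std::clock_t end = std::clock();

    // Integral division: truncates to whole seconds.
    long whole = static_cast<long>((end - start) / CLOCKS_PER_SEC);
    // Convert to double first to keep the fractional part.
    double seconds = static_cast<double>(end - start) / CLOCKS_PER_SEC;

    std::printf("%ld s (truncated) vs %.6f s\n", whole, seconds);
    return 0;
}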
You can convert the clock ticks to seconds by dividing by CLOCKS_PER_SEC (note that this still gives CPU time, not wall-clock time).
Note that since C++11 a more appropriate way of measuring elapsed time is by using std::steady_clock.
From man clock:
The value returned is the CPU time used so far as a clock_t; to get the number of seconds used, divide by CLOCKS_PER_SEC. If the processor time used is not available or its value cannot be represented, the function returns the value (clock_t) -1.

C++: Timing in Linux (using clock()) is out of sync (due to OpenMP?)

At the top and end of my program I use clock() to figure out how long my program takes to finish. Unfortunately, it appears to take half as long as it's reporting. I double checked this with the "time" command.
My program reports:
Completed in 45.86s
Time command reports:
real 0m22.837s
user 0m45.735s
sys 0m0.152s
Using my cellphone to time it, it completed in 23s (aka: the "real" time). "User" time is the sum of all threads, which would make sense since I'm using OpenMP. (You can read about it here: What do 'real', 'user' and 'sys' mean in the output of time(1)?)
So, why is clock() reporting in "user" time rather than "real" time? Is there a different function I should be using to calculate how long my program has been running?
As a side note, Windows' clock() works as expected and reports in "real" time.
clock() measures the CPU time the process used (as well as it can), which is what the user 0m45.735s figure above reports, per C11 7.27.2.1:
The clock function returns the implementation’s best approximation to the processor time used by the program since the beginning of an implementation-defined era related only to the program invocation.
and not wall clock time. Thus clock() reporting a time close to the user time that time reports is normal and standard-conforming.
To measure elapsed time, if you can assume POSIX, using clock_gettime is probably the best option; the standard function time() can also be used for that, but it is not very fine-grained.
I would suggest clock_gettime using CLOCK_MONOTONIC for the clock.
Depending on your specific system, that should give near-microsecond or better resolution, and it will not do funny things if (e.g.) someone sets the system time while your program is running.
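A minimal sketch of that, assuming POSIX (on older glibc you may need to link with -lrt):

#include <time.h>    // clock_gettime, clock_getres, CLOCK_MONOTONIC (POSIX)
#include <cstdio>

int main() {
    timespec res;
    clock_getres(CLOCK_MONOTONIC, &res);            // reported timer resolution
    std::printf("resolution: %ld ns\n", static_cast<long>(res.tv_nsec));

    timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    // ... work being measured ...
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    std::printf("elapsed: %.9f s\n", elapsed);
    return 0;
}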
I would suggest that for benchmarking inside OpenMP applications you use the portable OpenMP timing function omp_get_wtime(), which returns a double value with the seconds since some unspecified point in the past. Call it twice and subtract the return values to obtain the elapsed time. You can find out how precise time measurements are by calling omp_get_wtick(). It returns a double value of the timer resolution - values closer to 0.0 indicate more precise timers.
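A minimal sketch of that, assuming a compiler with OpenMP support (e.g. built with -fopenmp); the reduction loop is just placeholder work:

#include <omp.h>
#include <cstdio>

int main() {
    std::printf("omp_get_wtick(): %g s\n", omp_get_wtick());   // timer resolution

    double t0 = omp_get_wtime();

    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 100000000; ++i)
        sum += i * 1e-9;                     // placeholder parallel work

    double t1 = omp_get_wtime();
    std::printf("sum = %f, elapsed wall time = %f s\n", sum, t1 - t0);
    return 0;
}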

How to calculate an operation's time with microsecond precision

I want to measure the performance of a function with microsecond precision on the Windows platform.
Now Windows itself has millisecond granularity, so how can I achieve this?
I tried the following sample, but I am not getting correct results.
#include <windows.h>
#include <cstdio>

int main() {
    LARGE_INTEGER ticksPerSecond = {0};
    LARGE_INTEGER tick_1 = {0};
    LARGE_INTEGER tick_2 = {0};
    double uSec = 1000000;

    // Get the frequency
    QueryPerformanceFrequency(&ticksPerSecond);
    // Calculate ticks per microsecond
    double uFreq = ticksPerSecond.QuadPart / uSec;
    // Get counter before start of op
    QueryPerformanceCounter(&tick_1);
    // The op itself
    Sleep(10);
    // Get counter after op finished
    QueryPerformanceCounter(&tick_2);
    // And now the op time in uSec
    double diff = (tick_2.QuadPart / uFreq) - (tick_1.QuadPart / uFreq);
    std::printf("%f\n", diff);
    return 0;
}
Run the operation in a loop a million times or so and divide the result by that number. That way you'll get the average execution time over that many executions. Timing one (or even a hundred) executions of a very fast operation is very unreliable, due to multitasking and whatnot.
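A rough sketch of that approach with QueryPerformanceCounter (operation() is a placeholder for the code being measured; note that an empty placeholder may be optimized away, so put real work in it):

#include <windows.h>
#include <cstdio>

void operation() { /* placeholder for the code being measured */ }

int main() {
    const int N = 1000000;
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);
    for (int i = 0; i < N; ++i)
        operation();
    QueryPerformanceCounter(&t1);

    double totalUs = (t1.QuadPart - t0.QuadPart) * 1e6 / freq.QuadPart;
    std::printf("average per call: %.3f microseconds\n", totalUs / N);
    return 0;
}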
compile it
look at the assembler output
count the number of each instruction in your function
apply the cycles per instruction on your target processor
end up with a cycle count
multiply by the clock speed you are running at
apply arbitrary scaling factors to account for cache misses and branch mis-predictions lol
(man I am so going to get down-voted for this answer)
No, you are probably getting an accurate result; QueryPerformanceCounter() works well for timing short intervals. What's wrong is your expectation of the accuracy of Sleep(). It has a resolution of 1 millisecond, but its accuracy is far worse: no better than about 15.625 milliseconds on most Windows machines.
To get it anywhere close to 1 millisecond, you'll have to call timeBeginPeriod(1) first. That probably will improve the match, ignoring the jitter you'll get from Windows being a multi-tasking operating system.
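A sketch of what that could look like, assuming you link against winmm.lib (on older SDKs the declarations are in mmsystem.h rather than timeapi.h):

#include <windows.h>
#include <timeapi.h>   // timeBeginPeriod / timeEndPeriod; link with winmm.lib
#include <cstdio>

int main() {
    timeBeginPeriod(1);               // request 1 ms scheduler/timer resolution

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    Sleep(10);
    QueryPerformanceCounter(&t1);

    timeEndPeriod(1);                 // always undo the request when done

    double ms = (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart;
    std::printf("Sleep(10) actually took %.3f ms\n", ms);
    return 0;
}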
If you're doing this for offline profiling, a very simple way is to run the function 1000 times, measure to the closest millisecond and divide by 1000.
To get finer resolution than 1 ms, you will have to consult your OS documentation. There may be APIs to get timer resolution in microsecond resolution. If so, run your application many times and take the averages.
I like Matti Virkkunen's answer. Check the time, call the function a large number of times, check the time when you finish, and divide by the number of times you called the function. He did mention you might be off due to OS interrupts. You might vary the number of times you make the call and see a difference. Can you raise the priority of the process? Can you get all the calls to happen within a single OS time slice?
Since you don't know when the OS might swap you out, you can put this all inside a larger loop to do the whole measurement a large number of times, and save the smallest number as that is the one that had the fewest OS interrupts. This still may be greater than the actual time for the function to execute because it may still contain some OS interrupts.
Sanjeet,
It looks (to me) like you're doing this exactly right. QueryPerformanceCounter is a perfectly good way to measure short periods of time with a high degree of precision. If you're not seeing the result you expected, it's most likely because the sleep isn't sleeping for the amount of time you expected it to! However, it is likely being measured correctly.
I want to go back to your original question about how to measure the time on windows with microsecond precision. As you already know, the high performance counter (i.e. QueryPerformanceCounter) "ticks" at the frequency reported by QueryPerformanceFrequency. That means that you can measure time with precision equal to:
1/frequency seconds
On my machine, QueryPerformanceFrequency reports 2337910 (counts/sec). That means that my computer's QPC can measure with precision 4.277e-7 seconds, or 0.427732 microseconds. That means that the smallest bit of time I can measure is 0.427732 microseconds. This, of course, gives you the precision that you originally asked for :) Your machine's frequency should be similar, but you can always do the math and check it.
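If you want to check this on your own machine, a tiny sketch:

#include <windows.h>
#include <cstdio>

int main() {
    LARGE_INTEGER freq;
    QueryPerformanceFrequency(&freq);
    std::printf("QPC frequency: %lld counts/s\n", static_cast<long long>(freq.QuadPart));
    std::printf("one tick = %.6f microseconds\n", 1e6 / freq.QuadPart);
    return 0;
}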
Or, on POSIX systems, you can use gettimeofday(), which gives you a timeval struct, i.e. a timestamp with microsecond resolution.