`clock()` gives usual clocks instead of CPU clocks

`clock()` gives usual clocks instead of CPU clocks - c++

I used to take clock() to get CPU time of my algorithms. However, it does not seem to work anymore. I have a Windows 10 VM with 8 CPUs as also seen in the resource monitor.
Now, I measure time like this:
auto startTime = std::chrono::high_resolution_clock::now();
auto startClocks = std::clock();
// some code with TBB that uses multiple threads
auto endTime = std::chrono::high_resolution_clock::now();
auto endClocks = std::clock();
auto duration = endTime - startTime;
auto clockDuration = endClocks - startClocks;
auto durationSeconds = static_cast<double>(std::chrono::duration_cast<std::chrono::milliseconds>(duration).count()) / 1000.;
auto durationCpuSeconds = 1. * clockDuration / CLOCKS_PER_SEC;
The TBB part surely works, as I see in the resource monitor of Windows that all CPUs work with 100%. If I start some endless loop without parallelization, CPU usage is only 12.5% as expected.
However, durationSeconds and durationCpuSeconds are exactly the same...
I have measured the time with my watch and the result is the wall time. So, clock() clearly does not show the CPU time which should be substantially higher with 8 CPUs working 100% in parallel. Is clock() not reliable or am I missing something here?

Yeah, it's broken on Windows.
The clock function tells how much wall-clock time has passed since the CRT initialization during process start. Note that this function does not strictly conform to ISO C, which specifies net CPU time as the return value. To obtain CPU times, use the Win32 GetProcessTimes function.
(from Microsoft's clock docs)

Related

Time taken between two points in code independent of system clock CPP Linux

I need to find the time taken to execute a piece of code, and the method should be independent of system time, ie chrono and all wouldn't work.
My usecse looks somewhat like this.
int main{
//start
function();
//end
time_take = end - start;
}
I am working in an embedded platform that doesn't have the right time at the start-up. In my case, the start of funcion happens before actual time is set from ntp server and end happens after the exact time is obtained. So any method that compares the time difference between two points wouldn't work. Also, number of CPU ticks wouldn't work for me since my programme necessarily be running actively throughout.
I tried the conventional methods and they didn't work for me.

On Linux clock_gettime() has an option to return the the current CLOCK_MONOTONIC, which is unaffected by system time changes. Measuring the CLOCK_MONOTONIC at the beginning and the end, and then doing your own math to subtract the two values, will measure the elapsed time ignoring any system time changes.

If you don't want to dip down to C-level abstractions, <chrono> has this covered for you with steady_clock:
int main{
//start
auto t0 = std::chrono::steady_clock::now();
function();
auto t1 = std::chrono::steady_clock::now();
//end
auto time_take = end - start;
}
steady_clock is generally a wrapper around clock_gettime used with CLOCK_MONOTONIC except is portable across all platforms. I.e. some platforms don't have clock_gettime, but do have an API for getting a monotonic clock time.
Above the type of take_time will be steady_clock::duration. On all platforms I'm aware of, this type is an alias for nanoseconds. If you want an integral count of nanoseconds you can:
using namespace std::literals;
int64_t i = time_take/1ns;
The above works on all platforms, even if steady_clock::duration is not nanoseconds.
The minor advantage of <chrono> over a C-level API is that you don't have to deal with computing timespec subtraction manually. And of course it is portable.

C++ chrono timers go 1000x slower in a thread

My setup
iOS13.2
Xcode 11.2
I'm implementing a simple timer to count seconds by calendar/wall time.
From many SO tips, I tried various std::chrono:: timers, such as
system_clock
steady_clock
They seem to behave correctly under a single-threaded program.
But in a production code, when I use the timer in a background thread, then it falls apart,
meaning that the duration readings from calling the timer functions are way off.
m_thread = std::thread([&, this]() {
auto start = std::chrono::system_clock::now(); // a persistent state initialized at the beginning of the run.
while (true) {
auto end = std::chrono::system_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
auto isOutdated = duration > 1000;
if (isOutdated) {
print("we are outdated.").
}
}
});
duration seems to be almost always 0;
gettimeofday() works slightly better, i.e. it actually moves, but at a rate 1000x slower than wall time.
I had thought that the chrono class counts all kinds of times, but it seems that I have the wrong expectation how it works.
What am I missing?
UPDATE
Forgot to say, I have 2 more threads going at the same time. Could thread preemption affect this?
UPDATE 2
I tried a few things and now the program behaves as expected, but this actually drove me mad how it happened in the first place.
Things I did
Gradually increase the timeout threshold from 1 to 3000, each time the whole program gets recompiled. I found that when I lower the threshold, the program actually gets the duration right.
Try with gettimeofday() first, which consistently shows numbers 1000x slower, then switch back to system_clock.
Disable some logging to avoid performance hit. I use a thirdparty thread-safe logging lib, which writes to a log file and outputs to device syslog at the same time.
Right now I can finally see the correct duration. NO change in the code logic. What a bizarre experience!

Does steady_clock have only 10ms resolution on windows/cygwin?

I have suprising observation that steady_clock gives poor 10ms resolution when measuring durations. I compile for windows under cygwin. Is it sad true or am I doing sth wrong?
auto start = std::chrono::steady_clock::now();
/*...*/
auto end = std::chrono::steady_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::microseconds>
(end - start).count();
Result is 10000,20000 etc.

The resolution of std::steady_clock is implementation dependent, and you shouldn't rely on a precise minimum duration. It varies across platforms/compiler implementations.
From http://cppreference.com:
Class std::chrono::steady_clock represents a monotonic clock. The time
points of this clock cannot decrease as physical time moves forward.
This clock is not related to wall clock time, and is best suitable for
measuring intervals.
Related: Difference between std::system_clock and std::steady_clock?
If you don't care about monotonicity (i.e., you don't care if someone changes the wall clock while your program is running), you're probably better off with a
std::high_resolution_clock. (the latter is still implementation-dependent)

Measuring the runtime of a C++ code?

I want to measure the runtime of my C++ code. Executing my code takes about 12 hours and I want to write this time at the end of execution of my code. How can I do it in my code?
Operating system: Linux

If you are using C++11 you can use system_clock::now():
auto start = std::chrono::system_clock::now();
/* do some work */
auto end = std::chrono::system_clock::now();
auto elapsed = end - start;
std::cout << elapsed.count() << '\n';
You can also specify the granularity to use for representing a duration:
// this constructs a duration object using milliseconds
auto elapsed =
std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
// this constructs a duration object using seconds
auto elapsed =
std::chrono::duration_cast<std::chrono::seconds>(end - start);
If you cannot use C++11, then have a look at chrono from Boost.
The best thing about using such a standard libraries is that their portability is really high (e.g., they both work in Linux and Windows). So you do not need to worry too much if you decide to port your application afterwards.
These libraries follow a modern C++ design too, as opposed to C-like approaches.
EDIT: The example above can be used to measure wall-clock time. That is not, however, the only way to measure the execution time of a program. First, we can distinct between user and system time:
User time: The time spent by the program running in user space.
System time: The time spent by the program running in system (or kernel) space. A program enters kernel space for instance when executing a system call.
Depending on the objectives it may be necessary or not to consider system time as part of the execution time of a program. For instance, if the aim is to just measure a compiler optimization on the user code then it is probably better to leave out system time. On the other hand, if the user wants to determine whether system calls are a significant overhead, then it is necessary to measure system time as well.
Moreover, since most modern systems are time-shared, different programs may compete for several computing resources (e.g., CPU). In such a case, another distinction can be made:
Wall-clock time: By using wall-clock time the execution of the program is measured in the same way as if we were using an external (wall) clock. This approach does not consider the interaction between programs.
CPU time: In this case we only count the time that a program is actually running on the CPU. If a program (P1) is co-scheduled with another one (P2), and we want to get the CPU time for P1, this approach does not include the time while P2 is running and P1 is waiting for the CPU (as opposed to the wall-clock time approach).
For measuring CPU time, Boost includes a set of extra clocks:
process_real_cpu_clock, captures wall clock CPU time spent by the current process.
process_user_cpu_clock, captures user-CPU time spent by the current process.
process_system_cpu_clock, captures system-CPU time spent by the current process. A tuple-like class process_cpu_clock, that captures real, user-CPU, and system-CPU process times together.
A thread_clock thread steady clock giving the time spent by the current thread (when supported by a platform).
Unfortunately, C++11 does not have such clocks. But Boost is a wide-used library and, probably, these extra clocks will be incorporated into C++1x at some point. So, if you use Boost you will be ready when the new C++ standard adds them.
Finally, if you want to measure the time a program takes to execute from the command line (as opposed to adding some code into your program), you may have a look at the time command, just as #BЈовић suggests. This approach, however, would not let you measure individual parts of your program (e.g., the time it takes to execute a function).

Use std::chrono::steady_clock and not std::chrono::system_clock for measuring run time in C++11. The reason is (quoting system_clock's documentation):
on most systems, the system time can be adjusted at any moment
while steady_clock is monotonic and is better suited for measuring intervals:
Class std::chrono::steady_clock represents a monotonic clock. The time
points of this clock cannot decrease as physical time moves forward.
This clock is not related to wall clock time, and is best suitable for
measuring intervals.
Here's an example:
auto start = std::chrono::steady_clock::now();
// do something
auto finish = std::chrono::steady_clock::now();
double elapsed_seconds = std::chrono::duration_cast<
std::chrono::duration<double> >(finish - start).count();
A small practical tip: if you are measuring run time and want to report seconds std::chrono::duration_cast<std::chrono::seconds> is rarely what you need because it gives you whole number of seconds. To get the time in seconds as a double use the example above.

You can use time to start your program. When it ends, it print nice time statistics about program run. It is easy to configure what to print. By default, it print user and CPU times it took to execute the program.
EDIT : Take a note that every measure from the code is not correct, because your application will get blocked by other programs, hence giving you wrong values*.
* By wrong values, I meant it is easy to get the time it took to execute the program, but that time varies depending on the CPUs load during the program execution. To get relatively stable time measurement, that doesn't depend on the CPU load, one can execute the application using time and use the CPU as the measurement result.

I used something like this in one of my projects:
#include <sys/time.h>
struct timeval start, end;
gettimeofday(&start, NULL);
//Compute
gettimeofday(&end, NULL);
double elapsed = ((end.tv_sec - start.tv_sec) * 1000)
+ (end.tv_usec / 1000 - start.tv_usec / 1000);
This is for milliseconds and it works both for C and C++.

This is the code I use:
const auto start = std::chrono::steady_clock::now();
// Your code here.
const auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed = end - start;
std::cout << "Time in seconds: " << elapsed.count() << '\n';
You don't want to use std::chrono::system_clock because it is not monotonic! If the user changes the time in the middle of your code your result will be wrong - it might even be negative. std::chrono::high_resolution_clock might be implemented using std::chrono::system_clock so I wouldn't recommend that either.
This code also avoids ugly casts.

If you wish to print the measured time with printf(), you can use this:
auto start = std::chrono::system_clock::now();
/* measured work */
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
printf("Time = %lld ms\n", static_cast<long long int>(elapsed.count()));

You could also try some timer classes that start and stop automatically, and gather statistics on the average, maximum and minimum time spent in any block of code, as well as the number of calls. These cxx-rtimer classes are available on GitHub, and offer support for using std::chrono, clock_gettime(), or boost::posix_time as a back-end clock source.
With these timers, you can do something like:
void timeCriticalFunction() {
static rtimers::cxx11::DefaultTimer timer("expensive");
auto scopedStartStop = timer.scopedStart();
// Do something costly...
}
with timing stats written to std::cerr on program completion.

Profiling C++ threads with clock()

I am trying to measure how gcc threads perform on my system. I've written some very simple measurement code which is something like this...
start = clock();
for(int i=0; i < thread_iters; i++) {
pthread_mutex_lock(dataMutex);
data++;
pthread_mutex_unlock(dataMutex);
}
end = clock();
I do the usual subtract and div by CLOCKS_PER_SEC to get an elapsed time of about 2 seconds for 100000000 iterations. I then change the profiling code slightly so I am measuring the individual time for each mutex_lock/unlock call.
for(int i=0; i < thread_iters; i++) {
start1 = clock();
pthread_mutex_lock(dataMutex);
end1 = clock();
lock_time+=(end1-start1);
data++;
start2 = clock();
pthread_mutex_unlock(dataMutex);
end2 = clock();
unlock_time+=(end2-start2)
}
The times I get for the same number of iterations are
lock: ~27 seconds
unlock: ~27 seconds
I get why the total time for the program increases, more timer calls in the loop. But the time for the system calls should still add up to less than 2 seconds. Can someone help me figure out where I went wrong? Thanks!

The clock calls also measure the time it takes to call clock and return from it. This introduces a bias into the measurement. I.e. somewhere deep inside the clock function it takes a sample. But then before running your code, it has to return from deep inside clock. And then when you take the end measurement, before that time sample can be taken, clock has to be called and control has to pass somewhere deep inside that function where it actually obtains the time. So you're including all that overhead as part of the measurement.
You must find out how much time elapses between consecutive clock calls (by taking some samples over many pairs of clock calls to get an accurate average). That gives you a baseline bias: how much time does it take to execute nothing at all between two clock samples. You then carefully subtract your bias from the measurements.
But calls to clock can disturb the performance so that you're not getting an accurate answer. Calls to the kernel to get the clock are disturbing your L1 cache and instruction cache. For fine grained measurements like this, it is better to drop down to inline assembly and read a cycle counting register from the CPU.
clock is best used as you have it in your first example: take samples around something that executes for many iterations, and then divide by the number of iterations to estimate the single-iteration time.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

`clock()` gives usual clocks instead of CPU clocks - c++

Related

Time taken between two points in code independent of system clock CPP Linux

C++ chrono timers go 1000x slower in a thread

Does steady_clock have only 10ms resolution on windows/cygwin?

Measuring the runtime of a C++ code?

Profiling C++ threads with clock()

Categories

Resources