std::clock() and CLOCKS_PER_SEC - C++

I want to use a timer function in my program. Following the example at How to use clock() in C++, my code is:
#include <ctime>
#include <iostream>

int main()
{
    std::clock_t start = std::clock();
    while (true)
    {
        double time = (std::clock() - start) / (double)CLOCKS_PER_SEC;
        std::cout << time << std::endl;
    }
    return 0;
}
On running this, it begins to print out numbers. However, it takes about 15 seconds for that number to reach 1. Why does it not take 1 second for the printed number to reach 1?

Actually it is a combination of what has been posted. Basically, since your program is running in a tight loop, CPU time should increase just as fast as wall-clock time.
But since your program is writing to stdout and the terminal has only limited buffer space, your program will block whenever that buffer is full, until the terminal has had enough time to print more of the generated output.
This is of course much more expensive, CPU-wise, than generating the strings from the clock values, so most of the CPU time is spent in the terminal and graphics driver. It seems your system needs about 14 times as much CPU power to output those timestamps as to generate the strings in the first place.

std::clock returns CPU time, not wall time. That means the number of CPU-seconds used, not the elapsed time. If your program uses only 20% of the CPU, then the CPU-seconds will increase at only 20% of the rate of wall-clock seconds.

std::clock
Returns the approximate processor time used by the process since the beginning of an implementation-defined era related to the program's execution. To convert result value to seconds divide it by CLOCKS_PER_SEC.
So it will not return a second until the program has used an actual second of CPU time.
If you want to deal with actual (wall-clock) time, I suggest you use the clocks provided by <chrono>, like std::steady_clock or std::high_resolution_clock.
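A minimal sketch of the difference, assuming a C++11 compiler: sleeping for a second uses almost no CPU, so std::clock() barely moves while std::steady_clock advances by roughly a second.
#include <chrono>
#include <ctime>
#include <iostream>
#include <thread>
int main()
{
    std::clock_t c0 = std::clock();
    auto t0 = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::seconds(1)); // uses almost no CPU
    std::clock_t c1 = std::clock();
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "CPU seconds:  " << (c1 - c0) / (double)CLOCKS_PER_SEC << '\n';
    std::cout << "Wall seconds: " << std::chrono::duration<double>(t1 - t0).count() << '\n';
}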

Related

Execution time inconsistency in a program with high priority in the scheduler using RT Kernel

Problem
We are trying to implement a program that sends commands to a robot in a given cycle time. Thus this program should be a real-time application. We set up a PC with a preempted RT Linux kernel and are launching our programs with chrt -f 98 or chrt -rr 99 to define the scheduling policy and priority. Loading of the kernel and launching of the program seem to be fine and to work (see details below).
Now we were measuring the time (CPU ticks) it takes our program to be computed. We expected this time to be constant with very little variation. What we measured, though, were quite significant differences in computation time. Of course, we thought this could be undefined behavior in our rather complex program, so we created a very basic program and measured the time as well. The behavior was similarly bad.
Question
Why are we not measuring a (close to) constant computation time even for our basic program?
How can we solve this problem?
Environment Description
First of all, we installed an RT Linux Kernel on the PC using this tutorial. The main characteristics of the PC are:
CPU: Intel(R) Atom(TM) Processor E3950 @ 1.60GHz with 4 cores
Memory RAM: 8 GB
Operating System: Ubuntu 20.04.1 LTS
Kernel: Linux 5.9.1-rt20 SMP PREEMPT_RT
Architecture: x86-64
Tests
The first time we detected this problem was when we were measuring the time it takes to execute this "complex" program with a single thread. We did a few tests with this program but also with a simpler one:
The CPU execution times
The wall time (the real-world time)
The difference (Wall time - CPU time) between them and the ratio (CPU time / Wall time).
We also did a latency test on the PC.
Latency Test
For this one, we followed this tutorial, and these are the results:
Latency Test Generic Kernel
Latency Test RT Kernel
The processes are shown in htop with a priority of RT
Test Program - Complex
We called the function multiple times in the program and measured the time each call takes. The results of the two tests are:
From this we observed that:
The first execution (around 0.28 ms) always takes longer than the second one (around 0.18 ms), but most of the time it is not the longest iteration.
The mode is around 0.17 ms.
For those that take 17 ms the difference is usually 0 and the ratio 1. Although this is not exclusive to this time. For these, it seems like only 1 CPU is being used and it is saturated (there is no waiting time).
When the difference is not 0, it is usually negative. This, from what we have read here and here, is because more than 1 CPU is being used.
Test Program - Simple
We did the same test but this time with a simpler program:
#include <vector>
#include <iostream>
#include <time.h>

int main(int argc, char** argv) {
    int iterations = 5000;
    double a = 5.5;
    double b = 5.5;
    double c = 4.5;
    std::vector<double> wallTime(iterations, 0);
    std::vector<double> cpuTime(iterations, 0);
    struct timespec beginWallTime, endWallTime, beginCPUTime, endCPUTime;
    std::cout << "Iteration | WallTime | cpuTime" << std::endl;
    for (unsigned int i = 0; i < iterations; i++) {
        // Start measuring time
        clock_gettime(CLOCK_REALTIME, &beginWallTime);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &beginCPUTime);
        // Function
        a = b + c + i;
        // Stop measuring time and calculate the elapsed time
        clock_gettime(CLOCK_REALTIME, &endWallTime);
        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endCPUTime);
        wallTime[i] = (endWallTime.tv_sec - beginWallTime.tv_sec) + (endWallTime.tv_nsec - beginWallTime.tv_nsec)*1e-9;
        cpuTime[i] = (endCPUTime.tv_sec - beginCPUTime.tv_sec) + (endCPUTime.tv_nsec - beginCPUTime.tv_nsec)*1e-9;
        std::cout << i << " | " << wallTime[i] << " | " << cpuTime[i] << std::endl;
    }
    return 0;
}
Final Thoughts
We understand that:
If the ratio == number of CPUs used, they are saturated and there is no waiting time.
If the ratio < number of CPUs used, it means that there is some waiting time (theoretically we should only be using 1 CPU, although in practice we use more).
Of course, we can give more details.
Thanks a lot for your help!
Your function will almost certainly be optimized away, so you are just measuring how long it takes to read the clocks. And as you can see, that doesn't take very long, with some exceptions:
The very first time you run the code (unless you have just compiled it), the pages need to be loaded from disk. If you are unlucky, the code spans pages and you include the loading of the next page in the measured time. Quite unlikely given the code size.
On the first loop iteration, the code and any data need to be loaded into cache, so that takes longer to execute. The branch predictor might also need a few iterations to predict the loop right, so the second and third iterations might be slightly longer too.
For everything else I think you can blame scheduling:
an IRQ happens but nothing gets rescheduled
the process gets paused while another process runs
the process gets moved to another CPU thread leaving the caches hot
the process gets moved to another CPU core making L1 cache cold but leaving L2/L3 caches hot (if your L2 is shared)
the process gets moved to a CPU on another socket making L1/L2 caches cold but L3 cache hot (if L3 is shared)
You can do little about IRQs. Some you can pin to specific cores, but others are simply essential (like the timer interrupt for the scheduler itself). You kind of just have to live with that.
But you can pin your program to a specific CPU and pin everything else to the other cores, basically reserving that core for the real-time code. I guess you would have to use cgroups for this, to keep everything else off the chosen core. You might still get some kernel threads running on the reserved core; nothing you can do about that. But that should eliminate most of the large execution times.
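As a rough illustration of the pinning part, here is a minimal sketch using sched_setaffinity, assuming Linux/glibc; the choice of core 3 is arbitrary, and keeping everything else off that core still needs cgroups (cpusets) or the isolcpus kernel parameter as discussed above.
#include <sched.h>   // sched_setaffinity, CPU_SET (glibc; g++ defines _GNU_SOURCE by default)
#include <cstdio>
int main()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);                                   // arbitrary example core
    if (sched_setaffinity(0, sizeof(set), &set) != 0) { // 0 = the calling process
        std::perror("sched_setaffinity");
        return 1;
    }
    // ... the real-time loop runs on core 3 from here on ...
    return 0;
}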

Why clock() does not work on the cluster machine

I want to get the running time of part of my code.
my C++ code is like:
...
clock_t t1 = clock();
/*
Here is my core code.
*/
clock_t t2 = clock();
cout <<"Running time: "<< (1000.0 * (t2 - t1)) / CLOCKS_PER_SEC << "ms" << endl;
...
This code works well on my laptop (openSUSE, g++ and clang++, Core i5).
But it does not work well on the cluster in the department (Ubuntu, g++, AMD Opteron and Intel Xeon).
I always get an integer running time, like 0ms, 10ms or 20ms.
What causes that? Why? Thanks!
Clocks are not guaranteed to be exact down to ~10^-44 seconds (Planck time); they often have a minimal resolution. The Linux man page implies this with:
The clock() function returns an approximation of processor time used by the program.
and so does the ISO standard C11 7.27.2.1 The clock function /3:
The clock function returns the implementation’s best approximation of ...
and in 7.27.1 Components of time /4:
The range and precision of times representable in clock_t and time_t are implementation-defined.
From your (admittedly limited) sample data, it looks like the minimum resolution of your cluster machines is on the order of 10ms.
In any case, you have several possibilities if you need a finer resolution.
First, find a (probably implementation-specific) means of timing things more accurately.
Second, don't do it once. Do it a thousand times in a tight loop and then just divide the time taken by 1000. That should roughly increase your resolution a thousand-fold.
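A minimal sketch of that second idea against the snippet above; core_code() is just a placeholder for whatever the measured section actually does.
#include <time.h>
#include <iostream>
void core_code() { /* placeholder for the real work being timed */ }
int main()
{
    const int runs = 1000;
    clock_t t1 = clock();
    for (int i = 0; i < runs; ++i)
        core_code();
    clock_t t2 = clock();
    // The coarse clock() granularity is spread over 1000 runs,
    // so the per-run figure is roughly 1000 times finer.
    std::cout << "Per-run time: "
              << (1000.0 * (t2 - t1)) / CLOCKS_PER_SEC / runs << " ms" << std::endl;
    return 0;
}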
Thirdly, think about the implication that your code only takes 50ms at the outside. Unless you have a pressing need to execute it more than twenty times a second (assuming you have no other code to run), it may not be an issue.
On that last point, think of things like "What's the longest a user will have to wait before they get annoyed?". The answer to that would vary but half a second might be fine in most situations.
Since 50ms code could run ten times over during that time, you may want to ignore it. You'd be better off concentrating on code that has a clearly larger impact.

Why is the CPU time different with every execution of this program?

I have a hard time understanding processor time. The result of this program:
#include <iostream>
#include <chrono>
#include <ctime>

// the function f() does some time-consuming work
void f()
{
    volatile long double d;
    int size = 10000;
    for(int n=0; n<size; ++n)
        for(int m=0; m<size; ++m)
            d = n*m;
}

int main()
{
    std::clock_t start = std::clock();
    f();
    std::clock_t end = std::clock();
    std::cout << "CPU time used: "
              << (end - start)
              << "\n";
}
Seems to fluctuate randomly between 210 000, 220 000 and 230 000. At first I was puzzled by these discrete values. Then I found out that std::clock() returns only an approximate processor time, so presumably the value returned by std::clock() is rounded to a multiple of 10 000. This would also explain why the maximum difference between the CPU times is 20 000 (10 000 of rounding error from the first call to std::clock() and 10 000 from the second).
But if I change to int size = 40000; in the body of f(), I get fluctuations in the range of 3 400 000 to 3 500 000, which cannot be explained by rounding.
From what I read about the clock rate, on Wikipedia:
The CPU requires a fixed number of clock ticks (or clock cycles) to execute each instruction. The faster the clock, the more instructions the CPU can execute per second.
That is, if the program is deterministic (which I hope mine is), the CPU time needed to finish should be:
Always the same
Slightly higher than the number of instructions carried out
My experiments show neither, since my program needs to carry out at least 3 * size * size instructions. Could you please explain what I am doing wrong?
First, the statement you quote from Wikipedia is simply false. It might have been true 20 years ago (but not always, even then), but it is totally false today. There are many things which can affect your timings:
The first: if you're running on Windows, clock is broken, and totally unreliable. It returns the difference in elapsed time, not CPU time. And elapsed time depends on all sorts of other things the processor might be doing.
Beyond that: things like cache misses have a very significant impact on time. And whether a particular piece of data is in the cache or not can depend on whether your program was interrupted between the last access and this one.
In general, anything less than 10% can easily be due to the caching issues. And I've seen differences of a factor of 10 under Windows, depending on whether there was a build running or not.
You don't state what hardware you're running the binary on.
Does it have an interrupt-driven CPU?
Is it a multitasking operating system?
You're confusing the cycle time of the CPU (the CPU clock, as Wikipedia calls it) with the time it takes to execute a particular piece of code from start to end, alongside all the other stuff the poor CPU has to do at the same time.
Also: is all your executing code in level 1 cache, or is some in level 2, in main memory, or on disk? And what about the next time you run it?
Your program is not deterministic, because it uses library and system functions which are not deterministic.
As a particular example, when you allocate memory this is virtual memory, which must be mapped to physical memory. Although this is a system call, running kernel code, it takes place on your thread and will count against your clock time. How long it takes to do this will depend on what the overall memory allocation situation is.
The CPU time is indeed "fixed" for a given set of circumstances. However, in a modern computer, there are other things happening in the system which interfere with the execution of your code. It may be that caches are being wiped out when your email software wakes up to check if there are any new emails for you, or when the HP printer software checks for updates, or when the antivirus software decides to run for a little bit checking if your memory contains any viruses, etc., etc.
Part of this is also caused by the problem that CPU time accounting in any system is not 100% accurate - it works on "clock-ticks" and similar things, so the time used by, for example, an interrupt to service a network packet coming in, or the hard disk servicing interrupt, or the timer interrupt to say "another millisecond ticked by" - these all get accounted to "the currently running process". Assuming this is Windows, there is a further "feature": for historical and other reasons, std::clock() simply returns the time now, not the time actually used by your process. So for example:
t = clock();
cin >> x;
t = clock() - t;
would leave t with a time of 10 seconds if it took ten seconds to input the value of x, even though 9.999 of those ten seconds were spent in the idle process, not your program.
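On a POSIX system, if CPU time (rather than elapsed time) is what you actually want, a minimal sketch with clock_gettime and CLOCK_PROCESS_CPUTIME_ID (the same clock used in the simple test program further up) sidesteps that problem:
#include <time.h>
#include <iostream>
int main()
{
    timespec begin, end;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &begin);
    // ... work to measure; time spent blocked (e.g. waiting on cin) is not counted ...
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
    double cpuSeconds = (end.tv_sec - begin.tv_sec)
                      + (end.tv_nsec - begin.tv_nsec) * 1e-9;
    std::cout << "CPU time: " << cpuSeconds << " s\n";
    return 0;
}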

How to calculate and print clock_t time roughly

I am timing how long it takes to do three different types of searches: sequential, recursive binary, and iterative binary. I have those in place, and the code does iterate through and finish the search. My problem is that when I time them all, I get 0 for all of them every time, even if I make an array of 100,000 and have it search for something not in the array. If I set a break point in the search it obviously makes the time longer, and it gives me a reasonable time that I can work with. But otherwise it is always 0. Here is my code; it is similar for all three search timers.
clock_t recStart = clock();
mySearch.recursiveSearch(SEARCH_INT);
clock_t recEnd = clock();
clock_t recDiff = recEnd - recStart;
double recClockTime = (double)recDiff/(double)CLOCKS_PER_SEC;
cout << recClockTime << endl;
cout << CLOCKS_PER_SEC << endl;
cout << recClockTime << endl;
For the last two I get 1000 and 0.
Am I doing something wrong here? Or is it in my search Object?
clock() is not an accurate timer, and it just doesn't work well for timing short intervals.
C says clock returns the implementation's best approximation to the processor time used by the program since the beginning of an implementation-defined era related only to the program invocation.
If between two successive clock calls your program takes less time than one unit of the clock function, you could get 0. POSIX defines the unit via CLOCKS_PER_SEC as 1000000 (the unit is then 1 microsecond).
(http://pubs.opengroup.org/onlinepubs/009604499/functions/clock.html)
To measure clock cycles on x86/x64 you can use assembly to retrieve the count of the CPU Time Stamp Counter register via rdtsc (which can be done with inline assembly or an intrinsic). Note that it returns a time stamp, not the number of seconds elapsed, so you need to retrieve the CPU frequency as well.
However, the best way to get accurate time in seconds depends on your platform.
To sum up, it's virtually impossible to calculate and print clock_t time in seconds accurately. You might want to see this on Stack Overflow to find a better approach (if accuracy is a top priority).
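A minimal sketch of the rdtsc approach, assuming x86/x86-64 and GCC or Clang (the __rdtsc() intrinsic from <x86intrin.h>); the result is a raw cycle count, not seconds, and for precise measurements a serializing instruction may also be needed because of out-of-order execution.
#include <x86intrin.h>
#include <cstdint>
#include <iostream>
int main()
{
    std::uint64_t begin = __rdtsc();       // read the Time Stamp Counter
    volatile double x = 0.0;
    for (int i = 0; i < 1000; ++i)         // some work to measure
        x = x + i * 0.5;
    std::uint64_t end = __rdtsc();
    // Convert to seconds only if you know the TSC frequency of your CPU.
    std::cout << "TSC delta: " << (end - begin) << " cycles\n";
    return 0;
}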
clock() just doesn't have enough resolution - here is one good discussion/blog on that topic
http://www.guyrutenberg.com/2007/09/10/resolution-problems-in-clock/
I think there are two options: either use clock_gettime or, even better, have you considered using OProfile or CodeAnalyst?
I personally prefer to use tools - OProfile is good. I have not used CodeAnalyst before - and then there is Valgrind and gprof.
If you insist on using clock_gettime - please check this out
http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/
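For completeness, a minimal clock_gettime sketch using CLOCK_MONOTONIC (nanosecond-resolution wall time on a POSIX system):
#include <time.h>
#include <iostream>
int main()
{
    timespec begin, end;
    clock_gettime(CLOCK_MONOTONIC, &begin);
    // ... code to measure ...
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed = (end.tv_sec - begin.tv_sec)
                   + (end.tv_nsec - begin.tv_nsec) * 1e-9;
    std::cout << "Elapsed: " << elapsed << " s\n";
    return 0;
}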

Measuring the runtime of a C++ code?

I want to measure the runtime of my C++ code. Executing my code takes about 12 hours and I want to write this time at the end of execution of my code. How can I do it in my code?
Operating system: Linux
If you are using C++11 you can use system_clock::now():
auto start = std::chrono::system_clock::now();
/* do some work */
auto end = std::chrono::system_clock::now();
auto elapsed = end - start;
std::cout << elapsed.count() << '\n';
You can also specify the granularity to use for representing a duration:
// this constructs a duration object using milliseconds
auto elapsed =
    std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
// this constructs a duration object using seconds
auto elapsed =
    std::chrono::duration_cast<std::chrono::seconds>(end - start);
If you cannot use C++11, then have a look at chrono from Boost.
The best thing about using such standard libraries is that their portability is really high (e.g., they both work on Linux and Windows). So you do not need to worry too much if you decide to port your application afterwards.
These libraries follow a modern C++ design too, as opposed to C-like approaches.
EDIT: The example above can be used to measure wall-clock time. That is not, however, the only way to measure the execution time of a program. First, we can distinguish between user and system time:
User time: The time spent by the program running in user space.
System time: The time spent by the program running in system (or kernel) space. A program enters kernel space for instance when executing a system call.
Depending on the objectives it may be necessary or not to consider system time as part of the execution time of a program. For instance, if the aim is to just measure a compiler optimization on the user code then it is probably better to leave out system time. On the other hand, if the user wants to determine whether system calls are a significant overhead, then it is necessary to measure system time as well.
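If you want that user/system split without Boost, a minimal sketch on a POSIX system uses getrusage() (not part of <chrono>):
#include <sys/resource.h>
#include <iostream>
int main()
{
    // ... the work you want to measure ...
    rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    double user = usage.ru_utime.tv_sec + usage.ru_utime.tv_usec * 1e-6;
    double sys  = usage.ru_stime.tv_sec + usage.ru_stime.tv_usec * 1e-6;
    std::cout << "user: " << user << " s, system: " << sys << " s\n";
    return 0;
}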
Moreover, since most modern systems are time-shared, different programs may compete for several computing resources (e.g., CPU). In such a case, another distinction can be made:
Wall-clock time: By using wall-clock time the execution of the program is measured in the same way as if we were using an external (wall) clock. This approach does not consider the interaction between programs.
CPU time: In this case we only count the time that a program is actually running on the CPU. If a program (P1) is co-scheduled with another one (P2), and we want to get the CPU time for P1, this approach does not include the time while P2 is running and P1 is waiting for the CPU (as opposed to the wall-clock time approach).
For measuring CPU time, Boost includes a set of extra clocks:
process_real_cpu_clock, captures wall clock CPU time spent by the current process.
process_user_cpu_clock, captures user-CPU time spent by the current process.
process_system_cpu_clock, captures system-CPU time spent by the current process.
A tuple-like class process_cpu_clock, that captures real, user-CPU, and system-CPU process times together.
A thread_clock, a thread steady clock giving the time spent by the current thread (when supported by a platform).
Unfortunately, C++11 does not have such clocks. But Boost is a widely used library and, probably, these extra clocks will be incorporated into C++1x at some point. So, if you use Boost you will be ready when a new C++ standard adds them.
Finally, if you want to measure the time a program takes to execute from the command line (as opposed to adding some code into your program), you may have a look at the time command, just as @BЈовић suggests. This approach, however, would not let you measure individual parts of your program (e.g., the time it takes to execute a function).
Use std::chrono::steady_clock and not std::chrono::system_clock for measuring run time in C++11. The reason is (quoting system_clock's documentation):
on most systems, the system time can be adjusted at any moment
while steady_clock is monotonic and is better suited for measuring intervals:
Class std::chrono::steady_clock represents a monotonic clock. The time points of this clock cannot decrease as physical time moves forward. This clock is not related to wall clock time, and is best suitable for measuring intervals.
Here's an example:
auto start = std::chrono::steady_clock::now();
// do something
auto finish = std::chrono::steady_clock::now();
double elapsed_seconds = std::chrono::duration_cast<
    std::chrono::duration<double> >(finish - start).count();
A small practical tip: if you are measuring run time and want to report seconds std::chrono::duration_cast<std::chrono::seconds> is rarely what you need because it gives you whole number of seconds. To get the time in seconds as a double use the example above.
You can use time to start your program. When it ends, it prints nice time statistics about the program run. It is easy to configure what to print. By default, it prints the user and CPU times it took to execute the program.
EDIT: Note that any measurement taken from within the code is not exact, because your application will get blocked by other programs, hence giving you wrong values*.
* By wrong values, I mean that it is easy to get the time it took to execute the program, but that time varies depending on the CPU load during the program's execution. To get a relatively stable time measurement that doesn't depend on the CPU load, one can execute the application using time and use the reported CPU time as the measurement result.
I used something like this in one of my projects:
#include <sys/time.h>
struct timeval start, end;
gettimeofday(&start, NULL);
//Compute
gettimeofday(&end, NULL);
double elapsed = ((end.tv_sec - start.tv_sec) * 1000)
    + (end.tv_usec / 1000 - start.tv_usec / 1000);
This is for milliseconds and it works both for C and C++.
This is the code I use:
const auto start = std::chrono::steady_clock::now();
// Your code here.
const auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed = end - start;
std::cout << "Time in seconds: " << elapsed.count() << '\n';
You don't want to use std::chrono::system_clock because it is not monotonic! If the user changes the time in the middle of your code your result will be wrong - it might even be negative. std::chrono::high_resolution_clock might be implemented using std::chrono::system_clock so I wouldn't recommend that either.
This code also avoids ugly casts.
If you wish to print the measured time with printf(), you can use this:
auto start = std::chrono::system_clock::now();
/* measured work */
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
printf("Time = %lld ms\n", static_cast<long long int>(elapsed.count()));
You could also try some timer classes that start and stop automatically, and gather statistics on the average, maximum and minimum time spent in any block of code, as well as the number of calls. These cxx-rtimer classes are available on GitHub, and offer support for using std::chrono, clock_gettime(), or boost::posix_time as a back-end clock source.
With these timers, you can do something like:
void timeCriticalFunction() {
    static rtimers::cxx11::DefaultTimer timer("expensive");
    auto scopedStartStop = timer.scopedStart();
    // Do something costly...
}
with timing stats written to std::cerr on program completion.