I want to measure the runtime of my C++ code. Executing my code takes about 12 hours and I want to write this time at the end of execution of my code. How can I do it in my code?
Operating system: Linux
If you are using C++11 you can use system_clock::now():
auto start = std::chrono::system_clock::now();
/* do some work */
auto end = std::chrono::system_clock::now();
auto elapsed = end - start;
std::cout << elapsed.count() << '\n';
You can also specify the granularity to use for representing a duration:
// this constructs a duration object using milliseconds
auto elapsed =
std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
// this constructs a duration object using seconds
auto elapsed =
std::chrono::duration_cast<std::chrono::seconds>(end - start);
If you cannot use C++11, then have a look at chrono from Boost.
The best thing about using such a standard libraries is that their portability is really high (e.g., they both work in Linux and Windows). So you do not need to worry too much if you decide to port your application afterwards.
These libraries follow a modern C++ design too, as opposed to C-like approaches.
EDIT: The example above can be used to measure wall-clock time. That is not, however, the only way to measure the execution time of a program. First, we can distinct between user and system time:
User time: The time spent by the program running in user space.
System time: The time spent by the program running in system (or kernel) space. A program enters kernel space for instance when executing a system call.
Depending on the objectives it may be necessary or not to consider system time as part of the execution time of a program. For instance, if the aim is to just measure a compiler optimization on the user code then it is probably better to leave out system time. On the other hand, if the user wants to determine whether system calls are a significant overhead, then it is necessary to measure system time as well.
Moreover, since most modern systems are time-shared, different programs may compete for several computing resources (e.g., CPU). In such a case, another distinction can be made:
Wall-clock time: By using wall-clock time the execution of the program is measured in the same way as if we were using an external (wall) clock. This approach does not consider the interaction between programs.
CPU time: In this case we only count the time that a program is actually running on the CPU. If a program (P1) is co-scheduled with another one (P2), and we want to get the CPU time for P1, this approach does not include the time while P2 is running and P1 is waiting for the CPU (as opposed to the wall-clock time approach).
For measuring CPU time, Boost includes a set of extra clocks:
process_real_cpu_clock, captures wall clock CPU time spent by the current process.
process_user_cpu_clock, captures user-CPU time spent by the current process.
process_system_cpu_clock, captures system-CPU time spent by the current process. A tuple-like class process_cpu_clock, that captures real, user-CPU, and system-CPU process times together.
A thread_clock thread steady clock giving the time spent by the current thread (when supported by a platform).
Unfortunately, C++11 does not have such clocks. But Boost is a wide-used library and, probably, these extra clocks will be incorporated into C++1x at some point. So, if you use Boost you will be ready when the new C++ standard adds them.
Finally, if you want to measure the time a program takes to execute from the command line (as opposed to adding some code into your program), you may have a look at the time command, just as #BЈовић suggests. This approach, however, would not let you measure individual parts of your program (e.g., the time it takes to execute a function).
Use std::chrono::steady_clock and not std::chrono::system_clock for measuring run time in C++11. The reason is (quoting system_clock's documentation):
on most systems, the system time can be adjusted at any moment
while steady_clock is monotonic and is better suited for measuring intervals:
Class std::chrono::steady_clock represents a monotonic clock. The time
points of this clock cannot decrease as physical time moves forward.
This clock is not related to wall clock time, and is best suitable for
measuring intervals.
Here's an example:
auto start = std::chrono::steady_clock::now();
// do something
auto finish = std::chrono::steady_clock::now();
double elapsed_seconds = std::chrono::duration_cast<
std::chrono::duration<double> >(finish - start).count();
A small practical tip: if you are measuring run time and want to report seconds std::chrono::duration_cast<std::chrono::seconds> is rarely what you need because it gives you whole number of seconds. To get the time in seconds as a double use the example above.
You can use time to start your program. When it ends, it print nice time statistics about program run. It is easy to configure what to print. By default, it print user and CPU times it took to execute the program.
EDIT : Take a note that every measure from the code is not correct, because your application will get blocked by other programs, hence giving you wrong values*.
* By wrong values, I meant it is easy to get the time it took to execute the program, but that time varies depending on the CPUs load during the program execution. To get relatively stable time measurement, that doesn't depend on the CPU load, one can execute the application using time and use the CPU as the measurement result.
I used something like this in one of my projects:
#include <sys/time.h>
struct timeval start, end;
gettimeofday(&start, NULL);
//Compute
gettimeofday(&end, NULL);
double elapsed = ((end.tv_sec - start.tv_sec) * 1000)
+ (end.tv_usec / 1000 - start.tv_usec / 1000);
This is for milliseconds and it works both for C and C++.
This is the code I use:
const auto start = std::chrono::steady_clock::now();
// Your code here.
const auto end = std::chrono::steady_clock::now();
std::chrono::duration<double> elapsed = end - start;
std::cout << "Time in seconds: " << elapsed.count() << '\n';
You don't want to use std::chrono::system_clock because it is not monotonic! If the user changes the time in the middle of your code your result will be wrong - it might even be negative. std::chrono::high_resolution_clock might be implemented using std::chrono::system_clock so I wouldn't recommend that either.
This code also avoids ugly casts.
If you wish to print the measured time with printf(), you can use this:
auto start = std::chrono::system_clock::now();
/* measured work */
auto end = std::chrono::system_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
printf("Time = %lld ms\n", static_cast<long long int>(elapsed.count()));
You could also try some timer classes that start and stop automatically, and gather statistics on the average, maximum and minimum time spent in any block of code, as well as the number of calls. These cxx-rtimer classes are available on GitHub, and offer support for using std::chrono, clock_gettime(), or boost::posix_time as a back-end clock source.
With these timers, you can do something like:
void timeCriticalFunction() {
static rtimers::cxx11::DefaultTimer timer("expensive");
auto scopedStartStop = timer.scopedStart();
// Do something costly...
}
with timing stats written to std::cerr on program completion.
Related
I need to find the time taken to execute a piece of code, and the method should be independent of system time, ie chrono and all wouldn't work.
My usecse looks somewhat like this.
int main{
//start
function();
//end
time_take = end - start;
}
I am working in an embedded platform that doesn't have the right time at the start-up. In my case, the start of funcion happens before actual time is set from ntp server and end happens after the exact time is obtained. So any method that compares the time difference between two points wouldn't work. Also, number of CPU ticks wouldn't work for me since my programme necessarily be running actively throughout.
I tried the conventional methods and they didn't work for me.
On Linux clock_gettime() has an option to return the the current CLOCK_MONOTONIC, which is unaffected by system time changes. Measuring the CLOCK_MONOTONIC at the beginning and the end, and then doing your own math to subtract the two values, will measure the elapsed time ignoring any system time changes.
If you don't want to dip down to C-level abstractions, <chrono> has this covered for you with steady_clock:
int main{
//start
auto t0 = std::chrono::steady_clock::now();
function();
auto t1 = std::chrono::steady_clock::now();
//end
auto time_take = end - start;
}
steady_clock is generally a wrapper around clock_gettime used with CLOCK_MONOTONIC except is portable across all platforms. I.e. some platforms don't have clock_gettime, but do have an API for getting a monotonic clock time.
Above the type of take_time will be steady_clock::duration. On all platforms I'm aware of, this type is an alias for nanoseconds. If you want an integral count of nanoseconds you can:
using namespace std::literals;
int64_t i = time_take/1ns;
The above works on all platforms, even if steady_clock::duration is not nanoseconds.
The minor advantage of <chrono> over a C-level API is that you don't have to deal with computing timespec subtraction manually. And of course it is portable.
I used to take clock() to get CPU time of my algorithms. However, it does not seem to work anymore. I have a Windows 10 VM with 8 CPUs as also seen in the resource monitor.
Now, I measure time like this:
auto startTime = std::chrono::high_resolution_clock::now();
auto startClocks = std::clock();
// some code with TBB that uses multiple threads
auto endTime = std::chrono::high_resolution_clock::now();
auto endClocks = std::clock();
auto duration = endTime - startTime;
auto clockDuration = endClocks - startClocks;
auto durationSeconds = static_cast<double>(std::chrono::duration_cast<std::chrono::milliseconds>(duration).count()) / 1000.;
auto durationCpuSeconds = 1. * clockDuration / CLOCKS_PER_SEC;
The TBB part surely works, as I see in the resource monitor of Windows that all CPUs work with 100%. If I start some endless loop without parallelization, CPU usage is only 12.5% as expected.
However, durationSeconds and durationCpuSeconds are exactly the same...
I have measured the time with my watch and the result is the wall time. So, clock() clearly does not show the CPU time which should be substantially higher with 8 CPUs working 100% in parallel. Is clock() not reliable or am I missing something here?
Yeah, it's broken on Windows.
The clock function tells how much wall-clock time has passed since the CRT initialization during process start. Note that this function does not strictly conform to ISO C, which specifies net CPU time as the return value. To obtain CPU times, use the Win32 GetProcessTimes function.
(from Microsoft's clock docs)
I want to use a timer function in my program. Following the example at How to use clock() in C++, my code is:
int main()
{
std::clock_t start = std::clock();
while (true)
{
double time = (std::clock() - start) / (double)CLOCKS_PER_SEC;
std::cout << time << std::endl;
}
return 0;
}
On running this, it begins to print out numbers. However, it takes about 15 seconds for that number to reach 1. Why does it not take 1 second for the printed number to reach 1?
Actually it is a combination of what has been posted. Basically as your program is running in a tight loop, CPU time should increase just as fast as wall clock time.
But since your program is writing to stdout and the terminal has a limited buffer space, your program will block whenever that buffer is full until the terminal had enough time to print more of the generated output.
This of course is much more expensive CPU wise than generating the strings from the clock values, therefore most of the CPU time will be spent in the terminal and graphics driver. It seems like your system takes about 14 times the CPU power to output that timestamps than generating the strings to write.
std::clock returns cpu time, not wall time. That means the number of cpu-seconds used, not the time elapsed. If your program uses only 20% of the CPU, then the cpu-seconds will only increase at 20% of the speed of wall seconds.
std::clock
Returns the approximate processor time used by the process since the beginning of an implementation-defined era related to the program's execution. To convert result value to seconds divide it by CLOCKS_PER_SEC.
So it will not return a second until the program uses an actual second of the cpu time.
If you want to deal with actual time I suggest you use the clocks provided by <chrono> like std::steady_clock or std::high_resolution_clock
#include <iostream>
#include <conio.h>
#include <ctime>
using namespace std;
double diffclock(clock_t clock1,clock_t clock2)
{
double diffticks=clock1-clock2;
double diffms=(diffticks)/(CLOCKS_PER_SEC/1000);
return diffms;
}
int main()
{
clock_t start = clock();
for(int i=0;;i++)
{
if(i==10000)break;
}
clock_t end = clock();
cout << diffclock(start,end)<<endl;
getch();
return 0;
}
So my problems comes to that it returns me a 0, well to be stright i want to check how much time my program does operate...
I found tons of crap over the internet well mostly it comes to the same point of getting a 0 beacuse the start and the end is the same
This problems goes to C++ remeber : <
There are a few problems in here. The first is that you obviously switched start and stop time when passing to diffclock() function. The second problem is optimization. Any reasonably smart compiler with optimizations enabled would simply throw the entire loop away as it does not have any side effects. But even you fix the above problems, the program would most likely still print 0. If you try to imagine doing billions operations per second, throw sophisticated out of order execution, prediction and tons of other technologies employed by modern CPUs, even a CPU may optimize your loop away. But even if it doesn't, you'd need a lot more than 10K iterations in order to make it run longer. You'd probably need your program to run for a second or two in order to get clock() reflect anything.
But the most important problem is clock() itself. That function is not suitable for any time of performance measurements whatsoever. What it does is gives you an approximation of processor time used by the program. Aside of vague nature of the approximation method that might be used by any given implementation (since standard doesn't require it of anything specific), POSIX standard also requires CLOCKS_PER_SEC to be equal to 1000000 independent of the actual resolution. In other words — it doesn't matter how precise the clock is, it doesn't matter at what frequency your CPU is running. To put simply — it is a totally useless number and therefore a totally useless function. The only reason why it still exists is probably for historical reasons. So, please do not use it.
To achieve what you are looking for, people have used to read the CPU Time Stamp also known as "RDTSC" by the name of the corresponding CPU instruction used to read it. These days, however, this is also mostly useless because:
Modern operating systems can easily migrate the program from one CPU to another. You can imagine that reading time stamp on another CPU after running for a second on another doesn't make a lot of sense. It is only in latest Intel CPUs the counter is synchronized across CPU cores. All in all, it is still possible to do this, but a lot of extra care must be taken (i.e. once can setup the affinity for the process, etc. etc).
Measuring CPU instructions of the program oftentimes does not give an accurate picture of how much time it is actually using. This is because in real programs there could be some system calls where the work is performed by the OS kernel on behalf of the process. In that case, that time is not included.
It could also happen that OS suspends an execution of the process for a long time. And even though it took only a few instructions to execute, for user it seemed like a second. So such a performance measurement may be useless.
So what to do?
When it comes to profiling, a tool like perf must be used. It can track a number of CPU clocks, cache misses, branches taken, branches missed, a number of times the process was moved from one CPU to another, and so on. It can be used as a tool, or can be embedded into your application (something like PAPI).
And if the question is about actual time spent, people use a wall clock. Preferably, a high-precision one, that is also not a subject to NTP adjustments (monotonic). That shows exactly how much time elapsed, no matter what was going on. For that purpose clock_gettime() can be used. It is part of SUSv2, POSIX.1-2001 standard. Given that use you getch() to keep the terminal open, I'd assume you are using Windows. There, unfortunately, you don't have clock_gettime() and the closest thing would be performance counters API:
BOOL QueryPerformanceFrequency(LARGE_INTEGER *lpFrequency);
BOOL QueryPerformanceCounter(LARGE_INTEGER *lpPerformanceCount);
For a portable solution, the best bet is on std::chrono::high_resolution_clock(). It was introduced in C++11, but is supported by most industrial grade compilers (GCC, Clang, MSVC).
Below is an example of how to use it. Please note that since I know that my CPU will do 10000 increments of an integer way faster than a millisecond, I have changed it to microseconds. I've also declared the counter as volatile in hope that compiler won't optimize it away.
#include <ctime>
#include <chrono>
#include <iostream>
int main()
{
volatile int i = 0; // "volatile" is to ask compiler not to optimize the loop away.
auto start = std::chrono::steady_clock::now();
while (i < 10000) {
++i;
}
auto end = std::chrono::steady_clock::now();
auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
std::cout << "It took me " << elapsed.count() << " microseconds." << std::endl;
}
When I compile and run it, it prints:
$ g++ -std=c++11 -Wall -o test ./test.cpp && ./test
It took me 23 microseconds.
Hope it helps. Good Luck!
At a glance, it seems like you are subtracting the larger value from the smaller value. You call:
diffclock( start, end );
But then diffclock is defined as:
double diffclock( clock_t clock1, clock_t clock2 ) {
double diffticks = clock1 - clock2;
double diffms = diffticks / ( CLOCKS_PER_SEC / 1000 );
return diffms;
}
Apart from that, it may have something to do with the way you are converting units. The use of 1000 to convert to milliseconds is different on this page:
http://en.cppreference.com/w/cpp/chrono/c/clock
The problem appears to be the loop is just too short. I tried it on my system and it gave 0 ticks. I checked what diffticks was and it was 0. Increasing the loop size to 100000000, so there was a noticeable time lag and I got -290 as output (bug -- I think that the diffticks should be clock2-clock1 so we should get 290 and not -290). I tried also changing "1000" to "1000.0" in the division and that didn't work.
Compiling with optimization does remove the loop, so you have to not use it, or make the loop "do something", e.g. increment a counter other than the loop counter in the loop body. At least that's what GCC does.
Note: This is available after c++11.
You can use std::chrono library.
std::chrono has two distinct objects. (timepoint and duration). Timepoint represents a point in time, and duration, as we already know the term represents an interval or a span of time.
This c++ library allows us to subtract two timepoints to get a duration of time passed in the interval. So you can set a starting point and a stopping point. Using functions you can also convert them into appropriate units.
Example using high_resolution_clock (which is one of the three clocks this library provides):
#include <chrono>
using namespace std::chrono;
//before running function
auto start = high_resolution_clock::now();
//after calling function
auto stop = high_resolution_clock::now();
Subtract stop and start timepoints and cast it into required units using the duration_cast() function. Predefined units are nanoseconds, microseconds, milliseconds, seconds, minutes, and hours.
auto duration = duration_cast<microseconds>(stop - start);
cout << duration.count() << endl;
First of all you should subtract end - start not vice versa.
Documentation says if value is not available clock() returns -1, did you check that?
What optimization level do you use when compile your program? If optimization is enabled compiler can effectively eliminate your loop entirely.
The time command returns the time elapsed in execution of a command.
If I put a "gettimeofday()" at the start of the command call (using system() ), and one at the end of the call, and take a difference, it doesn't come out the same. (its not a very small difference either)
Can anybody explain what is the exact difference between the two usages and which is the best way to time the execution of a call?
Thanks.
The Unix time command measures the whole program execution time, including the time it takes for the system to load your binary and all its libraries, and the time it takes to clean up everything once your program is finished.
On the other hand, gettimeofday can only work inside your program, that is after it has finished loading (for the initial measurement), and before it is cleaned up (for the final measurement).
Which one is best? Depends on what you want to measure... ;)
It's all dependent on what you are timing. If you are trying to time something in seconds, then time() is probably your best bet. If you need higher resolution than that, then I would consider gettimeofday(), which gives up to microsecond resolution (1 / 1000000th of a second).
If you need even higher resolution than that, consider using clock() and CLOCKS_PER_SECOND, just note that clock() is rarely an accurate description for the amount of time taken, but rather the number of CPU cycles used.
time() returns time since epoch in seconds.
gettimeofday(): returns:
struct timeval {
time_t tv_sec; /* seconds */
suseconds_t tv_usec; /* microseconds */
};
Each time function has different precision. In C++11 you would use std::chrono:
using namespace std::chrono;
auto start = high_resolution_clock::now();
/* do stuff*/
auto end = high_resolution_clock::now();
float elapsedSeconds = duration_cast<duration<float>>(end-start).count();