This question already has answers here:
Measuring execution time of a function in C++
(14 answers)
Closed 4 years ago.
In C# I would fire up the Stopwatch class to do some quick-and-dirty timing of how long certain methods take.
What is the equivalent of this in C++? Is there a high precision timer built in?
I used boost::timer for measuring the duration of an operation. It provides a very easy way to do the measurement while remaining platform independent. Here is an example:
#include <boost/timer.hpp>   // boost::timer; elapsed() returns seconds

boost::timer myTimer;
doOperation();               // the operation being measured
std::cout << myTimer.elapsed();
P.S. To overcome precision errors, it is best to measure operations that take at least a few seconds, especially when you are trying to compare several alternatives. If you want to measure something that takes very little time, put it into a loop: for example, run the operation 1000 times, and then divide the total time by 1000.
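For instance, a minimal sketch of that loop-and-average idea with the same boost::timer (doOperation is just a placeholder for whatever you are measuring):

#include <boost/timer.hpp>
#include <iostream>

void doOperation();  // placeholder for the operation being measured

int main()
{
    boost::timer myTimer;
    const int runs = 1000;
    for (int i = 0; i < runs; ++i)
        doOperation();
    // average time per run, in seconds
    std::cout << myTimer.elapsed() / runs << "\n";
}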
I've implemented a timer for situations like this before: I actually ended up with a class with two different implementations, one for Windows and one for POSIX.
The reason was that Windows has the QueryPerformanceCounter() function, which gives you access to a very accurate clock that is ideal for such timings.
On POSIX, however, this isn't available, so I just used boost.datetime's classes to store the start and end times and calculated the duration from those. It offers a "high resolution" timer, but the resolution is undefined and varies from platform to platform.
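A rough sketch of that split follows. Note this is only an illustration of the idea: it uses QueryPerformanceCounter on Windows and, for brevity, clock_gettime on POSIX instead of the Boost.DateTime classes the original class used.

// Sketch only: Windows uses QueryPerformanceCounter,
// POSIX uses clock_gettime (stand-in for Boost.DateTime here).
#ifdef _WIN32
#include <windows.h>
double seconds_now()
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return double(count.QuadPart) / double(freq.QuadPart);
}
#else
#include <time.h>
double seconds_now()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}
#endif

// usage: double t0 = seconds_now(); work(); double elapsed = seconds_now() - t0;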
I use my own version of Python's timeit function. The advantage of this function is that it repeats a computation as many times as necessary to obtain meaningful results: if the computation is very fast, it is repeated many times. In the end you obtain the average time across all the repetitions. It does not use any non-standard functionality:
#include <ctime>

double clock_diff_to_sec(long clock_diff)
{
    return double(clock_diff) / CLOCKS_PER_SEC;
}

template<class Proc>
double time_it(Proc proc, int N=1)  // returns time in microseconds
{
    std::clock_t const start = std::clock();
    for(int i = 0; i < N; ++i)
        proc();
    std::clock_t const end = std::clock();
    if(clock_diff_to_sec(end - start) < .2)
        return time_it(proc, N * 5);
    return clock_diff_to_sec(end - start) * (1e6 / N);
}
The following example uses the time_it function to measure the performance of different STL containers:
// (uses the time_it/clock_diff_to_sec helpers defined above)
#include <algorithm>
#include <deque>
#include <iostream>
#include <iterator>
#include <list>
#include <set>
#include <vector>
#include <boost/bind.hpp>
#include <unordered_set>   // std::tr1::unordered_set on VS2008 SP1; <tr1/unordered_set> on GCC

void dummy_op(int i)
{
    if(i == -1)
        std::cout << i << "\n";
}

template<class Container>
void test(Container const & c)
{
    std::for_each(c.begin(), c.end(), &dummy_op);
}

template<class OutIt>
void init(OutIt it)
{
    for(int i = 0; i < 1000; ++i)
        *it = i;
}

int main( int argc, char ** argv )
{
    {
        std::vector<int> c;
        init(std::back_inserter(c));
        std::cout << "vector: "
                  << time_it(boost::bind(&test<std::vector<int> >, c)) << "\n";
    }
    {
        std::list<int> c;
        init(std::back_inserter(c));
        std::cout << "list: "
                  << time_it(boost::bind(&test<std::list<int> >, c)) << "\n";
    }
    {
        std::deque<int> c;
        init(std::back_inserter(c));
        std::cout << "deque: "
                  << time_it(boost::bind(&test<std::deque<int> >, c)) << "\n";
    }
    {
        std::set<int> c;
        init(std::inserter(c, c.begin()));
        std::cout << "set: "
                  << time_it(boost::bind(&test<std::set<int> >, c)) << "\n";
    }
    {
        std::tr1::unordered_set<int> c;
        init(std::inserter(c, c.begin()));
        std::cout << "unordered_set: "
                  << time_it(boost::bind(&test<std::tr1::unordered_set<int> >, c)) << "\n";
    }
}
In case anyone is curious, here is the output I get (compiled with VS2008 in release mode, times in microseconds):
vector: 8.7168
list: 27.776
deque: 91.52
set: 103.04
unordered_set: 29.76
You can use the ctime library to get the time in seconds. Getting the time in milliseconds is implementation-specific. Here is a discussion exploring some ways to do that.
See also: How to measure time in milliseconds using ANSI C?
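As a concrete example of the seconds-level approach mentioned above, here is a minimal sketch using <ctime>'s time() and difftime() (work() is a placeholder for the code being timed):

#include <ctime>
#include <iostream>

void work();  // placeholder for the code being timed

int main()
{
    std::time_t start = std::time(0);
    work();
    std::time_t end = std::time(0);
    // difftime returns the difference in whole seconds as a double
    std::cout << std::difftime(end, start) << " s\n";
}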
High-precision timers are platform-specific and so aren't specified by the C++ standard, but there are libraries available. See this question for a discussion.
I humbly submit my own micro-benchmarking mini-library (on GitHub). It's super simple -- the only advantage it has over rolling your own is that it already has the high-performance timer code implemented for Windows and Linux, and it abstracts away the annoying boilerplate.
Just pass in a function (or lambda), the number of times it should be called per test run (default: 1), and the number of test runs (default: 100). The fastest test run (measured in fractional milliseconds) is returned:
// Example that times the compare-and-swap atomic operation from C++11
// Sample GCC command: g++ -std=c++11 -DNDEBUG -O3 -lrt main.cpp microbench/systemtime.cpp -o bench
#include "microbench/microbench.h"
#include <cstdio>
#include <atomic>

int main()
{
    std::atomic<int> x(0);
    int y = 0;
    printf("CAS takes %.4fms to execute 100000 iterations\n",
        moodycamel::microbench(
            [&]() { x.compare_exchange_strong(y, 0); },  /* function to benchmark */
            100000,  /* iterations per test run */
            100      /* test runs */
        )
    );
    // Result: Clocks in at 1.2ms (12ns per CAS operation) in my environment
    return 0;
}
#include <stdio.h>
#include <time.h>

clock_t start, end;
start = clock();
// Do stuff
end = clock();
printf("Took: %f\n", (float)((end - start) / (float)CLOCKS_PER_SEC));
This might be an OS-dependent issue rather than a language issue.
If you're on Windows then you can access a timer with roughly 10- to 16-millisecond granularity through GetTickCount() or GetTickCount64(). Just call it once at the start and once at the end, and subtract.
That was what I used before if I recall correctly. The linked page has other options as well.
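A minimal sketch of that pattern (Windows only; doWork is a placeholder for the code being timed):

#include <windows.h>
#include <iostream>

void doWork();  // placeholder for the code being timed

int main()
{
    ULONGLONG start = GetTickCount64();        // milliseconds since system start
    doWork();
    ULONGLONG elapsed = GetTickCount64() - start;
    std::cout << "Took roughly " << elapsed << " ms (10-16 ms granularity)\n";
}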
You may find this class useful.
Using the RAII idiom, it prints the text given at construction when the destructor is called, filling the elapsed-time placeholder with the proper value.
Example of use:
#include <unistd.h>  // usleep

int main()
{
    // trace_elapsed_time is the helper class linked above
    trace_elapsed_time t("Elapsed time: %ts.\n");
    usleep(1.005 * 1e6);
}
Output:
Elapsed time: 1.00509s.
Related
I would like to call a function a small number of times (e.g., 4 times) using multithreading. Using the example from the Solarian Programmer blog (https://solarianprogrammer.com/2011/12/16/cpp-11-thread-tutorial/), I have written this simple C++11 program:
#include <iostream>
#include <thread>
#include <ctime>
#include <math.h>

#define PI 3.14159265358979323846

static const int num_threads = 4;

void call_from_thread(int tid) {
    std::cout << log(2) - 0.5*log(2*PI) - log(1.05) - pow(2.3-0.5,2)/(2*pow(1.05,2)) << std::endl;
}

int main() {
    std::thread t[num_threads];
    std::clock_t start;
    start = std::clock();

    // Launch a group of threads
    for (int i = 0; i < num_threads; ++i) {
        t[i] = std::thread(call_from_thread, i);
    }
    std::cout << "Launched from the main\n";

    for (int i = 0; i < num_threads; ++i) {
        t[i].join();
    }

    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
    return 0;
}
Of course, this example is trivial, and I do not need multithreading here, as the function always returns the same result. But I plan to modify this function slightly so that there are some differences across threads. When running this code, I get something like:
Time: 0.806 ms
Now, if I modify the above code to run the same work single-threaded, I have the following:
int main() {
    std::thread t[num_threads];
    std::clock_t start;
    start = std::clock();

    // Call the function sequentially instead of launching threads
    for (int i = 0; i < num_threads; ++i) {
        call_from_thread(i);
    }
    std::cout << "Launched from the main\n";

    std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
    return 0;
}
Here, the running time is much lower:
Time: 0.116 ms
So, my question is: can I call this simple function a small number of times using multithreading in order to speed up my code? Basically, I would like to go below those 0.116 ms.
Note also that I am a newbie in C++ and parallelism, so sorry if my question does not seem relevant.
You could (in theory) do that, but be aware that creating a new thread is a complex and heavy operation (for the operating system and the standard C++ library). On Linux, std::thread uses pthread_create(3), which in turn uses clone(2) (a low-level system call).
In practice (as a very naive rule of thumb), creating a thread might take a few milliseconds and is worthwhile only if that thread runs for more than a few milliseconds, so you might want to use thread pools. Remember that an elementary operation (a machine-code instruction, e.g. a 32-bit addition) usually takes only a few nanoseconds (a million of them make a millisecond). So in real life, using threads is worthwhile only for fairly "complex" functions.
In other words, threads are quite heavy resources (e.g. because they usually have their own call stack, typically of a megabyte, and they want to run on another core of your processor).
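To illustrate that rule of thumb, here is a hedged sketch: the item count and the per-item work (a std::sin call) are made up for illustration, but each of the few threads gets a large slice of work, so the few-millisecond creation cost is amortized.

#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// Each thread processes a large slice of items, so the cost of
// creating the thread is amortized over real work.
void process_range(std::size_t begin, std::size_t end, double *out)
{
    double sum = 0.0;
    for (std::size_t i = begin; i < end; ++i)
        sum += std::sin(double(i));    // stand-in for real per-item work
    *out = sum;
}

int main()
{
    const std::size_t n = 10000000;    // large enough to be worth threading
    const unsigned num_threads = 4;
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t)
        threads.emplace_back(process_range,
                             t * n / num_threads,
                             (t + 1) * n / num_threads,
                             &partial[t]);
    for (std::size_t t = 0; t < threads.size(); ++t)
        threads[t].join();
}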
I followed the link that you provided and read this on their web page:
Now, in order to see a parallel code at work we will need to give him a significative amount of work, otherwise the overhead of creating and destroying threads will nullify our effort to parallelize this code. The input image should be large enough to actually see an improvement in performance when the code is run in parallel. For this purpose I’ve used an image of 16000x10626 pixels which occupy about 512 MB in PPM format:
This page states the same thing that Basile Starynkevitch had already pointed out, and I agree with this assertion when using multiple threads.
I'm trying to benchmark a recursive Fibonacci sequence calculator in C++. But surprisingly, the program outputs 0 nanoseconds and starts the calculation after printing the result (CPU usage increases after printing 0 nanoseconds).
I think this is an optimization feature of the compiler.
#include <iostream>
#include <chrono>

int fib2(int n) {
    return (n < 2) ? n : fib2(n - 1) + fib2(n - 2);
}

int main(int argc, char* argv[])
{
    auto tbegin = std::chrono::high_resolution_clock::now();
    int a = fib2(50);
    auto tend = std::chrono::high_resolution_clock::now();
    std::cout << (tend - tbegin).count() << " nanoseconds" << std::endl;
    std::cout << "fib => " << a << std::endl;
}
Output:
0 nanoseconds
Is this a feature? If yes, how can I disable it?
The problem is that the result of this function called with a value of 50 doesn't fit into the int type; it's just too big. Try using int64_t instead.
Live demo
Note that I replaced the original Fibonacci function with a more optimized one, as the execution took too long and the online tool cuts off execution after some period of time. That is not a fault of the program or the code; it's just a protection of the online tool.
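A minimal sketch of that suggestion: an iterative (rather than recursive) Fibonacci returning int64_t, timed with the same chrono calls. The iterative version is the "more optimized one" standing in for the original, since the recursive one makes an exponential number of calls for n = 50.

#include <chrono>
#include <cstdint>
#include <iostream>

// Iterative Fibonacci; int64_t comfortably holds fib(50) = 12586269025.
int64_t fib(int n)
{
    int64_t a = 0, b = 1;
    for (int i = 0; i < n; ++i)
    {
        int64_t next = a + b;
        a = b;
        b = next;
    }
    return a;
}

int main()
{
    auto tbegin = std::chrono::high_resolution_clock::now();
    int64_t result = fib(50);
    auto tend = std::chrono::high_resolution_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(tend - tbegin).count()
              << " nanoseconds" << std::endl;
    std::cout << "fib => " << result << std::endl;
}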
I am trying to calculate the time a certain function takes to run
#include <iostream>
#include <cstdlib>
#include <ctime>

int myFunction(int n)
{
    .............
}

int main()
{
    int n;
    clock_t start;   // clock() returns a clock_t

    std::cout << "What number would you like to enter ";
    std::cout << std::endl;
    std::cin >> n;

    start = clock();

    std::cout << myFunction(n) << std::endl;

    std::cout << "Time it took: " << (clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << std::endl;
    std::cout << std::endl;

    return 0;
}
This works fine in Xcode (I get numbers such as 4.2, 2.6, ...), but doesn't on a Linux-based server, where I always get 0. Any ideas why that is and how to fix it?
The "tick" of clock may be more than 1/CLOCKS_PER_SEC seconds, for example it could be 10ms, 15.832761ms or 32 microseconds.
If the time consumed by your code is smaller than this "tick", then the time taken will appear to be zero.
There's no simple way to find out what that is, other than - you could call clock repeatedly until the return-value is not the same as last time, but that may not be entirely reliable, and if the clock-tick is VERY small, it may not be accurate in that direction, but if the time is quite long, you may be able to find out.
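As a rough sketch of that "call clock() repeatedly until the value changes" idea (only an estimate, for the reasons given above):

#include <ctime>
#include <iostream>

int main()
{
    // Spin until clock() changes, then measure how long one tick lasts.
    std::clock_t first = std::clock(), second;
    while ((second = std::clock()) == first)
        ;                                        // wait for the first transition
    std::clock_t third;
    while ((third = std::clock()) == second)
        ;                                        // wait for the next transition
    std::cout << "apparent tick: "
              << double(third - second) / CLOCKS_PER_SEC << " s\n";
}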
For measuring very short times (a few milliseconds), assuming the function is entirely CPU/memory-bound and not spending time waiting for file I/O or sending/receiving packets over a network, std::chrono can be used to measure the time. For extremely short times, the processor's time-stamp counter can also be used, although that can be quite tricky, because it varies in speed depending on load and can have different values between different cores.
In my compiler project, I'm using this to measure the time of things:
This part goes into a header:
class TimeTraceImpl;
extern bool timetrace;   // assumed to be defined elsewhere; enables the timing output

class TimeTrace
{
public:
    TimeTrace(const char *func) : impl(0)
    {
        if(timetrace)
        {
            createImpl(func);
        }
    }
    ~TimeTrace() { if(impl) destroyImpl(); }
private:
    void createImpl(const char *func);
    void destroyImpl();
    TimeTraceImpl *impl;
};
This goes into a source file.
// (also include the header that declares TimeTrace here)
#include <chrono>
#include <cstdint>
#include <iomanip>
#include <iostream>

class TimeTraceImpl
{
public:
    TimeTraceImpl(const char *func) : func(func)
    {
        start = std::chrono::steady_clock::now();
    }
    ~TimeTraceImpl()
    {
        end = std::chrono::steady_clock::now();
        uint64_t elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end-start).count();
        std::cerr << "Time for " << func << " "
                  << std::fixed << std::setprecision(3) << elapsed / 1000.0 << " ms" << std::endl;
    }
private:
    std::chrono::time_point<std::chrono::steady_clock> start, end;
    const char* func;
};

void TimeTrace::createImpl(const char *func)
{
    impl = new TimeTraceImpl(func);
}

void TimeTrace::destroyImpl()
{
    delete impl;
}
The reason for the rather complex pImpl implementation is that I don't want to burden the code with extra work when the timing is turned off (timetrace is false).
Of course, the smallest actual tick of std::chrono also varies, but in most Linux implementations, it will be nanoseconds or some small multiple thereof, so (much) better precision than clock.
The drawback is that it measures elapsed wall-clock time, not CPU usage. This is fine when the bottleneck is the CPU and memory, but not for things that depend on external hardware to perform something [unless you actually WANT that measurement].
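A typical usage sketch (not self-contained: it relies on the TimeTrace class above; compileModule is a made-up example function, and timetrace is assumed to have been set to true elsewhere, e.g. from a command-line flag):

void compileModule()
{
    TimeTrace trace("compileModule");  // prints "Time for compileModule ..." on scope exit
    // ... the work being measured ...
}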
I want to measure the speed of a function within a loop. But why does my way of doing it always print "0" instead of high-resolution timing with 9 decimal digits of precision (i.e. in nano/microseconds)?
What's the correct way to do it?
#include <iomanip>
#include <iostream>
#include <ctime>

int main() {
    for (int i = 0; i < 100; i++) {
        std::clock_t startTime = std::clock();
        // a very fast function in the middle
        std::cout << "Time: " << std::setprecision(9)
                  << (std::clock() - startTime + 0.00) / CLOCKS_PER_SEC << std::endl;
    }
    return 0;
}
Related Questions:
How to overcome clock()'s low resolution
High Resolution Timer with C++ and linux
Equivalent of Windows’ QueryPerformanceCounter on OSX
Move your time-calculation calls outside the for () { .. } statement, then divide the total execution time by the number of operations in your testing loop.
#include <iostream>
#include <ctime>

#define NUMBER 10000  // the number of operations

void func();  // the function being measured (defined elsewhere)

// get the difference between start and end time and divide by
// the number of operations; returns the average time per operation in ms
double diffclock(clock_t clock1, clock_t clock2)
{
    double diffticks = clock1 - clock2;
    double diffms = diffticks * 1000.0 / CLOCKS_PER_SEC / NUMBER;
    return diffms;
}

int main() {
    // start a timer here
    clock_t begin = clock();

    // execute your function several times (at least 10'000)
    for (int i = 0; i < NUMBER; i++) {
        // a very fast function in the middle
        func();
    }

    // stop timer here
    clock_t end = clock();

    // display results here
    std::cout << "Average time per operation: " << diffclock(end, begin) << " ms." << std::endl;
    return 0;
}
Note: std::clock() lacks sufficient precision for profiling. Reference.
A few pointers:
I would be careful with the optimizer; it might throw away all your code if it thinks that it doesn't do anything.
You might want to run the loop 100000 times.
Before doing the total-time calculation, store the current time in a variable.
Run your program several times.
If you need higher resolution, the only way to go is platform dependent.
On Windows, check out the QueryPerformanceCounter/QueryPerformanceFrequency APIs.
On Linux, look up clock_gettime() (see the sketch after this list).
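Here is a hedged sketch of the Linux side using clock_gettime with a monotonic clock (older glibc versions need -lrt at link time):

#include <time.h>
#include <iostream>

int main()
{
    timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    // code under test goes here
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
    std::cout << "elapsed: " << elapsed_ns << " ns\n";
    return 0;
}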
See a question I asked about the same thing: apparently clock()'s resolution is not guaranteed to be so high.
C++ obtaining milliseconds time on Linux -- clock() doesn't seem to work properly
Try the gettimeofday function, or Boost.
If you need platform independence you need to use something like ACE_High_Res_Timer (http://www.dre.vanderbilt.edu/Doxygen/5.6.8/html/ace/a00244.html)
You might want to look into using OpenMP.
// compile with OpenMP enabled, e.g. -fopenmp on GCC/Clang
#include <omp.h>

int main(int argc, char* argv[])
{
    double start = omp_get_wtime();   // wall-clock time in seconds

    // code to be checked

    double end = omp_get_wtime();
    double result = end - start;
    return 0;
}