So I have a program that evaluates a polynomial in two different ways: Horner's method and a naive method. I'm trying to compare their respective run times, but whichever function call I place first takes longer. For example, I placed the Horner method first and it took longer. I then tried the naive method first, and then it took longer. The Horner method should be much, much faster, since it only has one loop where the naive method has a nested loop. So I figured it must be the way I'm using the clocks from the chrono library. I tried both high_resolution_clock and system_clock, but the same thing happens. Any help/comments are welcome.
#include <cstdlib>
#include <iostream>
#include <chrono>
#include "Polynomial.h"
int main(int argc, char** argv) {
    double c[5] = {5, 0, -3, 1, -8};
    int degree = 4;
    Polynomial obj(c, degree);
    auto start = std::chrono::high_resolution_clock::now();
    std::cout << "Horner Evaluation: " << obj.hornerEval(-2) << ", ";
    auto elapsed = std::chrono::high_resolution_clock::now() - start;
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    std::cout << duration << " nanoseconds" << std::endl;
    auto start2 = std::chrono::high_resolution_clock::now();
    std::cout << "Naive Evaluation: " << obj.naiveEval(-2) << ", ";
    auto elapsed2 = std::chrono::high_resolution_clock::now() - start2;
    auto duration2 = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed2).count();
    std::cout << duration2 << " nanoseconds" << std::endl;
}
You didn't include all the code, but from the description it looks like a caching effect.
When the first method runs, the CPU cache is cold (the data has not yet been pulled from memory into the CPU cache), so it takes more time to process (memory is slow compared to cache).
When the second method is called, all the data (or most of it, depending on data size) is already available in cache - the cache is hot.
Solution: call both methods outside the timed section first to warm up the cache, then do the measurements, as in the sketch below.
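A minimal sketch of that warm-up, assuming the same Polynomial interface as in the question:
Polynomial obj(c, degree);

// Untimed warm-up: run both evaluations once so the cache is hot before
// either measurement begins; the results are deliberately discarded.
(void)obj.hornerEval(-2);
(void)obj.naiveEval(-2);

// ... then take the two chrono measurements exactly as in the question ...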
As one of the previous answers already said, it's probably something with the cache; the prefetcher can perhaps better determine which memory to load into cache in the naiveEval method. Here is a talk about benchmarking C++ code with further information, for example on the effect of cold starts on benchmarking: https://www.youtube.com/watch?v=zWxSZcpeS8Q
Basically I need a function that makes x decrement to 0 over a certain time period (40 seconds)
This seems pretty simple in theory but I haven't been able to do it for a bit now.
static auto decrement = [](int start_value, int end_value, int time) {
    // I need this function to decrement start_value until it reaches end_value.
    // This should happen over a set time as well, in this case 40 seconds.
};
int cool_variable = decrement(2000, 0, 40); // 40 seconds; the time should be expected in seconds
@DavidSchwartz has a great comment that should be considered a serious solution:
Why not just compute the correct value of cool_variable based on the clock whenever you need its value?
That being said, this is an answer to the actual question: How to write this function:
decrement = [](int start_value, int end_value, int time)
Where cool_variable starts with the value start_value and decrements at a steady rate until it equals end_value, with the total amount of time for this multi-decrement operation being time seconds.
This is a function with a time deadline. It is well-established that for problems with a deadline, one should lean towards *_until solutions as opposed to *_for solutions in handling the time aspect. This implies that instead of sleeping for some time duration between decrements, we need to sleep until it is time to decrement from some value to the next lower value.
The use of sleep_until allows a somewhat varying time for each iteration of the decrement loop, while ensuring that the total time of the full loop closely approximates the total desired time.
To achieve the use of sleep_until, we need a (presumably) linear function:
duration next_time(int value) {return a0 + a1 * value;}
where next_time(start_value) == 0s and next_time(end_value) == seconds{time}.
We have two equations, and two unknowns: a0 and a1. We can solve for the two unknowns to create our desired next_time function.
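Spelled out (a short worked step, using the same names as above): subtracting the first equation from the second gives a1 * (end_value - start_value) == seconds{time}, hence a1 == seconds{time} / (end_value - start_value), and the first equation then gives a0 == -start_value * a1. Substituting both into a0 + a1 * value yields (value - start_value) * time / (end_value - start_value), which is exactly the lambda below: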
auto next_time = [&](int value)
{
    return (value - start_value) * time / (end_value - start_value);
};
Now for each value of cool_variable, one can sleep_until(t0 + next_time(cool_variable)) where t0 is the time where you want cool_variable == start_value (and thus want to sleep for 0 seconds).
The next most important thing (after use of sleep_until) is to use <chrono>. int time is an error-prone API that has no place in modern C++. The type of time should be a <chrono> duration such as seconds (or perhaps some other unit of time). Let's start with seconds:
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
std::atomic<int> cool_variable{0}; // brace-init: "= 0" on an atomic requires C++17

void
decrement(int start_value, int end_value, std::chrono::seconds time)
{
    using namespace std;
    using namespace std::chrono;
    auto next_time = [&](int value)
    {
        return (value - start_value) * nanoseconds{time} / (end_value - start_value);
    };
    auto t0 = steady_clock::now();
    for (cool_variable = start_value; cool_variable >= end_value; --cool_variable)
    {
        this_thread::sleep_until(t0 + next_time(cool_variable));
        cout << cool_variable << endl;
    }
}
cool_variable is stored as an atomic<int> so that it can be concurrently read by other threads to avoid undefined behavior.
The input time variable is converted to nanoseconds precision in the computation so that the argument to sleep_until can be as precise as is practical.
Note that the current time need only be computed once, prior to the decrement loop.
Just as an example, cool_variable is printed to the terminal on each iteration. This is of course not necessary, and just used for demonstration purposes.
This can now be called like so (with using namespace std::chrono_literals in scope for the 40s literal):
decrement(2000, 0, 40s);
It can also be instructive to wrap the call to decrement with timing information in order to ensure that it is behaving as intended:
auto t0 = system_clock::now();
decrement(2000, 0, 40s);
auto t1 = system_clock::now();
std::cout << (t1-t0)/1s << '\n';
This will output each value of cool_variable between 2000 and 0 (inclusive), and then say how many seconds it took to do the operation (hopefully 40 in this example).
Finally, one minor simplification can be made:
Since we desire time to be nanoseconds in the computation, it is actually simpler to accept nanoseconds in the API directly, relieving us of the need to convert seconds to nanoseconds internally:
void
decrement(int start_value, int end_value, std::chrono::nanoseconds time)
{
    using namespace std;
    using namespace std::chrono;
    auto next_time = [&](int value)
    {
        return (value - start_value) * time / (end_value - start_value);
    };
    auto t0 = steady_clock::now();
    for (cool_variable = start_value; cool_variable >= end_value; --cool_variable)
    {
        this_thread::sleep_until(t0 + next_time(cool_variable));
        cout << cool_variable << endl;
    }
}
The client code need not change at all:
decrement(2000, 0, 40s);
The 40s argument will implicitly convert to 40'000'000'000ns at the call site. And this is why it is so important to use <chrono> types for time. Had we not done this, this final (minor) simplification would not have been minor at all. It would have required changing client code at the call site, which in real-world applications is often impractical.
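A tiny sketch of that conversion (C++14 for the chrono literals and digit separators; nothing here is specific to the decrement example):
using namespace std::chrono_literals;

std::chrono::nanoseconds t = 40s; // implicit, lossless widening conversion
static_assert(std::chrono::nanoseconds{40s} == 40'000'000'000ns,
              "seconds convert to nanoseconds without loss");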
In Summary
Use sleep_until.
Use <chrono>.
As silly as it seems, I would like to know whether there may be pitfalls when trying to reconcile the time costs of a for loop, as measured
either from time points just outside the for loop (global or external time cost)
or from time points inside the loop, accumulated across iterations (local or internal time cost)?
The example below illustrates my difficulties getting two equal measurements:
#include <iostream>
#include <vector> // std::vector
#include <ctime> // clock(), ..
int main(){
    clock_t clockStartLoop;
    double timeInternal(0); // time cost of the loop, summing all time costs of commands within the "for" loop
    double timeExternal;    // time cost of the loop, as measured outside the boundaries of the "for" loop
    std::vector<int> vecInt; // will be [0,1,..,9999] after the loop below
    clock_t costExternal(clock());
    for(int i=0;i<10000;i++){
        clockStartLoop = clock();
        vecInt.push_back(i);
        timeInternal += clock() - clockStartLoop; // incrementing internal time cost
    }
    timeInternal /= CLOCKS_PER_SEC;
    timeExternal = (clock() - costExternal)/(double)CLOCKS_PER_SEC;
    std::cout << "timeExternal = " << timeExternal << " s ";
    std::cout << "vs timeInternal = " << timeInternal << std::endl;
    std::cout << "We have a ratio of " << timeExternal/timeInternal << " between the two.." << std::endl;
}
I usually get a ratio around 2 as output e.g.
timeExternal = 0.008407 s vs timeInternal = 0.004287
We have a ratio of 1.96105 between the two..
, whereas I was hoping for a ratio closer to 1.
Is it just because there are operations internal to the loop which are not measured by the clock() difference (such as incrementing timeInternal) ?
Could the i++ operation in the for(..) be non-negligible in the external measurement and also explain the difference with the internal one ?
I'm actually dealing with more complex code, and I would like to isolate the time costs within a loop, being sure that all the time slices I consider make up a complete pie (which I have never managed so far..). Thanks a lot
timeExternal = 0.008407 s vs timeInternal = 0.004287 We have a ratio of 1.96105 between the two..
A ratio of ~2 is to be expected - by far the heaviest call in your loop is clock() itself (on most systems clock() is a syscall to the kernel).
Imagine that clock() implementation looks like the following pseudocode:
clock_t clock() {
    go_to_kernel();       // very long operation
    clock_t rc = query_process_clock();
    return_from_kernel(); // very long operation
    return rc;
}
Now going back to the loop, we can annotate the places where time is spent:
for(int i=0;i<10000;i++){
    // go_to_kernel - very long operation
    clockStartLoop = clock();
    // return_from_kernel - very long operation
    vecInt.push_back(i);
    // go_to_kernel - very long operation
    timeInternal += clock() - clockStartLoop;
    // return_from_kernel - very long operation
}
So between the two calls to clock() we have 2 long operations, with a total in the loop of 4. Hence the ratio of 2-to-1.
Is it just because there are operations internal to the loop which are not measured by the clock() difference (such as incrementing timeInternal) ?
No, incrementing timeInternal is negligible.
Could the i++ operation in the for(..) be non-negligible in the external measurement and also explain the difference with the internal one ?
No, i++ is also negligible. Remove the inner calls to clock() and you will see a much faster execution time. On my system it was 0.00003 s.
The next most expensive operation after clock() is vector::push_back(), because it needs to resize the vector. The resizing cost is amortized by a geometric growth factor, and it can be eliminated entirely by calling vector::reserve() before entering the loop, as sketched below.
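A minimal sketch of that fix (same loop as in the question, timing omitted):
std::vector<int> vecInt;
vecInt.reserve(10000);           // one allocation up front...
for (int i = 0; i < 10000; ++i)
    vecInt.push_back(i);         // ...so push_back never reallocates in the loop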
Conclusion: when benchmarking, make sure to time entire loops, not individual iterations. Better yet, use a framework like Google Benchmark, which helps avoid many other pitfalls (like compiler optimizations). There's also quick-bench.com for simple cases (based on Google Benchmark).
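For illustration, a minimal Google Benchmark version of the loop above might look like this (a sketch; the framework decides how many repetitions to run and reports the statistics):
#include <benchmark/benchmark.h>
#include <vector>

static void BM_PushBack(benchmark::State& state) {
    for (auto _ : state) {                       // the framework times this whole loop body
        std::vector<int> vecInt;
        vecInt.reserve(10000);
        for (int i = 0; i < 10000; ++i)
            vecInt.push_back(i);
        benchmark::DoNotOptimize(vecInt.data()); // keep the optimizer from discarding the work
    }
}
BENCHMARK(BM_PushBack);
BENCHMARK_MAIN();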
I have a function which can generate 10000 random numbers and write them in a file.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void generator(char filename[])
{
    int i;
    int n;
    FILE* fp;
    if((fp=fopen(filename,"w+"))==NULL)
    {
        printf("Failed to create file!");
        return; /* don't fall through and write to a null FILE* */
    }
    srand((unsigned)time(NULL));
    for(i=0;i<10000;i++)
    {
        n=rand()%10000;
        fprintf(fp,"%d ",n);
    }
    fclose(fp);
}
How can I get the execution time of this function using C/C++ ?
Code profiling is not a particularly easy task (or, as we often say in programming, it's "non-trivial"). The issue is that "execution time" measured in seconds isn't particularly accurate or useful.
What you want to do is measure the number of CPU cycles. This can be done using an external tool such as callgrind (one of Valgrind's tools). There's a 99% chance that's all you want.
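For reference, a typical callgrind session (standard Valgrind usage, nothing specific to your code) looks like:
valgrind --tool=callgrind ./your_program
callgrind_annotate callgrind.out.<pid>
with KCachegrind as a graphical alternative for browsing the results.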
If you REALLY want to do that yourself in code, you're undertaking a rather difficult task. I know first hand - I wrote a comparative benchmarking library in C++ for on-the-fly performance testing.
If you really want to go down that road, you can research benchmarking on Intel processors (which mostly carries over to AMD), or whatever processor you're using. However, as I said, that topic is large and in-depth, and far beyond the scope of a StackOverflow answer.
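Just as a taste of that road, here is a hedged sketch reading the x86 timestamp counter via the __rdtsc intrinsic (GCC/Clang header shown; MSVC provides it in <intrin.h>). Raw cycle counts are affected by frequency scaling and out-of-order execution, which is exactly why this topic gets deep:
#include <x86intrin.h> /* __rdtsc on GCC/Clang */
#include <stdio.h>

int main(void)
{
    unsigned long long t0 = __rdtsc();
    /* ... code to measure ... */
    unsigned long long t1 = __rdtsc();
    printf("elapsed: %llu cycles\n", t1 - t0);
    return 0;
}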
You can use the chrono library:
#include <chrono>
#include <iostream>
//*****//
char filename[] = "file.txt"; // generator() takes char[], so avoid passing a string literal directly
auto start = std::chrono::steady_clock::now();
generator(filename);
auto end = std::chrono::steady_clock::now();
std::cout << "generator() took "
          << std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() << "us.\n";
You already have some nice C answers that also work with C++.
Here a native C++ solution using <chrono>:
auto tbegin = std::chrono::high_resolution_clock::now();
...
auto tend = std::chrono::high_resolution_clock::now();
auto tduration = std::chrono::duration_cast<std::chrono::microseconds>(tend - tbegin).count();
The advantage is that you can switch from microsecond to millisecond, seconds or any other time measurement units very easily.
Note that you may run into OS limits on clock accuracy (typically around 15 milliseconds in a Windows environment), so this gives meaningful results only if you're well above that limit.
void generator(char filename[])
{
    clock_t tStart = clock();
    /* your code here */
    printf("Time taken: %.2fs\n", (double)(clock() - tStart)/CLOCKS_PER_SEC);
}
Update: also add #include <ctime> (or <time.h> in C).
Try this
#include <sys/time.h>
struct timeval tpstart,tpend;
double timeuse;
//get time before generator starts
gettimeofday(&tpstart,NULL);
//call generator function
generator(filename);
//get time after generator ends
gettimeofday(&tpend,NULL);
//calculate the used time
timeuse=1000000*(tpend.tv_sec-tpstart.tv_sec)+tpend.tv_usec-tpstart.tv_usec;
timeuse/=1000000;
printf("Used Time:%fsec\n",timeuse);
#include <ctime>
.....
clock_t start = clock();
...//the code you want to get the execution time for
double elapsed_time = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;
std::cout << elapsed_time << std::endl;//elapsed_time now contains the execution time(in seconds) of the code in between
will give you the approximate (not exact) execution time of the code between the first and second clock() calls.
Temporarily make the limit 10000000
Time it with a stopwatch. Divide the time by 1000
I basically have a school project testing the time different sort algorithms take, recording how long each takes with n numbers to sort. So I decided to use the Boost library with C++ to record the time. I am at the point where I am not sure how to do it; I have googled it and have found people using different ways, for example:
auto start = boost::chrono::high_resolution_clock::now();
auto end = boost::chrono::high_resolution_clock::now();
auto time = (end-start).count();
or
boost::chrono::system_clock::now();
or
boost::chrono::steady_clock::now()
or even using something like this
boost::timer::cpu_timer and boost::timer::auto_cpu_timer
or
boost::posix_time::ptime start = boost::posix_time::microsec_clock::local_time( );
so I want to be sure I'm doing it right. Right now this is what I have:
typedef boost::chrono::duration<double, boost::nano> boost_nano;
auto start_t = boost::chrono::high_resolution_clock::now();
// call function
auto end_t = boost::chrono::high_resolution_clock::now();
boost_nano time = (end_t - start_t);
cout << time.count();
so am I on the right track?
You likely want the high resolution timer.
You can use either that of boost::chrono or std::chrono.
Boost Chrono has some support for IO builtin, so it makes it easier to report times in a human friendly way.
I usually use a wrapper similar to this:
template <typename Caption, typename F>
auto timed(Caption const& task, F&& f) {
    using namespace boost::chrono;
    struct _ {
        high_resolution_clock::time_point s;
        Caption const& task;
        ~_() {
            std::cout << " -- (" << task << " completed in "
                      << duration_cast<milliseconds>(high_resolution_clock::now() - s) << ")\n";
        }
    } timing { high_resolution_clock::now(), task };
    return f();
}
Which reports time taken in milliseconds.
The good part here is that you can time construction and similar:
std::vector<int> large = timed("generate data", [] {
    return generate_uniform_random_data();
});
But also, general code blocks:
timed("do_step2", [] {
// step two is foo and bar:
foo();
bar();
});
And it works if e.g. foo() throws, just fine.
int main() {
    return timed("demo task", [] {
        sleep(1);
        return 42;
    });
}
Prints
-- (demo task completed in 1000 milliseconds)
42
I typically use time(0) to control the duration of a loop. time(0) is simply one time measurement that, because of its own short duration, has the least impact on everything else going on (and you can even run a do-nothing loop to capture how much to subtract from any other loop measurement effort).
So in a loop running for 3 (or 10) seconds, how many times can the loop invoke the thing you are trying to measure?
Here is an example of how my older code measures the duration of 'getpid()'
uint32_t spinPidTillTime0SecChange(volatile int& pid)
{
    uint32_t spinCount = 1; // getpid() invocation count
    // no measurement, just spinning
    ::time_t tStart = ::time(nullptr);
    ::time_t tEnd = tStart;
    while (0 == (tEnd - tStart)) // (tStart == tEnd)
    {
        pid = ::getpid();
        tEnd = ::time(nullptr);
        spinCount += 1;
    }
    return spinCount;
}
Invoke this 3 (or 10) times, adding the return values together. To make it easy, discard the first measurement (because it probably will be a partial second).
Yes, I am sure there is a C++11 way of accessing what time(0) accesses; one possibility is sketched below.
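Here is one such C++11 sketch of the same spin-until-the-second-changes idea using <chrono> (my own translation, not the original author's code; getpid() comes from <unistd.h> as before):
#include <chrono>
#include <cstdint>
#include <unistd.h> // ::getpid()

std::uint32_t spinPidTillSecChange(volatile int& pid)
{
    using std::chrono::steady_clock;
    using std::chrono::seconds;
    std::uint32_t spinCount = 1;
    auto tStart = std::chrono::time_point_cast<seconds>(steady_clock::now());
    auto tEnd = tStart;
    while (tEnd == tStart) // spin until the whole-second value changes
    {
        pid = ::getpid();
        tEnd = std::chrono::time_point_cast<seconds>(steady_clock::now());
        spinCount += 1;
    }
    return spinCount;
}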
Use std::chrono::steady_clock or std::chrono::high_resolution_clock (if it is steady - see below) and not std::chrono::system_clock for measuring run time in C++11 (or use its boost equivalent). The reason is (quoting system_clock's documentation):
on most systems, the system time can be adjusted at any moment
while steady_clock is monotonic and is better suited for measuring intervals:
Class std::chrono::steady_clock represents a monotonic clock. The time
points of this clock cannot decrease as physical time moves forward.
This clock is not related to wall clock time, and is best suitable for
measuring intervals.
Here's an example:
auto start = std::chrono::steady_clock::now();
// do something
auto finish = std::chrono::steady_clock::now();
double elapsed_seconds = std::chrono::duration_cast<
std::chrono::duration<double> >(finish - start).count();
A small practical tip: if you are measuring run time and want to report seconds std::chrono::duration_cast<std::chrono::seconds> is rarely what you need because it gives you whole number of seconds. To get the time in seconds as a double use the example above.
As suggested by Gregor McGregor, you can use a high_resolution_clock which may sometimes provide higher resolution (although it can be an alias of steady_clock), but beware that it may also be an alias of system_clock, so you might want to check is_steady.
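A small sketch of that check (the bench_clock alias is my own name for it):
#include <chrono>
#include <type_traits>

// Use high_resolution_clock only when it is steady; otherwise fall back to steady_clock.
using bench_clock = std::conditional<
    std::chrono::high_resolution_clock::is_steady,
    std::chrono::high_resolution_clock,
    std::chrono::steady_clock>::type;

auto start = bench_clock::now();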
In fact I am trying to calculate the time a function takes to complete in my program.
So I am using the logic of getting the system time when I call the function and the time when the function returns a value, then subtracting the two to get the time it took to complete.
So if anyone can tell me a better approach, or just how to get the system time at an instant, it would be quite a help.
The approach I use when timing my code is the time() function. It returns a single numeric value representing the seconds since the epoch, which makes the subtraction part easy.
Relevant code:
#include <time.h>
#include <iostream>
int main (int argc, char *argv[]) {
    time_t startTime, endTime, totalTime; // time_t rather than int, to match time()'s return type
    startTime = time(NULL);
    /* relevant code to benchmark in here */
    endTime = time(NULL);
    totalTime = endTime - startTime;
    std::cout << "Runtime: " << totalTime << " seconds.";
    return 0;
}
Keep in mind this is wall-clock time. For CPU time, see Ben's reply.
Your question is totally dependent on WHICH system you are using. Each system has its own functions for getting the current time. For finding out how long the system has been running, you'd want to access one of the "high resolution performance counters". If you don't use a performance counter, you are usually limited to microsecond accuracy (or worse), which is almost useless for profiling the speed of a function.
In Windows, you can access the counter via the 'QueryPerformanceCounter()' function. This returns an arbitrary number that is different on each processor. To find out how many ticks in the counter == 1 second, call 'QueryPerformanceFrequency()'.
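A minimal sketch of that pairing (standard Win32 calls; the measured section is a placeholder):
#include <windows.h>
#include <iostream>

int main() {
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq); // counter ticks per second
    QueryPerformanceCounter(&t0);
    // ... code to measure ...
    QueryPerformanceCounter(&t1);
    double seconds = static_cast<double>(t1.QuadPart - t0.QuadPart) / freq.QuadPart;
    std::cout << seconds << " s\n";
}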
If you're coding on a platform other than Windows, just google "performance counter" along with the system you are coding under, and it should tell you how to access the counter.
Edit (clarification)
This is C++; just include windows.h and link against Kernel32.lib (the site seems to have removed my hyperlink; check out the documentation at: http://msdn.microsoft.com/en-us/library/ms644904.aspx). For C#, you can use the System.Diagnostics.PerformanceCounter class.
You can use time_t
Under Linux, try gettimeofday() for microsecond resolution, or clock_gettime() for nanosecond resolution.
(Of course the actual clock may have a coarser resolution.)
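A brief sketch of the clock_gettime() variant (POSIX; older glibc may need -lrt at link time):
#include <time.h>
#include <stdio.h>

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    /* ... code to measure ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%f s\n", sec);
    return 0;
}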
On some systems you don't have access to the <chrono> header. In that case, you can use the following code snippet to find out how long your program takes to run, with an accuracy of seconds.
#include <ctime>
#include <iostream>
using std::cout; using std::endl;

void function()
{
    time_t currentTime;
    time(&currentTime);
    time_t startTime = currentTime;
    /* Your program starts from here */
    time(&currentTime);
    int timeElapsed = (int)(currentTime - startTime);
    cout<<"It took "<<timeElapsed<<" seconds to run the program"<<endl;
}
You can use the std::chrono solution described here: Getting an accurate execution time in C++ (micro seconds). You will get much better accuracy in your measurement; usually we measure code execution in milliseconds (ms) or even microseconds (us).
#include <chrono>
#include <iostream>
...
[YOUR METHOD/FUNCTION STARTING HERE]
auto start = std::chrono::high_resolution_clock::now();
[YOUR TEST CODE HERE]
auto elapsed = std::chrono::high_resolution_clock::now() - start;
long long microseconds = std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
std::cout << "Elapsed time: " << microseconds << " ms;