Class to calculate performance of a function - c++

I'm writing a class as follows:
struct TimeIt {
using TimePoint = std::chrono::time_point<std::chrono::high_resolution_clock>;
TimeIt(const std::string& functName) :
t1{std::chrono::high_resolution_clock::now()},
functName{functName} {}
~TimeIt() {
TimePoint t2 = std::chrono::high_resolution_clock::now();
std::cout << "Exiting from " << functName << "...\n Elapsed: ";
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << " ms" << "\n";
}
TimePoint t1;
std::string functName;
};
The whole point of it is to measure the time a function takes to complete, by creating a TimeIt at the start of that function. However, the only value I'm getting is 0 ms. This is obviously wrong, because some of the functions take up to a minute, but I can't see why it's wrong.
I also measured the same thing by hand, creating a TimePoint (with auto) at the start and at the end of the function and doing a duration_cast. Any clue what I'm missing here?
Edit:
I'm going to try to make it reproducible. A little bit of context: I'm working with big matrices (12000 dimensions) and doing a lot of input/output operations.
template <typename InputType>
InputMat<InputType>
readInp(const std::string& filepath = "data.inp", const size_t& reserveSize = 15000) {
TimeIt("readInp");
std::ifstream F(filepath);
assert(F.is_open());
InputMat<InputType> res;
res.reserve(reserveSize);
std::string line;
while (F >> line) {
InputType lineBitset{line};
res.push_back(lineBitset);
}
return res;
}
This function reads a matrix, and calling TimeIt here gives a very different result compared to timing the call in the wrapper function:
void test1() {
//Testing for 0-1 values
auto t1 = std::chrono::high_resolution_clock::now();
auto inpMat = readInp<std::bitset<32>>();
auto t2 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count() << "\n";
//More code...
}
This outputs:
Exiting from readInp...
Elapsed: 0 milliseconds
4
and data.inp

NOW you have made the problem clear! By writing this:
TimeIt("Reading");
you are creating a temporary object, which is destroyed immediately, at the end of that statement. You need to give this object a name so it lives until the end of the block:
TimeIt timer("Reading");
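To make the lifetime difference visible, here is a minimal sketch. Tracer and traceLifetimes are hypothetical names that are not part of the question; they only record when the constructor and destructor run.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: records construction and destruction in a shared log
// so that the order of events can be inspected afterwards.
struct Tracer {
    Tracer(std::string name, std::vector<std::string>& log)
        : name{std::move(name)}, log{log} {
        log.push_back("ctor " + this->name);
    }
    ~Tracer() { log.push_back("dtor " + name); }
    std::string name;
    std::vector<std::string>& log;
};

// Returns the sequence of events for a temporary versus a named object.
std::vector<std::string> traceLifetimes() {
    std::vector<std::string> log;
    {
        Tracer("temporary", log);    // unnamed: destroyed at the end of this statement
        log.push_back("work 1");
    }
    {
        Tracer named("named", log);  // named: destroyed at the closing brace
        log.push_back("work 2");
    }
    return log;
}
```

In the returned log, "dtor temporary" appears before "work 1", whereas "dtor named" appears after "work 2". That is why the unnamed TimeIt reports roughly 0 ms.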

Related

Total time in different parts of recursive function

I am new to C++ and I need to measure the total time for different parts of a recursive function. A simple example to show where I get so far is:
#include <iostream>
#include <unistd.h>
#include <chrono>
using namespace std;
using namespace std::chrono;
int recursive(int);
void foo();
void bar();
int main() {
int n = 5; // this value is known only at runtime
int result = recursive(n);
return 0;
}
int recursive(int n) {
auto start = high_resolution_clock::now();
if (n > 1) { recursive(n - 1); n = n - 1; }
auto stop = high_resolution_clock::now();
auto duration_recursive = duration_cast<microseconds>(stop - start);
cout << "time in recursive: " << duration_recursive.count() << endl;
//
// .. calls to other functions and computations parts I don't want to time
//
start = high_resolution_clock::now();
foo();
stop = high_resolution_clock::now();
auto duration_foo = duration_cast<seconds>(stop - start);
cout << "time in foo: " << duration_foo.count() << endl;
//
// .. calls to other functions and computations parts I don't want to time
//
start = high_resolution_clock::now();
bar();
stop = high_resolution_clock::now();
auto duration_bar = duration_cast<seconds>(stop - start);
cout << "time in bar: " << duration_bar.count() << endl;
return 0;
}
void foo() { // a complex function
sleep(1);
}
void bar() { // another complex function
sleep(2);
}
I want the total time for each of the functions, for instance, for foo() it is 5 seconds, while now I always get 1 second. The number of iterations is known only at runtime (n=5 here is fixed just for simplicity).
To compute the total time for each of the functions, I tried declaring the durations above as static and accumulating the results, but that didn't work.
You can use some container to store the times, pass it by reference and accumulate the times. For example with a std::map<std::string,unsigned> to have labels:
int recursive(int n, std::map<std::string,unsigned>& times) {
if (n <= 0) return 0; // base case: stop recursing
// measure time of foo, then add it to the running total
times["foo"] += duration_foo;
// measure time of bar, then add it to the running total
times["bar"] += duration_bar;
// recurse
recursive(n-1,times);
return 0;
}
Then
std::map<std::string,unsigned> times;
recursive(200,times);
for (const auto& t : times) {
std::cout << t.first << " took total : " << t.second << "\n";
}
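A more complete sketch of the same idea follows. It stores std::chrono durations instead of unsigned, and the names Clock, Totals, timeInto and workDone are assumptions made for illustration. foo and bar are trivial stand-ins for the real functions.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

using Clock = std::chrono::steady_clock;
using Totals = std::map<std::string, Clock::duration>;

// Stand-ins for the real work (hypothetical): they only bump a counter.
int workDone = 0;
void foo() { workDone += 1; }
void bar() { workDone += 2; }

// Times a single call of fn and adds the elapsed time to totals[label].
// operator[] value-initializes a missing entry, so totals start at zero.
template <class Fn>
void timeInto(Totals& totals, const std::string& label, Fn fn) {
    const auto start = Clock::now();
    fn();
    totals[label] += Clock::now() - start;
}

int recursive(int n, Totals& totals) {
    assert(n >= 0);
    if (n == 0) return 0;            // base case: stop recursing
    timeInto(totals, "foo", foo);    // accumulated across all levels
    timeInto(totals, "bar", bar);
    return recursive(n - 1, totals);
}
```

After calling recursive(5, totals), totals["foo"] holds the sum over all five calls of foo. It can be printed with std::chrono::duration_cast to whatever unit is convenient.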

Preventing compiler from optimising a loop

I want to measure the time it takes to call a function.
Here is the code:
for (int i = 0; i < 5; ++i) {
std::cout << "Pass : " << i << "\n";
const auto t0 = std::chrono::high_resolution_clock::now();
system1.euler_intregration(0.0166667);
const auto t1 = std::chrono::high_resolution_clock::now();
std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << "\n";
}
but the compiler keeps optimising the call away, so the time is not measured and the output is zero.
I have tried using asm("") and __asm__("") as advised here but nothing works for me.
I must admit that I don't really know how these asm() statements work, so I might be using them in the wrong way.
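For context, one common way such an empty asm statement is used is sketched below. This is an illustration under assumptions, not a guaranteed fix: keepAlive, integrate and timeIntegrate are hypothetical names, the asm syntax is specific to GCC and Clang, and the fallback branch for other compilers is only a guess at an equivalent. The idea is to tell the compiler that the result is read, so the computation producing it is harder to discard.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: makes the compiler treat `value` as observed.
template <class T>
inline void keepAlive(const T& value) {
#if defined(__GNUC__) || defined(__clang__)
    // Empty asm statement that claims to read `value` and to touch memory.
    asm volatile("" : : "r,m"(value) : "memory");
#else
    // Assumed fallback for other compilers: force a volatile read of the object.
    const volatile char* p = reinterpret_cast<const volatile char*>(&value);
    static_cast<void>(*p);
#endif
}

// Stand-in for the real workload (hypothetical).
double integrate(double dt, int steps) {
    assert(steps >= 0);
    double x = 0.0;
    for (int i = 0; i < steps; ++i) x += dt;
    return x;
}

// Times one call and returns the elapsed nanoseconds.
long long timeIntegrate(double dt, int steps, double& result) {
    const auto t0 = std::chrono::steady_clock::now();
    result = integrate(dt, steps);
    keepAlive(result);  // the result counts as "used" before the clock is read again
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

A plain asm("") without operands or a "memory" clobber gives the compiler much less to respect, which may be why it appeared to do nothing.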

Chrono C++ timings not correct

I'm just comparing the speed of a couple of Fibonacci functions. One gives an output almost immediately and reports that it finished in 500 nanoseconds. The other, depending on the depth, may sit there for many seconds, yet when it is done it reports that it took only 100 nanoseconds, after I just sat there and waited about 20 seconds for it.
It's not a big deal as I can prove the other is slower just with raw human perception, but why would chrono not be working? Something to do with recursion?
PS I know that fibonacci2() doesn't give the correct output on odd numbered depths, I'm just testing some things and the output is actually just there so the compiler doesn't optimize it away or something. Go ahead and just copy this code and you'll see fibonacci2() immediately output but you'll have to wait like 5 seconds for fibonacci(). Thank you.
#include <iostream>
#include <chrono>
int fibonacci2(int depth) {
static int a = 0;
static int b = 1;
if (b > a) {
a += b; //std::cout << a << '\n';
}
else {
b += a; //std::cout << b << '\n';
}
if (depth > 1) {
fibonacci2(depth - 1);
}
return a;
}
int fibonacci(int n) {
if (n <= 1) {
return n;
}
return fibonacci(n - 1) + fibonacci(n - 2);
}
int main() {
int f = 0;
auto start2 = std::chrono::steady_clock::now();
f = fibonacci2(44);
auto stop2 = std::chrono::steady_clock::now();
std::cout << f << '\n';
auto duration2 = std::chrono::duration_cast<std::chrono::nanoseconds>(stop2 - start2);
std::cout << "faster function time: " << duration2.count() << '\n';
auto start = std::chrono::steady_clock::now();
f = fibonacci(44);
auto stop = std::chrono::steady_clock::now();
std::cout << f << '\n';
auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
std::cout << "way slower function with incorrect time: " << duration.count() << '\n';
}
I don't know what compiler you are using and with which compiler options, but I tested x64 msvc v19.28 using /O2 in godbolt. Here the compiled instructions are reordered such that it queries the perf_counter twice before invoking the fibonacci(int) function, which in code would look like
auto start = ...;
auto stop = ...;
f = fibonacci(44);
A solution to disallow this reordering might be to use an atomic_thread_fence just before and after the fibonacci function call.
As Mestkon answered, the compiler can reorder your code.
Examples of how to prevent the compiler from reordering: Memory Ordering - Compile Time Memory Barrier
It would be beneficial in the future if you provided information on what compiler you were using.
gcc 7.5 with -O2 for example does not reorder the timer instructions in this given scenario.
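The fence suggestion could look roughly like the sketch below. timeFibonacci is a hypothetical wrapper, and whether the fences are enough to stop a particular compiler from moving the clock reads is an assumption that should be checked in the generated assembly.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

int fibonacci(int n) {
    return n <= 1 ? n : fibonacci(n - 1) + fibonacci(n - 2);
}

// Times fibonacci(n) with a fence on each side of the call, as suggested above.
std::chrono::nanoseconds timeFibonacci(int n, int& result) {
    assert(n >= 0);
    const auto start = std::chrono::steady_clock::now();
    std::atomic_thread_fence(std::memory_order_seq_cst);
    result = fibonacci(n);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

Writing the result through a reference parameter also gives the call a visible side effect, so it is less likely to be treated as movable.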

Pass any function by parameter [duplicate]

Explanation
Hello,
I want to create a function that can execute any type of function, indicating at the end of its execution the time it took.
The called function can have a return value or not and 0 or more parameters of any type.
The calling function must print something like this:
Running "myFunction" .....
Done ! (5210ms)
Basically I want to create a function that calls any type of function passed as a parameter by adding code before and after the call.
What I have done
For now I do it like this.
The calling function:
template <typename T>
T callFunctionPrintTime(std::string fnName, std::function<T()> fn) {
std::cout << ">> Running " << fnName << " ... " << std::endl;
auto t1 = std::chrono::high_resolution_clock::now();
//Call to the target function
T retVal = fn();
auto t2 = std::chrono::high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
std::cout << "Done ! (" << duration << " ms)" << std::endl;
return retVal;
}
The main
int main()
{
//Store the function to call
std::function<unsigned long()> fn = []() {
return myFunction(15, 10000);
};
//Use of the function we are interested in
auto i = callFunctionPrintTime("myFunction", fn);
//The return value of myFunction can be used in the rest of the program.
std::cout << "i: " << i << std::endl;
}
myFunction
This function doesn't matter, it can be anything.
Here we execute a while loop for a given maximum time or maximum number of iterations, and return the number of iterations performed.
unsigned long myFunction(long maxMs, unsigned long maxI) {
unsigned long i = 0;
auto tStart = std::chrono::high_resolution_clock::now();
while (maxMs > (std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::high_resolution_clock::now() - tStart).count()) &&
maxI > i) {
i++;
}
return i;
}
Question
What is the best way for you to do this? I am not satisfied with my code.
I'm not sure I'm using the right way to pass a function of any kind by parameter.
Moreover by using a lambda expression to store my function I can't retrieve the name of the called function. So I have to pass its name by parameter.
I'm pretty sure there's no single answer to what's best - but this is a small improvement i.m.o. since it's a bit more generic.
#include <chrono>
#include <iostream>
#include <string>
#include <type_traits>
#include <utility> // std::forward
// enable it for invocables with any type of arguments
template <class Func, class... Args,
std::enable_if_t<std::is_invocable_v<Func, Args...>, int> = 0>
decltype(auto) callFunctionPrintTime(std::string fnName, Func fn, Args&&... args)
{
std::cout << ">> Running " << fnName << " ... " << std::endl;
auto t1 = std::chrono::high_resolution_clock::now();
//Call to the target function by forwarding the arguments to it
decltype(auto) retVal = fn(std::forward<Args>(args)...);
auto t2 = std::chrono::high_resolution_clock::now();
auto duration =
std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
std::cout << "Done ! (" << duration << " ms)" << std::endl;
return retVal;
}
Alternatively, if you don't plan on making overloads for non-invocables (and it seems pretty obvious that you won't, when I think about it), you can use static_assert instead of SFINAE:
template <class Func, class... Args>
decltype(auto) callFunctionPrintTime(std::string fnName, Func fn, Args&&... args)
{
static_assert(std::is_invocable_v<Func, Args...>, "must be invocable");
//...
Test usage:
int& a_func(int i) {
static int rv = 0;
rv += i;
return rv;
}
int main() {
int& ref = callFunctionPrintTime("a_func 1", a_func, 10);
std::cout << ref << '\n'; // prints 10
ref += 20;
callFunctionPrintTime("a_func 2", a_func, 100);
std::cout << ref << '\n'; // prints 130 (10 + 20 + 100)
}
Or some of the alternatives for calling myFunction:
std::function<unsigned long()> fn = []() { return myFunction(15, 100000); };
std::cout << callFunctionPrintTime("myFunction", fn);
std::cout << callFunctionPrintTime("myFunction",
[]() { return myFunction(15, 100000); });
std::cout << callFunctionPrintTime("myFunction", myFunction, 15, 100000);
Some useful links:
decltype(auto), std::enable_if_t, std::is_invocable_v, SFINAE
The main idea is correct. There are some details which might be improved:
template <typename Func, typename ... Ts>
decltype(auto) callFunctionPrintTime(std::string_view fnName, Func&& f, Ts&&... args) {
static_assert(std::is_invocable_v<Func&&, Ts&&...>); // Possibly SFINAE instead.
std::cout << ">> Running " << fnName << " ... " << std::endl;
struct Finally {
std::chrono::time_point<std::chrono::high_resolution_clock> t1 =
std::chrono::high_resolution_clock::now();
~Finally() {
auto t2 = std::chrono::high_resolution_clock::now();
auto duration =
std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
std::cout << "Done ! (" << duration << " ms)" << std::endl;
}
} finally;
return std::invoke(std::forward<Func>(f), std::forward<Ts>(args)...);
}
Now this:
handles a void return type (no specialization required).
also logs when an exception is thrown (you can go further with std::uncaught_exceptions or a try/catch block to distinguish the exception path from the normal path).
handles any invocable together with its parameters.
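The std::uncaught_exceptions idea mentioned in the list can be sketched as follows. ExitLogger and callLogged are hypothetical names, and the log is written to a vector rather than std::cout only to keep the example easy to check.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope guard: records how the enclosing scope was left.
struct ExitLogger {
    explicit ExitLogger(std::vector<std::string>& log) : log{log} {}
    ~ExitLogger() {
        // More in-flight exceptions than at construction means we are unwinding.
        const bool unwinding = std::uncaught_exceptions() > countAtEntry;
        log.push_back(unwinding ? "failed" : "done");
    }
    std::vector<std::string>& log;
    int countAtEntry = std::uncaught_exceptions();
};

// Invokes f(args...) and logs "done" or "failed" when the call finishes.
template <class Func, class... Args>
decltype(auto) callLogged(std::vector<std::string>& log, Func&& f, Args&&... args) {
    ExitLogger guard{log};
    return std::invoke(std::forward<Func>(f), std::forward<Args>(args)...);
}
```

Comparing against the count taken at construction, rather than against zero, keeps the guard correct when it is itself used inside a destructor that runs during stack unwinding.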
For an automatic name, we have to rely on a macro (note that __VA_OPT__ requires C++20):
#define CallFunctionPrintTime(F, ...) callFunctionPrintTime(#F, F __VA_OPT__(,) __VA_ARGS__)

Why does switching the order of for loops significantly change the execution time?

In the following code, I have 2 nested for loops. The second one swaps the order of the for loops, and runs significantly faster.
Is this purely a cache locality issue (the first code loops over a vector many times, whereas the second code loops over the vector once), or is there something else that I'm not understanding?
int main()
{
using namespace std::chrono;
auto n = 1 << 12;
vector<int> v(n);
high_resolution_clock::time_point t1 = high_resolution_clock::now();
for(int i = 0; i < (1 << 16); ++i)
{
for(const auto val : v) i & val;
}
high_resolution_clock::time_point t2 = high_resolution_clock::now();
duration<double> time_span = duration_cast<duration<double>>(t2 - t1);
std::cout << "It took me " << time_span.count() << " seconds.";
std::cout << std::endl;
t1 = high_resolution_clock::now();
for(const auto val : v)
{
for(int i = 0; i < (1 << 16); ++i) i & val;
}
t2 = high_resolution_clock::now();
time_span = duration_cast<duration<double>>(t2 - t1);
std::cout << "It took me " << time_span.count() << " seconds.";
std::cout << std::endl;
}
As written, the second loop needs to read each val from vector v only once. The first version needs to read each val from vector v once in the inner loop for every i, so in total 65536 times.
So without any optimisation, this will make the second loop several times faster. With optimisation turned on high enough, the compiler will figure out that all these calculations achieve nothing, and are unnecessary, and throw them all away. Your execution times will then go down to zero.
If you change the code to do something with the results (like adding up all values i & val, then printing the total), a really good compiler may figure out that both pieces of code produce the same result and use the faster method for both cases.
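A minimal sketch of that change is shown below, with each loop order wrapped in a function that returns the accumulated total. The names sumIndexOuter and sumVectorOuter are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Outer loop over i, inner loop over the vector (first version in the question).
std::uint64_t sumIndexOuter(const std::vector<int>& v, int limit) {
    assert(limit >= 0);
    std::uint64_t total = 0;
    for (int i = 0; i < limit; ++i)
        for (const auto val : v) total += static_cast<std::uint64_t>(i & val);
    return total;
}

// Outer loop over the vector, inner loop over i (second version).
std::uint64_t sumVectorOuter(const std::vector<int>& v, int limit) {
    assert(limit >= 0);
    std::uint64_t total = 0;
    for (const auto val : v)
        for (int i = 0; i < limit; ++i) total += static_cast<std::uint64_t>(i & val);
    return total;
}
```

Printing both totals after the timed sections means the results are observed, so the loops cannot simply be discarded. The two functions always return the same value, which is what would allow a sufficiently clever optimizer to treat them alike.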