I need to create a template function which calls an exchangeable "worker" function over and over again.
template<class F>
long long exec(F f) {
    long long s = 0;
    for (int i = 0; i < 1000000; i++) {
        s += f(i); // many calls to f
    }
    return s; // long long: the sum does not fit in an int
}
I can think of several ways to define the worker function:
Inline function
inline int fooInline(int a) { return a + 1; }
exec(fooInline);
Function with definition from other compilation unit
int fooSrc(int a);
exec(fooSrc);
Function object
struct FooOp {
int operator()(int a) { return a + 1; }
};
exec(FooOp());
std::function (e.g. bound to an inline function)
std::function<int(int)> fooFnc = std::bind(&fooInline, std::placeholders::_1);
exec(fooFnc);
Lambda
auto fooLambda = [](int a) { return a + 1; };
exec(fooLambda);
Temporary lambda
exec([](int a) { return a + 1; });
What are the differences between these methods? What would be the fastest way? Can I assume that exec(fooInline) will actually inline fooInline?
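The C++ standard says nothing about which of these is fastest. You can measure in your particular environment, but even then you have no guarantees for other compilers or settings. Nor can you assume that exec(fooInline) inlines fooInline: the inline keyword only allows the function to be defined in several translation units, and whether a call is actually inlined is the optimizer's decision.
The best guess is that the compiler has the easiest time inlining when the target of the call is part of the type of F, as it is for a function object or a lambda (named or temporary makes no difference). When you pass a function name such as fooInline or fooSrc, F is deduced as the pointer type int(*)(int), so the optimizer must first propagate the pointer value into exec; fooSrc can in practice only be inlined with link-time optimization, and std::function adds an indirect call through type erasure. Still, a decent compiler might inline in most of the cases you listed.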
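A minimal sketch of why the options differ, assuming C++17. It reuses the question's fooInline, FooOp and fooLambda; deduced is a hypothetical, declaration-only helper used to ask what F becomes, and exec returns long long (an assumption) so the sum does not overflow. A function name decays to a pointer, while a function object or lambda keeps its own class type, so its operator() is known statically.

```cpp
#include <functional>
#include <type_traits>

// Workers from the question (fooSrc omitted: it lives in another translation unit).
inline int fooInline(int a) { return a + 1; }

struct FooOp {
    int operator()(int a) const { return a + 1; }
};

auto fooLambda = [](int a) { return a + 1; };

template <class F>
long long exec(F f) {
    long long s = 0;
    for (int i = 0; i < 1000000; i++) {
        s += f(i);
    }
    return s;
}

// Hypothetical helper: declared only, used in decltype to see what F deduces to.
template <class F>
F deduced(F f);

// A function name decays to a pointer: the call target is a runtime value.
static_assert(std::is_same_v<decltype(deduced(fooInline)), int (*)(int)>);
// Function objects and lambdas keep their class type: the target is in the type.
static_assert(std::is_same_v<decltype(deduced(FooOp())), FooOp>);
static_assert(std::is_same_v<decltype(deduced(fooLambda)), decltype(fooLambda)>);
// std::function erases the target's type behind one fixed wrapper type.
static_assert(std::is_same_v<decltype(deduced(std::function<int(int)>(fooInline))),
                             std::function<int(int)>>);
```

All variants compute the same result; only how much the compiler knows at the call site changes.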
Related
I have the situation where one function calls one of several possible functions. This seems like a good place to pass a function as a parameter. In this Quora answer by Zubkov there are three ways to do this.
int g(int x(int)) { return x(1); }
int g(int (*x)(int)) { return x(1); }
int g(int (&x)(int)) { return x(1); }
...
int f(int n) { return n*2; }
g(f); // all three g's above work the same (define only one of them at a time)
When should which method be used? What are their differences? I prefer the simplest approach, so why shouldn't the first way always be used?
For my situation, the function is only called once and I'd like to keep it simple. I have it working with pass by pointer and I just call it with g(myFunc) where myFunc is the function that gets called last.
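A small sketch of how the three declarations relate, assuming C++17. The names g_fn, g_ptr and g_ref are hypothetical, because all three cannot be called g in one program: a parameter declared with function type is adjusted to a pointer, so the first two are literally the same function type, while the third takes a reference that cannot be null.

```cpp
#include <type_traits>

int g_fn(int x(int)) { return x(1); }     // parameter adjusted to int (*)(int)
int g_ptr(int (*x)(int)) { return x(1); } // explicit pointer: nullptr also compiles
int g_ref(int (&x)(int)) { return x(1); } // reference: must name a real function

int f(int n) { return n * 2; }

// After adjustment the first two have exactly the same type, which is why
// defining both under the name g is a redefinition error.
static_assert(std::is_same_v<decltype(g_fn), decltype(g_ptr)>);
// The reference version is a genuinely different signature.
static_assert(!std::is_same_v<decltype(g_fn), decltype(g_ref)>);
```

So the first form is just another spelling of the pointer form; the reference form mainly differs in rejecting nullptr at compile time.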
Expanding on L.F.'s comment, it's often better to eschew function pointers entirely, and work in terms of invocable objects (things which define operator()). All of the following allow you to do that:
#include <functional>   // std::function, std::invoke
#include <type_traits>  // std::enable_if_t, std::is_invocable_r_v
// Three alternatives: define only one g at a time.
// (1) unrestricted template parameter, like <algorithm> uses
template<typename Func>
int g(Func x) { return x(1); }
// (2) restricted template parameter to produce possibly better errors (C++17)
template<
    typename Func,
    typename = std::enable_if_t<std::is_invocable_r_v<int, Func, int>>
>
int g(Func x) { return std::invoke(x, 1); }
// (3) template-less, trading a reduction in code size for an indirect call and possible heap use
int g(std::function<int(int)> x) { return x(1); }
Importantly, all of these can be used on lambda functions with captures, unlike any of your options:
int y = 2;
int ret = g([y](int v) {
return y + v;
});
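A compilable sketch of the three options side by side, assuming C++17. They are renamed g1, g2 and g3 here (hypothetical names) so that all three can coexist and be called unambiguously; each accepts both a plain function and a capturing lambda.

```cpp
#include <functional>
#include <type_traits>

// (1) unrestricted template parameter
template <typename Func>
int g1(Func x) { return x(1); }

// (2) constrained: only participates if Func is callable as int(int)
template <typename Func,
          typename = std::enable_if_t<std::is_invocable_r_v<int, Func, int>>>
int g2(Func x) { return std::invoke(x, 1); }

// (3) type-erased: one non-template function, indirect call at run time
int g3(std::function<int(int)> x) { return x(1); }

int twice(int n) { return n * 2; }
```

With (2), passing something that is not callable as int(int) fails at the call of g2 rather than deep inside its body.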
I have a member function of a class that I would like to call iteratively inside a loop, and while the loop is fixed, I want to be able to provide different functions (from the given object). To approach this, I created a templated struct MyWrapper that takes the object whose function I want to call, the function itself, and the data for which to evaluate the function. (In that sense, the member function will always have the same signature.)
What I found, though, was that using a member function pointer incurs a huge performance cost, even though I know at compile time which function I want to call. So I was experimenting to try and fix this, and (while I'm still unclear why the first situation happens) I've run into another interesting behaviour.
In the following situation, every call to the wrapper function MyWrapper::eval actually copies my whole Grid object into the parameter of the wrapped function f, even though the direct call to MyEquation::eval does not seem to copy it every time (presumably because of optimization).
template<typename T>
double neighbour_average(T *v, int n)
{
return v[-n] + v[n] - 2 * v[0];
}
template<typename T>
struct MyEquation
{
T constant;
int n;
T eval(Grid<T, 2> v, int i)
{
return rand() / RAND_MAX + neighbour_average(v.values + i, n) + constant;
}
};
template<typename T, typename R, typename A>
struct MyWrapper
{
MyWrapper(T &t, R(T::*f)(A, int), A a) : t{ t }, f{ f }, a{ a } {}
auto eval(int i)
{
return (t.*f)(a, i);
}
protected:
A a;
T &t;
R(T::*f)(A, int);
};
int main(int argc, char *argv[])
{
srand((unsigned int)time(NULL));
for (iter_type i = 0; i < config().len_; ++i)
{
op.values[i] = rand() / RAND_MAX;
}
srand((unsigned int)time(NULL));
double constant = rand() / RAND_MAX;
int n = 2;
int test_len = 100'000;
int test_run = 100'000'000;
Grid<double, 2> arr(100, 1000);
MyEquation<double> eq{ constant, n };
MyWrapper weq(eq, &MyEquation<double>::eval, arr); // I'm wrapping what I want to do
{
// Time t0("wrapper thing");
for (int i = 0; i < test_run; ++i)
{
arr.values[n + i % (test_len - n)] += weq.eval(n + i % (test_len - n)); // a call to the wrapping class to evaluate
}
}
{
// Time t0("regular thing");
for (int i = 0; i < test_run; ++i)
{
arr.values[n + i % (test_len - n)] += rand() / RAND_MAX + neighbour_average(arr.values + n + i % (test_len - n), n) + constant; // a usage of the neighbour function without the wrapping call
}
}
{
// Time t0("function thing");
for (int i = 0; i < test_run; ++i)
{
arr.values[n + i % (test_len - n)] += eq.eval(arr, n + i % (test_len - n)); // raw evaluation of my equation
}
}
}
Some context:
Grid is just a glorified dynamic array Grid::values with a few helper functions.
I've retained some of the (seemingly unnecessary) templates to my function and object, because it closely parallels how my code is actually set up.
The Time class will give me the duration of the object lifetime, so it's a quick and dirty way of measuring certain blocks of code.
So anyways...
If the code above is changed so that the signature of the function taken by MyWrapper is R(T::*f)(A&, int), then the execution time of MyWrapper::eval is almost identical to the other calls (which is what I want anyway).
Why doesn't the compiler (MSVC 2017) apply the same optimizations to the call weq.eval(n) (and consequently (t.*f)(a, n)) as to the direct evaluation, when the signature and the function are known at compile time?
A function parameter is its own variable, which gets initialized from a function call argument. So when a function argument in the calling function is an lvalue such as the name of an object previously defined, and the function parameter is an object type, not a reference type, the parameter and the argument are two different objects. If the parameter has a class type, this means a constructor for that type has to be executed (unless the initialization is an aggregate initialization from a {} initializer list).
In other words, every call to
T eval(Grid<T, 2> v, int i);
needs to create a new Grid<T, 2> object called v, whether it's called via function pointer or by the member name eval.
Initializing a reference parameter, on the other hand, usually doesn't create a new object: a const Grid<T, 2>& simply binds to the caller's grid. It appears your eval doesn't need to modify v or the MyEquation, so it would be better to declare that eval as:
T eval(const Grid<T, 2> &v, int i) const;
This would mean the member function pointer in MyWrapper needs to be R (T::*f)(const A&, int) const.
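To see the copies directly, here is a self-contained sketch with a hypothetical CountingGrid stand-in for Grid (the real Grid is not shown in the question) that counts its copy constructions. Calling a by-value member through a member function pointer copies the argument on every call; the const-reference version copies nothing. Equation and call_n_times are likewise hypothetical names; C++17 is assumed for the inline static member.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Grid: counts how often it is copied.
struct CountingGrid {
    std::vector<double> values;
    static inline int copies = 0;
    explicit CountingGrid(std::size_t n) : values(n, 1.0) {}
    CountingGrid(const CountingGrid& other) : values(other.values) { ++copies; }
};

struct Equation {
    // By value: every call constructs a fresh CountingGrid.
    double eval_by_value(CountingGrid v, int i) { return v.values[i]; }
    // By const reference: the caller's grid is used directly.
    double eval_by_ref(const CountingGrid& v, int i) const { return v.values[i]; }
};

// Calls a member function through a pointer, like MyWrapper::eval does.
template <typename T, typename PMF, typename A>
double call_n_times(T& t, PMF f, const A& a, int n) {
    double sum = 0;
    for (int i = 0; i < n; ++i) sum += (t.*f)(a, 0);
    return sum;
}
```

Because the copy constructor has a visible side effect, the compiler may not drop these copies, which mirrors the cost seen with the real Grid.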
But there is another change you might want to make, especially since MyWrapper is already a template: just make the function used a generic type, so that it can hold non-member function pointers, wrappers to member function pointers with any signature, lambdas, or any other class type with an operator() member.
#include <utility>
template<typename F, typename A>
struct MyWrapper
{
    // Initializers listed in declaration order (a, then f).
    MyWrapper(F f, A a) : a{ std::move(a) }, f{ std::move(f) } {}
    auto eval(int i)
    {
        return f(a, i);
    }
protected:
    A a;
    F f;
};
Then two ways to create your MyWrapper weq (relying on C++17 class template argument deduction) are:
MyWrapper weq([&eq](const auto &arr, int i) {
    return eq.eval(arr, i);
}, arr);
or (requires #include <functional>):
using namespace std::placeholders;
MyWrapper weq(
    std::bind(&MyEquation<double>::eval, &eq, _1, _2), // the object is bound as the first argument
    arr);
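Putting the pieces together, a self-contained sketch assuming C++17. MiniGrid is a hypothetical stand-in for Grid<T, 2>, eval already takes its grid by const reference as suggested, and the rand() term is dropped so the result is deterministic. Both construction styles are exercised in the usage below; std::bind needs the object (&eq) as its first bound argument, because a pointer to member function must be called on an instance.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for Grid<T, 2>.
template <typename T>
struct MiniGrid {
    std::vector<T> values;
};

template <typename T>
struct MyEquation {
    T constant;
    int n;
    // Same stencil as neighbour_average, taken by const reference.
    T eval(const MiniGrid<T>& v, int i) const {
        return v.values[i - n] + v.values[i + n] - 2 * v.values[i] + constant;
    }
};

template <typename F, typename A>
struct MyWrapper {
    MyWrapper(F f, A a) : a{std::move(a)}, f{std::move(f)} {}
    auto eval(int i) { return f(a, i); }
protected:
    A a;
    F f;
};
```

At i = 2 with values {0, 1, 4, 9, 16}, n = 1 and constant 0.5, both wrappers compute 1 + 9 - 8 + 0.5 = 2.5.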
Performance analysis question: Is there a way to execute a function in context of a class, or a method of a class?
I would like to analyze the performance of a specific segment of logic. What I envision is something like this
(Disclaimer: rough example just to illustrate a point. Will not compile).
const int DEBUG_LEVEL = 7;
class PerfWrapper {
public:
    PerfWrapper(int f) {} // Constructor: take function as argument
    void invoke() {} // Invoke the function passed as argument
    double execution_time() {
        begin = std::chrono::high_resolution_clock::now();
        // etc..
    }
    double memory_usage() {}
private:
};
int foo() {
    int sum{0};
    for (int i=0; i<1000; ++i)
        for (int j=0; j<MAX; ++j)
            sum += i * j;
    return sum;
}
int main() {
    if (DEBUG_LEVEL == 7) {
        PerfWrapper p(foo); // Create an instance, passing foo as an argument
        // below foo() is called in context of the performance wrapper
        int myTime = p.invoke().execution_time(); // Invokes foo in context of p and tracks execution time
        int myMemory = p.invoke().memory_usage(); // Same, except gathering memory usage info.
        // etc..
    }
}
Here we have class PerfWrapper. When instantiated, resulting methods on the object have the ability to accept a function as an argument, and execute a function in context of the class. It will take perf measurements, results of which are accessible through the interface.
Note the "DEBUG_LEVEL" setting. If performance profiling is needed then simply set the DEBUG_LEVEL to 7.
Have you seen anything like this? If not, how is the analysis best accomplished? I know that it seems a bit out there, but hopefully not so much. Thx, Keith :^)
Maybe you are looking for function pointers, which could be used as shown in the following simplified code:
typedef int(*aFooFunctionType)(void);
class PerformanceTest {
public:
PerformanceTest(aFooFunctionType fooFuncPtr) { m_fooFuncPtr = fooFuncPtr; }
void test() {
int x = m_fooFuncPtr();
// do something with x (or not...)
};
private:
aFooFunctionType m_fooFuncPtr;
};
int fooFunc(void) {
return 100;
}
int main(int argc, char* argv[]) {
PerformanceTest pTest(fooFunc);
pTest.test();
return 0;
}
You can wrap almost anything in a std::function. I would suggest using a std::function in PerfWrapper to get the execution time. I don't have anything for measuring memory usage, though.
Example code:
#include <iostream>
#include <functional>
#include <chrono>
class PerfWrapper
{
public:
PerfWrapper(std::function<void()> f) : f_(f), execution_time_{} {}
void invoke()
{
auto begin = std::chrono::high_resolution_clock::now();
f_();
auto end = std::chrono::high_resolution_clock::now();
execution_time_ = end-begin;
}
double execution_time()
{
return execution_time_.count();
}
std::function<void()> f_;
std::chrono::duration<double> execution_time_;
};
unsigned long foo()
{
unsigned long sum{0};
for (int i=0; i<10000; ++i)
for (int j=0; j<2000; ++j)
sum += i * j;
return sum;
}
int main()
{
PerfWrapper pr([](){std::cout << foo() << std::endl;});
pr.invoke();
std::cout << "Execution time: " << pr.execution_time() << std::endl;
}
Output on my setup:
99940005000000
Execution time: 0.0454077
Consider using a free function template, with a reference parameter to extract the performance data. This example will:
Accept function pointers and functors (including std::function, which means it can work with methods, too).
Return the same value the proxied function call returns, so you can use both the measurement data and the call result.
#include <utility>  // std::forward
struct measurement {
    double execution_time;
    double memory_usage;
};
// Note: as written, fn must return a value (not void).
template <typename FN, typename... T>
inline auto measure(FN fn, measurement& m, T&&... args) -> decltype(fn(std::forward<T>(args)...))
{
    auto&& result = fn(std::forward<T>(args)...);
    m.execution_time = 0; // example placeholder
    m.memory_usage = 0;
    return result;
}
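One way to fill in the execution_time placeholder is to time the call with std::chrono::steady_clock, a monotonic clock that suits interval measurement better than high_resolution_clock. This sketch assumes C++17 and, like the original, a non-void return type; memory usage is still left at 0 because standard C++ has no portable way to measure it.

```cpp
#include <chrono>
#include <utility>

struct measurement {
    double execution_time; // seconds
    double memory_usage;   // not measured here; left at 0
};

template <typename FN, typename... T>
auto measure(FN fn, measurement& m, T&&... args)
    -> decltype(fn(std::forward<T>(args)...))
{
    const auto begin = std::chrono::steady_clock::now();
    // decltype(auto) keeps value or lvalue-reference results as they are.
    decltype(auto) result = fn(std::forward<T>(args)...);
    const auto end = std::chrono::steady_clock::now();
    m.execution_time = std::chrono::duration<double>(end - begin).count();
    m.memory_usage = 0;
    return result;
}
```

For example, `int r = measure([](int a, int b) { return a + b; }, m, 2, 3);` yields 5 and records the elapsed seconds in m.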
Say I have a binary search function which initializes and uses a lambda:
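#include <algorithm>
#include <vector>
bool custom_binary_search(std::vector<int> const& search_me, int value)
{
    auto comp = [](int const a, int const b)
    {
        return a < b;
    };
    // binary_search takes the value to look for before the comparator
    return std::binary_search(search_me.begin(), search_me.end(), value, comp);
}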
Without pointing out that this is completely redundant and just focusing on the lambda; is it expensive to be declaring and defining that lambda object every time? Should it be static? What would it mean for a lambda to be static?
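For reference, a compilable version of the snippet, assuming the missing search value is added as a parameter (std::binary_search takes the value before the comparator). A captureless lambda like comp holds no state, so creating it on each call typically costs nothing at run time; such a lambda also converts implicitly to a plain function pointer (comp_ptr is a hypothetical name).

```cpp
#include <algorithm>
#include <vector>

bool custom_binary_search(const std::vector<int>& search_me, int value)
{
    // Captureless lambda: an object of an empty closure type.
    auto comp = [](const int a, const int b) { return a < b; };
    return std::binary_search(search_me.begin(), search_me.end(), value, comp);
}

// A captureless lambda converts implicitly to a function pointer.
bool (*comp_ptr)(int, int) = [](int a, int b) { return a < b; };
```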
The variable comp, whose type is some anonymous closure class, can be made static pretty much like any other local variable, i.e. it is then the same variable, at the same memory address, every time the function runs.
However, beware of capturing lambdas (closures): a static closure object is also initialized only once, so its captures are frozen at the first call. This leads to subtle bugs (capture by value) or undefined behavior through dangling references (capture by reference):
bool custom_binary_search(std::vector<int> const& search_me, int search_value, int max)
{
    // Captures the `max` of the first call only; later calls reuse it.
    static auto comp_only_initialized_the_first_time = [max](int const a, int const b)
    {
        return a < b && b < max;
    };
    auto max2 = max;
    // Refers to the first call's local max2, which is destroyed when that call
    // returns: using it on later calls is undefined behavior.
    static auto comp_error_after_first_time = [&max2](int const a, int const b)
    {
        return a < b && b < max2;
    };
    bool incorrectAfterFirstCall = std::binary_search(std::begin(search_me), std::end(search_me), search_value, comp_only_initialized_the_first_time);
    bool errorAfterFirstCall = std::binary_search(std::begin(search_me), std::end(search_me), search_value, comp_error_after_first_time);
    return false; // does it really matter at this point ?
}
Note that the 'max' parameter is just there to introduce a variable that you might want to capture in your comparator, and the functionality this "custom_binary_search" implements is probably not very useful.
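A minimal demonstration of the capture-by-value pitfall, using hypothetical functions scale_by and scale_by_fixed: the static closure captures factor on the first call only, so later calls silently reuse the old value, while the non-static version captures afresh each time.

```cpp
int scale_by(int x, int factor)
{
    // Initialized once, on the first call: captures that call's factor.
    static auto scaled = [factor](int v) { return v * factor; };
    return scaled(x);
}

int scale_by_fixed(int x, int factor)
{
    // Non-static: a fresh closure (and a fresh capture) on every call.
    auto scaled = [factor](int v) { return v * factor; };
    return scaled(x);
}
```

Calling scale_by(2, 3) and then scale_by(2, 10) returns 6 both times; scale_by_fixed returns 6 and then 20.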
The following code compiles and runs OK in Visual Studio 2013:
bool const test(int & value)
{
static auto comp = [&](int const a, int const b)
{
return a < (b + value);
};
return comp(2,1);
}
And later:
int q = 1;
cout << test(q); // prints 0 //OK
q++;
cout << test(q); // prints 1 //OK
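The compiler transforms every lambda expression into an unnamed class type with an operator(), and this is done at compile time. The definition inside the test function just initializes the comp variable with an object of that class; it is not a pointer to a C function (although a lambda without captures can be converted to one). For a captureless lambda that object is empty, so creating it costs essentially nothing.
Closures work the same way, but their captured state lives in the closure object. Captures by value are copied when the closure is created, and captures by reference are only valid while the referenced variables are alive; used after that, they dangle and can cause memory corruption or other undefined behavior.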
Defining comp static would only improve the performance insignificantly or not at all.
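The example above works only because every call passes the same variable q. A small sketch with a hypothetical compare_static (the same body as test) shows that the static closure stays bound to whichever object the first call referred to:

```cpp
bool compare_static(int& value)
{
    // [&] captures `value` by reference once; it keeps referring to the object
    // passed on the first call, even when later calls pass a different one.
    static auto comp = [&](int const a, int const b) { return a < (b + value); };
    return comp(2, 1);
}
```

With int q = 1, r = 5: compare_static(q) and then compare_static(r) both return false, because the second call still reads q; after q = 5, compare_static(r) returns true. If q went out of scope before a later call, the closure would dangle.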
Hope this helps:
Razvan.
I recently discovered that in C++ you can overload the "function call" operator, in a strange way in which you have to write two pairs of parentheses to do so:
class A {
int n;
public:
void operator ()() const;
};
And then use it this way:
A a;
a();
When is this useful?
This can be used to create "functors", objects that act like functions:
class Multiplier {
public:
Multiplier(int m): multiplier(m) {}
int operator()(int x) { return multiplier * x; }
private:
int multiplier;
};
Multiplier m(5);
cout << m(4) << endl;
The above prints 20. The Wikipedia article on function objects gives more substantial examples.
There's little more than a syntactic gain in using operator() until you start using templates. But when using templates you can treat real functions and functors (classes acting as functions) the same way.
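#include <cmath>
struct scaled_sine // struct: the constructor must be public
{
    explicit scaled_sine( float _m ) : m(_m) {}
    float operator()(float x) const { return std::sin(m*x); }
    float m;
};
template<typename T>
float evaluate_at( float x, const T& fn )
{
    return fn(x);
}
// std::cos is overloaded, so select the double overload explicitly
evaluate_at( 1.0f, static_cast<double (*)(double)>(std::cos) );
evaluate_at( 1.0f, scaled_sine(3.0f) );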
An algorithm implemented using a template doesn't care whether the thing being called is a function or a functor; it only cares about the call syntax. That holds for the standard algorithms (e.g. for_each()) as well as your own. And functors can have state, and do all kinds of things when they are called, whereas functions can only keep state in static local variables or global variables.
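To illustrate the point about state, a sketch with a hypothetical SumAndCount functor: std::for_each takes the function object by value and returns it, so the state accumulated during the loop can be read afterwards, something a plain function could only do through static or global variables.

```cpp
#include <algorithm>
#include <vector>

struct SumAndCount {
    int sum = 0;
    int count = 0;
    void operator()(int x) { sum += x; ++count; }
};

SumAndCount summarize(const std::vector<int>& v)
{
    // for_each returns the functor after it has visited every element.
    return std::for_each(v.begin(), v.end(), SumAndCount{});
}
```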
If you're making a class that encapsulates a function pointer, this might make the usage more obvious.
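The compiler can also inline the functor's operator() at the call site, because the function to call is part of the functor's type. A call through a function pointer is much harder to inline: the target is a runtime value, so the optimizer can inline it only if it can prove which function the pointer holds. This is why using the function call operator can significantly improve performance, for example with the standard C++ library algorithms.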
For example, for implementing generators:
// generator
struct Generator {
int c = 0;
virtual int operator()() {
return c++;
}
};
int sum(int n) {
Generator g;
int res = 0;
for( int i = 0; i < n; i++ ) {
res += g();
}
return res;
}
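The same kind of generator plugs straight into the standard algorithms. A sketch using std::generate, with a non-virtual Counter and a helper first_n (both hypothetical names), since nothing here needs dynamic dispatch:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Counter {
    int c = 0;
    int operator()() { return c++; }
};

std::vector<int> first_n(int n)
{
    std::vector<int> v(static_cast<std::size_t>(n));
    // std::generate calls its own copy of Counter once per element, first to last.
    std::generate(v.begin(), v.end(), Counter{});
    return v;
}
```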
I see potential for yet another exotic use:
Suppose you have an object of unknown type and have to declare another variable of the same type, like this:
auto c=decltype(a*b)(123);
When such a pattern is used extensively, decltype becomes very annoying.
This case can occur when using some smart type system that automatically deduces the result type of functions and operators from the types of their arguments.
Now, suppose each specialization of each type in that type system is equipped with a
magic definition of operator() like this:
template<????> class Num<???>{
//specific implementation here
constexpr auto operator()(auto...p){return Num(p...);}
};
Then decltype() is no longer needed, and you can simply write:
auto c=(a*b)(123);
This works because operator() of the object redirects to the constructor of its own type.
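A compilable sketch of the idea, assuming C++17 (so a member template replaces the C++20 auto... parameter) and a hypothetical Num<T> wrapper whose operator* chooses the result type from its operands:

```cpp
#include <type_traits>

template <typename T>
struct Num {
    T value;
    constexpr explicit Num(T v) : value(v) {}

    // Calling an existing Num constructs a new Num of the same type,
    // so (a * b)(123) needs no decltype.
    template <typename... Args>
    constexpr Num<T> operator()(Args... args) const { return Num<T>(args...); }
};

// The result type follows the usual arithmetic conversions of the operands.
template <typename T, typename U>
constexpr auto operator*(Num<T> a, Num<U> b)
{
    using R = decltype(a.value * b.value);
    return Num<R>(a.value * b.value);
}

static_assert(std::is_same_v<decltype(Num<int>(2) * Num<double>(1.5)), Num<double>>);
```

With Num<int> a(2) and Num<double> b(1.5), `auto c = (a * b)(123);` gives c the type Num<double> and the value 123.0.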