C++ metaprogramming with templates versus inlining

Is it worth writing code like the following to copy array elements?
#include <iostream>
using namespace std;

// Copies x[START .. START+N-1] from y, one element per recursion level.
template<int START, int N>
struct Repeat {
    static void copy(int* x, int* y) {
        x[START + N - 1] = y[START + N - 1];
        Repeat<START, N - 1>::copy(x, y);
    }
};

// Base case: nothing left to copy (all N elements were handled above).
template<int START>
struct Repeat<START, 0> {
    static void copy(int*, int*) {}
};

int main() {
    int a[10];
    int b[10];
    // initialize
    for (int i = 0; i <= 9; i++) {
        b[i] = 113 + i;
        a[i] = 0;
    }
    // do the copy (starting at 2, 4 elements)
    Repeat<2, 4>::copy(a, b);
    // show
    for (int i = 0; i <= 9; i++) {
        cout << a[i] << endl;
    }
}
Or is it better to use an inline function?
One drawback is that you can't use runtime variables as template arguments.

That's not better. First of all, it's not really compile time, since you make function calls here. If you are lucky, the compiler will inline these calls and end up with code equivalent to a loop you could have written yourself with much less code (or just by using std::copy).
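For comparison, here is a minimal sketch of the same copy written with std::copy. The helper name copy_range is hypothetical, and start and count are ordinary runtime arguments rather than template parameters:

```cpp
#include <algorithm>
#include <cstddef>

// Copies n elements of src, starting at index start, into the same positions of dst.
// start and n are plain runtime values; no template parameters are needed.
void copy_range(int* dst, const int* src, std::size_t start, std::size_t n) {
    std::copy(src + start, src + start + n, dst + start);
}
```

Calling copy_range(a, b, 2, 4) should leave a in the same state as Repeat<2,4>::copy(a, b) in the question.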

General rule: use templates for things known at compile time, and use inlining for things known only at run time. If you don't know the size of your array at compile time, then don't use templates for it.

You shouldn't do this. Templates were invented for a different purpose, not for calculations, although you can use them that way. First, you can't use variables; second, templates will produce a vast number of unused structures at compilation; and third, you can simply use for (int i = start; i <= end; i++) a[i] = b[i];

That's better in one respect: you control and enforce the loop unrolling yourself.
A plain loop may or may not be unrolled by the compiler, depending on the optimization options.
Pointing out that std::copy is close to optimal for copying is not a good general answer, because this kind of unrolling can be applied whatever computation is done inside the loop.
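If you do want to force unrolling for an arbitrary operation, a more compact sketch than the recursive Repeat struct, assuming C++17, uses std::index_sequence and a fold expression. The names unroll and unroll_impl are hypothetical. The expansion into N separate calls happens at compile time; whether each call is then inlined is still up to the optimizer:

```cpp
#include <cstddef>
#include <utility>

// Expands to f(START + 0), f(START + 1), ..., f(START + N - 1) using a C++17
// fold expression over the comma operator, so no runtime loop counter exists.
template <std::size_t START, typename F, std::size_t... Is>
void unroll_impl(F& f, std::index_sequence<Is...>) {
    (static_cast<void>(f(START + Is)), ...);
}

// Calls f once for every index in [START, START + N).
template <std::size_t START, std::size_t N, typename F>
void unroll(F f) {
    unroll_impl<START>(f, std::make_index_sequence<N>{});
}
```

For example, unroll<2, 4>([&](std::size_t i) { a[i] = b[i]; }); performs the same copy as Repeat<2,4>::copy(a, b), and the lambda body can be any computation, not just a copy.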

Related

C++ Lambda Overhead

I have an O(N^4) scaling algorithm of the form
...
...
...
for (unsigned i = 0; i < nI; ++i) {
    for (unsigned j = 0; j < nJ; ++j) {
        for (unsigned k = 0; k < nK; ++k) {
            for (unsigned l = 0; l < nL; ++l) {
                *calculate value*
                *do something with value*
            }
        }
    }
}
I need this code in a couple of places, so I put it into a looper function as part of a class. This looper function is templated so that it can accept a lambda function which takes care of *do something with value*.
Some tests have shown that this is not optimal performance-wise, but I have no idea how to avoid explicitly writing out this code every time I need it. Do you see a way of doing this?

Using a templated function to call the lambda should generate code that modern optimizing compilers can optimize well. This is actually the case for the latest versions of GCC, Clang and MSVC. You can check this on Godbolt with this code:
extern int unknown1();
extern int unknown2(int);

template <typename LambdaType>
int compute(LambdaType lambda, int nI, int nJ, int nK, int nL)
{
    int sum = 0;
    // int loop counters match the int bounds (no signed/unsigned comparison)
    for (int i = 0; i < nI; ++i) {
        for (int j = 0; j < nJ; ++j) {
            for (int k = 0; k < nK; ++k) {
                for (int l = 0; l < nL; ++l) {
                    sum += lambda(i, j, k, l);
                }
            }
        }
    }
    return sum;
}

int caller(int nI, int nJ, int nK, int nL)
{
    int context = unknown1();
    // Captures context by reference; passed to compute with its own lambda type.
    auto lambda = [&](int i, int j, int k, int l) -> int {
        return unknown2(context + i + j + k + l);
    };
    return compute(lambda, nI, nJ, nK, nL);
}
With optimization flags enabled, GCC, Clang and MSVC are capable of generating an efficient implementation of compute that elides the lambda calls in the 4 nested loops (unknown2 is called directly in the generated assembly). This is the case even if compute is not inlined. Note that the fact that the lambda captures its context does not actually prevent optimizations (although it makes this case harder for the compiler to optimize).
Note that it is important to pass the lambda with its own type and not through a wrapper like std::function, as such a wrapper will likely prevent optimizations (or at least make them much more difficult to apply), resulting in actual indirect function calls. Indeed, the concrete type helps the compiler inline the function and then apply further optimizations like vectorization and constant propagation.
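A minimal sketch of the two styles, assuming a simple one-dimensional loop; sum_template and sum_function are hypothetical names. Both compute the same result; the difference is only in how much the optimizer can see through the call:

```cpp
#include <functional>

// Template version: the lambda's own (unique) type is a template argument,
// so its body is visible at the call site and can be inlined.
template <typename F>
int sum_template(F f, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += f(i);
    return sum;
}

// Type-erased version: every call goes through std::function's internal
// indirection, which is much harder for the optimizer to see through.
int sum_function(const std::function<int(int)>& f, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += f(i);
    return sum;
}
```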
Note that the code of the lambda should be kept small. Otherwise, it may not be inlined, resulting in a function call. On modern processors, a direct function call is not that slow if the function body is fairly big, thanks to good branch prediction units and relatively fast, large caches. However, losing the further optimizations that inlining the lambda makes possible can be very costly. One way to mitigate this cost is to move at least one loop into the lambda (see Data-oriented design for more information). Another solution is to use OpenMP to help the compiler vectorize the lambda thanks to #pragma omp declare simd [...] directives (assuming your compiler supports them). You can also play with your compiler's inlining command-line parameters to tell it to actually inline the lambda in such a case.
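A sketch of moving the innermost loop into the lambda, based on the compute function above; compute_rows and the row-lambda signature (receiving nL and looping itself) are assumptions for illustration. If the lambda is not inlined, the call overhead is paid once per (i, j, k) row instead of once per element, and the l loop can still be optimized as a unit inside the lambda body:

```cpp
// Hypothetical variant of compute(): the caller's lambda handles a whole row of
// the innermost dimension (all l in [0, nL)) and returns that row's partial sum.
template <typename RowLambda>
int compute_rows(RowLambda row, int nI, int nJ, int nK, int nL)
{
    int sum = 0;
    for (int i = 0; i < nI; ++i)
        for (int j = 0; j < nJ; ++j)
            for (int k = 0; k < nK; ++k)
                sum += row(i, j, k, nL);  // one call per row, not per element
    return sum;
}
```

With the earlier caller, the row lambda would look like [&](int i, int j, int k, int nL) { int s = 0; for (int l = 0; l < nL; ++l) s += unknown2(context + i + j + k + l); return s; }.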

Is it possible in C++ to use the same code with and without compile time constants?

Say you have a function like:
double do_it(int m)
{
    double result = 0;
    for (int i = 0; i < m; i++)
        result += i;
    return result;
}
If you know m at compile time you can do:
template<size_t t_m>
double do_it()
{
    double result = 0;
    for (int i = 0; i < t_m; i++)
        result += i;
    return result;
}
This gives a possibility for things like loop unrolling when optimizing. But sometimes you might know some cases at compile time and some at run time. Or perhaps you have defaults which a user could change, but it would be nice to optimize the default case.
I'm wondering if there is any way to provide both versions without basically duplicating the code or using a macro?
Note that the above is a toy example to illustrate the point.

In terms of the language specification, there's no general way to have a function that works in the way you desire. But that doesn't mean compilers can't do it for you.
"This gives a possibility for things like loop unrolling when optimizing."
You say this as though the compiler cannot unroll the loop otherwise.
The compiler can unroll the template loop because of the combination of the following:
The compiler has the definition of the function. In this case, the function definition is provided (it's a template function, so its definition has to be provided).
The compiler has the compile-time value of the loop counter. In this case, through the template parameter.
But neither of these factors explicitly requires a template. If the compiler has the definition of a function and can determine the compile-time value of the loop counter, then it has 100% of the information needed to unroll that loop.
How it gets this information is irrelevant. It could be an inline function (you have to provide the definition) which you call given a compile-time constant as an argument. It could be a constexpr function (again, you have to provide the definition) which you call given a compile-time constant as an argument.
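A sketch of the inline-function route, reusing the toy do_it from the question; the caller do_it_10 is hypothetical, and whether the loop is actually unrolled or folded depends on the compiler and the optimization level:

```cpp
// An ordinary inline function: m is a plain runtime parameter, no template.
inline double do_it(int m)
{
    double result = 0;
    for (int i = 0; i < m; i++)
        result += i;
    return result;
}

// Hypothetical caller passing a compile-time constant. Once do_it is inlined
// here, an optimizing compiler knows m == 10 and may unroll or even fold the
// loop; the language itself does not require it to.
double do_it_10() { return do_it(10); }
```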
This is a matter of quality of implementation, not of language. If compile-time parameters are to ever be a thing, it would be to support things you cannot do otherwise, not to support optimization (or at least, not compiler optimizations). For example, you can't have a function which returns a std::array whose length is specified by a regular function parameter rather than a template parameter.
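To illustrate the std::array point, a sketch with a hypothetical make_iota function: the length works as a template parameter, but not as a regular function parameter:

```cpp
#include <array>
#include <cstddef>

// The length N is a template parameter, so it can be part of the return type.
template <std::size_t N>
std::array<int, N> make_iota()
{
    std::array<int, N> a{};
    for (std::size_t i = 0; i < N; ++i) a[i] = static_cast<int>(i);
    return a;
}

// Not possible: n is a function parameter, not a constant expression,
// so it cannot appear in the return type.
// std::array<int, n> make_iota(std::size_t n);
```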

Yes you can, with std::integral_constant. Specifically, the following function will work with an int, as well as specializations of std::integral_constant.
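#include <type_traits> // std::integral_constant

template<class Num>
constexpr double do_it(Num m_unconverted) {
    double result = 0.;
    // works for int and for std::integral_constant<int, N> (via its conversion operator)
    int m_converted = static_cast<int>(m_unconverted);
    for (int i = 0; i < m_converted; i++) { result += i; }
    return result;
}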
If you want to call do_it with a compile-time constant, then you can use
constexpr double result = do_it(std::integral_constant<int, 5>{});
Otherwise, it's just
double result = do_it(some_number);

Use constexpr (this needs at least C++14, which allows loops in constexpr functions):
#include <iostream>

constexpr double do_it(int m)
{
    double result = 0;
    for (int i = 0; i < m; i++)
        result += i;
    return result;
}

constexpr double it_result = do_it(10) + 1; // the whole initializer, including `+ 1`, is evaluated at compile time

int main() {
    int x;
    std::cin >> x;
    do_it(x); // runtime
}
If you want to force compile-time evaluation of a constexpr call inside an expression that is not itself required to be a constant expression, you can use the FORCE_CT_EVAL macro from this comment:
#include <utility>
#define FORCE_CT_EVAL(func) [](){ constexpr auto force_ct_eval_value = func; return std::move(force_ct_eval_value); }()
double it_result = FORCE_CT_EVAL(do_it(10)); // compile time

Why can't I initialize non-final instance variables in C++ or make variable-sized arrays?

I have little to no knowledge of C++ and how to use arrays. That being said, what I'm trying to do is create a simple class for chemical elements which automatically decides the number of shells allotted to that element based on just its atomic number. Here's my sample code:
class Element {
int n, i;
int s = 1;
for (int i = 2; i < n; i += 8) {s += 1;}
int shell[s + 1];
public: Element(int n) {this.n = n;}
};
That snippet of code is supposed to create an array called int shell[s + 1], which has room for s shells. I made it s + 1 instead of s so I wouldn't constantly confuse myself by referring to shell #1 as shell[0] and so forth. Thus, shell[0] is unused. Or I could do it the other way around and actually use shell[0], but that's irrelevant. As you can see, int s is automatically set to 1 because all elements contain at least one shell. Then there's a for loop that adds shells based on int n. Finally, I declared the array int shell[s + 1].
Ultimately, I got a multitude of errors. Most of them were nonsensical syntax errors, but apparently in C++ you're not allowed to initialize non-final instance variables. That doesn't make much sense to me, because I really need int s to begin at 1 for the for loop. It also tells me that I can't make variable-sized arrays, either. What should I do?

You could use std::vector<int> and initialize it in the constructor:
#include <vector>

class Element {
    int s, n;
    std::vector<int> shell;
public:
    Element(int n) : s(1), n(n) {
        for (int i = 2; i < n; i += 8) { s += 1; }
        shell.resize(s + 1); // shell[0] unused, shells 1..s
    }
};
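A variation on the same idea, assuming C++11 or later: computing the shell count in a static helper lets the vector be sized directly in the member initializer list. The names count_shells, shell_count and atomic_number are hypothetical, and the shell formula is copied from the question, not checked against chemistry. (Since C++11 a default member initializer such as int s = 1; is also allowed, but statements like a for loop still cannot appear directly in a class body.)

```cpp
#include <vector>

class Element {
public:
    explicit Element(int n) : n_(n), shell_(count_shells(n) + 1) {}

    int atomic_number() const { return n_; }
    // Number of usable shells (shell_[0] is left unused, as in the question).
    int shell_count() const { return static_cast<int>(shell_.size()) - 1; }

private:
    // Same rule as the question: one shell, plus one more per 8 steps from 2.
    static int count_shells(int n) {
        int s = 1;
        for (int i = 2; i < n; i += 8) { s += 1; }
        return s;
    }

    int n_;                   // declared before shell_, matching init order
    std::vector<int> shell_;  // sized once in the constructor
};
```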

Advantage of function taking a pointer to a collection, to avoid copying on return?

Suppose I have the following C++ function:
// Returns a set containing {1!, 2!, ..., n!}.
set<int> GetFactorials(int n) {
    set<int> ret;
    int curr = 1;
    for (int i = 1; i <= n; i++) {
        curr *= i;
        ret.insert(curr);
    }
    return ret;
}
set<int> fs = GetFactorials(5);
(This is just a dummy example. The key is that the function creates the set itself and returns it.)
One of my friends tells me that instead of writing the function the way I did, I should write it so that the function takes in a pointer to a set, in order to avoid copying the set on return. I'm guessing he meant something like:
void GetFactorials2(int n, set<int>* fs) {
    int curr = 1;
    for (int i = 1; i <= n; i++) {
        curr *= i;
        fs->insert(curr);
    }
}
set<int> fs;
GetFactorials2(5, &fs);
My question: is this second way really a big advantage? It seems pretty weird to me. I'm new to C++, and don't know much about compilers, but I would assume that through some compiler magic, my original function wouldn't be that much more expensive. (And I'd get to avoid having to initialize the set myself.) Am I wrong? What should I know about pointers and copying-on-return to understand this?

No, it is generally not advantageous at all. Just about any reasonable compiler these days will apply named return value optimization (see here). This effectively removes any performance penalty from the former example.
If you really want to get into the nitty-gritty, read this article by Dave Abrahams (one of the big contributors to Boost). Long story short, however: just return the value. It's probably faster.
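One way to check this yourself is to return a type that counts its copies. This is a sketch with a hypothetical CountingSet wrapper, assuming C++17 for the inline static member. Since C++11, returning a local variable by name either elides the copy (NRVO) or falls back to a move, so the copy counter should stay at zero:

```cpp
#include <set>
#include <utility>

// Hypothetical wrapper around std::set<int> that counts how often it is copied.
struct CountingSet {
    std::set<int> data;
    static inline int copies = 0;  // C++17 inline static data member

    CountingSet() = default;
    CountingSet(const CountingSet& other) : data(other.data) { ++copies; }
    CountingSet(CountingSet&& other) : data(std::move(other.data)) {}
};

// Returns {1!, 2!, ..., n!} by value, like GetFactorials in the question.
CountingSet MakeFactorials(int n) {
    CountingSet ret;
    int curr = 1;
    for (int i = 1; i <= n; i++) {
        curr *= i;
        ret.data.insert(curr);
    }
    return ret;  // NRVO, or else an implicit move; never the copy constructor
}
```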

Yes, it can be expensive, especially when the set gets bigger. There is no reason not to use pointers or references here. It will save you a lot, and you don't sacrifice much readability.
And why rely on compiler optimizations when you can optimize it yourself? The compiler knows your code but does not always understand your algorithm.
I would do this:
void GetFactorials2(int n, set<int>& fs) {
    //                        ^^ pass by reference
    int curr = 1;
    for (int i = 1; i <= n; i++) {
        curr *= i;
        fs.insert(curr); // a reference uses '.', not '->'
    }
}
and the call stays normal:
set<int> fs;
GetFactorials2(5, fs); // no & needed at the call site

Initializing static global data using function call (at compile time)

I am trying to save compute time by computing sequences of numbers at compile time and storing them as static vectors (but I might settle for computation once at the beginning of runtime for now). A simple (not compiling) example of what I am trying to do would be:
#include <vector>
using namespace std;
static vector<vector<int> > STATIC_THING(4, vector<int>(4));
void Generator(int x, int y, vector<int> *output) {
// Heavy computing goes here
for(int i=0; i < 4; ++i)
(*output)[i] = x * y;
return;
}
static void FillThings() {
for(int x=0; x < 4; ++x)
for(int y=0; y < 4; ++y)
Generator(x, y, &STATIC_THING[x]);
}
FillThings();
int main() {
}
Is there a way other than precomputing and hardcoding my sequences into arrays to get the compiler to do the lifting on this? I feel like there should be a way to at least get this done upon the first #include of the header this will live in, but I have only seen it done with classes. I can use arrays instead of vectors if it will facilitate computation at compile-time.
EDITS:
Although template metaprogramming was suggested, my actual generator algorithm is far too complex to lend itself to this technique.
Using a Lookup Table seems to be my only other option that will allow me to avoid runtime computation; I will fall back on this if performance continues to be an issue in the future.

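Do this (FillThings returns a dummy value only so that it can initialize a static variable; that initialization runs it during program startup, in practice before main):
static int FillThings() {
    for (int x = 0; x < 4; ++x)
        for (int y = 0; y < 4; ++y)
            Generator(x, y, &STATIC_THING[x]);
    return 9087; // arbitrary value, never used
}
static int q = FillThings();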

If you can't initialize from actual literals via a brace initializer, then you could do something like this:
#include <vector>

using my_vector = std::vector<std::vector<int>>;

static my_vector make_static_data()
{
    my_vector result;
    // ... populate ...
    return result;
}

static const my_vector static_data = make_static_data();

Not that easy: std::vector is a dynamic structure. It is not "fillable" at "compile time". It can be filled in at startup by initializing a static variable with the return value of a function or lambda that is invoked to fill up the vector.
That can be one way.
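A minimal sketch of the startup-fill approach with an immediately invoked lambda; generate_row is a hypothetical stand-in for the asker's heavy Generator. The table is built once during dynamic initialization of statics (in practice, before main runs) and can then be declared const. Keep in mind that the initialization order of such statics across different translation units is unspecified:

```cpp
#include <vector>

// Hypothetical stand-in for the heavy Generator: row x holds x * i.
static std::vector<int> generate_row(int x) {
    std::vector<int> row(4);
    for (int i = 0; i < 4; ++i) row[i] = x * i;
    return row;
}

// Filled once by an immediately invoked lambda; afterwards the table is read-only.
static const std::vector<std::vector<int>> STATIC_THING = [] {
    std::vector<std::vector<int>> result;
    for (int x = 0; x < 4; ++x) result.push_back(generate_row(x));
    return result;
}();
```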
But a proper "compile time vector" should look like a template whose "index" is an int given as a parameter, like
template<unsigned idx>
struct THING
{
    static const int value = ....; // put a constant expression here
};
to be used as THING<n>::value.
The "constant expression" can be a function of THING<idx-1>::value, recursively down to a specialized
template<>
struct THING<0U>
{
    static const int value = ....; // base value, also a constant expression
};
That stops the compiler recursion.
There are, however, some limitations: the expression that defines the static member value must be a constant expression (so only integer types, built-in operations, no <cmath>, and only functions declared constexpr), and the value used as idx must itself be a constant (not a variable).
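A concrete instance of this pattern, with a hypothetical sequence chosen purely for illustration (value(0) = 1, value(idx) = 3 * value(idx - 1) + 1):

```cpp
// Recursive case: each value is a constant expression built from the previous one.
template <unsigned idx>
struct THING
{
    static const int value = 3 * THING<idx - 1>::value + 1;
};

// The specialization for 0 stops the recursion and provides the base value.
template <>
struct THING<0U>
{
    static const int value = 1;
};

// Evaluated entirely by the compiler: 1, 4, 13, 40, ...
static_assert(THING<3>::value == 40, "computed at compile time");
```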
