I'm quite confused by the new constexpr keyword in C++11. I would like to know where to use constexpr and where to use template metaprogramming when I write compile-time functions (especially maths functions). For example, take an integer pow function:
// 1:
template <int N> inline double tpow(double x)
{
    return x * tpow<N - 1>(x);
}
template <> inline double tpow<0>(double x)
{
    return 1.0;
}

// 2:
constexpr double cpow(double x, int N)
{
    return (N > 0) ? x * cpow(x, N - 1) : 1.0;
}

// 3:
template <int N> constexpr double tcpow(double x)
{
    return x * tcpow<N - 1>(x);
}
template <> constexpr double tcpow<0>(double x)
{
    return 1.0;
}
Are the 2nd and 3rd functions equivalent?
What is the best solution? Does it produce the same result:
if x is known at compile time
if x is not known at compile time
When should I use constexpr, and when template metaprogramming?
EDIT 1: code modified to include specializations for the templates
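For concreteness, here is how the call sites differ (just a sketch; read_x() stands for any hypothetical source of a run-time value):

double read_x(); // hypothetical run-time input

void demo()
{
    int n = 3;                           // run-time exponent
    constexpr double a = cpow(2.0, 10);  // forced to compile time: constexpr variable
    constexpr double b = tcpow<10>(2.0); // likewise
    double x = read_x();
    double c = tpow<10>(x);              // run-time x, compile-time N
    double d = cpow(x, n);               // only cpow accepts a run-time N
}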
I probably shouldn't be answering a template metaprogramming question this late. But, here I go.
Firstly, constexpr isn't implemented in Visual Studio 2012. If you want to develop for Windows, forget about it. I know, it sucks; I hate Microsoft for not including it.
With that out of the way, there are lots of things you can declare const, but they aren't really "constant" in the sense of "you can work with them at compile time." For instance:
const int foo[5] = { 2, 5, 1, 9, 4 };
const int bar = foo[foo[2]]; // Fail!
You'd think you could read from that at compile time, right? Nope. But you can if you make it a constexpr.
constexpr int foo[5] = { 2, 5, 1, 9, 4 };
constexpr int bar = foo[foo[2]]; // Woohoo!
constexpr is really good for constant-propagation optimization. If a variable X is initialized with a value the compiler can work out at compile time (perhaps via metaprogramming), and it is declared constexpr, then the compiler knows it can "safely" use that value during optimization: say, replacing a = (X * y); with a = 0; when X evaluates to 0 (and the other usual conditions are met).
Obviously this is great, because for many mathematical functions constant propagation gives you an easy (to use) optimization.
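As a minimal sketch of that (the names here are made up for illustration):

constexpr bool use_offset = false;    // imagine this is computed by metaprogramming
constexpr int X = use_offset ? 1 : 0; // X is 0, and the compiler knows it

int scale(int y)
{
    return X * y; // the compiler may fold this to "return 0;"
}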
Their main use, other than rather esoteric things (such as making it much easier for me to write a compile-time byte-code interpreter), is to be able to write "functions" or classes that can be called and used both at compile time and at run time.
Basically they just sort of fill a hole in C++03 and help with optimization by the compiler.
So which of your 3 is "best"?
2 can be called with a run-time N, whereas the others need N at compile time (x may still be a run-time value in all three). That's pretty sweet.
There's a bit more to it. Wikipedia gives you a very basic summary of "constexpr allows this," but template metaprogramming can get complicated, and constexpr makes parts of it a lot easier. I wish I had a clearer example for you than reading from an array.
A good mathematical example, I suppose, would be a user-defined complex-number class. It would be an order of magnitude more complex to code with only template metaprogramming and no constexpr.
So when should you not use constexpr? Honestly, constexpr is basically "const, except MORE CONST." You can generally use it anywhere you'd use const, with a few caveats, such as: when a constexpr function is called at run time with non-constant arguments, it behaves like an ordinary function.
The 1st and 3rd were incorrect as originally posted: the compiler would try to instantiate tpow<N-1> before evaluating (N>0) ?, and you would get infinite template recursion. You need a specialisation for N==1 (or N==0) to make them work. The 2nd works whether x is known at compile time or only at run time.
Added after your specialization-for-0 edit: now all the functions work for compile-time or run-time x. The 1st always returns a non-constexpr value. The 2nd and 3rd return a constexpr value if x and N are constexpr. The 2nd even works if N is not constexpr; the others need a constexpr N (so the 2nd and 3rd are not equivalent).
constexpr is used in two cases. When you write int N = 10;, the value of N is known at compile time, but N is not a constant expression and cannot be used, for example, as a template argument. The constexpr keyword explicitly tells the compiler that N is safe to use as a compile-time value.
The second use is constexpr functions. They use a subset of C++ to conditionally produce constexpr values and can dramatically simplify equivalent template functions. One drawback of constexpr functions is that compile-time evaluation is not guaranteed: unless the result is used where a constant expression is required, the compiler may choose to evaluate the call at run time. With a templated implementation you are guaranteed compile-time evaluation.
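To illustrate that last point, a minimal sketch (sq is a made-up example function): the same constexpr function may run at either time, and only certain contexts force compile-time evaluation.

constexpr int sq(int n) { return n * n; }

int runtime_n = 5;
int a = sq(runtime_n);   // evaluated at run time
int b = sq(5);           // may be evaluated at compile time, but that is not guaranteed
constexpr int c = sq(5); // guaranteed: a constexpr variable requires a constant expression
int arr[sq(4)];          // guaranteed too: array bounds must be compile-time constants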
I've searched for this question here (on SO), and as far as I can tell every question assumes the reader already knows what a compile-time function is; but that is almost impossible for a beginner to know, because resources explaining it are quite rare.
I found a short Wikipedia article that shows how to write incomprehensible code using a never-seen-before trick with enums in C++, and a video about the feature's future, but neither explains much about it.
It seems to me that there are two ways to write a compile-time function in C++:
constexpr
template<>
I've been through a short introduction to both of them, but I have no idea how they come into play here.
Can anyone explain compile-time functions with a sufficiently good example, one that covers most of their relevant features?
In C++, as you mentioned, there are two ways of evaluating code at compile time: constexpr functions and template metaprogramming.
There are a few differences between the two solutions. The template option is older and therefore supported by a wider range of compilers. Additionally, templates are guaranteed to be evaluated at compile time, while constexpr is somewhat like inline: it only suggests to the compiler that the work can be done while compiling. For templates, the arguments are usually passed via the template parameter list, while constexpr functions take arguments as regular functions do (which they actually are). constexpr functions have the advantage that they can also be called as regular functions at run time.
Now the similarities: for compile-time evaluation, it must be possible to evaluate the parameters at compile time, so they must be literals or the results of other compile-time computations.
Having said all that, let's look at a compile-time max function:
template<int a, int b>
struct max_template {
    static constexpr int value = a > b ? a : b;
};

constexpr int max_fun(int a, int b) {
    return a > b ? a : b;
}

int main() {
    int x = 2;
    int y = 3;
    int foo = max_fun(3, 2);             // can be evaluated at compile time
    int bar = max_template<3, 2>::value; // is surely evaluated at compile time
    // int bar2 = max_template<x, y>::value; // won't compile: x and y are not compile-time constants
    int baz = max_fun(x, y);             // will be evaluated at run time
    return 0;
}
A "compile time function" as you have seen the term used is not a C++ construct, it's just the idea of computing stuff (hence, function) at compile-time (as opposed to computing at runtime or via a separate build tool outside the compiler). C++ makes this possible in several ways, of which you have found two:
Templates can indeed be used to compute arbitrary stuff, a set of techniques called "template metaprogramming". That's mostly by accident as they weren't designed for this purpose at all, hence the crazy syntax and struggles with old compilers. But in C++03 and before, that's all we had.
constexpr was added in C++11 in response to the need for compile-time calculations, and brings them back into somewhat saner territory. Its toolbelt has been expanding ever since, allowing more and more normal-looking code to run at compile time by just tacking a constexpr in the right place.
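A small sketch of that expansion: with the C++14 relaxation, a constexpr function may contain local variables and loops (in C++11 the body would have had to be a single return statement using recursion).

constexpr int factorial(int n)
{
    int r = 1;                   // local variable: allowed since C++14
    for (int i = 2; i <= n; ++i) // loops too
        r *= i;
    return r;
}
static_assert(factorial(5) == 120, "evaluated entirely at compile time");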
One could also mention macro metaprogramming, of which Boost.Preprocessor is a good example. But it's even more wonky and abhorrently arcane than old-school template metaprogramming, so you probably don't want to use it if you have a choice.
I wanted to do a couple of sanity tests for a pair of convenience functions that split a 64-bit integer into two 32-bit integers, or do the reverse. The intent is that you don't redo the bit shifts and logic ops by hand every time, with the potential for a typo somewhere. The sanity tests were supposed to make 100% sure that the pair of functions, although pretty trivial, indeed works as intended.
Nothing fancy, really... so the first thing I added was this:
static constexpr auto joinsplit(uint64_t h) noexcept { auto [a,b] = split(h); return join(a,b); }
static_assert(joinsplit(0x1234) == 0x1234);
... which works perfectly well, but is less "exhaustive" than I'd like. Of course I can follow up with another 5 or 6 tests with different patterns, copy-paste to the rescue. But seriously... wouldn't it be nice to have the compiler check a dozen or so values, within a pretty little function? No copy-paste? Now that would be cool.
With a recursive variadic template, this can be done (and it's what I'm using in lack of something better), but it's in my opinion needlessly ugly.
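For reference, such a recursive variadic test might look roughly like this (my reconstruction of the shape, not the exact code):

template <uint64_t I>
static constexpr void test() noexcept
{ static_assert(joinsplit(I) == I, "!"); }

template <uint64_t I, uint64_t J, uint64_t... Is>
static constexpr void test() noexcept
{
    static_assert(joinsplit(I) == I, "!");
    test<J, Is...>(); // recurse on the remaining values
}

// test<0x1234, 0xFFFFFFFF, 0x0123456789ABCDEF>();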
Given the power of constexpr functions and range-based for, wouldn't it be cool to have something nice and readable like:
constexpr bool test()
{
    for (constexpr auto value : {1, 2, 3}) // other numbers of course
    {
        constexpr auto [a, b] = split(value);
        static_assert(value == join(a, b));
    }
    return true; // never used
}
static_assert(test()); // invoke test
A big plus of this solution would be that, in addition to being much more readable, it would be obvious from the failing static_assert not just that the test failed in general, but also the exact value for which it failed.
This, however, doesn't work for two reasons:
You cannot declare value as constexpr because, as stated by the compiler: "The value of __for_begin is not usable in a constant expression". The reason for that is also explained by the compiler: "note: __for_begin was not declared constexpr". Fair enough, that is a reason, silly as it may be.
A decomposition declaration cannot be declared constexpr (attempting it is promptly followed by a "non-constexpr condition for static_assert" error).
In both cases, I wonder if there is truly a hindrance to allowing these to be constexpr. I understand why it doesn't work (see above!), but the interesting question is: why is it like that?
I acknowledge that declaring value as constexpr is a lie to begin with, since its value obviously is not constant (it's different in each iteration). On the other hand, every value it ever takes comes from a compile-time-constant set of values; yet without the constexpr keyword the compiler refuses to treat it as such, i.e. the result of split is non-constexpr and not usable with static_assert, although it really is constant, by all means.
OK, well... I'm probably really asking too much if I want to declare something that has a changing value as constant. Even though, from some point of view, it is constant within each iteration's scope. Somehow... is the language missing a concept here?
I acknowledge that range-based for is, like lambdas, really just a hack that mostly works (and mostly works invisibly) rather than a true language feature; the mention of __for_begin is a dead giveaway of how it's implemented. I also acknowledge that it's generally tricky (forbidding, even) to allow the counter of a normal for loop to be constexpr, not only because it's not constant, but because you can in principle have any kind of expression in there, and it cannot easily be told in advance what values will be generated (not with reasonable effort at compile time, anyway).
On the other hand, given an exact, finite sequence of literals (which is as compile-time-constant as it gets), the compiler should be able to do a number of iterations, each with a different compile-time-constant value (unrolling the loop, if you will). Somehow, in a readable (non-recursive-template) manner, such a thing should be possible?
Am I asking too much there?
I acknowledge that a decomposition declaration is not an altogether "trivial" thing. It might, for example, require calling get on a tuple, which is a class template (and could in principle be anything). But get happens to be constexpr (so that's no excuse), and in my concrete example an anonymous temporary of an anonymous struct with two members is returned, so direct binding to public members (of a constexpr struct) is used.
Ironically, the compiler does exactly the right thing in the first example (and with recursive templates as well). So apparently it's quite possible; only, for some reason, not in the second example.
Again, am I asking too much here?
The likely correct answer will be "The standard doesn't provide that".
Apart from that, are there any true, technical reasons why this cannot, could not, or should not work? Is that an oversight, an implementation deficiency, or intentionally forbidden?
I can't answer your theoretical questions ("is the language missing a concept here?", "such a thing should be possible? Am I asking too much there?", "are there any true, technical reasons why this cannot, could not, or should not work? Is that an oversight, an implementation deficiency, or intentionally forbidden?"), but, from the practical point of view...
With a recursive variadic template, this can be done (and it's what I'm using in lack of something better), but it's in my opinion needlessly ugly.
I think variadic templates are the right way, and (you tagged C++17) with fold expressions there is no reason to make it recursive.
For example:
template <uint64_t ... Is>
static constexpr void test () noexcept
{ static_assert( ((joinsplit(Is) == Is) && ...) ); }
The following is a full compiling example
#include <utility>
#include <cstdint>
static constexpr std::pair<uint32_t, uint32_t> split (uint64_t h) noexcept
{ return { h >> 32 , h }; }
static constexpr uint64_t join (uint32_t h1, uint32_t h2) noexcept
{ return (uint64_t{h1} << 32) | h2; }
static constexpr auto joinsplit (uint64_t h) noexcept
{ auto [a,b] = split(h); return join(a, b); }
template <uint64_t ... Is>
static constexpr void test () noexcept
{ static_assert( ((joinsplit(Is) == Is) && ...) ); }
int main()
{
    test<1, 2, 3>();
}
-- EDIT -- Bonus answer
Folding (C++17) is great, but never underestimate the power of the comma operator.
You can obtain the same result (well... almost the same) in C++14 with a helper function and the initialization of an unused array:
template <uint64_t I>
static constexpr void test_helper () noexcept
{ static_assert( joinsplit(I) == I, "!" ); }
template <uint64_t ... Is>
static constexpr void test () noexcept
{
    using unused = int[];
    (void)unused { 0, (test_helper<Is>(), 0)... };
}
Obviously, this needs a little change in joinsplit() to make it C++14 compliant:
static constexpr auto joinsplit (uint64_t h) noexcept
{ auto p = split(h); return join(p.first, p.second); }
Suppose, for the sake of argument, that I have the following private constexpr function in a class:
static constexpr uint16_t square_it(uint16_t x)
{
    return std::pow(x, 2);
}
Then I want to construct a static constant array of these values for the integers up to 255, in the same section of the same class, using the above constexpr function:
static const uint16_t array_of_squares[256] =
{
    // something
};
I'd like the array to be constructed at compile time, not at run time, if possible. I think the first problem is that using functions like std::pow in a constexpr function is not valid ISO C++ (though it may be allowed by arm-gcc?), since it can raise a domain error. The actual expression I want to use is a somewhat complicated function involving std::exp.
Note that I don't have much of the std library available as I'm compiling for a small microprocessor, the Cortex M4.
Is there a more appropriate way to do this, say using preprocessor macros? I'd very much like to avoid using something like an external Python script to calculate the table each time it needs to be modified during development, and then pasting it in.
The problem, as you say, is that C standard library functions are in general not marked constexpr.
The best workaround here, if you need to use std::exp, is to write your own implementation that can run at compile time. If it's meant to be done at compile time, then optimizing it probably isn't necessary; it only needs to be accurate and moderately efficient.
Someone asked a question about how to do that here a long time ago. You could reuse the idea from there and rewrite as a constexpr function in C++11, although you'd have to refactor it to avoid the for loop. In C++14 less refactoring would be required.
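For instance, a minimal C++11-compatible sketch of exp via a truncated Taylor series might look like this (the names and the fixed 20-term cutoff are my choices, and range handling is deliberately naive; treat it as a starting point, not a vetted implementation):

// term is x^(n-1)/(n-1)!; sum accumulates the series so far
constexpr double exp_helper(double x, double term, double sum, int n, int max_n)
{
    return n > max_n
        ? sum
        : exp_helper(x, term * x / n, sum + term * x / n, n + 1, max_n);
}

constexpr double my_exp(double x)
{
    return exp_helper(x, 1.0, 1.0, 1, 20); // 20 terms: adequate for small |x|
}

static_assert(my_exp(1.0) > 2.718 && my_exp(1.0) < 2.719, "roughly e");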
You could also try doing it strictly via templates, but that would be more painful, and since double cannot be a template parameter it would be more complicated still.
How about something like this?
constexpr uint16_t square_it(uint16_t v) { return v * v; }

template <size_t N, class = std::make_index_sequence<N>>
struct gen_table;

template <size_t N, size_t... Is>
struct gen_table<N, std::index_sequence<Is...>> {
    // static constexpr (not just const) is required to initialize
    // an array member in-class like this.
    static constexpr uint16_t values[N] = {square_it(Is)...};
};

constexpr auto&& array_of_squares = gen_table<256>::values;
I have no idea whether that microprocessor supports this sort of operation. It may not have make_index_sequence in your standard library (though you can find implementations on SO), and maybe that template instantiation will take too much memory. But at least it's something that works somewhere.
I am reading about C++ templates and would like to contrast two different implementations of a function that computes the sum from 0 to N.
Unfortunately, I ran into problems and would like to address a few questions through examples:
Code for naive sum:
#include <stdio.h>

template<int N>
struct Sum {
    // Copied the implementation idea from Scott Meyers'
    // "Effective C++". Is there a better way?
    enum { value = N + Sum<N - 1>::value };
};

template<>
struct Sum<0> {
    enum { value = 0 };
};

int main() {
    // Works well in this case, but gives a compilation error if
    // called with a larger value, such as 10000
    // ("error: template instantiation depth exceeds maximum of 900").
    // How do I improve the program so that it does not give a
    // compile-time error?
    printf("%d\n", Sum<100>::value);
}
Now my idea for an improvement is to use an accumulator:
template<int Acc, int N>
struct Sum {
    enum { value = Sum<Acc + N, N - 1>::value };
};

// Is that an appropriate way of writing the base case?
template<int Acc>
struct Sum<Acc, 0> {
    enum { value = Acc };
};
However, this still fails when compiled with plain g++ on Ubuntu:
int main() {
    // Still gives the "depth exceeded" error.
    printf("%d\n", Sum<0, 1000>::value);
}
Hence, my main concern is:
Does any modern C++ compiler support last-call optimisation for template metaprogramming? If yes, what is an appropriate way to write code for such an optimisation?
Does any modern C++ compiler support last-call optimisation for template metaprogramming? If yes, what is an appropriate way to write code for such an optimisation?
No, and it wouldn't make sense. Template instantiations are not function calls; last/tail-call optimisation has no relevance here. Unlike function calls, template instantiations are not transient, with automatic variables to reclaim; rather, each template instantiation becomes a new type in the compiler's state.
The whole point of template metaprogramming is that all of these "calls" will be optimised out of your program; they are "executed" during the build.
That doesn't change the fact that there is an implementation-defined limit to the amount of recursion you can use during this process. This is the limit you've hit.
So, no, there's no "optimisation" to work around it.
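What you can do, however, is restructure the recursion so that the instantiation depth grows logarithmically rather than linearly. A sketch (SumRange is my own name, not a standard facility):

// Sums the integers in [Lo, Hi]; each level halves the range,
// so the instantiation depth is O(log N) instead of O(N).
template<int Lo, int Hi>
struct SumRange {
    static const int mid = (Lo + Hi) / 2;
    enum { value = SumRange<Lo, mid>::value + SumRange<mid + 1, Hi>::value };
};

template<int N>
struct SumRange<N, N> { // single-element range
    enum { value = N };
};

// SumRange<0, 10000>::value compiles fine: the depth is about 14, not 10000.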
Short answer: incorporating LCO is not worth the trouble.
Longer explanation:
C++ template metaprogramming is Turing-complete. In theory it would be possible to compute any computable function at compile time using only templates (given enough resources). LCO would make such computation more efficient.
That does not mean templates should be used for sophisticated computations; run time is for that. C++ templates mainly help you avoid writing identical code by hand.
In fact, doing complicated computation through templates is discouraged, because you get very little tool support: errors surface as long instantiation backtraces, and there is no debugger for instantiations in progress.
So I think the designers of C++ have more interesting things to add to the language than optimising template metaprogramming. Maybe in 20 years we will have LCO support; currently there is none.
It is known that in C++ we can have non-type template parameters like int:
template <class T, int size>
void f(){...}
I wonder how it is different from the ordinary way of passing a parameter into a function:
template <class T>
void f(int size) {...}
I think one difference is that for templates, size is evaluated at compile time and substituted as a literal when instantiating the template. Thus I suspect (correct me if I'm wrong) that every different size value leads to the creation of new binary code (in ".text"), which seems to be an overhead.
Can anyone tell when this is necessary and worthwhile?
Thus I suspect (correct me if I'm wrong) that every different size value leads to the creation of new binary code (in ".text"), which seems to be an overhead.
This is actually the case, and it is a common source of code bloat. You need to figure out when you want to generate a different function for each N, and when you want a single function in which the compiler has less information (note that this is not just about performance, but also about correctness).
Since Matt already brought up a simple example, let's work with a function that takes an array by reference:
template<typename T, size_t N>
void operateOnArray( T (&array)[N] )
{
    // Some complex logic, which could include:
    for (std::size_t i = 0; i < N; ++i) {
        // complicated stuff
    }
}
The type of the argument is a reference to an array; the compiler will verify for you that the array truly has N elements (and it will deduce the type of the values in the array). This is a great improvement in type safety compared with similar C-style code:
void operateOnArray( int *array, size_t N )
{
    // Some complex logic, which could include:
    for (std::size_t i = 0; i < N; ++i) {
        // complicated stuff
    }
}
In particular, the user can mistakenly pass the wrong value:
int array[10];
operateOnArray(array, 20); // typo: should have been 10!
Whereas in the first case, the compiler will deduce the size and guarantee that it is correct.
You hit the nail on the head when you mentioned that this can potentially add to the code size, and it can add quite a lot. Imagine that the function is complex enough that it does not get inlined, and that your program ends up calling it with all sizes from 1 to 100. The program code will contain 100 instantiations of basically the same code, where the only difference is the size.
There are ways around this, like mixing the two approaches:
template<typename T>
void operateOnArray( T *array, size_t N ); // Possibly private, different name...

template<typename T, size_t N>
void operateOnArray( T (&array)[N] ) {
    operateOnArray(array, N);
}
In this case, the compiler keeps a single copy of the complex code, in the C-style function, and generates 100 versions of the template; but those are simple enough that the compiler will inline them, transforming the program into the equivalent of the C-style approach with guaranteed type safety.
Can anyone tell when this is necessary and worthwhile?
It is necessary when the code inside the template requires the value as a compile-time constant. For example, in the code above, you cannot have a function argument that is a reference to an array of N elements where N is only available at run time. In other cases, like std::array<T,N>, it is required to statically create an array of the proper size. No matter what, all the examples share this: the value needs to be known at compile time.
It is worthwhile when it adds type safety to your program (see the example above), or if it allows stronger optimizations (a functor taking a function pointer or member-function pointer as a non-type argument can inline the function call).
And you should be aware that everything comes at a cost; in this case, binary size. If the template is small enough that the code is likely to be inlined, don't worry; but if the code is quite complex, consider a hybrid approach: use a template argument where it is needed or provides a big advantage, and regular arguments otherwise.
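As a quick sketch of the inlining point above (apply_twice and add_one are hypothetical names): because the function pointer is a template argument, it is a compile-time constant, and the compiler can see through the indirect call.

template <int (*F)(int)>
int apply_twice(int x)
{
    return F(F(x)); // F is fixed at compile time, so both calls can be inlined
}

int add_one(int x) { return x + 1; }

int seven = apply_twice<add_one>(5); // instantiates apply_twice with F = add_one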
As well as providing for much greater optimization ability, these parameters can be filled in by template argument deduction, e.g. this is a common idiom for finding how many items are in a named array:
template<typename T, size_t N>
size_t lengthof( T (&array)[N] )
{
    return N;
}
Usage:
#include <iostream>

int main()
{
    wchar_t foo[] = L"The quick brown fox";
    std::wcout << "\"" << foo << "\" has " << lengthof(foo) - 1 << " characters.\n";
}
Probably the compiler will calculate the length at compile time and substitute it directly into the wcout line, without even making a run-time function call.
Passing size as a non-type template parameter is necessary when f() wants to allocate a C-style array of that size (e.g., f() { int array[size]; }). If you pass size as a function parameter, the program won't compile, because the size is not known at compile time.
Compile-time dimensional analysis is another case: it has helped me detect errors early when implementing physics simulations.
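A minimal sketch of that idea (my own illustration, not a full units library): encode unit exponents as non-type template parameters, so quantities with different units become distinct types and mixing them up fails to compile.

// Exponents for metres, kilograms, seconds.
template<int M, int Kg, int S>
struct Quantity {
    double value;
};

// Multiplying quantities adds the exponents.
template<int M1, int Kg1, int S1, int M2, int Kg2, int S2>
Quantity<M1 + M2, Kg1 + Kg2, S1 + S2>
operator*(Quantity<M1, Kg1, S1> a, Quantity<M2, Kg2, S2> b)
{
    return { a.value * b.value };
}

// Addition is only defined for identical exponents.
template<int M, int Kg, int S>
Quantity<M, Kg, S> operator+(Quantity<M, Kg, S> a, Quantity<M, Kg, S> b)
{
    return { a.value + b.value };
}

using Length = Quantity<1, 0, 0>;
using Time   = Quantity<0, 0, 1>;

// Length d{5.0}; Time t{2.0};
// auto oops = d + t;  // compile error: Length and Time are different types
// auto area = d * d;  // fine: yields Quantity<2, 0, 0>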