I found this code, and I am wondering whether I should really implement something like this in a real project or not.
The things confusing me are:
It will take more compile time, but should I accept slower compile times in exchange for faster run time?
And what if N gets to be a really big number? Is there any limit, such as a source file size limit?
Or is it just something good to know, and not something to implement?
#include <iostream>
using namespace std;

template<int N>
class Factorial {
public:
    static const int value = N * Factorial<N - 1>::value;
};

template<>
class Factorial<1> {
public:
    static const int value = 1;
};

int main() {
    Factorial<5> f;
    cout << "5! = " << f.value << endl;
}
Output:
5! = 120
A slight modification to the question: as I was playing with the code, I found that
Factorial<12> f1; // works
Factorial<13> f2; // doesn't work
error:
undefined reference to `Factorial<13>::value'
Is it that it can only go up to a depth of 12 and no further?
The answer to 1 is that it depends. Template metaprogramming essentially involves a trade-off: the calculation is done at compile time, with the benefit that it does not have to be done at run time. In general, the technique can lead to code that is hard to read and maintain. So the answer ultimately depends on your need for faster run-time performance, weighed against slower compile times and possibly harder-to-maintain code.
The article Want speed? Use constexpr meta-programming! explains how, in modern C++, constexpr functions can often replace template metaprogramming. This generally leads to code that is more readable and perhaps faster. Compare the template metaprogramming method above to the constexpr example:
constexpr int factorial( int n )
{
    return ( n == 0 ? 1 : n * factorial(n - 1) );
}
which is more concise and readable, and will be executed at compile time for arguments that are constant expressions, although, as the linked answer explains, the standard does not actually say it must be; in practice, current implementations definitely do.
It is also worth noting that, since the result will quickly overflow an int, another advantage of constexpr is that undefined behavior is not a valid constant expression, and at least the current implementations of gcc and clang will turn undefined behavior within a constexpr into an error in most cases. For example:
constexpr int n = factorial(13) ;
for me generates the following error:
error: constexpr variable 'n' must be initialized by a constant expression
constexpr int n = factorial(13) ;
^ ~~~~~~~~~~~~~
note: value 6227020800 is outside the range of representable values of type 'int'
return ( n == 0 ? 1 : n*factorial(n-1) ) ;
^
This is also why your example:
Factorial<13> f2;
fails because a constant expression is required and gcc 4.9 gives a useful error:
error: overflow in constant expression [-fpermissive]
static const int value = N * Factorial<N-1>::value;
^
although older versions of gcc give you the less than helpful error you are seeing.
For question 2: compilers have a template recursion limit, which can usually be configured, but eventually you will run out of system resources. For example, the flag for gcc is -ftemplate-depth=n:
Set the maximum instantiation depth for template classes to n. A limit
on the template instantiation depth is needed to detect endless
recursions during template class instantiation. ANSI/ISO C++
conforming programs must not rely on a maximum depth greater than 17
(changed to 1024 in C++11). The default value is 900, as the compiler
can run out of stack space before hitting 1024 in some situations.
In your specific case, though, you will need to worry about signed integer overflow, which is undefined behavior, long before you have system resource issues.
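If you do want larger inputs, one hedge is to compute in a wider unsigned type, which pushes the overflow boundary out (it does not remove it). A minimal sketch of that variation, not a recommendation:

#include <iostream>

// Sketch: the same technique, but with unsigned long long the values stay
// representable up to 20!. Note that because unsigned overflow wraps rather
// than being undefined behavior, the hard compile error above also goes
// away past 20, which can silently hide mistakes.
template<unsigned long long N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

template<>
struct Factorial<0> {
    static const unsigned long long value = 1;
};

int main() {
    std::cout << Factorial<20>::value << '\n';  // 2432902008176640000
}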
Related: When does a constexpr function get evaluated at compile time?
This code, when compiled with g++ -O3, does not seem to evaluate get_fibonacci(50) at compile time, as it runs for a very long time.
#include <iostream>

constexpr long long get_fibonacci(int num){
    if (num == 1 || num == 2) { return 1; }
    return get_fibonacci(num - 1) + get_fibonacci(num - 2);
}

int main()
{
    std::cout << get_fibonacci(50) << std::endl;
}
Replacing the code with
#include <iostream>

constexpr long long get_fibonacci(int num){
    if (num == 1 || num == 2) { return 1; }
    return get_fibonacci(num - 1) + get_fibonacci(num - 2);
}

int main()
{
    long long num = get_fibonacci(50);
    std::cout << num << std::endl;
}
worked perfectly fine. I don't know exactly why this is occurring, but my guess is that get_fibonacci(50) is not evaluated at compile time in the first scenario because items given to std::cout are evaluated at runtime. Is my reasoning correct, or is something else happening? Can somebody please point me in the right direction?
Actually, both versions of your code do not have the Fibonacci number computed at compile time, with typical compilers and compilation flags. But, interestingly enough, if you reduce the 50 to, say, 30, both versions of your program do get compile-time evaluation.
Proof: GodBolt
At the link, your first program is compiled and run first with 50 as the argument to get_fibonacci(), then with 30, using GCC 10.2 and clang 11.0.
What you're seeing is the limits of the compiler's willingness to evaluate code at compile time. Both compilers engage in the recursive evaluation at compile time, but only up to a certain recursion depth or evaluation-effort cap. Past that, they give up and leave it for run-time evaluation.
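One way to see this limit directly is to move the call into a context that requires a constant expression; the compiler must then either finish the evaluation or issue an error. Both compilers also expose knobs for the evaluation budget, though the defaults and exact behavior are implementation details. A sketch:

// Requiring a constant expression removes the compiler's freedom to defer
// to run time: it must evaluate get_fibonacci(50) now, or reject the program.
constexpr long long forced = get_fibonacci(50);

// Whether that evaluation succeeds then depends on the compiler's evaluation
// strategy and its configured budget, e.g. (flags as of GCC 10 / clang 11):
//   g++     -fconstexpr-depth=N -fconstexpr-ops-limit=N
//   clang++ -fconstexpr-depth=N -fconstexpr-steps=N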
I don't know exactly why this is occurring, but my guess is that get_fibonacci(50) is not evaluated at compile-time in the first scenario because items given std::cout are evaluated at runtime
Your function can be computed at compile time, because it receives a compile-time-known value (50), but it can also be computed at run time, because the returned value is sent to standard output, so it is used at run time.
It's a gray area where the compiler can choose either solution.
To force the compile-time computation (the as-if rule aside), you can place the returned value somewhere the value is required at compile time.
For example, in a template parameter, in your first example
std::cout << std::integral_constant<long long, get_fibonacci(50)>::value
<< std::endl;
or in a constexpr variable, in your second example
constexpr long long num = get_fibonacci(50);
But remember there is the "as-if rule", so the compiler (in this case, even when using constexpr or std::integral_constant) can select the run-time solution when this "does not change the observable behavior of the program".
Assign the result to a constexpr variable to force compile-time evaluation (or to get the compiler to spit out an error message):
constexpr auto val = get_fibonacci(50);
constexpr functions are guaranteed to be evaluated at compile time only in a constexpr context, which includes initializing constexpr variables, template parameters, array sizes...
A regular function or operator call is not such a context.
std::cout << get_fibonacci(50);
is done at runtime.
Now, the compiler might still optimize any function (constexpr or not, inline or not) under the as-if rule, resulting in a constant, a simpler loop, and so on.
I wonder if the following surprises anyone else, as it did me. Alex Allain's article here on using constexpr shows the following factorial example:
constexpr int factorial (int n)
{
    return n > 0 ? n * factorial( n - 1 ) : 1;
}
And states:
Now you can use factorial(2) and when the compiler sees it, it can
optimize away the call and make the calculation entirely at compile
time.
I tried this in VS2015 in Release mode with full optimizations on (/Ox) and stepped through the code in the debugger viewing the assembly and saw that the factorial calculation was not done at compilation.
Using GCC v5.4.0 with -std=c++14, I must use -O2 or -O3 before the calculation is performed at compile time. I was surprised, though, that with just -O the calculation did not occur at compile time.
My main question is: why is VS2015 not performing this calculation at compile time?
It depends on the context of the function call.
For example, the following obviously could never be calculated at compile time:
int x;
std::cin >> x;
std::cout << factorial(x);
On the other hand, this context would require the answer at compile time:
class Foo {
int x[factorial(4)];
};
constexpr functions are only guaranteed to be evaluated at compile time if they are called from a constexpr context; otherwise it is up to the compiler to choose whether or not to eval at compile time (assuming such an optimization is possible, again, depending on the context).
You have to use it in a constant expression, as:
constexpr auto res = factorial(2);
otherwise the computation can be done at runtime.
constexpr is neither necessary nor sufficient for compile-time evaluation of a function.
It's not sufficient, even aside from the fact that the arguments obviously also have to be constant expressions. Even if that is true, a conforming compiler does not have to evaluate it at compile time. It only has to be evaluated at compile time if it is in a constexpr context. Such as, assigning the result of the computation to a constexpr variable, or using the value as an array size, or as a non-type template parameter.
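To make those contexts concrete, here is a minimal sketch covering the three just mentioned:

constexpr int sq(int n) { return n * n; }

constexpr int forced = sq(4);  // constexpr variable: evaluated at compile time
int buffer[sq(4)];             // array size: requires a constant expression

template <int N> struct Tag {};
Tag<sq(4)> tag;                // non-type template argument: same requirement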
The other point is that the compiler is completely capable of evaluating things at compile time, even without constexpr. There is a lot of confusion about this, and it's not clear why. Compile-time evaluation of constexpr functions fundamentally just boils down to constant propagation, and compilers have been doing this optimization since forever: https://godbolt.org/g/Sy214U.
int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}

int foo() { return factorial(5); }
On gcc 6.3 with -O3 (and -std=c++14) this yields:
foo():
    mov eax, 120
    ret
In essence, outside of the specific case where you absolutely force compile time evaluation by assigning a constexpr function to another constexpr variable, compile time evaluation has more to do with the quality of your optimizer than the standard.
I found this example at cppreference.com, and it seems to be the de facto example used throughout Stack Overflow:
template<int N>
struct S {
    int a[N];
};
Surely, non-type templatization has more value than this example. What other optimizations does this syntax enable? Why was it created?
I am curious, because I have code that is dependent on the version of a separate library that is installed. I am working in an embedded environment, so optimization is important, but I would like to have readable code as well. That being said, I would like to use this style of templating to handle version differences (examples below). First, am I thinking of this correctly, and MOST IMPORTANTLY, does it provide a benefit or drawback over using an #ifdef statement?
Attempt 1:
template<int VERSION = 500>
void print (char *s);

template<int VERSION>
void print (char *s) {
    std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
}

template<>
void print<500> (char *s) {
    // print using 500 syntax
}

template<>
void print<600> (char *s) {
    // print using 600 syntax
}
Or, since the template parameter is constant at compile time, could a compiler consider the other branches of the if statement dead code, using syntax similar to:
Attempt 2:
template<int VERSION = 500>
void print (char *s) {
    if (VERSION == 500) {
        // print using 500 syntax
    } else if (VERSION == 600) {
        // print using 600 syntax
    } else {
        std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
    }
}
Would either attempt produce output comparable in size to this?
void print (char *s) {
#if VERSION == 500
    // print using 500 syntax
#elif VERSION == 600
    // print using 600 syntax
#else
    std::cout << "ERROR! Unsupported version: " << VERSION << "!" << std::endl;
#endif
}
If you can't tell I'm somewhat mystified by all this, and the deeper the explanation the better as far as I'm concerned.
Compilers find dead code elimination easy. That is the case where you have a chain of ifs depending (only) on a template parameter's value or type. All branches must contain valid code, but when compiled and optimized the dead branches evaporate.
A classic example is a per-pixel operation written with template parameters that control details of code flow. The body can be full of branches, yet the compiled output is branchless.
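A minimal sketch of that per-pixel pattern (the names here are purely illustrative):

#include <cstdint>

// Blend is known at compile time, so in each instantiation one of the two
// branches is statically dead and disappears from the generated code.
template <bool Blend>
std::uint8_t shade(std::uint8_t src, std::uint8_t dst) {
    if (Blend)
        return static_cast<std::uint8_t>((src + dst) / 2);
    else
        return src;
}

// shade<true> compiles to the blend; shade<false> to just "return src;".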
Similar techniques can be used to unroll loops (say, scanline loops). Care must be taken to understand the code-size multiplication that can result, especially if your compiler lacks ICF (aka comdat folding), which the gold gcc linker and MSVC (among others) have.
Fancier things can also be done, like manual jump tables.
You can do pure compile-time type checks with no runtime behaviour at all, such as dimensional analysis. Or distinguish between points and vectors in n-space.
Enums can be used to name types or switches. Pointers to functions to enable efficient inlining. Pointers to data to allow 'global' state that is mockable, or siloable, or decoupled from implementation. Pointers to strings to allow efficient readable names in code. Lists of integral values for myriads of purposes, like the indexes trick to unpack tuples. Complex operations on static data, like compile time sorting of data in multiple indexes, or checking integrity of static data with complex invariants.
I am sure I missed some.
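To pick one item from that list: a pointer-to-function non-type parameter makes the callee visible at compile time, which is what enables the efficient inlining mentioned above. A small sketch, with illustrative names:

int twice(int x) { return 2 * x; }

// F is part of the instantiated type, so the compiler sees exactly which
// function is called and can inline it, unlike a run-time function pointer.
template <int (*F)(int)>
int apply_three_times(int x) { return F(F(F(x))); }

int r = apply_three_times<twice>(1);  // typically folds down to the constant 8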
An obvious optimization is when using an integer, the compiler has a constant rather than a variable:
int foo(size_t); // definition not visible
// vs
template<size_t N>
size_t foo() {return N*N;}
With the template, there's nothing to compute at runtime, and the result may be used as a constant, which can aid other optimizations. You can take this example further by declaring it constexpr, as 5gon12eder mentioned below.
Next example:
int foo(double, size_t); // definition not visible
// vs
template<size_t N>
size_t foo(double p) {
double r(p);
for (size_t i(0) i < N; ++i) {
r *= p;
}
return r;
}
Ok. Now the number of iterations of the loop is known. The loop may be unrolled/optimized accordingly, which can be good for size, speed, and eliminating branches.
Also, building on your example, std::array<> exists. std::array<> can be much better than std::vector<> in some contexts, because std::vector<> uses heap allocations and non-local memory.
There's also the possibility that some specializations will have different implementations. You can separate those and (potentially) reduce other referenced definitions.
Of course, templates can also work against you, unnecessarily duplicating code across your program. Templates also require longer symbol names.
Getting back to your version example: yes, it's certainly possible that if VERSION is known at compilation time, the code which is never executed can be deleted, and you may also be able to reduce referenced functions. The primary difference will be that void print(char *s) will have a shorter name than the template (whose symbol name includes all template parameters). For one function, that's counting bytes. For complex programs with many functions and templates, that cost can add up quickly.
There is an enormous range of potential applications of non-typename template parameters. In his book The C++ Programming Language, Stroustrup gives an interesting example that sketches out a type-safe zero-overhead framework for dealing with physical quantities. Basically, the idea is that he writes a template that accepts integers denoting the powers of fundamental physical quantities such as length or mass and then defines arithmetic on them. In the resulting framework, you can add speed with speed or divide distance by time but you cannot add mass to time. Have a look at Boost.Units for an industry-strength implementation of this idea.
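A stripped-down sketch of that idea (far cruder than Stroustrup's version or Boost.Units; all names here are illustrative) might look like this:

// Minimal compile-time dimensional analysis: M, L, T are the exponents of
// mass, length and time. Mismatched dimensions simply fail to type-check,
// and the units have no run-time representation at all.
template <int M, int L, int T>
struct Quantity {
    double value;
};

template <int M, int L, int T>
Quantity<M, L, T> operator+(Quantity<M, L, T> a, Quantity<M, L, T> b) {
    return {a.value + b.value};  // only identical dimensions can be added
}

template <int M1, int L1, int T1, int M2, int L2, int T2>
Quantity<M1 + M2, L1 + L2, T1 + T2>
operator*(Quantity<M1, L1, T1> a, Quantity<M2, L2, T2> b) {
    return {a.value * b.value};  // multiplication adds the exponents
}

using Length = Quantity<0, 1, 0>;
using Time   = Quantity<0, 0, 1>;
// Length{2.0} + Time{1.0} would be a compile-time error;
// Length{2.0} * Time{1.0} yields a Quantity<0, 1, 1>.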
For your second question: any reasonable compiler should be able to produce exactly the same machine code for
#define FOO

#ifdef FOO
    do_foo();
#else
    do_bar();
#endif
and
#define FOO_P 1

if (FOO_P)
    do_foo();
else
    do_bar();
except that the second version is much more readable and the compiler can catch errors in both branches simultaneously. Using a template is a third way to generate the same code but I doubt that it will improve readability.
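For completeness, a sketch of the template "third way" (do_foo/do_bar are assumed to be declared elsewhere):

void do_foo();
void do_bar();

// The flag becomes a template parameter: both branches must compile,
// but the dead one evaporates in each instantiation.
template <bool Foo>
void foo_or_bar() {
    if (Foo)
        do_foo();
    else
        do_bar();
}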
The below code calculates Fibonacci numbers by an exponentially slow algorithm:
#include <cstdlib>
#include <iostream>

#define DEBUG(var) { std::cout << #var << ": " << (var) << std::endl; }

constexpr auto fib(const size_t n) -> long long
{
    return n < 2 ? 1 : fib(n - 1) + fib(n - 2);
}

int main(int argc, char *argv[])
{
    const long long fib91 = fib(91);
    DEBUG( fib91 );
    DEBUG( fib(45) );
    return EXIT_SUCCESS;
}
And I am calculating the 45th Fibonacci number at run-time, and the 91st one at compile time.
The interesting fact is that GCC 4.9 compiles the code and computes fib91 in a fraction of a second, but it takes a while to spit out fib(45).
My question: If GCC is smart enough to optimize fib(91) computation and not to take the exponentially slow path, what stops it to do the same for fib(45)?
Does the above mean GCC produces two compiled versions of fib function where one is fast and the other exponentially slow?
The question is not how the compiler optimizes fib(91) calculation (yes! It does use a sort of memoization), but if it knows how to optimize the fib function, why does it not do the same for fib(45)? And, are there two separate compilations of the fib function? One slow, and the other fast?
GCC is likely memoizing constexpr functions (enabling a Θ(n) computation of fib(n)). That is safe for the compiler to do because constexpr functions are purely functional.
Compare the Θ(n) "compiler algorithm" (using memoization) to your Θ(φ^n) run-time algorithm (where φ is the golden ratio) and suddenly it makes perfect sense that the compiler is so much faster.
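Written out as ordinary run-time code, the memoization the compiler is presumably doing amounts to something like this sketch (illustrative only, not GCC's actual mechanism):

#include <cstddef>
#include <unordered_map>

// A cache turns the Θ(φ^n) recursion into Θ(n), mirroring what the
// compile-time evaluator is presumed to do internally.
long long fib_memo(std::size_t n) {
    static std::unordered_map<std::size_t, long long> cache;
    if (n < 2) return 1;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;
    long long result = fib_memo(n - 1) + fib_memo(n - 2);
    cache.emplace(n, result);
    return result;
}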
From the constexpr page on cppreference (emphasis added):
The constexpr specifier declares that it is possible to evaluate the value of the function or variable at compile time.
The constexpr specifier does not declare that it is required to evaluate the value of the function or variable at compile time. So one can only guess what heuristics GCC is using to choose whether to evaluate at compile time or run time when a compile time computation is not required by language rules. It can choose either, on a case-by-case basis, and still be correct.
If you want to force the compiler to evaluate your constexpr function at compile time, here's a simple trick that will do it.
constexpr auto compute_fib(const size_t n) -> long long
{
    return n < 2 ? n : compute_fib(n - 1) + compute_fib(n - 2);
}

template <std::size_t N>
struct fib
{
    static_assert(N >= 0, "N must be nonnegative.");
    static const long long value = compute_fib(N);
};
In the rest of your code you can then access fib<45>::value or fib<91>::value with the guarantee that they'll be evaluated at compile time.
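A quick way to convince yourself the guarantee holds, using the names from the answer above:

// Both contexts require constant expressions, so this only compiles
// because fib<N>::value was computed at compile time.
static_assert(fib<10>::value == 55, "the 10th Fibonacci number (with F(0) = 0) is 55");
int scratch[fib<10>::value];  // usable as an array bound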
At compile time the compiler can memoize the result of the function. This is safe, because the function is constexpr and hence will always return the same result for the same inputs.
At run time it could in theory do the same. However, most C++ programmers would frown at optimization passes that result in hidden memory allocations.
When you ask for fib(91) to give a value to your const fib91 in the source code, the compiler is forced to compute that value from your constant expression. It does not call the compiled function (as you seem to think); it just sees that to compute fib91 it needs fib(90) and fib(89), to compute those it needs fib(88), fib(87)... and so on until it reaches fib(1), which is given. This is an O(n) algorithm, and the result is computed fast enough.
However, when you ask it to evaluate fib(45) at runtime, the compiler has to choose between using the actual function call or precomputing the result. Eventually it decides to use the compiled function. Now, the compiled function must execute exactly the exponential algorithm that you wrote: there is no way the compiled code could implement memoization to optimize a recursive function (think about the need to allocate a cache, and to understand how many values to keep and how to manage them between function calls).
Could you give an example where static_assert(...) ('C++11') would solve the problem in hand elegantly?
I am familiar with run-time assert(...). When should I prefer static_assert(...) over regular assert(...)?
Also, in Boost there is something called BOOST_STATIC_ASSERT; is it the same as static_assert(...)?
Static assert is used to make assertions at compile time. When the static assertion fails, the program simply doesn't compile. This is useful in different situations: for example, if you implement some functionality that critically depends on the unsigned int type having exactly 32 bits. You can put a static assert like this
static_assert(sizeof(unsigned int) * CHAR_BIT == 32, "unsigned int must have exactly 32 bits");
in your code. On another platform, with a differently sized unsigned int type, the compilation will fail, drawing the attention of the developer to the problematic portion of the code and advising them to re-implement or re-inspect it.
For another example, you might want to pass some integral value as a void * pointer to a function (a hack, but useful at times), and you want to make sure that the integral value will fit into the pointer:
int i;
static_assert(sizeof(void *) >= sizeof i, "int must fit into a void *");
foo((void *) i);
You might want to assert that the char type is signed:
static_assert(CHAR_MIN < 0, "char must be signed");
or that integral division with negative values rounds towards zero:
static_assert(-5 / 2 == -2, "integer division must round towards zero");
And so on.
Run-time assertions can in many cases be used instead of static assertions, but run-time assertions only work at run time and only when control passes over the assertion. For this reason, a failing run-time assertion may lie dormant, undetected, for extended periods of time.
Of course, the expression in a static assertion has to be a compile-time constant. It can't be a run-time value. For run-time values you have no other choice but to use the ordinary assert.
Off the top of my head...
#include "SomeLibrary.h"
static_assert(SomeLibrary::Version > 2,
"Old versions of SomeLibrary are missing the foo functionality. Cannot proceed!");
class UsingSomeLibrary {
// ...
};
Assuming that SomeLibrary::Version is declared as a static const, rather than being #defined (as one would expect in a C++ library).
Contrast with having to actually compile SomeLibrary and your code, link everything, and run the executable only then to find out that you spent 30 minutes compiling an incompatible version of SomeLibrary.
@Arak, in response to your comment: yes, you can have static_assert just sitting out wherever, from the look of it:
class Foo
{
public:
    static const int bar = 3;
};

static_assert(Foo::bar > 4, "Foo::bar is too small :(");

int main()
{
    return Foo::bar;
}
$ g++ --std=c++0x a.cpp
a.cpp:7: error: static assertion failed: "Foo::bar is too small :("
I use it to ensure my assumptions about compiler behaviour, headers, libs, and even my own code are correct. For example, here I verify that the struct has been correctly packed to the expected size.
struct LogicalBlockAddress
{
#pragma pack(push, 1)
    Uint32 logicalBlockNumber;
    Uint16 partitionReferenceNumber;
#pragma pack(pop)
};
BOOST_STATIC_ASSERT(sizeof(LogicalBlockAddress) == 6);
In a class wrapping stdio.h's fseek(), I have taken some shortcuts with enum Origin and check that those shortcuts align with the constants defined by stdio.h
uint64_t BasicFile::seek(int64_t offset, enum Origin origin)
{
    BOOST_STATIC_ASSERT(SEEK_SET == Origin::SET);
You should prefer static_assert over assert when the behaviour is defined at compile time, and not at runtime, such as in the examples I've given above. Examples where this is not the case include parameter and return-code checking.
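A minimal sketch of that dividing line:

#include <cassert>
#include <type_traits>

template <typename T>
T halve(T value)
{
    // A property of the type: fixed at compile time, so static_assert.
    static_assert(std::is_arithmetic<T>::value, "T must be an arithmetic type");
    // A property of the argument: known only at run time, so assert.
    assert(value >= T(0));
    return value / 2;
}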
BOOST_STATIC_ASSERT is a pre-C++0x macro that generates illegal code if the condition is not satisfied. The intentions are the same, albeit static_assert is standardised and may provide better compiler diagnostics.
BOOST_STATIC_ASSERT is a cross platform wrapper for static_assert functionality.
Currently I am using static_assert in order to enforce "Concepts" on a class.
Example:
template <typename T, typename U>
struct Type
{
    BOOST_STATIC_ASSERT((boost::is_base_of<T, Interface>::value));
    BOOST_STATIC_ASSERT(std::numeric_limits<U>::is_integer);
    /* ... more code ... */
};
This will cause a compile time error if any of the above conditions are not met.
One use of static_assert might be to ensure that a structure (that is an interface with the outside world, such as a network or file) is exactly the size that you expect. This would catch cases where somebody adds or modifies a member from the structure without realising the consequences. The static_assert would pick it up and alert the user.
In absence of concepts one can use static_assert for simple and readable compile-time type checking, for example, in templates:
template <class T>
void MyFunc(T value)
{
    static_assert(std::is_base_of<MyBase, T>::value,
                  "T must be derived from MyBase");
    // ...
}
This doesn't directly answer the original question, but it makes an interesting study of how to enforce these compile-time checks prior to C++11.
Chapter 2 (Section 2.1) of Modern C++ Design by Andrei Alexandrescu implements this idea of compile-time assertions like this:
template<int> struct CompileTimeError;
template<> struct CompileTimeError<true> {};
#define STATIC_CHECK(expr, msg) \
{ CompileTimeError<((expr) != 0)> ERROR_##msg; (void)ERROR_##msg; }
Compare the macro STATIC_CHECK() and static_assert()
STATIC_CHECK(0, COMPILATION_FAILED);
static_assert(0, "compilation failed");
To add on to all the other answers, it can also be useful when using non-type template parameters.
Consider the following example.
Let's say you want to define some kind of function whose particular functionality can be somewhat determined at compile time, such as a trivial function below, which returns a random integer in the range determined at compile time. You want to check, however, that the minimum value in the range is less than the maximum value.
Without static_assert, you could do something like this:
#include <cstdlib>    // srand, rand
#include <ctime>      // time
#include <iostream>
#include <random>
#include <stdexcept>  // std::invalid_argument

template <int min, int max>
int get_number() {
    if constexpr (min >= max) {
        throw std::invalid_argument("Min. val. must be less than max. val.\n");
    }
    srand(time(nullptr));
    static std::uniform_int_distribution<int> dist{min, max};
    std::mt19937 mt{(unsigned int) rand()};
    return dist(mt);
}
If min < max, all is fine and the if constexpr branch gets rejected at compile time. However, if min >= max, the program still compiles, but now you have a function that, when called, will throw an exception with 100% certainty. Thus, in the latter case, even though the "error" (of min being greater than or equal to max) was present at compile-time, it will only be discovered at run-time.
This is where static_assert comes in.
Since static_assert is evaluated at compile-time, if the boolean constant expression it is testing is evaluated to be false, a compile-time error will be generated, and the program will not compile.
Thus, the above function can be improved as so:
#include <cstdlib>    // srand, rand
#include <ctime>      // time
#include <iostream>
#include <random>

template <int min, int max>
int get_number() {
    static_assert(min < max, "Min. value must be less than max. value.\n");
    srand(time(nullptr));
    static std::uniform_int_distribution<int> dist{min, max};
    std::mt19937 mt{(unsigned int) rand()};
    return dist(mt);
}
Now, if the function template is instantiated with a value for min that is equal to or greater than max, then static_assert will evaluate its boolean constant expression to false and produce a compile-time error, thus alerting you to the error immediately, without giving the opportunity for an exception at runtime.
(Note: the above method is just an example and should not be used for generating random numbers. Repeated calls in quick succession to the function will generate the same numbers, because the seed value passed to the std::mt19937 constructor through rand() is the same, since time(nullptr) returns the same value. Also, the range of values generated by std::uniform_int_distribution is actually a closed interval, so the same value can be passed to its constructor for the upper and lower bounds, though there wouldn't be any point.)
The static_assert can be used to forbid the use of the delete keyword this way:
#define delete static_assert(0, "The keyword \"delete\" is forbidden.");
A developer may want to do that when using a conservative garbage collector, using only classes and structs that overload operator new to invoke a function that allocates memory on the collector's conservative heap; the collector itself is initialized and set up by invoking some function at the beginning of the main function.
For example, a developer who wants to use the Boehm-Demers-Weiser conservative garbage collector will write, at the beginning of the main function:
GC_init();
And in every class and struct, overload operator new this way:
void* operator new(size_t size)
{
    return GC_malloc(size);
}
And now that operator delete is not needed anymore, because the Boehm-Demers-Weiser conservative garbage collector is responsible for both freeing and deallocating every block of memory when it is no longer needed, the developer wants to forbid the delete keyword.
One way is overloading the delete operator this way:
void operator delete(void* ptr)
{
    assert(0);
}
But this is not recommended, because then the developer only finds out at run time that the delete operator was mistakenly invoked; it is better to learn of this sooner, at compile time.
So the best solution to this scenario in my opinion is to use the static_assert as shown in the beginning of this answer.
Of course, this can also be done with BOOST_STATIC_ASSERT, but I think static_assert is better and should generally be preferred.