C++ constexpr function in return statement - c++

Why is a constexpr function not evaluated at compile time, but at runtime, in the return statement of the main function?
I tried
template<int x>
constexpr int fac() {
    return fac<x - 1>() * x;
}
template<>
constexpr int fac<1>() {
    return 1;
}
int main() {
    const int x = fac<3>();
    return x;
}
and the result is
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 6
mov eax, 6
pop rbp
ret
with gcc 8.2. But when I call the function in the return statement
template<int x>
constexpr int fac() {
    return fac<x - 1>() * x;
}
template<>
constexpr int fac<1>() {
    return 1;
}
int main() {
    return fac<3>();
}
I get
int fac<1>():
push rbp
mov rbp, rsp
mov eax, 1
pop rbp
ret
main:
push rbp
mov rbp, rsp
call int fac<3>()
nop
pop rbp
ret
int fac<2>():
push rbp
mov rbp, rsp
call int fac<1>()
add eax, eax
pop rbp
ret
int fac<3>():
push rbp
mov rbp, rsp
call int fac<2>()
mov edx, eax
mov eax, edx
add eax, eax
add eax, edx
pop rbp
ret
Why is the first code evaluated at compile time and the second at runtime?
Also, I tried both snippets with clang 7.0.0 and both are evaluated at runtime. Why does clang not evaluate these constexpr calls at compile time?
All evaluation was done in godbolt compiler explorer.

A common misconception with regard to constexpr is that it means "this will be evaluated at compile time"1.
It is not. constexpr was introduced to let us write natural code that may produce constant expressions in contexts that need them. It means "this must be evaluatable at compile time", which is what the compiler will check.
So if you wrote a constexpr function returning an int, you can use it to calculate a template argument, an initializer for a constexpr variable (also const if it's an integral type) or an array size. You can use the function to obtain natural, declarative, readable code instead of the old meta-programming tricks one needed to resort to in the past.
But a constexpr function is still a regular function. The constexpr specifier doesn't mean a compiler has2 to optimize it to heck and do constant folding at compile time. It's best not to confuse it for such a hint.
1 - Thanks user463035818 for the phrasing.
2 - C++20 and consteval is a different story, however :)

StoryTeller's answer is good, but I think there's a slightly different take possible.
With constexpr, there are three situations to distinguish:
The result is needed in a compile-time context, such as array sizes. In this case, the arguments too must be known at compile time. Evaluation is probably at compile time, and at least all diagnosable errors will be found at compile time.
The arguments are only known at run time, and the result is not needed at compile time. In this case, evaluation necessarily has to happen at run time.
The arguments may be available at compile time, but the result is needed only at run time.
The fourth combination (arguments available only at runtime, result needed at compile time) is an error; the compiler will reject such code.
Now in cases 1 and 3 the calculation could happen at compile time, as all inputs are available. But to facilitate case 2, the compiler must be able to create a run-time version, and it may decide to use this variant in the other cases as well - if it can.
E.g. some compilers internally support variable-length arrays, so even though the language requires compile-time array bounds, the implementation may decide not to evaluate them at compile time.


Why can compiler not optimize out unused static std::string?

If I compile this code with GCC or Clang and enable -O2 optimizations, I still get some global object initialization. Is it even possible for any code to reach these variables?
#include <string>
static const std::string s = "";
int main() { return 0; }
Compiler output:
main:
xor eax, eax
ret
_GLOBAL__sub_I_main:
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:s
mov edi, OFFSET FLAT:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev
mov QWORD PTR s[rip], OFFSET FLAT:s+16
mov QWORD PTR s[rip+8], 0
mov BYTE PTR s[rip+16], 0
jmp __cxa_atexit
Specifically, I was not expecting the _GLOBAL__sub_I_main: section.
Godbolt link
Edit:
Even with a simple custom defined type, the compiler still generates some code.
class Aloha
{
public:
    Aloha () : i(1) {}
    ~Aloha() = default;
private:
    int i;
};
static const Aloha a;
int main() { return 0; }
Compiler output:
main:
xor eax, eax
ret
_GLOBAL__sub_I_main:
ret
Compiling that code with the short string optimization (SSO) may be equivalent to taking the address of std::string's member variable. The constructor has to analyze the string length at compile time and choose whether it fits into the internal storage of the std::string object or whether memory has to be allocated dynamically — but it could then find that the string is never read, so the allocation code can be optimized out.
The lack of optimization in this case might be an optimization flaw limited to simple outlying examples like this one:
const int i = 3;
int main()
{
    return (long long)(&i); // to make sure the address was used
}
GCC generates code:
i:
.long 3 ; this is a variable
main:
push rbp
mov rbp, rsp
mov eax, OFFSET FLAT:i
pop rbp
ret
GCC would not optimize this code as well:
const int i = 3;
const int *p = &i;
int main() { return 0; }
Static variables declared at file scope, especially const-qualified ones, can be optimized out under the as-if rule unless their address is used; GCC does that, but only for const-qualified ones, regardless of how they are used. Taking the address of a variable is observable behaviour, because the address can be passed somewhere. Logic that would trace where the address goes would be too complex to implement and of little practical value.
Of course, the code that doesn't use address
const int i = 3;
int main() { return i; }
results in optimizing out reserved storage:
main:
mov eax, 3
ret
What about C++20's constexpr construction of std::string? Under the older rules it could not be part of a compile-time expression if the result depended on the arguments: std::string may allocate memory dynamically if the string is too long, which isn't a compile-time action. At the moment, it appears that the only mainstream compiler supporting the C++20 features required for this is MSVC, under certain conditions.

Is it beneficial to make library functions templated to avoid compiler instructions?

Let's say I'm creating my own library in namespace l. Would it be beneficial to make as many members of the namespace as possible templated? This would encourage the compiler to only generate instructions for members that are actually called by the user of the library. To make my point clear, I'll demonstrate it here:
1)
namespace l{
template <typename = void>
int f() { return 3; }
}
vs.
2)
namespace l{
int f() { return 3; }
}
The function is not called in main, in order to show the difference.
int main() { return EXIT_SUCCESS; }
function 1) does not require additional instructions for l::f():
main:
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], 0
mov eax, 0
pop rbp
ret
function 2) does require additional instructions for l::f() ( If, again, l::f() is not called):
l::f():
push rbp
mov rbp, rsp
mov eax, 3
pop rbp
ret
main:
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], 0
mov eax, 0
pop rbp
ret
tl;dr
Is it beneficial to make library functions templated to avoid compiler instructions?
No. Emitting dead code isn't the expensive part of compiling. File access, parsing and optimization (not necessarily in that order) take time, and this idea forces library clients to read & parse more code than in the regular header + library model.
Templates are usually blamed for slowing builds, not speeding them up.
It also means you can't build your library ahead of time, so each user needs to compile whichever parts they use from scratch, in every translation unit where they're used.
The total time spent compiling will probably be greater with the templated version. You'd have to profile to be sure (and I suspect this f is so small as to be immeasurable either way) but I have a hard time seeing this as a useful improvement.
Your comparison isn't representative anyway - a good compiler will discard dead code at link time. Some will also be able to inline code from static libraries, so there's no reliable effect on either compile-time or runtime performance.

Macro replace itself using undesired braces

I have problems with macros, since they are being replaced together with braces.
Since I will need to compile for different operating systems [WINDOWS, OSX, ANDROID, iOS], I'm trying to use typedefs for the basic C++ types, to replace them easily and test performance.
Since I'm doing lots of static_casts, I thought I could use a macro to do the cast only when it's needed (CPU time is critical for my software). That way, the static_cast will only be performed when the types are different, instead of doing weird things like this:
const int tv = 8;
const int tvc = static_cast<int>(8);
So, depending on whether FORCE_USE32 is enabled or not, it would choose the best version.
Visual Studio 2017, using the default compiler, gives me an error when I do something like this:
#ifndef FORCE_USE32
#define FORCE_USE32 0
#endif
#if FORCE_USE32
typedef int s08;
#define Cs08(v) {v}
#else
typedef char s08;
#define Cs08(v) {static_cast<s08>(v)}
#endif
// this line give me an error because Cs08 is replaced by {static_cast<s08>(1)} instead just static_cast<s08>(1)
std::array<s08, 3> myArray{Cs08(1), 0, 0};
I know I could solve easily creating a variable before I do the array, something like this
const s08 tempVar = Cs08(1);
std::array<s08, 3> myArray{tempVar, 0, 0};
But I do not understand the reason, and I want to keep my code as clean as possible. Is there any way to include the macro inside the array definition?
You are trying to solve a non-problem
const int tvc = static_cast<int>(8);
This will not use any CPU cycles. How dumb do you think compilers are nowadays? Even with no optimizations, the above cast is a no-op (no operation). There won't be any additional instructions generated for the cast.
auto test(int a) -> int
{
return a;
}
auto test_cast(int a) -> int
{
return static_cast<int>(a);
}
With no optimization enabled the two functions generate identical code:
test(int): # #test(int)
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], edi
mov eax, dword ptr [rbp - 4]
pop rbp
ret
test_cast(int): # #test_cast(int)
push rbp
mov rbp, rsp
mov dword ptr [rbp - 4], edi
mov eax, dword ptr [rbp - 4]
pop rbp
ret
With -O3 they get:
test(int): # #test(int)
mov eax, edi
ret
test_cast(int): # #test_cast(int)
mov eax, edi
ret
Coming back to how smart compilers (actually, the optimization algorithms) are: with optimizations enabled, a compiler can do crazy things, like loop unrolling, converting a recursive function to an iterative one, and removing entire blocks of redundant code. What you are doing is premature optimization. If your code is performance critical then you need a decent understanding of assembly, compiler optimizations and system architecture. And then you don't just blindly optimize what you think is slow. You write code for readability first, and then you profile.
Answering your macro problem: just remove the {} from the macro:
#define Cs08(v) v
#define Cs08(v) static_cast<s08>(v)

Constexpr variable evaluation

Here is my code and I need clarification on what's happening:
constexpr int funct(int x){
    return x + 1;
}
int main(){
    int x = funct(10);
    return 0;
}
constexpr allows compile-time calculation. Based on my code above, since funct is declared as constexpr, it is allowed to do compile-time calculations if the arguments are constants or constexpr's themselves.
The part that I am confused with is int x. Since it is not declared as constexpr, does that mean that x will get its value at runtime? Does declaring it as constexpr int x mean that x will get its value at compile time, unlike plain int x?
It depends on the compiler in question on many counts: what sorts of optimizations take place, etc. However, constexpr does not inherently force compile-time calculation.
Take this code:
#include <cstdio>
constexpr int test(int in)
{
    return in + 25;
}
int main(int argc, char* argv[])
{
    printf("Test: %u\n", test(5));
    printf("Two: %u\n", test(10));
}
Under GCC 4.8.4 on my x86_64 Gentoo box, that actually still makes a call on both counts to the supposedly compile-time "test". The lines I used were
g++ -std=c++11 -Wall -g -c main.cpp -o obj/Debug/main.o
g++ -o bin/Debug/TestProject obj/Debug/main.o
So on the code above, that produces the following machine code:
0x40061c mov edi,0x5
0x400621 call 0x400659 <test(int)>
0x400626 mov esi,eax
0x400628 mov edi,0x4006f4
0x40062d mov eax,0x0
0x400632 call 0x4004f0 <printf@plt>
0x400637 mov edi,0xa
0x40063c call 0x400659 <test(int)>
0x400641 mov esi,eax
0x400643 mov edi,0x4006fd
0x400648 mov eax,0x0
0x40064d call 0x4004f0 <printf@plt>
Where the asm block for "test" is:
0x400659 push rbp
0x40065a mov rbp,rsp
0x40065d mov DWORD PTR [rbp-0x4],edi
0x400660 mov eax,DWORD PTR [rbp-0x4]
0x400663 add eax,0x19
0x400666 pop rbp
0x400667 ret
So, as you can see in that situation, it seems to hold pretty much no effect over how GCC produced that code. Even if you do this, you will still get runtime calculations:
int value = test(30);
printf("Value: %u\n", value);
Where that produces this (the address of test changed slightly due to adding some more code):
0x40061c mov edi,0x1e
0x400621 call 0x40067a <test(int)>
0x400626 mov DWORD PTR [rbp-0x4],eax
0x400629 mov eax,DWORD PTR [rbp-0x4]
0x40062c mov esi,eax
0x40062e mov edi,0x400714
0x400633 mov eax,0x0
0x400638 call 0x4004f0 <printf@plt>
It does, however, produce the expected result if you declare the value itself as constexpr:
constexpr int value = test(30);
printf("Value: %u\n", value);
And the associated code is:
0x400623 mov esi,0x37
0x400628 mov edi,0x400714
0x40062d mov eax,0x0
0x400632 call 0x4004f0 <printf@plt>
So essentially, you're not guaranteed a compile-time calculation if you simply prepend the function declaration with constexpr. You also need to declare the variable you assign to as constexpr. Such a declaration actually requires the compiler to statically evaluate the result. GCC actually yells at you if you try to do this and "test" isn't declared constexpr:
main.cpp|10|error: call to non-constexpr function ‘int test(int)’|

Register keyword in C++

What is difference between
int x=7;
and
register int x=7;
?
I am using C++.
register is a hint to the compiler, advising it to store that variable in a processor register instead of memory (for example, instead of the stack).
The compiler may or may not follow that hint.
According to Herb Sutter in "Keywords That Aren't (or, Comments by Another Name)":
A register specifier has the same semantics as an auto specifier...
According to Herb Sutter, register is "exactly as meaningful as whitespace" and has no effect on the semantics of a C++ program.
In C++ as it existed in 2010, any valid program which uses the keywords "auto" or "register" will be semantically identical to one with those keywords removed (unless they appear in stringized macros or similar contexts). In that sense the keywords are useless for properly-compiling programs. On the other hand, the keywords might be useful in certain macro contexts to ensure that improper usage of a macro causes a compile-time error rather than producing bogus code.
In C++11 and later versions of the language, the auto keyword was re-purposed to act as a pseudo-type for objects which are initialized, which a compiler will automatically replace with the type of the initializing expression. Thus, in C++03, the declaration: auto int i=(unsigned char)5; was equivalent to int i=5; when used within a block context, and auto i=(unsigned char)5; was a constraint violation. In C++11, auto int i=(unsigned char)5; became a constraint violation while auto i=(unsigned char)5; became equivalent to auto unsigned char i=5;.
With today's compilers, probably nothing. It was originally a hint to place a variable in a register for faster access, but most compilers today ignore that hint and decide for themselves.
register is deprecated in C++11. It is unused and reserved in C++17.
Source: http://en.cppreference.com/w/cpp/keyword/register
Almost certainly nothing.
register is a hint to the compiler that you plan on using x a lot, and that you think it should be placed in a register.
However, compilers are now far better at determining what values should be placed in registers than the average (or even expert) programmer is, so compilers just ignore the keyword and do what they want.
The register keyword was useful for:
Inline assembly.
Expert C/C++ programming.
Cacheable variables declaration.
An example from a production system where the register keyword was required:
typedef unsigned long long Out;
volatile Out out,tmp;
Out register rax asm("rax");
asm volatile("rdtsc":"=A"(rax));
out=out*tmp+rax;
It has been deprecated since C++11 and is unused and reserved in C++17.
As of GCC 9.3, compiling with -std=c++2a, register produces a compiler warning, but it still has the desired effect and behaves identically to C's register when compiling without -O1 through -Ofast optimisation flags, in the respect of this answer. Using clang++-7, however, causes a compiler error. So yes, register optimisations only make a difference on standard compilation with no -O flags, but they're basic optimisations that the compiler would figure out even with -O1.
The only difference is that in C++ you are allowed to take the address of a register variable. The optimisation therefore only occurs if you don't take the address of the variable or its aliases (to create a pointer), and don't take a reference to it in the code. (The reference case only matters on -O0, because a reference also has an address: it's a const pointer on the stack, which, like a pointer, can be optimised off the stack when compiling with -Ofast — in fact references never appear on the stack with -Ofast, because unlike a pointer they cannot be made volatile and their addresses cannot be taken.) Otherwise it behaves as if you hadn't used register, and the value is stored on the stack.
On -O0, another difference is that const register does not behave the same on gcc C and gcc C++. On gcc C, const register behaves like plain register: the register optimisation applies, but const at block scope has no optimisation there. On clang C, register does nothing and only the const block-scope optimisation applies. On gcc C++, both the register and the const block-scope optimisations combine.
#include <stdio.h> // yes, it's C code compiled as C++
int main(void) {
    const register int i = 3;
    printf("%d", i);
    return 0;
}
int i = 3;:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 3
mov eax, DWORD PTR [rbp-4]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
register int i = 3;:
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
push rbx
sub rsp, 8
mov ebx, 3
mov esi, ebx
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
mov rbx, QWORD PTR [rbp-8] //callee restoration
leave
ret
const int i = 3;
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 3 //still saves to stack
mov esi, 3 //immediate substitution
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
const register int i = 3;
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov esi, 3 //loads straight into esi saving rbx push/pop and extra indirection (because C++ block-scope const is always substituted immediately into the instruction)
mov edi, OFFSET FLAT:.LC0 // can't optimise away because printf only takes const char*
mov eax, 0 //zeroed: https://stackoverflow.com/a/6212755/7194773
call printf
mov eax, 0 //default return value of main is 0
pop rbp //nothing else pushed to stack -- more efficient than leave (rsp == rbp already)
ret
register tells the compiler to (1) store a local variable in a callee-saved register, in this case rbx, and (2) optimise out the stack writes if the address of the variable is never taken. const tells the compiler to substitute the value immediately (instead of assigning it a register or loading it from memory), though it still writes the local variable to the stack as default behaviour. const register is the combination of these two optimisations. This is as slimline as it gets.
Also, on gcc C and C++, register on its own seems to create a random 16 byte gap on the stack for the first local on the stack, which doesn't happen with const register.
Compiling using -Ofast however; register has 0 optimisation effect because if it can be put in a register or made immediate, it always will be and if it can't it won't be; const still optimises out the load on C and C++ but at file scope only; volatile still forces the values to be stored and loaded from the stack.
.LC0:
.string "%d"
main:
//optimises out push and change of rbp
sub rsp, 8 //https://stackoverflow.com/a/40344912/7194773
mov esi, 3
mov edi, OFFSET FLAT:.LC0
xor eax, eax //xor 2 bytes vs 5 for mov eax, 0
call printf
xor eax, eax
add rsp, 8
ret
Consider a case where the compiler's optimizer has two variables and is forced to spill one onto the stack. Suppose both variables have the same weight to the compiler. Given there is no difference, the compiler will arbitrarily spill one of them. On the other hand, the register keyword gives the compiler a hint as to which variable will be accessed more frequently. It is similar to the x86 prefetch instruction, but for the compiler's optimizer.
Obviously register hints are similar to user-provided branch-probability hints, and can be inferred from them. If the compiler knows that some branch is taken often, it will keep the branch-related variables in registers. So I suggest caring more about branch hints, and forgetting about register. Ideally your profiler should communicate with the compiler somehow and spare you from even thinking about such nuances.