If I have a variable inside a function (say, a large array), does it make sense to declare it both static and constexpr? constexpr guarantees that the array is created at compile time, so would the static be useless?
void f() {
static constexpr int x [] = {
// a few thousand elements
};
// do something with the array
}
Is the static actually doing anything there in terms of generated code or semantics?
The short answer is that not only is static useful, it is pretty well always going to be desired.
First, note that static and constexpr are completely independent of each other. static defines the object's lifetime during execution; constexpr specifies that the object should be available during compilation. Compilation and execution are disjoint and discontiguous, both in time and space. So once the program is compiled, constexpr is no longer relevant.
Every variable declared constexpr is implicitly const but const and static are almost orthogonal (except for the interaction with static const integers.)
The C++ object model (§1.9) requires that all objects other than bit-fields occupy at least one byte of memory and have addresses; furthermore all such objects observable in a program at a given moment must have distinct addresses (paragraph 6). This does not quite require the compiler to create a new array on the stack for every invocation of a function with a local non-static const array, because the compiler could take refuge in the as-if principle provided it can prove that no other such object can be observed.
That's not going to be easy to prove, unfortunately, unless the function is trivial (for example, it does not call any other function whose body is not visible within the translation unit) because arrays, more or less by definition, are addresses. So in most cases, the non-static const(expr) array will have to be recreated on the stack at every invocation, which defeats the point of being able to compute it at compile time.
On the other hand, a local static const object is shared by all observers, and furthermore may be initialized even if the function it is defined in is never called. So none of the above applies, and a compiler is free not only to generate only a single instance of it; it is free to generate a single instance of it in read-only storage.
So you should definitely use static constexpr in your example.
However, there is one case where you wouldn't want to use static constexpr. Unless a constexpr declared object is either ODR-used or declared static, the compiler is free to not include it at all. That's pretty useful, because it allows the use of compile-time temporary constexpr arrays without polluting the compiled program with unnecessary bytes. In that case, you would clearly not want to use static, since static is likely to force the object to exist at runtime.
In addition to given answer, it's worth noting that compiler is not required to initialize constexpr variable at compile time, knowing that the difference between constexpr and static constexpr is that to use static constexpr you ensure the variable is initialized only once.
Following code demonstrates how constexpr variable is initialized multiple times (with same value though), while static constexpr is surely initialized only once.
In addition the code compares the advantage of constexpr against const in combination with static.
#include <iostream>
#include <string>
#include <cassert>
#include <sstream>
const short const_short = 0;
constexpr short constexpr_short = 0;
// print only last 3 address value numbers
const short addr_offset = 3;
// This function will print name, value and address for given parameter
void print_properties(std::string ref_name, const short* param, short offset)
{
// determine initial size of strings
std::string title = "value \\ address of ";
const size_t ref_size = ref_name.size();
const size_t title_size = title.size();
assert(title_size > ref_size);
// create title (resize)
title.append(ref_name);
title.append(" is ");
title.append(title_size - ref_size, ' ');
// extract last 'offset' values from address
std::stringstream addr;
addr << param;
const std::string addr_str = addr.str();
const size_t addr_size = addr_str.size();
assert(addr_size - offset > 0);
// print title / ref value / address at offset
std::cout << title << *param << " " << addr_str.substr(addr_size - offset) << std::endl;
}
// here we test initialization of const variable (runtime)
void const_value(const short counter)
{
static short temp = const_short;
const short const_var = ++temp;
print_properties("const", &const_var, addr_offset);
if (counter)
const_value(counter - 1);
}
// here we test initialization of static variable (runtime)
void static_value(const short counter)
{
static short temp = const_short;
static short static_var = ++temp;
print_properties("static", &static_var, addr_offset);
if (counter)
static_value(counter - 1);
}
// here we test initialization of static const variable (runtime)
void static_const_value(const short counter)
{
static short temp = const_short;
static const short static_var = ++temp;
print_properties("static const", &static_var, addr_offset);
if (counter)
static_const_value(counter - 1);
}
// here we test initialization of constexpr variable (compile time)
void constexpr_value(const short counter)
{
constexpr short constexpr_var = constexpr_short;
print_properties("constexpr", &constexpr_var, addr_offset);
if (counter)
constexpr_value(counter - 1);
}
// here we test initialization of static constexpr variable (compile time)
void static_constexpr_value(const short counter)
{
static constexpr short static_constexpr_var = constexpr_short;
print_properties("static constexpr", &static_constexpr_var, addr_offset);
if (counter)
static_constexpr_value(counter - 1);
}
// final test call this method from main()
void test_static_const()
{
constexpr short counter = 2;
const_value(counter);
std::cout << std::endl;
static_value(counter);
std::cout << std::endl;
static_const_value(counter);
std::cout << std::endl;
constexpr_value(counter);
std::cout << std::endl;
static_constexpr_value(counter);
std::cout << std::endl;
}
Possible program output:
value \ address of const is 1 564
value \ address of const is 2 3D4
value \ address of const is 3 244
value \ address of static is 1 C58
value \ address of static is 1 C58
value \ address of static is 1 C58
value \ address of static const is 1 C64
value \ address of static const is 1 C64
value \ address of static const is 1 C64
value \ address of constexpr is 0 564
value \ address of constexpr is 0 3D4
value \ address of constexpr is 0 244
value \ address of static constexpr is 0 EA0
value \ address of static constexpr is 0 EA0
value \ address of static constexpr is 0 EA0
As you can see yourself constexpr is initilized multiple times (address is not the same) while static keyword ensures that initialization is performed only once.
Not making large arrays static, even when they're constexpr can have dramatic performance impact and can lead to many missed optimizations. It may slow down your code by orders of magnitude. Your variables are still local and the compiler may decide to initialize them at runtime instead of storing them as data in the executable.
Consider the following example:
template <int N>
void foo();
void bar(int n)
{
// array of four function pointers to void(void)
constexpr void(*table[])(void) {
&foo<0>,
&foo<1>,
&foo<2>,
&foo<3>
};
// look up function pointer and call it
table[n]();
}
You probably expect gcc-10 -O3 to compile bar() to a jmp to an address which it fetches from a table, but that is not what happens:
bar(int):
mov eax, OFFSET FLAT:_Z3fooILi0EEvv
movsx rdi, edi
movq xmm0, rax
mov eax, OFFSET FLAT:_Z3fooILi2EEvv
movhps xmm0, QWORD PTR .LC0[rip]
movaps XMMWORD PTR [rsp-40], xmm0
movq xmm0, rax
movhps xmm0, QWORD PTR .LC1[rip]
movaps XMMWORD PTR [rsp-24], xmm0
jmp [QWORD PTR [rsp-40+rdi*8]]
.LC0:
.quad void foo<1>()
.LC1:
.quad void foo<3>()
This is because GCC decides not to store table in the executable's data section, but instead initializes a local variable with its contents every time the function runs. In fact, if we remove constexpr here, the compiled binary is 100% identical.
This can easily be 10x slower than the following code:
template <int N>
void foo();
void bar(int n)
{
static constexpr void(*table[])(void) {
&foo<0>,
&foo<1>,
&foo<2>,
&foo<3>
};
table[n]();
}
Our only change is that we have made table static, but the impact is enormous:
bar(int):
movsx rdi, edi
jmp [QWORD PTR bar(int)::table[0+rdi*8]]
bar(int)::table:
.quad void foo<0>()
.quad void foo<1>()
.quad void foo<2>()
.quad void foo<3>()
In conclusion, never make your lookup tables local variables, even if they're constexpr. Clang actually optimizes such lookup tables well, but other compilers don't. See Compiler Explorer for a live example.
Related
What assurances do I have that a core constant expression (as in [expr.const].2) possibly containing constexpr function calls will actually be evaluated at compile time and on which conditions does this depend?
The introduction of constexpr implicitly promises runtime performance improvements by moving computations into the translation stage (compile time).
However, the standard does not (and presumably cannot) mandate what code a compiler produces. (See [expr.const] and [dcl.constexpr]).
These two points appear to be at odds with each other.
Under which circumstances can one rely on the compiler resolving a core constant expression (which might contain an arbitrarily complicated computation) at compile time rather than deferring it to runtime?
At least under -O0 gcc appears to actually emit code and call for a constexpr function. Under -O1 and up it doesn't.
Do we have to resort to trickery such as this, that forces the constexpr through the template system:
template <auto V>
struct compile_time_h { static constexpr auto value = V; };
template <auto V>
inline constexpr auto compile_time = compile_time_h<V>::value;
constexpr int f(int x) { return x; }
int main() {
for (int x = 0; x < compile_time<f(42)>; ++x) {}
}
When a constexpr function is called and the output is assigned to a constexpr variable, it will always be run at compiletime.
Here's a minimal example:
// Compile with -std=c++14 or later
constexpr int fib(int n) {
int f0 = 0;
int f1 = 1;
for(int i = 0; i < n; i++) {
int hold = f0 + f1;
f0 = f1;
f1 = hold;
}
return f0;
}
int main() {
constexpr int blarg = fib(10);
return blarg;
}
When compiled at -O0, gcc outputs the following assembly for main:
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 55
mov eax, 55
pop rbp
ret
Despite all optimization being turned off, there's never any call to fib in the main function itself.
This applies going all the way back to C++11, however in C++11 the fib function would have to be re-written to use conversion to avoid the use of mutable variables.
Why does the compiler include the assembly for fib in the executable sometimes? A constexpr function can be used at runtime, and when invoked at runtime it will behave like a regular function.
Used properly, constexpr can provide some performance benefits in specific cases, but the push to make everything constexpr is more about writing code that the compiler can check for Undefined Behavior.
What's an example of constexpr providing performance benefits? When implementing a function like std::visit, you need to create a lookup table of function pointers. Creating the lookup table every time std::visit is called would be costly, and assigning the lookup table to a static local variable would still result in measurable overhead because the program has to check if that variable's been initialized every time the function is run.
Thankfully, you can make the lookup table constexpr, and the compiler will actually inline the lookup table into the assembly code for the function so that the contents of the lookup table is significantly more likely to be inside the instruction cache when std::visit is run.
Does C++20 provide any mechanisms for guaranteeing that something runs at compiletime?
If a function is consteval, then the standard specifies that every call to the function must produce a compile-time constant.
This can be trivially used to force the compile-time evaluation of any constexpr function:
template<class T>
consteval T run_at_compiletime(T value) {
return value;
}
Anything given as a parameter to run_at_compiletime must be evaluated at compile-time:
constexpr int fib(int n) {
int f0 = 0;
int f1 = 1;
for(int i = 0; i < n; i++) {
int hold = f0 + f1;
f0 = f1;
f1 = hold;
}
return f0;
}
int main() {
// fib(10) will definitely run at compile time
return run_at_compiletime(fib(10));
}
Never; the C++ standard permits almost the entire compilation to occur at "runtime". Some diagnostics have to be done at compile time, but nothing prevents insanity on the part of the compiler.
Your binary could be a copy of the compiler with your source code appended, and C++ wouldn't say the compiler did anything wrong.
What you are looking at is a QoI - Quality of Implrmentation - issue.
In practice, constexpr variables tend to be compile time computed, and template parameters are always compile time computed.
consteval can also be used to markup functions.
I tried to create a vector with a reserve capacity at compile-time.
C++ seems not to be smart to prevent me from doing this
// global scope
constexpr auto vec = []() -> std::vector<uint16_t> {
auto vec = std::vector<uint16_t>();
vec.reserve(9);
return vec;
}();
int main() {
...
}
error: the type 'const std::vector' of 'constexpr' variable 'vec' is not literal
Thank you!
constexpr tells that the function can be evaluated at compile time. The returned object cannot change after that. So qualifying a vector constexpr kills it's purpose. It is supposed to be dynamic structure. If you constexpr it you could use simple array instead.
It's seems you don't want to have a constexpr vector, but one with preallocated space. You should check your compiler for this. E.g. new gcc is smart enough to merge reserve into initial allocation, at least for int type:
The following with -O2 (https://godbolt.org/z/BKE28g):
#include <vector>
std::vector<int> f() {
std::vector<int> v;
v.reserve(11);
}
Produces:
f():
mov edi, 44
sub rsp, 8
call operator new(unsigned long)
mov rdi, rax
call operator delete(void*)
44 is no coincidence. It's 11*sizeof(int). You can throw in in a return statement which complicates the output but there is still one operator new() with appropriate size.
I have used constexpr to calculate hash codes in compile times. Code compiles correctly, runs correctly. But I dont know, if hash values are compile time or run time. If I trace code in runtime, I dont do into constexpr functions. But, those are not traced even for runtime values (calculate hash for runtime generated string - same methods).
I have tried to look into dissassembly, but I quite dont understand it
For debug purposes, my hash code is only string length, using this:
constexpr inline size_t StringLengthCExpr(const char * const str) noexcept
{
return (*str == 0) ? 0 : StringLengthCExpr(str + 1) + 1;
};
I have ID class created like this
class StringID
{
public:
constexpr StringID(const char * key);
private:
const unsigned int hashID;
}
constexpr inline StringID::StringID(const char * key)
: hashID(StringLengthCExpr(key))
{
}
If I do this in program main method
StringID id("hello world");
I got this disassembled code (part of it - there is a lot of more from inlined methods and other stuff in main)
;;; StringID id("hello world");
lea eax, DWORD PTR [-76+ebp]
lea edx, DWORD PTR [id.14876.0]
mov edi, eax
mov esi, edx
mov ecx, 4
mov eax, ecx
shr ecx, 2
rep movsd
mov ecx, eax
and ecx, 3
rep movsb
// another code
How can I tell from this, that "hash value" is a compile time. I don´t see any constant like 11 moved to register. I am not quite good with ASM, so maybe it is correct, but I am not sure what to check or how to be sure, that "hash code" values are compile time and not computed in runtime from this code.
(I am using Visual Studio 2013 + Intel C++ 15 Compiler - VS Compiler is not supporting constexpr)
Edit:
If I change my code and do this
const int ix = StringLengthCExpr("hello world");
mov DWORD PTR [-24+ebp], 11 ;55.15
I have got the correct result
Even with this
change private hashID to public
StringID id("hello world");
// mov DWORD PTR [-24+ebp], 11 ;55.15
printf("%i", id.hashID);
// some other ASM code
But If I use private hashID and add Getter
inline uint32 GetHashID() const { return this->hashID; };
to ID class, then I got
StringID id("hello world");
//see original "wrong" ASM code
printf("%i", id.GetHashID());
// some other ASM code
The most convenient way is to use your constexpr in a static_assert statement. The code will not compile when it is not evaluated during compile time and the static_assert expression will give you no overhead during runtime (and no unnecessary generated code like with a template solution).
Example:
static_assert(_StringLength("meow") == 4, "The length should be 4!");
This also checks whether your function is computing the result correctly or not.
If you want to ensure that a constexpr function is evaluated at compile time, use its result in something which requires compile-time evaluation:
template <size_t N>
struct ForceCompileTimeEvaluation { static constexpr size_t value = N; };
constexpr inline StringID::StringID(const char * key)
: hashID(ForceCompileTimeEvaluation<StringLength(key)>::value)
{}
Notice that I've renamed the function to just StringLength. Name which start with an underscore followed by an uppercase letter, or which contain two consecutive underscores, are not legal in user code. They're reserved for the implementation (compiler & standard library).
In the future(c++20) you can use the consteval specifier to declare a function, which must be evaluated at compile time, thus requiring a constant expression context.
The consteval specifier declares a function or function template to be
an immediate function, that is, every call to the function must
(directly or indirectly) produce a compile time constant expression.
An example from cppreference(see consteval):
consteval int sqr(int n) {
return n*n;
}
constexpr int r = sqr(100); // OK
int x = 100;
int r2 = sqr(x); // Error: Call does not produce a constant
consteval int sqrsqr(int n) {
return sqr(sqr(n)); // Not a constant expression at this point, but OK
}
constexpr int dblsqr(int n) {
return 2*sqr(n); // Error: Enclosing function is not consteval and sqr(n) is not a constant
}
There are a few ways to force compile-time evaluation. But these aren't as flexible and easy to setup as what you'd expect when using constexpr. And they don't help you in finding if the compile-time constants are actually been used.
What you'd want for constexpr is to work where you expect it to beneficial. Therefor you try to meet its requirements. But Then you need to test if the code you expect to be generated at compile-time has been generated, and if the users actually consume the generated result or trigger the function at runtime.
I've found two ways to detect if a class or (member)function is using the compile-time or runtime evaluated path.
Using the property of constexpr functions returning true from the noexcept operator (bool noexcept( expression )) if evaluated at compile-time. Since the generated result will be a compile-time constant. This method is quite accessible and usable with Unit-testing.
(Be aware that marking these functions explicitly noexcept will break the test.)
Source: cppreference.com (2017/3/3)
Because the noexcept operator always returns true for a constant expression, it can be used to check if a particular invocation of a constexpr function takes the constant expression branch (...)
(less convenient) Using a debugger: By putting a break-point inside the function marked constexpr. Whenever the break-point isn't triggered, the compiler evaluated result was used. Not the easiest, but possible for incidental checking.
Soure: Microsoft documentation (2017/3/3)
Note: In the Visual Studio debugger, you can tell whether a constexpr function is being evaluated at compile time by putting a breakpoint inside it. If the breakpoint is hit, the function was called at run-time. If not, then the function was called at compile time.
I've found both these methods to be useful while experimenting with constexpr. Although I haven't done any testing with environments outside VS2017. And haven't been able to find an explicit statement supporting this behaviour in the current draft of the standard.
The following trick can help to check if the constexpr function has been evaluated during compile time only:
With gcc you can compile the source file with assembly listing + c sources; given that both the constexpr and its calls are in source file try.cpp
gcc -std=c++11 -O2 -Wa,-a,-ad try.cpp | c++filt >try.lst
If the constexpr function has been evaluated during run time time then you will see the compiled function and a call instruction (call function_name on x86) in the assembly listing try.lst (note that c++filt command has undecorated the linker names)
Interesting that it I always see a call if compiled without optimization (without -O2 or -O3 option).
Simply put it in constexpr variable.
constexpr StringID id("hello world");
constexpr int ix = StringLengthCExpr("hello world");
A constexpr variable is always a real constant expression. If it compiles, it is computed on compile time.
My question related to style and underlying efficiency, if there's a difference at all, for effectively static member variables.
Consider:
class C {
public:
static const int const_m = 13;
static const int const_n = 17;
};
class D {
public:
enum : int { const_m = 13 };
enum : int { const_n = 17 };
};
In both cases I can write (in a main() fcxn):
int main() {
int cm = C::const_m;
int cn = C::const_n;
int dm = D::const_m;
int dn = D::const_n;
}
so the result is the same, and the coding style looks the same. In class C, the value of const_m will be put in the static section of the compiled code, and const_m will refer to the address of this value. In class D, the enum is part of the memory footprint of the class.
I've called g++ -S on both these classes and looked the trivial main() function above. I've also done this with -O0 and -O3 and I can see no different in the asm code. The key ops that correspond to the c++ code above are:
movl $13, -4(%rbp)
movl $17, -8(%rbp)
movl $13, -12(%rbp)
movl $17, -16(%rbp)
Is there a consideration that I'm missing when electing to use one style or the other?
Thanks in advance, -Jay
Edit
class C {
public:
static constexpr int const_m = 13;
static constexpr int const_n = 17;
};
ensures that const_m is compile-time available.
I don't believe you can use the static const to size an array for example, because they aren't technically required to be compile time constants.
Efficiency should definitely be the same for the two choices here. They are "compile time constants", so the compiler will be able to use the value directly without complication.
As to which is "better" from a style perspective, I think it really depends on what you are trying to achieve and what the meaning of the constants are. Enum's are definitely my choice if you have a number of different constants that are closely related, where the static const int makes more sense if it's just a single, standalone constant (max_size or magic_file_marker_value).
As touched on in the other answer, it's possible to come up with a situation where a static const int something = ...; is not a compile time constant - e.g.
static const time_t seed = time(NULL);
This would not allow you to then use it where a compile-time constant is needed [such as array sizes], because although it's a CONSTANT from many perspectives, it is not a compile time known value.
I've been trying to use 'thunking' so I can use member functions to legacy APIs which expects a C function. I'm trying to use a similar solution to this. This is my thunk structure so far:
struct Thunk
{
byte mov; // ↓
uint value; // mov esp, 'value' <-- replace the return address with 'this' (since this thunk was called with 'call', we can replace the 'pushed' return address with 'this')
byte call; // ↓
int offset; // call 'offset' <-- we want to return here for ESP alignment, so we use call instead of 'jmp'
byte sub; // ↓
byte esp; // ↓
byte num; // sub esp, 4 <-- pop the 'this' pointer from the stack
//perhaps I should use 'ret' here as well/instead?
} __attribute__((packed));
The following code is a test of mine which uses this thunk structure (but it does not yet work):
#include <iostream>
#include <sys/mman.h>
#include <cstdio>
typedef unsigned char byte;
typedef unsigned short ushort;
typedef unsigned int uint;
typedef unsigned long ulong;
#include "thunk.h"
template<typename Target, typename Source>
inline Target brute_cast(const Source s)
{
static_assert(sizeof(Source) == sizeof(Target));
union { Target t; Source s; } u;
u.s = s;
return u.t;
}
void Callback(void (*cb)(int, int))
{
std::cout << "Calling...\n";
cb(34, 71);
std::cout << "Called!\n";
}
struct Test
{
int m_x = 15;
void Hi(int x, int y)
{
printf("X: %d | Y: %d | M: %d\n", x, y, m_x);
}
};
int main(int argc, char * argv[])
{
std::cout << "Begin Execution...\n";
Test test;
Thunk * thunk = static_cast<Thunk*>(mmap(nullptr, sizeof(Thunk),
PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0));
thunk->mov = 0xBC; // mov esp
thunk->value = reinterpret_cast<uint>(&test);
thunk->call = 0xE8; // call
thunk->offset = brute_cast<uint>(&Test::Hi) - reinterpret_cast<uint>(thunk);
thunk->offset -= 10; // Adjust the relative call
thunk->sub = 0x83; // sub
thunk->esp = 0xEC; // esp
thunk->num = 0x04; // 'num'
// Call the function
Callback(reinterpret_cast<void (*)(int, int)>(thunk));
std::cout << "End execution\n";
}
If I use that code; I receive a segmentation fault within the Test::Hi function. The reason is obvious (once you analyze the stack in GDB) but I do not know how to fix this. The stack is not aligned properly.
The x argument contains garbage but the y argument contains the this pointer (see the Thunk code). That means the stack is misaligned by 8 bytes, but I still don't know why this is the case. Can anyone tell why this is happening? x and y should contain 34 and 71 respectively.
NOTE: I'm aware of the fact that this is does not work in all scenarios (such as MI and VC++ thiscall convention) but I want to see if I can get this work, since I would benefit from it a lot!
EDIT: Obviously I also know that I can use static functions, but I see this more as a challenge...
Suppose you have a standalone (non-member, or maybe static) cdecl function:
void Hi_cdecl(int x, int y)
{
printf("X: %d | Y: %d | M: %d\n", x, y, m_x);
}
Another function calls it this way:
push 71
push 36
push (return-address)
call (address-of-hi)
add esp, 8 (stack cleanup)
You want to replace this by the following:
push 71
push 36
push this
push (return-address)
call (address-of-hi)
add esp, 4 (cleanup of this from stack)
add esp, 8 (stack cleanup)
For this, you have to read the return-address from the stack, push this, and then, push the return-address. And for the cleanup, add 4 (not subtract) to esp.
Regarding the return address - since the thunk must do some cleanup after the callee returns, it must store the original return-address somewhere, and push the return-address of the cleanup part of the thunk. So, where to store the original return-address?
In a global variable - might be an acceptable hack (since you probably don't need your solution to be reentrant)
On the stack - requires moving the whole block of parameters (using a machine-language equivalent of memmove), whose length is pretty much unknown
Please also note that the resulting stack is not 16-byte-aligned; this can lead to crashes if the function uses certain types (those that require 8-byte and 16-byte alignment - the SSE ones, for example; also maybe double).