Related
Recently I read on IsoCpp about how compiler known size of array created with new. The FAQ describes two ways of implementation, but so basically and without any internal information. I tried to find an implementation of these mechanisms in STL sources from Microsoft and GCC, but as I see, both of them just call the malloc internally. I tried to go deeper and found an implementation of the malloc function in GCC, but I couldn't figure out where the magic happens.
Is it possible to find how this works, or it implemented in system runtime libraries?
Here is where the compiler stores the size in the source code for GCC: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/gcc/cp/init.c#L3319-L3325
And the equivalent place in the source code for Clang: https://github.com/llvm/llvm-project/blob/c11051a4001c7f89e8655f1776a75110a562a45e/clang/lib/CodeGen/ItaniumCXXABI.cpp#L2183-L2185
What the compilers do is store a "cookie" which is the number of elements allocated (the N in new T[N]) immediately before the pointer that new T[N] returns. This in turn means that a few extra bytes have to be allocated in the call to operator new[]. The compiler generates code to do this at runtime.
operator new[](std::size_t x) itself does no work: It simply allocates x bytes. The compiler makes new T[N] call operator new[](sizeof(T) * N + cookie_size).
The compiler does not "know" the size (it's a run-time value), but it knows how to generate code to retrieve the size on a subsequent delete[] p.
At least for GCC targeting x86_64, it is possible to investigate this question by looking at the assembly GCC generates for this simple program:
#include <iostream>
struct Foo
{
int x, y;
~Foo() { std::cout << "Delete foo " << this << std::endl; }
};
Foo * create()
{
return new Foo[8];
}
void destroy(Foo * p)
{
delete[] p;
}
int main()
{
destroy(create());
}
Using Compiler Explorer, we see this code generated for the create function:
create():
sub rsp, 8
mov edi, 72
call operator new[](unsigned long)
mov QWORD PTR [rax], 8
add rax, 8
add rsp, 8
ret
It looks to me like the compiler is calling operator new[] to allocate 72 bytes of memory, which is 8 bytes more than is needed for the storage of the objects (8 * 8 = 64). Then it is storing the object count (8) at the beginning of this allocation, and adding 8 bytes to the pointer before returning it, so the pointer points to the first object.
This is one of the methods what was listed in the document you linked to:
Over-allocate the array and put n just to the left of the first Fred object.
I searched a little bit in the source code of libstdc++ to see if this was implmented by the standard library or the compiler, and I think it's actually implemented by the compiler itself, though I could be wrong.
I want to use vector<char> as a buffer. The interface is perfect for my needs, but there's a performance penalty when resizing it beyond its current size, since the memory is initialized. I don't need the initialization, since the data will be overwritten in any case by some third-party C functions. Is there a way or a specific allocator to avoid the initialization step? Note that I do want to use resize(), not other tricks like reserve() and capacity(), because I need size() to always represent the significative size of my "buffer" at any moment, while capacity() might be greater than its size after a resize(), so, again, I cannot rely on capacity() as a significative information for my application. Furthemore, the (new) size of the vector is never known in advance, so I cannot use std::array. If vector cannot be configured that way, I'd like to know what kind of container or allocator I could use instead of vector<char, std::alloc>. The only requirement is that the alternative to vector must at most be based on STL or Boost. I have access to C++11.
It is a known issue that initialization can not be turned off even explicitly for std::vector.
People normally implement their own pod_vector<> that does not do any initialization of the elements.
Another way is to create a type which is layout-compatible with char, whose constructor does nothing:
struct NoInitChar
{
char value;
NoInitChar() noexcept {
// do nothing
static_assert(sizeof *this == sizeof value, "invalid size");
static_assert(__alignof *this == __alignof value, "invalid alignment");
}
};
int main() {
std::vector<NoInitChar> v;
v.resize(10); // calls NoInitChar() which does not initialize
// Look ma, no reinterpret_cast<>!
char* beg = &v.front().value;
char* end = beg + v.size();
}
There's nothing in the standard library that meets your requirements, and nothing I know of in boost either.
There are three reasonable options I can think of:
Stick with std::vector for now, leave a comment in the code and come back to it if this ever causes a bottleneck in your application.
Use a custom allocator with empty construct/destroy methods - and hope your optimiser will be smart enough to remove any calls to them.
Create a wrapper around a a dynamically allocated array, implementing only the minimal functionality that you require.
As an alternative solution that works with vectors of different pod types:
template<typename V>
void resize(V& v, size_t newSize)
{
struct vt { typename V::value_type v; vt() {}};
static_assert(sizeof(vt[10]) == sizeof(typename V::value_type[10]), "alignment error");
typedef std::vector<vt, typename std::allocator_traits<typename V::allocator_type>::template rebind_alloc<vt>> V2;
reinterpret_cast<V2&>(v).resize(newSize);
}
And then you can:
std::vector<char> v;
resize(v, 1000); // instead of v.resize(1000);
This is most likely UB, even though it works properly for me for cases where I care more about performance. Difference in generated assembly as produced by clang:
test():
push rbx
mov edi, 1000
call operator new(unsigned long)
mov rbx, rax
mov edx, 1000
mov rdi, rax
xor esi, esi
call memset
mov rdi, rbx
pop rbx
jmp operator delete(void*)
test_noinit():
push rax
mov edi, 1000
call operator new(unsigned long)
mov rdi, rax
pop rax
jmp operator delete(void*)
So to summarize the various solutions found on stackoverflow:
use a special default-init allocator. (https://stackoverflow.com/a/21028912/1984766)
drawback: changes the vector-type to std::vector<char, default_init_allocator<char>> vec;
use a wrapper-struct struct NoInitChar around a char that has an empty constructor and therefore skips the value-initialization (https://stackoverflow.com/a/15220853/1984766)
drawback: changes the vector-type to std::vector<NoInitChar> vec;
temporarily cast the vector<char> to vector<NoInitChar> and resize it (https://stackoverflow.com/a/57053750/1984766)
drawback: does not change the type of the vector but you need to call your_resize_function (vec, x) instead of vec.resize (x).
With this post I wanted to point out that all of these methods need to be optimized by the compiler in order to speed up your program. I can confirm that the initialization of the new chars when resizing is indeed optimized away with every compiler I tested. So everything looks good ...
But --> since method 1 & 2 change the type of the vector, what happens when you use these vectors under more "complex" circumstances.
Consider this example:
#include <time.h>
#include <vector>
#include <string_view>
#include <iostream>
//high precision-timer
double get_time () {
struct timespec timespec;
::clock_gettime (CLOCK_MONOTONIC_RAW, ×pec);
return timespec.tv_sec + timespec.tv_nsec / (1000.0 * 1000.0 * 1000.0);
}
//method 1 --> special allocator
//reformated to make it readable
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
private:
typedef std::allocator_traits<A> a_t;
public:
template<typename U>
struct rebind {
using other = default_init_allocator<U, typename a_t::template rebind_alloc<U>>;
};
using A::A;
template <typename U>
void construct (U* ptr) noexcept (std::is_nothrow_default_constructible<U>::value) {
::new (static_cast<void*>(ptr)) U;
}
template <typename U, typename...Args>
void construct (U* ptr, Args&&... args) {
a_t::construct (static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
}
};
//method 2 --> wrapper struct
struct NoInitChar {
public:
NoInitChar () noexcept { }
NoInitChar (char c) noexcept : value (c) { }
public:
char value;
};
//some work to waste time
template<typename T>
void do_something (T& vec, std::string_view str) {
vec.push_back ('"');
vec.insert (vec.end (), str.begin (), str.end ());
vec.push_back ('"');
vec.push_back (',');
}
int main (int argc, char** argv) {
double timeBegin = get_time ();
std::vector<char> vec; //normal case
//std::vector<char, default_init_allocator<char>> vec; //method 1
//std::vector<NoInitChar> vec; //method 2
vec.reserve (256 * 1024 * 1024);
for (int i = 0; i < 1024 * 1024; ++i) {
do_something (vec, "foobar1");
do_something (vec, "foobar2");
do_something (vec, "foobar3");
do_something (vec, "foobar4");
do_something (vec, "foobar5");
do_something (vec, "foobar6");
do_something (vec, "foobar7");
do_something (vec, "foobar8");
vec.resize (vec.size () + 64);
}
double timeEnd = get_time ();
std::cout << (timeEnd - timeBegin) * 1000 << "ms" << std::endl;
return 0;
}
You would expect that method 1 & 2 outperform the normal vector with every "recent" compiler since the resize is free and the other operations are the same. Well think again:
g++ 7.5.0 g++ 8.4.0 g++ 9.3.0 clang++ 9.0.0
vector<char> 95ms 134ms 133ms 97ms
method 1 130ms 159ms 166ms 91ms
method 2 166ms 160ms 159ms 89ms
All test-applications are compiled like this and executed 50times taking the lowest measurement:
$(cc) -O3 -flto -std=c++17 sample.cpp
Encapsulate it.
Initialise it to the maximum size (not reserve).
Keep a reference to the iterator representing the end of the real size, as you put it.
Use begin and real end, instead of end, for your algorithms.
It's very rare that you would need to do this; I strongly encourage you to benchmark your situation to be absolutely sure this hack is needed.
Even then, I prefer the NoInitChar solution. (See Maxim's answer)
But if you're sure you would benefit from this, and NoInitChar doesn't work for you, and you're using clang, or gcc, or MSVC as your compiler, consider using folly's routine for this purpose.
See https://github.com/facebook/folly/blob/master/folly/memory/UninitializedMemoryHacks.h
The basic idea is that each of these library implementations has a routine for doing uninitialized resizing; you just need to call it.
While hacky, you can at least console yourself with knowing that facebook's C++ code relies on this hack working properly, so they'll be updating it if new versions of these library implementations require it.
I have used constexpr to calculate hash codes in compile times. Code compiles correctly, runs correctly. But I dont know, if hash values are compile time or run time. If I trace code in runtime, I dont do into constexpr functions. But, those are not traced even for runtime values (calculate hash for runtime generated string - same methods).
I have tried to look into dissassembly, but I quite dont understand it
For debug purposes, my hash code is only string length, using this:
constexpr inline size_t StringLengthCExpr(const char * const str) noexcept
{
return (*str == 0) ? 0 : StringLengthCExpr(str + 1) + 1;
};
I have ID class created like this
class StringID
{
public:
constexpr StringID(const char * key);
private:
const unsigned int hashID;
}
constexpr inline StringID::StringID(const char * key)
: hashID(StringLengthCExpr(key))
{
}
If I do this in program main method
StringID id("hello world");
I got this disassembled code (part of it - there is a lot of more from inlined methods and other stuff in main)
;;; StringID id("hello world");
lea eax, DWORD PTR [-76+ebp]
lea edx, DWORD PTR [id.14876.0]
mov edi, eax
mov esi, edx
mov ecx, 4
mov eax, ecx
shr ecx, 2
rep movsd
mov ecx, eax
and ecx, 3
rep movsb
// another code
How can I tell from this, that "hash value" is a compile time. I don´t see any constant like 11 moved to register. I am not quite good with ASM, so maybe it is correct, but I am not sure what to check or how to be sure, that "hash code" values are compile time and not computed in runtime from this code.
(I am using Visual Studio 2013 + Intel C++ 15 Compiler - VS Compiler is not supporting constexpr)
Edit:
If I change my code and do this
const int ix = StringLengthCExpr("hello world");
mov DWORD PTR [-24+ebp], 11 ;55.15
I have got the correct result
Even with this
change private hashID to public
StringID id("hello world");
// mov DWORD PTR [-24+ebp], 11 ;55.15
printf("%i", id.hashID);
// some other ASM code
But If I use private hashID and add Getter
inline uint32 GetHashID() const { return this->hashID; };
to ID class, then I got
StringID id("hello world");
//see original "wrong" ASM code
printf("%i", id.GetHashID());
// some other ASM code
The most convenient way is to use your constexpr in a static_assert statement. The code will not compile when it is not evaluated during compile time and the static_assert expression will give you no overhead during runtime (and no unnecessary generated code like with a template solution).
Example:
static_assert(_StringLength("meow") == 4, "The length should be 4!");
This also checks whether your function is computing the result correctly or not.
If you want to ensure that a constexpr function is evaluated at compile time, use its result in something which requires compile-time evaluation:
template <size_t N>
struct ForceCompileTimeEvaluation { static constexpr size_t value = N; };
constexpr inline StringID::StringID(const char * key)
: hashID(ForceCompileTimeEvaluation<StringLength(key)>::value)
{}
Notice that I've renamed the function to just StringLength. Name which start with an underscore followed by an uppercase letter, or which contain two consecutive underscores, are not legal in user code. They're reserved for the implementation (compiler & standard library).
In the future(c++20) you can use the consteval specifier to declare a function, which must be evaluated at compile time, thus requiring a constant expression context.
The consteval specifier declares a function or function template to be
an immediate function, that is, every call to the function must
(directly or indirectly) produce a compile time constant expression.
An example from cppreference(see consteval):
consteval int sqr(int n) {
return n*n;
}
constexpr int r = sqr(100); // OK
int x = 100;
int r2 = sqr(x); // Error: Call does not produce a constant
consteval int sqrsqr(int n) {
return sqr(sqr(n)); // Not a constant expression at this point, but OK
}
constexpr int dblsqr(int n) {
return 2*sqr(n); // Error: Enclosing function is not consteval and sqr(n) is not a constant
}
There are a few ways to force compile-time evaluation. But these aren't as flexible and easy to setup as what you'd expect when using constexpr. And they don't help you in finding if the compile-time constants are actually been used.
What you'd want for constexpr is to work where you expect it to beneficial. Therefor you try to meet its requirements. But Then you need to test if the code you expect to be generated at compile-time has been generated, and if the users actually consume the generated result or trigger the function at runtime.
I've found two ways to detect if a class or (member)function is using the compile-time or runtime evaluated path.
Using the property of constexpr functions returning true from the noexcept operator (bool noexcept( expression )) if evaluated at compile-time. Since the generated result will be a compile-time constant. This method is quite accessible and usable with Unit-testing.
(Be aware that marking these functions explicitly noexcept will break the test.)
Source: cppreference.com (2017/3/3)
Because the noexcept operator always returns true for a constant expression, it can be used to check if a particular invocation of a constexpr function takes the constant expression branch (...)
(less convenient) Using a debugger: By putting a break-point inside the function marked constexpr. Whenever the break-point isn't triggered, the compiler evaluated result was used. Not the easiest, but possible for incidental checking.
Soure: Microsoft documentation (2017/3/3)
Note: In the Visual Studio debugger, you can tell whether a constexpr function is being evaluated at compile time by putting a breakpoint inside it. If the breakpoint is hit, the function was called at run-time. If not, then the function was called at compile time.
I've found both these methods to be useful while experimenting with constexpr. Although I haven't done any testing with environments outside VS2017. And haven't been able to find an explicit statement supporting this behaviour in the current draft of the standard.
The following trick can help to check if the constexpr function has been evaluated during compile time only:
With gcc you can compile the source file with assembly listing + c sources; given that both the constexpr and its calls are in source file try.cpp
gcc -std=c++11 -O2 -Wa,-a,-ad try.cpp | c++filt >try.lst
If the constexpr function has been evaluated during run time time then you will see the compiled function and a call instruction (call function_name on x86) in the assembly listing try.lst (note that c++filt command has undecorated the linker names)
Interesting that it I always see a call if compiled without optimization (without -O2 or -O3 option).
Simply put it in constexpr variable.
constexpr StringID id("hello world");
constexpr int ix = StringLengthCExpr("hello world");
A constexpr variable is always a real constant expression. If it compiles, it is computed on compile time.
I want to use vector<char> as a buffer. The interface is perfect for my needs, but there's a performance penalty when resizing it beyond its current size, since the memory is initialized. I don't need the initialization, since the data will be overwritten in any case by some third-party C functions. Is there a way or a specific allocator to avoid the initialization step? Note that I do want to use resize(), not other tricks like reserve() and capacity(), because I need size() to always represent the significative size of my "buffer" at any moment, while capacity() might be greater than its size after a resize(), so, again, I cannot rely on capacity() as a significative information for my application. Furthemore, the (new) size of the vector is never known in advance, so I cannot use std::array. If vector cannot be configured that way, I'd like to know what kind of container or allocator I could use instead of vector<char, std::alloc>. The only requirement is that the alternative to vector must at most be based on STL or Boost. I have access to C++11.
It is a known issue that initialization can not be turned off even explicitly for std::vector.
People normally implement their own pod_vector<> that does not do any initialization of the elements.
Another way is to create a type which is layout-compatible with char, whose constructor does nothing:
struct NoInitChar
{
char value;
NoInitChar() noexcept {
// do nothing
static_assert(sizeof *this == sizeof value, "invalid size");
static_assert(__alignof *this == __alignof value, "invalid alignment");
}
};
int main() {
std::vector<NoInitChar> v;
v.resize(10); // calls NoInitChar() which does not initialize
// Look ma, no reinterpret_cast<>!
char* beg = &v.front().value;
char* end = beg + v.size();
}
There's nothing in the standard library that meets your requirements, and nothing I know of in boost either.
There are three reasonable options I can think of:
Stick with std::vector for now, leave a comment in the code and come back to it if this ever causes a bottleneck in your application.
Use a custom allocator with empty construct/destroy methods - and hope your optimiser will be smart enough to remove any calls to them.
Create a wrapper around a a dynamically allocated array, implementing only the minimal functionality that you require.
As an alternative solution that works with vectors of different pod types:
template<typename V>
void resize(V& v, size_t newSize)
{
struct vt { typename V::value_type v; vt() {}};
static_assert(sizeof(vt[10]) == sizeof(typename V::value_type[10]), "alignment error");
typedef std::vector<vt, typename std::allocator_traits<typename V::allocator_type>::template rebind_alloc<vt>> V2;
reinterpret_cast<V2&>(v).resize(newSize);
}
And then you can:
std::vector<char> v;
resize(v, 1000); // instead of v.resize(1000);
This is most likely UB, even though it works properly for me for cases where I care more about performance. Difference in generated assembly as produced by clang:
test():
push rbx
mov edi, 1000
call operator new(unsigned long)
mov rbx, rax
mov edx, 1000
mov rdi, rax
xor esi, esi
call memset
mov rdi, rbx
pop rbx
jmp operator delete(void*)
test_noinit():
push rax
mov edi, 1000
call operator new(unsigned long)
mov rdi, rax
pop rax
jmp operator delete(void*)
So to summarize the various solutions found on stackoverflow:
use a special default-init allocator. (https://stackoverflow.com/a/21028912/1984766)
drawback: changes the vector-type to std::vector<char, default_init_allocator<char>> vec;
use a wrapper-struct struct NoInitChar around a char that has an empty constructor and therefore skips the value-initialization (https://stackoverflow.com/a/15220853/1984766)
drawback: changes the vector-type to std::vector<NoInitChar> vec;
temporarily cast the vector<char> to vector<NoInitChar> and resize it (https://stackoverflow.com/a/57053750/1984766)
drawback: does not change the type of the vector but you need to call your_resize_function (vec, x) instead of vec.resize (x).
With this post I wanted to point out that all of these methods need to be optimized by the compiler in order to speed up your program. I can confirm that the initialization of the new chars when resizing is indeed optimized away with every compiler I tested. So everything looks good ...
But --> since method 1 & 2 change the type of the vector, what happens when you use these vectors under more "complex" circumstances.
Consider this example:
#include <time.h>
#include <vector>
#include <string_view>
#include <iostream>
//high precision-timer
double get_time () {
struct timespec timespec;
::clock_gettime (CLOCK_MONOTONIC_RAW, ×pec);
return timespec.tv_sec + timespec.tv_nsec / (1000.0 * 1000.0 * 1000.0);
}
//method 1 --> special allocator
//reformated to make it readable
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A {
private:
typedef std::allocator_traits<A> a_t;
public:
template<typename U>
struct rebind {
using other = default_init_allocator<U, typename a_t::template rebind_alloc<U>>;
};
using A::A;
template <typename U>
void construct (U* ptr) noexcept (std::is_nothrow_default_constructible<U>::value) {
::new (static_cast<void*>(ptr)) U;
}
template <typename U, typename...Args>
void construct (U* ptr, Args&&... args) {
a_t::construct (static_cast<A&>(*this), ptr, std::forward<Args>(args)...);
}
};
//method 2 --> wrapper struct
struct NoInitChar {
public:
NoInitChar () noexcept { }
NoInitChar (char c) noexcept : value (c) { }
public:
char value;
};
//some work to waste time
template<typename T>
void do_something (T& vec, std::string_view str) {
vec.push_back ('"');
vec.insert (vec.end (), str.begin (), str.end ());
vec.push_back ('"');
vec.push_back (',');
}
int main (int argc, char** argv) {
double timeBegin = get_time ();
std::vector<char> vec; //normal case
//std::vector<char, default_init_allocator<char>> vec; //method 1
//std::vector<NoInitChar> vec; //method 2
vec.reserve (256 * 1024 * 1024);
for (int i = 0; i < 1024 * 1024; ++i) {
do_something (vec, "foobar1");
do_something (vec, "foobar2");
do_something (vec, "foobar3");
do_something (vec, "foobar4");
do_something (vec, "foobar5");
do_something (vec, "foobar6");
do_something (vec, "foobar7");
do_something (vec, "foobar8");
vec.resize (vec.size () + 64);
}
double timeEnd = get_time ();
std::cout << (timeEnd - timeBegin) * 1000 << "ms" << std::endl;
return 0;
}
You would expect that method 1 & 2 outperform the normal vector with every "recent" compiler since the resize is free and the other operations are the same. Well think again:
g++ 7.5.0 g++ 8.4.0 g++ 9.3.0 clang++ 9.0.0
vector<char> 95ms 134ms 133ms 97ms
method 1 130ms 159ms 166ms 91ms
method 2 166ms 160ms 159ms 89ms
All test-applications are compiled like this and executed 50times taking the lowest measurement:
$(cc) -O3 -flto -std=c++17 sample.cpp
Encapsulate it.
Initialise it to the maximum size (not reserve).
Keep a reference to the iterator representing the end of the real size, as you put it.
Use begin and real end, instead of end, for your algorithms.
It's very rare that you would need to do this; I strongly encourage you to benchmark your situation to be absolutely sure this hack is needed.
Even then, I prefer the NoInitChar solution. (See Maxim's answer)
But if you're sure you would benefit from this, and NoInitChar doesn't work for you, and you're using clang, or gcc, or MSVC as your compiler, consider using folly's routine for this purpose.
See https://github.com/facebook/folly/blob/master/folly/memory/UninitializedMemoryHacks.h
The basic idea is that each of these library implementations has a routine for doing uninitialized resizing; you just need to call it.
While hacky, you can at least console yourself with knowing that facebook's C++ code relies on this hack working properly, so they'll be updating it if new versions of these library implementations require it.
If I have a variable inside a function (say, a large array), does it make sense to declare it both static and constexpr? constexpr guarantees that the array is created at compile time, so would the static be useless?
void f() {
static constexpr int x [] = {
// a few thousand elements
};
// do something with the array
}
Is the static actually doing anything there in terms of generated code or semantics?
The short answer is that not only is static useful, it is pretty well always going to be desired.
First, note that static and constexpr are completely independent of each other. static defines the object's lifetime during execution; constexpr specifies that the object should be available during compilation. Compilation and execution are disjoint and discontiguous, both in time and space. So once the program is compiled, constexpr is no longer relevant.
Every variable declared constexpr is implicitly const but const and static are almost orthogonal (except for the interaction with static const integers.)
The C++ object model (§1.9) requires that all objects other than bit-fields occupy at least one byte of memory and have addresses; furthermore all such objects observable in a program at a given moment must have distinct addresses (paragraph 6). This does not quite require the compiler to create a new array on the stack for every invocation of a function with a local non-static const array, because the compiler could take refuge in the as-if principle provided it can prove that no other such object can be observed.
That's not going to be easy to prove, unfortunately, unless the function is trivial (for example, it does not call any other function whose body is not visible within the translation unit) because arrays, more or less by definition, are addresses. So in most cases, the non-static const(expr) array will have to be recreated on the stack at every invocation, which defeats the point of being able to compute it at compile time.
On the other hand, a local static const object is shared by all observers, and furthermore may be initialized even if the function it is defined in is never called. So none of the above applies, and a compiler is free not only to generate only a single instance of it; it is free to generate a single instance of it in read-only storage.
So you should definitely use static constexpr in your example.
However, there is one case where you wouldn't want to use static constexpr. Unless a constexpr declared object is either ODR-used or declared static, the compiler is free to not include it at all. That's pretty useful, because it allows the use of compile-time temporary constexpr arrays without polluting the compiled program with unnecessary bytes. In that case, you would clearly not want to use static, since static is likely to force the object to exist at runtime.
In addition to given answer, it's worth noting that compiler is not required to initialize constexpr variable at compile time, knowing that the difference between constexpr and static constexpr is that to use static constexpr you ensure the variable is initialized only once.
Following code demonstrates how constexpr variable is initialized multiple times (with same value though), while static constexpr is surely initialized only once.
In addition the code compares the advantage of constexpr against const in combination with static.
#include <iostream>
#include <string>
#include <cassert>
#include <sstream>
const short const_short = 0;
constexpr short constexpr_short = 0;
// print only last 3 address value numbers
const short addr_offset = 3;
// This function will print name, value and address for given parameter
void print_properties(std::string ref_name, const short* param, short offset)
{
// determine initial size of strings
std::string title = "value \\ address of ";
const size_t ref_size = ref_name.size();
const size_t title_size = title.size();
assert(title_size > ref_size);
// create title (resize)
title.append(ref_name);
title.append(" is ");
title.append(title_size - ref_size, ' ');
// extract last 'offset' values from address
std::stringstream addr;
addr << param;
const std::string addr_str = addr.str();
const size_t addr_size = addr_str.size();
assert(addr_size - offset > 0);
// print title / ref value / address at offset
std::cout << title << *param << " " << addr_str.substr(addr_size - offset) << std::endl;
}
// here we test initialization of const variable (runtime)
void const_value(const short counter)
{
static short temp = const_short;
const short const_var = ++temp;
print_properties("const", &const_var, addr_offset);
if (counter)
const_value(counter - 1);
}
// here we test initialization of static variable (runtime)
void static_value(const short counter)
{
static short temp = const_short;
static short static_var = ++temp;
print_properties("static", &static_var, addr_offset);
if (counter)
static_value(counter - 1);
}
// here we test initialization of static const variable (runtime)
void static_const_value(const short counter)
{
static short temp = const_short;
static const short static_var = ++temp;
print_properties("static const", &static_var, addr_offset);
if (counter)
static_const_value(counter - 1);
}
// here we test initialization of constexpr variable (compile time)
void constexpr_value(const short counter)
{
constexpr short constexpr_var = constexpr_short;
print_properties("constexpr", &constexpr_var, addr_offset);
if (counter)
constexpr_value(counter - 1);
}
// here we test initialization of static constexpr variable (compile time)
void static_constexpr_value(const short counter)
{
static constexpr short static_constexpr_var = constexpr_short;
print_properties("static constexpr", &static_constexpr_var, addr_offset);
if (counter)
static_constexpr_value(counter - 1);
}
// final test call this method from main()
void test_static_const()
{
constexpr short counter = 2;
const_value(counter);
std::cout << std::endl;
static_value(counter);
std::cout << std::endl;
static_const_value(counter);
std::cout << std::endl;
constexpr_value(counter);
std::cout << std::endl;
static_constexpr_value(counter);
std::cout << std::endl;
}
Possible program output:
value \ address of const is 1 564
value \ address of const is 2 3D4
value \ address of const is 3 244
value \ address of static is 1 C58
value \ address of static is 1 C58
value \ address of static is 1 C58
value \ address of static const is 1 C64
value \ address of static const is 1 C64
value \ address of static const is 1 C64
value \ address of constexpr is 0 564
value \ address of constexpr is 0 3D4
value \ address of constexpr is 0 244
value \ address of static constexpr is 0 EA0
value \ address of static constexpr is 0 EA0
value \ address of static constexpr is 0 EA0
As you can see yourself constexpr is initilized multiple times (address is not the same) while static keyword ensures that initialization is performed only once.
Not making large arrays static, even when they're constexpr can have dramatic performance impact and can lead to many missed optimizations. It may slow down your code by orders of magnitude. Your variables are still local and the compiler may decide to initialize them at runtime instead of storing them as data in the executable.
Consider the following example:
template <int N>
void foo();
void bar(int n)
{
// array of four function pointers to void(void)
constexpr void(*table[])(void) {
&foo<0>,
&foo<1>,
&foo<2>,
&foo<3>
};
// look up function pointer and call it
table[n]();
}
You probably expect gcc-10 -O3 to compile bar() to a jmp to an address which it fetches from a table, but that is not what happens:
bar(int):
mov eax, OFFSET FLAT:_Z3fooILi0EEvv
movsx rdi, edi
movq xmm0, rax
mov eax, OFFSET FLAT:_Z3fooILi2EEvv
movhps xmm0, QWORD PTR .LC0[rip]
movaps XMMWORD PTR [rsp-40], xmm0
movq xmm0, rax
movhps xmm0, QWORD PTR .LC1[rip]
movaps XMMWORD PTR [rsp-24], xmm0
jmp [QWORD PTR [rsp-40+rdi*8]]
.LC0:
.quad void foo<1>()
.LC1:
.quad void foo<3>()
This is because GCC decides not to store table in the executable's data section, but instead initializes a local variable with its contents every time the function runs. In fact, if we remove constexpr here, the compiled binary is 100% identical.
This can easily be 10x slower than the following code:
template <int N>
void foo();
void bar(int n)
{
static constexpr void(*table[])(void) {
&foo<0>,
&foo<1>,
&foo<2>,
&foo<3>
};
table[n]();
}
Our only change is that we have made table static, but the impact is enormous:
bar(int):
movsx rdi, edi
jmp [QWORD PTR bar(int)::table[0+rdi*8]]
bar(int)::table:
.quad void foo<0>()
.quad void foo<1>()
.quad void foo<2>()
.quad void foo<3>()
In conclusion, never make your lookup tables local variables, even if they're constexpr. Clang actually optimizes such lookup tables well, but other compilers don't. See Compiler Explorer for a live example.