Why can compiler not optimize out unused static std::string?

Why can compiler not optimize out unused static std::string? - c++

If I compile this code with GCC or Clang and enable -O2 optimizations, I still get some global object initialization. Is it even possible for any code to reach these variables?
#include <string>
static const std::string s = "";
int main() { return 0; }
Compiler output:
main:
xor eax, eax
ret
_GLOBAL__sub_I_main:
mov edx, OFFSET FLAT:__dso_handle
mov esi, OFFSET FLAT:s
mov edi, OFFSET FLAT:_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEED1Ev
mov QWORD PTR s[rip], OFFSET FLAT:s+16
mov QWORD PTR s[rip+8], 0
mov BYTE PTR s[rip+16], 0
jmp __cxa_atexit
Specifically, I was not expecting the _GLOBAL__sub_I_main: section.
Godbolt link
Edit:
Even with a simple custom defined type, the compiler still generates some code.
class Aloha
{
public:
Aloha () : i(1) {}
~Aloha() = default;
private:
int i;
};
static const Aloha a;
int main() { return 0; }
Compiler output:
main:
xor eax, eax
ret
_GLOBAL__sub_I_main:
ret

Compiling that code with short string optimization (SSO) may be an equivalent of taking address of std::string's member variable. Constructor have to analyze string length at compile time and choose if it can fit into internal storage of std::string object or it have to allocate memory dynamically but then find that it never was read so allocation code can be optimized out.
Lack of optimization in this case might be an optimization flaw limited to such simple outlying examples like this one:
const int i = 3;
int main()
{
return (long long)(&i); // to make sure that address was used
}
GCC generates code:
i:
.long 3 ; this a variable
main:
push rbp
mov rbp, rsp
mov eax, OFFSET FLAT:i
pop rbp
ret
GCC would not optimize this code as well:
const int i = 3;
const int *p = &i;
int main() { return 0; }
Static variables declared in file scope, especially const-qualified ones can be optimized out per as-if rule unless their address was used, GCC does that only to const-qualified ones regardless of use case. Taking address of variable is an observable behaviour, because it can be passed somewhere. Logic which would trace that would be too complex to implement and would be of little practical value.
Of course, the code that doesn't use address
const int i = 3;
int main() { return i; }
results in optimizing out reserved storage:
main:
mov eax, 3
ret
As of C++20 constexpr construction of std::string? Per older rules it could not be a compile-time expression if result was dependant on arguments. It possible that std::string would allocate memory dynamically if string is too long, which isn't a compile-time action. It appears that only mainstream compiler that supports C++20 features required for that it at this moment is MSVC in certain conditions.

Related

When the object of std::function is destroyed? [duplicate]

This question already has answers here:
Can a local variable's memory be accessed outside its scope?
(20 answers)
Closed 2 years ago.
When the object of std::function is destroyed?
Why the pointer to the object of std::function is still valid when the variable fn1 is out of the scope（you see the code snippet works well, http://cpp.sh/6nnd4）?
// function example
#include <iostream> // std::cout
#include <functional> // std::function, std::negate
// a function:
int half(int x) {return x/2;}
int main () {
std::function<int(int)> *pFun;
{
std::function<int(int)> fn1 = half; // function
pFun= &fn1;
std::cout << "fn1(60): " << (*pFun)(60) << '\n';
}
std::cout << "fn1(60): " << (*pFun)(90) << '\n';
return 0;
}

The short answer is that C++ doesn't validate the contents of a pointer. That's the developer's responsibility. It only validates that the variable pFun is in scope.
In C++ it is often the developers responsibility to make sure that their pointers are pointing to valid objects. In this case, fn1 has likely been created on the stack. Since many compilers implement local variables that way. When fn1goes out of scope the compiler will no longer allow the use of the variable fn1. However, what happens to the memory on the stack backing fn1 is not defined. In your case, the compiler left it untouched so (*pFun)(90) happens to still work. However, running the same code on another compiler may not. In fact, simply turning on compiler optimization may cause it to stop working. It all depends on whether the compiler re-uses that memory, or frees it from the stack.
In your example the scope is a code block. If the scope had instead been a separate function named MyFunction2then when MyFunction2exited the stack memory associated with fn1 would have been freed off the stack and the memory available for reuse. Both by interrupts and whatever code was called after MyFunction2. So the data would be more likely to have been changed such that (*pFun)(90) faulted.
Now debugging something like this crashing is fairly strait forward. Imagine if you had written to pFun instead, this would have caused memory corruption of what ever object happened to be using that memory after fn1 had gone out of scope.

Why the pointer to the object of std::function is still valid when the variable fn1 is out of the scope?
Let me present a simpler example, using ints. But if you are brave, you can try to read the assembler for the std::function version.
int main () {
int a = 0;
int *c = nullptr;
{
int b = 1;
c = &b;
}
a = *c;
return a;
}
This is generated with gcc 10.2 -O0, but the other compilers have a really similar output. I will comment it to aid the understanding.
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 0 # space for `a`
mov QWORD PTR [rbp-16], 0 # space for `c`
mov DWORD PTR [rbp-20], 1 # space for `b`
lea rax, [rbp-20] # int *tmp = &b Not really, but enough for us
mov QWORD PTR [rbp-16], rax # c = tmp
mov rax, QWORD PTR [rbp-16]
mov eax, DWORD PTR [rax] # int tmp2 = *tmp
mov DWORD PTR [rbp-4], eax # a = tmp2
mov eax, DWORD PTR [rbp-4]
pop rbp
ret # return a
And the program return 1, as expected when you see the assembler. You can see, b was not "invalidated", because we did not roll the stack back, and we didnt change its value. This will be a case like the one you are in, were it works.
Now lets enable optimizations. Here is it compiled with -O1, the lowest level.
Here is it compiled with gcc:
main:
mov eax, 0
ret
And here is it compiled with clang:
main: # #main
mov eax, 1
ret
And here is with visual studio:
main PROC ; COMDAT
mov eax, 1
ret 0
main ENDP
You can see the results are diferent. It is undefined behaviour, each implementation will do as it sees fit.
Now this is happening with some local variables in the same function. But consider this cases:
It was a pointer, lets say allocated with new. You already called delete. What guaranties that that memory is not used by someone else now?
When the stack grows, the value will eventually be overiden. Consider this code:
int* foo() {
int a = 0;
return &a;
}
void bar() {
int b = 1;
}
int main () {
int *ptr = foo();
bar();
int a = *ptr;
return a;
}
It didnt return 1 or 0. It returned 139.
And here is a good read on this.

Is there difference between a static const and const variable inside a c++ class in terms of its storage

Here is a snippet of the code I have:
class modbus {
public:
static const uint8_t modbusHeader = 2;
static const uint8_t modbusCRC = 2;
static const uint8_t modbusPDU = modbusHeader + modbusCRC;
static const uint8_t exceptionBase = 0x80;
static const uint32_t transmitTimeout = 5000;
};
It defines some sizes for the modbus packets that I need to create inside the class. I work inside an embedded environment and as such size optimisations and considerations are always there. As such I really want to have only one occurrence of these constant values inside my read only part of the flash.
I have chosen to set these variables as static but is this necessary? Would a compiler infer that these values need only be saved once inside the binary and as such only include them once when I remove the static keyword?

I suppose, technically, if the compiler knew that you never performed sizeof on modbus, and never took the address of these members through different modbus* pointers, and knew that they were only ever initialised with the exact same trivial value, it might use the "as-if" rule to merge them into one and remove them from the class in terms of storage. (If it couldn't guarantee just one of these things, the rules of the language would be violated.)
But that's a tall order (particularly when you consider multiple translation units), and would not really be useful.
So no. I don't expect that this would ever happen.
You should indeed make those things static const (with perhaps a sprinkling of constexpr).

The code
#include <cstdint>
class modbus {
public:
static const uint8_t modbusHeader = 2;
};
int main() {
return modbus::modbusHeader;
}
results in
main:
pushq %rbp
movq %rsp, %rbp
movl $2, %eax
popq %rbp
ret
and
#include <cstdint>
class modbus {
public:
const uint8_t modbusHeader = 2;
};
int main() {
return modbus().modbusHeader;
}
results in
modbus::modbus() [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov BYTE PTR [rax], 2
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov BYTE PTR [rbp-1], 0
lea rax, [rbp-1]
mov rdi, rax
call modbus::modbus() [complete object constructor]
movzx eax, BYTE PTR [rbp-1]
movzx eax, al
leave
ret
At least here is a large difference. I used g++ 9.2 with -O0

If you insist on using C++, you must const-qualify an object of the class, not members of the class. You must then ensure that const variables with static storage duration end up in flash for your given target. Often the tool chain has a build called "flash-something". It's related to the linker rather than the compiler. If that part is working, it shouldn't matter if the object of the class is static or not, as long as the object is declared at file scope (static storage duration).
What you shouldn't do, is to just const qualify class members, rather than the object they reside in. Because that might cause silent C++ bloat to remain in the executable, in the form of default constructors called by the "CRT". You can verify this by single-stepping through the part of the CRT that deals with calling constructors of objects with static storage duration.
As for making members static, you should only need to do that for purposes like singleton, not for making members end up in flash.

C++ constexpr function in return statement

Why is a constexpr function no evaluated at compile time but in runtime in the return statement of main function?
It tried
template<int x>
constexpr int fac() {
return fac<x - 1>() * x;
}
template<>
constexpr int fac<1>() {
return 1;
}
int main() {
const int x = fac<3>();
return x;
}
and the result is
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], 6
mov eax, 6
pop rbp
ret
with gcc 8.2. But when I call the function in the return statement
template<int x>
constexpr int fac() {
return fac<x - 1>() * x;
}
template<>
constexpr int fac<1>() {
return 1;
}
int main() {
return fac<3>();
}
I get
int fac<1>():
push rbp
mov rbp, rsp
mov eax, 1
pop rbp
ret
main:
push rbp
mov rbp, rsp
call int fac<3>()
nop
pop rbp
ret
int fac<2>():
push rbp
mov rbp, rsp
call int fac<1>()
add eax, eax
pop rbp
ret
int fac<3>():
push rbp
mov rbp, rsp
call int fac<2>()
mov edx, eax
mov eax, edx
add eax, eax
add eax, edx
pop rbp
ret
Why is the first code evaluated at compile time and the second at runtime?
Also I tried both snippets with clang 7.0.0 and they are evaluated at runtime. Why is this not valid constexpr for clang?
All evaluation was done in godbolt compiler explorer.

A common misconception with regard to constexpr is that it means "this will be evaluated at compile time"1.
It is not. constexpr was introduced to let us write natural code that may produce constant expressions in contexts that need them. It means "this must be evaluatable at compile time", which is what the compiler will check.
So if you wrote a constexpr function returning an int, you can use it to calculate a template argument, an initializer for a constexpr variable (also const if it's an integral type) or an array size. You can use the function to obtain natural, declarative, readable code instead of the old meta-programming tricks one needed to resort to in the past.
But a constexpr function is still a regular function. The constexpr specifier doesn't mean a compiler has2 to optimize it to heck and do constant folding at compile time. It's best not to confuse it for such a hint.
1 - Thanks user463035818 for the phrasing.
2 - c++20 and consteval is a different story however :)

StoryTeller's answer is good, but I think there's a slightly different take possible.
With constexpr, there are three situations to distinguish:
The result is needed in a compile-time context, such as array sizes. In this case, the arguments too must be known at compile time. Evaluation is probably at compile time, and at least all diagnosable errors will be found at compile time.
The arguments are only known at run time, and the result is not needed at compile time. In this case, evaluation necessarily has to happen at run time.
The arguments may be available at compile time, but the result is needed only at run time.
The fourth combination (arguments available only at runtime, result needed at compile time) is an error; the compiler will reject such code.
Now in cases 1 and 3 the calculation could happen at compile time, as all inputs are available. But to facilitate case 2, the compiler must be able to create a run-time version, and it may decide to use this variant in the other cases as well - if it can.
E.g. some compilers internally support variable-sized arrays, so even while the language requires compile-time array bounds, the implementation may decide not to.

Do rvalue references have the same overhead as lvalue references?

Consider this example:
#include <utility>
// runtime dominated by argument passing
template <class T>
void foo(T t) {}
int main() {
int i(0);
foo<int>(i); // fast -- int is scalar type
foo<int&>(i); // slow -- lvalue reference overhead
foo<int&&>(std::move(i)); // ???
}
Is foo<int&&>(i) as fast as foo<int>(i), or does it involve pointer overhead like foo<int&>(i)?
EDIT: As suggested, running g++ -S gave me the same 51-line assembly file for foo<int>(i) and foo<int&>(i), but foo<int&&>(std::move(i)) resulted in 71 lines of assembly code (it looks like the difference came from std::move).
EDIT: Thanks to those who recommended g++ -S with different optimization levels -- using -O3 (and making foo noinline) I was able to get output which looks like xaxxon's solution.

In your specific situation, it's likely they are all the same. The resulting code from godbolt with gcc -O3 is https://godbolt.org/g/XQJ3Z4 for:
#include <utility>
// runtime dominated by argument passing
template <class T>
int foo(T t) { return t;}
int main() {
int i{0};
volatile int j;
j = foo<int>(i); // fast -- int is scalar type
j = foo<int&>(i); // slow -- lvalue reference overhead
j = foo<int&&>(std::move(i)); // ???
}
is:
mov dword ptr [rsp - 4], 0 // foo<int>(i);
mov dword ptr [rsp - 4], 0 // foo<int&>(i);
mov dword ptr [rsp - 4], 0 // foo<int&&>(std::move(i));
xor eax, eax
ret
The volatile int j is so that the compiler cannot optimize away all the code because it would otherwise know that the results of the calls are discarded and the whole program would optimize to nothing.
HOWEVER, if you force the function to not be inlined, then things change a bit int __attribute__ ((noinline)) foo(T t) { return t;}:
int foo<int>(int): # #int foo<int>(int)
mov eax, edi
ret
int foo<int&>(int&): # #int foo<int&>(int&)
mov eax, dword ptr [rdi]
ret
int foo<int&&>(int&&): # #int foo<int&&>(int&&)
mov eax, dword ptr [rdi]
ret
above: https://godbolt.org/g/pbZ1BT
For questions like these, learn to love https://godbolt.org and https://quick-bench.com/ (quick bench requires you to learn how to properly use google test)

Efficiency of parameter passing depends on the ABI.
For example, on linux the Itanium C++ ABI specifies that references are passed as pointers to the referred object:
3.1.2 Reference Parameters
Reference parameters are handled by passing a pointer to the actual parameter.
This is independent of the reference category (rvalue/lvalue reference).
For a broader view, I have found this quote in a document from the Technical University of Denmark, calling convention, which analyzes most of the compilers:
References are treated as identical to pointers in all respects.
So rvalue and lvalue reference involve pointer overhead on all ABI.

Constexpr variable evaluation

Here is my code and I need clarification on what's happening:
constexpr int funct(int x){
return x + 1;
}
int main(){
int x = funct(10);
return 0;
}
constexpr's allows compile time calculation, and based on my code above, since funct is declared as constexpr, it is allowed to do compile time calculations if the arguments are constants or constexpr's themselves.
The part that I am confused with lies in this part, the int x. Since it is not declared as constexpr, would it mean that int x will get the value at runtime? Does that mean that declaring it to be constexpr int x will mean that int x will get the value at compile time unlike int x ?

It depends on the compiler in question on many counts. What sorts of optimizations that may take place, etc. However, constexpr does not inherently enable compile-time calculations.
Take this code:
#include <cstdio>
constexpr int test(int in)
{
return in + 25;
}
int main(int argc, char* argv[])
{
printf("Test: %u\n", test(5));
printf("Two: %u\n", test(10));
}
Under GCC 4.8.4 on my x86_64 Gentoo box, that actually still makes a call on both counts to the supposedly compile-time "test". The lines I used were
g++ -std=c++11 -Wall -g -c main.cpp -o obj/Debug/main.o
g++ -o bin/Debug/TestProject obj/Debug/main.o
So on the code above, that produces the following machine code:
0x40061c mov edi,0x5
0x400621 call 0x400659 <test(int)>
0x400626 mov esi,eax
0x400628 mov edi,0x4006f4
0x40062d mov eax,0x0
0x400632 call 0x4004f0 <printf#plt>
0x400637 mov edi,0xa
0x40063c call 0x400659 <test(int)>
0x400641 mov esi,eax
0x400643 mov edi,0x4006fd
0x400648 mov eax,0x0
0x40064d call 0x4004f0 <printf#plt>
Where the asm block for "test" is:
0x400659 push rbp
0x40065a mov rbp,rsp
0x40065d mov DWORD PTR [rbp-0x4],edi
0x400660 mov eax,DWORD PTR [rbp-0x4]
0x400663 add eax,0x19
0x400666 pop rbp
0x400667 ret
So, as you can see in that situation, it seems to hold pretty much no effect over how GCC produced that code. Even if you do this, you will still get runtime calculations:
int value = test(30);
printf("Value: %u\n", value);
Where that produces this (the address of test changed slightly due to adding some more code):
0x40061c mov edi,0x1e
0x400621 call 0x40067a <test(int)>
0x400626 mov DWORD PTR [rbp-0x4],eax
0x400629 mov eax,DWORD PTR [rbp-0x4]
0x40062c mov esi,eax
0x40062e mov edi,0x400714
0x400633 mov eax,0x0
0x400638 call 0x4004f0 <printf#plt>
It does, however, produce the expected result if you declare the value itself as constexpr:
constexpr int value = test(30);
printf("Value: %u\n", value);
And the associated code is:
0x400623 mov esi,0x37
0x400628 mov edi,0x400714
0x40062d mov eax,0x0
0x400632 call 0x4004f0 <printf#plt>
So essentially, you're not guaranteed a compile-time calculation if you simply prepend the method declaration with constexpr. You also need to declare your a variable as constexpr and assign to it. Doing such a declaration will actually require the compiler to statically evaluate the result. GCC actually yells at you if you try to do this and "test" isn't declared constexpr:
main.cpp|10|error: call to non-constexpr function ‘int test(int)’|

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js