C++ std::string initialization better performance (assembly)

I was playing with www.godbolt.org to check which code generates better assembly, and I can't understand why these two approaches generate such different results (in assembly instructions).
The first approach is to declare a string and then set a value later:
#include <string>
int foo() {
    std::string a;
    a = "abcdef";
    return a.size();
}
With gcc 7.4 (-O3), this outputs:
.LC0:
.string "abcdef"
foo():
push rbp
mov r8d, 6
mov ecx, OFFSET FLAT:.LC0
xor edx, edx
push rbx
xor esi, esi
sub rsp, 40
lea rbx, [rsp+16]
mov rdi, rsp
mov BYTE PTR [rsp+16], 0
mov QWORD PTR [rsp], rbx
mov QWORD PTR [rsp+8], 0
call std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long)
mov rdi, QWORD PTR [rsp]
mov rbp, QWORD PTR [rsp+8]
cmp rdi, rbx
je .L1
call operator delete(void*)
.L1:
add rsp, 40
mov eax, ebp
pop rbx
pop rbp
ret
mov rbp, rax
jmp .L3
foo() [clone .cold]:
.L3:
mov rdi, QWORD PTR [rsp]
cmp rdi, rbx
je .L4
call operator delete(void*)
.L4:
mov rdi, rbp
call _Unwind_Resume
So, I imagined that if I initialize the string in the declaration, the output assembly would be shorter:
int bar() {
    std::string a {"abcdef"};
    return a.size();
}
And indeed it is:
bar():
mov eax, 6
ret
Why is there such a huge difference? What prevents gcc from optimizing the first version the way it optimizes the second?
godbolt link

This is just a guess:
operator= has a strong exception guarantee, which means:
If an exception is thrown for any reason, this function has no effect (strong exception guarantee).
(since C++11)
(source)
So while the constructor can leave the object in any state it likes if it throws, operator= needs to make sure that the object is the same as before. I suspect that's why the call to operator delete is there (to clean up potentially allocated memory).
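As a sanity check, the two variants from the question can be placed side by side to confirm that they are observably equivalent, so any difference is confined to the generated code. This is only a minimal sketch. The static_cast is an addition that makes the size_t-to-int conversion explicit and is not part of the original snippets.

```cpp
#include <cassert>
#include <string>

// Variant 1 from the question: default-construct, then assign.
int foo() {
    std::string a;
    a = "abcdef";
    return static_cast<int>(a.size());
}

// Variant 2 from the question: initialize in the declaration.
int bar() {
    std::string a{"abcdef"};
    return static_cast<int>(a.size());
}
```

Both functions return the length of the literal. Whether a given compiler folds either of them down to a constant depends on its version and flags, so it is worth re-checking on Compiler Explorer rather than assuming the gcc 7.4 result holds elsewhere.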

Related

Will delete the pointer to pointer cause the memory leak?

I have a question about C-style string constants and dynamically allocated arrays.
const char** name = new const char* { "Alan" };
delete name;
When I try to delete name after new'ing a piece of memory, the compiler suggests that I use delete instead of delete[]. I understand that name only stores the address of the pointer to the read-only string.
However, if I only delete the pointer to pointer (which is name), will the string itself cause a memory leak?

As the comments above indicate, you don't need to manage the memory that "Alan" exists in.
Let's see what that looks like in practice.
I made a modified version of your code:
#include <iostream>
void test() {
    const char** name;
    name = new const char* { "Alan\n" };
    delete name;
}
int main()
{
    test();
}
Then I put it into godbolt, which shows what's happening under the hood (excerpts copied below).
In both clang and gcc, the data for "Alan\n" lives in static memory, so it always exists. That is why there is no memory leak even though you never touch it again after mentioning it. The value of the pointer to "Alan\n" is just the address of that data in the program image: offset .L.str in the clang output, OFFSET FLAT:.LC0 in the gcc output.
clang:
test(): # #test()
push rbp
mov rbp, rsp
sub rsp, 16
mov edi, 8
call operator new(unsigned long)
mov rcx, rax
movabs rdx, offset .L.str
mov qword ptr [rax], rdx
mov qword ptr [rbp - 8], rcx
mov rax, qword ptr [rbp - 8]
cmp rax, 0
mov qword ptr [rbp - 16], rax # 8-byte Spill
je .LBB1_2
mov rax, qword ptr [rbp - 16] # 8-byte Reload
mov rdi, rax
call operator delete(void*)
.L.str:
.asciz "Alan\n"
gcc:
.LC0:
.string "Alan\n"
test():
push rbp
mov rbp, rsp
sub rsp, 16
mov edi, 8
call operator new(unsigned long)
mov QWORD PTR [rax], OFFSET FLAT:.LC0
mov QWORD PTR [rbp-8], rax
mov rax, QWORD PTR [rbp-8]
test rax, rax
je .L3
mov esi, 8
mov rdi, rax
call operator delete(void*, unsigned long)
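
The same point can be checked from C++ itself. The sketch below (the function name literal_after_delete is made up for illustration) copies the address of the literal, deletes the pointer slot, and then hands the literal back to the caller. This is valid because a string literal has static storage duration, while delete name releases only the storage for the single const char* that new allocated.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: returns the literal's address after the
// dynamically allocated pointer slot has already been freed.
const char* literal_after_delete() {
    const char** name = new const char*{"Alan"};
    const char* text = *name;  // copy the address of the literal
    delete name;               // frees only the slot holding the pointer
    return text;               // the literal itself is still alive
}
```

Reading through name after the delete would be undefined behavior; reading through the copied pointer text is fine, because nothing ever deallocates the literal.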

Why can I create an object when there is no explicitly defined constructor?

I have two pieces of code:
Piece 1:
class Test{
public:
    Test(int a) {}
};
Piece 2:
class Test{
public:
    Test() {}
};
When I want to create an array of Test objects of some size, I have a problem. With the first piece of code I can't create the array as Test arr[some number]; instead I have to do Test *arr[some number] = new Test();
But with the second piece of code I can do Test arr[some number].
Apparently, without arguments in the constructor, I can create the array directly, but if I use a constructor with arguments, I have to hold pointers to the objects and then use the arrow operator to access their methods.
Why is this happening?

From a logical point of view, which the C++ language rules follow:
The constructors of a class describe the ways an object of that class type can be created.
If any user-declared constructor is provided, the compiler does not generate a default constructor. If no constructors are declared, the compiler generates the default, copy, and move constructors.
This results in:
If there is no default constructor, an object cannot be created by default initialization, because there is no constructor to do it.
To allow objects to be created that way, the class must provide a constructor for default initialization.
class Test{
public:
    Test() = default;
    Test(int a) {}
};
Note that an array is another object, consisting of the objects that are its elements. It can still be initialized from a braced list (aggregate initialization), e.g.
Test a1[3] = {3,5,7}; // those values are passed to constructors of individual elements
// or
Test a2[3] = {{3},{5},{7}};
But if Test doesn't have a default constructor, i.e. it cannot be default-initialized, the code is ill-formed when the list has fewer elements than the array.
Test a[13] = {3,5,7}; // for elements 3-12 Test() will be called
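
To see which constructor runs for which element, here is a small sketch with counters. The names Counted, default_calls, int_calls and count_default_constructed are hypothetical; only the array declaration mirrors the example above.

```cpp
#include <cassert>

int default_calls = 0;  // incremented by Counted()
int int_calls = 0;      // incremented by Counted(int)

struct Counted {
    Counted() { ++default_calls; }
    Counted(int) { ++int_calls; }
};

// Builds a 13-element array from a 3-element list and reports
// how many elements fell back to the default constructor.
int count_default_constructed() {
    default_calls = 0;
    int_calls = 0;
    Counted a[13] = {3, 5, 7};
    (void)a;  // silence unused-variable warnings
    return default_calls;
}
```

The three listed values go through Counted(int) and the remaining ten elements go through Counted(). Removing the default constructor from Counted makes the array declaration fail to compile, which matches the rule described above.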

Here's a test example from Compiler Explorer
Here's the actual code:
class Test1 {
public:
    Test1() {}
};
class Test2 {
public:
    Test2(int a) {}
};
int main() {
    // Compiles just fine
    Test1 test1Arr[5];
    // Fails to Compile
    Test2 test2Arr[5];
    return 0;
}
Now, this fails to compile because of the second case. If we comment out Test2 test2Arr[5]; so that it compiles, gcc 10.2 generates this assembly:
Test1::Test1() [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
push r12
push rbx
sub rsp, 16
lea rax, [rbp-21]
mov ebx, 4
mov r12, rax
.L4:
test rbx, rbx
js .L3
mov rdi, r12
call Test1::Test1() [complete object constructor]
add r12, 1
sub rbx, 1
jmp .L4
.L3:
mov eax, 0
add rsp, 16
pop rbx
pop r12
pop rbp
ret
So, as you can see, you can easily create an array of class or struct objects that have a default constructor.
The other case fails because Test2's only constructor takes an argument, so it is not a default constructor, and because you declared it, the compiler does not generate a default constructor for you.
In other words, to create a single instance of this class you must pass an argument to its constructor.
Test2 test2(3);
Now, since you want to create an array of these without using dynamic memory, you can take advantage of brace-initialization.
If you were to do this:
Test2 testArra[5] = {0,1,2,3,4};
This will compile and this will generate the following assembly:
Test1::Test1() [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
nop
pop rbp
ret
Test2::Test2(int):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov DWORD PTR [rbp-12], esi
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
push r12
push rbx
sub rsp, 16
lea rax, [rbp-21]
mov ebx, 4
mov r12, rax
.L5:
test rbx, rbx
js .L4
mov rdi, r12
call Test1::Test1() [complete object constructor]
add r12, 1
sub rbx, 1
jmp .L5
.L4:
lea rbx, [rbp-26]
mov esi, 0
mov rdi, rbx
call Test2::Test2(int)
add rbx, 1
mov esi, 1
mov rdi, rbx
call Test2::Test2(int)
add rbx, 1
mov esi, 2
mov rdi, rbx
call Test2::Test2(int)
add rbx, 1
mov esi, 3
mov rdi, rbx
call Test2::Test2(int)
lea rax, [rbx+1]
mov esi, 4
mov rdi, rax
call Test2::Test2(int)
mov eax, 0
add rsp, 16
pop rbx
pop r12
pop rbp
ret
This works because each value in the braced list initializes the corresponding array element through the user-defined constructor Test2(int). The commas here are list separators, not the comma operator.
Now, if you define both constructors within the same class like this:
class Test3 {
public:
    Test3() {}
    Test3(int a) {}
};
int main() {
    Test3 test3Arr[5];
    return 0;
}
This will compile generating this assembly code:
Test3::Test3() [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
push r12
push rbx
sub rsp, 16
lea rax, [rbp-21]
mov ebx, 4
mov r12, rax
.L4:
test rbx, rbx
js .L3
mov rdi, r12
call Test3::Test3() [complete object constructor]
add r12, 1
sub rbx, 1
jmp .L4
.L3:
mov eax, 0
add rsp, 16
pop rbx
pop r12
pop rbp
ret
which can be seen here.
The bottom line is: when you declare a constructor of your own, the compiler will not automatically generate a default constructor for you, and if you want that behavior you must define it yourself. If you do not declare any constructors, you don't have to define one, and the compiler will automatically generate the default constructor for you.
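
The rule can also be checked at compile time with the standard std::is_default_constructible trait. This is only an illustrative sketch; the three class names are made up and do not appear in the answer above.

```cpp
#include <cassert>
#include <type_traits>

struct NoCtors {};                   // nothing declared: implicit default ctor
struct OnlyInt { OnlyInt(int) {} };  // declaring this suppresses the implicit one
struct Both {                        // default ctor explicitly restored
    Both() = default;
    Both(int) {}
};

static_assert(std::is_default_constructible<NoCtors>::value,
              "implicit default constructor exists");
static_assert(!std::is_default_constructible<OnlyInt>::value,
              "no default constructor is generated");
static_assert(std::is_default_constructible<Both>::value,
              "defaulted default constructor exists");
```

A class that fails the trait, such as OnlyInt here, is exactly the kind that cannot be used in a declaration like OnlyInt arr[5]; without an initializer list.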

Is this an old C++ style constructor?

Here is a piece of C++ code.
In this example, many of the code blocks look like constructor calls.
Unfortunately, code block #3 is not one (you can check it using https://godbolt.org/z/q3rsxn and https://cppinsights.io).
I think it is an old C++ notation, which could explain the introduction of the new C++11 construction notation using {} (cf. #4).
Do you have an explanation for what T(i) means, which is so close to constructor notation but behaves so differently?
struct T {
    T() { }
    T(int i) { }
};
int main() {
    int i = 42;
    { // #1
        T t(i); // new T named t using int ctor
    }
    { // #2
        T t = T(i); // new T named t using int ctor
    }
    { // #3
        T(i); // new T named i using default ctor
    }
    { // #4
        T{i}; // new T using int ctor (unnamed result)
    }
    { // #5
        T(2); // new T using int ctor (unnamed result)
    }
}
NB: thus, T(i) (#3) is equivalent to T i = T();

The statement:
T(i);
is equivalent to:
T i;
In other words, it declares a variable named i with type T. This is because parentheses are allowed in declarations in some places (in order to change the binding of declarators) and since this statement can be parsed as a declaration, it is a declaration (even though it might make more sense as an expression).
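
A small sketch can make the difference observable. The from_int member and the two function names are additions for illustration only; the struct otherwise follows the one in the question.

```cpp
#include <cassert>
#include <type_traits>

struct T {
    bool from_int;  // records which constructor ran
    T() : from_int(false) {}
    T(int) : from_int(true) {}
};

// Hypothetical helper: uses the `T(i);` statement from block #3.
bool paren_statement_used_int_ctor() {
    int i = 42;
    (void)i;  // the outer int is never read
    bool used_int_ctor;
    {
        T(i);  // a declaration: same as `T i;`, shadowing the outer int i
        static_assert(std::is_same<decltype(i), T>::value,
                      "inside this block, i names a T");
        used_int_ctor = i.from_int;
    }
    return used_int_ctor;
}

// Hypothetical helper: uses the braced form from block #4.
bool brace_expression_used_int_ctor() {
    int i = 42;
    return T{i}.from_int;  // an expression: temporary built from the int
}
```

The first helper reports that the default constructor ran, and the second reports that the int constructor ran, which matches the assembly the question refers to.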

You can use Compiler Explorer to see what happens in the assembly.
You can see that #1, #2, #4 and #5 do the same thing, but strangely #3 calls the other constructor (the base object constructor).
Does anyone have an explanation?
Assembler code:
T::T() [base object constructor]:
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
nop
pop rbp
ret
T::T(int):
push rbp
mov rbp, rsp
mov QWORD PTR [rbp-8], rdi
mov DWORD PTR [rbp-12], esi
nop
pop rbp
ret
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 42
// #1
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-7]
mov esi, edx
mov rdi, rax
call T::T(int)
// #2
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-8]
mov esi, edx
mov rdi, rax
call T::T(int)
// #3
lea rax, [rbp-9]
mov rdi, rax
call T::T() [complete object constructor]
// #4
mov edx, DWORD PTR [rbp-4]
lea rax, [rbp-6]
mov esi, edx
mov rdi, rax
call T::T(int)
// #5
lea rax, [rbp-5]
mov esi, 2
mov rdi, rax
call T::T(int)
mov eax, 0
leave
ret

What does Znwm and ZdlPv mean in assembly?

I'm new to assembly and I'm trying to figure out how C++ handles dynamic dispatch in assembly.
When looking through assembly code, I saw that there were 2 unusual calls:
call _Znwm
call _ZdlPv
These did not have a subroutine that I could trace them to. From examining the code, Znwm seemed to return the address of the object when its constructor was called, but I'm not sure about that. ZdlPv was in a block of code that could never be reached (it was jumped over).
C++:
Fruit * f;
f = new Apple();
x86:
# BB#1:
mov eax, 8
mov edi, eax
call _Znwm
mov rdi, rax
mov rcx, rax
.Ltmp6:
mov qword ptr [rbp - 48], rdi # 8-byte Spill
mov rdi, rax
mov qword ptr [rbp - 56], rcx # 8-byte Spill
call _ZN5AppleC2Ev
Any advice would be appreciated.
Thanks.

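_Znwm is operator new(unsigned long).
_ZdlPv is operator delete(void*).
Both are mangled names under the Itanium C++ ABI used by gcc and clang. They are defined in the C++ runtime library rather than in your own code, which is why you could not trace them to a subroutine. Likewise, _ZN5AppleC2Ev is the constructor Apple::Apple().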
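
Such names can be decoded programmatically. The sketch below assumes a gcc or clang toolchain that follows the Itanium C++ ABI, where the header cxxabi.h provides abi::__cxa_demangle; the wrapper name demangle is made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI
// symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result
    // with malloc, so it must be released with free.
    char* raw = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || raw == nullptr) {
        return mangled;
    }
    std::string result(raw);
    std::free(raw);
    return result;
}
```

On the command line, the c++filt tool from binutils does the same job for one-off lookups.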

Why do gcc and clang produce very different code for member function template parameters?

I am trying to understand what is going on when a member function pointer is used as a template parameter. I always thought that function pointers (or member function pointers) are a run-time concept, so I was wondering what happens when they are used as template parameters. For this reason I took a look at the output produced by this code:
struct Foo { void foo(int i){ } };
template <typename T,void (T::*F)(int)>
void callFunc(T& t){ (t.*F)(1); }
void callF(Foo& f){ f.foo(1);}
int main(){
    Foo f;
    callF(f);
    callFunc<Foo,&Foo::foo>(f);
}
where callF is for comparison. gcc 6.2 produces the exact same output for both functions:
callF(Foo&): // void callFunc<Foo, &Foo::foo>(Foo&):
push rbp
mov rbp, rsp
sub rsp, 16
mov QWORD PTR [rbp-8], rdi
mov rax, QWORD PTR [rbp-8]
mov esi, 1
mov rdi, rax
call Foo::foo(int)
nop
leave
ret
while clang 3.9 produces almost the same output for callF():
callF(Foo&): # #callF(Foo&)
push rbp
mov rbp, rsp
sub rsp, 16
mov esi, 1
mov qword ptr [rbp - 8], rdi
mov rdi, qword ptr [rbp - 8]
call Foo::foo(int)
add rsp, 16
pop rbp
ret
but very different output for the template instantiation:
void callFunc<Foo, &Foo::foo>(Foo&): # #void callFunc<Foo, &Foo::foo>(Foo&)
push rbp
mov rbp, rsp
sub rsp, 32
xor eax, eax
mov cl, al
mov qword ptr [rbp - 8], rdi
mov rdi, qword ptr [rbp - 8]
test cl, 1
mov qword ptr [rbp - 16], rdi # 8-byte Spill
jne .LBB3_1
jmp .LBB3_2
.LBB3_1:
movabs rax, Foo::foo(int)
sub rax, 1
mov rcx, qword ptr [rbp - 16] # 8-byte Reload
mov rdx, qword ptr [rcx]
mov rax, qword ptr [rdx + rax]
mov qword ptr [rbp - 24], rax # 8-byte Spill
jmp .LBB3_3
.LBB3_2:
movabs rax, Foo::foo(int)
mov qword ptr [rbp - 24], rax # 8-byte Spill
jmp .LBB3_3
.LBB3_3:
mov rax, qword ptr [rbp - 24] # 8-byte Reload
mov esi, 1
mov rdi, qword ptr [rbp - 16] # 8-byte Reload
call rax
add rsp, 32
pop rbp
ret
Why is that? Is gcc taking some (possibly non-standard) shortcut?

gcc was able to figure out what the template was doing and generated the simplest code possible; clang didn't. A compiler is permitted to perform any optimization as long as the observable results comply with the C++ specification. If that means optimizing away an intermediate function pointer, so be it. Nothing else in the code references the temporary function pointer, so it can be removed completely and the whole thing replaced with a simple function call.
gcc and clang are different compilers, written by different people, with different approaches and algorithms for compiling C++.
It is natural, and expected to see different results from different compilers. In this case, gcc was able to figure things out better than clang. I'm sure there are other situations where clang will be able to figure things out better than gcc.

This test was done without any optimizations requested.
One compiler generated more verbose unoptimized code.
Unoptimized code is, quite simply, uninteresting. It is intended to be correct and easy to debug, and it is derived directly from some intermediate representation that is easy to optimize.
The details of optimized code are what matter, barring a ridiculous and widespread slowdown that makes debugging painful.
There is nothing of interest to see or explain here.
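
Whatever the unoptimized assembly looks like, both call paths must have the same observable behavior. A minimal sketch of that check follows. Foo::foo is changed here to record its argument in a member named last, which is an addition for illustration, and the helper names via_template and via_direct are hypothetical.

```cpp
#include <cassert>

struct Foo {
    int last = 0;                   // added: remembers the last argument
    void foo(int i) { last = i; }
};

// The member function pointer is a compile-time template argument.
template <typename T, void (T::*F)(int)>
void callFunc(T& t) { (t.*F)(1); }

// Plain direct call, for comparison.
void callF(Foo& f) { f.foo(1); }

int via_template() {
    Foo f;
    callFunc<Foo, &Foo::foo>(f);
    return f.last;
}

int via_direct() {
    Foo f;
    callF(f);
    return f.last;
}
```

Because F is known at compile time, a compiler is free to turn (t.*F)(1) into a direct call. Whether it does so without optimization flags is an implementation detail, as the answers above point out.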
