C++ memory leak in template specialization [duplicate] - c++

EDIT:
I've voted to close this is it is now incorrect.
In March 2016 Valgrind gained an option "--run-cxx-freeres=<yes|no>" (default is yes). This will call a libstdc++ function to free one-off allocations used for things like iostream. If you are using a post-2016 Valgrind and libstdc++ you will get
==9356== HEAP SUMMARY:
==9356== in use at exit: 0 bytes in 0 blocks
==9356== total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated
==9356==
==9356== All heap blocks were freed -- no leaks are possible
ORIGINAL POST:
Take the following trivial program:
#include <iostream>
int main() {
return 0;
}
If I run this using valgrind, I'm told that there are 72,704 bytes in 1 blocks that are still reachable. There have been extensive discussions on SO about whether or not to worry about still reachable warnings--I'm not concerned about that. I'd just like to understand how simply including a standard library header could cause a still reachable warning, when none of the objects from that library were allocated in the program itself.
Here is the full valgrind output:
$ valgrind --leak-check=full --track-origins=yes --show-reachable=yes ./ValgrindTest
==27671== Memcheck, a memory error detector
==27671== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==27671== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==27671== Command: ./ValgrindTest
==27671==
==27671==
==27671== HEAP SUMMARY:
==27671== in use at exit: 72,704 bytes in 1 blocks
==27671== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==27671==
==27671== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==27671== at 0x4C2AB9D: malloc (vg_replace_malloc.c:296)
==27671== by 0x4EC060F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==27671== by 0x400F305: call_init.part.0 (dl-init.c:85)
==27671== by 0x400F3DE: call_init (dl-init.c:52)
==27671== by 0x400F3DE: _dl_init (dl-init.c:134)
==27671== by 0x40016E9: ??? (in /lib/x86_64-linux-gnu/ld-2.15.so)
==27671==
==27671== LEAK SUMMARY:
==27671== definitely lost: 0 bytes in 0 blocks
==27671== indirectly lost: 0 bytes in 0 blocks
==27671== possibly lost: 0 bytes in 0 blocks
==27671== still reachable: 72,704 bytes in 1 blocks
==27671== suppressed: 0 bytes in 0 blocks
==27671==
==27671== For counts of detected and suppressed errors, rerun with: -v
==27671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
And an object dump:
$ objdump -d ValgrindTest
ValgrindTest: file format elf64-x86-64
Disassembly of section .init:
0000000000400718 <_init>:
400718: 48 83 ec 08 sub $0x8,%rsp
40071c: e8 8b 00 00 00 callq 4007ac <call_gmon_start>
400721: 48 83 c4 08 add $0x8,%rsp
400725: c3 retq
Disassembly of section .plt:
0000000000400730 <_ZNSt8ios_base4InitC1Ev#plt-0x10>:
400730: ff 35 ba 08 20 00 pushq 0x2008ba(%rip) # 600ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
400736: ff 25 bc 08 20 00 jmpq *0x2008bc(%rip) # 600ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
40073c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400740 <_ZNSt8ios_base4InitC1Ev#plt>:
400740: ff 25 ba 08 20 00 jmpq *0x2008ba(%rip) # 601000 <_GLOBAL_OFFSET_TABLE_+0x18>
400746: 68 00 00 00 00 pushq $0x0
40074b: e9 e0 ff ff ff jmpq 400730 <_init+0x18>
0000000000400750 <__libc_start_main#plt>:
400750: ff 25 b2 08 20 00 jmpq *0x2008b2(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x20>
400756: 68 01 00 00 00 pushq $0x1
40075b: e9 d0 ff ff ff jmpq 400730 <_init+0x18>
0000000000400760 <__cxa_atexit#plt>:
400760: ff 25 aa 08 20 00 jmpq *0x2008aa(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x28>
400766: 68 02 00 00 00 pushq $0x2
40076b: e9 c0 ff ff ff jmpq 400730 <_init+0x18>
0000000000400770 <_ZNSt8ios_base4InitD1Ev#plt>:
400770: ff 25 a2 08 20 00 jmpq *0x2008a2(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x30>
400776: 68 03 00 00 00 pushq $0x3
40077b: e9 b0 ff ff ff jmpq 400730 <_init+0x18>
Disassembly of section .text:
0000000000400780 <_start>:
400780: 31 ed xor %ebp,%ebp
400782: 49 89 d1 mov %rdx,%r9
400785: 5e pop %rsi
400786: 48 89 e2 mov %rsp,%rdx
400789: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40078d: 50 push %rax
40078e: 54 push %rsp
40078f: 49 c7 c0 80 09 40 00 mov $0x400980,%r8
400796: 48 c7 c1 f0 08 40 00 mov $0x4008f0,%rcx
40079d: 48 c7 c7 90 08 40 00 mov $0x400890,%rdi
4007a4: e8 a7 ff ff ff callq 400750 <__libc_start_main#plt>
4007a9: f4 hlt
4007aa: 90 nop
4007ab: 90 nop
00000000004007ac <call_gmon_start>:
4007ac: 48 83 ec 08 sub $0x8,%rsp
4007b0: 48 8b 05 29 08 20 00 mov 0x200829(%rip),%rax # 600fe0 <_DYNAMIC+0x1f0>
4007b7: 48 85 c0 test %rax,%rax
4007ba: 74 02 je 4007be <call_gmon_start+0x12>
4007bc: ff d0 callq *%rax
4007be: 48 83 c4 08 add $0x8,%rsp
4007c2: c3 retq
4007c3: 90 nop
4007c4: 90 nop
4007c5: 90 nop
4007c6: 90 nop
4007c7: 90 nop
4007c8: 90 nop
4007c9: 90 nop
4007ca: 90 nop
4007cb: 90 nop
4007cc: 90 nop
4007cd: 90 nop
4007ce: 90 nop
4007cf: 90 nop
00000000004007d0 <deregister_tm_clones>:
4007d0: b8 37 10 60 00 mov $0x601037,%eax
4007d5: 55 push %rbp
4007d6: 48 2d 30 10 60 00 sub $0x601030,%rax
4007dc: 48 83 f8 0e cmp $0xe,%rax
4007e0: 48 89 e5 mov %rsp,%rbp
4007e3: 77 02 ja 4007e7 <deregister_tm_clones+0x17>
4007e5: 5d pop %rbp
4007e6: c3 retq
4007e7: b8 00 00 00 00 mov $0x0,%eax
4007ec: 48 85 c0 test %rax,%rax
4007ef: 74 f4 je 4007e5 <deregister_tm_clones+0x15>
4007f1: 5d pop %rbp
4007f2: bf 30 10 60 00 mov $0x601030,%edi
4007f7: ff e0 jmpq *%rax
4007f9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000400800 <register_tm_clones>:
400800: b8 30 10 60 00 mov $0x601030,%eax
400805: 55 push %rbp
400806: 48 2d 30 10 60 00 sub $0x601030,%rax
40080c: 48 c1 f8 03 sar $0x3,%rax
400810: 48 89 e5 mov %rsp,%rbp
400813: 48 89 c2 mov %rax,%rdx
400816: 48 c1 ea 3f shr $0x3f,%rdx
40081a: 48 01 d0 add %rdx,%rax
40081d: 48 d1 f8 sar %rax
400820: 75 02 jne 400824 <register_tm_clones+0x24>
400822: 5d pop %rbp
400823: c3 retq
400824: ba 00 00 00 00 mov $0x0,%edx
400829: 48 85 d2 test %rdx,%rdx
40082c: 74 f4 je 400822 <register_tm_clones+0x22>
40082e: 5d pop %rbp
40082f: 48 89 c6 mov %rax,%rsi
400832: bf 30 10 60 00 mov $0x601030,%edi
400837: ff e2 jmpq *%rdx
400839: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000400840 <__do_global_dtors_aux>:
400840: 80 3d e9 07 20 00 00 cmpb $0x0,0x2007e9(%rip) # 601030 <__bss_start>
400847: 75 11 jne 40085a <__do_global_dtors_aux+0x1a>
400849: 55 push %rbp
40084a: 48 89 e5 mov %rsp,%rbp
40084d: e8 7e ff ff ff callq 4007d0 <deregister_tm_clones>
400852: 5d pop %rbp
400853: c6 05 d6 07 20 00 01 movb $0x1,0x2007d6(%rip) # 601030 <__bss_start>
40085a: f3 c3 repz retq
40085c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400860 <frame_dummy>:
400860: 48 83 3d 80 05 20 00 cmpq $0x0,0x200580(%rip) # 600de8 <__JCR_END__>
400867: 00
400868: 74 1e je 400888 <frame_dummy+0x28>
40086a: b8 00 00 00 00 mov $0x0,%eax
40086f: 48 85 c0 test %rax,%rax
400872: 74 14 je 400888 <frame_dummy+0x28>
400874: 55 push %rbp
400875: bf e8 0d 60 00 mov $0x600de8,%edi
40087a: 48 89 e5 mov %rsp,%rbp
40087d: ff d0 callq *%rax
40087f: 5d pop %rbp
400880: e9 7b ff ff ff jmpq 400800 <register_tm_clones>
400885: 0f 1f 00 nopl (%rax)
400888: e9 73 ff ff ff jmpq 400800 <register_tm_clones>
40088d: 90 nop
40088e: 90 nop
40088f: 90 nop
0000000000400890 <main>:
400890: 55 push %rbp
400891: 48 89 e5 mov %rsp,%rbp
400894: b8 00 00 00 00 mov $0x0,%eax
400899: 5d pop %rbp
40089a: c3 retq
000000000040089b <_Z41__static_initialization_and_destruction_0ii>:
40089b: 55 push %rbp
40089c: 48 89 e5 mov %rsp,%rbp
40089f: 48 83 ec 10 sub $0x10,%rsp
4008a3: 89 7d fc mov %edi,-0x4(%rbp)
4008a6: 89 75 f8 mov %esi,-0x8(%rbp)
4008a9: 83 7d fc 01 cmpl $0x1,-0x4(%rbp)
4008ad: 75 27 jne 4008d6 <_Z41__static_initialization_and_destruction_0ii+0x3b>
4008af: 81 7d f8 ff ff 00 00 cmpl $0xffff,-0x8(%rbp)
4008b6: 75 1e jne 4008d6 <_Z41__static_initialization_and_destruction_0ii+0x3b>
4008b8: bf 34 10 60 00 mov $0x601034,%edi
4008bd: e8 7e fe ff ff callq 400740 <_ZNSt8ios_base4InitC1Ev#plt>
4008c2: ba 28 10 60 00 mov $0x601028,%edx
4008c7: be 34 10 60 00 mov $0x601034,%esi
4008cc: bf 70 07 40 00 mov $0x400770,%edi
4008d1: e8 8a fe ff ff callq 400760 <__cxa_atexit#plt>
4008d6: c9 leaveq
4008d7: c3 retq
00000000004008d8 <_GLOBAL__sub_I_main>:
4008d8: 55 push %rbp
4008d9: 48 89 e5 mov %rsp,%rbp
4008dc: be ff ff 00 00 mov $0xffff,%esi
4008e1: bf 01 00 00 00 mov $0x1,%edi
4008e6: e8 b0 ff ff ff callq 40089b <_Z41__static_initialization_and_destruction_0ii>
4008eb: 5d pop %rbp
4008ec: c3 retq
4008ed: 90 nop
4008ee: 90 nop
4008ef: 90 nop
00000000004008f0 <__libc_csu_init>:
4008f0: 48 89 6c 24 d8 mov %rbp,-0x28(%rsp)
4008f5: 4c 89 64 24 e0 mov %r12,-0x20(%rsp)
4008fa: 48 8d 2d df 04 20 00 lea 0x2004df(%rip),%rbp # 600de0 <__init_array_end>
400901: 4c 8d 25 c8 04 20 00 lea 0x2004c8(%rip),%r12 # 600dd0 <__frame_dummy_init_array_entry>
400908: 4c 89 6c 24 e8 mov %r13,-0x18(%rsp)
40090d: 4c 89 74 24 f0 mov %r14,-0x10(%rsp)
400912: 4c 89 7c 24 f8 mov %r15,-0x8(%rsp)
400917: 48 89 5c 24 d0 mov %rbx,-0x30(%rsp)
40091c: 48 83 ec 38 sub $0x38,%rsp
400920: 4c 29 e5 sub %r12,%rbp
400923: 41 89 fd mov %edi,%r13d
400926: 49 89 f6 mov %rsi,%r14
400929: 48 c1 fd 03 sar $0x3,%rbp
40092d: 49 89 d7 mov %rdx,%r15
400930: e8 e3 fd ff ff callq 400718 <_init>
400935: 48 85 ed test %rbp,%rbp
400938: 74 1c je 400956 <__libc_csu_init+0x66>
40093a: 31 db xor %ebx,%ebx
40093c: 0f 1f 40 00 nopl 0x0(%rax)
400940: 4c 89 fa mov %r15,%rdx
400943: 4c 89 f6 mov %r14,%rsi
400946: 44 89 ef mov %r13d,%edi
400949: 41 ff 14 dc callq *(%r12,%rbx,8)
40094d: 48 83 c3 01 add $0x1,%rbx
400951: 48 39 eb cmp %rbp,%rbx
400954: 75 ea jne 400940 <__libc_csu_init+0x50>
400956: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx
40095b: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp
400960: 4c 8b 64 24 18 mov 0x18(%rsp),%r12
400965: 4c 8b 6c 24 20 mov 0x20(%rsp),%r13
40096a: 4c 8b 74 24 28 mov 0x28(%rsp),%r14
40096f: 4c 8b 7c 24 30 mov 0x30(%rsp),%r15
400974: 48 83 c4 38 add $0x38,%rsp
400978: c3 retq
400979: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000400980 <__libc_csu_fini>:
400980: f3 c3 repz retq
400982: 90 nop
400983: 90 nop
Disassembly of section .fini:
0000000000400984 <_fini>:
400984: 48 83 ec 08 sub $0x8,%rsp
400988: 48 83 c4 08 add $0x8,%rsp
40098c: c3 retq
For completeness, I'm using:
Ubuntu: 12.04
Valgrind: 3.10.1 3.7.0
g++: 4.8.1
NB: As a side note, this does not happen when I include other headers such as <fstream> or <cmath>.

It's Valgrind's fault. First, -fsanitize=leak does not show anything. Second, Valgrind itself states that:
First of all: relax, it's probably not a bug, but a feature. Many
implementations of the C++ standard libraries use their own memory
pool allocators. Memory for quite a number of destructed objects is
not immediately freed and given back to the OS, but kept in the
pool(s) for later re-use. The fact that the pools are not freed at the
exit of the program cause Valgrind to report this memory as still
reachable. The behaviour not to free pools at the exit could be called
a bug of the library though.
Using GCC, you can force the STL to use malloc and to free memory as
soon as possible by globally disabling memory caching. Beware! Doing
so will probably slow down your program, sometimes drastically.
With GCC 2.91, 2.95, 3.0 and 3.1, compile all source using the STL
with -D__USE_MALLOC. Beware! This was removed from GCC starting with
version 3.3.
With GCC 3.2.2 and later, you should export the environment variable
GLIBCPP_FORCE_NEW before running your program.
With GCC 3.4 and later, that variable has changed name to
GLIBCXX_FORCE_NEW.
[...]
I guess those alleged memory pools are freed after program's termination, in the so-called start-up code that calls main, among the other settings. Internal functions defined outside user's code should be treated as if they didn't exist, that's why Valgrind can't (and shouldn't) see further frees.

Consider the following trivial include file:
#ifndef TRIVIAL_INCLUDE_FILE
#define TRIVIAL_INCLUDE_FILE
static int *x = new int (0);
#endif

For gcc 6 and higher, a related bug fix arrived:
Bug report: "Warning about "still reachable" memory when using libstdc++ from gcc 5"
Discussion on the Arch Linux bug tracker
With gcc 5, you can also get the same warning without including iostream.
So, if you see a similar warning referring to dl-init.c and you are using gcc 5, consider upgrading to a newer version (gcc >=6), or try to compile with clang.

Related

How are non-static, non-virtual methods implemented in C++?

I wanted to know how methods are implemented in C++. I wanted to know how methods are implemented "under the hood".
So, I have made a simple C++ program which has a class with 1 non static field and 1 non static, non virtual method.
Then I instantiated the class in the main function and called the method. I have used objdump -d option in order to see the CPU instructions of this program. I have a x86-64 processor.
Here's the code:
#include<stdio.h>
class TestClass {
public:
int x;
int xPlus2(){
return x + 2;
}
};
int main(){
TestClass tc1 = {5};
int variable = tc1.xPlus2();
printf("%d \n", variable);
return 0;
}
Here are instructions for the method xPlus2:
0000000000402c30 <_ZN9TestClass6xPlus2Ev>:
402c30: 55 push %rbp
402c31: 48 89 e5 mov %rsp,%rbp
402c34: 48 89 4d 10 mov %rcx,0x10(%rbp)
402c38: 48 8b 45 10 mov 0x10(%rbp),%rax
402c3c: 8b 00 mov (%rax),%eax
402c3e: 83 c0 02 add $0x2,%eax
402c41: 5d pop %rbp
402c42: c3 retq
402c43: 90 nop
402c44: 90 nop
402c45: 90 nop
402c46: 90 nop
402c47: 90 nop
402c48: 90 nop
402c49: 90 nop
402c4a: 90 nop
402c4b: 90 nop
402c4c: 90 nop
402c4d: 90 nop
402c4e: 90 nop
402c4f: 90 nop
If I understand it correctly, these instructions can be replaced by just 3 instructions, because I believe that I don't need to use the stack, I think the compiler used it redundantly:
mov (%rcx), eax
add $2, eax
retq
and then maybe I still need lots of nop instructions for synchronization purposes or whatnot. If you look at the CPU instructions, it looks like the value that x field has is stored at the location in memory which rcx register holds. You will see the rest of the CPU instructions in a moment. It is a little bit hard for me to track what has happened here (especially what is going on with the call of _main function), I don't even know what parts of assembly are important to look at. Compiler produces main function (as I expected), but then it also produced _main function which is called from the main, there are some weird functions in between those two as well.
Here are other parts of the assembly that I think may be interesting:
0000000000401550 <main>:
401550: 55 push %rbp
401551: 48 89 e5 mov %rsp,%rbp
401554: 48 83 ec 30 sub $0x30,%rsp
401558: e8 e3 00 00 00 callq 401640 <__main>
40155d: c7 45 f8 05 00 00 00 movl $0x5,-0x8(%rbp)
401564: 48 8d 45 f8 lea -0x8(%rbp),%rax
401568: 48 89 c1 mov %rax,%rcx
40156b: e8 c0 16 00 00 callq 402c30 <_ZN9TestClass6xPlus2Ev>
401570: 89 45 fc mov %eax,-0x4(%rbp)
401573: 8b 45 fc mov -0x4(%rbp),%eax
401576: 89 c2 mov %eax,%edx
401578: 48 8d 0d 81 2a 00 00 lea 0x2a81(%rip),%rcx # 404000 <.rdata>
40157f: e8 ec 14 00 00 callq 402a70 <printf>
401584: b8 00 00 00 00 mov $0x0,%eax
401589: 48 83 c4 30 add $0x30,%rsp
40158d: 5d pop %rbp
40158e: c3 retq
40158f: 90 nop
0000000000401590 <__do_global_dtors>:
401590: 48 83 ec 28 sub $0x28,%rsp
401594: 48 8b 05 75 1a 00 00 mov 0x1a75(%rip),%rax # 403010 <p.93846>
40159b: 48 8b 00 mov (%rax),%rax
40159e: 48 85 c0 test %rax,%rax
4015a1: 74 1d je 4015c0 <__do_global_dtors+0x30>
4015a3: ff d0 callq *%rax
4015a5: 48 8b 05 64 1a 00 00 mov 0x1a64(%rip),%rax # 403010 <p.93846>
4015ac: 48 8d 50 08 lea 0x8(%rax),%rdx
4015b0: 48 8b 40 08 mov 0x8(%rax),%rax
4015b4: 48 89 15 55 1a 00 00 mov %rdx,0x1a55(%rip) # 403010 <p.93846>
4015bb: 48 85 c0 test %rax,%rax
4015be: 75 e3 jne 4015a3 <__do_global_dtors+0x13>
4015c0: 48 83 c4 28 add $0x28,%rsp
4015c4: c3 retq
4015c5: 90 nop
4015c6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
4015cd: 00 00 00
00000000004015d0 <__do_global_ctors>:
4015d0: 56 push %rsi
4015d1: 53 push %rbx
4015d2: 48 83 ec 28 sub $0x28,%rsp
4015d6: 48 8b 0d 23 2d 00 00 mov 0x2d23(%rip),%rcx # 404300 <.refptr.__CTOR_LIST__>
4015dd: 48 8b 11 mov (%rcx),%rdx
4015e0: 83 fa ff cmp $0xffffffff,%edx
4015e3: 89 d0 mov %edx,%eax
4015e5: 74 39 je 401620 <__do_global_ctors+0x50>
4015e7: 85 c0 test %eax,%eax
4015e9: 74 20 je 40160b <__do_global_ctors+0x3b>
4015eb: 89 c2 mov %eax,%edx
4015ed: 83 e8 01 sub $0x1,%eax
4015f0: 48 8d 1c d1 lea (%rcx,%rdx,8),%rbx
4015f4: 48 29 c2 sub %rax,%rdx
4015f7: 48 8d 74 d1 f8 lea -0x8(%rcx,%rdx,8),%rsi
4015fc: 0f 1f 40 00 nopl 0x0(%rax)
401600: ff 13 callq *(%rbx)
401602: 48 83 eb 08 sub $0x8,%rbx
401606: 48 39 f3 cmp %rsi,%rbx
401609: 75 f5 jne 401600 <__do_global_ctors+0x30>
40160b: 48 8d 0d 7e ff ff ff lea -0x82(%rip),%rcx # 401590 <__do_global_dtors>
401612: 48 83 c4 28 add $0x28,%rsp
401616: 5b pop %rbx
401617: 5e pop %rsi
401618: e9 f3 fe ff ff jmpq 401510 <atexit>
40161d: 0f 1f 00 nopl (%rax)
401620: 31 c0 xor %eax,%eax
401622: eb 02 jmp 401626 <__do_global_ctors+0x56>
401624: 89 d0 mov %edx,%eax
401626: 44 8d 40 01 lea 0x1(%rax),%r8d
40162a: 4a 83 3c c1 00 cmpq $0x0,(%rcx,%r8,8)
40162f: 4c 89 c2 mov %r8,%rdx
401632: 75 f0 jne 401624 <__do_global_ctors+0x54>
401634: eb b1 jmp 4015e7 <__do_global_ctors+0x17>
401636: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40163d: 00 00 00
0000000000401640 <__main>:
401640: 8b 05 ea 59 00 00 mov 0x59ea(%rip),%eax # 407030 <initialized>
401646: 85 c0 test %eax,%eax
401648: 74 06 je 401650 <__main+0x10>
40164a: c3 retq
40164b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
401650: c7 05 d6 59 00 00 01 movl $0x1,0x59d6(%rip) # 407030 <initialized>
401657: 00 00 00
40165a: e9 71 ff ff ff jmpq 4015d0 <__do_global_ctors>
40165f: 90 nop
I think what you are looking for are these instructions:
40155d: c7 45 f8 05 00 00 00 movl $0x5,-0x8(%rbp)
401564: 48 8d 45 f8 lea -0x8(%rbp),%rax
401568: 48 89 c1 mov %rax,%rcx
40156b: e8 c0 16 00 00 callq 402c30 <_ZN9TestClass6xPlus2Ev>
401570: 89 45 fc mov %eax,-0x4(%rbp)
These match with the code from main:
TestClass tc1 = {5};
int variable = tc1.xPlus2();
At address 40155d the field tc1.x is initialized with the value 5.
At address 401564 the pointer to tc1 is loaded into the register %rax
At address 401568 the pointer to tc1 is copied into the register %rcx
At address 40156b is the call of the method tc1.xPlus2()
At address 401570 the result is store in variable
Your observations are mostly correct. rcx holds the this pointer to the object on which the method was called. x is stored in the first area of memory that the this pointer points to, so that is why rcx was dereferenced and the result added to. It is the responsibility of the caller to make sure that rcx is the address of the object before invoking the function. We can see main prepare rcx by setting it to an address in its stack frame. You are correct that the compiler produced inefficient code here and did not need to use the stack. Compiling with higher optimization levels -O1, -O2, or -O3 will likely fix that. These higher optimizations will probably get rid of the nops too, since they are used for function alignment. You can mostly ignore __main. It's used for libc initialization.

What does assembly code of function "do_compare" exactly do?

The do_compare function is in the libstdc++ library. It basically checks two strings and returns -1, 1, or 0 accordingly.
Here is the C++ code:
template<typename _CharT>
int
collate<_CharT>::
do_compare(const _CharT* __lo1, const _CharT* __hi1,
const _CharT* __lo2, const _CharT* __hi2) const
{
// strcoll assumes zero-terminated strings so we make a copy
// and then put a zero at the end.
const string_type __one(__lo1, __hi1);
const string_type __two(__lo2, __hi2);
const _CharT* __p = __one.c_str();
const _CharT* __pend = __one.data() + __one.length();
const _CharT* __q = __two.c_str();
const _CharT* __qend = __two.data() + __two.length();
// strcoll stops when it sees a nul character so we break
// the strings into zero-terminated substrings and pass those
// to strcoll.
for (;;)
{
const int __res = _M_compare(__p, __q);
if (__res)
return __res;
__p += char_traits<_CharT>::length(__p);
__q += char_traits<_CharT>::length(__q);
if (__p == __pend && __q == __qend)
return 0;
else if (__p == __pend)
return -1;
else if (__q == __qend)
return 1;
__p++;
__q++;
}
}
I have to put the entire assembly code of do_compare to show my problem, sorry:
0000000000101c40 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4>:
101c40: 41 57 push %r15
101c42: 41 56 push %r14
101c44: 49 89 fe mov %rdi,%r14
101c47: 48 89 f7 mov %rsi,%rdi
101c4a: 48 89 d6 mov %rdx,%rsi
101c4d: 41 55 push %r13
101c4f: 41 54 push %r12
101c51: 55 push %rbp
101c52: 4c 89 c5 mov %r8,%rbp
101c55: 53 push %rbx
101c56: 48 89 cb mov %rcx,%rbx
101c59: 48 83 ec 38 sub $0x38,%rsp
101c5d: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
101c64: 00 00
101c66: 48 89 44 24 28 mov %rax,0x28(%rsp)
101c6b: 31 c0 xor %eax,%eax
101c6d: 4c 8d 6c 24 27 lea 0x27(%rsp),%r13
101c72: 4c 89 ea mov %r13,%rdx
101c75: 4c 89 6c 24 18 mov %r13,0x18(%rsp)
101c7a: e8 f1 a2 f8 ff callq 8bf70 <_ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag#plt>
101c7f: 4c 89 ea mov %r13,%rdx
101c82: 48 89 ee mov %rbp,%rsi
101c85: 48 89 df mov %rbx,%rdi
101c88: 49 89 c7 mov %rax,%r15
101c8b: 48 89 44 24 08 mov %rax,0x8(%rsp)
101c90: e8 db a2 f8 ff callq 8bf70 <_ZNSs12_S_constructIPKcEEPcT_S3_RKSaIcESt20forward_iterator_tag#plt>
101c95: 4d 8b 67 e8 mov -0x18(%r15),%r12
101c99: 4c 8b 68 e8 mov -0x18(%rax),%r13
101c9d: 48 89 c5 mov %rax,%rbp
101ca0: 48 89 44 24 10 mov %rax,0x10(%rsp)
101ca5: 4c 89 fb mov %r15,%rbx
101ca8: 4d 01 fc add %r15,%r12
101cab: 49 01 c5 add %rax,%r13
101cae: eb 32 jmp 101ce2 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0xa2>
101cb0: 48 89 df mov %rbx,%rdi
101cb3: e8 98 87 f8 ff callq 8a450 <strlen#plt>
101cb8: 48 89 ef mov %rbp,%rdi
101cbb: 48 01 c3 add %rax,%rbx
101cbe: e8 8d 87 f8 ff callq 8a450 <strlen#plt>
101cc3: 48 01 c5 add %rax,%rbp
101cc6: 49 39 dc cmp %rbx,%r12
101cc9: 75 05 jne 101cd0 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x90>
101ccb: 49 39 ed cmp %rbp,%r13
101cce: 74 27 je 101cf7 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0xb7>
101cd0: 49 39 dc cmp %rbx,%r12
101cd3: 74 6b je 101d40 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x100>
101cd5: 49 39 ed cmp %rbp,%r13
101cd8: 74 76 je 101d50 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x110>
101cda: 48 83 c3 01 add $0x1,%rbx
101cde: 48 83 c5 01 add $0x1,%rbp
101ce2: 48 89 ea mov %rbp,%rdx
101ce5: 48 89 de mov %rbx,%rsi
101ce8: 4c 89 f7 mov %r14,%rdi
101ceb: e8 20 8b f8 ff callq 8a810 <_ZNKSt7collateIcE10_M_compareEPKcS2_#plt>
101cf0: 41 89 c7 mov %eax,%r15d
101cf3: 85 c0 test %eax,%eax
101cf5: 74 b9 je 101cb0 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x70>
101cf7: 48 8b 7c 24 10 mov 0x10(%rsp),%rdi
101cfc: 48 8b 1d 9d 08 28 00 mov 0x28089d(%rip),%rbx # 3825a0 <_ZNSs4_Rep20_S_empty_rep_storageE##GLIBCXX_3.4-0x57e0>
101d03: 48 83 ef 18 sub $0x18,%rdi
101d07: 48 39 df cmp %rbx,%rdi
101d0a: 75 54 jne 101d60 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x120>
101d0c: 48 8b 7c 24 08 mov 0x8(%rsp),%rdi
101d11: 48 83 ef 18 sub $0x18,%rdi
101d15: 48 39 df cmp %rbx,%rdi
101d18: 75 56 jne 101d70 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x130>
101d1a: 48 8b 4c 24 28 mov 0x28(%rsp),%rcx
101d1f: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
101d26: 00 00
101d28: 44 89 f8 mov %r15d,%eax
101d2b: 75 4f jne 101d7c <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x13c>
101d2d: 48 83 c4 38 add $0x38,%rsp
101d31: 5b pop %rbx
101d32: 5d pop %rbp
101d33: 41 5c pop %r12
101d35: 41 5d pop %r13
101d37: 41 5e pop %r14
101d39: 41 5f pop %r15
101d3b: c3 retq
101d3c: 0f 1f 40 00 nopl 0x0(%rax)
101d40: 41 bf ff ff ff ff mov $0xffffffff,%r15d
101d46: eb af jmp 101cf7 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0xb7>
101d48: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101d4f: 00
101d50: 41 bf 01 00 00 00 mov $0x1,%r15d
101d56: eb 9f jmp 101cf7 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0xb7>
101d58: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101d5f: 00
101d60: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
101d65: e8 96 fe ff ff callq 101c00 <_ZNSt14codecvt_bynameIcc11__mbstate_tED0Ev##GLIBCXX_3.4+0x20>
101d6a: eb a0 jmp 101d0c <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0xcc>
101d6c: 0f 1f 40 00 nopl 0x0(%rax)
101d70: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
101d75: e8 86 fe ff ff callq 101c00 <_ZNSt14codecvt_bynameIcc11__mbstate_tED0Ev##GLIBCXX_3.4+0x20>
101d7a: eb 9e jmp 101d1a <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0xda>
101d7c: e8 7f 95 f8 ff callq 8b300 <__stack_chk_fail#plt>
101d81: 48 89 c3 mov %rax,%rbx
101d84: 48 8b 7c 24 08 mov 0x8(%rsp),%rdi
101d89: 48 83 ef 18 sub $0x18,%rdi
101d8d: 48 3b 3d 0c 08 28 00 cmp 0x28080c(%rip),%rdi # 3825a0 <_ZNSs4_Rep20_S_empty_rep_storageE##GLIBCXX_3.4-0x57e0>
101d94: 74 0a je 101da0 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x160>
101d96: 48 8b 74 24 18 mov 0x18(%rsp),%rsi
101d9b: e8 60 fe ff ff callq 101c00 <_ZNSt14codecvt_bynameIcc11__mbstate_tED0Ev##GLIBCXX_3.4+0x20>
101da0: 48 89 df mov %rbx,%rdi
101da3: e8 e8 a1 f8 ff callq 8bf90 <_Unwind_Resume#plt>
101da8: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
101daf: 00
*******101db0: 53 push %rbx
101db1: 48 89 fb mov %rdi,%rbx
101db4: 48 8b 3f mov (%rdi),%rdi
101db7: 89 f0 mov %esi,%eax
101db9: 48 85 ff test %rdi,%rdi
101dbc: 74 05 je 101dc3 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x183>
101dbe: 83 fe ff cmp $0xffffffff,%esi
101dc1: 74 05 je 101dc8 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x188>
101dc3: 5b pop %rbx
101dc4: c3 retq
101dc5: 0f 1f 00 nopl (%rax)
101dc8: 48 8b 47 10 mov 0x10(%rdi),%rax
101dcc: 48 3b 47 18 cmp 0x18(%rdi),%rax
101dd0: 73 0e jae 101de0 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x1a0>
101dd2: 0f b6 00 movzbl (%rax),%eax
101dd5: 5b pop %rbx
101dd6: c3 retq
101dd7: 66 0f 1f 84 00 00 00 nopw 0x0(%rax,%rax,1)
101dde: 00 00
101de0: 48 8b 07 mov (%rdi),%rax
101de3: ff 50 48 callq *0x48(%rax)
101de6: 83 f8 ff cmp $0xffffffff,%eax
101de9: 75 d8 jne 101dc3 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x183>
101deb: 48 c7 03 00 00 00 00 movq $0x0,(%rbx)
101df2: 5b pop %rbx
101df3: c3 retq
101df4: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
101dfb: 00 00 00
101dfe: 66 90 xchg %ax,%ax
101e00: 55 push %rbp
101e01: 89 f5 mov %esi,%ebp
101e03: 53 push %rbx
101e04: 48 89 fb mov %rdi,%rbx
101e07: 48 83 ec 08 sub $0x8,%rsp
101e0b: e8 b0 88 f8 ff callq 8a6c0 <_ZNKSt5ctypeIcE13_M_widen_initEv#plt>
101e10: 48 8b 03 mov (%rbx),%rax
101e13: 48 8b 40 30 mov 0x30(%rax),%rax
101e17: 48 3b 05 7a 11 28 00 cmp 0x28117a(%rip),%rax # 382f98 <_ZNKSt5ctypeIcE8do_widenEc##GLIBCXX_3.4+0x2e2c48>
101e1e: 75 10 jne 101e30 <_ZNKSt7collateIcE10do_compareEPKcS2_S2_S2_##GLIBCXX_3.4+0x1f0>
101e20: 48 83 c4 08 add $0x8,%rsp
101e24: 89 e8 mov %ebp,%eax
101e26: 5b pop %rbx
101e27: 5d pop %rbp
101e28: c3 retq
101e29: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
101e30: 48 83 c4 08 add $0x8,%rsp
101e34: 40 0f be f5 movsbl %bpl,%esi
101e38: 48 89 df mov %rbx,%rdi
101e3b: 5b pop %rbx
101e3c: 5d pop %rbp
101e3d: ff e0 jmpq *%rax
101e3f: 90 nop
It seems to me that the assembly code not only performs the C++ code logic but also adds other logic.
As an example, the function _M_extract_int in libstdc++ which coverts a char to int calls this function as the following:
callq 0x101db0
The instruction address 0x101db0 is in the middle of the assembly code. The code section from 0x101db0 to 0x101dbc seems to have nothing to do with the above C++ code. Really confused about what is going on here...

How std::memory_order_XXX works

I don't understand how std::memory_order_XXX(like memory_order_release/memory_order_acquire ...) works.
From some documents, it shows that these memory mode have different feature, but I'm really confused that they have the same assemble code, what determined the differences?
That code:
static std::atomic<long> gt;
void test1() {
gt.store(1, std::memory_order_release);
gt.store(2, std::memory_order_relaxed);
gt.load(std::memory_order_acquire);
gt.load(std::memory_order_relaxed);
}
Corresponds to:
00000000000007a0 <_Z5test1v>:
7a0: 55 push %rbp
7a1: 48 89 e5 mov %rsp,%rbp
7a4: 48 83 ec 30 sub $0x30,%rsp
**memory_order_release:
7a8: 48 c7 45 f8 01 00 00 movq $0x1,-0x8(%rbp)
7af: 00
7b0: c7 45 e8 03 00 00 00 movl $0x3,-0x18(%rbp)
7b7: 8b 45 e8 mov -0x18(%rbp),%eax
7ba: be ff ff 00 00 mov $0xffff,%esi
7bf: 89 c7 mov %eax,%edi
7c1: e8 b1 00 00 00 callq 877 <_ZStanSt12memory_orderSt23__memory_order_modifier>
7c6: 89 45 ec mov %eax,-0x14(%rbp)
7c9: 48 8b 55 f8 mov -0x8(%rbp),%rdx
7cd: 48 8d 05 44 08 20 00 lea 0x200844(%rip),%rax # 201018 <_ZL2gt>
7d4: 48 89 10 mov %rdx,(%rax)
7d7: 0f ae f0 mfence**
**memory_order_relaxed:
7da: 48 c7 45 f0 02 00 00 movq $0x2,-0x10(%rbp)
7e1: 00
7e2: c7 45 e0 00 00 00 00 movl $0x0,-0x20(%rbp)
7e9: 8b 45 e0 mov -0x20(%rbp),%eax
7ec: be ff ff 00 00 mov $0xffff,%esi
7f1: 89 c7 mov %eax,%edi
7f3: e8 7f 00 00 00 callq 877 <_ZStanSt12memory_orderSt23__memory_order_modifier>
7f8: 89 45 e4 mov %eax,-0x1c(%rbp)
7fb: 48 8b 55 f0 mov -0x10(%rbp),%rdx
7ff: 48 8d 05 12 08 20 00 lea 0x200812(%rip),%rax # 201018 <_ZL2gt>
806: 48 89 10 mov %rdx,(%rax)
809: 0f ae f0 mfence**
**memory_order_acquire:
80c: c7 45 d8 02 00 00 00 movl $0x2,-0x28(%rbp)
813: 8b 45 d8 mov -0x28(%rbp),%eax
816: be ff ff 00 00 mov $0xffff,%esi
81b: 89 c7 mov %eax,%edi
81d: e8 55 00 00 00 callq 877 <_ZStanSt12memory_orderSt23__memory_order_modifier>
822: 89 45 dc mov %eax,-0x24(%rbp)
825: 48 8d 05 ec 07 20 00 lea 0x2007ec(%rip),%rax # 201018 <_ZL2gt>
82c: 48 8b 00 mov (%rax),%rax**
**memory_order_relaxed:
82f: c7 45 d0 00 00 00 00 movl $0x0,-0x30(%rbp)
836: 8b 45 d0 mov -0x30(%rbp),%eax
839: be ff ff 00 00 mov $0xffff,%esi
83e: 89 c7 mov %eax,%edi
840: e8 32 00 00 00 callq 877 <_ZStanSt12memory_orderSt23__memory_order_modifier>
845: 89 45 d4 mov %eax,-0x2c(%rbp)
848: 48 8d 05 c9 07 20 00 lea 0x2007c9(%rip),%rax # 201018 <_ZL2gt>
84f: 48 8b 00 mov (%rax),%rax**
852: 90 nop
853: c9 leaveq
854: c3 retq
00000000000008cc <_ZStanSt12memory_orderSt23__memory_order_modifier>:
8cc: 55 push %rbp
8cd: 48 89 e5 mov %rsp,%rbp
8d0: 89 7d fc mov %edi,-0x4(%rbp)
8d3: 89 75 f8 mov %esi,-0x8(%rbp)
8d6: 8b 55 fc mov -0x4(%rbp),%edx
8d9: 8b 45 f8 mov -0x8(%rbp),%eax
8dc: 21 d0 and %edx,%eax
8de: 5d pop %rbp
8df: c3 retq
I expect different memory mode has different implements on assemble code,
but setting different mode value is no effect on assemble, who can explain this?
Each memory model setting has its semantics. Compiler is obliged to satisfy this semantics, meaning that:
It disallows compiler to perform certain optimizations, such as reordering of reads and writes.
It instructs the compiler to propagate the very same message down to the hardware. How it is done, depends on the platform. x86_64 itself provides very strong memory model. Hence in almost all cases you will see no difference in generated assembler code for x86_64 no matter what memory model you choose. However, on RISC architectures (e.g. ARM), you will see the difference because compiler will have to insert memory barriers. Type of memory barrier depends on the selected memory model setting.
EDIT: Have a look at the JSR-133. It is very old and is about Java, but it provides the nicest explanation about memory model from the compiler perspective that I know. In particular, look at the table of memory barrier instructions for different architectures.
Given the code:
#include <atomic>
static std::atomic<long> gt;
void test1() {
gt.store(41, std::memory_order_release);
gt.store(42, std::memory_order_relaxed);
gt.load(std::memory_order_acquire);
gt.load(std::memory_order_relaxed);
}
At decent optimization level there is no garbage assembly moving values around on registers than the stack:
test1():
movq $41, gt(%rip)
movq $42, gt(%rip)
movq gt(%rip), %rax
movq gt(%rip), %rax
ret
We see that the exact same code is generated for the different memory orders; although testing different instructions in the same function in sequence is very bad practice as C++ instructions don't have to be compiled independently and context might influence code generation. But with the current code generation in GCC, it compiles each statement involving an atomic as its own. Good practice is to have a different function for each statement.
The same code is generated here because no special instruction happens to be needed for these memory orders.

c++ code compiled with -fsanitize=address crashes

I used gcc 6.3.0 with address sanitizer to compile the following code:
#include <iostream>
int increment(int &x)
{
x++;
return x;
}
int main()
{
int x = 0;
increment(x);
return 0;
}
The code gets compiled and instrumented. Objdump (-S) of the compiled code:
000008d0 <_Z9incrementRi>:
int increment(int &x) {
8d0: 55 push %ebp
8d1: 89 e5 mov %esp,%ebp
8d3: 56 push %esi
8d4: 53 push %ebx
8d5: 83 ec 08 sub $0x8,%esp
8d8: e8 9f 01 00 00 call a7c <__x86.get_pc_thunk.cx>
8dd: 81 c1 b3 89 00 00 add $0x89b3,%ecx
++x;
8e3: 8b 45 08 mov 0x8(%ebp),%eax
8e6: 89 c2 mov %eax,%edx
8e8: c1 ea 03 shr $0x3,%edx
8eb: 81 c2 00 00 00 20 add $0x20000000,%edx
8f1: 0f b6 12 movzbl (%edx),%edx
8f4: 84 d2 test %dl,%dl
8f6: 0f 95 45 f7 setne -0x9(%ebp)
8fa: 89 c6 mov %eax,%esi
8fc: 83 e6 07 and $0x7,%esi
8ff: 8d 5e 03 lea 0x3(%esi),%ebx
902: 38 d3 cmp %dl,%bl
904: 0f 9d c2 setge %dl
907: 22 55 f7 and -0x9(%ebp),%dl
90a: 84 d2 test %dl,%dl
90c: 74 0b je 919 <_Z9incrementRi+0x49>
90e: 83 ec 04 sub $0x4,%esp
911: 50 push %eax
912: 89 cb mov %ecx,%ebx
914: e8 a0 07 00 00 call 10b9 <__asan_report_load4>
919: 8b 45 08 mov 0x8(%ebp),%eax
91c: 8b 00 mov (%eax),%eax
91e: 8d 50 01 lea 0x1(%eax),%edx
921: 8b 45 08 mov 0x8(%ebp),%eax
924: 89 10 mov %edx,(%eax)
}
926: 90 nop
927: 8d 65 f8 lea -0x8(%ebp),%esp
92a: 5b pop %ebx
92b: 5e pop %esi
92c: 5d pop %ebp
92d: c3 ret
0000092e <main>:
int main(void)
{
92e: 8d 4c 24 04 lea 0x4(%esp),%ecx
932: 83 e4 f8 and $0xfffffff8,%esp
935: ff 71 fc pushl -0x4(%ecx)
938: 55 push %ebp
939: 89 e5 mov %esp,%ebp
93b: 57 push %edi
93c: 56 push %esi
93d: 53 push %ebx
93e: 51 push %ecx
93f: 83 ec 60 sub $0x60,%esp
942: e8 39 01 00 00 call a80 <__x86.get_pc_thunk.bx>
947: 81 c3 49 89 00 00 add $0x8949,%ebx
94d: 8d 75 90 lea -0x70(%ebp),%esi
950: 89 f7 mov %esi,%edi
952: 8d 83 d0 05 00 00 lea 0x5d0(%ebx),%eax
958: 83 38 00 cmpl $0x0,(%eax)
95b: 74 13 je 970 <main+0x42>
95d: 83 ec 04 sub $0x4,%esp
960: 6a 60 push $0x60
962: e8 b4 02 00 00 call c1b <__asan_stack_malloc_1>
967: 83 c4 08 add $0x8,%esp
96a: 85 c0 test %eax,%eax
96c: 74 02 je 970 <main+0x42>
96e: 89 c6 mov %eax,%esi
970: 8d 46 60 lea 0x60(%esi),%eax
973: c7 06 b3 8a b5 41 movl $0x41b58ab3,(%esi)
979: 8d 93 d0 e8 ff ff lea -0x1730(%ebx),%edx
97f: 89 56 04 mov %edx,0x4(%esi)
982: 8d 93 9e 76 ff ff lea -0x8962(%ebx),%edx
988: 89 56 08 mov %edx,0x8(%esi)
98b: 89 f3 mov %esi,%ebx
98d: c1 eb 03 shr $0x3,%ebx
990: c7 83 00 00 00 20 f1 movl $0xf1f1f1f1,0x20000000(%ebx)
997: f1 f1 f1
99a: c7 83 04 00 00 20 04 movl $0xf4f4f404,0x20000004(%ebx)
9a1: f4 f4 f4
9a4: c7 83 08 00 00 20 f3 movl $0xf3f3f3f3,0x20000008(%ebx)
9ab: f3 f3 f3
int x = 0;
9ae: c7 40 c0 00 00 00 00 movl $0x0,-0x40(%eax)
increment(x);
9b5: 83 ec 04 sub $0x4,%esp
9b8: 83 e8 40 sub $0x40,%eax
9bb: 50 push %eax
9bc: e8 0f ff ff ff call 8d0 <_Z9incrementRi>
9c1: 83 c4 08 add $0x8,%esp
return 0;
9c4: b8 00 00 00 00 mov $0x0,%eax
{
9c9: 39 f7 cmp %esi,%edi
9cb: 74 26 je 9f3 <main+0xc5>
9cd: c7 06 0e 36 e0 45 movl $0x45e0360e,(%esi)
9d3: c7 83 00 00 00 20 f5 movl $0xf5f5f5f5,0x20000000(%ebx)
9da: f5 f5 f5
9dd: c7 83 04 00 00 20 f5 movl $0xf5f5f5f5,0x20000004(%ebx)
9e4: f5 f5 f5
9e7: c7 83 08 00 00 20 f5 movl $0xf5f5f5f5,0x20000008(%ebx)
9ee: f5 f5 f5
9f1: eb 1e jmp a11 <main+0xe3>
9f3: c7 83 00 00 00 20 00 movl $0x0,0x20000000(%ebx)
9fa: 00 00 00
9fd: c7 83 04 00 00 20 00 movl $0x0,0x20000004(%ebx)
a04: 00 00 00
a07: c7 83 08 00 00 20 00 movl $0x0,0x20000008(%ebx)
a0e: 00 00 00
}
a11: 8d 65 f0 lea -0x10(%ebp),%esp
a14: 59 pop %ecx
a15: 5b pop %ebx
a16: 5e pop %esi
a17: 5f pop %edi
a18: 5d pop %ebp
a19: 8d 61 fc lea -0x4(%ecx),%esp
a1c: c3 ret
Execution crashes on instrumented code at line:
990: c7 83 00 00 00 20 f1 movl $0xf1f1f1f1,0x20000000(%ebx)
before increment(int &x) function is called.
ASAN option "stack-use-after-return" was enabled.
The code was compiled with:
gcc -O0 -g -fsanitize=address main.cpp
If the integer variable x is defined as a global variable the, code doesn't get instrumented and crash does not happen.
Before I posted my question, I found this question, that is very similar to my problem with address sanitizer.
So my question would be:
Why the execution of the code crashed at the mentioned line?
Is it possible that the instrumentation went wrong at some point?
Edit
GCC version and configure flags
Configured with:
../gcc-6.3.0/configure --prefix=/opt/V6.3.0 --target=i686-elf --with-pic
--with-newlib --enable-fully-dynamic-string --enable-languages=c,c++
--disable-initfini-array --disable-nls --disable-shared --disable-multilib
--disable-threads --disable-tls --disable-win32-registry --enable-sjlj-
exceptions --enable-frame-pointer --disable-__cxa_atexit --disable-libgomp
--disable-libquadmath --disable-libssp --disable-libada --disable-libitm
--disable-libstdcxx-verbose --disable-libstdcxx-visibility --with-default-
libstdcxx-abi=gcc4-compatible --without-headers
Thread model: single
gcc version 6.3.0 (GCC)

Weird alloc by a very simple C++ code [duplicate]

EDIT:
I've voted to close this is it is now incorrect.
In March 2016 Valgrind gained an option "--run-cxx-freeres=<yes|no>" (default is yes). This will call a libstdc++ function to free one-off allocations used for things like iostream. If you are using a post-2016 Valgrind and libstdc++ you will get
==9356== HEAP SUMMARY:
==9356== in use at exit: 0 bytes in 0 blocks
==9356== total heap usage: 1 allocs, 1 frees, 72,704 bytes allocated
==9356==
==9356== All heap blocks were freed -- no leaks are possible
ORIGINAL POST:
Take the following trivial program:
#include <iostream>
int main() {
return 0;
}
If I run this using valgrind, I'm told that there are 72,704 bytes in 1 blocks that are still reachable. There have been extensive discussions on SO about whether or not to worry about still reachable warnings--I'm not concerned about that. I'd just like to understand how simply including a standard library header could cause a still reachable warning, when none of the objects from that library were allocated in the program itself.
Here is the full valgrind output:
$ valgrind --leak-check=full --track-origins=yes --show-reachable=yes ./ValgrindTest
==27671== Memcheck, a memory error detector
==27671== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==27671== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==27671== Command: ./ValgrindTest
==27671==
==27671==
==27671== HEAP SUMMARY:
==27671== in use at exit: 72,704 bytes in 1 blocks
==27671== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==27671==
==27671== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==27671== at 0x4C2AB9D: malloc (vg_replace_malloc.c:296)
==27671== by 0x4EC060F: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==27671== by 0x400F305: call_init.part.0 (dl-init.c:85)
==27671== by 0x400F3DE: call_init (dl-init.c:52)
==27671== by 0x400F3DE: _dl_init (dl-init.c:134)
==27671== by 0x40016E9: ??? (in /lib/x86_64-linux-gnu/ld-2.15.so)
==27671==
==27671== LEAK SUMMARY:
==27671== definitely lost: 0 bytes in 0 blocks
==27671== indirectly lost: 0 bytes in 0 blocks
==27671== possibly lost: 0 bytes in 0 blocks
==27671== still reachable: 72,704 bytes in 1 blocks
==27671== suppressed: 0 bytes in 0 blocks
==27671==
==27671== For counts of detected and suppressed errors, rerun with: -v
==27671== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
And an object dump:
$ objdump -d ValgrindTest
ValgrindTest: file format elf64-x86-64
Disassembly of section .init:
0000000000400718 <_init>:
400718: 48 83 ec 08 sub $0x8,%rsp
40071c: e8 8b 00 00 00 callq 4007ac <call_gmon_start>
400721: 48 83 c4 08 add $0x8,%rsp
400725: c3 retq
Disassembly of section .plt:
0000000000400730 <_ZNSt8ios_base4InitC1Ev#plt-0x10>:
400730: ff 35 ba 08 20 00 pushq 0x2008ba(%rip) # 600ff0 <_GLOBAL_OFFSET_TABLE_+0x8>
400736: ff 25 bc 08 20 00 jmpq *0x2008bc(%rip) # 600ff8 <_GLOBAL_OFFSET_TABLE_+0x10>
40073c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400740 <_ZNSt8ios_base4InitC1Ev#plt>:
400740: ff 25 ba 08 20 00 jmpq *0x2008ba(%rip) # 601000 <_GLOBAL_OFFSET_TABLE_+0x18>
400746: 68 00 00 00 00 pushq $0x0
40074b: e9 e0 ff ff ff jmpq 400730 <_init+0x18>
0000000000400750 <__libc_start_main#plt>:
400750: ff 25 b2 08 20 00 jmpq *0x2008b2(%rip) # 601008 <_GLOBAL_OFFSET_TABLE_+0x20>
400756: 68 01 00 00 00 pushq $0x1
40075b: e9 d0 ff ff ff jmpq 400730 <_init+0x18>
0000000000400760 <__cxa_atexit#plt>:
400760: ff 25 aa 08 20 00 jmpq *0x2008aa(%rip) # 601010 <_GLOBAL_OFFSET_TABLE_+0x28>
400766: 68 02 00 00 00 pushq $0x2
40076b: e9 c0 ff ff ff jmpq 400730 <_init+0x18>
0000000000400770 <_ZNSt8ios_base4InitD1Ev#plt>:
400770: ff 25 a2 08 20 00 jmpq *0x2008a2(%rip) # 601018 <_GLOBAL_OFFSET_TABLE_+0x30>
400776: 68 03 00 00 00 pushq $0x3
40077b: e9 b0 ff ff ff jmpq 400730 <_init+0x18>
Disassembly of section .text:
0000000000400780 <_start>:
400780: 31 ed xor %ebp,%ebp
400782: 49 89 d1 mov %rdx,%r9
400785: 5e pop %rsi
400786: 48 89 e2 mov %rsp,%rdx
400789: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
40078d: 50 push %rax
40078e: 54 push %rsp
40078f: 49 c7 c0 80 09 40 00 mov $0x400980,%r8
400796: 48 c7 c1 f0 08 40 00 mov $0x4008f0,%rcx
40079d: 48 c7 c7 90 08 40 00 mov $0x400890,%rdi
4007a4: e8 a7 ff ff ff callq 400750 <__libc_start_main#plt>
4007a9: f4 hlt
4007aa: 90 nop
4007ab: 90 nop
00000000004007ac <call_gmon_start>:
4007ac: 48 83 ec 08 sub $0x8,%rsp
4007b0: 48 8b 05 29 08 20 00 mov 0x200829(%rip),%rax # 600fe0 <_DYNAMIC+0x1f0>
4007b7: 48 85 c0 test %rax,%rax
4007ba: 74 02 je 4007be <call_gmon_start+0x12>
4007bc: ff d0 callq *%rax
4007be: 48 83 c4 08 add $0x8,%rsp
4007c2: c3 retq
4007c3: 90 nop
4007c4: 90 nop
4007c5: 90 nop
4007c6: 90 nop
4007c7: 90 nop
4007c8: 90 nop
4007c9: 90 nop
4007ca: 90 nop
4007cb: 90 nop
4007cc: 90 nop
4007cd: 90 nop
4007ce: 90 nop
4007cf: 90 nop
00000000004007d0 <deregister_tm_clones>:
4007d0: b8 37 10 60 00 mov $0x601037,%eax
4007d5: 55 push %rbp
4007d6: 48 2d 30 10 60 00 sub $0x601030,%rax
4007dc: 48 83 f8 0e cmp $0xe,%rax
4007e0: 48 89 e5 mov %rsp,%rbp
4007e3: 77 02 ja 4007e7 <deregister_tm_clones+0x17>
4007e5: 5d pop %rbp
4007e6: c3 retq
4007e7: b8 00 00 00 00 mov $0x0,%eax
4007ec: 48 85 c0 test %rax,%rax
4007ef: 74 f4 je 4007e5 <deregister_tm_clones+0x15>
4007f1: 5d pop %rbp
4007f2: bf 30 10 60 00 mov $0x601030,%edi
4007f7: ff e0 jmpq *%rax
4007f9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000400800 <register_tm_clones>:
400800: b8 30 10 60 00 mov $0x601030,%eax
400805: 55 push %rbp
400806: 48 2d 30 10 60 00 sub $0x601030,%rax
40080c: 48 c1 f8 03 sar $0x3,%rax
400810: 48 89 e5 mov %rsp,%rbp
400813: 48 89 c2 mov %rax,%rdx
400816: 48 c1 ea 3f shr $0x3f,%rdx
40081a: 48 01 d0 add %rdx,%rax
40081d: 48 d1 f8 sar %rax
400820: 75 02 jne 400824 <register_tm_clones+0x24>
400822: 5d pop %rbp
400823: c3 retq
400824: ba 00 00 00 00 mov $0x0,%edx
400829: 48 85 d2 test %rdx,%rdx
40082c: 74 f4 je 400822 <register_tm_clones+0x22>
40082e: 5d pop %rbp
40082f: 48 89 c6 mov %rax,%rsi
400832: bf 30 10 60 00 mov $0x601030,%edi
400837: ff e2 jmpq *%rdx
400839: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000400840 <__do_global_dtors_aux>:
400840: 80 3d e9 07 20 00 00 cmpb $0x0,0x2007e9(%rip) # 601030 <__bss_start>
400847: 75 11 jne 40085a <__do_global_dtors_aux+0x1a>
400849: 55 push %rbp
40084a: 48 89 e5 mov %rsp,%rbp
40084d: e8 7e ff ff ff callq 4007d0 <deregister_tm_clones>
400852: 5d pop %rbp
400853: c6 05 d6 07 20 00 01 movb $0x1,0x2007d6(%rip) # 601030 <__bss_start>
40085a: f3 c3 repz retq
40085c: 0f 1f 40 00 nopl 0x0(%rax)
0000000000400860 <frame_dummy>:
400860: 48 83 3d 80 05 20 00 cmpq $0x0,0x200580(%rip) # 600de8 <__JCR_END__>
400867: 00
400868: 74 1e je 400888 <frame_dummy+0x28>
40086a: b8 00 00 00 00 mov $0x0,%eax
40086f: 48 85 c0 test %rax,%rax
400872: 74 14 je 400888 <frame_dummy+0x28>
400874: 55 push %rbp
400875: bf e8 0d 60 00 mov $0x600de8,%edi
40087a: 48 89 e5 mov %rsp,%rbp
40087d: ff d0 callq *%rax
40087f: 5d pop %rbp
400880: e9 7b ff ff ff jmpq 400800 <register_tm_clones>
400885: 0f 1f 00 nopl (%rax)
400888: e9 73 ff ff ff jmpq 400800 <register_tm_clones>
40088d: 90 nop
40088e: 90 nop
40088f: 90 nop
0000000000400890 <main>:
400890: 55 push %rbp
400891: 48 89 e5 mov %rsp,%rbp
400894: b8 00 00 00 00 mov $0x0,%eax
400899: 5d pop %rbp
40089a: c3 retq
000000000040089b <_Z41__static_initialization_and_destruction_0ii>:
40089b: 55 push %rbp
40089c: 48 89 e5 mov %rsp,%rbp
40089f: 48 83 ec 10 sub $0x10,%rsp
4008a3: 89 7d fc mov %edi,-0x4(%rbp)
4008a6: 89 75 f8 mov %esi,-0x8(%rbp)
4008a9: 83 7d fc 01 cmpl $0x1,-0x4(%rbp)
4008ad: 75 27 jne 4008d6 <_Z41__static_initialization_and_destruction_0ii+0x3b>
4008af: 81 7d f8 ff ff 00 00 cmpl $0xffff,-0x8(%rbp)
4008b6: 75 1e jne 4008d6 <_Z41__static_initialization_and_destruction_0ii+0x3b>
4008b8: bf 34 10 60 00 mov $0x601034,%edi
4008bd: e8 7e fe ff ff callq 400740 <_ZNSt8ios_base4InitC1Ev#plt>
4008c2: ba 28 10 60 00 mov $0x601028,%edx
4008c7: be 34 10 60 00 mov $0x601034,%esi
4008cc: bf 70 07 40 00 mov $0x400770,%edi
4008d1: e8 8a fe ff ff callq 400760 <__cxa_atexit#plt>
4008d6: c9 leaveq
4008d7: c3 retq
00000000004008d8 <_GLOBAL__sub_I_main>:
4008d8: 55 push %rbp
4008d9: 48 89 e5 mov %rsp,%rbp
4008dc: be ff ff 00 00 mov $0xffff,%esi
4008e1: bf 01 00 00 00 mov $0x1,%edi
4008e6: e8 b0 ff ff ff callq 40089b <_Z41__static_initialization_and_destruction_0ii>
4008eb: 5d pop %rbp
4008ec: c3 retq
4008ed: 90 nop
4008ee: 90 nop
4008ef: 90 nop
00000000004008f0 <__libc_csu_init>:
4008f0: 48 89 6c 24 d8 mov %rbp,-0x28(%rsp)
4008f5: 4c 89 64 24 e0 mov %r12,-0x20(%rsp)
4008fa: 48 8d 2d df 04 20 00 lea 0x2004df(%rip),%rbp # 600de0 <__init_array_end>
400901: 4c 8d 25 c8 04 20 00 lea 0x2004c8(%rip),%r12 # 600dd0 <__frame_dummy_init_array_entry>
400908: 4c 89 6c 24 e8 mov %r13,-0x18(%rsp)
40090d: 4c 89 74 24 f0 mov %r14,-0x10(%rsp)
400912: 4c 89 7c 24 f8 mov %r15,-0x8(%rsp)
400917: 48 89 5c 24 d0 mov %rbx,-0x30(%rsp)
40091c: 48 83 ec 38 sub $0x38,%rsp
400920: 4c 29 e5 sub %r12,%rbp
400923: 41 89 fd mov %edi,%r13d
400926: 49 89 f6 mov %rsi,%r14
400929: 48 c1 fd 03 sar $0x3,%rbp
40092d: 49 89 d7 mov %rdx,%r15
400930: e8 e3 fd ff ff callq 400718 <_init>
400935: 48 85 ed test %rbp,%rbp
400938: 74 1c je 400956 <__libc_csu_init+0x66>
40093a: 31 db xor %ebx,%ebx
40093c: 0f 1f 40 00 nopl 0x0(%rax)
400940: 4c 89 fa mov %r15,%rdx
400943: 4c 89 f6 mov %r14,%rsi
400946: 44 89 ef mov %r13d,%edi
400949: 41 ff 14 dc callq *(%r12,%rbx,8)
40094d: 48 83 c3 01 add $0x1,%rbx
400951: 48 39 eb cmp %rbp,%rbx
400954: 75 ea jne 400940 <__libc_csu_init+0x50>
400956: 48 8b 5c 24 08 mov 0x8(%rsp),%rbx
40095b: 48 8b 6c 24 10 mov 0x10(%rsp),%rbp
400960: 4c 8b 64 24 18 mov 0x18(%rsp),%r12
400965: 4c 8b 6c 24 20 mov 0x20(%rsp),%r13
40096a: 4c 8b 74 24 28 mov 0x28(%rsp),%r14
40096f: 4c 8b 7c 24 30 mov 0x30(%rsp),%r15
400974: 48 83 c4 38 add $0x38,%rsp
400978: c3 retq
400979: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
0000000000400980 <__libc_csu_fini>:
400980: f3 c3 repz retq
400982: 90 nop
400983: 90 nop
Disassembly of section .fini:
0000000000400984 <_fini>:
400984: 48 83 ec 08 sub $0x8,%rsp
400988: 48 83 c4 08 add $0x8,%rsp
40098c: c3 retq
For completeness, I'm using:
Ubuntu: 12.04
Valgrind: 3.10.1 3.7.0
g++: 4.8.1
NB: As a side note, this does not happen when I include other headers such as <fstream> or <cmath>.
It's Valgrind's fault. First, -fsanitize=leak does not show anything. Second, Valgrind itself states that:
First of all: relax, it's probably not a bug, but a feature. Many
implementations of the C++ standard libraries use their own memory
pool allocators. Memory for quite a number of destructed objects is
not immediately freed and given back to the OS, but kept in the
pool(s) for later re-use. The fact that the pools are not freed at the
exit of the program cause Valgrind to report this memory as still
reachable. The behaviour not to free pools at the exit could be called
a bug of the library though.
Using GCC, you can force the STL to use malloc and to free memory as
soon as possible by globally disabling memory caching. Beware! Doing
so will probably slow down your program, sometimes drastically.
With GCC 2.91, 2.95, 3.0 and 3.1, compile all source using the STL
with -D__USE_MALLOC. Beware! This was removed from GCC starting with
version 3.3.
With GCC 3.2.2 and later, you should export the environment variable
GLIBCPP_FORCE_NEW before running your program.
With GCC 3.4 and later, that variable has changed name to
GLIBCXX_FORCE_NEW.
[...]
I guess those alleged memory pools are freed after program's termination, in the so-called start-up code that calls main, among the other settings. Internal functions defined outside user's code should be treated as if they didn't exist, that's why Valgrind can't (and shouldn't) see further frees.
Consider the following trivial include file:
#ifndef TRIVIAL_INCLUDE_FILE
#define TRIVIAL_INCLUDE_FILE
static int *x = new int (0);
#endif
For gcc 6 and higher, a related bug fix arrived:
Bug report: "Warning about "still reachable" memory when using libstdc++ from gcc 5"
Discussion on the Arch Linux bug tracker
With gcc 5, you can also get the same warning without including iostream.
So, if you see a similar warning referring to dl-init.c and you are using gcc 5, consider upgrading to a newer version (gcc >=6), or try to compile with clang.