What causes failure to unwind in a DWARF perf call stack? - c++

In backtraces produced by perf record --call-graph dwarf and printed by perf script, I am consistently getting wrong addresses for maybe 5% of call stacks, i.e. unwinding fails. One example is
my_bin 770395 705462.825887: 3560360 cycles:
7f0398b9b7e2 __vsnprintf_internal+0x12 (/usr/lib/x86_64-linux-gnu/libc-2.32.so)
7ffcdb6fbfdf [unknown] ([stack])
my_bin 770395 705462.827040: 3447195 cycles:
7f0398ba1624 __GI__IO_default_xsputn+0x104 (inlined)
7ffcdb6fb7af [unknown] ([stack])
and it was produced from this code (compiled with g++ -O3 -g -fno-omit-frame-pointer my_bin.cpp -o my_bin):
#include <cstdio>
#include <iostream>
int __attribute__ ((noinline)) libc_string(int x) {
char buf[64] = {0};
// Some nonsense workload using libc
int res = 0;
for (int i = 0; i < x; ++i) {
res += snprintf(buf, 64, "%d %d %d Can I bring my friends to tea?", (i%10), (i%3)*10, i+2);
res = res % 128;
}
return res;
}
int main() {
int result = libc_string(20000000);
std::cout << result << "\n";
}
I'm pretty sure that my program should not have executable code in the stack, so these addresses seem wrong. It's not only one program, but most programs that I've profiled which have about 5% wrong call stacks. These call stacks mostly just have two stack frames, with the innermost one sometimes in libraries like Eigen (even though they generally have correct calls stacks) and sometimes in the C++ STL or libc. I've seen the unwinding end up in unknown, [stack], [heap], anon, //anon, libstdc++.so.6.0.28, or <my_bin>.
I've seen this on Ubuntu 18.04, 20.04 and 20.10.
This happens only with DWARF unwinding. How can this be fixed?

Which other types of unwinding did you test?
In examples I disabled kernel ASLR feature with setarch x86_64 -R for more stable addresses and smaller perf.data files.
Usage of perf record option -e cycles:u command may help, as it will not include kernel samples.
I have reproduction of similar dwarf unwind issue for files generated with default perf record event ('-e cycles:u'; libc-2.31 from libc6-prof package was used) for __GI__IO_default_xsputn (inlined) function:
env LD_LIBRARY_PATH=/lib/libc6-prof/x86_64-linux-gnu setarch `uname -m` -R perf record --call-graph dwarf -o perf.data.dwarf.u -e cycles:u ./my_bin
perf script -i perf.data.dwarf.u |less
Incorrect sample:
my_bin 28100 760107.271010: 461418 cycles:u:
7ffff7c74f06 __GI__IO_default_xsputn+0x106 (inlined)
7ffff7c59c6c __vfprintf_internal+0xf4c (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)
Correct sample:
my_bin 28100 760107.257283: 267268 cycles:u:
7ffff7c74eff __GI__IO_default_xsputn+0xff (inlined)
7ffff7c59c6c __vfprintf_internal+0xf4c (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)
7ffff7c6e9f6 __vsnprintf_internal+0xb6 (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)
7ffff7d14a2c ___snprintf_chk+0x9c (inlined)
555555555314 libc_string+0xb4 (/home/user/so/my_bin)
555555555314 libc_string+0xb4 (/home/user/so/my_bin)
555555555111 main+0x11 (/home/user/so/my_bin)
7ffff7c040fa __libc_start_main+0x10a (/usr/lib/libc6-prof/x86_64-linux-gnu/libc-2.31.so)
55555555519d _start+0x2d (/home/user/so/my_bin)
For correct unwind I have many different offsets in __GI__IO_default_xsputn+ (the number after +):
perf script -i perf.data.dwarf.u ||grep vsnprintf_internal -B3 |grep _GI__IO_default_xsputn|cut -d + -f 2|sort | uniq -c
...
208 0x0 (inlined)
45 0x101 (inlined)
2 0x105 (inlined)
91 0x10 (inlined)
294 0x110 (inlined)
2 0x117 (inlined)
2 0x11d (inlined)
326 0x121 (inlined)
But I have no +0x106 address with correct unwind. And all incorrect unwinds have +0x106 address. Let's check with gdb (it is easier after ASLR disabling; +262 is +0x106):
env LD_LIBRARY_PATH=/lib/libc6-prof/x86_64-linux-gnu setarch `uname -m` -R gdb -q ./my_bin
(gdb) start
(gdb) x/i 0x7ffff7c74f06
0x7ffff7c74f06 <__GI__IO_default_xsputn+262>: retq
(gdb) disassemble __GI__IO_default_xsputn
Dump of assembler code for function __GI__IO_default_xsputn:
...
0x00007ffff7c74f05 <+261>: pop %rbp
0x00007ffff7c74f06 <+262>: retq
Unwind issue seems to be connected with inlined(?) functions sampled at retq instruction or after pop %rbp?

Related

How do you find out the cause of rare crashes that are caused by things that are not caught by try catch (access violation, divide by zero, etc.)?

I am a .NET programmer who is starting to dabble into C++. In C# I would put the root function in a try catch, this way I would catch all exceptions, save the stack trace, and this way I would know what caused the exception, significantly reducing the time spent debugging.
But in C++ some stuff(access violation, divide by zero, etc.) are not caught by try catch. How do you deal with them, how do you know which line of code caused the error?
For example let's assume we have a program that has 1 million lines of code. It's running 24/7, has no user-interaction. Once in a month it crashes because of something that is not caught by try catch. How do you find out which line of code caused the crash?
Environment: Windows 10, MSVC.
C++ is meant to be a high performance language and checks are expensive. You can't run at C++ speeds and at the same time have all sorts of checks. It is by design.
Running .Net this way is akin to running C++ in debug mode with sanitizers on. So if you want to run your application with all the information you can, turn on debug mode in your cmake build and add sanitizers, at least undefined and address sanitizers.
For Windows/MSVC it seems that address sanitizers were just added in 2021. You can check the announcement here: https://devblogs.microsoft.com/cppblog/addresssanitizer-asan-for-windows-with-msvc/
For Windows/mingw or Linux/* you can use Gcc and Clang's builtin sanitizers that have largely the same usage/syntax.
To set your build to debug mode:
cd <builddir>
cmake -DCMAKE_BUILD_TYPE=debug <sourcedir>
To enable sanitizers, add this to your compiler command line: -fsanitize=address,undefined
One way to do that is to add it to your cmake build so altogether it becomes:
cmake -DCMAKE_BUILD_TYPE=debug \
-DCMAKE_CXX_FLAGS_DEBUG_INIT="-fsanitize=address,undefined" \
<sourcedir>
Then run your application binary normally as you do. When an issue is found a meaningful message will be printed along with a very informative stack trace.
Alternatively you can set so the sanitizer breaks inside the debugger (gdb) so you can inspect it live but that only works with the undefined sanitizer. To do so, replace
-fsanitize=address,undefined
with
-fsanitize-undefined-trap-on-error -fsanitize-trap=undefined -fsanitize=address
For example, this code has a clear problem:
void doit( int* p ) {
*p = 10;
}
int main() {
int* ptr = nullptr;
doit(ptr);
}
Compile it in the optimized way and you get:
$ g++ -O3 test.cpp -o test
$ ./test
Segmentation fault (core dumped)
Not very informative. You can try to run it inside the debugger but no symbols are there to see.
$ g++ -O3 test.cpp -o test
$ gdb ./test
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
...
Reading symbols from ./test...
(No debugging symbols found in ./test)
(gdb) r
Starting program: /tmp/test
Program received signal SIGSEGV, Segmentation fault.
0x0000555555555044 in main ()
(gdb)
That's useless so we can turn on debug symbols with
$ g++ -g3 test.cpp -o test
$ gdb ./test
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
...
Reading symbols from ./test...
(gdb) r
Starting program: /tmp/test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
test.cpp:4:5: runtime error: store to null pointer of type 'int'
Program received signal SIGSEGV, Segmentation fault.
0x0000555555555259 in doit (p=0x0) at test.cpp:4
4 *p = 10;
Then you can inspect inside:
(gdb) p p
$1 = (int *) 0x0
Now, turn on sanitizers to get even more messages without the debugger:
$ g++ -O0 -g3 test.cpp -fsanitize=address,undefined -o test
$ ./test
test.cpp:4:5: runtime error: store to null pointer of type 'int'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==931717==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x563b7b66c259 bp 0x7fffd167c240 sp 0x7fffd167c230 T0)
==931717==The signal is caused by a WRITE memory access.
==931717==Hint: address points to the zero page.
#0 0x563b7b66c258 in doit(int*) /tmp/test.cpp:4
#1 0x563b7b66c281 in main /tmp/test.cpp:9
#2 0x7f36164a9082 in __libc_start_main ../csu/libc-start.c:308
#3 0x563b7b66c12d in _start (/tmp/test+0x112d)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /tmp/test.cpp:4 in doit(int*)
==931717==ABORTING
That is much better!

How to run HelloWorld on ARM

I need to run HelloWorld on arm.
I launch:
$ arm-none-eabi-g++ -mthumb -mcpu=cortex-m3 -c test.cpp -o test
$ file test
test: ELF 32-bit LSB relocatable, ARM, EABI5 version 1 (SYSV), not stripped
$ qemu-arm test <br>
Error while loading test: Permission denied
qemu-system-arm -machine help
...
lm3s811evb Stellaris LM3S811EVB
...
From either the lm3s811 datasheet or looking at the source for the stellaris backend on qemu arm hardware. Uart0 base address is 0x4000C000, the data (rx and tx) register is offset 0x000. From experience the qemu backends tend not to bother with the tx buffer being busy...
flash.s
.cpu cortex-m0
.thumb
.thumb_func
.global _start
_start:
.word 0x20001000
.word reset
.word hang
.word hang
.word hang
.word hang
.word hang
.thumb_func
reset:
bl notmain
b hang
.thumb_func
hang: b .
.thumb_func
.globl PUT32
PUT32:
str r1,[r0]
bx lr
uart01.c
void PUT32 ( unsigned int, unsigned int );
#define UART0BASE 0x4000C000
int notmain ( void )
{
unsigned int rx;
for(rx=0;rx<7;rx++)
{
PUT32(UART0BASE+0x00,0x30+rx);
}
return(0);
}
flash.ld
MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
Yep, the stellaris was the first cortex-m3 in silicon that you could buy and I specified cortex-m0. Basically preventing the thumb2 extensions, or most of them. More portable, can easily change that if you want.
arm-none-eabi-as --warn -mcpu=cortex-m0 flash.s -o flash.o
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=cortex-m0 -mthumb -c uart01.c -o uart01.o
arm-none-eabi-ld -o uart01.elf -T flash.ld flash.o uart01.o
arm-none-eabi-objdump -D uart01.elf > uart01.list
arm-none-eabi-objcopy uart01.elf uart01.bin -O binary
then
qemu-system-arm -M lm3s811evb -m 16K -nographic -kernel uart01.bin
and the output is
0123456
ctrl-a then press x to exit qemu. or
qemu-system-arm -M lm3s811evb -m 16K -kernel uart01.bin
then ctrl-alt-3 (3 not F3) and the serial console pops up with the serial output. when you close that serial console qemu closes.
I want to remember someone telling me that the qemu cortex-m3 support is not that good.
the normal arm cores should be well tested as they are used to cross compile for all kinds of arm target boards. not sure exactly which cores are well tested, but you could do thumb stuff if you booted like an arm but did the rest in thumb, boot with
.globl _start
_start:
b reset
b hang
b hang
b hang
reset:
mov sp,#0x20000
bl notmain
hang:
b hang
the machine
versatilepb ARM Versatile/PB (ARM926EJ-S)
has its uart at 0x101f1000, so
for(ra=0;;ra++)
{
ra&=7;
PUT32(0x101f1000,0x30+ra);
}
can build your app with
arm-none-eabi-gcc -Wall -O2 -nostdlib -nostartfiles -ffreestanding -mcpu=arm7tdmi -mthumb -c uart01.c -o uart01.o
change your linker script to be all ram based.
MEMORY
{
ram : ORIGIN = 0x00000000, LENGTH = 32K
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.text*) } > ram
}
and then
qemu-system-arm -M versatilepb -m 128M -nographic -kernel hello.bin
(hmmm, does this load at 0x0000, or 0x8000?, shouldnt be too hard to figure out)
you can get mostly the thumb feel of a cortex-m (an m0 basically not an m3, you can find an armv7-a machine you can probably run thumb2 built code (still boots like an arm not a cortex-m)). for example
realview-pb-a8 ARM RealView Platform Baseboard for Cortex-A8
Can probably use newlib almost as is, need to change the crt0.s or whatever it is called these days to boot like an arm not a cortex-m. The rest can build using armv7m and in theory will work.
And or start with the stellaris and just make your own hw support for whatever your real target is and fix the cortex-m3 core if/when you find problems.

Using -fsanitize=memory with clang on linux with libstdc++

With the system supplied libstdc++ the clang memory sanitizer is basically unusable due to false positives - eg the code below fails.
#include <iostream>
#include <fstream>
int main(int argc, char **argv)
{
double foo = 1.2;
std::ofstream out("/tmp/junk");
auto prev = out.flags(); //false positive here
out.setf(std::ios::scientific);
out << foo << std::endl;
out.setf(prev);
}
Building libstdc++ as described here:
https://code.google.com/p/memory-sanitizer/wiki/InstrumentingLibstdcxx
and running it like so:
LD_LIBRARY_PATH=~/goog-gcc-4_8/build/src/.libs ./msan_example
gives the foolowing output
/usr/bin/llvm-symbolizer: symbol lookup error: /home/hal/goog-gcc-4_8/build/src/.libs/libstdc++.so.6: undefined symbol: __msan_init
Both centos7 (epel clang) and ubuntu fail in this manner.
Is there something I'm doing wrong?
Previous stack overflow question Using memory sanitizer with libstdc++
edit
Using #eugins suggestion, compile command line for this code is:
clang++ -std=c++11 -fPIC -O3 -g -fsanitize=memory -fsanitize-memory-track-origins -fPIE -fno-omit-frame-pointer -Wall -Werror -Xlinker --build-id -fsanitize=memory -fPIE -pie -Wl,-rpath,/home/hal/goog-gcc-4_8/build/src/.libs/libstdc++.so test_msan.cpp -o test_msan
$ ./test_msan
==19027== WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7f66df377d4e in main /home/hal/tradingsystems/test_msan.cpp:9
#1 0x7f66ddefaaf4 in __libc_start_main (/lib64/libc.so.6+0x21af4)
#2 0x7f66df37780c in _start (/home/hal/tradingsystems/test_msan+0x7380c)
Uninitialized value was created by an allocation of 'out' in the stack frame of function 'main'
#0 0x7f66df377900 in main /home/hal/tradingsystems/test_msan.cpp:6
SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/hal/tradingsystems/test_msan.cpp:9 main
Exiting
MSan spawns llvm-symbolizer process to translate stack trace PCs into function names and file/line numbers. Because of the LD_LIBRARY_PATH setting, the instrumented libstdc++ is loaded into both the main MSan process (which is good) and the llvm-symbolizer process (which won't work).
Preferred way of dealing with it is though RPATH setting (at link time):
-Wl,-rpath,/path/to/libstdcxx_msan
You could also check this msan/libc++ guide which is more detailed and up-to-date:
https://code.google.com/p/memory-sanitizer/wiki/LibcxxHowTo

GDB doesn't show function names

I am debugging from an embedded device using gdbserver:
./gdbserver HOST:5000 /home/test_app
In my PC, I execute gdb in this way:
arm-none-linux-gnueabi-gdb test_app
Once the application is executing, I receive the Segfault I want to debug, but it's impossible to know what line produced it:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 715]
0x31303030 in ?? ()
(gdb) bt
#0 0x31303030 in ?? ()
#1 0x0000dff8 in ?? ()
#2 0x0000dff8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(I must say I'm totally new to GDB)
Ok this usually happens if debug symbols are missing... just to make sure run following commands
file <your_executable>
you will get info on your binary like format, arch etc.. last part of the info describes if the binary is stripped or not. For debugging in GDB the binary should not have been stripped.
nm --debug-sym <your_executable> | grep debug
If you have some valid prints as below it means debug symbols are present.
00000000 N .debug_abbrev
00000000 N .debug_aranges
00000000 N .debug_frame
00000000 N .debug_info
00000000 N .debug_line
00000000 N .debug_loc
00000000 N .debug_pubnames
00000000 N .debug_str
Further when you invoke GDB you should have follwing line
Reading symbols from <your_executable>...done.
At this point you should be able to list sources with list command.
Make sure both gdb and gdbserver have same versioninig.
arm-none-linux-gnueabi-gdb --version
./gdbserver --version
If all the above are true, and you still don't get backtrace, there is something bad going on with your stack. Try running some static analysis, valgrind on your code / newly added code.
You need to build your application with debug symbols enabled. The switch for gcc is -g
For others, if nm --debug-sym <your_executable> | grep debug prints the debug symbols but you do not get them in gdb, this might be because you are opening a core in gdb using a executable that is different from the one that generated the core.
You will need to include -g for every translation unit, for example, if you have a bunch of object files that are linked to build your final executable you will need to include -g for each compilation command.
g++ -g file1.cpp -c -o file1.o
g++ -g file2.cpp -c -o file2.o
...
g++ -g file1.o file2.o -o main

Global constructor call not in .init_array section

I'm trying to add global constructor support on an embedded target (ARM Cortex-M3).
Lets say I've the following code:
class foobar
{
int i;
public:
foobar()
{
i = 100;
}
void inc()
{
i++;
}
};
foobar foo;
int main()
{
foo.inc();
for (;;);
}
I compile it like this:
arm-none-eabi-g++ -O0 -gdwarf-2 -mcpu=cortex-m3 -mthumb -c foo.cpp -o foo.o
When I look at the .init_array section with objdump it shows the .init_section has a zero size.
I do get an symbol named _Z41__static_initialization_and_destruction_0ii.
When I disassemble the object file I see that the global construction is done in the static_initialization_and_destruction symbol.
Why isn't a pointer added to this symbol in the .init_section?
I know it has been almost two years since this question was asked, but I just had to figure out the mechanics of bare-metal C++ initialization with GCC myself, so I thought I'd share the details here. There turns out to be a lot of out-of-date or confusing information on the web. For example, the oft-mentioned collect2 wrapper does not appear to be used for ARM ELF targets, since its arbitrary section support enables the approach described below.
First, when I compile the code above with the given command line using Sourcery CodeBench Lite 2012.09-63, I do see the correct .init_array section size of 4:
$ arm-none-eabi-objdump -h foo.o
foo.o: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
...
13 .init_array 00000004 00000000 00000000 0000010c 2**2
CONTENTS, ALLOC, LOAD, RELOC, DATA
...
When I look at the section contents, it just contains 0:
$ arm-none-eabi-objdump -j .init_array -s foo.o
Contents of section .init_array:
0000 00000000 ....
However, there is also a relocation section that sets it correctly to _GLOBAL__sub_I_foo:
$ arm-none-eabi-objdump -x foo.o
...
RELOCATION RECORDS FOR [.init_array]:
OFFSET TYPE VALUE
00000000 R_ARM_TARGET1 _GLOBAL__sub_I_foo
In general, .init_array points to all of your _GLOBAL__sub_I_XXX initializer stubs, each of which calls its own copy of _Z41__static_initialization_and_destruction_0ii (yes, it is multiply-defined), which calls the constructor with the appropriate arguments.
Because I'm using -nostdlib in my build, I can't use CodeSourcery's __libc_init_array to execute the .init_array for me, so I need to call the static initializers myself:
extern "C"
{
extern void (**__init_array_start)();
extern void (**__init_array_end)();
inline void static_init()
{
for (void (**p)() = __init_array_start; p < __init_array_end; ++p)
(*p)();
}
}
__init_array_start and __init_array_end are defined by the linker script:
. = ALIGN(4);
.init_array :
{
__init_array_start = .;
KEEP (*(.init_array*))
__init_array_end = .;
}
This approach seems to work with both the CodeSourcery cross-compiler and native ARM GCC, e.g. in Ubuntu 12.10 for ARM. Supporting both compilers is one reason for using -nostdlib and not relying on the CodeSourcery CS3 bare-metal support.
Timmmm,
I just had the same issue on the nRF51822 and solved it by adding KEEP() around a couple lines in the stock Nordic .ld file:
KEEP(*(SORT(.init_array.*)))
KEEP(*(.init_array))
While at it, I did the same to the fini_array area too. Solved my problem and the linker can still remove other unused sections...
You have only produced an object file, due to the -c argument to gcc. To create the .init section, I believe that you need to link that .o into an actual executable or shared library. Try removing the -c argument and renaming the output file to "foo", and then check the resulting executable with the disassembler.
If you look carefully _Z41__static_initialization_and_destruction_0ii would be called inside global constructor. Which inturn would be linked in .init_array section (in arm-none-eabi- from CodeSourcery.) or some other function (__main() if you are using Linux g++). () This should be called at startup or at main().
See also this link.
I had a similar issue where my constructors were not being called (nRF51822 Cortex-M0 with GCC). The problem turned out to be due to this linker flag:
-Wl,--gc-sections
Don't ask me why! I thought it only removed dead code.