gdb `list` command only shows one line number instead of contents when attaching to a process - c++

I invoked gdb to attach to a currently running process with the command sudo gdb binary PID. I then set breakpoints and typed continue in gdb, and sent a request to the process to hit the breakpoint. After that, when I type command list, it only shows one line instead of multiple lines as expected, and it only shows line number instead of contents. I would also like to know what the output of the n command means. Some docs on the internet say it is the next line to be executed, but the output itself doesn't make too much sense to me (after two n commands the last l command shows 169 instead of 172 or 174). Can anyone help answer these two questions? I'd really appreciate it.
(gdb) l
164 in CBFEMultiSectionResponseModule.cc
(gdb) n
172 in CBFEMultiSectionResponseModule.cc
(gdb) l
167 in CBFEMultiSectionResponseModule.cc
(gdb) n
174 in CBFEMultiSectionResponseModule.cc
(gdb) l
169 in CBFEMultiSectionResponseModule.cc
The build command line for this source file is like this:
/usr/bin/g++ -c -fPIC -DMODULEADAPTER_BUILTIN_VERSION=\"2.36.375.10894.aff30c2\" -DLINUX -D_GNU_SOURCE -D_THREAD_SAFE -DUSE_STD_YUTSTRING -I../api -I. -I/home/y/include/ydisc \
-I/home/y/include/avro -I../.. -I../../.. -I../../../external_interfaces -I../../../sg_interfaces -I/home/y/include64 -I/home/y/include \
-fPIC -g -O2 -Wall -Werror -Wno-invalid-offsetof -fno-strict-aliasing -pipe -MD -MP \
-DYAHOO_PLATFORM_MAJOR=6 -DYAHOO_PLATFORM_MINOR=10 CBFEMultiSectionResponseModule.cc -o x86_64-linux-gcc/CBFEMultiSectionResponseModule.o
This is the filesystem type:
-bash-4.1$ df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda1 ext4 246G 97G 137G 42% /
tmpfs tmpfs 12G 30M 12G 1% /dev/shm
Below is the output of info source:
(gdb) info source
Current source file is CBFEMultiSectionResponseModule.cc
Compilation directory is /home/myusername/ufe/modules/multisectionresponse/impl
Source language is c++.
Compiled with DWARF 2 debugging format.
Does not include preprocessor macro info.
Below is the output of shell cat:
(gdb) shell cat /home/myusername/ufe/modules/multisectionresponse/impl/CBFEMultiSectionResponseModule.cc
cat: /home/myusername/ufe/modules/multisectionresponse/impl/CBFEMultiSectionResponseModule.cc: No such file or directory

when I type command list, it only shows one line instead of multiple lines as expected, and it only shows line number instead of contents
This is most likely happening because GDB has no access to the source. The sudo is the key here. Your source likely resides on a filesystem that doesn't allow root access, such as NFS.
it doesn't make too much sense to me(after two n commands the last l command shows 169 instead of 172 or 174).
You are debugging optimized code. See e.g. this answer.
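Independently of optimization, the l output in the question can be checked with simple arithmetic. This is a minimal sketch, assuming GDB's default listsize of 10 and assuming that a bare list right after a stop shows a window that begins 5 lines before the current line; when the source file cannot be opened, only the first line number of that window is printed. The helper name list_window_start is hypothetical.

```python
def list_window_start(current_line, before=5):
    """First line of the window a bare `list` shows right after a stop.

    Assumption: the window begins `before` lines ahead of the current line
    (never below line 1), which corresponds to the default listsize of 10.
    """
    return max(current_line - before, 1)


# Line numbers reported by the two `n` commands in the question.
for stopped_at in (172, 174):
    print(stopped_at, "->", list_window_start(stopped_at))
```

Under these assumptions, 172 maps to 167 and 174 maps to 169, which matches the transcript; the first l (164) would then correspond to a stop at line 169.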
Update:
The path to source is correct in the compilation environment. However, the runtime environment is different than the compilation env ..
Well, why didn't you tell us that?
My answer is correct: GDB doesn't list source because source is inaccessible (it's just inaccessible for a different reason from what I guessed).
If you want the GDB list command to work in the runtime environment, then you must make the source available there (though not necessarily in the same location; use the dir command to point GDB to the location where the sources are available).
Update 2:
I used to think GDB has some magic ways to get the source code from the binary.
The binary does not contain sources (that would make it significantly larger). Instead, it contains references to the source locations.
In particular, the compiler encodes into the binary for each translation unit (each .cpp file):
Compilation directory
Name of the source file(s) (there could be more than one due to #includes).
A mapping from program counter to file/line that the particular chunk of assembly was generated for.
(There is additional info describing variable locations, types, etc. But these are irrelevant to the list command.)
GDB decodes the above, locates the source file(s), and lets you set breakpoints by file/line, lists the source when you hit a breakpoint, and so on.
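The lookup described above can be sketched as follows. This is a simplified model, not GDB's actual algorithm: it assumes that user-supplied directories (as added with the dir command) are tried first by base name, followed by the recorded compilation directory. The function name locate_source and the paths are hypothetical.

```python
import os
import tempfile


def locate_source(comp_dir, file_name, extra_dirs=()):
    """Return the first existing candidate path for a source file, or None.

    Simplified model: user-supplied directories are tried first (by base
    name), then the compilation directory recorded in the debug info.
    """
    candidates = [os.path.join(d, os.path.basename(file_name)) for d in extra_dirs]
    candidates.append(os.path.join(comp_dir, file_name))
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# Hypothetical setup: the recorded compilation directory does not exist on
# this machine, but a copy of the source sits in a temporary directory.
with tempfile.TemporaryDirectory() as src_copy:
    name = "CBFEMultiSectionResponseModule.cc"
    open(os.path.join(src_copy, name), "w").close()
    recorded = "/nonexistent/compile/dir"
    print(locate_source(recorded, name))              # nothing found
    print(locate_source(recorded, name, [src_copy]))  # found in the copy
```

The first call models the situation in the question (the recorded path does not exist at run time), the second models the effect of pointing the debugger at a directory that does contain the file.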

Related

Analyze Linux core dump on different machine: threads and shared libs [duplicate]

We get core files from running our software on a Customer's box. Unfortunately, because we've always compiled with -O2 without debugging symbols, this has led to situations where we could not figure out why it was crashing. We've modified the builds so now they generate -g and -O2 together. We then advise the Customer to run a -g binary so it becomes easier to debug.
I have a few questions:
What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.
Here's an example of a crash that requires us to tell the Customer to get a -g ver. of the binary:
Program terminated with signal 11, Segmentation fault.
#0 0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>
Ideally I'd like to find out why exactly the app crashed. I suspect it's memory corruption but I am not 100% sure.
Remote debugging is strictly not allowed.
Thanks
What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
If the executable is dynamically linked, as yours is, the stack GDB produces will (most likely) not be meaningful.
The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn't know what code was at that address. So it looks into your copy of libc.so.6 and discovers that this is in select, so it prints that.
But the chances that 0x00454ff1 is also in select in your customer's copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.
You can use disas select, and observe that 0x00454ff1 is either in the middle of an instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.
You can however help yourself: you just need to get a copy of all libraries that are listed in (gdb) info shared from the customer system. Have the customer tar them up with e.g.
cd /
tar cvzf to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2 ...
Then, on your system:
mkdir /tmp/from-customer
tar xzf to-you.tar.gz -C /tmp/from-customer
gdb /path/to/binary
(gdb) set solib-absolute-prefix /tmp/from-customer
(gdb) core core # Note: very important to set solib-... before loading core
(gdb) where # Get meaningful stack trace!
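The effect of the prefix can be illustrated with a small sketch. It assumes the simplified rule that each absolute library path recorded in the core is looked up under the prefix directory, which is why the tar archive above is created from / with relative paths. The helper name apply_prefix is hypothetical.

```python
import posixpath


def apply_prefix(prefix, recorded_path):
    """Map a library path recorded in the core to a path under `prefix`."""
    return posixpath.join(prefix, recorded_path.lstrip("/"))


# Libraries named in the tar command above.
for lib in ("/lib/libc.so.6", "/lib/ld-linux.so.2"):
    print(lib, "->", apply_prefix("/tmp/from-customer", lib))
```

With the extraction layout shown above, these are the locations where the customer's copies end up, so the debugger reads them instead of the local system libraries.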
We then advise the Customer to run a -g binary so it becomes easier to debug.
A much better approach is:
build with -g -O2 -o myexe.dbg
strip -g myexe.dbg -o myexe
distribute myexe to customers
when a customer gets a core, use myexe.dbg to debug it
You'll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
You can indeed get useful information from a crash dump, even one from an optimized compile (although it's what is called, technically, "a major pain in the ass"). A -g compile is indeed better, and yes, you can do so even when the machine on which the dump happened is another distribution. Basically, with one caveat, all the important information is contained in the executable and ends up in the dump.
When you match the core file with the executable, the debugger will be able to tell you where the crash occurred and show you the stack. That in itself should help a lot. You should also find out as much as you can about the situation in which it happens -- can they reproduce it reliably? If so, can you reproduce it?
Now, here's the caveat: the place where the notion of "everything is there" breaks down is with shared object files, .so files. If it is failing because of a problem with those, you won't have the symbol tables you need; you may only be able to see what library .so it happens in.
There are a number of books about debugging, but I can't think of one I'd recommend.
As far as I remember, you don't need to ask your customer to run the binary built with the -g option. What is needed is that you have a build made with -g. With that you can load the core file and it will show the whole stack trace. A few weeks ago I created core files with and without -g, and the size of the core was the same.
Inspect the values of the local variables you see when you walk the stack, especially around the select() call. Do this on the customer's box: just load the dump and walk the stack...
Also, check the value of FD_SETSIZE on both your DEV and PROD platforms!
Copying the resolution from my question which was considered a duplicate of this.
set solib-absolute-prefix from the accepted solution did not help for me. set sysroot was absolutely necessary to make gdb load locally provided libs.
Here is the list of commands I used to open core dump:
# note: all the .so files obtained from user machine must be put into local directory.
#
# most importantly, the following files are necessary:
# 1. libthread_db.so.1 and libpthread.so.0: required for thread debugging.
# 2. other .so files are required if they occur in call stack.
#
# these files must also be renamed exactly as the symlinks
# i.e. libpthread-2.28.so should be renamed to libpthread.so.0
# load executable file
file ./thedarkmod.x64
# force gdb to forget about local system!
# load all .so files using local directory as root
set sysroot .
# drop dump-recorded paths to .so files
# i.e. load ./libpthread.so.0 instead of ./lib/x86_64-linux-gnu/libpthread.so.0
set solib-search-path .
# disable damn security protection
set auto-load safe-path /
# load core dump file
core core.6487
# print stacktrace
bt
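Why both settings matter in this setup can be sketched with a simplified lookup model. It assumes the debugger first tries the recorded path under the sysroot and, if that fails, looks for the base name in each solib-search-path directory; the real lookup has more steps. The function name resolve_library is hypothetical.

```python
import os
import tempfile


def resolve_library(recorded_path, sysroot, search_path=()):
    """Simplified lookup: sysroot + recorded path, then base name in each search dir."""
    candidates = [os.path.join(sysroot, recorded_path.lstrip("/"))]
    candidates += [os.path.join(d, os.path.basename(recorded_path)) for d in search_path]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


with tempfile.TemporaryDirectory() as local:
    # Hypothetical flat directory holding a renamed copy of the user's library.
    open(os.path.join(local, "libpthread.so.0"), "w").close()
    recorded = "/lib/x86_64-linux-gnu/libpthread.so.0"
    print(resolve_library(recorded, local) is None)                # sysroot alone misses it
    print(resolve_library(recorded, local, [local]) is not None)   # base-name search finds it
```

Because the .so files sit flat in one directory rather than under lib/x86_64-linux-gnu/, the sysroot lookup alone misses them in this model, and the search path is what finds them by name.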

Higher line numbers are unresolved as breakpoints when debugging using lldb

I am trying to set breakpoints in a MIPS32r6 program that computes the Mandelbrot Set in Brainfsck. The program itself is written in C++, compiled with Clang, and I am debugging with LLDB.
The issue that I am having is that when in LLDB, I can set certain breakpoints, mainly on lower line numbers, with no issues. However, after Line #70 in Main.cpp, the breakpoints are coming up as 'unresolved' (even though executing breakpoint list shows them with completely reasonable addresses). That is to say, all breakpoints that I try to set after Line #70 are coming up as unresolved, and all reasonable breakpoints before Line #70 resolve without issue.
I've placed a copy of the binary that I've linked here: http://filebin.ca/2tJzo2LLBJWO/MipsTest.bin
And a copy of Main.cpp here: https://paste.ee/p/WYs8Y
I am building with the following options:
clang -mcompact-branches=always -fasynchronous-unwind-tables -funwind-tables -fexceptions -fcxx-exceptions -mips32r6 -O0 -g -glldb ...
lld --discard-none -znorelro --eh-frame-hdr ...
At this point, I am unsure as to what might be causing this issue.
I'd try doing target modules dump line-table Main.cpp in lldb to see what lldb thinks the line table looks like. Then look at the binary's DWARF line table with something like readelf --debug-dump=decodedline MipsTest.bin (I think that's right; I'm looking at a readelf man page on the web).
Using your sample binary, I get:
(lldb) b s -l 72
Breakpoint 1: where = MipsTest.bin`main + 544 at Main.cpp:72, address = 0x000134a0
So we found an address for the breakpoint. If it is unresolved when you run, that means we weren't able to implement the breakpoint at that address (e.g. for some reason couldn't write the trap into the program memory there.)

Measure static memory usage for C++ ported to embedded platform

I have created a small program as a proof-of-concept for a system which is to be implemented on an embedded platform. The program is written in C++11 with use of std and compiled to run on a laptop. The final program, which should be implemented later, is an embedded system. We do not have access to the compiler of the embedded platform.
I would like to know if there is a way to determine a program's static memory (the size of the compiled binaries) in a sensible and comparable way when it should be ported to an embedded platform.
The requirement is that the size of the binary is less than 10 KB.
Our binary has a size of 700 KB when compiled and stripped with the following flags:
g++ options: -Os -s -ffunction-sections -fdata-sections
linker options: -s -Wl,--gc-sections
strip libmodel.a -s -R .comment -R .gnu.version --strip-unneeded -R .note
It took up 4MB before we used strip and optimization options.
I am still way off, and it is not really that big a program. How can I justify a comparison in any way with an equivalent program on an embedded platform?
Note that the size of the binary can be a little deceptive, in the sense that uninitialised variables (the .bss sections) will not necessarily take up physical space in the binary. They are generally just noted as present without actually having any space given to them; the space is normally allocated by the OS loader when it runs your program.
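As a rough illustration of why file size and memory use differ, here is a minimal sketch. It assumes a typical embedded layout in which .text and .data are stored in the image, while .data and .bss occupy RAM at run time; the section sizes are hypothetical numbers, not measurements from the program in the question.

```python
def footprint(text, data, bss):
    """Split section sizes (bytes) into stored-image size and run-time RAM use.

    Assumption: .text and .data are stored in the image; .data and .bss
    occupy RAM at run time. .bss adds nothing to the stored image.
    """
    return {"image": text + data, "ram": data + bss}


# Hypothetical section sizes in bytes, for illustration only.
sizes = footprint(text=8200, data=300, bss=4096)
print(sizes)
print("fits in a 10 KB image:", sizes["image"] <= 10 * 1024)
```

In this model a large .bss increases RAM use but leaves the stored image size unchanged.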
objdump (http://www.gnu.org/software/binutils/) or perhaps elfdump or the elf tool chain (http://sourceforge.net/apps/trac/elftoolchain/) will help you determine the size of your various segments, data and text, as well as the size of individual functions, globals, etc. All these programs "look" into your compiled binary and extract a lot of information, such as the size of the .text and .data sections, the list of symbols with their locations and sizes, and can even disassemble the .text section...
An example of using elfdump on an ELF image test.elf might be elfdump -z test.elf > output.txt. This will dump everything, including the text section disassembly. For example, from an elfdump on my system I saw
Section #6: .text, type=NOBITS, addr=0x500, off=0x5f168
size=149404(0x2479c), link=0, info=0, align=16, entsize=1
flags=<WRITE,ALLOC,EXECINSTR>
Section #7: .text, type=NOBITS, addr=0x24c9c, off=0x5f168
size=362822(0x58946), link=0, info=0, align=4, entsize=1
flags=<WRITE,ALLOC,EXECINSTR,INCLUDE>
....
Section #9: .rodata, type=NOBITS, addr=0x7d5e4, off=0x5f168
size=7670(0x1df6), link=0, info=0, align=4, entsize=1
flags=<WRITE,ALLOC>
So I can see how much my code is taking up (the .text sections) and my read only data. Later in the file I then see...
Symbol table ".symtab"
Value Size Bind Type Section Name
----- ---- ---- ---- ------- ----
218 0x7c090 130 LOC FUNC .text IRemovedThisName
So I can see that my function IRemovedThisName takes 130 bytes. A quick script would allow you to list functions sorted by size and variables sorted by size. This could point you at places to optimize...
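Such a script could look like the sketch below. It is an assumption-laden example rather than a parser for the elfdump output above: it expects lines in the "address size type name" layout with hexadecimal sizes, as printed by nm -S, and the sample symbols other than IRemovedThisName are hypothetical.

```python
def symbols_by_size(nm_output):
    """Parse 'address size type name' lines and sort by size, largest first."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split(None, 3)
        if len(parts) != 4:
            continue  # skip blank lines and symbols listed without a size
        _addr, size_hex, kind, name = parts
        try:
            size = int(size_hex, 16)
        except ValueError:
            continue  # second field is not a hexadecimal size
        symbols.append((size, kind, name))
    return sorted(symbols, reverse=True)


# Sample input; sizes are hexadecimal (0x82 == 130 bytes).
sample = """\
0007c090 00000082 t IRemovedThisName
00000500 00000010 T small_helper
0007d5e4 00001df6 r lookup_table
"""
for size, kind, name in symbols_by_size(sample):
    print(f"{size:8d} {kind} {name}")
```

The largest entries printed first are the natural candidates to look at when trying to shrink the binary.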
For a good example of objdump try http://www.thegeekstuff.com/2012/09/objdump-examples/, specifically the section 3, which shows you how to get the contents of the section headers using the -h option.
As to how the program will compare on two different platforms I think you will just have to compile on both platforms and compare the results you get from your obj/elfdump on each system - the results will depend on the system instruction set, how well each compiler can optimize, general hardware architecture differences etc.
If you don't have access to the embedded system, you might try using a cross-compiler, configured for your eventual target, on your laptop. This would give you a binary suited to the embedded platform and the tools to analyze the file (i.e. the cross-platform version of objdump). This would give you some ball-park figures for how the program would look on the eventual embedded sys.
Hope this helps.
EDIT: This will also help: How to get the size of a C function from inside a C program or with inline assembly?
It turned out that the included libraries took up an enormous amount of space (as was pointed out in the comments), and by removing these it was possible to reduce the size to nearly nothing in combination with the following flags:
set(CMAKE_CXX_FLAGS "-Os -s -ffunction-sections -fdata-sections -DNO_STD -fno-rtti -fno-exceptions")
set(CMAKE_EXE_LINKER_FLAGS "-s -Wl,--gc-sections")
And stripping away any unnecessary code using:
strip libmodel.a -s -R .comment -R .gnu.version --strip-unneeded -R .note
The 4 MB could be reduced to 9.4 KB, which is below our limit.
In summary, std takes up a tremendous amount of space.

gdb not showing the line source

GDB is not showing me the source line after next/stop, and displays only the line number and source file, like this:
(gdb) n
7 in test/test.c
whereas I expect it to display the current line, like this:
(gdb) next
17 char * good_message = "Hello, world.";
Are there any settings in .gdbinit that might help me do this?
whereas I expect it to display the current line, like this
On many platforms, such as ELF, the compiler records both the path to the source (test/test.c in your case), and the compilation directory, allowing GDB to display source regardless of which directory you invoke it in.
But many platforms are less flexible and don't have a place to record the compilation directory. On such platforms (e.g. AIX), you must either start GDB in the compilation directory, or tell it where to look for sources with the directory command.
My answer may not be a perfect solution, but the way you compile your source program matters. For example, in my case, if you do g++ fib.cpp -o fib and then run gdb fib, it won't print the source code with list. Compiling with the debug flag, g++ -g fib.cpp -o fib, and then running gdb solved my problem.