Deciphering a C++ Backtrace - c++

Can someone point me as to where I might find an explanation for decoding/deciphering a backtrace. There are thousands of links that explain how to read a dump file, a call stack within an IDE, or creating a backtrace. So where is the one for "I was given a backtrace in an email for a previously internally released code, but nothing explaining why"
i.e. /lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7f58aa6a80e0]
I have been given a backtrace from a crash that occurred once. So I am trying to determine why there was a crash and fix so the it gracefully continues. I have no explanation of what the user was doing, what the system was supposedly doing or anything just the backtrace.
I am not trying to find an issue in the above line it is just the example of a backtrace line. Now what I do know from the example above.
The line states that a line of code in libc.so.6 was called, that the line of code can be found at 0x7f58aa6a80e0 within the code segment of the binary. Problem here is that the code segment address is not returned using addr2line since the address in not in the viewable range (symbols removed). What again does the +0x350e0 represent and how do I use?
I know the exact function that the crash occurred in, not the line.
This is not a solution but a WT.
Best I can tell from a little reverse engineering of the code.
Recompiled libtsm_sl with the added flags -Xlinker -Map=object.map
Found the Address for both AssignReservationSlots and GenerateSOL in object.map.
Then
objdump -D libtsm_sl | less
Then searched the output for AssignReservationSlots+0x5ec which I found. At 558481
Also searched for the address for the call from GenerateSOL and got that address. At 55a480
Now from the stack trace in the log file
4) /usr/lib/libtsm_sl.so.0(AssignReservationSlots()+0x5ec) [0x7f58ae415490]
5) /usr/lib/libtsm_sl.so.0(GenerateSOL()+0x87d) [0x7f58ae417485]
I took the [value] and found the delta which is 0x1FF5, subtracted that from the address for in the libtsm_sl for GenerateSOL’s call to AssignReservationSlots and the resulting address comes to fall on the line in the objdump that has AssignReservationSlots+0x5ec
7f58ae417485 - 7f58ae415490 = 1ff5
55a480 – 1ff5 = 55848B
558481 is line 1449 in NCClass.cpp
55848B is also 1449. It is an if statement and refers to the second argument on the line which is just a bool variable.
So we crashed on evaluating a bool????
Now what??? No answer needed, its rhetorical.

Related

What is GDB's "here"?

I am trying to troubleshoot a bus error with some inline SSE2 assembly. The source code has a macro that uses 5 pointers, and I suspect one of them is not aligned.
I set a breakpoint on the source line. But when I perform a disass, it disassembles from the top of the function, and not where the debugger is stopped. There are hundreds of lines of assembly, so its not really helpful to me. Pressing ENTER 30 to 40 times in response to "Press ENTER to continue" got old very quickly.
I tried a disass $pc, but it dsassembled from the top of the function. I also tried a disass . (with the dot meaning "here"), but that resulted in:
A syntax error in expression, near `.'.
What does GDB use to denote "here"?
You were correct with the use of $pc to represent the current location. The reason that this did not do what you expected when used with the disassemble command is that the disassemble command tries by default to disassemble the function containing the given address which is what you are seeing.
There are alternative forms that can be given to disassemble, for example start,end where start and end are addresses, or start,+length where start is an address and length is a number of bytes.
Try help disassemble at the gdb prompt for more information.
As an alternative you can also use the x (examine) command to display instructions, without the smart find the beginning of the function behaviour, so x/10i $pc will display 10 instructions starting from $pc. This can be helpful if you only want the instructions disassembled, however you don't have access to the /m or /r modifiers that are available on the disassemble command. These modifiers display interleaved source and assembler (for /m) or the raw instruction bytes (for /r).
Also, if the whole press ENTER to continue thing is getting old then you can try set height 0 to turn off the pager, do make sure that you have enough scroll back in your terminal though :)

Scilab compilation "cannot allocate this quantity of memory"

I am facing issues with memory allocation in Scilab after compiling.
I am compiling on a Red Hat on ppc64 (POWER8). Stack limits are already set to unlimited (ulimit -s unlimited). The ./configure script (with several options I am not showing here) runs successfully, but the make all fails and stops. When it stops, it is stuck at the Scilab command prompt with this message:
./bin/scilab-cli -ns -noatomsautoload -f modules/functions/scripts/buildmacros/buildmacros.sce
stacksize(5000000);
!--error 10001
stacksize: Cannot allocate memory.
%s: Cannot allocate this quantity of memory.
at line 27 of exec file called by :
exec('modules/functions/scripts/buildmacros/buildmacros.sce',-1)
-->
I have investigated a bit, and that error message seems to be called of course at line 00027 in buildmatros.sce, where the function stacksize(5000000) is called.
This function is defined in:
scilab-5.5.1/modules/core/sci_gateway/c/sci_stacksize.c
I found a version of the file at this page: http://doxygen.scilab.org/master_wg/d5/dfb/sci__stacksize_8c_source.html.
The condition that is FALSE and that triggers the message seems to me to show up at line 00295.
Inside that file, you see that error is displayed whenever the stacksize given as input is LARGER than what is returned by the method get_max_memory_for_scilab_stack() from the class:
scilab-5.5.1/modules/core/src/c/stackinfo.c
Again I found a version online at the following page:
http://doxygen.scilab.org/master_wg/dd/dfb/stackinfo_8h.html#afbd65a57df45bed9445a7393a4558395
The Method is declared from line 109.
It seems to invoke a variable called MAXLONG, which is however NEVER explicitly declared! As you see, it is declared several times (line 00019, 00035, 00043, 00050), but all lines are commented! [correction: the lines are NOT commented, it was my false understanding of # being a comment sign, but it's not]
So my guess is: MAXLONG is not declared, so the function does not return a value (or it returns 0) and therefore the error message is triggered because the stacksize given as input is higher than 0 or NULL or N/A.
My questions are then:
Why are all lines commented where MAXLONG is defined?
Where does MAXLONG originate from? Is it something passed from the kernel?
How can I solve the problem?
Thanks!
PS - I tried to uncomment the line in buildmacros, and it compiled and installed without issues. However, when I started scilab-cli, it displayed the same message again.
Edit after further investigation:
After further investigation, I found out that what I thought were the comments are indeed instructions for the compiler... but I kept those errors of mine, so that the answer to my question is understandable.
Here are my new points.
In Scilab I noticed that by giving an input stacksize out of bounds, the same method get_max_memory_for_scilab_stack() is invoked, to get the upper bound. The lower bound I've seen it's defined by default.
-->stacksize(1)
!--error 1504
stacksize: Out of bounds value. Not in [180000,268435454].
Also the stacksize used seems fine:
-->stacksize()
ans =
7999994. 332.
However, when trying to give such value an input inbetween, it fails.
-->stacksize(1)
!--error 1504
stacksize: Out of bounds value. Not in [180000,268435454].
It seems to invoke a variable called MAXLONG
It's not a variable, but a pre-processor macro.
Why are all lines commented where MAXLONG is defined?
You should ask that from the person who commented the lines. They're not commented in scilab-5.5.1 that's online.
Where does MAXLONG originate from? Is it something passed from the kernel?
It's defined in the file scilab-5.5.1/modules/core/src/c/stackinfo.c. It's defined to the same value as LONG_MAX which is defined by the standard c library (<limits.h> header). If the macro is not supplied by the standard library, then it's defined to some other, platform specific value.
How can I solve the problem?
If your problem originates from the lack of definition for MAXLONG, then you must define it. One way going about it is to uncomment the lines that define it. Or re-download the original sources since yours don't appear to match with the official ones.

Analyze backtrace of a crash occurring due to a faulty library

In my application I have setup signal handler to catch Segfaults, and print bactraces.
My application loads some plugins libraries, when process starts.
If my application crashes with a segfault, due to an error in the main executable binary, I can analyze the backtrace with:
addr2line -Cif -e ./myapplication 0x4...
It accurately displays the function and the source_file:line_no
However how do analyze if the crash occurs due to an error in the plugin as in the backtrace below?
/opt/myapplication(_Z7sigsegvv+0x15)[0x504245]
/lib64/libpthread.so.0[0x3f1c40f500]
/opt/myapplication/modules/myplugin.so(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]
/opt/myapplication/modules/myplugin.so(_Z11myplugin_reqmodP12CONNECTION_TP7Filebuf+0x68)[0x7f5588fe51e8]
/opt/myapplication(_ZN10Processors7ExecuteEiP12CONNECTION_TP7Filebuf+0x5b)[0x4e584b]
/opt/myapplication(_Z15process_requestP12CONNECTION_TP7Filebuf+0x462)[0x4efa92]
/opt/myapplication(_Z14handle_requestP12CONNECTION_T+0x1c6d)[0x4d4ded]
/opt/myapplication(_Z13process_entryP12CONNECTION_T+0x240)[0x4d79c0]
/lib64/libpthread.so.0[0x3f1c407851]
/lib64/libc.so.6(clone+0x6d)[0x3f1bce890d]
Both my application and plugin libraries have been compiled with gcc and are unstripped.
My application when executed, loads the plugin.so with dlopen
Unfortunately, the crash is occurring at a site where I cannot run the application under gdb.
Googled around frantically for an answer but all sites discussing backtrace and addr2line exclude scenarios where analysis of faulty plugins may be required.
I hope some kind-hearted hack knows solution to this dilemma, and can share some insights. It would be so invaluable for fellow programmers.
Tons of thanks in advance.
Here are some hints that may help you debug this:
The address in your backtrace is an address in the address space of the process at the time it crashed. That means that, if you want to translate it into a 'physical' address relative to the start of the .text section of your library, you have to subtract the start address of the relevant section of pmap from the address in your backtrace.
Unfortunately, this means that you need a pmap of the process before it crashed. I admittedly have no idea whether loading addresses for libraries on a single system are constant if you close and rerun it (imaginably there are security features which randomize this), but it certainly isn't portable across systems, as you have noticed.
In your position, I would try:
demangling the symbol names with c++filt -n or manually. I don't have a shell right now, so here is my manual attempt: _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi is ICAPSection::process(CONNECTION_T *, Filebuf *, int). This may already be helpful. If not:
use objdump or nm (I'm pretty sure they can do that) to find the address corresponding to the mangled name, then add the offset (+0x6af as per your stacktrace) to this, then look up the resulting address with addr2line.
us2012's answer was quite the trick required to solve the problem. I am just trying to restate it here just to help any other newbie struggling with the same problem, or if somebody wishes to offer improvements.
In the backtrace it is clearly visible that the flaw exists in the code for myplugin.so. And the backtrace indicates that it exists at:
/opt/myapplication/modules/myplugin.so(_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi+0x6af)[0x7f5588fe4bbf]
The problem of locating the line corresponding to this fault cannot be determined as simplistically as:
addr2line -Cif -e /opt/myapplication/modules/myplugin.so 0x7f5588fe4bbf
The correct procedure here would be to use nm or objdump to determine the address pointing to the mangled name. (Demangling as done by us2012 is not really necessary at this point). So using:
nm -Dlan /opt/myapplication/modules/myplugin.so | grep "_ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi"
I get:
0000000000008510 T _ZN11ICAPSection7processEP12CONNECTION_TP7Filebufi /usr/local/src/unstable/myapplication/sources/modules/myplugin/myplugin.cpp:518
Interesting to note here is that myplugin.cpp:518 actually points to the line where the opening "{" of the function ICAPSection::process(CONNECTION_T *, Filebuf *, int)
Next we add 0x6af to the address (revealed by the nm output above) 0000000000008510 using linux shell command
printf '0x%x\n' $(( 0x0000000000008510 + 0x6af ))
And that results in 0x8bbf
And this is the actual source_file:line_no of the faulty code, and can be precisely determined with addr2line as:
addr2line -Cif -e /opt/myapplication/modules/myplugin.so 0x8bbf
Which displays:
std::char_traits<char>::length(char const*)
/usr/include/c++/4.4/bits/char_traits.h:263
std::string::assign(char const*)
/usr/include/c++/4.4/bits/basic_string.h:970
std::string::operator=(char const*)
/usr/include/c++/4.4/bits/basic_string.h:514
??
/usr/local/src/unstable/myapplication/sources/modules/myplugin/myplugin.cpp:622
I am not too sure why the function name was not displayed here, but myplugin.cpp:622 was quite precisely where the fault was.

Linux Kernel Text Symbols

When I look through a linux kernel OOPS output, the EIP and other code address have values in the range of 0xC01-----. In my System.map and objdump -S vmlinux output, all the code addresses are at least above 0xC1------. My vmlinux has debug symbols included (CONFIG_DEBUG_INFO).
When I debug over a serial connection (kgdb), and I load gdb with gdb ./vmlinux, again I have the same issue that I cannot reconcile $eip with what I have in System.map and objdump output. When I run where in gdb, I get a jumbled mess on the stack:
#0 0xC01----- in ?? ()
#1 0xC01----- in ?? ()
#2 0xC01----- in ?? ()
...
Can anyone make any suggestions on how to resolve this/these issues? My main concern is how I actually map an eip value from an OOPS to System.map or objdump -S vmlinux. I know that the OOPS will give me the function name and offset into the object code, but I am more concerned about the previously mentioned issue and why gdb can't correctly display a stack backtrace.
Looks like the OOPS is because you jumped into a place that's not a function.
This would easily cause a crash, and would also prevent the debugger from resolving the address as a symbol.
You can check this by disassembling the area around this EIP. If I'm correct, it won't make sense as machine code.
There are generally two causes for such things:
1. Function call using a corrupt function pointer. In this case, the stack frame before the last should show the caller. But you don't have this frame, so it may be the other reason.
2. Stack overrun - your return address is corrupt, so you've returned to a bad location. If it's so, the data ESP points to should contain the address in EIP. Debugging stack overruns is hard, because the most important source of information is missing. You can try to print the stack in "raw" format (x/xa addr), and try to make sense of it.

GDB question - how do I go through disassembled code line by line?

I'd like to go through a binary file my teacher gave me line by line to check addresses on the stack and the contents of different registers, but I'm not extremely familiar with using gdb. Although I have the C code, we're supposed to work entirely from a binary file. Here are the commands I've used so far:
(gdb) file SomeCode
Which gives me this message:
Reading symbols from ../overflow/SomeCode ...(no debugging symbols found)...done.
Then I use :
(gdb) disas main
which gives me all of the assembly. I wanted to set up a break point and use the "next" command, but none of the commands I tried work. Does anyone know the syntax I would use?
try using ni which is nexti. equivalent is si which is step instruction
nexti if you want to jump over function calls.
stepi if you want to enter a function call.
The following documentation is very helpful; it has a list of all the important commands you could use on gdb.
X86-64: http://csapp.cs.cmu.edu/public/docs/gdbnotes-x86-64.pdf
IA32: http://csapp.cs.cmu.edu/public/docs/gdbnotes-ia32.pdf