Function call missing from C stack trace - c++

I'm using some stack-tracing C code (found somewhere on Stack Overflow) in my program to trace where memory blocks have been allocated:
struct layout
{
struct layout *ebp;
void *ret;
};
struct layout *fr;
__asm__("movl %%ebp, %[fp]" : /* output */ [fp] "=r" (fr));
for (int i=1 ; i<8 && (unsigned char*) fr > dsRAM; i++) {
x[i] = (size_t) fr->ret;
fr = fr->ebp;
}
Things work fairly well, except that in some calls, the code is missing some functions near the top of the stack, e.g. GDB will report:
malloc() at main.cpp
operator new() from libstdc++.so.6
TestBasicScript() at BasicScript.cpp
main() at main.cpp
whereas my code fills x[] with the addresses of malloc, operator new and main(), and misses TestBasicScript.
The code got compiled by g++ 4.5.1 (old devkit for homebrew console programming) with the following flags:
CFLAGS += -I libgeds/source/ -I wrappers -I $(DEVKITPRO)/include -DARM9 \
-include wrappers/nds/system.h -include wrappers/fake.h
CFLAGS += -m32 -Duint=uint32_t -g -Wall -Weffc++ -fno-omit-frame-pointer
I tried to use __builtin_return_address() instead, but I get pretty much the same result with much longer code.
EDIT: I noticed that I'm systematically missing the caller of operator new, which could be explained if the code of _Znwj doesn't set up a stack frame. So the list of questions becomes:
How does GDB manage to find that TestBasicScript() function call if it's not in the list of stack frames?
How do I configure the linking steps so that a debug-friendly variant of libstdc++ (if any) is used?
The original sub-question "Are there compile-time options that guarantee I can trace 100% of the calls to my malloc clone?" is thus answered by @chqrlie: -O0 is all I should need. But it will only be effective if applied to all my binaries, shared libraries included.

There are many reasons why some frames might be omitted, like for example inlining and optimization (although the provided CFLAGS do not contain optimization flags and the default is AFAIK no optimization).
Anyway, with GCC and glibc there is built-in support for stack walking: backtrace() and backtrace_symbols() (declared in <execinfo.h>), perhaps combined with abi::__cxa_demangle(). You can try those as well.
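A minimal sketch of how those calls fit together, assuming a glibc-based system with GCC or Clang. The helper names demangle() and capture_stack() are made up for this illustration and are not part of any library:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>    // abi::__cxa_demangle
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc)
#include <string>
#include <vector>

// Hypothetical helper: demangle one C++ symbol name.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) return mangled;
    std::string result(out);
    std::free(out);  // the buffer returned by __cxa_demangle comes from malloc
    return result;
}

// Hypothetical helper: capture up to max_frames return addresses of the
// calling thread and return the textual description glibc gives for each.
std::vector<std::string> capture_stack(int max_frames = 16) {
    assert(max_frames > 0);
    std::vector<void*> addrs(static_cast<std::size_t>(max_frames));
    int n = backtrace(addrs.data(), max_frames);
    std::vector<std::string> lines;
    char** symbols = backtrace_symbols(addrs.data(), n);
    if (symbols == nullptr) return lines;
    for (int i = 0; i < n; ++i) lines.emplace_back(symbols[i]);
    std::free(symbols);  // a single free() releases the whole array
    return lines;
}
```

Function names only show up in the output if the binary is linked with -rdynamic. Also note that backtrace_symbols() calls malloc() itself, so inside a malloc replacement it is probably safer to store only the raw addresses from backtrace() and resolve them later.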
Other option is to use libunwind, I was trying it as well with quite good results (and in its source code you can see some useful techniques for in-app stack walking).
All of the above usually doesn't work very well with optimized (release) executables. In particular, if they do not contain debug info (although it might have been generated and stored separately), the printed stack will be useless (on top of the frames skipped because of optimization).
An ultimate technique which works even for optimized code is generating a core dump. There you have all the information about the stack (the binary itself does not need to contain the debuginfo, it just can be left aside and only used for examining the core offline), and as a bonus values of all variables on the stack, information about all threads currently running etc.
For tracing memory allocations it is probably an overkill (it is also quite slow), but sometimes it can be pretty useful. In one of my projects I created a working implementation of such core dumper which is still present in the production code.
Note that you can actually generate a core dump of the app without terminating the application - the implementation I created basically works as follows:
fork() the process at the point where the core dump should be generated
the child process calls abort() to generate the core dump (the call stack of the forked process is the same as the original process), i.e. only the forked process is terminated by the abort()
the original parent process uses waitpid() to wait until the child process generates the core dump and terminates (with a guard counter to not wait forever)
then the original process continues running (and writes to the log that the diagnostic core has been generated along with the PID of the forked process which was used to generate the core)
This turned out to work pretty well in some situations where a diagnostic stack trace was required for release production application.
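The fork-and-abort steps above can be sketched roughly as follows. This is not the production code I mentioned, just a minimal POSIX illustration; the function name dump_core_snapshot(), the max_polls guard and the 10 ms polling interval are assumptions made for the example:

```cpp
#include <csignal>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that aborts, so the kernel can write a
// core file with the same call stack as the caller. Returns true if the
// child was terminated by SIGABRT before the guard counter ran out.
bool dump_core_snapshot(int max_polls = 500) {
    pid_t child = fork();
    if (child < 0) return false;  // fork failed
    if (child == 0) {
        std::abort();             // child only: raises SIGABRT
    }
    // Parent: poll instead of blocking, so we never wait forever.
    for (int i = 0; i < max_polls; ++i) {
        int status = 0;
        pid_t r = waitpid(child, &status, WNOHANG);
        if (r == child) {
            return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
        }
        if (r < 0) return false;  // waitpid error
        usleep(10 * 1000);        // 10 ms between polls
    }
    return false;                 // gave up; the child may still be dumping
}
```

Whether a core file actually gets written still depends on the system configuration (ulimit -c, the kernel core pattern), so the return value only tells you that the child aborted, not that a core exists.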
EDIT: Another option which I also tried is using ptrace() (if I remember well, that is also one of the techniques used by the libunwind mentioned above, and by GDB as well). It works in a similar way: spawn a child process with fork() and call ptrace(PTRACE_TRACEME) in it; the parent process can then issue various ptrace() calls to examine the stack of the child (which happens to be the same as the stack of the parent at the point of fork()). I think the libunwind source code contains an example of its use, so you can examine it there.

The compiler may not always generate a stack frame with %ebp pointing to the previous frame. For some functions it may generate code that uses %esp-based addressing to retrieve the arguments; for others it may turn a tail call into a jump instead of a call/ret sequence. As a result, the stack trace you are trying to scan may be incomplete.
Try compiling the whole project with optimisation disabled (-O0).

Related

How to get a caller graph from a given symbol in a binary

This question is related to a question I asked earlier today: I wonder whether it's possible to generate a caller graph for a given function (or symbol name, e.g. taken from nm), even if the function of interest is not part of "my" source code (e.g. located in a library, like malloc()).
For example to know where malloc is being used in my program named foo I would first lookup the symbol name:
nm foo | grep malloc
U malloc@@GLIBC_2.2.5
And then run a tool (which might need a specially compiled/linked version of my program or some compiler artifacts):
find_usages foo-with-debug-symbols "malloc@@GLIBC_2.2.5"
Which would generate a (textual) caller graph I can then process further.
Reading this question I found radare2 which seems to accomplish nearly everything you can imagine but somehow I didn't manage to generate a caller graph from a given symbol yet..
Progress
Using radare2 I've managed to generate a dot caller graph from an executable, but something is still missing. I'm compiling the following C++ program which I'm quite sure has to use malloc() or new:
#include <string>
int main() {
auto s = std::string("hello");
s += " welt";
return 0;
}
I compile it with static libraries in order to be sure all calls I want to analyze can be found in the binary:
g++ foo.cpp -static
By running nm a.out | grep -E "_Znwm|_Znam|_Znwj|_Znaj|_ZdlPv|_ZdaPv|malloc|free" you can see a lot of symbols which are used for memory allocation.
Now I run radare2 on the executable:
r2 -qAc 'agCd' a.out > callgraph.dot
With a little script (inspired by this answer) I'm looking for a call-path from any symbol containing "sym.operatornew" but there seems to be none!
Is there a way to make sure all information needed to generate a call graph from/to any function which gets called inside that binary is available?
Is there a better way to run radare2? It looks like the different call graph visualization types provide different information: e.g. the ASCII art generator provides names for symbols that the dot generator does not, while the dot generator provides much more detail about calls.
In general, you cannot extract an exact control flow graph from a binary, because of indirect jumps and calls there. A machine code indirect call is jumping into the content of some register, and you cannot reliably estimate all the values that register could take (doing so could be proven equivalent to the halting problem).
Is there a way to make sure all information needed to generate a call graph from/to any function which gets called inside that binary is available?
No, and that problem is equivalent to the halting problem, so there would be never a sure way to get that call graph (in a complete and sound way).
The C++ compiler would (usually) generate indirect jumps for virtual function calls (they jump thru the vtable) and probably when using a shared library (read Drepper's How To Write Shared Libraries paper for more).
Look into the BINSEC tool (developed by colleagues from CEA, LIST and by INRIA), at least to find references.
If you really want to find most (but not all) dynamic memory allocations in your C++ source code, you might use static source code analysis (like Frama-C or Frama-Clang) and other tools, but they are not a silver bullet.
Remember that allocating functions like malloc or operator new could be put in function pointer locations (and your C++ code might have some allocator deeply buried somewhere, then you are likely to have indirect calls to malloc)
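To make that concrete, here is a small sketch of an allocator reached only through a function pointer. The names AllocatorHooks, default_hooks() and make_counter() are invented for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator table: which allocation routine is used is only
// decided at run time, when the table is filled in.
struct AllocatorHooks {
    void* (*allocate)(std::size_t);
    void (*release)(void*);
};

// One possible table, forwarding to the C library.
AllocatorHooks default_hooks() { return {std::malloc, std::free}; }

// From the point of view of this function, the allocation is an indirect
// call through hooks.allocate; nothing here names malloc.
int* make_counter(const AllocatorHooks& hooks, int initial) {
    assert(hooks.allocate != nullptr);
    int* p = static_cast<int*>(hooks.allocate(sizeof(int)));
    if (p != nullptr) *p = initial;
    return p;
}
```

If make_counter() is compiled separately from the code that fills in the table, the machine code contains an indirect call there, so a call graph built from the binary has no edge from make_counter to malloc. (When everything is visible to the optimizer it may well resolve the pointer, which is another reason the graph depends on the build.)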
Maybe you could spend months of effort in writing your own GCC plugin to look for calls to malloc after optimizations inside the GCC compiler (but notice that GCC plugins are tied to one particular version of GCC). I am not sure it is worth the effort. My old (obsolete, non maintained) GCC MELT project was able to find calls to malloc with a size above some given constant. Perhaps in at least a year -end of 2019 or later- my successor project (bismon, funded by CHARIOT H2020 project) might be mature enough to help you.
Remember also that GCC is capable of quite fancy optimizations related to malloc. Try to compile the following C code
//file mallfree.c
#include <stdlib.h>
int weirdsum(int x, int y) {
int*ar2 = malloc(2*sizeof(int));
ar2[0] = x; ar2[1] = y;
int r = ar2[0] + ar2[1];
free (ar2);
return r;
}
with gcc -S -fverbose-asm -O3 mallfree.c. You'll see that the generated mallfree.s assembler file contains no call to malloc or to free. Such an optimization is permitted by the as-if rule, and is useful in practice for optimizing most uses of C++ standard containers.
So what you want is not simple even for apparently "simple" C++ code (and is impossible in the general case).
If you want to code a GCC plugin and have more than a full year to spend on that issue (or could pay at least 500k€ for that), please contact me. See also
https://xkcd.com/1425/ (your question is a virtually impossible one).
BTW, of course, what you really care about is dynamic memory allocation in optimized code (you really want inlining and dead code elimination, and GCC does that quite well with -O3 or -O2). When GCC is not optimizing at all (e.g. with -O0, which is the default optimization level) it would do a lot of "useless" dynamic memory allocation, especially with C++ code (using the C++ standard library). See also the CppCon 2017 talk by Matt Godbolt, “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”.

Is it possible to use addr2line with application compiled with release optimization arguments?

So I've got a backtrace
Exit with signal 11 at 2013-12-28_14:28:58.000000
/opt/s3ds/App(_Z7handlers+0x52) [0x5de322]
/lib/libc.so.6(+0x32230) [0x7f6ab9b3a230]
/opt/s3ds/App(_ZN17Service17Controller5frameERKf+0x505) [0x5a6b85]
/opt/s3ds/App(_ZN17Service15Cloud10updateEf+0x1de) [0x58642e]
/opt/s3ds/App(_ZN17Manager6updateEf+0x21b) [0x59194b]
/opt/s3ds/App(_ZN7Manager3runEv+0xd2) [0x604fa2]
/opt/s3ds/App() [0x62bfea]
/lib/libpthread.so.0(+0x68ca) [0x7f6abb0048ca]
/lib/libc.so.6(clone+0x6d) [0x7f6ab9bd7b6d]
I've compiled my application with the following arguments:
-std=c++11 -fpermissive -m64 -g -rdynamic -mtune=native -flto -O3
So it is a release build with some minimal debug information.
I wonder if it is possible to use addr2line to get any line numbers from such an optimized build?
I tried the example shown here, yet I get ??:0, like:
$ addr2line -e ./App 0x62bfea
??:0
for all addresses in []. I know that the functions in that trace from Service::Controller::frame up to Manager::run (and probably even that lambda /opt/s3ds/App() [0x62bfea]) should be in my app code (not in some library).
So is it possible to get line numbers for production optimized code? Are there any additional compiler arguments required to get them?
It may be possible, but it might not amount to much.
You have to understand that the very goal of optimizations is to alter the code to make it better (by some metric); and alteration means that the resulting code may not be meaningfully mapped to source code afterwards.
Some examples:
Dead Code Elimination and the like will remove existing code, this mainly affect an attempt to place a breakpoint at a given source-line since there may not be code for that line
Common Sub-Expression Elimination will create new temporary variables out of the blue to compute a sub-expression only once; those sub-expressions may have originally appeared in multiple expressions spread throughout the source code so the new instructions belong to multiple lines... or none at all
Invariant Hoisting or Loop Rotation will change the order in which expressions are computed compared with the original source code so that you might see code executed at line 3 then 6 then 4, 5, 7...
Loop Unrolling will copy/paste the body of a loop multiple times
And of course, those are local to a function; you also have to account for:
Function Inlining will copy paste the body of a function at the call site
Function Merging will take two different functions and remove one of them, forwarding its calls to the other (because they have the same behavior, of course)
After all that, is it even meaningful to try and reason in terms of source code ? No, not really. And of course I did not even account for the fact that all those transformations occurred on the Intermediate Representation and that the final emission of assembly code will scramble things even further (Strength Reduction, yeah!).
Honestly, even if addr2line gives you some line, I would doubt its result... and then what is the point of asking in the first place ?
I'm not sure. Normally, the -rdynamic switch should be sufficient when the function is part of your own code (which seems to be the case in your example).
Have you tried compiling with -fno-inline-functions -fno-inline-functions-called-once -fno-optimize-sibling-calls? These flags are useful when you are profiling an optimized program. Maybe they can also help solve your problem.
(Side note: Calling addr2line with the -C switch activates demangling, which is recommended since you are using C++.)

Boost threads: in IOS, thread_info object is being destructed before the thread finishes executing

Our project uses a few boost 1.48 libraries on several platforms, including Windows, Mac, Android, and IOS.
We are able to consistently get the IOS version of the project to crash (nontrivially but reliably), and
from our investigation we see that ~thread_data_base is being called on the thread's thread_info while its thread is still running.
This seems to happen as a result of the smart pointer reaching a zero count, even though it is obviously still
in scope in the thread_proxy function which creates it and runs the requested function in the thread.
This seems to happen in various cases - the call stack is not identical between crashes, though there are a few
variations which are common.
Just to be clear - this often requires running code which is creating hundreds of threads, though there are
never more than about 30 running simultaneously. I have "been lucky" and got it very very early in the
run also, but that's rare.
I created a version of the destructor which actually catches the code red-handed:
in libs/thread/src/pthread/thread.cpp:
thread_data_base::~thread_data_base()
{
boost::detail::thread_data_base* const thread_info=detail::get_current_thread_data();
void *void_thread_info = (void *) thread_info;
void *void_this = (void *) this;
// is somebody destructing the thread_data other than its own thread?
// (remember that its own which should no longer point to it anyway,
// because of the call to detail::set_current_thread_data(0) in thread_proxy)
if (void_thread_info) { // == void_this) {
__builtin_trap();
}
}
I should note that (as seen from the commented-out code) I had previously checked to see that void_thread_info == void_this because I
was only checking for the case where the thread's current thread_info was killing itself.
I have also seen cases where the value returned by get_current_thread_data is non-zero and
different from "this", which is really weird.
Also when I first wrote that version of the code, I wrote:
if (((void*)thread_info) == ((void*)this))
and at run-time I got some very weird exception that said something about a virtual function table
or something like that - I don't remember. I decided that it was trying to call "==" for this object type
and was unhappy with that, so I rewrote as above, putting the conversions to void * as separate
lines of code. That in itself is quite suspicious to me. I am not one to rush to blame compilers, but...
I should also note that when we did catch this happening with the trap, we saw the destructor for
~shared_count appear twice consecutively on the stack in Xcode source. Very doubleweird.
We tried to look at the disassembly, but couldn't make much out of it.
Again - it looks like this is always a result of the shared_count which seems to be owned by
the shared_ptr which owns the thread_info reaching zero too early.
Update: it seems that it is possible to get into situations which reach the above trap without the situation doing any harm. Since fixing the issue (see answer) I have seen it happen, but always after thread_info->run() has finished executing. Don't yet understand how...but it's working.
Some additional info:
I should note that the boost.sh from Pete Goodliffe (and modified by others) that is commonly used to compile boost for IOS
has the following note in the header:
: ${EXTRA_CPPFLAGS:="-DBOOST_AC_USE_PTHREADS -DBOOST_SP_USE_PTHREADS"}
# The EXTRA_CPPFLAGS definition works around a thread race issue in
# shared_ptr. I encountered this historically and have not verified that
# the fix is no longer required. Without using the posix thread primitives
# an invalid compare-and-swap ARM instruction (non-thread-safe) was used for the
# shared_ptr use count causing nasty and subtle bugs.
#
# Should perhaps also consider/use instead: -BOOST_SP_USE_PTHREADS
I use those flags, but to no avail.
I found the following which is very tantalizing - it looks like they had the same issue in std::thread:
http://llvm.org/bugs/show_bug.cgi?format=multiple&id=12730
That was suggestive of using an alternate implementation inside boost for arm processors which seems also to directly address this issue:
spinlock_gcc_arm.hpp
The version included with boost 1.48 uses outdated arm assembly.
I took the updated version from boost 1.52, but I'm having trouble compiling it.
I get the following error:
predicated instructions must be in IT block
I found a reference to what looks to be a similar use of this instruction here:
https://zeromq.jira.com/browse/LIBZMQ-414
I was able to use the same idea to get the 1.52 code to compile by modifying
the code as follows (I inserted an appropriate IT instruction)
__asm__ __volatile__(
"ldrex %0, [%2]; \n"
"cmp %0, %1; \n"
"it ne; \n"
"strexne %0, %1, [%2]; \n"
BOOST_SP_ARM_BARRIER :
"=&r"( r ): // outputs
"r"( 1 ), "r"( &v_ ): // inputs
"memory", "cc" );
But in any case, there are ifdefs in this file which look for the arm architecture, which is not defined that way in my environment. After I simply edited the file so that only ARM 7 code
was left, the compiler complains about the definition of BOOST_SP_ARM_BARRIER:
In file included from ./boost/smart_ptr/detail/spinlock.hpp:35:
./boost/smart_ptr/detail/spinlock_gcc_arm.hpp:39:13: error: instruction requires a CPU feature not currently enabled
BOOST_SP_ARM_BARRIER :
^
./boost/smart_ptr/detail/spinlock_gcc_arm.hpp:13:32: note: expanded from macro 'BOOST_SP_ARM_BARRIER'
# define BOOST_SP_ARM_BARRIER "dmb"
Any ideas??
Figured this out. It turns out that the boost.sh script that I mention in the question chose the incorrect boost flag to address this problem - instead of BOOST_SP_USE_PTHREADS (and the other flag there with it, BOOST_AC_USE_PTHREADS) it turns out that what is needed on IOS is BOOST_SP_USE_SPINLOCK. This ends up giving pretty much the identical solution used in the std::thread issue referred to in the question.
If you are compiling for any modern IOS device which uses ARM 7, but using an older boost (we are using 1.48), you need to copy the file spinlock_gcc_arm.hpp from a more recent boost (like 1.52). That file is #ifdef'd for the different arm architectures, but it is not clear to me that the defines it is looking for are defined in the IOS compile environment using the script. So you can either edit the file (violent but effective) or invest some time to figure out how to make this tidy and correct.
In any case, you may need to insert the extra assembly instruction that I did above in the question:
"it ne; \n"
I have not yet gone back to see if I can delete that now that I have my compile environment working properly.
However, we're not done yet. The code used in boost for this option includes, as discussed, ARM assembly language instructions. The ARM chips support two instruction sets which can't be mixed in a given module (not sure of the scope, but evidently file by file is an acceptable granularity when compiling). The instructions used in boost for this locking include non-Thumb instructions, but IOS by default uses the Thumb instruction set. The boost code, aware of the instruction set issue, checks to see that you have arm enabled but not thumb, but by default in IOS, thumb is on.
Getting the compiler to generate non-thumb ARM code depends on which compiler you are using in IOS - Apple's LLVM or LLVM GCC. GCC is deprecated, and Apple's LLVM is the default when you use XCode.
For the default Clang + Apple LLVM 4.1, you need to compile using the -mno-thumb flag. Also any files in your IOS app which use any part of boost which uses smart pointers will also have to be compiled using -mno-thumb.
To compile boost like this, I think you can just add -mno-thumb to the EXTRA_CPPFLAGS in the script. (I modified the user-config.jam directly while experimenting and haven't yet gone back to clean up.)
For your app, in Xcode you need to select your target, then go into the Build Phases tab, and there select Compile sources. There you have the option of adding compile flags, so for each relevant file (which includes boost), add the -mno-thumb flag. You can do this directly in project.pbxproj also where each file has
settings = { COMPILER_FLAGS = ""; };
you just change this to
settings = { COMPILER_FLAGS = "-mno-thumb"; };
But there's a little more. You also have to modify the darwin.jam file in the tools/build/v2/tools directory. In boost 1.48, there is code that says:
case arm :
{
options = -arch armv6;
}
This has to be modified to
case arm :
{
options = -arch armv7 ;
}
Finally, in the boost.sh script, in the function writeBjamUserConfig(), you should remove the references to -arch armv6.
If somebody knows how to do this a little more generally and cleanly, I'm sure we'd all benefit. For now, this is where I've gotten to, and I hope that this will help other IOS boost threads users. I hope that the various variants on the boost.sh IOS script out there will be updated. I plan to add some more links to this answer later.
Update: For a great article which describes the issue on the processor level,
see here:
http://preshing.com/20121019/this-is-why-they-call-it-a-weakly-ordered-cpu
Enjoy!
I use boost.asio, boost.thread, boost.smart_ptr etc. on the iOS platform. The app always crashes when run in release mode, raising SIGABRT. The crash call stack is:
__stack_chk_fail
boost::asio::detail::completion_handle
boost::asio::detail::task_ios_service_operation::complete
boost::asio::detail::task_io_service::do_run_one
boost::asio::detail::task_ios_service::run
boost::asio::io_service::run
![when create a asio work with creating new thread and io_service][1]
When trying to solve the problem, I found the following articles:
[boost-thread-threads-not-starting-on-the-iphone-ipad-in-release-build][2]
[The issue of spin_lock and thumb on iOS][3]
I then added -mno-thumb to my project's compile flags, and the problem that occurred in release mode went away.
However, a new bug showed up: EXC_ARM_DA_ALIGN, which crashed at the point where I convert network data to host byte order.
As [this article][4] says, the ARM instructions involved require the data in memory to be aligned.
Following the article [Exc_arm_da_align][5], I fixed it by using memcpy for the data conversion, instead of converting directly through the pointer.
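For reference, the memcpy approach looks roughly like this. It is a sketch rather than my actual code; read_be32() is a name chosen for the example, and it assumes a POSIX system that provides ntohl() in <arpa/inet.h>:

```cpp
#include <arpa/inet.h>  // ntohl
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit big-endian (network order) value from a
// byte buffer that may not be 4-byte aligned.
std::uint32_t read_be32(const unsigned char* p) {
    assert(p != nullptr);
    std::uint32_t raw;
    // memcpy has no alignment requirement, unlike dereferencing
    // reinterpret_cast<const std::uint32_t*>(p).
    std::memcpy(&raw, p, sizeof raw);
    return ntohl(raw);  // network byte order -> host byte order
}
```

The point is that the value is copied byte by byte into a properly aligned local variable first, so the CPU never performs a misaligned 32-bit load on the packet buffer.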
[1]: http://i.stack.imgur.com/3ijF4.png
[2]: http://stackoverflow.com/questions/4201262/boost-thread-threads-not-starting-on-the-iphone-ipad-in-release-builds/4245821#4245821
[3]: http://groups.google.com/group/boost-list/browse_thread/thread/7dc1e80659182ab3
[4]: https://brewx.qualcomm.com/bws/content/gi/common/appseng/en/knowledgebase/docs/kb95.html
[5]: http://www.cnblogs.com/unionfind/archive/2013/02/25/2932262.html

Tracking down the source code line of a crash from a non-debug built module

I have a Windows crash dump with a call stack showing me the module!functionname+offset of the function that caused the crash. The module is built without debug information using gcc.
The cause of the crash is an exception caused by a failed write at a given address, i.e. access violation (05), write violation (01).
On my development machine I have access to the same module built with debugging information. What I'm looking for is a way to track down the corresponding source code line that caused the crash, using the module!functionname+offset information as a starting point.
The method name of the top frame in the call stack is a class destructor
The mangled function name is _ZN20ViewErrorDescriptionD0Ev+x79
Running objdump -d searching for the module!functionname+offset gives:
.... call *%eax
.... mov 0xffffffbc(%ebp), %eax
.... cmpl 0x0, 0x148(%eax)
Trying to find this in the debug-built file gives no match.
The source code of the destructor only contains two delete pointerX calls.
Using gdb to load the debug-built module (a shared library) and then calling info line gives me a starting and an ending address. Using grep on the objdump output shows the corresponding disassembled code, which looks quite similar to the one from the module without debug info, but is still far from identical.
!NB - The output from info line says _ZN20ViewErrorDescriptionD2Ev not _ZN20ViewErrorDescriptionD0Ev as the crash dump says.
Taken from the ABI documentation:
::= D1 # complete object destructor
::= D2 # base object destructor
Where do I go from here?
Best regards
Kristofer H
Unfortunately even debug/non-debug builds may have different address layouts. The only way I'm aware of to accomplish something like this is to build with debug symbols and save off a copy of that binary. Then you can deploy a stripped version without the debug information.
Your approach of attempting to locate the assembly code seems the most hopeful here. I would expand on that, though: try to look at a much larger chunk of assembly in the crashed file and see if you can work out more context yourself, rather than having the computer attempt to match low-level instructions that might in fact differ slightly.
This works on the assumption that gcc compilation is 100% deterministic. I'm not sure how valid that assumption is. However, taking the further assumption that you still have exactly the same source code, you could try enabling gcc's -S command line option and rebuilding. This will result in a set of .s files, one for each source file, containing the assembly code. You can then search through these for the machine code that you want to find.

How to debug a segmentation fault while the gdb stack trace is full of '??'?

My executable contains a symbol table, but it seems that the stack trace has been overwritten.
How can I get more information out of that core? For instance, is there a way to inspect the heap and see the object instances populating it, to get some clues? Any idea is appreciated.
I am a C++ programmer for a living and I have encountered this issue more times than I'd like to admit. Your application is smashing a HUGE part of the stack. Chances are the function that is corrupting the stack is also crashing on return. The reason is that the return address has been overwritten, and this is why GDB's stack trace is messed up.
This is how I debug this issue:
1) Step through the application until it crashes. (Look for a function that is crashing on return.)
2) Once you have identified the function, declare a variable at the VERY FIRST LINE of the function:
int canary=0;
(The reason why it must be the first line is that this value must be at the very top of the stack. This "canary" will be overwritten before the function's return address.)
3) Put a variable watch on canary and step through the function; when canary!=0, you have found your buffer overflow! Another possibility is to put a variable breakpoint for when canary!=0 and just run the program normally. This is a little easier, but not all IDEs support variable breakpoints.
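To illustrate what the watch is waiting for, here is a small simulation. A real stack overflow is undefined behaviour and the real stack layout is chosen by the compiler (so the canary is not guaranteed to sit next to the buffer), which is why this sketch uses a struct to stand in for the frame. SimulatedFrame and overflow_hits_canary() are names invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Stand-in for a stack frame: the canary sits right after the buffer.
struct SimulatedFrame {
    char buffer[8];
    int canary;
};

// Writes bytes_written bytes starting at the beginning of the frame and
// reports whether the canary was changed, i.e. what the watchpoint on
// `canary` would catch in the debugger.
bool overflow_hits_canary(std::size_t bytes_written) {
    SimulatedFrame frame{};                   // everything zeroed, canary == 0
    assert(bytes_written <= sizeof frame);    // stay inside the object
    std::memset(&frame, 'A', bytes_written);  // the "buggy" write
    return frame.canary != 0;
}
```

Writing 8 bytes fills the buffer exactly and leaves the canary at 0; writing 12 bytes runs past the buffer and changes it.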
EDIT: I have talked to a senior programmer at my office and in order to understand the core dump you need to resolve the memory addresses it has. One way to figure out these addresses is to look at the MAP file for the binary, which is human readable. Here is an example of generating a MAP file using gcc:
gcc -o foo -Wl,-Map,foo.map foo.c
This is a piece of the puzzle, but it will still be very difficult to obtain the address of function that is crashing. If you are running this application on a modern platform then ASLR will probably make the addresses in the core dump useless. Some implementation of ASLR will randomize the function addresses of your binary which makes the core dump absolutely worthless.
1. You have to use a debugging tool to detect this; valgrind is fine.
2. When compiling your code, make sure you add the -Wall option, so that the compiler warns you about likely mistakes (and make sure you don't have any warnings in your code).
ex: gcc -Wall -g -c -o oke.o oke.c
3. Make sure you also pass the -g option to produce debugging information. You can also add location information to your own debug output using some predefined names. The following are very useful to me:
__LINE__ : tells you the line
__FILE__ : tells you the source file
__func__ : tells you the function
Using the debugger alone is not enough, I think; you should get used to making the most of the compiler's abilities.
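For example, the three names listed above can be wrapped in a small helper like this (a sketch; format_location() and the HERE() macro are names I made up for the illustration):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build a "file:line (function)" string.
std::string format_location(const char* file, int line, const char* func) {
    assert(file != nullptr && func != nullptr);
    return std::string(file) + ":" + std::to_string(line) + " (" + func + ")";
}

// Expands at the call site, so the values describe the place where HERE()
// is written, not this header.
#define HERE() format_location(__FILE__, __LINE__, __func__)
```

Printing HERE() at the entry of suspicious functions tells you which function was reached last before the crash, even when the stack trace itself is unusable.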
Hope this would help
TL;DR: extremely large local variable declarations in functions are allocated on the stack, which, on certain platform and compiler combinations, can overrun and corrupt the stack.
Just to add another potential cause to this issue. I was recently debugging a very similar issue. Running gdb with the application and core file would produce results such as:
Core was generated by `myExecutable myArguments'.
Program terminated with signal 6, Aborted.
#0 0x00002b075174ba45 in ?? ()
(gdb)
That was extremely unhelpful and disappointing. After hours of scouring the internet, I found a forum that talked about how the particular compiler we were using (Intel compiler) had a smaller default stack size than other compilers, and that large local variables could overrun and corrupt the stack. Looking at our code, I found the culprit:
void MyClass::MyMethod() {
...
char charBuffer[MAX_BUFFER_SIZE];
...
}
Bingo! I found MAX_BUFFER_SIZE was set to 10000000, thus a 10MB local variable was being allocated on the stack! After changing the implementation to use a shared_ptr and create the buffer dynamically, suddenly the program started working perfectly.
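A sketch of that kind of change. Note that it uses std::vector instead of the shared_ptr I actually used, simply because it is the shortest way to show a heap-allocated buffer; make_char_buffer() is a name invented for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same size as in the story above.
constexpr std::size_t MAX_BUFFER_SIZE = 10000000;

// Hypothetical replacement for `char charBuffer[MAX_BUFFER_SIZE];`:
// the 10 MB live on the heap, and only the small vector object
// (a few pointers) sits on the stack.
std::vector<char> make_char_buffer() {
    std::vector<char> charBuffer(MAX_BUFFER_SIZE);  // zero-initialised
    assert(charBuffer.size() == MAX_BUFFER_SIZE);
    return charBuffer;
}
```

Code that needs a raw pointer can still use charBuffer.data(), and the memory is released automatically when the vector goes out of scope.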
Try running with Valgrind memory debugger.
To confirm: was your executable compiled in release mode, i.e. without debug symbols? That could explain the '??'. Try recompiling with the -g switch, which includes debugging information and embeds it into the executable. Other than that, I am out of ideas as to why you have '??'.
Not really. Sure you can dig around in memory and look at things. But without a stack trace you don't know how you got to where you are or what the parameter values were.
However, the very fact that your stack is corrupt tells you that you need to look for code that writes into the stack.
Overwriting a stack array. This can be done the obvious way or by calling a function or system call with bad size arguments or pointers of the wrong type.
Using a pointer or reference to a function's local stack variables after that function has returned.
Casting a pointer to a stack value to a pointer of the wrong size and using it.
If you have a Unix system, "valgrind" is a good tool for finding some of these problems.
I assume that since you say "My executable contains symbol table" that you compiled and linked with -g, and that your binary wasn't stripped.
We can just confirm this:
strings -a your_binary | grep function_name_you_know_should_exist
Also try using pstack on the core and see if it does a better job of picking up the call stack. If it does, it sounds like your gdb is out of date compared to your gcc/g++ version.
Sounds like the glibc version on your machine is not identical to the one the core file was generated with when it crashed in production. Get the files listed by "ldd ./appname" and copy them onto your machine, then tell gdb where to look:
set solib-absolute-prefix /path/to/libs