Find sourcecode line which causes undefined reference error - c++

Sooner or later, everyone programming in C/C++ will face an "undefined reference" error.
Often this is caused by missing libraries, and most of these errors are fixed within seconds by linking against them.
However, when one uses templates with separate files for declaration and implementation, for instance, one may get an undefined reference caused by an "unintended" template instantiation. Unfortunately, all the information we get is the "undefined reference" message itself, without hints about the cause such as the line numbers of the callers.
What I am curious about:
Is there an easy way to spot the actual source code line(s) calling the function or template that causes the undefined reference error?
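For readers unfamiliar with the template case, here is a minimal sketch of how it can arise. The function `twice` and the file names in the comments are hypothetical; the three "files" are folded into one translation unit so that the sketch compiles on its own.

```cpp
#include <cassert>

// ---- twice.h (hypothetical): declaration only, all that callers see ----
template <typename T>
T twice(T value);

// ---- twice.cpp (hypothetical): definition, invisible to other files ----
template <typename T>
T twice(T value) {
    return value + value;
}

// Explicit instantiation: emits twice<int> into the object file so that
// other object files can link against it.
template int twice<int>(int);

// ---- caller.cpp (hypothetical): would include only twice.h ----
int call_with_int() {
    return twice(21);   // resolved, because twice<int> was instantiated above
}

// In a real multi-file build, a call such as twice(1.5) in caller.cpp would
// compile but fail to link: no twice<double> was ever emitted.
void check_call_with_int() {
    assert(call_with_int() == 42);
}
```

In the split layout, the linker reports only the missing symbol, which is why the caller's line is hard to find.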

As I mentioned in my answer to this question, whether or not it's straightforward to get a line number causing the link error depends on whether the compiler emitted all the necessary information.
To begin with, these are the cases I've run into that lead to the behavior you're seeing:
The compiler emitting faulty debug info (Solaris Studio 12.3 with debugging/optimizations under certain circumstances)
A destructor executing for an object going out of scope
Code inserted by the compiler:
stack protector
sanitizers
other tools that instrument code either for debugging or profiling
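To illustrate the destructor case from the list above, here is a small sketch with a hypothetical `Tracer` type. The source contains no explicit destructor call, yet the compiler emits one at the closing brace, so a link error for that call may point at a brace or at no useful line.

```cpp
#include <cassert>

// Hypothetical type whose destructor has an observable side effect.
struct Tracer {
    static int destroyed;   // counts destructor calls
    ~Tracer() { ++destroyed; }
};
int Tracer::destroyed = 0;

int destructor_calls_after_scope() {
    const int before = Tracer::destroyed;
    {
        Tracer t;           // no destructor call is written anywhere...
    }                       // ...but the compiler emits one at this brace
    return Tracer::destroyed - before;
}

void check_scope_exit() {
    assert(destructor_calls_after_scope() == 1);
}
```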
What I'll suggest for tracking it down may help if you have a link error resembling:
asdf.o: In function `whatever':
asdf.o(.text+0x1238): undefined reference to `fdsa'
... because at the very least you have an address to work with.
First, try addr2line:
~ addr2line -e asdf.o 0x1238
# If it works, you'll get:
asdf.cc:N
# If it doesn't work, you'll get:
??:?
Failing that, try objdump:
~ objdump --dwarf=decodedline asdf.o
asdf.o: file format elf64-x86-64
Decoded dump of debug contents of section .debug_line:
CU: asdf.cc:
File name Line number Starting address
asdf.cc 1 0x1234
asdf.cc 3 0x1254
asdf.cc 5 0x1274
In the completely fabricated example I've given here, there isn't an entry in .debug_line corresponding to 0x1238 (the address in the linker error). So it could be compiler magic (e.g. extra code added by something like the stack protector or a sanitizer), or, hopefully, it's related to whatever is happening on lines 1 and 3, since the address lies between the starting addresses of those two lines.
If that doesn't give you enough to go on, here is what I did when I needed more:
Insert a link flag to stop it from demangling to get the mangled symbol
Recompile the object file, but have it generate assembly instead
Search the assembly for the mangled symbol
Assuming the assembly is annotated well enough it shouldn't be terribly hard to correlate the missing symbol + info from objdump + the assembly and at least get a fix on the line of code to start the rest of your search (assuming you still have more rabbit holes to go down as is often the case with STL).

Related

Is it possible to use addr2line with application compiled with release optimization arguments?

So I've got a backtrace
Exit with signal 11 at 2013-12-28_14:28:58.000000
/opt/s3ds/App(_Z7handlers+0x52) [0x5de322]
/lib/libc.so.6(+0x32230) [0x7f6ab9b3a230]
/opt/s3ds/App(_ZN17Service17Controller5frameERKf+0x505) [0x5a6b85]
/opt/s3ds/App(_ZN17Service15Cloud10updateEf+0x1de) [0x58642e]
/opt/s3ds/App(_ZN17Manager6updateEf+0x21b) [0x59194b]
/opt/s3ds/App(_ZN7Manager3runEv+0xd2) [0x604fa2]
/opt/s3ds/App() [0x62bfea]
/lib/libpthread.so.0(+0x68ca) [0x7f6abb0048ca]
/lib/libc.so.6(clone+0x6d) [0x7f6ab9bd7b6d]
I've compiled my application with the following arguments:
-std=c++11 -fpermissive -m64 -g -rdynamic -mtune=native -flto -O3
So it is a release build with some minimal debug information.
I wonder if it is possible to use addr2line to get any line number from such optimized build?
I tried the example shown here, yet I get ??:0, like:
$ addr2line -e ./App 0x62bfea
??:0
for all addresses in []. I know that the functions in that trace from Service::Controller::frame up to Manager::run (and probably even that lambda, /opt/s3ds/App() [0x62bfea]) should be in my app code (not in some library).
So is it possible to get line numbers for production optimized code? Are any additional compiler arguments required to get them?
It may be possible, but it might not amount to much.
You have to understand that the very goal of optimizations is to alter the code to make it better (by some metric); and alteration means that the resulting code may not be meaningfully mapped to source code afterwards.
Some examples:
Dead Code Elimination and the like will remove existing code; this mainly affects attempts to place a breakpoint at a given source line, since there may not be any code for that line
Common Sub-Expression Elimination will create new temporary variables out of the blue to compute a sub-expression only once; those sub-expressions may have originally appeared in multiple expressions spread throughout the source code so the new instructions belong to multiple lines... or none at all
Invariant Hoisting or Loop Rotation will change the order in which expressions are computed compared with the original source code so that you might see code executed at line 3 then 6 then 4, 5, 7...
Loop Unrolling will copy/paste the body of a loop multiple times
And of course, those are local to a function; you also have to account for:
Function Inlining will copy paste the body of a function at the call site
Function Merging will take two different functions and remove one of them, forwarding its calls to the other (because they have the same behavior, of course)
After all that, is it even meaningful to try and reason in terms of source code? No, not really. And of course I did not even account for the fact that all those transformations occurred on the Intermediate Representation and that the final emission of assembly code will scramble things even further (Strength Reduction, yeah!).
Honestly, even if addr2line gives you some line, I would doubt its result... and then what is the point of asking in the first place?
I'm not sure. Normally, the -rdynamic switch should be sufficient when the function is part of your own code (which seems to be the case in your example).
Have you tried compiling with -fno-inline-functions -fno-inline-functions-called-once -fno-optimize-sibling-calls? These flags are useful when you are profiling an optimized program. Maybe they can also help to solve your problem.
(Side note: Calling addr2line with the -C switch activates demangling, which is recommended since you are using C++.)

C++ name mangling in a so

Here's what I did:
I changed a .h file from
SomeObj* getCacheObj( int i = 0 );
to
SomeObj* getCacheObj( int i );
SomeObj* getCacheObj();
I recompiled the code (no problems), and the changes went into somelib.so (one of many .so files). I then replaced the old .so on the equipment with this one and got the following error when loading it:
undefined symbol: _ZN13KeypathHelper11getCacheObjEv
Now the strange part is that I've been told this class is only used in this so file (How can I make sure?). I am not that experienced and not sure how to investigate. Any suggestions are welcome.
Update
This particular problem was caused by another .so file that was using the KeypathHelper class, while I had only replaced the one containing it. I found out which other .so needed to be updated by grepping all the .so files for KeypathHelper.
The _ZN13KeypathHelper11getCacheObjEv symbol is the mangled name of KeypathHelper::getCacheObj() (you can easily translate it using c++filt, for example). Given that you have only added a method and whatever is loading the shared object cannot find it, I think you either haven't updated the shared object or forgot to provide a definition for KeypathHelper::getCacheObj() (in other words, you did not implement the method).
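If c++filt is not at hand, the same translation can be done in code. This is a minimal sketch that assumes GCC or Clang, whose `<cxxabi.h>` provides `abi::__cxa_demangle`; the wrapper name `demangle` is my own.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>   // GCC/Clang (Itanium C++ ABI) only
#include <string>

// Returns the demangled form of `mangled`, or the input unchanged when
// demangling fails (for example, when it is not a valid mangled name).
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return mangled;
    }
    std::string result(text);
    std::free(text);   // the returned buffer is malloc-allocated
    return result;
}

void check_demangle() {
    assert(demangle("_ZN13KeypathHelper11getCacheObjEv") ==
           "KeypathHelper::getCacheObj()");
}
```

The trailing `Ev` in the mangled name encodes the empty parameter list, which is how you can tell that the missing symbol is the new zero-argument overload.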
In order to investigate, you have to see what is failing to resolve the symbol. Usually, developers have a sense for it. Say, if a binary XXX cannot load library YYY due to unresolved symbol, then XXX is using it and it does not appear to be in YYY (or anywhere else for that matter). If there is no sense for that, one can resort to reading ld.so (8) manual page and debug the dynamic linker by using available means like defining LD_DEBUG.
Also, @PlasmaHH has asked a very good question. If the only change you made was to the header file, then you must know that a single function/method with a default value for a parameter is not the same as two functions/methods where one has a parameter and one does not.
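The difference can be sketched as follows. The namespaces `before_change` and `after_change` and the tiny `SomeObj` are hypothetical stand-ins for the two versions of the header; free functions are used for brevity instead of class members.

```cpp
#include <cassert>

struct SomeObj { int index; };

namespace before_change {
    // One function, one symbol: getCacheObj(int). A call written as
    // getCacheObj() is compiled as getCacheObj(0) inside the caller.
    SomeObj* getCacheObj(int i = 0) {
        static SomeObj cache[2] = {{0}, {1}};
        return &cache[i];   // sketch only: i must be 0 or 1
    }
}

namespace after_change {
    // Two functions, two symbols: getCacheObj(int) and getCacheObj().
    SomeObj* getCacheObj(int i) {
        static SomeObj cache[2] = {{0}, {1}};
        return &cache[i];   // sketch only: i must be 0 or 1
    }
    // This overload needs its own definition; declaring it in the header
    // alone leaves an undefined symbol for the dynamic linker to report.
    SomeObj* getCacheObj() { return getCacheObj(0); }
}

void check_overloads() {
    assert(before_change::getCacheObj()->index == 0);
    assert(after_change::getCacheObj()->index == 0);
    assert(after_change::getCacheObj(1)->index == 1);
}
```

Callers compiled against the new header reference the zero-argument symbol, so every binary that provides or uses it has to be rebuilt and redeployed together.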
As for your second question about how to make sure that symbol in a shared object is not being used outside — you have to change the symbol visibility so that nobody from the outside is able to link/resolve/use the symbol. For example, see GCC Visibility.
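As a rough sketch of what changing the visibility looks like with GCC or Clang (the function names are hypothetical): a hidden function can still be called from anywhere inside the shared object, but it is not exported, so no other binary can bind to it.

```cpp
#include <cassert>

// GCC/Clang-specific attribute. Hidden: internal to this shared object.
__attribute__((visibility("hidden"))) int internal_helper(int x) {
    return x * 2;
}

// Default visibility: part of the shared object's public interface.
__attribute__((visibility("default"))) int public_entry(int x) {
    return internal_helper(x) + 1;
}

void check_visibility() {
    assert(public_entry(20) == 41);
}
```

A common variation is to compile with -fvisibility=hidden and mark only the intended entry points as default.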
Hope it helps. Good Luck!

How do debuggers get line numbers of commands?

I'm trying to get the line numbers of addresses I collected in a stack walk by using SymGetLineFromAddr64, but I can't seem to get the addresses of simple statements or their lines.
For example, if I'm looking at the method:
void Test(int g)
{
g++;
DoSomething(g);
g--;
}
I'll get only the line number of "DoSomething", but I want the line numbers of "g++" etc.
I suppose it is doable, because debuggers do it.
How can I do it myself in C++ on Windows?
A stack walk will only retrieve addresses that are stored on the stack, which pretty much means function calls. If you want the address of your g++ or g--, you'll need to use something other than a stack walk to get them (e.g., SymGetFileLineOffsets64). If you're starting from a stackwalk and have info from SymGetLineFromAddr64, you can use SymGetLineNext64 and SymGetLinePrev64 to get information about the surrounding lines.
The only way to do it is to use compiler-generated symbol files, like the *.pdb files for the Microsoft Visual Studio compilers (pdb stands for program database). These files contain all symbols used during the compilation step. Even for a release build you'll get information about the symbols in use (some may have been optimized away).
The main disadvantage is that this is highly compiler-specific. gcc, for example, may include symbol information in the shared object (.so) file or executable itself. Other compilers have other formats...
What compiler do you use (name/version)?

Tracking down the source code line of a crash from a non-debug built module

I have a Windows crash dump with a call stack showing me the module!functionname+offset of the function that caused the crash. The module is built without debug information using gcc.
The cause of the crash is an exception caused by a failed write at a given address, i.e. access violation (05), write violation (01).
On my development machine I have access to the same module built with debugging information. What I'm looking for is a way to track down the corresponding source code line that caused the crash, this by using the module!functionname+offset information as starting point.
The method name of the top frame in the call stack is a class destructor
The mangled function name is _ZN20ViewErrorDescriptionD0Ev+x79
Running objdump -d searching for the module!functionname+offset gives:
.... call *%eax
.... mov 0xffffffbc(%ebp), %eax
.... cmpl 0x0, 0x148(%eax)
Trying to find this in the debug-built file gives no match.
The source code of the destructor only contains two delete pointerX calls.
Using gdb to load the debug built module(sharedlibrary) and then calling info line gives me a starting and ending address, using grep on the objdump output shows the corresponding disassembled code, which looks quite much like the one from the module without debug info, but still far from the same.
!NB - The output from info line says _ZN20ViewErrorDescriptionD2Ev not _ZN20ViewErrorDescriptionD0Ev as the crash dump says.
Taken from the ABI documentation:
<ctor-dtor-name> ::= D1 # complete object destructor
<ctor-dtor-name> ::= D2 # base object destructor
Where do I go from here?
Best regards
Kristofer H
Unfortunately even debug/non-debug builds may have different address layouts. The only way I'm aware of to accomplish something like this is to build with debug symbols and save off a copy of that binary. Then you can deploy a stripped version without the debug information.
Your approach of attempting to locate the assembly code seems the most hopeful here. I would expand on it, though: try to look at a much larger chunk of assembly in the crashed file and see if you can generate more context yourself, rather than having the computer attempt to match low-level instructions that might in fact differ slightly.
This works on the assumption that gcc compilation is 100% deterministic. I'm not sure how valid that assumption is. However, taking the further assumption that you still have exactly the same source code, you could try enabling gcc's -S command line option and rebuilding. This will result in a set of .s files, one for each source file, containing the assembly code. You can then search through these for the code that you want to find.

How to debug a segmentation fault while the gdb stack trace is full of '??'?

My executable contains a symbol table, but it seems that the stack trace has been overwritten.
How can I get more information out of that core, please? For instance, is there a way to inspect the heap and see the object instances populating it to get some clues? Any idea is appreciated.
I am a C++ programmer for a living and I have encountered this issue more times than I like to admit. Your application is smashing a HUGE part of the stack. Chances are the function that is corrupting the stack is also crashing on return. The reason is that the return address has been overwritten, and this is why GDB's stack trace is messed up.
This is how I debug this issue:
1) Step through the application until it crashes. (Look for a function that is crashing on return.)
2) Once you have identified the function, declare a variable at the VERY FIRST LINE of the function:
int canary=0;
(The reason why it must be the first line is that this value must be at the very top of the stack. This "canary" will be overwritten before the function's return address.)
3) Put a variable watch on canary and step through the function; when canary!=0, you have found your buffer overflow! Another possibility is to put a variable breakpoint for when canary!=0 and just run the program normally. This is a little easier, but not all IDEs support variable breakpoints.
EDIT: I have talked to a senior programmer at my office and in order to understand the core dump you need to resolve the memory addresses it has. One way to figure out these addresses is to look at the MAP file for the binary, which is human readable. Here is an example of generating a MAP file using gcc:
gcc -o foo -Wl,-Map,foo.map foo.c
This is a piece of the puzzle, but it will still be very difficult to obtain the address of the function that is crashing. If you are running this application on a modern platform, then ASLR will probably make the addresses in the core dump useless. Some implementations of ASLR will randomize the function addresses of your binary, which makes the core dump absolutely worthless.
1. You have to use some debugging tool to detect it; valgrind is fine.
2. When compiling your code, make sure you add the -Wall option, so that the compiler tells you about likely mistakes (make sure you don't have any warnings in your code).
ex: gcc -Wall -g -c -o oke.o oke.c
3. Make sure you also pass the -g option to produce debugging information. You can also print location information yourself using some predefined macros. The following are very useful for me:
__LINE__ : tells you the line
__FILE__ : tells you the source file
__func__ : tells you the function
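A minimal sketch of putting these to use follows. The helper `format_location`, the macro `HERE` and the function `probe` are hypothetical names. The wrapper has to be a macro so that the values are captured at the point of use.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Formats a "file:line (function)" tag from the values the macro captures.
inline std::string format_location(const char* file, int line,
                                   const char* func) {
    std::ostringstream out;
    out << file << ':' << line << " (" << func << ')';
    return out.str();
}

// Must be a macro: __FILE__, __LINE__ and __func__ are evaluated where
// HERE() is written, not inside format_location.
#define HERE() format_location(__FILE__, __LINE__, __func__)

std::string probe() {
    return HERE();   // the tag names this file, this line and "probe"
}

void check_here() {
    assert(probe().find("(probe)") != std::string::npos);
}
```

Printing such a tag at function entry and exit is a crude but effective way to narrow down where a crash happens when the debugger's stack trace is unusable.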
Using the debugger is not enough, I think; you should also get used to making the most of the compiler's abilities.
Hope this would help
TL;DR: extremely large local variable declarations in functions are allocated on the stack, which, on certain platform and compiler combinations, can overrun and corrupt the stack.
Just to add another potential cause to this issue. I was recently debugging a very similar issue. Running gdb with the application and core file would produce results such as:
Core was generated by `myExecutable myArguments'.
Program terminated with signal 6, Aborted.
#0 0x00002b075174ba45 in ?? ()
(gdb)
That was extremely unhelpful and disappointing. After hours of scouring the internet, I found a forum that talked about how the particular compiler we were using (Intel compiler) had a smaller default stack size than other compilers, and that large local variables could overrun and corrupt the stack. Looking at our code, I found the culprit:
void MyClass::MyMethod() {
...
char charBuffer[MAX_BUFFER_SIZE];
...
}
Bingo! I found MAX_BUFFER_SIZE was set to 10000000, thus a 10MB local variable was being allocated on the stack! After changing the implementation to use a shared_ptr and create the buffer dynamically, suddenly the program started working perfectly.
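A sketch of that kind of change is shown below. Note that it uses std::vector rather than the shared_ptr mentioned above, simply because it is the shortest way to get a heap-allocated buffer. The function name is hypothetical and the size is the one quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t MAX_BUFFER_SIZE = 10000000;

std::size_t use_large_buffer() {
    // char charBuffer[MAX_BUFFER_SIZE];   // ~10 MB on the stack: may overflow
    std::vector<char> charBuffer(MAX_BUFFER_SIZE);   // storage is on the heap
    charBuffer.front() = 'a';
    charBuffer.back() = 'z';
    return charBuffer.size();
}

void check_large_buffer() {
    assert(use_large_buffer() == MAX_BUFFER_SIZE);
}
```

Only the small vector object itself lives in the stack frame, so the function no longer depends on the platform's stack size limit.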
Try running with Valgrind memory debugger.
To confirm: was your executable compiled in release mode, i.e. without debug symbols? That could explain why there's '??'. Try recompiling with the -g switch, which generates debugging information and embeds it in the executable. Other than that, I am out of ideas as to why you have '??'.
Not really. Sure you can dig around in memory and look at things. But without a stack trace you don't know how you got to where you are or what the parameter values were.
However, the very fact that your stack is corrupt tells you that you need to look for code that writes into the stack.
Overwriting a stack array. This can be done the obvious way or by calling a function or system call with bad size arguments or pointers of the wrong type.
Using a pointer or reference to a function's local stack variables after that function has returned.
Casting a pointer to a stack value to a pointer of the wrong size and using it.
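As an illustration of the second pattern in the list above (the function names are hypothetical), the buggy version is kept as a comment because calling it would be undefined behavior. The fix returns an object that owns its storage.

```cpp
#include <cassert>
#include <string>

// BUG (left commented out on purpose): returns the address of a local array
// whose stack slot is reused as soon as the function returns.
// const char* make_label_dangling() {
//     char label[16] = "frame";
//     return label;
// }

// Fix: copy the data into an object that owns its own storage.
std::string make_label() {
    char label[16] = "frame";
    return std::string(label);
}

void check_make_label() {
    assert(make_label() == "frame");
}
```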
If you have a Unix system, "valgrind" is a good tool for finding some of these problems.
I assume that since you say "My executable contains symbol table" that you compiled and linked with -g, and that your binary wasn't stripped.
We can just confirm this:
strings -a |grep function_name_you_know_should_exist
Also try using pstack on the core and see if it does a better job of picking up the call stack. In that case it sounds like your gdb is out of date compared to your gcc/g++ version.
Sounds like you're not using the same glibc version on your machine as the one that was in use when the program crashed in production. Get the files listed by "ldd ./appname" and copy them onto your machine, then tell gdb where to look:
set solib-absolute-prefix /path/to/libs