rogue missing c++ symbol - debugging strategies? - c++

I am building/using the python module for LAMMPS, which is the open source Molecular Dynamics simulator (project home, source).
The python module works by compiling the C++ application as a library, and using CDLL/ctypes to call a C function interface. When you call the CDLL() function in python, the load fails if there are any missing symbols that the OS doesn't find in the library itself, and can't load from other available libraries.
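A minimal sketch of how this failure surfaces on the Python side, assuming only the standard ctypes module. The library path in the demo is hypothetical. The message of the OSError comes from the dynamic loader and normally names the unresolved symbol.

```python
import ctypes


def try_load(path):
    """Return None if the library loads, otherwise the loader's error message."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The dynamic loader's message usually names the unresolved symbol.
        return str(exc)
    return None


if __name__ == "__main__":
    # Hypothetical path: substitute the library produced by your own build.
    print(try_load("./liblammps_serial.so"))
```

Capturing the message as a string makes it easy to pass the mangled name on to other tools.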
The particular symbol I'm getting as missing is a C++ mangled name, __ZN3MPI3Win14Set_errhandlerERKNS_10ErrhandlerE, which is probably MPI_Win_set_errhandler (or some namespaced/object-oriented equivalent with a similar name). For context, I've compiled it using the python/setup_serial.py file, which should build with a dummy MPI interface and shouldn't reference any real MPI symbols at all, so this is a rogue reference that's crept in somewhere. I've also made some modifications to the source, but I get the same error when I disable all my changes.
My question is, what is the best debugging strategy for finding out where a symbol is referenced in a dynamic library giving this sort of error? So far, I've tried searching the source for references to this symbol (or parts of the name), but I'm not finding any instances. In fact, the only results are the binary files from the python build process, i.e. the library I'm having trouble importing.
My next step is to search inside the binary somehow, I guess, but I have no idea where to begin that (or some other strategy).

c++filt is your friend
$ c++filt __ZN3MPI3Win14Set_errhandlerERKNS_10ErrhandlerE
MPI::Win::Set_errhandler(MPI::Errhandler const&)
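To see why the mangled name maps to that output, here is a rough sketch that decodes only the nested-name part of an Itanium-style mangled symbol: after the _ZN prefix, each component is a decimal length followed by that many characters. The function name is made up for illustration, and it deliberately ignores parameter types, templates and qualifiers; c++filt remains the real tool.

```python
import re


def qualified_name(mangled):
    """Decode just the nested name of an Itanium-mangled symbol.

    Illustrative only: parameter types, templates and cv-qualifiers
    are not handled.
    """
    s = mangled.lstrip("_")  # some platforms add an extra leading underscore
    if not s.startswith("ZN"):
        raise ValueError("not a nested mangled name: %r" % mangled)
    pos = 2
    parts = []
    while pos < len(s) and s[pos].isdigit():
        digits = re.match(r"\d+", s[pos:]).group()
        length = int(digits)
        pos += len(digits)
        parts.append(s[pos:pos + length])  # the identifier itself
        pos += length
    return "::".join(parts)


if __name__ == "__main__":
    # prints "MPI::Win::Set_errhandler"
    print(qualified_name("__ZN3MPI3Win14Set_errhandlerERKNS_10ErrhandlerE"))
```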
Now do a quick Google search for that name:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=512616
Looks like there is a case where parts are upgraded but not recompiled. The second thing is to look at the link line for the compiler and see what libraries it includes.
As a last resort, do something like:
readelf -s /path/to/libfoo.so
And start grepping around to see if it's defined somewhere.

Related

How to solve C++ linking error in shared library linked to a static library

I'm struggling with a linking error.
I have 3 modules:
static library A which defines function ole::compound_document::find_storage(const std::string&);
shared library B which is linked to A and uses the function;
executable C which is linked to B and uses functions from B (but does not call directly functions of library A).
During the linking of the executable C, I receive the following error message:
../../bin/B.so: undefined reference to `ole::compound_document::find_storage(std::string const&)'
The function is defined in library A.
If I run utility nm on shared library B, I receive the following output:
0000000001841c70 T ole::compound_document::find_storage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
U ole::compound_document::find_storage(std::string const&)
It shows two find_storage functions. One of them is defined; the other is not.
I'm trying to understand how this can happen, so far without success.
The problem appears under Linux (Ubuntu), compiler: clang-9. On Windows, I can build the libraries and the executable without any problem.
I've tried to create a minimal example, just putting 3 simple modules together with only a few functions. Everything works. The compiler uses only the first definition of the function. I don't understand where the second definition comes from. I suspected some mix of c++ standards but cannot find anything.
Any suggestions will be highly appreciated.
You will need to look for problems in library B's code. std::string, verbatim, is not something that's expected to be an actual type referenced from an exported symbol, since std::string is just an alias for a std::basic_string instance. Narrow this down by looking at the symbols of all modules that were linked into library B, and once you have identified the module that references it, you'll need to figure out why.
Your question does not provide sufficient data to decisively identify the linking problem because, of course, that can only be done by inspecting all the object modules involved in the linking, and inspecting all the source code for the likely violations of the One Definition Rule, or ill-formed code that did not produce a diagnostic, but manifested itself as a link failure.
Therefore, the following answer is meant as a general guide to isolating these kinds of linking failures. I don't think this question qualifies to be hammered by the canonical answer.
You have a symbol in the linked shared library that's not getting resolved when linked to an executable, specifically:
ole::compound_document::find_storage(std::string const&)
Just like you ran nm on the shared library, you can use it on every module that went into the shared library, individually. This will find which object module the unresolved reference came from. If you don't find it, it must've come from the static library you linked with, so repeat your search there.
That reference came from one of the object modules that you used to build the shared library. You will find it this way; it's unlikely to have popped into existence on its own.
Once you find the relevant module, you're then on your own, by looking at the actual code that was compiled, and figure out what's up. If you can't figure it out: divide and conquer. Take the object module, and split the source file into two files, half of the functions in each one, compile them separately, and then look and see where the unresolved reference comes from.
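The per-module search described above can be scripted. This is a sketch, assuming each module's listing is the text printed by nm -C in the usual "address type name" layout; the helper names and the module names in the demo are hypothetical.

```python
import re

# One nm line: optional hex address, a one-letter type, then the symbol name.
NM_LINE = re.compile(r"^(?:[0-9A-Fa-f]+\s+)?([A-Za-z])\s+(\S.*)$")


def undefined_in(nm_text):
    """Return the set of symbols marked U (undefined) in one nm listing."""
    names = set()
    for line in nm_text.splitlines():
        match = NM_LINE.match(line.strip())
        if match and match.group(1) == "U":
            names.add(match.group(2))
    return names


def modules_referencing(symbol, listings):
    """listings maps a module name to its nm output; return the referencing modules."""
    return sorted(m for m, text in listings.items() if symbol in undefined_in(text))


if __name__ == "__main__":
    wanted = "ole::compound_document::find_storage(std::string const&)"
    listings = {  # hypothetical modules and listings
        "reader.o": "0000000000000000 T read_all()\n                 U " + wanted + "\n",
        "writer.o": "0000000000000000 T write_all()\n",
    }
    print(modules_referencing(wanted, listings))
```

In practice each listing could be collected with something like subprocess.run(["nm", "-C", path], capture_output=True, text=True).stdout for every object file that went into the shared library.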
Finally: before doing all that, try the low-hanging fruit: make clean, then recompile everything. This has all the hallmarks of a compiler switch: some of the object modules were compiled by a different compiler. If that static library was provided by a third-party vendor as a binary blob, it must've been compiled by a different compiler or a different version of your compiler. C++ does not guarantee binary ABI compatibility.
As mentioned by n. 'pronouns' m., the problem was an inconsistent use of the -D_GLIBCXX_USE_CXX11_ABI flag. I used the C++14 standard, but some projects were compiled with the flag -D_GLIBCXX_USE_CXX11_ABI=1.
It was not easy to find out, because I use a lot of 3rd party libraries with conan package manager.
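The nm output quoted in the question already shows the signature of this mismatch: the same function is defined with the std::__cxx11 spelling of the string type and referenced with the plain std::string spelling. A sketch that flags such pairs in demangled nm output follows; the function name is made up, and it only normalises the std::string case.

```python
import re

NM_LINE = re.compile(r"^(?:[0-9A-Fa-f]+\s+)?([A-Za-z])\s+(\S.*)$")
CXX11_STRING = ("std::__cxx11::basic_string<char, std::char_traits<char>, "
                "std::allocator<char> >")


def abi_mismatches(nm_text):
    """Return signatures that are undefined under one string ABI spelling
    but defined under the other, given demangled nm output."""
    defined, undefined = set(), set()
    for line in nm_text.splitlines():
        match = NM_LINE.match(line.strip())
        if not match:
            continue
        kind, name = match.groups()
        # Collapse both spellings onto one key so the pair lines up.
        key = name.replace(CXX11_STRING, "std::string")
        (undefined if kind == "U" else defined).add(key)
    return sorted(undefined & defined)


if __name__ == "__main__":
    sample = (
        "0000000001841c70 T ole::compound_document::find_storage("
        + CXX11_STRING + " const&)\n"
        "U ole::compound_document::find_storage(std::string const&)\n"
    )
    print(abi_mismatches(sample))
```

Any name this reports is a hint that the defining and the referencing modules were compiled with different _GLIBCXX_USE_CXX11_ABI settings.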

Receiving the error "undefined symbol" when loading C++ dynamic library from C executable

I am trying to write a plugin for a popular program whose code and compilation process I do not have control over. The program is written in C. However, I have written parts of my plugin in C++, since I use the Qt 5 library for graphics capabilities. The functions that the C program calls are written in C.
When the C program tries to load the plugin (shared library), it produces this error:
dlopen('build/libfoo.so') failed: build/libfoo.so: undefined symbol: _ZTV13JoystickPanel
JoystickPanel is a class in the C++ part of the program.
I've tried rewriting parts of the program in C, but the error was unaffected. I know that I could rewrite the entire program in C, but I'd rather not have to switch to another, more C-friendly GUI framework. I've also opened up libfoo.so in a text editor and searched for JoystickPanel, but it appears to be mangled as _ZN13JoystickPanel.
Are there any compiler options or solutions that I'm missing?
I have no idea what _ZN13JoystickPanel means, since it's not apparently a valid mangled C++ name. It should perhaps be _ZN13JoystickPanelE, which would translate to JoystickPanel. That'd be symbol name for sure, but without much meaning anyway. You must have truncated something: I tried quite a bit and just can't generate an object file that would include _ZN13JoystickPanel as the complete symbol. It's just a prefix, there should be a "second half" attached to it - was there?
But _ZTV13JoystickPanel is the vtable for the JoystickPanel class. It's missing because you didn't provide implementations for all the virtual methods of the JoystickPanel class. Most likely, you didn't invoke moc properly, or forgot to compile and link its output.
You need to show a complete build script for your plugin at the very least (the .pro file, or CMakeLists.txt). You'll also need to provide a github link to your project (I presume it's open source).
The symbols you want to find in the compiled output are at least _ZN13JoystickPanelD#Ev (the virtual destructors, where # is a digit) and _ZTV13JoystickPanel (the virtual method table).
Those symbols may be absent when compiled with optimization and/or LTCG, but also absent will be references to them.
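For reading such names without c++filt at hand, here is a small sketch that recognises the compiler-generated "special" symbols by their prefix. The function name is made up, and only simple, non-nested class names are handled.

```python
import re

# Itanium ABI prefixes for compiler-generated per-class data.
SPECIAL = {
    "TV": "vtable for ",
    "TI": "typeinfo for ",
    "TS": "typeinfo name for ",
    "TT": "VTT for ",
}


def describe_special(symbol):
    """Describe _ZTV/_ZTI/_ZTS/_ZTT symbols of a simple class, else return None."""
    match = re.fullmatch(r"_+Z(T[VIST])(\d+)(.*)", symbol)
    if not match:
        return None
    prefix, length, name = match.groups()
    if len(name) != int(length):
        return None  # nested or templated name: out of scope for this sketch
    return SPECIAL[prefix] + name


if __name__ == "__main__":
    # prints "vtable for JoystickPanel"
    print(describe_special("_ZTV13JoystickPanel"))
```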
You may wish to delete the build folder and rebuild your project, just to be sure. qmake is bad at dependency generation for the makefiles it produces, so if you use it, I suggest switching to cmake + ninja.
Apparently, I'd forgotten to put the #include "moc_controller.cpp" line at the bottom of a file that needed it.
For anyone else chasing this issue while using Qt on CMake, consider making sure that the proper lines are added.

How to detect multiply defined symbols

I have a common scenario:
Some source files for an executable which depend on some standard libraries and some user libraries. All the user libraries are statically linked into the executable whereas the standard libraries are linked dynamically to it.
Problem:
I believe that I have multiply defined symbols in my complete package (the executable, which already includes the user library code, plus the shared standard libraries). The linker obviously has insight into it, but as I understand it, the linker won't complain unless it encounters multiple strong named symbols. My fear is that, while I am moving my code from the Solaris 8/SPARC platform to the Solaris 10/SPARC platform, some standard Unix functions have been implemented in the user libraries, which are causing the app to crash at runtime. Note that the app runs fine on the Solaris 8/SPARC platform.
I have been facing weird issues which have led me to believe this might be the source
Modifying one variable from one library is changing the value of another variable in another library
Solaris 8-10: host2ip conversion problems
What I need:
Is there a way to easily list all multiply defined symbols?
Is there a way to easily list all multiply defined symbols stemming from user libraries?
Do you guys think the issue #1 might be caused by linking issues, or you feel it might be a sign of some other issue?
Edit1:
Since then I have learned that when ld generates a map file, the file has a section for multiply defined symbols, which I am going through to find names that look like standard library calls. For people who do not know, the linker will only fail to link if it finds multiple symbols with the same name AND the names are strong names.
You could turn on MAP file generation in the compiler (actually linker) settings and look through the map file for symbols that match the UNIX system functions you are concerned about. You'd probably have to write a script to automate it, but this would be a good starting point. The command line switch is probably -map or something similar, it will depend on which compiler/linker you are using.
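As an alternative to reading the map file by hand, such a script could work from symbol listings. The sketch below assumes one listing per module in the POSIX nm -P layout ("name type value size") and assumes the GNU nm letter conventions (uppercase for global symbols, U for undefined, W and V for weak); the letters printed by other nm implementations may differ. The module names in the demo are hypothetical.

```python
from collections import defaultdict

# Types that do not count as a strong definition (assumed GNU nm letters).
NOT_STRONG = set("UuWwVv")


def multiply_defined(listings):
    """listings maps a module name to its `nm -P` output.

    Returns {symbol: [modules]} for global, non-weak symbols that are
    defined in more than one module.
    """
    definers = defaultdict(set)
    for module, text in listings.items():
        for line in text.splitlines():
            fields = line.split()
            if len(fields) < 2:
                continue
            name, kind = fields[0], fields[1]
            if kind.isupper() and kind not in NOT_STRONG:
                definers[name].add(module)
    return {n: sorted(m) for n, m in definers.items() if len(m) > 1}


if __name__ == "__main__":
    listings = {  # hypothetical modules and listings
        "libuser.a": "gethostbyname T 0000000000000000 0000000000000040\nhelper T 40 10\n",
        "libnsl.so": "gethostbyname T 0000000000012340 0000000000000080\n",
    }
    print(multiply_defined(listings))
```

Filtering the result against a list of standard function names would narrow it down to the user-library clashes asked about.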
The actual problem that was happening is:
The library (let's call it lib1) had an array like below
#define ARRAY_SIZE 1024
SomeStruct* global_array[ARRAY_SIZE];
This array is used by another library of mine (let's call it lib2), which in turn is used by my application, using an extern declaration for it.
While compiling lib2 (or was it the app, I'm not sure), we did not define ARRAY_SIZE at all. This somehow caused the compiler of lib2 (or the app) to miscalculate the size of global_array, in turn causing it to allocate the memory for some other variable at a location that was already allocated to global_array.
After defining ARRAY_SIZE again while compiling my libs and apps, everything behaves normally. I do not fully understand what caused the issue or why this resolves it, since extern declarations of arrays do not contain the size. Also, if the library really used the macro ARRAY_SIZE, why wouldn't the compilation fail? There is also a possibility that the name used for the define is a standard name (the actual string was FD_SETSIZE).
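One plausible mechanism, sketched with ctypes purely as an illustration: if two translation units disagree about the array size, they also disagree about where the storage after the array begins. Modelling the globals as a struct is an assumption made only to get concrete offsets, and the smaller size of 256 is a hypothetical value, not one taken from the original problem.

```python
import ctypes


def layout(array_size):
    """Build a struct with an array of pointers followed by one more variable."""
    class Globals(ctypes.Structure):
        _fields_ = [
            ("global_array", ctypes.c_void_p * array_size),
            ("next_variable", ctypes.c_int),
        ]
    return Globals


def offset_of_next(array_size):
    """Byte offset at which storage after the array starts for this size."""
    return layout(array_size).next_variable.offset


if __name__ == "__main__":
    as_lib1_sees_it = offset_of_next(1024)   # ARRAY_SIZE from the original code
    as_lib2_sees_it = offset_of_next(256)    # hypothetical differing value
    # The code built with the smaller size would place the next variable
    # inside what the other code still treats as part of the array.
    print(as_lib1_sees_it, as_lib2_sees_it)
```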
My initial gut feeling about the linker was wrong.

Can a Visual Studio produced static library, be stripped of symbols?

I'll divide this question into 3 parts:
1. I would like to produce a static library and strip off its symbols. (Debug info is already not included.) Similar to the strip command in Linux. Can it be done?
2. Is there an equivalent tool in the Windows environment to the nm tool in Linux?
3. When creating a static library using VS2008, is it possible to define a script that will exclude some of the produced .obj files from the build and from the static lib? Can it be dynamic? I mean I'd define a compilation mode in the script and this would result in specific object files being excluded from the build.
If anything is visible that you feel should not be, try declaring it with the "static" keyword. This tells the compiler that it is accessible only to the current module.
There are cases where it would be convenient to be able to strip out all but a small number of "exported" public symbols, but it's not really feasible.
A static library is little more than a collection of .obj files. The internal dependencies haven't been resolved yet, and they won't be resolved until link time.
For example, if your .lib consists of foo.obj and bar.obj, and there's a call in foo.obj to a function defined in bar.obj, then that symbol must be available at link time, even if nothing outside of the library should be able to see it.
For that reason, you cannot strip the symbols (with the possible exception of file-scope static symbols). Even class methods that are protected or private (in the C++-sense) will exist in the symbol table, since the enforcement of the visibility is a compile-time issue, not a link-time one.
In contrast, a dynamic library is a standalone binary that has already been linked. References from foo.obj to bar.obj have already been resolved. Thus a DLL can be stripped of symbols except for the ones that must be exported (and even those can be renamed or replaced by ordinals).
If your DLL exposes a simple C API, then you're all set. But if you want to expose a C++ class, you're probably going to end up exporting all of its methods, even the protected and private ones (since inlining in the external application might result in direct calls to private methods).
No. How do you think the users of the static library would link to it without knowing where the symbols they use are defined?
Yes, try the DUMPBIN utility.
Well, yes. You can run the LIB utility with /REMOVE:foo.
That said, I think you are doing something that either is not worth doing or could be done a lot simpler than with removing library members.
I kept finding the names of certain (but not all) static functions in .obj files produced by VS2010. Interestingly, they were visible in my Release .obj files but not the Debug .obj files. I just used cygwin strings to perform the search:
$ strings myObjectFile.obj | grep myStaticFunctionName
I tracked it down to the "Whole Program Optimization = Yes" setting ("/GL"). When I switched this to "No" the function names no longer appear.
Update: As a followup test I opened the "cleansed" myObjectFile.obj in vim and I can still find them (with either :set encoding=utf-8 or :set encoding=latin1). I'm not sure why strings was missing the matches. Oh well.

static link library

I am writing a hello world C++ application, in which the #include <iostream> directive helps the compiler or linker import the C++ library. My cout << "hello world"; uses cout from the library. After compiling, the generated exe is about 96k in size. What instructions are actually contained in this exe file? Does the file also contain the iostream library?
Thanks
In the general case, the linker will only bring in what it needs. Once the compiler phase has turned your source code into an object file, it's treated much the same as all other object files. You have:
the C start-up code which prepares the execution environment (sets up argc, argv and so on) then calls your main or equivalent.
your code itself.
whatever object files need to be dragged in from libraries (dynamic linking is a special case of linking that happens at runtime and I won't cover that here since you asked specifically about static linking).
The linker will include all the object files you explicitly specify (unless it's a particularly smart linker and can tell you're not using the object file).
With libraries, it's a little different. Basically, you start with a list of unresolved symbols (like cout). The linker will search all the object files in all the libraries you specify and, when it finds an object file that satisfies that symbol, it will drag it in and fix up the symbol references.
This may, of course, add even more unresolved symbols if, for example, there was something in the object file that relies on the C printf function (unlikely but possible).
The linker continues like this until all symbols are satisfied (when it gives you an executable) or one cannot be satisfied (when it complains to you bitterly about your coding practices).
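The loop described above can be modelled in a few lines. This is a toy simulation, not how any particular linker is implemented: each object file is reduced to a pair of sets (symbols it defines, symbols it needs), and all file and symbol names in the demo are hypothetical.

```python
def link(objects, library):
    """Simulate static-library resolution.

    objects and library both map a file name to (defined, needed) sets.
    Every file in `objects` is linked unconditionally; files from
    `library` are pulled in only when they satisfy an unresolved symbol.
    Returns the sorted list of files that end up in the executable.
    """
    linked = dict(objects)
    defined = set()
    unresolved = set()
    for provides, needs in linked.values():
        defined |= provides
        unresolved |= needs
    unresolved -= defined

    while unresolved:
        symbol = min(unresolved)  # pick any unresolved symbol
        provider = next(
            (name for name, (provides, _) in library.items()
             if symbol in provides and name not in linked),
            None,
        )
        if provider is None:
            raise LookupError("undefined reference to `%s'" % symbol)
        provides, needs = library[provider]
        linked[provider] = (provides, needs)
        defined |= provides
        # Dragging in a file may add new unresolved symbols of its own.
        unresolved = (unresolved | needs) - defined
    return sorted(linked)


if __name__ == "__main__":
    objects = {"hello.o": ({"main"}, {"cout_write"})}
    library = {
        "ostream.o": ({"cout_write"}, {"printf"}),
        "stdio.o": ({"printf"}, set()),
        "unused.o": ({"scanf"}, set()),
    }
    print(link(objects, library))  # unused.o is never pulled in
```

The granularity point made below falls out of the model: if ostream.o and unused.o were built as a single object file, both would end up in the executable.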
So as to what is in your executable, it may be the entire iostream library or it may just be the minimum required to do what you asked. It will usually depend on how many object files the iostream library was built into.
I've seen code where an entire subsystem went into one object file so that, if you wanted to use just one tiny bit, you still got the lot. Alternatively, you can put every single function into its own object file and the linker will probably create an executable as small as possible.
There are options to the linker which can produce a link map which will show you how things are organised. You probably won't generally see it if you're using the IDE but it'll be buried deep within the compile-time options dialogs under MSVC.
And, in terms of your added comment, the code:
cout << "hello";
will quite possibly bring in sizeable chunks of both the iostream and string processing code.
Use cl /EHsc hello.cpp -link /MAP. The .map file generated will give you a rough idea which pieces of the static library are present in the .exe.
Some of the space is used by C++ startup code, and the portions of the static library that you use.
On Windows, the parts of the libraries that are used are usually also included in the .exe; the situation is different on Linux. However, there are optimization options.
I guess this Wiki link will be useful : Static Libraries