I'm trying to compile a Fortran program (VASP) on Ubuntu 14.04.
It worked on an older system (Ubuntu 13.10), but after reinstalling my machine I get a symbol lookup error when I run it:
/usr/lib/libmpi_f77.so.1: undefined symbol: mpi_fortran_errcodes_ignore__
Strangely, the symbol does seem to be there:
readelf -W -s /usr/lib/libmpi_f77.so.1 | grep "errcodes_ignore"
16: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND mpi_fortran_errcodes_ignore__
142: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND mpi_fortran_errcodes_ignore
244: 0000000000000000 0 OBJECT GLOBAL DEFAULT UND mpi_fortran_errcodes_ignore_
I suspect this may be because the symbol is only in the static part of the library, since when I run
readelf -W -s -D /usr/lib/libmpi_f77.so.1 | grep "errcodes_ignore"
I get no output. But I'm not sure whether this really is the problem, or how to solve it.
Any help would be appreciated.
Thanks,
Martin.
I actually solved this with the help of this question:
symbol lookup error despite the symbol being present in linked library
My LD_LIBRARY_PATH contained a path to a different version of MPI (used by another piece of software I use). Changing the path before running the program resolved the issue.
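If you suspect the same problem, a small C++17 sketch can list which LD_LIBRARY_PATH directories contain a given library file, in the order they are searched. splitSearchPath and findInLibraryPath are hypothetical helpers, and the model deliberately ignores RPATH/RUNPATH, the ld.so cache and the default directories, so running ldd on the program remains the authoritative check.

```cpp
// Sketch: which LD_LIBRARY_PATH directories contain a given library file?
// Simplified model (assumption): ignores DT_RPATH/DT_RUNPATH, /etc/ld.so.cache
// and the default directories, and skips empty path entries.
#include <cassert>
#include <cstdlib>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Split a colon-separated search path into its non-empty components, in order.
std::vector<std::string> splitSearchPath(const std::string& path) {
    std::vector<std::string> dirs;
    std::string::size_type start = 0;
    while (start <= path.size()) {
        std::string::size_type end = path.find(':', start);
        if (end == std::string::npos) end = path.size();
        if (end > start) dirs.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return dirs;
}

// Return every LD_LIBRARY_PATH directory (in search order) that contains a
// file named `libName`. A hit from another installation can shadow the copy
// the program was built against.
std::vector<std::string> findInLibraryPath(const std::string& libName) {
    assert(!libName.empty());
    std::vector<std::string> hits;
    const char* env = std::getenv("LD_LIBRARY_PATH");
    if (env == nullptr) return hits;
    for (const std::string& dir : splitSearchPath(env)) {
        std::error_code ec;  // non-throwing overload: unreadable dirs count as "not found"
        if (std::filesystem::exists(std::filesystem::path(dir) / libName, ec))
            hits.push_back(dir);
    }
    return hits;
}
```

Pass it the file name of the MPI library that ldd lists for the program; if a directory from the other MPI installation comes first, that copy is likely the one being loaded.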
I am trying to link against a shared library I have no control over. This library has an undefined symbol (nm output):
U __aarch64_swp1_acq_rel
It seems to be defined in libgcc.a:
[user@fedora ~]$ nm -a /usr/lib/gcc/aarch64-redhat-linux/12/libgcc.a | grep swp1_acq_rel
0000000000000000 T __aarch64_swp1_acq_rel
[user@fedora ~]$ objdump -t /usr/lib/gcc/aarch64-redhat-linux/12/libgcc.a | grep swp1_acq_rel
0000000000000000 g F .text 000000000000002c .hidden __aarch64_swp1_acq_rel
But whenever I try to link, I get the error in the title of this question. I understand that this symbol is hidden for dynamic linking (please correct me if I am wrong). So my question is: what is the right approach to link against this libgcc symbol when the shared library I am linking against does not define it?
I expected to be able to resolve this symbol with the libgcc.a on my system. Why is it hidden?
Compiling with -mno-outline-atomics in CFLAGS solved my problem.
Since GCC 10, GCC on AArch64 implements atomic operations by default with calls to out-of-line helper functions (such as __aarch64_swp1_acq_rel) provided by libgcc.
You can view the compiled code to see the differences:
https://godbolt.org/z/z8W7z1cqx
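A minimal C++ sketch of the kind of source that can produce such a call: a 1-byte atomic exchange with acquire-release ordering. swapFlag is a hypothetical name, and whether this exact function compiles into a call to __aarch64_swp1_acq_rel is an assumption that depends on the GCC version and flags; you can check by compiling for AArch64 with g++ -O2 -S, with and without -mno-outline-atomics, and comparing the assembly.

```cpp
// A 1-byte atomic exchange with acquire-release ordering. With GCC >= 10
// targeting AArch64 and outline atomics enabled, an operation like this may
// become a call to the libgcc helper __aarch64_swp1_acq_rel ("swp", 1 byte,
// acq_rel) instead of inline instructions. Other targets inline it.
#include <atomic>
#include <cassert>

// Store `value` into `flag` and return the previous value.
unsigned char swapFlag(std::atomic<unsigned char>& flag, unsigned char value) {
    return flag.exchange(value, std::memory_order_acq_rel);
}
```

The helper names encode the operation, the operand size in bytes and the memory order, which is why an undefined reference like __aarch64_swp1_acq_rel points straight at code of this shape.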
When I try to go to the definition of cout, I land in the iostream header, where it is declared as:
extern _CRTDATA2 ostream cout;
So where is it defined? extern is just a declaration, not a definition.
Global symbols like cout are defined in a run-time library that you link with your application. For example, with gcc you pass the option -lstdc++ (g++ adds it automatically), which links your application against the libstdc++ library (libstdc++.so, or libstdc++.a when linking statically). That is where all such symbols reside.
The details are specific to your compiler and run-time library version. Microsoft Visual C++ may behave differently, but the idea is the same: the symbols live inside the precompiled libraries that are delivered with your C++ compiler.
With the GNU toolchain you can type:
nm -g libstdc++.a
to see the symbols inside the library. The output may look like this (among lots of other lines):
ios_init.o:
U _ZSt3cin
globals_io.o:
0000000000000000 D _ZSt3cin
0000000000000000 D _ZSt4cerr
0000000000000000 D _ZSt4clog
0000000000000000 D _ZSt4cout
0000000000000000 D _ZSt4wcin
0000000000000000 D _ZSt5wcerr
0000000000000000 D _ZSt5wclog
0000000000000000 D _ZSt5wcout
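The same declaration/definition split can be sketched in a few lines of C++, using a hypothetical global named requestCount in place of std::cout. In a real project the extern line sits in a header and the definition in exactly one .cpp file; here both are in one file so the sketch compiles on its own.

```cpp
// Sketch of the declaration/definition split behind std::cout, using a
// hypothetical global `requestCount`. Normally the extern line lives in a
// header and the definition in exactly one .cpp file; for std::cout that
// file is part of libstdc++ (globals_io.o in the listing above).
#include <cassert>

// Declaration: tells the compiler the object exists somewhere. No storage.
extern int requestCount;

// Definition: allocates the object. Exactly one translation unit may do this.
int requestCount = 0;

void countRequest() {
    ++requestCount;
}
```

A translation unit that sees only the extern line and uses requestCount gets a U entry for it in nm -g, just as ios_init.o lists U _ZSt3cin; the object holding the definition lists it as a defined data symbol (B for a zero-initialized object like this one).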
I'm looking for suggestions on how to debug a significant problem that I cannot reduce to a minimal example.
The problem: I compile my application, which links against a number of different libraries. The flags include:
-static-libstdc++ -static-libgcc -pipe -std=c++1z -fno-PIC -flto=10 -m64 -O3 -flto=10 -fuse-linker-plugin -fuse-ld=gold -UNDEBUG -lrt -ldl
The compiler is gcc-7.3.0, compiled against binutils-2.30.
Boost is compiled with the same flags as the rest of the program, and linked statically.
When the program is linked, I get various "relocation refers to discarded section" warnings, both in my own code and in Boost.
For instance:
/tmp/ccq2Ddku.ltrans13.ltrans.o:<artificial>:function boost::system::(anonymous namespace)::generic_error_category::message(int) const: warning: relocation refers to discarded section
Then, when I run the program, it segfaults during destruction at exit, with this backtrace:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff7345a49 in __run_exit_handlers () from /lib64/libc.so.6
#2 0x00007ffff7345a95 in exit () from /lib64/libc.so.6
#3 0x00007ffff732eb3c in __libc_start_main () from /lib64/libc.so.6
#4 0x000000000049b3e3 in _start ()
The function pointer being called is 0x0 (a null pointer).
If I stop using -static-libstdc++, the linker warnings and the runtime segfault go away.
If I change from -std=c++1z to -std=c++14, the linker warnings and the runtime segfault go away.
If I remove -flto, the linker warnings and the runtime segfault go away.
If I add -g to the compile flags, the linker warnings and the runtime segfault go away.
I have tried asking gold for extra debugging output by specifying -Wl,--debug=all, but it doesn't seem to tell me anything relevant.
If I take a small section of the code that appears relevant and compile and link it separately against the same Boost libraries (i.e. try to produce a minimal example), there are no linker warnings and the program runs to completion without issues.
Help! What can I do to narrow the problem down?
This warning is usually indicative of an inconsistency in the contents of a COMDAT group between two compilation units. If the compiler emits a COMDAT group G with symbol A defined in one compilation unit, but emits the same group G with symbols A and B defined in a second compilation unit, the linker will keep group G from the first compilation unit and discard group G from the second. Any references to symbol B from outside the group in the second compilation unit will produce this warning.
The cause is usually a bug in the compiler, and using -flto makes it that much harder to diagnose. In this case, your second compilation unit is the result of link-time optimization (the *.ltrans.o file name). With LTO, it's quite believable that many of the changes you've mentioned will make the problem go away.
The very latest version of gold on the master branch of the binutils git repo has a new [-Wl,]--debug=plugin option, which will save a log and all the temporary .ltrans.o files. Having the log and those files, along with all the original input files (which you can get a list of by adding the [-Wl,]-t option), should help isolate the problem better.
The latest version of gold will also print the symbol referenced by the relocation. For a local symbol, it will show the symbol index; use readelf -s to get more info about the symbol. For a global symbol, it will show the name; you can add the --no-demangle option for the exact name.
If it's a local symbol, the problem is almost certainly the compiler. References from outside a comdat group to a local symbol in the group are strictly forbidden.
If it's a global symbol, it could be either a compiler problem or a one-definition rule (ODR) violation in your sources. You'll need to identify the comdat group in the named object file, find its key symbol, then find the object file that provided the definition kept by the linker (the -y option will help), and compare the symbols defined in those groups by the two objects. These steps should help:
(1) Starting from the error message:
b.o(.data+0x0): warning: relocation refers to symbol "two" defined in discarded section
(2) Look for symbol "two" in b.o:
$ readelf -sW b.o | grep two
7: 0000000000000008 0 NOTYPE WEAK DEFAULT 6 two
The next-to-last field ("6") is the section number where "two" is defined.
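Because readelf -sW prints fixed columns, a small C++ sketch can pick out the Ndx field. parseSymbolLine and isDefined are hypothetical helpers, and the parser assumes the usual eight whitespace-separated columns (Num: Value Size Type Bind Vis Ndx Name); lines with extra annotations or unnamed section symbols are not handled.

```cpp
// Sketch: split one symbol line from `readelf -sW` into its columns.
// Assumes the usual 8-column layout; anything else is rejected.
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

struct SymbolLine {
    std::string num, value, size, type, bind, vis, ndx, name;
};

std::optional<SymbolLine> parseSymbolLine(const std::string& line) {
    std::istringstream in(line);
    SymbolLine s;
    if (!(in >> s.num >> s.value >> s.size >> s.type >> s.bind >> s.vis >> s.ndx >> s.name))
        return std::nullopt;
    return s;
}

// Ndx "UND" means this object only references the symbol; anything else
// (a section number, ABS, COM) means this object provides it.
bool isDefined(const SymbolLine& s) {
    return s.ndx != "UND";
}
```

Applied to the readelf lines from the first question above, isDefined returns false: Ndx is UND there, meaning libmpi_f77.so.1 only references mpi_fortran_errcodes_ignore__ and relies on another loaded library to define it.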
(3) Verify that section 6 is in fact a comdat group:
$ readelf -SW b.o
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 6] .one              PROGBITS        0000000000000000 000058 000018 00 WAG  0   0  1
The "G" in the sh_flags field ("Flg") indicates the section belongs to a comdat group.
(4) Find the comdat group containing the section:
$ readelf -g b.o
COMDAT group section [ 1] `.group' [one] contains 1 sections:
[Index] Name
[ 6] .one
This shows us that section 6 is a member of group section 1.
(5) Find the key symbol for that group:
$ readelf -SW b.o
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 1] .group            GROUP           0000000000000000 000040 000008 04      7   8  4
The sh_info field ("Inf") tells us the key symbol is symbol #8, which is "one". (That should match the name shown in brackets in step 4.)
$ readelf -sW b.o
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     8: 0000000000000000     0 NOTYPE  WEAK   DEFAULT    6 one
(6) Now you can add the -y one option to your link to find which objects provided a definition of "one":
$ gcc -Wl,-y,one ...
a.o: definition of one
b.o: definition of one
The first one listed (a.o) is the one that gold keeps; it will discard all subsequent comdat groups with the same key symbol.
If you use the same techniques to examine the comdat group that defines "one" in a.o, and compare the symbols that belong to that group with those that belong to the group in b.o, that should give you more clues.
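The selection rule described in this answer can be sketched as a toy model in C++. This is an illustration under simplifying assumptions, not how gold is implemented: ComdatGroup, Reference and danglingReferences are hypothetical names, only global symbols and references from outside the group are modeled, and a symbol counts as resolved if the kept copy of the same group defines it.

```cpp
// Toy model of COMDAT group selection: the first group seen for a key symbol
// is kept, later groups with the same key are discarded, and a reference to
// a symbol defined only in a discarded copy produces the warning.
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

struct ComdatGroup {
    std::string object;             // object file, e.g. "a.o"
    std::string key;                // key (signature) symbol of the group
    std::set<std::string> symbols;  // symbols this copy of the group defines
};

struct Reference {
    std::string object;  // object file containing the relocation
    std::string symbol;  // symbol the relocation refers to
};

std::vector<Reference> danglingReferences(const std::vector<ComdatGroup>& groups,
                                          const std::vector<Reference>& refs) {
    // Key symbol -> index of the first group with that key (the kept copy).
    std::map<std::string, std::size_t> kept;
    for (std::size_t i = 0; i < groups.size(); ++i)
        kept.emplace(groups[i].key, i);  // no-op if the key is already present

    std::vector<Reference> dangling;
    for (const Reference& ref : refs) {
        for (std::size_t i = 0; i < groups.size(); ++i) {
            const ComdatGroup& g = groups[i];
            const ComdatGroup& winner = groups[kept.at(g.key)];
            bool discarded = kept.at(g.key) != i;
            // Symbol defined by this object's discarded copy, missing from the kept one.
            if (discarded && g.object == ref.object &&
                g.symbols.count(ref.symbol) != 0 &&
                winner.symbols.count(ref.symbol) == 0) {
                dangling.push_back(ref);
            }
        }
    }
    return dangling;
}
```

Feeding it the groups {a.o: one} and {b.o: one, two} plus references from b.o to one and to two reports only the reference to two, matching the a.o/b.o example walked through above.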
I have built SpiderMonkey (64-bit) on Windows. The project provides an MSVC++ toolchain, and I couldn't build it with MinGW.
It is a DLL, and I need to convert its import library to the GNU C++ format (.lib to .a).
After searching the web, I found roughly how to do this:
gendef mozjs-45.dll
dlltool --as-flags=--64 -m i386:x86-64 -k --output-lib mozjs-45.a --input-def mozjs-45.def
I use TDM-GCC-64 under Code::Blocks. At link time I get errors like:
undefined reference to `__imp__Z13JS_GetPrivateP8JSObject'
I have checked the lib content using:
nm libmozjs-45.a > libmozjs-45.nm
I see the same entries that the .def file exports, but they look different from what the linker expects (I presume):
?JS_GetPrivate@@YAPEAXPEAVJSObject@@@Z
Edit 1
I have managed to build SpiderMonkey with mingw-w64. Now, at link time I get the following error:
undefined reference to `__imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObject'
Looking at the library with nm, I see:
d000536.o:
0000000000000000 i .idata$4
0000000000000000 i .idata$5
0000000000000000 i .idata$6
0000000000000000 i .idata$7
0000000000000000 t .text
0000000000000000 I __imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObjectON7mozilla6detail19GuardObjectNotifierE
U _head_mozjs_45_dll
0000000000000000 T _ZN17JSAutoCompartmentC1EP9JSContextP8JSObjectON7mozilla6detail19GuardObjectNotifierE
Indeed, the definition of the class is:
class MOZ_RAII JS_PUBLIC_API(JSAutoCompartment)
{
JSContext* cx_;
JSCompartment* oldCompartment_;
public:
JSAutoCompartment(JSContext* cx, JSObject* target
MOZ_GUARD_OBJECT_NOTIFIER_PARAM);
JSAutoCompartment(JSContext* cx, JSScript* target
MOZ_GUARD_OBJECT_NOTIFIER_PARAM);
~JSAutoCompartment();
MOZ_DECL_USE_GUARD_OBJECT_NOTIFIER
};
Why does the same compiler export this as __imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObjectON7mozilla6detail19GuardObjectNotifierE but, when referencing it, expect __imp__ZN17JSAutoCompartmentC1EP9JSContextP8JSObject?
Answer: I was missing a preprocessor definition that excludes MOZ_GUARD_OBJECT_NOTIFIER_PARAM from the build, so the library and my code were compiled with different constructor signatures.
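A hypothetical C++ reduction of that answer: GUARD_CHECKS, GUARD_PARAM, GuardNotifier and AutoScope are made-up stand-ins, not SpiderMonkey's real macros or classes. The point is that a macro which adds a parameter changes the constructor's signature, and therefore its mangled name, so the DLL and the code using it must be compiled with the same setting.

```cpp
// A configuration macro (the made-up GUARD_CHECKS) adds an extra constructor
// parameter. Library and client must agree on it, because the parameter
// list is part of the mangled symbol name.
#include <cassert>

struct Context { int id = 0; };
struct Object  { int id = 0; };

#ifdef GUARD_CHECKS
struct GuardNotifier {};
#define GUARD_PARAM , [[maybe_unused]] GuardNotifier&& notifier = GuardNotifier()
#else
#define GUARD_PARAM
#endif

class AutoScope {
public:
    // With GUARD_CHECKS:    AutoScope(Context*, Object*, GuardNotifier&&)
    // Without GUARD_CHECKS: AutoScope(Context*, Object*)
    AutoScope(Context* cx, Object* target GUARD_PARAM)
        : cx_(cx), target_(target) {}

    Context* context() const { return cx_; }
    Object* target() const { return target_; }

private:
    Context* cx_;
    Object* target_;
};
```

Under the Itanium mangling that MinGW-w64 GCC uses, the complete-object constructor should come out as _ZN9AutoScopeC1EP7ContextP6Object without GUARD_CHECKS and as _ZN9AutoScopeC1EP7ContextP6ObjectO13GuardNotifier with it: the same kind of mismatch as the two JSAutoCompartment names in the question.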
$ nm --demangle /usr/bin/../lib/gcc/x86_64-linux-gnu/4.9/libsupc++.a | grep "__cxxabiv1::__class_type_info::~__class_type_info"
gives the following output:
0000000000000000 T __cxxabiv1::__class_type_info::~__class_type_info()
0000000000000000 T __cxxabiv1::__class_type_info::~__class_type_info()
0000000000000000 T __cxxabiv1::__class_type_info::~__class_type_info()
U __cxxabiv1::__class_type_info::~__class_type_info()
U __cxxabiv1::__class_type_info::~__class_type_info()
So, how should I interpret this output?
There are multiple definitions of the symbol (three T's); how can that be? Why did the linker produce a library that violates the ODR? What is the purpose? And why do all of them have the same (and strange) address, 0000000000000000?
How can the same symbol be both defined (T) and undefined (U) at the same time?
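A static library (archive file, .a) is essentially a collection of individual .o files (plus some indexing information so the linker can find the .o files it needs). The U entries belong to other objects in the archive that use the destructor, not to the object that defines it. If you look at the full output of nm, this becomes clear (or use the -o flag to nm). As for the addresses: in an unlinked .o file a symbol's value is its offset within its section, not a final address, so 0000000000000000 just means each function starts at the beginning of its section.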
The reason you see multiple definitions is that demangling isn't a one-to-one mapping. In my copy of libsupc++, those three definitions are:
0000000000000000 T _ZN10__cxxabiv117__class_type_infoD0Ev
0000000000000000 T _ZN10__cxxabiv117__class_type_infoD1Ev
0000000000000000 T _ZN10__cxxabiv117__class_type_infoD2Ev
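Why are there several symbols that all demangle to the destructor? They're destructors for different situations: D0 is the deleting destructor (it destroys the object and then frees its storage), D1 is the complete object destructor, and D2 is the base object destructor (used when the object is a base-class subobject). gcc uses the Itanium ABI for C++, whose name-mangling rules are described in the Itanium C++ ABI specification.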
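To see the many-to-one mapping directly, a small C++ sketch can run the three names through abi::__cxa_demangle from <cxxabi.h>, a runtime function provided with GCC and Clang (not standard C++; other toolchains may lack it). demangle and structorKind are hypothetical helpers; structorKind only spells out the ctor/dtor codes from the Itanium ABI.

```cpp
// Demangle Itanium-ABI names and name the ctor/dtor variant codes.
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees the buffer that __cxa_demangle allocates with malloc.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Return the demangled form of `mangled`, or `mangled` itself if it is not
// a valid mangled name.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> out(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    return (status == 0 && out) ? std::string(out.get()) : mangled;
}

// Meaning of the constructor/destructor codes in the Itanium C++ ABI
// (partial table: inheriting-constructor forms are omitted).
const char* structorKind(const std::string& code) {
    if (code == "C1") return "complete object constructor";
    if (code == "C2") return "base object constructor";
    if (code == "C3") return "complete object allocating constructor";
    if (code == "D0") return "deleting destructor";
    if (code == "D1") return "complete object destructor";
    if (code == "D2") return "base object destructor";
    return "unknown";
}
```

demangle returns the same string for the D0, D1 and D2 names, which is exactly why nm --demangle shows three identical lines; the c++filt tool does the same demangling from the command line.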