LD_BIND_NOW: Symbol lookup error but executable still running - c++

I am trying to diagnose linker/runtime errors using setenv LD_BIND_NOW TRUE. When I run the executable with this option enabled, I get the error
lib/libmkl_intel_thread.so: error: symbol lookup error: undefined symbol: DftiFreeDescriptor (fatal)
However, if I then remove the LD_BIND_NOW environment variable, the program executes just fine (until termination, whereupon it reports a memory corruption, though that might be unrelated).
So I am a bit confused: How does the program execute when it has a symbol lookup error? I thought it would have to terminate as the program is written in C++, not Java. (See here for reference.)
Also, does this error imply that my rpath is set incorrectly, or has the MKL .so been built improperly? Is there a fix that can be achieved in bounded time?

First, I thought you needed LD_BIND_NOW=1 rather than TRUE, though TRUE may be a synonym.
Second, your application would not have linked had there been an unresolved symbol at build time. Is it possible that a shared library has been updated since then, so that one of the libraries you use now depends in turn on a library with an unresolved symbol? Or that the application is picking up a different library from the one it was linked against?

Related

What / where is __scrt_common_main_seh?

A third party library in my program is trying to call __scrt_common_main_seh by way of the Microsoft library msvcrt.lib, but is defined by some unknown library and therefore gives a linker error. I don't know what this function is supposed to do or where it is defined.
I looked online for this function, but did not find any clues except for general descriptions of what linker errors are.
I believe it might be doing some setup for win32 GUI applications. The library which defines it might be configured as project dependency by Visual Studio but my project is using Bazel.
Summary
For non-console applications that fail with error LNK2019: unresolved external symbol main referenced in function "int __cdecl __scrt_common_main_seh(void)", try adding the linker flag /ENTRY:wWinMainCRTStartup or /ENTRY:WinMainCRTStartup.
For console applications having that error, make sure to implement a main() function.
Details
This answer shows that __scrt_common_main_seh is normally called during mainCRTStartup which is the default entry point for windows console applications. __scrt_common_main_seh is then (indirectly) responsible for calling main().
My program did not have a main() function, which might have prevented the compiler from generating __scrt_common_main_seh (Just speculating. I am totally clueless about who defines __scrt_common_main_seh)
I did find, however, that the library I was linking against defined a wWinMain() function. So I tried adding the linker flag /ENTRY:wWinMainCRTStartup and the linker error went away.

OS X equivalent of --unresolved-symbols=ignore-in-object-files

On Linux (CentOS) I have occasionally used -Wl,--unresolved-symbols=ignore-in-object-files when building a test application that only depends on parts of some object files even though the full dependency would require a lot more object files to be included. The point is that I know by design any unresolved symbols are never encountered when running the test application (otherwise it should just crash).
On OS X, I found similar options -Wl,-undefined,suppress (or warning, dynamic_lookup),-flat_namespace which allowed me to build the binary, but it failed at run time complaining about dyld: Symbol not found: ... even though the missing symbols are never used during the run (the same application runs perfectly fine on CentOS).
Is there something else to force the application to run (till it crashes if ever an unresolved symbol is encountered) like on Linux?

Symbol lookup error at runtime instead of load time

I have an application which uses a class Foo from an .so shared library. I've come across a problem where at runtime it prints
<appname>: symbol lookup error: <appname>: undefined symbol: <mangled_Foo_symbol_name>
Now, it turned out that the unmangled symbol was for the constructor of the class Foo, and the problem was simply that an old version of the library was loaded, which didn't contain Foo yet.
My question isn't about resolving the error (that's obviously to use the correct library), but why it appears at runtime instead of at time of load / startup.
The line of code causing the error just instantiates an object of class Foo, so I'm not using anything like dlopen here, at least not explicitly / to my knowledge.
In contrast, if I remove the whole library from the load search path, I get this error at startup:
<appname>: error while loading shared libraries: libname.so.2: cannot open shared object file: No such file or directory
When the wrong version of gcc / libstdc++ is on the load path, an error also appears at startup:
<appname>: /path/to/gcc-4.8.0/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by <appname>)
This "fail fast" behavior is much more desirable. I don't want to run my application for quite a while before I finally realize it's using the wrong library.
What causes the load error to appear at runtime and how can I make it appear immediately?
From the man page of ld.so:
ENVIRONMENT
LD_BIND_NOW (libc5; glibc since 2.1.1) If set to a nonempty string, causes the dynamic linker to resolve all symbols at program startup instead of deferring function call resolution to the point when they are first referenced. This is useful when using a debugger.
LD_WARN (ELF only)(glibc since 2.1.3) If set to a nonempty string, warn about unresolved symbols.
I don't think you can statically link a .so library. If you want to avoid load-time and run-time errors, you have to use static libraries (.a) throughout. If you have neither a static version of the library nor its source, try to find a statifier. After googling I found a few statifiers, but I do not know how they work, so I am leaving that part up to you.

Getting "error LNK2001: unresolved external symbol _gnutls_free" when using GnuTLS 3.1.6 from Visual Studio 2012

I am attempting to build a project in Visual Studio 2012 that uses GnuTLS. I downloaded the latest official Windows build from the website, and created a link library by running lib /def:libgnutls-28.def in the bin directory from a Visual Studio command prompt.
After adding a typedef long ssize_t, the code compiles fine, but linking fails with the following error:
source_file.obj : error LNK2001: unresolved external symbol _gnutls_free
C:\Path\to\executable.exe : fatal error LNK1120: 1 unresolved externals
I am calling gnutls_free to free some memory allocated and returned by the library. If I remove the call to gnutls_free, the project links successfully. Given that gnutls_free is just a global variable (containing a function pointer) exported by the library, I'm not sure why accessing it results in an unresolved reference to a different symbol. I have verified that gnutls_free is not #defined to anything.
As a test, I tried doing gnutls_free_function test = gnutls_free; which also resulted in the link error. Running grep -w -r _gnutls_free . on the GnuTLS source code returns nothing, so I am at a loss.
Any ideas for getting this working would be greatly appreciated.
EDIT:
Adding __declspec(dllimport) to the declaration of gnutls_free in gnutls.h allows the link to succeed. Is there any way to accomplish this without maintaining a custom version of the header file?
There doesn't seem to be a way to have the linker or import library automatically dereference the IAT's pointer to the data item the same way that is done for functions (via a small trampoline function that is statically linked into the module importing the function). The __declspec(dllimport) attribute tells the compiler that this dereferencing needs to be done, so it can insert code to dereference the IAT pointer implicitly. This allows exported data to be accessed, and for functions it allows the compiler to call the imported function via an indirect call through the IAT pointer rather than by calling the trampoline function.
See a couple of Raymond Chen's articles about dllimport for a good explanation of what goes on for function calls (he didn't discuss importing data, unfortunately):
Calling an imported function, the naive way
How a less naive compiler calls an imported function
The MS linker or import library doesn't have a mechanism to help the compiler get imported data in a 'naive' way: the compiler needs the __declspec(dllimport) hint that an extra dereference through the IAT is needed. Anyway, the point of all this is that it seems there's no way to import data except by using the __declspec(dllimport) attribute.
If you want to avoid modifying the gnutls distribution (which I can understand), here's one rather imperfect workaround:
You can create a small object file that contains nothing but a simple wrapper for gnutls_free(); since gnutls_free() has an interface with no real dependencies, you can have the necessary declarations 'hardcoded' instead of including gnutls.h:
typedef void (*gnutls_free_function) (void *);
__declspec(dllimport) extern gnutls_free_function gnutls_free;

void xgnutls_free(void* p)
{
    gnutls_free(p);
}
Have your code call xgnutls_free() instead of gnutls_free().
Not a great solution - it requires your code to call a wrapper (so it's particularly not great if you'll be incorporating 3rd party code that might depend on gnutls_free()), but it might be good enough.

Strange symbol lookup error in libstdc++

Trying to track down a segfault somewhere in MPI, I got this error:
./mpitest: symbol lookup error: /usr/lib64/libstdc++.so.6: bàþ;# BC_
-------------------------------------------------------------------
mpirun has exited due to process rank 2 with PID 8729 on ...
First, I'm used to getting lookup errors when the process is loaded if the library path is wrong. But those all happen before the process starts executing. This happened in the middle of the output from the test. Shouldn't all symbols be resolved by the runtime loader before the process starts?
Second, that symbol looks like garbage. It's certainly not a normal mangled C++ symbol.
Is it possible for memory corruptions (since I am tracking a segfault, it's likely there's something like that going on) to corrupt symbols like this?
This was compiled with icpc 12.0.3 20110309 on a Linux 2.6.18-194.32.1.el5 x86_64 machine.
OpenMPI loads plugins as dynamic shared objects at runtime when MPI_INIT is called. See this FAQ. Therefore symbol lookup happens at that time. So it looks to me that your OpenMPI's libmpi_cxx.so was built against a different libstdc++ than the one that is available or found at runtime on the system.
You can either rebuild OpenMPI, or if the correct libstdc++ is somewhere on your system (not /usr/lib64/libstdc++.so.6), you can adjust your LD_LIBRARY_PATH. Also, try setting LD_DEBUG=files to see if you are in fact loading two different libstdc++'s.