C++ shared library (.so) changed at runtime

I have a program which is to be used with a shared library.
I have one library (1) that is compiled with the method bodies fully implemented, i.e.
class A
{
public:
    int* funcA(int a, int b)
    {
        int* pInt = new int;
        *pInt = a + (b * 20);
        return pInt;
    }
};
Then I have another library (2) with exactly the same name and interface but with empty method bodies, i.e. a dummy class:
class A
{
public:
    int* funcA(int a, int b)
    {
        return 0;
    }
};
(note: code just used to illustrate my problem)
If I compile against library 1 and then use library 1 at runtime, everything works as expected.
If I compile against library 2 and then use library 1 at runtime, the first call to funcA dies.
If I use nm -D libMy.so and look at the offset of funcA, it differs between the two libraries. Is this offset baked into the binary?
I've read various manuals and tutorials but am none the wiser as to how the compile-time and runtime aspects cause this failure. I would have thought that since the interface is the same, the method call would succeed.
Thanks.

The reason this is failing is that you have linked against a different library, and thus (as you have seen) the function offsets are different. The linker has placed those offsets into your compiled binary, so it will only run against that library. To accomplish what you are attempting here, you will need to use dynamic library loading; see this SO question for more info.
EDIT:
With a little further reading, I came across this PDF, you may find it helpful.
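To make that concrete, here is a minimal runtime-loading sketch (the library path, symbol name, and the extern "C" factory are assumptions, since member functions like funcA cannot be looked up with dlsym directly):
#include <dlfcn.h>
#include <cstdio>

// Assumed to exist in the library, e.g.:
//   extern "C" A* createA() { return new A; }
typedef void* (*createA_t)();

int main()
{
    void* handle = dlopen("./libMy.so", RTLD_LAZY);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    createA_t createA = (createA_t)dlsym(handle, "createA");
    if (!createA) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }
    void* a = createA(); // use via an interface declared in a shared header
    (void)a;
    dlclose(handle);
    return 0;
}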

(I don't have enough rep to just make a comment below your question)
This might be because the program is prelinked (Linux) or prebound (macOS), although I am not 100% sure. Some basic info about it on Wikipedia below. Have you encountered this in your searches through the manuals?
http://en.wikipedia.org/wiki/Prelink
http://en.wikipedia.org/wiki/Prebinding

Did you forget the -fPIC option while compiling the libraries? Please add your compilation commands.

Related

C++ function instrumentation via clang++'s -finstrument-functions: how to ignore internal std library calls?

Let's say I have a function like:
template<typename It, typename Cmp>
void mysort( It begin, It end, Cmp cmp )
{
    std::sort( begin, end, cmp );
}
When I compile this using -finstrument-functions-after-inlining with clang++ (clang++ --version reports):
clang version 11.0.0 (...)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: ...
The instrumentation explodes the execution time, because my entry and exit functions are called for every call of
void std::__introsort_loop<...>(...)
void std::__move_median_to_first<...>(...)
I'm sorting a really big array, so my program doesn't finish: without instrumentation it takes around 10 seconds, with instrumentation I've cancelled it at 10 minutes.
I've tried adding __attribute__((no_instrument_function)) to mysort (and the function that calls mysort), but this doesn't seem to have an effect as far as these standard library calls are concerned.
Does anyone know if it is possible to ignore function instrumentation for the internals of a standard library function like std::sort? Ideally, I would only have mysort instrumented, so a single entry and a single exit!
I see that clang++ sadly does not yet support anything like -finstrument-functions-exclude-function-list or -finstrument-functions-exclude-file-list, but g++ does not yet support -finstrument-functions-after-inlining, which I would ideally have, so I'm stuck!
EDIT: After playing more, it would appear the effect on execution time is actually less than that described, so this isn't the end of the world. The problem still remains, however, because most people doing function instrumentation in clang will only care about the application code, and not the functions linked in from (for example) the standard library.
EDIT2: To further highlight the problem now that I've got it running in a reasonable time frame: the resulting trace that I produce from the instrumented code with those two standard library functions is 15GB. When I hard code my tracing to ignore the two function addresses, the resulting trace is 3.7MB!
I've run into the same problem. It looks like support for these flags was once proposed, but never merged into the main branch.
https://reviews.llvm.org/D37622
This is not a direct answer, since the tool doesn't support what you want to do, but I think I have a decent work-around. What I wound up doing was creating a "skip list" of sorts. In the instrumentation hooks (__cyg_profile_func_enter and __cyg_profile_func_exit), I would guess the part contributing most to your execution time is the printing. If you can come up with a way of short-circuiting the profile functions, that should help, even if it's not ideal. At the very least it will limit the size of the output file.
Something like
#include <stdint.h>
#include <stddef.h>

// addresses of functions to skip (assuming 64-bit addresses),
// taken from the binary's symbol table
uintptr_t skipAddrs[] = {
    0x123456789abcdef, 0x2468ace2468ace24
};
size_t arrSize = 0;

int main(void)
{
    ...
    arrSize = sizeof(skipAddrs) / sizeof(skipAddrs[0]);
    // https://stackoverflow.com/a/37539/12940429
    ...
}

void __cyg_profile_func_enter(void *this_fn, void *call_site) {
    for (size_t idx = 0; idx < arrSize; idx++) {
        if ((uintptr_t) this_fn == skipAddrs[idx]) {
            return; // short-circuit: skip tracing for this function
        }
    }
    // ... normal tracing/printing goes here ...
}
I use something like objdump -t binaryFile to examine the symbol table and find what the addresses are for each function.
If you specifically want to ignore library calls, something that might work is examining the symbol table of your object file(s) before linking against libraries, then ignoring all the ones that appear new in the final binary.
All this should be possible with things like grep, awk, or python.
You have to add the attribute __attribute__((no_instrument_function)) to the functions that should not be instrumented. Unfortunately it is not easy to make this work with C/C++ standard library functions, because it would require adding the attribute to every standard library function.
There are some hacks you can do, like redefining existing macros from include/__config to add this attribute as well, e.g.:
-D_LIBCPP_INLINE_VISIBILITY=__attribute__((no_instrument_function,internal_linkage))
Make sure to append no_instrument_function to the existing macro definition to avoid unexpected errors.
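For reference, a minimal sketch of applying the attribute; note that the profiling hooks themselves should also carry it (or live in a file compiled without -finstrument-functions), otherwise they would be instrumented recursively:
extern "C" {

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site)
{
    /* tracing/printing */
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site)
{
    /* tracing/printing */
}

} // extern "C"

// A user function marked with the attribute gets no entry/exit calls:
__attribute__((no_instrument_function))
static void helper() { }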

Program crashing with embedded Python/C++ code across DLL boundary in Windows

Sorry for the long post. I've searched around quite a bit and couldn't find an answer for this so here it goes:
I am developing a Python extension library using C++ (BoostPython). For testing, we have a Python-based test harness, but I also want to add a separate C++ executable (e.g. using BoostUnitTest or similar) to add further testing of the library, including testing of functionality that is not directly exposed to Python.
I am currently running this in Linux without problems. I am building the library and this then is dynamically linked to an executable that uses BoostUnitTest. Everything compiles and runs as expected.
In Windows, though, I'm having problems. I think it might be a problem with the registration of the C++-to-Python type converters across DLL boundaries.
To show the problem I have the following example:
In my library I have defined:
namespace bp = boost::python;
namespace bn = boost::numpy;

class DLL_API DummyClass
{
public:
    static std::shared_ptr<DummyClass> Create()
    {
        return std::make_shared<DummyClass>();
    }
    static void RegisterPythonBindings();
};

void DummyClass::RegisterPythonBindings()
{
    bp::class_<DummyClass>("DummyClass", bp::init<>())
        ;
    bp::register_ptr_to_python< std::shared_ptr<DummyClass> >();
}
where DLL_API is the usual __declspec(…) for Windows. The idea is that this dummy class would be exported as part of a bigger Python module with
BOOST_PYTHON_MODULE(module)
{
    DummyClass::RegisterPythonBindings();
}
From within the executable linking to the library I have (omitting includes, etc.):
int main()
{
    Py_Initialize();
    DummyClass::RegisterPythonBindings();
    auto myDummy = DummyClass::Create();
    auto dummyObj = bp::object( myDummy ); // crashes here
}
The last line where I wrap myDummy within a boost::python::object crashes with an unhandled exception in Windows. The exception is being thrown from Python (throw_error_already_set). I believe (but could be wrong) that it is not finding an appropriate converter of the C++ type to Python, even though I made the call to register the bindings.
KernelBase.dll!000007fefd91a06d()
msvcr110.dll!000007fef7bde92c()
TestFromMain.exe!boost::python::throw_error_already_set(void)
TestFromMain.exe!boost::python::converter::registration::to_python(void const volatile *)
TestFromMain.exe!boost::python::converter::detail::arg_to_python_base::arg_to_python_base(void const volatile *,struct boost::python::converter::registration const &)
TestFromMain.exe!main() Line 66
TestFromMain.exe!__tmainCRTStartup()
kernel32.dll!0000000077a259cd()
ntdll.dll!0000000077b5a561()
As a test, I copied the exact same code defining the DummyClass all inside the executable just before the main function, instead of linking to the dll, and this works as expected.
Is my model of compiling as a DLL, using embedded Python on both sides of the boundary, even possible in Windows? (This is only used for a testing harness, so I'd always use the exact same toolchain throughout.)
Thanks very much.
In case anyone ever reads this again: the solution on Windows was to compile Boost as dynamic libraries and link everything dynamically. We had to change the structure of our code a bit, but it now works.
There is a (small) reference in the Boost documentation stating that on Windows the dynamic-library version of Boost keeps one common registry of types used for conversion between Python and C++. The docs don't mention that the static-library version lacks such a common registry (but I now know it doesn't work).

dynamic_cast fails when used with dlopen/dlsym

Intro
Let me apologise upfront for the long question. It is as short as I could make it, which is, unfortunately, not very short.
Setup
I have defined two interfaces, A and B:
class A // An interface
{
public:
    virtual ~A() {}
    virtual void whatever_A()=0;
};

class B // Another interface
{
public:
    virtual ~B() {}
    virtual void whatever_B()=0;
};
Then, I have a shared library "testc" constructing objects of class C, implementing both A and B, and then passing out pointers to their A-interface:
class C: public A, public B
{
public:
    C();
    ~C();
    virtual void whatever_A();
    virtual void whatever_B();
};

A* create()
{
    return new C();
}
Next, I have a second shared library "testd", which takes an A* as input and tries to cast it to a B* using dynamic_cast:
void process(A* a)
{
    B* b = dynamic_cast<B*>(a);
    if(b)
        b->whatever_B();
    else
        printf("Failed!\n");
}
Finally, I have the main application, passing A* pointers between the libraries:
A* a = create();
process(a);
Question
If I build my main application, linking against the 'testc' and 'testd' libraries, everything works as expected. If, however, I modify the main application to not link against 'testc' and 'testd', but instead load them at runtime using dlopen/dlsym, then the dynamic_cast fails.
I do not understand why. Any clues?
Additional information
Tested with gcc 4.4.1, libc6 2.10.1 (Ubuntu 9.10)
Example code available
I found the answer to my question here. As I understand it, I need to make the typeinfo in 'testc' available to the library 'testd'. To do this when using dlopen(), two extra things need to be done:
When linking the library, pass the linker the -E option, to make sure it exports all symbols to the executable, not just the ones that are unresolved in it (because there are none)
When loading the library with dlopen(), add the RTLD_GLOBAL option, to make sure symbols exported by testc are also available to testd
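In code, the loading side might then look like this (library paths are hypothetical):
#include <dlfcn.h>

int main()
{
    // RTLD_GLOBAL puts testc's typeinfo symbols into the global scope,
    // so the dynamic_cast inside testd can match B's typeinfo.
    void* testc = dlopen("./libtestc.so", RTLD_NOW | RTLD_GLOBAL);
    void* testd = dlopen("./libtestd.so", RTLD_NOW | RTLD_GLOBAL);
    // ... look up create() and process() with dlsym() as usual ...
    (void)testc; (void)testd;
    return 0;
}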
In general, gcc does not support RTTI across dlopen boundaries. I have personal experience with this messing up try/catch, but your problem looks like more of the same. Sadly, I'm afraid that you need to stick to simple stuff across dlopen.
I have to add to this question since I encountered this problem as well.
Even when providing -Wl,-E and using RTLD_GLOBAL, the dynamic_casts still failed. However, passing -Wl,-E when linking the actual application as well, and not only the library, seems to have fixed it.
If one has no control over the source of the main application, -Wl,-E is not applicable. Passing -Wl,-E to the linker while building one's own binaries (the host .so and the plugins) does not help either.
In my case the only working solution was to load and unload my host .so from the _init function of the host .so itself, using the RTLD_GLOBAL flag (see code below). This solution works in both cases:
1. the main application links against the host .so;
2. the main application loads the host .so using dlopen (without RTLD_GLOBAL).
In both cases one has to follow the instructions in the gcc visibility wiki.
If one makes the symbols of the plugin and the host .so visible to each other (using #pragma GCC visibility push/pop or the corresponding attribute) and loads the plugins (from the host .so) using RTLD_GLOBAL, case 1 also works without loading and unloading one's own .so (as mentioned in the link above).
This solution makes case 2 work as well, which was not the case before.
#include <dlfcn.h>  // dladdr, dlopen, dlclose -- link with -ldl
#include <cassert>
#include <string>

// get the path to the module itself
static std::string get_module_path() {
    Dl_info info;
    int res = dladdr( (void*)&get_module_path, &info);
    assert(res != 0); // failure...
    std::string module_path(info.dli_fname);
    assert(!module_path.empty()); // no name? should not happen!
    return module_path;
}

void __attribute__ ((constructor)) init_module() {
    std::string module = get_module_path();
    // here the magic happens :)
    // without this, case 2 fails
    dlclose(dlopen(module.c_str(), RTLD_LAZY | RTLD_GLOBAL));
}

C++ operator new, object versions, and the allocation sizes

I have a question about different versions of an object, their sizes, and allocation. The platform is Solaris 8 (and higher).
Let's say we have programs A, B, and C that all link to a shared library D. Some class is defined in the library D, let's call it 'classD', and assume the size is 100 bytes. Now, we want to add a few members to classD for the next version of program A, without affecting existing binaries B or C. The new size will be, say, 120 bytes. We want program A to use the new definition of classD (120 bytes), while programs B and C continue to use the old definition of classD (100 bytes). A, B, and C all use the operator "new" to create instances of D.
The question is, when does operator "new" know the amount of memory to allocate: at compile time or at run time? One thing I am afraid of is that programs B and C expect classD to be 100 bytes and allocate accordingly, whereas the new shared library D requires 120 bytes for classD, and this inconsistency may cause memory corruption in programs B and C if I link them with the new library D. In other words, the extra 20 bytes that the new classD requires may be allocated to some other variables by programs B and C. Is this assumption correct?
Thanks for your help.
Changing the size of a class is binary incompatible. That means that if you change the size of classD without recompiling the code that uses it, you get undefined behavior (most likely crashes).
A common trick to get around this limitation is to design classD so that it can be safely extended in a binary compatible way, for example by using the Pimpl idiom.
In any case, if you want different programs to use different versions of your class, I think you have no choice but releasing multiple versions of the shared library and have those programs linked to the appropriate version.
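A minimal Pimpl sketch (names illustrative; a raw pointer is used since this predates C++11): classD's public size is fixed at one pointer, so members can be added to the hidden struct without breaking already-compiled clients:
// classD.h -- public header shipped with library D; sizeof(classD) never changes
class classD
{
public:
    classD();
    ~classD();
    int someValue() const;
private:
    struct Impl;  // defined only inside the library
    Impl* impl_;  // fixed-size handle: always one pointer
};

// classD.cpp -- inside library D; new members go into Impl in later versions
struct classD::Impl
{
    int someValue;
    // version 2 can add fields here without affecting B or C
};

classD::classD() : impl_(new Impl()) { impl_->someValue = 0; }
classD::~classD() { delete impl_; }
int classD::someValue() const { return impl_->someValue; }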
Compile time; you should not change a shared object's size underneath its clients.
There is a simple workaround for that:
class foo
{
public:
    // make sure this is not inlined: define Create() inside the library,
    // so the allocation uses the library's idea of sizeof(foo)
    static foo* Create()
    {
        return new foo();
    }
};
// at the client
foo* f = foo::Create();
You are correct: the memory size is defined at compile time, and applications B and C would be in danger of serious memory corruption problems.
There is no way to handle this explicitly at the language level. You need to work with the OS to get the appropriate shared libraries to the application.
You need to version your libraries.
As there is no explicit way of doing this with the build tools, you need to do it with file names. If you look at most products, this is approximately how they work.
In the lib directory:
libD.1.00.so
libD.1.so -> libD.1.00.so // Symbolic link
libD.so -> libD.1.so // Symbolic link
Now at compile time you specify -lD and it links against libD.1.00.so because it follows the symbolic links. At run time the application knows to use this version, as it is the version it was compiled against.
So you now update lib D to version 2.0
In the lib directory:
libD.1.00.so
libD.2.00.so
libD.1.so -> libD.1.00.so // Symbolic link
libD.2.so -> libD.2.00.so // Symbolic link
libD.so -> libD.2.so // Symbolic link
Now when you build with -lD it links against version 2. Thus you re-build A and it will use version 2 of the lib from now on, while B and C will still use version 1. If you rebuild B or C, they will use the new version of the library unless you explicitly link against the old version with -lD.1
Some linkers do not follow symbolic links very well, so there are linker flags that help: gcc on macOS uses the -install_name flag; your linker may have a slightly differently named flag (the GNU linker uses -soname, typically passed as -Wl,-soname,...).
As a runtime check, it is usually a good idea to put version information into your shared objects (a global variable, a function call, etc.). Thus at runtime you can retrieve the shared library's version information and check that your application is compatible. If not, you should exit with an appropriate error message.
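A sketch of such a check (the symbol name and version string are hypothetical):
// In library D: export the version as a plain C symbol.
extern "C" const char* libD_version() { return "2.00"; }

// In the application: verify compatibility at startup.
#include <cstdio>
#include <cstring>
#include <cstdlib>

extern "C" const char* libD_version();

int main()
{
    if (std::strcmp(libD_version(), "2.00") != 0) {
        std::fprintf(stderr, "incompatible libD version: %s\n", libD_version());
        return EXIT_FAILURE;
    }
    // ... safe to proceed ...
    return 0;
}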
Also note: if you serialize objects of D to a file, you now need to make sure that version information about D is maintained. libD.2 may know how to read version 1 D objects (with some explicit work), but the inverse would not be true.
Memory allocation is figured out at compile time. Changing the size of a class in D requires recompiling everything that uses it.
Consider publicly deriving from the class in question to extend it, if that would apply. Or, compose it in another object.
The amount of memory to allocate is determined at compile time when doing something like
new Object();
but it can be a dynamic parameter such as in
new unsigned char[variable];
I really advise you to go through some middleware to achieve what you want. C++ guarantees nothing in terms of binary interfaces.
Have you looked at protobuf?
In addition to the mentioned 'ad hoc' techniques, you can also model compatibility into your system by saying that your new class A is really a subclass of the 'old' class A. That way, your old code keeps working, but all code that needs the extended functionality needs to be revised.
This design principle is clearly visible in the COM world, where interfaces in particular are never changed across versions, only extended by inheritance. Next to that, COM constructs classes only through the CreateInstance method, which moves the allocation problem into the library containing the class.
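A sketch of that principle (interface names are illustrative):
// Version 1 interface: frozen once published.
class IThing
{
public:
    virtual ~IThing() {}
    virtual void DoOld() = 0;
};

// Version 2 extends by inheritance; IThing itself is never modified.
class IThing2 : public IThing
{
public:
    virtual void DoNew() = 0;
};

// Allocation stays inside the library, as with COM's CreateInstance:
IThing* CreateThing(); // defined in the library, which knows the real size

// New clients probe for the extended interface:
//   if (IThing2* t2 = dynamic_cast<IThing2*>(thing)) t2->DoNew();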

Alternatives to dlsym() and dlopen() in C++

I have an application a part of which uses shared libraries. These libraries are linked at compile time.
At runtime the loader expects the shared object to be in LD_LIBRARY_PATH; if it is not found, the entire application crashes with the error "unable to load shared libraries". Note that there is no guarantee that the client has the library; in that case I want the application to leave a suitable error message, and the independent part should still work correctly.
For this purpose I am using dlsym() and dlopen() to use the API in the shared library. The problem with this is that if I have a lot of functions in the API, I have to access each of them individually using dlsym() and function pointers, which in my case is leading to memory corruption and crashes.
Are there any alternatives for this?
The common solution to your problem is to declare a table of function pointers, to do a single dlsym() to find it, and then call all the other functions through a pointer to that table. Example (untested):
// libfoo.h
struct APIs {
    void (*api1)(void);
    void *(*api2)(int);
    long (*api3)(int, void *);
};

// libfoo.cc
void fn1(void) { ... }
void *fn2(int) { ... }
long fn3(int, void *) { ... }

APIs api_table = { fn1, fn2, fn3 };

// client.cc
#include <dlfcn.h>
#include "libfoo.h"
...
void *foo_handle = dlopen("libfoo.so", RTLD_LAZY);
if (!foo_handle) {
    return false; // library not present
}
// in C++ the void* returned by dlsym must be cast explicitly
APIs *table = static_cast<APIs *>(dlsym(foo_handle, "api_table"));
table->api1();              // calls fn1
void *p = table->api2(42);  // calls fn2
long x = table->api3(1, p); // calls fn3
P.S. Accessing your API functions individually using dlsym and pointers does not in itself lead to memory corruption and crashes. Most likely you just have bugs.
EDIT:
You can use this exact same technique with a 3rd-party library. Create a libdrmaa_wrapper.so and put the api_table into it. Link the wrapper directly against libdrmaa.so.
In the main executable, dlopen("libdrmaa_wrapper.so", RTLD_NOW). This dlopen will succeed if (and only if) libdrmaa.so is present at runtime and provides all API functions you used in the api_table. If it does succeed, a single dlsym call will give you access to the entire API.
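A sketch of such a wrapper (the header and function names are hypothetical stand-ins for the real libdrmaa API):
// libdrmaa_wrapper.cc -- built into libdrmaa_wrapper.so and linked
// directly against libdrmaa.so (e.g. with -ldrmaa)
#include "wrapped_api.h" // hypothetical header declaring wrapped_fn1/wrapped_fn2

struct WrappedAPIs {
    int  (*fn1)(void);
    long (*fn2)(int);
};

// These references are resolved against libdrmaa.so when the wrapper is
// dlopen'ed; if libdrmaa.so is missing, dlopen of the wrapper fails and
// the application can report the error and carry on without it.
WrappedAPIs api_table = { wrapped_fn1, wrapped_fn2 };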
You can wrap your application with another one which first checks for all the required libraries; if something is missing it errors out nicely, but if everything is all right it execs the real application.
Use the below type of code:
class DynLib
{
public:
    /* All your functions */
    void fun1() {};
    void fun2() {};
    .
    .
    .
};
// extern "C" keeps the symbol name unmangled so dlsym() can find it
extern "C" DynLib* getDynLibPointer()
{
    DynLib* x = new DynLib;
    return x;
}
Use dlopen() to load this library at runtime, and use dlsym() to call getDynLibPointer(), which returns a DynLib object.
From this object you can access all your functions, just as obj->fun1(), and so on, as shown below.
This is of course a C++-style version of the struct method proposed earlier.
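The client side of that approach might look like this (library path and header name are hypothetical):
#include <dlfcn.h>
#include "dynlib.h" // hypothetical header declaring class DynLib

typedef DynLib* (*factory_t)();

int main()
{
    void* handle = dlopen("./libdynlib.so", RTLD_LAZY);
    if (!handle)
        return 1; // library not present

    factory_t getDynLibPointer =
        (factory_t)dlsym(handle, "getDynLibPointer");
    if (!getDynLibPointer)
        return 1; // symbol not found

    DynLib* obj = getDynLibPointer();
    obj->fun1(); // one dlsym for the factory, then ordinary calls
    delete obj;
    dlclose(handle);
    return 0;
}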
You are probably looking for some form of delayed library loading on Linux. It's not available out of the box, but you can easily mimic it by creating a small static stub library that tries to dlopen the needed library on the first call to any of its functions (emitting a diagnostic message and terminating if dlopen fails) and then forwards all calls to it.
Such stub libraries can be written by hand, generated by a project/library-specific script, or generated by the universal tool Implib.so:
$ implib-gen.py libxyz.so
$ gcc myapp.c libxyz.tramp.S libxyz.init.c ...
Your problem is that the resolution of unresolved symbols happens very early on: on Linux, I believe data symbols are resolved at process startup, while function symbols are resolved lazily. Therefore, depending on which symbols are unresolved and on what kind of static initialization you have going on, you may not get a chance to step in with your own code.
My suggestion would be to have a wrapper application that traps the return code/error string "unable to load shared libraries", and then converts this into something more meaningful. If this is generic, it will not need to be updated every time you add a new shared library.
Alternatively, you could have your wrapper script run ldd and parse the output; ldd will report all the libraries that are not found for your particular application.