I have an application consisting of different modules written in C++.
One of the modules handles distributed tasks on Sun Grid Engine. It uses the DRMAA API for
submitting and monitoring grid jobs. If the client doesn't support the grid, the local machine should be used instead.
The API's shared object, libdrmaa.so, is linked at compile time and loaded at runtime.
If the client using my application has this ".so", everything is fine, but if the client doesn't have it,
the application exits because it fails to load shared libraries.
To avoid this, I have replaced the API calls with function pointers obtained using dlopen() and dlsym().
Now I can use the local machine instead of the grid if the call to dlopen() doesn't succeed, and my objective is achieved.
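A minimal sketch of that approach, for reference. GridApi and load_grid_api are hypothetical names, and the two function pointer types are assumed to match the drmaa_init and drmaa_exit declarations in the DRMAA C binding; each pointer type has to match the real prototype exactly, otherwise calls through it are undefined behavior.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Assumed to mirror the DRMAA C binding prototypes; check them against drmaa.h.
using drmaa_init_fn = int (*)(const char* contact, char* error_diagnosis,
                              std::size_t error_diag_len);
using drmaa_exit_fn = int (*)(char* error_diagnosis, std::size_t error_diag_len);

// Hypothetical holder for the library handle and the resolved entry points.
struct GridApi {
    void* handle = nullptr;
    drmaa_init_fn init = nullptr;
    drmaa_exit_fn shutdown = nullptr;
    bool available() const { return handle && init && shutdown; }
};

// Tries to load the grid library; returns an empty GridApi if the library
// or any required symbol is missing, so the caller can fall back to local runs.
GridApi load_grid_api(const char* library_name) {
    assert(library_name != nullptr);
    GridApi api;
    api.handle = dlopen(library_name, RTLD_NOW | RTLD_LOCAL);
    if (!api.handle) {
        const char* why = dlerror();
        std::fprintf(stderr, "grid disabled: %s\n", why ? why : "unknown error");
        return api;
    }
    api.init = reinterpret_cast<drmaa_init_fn>(dlsym(api.handle, "drmaa_init"));
    api.shutdown = reinterpret_cast<drmaa_exit_fn>(dlsym(api.handle, "drmaa_exit"));
    if (!api.init || !api.shutdown) {
        dlclose(api.handle);  // partially usable library: treat it as absent
        api = GridApi{};
    }
    return api;
}
```

The caller checks available() once and picks the grid or the local code path; every other DRMAA function would be resolved the same way.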
The problem is that the application now runs successfully for small test cases, but with larger test cases it throws a segmentation fault, while the same code linked against libdrmaa.so at compile time works correctly.
Am I missing something while using dlsym() and dlopen()?
Is there any other way to achieve the same goal?
Any help would be appreciated.
Thanks,
It is very unlikely to be a direct problem with the code loaded via dlsym() - in the sense that the dynamic loading makes it seg-fault.
What it may be doing is exposing a separate problem, probably by moving stuff around. This probably means a stray (uninitialized) pointer that points somewhere 'legitimate' in the static link case but somewhere else in the dynamic link case - and the somewhere else triggers the seg-fault. Indeed, that is a benefit to you in the long run - it shows that there is a problem that otherwise might remain undetected for a long time.
I regard this as particularly likely since you mention that it occurs with larger tests and not with small ones.
As Jonathan Leffler says, the problem very likely exists in the case where you are using the API directly; it just hasn't caused a crash yet.
Your very first step when you get a SIGSEGV should be to analyze the resulting core dump (or just run the app directly under a debugger) and look at where it crashed. I'll bet $0.02 that it's crashing somewhere inside malloc or free, in which case the problem is plain old heap corruption, and there are many heap-checker tools available to help you catch it. Solaris provides watchmalloc, which is a good start.
If you throw an exception across an extern "C" function boundary, you cannot rely on it propagating; in practice the application usually terminates. This is because the C ABI has no facility for propagating exceptions.
To counter this when using DLLs (or shared libs), you normally have a single C function that returns a C++ object. The remaining interaction is then with the C++ object that was returned from the DLL.
This pattern suggests (and I stress suggests) a factory-like object: your DLL should have a single extern "C" function that returns a void*, which you can cast back (static_cast<> or reinterpret_cast<>) into a C++ factory object.
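A minimal sketch of that pattern, under assumptions: JobRunner, LocalRunner, create_runner, destroy_runner and runner_run are all hypothetical names, and both sides are assumed to be built with the same compiler and share the interface header. The entry points are called directly here; in a real setup they would be looked up with dlsym().

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical interface shared by the host and the library via a header.
struct JobRunner {
    virtual ~JobRunner() = default;
    virtual int run(const std::string& job) = 0;
};

// Library-side implementation; it may throw like any C++ code.
struct LocalRunner : JobRunner {
    int run(const std::string& job) override {
        if (job.empty()) throw std::invalid_argument("empty job");
        return static_cast<int>(job.size());
    }
};

// The extern "C" entry points never let an exception escape.
extern "C" void* create_runner() noexcept {
    try {
        return static_cast<JobRunner*>(new LocalRunner);
    } catch (...) {
        return nullptr;
    }
}

extern "C" void destroy_runner(void* p) noexcept {
    delete static_cast<JobRunner*>(p);
}

// Translates exceptions into an error code at the C boundary.
extern "C" int runner_run(void* p, const char* job, int* result) noexcept {
    assert(p != nullptr && result != nullptr);
    try {
        *result = static_cast<JobRunner*>(p)->run(job ? job : "");
        return 0;
    } catch (...) {
        return -1;
    }
}
```

The void* is converted from JobRunner* on the way out and back to JobRunner* on the way in, so the cast round-trips through the same type.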
What can one reasonably do to try to contain catastrophic failures (such as crashes) within a shared library plugin written in C++, beyond catching all exceptions using catch (...) { /* handle here */ }?
My situation is a bit peculiar so let me describe it in detail. The "plugins" extend Mathematica (a programming language). They must be compiled into shared libraries which export a number of C (not C++) functions all of which have the same signature (similar to argc, argv, etc.). These functions can use a simple API to handle "arguments", return a value, or call back to Mathematica. After the "plugin" is loaded, these C functions will be exposed as Mathematica functions.
My contribution here is a tool that takes the interface description of a C++ class and automatically generates these standard C functions to call class members. I want to generate code to try to catch failures and report on them. So far the only protection I have is catching all exceptions.
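To make the current level of protection concrete, here is a rough sketch of what one generated wrapper could look like. Everything in it is hypothetical (guarded, last_error, Accumulator, Accumulator_add and the 0/1 status codes are not part of the real plugin API), and it only contains C++ exceptions, not crashes such as access violations.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>
#include <stdexcept>

// Per-thread text of the most recent failure; a fixed buffer so that
// recording an error cannot itself throw.
thread_local char g_last_error[256] = "";

const char* last_error() noexcept { return g_last_error; }

// Runs body and converts any C++ exception into a status code.
// Returns 0 on success, 1 on failure (hypothetical codes).
template <typename Body>
int guarded(Body&& body) noexcept {
    try {
        body();
        g_last_error[0] = '\0';
        return 0;
    } catch (const std::exception& e) {
        std::snprintf(g_last_error, sizeof g_last_error, "%s", e.what());
    } catch (...) {
        std::snprintf(g_last_error, sizeof g_last_error, "%s", "unknown exception");
    }
    return 1;
}

// Example class whose member function would be exposed to the host.
struct Accumulator {
    double total = 0;
    void add(double x) {
        if (x < 0) throw std::domain_error("negative input");
        total += x;
    }
};

// What a generated C entry point might look like.
extern "C" int Accumulator_add(Accumulator* self, double x) noexcept {
    assert(self != nullptr);
    return guarded([&] { self->add(x); });
}
```

The generated function can then report last_error() back to the host through whatever messaging call the real API offers.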
Note 1: I am not interested in trying to run the plugin in a separate process, for performance reasons. The API that Mathematica provides gives direct access to large numerical matrices within Mathematica. I absolutely want to avoid copying these. Mathematica also provides a different API for out-of-process plugins, which I do use when performance is not critical.
Note 2: I cannot change Mathematica itself, e.g. control how it loads shared libraries.
Note 3: I looked a bit at signal handlers, but I worry that it's not a good idea to set one, since it will affect the host application, which I have no control over. I only want to catch failures within the plugin itself, not in other parts of the host.
I think this should be possible to some extent because MATLAB does it: if a MATLAB "plugin" (i.e. MEX file written in C) crashes, it won't immediately bring down the entire application. Instead it shows an error like this:
Notice the "Attempt to Continue" button. This is precisely the kind of thing I want to do: report some information about the problem (such as which plugin misbehaved, if this can be found out) and give a last chance to the user to try to save their work, while warning them about the potential consequences of continuing.
I expect some answers to be platform specific. I'm interested in Windows, OS X and GNU/Linux.
I have a strange issue that I am trying to work out for someone. I don't have any access to the code. There is a program that loads a DLL and has somewhat of a plugin framework. They provide virtually no documentation beyond how to import functions from the DLL and what calling convention to use for exports.
This person's plugin imports functions from a DLL (let's assume they used the proper calling conventions and imported properly). It periodically runs into access violations (usually access violation write/read from 0x0000000). Sometimes, it crashes the program and Event Viewer shows exception code 0xc0000005 (another access violation) with faulting module SHLWAPI.dll.
Using Dependency Walker (depends), I have determined that the program is statically linked to msvcr. I found that the plugin DLL dynamically links to msvcr120.dll.
Yes, I am aware that this is just asking for trouble and the access violations are no surprise, but unfortunately, I have to deal with someone else's problem.
Anyway, my question is this:
Let's say a function is imported from this DLL, and inside that function is a call to a CRT function provided by msvcr120. When the program calls the imported function, is it possible that the CRT call goes to the msvcr the program is statically linked to rather than to msvcr120?
I realize that it probably depends on the main program's plugin framework, but general feedback would be appreciated.
Thanks in advance!
There are known issues when using multiple copies of the CRT in one program, even when they all use the same version of the CRT (see Potential Errors Passing CRT Objects Across DLL Boundaries). If the CRTs are different versions, there are lots of other problems due to different size or layout of internal structures.
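One common way to sidestep those problems is to never hand CRT-owned objects across the boundary at all. The sketch below shows the caller-provided buffer convention; plugin_get_name and the plugin name string are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical exported function. The caller owns the buffer, so no memory
// allocated by one CRT is ever freed by another. Returns the full length of
// the name (excluding the terminating NUL) so the caller can detect truncation.
// buffer may be null only when capacity is 0 (a pure length query).
extern "C" std::size_t plugin_get_name(char* buffer, std::size_t capacity) noexcept {
    assert(buffer != nullptr || capacity == 0);
    static const char name[] = "example-plugin";
    const std::size_t needed = sizeof name - 1;
    if (capacity > 0) {
        const std::size_t n = needed < capacity - 1 ? needed : capacity - 1;
        std::memcpy(buffer, name, n);
        buffer[n] = '\0';
    }
    return needed;
}
```

A caller typically asks for the length first (capacity 0), allocates with its own CRT, then calls again to fill the buffer.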
Since the program you use statically links with the CRT, it cannot reliably host plugins. The anti-debugger code is just plain silly; there are several ways around it. If you paid for it, send it back and demand a refund.
I am designing a system in C/C++ which is extensible with all sorts of plugins. There is a well-defined public C API which mostly works with (const) char* and other pointer types. The plugins are compiled into .so or .dll files, and the main application loads them at startup, and later unloads or reloads them on request.
The plugins might come from various sources, trusted or not :)
Now, I would like to make sure that if one plugin does something stupid (such as trying to free memory which it was not supposed to free), this does not bring down the entire system, but merely notifies the main system about the misbehaving plugin so that it can be removed from the queue.
The plugins are called in the following manner:
const char* data = get_my_data();
// plugin_count: number of entries in plugins[]
for (int i = 0; i < plugin_count; i++)
{
    plugins[i]->execute(data);
}
but if plugins[0] frees the data string "by accident", overwrites it, or by mistake jumps to address 0x0, this would bring down the entire system, and I don't want this. How can I avoid this kind of catastrophe? (I know, I can duplicate the data string ... this does not solve my problem :) )
Make a wrapper process for each plugin and communicate with that wrapper through IPC.
If a plugin fails, your main process is unaffected.
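A rough POSIX-only sketch of the idea, with hypothetical names throughout (PluginResult, run_isolated, good_plugin, crashing_plugin). It uses the child's exit status as the only communication channel; a real wrapper would keep the child alive and exchange data over a pipe, socket or shared memory.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstdlib>

enum class PluginResult { Ok, Failed, Crashed };

using PluginFn = int (*)(const char* data);

// Runs one plugin call in a child process so that a crash only kills the child.
PluginResult run_isolated(PluginFn fn, const char* data) {
    assert(fn != nullptr);
    const pid_t pid = fork();
    if (pid < 0) return PluginResult::Failed;   // could not create the child
    if (pid == 0) {
        // Child: run the plugin and report success or failure via exit status.
        _exit(fn(data) == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return PluginResult::Failed;
    if (WIFSIGNALED(status)) return PluginResult::Crashed;  // SIGSEGV, SIGABRT, ...
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? PluginResult::Ok
                                                           : PluginResult::Failed;
}

// Stand-ins for real plugins.
int good_plugin(const char*) { return 0; }
int crashing_plugin(const char*) { std::abort(); }
```

On Crashed, the host can drop the plugin from its queue and carry on with the remaining ones.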
Simply put, you can't do that in the same process. If your plugins are written in C or C++, they can contain numerous sources of undefined behavior, meaning sources of undetectable, unavoidable crashes. So you should either launch the plugins in their own processes, like kassak suggested, and let them crash if they want to, or use another language for your plugins, e.g. an interpreted scripting language like Lua.
Have a look at http://msdn.microsoft.com/en-us/library/1deeycx5(v=vs.90).aspx
I use /EHa in one of my projects to help me catch exceptions from libraries that do stupid things. If you compile your code with this setting, a catch (...) block will also catch structured exceptions such as divide by zero.
Not sure if there is an equivalent for this on Linux. Please let me know if there is.
I have a C++ program that links at runtime with, let's say, mylib.so. The same program then uses dlopen()/dlsym() to load a function from myplugin.so, a dynamic library that in turn depends on mylib.so.
My question is: will the program AND the function in the plugin access the same globals defined in mylib.so, in the same memory area reserved for the program, or will each be assigned different, unrelated copies in its own memory space? If the latter is the default behaviour, is it possible to change that?
Thanks in advance =)!
Globals in the main program that does the dlopen should be visible to the code that is dynamically loaded. However, the best advice I've seen to date (especially if you ever want to have even vaguely portable code) is to only have function calls be passed across the linker divide, and to not export any variables in either direction. It's also best if there is an API for the loaded code to register the interesting parts of its API with the loader (e.g., "Here is how I provide this SPI for drawing foobars on a baz") as that's a much saner way of doing callbacks rather than just mashing everything together.
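A small sketch of the registration idea, with every name hypothetical (DrawSpi, HostApi, host_register_drawer, plugin_init). Host and plugin live in one file here so the example is self-contained; in practice the host would find plugin_init with dlsym() and no variables would be shared across the boundary, only the two function pointer tables.

```cpp
#include <cassert>

// SPI the plugin implements: a table of function pointers, no shared globals.
struct DrawSpi {
    int (*draw_foobar)(int x, int y);
};

// API the host hands to the plugin at load time.
struct HostApi {
    int (*register_drawer)(const char* name, const DrawSpi* spi);
};

// ---- host side ----
static const DrawSpi* g_drawer = nullptr;  // private to the host

static int host_register_drawer(const char* name, const DrawSpi* spi) {
    if (!name || !spi || !spi->draw_foobar) return -1;
    g_drawer = spi;
    return 0;
}

// ---- plugin side ----
static int plugin_draw_foobar(int x, int y) { return x + y; }  // stand-in body

static const DrawSpi k_plugin_spi = { &plugin_draw_foobar };

// The only symbol the host needs to look up by name.
extern "C" int plugin_init(const HostApi* host) {
    if (!host || !host->register_drawer) return -1;
    return host->register_drawer("foobar-on-baz", &k_plugin_spi);
}
```

After plugin_init succeeds, the host calls the plugin only through the registered table.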
[EDIT]: The other reason for doing this is if you're simulating weak linking on a platform that doesn't support it. That's a lot like the other one I list, except that it is the main program that is building the SPI out of the API exported by the dynamic library rather than the .so exporting it explicitly on startup. It's second best really, but you make do with what you've got rather than wishing (well, unless you're prepared to do the work by writing some sort of connection library).
Yesterday, I got bitten by a rather annoying crash when using DLLs compiled with GCC under Cygwin. Basically, as soon as you run with a debugger, you may end up landing in a debugging trap caused by RtlFreeHeap() receiving an address to something it did not allocate.
This is a known bug with GCC 3.4 on Cygwin. The situation arises because the libstdc++ library includes a "clever" optimization for empty strings. I'll spare you the details (see the references throughout this post), but whenever you allocate memory in one DLL for an std::string object that "belongs" to another DLL, you end up giving one heap a chunk to free that came from another heap. Hence the SIGTRAP in RtlFreeHeap().
There are other problems reported when exceptions are thrown across DLL boundaries.
This makes GCC 3.4 on Windows an unacceptable solution as soon as your project is based on DLLs and the STL. I have a few options to move past this problem, many of which are very time-consuming and/or annoying:
I can patch my libstdc++ or rebuild it with the --enable-fully-dynamic-string configuration option
I can use static libraries instead, which increases my link time
I cannot (yet) switch to another compiler either, because of some other tools I'm using. The comment I find from some GCC people is that "it's almost never reported, so it's probably not a problem", which annoys me even more.
Does anyone have some news about this? I can't find any clear announcement that this has been fixed (the bug is still marked as "assigned"), except one comment on the GNU Radio bug tracker.
Thanks!
The general problem you're running into is that C++ was never really meant as a component language. It was really designed to be used to create complete standalone applications. Things like shared libraries and other such mechanisms were created by vendors on their own. Think of this example: suppose you created a C++ component that returns a C++ object. How does the C++ component know that it will be used by a C++ caller? And if the caller is a C++ application, why not just use the library directly?
Of course, the above information doesn't really help you.
Instead, I would create the shared libraries/DLLs such that you follow a couple of rules:
Any object created by a component is also destroyed by the same component.
A component can be safely unloaded when all of its created objects are destroyed.
You may have to create additional APIs in your component to enforce these rules, but following them ensures that problems like the one described won't happen.
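A minimal sketch of the two rules, with hypothetical names (Widget, widget_create, widget_destroy, component_can_unload). The component is shown in the same file as the caller for brevity and the live-object counter is not thread-safe.

```cpp
#include <cassert>
#include <memory>
#include <new>

struct Widget {
    int value;
};

// ---- component side ----
static int g_live_objects = 0;  // objects created here and not yet destroyed

// Rule 1: the component allocates...
extern "C" Widget* widget_create(int value) noexcept {
    Widget* w = new (std::nothrow) Widget{value};
    if (w) ++g_live_objects;
    return w;
}

// ...and the same component frees, using its own heap.
extern "C" void widget_destroy(Widget* w) noexcept {
    if (!w) return;
    assert(g_live_objects > 0);
    delete w;
    --g_live_objects;
}

// Rule 2: unloading is safe only once nothing created here is still alive.
extern "C" int component_can_unload() noexcept {
    return g_live_objects == 0 ? 1 : 0;
}

// ---- caller side ----
// The deleter routes destruction back into the component automatically.
struct WidgetDeleter {
    void operator()(Widget* w) const noexcept { widget_destroy(w); }
};
using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;
```

With WidgetPtr the caller never calls delete itself, so an object cannot end up on the wrong heap.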