Ending global threads in DLL during process shutdown - c++

There is this multiplatform (Windows, Linux, Cygwin) dynamic library which is loaded at run time by a Cygwin executable. At some point of time, during the normal workflow, the DLL allocates a pool of threads for use. These threads are managed as global variables (reference counted). So when the client process goes to shutdown, it starts releasing global objects, threads should be released too.
Issue is, as I understand, that during the process shutdown, the Loader lock is acquired and further down the street, threads want to acuiqre the same lock and, we have now a deadlock.
Now my ask for advise is, how we can make a nice shutdown?
The DLL has no init() or uninit() methods to be called. The client at best can be enhanced with some code before the end of main () (so this is before the process shutdown).
If I detach the threads, instead of joining them, during the global var clean up, memory goes corrupted. If I terminate them, we have ugly process dumps.
Btw, under Linux I see no such problems.
DLL is only C++14, client is C99 (Cygwin).
I tried to make the situation clear, but let me know if you have further questions. Thanks in advance for any ideas.

The fix is to add an uninit method to the DLL. It may not have one yet, but it needs one. You found out why: while the OS will call DllMain on DLL unload, it does so under loader lock. You need to do things that aren't possible under loader lock, so you need an extra call before DllMain. Naming that method uninit() is reasonable enough.
C++14 is not an issue here; this is an OS mechanism. Loader Lock has been around since ancient times.

At a previous job I struggled with this issue for a pretty lengthy period. Ultimately it came down to only 2 possible solutions:
Clean up every single resource the thread has claim to, then TerminateThread. It's violent and ugly but it works around the THREAD_DETACH issue and I actually found it advised on the Internet.
If you have the luxury of being able to get advance notice prior to PROCESS_DETACH - clean up everything at that early point, including orderly shutdown of your threads. Then by all means do absolutely nothing during PROCESS_DETACH - yes, don't even free any lingering heap objects, as you may be exposing yourself to deadlock or crash risks and the process is going down and freeing up all its resources anyway.
As an added note, I also learned to avoid at all costs having any global variables linked to the DLL lifetime. These will have their constructors and destructors executed in the DllMain context, needless to say more... If you need global singletons in a DLL, make sure to have manual control on their lifetimes (on both ends, so no auto-destruct smart pointers either).

Related

Can (Should?) a native shared library which creates its own threads support the using process exiting 'without warning'?

I work on a product that's usually built as a shared library.
The using application will load it, create some handles, use them, and eventually free all the handles and unload the library.
The library creates some background threads which are usually stopped at the point the handles are freed.
Now, the issue is that some consuming applications aren't super well-behaved, and will fail to free the handles in some cases (cancellation, errors, etc). Eventually, static destructors in our library run, and crash when they try to interact with the (now dead) background thread(s).
One possibility is to not have any global objects with destructors, and so to avoid running any code in the library during static destruction. This would probably solve the crash on process exit, but it would introduce leaks and crashes in the scenario where the application simply unloads the library without freeing the handles (as opposed to exiting), as we wouldn't ensure that the background threads are actually stopped before the code they were running was unloaded.
More importantly, to my knowledge, when main() exits, all other threads will be killed, wherever they happened to be at the time, which could leave locks locked, and invariants broken (for example, within the heap manager).
Given that, does it even make sense to try and support these buggy applications?
Yes, your library should allow the process to exit without warning. Perhaps in an ideal world every program using your library would carefully track the handles and free them all when it exits for any reason, but in practice this isn't a realistic requirement. The code path that is triggering the program exit might be a shared component that isn't even aware that your library is in use!
In any case, it is likely that your current architecture has a more general problem, because it is inherently unsafe for static destructors to interact with other threads.
From DllMain entry point in MSDN:
Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result.
and
If your DLL is linked with the C run-time library (CRT), the entry point provided by the CRT calls the constructors and destructors for global and static C++ objects. Therefore, these restrictions for DllMain also apply to constructors and destructors and any code that is called from them.
In particular, if your destructors attempt to wait for your threads to exit, that is almost certain to deadlock in the case where the library is explicitly unloaded while the threads are still running. If the destructors don't wait, the process will crash when the code the threads are running disappears. I'm not sure why you aren't seeing that problem already; perhaps you are terminating the threads? (That's not safe either, although for different reasons.)
There are a number of ways to resolve this problem. Probably the simplest is the one you already mentioned:
One possibility is to not have any global objects with destructors, and so to avoid running any code in the library during static destruction.
You go on to say:
[...] but it would introduce leaks and crashes in the scenario where the application simply unloads the library without freeing the handles [...]
That's not your problem! The library will only be unloaded if the application explicitly chooses to do so; obviously, and unlike the earlier scenario, the code in question knows your library is present, so it is perfectly reasonable for you to require that it close all your handles before doing so.
Ideally, however, you would provide an uninitialization function that closes all the handles automatically, rather than requiring the application to close each handle individually. Explicit initialization and uninitialization functions also allows you to safely set up and free global resources, which is usually more efficient than doing all of your setup and teardown on a per-handle basis and is certainly safer than using global objects.
(See the link above for a full description of all the restrictions applicable to static constructors and destructors; they are quite extensive. Constructing all your globals in an explicit initialization routine, and destroying them in an explicit uninitialization routine, avoids the whole messy business.)

Safely cleaning up a blocking std::thread after an automated test

Consider a test case for some Mutex class implementation. The test creates several std::thread instances during execution. All threads should finish if the Mutex class is implemented correctly according to the test. If there is a problem, it's possible that one thread may block indefinitely. How can the test correctly cleanup after itself?
At first I thought to detach the thread, but then the thread is leaked. Even worse, the thread relies on a Mutex instance from inside the test case which will sporadically cause access violations after the test case returns.
Some thread libraries such at Qt’s QThread have terminate() methods, but I’d like to use std::thread even though Qt is already a dependency for my project.
Is there a general pattern for testing potentially indefinitely blocking concurrent code?
Killing threads that may hold a lock is one of the main reasons terminating threads forcibly is frowned upon, and why C++11 doesn't support it. You are not supposed to do it, period.
If you need to do something like it, your best best would probably be to spawn a new process to run the test in; if it locks up, you can terminate the process without the same risks.
For examples of why terminating threads is bad news, take a look at the specific example from the Old New Thing on what sort of garbage thread termination leaves lying around on Windows; similar issues occur on most operating systems under different contexts.
I think destructors could come in help here, is the only thing 100% sure that be executed after any problem, by design. I suggest a nice blocking test inside of some destructor and release resources in a SECURE WAY (smart pointers ?) before leave it.

Should I always call Release on COM pointers when my program terminates?

I know that modern Windows versions reclaim memory that was previously acquired with malloc, new and the like, after program termination, but what about COM objects? Should I call obj->Release() on them on program's exit, or will the system do this for me?
My guess it: it depends. For out of process COM, I should probably always call Release(), but for in-process COM, I think it really doesn't matter, because the COM objects die after program termination anyway.
If you're in the process itself then yes you should as you might not know where the server is and the server could be out of proc. If you're in a DLL it becomes more complicated.
In a DLL you should UNLESS you receive a DLL_PROCESS_DETACH notification, in which case you should do absolutely nothing and just let the application close. This is because this notification is called during process teardown. As such it is too late to clean up at that point. The kernel may have already reclaimed the blocks you call release on.
Remember as a DLL writer there is nothing you can do if the process exits ungracefully, you can only do what you can within reason to clean up after yourself in a graceful exit.
One easy solution is to use smart COM Pointers everywhere, the ATL and WRL have implementations that work nicely and make it so you don't have to worry about it for the most part. Even if these are stored statically their destructors will be called before process teardown or DLL unload, thus releasing safely at a time when it is safe to do so.
So the short answer is If you can e.g. you should always call release if it is safe to do so. However there are times when it is not and you should most definitely NOT do anything.
Depending on the implementation of the underlying object, there may or may not be a penalty. The object may have state that persists beyond process shutdown. A persistent lock in a local database is the easiest example that comes to mind.
With that in mind, I say it's better to call Release just in case.
in-process COM object will die with the process
out-of-process reference will be released on timeout
poorly designed servers and clients might remain in bad state holding pointers to objects (typically proxies), that are not available any longer, being unable to see they are dead. being unable to get rid of them
it is always a good idea to release the pointers gracefully
If you don't release COM pointers properly, and COM activity included marshaling, you are very likely to have exceptions in CoUninitialze which are both annoying and/or can end up showing process crash message to the user.

boost::interprocess::named_mutex vs CreateMutex

I want to switch from CreatMutex to boost::interprocess::named_mutex to limit my application to a single instance. Both methods works when the application runs and ends just fine. However, the lock is not released when the application crashes and using boost::interprocess::named_mutex. I could resolve that issue by using two name_mutex but I don't really understand the issue.
Why is the lock for boost::interprocess::named_mutex not released when the application crashes but it is release with CreatMutex? What's the difference?
boost::interprocess::named_mutex mutex(boost::interprocess::open_or_create, "my_mutex");
boost::interprocess::scoped_lock<boost::interprocess::named_mutex> lock(mutex, boost::interprocess::try_to_lock);
if(!lock) {
return 1; //exit
}
//application may crash here.
boost::interprocess::named_mutex::remove("my_mutex");
return 1; //exit
Caveat: I've not spent much time with boost::interprocess, so this information is just from a quick inspection of the source. That said, I've used the Windows synchronisation API's a lot, so here goes...
The main difference between the two methods of interprocess synchronisation is how the object exists within the system.
With boost::interprocess::named_mutex, as well as a system-specific mutex, it looks like a synchronisation object is created as a file on the system. The location of the file is based on Registry entries (see note 1) (at least in Boost 1.54.0)... it's most likely located under the Common Application Data folder (see note 2). When the aplication crashes, this file is, in your case, not removed. I'm not sure if this is by design... however in the case of an application crash, it's perhaps best not to mess with the file system, just in case.
Conversely, when you use CreateMutex, an object is created at the kernel mode, which for named mutexes can be accessed by several applications. You get a handle to the Mutex by specifying the name when you create it, and you lose the handle when you call CloseHandle on it. The mutex object is destroyed when there are no more handles referencing it.
The important part of this is in the documentation:
The system closes the handle automatically when the process terminates. The mutex object is destroyed when its last handle has been closed.
This basically means that Windows will clean up after your application.
Note that if you don't perform a ReleaseMutex, and your application owns the mutex when it dies, then it's possible/likely that a waiting thread or process would see that the mutex had been abandoned (WaitForSingleObject returns WAIT_ABANDONED), and would gain ownership.
I apologise for not providing a solution, but I hope it answers your question about why the two systems act differently.
Just as an aside, using registry entries to get this information is horrible - it would be safer, and more future-proof, to use SHGetKnownFolderPath. But I digress.
Depending on your OS version, this could be %ALLUSERSPROFILE%\Application Data\boost.interprocess or ProgramData\boost.interprocess, or somewhere else entirely.
What you want is not trivial and the interprocess_mutex definitively the wrong way to do.
What you may could do is remove the mutex on termination, by providing a remover destructor and/or in a catch(...). But this is not guaranteed to work, since it won't be done if you terminate the process directly (from the OS). Also it could accidently remove the mutex while your application starts twice.
One approach is to safe the process-id (for example in a shared memory) on the first time your program starts and remove it when it stops. Everytime you start the application read and check if the id still in process, if not, start the program.

DLL thread safety

I am developing a DLL in MS VC express c++ that will be loaded in multiple client applications at the same time, the DLL has a shared memory space created using data_seg(".SHARED_SPACE_NAME"). In this shared memory space there are some vectors that can be modified.
Lets assume we have a function in the DLL body called doCalc():
_DLLAPI void __stdcall doCalc(int argument)
{
//Add to vector
//Loop through vector
//Erase from vector
//etc.
}
If doCalc is called at the same time from two or more client applications the system will crash.
I want the doCalc calls to "wait in line" for the previous call to finish - like it was a single-threaded application.
So that if client 1 calls and then immediately after client 2 calls, then client 1 should finish the function, and then client 2 should run the function.
The best solution would be to run the DLL as a single thread but I have searched the internet a I do not think it is possible.
I have tried searching the internet for this issue, and I have come up with something about making the function static would make it thread safe.
I have also read that C++0x somehow will make this thread-safe. But that it is not supported in MS VC express.
I have no experience in multithreading, so I hope you can help. Thanks in advance.
The Windows API to use here would be CreateMutex. Create a named mutex object. As you need to manipulate the shared data, call WaitForSingleObject with the mutex handle, and when you are done, call ReleaseMutex. Each thread that calls WaitForSingleObject takes ownership of the mutex and any other thread that calls WaitForSingleObject will stall, until the owning thread calls ReleaseMutex.
Of course, I don't belive you can do what you want to do:
Dlls may be mapped in at different addresses in each process space. If so, all pointers will be incorrect.
C++ does not allow fine grained control over allocations and has many implicit allocations, especially when dealing with STL objects. I don't belive that you can get the vector to store all the relevant data in the shared area.
You are going to have to do this with C style raw arrays.
Looks like you need a system-wide mutex that will protect your critical section of code (the code that mustn't run simultaneously). Making the function static has nothing to do with it, because it doesn't prevent different applications from running it at the same time.
I think that Boost.Interprocess is exactly what you need. It will solve both, the synchronization problem, and the one that Jim Brissom said in his comment that you even haven't thought about yet.