I am developing a DLL in MS Visual C++ Express that will be loaded by multiple client applications at the same time. The DLL has a shared memory section created using data_seg(".SHARED_SPACE_NAME"), and in this shared section there are some vectors that can be modified.
Let's assume we have a function in the DLL body called doCalc():
_DLLAPI void __stdcall doCalc(int argument)
{
//Add to vector
//Loop through vector
//Erase from vector
//etc.
}
If doCalc is called at the same time from two or more client applications the system will crash.
I want the doCalc calls to "wait in line" for the previous call to finish - like it was a single-threaded application.
So that if client 1 calls and then immediately after client 2 calls, then client 1 should finish the function, and then client 2 should run the function.
The best solution would be to run the DLL as a single thread, but from what I have found on the internet I do not think that is possible.
I have searched the internet for this issue and found suggestions that making the function static would make it thread safe.
I have also read that C++0x will somehow make this thread-safe, but that it is not supported in MS VC Express.
I have no experience in multithreading, so I hope you can help. Thanks in advance.
The Windows API to use here would be CreateMutex. Create a named mutex object. As you need to manipulate the shared data, call WaitForSingleObject with the mutex handle, and when you are done, call ReleaseMutex. Each thread that calls WaitForSingleObject takes ownership of the mutex and any other thread that calls WaitForSingleObject will stall, until the owning thread calls ReleaseMutex.
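A minimal sketch of that pattern, assuming a named mutex (the name "Global\\MyDllCalcMutex" and the body of the function are illustrative only):
#include <windows.h>

_DLLAPI void __stdcall doCalc(int argument)
{
    // Named mutexes are visible across processes, so every client waits in line here.
    HANDLE hMutex = CreateMutexW(NULL, FALSE, L"Global\\MyDllCalcMutex");
    if (hMutex == NULL)
        return;                              // could not create or open the mutex

    WaitForSingleObject(hMutex, INFINITE);   // block until the previous caller is done

    // Add to vector, loop through vector, erase from vector, etc.

    ReleaseMutex(hMutex);                    // let the next caller proceed
    CloseHandle(hMutex);
}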
Of course, I don't believe you can do what you want to do:
DLLs may be mapped at different addresses in each process's address space. If so, all pointers will be incorrect.
C++ does not allow fine-grained control over allocations and has many implicit allocations, especially when dealing with STL objects. I don't believe you can get the vector to store all the relevant data in the shared area.
You are going to have to do this with C-style raw arrays.
Looks like you need a system-wide mutex that will protect your critical section of code (the code that mustn't run simultaneously). Making the function static has nothing to do with it, because it doesn't prevent different applications from running it at the same time.
I think that Boost.Interprocess is exactly what you need. It will solve both the synchronization problem and the one Jim Brissom mentioned in his comment that you haven't even thought about yet.
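For illustration, here is roughly what doCalc() could look like with Boost.Interprocess handling both the locking and the shared storage; the segment, mutex and vector names are made up, and this is a sketch rather than a drop-in implementation:
#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bip = boost::interprocess;

typedef bip::allocator<int, bip::managed_shared_memory::segment_manager> ShmAllocator;
typedef bip::vector<int, ShmAllocator> ShmVector;

_DLLAPI void __stdcall doCalc(int argument)
{
    // Named mutex shared by every process that has loaded the DLL.
    bip::named_mutex mtx(bip::open_or_create, "my_dll_mutex");
    bip::scoped_lock<bip::named_mutex> lock(mtx);

    // Shared segment; the vector lives inside it and uses offset pointers,
    // so it works even when the segment maps at different addresses.
    bip::managed_shared_memory segment(bip::open_or_create, "my_dll_segment", 65536);
    ShmAllocator alloc(segment.get_segment_manager());
    ShmVector *vec = segment.find_or_construct<ShmVector>("shared_vector")(alloc);

    vec->push_back(argument);
    // Loop through vector, erase from vector, etc., while holding the lock.
}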
Related
I would like to have your opinion on this general technical concept. (I am working on the Microsoft Windows OS.)
There is a Process, this process creates multiple threads for different tasks.
Main process: it is a Windows service written in C#.
There are several threads created inside the main process: Thread_01, Thread_02, ...
Inside Thread_01: there is a wrapper DLL written in managed C++ to consume DLL_01. (DLL_01 is a DLL written by me in native C++ that provides some APIs: Add, Remove, Connect.)
Add and Remove can run very fast, but Connect may take more than 10 seconds and blocks the caller until it finishes.
I am thinking of using std::async to run the Connect code and send the result through a callback to the caller (the main process).
Is this a good approach? I have heard that we cannot, or at least should not, create threads from inside other threads. Is that true? If so, what about std::async?
Any recommendation is appreciated.
Thanks in advance,
None of what you describe makes the use of threads unacceptable for your code.
As usual, threads have issues that need to be cared for:
Data races due to access to shared data.
Ownership of resources is now not just a question of "Who owns what?" but "Who owns what, and when?".
When a thread is blocked and you want to abort this operation, how do you cancel it without causing issues down the line? In your case, you must avoid calling the callback when the receiver doesn't exist any more.
Concerning your approach of using a callback, consider std::future<> instead. This takes care of a few of the issues above, though some are only shifted to the caller instead.
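For example, a rough sketch of the std::future approach; Connect here is just a stand-in for the call into the native DLL:
#include <chrono>
#include <future>
#include <iostream>
#include <thread>

// Hypothetical stand-in for the native Connect API; the real one may block for 10+ seconds.
bool Connect()
{
    std::this_thread::sleep_for(std::chrono::seconds(2));
    return true;
}

int main()
{
    // Run Connect on its own thread; the caller holds a future instead of
    // registering a callback, so result delivery and lifetime stay with the caller.
    std::future<bool> result = std::async(std::launch::async, Connect);

    // The caller stays responsive and can poll, or simply call get() when it
    // actually needs the answer.
    while (result.wait_for(std::chrono::milliseconds(100)) != std::future_status::ready) {
        // do other work here
    }

    std::cout << "Connect " << (result.get() ? "succeeded" : "failed") << std::endl;
    return 0;
}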
There is this multiplatform (Windows, Linux, Cygwin) dynamic library which is loaded at run time by a Cygwin executable. At some point during the normal workflow, the DLL allocates a pool of threads for its use. These threads are managed through global variables (reference counted), so when the client process shuts down and starts releasing global objects, the threads should be released too.
The issue, as I understand it, is that during process shutdown the loader lock is acquired, and further down the line the threads want to acquire the same lock, so we now have a deadlock.
So my question is: how can we achieve a clean shutdown?
The DLL has no init() or uninit() methods to be called. At best, the client can be enhanced with some code before the end of main() (so this runs before process shutdown).
If I detach the threads instead of joining them during the global variable cleanup, memory gets corrupted. If I terminate them, we get ugly process dumps.
Btw, under Linux I see no such problems.
DLL is only C++14, client is C99 (Cygwin).
I tried to make the situation clear, but let me know if you have further questions. Thanks in advance for any ideas.
The fix is to add an uninit method to the DLL. It may not have one yet, but it needs one. You found out why: while the OS will call DllMain on DLL unload, it does so under loader lock. You need to do things that aren't possible under loader lock, so you need an extra call before DllMain. Naming that method uninit() is reasonable enough.
C++14 is not an issue here; this is an OS mechanism. Loader Lock has been around since ancient times.
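A minimal sketch of that arrangement on the Windows side, assuming the pool is held in a global container; the names dll_uninit and g_workers are made up:
#include <windows.h>
#include <thread>
#include <vector>

static std::vector<std::thread> g_workers;   // the pool owned by the DLL

extern "C" __declspec(dllexport) void dll_uninit()
{
    // Called by the client at the end of main(), before process shutdown,
    // so joining here does not happen under the loader lock.
    for (std::thread &t : g_workers)
        if (t.joinable())
            t.join();
    g_workers.clear();
}

BOOL WINAPI DllMain(HINSTANCE, DWORD, LPVOID)
{
    // Deliberately do nothing on DLL_PROCESS_DETACH: the loader lock is held
    // here, and joining threads from DllMain is exactly what deadlocks.
    return TRUE;
}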
At a previous job I struggled with this issue for a pretty lengthy period. Ultimately it came down to only 2 possible solutions:
Clean up every single resource the thread has claim to, then TerminateThread. It's violent and ugly but it works around the THREAD_DETACH issue and I actually found it advised on the Internet.
If you have the luxury of being able to get advance notice prior to PROCESS_DETACH - clean up everything at that early point, including orderly shutdown of your threads. Then by all means do absolutely nothing during PROCESS_DETACH - yes, don't even free any lingering heap objects, as you may be exposing yourself to deadlock or crash risks and the process is going down and freeing up all its resources anyway.
As an added note, I also learned to avoid at all costs having any global variables tied to the DLL's lifetime. These will have their constructors and destructors executed in the DllMain context, which says it all. If you need global singletons in a DLL, make sure you have manual control over their lifetimes (on both ends, so no auto-destruct smart pointers either).
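For illustration, manual lifetime control can be as simple as something like this (the names are hypothetical):
class ThreadPool { /* threads, reference counting, ... */ };

static ThreadPool *g_pool = 0;   // raw pointer: no constructor or destructor runs in DllMain

void dll_init()   { if (!g_pool) g_pool = new ThreadPool(); }
void dll_uninit() { delete g_pool; g_pool = 0; }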
I want to switch from CreateMutex to boost::interprocess::named_mutex to limit my application to a single instance. Both methods work fine when the application runs and exits normally. However, the lock is not released when the application crashes while using boost::interprocess::named_mutex. I could work around the issue by using two named_mutex objects, but I don't really understand it.
Why is the lock of boost::interprocess::named_mutex not released when the application crashes, while it is released with CreateMutex? What's the difference?
boost::interprocess::named_mutex mutex(boost::interprocess::open_or_create, "my_mutex");
boost::interprocess::scoped_lock<boost::interprocess::named_mutex> lock(mutex, boost::interprocess::try_to_lock);
if(!lock) {
return 1; //exit
}
//application may crash here.
boost::interprocess::named_mutex::remove("my_mutex");
return 1; //exit
Caveat: I've not spent much time with boost::interprocess, so this information is just from a quick inspection of the source. That said, I've used the Windows synchronisation API's a lot, so here goes...
The main difference between the two methods of interprocess synchronisation is how the object exists within the system.
With boost::interprocess::named_mutex, as well as a system-specific mutex, it looks like a synchronisation object is created as a file on the system. The location of the file is based on registry entries (see note 1) (at least in Boost 1.54.0); it's most likely located under the Common Application Data folder (see note 2). When the application crashes, this file is, in your case, not removed. I'm not sure if this is by design; however, in the case of an application crash it's perhaps best not to mess with the file system, just in case.
Conversely, when you use CreateMutex, an object is created in kernel mode, and for named mutexes it can be accessed by several applications. You get a handle to the mutex by specifying its name when you create it, and you give the handle up when you call CloseHandle on it. The mutex object is destroyed when there are no more handles referencing it.
The important part of this is in the documentation:
The system closes the handle automatically when the process terminates. The mutex object is destroyed when its last handle has been closed.
This basically means that Windows will clean up after your application.
Note that if you don't perform a ReleaseMutex, and your application owns the mutex when it dies, then it's possible/likely that a waiting thread or process would see that the mutex had been abandoned (WaitForSingleObject returns WAIT_ABANDONED), and would gain ownership.
I apologise for not providing a solution, but I hope it answers your question about why the two systems act differently.
Just as an aside, using registry entries to get this information is horrible - it would be safer, and more future-proof, to use SHGetKnownFolderPath. But I digress.
Depending on your OS version, this could be %ALLUSERSPROFILE%\Application Data\boost.interprocess or ProgramData\boost.interprocess, or somewhere else entirely.
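As an illustration of the kernel-object behaviour described above, here is a rough single-instance sketch using CreateMutex; the mutex name is arbitrary:
#include <windows.h>

int main()
{
    // The mutex is a kernel object: Windows closes the handle and destroys the
    // object automatically even if the application crashes, so a later instance
    // is never blocked by a stale lock.
    HANDLE h = CreateMutexW(NULL, FALSE, L"Global\\MyAppSingleInstance");
    if (h == NULL)
        return 1;

    if (GetLastError() == ERROR_ALREADY_EXISTS) {
        CloseHandle(h);
        return 1;   // another instance is already running
    }

    // ... application runs here ...

    CloseHandle(h);
    return 0;
}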
What you want is not trivial, and interprocess_mutex is definitely the wrong way to do it.
What you could do is remove the mutex on termination, for example from the destructor of a remover object and/or in a catch(...) handler. But this is not guaranteed to work, since it won't run if the process is terminated directly (by the OS). It could also accidentally remove the mutex while a second instance of your application is starting.
One approach is to save the process ID (for example in shared memory) the first time your program starts and remove it when the program stops. Every time you start the application, read the stored ID and check whether that process is still running; if not, start the program.
This isn't so much of a problem now as I've implemented my own collection but still a little curious on this one.
I've got a singleton which provides access to various common components. It holds instances of these components keyed by thread ID, so each thread should (and does, I checked) have its own instance of a component such as an Oracle database access library.
When running the system (which is a C++ library being called by a C# application) with multiple incoming requests, everything seems to run fine for a while but then it crashes with an AccessViolation exception. Stepping through with the debugger, the problem appears to be that when one thread finishes and clears out its session information (held in a std::map object), the session information held in a separate collection instance for the other thread also appears to be cleared out.
Is this something anyone else has encountered or knows about? I've tried having a look around but can't find anything about this kind of problem.
Cheers
Standard C++ containers do not concern themselves with thread safety much. Your code sounds like it is modifying the map instance from two different threads or modifying the map in one thread and reading from it in another. That is obviously wrong. Use some locking primitives to synchronize the access between the threads.
If all you want is a separate object for each thread, you might want to take a look at boost::thread_specific_ptr.
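For example, a small sketch of the thread_specific_ptr approach; Session is a made-up stand-in for your per-thread component:
#include <boost/thread/tss.hpp>

struct Session { /* Oracle handles, per-thread state, ... */ };

// One global object, but get()/reset() operate on a separate Session per thread,
// and each instance is deleted automatically when its owning thread exits.
static boost::thread_specific_ptr<Session> g_session;

Session &currentSession()
{
    if (g_session.get() == 0)
        g_session.reset(new Session());   // first use on this thread
    return *g_session;
}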
How do you manage giving each thread its own session information? Somewhere under there you have classes managing the lifetimes of these objects, and this is where it appears to be going wrong.
I have a performance issue where clients are creating hundreds of a particular kind of object "Foo" in my C++ application's DOM. Each Foo instance has its own asynchronous work queue with its own thread. Obviously, that doesn't scale.
I need to share threads amongst work queues, and I don't want to re-invent the wheel. I need to support XP, so I can't use the Vista/Win7 thread pool. The work that needs to be done to process each queue item involves making COM calls in the multi-threaded COM apartment. The documentation for the XP thread pool says that it is okay to call CoInitializeEx() with the MTA apartment in the thread worker function callback. I've written a test app and verified that this works. I made the app run 1 million iterations with and without a CoInitializeEx/CoUninitialize pair in the WorkItem callback function. It takes 35 seconds with the CoInit* calls and 5 seconds without them. That's way too much overhead for my application. Since the thread pool is per-process and 3rd-party code runs in my process, I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
Given all of that, is there any way that I can use the Win32 thread pool? Am I missing something, or is the XP thread pool pretty useless for high-performance COM applications? Am I just going to have to create my own thread-sharing system?
Have you verified what is taking so long? That is, is it the call to CoInitializeEx()? You definitely don't need to call CoInitialize once per task. You also don't say how many threads you spawn: if you're running on a dual core and your work is CPU intensive, don't expect more than a 2x speedup; and if your work isn't CPU intensive then it's waiting on some resource (memory, disk, net) and speedups will be similarly constrained, perhaps made worse if there is a lock being held for that resource.
If you can use Visual Studio 2010 take a look at the Parallel Pattern Library and Asynchronous Agents Library, there are a couple tools that can help make this take less code to write.
If you can't you can at least try placing a token in TLS that represents whether COM has been initialized on that thread and use the presence of this token to bypass your calls to CoInitialize when they aren't needed.
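Something along these lines, as a rough sketch; the names are made up and the TLS slot handling is simplified:
#include <windows.h>
#include <objbase.h>

static const DWORD g_comInitSlot = TlsAlloc();

static void EnsureComOnThisThread()
{
    // Pay for CoInitializeEx only the first time this pool thread runs a work item.
    if (TlsGetValue(g_comInitSlot) == NULL) {
        CoInitializeEx(NULL, COINIT_MULTITHREADED);
        TlsSetValue(g_comInitSlot, (LPVOID)1);
    }
}

static DWORD WINAPI WorkItemCallback(LPVOID /*param*/)
{
    EnsureComOnThisThread();
    // ... make COM calls in the MTA ...
    return 0;
}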
I'm assuming it isn't safe to CoInitializeEx() once per thread and never CoUninitialize().
Windows will clean up if a thread exits without calling CoUninitialize, we know this works because if it didn't there would be no cleanup when threads crash or are aborted.
So the only way this hack could cause a problem is if someone were trying to queue work items that needed an STA apartment, which seems unlikely.
I'd be tempted to go for it.