Problems with statically linking Intel tbb - c++

I recently read the question "How to statically link to TBB?" and I still don't really understand the problems with using TBB as a statically linked library (which is possible with their makefile if you run make extra_inc=big_iron.inc tbb).
The answer seems to say that the problem is that there can be multiple singletons in a single program, whereas all (most?) implementations of a singleton don't let that happen. I don't understand the reasoning behind this.
Is the problem that when you fork() another process the singleton becomes two separate singletons in two separate processes? Is that what they mean by "program"? Also, if that's the case, why can they not mmap() shared memory and use that as the communication medium?
Also, doesn't dynamic linking only mean that the library itself is shared in memory, i.e. the code segment?
Thanks!

No, the singleton explanation refers to a single process, not the multiple-process case (though that case has some of the same issues with oversubscription and load balancing).
The dynamic linker makes sure that only one global data section exists for the library within the process and calls the global constructors exactly once, which is what implements the singleton.
With a statically linked TBB library, one can end up with multiple instances of the TBB thread pool working in the same process simultaneously, each coming from a different component of the application. This causes over-subscription, or worse if memory or some object allocated and registered in one instance of the scheduler gets used in another instance. This is especially easy to trigger because the TBB scheduler relies heavily on thread-local storage. Each instance of the scheduler uses its own TLS, which breaks the rules of nested parallelism (up to deadlock) and enables memory leaks and segfaults, because tasks allocated in one scheduler might end up being returned to another scheduler. Moreover, this situation might not be obvious to developers, who may not even intend to pass objects across module boundaries.
Sometimes such a situation happens even with dynamic linkage, e.g. when the TBB shared library is renamed for one of the application components. The TBB team is working to solve this issue.
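The effect described above can be imitated in a single file. This is only an analogy, not TBB code: the Runtime template, the AppCopy and PluginCopy tags, and the worker count of 4 are all made up for illustration. Each template instantiation stands in for one component that linked its own static copy of a runtime library, so each gets its own "singleton" and its own thread-local slot.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many runtime copies were brought up in this process.
// (Not thread-safe; the demo only touches it from one thread.)
inline int& live_runtime_count() {
    static int count = 0;
    return count;
}

// Stand-in for one statically linked copy of a runtime library.
// Each distinct Tag plays the role of one component (exe or DLL)
// that linked its own copy. All names here are hypothetical.
template <class Tag>
class Runtime {
public:
    static Runtime& instance() {
        static Runtime self;  // one per Tag, i.e. one per linked copy
        return self;
    }
    // Per-copy thread-local slot, standing in for scheduler TLS.
    static int& nesting_depth() {
        thread_local int depth = 0;
        return depth;
    }
    std::size_t worker_count() const { return workers_; }

private:
    Runtime() : workers_(4) {  // pretend each pool starts 4 workers
        ++live_runtime_count();
        assert(live_runtime_count() >= 1);
    }
    std::size_t workers_;
};

struct AppCopy {};     // the application's copy of the library
struct PluginCopy {};  // a plugin's copy of the same library

// Total workers the process ends up with when both copies are active.
std::size_t total_workers() {
    return Runtime<AppCopy>::instance().worker_count() +
           Runtime<PluginCopy>::instance().worker_count();
}
```

Calling total_workers() brings up two pools rather than one, and a depth recorded through Runtime<AppCopy>::nesting_depth() is invisible to Runtime<PluginCopy>, which is the kind of split state the answer warns about.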

Related

Can (Should?) a native shared library which creates its own threads support the using process exiting 'without warning'?

I work on a product that's usually built as a shared library.
The using application will load it, create some handles, use them, and eventually free all the handles and unload the library.
The library creates some background threads which are usually stopped at the point the handles are freed.
Now, the issue is that some consuming applications aren't super well-behaved, and will fail to free the handles in some cases (cancellation, errors, etc). Eventually, static destructors in our library run, and crash when they try to interact with the (now dead) background thread(s).
One possibility is to not have any global objects with destructors, and so to avoid running any code in the library during static destruction. This would probably solve the crash on process exit, but it would introduce leaks and crashes in the scenario where the application simply unloads the library without freeing the handles (as opposed to exiting), as we wouldn't ensure that the background threads are actually stopped before the code they were running was unloaded.
More importantly, to my knowledge, when main() exits, all other threads will be killed, wherever they happened to be at the time, which could leave locks locked, and invariants broken (for example, within the heap manager).
Given that, does it even make sense to try and support these buggy applications?
Yes, your library should allow the process to exit without warning. Perhaps in an ideal world every program using your library would carefully track the handles and free them all when it exits for any reason, but in practice this isn't a realistic requirement. The code path that is triggering the program exit might be a shared component that isn't even aware that your library is in use!
In any case, it is likely that your current architecture has a more general problem, because it is inherently unsafe for static destructors to interact with other threads.
From DllMain entry point in MSDN:
Because DLL notifications are serialized, entry-point functions should not attempt to communicate with other threads or processes. Deadlocks may occur as a result.
and
If your DLL is linked with the C run-time library (CRT), the entry point provided by the CRT calls the constructors and destructors for global and static C++ objects. Therefore, these restrictions for DllMain also apply to constructors and destructors and any code that is called from them.
In particular, if your destructors attempt to wait for your threads to exit, that is almost certain to deadlock in the case where the library is explicitly unloaded while the threads are still running. If the destructors don't wait, the process will crash when the code the threads are running disappears. I'm not sure why you aren't seeing that problem already; perhaps you are terminating the threads? (That's not safe either, although for different reasons.)
There are a number of ways to resolve this problem. Probably the simplest is the one you already mentioned:
One possibility is to not have any global objects with destructors, and so to avoid running any code in the library during static destruction.
You go on to say:
[...] but it would introduce leaks and crashes in the scenario where the application simply unloads the library without freeing the handles [...]
That's not your problem! The library will only be unloaded if the application explicitly chooses to do so; obviously, and unlike the earlier scenario, the code in question knows your library is present, so it is perfectly reasonable for you to require that it close all your handles before doing so.
Ideally, however, you would provide an uninitialization function that closes all the handles automatically, rather than requiring the application to close each handle individually. Explicit initialization and uninitialization functions also allow you to safely set up and free global resources, which is usually more efficient than doing all of your setup and teardown on a per-handle basis and is certainly safer than using global objects.
(See the link above for a full description of all the restrictions applicable to static constructors and destructors; they are quite extensive. Constructing all your globals in an explicit initialization routine, and destroying them in an explicit uninitialization routine, avoids the whole messy business.)
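A minimal sketch of the explicit initialization/uninitialization approach, assuming a library with a single background thread. The namespace mylib and the functions init, shutdown and is_running are hypothetical names, not part of any real API. The only global is a raw pointer, which has no destructor, so nothing in the library runs during static destruction.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

namespace mylib {  // hypothetical library namespace

struct State {
    std::atomic<bool> stop{false};
    std::atomic<int> ticks{0};
    std::thread worker;
};

// Raw pointer on purpose: it has a trivial destructor, so no library
// code runs at static-destruction time. If the process exits without
// calling shutdown(), the State is simply never destroyed.
State* g_state = nullptr;

// Assumption: init() and shutdown() are called from one thread at a time.
bool init() {
    if (g_state != nullptr) return false;  // already initialized
    State* s = new State;
    s->worker = std::thread([s] {
        while (!s->stop.load()) {
            s->ticks.fetch_add(1);
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    });
    g_state = s;
    return true;
}

// Stops and joins the background thread while the library code is
// still loaded; the application calls this before unloading.
bool shutdown() {
    if (g_state == nullptr) return false;  // nothing to do
    assert(g_state->worker.joinable());
    g_state->stop.store(true);
    g_state->worker.join();
    delete g_state;
    g_state = nullptr;
    return true;
}

bool is_running() { return g_state != nullptr; }

}  // namespace mylib
```

An application would call mylib::init() after loading the library and mylib::shutdown() before unloading it. A real library would also close any outstanding handles inside shutdown().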

POSIX Shared Memory Sync Across Processes C++/C++11

Problem (in short):
I'm using POSIX shared memory, so far with only POSIX semaphores, and I need to control multiple readers and multiple writers. I need help with what variables/methods I can use to control access within the limitations described below.
I've found an approach that I want to implement, but I'm unsure how to implement it when using POSIX shared memory.
What I've Found
https://stackoverflow.com/a/28140784
This link has the algorithm I'd like to use, but I'm unsure how to implement it with shared memory. Do I store the class in shared memory somehow? This is where I need help, please.
The reason I'm unsure is that a lot of my research points towards keeping shared memory to primitives only, to avoid addressing problems, and says that STL objects can't be used.
NOTE:
For all my multi-threading I'm using C++11 features. The shared memory will be used by completely separate program executables, each running C++11 std::threads, and any thread of any process/executable may want access. I have avoided Linux pthreads for all of my multi-threading and will continue to do so (except if it's just a control variable, not actual pthreads).
Solution Parameters aimed for
Must be shareable between 2+ processes which will be running multiple C++11 std::thread that may wish access. I.e. Multiple Writers (exclusive one at a time) while allowing multiple simultaneous readers when no writer wants access.
Not using Boost libraries. Ideally native C++11 or built-in Linux libraries, something that will work without the need to install additional libraries.
Not using actual pthread threads, but could use some object from there that will work with C++11 std::thread.
Ideally can handle a process crash while in operation. E.g. Using POSIX semaphore if a process crashes while it has the semaphore, everyone is screwed. I have seen people using file locks?
Thanks in advance
keeping shared memory to primitives only to avoid addressing problems
You can use pointers in and to shared memory objects across programs, so long as the memory is mmaped to the same address. This is actually a straightforward proposition, especially on 64 bit. See this open source C library I wrote for implementation details: rszshm - resizable pointer-safe shared memory.
Using POSIX semaphore if a process crashes while it has the semaphore, everyone is screwed.
If you want to use OS-mediated semaphores, the SysV semaphores have SEM_UNDO, which recovers in this case. On the other hand, pthreads offers robust mutexes that can be embedded in shared memory and shared between processes. These can be used to build more sophisticated mechanisms.
The SysV scheme of providing multiple semaphores in a semaphore set, where a group of actions must all succeed or the call blocks, permits building sophisticated mechanisms too. A read/write lock can be made with a set of three semaphores.
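The robust-mutex option can be sketched as follows, assuming Linux with glibc. The SharedBlock layout and the helper names are hypothetical. To stay self-contained the sketch uses an anonymous shared mapping inherited across fork(); unrelated executables would instead create the region with shm_open(), ftruncate() and mmap(). A crash is simulated by a child process that takes the lock and exits without releasing it.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical layout: the lock lives next to the data it guards.
struct SharedBlock {
    pthread_mutex_t lock;
    int counter;
};

// Map one SharedBlock into memory that child processes inherit,
// and initialize its mutex as process-shared and robust.
SharedBlock* create_shared_block() {
    void* mem = mmap(nullptr, sizeof(SharedBlock), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    SharedBlock* block = static_cast<SharedBlock*>(mem);

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(&block->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(mem, sizeof(SharedBlock));
        return nullptr;
    }
    block->counter = 0;
    return block;
}

// Lock the mutex. If the previous owner died while holding it, the
// lock call reports EOWNERDEAD; the caller then owns the mutex, should
// repair the protected data, and must mark the mutex consistent.
// Returns true when such a recovery took place.
bool lock_with_recovery(SharedBlock* block) {
    int rc = pthread_mutex_lock(&block->lock);
    if (rc == EOWNERDEAD) {
        pthread_mutex_consistent(&block->lock);
        return true;
    }
    assert(rc == 0);
    return false;
}

// Simulated crash: a child takes the lock and exits while holding it.
bool simulate_owner_crash(SharedBlock* block) {
    pid_t pid = fork();
    if (pid < 0) return false;
    if (pid == 0) {
        pthread_mutex_lock(&block->lock);
        block->counter = 41;
        _exit(0);  // dies without unlocking
    }
    int status = 0;
    return waitpid(pid, &status, 0) == pid;
}
```

After simulate_owner_crash(), the next lock_with_recovery() call returns true instead of blocking forever, and later lock/unlock cycles behave normally. A std::thread can call these functions like any other code, since the mutex is just an object in memory; a reader/writer policy would have to be built on top of it.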

Why should singletons be avoided in C++?

People use singletons everywhere. I recently read some threads on Stack Overflow saying that singletons should be avoided in C++, but it is not clear to me why.
Some might worry about memory leaks from pointers that are never deleted, since things such as exceptions can skip the cleanup code. But won't auto_ptr solve this problem?
In general, as mentioned in another answer, you should avoid mutable global data. It introduces a difficulty in tracking code side effects.
However your question is specifically about C++. You could, for instance, have global immutable data that is worth sharing in a singleton. In C++, specifically, it's nearly impossible to safely initialize a singleton in a multithreaded environment.
Multithreaded Environment
You can use the "construct on first use" idiom to make sure the singleton is properly initialized exactly by the time it is needed: http://www.parashift.com/c++-faq-lite/static-init-order.html.
However, what happens if you have 2 (or more) threads which all try to access the singleton for the first time, at exactly the same time? This scenario is not as far-fetched as it seems: the shared immutable data may be required by your calculateSomeData thread, and you may start several of these threads at the same time.
Reading the discussion linked above in the C++ FAQ Lite, you can see that it's a complex question in the first place. Adding threads makes it much harder.
On Linux, with gcc, the compiler solves this problem for you: function-local statics are initialized under a guard and the code is made safe for you. Before C++11 this was a compiler enhancement, and the standard required no such behavior; from C++11 onward the standard itself guarantees that a function-local static is initialized exactly once even when several threads reach it concurrently.
In MSVC versions before Visual Studio 2015 the compiler does not provide this utility for you and you get a crash instead. You may think "that's ok, I'll just put a mutex around my first use initialization!" However, the mutex itself suffers from exactly the same problem, itself needing to be static.
Without that guarantee, the only way to make sure your singleton is safe for threaded use is to initialize it very early in the program before any threads are started. This can be accomplished with a trick that causes the singleton to be initialized before main is called.
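A minimal sketch of the "construct on first use" idiom with a function-local static, assuming a C++11 or later compiler, where initialization of such a static is guaranteed to happen exactly once even under concurrent first calls. The Config class, its answer value and the race_for_first_use helper are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts constructor runs so the demo can check there was only one.
std::atomic<int> g_constructions{0};

// Hypothetical immutable singleton.
class Config {
public:
    static const Config& instance() {
        // Constructed on first use. With C++11 or later, concurrent
        // first callers wait until one of them finishes construction.
        static const Config cfg;
        return cfg;
    }
    int answer() const { return answer_; }

    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;

private:
    Config() : answer_(42) { g_constructions.fetch_add(1); }
    int answer_;
};

// Start several threads that all touch the singleton, then report
// how many times the constructor ran in total.
int race_for_first_use(int n_threads) {
    assert(n_threads > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) {
        threads.emplace_back([] { (void)Config::instance().answer(); });
    }
    for (auto& t : threads) t.join();
    return g_constructions.load();
}
```

On a pre-C++11 compiler without this guarantee, the same code could construct the object twice or hand out a half-built object, which is the failure mode described above.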
Singletons That Rely on Other Singletons
This problem can be mostly solved with the construct on first use idiom, but if you have the problem of initializing them before any threads are initialized, you can potentially introduce new problems.
Cross-platform Compatibility
If you plan on using your code on more than one platform, and compile shared libraries, expect some issues. Because the C++ standard specifies no ABI, each compiler and platform handles global statics differently. For instance, unless the symbols are explicitly exported in MSVC, each DLL will have its own instance of the singleton. On Linux, the singletons will be implicitly shared between shared libraries.
Avoid mutable global variables whether they're singletons or not, since they introduce unconstrained lines of communication: you don't know what part of the code is affecting what other parts, or when that happens.

How to use an old single-threaded C++ library in a multithreaded environment

I have an old C++ library which has been designed for use in single-threaded environments.
The library exposes the interfaces for initialization, which change the internal data structures of the library, and usage, which only reads data and makes calculations.
My objective is to use this library in a Windows multithreaded application, with different threads calling instances of the dll initialized with different data.
Assuming that rewriting the dll to allow multithreading would be prohibitive, is there some way to let multiple instances of a DLL exist in the same process, with separate memory spaces, or to obtain a similar result by other means?
If the DLL contains static resources, then those would be shared among all instances created.
One possible way would be to create a single instance and restrict access to it using some kind of lock mechanism. This may reduce performance depending on usage, but without modifying the internal structure of the DLL, it may be difficult to work with multiple instances.
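The single-instance-plus-lock approach can be sketched like this. The legacy namespace is a made-up stand-in for the old library (global state set by an initialization call, then read by a computation call), and SerializedLegacy is a hypothetical wrapper name. The key point is that initialization and use happen under the same lock, so no other thread can re-initialize the library in between.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the old single-threaded library: initialize() changes
// internal global state, compute() only reads it.
namespace legacy {
int g_scale = 0;
void initialize(int scale) { g_scale = scale; }
int compute(int x) { return g_scale * x; }
}  // namespace legacy

// Wrapper that serializes every initialize-then-compute pair.
class SerializedLegacy {
public:
    int compute_with(int scale, int x) {
        std::lock_guard<std::mutex> guard(mutex_);
        legacy::initialize(scale);  // load this caller's data
        return legacy::compute(x);  // use it before anyone else can
    }

private:
    std::mutex mutex_;
};

// Run n_threads workers, each with its own scale, and count how many
// results were wrong (expected: none while the lock is in place).
int count_wrong_results(int n_threads, int iterations) {
    assert(n_threads > 0 && iterations > 0);
    SerializedLegacy lib;
    std::atomic<int> wrong{0};
    std::vector<std::thread> threads;
    for (int t = 1; t <= n_threads; ++t) {
        threads.emplace_back([&lib, &wrong, t, iterations] {
            for (int i = 0; i < iterations; ++i) {
                if (lib.compute_with(t, i) != t * i) wrong.fetch_add(1);
            }
        });
    }
    for (auto& th : threads) th.join();
    return wrong.load();
}
```

The cost is that re-initialization happens on every call and all threads queue on one mutex, which is the performance trade-off mentioned above.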
The sharing of static resources between all threads attached to a single DLL within a process conspires against you here.
However, there is a trick to achieve this. So long as DLLs have different names, then the system regards them as being different and so separate instances of code and data are created.
The way to achieve this is, for each thread, copy the DLL to a temporary file and load from there with LoadLibrary. You have to use explicit linking (GetProcAddress) rather than lib files but that's really the only way.

Are Multiple singleton instances possible in a shared DLL?

I am going to develop a DLL for an MFC application, and suppose I have a singleton class in this DLL with some synchronization mechanism. This DLL is used by other processes, namely EXEs. The question is: is this singleton created only once for all the processes sharing the DLL, or does every process have its own singleton?
And how can I solve this multiple-singleton problem?
I suppose you are talking about Windows. In that case every process has its own singleton. You could place it in shared memory and use named synchronization primitives to share singleton between processes.
If based on the singleton pattern, it'll end up being one singleton per process. Note that if you run multiple threads within that process there will still only be one singleton.
It depends. By default, all data in a DLL is non-shared and all code is shared. But by using #pragma section ("SharedSingleton", read, write, shared) you create a data section named "SharedSingleton", which is shared across all users of the DLL.
Note that this does introduce security risks! Another troublesome issue you might encounter is the initialization of the singleton; C++ doesn't really understand the concept of shared sections.