I am not entirely sure what exactly the threading=multi flag does when building boost. The documentation says:
Causes the produced binaries to be thread-safe. This requires proper
support in the source code itself.
which does not seem to be very specific. Does this mean that the accesses to, for example, boost containers are guarded by mutexes/locks or similar? As the performance of my code is critical, I would like to minimize any unnecessary mutexes etc.
Some more details:
My code is a plug-in DLL which gets loaded into a multi-threaded third-party application. I statically link boost into the DLL (the plug-in is not allowed to have any other dependencies except standard Windows DLLs, so I am forced to do this).
Although the application is multi-threaded, most of the functions in my DLL are only ever called from a single thread, so accesses to containers need not be guarded. I explicitly guard the remaining places in my code, which can be called from multiple threads, by using boost::mutex and friends.
I've tried building boost with both threading=multi and threading=single and both seem to work but I'd really like to know what I am doing here.
No, threading=multi doesn't mean that things like boost containers will suddenly become safe for concurrent access by multiple threads (that would be prohibitively expensive from a performance point of view).
Rather, what it means in theory is that boost will be compiled to be thread-aware. Basically this means that boost methods and classes will behave in a reasonable default way when accessed from multiple threads, much like classes in the std library. This means that you cannot access the same object from multiple threads, unless otherwise documented, but you can safely access different objects from multiple threads. This might seem obvious, even without explicit support, but any static state used by the library would break that guarantee if not protected. Using threading=multi guarantees that any such shared state is properly guarded by a mutex or some other mechanism.
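To make the point about hidden static state concrete, here is a minimal sketch using only the standard library. Widget, next_id and construct_from_threads are hypothetical names invented for this illustration, not Boost code; the mutex plays the role of whatever protection a thread-aware build would add around shared internals.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical library type: every Widget draws its id from hidden static state.
class Widget {
public:
    Widget() : id_(next_id()) {}
    int id() const { return id_; }

    // Exposed only so the example can inspect the hidden counter.
    static int ids_issued() {
        std::lock_guard<std::mutex> lock(mutex());
        return counter();
    }

private:
    static std::mutex& mutex() { static std::mutex m; return m; }
    static int& counter() { static int c = 0; return c; }

    // Without this lock, two threads constructing unrelated Widgets
    // would race on the shared counter.
    static int next_id() {
        std::lock_guard<std::mutex> lock(mutex());
        return ++counter();
    }

    int id_;
};

// Each thread creates only its own Widgets, yet all of them touch the counter.
int construct_from_threads(int threads, int per_thread) {
    const int before = Widget::ids_issued();
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                Widget w;
                assert(w.id() > 0);
            }
        });
    }
    for (auto& w : workers) w.join();
    return Widget::ids_issued() - before;
}
```

With the lock in place, construct_from_threads(4, 10000) should report exactly 40000 ids issued. Removing the lock would turn the increment into a data race, even though no Widget is ever shared between threads.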
In the past, similar arguments or stdlib flavors were available for the C and C++ standard libraries supplied by compilers, although today mostly just the multithreaded versions are available.
There is likely little downside to compiling with threading=multi, given that only a limited amount of static state needs to be synchronized. Your comment that your library will mostly only be called by a single thread doesn't inspire a lot of confidence - after all, those are the kinds of latent bugs that will cause you to be woken up at 3 AM by your boss after a long night of drinking.
The example of boost's shared_ptr is informative. With threading=single, it is not even guaranteed that independent manipulation of two shared_ptr instances, from multiple threads, is safe. If they happen to point to the same object (or, in theory, under some exotic implementations even if they don't), you will get undefined behavior, because shared state won't be manipulated with the proper protections.
With threading=multi, this won't happen. However, it is still not safe to access the same shared_ptr instance from multiple threads. That is, it doesn't give any thread safety guarantees which aren't documented for the object in question - but it does give the "expected/reasonable/default" guarantees of independent objects being independent. There isn't a good name for this default level of thread-safety, but it is in fact what's generally offered by all standard libraries for multi-threaded languages today.
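As a rough illustration of "independent instances, shared state", the sketch below uses std::shared_ptr as a stand-in, on the assumption that it offers the same default guarantee described here for a thread-aware boost::shared_ptr. The function name copy_from_threads is made up for the example.

```cpp
#include <cassert>
#include <memory>
#include <thread>
#include <vector>

// Every thread owns a distinct shared_ptr instance, but all instances share
// one reference count, which is the hidden state that must be protected.
long copy_from_threads(int threads, int iterations) {
    auto root = std::make_shared<int>(42);
    {
        std::vector<std::thread> workers;
        for (int t = 0; t < threads; ++t) {
            // Capturing by value gives this thread its own instance.
            workers.emplace_back([mine = root, iterations] {
                for (int i = 0; i < iterations; ++i) {
                    std::shared_ptr<int> copy = mine;  // increments the shared count
                    assert(*copy == 42);
                }                                      // decrements it again
            });
        }
        for (auto& w : workers) w.join();
    }
    // Only `root` is left once every thread has finished.
    return root.use_count();
}
```

Because the count is updated atomically, copy_from_threads(4, 100000) returns 1. What this does not cover is several threads assigning to or resetting the same instance (for example, all of them writing to root), which still needs external locking.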
As a final point, it's worth noting that Boost.Thread is implicitly always compiled with threading=multi - since using boost's multithreaded classes is an implicit hint that multiple threads are present. Using Boost.Thread without multithreaded support would be nonsensical.
Now, all that said, the above is the theoretical idea behind compiling boost "with thread support" or "without thread support" which is the purpose of the threading= flag. In practice, since this flag was introduced, multithreading has become the default, and single threading the exception. Indeed, many compilers and linkers which defaulted to single threaded behavior, now default to multithreaded - or at least require only a single "hint" (e.g., the presence of -pthread on the command line) to flip to multithreaded.
Apart from that, there has also been a concerted effort to make the boost build be "smart" - in that it should flip to multithreaded mode when the environment favors it. That's pretty vague, but necessarily so. It gets as complicated as weak-linking the pthreads symbols, so that the decision to use MT or ST code is actually deferred to runtime - if pthreads is available at execution time, those symbols will be used, otherwise weakly linked stubs - that do nothing at all - will be used.
The bottom line is that threading=multi is correct, and harmless, for your scenario, and especially if you are producing a binary you'll distribute to other hosts. If you don't specify that, it is highly likely that it will work anyway, due to the build-time and even runtime heuristics, but you do run the risk of silently using empty stub methods, or otherwise using MT-unsafe code. There is little downside to using the right option - but some of the gory details can also be found in the comments to this point, and Igor's reply as well.
After some digging, it turns out that threading=single doesn't have as much effect as one would expect. In particular, it does not affect the BOOST_HAS_THREADS macro and thus doesn't configure libraries to assume a single-threaded environment.
With gcc, threading=multi just implies #define BOOST_HAS_PTHREADS, while with MSVC it doesn't produce any visible effect. In particular, _MT is defined in both threading=single and threading=multi modes.
Note, however, that one can explicitly configure Boost libraries for single-threaded mode by defining the appropriate macro, like BOOST_SP_DISABLE_THREADS, BOOST_ASIO_DISABLE_THREADS, or globally with BOOST_DISABLE_THREADS.
Let's be concise. "Causes the produced binaries to be thread-safe" means that Boost code is adapted so that different threads can use different Boost objects. In particular, the Boost code takes special care over the thread-safety of accesses to hidden global or static objects that might be used by the implementation of Boost libraries. It does not automatically allow different threads to use the same Boost objects at the same time without protection (locks/mutexes/...).
Edit: Some Boost libraries may document an extra thread-safety for specific functions or classes. For example asio::io_service as suggested by Igor R. in a comment.
The documentation says everything that is needed: this option ensures thread safety. That is, when programming in a multithreaded environment, you need to ensure certain properties like avoiding unrestricted access to e.g. variables.
I think, enabling this option is the way to go.
For further reference: BOOST libraries in multithreading-aware mode
Imagine a project with development stretched over a 10+ year timespan. Some parts are in C, some are in C++ and all of the code uses global functions and global variables. The architecture was designed to be inherently single-threaded and kept growing that way. But now we are considering utilizing many-core architectures.
Now one idea being evaluated is to refactor a part of the code into a library, to make it possible to create more than one instance, so that they can run in separate threads and don’t interfere with each other.
The proposal that gains the most traction at this point is to wrap all the library files into namespaces with macro defines, like:
namespace VARIANT {
// all the code
}
Then define the VARIANT in a header or on project level. This will make it possible to have different contexts within different namespaces. And the selling point is that this approach will require minimal code change and has low risk of introducing any regression.
But if at some point we need to make the behavior of Variant1 different from Variant2, things will get tricky, since there’s no way to compare the value of a macro define with a string in a preprocessor macro.
Is there a more elegant way to achieve this?
Another variant might be spotting all global variables and making them thread_local. This requires either C++11 or at least a compiler extension providing the same (__thread in older GCC).
If I read this question right, you don't even need to convert your C files into C++ files (which your approach requires, as C does not support namespaces...), but you would need C11 for that.
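A minimal sketch of the thread_local idea in C++11 follows. The names g_counter, legacy_step and run_instance are placeholders for whatever globals and functions the real code base has.

```cpp
#include <cassert>
#include <thread>

// Formerly a plain global shared by everything; now each thread gets its own copy.
thread_local int g_counter = 0;

// Legacy-style function that keeps using the "global" unchanged.
void legacy_step() { ++g_counter; }

// Runs `steps` calls on a fresh thread and reports that thread's private counter.
int run_instance(int steps) {
    assert(steps >= 0);
    int result = 0;
    std::thread worker([&result, steps] {
        for (int i = 0; i < steps; ++i) legacy_step();
        result = g_counter;  // reads this thread's copy only
    });
    worker.join();
    return result;
}
```

Each call to run_instance starts from a fresh zero-initialized copy, and the calling thread's own g_counter is never touched. The catch is that the state is tied to the thread rather than to an instance, so two instances cannot share one thread.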
Refactoring an old project to make it multithreaded is not so simple. First of all, you have a mixture of C and C++ code, so you cannot blindly follow a C++ approach here. Instead of namespaces, you need to think about the areas below:
Find all the code blocks and containers (lists, large arrays of objects, etc.) which need synchronization.
Find the interdependencies between threads and decide how you will control them. For example, one thread generates a report and inserts it into a table, and a second one needs that information to generate its final report. You need to find these kinds of dependencies among the threads in your code base and work out a control mechanism for them.
The old style of multithreading in C++ was very tricky, so you should migrate your code to C++11, where multithreading is much easier to implement.
As you said, your current project has lots of global variables, so you need to think carefully about how you are going to share these variables among different threads and how you will synchronize access to them.
These are some hints. You need to consider lots of areas in advance before starting the refactoring, or else all your efforts will go up in smoke.
GOOD LUCK for your plan.
Just do it in steps, testing each time:
1) typedef a struct with all the globals in it. malloc one, and edit the existing code to reference it. Test - should work exactly the same as with the globals.
2) Create one thread to run one instance of the code. Test - should work exactly the same as with the globals.
3) Try multiple threads.
One step at a time...
Please try very hard to not attempt any bodges!
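The steps above might look roughly like this in C++ (the answer suggests malloc and a typedef'd struct, and the same idea carries over to plain C). Context, accumulate and run_two_instances are hypothetical names chosen for the sketch.

```cpp
#include <cassert>
#include <thread>

// Step 1: everything that used to be a global lives in one struct.
struct Context {
    int total = 0;
    int calls = 0;
};

// A former global-using function, edited to reference the context instead.
void accumulate(Context& ctx, int value) {
    ctx.total += value;
    ++ctx.calls;
}

// Steps 2 and 3: one thread per instance; the instances share nothing.
void run_two_instances(Context& a, Context& b) {
    assert(&a != &b);  // sharing one Context between threads would need locking
    std::thread ta([&a] { for (int i = 1; i <= 10; ++i) accumulate(a, i); });
    std::thread tb([&b] { for (int i = 1; i <= 20; ++i) accumulate(b, i); });
    ta.join();
    tb.join();
}
```

Since each thread only ever touches its own Context, no synchronization is needed here. Anything that genuinely has to stay shared between instances is what still needs explicit guarding.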
I recently posted a question about stack segmentation and boost coroutines, but it seems like the -fsplit-stack approach only works with source files that are compiled with that flag: the runtime breaks down when you branch to another function that has not been compiled with -fsplit-stack. For example
This implies that the runtime uses a function-local technique to detect when the current stack has been exceeded, and not a "guard page signal" trick, where the end of the stack always has a guard page which will raise a signal on write or read, telling the runtime to allocate a new stack frame and branch to that.
Then what is the use of this flag? If I link to any other library that has not been built with this, code will break (even libstdc++ and libc), then how is this something people use practically with big projects?
From reading the gcc wiki about split stacks it seems like calling a non split stack function from a split stack function results in an allocation of a 64KB stack frame. Good.
But it seems like calling a non split stack function from a function pointer has not yet been implemented to follow the above scheme.
What use is this flag then? If I proceed to call any virtual function will my program break?
Further, from the answer below, it seems like clang has not implemented split stacks?
You have to compile both boost (at least boost.context and boost.coroutine) and your application with segmented-stacks support.
compile boost (boost.context and boost.coroutine) with b2 property segmented-stacks=on (enables special code inside boost.coroutine and boost.context).
your app has to be compiled with -DBOOST_USE_SEGMENTED_STACKS and -fsplit-stack (required by boost.coroutines headers).
see boost.coroutine documentation
boost.coroutine contains an example that demonstrates segmented stacks (in directory coroutine/example/asymmetric/ call b2 toolset=gcc segmented-stacks=on).
regarding your last question GCC Wiki states:
For calls from split-stack code to non-split-stack code, the linker
will change the initial instructions in the split-stack (caller)
function. This means that the linker will have to have special
knowledge of the instructions that the compiler emits. The effect of
the changes will be to increase the required framesize by a number
large enough to reasonably work for a non-split-stack. This will be a
target dependent number; the default will be something like 64K. Note
that this large stack will be released when the split-stack function
returns. Note that I'm disregarding the case of split-stack code in a
shared library calling non-split-stack code in the main executable;
that seems like an unlikely problem.
please note: while llvm supports segmented stacks, clang seems not to provide the __splitstack_<xyz> functions.
First I'd say split stack support is somewhat experimental in nature to begin with. It is not a widely supported thing nor has a single implementation become accepted as the way to go. As such, part of the purpose of it existing in the compiler is to enable research in real use.
That said, one generally wants to use such a feature to enable lots of threads with small stacks, but which can get bigger if they need to. In some applications, the code that runs in these threads can be tightly controlled. E.g. fairly specialized request handlers that do not call general purpose libraries such as Boost. High performance systems work often involves tightening down the constraints on what code is used in a given path and this would be an example thereof. It certainly limits the applicability of the feature, but I wouldn't be surprised if someone is using it in production this way.
Note that similar issues exist with flags such as -fno-exceptions and -fno-rtti. Generally C++ requires compiling everything that goes into an executable with a compatible set of flags. Sometimes one can mix and match, but it is often fragile. This is part of the motivation for building everything from source and for hermetic build tools like bazel. Other languages have different approaches to non-source components, especially virtual machine based languages such as Java and the .NET family. In those worlds things like split stacks are decided at a lower level of compilation, and typically one would not have any control over or awareness of them at the source code level.
Why are all libraries in Boost not headers-only?
Saying it differently, what makes the use of .lib/.dll mandatory?
Is it when a class can't be a template or has static fields?
Different points, I guess.
Binary size. Could header-only put a size burden on the client?
Compilation times. Could header-only mean a significant decrease in compilation performance?
Runtime Performance. Could header-only give superior performance?
Restrictions. Does the design require header-only?
About binary size (and a bit of security).
If there's a lot of reachable code in the boost library, or code about which the compiler can't argue whether it is reachable by the client, it has to be put into the final binary. (*)
On operating systems that have package management (e.g. RPM- or .deb-based), shared libraries can mean a big decrease in binary distribution size and have a security advantage: security fixes are distributed faster and are then automatically used by all .so/.DLL users. So you have one recompile and one redistribution, but N beneficiaries. With a header-only library, you have N recompiles and N redistributions for each fix, and some members of those N are huge in themselves already.
(*) reachable here means "potentially executed"
About compilation times.
Some boost libraries are huge. If you would #include it all, each time you change a bit in your source-file, you have to recompile everything you #included.
This can be counter-measured with cherry picked headers, e.g.
#include <boost/huge-boost-library.hpp> // < BAD
#include <boost/huge-boost-library/just-a-part-of-it.hpp> // < BETTER
but sometimes the stuff you really need to include is already big enough to cripple your recompiles.
The countermeasure is to make it a static or shared library, in turn meaning "compile completely exactly once (until the next boost update)".
About runtime performance.
We are still not in an age where global optimization solves all of our C++ performance problems. To make sure you give the compiler all the information it needs, you can make stuff header-only and let the compiler make inlining decisions.
In that respect, note that inlining does not always give superior performance, because of caching and speculation issues on the CPU.
Note also that this argument is mostly with regards to boost libraries that might be used frequently enough, e.g. one could expect boost::shared_ptr<> to be used very often, and thus be a relevant performance factor.
But consider the real and only relevant reason boost::shared_ptr<> is header-only ...
About restrictions.
Some stuff in C++ can not be put into libraries, namely templates and enumerations.
But note that this is only halfway true. You can write typesafe, templated interfaces to your real data structures and algorithms, which in turn have their runtime-generic implementation in a library.
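One way to picture this split is sketched below with hypothetical names RawStack and PtrStack, neither of which comes from Boost. The untyped class stands for the part that could be compiled once into a library, and the template is the thin header-only layer on top.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Would live in a compiled library (.cpp): untyped, so it is compiled exactly once.
class RawStack {
public:
    void push(void* item) { items_.push_back(item); }
    void* pop() {
        assert(!items_.empty());
        void* top = items_.back();
        items_.pop_back();
        return top;
    }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<void*> items_;
};

// Header-only part: a thin, trivially inlinable wrapper that restores type safety.
template <class T>
class PtrStack {
public:
    void push(T* item) { raw_.push(item); }
    T* pop() { return static_cast<T*>(raw_.pop()); }
    std::size_t size() const { return raw_.size(); }

private:
    RawStack raw_;
};
```

Here everything sits in one file for brevity. In a real layout RawStack's member functions would be defined in a source file, so that instantiating PtrStack for many types adds almost no code to the client.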
Likewise, some stuff in C++ should be put into source files, and in case of boost, libraries. Basically, this is everything that would give "multiple definition" errors, like static member variables or global variables in general.
Some examples can also be found in the standard library: std::cout is defined in the standard as extern ostream cout;, and so cout basically requires the distribution of something (library or sourcefile) that defines it once and only once.
In unix environments the makecontext()/swapcontext() family of functions is sometimes used to implement coroutines in C. However, these functions directly manipulate the stack and the execution flow, and such low-level functionality often behaves quite differently when switching from C to C++.
So the question is whether there would be any problem with implementing coroutines in C++ using makecontext() and swapcontext(). Of course, one would obviously have to take very good care that an exception could never escape such a coroutine, since there would be no exception handler on the stack for it and the program would most likely segfault. But other than that, is there any incompatibility between the way C++ handles things internally and the way makecontext() and setcontext() modify the execution path?
I've used makecontext()/swapcontext() with C++ code before, and as you say, the main thing to watch out for is exceptions. Beyond that I haven't had any trouble. Despite their obsolescence according to the standard, they're still well-supported by unix-like operating systems. (There is a caveat for Mac OS X: you have to #define _XOPEN_SOURCE before #including the relevant headers.) The rationale for making them obsolete is pretty lame, too - they could have replaced them with a pthreads-like version, where the function pointer takes a single void* argument.
As you say, threads aren't a useful substitute, so I'd go ahead and use swapcontext(). You may also find this blog post interesting for rolling your own version of the functions.
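For reference, a minimal sketch of the pattern discussed here, assuming Linux with glibc's <ucontext.h>. The names coro_body, run_coroutine and trace are invented for the example, and the 64 KB stack size is an arbitrary choice. The coroutine catches its own exception so that nothing propagates past the makecontext() entry point.

```cpp
#include <ucontext.h>

#include <cassert>
#include <stdexcept>
#include <vector>

namespace {
ucontext_t main_ctx, coro_ctx;
std::vector<int> trace;  // records the order in which code runs

void coro_body() {
    try {
        trace.push_back(1);
        swapcontext(&coro_ctx, &main_ctx);  // yield back to the caller
        trace.push_back(3);
        throw std::runtime_error("must not leave the coroutine");
    } catch (const std::exception&) {
        trace.push_back(4);  // handled on the coroutine's own stack
    }
    // Returning normally resumes uc_link, i.e. main_ctx.
}
}  // namespace

std::vector<int> run_coroutine() {
    static char stack[64 * 1024];  // assumed large enough for this body
    trace.clear();

    int rc = getcontext(&coro_ctx);
    assert(rc == 0);
    (void)rc;
    coro_ctx.uc_stack.ss_sp = stack;
    coro_ctx.uc_stack.ss_size = sizeof stack;
    coro_ctx.uc_link = &main_ctx;
    makecontext(&coro_ctx, coro_body, 0);

    swapcontext(&main_ctx, &coro_ctx);  // run until the first yield
    trace.push_back(2);
    swapcontext(&main_ctx, &coro_ctx);  // resume until the body returns
    trace.push_back(5);
    return trace;
}
```

The recorded order is 1, 2, 3, 4, 5, showing control bouncing between the two contexts. Objects with destructors that are still alive on the coroutine stack when it is abandoned would never be destroyed, so letting the body run to completion, as here, is the simplest way to stay safe.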
The C++ standard 3.6.3 states
Destructors for initialized objects of static duration are called as a result of returning from main and as a result of calling exit
On Windows you have FreeLibrary and on Linux you have dlclose to unload a dynamically linked library. And you can call these functions before returning from main.
A side effect of unloading a shared library is that all destructors for static objects defined in the library are run.
Does this mean it violates the C++ standard, as these destructors have been run prematurely?
It's a meaningless question. The C++ standard doesn't say what dlclose does or should do.
If the standard were to include a specification for dlclose, it would certainly point out that dlclose is an exception to 3.6.3. So then 3.6.3 wouldn't be violated because it would be a documented exception. But we can't know that, since it doesn't cover it.
What effect dlclose has on the guarantees in the C++ standard is simply outside the scope of that standard. Nothing dlclose can do can violate the C++ standard because the standard says nothing about it.
(If this were to happen without the program doing anything specific to invoke it, then you would have a reasonable argument that the standard is being violated.)
Parapura, it may be helpful to keep in mind that the C++ standard is a language definition that imposes constraints on how the compiler converts source code into object code.
The standard does not impose constraints on the operating system, hardware, or anything else.
If a user powers off his machine, is that a violation of the C++ standard? Of course not. Does the standard need to say "unless the user powers off the device" as an "exception" to every rule? That would be silly.
Similarly, if an operating system kills a process or forces the freeing of some system resources, or even allows a third party program to clobber your data structures -- this is not a violation of the C++ standard. It may well be a bug in the OS, but the C++ language definition remains intact.
The standard is only binding on compilers, and forces the resulting executable code to have certain properties. Nevertheless, it does not bind runtime behavior, which is why we spend so much time on exception handling.
I'm taking this to be a bit of an open-ended question.
I'd say it's like this: The standard only defines what a program is. And a program (a "hosted" one, I should add) is a collection of compiled and linked translation units that has a unique main entry point.
A shared library has no such thing, so it doesn't even constitute a "program" in the sense of the standard. It's just a bunch of linked executable code without any sort of "flow". If you use load-time linking, the library becomes part of the program, and all is as expected. But if you use runtime linking, the situation is different.
Therefore, you may like to view it like this: global variables in the runtime-linked shared object are essentially dynamic objects which are constructed by the dynamic loader, and which are destroyed when the library is unloaded. The fact that those objects are declared like global objects doesn't change that, since the objects aren't part of a "program" at that point.
They are only run prematurely if you go to great effort to do so - the default behavior is standard conforming.
If it does violate the standard, who is the violator? The C++ compiler cannot be considered the violator (since things are being loaded dynamically via a library call); thus it must be the vendor of the dynamic loading functionality, aka the OS vendor. Are OS vendors bound by the C++ standard when designing their systems? That definitely seems to be outside of the scope of the standard.
Or for another perspective, consider the library itself to be a separate program providing some sort of service. When this program is terminated (by whatever means the library is unloaded) then all associated service objects should disappear as well, static or not.
This is just one of the tons and tons of platform-specific "extensions" (for a target compiler, architecture, OS, etc) that are available. All of which "violate" the standard in all sorts of ways. But there is only one expected consequence for deviating from standard C++: you aren't portable anymore. (Unless you do a lot of #ifdef or something, but still, that particular code is locked in to that platform).
Since there is currently no standard/cross-platform notion of libraries, if you want the feature, you have to either not use it or re-implement it per-platform. Since similar things are appearing on most platforms, maybe the standard will one day find a clean way to abstract them so that the standard covers them. The advantage will be a cross-platform solution and it will simplify cross platform code.