So I'm reading about monitors vs mutexes and finding mentions that suggest that monitors are faster than mutexes because they don't lock system-wide but rather only across the threads of a given process.
Is there some way in C++ to accomplish or simulate this?
Edit: I'm curious now what the difference is between a system-wide mutex and one restricted to a specific process.
The C++ standard does not distinguish system-wide from per-process primitives, so it does not specify whether std::mutex is system-wide.
Reasonable implementations have an efficient per-process std::mutex; to get a system-wide mutex you'll need to use libraries or operating system objects for your platform.
The difference is that a per-process mutex may use ordinary memory operations to avoid system calls, since the process's memory is shared among its threads. Atomic operations on that memory are more efficient, and they often let the lock avoid a system call altogether. A system-wide mutex will either start with system calls (not efficient), or will have to use shared memory (which might be unsafe, and may still have some overhead).
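To make the per-process case concrete, here is a minimal sketch of a std::mutex guarding a counter shared by several threads of one process. The function name count_with_mutex is made up for illustration, and nothing here says how a particular standard library implements the lock internally.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Runs `threads` threads that each add 1 to a shared counter `iterations`
// times under a std::mutex, then returns the final count.
long count_with_mutex(int threads, int iterations) {
    long counter = 0;
    std::mutex m;  // only reachable by threads of this process
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, &m, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(m);  // unlocks at scope exit
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Calling count_with_mutex(4, 10000) should give 40000, since every increment happens while the lock is held.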
The answer by @Alex Guteniev is as accurate as one can get (and should be considered the accepted answer). It states that the C++ standard doesn't define a system-wide concept, and that mutexes are, for all practical purposes, per-process, i.e. for synchronization between threads (execution agents) in a single process (and therefore suited to your needs). The C++ standard makes it clear what a thread (std::thread) is (33.3 - ... intended to map one-to-one with OS threads (in my draft, at least...N4687)).
Since VC2015, Microsoft has improved its implementation to use Windows primitives, as stated here. This is also indicated here in the most upvoted answer. I've also looked at the Boost library implementations (which often precede/influence the C++ standard) for Microsoft and (AFAICT) they don't use any inter-process calls.
So, to answer your question: in C++, mutexes and monitors are practically the same thing, if this definition is to be considered accurate.
Update: I stumbled across the answer to this while researching something related.
On Windows, Critical Sections can be used within a single process instead of system-wide mutexes and are often faster:
Edit:
While the above statement is correct, C++ doesn't have the concept of a system-wide mutex. This concept only exists when using OS-specific primitives such as Win32 CreateMutex and is not relevant to standard C++.
Source:
std::mutex performance compared to win32 CRITICAL_SECTION
On Linux, pthread mutexes are per-process by default (PTHREAD_PROCESS_PRIVATE).
As far as I know,
In computer science, a thread of execution is the smallest sequence of
programmed instructions that can be managed independently by a
scheduler, which is typically a part of the operating system. The
implementation of threads and processes differs between operating
systems, but in most cases a thread is a component of a process.
Multiple threads can exist within one process, executing concurrently
and sharing resources such as memory, while different processes do not
share these resources. In particular, the threads of a process share
its executable code and the values of its variables at any given time.[1]
When I decided to write a multi-threaded program in C++, I was faced with many choices, like Boost threads, POSIX threads and std::thread.
A simple search on the internet shows a performance measurement published on the boost.org website here.
My question is a bit more basic, and performance related as well.
Basically, why do they differ in performance? Why is, for example, thread type A faster than the others? They are written by very professional programmers and are run by powerful OSs, yet they offer different performance.
What makes them faster or slower?
The Boost documentation refers to the Fiber library, whose fibers are not actually threads. Creating what the library calls a fiber (essentially a user-space thread or coroutine, sometimes also referred to as a green thread) does not create a separate schedulable entity on the kernel side, so it can be much more efficient at creation time. Other things could be less efficient, because I/O operations necessarily become much more involved under this model (a fiber doing I/O should not block the operating system thread it runs on if other fibers could do work there).
Note that some of the coroutine implementations out there are well out of the conceptual limits of the de-facto GNU/Linux ABI and other POSIX-like operating systems, so they should be considered ugly hacks at best.
Problem (in short):
I'm using POSIX shared memory, so far with just POSIX semaphores, and I need to control multiple readers and multiple writers. I need help with what variables/methods I can use to control access within the limitations described below.
I've found an approach that I want to implement, but I'm unsure how to implement it when using POSIX shared memory.
What I've Found
https://stackoverflow.com/a/28140784
This link has the algorithm I'd like to use, but I'm unsure how to implement it with shared memory. Do I store the class in shared memory somehow? This is where I need help, please.
The reason I'm unsure is that a lot of my research points towards keeping shared memory to primitives only, to avoid addressing problems, and says that STL objects can't be used.
NOTE:
For all my multi-threading I'm using C++11 features. The shared memory will be used by completely separate program executables, each using C++11 std::threads, and any thread of any process/executable may want access. I have avoided Linux pthreads for all of my multi-threading and will continue to do so (unless it is just a control variable, not actual pthreads).
Solution Parameters aimed for
Must be shareable between 2+ processes, each running multiple C++11 std::threads that may want access, i.e. multiple writers (exclusive, one at a time) while allowing multiple simultaneous readers when no writer wants access.
Not using Boost libraries. Ideally native C++11 or built-in Linux libraries, something that will work without the need to install additional libraries.
Not using actual pthread threads, but could use some pthread object that will work with C++11 std::thread.
Ideally can handle a process crash while in operation. E.g. with a POSIX semaphore, if a process crashes while it holds the semaphore, everyone is screwed. I have seen people using file locks?
Thanks in advance
keeping shared memory to primitives only to avoid addressing problems
You can use pointers in and to shared memory objects across programs, so long as the memory is mmapped to the same address. This is actually a straightforward proposition, especially on 64-bit. See this open source C library I wrote for implementation details: rszshm - resizable pointer-safe shared memory.
Using POSIX semaphore if a process crashes while it has the semaphore, everyone is screwed.
If you want to use OS mediated semaphores, the SysV semaphores have SEM_UNDO, which recovers in this case. OTOH pthread offers robust mutexes that can be embedded and shared in shared memory. This can be used to build more sophisticated mechanisms.
The SysV scheme of providing multiple semaphores in a semaphore set, where a group of actions must all succeed or the call blocks, permits building sophisticated mechanisms too. A read/write lock can be made with a set of three semaphores.
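As a rough illustration of the robust pthread mutex mentioned above, the sketch below places a process-shared, robust mutex inside a shared mapping. The names SharedBlock, create_shared_block and lock_shared are made up for this example, and the anonymous MAP_SHARED mapping (a Linux/BSD extension, visible only to forked children) is an assumption made to keep the sketch self-contained; separate executables would map a shm_open object instead.

```cpp
#include <cerrno>
#include <pthread.h>
#include <sys/mman.h>

// Hypothetical layout of the shared region: the mutex sits next to the data.
struct SharedBlock {
    pthread_mutex_t mutex;
    int value;
};

// Maps a shared region and initializes a robust, process-shared mutex in it.
// Returns nullptr on failure.
SharedBlock* create_shared_block() {
    void* mem = mmap(nullptr, sizeof(SharedBlock), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    SharedBlock* block = static_cast<SharedBlock*>(mem);
    block->value = 0;

    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) {
        munmap(mem, sizeof(SharedBlock));
        return nullptr;
    }
    int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0) rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0) rc = pthread_mutex_init(&block->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(mem, sizeof(SharedBlock));
        return nullptr;
    }
    return block;
}

// Locks the mutex. If the previous owner died while holding it, the lock is
// still acquired (EOWNERDEAD) and is marked consistent so it stays usable.
// Returns true if the caller now holds the lock.
bool lock_shared(SharedBlock* block) {
    int rc = pthread_mutex_lock(&block->mutex);
    if (rc == EOWNERDEAD) {
        // The protected data may be half-updated; repair it here if needed.
        rc = pthread_mutex_consistent(&block->mutex);
    }
    return rc == 0;
}
```

A std::thread can call lock_shared and pthread_mutex_unlock like any other function, so this does not require creating threads with pthread_create. It only gives mutual exclusion; a readers/writer scheme would have to be built on top of it.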
I'm learning C++11 and have run into a threading issue. My general question: are C++11 mutexes compatible with threads not created with C++11's standard libraries?
I would like to safely share information between a thread created with C++11 and another thread created by a third-party library that I have no control over.
For example, my application uses PortAudio, which creates its own thread for audio output. I'm not sure if it's using pthreads, or OS-specific threading libraries, but I do know that PortAudio is NOT written in C++11. I want to safely share data between a GUI thread (using a C++11 thread) and the PortAudio thread using a mutex.
Similarly, can I use a C++11 mutex to synchronize QT QThreads and C++11 threads?
Are C++11 mutexes compatible with threads not created with C++11's standard libraries?
The C++ standard does not define a "thread" as something exclusively created by the C++ standard library.
1.10 Multi-threaded executions and data races [intro.multithread]
1 A thread of execution (also known as a thread) is a single flow of
control within a program, including the initial invocation of a
specific top-level function, and recursively including every function
invocation subsequently executed by the thread.
So, I would conclude the answer to your question is "yes".
Obviously, the C++ standard doesn't make any guarantees about compatibility with other systems. Part of the reason the C and C++ standards added threading facilities was to standardize on one threading system.
In practice it is expected that the C and C++ threads library is built to integrate with a platform threading system if there is one. For example, on platforms using pthreads the expectation is that pthreads are used where appropriate to build the standard library threading facilities (as far as I know there is no pthreads interface for the various atomic operations, i.e., the standard library may need to provide its own synchronization primitives).
The standard library classes provide access to the underlying representation through the native_handle() methods. A standard library should implement what is returned from these and, e.g., if pthreads types are provided it seems safe to assume that this particular standard library will play nice with pthreads.
The C++11 standard specifies that mutexes should work with any kind of 'execution agent', including different thread libraries. Here are some relevant quotes from the standard which I think answer the question conclusively:
Mutex requirements
A mutex object facilitates protection against data races and allows
safe synchronization of data between execution agents (30.2.5). An
execution agent owns a mutex from the time it successfully calls one
of the lock functions until it calls unlock.
Requirements for Lockable types
An execution agent is an entity such as a thread that may perform work
in parallel with other execution agents. [Note: Implementations or
users may introduce other kinds of agents such as processes or
thread-pool tasks. —end note ] The calling agent is determined by
context, e.g. the calling thread that contains the call, and so on.
It is inconceivable that C++11's threading implementation will be incompatible with the platform's native threading implementation because any practical program using C++11 threads is going to call into platform libraries, and those libraries may themselves be threaded or make thread related calls (to mutexes for example).
The C++11 library implementation for threads is not of course obliged to use the high level native threading library (say, pthreads or windows threads) but it probably will, for which purpose as has been mentioned there is a std::thread::native_handle() method to get the native handle. However, even where it does not use the high level native implementation, it will have to use the same low level kernel primitives underneath.
In all conceivable circumstances it should therefore be perfectly safe to use C++11 mutexes with thread instances created by native library calls, and vice versa, and mix any other native or C++ library synchronization calls. There may indeed be cases where it is necessary to do so. For example, the C++11 library does not at present provide thread pools or read-write locks (shared mutexes). You might want to use native read-write locks with threads started using std::thread, or use one of the many thread pool implementations provided by third party libraries in your C++ program.
The only caveat to observe is that trying to mix C++11 threads (which will in practice be obliged to use kernel threads in one way or another for the reasons mentioned above) with thread libraries which do not use kernel threads at all (for example, libraries based on green threads or "user" threads), is likely to lead you into trouble.
Edit: In support of this I notice that §30.3 of C++11 states, albeit non-normatively, that "These threads [std::thread threads] are intended to map one-to-one with operating system threads".
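For what it's worth, the kind of mixing described in this answer might look like the sketch below: one thread created with pthread_create (standing in for a third-party library's thread) and one std::thread update the same data under a single std::mutex. This assumes a POSIX platform; SharedCounter, native_worker and run_mixed_threads are names made up for the example.

```cpp
#include <mutex>
#include <pthread.h>
#include <thread>

// Data shared by both kinds of thread, protected by a C++11 mutex.
struct SharedCounter {
    std::mutex m;
    long value = 0;
};

// C-style entry point, as pthread_create expects.
extern "C" void* native_worker(void* arg) {
    SharedCounter* c = static_cast<SharedCounter*>(arg);
    for (int i = 0; i < 10000; ++i) {
        std::lock_guard<std::mutex> lock(c->m);
        ++c->value;
    }
    return nullptr;
}

// Runs the same work on a native pthread and on a std::thread.
// Returns the final counter value, or -1 if the native thread could not start.
long run_mixed_threads() {
    SharedCounter c;
    pthread_t native;
    if (pthread_create(&native, nullptr, native_worker, &c) != 0) return -1;
    std::thread standard([&c] { native_worker(&c); });
    standard.join();
    pthread_join(native, nullptr);
    return c.value;
}
```

If the mutex works across both threads, run_mixed_threads() returns 20000.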
The disadvantage would be in comparison to a technique that was specialized to work on threads that are running within the same process. For example, does wait/post cause the whole process to yield, rather than just the executing thread, even though anyone waiting for a post would be within the same process?
The semaphore would be used, for example, to solve a producer/consumer problem in a shared buffer between two threads in the same process.
Are there any reasonable alternatives?
Use Boost.Thread condition variables as shown here. The accompanying article has a good summary of Boost.Thread features.
Using interprocess semaphores will work but it's likely to place a tax on your execution due to use of unnecessarily heavyweight underlying OS locking primitives (named kernel objects in Windows, for example).
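As an in-process alternative, here is a minimal producer/consumer sketch built on a mutex and condition variables. It uses std::condition_variable from C++11 rather than the Boost.Thread version mentioned above (the two are used in much the same way); BoundedBuffer and produce_and_consume are hypothetical names.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Fixed-capacity queue shared by threads of a single process.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the buffer is full.
    void put(int item) {
        std::unique_lock<std::mutex> lock(m_);
        not_full_.wait(lock, [this] { return q_.size() < capacity_; });
        q_.push(item);
        not_empty_.notify_one();
    }

    // Blocks while the buffer is empty.
    int take() {
        std::unique_lock<std::mutex> lock(m_);
        not_empty_.wait(lock, [this] { return !q_.empty(); });
        int item = q_.front();
        q_.pop();
        not_full_.notify_one();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<int> q_;
    std::size_t capacity_;
};

// One producer pushes 1..n, one consumer sums what it receives.
long produce_and_consume(int n) {
    BoundedBuffer buffer(8);
    long sum = 0;
    std::thread producer([&buffer, n] {
        for (int i = 1; i <= n; ++i) buffer.put(i);
    });
    std::thread consumer([&buffer, &sum, n] {
        for (int i = 0; i < n; ++i) sum += buffer.take();
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Only the waiting thread blocks in wait(); the other threads of the process keep running.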
Windows provides a number of objects useful for synchronising threads, such as event (with SetEvent and WaitForSingleObject), mutexes and critical sections.
Personally I have always used them, especially critical sections, since I'm pretty certain they incur very little overhead unless already locked. However, looking at a number of libraries, such as Boost, people tend to go to a lot of trouble to implement their own locks using the interlocked methods on Windows.
I can understand why people would write lock-less queues and such, since that's a specialised case, but is there any reason why people choose to implement their own versions of the basic synchronisation objects?
Libraries aren't implementing their own locks. That is pretty much impossible to do without OS support.
What they are doing is simply wrapping the OS-provided locking mechanisms.
Boost does it for a couple of reasons:
They're able to provide a much better designed locking API, taking advantage of C++ features. The Windows API is C only, and not very well-designed C, at that.
They are able to offer a degree of portability. The same Boost API can be used if you run your application on a Linux machine or on a Mac. Windows' own API is obviously Windows-specific.
The Windows-provided mechanisms have a glaring disadvantage: They require you to include windows.h, which you may want to avoid for a large number of reasons, not least its extreme macro abuse polluting the global namespace.
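To illustrate the first point, a C++ wrapper can tie unlocking to scope exit, which a plain C API cannot do. The sketch below is a stripped-down guard in the spirit of std::lock_guard, not Boost's actual implementation; ScopedLock, TracingLock and released_after_exception are names invented for the example, and TracingLock is a stand-in that only records whether it is held.

```cpp
#include <stdexcept>

// Minimal RAII guard: locks in the constructor, unlocks in the destructor.
// Works with any type that offers lock() and unlock().
template <typename Lockable>
class ScopedLock {
public:
    explicit ScopedLock(Lockable& l) : lock_(l) { lock_.lock(); }
    ~ScopedLock() { lock_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    Lockable& lock_;
};

// Hypothetical stand-in lock (not thread-safe) that just tracks its state,
// so the effect of the guard can be observed.
struct TracingLock {
    bool held = false;
    void lock() { held = true; }
    void unlock() { held = false; }
};

// Throws while the guard is alive; returns true if the lock was held inside
// the scope and released again once the exception left it.
bool released_after_exception() {
    TracingLock lock;
    bool held_inside = false;
    try {
        ScopedLock<TracingLock> guard(lock);
        held_inside = lock.held;
        throw std::runtime_error("failure inside the critical section");
    } catch (const std::runtime_error&) {
        // The guard's destructor already ran during stack unwinding.
    }
    return held_inside && !lock.held;
}
```

With a raw EnterCriticalSection/LeaveCriticalSection pair, the early exit would have to be handled by hand on every path.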
One particular reason I can think of is portability. Windows locks are just fine on their own, but they are not portable to other platforms. A library which wishes to be portable must implement its own lock to guarantee the same semantics across platforms.
In many libraries (e.g. Boost) you need to write cross-platform code, so using WaitForSingleObject and SetEvent is a no-go. Also, there are common idioms, like monitors and condition variables, that the Win32 API lacks (though they can be implemented using these basic primitives).
Some lock-free data structures, like an atomic counter, are very useful; for example, boost::shared_ptr uses one to be thread-safe without the overhead of a critical section, and most compilers (not MSVC) use atomic counters to implement a thread-safe copy-on-write std::string.
Some things, like queues, can be implemented very efficiently in a thread-safe way without locks at all, which may give a significant performance boost in certain applications.
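The atomic counter idea can be sketched with std::atomic. This is only an illustration of the technique, not how boost::shared_ptr is actually written; RefCount and hammer_refcount are hypothetical names, and the memory orderings are one reasonable choice rather than the only one.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical reference count that starts with one owner.
class RefCount {
public:
    // Takes an additional reference.
    void acquire() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Drops a reference; returns true if it was the last one.
    bool release() {
        return count_.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }

    long use_count() const { return count_.load(std::memory_order_acquire); }

private:
    std::atomic<long> count_{1};
};

// Several threads repeatedly take and drop references without any lock.
// Returns the count left at the end (the original owner's reference).
long hammer_refcount(int threads, int iterations) {
    RefCount rc;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&rc, iterations] {
            for (int i = 0; i < iterations; ++i) {
                rc.acquire();
                rc.release();
            }
        });
    }
    for (auto& th : pool) th.join();
    return rc.use_count();
}
```

No increment or decrement is lost, so the count ends where it started.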
There may occasionally be good reasons for implementing your own locks that don't use the Windows OS synchronization objects. But doing so is a "sharp stick." It's easy to poke yourself in the foot.
Here's an example: If you know that you are running the same number of threads as there are hardware contexts, and if the latency of waking up one of those threads which is waiting for a lock is very important to you, you might choose a spin lock implemented completely in user space. If the waiting thread is the only thread spinning on the lock, the latency of transferring the lock from the thread that owns it to the waiting thread is just the latency of moving the cache line to the owner thread and back to the waiting thread -- orders of magnitude faster than the latency of signaling a thread with an OS lock under the same circumstances.
But the scenarios where you want to do this are pretty narrow. As soon as you start having more software threads than hardware threads, you'll likely regret it. In that scenario, you could spend entire OS scheduling quanta doing nothing but spinning on your spin lock. And, if you care about power, spinlocks are bad because they prevent the processor from going into a low-power state.
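A user-space spin lock of the kind described could look roughly like this. It is a bare-bones sketch with no backoff, fairness or pause hints, written with C++11 std::atomic instead of the Windows interlocked functions; SpinLock and count_with_spinlock are made-up names.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spin lock that never enters the kernel: waiters burn CPU until the flag
// becomes free.
class SpinLock {
public:
    void lock() {
        // exchange returns the previous value; true means someone else holds it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Increments a shared counter from several threads under the spin lock.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Run it with more threads than hardware contexts and the caveats above show up quickly: a waiter can spin for its whole time slice while the owner is descheduled.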
I'm not sure I buy the portability argument. Portable libraries often have an OS portability layer that abstracts the different OS APIs for synchronization. If you're dealing with locks, a pthread_mutex can be made semantically the same as a Windows Mutex or Critical Section under an abstraction layer. There are some exceptions here, but for most people this is true. If you're dealing with Windows Events or POSIX condition variables, well, those are tougher to abstract. (Vista did introduce POSIX-style condition variables, but not many Windows software developers are in a position to require Vista...)
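Such a portability layer can be very thin. The sketch below wraps a Windows CRITICAL_SECTION or a pthread_mutex_t behind one class; PortableMutex is a hypothetical name, error handling is omitted, and only the lock/unlock subset is covered.

```cpp
#ifdef _WIN32
#include <windows.h>
#else
#include <pthread.h>
#endif

// One interface, two OS back ends selected at compile time.
class PortableMutex {
public:
    PortableMutex() {
#ifdef _WIN32
        InitializeCriticalSection(&cs_);
#else
        pthread_mutex_init(&m_, nullptr);
#endif
    }

    ~PortableMutex() {
#ifdef _WIN32
        DeleteCriticalSection(&cs_);
#else
        pthread_mutex_destroy(&m_);
#endif
    }

    PortableMutex(const PortableMutex&) = delete;
    PortableMutex& operator=(const PortableMutex&) = delete;

    void lock() {
#ifdef _WIN32
        EnterCriticalSection(&cs_);
#else
        pthread_mutex_lock(&m_);
#endif
    }

    void unlock() {
#ifdef _WIN32
        LeaveCriticalSection(&cs_);
#else
        pthread_mutex_unlock(&m_);
#endif
    }

private:
#ifdef _WIN32
    CRITICAL_SECTION cs_;
#else
    pthread_mutex_t m_;
#endif
};
```

One of the exceptions shows up even here: a critical section may be re-entered by the thread that owns it, while a default pthread mutex may not, so the wrapper would need a recursive mutex attribute to match exactly.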
Writing locking code for a library is useful if that library is meant to be cross-platform. Users of the library can use the library's locking functionality and not have to care about the underlying platform implementation. Assuming the library has versions for all the platforms being targeted, it's one less bit of code that has to be ported.