Could you help me understand how to use mutexes in a multithreaded Linux application, where:
during data writing, the variable must be locked for both writing and reading;
during data reading, the variable must be locked for writing only.
So simultaneous reads are possible, but a write is an exclusive operation: while a write is in progress, all other operations must wait until it finishes.
You're asking about something that is a bit higher level than mutexes. A mutex is a simple, low-level device. While a thread holds a mutex, any other thread that tries to lock that same mutex blocks until the owner releases it; in other words, the mutex serialises all threads of the same (heavyweight) process that contend for it.
You are asking about a read-write lock. Read-write locks use mutexes under the hood. The POSIX functions that deal with read-write locks start with pthread_rwlock_. Since you are on a Linux machine, just type man pthread and look for the section marked "READ/WRITE LOCK ROUTINES".
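A minimal sketch of those routines, assuming a single protected integer (the variable name and the omitted error handling are illustrative):

#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_value = 0;                 /* the protected variable */

int read_value(void) {
    pthread_rwlock_rdlock(&rwlock);   /* many readers may hold this at once */
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void write_value(int v) {
    pthread_rwlock_wrlock(&rwlock);   /* exclusive: waits for all readers and writers */
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}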
You need a reader/writer lock to allow multiple readers/single writer.
Boost.Thread has one of these (boost::shared_mutex), if you have no other preferred threading library. This uses PThreads primitives under the covers, and will probably save you time in wrapping the raw APIs yourself.
I would not recommend implementing this yourself - it's easy to get something that appears to work but, under load, either crashes, kills performance, or (worst of all) silently modifies your data in a way it should not, giving you bad results.
A simple boost::mutex can also be used here, as noted by @Als, but it won't allow multiple concurrent reads. That is simpler to implement and may be sufficient for your needs, depending on your read/write access profile.
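For illustration, a minimal reader/writer sketch with boost::shared_mutex (the protected variable is illustrative):

#include <boost/thread/shared_mutex.hpp>
#include <boost/thread/locks.hpp>

boost::shared_mutex rw_mutex;
int shared_value = 0;   // the protected variable

int read_value() {
    boost::shared_lock<boost::shared_mutex> lock(rw_mutex);   // shared: many readers at once
    return shared_value;
}

void write_value(int v) {
    boost::unique_lock<boost::shared_mutex> lock(rw_mutex);   // exclusive: a single writer
    shared_value = v;
}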
You will need to use mutexes if you have global or static objects which are accessed (read and written) from different threads.
I've used pthreads a fair bit for concurrent programs, mainly utilising spinlocks, mutexes, and condition variables.
I started looking into multithreading using std::thread and std::mutex, and I noticed that there doesn't seem to be an equivalent of pthreads' spinlock (pthread_spinlock_t).
Anyone know why this is?
there doesn't seem to be an equivalent of pthreads' spinlock
Spinlocks are often considered the wrong tool in user space because there is no way to disable thread preemption while the spinlock is held (unlike in the kernel). A thread can therefore acquire a spinlock and then get preempted, causing all other threads trying to acquire the spinlock to spin unnecessarily (and if those threads have higher priority, that may even cause a deadlock; threads waiting for I/O may get a priority boost on wake-up). This reasoning also applies to all lockless data structures, unless the data structure is truly wait-free (there aren't many practically useful ones, apart from boost::spsc_queue).
In the kernel, a thread that has locked a spinlock cannot be preempted or interrupted before it releases the spinlock. That is why spinlocks are appropriate there (when RCU cannot be used).
On Linux, one can prevent preemption (perhaps not completely, but there have been recent kernel changes towards that desirable effect) by using isolated CPU cores and FIFO real-time threads pinned to those isolated cores. But that requires a deliberate kernel/machine configuration and an application designed to take advantage of it. Nevertheless, people do use such setups for business-critical applications, along with lockless (but not wait-free) data structures in user space.
On Linux, there is the adaptive mutex PTHREAD_MUTEX_ADAPTIVE_NP, which spins for a limited number of iterations before blocking in the kernel (similar to InitializeCriticalSectionAndSpinCount). However, that mutex cannot be used through the std::mutex interface, because there is no way to customise the non-portable pthread_mutexattr_t before initialising the pthread_mutex_t.
Nor can one enable process sharing, robustness, error checking, or priority-inversion prevention through the std::mutex interface. In practice, people write their own wrappers around pthread_mutex_t which allow the desired mutex attributes to be set, along with a corresponding wrapper for condition variables. The standard locks std::unique_lock and std::lock_guard can be reused.
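As an illustration, a minimal sketch of such a wrapper (the class name is made up; PTHREAD_MUTEX_ADAPTIVE_NP is a non-portable glibc extension, so this only builds on Linux/glibc):

#include <pthread.h>

class adaptive_mutex {                 // illustrative name, not a standard type
public:
    adaptive_mutex() {
        pthread_mutexattr_t attr;
        pthread_mutexattr_init(&attr);
        /* non-portable glibc extension: spin briefly before blocking */
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
        pthread_mutex_init(&mtx_, &attr);
        pthread_mutexattr_destroy(&attr);
    }
    ~adaptive_mutex() { pthread_mutex_destroy(&mtx_); }
    adaptive_mutex(const adaptive_mutex&) = delete;
    adaptive_mutex& operator=(const adaptive_mutex&) = delete;

    void lock()     { pthread_mutex_lock(&mtx_); }
    void unlock()   { pthread_mutex_unlock(&mtx_); }
    bool try_lock() { return pthread_mutex_trylock(&mtx_) == 0; }

    pthread_mutex_t* native_handle() { return &mtx_; }

private:
    pthread_mutex_t mtx_;
};

Because the wrapper models the standard Lockable requirements, std::lock_guard<adaptive_mutex> and std::unique_lock<adaptive_mutex> work with it unchanged.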
IMO, there could be provisions in the std:: APIs to set the desired mutex and condition variable attributes, such as a protected constructor for derived classes that would initialise the native_handle, but there aren't any. native_handle looks like a good hook for platform-specific behaviour, but there would have to be a constructor allowing a derived class to initialise it appropriately; once the mutex or condition variable has been initialised, native_handle is pretty much useless, unless the idea was only to pass it to (C language) APIs that expect a pointer or reference to an initialised pthread_mutex_t.
There is a similar example of Boost and the C++ standard not accepting semaphores, on the grounds that they are too much rope to hang oneself with, and that a mutex (essentially a binary semaphore) and a condition variable are more fundamental and more flexible synchronisation primitives, out of which a semaphore can be built.
From the point of view of the C++ standard these are probably the right decisions, because educating users to use spinlocks and semaphores correctly, with all the nuances, is a difficult task, whereas advanced users can whip up a wrapper for pthread_spinlock_t with little effort.
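For instance, a thin Lockable wrapper over pthread_spinlock_t might look like this (a sketch; the class name is made up):

#include <pthread.h>

class spinlock {                      // illustrative name
public:
    spinlock()  { pthread_spin_init(&s_, PTHREAD_PROCESS_PRIVATE); }
    ~spinlock() { pthread_spin_destroy(&s_); }
    spinlock(const spinlock&) = delete;
    spinlock& operator=(const spinlock&) = delete;

    void lock()     { pthread_spin_lock(&s_); }
    void unlock()   { pthread_spin_unlock(&s_); }
    bool try_lock() { return pthread_spin_trylock(&s_) == 0; }

private:
    pthread_spinlock_t s_;
};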
You are right that there is no spin lock implementation in the std namespace. A spin lock is a great concept, but in user space it generally performs quite poorly: the OS doesn't know your process wants to spin, and you can easily get worse results than with a mutex. Note that several platforms implement optimistic spinning, so a mutex can already do a very good job. In addition, adjusting the time to "pause" between loop iterations is neither trivial nor portable, and fine tuning is required. TL;DR: don't use a spinlock in user space unless you are really, really sure about what you are doing.
C++ Thread discussion
Article explaining how to write a spin lock with benchmark
Reply by Linus Torvalds about the above article explaining why it's a bad idea
Spin locks have two advantages:
They require much less storage than a std::mutex, because they do not need a queue of threads waiting for the lock. On my system, sizeof(pthread_spinlock_t) is 4, while sizeof(std::mutex) is 40.
They perform much better than std::mutex if the protected code region is small and the contention level is low to moderate.
On the downside, a poorly implemented spin lock can hog the CPU. For example, a tight loop with a compare-and-set assembler instruction will spam the cache system with loads and loads of unnecessary writes. But that's what we have libraries for: they implement best practice and avoid common pitfalls. That most user implementations of spin locks are poor is not a reason to keep spin locks out of the library; rather, it is a reason to put them there, to stop users from trying to write their own.
There is a second problem, which arises from the scheduler: if thread A acquires the lock and then gets preempted before it finishes executing the critical section, another thread B could spin "forever" (or at least for many milliseconds, until thread A gets scheduled again) on that lock.
Unfortunately, there is no way for userland code to tell the kernel "please don't preempt me in this critical code section". But if we know that under normal circumstances the critical code section executes within 10 ns, we could at least tell thread B: "preempt yourself voluntarily if you have been spinning for over 30 ns". This is not guaranteed to return control directly to thread A, but it does stop the waste of CPU cycles that would otherwise take place. And in most scenarios where threads A and B run in the same process at the same priority, the scheduler will usually schedule thread A before thread B if B has called std::this_thread::yield().
So I am thinking about a template spin lock class that takes a single unsigned integer as a parameter: the number of memory reads in the critical section. This parameter is then used by the library to calculate the appropriate number of spins before a yield() is performed. With a zero count, yield() would never be called. A sketch of the idea follows.
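A minimal sketch of that idea, with a fixed spin budget standing in for the computed one (all names are illustrative):

#include <atomic>
#include <thread>

class yielding_spinlock {             // illustrative name
public:
    explicit yielding_spinlock(unsigned spins_before_yield = 100)
        : budget_(spins_before_yield) {}

    void lock() {
        unsigned spins = 0;
        while (flag_.test_and_set(std::memory_order_acquire)) {
            if (++spins >= budget_) {
                std::this_thread::yield();  // let the lock holder run
                spins = 0;
            }
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
    const unsigned budget_;
};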
I have a piece of shared memory, containing a char string and an integer, that is shared between two processes.
Process A writes to it and Process B reads it (and not vice versa).
What is the most efficient and effective way to make sure that Process A doesn't update (write to) the memory at the same time Process B is reading it? (Should I just use flags in the shared memory, semaphores, a critical section...?)
If you could point me in the right direction, I would appreciate it.
Thanks.
Windows, C++
You cannot use a Critical Section because these can only be used for synchronization between threads within the same process. For inter process synchronization you need to use a Mutex or a Semaphore. The difference between these two is that the former allows only a single thread to own a resource, while the latter can allow up to a maximum number (specified during creation) to own the resource simultaneously.
In your case a Mutex seems appropriate.
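For illustration, a minimal sketch using a named mutex that both processes open (the mutex name and the helper are made up; the "Global\" namespace may require privileges):

#include <windows.h>

// Both processes create/open the same named mutex; whichever runs first creates it.
void access_shared_memory(void (*do_work)()) {
    HANDLE hMutex = CreateMutexW(nullptr, FALSE, L"Global\\MySharedMemMutex");
    if (hMutex == nullptr) return;            // creation/open failed
    WaitForSingleObject(hMutex, INFINITE);    // acquire the cross-process lock
    do_work();                                // read or write the shared memory
    ReleaseMutex(hMutex);
    CloseHandle(hMutex);
}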
Since you have two processes you need a cross-process synchronisation object. I think this means that you need to use a mutex.
A mutex object facilitates protection against data races and allows thread-safe synchronization of data between threads. A thread obtains ownership of a mutex object by calling one of the lock functions and relinquishes ownership by calling the corresponding unlock function.
If you are using Boost.Thread, you can use its mutexes and locking; for more, see the link below:
http://www.boost.org/doc/libs/1_47_0/doc/html/thread/synchronization.html#thread.synchronization.mutex_types
Since you're talking about two processes, system-wide mutexes will work, and Windows has those. However, they aren't necessarily the most efficient way.
If you can put more things in shared memory, then passing data via atomic operations on flags in that memory should be the most efficient thing to do. For instance, you might use the Interlocked functions to implement Dekker's Algorithm (you'll probably want to use something like YieldProcessor() to avoid busy waiting).
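As a simpler illustration than full Dekker's Algorithm, here is a sketch of a one-word spin gate kept in the shared memory, built on the same Interlocked/YieldProcessor primitives (the struct layout is made up):

#include <windows.h>

struct SharedBlock {                  // illustrative layout of the shared region
    volatile LONG lock;               // 0 = free, 1 = held
    char text[256];
    int value;
};

void acquire(volatile LONG* lock) {
    // Atomically set the lock to 1; keep spinning while the old value was 1.
    while (InterlockedExchange(lock, 1) != 0)
        YieldProcessor();             // pause hint to the CPU while spinning
}

void release(volatile LONG* lock) {
    InterlockedExchange(lock, 0);     // publish the unlock with a full barrier
}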
I'm trying to make a C++ API (for Linux and Solaris) thread-safe, so that its functions can be called from different threads without breaking internal data structures. In my current approach I'm using pthread mutexes to protect all accesses to member variables. This means that a simple getter function now locks and unlocks a mutex, and I'm worried about the overhead of this, especially as the API will mostly be used in single-threaded apps where any mutex locking seems like pure overhead.
So, I'd like to ask:
do you have any experience with performance of single-threaded apps that use locking versus those that don't?
how expensive are these lock/unlock calls, compared to e.g. a simple "return this->isActive" access for a bool member variable?
do you know better ways to protect such variable accesses?
All modern thread implementations can handle an uncontended mutex lock entirely in user space (with just a couple of machine instructions); only when there is contention does the library have to call into the kernel.
Another point to consider is that if an application doesn't explicitly link to the pthread library (because it's a single-threaded application), it only gets dummy pthread functions that don't do any locking at all; only if the application is multi-threaded (and links to the pthread library) are the full pthread functions used.
And finally, as others have already pointed out, there is no point in protecting a getter method for something like isActive with a mutex - once the caller gets a chance to look at the return value, the value might already have been changed (as the mutex is only locked inside the getter method).
"A mutex requires an OS context switch. That is fairly expensive. "
This is not true on Linux, where mutexes are implemented using futexes. Acquiring an uncontended (i.e., not already locked) mutex is, as cmeerw points out, a matter of a few simple instructions, and typically takes around 25 nanoseconds on current hardware.
For more info:
Futex
Numbers everybody should know
This is a bit off-topic, but you seem to be new to threading, so: for one thing, only lock where threads can overlap, and then try to minimise those places. Also, instead of trying to lock every method, think about what the thread is doing with an object overall, make that a single call, and lock that. Try to get your locks as high up as possible (this again increases efficiency and may help to avoid deadlocking). But locks don't 'compose'; you have to organise your code, at least mentally, around where the threads are and where they overlap.
I did a similar library and didn't have any trouble with lock performance. (I can't tell you exactly how they're implemented, so I can't say conclusively that it's not a big deal.)
I'd go for getting it right first (i.e. use locks) then worry about performance. I don't know of a better way; that's what mutexes were built for.
An alternative for single thread clients would be to use the preprocessor to build a non-locked vs locked version of your library. E.g.:
#ifdef BUILD_SINGLE_THREAD
inline void lock () {}                              // no-ops in the single-threaded build
inline void unlock () {}
#else
inline void lock () { doSomethingReal(); }          // e.g. acquire the real mutex
inline void unlock () { doSomethingElseReal(); }    // e.g. release the real mutex
#endif
Of course, that adds an additional build to maintain, as you'd distribute both single and multithread versions.
I can tell you from Windows that a mutex is a kernel object and as such incurs a (relatively) significant locking overhead. To get a better performing lock when all you need is one that works between threads, use a critical section. This does not work across processes, just between the threads in a single process.
However, Linux is quite a different beast for multi-process locking. I know that a mutex there is implemented using atomic CPU instructions and applies only within a process, so it has the same performance as a Win32 critical section - i.e., it is very fast.
Of course, the fastest locking is not to have any at all, or to use it as little as possible. But if your lib is to be used in a heavily threaded environment, you will want to hold locks for as short a time as possible: lock, do something, unlock, do something else, then lock again is better than holding the lock across the whole task. The cost of locking isn't the time taken to lock, but the time a thread sits around twiddling its thumbs waiting for another thread to release a lock it wants!
A mutex requires an OS context switch. That is fairly expensive. The CPU can still do it hundreds of thousands of times per second without too much trouble, but it is a lot more expensive than not having the mutex there. Putting it on every variable access is probably overkill.
It also probably is not what you want. This kind of brute-force locking tends to lead to deadlocks.
do you know better ways to protect such variable accesses?
Design your application so that as little data as possible is shared. Some sections of code should be synchronized, probably with a mutex, but only those that are actually necessary. And typically not individual variable accesses, but tasks containing groups of variable accesses that must be performed atomically. (Perhaps you need to set your is_active flag along with some other modifications. Does it make sense to set that flag and make no further changes to the object?)
I was curious about the expense of using pthread_mutex_lock/unlock. I had a scenario where I needed either to copy anywhere from 1500 to 65K bytes without using a mutex, or to use a mutex and do a single write of a pointer to the needed data. I wrote a short loop to test each:
gettimeofday(&starttime, NULL);
/* COPY DATA */
gettimeofday(&endtime, NULL);
timersub(&endtime, &starttime, &timediff);
/* print out timediff data */
or
gettimeofday(&starttime, NULL);
pthread_mutex_lock(&mutex);
gettimeofday(&endtime, NULL);
pthread_mutex_unlock(&mutex);
timersub(&endtime, &starttime, &timediff);
/* print out timediff data */
If I was copying less than 4000 or so bytes, then the straight copy operation took less time. If however I was copying more than 4000 bytes, then it was less costly to do the mutex lock/unlock.
The mutex lock/unlock timing ran between 3 and 5 µs, including the roughly 2 µs taken by the gettimeofday call for the current time.
For member variable access, you should use read/write locks, which have slightly less overhead and allow multiple concurrent reads without blocking.
In many cases you can use atomic builtins, if your compiler provides them (if you are using gcc or icc, __sync_fetch*() and the like), but they are notoriously hard to handle correctly.
If you can guarantee that the access is atomic (for example, on x86 an aligned dword read or write is atomic, but a read-modify-write is not), you can often avoid locks altogether and use volatile instead, but this is non-portable and requires knowledge of the hardware.
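A minimal sketch of the __sync builtins mentioned above (the counter is illustrative; modern code would use std::atomic or the __atomic builtins instead):

int counter = 0;

void increment(void) {
    __sync_fetch_and_add(&counter, 1);          /* atomic read-modify-write */
}

int read_counter(void) {
    return __sync_fetch_and_add(&counter, 0);   /* atomic read expressed as an RMW */
}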
Well, a suboptimal but simple approach is to place macros around your mutex locks and unlocks, and then have a compiler/makefile option to enable or disable threading.
Ex.
#ifdef THREAD_ENABLED
#define pthread_mutex_lock(x) ... //actual mutex call
#else
#define pthread_mutex_lock(x)     //do nothing
#endif
Then, when compiling, pass gcc -DTHREAD_ENABLED to enable threading.
Again, I would NOT use this method in any large project, only when you want something fairly simple.
If the locks make sure only one thread accesses the locked data at a time, then what controls access to the locking functions?
I thought that boost::mutex::scoped_lock should be at the beginning of each of my functions so the local variables don't get modified unexpectedly by another thread, is that correct? What if two threads are trying to acquire the lock at very close times? Won't the lock's local variables used internally be corrupted by the other thread?
My question is not boost-specific but I'll probably be using that unless you recommend another.
You're right: when implementing locks, you need some way of guaranteeing that two processes don't get the lock at the same time. To do this, you need an atomic instruction - one that's guaranteed to complete without interruption. One such instruction is test-and-set, an operation that reads the state of a boolean variable, sets it to true, and returns the previously retrieved state.
This allows you to write code that continually tests whether it can get the lock. Assume x is a variable shared between threads:
while(testandset(x));
// ...
// critical section
// this code can only be executed by one thread at a time
// ...
x = 0; // set x to 0, allow another process into critical section
Since the other threads continually test the lock until they're let into the critical section, this is a very inefficient way of guaranteeing mutual exclusion. However, using this simple concept, you can build more complicated control structures like semaphores, which are much more efficient (because waiting processes sleep instead of looping).
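In modern C++ the same pattern can be written directly with std::atomic_flag, whose test_and_set member is exactly this primitive (a minimal sketch):

#include <atomic>

std::atomic_flag flag = ATOMIC_FLAG_INIT;

void critical_section() {
    while (flag.test_and_set()) {}   // spin: test-and-set returns the old value
    // ... critical section, one thread at a time ...
    flag.clear();                    // the equivalent of x = 0 above
}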
You only need to have exclusive access to shared data. Unless they're static or on the heap, local variables inside functions will have different instances for different threads and there is no need to worry. But shared data (stuff accessed via pointers, for example) should be locked first.
As for how locks work, they're carefully designed to prevent race conditions and often have hardware-level support to guarantee atomicity. That is, there are machine-language constructs guaranteed to be atomic, and semaphores (and mutexes) may be implemented via these.
The simplest explanation is that the locks, way down underneath, are based on a hardware instruction that is guaranteed to be atomic and can't clash between threads.
Ordinary local variables in a function are already specific to an individual thread. It's only statics, globals, or other data that can be simultaneously accessed by multiple threads that needs to have locks protecting it.
The mechanism that operates the lock controls access to it.
Any locking primitive needs to be able to communicate changes between processors, so it's usually implemented on top of bus operations, i.e., reading and writing to memory. It also needs to be structured such that two threads attempting to claim it won't corrupt its state. It's not easy, but you can usually trust that any OS implemented lock will not get corrupted by multiple threads.
I have a multi-threaded C++ app which does 3D rendering with the OpenSceneGraph library. I'm planning to kick off OSG's render loop as a separate thread using boost::threads, passing a data structure containing shared state in to the thread. I'm trying to avoid anything too heavyweight (like mutexes) for synchronization, as the render loop needs to be pretty tight, and OSG itself tries to avoid ever having to lock. Most of the shared state is set before the thread is started, and never changed. I do have some data that does need to be changed, which I am planning to double-buffer. However, I have a simple boolean for signaling the thread to suspend rendering, and later resume rendering, and another to kill it. In both cases the app thread sets the bool and the render thread only reads it. Do I need to synchronize access to these bools? As far as I can tell, the worst thing that could happen is that the render loop continues for an extra frame before suspending or quitting.
In C++11 and later, which have standards-defined concurrency, use std::atomic<bool> for this purpose. From http://en.cppreference.com/w/cpp/atomic/atomic:
If one thread writes to an atomic object while another thread reads from it, the behavior is well-defined (see memory model for details on data races).
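A minimal sketch of the suspend/kill flags described in the question (the names are illustrative):

#include <atomic>

std::atomic<bool> suspend_requested{false};
std::atomic<bool> quit_requested{false};

// App thread:    suspend_requested.store(true);   // or quit_requested
// Render thread, once per frame:
//     if (quit_requested.load()) return;          // exit the render loop
//     if (suspend_requested.load()) continue;     // skip rendering this frame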
The following old answer may have been true at some time in the past with some compilers and some operating environments, but it should not be relied upon today:
You're right, in this case you won't need to synchronise the bools. You should declare them volatile though, to ensure that the compiler actually reads them from memory each time, instead of caching the previous read in a thread (that's a simplified explanation, but it should do for this purpose).
The following question has more information about this:
C++ Thread, shared data
Why not simply use an interlocked variable?
As for C++11 and later: the language is finally thread-aware, and it clearly states that modifying a bool (or any other non-atomic variable) in one thread while accessing it at the same time in another is undefined behavior.
In your case, using std::atomic<bool> should be enough to make your program correct, saving you from using locks.
Do not use volatile. It has nothing to do with threads.
For more discussion look at Can I read a bool variable in a thread without mutex?
I don't think you need a fully fledged mutex here -- though the render thread will need to busy wait in the 'suspended' state if you aren't using a synchronization object that supports a wait primitive.
You should look into the various interlocked exchange primitives, though (InterlockedExchange under Windows) - not because reads/writes of the bool are non-atomic, but to ensure that there are no weird behaviours from the compiler reordering memory accesses on a single thread.
This thread has a little more info and discussion on thread-safety, especially for simple data types:
How can I create a thread-safe singleton pattern in Windows?