This is an interview question. How do you implement a read/write mutex? There will be multiple threads reading and writing to a resource. I'm not sure how to go about it. If there's any information needed, please let me know.
Update: I'm not sure if my statement above is valid/understandable. But what I really want to know is how do you implement multiple read and multiple writes on a single object in terms of mutex and other synchronization objects needed?
Check out Dekker's algorithm.
Dekker's algorithm is the first known correct solution to the mutual exclusion problem in concurrent programming. The solution is attributed to Dutch mathematician Th. J. Dekker by Edsger W. Dijkstra in his manuscript on cooperating sequential processes. It allows two threads to share a single-use resource without conflict, using only shared memory for communication.
Note that Dekker's algorithm relies on busy waiting (spinning): a thread that cannot enter the critical section keeps polling shared variables rather than being put to sleep.
(Th. J. Dekker's solution, mentioned by E. W. Dijkstra in his EWD1303 paper)
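To make the idea concrete, here is a minimal sketch of Dekker's algorithm for exactly two threads. It assumes C++11 `std::atomic` with the default sequentially consistent ordering, which the algorithm depends on. The names `DekkerLock` and `dekker_demo` are made up for this illustration.

```cpp
#include <atomic>
#include <thread>

// Dekker's algorithm for exactly two threads, identified as 0 and 1.
class DekkerLock {
public:
    DekkerLock() : turn_(0) {
        wants_[0] = false;
        wants_[1] = false;
    }

    void lock(int self) {
        const int other = 1 - self;
        wants_[self].store(true);            // announce intent to enter
        while (wants_[other].load()) {       // the other thread wants in too
            if (turn_.load() != self) {      // not our turn: back off
                wants_[self].store(false);
                while (turn_.load() != self) {
                    std::this_thread::yield();  // busy wait for our turn
                }
                wants_[self].store(true);
            }
        }
    }

    void unlock(int self) {
        turn_.store(1 - self);               // hand the turn to the other thread
        wants_[self].store(false);
    }

private:
    std::atomic<bool> wants_[2];
    std::atomic<int> turn_;
};

// Two threads increment a plain int that is protected only by the lock.
int dekker_demo(int iterations) {
    DekkerLock lock;
    int counter = 0;
    auto work = [&](int id) {
        for (int i = 0; i < iterations; ++i) {
            lock.lock(id);
            ++counter;
            lock.unlock(id);
        }
    };
    std::thread t0(work, 0);
    std::thread t1(work, 1);
    t0.join();
    t1.join();
    return counter;
}
```

Calling `dekker_demo(n)` should return `2 * n` if mutual exclusion holds. This only covers two threads and plain mutual exclusion, so it is a building block rather than a read/write lock.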
The short answer is that it is surprisingly difficult to roll your own read/write lock. It's very easy to miss a very subtle timing problem that could result in deadlock, two threads both thinking they have an "exclusive" lock, etc.
In a nutshell, you need to keep a count of how many readers are active at any particular time. Only when the number of active readers is zero, should you grant a thread write access. There are some design choices as to whether readers or writers are given priority. (Often, you want to give writers the priority, on the assumption that writing is done less frequently.) The (surprisingly) tricky part is to ensure that no writer is given access when there are readers, or vice versa.
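As a rough sketch of the counting scheme described above, the following uses a `std::mutex` and a `std::condition_variable` (assuming C++11) and gives writers priority. The class name `ReadWriteLock` and the demo function are hypothetical, and this has not been hardened against the corner cases the answer warns about.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class ReadWriteLock {
public:
    void lock_shared() {
        std::unique_lock<std::mutex> guard(m_);
        // Writer priority: new readers wait while a writer is active or waiting.
        cv_.wait(guard, [this] { return !writer_active_ && writers_waiting_ == 0; });
        ++active_readers_;
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> guard(m_);
        if (--active_readers_ == 0) cv_.notify_all();  // last reader out wakes writers
    }
    void lock() {
        std::unique_lock<std::mutex> guard(m_);
        ++writers_waiting_;
        // A writer enters only when there are no readers and no other writer.
        cv_.wait(guard, [this] { return !writer_active_ && active_readers_ == 0; });
        --writers_waiting_;
        writer_active_ = true;
    }
    void unlock() {
        std::lock_guard<std::mutex> guard(m_);
        writer_active_ = false;
        cv_.notify_all();  // wake both waiting writers and waiting readers
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    int active_readers_ = 0;
    int writers_waiting_ = 0;
    bool writer_active_ = false;
};

struct RwDemoResult {
    int final_value;  // value written by the writers
    int torn_reads;   // reads that saw a half-finished update
};

// Writers keep a == b; readers count how often they observe a != b.
RwDemoResult rw_demo() {
    ReadWriteLock lock;
    int a = 0, b = 0;
    std::atomic<int> torn{0};
    std::vector<std::thread> threads;
    for (int w = 0; w < 2; ++w) {
        threads.emplace_back([&] {
            for (int i = 0; i < 1000; ++i) { lock.lock(); ++a; ++b; lock.unlock(); }
        });
    }
    for (int r = 0; r < 4; ++r) {
        threads.emplace_back([&] {
            for (int i = 0; i < 1000; ++i) {
                lock.lock_shared();
                if (a != b) ++torn;
                lock.unlock_shared();
            }
        });
    }
    for (auto& t : threads) t.join();
    return RwDemoResult{a, torn.load()};
}
```

With writer priority, a steady stream of writers can starve readers; swapping the reader's wait condition to ignore `writers_waiting_` would give readers priority instead.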
There is an excellent MSDN article, "Compound Win32 Synchronization Objects" that takes you through the creation of a reader/writer lock. It starts simple, then grows more complicated to handle all the corner cases. One thing that stood out was that they showed a sample that looked perfectly good-- then they would explain why it wouldn't actually work. Had they not pointed out the problems, you might have never noticed. Well worth a read.
Hope this is helpful.
This sounds like a rather difficult question for an interview; I would not "implement" a read/write mutex, in the sense of writing one from scratch--there are much better off-the-shelf solutions available. The sensible real-world thing would be to use an existing mutex type. Perhaps what they really wanted to know was how you would use such a type?
AFAIK you need either an atomic compare-and-swap instruction, or you need to be able to disable interrupts. See Compare-and-swap on Wikipedia. At least, that's how an OS would implement it. If you have an operating system, stand on its shoulders and use an existing library (Boost, for example).
I am writing a network service which receives raw packets, converts them, and puts them into a queue. There are also a couple of worker threads that take the converted packets from the queue and, based on some rules, update a hash map. To prevent concurrent updates to the hash map from different worker threads I have to use a mutex. Unfortunately, using a mutex imposes a big performance hit. I need to find a workaround for this.
EDITED:
The converted packets contain a session_id, which is used as the hash-map key. Before any insertion or update the session_id is first searched for. If no session_id is found, a new entry is added, and this is exactly where I use the mutex lock. Otherwise, if the session_id already exists, I just update the existing value, and no mutex lock is used for a mere value update. It might help to know that I use boost::unordered_map as the underlying hash map.
Below is pseudocode of the logic I use:
if hash.find(session_id) then
    hash.update(value)
else
    mutex.lock()
    hash.insert(value)
    mutex.unlock()
end
What is your suggestion?
By the way, this is my working environment and tools:
Compiler: C++(gcc)
Thread library: pthread
OS: Ubuntu 14.04
The fastest solution would be to split the data in a way that each thread uses its own data set, so you would not need any locking at all. Maybe you can get there by distributing the messages among the threads based on some key data.
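A minimal sketch of that partitioning idea follows, assuming the key is the `session_id` from the question. The `Packet` struct, `shard_for`, and `process_partitioned` are hypothetical names, and the per-worker inboxes are plain vectors filled up front; in a real service they would be per-worker queues fed by the receiver.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical converted packet: a session key plus some value to accumulate.
struct Packet {
    std::string session_id;
    int value;
};

// The same session_id always maps to the same worker.
std::size_t shard_for(const std::string& session_id, std::size_t shard_count) {
    return std::hash<std::string>{}(session_id) % shard_count;
}

// Each worker owns one inbox and one map, so no locking is needed.
// Returns the sum of all values stored across the per-worker maps.
long process_partitioned(const std::vector<Packet>& packets, std::size_t worker_count) {
    // Dispatcher step: route every packet to the worker that owns its key.
    std::vector<std::vector<Packet>> inbox(worker_count);
    for (const Packet& p : packets) {
        inbox[shard_for(p.session_id, worker_count)].push_back(p);
    }

    std::vector<std::unordered_map<std::string, long>> maps(worker_count);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < worker_count; ++w) {
        workers.emplace_back([&inbox, &maps, w] {
            // Only this thread touches inbox[w] and maps[w].
            for (const Packet& p : inbox[w]) {
                maps[w][p.session_id] += p.value;
            }
        });
    }
    for (auto& t : workers) t.join();

    long total = 0;
    for (const auto& m : maps) {
        for (const auto& entry : m) total += entry.second;
    }
    return total;
}
```

The trade-off is that load balance now depends on how evenly the keys hash across workers.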
The second-best solution would be to implement a read-write spinlock using either C++11 atomics or GCC's atomic builtins, see https://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Atomic-Builtins.html
A read-write spinlock typically allows multiple parallel read accesses, but only one write access at a time (which of course also blocks all read accesses).
There is also a read-write mutex in Linux, but I found it to be slightly slower than a hand-made implementation.
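For illustration, a read-write spinlock along those lines might look like the sketch below. It assumes C++11 atomics and packs the state into one integer; `RwSpinLock` and `rw_spin_demo` are made-up names, and no fairness is attempted, so a constant stream of readers can starve writers.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// state_ > 0: number of active readers; 0: free; -1: held by one writer.
class RwSpinLock {
public:
    void lock_shared() {
        for (;;) {
            int s = state_.load(std::memory_order_relaxed);
            // Join the readers only if no writer currently holds the lock.
            if (s >= 0 && state_.compare_exchange_weak(
                              s, s + 1, std::memory_order_acquire,
                              std::memory_order_relaxed)) {
                return;
            }
            std::this_thread::yield();
        }
    }
    void unlock_shared() { state_.fetch_sub(1, std::memory_order_release); }

    void lock() {
        for (;;) {
            int expected = 0;  // a writer needs the lock to be completely free
            if (state_.compare_exchange_weak(expected, -1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return;
            }
            std::this_thread::yield();
        }
    }
    void unlock() { state_.store(0, std::memory_order_release); }

private:
    std::atomic<int> state_{0};
};

// Writers keep a == b; readers check that invariant under the shared lock.
// Returns the final value if every read was consistent, otherwise -1.
int rw_spin_demo() {
    RwSpinLock lock;
    int a = 0, b = 0;
    std::atomic<int> torn{0};
    std::vector<std::thread> threads;
    for (int w = 0; w < 2; ++w) {
        threads.emplace_back([&] {
            for (int i = 0; i < 1000; ++i) { lock.lock(); ++a; ++b; lock.unlock(); }
        });
    }
    for (int r = 0; r < 4; ++r) {
        threads.emplace_back([&] {
            for (int i = 0; i < 1000; ++i) {
                lock.lock_shared();
                if (a != b) ++torn;
                lock.unlock_shared();
            }
        });
    }
    for (auto& t : threads) t.join();
    return torn.load() == 0 ? a : -1;
}
```

Whether this beats `pthread_rwlock_t` depends on contention and critical-section length, so it is worth measuring before adopting it.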
Have you looked into lock-free data structures? You can refer to an interesting paper from Andrei Alexandrescu and Maged Michael, Lock-Free Data Structures with Hazard Pointers. Some implementations using similar ideas can for instance be found on the libcds Github repository.
Although they use locking to some extent, Facebook's folly AtomicHashMap and Intel's TBB also provide high-performance concurrent hash maps.
Of course these approaches will require a bit of extra reading and integration work, but if you have determined that your current locking strategy is the bottleneck, it may well be worth the cost.
I know Boost has support for mutexes and lock_guard, which can be used to implement critical sections.
But Windows has a special API for critical sections (see EnterCriticalSection and LeaveCriticalSection) which is a LOT faster than a mutex (for rarely contended, short sections of code).
Hence my question: is it possible in Boost to take advantage of this API, and fall back to a spinlock/mutex/futex-based implementation on other platforms?
The simple answer is no.
Here's some relevant background from an old mailing list thread:
BTW. I am agree that mutex is more universal solution from a
performance point of view. But to be fair - CS are faster in simple
design. I believe that possibility to support them should be at
least taken in account.
This was the article that someone pointed me to. The conclusion was
that CS are only faster if:
There are less than 8 threads total in the process.
You weren't running in the background.
You weren't on an dual processor machine.
To me this means that simple testing yields good CS performance
results, but any real world program is better off with a full blown
mutex.
I'm not adverse to supporting a CS implementation. However, I
originally chose not to for the following reasons:
You get either construction and destruction hits from using a PIMPL
idiom or you must include Windows.h in the Boost.Threads headers,
which I simply don't want to do. (This can be worked around by
emulating a CS ala OPTEX from the MSDN.)
According to this research paper most programs won't benefit from
a CS design.
It's trivial to code a (non-portable) critical_section class that
follows the Mutex model if you truly can make use of this.
For now I think I've made the right choice, though down the road we
may change the implementation to use a critical section or OPTEX.
Bill Kempf
Speaking as someone who helps out maintaining Boost.Thread, and as someone who failed to get an event object into Boost.Thread, I don't think critical sections have ever been added nor would be added to Boost for these reasons:
A Win32 critical section is trivially easy to build using a boost::atomic and a boost::condition_variable, so much so it isn't really worth having an official one. Here is probably the most complex one you could imagine, but extremely configurable including being constexpr ready (don't ask!): https://github.com/ned14/boost.outcome/blob/master/include/boost/outcome/v1/spinlock.hpp#L331
You can build your own simply by matching (Basic)Lockable concept and using atomic compare_exchange (non-x86/x64) or atomic exchange (x86/x64) and then grab it using a lock_guard around the critical section.
Some may object that a win32 critical section is not this. I am afraid it is: it simply spins on an atomic for a spin count, and then lazily tries to allocate a win32 event object which it then waits upon. Nothing special.
As much as you might think critical sections (really user mode mutexes) are better/faster/whatever, they probably are not as great as you might think. boost::mutex is a big vast heavyweight thing on Windows internally using a win32 semaphore as the kernel wait object because of the need to emulate thread cancellation and to behave well in a general purpose use context. It's easy to write a concurrency structure which is faster than another for some single use case, but it is very very hard to write a concurrency structure which is all of:
Faster than a standard implementation in the uncontended case.
Faster than a standard implementation in the lightly contended case.
Faster than a standard implementation in the heavily contended case.
Even if you manage all three of the above, that still isn't enough: you also need some guarantees on worst case progression ordering, so whether certain patterns of locks, waits and unlocks produce predictable outcomes. This is why threading facilities can appear to look slow in narrow use case scenarios, so Boost.Thread much as the STL can appear to be much slower than hand rolled locking code in say an uncontended use case.
Boost.Thread already does substantial work in user mode to avoid going to kernel sleep on Windows. On POSIX any of the major pthreads implementations also does substantial work to avoid kernel sleeps and hence Boost.Thread doesn't replicate that work. In other words, critical sections don't gain you anything in terms of scaling to load behaviours, though inevitably Boost.Thread v4 especially on Windows does a ton load of work a naive implementation does not (the planned rewrite of Boost.Thread is vastly more efficient on Windows as it can assume Windows Vista or above).
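The build-your-own approach mentioned above (a type matching the BasicLockable concept, spinning on an atomic exchange, and held through a lock_guard) can be sketched as follows. This uses `std::atomic` and `std::lock_guard` as stand-ins for the Boost equivalents, which is an assumption of this example. It is a pure spinlock: unlike a real Win32 critical section, it never falls back to a kernel wait object.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Minimal BasicLockable type: provides lock() and unlock().
class SpinLock {
public:
    void lock() {
        // exchange() returns the previous value; true means someone else holds it.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            // Wait on a plain load so contended waiters do not keep writing.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
        }
    }
    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};

// Two threads increment a plain int inside a lock_guard-protected section.
int spin_demo(int iterations) {
    SpinLock mtx;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < iterations; ++i) {
            std::lock_guard<SpinLock> guard(mtx);  // critical section starts here
            ++counter;
        }
    };
    std::thread t1(work);
    std::thread t2(work);
    t1.join();
    t2.join();
    return counter;
}
```

Because `SpinLock` satisfies BasicLockable, it works with any guard written against that concept; the same shape should carry over to `boost::lock_guard`.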
So, it looks like the default Boost mutex doesn't support it, but asio::detail::mutex does.
So I ended up using that:
#include <boost/asio/detail/mutex.hpp>
#include <boost/thread.hpp>

using boost::asio::detail::mutex;
using boost::lock_guard;

int myFunc()
{
    static mutex mtx;
    lock_guard<mutex> lock(mtx);
    . . .
}
If I have
1. mainThread: write data A,
2. Thread_1: read A and write it into a Buffer;
3. Thread_2: read from the Buffer.
How do I synchronize these three threads safely, without much performance loss? Is there an existing solution I can use? I use C/C++ on Linux.
IMPORTANT: the goal is to know the synchronization mechanism or algorithms for this particular case, not how mutex or semaphore works.
First, I'd consider the possibility of building this as three separate processes, using pipes to connect them. A pipe is (in essence) a small buffer with locking handled automatically by the kernel. If you do end up using threads for this, most of your time/effort will be spent on creating nearly an exact duplicate of the pipes that are already built into the kernel.
Second, if you decide to build this all on your own anyway, I'd give serious consideration to following a similar model. You don't need to be slavish about it, but I'd still think primarily in terms of a data structure to which one thread writes data, and from which another reads the data. By strong preference, all the necessary thread locking would be built into that data structure, so most of the code in the thread is quite simple: reading, processing, and writing data. The main difference from using normal Unix pipes would be that in this case you can maintain the data in a more convenient format, instead of all the reading and writing being in text.
As such, what I think you're looking for is basically a thread-safe queue. With that, nearly everything else involved borders on trivial (at least the threading part of it does -- the processing involved may not be, but at least building it with multiple threads isn't adding much to the complexity).
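One possible shape for such a queue is sketched below, assuming C++11 and a `std::mutex` plus `std::condition_variable` for the locking. `BlockingQueue`, its `close()` method, and `pipeline_demo` are hypothetical names chosen for this example; the demo wires up the three threads from the question, with the doubling step standing in for whatever processing Thread_1 does.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> guard(m_);
            items_.push(std::move(value));
        }
        cv_.notify_one();
    }
    // Signals that no more items will be pushed.
    void close() {
        {
            std::lock_guard<std::mutex> guard(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until an item is available. Returns false once the queue is
    // closed and fully drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> guard(m_);
        cv_.wait(guard, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> items_;
    bool closed_ = false;
};

// Main thread writes A, thread 1 moves data from A into Buffer,
// thread 2 reads from Buffer. Returns the sum seen by thread 2.
long pipeline_demo(int count) {
    BlockingQueue<int> a;
    BlockingQueue<int> buffer;
    long sum = 0;
    std::thread thread1([&] {
        int v;
        while (a.pop(v)) buffer.push(v * 2);  // placeholder processing step
        buffer.close();
    });
    std::thread thread2([&] {
        int v;
        while (buffer.pop(v)) sum += v;
    });
    for (int i = 1; i <= count; ++i) a.push(i);
    a.close();
    thread1.join();
    thread2.join();
    return sum;
}
```

All locking lives inside the queue, so the thread bodies contain no synchronization code of their own. The queue here is unbounded; a bounded variant would also make `push` wait when the queue is full.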
It's hard to say how much experience with C/C++ threads you have. I hate to just point to a link but have you read up on pthreads?
https://computing.llnl.gov/tutorials/pthreads/
And for a shorter example with code and simple mutex'es (lock object you need to sync data):
http://students.cs.byu.edu/~cs460ta/cs460/labs/pthreads.html
I would suggest Boost.Thread for this purpose. It is quite a good framework with mutexes and semaphores, and it is multiplatform. Here you can find a very good tutorial about this.
How exactly to synchronize these threads is another matter and needs more information about your problem.
Edit: The simplest solution would be to use two mutexes -- one on A and a second on Buffer. You don't have to worry about deadlocks in this particular case. Just:
MainThread enters mutex_A; Thread1 waits for the mutex to be released.
MainThread leaves mutex_A; Thread1 enters mutex_A and mutex_Buffer, reads from A, and writes to Buffer.
Thread1 releases both mutexes. MainThread can enter mutex_A and write data, and Thread2 can enter mutex_Buffer and safely read data from Buffer.
This is obviously the simplest solution, and probably can be improved, but without more knowledge about the problem, this is the best I can come up with.
Is it ok to check the current thread inside a function?
For example if some non-thread safe data structure is only altered by one thread, and there is a function which is called by multiple threads, it would be useful to have separate code paths depending on the current thread. If the current thread is the one that alters the data structure, it is ok to alter the data structure directly in the function. However, if the current thread is some other thread, the actual altering would have to be delayed, so that it is performed when it is safe to perform the operation.
Or, would it be better to use some boolean which is given as a parameter to the function to separate the different code paths?
Or do something totally different?
What do you think?
You are not making all too much sense. You said a non-thread safe data structure is only ever altered by one thread, but in the next sentence you talk about delaying any changes made to that data structure by other threads. Make up your mind.
In general, I'd suggest wrapping the access to the data structure up with a critical section, or mutex.
It's possible to use such animals as reader/writer locks to differentiate between readers and writers of data structures, but the performance advantage for typical cases usually won't merit the additional complexity associated with their use.
From the way your question is stated, I'm guessing you're fairly new to multithreaded development. I highly suggest sticking with the simplest and most commonly used approaches for ensuring data integrity (most books/articles you read on the issue will mention the same uses for mutexes/critical sections). Multithreaded development is extremely easy to get wrong and can be difficult to debug. Also, what seems like the "optimal" solution very often doesn't buy you the huge performance benefit you might think. It's usually best to implement the simplest approach that will work, then worry about optimizing it after the fact.
There is a trick that could work in case, as you said, the other threads will make changes only once in a while, although it is still rather hackish:
make sure your "master" thread can't be interrupted by the other ones (higher priority, non fair scheduling)
check your thread
if "master", just change
if other, suspend scheduling (if needed, by disabling interrupts), make the change, then re-enable scheduling
really test to see whether there are no issues in your setup.
As you can see, if requirements change a little bit, this could turn out worse than using normal locks.
As mentioned, the simplest solution when two threads need access to the same data is to use some synchronization mechanism (i.e. critical section or mutex).
If you already have synchronization in your design try to reuse it (if possible) instead of adding more. For example, if the main thread receives its work from a synchronized queue you might be able to have thread 2 queue the data structure update. The main thread will pick up the request and can update it without additional synchronization.
The queuing concept can be hidden from the rest of the design through the Active Object pattern. The active object may also be able to publish the data structure changes through the Observer pattern to other interested threads.
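A minimal sketch of the Active Object idea, assuming C++11: callers on any thread enqueue update requests, and a single owner thread applies them, so the data itself needs no lock. `ActiveCounter` is a hypothetical class, and the `int` it guards stands in for the real data structure.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ActiveCounter {
public:
    ActiveCounter() : worker_([this] { run(); }) {}
    ~ActiveCounter() { stop(); }

    // Callable from any thread; the update runs later on the owner thread.
    void add(int delta) {
        post([this, delta] { value_ += delta; });
    }

    // Processes everything queued so far, stops the owner thread, and
    // returns the final value. Call only after all callers are finished.
    int stop() {
        if (worker_.joinable()) {
            post([this] { done_ = true; });
            worker_.join();
        }
        return value_;
    }

private:
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> guard(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

    // Owner thread: the only place where value_ and done_ are modified.
    void run() {
        while (!done_) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> guard(m_);
                cv_.wait(guard, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    int value_ = 0;
    bool done_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Four caller threads each request 1000 increments.
int active_object_demo() {
    ActiveCounter counter;
    std::vector<std::thread> callers;
    for (int t = 0; t < 4; ++t) {
        callers.emplace_back([&counter] {
            for (int i = 0; i < 1000; ++i) counter.add(1);
        });
    }
    for (auto& c : callers) c.join();
    return counter.stop();
}
```

The only synchronization is on the request queue. Callers never learn when their update has been applied, which is where the Observer-style notification would come in.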
How do I design a multithreaded C program to avoid semaphore/mutex concurrency?
There are a few ways.
The best way is to eliminate all shared data between threads. While this isn't always practical, it's always good to eliminate as much shared data as possible.
After that, you need to start looking into lockless programming. Lockless programming is a bit of a fad right now, but the dirty secret is that it's often a much better idea to use lock-based concurrency like mutexes and semaphores. Lockless programming is very hard to get correct. Look up Herb Sutter's articles on the subject, or the Wikipedia page. There are a lot of good resources about lockless synchronization out there.
Somewhere in between is critical sections. If you're programming on Windows, critical sections should be preferred to mutexes as they do some work to avoid the overhead of full mutex locks and unlocks. Try those out first, and if your performance is unacceptable (or you're targeting platforms without critical sections), then you can look into lockless techniques.
It would be best to write lock-free code. It's hard, but possible.
GCC Atomic Builtins
Article by Andrei Alexandrescu (C++)
Be sure to pass data structures between threads so that you always know which thread the data exclusively belongs to. If you use (as mentioned by Dan before), e.g., lockless queues to pass your data around, you shouldn't run into too many concurrency issues (as your code behaves much more like any other code waiting for some data to arrive).
If you are, however, migrating single- to multithreaded code - this is an entirely different beast. It's very hard. And most of the time there are no elegant solutions.
Also, have a look at the InterlockedXXX series of functions for performing atomic operations in a multithreaded environment.
InterlockedIncrement Function
InterlockedDecrement Function
InterlockedExchange Function
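The Interlocked functions are Windows-specific. As a portable point of comparison, and assuming C++11 is available, `std::atomic` offers similar operations. The sketch below is not a drop-in replacement: for example, `fetch_add` returns the previous value, whereas `InterlockedIncrement` returns the incremented one. The function names are made up for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Several threads bump one counter without any mutex.
int atomic_counter_demo(int thread_count, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.fetch_add(1);  // roughly comparable to InterlockedIncrement
            }
        });
    }
    for (auto& t : threads) t.join();
    return counter.load();
}

// exchange() stores a new value and returns the old one atomically,
// roughly comparable to InterlockedExchange.
int atomic_exchange_demo() {
    std::atomic<int> flag{0};
    int previous = flag.exchange(1);
    return previous * 10 + flag.load();  // encodes (old, new) as one number
}
```

These operations only make single variables safe; an update that spans several variables still needs a lock or a more careful lock-free design.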