Reader/Writer: multiple heavy readers, only 1 write per day - c++

I have a huge tbb::concurrent_unordered_map that gets "read" heavily by multiple (~60) threads concurrently.
Once per day I need to clear it (either fully, or selectively). Erasing is obviously not thread-safe in the TBB implementation, so some synchronisation needs to be in place to prevent UB.
However I know for a fact that the "write" will only happen once per day (the exact time is unknown).
I've been looking at std::shared_mutex to allow concurrent reads, but I am afraid that even in an uncontended scenario it might slow things down significantly.
Is there a better solution for this?
Perhaps checking a std::atomic<bool> before locking on the mutex?
Thanks in advance.

It might require a bit of extra work to maintain, but you can use a copy-on-write scheme.
Keep the map in a singleton within a shared pointer.
For "read" operations, have users thread-safely copy the shared pointer and use it for as long as they want.
For "write" operations, create a new instance map in a new shared pointer, fill it with whatever you want and replace the old version it in the singleton.
This way "read" users will still see the old version and can use it safely. Just make sure they occasionally get the newest version from the singleton. Perhaps, even give them a handle that automatically updates the shared pointer once a second or something.
This works in case you don't need the threads to synchronously update all at once.
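A minimal sketch of this scheme, assuming C++20's std::atomic<std::shared_ptr> (on older standards, the std::atomic_load/std::atomic_store overloads for shared_ptr do the same job); the Registry name and Map alias are illustrative, not your actual types:

#include <atomic>
#include <memory>
#include <string>
#include <unordered_map>

using Map = std::unordered_map<std::string, int>;

class Registry {
public:
    // Readers: grab a snapshot. The shared_ptr keeps the old map
    // alive for as long as this reader holds it.
    std::shared_ptr<const Map> snapshot() const {
        return current_.load(std::memory_order_acquire);
    }

    // Writer (once per day): build the replacement off to the side,
    // then publish it with a single atomic store.
    void replace(Map fresh) {
        current_.store(std::make_shared<const Map>(std::move(fresh)),
                       std::memory_order_release);
    }

private:
    std::atomic<std::shared_ptr<const Map>> current_{
        std::make_shared<const Map>()};
};

Readers that cache a snapshot should refresh it periodically, as described above.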
Another scheme: create an atomic boolean to indicate when an update is incoming, and make all threads pause their operations on the map while it is true. Once they have all stopped, you perform the update and let them resume their operation.

This is a perfect job for a read/write lock.
In C++ this can be implemented by having a shared_mutex, then using a unique_lock to lock it for writing, and a shared_lock to lock it for reading. See this post for example code.
The end effect is that readers will never block on each other, reads can all happen at the same time, but if the writer has the lock, everything will block to let the writing operation proceed.
If the writing takes a long time, so long that the once-per-day delay is unacceptable, then you can have the writer create and populate a new copy of the data without taking a lock, then take the write end of the lock and swap the data:
Readers:
Lock mutex with a shared_lock
Do stuff
Unlock
Repeat
Writer:
Create new copy of data
Lock mutex with a unique_lock
Swap data quickly
Unlock
Repeat tomorrow
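A sketch of that reader/writer pattern with std::shared_mutex (C++17); the map type and function names here are placeholders:

#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

using Map = std::unordered_map<std::string, int>;

std::shared_mutex mtx;
Map data;

// Reader: any number of these can run concurrently.
int read_value(const std::string& key) {
    std::shared_lock<std::shared_mutex> lock(mtx);
    auto it = data.find(key);
    return it == data.end() ? -1 : it->second;
}

// Writer: builds the replacement without holding the lock,
// then swaps it in while holding the lock exclusively.
void replace_all(Map fresh) {
    {
        std::unique_lock<std::shared_mutex> lock(mtx);
        data.swap(fresh);   // O(1) swap of internals inside the lock
    }
    // `fresh` (now holding the old contents) is destroyed here,
    // outside the critical section.
}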
A shared_lock on a shared_mutex will be fast. You could use a double-checked locking strategy, but I would not do that until you have done performance testing, and probably also taken a look at the source for shared_lock, because I suspect it does something similar already; a double check on the read end might just add overhead unnecessarily. Also, I don't have enough coffee in me yet to work out double-checked locking in a read/write lock scenario.
There is a threading construct called a spin lock as well, but it's really just an encapsulation of a double-checked lock that repeats the "check" until it clears. It's a good construct, but again you'll want performance analyses and a look at the shared_lock + shared_mutex source, because they might spin already. A good implementation of a spin lock can be found here; it covers some common gotchas. You might have to tweak it to get a read/write spinlock.
Generally speaking though, it's best to use existing constructs during the initial implementation at the very least as a clearly coded proof-of-concept. Then if you know that you're seeing too much read contention, you can optimize from there. But you need the general strategy down first, and 91 times out of a hundred, it's good enough. In this case, no matter what, some manifestation of a read/write lock is what you're going to end up with.

Related

C++ prioritize data synchronization between threads

I have a scenario, where I have a shared data model between several threads. Some threads are going to write to that data model cyclically and other threads are reading from that data model cyclically. But it is guaranteed that writer threads are only writing and reader threads are only reading.
Now the scenario is that reading data shall have higher priority than writing data, due to real-time constraints on the reader side. So it is not acceptable that, e.g., a writer locks the data for too long. A lock with a guaranteed maximum locking time would be acceptable (e.g. it would be acceptable for the reader to wait at most 1 ms until the data is synchronized and available).
So I'm wondering how this is achievable, because the "traditional" locking mechanisms (e.g. std::lock) don't give those real-time guarantees.
Normally in such a scenario you use a reader-writer-lock. This allows either a read by all readers in parallel or a write by a single writer.
But that does nothing to stop a writer from holding the lock for minutes if it so desires. Forcing the writer out of the lock is probably also not a good idea; the object is probably in some inconsistent state mid-change.
There is another synchronization method called read-copy-update that might help. This allows writers to modify the element without being blocked by readers. The drawback is that for some time you might get some readers still reading the old data while others read the new data.
It also might be problematic with multiple writers if they try to change the same member. The slower writer might have computed all the needed updates only to notice that some other thread changed the object. It then has to start over, wasting all the time it already spent.
Note: copying the element can be done in constant time, certainly under 1ms. So you can guarantee readers are never blocked for long. By releasing the write lock first you guarantee readers to read between any 2 writes, assuming the RW lock is designed with the same principle.
So I would suggest another solution I call write-intent-locking:
You start with a RW lock but add a lock to handle write intent. Any writer can acquire the write-intent lock at any time, but only one of them may hold it; it's exclusive. Once a writer holds the write-intent lock, it copies the element and starts modifying the copy. It can take as long as it wants to do that, as it's not blocking any readers. It does block other writers, though.
When all the modifications are done the writer acquires the write lock and then quickly copies, moves or replaces the element with the prepared copy. It then releases the write and write-intent lock, unblocking both the readers and writers that want to access the same element.
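A rough sketch of that write-intent scheme, assuming a single shared element guarded by a std::shared_mutex plus a plain std::mutex for the intent lock; all names are illustrative:

#include <mutex>
#include <shared_mutex>

struct Element { int value = 0; /* ... payload ... */ };

std::shared_mutex rw;      // guards `elem` for readers and writers
std::mutex write_intent;   // exclusive among writers, held while preparing
Element elem;

Element read_element() {
    std::shared_lock lock(rw);   // readers run in parallel
    return elem;
}

void update_element() {
    std::lock_guard intent(write_intent);   // write-intent lock
    Element copy;
    {
        std::shared_lock lock(rw);   // brief shared lock to take the copy
        copy = elem;
    }
    copy.value += 1;   // slow modifications: readers are not blocked
    {
        std::unique_lock lock(rw);   // brief exclusive section
        elem = std::move(copy);      // quick replace
    }
}   // write-intent lock released here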
The way I would approach this is to have two identical copies of the dataset; call them copy A and copy B.
Readers always read from copy B, being careful to lock a reader/writer lock in read-only mode before accessing it.
When a writer-thread wants to update the dataset, it locks copy A (using a regular mutex) and updates it. The writer-thread can take as long as it likes to do this, because no readers are using copy A.
When the writer-thread is done updating copy A, it locks the reader/writer lock (in exclusive/writer-lock mode) and swaps dataset A with dataset B. (This swap should be done by exchanging pointers, and is therefore O(1) fast).
The writer-thread then unlocks the reader/writer-lock (so that any waiting reader-threads can now access the updated data-set), and then updates the other data-set the same way it updated the first data-set. This can also take as long as the writer-thread likes, since no reader-threads are waiting on this dataset anymore.
Finally the writer-thread unlocks the regular mutex, and we're done.
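Condensed into code, that two-copy scheme might look like this; the Dataset type and names are illustrative:

#include <mutex>
#include <shared_mutex>
#include <utility>

struct Dataset { int value = 0; /* ... */ };

std::shared_mutex rw;                 // readers take this in shared mode
std::mutex writer_mtx;                // regular mutex guarding copy A
Dataset* readCopy  = new Dataset{};   // copy B: what readers see
Dataset* writeCopy = new Dataset{};   // copy A: writer's working copy

Dataset read_snapshot() {
    std::shared_lock lock(rw);
    return *readCopy;
}

void update(int newValue) {
    std::lock_guard guard(writer_mtx);  // writer takes as long as it likes
    writeCopy->value = newValue;        // update copy A, readers unaffected
    {
        std::unique_lock lock(rw);
        std::swap(readCopy, writeCopy); // O(1) pointer exchange
    }
    writeCopy->value = newValue;        // bring the other copy up to date
}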
Well, you've got readers, and you've got writers, and you need a lock, so.... how about a readers/writer lock?
The reason I mention that up-front is because (a) you might not be aware of it, but more importantly (b) there's no standard RW lock in C++ (EDIT: my mistake, std::shared_timed_mutex was added in C++14, and std::shared_mutex in C++17), so your thinking about this is perhaps being done in the context of std::mutex. Once you've decided to go with a RW lock, you can benefit from other people's thinking about those locks.
In particular, there's a number of different options for prioritizing threads contending over RW locks. With one option, a thread acquiring a write lock waits until all current reader threads drop the lock, but readers who start waiting after the writer don't get the lock until the writer's done with it.
With that strategy, as long as the writer thread releases and reacquires the lock after each transaction, and as long as the writer completes each transaction within your 1 ms target, readers don't starve.
And if your writer can't promise that, then there is zero alternative but to redesign the writer: either doing more processing before acquiring the lock, or splitting a transaction into multiple pieces where it's safe to drop the lock between each.
If, on the other hand, your writer's transactions take much less than 1 ms, then you might consider skipping the release/reacquire between each one if less than 1 ms has elapsed (purely to reduce the processing overhead of doing so).... but I wouldn't advise it. Adding complexity and special cases and (shudder) wall clock time to your implementation is rarely the most practical way to maximize performance, and rapidly increases the risk of bugs. A simple multithreading system is a reliable multithreading system.
If the model allows writing to be interrupted, then it also allows buffering. Use a FIFO queue and start reading only when there are 50 elements already written. Use (smart) pointers to swap data in the FIFO queue; swapping 8 bytes of pointer takes nanoseconds. Since there is buffering, writing will happen on a different element than the one readers are working with, so there won't be lock contention as long as the producer can keep pace with the consumers.
Why doesn't the reader produce its own data? If you can have n producers and n consumers, each consumer could also produce its own data, without any producer. But this will have different multithreaded scaling. Maybe your algorithm is not applicable here, but if it is, it would be more like independent multi-processing than multi-threading.
Can the writer's work be converted into multiple smaller jobs? Progress within the writer can be reported through an atomic counter. When a reader has a waiting budget, it checks the atomic value, and if progress looks slow it can use the same atomic value to push it instantly to 100%; the writer sees this and quits the lock early.

Is there a C++ design pattern that implements a mechanism or mutex that controls the amount of time a thread can own a locked resource?

I am looking for a way to guarantee that any time a thread locks a specific resource, it is forced to release that resource after a specific period of time (if it has not already released it). Envision a connection where you need to limit the amount of time any specific thread can own that connection for.
I envision this is how it could be used:
{
    std::lock_guard<std::TimeLimitedMutex> lock(this->myTimeLimitedMutex, timeout);
    try {
        // perform some operation with the resource that myTimeLimitedMutex guards.
    }
    catch (MutexTimeoutException ex) {
        // perform cleanup
    }
}
I see that there is a timed_mutex that lets the program timeout if a lock cannot be acquired. I need the timeout to occur after the lock is acquired.
There are already some situations where you get a resource that can be taken away unexpectedly. For instance, a TCP socket: once a socket connection is made, code on each side needs to handle the case where the other side drops the connection.
I am looking for a pattern that handles types of resources that normally time out on their own, but need to be reset when they don't. This does not have to handle every type of resource.
This can't work, and it will never work. In other words, this can never be made. It goes against the whole concept of ownership and atomic transactions. When a thread acquires the lock and performs two transactions in a row, it expects them to become atomically visible to the outside world. In this scenario, it would be entirely possible for the transaction to be torn: the first part of it performed, but the second part not.
What's worse is that since the lock will be forcefully removed, the partially executed transaction will become visible to the outside world before the interrupted thread has any chance to roll back.
This idea goes contrary to every school of multi-threaded thinking.
I support SergeyA's answer. Releasing a locked mutex after a timeout is a bad idea and cannot work. Mutex stands for mutual exclusion, and this is a rock-hard contract which cannot be violated.
But you can do what you want:
Problem: You want to guarantee that your threads do not hold the mutex longer than a certain time T.
Solution: Never lock the mutex for longer than time T. Instead, write your code so that the mutex is locked only for the absolutely necessary operations. It is always possible to give such a time T (modulo the uncertainties and limits imposed by a multitasking and multiuser operating system, of course).
To achieve that (examples):
Never do file I/O inside a locked section.
Never call a system call while a mutex is locked.
Avoid sorting a list while a mutex is locked (*).
Avoid doing a slow operation on each element of a list while a mutex is locked (*).
Avoid memory allocation/deallocation while a mutex is locked (*).
There are exceptions to these rules, but the general guideline is:
Make your code slightly less optimal (e.g. do some redundant copying inside the critical section) to make the critical section as short as possible. This is good multithreading programming.
(*) These are just examples of operations where it is tempting to lock the entire list, do the operation, and then unlock. Instead, it is advisable to take a local copy of the list and clear the original while the mutex is locked, ideally using the swap() operation offered by most STL containers, and then do the slow operation on the local copy outside the critical section. This is not always possible, but it is always worth considering. Sorting has quadratic complexity in the worst case and usually needs random access to the entire list; it is better to sort (a copy of) the list outside the critical section and later check whether elements need to be added or removed. Memory allocations also have quite some complexity behind them, so massive allocation/deallocation inside a critical section should be avoided.
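A small sketch of that swap() trick, assuming a list of work items protected by a std::mutex; the names and the process() helper are hypothetical:

#include <mutex>
#include <vector>

void process(int item) { /* hypothetical slow operation */ }

std::mutex mtx;
std::vector<int> pending;   // shared: producers push work here

void drain_and_process() {
    std::vector<int> local;
    {
        std::lock_guard<std::mutex> lock(mtx);
        local.swap(pending);   // O(1): the critical section stays tiny
    }
    for (int item : local)     // slow work happens outside the lock
        process(item);
}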
You can't do that with only C++.
If you are using a Posix system, it can be done.
You'll have to trigger a SIGALRM signal that's only unmasked for the thread that will time out. In the signal handler, you'll have to set a flag and use longjmp to return to the thread code.
In the thread code, setjmp returns a second time only if the signal was triggered, so at that point you can throw the Timeout exception.
Please see this answer for how to do that.
Also, on linux, it seems you can directly throw from the signal handler (so no longjmp/setjmp here).
BTW, if I were you, I would code the opposite. Think about it: You want to tell a thread "hey, you're taking too long, so let's throw away all the (long) work you've done so far so I can make progress".
Ideally, you should have your long-running thread be more cooperative, doing something like "I've done A of an ABCD task; let's release the mutex so others can progress on A. Then let's check if I can take it again to do B," and so on.
You probably want to be more fine-grained (have more mutexes on smaller objects, but make sure you lock them in the same order) or use RW locks (so that other threads can use the objects when you're not modifying them), etc...
Such an approach cannot be enforced because the holder of the mutex needs the opportunity to clean up anything which is left in an invalid state part way through the transaction. This can take an unknown arbitrary amount of time.
The typical approach is to release the lock when doing long tasks, and re-acquire it as needed. You have to manage this yourself, as everyone will have a slightly different approach.
The only situation I know of where this sort of thing is accepted practice is at the kernel level, especially with respect to microcontrollers (which either have no kernel, or are all kernel, depending on who you ask). You can set an interrupt which modifies the call stack, so that when it is triggered it unwinds the particular operations you are interested in.
"Condition" variables can have timeouts. This allows you to wait until a thread voluntarily releases a resource (with notify_one() or notify_all()), but the wait itself will timeout after a specified fixed amount of time.
Examples in the Boost documentation for "conditions" might make this more clear.
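The standard library equivalent is std::condition_variable::wait_for; a minimal sketch with illustrative names:

#include <chrono>
#include <condition_variable>
#include <mutex>

std::mutex mtx;
std::condition_variable cv;
bool resource_free = false;

bool wait_for_resource() {
    std::unique_lock<std::mutex> lock(mtx);
    // Returns false if 100 ms elapse before resource_free becomes true.
    return cv.wait_for(lock, std::chrono::milliseconds(100),
                       [] { return resource_free; });
}

void release_resource() {
    {
        std::lock_guard<std::mutex> lock(mtx);
        resource_free = true;
    }
    cv.notify_one();   // wake a waiting thread, if any
}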
If you want to force a release, you have to write the code which will force it though. This could be dangerous. The code written in C++ can be doing some pretty close-to-the-metal stuff. The resource could be accessing real hardware and it could be waiting on it to finish something. It may not be physically possible to end whatever the program is stuck on.
However, if it is possible, then you can handle it in the thread in which the wait() times out.

Thread read versus write locking

I'm writing a CPU intensive program in C++ that has several threads needing to access a shared data structure, so locking will be required. To maximize throughput, I want to keep the bottleneck to a minimum. It looks like maybe nine times out of ten it will only be necessary to read the data structure, and one time out of ten it will be necessary to modify it.
Is there a way to have threads take read or write locks, so that write locks block everything but read locks don't block each other?
A portable solution would be ideal, but if there is one solution for Windows and another for Linux that would be okay.
Yes, this is a common situation that can be solved with a reader-writer lock.
Note that depending on the dynamic properties of your program, you may need to be careful about writer starvation. If there are enough readers that their attempts to read always overlap (or overlap for a long time), then a simple implementation of a reader-writer lock will "starve" the writer by making the writer wait until there are no readers reading. In a more advanced implementation, a writer request will be conceptually inserted into the queue before subsequent readers, allowing the writer to have a chance to access after all the previously active readers finish.
Most implementations require you to know ahead of time whether you want a read lock or a write lock. Some implementations allow you to "upgrade" a read lock into a write lock without having to release the read lock first (which would give another writer an opportunity to enter the lock).
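On POSIX systems this is pthread_rwlock_t (C++17's std::shared_mutex is the portable equivalent). A sketch; note that the writer-preference attribute shown is a non-portable glibc extension (hence the _np suffix), used here to address the starvation issue described above:

#include <pthread.h>

pthread_rwlock_t rwlock;
int shared_value = 0;

void init() {
    // Ask glibc for writer preference so heavy read traffic
    // cannot starve the writer indefinitely.
    pthread_rwlockattr_t attr;
    pthread_rwlockattr_init(&attr);
    pthread_rwlockattr_setkind_np(
        &attr, PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
    pthread_rwlock_init(&rwlock, &attr);
    pthread_rwlockattr_destroy(&attr);
}

int reader() {
    pthread_rwlock_rdlock(&rwlock);   // many readers at once
    int v = shared_value;
    pthread_rwlock_unlock(&rwlock);
    return v;
}

void writer(int v) {
    pthread_rwlock_wrlock(&rwlock);   // exclusive
    shared_value = v;
    pthread_rwlock_unlock(&rwlock);
}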

Fast thread synchronization

I'm working on a system that has multiple threads and one shared object. A number of threads do read operations very often, but write operations are rare, maybe 3 to 5 per day.
I'm using an rwlock for synchronization, but the lock acquisition isn't fast enough, since it happens all the time. So I'm looking for a faster way of doing it.
Maybe a way of making the write function atomic, or locking all threads during the write. Portability is not a hard requirement; I'm using Linux with GCC 4.6.
Have you considered using read-copy-update with liburcu? This lets you avoid atomic operations and locking entirely on the read path, at the expense of making writes quite a bit slower. Note that some readers might see stale data for a short time, though; if you need the update to take effect immediately, it may not be the best option for you.
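A rough sketch of the classic liburcu API (link with -lurcu; newer releases also ship flavor-specific headers, so check the library's documentation). The Config struct and update logic are illustrative:

#include <urcu.h>   // classic liburcu API

struct Config { int value; };

Config* global_cfg = nullptr;   // always accessed via the RCU macros

// Reader fast path: no locks, no atomic read-modify-write.
// Each reader thread must call rcu_register_thread() once at startup.
int read_config() {
    rcu_read_lock();
    Config* c = rcu_dereference(global_cfg);
    int v = c ? c->value : 0;
    rcu_read_unlock();
    return v;
}

// Writer (rare): publish a new version, wait for old readers to drain.
// Concurrent writers would additionally need to serialize on a mutex.
void update_config(int v) {
    Config* fresh = new Config{v};
    Config* old = global_cfg;
    rcu_assign_pointer(global_cfg, fresh);
    synchronize_rcu();   // blocks until pre-existing readers finish
    delete old;
}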
You might want to use multiple objects rather than a single one. Instead of actually sharing the object, create an object that holds the object and an atomic count, then share a pointer to this structure among the threads.
[Assuming that there is only one writer] Each reader will get the pointer, atomically increment the counter, and use the object; after reading, it atomically decrements the counter. The writer will create a new object that holds a copy of the original and modify it, then perform an atomic swap of the two pointers. Now the problem is releasing the old object, which is why you need the count of readers: the writer keeps checking the count on the old object until all readers have completed their work, at which point it can delete the old object.
If there are multiple writers (i.e. more than one thread can update the variable), you can follow the same approach, but writers need to do a compare-and-swap exchange of the pointer. If the pointer from which the copy was made has changed, the writer restarts the process (deletes its new object, copies again from the current pointer, and retries the CAS).
Maybe you could use a spinlock: the threads busy-wait until unlocked. If the threads aren't locked for long, it can be much more efficient than mutexes, since locking and unlocking complete in fewer instructions.
Spinlocks are part of POSIX pthreads, although optional, so I don't know whether they're implemented on your system. I used them in a C program on Ubuntu, but had to compile with -std=gnu99 instead of c99.
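A minimal POSIX spinlock sketch (the shared counter is illustrative):

#include <pthread.h>

pthread_spinlock_t spin;
long counter = 0;

void init() {
    pthread_spin_init(&spin, PTHREAD_PROCESS_PRIVATE);
}

void increment() {
    pthread_spin_lock(&spin);   // busy-waits instead of sleeping
    ++counter;                  // keep the critical section tiny
    pthread_spin_unlock(&spin);
}

void destroy() {
    pthread_spin_destroy(&spin);
}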

Threads and a simple deadlock cure

When dealing with threads (specifically in C++) using mutex locks and semaphores, is there a simple rule of thumb to avoid deadlocks and have nice, clean synchronization?
A good simple rule of thumb is to always obtain your locks in a consistent, predictable order from everywhere in your application. For example, if your resources have names, always lock them in alphabetical order. If they have numeric ids, always lock from lowest to highest. The exact order or criterion is arbitrary; the key is to be consistent. That way you'll never have a deadlock situation. E.g.:
Thread 1 locks resource A
Thread 2 locks resource B
Thread 1 waits to obtain a lock on B
Thread 2 waits to obtain a lock on A
Deadlock
The above can never happen if you follow the rule of thumb outlined above. For a more detailed discussion, see the Wikipedia entry on the Dining Philosophers problem.
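In modern C++ there are two easy ways to encode this rule: lock in a fixed global order by hand, or let std::scoped_lock (C++17) acquire several mutexes with its built-in deadlock-avoidance algorithm. A sketch with illustrative names:

#include <mutex>

std::mutex mtxA;   // resource A: by convention, always lock before B
std::mutex mtxB;   // resource B

void manual_order() {
    std::lock_guard<std::mutex> a(mtxA);   // every thread takes A first...
    std::lock_guard<std::mutex> b(mtxB);   // ...then B, so no cycle can form
    // ... use both resources ...
}

void deadlock_free() {
    // std::scoped_lock acquires both mutexes without deadlock regardless
    // of the order in which they are named (it uses std::lock internally).
    std::scoped_lock lock(mtxA, mtxB);
    // ... use both resources ...
}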
If at all possible, design your code so that you never have to lock more then a single mutex/semaphore at a time.
If that's not possible, make sure to always lock multiple mutex/semaphores in the same order. So if one part of the code locks mutex A and then takes semaphore B, make sure that no other part of the code takes semaphore B and then locks mutex A.
Try to avoid acquiring one lock while trying to acquire another. This can result in a circular dependency and cause a deadlock.
If it is unavoidable, then at least the order of acquiring locks should be predictable.
Use RAII (to make sure a lock is released properly, including when an exception is thrown).
There is no simple deadlock cure.
Acquire locks in an agreed order: if all calls acquire A->B->C, then no deadlock can occur. Deadlocks can occur only if the locking order differs between two threads (one acquires A->B, the second B->A).
In practice it is hard to choose an order between arbitrary objects in memory. On a simple, trivial project it is possible, but on large projects with many individual contributors it is very hard. A partial solution is to create hierarchies by ranking the locks: all locks in module A have rank 1, all locks in module B have rank 2. One can acquire a lock of rank 2 while holding locks of rank 1, but not vice versa. Of course you need a framework around the locking primitives that tracks and validates the ranking.
One way to ensure the ordering that other folks have talked about is to acquire locks in an order defined by their memory address. If at any point, you try to acquire a lock that should have been earlier in the sequence, you release all the locks and start over.
With a little work, it's possible to do this nearly automatically with some wrapper classes around the system primitives.
There's no practical cure. Specifically, there's no way to simply test code for being synchronizationally correct, or to have your programmers obey the rules of the gentleman with the green V.
There's no way to properly test multithreaded code, because the program logic may depend on the timing of lock acquisition, and therefore be different from execution to execution, somehow invalidating the concept of QA.
I would say
prefer using threads only as a performance optimization for multi-core machines
only optimize performance when you are sure you need this performance
you may use threads to simplify program logic, but only when you are absolutely sure what you are doing. Be extra careful, and keep all locks confined to a very small piece of code. Do not let any newbies near such code.
never use threads in a mission-critical system, such as flying an aircraft or operating dangerous machinery
in all cases, threads are seldom cost-effective, due to higher debug and QA costs
If you determined to do threads or maintaining existing codebase:
confine all locks to small and simple pieces of code, which operate on primitives
avoid function calls, or letting the program flow wander away to where the fact of being executed under a lock is not immediately visible. Such functions will be changed by future authors, widening your lock span without your control.
get locks inside objects to reduce locking scope, wrap non-thread-safe 3rd-party objects with your own thread-safe interfaces.
never send synchronous notifications (callbacks) when executing under lock
use only RAII locks, to reduce the cognitive load when thinking "how else can we exit from here", as in exceptions, etc.
A few words on how to avoid multi-threading.
A single-threaded design usually involves some heart-beat function provided by program components and called in a loop (the heartbeat cycle) which, when called, gives all components a chance to do their next piece of work and surrender control back again. What algorithmists like to think of as "loops" inside the components will turn into state machines that identify the next thing to do when called. State is best maintained as member data of the respective objects.
There are plenty of simple "deadlock cures". But none that are easy to apply and work universally.
The simplest of all, of course, is "never have more than one thread".
Assuming you have a multithreaded application though, there are still a number of solutions:
You can try to minimize shared state and synchronization. Two threads that just run in parallel and never interact can never deadlock. Deadlocks only occur when multiple threads try to access the same resource. Why do they do that? Can that be avoided? Can the resource be restructured or divided so that for example, one thread can write to it, and other threads are asynchronously passed the data they need?
Perhaps the resource can be copied, giving each thread its own private copy to work with?
And as already mentioned by every other answer, if and when you try to acquire locks, do so in a globally consistent order. To simplify this, you should try to ensure that all the locks a thread is going to need are acquired as a single operation. If a thread needs to acquire locks A, B and C, it should not make three lock() calls at different times and from different places. You'll get confused, you won't be able to keep track of which locks are held by the thread and which ones it has yet to acquire, and then you'll mess up the order. If you can acquire all the locks you need at once, then you can factor it out into a separate function call which acquires the N locks and does so in the correct order to avoid deadlocks.
Then there are the more ambitious approaches: Techniques like CSP make threading extremely simple and easy to prove correct, even with thousands of concurrent threads. But it requires you to structure your program very differently from what you're used to.
Transactional Memory is another promising option, and one that may be easier to integrate into conventional programs. But production-quality implementations are still very rare.
Read Deadlock: the Problem and a Solution.
"The common advice for avoiding deadlock is to always lock the two mutexes in the same order: if you always lock mutex A before mutex B, then you'll never deadlock. Sometimes this is straightforward, as the mutexes are serving different purposes, but other times it is not so simple, such as when the mutexes are each protecting a separate instance of the same class".
If you want to attack the possibility of a deadlock you must attack one of the 4 crucial conditions for the existence of a deadlock.
The 4 conditions for a deadlock are:
1. Mutual Exclusion - only one thread can enter the critical section at a time.
2. Hold and Wait - a thread doesn't release the resources it has acquired until it finishes its job, even if the other resources it needs are unavailable.
3. No preemption - a resource cannot be forcibly taken away from the thread holding it.
4. Resource Cycle - there has to be a circular chain of threads, each waiting for a resource held by the next.
The easiest condition to attack is the resource cycle by making sure that no cycles are possible.