I am looking for the best way to effectively share chunks of data between two (or more) processes in a writer-biased reader/writer model.
My current tests are with boost::interprocess. I have created some managed_shared_memory and am attempting to lock access to the data chunk by using an interprocess mutex stored in the shared memory.
However, even when using sharable_lock on the reader and upgradable_lock on the writer, the client will read fragmented values during write operations instead of blocking. While doing a similar reader/writer setup between threads in a single process, I used upgrade_to_unique_lock to solve this issue. However, I have not found its boost::interprocess equivalent. Does one exist?
Server (writer):
while (1) {
// Get upgrade lock on the mutex
upgradable_lock <MutexType> lock(myMutex);
// Need 'upgrade_to_unique_lock' here so shared readers will block until
// write operation is finished.
// Write values here
}
Client (reader)
while (1)
{
// Get shared access
sharable_lock <MutexType> lock(myMutex);
// Read p1's data here -- occasionally invalid!
}
I guess the bigger question at hand is this: is an interprocess mutex even the proper way to access shared memory between processes in a writer-biased setup?
Note: using Boost 1.44.0
All Boost.Interprocess upgradable locks support upgrade per this. Definition here.
Regarding your broader question - I would think that this is precisely what you want. Readers can still work concurrently, and you have to prevent concurrent writes. Unless you can partition the shared memory such that more constrained access is guaranteed, this looks the best.
Solution by OP.
The answer, as stated in the question comments is to use the member function unlock_upgradable_and_lock. If there is an boost::interprocess analog to upgrade_to_unique_lock, I don't know where it is. But the writer() function can be rewritten as:
while (1) {
// Get upgrade lock on the mutex
myMutex.lock_upgradable();
// Get exclusive access and block everyone else
myMutex.unlock_upgradable_and_lock();
// Write values here
// Unlock the mutex (and stop blocking readers)
myMutex.unlock();
}
Related
I have inherited an application which I'm trying to improve the performance of and it currently uses mutexes (std::lock_guard<std::mutex>) to transfer data from one thread to another. One thread is a low-frequency (slow) one which simply modifies the data to be used by the other.
The other thread (which we'll call fast) has rather stringent performance requirements (it needs to do maximum number of cycles per second possible) and we believe this is being impacted by the use of the mutexes.
Basically, the current logic is:
slow thread: fast thread:
occasionally: very-often:
claim mutex claim mutex
change data use data
release mutex release mutex
In order to get the fast thread running at maximum throughput, I'd like to experiment with removing the number of mutex locks it has to do.
I suspect a variation of the double locking check pattern may be of use here. I know it has serious issues with bi-directional data flow (or singleton creation) but the areas of responsibility in my case are a little more limited in terms of which thread performs which operations (and when) on the shared data.
Basically, the slow thread sets up the data and never reads or writes to it again unless a new change comes in. The fast thread uses and changes the data but never expects to pass any information back to the other thread. In other words, ownership mostly flows strictly one way.
I wanted to see if anyone could pick any holes in the strategy I'm thinking of.
The new idea is to have two sets of data, one current and one pending. There is no need for a queue in my case as incoming data overwrites previous data.
The pending data will only ever be written to by the slow thread under the control of the mutex and it will have an atomic flag to indicate that it has written and relinquished control (for now).
The fast thread will continue to use current data (without the mutex) until such time as the atomic flag is set. Since it is responsible for transferring pending to current, it can ensure the current data is always consistent.
At the point where the flag is set, it will lock the mutex and, transfer pending to current, clear the flag, unlock the mutex and carry on.
So, basically, the fast thread runs at full speed and only does mutex locks when it knows the pending data needs to be transferred.
Getting into more concrete details, the class will have the following data members:
std::atomic_bool m_newDataReady;
std::mutex m_protectData;
MyStruct m_pendingData;
MyStruct m_currentData;
The method for receiving new data in the slow thread would be:
void NewData(const MyStruct &newData) {
std::lock_guard<std::mutex> guard(m_protectData);
m_newDataReady = false;
Transfer(m_newData, 'to', m_pendingData);
m_newDataReady = true;
}
Clearing the flag prevents the fast thread from even trying to check for new data until the immediate transfer operation is complete.
The fast thread is a little trickier, using the flag to keep mutex locks to a minimum:
while (true) {
if (m_newDataReady) {
std::lock_guard<std::mutex> guard(m_protectData);
if (m_newDataReady) {
Transfer(m_pendingData, 'to', m_currentData);
m_newDataReady = false;
}
}
Use (m_currentData);
}
Now it appears to me that the use of this method in the fast thread could improve performance quite a bit:
There is only one place where the atomic flag is used outside the control of the mutex and the fact that it's an atomic means its state should be consistent there.
Even if it's not consistent, the second check inside the mutex-locked area should provide a safety valve (it's rechecked when we know it's consistent).
The transfer of data is only ever performed under the control of the mutex so that should always be consistent.
The outer loop in the fast thread means that unnecessary mutex locks will be avoided - they'll only be done if the flag is true (or "half-true", a possibly inconsistent state).
The inner if will take care of that "half-true" possibility that, between checking the and locking the mutex, the flag has been cleared.
I can't see any holes in this strategy but, given I'm only just getting into atomics/threading in the standard-C++ world, it may be I'm missing something.
Are there any clear problems in using this method?
I am new to threading . Correct me if I am wrong that mutex locks the access to a shared data structure so that it cannot be used by other threads until it is unlocked . So, lets consider that there are 2 or more shared data structures . So , should I make different mutex objects for different data structures ? If no ,then how std::mutex will know which object it should lock ? What If I have to lock more than 1 objects at the same time ?
There are several points in your question that can be made more precise. Perhaps clearing this will solve things for you.
To begin with, a mutex, by itself, does not lock access to anything. It is basically something that your code can lock and unlock, and some "magic" ensures that only one thread can lock it at a time.
If, by convention, you decide that any code accessing some data structure foo will first begin by locking a mutex foo_mutex, then it will have the effect of protecting this data structure.
So, having said that, regarding your questions:
It depends on whether the two data structures need to be accessed together or not (e.g., can updating one without the other leave the system in an inconsistent state). If so, you should lock them with a single mutex. If not, you can improve parallelism by using two.
The mutex does not lock anything. It is you who decide by convention whether you can access 1, 2, or a million data structures, while holding it.
If you always needs to access both structures then it could be considered as a single resource so only a single lock is needed.
If you sometimes, even just once, need to access one of the structures independently then they can no longer be considered a single resource and you might need two locks. Of course, a single lock could still be sufficient, but then that lock would lock both resources at once, prohibiting other threads from accessing any of the structures.
Mutex does not "know" anything other than about itself. The lock is performed on mutex itself.
If there are two objects (or pieces of code) that need synchronized access (but can be accessed at the same time) then you have the liberty to use just one mutex for both or one for each. If you use one mutex they will not be accessed at the same time from two different threads.
If it cannot happen that access to one object is required while accessing the other object then you can use two mutexes, one for each. But if it can happen that one object must be accessed while the thread already holds another mutex then care must be taken that code never can reach a deadlock, where two threads hold one mutex each, and both at the same time wait that the other mutex is released.
My problem is quite common I suppose, but it drives me crazy:
I have a multi-threaded application with 5 threads. 4 of these threads do their job, like network communication and local file system access, and then all write their output to a data structure of this form:
struct Buffer {
std::vector<std::string> lines;
bool has_been_modified;
}
The 5th thread prints these buffer/structures to the screen:
Buffer buf1, buf2, buf3, buf4;
...
if ( buf1.has_been_modified ||
buf2.has_been_modified ||
buf3.has_been_modified ||
buf4.has_been_modified )
{
redraw_screen_from_buffers();
}
How do I protect the buffers from being overwritten while they are either being read from or written to?
I can't find a proper solution, although I think this has to be a quiet common problem.
Thanks.
You should use a mutex. The mutex class is std::mutex. With C++11 you can use std::lock_guard<std::mutex> to encapsulate the mutex using RAII. So you would change your Buffer struct to
struct Buffer {
std::vector<std::string> lines;
bool has_been_modified;
std::mutex mutex;
};
and whenever you read or write to the buffer or has_been_modified you would do
std::lock_guard<std::mutex> lockGuard(Buffer.mutex); //Do this for each buffer you want to access
... //Access buffer here
and the mutex will be automatically released by the lock_guard when it is destroyed.
You can read more about mutexes here.
You can use a mutex (or mutexes) around the buffers to ensure that they're not modified by multiple threads at the same time.
// Mutex shared between the multiple threads
std::mutex g_BufferMutex;
void redraw_screen_from_buffers()
{
std::lock_guard<std::mutex> bufferLockGuard(g_BufferMutex);
//redraw here after mutex has been locked.
}
Then your buffer modification code would have to lock the same mutex when the buffers are being modified.
void updateBuffer()
{
std::lock_guard<std::mutex> bufferLockGuard(g_BufferMutex);
// update here after mutex has been locked
}
This contains some mutex examples.
What appears you want to accomplish is to have multiple threads/workers and one observer. The latter needs to do its job only when all workers are done/signal. If this is the case then check code in this SO q/a. std::condition_variable - Wait for several threads to notify observer
mutex are a very nice thing when trying to avoid dataraces, and I'm sure the answer posted by #Phantom will satisfy most people. However, one should know that this is not scalable to large systems.
By locking you are synchronising your threads. As only one at a time can be accessing the vector, on thread writting to the container will cause the other one to wait for it to finish ... with may be good for you but causes serious performance botleneck when high performance is needed.
The best solution would be to use a more complexe lock free structure. Unfortunatelly I don't think there is any standart lockfree structure in the STL. One exemple of lockfree queue is available here
Using such a structure, your 4 working threads would be able to enqueue messages to the container while the 5th one would dequeue them, without any dataraces
More on lockfree datastructure can be found here !
There is a widely known way of locking multiple locks, which relies on choosing fixed linear ordering and aquiring locks according to this ordering.
That was proposed, for example, in the answer for "Acquire a lock on two mutexes and avoid deadlock". Especially, the solution based on address comparison seems to be quite elegant and obvious.
When I tried to check how it is actually implemented, I've found, to my surprise, that this solution in not widely used.
To quote the Kernel Docs - Unreliable Guide To Locking:
Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.
PThreads doesn't seem to have such a mechanism built in at all.
Boost.Thread came up with
completely different solution, lock() for multiple (2 to 5) mutexes is based on trying and locking as many mutexes as it is possible at the moment.
This is the fragment of the Boost.Thread source code (Boost 1.48.0, boost/thread/locks.hpp:1291):
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
unsigned const lock_count=3;
unsigned lock_first=0;
for(;;)
{
switch(lock_first)
{
case 0:
lock_first=detail::lock_helper(m1,m2,m3);
if(!lock_first)
return;
break;
case 1:
lock_first=detail::lock_helper(m2,m3,m1);
if(!lock_first)
return;
lock_first=(lock_first+1)%lock_count;
break;
case 2:
lock_first=detail::lock_helper(m3,m1,m2);
if(!lock_first)
return;
lock_first=(lock_first+2)%lock_count;
break;
}
}
}
where lock_helper returns 0 on success and number of mutexes that weren't successfully locked otherwise.
Why is this solution better, than comparing addresses or any other kind of ids? I don't see any problems with pointer comparison, which can be avoided using this kind of "blind" locking.
Are there any other ideas on how to solve this problem on a library level?
From the bounty text:
I'm not even sure if I can prove correctness of the presented Boost solution, which seems more tricky than the one with linear order.
The Boost solution cannot deadlock because it never waits while already holding a lock. All locks but the first are acquired with try_lock. If any try_lock call fails to acquire its lock, all previously acquired locks are freed. Also, in the Boost implementation the new attempt will start from the lock failed to acquire the previous time, and will first wait till it is available; it's a smart design decision.
As a general rule, it's always better to avoid blocking calls while holding a lock. Therefore, the solution with try-lock, if possible, is preferred (in my opinion). As a particular consequence, in case of lock ordering, the system at whole might get stuck. Imagine the very last lock (e.g. the one with the biggest address) was acquired by a thread which was then blocked. Now imagine some other thread needs the last lock and another lock, and due to ordering it will first get the other one and will wait on the last lock. Same can happen with all other locks, and the whole system makes no progress until the last lock is released. Of course it's an extreme and rather unlikely case, but it illustrates the inherent problem with lock ordering: the higher a lock number the more indirect impact the lock has when acquired.
The shortcoming of the try-lock-based solution is that it can cause livelock, and in extreme cases the whole system might also get stuck for at least some time. Therefore it is important to have some back-off schema that make pauses between locking attempts longer with time, and perhaps randomized.
Sometimes, lock A needs to be acquired before lock B does. Lock B might have either a lower or a higher address, so you can't use address comparison in this case.
Example: When you have a tree data-structure, and threads try to read and update nodes, you can protect the tree using a reader-writer lock per node. This only works if your threads always acquire locks top-down root-to-leave. The address of the locks does not matter in this case.
You can only use address comparison if it does not matter at all which lock gets acquired first. If this is the case, address comparison is a good solution. But if this is not the case you can't do it.
I guess the Linux kernel requires certain subsystems to be locked before others are. This cannot be done using address comparison.
The "address comparison" and similar approaches, although used quite often, are special cases. They works fine if you have
a lock-free mechanism to get
two (or more) "items" of the same kind or hierarchy level
any stable ordering schema between those items
For example: You have a mechanism to get two "accounts" from a list. Assume that the access to the list is lock-free. Now you have pointers to both items and want to lock them. Since they are "siblings" you have to choose which one to lock first. Here the approach using addresses (or any other stable ordering schema like "account id") is OK.
But the linked Linux text talks about "lock hierarchies". This means locking not between "siblings" (of the same kind) but between "parent" and "children" which might be from different types. This may happen in actual tree structures as well in other scenarios.
Contrived example: To load a program you must
lock the file inode,
lock the process table
lock the destination memory
These three locks are not "siblings" not in a clear hierarchy. The locks are also not taken directly one after the other - each subsystem will take the locks at free will. If you consider all usecases where those three (and more) subsystems interact you see, that there is no clear, stable ordering you can think of.
The Boost library is in the same situation: It strives to provide generic solutions. So they cannot assume the points from above and must fall back to a more complicated strategy.
One scenario when address compare will fail is if you use the proxy pattern.
You can delegate the locks to the same object and the addresses will be different.
Consider the following example
template<typename MutexType>
class MutexHelper
{
MutexHelper(MutexType &m) : _m(m) {}
void lock()
{
std::cout <<"locking ";
m.lock();
}
void unlock()
{
std::cout <<"unlocking ";
m.unlock();
}
MutexType &_m;
};
if the function
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3);
will actually use address compare the following code ca produce a deadlock
Mutex m1;
Mutex m1;
thread1
MutexHelper hm1(m1);
MutexHelper hm2(m2);
lock(hm1, hm2);
thread2:
MutexHelper hm2(m2);
MutexHelper hm1(m1);
lock(hm1, hm2);
EDIT:
this is an interesting thread that share some light on boost::lock implementation
thread-best-practice-to-lock-multiple-mutexes
Address compare does not work for inter-process shared mutexes (named synchronization objects).
I have a thread pool with some threads (e.g. as many as number of cores) that work on many objects, say thousands of objects. Normally I would give each object a mutex to protect access to its internals, lock it when I'm doing work, then release it. When two threads would try to access the same object, one of the threads has to wait.
Now I want to save some resources and be scalable, as there may be thousands of objects, and still only a hand full of threads. I'm thinking about a class design where the thread has some sort of mutex or lock object, and assigns the lock to the object when the object should be accessed. This would save resources, as I only have as much lock objects as I have threads.
Now comes the programming part, where I want to transfer this design into code, but don't know quite where to start. I'm programming in C++ and want to use Boost classes where possible, but self written classes that handle these special requirements are ok. How would I implement this?
My first idea was to have a boost::mutex object per thread, and each object has a boost::shared_ptr that initially is unset (or NULL). Now when I want to access the object, I lock it by creating a scoped_lock object and assign it to the shared_ptr. When the shared_ptr is already set, I wait on the present lock. This idea sounds like a heap full of race conditions, so I sort of abandoned it. Is there another way to accomplish this design? A completely different way?
Edit:
The above description is a bit abstract, so let me add a specific example. Imagine a virtual world with many objects (think > 100.000). Users moving in the world could move through the world and modify objects (e.g. shoot arrows at monsters). When only using one thread, I'm good with a work queue where modifications to objects are queued. I want a more scalable design, though. If 128 core processors are available, I want to use all 128, so use that number of threads, each with work queues. One solution would be to use spatial separation, e.g. use a lock for an area. This could reduce number of locks used, but I'm more interested if there's a design which saves as much locks as possible.
You could use a mutex pool instead of allocating one mutex per resource or one mutex per thread. As mutexes are requested, first check the object in question. If it already has a mutex tagged to it, block on that mutex. If not, assign a mutex to that object and signal it, taking the mutex out of the pool. Once the mutex is unsignaled, clear the slot and return the mutex to the pool.
Without knowing it, what you were looking for is Software Transactional Memory (STM).
STM systems manage with the needed locks internally to ensure the ACI properties (Atomic,Consistent,Isolated). This is a research activity. You can find a lot of STM libraries; in particular I'm working on Boost.STM (The library is not yet for beta test, and the documentation is not really up to date, but you can play with). There are also some compilers that are introducing TM in (as Intel, IBM, and SUN compilers). You can get the draft specification from here
The idea is to identify the critical regions as follows
transaction {
// transactional block
}
and let the STM system to manage with the needed locks as far as it ensures the ACI properties.
The Boost.STM approach let you write things like
int inc_and_ret(stm::object<int>& i) {
BOOST_STM_TRANSACTION {
return ++i;
} BOOST_STM_END_TRANSACTION
}
You can see the couple BOOST_STM_TRANSACTION/BOOST_STM_END_TRANSACTION as a way to determine a scoped implicit lock.
The cost of this pseudo transparency is of 4 meta-data bytes for each stm::object.
Even if this is far from your initial design I really think is what was behind your goal and initial design.
I doubt there's any clean way to accomplish your design. The problem that assigning the mutex to the object looks like it'll modify the contents of the object -- so you need a mutex to protect the object from several threads trying to assign mutexes to it at once, so to keep your first mutex assignment safe, you'd need another mutex to protect the first one.
Personally, I think what you're trying to cure probably isn't a problem in the first place. Before I spent much time on trying to fix it, I'd do a bit of testing to see what (if anything) you lose by simply including a Mutex in each object and being done with it. I doubt you'll need to go any further than that.
If you need to do more than that I'd think of having a thread-safe pool of objects, and anytime a thread wants to operate on an object, it has to obtain ownership from that pool. The call to obtain ownership would release any object currently owned by the requesting thread (to avoid deadlocks), and then give it ownership of the requested object (blocking if the object is currently owned by another thread). The object pool manager would probably operate in a thread by itself, automatically serializing all access to the pool management, so the pool management code could avoid having to lock access to the variables telling it who currently owns what object and such.
Personally, here's what I would do. You have a number of objects, all probably have a key of some sort, say names. So take the following list of people's names:
Bill Clinton
Bill Cosby
John Doe
Abraham Lincoln
Jon Stewart
So now you would create a number of lists: one per letter of the alphabet, say. Bill and Bill would go in one list, John, Jon Abraham all by themselves.
Each list would be assigned to a specific thread - access would have to go through that thread (you would have to marshall operations to an object onto that thread - a great use of functors). Then you only have two places to lock:
thread() {
loop {
scoped_lock lock(list.mutex);
list.objectAccess();
}
}
list_add() {
scoped_lock lock(list.mutex);
list.add(..);
}
Keep the locks to a minimum, and if you're still doing a lot of locking, you can optimise the number of iterations you perform on the objects in your lists from 1 to 5, to minimize the amount of time spent acquiring locks. If your data set grows or is keyed by number, you can do any amount of segregating data to keep the locking to a minimum.
It sounds to me like you need a work queue. If the lock on the work queue became a bottle neck you could switch it around so that each thread had its own work queue then some sort of scheduler would give the incoming object to the thread with the least amount of work to do. The next level up from that is work stealing where threads that have run out of work look at the work queues of other threads.(See Intel's thread building blocks library.)
If I follow you correctly ....
struct table_entry {
void * pObject; // substitute with your object
sem_t sem; // init to empty
int nPenders; // init to zero
};
struct table_entry * table;
object_lock (void * pObject) {
goto label; // yes it is an evil goto
do {
pEntry->nPenders++;
unlock (mutex);
sem_wait (sem);
label:
lock (mutex);
found = search (table, pObject, &pEntry);
} while (found);
add_object_to_table (table, pObject);
unlock (mutex);
}
object_unlock (void * pObject) {
lock (mutex);
pEntry = remove (table, pObject); // assuming it is in the table
if (nPenders != 0) {
nPenders--;
sem_post (pEntry->sem);
}
unlock (mutex);
}
The above should work, but it does have some potential drawbacks such as ...
A possible bottleneck in the search.
Thread starvation. There is no guarantee that any given thread will get out of the do-while loop in object_lock().
However, depending upon your setup, these potential draw-backs might not matter.
Hope this helps.
We here have an interest in a similar model. A solution we have considered is to have a global (or shared) lock but used in the following manner:
A flag that can be atomically set on the object. If you set the flag you then own the object.
You perform your action then reset the variable and signal (broadcast) a condition variable.
If the acquire failed you wait on the condition variable. When it is broadcast you check its state to see if it is available.
It does appear though that we need to lock the mutex each time we change the value of this variable. So there is a lot of locking and unlocking but you do not need to keep the lock for any long period.
With a "shared" lock you have one lock applying to multiple items. You would use some kind of "hash" function to determine which mutex/condition variable applies to this particular entry.
Answer the following question under the #JohnDibling's post.
did you implement this solution ? I've a similar problem and I would like to know how you solved to release the mutex back to the pool. I mean, how do you know, when you release the mutex, that it can be safely put back in queue if you do not know if another thread is holding it ?
by #LeonardoBernardini
I'm currently trying to solve the same kind of problem. My approach is create your own mutex struct (call it counterMutex) with a counter field and the real resource mutex field. So every time you try to lock the counterMutex, first you increment the counter then lock the underlying mutex. When you're done with it, you decrement the coutner and unlock the mutex, after that check the counter to see if it's zero which means no other thread is trying to acquire the lock . If so put the counterMutex back to the pool. Is there a race condition when manipulating the counter? you may ask. The answer is NO. Remember you have a global mutex to ensure that only one thread can access the coutnerMutex at one time.