How to detect circular calls? - c++

I've been looking for causes for deadlocks and strategies/tools to avoid and detect them.
Another potential cause for deadlocks is to have blocking functions calling other blocking functions in a circular way, so that eventually a call never returns.
Sometimes this is hard to discover, especially in very large projects.
So, are there any tools/libraries/techniques that automate the detection of circular calls in a program?
EDIT:
I code mostly in C and C++ so, if possible, give any information about the topic that is applicable to those languages.
Nevertheless, this topic seems to be scarcely covered on SO, so answers for other languages are OK too, although those may deserve a topic of their own if someone finds it relevant.
Thanks.

Circular (or recursive) calls that try to acquire the same non-reentrant lock are among the easiest blocking scenarios to debug: the lock-up is deterministic and can be easily checked. When the application hangs, fire up the debugger and look at the stack trace to understand which locks are held and why.
As to general solutions for the problem of locking... you can look into libraries that provide mutex ordering and detect when you are trying to lock a mutex out of order. This type of solution can be complex to implement correctly, but once in place it ensures that you cannot enter a deadlock condition, as it forces all threads to obtain the locks in the same order (i.e. if thread A holds lock La and tries to acquire lock Lb, for which the ordering is correct, then it can either succeed or block, but whichever thread is holding lock Lb cannot try to lock La, as the ordering constraint would not be met).
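A minimal sketch of such an ordering check, assuming each mutex is given a fixed rank and a thread may only lock in strictly decreasing rank order. OrderedMutex is a hypothetical helper written for illustration, not a class from any library. It throws before blocking, so an out-of-order (or recursive) lock attempt shows up as an exception instead of a hang:

```cpp
#include <cassert>
#include <climits>
#include <mutex>
#include <stdexcept>

// Hypothetical helper: a mutex with a fixed rank. A thread may only lock
// mutexes in strictly decreasing rank order and must unlock in reverse order.
class OrderedMutex {
public:
    explicit OrderedMutex(unsigned long rank) : rank_(rank), previous_rank_(0) {}

    void lock() {
        // Check first, so a violation throws instead of blocking.
        if (current_rank() <= rank_)
            throw std::logic_error("mutex locked out of order");
        mutex_.lock();
        previous_rank_ = current_rank();
        current_rank() = rank_;
    }

    void unlock() {
        assert(current_rank() == rank_ && "unlock must mirror the lock order");
        current_rank() = previous_rank_;   // restore the rank held before this lock
        mutex_.unlock();
    }

private:
    // Rank of the most recently locked OrderedMutex on the calling thread.
    static unsigned long& current_rank() {
        thread_local unsigned long rank = ULONG_MAX;
        return rank;
    }

    std::mutex mutex_;
    const unsigned long rank_;
    unsigned long previous_rank_;
};
```

Because it provides lock() and unlock(), it can be used with std::lock_guard like any other mutex. Assigning the ranks is still a manual design decision.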

If you are on Linux, there are two Valgrind tools for detecting deadlocks and race conditions: Helgrind and DRD. They complement each other, so it is worth checking for thread errors with both of them.

On Linux you can use Valgrind to detect deadlocks: run it with --tool=helgrind.

The best way to detect deadlocks (IMO) is to write a test program that calls all the functions in a random order from, say, 30 different threads, tens of thousands of times.
If you get a deadlock, you can use the VS2010 "Parallel Stacks" window (Debug->Windows->Parallel Stacks).
This window will show you all the stacks, so you can find the methods that are deadlocking.
A simple strategy I use to write thread-safe objects:
A thread safe object should be safe when its public methods are called, so you don't get deadlocks when it is used.
So, the idea is to just lock all the public methods that access the object's data.
Besides that, you need to ensure that within the class's code you never call a public method. If you need to use one of the public methods internally, move its body into a private method, and wrap the private method with a public method that locks and then calls it.
If you want better lock granularity, you could create an object for each part that has its own lock, and lock it as suggested above. Then use encapsulation to combine those classes into one class.
Example:
class Blah {
    MyData data;
    Lock lock;
public:
    DataItem GetData(int index)
    {
        ReadLock read(lock);
        return LocalGetData(index);
    }
    DataItem FindData(string key)
    {
        ReadLock read(lock);
        DataItem item;
        //find the item, can use LocalGetData() to get the item without deadlocking
        return item;
    }
    void PutData(DataItem item)
    {
        WriteLock write(lock); // writers need exclusive access, not a read lock
        //put item in database
    }
private:
    // Never locks: only called from public methods that already hold the lock.
    DataItem LocalGetData(int index)
    {
        return data[index];
    }
};

You could find a tool that builds a call graph, and check the graph for cycles.
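As a rough illustration of the cycle check, here is a depth-first search over a call graph. It assumes the graph has already been exported by some tool into a caller-to-callees map; the CallGraph alias and both function names are made up for this sketch:

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Caller -> callees. Assumed to be filled from a call-graph tool's output.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search; `on_path` holds the functions on the current call path.
inline bool reaches_itself(const CallGraph& graph, const std::string& fn,
                           std::set<std::string>& done,
                           std::set<std::string>& on_path) {
    if (on_path.count(fn)) return true;   // back edge: fn is already on the path
    if (done.count(fn)) return false;     // fully explored earlier, no cycle there
    on_path.insert(fn);
    auto it = graph.find(fn);
    if (it != graph.end())
        for (const std::string& callee : it->second)
            if (reaches_itself(graph, callee, done, on_path)) return true;
    on_path.erase(fn);
    done.insert(fn);
    return false;
}

// Returns true if any function can (directly or indirectly) call itself.
inline bool has_circular_call(const CallGraph& graph) {
    std::set<std::string> done, on_path;
    for (const auto& entry : graph)
        if (reaches_itself(graph, entry.first, done, on_path)) return true;
    return false;
}
```

A static call graph over-approximates what happens at run time (function pointers and virtual calls are hard to resolve), so a reported cycle is a hint to inspect, not proof of a deadlock.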
Otherwise, there are a number of strategies for detecting deadlocks or other circularities, but they all depend on having some sort of supporting infrastructure in place.
There are deadlock avoidance strategies, having to do with assigning lock priorities and ordering the locks according to priority. These require code changes and enforcing the standards, though.

Related

Multiple mutex locking strategies and why libraries don't use address comparison

There is a widely known way of locking multiple locks, which relies on choosing a fixed linear ordering and acquiring locks according to this ordering.
That was proposed, for example, in the answer for "Acquire a lock on two mutexes and avoid deadlock". Especially, the solution based on address comparison seems to be quite elegant and obvious.
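For reference, the address-comparison idea can be sketched like this. AddressOrderedLock is a hypothetical RAII helper, not something provided by the standard library or Boost, and it assumes plain std::mutex objects living in the same process:

```cpp
#include <functional>
#include <mutex>

// Hypothetical RAII helper: locks two mutexes in address order, so every
// thread that uses it agrees on which of the two is taken first.
class AddressOrderedLock {
public:
    AddressOrderedLock(std::mutex& a, std::mutex& b)
        // std::less gives a total order even for unrelated pointers.
        : first_(std::less<std::mutex*>()(&a, &b) ? a : b),
          second_(std::less<std::mutex*>()(&a, &b) ? b : a) {
        first_.lock();
        if (&first_ != &second_) second_.lock();   // tolerate a == b
    }

    ~AddressOrderedLock() {
        if (&first_ != &second_) second_.unlock();
        first_.unlock();
    }

    AddressOrderedLock(const AddressOrderedLock&) = delete;
    AddressOrderedLock& operator=(const AddressOrderedLock&) = delete;

    // Exposed only so the chosen order can be inspected.
    const std::mutex* first() const { return &first_; }

private:
    std::mutex& first_;
    std::mutex& second_;
};
```

Whatever order the caller passes the mutexes in, the lower address is locked first.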
When I tried to check how it is actually implemented, I found, to my surprise, that this solution is not widely used.
To quote the Kernel Docs - Unreliable Guide To Locking:
Textbooks will tell you that if you always lock in the same order, you
will never get this kind of deadlock. Practice will tell you that this
approach doesn't scale: when I create a new lock, I don't understand
enough of the kernel to figure out where in the 5000 lock hierarchy it
will fit.
PThreads doesn't seem to have such a mechanism built in at all.
Boost.Thread came up with a completely different solution: lock() for multiple (2 to 5) mutexes is based on trying to lock as many mutexes as possible at the moment.
This is the fragment of the Boost.Thread source code (Boost 1.48.0, boost/thread/locks.hpp:1291):
template<typename MutexType1,typename MutexType2,typename MutexType3>
void lock(MutexType1& m1,MutexType2& m2,MutexType3& m3)
{
    unsigned const lock_count=3;
    unsigned lock_first=0;
    for(;;)
    {
        switch(lock_first)
        {
        case 0:
            lock_first=detail::lock_helper(m1,m2,m3);
            if(!lock_first)
                return;
            break;
        case 1:
            lock_first=detail::lock_helper(m2,m3,m1);
            if(!lock_first)
                return;
            lock_first=(lock_first+1)%lock_count;
            break;
        case 2:
            lock_first=detail::lock_helper(m3,m1,m2);
            if(!lock_first)
                return;
            lock_first=(lock_first+2)%lock_count;
            break;
        }
    }
}
where lock_helper returns 0 on success and number of mutexes that weren't successfully locked otherwise.
Why is this solution better than comparing addresses or any other kind of ids? I don't see any problem with pointer comparison that this kind of "blind" locking would avoid.
Are there any other ideas on how to solve this problem on a library level?
From the bounty text:
I'm not even sure if I can prove correctness of the presented Boost solution, which seems more tricky than the one with linear order.
The Boost solution cannot deadlock because it never waits while already holding a lock. All locks but the first are acquired with try_lock. If any try_lock call fails to acquire its lock, all previously acquired locks are released. Also, in the Boost implementation the new attempt starts from the lock that could not be acquired the previous time, and first waits until it is available; it's a smart design decision.
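The two-mutex case of this idea fits in a few lines. This is a simplified sketch of the technique as described above, not the Boost source; lock_both is a made-up name:

```cpp
#include <mutex>

// Sketch: acquire two mutexes without ever blocking while holding one.
// Works with any type that provides lock(), try_lock() and unlock().
template <typename Mutex1, typename Mutex2>
void lock_both(Mutex1& m1, Mutex2& m2) {
    for (;;) {
        m1.lock();                   // block only while holding nothing
        if (m2.try_lock()) return;   // got both
        m1.unlock();                 // give m1 up instead of waiting on m2

        m2.lock();                   // next round starts with the busy mutex
        if (m1.try_lock()) return;
        m2.unlock();
    }
}
```

Since C++11 the standard library offers std::lock(m1, m2, ...), which is specified to use a deadlock-avoidance algorithm, so hand-rolling this is rarely necessary.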
As a general rule, it's always better to avoid blocking calls while holding a lock. Therefore, the solution with try-lock, if possible, is preferred (in my opinion). As a particular consequence of lock ordering, the system as a whole might get stuck. Imagine the very last lock (e.g. the one with the biggest address) was acquired by a thread which was then blocked. Now imagine some other thread needs the last lock and another lock; due to the ordering it will first get the other one and then wait on the last lock. The same can happen with all other locks, and the whole system makes no progress until the last lock is released. Of course it's an extreme and rather unlikely case, but it illustrates the inherent problem with lock ordering: the higher a lock's number, the more indirect impact the lock has when acquired.
The shortcoming of the try-lock-based solution is that it can cause livelock, and in extreme cases the whole system might also get stuck for at least some time. Therefore it is important to have some back-off scheme that makes the pauses between locking attempts longer over time, and perhaps randomized.
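One possible shape for such a back-off scheme, built on std::try_lock from the standard library. The function name and the pause limits (1 to 1024 microseconds) are arbitrary choices for this sketch and would need tuning for a real workload:

```cpp
#include <algorithm>
#include <chrono>
#include <mutex>
#include <random>
#include <thread>

// Sketch: take both mutexes without blocking while holding either one.
// After each failed round, sleep for a random time below a doubling cap.
template <typename Mutex1, typename Mutex2>
void lock_with_backoff(Mutex1& m1, Mutex2& m2) {
    std::mt19937 rng(std::random_device{}());
    std::chrono::microseconds cap(1);                // first pause is at most 1 us
    const std::chrono::microseconds max_cap(1024);   // arbitrary upper bound

    // std::try_lock returns -1 when all locks were acquired. On failure it
    // releases whatever it had taken and returns the index of the busy lock.
    while (std::try_lock(m1, m2) != -1) {
        std::uniform_int_distribution<long long> pick(0, cap.count());
        std::this_thread::sleep_for(std::chrono::microseconds(pick(rng)));
        cap = std::min(cap * 2, max_cap);            // exponential growth, capped
    }
}
```

The randomization matters as much as the growth: two threads that back off for identical intervals can keep colliding.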
Sometimes, lock A needs to be acquired before lock B does. Lock B might have either a lower or a higher address, so you can't use address comparison in this case.
Example: When you have a tree data structure, and threads try to read and update nodes, you can protect the tree using a reader-writer lock per node. This only works if your threads always acquire locks top-down, root-to-leaf. The address of the locks does not matter in this case.
You can only use address comparison if it does not matter at all which lock gets acquired first. If this is the case, address comparison is a good solution. But if this is not the case you can't do it.
I guess the Linux kernel requires certain subsystems to be locked before others are. This cannot be done using address comparison.
The "address comparison" and similar approaches, although used quite often, are special cases. They work fine if you have
a lock-free mechanism to get
two (or more) "items" of the same kind or hierarchy level, and
any stable ordering scheme between those items.
For example: You have a mechanism to get two "accounts" from a list. Assume that the access to the list is lock-free. Now you have pointers to both items and want to lock them. Since they are "siblings" you have to choose which one to lock first. Here the approach using addresses (or any other stable ordering schema like "account id") is OK.
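The account case might look like the following sketch. Account and transfer are invented for illustration; the only assumption is that each account has a unique id that never changes:

```cpp
#include <mutex>

// Hypothetical account type for illustration.
struct Account {
    int id;            // unique and immutable: the stable ordering key
    long balance;
    std::mutex mutex;
};

// Moves `amount` between two accounts, always locking the lower id first.
inline void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;   // same account: nothing to do, and avoids self-deadlock
    Account& first  = from.id < to.id ? from : to;
    Account& second = from.id < to.id ? to : from;
    std::lock_guard<std::mutex> lock_first(first.mutex);
    std::lock_guard<std::mutex> lock_second(second.mutex);
    from.balance -= amount;
    to.balance += amount;
}
```

Two threads running transfer(a, b, ...) and transfer(b, a, ...) at the same time both lock the lower-id account first, so neither can end up holding the lock the other is waiting for.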
But the linked Linux text talks about "lock hierarchies". This means locking not between "siblings" (of the same kind) but between "parent" and "children", which might be of different types. This may happen in actual tree structures as well as in other scenarios.
Contrived example: To load a program you must
lock the file inode,
lock the process table
lock the destination memory
These three locks are neither "siblings" nor in a clear hierarchy. The locks are also not taken directly one after the other: each subsystem will take its locks at will. If you consider all use cases where those three (and more) subsystems interact, you see that there is no clear, stable ordering you can think of.
The Boost library is in the same situation: It strives to provide generic solutions. So they cannot assume the points from above and must fall back to a more complicated strategy.
One scenario when address compare will fail is if you use the proxy pattern.
You can delegate the locks to the same object and the addresses will be different.
Consider the following example
template<typename MutexType>
class MutexHelper
{
public:
    MutexHelper(MutexType &m) : _m(m) {}
    void lock()
    {
        std::cout <<"locking ";
        _m.lock();
    }
    void unlock()
    {
        std::cout <<"unlocking ";
        _m.unlock();
    }
private:
    MutexType &_m;
};
If the function
template<typename MutexType1,typename MutexType2>
void lock(MutexType1& m1,MutexType2& m2);
actually used address comparison, the following code could produce a deadlock, because the helpers wrapping the same mutexes have different addresses in each thread:
Mutex m1;
Mutex m2;
thread1:
    MutexHelper<Mutex> hm1(m1);
    MutexHelper<Mutex> hm2(m2);
    lock(hm1, hm2);
thread2:
    MutexHelper<Mutex> hm2(m2);
    MutexHelper<Mutex> hm1(m1);
    lock(hm1, hm2);
EDIT:
This is an interesting thread that sheds some light on the boost::lock implementation:
thread-best-practice-to-lock-multiple-mutexes
Address compare does not work for inter-process shared mutexes (named synchronization objects).

How to synchronize access to many objects

I have a thread pool with some threads (e.g. as many as number of cores) that work on many objects, say thousands of objects. Normally I would give each object a mutex to protect access to its internals, lock it when I'm doing work, then release it. When two threads would try to access the same object, one of the threads has to wait.
Now I want to save some resources and be scalable, as there may be thousands of objects and still only a handful of threads. I'm thinking about a class design where the thread has some sort of mutex or lock object, and assigns the lock to the object when the object should be accessed. This would save resources, as I would only have as many lock objects as I have threads.
Now comes the programming part, where I want to transfer this design into code, but don't know quite where to start. I'm programming in C++ and want to use Boost classes where possible, but self written classes that handle these special requirements are ok. How would I implement this?
My first idea was to have a boost::mutex object per thread, and each object has a boost::shared_ptr that initially is unset (or NULL). Now when I want to access the object, I lock it by creating a scoped_lock object and assign it to the shared_ptr. When the shared_ptr is already set, I wait on the present lock. This idea sounds like a heap full of race conditions, so I sort of abandoned it. Is there another way to accomplish this design? A completely different way?
Edit:
The above description is a bit abstract, so let me add a specific example. Imagine a virtual world with many objects (think > 100,000). Users moving through the world can modify objects (e.g. shoot arrows at monsters). When only using one thread, I'm good with a work queue where modifications to objects are queued. I want a more scalable design, though. If 128-core processors are available, I want to use all 128 cores, so I would use that number of threads, each with its own work queue. One solution would be to use spatial separation, e.g. one lock per area. This could reduce the number of locks used, but I'm more interested in whether there's a design which saves as many locks as possible.
You could use a mutex pool instead of allocating one mutex per resource or one mutex per thread. As mutexes are requested, first check the object in question. If it already has a mutex tagged to it, block on that mutex. If not, assign a mutex to that object and signal it, taking the mutex out of the pool. Once the mutex is unsignaled, clear the slot and return the mutex to the pool.
Without knowing it, what you were looking for is Software Transactional Memory (STM).
STM systems manage the needed locks internally to ensure the ACI properties (Atomic, Consistent, Isolated). This is an area of active research. You can find a lot of STM libraries; in particular I'm working on Boost.STM (the library is not yet ready for beta testing, and the documentation is not really up to date, but you can play with it). Some compilers are also introducing TM (such as the Intel, IBM, and Sun compilers). You can get the draft specification from here
The idea is to identify the critical regions as follows
transaction {
// transactional block
}
and let the STM system manage the needed locks, as long as it ensures the ACI properties.
The Boost.STM approach lets you write things like
int inc_and_ret(stm::object<int>& i) {
BOOST_STM_TRANSACTION {
return ++i;
} BOOST_STM_END_TRANSACTION
}
You can see the pair BOOST_STM_TRANSACTION/BOOST_STM_END_TRANSACTION as a way to delimit a scoped implicit lock.
The cost of this pseudo-transparency is 4 bytes of metadata for each stm::object.
Even if this is far from your initial design, I really think it is what was behind your goal.
I doubt there's any clean way to accomplish your design. The problem is that assigning the mutex to the object looks like it'll modify the contents of the object, so you need a mutex to protect the object from several threads trying to assign mutexes to it at once. To keep your first mutex assignment safe, you'd need another mutex to protect the first one.
Personally, I think what you're trying to cure probably isn't a problem in the first place. Before I spent much time on trying to fix it, I'd do a bit of testing to see what (if anything) you lose by simply including a Mutex in each object and being done with it. I doubt you'll need to go any further than that.
If you need to do more than that I'd think of having a thread-safe pool of objects, and anytime a thread wants to operate on an object, it has to obtain ownership from that pool. The call to obtain ownership would release any object currently owned by the requesting thread (to avoid deadlocks), and then give it ownership of the requested object (blocking if the object is currently owned by another thread). The object pool manager would probably operate in a thread by itself, automatically serializing all access to the pool management, so the pool management code could avoid having to lock access to the variables telling it who currently owns what object and such.
Personally, here's what I would do. You have a number of objects, all probably have a key of some sort, say names. So take the following list of people's names:
Bill Clinton
Bill Cosby
John Doe
Abraham Lincoln
Jon Stewart
So now you would create a number of lists: one per letter of the alphabet, say. Bill and Bill would go in one list, John and Jon in another, and Abraham in a list of his own.
Each list would be assigned to a specific thread, and access would have to go through that thread (you would have to marshal operations on an object onto that thread, which is a great use of functors). Then you only have two places to lock:
thread() {
    loop {
        scoped_lock lock(list.mutex);
        list.objectAccess();
    }
}
list_add() {
    scoped_lock lock(list.mutex);
    list.add(..);
}
Keep the locks to a minimum, and if you're still doing a lot of locking, you can optimise the number of iterations you perform on the objects in your lists from 1 to 5, to minimize the amount of time spent acquiring locks. If your data set grows or is keyed by number, you can do any amount of segregating data to keep the locking to a minimum.
It sounds to me like you need a work queue. If the lock on the work queue became a bottleneck, you could switch it around so that each thread had its own work queue, and some sort of scheduler would give each incoming object to the thread with the least amount of work to do. The next level up from that is work stealing, where threads that have run out of work look at the work queues of other threads. (See Intel's Threading Building Blocks library.)
If I follow you correctly ....
struct table_entry {
    void * pObject;   // substitute with your object
    sem_t sem;        // init to empty
    int nPenders;     // init to zero
};
struct table_entry * table;
object_lock (void * pObject) {
    goto label;       // yes it is an evil goto
    do {
        pEntry->nPenders++;
        unlock (mutex);
        sem_wait (&pEntry->sem);
label:
        lock (mutex);
        found = search (table, pObject, &pEntry);
    } while (found);
    add_object_to_table (table, pObject);
    unlock (mutex);
}
object_unlock (void * pObject) {
    lock (mutex);
    pEntry = remove (table, pObject);   // assuming it is in the table
    if (pEntry->nPenders != 0) {
        pEntry->nPenders--;
        sem_post (&pEntry->sem);
    }
    unlock (mutex);
}
The above should work, but it does have some potential drawbacks such as ...
A possible bottleneck in the search.
Thread starvation. There is no guarantee that any given thread will get out of the do-while loop in object_lock().
However, depending upon your setup, these potential drawbacks might not matter.
Hope this helps.
We here have an interest in a similar model. A solution we have considered is to have a global (or shared) lock but used in the following manner:
A flag that can be atomically set on the object. If you set the flag you then own the object.
You perform your action then reset the variable and signal (broadcast) a condition variable.
If the acquire failed you wait on the condition variable. When it is broadcast you check its state to see if it is available.
It does appear though that we need to lock the mutex each time we change the value of this variable. So there is a lot of locking and unlocking but you do not need to keep the lock for any long period.
With a "shared" lock you have one lock applying to multiple items. You would use some kind of "hash" function to determine which mutex/condition variable applies to this particular entry.
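The "shared lock selected by a hash" part can be sketched on its own. StripedLocks is a made-up name for this illustration, the stripe count is a template parameter chosen by the user, and hashing the object's address is just one possible key:

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>

// Sketch: a fixed set of N mutexes shared by any number of objects.
template <std::size_t N>
class StripedLocks {
public:
    // Maps an object's address to one of the N mutexes. Different objects
    // may share a mutex; the same object always gets the same one.
    std::mutex& mutex_for(const void* object) {
        return stripes_[std::hash<const void*>()(object) % N];
    }

private:
    std::array<std::mutex, N> stripes_;
};
```

A caller would write something like std::lock_guard<std::mutex> guard(locks.mutex_for(&monster)); around each modification. Because two objects can map to the same stripe, a thread should not hold the locks for two objects at once unless it first checks whether they share a mutex.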
This answers the following question, posted under @JohnDibling's answer:
Did you implement this solution? I have a similar problem and I would like to know how you solved releasing the mutex back to the pool. I mean, how do you know, when you release the mutex, that it can safely be put back in the queue if you do not know whether another thread is holding it?
by @LeonardoBernardini
I'm currently trying to solve the same kind of problem. My approach is to create your own mutex struct (call it counterMutex) with a counter field and the real resource mutex field. Every time you try to lock the counterMutex, you first increment the counter and then lock the underlying mutex. When you're done with it, you decrement the counter and unlock the mutex, and then check whether the counter is zero, which means no other thread is trying to acquire the lock. If so, put the counterMutex back in the pool. Is there a race condition when manipulating the counter, you may ask? The answer is no. Remember that you have a global mutex to ensure that only one thread can access the counterMutex at a time.
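A sketch of how the counter idea could be put together, using a table keyed by an integer object id instead of an explicit free list. KeyedLockPool and its members are invented for this example. The important detail is that the counter is only touched under the global mutex, while the blocking wait on the per-object mutex happens outside it:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>

// Sketch: per-object mutexes that exist only while somebody holds or waits.
class KeyedLockPool {
public:
    void lock(int key) {
        Entry* entry;
        {
            std::lock_guard<std::mutex> guard(table_mutex_);
            entry = &table_[key];   // creates the entry on first use
            ++entry->users;         // counted before we may block below
        }
        entry->mutex.lock();        // may block, but not while holding table_mutex_
    }

    void unlock(int key) {
        std::lock_guard<std::mutex> guard(table_mutex_);
        auto it = table_.find(key);
        assert(it != table_.end() && "unlock without a matching lock");
        it->second.mutex.unlock();
        if (--it->second.users == 0)
            table_.erase(it);       // nobody holds it or waits for it any more
    }

    // Number of mutexes currently alive, for inspection.
    std::size_t live_mutexes() {
        std::lock_guard<std::mutex> guard(table_mutex_);
        return table_.size();
    }

private:
    struct Entry {
        std::mutex mutex;
        int users = 0;              // holders plus waiters
    };

    std::mutex table_mutex_;                  // protects table_ and all counters
    std::unordered_map<int, Entry> table_;    // element addresses survive rehashing
};
```

The global mutex is held only for short bookkeeping steps, but every lock and unlock goes through it, so it can still become the bottleneck with many threads.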

what is the best way to synchronize container access between multiple threads in real-time application

I have std::list<Info> infoList in my application that is shared between two threads. These 2 threads are accessing this list as follows:
Thread 1: uses push_back(), pop_front() or clear() on the list (Depending on the situation)
Thread 2: uses an iterator to iterate through the items in the list and do some actions.
Thread 2 is iterating the list like the following:
for(std::list<Info>::iterator i = infoList.begin(); i != infoList.end(); ++i)
{
DoAction(i);
}
The code is compiled using GCC 4.4.2.
Sometimes ++i causes a segfault and crashes the application. The error is caused in std_list.h line 143 at the following line:
_M_node = _M_node->_M_next;
I guess this is a race condition. The list might have been changed or even cleared by thread 1 while thread 2 was iterating over it.
I used a mutex to synchronize access to this list and all went OK during my initial test. But the system just freezes under stress testing, making this solution totally unacceptable. This is a real-time application and I need to find a solution so both threads can run as fast as possible without hurting the application's total throughput.
My question is this:
Thread 1 and Thread 2 need to execute as fast as possible since this is a real-time application. What can I do to prevent this problem and still maintain the application's performance? Are there any lock-free algorithms available for such a problem?
It's OK if I miss some newly added Info objects in thread 2's iteration, but what can I do to prevent the iterator from becoming a dangling pointer?
Thanks
Your for() loop can potentially hold a lock for a relatively long time, depending on how many elements it iterates over. You can get into real trouble if it "polls" the queue, constantly checking whether a new element has become available. That makes the thread own the mutex for an unreasonably long time, giving the producer thread few opportunities to break in and add an element, and it burns lots of unnecessary CPU cycles in the process.
You need a "bounded blocking queue". Don't write it yourself; the lock design is not trivial. Good examples are hard to find, and most of them are .NET code. This article looks promising.
In general it is not safe to use STL containers this way. You will have to implement a specific method to make your code thread-safe. The solution you choose depends on your needs. I would probably solve this by maintaining two lists, one in each thread, and communicating the changes through a lock-free queue (mentioned in the comments to this question). You could also control the lifetime of your Info objects by wrapping them in boost::shared_ptr, e.g.
typedef boost::shared_ptr<Info> InfoReference;
typedef std::list<InfoReference> InfoList;
enum CommandValue
{
    Insert,
    Delete
};
struct Command
{
    CommandValue operation;
    InfoReference reference;
};
typedef LockFreeQueue<Command> CommandQueue;
class Thread1
{
public:
    // Take the queue by reference so both threads share the same instance.
    Thread1(CommandQueue& queue) : m_commands(queue) {}
    void run()
    {
        while (!finished)
        {
            // Process items and use
            // deleteInfo() or addInfo()
        }
    }
    void deleteInfo(InfoReference reference)
    {
        Command command;
        command.operation = Delete;
        command.reference = reference;
        m_commands.produce(command);
    }
    void addInfo(InfoReference reference)
    {
        Command command;
        command.operation = Insert;
        command.reference = reference;
        m_commands.produce(command);
    }
private:
    CommandQueue& m_commands;
    InfoList m_infoList;
};
class Thread2
{
public:
    Thread2(CommandQueue& queue) : m_commands(queue) {}
    void run()
    {
        while(!finished)
        {
            processQueue();
            processList();
        }
    }
    void processQueue()
    {
        Command command;
        while (m_commands.consume(command))
        {
            switch(command.operation)
            {
            case Insert:
                m_infoList.push_back(command.reference);
                break;
            case Delete:
                m_infoList.remove(command.reference);
                break;
            }
        }
    }
    void processList()
    {
        // Iterate over m_infoList
    }
private:
    CommandQueue& m_commands;
    InfoList m_infoList;
};
int main()
{
    CommandQueue commands;
    Thread1 thread1(commands);
    Thread2 thread2(commands);
    thread1.start();
    thread2.start();
    waitforTermination();
}
This has not been compiled. You still need to make sure that access to your Info objects is thread safe.
I would like to know what is the purpose of this list, it would be easier to answer the question then.
As Hoare said, it is generally a bad idea to try to share data to communicate between two threads, rather you should communicate to share data: ie messaging.
If this list is modelling a queue, for example, you might simply use one of the various ways to communicate (such as sockets) between the two threads. Consumer / Producer is a standard and well-known problem.
If your items are expensive, then only pass the pointers around during communication, you'll avoid copying the items themselves.
In general, it's exquisitely difficult to share data, although it is unfortunately the only way of programming we hear of in school. Normally only low-level implementation of "channels" of communication should ever worry about synchronization and you should learn to use the channels to communicate instead of trying to emulate them.
To prevent your iterator from being invalidated you have to lock the whole for loop. Now I guess the first thread may have difficulties updating the list. I would try to give it a chance to do its job on each (or every Nth) iteration.
In pseudo-code that would look like:
mutex_lock();
for(...){
    doAction();
    mutex_unlock();
    thread_yield();               // give first thread a chance
    mutex_lock();
    if(iterator_invalidated_flag) // set by first thread
        reset_iterator();
}
mutex_unlock();
You have to decide which thread is the more important. If it is the update thread, then it must signal the iterator thread to stop, wait and start again. If it is the iterator thread, it can simply lock the list until iteration is done.
The best way to do this is to use a container that is internally synchronized. The concurrent_queue classes from TBB and from Microsoft do this. Anthony Williams also has a good implementation on his blog.
Others have already suggested lock-free alternatives, so I'll answer as if you were stuck using locks...
When you modify a list, existing iterators can become invalidated because they no longer point to valid memory. With std::list this happens when the element an iterator refers to is erased (for example by pop_front() or clear()); insertions leave existing iterators valid, although unsynchronized concurrent access is still a data race. To prevent invalidated iterators, you could make the producer block on a mutex while your consumer traverses the list, but that would be needless waiting for the producer.
Life would be easier if you used a queue instead of a list, and have your consumer use a synchronized queue<Info>::pop_front() call instead of iterators that can be invalidated behind your back. If your consumer really needs to gobble chunks of Info at a time, then use a condition variable that'll make your consumer block until queue.size() >= minimum.
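A minimal sketch of such a synchronized queue, written with the C++11 standard library equivalents of the Boost primitives (std::mutex and std::condition_variable). BlockingQueue and wait_and_pop are names chosen for this example, not an existing library API:

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Sketch: a queue whose consumer sleeps until enough items are available.
template <typename T>
class BlockingQueue {
public:
    void push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_all();   // wake consumers so they can re-check the size
    }

    // Blocks until at least `minimum` items are queued, then pops the oldest.
    T wait_and_pop(std::size_t minimum = 1) {
        if (minimum == 0) minimum = 1;   // popping needs at least one item
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [&] { return queue_.size() >= minimum; });
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};
```

The consumer never holds an iterator into the container, so the producer cannot invalidate anything behind its back. The lock is held only for the push or the pop itself.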
The Boost library has a nice portable implementation of condition variables (that even works with older versions of Windows), as well as the usual threading library stuff.
For a producer-consumer queue that uses (old-fashioned) locking, check out the BlockingQueue template class of the ZThreads library. I have not used ZThreads myself, being worried about lack of recent updates, and because it didn't seem to be widely used. However, I have used it as inspiration for rolling my own thread-safe producer-consumer queue (before I learned about lock-free queues and TBB).
A lock-free queue/stack library seems to be in the Boost review queue. Let's hope we see a new Boost.Lockfree in the near future! :)
If there's interest, I can write up an example of a blocking queue that uses std::queue and Boost thread locking.
EDIT:
The blog referenced by Rick's answer already has a blocking queue example that uses std::queue and Boost condvars. If your consumer needs to gobble chunks, you can extend the example as follows:
void wait_for_data(size_t how_many)
{
    boost::mutex::scoped_lock lock(the_mutex);
    while(the_queue.size() < how_many)
    {
        the_condition_variable.wait(lock);
    }
}
You may also want to tweak it to allow time-outs and cancellations.
You mentioned that speed was a concern. If your Infos are heavyweight, you should consider passing them around by shared_ptr. You can also try making your Infos fixed size and use a memory pool (which can be much faster than the heap).
As you mentioned that you don't care if your iterating consumer misses some newly-added entries, you could use a copy-on-write list underneath. That allows the iterating consumer to operate on a consistent snapshot of the list as of when it first started, while, in other threads, updates to the list yield fresh but consistent copies, without perturbing any of the extant snapshots.
The trade here is that each update to the list requires locking for exclusive access long enough to copy the entire list. This technique is biased toward having many concurrent readers and less frequent updates.
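A small sketch of the copy-on-write idea, assuming a std::shared_ptr to an immutable std::list as the snapshot type. CowList is a made-up class, and only push_back is shown as a writer operation:

```cpp
#include <list>
#include <memory>
#include <mutex>
#include <utility>

// Sketch: readers iterate over immutable snapshots, writers publish new copies.
template <typename T>
class CowList {
public:
    using Snapshot = std::shared_ptr<const std::list<T>>;

    CowList() : current_(std::make_shared<std::list<T>>()) {}

    // Readers grab the current version. It is never modified after it has
    // been published, so it can be iterated without holding any lock.
    Snapshot snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

    // Writers copy the whole list, change the copy, then publish it.
    void push_back(const T& value) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto next = std::make_shared<std::list<T>>(*current_);
        next->push_back(value);
        current_ = std::move(next);
    }

private:
    mutable std::mutex mutex_;   // held for the pointer copy or the list copy
    Snapshot current_;
};
```

The iterating thread would call snapshot() once and loop over the result. An old snapshot stays alive until the last reader drops its shared_ptr.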
Trying to add intrinsic locking to the container first requires you to think about which operations need to behave in atomic groups. For instance, checking if the list is empty before trying to pop off the first element requires an atomic pop-if-not-empty operation; otherwise, the answer to the list being empty can change in between when the caller receives the answer and attempts to act upon it.
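For instance, the pop-if-not-empty operation could be exposed as a single call that does the check and the removal under one lock. SyncList and try_pop_front are illustrative names for this sketch:

```cpp
#include <list>
#include <mutex>
#include <utility>

// Sketch: a list wrapper whose operations are atomic as a group.
template <typename T>
class SyncList {
public:
    void push_back(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push_back(std::move(value));
    }

    // Checks for emptiness and pops under the same lock, so the answer
    // cannot become stale between the check and the pop.
    bool try_pop_front(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::list<T> items_;
};
```

Offering separate empty() and pop_front() members instead would reintroduce the gap described above, even if each of them locked internally.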
It's not clear in your example above what guarantees the iteration must obey. Must every element in the list eventually be visited by the iterating thread? Does it make multiple passes? What does it mean for one thread to remove an element from the list while another thread is running DoAction() against it? I suspect that working through these questions will lead to significant design changes.
You're working in C++, and you mentioned needing a queue with a pop-if-not-empty operation. I wrote a two-lock queue many years ago using the ACE Library's concurrency primitives, as the Boost thread library was not yet ready for production use, and the chance for the C++ Standard Library including such facilities was a distant dream. Porting it to something more modern would be easy.
This queue of mine -- called concurrent::two_lock_queue -- allows access to the head of the queue only via RAII. This ensures that acquiring the lock to read the head will always be mated with a release of the lock. A consumer constructs a const_front (const access to head element), a front (non-const access to head element), or a renewable_front (non-const access to head and successor elements) object to represent the exclusive right to access the head element of the queue. Such "front" objects can't be copied.
Class two_lock_queue also offers a pop_front() function that waits until at least one element is available to be removed, but, in keeping with std::queue's and std::stack's style of not mixing container mutation and value copying, pop_front() returns void.
In a companion file, there's a type called concurrent::unconditional_pop, which allows one to ensure through RAII that the head element of the queue will be popped upon exit from the current scope.
The companion file error.hh defines the exceptions that arise from use of the function two_lock_queue::interrupt(), used to unblock threads waiting for access to the head of the queue.
Take a look at the code and let me know if you need more explanation as to how to use it.
If you're using C++0x you could internally synchronize list iteration this way:
Assuming the class has a templated list named objects_, and a boost::mutex named mutex_
The toAll method is a member method of the list wrapper
void toAll(std::function<void (T*)> lambda)
{
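// The lock object must be named; an unnamed temporary is destroyed at the end of the statement, releasing the mutex immediately.
boost::mutex::scoped_lock lock(this->mutex_);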
for(auto it = this->objects_.begin(); it != this->objects_.end(); it++)
{
T* object = it->second;
if(object != nullptr)
{
lambda(object);
}
}
}
Calling:
synchronizedList1->toAll(
[&](T* object)->void // Or the class that your list holds
{
for(auto it = this->knownEntities->begin(); it != this->knownEntities->end(); it++)
{
// Do something
}
}
);
You must be using some threading library. If you are using Intel TBB, you can use concurrent_vector or concurrent_queue.
If you want to continue using std::list in a multi-threaded environment, I would recommend wrapping it in a class with a mutex that provides locked access to it. Depending on the exact usage, it might make sense to switch to an event-driven queue model where messages are passed on a queue that multiple worker threads are consuming (hint: producer-consumer).
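A minimal sketch of such a wrapper, assuming C++11's std::mutex is available (Boost's mutex would work the same way). The class name LockedList and its member names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical wrapper: every access to the underlying std::list goes
// through a member function that holds the mutex for the whole call.
template <typename T>
class LockedList {
public:
    void push_back(const T& value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(value);
    }

    // Calls fn on each element while holding the lock. fn must not call back
    // into this object, or it will deadlock on the non-recursive mutex.
    template <typename Fn>
    void for_each(Fn fn) {
        std::lock_guard<std::mutex> guard(mutex_);
        for (T& item : items_) fn(item);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return items_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so const members can lock it
    std::list<T> items_;
};
```

Because no iterator ever escapes the class, callers cannot hold one across a concurrent erase. The price is that the whole traversal runs under the lock.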
I would seriously take Matthieu's thought into consideration. Many problems that are being solved using multi-threaded programming are better solved using message-passing between threads or processes. If you need high throughput and do not absolutely require that the processing share the same memory space, consider using something like the Message-Passing Interface (MPI) instead of rolling your own multi-threaded solution. There are a bunch of C++ implementations available - OpenMPI, Boost.MPI, Microsoft MPI, etc. etc.
I don't think you can get away without any synchronisation at all in this case, as certain operations will invalidate the iterators you are using. With a list, this is fairly limited (basically, if both threads are trying to manipulate iterators to the same element at the same time) but there is still a danger that you'll be removing an element at the same time you're trying to append one to it.
Are you by any chance holding the lock across DoAction(i)? You obviously only want to hold the lock for the absolute minimum of time that you can get away with in order to maximise the performance. From the code above I think you'll want to decompose the loop somewhat in order to speed up both sides of the operation.
Something along the lines of:
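while (processItems) {
Info item;
bool haveItem = false;
lock(mutex);
if (!infoList.empty()) {
item = infoList.front();
infoList.pop_front();
haveItem = true;
}
unlock(mutex);
if (haveItem) {
DoAction(item); // runs outside the lock, and only when an item was actually popped
}
delayALittle();
}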
And the insert function would still have to look like this:
lock(mutex);
infoList.push_back(item);
unlock(mutex);
Unless the queue is likely to be massive, I'd be tempted to use something like a std::vector<Info> or even a std::vector<boost::shared_ptr<Info> > to minimize the copying of the Info objects (assuming that these are somewhat more expensive to copy than a boost::shared_ptr). Generally, operations on a vector tend to be a little faster than on a list, especially if the objects stored in the vector are small and cheap to copy.

Is this the right approach for a thread-safe Queue class?

I'm wondering if this is the right approach to writing a thread-safe queue in C++?
template <class T>
class Queue
{
public:
Queue() {}
void Push(T& a)
{
m_mutex.lock();
m_q.push_back(a);
m_mutex.unlock();
}
T& Pop()
{
m_mutex.lock();
T& temp = m_q.pop();
m_mutex.unlock();
return temp;
}
private:
std::queue<T> m_q;
boost::mutex m_mutex;
};
You get the idea... I'm just wondering if this is the best approach. Thanks!
EDIT:
Because of the questions I'm getting, I wanted to clarify that the mutex is a boost::mutex
I recommend using the Boost threading libraries to assist you with this.
Your code is fine, except that when you write code in C++ like
some_mutex.lock();
// do something
some_mutex.unlock();
then if the code in the // do something section throws an exception, the lock will never be released. The Boost library solves this with classes such as lock_guard: you initialize an object that acquires the lock in its constructor and releases it in its destructor. That way you know that your lock will always be released. Other languages accomplish this through try/finally statements, but C++ doesn't support this construct.
In particular, what happens when you try to read from a queue with no elements? Does that throw an exception? If so, then your code would run into problems.
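To make the exception case concrete, here is a small sketch using std::lock_guard, the C++11 counterpart of the Boost class (an assumption, since the question uses boost::mutex). The function names are invented for the example:

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

std::mutex some_mutex;  // stands in for some_mutex in the snippet above

// The guard's destructor runs during stack unwinding, so the mutex is
// released even though the function exits via an exception.
void do_something_that_throws() {
    std::lock_guard<std::mutex> guard(some_mutex);
    throw std::runtime_error("failure inside the critical section");
}

// Returns true if the mutex could be acquired again after the exception.
bool mutex_released_after_throw() {
    try {
        do_something_that_throws();
    } catch (const std::runtime_error&) {
        // Swallowed on purpose: only the lock state matters here.
    }
    bool acquired = some_mutex.try_lock();
    if (acquired) some_mutex.unlock();
    return acquired;
}
```

With manual lock()/unlock() calls in place of the guard, the mutex would still be held after the throw and the next lock() on it would block forever.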
When trying to get the first element, you probably want to check if something is there, then go to sleep if not and wait until something is. This is a job for a condition object, also provided by the Boost library, though available at a lower level if you prefer.
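A rough sketch of that idea, assuming C++11's std::condition_variable in place of the Boost condition object. BlockingQueue and m_notEmpty are illustrative names, not part of the original code:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BlockingQueue {
public:
    void Push(const T& value) {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_q.push(value);
        }
        m_notEmpty.notify_one();  // wake one waiting consumer, if any
    }

    // Blocks until an element is available, then removes and returns it.
    T Pop() {
        std::unique_lock<std::mutex> lock(m_mutex);
        // The predicate form re-checks the condition after every wake-up,
        // which guards against spurious wake-ups.
        m_notEmpty.wait(lock, [this] { return !m_q.empty(); });
        T value = m_q.front();
        m_q.pop();
        return value;
    }

private:
    std::queue<T> m_q;
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
};
```

While a consumer sleeps in wait(), the mutex is released, so producers are never blocked by a waiting reader.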
Herb Sutter wrote an excellent article last year in Dr. Dobb's Journal, covering all of the major concerns for a thread-safe, lock-free, single-producer, single-consumer queue implementation. (It corrected an implementation published the previous month.)
His followup article in the next issue tackled a more generic approach for a multi-user concurrent queue, along with a full discussion of potential pitfalls and performance issues.
There are a few more articles on similar concurrency topics.
Enjoy.
From a threading-point of view, that looks about right for a simple, thread-safe queue.
You do have one problem, though: std::queue's pop() does not return the element popped from the queue. What you need to do is:
T Pop()
{
m_mutex.lock();
T temp = m_q.front();
m_q.pop();
m_mutex.unlock();
return temp;
}
You don't want to return a reference in this case since the referenced element is being popped from the queue and destroyed.
You also need to have some public Size() function to tell you how many elements are in the queue (either that, or you'll need to gracefully handle the case where Pop() is called and there are no elements in the queue).
Edit: Though, as Eli Courtwright points out, you do have to be careful with the queue operations throwing exceptions, and using Boost is a good idea.
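One way to handle the empty case gracefully is a non-blocking TryPop() that reports success through its return value. The sketch below assumes std::mutex with std::lock_guard instead of manual lock()/unlock(); the name TryPop is made up for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <queue>

template <class T>
class Queue {
public:
    void Push(const T& a) {
        std::lock_guard<std::mutex> guard(m_mutex);
        m_q.push(a);
    }

    // Copies the front element into `out` and removes it.
    // Returns false, leaving `out` untouched, when the queue is empty.
    bool TryPop(T& out) {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (m_q.empty()) return false;
        out = m_q.front();
        m_q.pop();
        return true;
    }

    // Only a snapshot: another thread may change the size right after this returns.
    std::size_t Size() const {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_q.size();
    }

private:
    std::queue<T> m_q;
    mutable std::mutex m_mutex;
};
```

Doing the emptiness check and the pop under the same lock matters: a caller that checks Size() and then calls Pop() separately can lose a race with another consumer between the two calls.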
It depends on what your goals are. Crafted in this manner, you'll have your "reader" client blocking your "writer" client. You might want to consider using a "condition" to avoid dead-locks etc.
The approach that you are trying to implement is a locking approach. It will work, except that if you use a plain system-provided "mutex" object, its performance might turn out to be disappointing (its lock-unlock overhead is pretty high). It is hard to say whether it will be good enough, since we don't know what your performance requirements and expectations are.
Since the operations you perform in "locked" segments of your code are rather quick, it might make sense to use a spin-lock instead of a true mutex, or a combination of the two. This will give you much better performance. Then again, maybe your "mutex" already implements all that (no way to know, since you provided no details about what is actually hiding behind that name).
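For reference, a spin-lock can be sketched in a few lines with C++11's std::atomic_flag. This is only an illustration of the idea under that assumption; SpinLock, shared_counter and increment are hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

class SpinLock {
public:
    void lock() {
        // test_and_set returns the previous value, so keep spinning
        // for as long as some other thread already holds the flag.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// SpinLock provides lock()/unlock(), so std::lock_guard can manage it.
SpinLock counter_lock;
int shared_counter = 0;

int increment() {
    std::lock_guard<SpinLock> guard(counter_lock);
    return ++shared_counter;
}
```

Spinning burns CPU while waiting, so it only pays off when the critical section is very short and contention is low.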
And finally, if you happen to be looking for the best performance, you might want to read up on lock-free synchronization approaches, which are a completely different story. Lock-free methods are typically much more difficult to implement, though.
As Jean-Lou Dupont pointed out, your current code is quite prone to deadlocks. When I've done this, I've used three locks: two counted semaphores and one mutex. The counted semaphores signal when:
there's space available to insert an object
there's at least one object to retrieve from the queue
The mutex is used only while actually putting an item in the queue or retrieving an item from the queue -- but no attempt is ever made at locking the mutex until we know that the insertion or retrieval will be able to succeed immediately.
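The scheme can be sketched as follows. This is not the answerer's code: the Semaphore class is a stand-in built from a mutex and a condition variable so the example stays within C++11 (C++20 has std::counting_semaphore), and all names are invented:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

// Minimal counting semaphore (illustrative stand-in).
class Semaphore {
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}
    void acquire() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void release() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t count_;
};

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity)
        : freeSlots_(capacity), usedSlots_(0) {}

    void Push(const T& value) {
        freeSlots_.acquire();  // wait for space before touching the mutex
        {
            std::lock_guard<std::mutex> guard(mutex_);
            q_.push(value);
        }
        usedSlots_.release();  // signal that an item is available
    }

    T Pop() {
        usedSlots_.acquire();  // wait for an item before touching the mutex
        std::unique_lock<std::mutex> lock(mutex_);
        T value = q_.front();
        q_.pop();
        lock.unlock();
        freeSlots_.release();  // signal that a slot is free again
        return value;
    }

private:
    Semaphore freeSlots_;
    Semaphore usedSlots_;
    std::mutex mutex_;
    std::queue<T> q_;
};
```

The mutex is only ever taken once the corresponding semaphore has been acquired, so the insertion or retrieval it protects can proceed immediately.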

Is it practically safe to write static data from multiple threads

I have some status data that I want to cache from a database. Any of several threads may modify the status data. After the data is modified it will be written to the database. The database writes will always be done in series by the underlying database access layer, which queues database operations in a different process, so I am not concerned about race conditions for those.
Is it a problem to just modify the static data from several threads? In theory it is possible that modifications are implemented as read, modify, write but in practice I can't imagine that this is so.
My data handling class will look something like this:
class StatusCache
{
public:
static void SetActivityStarted(bool activityStarted)
{ m_activityStarted = activityStarted; WriteToDB(); }
static void SetActivityComplete(bool activityComplete)
{ m_activityComplete = activityComplete; WriteToDB(); }
static void SetProcessReady(bool processReady)
{ m_processReady = processReady; WriteToDB(); }
static void SetProcessPending(bool processPending)
{ m_processPending = processPending; WriteToDB(); }
private:
static void WriteToDB(); // will write all the class data to the db (multiple requests will happen in series)
static bool m_activityStarted;
static bool m_activityComplete;
static bool m_processReady;
static bool m_processPending;
};
I don't want to use locks as there are already a couple of locks in this part of the app and adding more will increase the possibility of deadlocks.
It doesn't matter if there is some overlap between 2 threads in the database update, e.g.
thread 1                    thread 2                     activity started in db
SetActivityStarted(true)    SetActivityStarted(false)
m_activityStarted = true
                            m_activityStarted = false
WriteToDB()                                              false
                            WriteToDB()                  false
So the db shows the status that was most recently set by the m_... = x lines. This is OK.
Is this a reasonable approach to use or is there a better way of doing it?
[Edited to state that I only care about the last status - order is unimportant]
No, it's not safe.
The code generated to write to m_activityStarted and the others may be atomic, but that is not guaranteed. Also, in your setters you do two things: set a boolean and make a call. That is definitely not atomic.
You're better off synchronizing here using a lock of some sort.
For example, one thread may call the first function, and before that thread goes into WriteToDB() another thread may call another function and go into WriteToDB() without the first going there. Then, perhaps the status is written to the DB in the wrong order.
If you're worried about deadlocks then you should revise your whole concurrency strategy.
On multi-CPU machines, there's no guarantee that memory writes will be seen by threads running on different CPUs in the correct order without issuing a synchronisation instruction. It's only when you issue a synchronisation operation, e.g. a mutex lock or unlock, that each thread's view of the data is guaranteed to be consistent.
To be safe, if you want the state shared between your threads, you need to use synchronisation of some form.
You never know exactly how things are implemented at the lower levels. Especially when you start dealing with multiple cores, the various cache levels, pipelined execution, etc. At least not without a lot of work, and implementations change frequently!
If you don't mutex it, eventually you will regret it!
My favorite example involves integers. This one particular system wrote its integer values in two writes, i.e. not atomically. Naturally, when the thread was interrupted between those two writes, you got the upper bytes from one set() call and the lower bytes from the other. A classic blunder. But far from the worst that can happen.
Mutexing is trivial.
You mention: I don't want to use locks as there are already a couple of locks in this part of the app and adding more will increase the possibility of deadlocks.
You'll be fine as long as you follow the golden rules:
Don't mix mutex lock orders. E.g. A.lock();B.lock() in one place and B.lock();A.lock(); in another. Use one order or the other!
Lock for the briefest possible time.
Don't try to use one mutex for multiple purposes. Use multiple mutexes.
Whenever possible use recursive or error-checking mutexes.
Use RAII or macros to ensure unlocking.
E.g.:
#define RUN_UNDER_MUTEX_LOCK( MUTEX, STATEMENTS ) \
do { (MUTEX).lock(); STATEMENTS; (MUTEX).unlock(); } while ( false )
class StatusCache
{
public:
static void SetActivityStarted(bool activityStarted)
{ RUN_UNDER_MUTEX_LOCK( mMutex, mActivityStarted = activityStarted );
WriteToDB(); }
static void SetActivityComplete(bool activityComplete)
{ RUN_UNDER_MUTEX_LOCK( mMutex, mActivityComplete = activityComplete );
WriteToDB(); }
static void SetProcessReady(bool processReady)
{ RUN_UNDER_MUTEX_LOCK( mMutex, mProcessReady = processReady );
WriteToDB(); }
static void SetProcessPending(bool processPending)
{ RUN_UNDER_MUTEX_LOCK( mMutex, mProcessPending = processPending );
WriteToDB(); }
private:
static void WriteToDB(); // read data under mMutex.lock()!
static Mutex mMutex;
static bool mActivityStarted;
static bool mActivityComplete;
static bool mProcessReady;
static bool mProcessPending;
};
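For comparison, the RAII route mentioned in the rules might look roughly like this. It is a sketch under several assumptions: std::mutex replaces the unspecified Mutex type, the four flags are grouped into a hypothetical StatusSnapshot struct, and WriteToDB() is a placeholder that only records what it was given, since the real database call is not shown:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical grouping of the four flags so they can be handed over as one unit.
struct StatusSnapshot {
    bool activityStarted;
    bool activityComplete;
    bool processReady;
    bool processPending;
};

class StatusCache {
public:
    static void SetActivityStarted(bool v)  { Update(&StatusSnapshot::activityStarted, v); }
    static void SetActivityComplete(bool v) { Update(&StatusSnapshot::activityComplete, v); }
    static void SetProcessReady(bool v)     { Update(&StatusSnapshot::processReady, v); }
    static void SetProcessPending(bool v)   { Update(&StatusSnapshot::processPending, v); }

    // For the example only: the state most recently handed to WriteToDB.
    static StatusSnapshot LastWritten() {
        std::lock_guard<std::mutex> guard(mMutex);
        return mLastWritten;
    }

private:
    static void Update(bool StatusSnapshot::*field, bool value) {
        std::lock_guard<std::mutex> guard(mMutex);  // released on every exit path
        mState.*field = value;
        WriteToDB(mState);  // still under the lock, so writes are issued in update order
    }

    // Placeholder for the real database call; always called with mMutex held.
    static void WriteToDB(const StatusSnapshot& s) { mLastWritten = s; }

    static std::mutex mMutex;
    static StatusSnapshot mState;
    static StatusSnapshot mLastWritten;
};

std::mutex StatusCache::mMutex;
StatusSnapshot StatusCache::mState = {false, false, false, false};
StatusSnapshot StatusCache::mLastWritten = {false, false, false, false};
```

Holding the lock across WriteToDB() keeps updates and writes in the same order. That is only reasonable if the call is quick, for example if it merely enqueues the request as the question describes.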
I'm no C++ guy, but I don't think it will be safe to write to it if you don't have some sort of synchronization.
It looks like you have two issues here.
#1 is that your boolean assignment is not necessarily atomic, even though it's one call in your code. So, under the hood, you could have inconsistent state. You could look into using atomic_set(), if your threading/concurrency library supports that.
#2 is synchronization between your reading and writing. From your code sample, it looks like your WriteToDB() function writes out the state of all 4 variables. Where is WriteToDB serialized? Could you have a situation where thread1 starts WriteToDB(), which reads m_activityStarted but doesn't finish writing it to the database, and is then preempted by thread2, which writes m_activityStarted all the way through? Then thread1 resumes and finishes writing its inconsistent state through to the database. At the very least, I think that you should have write access to the static variables locked out while you are doing the read access necessary for the database update.
In theory it is possible that modifications are implemented as read, modify, write but in practice I can't imagine that this is so.
Generally it is so unless you've set up some sort of transactional memory. Variables are generally stored in RAM but modified in hardware registers, so the read isn't just for kicks. The read is necessary to copy the value out of RAM and into a place it can be modified (or even compared to another value). And while the data is being modified in the hardware register, the stale value is still in RAM in case somebody else wants to copy it into another hardware register. And while the modified data is being written back to RAM somebody else may be in the process of copying it into a hardware register.
And in C++ bools are guaranteed to take at least a byte of space, which means it is actually possible for them to hold a value other than true or false, say due to a race condition where the read happens partway through a write.
On .NET there is some amount of automatic synchronization of static data and static methods. There is no such guarantee in standard C++.
If you're looking at only ints, bools, and (I think) longs, you have some options for atomic reads/writes and addition/subtraction. C++0x has something. So does Intel TBB. I believe that most operating systems also have the needed hooks to accomplish this.
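The C++0x facility became the <atomic> header in C++11. A minimal sketch, assuming a C++11 compiler; the variable and function names are illustrative:

```cpp
#include <atomic>
#include <cassert>

// Each variable is individually atomic: a reader sees either the old or the
// new value, never a torn one. The variables are NOT updated as a group.
std::atomic<bool> activityStarted(false);
std::atomic<int> updateCount(0);

void SetActivityStarted(bool value) {
    activityStarted.store(value);  // atomic write
    updateCount.fetch_add(1);      // atomic read-modify-write
}
```

This removes the torn-write concern for a single flag, but it does not make "set the flag, then write everything to the database" one indivisible step. That still needs a lock or a redesign.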
While you may be afraid of deadlocks, I am sure you will be extremely proud of your code to know it works perfectly.
So I would recommend you throw in the locks. You may also want to consider semaphores, a more primitive (and perhaps more versatile) type of lock.
You may get away with it with bools, but if the static objects being changed are of types of any great complexity, terrible things will occur. My advice - if you are going to write from multiple threads, always use synchronisation objects, or you will sooner or later get bitten.
This is not a good idea. There are many variables that will affect the timing of different threads.
Without some kind of lock you will not be guaranteed to have the correct last state.
It is possible that two status updates could be written to the database out of order.
As long as the locking code is designed properly, deadlocks should not be an issue with a simple process like this.
As others have pointed out, this is generally a really bad idea (with some caveats).
Just because you don't see a problem on your particular machine when you happen to test it doesn't prove that the algorithm works right. This is especially true for concurrent applications. Interleavings can change dramatically for example when you switch to a machine with a different number of cores.
Caveat: if all your setters are doing atomic writes and if you don't care about the timing of them, then you may be okay.
Based on what you've said, I'd think that you could just have a dirty flag that's set in the setters. A separate database writing thread would poll the dirty flag every so often and send the updates to the database. If some items need extra atomicity, their setters would need to lock a mutex. The database writing thread must always lock the mutex.
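A rough sketch of the dirty-flag idea, simplified to a single status value and to setters that always lock. The class is non-static here, WriteToDB() is a placeholder for the real call, and FlushIfDirty() is an invented name for the step the database-writing thread would run on each poll:

```cpp
#include <cassert>
#include <mutex>

class StatusCache {
public:
    void SetActivityStarted(bool value) {
        std::lock_guard<std::mutex> guard(mutex_);
        activityStarted_ = value;
        dirty_ = true;  // tell the writer there is something new to persist
    }

    // Meant to be called every so often by the single database-writing thread.
    // Returns true if a write was issued.
    bool FlushIfDirty() {
        bool snapshot;
        {
            std::lock_guard<std::mutex> guard(mutex_);
            if (!dirty_) return false;
            snapshot = activityStarted_;
            dirty_ = false;
        }
        WriteToDB(snapshot);  // outside the lock; only the writer thread gets here
        return true;
    }

    // Accessors for the example only; read them from the writer thread.
    int WriteCount() const { return writeCount_; }
    bool LastWritten() const { return lastWritten_; }

private:
    // Placeholder for the real database call, which the original does not show.
    void WriteToDB(bool activityStarted) {
        lastWritten_ = activityStarted;
        ++writeCount_;
    }

    std::mutex mutex_;
    bool activityStarted_ = false;
    bool dirty_ = false;
    bool lastWritten_ = false;
    int writeCount_ = 0;
};
```

Several updates between two polls collapse into one database write carrying the latest state, which matches the "only the last status matters" requirement.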