Does a single queue with separate reader and writer threads need locking? - c++

I have a shared queue (implemented using a singleton queue wrapper), a reader thread, and a writer thread. I also have a mechanism to inform the reader thread when the writer thread adds elements (enqueues) to the queue. The reader thread dequeues only one element when informed. Is a read-write lock necessary in this scenario?
Since the writer is only enqueueing and the reader only dequeueing, I feel there is no need for a lock if the reader checks the queue size when dequeueing.

Since the writer is only enqueueing and the reader only dequeueing, I feel there is no need for a lock if the reader checks the queue size when dequeueing.
Among other problems, that operation alone is already unsafe when the queue is being modified by another thread. In C++, unsynchronized concurrent accesses to a non-atomic shared variable, at least one of which is a write, are a data race and hence undefined behavior (UB).

I assume you mean std::queue, and no: most operations on STL containers are not thread-safe. For a discussion of the exceptions, see "C++11 STL containers and thread safety". The STL prefers speed over safety (e.g. it does no range checks on array indices), assuming that developers will implement their own checks.
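As a minimal sketch of what locking the queue could look like for one writer and one reader: the wrapper name LockedQueue and the helper sum_through_queue are made up for illustration and are not from the question. The point is that the emptiness check and the pop happen under the same mutex as the push. The reader polls here only to keep the sketch short; the notification mechanism from the question would replace the polling.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical wrapper: every access to the std::queue goes through one mutex.
template <typename T>
class LockedQueue {
public:
    void enqueue(T value) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(value));
    }

    // Checks for emptiness and pops under the same lock, so the check
    // cannot race with a concurrent enqueue.
    bool try_dequeue(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop();
        return true;
    }

private:
    std::mutex mutex_;
    std::queue<T> queue_;
};

// One writer thread enqueues 1..n; the calling thread acts as the reader
// and sums everything it dequeues.
long sum_through_queue(int n) {
    LockedQueue<int> q;
    std::thread writer([&q, n] {
        for (int i = 1; i <= n; ++i) q.enqueue(i);
    });
    long sum = 0;
    int received = 0;
    while (received < n) {
        int value;
        if (q.try_dequeue(value)) {
            sum += value;
            ++received;
        } else {
            std::this_thread::yield();  // polling only to keep the sketch short
        }
    }
    writer.join();
    return sum;
}
```

Calling sum_through_queue(1000) from main should give the same sum as adding 1..1000 directly, because no element is lost or read twice.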

Related

C++ Access to vector from multiple threads

In my program I have some threads running. Each thread gets a pointer to some object (in my program, a vector), and each thread modifies the vector.
Sometimes my program fails with a segmentation fault. I think it occurs because thread A begins doing something with the vector while thread B hasn't finished operating on it. Can that be true?
How am I supposed to fix it? Thread synchronization? Or maybe make a flag VectorIsInUse and set this flag to true while operating on the vector?
vector, like all STL containers, is not thread-safe. You have to explicitly manage the synchronization yourself. A std::mutex or boost::mutex could be used to synchronize access to the vector.
Do not use a flag as this is not thread-safe:
Thread A checks value of isInUse flag and it is false
Thread A is suspended
Thread B checks value of isInUse flag and it is false
Thread B sets isInUse to true
Thread B is suspended
Thread A is resumed
Thread A still thinks isInUse is false and sets it true
Thread A and Thread B now both have access to the vector
Note that each thread will have to lock the vector for the entire time it needs to use it. This includes modifying the vector and using the vector's iterators, as iterators can become invalidated if the element they refer to is removed with erase() or the vector undergoes an internal reallocation. For example, do not do this:
mtx.lock();
std::vector<std::string>::iterator i = the_vector.begin();
mtx.unlock();
// 'i' can become invalid if the `vector` is modified.
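For contrast, here is a sketch of the safe pattern, assuming C++11 and std::mutex. The helper names append, total_length, and run_demo are hypothetical. The lock is taken once and held for the whole traversal, so no iterator outlives the lock that protects it.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::mutex mtx;                       // one mutex shared by all threads
std::vector<std::string> the_vector;  // protected by mtx

void append(const std::string& s) {
    std::lock_guard<std::mutex> lock(mtx);
    the_vector.push_back(s);  // may reallocate and invalidate iterators
}

// The lock is held for the whole traversal, so no other thread can
// invalidate 'it' while it is in use.
std::size_t total_length() {
    std::lock_guard<std::mutex> lock(mtx);
    std::size_t total = 0;
    for (std::vector<std::string>::const_iterator it = the_vector.begin();
         it != the_vector.end(); ++it) {
        total += it->size();
    }
    return total;
}

// Hypothetical driver: two threads append while the caller keeps iterating.
std::size_t run_demo() {
    {
        std::lock_guard<std::mutex> lock(mtx);
        the_vector.clear();  // start from a known state
    }
    std::thread a([] { for (int i = 0; i < 1000; ++i) append("ab"); });
    std::thread b([] { for (int i = 0; i < 1000; ++i) append("cde"); });
    for (int i = 0; i < 100; ++i) total_length();  // traversal under the lock
    a.join();
    b.join();
    return total_length();
}
```

std::lock_guard is used instead of manual lock()/unlock() calls so the mutex is released even if the loop body throws.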
If you want a container that is safe to use from many threads, you need to use a container that is explicitly designed for the purpose. The interface of the Standard containers is not designed for concurrent mutation or any kind of concurrency, and you cannot just throw a lock at the problem.
You need something like TBB or PPL which has concurrent_vector in it.
That's why pretty much every class library that offers threads also has synchronization primitives such as mutexes/locks. You need to set up one of these and acquire/release the lock around every operation on the shared item (read AND write operations, since you need to prevent reads from occurring during a write too, not just prevent multiple writes from happening concurrently).

Writing to a mutex'ed shared resource

I have a C++ list which is being processed by multiple threads.
Each thread creates a pthread_mutex_lock on the list so that other threads cannot "interfere" with the list. As a part of processing, each thread also calls push_back to add data to the list.
My question is: is push_back on a mutex-ed list a bad idea? Is the mutex still valid while the thread is pushing more data onto the list? Most of the documentation/examples I've seen on pthread_mutex_lock only do "reading", so I am curious to know what happens when the same thread that acquired the lock writes to the shared resource.
As long as only that particular thread is holding the lock, and no other thread can take this lock, writing should be fine. Think about why a problem could happen: it would have been a problem if one thread was writing while another was reading simultaneously. If a ball is yours, you can do anything with it, right? Things change when they're shared.
The mutex needs to be unique for the entire group of threads (i.e. all threads must use the same mutex). If you create a mutex for each thread, then you are not thread-safe at all, because each thread will wait on its own mutex and not be synchronized with the rest.
And yes an acquired mutex can be used safely to both read and write.
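A small sketch of the point about a single shared mutex, using the pthread API from the question. The names list_mutex, shared_list, worker, and run_workers are made up for illustration. Every thread locks the same mutex, and both the read and the push_back happen while it is held.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <pthread.h>

// One mutex and one list, shared by every thread (hypothetical names).
static pthread_mutex_t list_mutex = PTHREAD_MUTEX_INITIALIZER;
static std::list<int> shared_list;

// Each worker reads and writes the list while holding the same mutex.
static void* worker(void* arg) {
    const int id = *static_cast<int*>(arg);
    for (int i = 0; i < 1000; ++i) {
        pthread_mutex_lock(&list_mutex);
        std::size_t seen = shared_list.size();  // read under the lock
        (void)seen;
        shared_list.push_back(id);              // write under the same lock
        pthread_mutex_unlock(&list_mutex);
    }
    return nullptr;
}

// Starts four workers, waits for them, and returns the final list size.
std::size_t run_workers() {
    const int kThreads = 4;
    pthread_t threads[kThreads];
    int ids[kThreads];
    shared_list.clear();  // no other thread is running yet
    for (int i = 0; i < kThreads; ++i) {
        ids[i] = i;
        pthread_create(&threads[i], nullptr, worker, &ids[i]);
    }
    for (int i = 0; i < kThreads; ++i) pthread_join(threads[i], nullptr);
    return shared_list.size();  // safe: all workers have been joined
}
```

If each worker declared its own pthread_mutex_t instead, the code would still compile, but the push_back calls would no longer exclude each other.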

Thread-safe linked list with fine grained locks

In a program I have a class M:
class M{
/*
very big immutable fields
*/
int status;
};
And I need a linked-list of objects of type M.
Four types of threads are accessing the list:
Producers: Produce and append objects to the end of the list. All of the newly produced objects have the status=NEW. (Operation time = O(1))
Consumers: Consume objects at the beginning of the list. An object can be consumed by a consumer if it has status=CONSUMER_ID. Each of the consumers keeps the first item in the linked-list that it can consume so the consumption is (amortized?) O(1)(see note below).
Destructor: Deletes consumed objects when there is a notification that says the object has been consumed correctly (Operation time = O(1)).
Modifier: Changes the status of the objects based on a state diagram. The final status of any object is the id of a consumer (Operation time = O(1) per object).
The number of consumers is less than 10. The number of Producers may be as big as a couple of hundreds. There is one modifier.
note: The modifier may modify already consumed objects, and thus the stored items of consumers may move back and forth. I did not find any better solution for this problem (although the comparison between objects is O(1), the operation is no longer amortized O(1)).
The performance is very important. Therefore, I want to use atomic operations or fine-grained locks (one per object) to avoid unnecessary blocking.
My questions are:
Atomic operations are preferred because they are lighter. I guess I must use locks for updating the pointers in destructor thread only and I can use atomic operations for handling contention between other threads. Please let me know if I am missing something and there is a reason that I cannot use atomic operations on status field.
I think I cannot use STL list because it does not support fine-grained locks. But would you recommend using Boost::Intrusive lists (instead of writing my own)? Here it is mentioned that intrusive data structures are harder to make thread-safe? Is this true for fine-grained locks?
The producers, consumers and destructor would be called asynchronously based on some events (I am planning to use Boost::asio). But I don't know how to run the modifier to minimize its contention with other threads. The options are:
Asynchronously from producers.
Asynchronously from consumers.
Using its own timer.
Any such call would operate on the list only if some conditions hold. My own intuition is that there is no difference between how I call the modifier. Am I missing something?
My system is Linux/GCC and I am using boost 1.47 in case it matters.
Similar question: Thread-safe deletion of a linked list node, using the fine-grained approach
The performance is very important. Therefore, I want to use atomic operations or fine-grained locks (one per object) to avoid unnecessary blocking.
This will make performance worse by increasing the probability that threads that contend (access the same data) will run at the same time on different cores. If the locks are too fine, threads may contend (ping-pong data between their caches) and run in slow lock step without ever blocking on a lock, causing terrible performance.
You want to use coarse enough locks that threads that contend over the same data block each other as soon as possible. That will force the scheduler to schedule non-contending threads, eliminating the cache ping-ponging that destroys performance.
You have a common misconception that blocking is bad. In fact, contention is bad, because it slows cores down to bus speeds. Blocking ends contention. Blocking is good because it de-schedules contending threads, allowing non-contending threads (that can run concurrently at full speed) to be scheduled.
If you're already planning to use Boost Asio, then good news! You can stop writing your custom asynchronous producer-consumer queue right now.
The Boost Asio io_service class is an asynchronous queue, so you can easily use it to pass objects from producers to consumers. Use the io_service::post() method to enqueue a bound function object for asynchronous callback by another thread.
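#include <boost/asio.hpp>
#include <boost/bind.hpp>
boost::asio::io_service io_service_;
void consume(M* m);  // declared first so that produce() can bind it
void produce()
{
    M* m = new M;
    // Queue a call to consume(m); it runs on a thread that calls io_service_.run().
    io_service_.post(boost::bind(&consume, m));
}
void consume(M* m)
{
    delete m;
}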
Have your producer threads call produce(), then have your consumer threads call io_service_.run(), and then consume() will be called back on your consumer threads. Instant producer-consumer!
Plus, you can enqueue all kinds of other heterogeneous events into the io_service_ to be handled by your consumer threads if you like, such as network reads and waiting for signals. Boost Asio is more than just a network library; it's also an easy way to express a proactor, reactor, producer-consumer, thread-pool, or any other kind of threading architecture.
EDIT
Oh, and one more tip. Don't make separate pools of dedicated producer threads and dedicated consumer threads. Just make one thread for each core available on your machine (4 core machine => 4 threads). Then have all those threads call io_service_.run(). Use the io_service_ to asynchronously read stuff to produce, from files or the network or whatever, then use the io_service_ again to asynchronously consume whatever was produced.
That's the most performant threading architecture. One thread per core.
As @David Schwartz fairly noted, blocking is not always slow, and spinning (in user-space multithreaded applications) can be quite dangerous.
Moreover, the Linux pthread library has a "smart" implementation of pthread_mutex. It's designed to be "lightweight", i.e. when a thread tries to lock an already acquired mutex, it spins for some time, making several attempts to get the lock before it blocks. The number of attempts is not big enough to harm your system or even break real-time requirements (if any). An additional Linux-specific feature is the so-called fast user-space mutex (futex), which reduces the number of syscalls. The main idea is that a thread makes a syscall only when it really needs to block on a mutex (when a thread locks an unacquired mutex, it doesn't make a syscall).
Actually, in most cases you don't need to reinvent the wheel or introduce some very specific locking techniques. If you have to, then either something is wrong with the design or you're dealing with a highly concurrent environment (at first sight, 10 consumers don't seem like that, and all this seems like over-engineering).
If I were you, I'd prefer to use a condition variable plus a mutex protecting the list.
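A minimal sketch of the condition variable plus mutex approach, assuming C++11 and a plain std::list<int> standing in for the list of M objects. The names produce, consume, and run_demo are hypothetical. The consumer sleeps until the producer signals that the list is non-empty, so there is no spinning and no unsynchronized size check.

```cpp
#include <cassert>
#include <condition_variable>
#include <list>
#include <mutex>
#include <thread>

std::mutex list_mutex;
std::condition_variable list_not_empty;
std::list<int> items;  // protected by list_mutex

void produce(int value) {
    {
        std::lock_guard<std::mutex> lock(list_mutex);
        items.push_back(value);
    }
    list_not_empty.notify_one();  // wake one waiting consumer
}

// Blocks until an item is available, then removes and returns it.
int consume() {
    std::unique_lock<std::mutex> lock(list_mutex);
    // The predicate form re-checks the condition after every wakeup,
    // which also handles spurious wakeups.
    list_not_empty.wait(lock, [] { return !items.empty(); });
    int value = items.front();
    items.pop_front();
    return value;
}

// Hypothetical driver: one producer thread, the caller consumes n items.
long run_demo(int n) {
    std::thread producer([n] { for (int i = 1; i <= n; ++i) produce(i); });
    long sum = 0;
    for (int i = 0; i < n; ++i) sum += consume();
    producer.join();
    return sum;
}
```

The notify_one() call is made after the lock is released so that the woken consumer does not immediately block again on the mutex.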
Another thing I'd do is go over the design again. Why use one global list when a consumer needs to do a search to find out whether the list contains an item with its ID (and if so, remove/dequeue it)? Maybe it's better to make a separate list for each consumer? In that case you can probably get rid of the status field.
Is read access more frequent than write access? If so, it would be better to use R/W locks or RCU.
If I weren't satisfied with pthread primitives and futexes (and if I weren't, I would first have proved by tests that the locking primitives are the bottleneck, not the number of consumers or the algorithm I chose), then I'd try to think about a complicated algorithm with reference counting, a separate GC thread, and a restriction that all updates be atomic.
I would advise a slightly different approach to the problem:
Producers: Enqueue objects at the end of a shared queue (SQ) and wake up
the Modifier via a semaphore.
producer()
{
while (true)
{
o = get_object_from_somewhere ()
atomic_enqueue (SQ.queue, o)
signal(SQ.sem)
}
}
Consumers: Dequeue objects from the front of a per-consumer queue (CQ[i]).
consumer()
{
while (true)
{
wait (CQ[self].sem)
o = atomic_dequeue (CQ[self].queue)
process (o)
destroy (o)
}
}
Destructor: The destructor does not exist; after a consumer is done with
an object, the consumer destroys it.
Modifier: The modifier dequeues objects from the shared queue,
processes them, and enqueues them to the private queue of the appropriate consumer.
modifier()
{
while (true)
{
wait (SQ.sem)
o = atomic_dequeue (SQ.queue)
FSM (o)
atomic_enqueue (CQ [o.status].queue, o)
signal (CQ [o.status].sem)
}
}
A note on the various atomic_xxx functions in the pseudocode: this
does not necessarily mean using atomic instructions like CAS, CAS2,
LL/SC, etc. It can be done using atomics, spinlocks, or plain mutexes. I
would advise implementing it in the most straightforward way
(e.g. mutexes) and optimizing it later if it proves to be a
performance issue.
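The pseudocode above could be sketched in C++11 roughly as follows, taking the "most straightforward way" literally: a mutex plus a condition variable stands in for both atomic_enqueue/atomic_dequeue and the semaphore. Everything here is an assumption made for illustration: the Object struct, the BlockingQueue class, the use of a null pointer as a shutdown marker, and the payload % kConsumers rule that replaces the real FSM.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

struct Object { int payload; int status; };  // status ends up as a consumer index

// Mutex-based stand-in for atomic_enqueue/atomic_dequeue plus the semaphore:
// the condition variable plays the role of signal()/wait().
class BlockingQueue {
public:
    void enqueue(Object* o) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push_back(o);
        }
        cv_.notify_one();
    }
    Object* dequeue() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Object* o = q_.front();
        q_.pop_front();
        return o;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<Object*> q_;
};

// Runs one producer, one modifier, and two consumers. A null pointer is a
// hypothetical shutdown marker. Returns how many objects each consumer handled.
std::vector<int> run_pipeline(int n_objects) {
    const int kConsumers = 2;
    BlockingQueue sq;              // shared queue (SQ)
    BlockingQueue cq[kConsumers];  // per-consumer queues (CQ[i])
    std::vector<int> handled(kConsumers, 0);

    std::thread producer([&] {
        for (int i = 0; i < n_objects; ++i) sq.enqueue(new Object{i, -1});
        sq.enqueue(nullptr);  // tell the modifier to stop
    });
    std::thread modifier([&] {
        while (Object* o = sq.dequeue()) {
            o->status = o->payload % kConsumers;  // placeholder for FSM(o)
            cq[o->status].enqueue(o);
        }
        for (int c = 0; c < kConsumers; ++c) cq[c].enqueue(nullptr);
    });
    std::vector<std::thread> consumers;
    for (int c = 0; c < kConsumers; ++c) {
        consumers.emplace_back([&, c] {
            while (Object* o = cq[c].dequeue()) {
                ++handled[c];  // each consumer writes only its own slot
                delete o;      // the consumer destroys the object itself
            }
        });
    }
    producer.join();
    modifier.join();
    for (auto& t : consumers) t.join();
    return handled;
}
```

With this layout no thread ever searches a list for its own items, and each queue has its own lock, so producers contend only with the modifier and each consumer contends only with the modifier on its own queue.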

Thread safety for STL queue

I am using a queue to communicate between threads. I have one reader and multiple writer threads. My question is do I need to lock the queue every time when I use push/front/pop from the queue for the reader? Can I do something like the following:
//reader threads
getLock();
get the number of elements from the queue
releaseLock();
int i = 0;
while( i < numOfElements){
queue.front();
queue.pop();
i++;
}
The idea is that I want to reduce the granularity of the locked code. Since the writer threads only write to the back of the queue and there is only a single reader thread, can I read the elements without the lock once I have the number of elements, or do I need to enclose the front() and pop() in the lock as well?
As others have already mentioned, standard containers are not required to guarantee thread safety so what you're asking for cannot be implemented portably. You can reduce the time your reader thread is locking the writers out by using 2 queues and a queue pointer that indicates the queue that is currently in use by the writers.
Each writer would:
Acquire lock
Push element(s) into the queue currently pointed to by the queue pointer
Release lock
The reader can then do the following:
Acquire lock
Switch queue pointer to point to the second queue
Release lock
Process elements from the first queue
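The two-queue steps above can be sketched as follows, assuming C++11. Instead of switching a pointer between two queues, this variant swaps the shared queue with an empty local one, which has the same effect: the lock covers only the swap, and the reader then drains its private batch without holding the lock. The names write_item, take_batch, and run_demo are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

std::mutex queue_mutex;
std::queue<int> incoming;  // the queue writers currently push into

void write_item(int value) {
    std::lock_guard<std::mutex> lock(queue_mutex);
    incoming.push(value);
}

// Takes everything queued so far. The lock covers only the swap;
// the returned queue is private to the reader.
std::queue<int> take_batch() {
    std::queue<int> batch;
    std::lock_guard<std::mutex> lock(queue_mutex);
    std::swap(batch, incoming);
    return batch;
}

// Hypothetical driver: several writers, the caller is the single reader.
long run_demo(int writers, int per_writer) {
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w) {
        threads.emplace_back([per_writer] {
            for (int i = 1; i <= per_writer; ++i) write_item(i);
        });
    }
    long sum = 0;
    int received = 0;
    const int expected = writers * per_writer;
    while (received < expected) {
        std::queue<int> batch = take_batch();
        while (!batch.empty()) {  // no lock needed: batch is local
            sum += batch.front();
            batch.pop();
            ++received;
        }
        std::this_thread::yield();
    }
    for (auto& t : threads) t.join();
    return sum;
}
```

The writers are blocked only for the duration of a swap, regardless of how long the reader takes to process each element.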
Any type that doesn't explicitly state its thread-safety guarantees should always be controlled by a mutex. That said, your implementation's stdlib may allow some variation of this — but you can't know for all implementations of std::queue.
As std::queue wraps another container (it's a container adapter), you need to look at the underlying container, which defaults to deque.
You may find it easier, better, or more portable to write your own container adapter that makes the guarantees you need. I don't know of anything that does this exactly for a queue in Boost.
I haven't looked at C++0x enough to know if it has any solution for this out-of-the-box, but that could be another option.
This is absolutely implementation-dependent. Before C++11, the C++ standard makes no mention of threads or thread safety, so whether or not this will work depends on how your implementation handles queue elements.
In your case, the reader is actually popping the queue, which is considered a write operation. I doubt any of the common implementations actually guarantee thread-safety in this case, when multiple threads simultaneously write to a container. At least VC++ does not:
For reads to the same object, the object is thread safe for reading when no writers on other threads.
For writes to the same object, the object is thread safe for writing from one thread when no readers on other threads.
Sometimes you can resolve a lot of concurrency headache by avoiding sharing state or resources among threads. If you have multiple threads that access a container concurrently in order to push in their work then try to have them work on dedicated containers. At specific points you then collect the containers' elements onto the central container in a non-concurrent manner.
If you can avoid sharing state or resources among threads then you have no problem running threads concurrently. Threads then need not worry about each other, because they are completely isolated and bear no effect whatsoever on each other.
Your hunch is correct: even though you cannot count on std::queue to be thread-safe, a queue should be thread-safe by design.
A nice explanation of why that is the case, and a standard implementation of thread-safe, lock-free queues in C++, is given by van Dooren.

To Mutex or Not To Mutex?

Do I need a mutex if I have only one reader and one writer? The reader takes the next command (food.front()) from the queue and executes a task based on the command. After the command is executed, it pops off the command. The writer to the queue pushes commands onto the queue (food.push()).
Do I need a mutex? My reader (consumer) only executes if food.size() > 0. I am using a reader thread and send thread.
A mutex is used in multi-threaded environments. I don't see mention of threads in your question, so I don't see a need for a mutex.
However, if we assume that by reader and writer you mean you have two threads, you need to protect the shared data with a mutex (or another multi-threaded protection scheme).
What happens when the queue has items, and the reader thread pops something off while the writer thread puts something on? Disaster! With a mutex, you'll be sure only one thread is operating on the queue at a time.
Another method is a lock-free thread-safe queue. It would use atomic operations to ensure the data isn't manipulated incorrectly.
What happens if the reader sees the size is greater than zero, but the structure isn't yet completely updated?
That can be avoided by very carefully coding the updates, but the way to make the code robust against future modifications is to use a mutex.
Assuming the "writer" and the "reader" are in separate threads:
Most probably yes: you could have a "metastable" state between the "writing" event and a "reading" event where the pointers to the structures are inconsistent.
Of course this depends on the implementation: if an atomic operation is used to update the pointers, you might be good without a mutex.
It depends entirely on the implementation. If you have two different threads accessing the same variables, you will need a mutex; otherwise you may, for example, end up with an inconsistent count.
Say that in write you do ++count and in read you do --count, and say the current value is 2. Note that these statements do not need to be atomic: the ++count may consist of reading the variable count, incrementing it, and then writing it back again. Now a write and a read are executed simultaneously, and say the first part of the write is executed (i.e. it loads the value 2), then the whole read is executed, decrementing the count to 1. The other thread still has the value 2 loaded, which it increments and subsequently writes back to the variable as 3. Now you have just lost a read action.
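One way to rule out this lost update, sketched here under the assumption that C++11 is available, is to make the counter a std::atomic<int>, so that each ++count and --count is a single indivisible read-modify-write. The function name balanced_count is made up for illustration. Note that this fixes only the counter; the rest of the queue structure still needs its own protection.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the final count after one thread performs n increments and
// another performs n decrements on the same atomic counter.
int balanced_count(int n) {
    std::atomic<int> count(2);
    std::thread writer([&] { for (int i = 0; i < n; ++i) ++count; });
    std::thread reader([&] { for (int i = 0; i < n; ++i) --count; });
    writer.join();
    reader.join();
    return count.load();
}
```

With a plain int in place of std::atomic<int>, the same code would contain a data race and the result could be anything.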
Your question depends on two conditions:
There are only two threads: one is the producer, the other is the consumer.
The structure is designed to be lock-free.
If both are satisfied, you can drop the lock; otherwise you need to use a lock to protect the queue structure.
When dropping the lock, you must remember to update the head or tail pointer as the last step of each operation.
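A sketch of a structure that meets both conditions, assuming C++11 atomics: a fixed-capacity ring buffer for exactly one producer and one consumer. The class name SpscRing and the helper sum_through_ring are hypothetical, and this is an illustration of the "update the index last" rule rather than a production queue. The producer writes the element first and only then publishes the new tail; the consumer reads the element first and only then publishes the new head.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical fixed-capacity ring buffer; safe only with exactly one
// producer thread and exactly one consumer thread. Holds Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscRing {
public:
    SpscRing() : head_(0), tail_(0) {}

    bool try_push(const T& value) {  // producer only
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        slots_[tail] = value;                          // 1. write the element
        tail_.store(next, std::memory_order_release);  // 2. publish it last
        return true;
    }

    bool try_pop(T& out) {  // consumer only
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[head];                            // 1. read the element
        head_.store((head + 1) % Capacity,
                    std::memory_order_release);        // 2. free the slot last
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_;  // next slot to read; written by the consumer
    std::atomic<std::size_t> tail_;  // next slot to write; written by the producer
};

// One producer thread pushes 1..n; the caller pops and sums them.
long sum_through_ring(int n) {
    SpscRing<int, 64> ring;
    std::thread producer([&] {
        for (int i = 1; i <= n; ++i)
            while (!ring.try_push(i)) std::this_thread::yield();
    });
    long sum = 0;
    for (int received = 0; received < n; ) {
        int value;
        if (ring.try_pop(value)) { sum += value; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

The release store on each index pairs with the acquire load on the other side, which is what makes the element write visible before the index change. Adding a second producer or a second consumer breaks these assumptions, and a lock (or a different algorithm) is needed again.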