I am using a queue to communicate between threads. I have one reader thread and multiple writer threads. My question is: do I need to lock the queue every time the reader uses push/front/pop on it? Can I do something like the following:
// reader thread
getLock();
size_t numOfElements = queue.size();   // snapshot the element count while holding the lock
releaseLock();

size_t i = 0;
while (i < numOfElements) {
    queue.front();   // read (and then discard) the front element
    queue.pop();
    ++i;
}
The idea is to reduce the granularity of the locked code, since the writer threads only write to the back of the queue and there is only a single reader thread. As long as I get the number of elements under the lock, can I then pop that many elements without locking, or do I need to enclose front() and pop() in the lock as well?
As others have already mentioned, standard containers are not required to guarantee thread safety, so what you're asking for cannot be implemented portably. You can, however, reduce the time your reader thread locks the writers out by using two queues and a queue pointer that indicates which queue is currently in use by the writers (a sketch follows the steps below).
Each writer would:
Acquire lock
Push element(s) into the queue currently pointed to by the queue pointer
Release lock
The reader can then do the following:
Acquire lock
Switch queue pointer to point to the second queue
Release lock
Process elements from the first queue
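A minimal sketch of the scheme described above, assuming a std::mutex guards the pointer swap (queue_a, queue_b and active_queue are illustrative names, not from the answer):

#include <mutex>
#include <queue>

std::queue<int> queue_a, queue_b;
std::queue<int>* active_queue = &queue_a;   // the queue writers currently push into
std::mutex mtx;

// writer
void push_item(int value)
{
    std::lock_guard<std::mutex> lock(mtx);
    active_queue->push(value);
}

// reader
void drain_items()
{
    std::queue<int>* to_process;
    {
        std::lock_guard<std::mutex> lock(mtx);
        to_process = active_queue;   // take the queue the writers filled
        active_queue = (active_queue == &queue_a) ? &queue_b : &queue_a;   // writers now use the other one
    }
    while (!to_process->empty())     // processed outside the lock
    {
        int value = to_process->front();
        to_process->pop();
        // handle value here
        (void)value;
    }
}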
Any type that doesn't explicitly state its thread-safety guarantees should always be protected by a mutex. That said, your particular standard library implementation may allow some variation of this, but you can't rely on it across all implementations of std::queue.
As std::queue wraps another container (it's a container adapter), you need to look at the underlying container, which defaults to deque.
You may find it easier, better, or more portable to write your own container adapter that makes the guarantees you need. I don't know of anything that does this exactly for a queue in Boost.
I haven't looked at C++0x enough to know if it has any solution for this out-of-the-box, but that could be another option.
This is absolutely implementation-dependent. The C++ standard makes no mention about threads or thread safety, so whether or not this will work depends on how your implementation handles queue elements.
In your case, the reader is actually popping the queue, which is considered a write operation. I doubt any of the common implementations actually guarantee thread-safety in this case, when multiple threads simultaneously write to a container. At least VC++ does not:
For reads to the same object, the object is thread safe for reading when no writers on other threads.
For writes to the same object, the object is thread safe for writing from one thread when no readers on other threads.
Sometimes you can resolve a lot of concurrency headaches by avoiding shared state or resources among threads. If you have multiple threads that access a container concurrently in order to push in their work, try to have them work on dedicated containers instead. At specific points you then collect the containers' elements into the central container in a non-concurrent manner.
If you can avoid sharing state or resources among threads, then you have no problem running threads concurrently. The threads need not worry about each other, because they are completely isolated and have no effect whatsoever on each other.
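A small sketch of that pattern: each worker fills its own container with no locking, and the results are merged into the central container under a short lock at a defined collection point (all names here are illustrative):

#include <mutex>
#include <vector>

std::vector<int> central;        // only touched at collection points
std::mutex central_mtx;

// Each worker fills its own container; no synchronization needed here.
void worker(std::vector<int>& local)
{
    for (int i = 0; i < 1000; ++i)
        local.push_back(i);
}

// At a defined point, the worker's results are merged under a single short lock.
void collect(const std::vector<int>& local)
{
    std::lock_guard<std::mutex> lock(central_mtx);
    central.insert(central.end(), local.begin(), local.end());
}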
Your hunch is correct: even though you cannot count on std::queue being thread safe, a queue can be made thread safe by design.
A nice explanation of why that is the case, along with an implementation of thread-safe, lock-free queues in C++, is given by van Dooren.
Related
So, suppose I have a struct A { int val1; int val2; };
And a std::queue<A> fifo
Two threads,
Reader thread: reads all contents from the fifo, and clears it.
Writer thread: writes one A at a time to queue.
Is std::queue enough for maintaining a lockless thread-safe fifo container with one reader and one writer? If not, can any other STL container work? std::deque is the default underlying container for std::queue.
No, you absolutely cannot use any STL container directly for this. What you can use is any of the many, many lockfree queue implementations that already exist for C++. You should search for "SPSC" meaning Single Producer, Single Consumer. For example, from Boost: http://www.boost.org/doc/libs/1_59_0/doc/html/boost/lockfree/spsc_queue.html
One wait-free, fixed-size implementation is right here: SPSC lock free queue without atomics (but do note the answer and comments there, which explain some ways that the implementation in the question is not completely safe, and offer some solutions).
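Minimal usage of the Boost spsc_queue linked above might look like the following (the capacity and the reuse of struct A from the question are just for illustration):

#include <boost/lockfree/spsc_queue.hpp>

struct A { int val1; int val2; };

// Fixed-capacity single-producer/single-consumer queue; safe without locks
// as long as exactly one thread pushes and exactly one thread pops.
boost::lockfree::spsc_queue<A, boost::lockfree::capacity<1024> > fifo;

void writer_thread()
{
    A a = { 1, 2 };
    while (!fifo.push(a)) { /* queue full: retry or back off */ }
}

void reader_thread()
{
    A a;
    while (fifo.pop(a))
    {
        // consume a.val1 and a.val2
    }
}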
I have a shared queue (implemented using a singleton queue wrapper) and a reader thread and a writer thread. I also have a mechanism to inform the reader thread when the writer thread adds elements (enqueues) to the queue. The reader thread dequeues only one element when informed. Is a read/write lock necessary in this scenario?
Since the writer is only enqueuing and the reader only dequeuing, I feel like there is no need for a lock, as long as the reader checks the queue size when dequeuing.
Since the writer is only enqueuing and the reader only dequeuing, I feel like there is no need for a lock, as long as the reader checks the queue size when dequeuing.
Among other problems, that operation alone is already unsafe when the queue is modified by another thread. In C++, any unsynchronized access to a non-atomic shared variable, where at least one of the accesses is a write, is a data race and hence undefined behaviour.
I assume you mean a std::queue, and no, most operations on STL containers are not thread safe. For a discussion of the exceptions, see C++11 STL containers and thread safety. The STL prefers speed over safety (e.g. no range checks for array indices, etc.), assuming that developers will implement their own checks.
Using std::forward_list are there any data races when erasing and inserting? For example I have one thread that does nothing but add new elements at the end of the list, and I have another thread that walks the (same) list and can erase elements from it.
From what I know of linked lists, each element holds a pointer to the next element, so if I erase the last element, at the same time that I am inserting a new element, would this cause a data race or do these containers work differently (or do they handle that possibility)?
If it is a data race, is there a (simple and fast) way to avoid this? (Note: The thread that inserts is the most speed critical of the two.)
There are thread-safety guarantees for the standard C++ library containers, but they tend not to be of the kind people would consider thread-safety guarantees (that is, however, an error of people expecting the wrong thing). The thread-safety guarantees of standard library containers are roughly the following (the relevant section is 17.6.5.9 [res.on.data.races]):
You can have as many readers of a container as you want. What exactly qualifies as a reader is a bit subtle, but it roughly amounts to users of const member functions plus use of a few non-const members that only read the data (the thread safety of the read data isn't any of the container's concern, i.e., 23.2.2 [container.requirements.dataraces] specifies that the elements can be changed without the container introducing data races).
If there is one writer of a container, there shall be no other readers or writers of the container in another thread.
That is, reading one end of a container and writing the other end is not thread safe! In fact, even if the actual container changes don't affect the reader immediately, you always need synchronization of some form when communicating a piece of data from one thread to another. That is, even if you could guarantee that the consumer doesn't erase() the node the producer is currently insert()ing, there would still be a data race.
No, neither forward_list nor any other STL containers are thread-safe for writes. You must provide synchronization so that no other threads read or write to the container while a write is occurring. Only simultaneous reads are safe.
The simplest way to do this is to use a mutex to lock access to the container while an insert is occurring. Doing this in a portable way requires C++ 11 (std::mutex) or platform-specific features (mutexes in Windows, perhaps pthreads in Linux/Unix).
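For example, a sketch with std::mutex (C++11), where the inserting thread and the walking/erasing thread both take the same lock (flist and the function names are illustrative; push_front is used instead of appending at the end just to keep it short):

#include <forward_list>
#include <mutex>

std::forward_list<int> flist;
std::mutex flist_mtx;

// speed-critical inserter: adds an element under the lock
void insert_value(int v)
{
    std::lock_guard<std::mutex> lock(flist_mtx);
    flist.push_front(v);
}

// walker/eraser: also under the lock, since unlinking nodes is a write
void erase_matching(int v)
{
    std::lock_guard<std::mutex> lock(flist_mtx);
    flist.remove(v);   // walks the list and unlinks matching nodes
}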
Unless you're using a version of the STL that explicitly states it is thread-safe, no, the containers are not thread safe.
It's rare to make general-purpose containers thread safe by default, as it imposes a performance hit on users who don't require thread-safe access to the container, and that is by far the normal usage pattern.
If thread safety is an issue for you, then you'll need to surround your code with locks, or use a data structure that is specifically designed for multi-threaded access.
std containers are not meant to be thread safe.
You should carefully protect them during modifying operations.
In a program I have a class M:
class M{
/*
very big immutable fields
*/
int status;
};
And I need a linked-list of objects of type M.
Three types of threads are accessing the list:
Producers: Produce and append objects to the end of the list. All of the newly produced objects have the status=NEW. (Operation time = O(1))
Consumers: Consume objects at the beginning of the list. An object can be consumed by a consumer if it has status=CONSUMER_ID. Each of the consumers keeps track of the first item in the linked list that it can consume, so consumption is (amortized?) O(1) (see note below).
Destructor: Deletes consumed objects when there is a notification that says the object has been consumed correctly (Operation time = O(1)).
Modifier: Changes the status of the objects based on a state diagram. The final status of any object is the id of a consumer (Operation time = O(1) per object).
The number of consumers is less than 10. The number of producers may be as big as a couple of hundred. There is one modifier.
note: The modifier may modify already consumed objects, and thus the items stored by the consumers may move back and forth. I did not find any better solution for this problem (although the comparison between objects is O(1), the operation is no longer amortized O(1)).
The performance is very important. Therefore, I want to use atomic operations or fine-grained locks (one per object) to avoid unnecessary blocking.
My questions are:
Atomic operations are preferred because they are lighter. I guess I must use locks only for updating the pointers in the destructor thread, and I can use atomic operations for handling contention between the other threads. Please let me know if I am missing something, or if there is a reason that I cannot use atomic operations on the status field.
I think I cannot use an STL list because it does not support fine-grained locks. But would you recommend using Boost::Intrusive lists (instead of writing my own)? Here it is mentioned that intrusive data structures are harder to make thread-safe. Is this true for fine-grained locks?
The producers, consumers and destructor would be called asynchronously based on some events (I am planning to use Boost::asio). But I don't know how to run the modifier so as to minimize its contention with the other threads. The options are:
Asynchronously from producers.
Asynchronously from consumers.
Using its own timer.
Any such call would operate on the list only if some conditions hold. My own intuition is that there is no difference between how I call the modifier. Am I missing something?
My system is Linux/GCC and I am using boost 1.47 in case it matters.
Similar question: Thread-safe deletion of a linked list node, using the fine-grained approach
The performance is very important. Therefore, I want to use atomic operations or fine-grained locks (one per object) to avoid unnecessary blocking.
This will make performance worse by increasing the probability that threads that contend (access the same data) will run at the same time on different cores. If the locks are too fine, threads may contend (ping-pong data between their caches) and run in slow lock step without ever blocking on a lock, causing terrible performance.
You want to use coarse enough locks that threads that contend over the same data block each other as soon as possible. That will force the scheduler to schedule non-contending threads, eliminating the cache ping-ponging that destroys performance.
You have a common misconception that blocking is bad. In fact, contention is bad, because it slows cores down to bus speeds. Blocking ends contention. Blocking is good because it de-schedules contending threads, allowing non-contending threads (that can run concurrently at full speed) to be scheduled.
If you're already planning to use Boost Asio, then good news! You can stop writing your custom asynchronous producer-consumer queue right now.
The Boost Asio io_service class is an asynchronous queue, so you can easily use it to pass objects from producers to consumers. Use the io_service::post() method to enqueue a bound function object for asynchronous callback by another thread.
#include <boost/asio.hpp>
#include <boost/bind.hpp>

boost::asio::io_service io_service_;

void consume(M* m);   // forward declaration so produce() can bind it

void produce()
{
    M* m = new M;
    io_service_.post(boost::bind(&consume, m));   // hand the object to a consumer thread
}

void consume(M* m)
{
    // runs on whichever thread is executing io_service_.run()
    delete m;
}
Have your producer threads call produce(), then have your consumer threads call io_service_.run(), and consume() will be called back on your consumer threads. Instant producer-consumer!
Plus, you can enqueue all kinds of other heterogeneous events into the io_service_ to be handled by your consumer threads if you like, such as network reads and waiting for signals. Boost Asio is more than just a network library; it's also an easy way to express a proactor, reactor, producer-consumer, thread-pool, or any other kind of threading architecture.
EDIT
Oh, and one more tip. Don't make separate pools of dedicated producer threads and dedicated consumer threads. Just make one thread for each core available on your machine (4 core machine => 4 threads). Then have all those threads call io_service_.run(). Use the io_service_ to asynchronously read stuff to produce, from files or the network or whatever, then use the io_service_ again to asynchronously consume whatever was produced.
That's the most performant threading architecture. One thread per core.
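A minimal sketch of that layout, assuming the io_service_ from the snippet above is passed in (run_pool and run_service are illustrative names):

#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>

// Each pool thread just pulls work from the shared io_service.
void run_service(boost::asio::io_service* io)
{
    io->run();   // returns once the io_service runs out of work or is stopped
}

void run_pool(boost::asio::io_service& io_service_)
{
    boost::thread_group threads;
    unsigned n = boost::thread::hardware_concurrency();   // one thread per core
    for (unsigned i = 0; i < n; ++i)
        threads.create_thread(boost::bind(&run_service, &io_service_));
    threads.join_all();
}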
As #David Schwartz fairly noted, blocking is not always slow and spinning (in user space multithreaded applications) can be quite dangerous.
Moreover, the Linux pthread library has a "smart" implementation of pthread_mutex. It's designed to be lightweight: when a thread tries to lock an already acquired mutex, it spins for some time, making several attempts to get the lock before it blocks. The number of attempts is not big enough to harm your system or even break real-time requirements (if any). An additional Linux-specific feature is the so-called fast user-space mutex (futex), which reduces the number of syscalls: the lock syscall is made only when a thread really needs to block on a mutex (locking an uncontended mutex requires no syscall at all).
Actually, in most cases you don't need to reinvent the wheel or introduce some very specific locking technique. If you do have to, then either something is wrong with the design or you're dealing with a highly concurrent environment (at first sight, 10 consumers don't suggest that, and all of this looks like over-engineering).
If I were you, I'd prefer a condition variable plus a mutex protecting the list.
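Something along these lines, sketched with C++11 primitives for brevity (the same structure works with pthread_mutex/pthread_cond):

#include <condition_variable>
#include <list>
#include <mutex>

std::list<int> shared_list;
std::mutex mtx;
std::condition_variable cv;

void produce_item(int item)
{
    {
        std::lock_guard<std::mutex> lock(mtx);
        shared_list.push_back(item);
    }
    cv.notify_one();   // wake one waiting consumer
}

int consume_item()
{
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [] { return !shared_list.empty(); });   // sleep until there is work
    int item = shared_list.front();
    shared_list.pop_front();
    return item;
}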
Another thing I'd do is go over the design again. Why use one global list when a consumer needs to search it to find out whether it contains an item with the consumer's ID (and, if so, remove/dequeue it)? Maybe it's better to keep a separate list for each consumer? In that case you can probably get rid of the status field.
Is read access more frequent than write access? If so, it would be better to use R/W locks or RCU.
If I weren't satisfied with the pthread primitives and the futex machinery (and before going that route I would prove with tests that the locking primitives are the bottleneck, not the number of consumers or the algorithm I chose), then I'd start thinking about a more complicated algorithm with reference counting, a separate GC thread, and the restriction that all updates be atomic.
I would advise a slightly different approach to the problem:
Producers: Enqueue objects at the end of a shared queue (SQ) and wake up the Modifier via a semaphore.
producer()
{
    while (true)
    {
        o = get_object_from_somewhere ()
        atomic_enqueue (SQ.queue, o)
        signal (SQ.sem)
    }
}
Consumers: Dequeue objects from the front of a per-consumer queue (CQ[i]).
consumer()
{
    while (true)
    {
        wait (CQ[self].sem)
        o = atomic_dequeue (CQ[self].queue)
        process (o)
        destroy (o)
    }
}
Destructor: The destructor does not exist; after a consumer is done with an object, the consumer destroys it.
Modifier: The modifier dequeues objects from the shared queue, processes them, and enqueues them onto the private queue of the appropriate consumer.
modifier()
{
    while (true)
    {
        wait (SQ.sem)
        o = atomic_dequeue (SQ.queue)
        FSM (o)
        atomic_enqueue (CQ [o.status].queue, o)
        signal (CQ [o.status].sem)
    }
}
A note on the various atomic_xxx functions in the pseudo code: this does not necessarily mean using atomic instructions like CAS, CAS2, LL/SC, etc. It can be done with atomics, spinlocks or plain mutexes. I would advise implementing it in the most straightforward way (e.g. mutexes) and optimizing later if it proves to be a performance issue.
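For illustration, a straightforward mutex-based stand-in for the atomic_enqueue/atomic_dequeue pair plus the semaphore could look like this (a sketch that uses a condition variable instead of a counting semaphore; BlockingQueue is an illustrative name, playing the role of SQ or CQ[i] above):

#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class BlockingQueue
{
public:
    void enqueue(T item)            // atomic_enqueue + signal(sem)
    {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            q_.push(item);
        }
        cv_.notify_one();
    }

    T dequeue()                     // wait(sem) + atomic_dequeue
    {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T item = q_.front();
        q_.pop();
        return item;
    }

private:
    std::queue<T> q_;
    std::mutex mtx_;
    std::condition_variable cv_;
};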
I have a thread push_back-ing to an STL list and another thread pop_front-ing from the list. Do I need to lock the list with a mutex in such a case?
From SGI's STL on Thread Safety:
If multiple threads access a single container, and at least one thread may potentially write, then the user is responsible for ensuring mutual exclusion between the threads during the container accesses.
Since both your threads modify the list, I guess you have to lock it.
Most STL implementations are thread safe in the sense that you can access several different instances of a list type from several threads without locking. But you MUST lock when you are accessing the same instance of your list.
Have a look at this for more information: thread safety in SGI STL
Probably. These operations are not simple enough to be atomic, so they'll only be thread-safe if the implementation explicitly performs the necessary locking.
However, the C++ standard does not specify whether these operations should be thread-safe, so it is up to the individual implementation to decide that. Check the docs. (Or let us know which implementation you're using)
There is no guarantee that an STL implementation is thread-safe, and since thread safety costs performance, I would guess that most aren't. You should definitely use a mutex.
Since the STL pop/push operations are, AFAIK, non-atomic, you do have to use a mutex.