I am planning to do the following:
Store a deque of pre-built objects to be consumed. The main thread might consume these objects here and there. I have another, junky thread used for logging and other non-time-critical but expensive things. When the pre-built objects run low, I will refill the deque from the junky thread.
Now my question is: is there going to be a race condition here? Technically one thread is consuming objects from the front, and another thread is pushing objects onto the back. As long as I don't let the size run down to zero, it should be fine. The only thing that concerns me is the "size" of this deque. Do STL containers store an integer "size" variable? Would modifying that size variable introduce race conditions?
What's the best way of solving this problem? I don't really want to use locks, because the main thread is performance-critical (the reason I pre-built these objects in the first place!).
STL containers are not thread-safe, period; don't play with this. Specifically, the deque elements are usually stored in a chain of short arrays, and that chain is modified when operating on the deque, so there's a lot of room for messing things up.
Another option would be to have two deques, one for reading and another for writing. The main thread reads, and the other writes. When the read deque is empty, switch the deques (just swap two pointers), which would involve a lock, but only occasionally.
The consumer thread would drive the switch so it would only need to do a lock when switching. The producer thread would need to lock per write in case the switch happens in the middle of a write, but as you mention the consumer is less performance-critical, so no worries there.
What you're suggesting regarding no locks is indeed dangerous as others mention.
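A minimal sketch of this two-deque scheme (the Item type and class shape are made up for illustration):

```cpp
#include <deque>
#include <mutex>
#include <utility>

struct Item {};  // stand-in for the pre-built objects

class DoubleBufferedQueue {
    std::deque<Item> bufA_, bufB_;
    std::deque<Item>* read_  = &bufA_;  // touched by the consumer only
    std::deque<Item>* write_ = &bufB_;  // filled by the producer under the lock
    std::mutex swapMutex_;

public:
    // Producer: must lock every push, since a swap may happen mid-write.
    void push(Item item) {
        std::lock_guard<std::mutex> lock(swapMutex_);
        write_->push_back(std::move(item));
    }

    // Consumer: lock-free on the fast path; locks only to swap the
    // two deques when the read buffer runs dry.
    bool pop(Item& out) {
        if (read_->empty()) {
            std::lock_guard<std::mutex> lock(swapMutex_);
            std::swap(read_, write_);          // just two pointers
            if (read_->empty()) return false;  // producer had nothing either
        }
        out = std::move(read_->front());
        read_->pop_front();
        return true;
    }
};
```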
As @sharptooth mentioned, STL containers aren't thread-safe. Are you using a C++11-capable compiler? If so, you could implement a lock-free queue using atomic types. Otherwise you'd need to use assembler for compare-and-swap, or use a platform-specific API (see here). See this question for information on how to do this.
I would emphasise that you should measure performance when using standard thread synchronisation and see if you do actually need a lock-free technique.
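If you do go lock-free, a single-producer single-consumer queue can be built from a fixed array and two atomic indices; here is a minimal C++11 sketch (the capacity and element type are illustrative, and one slot is sacrificed to distinguish full from empty):

```cpp
#include <array>
#include <atomic>
#include <cstddef>

template <typename T, std::size_t N>
class SpscQueue {
    std::array<T, N> buf_;
    std::atomic<std::size_t> head_{0};  // next slot to read (consumer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to write (producer-owned)

public:
    bool push(const T& v) {  // call from the producer thread only
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % N;
        if (next == head_.load(std::memory_order_acquire))
            return false;    // full
        buf_[t] = v;
        tail_.store(next, std::memory_order_release);  // publish the write
        return true;
    }

    bool pop(T& out) {       // call from the consumer thread only
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire))
            return false;    // empty
        out = buf_[h];
        head_.store((h + 1) % N, std::memory_order_release);
        return true;
    }
};
```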
There will be a data race even with a non-empty deque.
You'll have to protect all accesses (not just writes) to the deque through locks, or use a queue specifically designed for the producer-consumer model in a multi-threaded environment (such as Microsoft's unbounded_buffer).
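For reference, a rough sketch of the unbounded_buffer route (it lives in the MSVC-only Asynchronous Agents Library, header <agents.h>):

```cpp
#include <agents.h>   // Microsoft Asynchronous Agents Library (MSVC)
#include <iostream>

int main() {
    concurrency::unbounded_buffer<int> buffer;

    // Producer side: send never blocks on an unbounded buffer.
    concurrency::send(buffer, 42);

    // Consumer side: receive blocks until a message is available.
    int value = concurrency::receive(buffer);
    std::cout << value << '\n';
}
```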
Related
I am new to using threads and have read a lot about how data is shared and protected. But I have also not really got a good grasp of using mutexes and locks to protect data.
Below is a description of the problem I will be working on. The important thing to note is that it will be time-critical, so I need to reduce overheads as much as possible.
I have two fixed-size double arrays.
The first array will provide data for subsequent calculations.
Threads will read values from it, but it will never be modified. An element may be read at some time by any of the threads.
The second array will be used to store the results of the calculations performed by the threads. An element of this array will only ever be updated by one thread, and probably only once, when the result value is written to it.
My questions then:
Do I really need to use a mutex in a thread each time I access the data from the read-only array? If so, could you explain why?
Do I need to use a mutex in a thread when it writes to the result array even though this will be the only thread that ever writes to this element?
Should I use atomic data types, and will there be any significant time overhead if I do?
Many answers to this type of question seem to be - no, you don't need the mutex if your variables are aligned. Would my array elements in this example be aligned, or is there some way to ensure they are?
The code will be implemented on 64-bit Linux. I am planning on using the Boost libraries for multithreading.
I have been mulling this over and looking all over the web for days, and once posted, the answer and clear explanations came back in literally seconds. There is an "accepted answer," but all the answers and comments were equally helpful.
Do I really need to use a mutex in a thread each time I access the data from the read-only array? If so could you explain why?
No. Because the data is never modified, there cannot be a synchronization problem.
Do I need to use a mutex in a thread when it writes to the result array even though this will be the only thread that ever writes to this element?
Depends.
If any other thread is going to read the element, you need synchronization.
If any thread may modify the size of the vector, you need synchronization.
In any case, take care that different threads don't frequently write to adjacent memory locations. That can destroy performance; see "false sharing". Considering you probably don't have a lot of cores (and therefore not a lot of threads), and you say each write is done only once, this is probably not going to be a significant problem though.
Should I use atomic data types and will there be any significant time over head if I do?
If you use locks (a mutex), atomic variables are not necessary (and they do have overhead). If you need no synchronization, atomic variables are not necessary. If you need synchronization, then atomic variables can be used to avoid locks in some cases. In which cases you can use atomics instead of locks is more complicated, and beyond the scope of this question I think.
Given the description of your situation in the comments, it seems that no synchronization is required at all and therefore no atomics nor locks.
...Would my array elements in this example be aligned, or is there some way to ensure they are?
As pointed out by Arvid, you can request specific alignment using the alignas keyword, which was introduced in C++11. Pre-C++11, you may resort to compiler-specific extensions: https://gcc.gnu.org/onlinedocs/gcc-5.1.0/gcc/Variable-Attributes.html
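A minimal sketch of both points, padding per-thread result slots with alignas so each lands on its own cache line (the 64-byte line size is an assumption typical of x86-64):

```cpp
// Each thread writes only to its own slot; alignas(64) pads every slot
// out to a full cache line so neighbouring slots never share one, which
// avoids false sharing between the writer threads.
struct alignas(64) PaddedResult {
    double value;
};

PaddedResult results[8];  // thread i writes only results[i].value
```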
Under the two conditions given, there's no need for mutexes. Remember, every use of a mutex (or any synchronization construct) is a performance overhead, so you want to avoid them as much as possible (without compromising correctness, of course).
No. Mutexes are not needed since threads are only reading the array.
No. Since each thread only writes to a distinct memory location, no race condition is possible.
No. There's no need for atomic access to objects here. In fact, using atomic objects could affect performance negatively, as it prevents optimizations such as the reordering of operations.
The only time you need to use locks is when data on a shared resource is modified. E.g. if some threads were used to write data and some used to read data (in both cases from the same resource), then you only need a lock for when writing is done. This is to prevent what's known as a "race".
There is good information about races on Google, for when you write programs that manipulate data on a shared resource.
You are on the right track.
1) For the first array (read-only), you do not need to utilize a mutex lock. Since the threads are just reading, not altering, the data, there is no way a thread can corrupt the data for another thread.
2) I'm a little confused by this question. If you know that thread 1 will only write to array slot 1 and thread 2 will only write to array slot 2, then you do not need a mutex lock. However, I'm not sure how you're achieving this property. If my above statement is not correct for your situation, you would definitely need a mutex lock.
3) Given the definition of atomic:
Atomic types are types that encapsulate a value whose access is guaranteed to not cause data races and can be used to synchronize memory accesses among different threads.
Key note: grabbing or releasing a lock must itself be atomic, i.e. a single indivisible read-modify-write instruction (such as test-and-set or compare-and-swap). If grabbing a lock required two separate instructions, the lock would not be thread-safe: if thread 1 executed the first instruction and was then switched out, thread 2 could grab the lock as well.
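To illustrate, a toy spinlock built on std::atomic_flag, where the whole grab step is one atomic test-and-set:

```cpp
#include <atomic>

class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    // test_and_set atomically reads the old value and sets the flag in one
    // indivisible step, so no thread can sneak in between "test" and "set".
    void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) {} }
    void unlock() { flag_.clear(std::memory_order_release); }
};
```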
Use of atomic data types would decrease your overhead but not significantly.
4) I'm not sure how you can ensure your variables are aligned, since threads can switch at any moment in your program (your OS determines when a thread switches).
Hope this helps
How can I develop a producer/consumer pattern which is thread-safe?
In my case, the producer runs in one thread and the consumer runs in another thread.
Is std::deque safe for this purpose?
Can I push_back to the back of a deque in one thread and push_front in another thread?
Edit 1
In my case, I know the maximum number of items in the std::deque (for example, 10). Is there any way I can reserve enough space for the items beforehand, so that during processing there is no need to change the size of the queue's memory, and hence make sure that when I am pushing data to the back, no change can happen to the front data?
STL C++ containers are not thread-safe: if you decide to use them, you need to use proper synchronization (basically std::mutex and std::lock) when pushing/popping elements.
Alternatively you can use properly designed containers (a single-producer single-consumer queue should fit your needs); one example of them here: http://www.boost.org/doc/libs/1_58_0/doc/html/lockfree.html
Addendum after your edit:
Yep, an SPSC queue is basically a ring buffer and definitely fits your needs.
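For example, with Boost.Lockfree the capacity can be fixed at compile time (10, matching your edit; the int payload is illustrative), so the storage is pre-allocated and pushes never reallocate under the consumer:

```cpp
#include <boost/lockfree/spsc_queue.hpp>

// Single-producer single-consumer queue with compile-time capacity.
boost::lockfree::spsc_queue<int, boost::lockfree::capacity<10>> queue;

void producer_thread() {
    int item = 42;
    while (!queue.push(item)) { /* queue full: retry or back off */ }
}

void consumer_thread() {
    int item;
    while (queue.pop(item)) {
        // ... process item ...
    }
}
```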
How can I develop a producer/consumer pattern which is thread-safe?
There are several ways, but using locks and monitors is fairly easy to grasp and doesn't have many hidden caveats. The standard library has std::unique_lock, std::lock_guard and std::condition_variable to implement the pattern. Check out the cppreference page for condition_variable for a simple example; a minimal sketch is also shown after this answer.
Is std::deque safe for this purpose?
It's not safe. You need synchronization.
Can I push_back to the back of a deque in one thread and push_front in another thread?
Sure, but you need synchronization. There is a race condition when the queue is empty or has only one element. Also when the queue is full, or one short of full, in case you want to limit its size.
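A minimal sketch of the lock-and-monitor approach with std::mutex and std::condition_variable (the int payload is just for illustration):

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

std::deque<int> queue;
std::mutex m;
std::condition_variable cv;

void produce(int item) {
    {
        std::lock_guard<std::mutex> lock(m);
        queue.push_back(item);
    }
    cv.notify_one();  // wake one waiting consumer
}

int consume() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [] { return !queue.empty(); });  // handles spurious wakeups
    int item = queue.front();
    queue.pop_front();
    return item;
}
```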
I think you mean push_back() and pop_front().
std::deque is not thread-safe on its own.
You will need to serialise access using an std::mutex so the consumer isn't trying to pop while the producer is trying to push.
You should also consider how you handle the following:
How does the consumer behave if the deque is empty when it looks for the next item?
If it enters a wait state then you will need a std::condition_variable to be notified by the producer when the deque has been added to.
You may also need to handle program termination in which the consumer is waiting on the deque and the program is terminated. It could be left 'waiting forever' unless you orchestrate things correctly.
10 items is 'piffle' so I wouldn't bother about reserving space. std::deque grows and shrinks automatically so don't bother with fine grain tuning until you've built a working application.
Premature optimization is the root of all evil.
NB: It's not clear how you're limiting the queue size, but if the producer fills up the queue and then waits for it to clear back down, you'll need more waits and conditions coming back the other way to coordinate.
I was wondering what is the better choice: assume there is a trivially copyable object, let's say a queue data structure, that is used by several threads to push/pop data. The object provides only the methods push/pop, which can't be accessed by more than one thread at the same time. Obviously, if push is being called, pop can't be called either.
Would you suggest wrapping the model in an atomic type (if possible), or rather using mutexes?
Regards!
Atomics are a hardware thing, whereas a mutex is an OS thing. A mutex will end up suspending the task, although in some cases a mutex will behave as a spinlock for a short period of time, aka "optimistic spin"; see https://lore.kernel.org/all/56C2673F.6070202@hpe.com/T/
So, if you have small operations like incrementing a variable, aka "atomic", without waiting for other things which might take longer, then atomics are for you.
If you want to (indefinitely) wait for some things to happen in other threads, polling for results via atomics, aka a spinlock, might be a waste of CPU cycles and therefore less cooperative, so it's better to use a mutex/condition variable, which suspends the task at the price of context-switch latency.
Atomics are preferable for those kinds of cases. An atomic is a kind of operation supported directly by the CPU, whereas the other kinds of thread control tend to be implemented by the OS or other measures and incur more overhead.
EDIT: A quick search turns up this question, which has more info and is basically the same kind of question: Which is more efficient, basic mutex lock or atomic integer?
EDIT 2: And a more detailed article here http://www.informit.com/articles/article.aspx?p=1832575
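To make the comparison concrete, here is a minimal sketch contrasting an atomic increment with a mutex-protected one (the counter names are made up):

```cpp
#include <atomic>
#include <mutex>

std::atomic<long> atomicCounter{0};

long plainCounter = 0;
std::mutex counterMutex;

void incrementAtomic() {
    // Typically compiles to a single locked CPU instruction (e.g. lock xadd
    // on x86), with no chance of the thread being suspended.
    atomicCounter.fetch_add(1, std::memory_order_relaxed);
}

void incrementWithMutex() {
    // The lock/unlock pair uses atomic operations internally and may involve
    // the OS under contention, so this path does strictly more work.
    std::lock_guard<std::mutex> lock(counterMutex);
    ++plainCounter;
}
```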
I have multiple consumer threads and one producer thread. The producer thread writes data into a map belonging to a certain consumer thread and sends a signal to that consumer thread. I am using mutexes around the map when I am inserting and erasing the data. However, this approach looks inefficient in terms of speed. Can you suggest another approach, instead of a map requiring mutex locks and unlocks? I think the mutex slows down the transmission.
however, this approach looks inefficient in terms of speed. Can you suggest another approach, instead of a map requiring mutex locks and unlocks? I think the mutex slows down the transmission.
You should use a profiler to identify where the bottleneck is.
Producer thread writes the data into a map belonging to a certain consumer thread and sends a signal to the consumer thread.
The producer should not be concerned with what kind of data structure the consumer uses; it is the consumer's implementation detail. Keep in mind that inserting a value into a map requires a memory allocation (unless you are using a custom allocator), and memory allocation internally takes locks as well, to protect the state of the heap. The end result is that locking a mutex around the map::insert operation may actually hold it for too long.
A simpler and more efficient design would be to have an atomic queue between the producer and each consumer (e.g. a pipe, or TBB's concurrent_bounded_queue, which pre-allocates its storage so that push/pop operations are really quick). Since your producer communicates directly with each consumer, that queue is one-writer-one-reader, and it can be implemented as a wait-free queue (or a ring buffer à la the C++ disruptor).
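A rough sketch of the TBB option (the capacity value and int payload are illustrative):

```cpp
#include <tbb/concurrent_bounded_queue.h>

// Bounded, so the producer blocks instead of growing the queue without limit.
tbb::concurrent_bounded_queue<int> queue;

void setup()    { queue.set_capacity(1024); }

void producer() { queue.push(42); }  // blocks while the queue is full

void consumer() {
    int item;
    queue.pop(item);                 // blocks until an item is available
    // ... process item ...
}
```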
Andrei Alexandrescu makes the good point that you should measure your code (https://www.facebook.com/notes/facebook-engineering/three-optimization-tips-for-c/10151361643253920), and this is the same advice I would give you: measure your code and see what performance difference you get between a baseline test and a test with locking, both running single-threaded:
Time required to insert data into the map using a single thread, with the above-listed data.
Time required to insert data into the map using a single thread, with the above-listed data and using mutex locks.
If you are still looking for a thread-safe container, you may want to look at Intel's open-source implementation of thread-safe containers at http://www.threadingbuildingblocks.org/docs/help/reference/containers_overview/concurrent_queue_cls.htm .
Also, as a suggestion for the consumer thread implementation, you may want to read the ActiveObject article that Herb Sutter posted on his website: http://herbsutter.com/2010/07/12/effective-concurrency-prefer-using-active-objects-instead-of-naked-threads/
If you can provide some more details, like why the map has to be locked all the time, we may be able to draft up a mechanism that is better performing.
Possible Duplicates:
Is the C++ STL std::set thread-safe?
Thread safety for STL queue
I'm guessing it isn't, I just want to make sure.
meaning two threads using the same std::deque, calling std::deque::push_back or push_front at the same time.
The same question goes for std::priority_queue and the functions std::priority_queue::push and std::priority_queue::pop.
Are those containers thread-safe? Or should I personally program them to be thread-safe?
Thanks a lot.
From Scott Meyers's Effective STL, Item 12: Have realistic expectations about the thread safety of STL containers.
Multiple readers are safe. Multiple threads may simultaneously read the contents of a single container, and this will work correctly. Naturally, there must not be any writers acting on the container during the reads.
Multiple writers to different containers are safe. Multiple threads may simultaneously write to different containers.
When it comes to thread safety and STL containers, you can hope for a library implementation that allows multiple readers on one container and multiple writers on separate containers. You can't hope for the library to eliminate the need for manual concurrency control, and you can't rely on any thread support at all.
The STL does not provide any guarantees for thread safety. This is especially the case when modifying the same container from multiple threads.
The implementation of the STL that you're using may provide some level of thread safety, but you would need to look at the documentation for your implementation.
When you say "are they thread-safe", presumably you mean: can you use them in multiple threads without having to lock anything?
In theory you could have two threads, one pushing to the back and one to the front, and you'd probably get away with it. I would be wary, though, because the implementer is under no obligation to make this thread-safe. Iterators are invalidated by inserts at either end, so if the implementation of push_back used end() and the implementation of push_front used begin(), either could be invalidated mid-call by the other thread and might blow up on you.
std::priority_queue is almost certainly not usable from two threads at once, presumably producer/consumer threads with one pushing and one popping; you will need to lock first.
I found that when I wrote a producer/consumer queue based around std::deque, I allowed the producer to push more than one item at a time, and the consumer to sweep the entire queue to process it. This meant only one lock per bulk insert, which reduced the number of times you needed to lock.
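A minimal sketch of that bulk-insert/sweep idea (the int payload is illustrative):

```cpp
#include <deque>
#include <mutex>

std::deque<int> queue;
std::mutex m;

// Producer: one lock covers a whole batch of items.
void pushBatch(const std::deque<int>& items) {
    std::lock_guard<std::mutex> lock(m);
    queue.insert(queue.end(), items.begin(), items.end());
}

// Consumer: one lock to take everything, then process outside the lock.
std::deque<int> sweep() {
    std::deque<int> local;
    {
        std::lock_guard<std::mutex> lock(m);
        local.swap(queue);
    }
    return local;  // caller iterates without holding the mutex
}
```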