should I synchronize the deque or not - c++

I have a deque of pointers in a C++ application. Two threads access it.
Thread1 adds pointers at the back and Thread2 processes and removes pointers from the front.
Thread2 waits until the deque reaches a certain amount, say 10 items, and then starts to process them. It only loops and processes 10 items at a time. In the meantime, Thread1 may still keep adding new items to the deque.
I think it will be fine without synchronizing the deque, because Thread1 and Thread2 access different parts of the deque. It is a deque, not a vector, so there is no case where the container's existing memory gets reallocated.
Am I right? If not, why? (I want to know what I am missing.)
EDIT:
I know it will not hurt to ALWAYS synchronize it. But it may hurt performance where it isn't necessary. I just want it to run fast and correctly if possible.

The deque has to keep track of how many elements it has and where those elements are. Adding an element changes that stored data, as does removing an element. Changing that data from two threads without synchronization is a data race, and produces undefined behavior.
In short, you must synchronize those operations.

In general, the Standard Library containers cannot be assumed to be thread-safe unless all you do is read from them.
If you take a look under the covers at a deque implementation, you will uncover something similar to this:
template <typename T>
class deque {
public:
private:
    static size_t const BufferCapacity = /**/;
    size_t _nb_available_buffer;
    size_t _first_occupied_buffer;
    size_t _last_occupied_buffer;
    size_t _size_first_buffer;
    size_t _size_last_buffer;
    T** _buffers; // heap-allocated array of
                  // heap-allocated arrays of fixed capacity
}; // class deque
Do you see the problem? _buffers, at the very least, may be accessed concurrently by both enqueue and dequeue operations (especially when the array has become too small and needs to be copied into a bigger array).
So, what is the alternative? What you are looking for is a concurrent queue. There are several implementations out there, and you should probably not worry too much about whether or not they are lock-free unless it proves to be a bottleneck. An example would be TBB's concurrent_queue.
I would advise against creating your own lock-free queue, even if you have heard it's all the rage, because all the first implementations I have seen had (sometimes subtle) race conditions.

Related

What is the fastest way for multiple threads to insert into a vector safely?

I have a program where multiple threads share the same data structure, which is basically a 2D array of vectors, and sometimes two or more threads might have to insert at the same position (i.e. into the same vector), which might result in a crash if no precautions are taken. What is the fastest and most efficient way to implement a safe solution to this issue? Since this issue does not happen very often (there is no high contention), I have used a 2D array of mutexes where each mutex maps to a vector; each thread locks the mutex, updates the corresponding vector, then unlocks it. If this is a good solution, I would like to know whether there is something faster than a mutex to use.
Note, I am using OpenMP for the multithreading.
The solution greatly depends on the specifics of the problem. For example:
Whether the vector's size may exceed its capacity (i.e. reallocation is required).
Whether the vector is only read, elements are only inserted, or elements can be both inserted and removed.
In the first case, you don't have any possibility other than using locks, since you always need to check whether the vector is being reallocated, and to wait for the reallocation to complete if necessary.
On the other hand, if you are completely sure that the vector is only initialized once by a single thread (which is not your case), you probably would not need any synchronization mechanism for access to the vector's elements (synchronizing access inside the elements may still be required, though).
If elements are being inserted and removed from the back of the vector only (queue style), then using an atomic compare-and-swap would be enough (atomically increase the size of the vector, and insert at position size-1 when the swap succeeds).
If elements may be removed at any point in the vector, its contents may need to be moved to remove the resulting holes. This case is similar to a reallocation. You can use a customized heap to manage the empty positions in your vector, although this will increase the complexity.
At the end of the day, probably you will need to either develop your own parallel data structure or rely on a library, such as TBB or Boost.

Multithreaded read-many, write-seldom array/vector iteration in C++

I have a need to almost-constantly iterate over a sequence of structs in a read-only fashion, but for every 1M+ reads, one of the threads may append an item. I think using a mutex would be overkill here, and I have also read that r/w locks have their own drawbacks for readers.
I was thinking about using reserve() on a std::vector, but this answer (Iterate over STL container using indices: safe way to avoid using locks?) seemed to invalidate that.
Any ideas on what way might be fastest? The most important thing is for the readers to be able to quickly and efficiently iterate with as little contention as possible. The writing operations aren't time-sensitive.
Update: Another one of my use cases is that the "list" could contain pointers rather than structs, i.e. std::vector<MyClass*>. The same requirements apply.
Update 2: Hypothetical example
globally accessible:
typedef std::vector<MyClass*> Vector;
Vector v;
v.reserve(50);
Reader threads 1-10: (these run pretty much all the time)
.
.
int total = 0;
for (Vector::const_iterator it = v.begin(); it != v.end(); ++it)
{
    MyClass* ptr = *it;
    total += ptr->getTotal();
}
// do something with total
.
.
Writer threads 11-15:
MyClass* ptr = new MyClass();
v.push_back(ptr);
That's basically what happens here. Threads 1-15 could all be running concurrently, although generally there are only 1-2 reader threads and 1-2 writer threads.
What I think could work here is my own implementation of a vector, something like this:
template <typename T> class Vector
{
    // constructor will be needed of course
public:
    std::shared_ptr<const std::vector<T> > getVector()
    { return mVector; }
    void push_back(const T&);
private:
    std::shared_ptr<std::vector<T> > mVector;
};
Then, whenever readers need to access a specific Vector, they should call getVector() and keep the returned shared_ptr until finished reading.
But writers should always use Vector's push_back to add new values. This push_back should then check whether mVector->size() == mVector->capacity() and, if true, allocate a new vector and assign it to mVector. Something like:
template <typename T> void Vector<T>::push_back(const T& t)
{
    if (mVector->size() == mVector->capacity())
    {
        // make certain here that new_size > old_size
        std::vector<T>* vec = new std::vector<T>;
        vec->reserve(mVector->size() * SIZE_MULTIPLIER);
        std::copy(mVector->begin(), mVector->end(), std::back_inserter(*vec));
        mVector.reset(vec);
    }
    // put 't' into 'mVector'. 'mVector' is guaranteed not to reallocate now.
    mVector->push_back(t);
}
The idea here is inspired by RCU (read-copy-update) algorithm. If storage space is exhausted, the new storage should not invalidate the old storage as long as there is at least one reader accessing it. But, the new storage should be allocated and any reader coming after allocation, should be able to see it. The old storage should be deallocated as soon as no one is using it anymore (all readers are finished).
Since most HW architectures provide some way to have atomic increments and decrements, I think shared_ptr (and thus Vector) will be able to run completely lock-less.
One disadvantage to this approach though, is that depending on how long readers hold that shared_ptr you might end up with several copies of your data.
PS: hope I haven't made too many embarrassing errors in the code :-)
... using reserve() on a std::vector ...
This can only be useful if you can guarantee the vector will never need to grow. You've stated that the number of items is not bounded above, so you can't give that guarantee.
Notwithstanding the linked question, you could conceivably use std::vector just to manage memory for you, but it would take an extra layer of logic on top to work around the problems identified in the accepted answer.
The actual answer is: the fastest thing to do is minimize the amount of synchronization. What the minimal amount of synchronization is depends on details of your code and usage that you haven't specified.
For example, I sketched a solution using a linked-list of fixed-size chunks. This means your common use case should be as efficient as an array traversal, but you're able to grow dynamically without re-allocating.
However, the implementation turns out to be sensitive to questions like:
whether you need to remove items (whenever they're read? only from the front, or from other places too?)
whether you want the reader to busy-wait if the container is empty, and whether that wait should use some kind of backoff
what degree of consistency is required

C++ STL vector iterator vs indexes access and thread safety

I am iterating over an STL vector and reading values from it. There is another thread which can make changes to this vector. Now, if the other thread inserts or removes an element from the vector, it invalidates the iterator. There is no use of locks involved. Does my choice of accessing the container through indexes (Approach 1) in place of iterators (Approach 2) make it thread-safe? What about performance?
struct A{int i; int j;};
Approach 1:
size_t s = v.size(); // v contains pointers to objects of type A
for (size_t i = 0; i < s; ++i)
{
    A* ptr = v[i];
    ptr->i++;
}
Approach 2:
std::vector<A*>::iterator begin = v.begin();
std::vector<A*>::iterator end = v.end();
for (std::vector<A*>::iterator it = begin; it != end; ++it)
{
    A* ptr = *it;
    ptr->i++;
}
The thread-safety guarantees for standard library containers are very straightforward (these rules were added in C++11, but essentially all current library implementations conform to these requirements and impose the corresponding restrictions):
it is OK to have multiple concurrent readers
if there is one thread modifying a container there shall be no other thread accessing (reading or writing) it
the requirements are per container object
Effectively, this means that you need to use some mechanism external to the container to guarantee that a container accessed from multiple threads is handled correctly. For example, you can use a mutex or a reader-writer lock. Of course, most of the time containers are accessed only from one thread and things work just fine without any locking.
Without using explicit locks you will cause data races, and the behavior is undefined, independent of whether you use indices or iterators.
OP "Does my choice of accessing the container through indexes(Approach 1) in place of iterators(Approach 2) make it thread safe?"
No, neither approach is thread safe once you start writing to your data structure.
Therefore you will need to serialize access to your data structure.
To save you a lot of time and frustration, there are plenty of ready-rolled solutions, e.g.
Intel Threading Building Blocks (TBB) which comes with thread safe containers such as concurrent_vector.
http://threadingbuildingblocks.org/
A concurrent_vector is a container with the following features:
Random access by index. The index of the first element is zero.
Multiple threads can grow the container and append new elements concurrently.
Growing the container does not invalidate existing iterators or indices.*
OP "What about performance?"
Not knowable. Performance will differ on different systems with different compilers, but the differences are not known to be large enough to influence your choice.
No. STL containers are not thread safe.
You should provide each thread (the one that removes and the one that adds) exclusive access while it is accessing the vector. Even when using indexes, you might be removing the i-th element, making the pointer you had retrieved invalid.
Could your algorithm work with a fixed size array?
The reason I ask is that the only way, logically, to have multiple threads modifying (most kinds of) container in a thread-safe, lock-free way is to make the container itself invariant. That means the CONTAINER doesn't ever change within the threads, just the elements within it. Think of the difference between messing with the insides of the boxcars on a train versus actually adding and removing entire boxcars FROM that train as it's moving down the tracks. Even meddling with the elements is only safe if your operations on that data observe certain constraints.
Good news is that locks are not always the end of the world. If multiple execution contexts (threads, programs, etc.) can hit the same object simultaneously, they're often the only solution anyway.

Is it possible to use mutex to lock an element in a vector not the whole vector?

Is it possible to use mutex to lock an element in a vector not the whole vector ?
For example, given a vector myVec;
push back 10 elements into myVec
for (int i = 0; i < 10; ++i)
{
    buffer myBuf = i; // actually myBuf is not necessarily int.
    myVec.push_back(myBuf);
}
Each element of the vector will be changed asynchronously by multiple threads.
How can I use a mutex to lock only one buffer in myVec, such that one thread can write or read one element while another reads and writes a different element at the same time?
thanks
What you want is both simpler and more difficult than you think:
If your container as a whole is unchanged, i.e. there are no insertions or erasures, then the standard library containers already offer a limited type of thread safety: different threads are allowed to read or modify different container elements, as long as no more than one thread accesses any given element.
On the other hand, if the container is modified as a whole, then you have almost no safeties at all: Depending on the type of container, you absolutely must understand reference and iterator invalidation. If you know that references or iterators to an element are unaffected, then the above applies (respectively to the reference or the dereferenced iterator). If not, then you have no hope of doing anything other than reacquiring a new reference to the desired element.
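The first case can be illustrated with a small sketch (the function and element type are illustrative): the container is sized once up front, neither thread inserts or erases, and each thread touches only its own element, which the standard permits without any locks.

```cpp
#include <thread>
#include <vector>

// The vector's bookkeeping never changes after construction, so two
// threads writing to *distinct* elements do not race with each other.
void update_disjoint(std::vector<int>& v) {
    std::thread t1([&] { for (int k = 0; k < 1000; ++k) v[0] += 1; });
    std::thread t2([&] { for (int k = 0; k < 1000; ++k) v[1] += 1; });
    t1.join();
    t2.join();
}
```

The moment either thread called push_back or erase on v, this would become a data race.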
If the vector is initialized at startup, it is just like a fixed-size array, so there is no need to lock it.
I would prefer an array at that point :) allocated with new[] if you want.
If, let's say, threadN accesses only fieldN, there is no need for any lock; a lock is needed when several threads try to access the same resource for both reading AND writing.
If one thread accesses a resource for reading and writing and that resource is not accessed by any other thread, there is absolutely no problem! You don't need any lock.
If a resource is accessed by several threads in read-only mode, you don't need any lock either.
And in case it was not clear: in your case, array[i] is a read/write resource, while array itself is a shared read-only resource.
If you need to synchronize each element you need a mutex for each element.
If there are n resources accessed by m threads, you need to lock the resources using n mutexes. They are not expensive.
If you really have too many resources, you can lock portions of the array: a single mutex would make your application effectively single-threaded, but you can assign one mutex for every 10 items, for example. In this way you reduce the number of mutexes while still ensuring that not too many threads are stalled together.

STL vector and thread-safety

Let's say I have a vector of N elements, but only up to n elements of this vector hold meaningful data. One updater thread updates the nth or (n+1)th element (then sets n = n+1); it also checks whether n is too close to N and calls vector::resize(N+M) if necessary. After updating, the thread calls multiple child threads to read up to the nth element and do some calculations.
It is guaranteed that child threads never change or delete data (in fact, no data is deleted whatsoever), and the updater calls the children just after it finishes updating.
So far no problem has occurred, but I want to ask whether a problem may occur during reallocation of the vector to a larger memory block, if some child worker threads are left over from the previous update.
Or is it safe to use vector, even though it is not thread-safe, in such a multithreaded case?
EDIT:
Since only insertion takes place when the updater calls vector::resize(N+M, 0), are there any possible solutions to my problem? Due to the great performance of the STL vector, I am not willing to replace it with a lockable vector. Or, in this case, are there any performant, well-known, lock-free vectors?
I want to ask whether a problem may occur during reallocating of vector to a larger memory block, if there are some child working threads left from the previous update.
Yes, this would be very bad.
If you are using a container from multiple threads and at least one thread may perform some action that may modify the state of the container, access to the container must be synchronized.
In the case of std::vector, anything that changes its size (notably, insertions and erasures) changes its state, even if a reallocation is not required (any insertion or erasure requires std::vector's internal size bookkeeping to be updated).
One solution to your problem would be to have the producer dynamically allocate the std::vector and use a std::shared_ptr<std::vector<T> > to own it and give this std::shared_ptr to each of the consumers.
When the producer needs to add more data, it can dynamically allocate a new std::vector with a new, larger size and copies of the elements from the old std::vector. Then, when you spin off new consumers or update consumers with the new data, you simply need to give them a std::shared_ptr to the new std::vector.
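A sketch of that copy-then-publish scheme (the class and names are illustrative; a small mutex guards the shared_ptr itself, since concurrent reads and writes of the same shared_ptr object are not thread-safe):

```cpp
#include <memory>
#include <mutex>
#include <vector>

// The producer never resizes a vector in place: it builds a new, larger
// copy and swaps the shared_ptr. A consumer's snapshot keeps the old
// vector alive until the last reader releases it.
class SnapshotVector {
public:
    SnapshotVector() : mCurrent(std::make_shared<std::vector<int>>()) {}

    // Producer side: copy, grow, then publish atomically under the lock.
    void append(int value) {
        std::lock_guard<std::mutex> lock(mMutex);
        auto next = std::make_shared<std::vector<int>>(*mCurrent);
        next->push_back(value);
        mCurrent = next; // old copy freed when its last consumer is done
    }

    // Consumer side: grab an immutable snapshot to read at leisure.
    std::shared_ptr<const std::vector<int>> snapshot() {
        std::lock_guard<std::mutex> lock(mMutex);
        return mCurrent;
    }

private:
    std::mutex mMutex; // guards publication of mCurrent only
    std::shared_ptr<std::vector<int>> mCurrent;
};
```

Note the copy on every append makes this suitable only when updates are rare relative to reads, which matches the producer/consumer setup in the question.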
Is the way your workers decide to work on the data thread-safe? Is there any signaling between the workers being done and the producer? If not, then there is definitely an issue where the producer could cause the vector to move while it is still being worked on. This could trivially be fixed by moving to a std::deque instead (note that std::deque invalidates iterators on push_back, but references to elements are not affected).
I've made my own GrowVector. It works for me and it is really fast.
Link: QList, QVector or std::vector multi-threaded usage