C++ STL vector iterator vs indexes access and thread safety - c++

I am iterating over an STL vector and reading values from it. There is another thread which can make changes to this vector. Now, if the other thread inserts or removes an element from the vector, it invalidates the iterator. There are no locks involved. Does my choice of accessing the container through indexes (Approach 1) instead of iterators (Approach 2) make it thread safe? What about performance?
struct A { int i; int j; };
Approach 1:
size_t s = v.size(); // v contains pointers to objects of type A
for (size_t i = 0; i < s; ++i)
{
    A* ptr = v[i];
    ptr->i++;
}
Approach 2:
std::vector<A*>::iterator begin = v.begin();
std::vector<A*>::iterator end = v.end();
for (std::vector<A*>::iterator it = begin; it != end; ++it)
{
    A* ptr = *it;
    ptr->i++;
}

The thread-safety guarantees for standard library containers are very straightforward (these rules were added in C++11, but essentially all current library implementations conform to them and impose the corresponding restrictions):
it is OK to have multiple concurrent readers
if one thread is modifying a container, no other thread shall access it (for reading or writing)
the requirements are per container object
Effectively, this means that you need some mechanism external to the container to guarantee that a container accessed from multiple threads is handled correctly. For example, you can use a mutex or a reader-writer lock. Of course, most of the time containers are accessed from only one thread and things work just fine without any locking.
Without explicit locks you will cause data races, and the behavior is undefined, regardless of whether you use indices or iterators.

OP: "Does my choice of accessing the container through indexes (Approach 1) instead of iterators (Approach 2) make it thread safe?"
No, neither approach is thread safe once you start writing to your data structure.
Therefore you will need to serialize access to your data structure.
To save yourself a lot of time and frustration, there are plenty of ready-rolled solutions, e.g.
Intel Threading Building Blocks (TBB) which comes with thread safe containers such as concurrent_vector.
http://threadingbuildingblocks.org/
A concurrent_vector is a container with the following features:
Random access by index. The index of the first element is zero.
Multiple threads can grow the container and append new elements concurrently.
Growing the container does not invalidate existing iterators or indices.*
OP "What about performance?"
Not knowable in general. Performance differs across systems and compilers, but the difference is not known to be large enough to influence your choice.

No. STL containers are not thread safe.
You should provide exclusive access to each thread (the one that removes / the one that adds) while it is accessing the vector. Even when using indexes, the other thread might remove the i-th element, making the pointer you had retrieved invalid.

Could your algorithm work with a fixed size array?
Reason I ask is that the only way, logically, to have multiple threads modifying (most kinds of) container in a thread-safe, lock-free way is to make the container itself invariant. That means the CONTAINER never changes within the threads, just the elements within it. Think of the difference between messing with the insides of the boxcars on a train versus actually adding and removing entire boxcars FROM that train as it's moving down the tracks. Even meddling with the elements is only safe if your operations on that data observe certain constraints.
The good news is that locks are not always the end of the world. If multiple execution contexts (threads, programs, etc.) can hit the same object simultaneously, they are often the only solution anyway.

Related

Thread safety std::array [duplicate]


Can we do something atomically with 2 or more lock-free containers without locking both?

I'm looking for composable operations - this is fairly easy to do using transactional memory. (Thanks to Ami Tavory.)
It is also easy to do using locks (mutex/spinlock) - but locks can lead to deadlocks, so lock-based algorithms are composable only with manual tuning.
Lock-free algorithms do not have the deadlock problem, but they are not composable. You would need to design the 2 or more containers as a single composed lock-free data structure.
Is there any approach, helper implementation, or lock-free algorithm for working atomically with several lock-free containers while maintaining consistency?
To check whether an item is in both containers at once
To move an element from one container to another atomically
...
Or can RCU or hazard-pointers help to do this?
As is known, we can use lock-free containers, which are difficult to implement, for example from the Concurrent Data Structures (CDS) library: http://libcds.sourceforge.net/doc/cds-api/group__cds__nonintrusive__map.html
For example, we can use a lock-free ordered map such as the SkipList from the CDS library.
But even a simple lock-free algorithm is not lock-free in all cases:
Iterators documentation-link
You may iterate over skip-list set items only under RCU lock. Only in
this case the iterator is thread-safe since while RCU is locked any
set's item cannot be reclaimed. The requirement of RCU lock during
iterating means that deletion of the elements (i.e. erase) is not
possible.
::contains(K const &key) - documentation-link
The function applies RCU lock internally.
To call ::get(K const &key) and update the element we got, we should use a lock: documentation-link
Example:
typedef cds::container::SkipListMap<
    cds::urcu::gc< cds::urcu::general_buffered<> >, int, foo, my_traits > skip_list;
skip_list theList;
// ...
typename skip_list::raw_ptr pVal;
{
    // Lock RCU
    skip_list::rcu_lock lock;
    pVal = theList.get( 5 );
    if ( pVal ) {
        // Deal with pVal
        // ...
    }
}
// You can manually release pVal after the RCU-locked section
pVal.release();
But if we use 2 lock-free containers instead of 1, and we use only methods which are always lock-free (or at least one of which is lock-free), can we do it without locking both containers?
typedef cds::urcu::gc< cds::urcu::general_buffered<> > rcu_gpb;
cds::container::SkipListMap< rcu_gpb, int, int > map_1;
cds::container::SkipListMap< rcu_gpb, int, int > map_2;
Can we atomically move 1 element from map_1 to map_2 without locking both containers - i.e. map_1.erase(K const &key) and map_2.insert(K const &key, V const &val) - if we want to maintain atomicity and consistency:
so that other threads do not see the element missing from the first container while it has not yet appeared in the second
so that other threads do not see the element still in the first container while the same element is already in the second
Can we do anything atomically with 2 or more lock-free containers without locking both, if we want to maintain atomicity and consistency?
ANSWER: We cannot do any atomic operation across two or more lock-free containers at once, without locks, by using just their ordinary functions.
Only when performing a single simple operation from the containers' lock-free API is 1 lock sufficient for 2 lock-free containers, excluding the 3 cases described above where even lock-free containers use locks internally.
Also, "but maybe something with a bunch of extra overhead": if you make complicated custom modifications to the lock-free algorithms, you can achieve some composability, for example when "the two queues know about each other, and the code looking at them is carefully designed", as Peter Cordes noted.
TL;DR: what you're asking doesn't make a lot of sense, as Yakk points out. But since you only asked for a way to do it without locking both containers, here's something you can do. If this isn't what you're looking for, then maybe it will help illustrate one of the problems with how you've posed the question.
A multiple-readers / single-writer lock on one of the containers would allow it easily, and solve the problem of observing both containers.
But then lock-free access to the container you lock is never allowed, so it's pointless to use a lock-free container.
If you hold a read-lock on the locking container while you observe the lock-free container, then whatever you learned about the locking container is still true while you observe the lock-free container.
Taking a write-lock on the locking container stops any readers from observing the locked data structure while you remove an element. So you'd use an algorithm like:
write_lock(A); // exclude readers from A
tmp = pop(A);
push(B, tmp);
write_unlock(A); // allow readers to observe A again, after both ops are done
Moving a node in the other direction works the same way: do both the remove and add while holding a write-lock on the locking container.
You can save copying by temporarily having the element in both containers, instead of temporarily in neither (copied to a temporary).
write_lock(A); // exclude readers from A
B.add(A[i]); // copy directly from A to B
A.remove(i);
write_unlock(A); // allow readers to observe A again, after both ops are done
I'm not claiming that there is no lock-free way to do this, BTW. @Ami points out that transactional memory can support synchronization composability.
But the major problem with your specification is that it's not clear exactly what you're trying to stop potential observers from observing, since they can only observe the two lock-free data structures in one order or another, not atomically, as @Yakk points out.
If you control which order the observers do their observing, and which order the writers do their writing, that might be all you need.
If you need stronger linking between two containers, they probably have to be designed as a single lock-free data structure that knows about both containers.

Multithreading on arrays / Do I need locking mechanisms here?

I am writing a multithreaded application. That application contains an array of length, let's say, 1000.
If I now have two threads, and I make sure that thread 1 will only access elements 0-499 and thread 2 will only access elements 500-999, would I need a locking mechanism to protect the array, or would that be fine?
Note: Only the contents of the array will be changed during calculations! The array won't be moved, memcpyed, or altered in any way other than changing elements inside the array.
What you want is perfectly fine! Strategies like this (combined with a handful of low-level atomic primitives) are the basis of what's called lock-free programming.
That said, there are possible pitfalls in implementing this solution. You have to strictly guarantee the properties you mentioned.
Make sure that your in-memory data array never moves. You cannot rely on most std containers: most of them can change significantly during modification. std::map rebalances its inner trees, invalidating some inner pointers, and std::vector sometimes reallocates the whole container when inserting.
Make sure that there is only one consumer and only one producer for any piece of data. Each consumer has to keep its inner iterator in a valid state to avoid reading the same item twice or skipping an item. Each producer must put data in a valid place, without the possibility of overwriting existing, not-yet-read data.
Breaking any of these rules means you need mutexes.

should I synchronize the deque or not

I have a deque with pointers inside in a C++ application. I know there are two threads that access it.
Thread1 will add pointers at the back, and Thread2 will process and remove pointers from the front.
Thread2 will wait until the deque reaches a certain amount, say 10 items, and then start to process them. It will only loop and process 10 items at a time. In the meantime, Thread1 may keep adding new items to the deque.
I think it will be fine without synchronizing the deque, because Thread1 and Thread2 access different parts of the deque. It is a deque, not a vector, so there is no case where the existing memory of the container gets reallocated.
Am I right? If not, why? (I want to know what I am missing.)
EDIT:
I know it will not hurt to ALWAYS synchronize it. But that may hurt performance or be unnecessary. I just want it to run fast and correctly, if possible.
The deque has to keep track of how many elements it has and where those elements are. Adding an element changes that stored data, as does removing an element. Changing that data from two threads without synchronization is a data race, and produces undefined behavior.
In short, you must synchronize those operations.
In general, the Standard Library containers cannot be assumed to be thread-safe unless all you do is reading from them.
If you take a look under the covers at a deque implementation, you will uncover something similar to this:
template <typename T>
class deque {
public:
private:
    static size_t const BufferCapacity = /**/;
    size_t _nb_available_buffer;
    size_t _first_occupied_buffer;
    size_t _last_occupied_buffer;
    size_t _size_first_buffer;
    size_t _size_last_buffer;
    T** _buffers; // heap-allocated array of
                  // heap-allocated arrays of fixed capacity
}; // class deque
Do you see the problem? _buffers, at the very least, may be accessed concurrently by both enqueue and dequeue operations (especially when the array has become too small and needs to be copied into a bigger one).
So, what is the alternative? What you are looking for is a concurrent queue. There are some implementations out there, and you should probably not worry too much about whether or not they are lock-free unless that proves to be a bottleneck. An example would be TBB's concurrent_queue.
I would advise against creating your own lock-free queue, even if you have heard it's all the rage, because all the first implementations I have seen had (sometimes subtle) race conditions.

Can multiple threads access a vector at different places?

Let's say I have a vector of int which I've prefilled with 100 elements with a value of 0.
Then I create 2 threads and tell the first thread to fill elements 0 to 49 with numbers, then tell thread 2 to fill elements 50 to 99 with numbers. Can this be done? Otherwise, what's the best way of achieving this?
Thanks
Yes, this should be fine. As long as you can guarantee that different threads won't modify the same memory location, there's no problem.
Yes, for most implementations of vector, this should be ok to do. That said, this will have very poor performance on most systems, unless you have a very large number of elements and you are accessing elements that are far apart from each other so that they don't live on the same cache line... otherwise, on many systems, the two threads will invalidate each other's caches back-and-forth (if you are frequently reading/writing to those elements), leading to lots of cache misses in both threads.
Since C++11:
Different elements in the same container can be modified concurrently
by different threads, except for the elements of std::vector<bool>
(for example, a vector of std::future objects can be receiving values
from multiple threads).
cppreference discusses the thread safety of containers here in good detail.
Link was found in a Quora post.
The fact that "vector is not thread-safe" doesn't mean anything.
There's no problem with doing this.
Also, you don't have to allocate your vector on the heap (as one of the answers suggested). You just have to ensure that the lifetime of your vector covers the lifetimes of the threads (more precisely, of wherever those threads access the vector).
And, of course, since you want both threads to work on the same vector, they must receive it by pointer/reference rather than by value.
There's also absolutely no problem with accessing the same element of the array from different threads. You should know, however, that your thread is not the only one that accesses it, and treat it accordingly.
In simple words - there's no problem accessing an array from different threads.
Accessing the same element from different threads is like accessing a single variable from different threads - same precautions, same consequences.
The only situation you have to worry about is when new elements are added, which is impossible in your case.
There is no reason why this cannot be done. But, as soon as you start mixing accesses (both threads accessing the same element) it becomes far more challenging.
vector is not thread safe. You need to guard the vector between threads. In your case it depends on the vector implementation: if the vector's internal data is accessed/modified from different threads, the outcome depends purely on that implementation.
With arrays it can be done for sure (the threads do not access the same memory area); but as already noted, if you use the std::vector class, the result may depend on how it is implemented. Indeed I don't see how the implementation of [] on the vector class can be thread unsafe (provided the threads try to access different "indexes"), but it could be.
The solution is: stick to the use of an array, or control the access to the vector using a semaphore or similar.
What you describe is quite possible, and should work just fine.
Note, however, that the threads will need to work on a std::vector*, i.e. a pointer to the original vector - and you probably should allocate the vector on the heap rather than the stack. If you pass the vector directly, the copy constructor will be invoked and create a separate copy of the data in each thread.
There are also lots of more subtle ways to get this wrong, as always with multithreaded programming. But in principle what you described will work well.
This should be fine, but you could end up with poor performance due to false sharing, where the threads can invalidate each other's cache lines.