Can multiple threads access a vector at different places? - c++

Lets say I have a vector of int which I've prefilled with 100 elements with a value of 0.
Then I create 2 threads and tell the first thread to fill elements 0 to 49 with numbers, then tell thread 2 to fill elements 50 to 99 with numbers. Can this be done? Otherwise, what's the best way of achieving this?
Thanks

Yes, this should be fine. As long as you can guarantee that different threads won't modify the same memory location, there's no problem.

Yes, for most implementations of vector, this should be ok to do. That said, this will have very poor performance on most systems, unless you have a very large number of elements and you are accessing elements that are far apart from each other so that they don't live on the same cache line... otherwise, on many systems, the two threads will invalidate each other's caches back-and-forth (if you are frequently reading/writing to those elements), leading to lots of cache misses in both threads.

Since C++11:
Different elements in the same container can be modified concurrently
by different threads, except for the elements of std::vector< bool>
(for example, a vector of std::future objects can be receiving values
from multiple threads).
cppreference discusses the thread safety of containers here in good detail.
Link was found in a Quora post.

The fact that "vector is not thread-safe" doesn't mean anything.
There's no problem with doing this.
Also you don't have to allocate your vector on heap (as one of the answers suggested). You just have to ensure that the lifetime of your vector covers the lifetime of your threads (more precisely - where those threads access the vector).
And, of course, since you want your both threads to work on the same vector - they must receive it from somewhere by pointer/reference rather than by value.
There's also absolutely no problem to access the same element of the array from within different threads. You should know however that your thread is not the only one that accesses it, and treat it respectively.
In simple words - there's no problem to access an array from within different threads.
Accessing the same element from different thread is like accessing a single variable from different thread - same precautions/consequences.
The only situation you have to worry about is when new elements are added, which is impossible in your case.

There is no reason why this cannot be done. But, as soon as you start mixing accesses (both threads accessing the same element) it becomes far more challenging.

vector is not thread safe. You need to guard the vector between threads. In your case it depends on vector implementation. If the vector internal data is accessed\modified from different threads, it purely depends on vector impl.

With arrays it can be done for sure (the threads do not access the same memory area); but as already noted, if you use the std::vector class, the result may depend on how it is implemented. Indeed I don't see how the implementation of [] on the vector class can be thread unsafe (provided the threads try to access different "indexes"), but it could be.
The solution is: stick to the use of an array, or control the access to the vector using a semaphore or similar.

What you describe is quite possible, and should word just fine.
Note, however, that the threads will need to work on a std::vector*, i.e. a pointer to the original vector---and you probably should allocate the vector on the heap, rather than the stack. If you pass the vector directly, the copy constructor will be invoked and create a separate copy of the data on each thread.
There are also lots of more subtle ways to get this wrong, as always with multithreaded programming. But in principle what you described will work well.

This should be fine, but you would end up with poor performance due to false sharing where each thread can potentially invalidate each other's cache line

Related

Thread safety std::array [duplicate]

Lets say I have a vector of int which I've prefilled with 100 elements with a value of 0.
Then I create 2 threads and tell the first thread to fill elements 0 to 49 with numbers, then tell thread 2 to fill elements 50 to 99 with numbers. Can this be done? Otherwise, what's the best way of achieving this?
Thanks
Yes, this should be fine. As long as you can guarantee that different threads won't modify the same memory location, there's no problem.
Yes, for most implementations of vector, this should be ok to do. That said, this will have very poor performance on most systems, unless you have a very large number of elements and you are accessing elements that are far apart from each other so that they don't live on the same cache line... otherwise, on many systems, the two threads will invalidate each other's caches back-and-forth (if you are frequently reading/writing to those elements), leading to lots of cache misses in both threads.
Since C++11:
Different elements in the same container can be modified concurrently
by different threads, except for the elements of std::vector< bool>
(for example, a vector of std::future objects can be receiving values
from multiple threads).
cppreference discusses the thread safety of containers here in good detail.
Link was found in a Quora post.
The fact that "vector is not thread-safe" doesn't mean anything.
There's no problem with doing this.
Also you don't have to allocate your vector on heap (as one of the answers suggested). You just have to ensure that the lifetime of your vector covers the lifetime of your threads (more precisely - where those threads access the vector).
And, of course, since you want your both threads to work on the same vector - they must receive it from somewhere by pointer/reference rather than by value.
There's also absolutely no problem to access the same element of the array from within different threads. You should know however that your thread is not the only one that accesses it, and treat it respectively.
In simple words - there's no problem to access an array from within different threads.
Accessing the same element from different thread is like accessing a single variable from different thread - same precautions/consequences.
The only situation you have to worry about is when new elements are added, which is impossible in your case.
There is no reason why this cannot be done. But, as soon as you start mixing accesses (both threads accessing the same element) it becomes far more challenging.
vector is not thread safe. You need to guard the vector between threads. In your case it depends on vector implementation. If the vector internal data is accessed\modified from different threads, it purely depends on vector impl.
With arrays it can be done for sure (the threads do not access the same memory area); but as already noted, if you use the std::vector class, the result may depend on how it is implemented. Indeed I don't see how the implementation of [] on the vector class can be thread unsafe (provided the threads try to access different "indexes"), but it could be.
The solution is: stick to the use of an array, or control the access to the vector using a semaphore or similar.
What you describe is quite possible, and should word just fine.
Note, however, that the threads will need to work on a std::vector*, i.e. a pointer to the original vector---and you probably should allocate the vector on the heap, rather than the stack. If you pass the vector directly, the copy constructor will be invoked and create a separate copy of the data on each thread.
There are also lots of more subtle ways to get this wrong, as always with multithreaded programming. But in principle what you described will work well.
This should be fine, but you would end up with poor performance due to false sharing where each thread can potentially invalidate each other's cache line

Is writing to elements within children of a multi-dimensional vector thread-safe?

I'm trying to take a (very) large vector, and reassign all of the values in it into a multidimensional (2D) vector>.
The multidimensional vector has both dimensions resized to the correct size prior to value population, to avoid reallocation.
Currently, I am doing it single-threaded, but it is something that needs to happen repeatedly, and is very slow due to the large size (~7 seconds). The question is whether it is thread-safe for me to use, for instance, a thread per 2D element.
Some pseudocode:
vector<string> source{/*assume that it is populated by 8,000,000 strings
of varying length*/};
vector<vector<string>> destination;
destination.resize(8);
for(loop=0;loop<8;loop++)destination[loop].resize(1000000);
//current style
for(loop=0;loop<source.size();loop++)destination[loop/1000000][loop%1000000]=source[loop];
//desired style
void Populate(int index){
for(loop=0;loop<destination[index].size();loop++)destination[index][loop]=source[index*1000000+loop];
}
for(loop=0;loop<8;loop++)boost::thread populator(populate,loop);
I would think that the threaded version should work, since they're writing to separate 2nd dimensional elements. However, I'm not sure whether writing the strings would break things, since they are being resized.
When considering only thread-safety, this is fine.
Writing concurrently to distinct objects is allowed. C++ considers objects distinct even if they are neighboring fields in a struct or elements in the same array. The data type of the object does not matter here, so this holds true for string just as well as it does for int. The only important thing is that you must ensure that the ranges that you operate on are really fully distinct. If there is any overlap, you have a data race on your hands.
There is however, another thing to take into consideration here and that is performance. This is highly platform dependent, so the language standard does not give you any rules here, but there are some effects to look out for. For instance, neighboring elements in an array might reside on the same cache line. So in order for the hardware to be able to fulfill the thread-safety guarantees of the language, it must synchronize access to such elements. For instance: Partitioning array access in a way that one thread works out all the elements with even indices, while another works on the odd indices is technically thread-safe, but puts a lot of stress on the hardware as both threads are likely to contend for data stored on the same cache line.
Similarly, your case there is contention on the memory bus. If your threads are able to complete calculation of the data much faster than you are able to write them to memory, you might not actually gain anything by using multiple threads, because all the threads will end up waiting for the memory in the end.
Keep these things in mind when deciding whether parallelism is really the right solution to your problem.

Is calling std::copy from multiple threads for different ranges of the same vector safe?

I am computing floats from multiple threads and storing the results in non-overlapping ranges of the the same vector<float> as follows:
Before running any of the threads I pre-allocated it using vector::reserve.
In each thread a thread-specific vector of results is computed and then copied into the target container like this:
vector<float>::iterator destination = globalVector.begin() + threadSpecificIndex;
std::copy( localVector.begin(), localVector.end(), destination );
Is this a safe practice?
First vector::reserve doesn't actually create any elements. It just sets the the capacity of the vector. If you need the elements to be there you need vector::resize or just construct the vector to the size needed.
Secondly the rule of thumb is if you have a shared object between threads and at least one of them is a writer you need synchronization. Since in this case the "object" is the iterator range and they do not overlap you are okay in this regard. As long as the vectors size is not changed then you should be okay.
One issue you could have with this is false sharing. If the same cache line contains variables that different threads are using then those cache lines have to re-synchronized every time a variable in the line is updated. This can slow down the performance of the code quite a bit.
If the vector has fixed size (and it seems like it has from your question), and ranges are not overlapping, then:
vector won't be reallocated
different threads won't access the same memory
thus I don't see any data races here. (But according to 1st comment to your question, you must ensure, that vector has this fixed size when it is used). You can see also "Data Races" section for std::copy: http://www.cplusplus.com/reference/algorithm/copy/

Multithreading on arrays / Do I need locking mechanisms here?

I am writing a Multithreaded application. That application contains an array of length, lets say, 1000.
If I now would have two threads and I would make sure, that thread 1 will only access the elements 0-499 and thread 2 would only access elements 500-999, would I need a locking mechanism to protect the array or would that be fine.
Note: Only the content of the array will be changed during calculations! The array wont be moved, memcpyed or in some other way altered than altering elements inside of the array.
What you want is perfectly fine! Those kind of strategies (melt together with a bunch of low level atomic primitives) are the basis for what's called lock-free programming.
Actually, there could be possible problems in implementing this solution. You have to strongly guarantee the properties, that you have mentioned.
Make sure, that your in memory data array never moves. You cannot rely on most std containers. Most of them could significantly change during modification. std::map are rebalancing inner trees and making some inner pointers invalid. std::vector sometimes reallocates the whole container when inserting.
Make sure that there is only one consumer and only one producer for any data, that you have. Each consumer have to store inner iterator in valid state to prevent reading same item twice, or skip some item. Each producer must put data in valid place, without possibility to overwrite existing, not read data.
Disobeying of any of this rules makes you need to implement mutexes.

Is it safe to access different sub-arrays with different threads without syncing?

If I have 10 threads, and an array of 10 sub-arrays, is it safe to have each thread do work on a different one of the sub-arrays? i.e. thread[0] does stuff to array[0], thread[1] does stuff to array[1], etc. Or is this not safe to do? Does it make a difference if it's a vector or array (or any data set for that matter)?
Yes, you're safe. As long as none of the threads modifies a resource other threads access without guards or syncing you're safe. It doesn't matter if the memory addresses are very close to each other; proximity doesn't play a role. All that matters is whether there's sharing, and if so does any of the threads modify the shared resource.
Yes, but beware of false sharing.
Essentially yes - it is safe at the array level (but it may also depend - see below). However, if it were another structure, for example a tree or a doubly-linked list, then you may run into issues if you try to modify the structure since a change to one element may require changes to other elements as well which is not safe. But as long as you are only reading the data, you should be OK. One possible pitfall is if the array contains references or pointers. In this case it may happen that while you are accessing separate array entries, the are directly or indirectly referencing the same areas in memory. In that case, you must perform proper synchronization.
So in one word: if it's an array of int or another simple data type, you are completely safe. If it's not an array or the elements are not completely in-place but contain pointers or references, you should be careful.
If you create a "mother-array" which contains 10 smaller arrays and each thread is accessing one of those arrays exclusively then nothing bad can happen. The size of the elements of those arrays does not matter.
If you use more complex structures instead of arrays, if reading does not change anything then you are safe too. However, if a simple read from the structure can modify it (e.g. something is cached, reorganised), having parallel threads accessing the mother-structure can be problematic.
Apart from that - I don't see any case when this could cause trouble.