Thread safety and block write size - c++

Is there a minimal block size that multiple threads can write to in a contiguous block of memory that avoids race conditions or the losing of values?
For example, can 32 threads write to individual elements of this array without affecting the other values?
int array[32];
How about this?
bool array[32];
How about an object that stores simple true/false into a bit array?
I'm guessing there is some block write size or cache-related functionality that comes into play and determines this. Is that correct? And is there anything standard/safe with regard to this size (platform defines, etc.)?

Is there a minimal block size that multiple threads can write to in a contiguous block of memory that avoids race conditions or the losing of values?
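No. A conflict (and potential data race) only occurs if two threads access the same memory location and at least one of them writes to it. A memory location is a scalar object or a run of adjacent bit-fields, not a fixed-size block. Two threads accessing different objects won't conflict.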
For example, can 32 threads write to individual elements of this array without affecting the other values?
Yes, each element has its own memory location, so two threads accessing different elements won't conflict.
How about this?
Yes; again, each bool has its own location.
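As a minimal sketch of the two cases above (the helper name fill_from_threads is made up for illustration, and std::array stands in for the plain arrays in the question), each thread writes only the elements at its own index:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: thread i writes only ints[i] and flags[i].
// Neither array is atomic and no lock is taken; every element is a separate
// memory location and no two threads use the same index.
void fill_from_threads(std::array<int, 32>& ints, std::array<bool, 32>& flags) {
    std::vector<std::thread> workers;
    workers.reserve(ints.size());
    for (std::size_t i = 0; i < ints.size(); ++i) {
        workers.emplace_back([&ints, &flags, i] {
            ints[i] = static_cast<int>(i) * 10;
            flags[i] = (i % 2 == 0);
        });
    }
    // Joining makes every thread's writes visible to the caller.
    for (std::thread& w : workers) w.join();
}
```

After the call, every element holds the value its own thread wrote, regardless of how the threads were scheduled.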
How about an object that stores simple true/false into a bit array?
No; that would pack multiple values into a larger element, and two threads accessing the same larger element would conflict.
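One possible way to make a packed bit array safe is to route every update through an atomic read-modify-write. The sketch below is an assumption about how such an object could look, not a standard facility; the class name BitFlags32 and the function set_all_bits_concurrently are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical 32-flag set packed into one word. All flags share a single
// memory location, so updates go through an atomic read-modify-write.
class BitFlags32 {
public:
    void set(unsigned bit) {
        assert(bit < 32);
        bits_.fetch_or(std::uint32_t{1} << bit);
    }
    bool test(unsigned bit) const {
        assert(bit < 32);
        return ((bits_.load() >> bit) & 1u) != 0;
    }
    std::uint32_t raw() const { return bits_.load(); }

private:
    std::atomic<std::uint32_t> bits_{0};
};

// Each thread sets its own bit. With a plain `word |= mask` instead of
// fetch_or, concurrent updates could overwrite each other and lose bits.
std::uint32_t set_all_bits_concurrently() {
    BitFlags32 flags;
    std::vector<std::thread> workers;
    for (unsigned bit = 0; bit < 32; ++bit)
        workers.emplace_back([&flags, bit] { flags.set(bit); });
    for (std::thread& w : workers) w.join();
    return flags.raw();
}
```

The same caveat applies to std::vector<bool>, which is allowed to pack its elements and is excluded from the usual "different elements are independent" container guarantee.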
I'm guessing there is some block write size or cache related functionality that would come into play that determines this.
There could be a performance impact (known as false sharing) when multiple threads access the same cache line; but the language guarantees that it won't affect the program's correctness as long as they don't access the same memory location.
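To illustrate the distinction, here is a rough sketch in which two threads update two different counters. The 64-byte line size is an assumption, not something the language promises, and the names PaddedCounter and count_in_parallel are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Assumption: 64 bytes is a typical cache-line size; it is not guaranteed.
constexpr std::size_t kAssumedCacheLine = 64;

// Each counter is aligned (and therefore padded) to its own assumed line.
struct alignas(kAssumedCacheLine) PaddedCounter {
    long value = 0;
};
static_assert(sizeof(PaddedCounter) == kAssumedCacheLine, "one counter per line");

// Two threads increment two different counters. The result is correct with
// or without the padding; the padding only aims to reduce line contention.
long count_in_parallel(long iterations) {
    PaddedCounter counters[2];
    std::thread a([&counters, iterations] {
        for (long i = 0; i < iterations; ++i) ++counters[0].value;
    });
    std::thread b([&counters, iterations] {
        for (long i = 0; i < iterations; ++i) ++counters[1].value;
    });
    a.join();
    b.join();
    return counters[0].value + counters[1].value;
}
```

Removing the alignas would not change the returned total; whether it changes the running time depends on the hardware and on what the optimizer does with the loops.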

There is no guarantee in the standard. If you need exclusive access to an element, you may use std::atomic, for example:
std::vector<std::atomic<int> > array;
Otherwise you are always free to use std::mutex.
can 32 threads write to individual elements of this array without affecting the other values?
You are free to do this provided that one thread does not interfere with another, i.e. thread i modifies the value of array[i] ONLY.
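For the case where several threads really do need to update the same element, a sketch with std::vector<std::atomic<int>> might look like the following. The function shared_histogram and its parameters are hypothetical; note that the vector is sized once up front, because std::atomic is neither copyable nor movable and the vector therefore cannot be resized later.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Several threads add into the same slots, so the slots are atomic.
std::vector<int> shared_histogram(std::size_t slots, unsigned threads, int per_thread) {
    assert(slots > 0);
    std::vector<std::atomic<int>> counts(slots);  // sized once, never resized
    for (auto& c : counts) c.store(0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&counts, per_thread] {
            for (int k = 0; k < per_thread; ++k)
                counts[static_cast<std::size_t>(k) % counts.size()].fetch_add(1);
        });
    }
    for (auto& w : workers) w.join();
    std::vector<int> result;
    for (auto& c : counts) result.push_back(c.load());
    return result;
}
```

If each thread only ever touches its own index, as in the question, plain int elements are enough and the atomics are unnecessary.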

Related

Do I need to use a mutex to protect access to an array of mutexes from different threads?

Let's say I have a bunch of files, and an array with a mutex for each file. Now I have different threads reading from random files, but first they need to acquire the lock from the array. Should I have a lock on the entire array that must be acquired before taking the mutex for the particular file?
No, but by placing the mutexes next to each other on purpose, you pull the memory they live in into every thread's cache.
Keep each thread's memory accesses away from the data the other threads deal with.
Associate each thread with its data, packed as tightly (but aligned) and into as few cache lines as possible. One mutex and one data set, nowhere close to where the other worker threads need access.
You can easily measure the effect by using a homemade stand-in for std::hardware_destructive_interference_size like ... 64 (popular, non-scientific, but common).
Separate the data in such a way that no other thread needs to touch data within those 64 (or whatever number you come up with) bytes.
It's a "no kidding?" experience.
The number 64 is almost arbitrary. I can compile a program using that constant - but it will not be translated into something meaningful for a different target platform - it'll stay 64. It's a best guess.
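A sketch of that idea, assuming a C++17 toolchain: use the library constant when the implementation defines it and fall back to the guessed 64 otherwise. The names kLineSize and WorkerSlot are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <new>

// Use the library constant where the implementation provides it; otherwise
// fall back to 64, which is a guess rather than a measured value.
#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kLineSize = std::hardware_destructive_interference_size;
#else
constexpr std::size_t kLineSize = 64;
#endif

// Hypothetical per-worker slot: one mutex plus the data it protects, aligned
// so that two slots are not expected to share a cache line.
struct alignas(kLineSize) WorkerSlot {
    std::mutex lock;
    long processed = 0;
};
static_assert(alignof(WorkerSlot) == kLineSize, "slot starts on its own line");
static_assert(sizeof(WorkerSlot) % kLineSize == 0, "slot fills whole lines");
```

Even the library constant is fixed at compile time, so it remains a best guess for whatever machine the binary eventually runs on.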
Understanding std::hardware_destructive_interference_size and std::hardware_constructive_interference_size
No, accessing different elements of an array in different threads does not cause data races, and a mutex can be used by multiple threads without further synchronization, because otherwise it could not fulfill its purpose.
You do not need to add a lock for the array itself. The same is true for member functions of standard library containers that only access elements and do not modify the container itself.
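A small sketch of the per-file locking described in the question, with no lock around the array itself. The file access is replaced by a counter, and FileTable, read_file and hammer_files are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kFileCount = 4;  // hypothetical number of files

// Stand-in for per-file state; a real program would hold file handles here.
struct FileTable {
    std::array<std::mutex, kFileCount> locks;  // one mutex per file
    std::array<int, kFileCount> reads{};       // reads[i] is guarded by locks[i]
};

// Indexing the array of mutexes needs no outer lock: the array itself is
// never modified, and each mutex synchronizes its own use.
void read_file(FileTable& table, std::size_t index) {
    assert(index < kFileCount);
    std::lock_guard<std::mutex> guard(table.locks[index]);
    ++table.reads[index];  // placeholder for the actual file access
}

int hammer_files(unsigned threads, int reads_per_thread) {
    FileTable table;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&table, t, reads_per_thread] {
            for (int k = 0; k < reads_per_thread; ++k)
                read_file(table, (t + static_cast<unsigned>(k)) % kFileCount);
        });
    }
    for (auto& w : workers) w.join();
    int total = 0;
    for (int r : table.reads) total += r;
    return total;
}
```

An outer lock would only become necessary if the set of mutexes itself could change while threads are running, for example if files were added or removed.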

Is writing to elements within children of a multi-dimensional vector thread-safe?

I'm trying to take a (very) large vector<string>, and reassign all of the values in it into a multidimensional (2D) vector<vector<string>>.
The multidimensional vector has both dimensions resized to the correct size prior to value population, to avoid reallocation.
Currently, I am doing it single-threaded, but it is something that needs to happen repeatedly, and is very slow due to the large size (~7 seconds). The question is whether it is thread-safe for me to use, for instance, a thread per 2D element.
Some pseudocode:
vector<string> source{/*assume that it is populated by 8,000,000 strings
of varying length*/};
vector<vector<string>> destination;
destination.resize(8);
for(size_t loop=0;loop<8;loop++)destination[loop].resize(1000000);
//current style
for(size_t loop=0;loop<source.size();loop++)destination[loop/1000000][loop%1000000]=source[loop];
//desired style
void Populate(size_t index){
//each call writes only to destination[index]
for(size_t loop=0;loop<destination[index].size();loop++)destination[index][loop]=source[index*1000000+loop];
}
//keep the threads in a group so they can be joined before destination is used
boost::thread_group populators;
for(size_t loop=0;loop<8;loop++)populators.create_thread(boost::bind(Populate,loop));
populators.join_all();
I would think that the threaded version should work, since they're writing to separate 2nd dimensional elements. However, I'm not sure whether writing the strings would break things, since they are being resized.
When considering only thread-safety, this is fine.
Writing concurrently to distinct objects is allowed. C++ considers objects distinct even if they are neighboring fields in a struct or elements in the same array. The data type of the object does not matter here, so this holds true for string just as well as it does for int. The only important thing is that you must ensure that the ranges that you operate on are really fully distinct. If there is any overlap, you have a data race on your hands.
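As a sketch of the pattern from the question using std::thread instead of boost::thread (the function name split_rows is made up, and it assumes the source size is an exact multiple of the row count, as in the question):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Copies `source` into `rows` rows of equal length, one thread per row.
// Assumes source.size() is an exact multiple of `rows`.
std::vector<std::vector<std::string>> split_rows(
        const std::vector<std::string>& source, std::size_t rows) {
    assert(rows > 0 && source.size() % rows == 0);
    const std::size_t cols = source.size() / rows;
    // Size everything before any thread starts, so no vector reallocates later.
    std::vector<std::vector<std::string>> destination(
        rows, std::vector<std::string>(cols));
    std::vector<std::thread> workers;
    workers.reserve(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        workers.emplace_back([&source, &destination, r, cols] {
            // Each thread assigns only to strings in its own row. An
            // assignment may allocate, but it modifies only that string.
            for (std::size_t c = 0; c < cols; ++c)
                destination[r][c] = source[r * cols + c];
        });
    }
    for (std::thread& w : workers) w.join();
    return destination;
}
```

The source vector is only read, which is also safe to do from several threads at once.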
There is, however, another thing to take into consideration here, and that is performance. This is highly platform dependent, so the language standard does not give you any rules here, but there are some effects to look out for. For instance, neighboring elements in an array might reside on the same cache line. So in order for the hardware to be able to fulfill the thread-safety guarantees of the language, it must synchronize access to such elements. For instance, partitioning array access so that one thread works on all the elements with even indices while another works on the odd indices is technically thread-safe, but puts a lot of stress on the hardware, as both threads are likely to contend for data stored on the same cache line.
Similarly, in your case there is contention on the memory bus. If your threads are able to complete calculation of the data much faster than you are able to write them to memory, you might not actually gain anything by using multiple threads, because all the threads will end up waiting for the memory in the end.
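One common alternative to the even/odd split is to give each worker a contiguous block of indices, so that neighbouring elements mostly belong to the same thread. The helper below is a hypothetical sketch of that index arithmetic only.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Returns the half-open index range [first, last) that worker `i` out of
// `parts` workers should process for `n` elements. When n is not divisible
// by parts, the earlier workers each take one extra element.
std::pair<std::size_t, std::size_t> block_range(
        std::size_t n, std::size_t parts, std::size_t i) {
    assert(parts > 0 && i < parts);
    const std::size_t base = n / parts;
    const std::size_t extra = n % parts;
    const std::size_t first = i * base + (i < extra ? i : extra);
    const std::size_t last = first + base + (i < extra ? 1 : 0);
    return {first, last};
}
```

With this split only the elements at block boundaries can share a cache line with another thread's data, whereas the even/odd split interleaves the two threads throughout the whole array.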
Keep these things in mind when deciding whether parallelism is really the right solution to your problem.

Critical Sections openMP

I would like to know where we need to set critical sections.
If there are multiple threads with a shared array, and each one wants to write to a different place, does the write need to be in a critical section, even though each thread writes to a different place in the array?
Let's say that I have a 2-dimensional array M[3][3], initial_array[3] and some double variable, and I want to calculate something and store it in the first column of M.
I could use a for loop, but I want to use OpenMP, so I did:
omp_set_num_threads(3);
#pragma omp parallel shared(M,variable)
{
int id = omp_get_thread_num();
double init = initial_array[id]*variable;
M[id][0] = init;
}
It works fine, but I know that it can cause a deadlock or a bad running time.
I mean, what if I had more threads and an even larger M?
What is the correct way to set a critical section?
Another thing I want to ask is about initial_array: does it also need to be shared?
This is safe code.
Random access in arrays does not cause any race conditions to other elements in the array. As long as you continue to read and write to unshared elements within the array concurrently, you'll never hit a race condition.
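The same computation can also be sketched as a parallel loop without any critical section. The function name fill_first_column and the constant kN are made up for this example; variables declared outside the parallel construct are shared by default, so no explicit shared clause is given.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kN = 3;

// Fills the first column of M with initial_array[i] * variable.
// With OpenMP enabled (for example -fopenmp on GCC or Clang) the iterations
// may run on several threads; without it the pragma is ignored and the loop
// runs serially with the same result.
void fill_first_column(double (&M)[kN][kN],
                       const double (&initial_array)[kN],
                       double variable) {
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(kN); ++i) {
        // Iteration i writes only M[i][0], so no critical section is needed.
        M[i][0] = initial_array[i] * variable;
    }
}
```

A critical section would only be needed if several iterations updated the same element, for example when accumulating into a single shared sum.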
Keep in mind that a read can race with a write to the same element. Your example uses double, and I'd be concerned if you had reads concurrent with write operations on the same element: depending on your architecture/platform, a write may not complete in one step, so a reader could see a partially written value. Anyway, you aren't doing this, but it is worth mentioning.
I don't see any problem with regards to concurrency since you are accessing different parts of the memory (different indices of the array), but the only problem I see is performance hit if your cores have dedicated L1 caches.
In this case there will be a performance hit due to cache coherency, where one core updates an index, invalidates the line in the other caches, does a write-back, etc. For a small number of threads/cores this is not an issue, but for threads running on a large number of cores it sure is. This is because the data your threads work on isn't truly independent: it is read into the cache as a block (if you access M[0][0], then not only M[0][0] is read into the cache but M[0][0] to M[n][col], where n depends on the cache block size). And if the block is large, it might contain more of the shared data.

Does the C++11 memory model prevent memory tearing and conflicts?

Reading a draft of C++11 I was interested by clause 1.7.3:
A memory location is either an object of scalar type or a maximal sequence of adjacent bit-fields all having non-zero width. ... Two threads of execution (1.10) can update and access separate memory locations without interfering with each other.
Does this clause protect from hardware related race conditions such as:
unaligned data access where memory is updated in two bus transactions (memory tearing)?
where you have distinct objects within a system memory unit, e.g. two 16-bit signed integers in a 32-bit word, and each independent update of the separate objects requires the entire memory unit to be written (memory conflict)?
Regarding the second point, the standard guarantees that there will be no race there. That being said, I have been told that this guarantee is not implemented in current compilers, and it might even be impossible to implement in some architectures.
Regarding the first point, if the second point is guaranteed, and if your program does not contain any race condition, then the natural outcome is that this will not be a race condition either. That is, given the premise that the standard guarantees that writes to different sub word locations are safe, then the only case where you can have a race condition is if multiple threads access the same variable (that is split across words, or more probably for this to be problematic, across cache lines).
Again, this might be hard or even impossible to implement. If your unaligned datum goes across a cache line, then it would be almost impossible to guarantee the correctness of the code without imposing a huge cost on performance. You should try to avoid unaligned variables as much as possible for this and other reasons, including raw performance: a write to an object that touches two cache lines involves writing as many as 32 bytes to memory, and if any other thread is touching any of the cache lines, it also involves the cost of synchronization of the caches.
It does not protect against memory tearing, which is only visible when two threads access the same memory location (but the clause only applies to separate memory locations).
It appears to protect against memory conflict, according to your example. The most likely way to achieve this is that a system which can't write less than 32 bits at once would have 32-bit char, and then two separate objects could never share a "system memory unit". (The only way two 16-bit integers can be adjacent on a system with 32-bit char is as bitfields.)
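The quoted clause can be illustrated with a short sketch (the type and function names are invented for this example). Two adjacent 16-bit members are separate memory locations, while adjacent non-zero-width bit-fields form a single one.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

// Two 16-bit objects that will typically sit in the same 32-bit word.
// Each one is a scalar object and therefore its own memory location.
struct Pair16 {
    std::int16_t left = 0;
    std::int16_t right = 0;
};

// By contrast, adjacent non-zero-width bit-fields form ONE memory location,
// so `lo` and `hi` must not be written concurrently without synchronization.
struct PackedBits {
    unsigned lo : 16;
    unsigned hi : 16;
};

// Thread a touches only `left`, thread b only `right`; the implementation
// has to keep these updates from disturbing each other.
Pair16 update_halves(int iterations) {
    assert(iterations >= 0 && iterations <= 32767);
    Pair16 p;
    std::thread a([&p, iterations] {
        for (int i = 0; i < iterations; ++i) ++p.left;
    });
    std::thread b([&p, iterations] {
        for (int i = 0; i < iterations; ++i) --p.right;
    });
    a.join();
    b.join();
    return p;
}
```

Doing the same with PackedBits::lo and PackedBits::hi would be a data race, because both bit-fields belong to the same memory location.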

Is it safe to access different sub-arrays with different threads without syncing?

If I have 10 threads, and an array of 10 sub-arrays, is it safe to have each thread do work on a different one of the sub-arrays? i.e. thread[0] does stuff to array[0], thread[1] does stuff to array[1], etc. Or is this not safe to do? Does it make a difference if it's a vector or array (or any data set for that matter)?
Yes, you're safe. As long as none of the threads modifies a resource other threads access without guards or syncing you're safe. It doesn't matter if the memory addresses are very close to each other; proximity doesn't play a role. All that matters is whether there's sharing, and if so does any of the threads modify the shared resource.
Yes, but beware of false sharing.
Essentially yes: it is safe at the array level (but it may also depend on the contents, see below). However, if it were another structure, for example a tree or a doubly-linked list, then you may run into issues if you try to modify the structure, since a change to one element may require changes to other elements as well, which is not safe. But as long as you are only reading the data, you should be OK. One possible pitfall is if the array contains references or pointers. In this case it may happen that while you are accessing separate array entries, they are directly or indirectly referencing the same areas in memory. In that case, you must perform proper synchronization.
So in one word: if it's an array of int or another simple data type, you are completely safe. If it's not an array or the elements are not completely in-place but contain pointers or references, you should be careful.
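For the pointer pitfall, one rough sanity check before handing one entry to each thread is to verify that no two entries point at the same object. The helper below is hypothetical and only catches direct aliasing, not sharing that happens further down a chain of pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical pre-check: returns true if no two non-null entries point at
// the same object. It does not detect indirect sharing through the
// pointed-to objects. Takes its argument by value so it can sort a copy.
bool entries_are_distinct(std::vector<const int*> entries) {
    entries.erase(std::remove(entries.begin(), entries.end(), nullptr),
                  entries.end());
    std::sort(entries.begin(), entries.end(), std::less<const int*>());
    return std::adjacent_find(entries.begin(), entries.end()) == entries.end();
}
```

If the check fails, the threads that share a target need a mutex or atomics for that target, even though they index different array entries.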
If you create a "mother-array" which contains 10 smaller arrays and each thread is accessing one of those arrays exclusively then nothing bad can happen. The size of the elements of those arrays does not matter.
If you use more complex structures instead of arrays, if reading does not change anything then you are safe too. However, if a simple read from the structure can modify it (e.g. something is cached, reorganised), having parallel threads accessing the mother-structure can be problematic.
Apart from that - I don't see any case when this could cause trouble.