What do the execution policies in std::copy_n really mean? - C++

I just discovered that std::copy_n provides overloads for different execution policies. Yet I find cppreference quite hard to understand here as (I suppose) it is kept very general. So I have difficulties putting together what actually goes on.
I don't really understand the explanation of the first policy:
The execution policy type used as a unique type to disambiguate parallel algorithm overloading and require that a parallel algorithm's execution may not be parallelized. The invocations of element access functions in parallel algorithms invoked with this policy (usually specified as std::execution::seq) are indeterminately sequenced in the calling thread.
To my understanding this means that we don't parallelize (multithread) here and each element access is sequential like in strcpy. This basically means to me that one thread runs through the function and I'm done. But then there is
invocations of element access functions in parallel algorithms.
What now? Are there still parallel algorithms? How?
The second execution policy states that:
Any such invocations executing in the same thread are indeterminately sequenced with respect to each other.
What I imagine that means is this: Each thread starts at a different position, e.g. the container is split up into multiple segments and each thread copies one of those segments. The threads are created by the library just to run the algorithm. Am I correct in assuming so?
From the third policy:
The invocations of element access functions in parallel algorithms invoked with this policy are permitted to execute in an unordered fashion in unspecified threads, and unsequenced with respect to one another within each thread.
Does this mean the above mentioned container "segments" need not be copied one after another but can be copied in random fashion? If so, why is this so important to justify an extra policy? When I have multiple threads, they will need to be somewhat intermixed to keep synchronisation to a minimum, no?
So here's my probably incorrect current understanding of the policies. Please correct me!
sequenced_policy: 1 thread executes the algorithm and copies everything from A - Z.
parallel_policy: Lib creates new threads specifically for copying, whereas each thread's copied segment has to follow the other (sequenced)?
parallel_unsequenced_policy: try to multithread and SIMD. Copied segments can be intermixed by thread (it doesn't matter where you start).
unsequenced_policy: Try to use SIMD, but only single-threaded.

Your summary of the basic idea of each policy is essentially correct.
Does this mean the above mentioned container "segments" need not be copied one after another but can be copied in random fashion? If so, why is this so important to justify an extra policy?
The extra policies for unsequenced_policy and parallel_unsequenced_policy are necessary because they impose an extra requirement on calling code¹:
The behavior of a program is undefined if it invokes a vectorization-unsafe standard library function from user code called from an execution::unsequenced_policy algorithm.
[and a matching restriction for parallel_unsequenced_policy.]
These four execution policies are used for algorithms in general. The mention of user code called from execution of the algorithm mostly applies to things like std::for_each, or std::generate, where you tell the algorithm to invoke a function. Here's one of the examples from the standard:
int a[] = {0, 1};
std::vector<int> v;
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int i) {
    v.push_back(i * 2 + 1); // incorrect: data race
});
This particular example shows a problem created by parallel execution--you might have two threads trying to invoke push_back on v concurrently, giving a data race.
If you use for_each with one of the unsequenced policies, that imposes a further constraint on what your code can do.
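By way of illustration, the standard gives a mutex-based example for the unsequenced policies (lightly adapted here): it is fine under par, but has undefined behavior under par_unseq, because a lock_guard's call to m.lock() is vectorization-unsafe, and two invocations interleaved on the same thread would deadlock.

#include <algorithm>
#include <execution>
#include <iterator>
#include <mutex>

int main() {
    int x = 0;
    std::mutex m;
    int a[] = {1, 2};
    std::for_each(std::execution::par_unseq, std::begin(a), std::end(a), [&](int) {
        std::lock_guard<std::mutex> guard(m); // incorrect: m.lock() is vectorization-unsafe
        ++x;
    });
}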
When we look specifically at std::copy_n, that's usually less of a problem, because we're not passing it code to be invoked. At least, not directly; indirectly, we potentially are: std::copy_n uses the assignment operator of the item being copied. So, for example, consider something like this:
class foo {
    static int copy_count;
    int data;
public:
    foo &operator=(foo const &other) {
        data = other.data;
        ++copy_count; // unsynchronized access to shared state
        return *this;
    }
};

int foo::copy_count;

std::vector<foo> a;
std::vector<foo> b;
// code to fill a with data goes here

b.resize(a.size()); // the parallel overloads need a forward iterator as destination,
                    // so we size the destination up front instead of using back_inserter
std::copy_n(std::execution::par, a.begin(), a.size(), b.begin());
Our copy assignment operator accesses copy_count without synchronization. If we specify sequential execution, that's fine, but if we specify parallel execution we're now (potentially) invoking it concurrently on two or more threads, so we have a data race.
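For what it's worth, one minimal fix (my sketch, not part of the original example) is to make the counter atomic, which removes the data race under std::execution::par:

class foo {
    static std::atomic<int> copy_count; // atomic: concurrent increments no longer race
    int data;
public:
    foo &operator=(foo const &other) {
        data = other.data;
        copy_count.fetch_add(1, std::memory_order_relaxed);
        return *this;
    }
};

std::atomic<int> foo::copy_count{0};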
I'd probably have to work harder to put together a coherent reason for an assignment operator to do something that was vectorization-unsafe, but that doesn't mean one doesn't exist.
Summary
We have four separate execution policies because each imposes unique constraints on what you can do in your code. In the specific cases of std::copy or std::copy_n, those constraints apply primarily to the assignment operator for the items in the collection being copied.
¹ N4835, section [algorithms.parallel.exec]


Data race in parallelized std::for_each

On the cppreference page on execution policies there is an example like this:
std::atomic<int> x{0};
int a[] = {1, 2};
std::for_each(std::execution::par, std::begin(a), std::end(a), [&](int) {
    x.fetch_add(1, std::memory_order_relaxed);
    while (x.load(std::memory_order_relaxed) == 1) { } // Error: assumes execution order
});
As you can see, it is an example of (supposedly) erroneous code. But I do not really understand what the error is here; it does not seem to me that any part of the code assumes an execution order. AFAIK, the first thread to fetch_add will wait for the second one, but that's it: no problematic behaviour. Am I missing something, or is there some error here?
The execution policy type used as a unique type to disambiguate parallel algorithm overloading and indicate that a parallel algorithm's execution may be parallelized. The invocations of element access functions in parallel algorithms invoked with this policy (usually specified as std::execution::par) are permitted to execute in either the invoking thread or in a thread implicitly created by the library to support parallel algorithm execution. Any such invocations executing in the same thread are indeterminately sequenced with respect to each other.
As far as I can see, the issue is that there is no guarantee about how many threads are used. std::execution::par permits every invocation to execute on the invoking thread, so if the implementation uses a single thread, the first invocation increments x to 1 and then spins in the while loop forever: the second invocation, which would advance x to 2, never gets to run on that thread.
So the comment means that this code wrongly relies on multiple threads executing concurrently, so that fetch_add is called again while the loop is still spinning.
The only guarantee you get is that for each thread, the invocations are not interleaved.
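To make that concrete: under std::execution::par a conforming implementation may run both invocations on the invoking thread, one after the other, which is equivalent to this single-threaded sketch (mine, not real library code):

// First invocation, on the calling thread:
x.fetch_add(1, std::memory_order_relaxed);          // x is now 1
while (x.load(std::memory_order_relaxed) == 1) { }  // spins forever
// The second invocation would make x == 2, but it is never reached:
x.fetch_add(1, std::memory_order_relaxed);
while (x.load(std::memory_order_relaxed) == 1) { }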

C++17: how to control number of threads in execution policy?

The C++17 standard introduced an execution policy parameter (e.g. std::execution::par_unseq) which can be passed to some of the functions in the std library to make them execute in parallel, e.g.:
std::copy(std::execution::par_unseq, obj1.begin(), obj1.end(), obj2.begin())
In other frameworks like OpenMP, it’s possible to set the maximum number of threads that it will use, e.g. #pragma omp parallel num_threads(<desired_numer>) to set it locally within the section, or omp_set_num_threads(<desired_number>) to set it within the calling scope.
I’m wondering how can this be achieved in standard C++ for the execution policies.
This is a good question. That said, unfortunately, I don't think it's possible. [execpol.general]/1 says:
This subclause describes classes that are execution policy types. An object of an execution policy type indicates the kinds of parallelism allowed in the execution of an algorithm and expresses the consequent requirements on the element access functions.
(emphasis mine)
Moreover, after that, the whole of [execpol] deals with is_execution_policy, the (disambiguating) policy types, and the execution policy objects.
In other words, execution policies only bring the possibility of parallelism, at the cost of constraining the element access functions. How these policies are carried out is not really specified, and controlling the details of the parallelism, such as the number of threads, seems even further out of reach.
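That said, there is a common implementation-specific workaround, which is not standard C++ and which I mention only as a sketch: if your standard library's parallel algorithms are backed by Intel oneTBB (as libstdc++'s currently are), you can cap the worker-thread count with tbb::global_control:

#include <algorithm>
#include <execution>
#include <vector>
#include <tbb/global_control.h>

int main() {
    std::vector<int> obj1(1000, 1), obj2(1000);
    // While 'limit' is alive, TBB may use at most 4 threads.
    tbb::global_control limit(tbb::global_control::max_allowed_parallelism, 4);
    std::copy(std::execution::par_unseq, obj1.begin(), obj1.end(), obj2.begin());
}

Whether this affects a given std::execution policy at all depends entirely on the backend, so treat it as a hint, not a guarantee.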

C++ thread safety - map reading

I am working on a program that needs std::map, specifically one like map<string,map<string,int>> - it is meant to be something like bank change rates - the first string is the original currency, the one in the second map is the desired one, and the int is their rate. This whole map will be read-only. Do I still need mutexes? I am a bit confused about the whole thread-safety topic, since this is my first bigger multi-threaded program.
If you are talking about the standard std::map† and no thread writes to it, no synchronization is required. Concurrent reads without writes are fine.
If however at least one thread performs writes on the map, you will indeed need some sort of protection like a mutex.
Be aware that std::map::operator[] counts as write, so use std::map::at (or std::map::find if the key may not exist in the map) instead. You can make the compiler protect you from accidental writes by only referring to the shared map via const map&.
†Was clarified to be the case in the OP. For completeness' sake: Note that other classes may have mutable members. For those, even access through const& may introduce a race. If in doubt, check the documentation or use something else for parallel programming.
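A minimal sketch of that advice (the function and type names are mine), assuming the map is fully built before any threads read it:

#include <map>
#include <string>

using Rates = std::map<std::string, std::map<std::string, int>>;

// Taking the map by const& means accidental writes such as rates[from] fail to compile.
int lookup(Rates const& rates, std::string const& from, std::string const& to) {
    return rates.at(from).at(to); // throws std::out_of_range if a key is missing
}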
The rule of thumb is if you have shared data and at least one thread will be a writer then you need synchronization. If one of the threads is a writer you must have synchronization as you do not want a reader to read an element that is being written to. This can cause issues as the reader might read part of the old value and part of the new value.
In your case, since all the threads will only ever be reading data, there is nothing they can do that will affect the map, so you can have concurrent (unsynchronized) reads.
Wrap a std::map<std::string, std::map<std::string,int>> const in a custom class which has only const member functions [*].
This will make sure that all threads which use an object of the class after its creation will only read from it, which is guaranteed to be safe since C++11.
As the documentation says:
All const member functions can be called concurrently by different threads on the same container.
Wrapping containers in your own custom types is good practice anyway. Increased thread safety is just one positive side effect of that good practice. Other positive effects include increased readability of client code, reduction/adaption of container interface to required functionality, ease of adding additional constraints and checks.
Here is a brief example:
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

class BankChangeRates
{
public:
    BankChangeRates(std::map<std::string, std::map<std::string, int>> const& data) : data(data) {}

    int get(std::string const& key, std::string const& inner_key) const
    {
        auto const find_iter = data.find(key);
        if (find_iter != data.end())
        {
            auto const inner_find_iter = find_iter->second.find(inner_key);
            if (inner_find_iter != find_iter->second.end())
            {
                return inner_find_iter->second;
            }
        }
        throw std::out_of_range("unknown currency pair"); // error handling: falling off the end would be UB
    }

    std::size_t size() const
    {
        return data.size();
    }

private:
    std::map<std::string, std::map<std::string, int>> const data;
};
In any case, the thread-safety problem is then reduced to how to make sure that the constructor does not read from an object to which another thread writes. This is often achieved trivially; for example, the object may be constructed before multi-threading even begins, or it may be initialised with hard-coded initialisation lists. In many other cases, the code which creates the object will generally access only other thread-safe functions and local objects.
The point is that concurrent accesses to your object will always be safe once it has been created.
[*] Of course, the const member functions should keep their promise and not attempt "workarounds" with mutable or const_cast.
If you are completely sure that both the maps are ALWAYS READ-ONLY, then you never need mutexes.
But you have to be extra careful that no one can update the map by any means during program execution. Make sure that you initialize the map at the init stage of the program and then never update it for any reason.
If you suspect that you may need to update it mid-execution in the future, then it's better to wrap the map accesses in macros which are empty right now. In the future, if you need mutexes around them, just change the macro definitions, as sketched below.
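A sketch of that macro idea (all names are mine, purely illustrative):

#include <map>
#include <string>

#define RATES_LOCK()   // currently empty: the map is read-only after initialization
#define RATES_UNLOCK() // redefine both to lock/unlock a mutex if writes ever appear

std::map<std::string, std::map<std::string, int>> rates; // filled once at startup

int get_rate(std::string const& from, std::string const& to) {
    RATES_LOCK();
    int rate = rates.at(from).at(to);
    RATES_UNLOCK();
    return rate;
}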
PS: I have used a map in this answer, but it can easily be replaced by any shared resource; it was just for ease of understanding.

Why are C++ threads movable but not copyable?

As the title of the question says, why are C++ threads (std::thread and pthread) movable but not copyable? What consequences would there be if we made them copyable?
Regarding copying, consider the following snippet:
void foo();
std::thread first (foo);
std::thread second = first; // (*)
When the line marked (*) takes place, presumably some of foo has already executed. What would the expected behavior be, then? Execute foo from the start? Halt the thread, copy the registers and state, and rerun it from there?
In particular, given that function objects are now part of the standard, it's very easy to launch another thread that performs exactly the same operation as some earlier thread, by reusing the function object.
So, to begin with, there's not much motivation for it.
Regarding moves, though, consider the following:
std::vector<std::thread> threads;
without move semantics, this would be problematic: when the vector needs to resize internally, how would it transfer its elements to another buffer?
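For instance, this common pattern works only because std::thread is movable (a sketch):

#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([i] { /* do the work for chunk i */ });
    for (auto& t : threads)
        t.join(); // each std::thread still uniquely owns its thread of execution
}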
If the thread objects are copyable, who is finally responsible for the single thread of execution associated with the thread objects? In particular, what would join() do for each of the thread objects?
There are several possible outcomes, but that is the problem: there are several possible outcomes with no real overlap that could be codified (standardised) as a general use case.
Hence, the most reasonable outcome is that one thread of execution is associated with at most one thread object.
That is not to say some shared state cannot be provided, it is just that the user then needs to take further action in this regard, such as using a std::shared_ptr.
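For example, here is a sketch of that "further action": the state is shared explicitly via std::shared_ptr, while each std::thread object remains the sole owner of its thread of execution:

#include <atomic>
#include <memory>
#include <thread>

int main() {
    auto counter = std::make_shared<std::atomic<int>>(0);
    std::thread t1([counter] { counter->fetch_add(1); });
    std::thread t2([counter] { counter->fetch_add(1); });
    t1.join();
    t2.join();
    // *counter == 2: the state was shared, but no thread object was ever copied.
}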

Why is map not multithread-safe in C++?

I met this problem when I tried to solve a concurrency issue in my code. In the original code, we only use a unique lock to lock the write operations on a cache, which is an STL map. But there are no restrictions on read operations on the cache. So I was thinking of adding a shared lock for the read operations and keeping the unique lock for the writes. But someone told me that it's not safe to do multithreading on a map due to some internal caching issue that it itself does.
Can someone explain the reason in details? What does the internal caching do?
The implementations of std::map must all meet the usual guarantees: if all you do is read, then there is no need for external synchronization, but as soon as one thread modifies, all accesses must be synchronized.
By a "shared lock" you presumably mean a reader/writer lock, like POSIX's pthread_rwlock: many concurrent readers, but writers get exclusive access. That scheme is perfectly valid for a map; the essential point is that while any one thread is writing, you must ensure that no other thread may read at the same time. (C++14 added std::shared_timed_mutex and C++17 added std::shared_mutex, which provide exactly this.)
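Here is a minimal sketch of the asker's plan using those standard types (C++17; the class and names are mine):

#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

class Cache {
public:
    int get(std::string const& key) const {
        std::shared_lock lock(mutex_); // many readers may hold this concurrently
        return map_.at(key);
    }
    void put(std::string const& key, int value) {
        std::unique_lock lock(mutex_); // a writer gets exclusive access
        map_[key] = value;
    }
private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> map_;
};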
Since C++11 at least, a const operation on a standard library class is guaranteed to be thread safe (assuming const operations on objects stored in it are thread safe).
All const member functions of std types can be safely called from multiple threads in C++11 without explicit synchronization. In fact, any type that is ever used in conjunction with the standard library (e.g. as a template parameter to a container) must fulfill this guarantee.
Clarification: The standard guarantees that your program will have the desired behaviour as long as you never cause a write and any other access to the same data location without a synchronization point in between. The rationale behind this is that modern CPUs don't have strict sequentially consistent memory models, which would limit scalability and performance. Under the hood, your compiler and standard library will emit appropriate memory fences at places where stronger memory orderings are needed.
I really don't see why there would be any caching issue...
If I refer to the STL definition of a map, it should be implemented as a binary search tree.
A binary search tree is simply a tree with a pool of key-value nodes. Those nodes are sorted following the natural order of their keys and, to avoid any problem, keys must be unique. So no internal caching is needed at all.
As no internal caching is required, read operations are safe in multi-threading context. But it's not the same story for write operations, for those you must provide your own synchronization mechanism as for any non-threading-aware data structure.
Just be aware that you must also forbid any read operations while a write operation is being performed by another thread, because a write can trigger a slow rebalancing of the binary tree, i.e. a quick read during a long write could return a wrong result.