OpenMP race condition while reading from pointer - c++

I know that reading from a shared variable in OpenMP does not cause a race condition, because every thread has its own copy of that variable.
But if the shared variable is a pointer (e.g. to a container), then every thread only gets a copy of the pointer.
If I now read from the location the pointer is pointing to (my container), can there be race conditions or does OpenMP somehow take care of this?
Is it better to share a copy of the container itself, instead of a pointer to it, among threads?

Merely reading a variable cannot produce a race condition, whether the variable is shared or not. A race condition requires two or more threads accessing the same instance of a variable at the same time, with at least one of them modifying it.
So if your threads both read and modify a certain variable, making it shared still leaves you with a race condition, since all the threads operate on the same instance. I guess that in your first paragraph you meant to say private, as @ilotXXI pointed out.
As for privatizing a pointer: if two or more instances of that pointer point to the same data and the threads modify that data, you will have a race condition (each thread has a private copy of the pointer, but not a private copy of the data).
Note that changing from one data-sharing clause to another may change the behavior of your application. So, in general, when you parallelize an application, first analyze which kinds of data accesses it performs. Once you know that, decide which data-sharing clauses and which synchronization constructs (if needed) will preserve the original behavior.
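To make the distinction concrete, here is a minimal sketch (the names make_values and parallel_sum are made up for illustration, they are not from the question). All threads read the container through the same shared pointer, which is safe because nobody modifies it, while the accumulation, which is a write, goes through a reduction clause so each thread works on its own private copy of sum. It assumes OpenMP is enabled at compile time (e.g. -fopenmp with GCC or Clang); without it the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds a container holding 0, 1, ..., n-1 (illustrative helper).
std::vector<int> make_values(int n) {
    std::vector<int> values(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        values[static_cast<std::size_t>(i)] = i;
    }
    return values;
}

// Sums the elements of *data in parallel. The pointer and the container it
// points to are shared between threads, but they are only read.
long parallel_sum(const std::vector<int>* data) {
    assert(data != nullptr);
    long sum = 0;
    const long n = static_cast<long>(data->size());
    // Each thread accumulates into a private copy of `sum`; OpenMP combines
    // the copies at the end of the loop, so the writes never conflict.
    #pragma omp parallel for shared(data) reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += (*data)[static_cast<std::size_t>(i)];  // concurrent reads only
    }
    return sum;
}
```

Calling parallel_sum(&v) on the result of make_values(1000) gives 499500. If the loop body wrote to *data instead (say, a push_back), sharing the pointer would no longer be safe and you would need a critical section or a per-thread copy of the container.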

Related

Does any of C++ smart pointers avoid data race in strict sense?

For the consumer/producer model there is a built-in mechanism to avoid data races: a queue.
But for a global flag there seems to be no ready-to-use type that avoids data races, other than attaching a mutex to each global flag, even one as simple as a boolean or an int.
I came across shared pointer. Is it true that as one pointer operates on that variable, another is prohibited from accessing it?
Or will unique pointer promise no data race?
e.g. scenario:
One thread updates the number of visits when serving a new visitor, while another thread periodically reads that number (possibly by copying it) and saves it to a log. Both access the same heap memory that stores the number, and the race condition is that they may access it at the same time from different CPU cores, which could cause a crash.
For the consumer/producer model there is a built-in mechanism to avoid data races: a queue.
The standard library has no thread-safe queue. std::queue and others cannot be used without explicit synchronization in multiple threads.
I came across shared pointer. Is it true that as one pointer operates on that variable, another is prohibited from accessing it?
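std::shared_ptr (or any other standard library smart pointer) does not in any way prevent multiple threads from accessing the managed object unsynchronized. std::shared_ptr only guarantees that its reference count is updated in a thread-safe way, so the managed object is destroyed exactly once even when different threads release their own copies of the pointer.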
Or will unique pointer promise no data race?
std::unique_ptr cannot be copied, so you cannot have multiple std::unique_ptr instances managing the same object. That says nothing about threads, though: none of the smart pointers guarantee that access to the smart pointer object itself is free of data races.
One thread updates the number of visits when serving a new visitor, while another thread periodically reads that number (possibly by copying it) and saves it to a log. Both access the same heap memory that stores the number, and the race condition is that they may access it at the same time from different CPU cores, which could cause a crash.
That can simply be a std::atomic<int> or similar. Unsynchronized access to a std::atomic is allowed. There can of course still be race conditions if you rely on a particular order in which the access should happen, but in your example that doesn't seem to be the case. However, in contrast to non-atomic objects, there will be at least no undefined behavior due to the unsynchronized access (data race).
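A minimal sketch of that visit counter, assuming a serving thread and a logging thread (visit_count, serve, sample_counter and run_demo are illustrative names, not from the question). The logger just collects the values it would write to the log.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical visit counter shared by the serving and the logging thread.
std::atomic<int> visit_count{0};

// Simulates serving `visitors` visitors, bumping the counter once per visit.
void serve(int visitors) {
    for (int i = 0; i < visitors; ++i) {
        visit_count.fetch_add(1, std::memory_order_relaxed);
    }
}

// Simulates the logger: samples the counter `samples` times and returns the
// values it saw. Each load is atomic, so no torn value can be observed.
std::vector<int> sample_counter(int samples) {
    std::vector<int> seen;
    seen.reserve(static_cast<std::size_t>(samples));
    for (int i = 0; i < samples; ++i) {
        seen.push_back(visit_count.load(std::memory_order_relaxed));
    }
    return seen;
}

// Runs both threads to completion and returns the final count.
int run_demo(int visitors) {
    visit_count.store(0);
    std::vector<int> seen;
    std::thread server(serve, visitors);
    std::thread logger([&seen] { seen = sample_counter(100); });
    server.join();
    logger.join();
    // Every sample lies within range and the samples never decrease.
    for (std::size_t i = 0; i < seen.size(); ++i) {
        assert(seen[i] >= 0 && seen[i] <= visitors);
        assert(i == 0 || seen[i - 1] <= seen[i]);
    }
    return visit_count.load();
}
```

Relaxed ordering is enough here because the counter is the only data the two threads share. Which intermediate values the logger sees still depends on timing (a race condition in the loose sense), but there is no data race and hence no undefined behavior.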

Why do I need to use std::this_thread::yield() for this program to work correctly? [duplicate]

I am building a very simple program as an exercise.
The idea is to compute the total size of a directory by recursively iterating over all its contents, and summing the sizes of all files contained in the directory (and its subdirectories).
To show to a user that the program is still working, this computation is performed on another thread, while the main thread prints a dot . once every second.
Now the main thread of course needs to know when it should stop printing dots and can look up a result.
It is possible to use e.g. a std::atomic<bool> done(false); and pass this to the thread that will perform the computation, which will set it to true once it is finished. But I am wondering if in this simple case (one thread writes once completed, one thread reads periodically until nonzero) it is necessary to use atomic data types for this. Obviously if multiple threads might write to it, it needs to be protected. But in this case, there's only one writing thread and one reading thread.
Is it necessary to use an atomic data type here, or is it overkill and could a normal data type be used instead?
Yes, it's necessary.
The issue is that different cores of the processor can have different views of the "same" data, for example because a value is held in a register or has not yet propagated from one core to the others. Using an atomic obliges both the compiler and the hardware to actually perform the write and make it visible to the other thread, so that you can safely do what you are trying to do.
Otherwise, it's quite possible that the other thread will never actually see the flag change from the first thread.
Yes, it is necessary. The rule is that if two threads could potentially access the same memory at the same time, at least one of them is a writer, and the accesses are neither atomic nor otherwise ordered, then you have a data race. Any execution of a program with a data race has undefined behavior.
Relevant quotes from the C++14 standard:
1.10/23
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
1.10/6
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
Yes, it is necessary. Otherwise it is not guaranteed that changes to the bool in one thread will be observable in the other thread. In fact, if the compiler sees that the bool variable is, apparently, not ever used again in the execution thread that sets it, it might completely optimize away the code that sets the value of the bool.
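For illustration, a minimal sketch of the pattern from the question with std::atomic<bool>. The function compute_total_size is a stand-in for the real directory walk, and its return value is made up; run_with_progress is likewise a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Stand-in for the real directory walk; name and result are placeholders.
long compute_total_size() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 4096;
}

// Runs the computation on a worker thread while the caller polls `done`.
// Returns the computed size; `polls` receives how often the flag was found
// unset (this is where the question's program would print a dot).
long run_with_progress(int& polls) {
    std::atomic<bool> done{false};
    long result = 0;
    std::thread worker([&done, &result] {
        result = compute_total_size();
        // Release: the write to `result` above becomes visible to a thread
        // that sees `done == true` through an acquire load.
        done.store(true, std::memory_order_release);
    });
    polls = 0;
    while (!done.load(std::memory_order_acquire)) {
        ++polls;
        std::this_thread::sleep_for(std::chrono::milliseconds(5));
    }
    worker.join();
    assert(done.load());  // the worker has signalled completion
    return result;        // safe to read: ordered after the worker's write
}
```

With a plain bool in place of the atomic, the polling loop would be a data race, and the compiler would be entitled to read the flag once and spin forever. The default sequentially consistent store and load would work just as well as the explicit release/acquire pair used here.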

Is it always effective to read a variable in the multi-thread environment?

There is a variable (e.g. int temp;) in a multi-threaded environment.
Some threads write to it under the protection of a write lock,
while others read it without any lock.
My question is:
If some threads repeatedly write values to it that are all elements of a SET (e.g. {1, 2, 3}),
is the value always one of that SET when I read it?
The rule is very simple: if two or more threads access the same variable and at least one of those threads writes to it, you must synchronize all of those accesses. If you do not, the behavior is undefined.
volatile won't help here; either use a mutex or a condition variable, or make the variable itself atomic. (And "atomic" means C++11 atomic, not some selection of properties that someone thinks will act pretty well in multi-threaded applications).
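A minimal sketch of the atomic variant, assuming the set {1, 2, 3} from the question (write_values, read_values and run_check are illustrative names). Because every access goes through std::atomic<int>, each read returns a value that some thread actually stored.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// The shared variable from the question, made atomic.
std::atomic<int> temp{1};

// Repeatedly stores elements of the set {1, 2, 3}.
void write_values(int iterations) {
    assert(iterations >= 0);
    for (int i = 0; i < iterations; ++i) {
        temp.store(1 + i % 3);
    }
}

// Returns true if every value read was an element of {1, 2, 3}.
bool read_values(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        const int v = temp.load();
        if (v < 1 || v > 3) {
            return false;
        }
    }
    return true;
}

// Runs two writers and two readers concurrently. Returns true if no reader
// ever saw a value outside the set.
bool run_check(int iterations) {
    temp.store(1);
    std::atomic<bool> all_valid{true};
    std::vector<std::thread> threads;
    for (int t = 0; t < 2; ++t) {
        threads.emplace_back(write_values, iterations);
        threads.emplace_back([&all_valid, iterations] {
            if (!read_values(iterations)) {
                all_valid.store(false);
            }
        });
    }
    for (std::thread& th : threads) {
        th.join();
    }
    return all_valid.load();
}
```

The mutex variant looks the same, except that both the writers and the readers must take the lock; locking only on the write side, as in the question, still leaves the reads unsynchronized.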
Yes, it will, provided your variable's type is immutable, because a SET does not allow duplicates.
See SET.
If no one writes a value from outside your SET, the value will always be one from this SET. You may need to use volatile in your case.

are c++ pointers to user-defined objects thread safe for reading?

I can't find the answer but it's a simple question:
Is it safe for two threads to read the value of a pointer to a user-defined object in c++ at the same time with no locks or any other shenanigans?
Yes. Actually it is safe to read any values (of builtin type) concurrently.
Data races can only occur if a value is modified concurrently with some other thread using it. The key statements from the Standard for this are:
A data race is defined in §1.10/21:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other.
where conflicting is defined in §1.10/4:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
So you must use suitable synchronization between those reads and any writes.
It is always safe to read values from multiple threads. It's only when you're also writing to the data that you need to manage concurrent accesses.
The only possible issue for read-only data is ensuring that the value has, in fact, been initialized when the reading is done. If you initialize the value before you start your threads you'll be fine.
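A minimal sketch of that "initialize first, then start the threads" pattern, using a made-up user-defined type (Config, read_retries and sum_from_threads are illustrative names). All threads read through the same pointer and none of them modifies the object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical user-defined type.
struct Config {
    std::string name;
    int retries;
};

// Reads through the shared pointer; never modifies the object.
int read_retries(const Config* cfg) {
    assert(cfg != nullptr);
    return cfg->retries;
}

// Starts `thread_count` threads that all read the same object through the
// same pointer, and returns the sum of the values they read.
int sum_from_threads(int thread_count) {
    const Config config{"demo", 3};  // fully initialized before any thread starts
    const Config* shared = &config;
    const std::size_t n = static_cast<std::size_t>(thread_count);
    std::vector<int> results(n, 0);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        // Each thread writes only its own element of `results`.
        threads.emplace_back([shared, &results, i] {
            results[i] = read_retries(shared);
        });
    }
    for (std::thread& th : threads) {
        th.join();
    }
    int sum = 0;
    for (int r : results) {
        sum += r;
    }
    return sum;
}
```

Starting a std::thread synchronizes with the beginning of the new thread's function, so everything written before the thread was created is visible inside it. With four threads, sum_from_threads(4) returns 12.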
It is generally not thread-safe if the variable gets modified in one of the threads.
By thread-safe I suppose you mean to ask whether they have atomic writes. In C++03 this is not guaranteed, as C++03 doesn't really know about threads. In C++11 you have std::atomic, which has a partial specialization for pointer types (std::atomic<T*>).
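If the pointer itself does get written while other threads read it, std::atomic<T*> covers that case. A minimal sketch, with Widget, current and publish_and_read as made-up names: one thread publishes a pointer and another waits until it sees it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical user-defined type and the object being published.
struct Widget {
    int value;
};
Widget shared_widget{42};

// Pointer that one thread writes while another thread reads it.
std::atomic<Widget*> current{nullptr};

// Publishes the pointer from one thread and reads it from the caller.
int publish_and_read() {
    current.store(nullptr);
    std::thread producer([] {
        // Release: the Widget's contents are visible to whoever loads this
        // pointer with acquire ordering.
        current.store(&shared_widget, std::memory_order_release);
    });
    Widget* seen = nullptr;
    while ((seen = current.load(std::memory_order_acquire)) == nullptr) {
        std::this_thread::yield();  // not published yet, try again
    }
    producer.join();
    assert(seen == &shared_widget);
    return seen->value;
}
```

Only the pointer is atomic here. If threads went on to modify the Widget itself after publication, those accesses would need their own synchronization.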