I can't find the answer but it's a simple question:
Is it safe for two threads to read the value of a pointer to a user-defined object in C++ at the same time, with no locks or any other shenanigans?
Yes. In fact, it is safe to read any value (of built-in type) concurrently.
A data race can only occur if a value is modified concurrently with some other thread using it. The key statements from the Standard for this are:
A data race is defined in §1.10/21:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other.
where conflicting is defined in §1.10/4:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
So you must use suitable synchronization between those reads and any writes.
It is always safe to read values from multiple threads. It's only when you're also writing to the data that you need to manage concurrent accesses.
The only possible issue for read-only data is ensuring that the value has, in fact, been initialized by the time the reads happen. If you initialize the value before you start your threads, you'll be fine.
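A minimal sketch of that pattern (the Config type and its field are made up for illustration): the object is fully constructed before any reader thread starts, so every read is ordered after the initialization by the thread creation itself.

    #include <iostream>
    #include <thread>
    #include <vector>

    struct Config { int threshold; };   // hypothetical user-defined object

    int main() {
        Config cfg{42};                 // fully initialized before any thread is started
        const Config* ptr = &cfg;

        std::vector<std::thread> readers;
        for (int i = 0; i < 4; ++i) {
            // Every thread only reads through ptr; with no writers there is no data race.
            readers.emplace_back([ptr] { std::cout << ptr->threshold << '\n'; });
        }
        for (auto& t : readers) t.join();
    }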
It is generally not thread-safe if the variable gets modified in one of the threads.
By thread-safe I suppose you mean to ask whether writes to the pointer are atomic. In C++03 this is not guaranteed, as C++03 doesn't really know about threads. In C++11 you have std::atomic, which is specialized for pointer types.
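As a rough sketch of the C++11 route (the Widget type and the values here are illustrative), std::atomic<T*> makes concurrent reads and writes of the pointer itself data-race free:

    #include <atomic>
    #include <thread>

    struct Widget { int value; };

    Widget w{1};
    std::atomic<Widget*> current{nullptr};   // atomic pointer: concurrent load/store is well-defined

    int main() {
        std::thread writer([] { current.store(&w); });
        std::thread reader([] {
            // The reader may observe either nullptr or &w, but never a torn pointer.
            if (Widget* p = current.load()) { (void)p->value; }
        });
        writer.join();
        reader.join();
    }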
Related
I am building a very simple program as an exercise.
The idea is to compute the total size of a directory by recursively iterating over all its contents, and summing the sizes of all files contained in the directory (and its subdirectories).
To show the user that the program is still working, this computation is performed on another thread, while the main thread prints a dot . once every second.
Now the main thread of course needs to know when it should stop printing dots and can look up the result.
It is possible to use e.g. a std::atomic<bool> done(false); and pass this to the thread that will perform the computation, which will set it to true once it is finished. But I am wondering if in this simple case (one thread writes once completed, one thread reads periodically until nonzero) it is necessary to use atomic data types for this. Obviously if multiple threads might write to it, it needs to be protected. But in this case, there's only one writing thread and one reading thread.
Is it necessary to use an atomic data type here, or is it overkill and could a normal data type be used instead?
Yes, it's necessary.
The issue is that, without any synchronization, the compiler and the processor are free to keep their own cached view of the flag, so the two threads can end up with different views of the "same" data. Making the flag atomic guarantees that the write becomes visible to the reading thread, so you can safely do what you are trying to do.
Otherwise, it's quite possible that the other thread will never actually see the flag change from the first thread.
Yes, it is necessary. The rule is that if two threads could potentially access the same memory at the same time, and at least one of them is a writer, then you have a data race. Any execution of a program with a data race has undefined behavior.
Relevant quotes from the C++14 standard:
1.10/23
The execution of a program contains a data race if it contains two potentially concurrent conflicting actions, at least one of which is not atomic, and neither happens before the other, except for the special case for signal handlers described below. Any such data race results in undefined behavior.
1.10/6
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
Yes, it is necessary. Otherwise there is no guarantee that changes to the bool in one thread will be observable in the other thread. In fact, if the compiler sees that the bool is apparently never used again in the thread that sets it, it might optimize away the code that sets its value entirely.
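A minimal sketch of the pattern from the question, with the directory scan replaced by a placeholder computation (the names and the work itself are made up):

    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>

    int main() {
        std::atomic<bool> done{false};
        unsigned long long totalSize = 0;

        // Worker thread: do the long computation, publish the result, then raise the flag.
        std::thread worker([&] {
            unsigned long long sum = 0;
            for (int i = 0; i < 100000000; ++i) sum += i;   // stand-in for the directory scan
            totalSize = sum;                                 // written before the flag is set
            done.store(true, std::memory_order_release);
        });

        // Main thread: print a dot every second until the worker signals completion.
        while (!done.load(std::memory_order_acquire)) {
            std::cout << '.' << std::flush;
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
        worker.join();
        std::cout << "\nTotal size: " << totalSize << '\n';
    }

The release store paired with the acquire load also guarantees that totalSize, written before the flag, is visible to the main thread once it observes done == true, so the result itself does not need to be atomic.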
I know that reading from a shared variable in OpenMP does not cause a race condition, because every thread has its own copy of that variable.
But if the shared variable is a pointer (e.g. to a container), then every thread only gets a copy of the pointer.
If I now read from the location the pointer is pointing to (my container), can there be race conditions or does OpenMP somehow take care of this?
Is it better to share a copy of the container itself, instead of a pointer to it, among threads?
Just reading from a variable cannot produce a race condition: it doesn't matter whether the variable is shared or not. To produce a race condition you need at least one thread modifying a variable while another thread accesses that same instance at the same time.
Then, assuming that your threads are reading and modifying a certain variable, if you make this variable shared you will still have a race condition since all the threads share the same instance. I guess that in your first paragraph you wanted to say private, as #ilotXXI pointed out.
About your question on privatizing a pointer: if two or more instances of that pointer point to the same data and that data is modified, you will have a race condition (each thread has a private copy of the pointer but not a private copy of the data).
Note that changing from one data-sharing clause to another may change the behavior of your application. Thus, in general, when you are parallelizing an application, what you have to do first is to analyze which kind of data accesses your application is performing. Once you know that, you have to think which data-sharing clauses and which synchronization constructs (if needed) you should use to keep the original behavior of your application.
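As a rough sketch of those rules (the container and the loop are made up for illustration): reading through a shared pointer from every thread is fine, whereas a write through that same pointer would be exactly the kind of conflicting access that needs synchronization.

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<int> data(1000, 1);
        std::vector<int>* ptr = &data;   // shared: every thread sees the same pointer and the same data
        const int n = (int)ptr->size();

        long long total = 0;
        #pragma omp parallel for reduction(+:total)
        for (int i = 0; i < n; ++i) {
            total += (*ptr)[i];          // read-only access to the shared container: no race
            // (*ptr)[i] = 0;            // an unsynchronized write like this would be a race
        }
        std::printf("%lld\n", total);
    }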
What should I be concerned about as far as thread safety and undefined behavior goes in a situation where multiple threads are reading from a single source that is constant?
I am working on a signal processing model that allows for parallel execution of independent processes; these processes may share an input buffer, but the process that fills the input buffer will always be complete before the next stage of possibly parallel processes will execute.
Do I need to worry about thread safety issues in this situation? And what could I do about it?
I would like to note that a lock-free solution would be best, if possible.
but the process that fills the input buffer will always be complete before the next stage of possibly parallel processes will execute
If this is guaranteed then there is not a problem having multiple reads from different threads for const objects.
I don't have the official standard so the following is from n4296:
17.6.5.9 Data race avoidance
3 A C++ standard library function shall not directly or indirectly modify objects (1.10) accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function’s non-const arguments, including this.
4 [ Note: This means, for example, that implementations can’t use a static object for internal purposes without synchronization because it could cause a data race even in programs that do not explicitly share objects between threads. —end note ]
Here is the Herb Sutter video where I first learned about the meaning of const in the C++11 standard. (see around 7:00 to 10:30)
No, you are OK. Multiple reads from the same constant source are fine and do not pose any risk in any threading model I know of (namely, POSIX and Windows).
However,
but the process that fills the input buffer will always be complete
What are the guarantees here? How do you really know this is the case? Do you have any synchronization in place?
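For example, joining (or otherwise synchronizing with) the filling stage before the parallel stage starts gives exactly that guarantee. A sketch under that assumption (the buffer size and the per-thread processing are made up):

    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    int main() {
        std::vector<float> buffer(1024);

        // Stage 1: one thread fills the input buffer.
        std::thread producer([&] { std::iota(buffer.begin(), buffer.end(), 0.0f); });
        producer.join();   // the join makes the fill happen-before everything that follows

        // Stage 2: independent processes read the now-constant buffer in parallel, lock-free.
        const std::vector<float>& input = buffer;
        std::vector<float> partial(4, 0.0f);
        std::vector<std::thread> stage;
        for (int p = 0; p < 4; ++p) {
            stage.emplace_back([&, p] {
                for (std::size_t i = p; i < input.size(); i += 4)
                    partial[p] += input[i];   // reads are shared; each thread writes only its own slot
            });
        }
        for (auto& t : stage) t.join();
        std::cout << std::accumulate(partial.begin(), partial.end(), 0.0f) << '\n';
    }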
I wanted to know: is it safe to assume that if multiple threads are accessing a single static container (boost::unordered_map), there is no need to lock access to the container as long as the threads are only reading data from it and no writing is done?
When multiple threads are only reading and performing no write operation, you do not need to synchronize access.
Paragraph 1.10 of the C++11 Standard defines conflicting operations with respect to data races as:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
And then of course, per 1.10/21:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. [...]
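A small sketch of that situation (the map contents and the lookup key are made up): the map is fully populated before any reader thread is launched and never modified afterwards, so the lookups need no locking.

    #include <boost/unordered_map.hpp>
    #include <iostream>
    #include <string>
    #include <thread>
    #include <vector>

    // Populated once, before any threads are started, and never modified afterwards.
    static const boost::unordered_map<std::string, int> table = {
        {"alpha", 1}, {"beta", 2}, {"gamma", 3}
    };

    int main() {
        std::vector<std::thread> readers;
        for (int i = 0; i < 4; ++i) {
            readers.emplace_back([] {
                // find() is a const, read-only operation: concurrent calls do not conflict.
                auto it = table.find("beta");
                if (it != table.end()) std::cout << it->second << '\n';
            });
        }
        for (auto& t : readers) t.join();
    }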