I am writing a unit test for a class to test insertion when no memory is available. It relies on the fact that nbElementInserted is incremented after insert_edge has returned.
void test()
{
    adjacency_list a(true);
    MemoryVacuum no_memory_after_this_line;
    bool signalReceived = false;
    size_t nbElementInserted = 0;

    do
    {
        try
        {
            a.insert_edge( 0, 1, true ); // this should throw
            nbElementInserted++;
        }
        catch (std::bad_alloc&)
        {
            signalReceived = true;
        }
    } while (!signalReceived); // this loop is necessary because the
                               // memory vacuum only prevents new memory
                               // pages from being mapped, so the first
                               // allocations may succeed.

    CHECK_EQUAL( nbElementInserted, a.nb_edges() );
}
Now I am wondering which of the two statements is true:

1. Reordering can happen, in which case nbElementInserted can be incremented before insert_edge throws an exception, and that invalidates my test. Reordering would be allowed because the visible result for the user is the same if the two lines are permuted.

2. Reordering cannot happen, because insert_edge is a function and all the side effects of the function must be complete before execution moves to the next line. Throwing is a side effect.
Bonus point: if the correct answer is “yes reordering can happen”, is a memory barrier between the 2 lines sufficient to fix it?
No. In a single thread the compiler cannot reorder operations in any way that would change the observable behavior of the program (the "as-if" rule), and exceptions are not an exception to this rule. Reordering only comes into play in multithreaded or multiprocess scenarios.
Reordering becomes visible when two threads read and write to shared state. If thread A makes modifications to shared variables thread B can see those modifications out-of-order, or even not at all if it has the shared state cached. This can be due to optimizations in either thread A or thread B or both.
Thread A will always see its own modifications in-order, though. Each sequence point must happen in order, at least as far as the local thread is aware.
Let's say thread A executed this code:
a = foo() + bar();
b = baz;
Each ; introduces a sequence point. The compiler is allowed to call either foo() or bar() first, whichever it likes, since + does not introduce a sequence point. If you put printouts you might see foo() called first, or you might see bar() called first. Either one would be correct. It must call them before it assigns baz to b, though. If either foo() or bar() throws an exception b must retain its existing value.
However, if the compiler knew that foo() and bar() never throw, and their execution in no way depends on the value of b, it could reorder the two statements. It'd be a valid optimization. There would be no way for thread A to know that the statements had been reordered.
Thread B, on the other hand, would know. The problem in multithreaded programming is that sequence points don't apply to other threads. That's where memory barriers come in. Memory barriers are cross-thread sequence points, in a sense.
Related
I have a piece of multithreaded code that I'm not sure is free of data races caused by compiler reordering.
Here is a minimal example:
#include <thread>

int main()
{
    int x = 0;
    x = 5;
    auto t = std::thread([&x]()
    {
        ++x;
    });
    t.join();
    return 0;
}
Is the assignment x = 5 guaranteed to happen before the thread starts?
Short answer: The code will work as expected. No reordering will take place
Long answer:
Compile time reordering
Let's consider what's going on.
You put a variable in automatic storage (x)
You create an object that holds a reference to this variable (the lambda)
You pass that object to an external function (the thread constructor)
The compiler does escape analysis during optimization. Due to this sequence of events, the variable x has escaped once point 3 is reached. Which means from the compiler's point of view, any external function (except those marked as pure) may read or modify the variable. Therefore its value has to be stored to the stack before each external function call and has to be reloaded from the stack after the call.
You did not make x an atomic variable, so the compiler is free to ignore any potential multithreading effects. Therefore the value need not be reloaded from memory multiple times in between calls to external functions. It may still be reloaded if the compiler decides not to keep the value in a register in between uses.
Let's annotate and expand your source code to show it:
int main()
{
    int x = 0;
    x = 5; // stores on stack for use by external function in next line
    auto t = std::thread([&x]() mutable
    {
        ++x;
    });
    int x1 = x; // loads x from stack after thread constructor may (in theory) have modified it
    int x2 = x; // probably no reload because not an atomic variable
    x = 7; // new value stored on stack because join() could access it (in theory)
    t.join();
    int x3 = x; // reload from stack because join() could have changed it
    return 0;
}
Again, this has nothing to do with multithreading. Escape analysis and external function calls are sufficient.
Any access from main() between thread creation and joining would also be undefined behavior because it would be a data-race on a non-atomic variable. But that's just a side-note.
This takes care of the compiler behavior. But what about the CPU? May it reorder instructions?
Run time reordering
For this, we can look at the C++ standard Section 32.4.2.2 [thread.thread.constr] clause 7:
Synchronization: The completion of the invocation of the constructor synchronizes with the beginning of the invocation of the copy of f.
The constructor means the thread constructor. f is the thread function, meaning the lambda in your case. So this means that any memory effects are synchronized properly.
The join() call also synchronizes:

The completion of the thread represented by *this synchronizes with (6.9.2) the corresponding successful join() return.

Therefore access to x after the join cannot suffer from run-time reordering.
Side note
Contrary to what some comments suggested, the compiler will not optimize the thread creation away, for two reasons: 1. No compiler is sufficiently magical to figure this out. 2. The thread creation may fail, which is defined behavior, so it has to be included at runtime.
I'm reading the implementation of std::shared_ptr in libstdc++, and I noticed that libstdc++ has three locking policies: _S_single, _S_mutex, and _S_atomic (see here), and that the locking policy affects the specializations of class _Sp_counted_base (_M_add_ref and _M_release).
Below is the code snippet:
_M_release_last_use() noexcept
{
    _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
    _M_dispose();
    // There must be a memory barrier between dispose() and destroy()
    // to ensure that the effects of dispose() are observed in the
    // thread that runs destroy().
    // See http://gcc.gnu.org/ml/libstdc++/2005-11/msg00136.html
    if (_Mutex_base<_Lp>::_S_need_barriers)
    {
        __atomic_thread_fence (__ATOMIC_ACQ_REL);
    }

    // Be race-detector-friendly. For more info see bits/c++config.
    _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_weak_count);
    if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count,
                                               -1) == 1)
    {
        _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_weak_count);
        _M_destroy();
    }
}

template<>
inline bool
_Sp_counted_base<_S_mutex>::_M_add_ref_lock_nothrow() noexcept
{
    __gnu_cxx::__scoped_lock sentry(*this);
    if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1) == 0)
    {
        _M_use_count = 0;
        return false;
    }
    return true;
}
my questions are:
When using the _S_mutex locking policy, the __exchange_and_add_dispatch function may only guarantee atomicity, but may not guarantee that it is fully fenced, am I right?
and because of 1, is the purpose of '__atomic_thread_fence (__ATOMIC_ACQ_REL)' to ensure that when thread A invokes _M_dispose, no thread will invoke _M_destroy (so that thread A can never access a destroyed member (e.g. _M_ptr) inside _M_dispose)?
What puzzles me most is: if 1 and 2 are both correct, then why is there no need to add a thread fence before invoking _M_dispose? (Because the objects managed by _Sp_counted_base, and _Sp_counted_base itself, have the same problem when the reference count drops to zero.)
Some of this is answered in the documentation and at the URL shown in the code you quoted.
When using the _S_mutex locking policy, the __exchange_and_add_dispatch function may only guarantee atomicity, but may not guarantee that it is fully fenced, am I right?
Yes.
and because of 1, is the purpose of '__atomic_thread_fence (__ATOMIC_ACQ_REL)' to ensure that when thread A invokes _M_dispose, no thread will invoke _M_destroy (so that thread A can never access a destroyed member (e.g. _M_ptr) inside _M_dispose)?
No, that can't happen. When _M_release_last_use() starts executing _M_weak_count >= 1, and _M_destroy() function won't be called until after _M_weak_count is decremented, and so _M_ptr remains valid during the _M_dispose() call. If _M_weak_count==2 and another thread is decrementing it at the same time as _M_release_last_use() runs, then there is a race to see which thread decrements first, and so you don't know which thread will run _M_destroy(). But whichever thread it is, _M_dispose() already finished before _M_release_last_use() does its decrement.
The reason for the memory barrier is that _M_dispose() invokes the deleter, which runs user-defined code. That code could have arbitrary side effects, touching other objects in the program. The _M_destroy() function uses operator delete or an allocator to release the memory, which could also have arbitrary side effects (if the program has replaced operator delete, or the shared_ptr was constructed with a custom allocator). The memory barrier ensures that the effects of the deleter are synchronized with the effects of the deallocation. The library doesn't know what those effects are, but it doesn't matter, the code still ensures they synchronize.
What puzzles me most is that if 1 and 2 are both correct,
They are not.
It is well known that std::shared_ptr is not thread safe.
So it is easy to find a lot of crashing code samples in the web with simple programs to illustrate disadvantages of std::shared_ptr.
But I can't find any for the boost::atomic_shared_ptr
Does this mean that I can use boost::atomic_shared_ptr without any fear it can crash my app in the multithreaded environment?
(Of course the boost::atomic_shared_ptr can contain bugs. So I want to know - it is safe "by design" or not.)
You can use shared_ptr with the atomic_load/atomic_store free functions anyway; they have been available since C++11, even though they are superseded in C++20:
https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic
I can't find many examples myself, though I used it here: How to use boost::atomic_store with shared_ptr<T> and shared_ptr<const T>?, and here https://github.com/sehe/msghub/blob/std-over-boost/src/msghub.cpp#L25
Regardless of all this, none of it is going to make your code safe by itself, because you'll still be sharing the objects pointed to by the shared pointer, which leaves your code every bit as prone to data races as before.
You still need to add proper synchronization to your code.
std::atomic<std::shared_ptr<>> is thread-safe, but ONLY for reading and writing the value of the pointer itself. Its use does not make all uses of the shared_ptr thread-safe.
Consider this:
// using a single writer of data pointed to by foo for simplicity.
// contract: only set values from main thread (for example)
// set values, then call apply_changes when all your parameters
// are correct.
struct Foo
{
    void set_valueA(int x);
    void set_valueB(int x);
    //...
    void set_valueZ(int x);
    void apply_changes();
};

// using global data for simplicity, this could be a member of some class,
// or in any stable memory location in an application.
// More than one thread may reallocate or reset this object, hence the std::atomic.
std::atomic<std::shared_ptr<Foo>> global_ptr;

void some_function()
{
    // Set or reset global_ptr attributes
    if (!global_ptr.load())
        return;

    global_ptr.load()->set_valueA(42);  // a bad_alloc exception could happen here

    // what happens if global_ptr is set or reset
    // by another thread in between these two statements?

    global_ptr.load()->set_valueB(17);  // a bad_alloc exception may occur here.
    global_ptr.load()->apply_changes(); // undefined behavior as per contract of Foo
                                        // may happen here.
}
// for this reason, proper usage would be...
void some_function()
{
    // Set or reset global_ptr attributes.
    // Since global_ptr is atomic, it is guaranteed that this load will not race
    // and will always return a well-formed shared_ptr. In fact, that's the
    // one and only guarantee given by std::atomic<std::shared_ptr<>>.
    if (std::shared_ptr<Foo> local_copy = global_ptr.load())
    {
        // Note that local_copy is guaranteed to point to the same object until
        // end of scope.
        local_copy->set_valueA(42);
        local_copy->set_valueB(17);
        local_copy->apply_changes();
        // if global_ptr has changed, then memory will be released when
        // exiting scope.
    }
}
So, imho, some basic precautions must still be taken when using atomic shared pointers, for both read and write operations on the pointed-to data.
What is the simplest way to signal a background thread to stop executing?
I have used something like:
volatile bool Global_Stop = false;

void do_stuff() {
    while (!Global_Stop) {
        //...
    }
}
Is there anything wrong with this? I know for complex cases you might need "atomic" or mutexes, but for just boolean signaling this should work, right?
std::atomic is not only for "complex cases". It is for when you need to access something from multiple threads. There are many myths about volatile; the relevant fact is simply that volatile does not help when you need to access something from different threads. You need a std::atomic<bool>. Whether accessing a bool is atomic on your actual hardware does not matter, because as far as C++ is concerned it is not.
Yes there's a problem: that's not guaranteed to work in C++. But it's very simple to fix, so long as you're on at least C++11: use std::atomic<bool> instead, like this:
#include <atomic>

std::atomic<bool> Global_Stop{false}; // brace-initialization also works before C++17

void do_stuff() {
    while (!Global_Stop) {
        //...
    }
}
One problem is that the compiler is allowed to reorder memory accesses, so long as it can prove that it won't change the effect of the program:
int foo() {
    int i = 1;
    int j = 2;
    ++i;
    ++j;
    return i + j;
}
Here the compiler is allowed to increment j before i because it clearly won't change the effect of the program. In fact it can optimise the whole thing away into return 5;. So what counts as "won't change the effect of the program"? The answer is long and complex and I don't pretend to understand it all, but one part of it is that the compiler only has to worry about threads in certain contexts. If i and j were global variables instead of local variables, it could still reverse ++i and ++j, because it's allowed to assume there's only one thread accessing them unless you use certain threading primitives (such as a mutex).
Now when it comes to code like this:
while (!Global_Stop) {
    //...
}
If it can prove the code hidden in the comment doesn't touch Global_Stop, and there are no threading primitives such as a mutex, it can happily optimise it to:
if (!Global_Stop) {
    while (true) {
        //...
    }
}
If it can prove that Global_Stop is false at the start then it can even remove the if check!
Actually things are even worse than this, at least in theory. You see, if a thread is in the process of writing to a variable when another thread accesses it, then only part of that write might be observed, giving you a totally different value (e.g. you update i from 3 to 4 and the other thread reads 7). Admittedly that is unlikely with a bool, but the standard is even broader than this: the situation is undefined behaviour, so it could even crash your program or have some other weird, unexpected behaviour.
Yes, this will most likely work, but only "by accident". As #idclev463035818 already wrote correctly:
std::atomic is not for "complex cases". It is for when you need to access something from multiple threads.
So in this case you should use atomic<bool> instead of volatile. The fact that volatile has been part of that language long before the introduction of threads in C++11 should already be a strong indication that volatile was never designed or intended to be used for multi-threading. It is important to note that in C++ volatile is something fundamentally different from volatile in languages like Java or C# where volatile is in fact related to the memory model. In these languages a volatile variable is much like an atomic in C++.
In C++, volatile is used for what is often referred to as "unusual memory", where memory can be read or modified outside the current process (for example when using memory mapped I/O). volatile forces the compiler to execute all operations in the exact order as specified. This prevents some optimizations that would be perfectly legal for atomics, while also allowing some optimizations that are actually illegal for atomics. For example:
volatile int x;
int y;
volatile int z;
x = 1;
y = 2;
z = 3;
z = 4;
...
int a = x;
int b = x;
int c = y;
int d = z;
In this example, there are two assignments to z, and two read operations on x. If x and z were atomics instead of volatile, the compiler would be free to see the first store as irrelevant and simply remove it. Likewise it could just reuse the value returned by the first load of x, effectively generating code like int b = a. But since x and z are volatile, these optimizations are not possible. Instead, the compiler has to ensure that all volatile operations are executed in the exact order as specified, i.e., the volatile operations cannot be reordered with respect to each other. However, this does not prevent the compiler from reordering non-volatile operations. For example, the operations on y could freely be moved up or down - something that would not be possible if x and z were atomics. So if you were to try implementing a lock based on a volatile variable, the compiler could simply (and legally) move some code outside your critical section.
Last but not least it should be noted that marking a variable as volatile does not prevent it from participating in a data race. In those rare cases where you have some "unusual memory" (and therefore really require volatile) that is also accessed by multiple threads, you have to use volatile atomics.
Derived from this question and related to this question:
If I construct an object in one thread and then convey a reference/pointer to it to another thread, is it thread-unsafe for that other thread to access the object without explicit locking/memory barriers?
// thread 1
Obj obj;
anyLegalTransferDevice.Send(&obj);
while(1); // never let obj go out of scope

// thread 2
anyLegalTransferDevice.Get()->SomeFn();
Alternatively: is there any legal way to convey data between threads that doesn't enforce memory ordering with regards to everything else the thread has touched? From a hardware standpoint I don't see any reason it shouldn't be possible.
To clarify: the question concerns cache coherency, memory ordering and the like. Can thread 2 get and use the pointer before thread 2's view of memory includes the writes involved in constructing obj? To misquote Alexandrescu(?): "Could a malicious CPU designer and compiler writer collude to build a standard-conforming system that makes that break?"
Reasoning about thread safety can be difficult, and I am no expert on the C++11 memory model. Fortunately, however, your example is very simple. I have rewritten the example, because the constructor is irrelevant.
Simplified Example
Question: Is the following code correct? Or can the execution result in undefined behavior?
// Legal transfer of pointer to int without data race.
// The receive function blocks until send is called.
void send(int*);
int* receive();
// --- thread A ---
/* A1 */ int* pointer = receive();
/* A2 */ int answer = *pointer;
// --- thread B ---
int answer;
/* B1 */ answer = 42;
/* B2 */ send(&answer);
// wait forever
Answer: There may be a data race on the memory location of answer, and thus the execution results in undefined behavior. See below for details.
Implementation of Data Transfer
Of course, the answer depends on the possible and legal implementations of the functions send and receive. I use the following data-race-free implementation. Note that only a single atomic variable is used, and all memory operations use std::memory_order_relaxed. Basically this means that these functions do not restrict memory reorderings.
std::atomic<int*> transfer{nullptr};

void send(int* pointer) {
    transfer.store(pointer, std::memory_order_relaxed);
}

int* receive() {
    while (transfer.load(std::memory_order_relaxed) == nullptr) { }
    return transfer.load(std::memory_order_relaxed);
}
Order of Memory Operations
On multicore systems, a thread can see memory changes in a different order than other threads see them. In addition, both compilers and CPUs may reorder memory operations within a single thread for efficiency - and they do this all the time. Atomic operations with std::memory_order_relaxed do not participate in any synchronization and do not impose any ordering.
In the above example, the compiler is allowed to reorder the operations of thread B, and execute B2 before B1, because the reordering has no effect on the thread itself.
// --- valid execution of operations in thread B ---
int answer;
/* B2 */ send(&answer);
/* B1 */ answer = 42;
// wait forever
Data Race
C++11 defines a data race as follows (N3290 C++11 Draft): "The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior." And the term happens before is defined earlier in the same document.
In the above example, B1 and A2 are conflicting and non-atomic operations, and neither happens before the other. This is obvious, because I have shown in the previous section, that both can happen at the same time.
That's the only thing that matters in C++11. In contrast, the Java Memory Model also tries to define the behavior if there are data races, and it took them almost a decade to come up with a reasonable specification. C++11 didn't make the same mistake.
Further Information
I'm a bit surprised that these basics are not well known. The definitive source of information is the section Multi-threaded executions and data races in the C++11 standard. However, the specification is difficult to understand.
A good starting point are Hans Boehm's talks - e.g. available as online videos:
Threads and Shared Variables in C++11
Getting C++ Threads Right
There are also a lot of other good resources that I have mentioned elsewhere, e.g.:
std::memory_order - cppreference.com
There is no parallel access to the same data, so there is no problem:
Thread 1 starts execution of Obj::Obj().
Thread 1 finishes execution of Obj::Obj().
Thread 1 passes reference to the memory occupied by obj to thread 2.
Thread 1 never does anything else with that memory (soon after, it falls into an infinite loop).
Thread 2 picks-up the reference to memory occupied by obj.
Thread 2 presumably does something with it, undisturbed by thread 1 which is still infinitely looping.
The only potential problem is if Send didn't act as a memory barrier, but then it wouldn't really be a "legal transfer device".
As others have alluded to, the only way in which a constructor is not thread-safe is if something gets a pointer or reference to the object before the constructor has finished, and the only way that would occur is if the constructor itself has code that registers the this pointer with some type of container which is shared across threads.
Now in your specific example, Branko Dimitrijevic gave a good, complete explanation of why your case is fine. But in the general case, I'd say not to use an object until the constructor is finished, though I don't think there's anything "special" that doesn't happen until the constructor is finished. By the time it enters the (last) constructor in an inheritance chain, the object is pretty much fully "good to go", with all of its member variables initialized, etc. So it's no worse than any other critical-section work, but another thread would need to know about the object first, and the only way that happens is if you share this in the constructor itself. If you do that, make it the very last thing the constructor does.
It is only safe (sort of) if you wrote both threads and know that the first thread is not accessing the object while the second thread is. For example, if the constructing thread never accesses the object after passing the reference/pointer, you would be OK. Otherwise it is thread-unsafe. You could change that by making all methods that access data members (read or write) acquire a lock.
I only read this question now... I'll still post my comments:
Static Local Variable
There is a reliable way to construct objects when you are in a multi-threaded environment: using a function-local static variable (static local variable - CppCoreGuidelines).
From the above reference: "This is one of the most effective solutions to problems related to initialization order. In a multi-threaded environment the initialization of the static object does not introduce a race condition (unless you carelessly access a shared object from within its constructor)."
Also note from the reference, if the destruction of X involves an operation that needs to be synchronized you can create the object on the heap and synchronize when to call the destructor.
Below is an example I wrote to show the Construct On First Use Idiom, which is basically what the reference talks about.
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

class ThreadConstruct
{
public:
    ThreadConstruct(int a, float b) : _a{a}, _b{b}
    {
        std::cout << "ThreadConstruct construct start" << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(2));
        std::cout << "ThreadConstruct construct end" << std::endl;
    }

    void get()
    {
        std::cout << _a << " " << _b << std::endl;
    }

private:
    int _a;
    float _b;
};

struct Factory
{
    template<class T, typename ...ARGS>
    static T& get(ARGS&&... args)
    {
        //thread safe object instantiation
        static T instance(std::forward<ARGS>(args)...);
        return instance;
    }
};

//thread pool
class Threads
{
public:
    Threads()
    {
        for (size_t num_threads = 0; num_threads < 5; ++num_threads) {
            thread_pool.emplace_back(&Threads::run, this);
        }
    }

    void run()
    {
        //thread safe constructor call
        ThreadConstruct& thread_construct = Factory::get<ThreadConstruct>(5, 10.1);
        thread_construct.get();
    }

    ~Threads()
    {
        for (auto& x : thread_pool) {
            if (x.joinable()) {
                x.join();
            }
        }
    }

private:
    std::vector<std::thread> thread_pool;
};

int main()
{
    Threads thread;
    return 0;
}
Output:
ThreadConstruct construct start
ThreadConstruct construct end
5 10.1
5 10.1
5 10.1
5 10.1
5 10.1