http://www.boost.org/doc/libs/1_58_0/doc/html/atomic/usage_examples.html
In the "Singleton with double-checked locking pattern" example of the above boost examples, are the memory_order_consume for the second load of _instance and the memory_order_release for the store of _instance necessary? I thought the scoped_lock has acquire and release semantics already and the first load of _instance has synchronization mode memory_order_consume.
Under the assumption that the boost primitives used here support the same functionality as their std counterparts, you are right that the second load does not require memory_order_consume: it is guaranteed to be synchronized with the store/release by the acquire/release semantics of the mutex.
Perhaps the use of memory_order_consume was based on the false assumption that a load/relaxed may float up across the mutex/acquire barrier, but that is not possible per the mutex guarantees, so memory_order_relaxed is perfectly fine.
The store/release, on the other hand, is absolutely necessary, because it synchronizes with the first load/consume, which is not protected by the mutex.
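Under those assumptions, the std version of the pattern would look roughly like this (a sketch, not the boost code itself; memory_order_acquire stands in for the consume on the first load, since mainstream implementations strengthen consume to acquire anyway):
#include <atomic>
#include <mutex>

class Singleton {
public:
    static Singleton* instance()
    {
        // First load is not protected by the mutex, so it must
        // synchronize with the release store below.
        Singleton* p = _instance.load(std::memory_order_acquire);
        if (!p) {
            std::lock_guard<std::mutex> guard(_mutex);
            // Second load: the mutex acquire already synchronizes with
            // the release of the thread that stored under the same
            // mutex, so relaxed suffices here.
            p = _instance.load(std::memory_order_relaxed);
            if (!p) {
                p = new Singleton;
                // Release is necessary: it pairs with the unprotected
                // first load in other threads.
                _instance.store(p, std::memory_order_release);
            }
        }
        return p;
    }

private:
    static std::atomic<Singleton*> _instance;
    static std::mutex _mutex;
};

std::atomic<Singleton*> Singleton::_instance(nullptr);
std::mutex Singleton::_mutex;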
According to https://en.cppreference.com/w/cpp/atomic/memory_order mutex.lock() and mutex.unlock() are acquire and release operations. An acquire operation makes it impossible to reorder later instructions in front of it. And release operations make it impossible to reorder earlier instructions after it. This makes it such that the following code:
[Thread 1]
mutex1.lock();
mutex1.unlock();
mutex2.lock();
mutex2.unlock();
[Thread 2]
mutex2.lock();
mutex2.unlock();
mutex1.lock();
mutex1.unlock();
Can be reordered into the following (possibly deadlocking) code:
[Thread 1]
mutex1.lock();
mutex2.lock();
mutex1.unlock();
mutex2.unlock();
[Thread 2]
mutex2.lock();
mutex1.lock();
mutex2.unlock();
mutex1.unlock();
Is it possible for this reordering to occur, or is there a rule preventing it?
Almost a duplicate: How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release? - that's using hand-rolled std::atomic spinlocks, but the same reasoning applies:
The compiler can't compile-time reorder mutex acquire and release in ways that could introduce a deadlock where the C++ abstract machine doesn't have one. That would violate the as-if rule.
It would effectively be introducing an infinite loop in a place the source doesn't have one, violating this rule:
ISO C++ current draft, section 6.9.2.3 Forward progress
18. An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
The ISO C++ standard doesn't distinguish compile-time vs. run-time reordering. In fact it doesn't say anything about reordering. It only says things about when you're guaranteed to see something because of synchronizes-with effects, and the existence of a modification order for each atomic object, and the total order of seq_cst operations. It's a misreading of the standard to take it as permission to nail things down into asm in a way that requires mutexes to be taken in a different order than source order.
Taking a mutex is essentially equivalent to an atomic RMW with memory_order_acquire on the mutex object. (And in fact the ISO C++ standard even groups them together in 6.9.2.3 :: 18 quoted above.)
You're allowed to see an earlier release or relaxed store or even RMW appear inside a mutex lock/unlock critical section instead of before it. But the standard requires an atomic store (or sync operation) to be visible to other threads promptly, so compile-time reordering to force it to wait until after a lock had been acquired could violate that promptness guarantee. So even a relaxed store can't compile-time / source-level reorder with a mutex.lock(), only as a run-time effect.
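A sketch of that constraint (hypothetical names, not code from the question):
#include <atomic>
#include <mutex>

std::atomic<int> ready(0); // another thread may spin-wait on this
std::mutex m;

void writer()
{
    // At run time, other threads may observe this store "inside" the
    // critical section, which is allowed. But the compiler must not
    // sink it below m.lock(): another thread could already hold the
    // lock and be spinning until ready == 1, so delaying the store
    // until after the lock is acquired would create a deadlock that
    // the C++ abstract machine doesn't have.
    ready.store(1, std::memory_order_relaxed);

    m.lock();
    // ... critical section ...
    m.unlock();
}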
This same reasoning applies to mutex2.lock(). You're allowed to see reordering, but the compiler can't create a situation where the code requires that reordering to always happen, if that makes execution different from the C++ abstract machine in any important / long-term observable ways. (e.g. reordering around an unbounded wait). Creating a deadlock counts as one of those ways, whether for this reason or another. (Every sane compiler developer would agree on that, even if C++ didn't have formal language to forbid it.)
Note that mutex unlock can't block, so compile-time reordering of two unlocks isn't forbidden for that reason. (If there are no slow or potentially blocking operations in between). But mutex unlock is a "release" operation, so that's ruled out: two release stores can't reorder with each other.
And BTW, the practical mechanism for preventing compile-time reordering of mutex.lock() operations is just to make them regular function calls that the compiler doesn't know how to inline. It has to assume that functions aren't "pure", i.e. that they have side effects on global state, and thus the order might be important. That's the same mechanism that keeps operations inside the critical section: How does a mutex lock and unlock functions prevents CPU reordering?
An inlinable std::mutex written with std::atomic would end up depending on the compiler actually applying the rules about making operations visible promptly and not introducing deadlocks by reordering things at compile-time. As described in How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release?
An acquire operation makes it impossible to reorder later instructions in front of it. And release operations make it impossible to reorder earlier instructions after it.
Locking a mutex is not a memory read, not a memory write, and not an instruction. It's an algorithm that has numerous internal ordering requirements. Operations that have ordering requirements themselves use some mechanism to ensure the requirements of that operation are followed regardless of what reordering is allowed by other operations that occur before or after it.
When considering whether two operations can be re-ordered, you have to obey the ordering constraints of both of the two operations. Mutex lock and unlock operations contain numerous internal operations that have their own ordering constraints. You can't just move the block of operations and assume you aren't violating the internal constraints of those operations.
If the mutex lock and unlock implementations on your platform don't have sufficient ordering constraints to ensure they work correctly when used as intended, those implementations are broken.
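For illustration, here is a minimal spinlock sketch (a model, not any platform's actual mutex implementation) whose lock and unlock carry exactly those internal constraints:
#include <atomic>

class SpinLock {
public:
    void lock()
    {
        // Acquire RMW: spins until it wins the flag. Nothing from the
        // critical section may be hoisted above this operation.
        while (_flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait
        }
    }
    void unlock()
    {
        // Release store: nothing from the critical section may sink
        // below this operation.
        _flag.clear(std::memory_order_release);
    }

private:
    std::atomic_flag _flag = ATOMIC_FLAG_INIT;
};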
I need a very fast (in the sense "low cost for reader", not "low latency") change notification mechanism between threads in order to update a read cache:
The situation
Thread W (Writer) updates a data structure (S) (in my case a setting in a map) only once in a while.
Thread R (Reader) maintains a cache of S and does read this very frequently. When Thread W updates S Thread R needs to be notified of the update in reasonable time (10-100ms).
Architecture is ARM, x86 and x86_64. I need to support C++03 with gcc 4.6 and higher.
Code
is something like this:
// variables shared between threads
bool updateAvailable;
SomeMutex dataMutex;
std::string myData;
// variables used only in Thread R
std::string myDataCache;
// Thread W
dataMutex.Lock();
myData = "newData";
updateAvailable = true;
dataMutex.Unlock();
// Thread R
if(updateAvailable)
{
    dataMutex.Lock();
    myDataCache = myData;
    updateAvailable = false;
    dataMutex.Unlock();
}
}
doSomethingWith(myDataCache);
My Question
In Thread R no locking or barriers occur in the "fast path" (no update available).
Is this an error? What are the consequences of this design?
Do I need to qualify updateAvailable as volatile?
Will R get the update eventually?
My understanding so far
Is it safe regarding data consistency?
This looks a bit like "Double Checked Locking". According to http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html a memory barrier can be used to fix it in C++.
However the major difference here is that the shared resource is never touched/read in the Reader fast path. When updating the cache, the consistency is guaranteed by the mutex.
Will R get the update?
Here is where it gets tricky. As I understand it, the CPU running Thread R could cache updateAvailable indefinitely, effectively moving the read way before the actual if statement.
So the update could take until the next cache flush to become visible, for example when another thread or process is scheduled.
Use C++ atomics and make updateAvailable a std::atomic<bool>. The reason is that it's not just the CPU that can see an old version of the variable: the compiler, which doesn't see the side effects of other threads, may never bother to refetch it, so you would never see the updated value in that thread. Additionally, this way you get a guaranteed atomic read, which you don't have if you just read the value.
Other than that, you could potentially get rid of the lock: if, for example, the producer only ever produces data when updateAvailable is false, you can drop the mutex because std::atomic<> enforces proper ordering of the reads and writes (see the sketch below). If that's not the case, you'll still need the lock.
You do have to use a memory fence here. Without the fence, there is no guarantee updates will ever be seen on the other thread. In C++03 you have the option of either using platform-specific ASM code (mfence on Intel; no idea about ARM) or using OS-provided atomic set/get functions.
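For completeness, here is a sketch of the lock-free variant mentioned above, assuming a single writer that only publishes while updateAvailable is false (C++11; doSomethingWith is the function from the question):
#include <atomic>
#include <string>

std::atomic<bool> updateAvailable(false);
std::string myData;        // written by Thread W only while updateAvailable is false
std::string myDataCache;   // used only by Thread R

// Thread W
if (!updateAvailable.load(std::memory_order_acquire)) // previous update consumed?
{
    myData = "newData";                                      // plain write
    updateAvailable.store(true, std::memory_order_release);  // publish it
}

// Thread R
if (updateAvailable.load(std::memory_order_acquire))
{
    myDataCache = myData; // safe: this acquire pairs with W's release
    updateAvailable.store(false, std::memory_order_release); // hand the slot back
}
doSomethingWith(myDataCache);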
Do I need to qualify updateAvailable as volatile?
As volatile doesn't interact with the threading model in C++, you should use atomics to make your program strictly standard-conformant:
On C++11 or newer, the preferable way is to use atomic<bool> with memory_order_relaxed store/load:
atomic<bool> updateAvailable;

// Writer
....
updateAvailable.store(true, std::memory_order_relaxed); // set (under mutex locked)

// Reader
if(updateAvailable.load(std::memory_order_relaxed)) // check
{
    ...
    updateAvailable.store(false, std::memory_order_relaxed); // clear (under mutex locked)
    ....
}
gcc since 4.7 supports similar functionality with its atomic builtins.
As for gcc 4.6, it seems there is no strictly-conformant way to avoid fences when accessing the updateAvailable variable. In practice, a memory fence is usually much faster than your 10-100ms time frame, so you can use gcc's own atomic builtins:
int updateAvailable = 0;

// Writer
...
__sync_fetch_and_or(&updateAvailable, 1); // set to non-zero
....

// Reader
if(__sync_fetch_and_and(&updateAvailable, 1)) // check, but never change
{
    ...
    __sync_fetch_and_and(&updateAvailable, 0); // clear
    ...
}
Is it safe regarding data consistency?
Yes, it is safe. Your reasoning is absolutely correct here:
the shared resource is never touched/read in the Reader fast path.
This is NOT double-checked locking!
That is explicitly stated in the question itself.
When updateAvailable is false, the Reader thread uses the variable myDataCache, which is local to the thread (no other threads use it). With a double-checked locking scheme, all threads use the shared object directly.
Why memory fences/barriers are NOT NEEDED here
The only variable accessed concurrently is updateAvailable. The myData variable is accessed with mutex protection, which provides all needed fences. myDataCache is local to the Reader thread.
When the Reader thread sees updateAvailable to be false, it uses the myDataCache variable, which is changed by the thread itself. Program order guarantees correct visibility of changes in that case.
As for the visibility guarantees for the updateAvailable variable, the C++11 standard provides such guarantees for atomic variables even without fences. 29.3 p13 says:
Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.
Jonathan Wakely has confirmed in chat that this paragraph applies even to memory_order_relaxed accesses.
If I am accessing shared memory for reading only, to check a condition for an if() block, should I still lock the mutex? E.g.
mutex_lock();
if (var /* shared memory */) {
}
mutex_unlock();
Is locking here needed and good practice?
If the variable you are reading could be written to concurrently, then yes, you should acquire a lock on the mutex.
You could only read it atomically if your compiler provides you with the necessary primitives for that; this could be either the atomic features that come with C11 and C++11 or some other language extension provided by your compiler. Then you could move the mutex acquisition into the conditional, but if you wait until after the test to acquire the mutex then someone else may change it between the time you test it and the time you acquire the mutex:
if (example) {
// "example" variable could be changed here by another thread.
mutex_lock();
// Now the condition might be false!
mutex_unlock();
}
Therefore, I would suggest acquiring the mutex before the conditional, unless profiling has pinpointed mutex acquisition as a bottleneck. (And in the case where the tested variable is larger than a CPU register -- a 64-bit number on a 32-bit CPU, for example -- you don't even have the option of delaying mutex acquisition without some other kind of atomic fetch or compare primitive.)
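With C11/C++11 atomics, that pattern could look like this (a sketch; var, m, and the guarded work are placeholders, and the relaxed re-check assumes writers only modify var while holding the mutex):
#include <atomic>
#include <mutex>

std::atomic<bool> var(false); // the tested condition, now an atomic
std::mutex m;

void reader()
{
    if (var.load(std::memory_order_acquire)) // cheap, lock-free test
    {
        std::lock_guard<std::mutex> lock(m);
        // Re-check under the lock: the condition may have changed
        // between the test above and acquiring the mutex.
        if (var.load(std::memory_order_relaxed))
        {
            // ... do the guarded work ...
        }
    }
}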
I have seen some repeated code (methods, to be precise) where they are entering the critical section and then using InterlockedExchange... Does this make sense, since I thought that this operation was in fact atomic and would not require such synchronization?
{
    EnterCritSectionLock lock (somelock);
    InterlockedExchange(&somelong, static_cast<long>(newlongVal));
}
That is basically what there is...
A normal exchange is generally not atomic. It is, however, OK to do it while owning a mutex, if all other uses are protected by the same mutex. It is also OK to use an atomic exchange, if all other uses are atomic. The only logical reason I can think of to do an atomic exchange while owning the mutex is that not all uses of this value are mutex-protected.
A single atomic operation won't need a CS, but it can act as a fence to make anything altered while the lock is held globally visible (IIRC, explicit fences are for SSE2+, but interlocked ops don't need SSE at all); however, then it would need to come after any global stores.
Where this might make sense is when the CS is used to lock access to something else, and the global being exchanged is not covered by the lock.
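A sketch of that situation, using std::atomic as a stand-in for InterlockedExchange (all names here are hypothetical):
#include <atomic>
#include <mutex>

std::mutex stateLock;          // protects otherState, but not somelong
long otherState = 0;
std::atomic<long> somelong(0); // also read by threads that never lock

void update(long newlongVal)
{
    std::lock_guard<std::mutex> lock(stateLock);
    otherState += 1; // the mutex-protected part of the update
    // The exchange is atomic so that lock-free readers of somelong
    // never see a torn value; the mutex alone wouldn't protect them.
    somelong.exchange(newlongVal);
}

long lockFreeReader()
{
    return somelong.load(); // never takes stateLock
}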
I have a parent and a worker thread that share a bool flag and a std::vector. The parent only reads (i.e., reads the bool or calls my_vector.empty()); the worker only writes.
My questions:
Do I need to mutex protect the bool flag?
Can I say that all bool read/writes are inherently atomic operations? If you say Yes or No, where did you get your information from?
I recently heard about GCC Atomic-builtin. Can I use these to make my flag read/writes atomic without having to use mutexes? What is the difference? I understand Atomic builtins boil down to machine code, but even mutexes boil down to CPU's memory barrier instructions right? Why do people call mutexes an "OS-level" construct?
Do I need to mutex protect my std::vector? Recall that the worker thread populates this vector, whereas the parent only calls empty() on it (i.e., only reads it)
I do not believe mutex protection is necessary for either the bool or the vector. I rationalize as follows: "Ok, if I read the shared memory just before it was updated.. that's still fine, I will get the updated value the next time around. More importantly, I do not see why the writer should be blocked while the reader is reading, because after all, the reader is only reading!"
If someone can point me in the right direction, that would be just great. I am on GCC 4.3, and Intel x86 32-bit.
Thanks a lot!
Do I need to mutex protect the bool flag?
Not necessarily; an atomic instruction would do. By atomic instruction I mean a compiler intrinsic function that a) prevents compiler reordering/optimization, b) results in an atomic read/write, and c) issues an appropriate memory fence to ensure visibility between CPUs (not necessary for current x86 CPUs, which employ the MESI cache coherency protocol). Similar to gcc atomic builtins.
Can I say that all bool read/writes are inherently atomic operations? If you say Yes or No, where did you get your information from?
Depends on the CPU. For Intel CPUs - yes. See Intel® 64 and IA-32 Architectures Software Developer's Manuals.
I recently heard about GCC Atomic-builtin. Can I use these to make my flag read/writes atomic without having to use mutexes? What is the difference? I understand Atomic builtins boil down to machine code, but even mutexes boil down to CPU's memory barrier instructions right? Why do people call mutexes an "OS-level" construct?
The difference between atomics and mutexes is that the latter can put the waiting thread to sleep until the mutex is released. With atomics you can only busy-spin.
Do I need to mutex protect my std::vector? Recall that the worker thread populates this vector, whereas the parent only calls empty() on it (i.e., only reads it)
You do.
I do not believe mutex protection is necessary for either the bool or the vector. I rationalize as follows, "Ok, if I read the shared memory just before it was updated.. thats still fine, I will get the updated value the next time around. More importantly, I do not see why the writer should be blocked while the reading is reading, because afterall, the reader is only reading!"
Depending on the implementation, vector.empty() may involve reading two buffer begin/end pointers and subtracting or comparing them, hence there is a chance that you read a new version of one pointer and an old version of another one without a mutex. Surprising behaviour may ensue.
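To make that concrete, here is a simplified, hypothetical sketch of such an implementation:
// Simplified, hypothetical sketch of a vector's internals:
template <class T>
struct VectorSketch {
    T* begin_;
    T* end_;
    bool empty() const { return begin_ == end_; } // two separate loads
};
// A concurrent push_back can move end_ (and, after a reallocation,
// begin_ as well) between the reader's two loads, so empty() may
// combine pointers from two different states of the vector.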
From the C++11 standard's point of view, you have to protect the bool with a mutex, or alternatively use std::atomic<bool>. Even when you are sure that your bool is read and written atomically anyway, there is still the chance that the compiler can optimize away accesses to it, because it does not know about other threads that could potentially access it.
If for some reason you absolutely need the latest bit of performance of your platform, consider reading the "Intel 64 and IA-32 Architectures Software Developer's Manual", which will tell you how things work under the hood on your architecture. But of course, this will make your program unportable.
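Under C++11, that advice would look roughly like this (a sketch; the worker/parent split mirrors the question):
#include <atomic>
#include <mutex>
#include <vector>

std::atomic<bool> flag(false); // safe to read/write without the mutex
std::vector<char> my_vector;   // every access goes through the mutex
std::mutex vec_mutex;

// Worker thread: writes the vector, then sets the flag.
void worker()
{
    {
        std::lock_guard<std::mutex> lock(vec_mutex);
        my_vector.push_back('x');
    }
    flag.store(true);
}

// Parent thread: reads the flag lock-free, locks only for the vector.
bool parent_sees_data()
{
    if (!flag.load())
        return false;
    std::lock_guard<std::mutex> lock(vec_mutex);
    return !my_vector.empty(); // safe: the writer holds the same mutex
}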
Answers:
You will need to protect the bool (or any other variable for that matter) that has the possibility of being operated on by two or more threads at the same time. You can either do this with a mutex or by operating on the bool atomically.
Bool reads and bool writes may be atomic operations, but two sequential operations are certainly not (e.g., a read and then a write). More on this later.
Atomic builtins provide a solution to the problem above: the ability to read and write a variable in a step that cannot be interrupted by another thread. This makes the operation atomic.
If you are using the bool flag as your 'mutex' (that is, only the thread that sets the bool flag to true has permission to modify the vector) then you're OK. The mutual exclusion is managed by the boolean, and as long as you're modifying the bool using atomic operations you should be all set.
To answer this, let me use an example:
bool flag(false);
std::vector<char> my_vector;
while (true)
{
    if (flag == false) // check to see if the mutex is owned
    {
        flag = true; // obtain ownership of the flag (the mutex)
        // manipulate the vector
        flag = false; // release ownership of the flag
    }
}
In the above code, in a multithreaded environment, it is possible for the thread to be preempted between the if statement (the read) and the assignment (the write), which means it is possible for two (or more) threads with this kind of code to both "own" the mutex (and the rights to the vector) at the same time. This is why atomic operations are crucial: they ensure that in the above scenario the flag will only be set by one thread at a time, therefore ensuring the vector will only be manipulated by one thread at a time.
Note that setting the flag back to false need not be an atomic operation, because this thread is the only one with rights to modify it.
A rough (read: untested) solution may look something like:
bool flag(false);
std::vector<char> my_vector;
while (true)
{
    // check to see if the mutex is owned and obtain ownership if possible
    if (__sync_bool_compare_and_swap(&flag, false, true))
    {
        // manipulate the vector
        flag = false; // release ownership of the flag
    }
}
The documentation for the atomic builtin reads:
The “bool” version returns true if the comparison is successful and newval was written.
Which means the operation will check to see if flag is false and, if it is, set the value to true. If the value was false, true is returned; otherwise, false. All of this happens in one atomic step, so it is guaranteed not to be preempted by another thread.
I don't have the expertise to answer your entire question but your last bullet is incorrect in cases in which reads are non-atomic by default.
A context switch can happen anywhere: the reader can get context-switched partway through a read, the writer can get switched in and do the full write, and then the reader would finish its read. The reader would see neither the first value nor the second value, but potentially some wildly inaccurate intermediate value.