Difference between the standard's atomic bool and atomic flag - C++

I wasn't aware of the std::atomic variables, but I was aware of the std::mutex provided by the standard (weird, right?); however, one thing caught my eye: there are two seemingly identical (to me) atomic types provided by the standard, listed below:
std::atomic<bool>
std::atomic_flag
The std::atomic_flag contains the following explanation:
std::atomic_flag is an atomic boolean type. Unlike all specializations of std::atomic, it is guaranteed to be lock-free. Unlike std::atomic<bool>, std::atomic_flag does not provide load or store operations.
which I fail to understand. Is std::atomic<bool> not guaranteed to be lock-free? Then it's not atomic or what?
So what's the difference between the two and when should I use which?

std::atomic bool type not guaranteed to be lock-free?
Correct. std::atomic may be implemented using locks.
then it's not atomic or what?
std::atomic is atomic whether it has been implemented using locks, or without. std::atomic_flag is guaranteed to be implemented without using locks.
So what's the difference between the two?
The primary difference besides the lock-free guarantee is:
std::atomic_flag does not provide load or store operations.
and when should I use which?
Usually, you will want to use std::atomic<bool> when you need an atomic boolean variable. std::atomic_flag is a low level structure that can be used to implement custom atomic structures.

std::atomic<T> guarantees that accesses to the variable will be atomic. However, it does not say how atomicity is achieved: it can be implemented with a lock-free technique, or with a lock. The actual implementation depends on your target architecture and the type T.
std::atomic_flag on the other hand is guaranteed to be implemented using a lock-free technique.

Related

What are the exact inter-thread reordering constraints on mutex.lock() and .unlock() in c++11 and up?

According to https://en.cppreference.com/w/cpp/atomic/memory_order mutex.lock() and mutex.unlock() are acquire and release operations. An acquire operation makes it impossible to reorder later instructions in front of it. And release operations make it impossible to reorder earlier instructions after it. This makes it such that the following code:
[Thread 1]
mutex1.lock();
mutex1.unlock();
mutex2.lock();
mutex2.unlock();
[Thread 2]
mutex2.lock();
mutex2.unlock();
mutex1.lock();
mutex1.unlock();
Can be reordered into the following (possibly deadlocking) code:
[Thread 1]
mutex1.lock();
mutex2.lock();
mutex1.unlock();
mutex2.unlock();
[Thread 2]
mutex2.lock();
mutex1.lock();
mutex2.unlock();
mutex1.unlock();
Is it possible for this reordering to occur, or is there a rule preventing it?
Almost a duplicate: How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release? - that's using hand-rolled std::atomic spinlocks, but the same reasoning applies:
The compiler can't compile-time reorder mutex acquire and release in ways that could introduce a deadlock where the C++ abstract machine doesn't have one. That would violate the as-if rule.
It would effectively be introducing an infinite loop in a place the source doesn't have one, violating this rule:
ISO C++ current draft, section 6.9.2.3 Forward progress
18. An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.
The ISO C++ standard doesn't distinguish compile-time vs. run-time reordering. In fact it doesn't say anything about reordering. It only says things about when you're guaranteed to see something because of synchronizes-with effects, and the existence of a modification order for each atomic object, and the total order of seq_cst operations. It's a misreading of the standard to take it as permission to nail things down into asm in a way that requires mutexes to be taken in a different order than source order.
Taking a mutex is essentially equivalent to an atomic RMW with memory_order_acquire on the mutex object. (And in fact the ISO C++ standard even groups them together in 6.9.2.3 :: 18 quoted above.)
You're allowed to see an earlier release or relaxed store or even RMW appear inside a mutex lock/unlock critical section instead of before it. But the standard requires an atomic store (or sync operation) to be visible to other threads promptly, so compile-time reordering to force it to wait until after a lock had been acquired could violate that promptness guarantee. So even a relaxed store can't compile-time / source-level reorder with a mutex.lock(), only as a run-time effect.
This same reasoning applies to mutex2.lock(). You're allowed to see reordering, but the compiler can't create a situation where the code requires that reordering to always happen, if that makes execution different from the C++ abstract machine in any important / long-term observable ways. (e.g. reordering around an unbounded wait). Creating a deadlock counts as one of those ways, whether for this reason or another. (Every sane compiler developer would agree on that, even if C++ didn't have formal language to forbid it.)
Note that mutex unlock can't block, so compile-time reordering of two unlocks isn't forbidden for that reason. (If there are no slow or potentially blocking operations in between). But mutex unlock is a "release" operation, so that's ruled out: two release stores can't reorder with each other.
And BTW, the practical mechanism for preventing compile-time reordering of mutex.lock() operations is just to make them regular function calls that the compiler doesn't know how to inline. It has to assume that functions aren't "pure", i.e. that they have side effects on global state, and thus the order might be important. That's the same mechanism that keeps operations inside the critical section: How does a mutex lock and unlock functions prevents CPU reordering?
An inlinable std::mutex written with std::atomic would end up depending on the compiler actually applying the rules about making operations visible promptly and not introducing deadlocks by reordering things at compile-time. As described in How C++ Standard prevents deadlock in spinlock mutex with memory_order_acquire and memory_order_release?
An acquire operation makes it impossible to reorder later instructions in front of it. And release operations make it impossible to reorder earlier instructions after it.
Locking a mutex is not a memory read, not a memory write, and not an instruction. It's an algorithm that has numerous internal ordering requirements. Operations that have ordering requirements themselves use some mechanism to ensure the requirements of that operation are followed regardless of what reordering is allowed by other operations that occur before or after it.
When considering whether two operations can be re-ordered, you have to obey the ordering constraints of both of the two operations. Mutex lock and unlock operations contain numerous internal operations that have their own ordering constraints. You can't just move the block of operations and assume you aren't violating the internal constraints of those operations.
If the mutex lock and unlock implementations on your platform don't have sufficient ordering constraints to ensure they work correctly when used as intended, those implementations are broken.

Any disadvantages for std::atomic_flag not providing load or store operations? (Spin-lock example)

Comparing a std::atomic_flag to an std::atomic_bool (aka std::atomic<bool>), it seems to me that a std::atomic_flag just has a simpler interface. It provides only testing+setting and clearing the flag while an std::atomic_bool also provides overloads to several operators.
One question of mine is about terminology: What is meant by "load or store operations"? Does it mean that it is not possible to arbitrarily read and modify a std::atomic_flag's value?
Furthermore, I am wondering, could a std::atomic_bool be faster when being used for a spin-lock? It seems to me that an std::atomic_flag always must read AND write during a spin-lock:
while (my_atomic_flag.test_and_set()); // spin-lock
while an std::atomic_bool would only have to perform a read operation (assuming that the atomic bool is implemented lock-free):
while (my_atomic_bool); // spin-lock
Is an std::atomic_flag strictly more efficient than an std::atomic_bool or could it also be the other way round? What should be used for a spin-lock?
What is meant by "load or store operations"? Does it mean that it is not possible to arbitrarily read and modify a std::atomic_flag's value?
The normal store/load operations are not supported on a std::atomic_flag.
It is a modify-only type; ie. you cannot read-access a std::atomic_flag object without performing a modifying operation.
In general, std::atomic_flag is meant as a building block for other operations. Its interface is deliberately simple; it is the only atomic type with guaranteed lock-free atomic operations.
The operations it supports are:
std::atomic_flag::clear()
std::atomic_flag::test_and_set()
With that, you can easily build your own spinlock (although generally not recommended):
class my_spinlock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock()
    {
        while (flag.test_and_set());
    }

    void unlock()
    {
        flag.clear();
    }
};
Furthermore, I am wondering, could a std::atomic_bool be faster when being used for a spin-lock? It seems to me that an std::atomic_flag always must read AND write during a spin-lock
Well, the thing is, a spinlock always has to modify its state when acquiring the lock. You simply cannot take a lock without telling others.
The implementation for lock() based on a std::atomic<bool> looks very similar:
while (flag.exchange(true));
Is a spinlock based on std::atomic_flag faster?
On my platform, the compiler emits the same assembly for both types so I would be very surprised.

Synchronization mode in mutex protected block

http://www.boost.org/doc/libs/1_58_0/doc/html/atomic/usage_examples.html
In the "Singleton with double-checked locking pattern" example of the above boost examples, are the memory_order_consume for the second load of _instance and the memory_order_release for the store of _instance necessary? I thought the scoped_lock has acquire and release semantics already and the first load of _instance has synchronization mode memory_order_consume.
Under the assumption that the boost primitives used here support the same functionality as their std counterparts, you are right about that: the second load does not require memory_order_consume, since it is guaranteed to be synchronized with the store/release by the acquire/release semantics of the mutex.
Perhaps the use of memory_order_consume was based on the false assumption that a load/relaxed may float up across the mutex/acquire barrier, but this is not possible per the mutex guarantees, and as such memory_order_relaxed is perfectly fine.
The store/release, on the other hand, is absolutely necessary, because it synchronizes with the first load/consume, which is not protected by the mutex.
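A sketch of the whole pattern with the std counterparts, under the same assumption that they match the boost primitives (the singleton class here is a stand-in, not the boost example's exact code):

```cpp
#include <atomic>
#include <mutex>

class singleton {
    static std::atomic<singleton*> instance;
    static std::mutex instance_mutex;
public:
    static singleton* get() {
        // Unprotected fast path: needs acquire to synchronize with the
        // release store below.
        singleton* p = instance.load(std::memory_order_acquire);
        if (!p) {
            std::lock_guard<std::mutex> guard(instance_mutex);
            // Second check runs under the mutex, so relaxed suffices.
            p = instance.load(std::memory_order_relaxed);
            if (!p) {
                p = new singleton;
                // Release publishes the fully constructed object to the
                // unprotected acquire load in other threads.
                instance.store(p, std::memory_order_release);
            }
        }
        return p;
    }
};

std::atomic<singleton*> singleton::instance{nullptr};
std::mutex singleton::instance_mutex;
```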

Does C++11 std::atomic guarantee mutual exclusion as well as sequential consistency?

I believe the answer is yes, much like in Java.
Please correct me if I am wrong.
If I need to just use mutual exclusion, I can use std::mutex and others.
What if I need just sequential consistency and not mutual exclusion? What can be used for that?
Yes -- see std::atomic with memory_order_seq_cst for sequential consistency.
The individual operations performed on objects of type std::atomic<whatever> are atomic, but that's as far as it goes. So std::atomic<int>::fetch_add() is atomic. But
std::atomic<int> x;
...
int tmp = x.load();
tmp += 1;
x.store(tmp);
is sequentially consistent but not atomic as a whole: another thread can modify x between the load and the store.

C++ std::atomic vs. Boost atomic

In my application, I have an int and a bool variable, which are accessed (multiple write/read) by multiple threads. Currently, I am using two mutexes, one for int and one for bool to protect those variables.
I heard about using atomic variables and operators to write lock-free multi-thread program. My questions are
1. What's the definition of atomic variables and operators?
2. What's the main difference between std::atomic and boost/atomic.hpp? Which one is more standard or popular?
3. Are these libraries platform-dependent? I am using GNU GCC 4.6 on Linux at the moment, but ideally it should be cross-platform. I heard that the definition of "atomic" actually depends on the hardware as well. Can anyone explain that too?
4. What's the best way to share a bool variable among multiple threads? I would prefer not to use the "volatile" keyword.
5. Is this code thread-safe?
double double_m; // double_m is only accessed by current thread.
std::atomic<bool> atomic_bool_x;
atomic_bool_x = true && (double_m > 12.5);
int int_n; // int_n is only accessed by current thread.
std::atomic<int> atomic_int_x;
std::atomic<int> atomic_int_y;
atomic_int_y = atomic_int_x * int_n;
I'm not an expert or anything, but here's what I know:
std::atomic simply says that calling load and store (and a few other operations) concurrently is well-defined. An atomic operation is indivisible - nothing can happen 'in-between'.
I assume std::atomic is based on boost::atomic. If you can, use std; otherwise use boost.
They are both portable, with the std version being completely so; however, your compiler will need to support C++11.
Likely std::atomic_bool. You should not need to use volatile.
Also, I believed load/store differed from operator=/operator T in that only load/store were atomic. Never mind: I checked the standard, and it appears the operators are defined in terms of load/store/etc., although they may return different things.
Further reading:
http://en.cppreference.com/w/cpp/atomic/atomic
C++11 Standard
C++ Concurrency in Action
Volatile is orthogonal to what you use to implement atomics. In C++ it tells the compiler that it is not safe to perform certain optimizations with that variable. Herb Sutter lays it out:
To safely write lock-free code that communicates between threads without using locks, prefer to use ordered atomic variables: Java/.NET volatile, C++0x atomic, and C-compatible atomic_T.
To safely communicate with special hardware or other memory that has unusual semantics, use unoptimizable variables: ISO C/C++ volatile. Remember that reads and writes of these variables are not necessarily atomic, however.
Finally, to express a variable that both has unusual semantics and has any or all of the atomicity and/or ordering guarantees needed for lock-free coding, only the ISO C++0x draft Standard provides a direct way to spell it: volatile atomic.
(from http://drdobbs.com/article/print?articleId=212701484&siteSectionName=parallel)
See std::atomic class template
std::atomic is standard since C++11, and the Boost stuff is older. But since it is standard now, I would prefer std::atomic.
You can use std::atomic with any C++11 compiler on any platform you want.
Without any further information, std::atomic.
I believe std::atomic (C++11) and boost.atomic are equivalent. If std::atomic is not supported by your compiler yet, use boost::atomic.