Why is is_lock_free a member function? - c++

Why does is_lock_free require an instance (it's a member function)? Why not a metafunction of the type, or a static constexpr member function?
I am looking for an actual instance of why it is necessary.

The standard allows a type to be sometimes lock-free.
section 29.4 Lock-free property
The ATOMIC_..._LOCK_FREE macros indicate the lock-free property of the
corresponding atomic types, with the signed and unsigned variants
grouped together. The properties also apply to the corresponding
(partial) specializations of the atomic template. A value of 0
indicates that the types are never lock-free. A value of 1 indicates
that the types are sometimes lock-free. A value of 2 indicates that
the types are always lock-free.
The C++ atomics paper N2427 gives the rationale:
... The proposal provides run-time lock-free query functions rather
than compile-time constants because subsequent implementations of a
platform may upgrade locking operations with lock-free operations, so
it is common for systems to abstract such facilities behind dynamic
libraries, and we wish to leave that possiblility open. Furthermore,
we recommend that implementations without hardware atomic support use
that technique. ...
And also (as Jesse Good pointed out):
The proposal provides lock-free query functions on individual objects rather than whole types to permit unavoidably misaligned atomic variables without penalizing the performance of aligned atomic variables

Related

C++11 atomic<>: only to be read/written with provided methods?

I wrote some multithreaded but lock-free code that compiled and apparently executed fine on an earlier C++11-supporting GCC (7 or older). The atomic fields were ints and so on. To the best of my recollection, I used normal C/C++ operations to operate on them (a=1;, etc.) in places where atomicity or event ordering wasn't a concern.
Later I had to do some double-width CAS operations, and made a little struct with a pointer and counter as is common. I tried doing the same normal C/C++ operations, and errors came that the variable had no such members. (Which is what you'd expect from most normal templates, but I half-expected atomic to work differently, in part because normal assignments to and from were supported, to the best of my recollection, for ints.).
So, a two-part question:
1. Should we use the atomic methods in all cases, even (say) initialization done by one thread with no race conditions? 1a) so once declared atomic there's no way to access unatomically? 1b) we also have to use the verboser verbosity of the atomic<> methods to do so?
2. Otherwise, if for integer types at least we can use normal C/C++ operations, will those operations be the same as load()/store(), or are they merely normal assignments?
And a semi-meta question: is there any insight as to why normal C/C++ operations aren't supported on atomic<> variables? I'm not sure if the C++11 language as spec'd has the power to write code that does that, but the spec can certainly require the compiler to do things the language as spec'd isn't powerful enough to do.
You're maybe looking for C++20 std::atomic_ref<T> to give you the ability to do atomic ops on objects that can also be accessed non-atomically. Make sure your non-atomic T object is declared with sufficient alignment for atomic<T>. e.g.
alignas(std::atomic_ref<long long>::required_alignment)
long long sometimes_shared_var;
But that requires C++20, and nothing equivalent is available in C++17 or earlier. Once an atomic object is constructed, I don't think there's any guaranteed portable safe way to modify it other than its atomic member functions.
Its internal object representation isn't guaranteed by the standard, so using memcpy to get the struct sixteenbyte object out of an atomic<sixteenbyte> efficiently isn't guaranteed to be safe, even if no other thread has a reference to it. You'd have to know how a specific implementation stores it. Checking sizeof(atomic<T>) == sizeof(T) is a good sign, though, and mainstream implementations do in practice just have a T as the object-representation for atomic<T>.
Related: How can I implement ABA counter with c++11 CAS? for a nasty union hack ("safe" in GNU C++) to give efficient access to a single member, because compilers don't optimize foo.load().ptr to just atomically load that member. Instead GCC and clang will use lock cmpxchg16b to load the whole pointer+counter pair, then keep just the first member. C++20 atomic_ref<> should solve that.
Accessing members of atomic<struct foo>: one reason for not allowing shared.x = tmp; is that it's the wrong mental model. If two different threads are storing to different members of the same struct, how does the language define any ordering for what other threads see? Plus it was probably considered too easy for programmers to design their lockless algorithms incorrectly if stuff like that were allowed.
Also, how would you even implement that? Return an lvalue-reference? It can't be to the underlying non-atomic object. And what if the code captures that reference and keeps using it long after calling some function that's not load or store?
Remember that ISO C++'s ordering model works in terms of synchronizes-with, not in terms of local reordering and a single cache-coherent domain like the way real ISAs define their memory models. The ISO C++ model is always strictly in terms of reading, writing, or RMWing the entire atomic object. So a load of the object can always sync-with any store of the whole object.
In hardware that would actually still work for a store to one member and a load from a different member if the whole object is in one cache line, on real-world ISAs. At least I think so, although possibly not on some SMT systems. (Being in one cache line is necessary for lock-free atomic access to the whole object to be possible on most ISAs.)
we also have to use the verboser verbosity of the atomic<> methods to do so?
The member functions of atomic<T> include overloads of all the operators, including operator= (store) and cast back to T (load). a = 1; is equivalent to a.store(1, std::memory_order_seq_cst) for atomic<int> a; and is the slowest way to set a new value.
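As a small sketch of that equivalence, each operator form below is paired with the explicit member-function call it stands for. The function name operators_vs_member_functions is only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Each operator form is followed by its explicit equivalent; the operator
// forms always use std::memory_order_seq_cst.
inline int operators_vs_member_functions() {
    std::atomic<int> a{0};

    a = 1;                                      // operator=  : seq_cst store
    a.store(1, std::memory_order_seq_cst);      // explicit equivalent

    int x = a;                                  // conversion : seq_cst load
    int y = a.load(std::memory_order_seq_cst);  // explicit equivalent
    assert(x == y);

    a += 2;                                     // operator+= : seq_cst atomic RMW
    a.fetch_add(2, std::memory_order_seq_cst);  // explicit equivalent

    ++a;                                        // also an atomic RMW
    return a.load();                            // 1 + 2 + 2 + 1
}
```

The explicit forms are the ones that let you pass a weaker memory order when seq_cst is more than you need.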
Should we use the atomic methods in all cases, even (say) initialization done by one thread with no race conditions?
You don't have any choice, other than passing args to the constructors of std::atomic<T> objects.
You can use mo_relaxed loads / stores while your object is still thread-private, though. Avoid any RMW operators like +=. e.g. a.store(a.load(relaxed) + 1, relaxed); will compile about the same as for non-atomic objects of register-width or smaller.
(Except that it can't optimize away and keep the value in a register, so use local temporaries instead of actually updating the atomic object).
But for atomic objects too large to be lock-free, there's not really anything you can do efficiently except construct them with the right values in the first place.
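The advice about thread-private objects can be sketched as follows, assuming a std::atomic<long> counter that no other thread can reach yet. Both function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Sum 1..n into a counter that is still private to this thread.
inline void fill_while_private(std::atomic<long>& counter, int n) {
    assert(n >= 0);
    long local = 0;                  // plain temporary, free to live in a register
    for (int i = 1; i <= n; ++i) {
        local += i;
    }
    // One relaxed store instead of n atomic read-modify-write operations.
    counter.store(local, std::memory_order_relaxed);
}

// Separate relaxed load and relaxed store. This is NOT an atomic increment,
// so it is only correct while no other thread can access the object.
inline void bump_while_private(std::atomic<long>& counter) {
    counter.store(counter.load(std::memory_order_relaxed) + 1,
                  std::memory_order_relaxed);
}
```

Once the object is published to other threads, increments have to go back to fetch_add or another real RMW operation.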
The atomic fields were ints and so on. ...
and apparently executed fine
If you mean plain int, not atomic<int> then it wasn't portably safe.
Data-race UB doesn't guarantee visible breakage. The nasty thing about undefined behaviour is that happening to work in your test case is one of the things that's allowed to happen.
And in many cases with a pure load or pure store, it won't break, especially on strongly ordered x86, unless the load or store can be hoisted or sunk out of a loop (see: Why is integer assignment on a naturally aligned variable atomic on x86?). It'll eventually bite you when a compiler manages to do cross-file inlining and reorder some operations at compile time, though.
why normal C/C++ operations aren't supported on atomic<> variables?
... but the spec can certainly require the compiler to do things the language as spec'd isn't powerful enough to do.
This was in fact a limitation of C++11 through C++17. Most compilers have no problem with it: for example, the implementation of the <atomic> header in GCC and clang uses __atomic_ builtins, which take a plain T* pointer.
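To illustrate what those builtins look like, here is a minimal sketch that performs atomic operations on an ordinary long. It relies on the GNU extensions __atomic_fetch_add and __atomic_load_n, so it is specific to GCC and clang and is not ISO C++; the wrapper names are hypothetical.

```cpp
#include <cassert>

// GNU C/C++ extension (GCC and clang only); not portable ISO C++.
// Atomically adds 1 to an ordinary long and returns the previous value.
inline long atomic_increment_plain(long* counter) {
    assert(counter != nullptr);
    return __atomic_fetch_add(counter, 1L, __ATOMIC_SEQ_CST);
}

// Atomic seq_cst load from an ordinary long.
inline long atomic_read_plain(const long* counter) {
    assert(counter != nullptr);
    return __atomic_load_n(counter, __ATOMIC_SEQ_CST);
}
```

The same object can still be read and written as a plain long in phases where no other thread touches it, which is the capability std::atomic<T> itself does not expose.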
The C++20 proposal for atomic_ref is p0019, which cites as motivation:
An object could be heavily used non-atomically in well-defined phases
of an application. Forcing such objects to be exclusively atomic would
incur an unnecessary performance penalty.
3.2. Atomic Operations on Members of a Very Large Array
High-performance computing (HPC) applications use very large arrays. Computations with these arrays typically have distinct phases that allocate and initialize members of the array, update members of the array, and read members of the array. Parallel algorithms for initialization (e.g., zero fill) have non-conflicting access when assigning member values. Parallel algorithms for updates have conflicting access to members which must be guarded by atomic operations. Parallel algorithms with read-only access require best-performing streaming read access, random read access, vectorization, or other guaranteed non-conflicting HPC pattern.
All of these things are a problem with std::atomic<>, confirming your suspicion that this is a problem for C++11.
Instead of introducing a way to do non-atomic access to std::atomic<T>, they introduced a way to do atomic access to a T object. One problem with this is that atomic<T> might need more alignment than a T would get by default, so be careful.
Unlike with giving atomic access to members of T, you could plausibly have a .non_atomic() member function that returned an lvalue reference to the underlying object.

Why doesn't C++11 standard provide other lock free atomic structure [duplicate]

From C++ Concurrency in Action:
difference between std::atomic and std::atomic_flag is that std::atomic may not be lock-free; the implementation may have to acquire a mutex internally in order to ensure the atomicity of the operations
I wonder why. If atomic_flag is guaranteed to be lock-free, why isn't it guaranteed for atomic<bool> as well?
Is this because of the member function compare_exchange_weak? I know that some machines lack a single compare-and-exchange instruction, is that the reason?
First of all, you are perfectly allowed to have something like std::atomic<very_nontrivial_large_structure>, so std::atomic as such cannot generally be guaranteed to be lock-free (although most specializations for trivial types like bool or int probably could, on most systems). But that is somewhat unrelated.
The exact reasoning why atomic_flag and nothing else must be lock-free is given in the Note in N2427/29.3:
Hence the operations must be address-free. No other type requires lock-free operations, and hence the atomic_flag type is the minimum hardware-implemented type needed to conform to this standard. The remaining types can be emulated with atomic_flag, though with less than ideal properties.
In other words, it's the minimum thing that must be guaranteed on every platform, so it's possible to implement the standard correctly.
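The "emulated with atomic_flag" idea can be sketched as a tiny spinlock guarding a plain object. This is only an illustration under stated assumptions: Wide is a hypothetical payload taken to be too large for lock-free hardware atomics, and EmulatedAtomicWide is not how any particular standard library implements std::atomic.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical payload, assumed too large for lock-free hardware atomics.
struct Wide { long a, b, c, d; };

class EmulatedAtomicWide {
public:
    explicit EmulatedAtomicWide(Wide initial) : value_(initial) {}

    Wide load() {
        lock();
        Wide copy = value_;
        unlock();
        return copy;
    }

    void store(Wide desired) {
        lock();
        value_ = desired;
        unlock();
    }

private:
    void lock() {
        // test_and_set returns the previous state; spin until it was clear.
        while (flag_.test_and_set(std::memory_order_acquire)) { }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;  // the only lock-free piece
    Wide value_;                                // protected by flag_
};

// Usage sketch: a store followed by a load observes the stored value.
inline long round_trip_sum() {
    EmulatedAtomicWide w(Wide{0, 0, 0, 0});
    w.store(Wide{1, 2, 3, 4});
    Wide v = w.load();
    assert(v.a == 1);
    return v.a + v.b + v.c + v.d;
}
```

A spinning thread can be blocked indefinitely by a lock holder that gets descheduled, which is one of the "less than ideal properties" the note alludes to.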
The standard does not guarantee that atomic objects are lock-free. On a platform that doesn't provide lock-free atomic operations for a type T, std::atomic<T> objects may be implemented using a mutex, which wouldn't be lock-free. In that case, any containers using these objects in their implementation would not be lock-free either.
The standard provides a way to check whether a std::atomic<T> variable is lock-free: you can use var.is_lock_free() or atomic_is_lock_free(&var). For basic types such as int, there are also macros (e.g. ATOMIC_INT_LOCK_FREE) which specify whether lock-free atomic access to that type is available.
std::atomic_flag is an atomic boolean type. For a boolean, an implementation almost never needs a mutex or any other synchronization mechanism.

Difference between interlocked variable access (on boolean) and std::atomic_flag

I was wondering what the differences are between accessing a boolean value using Windows' interlockedXXX functions and using std::atomic_flag.
To my knowledge, both of them are lock-less and you can't set or read an atomic_flag directly. I wonder whether there are more differences.
std::atomic_flag serves basically as a primitive for building other synchronization primitives. In case one needs to set or read, it might make more sense to compare with std::atomic<bool>.
However, there are some additional (conceptual) differences:
With interlockedXXX, you won't get portable code.
interlockedXXX is a function, while std::atomic_flag (as well as std::atomic) is a type. That's a significant difference, since you can use interlockedXXX with any suitable memory location, such as an element of std::vector. By contrast, you cannot make a vector of C++ atomic flags or atomic bools, since the corresponding types do not meet the vector value type requirements. [1]
You can see the latter difference in the code created by @RmMm, where flag is an ordinary variable. I also added a case with atomic<bool> and you may notice that all the three variants produce the very same assembly:
https://godbolt.org/z/9xwRV6
[1] This problem should be addressed by std::atomic_ref in C++20.
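The container restriction mentioned above comes from std::atomic being neither copyable nor movable, which vector operations such as push_back and resize rely on. The sketch below checks that at compile time and shows one possible workaround, a fixed-size heap array of atomics; the function name count_set_flags is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

// Atomics cannot be copied or moved, so a vector cannot relocate them.
static_assert(!std::is_copy_constructible<std::atomic<bool>>::value,
              "atomics are not copyable");
static_assert(!std::is_move_constructible<std::atomic<bool>>::value,
              "atomics are not movable");

// Workaround sketch: an array whose size is fixed at allocation time.
// Sets every even-indexed flag and returns how many flags are set.
inline std::size_t count_set_flags(std::size_t n) {
    assert(n > 0);
    // The trailing () value-initializes the elements.
    std::unique_ptr<std::atomic<bool>[]> flags(new std::atomic<bool>[n]());
    for (std::size_t i = 0; i < n; ++i) {
        flags[i].store(i % 2 == 0, std::memory_order_relaxed);
    }
    std::size_t set = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (flags[i].load(std::memory_order_relaxed)) ++set;
    }
    return set;
}
```

The array can never grow, which is the price of storing objects that cannot be relocated.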
