Correct usage of volatile with std::atomic_ref<T>

I'm having trouble wrapping my head around the correct usage of std::atomic_ref<int> with volatile.
Naively there are three possibilities:
std::atomic_ref<volatile int> ref1;
volatile std::atomic_ref<int> ref2;
volatile std::atomic_ref<volatile int> ref3;
When do we want to use each one? The use case I'm interested in is MMIO.

Unlike std::atomic<T>, std::atomic_ref<T> does not have volatile-qualified methods. So, you probably can't do much with a volatile std::atomic_ref<T> (whether T is itself volatile or not).
This makes sense given the quote
Like language references, constness is shallow for atomic_ref - it is possible to modify the referenced value through a const atomic_ref object.
Assuming cv-qualification is somewhat consistent, a shallowly-volatile atomic_ref is unlikely to be useful, and definitely isn't what you're asking for.
So, you want
std::atomic_ref<volatile int>
Note that it may be sufficient to just use std::atomic_ref<int>, but since the standard doesn't make any explicit guarantees about MMIO, you should probably consult your compiler documentation and/or check the code it generates.
Depending on std::atomic in this way is at least not portable. Specifically, this answer and its linked paper mention some ways in which std::atomic may be inadequate - you can check whether these are actual problems for you.
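To illustrate the first point of this answer, here is a minimal sketch showing that a volatile std::atomic<int> is usable because std::atomic has volatile-qualified overloads of its member functions. The names counter and store_and_reload are made up for this example. The commented-out part shows the corresponding std::atomic_ref code which, as described above, is not expected to compile.

```cpp
#include <atomic>
#include <cassert>

// std::atomic<T> declares volatile-qualified overloads of its member
// functions, so a volatile std::atomic<int> object can be used directly.
volatile std::atomic<int> counter{0};

// Hypothetical helper: stores a value and reads it back through the
// volatile-qualified overloads of store() and load().
int store_and_reload(int value)
{
    counter.store(value, std::memory_order_relaxed);
    int seen = counter.load(std::memory_order_relaxed);
    assert(seen == value); // single-threaded here, so the read sees the write
    return seen;
}

// By contrast, std::atomic_ref<T> (C++20) only has const-qualified member
// functions, so the equivalent call on a volatile atomic_ref is rejected:
//
//     int x = 0;
//     volatile std::atomic_ref<int> ref{x};
//     ref.store(1);   // error: no volatile-qualified overload
```

This says nothing about whether the generated code is suitable for MMIO; that still has to be checked against the compiler documentation and the emitted instructions.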

Related

Use constructor in place of atomic.store() when atomicity is not currently needed

I use std::atomic for atomicity. Still, somewhere in the code, atomicity is not needed by program logic. In this case, I'm wondering whether it is OK, both pedantically and practically, to use constructor in place of store() as an optimization. For example,
// p.store(nullptr, std::memory_order_relaxed);
new(p) std::atomic<node*>(nullptr);

In accord with the standard, whether this works depends entirely on the implementation of std::atomic<T>. If it is lock-free for that T, then the implementation probably just stores a T. If it isn't lock-free, things get more complex, since it may store a mutex or some other thing.
The thing is, you don't know what std::atomic<T> stores. This matters because if it stores a const-qualified object or a reference type, then reusing the storage here will cause problems. The pointer returned by placement-new can certainly be used, but if a const or reference type is used, the original object name p cannot.
Why would std::atomic<T> store a const or reference type? Who knows; my point is that, because its implementation is not under your control, then pedantically you cannot know how any particular implementation behaves.
As for "practically", it's unlikely that this will cause a problem. Especially if the atomic<T> is always lock-free.
That being said, "practically" should also include some notion of how other users will interpret this code. While people experienced with doing things like reusing storage will be able to understand what the code is doing, they will likely be puzzled by why you're doing it. That means you'll need to either stick a comment on that line or make a (template) function non_atomic_reset.
Also, it should be noted that std::shared_ptr uses atomic increments/decrements for its reference counter. I bring that up because there is no std::single_threaded_shared_ptr that doesn't use atomics, or a special constructor that doesn't use atomics. So even in cases where you're using shared_ptr in pure single-threaded code, those atomics are still firing. This was considered a reasonable tradeoff by the C++ standards committee.
Atomics aren't cheap, but (most of the time) they're not so expensive that using unusual mechanisms like this to bypass an atomic store is a good idea. As always, profile to see if the code obfuscation is worth it.
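If you do go this way, the non_atomic_reset helper suggested above could look roughly like the sketch below. The node type and the exact signature are assumptions made for this example, and the caller has to guarantee that no other thread can touch the atomic while it is being re-created.

```cpp
#include <atomic>
#include <cassert>
#include <new>
#include <type_traits>

struct node { int value; };  // hypothetical payload type

// Hypothetical helper named after the suggestion above. It re-creates the
// atomic in place instead of calling store(). Only valid while no other
// thread can access `target`.
template <typename T>
std::atomic<T>* non_atomic_reset(std::atomic<T>& target, T value)
{
    static_assert(std::is_trivially_destructible<std::atomic<T>>::value,
                  "the storage is reused without running a destructor");
    // Placement-new starts the lifetime of a new atomic in the same storage.
    // The returned pointer is what the caller should use afterwards.
    return new (static_cast<void*>(&target)) std::atomic<T>(value);
}

// Usage sketch: the reset value is visible through the returned pointer.
void reset_example()
{
    node n{1};
    std::atomic<node*> p{&n};
    std::atomic<node*>* fresh = non_atomic_reset<node*>(p, nullptr);
    assert(fresh->load(std::memory_order_relaxed) == nullptr);
}
```

Wrapping the placement-new in a named function at least documents the intent, which addresses the readability concern; it does not change the pedantic caveats about what std::atomic<T> stores internally.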

Is a fundamental type volatile initialization an observable behavior?

Consider this function:
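#include <new>  // placement new

void f(void* loc)
{
    // Create a volatile int holding 42 in the storage at loc...
    auto p = new(loc) volatile int{42};
    // ...then immediately overwrite it through the volatile lvalue.
    *p = 0;
}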
I have checked the code generated by Clang, GCC and CL; none of them elides the initialization. (The write may be visible to the hardware.)
Is it an extension provided by compilers to the standard? Does the standard allow compilers not to perform the write 42?
Actually, for objects of class type, it is specified that the constructor of an object is executed without consideration for the volatile qualifier [class.ctor]:
A constructor can be invoked for a const, volatile or const volatile object. const and volatile
semantics (10.1.7.1) are not applied on an object under construction. They come into effect when the
constructor for the most derived object (4.5) ends.

[intro.execution]/8 lists the minimum requirements for a conforming implementation; these are also known as “observable behavior”. The first requirement is that “Access to volatile objects are evaluated strictly according to the rules of the abstract machine.” The compiler is required to produce all observable behavior. In particular, it is not allowed to remove accesses to volatile objects. And note that “object” here is used in the compiler-writer’s sense: it includes built-in types.

This is not a coherent question because what it means for a compiler to perform a write is platform-specific. There is no platform-independent notion of performing a write other than perhaps seeing the effects of a write in a subsequent read.
As you see, typical compilers on x86 will emit a write instruction but no memory barrier. The CPU may reorder the write, coalesce it, or even avoid doing any write to main memory because of the way the platform's cache coherence works.
The reason they made this implementation choice is that it makes volatile work for a broad range of applications, including those where the standard requires it to work, and because it has acceptable performance consequences. The standard, being platform-neutral, doesn't dictate platform-specific decisions like this and compiler writers do not understand it to do that.
They could have forced every volatile access to be uncoalescable, un-reorderable, and pushed through the cache subsystem to main memory. But that would provide terrible performance and, on this platform, no significant benefits. So they don't do it, and they don't understand the C++ standard to suggest that there's some mythical observer on the memory bus who must see specific things. The very existence of a memory bus is platform-specific. The standard is not platform-specific.
You will sometimes see people argue, for example, that the standard somehow requires the compiler to issue instructions to do volatile writes in order but that it doesn't matter if the CPU coalesces or re-orders the writes. This is, frankly, silly. The C++ standard doesn't impose requirements on the instructions compilers generate but rather on what those instructions must actually do when executed. It doesn't distinguish between optimizations done by a CPU and optimizations done by a compiler and any such distinctions would be platform-specific anyway.
If the standard allows a CPU to re-order two writes, then it allows the compiler to re-order them. It does not, and cannot, make that kind of distinction. Of course, compiler writers may still decide that they will issue the writes in order even though the CPU can re-order them, because that may make the most sense on their platform.

std::is_trivially_default_constructible and volatile types

Looking at the answer to this question:
std::is_trivially_copyable - Why are volatile scalar types not trivially copyable?
I see that the standard says a type is not trivially copyable if it has a volatile member, even when that member is a basic type. This appears to be due to a need to ensure that volatile members are updated atomically, and the byte-by-byte copy implied by trivial copyability can be violated.
But doesn't this also imply that is_trivially_default_constructible should fail on these types?
It seems that this is the correct check to see if a class can "correctly" be zero initialized (memset-able) in the same way is_trivially_copyable is used to detect memcpy-ability. And the memset may be non-atomic for the same basic reason.
However maybe this assumption is wrong. Is it incorrect to assume that a trivially constructible class is memset-initializable? If it is incorrect, what is the correct check to determine this?
To be a little clearer, I'm not assuming that is_trivially_default_constructible actually tells me anything about the semantic correctness of memset constructibility, zero initializability, or any of that. I just mean that it can tell me if that is possible in the same way as memcpy-ability. Maybe is_trivially_copyable is sufficient for all cases, since construction as a buffer is logically equivalent to copying some (possibly static or algorithmically definable) buffer into the object after construction?
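For experimenting with this, here is a small sketch that just queries the two traits for a few made-up types (Plain, WithVolatile and WithInit are hypothetical names). The static_asserts only cover results I am confident about; the trivially-copyable result for the volatile member is deliberately printed rather than asserted, since it has changed between standard revisions and may differ between compilers.

```cpp
#include <cstdio>
#include <type_traits>

struct Plain        { int v; };
struct WithVolatile { volatile int v; };
struct WithInit     { int v = 1; };  // default member initializer

// The default constructor of a class with only a scalar member is trivial,
// whether or not that member is volatile.
static_assert(std::is_trivially_default_constructible<Plain>::value, "");
static_assert(std::is_trivially_default_constructible<WithVolatile>::value, "");
// A default member initializer makes the default constructor non-trivial,
// while the class stays trivially copyable: the two traits are independent.
static_assert(!std::is_trivially_default_constructible<WithInit>::value, "");
static_assert(std::is_trivially_copyable<WithInit>::value, "");

// Prints both traits for T so the results can be compared across compilers.
template <typename T>
void report(const char* name)
{
    std::printf("%-12s default-constructible(trivially): %d  copyable(trivially): %d\n",
                name,
                static_cast<int>(std::is_trivially_default_constructible<T>::value),
                static_cast<int>(std::is_trivially_copyable<T>::value));
}

// Reports all three example types.
void report_all()
{
    report<Plain>("Plain");
    report<WithVolatile>("WithVolatile");
    report<WithInit>("WithInit");
}
```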

Is pointer assignment atomic in C++?

I've actually heard claims both ways. I suspect they are not, but I wanted to get the topic settled.

C++03 does not know about the existence of threads, so the concept of atomicity doesn't make much sense for C++03; it simply says nothing about it.
C++11 does know about threads, but once again doesn't say anything about the atomicity of assigning raw pointers. However, C++11 does contain std::atomic<T*>, which is guaranteed to be atomic.
Note that even if writing to a raw pointer is atomic on your platform, the compiler is still free to move that assignment around, so that doesn't really buy you anything.
If you need to write to a pointer which is shared between threads, use either std::atomic<T*> (or the not yet official boost::atomic<T*>, GCC's atomic intrinsics or Windows Interlocked*) or wrap all accesses to that pointer in mutexes.
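A minimal sketch of the std::atomic<T*> option, assuming one side publishes a fully initialised object and the other side reads the pointer. Config, g_config, publish and current are hypothetical names invented for this example.

```cpp
#include <atomic>
#include <cassert>

struct Config { int verbosity; };  // hypothetical shared payload

// The shared pointer itself is the atomic object.
std::atomic<Config*> g_config{nullptr};

// Writer side: finish initialising *fresh first, then publish the pointer.
// The release store keeps the initialisation from being moved after it.
void publish(Config* fresh)
{
    assert(fresh != nullptr);
    g_config.store(fresh, std::memory_order_release);
}

// Reader side: returns nullptr until something has been published.
// The acquire load pairs with the release store in publish().
Config* current()
{
    return g_config.load(std::memory_order_acquire);
}
```

The default memory order (sequentially consistent) would also work here; release/acquire is only spelled out to show where the ordering guarantee comes from.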

The C++ standard does not define specific threading behavior. Depending on the compiler and the platform, the pointer assignment may or may not be atomic.

Strict pointer aliasing: is access through a 'volatile' pointer/reference a solution?

On the heels of a specific problem, a self-answer and comments to it, I'd like to understand if it is a proper solution, workaround/hack or just plain wrong.
Specifically, I rewrote code:
T x = ...;
if (*reinterpret_cast <int*> (&x) == 0)
...
As:
T x = ...;
if (*reinterpret_cast <volatile int*> (&x) == 0)
...
with a volatile qualifier to the pointer.
Let's just assume that treating T as int in my situation makes sense. Does accessing the object through a volatile reference solve the pointer aliasing problem?
For a reference, from specification:
[ Note: volatile is a hint to the implementation to avoid aggressive
optimization involving the object because the value of the object might
be changed by means undetectable by an implementation. See 1.9 for
detailed semantics. In general, the semantics of volatile are intended
to be the same in C++ as they are in C. — end note ]
EDIT:
The above code did solve my problem at least on GCC 4.5.

Volatile can't help you avoid undefined behaviour here. So, if it works for you with GCC it's luck.
Let's assume T is a POD. Then, the proper way to do this is
T x = …;
int i;
memcpy(&i,&x,sizeof i);
if (i==0)
…
There! No strict aliasing problem and no memory alignment problem. GCC even handles memcpy as an intrinsic function (no function call is inserted in this case).
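A complete version of the same idea, as a sketch: T is taken to be float here purely for illustration, and the expected bit patterns in check_examples assume IEEE 754 single precision. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copies the object representation of a float into an integer of the same
// size. memcpy avoids both the strict aliasing and the alignment problem.
std::uint32_t bits_of(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}

// Equivalent of the `*reinterpret_cast<int*>(&x) == 0` test from the question.
bool is_all_zero_bits(float x)
{
    return bits_of(x) == 0;
}

// Usage sketch, assuming IEEE 754 single precision floats.
void check_examples()
{
    assert(is_all_zero_bits(0.0f));
    assert(!is_all_zero_bits(1.0f));
}
```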

"Volatile can't help you avoid undefined behaviour here."
Well, anything regarding volatile is somewhat unclear in the standard. I mostly agreed with your answer, but now I would like to slightly disagree.
The standard's description of volatile is not clear to most people, including some compiler writers. It is better to think of it this way:
when using volatile (and only when), C/C++ is pretty much high level assembly.
When writing to a volatile lvalue, the compiler will issue a STORE, or multiple STOREs if one is not enough (volatile does not imply atomic).
When reading from a volatile lvalue, the compiler will issue a LOAD, or multiple LOADs if one is not enough.
Of course, where there is no explicit LOAD or STORE, the compiler will just issue instructions which imply a LOAD or STORE.
sellibitze gave the best solution: use memcpy for bit reinterpretations.
But if all accesses to a memory region are done with volatile lvalues, it is perfectly clear that the strict aliasing rules do not apply. This is the answer to your question.
