How to safely zero std::array?

How to safely zero std::array? - c++

I'm trying to safely zero a std::array in a class destructor. From safely, I mean I want to be sure that compiler never optimize this zeroing. Here is what I came with:
template<size_t SZ>
struct Buf {
~Buf() {
auto ptr = static_cast<volatile uint8_t*>(buf_.data());
std::fill(ptr, ptr + buf_.size(), 0);
}
std::array<uint8_t, SZ> buf_{};
};
is this code working as expected? Will that volatile keyword prevent optimizing by compiler in any case?

The C++ standard itself doesn't make explicit guarantees. It says:
[dcl.type.cv]
The semantics of an access through a volatile glvalue are implementation-defined. ...
[Note 5: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation.
Furthermore, for some implementations, volatile might indicate that special hardware instructions are required to access the object.
See [intro.execution] for detailed semantics.
In general, the semantics of volatile are intended to be the same in C++ as they are in C.
— end note]
Despite the lack of guarantees by the C++ standard, over-writing the memory through a pointer to volatile is one way that some crypto libraries clear memory - at least as a fallback when system specific function isn't available.
P.S. I recommend using const_cast instead, in order to avoid accidentally casting to a different type rather than differently qualified same type:
auto ptr = const_cast<volatile std::uint8_t*>(buf_.data());
Implicit conversion also works:
volatile std::uint8_t* ptr = buf_.data();
System specific functions for this purpose are SecureZeroMemory in windows and explicit_bzero in some BSDs and glibc.
The C11 standard has an optional function memset_s for this purpose and it may be available for you in C++ too but isn't of course guaranteed to be available.
There is a proposal P1315 to introduce similar function to the C++ standard.
Note that secure erasure is not the only consideration that has to be taken to minimise possibility of leaking sensitive data. For example, operating system may swap the memory onto permanent storage unless instructed to not do so. There's no standard way to make such instruction in C++. There's mlock in POSIX and VirtualLock in windows.

Related

Is it legal to optimize away stores/construction of volatile stack variables?

I noticed that clang and gcc optimize away the construction of or assignment to a volatile struct declared on the stack, in some scenarios. For example, the following code:
struct nonvol2 {
uint32_t a, b;
};
void volatile_struct2()
{
volatile nonvol2 temp = {1, 2};
}
Compiles on clang to:
volatile_struct2(): # #volatile_struct2()
ret
On the other hand, gcc does not remove the stores, although it does optimize the two implied stores into a single one:
volatile_struct2():
movabs rax, 8589934593
mov QWORD PTR [rsp-8], rax
ret
Oddly, clang won't optimize away a volatile store to a single int variable:
void volatile_int() {
volatile int x = 42;
}
Compiles to:
volatile_int(): # #volatile_int()
mov dword ptr [rsp - 4], 1
ret
Furthermore a struct with 1 member rather than 2 is not optimized away.
Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:
typedef struct {
volatile uint32_t a, b;
} vol2;
void volatile_def2()
{
vol2 temp = {1, 2};
vol2 temp2 = {1, 2};
temp.a = temp2.a;
temp.a = temp2.a;
}
simply compiles down to a simple ret.
While it seems entirely "reasonable" to remove these stores which are pretty much impossible to observe by any reasonable process, my impression was that in the standard volatile loads and stores are assumed to be part of the observable behavior of the program (in addition to calls to IO functions), full stop. The implication being they are not subject to removal by "as if", since it would by definition change the observable behavior of the program.
Am I wrong about that, or is clang breaking the rules here? Perhaps construction is excluded from the cases where volatile must be assumed to have side effects?

From the point of view of the Standard, there is no requirement that implementations document anything about how any objects are physically stored in memory. Even if an implementation documents the behavior of using pointers of type unsigned char* to access objects of a certain type, an implementation would be allowed to physically store data some other way and then have the code for character-based reads and writes adjust behaviors suitably.
If an execution platform specifies a relationship between abstract-machine objects and storage seen by the CPU, and defines ways by which accesses to certain CPU addresses might trigger side effects the compiler doesn't know about, a quality compiler suitable for low-level programming on that platform should generate code where the behavior of volatile-qualified objects is consistent with that specification. The Standard makes no attempt to mandate that all implementations be suitable for low-level programming (or any other particular purpose, for that matter).
If the address of an automatic variable is never exposed to outside code, a volatile qualifier need only have only two effects:
If setjmp is called within a function, a compiler must do whatever is necessary to ensure that longjmp will not disrupt the values of any volatile-qualified objects, even if they were written between the setjmp and longjmp. Absent the qualifier, the value of objects written between setjmp and longjmp would become indeterminate when a longjmp is executed.
Rules which would allow a compiler to presume that any loops which don't have side effects will run to completion do not apply in cases where a volatile object is accessed within the loop, whether or not an implementation would define any means by which such access would be observable.
Except in those cases, the as-if rule would allow a compiler to implement the volatile qualifier in the abstract machine in a way that has no relation to the physical machine.

Let us investigate what the standard directly says. The behavior of volatile is defined by a pair of statements. [intro.execution]/7:
The least requirements on a conforming implementation are:
Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
...
And [intro.execution]/14:
Reading an object designated by a volatile glvalue (6.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment.
Well, [intro.execution]/14 does not apply because nothing in the above code constitutes "reading an object". You initialize it and destroy it; it is never read.
So that leaves [intro.execution]/7. The phrase of importance here is "accesses through volatile glvalues". While temp certainly is a volatile value, and it certainly is a glvalue... you never actually access through it. Oh yes, you initialize the object, but that doesn't actually access "though" temp as a glvalue.
That is, temp as an expression is a glvalue, per the definition of glvalue: "an expression whose evaluation determines the identity of an object, bit-field, or function." The statement creating and initializing temp results in a glvalue, but the initialization of temp isn't accessing through a glvalue.
Think of volatile like const. The rules about const objects don't apply until after it is initialized. Similarly, the rules about volatile objects don't apply until after it is initialized.
So there's a difference between volatile nonvol2 temp = {1, 2}; and volatile nonvol2 temp; temp.a = 1; temp.b = 2;. And Clang certainly does the right thing in that case.
That being said, the inconsistency of Clang with regard to this behavior (optimizing it out only when using a struct, and only when using a struct that contains more than one member) suggests that this is probably not a formal optimization by the writers of Clang. That is, they're not taking advantage of the wording so much as this just being an odd quirk of some accidental code coming together.
Although gcc doesn't remove the construction in this particular case, it does perhaps even more aggressive optimizations in the case that the struct members themselves are declared volatile, rather than the struct itself at the point of construction:
GCC's behavior here is:
Not in accord with the standard, as it is in violation of [intro.execution]/7, but
There's absolutely no way to prove that it isn't compliant with the standard.
Given the code you wrote, there is simply no way for a user to detect whether or not those reads and writes are actually happening. And I rather suspect that the moment you do anything to allow the outside world to see it, those changes will suddenly appear in the compiled code. However much the standard wishes to call it "observable behavior", the fact is that by C++'s own memory model, nobody can see it.
GCC gets away with the crime due to lack of witnesses. Or at least credible witnesses (anyone who could see it would be guilty of invoking UB).
So you should not treat volatile like some optimization off-switch.

Is a fundamental type volatile initialization an observable behavior?

Consider this function:
void f(void* loc)
{
auto p = new(loc) volatile int{42};
*p = 0;
}
I have check the generated code by clang, gcc and CL, none of them elide the initialization. (The answer may be seen by the hardwer:).
Is it an extension provided by compilers to the standard? Does the standard allow compilers not to perform the write 42?
Actualy for objects of class type, it is specfied that constructor of an object is executed without consideration for the volatile qualifier [class.ctor]:
A constructor can be invoked for a const, volatile or const volatile object. const and volatile
semantics (10.1.7.1) are not applied on an object under construction. They come into effect when the
constructor for the most derived object (4.5) ends.

[intro.execution]/8 lists the minimum requirements for a conforming implementation; these are also known as “observable behavior”. The first requirement is that “Access to volatile objects are evaluated strictly according to the rules of the abstract machine.” The compiler is required to produce all observable behavior. In particular, it is not allowed to remove accesses to volatile objects. And note that “object” here is used in the compiler-writer’s sense: it includes built-in types.

This is not a coherent question because what it means for a compiler to perform a write is platform-specific. There is no platform-independent notion of performing a write other than perhaps seeing the effects of a write in a subsequent read.
As you see, typical compilers on x86 will emit a write instruction but no memory barrier. The CPU may reorder the write, coalesce it, or even avoid doing any write to main memory because of the way the platform's cache coherence works.
The reason they made this implementation choice is that it makes volatile work for a broad range of applications, including those where the standard requires it to work, and because it has acceptable performance consequences. The standard, being platform-neutral, doesn't dictate platform-specific decisions like this and compiler writers do not understand it to do that.
They could have forced every volatile access to be uncoalsecable, un-reorderable, and pushed through the cache subsystem to main memory. But that would provide terrible performance and, on this platform, no significant benefits. So they don't do it, and they don't understand the C++ standard to suggest that there's some mythical observer on the memory bus who must see specific things. The very existence of a memory bus is platform-specific. The standard is not platform-specific.
You will sometimes see people argue, for example, that the standard somehow requires the compiler to issue instructions to do volatile writes in order but that it doesn't matter if the CPU coalesces or re-orders the writes. This is, frankly, silly. The C++ standard doesn't impose requirements on the instructions compilers generate but rather on what those instructions must actually do when executed. It doesn't distinguish between optimizations done by a CPU and optimizations done by a compiler and any such distinctions would be platform-specific anyway.
If the standard allows a CPU to re-order two writes, then it allows the compiler to re-order them. It does not, and cannot, make that kind of distinction. Of course, compiler writers may still decide that they will issues the writes in order even though the CPU can re-order them because that may make the most sense on their platform.

How to safely access memory mapped hardware register from C or C++ language level?

In C and C++ I usually access memory mapped hardware registers with the well known pattern:
typedef unsigned int uint32_t;
*((volatile uint32_t*)0xABCDEDCB) = value;
As far as I know, the only thing guaranteed by the C or C++ standard is that accesses to volatile variables are evaluated strictly according to the rules of the abstract machine.
How can I be sure that the compiler will not generate torn stores for the access for a 32-bit processor? For example the compiler is allowed to emit two 16-bit stores instead of a one 32-bit store, isn't it?
Are there any guarantees in this area made by gcc?

When speeking about MCUs, as far as I know there are no such guarantees. Even more, each case of accessing HW registers may be device specific and often may have its own sequence, rules and/or set of assembler instructions. And it depends on compiler implementation, too.
The only thing here that works for me is reading datasheets concering concrete devices/compilers and follow the examples.

If you are really worried use inline assembler. A single assembler instruction will not return until completed.
Also you must ensure that the memory page you are writing to is not cached otherwise the write may not be all the way through. On ARM memory barriers may be necessary as well.
Volatile is just an instruction which tells the compiler to make no assuptions about the content of the memory since the value may be changed outside one's program but has no effect or read write ordering. Use memory barriers or atomics if this is an issue.

Microsoft comment about ISO compliant usage of volatile
"The volatile keyword in C++11 ISO Standard code is to be used only for hardware access"
http://msdn.microsoft.com/en-us/library/12a04hfd.aspx
At least in the case of Microsoft C++ (going back to Visual Studio 2005), an example of a pointer to volatile type is shown:
http://msdn.microsoft.com/en-us/library/145yc477.aspx
Another reference, in this case C, which also includes examples of pointers to volatile types.
"static volatile objects model memory-mapped I/O ports, and static const volatile objects model memory-mapped input ports"
http://en.cppreference.com/w/c/language/volatile
Operations on volatile types are not allowed to be reordered by compiler or hardware, a requirement for hardware memory mapped access. However operations to a combination of volatile and non-volatile types may end up with reordered operations on the non-volatile types, making them non thread safe (all inter thread sharing of variables would require all of them to be volatile to be thread safe). Even if two threads only share volatile types, there's still a data race issue (one thread reads just before the other thread writes).
Microsoft compilers have a non-portable (to other compilers) extension to volatile, that makes them thread safe (/volatile:ms - Microsoft specific, used by default except for ARM processors).
Back to the original question, in the case of GCC, you can have the compiler generate assembly code to verify the operation is safe.

How can I be sure that the compiler will not generate torn stores for the access for a 32-bit processor? For example the compiler is allowed to emit two 16-bit stores instead of a one 32-bit store, isn't it?
Normally, the compiler can combine or split memory accesses under the as-if rule, as long as the observable behavior of the program is unchanged, since the observable behavior of access to ordinary objects is the effect on the object's value, and not the memory access itself.
However, accesses to volatile objects are part of the observable behavior of a program. Therefore the compiler can no longer combine or split memory transactions. In the section where the C++ Standard defines "observable behavior" it specifically says that "Access to volatile objects are evaluated strictly according to the rules of the abstract machine."
Please note that the code shown is still non-portable C++, because the C++ Standard only cares about whether the object accessed is volatile, and not about modifiers on the pointer used to form an lvalue for said access. You'd need to do something crazy like this example of placement-new, to force the existence of a volatile object:
*(new volatile uint32 ((uint32*)0xABCDEDCB)) = value;

Does using "pointer to volatile" prevent compiler optimizations at all times?

Here's the problem: your program temporarily uses some sensitive data and wants to erase it when it's no longer needed. Using std::fill() on itself won't always help - the compiler might decide that the memory block is not accessed later, so erasing it is a waste of time and eliminate erasing code.
User ybungalobill suggests using volatile keyword:
{
char buffer[size];
//obtain and use password
std::fill_n( (volatile char*)buffer, size, 0);
}
The intent is that upon seeing the volatile keyword the compiler will not try to eliminate the call to std::fill_n().
Will volatile keyword always prevent the compiler from such memory modifying code elimination?

The compiler is free to optimize your code out because buffer is not a volatile object.
The Standard only requires a compiler to strictly adhere to semantics for volatile objects. Here is what C++03 says
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and
subsequent evaluations have not yet occurred.
[...]
and
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and
calls to library I/O functions
In your example, what you have are reads and writes using volatile lvalues to non-volatile objects. C++0x removed the second text I quoted above, because it's redundant. C++0x just says
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.[...]
These collectively are referred to as the observable behavior of the program.
While one may argue that "volatile data" could maybe mean "data accessed by volatile lvalues", which would still be quite a stretch, the C++0x wording removed all doubts about your code and clearly allows implementations to optimize it away.
But as people pointed out to me, It probably does not matter in practice. A compiler that optimizes such a thing will most probably go against the programmers intention (why would someone have a pointer to volatile otherwise) and so would probably contain a bug. Still, I have experienced compiler vendors that cited these paragraphs when they were faced with bugreports about their over-aggressive optimizations. In the end, volatile is inherent platform specific and you are supposed to double check the result anyway.

From the last C++0x draft [intro.execution]:
8 The least requirements on a
conforming implementation are:
— Access to volatile objects are
evaluated strictly according to the
rules of the abstract machine.
[...]
12 Accessing an object designated by a
volatile glvalue (3.10), modifying an
object, calling a library I/O
function, or calling a function that
does any of those operations are all
side effects, [...]
So even the code you provided must not be optimized.

The memory content you wish to remove may have already been flushed out from your CPU/core's inner cache to RAM, where other CPUs can continue to see it. After overwriting it, you need to use a mutex / memory barrier instruction / atomic operation or something to trigger a sync with other cores. In practice, your compiler will probably do this before calling any external functions (google Dave Butenhof's post on volatile's dubious utility in multi-threading), so if you thread does that soon afterwards anyway then it's not a major issue. Summarily: volatile isn't needed.

A conforming implementation may, at its leisure, defer the actual performance of any volatile reads and writes until the result of a volatile read would affect the execution of a volatile write or I/O operation.
For example, given something like:
volatile unsigned char vol1,vol2;
extern unsigned char res[1000];
void test(int scale)
{
unsigned char ch;
for (int 0=0; i<10000; i++)
{
res[i] = i*vol1*scale;
vol2 = res[i];
}
}
a conforming compiler could, at its option, check whether scale is a multiple of 128 and--if so--clear out all even-indexed values of res before doing any reads from vol1 or writes to vol2. Even though the compiler would need to do each reads from vol1 before it could do the following write to vol2, a compiler may be able to defer both operations until after it has run an essentially unlimited amount of code.

"volatile" qualifier and compiler reorderings

A compiler cannot eliminate or reorder reads/writes to a volatile-qualified variables.
But what about the cases where other variables are present, which may or may not be volatile-qualified?
Scenario 1
volatile int a;
volatile int b;
a = 1;
b = 2;
a = 3;
b = 4;
Can the compiler reorder first and the second, or third and the fourth assignments?
Scenario 2
volatile int a;
int b, c;
b = 1;
a = 1;
c = b;
a = 3;
Same question, can the compiler reorder first and the second, or third and the fourth assignments?

The C++ standard says (1.9/6):
The observable behavior of the
abstract machine is its sequence of
reads and writes to volatile data and
calls to library I/O functions.
In scenario 1, either of the changes you propose changes the sequence of writes to volatile data.
In scenario 2, neither change you propose changes the sequence. So they're allowed under the "as-if" rule (1.9/1):
... conforming implementations are
required to emulate (only) the
observable behavior of the abstract
machine ...
In order to tell that this has happened, you would need to examine the machine code, use a debugger, or provoke undefined or unspecified behavior whose result you happen to know on your implementation. For example, an implementation might make guarantees about the view that concurrently-executing threads have of the same memory, but that's outside the scope of the C++ standard. So while the standard might permit a particular code transformation, a particular implementation could rule it out, on grounds that it doesn't know whether or not your code is going to run in a multi-threaded program.
If you were to use observable behavior to test whether the re-ordering has happened or not (for example, printing the values of variables in the above code), then of course it would not be allowed by the standard.

For scenario 1, the compiler should not perform any of the reorderings you mention. For scenario 2, the answer might depend on:
and whether the b and c variables are visible outside the current function (either by being non-local or having had their address passed
who you talk to (apparently there is some disagreement about how string volatile is in C/C++)
your compiler implementation
So (softening my first answer), I'd say that if you're depending on certain behavior in scenario 2, you'd have to treat it as non-portable code whose behavior on a particular platform would have be determined by whatever the implementation's documentation might indicate (and if the docs said nothing about it, then you're out of luck with a guaranteed behavior.
from C99 5.1.2.3/2 "Program execution":
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
...
(paragraph 5) The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous accesses are complete and subsequent accesses have not yet occurred.
Here's a little of what Herb Sutter has to say about the required behavior of volatile accesses in C/C++ (from "volatile vs. volatile" http://www.ddj.com/hpc-high-performance-computing/212701484) :
what about nearby ordinary reads and writes -- can those still be reordered around unoptimizable reads and writes? Today, there is no practical portable answer because C/C++ compiler implementations vary widely and aren't likely to converge anytime soon. For example, one interpretation of the C++ Standard holds that ordinary reads can move freely in either direction across a C/C++ volatile read or write, but that an ordinary write cannot move at all across a C/C++ volatile read or write -- which would make C/C++ volatile both less restrictive and more restrictive, respectively, than an ordered atomic. Some compiler vendors support that interpretation; others don't optimize across volatile reads or writes at all; and still others have their own preferred semantics.
And for what it's worth, Microsoft documents the following for the C/C++ volatile keyword (as Microsoft-sepcific):
A write to a volatile object (volatile write) has Release semantics; a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.
A read of a volatile object (volatile read) has Acquire semantics; a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.
This allows volatile objects to be used for memory locks and releases in multithreaded applications.

Volatile is not a memory fence. Assignments to B and C in snippet #2 can be eliminated or performed whenever. Why would you want the declarations in #2 to cause the behavior of #1?

Some compilers regard accesses to volatile-qualified objects as a memory fence. Others do not. Some programs are written to require that volatile works as a fence. Others aren't.
Code which is written to require fences, running on platforms that provide them, may run better than code which is written to not require fences, running on platforms that don't provide them, but code which requires fences will malfunction if they are not provided. Code which doesn't require fences will often run slower on platforms that provide them than would code which does require the fences, and implementations which provide fences will run such code more slowly than those that don't.
A good approach may be to define a macro semi_volatile as expanding to nothing on systems where volatile implies a memory fence, or to volatile on systems where it doesn't. If variables that need to have accesses ordered with respect to other volatile variables but not to each other are qualified as semi-volatile, and that macro is defined correctly, reliable operation will be achieved on systems with or without memory fences, and the most efficient operation that can be achieved on systems with fences will be achieved. If a compiler actually implements a qualifier that works as required, semivolatile, it could be defined as a macro that uses that qualifier and achieve even better code.
IMHO, that's an area the Standard really should address, since the concepts involved are applicable on many platforms, and any platform where fences aren't meaningful can simply ignore them.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js