I have a problem where the optimizer seems to be removing lines of code that are quite necessary. Some background: I have a program that interfaces a PCIe driver. I have an integer pointer UINT32 *bar_reg; that points to the user space address of the BAR register I am communicating to. To write to the register I just de-reference the pointer. *(bar_reg + OFFSET) = value;
With no optimizations, this works fine. However as soon as I turn on any level of optimization, all the lines that de-reference the pointer get removed. The way I finally discovered this was by stepping through in Visual Studio. However it happens independent of platform. I've been able to get by up to now with the optimizer off, but someone using my library code in Linux wants to turn on optimization now. So I'm curious as to why this problem occurs and what the most reasonable fix/workaround is.
Use volatile keyword in order to prevent the optimization of that variable.
For example:
volatile UINT32 *bar_reg;
The issue here is that the compiler assumes that since the memory is not accessed by the program, then it means that the memory will remain unchanged and hence it might try to optimize some of the writes to that memory.
The issue you are running into involves the as-if rule which allows the optimizer to transform your code in any way as long as it does not effect the observable behavior of the program.
So if you only write to the variable but never actually use in your program the optimizer believes there no observable behavior and assumes it can validly optimize away the writes.
In your case the data is being observed outside of your program and the way to indicate this to the compiler and optimizer is through the volatile qualifier, cppreference tells us (emphasis mine going forwrd):
an object whose type is volatile-qualified, or a subobject of a
volatile object, or a mutable subobject of a const-volatile object.
Every access (read or write operation, member function call, etc.) on
the volatile object is treated as a visible side-effect for the
purposes of optimization [...]
For reference the as-if rule is covered in the draft C++ standard in section 1.9 which says:
[...]Rather, conforming implementations are required to emulate (only)
the observable behavior of the abstract machine as explained below.
and with respect to the as-if rule volatile is also covered in section 1.9 and it says:
Access to volatile objects are evaluated strictly according to the
rules of the abstract machine.
Related
Two popular compilers (gcc, clang) emit a store instruction in the body of the following function:
void foo(char x) {
*(char *)0xE0000000 = x;
}
This program might behave correctly on some hardware architectures, where the address being written to is memory-mapped IO.
Since this access is performed through a pointer not qualified as volatile, is the compiler required to emit a store here? Could a sufficiently-aggressive optimizer legally eliminate this store? I'm curious about whether this store constitutes an observable side effect with respect to the abstract machine.
Additionally, do C17 and C++20 differ in this regard?
The C standard does not require an implementation to issue a store because C 2018 5.1.2.3 6 says:
The least requirements on a conforming implementation are:
— Accesses to volatile objects are evaluated strictly according to the rules of the abstract machine.
— At program termination, all data written into files shall be identical to the result that execution of the program according to the abstract semantics would have produced.
— The input and output dynamics of interactive devices shall take place as specified in 7.21.3. The intent of these requirements is that unbuffered or line-buffered output appear as soon as possible, to ensure that prompting messages actually appear prior to a program waiting for input.
This is the observable behavior of the program.
None of these includes assignment to a non-volatile lvalue.
In C, it is expressly permitted to convert an integer to any pointer type, but that does not imply that you can use the resulting pointer to access memory. Unless the conversion appears in the context of a null pointer constant (which canot be used to access memory), the result of the conversion
is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
(C17, 6.3.2.3/5)
"Implementation-defined" leaves abundant scope for subsequent accesses to the supposedly pointed-to object to have behavior that does not involve accessing memory, well beyond performing a trap or exhibiting the (unspecified) effects of a misaligned access or even exhibiting UB courtesy of a strict-aliasing violation.
Only if you get past all that do you reach the issues surrounding non-volatile access that #EricPostpischil brings up in his answer.
The bottom line is that if you are writing for a freestanding implementation, which is the only context in which it makes any sense at all to try to perform an access such as in the example code, then you need to consult your implementation's documentation about how to access absolute addresses.
As far as I am aware, the same conclusion applies to C++.
Consider this simple code:
void g();
void foo()
{
volatile bool x = false;
if (x)
g();
}
https://godbolt.org/z/I2kBY7
You can see that neither gcc nor clang optimize out the potential call to g. This is correct in my understanding: The abstract machine is to assume that volatile variables may change at any moment (due to being e.g. hardware-mapped), so constant-folding the false initialization into the if check would be wrong.
But MSVC eliminates the call to g entirely (keeping the reads and writes to the volatile though!). Is this standard-compliant behavior?
Background: I occasionally use this kind of construct to be able to turn on/off debugging output on-the-fly: The compiler has to always read the value from memory, so changing that variable/memory during debugging should modify the control flow accordingly. The MSVC output does re-read the value but ignores it (presumably due to constant folding and/or dead code elimination), which of course defeats my intentions here.
Edits:
The elimination of the reads and writes to volatile is discussed here: Is it allowed for a compiler to optimize away a local volatile variable? (thanks Nathan!). I think the standard is abundantly clear that those reads and writes must happen. But that discussion does not cover whether it is legal for the compiler to take the results of those reads for granted and optimize based on that. I suppose this is under-/unspecified in the standard, but I'd be happy if someone proved me wrong.
I can of course make x a non-local variable to side-step the issue. This question is more out of curiosity.
I think [intro.execution] (paragraph number vary) could be used to explain MSVC behavior:
An instance of each object with automatic storage duration is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended...
The standard does not permit elimination of a read through a volatile glvalue, but the paragraph above could be interpreted as allowing to predict the value false.
BTW, the C Standard (N1570 6.2.4/2) says that
An object exists, has a constant address, and retains its last-stored value throughout its lifetime.34
34) In the case of a volatile object, the last store need not be explicit in the program.
It is unclear if there could be a non-explicit store into an object with automatic storage duration in C memory/object model.
TL;DR The compiler can do whatever it wants on each volatile access. But the documentation has to tell you.--"The semantics of an access through a volatile glvalue are implementation-defined."
The standard defines for a program permitted sequences of "volatile accesses" & other "observable behavior" (achieved via "side-effects") that an implementation must respect per "the 'as-if' rule".
But the standard says (my boldface emphasis):
Working Draft, Standard for Programming Language C++
Document Number: N4659
Date: 2017-03-21
§ 10.1.7.1 The cv-qualifiers
5 The semantics of an access through a volatile glvalue are implementation-defined. […]
Similarly for interactive devices (my boldface emphasis):
§ 4.6 Program execution
5 A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. [...]
7 The least requirements on a conforming implementation are:
(7.1) — Accesses through volatile glvalues are evaluated strictly according to the rules of the abstract machine.
(7.2) — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
(7.3) — The input and output dynamics of interactive devices shall take place in such a fashion that prompting output is actually delivered before a program waits for input. What constitutes an interactive device is implementation-defined.
These collectively are referred to as the observable behavior of the program. [...]
(Anyway what specific code is generated for a program is not specified by the standard.)
So although the standard says that volatile accesses can't be elided from the abstract sequences of abstract machine side effects & consequent observable behaviors that some code (maybe) defines, you can't expect anything to be reflected in object code or real-world behaviour unless your compiler documentation tells you what constitutes a volatile access. Ditto for interactive devices.
If you are interested in volatile vis a vis the abstract sequences of abstract machine side effects and/or consequent observable behaviors that some code (maybe) defines then say so. But if you are interested in what corresponding object code is generated then you must interpret that in the context of your compiler & compilation.
Chronically people wrongly believe that for volatile accesses an abstract machine evaluation/read causes an implemented read & an abstract machine assignment/write causes an implemented write. There is no basis for this belief absent implementation documentation saying so. When/iff the implementation says that it actually does something upon a "volatile access", people are justified in expecting that something--maybe, the generation of certain object code.
I believe it is legal to skip the check.
The paragraph that everyone likes to quote
34) In the case of a volatile object, the last store need not be explicit in the program
does not imply that an implementation must assume such stores are possible at any time, or for any volatile variable. An implementation knows which stores are possible. For instance, it is entirely reasonable to assume that such implicit writes only happen for volatile variables that are mapped to device registers, and that such mapping is only possible for variables with external linkage. Or an implementation may assume that such writes only hapen to word-sized, word-aligned memory locations.
Having said that, I think MSVC behaviour is a bug. There is no real-world reason to optimise away the call. Such optimisation may be compliant, but it is needlessly evil.
I've looked at the standard but couldn't find any indication that simply writing to memory would be considered observable behaviour. If not, that would mean the compiled code need not actually write to that memory. If a compiler choose to optimize away such access anything involving mapper memory, or shared memory, may not work.
1.9-8 seems to defined a very limited observable behaviour but indicates an implementation may define more. Can one assume than any quality compiler would treat modifying memory as an observable behaviour? That is, it may not guarantee atomicity or ordering, but does guarantee that data will eventually be written.
So, have I overlooked something in the standard, or is the writing to memory merely something the compiler decides to do?
Statements from the current or C++0x standard are good. Please note I'm not talking about accessing memory through a function, I mean direct access, such as writing data to a pointer (perhaps retrieved via mmap or another library function).
This kind of thing is what volatile exists for. Else, writing to memory and never apparently reading from it is not observable behaviour. However, in the general case, it would be quite impossible for the optimizer to prove that you never read it back except in relatively trivial examples, so it's not usually an issue.
Can one assume than any quality compiler would treat modifying memory as an observable behaviour?
No. Volatile is meant for marking that. However, you cannot fully trust the compiler even after adding the volatile qualifier, at least as told by a 2008 paper: http://www.cs.utah.edu/~regehr/papers/emsoft08-preprint.pdf
EDIT:
From C standard (not C++) http://c0x.coding-guidelines.com/5.1.2.3.html
An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
My reading of C99 is that unless you specify volatile, how and when the variable is actually accessed is implementation defined. If you specify volatile qualifier then code must work according to the rules of an abstract machine.
Relevant parts in the standard are: 6.7.3 Type qualifiers (volatile description) and 5.1.2.3 Program execution (the abstract machine definition).
For some time now I know that many compilers actually have heuristics to detect cases when a variable should be reread again and when it is okay to use a cached copy. Volatile makes it clear to the compiler that every access to the variable should be actually an access to the memory. Without volatile it seems compiler is free to never reread the variable.
And BTW wrapping the access in a function doesn't change that since a function even without inline might be still inlined by the compiler within the current compilation unit.
From your question below:
Assume I use an array on the heap (unspecified where it is allocated),
and I use that array to perform a calculation (temp space). The
optimizer sees that it doesn't actually need any of that space as it
can use strictly registers. Does the compiler nonetheless write the
temp values to the memory?
Per MSalters below:
It's not guaranteed, and unlikely. Consider a a Static Single
Assignment optimizer. This figures out each possible write/read
dependency, and then assigns registers to optimize these dependencies.
As a side effect, any write that's not followed by a (possible) read
creates no dependencies at all, and is eliminated. In your example
("use strictly registers") the optimizer has satisfied all write/read
dependencies with registers, so it won't write to memory at all. All
reads produce the correct values, so it's a correct optimization.
I know when reading from a location of memory which is written to by several threads or processes the volatile keyword should be used for that location like some cases below but I want to know more about what restrictions does it really make for compiler and basically what rules does compiler have to follow when dealing with such case and is there any exceptional case where despite simultaneous access to a memory location the volatile keyword can be ignored by programmer.
volatile SomeType * ptr = someAddress;
void someFunc(volatile const SomeType & input){
//function body
}
What you know is false. Volatile is not used to synchronize memory access between threads, apply any kind of memory fences, or anything of the sort. Operations on volatile memory are not atomic, and they are not guaranteed to be in any particular order. volatile is one of the most misunderstood facilities in the entire language. "Volatile is almost useless for multi-threadded programming."
What volatile is used for is interfacing with memory-mapped hardware, signal handlers and the setjmp machine code instruction.
It can also be used in a similar way that const is used, and this is how Alexandrescu uses it in this article. But make no mistake. volatile doesn't make your code magically thread safe. Used in this specific way, it is simply a tool that can help the compiler tell you where you might have messed up. It is still up to you to fix your mistakes, and volatile plays no role in fixing those mistakes.
EDIT: I'll try to elaborate a little bit on what I just said.
Suppose you have a class that has a pointer to something that cannot change. You might naturally make the pointer const:
class MyGizmo
{
public:
const Foo* foo_;
};
What does const really do for you here? It doesn't do anything to the memory. It's not like the write-protect tab on an old floppy disc. The memory itself it still writable. You just can't write to it through the foo_ pointer. So const is really just a way to give the compiler another way to let you know when you might be messing up. If you were to write this code:
gizmo.foo_->bar_ = 42;
...the compiler won't allow it, because it's marked const. Obviously you can get around this by using const_cast to cast away the const-ness, but if you need to be convinced this is a bad idea then there is no help for you. :)
Alexandrescu's use of volatile is exactly the same. It doesn't do anything to make the memory somehow "thread safe" in any way whatsoever. What it does is it gives the compiler another way to let you know when you may have screwed up. You mark things that you have made truly "thread safe" (through the use of actual synchronization objects, like Mutexes or Semaphores) as being volatile. Then the compiler won't let you use them in a non-volatile context. It throws a compiler error you then have to think about and fix. You could again get around it by casting away the volatile-ness using const_cast, but this is just as Evil as casting away const-ness.
My advice to you is to completely abandon volatile as a tool in writing multithreadded applications (edit:) until you really know what you're doing and why. It has some benefit but not in the way that most people think, and if you use it incorrectly, you could write dangerously unsafe applications.
It's not as well defined as you probably want it to be. Most of the relevant standardese from C++98 is in section 1.9, "Program Execution":
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed.
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile sig_atomic_t are unspecified, and the value of any object not of volatile sig_atomic_t that is modified by the handler becomes undefined.
An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting messages actually appear prior to a program waiting for input. What constitutes an interactive device is implementation-defined.
So what that boils down to is:
The compiler cannot optimize away reads or writes to volatile objects. For simple cases like the one casablanca mentioned, that works the way you might think. However, in cases like
volatile int a;
int b;
b = a = 42;
people can and do argue about whether the compiler has to generate code as if the last line had read
a = 42; b = a;
or if it can, as it normally would (in the absence of volatile), generate
a = 42; b = 42;
(C++0x may have addressed this point, I haven't read the whole thing.)
The compiler may not reorder operations on two different volatile objects that occur in separate statements (every semicolon is a sequence point) but it is totally allowed to rearrange accesses to non-volatile objects relative to volatile ones. This is one of the many reasons why you should not try to write your own spinlocks, and is the primary reason why John Dibling is warning you not to treat volatile as a panacea for multithreaded programming.
Speaking of threads, you will have noticed the complete absence of any mention of threads in the standards text. That is because C++98 has no concept of threads. (C++0x does, and may well specify their interaction with volatile, but I wouldn't be assuming anyone implements those rules yet if I were you.) Therefore, there is no guarantee that accesses to volatile objects from one thread are visible to another thread. This is the other major reason volatile is not especially useful for multithreaded programming.
There is no guarantee that volatile objects are accessed in one piece, or that modifications to volatile objects avoid touching other things right next to them in memory. This is not explicit in what I quoted but is implied by the stuff about volatile sig_atomic_t -- the sig_atomic_t part would be unnecessary otherwise. This makes volatile substantially less useful for access to I/O devices than it was probably intended to be, and compilers marketed for embedded programming often offer stronger guarantees, but it's not something you can count on.
Lots of people try to make specific accesses to objects have volatile semantics, e.g. doing
T x;
*(volatile T *)&x = foo();
This is legit (because it says "object designated by a volatile lvalue" and not "object with a volatile type") but has to be done with great care, because remember what I said about the compiler being totally allowed to reorder non-volatile accesses relative to volatile ones? That goes even if it's the same object (as far as I know anyway).
If you are worried about reordering of accesses to more than one volatile value, you need to understand the sequence point rules, which are long and complicated and I'm not going to quote them here because this answer is already too long, but here's a good explanation which is only a little simplified. If you find yourself needing to worry about the differences in the sequence point rules between C and C++ you have already screwed up somewhere (for instance, as a rule of thumb, never overload &&).
A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).
Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).
Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.
Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.
Declaring a variable as volatile means the compiler can't make any assumptions about the value that it could have done otherwise, and hence prevents the compiler from applying various optimizations. Essentially it forces the compiler to re-read the value from memory on each access, even if the normal flow of code doesn't change the value. For example:
int *i = ...;
cout << *i; // line A
// ... (some code that doesn't use i)
cout << *i; // line B
In this case, the compiler would normally assume that since the value at i wasn't modified in between, it's okay to retain the value from line A (say in a register) and print the same value in B. However, if you mark i as volatile, you're telling the compiler that some external source could have possibly modified the value at i between line A and B, so the compiler must re-fetch the current value from memory.
The compiler is not allowed to optimize away reads of a volatile object in a loop, which otherwise it'd normally do (i.e. strlen()).
It's commonly used in embedded programming when reading a hardware registry at a fixed address, and that value may change unexpectedly. (In contrast with "normal" memory, that doesn't change unless written to by the program itself...)
That is it's main purpose.
It could also be used to make sure one thread see the change in a value written by another, but it in no way guarantees atomicity when reading/writing to said object.
Here's the problem: your program temporarily uses some sensitive data and wants to erase it when it's no longer needed. Using std::fill() on itself won't always help - the compiler might decide that the memory block is not accessed later, so erasing it is a waste of time and eliminate erasing code.
User ybungalobill suggests using volatile keyword:
{
char buffer[size];
//obtain and use password
std::fill_n( (volatile char*)buffer, size, 0);
}
The intent is that upon seeing the volatile keyword the compiler will not try to eliminate the call to std::fill_n().
Will volatile keyword always prevent the compiler from such memory modifying code elimination?
The compiler is free to optimize your code out because buffer is not a volatile object.
The Standard only requires a compiler to strictly adhere to semantics for volatile objects. Here is what C++03 says
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and
subsequent evaluations have not yet occurred.
[...]
and
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and
calls to library I/O functions
In your example, what you have are reads and writes using volatile lvalues to non-volatile objects. C++0x removed the second text I quoted above, because it's redundant. C++0x just says
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.[...]
These collectively are referred to as the observable behavior of the program.
While one may argue that "volatile data" could maybe mean "data accessed by volatile lvalues", which would still be quite a stretch, the C++0x wording removed all doubts about your code and clearly allows implementations to optimize it away.
But as people pointed out to me, It probably does not matter in practice. A compiler that optimizes such a thing will most probably go against the programmers intention (why would someone have a pointer to volatile otherwise) and so would probably contain a bug. Still, I have experienced compiler vendors that cited these paragraphs when they were faced with bugreports about their over-aggressive optimizations. In the end, volatile is inherent platform specific and you are supposed to double check the result anyway.
From the last C++0x draft [intro.execution]:
8 The least requirements on a
conforming implementation are:
— Access to volatile objects are
evaluated strictly according to the
rules of the abstract machine.
[...]
12 Accessing an object designated by a
volatile glvalue (3.10), modifying an
object, calling a library I/O
function, or calling a function that
does any of those operations are all
side effects, [...]
So even the code you provided must not be optimized.
The memory content you wish to remove may have already been flushed out from your CPU/core's inner cache to RAM, where other CPUs can continue to see it. After overwriting it, you need to use a mutex / memory barrier instruction / atomic operation or something to trigger a sync with other cores. In practice, your compiler will probably do this before calling any external functions (google Dave Butenhof's post on volatile's dubious utility in multi-threading), so if you thread does that soon afterwards anyway then it's not a major issue. Summarily: volatile isn't needed.
A conforming implementation may, at its leisure, defer the actual performance of any volatile reads and writes until the result of a volatile read would affect the execution of a volatile write or I/O operation.
For example, given something like:
volatile unsigned char vol1,vol2;
extern unsigned char res[1000];
void test(int scale)
{
unsigned char ch;
for (int 0=0; i<10000; i++)
{
res[i] = i*vol1*scale;
vol2 = res[i];
}
}
a conforming compiler could, at its option, check whether scale is a multiple of 128 and--if so--clear out all even-indexed values of res before doing any reads from vol1 or writes to vol2. Even though the compiler would need to do each reads from vol1 before it could do the following write to vol2, a compiler may be able to defer both operations until after it has run an essentially unlimited amount of code.