I've been in a debate about a corner case regarding local variables in a multithreaded environment.
The question is regarding programs formed like:
std::mutex mut;
int main()
{
std::size_t i = 0;
doSomethingWhichMaySpawnAThreadAndUseTheMutex();
mut.lock();
i += 1; // can this be reordered?
mut.unlock();
return i;
}
The question revolves around whether the i += 1 can be reordered to occur above the mutex locking.
The obvious parts are that mut.lock() happens-before i += 1, so if any other thread might be able to observe the value of i, the compiler is obliged to not have incremented it. From 3.9.2.3 of the C++ spec, "If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained." This means that if I used any means to get a pointer to i, I can expect to see the right value.
However, the spec does state that the compiler may use the "as-if" rule to not give an object a memory address (footnote 4 on section 1.8.6). For example, i could be stored in a register, which has no memory address. In such a case, there would be no memory address to point to, so the compiler could prove that no other thread could access i.
The question I am interested in is what if the compiler does not do this "as-if" optimization, and does indeed store the object. Is the compiler permitted to store i, but do reordering as-if i was not actually stored? From an implementation perspective, this would mean that i might be stored on a stack, and thus it would be possible to have a pointer point at it, but have the compiler assume nobody can see i, and do the re-order?
The compiler is allowed to perform optimizations as long as the observable results of program execution legitimately could have been obtained ("as-if") without those optimizations.[1] So this question uses "as-if" in a misleading manner, if not actually asking a backwards question:
Is the compiler permitted to store i, but do reordering as-if i was not actually stored?
This asks if the compiler is permitted to do things as long as the results of program execution could have been obtained with an optimization. That is not the question to ask. The question should use non-optimized behavior as the reference. So something more like: "Is the compiler permitted to re-order the statements?" The answer is yes, as long as the observable results do not change. Nothing external to this particular function is told how to access i, so the compiler should be allowed to implement the increment anywhere between the surrounding uses of it (specifically: its definition and the return statement).
That being said, what I would expect a compiler to do in this case is neither give i a memory address nor treat it as a register variable. I would expect the compiler to treat it like a constant, effectively changing your function to:
int main()
{
doSomethingWhichMaySpawnAThreadAndUseTheMutex();
mut.lock();
mut.unlock();
return 1;
}
This is allowed as long as you have no way to detect that it has been done (short of examining the machine code directly).
Note:
[1] The use of "could have been" is an acknowledgement that there are portions of the C++ specification that use the word "unspecified". These portions allow compilers to make choices that (when dealing with non-robust code) could change observable behavior. That is, there can be a set of allowed behaviors, rather than a single allowed behavior. As long as the results remain in this set, an optimization is allowed.
I find this question very muddled. With the code as posted, the compiler is obviously aware that the only use of i is in the return statement, so i will be optimised away, end of story. The mutex doesn't come into it.
But as soon as you take the address of i - and give it away to somebody else - the game changes. Now the compiler has to put a real variable on the stack and manipulate it only between mutex.lock() and mutex.unlock(). Doing anything else would alter the semantics of your program. The mutex also gives you a memory fence.
You can see this clearly at Godbolt.
Edit: I have fixed a silly bug in that code that rather obscured the point I was trying to make, sorry about that.
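A minimal sketch of the situation described above, assuming C++11 threads. The names bump and escaped_local are hypothetical and not taken from the question. Once the address of i is handed to another thread, the increment has to stay inside the critical section, because the other thread can observe i:

```cpp
#include <cstddef>
#include <mutex>
#include <thread>

std::mutex mut;

// Hypothetical helper: another thread modifies the caller's local through a pointer.
void bump(std::size_t* p)
{
    std::lock_guard<std::mutex> guard(mut);
    *p += 1;
}

std::size_t escaped_local()
{
    std::size_t i = 0;
    std::thread worker(bump, &i);   // the address of i escapes here
    {
        std::lock_guard<std::mutex> guard(mut);
        i += 1;                     // must not move out of the locked region
    }
    worker.join();                  // wait for bump() to finish before reading i
    return i;                       // both increments have happened at this point
}
```

Without the pointer escaping, the compiler is free to fold i into a constant as the other answers describe.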
The whole sequence:
mut.lock(); // i == 0 at that point
i += 1; // can this be reordered?
mut.unlock(); // i == 1 at that point
exit(i);
}
can be compiled as just exit(1); and other threads can be ignored as there is no proper synchronisation.
You need to wait for other threads to terminate, or for them to engage in doing nothing ever again. You don't do that so it can be assumed that all other threads are doing nothing.
The mutex has no meaningful role here.
In ISO/IEC 14882:2003 (C++03) is stated under 7.1.5.1/8, section "The cv-qualifers":
[Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation. See 1.9 for detailed semantics. In general, the semantics of volatile are intended to be the same in C++ as they are in C. ]
These "means" that are undetectable by an implementation have already been the subject of Nawaz's Q&A, Why do we use volatile keyword:
However, sometimes, optimization (of some parts of your program) may be undesirable, because it may be that someone else is changing the value of some_int from outside the program which compiler is not aware of, since it can't see it; but it's how you've designed it. In that case, compiler's optimization would not produce the desired result!
But unfortunately, he did not explain what these "means" that may change objects from outside the program can be, or how they can change objects.
My Question:
What are examples of these "undetectable means", and how are they able to change internal objects of a program from outside of it?
A memory location may be visible to other parts of the same program or to another program. For example, a variable that exists in shared memory can be changed by another program.
The compiler cannot detect that.
Other examples are hardware-based memory locations.
Generally, apps that need volatile variables usually deal with stuff like asynchronous audio, and at the system level, interrupts, APIC etc. Most apps do not need them.
An imaginary example:
int v = 0;
// Some thread
SetUpdatesOn(&v);
// Another thread
for(;;)
{
int g = v;
std::cout << g;
}
Assume that an imaginary OS-level function SetUpdatesOn periodically changes the variable passed to it. If the variable is not declared volatile, the compiler might optimize away the read in int g = v, or assume that v always has the same value.
If the variable is declared volatile, the compiler will keep reading it in the loop.
Note that very often it is difficult to debug such programming mistakes because the optimization may only exist in release builds.
In C++03 Standard observable behavior (1.9/6) includes reading and writing volatile data. Now I have this code:
int main()
{
const volatile int value = 0;
if( value ) {
}
return 0;
}
which formally initializes a volatile variable and then reads it. Visual C++ 10 emits machine code that makes room on the stack by pushing a dword there, then writes zero into that stack location, then reads that location.
To me it makes no sense - no other code or hardware could possibly know where the local variable is located (since it's in automatic storage) and so it's unreasonable to expect that the variable could have been read/written by any other party and so it can be eliminated in this case.
Is eliminating this variable access allowed? Is accessing a volatile local whose address is not known to any other party observable behavior?
The thread's entire stack might be located on a protected memory page, with a handler that logs all reads and writes (and allows them to complete, of course).
However, I don't think MSVC really cares whether or how the memory access might be detected. It understands volatile to mean, among other things, "do not bother applying optimizations to this object". So it doesn't. It doesn't have to make sense, because MSVC is not interested in speeding up this kind of use of volatile.
Since it's implementation-dependent whether and how observable behavior can actually be observed, I think you're right that an implementation can "cheat" if it knows, because of details of the hardware, that the access cannot possibly be detected. Observable behavior that has no physically-detectable effect can be skipped: no matter what the standard says, the means to detect non-conforming behavior are limited to what's physically possible.
If an implementation fails to conform to the standard in a forest, and nobody notices, does it make a sound? Kind of thing.
That's the whole point of declaring a variable volatile: you tell the implementation that that variable may change or be read by means unknown to the implementation itself and that the implementation should refrain from performing optimizations that might impact such access.
When a variable is declared both volatile and const your program may not change it, but it may still be changed from outside. This implies that not only the variable itself but also all read operations on it cannot be optimized away.
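A small sketch of the const volatile case, assuming a read-only status register. The names poll_ready and demo_poll are hypothetical, and an ordinary variable stands in for the hardware register. The program cannot write through the pointer, but every read still has to be performed:

```cpp
#include <cstdint>

// Reads the status word afresh on every iteration; the const forbids writes
// through this pointer, the volatile forbids caching the value.
bool poll_ready(const volatile std::uint32_t* status,
                std::uint32_t ready_mask,
                int max_polls)
{
    for (int n = 0; n < max_polls; ++n) {
        if (*status & ready_mask) {
            return true;
        }
    }
    return false;
}

// Stand-in for real hardware: on a device, the register would change by itself.
bool demo_poll(std::uint32_t initial_status)
{
    std::uint32_t fake_status = initial_status;
    return poll_ready(&fake_status, 0x1u, 10);
}
```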
no other code or hardware could possibly know
You can look at the assembly (you just did!), figure out the address of the variable, and map it to some hardware for the duration of the call. volatile means the implementation is obliged to account for such things too.
Volatile also applies to your own code.
volatile int x;
spawn_thread(&x);
x = 0;
while (x == 0){};
This will be an endless loop if x is not volatile.
As for the const, I'm unsure whether the compiler can use that to decide.
To me it makes no sense - no other code or hardware could possibly know where the local variable is located (since it's in automatic storage)
Really? So if I write an x86 emulator and run your code on it, then that emulator won't know about that local variable?
The implementation can never actually know for sure that the behaviour is unobservable.
My answer is a bit late. Anyway, this statement
To me it makes no sense - no other code or hardware could possibly know where the local variable is located (since it's in automatic storage)
is wrong. The difference between volatile and non-volatile is actually very observable in VC++2010. For instance, in a Release build you cannot set a breakpoint on a local variable declaration that was eliminated by optimization. Hence, if you need to set a breakpoint on a variable declaration, or even just watch its value in the debugger, you have to use a Debug build. To debug a specific local variable in a Release build, you can make use of the volatile keyword:
int _tmain(int argc, _TCHAR* argv[])
{
int a;
//int volatile a;
a=1; //break point here is not possible in Release build, unless volatile used
printf("%d\n",a);
return 0;
}
I know that when reading from a memory location that is written to by several threads or processes, the volatile keyword should be used for that location, as in the cases below. But I want to know more about what restrictions it really places on the compiler: what rules does the compiler have to follow when dealing with such a case, and is there any exceptional case where, despite simultaneous access to a memory location, the programmer can omit the volatile keyword?
volatile SomeType * ptr = someAddress;
void someFunc(volatile const SomeType & input){
//function body
}
What you know is false. Volatile is not used to synchronize memory access between threads, apply any kind of memory fences, or anything of the sort. Operations on volatile memory are not atomic, and they are not guaranteed to be in any particular order. volatile is one of the most misunderstood facilities in the entire language. "Volatile is almost useless for multi-threadded programming."
What volatile is used for is interfacing with memory-mapped hardware, signal handlers, and code that uses setjmp/longjmp.
It can also be used in a similar way that const is used, and this is how Alexandrescu uses it in this article. But make no mistake. volatile doesn't make your code magically thread safe. Used in this specific way, it is simply a tool that can help the compiler tell you where you might have messed up. It is still up to you to fix your mistakes, and volatile plays no role in fixing those mistakes.
EDIT: I'll try to elaborate a little bit on what I just said.
Suppose you have a class that has a pointer to something that cannot change. You might naturally make the pointer const:
class MyGizmo
{
public:
const Foo* foo_;
};
What does const really do for you here? It doesn't do anything to the memory. It's not like the write-protect tab on an old floppy disc. The memory itself is still writable. You just can't write to it through the foo_ pointer. So const is really just a way to give the compiler another way to let you know when you might be messing up. If you were to write this code:
gizmo.foo_->bar_ = 42;
...the compiler won't allow it, because it's marked const. Obviously you can get around this by using const_cast to cast away the const-ness, but if you need to be convinced this is a bad idea then there is no help for you. :)
Alexandrescu's use of volatile is exactly the same. It doesn't do anything to make the memory somehow "thread safe" in any way whatsoever. What it does is it gives the compiler another way to let you know when you may have screwed up. You mark things that you have made truly "thread safe" (through the use of actual synchronization objects, like Mutexes or Semaphores) as being volatile. Then the compiler won't let you use them in a non-volatile context. It throws a compiler error you then have to think about and fix. You could again get around it by casting away the volatile-ness using const_cast, but this is just as Evil as casting away const-ness.
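A rough sketch of that idea, loosely modeled on what I recall of the locking helper in that article. Counter, LockedCounter and bump_shared_counter are hypothetical names used only for illustration. The shared object is reachable only through a volatile reference, so its ordinary member functions cannot be called until a helper takes the lock and casts the volatile away:

```cpp
#include <mutex>

class Counter {
public:
    void increment() { ++value_; }          // not volatile-qualified
    int value() const { return value_; }    // not volatile-qualified
private:
    int value_ = 0;
};

// Holds the mutex for its lifetime and exposes the object without volatile.
class LockedCounter {
public:
    LockedCounter(volatile Counter& c, std::mutex& m)
        : counter_(const_cast<Counter&>(c)), lock_(m) {}
    Counter* operator->() { return &counter_; }
private:
    Counter& counter_;
    std::lock_guard<std::mutex> lock_;
};

std::mutex counter_mutex;
Counter counter_storage;                             // the object itself is NOT volatile
volatile Counter& shared_counter = counter_storage;  // shared code sees only this view

int bump_shared_counter()
{
    // shared_counter.increment();  // compile error: increment() is not volatile-qualified
    LockedCounter locked(shared_counter, counter_mutex);
    locked->increment();
    return locked->value();
}
```

The object is deliberately defined without volatile: accessing an object that was defined volatile through a non-volatile lvalue is undefined behavior, so only the reference carries the qualifier here. The mutex does all the synchronization; volatile only turns an unlocked access into a compile error.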
My advice to you is to completely abandon volatile as a tool in writing multithreaded applications (edit:) until you really know what you're doing and why. It has some benefit but not in the way that most people think, and if you use it incorrectly, you could write dangerously unsafe applications.
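For the cross-thread signalling that volatile is often misused for, C++11 (which postdates much of this discussion) provides std::atomic. A minimal sketch, assuming a C++11 compiler; wait_for_worker, ready and payload are hypothetical names:

```cpp
#include <atomic>
#include <thread>

int wait_for_worker()
{
    std::atomic<bool> ready{false};
    int payload = 0;                 // plain, non-atomic data

    std::thread worker([&] {
        payload = 42;                                   // written before the flag
        ready.store(true, std::memory_order_release);   // publish the write
    });

    // The acquire load pairs with the release store, so once the flag is
    // seen as true, the write to payload is guaranteed to be visible too.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;

    worker.join();
    return seen;
}
```

A volatile bool in place of the atomic would give neither the atomicity nor the ordering guarantee that this relies on.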
It's not as well defined as you probably want it to be. Most of the relevant standardese from C++98 is in section 1.9, "Program Execution":
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions.
Accessing an object designated by a volatile lvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression might produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
Once the execution of a function begins, no expressions from the calling function are evaluated until execution of the called function has completed.
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects with type other than volatile sig_atomic_t are unspecified, and the value of any object not of volatile sig_atomic_t that is modified by the handler becomes undefined.
An instance of each object with automatic storage duration (3.7.2) is associated with each entry into its block. Such an object exists and retains its last-stored value during the execution of the block and while the block is suspended (by a call of a function or receipt of a signal).
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.
At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.
The input and output dynamics of interactive devices shall take place in such a fashion that prompting messages actually appear prior to a program waiting for input. What constitutes an interactive device is implementation-defined.
So what that boils down to is:
The compiler cannot optimize away reads or writes to volatile objects. For simple cases like the one casablanca mentioned, that works the way you might think. However, in cases like
volatile int a;
int b;
b = a = 42;
people can and do argue about whether the compiler has to generate code as if the last line had read
a = 42; b = a;
or if it can, as it normally would (in the absence of volatile), generate
a = 42; b = 42;
(C++0x may have addressed this point, I haven't read the whole thing.)
The compiler may not reorder operations on two different volatile objects that occur in separate statements (every semicolon is a sequence point) but it is totally allowed to rearrange accesses to non-volatile objects relative to volatile ones. This is one of the many reasons why you should not try to write your own spinlocks, and is the primary reason why John Dibling is warning you not to treat volatile as a panacea for multithreaded programming.
Speaking of threads, you will have noticed the complete absence of any mention of threads in the standards text. That is because C++98 has no concept of threads. (C++0x does, and may well specify their interaction with volatile, but I wouldn't be assuming anyone implements those rules yet if I were you.) Therefore, there is no guarantee that accesses to volatile objects from one thread are visible to another thread. This is the other major reason volatile is not especially useful for multithreaded programming.
There is no guarantee that volatile objects are accessed in one piece, or that modifications to volatile objects avoid touching other things right next to them in memory. This is not explicit in what I quoted but is implied by the stuff about volatile sig_atomic_t -- the sig_atomic_t part would be unnecessary otherwise. This makes volatile substantially less useful for access to I/O devices than it was probably intended to be, and compilers marketed for embedded programming often offer stronger guarantees, but it's not something you can count on.
Lots of people try to make specific accesses to objects have volatile semantics, e.g. doing
T x;
*(volatile T *)&x = foo();
This is legit (because it says "object designated by a volatile lvalue" and not "object with a volatile type") but has to be done with great care, because remember what I said about the compiler being totally allowed to reorder non-volatile accesses relative to volatile ones? That goes even if it's the same object (as far as I know anyway).
If you are worried about reordering of accesses to more than one volatile value, you need to understand the sequence point rules, which are long and complicated and I'm not going to quote them here because this answer is already too long, but here's a good explanation which is only a little simplified. If you find yourself needing to worry about the differences in the sequence point rules between C and C++ you have already screwed up somewhere (for instance, as a rule of thumb, never overload &&).
A particular and very common optimization that is ruled out by volatile is to cache a value from memory into a register, and use the register for repeated access (because this is much faster than going back to memory every time).
Instead the compiler must fetch the value from memory every time (taking a hint from Zach, I should say that "every time" is bounded by sequence points).
Nor can a sequence of writes make use of a register and only write the final value back later on: every write must be pushed out to memory.
Why is this useful? On some architectures certain IO devices map their inputs or outputs to a memory location (i.e. a byte written to that location actually goes out on the serial line). If the compiler redirects some of those writes to a register that is only flushed occasionally then most of the bytes won't go onto the serial line. Not good. Using volatile prevents this situation.
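A sketch of the serial-line case, assuming a transmit register that sends whatever byte is written to it. The names send_bytes and last_byte_on_fake_port are hypothetical, and a plain variable stands in for the device register, whose real address would be fixed by the hardware:

```cpp
#include <cstddef>
#include <cstdint>

// Every iteration performs a real store through the volatile pointer,
// so each byte reaches the register rather than only the last one.
void send_bytes(volatile std::uint8_t* tx_register,
                const std::uint8_t* data,
                std::size_t count)
{
    for (std::size_t k = 0; k < count; ++k) {
        *tx_register = data[k];
    }
}

// Stand-in for hardware: an ordinary byte plays the role of the register.
std::uint8_t last_byte_on_fake_port()
{
    std::uint8_t fake_register = 0;
    const std::uint8_t message[] = {1, 2, 3};
    send_bytes(&fake_register, message, sizeof message);
    return fake_register;
}
```

Without volatile on the pointer, a compiler could legitimately collapse the loop into a single store of the last byte.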
Declaring a variable as volatile means the compiler can't make any assumptions about the value that it could have done otherwise, and hence prevents the compiler from applying various optimizations. Essentially it forces the compiler to re-read the value from memory on each access, even if the normal flow of code doesn't change the value. For example:
int *i = ...;
cout << *i; // line A
// ... (some code that doesn't use i)
cout << *i; // line B
In this case, the compiler would normally assume that since the value at i wasn't modified in between, it's okay to retain the value from line A (say in a register) and print the same value in B. However, if you declare i as a pointer to volatile data (volatile int *i), you're telling the compiler that some external source could have possibly modified the value at i between line A and B, so the compiler must re-fetch the current value from memory.
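The same point as a compilable sketch; sum_two_reads is a hypothetical name. Because the pointee is volatile, both dereferences have to be real loads:

```cpp
// Each dereference of a pointer-to-volatile is a separate access that
// the compiler may not merge with the other one.
int sum_two_reads(const volatile int* p)
{
    int first = *p;    // corresponds to line A
    int second = *p;   // corresponds to line B: a second load, not a reused register
    return first + second;
}

// Nothing changes the value in between here, so both reads see 21.
int demo_sum_two_reads()
{
    int value = 21;
    return sum_two_reads(&value);
}
```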
The compiler is not allowed to optimize away reads of a volatile object in a loop, which otherwise it'd normally do (e.g. strlen()).
It's commonly used in embedded programming when reading a hardware register at a fixed address, whose value may change unexpectedly. (In contrast with "normal" memory, that doesn't change unless written to by the program itself...)
That is its main purpose.
It could also be used to make sure one thread sees the change in a value written by another, but it in no way guarantees atomicity when reading or writing said object.
Here's the problem: your program temporarily uses some sensitive data and wants to erase it when it's no longer needed. Using std::fill() on its own won't always help - the compiler might decide that the memory block is not accessed later, so erasing it is a waste of time, and eliminate the erasing code.
User ybungalobill suggests using volatile keyword:
{
char buffer[size];
//obtain and use password
std::fill_n( (volatile char*)buffer, size, 0);
}
The intent is that upon seeing the volatile keyword the compiler will not try to eliminate the call to std::fill_n().
Will volatile keyword always prevent the compiler from such memory modifying code elimination?
The compiler is free to optimize your code out because buffer is not a volatile object.
The Standard only requires a compiler to strictly adhere to semantics for volatile objects. Here is what C++03 says
The least requirements on a conforming implementation are:
At sequence points, volatile objects are stable in the sense that previous evaluations are complete and subsequent evaluations have not yet occurred.
[...]
and
The observable behavior of the abstract machine is its sequence of reads and writes to volatile data and calls to library I/O functions
In your example, what you have are reads and writes using volatile lvalues to non-volatile objects. C++0x removed the second text I quoted above, because it's redundant. C++0x just says
The least requirements on a conforming implementation are:
Access to volatile objects are evaluated strictly according to the rules of the abstract machine.[...]
These collectively are referred to as the observable behavior of the program.
While one may argue that "volatile data" could maybe mean "data accessed by volatile lvalues", which would still be quite a stretch, the C++0x wording removed all doubts about your code and clearly allows implementations to optimize it away.
But as people pointed out to me, it probably does not matter in practice. A compiler that optimizes such a thing will most probably go against the programmer's intention (why would someone have a pointer to volatile otherwise) and so would probably contain a bug. Still, I have experienced compiler vendors that cited these paragraphs when they were faced with bug reports about their over-aggressive optimizations. In the end, volatile is inherently platform-specific and you are supposed to double check the result anyway.
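A sketch of the explicit-loop form of the same idea; secure_zero and erased_after_use are hypothetical names. Following the reading of the standard given above, this is not strictly guaranteed to survive optimization when the underlying buffer is not itself a volatile object, so the generated code should still be inspected on the target compiler:

```cpp
#include <cstddef>

// Writes zeros through a pointer-to-volatile so that each store is a
// volatile access as far as the compiler is concerned.
void secure_zero(void* data, std::size_t size)
{
    volatile unsigned char* p = static_cast<volatile unsigned char*>(data);
    for (std::size_t n = 0; n < size; ++n) {
        p[n] = 0;
    }
}

// Fills a buffer with a placeholder secret, erases it, and checks the result.
bool erased_after_use()
{
    char buffer[16] = "secret";
    secure_zero(buffer, sizeof buffer);
    for (char c : buffer) {
        if (c != 0) {
            return false;
        }
    }
    return true;
}
```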
From the last C++0x draft [intro.execution]:
8 The least requirements on a conforming implementation are:
- Access to volatile objects are evaluated strictly according to the rules of the abstract machine.
[...]
12 Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, [...]
So even the code you provided must not be optimized.
The memory content you wish to remove may have already been flushed out from your CPU/core's inner cache to RAM, where other CPUs can continue to see it. After overwriting it, you need to use a mutex / memory barrier instruction / atomic operation or something to trigger a sync with other cores. In practice, your compiler will probably do this before calling any external functions (google Dave Butenhof's post on volatile's dubious utility in multi-threading), so if your thread does that soon afterwards anyway then it's not a major issue. In short: volatile isn't needed.
A conforming implementation may, at its leisure, defer the actual performance of any volatile reads and writes until the result of a volatile read would affect the execution of a volatile write or I/O operation.
For example, given something like:
volatile unsigned char vol1,vol2;
extern unsigned char res[1000];
void test(int scale)
{
unsigned char ch;
for (int i=0; i<1000; i++)
{
res[i] = i*vol1*scale;
vol2 = res[i];
}
}
a conforming compiler could, at its option, check whether scale is a multiple of 128 and--if so--clear out all even-indexed values of res before doing any reads from vol1 or writes to vol2. Even though the compiler would need to do each read from vol1 before it could do the following write to vol2, a compiler may be able to defer both operations until after it has run an essentially unlimited amount of code.
I was looking up the keyword volatile and what it's for, and the answer I got was pretty much:
It's used to prevent the compiler from optimizing away code.
There were some examples, such as when polling memory-mapped hardware: without volatile the polling loop would be removed, as the compiler might recognize that the condition value is never changed. But since there were only one or two examples, it got me thinking: are there other situations where we need to use volatile to avoid unwanted optimization? Are condition variables the only place where volatile is needed?
I imagine that optimization is compiler-specific and therefore is not specified in the C++ specification. Does that mean we have to go by gut feeling, saying Hm, I suspect my compiler will do away with this if I don't declare that variable as volatile or are there any clear rules to go by?
Basically, volatile announces that a value might change behind your program's back. That prevents compilers from caching the value (in a CPU register) and from optimizing away accesses to that value when they seem unnecessary from the POV of your program.
What should trigger usage of volatile is when a value changes despite the fact that your program hasn't written to it, and when no other memory barriers (like mutexes as used for multi-threaded programs) are present.
The observable behavior of a C++ program is determined by reads and writes to volatile variables, and any calls to input/output functions.
What this entails is that all reads and writes to volatile variables must happen in the order they appear in code, and they must happen. (If a compiler broke one of those rules, it would be breaking the as-if rule.)
That's all. It's used when you need to indicate that reading or writing a variable is to be seen as an observable effect. (Note, the "C++ and the Perils of Double-Checked Locking" article touches on this quite a bit.)
So to answer the title question, it prevents any optimization that might re-order the evaluation of volatile variables relative to other volatile variables.
That means a compiler that changes:
int x = 2;
volatile int y = 5;
x = 5;
y = 7;
To
int x = 5;
volatile int y = 5;
y = 7;
Is fine, since the value of x is not part of the observable behavior (it's not volatile). What wouldn't be fine is initializing y with 7 directly and dropping the write of 5, because that write of 5 is an observable effect.
Condition variables are not where volatile is needed; strictly it is only needed in device drivers.
volatile guarantees that reads and writes to the object are not optimized away, or reordered with respect to another volatile. If you are busy-looping on a variable modified by another thread, it should be declared volatile. However, you shouldn't busy-loop. Because the language wasn't really designed for multithreading, this isn't very well supported. For example, the compiler may move a write to a non-volatile variable from after to before the loop, violating the lock. (For indefinite spinloops, this might only happen under C++0x.)
When you call a thread-library function, it acts as a memory fence, and the compiler will assume that any and all values have changed — essentially everything is volatile. This is either specified or tacitly implemented by any threading library to keep the wheels turning smoothly.
C++0x might not have this shortcoming, as it introduces formal multithreading semantics. I'm not really familiar with the changes, but for the sake of backward compatibility, it doesn't require you to declare anything volatile that wasn't before.
Remember that the "as if rule" means that the compiler can, and should, do whatever it wants, as long as the behaviour as seen from outside the program as a whole is the same. In particular, while a variable conceptually names an area in memory, there is no reason why it actually should be in memory.
It could be in a register.
Its value could be calculated away, e.g. in:
int x = 2;
int y = x + 7;
return y + 1;
This need not have an x and y at all, but could just be replaced with:
return 10;
Another example is that any code that doesn't affect state from the outside could be removed entirely. E.g. if you zeroise sensitive data, the compiler can see this as a wasted exercise ("why are you writing to what won't be read?") and remove it. volatile can be used to stop that happening.
volatile can be thought of as meaning "the state of this variable must be considered part of the outwardly visible state, and not messed with". Optimisations that would use it other than literally following the source code are not allowed.
(A note on C#. A lot I've seen of late on volatile suggests that people are reading about C++ volatile and applying it to C#, and reading about it in C# and applying it to C++. Really though, volatile behaves so differently between the two that it is not useful to consider them related.)
With volatile, the compiler doesn't try to keep the data in a CPU register (hundreds of times faster than memory). It has to read it from memory every time it is used.
One way to think about a volatile variable is to imagine that it's a virtual property; writes and even reads may do things the compiler can't know about. The actual generated code for writing or reading a volatile variable is simply a memory write or read(*), but the compiler has to regard the code as opaque; it can't make any assumptions under which it might be superfluous. The issue isn't merely with making sure that the compiled code notices that something has caused a variable to change. On some systems, even memory reads can "do" things.
(*) On some compilers, volatile variables may be added to, subtracted from, incremented, decremented, etc. as distinct operations. It's probably useful for a compiler to compile:
volatilevar++;
as
inc [_volatilevar]
since the latter form may be atomic on many microprocessors (though not on modern multi-core PCs). It's important to note, however, that if the statement were:
volatilevar2 = (volatilevar1++);
the correct code would not be:
mov ax,[_volatilevar1] ; Reads it once
inc [_volatilevar1] ; Reads it again (oops)
mov [_volatilevar2],ax
nor
mov ax,[_volatilevar1]
mov [_volatilevar2],ax ; Writes in wrong sequence
inc ax
mov [_volatilevar1],ax
but rather
mov ax,[_volatilevar1]
mov bx,ax
inc ax
mov [_volatilevar1],ax
mov [_volatilevar2],bx
Writing the source code differently would allow the generation of more efficient (and possibly safer) code. If 'volatilevar1' didn't mind being read twice and 'volatilevar2' didn't mind being written before volatilevar1, then splitting the statement into
volatilevar2 = volatilevar1;
volatilevar1++;
would allow for faster, and possibly safer, code.
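For comparison, the single-statement form volatilevar2 = (volatilevar1++) can be spelled out with an explicit temporary, which makes the required accesses visible in the source. This is only a sketch, and copy_then_increment and demo_copy_then_increment are hypothetical names:

```cpp
// One read of source, one write of source, then one write of destination,
// matching the access sequence of the correct assembly listing above.
void copy_then_increment(volatile int& source, volatile int& destination)
{
    int old_value = source;     // the only read of source
    source = old_value + 1;     // the only write of source
    destination = old_value;    // written after source has been updated
}

int demo_copy_then_increment()
{
    volatile int volatilevar1 = 5;
    volatile int volatilevar2 = 0;
    copy_then_increment(volatilevar1, volatilevar2);
    return volatilevar1 * 10 + volatilevar2;   // 6 * 10 + 5
}
```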
Unless you are on an embedded system, or you are writing hardware drivers where memory mapping is used as the means of communication, you should never ever ever be using volatile.
Consider:
int main()
{
volatile int SomeHardwareMemory; //This is a platform specific INT location.
for(int idx=0; idx < 56; ++idx)
{
printf("%d", SomeHardwareMemory);
}
}
Has to produce code like:
loadIntoRegister3 56
loadIntoRegister2 "%d"
loopTop:
loadIntoRegister1 <<SOMEHARDWAREMEMORY>>
pushRegister2
pushRegister1
call printf
decrementRegister3
ifRegister3GreaterThan 0 goto loopTop
whereas without volatile it could be:
loadIntoRegister3 56
loadIntoRegister2 "%d"
loadIntoRegister1 <<SOMEHARDWAREMEMORY>>
loopTop:
pushRegister2
pushRegister1
call printf
decrementRegister3
ifRegister3GreaterThan 0 goto loopTop
The assumption behind volatile is that the value at the variable's memory location may change. You are forcing the compiler to load the actual value from memory each time the variable is used, and you tell the compiler that reuse of that value in a register is not allowed.
Usually the compiler assumes that a program is single-threaded, and therefore that it has complete knowledge of what happens to variable values. A smart compiler can then prove that the program can be transformed into another program with equivalent semantics but better performance. For example:
x = y+y+y+y+y;
can be transformed to
x = y*5;
However, if a variable can be changed outside the thread, the compiler doesn't have complete knowledge of what's going on simply by examining this piece of code. It can no longer make optimizations like the one above. (Edit: it probably can in this case; we need more sophisticated examples.)
By default, for performance optimization, single-threaded access is assumed. This assumption is usually true, unless the programmer explicitly instructs otherwise with the volatile keyword.