What kind of operations might one need to do before main() - c++

I came across this question asking how to execute code before main() in C, mentioning there were strategies for C++. I've mostly lived in application space, so executing before main() has never occurred to me. What kind of things require this technique?

"What kind of things require this technique?"
Point of fact: none.
However, there are a lot of useful things you might WANT to do before main for a variety of reasons. For just one practical example, say you have an abstract factory that builds doohickies. You could make sure to build the factory instance, assign it to some special area, and then register the various concrete doohickies to it...yes, you can do that.
On the other hand, if you implement the factory as a singleton and use the facts of global value initialization to "trick" the implementation into registering concrete doohickies before main starts you gain several benefits with very few costs (the fact of using singletons, basically a non-issue here, is pretty much the only one).
For example you:
Don't have to maintain a list of registrations that all must be explicitly called. In fact, you can even declare and define an entire class in private scope, out of sight of anyone, and have it available for use when the program starts.
main() doesn't have to do a bunch of crap with a bunch of objects it doesn't care about.
So, none of this is actually necessary. However, you can reduce coupling and maintenance issues if you leverage the fact that globals are initialized before main begins.
Edit:
Should note here that I've since learned that this isn't guaranteed by the language. C++ only guarantees that zero or constant initialization happens before main. What I talk about in this answer is dynamic initialization, which C++ guarantees happens before the first use of the variable, much like function-local static variables.
Every compiler I've seen, though, does dynamic initialization before main. I thought I ran into one once that did not, but I believe the source of that issue was something else.
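To make the trick concrete, here is a minimal sketch with hypothetical names (DoohickeyFactory, Widget, WidgetRegistrar); the file-scope registrar object does its work during dynamic initialization, so on typical implementations the concrete type is registered by the time main() begins:

#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>

struct Doohickey { virtual ~Doohickey() = default; };

// The registry itself is a Meyers singleton: constructed on first use,
// so it is ready even when registrars in other files happen to run first.
class DoohickeyFactory {
public:
    static DoohickeyFactory& instance() {
        static DoohickeyFactory f;
        return f;
    }
    void add(const std::string& name,
             std::function<std::unique_ptr<Doohickey>()> make) {
        makers_[name] = std::move(make);
    }
    std::unique_ptr<Doohickey> create(const std::string& name) {
        return makers_.at(name)();
    }
private:
    std::map<std::string, std::function<std::unique_ptr<Doohickey>()>> makers_;
};

namespace {
// Entirely private to this translation unit; no header mentions Widget.
struct Widget : Doohickey {};

// This object's constructor runs during dynamic initialization, i.e.
// (on every implementation I know of) before main() starts.
struct WidgetRegistrar {
    WidgetRegistrar() {
        DoohickeyFactory::instance().add("widget",
            [] { return std::make_unique<Widget>(); });
    }
} const widgetRegistrar;
}

int main() {
    auto d = DoohickeyFactory::instance().create("widget");
    std::cout << "built a widget: " << std::boolalpha << (d != nullptr) << '\n';
}

Nothing outside this file ever needs to name Widget; the factory can build one purely from the string key.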

This technique can be used for library initialization routines or for initializing data that will be used implicitly during the execution of the program.
GCC provides constructor and destructor function attributes that cause a function to be called automatically, either before execution enters main(), or after main() has completed or exit() has been called, respectively.
void __attribute__ ((constructor)) my_init(void);
void __attribute__ ((destructor)) my_fini(void);
In the case of library initialization, constructor routines are executed before dlopen() returns if the library is loaded at runtime or before main() is started if the library is loaded at load time. When used for library cleanup, destructor routines are executed before dlclose() returns if the library is loaded at runtime or after exit() or completion of main() if the library is loaded at load time.
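For instance, a minimal GCC/Clang-specific sketch (the function bodies here are arbitrary placeholders):

#include <cstdio>

// Runs before main() (or before dlopen() returns, for a shared library).
__attribute__((constructor))
static void my_init(void) {
    std::puts("my_init: runs before main");
}

// Runs after main() returns or exit() is called
// (or before dlclose() returns, for a shared library).
__attribute__((destructor))
static void my_fini(void) {
    std::puts("my_fini: runs after main");
}

int main(void) {
    std::puts("main");
    return 0;
}

Compiled with g++, this prints the three lines in the order my_init, main, my_fini.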

The only things you could want to do before main involve global variables, which are bad, and the same things could always be accomplished by lazy initialization (initialization at the point of first use). Of course they'd be much better accomplished by not using global variables at all.
One possible "exception" is the initialization of global constant tables at runtime. But this is a very bad practice, as the tables are not sharable between instances of a library/process if you fill them at runtime. It's much smarter to write a script to generate the static const tables as a C or C++ source file at build time.
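As an illustration, in C++17 you can often avoid the script entirely: a constexpr table is computed at compile time, emitted as read-only data, and therefore stays shareable between processes. A sketch with a hypothetical table of squares:

#include <array>
#include <cstdio>

// Computed entirely at compile time, so it is emitted as constant data
// (shareable between processes) rather than filled in at startup.
constexpr std::array<int, 16> kSquares = [] {
    std::array<int, 16> t{};
    for (int i = 0; i < 16; ++i) t[i] = i * i;
    return t;
}();

int main() {
    std::printf("%d\n", kSquares[7]); // prints 49
}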

Stuff done before main:
On x86, the stack pointer register is usually masked (e.g. esp &= ~0xF) to make it a multiple of 16 (alignment)
Static members are initialized
push argc and argv (and environ if needed)
call _main =p
g++ 4.4 emits the following before any of my code. Technically it inserts it at the top of main, before any of my code, but I've seen compilers that use _init instead of _main as the entry point:
.cfi_startproc
.cfi_personality 0x3,__gxx_personality_v0
pushq %rbp
.cfi_def_cfa_offset 16
movq %rsp, %rbp
.cfi_offset 6, -16
.cfi_def_cfa_register 6
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
# My code follows

Anything that needs to run code to guarantee invariants for your code after main starts needs to run before main. Things like global iostreams, the C runtime library, OS bindings, etc.
Now whether you actually need to write code that does such things as well is what everyone else is answering.

If you have a library, it is very convenient to be able to initialise some data, create threads etc. before main() is invoked, and know that a desired state is achieved without burdening and trusting the client app to explicitly call some library initialisation and/or shutdown code. Superficially, this can be achieved by having a static object whose constructor and destructor performs the necessary operations. Unfortunately, multiple static objects in different translation units or libraries will have an undefined order of initialisation, so if they depend upon each other (worse yet, in a cyclic fashion), then they may still not have achieved their initialised state before a request comes in. Similarly, one static object may create threads and call services in another object that aren't yet threadsafe. So, a more structured approach with proper singleton instances and locks is needed for robustness in the face of arbitrary usage, and the whole thing looks much less appealing, though it may still be a net win in some cases.
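The "more structured approach" usually starts with the construct-on-first-use idiom: hide each static behind an accessor function so it is built the first time anything asks for it, which sidesteps the cross-translation-unit ordering problem (and since C++11 such initialization is thread-safe). A minimal sketch with hypothetical Config and Log services:

#include <iostream>
#include <string>

struct Config {
    std::string level = "debug";
};

// Construct-on-first-use: built on the first call, not at load time,
// so it is usable even from another translation unit's initializers.
// Since C++11 this local-static initialization is also thread-safe.
Config& config() {
    static Config c;
    return c;
}

struct Log {
    Log() { std::cout << "log level: " << config().level << '\n'; }
};

Log& log() {
    static Log l;   // safely depends on config(), whichever runs first
    return l;
}

int main() {
    log();
}

Cyclic dependencies between such accessors remain a bug, of course; the idiom fixes ordering, not cycles.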

Related

Is it bad practice to declare a C function as static if it can still be executed indirectly (through a callback function)?

I have a C module for an embedded system (foo.c, foo.h) that contains a function my_driver_fn() that is local in scope from an API perspective (e.g. not in foo's public header: any other code that uses its API via #include "foo.h" should not be allowed to call this function). Assume my_driver_fn() is reentrant.
However, foo uses a library libdostuff that needs to be initialized with a few user-supplied callback functions (architecture/hardware specific things) for it to work properly on any platform. In foo, my_driver_fn mentioned above would be one of the functions in question...needed by libdostuff, but not by anyone that uses foo.
Is it bad form, dangerous, disadvantageous, handicapping the compiler in any way, or undefined behavior that the compiler could capitalize on, for these callback functions (my_driver_fn()) to be declared as static within foo.c, given that their addresses are provided to libdostuff and they are "indirectly" called (though never directly)?
Note: I happen to be writing both foo and libdostuff, and I'm wondering whether it makes more sense for the user-supplied functions to be extern and purely resolved at link time, or passed into libdostuff via a callback table supplied to an initialization function (e.g. libdostuff_init(CallbackTable *user_callbacks), where CallbackTable has a function pointer that would be initialized to point to my_driver_fn).
In my opinion this is good practice. The static refers to the visibility of the name, and nothing else.
If the name does not need to be used by other translation units, then marking it static reduces the risk of a clash in the "namespace" of externally visible functions.
It is good practice and there are no negative side-effects like poorly-defined behavior.
There are many reasons for using static, like reducing namespace pollution and avoiding accidental/intentional calls. But the main reason is actually private encapsulation design and self-documenting code - to keep functions in the module they belong, so that the outside world need not worry about how and when to call them. They should only care about the external linkage functions you have declared in the public header file.
This is for example exactly how you design ISRs in embedded systems. They should always be declared static and placed inside the driver controlling the hardware that the ISR relates to. All communication with the ISR, including race condition protection, should be encapsulated in that driver. The calling application never speaks directly with the ISR and should not need to worry about re-entrancy etc. Same thing with DMA buffers.
As an example of this, I always design a cyclic timer driver for all my MCU projects, where the API to the caller lets them register simple callback functions. The timer driver then calls the callback when the timer elapses, from inside the ISR. This can then be used as a general-purpose timer for all low priority tasks that need a timer: delays, de-bouncing etc. The caller then only needs to responsibly keep the callback function minimal and specify a time in ms, but the caller doesn't know or care about the timer hardware, and the timer hardware knows nothing of the callback it executes. Loose coupling in both directions. This timer API also serves as a HAL, so you can port the code to another MCU without changing the caller code - only change the underlying driver.
It is good practice. It ensures the function can only be called by the code you have explicitly provided the callback to. If it had external linkage, anything could call it, whether it makes sense or not. It is clearly not impossible to abuse, but does make it more difficult to do accidentally.
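For the concrete setup in the question, here is a minimal single-file sketch of the callback-table approach (all names hypothetical, following the question's libdostuff_init example):

#include <cstdio>

/* --- what libdostuff.h might declare --- */
typedef struct {
    void (*driver_fn)(void);   /* user-supplied, platform-specific hook */
} CallbackTable;

static const CallbackTable *g_callbacks;   /* libdostuff's stored table */

void libdostuff_init(const CallbackTable *user_callbacks) {
    g_callbacks = user_callbacks;
}

void libdostuff_do_work(void) {
    g_callbacks->driver_fn();  /* indirect call: no external name needed */
}

/* --- foo.c --- */
/* static hides the NAME from other translation units; calling through
   a stored pointer to the function is still perfectly well-defined. */
static void my_driver_fn(void) {
    puts("my_driver_fn called through the callback table");
}

static const CallbackTable foo_callbacks = { my_driver_fn };

int main(void) {
    libdostuff_init(&foo_callbacks);
    libdostuff_do_work();
    return 0;
}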

llvm caller saved registers not reloaded after function call

I am generating code with a customized RISCV backend for llvm. When I compile with no optimization (-O0) the program is functional. When I compile with optimization (-O2), none of the temporary registers (caller saved) are being reloaded after calls. The optimized code does appear to have a good register allocation but it does things like loading a temporary register with the address of a class initializer and then calling the initializer several times without reloading the register.
load t1 %(foo)
jal t1
...
jal t1
...
jal t1
The .bc file looks basically the same as this pseudo-code, correctly declaring the live ranges (lifetime.start and lifetime.end) to span all of the function calls.
I think the strategy of the register allocator is to use the caller saved registers first, so I don't think this is an issue of incorrectly declaring my registers. In the older register allocators, I see where loadRegFromStackSlot() or assignVirt2StackSlot() functions are used to spill the registers requiring reloading after being clobbered. But in the greedy register allocator I don't see an obvious place where this occurs, so I don't know how to debug what my backend might be missing.
Perhaps I need to add another pass or a different pass to the code generator. Perhaps I have something wrong with my register declarations, but I don't think so. Any insight would be appreciated.
If you are using riscv/riscv-llvm from GitHub you may need to grab the following fix. I was seeing similar issues without it.
https://github.com/riscv/riscv-llvm/commit/ce8ac4e3ed8956a37bf330ee4a431c6fdfabac37
The following issue report indicates similar problems and one of the answers points to the above change:
https://github.com/riscv/riscv-llvm/issues/32

Static objects and singletons

I'm using boost singletons (from serialization).
For example, there are some classes which inherit boost::serialization::singleton. Each of them has a define like this near its definition (in the header file):
#define appManager ApplicationManager::get_const_instance()
class ApplicationManager: public boost::serialization::singleton<ApplicationManager> { ... };
And I have to call some method from that class each update (roughly every 17 ms), for example 200 times. So the code is like:
for (int i=0; i < 200; ++i)
appManager.get_some_var();
I looked at the function call stack with gprof and saw that boost::get_const_instance is called each time. Maybe the compiler will optimize this out in release mode?
My idea is to make some global variable like:
ApplicationManager &handle = ApplicationManager::get_const_instance();
And use handle, so it wouldn't call get_const_instance each time. Is that right?
Instead of using the Singleton anti-pattern, just use a global variable and be done with it. It's more honest.
The main benefit of Singleton is when you want lazy initialization, or more fine grained control over initialization order than a global variable would allow you. It doesn't look like either of these things are a concern for you, so just use a global.
Personally, I think designs with global variables or Singletons are almost certainly broken. But to each their own.
If you are bent on using a Singleton, the performance concern you raise is interesting, but likely not an issue as the function call overhead is probably less than 100ns. As was pointed out, you should profile. If it really concerns you a whole lot, store a local reference to the Singleton before the loop:
ApplicationManager &myAppManager = appManager;
for (int i=0; i < 200; ++i)
myAppManager.get_some_var();
BTW, using that #define in that way is a serious mistake. Almost all cases where you use the preprocessor for anything other than conditional compilation based on compile-time flags is probably a poor use. Boost does make extensive use of the pre-processor, but mostly to get around C++ limitations. Do not emulate it.
Lastly, that function is probably doing something important. One of the jobs of a get_instance method for Singletons is to avoid having multiple threads initialize the same Singleton at the same time. With global variables this shouldn't be an issue because they should be initialized before you've started any threads.
Is it really a problem? I mean, does your application really suffer from this behaviour?
I would despise such a solution because, in effect, you are countering one of the benefits of the Singleton pattern, namely avoiding global variables. If you want to use a global variable, then don't use a Singleton at all, right?
Yes, that is certainly a possible solution. I'm not entirely sure what boost is doing with its singleton behind the scenes; you can look that up yourself in the code.
The singleton pattern is just like creating a global object and accessing the global object, in most respects. There are some differences:
1) The singleton object instance is not created until it is first accessed, whereas the global object is created at program startup.
2) Because the singleton object is not created until it is first accessed, it is actually created when the program is running. Thus the singleton instance has access to other fully constructed objects in the program when the constructor is actually running.
3) Because you access the singleton through the getInstance() method (boost's get_const_instance method) there is a little bit of overhead for executing that method call.
So if you're not concerned about when the singleton is actually created, and can live with it being created at program startup, you could just go with a global variable and access that. If you really need the singleton created after the program starts up, then you need the singleton. In that case, you can grab and hold onto a reference to the object returned by get_const_instance() and use that reference.
Something that bit me in the past, though, that you should be aware of: you're actually getting a reference to an object that is owned by the singleton. You don't own that object.
1) Do not write code that would cause the destructor to execute (say, using a shared pointer on the returned reference), or write any other code that could cause the object to end up in a bad state.
2) In a multi-threaded app, take care to correctly lock fields in the object if the object may be used by more than one thread.
3) In a multi-threaded app, make sure that all threads that hold onto references to the object terminate before the program is unloaded. I've seen a case where the singleton's code resides in one DLL library; a thread that holds the reference lives in another DLL library. When the program ends, the thread was still active. The DLL holding the singleton's code was unloaded first; the thread that was still alive tried to do something to the singleton's object and caused a crash.
Singletons have their advantages in situations where you want to control the level of access to something at process or application scope, in a more elegant way than a global variable could achieve.
However, most singleton objects provided by a library will be designed to ensure some level of thread safety, and most likely access to the instance is locked via a mutex or other critical section of some kind, which can affect performance.
In the case of a game or 3d application where performance is key you may want to consider making your own lightweight singleton if thread safety is not a concern and gain some performance.
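A sketch of such a lightweight variant (hypothetical GameClock), assuming the instance is only touched from one thread, or is fully constructed before any other thread starts:

#include <cstdio>

// Deliberately minimal: no locking, no lazy construction, no atomics.
class GameClock {
public:
    static GameClock& instance() { return instance_; }
    void tick() { ++frames_; }
    unsigned long frames() const { return frames_; }
private:
    GameClock() = default;
    static GameClock instance_;   // eager: built during static initialization
    unsigned long frames_ = 0;
};

GameClock GameClock::instance_;   // define in exactly one .cpp file

int main() {
    GameClock::instance().tick();
    std::printf("%lu\n", GameClock::instance().frames());
}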

Can threads be safely created during static initialization?

At some point I remember reading that threads can't be safely created until the first line of main(), because compilers insert special code to make threading work that runs during static initialization time. So if you have a global object that creates a thread on construction, your program may crash. But now I can't find the original article, and I'm curious how strong a restriction this is -- is it strictly true by the standard? Is it true on most compilers? Will it remain true in C++0x? Is it possible for a standards conforming compiler to make static initialization itself multithreaded? (e.g. detecting that two global objects don't touch one another, and initializing them on separate threads to accelerate program startup)
Edit: To clarify, I'm trying to at least get a feel for whether implementations really differ significantly in this respect, or if it's something that's pseudo-standard. For example, technically the standard allows for shuffling the layout of members that belong to different access specifiers (public/protected/etc.). But no compiler I know of actually does this.
What you're talking about isn't strictly in the language but in the C Run Time Library (CRT).
For a start, if you create a thread using a native call such as CreateThread() on Windows then you can do it anywhere you'd like, because it goes straight to the OS with no intervention from the CRT.
The other option you usually have is to use _beginthread(), which is part of the CRT. There are some advantages to using _beginthread(), such as having a thread-safe errno. If you're going to create threads using _beginthread() during static initialization, there could be issues because the initializations needed by _beginthread() might not be in place yet.
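A minimal Windows-only sketch of the CRT route (the worker name is arbitrary):

#include <process.h>   // _beginthread (Windows CRT)
#include <cstdint>
#include <cstdio>

static void worker(void *arg) {
    std::printf("worker started with arg %p\n", arg);
}

int main() {
    // Unlike raw CreateThread(), _beginthread() also sets up per-thread
    // CRT state (errno and friends) before worker() runs.
    std::uintptr_t h = _beginthread(worker, 0, nullptr);
    if (h == static_cast<std::uintptr_t>(-1))
        std::perror("_beginthread");
    // A real program would synchronize with the thread before exiting;
    // omitted to keep the sketch short.
    return 0;
}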
This touches on the more general issue of what exactly happens before main() and in what order. Basically, the program's entry point function takes care of everything that needs to happen before main(). With Visual Studio you can actually look at this piece of code, which is in the CRT, and find out for yourself exactly what's going on there. The easiest way to get to that code is to set a breakpoint in your code and look at the stack frames above main().
The underlying problem is a Windows restriction on what you can and cannot do in DllMain. In particular, you're not supposed to create threads in DllMain. Static initialization often happens from DllMain. Then it follows logically that you can't create threads during static initialization.
As far as I can tell from reading the C++0x/1x draft, starting a thread prior to main() is fine, but still subject to the normal pitfalls of static initialization. A conforming implementation will have to make sure the code to initialize threading executes before any static or thread_local constructors do.
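In other words, a global like the following is legal under C++11, though the thread body must not touch other globals whose dynamic initialization may not have run yet (hypothetical names):

#include <iostream>
#include <thread>

// The constructor runs during dynamic initialization, i.e. typically
// before main(), and it starts a real thread.
struct BackgroundService {
    std::thread worker;
    BackgroundService()
        : worker([] { std::cout << "worker thread running\n"; }) {}
    ~BackgroundService() { worker.join(); }
};

BackgroundService service;   // beware: races with other static initializers

int main() {
    std::cout << "main\n";
}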

Why does volatile exist?

What does the volatile keyword do? In C++ what problem does it solve?
In my case, I have never knowingly needed it.
volatile is needed if you are reading from a spot in memory that, say, a completely separate process/device/whatever may write to.
I used to work with dual-port ram in a multiprocessor system in straight C. We used a hardware managed 16 bit value as a semaphore to know when the other guy was done. Essentially we did this:
void waitForSemaphore()
{
volatile uint16_t* semPtr = WELL_KNOWN_SEM_ADDR;/*well known address to my semaphore*/
while ((*semPtr) != IS_OK_FOR_ME_TO_PROCEED);
}
Without volatile, the optimizer sees the loop as useless (The guy never sets the value! He's nuts, get rid of that code!) and my code would proceed without having acquired the semaphore, causing problems later on.
volatile is needed when developing embedded systems or device drivers, where you need to read or write a memory-mapped hardware device. The contents of a particular device register could change at any time, so you need the volatile keyword to ensure that such accesses aren't optimised away by the compiler.
Some processors have floating point registers that have more than 64 bits of precision (e.g. 32-bit x86 without SSE, see Peter's comment). That way, if you run several operations on double-precision numbers, you actually get a higher-precision answer than if you were to truncate each intermediate result to 64 bits.
This is usually great, but it means that depending on how the compiler assigned registers and did optimizations you'll have different results for the exact same operations on the exact same inputs. If you need consistency then you can force each operation to go back to memory by using the volatile keyword.
It's also useful for some algorithms that make no algebraic sense but reduce floating point error, such as Kahan summation. Algebraically it's a no-op, so it will often get incorrectly optimized away unless some intermediate variables are volatile.
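A sketch of that use, with Kahan summation: the volatile intermediates keep the compensation step from being "simplified" to zero when the compiler is allowed to reassociate floating-point math (e.g. under -ffast-math; with strict IEEE semantics the trick is not needed):

#include <cstdio>

// Kahan (compensated) summation: c carries the low-order bits lost by
// each addition. The volatile intermediates force y and t to memory,
// so (t - sum) - y cannot be folded away algebraically.
double kahan_sum(const double *a, int n) {
    double sum = 0.0, c = 0.0;
    for (int i = 0; i < n; ++i) {
        volatile double y = a[i] - c;
        volatile double t = sum + y;
        c = (t - sum) - y;   // algebraically zero; numerically the error
        sum = t;
    }
    return sum;
}

int main() {
    double data[18];
    data[0] = 1e16;                               // one huge value...
    for (int i = 1; i < 18; ++i) data[i] = 1.0;   // ...then many small ones

    double naive = 0.0;
    for (int i = 0; i < 18; ++i) naive += data[i];

    // Naive summation rounds every 1.0 away; Kahan recovers them.
    std::printf("naive: %.1f\nkahan: %.1f\n", naive, kahan_sum(data, 18));
}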
From a "Volatile as a promise" article by Dan Saks:
"(...) a volatile object is one whose value might change spontaneously. That is, when you declare an object to be volatile, you're telling the compiler that the object might change state even though no statements in the program appear to change it."
Here are links to three of his articles regarding the volatile keyword:
Use volatile judiciously
Place volatile accurately
Volatile as a promise
You MUST use volatile when implementing lock-free data structures. Otherwise the compiler is free to optimize access to the variable, which will change the semantics.
To put it another way, volatile tells the compiler that accesses to this variable must correspond to a physical memory read/write operation.
For example, this is how InterlockedIncrement is declared in the Win32 API:
LONG __cdecl InterlockedIncrement(
__inout LONG volatile *Addend
);
A large application that I used to work on in the early 1990s contained C-based exception handling using setjmp and longjmp. The volatile keyword was necessary on variables whose values needed to be preserved in the block of code that served as the "catch" clause, lest those vars be stored in registers and wiped out by the longjmp.
In Standard C, one of the places to use volatile is with a signal handler. In fact, in Standard C, all you can safely do in a signal handler is modify a volatile sig_atomic_t variable, or exit quickly. Indeed, AFAIK, it is the only place in Standard C that the use of volatile is required to avoid undefined behaviour.
ISO/IEC 9899:2011 §7.14.1.1 The signal function
¶5 If the signal occurs other than as the result of calling the abort or raise function, the
behavior is undefined if the signal handler refers to any object with static or thread
storage duration that is not a lock-free atomic object other than by assigning a value to an
object declared as volatile sig_atomic_t, or the signal handler calls any function
in the standard library other than the abort function, the _Exit function, the
quick_exit function, or the signal function with the first argument equal to the
signal number corresponding to the signal that caused the invocation of the handler.
Furthermore, if such a call to the signal function results in a SIG_ERR return, the
value of errno is indeterminate.252)
252) If any signal is generated by an asynchronous signal handler, the behavior is undefined.
That means that in Standard C, you can write:
static volatile sig_atomic_t sig_num = 0;
static void sig_handler(int signum)
{
signal(signum, sig_handler);
sig_num = signum;
}
and not much else.
POSIX is a lot more lenient about what you can do in a signal handler, but there are still limitations (and one of the limitations is that the Standard I/O library — printf() et al — cannot be used safely).
Developing for an embedded system, I have a loop that checks a variable that can be changed in an interrupt handler. Without "volatile", the loop becomes a no-op: as far as the compiler can tell, the variable never changes, so it optimizes the check away.
The same thing would apply to a variable that may be changed by a different thread in a more traditional environment, but there we often make synchronization calls, so the compiler is not as free with optimization.
I've used it in debug builds when the compiler insists on optimizing away a variable that I want to be able to see as I step through code.
Besides using it as intended, volatile is used in (template) metaprogramming. It can be used to prevent accidental overloading, as the volatile attribute (like const) takes part in overload resolution.
#include <iostream>
#include <type_traits>

template <typename T>
class Foo {
    std::enable_if_t<sizeof(T) == 4, void> f(T& t)
    { std::cout << 1 << t; }
    void f(T volatile& t)
    { std::cout << 2 << const_cast<T&>(t); }
    void bar() { T t; f(t); }
};
This is legal; both overloads are potentially callable and do almost the same thing. The cast in the volatile overload is legal, as we know bar won't pass a non-volatile T anyway. The volatile version is strictly worse, though, so it is never chosen in overload resolution if the non-volatile f is available.
Note that the code never actually depends on volatile memory access.
You must use it to implement spinlocks as well as some (all?) lock-free data structures
Use it with atomic operations/instructions
It once helped me overcome a compiler bug (wrongly generated code during optimization)
The volatile keyword is intended to prevent the compiler from applying any optimisations on objects that can change in ways that cannot be determined by the compiler.
Objects declared as volatile are omitted from optimisation because their values can be changed at any time by code outside the scope of the current code. The system always reads the current value of a volatile object from its memory location rather than keeping the value in a temporary register at the point it is requested, even if a previous instruction asked for the value of the same object.
Consider the following cases:
1) Global variables modified by an interrupt service routine outside the scope.
2) Global variables within a multi-threaded application.
If we do not use the volatile qualifier, the following problems may arise:
1) Code may not work as expected when optimisation is turned on.
2) Code may not work as expected when interrupts are enabled and used.
Volatile: A programmer’s best friend
https://en.wikipedia.org/wiki/Volatile_(computer_programming)
Other answers already mention avoiding some optimization in order to:
use memory mapped registers (or "MMIO")
write device drivers
allow easier debugging of programs
make floating point computations more deterministic
Volatile is essential whenever you need a value to appear to come from the outside, unpredictably, defeating any compiler optimization based on the value being known. It also matters when a result isn't actually used but you need it to be computed, or when a result is used but you want to compute it several times for a benchmark and need the computations to start and end at precise points.
A volatile read is like an input operation (like scanf or a use of cin): the value seems to come from the outside of the program, so any computation that has a dependency on the value needs to start after it.
A volatile write is like an output operation (like printf or a use of cout): the value seems to be communicated outside of the program, so if the value depends on a computation, it needs to be finished before.
So a pair of volatile read/write can be used to tame benchmarks and make time measurement meaningful.
Without volatile, the compiler could start your computation earlier, as nothing would prevent it from reordering the computation with respect to functions such as the time measurement.
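A common pattern (sketch; work() is a stand-in): a volatile read forces the input to be fetched inside the timed region, and a volatile write forces the result to be materialized before the closing timestamp, so the compiler can neither precompute nor discard the work.

#include <chrono>
#include <cstdio>

volatile int input = 1000000;   // volatile read: value "arrives" at run time
volatile long long sink;        // volatile write: result must be produced

static long long work(int n) {
    long long s = 0;
    for (int i = 0; i < n; ++i) s += i;
    return s;
}

int main() {
    auto t0 = std::chrono::steady_clock::now();
    int n = input;    // can't be precomputed: n is not a known constant
    sink = work(n);   // can't be deleted: the result is observably "used"
    auto t1 = std::chrono::steady_clock::now();
    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0);
    std::printf("%lld ns\n", static_cast<long long>(ns.count()));
}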
All the answers are excellent. But on top of that, I would like to share an example.
Below is a small C++ program:
#include <cstdio>

int x;

int main() {
    char buf[50];
    x = 8;
    if (x == 8)
        printf("x is 8\n");
    else
        sprintf(buf, "x is not 8\n");
    x = 1000;
    while (x > 5)
        x--;
    return 0;
}
Now, let's generate the assembly for the above code (I will paste only the portions of the assembly that are relevant here):
The command to generate assembly:
g++ -S -O3 -c -fverbose-asm -Wa,-adhln assembly.cpp
And the assembly:
main:
.LFB1594:
subq $40, %rsp #,
.seh_stackalloc 40
.seh_endprologue
# assembly.cpp:5: int main(){
call __main #
# assembly.cpp:10: printf("x is 8\n");
leaq .LC0(%rip), %rcx #,
# assembly.cpp:7: x = 8;
movl $8, x(%rip) #, x
# assembly.cpp:10: printf("x is 8\n");
call _ZL6printfPKcz.constprop.0 #
# assembly.cpp:18: }
xorl %eax, %eax #
movl $5, x(%rip) #, x
addq $40, %rsp #,
ret
.seh_endproc
.p2align 4,,15
.def _GLOBAL__sub_I_x; .scl 3; .type 32; .endef
.seh_proc _GLOBAL__sub_I_x
You can see in the assembly that no code was generated for sprintf, because the compiler assumed that x will not change outside the program. The same is true of the while loop: it was removed altogether by the optimizer, which saw it as useless code and simply assigned 5 to x directly (see movl $5, x(%rip)).
The problem occurs if an external process or piece of hardware changes the value of x somewhere between x = 8; and if(x == 8). We would expect the else block to run, but unfortunately the compiler has trimmed that part out.
Now, to solve this, let us change int x; to volatile int x; in assembly.cpp and look at the generated assembly again:
main:
.LFB1594:
subq $104, %rsp #,
.seh_stackalloc 104
.seh_endprologue
# assembly.cpp:5: int main(){
call __main #
# assembly.cpp:7: x = 8;
movl $8, x(%rip) #, x
# assembly.cpp:9: if(x == 8)
movl x(%rip), %eax # x, x.1_1
# assembly.cpp:9: if(x == 8)
cmpl $8, %eax #, x.1_1
je .L11 #,
# assembly.cpp:12: sprintf(buf, "x is not 8\n");
leaq 32(%rsp), %rcx #, tmp93
leaq .LC0(%rip), %rdx #,
call _ZL7sprintfPcPKcz.constprop.0 #
.L7:
# assembly.cpp:14: x=1000;
movl $1000, x(%rip) #, x
# assembly.cpp:15: while(x > 5)
movl x(%rip), %eax # x, x.3_15
cmpl $5, %eax #, x.3_15
jle .L8 #,
.p2align 4,,10
.L9:
# assembly.cpp:16: x--;
movl x(%rip), %eax # x, x.4_3
subl $1, %eax #, _4
movl %eax, x(%rip) # _4, x
# assembly.cpp:15: while(x > 5)
movl x(%rip), %eax # x, x.3_2
cmpl $5, %eax #, x.3_2
jg .L9 #,
.L8:
# assembly.cpp:18: }
xorl %eax, %eax #
addq $104, %rsp #,
ret
.L11:
# assembly.cpp:10: printf("x is 8\n");
leaq .LC1(%rip), %rcx #,
call _ZL6printfPKcz.constprop.1 #
jmp .L7 #
.seh_endproc
.p2align 4,,15
.def _GLOBAL__sub_I_x; .scl 3; .type 32; .endef
.seh_proc _GLOBAL__sub_I_x
Here you can see that code was generated for sprintf, printf, and the while loop. The advantage is that if the x variable is changed by some external program or hardware, the sprintf part of the code will be executed. Similarly, the while loop can now be used for busy-waiting.
Besides telling the compiler not to optimize accesses to a variable (one that can be modified by a thread or an interrupt routine), the volatile keyword can also be used to work around compiler bugs -- yes, it can.
For example, I worked on an embedded platform where the compiler was making wrong assumptions about the value of a variable. Without optimization the program ran fine; with optimizations (which were really needed, because it was a critical routine) the code didn't work correctly. The only solution (though not very correct) was to declare the 'faulty' variable as volatile.
Does your program seem to work even without the volatile keyword? Perhaps this is the reason:
As mentioned previously, the volatile keyword helps in cases like
volatile int* p = ...; // point to some memory
while( *p!=0 ) {} // loop until the memory becomes zero
But there seems to be almost no effect once an external or non-inline function is called. E.g.:
while( *p!=0 ) { g(); }
Then with or without volatile almost the same result is generated.
As long as g() can be completely inlined, the compiler can see everything that's going on and can therefore optimize. But when the program makes a call to a place where the compiler can't see what's going on, it isn't safe for the compiler to make any assumptions any more. Hence the compiler will generate code that always reads from memory directly.
But beware of the day when your function g() becomes inlined (either due to explicit changes or due to compiler/linker cleverness); then your code might break if you forgot the volatile keyword!
Therefore I recommend adding the volatile keyword even if your program seems to work without it. It makes the intention clearer and the code more robust with respect to future changes.
In the early days of C, compilers would interpret all actions that read and write lvalues as memory operations, to be performed in the same sequence as the reads and writes appeared in the code. Efficiency could be greatly improved in many cases if compilers were given a certain amount of freedom to re-order and consolidate operations, but there was a problem with this. Even though operations were often specified in a certain order merely because it was necessary to specify them in some order, and thus the programmer picked one of many equally-good alternatives, that wasn't always the case. Sometimes it would be important that certain operations occur in a particular sequence.
Exactly which details of sequencing are important will vary depending upon the target platform and application field. Rather than provide particularly detailed control, the Standard opted for a simple model: if a sequence of accesses is done with lvalues that are not qualified volatile, a compiler may reorder and consolidate them as it sees fit. If an action is done with a volatile-qualified lvalue, a quality implementation should offer whatever additional ordering guarantees might be required by code targeting its intended platform and application field, without requiring that programmers use non-standard syntax.
Unfortunately, rather than identify what guarantees programmers would need, many compilers have opted instead to offer the bare minimum guarantees mandated by the Standard. This makes volatile much less useful than it should be. On gcc or clang, for example, a programmer needing to implement a basic "hand-off mutex" [one where a task that has acquired and released a mutex won't do so again until the other task has done so] must do one of four things:
Put the acquisition and release of the mutex in a function that the compiler cannot inline, and to which it cannot apply Whole Program Optimization.
Qualify all the objects guarded by the mutex as volatile--something which shouldn't be necessary if all accesses occur after acquiring the mutex and before releasing it.
Use optimization level 0 to force the compiler to generate code as though all objects that aren't qualified register are volatile.
Use gcc-specific directives.
By contrast, when using a higher-quality compiler which is more suitable for systems programming, such as icc, one would have another option:
Make sure that a volatile-qualified write gets performed everyplace an acquire or release is needed.
Acquiring a basic "hand-off mutex" requires a volatile read (to see if it's ready), and shouldn't require a volatile write as well (the other side won't try to re-acquire it until it's handed back), but having to perform a meaningless volatile write is still better than any of the options available under gcc or clang.
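For concreteness, a sketch of the hand-off pattern being described, with the two sides ping-ponging ownership through a volatile flag. Note that this leans on exactly the implementation-specific ordering guarantees the answer is arguing for; portable modern code would use std::atomic with acquire/release ordering instead.

#include <cstdio>

// 0 = side A owns the shared data, 1 = side B owns it. Each side only
// writes the flag to hand ownership over, and only reads it to wait for
// ownership to come back.
static volatile int owner = 0;
static int shared_data = 0;   // guarded by the hand-off protocol

static void side_a_step(void) {
    while (owner != 0) { /* spin: wait for B to hand back */ }
    ++shared_data;            // A owns the data here
    owner = 1;                // volatile write: hand off to B
}

static void side_b_step(void) {
    while (owner != 1) { /* spin: wait for A to hand off */ }
    printf("%d\n", shared_data); // B owns the data here
    owner = 0;                   // volatile write: hand back to A
}

int main(void) {
    // Single-threaded demonstration of the protocol; in real use the two
    // steps would run on different threads (or a thread and an ISR).
    side_a_step();
    side_b_step();
    return 0;
}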
One use I should mention: in a signal handler function, if you want to access or modify a global variable (for example, to mark it as exit = true), you have to declare that variable as volatile.
I would like to quote Herb Sutter's words from his GotW #95, which can help in understanding the meaning of volatile variables:
C++ volatile variables (which have no analog in languages like C# and Java) are always beyond the scope of this and any other article about the memory model and synchronization. That’s because C++ volatile variables aren’t about threads or communication at all and don’t interact with those things. Rather, a C++ volatile variable should be viewed as a portal into a different universe beyond the language — a memory location that by definition does not obey the language’s memory model, because that memory location is accessed by hardware (e.g., written to by a daughter card), has more than one address, or is otherwise “strange” and beyond the language. So C++ volatile variables are universally an exception to every guideline about synchronization, because they are always inherently “racy” and unsynchronizable using the normal tools (mutexes, atomics, etc.) and more generally exist outside all normal rules of the language and compiler, including that they generally cannot be optimized by the compiler (because the compiler isn’t allowed to know their semantics; a volatile int vi; may not behave anything like a normal int, and you can’t even assume that code like vi = 5; int read_back = vi; is guaranteed to result in read_back == 5, or that code like int i = vi; int j = vi; that reads vi twice will result in i == j, which will not be true if vi is a hardware counter, for example).
C++ volatile variables (which have no analog in languages like C# and Java) are always beyond the scope of this and any other article about the memory model and synchronization. That’s because C++ volatile variables aren’t about threads or communication at all and don’t interact with those things. Rather, a C++ volatile variable should be viewed as portal into a different universe beyond the language — a memory location that by definition does not obey the language’s memory model because that memory location is accessed by hardware (e.g., written to by a daughter card), have more than one address, or is otherwise “strange” and beyond the language. So C++ volatile variables are universally an exception to every guideline about synchronization because are always inherently “racy” and unsynchronizable using the normal tools (mutexes, atomics, etc.) and more generally exist outside all normal of the language and compiler including that they generally cannot be optimized by the compiler (because the compiler isn’t allowed to know their semantics; a volatile int vi; may not behave anything like a normal int, and you can’t even assume that code like vi = 5; int read_back = vi; is guaranteed to result in read_back == 5, or that code like int i = vi; int j = vi; that reads vi twice will result in i == j which will not be true if vi is a hardware counter for example).