Use of global variables in C++ application - c++

I’ve used global variables without having any noticeable problems but would like to know if there are potential problems or drawbacks with my use of globals.
In the first scenario, I put const globals in a globals.h file and then include that header in the implementation files that need access to any of them:
globals.h
const int MAX_URL_LEN = 100;
const int MAX_EMAIL_LEN = 50;
…
In the second scenario, I declare and initialize the globals in an implementation file when the application executes. These globals are never modified again. When I need access to these globals from a different implementation file, I use the extern keyword:
main.cpp
char application_path[128];
char data_path[128];
// assign data to globals
strcpy(application_path, get_dll_path().c_str());
…
do_something.cpp
extern char application_path[]; // global is now accessible in do_something.cpp
Regarding the first scenario above, I’ve considered removing all of the different “include globals.h” and using extern where access to those globals is needed but have not done so since just including the globals.h is so convenient.
I am concerned that I will have different versions of the variables for each implementation file that includes globals.h.
Should I use extern instead of including the globals.h everywhere access is needed?
Please advise, and thank you.

Global mutable variables provide invisible lines of influence across all of the code, and you cannot rely on their values, or on whether they've been initialized.
That is, global mutable variables do for data flow what the global goto once did for execution flow, creating a spaghetti mess, wasting everyone's time.
Constant global variables are more OK, but even for those you run into the initialization order fiasco.
I remember how angry I got when I realized that all my troubles in wrapping a well-known GUI framework were due to it needlessly using global variables and provoking the initialization order fiasco. First the anger was directed at the author, then at myself for being so stupid, not realizing what was going on (or rather, was not going on). Anyway.
A sensible solution to all this is Meyers' singletons, like
inline
auto pi_decimal_digits()
-> const string&
{
static const string the_value = compute_pi_digits();
return the_value;
}
For the case of a global that's dynamically initialized from some place that knows the value, “one programmer's constant is another programmer's variable”, there is no good solution, but one practical solution is to accept the possibility of a run time error and at least detect it:
namespace detail {
inline
auto mutable_pi_digits()
-> string&
{
static string the_value;
return the_value;
}
} // namespace detail
inline
void set_pi_digits( const string& value )
{
string& digits = detail::mutable_pi_digits();
assert( digits.length() == 0 );
digits = value;
}
inline
auto pi_digits()
-> const string&
{ return detail::mutable_pi_digits(); }
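Applied to the application_path global from the question, the same set-once pattern might look like the sketch below. The names set_application_path, application_path and detail::mutable_application_path are made up for illustration, and using std::string instead of the fixed char array is an assumption.

```cpp
#include <cassert>
#include <string>

namespace detail {
// Hypothetical helper: the only place that can modify the stored path.
inline std::string& mutable_application_path()
{
    static std::string the_value; // constructed on first call
    return the_value;
}
} // namespace detail

// Call once at startup; a second call trips the assert in debug builds.
inline void set_application_path(const std::string& value)
{
    std::string& path = detail::mutable_application_path();
    assert(path.empty());
    path = value;
}

// Read-only access for every other file.
inline const std::string& application_path()
{
    return detail::mutable_application_path();
}
```

Other implementation files then call application_path() instead of declaring the array extern, so nothing outside main.cpp can overwrite it by accident.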

Your implementation is fine for now. Globals become a problem when:
1. Your program grows, and so does your number of globals.
2. New people join the team who don't know what you were thinking.
Number 1 becomes particularly troublesome when your program becomes multi-threaded. Then you have a number of threads using the same data and you may require protection, which is difficult with just a list of globals.
By grouping data in separate files according to some criteria such as purpose or subject matter your code becomes more maintainable as it grows and you leave breadcrumbs for new programmers on the project to figure out how the software works.
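One way to picture the grouping, as a rough sketch only: related values live in one class that owns the lock protecting them. AppSettings and its members are hypothetical names, not something from the question.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical group of related settings guarded by a single mutex.
class AppSettings {
public:
    void set_data_path(const std::string& path)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        data_path_ = path;
    }

    // Returns a copy so the caller never touches the shared string unlocked.
    std::string data_path() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_path_;
    }

private:
    mutable std::mutex mutex_; // mutable so const readers can lock it
    std::string data_path_;
};
```

With a bare list of globals there is no obvious place to put the mutex; here every access has to go through the two member functions.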

One issue with globals is that when you go to include 3rd party libraries in your code, sometimes they've used globals with the same names as yours. There are definitely times when a global makes sense, but if possible you should also take care to do something like put it into a namespace.
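As a sketch of the namespace suggestion applied to the globals.h from the question (app_limits and url_fits are made-up names): a const or constexpr variable at namespace scope has internal linkage in C++, so each file that includes the header gets its own private copy and the linker never sees duplicate definitions. For small integer constants that is harmless.

```cpp
#include <cassert>

// globals.h (sketch)
#ifndef GLOBALS_H
#define GLOBALS_H
namespace app_limits {
constexpr int max_url_len = 100;
constexpr int max_email_len = 50;
} // namespace app_limits
#endif

// Any .cpp file that includes globals.h can use the constants directly.
bool url_fits(int length)
{
    return length <= app_limits::max_url_len;
}
```

From C++17 on, declaring them inline constexpr instead makes every translation unit share a single entity.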

Related

Refactoring large case statement, externs, static locals

I'm trying to do some refactoring and wish to figure out the best path forward.
I have
myonce{
static int i //for operation 1
switch(commandid) {
case 1: operation 1
i = 1;
...
where myonce is a function that is called in a loop. This is not my code, I'm trying to make it better. Operation 1 (or each case) is a series of commands, and I want to put them in their own translation units (one function per file).
Since myonce runs in a loop, the original author has many static variables that he uses to keep state, some of these state sets are used across multiple operations. Note that these are not static file scope, they are static block scope.
To keep things simple, as a proof of concept, I want to know if the following is possible.
Consider 1 operation with 1 set of static vars.
main.cpp
myonce {
static int i //for op 1
switch(commandid) {
case 1: operation1();
operation1.cpp
extern int i;
void operation1() {
i = 1;
}
In the case of multiple operations using the same sets of state, I would make a header to declare them all extern.
Currently, compiling this file takes minutes, and my first goal is to break it up into smaller compilation units so that the author can work more freely. This refactoring will take a long time, but I mention it to explain my motivation for this approach.
I understand that a static file scope variable is not accessible to other translation units (extern in other files), so I wish to distinguish that this is not the case I'm handling. What I don't know at the moment, is where I should declare operation1() to main, should it be
static int i
extern void operation1();
So that int is declared as visible to the function?
I would appreciate any pointers in this regard. Thanks.
Put the state variables into a struct. Pass this struct to each function.
Example.
// foo.h
struct TheState
{
int x;
char *y;
// ...
};
void func1(TheState &);
void func2(TheState &);
// main.cc
#include "foo.h"
void main_loop()
{
TheState the_state; // initialize this however you want
for (;;)
{
if ( blah) func1(the_state);
else func2(the_state);
}
}
// func1.cc
#include "foo.h"
void func1(TheState &the_state)
{
++the_state.x;
}
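No, you can't do that. A block-scope static variable has no linkage, so it can never be named from another source file; an extern int i; declaration in operation1.cpp would refer to a different, namespace-scope i.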
How large is your switch anyway? And what is the reason for modifying it?
Perhaps the original programmer had good reasons for the local, static variables? You say it is called in a loop, and some of the static variables are used to keep state from one iteration to the next, shared among branches of the switch. It is certainly a weird way to structure the code. I can think of doing something like this to run some sort of finite automaton, but in that case I'd write the automaton as a string of snippets of code for each state, and transfer among them by straight gotos. I'd make certain somewhere very near there is a description of the automaton in a more readable form.
But I might be totally off-base. Can you share a bit more about what this code does?
First, switches often are avoidable by creating better data structures with their functions (e.g. classes with a virtual member function command whose implementations do the right thing).
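A minimal sketch of that idea, assuming the shared state has been gathered into a struct. State, Command, Operation1, Operation2, make_command_table and dispatch are all hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <memory>

struct State { int i = 0; };

// One class per former case label.
class Command {
public:
    virtual ~Command() = default;
    virtual void run(State& state) = 0;
};

class Operation1 : public Command {
public:
    void run(State& state) override { state.i = 1; }
};

class Operation2 : public Command {
public:
    void run(State& state) override { state.i += 10; }
};

// Built once; replaces the switch on commandid.
std::map<int, std::unique_ptr<Command>> make_command_table()
{
    std::map<int, std::unique_ptr<Command>> table;
    table[1] = std::unique_ptr<Command>(new Operation1);
    table[2] = std::unique_ptr<Command>(new Operation2);
    return table;
}

// Returns false for an unknown command id.
bool dispatch(const std::map<int, std::unique_ptr<Command>>& table,
              int command_id, State& state)
{
    auto it = table.find(command_id);
    if (it == table.end()) return false;
    it->second->run(state);
    return true;
}
```

Each command class can then live in its own file, which also serves the compile-time goal.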
On a less ambitious level you could just pass pointers to the statics which are needed in that particular case to the function so that it can read and modify the state of those variables.
Depending on what the functions do, one could also pass state information as value parameters (copies), let the function do their work depending on that state, receive the results and THEN change global state in the main switch according to the result. The state change then is clearly visible (i.e. no side effects in the functions) and the noisy distracting details are banned to another file.
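The value-parameter variant could be sketched like this. Op1Result, its done flag and the "add one" logic are invented purely for illustration; only myonce, operation1 and i come from the question.

```cpp
#include <cassert>

struct Op1Result {
    int new_i; // value the caller should store back into its state
    bool done; // hypothetical flag reported to the caller
};

// No side effects: reads a copy of the state and returns the outcome.
Op1Result operation1(int i)
{
    Op1Result result;
    result.new_i = i + 1;
    result.done = result.new_i >= 3;
    return result;
}

// Stand-in for the original function; the static stays where it was.
bool myonce(int command_id)
{
    static int i = 0; // for operation 1
    switch (command_id) {
    case 1: {
        const Op1Result result = operation1(i);
        i = result.new_i; // the only place the state changes
        return result.done;
    }
    default:
        return false;
    }
}
```

The static never leaves myonce, so operation1 can move to its own file without any extern declarations.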
If each case tends to use many of the static variables then you could put them all in a struct; that change should be doable with a text editor (replace variable name x with mystruct.x etc.). Then each function just gets a pointer to that struct. EDIT: As I said in a comment: Perhaps the commands naturally form groups which are concerned with only parts of the state (e.g. there are commands which only read, others which only write data etc.). Then the global state could be split in corresponding groups of data. Each function only gets to see the data group which concerns it, which limits potential side effects.
But generally speaking, the function as it is now seems badly designed or grown over time; working on a large set of static variables means having "side effects" all over the code -- it's not easy to see what any given portion of code does and how it interacts with others. The information flow is not explicit. Analyzing clusters of data which belong together, organizing them in classes and separating them into files would be one task here, even without any virtual member functions.
As to your last question: The "case functions" you create (operation1(); etc.) need only be known in the file which calls them. If they are in one or several separate files you should create a header containing the prototypes.

Try to find global variables from compiled files. The program can't distinguish constants from global variables.

Good day! I've been trying to find a solution for a long time.
My problem is:
For example I have 2 .cpp files, one of them containing
const std::string DICTIONARY_DEFAULT = "blah";
const std::string ADDTODICTIONARY_DEFAULT = "blah";
const std::string BUTTONS = "blah";
and the second one with
static int x1;
static int NewY1, NewY2, NewX1, NewX2;
Both fragments are in the global variables section. I need to print the global static variables (for example), but ignore constants. In nm output they look absolutely identical (type b in every case, which means an uninitialized local-scope symbol). Is there any way to separate these cases automatically using only Linux utilities (grep, regexps and so on are perfectly okay)?
MY TASK FOR BETTER UNDERSTANDING:
There is a program in C++; its main task is to find and output the list of global variables.
The input data are archives with lots of .cpp files. Every .cpp file is a syntactically correct C++ program (it must compile successfully with both the GNU C++ compiler and Microsoft Visual C++).
For every file from the archive I must output, on a separate line, the name of the file and the list of its global variables, as in the example:
Output Data :
000000.cpp ancestor ansv cost graph M N p qr query u
000001.cpp
000002.cpp
000003.cpp
000004.cpp
000005.cpp
000006.cpp
000007.cpp edge tree
Finding global variables is the 'subject' of this clang tutorial. The author did it 'just for fun', but you can add some code to do exactly what you need (by the way, it is not as hard as one may guess :)).
Short answer: There is actually no way to do it in every case
Long answer: Take a look at the symbol table using 'objdump -x file.o'. You can see that all of these global variables, both the static ones and the const ones, are allocated in a section called .bss. A section called .rodata also exists and is, generally speaking, used to store const data. Unfortunately, in your case you are declaring three const std::string objects. Those objects are initialized by invoking their constructors before the 'main' function is run. The initialization of their fields therefore happens at run time, so they are only 'logically' const, not really const.
The compiler has no choice but to allocate them into the .bss section with all other globals.
If you add the following line
const int willBeInRoData = 42;
You will find that its symbol will be in the .rodata section and so it will be distinguishable from the other global integers.

Unnecessary curly braces in C++

When doing a code review for a colleague today I saw a peculiar thing. He had surrounded his new code with curly braces like this:
Constructor::Constructor()
{
// Existing code
{
// New code: do some new fancy stuff here
}
// Existing code
}
What is the outcome, if any, from this? What could be the reason for doing this? Where does this habit come from?
The environment is embedded devices. There is a lot of legacy C code wrapped in C++ clothing. There are a lot of C turned C++ developers.
There are no critical sections in this part of the code. I have only seen it in this part of the code. There are no major memory allocations done, just some flags that are set, and some bit twiddling.
The code that is surrounded by curly braces is something like:
{
bool isInit;
(void)isStillInInitMode(&isInit);
if (isInit) {
return isInit;
}
}
(Don't mind the code, just stick to the curly braces... ;) )
After the curly braces there are some more bit twiddling, state checking, and basic signaling.
I talked to the guy and his motivation was to limit the scope of variables, naming clashes, and some other that I couldn't really pick up.
From my point of view this seems rather strange and I don't think that the curly braces should be in our code. I saw some good examples in all the answers on why one could surround code with curly braces, but shouldn't you separate the code into methods instead?
It's sometimes nice since it gives you a new scope, where you can more "cleanly" declare new (automatic) variables.
In C++ this is maybe not so important since you can introduce new variables anywhere, but perhaps the habit is from C, where you could not do this until C99. :)
Since C++ has destructors, it can also be handy to have resources (files, mutexes, or whatever) automatically released as the scope exits, which can make things cleaner. This means you can hold on to some shared resource for a shorter duration than you would if you grabbed it at the start of the method.
One possible purpose is to control variable scope. And since variables with automatic storage are destroyed when they go out of scope, this can also enable a destructor to be called earlier than it otherwise would.
The extra braces are used to define the scope of the variable declared inside the braces. It is done so that the destructor will be called when the variable goes out of scope. In the destructor, you may release a mutex (or any other resource) so that other could acquire it.
In my production code, I've written something like this:
void f()
{
// Some code - MULTIPLE threads can execute this code at the same time
{
scoped_lock lock(mutex); // Critical section starts here
// Critical section code
// EXACTLY ONE thread can execute this code at a time
} // The mutex is automatically released here
// Other code - MULTIPLE threads can execute this code at the same time
}
As you can see, in this way, you can use scoped_lock in a function and at the same time, can define its scope by using extra braces. This makes sure that even though the code outside the extra braces can be executed by multiple threads simultaneously, the code inside the braces will be executed by exactly one thread at a time.
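For reference, a self-contained sketch of the same shape using std::lock_guard from the standard library (scoped_lock in the snippet above is presumably a similar RAII lock type). record_doubled and its parameters are hypothetical.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Does the thread-local work first, then holds the lock only for the push.
int record_doubled(std::mutex& mutex, std::vector<int>& shared, int value)
{
    const int doubled = value * 2; // no shared data touched, no lock needed

    {
        std::lock_guard<std::mutex> lock(mutex); // critical section starts
        shared.push_back(doubled);
    } // lock destroyed here, mutex released

    return doubled; // runs without holding the mutex
}
```

Without the inner braces the lock would be held until the function returns.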
As others have pointed out, a new block introduces a new scope, enabling one to write a bit of code with its own variables that don't trash the namespace of the surrounding code, and doesn't use resources any longer than necessary.
However, there's another fine reason for doing this.
It is simply to isolate a block of code that achieves a particular (sub)purpose. It is rare that a single statement achieves a computational effect I want; usually it takes several. Placing those in a block (with a comment) allows me tell the reader (often myself at a later date):
This chunk has a coherent conceptual purpose
Here's all the code needed
And here's a comment about the chunk.
e.g.
{ // update the moving average
i = (i + 1) % ARRAYSIZE;
sum = sum - A[i];
A[i] = new_value;
sum = sum + new_value;
average = sum / ARRAYSIZE ;
}
You might argue I should write a function to do all that. If I only do it once, writing a function just adds additional syntax and parameters; there seems little point. Just think of this as a parameterless, anonymous function.
If you are lucky, your editor will have a fold/unfold function that will even let you hide the block.
I do this all the time. It is great pleasure to know the bounds of the code I need to inspect, and even better to know that if that chunk isn't the one I want, I don't have to look at any of the lines.
One reason could be that the lifetime of any variables declared inside the new curly braces block is restricted to this block. Another reason that comes to mind is to be able to use code folding in the favourite editor.
This is the same as an if (or while, etc.) block, just without if. In other words, you introduce a scope without introducing a control structure.
This "explicit scoping" is typically useful in following cases:
To avoid name clashes.
To scope using.
To control when the destructors are called.
Example 1:
{
auto my_variable = ... ;
// ...
}
// ...
{
auto my_variable = ... ;
// ...
}
If my_variable happens to be a particularly good name for two different variables that are used in isolation from each other, then explicit scoping allows you to avoid inventing a new name just to avoid the name clash.
This also allows you to avoid using my_variable out of its intended scope by accident.
Example 2:
namespace N1 { class A { }; }
namespace N2 { class A { }; }
void foo() {
{
using namespace N1;
A a; // N1::A.
// ...
}
{
using namespace N2;
A a; // N2::A.
// ...
}
}
Practical situations when this is useful are rare and may indicate the code is ripe for refactoring, but the mechanism is there should you ever genuinely need it.
Example 3:
{
MyRaiiClass guard1 = ...;
// ...
{
MyRaiiClass guard2 = ...;
// ...
} // ~MyRaiiClass for guard2 called.
// ...
} // ~MyRaiiClass for guard1 called.
This can be important for RAII in cases when the need for freeing resources does not naturally "fall" onto boundaries of functions or control structures.
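To make the destruction order of Example 3 observable, here is a small sketch in which a hypothetical Guard class records its construction and destruction in a log. Guard and run_nested_scopes are made-up names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical RAII type that logs when it is acquired and released.
struct Guard {
    std::vector<std::string>& log;
    std::string name;

    Guard(std::vector<std::string>& l, std::string n)
        : log(l), name(std::move(n))
    {
        log.push_back("acquire " + name);
    }
    ~Guard() { log.push_back("release " + name); }

    Guard(const Guard&) = delete;
    Guard& operator=(const Guard&) = delete;
};

std::vector<std::string> run_nested_scopes()
{
    std::vector<std::string> log;
    {
        Guard guard1(log, "guard1");
        {
            Guard guard2(log, "guard2");
        } // ~Guard for guard2 runs here
        log.push_back("between");
    } // ~Guard for guard1 runs here
    return log;
}
```

The returned log reads: acquire guard1, acquire guard2, release guard2, between, release guard1.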
Everyone else has already correctly covered the scoping, RAII etc. possibilities, but since you mention an embedded environment, there is one further potential reason:
Maybe the developer doesn't trust this compiler's register allocation or wants to explicitly control the stack frame size by limiting the number of automatic variables in scope at once.
Here isInit will likely be on the stack:
{
bool isInit;
(void)isStillInInitMode(&isInit);
if (isInit) {
return isInit;
}
}
If you take out the curly braces, space for isInit may be reserved in the stack frame even after it could potentially be reused: if there are lots of automatic variables with similarly localized scope, and your stack size is limited, that could be a problem.
Similarly, if your variable is allocated to a register, going out of scope should provide a strong hint that register is now available for re-use. You'd have to look at the assembler generated with and without the braces to figure out if this makes a real difference (and profile it - or watch for stack overflow - to see if this difference really matters).
This is really useful when using scoped locks in conjunction with critical sections in multithreaded programming. Your scoped lock initialised in the curly braces (usually the first command) will go out of scope at the end of the block, and so other threads will be able to run again.
I think others have covered scoping already, so I'll mention the unnecessary braces might also serve purpose in the development process. For example, suppose you are working on an optimization to an existing function. Toggling the optimization or tracing a bug to a particular sequence of statements is simple for the programmer -- see the comment prior to the braces:
// if (false) or if (0)
{
//experimental optimization
}
This practice is useful in certain contexts like debugging, embedded devices, or personal code.
I agree with ruakh. If you want a good explanation of the various levels of scope in C, check out this post:
Various Levels of Scope in C Application
In general, the use of "Block scope" is helpful if you want to just use a temporary variable that you don't have to keep track of for the lifetime of the function call. Additionally, some people use it so you can use the same variable name in multiple locations for convenience, though that's not generally a good idea. E.g.:
int unusedInt = 1;
int main(void) {
int k;
for(k = 0; k<10; k++) {
int returnValue = myFunction(k);
printf("returnValue (int) is: %d (k=%d)",returnValue,k);
}
for(k = 0; k<100; k++) {
char returnValue = myCharacterFunction(k);
printf("returnValue (char) is: %c (k=%d)",returnValue,k);
}
return 0;
}
In this particular example, I have defined returnValue twice, but since it is just at block scope, instead of function scope (i.e., function scope would be, for example, declaring returnValue just after int main(void)), I don't get any compiler errors, as each block is oblivious to the temporary instance of returnValue declared.
I can't say that this is a good idea in general (i.e., you probably shouldn't reuse variable names repeatedly from block-to-block), but in general, it saves time and lets you avoid having to manage the value of returnValue across the entire function.
Finally, please note the scope of the variables used in my code sample:
int: unusedInt: File and global scope (if this were a static int, it would only be file scope)
int: k: Function scope
int: returnValue: Block scope
char: returnValue: Block scope
So, why to use "unnecessary" curly braces?
For "Scoping" purposes (as mentioned above)
Making code more readable in a way (pretty much like using #pragma, or defining "sections" that can be visualized)
Because you can. Simple as that.
P.S. It's not BAD code; it's 100% valid. So, it's rather a matter of (uncommon) taste.
After viewing the code in the edit, I can say that the unnecessary braces are probably there (in the original coder's view) to make it 100% clear what will happen during the if/then. Even though it is only one line now, it might be more lines later, and the braces guarantee you won't make an error.
{
bool isInit;
(void)isStillInInitMode(&isInit);
if (isInit) {
return isInit;
}
return -1;
}
If the above was the original, removing the "extras" would result in:
{
bool isInit;
(void)isStillInInitMode(&isInit);
if (isInit)
return isInit;
return -1;
}
then, a later modification might look like this:
{
bool isInit;
(void)isStillInInitMode(&isInit);
if (isInit)
CallSomethingNewHere();
return isInit;
return -1;
}
and that would, of course, cause an issue, since now isInit would always be returned, regardless of the if/then.
Objects are automagically destroyed when they go out of scope...
Another example of usage is UI-related classes, especially Qt.
For example, you have some complicated UI and a lot of widgets, each of them got its own spacing, layout, etc. Instead of naming them space1, space2, spaceBetween, layout1, ... you can save yourself from non-descriptive names for variables that exist only in two-three lines of code.
Well, some might say that you should split it in methods, but creating 40 non-reusable methods doesn't look ok - so I decided to just add braces and comments before them, so it looks like logical block.
Example:
// Start video button
{
<Here goes the code >
}
// Stop video button
{
<...>
}
// Status label
{
<...>
}
I can't say that's the best practice, but it's a good one for legacy code.
We got these problems when a lot of people added their own components to the UI and some methods became really massive, but it's not practical to create 40 one-time-use methods inside a class that is already messed up.

How local constants are stored in c++ library files

I am writing a library where I need to use some constant integers. I have declared a constant int as a local variable in my C function, e.g. const int test = 45325;
Now I want to hide this constant. That is, if I share this library as a .so with someone, they should not be able to find out the constant's value.
Is it possible to hide constant integers defined inside a library? Please help.
Here is my sample code
int doSomething()
{
const int abc = 23456;
int def = abc + 123;
return def; // a non-void function must return a value
}
doSomething is defined as local function in my cpp file. I am referring this constant for some calculations inside the same function.
If I understand right, you're not so much worried about an exported symbol (since it's a plain normal local variable, I'd not worry about that anyway), but about anyone finding out that constant at all (probably because it is an encryption key or a magic constant for a license check, or something like that).
This is something that is, in principle, impossible. Someone who has the binary code (which is necessarily the case in a library) can figure it out if he wants to. You can make it somewhat harder by calculating this value in an obscure way (but be aware of compiler optimizations), but even so this only makes it trivially harder for someone who wants to find out. It will just mean that someone won't see "mov eax, 45325" in the disassembly right away, but it probably won't keep someone busy for more than a few minutes either way.
The constant will always be contained in the library in some form, even if it is as instructions to load it into a register, for the simple reason that the library needs it at runtime to work with it.
If this is meant as some sort of a secret key, there is no good way to protect it inside the library (in fact, the harder you make it, the more people will consider it a sport to find it).
The simplest is probably to just do a wrapper class for them
struct Constants
{
static int test();
...
then you can hide the constant in the .cpp file
You can declare it as
extern const int test;
and then have it actually defined in a compilation unit somewhere (.cpp file).
You could also use a function to obtain the value.
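The function variant might be sketched as follows, with test_constant and k_test as hypothetical names. This only keeps the value out of the header and out of the exported symbols; as the earlier answer explains, it still ends up in the binary in some form.

```cpp
#include <cassert>

// constants.h (sketch): callers see only this declaration.
int test_constant();

// constants.cpp (sketch): the value lives in an unnamed namespace,
// so it has internal linkage and no exported symbol of its own.
namespace {
const int k_test = 45325;
} // namespace

int test_constant()
{
    return k_test;
}
```

Users of the library call test_constant() and never see the literal in any header they receive.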

Finding C++ static initialization order problems

We've run into some problems with the static initialization order fiasco, and I'm looking for ways to comb through a whole lot of code to find possible occurrences. Any suggestions on how to do this efficiently?
Edit: I'm getting some good answers on how to SOLVE the static initialization order problem, but that's not really my question. I'd like to know how to FIND objects that are subject to this problem. Evan's answer seems to be the best so far in this regard; I don't think we can use valgrind, but we may have memory analysis tools that could perform a similar function. That would catch problems only where the initialization order is wrong for a given build, and the order can change with each build. Perhaps there's a static analysis tool that would catch this. Our platform is IBM XLC/C++ compiler running on AIX.
Solving order of initialization:
First off, this is just a temporary work-around because you have global variables that you are trying to get rid of but just have not had time yet (you are going to get rid of them eventually aren't you? :-)
class A
{
public:
// Get the global instance abc
static A& getInstance_abc() // return a reference
{
static A instance_abc;
return instance_abc;
}
};
This will guarantee that it is initialised on first use and destroyed when the application terminates.
Multi-Threaded Problem:
C++11 does guarantee that this is thread-safe:
§6.7 [stmt.dcl] p4
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
However, C++03 does not officially guarantee that the construction of static function objects is thread safe. So technically the getInstance_XXX() method must be guarded with a critical section. On the bright side, gcc has an explicit patch as part of the compiler that guarantees that each static function object will only be initialized once even in the presence of threads.
Please note: Do not use the double checked locking pattern to try and avoid the cost of the locking. This will not work in C++03.
Creation Problems:
On creation, there are no problems because we guarantee that it is created before it can be used.
Destruction Problems:
There is a potential problem of accessing the object after it has been destroyed. This only happens if you access the object from the destructor of another global variable (by global, I am referring to any non-local static variable).
The solution is to make sure that you force the order of destruction.
Remember the order of destruction is the exact inverse of the order of construction. So if you access the object in your destructor, you must guarantee that the object has not been destroyed. To do this, you must just guarantee that the object is fully constructed before the calling object is constructed.
class B
{
public:
static B& getInstance_Bglob()
{
static B instance_Bglob;
return instance_Bglob;
}
}
~B()
{
A::getInstance_abc().doSomething();
// The object abc is accessed from the destructor.
// Potential problem.
// You must guarantee that abc is destroyed after this object.
// To guarantee this you must make sure it is constructed first.
// To do this just access the object from the constructor.
}
B()
{
A::getInstance_abc();
// abc is now fully constructed.
// This means it was constructed before this object.
// This means it will be destroyed after this object.
// This means it is safe to use from the destructor.
}
};
I just wrote a bit of code to track down this problem. We have a good size code base (1000+ files) that was working fine on Windows/VC++ 2005, but crashing on startup on Solaris/gcc.
I wrote the following .h file:
#ifndef FIASCO_H
#define FIASCO_H
/////////////////////////////////////////////////////////////////////////////////////////////////////
// [WS 2010-07-30] Detect the infamous "Static initialization order fiasco"
// email warrenstevens --> [initials]#[firstnamelastname].com
// read --> http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 if you haven't suffered
// To enable this feature --> define E-N-A-B-L-E-_-F-I-A-S-C-O-_-F-I-N-D-E-R, rebuild, and run
#define ENABLE_FIASCO_FINDER
/////////////////////////////////////////////////////////////////////////////////////////////////////
#ifdef ENABLE_FIASCO_FINDER
#include <iostream>
#include <fstream>
#include <string>
inline bool WriteFiasco(const std::string& fileName)
{
static int counter = 0;
++counter;
std::ofstream file;
file.open("FiascoFinder.txt", std::ios::out | std::ios::app);
file << "Starting to initialize file - number: [" << counter << "] filename: [" << fileName.c_str() << "]" << std::endl;
file.flush();
file.close();
return true;
}
// [WS 2010-07-30] If you get a name collision on the following line, your usage is likely incorrect
#define FIASCO_FINDER static const bool g_psuedoUniqueName = WriteFiasco(__FILE__);
#else // ENABLE_FIASCO_FINDER
// do nothing
#define FIASCO_FINDER
#endif // ENABLE_FIASCO_FINDER
#endif //FIASCO_H
and within every .cpp file in the solution, I added this:
#include "PreCompiledHeader.h" // (which #include's the above file)
FIASCO_FINDER
#include "RegularIncludeOne.h"
#include "RegularIncludeTwo.h"
When you run your application, you will get an output file like so:
Starting to initialize file - number: [1] filename: [p:\\OneFile.cpp]
Starting to initialize file - number: [2] filename: [p:\\SecondFile.cpp]
Starting to initialize file - number: [3] filename: [p:\\ThirdFile.cpp]
If you experience a crash, the culprit should be in the last .cpp file listed. And at the very least, this will give you a good place to set breakpoints, as this code should be the absolute first of your code to execute (after which you can step through your code and see all of the globals that are being initialized).
Notes:
It's important that you put the "FIASCO_FINDER" macro as close to the top of your file as possible. If you put it below some other #includes you run the risk of it crashing before identifying the file that you're in.
If you're using Visual Studio and pre-compiled headers, adding this extra macro line to all of your .cpp files can be done quickly using the Find-and-replace dialog to replace your existing #include "precompiledheader.h" with the same text plus the FIASCO_FINDER line (if you check "regular expressions", you can use "\n" to insert multi-line replacement text).
Depending on your compiler, you can place a breakpoint at the constructor initialization code. In Visual C++, this is the _initterm function, which is given a start and end pointer of a list of the functions to call.
Then step into each function to get the file and function name (assuming you've compiled with debugging info on). Once you have the names, step out of the function (back up to _initterm) and continue until _initterm exits.
That gives you all the static initializers, not just the ones in your code - it's the easiest way to get an exhaustive list. You can filter out the ones you have no control over (such as those in third-party libraries).
The theory holds for other compilers but the name of the function and the capability of the debugger may change.
Perhaps use valgrind to find uses of uninitialized memory. The nicest solution to the "static initialization order fiasco" is to use a static function which returns an instance of the object, like this:
class A {
public:
static X &getStatic() { static X my_static; return my_static; }
};
This way you access your static object by calling getStatic, which guarantees that it is initialized on first use.
If you need to worry about the order of de-initialization, return a reference to an object allocated with new (and never deleted) instead of a statically allocated object.
EDIT: Removed the redundant static object. I don't know why, but I mixed and matched two methods of having a static in my original example.
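The "new'd object" variant mentioned above could look roughly like the following. This is only a sketch: the struct X and its name member are hypothetical stand-ins for whatever type the real code uses.

```cpp
#include <string>

// Hypothetical type standing in for X from the example above.
struct X {
    std::string name = "x";
};

class A {
public:
    // The object is allocated on first use and intentionally never deleted,
    // so its destructor does not run during static de-initialization and
    // other statics can still use it while they are being destroyed.
    static X& getStatic() {
        static X* my_static = new X;  // the pointer is initialized once, on the first call
        return *my_static;
    }
};
```

The trade-off is that the destructor of X never runs, so this only suits objects whose cleanup can safely be skipped at program exit.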
There is code that essentially "initializes" C++ that is generated by the compiler. An easy way to find this code / the call stack at the time is to create a static object with something that dereferences NULL in the constructor - break in the debugger and explore a bit. The MSVC compiler sets up a table of function pointers that is iterated over for static initialization. You should be able to access this table and determine all static initialization taking place in your program.
We've run into some problems with the static initialization order fiasco, and I'm looking for ways to comb through a whole lot of code to find possible occurrences. Any suggestions on how to do this efficiently?
It's not a trivial problem, but at least it can be done by following fairly simple steps if you have an easy-to-parse intermediate-format representation of your code.
1) Find all the globals that have non-trivial constructors and put them in a list.
2) For each of these non-trivially-constructed objects, generate the entire potential-function-tree called by their constructors.
3) Walk through each constructor's function tree and, if the code references any other non-trivially-constructed global (these are conveniently in the list you generated in step 1), you have a potential early-static-initialization-order issue.
4) Repeat steps 2 & 3 until you have exhausted the list generated in step 1.
Note: you may be able to optimize this by only visiting the potential-function-tree once per object class rather than once per global instance if you have multiple globals of a single class.
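The steps above can be sketched as a small graph walk. Everything here is an assumption made for illustration: the Program struct is a hypothetical, hand-filled model of the code, whereas a real tool would have to build it from a compiler's intermediate representation. The sketch uses C++17.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of a program.
struct Program {
    // Step 1: globals with non-trivial constructors -> name of their constructor.
    std::map<std::string, std::string> globalCtor;
    // Call graph: function -> functions it may call.
    std::map<std::string, std::vector<std::string>> calls;
    // Function -> globals it references directly.
    std::map<std::string, std::vector<std::string>> refs;
};

// Returns pairs of (global being constructed, other listed global it may touch).
std::set<std::pair<std::string, std::string>> findPotentialFiascos(const Program& p) {
    std::set<std::pair<std::string, std::string>> issues;
    for (const auto& [global, ctor] : p.globalCtor) {        // step 4: every listed global
        std::set<std::string> seen;
        std::vector<std::string> work{ctor};
        while (!work.empty()) {                               // step 2: walk the call tree
            std::string fn = work.back();
            work.pop_back();
            if (!seen.insert(fn).second) continue;            // already visited
            if (auto r = p.refs.find(fn); r != p.refs.end())
                for (const auto& g : r->second)               // step 3: other listed globals
                    if (g != global && p.globalCtor.count(g))
                        issues.insert({global, g});
            if (auto c = p.calls.find(fn); c != p.calls.end())
                for (const auto& callee : c->second) work.push_back(callee);
        }
    }
    return issues;
}
```

Each reported pair is only a potential problem: whether it actually bites depends on which translation units the two globals live in and the order in which they happen to be initialized.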
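Replace all the global objects with global functions that return a reference to an object declared static in the function. Before C++11 this isn't thread-safe, so if your app is multi-threaded and built with an older compiler you might need some tricks like pthread_once or a global lock (since C++11, initialization of a function-local static is guaranteed to happen exactly once, even with concurrent callers). This will ensure that everything is initialized before it is used.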
Now, either your program works (hurrah!) or else it sits in an infinite loop because you have a circular dependency (redesign needed), or else you move on to the next bug.
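The "global lock" trick mentioned above might look like this minimal sketch. MyObject and its name() member are hypothetical placeholders. With a C++11 or later compiler a plain function-local static is already initialized thread-safely, so this is mainly illustrative, or for builds where that guarantee has been switched off (for example with GCC's -fno-threadsafe-statics).

```cpp
#include <mutex>
#include <string>

// Hypothetical class standing in for any global with a non-trivial constructor.
class MyObject {
public:
    MyObject() : name_("ready") {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

MyObject& myObject() {
    // std::mutex has a constexpr constructor, so the lock itself needs no
    // dynamic initialization; the pointer starts out as nullptr.
    static std::mutex lock;
    static MyObject* instance = nullptr;

    std::lock_guard<std::mutex> guard(lock);  // serialize all callers
    if (instance == nullptr) {
        instance = new MyObject;              // constructed on first use, never deleted
    }
    return *instance;
}
```

Taking the lock on every call is the simplest correct form; it costs a little on each access but avoids the subtleties of double-checked locking.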
The first thing you need to do is make a list of all static objects that have non-trivial constructors.
Given that, you either need to plug through them one at a time, or simply replace them all with singleton-pattern objects.
The singleton pattern comes in for a lot of criticism, but the lazy "as-required" construction is a fairly easy way to fix the majority of the problems now and in the future.
old...
MyObject myObject;
new...
MyObject &myObject()
{
static MyObject myActualObject;
return myActualObject;
}
Of course, if your application is multi-threaded and your compiler predates C++11's thread-safe initialization of function-local statics, this can cause you more problems than you had in the first place...
Gimpel Software (www.gimpel.com) claims that their PC-Lint/FlexeLint static analysis tools will detect such problems.
I have had good experience with their tools, but not with this specific issue so I can't vouch for how much they would help.
Some of these answers are now out of date. For the sake of people coming from search engines, like myself:
On Linux and elsewhere, finding instances of this problem is possible through Google's AddressSanitizer.
AddressSanitizer is a part of LLVM starting with version 3.1 and a part of GCC starting with version 4.8.
You would then do something like the following:
$ g++ -fsanitize=address -g staticA.C staticB.C staticC.C -o static
$ ASAN_OPTIONS=check_initialization_order=true:strict_init_order=true ./static
=================================================================
==32208==ERROR: AddressSanitizer: initialization-order-fiasco on address ... at ...
#0 0x400f96 in firstClass::getValue() staticC.C:13
#1 0x400de1 in secondClass::secondClass() staticB.C:7
...
See here for more details:
https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
The other answers are correct; I just wanted to add that the object's getter should be implemented in a .cpp file and should not be static. If you implement it in a header file, the object will be created in each library/framework you call it from.
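As a rough illustration of that split, the header would carry only the declaration and a single .cpp file the definition. The file names and the registry() function are hypothetical; both parts are shown in one listing for brevity.

```cpp
// --- registry.h (hypothetical header): declaration only ---
#include <vector>

std::vector<int>& registry();

// --- registry.cpp (hypothetical implementation file): the one definition ---
std::vector<int>& registry() {
    static std::vector<int> instance;  // exactly one object, owned by this translation unit
    return instance;
}
```

Because the function body lives in exactly one translation unit, every caller that links against it shares the same object.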
If your project is in Visual Studio (I've tried this with VC++ Express 2005, and with Visual Studio 2008 Pro):
Open Class View (Main menu->View->Class View)
Expand each project in your solution and Click on "Global Functions and Variables"
This should give you a decent list of all of the globals that are subject to the fiasco.
In the end, a better approach is to try to remove these objects from your project (easier said than done, sometimes).