C++ ctor question (linux) - c++

environment: linux, userspace-application created via g++ from a couple of C++ files
(result is an ELF)
there is a problem (SIGSEGV) when traversing the constructor list
( __CTOR_LIST__ )
(note: code called via this list is a kind of system initialisation for every class,
not the constructor-code I wrote)
when I understand correctly every compilation unit (every .o created from a .cpp)
creates one entry in
__CTOR_LIST__
the problem (SIGSEGV) does not exist when I step via GDB through the program
for debugging this I'm looking for an way to add own code code before the
call of
"_do_global_ctors_aux"
any hints for this ?
thanks,
Uwe

There are many possible reasons of this. Ranges from that you access objects not yet created (because order of creation of objects across different translation units is undefined) which i think is quite probable in this case, and ranges to an error on your build-environment.
To make a own function be called before other constructor function, you have a constructor (priority) attribute described here. GCC keeps a priority for each files' constructor input section. And it links them in order of those priorities. In the linker script of my linux system, that code looks like this (output it using ld -verbose):
.ctors :
{
/* gcc uses crtbegin.o to find the start of
the constructors, so we make sure it is
first. Because this is a wildcard, it
doesn't matter if the user does not
actually link against crtbegin.o; the
linker won't look for a file to match a
wildcard. The wildcard also means that it
doesn't matter which directory crtbegin.o
is in. */
KEEP (*crtbegin.o(.ctors))
KEEP (*crtbegin?.o(.ctors))
/* We don't want to include the .ctor section from
the crtend.o file until after the sorted ctors.
The .ctor section from the crtend file contains the
end of ctors marker and it must be last */
KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
KEEP (*(SORT(.ctors.*)))
KEEP (*(.ctors))
}
You would want to give it a low priority to make it execute before other registered ctor functions having a higher priority number. However from the looks of it, it seems like constructors having no number will be executed first. Not sure entirely. Best you give it a try. If you want to have your function called even before _do_global_ctors_aux, you have to release the original _init function that is normally executed when your program is loaded by the ELF loader (look into the -init option of ld). It's been some time since i messed with it, but i remember it has to do some intimate details of initialization, so i wouldn't try to replace it. Try using the constructor attribute i linked to. However, be very careful. Your code will possibly be executed before other important objects like cout are constructed.
Update: I did a test, and it actually executes ctor functions in reverse. So ctor functions that are linked first are executed later. This code happens to be in crtstuff.c of the gcc source code:
func_ptr *p;
for (p = __CTOR_END__ - 1; *p != (func_ptr) -1; p--)
(*p) ();
I made a little test:
void dothat() { }
struct f {
f() { dothat(); }
} f_;
void doit() __attribute__((constructor (0)));
void doit() { }
int main() { }
Linking with --print-map yields, among others, this output:
.ctors 0x080494f4 0x10
*crtbegin.o(.ctors)
.ctors 0x080494f4 0x4 /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/crtbegin.o
*crtbegin?.o(.ctors)
*(EXCLUDE_FILE(*crtend?.o *crtend.o) .ctors)
.ctors 0x080494f8 0x4 /tmp/ccyzWBjs.o
*(SORT(.ctors.*))
.ctors.65535 0x080494fc 0x4 /tmp/ccyzWBjs.o
*(.ctors)
.ctors 0x08049500 0x4 /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/crtend.o
Notice how .ctors.65535 is the section we implicitly created by our attribute priority 0. Now, if you give it that priority, gcc warns and it's totally right :p
test.cpp:7: warning: constructor priorities from 0 to 100 are reserved for the implementation
I tested it by breaking on doit and dothat, and it called them in the order we expect. Have fun!

It's possible that you're being bitten by the so-called "Static Initialization Order Fiasco".
Basically, when there is more than one translation unit (that is, C++ source file), and each file defines a global object, it is not possible for the C++ compiler/linker to establish which to construct first. If x depends on y being constructed first, but by chance compilation/linking causes x to be constructed before y, the program will typically crash. See item [10.12] of the C++ FAQ Lite for more details. Item [10.13] contains a solution -- the "construct on first use" idiom.

Not the question you asked, but...
In C++/g++, you can have a class where the declared [header] methods are never actually defined in the source [.cc] files as long as those methods are never invoked.
As a consequence, you can copy your current code files to a temporary directory, do a hack and slash job on them, run a [manual] binary search, and isolate the problem fairly quickly.
Not elegant, but highly effective.
Aside from the infamous "Static Initialization Order" issue, there are also more esoteric cases such as the one recently pointed out to me here on SO by Charles Bailey (see the comments).
E.g Mixing: int p [] = { 1,2,3 };
And: extern int * p;
Will produce a similar coredump problem.

Related

Override the call to main()?

I'm working on a project where we have several executables that share several object files. We want to add logging to all of the executables, and have a library for doing so.
However, it seems clumsy to go to the main() function of every executable file and add in the same boiler-plate function call to start the logging. It means we write the same thing over again, and loose out on maintainability and DRY ("don't repeat yourself"). It would be nice if we could systematically ensure that logging started before the main function gets called.
It occurred to me there are functions in libc++ that make the call to main, and it may be possible to override them. However, I don't know what they are and imagine this could break things if we're not careful. Does anyone know how this would be done? Or, if that's too over-the-top, any other suggestions on how to proceed?
We're using C++11 with g++ 4.8 if it makes any difference.
You do not need to do this by modifying main().
You should instead create a class at global scope in a shared object library. The constructor of this class will perform the "initialisation" you want to do, before main() runs, and its destructor will run after main().
The issue you need to deal with is that the order of this initialisation and destruction is not guaranteed to be deterministic with regards to any other global-scope objects. All of this could go in one .cpp compilation unit.
class LoggingManager // you can make this a singleton but not necessary
{
public:
LoggingManager();
~LoggingManager();
};
LoggingManager::LoggingManager()
{
// your initialisation code goes here
}
LoggingManager::~LoggingManager()
{
// your clean-up code goes here. It should not throw
}
LoggingManager loggingManagerStaticInstance;
Note that there is a small danger of the "static initialization" issue which means in reality your loggingManagerStaticInstance might not be loaded until your compilation unit is first accessed.
In reality it doesn't matter if this is after main() as long as the initialisation happens before it is first needed (a bit like a singleton) but it means your compilation unit might need to contain something that is guaranteed to get pulled in.
If you want to "stick" to gnu or similar they provide __attribute__(constructor) which might resolve it although there is an easier way of having some dummy extern int implemented or dummy function that returns an int that gets called from within whatever header you do actually use to implement logging.

Is it bad practice if I have to organise Compile Sources

I am working on a C++ project in Xcode, and one of my .cpp files instantiates some variables. Another .cpp file in the application uses these variables to instantiate another object and needs them to be instantiated to not throw a null-pointer exception. My solution so far was simply to drag-drop (XCode simplicity) the first file over the second one in the build-phase order. It works fine now, but I have a feeling that it is not the optimal solution, and that there is something fundamentally wrong with my code if I need to organise the compile order manually for the application to run properly.
Should I never instantiate something outside of functions, or what is the golden rule? Thanks.
EDIT: An example as requested.
The problem lies in a Observer/Event system.
In a source-file I do this:
Trigger* mainMenu_init = new Trigger(std::vector<Event*> {
// Event(s):
event_gameInit,
}, [](Event* e) {
// Action(s):
std::cout << "Hello World" << std::endl;
});
In the trigger's constructor the Event is asked to add is as an observer:
for(Event* event : events)
event->addObserver(this);
BUT, the events are just external pointers, so if they are not initialised (which they are in another source-file) this initialisation will fail. So what I found was that if I do not organise the compilation-phase myself, random triggers will not work while other will, depending on if they are built before or after the Event.cpp file.
I assume you are talking about non-trivial initialization of global variables (or of static variables), such as (at the top level of a file):
MyObject *myPtrObject = new MyObject(42, "blah");
MyObject myOtherObject;
("trivial" initialization is, roughly speaking, when there is no constructor involved and everything just involves constants; so if you initialize a pointer to zero, it will be zero before any code is actually invoked)
The order of initialization between different source files is NOT GUARANTEED in C++. It happens to depend on the order of the files with Apple's current system, but THAT MIGHT CHANGE.
So yes, there is something fundamentally wrong.
Golden Rules
IMPORTANT: In the initialization of a global object, don't use any other global objects from different source files.
Don't overuse global variables. They have numerous disadvantages from a software design point of view.
Keep initialization of global objects simple. That will make it easier to stick to the first rule.
Not knowing anything about your program, it's of course hard to give more concrete design advice.

Do you really need a main() in C++?

From what I can tell you can kick off all the action in a constructor when you create a global object. So do you really need a main() function in C++ or is it just legacy?
I can understand that it could be considered bad practice to do so. I'm just asking out of curiosity.
If you want to run your program on a hosted C++ implementation, you need a main function. That's just how things are defined. You can leave it empty if you want of course. On the technical side of things, the linker wants to resolve the main symbol that's used in the runtime library (which has no clue of your special intentions to omit it - it just still emits a call to it). If the Standard specified that main is optional, then of course implementations could come up with solutions, but that would need to happen in a parallel universe.
If you go with the "Execution starts in the constructor of my global object", beware that you set yourself up to many problems related to the order of constructions of namespace scope objects defined in different translation units (So what is the entry point? The answer is: You will have multiple entry points, and what entry point is executed first is unspecified!). In C++03 you aren't even guaranteed that cout is properly constructed (in C++0x you have a guarantee that it is, before any code tries to use it, as long as there is a preceeding include of <iostream>).
You don't have those problems and don't need to work around them (wich can be very tricky) if you properly start executing things in ::main.
As mentioned in the comments, there are however several systems that hide main from the user by having him tell the name of a class which is instantiated within main. This works similar to the following example
class MyApp {
public:
MyApp(std::vector<std::string> const& argv);
int run() {
/* code comes here */
return 0;
};
};
IMPLEMENT_APP(MyApp);
To the user of this system, it's completely hidden that there is a main function, but that macro would actually define such a main function as follows
#define IMPLEMENT_APP(AppClass) \
int main(int argc, char **argv) { \
AppClass m(std::vector<std::string>(argv, argv + argc)); \
return m.run(); \
}
This doesn't have the problem of unspecified order of construction mentioned above. The benefit of them is that they work with different forms of higher level entry points. For example, Windows GUI programs start up in a WinMain function - IMPLEMENT_APP could then define such a function instead on that platform.
Yes! You can do away with main.
Disclaimer: You asked if it were possible, not if it should be done. This is a totally un-supported, bad idea. I've done this myself, for reasons that I won't get into, but I am not recommending it. My purpose wasn't getting rid of main, but it can do that as well.
The basic steps are as follows:
Find crt0.c in your compiler's CRT source directory.
Add crt0.c to your project (a copy, not the original).
Find and remove the call to main from crt0.c.
Getting it to compile and link can be difficult; How difficult depends on which compiler and which compiler version.
Added
I just did it with Visual Studio 2008, so here are the exact steps you have to take to get it to work with that compiler.
Create a new C++ Win32 Console Application (click next and check Empty Project).
Add new item.. C++ File, but name it crt0.c (not .cpp).
Copy contents of C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\crt\src\crt0.c and paste into crt0.c.
Find mainret = _tmain(__argc, _targv, _tenviron); and comment it out.
Right-click on crt0.c and select Properties.
Set C/C++ -> General -> Additional Include Directories = "C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\crt\src".
Set C/C++ -> Preprocessor -> Preprocessor Definitions = _CRTBLD.
Click OK.
Right-click on the project name and select Properties.
Set C/C++ -> Code Generation -> Runtime Library = Multi-threaded Debug (/MTd) (*).
Click OK.
Add new item.. C++ File, name it whatever (app.cpp for this example).
Paste the code below into app.cpp and run it.
(*) You can't use the runtime DLL, you have to statically link to the runtime library.
#include <iostream>
class App
{
public: App()
{
std::cout << "Hello, World! I have no main!" << std::endl;
}
};
static App theApp;
Added
I removed the superflous exit call and the blurb about lifetime as I think we're all capable of understanding the consequences of removing main.
Ultra Necro
I just came across this answer and read both it and John Dibling's objections below. It was apparent that I didn't explain what the above procedure does and why that does indeed remove main from the program entirely.
John asserts that "there is always a main" in the CRT. Those words are not strictly correct, but the spirit of the statement is. Main is not a function provided by the CRT, you must add it yourself. The call to that function is in the CRT provided entry point function.
The entry point of every C/C++ program is a function in a module named 'crt0'. I'm not sure if this is a convention or part of the language specification, but every C/C++ compiler I've come across (which is a lot) uses it. This function basically does three things:
Initialize the CRT
Call main
Tear down
In the example above, the call is _tmain but that is some macro magic to allow for the various forms that 'main' can have, some of which are VS specific in this case.
What the above procedure does is it removes the module 'crt0' from the CRT and replaces it with a new one. This is why you can't use the Runtime DLL, there is already a function in that DLL with the same entry point name as the one we are adding (2). When you statically link, the CRT is a collection of .lib files, and the linker allows you to override .lib modules entirely. In this case a module with only one function.
Our new program contains the stock CRT, minus its CRT0 module, but with a CRT0 module of our own creation. In there we remove the call to main. So there is no main anywhere!
(2) You might think you could use the runtime DLL by renaming the entry point function in your crt0.c file, and changing the entry point in the linker settings. However, the compiler is unaware of the entry point change and the DLL contains an external reference to a 'main' function which you're not providing, so it would not compile.
Generally speaking, an application needs an entry point, and main is that entry point. The fact that initialization of globals might happen before main is pretty much irrelevant. If you're writing a console or GUI app you have to have a main for it to link, and it's only good practice to have that routine be responsible for the main execution of the app rather than use other features for bizarre unintended purposes.
Well, from the perspective of the C++ standard, yes, it's still required. But I suspect your question is of a different nature than that.
I think doing it the way you're thinking about would cause too many problems though.
For example, in many environments the return value from main is given as the status result from running the program as a whole. And that would be really hard to replicate from a constructor. Some bit of code could still call exit of course, but that seems like using a goto and would skip destruction of anything on the stack. You could try to fix things up by having a special exception you threw instead in order to generate an exit code other than 0.
But then you still run into the problem of the order of execution of global constructors not being defined. That means that in any particular constructor for a global object you won't be able to make any assumptions about whether or not any other global object yet exists.
You could try to solve the constructor order problem by just saying each constructor gets its own thread, and if you want to access any other global objects you have to wait on a condition variable until they say they're constructed. That's just asking for deadlocks though, and those deadlocks would be really hard to debug. You'd also have the issue of which thread exiting with the special 'return value from the program' exception would constitute the real return value of the program as a whole.
I think those two issues are killers if you want to get rid of main.
And I can't think of a language that doesn't have some basic equivalent to main. In Java, for example, there is an externally supplied class name who's main static function is called. In Python, there's the __main__ module. In perl there's the script you specify on the command line.
If you have more than one global object being constructed, there is no guarantee as to which constructor will run first.
If you are building static or dynamic library code then you don't need to define main yourself, but you will still wind up running in some program that has it.
If you are coding for windows, do not do this.
Running your app entirely from within the constructor of a global object may work just fine for quite awhile, but sooner or later you will make a call to the wrong function and end up with a program that terminates without warning.
Global object constructors run during the startup of the C runtime.
The C runtime startup code runs during the DLLMain of the C runtime DLL
During DLLMain, you are holding the DLL loader lock.
Tring to load another DLL while already holding the DLL loader lock results in a swift death for your process.
Compiling your entire app into a single executable won't save you - many Win32 calls have the potential to quietly load system DLLs.
There are implementations where global objects are not possible, or where non-trivial constructors are not possible for such objects (especially in the mobile and embedded realms).

How exactly does __attribute__((constructor)) work?

It seems pretty clear that it is supposed to set things up.
When exactly does it run?
Why are there two parentheses?
Is __attribute__ a function? A macro? Syntax?
Does this work in C? C++?
Does the function it works with need to be static?
When does __attribute__((destructor)) run?
Example in Objective-C:
__attribute__((constructor))
static void initialize_navigationBarImages() {
navigationBarImages = [[NSMutableDictionary alloc] init];
}
__attribute__((destructor))
static void destroy_navigationBarImages() {
[navigationBarImages release];
}
It runs when a shared library is loaded, typically during program startup.
That's how all GCC attributes are; presumably to distinguish them from function calls.
GCC-specific syntax.
Yes, this works in C and C++.
No, the function does not need to be static.
The destructor runs when the shared library is unloaded, typically at program exit.
So, the way the constructors and destructors work is that the shared object file contains special sections (.ctors and .dtors on ELF) which contain references to the functions marked with the constructor and destructor attributes, respectively. When the library is loaded/unloaded the dynamic loader program (ld.so or somesuch) checks whether such sections exist, and if so, calls the functions referenced therein.
Come to think of it, there is probably some similar magic in the normal static linker so that the same code is run on startup/shutdown regardless if the user chooses static or dynamic linking.
.init/.fini isn't deprecated. It's still part of the the ELF standard and I'd dare say it will be forever. Code in .init/.fini is run by the loader/runtime-linker when code is loaded/unloaded. I.e. on each ELF load (for example a shared library) code in .init will be run. It's still possible to use that mechanism to achieve about the same thing as with __attribute__((constructor))/((destructor)). It's old-school but it has some benefits.
.ctors/.dtors mechanism for example require support by system-rtl/loader/linker-script. This is far from certain to be available on all systems, for example deeply embedded systems where code executes on bare metal. I.e. even if __attribute__((constructor))/((destructor)) is supported by GCC, it's not certain it will run as it's up to the linker to organize it and to the loader (or in some cases, boot-code) to run it. To use .init/.fini instead, the easiest way is to use linker flags: -init & -fini (i.e. from GCC command line, syntax would be -Wl -init my_init -fini my_fini).
On system supporting both methods, one possible benefit is that code in .init is run before .ctors and code in .fini after .dtors. If order is relevant that's at least one crude but easy way to distinguish between init/exit functions.
A major drawback is that you can't easily have more than one _init and one _fini function per each loadable module and would probably have to fragment code in more .so than motivated. Another is that when using the linker method described above, one replaces the original _init and _fini default functions (provided by crti.o). This is where all sorts of initialization usually occur (on Linux this is where global variable assignment is initialized). A way around that is described here
Notice in the link above that a cascading to the original _init() is not needed as it's still in place. The call in the inline assembly however is x86-mnemonic and calling a function from assembly would look completely different for many other architectures (like ARM for example). I.e. code is not transparent.
.init/.fini and .ctors/.detors mechanisms are similar, but not quite. Code in .init/.fini runs "as is". I.e. you can have several functions in .init/.fini, but it is AFAIK syntactically difficult to put them there fully transparently in pure C without breaking up code in many small .so files.
.ctors/.dtors are differently organized than .init/.fini. .ctors/.dtors sections are both just tables with pointers to functions, and the "caller" is a system-provided loop that calls each function indirectly. I.e. the loop-caller can be architecture specific, but as it's part of the system (if it exists at all i.e.) it doesn't matter.
The following snippet adds new function pointers to the .ctors function array, principally the same way as __attribute__((constructor)) does (method can coexist with __attribute__((constructor))).
#define SECTION( S ) __attribute__ ((section ( S )))
void test(void) {
printf("Hello\n");
}
void (*funcptr)(void) SECTION(".ctors") =test;
void (*funcptr2)(void) SECTION(".ctors") =test;
void (*funcptr3)(void) SECTION(".dtors") =test;
One can also add the function pointers to a completely different self-invented section. A modified linker script and an additional function mimicking the loader .ctors/.dtors loop is needed in such case. But with it one can achieve better control over execution order, add in-argument and return code handling e.t.a. (In a C++ project for example, it would be useful if in need of something running before or after global constructors).
I'd prefer __attribute__((constructor))/((destructor)) where possible, it's a simple and elegant solution even it feels like cheating. For bare-metal coders like myself, this is just not always an option.
Some good reference in the book Linkers & loaders.
This page provides great understanding about the constructor and destructor attribute implementation and the sections within within ELF that allow them to work. After digesting the information provided here, I compiled a bit of additional information and (borrowing the section example from Michael Ambrus above) created an example to illustrate the concepts and help my learning. Those results are provided below along with the example source.
As explained in this thread, the constructor and destructor attributes create entries in the .ctors and .dtors section of the object file. You can place references to functions in either section in one of three ways. (1) using either the section attribute; (2) constructor and destructor attributes or (3) with an inline-assembly call (as referenced the link in Ambrus' answer).
The use of constructor and destructor attributes allow you to additionally assign a priority to the constructor/destructor to control its order of execution before main() is called or after it returns. The lower the priority value given, the higher the execution priority (lower priorities execute before higher priorities before main() -- and subsequent to higher priorities after main() ). The priority values you give must be greater than100 as the compiler reserves priority values between 0-100 for implementation. Aconstructor or destructor specified with priority executes before a constructor or destructor specified without priority.
With the 'section' attribute or with inline-assembly, you can also place function references in the .init and .fini ELF code section that will execute before any constructor and after any destructor, respectively. Any functions called by the function reference placed in the .init section, will execute before the function reference itself (as usual).
I have tried to illustrate each of those in the example below:
#include <stdio.h>
#include <stdlib.h>
/* test function utilizing attribute 'section' ".ctors"/".dtors"
to create constuctors/destructors without assigned priority.
(provided by Michael Ambrus in earlier answer)
*/
#define SECTION( S ) __attribute__ ((section ( S )))
void test (void) {
printf("\n\ttest() utilizing -- (.section .ctors/.dtors) w/o priority\n");
}
void (*funcptr1)(void) SECTION(".ctors") =test;
void (*funcptr2)(void) SECTION(".ctors") =test;
void (*funcptr3)(void) SECTION(".dtors") =test;
/* functions constructX, destructX use attributes 'constructor' and
'destructor' to create prioritized entries in the .ctors, .dtors
ELF sections, respectively.
NOTE: priorities 0-100 are reserved
*/
void construct1 () __attribute__ ((constructor (101)));
void construct2 () __attribute__ ((constructor (102)));
void destruct1 () __attribute__ ((destructor (101)));
void destruct2 () __attribute__ ((destructor (102)));
/* init_some_function() - called by elf_init()
*/
int init_some_function () {
printf ("\n init_some_function() called by elf_init()\n");
return 1;
}
/* elf_init uses inline-assembly to place itself in the ELF .init section.
*/
int elf_init (void)
{
__asm__ (".section .init \n call elf_init \n .section .text\n");
if(!init_some_function ())
{
exit (1);
}
printf ("\n elf_init() -- (.section .init)\n");
return 1;
}
/*
function definitions for constructX and destructX
*/
void construct1 () {
printf ("\n construct1() constructor -- (.section .ctors) priority 101\n");
}
void construct2 () {
printf ("\n construct2() constructor -- (.section .ctors) priority 102\n");
}
void destruct1 () {
printf ("\n destruct1() destructor -- (.section .dtors) priority 101\n\n");
}
void destruct2 () {
printf ("\n destruct2() destructor -- (.section .dtors) priority 102\n");
}
/* main makes no function call to any of the functions declared above
*/
int
main (int argc, char *argv[]) {
printf ("\n\t [ main body of program ]\n");
return 0;
}
output:
init_some_function() called by elf_init()
elf_init() -- (.section .init)
construct1() constructor -- (.section .ctors) priority 101
construct2() constructor -- (.section .ctors) priority 102
test() utilizing -- (.section .ctors/.dtors) w/o priority
test() utilizing -- (.section .ctors/.dtors) w/o priority
[ main body of program ]
test() utilizing -- (.section .ctors/.dtors) w/o priority
destruct2() destructor -- (.section .dtors) priority 102
destruct1() destructor -- (.section .dtors) priority 101
The example helped cement the constructor/destructor behavior, hopefully it will be useful to others as well.
Here is a "concrete" (and possibly useful) example of how, why, and when to use these handy, yet unsightly constructs...
Xcode uses a "global" "user default" to decide which XCTestObserver class spews it's heart out to the beleaguered console.
In this example... when I implicitly load this psuedo-library, let's call it... libdemure.a, via a flag in my test target á la..
OTHER_LDFLAGS = -ldemure
I want to..
At load (ie. when XCTest loads my test bundle), override the "default" XCTest "observer" class... (via the constructor function) PS: As far as I can tell.. anything done here could be done with equivalent effect inside my class' + (void) load { ... } method.
run my tests.... in this case, with less inane verbosity in the logs (implementation upon request)
Return the "global" XCTestObserver class to it's pristine state.. so as not to foul up other XCTest runs which haven't gotten on the bandwagon (aka. linked to libdemure.a). I guess this historically was done in dealloc.. but I'm not about to start messing with that old hag.
So...
#define USER_DEFS NSUserDefaults.standardUserDefaults
#interface DemureTestObserver : XCTestObserver #end
#implementation DemureTestObserver
__attribute__((constructor)) static void hijack_observer() {
/*! here I totally hijack the default logging, but you CAN
use multiple observers, just CSV them,
i.e. "#"DemureTestObserverm,XCTestLog"
*/
[USER_DEFS setObject:#"DemureTestObserver"
forKey:#"XCTestObserverClass"];
[USER_DEFS synchronize];
}
__attribute__((destructor)) static void reset_observer() {
// Clean up, and it's as if we had never been here.
[USER_DEFS setObject:#"XCTestLog"
forKey:#"XCTestObserverClass"];
[USER_DEFS synchronize];
}
...
#end
Without the linker flag... (Fashion-police swarm Cupertino demanding retribution, yet Apple's default prevails, as is desired, here)
WITH the -ldemure.a linker flag... (Comprehensible results, gasp... "thanks constructor/destructor"... Crowd cheers)
Here is another concrete example. It is for a shared library. The shared library's main function is to communicate with a smart card reader, but it can also receive 'configuration information' at runtime over UDP. The UDP is handled by a thread which MUST be started at init time.
__attribute__((constructor)) static void startUdpReceiveThread (void) {
pthread_create( &tid_udpthread, NULL, __feigh_udp_receive_loop, NULL );
return;
}
The library was written in C.

Finding C++ static initialization order problems

We've run into some problems with the static initialization order fiasco, and I'm looking for ways to comb through a whole lot of code to find possible occurrences. Any suggestions on how to do this efficiently?
Edit: I'm getting some good answers on how to SOLVE the static initialization order problem, but that's not really my question. I'd like to know how to FIND objects that are subject to this problem. Evan's answer seems to be the best so far in this regard; I don't think we can use valgrind, but we may have memory analysis tools that could perform a similar function. That would catch problems only where the initialization order is wrong for a given build, and the order can change with each build. Perhaps there's a static analysis tool that would catch this. Our platform is IBM XLC/C++ compiler running on AIX.
Solving order of initialization:
First off, this is just a temporary work-around because you have global variables that you are trying to get rid of but just have not had time yet (you are going to get rid of them eventually aren't you? :-)
class A
{
public:
// Get the global instance abc
static A& getInstance_abc() // return a reference
{
static A instance_abc;
return instance_abc;
}
};
This will guarantee that it is initialised on first use and destroyed when the application terminates.
Multi-Threaded Problem:
C++11 does guarantee that this is thread-safe:
§6.7 [stmt.dcl] p4
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
However, C++03 does not officially guarantee that the construction of static function objects is thread safe. So technically the getInstance_XXX() method must be guarded with a critical section. On the bright side, gcc has an explicit patch as part of the compiler that guarantees that each static function object will only be initialized once even in the presence of threads.
Please note: Do not use the double checked locking pattern to try and avoid the cost of the locking. This will not work in C++03.
Creation Problems:
On creation, there are no problems because we guarantee that it is created before it can be used.
Destruction Problems:
There is a potential problem of accessing the object after it has been destroyed. This only happens if you access the object from the destructor of another global variable (by global, I am referring to any non-local static variable).
The solution is to make sure that you force the order of destruction.
Remember the order of destruction is the exact inverse of the order of construction. So if you access the object in your destructor, you must guarantee that the object has not been destroyed. To do this, you must just guarantee that the object is fully constructed before the calling object is constructed.
class B
{
public:
static B& getInstance_Bglob;
{
static B instance_Bglob;
return instance_Bglob;;
}
~B()
{
A::getInstance_abc().doSomthing();
// The object abc is accessed from the destructor.
// Potential problem.
// You must guarantee that abc is destroyed after this object.
// To guarantee this you must make sure it is constructed first.
// To do this just access the object from the constructor.
}
B()
{
A::getInstance_abc();
// abc is now fully constructed.
// This means it was constructed before this object.
// This means it will be destroyed after this object.
// This means it is safe to use from the destructor.
}
};
I just wrote a bit of code to track down this problem. We have a good size code base (1000+ files) that was working fine on Windows/VC++ 2005, but crashing on startup on Solaris/gcc.
I wrote the following .h file:
#ifndef FIASCO_H
#define FIASCO_H
/////////////////////////////////////////////////////////////////////////////////////////////////////
// [WS 2010-07-30] Detect the infamous "Static initialization order fiasco"
// email warrenstevens --> [initials]#[firstnamelastname].com
// read --> http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 if you haven't suffered
// To enable this feature --> define E-N-A-B-L-E-_-F-I-A-S-C-O-_-F-I-N-D-E-R, rebuild, and run
#define ENABLE_FIASCO_FINDER
/////////////////////////////////////////////////////////////////////////////////////////////////////
#ifdef ENABLE_FIASCO_FINDER
#include <iostream>
#include <fstream>
inline bool WriteFiasco(const std::string& fileName)
{
static int counter = 0;
++counter;
std::ofstream file;
file.open("FiascoFinder.txt", std::ios::out | std::ios::app);
file << "Starting to initialize file - number: [" << counter << "] filename: [" << fileName.c_str() << "]" << std::endl;
file.flush();
file.close();
return true;
}
// [WS 2010-07-30] If you get a name collision on the following line, your usage is likely incorrect
#define FIASCO_FINDER static const bool g_psuedoUniqueName = WriteFiasco(__FILE__);
#else // ENABLE_FIASCO_FINDER
// do nothing
#define FIASCO_FINDER
#endif // ENABLE_FIASCO_FINDER
#endif //FIASCO_H
and within every .cpp file in the solution, I added this:
#include "PreCompiledHeader.h" // (which #include's the above file)
FIASCO_FINDER
#include "RegularIncludeOne.h"
#include "RegularIncludeTwo.h"
When you run your application, you will get an output file like so:
Starting to initialize file - number: [1] filename: [p:\\OneFile.cpp]
Starting to initialize file - number: [2] filename: [p:\\SecondFile.cpp]
Starting to initialize file - number: [3] filename: [p:\\ThirdFile.cpp]
If you experience a crash, the culprit should be in the last .cpp file listed. And at the very least, this will give you a good place to set breakpoints, as this code should be the absolute first of your code to execute (after which you can step through your code and see all of the globals that are being initialized).
Notes:
It's important that you put the "FIASCO_FINDER" macro as close to the top of your file as possible. If you put it below some other #includes you run the risk of it crashing before identifying the file that you're in.
If you're using Visual Studio, and pre-compiled headers, adding this extra macro line to all of your .cpp files can be done quickly using the Find-and-replace dialog to replace your existing #include "precompiledheader.h" with the same text plus the FIASCO_FINDER line (if you check off "regular expressions, you can use "\n" to insert multi-line replacement text)
Depending on your compiler, you can place a breakpoint at the constructor initialization code. In Visual C++, this is the _initterm function, which is given a start and end pointer of a list of the functions to call.
Then step into each function to get the file and function name (assuming you've compiled with debugging info on). Once you have the names, step out of the function (back up to _initterm) and continue until _initterm exits.
That gives you all the static initializers, not just the ones in your code - it's the easiest way to get an exhaustive list. You can filter out the ones you have no control over (such as those in third-party libraries).
The theory holds for other compilers but the name of the function and the capability of the debugger may change.
perhaps use valgrind to find usage of uninitialized memory. The nicest solution to the "static initialization order fiasco" is to use a static function which returns an instance of the object like this:
class A {
public:
static X &getStatic() { static X my_static; return my_static; }
};
This way you access your static object is by calling getStatic, this will guarantee it is initialized on first use.
If you need to worry about order of de-initialization, return a new'd object instead of a statically allocated object.
EDIT: removed the redundant static object, i dunno why but i mixed and matched two methods of having a static together in my original example.
There is code that essentially "initializes" C++ that is generated by the compiler. An easy way to find this code / the call stack at the time is to create a static object with something that dereferences NULL in the constructor - break in the debugger and explore a bit. The MSVC compiler sets up a table of function pointers that is iterated over for static initialization. You should be able to access this table and determine all static initialization taking place in your program.
We've run into some problems with the
static initialization order fiasco,
and I'm looking for ways to comb
through a whole lot of code to find
possible occurrences. Any suggestions
on how to do this efficiently?
It's not a trivial problem but at least it can done following fairly simple steps if you have an easy-to-parse intermediate-format representation of your code.
1) Find all the globals that have non-trivial constructors and put them in a list.
2) For each of these non-trivially-constructed objects, generate the entire potential-function-tree called by their constructors.
3) Walk through the non-trivially-constructor function tree and if the code references any other non-trivially constructed globals (which are quite handily in the list you generated in step one), you have a potential early-static-initialization-order issue.
4) Repeat steps 2 & 3 until you have exhausted the list generated in step 1.
Note: you may be able to optimize this by only visiting the potential-function-tree once per object class rather than once per global instance if you have multiple globals of a single class.
Replace all the global objects with global functions that return a reference to an object declared static in the function. This isn't thread-safe, so if your app is multi-threaded you might need some tricks like pthread_once or a global lock. This will ensure that everything is initialized before it is used.
Now, either your program works (hurrah!) or else it sits in an infinite loop because you have a circular dependency (redesign needed), or else you move on to the next bug.
The first thing you need to do is make a list of all static objects that have non-trivial constructors.
Given that, you either need to plug through them one at a time, or simply replace them all with singleton-pattern objects.
The singleton pattern comes in for a lot of criticism, but the lazy "as-required" construction is a fairly easy way to fix the majority of the problems now and in the future.
old...
MyObject myObject
new...
MyObject &myObject()
{
static MyObject myActualObject;
return myActualObject;
}
Of course, if your application is multi-threaded, this can cause you more problems than you had in the first place...
Gimpel Software (www.gimpel.com) claims that their PC-Lint/FlexeLint static analysis tools will detect such problems.
I have had good experience with their tools, but not with this specific issue so I can't vouch for how much they would help.
Some of these answers are now out of date. For the sake of people coming from search engines, like myself:
On Linux and elsewhere, finding instances of this problem is possible through Google's AddressSanitizer.
AddressSanitizer is a part of LLVM starting with version 3.1 and a
part of GCC starting with version 4.8
You would then do something like the following:
$ g++ -fsanitize=address -g staticA.C staticB.C staticC.C -o static
$ ASAN_OPTIONS=check_initialization_order=true:strict_init_order=true ./static
=================================================================
==32208==ERROR: AddressSanitizer: initialization-order-fiasco on address ... at ...
#0 0x400f96 in firstClass::getValue() staticC.C:13
#1 0x400de1 in secondClass::secondClass() staticB.C:7
...
See here for more details:
https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
Other answers are correct, I just wanted to add that the object's getter should be implemented in a .cpp file and it should not be static. If you implement it in a header file, the object will be created in each library / framework you call it from....
If your project is in Visual Studio (I've tried this with VC++ Express 2005, and with Visual Studio 2008 Pro):
Open Class View (Main menu->View->Class View)
Expand each project in your solution and Click on "Global Functions and Variables"
This should give you a decent list of all of the globals that are subject to the fiasco.
In the end, a better approach is to try to remove these objects from your project (easier said than done, sometimes).