Initialization of member: bug in GCC or my thinking? - c++

I've got an enum type defined in the private section of my class. I have a member of this type defined as well. When I try to initialize this member in the constructor body, I get memory corruption problems at run-time. When I initialize it through an initialization list in the same constructor instead, I do not get memory corruption problems. Am I doing something wrong?
I'll simplify the code, and if it is a GCC bug I'm sure that it's a combination of the specific classes I'm combining/inheriting/etc., but I promise that this captures the essence of the problem. Nothing uses this member variable before it is initialized, and nothing uses the newly created object until after it is fully constructed. The initialization of this member is indeed the first thing I do in the body, and when the memory corruption happens, valgrind says it is on the line where I initialize the variable. Valgrind says that it is an invalid write of size 4.
Pertinent header code:
private:
enum StateOption{original = 0, blindside};
StateOption currentState;
pertinent .cpp code (causes memory corruption and crash):
MyClass::MyClass(AClass* classPtr) :
BaseClass(std::string("some_setting"),classPtr)
{
currentState = original;
...
}
pertinent .cpp code (does not cause memory corruption and crash):
MyClass::MyClass(AClass* classPtr) :
BaseClass(std::string("some_setting"),classPtr),
currentState(original)
{
...
}
edit: see my "answer" for what was causing this. After reading it, can anybody explain to me why it made a difference? I didn't change anything in the header, and obviously the object file was being rebuilt because of my print statements appearing when I put them in and the lack of seeing the bug under one build but not the other?
For a good explanation, I'll mark it as the answer to this question.

For posterity:
It appears as though the make script isn't pickup up the changes to these files for some reason. Manually deleting the objects rather than letting our "clean" target in the makefile caused a full rebuild (which took some time), and the problem disappeared.

Related

C++ What could cause a line of code in one place to change behavior in an unrelated function?

Consider this mock-up of my situation.
in an external header:
class ThirdPartyObject
{
...
}
my code: (spread among a few headers and source files)
class ThirdPartyObjectWrapper
{
private:
ThirdPartyObject myObject;
}
class Owner
{
public:
Owner() {}
void initialize();
private:
ThirdPartyObjectWrapper myWrappedObject;
};
void Owner::initialize()
{
//not weird:
//ThirdPartyObjectWrapper testWrappedObject;
//weird:
//ThirdPartyObject testObject;
}
ThirdPartyObject is, naturally, an object defined by a third party (static precompiled) library I'm using. ThirdPartyObjectWrapper is a convenience class that eliminates a lot of boiler-plating for working with ThirdPartyObject. Owner::initialize() is called shortly after an instance of Owner is created.
Notice the two lines I have labeled as "weird" and "not weird" in Owner::initialize(). All I'm doing here is creating a couple of objects on the stack with their default constructors. I don't do anything with those objects and they get destroyed when they leave scope. There are no build or linker errors involved, I can uncomment either or both lines and the code will build.
However, if I uncomment "weird" then I get a segmentation fault, and (here's why I say it's weird) it's in a completely unrelated location. Not in the constructor of testObject, like you might expect, but in the constructor of Owner::myObjectWrapper::myObject. The weird line never even gets called, but somehow its presence or absence consistently changes the behavior of an unrelated function in a static library.
And consider that if I only uncomment "not weird" then it runs fine, executing the ThirdPartyObject constructor twice with no problems.
I've been working with C++ for a year so it's not really a surprise to me that something like this would be able happen, but I've about reached the limit of my ability to figure out how this gotcha is happening. I need the input of people with significantly more C++ experience than me.
What are some possibilities that could cause this to happen? What might be going on here?
Also, note, I'm not asking for advice on how to get rid of the segfault. Segfaults I understand, I suspect it's a simple race condition. What I don't understand is the behavior gotcha so that's the only thing I'm trying to get answers for.
My best lead is that it has to do with headers and macros. The third party library actually already has a couple of gotchas having to do with its headers and macros, for example the code won't build if you put your #include's in the wrong order. I'm not changing any #include's so strictly this still wouldn't make sense, but perhaps the compiler is optimizing includes based on the presence of a symbol here? (it would be the only mention of ThirdPartyObject in the file)
It also occurs to me that because I am using Qt, it could be that the Meta-Object Compiler (which generates supplementary code between compilations) might be involved in this. Very unlikely, as Qt has no knowledge of the third party library where the segfault is happening and this is not actually relevant to the functionality of the MOC (since at no point ThirdPartyObject is being passed as an argument), but it's worth investigating at least.
Related questions have suggested that it could be a relatively small buffer overflow or race condition that gets tripped up by compiler optimizations. Continuing to investigate but all leads are welcome.
Typical culprits:
Some build products are stale and not binary-compatible.
You have a memory bug that has corrupted the state of your process, and are seeing a manifestation of that in a completely unrelated location.
Fixing #1 is trivial: delete the build folder and build again. If you're not building in a shadow build folder, you've set yourself up for failure, hopefully you now know enough to stop :)
Fixing #2 is not trivial. View manual memory management and possible buffer overflows with suspicion. Use modern C++ programming techniques to leverage the compiler to help you out: store things by value, use containers, use smart pointers, and use iterators and range-for instead of pointers. Don't use C-style arrays. Abhor C-style APIs of the (Type * array, int count) kind - they don't belong in C++.
What fun. I've boiled this down to the bottom.
//#include <otherthirdpartyheader.h>
#include <thirdpartyobject.h>
int main(...)
{
ThirdPartyObject test;
return 0;
}
This code runs. If I uncomment the first include, delete all build artifacts, and build again, then it breaks. There's obviously a header/macro component, and probably some kind of compiler-optimization component. But, get this, according to the library documentation it should give me a segfault every time because I haven't been doing a required initialization step. So the fact that it runs at all indicates unexpected behavior.
I'm chalking this up to library-specific issues rather than broad spectrum C++ issues. I'll be contacting the vendor going forward from here, but thanks everyone for the help.

Remove never-run call to templated function, get allocation error on run-time

I have a piece of templated code that is never run, but is compiled. When I remove it, another part of my program breaks.
First off, I'm a bit at a loss as to how to ask this question. So I'm going to try throwing lots of information at the problem.
Ok, so, I went to completely redesign my test project for my experimental core library thingy. I use a lot of template shenanigans in the library. When I removed the "user" code, the tests gave me a memory allocation error. After quite a bit of experimenting, I narrowed it down to this bit of code (out of a couple hundred lines):
void VOODOO(components::switchBoard &board) {
board.addComponent<using_allegro::keyInputs<'w'> >();
}
Fundementally, what's weirding me out is that it appears that the act of compiling this function (and the template function it then uses, and the template functions those then use...), makes this bug not appear. This code is not being run. Similar code (the same, but for different key vals) occurs elsewhere, but is within Boost TDD code.
I realize I certainly haven't given enough information for you to solve it for me; I tried, but it more-or-less spirals into most of the code base. I think I'm most looking for "here's what the problem could be", "here's where to look", etc. There's something that's happening during compile because of this line, but I don't know enough about that step to begin looking.
Sooo, how can a (presumably) compilied, but never actually run, bit of templated code, when removed, cause another part of code to fail?
Error:
Unhandled exceptionat 0x6fe731ea (msvcr90d.dll) in Switchboard.exe:
0xC0000005: Access violation reading location 0xcdcdcdc1.
Callstack:
operator delete(void * pUser Data)
allocator< class name related to key inputs callbacks >::deallocate
vector< same class >::_Insert_n(...)
vector< " " >::insert(...)
vector<" ">::push_back(...)
It looks like maybe the vector isn't valid, because _MyFirst and similar data members are showing values of 0xcdcdcdcd in the debugger. But the vector is a member variable...
Update: The vector isn't valid because it's never made. I'm getting a channel ID value stomp, which is making me treat one type of channel as another.
Update:
Searching through with the debugger again, it appears that my method for giving each "channel" it's own, unique ID isn't giving me a unique ID:
inline static const char channel<template args>::idFunction() {
return reinterpret_cast<char>(&channel<CHANNEL_IDENTIFY>::idFunction);
};
Update2: These two are giving the same:
slaveChannel<switchboard, ALLEGRO_BITMAP*, entityInfo<ALLEGRO_BITMAP*>
slaveChannel<key<c>, char, push<char>
Sooo, having another compiled channel type changing things makes sense, because it shifts around the values of the idFunctions? But why are there two idFunctions with the same value?
you seem to be returning address of the function as a character? that looks weird. char has much smaller bit count than pointer, so it's highly possible you get same values. that could reason why changing code layout fixes/breaks your program
As a general answer (though aaa's comment alludes to this): When something like this affects whether a bug occurs, it's either because (a) you're wrong and it is being run, or (b) the way that the inclusion of that code happens to affect your code, data, and memory layout in the compiled program causes a heisenbug to change from visible to hidden.
The latter generally occurs when something involves undefined behavior. Sometimes a bogus pointer value will cause you to stomp on a bit of your code (which might or might not be important depending on the code layout), or sometimes a bogus write will stomp on a value in your data stack that might or might not be a pointer that's used later, or so forth.
As a simple example, supposing you have a stack that looks like:
float data[10];
int never_used;
int *important pointer;
And then you erroneously write
data[10] = 0;
Then, assuming that stack got allocated in linear order, you'll stomp on never_used, and the bug will be harmless. However, if you remove never_used (or change something so the compiler knows it can remove it for you -- maybe you remove a never-called function call that would use it), then it will stomp on important_pointer instead, and you'll now get a segfault when you dereference it.

How to debug 'value of ESP was not saved across function call' error?

On rare occasions when my program exits, I get a "value of ESP has not been saved across a function call" error. The error is quite random and hard to reproduce.
How do I debug this error (VC++ 2008)? How harsh it is, as it only occurs on shutdown? Is the error visible also in release mode?
This means that either you call a function with a wrong calling convention - that often happens when you declare a function pointer improperly - or there's something overwriting the stack.
To debug the former check what function causes this situation. To debug the latter look for thing like stack-allocated buffer overruns.
I had this same problem and managed to fix it. In my case though things were are very specific. It's hard to tell without some sample code posted. This is what was causing the problem. Below I'll show an example of what was breaking my program.
class MyClass; //Forward declaration
typedef (MyClass::*CallBack)(Object*);
When registering a new CallBack the program crashed as it was leaving the current function call.
class ThisClass : public MyClass
{
//...
}
//...
//...
void ThisClass::Init(void)
{
Sys.RegisterCallBack((CallBack)&ThisClass::Foo);
} //The program crashed at this line
To fix the problem I got rid of the forward declaration and simply included the header file.
#include "MyClass.h"
typedef (MyClass::*CallBack)(Object*);
To summarize, do not forward declare when you want to use a member function pointer from that class!
This means some part of your program wrote over the stack. That's bad. You're just lucky that right now it happens on shutdown, but sooner or later someone may use the failing function in another place.
When the message goes off you can see the function you're in. What you can do is rerun the program, and when entering the function, put a data breakpoint at the location esp was written to. Then run to the end of the function - the offending code will trigger the data breakpoint.

Virtual methods chaos, how can I find what causes that?

A colleague of mine had a problem with some C++ code today. He was debugging the weird behaviour of an object's virtual method. Whenever the method executed ( under debug, Visual Studio 2005 ), everything went wrong, and the debugger wouldn't step in that method, but in the object's destructor! Also, the virtual table of the object, only listed it's destructor, no other methods.
I haven't seen this behaviour before, and a runtime error was printed, saying something about the ESP register. I wish I could give you the right error message, but I don't remember it correctly now.
Anyway, have any of you guys ever encountered that? What could cause such behaviour? How would that be fixed? We tried to rebuild the project many times, restarted the IDE, nothing helped. We also used the _CrtCheckMemory function before that method call to make sure the memory was in a good state, and it returned true ( which means ok ) . I have no more ideas. Do you?
I've seen that before. Generally it occurs because I'm using a class from a Release built .LIB file while I'm in Debug mode. Someone else probably has seen a better example and I'd yield my answer to their answer.
Maybe you use C-style casts where a static_cast<> is required? This may result in the kind of error you report, whenever multiple inheritance is involved, e.g.:
class Base1 {};
class Base2 {};
class Derived : public Base1, public Base2 {};
Derived *d = new Derived;
Base2* b2_1 = (Base2*)d; // wrong!
Base2* b2_2 = static_cast<Base2*>(d); // correct
assert( b2_1 == b2_2 ); // assertion may fail, because b2_1 != b2_2
Note, this may not always be the case, this depends on the compiler and on declarations of all the classes involved (it probably happens when all classes have virtual methods, but I do not have exact rules at hand).
OR: A completely different part of your code is going wild and is overwriting memory. Try to isolate the error and check if it still occurs. CrtCheckMemory will find only a few cases where you overwrite memory (e.g. when you write into specially marked heap management locations).
If you ever call a function with the wrong number of parameters, this can easily end up trashing your stack and producing undefined behaviour. I seem to recall that certain errors when using MFC could easily cause this, for example if you use the dispatch macros to point a message at a method that doesn't have the right number or type of parameters (I seem to recall that those function pointers aren't strongly checked for type). It's been probably a decade since I last encountered that particular problem, so my memory is hazy.
The value of ESP was not properly saved across a function call.
This sort of behaviour is usually indicative of the calling code having been compiled with a different definition of a class or function than the code that created the particular class in question.
Is it possible that there is an different version of a component dll that is being loaded instead of the freshly built one? This can happen if you copy things as part of a post-build step or if the process is run from a different directory or changes its dll search path before doing a LoadLibrary or equivalent.
I've encountered it most often in complex projects where a class definition is changed to add, remove or change the signature of a virtual function and then an incremental build is done and not all the code that needs to be recompiled is actually recompiled. Theoretically, it could happen if some part of the program is overwriting the vptr or vtables of some polymorhpic objects but I've always found that a bad partial build is a much more likely cause.
This may 'user error', a developer deliberately tells the compiler to only build one project when others should be rebuilt, or it can be having multiple solutions or multiple projects in a solution where the dependencies are not correctly setup.
Very occasionally, Visual Studio can slip up and not get the generated dependencies correct even when the projects in a solution are correctly linked. This happens less often than Visual Studios is blamed for it.
Expunging all intermediate build files and rebuilding everything from source usually fixes the problem. Obviously for very large projects this can be a severe penalty.
Since it's guess-fest anyway, here's one from me:
You stack is messed up and _CrtCheckMemory doesn't check for that. As to why the stack is corrupted:
good old stack overflow
calling convention mismatches, which was already mentioned (I don't know, like passing a callback in the wrong calling convention to a WinAPI function; what static or dynamic libraries are you linking with?)
a line like printf("%d");

Finding C++ static initialization order problems

We've run into some problems with the static initialization order fiasco, and I'm looking for ways to comb through a whole lot of code to find possible occurrences. Any suggestions on how to do this efficiently?
Edit: I'm getting some good answers on how to SOLVE the static initialization order problem, but that's not really my question. I'd like to know how to FIND objects that are subject to this problem. Evan's answer seems to be the best so far in this regard; I don't think we can use valgrind, but we may have memory analysis tools that could perform a similar function. That would catch problems only where the initialization order is wrong for a given build, and the order can change with each build. Perhaps there's a static analysis tool that would catch this. Our platform is IBM XLC/C++ compiler running on AIX.
Solving order of initialization:
First off, this is just a temporary work-around because you have global variables that you are trying to get rid of but just have not had time yet (you are going to get rid of them eventually aren't you? :-)
class A
{
public:
// Get the global instance abc
static A& getInstance_abc() // return a reference
{
static A instance_abc;
return instance_abc;
}
};
This will guarantee that it is initialised on first use and destroyed when the application terminates.
Multi-Threaded Problem:
C++11 does guarantee that this is thread-safe:
ยง6.7 [stmt.dcl] p4
If control enters the declaration concurrently while the variable is being initialized, the concurrent execution shall wait for completion of the initialization.
However, C++03 does not officially guarantee that the construction of static function objects is thread safe. So technically the getInstance_XXX() method must be guarded with a critical section. On the bright side, gcc has an explicit patch as part of the compiler that guarantees that each static function object will only be initialized once even in the presence of threads.
Please note: Do not use the double checked locking pattern to try and avoid the cost of the locking. This will not work in C++03.
Creation Problems:
On creation, there are no problems because we guarantee that it is created before it can be used.
Destruction Problems:
There is a potential problem of accessing the object after it has been destroyed. This only happens if you access the object from the destructor of another global variable (by global, I am referring to any non-local static variable).
The solution is to make sure that you force the order of destruction.
Remember the order of destruction is the exact inverse of the order of construction. So if you access the object in your destructor, you must guarantee that the object has not been destroyed. To do this, you must just guarantee that the object is fully constructed before the calling object is constructed.
class B
{
public:
static B& getInstance_Bglob;
{
static B instance_Bglob;
return instance_Bglob;;
}
~B()
{
A::getInstance_abc().doSomthing();
// The object abc is accessed from the destructor.
// Potential problem.
// You must guarantee that abc is destroyed after this object.
// To guarantee this you must make sure it is constructed first.
// To do this just access the object from the constructor.
}
B()
{
A::getInstance_abc();
// abc is now fully constructed.
// This means it was constructed before this object.
// This means it will be destroyed after this object.
// This means it is safe to use from the destructor.
}
};
I just wrote a bit of code to track down this problem. We have a good size code base (1000+ files) that was working fine on Windows/VC++ 2005, but crashing on startup on Solaris/gcc.
I wrote the following .h file:
#ifndef FIASCO_H
#define FIASCO_H
/////////////////////////////////////////////////////////////////////////////////////////////////////
// [WS 2010-07-30] Detect the infamous "Static initialization order fiasco"
// email warrenstevens --> [initials]#[firstnamelastname].com
// read --> http://www.parashift.com/c++-faq-lite/ctors.html#faq-10.12 if you haven't suffered
// To enable this feature --> define E-N-A-B-L-E-_-F-I-A-S-C-O-_-F-I-N-D-E-R, rebuild, and run
#define ENABLE_FIASCO_FINDER
/////////////////////////////////////////////////////////////////////////////////////////////////////
#ifdef ENABLE_FIASCO_FINDER
#include <iostream>
#include <fstream>
inline bool WriteFiasco(const std::string& fileName)
{
static int counter = 0;
++counter;
std::ofstream file;
file.open("FiascoFinder.txt", std::ios::out | std::ios::app);
file << "Starting to initialize file - number: [" << counter << "] filename: [" << fileName.c_str() << "]" << std::endl;
file.flush();
file.close();
return true;
}
// [WS 2010-07-30] If you get a name collision on the following line, your usage is likely incorrect
#define FIASCO_FINDER static const bool g_psuedoUniqueName = WriteFiasco(__FILE__);
#else // ENABLE_FIASCO_FINDER
// do nothing
#define FIASCO_FINDER
#endif // ENABLE_FIASCO_FINDER
#endif //FIASCO_H
and within every .cpp file in the solution, I added this:
#include "PreCompiledHeader.h" // (which #include's the above file)
FIASCO_FINDER
#include "RegularIncludeOne.h"
#include "RegularIncludeTwo.h"
When you run your application, you will get an output file like so:
Starting to initialize file - number: [1] filename: [p:\\OneFile.cpp]
Starting to initialize file - number: [2] filename: [p:\\SecondFile.cpp]
Starting to initialize file - number: [3] filename: [p:\\ThirdFile.cpp]
If you experience a crash, the culprit should be in the last .cpp file listed. And at the very least, this will give you a good place to set breakpoints, as this code should be the absolute first of your code to execute (after which you can step through your code and see all of the globals that are being initialized).
Notes:
It's important that you put the "FIASCO_FINDER" macro as close to the top of your file as possible. If you put it below some other #includes you run the risk of it crashing before identifying the file that you're in.
If you're using Visual Studio, and pre-compiled headers, adding this extra macro line to all of your .cpp files can be done quickly using the Find-and-replace dialog to replace your existing #include "precompiledheader.h" with the same text plus the FIASCO_FINDER line (if you check off "regular expressions, you can use "\n" to insert multi-line replacement text)
Depending on your compiler, you can place a breakpoint at the constructor initialization code. In Visual C++, this is the _initterm function, which is given a start and end pointer of a list of the functions to call.
Then step into each function to get the file and function name (assuming you've compiled with debugging info on). Once you have the names, step out of the function (back up to _initterm) and continue until _initterm exits.
That gives you all the static initializers, not just the ones in your code - it's the easiest way to get an exhaustive list. You can filter out the ones you have no control over (such as those in third-party libraries).
The theory holds for other compilers but the name of the function and the capability of the debugger may change.
perhaps use valgrind to find usage of uninitialized memory. The nicest solution to the "static initialization order fiasco" is to use a static function which returns an instance of the object like this:
class A {
public:
static X &getStatic() { static X my_static; return my_static; }
};
This way you access your static object is by calling getStatic, this will guarantee it is initialized on first use.
If you need to worry about order of de-initialization, return a new'd object instead of a statically allocated object.
EDIT: removed the redundant static object, i dunno why but i mixed and matched two methods of having a static together in my original example.
There is code that essentially "initializes" C++ that is generated by the compiler. An easy way to find this code / the call stack at the time is to create a static object with something that dereferences NULL in the constructor - break in the debugger and explore a bit. The MSVC compiler sets up a table of function pointers that is iterated over for static initialization. You should be able to access this table and determine all static initialization taking place in your program.
We've run into some problems with the
static initialization order fiasco,
and I'm looking for ways to comb
through a whole lot of code to find
possible occurrences. Any suggestions
on how to do this efficiently?
It's not a trivial problem but at least it can done following fairly simple steps if you have an easy-to-parse intermediate-format representation of your code.
1) Find all the globals that have non-trivial constructors and put them in a list.
2) For each of these non-trivially-constructed objects, generate the entire potential-function-tree called by their constructors.
3) Walk through the non-trivially-constructor function tree and if the code references any other non-trivially constructed globals (which are quite handily in the list you generated in step one), you have a potential early-static-initialization-order issue.
4) Repeat steps 2 & 3 until you have exhausted the list generated in step 1.
Note: you may be able to optimize this by only visiting the potential-function-tree once per object class rather than once per global instance if you have multiple globals of a single class.
Replace all the global objects with global functions that return a reference to an object declared static in the function. This isn't thread-safe, so if your app is multi-threaded you might need some tricks like pthread_once or a global lock. This will ensure that everything is initialized before it is used.
Now, either your program works (hurrah!) or else it sits in an infinite loop because you have a circular dependency (redesign needed), or else you move on to the next bug.
The first thing you need to do is make a list of all static objects that have non-trivial constructors.
Given that, you either need to plug through them one at a time, or simply replace them all with singleton-pattern objects.
The singleton pattern comes in for a lot of criticism, but the lazy "as-required" construction is a fairly easy way to fix the majority of the problems now and in the future.
old...
MyObject myObject
new...
MyObject &myObject()
{
static MyObject myActualObject;
return myActualObject;
}
Of course, if your application is multi-threaded, this can cause you more problems than you had in the first place...
Gimpel Software (www.gimpel.com) claims that their PC-Lint/FlexeLint static analysis tools will detect such problems.
I have had good experience with their tools, but not with this specific issue so I can't vouch for how much they would help.
Some of these answers are now out of date. For the sake of people coming from search engines, like myself:
On Linux and elsewhere, finding instances of this problem is possible through Google's AddressSanitizer.
AddressSanitizer is a part of LLVM starting with version 3.1 and a
part of GCC starting with version 4.8
You would then do something like the following:
$ g++ -fsanitize=address -g staticA.C staticB.C staticC.C -o static
$ ASAN_OPTIONS=check_initialization_order=true:strict_init_order=true ./static
=================================================================
==32208==ERROR: AddressSanitizer: initialization-order-fiasco on address ... at ...
#0 0x400f96 in firstClass::getValue() staticC.C:13
#1 0x400de1 in secondClass::secondClass() staticB.C:7
...
See here for more details:
https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
Other answers are correct, I just wanted to add that the object's getter should be implemented in a .cpp file and it should not be static. If you implement it in a header file, the object will be created in each library / framework you call it from....
If your project is in Visual Studio (I've tried this with VC++ Express 2005, and with Visual Studio 2008 Pro):
Open Class View (Main menu->View->Class View)
Expand each project in your solution and Click on "Global Functions and Variables"
This should give you a decent list of all of the globals that are subject to the fiasco.
In the end, a better approach is to try to remove these objects from your project (easier said than done, sometimes).