Debugging a sigtrap in c++ - c++

In a recent program I've made I had some worrying random crashes/ crash on shutdown, after stripping down I think I've narrowed it down to a SIGTRAP which happens when a vector is created under specific circumstances. The main bulk of the code can be found here :http://pastebin.com/xp9Cm04Q and the tile class here : http://pastebin.com/Niv7SSyF (the issue arises when the buildworld subroutine is ran) and the console output can be found here http://pastebin.com/7HyaMke8 . The debugger goes to new_allicator when this happens, if that's worth knowing.
Also note that for some reason removing the calls to rTest in the tiles (which only makes a call to the RNG that class has), but only if another subZone has been created since then. Needless to say I am completely baffled as to why this is happening.
Am I doing anything dumb here? I'm only using std libraries, so I don't think I could of installed them wrong or anything. Is this an issue I can/should ignore? Any kind of help on how to approach this issue is greatly appreciated.

tiles.back().back().giveRGen(&zoneRGen);
One problem: your tile have a link to RNG object; that link would be broken then subZone copied. For example here:
allZones.push_back( subZone( x , y , worldRGen() ) );

Related

C++ What could cause a line of code in one place to change behavior in an unrelated function?

Consider this mock-up of my situation.
in an external header:
class ThirdPartyObject
{
...
}
my code: (spread among a few headers and source files)
class ThirdPartyObjectWrapper
{
private:
ThirdPartyObject myObject;
}
class Owner
{
public:
Owner() {}
void initialize();
private:
ThirdPartyObjectWrapper myWrappedObject;
};
void Owner::initialize()
{
//not weird:
//ThirdPartyObjectWrapper testWrappedObject;
//weird:
//ThirdPartyObject testObject;
}
ThirdPartyObject is, naturally, an object defined by a third party (static precompiled) library I'm using. ThirdPartyObjectWrapper is a convenience class that eliminates a lot of boiler-plating for working with ThirdPartyObject. Owner::initialize() is called shortly after an instance of Owner is created.
Notice the two lines I have labeled as "weird" and "not weird" in Owner::initialize(). All I'm doing here is creating a couple of objects on the stack with their default constructors. I don't do anything with those objects and they get destroyed when they leave scope. There are no build or linker errors involved, I can uncomment either or both lines and the code will build.
However, if I uncomment "weird" then I get a segmentation fault, and (here's why I say it's weird) it's in a completely unrelated location. Not in the constructor of testObject, like you might expect, but in the constructor of Owner::myObjectWrapper::myObject. The weird line never even gets called, but somehow its presence or absence consistently changes the behavior of an unrelated function in a static library.
And consider that if I only uncomment "not weird" then it runs fine, executing the ThirdPartyObject constructor twice with no problems.
I've been working with C++ for a year so it's not really a surprise to me that something like this would be able happen, but I've about reached the limit of my ability to figure out how this gotcha is happening. I need the input of people with significantly more C++ experience than me.
What are some possibilities that could cause this to happen? What might be going on here?
Also, note, I'm not asking for advice on how to get rid of the segfault. Segfaults I understand, I suspect it's a simple race condition. What I don't understand is the behavior gotcha so that's the only thing I'm trying to get answers for.
My best lead is that it has to do with headers and macros. The third party library actually already has a couple of gotchas having to do with its headers and macros, for example the code won't build if you put your #include's in the wrong order. I'm not changing any #include's so strictly this still wouldn't make sense, but perhaps the compiler is optimizing includes based on the presence of a symbol here? (it would be the only mention of ThirdPartyObject in the file)
It also occurs to me that because I am using Qt, it could be that the Meta-Object Compiler (which generates supplementary code between compilations) might be involved in this. Very unlikely, as Qt has no knowledge of the third party library where the segfault is happening and this is not actually relevant to the functionality of the MOC (since at no point ThirdPartyObject is being passed as an argument), but it's worth investigating at least.
Related questions have suggested that it could be a relatively small buffer overflow or race condition that gets tripped up by compiler optimizations. Continuing to investigate but all leads are welcome.
Typical culprits:
Some build products are stale and not binary-compatible.
You have a memory bug that has corrupted the state of your process, and are seeing a manifestation of that in a completely unrelated location.
Fixing #1 is trivial: delete the build folder and build again. If you're not building in a shadow build folder, you've set yourself up for failure, hopefully you now know enough to stop :)
Fixing #2 is not trivial. View manual memory management and possible buffer overflows with suspicion. Use modern C++ programming techniques to leverage the compiler to help you out: store things by value, use containers, use smart pointers, and use iterators and range-for instead of pointers. Don't use C-style arrays. Abhor C-style APIs of the (Type * array, int count) kind - they don't belong in C++.
What fun. I've boiled this down to the bottom.
//#include <otherthirdpartyheader.h>
#include <thirdpartyobject.h>
int main(...)
{
ThirdPartyObject test;
return 0;
}
This code runs. If I uncomment the first include, delete all build artifacts, and build again, then it breaks. There's obviously a header/macro component, and probably some kind of compiler-optimization component. But, get this, according to the library documentation it should give me a segfault every time because I haven't been doing a required initialization step. So the fact that it runs at all indicates unexpected behavior.
I'm chalking this up to library-specific issues rather than broad spectrum C++ issues. I'll be contacting the vendor going forward from here, but thanks everyone for the help.

Call to method ends up in completely different one

I have a strange problem which appears in wxWidgets libraries and which can be reproduced easily in debugger. It does not seem to be wxWidgets-specific, so I'm asking this here.
I perform some kind of non-standard initialisation. During this initialisation a method wxAppBase::Initialize() is called. wxAppBase is the base class of my own App-class. Within wxAppBase::Initialize() an other method OnInitGui() is called, this method exists in wxAppBase as well as in my derived class.
And here is the problem: In Debug Build, everthing works well, OnInitGui() is executed as expected. But in Release-build I never reach OnInitGui() method but end up in a completely different one which does not have anything to do with OnInitGui(). So it seems an illegal jump is performed.
All pointers seem to be valid and this happens in wxWidgets library completely.
Anybody an idea what could cause such a behaviour and how one could fix this?
Every hint/idea/suggestion is welcome.
This looks like a bad build. Clean everything and rebuild both the library and your application using exactly the same compiler and compiler options and the problem will probably just disappear.

Segmentation fault when adding a vector. (C++)

So I've got a pretty basic class that has a few methods and some class variables. Everythings working great up until I add a vector to the member variables in the header file:
std::vector <std::string> vectorofstuff;
If all I do is add this line then my program run perfectly but at the end, after all the output is there, I get a message about a seg fault.
My first guess is that I need to call the destructor on the vector, but that didn't seem to work. Plus my understanding is I don't need to call the destructor unless I use the word 'new'.
Any pushes in the right direction? Thanks bunches!
I guess the following either happened to you, or it was something similar involving unrealised dependencies/headers. Either way, I hope this answer might show up on Google and help some later, extremely confused programmer figure out why they're suddenly observing arbitrary crashes.
So, from experience, this can happen if you compile a new version of SomeObject.o but accidentally have another object file #include an old version of SomeObject.hpp. This leads to corruption, which'll be caused by the compiler referring to outdated member offsets, etc. Sometimes this mostly works and only produces segfaults when destructing objects - either related or seemingly distant ones - and other times the program segfaults right away or somewhere in-between; I've seen several permutations (regrettably!).
For anyone wondering why this can happen, maybe this is just a reflection of how little sleep I get while programming, but I've encountered this pattern in the context of Git submodules, e.g.:
MyRepo
/ GuiSubmodule
/ HelperSubmodule
/ / GuiSubmodule
If (A) you have a new commit in GuiSubmodule, which has not yet been pulled into HelperSubmodule's copy, (B) your makefile compiles MyRepo/uiSubmodule/SomeObject.o, and (C) another translation unit - either in a submodule or in the main repo via the perils of #include - links against an older version of SomeObject.hpp that has a different class layout... You're in for a fun time, and a lot of chasing red herrings until you finally realise the simple mistake.
Since I had cobbled together my build process from scratch, I might've just not been using Git/make properly - or strictly enough (forgetting to push/pull all submodules). Probably the latter! I see fewer odd bugs nowadays at least :)
You are probably corrupting the memory of the vectorofstuff member somewhere within your class. When the class destructor is called the destructor of the vector is called as well, which would try to point and/or delete to invalid memory.
I was fooling around with it and decided to, just to be sure, do an rm on everything and recompile. And guess what? That fixed it. I have no idea why, in the makefile I do this anyway, but whatever, I'm just glad I can move on and continue working on it. Thanks so much for all the help!

Virtual methods chaos, how can I find what causes that?

A colleague of mine had a problem with some C++ code today. He was debugging the weird behaviour of an object's virtual method. Whenever the method executed ( under debug, Visual Studio 2005 ), everything went wrong, and the debugger wouldn't step in that method, but in the object's destructor! Also, the virtual table of the object, only listed it's destructor, no other methods.
I haven't seen this behaviour before, and a runtime error was printed, saying something about the ESP register. I wish I could give you the right error message, but I don't remember it correctly now.
Anyway, have any of you guys ever encountered that? What could cause such behaviour? How would that be fixed? We tried to rebuild the project many times, restarted the IDE, nothing helped. We also used the _CrtCheckMemory function before that method call to make sure the memory was in a good state, and it returned true ( which means ok ) . I have no more ideas. Do you?
I've seen that before. Generally it occurs because I'm using a class from a Release built .LIB file while I'm in Debug mode. Someone else probably has seen a better example and I'd yield my answer to their answer.
Maybe you use C-style casts where a static_cast<> is required? This may result in the kind of error you report, whenever multiple inheritance is involved, e.g.:
class Base1 {};
class Base2 {};
class Derived : public Base1, public Base2 {};
Derived *d = new Derived;
Base2* b2_1 = (Base2*)d; // wrong!
Base2* b2_2 = static_cast<Base2*>(d); // correct
assert( b2_1 == b2_2 ); // assertion may fail, because b2_1 != b2_2
Note, this may not always be the case, this depends on the compiler and on declarations of all the classes involved (it probably happens when all classes have virtual methods, but I do not have exact rules at hand).
OR: A completely different part of your code is going wild and is overwriting memory. Try to isolate the error and check if it still occurs. CrtCheckMemory will find only a few cases where you overwrite memory (e.g. when you write into specially marked heap management locations).
If you ever call a function with the wrong number of parameters, this can easily end up trashing your stack and producing undefined behaviour. I seem to recall that certain errors when using MFC could easily cause this, for example if you use the dispatch macros to point a message at a method that doesn't have the right number or type of parameters (I seem to recall that those function pointers aren't strongly checked for type). It's been probably a decade since I last encountered that particular problem, so my memory is hazy.
The value of ESP was not properly saved across a function call.
This sort of behaviour is usually indicative of the calling code having been compiled with a different definition of a class or function than the code that created the particular class in question.
Is it possible that there is an different version of a component dll that is being loaded instead of the freshly built one? This can happen if you copy things as part of a post-build step or if the process is run from a different directory or changes its dll search path before doing a LoadLibrary or equivalent.
I've encountered it most often in complex projects where a class definition is changed to add, remove or change the signature of a virtual function and then an incremental build is done and not all the code that needs to be recompiled is actually recompiled. Theoretically, it could happen if some part of the program is overwriting the vptr or vtables of some polymorhpic objects but I've always found that a bad partial build is a much more likely cause.
This may 'user error', a developer deliberately tells the compiler to only build one project when others should be rebuilt, or it can be having multiple solutions or multiple projects in a solution where the dependencies are not correctly setup.
Very occasionally, Visual Studio can slip up and not get the generated dependencies correct even when the projects in a solution are correctly linked. This happens less often than Visual Studios is blamed for it.
Expunging all intermediate build files and rebuilding everything from source usually fixes the problem. Obviously for very large projects this can be a severe penalty.
Since it's guess-fest anyway, here's one from me:
You stack is messed up and _CrtCheckMemory doesn't check for that. As to why the stack is corrupted:
good old stack overflow
calling convention mismatches, which was already mentioned (I don't know, like passing a callback in the wrong calling convention to a WinAPI function; what static or dynamic libraries are you linking with?)
a line like printf("%d");

Visual Studio 2005 C compiler problem when optimizing a switch statement

General Question which may be of interest to others:
I ran into a, what I believe, C++-compiler optimization (Visual Studio 2005) problem with a switch statement. What I'd want to know is if there is any way to satisfy my curiosity and find out what the compiler is trying to but failing to do. Is there any log I can spend some time (probably too much time) deciphering?
My specific problem for those curious enough to continue reading - I'd like to hear your thoughts on why I get problems in this specific case.
I've got a tiny program with about 500 lines of code containing a switch statement. Some of its cases contain some assignment of pointers.
double *ptx, *pty, *ptz;
double **ppt = new double*[3];
//some code initializing etc ptx, pty and ptz
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
The middle statement seems to hang the compiler. The compilation never ends. OK, I didn't wait for longer than it took to walk down the hall, talk to some people, get a cup of coffee and return to my desk, but this is a tiny program which usually compiles in less than a second. Remove a single line (the one indicated in the code above) and the problem goes away, as it also does when removing the optimization (on the whole program or using #pragma on the function).
Why does this middle line cause a problem? The compilers optimizer doesn't like pty.
There is no difference in the vectors ptx, pty, and ptz in the program. Everything I do to pty I do to ptx and ptz. I tried swapping their positions in ppt, but pty was still the line causing a problem.
I'm asking about this because I'm curious about what is happening. The code is rewritten and is working fine.
Edit:
Almost two weeks later, I check out the closest version to the code I described above and I can't edit it back to make it crash. This is really annoying, embarrassing and irritating. I'll give it another try, but if I don't get it breaking anytime soon I guess this part of the question is obsolete and I'll remove it. Really sorry for taking your time.
If you need to make this code compilable without changing it too much consider using memcpy where you assign a value to ppt[1]. This should at least compile fine.
However, you problem seems more like another part of the source code causes this behaviour.
What you can also try is to put this stuff:
ppt[0]=ptx;
ppt[1]=pty; //<----- this statement causes problems
ppt[2]=ptz;
in another function.
This should also help compiler a bit to avoid the path it is taking to compile your code.
Did you try renaming pty to something else (i.e. pt_y)? I encountered a couple of times (i.e. with a variable "rect2") the problem that some names seem to be "reserved".
It sounds like a compiler bug. Have you tried re-ordering the lines? e.g.,
ppt[1]=pty;
ppt[0]=ptx;
ppt[2]=ptz;
Also what happens if you juggle about the values that are assigned (which will introduce bugs in your code, but may indicator whether its the pointer or the array that's the issue), e.g.:
ppt[0] = pty;
ppt[1] = ptz;
ppt[2] = ptx;
(or similar).
It's probably due to your declaration of ptx, pty and ptz with them being optimised out to use the same address. Then this action is causing your compiler problems later in your code.
Try
static double *ptx;
static double *pty;
static double *ptz;