We have been debugging a strange case for some days now and have somewhat isolated the bug, but it still doesn't make any sense. Perhaps someone here can give me a clue about what is going on.
The problem is an access violation that occurs in one part of the code.
Basically we have something like this:
void aclass::somefunc() {
    try {
        erroneous_member_function(*someptr);
    }
    catch (AnException) {
    }
}

void aclass::erroneous_member_function(const SomeObject& ref) {
    // { // <-- add a scope here and the error goes away
    LargeObject obj = Singleton()->Object.someLargeObj; // <-- remove this and the error goes away
    // DummyDestruct dummy1; // <-- this is not destroyed before the unreachable
    throw AnException();
    // } // <-- end the scope here and the error goes away
    UnreachableClass unreachable; // <-- remove this and the error goes away
    DummyDestruct dummy2; // <-- destructor of this object is called!
}
In the debugger it actually looks like it is destructing the UnreachableClass, and when I insert the DummyDestruct object, it does not get destroyed before the strange destructor is called. So it does not seem like the destruction of the LargeObject is what goes awry.
All this is in the middle of production code, and it is very hard to isolate it to a small example.
My question is: does anyone have a clue about what is causing this, and what is happening? I have a fairly full-featured debugger available (Embarcadero RAD Studio), but right now I am not sure what to do with it.
Can anyone give me some advice on how to proceed?
Update:
I placed a DummyDestruct object below the throw statement and put a breakpoint in its destructor. The destructor for this object is entered (and its only use is in this piece of code).
With the information you have provided, and if everything is as you state, the only possible answer is a bug in the compiler/optimizer. Just add the extra scope with a comment (this is, again, assuming everything is exactly as you have stated).
Stuff like this sometimes happens due to writing through uninitialized pointers, out of bounds array access, etc. The point at which the error is caused may be quite removed from the place where it manifests. However, based on the symptoms you describe it seems to be localized in this function. Could the copy constructor of LargeObject be misbehaving? Is ref being used? Perhaps somePtr isn't pointing to a valid SomeObject. Is Singleton() returning a pointer to a valid object?
A compiler bug is also a possibility, especially with aggressive optimization turned on. I would try to reproduce the problem with optimizations disabled.
Time to practice my telepathic debugging skills:
My best guess is your application has a stack corruption bug. This can write junk over the call stack, which means the debugger is incorrectly reporting the function when you break, and it's not really in the destructor. Either that or you are incorrectly interpreting the debugger's information and the object really is being destructed correctly, but you don't know why!
If stack corruption is the case, you're going to have a really tough time working out what the root cause is. This is why it's important to implement tonnes of diagnostics (e.g., asserts) throughout your program, so you can catch the corruption when it happens rather than getting stuck on its weird side effects.
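A contrived sketch (all names invented) of the kind of bug being described: writing past the end of a local array silently tramples whatever the compiler placed next to it, so the failure shows up later and somewhere else entirely.
void parse_name(const char* input) {
    char buffer[8];
    // No bounds check: a long input writes past buffer[7] and corrupts the
    // neighbouring stack data (other locals, saved registers, return address).
    int i = 0;
    for (; input[i] != '\0'; ++i)
        buffer[i] = input[i];
    buffer[i] = '\0';
    // An assert such as assert(strlen(input) < sizeof(buffer)) would catch
    // the corruption at its source instead of via its side effects.
}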
This might be a real long shot but I'm going to put it out there anyway...
You say you use Borland - what version? And you say you see the error in a string - STL? Do you include winsock2 at all in your project?
The reason I ask is that I had a problem when using Borland 6 (2002) and winsock: the header seemed to mess up the structure packing, which meant different translation units had different ideas of the memory layout of std::string, depending on which headers each translation unit included, with predictably disastrous results.
Here's another wild guess, since you mentioned strings. I know of at least one implementation where (STL) string copying is done lazily: no actual copying of the string contents takes place until a change is made; the "copy" simply points at the same buffer as the source. In that particular implementation (GNU) there is reportedly a bug whereby excessive copying causes the reference counter (how many objects share the same underlying string memory after supposedly copying it) to roll over to 0, resulting in all sorts of mischief. I haven't encountered this bug myself, but was told about it by someone who has. (I say this because one would think the reference counter would be a 32-bit number, and the chances of that ever rolling over are pretty slim, to say the least, so I may not be describing the problem exactly right.)
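To make the mechanism concrete, here is a toy sketch (purely illustrative, not the GNU implementation; assignment is omitted for brevity) of a copy-on-write string whose reference counter is deliberately narrow enough to wrap:
#include <cstring>

class CowString {
    struct Rep {
        unsigned char refs; // absurdly small on purpose, to show the wrap-around risk
        char*         data;
    };
    Rep* rep_;
public:
    explicit CowString(const char* s) : rep_(new Rep) {
        rep_->refs = 1;
        rep_->data = new char[std::strlen(s) + 1];
        std::strcpy(rep_->data, s);
    }
    // "Copying" only bumps the shared counter; it silently wraps after 255 increments.
    CowString(const CowString& other) : rep_(other.rep_) { ++rep_->refs; }
    ~CowString() {
        // Once the counter has wrapped it under-counts the live copies, so this
        // can hit zero and free the buffer while other copies still point at it.
        if (--rep_->refs == 0) {
            delete[] rep_->data;
            delete rep_;
        }
    }
    const char* c_str() const { return rep_->data; }
};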
Please see the following code:
#include <iostream>
#include <windows.h> // for Sleep
using namespace std;

class MyClass
{
public:
    int i;
    MyClass()
    {
        i = 10;
    }
};

MyClass* pObj = nullptr;

int main()
{
    {
        MyClass obj;
        pObj = &obj;
    }
    while (1)
    {
        cout << pObj->i; // pObj is a dangling pointer, still no crash.
        Sleep(1000);
    }
    return 0;
}
obj dies once it goes out of scope. But when I tested this in VS 2017, I saw no crash even though I use it afterwards.
Is it good practice to reset the int member variable i?
Accessing a member after an object has been destroyed is undefined behavior. It may seem like a good idea to set members in the destructor to a predictable yet unusual value, e.g., a rather large value or a value with a specific bit pattern, making it easy to recognize in a debugger.
However, this idea is flawed and defeated in practice:
All classes would need to play along, and instead of concentrating on writing correct code, developers would spend time (both development time and run time) making pointless changes.
Compilers happen to be rather smart and can detect that the changes in the destructor are not needed. Since a correct program cannot observe whether the change was made, they may not make the change at all. This effect is an actual issue for security applications where, e.g., a password should be erased from memory so it cannot be read (using some non-portable means).
Even if the value gets set to a specific value, memory gets reused and the values get overwritten. Especially with objects on the stack it is most likely that the memory is used for something else before you see the bad value in a debugger.
Even when resetting values you would not necessarily see a "crash": a crash is caused by something being set up to protect against something invalid. In your example you are accessing an int on the stack: the stack will remain accessible from a CPU point of view, and at best you'd get an unexpected value. Use of unusual pointer values typically leads to a crash because the memory management system tries to access a location which isn't mapped, but even that isn't guaranteed: on a busy 32-bit system pretty much all memory may be in use. That is, trying to rely on undefined behavior being detected is also futile.
Correspondingly, it is much better to use good coding practices which avoid dangling references right away and concentrate on using these. For example, I'm always initializing members in the member initializer list, even in the rare cases they end up getting changed in the body of the constructor (i.e., you'd write your constructor as MyClass(): i() {}).
As a debugging tool it may be reasonable to replace the allocation functions (ideally the allocator object, but potentially the global operator new()/operator delete() and family) with a version which doesn't quickly hand out released memory and instead fills the released memory with a predictable pattern. Since these actions slow the program down, you'd only use this code in a debug build, but it is relatively simple to implement once and easy to enable/disable centrally, so it may be worth the effort. In practice I don't think even such a system pays off, as the use of managed pointers and proper design of ownership and lifetime avoid most errors due to dangling references.
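A minimal sketch of the poison-on-free half of that idea (the header layout and the 0xDD fill pattern are my own choices, not from the answer); a real version would also replace operator new[]/operator delete[] and delay the reuse of freed blocks:
#include <cstdlib>
#include <cstring>
#include <new>

namespace {
    const std::size_t kHeader = 16; // room for the size, keeps malloc's alignment
}

void* operator new(std::size_t size) {
    // Remember the requested size in a small header in front of the user block.
    unsigned char* raw = static_cast<unsigned char*>(std::malloc(size + kHeader));
    if (!raw) throw std::bad_alloc();
    std::memcpy(raw, &size, sizeof size);
    return raw + kHeader;
}

void operator delete(void* ptr) throw() {
    if (!ptr) return;
    unsigned char* raw = static_cast<unsigned char*>(ptr) - kHeader;
    std::size_t size;
    std::memcpy(&size, raw, sizeof size);
    std::memset(ptr, 0xDD, size); // poison: dangling reads now show a recognizable pattern
    std::free(raw);
}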
The behaviour of the code you gave is undefined. One particular case of undefined behaviour is working as expected, so there is nothing strange about the code working. It can work now and break at any time depending on compiler version, compiler options, stack contents, and the phase of the moon.
So first and most important is to avoid dangling pointers (and all other kinds of undefined behaviour) everywhere.
As for clearing variables in the destructor, here is what I consider best practice:
Follow coding rules that save you from accessing unallocated or destroyed objects. I cannot describe them in a few words, but such rules are pretty common (see here and elsewhere).
Have code reviewed by humans and analyzed by static analyzers (like cppcheck or PVS-Studio) to avoid cases similar to the one you described above.
Do not call delete manually; prefer scoped_ptr or similar object lifetime managers. When a manual delete is reasonable, I usually (usually) set the pointer to nullptr after deletion to protect myself from mistakes.
Use pointers as rarely as possible; references are preferred.
When objects of my class are used externally and I suspect that somebody might access them after deletion, I can put a signature field inside, set it to something like 0xDEAD in the destructor, and check it at the entry of every public method (see the sketch after this list). Be careful here not to slow your code down to an unacceptable degree.
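A hypothetical sketch of that signature-field idea (Widget, kAlive and kDead are made-up names); note that reading the field after destruction is itself undefined behaviour, so this is only a best-effort debugging aid:
#include <cassert>

class Widget {
public:
    Widget() : m_signature(kAlive) {}
    ~Widget() { m_signature = kDead; } // scribble the magic value on destruction

    void doWork() const {
        // Cheap sanity check at the entry of every public method.
        assert(m_signature == kAlive && "doWork() called on a destroyed Widget");
        // ... actual work ...
    }

private:
    static const unsigned kAlive = 0xA11FE;
    static const unsigned kDead  = 0xDEAD;
    unsigned m_signature;
};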
After all of this, setting i from your example to 0 or -1 is redundant. To my mind, it is not something you should focus your attention on.
If I come across old code that does if (!this) return; in an app, how severe a risk is this? Is it a dangerous ticking time bomb that requires an immediate app-wide search and destroy effort, or is it more like a code smell that can be quietly left in place?
I am not planning on writing code that does this, of course. Rather, I've recently discovered something in an old core library used by many pieces of our app.
Imagine a CLookupThingy class has a non-virtual CThingy *CLookupThingy::Lookup( name ) member function. Apparently one of the programmers back in those cowboy days encountered many crashes where NULL CLookupThingy *s were being passed from functions, and rather than fixing hundreds of call sites, he quietly fixed up Lookup():
CThingy *CLookupThingy::Lookup( name )
{
if (!this)
{
return NULL;
}
// else do the lookup code...
}
// now the above can be used like
CLookupThingy *GetLookup()
{
if (notReady()) return NULL;
// else etc...
}
CThingy *pFoo = GetLookup()->Lookup( "foo" ); // will set pFoo to NULL without crashing
I discovered this gem earlier this week, but now am conflicted as to whether I ought to fix it. This is in a core library used by all of our apps. Several of those apps have already been shipped to millions of customers, and it seems to be working fine; there are no crashes or other bugs from that code. Removing the if !this in the lookup function will mean fixing thousands of call sites that potentially pass NULL; inevitably some will be missed, introducing new bugs that will pop up randomly over the next year of development.
So I'm inclined to leave it alone, unless absolutely necessary.
Given that it is technically undefined behavior, how dangerous is if (!this) in practice? Is it worth man-weeks of labor to fix, or can MSVC and GCC be counted on to safely return?
Our app compiles on MSVC and GCC, and runs on Windows, Ubuntu, and MacOS. Portability to other platforms is irrelevant. The function in question is guaranteed to never be virtual.
Edit: The kind of objective answer I am looking for is something like
"Current versions of MSVC and GCC use an ABI where nonvirtual members are really statics with an implicit 'this' parameter; therefore they will safely branch into the function even if 'this' is NULL" or
"a forthcoming version of GCC will change the ABI so that even nonvirtual functions require loading a branch target from the class pointer" or
"the current GCC 4.5 has an inconsistent ABI where sometimes it compiles nonvirtual members as direct branches with an implicit parameter, and sometimes as class-offset function pointers."
The first means the code is stinky but unlikely to break; the second is something to test after a compiler upgrade; the third requires immediate action even at high cost.
Clearly this is a latent bug waiting to happen, but right now I'm only concerned with mitigating risk on our specific compilers.
I would leave it alone. This might have been a deliberate choice as an old-fashioned version of the SafeNavigationOperator. As you say, in new code, I wouldn't recommend it, but for existing code, I'd leave it alone. If you do end up modifying it, I'd make sure that all calls to it are well-covered by tests.
Edit to add: you could choose to remove it only in debug versions of your code via:
CThingy *CLookupThingy::Lookup( name )
{
#if !defined(DEBUG)
if (!this)
{
return NULL;
}
#endif
// else do the lookup code...
}
Thus, it wouldn't break anything on production code, while giving you a chance to test it in debug mode.
Like all undefined behavior
if (!this)
{
return NULL;
}
this is a bomb waiting to go off. If it works with your current compilers, you are kind-of lucky, kind-of unlucky!
The next release of the same compilers might be more aggressive and see this as dead code. As this can never be null, the code can "safely" be removed.
I think it is better if you removed it!
If you have many GetLookup-style functions that return NULL, then you're better off fixing the code that calls methods through a NULL pointer. First, replace
if (!this) return NULL;
with
if (!this) {
// TODO(Crashworks): Replace this case with an assertion on July, 2012, once all callers are fixed.
printf("Please mail the following stack trace to myemailaddress. Thanks!");
print_stacktrace();
return NULL;
}
Now, carry on with your other work, but fix these as they roll in. Replace:
GetLookup(x)->Lookup(y)...
with
convert_to_proxy(GetLookup(x))->Lookup(y)...
where convert_to_proxy returns the pointer unchanged unless it's NULL, in which case it returns a FailedLookupObject as in my other answer.
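A minimal sketch of what convert_to_proxy could look like, assuming the Interface base class and the shared failed_lookup object from the FailedLookup answer further down:
// Pass real lookup objects through untouched; substitute the shared
// "failed lookup" proxy for NULL so the chained call stays safe.
Interface* convert_to_proxy(Interface* lookup) {
    return lookup ? lookup : &failed_lookup;
}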
It may not crash in most compilers since non-virtual functions are typically either inlined or translated into non-member functions taking "this" as a parameter. However, the standard specifically says that calling a non-static member function outside the lifetime of the object is undefined, and the lifetime of an object is defined as beginning when memory for the object has been allocated and the constructor has completed, if it has non-trivial initialization.
The standard only makes an exception to this rule for calls made by the object itself during construction or destruction, but even then one must be careful because the behavior of virtual calls can differ from the behavior during the object's lifetime.
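As a rough mental model only (not literal compiler output; names are taken from the question's code), this is why the call itself does not fault: the non-virtual member call behaves much like a free function that receives this as an ordinary parameter, so nothing is dereferenced until the body actually touches a member.
// Conceptually, p->Lookup("foo") is lowered to something like
//     CLookupThingy_Lookup(p, "foo");
// so a null p only becomes a problem once the body dereferences it.
CThingy* CLookupThingy_Lookup(CLookupThingy* this_, const char* name) {
    if (!this_)            // the hack's check is just an ordinary pointer test here
        return NULL;
    // ... the real lookup would dereference this_ from this point on ...
    return NULL;
}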
TL;DR: I'd kill it with fire, even if it will take a long time to clean up all the call sites.
Future versions of the compiler are likely to more aggressively optimize in cases of formally undefined behavior. I wouldn't worry about existing deployments (where you know the behavior the compiler actually implemented), but it should be fixed in the source code in case you ever use a different compiler or different version.
This is something that's called "a smart and ugly hack". Note: smart != wise.
Finding all the call sites without any refactoring tools should be easy enough: break GetLookup() somehow so it doesn't compile (e.g. change its signature) so that you can identify misuse statically. Then add a function called DoLookup() which does what all these hacks are doing right now.
In this case I'd suggest removing the NULL check from the member function and creating a non-member function:
CThingy* SafeLookup(CLookupThingy *lookupThingy, const char *name) {
    if (lookupThingy == NULL) {
        return NULL;
    } else {
        return lookupThingy->Lookup(name);
    }
}
Then it should be easy enough to find every call to the Lookup member function and replace it with the safe non-member function.
If it's something that's bothering you today, it'll bother you a year from now. As you pointed out, changing it will almost certainly introduce some bugs -- but you can begin by retaining the return-NULL functionality, adding a bit of logging, letting it run in the wild for a few weeks, and finding out how often it even gets hit.
This is a "ticking bomb" only if you are pedantic about the wording of the specification. However, regardless, it is a terrible, ill-advised approach because it obscures a program error. For that reason alone, I would remove it, even if it means considerable work. It is not an immediate (or even middle-term) risk, but it just isn't a good approach.
Such error-hiding behavior really isn't something you want to rely on, either. Imagine you rely on this behavior (i.e., it doesn't matter whether my objects are valid, it will work anyway!) and then, by some chance, the compiler optimizes out the if statement in a particular case because it can prove that this is not a null pointer. That is a legitimate optimization, not just for some hypothetical future compiler but for very real, present-day compilers as well.
But of course, since your program isn't well-formed, it may happen that at some point a null this gets passed in from 20 corners away. Bang, you're dead.
That's very contrived, admittedly, and it probably won't happen, but you cannot be 100% certain that it cannot possibly happen.
Note that when I shout out "remove!", that does not mean the whole lot of them must be removed immediately or in one massive manpower operation. You could remove these checks one by one as you encounter them (when you change something in the same file anyway, to avoid recompilations), or just text-search for one (preferably in a highly used function) and remove that one.
Since you are using GCC, you may be interested in __builtin_return_address, which may help you remove these checks without massive manpower, without totally disrupting the whole workflow, and without rendering the application entirely unusable.
Before removing a check, modify it to output the caller's address, and addr2line will tell you the location in your source. That way, you should be able to quickly identify all the locations in the application that are behaving wrongly, so you can fix them.
So instead of
if(!this) return 0;
change one location at a time to something like:
if(!this) { __builtin_printf("!!! %p\n", __builtin_return_address(0)); return 0; }
That lets you identify the invalid call sites for this change while still letting the program "work as intended" (you can also query the caller's caller if needed). Fix every ill-behaved location, one by one. The program will still "work" as normal.
Once no more addresses come up, remove the check altogether. You might still have to fix the odd crash if you are unlucky (because it didn't show up while you tested), but that should be a very rare occurrence. In any case, it should keep your co-workers from shouting at you.
Remove one or two checks per week, and eventually none will be left. Meanwhile, life goes on and nobody notices what you're doing at all.
TL;DR
As for "current versions of GCC", you are fine for non-virtual functions, but of course nobody can tell what a future version might do. I deem it however highly unlikely that a future version will cause your code to break. Not few existing projects have this kind of check (I remember we had literally hundreds of them in Code::Blocks code completion at some time, don't ask me why!). Compiler makers probably don't want to make dozens/hundreds of major project maintainers unhappy on purpose, only to prove a point.
Also, consider the last paragraph ("from a logical point of view"). Even if this check will crash and burn with a future compiler, it will crash and burn anyway.
The if(!this) return; statement is somewhat useless insofar as this cannot ever be a null pointer in a well-formed program (it means you called a member function on a null pointer). This does not mean, of course, that it couldn't possibly happen. But in this case, the program should crash hard or abort with an assertion. Under no conditions should such a program continue silently.
On the other hand, it is perfectly possible to call a member function on an invalid object that happens to be not null. Checking whether this is the null pointer obviously doesn't catch that case, but it is the exact same UB. So, apart from hiding wrong behavior, this check also only detects one half of the problematic cases.
If you are going by the wording of the specification, using this (which includes checking whether it's a null pointer) is undefined behavior. Insofar, strictly speaking, it is a "time bomb". However, I would not reasonably deem that a problem, both from a practical point of view and from a logical one.
From a practical point of view, it doesn't really matter whether you read a pointer that is not valid as long as you do not dereference it. Yes, strictly to the letter, this is not allowed. Yes, in theory someone might build a CPU which checks invalid pointers when you load them, and faults. Alas, in practice this isn't the case.
From a logical point of view, assuming that the check will blow up, it still isn't going to happen. For this statement to be executed, the member function must be called, and (virtual or not, inlined or not) it is using an invalid this, which it makes available inside the function body. If one illegitimate use of this blows up, the other will, too. Thus, the check is obsoleted because the program already crashes earlier.
n.b.: This check is very similar to the "safe delete idiom" which sets a pointer to nullptr after deleting it (using a macro or a templated safe_delete function). Presumably, this is "safe" because it doesn't crash deleting the same pointer twice. Some people even add a redundant if(!ptr) delete ptr;.
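For reference, the idiom being criticized usually looks something like this (the template is illustrative, not quoted from anywhere):
template <typename T>
void safe_delete(T*& ptr) {
    delete ptr;
    ptr = 0; // a second safe_delete(ptr) is now a silent no-op, hiding the double delete
}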
As you know, operator delete is guaranteed to be a no-op on a null pointer. Which means no more and no less than by setting a pointer to the null pointer, you have successfully eliminated the only chance to detect double deletion (which is a program error that needs to be fixed!). It is not any "safer", but it instead hides incorrect program behavior. If you delete an object twice, the program should crash hard.
You should either leave a deleted pointer alone, or, if you insist on tampering, set it to a non-null invalid pointer (such as the address of a special "invalid" global variable, or a magic value like -1 if you will -- but you should not try to cheat and hide the crash when it is to occur).
You can safely fix this today by returning a failed lookup object.
class CLookupThingy: public Interface {
    // ...
};
class CFailedLookupThingy: public Interface {
public:
CThingy* Lookup(string const& name) {
return NULL;
}
operator bool() const { return false; } // So that GetLookup() can be tested in a condition.
} failed_lookup;
Interface *GetLookup() {
if (notReady())
return &failed_lookup;
// else etc...
}
This code still works:
CThingy *pFoo = GetLookup()->Lookup( "foo" ); // will set pFoo to NULL without crashing
It's my personal opinion that you should fail as early as possible to alert you to problems. In that case, I'd unceremoniously remove each and every occurrence of if(!this) I could find.
This must be me doing something stupid, but has anyone seen this behaviour before:
I have a map in a class member defined like so:
std::map <const std::string, int> m_fCurveMap;
All behaves fine in debug, but it all goes wrong in release mode: the map gets initialised to some crazy number: m_fCurveMap [14757395258967641292]()
Any member I have after the map gets completely corrupted, i.e., if I put an int on the line after the map like this:
std::map <const std::string, int> m_fCurveMap;
int m_myIntThing;
and in my constructor set m_myIntThing to 0, then after the constructor has been called m_myIntThing is some crazy number. If I move m_myIntThing to the line above the map, everything is fine for m_myIntThing. This ends up causing big problems for me further down the line. Do I need to do something to the map in my constructor? I'm not doing anything at the moment.
I am using Visual Studio; this works fine with GCC. I only see the problem in release. The project is a DLL.
If you have seen this kind of madness before, please help; it's driving me mad. :-)
Many thanks,
Mark
This has happened to me lots of times. Although it's hard to say in your case, a very likely reason is that you have different versions of the C runtime library between different projects. Check the "code generation" tab in the compiler settings for your different projects and make certain they are the same.
What's effectively happening is that different versions of the C runtime libraries implement STL containers in different ways. Then, when the different projects try to talk to each other, the meaning of what a std::map is (for instance) has changed, and the two are no longer binary compatible.
The strange behavior is very likely some kind of heap corruption, or if it's being passed as a parameter to a function, stack corruption.
The problem is memory corruption of some kind.
A bug that I have seen often in C++ projects is using an object after it has been deleted.
Another possibility is a buffer overflow. It could be any object on the same stack or nearby on the heap.
A pretty good way to catch the culprit is to set a debugger breakpoint that fires on memory change. While the object is still good, set your breakpoint. Then wait until some code writes into that memory location. That should reveal your bug.
If you're getting your information from the VS debugger, I wouldn't trust what it is telling you for a Release DLL. The debugger can only be really trusted with Debug DLLs.
If program output is telling you this, then that's different -- in that case, you're not providing enough information.
Are you mixing a release DLL with a debug app?
Otherwise it sounds like memory corruption, although I can't say for sure.
Something else is stomping on memory
You're accessing deleted memory
You're returning a temporary by pointer or reference
etc
Any of these could appear to work fine in some cases as they're undefined behavior, and only in release mode do they blow up.
I had the exact same problem on g++; I resolved it by removing the pragmas in a #pragma pack section earlier in the code. Even though the code is correct, I wonder if this is a compiler bug on the platform, showing up when using std::map in some situations.
#pragma pack(push,1)
xxxx
#pragma(pop)
Just to give a concrete example for the memory corruption:
#include <map>
typedef std::map<int, int> mymap_t;
static mymap_t static_init() { return mymap_t(); }
class foo {
foo(): mymap(static_init()) {}
//!> d'oh, don't reference!
const mymap_t &mymap;
};
Accidentally, I defined a reference to the member type rather than the member variable itself. It gets initialized all right, but as soon as the temporary returned by static_init() goes away, the map is destroyed and the ref will just show up in debug as "std::map with 140737305218461 elements" (pretty-printed) or similar, as it points to now-unallocated memory (or worse).
Beware of accidental references!
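For completeness, the fix is simply to store the member by value rather than by reference (same illustrative names as above):
class foo {
    foo() : mymap(static_init()) {} // copies the temporary into the member
    mymap_t mymap;                  // value member: owns its own map
};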
I've got a strange problem here. Assume that I have a class with some virtual methods. Under certain circumstances an instance of this class should call one of those methods. Most of the time no problems occur at that stage, but sometimes it turns out that the virtual method cannot be called because the pointer to that method is NULL (as shown in VS), so a memory access violation occurs. How could that happen?
Application is pretty large and complicated, so I don't really know what low-level steps lead to this situation. Posting raw code wouldn't be useful.
UPD: OK, I see that my presentation of the problem is rather vague, so schematically the code looks like this:
void MyClass::FirstMethod() const { /* Do stuff */ }
void MyClass::SecondMethod() const
{
// This is where exception occurs,
// description of this method during runtime in VS looks like 0x000000
FirstMethod();
}
No constructors or destructors involved.
Heap corruption is a likely candidate. The v-table pointer in the object is vulnerable; it is usually the first field in the object. A buffer overflow in some other object that happens to be adjacent to this one will wipe out the v-table pointer. The call to a virtual method, often much later, will then blow up.
Another classic case is having a bad "this" pointer, usually NULL or a low value. That happens when the object reference on which you call the method is bad. The method will run as usual but blow up as soon as it tries to access a class member. Again, heap corruption or using a pointer that was deleted will cause this. Good luck debugging this; it is never easy.
Possibly you're calling the function (directly or indirectly) from a constructor of a base class which itself doesn't have that function.
Possibly there's a broken cast somewhere (such as a reinterpret_cast of a pointer when there's multiple inheritance involved) and you're looking at the vtable for the wrong class.
Possibly (but unlikely) you have somehow trashed the vtable.
Is the pointer to the function null just for this object, or for all other objects of the same type? If the former, then the vtable pointer is broken, and you're looking in the wrong place. If the latter, then the vtable itself is broken.
One scenario where this could happen is if you try to call a pure virtual method from a destructor or constructor. At that point the virtual table entry for the method may not be set up yet, causing a crash.
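A minimal, self-contained illustration of that scenario (class names are made up); most implementations abort with a "pure virtual method called" diagnostic, because during Base's construction the object's vtable is still Base's, whose entry for the pure virtual is not a real override:
struct Base {
    Base() { helper(); }         // runs while the dynamic type is still Base
    virtual void doWork() = 0;   // pure virtual: no Base implementation
    void helper() { doWork(); }  // virtual dispatch during construction
};

struct Derived : Base {
    void doWork() {}             // never reached from Base's constructor
};

int main() {
    Derived d;                   // typically aborts: "pure virtual method called"
    return 0;
}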
Is it possible the "this" pointer is getting deleted during SecondMethod's processing?
Another possibility is that SecondMethod is actually being called with an invalid pointer right up front, and that it just happens to work (by undefined behavior) up to the nested function call, which then fails. If you're able to add print code, check whether "this" and/or other pointers being used are something like 0xcdcdcdcd or 0xfdfdfdfd at various points during execution of those methods. Those values are (I believe) used by VS on memory alloc/dealloc, which may be why it works when compiled in debug mode.
What you are most likely seeing is a side-effect of the actual problem. Most likely heap or memory corruption, or referencing a previously freed object or null pointer.
If you can consistently get it to crash at the same place and can figure out where the null pointer is being loaded from, then I suggest using the debugger to put a "write" breakpoint on that memory location. Once the breakpoint is triggered, you are most likely looking at the code that actually caused the corruption.
If the memory access violation happens only when Visual Studio fails to show the method address, it could be caused by missing debug information. You are probably debugging code compiled with release (non-debug) compiler/linker flags.
Try enabling debug info in the project's C++ properties, rebuild, and restart the debugger. If that helps, you will see all the normal traceable things like the stack, variables, etc.
If your this pointer is NULL, corruption is unlikely. Unless you're zeroing memory you shouldn't have.
You didn't say whether you're debugging a Debug (not optimized) or Release (optimized) build. Typically, in a Release build the optimizer will eliminate the this pointer if it is not needed. So, if you're debugging an optimized build, seeing the this pointer as 0 doesn't mean anything; you have to rely on the disassembly to tell you what's going on. Try turning off optimization in your Release build if you cannot reproduce the problem in a Debug build. When debugging an optimized build, you're debugging assembly, not C++.
If you're already debugging a non-optimized build, make sure you have a clean rebuild before spending too much time debugging corrupted images. Debug builds are typically linked incrementally, and incremental linkers are known to produce problems like this. If you're running a Debug build from a clean rebuild and still can't figure out what went wrong, post the stack dump and more code; I'm sure we can help you figure it out.
I'm working on a GUI in SDL. I've created a slave/master class that contains a std::list of pointers to its own slaves to create a hierarchy in the GUI (a window contains buttons, a button contains a label, and so on). It worked fine for a good while, until I edited a completely different class that doesn't affect the slave/master class directly. The call to list.push_front() in the old slave/master class now throws the following error when debugging in VS C++ 2008 Express, and I can't find what's causing it.
"Unhandled exception at 0x00b6decd in
workbench.exe: 0xC0000005: Access
violation reading location
0x00000004."
*workbench.exe is my project.
The exception is raised in the _Insert method in the list code on row 718:
_Nodeptr _Newnode = _Buynode(_Pnode, _Prevnode(_Pnode), _Val);
The list is created in the master/slave class's definition, and the slave/master class is created on the heap to be inserted into another master's slave list. The list that crashes is empty when push_front() is called, but it is second in line in the hierarchy, so it has worked once before. As I said, it worked fine before, and the slave/master class hasn't been altered in a way that should cause the error.
The new class uses lists as well. Can the use of several lists cause clashes? Might I have accidentally screwed up the heap?
Any help and tips to what I could look for is appreciated.
P.S. The code is rather large now, so I guess it's better not to include it, especially since I'm not exactly sure what causes the error. Sorry if the details are a bit scarce.
Update: I've replaced the push_front() with creating an iterator and using insert(). The result was an iterator pointing to "baadf00d" after assigning list.begin(). As far as I can tell, baadf00d is an error/uninitialized marker value that VS uses for objects that haven't been assigned anything. I guess it's another sign that the list is corrupt?
Usually errors like this with addresses like 0x00000004 indicate dereferencing a NULL pointer, e.g.
#include <stdio.h>

struct point {
int x;
int y;
};
struct point *pt = NULL;
printf("%d\n", pt->y);
can create an error like that: pt->y reads from address NULL plus the offset of y inside the struct, which is 4 bytes here, hence the faulting address 0x00000004.
This doesn't smell like heap corruption to me; those errors usually tend to be subtler. I bet this is a case of a NULL pointer. I'd go up the call stack and hunt for null pointers; it could be a member of the object you're pushing onto the front of the list, or that object itself. If you do think this is a heap corruption issue, you can use gflags, which is free, to enable page heap and the like, which will let you detect heap corruption earlier, hopefully as it happens, rather than through the side effects it causes later.
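For reference, enabling full page heap for the asker's binary with gflags looks roughly like this (my recollection of the syntax; run from a Debugging Tools for Windows prompt):
gflags /p /enable workbench.exe /full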
Probably, you have caused some kind of buffer overrun or other memory corruption in your original code which did not manifest until now. There is no risk of conflict between different list instances, and as you say, the new code does not interact with the old code. Therefore barring magic, you have coded a bug.
Given the lack of code, the best I can do is give you some scenarios I can think of:
The most obvious, and hardest to find, is memory corruption. The list has been trampled on, so adding an item means the list manipulates garbage memory.
You can test this by moving variables around in their declaration, and by changing the order of assignment. If this makes the error go away or move, you are looking at a memory problem. You could also try changing the list to a vector, and see what that does.
A second possibility is that you have a list of pointers/references and the items are being deallocated before or after being put in the list. This can easily happen if you put the address of a stack object into a list allocated elsewhere. You say you created the object on the heap, so I guess it isn't this.
My guess, based on seeing the combination of _Prevnode and push_front, is that the list was corrupted earlier, possibly by misusing an iterator. Another way to corrupt a list is removing an element from an empty list. Make sure you have iterator debugging turned on in VS2008; it will catch many problems a lot earlier (see the sketch below).
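A contrived sketch of the two misuse patterns mentioned above; with iterator debugging on (the default for VS2008 debug builds), both are reported at the point of misuse instead of silently corrupting the list:
#include <list>

int main() {
    std::list<int> l;
    l.pop_front();                          // undefined: removing from an empty list;
                                            // the debug runtime asserts here
    l.push_back(1);
    std::list<int>::iterator it = l.begin();
    l.erase(it);
    int value = *it;                        // undefined: 'it' was invalidated by erase;
                                            // also caught by iterator debugging
    (void)value;
    return 0;
}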
Finally after looking through every nook and cranny I've found the bug! It was completely unexpected and I feel a bit embarrassed about it.
I had recently rearranged the files. Prior to that I had generic classes in one folder and my user interface files in a subfolder. I copied the GUI files to the main folder and thought I had linked everything up correctly, but obviously I missed one line, and it never occurred to me when things started acting up. My library compiled fine since it was linked correctly, but my testing program wasn't: it was simply looking at the old header files! It worked fine to begin with, of course, since the headers were the same, but then I edited one and the class declared in it started acting funny as described, since the program no longer had a matching definition of it. It looked like corrupted memory, so that's what I looked for.
Lesson learned: Don't keep two versions close to each other or at all.