C++ rare runtime error

I have a class B that inherits from class A, which has some virtual functions. Class B also has a virtual function (foo) that seems to have no address. When I walk through the code with the debugger, it shows that foo has the address 0x00000000, and when I try to step into it, it fails with an access violation at 0x00000005. If I make that function non-virtual, the debugger steps in and everything works until I reach a std::vector. There, when I call push_back, it fails with the same access violation at address 0x00000005 while writing some data at address 0xabababab, and the call stack points to a mutex lock in the insert function.
Note: I'm not using any other threads, and the incremental linker crashes every time I compile. Only the full linker successfully creates the exe. The compiler is from Visual Studio 2008 Pro, and this problem started to occur when I stripped out unused source files and source code.
Unfortunately I was unable to revert to the previous state in order to spot the change that caused this.
How can I detect the source of the problem without reverting the entire project? Also, has anyone encountered this kind of error? It might have the same cause.

You guess that the virtual table is broken, but that's unlikely, because vtables are usually stored in read-only memory.
I can think of two reasons for this behavior:
The object you are using has been deleted. It may work by chance while the memory where the object used to be is still intact, but fail miserably once it gets overwritten.
The object you are using is not of dynamic type B. Maybe it is of type A or maybe of an unrelated type.
I have successfully tracked down this kind of issue with printf debugging: add a few lines with printf("XXX %p", this); in the constructor of B, the destructor, the virtual functions and the failing function, and you'll be able to deduce what is happening.
Yes, I know, printf debugging is not cool...
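A minimal sketch of that tracing, assuming simplified stand-ins for A and B (the real classes are not shown in the question). The trace() helper and the counter are hypothetical additions so the output can be checked; the idea is only that every constructor, destructor and virtual function prints its own this.

```cpp
#include <cassert>
#include <cstdio>

static int g_trace_count = 0;  // number of trace lines printed so far

// Print a tag and the object address, in the spirit of printf("XXX %p", this).
static void trace(const char* where, const void* self) {
    std::printf("XXX %-8s this=%p\n", where, self);
    ++g_trace_count;
}

// Hypothetical stand-ins for the question's classes.
struct A {
    A()                { trace("A::A", this); }
    virtual ~A()       { trace("A::~A", this); }
    virtual void foo() { trace("A::foo", this); }
};

struct B : A {
    B()                { trace("B::B", this); }
    virtual ~B()       { trace("B::~B", this); }
    virtual void foo() { trace("B::foo", this); }
};

// Build a B, call foo() through an A*, destroy it.
// Returns how many trace lines this sequence produced.
static int exercise() {
    const int before = g_trace_count;
    A* p = new B;
    p->foo();
    delete p;
    return g_trace_count - before;
}
```

In a healthy run every line shows the same address and the order is A::A, B::B, B::foo, B::~B, A::~A. A foo call on an address that never appeared in a constructor line, or that already appeared in a destructor line, points at one of the two causes listed above.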

You are calling a virtual function on a null pointer. The compiler adds code that uses a hidden pointer in the object to locate the final overrider, and that operation is failing. When you change the function to non-virtual, the call is dispatched statically, but access to members then fails because the this pointer is null.
You should check the validity of the object on which you are calling the method in your code.
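As a small illustration of such a check (Shape, Triangle and sidesOrMinusOne are made-up names, not from the question), the caller tests the pointer before any virtual dispatch happens:

```cpp
#include <cassert>

// Hypothetical polymorphic type, used only for illustration.
struct Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    virtual int sides() const { return 3; }
};

// Returns -1 for a null pointer instead of attempting virtual dispatch on it.
int sidesOrMinusOne(const Shape* s) {
    return s ? s->sides() : -1;
}
```

This only catches null pointers. A dangling or uninitialized pointer passes the test and still crashes, so the check narrows the problem down rather than fixing it.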

Related

Segfault calling virtual method on initialized object

I'm getting a seg fault that I do not understand. I'm using the Wt library and doing some fancy things with signals (which I only mention because it has enabled me to attempt to debug this).
I'm getting a pointer to one of my widgets from a vector and trying to call a method on the object it points to. Gdb shows that the pointer resolves, and if I examine the object it points to, it is exactly the one I need to modify. In this instance, the widget is broadcasting to itself, so it is registered as both the broadcaster and the listener; therefore, I was also able to verify that the 'broadcaster' pointer and the 'listener' pointer are accessing the same object. They do!
However, even though I can see that the object exists, and is initialized, and is in fact the correct object, when I try to call a method on the object, I get an immediate seg fault. I've tried a few different methods (including a few boolean returns that don't modify the object). I've tried calling them through the broadcaster pointer and the listener pointer, again, just to try to debug.
The debugger doesn't even enter the object; the segfault occurs immediately on attempting to call a method.
Code!
/* listeners is a vector of pointers to widgets to whom the broadcasting widget
 * is trying to signal.
 */
unsigned int num_listeners = listeners.size();
for (unsigned int w = 0; w < num_listeners; w++)
{
    // Moldable is an abstraction of another widget type
    Moldable* widget = listeners.at(w);
    /* Because in this case, the broadcaster and the listener are one and the same,
     * these two point to the same location in memory; this part works. I know, therefore,
     * that the object has been instantiated, exists, and is happy, or we wouldn't
     * have gotten to this point to begin with. I can also examine the fields with gdb
     * and can verify that all of this is correct.
     */
    Moldable* broadcaster_debug = broadcast->getBroadcaster();
    /* setStyle is a method I created, and have tested in other instances and it
     * works just fine; I've also used native Wt methods for testing this problem and
     * they are also met with segfaults.
     */
    widget->setStyle(new_style); // segfault goes here!
}
I have read since researching that storing pointers in vectors is not the greatest idea and I should look into boost::shared_ptr. That may be so, and I will look into it, but it doesn't explain why calling a method on an object known to exist causes a segfault. I'd like to understand why this is happening.
Thanks for any assistance.
Edit:
I have created a gist with the vector operations detailed because it was more code than would comfortably fit in the post.
https://gist.github.com/3111137
I have not shown the code where the widgets are created because it's a recursive algorithm and in order to do that, I would have to show the entire class decision tree for creating widgets. Suffice to say that the widgets are being created; I can see them on the page when viewing the application in a browser. Everything works fine until I start playing with my fancy signals.
Moar Edit:
When I take a look at the disassembly in instruction stepping mode, I can see that just before the segfault occurs, the following operation takes place, the first argument of which is listed as 'void'. Admittedly, I know nothing about Assembly much to my chagrin, but this seems to be important. Can anyone explain what this instruction means and whether it might be the cause of my woes?
add $0x378,%rax //$0x378 is listed as 'void'
Another Edit:
At someone's suggestion, I created a non-virtual method that I am able to successfully call just before the seg fault, meaning the object is in fact there. If I take the same method and make it virtual, the seg fault occurs. So, why do only virtual methods create a seg fault?
I've discovered now that if in the calling class, I make sure to specify Moldable::debug_test (and Moldable::setStyle), the seg fault does not take place. However, this seems to have a similar effect as const bubbling -- every virtual method seems to want this specifier. I've never witnessed this behaviour before. While I'm willing to correct my code if that's REALLY how it's supposed to be, I'm not sure if the root problem is something else.
Getting there!
Well, I figured out the problem, though I'm sad to say it was a totally newbish mistake that due to the nature of the project was super difficult to find. I'll put the answer here, and I've also voted to close the question as too localized. Please feel free to do the same.
The BroadcastMessage class had a __broadcaster field (Moldable* __broadcaster;). When passing in the pointer to the broadcaster into the BroadcastMessage constructor, I forgot to assign the inbound pointer to that field, meaning __broadcaster was not a fully realised instance of the Moldable class.
Therefore, some methods were in fact working -- those that could be inlined, or my dummy functions that I created for testing (one of which returned a value of 1, for instance), so it was appearing that there was a full object there when in fact there was not. It wasn't until calling a more specialized method that tried to access some specific, dynamic property of the object that the segfault occurred.
What's more, most of the broadcast message lifespan was in its constructor, meaning that most of its purpose was fulfilled without issue, because the broadcaster was available in the local scope of the constructor.
However, using Valgrind as suggested, I did uncover some other potential issues. I also pretty much stripped-down and re-built the entire project. I trashed tons of unnecessary code and it runs a lot faster now as a side effect.
Anyway, thanks for all the assistance. Sorry the solution wasn't more of a discovery.
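For anyone landing here with the same symptom, a minimal sketch of the fix, assuming a much-reduced Moldable and BroadcastMessage (the real classes are larger and not shown). The member is named broadcaster_ here rather than __broadcaster, since identifiers containing a double underscore are reserved in C++.

```cpp
#include <cassert>

// Reduced stand-in for the real widget class.
struct Moldable {
    virtual ~Moldable() {}
    virtual int id() const { return 42; }
};

class BroadcastMessage {
public:
    // The initializer list stores the inbound pointer in the member.
    // Leaving it out would leave broadcaster_ with an indeterminate value,
    // which is the mistake described above.
    explicit BroadcastMessage(Moldable* broadcaster)
        : broadcaster_(broadcaster) {}

    Moldable* getBroadcaster() const { return broadcaster_; }

private:
    Moldable* broadcaster_;
};
```

With the member initialized, a virtual call through getBroadcaster() reads the vtable pointer from a real object instead of from whatever happened to be in memory.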

new object causes corruption on the heap

I've been struggling with a heap corruption problem for a few days. I was first warned by the vs 2005 debugger that I may have corrupted the heap, after deleting an object I had previously new'ed. Doing research on this problem led me to gflags and the page heap setting. After enabling this setting for my particular image, it supposedly pointed me to the line that is actually causing the corruption.
Gflags identified the constructor for the object in question as the culprit. The object derives as follows:
class POPUPS_EXPORT MLUNumber : public MLUBase
{
    ...
};
class POPUPS_EXPORT MLUBase : public BusinessLogicUnit
{
    ...
};
I can instantiate an MLUNumber in a separate thread, and no heap corruption occurs.
I can instantiate a different class, that also inherits from MLUBase, that does not cause heap corruption.
The access violation caused by the corruption occurs on the opening brace of the constructor, which appears to be due to the implicit initialization of the object (?).
The base class constructor (MLUBase) successfully finishes.
From digging with the memory window in vs 2005, it appears that there was not enough space allocated for the actual object. My guess is that enough was allocated for the base class only.
The line causing the fault:
BusinessLogicUnit* biz = new MLUNumber();
I'm hoping for either a reason that might cause this, or another troubleshooting step to follow.
Unfortunately, with the information given, it's not possible to definitively diagnose the problem.
Some things you may want to check:
Make sure BusinessLogicUnit has a virtual destructor. When deleting objects through a base pointer, a virtual destructor must be present in the base class for the subclass to be properly destructed.
Make sure you're building all source files with the same preprocessor flags and compiler options. A difference in flags (perhaps between debug/release flags?) could result in a change in structure size, and thus an inconsistency between sizes reported in different source files.
It's possible for some types of heap corruption to go undetected, even with your gflags settings. Audit your other heap uses to try to find the source of your issues as well. Ideally you should put together a minimal test case that will reliably crash, but with a minimum amount of activity, so you can narrow down the cause.
Try a clean solution and rebuild; I've occasionally seen timestamps getting screwed up, and an old object file can get in with an out-of-date structure definition. Worth checking at least :)
BusinessLogicUnit* biz = new MLUNumber();
How do you delete the memory? Using the base-class pointer? Have you made the destructor of BusinessLogicUnit virtual? It must be virtual.
class BusinessLogicUnit
{
public:
    //..
    virtual ~BusinessLogicUnit(); // it must be virtual!
};
Otherwise deleting the derived class object through the base-class pointer invokes undefined behavior as per the C++ Standard.
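A small self-contained sketch of the well-defined case, using simplified stand-ins for the two classes (no POPUPS_EXPORT, no MLUBase in between; the counter is a hypothetical addition to make the effect visible). With the virtual destructor in place, delete through the base pointer runs the derived destructor first:

```cpp
#include <cassert>

static int g_derived_dtor_calls = 0;  // counts runs of ~MLUNumber

// Simplified stand-ins for the classes in the question.
struct BusinessLogicUnit {
    virtual ~BusinessLogicUnit() {}   // virtual, as required
};

struct MLUNumber : BusinessLogicUnit {
    ~MLUNumber() { ++g_derived_dtor_calls; }
};

// Allocates a derived object, deletes it through the base pointer,
// and returns how many times the derived destructor ran.
int destroyThroughBase() {
    const int before = g_derived_dtor_calls;
    BusinessLogicUnit* biz = new MLUNumber();
    delete biz;
    return g_derived_dtor_calls - before;
}
```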
BusinessLogicUnit is not an MLUNumber. Why would you allocate this way? Instead
BusinessLogicUnit* biz = new BusinessLogicUnit();
Or maybe you do something like this?
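struct A
{
    SomeType & m_param;
    A(SomeType & param) : m_param(param)
    {
        ...use m_param here...
    }
};
A a((SomeType())); // passing a temporary by reference; the extra parentheses stop this from being parsed as a function declaration
Then any later use of a.m_param is undefined behaviour, because the referenced temporary dies at the end of the statement that constructs a. (Binding a temporary to a non-const reference compiles only as a Visual C++ language extension.)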
I agree with bdonlan that there isn't enough information yet to figure out what's wrong. There are a lot of good suggestions here, but just guessing possible reasons why an application is crashing is not a smart way to root cause an issue.
You've done the right thing by enabling instrumentation (pageheap) to help you narrow down the problem. I would continue down this path by finding out exactly which memory address is causing the access violation (and where the address came from).

VSC++, virtual method at bad address, curious bug

This guy:
virtual phTreeClass* GetTreeClass() const { return (phTreeClass*)m_entity_class; }
When called, crashed the program with an access violation, even after a full recompile. All member functions and virtual member functions had correct memory addresses (I hovered mouse over the methods in debug mode), but this function had a bad memory address: 0xfffffffc.
Everything looked okay: the 'this' pointer, and everything works fine up until this function call. This function is also pretty old and I didn't change it for a long time. The problem just suddenly popped up after some work, which I commented all out to see what was doing it, without any success.
So I removed the virtual, compiled, and it worked fine. I added virtual back, compiled, and it still worked fine! I basically changed nothing, and remember that I did do a full recompile earlier and still had the error back then.
I wasn't able to reproduce the problem. But now it is back. I didn't change anything. Removing virtual fixes the problem.
Don't ever use C-style casts with polymorphic types unless you're seriously sure of what you're doing. The overwhelming probability is that you cast it to a type that it wasn't. If your pointers don't implicitly cast (because they cast to a base class, which is safe) then you're doing it wrong.
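To illustrate the difference, here is a sketch with made-up types (Entity, TreeClass and RockClass are hypothetical; the real type of m_entity_class is not shown in the question). dynamic_cast checks the dynamic type at run time and yields a null pointer on a mismatch, where a C-style cast would silently hand back a pointer to the wrong kind of object:

```cpp
#include <cassert>

// Hypothetical hierarchy, for illustration only.
struct Entity {
    virtual ~Entity() {}
};

struct TreeClass : Entity {
    int leaves() const { return 7; }
};

struct RockClass : Entity {};

// Returns the leaf count, or -1 when e does not really point to a TreeClass.
int leavesOrMinusOne(Entity* e) {
    TreeClass* t = dynamic_cast<TreeClass*>(e);  // null if the dynamic type differs
    return t ? t->leaves() : -1;
}
```

Temporarily swapping the C-style cast in GetTreeClass() for a dynamic_cast plus a null check is a cheap way to find out whether the object is the type the code assumes.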
Compilers and linkers are pieces of software written by humans like any other, and thus cannot be guaranteed error-free.
We occasionally run into such inexplicable issues and fixes too. There's a myth going around here that deleting the ncb file once fixed a build.
Given that recompiling originally fixed the problem, try doing a full clean and rebuild first.
If that fails, then it looks extremely likely that even though your this pointer appears correct to you, the object has in fact been deleted/destroyed and the pointer refers to garbage memory that just happens to look like the real object that was there before. If you're using gdb to debug, the first word at the object's address will be the vtable pointer. If you do a memory dump at that location, with x/16xw <addr> for example, you can work out which type's vtable resides there (the a format, as in x/a <addr>, prints a word as an address together with the nearest symbol). If it's the parent-most type then the object is definitely gone.
Alternatively, if the this pointer is the same every time, you can put a breakpoint in the class destructor with the condition that this == known_addr.

Access violation exception when calling a method

I've got a strange problem here. Assume that I have a class with some virtual methods. Under certain circumstances an instance of this class should call one of those methods. Most of the time no problems occur at that stage, but sometimes it turns out that the virtual method cannot be called, because the pointer to that method is NULL (as shown in VS), so a memory access violation exception occurs. How could that happen?
Application is pretty large and complicated, so I don't really know what low-level steps lead to this situation. Posting raw code wouldn't be useful.
UPD: Ok, I see that my presentation of the problem is rather indefinite, so schematically code looks like
void MyClass::FirstMethod() const { /* Do stuff */ }
void MyClass::SecondMethod() const
{
    // This is where exception occurs,
    // description of this method during runtime in VS looks like 0x000000
    FirstMethod();
}
No constructors or destructors involved.
Heap corruption is a likely candidate. The v-table pointer in the object is vulnerable; it is usually the first field in the object. A buffer overflow in some other object that happens to be adjacent will wipe the v-table pointer. The call to a virtual method, often much later, will then blow up.
Another classic case is having a bad "this" pointer, usually NULL or a low value. That happens when the object reference on which you call the method is bad. The method will run as usual but blow up as soon as it tries to access a class member. Again, heap corruption or using a pointer that was deleted will cause this. Good luck debugging this; it is never easy.
Possibly you're calling the function (directly or indirectly) from a constructor of a base class which itself doesn't have that function.
Possibly there's a broken cast somewhere (such as a reinterpret_cast of a pointer when there's multiple inheritance involved) and you're looking at the vtable for the wrong class.
Possibly (but unlikely) you have somehow trashed the vtable.
Is the pointer to the function null just for this object, or for all other objects of the same type? If the former, then the vtable pointer is broken, and you're looking in the wrong place. If the latter, then the vtable itself is broken.
One scenario this could happen in is if you tried to call a pure virtual method in a destructor or constructor. At this point the virtual table pointer for the method may not be initialized causing a crash.
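A sketch of the constructor case with a non-pure virtual, so that the behavior is well defined and can be observed (Base, Derived and seen_in_ctor are made-up names). While the Base constructor runs, the object's dynamic type is still Base, so the call does not reach the Derived override. With a pure virtual in the same position there is no implementation to reach, and the call is undefined behavior.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_ctor;  // records which override the constructor reached

    Base() { seen_in_ctor = name(); }  // virtual call inside a constructor
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    virtual std::string name() const { return "Derived"; }
};
```

After construction, a Derived object reports "Base" in seen_in_ctor while name() called on the finished object returns "Derived".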
Is it possible the "this" pointer is getting deleted during SecondMethod's processing?
Another possibility is that SecondMethod is actually being called with an invalid pointer right up front, and that it just happens to work (by undefined behavior) up to the nested function call which then fails. If you're able to add print code, check to see if "this" and/or other pointers being used is something like 0xcdcdcdcd or 0xfdfdfdfd at various points during execution of those methods. Those values are (I believe) used by VS on memory alloc/dealloc, which may be why it works when compiled in debug mode.
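A sketch of such a check, assuming the fill patterns I remember the Microsoft debug runtime and Windows heap using (only 0xcdcdcdcd and 0xfdfdfdfd come from the answer above; the rest of the table and the function name looksLikeDebugFill are my additions, so verify them against your toolchain). It compares only the low 32 bits, so it also works in a 64-bit build where the pattern is repeated:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when the low 32 bits of p match a well-known debug fill pattern.
bool looksLikeDebugFill(const void* p) {
    static const std::uint32_t kPatterns[] = {
        0xCDCDCDCDu,  // debug CRT: allocated but uninitialized heap memory
        0xDDDDDDDDu,  // debug CRT: freed heap memory
        0xFDFDFDFDu,  // debug CRT: guard bytes around a heap block
        0xCCCCCCCCu,  // debug builds: uninitialized stack memory
        0xABABABABu,  // Win32 heap: guard bytes after an allocated block
        0xFEEEFEEEu   // Win32 heap: memory released by HeapFree
    };
    const std::uint32_t low =
        static_cast<std::uint32_t>(reinterpret_cast<std::uintptr_t>(p));
    for (std::size_t i = 0; i < sizeof(kPatterns) / sizeof(kPatterns[0]); ++i) {
        if (low == kPatterns[i]) return true;
    }
    return false;
}
```

Logging looksLikeDebugFill(this) at the top of SecondMethod would show whether the method was entered through a pointer read from uninitialized or freed memory.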
What you are most likely seeing is a side-effect of the actual problem. Most likely heap or memory corruption, or referencing a previously freed object or null pointer.
If you can consistently make it crash at the same place and can figure out where the null pointer is being loaded from, then I suggest using the debugger to put a breakpoint on 'write' at that memory location. Once the breakpoint is triggered, you are most likely looking at the code that actually caused the corruption.
If memory access violation happens only when Studio fails to show method address, then it could be caused by missing debug information. You probably are debugging the code compiled with release (non-debug) compiler/linker flags.
Try enabling debug info in the C++ properties of the project, then rebuild and restart the debugger. If that helps, you will see all the normal traceable things like the stack, variables, etc.
If your this pointer is NULL, corruption is unlikely. Unless you're zeroing memory you shouldn't have.
You didn't say whether you're debugging a Debug (not optimized) or Release (optimized) build. Typically, in a Release build the optimizer will remove the this pointer if it is not needed. So, if you're debugging an optimized build, seeing the this pointer as 0 doesn't mean anything. You have to rely on the disassembly to tell you what's going on. Try turning off optimization in your Release build if you cannot reproduce the problem in a Debug build. When debugging an optimized build, you're debugging assembly, not C++.
If you're already debugging a non-optimized build, make sure you do a clean rebuild before spending too much time debugging corrupted images. Debug builds are typically linked incrementally, and incremental linkers are known to produce problems like this. If you're running a Debug build after a clean rebuild and still can't figure out what went wrong, post the stack dump and more code. I'm sure we can help you figure it out.

Using shared_ptr in dll-interfaces

I have an abstract class in my dll.
class IBase {
protected:
    virtual ~IBase() = 0;
public:
    virtual void f() = 0;
};
I want to get IBase in my exe-file which loads dll.
First way is to create following function
IBase * CreateInterface();
and to add the virtual function Release() in IBase.
Second way is to create another function
boost::shared_ptr<IBase> CreateInterface();
and no Release() function is needed.
Questions.
1) Is it true that the destructor and memory deallocation are called in the dll (not in the exe-file) in the second case?
2) Does the second case work well if the exe-file and the dll were compiled with different compilers (or different settings)?
An answer to your first question: the virtual destructor in your dll is called; the information about its location is embedded in your object (in the vtable). In the case of memory deallocation, it depends on how disciplined the users of your IBase are. If they know they have to call Release() and keep in mind that an exception can divert the control flow in a surprising direction, the right one will be used.
But if CreateInterface() returns shared_ptr<IBase> it can bind the right deallocation function right to this smart pointer. Your library may look like this:
void Destroy(IBase* p) {
    ... // whatever is needed to delete your object in the right way
}
boost::shared_ptr<IBase> CreateInterface() {
    IBase *p = new MyConcreteBase(...);
    ...
    // bind Destroy() to the shared_ptr; it is called instead of a plain delete
    return boost::shared_ptr<IBase>(p, Destroy);
}
Thus every user of your DLL is easily protected against resource leaks. They never have to bother about calling Release() or pay attention to exceptions unexpectedly bypassing their control flow.
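A compilable sketch of the same idea, with two liberties: it uses std::shared_ptr (C++11) instead of boost::shared_ptr, and everything sits in one translation unit, so it only demonstrates that the bound deleter runs, not the DLL boundary itself. MyConcreteBase's body and the g_destroy_calls counter are hypothetical, and the destructor is given an inline body instead of being pure.

```cpp
#include <cassert>
#include <memory>

static int g_destroy_calls = 0;  // lets a caller observe that Destroy ran

class IBase {
public:
    virtual void f() = 0;
protected:
    virtual ~IBase() {}  // protected: clients cannot write `delete p`
};

// Hypothetical concrete class that would live inside the library.
class MyConcreteBase : public IBase {
public:
    virtual void f() {}
};

// Library-side deleter. In a real DLL this is where the matching
// deallocation function gets used.
void Destroy(IBase* p) {
    delete static_cast<MyConcreteBase*>(p);
    ++g_destroy_calls;
}

// The deleter is bound when the shared_ptr is created, so it travels
// with the pointer wherever the client copies it.
std::shared_ptr<IBase> CreateInterface() {
    return std::shared_ptr<IBase>(new MyConcreteBase, Destroy);
}
```

When the last copy of the returned pointer goes out of scope on the client side, Destroy is what runs, not a delete expression compiled into the client.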
To answer your second question: The downside of this approach is clearly stated by the other answers: your audience has to use the same compiler, linker, settings and libraries as you. If that audience is large, this can be a major drawback for your library. You have to choose: safety vs. a larger audience.
But there's a possible loophole: use shared_ptr<IBase> in your application, i.e.
{
    shared_ptr<IBase> p(CreateInterface(), DestroyFromLibrary);
    ...
    func();
    ...
}
Thus no implementation-specific object is passed across the DLL boundary. Nevertheless your pointer is safely hidden behind the shared_ptr, which calls DestroyFromLibrary at the right time whether or not func() throws an exception.
I would advise against using shared_ptr in the interface. Even using C++ at all in the interface of a DLL (as opposed to extern "C" only routines) is problematic because name-mangling will prevent you from using the DLL with a different compiler. Using shared_ptr is particularly problematic because, as you have already identified, there is no guarantee that the client of the DLL will use the same implementation of shared_ptr as the DLL itself. (This is because shared_ptr is a class template and the implementation is contained entirely in the header file.)
To answer your specific questions:
1) I'm not quite sure what you are asking here... I'm assuming that your DLL will contain implementations of classes derived from IBase. The code for their destructors (as well as the rest of the code) will, in both of your cases, be contained in the DLL. However, if the client initiates the destruction of the object (by calling delete in the first case or by letting the last instance of the shared_ptr go out of scope in the second case), then the destructor will be called from client code.
2) Name-mangling will usually prevent your DLL from being used with a different compiler anyway... but the implementation of shared_ptr may change even in a new release of the same compiler, and that can get you into trouble. I would shy away from using the second option.
Using shared_ptr will make sure the resource releasing function will be called in the DLL.
Have a look at the answers to this question.
A way out of this problem is to create a pure C interface and a thin fully inlined C++ wrapper around it.
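A rough sketch of what that could look like. All names (BaseHandle, base_create, base_f, base_destroy, the Base wrapper) are invented for illustration, and the three parts that would normally live in a shared header, the DLL and the client are shown in one file so that it compiles on its own:

```cpp
#include <cassert>

// ---- C interface the DLL would export (shared header) ----
extern "C" {
    typedef struct BaseHandle BaseHandle;  // opaque to clients
    BaseHandle* base_create(void);
    int         base_f(BaseHandle* h);
    void        base_destroy(BaseHandle* h);
}

// ---- implementation that would be compiled into the DLL ----
struct BaseHandle {
    int calls;  // example state: how often base_f was called
};

extern "C" BaseHandle* base_create(void) {
    BaseHandle* h = new BaseHandle;
    h->calls = 0;
    return h;
}

extern "C" int base_f(BaseHandle* h) { return ++h->calls; }

extern "C" void base_destroy(BaseHandle* h) { delete h; }

// ---- thin, fully inline C++ wrapper compiled into the client ----
class Base {
public:
    Base() : h_(base_create()) {}
    ~Base() { base_destroy(h_); }  // memory is released on the DLL side
    int f() { return base_f(h_); }

private:
    Base(const Base&);             // non-copyable (declared, not defined)
    Base& operator=(const Base&);
    BaseHandle* h_;
};
```

Only plain functions and an opaque pointer cross the boundary, so neither name-mangling nor the layout of any class or smart pointer has to match between the two sides. Allocation and deallocation both happen inside the DLL.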
On your first question: I'm taking an educated guess and not speaking from experience, but it seems to me that in the second case the memory deallocation will be called "in the .exe". Two things happen when you call delete object;: first, destructors are called, and second, the memory for the object is freed. The first part, destructor calling, will definitely work as you expect, calling the right destructors in your dll. However, since shared_ptr is a class template, its destructor is generated in your .exe, and therefore it will call operator delete() in your exe and not the one in the .dll. If the two were linked against different runtime versions (or even statically linked against the same runtime version), this should lead to the dreaded undefined behavior (this is the part I'm not entirely sure about, but it seems logical). There's a simple way to verify whether what I said is true: override the global operator delete in your exe, but not in your dll, put a breakpoint in it and see what's called in the second case (I'd do that myself, but I only have so much time for slacking off, unfortunately).
Note that the same gotcha exists for the first case (you seem to realize that, but just in case). If you do this in the exe:
IBase *p = CreateInterface();
delete p;
then you are in the same trap: calling operator new in the dll and calling operator delete in the exe. You'll either need a corresponding DeleteInterface(IBase *p) function in your dll or a Release() method in IBase (which doesn't have to be virtual, just don't make it inline) for the sole purpose of calling the right memory deallocation function.
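A sketch of the Release() variant, again in a single file for the sake of a compilable example. Impl and the g_destroyed counter are hypothetical, and the destructor has an inline body here instead of being pure.

```cpp
#include <cassert>

static int g_destroyed = 0;  // counts destructor runs so the effect is observable

class IBase {
public:
    virtual void f() = 0;
    void Release();  // declared here, defined out of line below
protected:
    virtual ~IBase() { ++g_destroyed; }
};

// In a real project this definition would sit in a .cpp file inside the DLL,
// so that the delete expression is compiled as part of the DLL.
void IBase::Release() { delete this; }

// Hypothetical implementation class inside the DLL.
class Impl : public IBase {
public:
    virtual void f() {}
};

IBase* CreateInterface() { return new Impl; }
```

The client then writes p->Release() instead of delete p, and the protected destructor makes a stray delete p in client code a compile error.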