Virtual methods chaos, how can I find what causes that? - c++

A colleague of mine had a problem with some C++ code today. He was debugging the weird behaviour of an object's virtual method. Whenever the method executed ( under debug, Visual Studio 2005 ), everything went wrong, and the debugger wouldn't step in that method, but in the object's destructor! Also, the virtual table of the object, only listed it's destructor, no other methods.
I haven't seen this behaviour before, and a runtime error was printed, saying something about the ESP register. I wish I could give you the right error message, but I don't remember it correctly now.
Anyway, have any of you guys ever encountered that? What could cause such behaviour? How would that be fixed? We tried to rebuild the project many times, restarted the IDE, nothing helped. We also used the _CrtCheckMemory function before that method call to make sure the memory was in a good state, and it returned true ( which means ok ) . I have no more ideas. Do you?

I've seen that before. Generally it occurs because I'm using a class from a Release built .LIB file while I'm in Debug mode. Someone else probably has seen a better example and I'd yield my answer to their answer.

Maybe you use C-style casts where a static_cast<> is required? This may result in the kind of error you report, whenever multiple inheritance is involved, e.g.:
class Base1 {};
class Base2 {};
class Derived : public Base1, public Base2 {};
Derived *d = new Derived;
Base2* b2_1 = (Base2*)d; // wrong!
Base2* b2_2 = static_cast<Base2*>(d); // correct
assert( b2_1 == b2_2 ); // assertion may fail, because b2_1 != b2_2
Note, this may not always be the case, this depends on the compiler and on declarations of all the classes involved (it probably happens when all classes have virtual methods, but I do not have exact rules at hand).
OR: A completely different part of your code is going wild and is overwriting memory. Try to isolate the error and check if it still occurs. CrtCheckMemory will find only a few cases where you overwrite memory (e.g. when you write into specially marked heap management locations).

If you ever call a function with the wrong number of parameters, this can easily end up trashing your stack and producing undefined behaviour. I seem to recall that certain errors when using MFC could easily cause this, for example if you use the dispatch macros to point a message at a method that doesn't have the right number or type of parameters (I seem to recall that those function pointers aren't strongly checked for type). It's been probably a decade since I last encountered that particular problem, so my memory is hazy.

The value of ESP was not properly saved across a function call.
This sort of behaviour is usually indicative of the calling code having been compiled with a different definition of a class or function than the code that created the particular class in question.
Is it possible that there is an different version of a component dll that is being loaded instead of the freshly built one? This can happen if you copy things as part of a post-build step or if the process is run from a different directory or changes its dll search path before doing a LoadLibrary or equivalent.
I've encountered it most often in complex projects where a class definition is changed to add, remove or change the signature of a virtual function and then an incremental build is done and not all the code that needs to be recompiled is actually recompiled. Theoretically, it could happen if some part of the program is overwriting the vptr or vtables of some polymorhpic objects but I've always found that a bad partial build is a much more likely cause.
This may 'user error', a developer deliberately tells the compiler to only build one project when others should be rebuilt, or it can be having multiple solutions or multiple projects in a solution where the dependencies are not correctly setup.
Very occasionally, Visual Studio can slip up and not get the generated dependencies correct even when the projects in a solution are correctly linked. This happens less often than Visual Studios is blamed for it.
Expunging all intermediate build files and rebuilding everything from source usually fixes the problem. Obviously for very large projects this can be a severe penalty.

Since it's guess-fest anyway, here's one from me:
You stack is messed up and _CrtCheckMemory doesn't check for that. As to why the stack is corrupted:
good old stack overflow
calling convention mismatches, which was already mentioned (I don't know, like passing a callback in the wrong calling convention to a WinAPI function; what static or dynamic libraries are you linking with?)
a line like printf("%d");

Related

C++ What could cause a line of code in one place to change behavior in an unrelated function?

Consider this mock-up of my situation.
in an external header:
class ThirdPartyObject
{
...
}
my code: (spread among a few headers and source files)
class ThirdPartyObjectWrapper
{
private:
ThirdPartyObject myObject;
}
class Owner
{
public:
Owner() {}
void initialize();
private:
ThirdPartyObjectWrapper myWrappedObject;
};
void Owner::initialize()
{
//not weird:
//ThirdPartyObjectWrapper testWrappedObject;
//weird:
//ThirdPartyObject testObject;
}
ThirdPartyObject is, naturally, an object defined by a third party (static precompiled) library I'm using. ThirdPartyObjectWrapper is a convenience class that eliminates a lot of boiler-plating for working with ThirdPartyObject. Owner::initialize() is called shortly after an instance of Owner is created.
Notice the two lines I have labeled as "weird" and "not weird" in Owner::initialize(). All I'm doing here is creating a couple of objects on the stack with their default constructors. I don't do anything with those objects and they get destroyed when they leave scope. There are no build or linker errors involved, I can uncomment either or both lines and the code will build.
However, if I uncomment "weird" then I get a segmentation fault, and (here's why I say it's weird) it's in a completely unrelated location. Not in the constructor of testObject, like you might expect, but in the constructor of Owner::myObjectWrapper::myObject. The weird line never even gets called, but somehow its presence or absence consistently changes the behavior of an unrelated function in a static library.
And consider that if I only uncomment "not weird" then it runs fine, executing the ThirdPartyObject constructor twice with no problems.
I've been working with C++ for a year so it's not really a surprise to me that something like this would be able happen, but I've about reached the limit of my ability to figure out how this gotcha is happening. I need the input of people with significantly more C++ experience than me.
What are some possibilities that could cause this to happen? What might be going on here?
Also, note, I'm not asking for advice on how to get rid of the segfault. Segfaults I understand, I suspect it's a simple race condition. What I don't understand is the behavior gotcha so that's the only thing I'm trying to get answers for.
My best lead is that it has to do with headers and macros. The third party library actually already has a couple of gotchas having to do with its headers and macros, for example the code won't build if you put your #include's in the wrong order. I'm not changing any #include's so strictly this still wouldn't make sense, but perhaps the compiler is optimizing includes based on the presence of a symbol here? (it would be the only mention of ThirdPartyObject in the file)
It also occurs to me that because I am using Qt, it could be that the Meta-Object Compiler (which generates supplementary code between compilations) might be involved in this. Very unlikely, as Qt has no knowledge of the third party library where the segfault is happening and this is not actually relevant to the functionality of the MOC (since at no point ThirdPartyObject is being passed as an argument), but it's worth investigating at least.
Related questions have suggested that it could be a relatively small buffer overflow or race condition that gets tripped up by compiler optimizations. Continuing to investigate but all leads are welcome.
Typical culprits:
Some build products are stale and not binary-compatible.
You have a memory bug that has corrupted the state of your process, and are seeing a manifestation of that in a completely unrelated location.
Fixing #1 is trivial: delete the build folder and build again. If you're not building in a shadow build folder, you've set yourself up for failure, hopefully you now know enough to stop :)
Fixing #2 is not trivial. View manual memory management and possible buffer overflows with suspicion. Use modern C++ programming techniques to leverage the compiler to help you out: store things by value, use containers, use smart pointers, and use iterators and range-for instead of pointers. Don't use C-style arrays. Abhor C-style APIs of the (Type * array, int count) kind - they don't belong in C++.
What fun. I've boiled this down to the bottom.
//#include <otherthirdpartyheader.h>
#include <thirdpartyobject.h>
int main(...)
{
ThirdPartyObject test;
return 0;
}
This code runs. If I uncomment the first include, delete all build artifacts, and build again, then it breaks. There's obviously a header/macro component, and probably some kind of compiler-optimization component. But, get this, according to the library documentation it should give me a segfault every time because I haven't been doing a required initialization step. So the fact that it runs at all indicates unexpected behavior.
I'm chalking this up to library-specific issues rather than broad spectrum C++ issues. I'll be contacting the vendor going forward from here, but thanks everyone for the help.

Call to method ends up in completely different one

I have a strange problem which appears in wxWidgets libraries and which can be reproduced easily in debugger. It does not seem to be wxWidgets-specific, so I'm asking this here.
I perform some kind of non-standard initialisation. During this initialisation a method wxAppBase::Initialize() is called. wxAppBase is the base class of my own App-class. Within wxAppBase::Initialize() an other method OnInitGui() is called, this method exists in wxAppBase as well as in my derived class.
And here is the problem: In Debug Build, everthing works well, OnInitGui() is executed as expected. But in Release-build I never reach OnInitGui() method but end up in a completely different one which does not have anything to do with OnInitGui(). So it seems an illegal jump is performed.
All pointers seem to be valid and this happens in wxWidgets library completely.
Anybody an idea what could cause such a behaviour and how one could fix this?
Every hint/idea/suggestion is welcome.
This looks like a bad build. Clean everything and rebuild both the library and your application using exactly the same compiler and compiler options and the problem will probably just disappear.

memory address of a member variable of argument objects changes when dll function is called

class SomeClass
{
//some members
MemberClass one_of_the_mem_;
}
I have a function foo( SomeClass *object ) within a dll, it is being called from an exe.
Problem
address of one_of_the_mem_ changes during the time the dll call is dispatched.
Details:
before the call is made (from exe):
'&(this).one_of_the_mem_' - `0x00e913d0`
after - in the dll itself :
'&(this).one_of_the_mem_' - `0x00e913dc`
The address of object remains constant. It is only the member whose address shift by c every time.
I want some pointers regarding how can I troubleshoot this problem.
Code :
Code from Exe
stat = module->init ( this,
object_a,
&object_b,
object_c,
con_dir
);
Code in DLL
Status_C ModuleClass( SomeClass *object, int index, Config *conf, const char* name)
{
_ASSERT(0); //DEBUGGING HOOK
...
Update 1:
I compared the Offsets of members following Michael's instruction and they are the same in both cases.
Update 2:
I found a way to dump the class layout and noticed the difference in size, I have to figure out why is that happening though.
linked is the question that I found to dump class layout.
Update 3:
Final Update : Solved the problem, much thanks to Michael Burr.
it turned out that one of the build was using 32 bit time, _USE_32BIT_TIME_T was defined in it and the other one was using 64 bit time. So it generated the different layout for the object, attached is the difference file.
Your DLL was probably compiled with different set of compiler options (or maybe even a slightly different header file) and the class layout is different as a result.
For example, if one was built using debug flags and other wasn't or even if different compiler versions were used. For example, the libraries used by different compiler versions might have subtle differences and if your class incorporates a type defined by the library you could have different layouts.
As a concrete example, with Microsoft's compiler iterators and containers are sensitive to release/debug, _SECURE_SCL on/off , and _HAS_ITERATOR_DEBUGGING on/off setting (at least up though VS 2008 - VS 2010 may have changed some of this to a certain extent). See http://connect.microsoft.com/VisualStudio/feedback/details/352699/secure-scl-is-broken-in-release-builds for some details.
These kinds of issues make using C++ classes across DLL boundaries a bit more fragile than using straight C interfaces. They can occur in C structures as well, but it seems like C++ libraries have these differences more often (I think that's the nature of having richer functionality).
Another layout-changing issue that occurs every now and then is having a different structure packing option in effect in the different compiles. One thing that can 'hide' this is that pragmas are often used in headers to set structure packing to a certain value, and sometimes you may come across a header that does this without changing it back to the default (or more correctly the previous setting). If you have such a header, it's easy to have it included in the build for one module, but not another.
that sounds a bit wierd, you should show more code, it should 'move' if it being passed by ref, it sounds more like a copy of it is being made and that having the member function called.
Perhaps the DLL versions is compiled against a different version that you are referencing. check and make sure the header file is for the same version as the dll.
Recompile the library if you can.

Access violation calling C++ dll

I created c++ dll (using mingw) from code I wrote on linux (gcc), but somehow have difficulties using it in VC++. The dll basically exposes just one class, I created pure virtual interface for it and also factory function which creates the object (the only export) which looks like this:
extern "C" __declspec(dllexport) DeviceDriverApi* GetX5Driver();
I added extern "C" to prevent name mangling, dllexport is replaced by dllimport in actual code where I want to use the dll, DeviceDriverApi is the pure virtual interface.
Now I wrote simple code in VC++ which just call the factory function and then just tries to delete the pointer. It compiles without any problems but when I try to run it I get access violation error. If I try to call any method of the object I get access violation again.
When I compile the same code in MinGW (gcc) and use the same library, it runs without any problems. So there must be something (hehe, I guess many differences actually :)) between how VC++ code uses the library and gcc code.
Any ideas what?
Cheers,
Tom
Edit:
The code is:
DeviceDriverApi* x5Driver = GetX5Driver();
if (x5Driver->isConnected())
Console::WriteLine(L"Hello World");
delete x5Driver;
It's crashing when I try to call the method and when I try to delete the pointer as well. The object is created correctly though (the first line). There are some debug outputs when the object is created and I can see them before I get the access violation error.
You're using one compiler (mingw) for the DLL, and another (VC++) for the calling code.
You're calling a 'C' function, but returning a pointer to a C++ Object.
That will never work, because VTable layouts are almost guranteed to be incompatible. And, the DLL and app are probably using different memory managers, so you're doing new() with one and delete() with the other. Again, it just won't work.
For this to work the two compilers need to both support a standard ABI (Application Binary Interface). I don't think such a thing exists for Windows.
The best option is to expose all you DLL object methods and properties via C functions (including one to delete the object). You can the re-wrap into a C++ object on the calling end.
The two different compilers may be using different calling conventions. Try putting _cdecl before the function name in both the client code and the DLL code and recompiling both.
More info on calling conventions here: http://en.wikipedia.org/wiki/X86_calling_conventions
EDIT: The question was updated with more detail and it looks likely the problem is what Adrien Plisson describes at the end of his answer. You're creating an object in one module and freeing it in another, which is wrong.
(1) I suspect a calling covnention problem as well, though the simple suggestion by Leo doesn't seem to have helped.
Is isConnected virtual? It is possible that MinGW and VC++ use different implementations for a VTable, in which case, well, tough luck.
Try to see how far you get with the debugger: does it crash at the call, or the return? Do you arrive at invalid code? (If you know to read assembly, that usually helps a lot with these problems.)
Alternatively, add trace statements to the various methods, to see how far you get.
(2) For a public DLL interface, never free memory in the caller that was allocated by a callee (or vice versa). The DLL likely runs with a completely different heap, so the pointer is not known.
If you want to rely on that behavior, you need to make sure:
Caller and Callee (i.e. DLL and main program, in your case) are compiled with the same version of the sam compiler
for all supported compilers, you have configured the compile options to ensure caller and callee use the same shared runtime library state.
So the best way is to change your API to:
extern "C" __declspec(dllexport) DeviceDriverApi* GetX5Driver();
extern "C" __declspec(dllexport) void FreeDeviceDriver(DeviceDriverApi* driver);
and, at caller site, wrap in some way (e.g. in a boost::intrusive_ptr).
try looking at the imported libraries from both your DLL and your client executable. (you can use the Dependency Viewer or dumpbin or any other tool you like). verify that both the DLL and the client code are using the same C++ runtime.
if it is not the case, you can indeed run into some issues since the way the memory is managed may be different between the 2, leading to a crash when freeing from one runtime a pointer allocated from another runtime.
if this is really your problem, try not destroying the pointer in your client executable, but rather declare and export a function in your DLL which will take care of destroying the pointer.

Trouble tracking down a potential memory overwrite. Windows weirdness

This is driving me nuts. I am using some 3rd-party code in a Windows .lib that, in debug mode, is causing an error similar to the following:
Run-Time Check Failure #2 - Stack around the variable 'foo' was corrupted.
The error is thrown when either the object goes out of scope or is deleted. Simply allocating one of these objects and then deleting it will throw the error. I therefore think the problem is either in one of the many constructors/destructors but despite stepping through every line of code I cannot find the problem.
However, this only happens when creating one of these objects in a static library. If I create one in my EXE application, the error does not appear. The 3rd-party code itself lives in a static lib. For example, this fails:
**3RDPARTY.LIB**
class Foo : public Base
{
...
};
**MY.LIB**
void Test()
{
Foo* foo = new Foo;
delete foo; // CRASH!
}
**MY.EXE**
void Func()
{
Test();
}
But this will work:
**3RDPARTY.LIB**
class Foo : public Base
{
...
};
**MY.EXE**
void Func()
{
Foo* foo = new Foo;
delete foo; // NO ERROR
}
So, cutting out the 'middle' .lib file makes the problem go away and it is this weridness that is driving me mad. The EXE and 2 libs all use the same CRT library. There are no errors linking. The 3rd-party code uses inheritance and there are 5 base classes. I've commented out as much code as I can whilst still getting it to build and I just can't see what's up.
So if anyone knows why code in a .lib would act differently to the same code in a .exe, I would love to hear it. Ditto any tips for tracking down memory overwrites! I am using Visual Studio 2008.
One possibility is that it's a calling convention mismatch - make sure that your libraries and executables are all set to use the same default calling convention (usually __cdecl). To set that, open up your project properties and go to Configuration Properties > C/C++ > Advanced and look at the Calling Convention option. If you call a function with the wrong calling convention, you'll completely mess up the stack.
OK, I tracked the problem down and it's a cracker, if anyone's interested. Basically, my .LIB, which exhibited the problem. had defined _WIN32_WINNT as 0x0501 (Windows 2000 and greater), but my EXE and the 3rd-party LIB had it defined as 0x0600 (Vista). Now, one of the headers included by the 3rd-party lib is sspi.h which defines a structure called SecurityFunctionTable which includes the following snippet:
#if OSVER(NTDDI_VERSION) > NTDDI_WIN2K
// Fields below this are available in OSes after w2k
SET_CONTEXT_ATTRIBUTES_FN_W SetContextAttributesW;
#endif // greater thean 2K
Th cut a long story short, this meant a mismatch in object sizes between the LIBs and this was causing the Run-Time Check Failure.
Class!
Is your .lib file linked against the library's .lib? I assume from your example that you are including the header with the declaration of the destructor; without it, deleting such a type is allowed but can result in UB (in a bizarre manner contrary to the general rule that something must be defined before used). If the .lib files aren't linked together, it's possible that a custom operator delete or destructor is having some weird linking issues, and while that shouldn't happen, you never can quite tell if it won't.
Without seeing more code, it's hard to give you a firm answer. However, for tracking down memory overwrites, I recommend using WinDbg (free from Microsoft, search for "Debugging Tools for Windows").
When you have it attached to your process, you can have it set breakpoints for memory access (read, write, or execute). It's really powerful overall, but it should especially help you with this.
The error is thrown when either the object goes out of scope or is deleted.
Whenever I've run into this it had to do with the compiled library using a different version of the C++ runtime than the rest of the application.