I have a Windows application, it crashes every now and then and not reroducibly.
When it does, I get pure virtual function called. I have set it up to create a Windows dump using ADplus and even when it crashes, there is never a dump.
I am pretty sure this is build error, I am building using VC2008 SP1 and this is the release build.
Any insights on this? How can I debug this, it is a release build with .pdb file and map file.
Thanks
Reza
As was pointed out already, it is very unlikely that this is a build error. I had a quite complex project with a similar problem and was able to track it down by using _set_purecall_handler and providing my own handler. This way I was able to break into the debugger when it happened and see the call stack. Obviously an alternative here is to create the minidump when it happens. Remember that you need to prepare everything for the minidump before the program encounters an exception.
However, there is also a good chance this could be caused by a heap corruption. In such a case I would expect a variety of symptoms, though. You describe this specific symptom, so it's likely that your code is indeed at fault.
The project I mentioned above was a legacy project that modeled something similar to COM and there were indeed places where the compiler could not have possibly found all occasions of pure virtual functions for which no implementation existed in derived classes.
First of all, make sure that you are not calling pure virtual function. If this is not the case, try to start your program under WinDbg.
I am pretty sure this is build error
Very, very unlikely. It's much more likely that this is a bug in your code.
You might be slicing an object somewhere. If you have your app set up to generate a dumpfile on unhanded exceptions and you are still dying with no dump file, then its very possible the bug is in the exception handler.
You need to get a dump file. That should be your first priority.
If your issue is not constant most probably there are some problems with out of index writes which could corrupt memory
If you've tried a full rebuild, then it seems more likely it's a coding issue.
Are you calling any (pure) virtual functions in any constructors or destructors?
More likely is that you're calling a pure virtual function on a deleted object, and the vtable has been moved to point to the parent object. In this case valgrind (free, linux) or Purify ($$$, Windows) will really be able to help you out. VS may have a memory checker with it too.
I once had a similar problem with C++. I had a call to a virtual function from a constructor somewhere, so the object was not constructed yet when the call happened, explaining the "pure" virtual function called. It might also be a problem of slicing if you have bad casts on objects instead of pointer to objects. Use a debugger and step by step/backtrace to find the source so we can help you better.
I have the same error. I reviewed code many times and never found virtual functions usage in constructors or destructors.
The problem was in build system. I had old static library version on my computer where some function wasn't pure virtual, and new one when it became (I added one more abstraction layer). Because EXE file was created with old static library, but new headers, this error appeared.
So, make sure that your libraries and include files versions are consistent if nothing else helps.
Related
I debug an application which links against two DLLs. When an object from one of these DLLs is instantiated the application segfaults. However, when the order of the .lib files in (VS2010) Linker->Input->Additional Dependencies is swapped then the application runs fine.
This workaround works for now but I still want to understand what caused the problem. Any hints, how can I further debug this?
While without much more information any answer is bound to be a speculation, one potential reason is the following:
in Windows, every DLL has DLLMain() function
as soon as you have two DLLs, they will have two DllMain() function calls
if there is a dependency between the two (i.e. one DLLMain() implicitly relies on an object being initialized within another one) - you have a problem.
I've seen such a problem myself (which is obviously a bad programming practice, but it does happen). However, a number of other explanations may also exist.
The way to debug this would be the way to debug any crash: start by inspecting the stack at the catch of the segfault. If the crash reproduces at debug builds, understanding the direct reason should be straightforward (the root cause usually takes some deeper investigation).
Not much can be said about the cause of the crash without viewing this stack, but it is reasonable to assume that a global resource is involved. A named event, common temporary file, perhaps some internal framework structure (which, if any, frameworks are involved in your code?) - but it must be something that crosses the individual DLL boundaries, and so is probably a resource at application scope.
I am working on the largest project I've ever worked on, and I've never debugged something like this, so I don't know where to get started.
Some info on the crash:
I am using Visual Studio, and the debugger is completely useless. The only information it gives me is that it appears to be happening during a call to "memcpy". The call stack is completely empty except for the memcpy function, and the local variables are listed but it does not have values for any of them.
It happens occasionally on any computer.
It does not ALWAYS happen under any (known) condition, but it only ever happens under a few conditions. In particular it only happens when a particular type of object is destroyed, although that's not necessarily the direct cause, and investigating the destruction process has not been helpful.
A little more about the project:
It is a game using SFML 2.0, linked statically.
I am not calling memcpy anywhere in my own code.
Some questions:
Where could the call to memcpy be coming from? Is it in SFML or elsewhere?
How do I (using visual studio) get more information on a crash when the debugger isn't working?
This is an answer to "Where could the call to memcpy be coming from?"
In most cases this is the result of a call to the copy constructor of std::string with a this pointer of NULL, or a string operation on an already destructed string. This string can be a member of a class of you, of course.
This in itself won't help you to find the problem when the project is really large. However, you can almost certainly assume that you are using a reference or pointer (or iterator) to a custom object that is already destructed. A most straightforward way to find this access would be by running your program, compiled without optimization and with debug info, in valgrind. Unfortunately that isn't available for windows (see Is there a good Valgrind substitute for Windows?).
The main problem here seems to be that you aren't even getting a backtrace, because that would give a strong hint to where to look into, at least. I'm not familiar with windows though, so I can only guess what is the cause of that. Are you sure you have everything compiled with debug info?
I migrated a project from Qt4 to Qt5, and it compiles and everything but the application crashes before it even reaches the main function. I know there is a null value that fucks up something at some point, maybe a file that cant find or something, but there are so many .cpps and .h and libraries that its pretty hard to locate the source of the error plus I cant set any breakpoints. I have a lot of debugging data so maybe any of you can guide me in the right direction. I dont know what I would be doing without stack overflow honestly, so thankyou in advance.
When debugging I get different crashes:
The stack in each case shows different crashes, but all of them have something in common, which is this __do_global_ctors thingy, I have researched and apparently it has to do with constructors, but I have no idea what I should be looking for.
if I missed any info please do ask. I hope someone can enlighten me, I am so so close to get this working.
The __do_global_ctors() is called before your main(), as the framework needs to instantiate all of the global objects that main() might use.
This method will call the constructors for all static objects, and their component objects. I.e. all static constructors.
From the look of the stack trace, it appears that the segfault occurs during the construction of a QGlobalStatic<QMutex, [incomlpete types - see trace for details]> object, which makes sense. This is being constructed by qRegisterResourceData as part of qInitResources_mimetypes.
Try placing a breakpoint in this function in qrc_mimetypes.cpp (if you have the source) and see where that gets you. Or look at the Qt documentation for mimetypes initialisation and make sure you've specified your application's resources correctly.
I managed to solve the issue by thoughtfully re-compiling all the libraries to Qt5 and making sure all the cpps that the program refered were Qt5 too. Also double-checked the linkings. I thought I had done it but apparently I missed one library.
Mind that some libraries need to be migrated and there are others that you can download and compile directly with Qt5. If you are having this same problem make sure that there are no Qt5 versions of that library before migrating them yourself.
COM+ application, building with MS Visual Studio 6, SP 6 on Windows XP SP 3, and debugging remotely.
My main question is this; why would I not be able to step into 'new' if I can step into 'delete'? I'm mostly looking for ideas as to what I should look into.
I'm working on a fairly large project that I'm only just getting familiar with. The current issue is a heap corruption problem in which a release build will eventually exhaust its working set and crash. The problem is so pervasive that the following code will corrupt the heap:
int * iArray = new int [100];
delete [] iArray;
I say 'corrupt the heap' because the debug output displays "heap[dllhost.exe]: invalid address specified to rtlvalidateheap" on the 'delete'.
I can step into the 'delete' call fine and it seems to be calling the proper one (located in DELOP.cpp in ...\Microsoft Visual Studio\VC98\CRT\SRC) but, for whatever reason, I cannot step into any call to 'new'. I'm grasping at straws here but I have a feeling somewhere in the code base someone overrode the 'new' operator and the code I'm looking at is unintentionally using it. The symptoms of the release build seem like memory is being allocated to one heap and attempted to be deleted from another, at least that's my hunch.
EDIT:
Ack! Sorry everyone, I posted too soon, should have given more info.
I've searched the code base and found a few overrides but they are all in classes, not globally defined. The only one that isn't in a class is the following:
struct _new_selector
{
};
inline void* operator new(size_t, void *ptr, _new_selector)
{
return (ptr);
}
But that's a placement new and I'm pretty sure it doesn't count in this situation. What is the library that I should be stepping into for the original 'new'? I'm guessing it's the same as the 'delete' but if not, maybe I just don't have debug info for it?
EDIT 2:
Messing around with this I've found that on a debug build this issue does not exist. I've already looked into the run-time libraries, it's using /MD and /MDd for debug. Not only that, but I've built the release version with /MDd jut to make sure and there was still no change. Looking at the maps of both a debug build and a release build the new operator (with it's mangling) is at the following:
Release:
0001:00061306 ??2#YAPAXI#Z 10062306 f MSVCRTD:MSVCRTD.dll
0002:00000298 _imp??2#YAPAXI#Z 1006e298 MSVCRTD:MSVCRTD.dll
Debug:
0001:00077d06 ??2#YAPAXI#Z 10078d06 f MSVCRTD:MSVCRTD.dll
0004:00000ad4 _imp??2#YAPAXI#Z 100b5ad4 MSVCRTD:MSVCRTD.dll
I also checked the delete operator:
Release:
0001:000611f0 ??3#YAXPAX#Z 100621f0 f msvcprtd:delop_s.obj
Debug:
0001:00077bf0 ??3#YAXPAX#Z 10078bf0 f msvcprtd:delop_s.obj
Also, and I don't have a print out of them but I can get it if it would help, the disassembly of the new operator looks the same in release and debug. So I guess it's not an override? Would an inline override of an operator make this untrue?
Also, being a COM+ application which spawns multiple dllhost.exe processes, would it be possible for a call to the new operator go to another DLL, or the exe, and a call to delete go to the opposite?
Going with your hunch that there is possibly an overloaded new somewhere in the code, few things you can check
the disassembly of the code and find out the library name in that file, generally there is something in the assembly that will give you a hint
If you are not able to find out the library name, check the address in the assembly that you are entering. Then in the debug output window, check the load addresses for the various libraries - that could give you a clue as to which library to check
If the above doesn't help, check if you can generate a map file for the complete project. If you can, then you can look up the address in the map file and that might help
Try to use the debug version of the runtime libraries. Can't recollect what was the option that will turn on the debug_malloc. That can help you figure out what is happening on the heap
The community can add a few more that I might have missed. And finally, if you do crack the problem, please share how you did so. Either here or as a link to your blog. Working on heap problems for a large project is generally not easy and we all can learn a new trick or two.
Ok, so here is what it ended up being.
The program I'm working on has about thirty or so projects within it. Some create libs, others dlls, still others exes. In any case, it's all intermingled. Add into that the fact that we use ATL and COM and it starts getting crazy pretty quick. The end result is that some, not all, of the projects were being built with the _ATL_MIN_CRT compiler definition, even though this is a desktop application, not a web based one in which a client would need to download a few modules.
Here is some info on _ATL_MIN_CRT:
• http://support.microsoft.com/default.aspx?scid=kb;EN-US;q166480
• http://msdn.microsoft.com/en-us/library/y3s1z4aw%28v=vs.80%29.aspx
Notice, from the first link, that this also will eliminate the use of memory allocation routines. I’m unsure what the original motive behind using this was, or if it was really intentional, but it certainly causes issues with allocating memory. Also, this only affects Release builds, hence why it was such a pain to find.
Basically, module A was built with _ATL_MIN_CRT and module B was built without it but had a dependency to module A. Since this is also using COM and everything was run in dllhost.exe, when module B tried to use the new operator it seems to have gone out of its dll to attempt to allocate memory on the heap. Then, when calling delete, it tried to delete it within its dll. Thus, we have a crazy memory leak.
Mind, removing _ATL_MIN_CRT fixes this but what I mention above is only my understanding of it. It may very well be more/less complicated but, regardless, this was the issue.
Thanks to everyone's suggestions, they really did help me find this thing!
I have been provided with a C++ DLL and associated header file in order to integrate it with my application. To begin with, I am simply trying to call the DLL from a simple Win32 console application (I'm using Visual Studio 2008 Express).
I've linked the DLL by specifying it as an additional dependency in the project settings.
The interface (i.e. the only exported function) simply returns a pointer to an instance of the Class that I actually need to call. I can successfully call this, get the pointer and call the first function that I need to (an "init" function).
When I come to call the function that actually does the processing I require, I'm intermittently getting a "0xC0000005: Access violation reading location...." error. That is, I run the program - it works successfully and exits - I try to run it again (changing nothing - all parameters are hard coded) and get the error (and continue to do so).
I can't consistently recreate the problem but I'm beginning to think that it may be something to do with the DLL not being unloaded properly - after getting the error on one occasion I tried deleting the DLL and was told by Windows that it was in use. That said, on another occasion I was able to delete the DLL after getting the error, copy it back, then still got the error on the next run.
Should the DLL be correctly unloaded when my .exe finishes? Would I be better off trying to explicitly load/unload the DLL rather than doing it implicitly?
Any other help or advice greatly appreciated.
It won't have anything to do with the DLL being unloaded; different processes using the same DLL do not share any state. Also, the DLL will be unloaded when the process exits; perhaps not gracefully, but it will be unloaded.
I can think of two likely reasons for an intermittent failure.
Most likely, the DLL has a race condition. This could be one that's exposed if the DLL has been cached, causing the timing to change. That would explain why your first run didn't fail but subsequent ones did.
I think it's also possible that the DLL didn't release its lock on some file. If there is some file you know is accessed by this DLL, try checking to see if that file is locked after the process ends.
Also, get this in a debugger. Turn on first-chance exceptions in visual studio and look at the callstack where the AV is happening, and post it here.
What types are involved in the classes exported by the DLL? We have often had these kinds of problems with Visual Studio when the classes make use of STL - I'd guess that any template usage would probably be a way to potentially cause this sort of problem.
What's your setting for "Runtime library" under "C/C++" in the configuration properties?
You should probably try /MD (or /MDd for debugging).
see http://msdn.microsoft.com/en-us/library/2kzt1wy3(VS.71).aspx