COM+ application, building with MS Visual Studio 6, SP 6 on Windows XP SP 3, and debugging remotely.
My main question is this: why would I not be able to step into 'new' if I can step into 'delete'? I'm mostly looking for ideas as to what I should look into.
I'm working on a fairly large project that I'm only just getting familiar with. The current issue is a heap corruption problem in which a release build will eventually exhaust its working set and crash. The problem is so pervasive that the following code will corrupt the heap:
int * iArray = new int [100];
delete [] iArray;
I say 'corrupt the heap' because the debug output displays "heap[dllhost.exe]: invalid address specified to rtlvalidateheap" on the 'delete'.
I can step into the 'delete' call fine and it seems to be calling the proper one (located in DELOP.cpp in ...\Microsoft Visual Studio\VC98\CRT\SRC) but, for whatever reason, I cannot step into any call to 'new'. I'm grasping at straws here, but I have a feeling that somewhere in the code base someone overrode the 'new' operator and the code I'm looking at is unintentionally using it. The symptoms of the release build look as if memory is being allocated on one heap and freed from another; at least that's my hunch.
EDIT:
Ack! Sorry everyone, I posted too soon, should have given more info.
I've searched the code base and found a few overrides but they are all in classes, not globally defined. The only one that isn't in a class is the following:
struct _new_selector
{
};
inline void* operator new(size_t, void *ptr, _new_selector)
{
    return (ptr);
}
But that's a placement new and I'm pretty sure it doesn't count in this situation. What is the library that I should be stepping into for the original 'new'? I'm guessing it's the same as the 'delete' but if not, maybe I just don't have debug info for it?
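To illustrate why a tagged placement form like the one above should not affect an ordinary `new int[100]`, here is a minimal sketch. It assumes a C++11 or later compiler (not VC6), and `new_selector`, `g_tagged_calls`, `plain_new_calls` and `tagged_new_calls` are hypothetical names introduced only for this example. The overload is selected only when the extra arguments are written out at the call site.

```cpp
#include <cstddef>
#include <new>

// Hypothetical tag type, mirroring the _new_selector struct quoted above.
struct new_selector {};

// Counts calls to the tagged overload only (illustration aid).
static int g_tagged_calls = 0;

// Tagged placement form: hands back caller-supplied storage, allocates nothing.
inline void* operator new(std::size_t, void* ptr, new_selector) noexcept {
    ++g_tagged_calls;
    return ptr;
}

// Matching placement delete; the compiler calls it only if a constructor throws.
inline void operator delete(void*, void*, new_selector) noexcept {}

// Ordinary new[]/delete[] pair: resolved to the global allocation functions,
// never to the tagged overload above.
int plain_new_calls() {
    const int before = g_tagged_calls;
    int* iArray = new int[100];
    delete[] iArray;
    return g_tagged_calls - before;
}

// Explicit tagged placement: the only syntax that selects the overload.
int tagged_new_calls() {
    const int before = g_tagged_calls;
    alignas(int) unsigned char storage[sizeof(int)];
    int* p = new (storage, new_selector()) int(42);
    (void)p;  // int is trivially destructible; there is nothing to release
    return g_tagged_calls - before;
}
```

Under these assumptions `plain_new_calls()` reports zero calls to the tagged overload and `tagged_new_calls()` reports one, which is consistent with the hunch that this overload is not what the plain `new` is hitting.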
EDIT 2:
Messing around with this, I've found that the issue does not exist on a debug build. I've already looked into the run-time libraries: it's using /MD for release and /MDd for debug. Not only that, but I've built the release version with /MDd just to make sure and there was still no change. Looking at the maps of both a debug build and a release build, the new operator (with its mangling) is at the following:
Release:
0001:00061306 ??2@YAPAXI@Z 10062306 f MSVCRTD:MSVCRTD.dll
0002:00000298 __imp_??2@YAPAXI@Z 1006e298 MSVCRTD:MSVCRTD.dll
Debug:
0001:00077d06 ??2@YAPAXI@Z 10078d06 f MSVCRTD:MSVCRTD.dll
0004:00000ad4 __imp_??2@YAPAXI@Z 100b5ad4 MSVCRTD:MSVCRTD.dll
I also checked the delete operator:
Release:
0001:000611f0 ??3@YAXPAX@Z 100621f0 f msvcprtd:delop_s.obj
Debug:
0001:00077bf0 ??3@YAXPAX@Z 10078bf0 f msvcprtd:delop_s.obj
Also, and I don't have a print out of them but I can get it if it would help, the disassembly of the new operator looks the same in release and debug. So I guess it's not an override? Would an inline override of an operator make this untrue?
Also, since this is a COM+ application which spawns multiple dllhost.exe processes, would it be possible for a call to the new operator to go to one DLL (or the exe) and the matching call to delete to go to a different one?
Going with your hunch that there may be an overloaded new somewhere in the code, here are a few things you can check:
• Look at the disassembly at the call site and find out which library the call lands in; generally there is something in the assembly that will give you a hint.
• If you are not able to find out the library name, check the address in the assembly that you are entering. Then, in the debug output window, check the load addresses of the various libraries; that could give you a clue as to which library to check.
• If the above doesn't help, check whether you can generate a map file for the complete project. If you can, look up the address in the map file and that might help.
• Try to use the debug version of the runtime libraries. I can't recall the option that turns on debug_malloc, but it can help you figure out what is happening on the heap.
The community can add a few more that I might have missed. And finally, if you do crack the problem, please share how you did so. Either here or as a link to your blog. Working on heap problems for a large project is generally not easy and we all can learn a new trick or two.
Ok, so here is what it ended up being.
The program I'm working on has about thirty or so projects within it. Some create libs, others dlls, still others exes. In any case, it's all intermingled. Add into that the fact that we use ATL and COM and it starts getting crazy pretty quick. The end result is that some, not all, of the projects were being built with the _ATL_MIN_CRT compiler definition, even though this is a desktop application, not a web based one in which a client would need to download a few modules.
Here is some info on _ATL_MIN_CRT:
• http://support.microsoft.com/default.aspx?scid=kb;EN-US;q166480
• http://msdn.microsoft.com/en-us/library/y3s1z4aw%28v=vs.80%29.aspx
Notice, from the first link, that this also will eliminate the use of memory allocation routines. I’m unsure what the original motive behind using this was, or if it was really intentional, but it certainly causes issues with allocating memory. Also, this only affects Release builds, hence why it was such a pain to find.
Basically, module A was built with _ATL_MIN_CRT and module B was built without it but had a dependency on module A. Since this is also using COM and everything was run in dllhost.exe, when module B used the new operator it seems to have gone outside its own DLL to allocate memory on the heap. Then, when calling delete, it tried to free that memory within its own DLL. Thus, we have a crazy memory leak.
Mind, removing _ATL_MIN_CRT fixes this but what I mention above is only my understanding of it. It may very well be more/less complicated but, regardless, this was the issue.
Thanks to everyone's suggestions, they really did help me find this thing!
Related
Basically the situation is we have a C++ program that occasionally crashes when it attempts to access an already-freed object (in Debug build we notice that the memory being pointed to is full of the usual "cdcdcdcd" pattern). We tried to trace every point where the object is cleared and couldn't find a place where known pointers aren't properly set to null.
There are two main issues - The code is extremely large and convoluted, written over a period of at least a decade and there are several developers whose whereabouts are unknown and even some known to be deceased, so it's not possible to get in touch with the people who originally wrote the code. The complexity of the code makes it impractical to manually determine how many pointers to the said object exist and which functions use or make copies of them.
The second big issue is that we don't have a reliable way to reproduce it. We know that in the production system, which has hundreds of concurrent users, it crashes about twice a day, but all attempts to reproduce the crash in a test environment have failed. It should be possible for us to inspect the production environment for a few minutes after a crash, but eventually we have to bring it back up. The environment is Windows Server 2019 and the program is compiled with Visual Studio 2019. There is a copy of Visual Studio and the program source code on the server. We have already tried DMP files, which did not help because the dump only shows the use of the dangling pointer; it does not tell us where the pointed-to object was freed.
I would appreciate any advice because I'm pretty much out of ideas.
Thanks.
You're already somewhere by noticing a "use-after-free" problem. The next step is figuring out what part of that is wrong. Should the object not have been freed, or should it not have been used?
With smart pointers this is still relevant to know - should you use a shared_ptr or a weak_ptr? These classes are not magic - you can implement the same behavior in many other ways. But they're sure convenient; you just need to figure out which one to use. The other big advantage is that future readers will see a weak_ptr and can reverse your logic - that's a non-owning pointer, so that implements the "don't use after free" instead of "don't free while in use".
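The ownership distinction above can be sketched as follows. This is a minimal illustration, not the asker's code: `Session`, `describe`, `use_while_owner_alive` and `use_after_owner_released` are hypothetical names. The `weak_ptr` holder has to call `lock()` before every use, so a stale access turns into a checked branch instead of a read of freed memory.

```cpp
#include <memory>
#include <string>

// Hypothetical object that is sometimes used after it has been freed.
struct Session {
    std::string user;
};

// Non-owning observer: must confirm the object is still alive before use.
std::string describe(const std::weak_ptr<Session>& observer) {
    if (std::shared_ptr<Session> alive = observer.lock()) {
        // 'alive' keeps the Session valid for the duration of this block.
        return alive->user;
    }
    return "<expired>";
}

// The owner still holds the object, so the observer can reach it.
std::string use_while_owner_alive() {
    auto owner = std::make_shared<Session>(Session{"alice"});
    std::weak_ptr<Session> observer = owner;
    return describe(observer);
}

// The owner releases the object first; the observer detects that
// instead of dereferencing a dangling pointer.
std::string use_after_owner_released() {
    auto owner = std::make_shared<Session>(Session{"alice"});
    std::weak_ptr<Session> observer = owner;
    owner.reset();  // the Session is destroyed here
    return describe(observer);
}
```

If the answer to the question above is instead "the object should not have been freed", the observer would hold a `shared_ptr` rather than a `weak_ptr`, and the `reset()` would no longer destroy the `Session`.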
I'm having some trouble with Embarcadero C++ Builder XE3. When I run my program, I get an access violation BEFORE the first instruction in main, so I can't debug it. It's very weird.
I had this problem a couple of weeks ago: I was forced to do a full rebuild of the entire project (even if only a comma was missing...) and then the violation didn't occur anymore. I solved it by checking the option "Disable incremental link".
I was very happy, but today the problem is back, and whatever I do, my application crashes before entering main...
Does anyone have an idea? It's a big project, so I can't really post an example because I don't really know what to show...
Thanks a lot
Probably you have a bug in a constructor of a static global object. These constructors are all executed before getting into main(), so this can happen without being a runtime environment or a compiler bug.
As you said, debugging these is difficult, as you probably don't know which class is failing and you probably don't have exception info either.
Since you say it's a large project, you may have to resign yourself to using large-project toolkits and methodologies to deal with these problems, such as unit testing and lean methodologies (Scrum or the like).
With the information you posted, I think this is the most that can be said.
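One common way to make static objects robust against this kind of pre-main() failure is the construct-on-first-use idiom. The sketch below is only an assumption about what such a fix could look like, since the original project is not shown; `registry`, `Registrar`, `g_first`, `g_second` and `registered_count` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Construct-on-first-use: the vector is created the first time registry()
// is called, so it is guaranteed to exist before any static constructor
// that uses it, regardless of link order.
std::vector<std::string>& registry() {
    static std::vector<std::string> instance;
    return instance;
}

// A static object whose constructor depends on another static object.
// With a plain global vector defined in a different source file, this
// constructor could run before the vector had been constructed.
struct Registrar {
    explicit Registrar(const char* name) { registry().push_back(name); }
};

// Both constructors run before main() is entered.
static Registrar g_first("first");
static Registrar g_second("second");

std::size_t registered_count() { return registry().size(); }
```

The design choice here is to replace "global object" with "function returning a reference to a local static", which removes the dependency on the order in which the linker happens to arrange initializers.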
I am working on the largest project I've ever worked on, and I've never debugged something like this, so I don't know where to get started.
Some info on the crash:
I am using Visual Studio, and the debugger is completely useless. The only information it gives me is that it appears to be happening during a call to "memcpy". The call stack is completely empty except for the memcpy function, and the local variables are listed but it does not have values for any of them.
It happens occasionally on any computer.
It does not ALWAYS happen under any (known) condition, but it only ever happens under a few conditions. In particular it only happens when a particular type of object is destroyed, although that's not necessarily the direct cause, and investigating the destruction process has not been helpful.
A little more about the project:
It is a game using SFML 2.0, linked statically.
I am not calling memcpy anywhere in my own code.
Some questions:
Where could the call to memcpy be coming from? Is it in SFML or elsewhere?
How do I (using visual studio) get more information on a crash when the debugger isn't working?
This is an answer to "Where could the call to memcpy be coming from?"
In most cases this is the result of a call to the copy constructor of std::string with a this pointer of NULL, or a string operation on an already destructed string. This string can be a member of one of your classes, of course.
This in itself won't help you to find the problem when the project is really large. However, you can almost certainly assume that you are using a reference or pointer (or iterator) to a custom object that is already destructed. A most straightforward way to find this access would be by running your program, compiled without optimization and with debug info, in valgrind. Unfortunately that isn't available for windows (see Is there a good Valgrind substitute for Windows?).
The main problem here seems to be that you aren't even getting a backtrace, because that would at least give a strong hint as to where to look. I'm not familiar with Windows though, so I can only guess at the cause of that. Are you sure you have everything compiled with debug info?
I migrated a project from Qt4 to Qt5, and it compiles, but the application crashes before it even reaches the main function. I know there is a null value that fucks up something at some point, maybe a file that it can't find or something, but there are so many .cpp and .h files and libraries that it's pretty hard to locate the source of the error, plus I can't set any breakpoints. I have a lot of debugging data, so maybe one of you can guide me in the right direction. I don't know what I would be doing without Stack Overflow honestly, so thank you in advance.
When debugging I get different crashes:
The stack in each case shows different crashes, but all of them have something in common, which is this __do_global_ctors thingy, I have researched and apparently it has to do with constructors, but I have no idea what I should be looking for.
If I missed any info, please do ask. I hope someone can enlighten me; I am so, so close to getting this working.
The __do_global_ctors() is called before your main(), as the framework needs to instantiate all of the global objects that main() might use.
This method will call the constructors for all static objects, and their component objects. I.e. all static constructors.
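When it is unclear which static constructor is crashing, one low-tech option is to make each suspect constructor announce itself. The following is a hedged sketch rather than anything Qt-specific: `Traced`, `ctor_log`, `g_config`, `g_cache` and `construction_order` are hypothetical names. The last name printed to stderr before the crash points at the failing object.

```cpp
#include <cstdio>
#include <string>

// The log lives in a function-local static so that it is safe to use from
// other static constructors, whatever order they run in.
std::string& ctor_log() {
    static std::string log;
    return log;
}

// Wrapper that records and prints each construction as it happens.
struct Traced {
    explicit Traced(const char* name) {
        std::fprintf(stderr, "constructing %s\n", name);
        std::fflush(stderr);  // make sure the line is out before any crash
        ctor_log() += name;
        ctor_log() += ';';
    }
};

// Within one translation unit, these are constructed in definition order,
// all before main() is entered.
static Traced g_config("config");
static Traced g_cache("cache");

const std::string& construction_order() { return ctor_log(); }
```

Note that the order is only guaranteed inside a single source file; across files it depends on the toolchain, which is exactly why a trace like this can be informative.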
From the look of the stack trace, it appears that the segfault occurs during the construction of a QGlobalStatic<QMutex, [incomplete types - see trace for details]> object, which makes sense. This is being constructed by qRegisterResourceData as part of qInitResources_mimetypes.
Try placing a breakpoint in this function in qrc_mimetypes.cpp (if you have the source) and see where that gets you. Or look at the Qt documentation for mimetypes initialisation and make sure you've specified your application's resources correctly.
I managed to solve the issue by carefully recompiling all the libraries against Qt5 and making sure all the .cpp files that the program referred to were Qt5 too. I also double-checked the link settings. I thought I had done it already, but apparently I had missed one library.
Mind that some libraries need to be migrated and there are others that you can download and compile directly with Qt5. If you are having this same problem make sure that there are no Qt5 versions of that library before migrating them yourself.
Yesterday, I got bit by a rather annoying crash when using DLLs compiled with GCC under Cygwin. Basically, as soon as you run with a debugger, you may end up landing in a debugging trap caused by RtlFreeHeap() receiving an address to something it did not allocate.
This is a known bug with GCC 3.4 on Cygwin. The situation arises because the libstdc++ library includes a "clever" optimization for empty strings. I spare you the details (see the references throughout this post), but whenever you allocate memory in one DLL for an std::string object that "belongs" to another DLL, you end up giving one heap a chunk to free that came from another heap. Hence the SIGTRAP in RtlFreeHeap().
There are other problems reported when exceptions are thrown across DLL boundaries.
This makes GCC 3.4 on Windows an unacceptable solution as soon as your project is based on DLLs and the STL. I have a few options to move past this problem, many of which are very time-consuming and/or annoying:
I can patch my libstdc++ or rebuild it with the --enable-fully-dynamic-string configuration option
I can use static libraries instead, which increases my link time
I cannot (yet) switch to another compiler either, because of some other tools I'm using. The comments I find from some GCC people are that "it's almost never reported, so it's probably not a problem", which annoys me even more.
Does anyone have some news about this? I can't find any clear announcement that this has been fixed (the bug is still marked as "assigned"), except one comment on the GNU Radio bug tracker.
Thanks!
The general problem you're running into is that C++ was never really meant as a component language. It was really designed to be used to create complete standalone applications. Things like shared libraries and other such mechanisms were created by vendors on their own. Think of this example: suppose you created a C++ component that returns a C++ object. How does the C++ component know that it will be used by a C++ caller? And if the caller is a C++ application, why not just use the library directly?
Of course, the above information doesn't really help you.
Instead, I would create the shared libraries/DLLs such that you follow a couple of rules:
Any object created by a component is also destroyed by the same component.
A component can be safely unloaded when all of its created objects are destroyed.
You may have to create additional APIs in your component to enforce these rules, but following them will ensure that problems like the one described won't happen.
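The two rules above could be sketched like this. Everything here is hypothetical and compiled as a single file for illustration (`Widget`, `widget_create`, `widget_destroy`, `widget_name`, `widget_live_count`, `WidgetPtr`, `make_widget`); in a real project the implementation part would live inside the DLL and only the declarations would be visible to callers.

```cpp
#include <memory>
#include <string>

// --- Public interface of the component (opaque type plus C-style API) ---
struct Widget;
extern "C" Widget* widget_create(const char* name);
extern "C" void widget_destroy(Widget* w);
extern "C" const char* widget_name(const Widget* w);
extern "C" int widget_live_count();

// --- Component implementation (would be compiled into the DLL) ---
struct Widget {
    std::string name;
};

static int g_live_widgets = 0;

// Rule 1: allocation happens inside the component...
extern "C" Widget* widget_create(const char* name) {
    ++g_live_widgets;
    return new Widget{name};
}

// ...and so does deallocation, using the same heap and runtime.
extern "C" void widget_destroy(Widget* w) {
    if (w != nullptr) {
        --g_live_widgets;
        delete w;
    }
}

extern "C" const char* widget_name(const Widget* w) { return w->name.c_str(); }

// Rule 2: the component may be unloaded once this reaches zero.
extern "C" int widget_live_count() { return g_live_widgets; }

// --- Caller side: never calls delete itself, only the component's API ---
struct WidgetDeleter {
    void operator()(Widget* w) const { widget_destroy(w); }
};
using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

WidgetPtr make_widget(const char* name) {
    return WidgetPtr(widget_create(name));
}
```

A caller would write something like `WidgetPtr w = make_widget("gear");` and let the custom deleter route the release back through `widget_destroy`, so no `std::string` or other heap-owning object is ever freed by a different module than the one that allocated it.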