I'm maintaining a legacy application written in C++. It crashes every now and then, and Valgrind tells me it's a double delete of some object.
What are the best ways to find the bug that is causing a double delete in an application you don't fully understand and which is too large to be rewritten?
Please share your best tips and tricks!
Here are some general suggestions that have helped me in that situation:
Turn your logging level up to full debug, if you are using a logger. Look for suspicious stuff in the output. If your app doesn't log pointer allocations and deletes of the object/class under suspicion, it's time to insert some cout << "class Foo constructed, ptr= " << this << endl; statements in your code (and corresponding delete/destructor prints).
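For example, a minimal sketch of such instrumentation (Foo here is just a placeholder class name):

#include <iostream>

class Foo {
public:
    Foo()  { std::cout << "Foo constructed, ptr=" << this << std::endl; }
    ~Foo() { std::cout << "Foo destroyed,   ptr=" << this << std::endl; }
};

Two "destroyed" lines with the same pointer value and no "constructed" line in between is your double delete.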
Run valgrind with --db-attach=yes. I've found this very handy, if a bit tedious. Valgrind will show you a stack trace every time it detects a significant memory error or event, and then ask you if you want to debug it. You may find yourself repeatedly pressing 'n' many, many times if your app is large, but keep looking for the lines of code where the object in question is deleted the first (and then the second) time.
Just scour the code. Look for construction/deletion of the object in question. Sadly, sometimes it winds up being in a 3rd party library :-(.
Update: Just found this out recently: Apparently gcc 4.8 and later (if you can use GCC on your system) has some new built-in features for detecting memory errors, the "address sanitizer". Also available in the LLVM compiler system.
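For instance, with a reasonably recent GCC or Clang, enabling the address sanitizer is just a matter of flags:

g++ -g -fsanitize=address -fno-omit-frame-pointer -o myapp myapp.cpp

A double delete then aborts immediately with an "attempting double-free" report that includes the stack traces of both the offending free and the earlier one.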
Yep. What @OliCharlesworth said. There's no surefire way of testing a pointer to see if it points to allocated memory, since the pointer really is just the memory location itself.
The biggest problem your question implies is the lack of reproducibility. With that in mind, you're stuck with changing simple delete constructs to delete foo; foo = NULL;.
Even then the best case scenario is "it seems to occur less" until you've really stamped it down.
I'd also ask by what evidence Valgrind suggests it's a double-delete problem. Might be a better clue lingering around in there.
It's one of the simpler truly nasty problems.
This may or may not work for you.
Long ago I was working on a 1M+ line program that was 15 years old at the time. I faced the exact same problem: a double delete with a huge data set. With data like that, any out-of-the-box "memory profiler" would be a no-go.
Things that were on my side:
It was very reproducible: we had a macro language, and running the same script exactly the same way reproduced it every time.
Sometime during the history of the project, someone had decided that "#define malloc my_malloc" and "#define free my_free" had some use. These didn't do much more than call the built-in malloc() and free(), but the project already compiled and worked this way.
Now the trick/idea:
void* my_malloc(size_t size)
{
    static int allocation_num = 0;  // it was single-threaded
    char* p = (char*)builtin_malloc(size + 16);
    // check p for NULL here
    *(int*)p = ++allocation_num;    // header: allocation number
    *(p + sizeof(int)) = 0;         // header: "freed" flag, 0 = not freed
    return p + 16;                  // hand out the memory just past the header
}

void my_free(void* p)
{
    char* base = (char*)p - 16;     // step back to the header
    if (*(base + sizeof(int)))
    {
        // this is a double free: note the allocation number in *(int*)base,
        // then rerun the app with this in my_malloc:
        //   if (allocation_num == XXX) debug_break();
    }
    *(base + sizeof(int)) = 1;      // mark as freed
    // builtin_free(base);          // do not actually free until the problem is figured out
}
With new/delete it might be trickier, but still with LD_PRELOAD you might be able to replace malloc/free without even recompiling your app.
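For the malloc/free case, a minimal sketch of such an LD_PRELOAD shim (the file names and the logging are mine; a real interposer must also guard against re-entrancy, since fprintf itself may allocate):

// Build: g++ -shared -fPIC -o libshim.so shim.cpp -ldl
// Run:   LD_PRELOAD=./libshim.so ./myapp
#include <dlfcn.h>
#include <cstdio>

extern "C" void free(void* p)
{
    // look up the real free exactly once
    static void (*real_free)(void*) =
        (void (*)(void*))dlsym(RTLD_NEXT, "free");
    std::fprintf(stderr, "free(%p)\n", p);  // or set/check a header flag as above
    real_free(p);
}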
You are probably upgrading from a version that treated delete differently than the new version does.
Probably what the previous version did when delete was called was a static check, if (X != NULL) { delete X; X = NULL; }, and the new version just does the delete action.
You might need to go through and check for pointer assignments, tracking references of object names from construction to deletion.
I've found this useful: backtrace() on Linux. (You have to compile with -rdynamic.) This lets you find out where that double free is coming from by putting a try/catch block around all memory operations (new/delete); then, in the catch block, print out your stack trace.
This way you can narrow down the suspects much faster than running valgrind.
I wrapped backtrace in a handy little class so that I can just say:
try {
    ...
} catch (...) {
    StackTrace trace;
    std::cerr << "Double free!!!\n" << trace << std::endl;
    throw;
}
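The wrapper itself isn't shown above, but a minimal sketch of such a class (using backtrace() and backtrace_symbols() from execinfo.h) might look like this:

#include <execinfo.h>
#include <cstdlib>
#include <ostream>

class StackTrace {
    void* frames_[64];
    int count_;
public:
    StackTrace() : count_(backtrace(frames_, 64)) {}

    friend std::ostream& operator<<(std::ostream& os, const StackTrace& t) {
        char** symbols = backtrace_symbols(t.frames_, t.count_);
        for (int i = 0; i < t.count_; ++i)
            os << symbols[i] << '\n';
        std::free(symbols);  // backtrace_symbols mallocs the array
        return os;
    }
};

Remember to link with -rdynamic so the symbol names come out readable.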
On Windows, assuming the app is built with MSVC++, you can take advantage of the extensive heap debugging tools built into the debug version of the standard library.
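For instance, a sketch of turning the checks all the way up (these flags live in <crtdbg.h>; _CRTDBG_CHECK_ALWAYS_DF is slow, but it catches corruption at the offending call):

#include <crtdbg.h>

int main()
{
    // Validate the debug heap's guard bytes on every alloc/free,
    // and dump any leaks at program exit.
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF
                 | _CRTDBG_CHECK_ALWAYS_DF
                 | _CRTDBG_LEAK_CHECK_DF);
    // ... rest of the app ...
}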
Also on Windows, you can use Application Verifier. If I recall correctly, it has a mode that forces each allocation onto a separate page with protected guard pages in between. It's very effective at finding buffer overruns, but I suspect it would also be useful for a double-free situation.
Another thing you could do (on any platform) would be to make a copy of the sources that are transformed (perhaps with macros) so that every instance of:
delete foo;
is replaced with:
{ delete foo; foo = nullptr; }
(The braces help in many cases, though it's not perfect.) That makes a second delete through the same pointer a harmless no-op and turns many use-after-delete accesses into null-pointer dereferences, which are much easier to detect. It doesn't catch everything; you might have a copy of a stale pointer, but it can help squash a lot of the common use-after-delete scenarios.
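One way to do the transformation is with a classic delete-and-null macro (a sketch; the macro name is mine) applied in a search-and-replace pass:

#define SAFE_DELETE(p) do { delete (p); (p) = nullptr; } while (0)

// every "delete foo;" becomes:
SAFE_DELETE(foo);

The do { ... } while (0) wrapper serves the same purpose as the braces above: it keeps the expansion well-behaved inside if/else bodies.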
Related
I have an std::string which appears to be getting corrupted somehow. Sometimes the string destructor will trigger an access violation, and sometimes printing it via std::cout will produce a crash.
If I pad the string in a struct as follows, the back_padding becomes slightly corrupted at a relatively consistent point in my code:
struct Test {
    int front_padding[128] = {0};
    std::string my_string;
    int back_padding[128] = {0};
};
Is there a way to protect the front and back padding arrays so that writing to them will cause an exception or something? Or perhaps some tool which can be used to catch the culprit writing to this memory?
Platform: Windows x64 built with MSVC.
In general you have to solve the problem of code sanitation, which is quite a broad topic. It sounds like you may have an out-of-bounds write, a use of a dangling pointer, or even a race condition in using a pointer; in the latter case the bug's visibility is affected by observation, like the proverbial cat in a quantum superposition state.
A dirty way to debug the source of such a rogue write is to create a data breakpoint. It is especially effective if the bug appears to be deterministic and isn't a "heisenbug". This is possible in MSVS during a debug session; in gdb you can do it with watchpoints (the watch command).
You can point it at the std::string's storage or, in your experimental case, at the front padding array, in an attempt to trigger the breakpoint where the write occurs.
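For example, in gdb (assuming an instance of the struct above named test), a hardware watchpoint on the padding looks like this:

(gdb) watch -location test.front_padding[0]
(gdb) continue

gdb then stops at the exact instruction that writes to that address. In Visual Studio the equivalent is Debug > New Breakpoint > Data Breakpoint.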
How can you catch memory corruption in C++?
The best way with a modern compiler is to compile with an address sanitizer. This inserts exactly the sort of guard areas you describe around automatic (stack) and dynamic (heap) allocations, and detects when they're trampled. It's built into Clang, GCC and MSVC.
If you don't have compiler support, or need to diagnose the problem in an existing binary without recompiling, you can use Valgrind.
The sanitized executable is still ordinary compiled code and runs at nearly full speed, although it's doing more work and deliberately has a less cache-friendly memory layout; expect it to be about 2x slower than an equivalent un-instrumented build.
Running under Valgrind is much slower (expect 10x-30x for Memcheck), but it will catch more types of error, and it is your only option if you can't recompile.
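For reference, Valgrind needs no special build; debug info (-g) just makes the reports more readable:

g++ -g -o myapp myapp.cpp
valgrind --leak-check=full ./myapp

Memcheck is the default tool, so it doesn't need to be named explicitly.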
I have been using tcmalloc for a few months in a large project, and so far I must say that I am pretty happy about it, most of all for its HeapProfiling features, which allowed us to track down memory leaks and remove them.
In the past couple of weeks, though, we experienced random crashes in our application, and we could not find their source. In one very particular situation, when the application crashed, we found ourselves with a completely corrupted stack for one of the application threads. Several times I instead found threads stuck in tcmalloc::PageHeap::AllocLarge(), but since I don't have tcmalloc's debug symbols linked, I could not understand what the issue was.
After nearly one week of investigation, today I tried the most simple of things: removed tcmalloc from linkage to avoid using it, just to see what happened. Well... I finally found out what the problem was, and the offending code looks very much like this:
void AllocatingFunction()
{
    Object object_on_stack;
    ProcessObject(&object_on_stack);
}

void ProcessObject(Object* object)
{
    ...
    // Do whatever
    ...
    delete object;  // deletes an object that lives on the stack!
}
Using libc the application still crashed, but I finally saw that I was calling delete on an object which was allocated on the stack.
What I still can't figure out is why tcmalloc kept the application running regardless of this very risky (if not utterly wrong) object deallocation, and of the double deallocation when object_on_stack goes out of scope as AllocatingFunction ends. The fact is that the offending code could be called repeatedly without any hint of the underlying abomination.
I know that memory deallocation is "undefined behaviour" territory when not used properly, but my surprise is at such different behaviour between "standard" libc and tcmalloc.
Does anyone have some sort of explanation or insight on why tcmalloc keeps the application running?
Thanks in advance :)
Have a Nice Day
very risky (if not utterly wrong) object deallocation
Well, I disagree here: it is utterly wrong, and since you invoke UB, anything can happen.
It very much depends on what the tcmalloc code actually does on deallocation, and how it uses the (possibly garbage) data around the stack at that location.
I have seen tcmalloc crash on such occasions too, as well as glibc going into an infinite loop. What you see is just coincidence.
Firstly, there was no double free in your case. When object_on_stack goes out of scope, there is no free call; the stack pointer is just decreased (or rather increased, as the stack grows down...).
Secondly, during delete, tcmalloc should be able to recognize that an address from the stack does not belong to the program heap. Here is a part of the free(ptr) implementation:
const PageID p = reinterpret_cast<uintptr_t>(ptr) >> kPageShift;
Span* span = NULL;
size_t cl = Static::pageheap()->GetSizeClassIfCached(p);
if (cl == 0) {
    span = Static::pageheap()->GetDescriptor(p);
    if (!span) {
        // span can be NULL because the pointer passed in is invalid
        // (not something returned by malloc or friends), or because the
        // pointer was allocated with some other allocator besides
        // tcmalloc. The latter can happen if tcmalloc is linked in via
        // a dynamic library, but is not listed last on the link line.
        // In that case, libraries after it on the link line will
        // allocate with libc malloc, but free with tcmalloc's free.
        (*invalid_free_fn)(ptr); // Decide how to handle the bad free request
        return;
    }
The call to invalid_free_fn crashes.
I have some piece of code that runs many times and suddenly reports an access violation.
for (std::list<Projectile>::iterator it = projectiles.begin(); it != projectiles.end();) {
    bool finished = ...
    // try to print address of collides(), prints always 1
    std::cout << &Projectile::collides << std::endl;
    // here it crashes:
    if (it->collides(&hero)) {
        std::cout << "Projectile hits " << hero.getName() << std::endl;
        finished = it->apply(&hero);
    }
    // ...
    if (finished) {
        it = projectiles.erase(it);
    } else {
        ++it;
    }
}
So the VS debug stack trace says that at the line if (it->collides(&hero)) { the program tries to call a method at cdcdcdcd(), which causes the access violation.
it, *it and hero are valid objects according to VS.
So I assume that cdcdcdcd() should actually be collides(). Since collides is a non-virtual method, its address should basically not change, right?
The thing is that collides() executes successfully several times, but then suddenly it stops working.
Can it be that the address has changed? Have I overwritten it?
Please help me! I'd also appreciate information on anything else that is not fine with this code :)
0xcdcdcdcd is a fill pattern used by the Win32 Debug CRT Heap; assuming you're running in a debugger on Windows, this is a significant clue: 0xCD marks allocated-but-uninitialized heap memory (0xDD marks freed memory, 0xFD the guard bytes), so the program is calling through a pointer read from memory that was never initialized. There's a decent explanation of the heap fill patterns here.
My guess is that you're either invalidating the iterator elsewhere, or you have a buffer overflow, a dangling pointer, or some other such issue.
Application Verifier may help with diagnosing this. You may also want to look into the other things mentioned in the question How to debug heap corruption errors? or some of the techniques from Any reason to overload global new and delete? (disclaimer: at the moment I have the top-rated answer on both).
If your STL library has a debugging feature it may help ferret this out as well. Metrowerks (now Freescale) Codewarrior has the define _MSL_DEBUG, for example, that can be used to build a version of the standard libraries (including std::list) which will detect common issues like iterator invalidation (at some runtime cost).
It looks like Visual Studio has debug iterator support which might fit the bill, if you're using that.
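If you're on a recent Visual Studio, a sketch of pinning the checks on explicitly (in debug builds this level is already the default):

// must be defined identically across all translation units,
// before any standard header is included
#define _ITERATOR_DEBUG_LEVEL 2
#include <list>

With checked iterators enabled, touching an iterator that erase() invalidated trips an assertion instead of a mystery call through 0xcdcdcdcd.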
Your termination condition seems wrong [EDIT: actually it looks correct, even if it still scares me. See comments.]. When finished becomes true, you erase the current projectile and then set the iterator to... something. There is no reason to think that the loop will terminate at this point. It is likely that it will eventually either start pointing outside the list, or at an otherwise invalid object (which your debugger will not always flag as such).
When you erase the projectile, you should just explicitly leave the loop using "break".
I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, which happens only once in a while. About 95% of the time the code runs perfectly fine, but sometimes this seems to be corrupted somehow and it crashes.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit: As requested, the crashing code:
long CImageBuffer::Release()
{
    long nRefCount = InterlockedDecrement(&m_nRefCount);
    if (nRefCount == 0)
    {
        delete this;
    }
    return nRefCount;
}
The object has been created with a new, it is not in any kind of array.
The most obvious answer is: don't delete this.
If you insist on doing that, then use the common ways of finding bugs:
1. use valgrind (or a similar tool) to find memory access problems
2. write unit tests
3. use a debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and, in the case of an overridden operator new or multiple copies of the C++ runtime, by the particular new that matches the delete found in the current scope).
Crashes upon deallocation can be a pain: they are not supposed to happen, and when they do, the code is usually too complicated for the cause to be found easily.
Note: the use of InterlockedDecrement has me assuming you are working on Windows.
Log everything
My own solution was to massively log construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value and other relevant data.
Log the destruction, including the this pointer value and other relevant data.
This way, you'll be able to see whether the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to find code on the internet that prints out the current stack for each log entry. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in a debug build, or have the PDB file alongside the executable, to make sure the stack is fully printed.
... everything, including multiple crashes
I believe you are on Windows: you could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all instead of only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I even went as far as using maps to remember the addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but Google is your friend. The thing to remember is that you can't have a __try/__except handler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception, as in the sketch below.
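A minimal sketch of such an intermediary function (MSVC-specific; the wrapper's name and signature are mine):

#include <windows.h>

// A function using __try/__except must not contain objects needing
// C++ unwinding, hence this separate, plain wrapper.
DWORD CallWithSehGuard(void (*fn)())
{
    __try {
        fn();
        return 0;                   // mark "OK" in the logs
    }
    __except (EXCEPTION_EXECUTE_HANDLER) {
        return GetExceptionCode();  // mark "CRASHED" and keep running
    }
}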
Is your crash thread-related?
Last, but not least, the "it happens only 5% of the time" symptom could be caused by different code paths being executed, or by multiple threads playing with the same data.
The InterlockedDecrement part bothers me: is your object living in multiple threads? And is m_nRefCount a correctly aligned, volatile LONG?
The correctly aligned and LONG parts are important here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on a 32-bit boundary. Are there #pragma pack() instructions in your code? Does your project file change the default alignment (I assume you're working in Visual Studio)?
As for the volatile part, InterlockedDecrement seems to generate a read/write memory barrier, so the volatile should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).
I have a C/C++ program that might be hanging when it runs out of memory. We discovered this by running many copies at the same time. I want to debug the program without completely destroying performance on the development machine. Is there a way to limit the memory available so that new or malloc will return a NULL pointer after, say, 500K of memory has been requested?
Try turning the question on its head and asking how to limit the amount of memory the OS will allow your process to use.
Try looking into http://ss64.com/bash/ulimit.html
Try, say:
ulimit -v 500
(The limit is given in kilobytes, so that caps the process's virtual memory at roughly the 500K from your question.)
Here is another link that's a little old but gives a little more back ground:
http://www.network-theory.co.uk/docs/gccintro/gccintro_77.html
One way is to write a wrapper around malloc().
#include <cstdlib>  // malloc

#define LIMIT (500 * 1024)  // whatever cap you want, in bytes

static unsigned int requested = 0;

void* my_malloc(size_t amount)
{
    if (requested + amount < LIMIT) {
        requested += amount;
        return malloc(amount);
    }
    return NULL;
}
You could use a #define to overload your malloc.
As GMan states, you could overload new / delete operators as well (for the C++ case).
Not sure if that's the best way, or what you are looking for.
Which OS? For Unix, see ulimit -d/limit datasize depending on your shell (sh/csh).
You can write a wrapper for malloc which returns an error in the circumstances you want. Depending on your OS, you may be able to substitute it for the implementation's one.
Another way of doing it is to use failmalloc, which is a shared library that overrides malloc etc. and then fails :-). It gives you control over when to fail, and it can be made to fail randomly, every nth time, etc.
I haven't used it myself but have heard good things.
That depends on your platform. For example, this can be achieved programmatically on Unix-like platforms using setrlimit(RLIMIT_DATA, ...).
EDIT:
The RLIMIT_AS resource may also be useful in this case.
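A minimal sketch (the 500K figure comes from the question; error handling is omitted):

#include <sys/resource.h>

int main()
{
    // Cap the data segment at 500 KB; allocations beyond
    // that will fail instead of succeeding.
    struct rlimit rl = { 500 * 1024, 500 * 1024 };
    setrlimit(RLIMIT_DATA, &rl);
    // ... run the code under test ...
}

As noted above, RLIMIT_AS can be the more reliable choice, since modern allocators often use mmap rather than the data segment.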
Override new and new[].
#include <new>      // std::bad_alloc
#include <cstdlib>  // std::malloc

void* operator new(std::size_t s)
{
    // count calls here and throw std::bad_alloc after X of them
    void* p = std::malloc(s);
    if (!p) throw std::bad_alloc();
    return p;
}

void* operator new[](std::size_t s) { return operator new(s); }
Put your own counting code where the comment is to selectively die after X number of calls to new; otherwise you just call malloc to allocate the memory and return it.
I once had a student in CS 1 (in C, yeah, yeah, not my fault) try this, and ran out of memory:
int array[42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42][42]..... (42 dimensions);
and then he wanted to know why it gave errors...
If you want to spend money, there's a tool called Holodeck by Security Innovations which lets you inject faults into your program (including low memory). The nice thing is you can turn stuff on and off at will. I haven't really used it much, so I don't know whether it's possible to program in faults at certain points with the tool. I also don't know what platforms are supported...
As far as I know, on Linux malloc will generally not return a null pointer: the kernel overcommits memory, and when it actually runs out, the OOM Killer gets called instead. This is, of course, unless you've disabled overcommit. Some googling should come up with a result.
I know this isn't your actual question, but it does have to do with where you're coming from.
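For completeness, strict accounting (so that malloc really does return NULL when the limit is hit) is a system-wide sysctl, so use it with care:

# "2" means never overcommit: fail allocations instead
sysctl -w vm.overcommit_memory=2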