I have some piece of code that runs many times and suddenly reports an access violation.
for(std::list<Projectile>::iterator it = projectiles.begin(); it != projectiles.end();) {
    bool finished = ...
    // try to print address of collides(), prints always 1
    std::cout << &Projectile::collides << std::endl;
    // here it crashes:
    if(it->collides(&hero)) {
        std::cout << "Projectile hits " << hero.getName() << std::endl;
        finished = it->apply(&hero);
    }
    // ...
    if(finished) {
        it = projectiles.erase(it);
    } else {
        ++it;
    }
}
So VS debug stacktrace says that in line if(it->collides(&hero)) { the program tries to call a method at cdcdcdcd() which causes the access violation.
it, *it and hero are valid objects according to VS.
So I assume that cdcdcdcd() should actually be collides(). Since collides is a non-virtual method, its address should not change, right?
The thing is that collides() is executed successfully several times before, but suddenly it does not work anymore.
Can it be that the address is changed? Have I overwritten it?
Please help me! Also I appreciate information on anything that is not fine with this code :)
0xcdcdcdcd is a fill pattern used by the Win32 Debug CRT Heap; assuming you're running in a debugger on Windows, this may be a significant clue. There's a decent explanation of the heap fill patterns here.
My guess is you're somehow either invalidating the iterator elsewhere, or have some other buffer overflow or dangling pointer or other issue.
Application Verifier may help with diagnosing this. You may also want to look into the other things mentioned in the question How to debug heap corruption errors? or some of the techniques from Any reason to overload global new and delete? (disclaimer: at the moment I have the top-rated answer on both).
If your STL library has a debugging feature it may help ferret this out as well. Metrowerks (now Freescale) Codewarrior has the define _MSL_DEBUG, for example, that can be used to build a version of the standard libraries (including std::list) which will detect common issues like iterator invalidation (at some runtime cost).
It looks like Visual Studio has debug iterator support which might fit the bill, if you're using that.
Your termination condition seems wrong [EDIT: actually it looks correct, even if it still scares me. See comments.]. When finished becomes true, you erase the current projectile and then set the iterator to ... something. There is no reason to think that the loop will terminate at this point. It is likely that it will eventually either start pointing outside of the list, or to an otherwise invalid object (which your debugger will not always flag as such).
When you erase the projectile, you should just explicitly leave the loop using "break".
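As the EDIT note above concedes, the erase-and-continue loop from the question is a valid idiom: std::list::erase returns the iterator that follows the erased element, so the loop still reaches end(). A minimal sketch of that idiom, where erase_finished and its predicate are hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <list>

// Hypothetical helper: removes every element for which done(element) is true
// and returns how many elements were removed.
template <typename T, typename Pred>
std::size_t erase_finished(std::list<T>& items, Pred done) {
    std::size_t erased = 0;
    for (typename std::list<T>::iterator it = items.begin(); it != items.end();) {
        if (done(*it)) {
            // erase() returns the next valid iterator (possibly end()),
            // so `it` is never left pointing at the removed node.
            it = items.erase(it);
            ++erased;
        } else {
            ++it;
        }
    }
    return erased;
}
```

If the crash persists with a loop shaped like this, the loop itself is probably not the culprit, and something else is modifying the list or the objects in it.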
I'm maintaining a legacy application written in C++. It crashes every now and then, and Valgrind tells me it's a double delete of some object.
What are the best ways to find the bug that is causing a double delete in an application you don't fully understand and which is too large to be rewritten?
Please share your best tips and tricks!
Here are some general suggestions that have helped me in that situation:
Turn your logging level up to full debug, if you are using a logger. Look for suspicious stuff in the output. If your app doesn't log pointer allocations and deletes of the object/class under suspicion, it's time to insert some cout << "class Foo constructed, ptr= " << this << endl; statements in your code (and corresponding delete/destructor prints).
Run valgrind with --db-attach=yes. I've found this very handy, if a bit tedious. Valgrind will show you a stack trace every time it detects a significant memory error or event and then ask you if you want to debug it. You may find yourself repeatedly pressing 'n' many many times if your app is large, but keep looking for the line of code where the object in question is first (and secondly) deleted.
Just scour the code. Look for construction/deletion of the object in question. Sadly, sometimes it winds up being in a 3rd party library :-(.
Update: Just found this out recently: Apparently gcc 4.8 and later (if you can use GCC on your system) has some new built-in features for detecting memory errors, the "address sanitizer". Also available in the LLVM compiler system.
Yep. What @OliCharlesworth said. There's no surefire way of testing a pointer to see if it points to allocated memory, since it really is just the memory location itself.
The biggest problem your question implies is the lack of reproducibility. Continuing with that in mind, you're stuck with changing simple 'delete' constructs to delete foo; foo = NULL;.
Even then the best case scenario is "it seems to occur less" until you've really stamped it down.
I'd also ask by what evidence Valgrind suggests it's a double-delete problem. Might be a better clue lingering around in there.
It's one of the simpler truly nasty problems.
This may or may not work for you.
Long time ago I was working on 1M+ lines program that was 15 years old at the time. Faced with the exact same problem - double delete with huge data set. With such data any out of the box "memory profiler" would be a no go.
Things that were on my side:
It was very reproducible - we had macro language and running same script exactly the same way reproduced it every time
Sometime during the history of the project someone decided that "#define malloc my_malloc" and "#define free my_free" had some use. These didn't do much more than call built-in malloc() and free() but project already compiled and worked this way.
Now the trick/idea:
void* my_malloc(size_t size)
{
    static int allocation_num = 0; // it was single threaded
    char* p = (char*)builtin_malloc(size + 16); // 16-byte header in front of the user block
    if (!p) return NULL;
    *(int*)p = ++allocation_num; // header: allocation number
    *(p + sizeof(int)) = 0; // header: not freed
    return p + 16;
}
void my_free(void* p)
{
    if (!p) return;
    char* header = (char*)p - 16; // step back from the user pointer to the header
    if (*(header + sizeof(int)))
    {
        // this is double free, check allocation number (*(int*)header)
        // then rerun app with this in my_malloc
        // if (allocation_num == XXX) debug_break();
    }
    *(header + sizeof(int)) = 1; // freed
    //builtin_free(header); // do not do this until problem is figured out
}
With new/delete it might be trickier, but still with LD_PRELOAD you might be able to replace malloc/free without even recompiling your app.
You are probably upgrading from a version that treated delete differently than the new version does.
Probably what the previous version did was, when delete was called, a static check like if (X != NULL){ delete X; X = NULL;}, whereas the new version just does the delete action.
You might need to go through and check pointer assignments, tracking references to object names from construction to deletion.
I've found this useful: backtrace() on linux. (You have to compile with -rdynamic.) This lets you find out where that double free is coming from by putting a try/catch block around all memory operations (new/delete) then in the catch block, print out your stack trace.
This way you can narrow down the suspects much faster than running valgrind.
I wrapped backtrace in a handy little class so that I can just say:
try {
    ...
} catch (...) {
    StackTrace trace;
    std::cerr << "Double free!!!\n" << trace << std::endl;
    throw;
}
On Windows, assuming the app is built with MSVC++, you can take advantage of the extensive heap debugging tools built into the debug version of the standard library.
Also on Windows, you can use Application Verifier. If I recall correctly, it has a mode that forces each allocation onto a separate page with protected guard pages in between. It's very effective at finding buffer overruns, but I suspect it would also be useful for a double-free situation.
Another thing you could do (on any platform) would be to make a copy of the sources that are transformed (perhaps with macros) so that every instance of:
delete foo;
is replaced with:
{ delete foo; foo = nullptr; }
(The braces help in many cases, though it's not perfect.) That will turn many instances of double-free into a null pointer reference, making it much easier to detect. It doesn't catch everything; you might have a copy of a stale pointer, but it can help squash a lot of the common use-after-delete scenarios.
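Where a source transformation is impractical, the same delete-then-null idea can be sketched as a small function template. The name delete_and_null is hypothetical, and it only helps for the one pointer variable passed in, not for copies of it:

```cpp
// Hypothetical helper, not part of any library: deletes the object and
// resets the caller's pointer so that a repeated call becomes harmless.
template <typename T>
void delete_and_null(T*& p) {
    delete p;     // deleting a null pointer is a no-op
    p = nullptr;  // a later use now dereferences null instead of freed memory
}
```

Taking the pointer by reference is the point of the design: the caller's variable is cleared, not a local copy.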
I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
    long nRefCount = InterlockedDecrement(&m_nRefCount);
    if(nRefCount == 0)
    {
        delete this;
    }
    return nRefCount;
}
The object has been created with a new, it is not in any kind of array.
The most obvious answer is: don't delete this.
If you insist on doing that, then use the common ways of finding bugs:
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement makes me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to find code on the internet that prints out the current stack for each log entry. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but Google is your friend here. The thing to remember is that you can't have a __try/__except handler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
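The idea of remembering allocation and deallocation addresses in a map can be sketched as follows. LifetimeTracker and its method names are hypothetical, and the sketch is not thread-safe; calls to it would go in the constructor and destructor of the suspect class:

```cpp
#include <map>

// Hypothetical tracker: records, per address, whether an object is alive.
class LifetimeTracker {
public:
    enum Result { Ok, DoubleDestroy, NeverConstructed };

    // Call from the constructor with `this`.
    void constructed(const void* p) { alive_[p] = true; }

    // Call from the destructor with `this`; reports suspicious sequences.
    Result destroyed(const void* p) {
        std::map<const void*, bool>::iterator it = alive_.find(p);
        if (it == alive_.end()) return NeverConstructed;
        if (!it->second) return DoubleDestroy;
        it->second = false;  // keep the entry so a second destroy is noticed
        return Ok;
    }

private:
    std::map<const void*, bool> alive_;  // address -> currently alive?
};
```

Because addresses get reused by the allocator, constructed() deliberately overwrites a stale "dead" entry with a live one.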
Is your crash thread-related?
Last, but not least, the "It happens only 5% of the time" symptom could be caused by different code paths being executed, or by multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on a 32-bit boundary. Are there #pragma pack() instructions in your code? Does your project file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seems to generate a read/write memory barrier, so volatile should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).
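For comparison, here is a portable sketch of the same reference-counting pattern using std::atomic<long> instead of a LONG with InterlockedDecrement. This swaps in a different mechanism from the one in the question; RefCounted and the destroyed flag are hypothetical and exist only so the behaviour can be observed:

```cpp
#include <atomic>

// Hypothetical class; std::atomic takes care of size and alignment itself.
class RefCounted {
public:
    explicit RefCounted(bool* destroyed_flag)
        : refs_(1), destroyed_(destroyed_flag) {}

    long AddRef() { return ++refs_; }

    long Release() {
        long remaining = --refs_;  // atomic read-modify-write
        if (remaining == 0) {
            delete this;
        }
        // Only the local copy is used here: members must not be touched
        // after `delete this`.
        return remaining;
    }

private:
    // Private destructor: the object can only be destroyed through Release().
    ~RefCounted() {
        if (destroyed_) *destroyed_ = true;
    }

    std::atomic<long> refs_;
    bool* destroyed_;
};
```

The important detail, which the Release() in the question already gets right, is to return the local result of the decrement rather than re-reading the member.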
Sorry if this sounds like an "It compiles, so it must work!" question, but I want to understand why something is happening (or not happening, as the case may be).
In Project Settings, I set Basic Runtime Checks to Both. The debugger informs me that:
Run-Time Check Failure #2 - Stack around the variable 'beg' was corrupted.
But if I set it to the default, which is none, the program runs and completes normally, throwing no exceptions and causing no errors.
My question is, can I safely ignore this (because MSVC++ could be somehow wrong) or is this a real problem? I don't see how the program can continue successfully when the stack has been screwed up.
Edit:
The function that causes this error looks exactly like this:
int fun(list<int>::iterator&, const list<int>::iterator&);
int foo(list<int>& l) {
    list<int>::iterator beg = l.begin();
    list<int>::iterator end = l.end();
    return fun(beg, end);
}
fun increments and operates on beg and when it returns, beg == end, and when MSVC++ breaks, it points to the closing }.
Edit 2:
I have isolated the problem. In some situations, fun removes some elements from the list it is iterating over. This is what causes the error.
Your question isn't answerable without code to reproduce the problem.
But to give a vague answer to your general problem - If the compiler or debugger detected a problem, you probably have one.
In C++, just because something "goes wrong" doesn't mean your program will crash - it might keep running with completely unpredictable results. It may even complete with the results you desired. But just because it ran well on your system doesn't give you any guarantee for other systems, compilers, times of day, or even for additional runs of the same program.
This is called undefined behavior, and is caused by using the language incorrectly (but not in a way that causes a compile failure). A buffer overrun is only one of dozens of examples.
It turned out something was wrong with my Visual Studio installation, so reinstalling it fixed the problem.
So I have been debugging this error for hours now. I am writing a program using Ogre3d, which is relevant only because it doesn't load symbols, so I can't get a stack trace, which made finding the location of the crash even harder. So, right before I call a specific function I print "Starting", then I call the function, and immediately after I print "Stopping". Throughout the function I print the letters A-F, where F is printed right before the function returns (one line above the last '}'). The weird thing is that when the crash occurs, it is after the 'F' is printed, but there is no 'Stopping'. Does this mean that the crash is happening somewhere in between? The only thing I can think of is something going wrong during the deallocation of some of the memory allocated during the function. I've never had anything like this happen; I will keep checking to make sure it's going wrong where I think it is.
Most of the time, when something weird and incomprehensible happens, it's because of something else.
You could have some dangling pointers in your code (even in a place far away from that function) pointing to some random memory cells.
You might have used such dangling pointer, and it might have resulted in overwriting some memory cells you need. The result of this is that you changed the behavior of your program by changing some variable defined elsewhere, some constants, or even some code!
I'd suggest you to debug your application using some tool able to check and report erroneous memory accesses, like Valgrind.
Anyway, if you are able to localize the source of your crash and write a really small piece of code that reproduces it, post it here. It could be just a simple error in your function, although from your description that sounds unlikely.
This probably means that the error is happening when the function returns and some destructor is firing. Chances are that you have some destructor trying to free memory it doesn't own, or writing off the end of some buffer in a log, etc.
Another possibility to be aware of might come up if you aren't flushing the output stream. It's possible that "Stopping" is getting printed, but is being buffered before hitting stdout. Make sure to check for this, since if that's what's going on you'll be barking up the wrong tree.
I had a similar problem, and it turned out that my function was not returning anything when the signature expected a return type of std::shared_ptr, even though I was not using the return anywhere.
The function had the following signature:
std::shared_ptr<blDataNode> blConditionBasedDataSelectionUI::selectData(std::shared_ptr<blDataNode> inputData)
{
    // My error was due to the function
    // not returning anything
}
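Flowing off the end of a non-void function is undefined behavior, which is why the crash can show up right at the closing brace even when the result is never used. A sketch of the corrected shape, where DataNode is a hypothetical stand-in for blDataNode and the selection logic is elided:

```cpp
#include <memory>

struct DataNode { int value; };  // hypothetical stand-in for blDataNode

// Every control path returns a valid shared_ptr (possibly an empty one).
std::shared_ptr<DataNode> selectData(std::shared_ptr<DataNode> inputData) {
    if (!inputData) {
        return std::shared_ptr<DataNode>();  // explicit empty result
    }
    // ... selection logic would go here ...
    return inputData;
}
```

GCC and Clang can flag the missing return at compile time with -Wreturn-type, which is included in -Wall.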
I encountered the same problem, and it turned out I forgot to clear my vector before appending new items, which caused an error when my function was comparing the vector with another list.
std::vector<cv::Point> lefteyeCV;
void Init() {
    // I need to add "lefteyeCV.clear();" here!
    for (int i = 0; i < 8; i++) {
        lefteyeCV.push_back(cv::Point(0, 0));
    }
}
// the following comparison will crash after "return 0"
// because cl_ is of size 8, but if I run "Init()" twice, lefteyeCV.size() = 16
// then the comparison is out of range.
int irisTrack(){
    for (int i = 0; i < lefteyeCV.size(); i++) {
        cl_[i] = cv::Point(lefteyeCV[order[i]].x - leftRect.x, lefteyeCV[order[i]].y - leftRect.y);
    }
    return 0;
}
What's confusing is that I'm using Xcode, and the app crashes right after "return 0" with the indecipherable message "thread 13: signal SIGABRT". However, using Visual Studio instead showed me the line where the index is out of range.
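One way to get a clear error on any toolchain is to use at() instead of operator[], since at() throws std::out_of_range rather than silently writing past the end. A simplified sketch with int elements in place of cv::Point; init_points and copy_checked are hypothetical names:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Clears first, so calling this twice still leaves exactly 8 elements.
void init_points(std::vector<int>& points) {
    points.clear();
    for (int i = 0; i < 8; ++i) {
        points.push_back(0);
    }
}

// Copies src into dst with bounds checking; returns false instead of
// corrupting memory when src is larger than dst.
bool copy_checked(const std::vector<int>& src, std::vector<int>& dst) {
    try {
        for (std::size_t i = 0; i < src.size(); ++i) {
            dst.at(i) = src.at(i);  // at() throws on an out-of-range index
        }
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With unchecked indexing the overrun damages whatever sits next to the buffer, which is why the crash surfaced only at the function's return.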
I'm changing over from Visual Studio 2008 -> 2010 and I've come across a weird bug in my code when evaluating a find on a std::set of pointers.
I know that this version brings a change where set::iterator has the same type as set::const_iterator, for compatibility with the standard. But I can't figure out why this section of code, which previously worked, now causes a crash.
void checkStop(Stop* stop)
{
    set<Stop*> m_mustFindStops;
    if (m_mustFindStops.find(stop) != m_mustFindStops.end()) // this line crashes for some reason??
    {
        // do some stuff
    }
}
PS m_mustFindStops is empty when it crashes.
EDIT: Thanks for the quick replies... I can't get it to reproduce with a simple case either, so it's probably not a problem with the set itself. I think that heap corruption may be the culprit. I just wish I knew why changing compilers would suddenly cause corruption for the same code and same input data.
The only thing I can think of is that you have multiple threads, and m_mustfindStops is in fact a member or global variable and not a local to this function. There is no way the code above can cause problems, if correct and taken in isolation.
If you have multiple threads, then read access concurrent with write access will cause random errors - even if the container looks empty, it might not have been when the find call started.
Another possibility is that some other code has corrupted the heap, in which case however any of your code that uses heap memory could malfunction. With that in mind, if it's always this logic that breaks, my bet would be on a threading issue.
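If the set does turn out to be shared between threads, every read and write has to go through the same lock. A minimal sketch, assuming C++11's std::mutex is available (it may not be on older toolchains); GuardedStopSet and the empty Stop struct are hypothetical:

```cpp
#include <mutex>
#include <set>

struct Stop {};  // hypothetical stand-in for the real Stop class

// Hypothetical wrapper: all access to the set is serialized by one mutex.
class GuardedStopSet {
public:
    void add(Stop* stop) {
        std::lock_guard<std::mutex> lock(mutex_);
        stops_.insert(stop);
    }
    void remove(Stop* stop) {
        std::lock_guard<std::mutex> lock(mutex_);
        stops_.erase(stop);
    }
    bool contains(Stop* stop) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return stops_.find(stop) != stops_.end();
    }

private:
    mutable std::mutex mutex_;  // mutable so contains() can lock while const
    std::set<Stop*> stops_;
};
```

The lookup needs the lock as much as the insert does: a find() that runs while another thread rebalances the tree can follow a dangling node pointer.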
btw - there is absolutely nothing wrong with std::set in Visual C++ v10 - your code must have a bug.