I have an std::string which appears to be getting corrupted somehow. Sometimes the string destructor will trigger an access violation, and sometimes printing it via std::cout will produce a crash.
If I pad the string in a struct as follows, the back_padding becomes slightly corrupted at a relatively consistant point in my code:
struct Test {
int front_padding[128] = {0};
std::string my_string;
int back_padding[128] = {0};
};
Is there a way to protect the front and back padding arrays so that writing to them will cause a exception or something? Or perhaps some tool which can be used to catch the culprit writing to this memory?
Platform: Windows x64 built with MSVC.
In general you have to solve problem of code sanitation, which is quite a broad topic. It sounds like you may have either out-of-bound write, or use of a dangling pointer or even a race condition in using a pointer, but in latter case bug's visibility is affected by obsevation, like the proverbial cat in quantum superposition state.
A dirty way to debug source of such rogue write is to create a data breakpont. It is especially effective if bug appears to be deterministic and isn't a "heisenbug". It is possible in MSVS during debug session. In gdb it is possible by using watch breakpoints.
You can point at the std::string storage or, in your experimental case, at the front padding array to in attempt to trigger breakpoint where a write operation occurs.
How can you catch memory corruption in C++?
The best way with a modern compiler is to compile with an address sanitizer. This inserts exactly the sort of guard areas you describe around automatic (stack) and dynamic (heap) allocations, and detects when they're trampled. It's built into Clang, GCC and MSVC.
If you don't have compiler support, or need to diagnose the problem in an existing binary without recompiling, you can use Valgrind.
The sanitized executable runs at full speed, although it's doing more work and deliberately has a less cache-friendly memory layout; expect it to be about 2x slower than an equivalent un-instrumented build.
Running under valgrind is much slower (expect 10x-30x for memcheck), but will catch more types of error, and is your only option if you can't recompile.
Related
I have what seems to be a memory corruption in my application but it seems I cannot find the source of it with any of the following tools: gdb, valgrind, address sanitizer, rr (seems like my processor is too old for this).
I found out about the problem because our signal handler reports a signal when accessing that specific memory.
Valgrind was slightly helpful in saying that I have an invalid read of size 8 (because that memory is accessed through the pointer) and asan reported that the pointer, which in code is of type A*, is actually an unrelated type B.
Sadly I cannot give a sscce so I'd like to know how would one continue debugging in this situation ? are there any other tools I could try ?
Platform is RHEL linux kernel version 3.10
Compiler GCC 6.3.1
I'd have a suggestion how to proceed. First some assumptions:
You know how to reproduce the bug.
It's not a random corruption but in one place or at least a restricted area.
The basic idea is that you regularly trigger the access to the corrupted area that causes the signal. This increases the likelihood that the corruption is detected close to the point that caused it. For that, I'd suggest using a simple object that takes the data structure in construction and validates it:
struct validator
{
validator(data_structure const& data):
m_data(data)
{
validate();
}
~validator()
{
validate();
}
void validate()
{
// Code that walks over the data structure
// and possibly trips over corruptions.
}
data_structure const& m_data;
};
Just sprinkle instances of this throughout your code. When going through the generated crashes, you will find that some happen in areas that don't usually have anything to do with that data structure while others happen in code that really uses the data structure. This distinction is completely meaningless, because the fact that it detects the corruption doesn't mean it caused the corruption! However, if you collect more and more of these crashes, you will find that some code is more likely to be involved, and exactly that code should be surveyed more closely.
The larger the effort is to reproduce the bug, the more difficult it becomes to collect meaningful amounts of data. Therefore, it becomes important to be able to reproduce what you did and what results you received. I would therefore move the bug hunt to a separate branch in version control and include a file with a protocol in version control, too. Of course, that branch isn't intended to be merged back, but it provides relevant context information.
I'm maintaining a legacy application written in C++. It crashes every now and then and Valgrind tells me its a double delete of some object.
What are the best ways to find the bug that is causing a double delete in an application you don't fully understand and which is too large to be rewritten ?
Please share your best tips and tricks!
Here's some general suggestion's that have helped me in that situation:
Turn your logging level up to full debug, if you are using a logger. Look for suspicious stuff in the output. If your app doesn't log pointer allocations and deletes of the object/class under suspicion, it's time to insert some cout << "class Foo constructed, ptr= " << this << endl; statements in your code (and corresponding delete/destructor prints).
Run valgrind with --db-attach=yes. I've found this very handy, if a bit tedious. Valgrind will show you a stack trace every time it detects a significant memory error or event and then ask you if you want to debug it. You may find yourself repeatedly pressing 'n' many many times if your app is large, but keep looking for the line of code where the object in question is first (and secondly) deleted.
Just scour the code. Look for construction/deletion of the object in question. Sadly, sometimes it winds up being in a 3rd party library :-(.
Update: Just found this out recently: Apparently gcc 4.8 and later (if you can use GCC on your system) has some new built-in features for detecting memory errors, the "address sanitizer". Also available in the LLVM compiler system.
Yep. What #OliCharlesworth said. There's no surefire way of testing a pointer to see if it points to allocated memory, since it really is just the memory location itself.
The biggest problem your question implies is the lack of reproducability. Continuing with that in mind, you're stuck with changing simple 'delete' constructs to delete foo;foo = NULL;.
Even then the best case scenario is "it seems to occur less" until you've really stamped it down.
I'd also ask by what evidence Valgrind suggests it's a double-delete problem. Might be a better clue lingering around in there.
It's one of the simpler truly nasty problems.
This may or may not work for you.
Long time ago I was working on 1M+ lines program that was 15 years old at the time. Faced with the exact same problem - double delete with huge data set. With such data any out of the box "memory profiler" would be a no go.
Things that were on my side:
It was very reproducible - we had macro language and running same script exactly the same way reproduced it every time
Sometime during the history of the project someone decided that "#define malloc my_malloc" and "#define free my_free" had some use. These didn't do much more than call built-in malloc() and free() but project already compiled and worked this way.
Now the trick/idea:
my_malloc(int size)
{
static int allocation_num = 0; // it was single threaded
void* p = builtin_malloc(size+16);
*(int*)p = ++allocation_num;
*((char*)p+sizeof(int)) = 0; // not freed
return (char*)p+16; // check for NULL in order here
}
my_free(void* p)
{
if (*((char*)p+sizeof(int)))
{
// this is double free, check allocation_number
// then rerun app with this in my_alloc
// if (alloc_num == XXX) debug_break();
}
*((char*)p+sizeof(int)) = 1; // freed
//built_in_free((char*)p-16); // do not do this until problem is figured out
}
With new/delete it might be trickier, but still with LD_PRELOAD you might be able to replace malloc/free without even recompiling your app.
you are probably upgrading from a version that treated delete differently then the new version.
probably what the previous version did was when delete was called it did a static check for if (X != NULL){ delete X; X = NULL;} and then in the new version it just does the delete action.
you might need to go through and check for pointer assignments, and tracking references of object names from construction to deletion.
I've found this useful: backtrace() on linux. (You have to compile with -rdynamic.) This lets you find out where that double free is coming from by putting a try/catch block around all memory operations (new/delete) then in the catch block, print out your stack trace.
This way you can narrow down the suspects much faster than running valgrind.
I wrapped backtrace in a handy little class so that I can just say:
try {
...
} catch (...) {
StackTrace trace;
std::cerr << "Double free!!!\n" << trace << std::endl;
throw;
}
On Windows, assuming the app is built with MSVC++, you can take advantage of the extensive heap debugging tools built into the debug version of the standard library.
Also on Windows, you can use Application Verifier. If I recall correctly, it has a mode the forces each allocation onto a separate page with protected guard pages in between. It's very effective at finding buffer overruns, but I suspect it would also be useful for a double-free situation.
Another thing you could do (on any platform) would be to make a copy of the sources that are transformed (perhaps with macros) so that every instance of:
delete foo;
is replaced with:
{ delete foo; foo = nullptr; }
(The braces help in many cases, though it's not perfect.) That will turn many instances of double-free into a null pointer reference, making it much easier to detect. It doesn't catch everything; you might have a copy of a stale pointer, but it can help squash a lot of the common use-after-delete scenarios.
I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
long nRefCount = InterlockedDecrement(&m_nRefCount);
if(nRefCount == 0)
{
delete this;
}
return nRefCount;
}
The object has been created with a new, it is not in any kind of array.
The most obvious answer is : don't delete this.
If you insists on doing that, then use common ways of finding bugs :
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement have me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to load code from internet to print out the current stack for each log. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend, but the thing to remember is that you can't have a __try/__except handdler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "I happens only 5% of the time" symptom could be caused by different code path executions, or the fact you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on 32-byte boundaries. Is there #pragma pack() instructions in your code? Does your projet file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seem to generate a Read/Write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).
I have a very simple C++ code here:
char *s = new char[100];
strcpy(s, "HELLO");
delete [] s;
int n = strlen(s);
If I run this code from Visual C++ 2008 by pressing F5 (Start Debugging,) this always result in crash (Access Violation.) However, starting this executable outside the IDE, or using the IDE's Ctrl+F5 (Start without Debugging) doesn't result in any crash. What could be the difference?
I also want to know if it's possible to stably reproduce the Access Violation crash caused from accessing deleted area? Is this kind of crash rare in real-life?
Accessing memory through a deleted pointer is undefined behavior. You can't expect any reliable/repeatable behavior.
Most likely it "works" in the one case because the string is still "sitting there" in the now available memory -= but you cannot rely on that. VS fills memory with debug values to help force crashes to help find these errors.
The difference is that a debugger, and debug libraries, and code built in "debug" mode, likes to break stuff that should break. Your code should break (because it accesses memory it no longer technically owns), so it breaks easier when compiled for debugging and run in the debugger.
In real life, you don't generally get such unsubtle notice. All that stuff that makes things break when they should in the debugger...that stuff's expensive. So it's not checked as strictly in release. You might be able 99 times out of 100 to get away with freeing some memory and accessing it right after, cause the runtime libs don't always hand the memory back to the OS right away. But that 100th time, either the memory's gone, or another thread owns it now and you're getting the length of a string that's no longer a string, but a 252462649-byte array of crap that runs headlong into unallocated (and thus non-existent, as far as you or the runtime should care) memory. And there's next to nothing to tell you what just happened.
So don't do that. Once you've deleted something, consider it dead and gone. Or you'll be wasting half your life tracking down heisenbugs.
Dereferencing a pointer after delete is undefined behavior - anything can happen, including but not limited to:
data corruption
access violation
no visible effects
exact results will depend on multiple factors most of which are out of your control. You'll be much better off not triggering undefined behavior in the first place.
Usually, there is no difference in allocated and freed memory from a process perspective. E.g the process only has one large memory map that grows on demand.
Access violation is caused by reading/writing memory that is not available, ususally not paged in to the process. Various run-time memory debugging utilities uses the paging mechanism to track invalid memory accesses without the severe run time penalty that software memory checking would have.
Anyway your example proves only that an error is sometimes detected when running the program in one environment, but not detected in another environment, but it is still an error and the behaviour of the code above is undefined.
The executable with debug symbols is able to detect some cases of access violations. The code to detect this is contained in the executable, but will not be triggered by default.
Here you'll find an explanation of how you can control behaviour outside of a debugger: http://msdn.microsoft.com/en-us/library/w500y392%28v=VS.80%29.aspx
I also want to know if it's possible
to stably reproduce the Access
Violation crash caused from accessing
deleted area?
Instead of plain delete you could consider using an inline function that also sets the value of the deleted pointer to 0/NULL. This will typically crash if you reference it. However, it won't complain if you delete it a second time.
Is this kind of crash rare in
real-life?
No, this kind of crash is probably behind the majority of the crashes you and I see in software.
I have a high memory requirement in my code and this statement is repeated a lot of times:
Node** x;
x = new Node*[11];
It fails at this allocation. I figured out this line by throwing output to the console!
I am building my code on Visual Studio. It works fine in Debug mode (both in VS2005 and VS2008)
However it throws the error in VS2005 Release mode.
A direct exe generated from
cl Program.cpp works if cl is from VS2010 but fails when it's from VS2005.
Any clues?
PS: Linux gives me a Bus Error(core dumped) for the same
Thanks
UPDATE:
And I guess, it can be due to 'unaligned' thing as I understand. I just made 11 to 12 (or any even number) and It works!!! I don't know why. It doesn't work with odd numbers!
Update 2 : http://www.devx.com/tips/Tip/13265 ?
I think you've done something somewhere else which corrupted the program heap: for example, writing past the end of an allocated chunk of memory, or writing to a chunk of memory after it's been freed.
I recommend that the easiest way to diagnose the problem would be to run the software using a kind of debugger that's intended to detect this kind of problem, for example valgrind.
I have a high memory requirement in my code
Are you actually running out of memory?
x = new Node*[11];
Are you deleting x like so:
delete [] x; // the correct way
or:
delete x; // incorrect
Or there could simply be something else corrupting the heap, though I would have expected that running in debug mode mode would make it more obvious, not less so. But with heap corruption there are rarely any guarantees that it'll do so in a nice, easy to debug way.
There is nothing wrong with this code.
Node **x;
x = new Node*[11];
You are allocating 11 pointers to class Node and storing it as a double-pointer in variable x. This is fine.
The fact that your program is crashing here is probably due to some memory error that is occurring elsewhere in your program. Perhaps you're writing past array bounds somewhere. If you load this array using a for loop, double-check your indexing.
If you have access to a memory profiler, I'd recommend using it. These bugs can be difficult to track down in large programs.
A valid C++98 implementation will throw an exception (std::bad_alloc) if allocation fails, not just crash. I'd agree with previous answers and suggest running your program in valgrind as this reeks of memory corruption. Valgrind should be available in your Linux distribution of choice.