Debugging a memory error with GDB and C++

Debugging a memory error with GDB and C++ - c++

I'm running my C++ program in gdb. I'm not real experienced with gdb, but I'm getting messages like:
warning: HEAP[test.exe]:
warning: Heap block at 064EA560 modified at 064EA569 past requested size of 1
How can I track down where this is happening at? Viewing the memory doesn't give me any clues.
Thanks!

So you're busting your heap. Here's a nice GDB tutorial to keep in mind.
My normal practice is to set a break in known good part of the code. Once it gets there step through until you error out. Normally you can determine the problem that way.
Because you're getting a heap error I'd assume it has to do with something you're putting on the heap so pay special attention to variables (I think you can use print in GDB to determine it's memory address and that may be able to sync you with where your erroring out). You should also remember that entering functions and returning from functions play with the heap so they may be where your problem lies (especially if you messed your heap before returning from a function).

You can probably use a feature called a "watch point". This is like a break point but the debugger stops when the memory is modified.
I gave a rough idea on how to use this in an answer to a different question.

If you can use other tools, I highly recommend trying out Valgrind. It is an instrumentation framework, that can run your code in a manner that allows it to, typically, stop at the exact instruction that causes the error. Heap errors are usually easy to find, this way.

One thing you can try, as this is the same sort of thing as the standard libc, with the MALLOC_CHECK_ envronment variable configured (man libc).
If you keep from exiting gdb (if your application quit's, just use "r" to re-run it), you can setup a memory breakpoint at that address, "hbreak 0x64EA569", also use "help hbreak" to configure condition's or other breakpoitn enable/disable options to prevent excessively entering that breakpoint....
You can just configure a log file, set log ... setup a stack trace on every break, "display/bt -4", then hit r, and just hold down the enter key and let it scroll by
"or use c ## to continue x times... etc..", eventually you will see that same assertion, then you will now have (due to the display/bt) a stacktrace which you can corolate to what code was modifying on that address...

I had similar problem when I was trying to realloc array of pointers to my structures, but instead I was reallocating as array of ints (because I got the code from tutorial and forgot to change it). The compiler wasnt correcting me because it cannot be checked whats in size argument.
My variable was:
itemsetList_t ** iteration_isets;
So in realloc instead of having:
iteration_isets = realloc(iteration_isets, sizeof(itemsetList_t *) * max_elem);
I had:
iteration_isets = realloc(iteration_isets, sizeof(int) * max_elem);
And this caused my heap problem.

Related

Debugging a Corrupted Object on the Heap

I'm debugging a non-trivial software project where I have a bunch of objects located on the heap. At some point in time (at least) one of these objects gets corrupted.
I added a const member to my class to serve as a canary and indeed, it gets corrupted during executing. Typically I'd add a watchpoint to this variable to figure out when the memory is written to. However, I don't know which instance gets overwritten, as any information stored in the class gets corrupted as well.
I have too many objects to set a watchpoint on each of them and I haven't been able to reproduce with a smaller input set. Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
Any suggestions on how to proceed from here?

Probably this won't be specific enough, but when I had a similar problem, here is what I ended up doing. I'm assuming you can reproduce your problem in a deterministic fashion.
My strategy was to find which instance caused the problem first. This I did with a counter on a specific line that exposes the symptom. For example, on Visual Studio, I would setup a breakpoint that triggers on the 100000th hit, so that it never does; but Visual Studio still tells you how many times the breakpoint is encountered during execution. By trial and error, I would find that the problem occurs on the say, 20th time the breakpoint is encountered, and so I would set the breakpoint to trigger on the 19th iteration, to be able to discriminate the appropriate instance before corruption occurred.
Starting from there, I could have the address of the variable that was corrupted before it was, and play with the debugger to find out what is going on: gather enough information about the faulty instance.
Then, I did setup breakpoints at strategic places, which were triggered by conditions : eg. trigger only for an instance with the appropriate address, or with specific values in members.
You'll probably get to when the symptom occurs precisely, but not to the problem, but that's still something.
Hope this helps!

Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
You are confused: if valgrind told you that you are doing invalid read (presumably because the object has been freed), then you are reading danging (already freed) object, and that is exactly your problem.
You shouldn't try to access such objects, and the fact that your canary has been changed / corrupted after you freed / deleted the object is irrelevant.

I managed to find out what was causing my issue. Turns out the object I was looking at never existed in the first place. Like #employed-russian, I wondered whether my object might have been deleted somewhere I wasn't aware of. Putting a breakpoint on the destructor yielded nothing so the only reasonable explanation is the pointer itself being invalid, pointing to memory that wasn't a valid instance of my class.
Lo and behold; the pointer I was dereferencing was left uninitialized by some constructor of another class. I figured it out when I added an explicit check for null and Valgrind's error became Conditional jump or move depends on uninitialised value(s). By using --track-origins=yes, I quickly figured out the source of the uninitialized data, i.e. the pointer missing from the initialization list.
(I know uninitialized values can be detected by the compiler with -Wuninitialized but apparently my version of clang (apple) didn't feel like mentioning it with -Wall enabled.)

Segmentation fault on one Linux machine but not another with C++ code

I have been having a peculiar problem. I have developed a C++ program on a Linux cluster at work. I have tried to use it home on an Ubuntu 14.04 machine, but the program, which is composed of 6 files: main.hpp,main.cpp (dependent on) sarsa.hpp,sarsa.cpp (class Sarsa) (dependent on) wec.hpp,wec.cpp, does compile, but when I run it it either returns segmenation fault or does not enter one fundamental function of the class Sarsa.
The main code calls the constructor and setter functions without problems:
Sarsa run;
run.setVectorSize(memory,3,tilings,1000);
etc.
However, it cannot run the public function episode , since learningRate, which should contain a large integer, returns 0 for all episodes (iterations).
learningRate[episode]=run.episode(numSteps,graph);}
I tried to debug the code with gdb, which has returned:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000408f4a in main () at main.cpp:152
152 learningRate[episode]=run.episode(numSteps,graph);}
I also tried valgrind, which returned:
==10321== Uninitialised value was created by a stack allocation
==10321== at 0x408CAD: main (main.cpp:112)
But no memory leakage issues.
I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
In the file, I use C++v11 language (I would be expecting errors at compile-time,though), so I even compiled with g++ -std=c++0x, but there were no improvements.
Unluckily, because of the size of the code, I cannot post it here. I would really appreciate any help with this problem. Am I missing anything obvious? Could you help me at least with the debugging?
Thank you in advance for the help.
Correction:
main.cpp:
Definition of the global array:
`#define numEpisodes 10
int learningRate[numEpisodes];`
Towards the end of the main function:
for (int episode; episode<numEpisodes; episode++) {
if (episode==(numEpisodes-1)) { // Save the simulation data only at the
graph=true;} // last episode
learningRate[episode]=run.episode(numSteps,graph);}

As the code you just added to the question reveals, the problem arises because you did not initialize the episode variable. The behavior of any code that uses its value before you assign one is undefined, so it is entirely reasonable that the program behaves differently in one environment than in another.

A segmentation fault indicates an invalid memory access. Usually this means that somewhere, you're reading or writing past the end of an array, or through an invalid pointer, or through an object that has already been freed. You don't necessarily get the segmentation fault at the point where the bug occurs; for instance, you could write past the end of an array onto heap metadata, which causes a crash later on when you try to allocate or release an unrelated object. So it's perfectly reasonable for a program to appear to work on one system but crash on another.
In this case, I'd start by looking at learningRate[episode]. What is the value of episode? Is it within the bounds of learningRate?

I was wondering if there was a setting to try to debug the external file sarsa.cpp, since I think that class is likely to be the culpript
It's possible to set breakpoints in functions other than main.cpp.
break location
Set a breakpoint at the given location, which can specify a function name, a line number, or an address of an instruction.
At least, I think that's your question. You'll also need to know how to step into functions.
More importantly, you need to learn what your tools are trying to tell you. A segfault is the operating system's reaction to an attempt to dereference memory that doesn't belong to you. One common reason for that is trying to dereference NULL. Another would be trying to dereference a pointer that was never initialized. The Valgrind error message suggests that you may have an unitialized pointer.
Without the code, I can't tell you why the pointer isn't initialized when you run the program on your home system, but is (apparently) initialized when you run it at work. I suspect that you don't have the necessary data on your home system, but you'll need to investigate and figure that out. The fundamental question to keep asking yourself is "what is different between my home computer an dmy work computer?"

Visual Studio 2013 C++ - How to determine which line write to my variable?

today I encountered weird problem, one of my global variables somehow gets corrupted.
I tried modifying it during runtime (assigned some key for it), while I'm holding button it is ok, but as soon as release it less significant byte is zeroed (corupted). In fact its dancing around this two values, I am using Cheat Engine to check value.
I checked all references to it, and for sure I can say I only write to it on program start, later only read from it. So most likely its some bad pointer, overflow, or something.
I attached Cheat Engine's debbuger and easily found which assembly code write to it (I can see exact names of function calls around) and that way tried to determine which line is bad, but when watching code there is no command between these two function calls.
I already use safe functions in my code (strcpy_s, ...), and aswell always sizeof macro, not sure where could things go wrong.
Is there a way in Visual Studio to determine which line write to my variable (address)? Or any help is greatly appreciated.

you'll be trashing memory - something will be writing to memory in the block before your variable, but "overflowing" into the memory location your variable lives in. What I'd do is put a memory breakpoint on (check the VC docs if you can't figure it out from the breakpoint menu) and wait for it to change. Then you can see in the call stack which routines are being processed, which should some memory update (eg a C-style string clear or a memset or similar).

My code crashes on delete this

I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
long nRefCount = InterlockedDecrement(&m_nRefCount);
if(nRefCount == 0)
{
delete this;
}
return nRefCount;
}
The object has been created with a new, it is not in any kind of array.

The most obvious answer is : don't delete this.
If you insists on doing that, then use common ways of finding bugs :
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)

It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)

Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement have me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to load code from internet to print out the current stack for each log. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend, but the thing to remember is that you can't have a __try/__except handdler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "I happens only 5% of the time" symptom could be caused by different code path executions, or the fact you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on 32-byte boundaries. Is there #pragma pack() instructions in your code? Does your projet file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seem to generate a Read/Write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).

How to track down a SIGFPE/Arithmetic exception

I have a C++ application cross-compiled for Linux running on an ARM CortexA9 processor which is crashing with a SIGFPE/Arithmetic exception. Initially I thought that it's because of some optimizations introduced by the -O3 flag of gcc but then I built it in debug mode and it still crashes.
I debugged the application with gdb which catches the exception but unfortunately the operation triggering exception seems to also trash the stack so I cannot get any detailed information about the place in my code which causes that to happen. The only detail I could finally get was the operation triggering the exception(from the following piece of stack trace):
3 raise() 0x402720ac
2 __aeabi_uldivmod() 0x400bb0b8
1 __divsi3() 0x400b9880
The __aeabi_uldivmod() is performing an unsigned long long division and reminder so I tried the brute force approach and searched my code for places that might use that operation but without much success as it proved to be a daunting task. Also I tried to check for potential divisions by zero but again the code base it's pretty large and checking every division operation it's a cumbersome and somewhat dumb approach. So there must be a smarter way to figure out what's happening.
Are there any techniques to track down the causes of such exceptions when the debugger cannot do much to help?
UPDATE: After crunching on hex numbers, dumping memory and doing stack forensics(thanks Crashworks) I came across this gem in the ARM Compiler documentation(even though I'm not using the ARM Ltd. compiler):
Integer division-by-zero errors can be trapped and identified by
re-implementing the appropriate C library helper functions. The
default behavior when division by zero occurs is that when the signal
function is used, or
__rt_raise() or __aeabi_idiv0() are re-implemented, __aeabi_idiv0() is
called. Otherwise, the division function returns zero.
__aeabi_idiv0() raises SIGFPE with an additional argument, DIVBYZERO.
So I put a breakpoint at __aeabi_idiv0(_aeabi_ldiv0) et Voila!, I had my complete stack trace before being completely trashed. Thanks everybody for their very informative answers!
Disclaimer: the "winning" answer was chosen solely and subjectively taking into account the weight of its suggestions into my debugging efforts, because more than one was informative and really helpful.

My first suggestion would be to open a memory window looking at the region around your stack pointer, and go digging through it to see if you can find uncorrupted stack frames nearby that might give you a clue as to where the crash was. Usually stack-trashes only burn a couple of the stack frames, so if you look upwards a few hundred bytes, you can get past the damaged area and get a general sense of where the code was. You can even look down the stack, on the assumption that the dead function might have called some other function before it died, and thus there might be an old frame still in memory pointing back at the current IP.
In the comments, I linked some presentation slides that illustrate the technique on a PowerPC — look at around #73-86 for a case study in a similar botched-stack crash. Obviously your ARM's stack frames will be laid out differently, but the general principle holds.

(Using the basic idea from Fedor Skrynnikov, but with compiler help instead)
Compile your code with -pg. This will insert calls to mcount and mcountleave() in every function. Do not link against the GCC profiling lib, but provide your own. The only thing you want to do in your mcount and mcountleave() is to keep a copy of the current stack, so just copy the top 128 bytes or so of the stack to a fixed buffer. Both the stack and the buffer will be in cache all the time so it's fairly cheap.

You can implement special guards in functions that can cause the exception. Guard is a simple class, in constractor of this class you put the name of the file and line (_FILE_, _LINE_) into file/array/whatever. The main condition is that this storage should be the same for all instances of this class(kind of stack). In the destructor you remove this line. To make it works you need to put the creation of this guard on the first line of each function and to create it only on stack. When you will be out of current block deconstructor will be called. So in the moment of your exception you will know from this improvised callstack which function is causing a problem.
Ofcaurse you may put creation of this class under debug condition

Enable generation of core files, and open the core file with the debuger

Since it uses raise() to raise the exception, I would expect that signal() should be able to catch it. Is this not the case?
Alternatively, you can set a conditional breakpoint at __aeabi_uldivmod to break when divisor (r1) is 0.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js