buffer overrun throw return address - c++

When I throw in a method A, it causes buffer overrun but when I return, it runs fine.
I thought throw moves execution to the caller method so the address it goes to should be the same as return address, but i am obviuosly wrong.
Is there a way to see what address throw goes to in Visual Studio debugger?
Thank you
Berkus:
does this mean that stack of upper caller method is corrupted? So for example,
Method A calls
Method B calls
Method C. Method C throws an exception
Then, it is possible that Method C's return address is ok but Method B's return address is corrupted, causing buffer overrun? What I am seeing is that if there is no throw, my application runs fine, so I think Method A,B and C all have valid return addresses.

Throw will unwind the stack, until it reaches the function with catch in it. The return address does not matter, as throw can go up several levels of stack frames if necessary.

In C++, if you have no try/catch block then this:
Method A calls
Method B calls
Method C. Method C throws an exception
will terminate the application. You must have a try/catch block in your code if want to avoid termination.

Exactly how return addresses and thrown exceptions interact is left to the details of how your particular compiler implements exception handling. To say anything about it with much certainty, someone here would have to be familiar with those internal details.
It's certainly conceivable that a buffer overrun could corrupt data that is only used by the exception handling (thus causing a throw to fail) while leaving the return address intact (thus allowing a normal return to succeed). But again, that depends on how your compiler is making use of the stack. (On a different compiler you might get totally different symptoms). It's also possible that the corruption has caused other problems that you simply haven't noticed yet. Or that such corruption will cause problems in the future after the next time you change your code. If the stack (or other memory that C++ depends on) becomes corrupted, then just about anything could happen.
Wither some educated guess-work or knowledge of the compiler details, I'm sure someone could eventually answer the specific questions about return addresses and how throwing works. However, I really think these are the wrong questions to be asking.
If you actually know you have a buffer overrun, then there's no point in trying to answer these questions. Just fix the overrun.
If you only suspect you have an overrun or are trying to track down how it's happening, then try stepping through your code in the debugger and watch for changes to memory outside the bounds of your variables. Or perhaps, alter your function to always throw and then start commenting out suspicious parts of your process one by one. Once the throw starts working again, you can then take a closer look at the code you last commented since the problem is most likely there. If these suggestions don't help, then I think the real question to ask here is "How do I track down memory corruption that only affects throwing an exception?".

Related

C++ - Debug Assertion Failed on program exit

When calling 'exit', or letting my program end itself, it causes a debug assertion:
.
Pressing 'retry' doesn't help me find the source of the problem. I know that it's most likely caused by memory being freed twice somewhere, the problem is that I have no clue where.
The whole program consists of several hundred thousand lines which makes it fairly difficult to take a guess on what exactly is causing the error. Is there a way to accurately tell where the source of the problem is located without having to comb line for line through the code?
The callstack doesn't really help either:
.
this kind of errors usally appear if you delete already deleted objects.
this happens if one object is given to multiple other objects that are supposed to take ownership of that first object and both try to delete it in their destructor.
As the messagebox already suggests, you're probably corrupting your heap in some way. Either you free/delete some memory block that you should not or you're trying to write to some memory block that has already been freed/deleted.
The callstack suggests that this is probably happening when stepping over the last line of your main function. If that is the case, then the problem is probably in the cleanup routines of some user defined types of which you create instances inside the main function. Try setting breakpoints inside the destructors of your own classes and investigate.
It is possible that you are corrupting the heap during the program operation, but it is not being detected until the end of the program, in which case the stack trace would only point to the memory checking routine
There may be a function you can call during operation that checks if the heap is valid, which may bring the fail closer to the point of corruption
HeapValidate is an example of such a routine, but it would depend on what platform you are using
This error can also happen when you use delete[] instead of delete. However, as mentioned, this is only one of many causes.

General way of solving Error: Stack around the variable 'x' was corrupted

I have a program which prompts me the error in VS2010, in debug :
Error: Stack around the variable 'x' was corrupted
This gives me the function where a stack overflow likely occurs, but I can't visually see where the problem is.
Is there a general way to debug this error with VS2010? Would it be possible to indentify which write operation is overwritting the incorrect stack memory?
thanks
Is there a general way to debug this error with VS2010?
No, there isn't. What you have done is to somehow invoke undefined behavior. The reason these behaviors are undefined is that the general case is very hard to detect/diagnose. Sometimes it is provably impossible to do so.
There are however, a somewhat smallish number of things that typically cause your problem:
Improper handling of memory:
Deleting something twice,
Using the wrong type of deletion (free for something allocated with new, etc.),
Accessing something after it's memory has been deleted.
Returning a pointer or reference to a local.
Reading or writing past the end of an array.
This can be caused by several issues, that are generally hard to see:
double deletes
delete a variable allocated with new[] or delete[] a variable allocated with new
delete something allocated with malloc
delete an automatic storage variable
returning a local by reference
If it's not immediately clear, I'd get my hands on a memory debugger (I can think of Rational Purify for windows).
This message can also be due to an array bounds violation. Make sure that your function (and every function it calls, especially member functions for stack-based objects) is obeying the bounds of any arrays that may be used.
Actually what you see is quite informative, you should check in near x variable location for any activity that might cause this error.
Below is how you can reproduce such exception:
int main() {
char buffer1[10];
char buffer2[20];
memset(buffer1, 0, sizeof(buffer1) + 1);
return 0;
}
will generate (VS2010):
Run-Time Check Failure #2 - Stack around the variable 'buffer1' was corrupted.
obviously memset has written 1 char more than it should. VS with option \GS allows to detect such buffer overflows (which you have enabled), for more on that read here: http://msdn.microsoft.com/en-us/library/Aa290051.
You can for example use debuger and step throught you code, each time watch at contents of your variable, how they change. You can also try luck with data breakpoints, you set breakpoint when some memory location changes and debugger stops at that moment,possibly showing you callstack where problem is located. But this actually might not work with \GS flag.
For detecting heap overflows you can use gflags tool.
I was puzzled by this error for hours, I know the possible causes, and they are already mentioned in the previous answers, but I don't allocate memory, don't access array elements, don't return pointers to local variables...
Then finally found the source of the problem:
*x++;
The intent was to increment the pointed value. But due to the precedence ++ comes first, moving the x pointer forward then * does nothing, then writing to *x will be corrupt the stack canary if the parameter comes from the stack, making VS complain.
Changing it to (*x)++ solves the problem.
Hope this helps.
Here is what I do in this situation:
Set a breakpoint at a location where you can see the (correct) value of the variable in question, but before the error happens. You will need the memory address of the variable whose stack is being corrupted. Sometimes I have to add a line of code in order for the debugger to give me the address easily (int *x = &y)
At this point you can set a memory breakpoint (Debug->New Breakpoint->New Data Breakpoint)
Hit Play and the debugger should stop when the memory is written to. Look up the stack (mine usually breaks in some assembly code) to see whats being called.
I usually follow the variable before the complaining variable which usually helps me get the problem. But this can sometime be very complex with no clue as you have seen it. You could enable Debug menu >> Exceptions and tick the 'Win32 exceptions" to catch all exceptions. This will still not catch this exceptions but it could catch something else which could indirectly point to the problem.
In my case it was caused by library I was using. It turnout the header file I was including in my project didn't quite match the actual header file in that library (by one line).
There is a different error which is also related:
0xC015000F: The activation context being deactivated is not the most
recently activated one.
When I got tired of getting the mysterious stack corrupted message on my computer with no debugging information, I tried my project on another computer and it was giving me the above message instead. With the new exception I was able to work my way out.
I encountered this when I made a pointer array of 13 items, then trying to set the 14th item. Changing the array to 14 items solved the problem. Hope this helps some people ^_^
One relatively common source of "Stack around the variable 'x' was corrupted" problem is wrong casting. It is sometimes hard to spot. Here is an example of a function where such problem occurs and the fix. In the function assignValue I want to assign some value to a variable. The variable is located at the memory address passed as argument to the function:
using namespace std;
template<typename T>
void assignValue(uint64_t address, T value)
{
int8_t* begin_object = reinterpret_cast<int8_t*>(std::addressof(value));
// wrongly casted to (int*), produces the error (sizeof(int) == 4)
//std::copy(begin_object, begin_object + sizeof(T), (int*)address);
// correct cast to (int8_t*), assignment byte by byte, (sizeof(int8_t) == 1)
std::copy(begin_object, begin_object + sizeof(T), (int8_t*)address);
}
int main()
{
int x = 1;
int x2 = 22;
assignValue<int>((uint64_t)&x, x2);
assert(x == x2);
}

My code crashes on delete this

I get a segmentation fault when attempting to delete this.
I know what you think about delete this, but it has been left over by my predecessor. I am aware of some precautions I should take, which have been validated and taken care of.
I don't get what kind of conditions might lead to this crash, only once in a while. About 95% of the time the code runs perfectly fine but sometimes this seems to be corrupted somehow and crash.
The destructor of the class doesn't do anything btw.
Should I assume that something is corrupting my heap somewhere else and that the this pointer is messed up somehow?
Edit : As requested, the crashing code:
long CImageBuffer::Release()
{
long nRefCount = InterlockedDecrement(&m_nRefCount);
if(nRefCount == 0)
{
delete this;
}
return nRefCount;
}
The object has been created with a new, it is not in any kind of array.
The most obvious answer is : don't delete this.
If you insists on doing that, then use common ways of finding bugs :
1. use valgrind (or similar tool) to find memory access problems
2. write unit tests
3. use debugger (prepare for loooong staring at the screen - depends on how big your project is)
It seems like you've mismatched new and delete. Note that delete this; can only be used on an object which was allocated using new (and in case of overridden operator new, or multiple copies of the C++ runtime, the particular new that matches delete found in the current scope)
Crashes upon deallocation can be a pain: It is not supposed to happen, and when it happens, the code is too complicated to easily find a solution.
Note: The use of InterlockedDecrement have me assume you are working on Windows.
Log everything
My own solution was to massively log the construction/destruction, as the crash could well never happen while debugging:
Log the construction, including the this pointer value, and other relevant data
Log the destruction, including the this pointer value, and other relevant data
This way, you'll be able to see if the this was deallocated twice, or even allocated at all.
... everything, including the stack
My problem happened in Managed C++/.NET code, meaning that I had easy access to the stack, which was a blessing. You seem to work on plain C++, so retrieving the stack could be a chore, but still, it remains very very useful.
You should try to load code from internet to print out the current stack for each log. I remember playing with http://www.codeproject.com/KB/threads/StackWalker.aspx for that.
Note that you'll need to either be in debug build, or have the PDB file along the executable file, to make sure the stack will be fully printed.
... everything, including multiple crashes
I believe you are on Windows: You could try to catch the SEH exception. This way, if multiple crashes are happening, you'll see them all, instead of seeing only the first, and each time you'll be able to mark "OK" or "CRASHED" in your logs. I went even as far as using maps to remember addresses of allocations/deallocations, thus organizing the logs to show them together (instead of sequentially).
I'm at home, so I can't provide you with the exact code, but here, Google is your friend, but the thing to remember is that you can't have a __try/__except handdler everywhere (C++ unwinding and C++ exception handlers are not compatible with SEH), so you'll have to write an intermediary function to catch the SEH exception.
Is your crash thread-related?
Last, but not least, the "I happens only 5% of the time" symptom could be caused by different code path executions, or the fact you have multiple threads playing together with the same data.
The InterlockedDecrement part bothers me: Is your object living in multiple threads? And is m_nRefCount correctly aligned and volatile LONG?
The correctly aligned and LONG part are important, here.
If your variable is not a LONG (for example, it could be a size_t, which is not a LONG on a 64-bit Windows), then the function could well work the wrong way.
The same can be said for a variable not aligned on 32-byte boundaries. Is there #pragma pack() instructions in your code? Does your projet file change the default alignment (I assume you're working on Visual Studio)?
For the volatile part, InterlockedDecrement seem to generate a Read/Write memory barrier, so the volatile part should not be mandatory (see http://msdn.microsoft.com/en-us/library/f20w0x5e.aspx).

I get an exception if I leave the program running for a while

Platform : Win32
Language : C++
I get an error if I leave the program running for a while (~10 min).
Unhandled exception at 0x10003fe2 in ImportTest.exe: 0xC0000005: Access violation reading location 0x003b1000.
I think it could be a memory leak but I don't know how to find that out.
Im also unable to 'free()' memory because it always causes (maybe i shouldn't be using free() on variables) :
Unhandled exception at 0x76e81f70 in ImportTest.exe: 0xC0000005: Access violation reading location 0x0fffffff.
at that stage the program isn't doing anything and it is just waiting for user input
dllHandle = LoadLibrary(L"miniFMOD.dll");
playSongPtr = (playSongT)GetProcAddress(dllHandle,"SongPlay");
loadSongPtr = (loadSongT)GetProcAddress(dllHandle,"SongLoadFromFile");
int songHandle = loadSongPtr("FILE_PATH");
// ... {just output , couldn't cause errors}
playSongPtr(songHandle);
getch(); // that is where it causes an error if i leave it running for a while
Edit 2:
playSongPtr(); causes the problem. but i don't know how to fix it
I think it's pretty clear that your program has a bug. If you don't know where to start looking, a useful technique is "divide and conquer".
Start with your program in a state where you can cause the exception to happen. Eliminate half your code, and try again. If the exception still happens, then you've got half as much code to look through. If the exception doesn't happen, then it might have been related to the code you just removed.
Repeat the above until you isolate the problem.
Update: You say "at that stage the program isn't doing anything" but clearly it is doing something (otherwise it wouldn't crash). Is your program a console mode program? If so, what function are you using to wait for user input? If not, then is it a GUI mode program? Have you opened a dialog box and are waiting for something to happen? Have you got any Windows timers running? Any threads?
Update 2: In light of the small snippet of code you posted, I'm pretty sure that if you try to remove the call to the playSongPtr(songHandle) function, then your problem is likely to go away. You will have to investigate what the requirements are for "miniFMOD.dll". For example, that DLL might assume that it's running in a GUI environment instead of a console program, and may do things that don't necessarily work in console mode. Also, in order to do anything in the background (including playing a song), that DLL probably needs to create a thread to periodically load the next bit of the song and queue it in the play buffer. You can check the number of threads being created by your program in Task Manager (or better, Process Explorer). If it's more than one, then there are other things going on that you aren't directly controlling.
The error tells you that memory is accessed which you have not allocated at the moment. It could be a pointer error like dereferencing NULL. Another possibility is that you use memory after you freed it.
The first step would be to check your code for NULL reference checks, i.e. make sure you have a valid pointer before you use it, and to check the lifecycle of all allocated and freed resources. Writing NULL's over references you just freed might help find the problem spot.
I doubt this particular problem is a memory leak; the problem is dereferencing a pointer that does not point to something useful. To check for a memory leak, watch your process in your operating system's process list tool (task manager, ps, whatever) and see if the "used memory" value keeps growing.
On calling free: You should call free() once and only once on the non-null values returned from malloc(), calloc() or strdup(). Calling free() less than once will lead to a memory leak. Calling free() more than once will lead to memory corruption.
You should get a stack trace to see what is going on when the process crashes. Based on my reading of the addresses involved you probably have a stack overflow or have an incorrect pointer calculation using a stack address (in C/C++ terms: an "auto" variable.) A stack trace will tell you how you got to the point where it crashed.

What can modify the frame pointer?

I have a very strange bug cropping up right now in a fairly massive C++ application at work (massive in terms of CPU and RAM usage as well as code length - in excess of 100,000 lines). This is running on a dual-core Sun Solaris 10 machine. The program subscribes to stock price feeds and displays them on "pages" configured by the user (a page is a window construct customized by the user - the program allows the user to configure such pages). This program used to work without issue until one of the underlying libraries became multi-threaded. The parts of the program affected by this have been changed accordingly. On to my problem.
Roughly once in every three executions the program will segfault on startup. This is not necessarily a hard rule - sometimes it'll crash three times in a row then work five times in a row. It's the segfault that's interesting (read: painful). It may manifest itself in a number of ways, but most commonly what will happen is function A calls function B and upon entering function B the frame pointer will suddenly be set to 0x000002. Function A:
result_type emit(typename type_trait<T_arg1>::take _A_a1) const
{ return emitter_type::emit(impl_, _A_a1); }
This is a simple signal implementation. impl_ and _A_a1 are well-defined within their frame at the crash. On actual execution of that instruction, we end up at program counter 0x000002.
This doesn't always happen on that function. In fact it happens in quite a few places, but this is one of the simpler cases that doesn't leave that much room for error. Sometimes what will happen is a stack-allocated variable will suddenly be sitting on junk memory (always on 0x000002) for no reason whatsoever. Other times, that same code will run just fine. So, my question is, what can mangle the stack so badly? What can actually change the value of the frame pointer? I've certainly never heard of such a thing. About the only thing I can think of is writing out of bounds on an array, but I've built it with a stack protector which should come up with any instances of that happening. I'm also well within the bounds of my stack here. I also don't see how another thread could overwrite the variable on the stack of the first thread since each thread has it's own stack (this is all pthreads). I've tried building this on a linux machine and while I don't get segfaults there, roughly one out of three times it will freeze up on me.
Stack corruption, 99.9% definitely.
The smells you should be looking carefully for are:-
Use of 'C' arrays
Use of 'C' strcpy-style functions
memcpy
malloc and free
thread-safety of anything using pointers
Uninitialised POD variables.
Pointer Arithmetic
Functions trying to return local variables by reference
I had that exact problem today and was knee-deep in gdb mud and debugging for a straight hour before occurred to me that I simply wrote over array boundaries (where I didn't expect it the least) of a C array.
So, if possible, use vectors instead because any decend STL implementation will give good compiler messages if you try that in debug mode (whereas C arrays punish you with segfaults).
I'm not sure what you're calling a "frame pointer", as you say:
On actual execution of that
instruction, we end up at program
counter 0x000002
Which makes it sound like the return address is being corrupted. The frame pointer is a pointer that points to the location on the stack of the current function call's context. It may well point to the return address (this is an implementation detail), but the frame pointer itself is not the return address.
I don't think there's enough information here to really give you a good answer, but some things that might be culprits are:
incorrect calling convention. If you're calling a function using a calling convention different from how the function was compiled, the stack may become corrupted.
RAM hit. Anything writing through a bad pointer can cause garbage to end up on the stack. I'm not familiar with Solaris, but most thread implementations have the threads in the same process address space, so any thread can access any other thread's stack. One way a thread can get a pointer into another thread's stack is if the address of a local variable is passed to an API that ultimately deals with the pointer on a different thread. unless you synchronize things properly, this will end up with the pointer accessing invalid data. Given that you're dealing with a "simple signal implementation", it seems like it's possible that one thread is sending a signal to another. Maybe one of the parameters in that signal has a pointer to a local?
There's some confusion here between stack overflow and stack corruption.
Stack Overflow is a very specific issue cause by try to use using more stack than the operating system has allocated to your thread. The three normal causes are like this.
void foo()
{
foo(); // endless recursion - whoops!
}
void foo2()
{
char myBuffer[A_VERY_BIG_NUMBER]; // The stack can't hold that much.
}
class bigObj
{
char myBuffer[A_VERY_BIG_NUMBER];
}
void foo2( bigObj big1) // pass by value of a big object - whoops!
{
}
In embedded systems, thread stack size may be measured in bytes and even a simple calling sequence can cause problems. By default on windows, each thread gets 1 Meg of stack, so causing stack overflow is much less of a common problem. Unless you have endless recursion, stack overflows can always be mitigated by increasing the stack size, even though this usually is NOT the best answer.
Stack Corruption simply means writing outside the bounds of the current stack frame, thus potentially corrupting other data - or return addresses on the stack.
At it's simplest:-
void foo()
{
char message[10];
message[10] = '!'; // whoops! beyond end of array
}
That sounds like a stack overflow problem - something is writing beyond the bounds of an array and trampling over the stack frame (and probably the return address too) on the stack. There's a large literature on the subject. "The Shell Programmer's Guide" (2nd Edition) has SPARC examples that may help you.
With C++ unitialized variables and race conditions are likely suspects for intermittent crashes.
Is it possible to run the thing through Valgrind? Perhaps Sun provides a similar tool. Intel VTune (Actually I was thinking of Thread Checker) also has some very nice tools for thread debugging and such.
If your employer can spring for the cost of the more expensive tools, they can really make these sorts of problems a lot easier to solve.
It's not hard to mangle the frame pointer - if you look at the disassembly of a routine you will see that it is pushed at the start of a routine and pulled at the end - so if anything overwrites the stack it can get lost. The stack pointer is where the stack is currently at - and the frame pointer is where it started at (for the current routine).
Firstly I would verify that all of the libraries and related objects have been rebuilt clean and all of the compiler options are consistent - I've had a similar problem before (Solaris 2.5) that was caused by an object file that hadn't been rebuilt.
It sounds exactly like an overwrite - and putting guard blocks around memory isn't going to help if it is simply a bad offset.
After each core dump examine the core file to learn as much as you can about the similarities between the faults. Then try to identify what is getting overwritten. As I remember the frame pointer is the last stack pointer - so anything logically before the frame pointer shouldn't be modified in the current stack frame - so maybe record this and copy it elsewhere and compare upon return.
Is something meaning to assign a value of 2 to a variable but instead is assigning its address to 2?
The other details are lost on me but "2" is the recurring theme in your problem description. ;)
I would second that this definitely sounds like a stack corruption due to out of bound array or buffer writing. Stack protector would be good as long as the writing is sequential, not random.
I second the notion that it is likely stack corruption. I'll add that the switch to a multi-threaded library makes me suspicious that what has happened is a lurking bug has been exposed. Possibly the sequencing the buffer overflow was occurring on unused memory. Now it's hitting another thread's stack. There are many other possible scenarios.
Sorry if that doesn't give much of a hint at how to find it.
I tried Valgrind on it, but unfortunately it doesn't detect stack errors:
"In addition to the performance penalty an important limitation of Valgrind is its inability to detect bounds errors in the use of static or stack allocated data."
I tend to agree that this is a stack overflow problem. The tricky thing is tracking it down. Like I said, there's over 100,000 lines of code to this thing (including custom libraries developed in-house - some of it going as far back as 1992) so if anyone has any good tricks for catching that sort of thing, I'd be grateful. There's arrays being worked on all over the place and the app uses OI for its GUI (if you haven't heard of OI, be grateful) so just looking for a logical fallacy is a mammoth task and my time is short.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
No one asked this, but I built with gcc-4.2. Also, I can guarantee ABI safety here so that's also not the issue. As for the "garbage at the end of the stack" on the RAM hit, the fact that it is universally 2 (though in different places in the code) makes me doubt that as garbage tends to be random.
It is impossible to know, but here are some hints that I can come up with.
In pthreads you must allocate the stack and pass it to the thread. Did you allocate enough? There is no automatic stack growth like in a single threaded process.
If you are sure that you don't corrupt the stack by writing past stack allocated data check for rouge pointers (mostly uninitialized pointers).
One of the threads could overwrite some data that others depend on (check your data synchronisation).
Debugging is usually not very helpful here. I would try to create lots of log output (traces for entry and exit of every function/method call) and then analyze the log.
The fact that the error manifest itself differently on Linux may help. What thread mapping are you using on Solaris? Make sure you map every thread to it's own LWP to ease the debugging.
Also agreed that the 0x000002 is suspect. It is about the only constant between crashes. Even weirder is the fact that this only cropped up with the multi-threaded switch. I think that the smaller stack as a result of the multiple-threads is what's making this crop up now, but that's pure supposition on my part.
If you pass anything on the stack by reference or by address, this would most certainly happen if another thread tried to use it after the first thread returned from a function.
You might be able to repro this by forcing the app onto a single processor. I don't know how you do that with Sparc.