How to debug a buffer overrun in Visual C++ 9? - c++

I have a huge MMC snap-in written in Visual C++ 9. Every once in a while, when I hit F5 in MMC, mmc.exe crashes. If I attach a debugger to it I see the following message:
A buffer overrun has occurred in mmc.exe which has corrupted the program's internal state. Press Break to debug the program or Continue to terminate the program.
For more details please see Help topic 'How to debug Buffer Overrun Issues'.
First of all, there's no How to debug Buffer Overrun Issues topic anywhere.
When I inspect the call stack I see that it's likely something with security cookies used to guard against stack-allocated buffer overruns:
MySnapin.dll!__crt_debugger_hook() Unknown
MySnapin.dll!__report_gsfailure() Line 315 + 0x7 bytes C
msvcr90d.dll!ValidateLocalCookies(void (unsigned int)* CookieCheckFunction=0x1014e2e3, _EH4_SCOPETABLE * ScopeTable=0x10493e48, char * FramePointer=0x0007ebf8) + 0x57 bytes C
msvcr90d.dll!_except_handler4_common(unsigned int * CookiePointer=0x104bdcc8, void (unsigned int)* CookieCheckFunction=0x1014e2e3, _EXCEPTION_RECORD * ExceptionRecord=0x0007e764, _EXCEPTION_REGISTRATION_RECORD * EstablisherFrame=0x0007ebe8, _CONTEXT * ContextRecord=0x0007e780, void * DispatcherContext=0x0007e738) + 0x44 bytes C
MySnapin.dll!_except_handler4(_EXCEPTION_RECORD * ExceptionRecord=0x0007e764, _EXCEPTION_REGISTRATION_RECORD * EstablisherFrame=0x0007ebe8, _CONTEXT * ContextRecord=0x0007e780, void * DispatcherContext=0x0007e738) + 0x24 bytes C
ntdll.dll!7c9032a8()
[Frames below may be incorrect and/or missing, no symbols loaded for ntdll.dll]
ntdll.dll!7c90327a()
ntdll.dll!7c92aa0f()
ntdll.dll!7c90e48a()
MySnapin.dll!IComponentImpl<CMySnapin>::GetDisplayInfo(_RESULTDATAITEM * pResultDataItem=0x0007edb0) Line 777 + 0x14 bytes C++
// more Win32 libraries functions follow
I have lots of code and no idea where or why the buffer overrun might occur. I found this forum discussion, and specifically the advice to replace all wcscpy-like functions with more secure versions such as wcscpy_s(). I followed the advice, but that didn't get me any closer to a solution.
How do I debug my code and find why and where the buffer overrun occurs with Visual Studio 2008?

Add the /RTCs switch to the compiler options. This enables run-time detection of overruns and underruns of stack-allocated local variables. When an overrun is detected, the program breaks in the function where it happened instead of giving you a postmortem message.
If that does not help, investigate the wcscpy_s() calls that you mentioned. Verify that the 'number of elements' argument has the correct value. I recently fixed a buffer overrun caused by incorrect usage of wcscpy_s(). Here is an example:
const int offset = 10;
wchar_t buff[MAXSIZE];
wcscpy_s(buff + offset, MAXSIZE, buff2);
Notice that buff + offset has MAXSIZE - offset elements, not MAXSIZE.
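A minimal sketch of the corrected size calculation, written in portable C++ rather than with wcscpy_s() itself. The helpers remaining_capacity and copy_at_offset are hypothetical names made up for illustration. With wcscpy_s() the equivalent fix to the example above would be to pass MAXSIZE - offset as the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: number of elements left in a buffer of `capacity`
// elements when writing starts at `offset`.
inline std::size_t remaining_capacity(std::size_t capacity, std::size_t offset) {
    assert(offset <= capacity);
    return capacity - offset;
}

// Hypothetical helper: copies `src` to `dest + offset` only if the string and
// its terminating null fit in the space that is left. Returns false otherwise.
template <std::size_t N>
bool copy_at_offset(wchar_t (&dest)[N], std::size_t offset, const wchar_t* src) {
    if (offset > N) {
        return false;  // the start position is already outside the buffer
    }
    const std::size_t room = remaining_capacity(N, offset);
    const std::size_t needed = std::wcslen(src) + 1;  // include the null terminator
    if (needed > room) {
        return false;  // copying would overrun the buffer
    }
    std::wmemcpy(dest + offset, src, needed);
    return true;
}
```

The point of the sketch is only that the size passed along with buff + offset has to shrink by offset. The array size is deduced from the reference parameter, so it cannot drift out of sync with the declaration.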

I just had this problem a minute ago, and I was able to solve it. I first searched the net to no avail, which is how I got to this thread.
Anyways, I am running VS2005 and I have a multi-threaded program. I had to 'guess' which thread caused the problem, but luckily I only have a few.
So, what I did was in that thread I ran through the debugger, stepping through the code at a high level function. I noticed that it always occurred at the same place in the function, so now it was a matter of drilling down.
The other thing I would do is step through with the callstack window open making sure that the stack looked okay and just noting when the stack goes haywire.
I finally narrowed down to the line that caused the bug, but it wasn't actually that line. It was the line before it.
So what was the cause for me? In short, I tried to memcpy from a NULL pointer into a valid area of memory.
I'm surprised that VS2005 can't catch this.
Anyways, hope that helps. Good luck.

I assume you aren't able to reproduce this reliably.
I've successfully used Rational Purify to hunt down a variety of memory problems in the past, but it costs $ and I'm not sure how it would interact with MMC.
Unless there's some sort of built-in memory debugger you may have to try solving this programmatically. Are you able to remove/disable chunks of functionality to see if the problem manifests itself?
If you have "guesses" about where the problem occurs you can try disabling/changing that code as well. Even if you changed the copy functions to _s versions, you still need to be able to reliably handle truncated data.

I got this overrun when I wanted to increment the value that a pointer variable points to and wrote:
*out_BMask++;
instead of
(*out_BMask)++;
where out_BMask was declared as int *out_BMask.
If you did something like me then I hope this will help you ;)
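To make the difference concrete, here is a small sketch of what the two expressions do. The function names are hypothetical. Postfix ++ binds tighter than unary *, so *p++ is parsed as *(p++): it advances the pointer and leaves the pointed-to value alone.

```cpp
#include <cassert>

// Equivalent of `(*p)++`: increments the int that `p` points to.
// The pointer itself does not move.
inline void increment_value(int* p) {
    assert(p != nullptr);
    (*p)++;
}

// Equivalent of `*p++`: reads the current element into `value_read`, then
// advances the pointer. The element itself is not modified. The advanced
// pointer is returned so that the effect is visible to the caller.
inline int* deref_then_advance(int* p, int& value_read) {
    value_read = *p++;
    return p;
}
```

With the unparenthesized form, any later write through the pointer lands one element past the intended target, which is how a simple typo can turn into an overrun.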

Related

Why would an incorrect memory allocation to a buffer only cause crashes when compiled in Release mode and not in Debug mode?

This is my first ever project that I've managed to complete so I'm a bit unsure of how to reference an executable vs a project being worked on and debugged in "debug mode" or whether there's multiple ways to do so etc, etc.
To be more specific, however, I encountered a heap corruption issue that only occurred when Visual Studio 2019 had been set to Release Mode, spit out the "exe" version of my program, and then went through its first debugging session in that form. It turns out (I'm probably wrong, but this is the last thing I changed before the issue completely disappeared) that the following code:
std::unique_ptr<std::vector<Stat>> getSelStudStats(HWND listboxcharnames) {
    std::unique_ptr<std::vector<Stat>> selStats = std::make_unique<std::vector<Stat>>();
    int pos = ListBox_GetCurSel(listboxcharnames);
    int len = ListBox_GetTextLen(listboxcharnames, pos);
    // ++len makes room for the terminating null character.
    // The buffer must be writable, so it is not declared const.
    wchar_t* buffer = new wchar_t[++len];
    ListBox_GetText(listboxcharnames, pos, buffer);
    for (int i = 0; i < getSize(); i++) {
        Character character = getCharacterPtr(i);
        std::wstring name = character.getName();
        if (name.compare(buffer) == 0) {
            *selStats = character.getAllStats();
            break;
        }
    }
    delete[] buffer;  // release the buffer on every path, before returning
    return selStats;
}
was not assigning the correct size to the buffer variable through len. By adding the prefix increment operator to len, the null terminator character that comes along with the list box's text is now accounted for. Consequently, the heap corruption error stopped occurring.
While I'm glad to have figured out the issue, I don't know why VS2019 didn't bring this issue up in Debug Mode. In attempting to debug the issue, I've learned that optimizations in Release Mode can change the structure and order of code execution.
Is there something in this block of code that would create the error I had, but only in Release Mode/executable form?
EDITED: I removed the asterisks that were originally surrounding ++len in my attempt to highlight the change that I reference making. Apologies for the confusion it, understandably, caused.
Docs explain the behavior:
When you request a memory block, the debug heap manager allocates from the base heap a slightly larger block of memory than requested and returns a pointer to your portion of that block. For example, suppose your application contains the call: malloc( 10 ). In a Release build, malloc would call the base heap allocation routine requesting an allocation of 10 bytes. In a Debug build, however, malloc would call _malloc_dbg, which would then call the base heap allocation routine requesting an allocation of 10 bytes plus approximately 36 bytes of additional memory.
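So in a Debug build the one-element overrun lands in the extra memory that the debug heap adds around your block, not in memory that belongs to something else. It may still cause other bugs later (though that is unlikely for an overrun this small).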
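The sizing rule behind the ++len fix can be sketched in portable C++ as follows. This is only an illustration: text_length is a hypothetical stand-in for an API that, like ListBox_GetTextLen, reports the length without the terminating null, and copy_with_terminator is likewise a made-up name.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>
#include <vector>

// Hypothetical stand-in for a text source that reports its length WITHOUT
// the terminating null character.
inline std::size_t text_length(const std::wstring& text) {
    return text.size();
}

// Copies `text` into a freshly sized buffer. The buffer holds length + 1
// elements so that the terminating null has somewhere to go.
inline std::vector<wchar_t> copy_with_terminator(const std::wstring& text) {
    const std::size_t len = text_length(text);
    std::vector<wchar_t> buffer(len + 1);
    std::wmemcpy(buffer.data(), text.c_str(), len + 1);  // copies the null too
    assert(buffer.back() == L'\0');
    return buffer;
}
```

Using std::vector here also means the buffer is released automatically on every return path.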

How to debug segmentation fault?

It works when, in the loop, I set every element to 0 or to entry_count-1.
It works when I set it up so that entry_count is small, and I write it by hand instead of by loop (sorted_order[0] = 0; sorted_order[1] = 1; ... etc).
Please do not tell me what to do to fix my code. I will not be using smart pointers or vectors for very specific reasons. Instead focus on the question:
What sort of conditions can cause this segfault?
Thank you.
---- OLD -----
I am trying to debug code that isn't working on a unix machine. The gist of the code is:
int *sorted_array = (int*)memory;
// I know that this block is large enough
// It is allocated by malloc earlier
for (int i = 0; i < entry_count; ++i){
    sorted_array[i] = i;
}
There appears to be a segfault somewhere in the loop. Switching to debug mode, unfortunately, makes the segfault stop. Using cout debugging I found that it must be in the loop.
Next I wanted to know how far into the loop the segfault happened, so I added:
std::cout << i << '\n';
It showed the entire range it was supposed to be looping over, and there was no segfault.
With a little more experimentation I eventually created a string stream before the loop and wrote an empty string into it on each iteration of the loop, and there was no segfault.
I tried some other assorted operations trying to figure out what is going on. I tried setting a variable j = i; and stuff like that, but I haven't found anything that works.
Running valgrind the only information I got on the segfault was that it was a "General Protection Fault" and something about default response to 11. It also mentions that there's a Conditional jump or move depends on uninitialized value(s), but looking at the code I can't figure out how that's possible.
What can this be? I am out of ideas to explore.
This is clearly a symptom of invalid memory use within your program. It would be difficult to find by looking at your code snippet, as it is most likely a side effect of something else bad that has already happened.
However, you mentioned in your question that you are able to run your program under Valgrind and that the problem is reproducible, so you may want to run your program (a.out) like this:
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind attaches the debugger (GDB) to your program when the first memory error is detected, so that you can do live debugging. This should be the best possible way to understand and resolve your problem. Note that --db-attach was removed in Valgrind 3.11; newer releases offer the same workflow through the built-in gdbserver (the --vgdb and --vgdb-error options).
Once you have figured out the first error, fix it, rerun, and see what other errors are reported. Repeat these steps until Valgrind reports no errors.
However, you should avoid raw pointers in modern C++ programs and start using std::vector or std::unique_ptr, as others have suggested.
Valgrind and GDB are very useful.
The most recent one that I used was GDB. I like it because it showed me the exact line number where the segmentation fault occurred.
Here are some resources that can guide you on using GDB:
GDB Tutorial 1
GDB Tutorial 2
If you still cannot figure out how to use GDB with these tutorials, there are tons more on Google. Just search for debugging segmentation faults with GDB.
Good luck :)
That is hard. I have used the Valgrind tools to debug segfaults, and they usually pointed to the violations.
Your problem is likely freed memory that you are writing to, i.e. sorted_array goes out of scope or gets freed.
Adding more code hides this problem as data allocation shifts around.
After a few days of experimentation, I figured out what was really going on.
For some reason the machine segfaults on unaligned access. That is, the integers I was writing were not being written to memory boundaries that were multiples of four bytes. Before the loop I computed the offset and shifted the array up that much:
int offset = (4 - (uintptr_t)(memory) % 4) % 4;
memory += offset;
After doing this everything behaved as expected again.
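The offset computation above generalizes to any power-of-two alignment. Below is a minimal sketch; padding_for and align_for_int are hypothetical helper names, and the caller is assumed to have allocated enough spare bytes to absorb the padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes to skip so that `address` becomes a multiple of `alignment`.
// `alignment` must be a power of two (for example alignof(int)).
inline std::size_t padding_for(std::uintptr_t address, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return static_cast<std::size_t>((alignment - address % alignment) % alignment);
}

// Returns the first address at or after `memory` that is suitably aligned
// for int. Assumes the block has enough spare bytes for the padding.
inline int* align_for_int(unsigned char* memory) {
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(memory);
    memory += padding_for(address, alignof(int));
    return reinterpret_cast<int*>(memory);
}
```

Using alignof(int) instead of a hard-coded 4 keeps the sketch correct on platforms where int has a different alignment requirement. Memory that comes straight from malloc is already suitably aligned, so padding is only needed when the int array starts at some offset inside a larger block.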

Memory error: Dereference null pointer/ SSE misalignment

I'm compiling a program on remote linux server. The program compiled. However when I run it the program ends abruptly. So I debugged the program using DDT. It spits out the following error:
Process 0:
Memory error detected in ClassName::function (filename.cpp:6462).
Thread 1 attempted to dereference a null pointer or execute an SSE instruction with an
incorrectly aligned memory address (the latter may sometimes occur spuriously if guard
pages are enabled)
Tip: Use the stack list and the local variables to explore your program's current
state and identify the source of the error.
Can anyone please tell me what exactly this error means?
The line where the program stops looks like this:
SumUtility = ParaEst[0] + hhincome * ParaEst[71] + IsBlack * ParaEst[61] + IsBachAss * (ParaEst[55]);
This is within a switch case.
These are the variable types
vector<double> ParaEst;
double hhincome;
int IsBlack, IsBachAss;
Thanks for the help!
It means one of the following:
ParaEst is NULL or a bad pointer.
ParaEst's individual array values are not aligned to 16-byte boundaries, as required for SSE.
hhincome, IsBlack, or IsBachAss are not aligned to 16-byte boundaries and are SSE type values.
SumUtility is not aligned to 16 bytes and is an SSE type field.
If you could post the assembly code of the exact line that failed, along with the register values at that instruction, we could tell you exactly which of the above conditions has failed. It would also help to see the type of each variable to narrow down the root cause.
Ok... The problem finally got fixed.
The issue was that the expression where the code was breaking down was in a newly defined function. For some weird reason, running the makefile did not pick up these changes and the build still used the previously compiled .o file. This resulted in garbage values being assigned to the variables within the new function. To top things off, the program calls this function as its first step, hence the systematic breakdown. The technical aspect of this was what Michael alluded to.
After this, I would always recommend having a make clean target in the makefile. Why make failed to recompile the modified source file is an issue that definitely warrants further discussion.
Thanks for the responses!!

realloc crashing in previously stable function

Apparently this function in SDL_Mixer keeps dying, and I'm not sure why. Does anyone have any ideas? According to visual studio, the crash is caused by Windows triggering a breakpoint somewhere in the realloc() line.
The code in question is from the SVN version of SDL_Mixer specifically, if that makes a difference.
static void add_music_decoder(const char *decoder)
{
    void *ptr = realloc(music_decoders, num_decoders * sizeof (const char **));
    if (ptr == NULL) {
        return; /* oh well, go on without it. */
    }
    music_decoders = (const char **) ptr;
    music_decoders[num_decoders++] = decoder;
}
I'm using Visual Studio 2008, and music_decoders and num_decoders are both correct (music_decoders contains one pointer, to the string "WAVE", and music_decoders. ptr is 0x00000000, and the best I can tell, the crash seems to be in the realloc() function. Does anyone have any idea how I could handle this crash problem? I don't mind having to do a bit of refactoring in order to make this work, if it comes down to that.
For one thing, it's not valid to allocate an array of num_decoders pointers, and then write to index num_decoders in that array. Presumably the first time this function was called, it allocated 0 bytes and wrote a pointer to the result. This could have corrupted the memory allocator's structures, resulting in a crash/breakpoint when realloc is called.
Btw, if you report the bug, note that add_chunk_decoder (in mixer.c) is broken in the same way.
I'd replace
void *ptr = realloc(music_decoders, num_decoders * sizeof (const char **));
with
void *ptr = realloc(music_decoders, (num_decoders + 1) * sizeof(*music_decoders));
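Put together, the corrected function might look like the sketch below. It is not the actual SDL_Mixer fix: the globals are stand-ins for the ones in the question, and the bool return value is my own addition so that a failed allocation is visible to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Stand-ins for the globals used by add_music_decoder in the question.
static const char** music_decoders = nullptr;
static int num_decoders = 0;

// Grows the array by one slot BEFORE writing to it. Returns false if the
// allocation fails, in which case the existing array is left untouched.
static bool add_music_decoder(const char* decoder) {
    const std::size_t new_count = static_cast<std::size_t>(num_decoders) + 1;
    void* ptr = std::realloc(music_decoders, new_count * sizeof(*music_decoders));
    if (ptr == nullptr) {
        return false;  // go on without it
    }
    music_decoders = static_cast<const char**>(ptr);
    music_decoders[num_decoders++] = decoder;  // index is now within bounds
    assert(static_cast<std::size_t>(num_decoders) == new_count);
    return true;
}
```

Writing sizeof(*music_decoders) instead of naming the type keeps the element size correct even if the array's type changes later.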
Make sure that the SDL_Mixer.DLL file and your program build are using the same C Runtime settings. It's possible that the memory is allocated using one CRT, and realloc'ed using another CRT.
In the project settings, look for C/C++ -> Code Generation. The Runtime Library setting there should be the same for both.
music_decoders[num_decoders++] = decoder;
You are one off here. If num_decoders is the size of the array then the last index is num_decoders - 1. Therefore you should replace the line with:
music_decoders[num_decoders-1] = decoder;
And you may want to increment num_decoders at the beginning of the function, not at the end, since you want to realloc for the new size, not for the old one.
One additional thing: you want to multiply the size with sizeof (const char *), not with double-star.
Ah, the joys of C programming. A crash in realloc (or malloc or free) can be triggered by writing past the bounds of a memory block -- and this can happen anywhere else in your program. The approach I've used in the past is some flavor of debugging malloc package. Before jumping in with a third party solution, check the docs to see if Visual Studio provides anything along these lines.
Crashes are not generally triggered by breakpoints. Are you crashing, breaking due to a breakpoint or crashing during the handling of the breakpoint?
The debug output window should have some information as to why a CRT breakpoint is being hit. For example, it might notice during the memory operations that guard bytes around the original block have been modified (due to a buffer overrun that occurred before add_music_decoder was even invoked). The CRT will check these guard bytes when memory is freed and possibly when it is reallocated too.

Crash within CString

I am observing a crash within my application and the call stack shows below
mfc42u!CString::AllocBeforeWrite+5
mfc42u!CString::operator=+22
I have no idea why this is occurring. It also does not occur frequently.
Any suggestions would help. I have the crash dump with me but am not able to progress any further.
The operation I am performing is something like this:
iParseErr += m_RawMessage[wMsgLen-32] != NC_SP;
where m_RawMessage is a 512 length char array.
wMsgLen is unsigned short
and NC_SP is defined as
#define NC_SP 0x20 // Space
EDIT:
Call Stack:
042afe3c 5f8090dd mfc42u!CString::AllocBeforeWrite+0x5 * WARNING: Unable to verify checksum for WP Communications Server.exe
042afe50 0045f0c0 mfc42u!CString::operator=+0x22
042aff10 5f814d6b WP_Communications_Server!CParserN1000::iCheckMessage(void)+0x665 [V:\CSAC\SourceCode\WP Communications Server\HW Parser N1000.cpp # 1279]
042aff80 77c3a3b0 mfc42u!_AfxThreadEntry+0xe6
042affb4 7c80b729 msvcrt!_endthreadex+0xa9
042affec 00000000 kernel32!BaseThreadStart+0x37
Well, this is the complete call stack, and I have posted the code snippet as in my original message.
Thanks
I have a suggestion that might be a little frustrating for you:
CString::AllocBeforeWrite suggests to me that the system is trying to allocate some memory.
Could it be that some other memory operation (especially freeing or resizing of memory) went wrong earlier?
A typical problem with C/C++ memory management is that an error when freeing (or resizing) memory, for example freeing the same chunk of memory twice, will not crash the system immediately but can cause crashes much later, especially when new memory is to be allocated.
Your situation looks to me quite like that.
The bad thing is:
It can be very difficult to find the place where the real error occurs -- where the heap is corrupted in the first place.
This also can be the reason, why your problem only occurs once in a while. It could depend on some complicated situation beforehand.
I'm sure you'll have checked the obvious: wMsgLen >= 32
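For completeness, here is a sketch of what that check could look like. The helper names and the -1 return value are hypothetical; the 512-element size and the space constant come from the question. Note that wMsgLen is promoted to int before the subtraction, so a value below 32 yields a negative index that reads in front of the array.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kRawMessageSize = 512;  // size of the array in the question
const char NC_SP = 0x20;                  // space

// Hypothetical helper: reports whether `wMsgLen - 32` is a valid index into
// a buffer of `capacity` elements.
inline bool tail_index_in_range(unsigned short wMsgLen, std::size_t capacity) {
    return wMsgLen >= 32 && static_cast<std::size_t>(wMsgLen - 32) < capacity;
}

// Returns 1 if the character at wMsgLen - 32 is not a space, 0 if it is,
// and -1 if the index would fall outside the buffer.
inline int check_tail_char(const char (&raw)[kRawMessageSize], unsigned short wMsgLen) {
    if (!tail_index_in_range(wMsgLen, kRawMessageSize)) {
        return -1;
    }
    const int index = wMsgLen - 32;  // non-negative after the range check
    assert(index >= 0);
    return raw[index] != NC_SP ? 1 : 0;
}
```

An out-of-range read like this would not normally corrupt the heap by itself, so the earlier-corruption theory above still applies; the check just rules out one source of undefined behaviour.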