Debug stack corruption - C++

I am currently debugging a large project that has stack corruption somewhere: the application fails.
I would like to know how to find (debug) such stack-corrupting code with Visual Studio 2010.
Here's an example of code that causes stack problems; how would I find less obvious cases of this type of corruption?
void foo()
{
    int i = 10;
    int *p = &i;
    p[-2] = 100; // out-of-bounds write: clobbers whatever the stack holds below i
}
Update
Please note that this is just an example. I need to find such bad code in the current project.

There's one technique that can be very effective with these kinds of bugs, but it'll only work on a subset of them that has a few characteristics:
the corrupting value must be stable (i.e., as in your example, when the corruption occurs it's always 100), or at least something that can be readily identified in a simple expression
the corruption has to occur at a particular address on the stack
the corrupting value is unusual enough that you won't be hit with a slew of false positives
Note that the second condition may seem unlikely at first glance, because the stack can be used in so many different ways depending on what the program does at runtime. In practice, though, stack usage is generally pretty deterministic; the real difficulty is that a particular stack location can be used for so many different things, which brings you back to item #3 (false positives).
Anyway, if your bug has these characteristics, you should identify the stack address (or one of them) that gets corrupted, then set a memory breakpoint for a write to that address with a condition that causes it to break only if the value written is the corrupting value. In Visual Studio, you can do this by creating a "New Data Breakpoint..." in the Breakpoints window and then right-clicking the breakpoint to set the condition.
If you end up getting too many false positives, it might help to narrow the scope of the breakpoint by leaving it disabled until some point in the execution path that's closer to the bug (if you can identify such a time), or set the hit count high enough to remove most of the false positives.
An additional complication is that the stack address may change from run to run - in this case, you'll have to take care to set the breakpoint afresh on each run (the lower bits of the address should be the same).
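If you don't yet know which address to watch, one low-tech way to find it is to print the address of a local variable that keeps getting clobbered, then arm the data breakpoint on the next run. A minimal sketch (victim and guard are made-up names for illustration):

#include <cstdio>

void victim()
{
    int guard = 0x1234;                          // expected to stay intact
    std::printf("&guard = %p\n", (void*)&guard); // address to arm the data breakpoint on
    // ... run the code suspected of corrupting the stack ...
    if (guard != 0x1234)
        std::printf("guard clobbered: now 0x%X\n", guard);
}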

I believe your question quotes an example of stack corruption, and that what you are asking is not why it crashes.
If so: it crashes because the code has undefined behavior; the index -2 points outside the object, to an unknown memory location.
To answer the question on profiling your application:
You can use Rational Purify Plus for Visual Studio to check for memory overwrites and access errors.

This is UB: p[-2] = 100;
You can access p with operator[] in this way (p[i]), but in this case -2 is an invalid index. So p[-2] refers to an invalid memory location and causes undefined behaviour.
To find it, you should debug your app and find where it crashes; hopefully it'll be at a place where something is actually wrong.

Related

How to debug segmentation fault?

It works when, in the loop, I set every element to 0 or to entry_count-1.
It works when I set it up so that entry_count is small, and I write it by hand instead of by loop (sorted_order[0] = 0; sorted_order[1] = 1; ... etc).
Please do not tell me what to do to fix my code. I will not be using smart pointers or vectors for very specific reasons. Instead focus on the question:
What sort of conditions can cause this segfault?
Thank you.
---- OLD -----
I am trying to debug code that isn't working on a unix machine. The gist of the code is:
int *sorted_array = (int*)memory;
// I know that this block is large enough
// It is allocated by malloc earlier
for (int i = 0; i < entry_count; ++i) {
    sorted_array[i] = i;
}
There appears to be a segfault somewhere in the loop. Switching to debug mode, unfortunately, makes the segfault stop. Using cout debugging I found that it must be in the loop.
Next I wanted to know how far into the loop the segfault happened, so I added:
std::cout << i << '\n';
It showed the entire range it was supposed to be looping over, and there was no segfault.
With a little more experimentation I eventually created a string stream before the loop and wrote an empty string into it for each iteration of the loop, and there was no segfault.
I tried some other assorted operations trying to figure out what is going on. I tried setting a variable j = i; and stuff like that, but I haven't found anything that works.
Running valgrind, the only information I got on the segfault was that it was a "General Protection Fault" and something about the default response to signal 11. It also mentions a "Conditional jump or move depends on uninitialized value(s)", but looking at the code I can't figure out how that's possible.
What can this be? I am out of ideas to explore.
This is clearly a symptom of invalid memory use within your program. It would be a bit difficult to find just by looking at your code snippet, as it is most likely the side effect of something else bad that has already happened.
However, you have mentioned that the problem is reproducible and that you are able to run your program under Valgrind, so you may want to attach your program (a.out) like this:
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind will drop your program into the debugger (GDB) when the first memory error is detected, so that you can do live debugging. This should be the best possible way to understand and resolve your problem.
Once you are able to figure out your first error, fix it, rerun, and see what other errors you get. Repeat these steps until Valgrind reports no errors.
However, you should avoid using raw pointers in modern C++ programs and start using std::vector or std::unique_ptr, as suggested by others as well.
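A minimal sketch of what that could look like for the loop in the question (entry_count is taken from the question; this is illustrative, not a drop-in fix):

#include <vector>

std::vector<int> sorted_array(entry_count); // owns and sizes the storage
for (int i = 0; i < entry_count; ++i)
    sorted_array.at(i) = i;                 // .at() throws std::out_of_range instead of corrupting memory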
Valgrind and GDB are very useful.
The one I used most recently was GDB - I like it because it showed me the exact line number the segmentation fault was on.
Here are some resources that can guide you on using GDB:
GDB Tutorial 1
GDB Tutorial 2
If you still cannot figure out how to use GDB with these tutorials, there are tons on Google! Just search debugging Segmentation Faults with GDB!
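In case it helps, a typical session looks something like this (myprog and some_variable are placeholders for your binary and your data):

$ g++ -g -O0 myprog.cpp -o myprog   # build with debug symbols, no optimization
$ gdb ./myprog
(gdb) run                           # reproduce the segfault
(gdb) backtrace                     # shows the faulting line and the call chain
(gdb) frame 0                       # select the innermost frame
(gdb) print some_variable           # inspect program state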
Good luck :)
That is hard. I used valgrind tools to debug segfaults, and it usually pointed right at the violations.
Likely your problem is that you are writing to freed memory, i.e. sorted_array goes out of scope or its block gets freed.
Adding more code hides this problem as data allocation shifts around.
After a few days of experimentation, I figured out what was really going on.
For some reason the machine segfaults on unaligned access. That is, the integers I was writing were not going to addresses that were multiples of four bytes. Before the loop I computed the offset and shifted the array up by that much:
int offset = (4 - (uintptr_t)(memory) % 4) % 4;
memory += offset;
After doing this everything behaved as expected again.
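For reference, a self-contained version of that fix might look like the following sketch (align_for_int is a made-up helper name):

#include <cstdint>

// Round a raw buffer pointer up to the next 4-byte boundary before
// using it as an int array; assumes the block has room to absorb the
// up-to-3-byte offset.
int* align_for_int(char* memory)
{
    int offset = (4 - (std::uintptr_t)memory % 4) % 4; // 0 if already aligned
    return (int*)(memory + offset);
}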

What do the memory operations malloc and free exactly do?

Recently I ran into a memory release problem. First, below is the C code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *p = (int*) malloc(5 * sizeof(int));
    int i;
    for (i = 0; i < 5; i++)
        p[i] = i;
    p[i] = i; /* i is now 5: this writes one element past the allocation */
    for (i = 0; i < 6; i++)
        printf("[%p]:%d\n", p + i, p[i]);
    free(p);
    printf("The memory has been released.\n");
}
Apparently, there is a memory out-of-range problem. When I use the VS2008 compiler, it gives the following output and some errors about the memory release:
[00453E80]:0
[00453E84]:1
[00453E88]:2
[00453E8C]:3
[00453E90]:4
[00453E94]:5
However, when I use gcc 4.7.3 under Cygwin, I get the following output:
[0x80028258]:0
[0x8002825c]:1
[0x80028260]:2
[0x80028264]:3
[0x80028268]:4
[0x8002826c]:51
The memory has been released.
Apparently the code runs normally, but 5 is not written to that memory location.
So there may be some differences between VS2008 and gcc in handling these problems.
Could you give me a professional explanation of this? Thanks in advance.
This is normal, as you never allocated the memory at p[5]. The program will just print whatever data happens to be stored in that space.
There's no deterministic explanation for this. Writing data into the uncharted territory past the allocated memory limit causes undefined behavior. The behavior is unpredictable. That's all there is to it.
It is still strange, though, to see that 51 printed there. Typically GCC will also print 5 but then fail with a memory corruption message at free. How you managed to make this code print 51 is not exactly clear. I strongly suspect that the code you posted is not the code you ran.
It seems that you have multiple questions, so, let me try to answer them separately:
As pointed out by others above, you write past the end of the array, so once you have done that you are in "undefined behavior" territory, and this means that anything could happen, including printing 5, 6, or 0xdeadbeef, or blowing up your PC.
In the first case (VS2008), free appears to report an error message on standard output. It is not obvious to me what this error message is, so it is hard to explain exactly what is going on, but you asked later in a comment how VS2008 could know the size of the memory you release. Typically, if you allocate memory and store the result in a pointer p, many memory allocators (the malloc/free implementation) store the size of the allocation at p[-1]. In practice, it is also common to store a special value (say, 0xdeadbeef) at address p[p[-1]]. This "canary" is checked upon free to see whether you have written past the end of the array. To summarize, your 5*sizeof(int) array is probably at least 5*sizeof(int) + 2*sizeof(char*) bytes long, and the memory allocator used by code compiled with VS2008 has quite a few checks built in.
In the case of gcc, I find it surprising that you get 51 printed. If you want to investigate exactly why, I would recommend getting an asm dump of the generated code, as well as running this under a debugger to check whether 5 is actually written past the end of the array (gcc could well have decided not to generate that store because it is "undefined"), and if it is, putting a watchpoint on that memory location to see what overwrites it, when, and why.
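To make the canary idea from point 2 concrete, here is a toy sketch (debug_alloc and debug_free are invented names; real allocators lay their headers out differently):

#include <cstdio>
#include <cstdlib>
#include <cstring>

// Toy version of the checks described above; not any real runtime's layout.
void* debug_alloc(std::size_t n)
{
    unsigned char* raw = (unsigned char*)std::malloc(sizeof(std::size_t) + n + 4);
    std::memcpy(raw, &n, sizeof(std::size_t));            // size header (the "p[-1]" idea)
    std::memset(raw + sizeof(std::size_t) + n, 0xDB, 4);  // canary just past the user block
    return raw + sizeof(std::size_t);
}

void debug_free(void* p)
{
    unsigned char* raw = (unsigned char*)p - sizeof(std::size_t);
    std::size_t n;
    std::memcpy(&n, raw, sizeof(std::size_t));
    unsigned char* canary = (unsigned char*)p + n;
    for (int i = 0; i < 4; ++i)
        if (canary[i] != 0xDB)                            // overwritten: buffer overrun
            std::fprintf(stderr, "heap corruption past block %p\n", p);
    std::free(raw);
}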

Memory error: Dereference null pointer/ SSE misalignment

I'm compiling a program on a remote Linux server. The program compiles, but when I run it the program ends abruptly. So I debugged it using DDT, which spits out the following error:
Process 0:
Memory error detected in ClassName::function (filename.cpp:6462).
Thread 1 attempted to dereference a null pointer or execute an SSE instruction with an
incorrectly aligned memory address (the latter may sometimes occur spuriously if guard
pages are enabled)
Tip: Use the stack list and the local variables to explore your program's current
state and identify the source of the error.
Can anyone please tell me what exactly this error means?
The line where the program stops looks like this:
SumUtility = ParaEst[0] + hhincome * ParaEst[71] + IsBlack * ParaEst[61] + IsBachAss * (ParaEst[55]);
This is within a switch case.
These are the variable types
vector<double> ParaEst;
double hhincome;
int IsBlack, IsBachAss;
Thanks for the help!
It means one of the following:
ParaEst is NULL or a bad pointer
ParaEst's individual array values are not aligned to 16-byte boundaries, as required for SSE.
hhincome, IsBlack, or IsBachAss is an SSE-type value that is not aligned to a 16-byte boundary.
SumUtility is an SSE-type field that is not aligned to 16 bytes.
If you could post the assembly code of the exact line that failed, along with the register values at that instruction, we could tell you exactly which of the above conditions failed. It would also help to see the type of each variable involved, to help narrow down the root cause.
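For context, this is the class of fault the error message describes - aligned SSE loads trap on addresses that aren't 16-byte aligned. A sketch (C++11; nothing here is from the original program):

#include <emmintrin.h>  // SSE2 intrinsics

int main()
{
    alignas(16) double buf[4] = {1, 2, 3, 4};
    __m128d ok   = _mm_load_pd(buf);       // aligned load: buf is 16-byte aligned
    __m128d safe = _mm_loadu_pd(buf + 1);  // unaligned data needs the loadu variant
    // _mm_load_pd(buf + 1) would fault: that address is only 8-byte aligned
    (void)ok; (void)safe;
    return 0;
}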
OK... the problem finally got fixed.
The issue was that the expression where the code was breaking down was in a newly defined function. However, for some weird reason running the makefile did not pick up these changes, and the build was still using the previously compiled .o file. This resulted in garbage values being assigned to the variables within this new function. To top things off, the program calls this function as a first step, hence the systematic breakdown. The technical aspect of this was what Michael alluded to.
After this I would always recommend having a clean target in the makefile and using it after changes like this. Why running the makefile failed to recompile the modified source file is an issue that definitely warrants further discussion.
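As a generic illustration (file names invented; recipe lines must be tab-indented), a clean target plus explicit header dependencies avoids exactly this stale-object problem:

OBJS = main.o parser.o

prog: $(OBJS)
	$(CXX) -o $@ $(OBJS)

main.o: main.cpp parser.h   # listing parser.h forces a rebuild when it changes
	$(CXX) -c main.cpp

clean:                      # safety net: wipe stale objects, then rebuild from scratch
	rm -f prog $(OBJS)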
Thanks for the responses!!

The value of out-of-bounds element of an array - should it change on each execution or stay the same?

For the actual question, jump to question part. For an interesting real-world example of undefined behavior, keep reading :)
There was this enumeration:
struct EnumStruct
{
enum Enum
{
val0 = 0,
val1,
val2,
val3,
val4
};
};
and in some function we had this:
const int arrayCount = 6;
int arr[] = {
    EnumStruct::val0,
    EnumStruct::val1,
    EnumStruct::val2,
    EnumStruct::val3,
    EnumStruct::val4
    InvalidValue
};
Then there was a loop that wrote arrayCount elements of arr into a file. This was the Prepare() routine for the unit tests, and the unit test was supposed to check for the presence of the InvalidValue in the file. I was assigned a defect stating that the unit test fails. It worked perfectly on my machine, though. After a couple of hours of debugging I noticed that InvalidValue is actually #defined as -1, and that there is a missing comma after val4. You can only imagine the swearwords that came out of my mouth, directed at whoever wrote that code (which had worked perfectly for more than 3 years, actually).
Now, as you can see, the array is actually comprised of 5 values - 0, 1, 2, 3, 3 (with the missing comma, the last initializer parses as EnumStruct::val4 InvalidValue, i.e. 4 - 1) - but the loop writes the 6th element to the file as well, which is of course undefined behavior. Now, while technically it is undefined, on Windows with MSVC no crash happens - it just writes whatever garbage is at that memory location. The thing is that if the garbage happens to be anything but 0, 1, 2, 3, or 4, the unit test will succeed.
Question: It appears that the .vcproj file of the UT's is somehow patched before the UT's are built. I don't know how they do it, but with their build out-of-bounds array elements are always 0. It seems to me that the whole virtual memory is set to 0 before program execution. What project setting is that? Or am I imagining things?
I mean, if it was just luck that there was a 0 sitting out-of-bound of an array, then upon multiple executions my luck would fail, wouldn't it? But it is always 0... I am confused.
By the way, when I build the same project myself, the out-of-bounds element has a different value on each execution. Can you please explain this? Thanks.
As to the actual question of whether memory is always 0 at the start: it might depend. In general, when the OS provides you with a page of memory it will be cleared (as a security measure, so that you cannot read the values that some other process had in its memory), so in many cases uninitialized memory can look like 0 - until memory is reused within your own process, where you will get whatever you wrote before.
There are also some compiler flags that can affect this. To detect uninitialized-memory issues, debug builds will sometimes write a pattern to memory after allocating it from the OS and before handing it to the program, and a different pattern after the program releases the memory and before it is reallocated (to detect access to freed memory), so that it is simpler to identify what happened (if you see in a debugger that the value is 0xDEADBEEF, you know that the memory has already been released by the program). You will have to read the compiler/IDE documentation for more precise information.
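A little experiment along those lines (it deliberately reads indeterminate values, so it is only a demonstration; the exact output depends on the OS and allocator):

#include <cstdio>
#include <cstdlib>

int main()
{
    // A fresh page from the OS is typically zero-filled (security measure).
    int* a = (int*)std::malloc(1 << 20);
    std::printf("fresh block:  %d\n", a[1000]);  // very often 0
    a[1000] = 42;
    std::free(a);

    // Memory recycled inside the same process keeps old contents
    // (or a debug-heap fill pattern), so "always 0" is not guaranteed.
    int* b = (int*)std::malloc(1 << 20);
    std::printf("reused block: %d\n", b[1000]);  // may be 42, a pattern, or 0
    std::free(b);
    return 0;
}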
As you say, it's undefined, so the implementor is free to do whatever they like. I can't speak for Visual C++ at all, but I am aware of other products that do things like zero out memory when running debug builds so that things like invalid pointer dereferences will fail at the site of the fault. It's possible that Microsoft is doing something similar I guess.

Crash within CString

I am observing a crash within my application; the call stack is shown below:
mfc42u!CString::AllocBeforeWrite+5
mfc42u!CString::operator=+22
I have no idea why this is occurring, and it does not occur frequently either.
Any suggestions would help. I have the crash dump with me but am not able to progress any further.
The operation I am performing is something like this:
iParseErr += m_RawMessage[wMsgLen-32] != NC_SP;
where m_RawMessage is a 512-element char array,
wMsgLen is an unsigned short,
and NC_SP is defined as
#define NC_SP 0x20 // Space
EDIT:
Call Stack:
042afe3c 5f8090dd mfc42u!CString::AllocBeforeWrite+0x5 * WARNING: Unable to verify checksum for WP Communications Server.exe
042afe50 0045f0c0 mfc42u!CString::operator=+0x22
042aff10 5f814d6b WP_Communications_Server!CParserN1000::iCheckMessage(void)+0x665 [V:\CSAC\SourceCode\WP Communications Server\HW Parser N1000.cpp # 1279]
042aff80 77c3a3b0 mfc42u!_AfxThreadEntry+0xe6
042affb4 7c80b729 msvcrt!_endthreadex+0xa9
042affec 00000000 kernel32!BaseThreadStart+0x37
Well, this is the complete call stack, and I have posted the code snippet in my original message.
Thanks
I have a suggestion that might be a little frustrating for you:
CString::AllocBeforeWrite indicates to me that the system is trying to allocate some memory.
Could it be that some earlier memory operation (especially freeing or resizing memory) corrupted the heap?
A typical problem with C/C++ memory management is that an error on freeing (or resizing) memory (for example, freeing the same chunk of memory twice) will not crash the system immediately, but can cause crashes much later - especially when new memory is to be allocated.
Your situation looks quite like that to me.
The bad thing is:
It can be very difficult to find the place where the real error occurs - where the heap is corrupted in the first place.
This can also be why your problem only occurs once in a while: it could depend on some complicated situation beforehand.
I'm sure you'll have checked the obvious: wMsgLen >= 32
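If not, note that because wMsgLen is an unsigned short, wMsgLen - 32 is computed as a plain int, so a short message produces a negative index into m_RawMessage. A guarded version might look like this sketch (512 taken from the question's array size):

if (wMsgLen >= 32 && wMsgLen - 32 < 512)   // 512 == sizeof(m_RawMessage)
    iParseErr += m_RawMessage[wMsgLen - 32] != NC_SP;
// else: the message length is malformed - log it rather than index out of bounds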