How to prevent stack corruption? - c++

I'm trying to debug a segfault in a native app for Android.
GDB shows the following:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 5200]
0xbfcc6744 in ?? ()
(gdb) bt
#0 0xbfcc6744 in ?? ()
#1 0x5cfb5458 in WWMath::unProject (x=2.1136094475592566, y=472.2994384765625, z=0, mvpMatrix=#0x0,
viewport=#0x0, result=#0x0) at jni/src/core/util/WWMath.cpp:118
#2 0x00000000 in ?? ()
Is it possible to get a good stack trace? Or to find the place where the stack was corrupted?
UPD:
The function mentioned takes references:
bool WWMath::unProject(double x, double y, double z, const Matrix &mvpMatrix,
const Rect& viewport, Vec4& result)
and a reference to a simple local variable is passed as the last argument:
Vec4 far, near;
if (!unProject(x, y, 0, tMvp, viewport, near))

We don't have much information to go by! There is no general rule to avoid memory corruption except to be careful with addressing.
But it looks to me like you overflowed an array of floats, because the bogus address 0xbfcc6744 equates to a reasonable float value, -1.597, which is in line with the other values reported by GDB.
Overwriting the return address caused execution to jump to that value, so look specifically at the caller of WWMath::unProject, whose locals precede its return address, to find the offending buffer. (And now we have it: near.)
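For illustration only, here is a minimal sketch (the array name, size and loop are hypothetical, not taken from the WWMath code) of the kind of off-by-one that produces this pattern:

/* Writing one float past the end of a local array lands just beyond the array,
   where, depending on the frame layout, the compiler keeps saved registers or
   the saved return address; returning then jumps to a float bit pattern. */
void fill_corners(void)
{
    float corners[8];
    for (int i = 0; i <= 8; i++)   /* off-by-one: the last iteration writes corners[8] */
        corners[i] = -1.597f;
}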

Compiling with -fstack-protector-all will cause your program to abort (with signal SIGABRT) when it returns from a function that corrupts the stack, if that corruption includes the area of the stack around the return address.
-fstack-protector-all isn't a great debugging tool, but it is easy to try and sometimes does catch problems like this one. Though it will not point you to the line that caused the problem, it will at least narrow it down to a single function. Once you have that information, you can step through that function in GDB to pinpoint the line in question.
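For example, in a plain GCC build it is just one extra flag (file names here are placeholders; in an ndk-build project the flag would go into the makefile's compiler flags, e.g. LOCAL_CFLAGS, instead):

$ g++ -g -fstack-protector-all myapp.cpp -o myapp   # hypothetical file names
$ gdb -ex run ./myapp                               # SIGABRT stops GDB with the smashed function still on the stack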

I solved this problem only by stepping line by line from the beginning of the suspicious code and looking for the moment when the stack gets corrupted. (It was ugly pointer arithmetic with two-dimensional arrays.)
It seems there was another way as well: put everything on the heap and hope that the incorrect operation causes a segfault there instead.

Related

When and how exactly is a segmentation fault of a C/C++ application reported and handled by the OS?

I'm failing to understand a specific scenario in which my C++ multi-threaded application (running on a Linux machine, Wind River 6.x) is facing a segmentation fault.
I know the concept of a segmentation fault and even went over this post and also this one, but failed to find a scenario similar to mine and/or an answer to my question, so I'm posting this one.
My code that generates the segmentation fault is as follows (abbreviated and simplified):
// MyStruct* pMyStruct is a function argument that arrives at the function and at some point in time
// is set to NULL
ASSERT_PTR_NE(pMyStruct, NULL); <--- this assertion is logged to my application log (meaning, at this line, pMyStruct is NULL)
int someInt = pMyStruct->someIntOfMyStruct; <--- this line does NOT create the segmentation fault
double someDouble = pMyStruct->someDoubleOfMyStruct; <--- this line ALSO does NOT create the segmentation fault
ASSERT_NUM_EQ(pMyStruct->someIntOfMyStruct, SOME_INT_VALUE_TO_CHECK); <--- this line DOES create the segmentation fault
As mentioned in the comments on the last code line, the 4th line of code is the last line my application executes (I guess): when examining the core file with GDB, frame 0 indicates that this line is the one that causes the crash.
If so, my questions are:
How come the 2nd and 3rd lines of code did not cause a segmentation fault?
What exactly takes place, system-wise (i.e. in the OS and in the application), from the moment the NULL pointer was accessed in the first line until the application is terminated by the OS?
Meaning, is it possible that the actual segmentation fault was indeed raised by the 1st line, yet, for some reason, until the OS actually decided and acted to terminate the application, lines 2-4 were also executed, and on reaching the 4th line the application raised the segmentation fault "again"?
Or is it possible that what actually took place here is an overrun of the pMyStruct variable: after the first line did its assert (and printed to the application's log file), another thread set pMyStruct to a non-NULL value, "allowing" lines 2-3 to run without crashing, and then, just before line 4 executed, pMyStruct was "overrun" by another thread and set back to NULL, this time causing line 4 to crash?
Typically, an OS creates a segmentation fault after the CPU faults on an address. The CPU doesn't know why the fault happened. It might be that the memory is paged out to disk, but for this question we're assuming a bad pointer. The OS knows it's a bad pointer because the address doesn't correspond to any paged-out memory. Hence, the OS tells the CPU it is handling the situation, and tells the CPU to continue execution in the signal handler.
The C++ null pointer isn't special to the CPU. It just so happens that the OS by convention does not allocate RAM at this address.
By C++ standards, your code has Undefined Behavior, and that allows "time travel". More accurately, to allow optimizations, compilers may shuffle around code in the assumption that Undefined Behavior does not happen. It would seem that lines 2 & 3 are shuffled after line 4. You can't detect this in a correct C++ program.
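As a sketch of what such reordering can look like (hypothetical struct and field names, not your actual code):

/* All the member accesses below are UB when p is NULL, so the optimizer is free
   to fuse or reorder the loads; the faulting instruction the OS reports need not
   correspond to the first access in the source. */
struct MyStruct { int some_int; double some_double; };

int read_fields(struct MyStruct *p)
{
    int    a = p->some_int;       /* "line 2" */
    double b = p->some_double;    /* "line 3" */
    int    c = p->some_int + 1;   /* "line 4": may end up being the access that faults */
    return a + (int)b + c;
}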
This is not how a typical CPU sees it, though. Modern CPUs also shuffle instructions around internally, like compilers do, but when the CPU reports the fault to the OS it will pretend that all instructions happened in the original order.

core dump on malloc_consolidate () from /lib64/libc.so.6 [duplicate]

I usually love well-explained questions and answers, but in this case I really can't give any more clues.
The question is: why is malloc() giving me SIGSEGV? The debug session below shows that the program has no time to test the returned pointer against NULL and exit. The program quits INSIDE malloc!
I'm assuming my malloc in glibc is just fine. I have a Debian/Linux wheezy system, updated, on an old Pentium (i386/i486 arch).
To be able to track it down, I generated a core dump. Let's follow it:
iguana$gdb xadreco core-20131207-150611.dump
Core was generated by `./xadreco'.
Program terminated with signal 11, Segmentation fault.
#0 0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
(gdb) bt
#0 0xb767fef5 in ?? () from /lib/i386-linux-gnu/libc.so.6
#1 0xb76824bc in malloc () from /lib/i386-linux-gnu/libc.so.6
#2 0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x...) at xadreco.c:4519
#3 0x0804b93a in geramov (tabu=..., nmovi=0xbfd411f8) at xadreco.c:1473
#4 0x0804e7b7 in minimax (atual=..., deep=1, alfa=-105000, bet...) at xadreco.c:2778
#5 0x0804e9fa in minimax (atual=..., deep=0, alfa=-105000, bet...) at xadreco.c:2827
#6 0x0804de62 in compjoga (tabu=0xbfd41924) at xadreco.c:2508
#7 0x080490b5 in main (argc=1, argv=0xbfd41b24) at xadreco.c:604
(gdb) frame 2
#2 0x080529c3 in enche_pmovi (cabeca=0xbfd40de0, pmovi=0x ...) at xadreco.c:4519
4519 movimento *paux = (movimento *) malloc (sizeof (movimento));
(gdb) l
4516
4517 void enche_pmovi (movimento **cabeca, movimento **pmovi, int c0, int c1, int c2, int c3, int p, int r, int e, int f, int *nmovi)
4518 {
4519 movimento *paux = (movimento *) malloc (sizeof (movimento));
4520 if (paux == NULL)
4521 exit(1);
Of course I need to look at frame 2, the last one on the stack related to my code. But line 4519 gives SIGSEGV! It never gets the chance to test, on line 4520, whether paux == NULL or not.
Here is "movimento" (abbreviated):
typedef struct smovimento
{
int lance[4]; //move in integer notation
int roque; // etc. ...
struct smovimento *prox;// pointer to next
} movimento;
This program can use a LOT of memory, and I know memory is near its limits. But I thought malloc would handle it better when memory is not available.
Running free -h during execution, I can see free memory going as low as 1 MB! That's expected: the old computer only has 96 MB, and 50 MB is used by the OS.
I don't know where to start looking. Maybe check the available memory BEFORE each malloc call? But that sounds like a waste of computer power, as malloc supposedly does that itself. sizeof(movimento) is about 48 bytes. If I test beforehand, at least I'll have some confirmation of the bug.
Any ideas, please share. Thanks.
Any crash inside malloc (or free) is an almost sure sign of heap corruption, which can come in many forms:
overflowing or underflowing a heap buffer
freeing something twice
freeing a non-heap pointer
writing to a freed block
etc.
These bugs are very hard to catch without tool support, because the crash often comes many thousands of instructions later, possibly after many more calls to malloc or free, in code that is in a completely different part of the program and very far from where the bug actually is.
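A minimal sketch of the first form, with illustrative sizes (nothing here is taken from xadreco):

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *p = malloc(16);
    memset(p, 0, 32);     /* writes 16 bytes past the block, trashing malloc's bookkeeping */
    free(p);              /* may pass without any visible problem ...                      */
    char *q = malloc(48); /* ... while a later malloc may walk the damaged chunk lists     */
    (void)q;              /*     and crash, e.g. inside malloc_consolidate()               */
    return 0;
}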
The good news is that tools like Valgrind or AddressSanitizer usually point you straight at the problem.
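For example (the AddressSanitizer flag needs a reasonably recent gcc or clang; Valgrind can run the unmodified binary):

$ gcc -g -fsanitize=address xadreco.c -o xadreco   # AddressSanitizer build: stops at the bad write with a report
$ ./xadreco
$ valgrind ./xadreco                               # or run the normal build under Memcheck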

How to debug a crash when the backtrace starts with zero

My long-running application crashes randomly with a segmentation fault. When trying to debug the generated core dump, I get stuck with a weird stack trace:
(gdb) bt full
#0 __memmove_ssse3 () at ../sysdeps/i386/i686/multiarch/memcpy-ssse3.S:2582
No locals.
#1 0x00000000 in ?? ()
No symbol table info available.
How can it happen that the backtrace starts at 0x00000000?
What can I do to debug this issue further? I can't run it in gdb, as it may take even a week until the crash occurs.
Generally this means that the return address on the stack has been overwritten with 0, probably due to overrunning the end of an on-stack array. You can try building with AddressSanitizer on gcc or clang (if you are using them), or you can try running under Valgrind to see if it will tell you about invalid memory writes.
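A sketch of the usual culprit behind a trace like this, given the __memmove_ssse3 frame (names and sizes are made up):

#include <string.h>

/* A memmove/memcpy whose size exceeds the destination array walks past it and
   over the saved return address; if the source bytes there happen to be zero,
   the "caller" in the backtrace shows up as 0x00000000. */
void parse_packet(const char *src, size_t n)
{
    char buf[64];
    memmove(buf, src, n);   /* unchecked: n > 64 tramples this stack frame */
}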

Segmentation fault - why and how does it work?

Both of the functions defined below try to allocate about 10 MB of memory on the stack, but the segmentation fault happens only in the second case and not in the first, and I am trying to understand why.
Function definition 1:
a(int *i)
{
char iptr[50000000];
*i = 1;
}
Function definition 2:
a()
{
char c;
char iptr[5000000];
printf("&c = 0x%lx, iptr = 0x%x ... ", &c, iptr);
fflush(stdout);
c = iptr[0];
printf("ok\n");
}
According to my understanding, local variables that are not allocated dynamically are stored in the stack section of the program. So I suppose that at compile time the compiler checks whether the variable fits on the stack or not.
Hence, if the above is true, then a segmentation fault should occur in both cases (i.e. also in case 1).
The website (http://web.eecs.utk.edu/courses/spring2012/cs360/360/notes/Memory/lecture.html) from which I picked this states that the segfault happens in the second version of a() when the code attempts to push iptr onto the stack for the printf call. This is because the stack pointer is pointing into the void. Had we not referenced anything at the stack pointer, our program should have worked.
I need help understanding this last statement and my earlier doubt related to this.
So I suppose, during compile time itself the compiler checks if the variable fits in the stack or not.
No, that cannot be done. When compiling a function, the compiler does not know what the call stack will be when the function is called, so it will assume that you know what you are doing (which might or not be the case). Also note that the amount of stack space may be affected by both compile time and runtime restrictions (in Linux you can set the stack size with ulimit on the shell that starts the process).
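For example, on the shell that starts the process (the numbers are arbitrary; bash's ulimit -s works in KiB):

$ ulimit -s            # print the current stack limit, commonly 8192 (8 MiB)
$ ulimit -s 16384      # raise it to 16 MiB for processes started from this shell
$ ulimit -s unlimited  # or remove the limit entirely, where permitted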
I need help understanding this last statement and my earlier doubt related to this.
I would not read too much into that statement. It is not standard, but rather based on knowledge of a particular implementation that is not even described there, and thus it is built on assumptions that are not necessarily true.
It assumes that the act of allocating the array does not 'touch' the allocated memory (in some debug builds of some implementations that is false), and thus that whether you attempt to allocate 1 byte or 100 MB, the allocation is fine as long as the data is never touched by your program -- this need not be the case.
It also assumes that the arguments of printf are passed on the stack (this is actually the case in all implementations I know of, due to the variadic nature of the function). Under the previous assumption, the array would overflow the stack (assuming a stack of less than 10 MB) but would not crash, since the memory is not accessed; however, to be able to call printf, the values of the arguments are pushed onto the stack beyond the array. That write lands beyond the space allocated for the stack and crashes.
Again, all this is implementation, not defined by the language.
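Under those assumptions, the difference can be reproduced with a sketch like the following (behaviour is implementation-dependent, which is the whole point):

/* The declaration alone only moves the stack pointer; on many implementations
   nothing faults until memory beyond the mapped stack (and its guard page) is
   actually read or written. */
void no_touch(void)
{
    char big[50000000];   /* stack pointer now points far below the mapped stack */
    (void)big;            /* never accessed: may well return without a fault */
}

void touch(void)
{
    char big[50000000];
    big[0] = 1;           /* first access to unmapped memory -> SIGSEGV */
}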
The error in your code is being thrown by the following code:
; Find next lower page and probe
cs20:
sub eax, _PAGESIZE_ ; decrease by PAGESIZE
test dword ptr [eax],eax ; probe page <--- this line throws the error
jmp short cs10
_chkstk endp
end
This comes from the chkstk.asm file, which provides stack checking on procedure entry, and that file explicitly defines:
_PAGESIZE_ equ 1000h
Now, as an explanation of your problem, this question tells you everything you need, as mentioned by Shafik Yaghmour.
Your printf format string assumes that pointers, ints (%x), and longs (%lx) are all the same size; this may be false on your platform, leading to undefined behavior. Use %p instead. I intended to make this a comment, but can't yet.
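In other words (with the casts to void * that %p formally requires):

printf("&c = %p, iptr = %p ... ", (void *)&c, (void *)iptr);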
I am surprised no one noticed that the first function allocates 10 times the space of the second function: there are seven zeros after the 5 in the first function, whereas the second function has six zeros after the 5 :-)
I compiled it with gcc-4.6.3 and got a segmentation fault on the first function but not on the second. After I removed the additional zero in the first function, the segfault went away; adding a zero in the second function introduced it. So at least in my case, the reason for this segfault is that the program could not allocate the required space on the stack. I would be happy to hear about observations that differ from the above.

Buffer array overflow in for loop in c

When would a program crash in a buffer overrun case?
#include<stdio.h>
#include<stdlib.h>
int main() {
    char buff[50];
    int i = 0;
    for (i = 0; i < 100; i++)
    {
        buff[i] = i;
        printf("buff[%d]=%d\n", i, buff[i]);
    }
    return 0;
}
What will happen to the first 50 bytes assigned, and when would the program crash?
I see on my Ubuntu system, compiled with gcc, that a.out crashes when i is 99:
>>
buff[99]=99
*** stack smashing detected ***: ./a.out terminated
Aborted (core dumped)
<<
I would like to know why it is not crashing when the assignment happens at buff[51] in the for loop.
It is undefined behavior. You can never predict when (or whether) it crashes, and you cannot rely on it 'not crashing' when writing an application.
Reasoning
The rationale is that there is no compile-time or run-time 'index out of bounds' checking on C arrays (that is present in STL vectors, or in arrays in other higher-level languages). So whenever your program accesses memory beyond the allocated range, what happens depends on whether it merely corrupts another field on your program's stack, touches memory that is not mapped for your program at all, or something else entirely; one can never predict a crash, which occurs only in the extreme cases. The program only crashes when it reaches a state that forces the OS to intervene, or when it is no longer possible for it to function correctly.
Example
Say you were inside a function call, and immediately next to your array was the RETURN address, i.e. the address your program uses to return to the function it was called from. Suppose you corrupted that; your program then tries to return to the corrupted value, which is not a valid address, and hence it crashes.
The worst case is when you silently modify another field's value and never discover what went wrong, because no crash occurred.
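Incidentally, this also explains the output above: Ubuntu's gcc enables a stack-protector canary by default (which is why you get the message without asking for it), and the canary sitting next to the return address is only checked when main returns. So all 100 writes complete, buff[99] is printed, and only then does __stack_chk_fail() abort with "stack smashing detected". A quick way to see the difference (buff.c is a placeholder file name):

$ gcc -fstack-protector-all buff.c -o a.out   # canary checked on return: "*** stack smashing detected ***"
$ gcc -fno-stack-protector buff.c -o a.out    # no canary: may crash elsewhere, or appear to "work"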
Since it seems you have allocated the buffer on the stack, the app will possibly crash on the first occasion something you overwrote is actually used, most likely when returning after the for loop... at least that's how it's supposed to be in theory.