What is a buffer overflow and how do I cause one? - c++

I have heard about a buffer overflow and I would like to know how to cause one.
Can someone show me a small buffer overflow example?
Edit: And what are they used for?

Classical example of a buffer-overflow:
// no one will ever have the time to type more than 64 characters...
char buf[64];
gets(buf); // let user put his name
A buffer overflow on its own is usually not deliberate. It most often results from a so-called "off-by-one" error: you miscalculated the array size by one, perhaps because you forgot to account for the terminating null character, or for some other reason.
But it can also be exploited. Suppose the user has long known about this hole and enters, say, 70 characters, with the last ones containing special bytes that overwrite a stack slot. A really crafty user will hit the return-address slot on the stack and overwrite it so that execution jumps into the buffer just filled, because what the user entered was not a name but shellcode compiled and dumped out beforehand. That code then simply gets executed. There are some obstacles. For example, the binary code must not contain a "\n", because gets would stop reading there. With other dangerous string functions, a zero byte is the problem, because string functions stop copying at it. People have worked around that by XORing a value with itself, which produces a zero at run time without a zero byte appearing in the code.
That's the classic way of doing it. However, there are protections that can detect such overwrites, and others that make the stack non-executable. I guess there are far better tricks than the one I just explained; an assembly programmer could probably tell you long stories about that :)
How to avoid it
Always use functions that also take a maximum-length argument if you are not 100% sure that a buffer is really large enough. Don't play games such as "oh, the number will not exceed 5 characters"; it will fail some day. Remember the rocket whose engineers assumed that a value would never exceed a certain magnitude because the rocket would never be that fast. One day it was faster, the result was an integer overflow, and the rocket crashed (this was the Ariane 5 bug, one of the most expensive software bugs in history).
For example, use fgets instead of gets, and snprintf instead of sprintf where suitable and available (or just use C++ facilities such as istream).
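To make the advice above concrete, here is a minimal sketch of both replacements. The helper names read_line and format_greeting are made up for this illustration; only fgets, snprintf and strcspn are standard library calls.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Reads one line from `in` into dst (capacity cap) and drops the newline.
   Returns 1 on success, 0 on end-of-file or error. */
int read_line(FILE *in, char *dst, size_t cap)
{
    assert(in != NULL && dst != NULL && cap > 0);
    if (fgets(dst, (int)cap, in) == NULL)   /* reads at most cap - 1 chars */
        return 0;
    dst[strcspn(dst, "\n")] = '\0';         /* cut at the newline, if any */
    return 1;
}

/* Formats a greeting into dst without ever writing more than cap bytes.
   Returns 0 if the text fit, -1 if it was truncated or formatting failed. */
int format_greeting(char *dst, size_t cap, const char *name)
{
    assert(dst != NULL && cap > 0 && name != NULL);
    int n = snprintf(dst, cap, "Hi, %s", name);
    return (n >= 0 && (size_t)n < cap) ? 0 : -1;
}
```

With an 8-byte buffer, format_greeting stores "Hi, Al" completely, while a longer name is cut off at 7 characters plus the terminator and reported as truncated instead of overflowing.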

A buffer overflow is basically when a crafted section (or buffer) of memory is written outside of its intended bounds. If an attacker can manage to make this happen from outside of a program it can cause security problems as it could potentially allow them to manipulate arbitrary memory locations, although many modern operating systems protect against the worst cases of this.
While both reading and writing outside of the intended bounds are generally considered a bad idea, the term "buffer overflow" is generally reserved for writing outside the bounds, as this can cause an attacker to easily modify the way your code runs. There is a good article on Wikipedia about buffer overflows and the various ways they can be used for exploits.
In terms of how you could program one yourself, it would be a simple matter of:
char a[4];
strcpy(a,"a string longer than 4 characters"); // write past end of buffer (buffer overflow)
printf("%c\n",a[6]); // read past end of buffer (also not a good idea)
Whether that compiles and what happens when it runs would probably depend on your operating system and compiler.
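For contrast, a sketch of what checked access could look like. C does not do this for you; checked_read and checked_write are hypothetical helpers written for this example, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* Stores buf[i] in *out and returns 1 if i is inside the buffer,
   returns 0 (and touches nothing) otherwise. */
int checked_read(const char *buf, size_t len, size_t i, char *out)
{
    assert(buf != NULL && out != NULL);
    if (i >= len)
        return 0;
    *out = buf[i];
    return 1;
}

/* Writes value to buf[i] and returns 1 if i is inside the buffer,
   returns 0 (and writes nothing) otherwise. */
int checked_write(char *buf, size_t len, size_t i, char value)
{
    assert(buf != NULL);
    if (i >= len)
        return 0;
    buf[i] = value;
    return 1;
}
```

Called with a 4-byte array, an index of 6 is simply rejected rather than read or written.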

On a modern Linux system you cannot exploit a buffer overflow without some extra preparation.
Why? Because you will be blocked by ASLR (Address Space Layout Randomization) and by the stack protector in modern GCC. ASLR places the stack at a random address, so you cannot easily predict where your data ends up, and the stack protector aborts the program when it detects an overflow.
To begin, set ASLR to 0.
The default value is 2:
root@bt:~# cat /proc/sys/kernel/randomize_va_space
2
root@bt:~# echo 0 > /proc/sys/kernel/randomize_va_space
root@bt:~# cat /proc/sys/kernel/randomize_va_space
0
root@bt:~#
This is why the old-style buffer overflow tutorials you may find on the internet, such as Aleph One's, no longer work as written on your system.
Now let's write a program that is vulnerable to a buffer overflow:
---------------------bof.c--------------------------
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv)
{
char buffer[400];
strcpy(buffer, argv[1]);
return 0;
}
---------------------EOF-----------------------------
Note that strcpy is dangerous here: it does not check how many bytes it copies, so without the stack protector nothing stops an overlong argument from overrunning the buffer.
Compile with -fno-stack-protector to turn off the stack protector and -mpreferred-stack-boundary=2 to keep the stack aligned to 4 bytes (2^2):
root@bt:~# gcc -g -o bof -fno-stack-protector -mpreferred-stack-boundary=2 bof.c
root@bt:~# chown root:root bof
root@bt:~# chmod 4755 bof
We now have a buffer-overflow-prone C program set up as SUID root.
Now let's find out how many bytes we need to put into the buffer to make the program segfault:
root@bt:~# ./bof `perl -e 'print "A" x 400'`
root@bt:~# ./bof `perl -e 'print "A" x 403'`
root@bt:~# ./bof `perl -e 'print "A" x 404'`
Segmentation fault
root@bt:~#
As you can see, we need 404 bytes to make the program segfault (crash). Now, how many bytes do we need to overwrite EIP? EIP holds the address of the next instruction to be executed, so an attacker overwrites the saved EIP to redirect the SUID binary to instructions of their choosing. If the program is SUID root, those instructions run with root privileges.
root@bt:~# gdb -q bof
(gdb) list
1 #include <stdio.h>
2 #include <string.h>
3
4 int main(int argc, char** argv)
5 {
6 char buffer[400];
7 strcpy(buffer, argv[1]);
8
9 return 0;
10 }
(gdb) run `perl -e 'print "A" x 404'`
Starting program: /root/bof `perl -e 'print "A" x 404'`
Program received signal SIGSEGV, Segmentation fault.
0xb7e86606 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
(gdb) run `perl -e 'print "A" x 405'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/bof `perl -e 'print "A" x 405'`
Program received signal SIGSEGV, Segmentation fault.
0xb7e800a9 in ?? () from /lib/tls/i686/cmov/libc.so.6
(gdb)
The program received a segmentation fault. Let's input more bytes and look at the EIP register.
(gdb) run `perl -e 'print "A" x 406'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/bof `perl -e 'print "A" x 406'`
Program received signal SIGSEGV, Segmentation fault.
0xb7004141 in ?? ()
(gdb)
(gdb) run `perl -e 'print "A" x 407'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/bof `perl -e 'print "A" x 407'`
Program received signal SIGSEGV, Segmentation fault.
0x00414141 in ?? ()
(gdb)
A little more:
(gdb) run `perl -e 'print "A" x 408'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /root/bof `perl -e 'print "A" x 408'`
Program received signal SIGSEGV, Segmentation fault.
0x41414141 in ?? ()
(gdb)
(gdb) i r
eax 0x0 0
ecx 0xbffff0b7 -1073745737
edx 0x199 409
ebx 0xb7fc9ff4 -1208180748
esp 0xbffff250 0xbffff250
ebp 0x41414141 0x41414141
esi 0x8048400 134513664
edi 0x8048310 134513424
eip 0x41414141 0x41414141 <-- overwritten !!
eflags 0x210246 [ PF ZF IF RF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
(gdb)
now you can do your next step...
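The addresses in the gdb session (0xb7004141, 0x00414141, 0x41414141) can be explained with a small simulation. This is only a sketch under assumptions: it assumes a little-endian 32-bit target where bytes 401-404 of the input land on the saved EBP and bytes 405-408 on the saved return address, and the starting value 0xb7e86650 is a made-up return address. The function name spill_over_le32 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulates strcpy spilling `extra` 'A' bytes plus its terminating NUL
   over a saved 32-bit little-endian value. Returns the resulting value. */
uint32_t spill_over_le32(uint32_t saved, size_t extra)
{
    unsigned char bytes[4];
    uint32_t result = 0;
    size_t i;

    assert(sizeof bytes == 4);
    for (i = 0; i < 4; i++)
        bytes[i] = (unsigned char)(saved >> (8 * i));  /* lowest byte first */

    for (i = 0; i < 4 && i < extra; i++)
        bytes[i] = 'A';                                /* 0x41 */
    if (extra < 4)
        bytes[extra] = 0;                              /* strcpy's terminator */

    for (i = 0; i < 4; i++)
        result |= (uint32_t)bytes[i] << (8 * i);
    return result;
}
```

Under these assumptions, 406 input bytes correspond to extra = 2 and give 0xb7004141, 407 bytes give 0x00414141, and 408 bytes give 0x41414141, matching the session above.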

A buffer overflow is just writing past the end of a buffer:
int main(int argc, const char* argv[])
{
char buf[10];
memset(buf, 0, 11);
return 0;
}

In addition to what has already been said, keep in mind that your program may or may not "crash" when a buffer overflow occurs. It should crash, and you should hope it does, but if the buffer overflows into other memory that your application has also allocated, your application may appear to operate normally for a longer period of time.
If you are using a recent edition of Microsoft Visual Studio, I would suggest using the secure counterparts in the standard library, such as sprintf_s instead of sprintf, etc.

This should be enough to reproduce it:
void buffer_overflow()
{
char * foo = "foo";
char buffer[10];
for(int it = 0; it < 1000; it++) {
buffer[it] = '*';
}
char accessViolation = foo[0];
}
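For comparison, a sketch of the same loop with its bound derived from the buffer size instead of a hard-coded 1000. The name fill_stars is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Fills buf with '*' and terminates it, never writing beyond cap bytes.
   Returns the number of '*' characters written. */
size_t fill_stars(char *buf, size_t cap)
{
    size_t i;

    assert(buf != NULL && cap > 0);
    for (i = 0; i + 1 < cap; i++)   /* leave room for the terminator */
        buf[i] = '*';
    buf[i] = '\0';
    return i;
}
```

Calling fill_stars(buffer, sizeof buffer) on a char buffer[10] writes nine stars and a terminator, and nothing else.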

The "classic" buffer overflow example is:
int main(int argc, char *argv[])
{
char buffer[10];
strcpy(buffer, argv[1]);
}
That lets you play with the buffer overflow parameters and tweak them to your heart's content. The book "Hacking - The Art of Exploitation" goes into great detail about how to play around with buffer overflows (purely as an intellectual exercise, obviously).
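For contrast with the classic example, here is a sketch of a copy that refuses input that does not fit. copy_arg is a hypothetical helper; in a real main you would pass argv[1] only after checking argc.

```c
#include <assert.h>
#include <string.h>

/* Copies arg into dst only if it fits, including the terminating NUL.
   Returns 0 on success, -1 if arg is missing or too long. */
int copy_arg(char *dst, size_t cap, const char *arg)
{
    size_t len;

    assert(dst != NULL && cap > 0);
    if (arg == NULL)            /* e.g. no argument was given */
        return -1;
    len = strlen(arg);
    if (len >= cap)             /* would not leave room for the NUL */
        return -1;
    memcpy(dst, arg, len + 1);  /* length already verified */
    return 0;
}
```

Rejecting overlong input is one design choice; truncating it, as snprintf does, is another. Which is right depends on whether a cut-off value is acceptable to the caller.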

If you want to check your program for buffer overflows, you could run it with tools like Valgrind. They will find some memory management bugs for you.

This is a general comment about the answers you received. For example:
int main(int argc, char *argv[])
{
char buffer[10];
strcpy(buffer, argv[1]);
}
And:
int main(int argc, const char* argv[])
{
char buf[10];
memset(buf, 0, 11);
return 0;
}
On modern Linux platforms, this may not work as expected or intended. It may not work because of the FORTIFY_SOURCE security feature.
FORTIFY_SOURCE uses "safer" variants of high risk functions like memcpy and strcpy. The compiler uses the safer variants when it can deduce the destination buffer size. If the copy would exceed the destination buffer size, then the program calls abort().
To disable FORTIFY_SOURCE for your testing, you should compile the program with -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.

In this context, a buffer is a portion of memory set aside for a particular purpose, and a buffer overflow is what happens when a write operation into the buffer keeps going past the end (writing into memory which has a different purpose). This is always a bug.
A buffer overflow attack is one which uses this bug to accomplish something that the program's author didn't intend to be possible.
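The "memory which has a different purpose" part can be illustrated without invoking undefined behavior. The sketch below copies bytes over a whole struct rather than past a real array, so it only simulates what an overflow of name would do to its neighbor. The struct account and simulate_overflow are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct account {
    char name[8];   /* the buffer */
    int  is_admin;  /* unrelated data that happens to follow it */
};

/* Copies n bytes of input over the start of *acct, clamped to the struct
   size, so the simulated overflow stays inside one object. */
void simulate_overflow(struct account *acct, const unsigned char *input, size_t n)
{
    assert(acct != NULL && input != NULL);
    if (n > sizeof *acct)
        n = sizeof *acct;
    memcpy(acct, input, n);
}
```

Copying only sizeof name bytes leaves is_admin alone. Copying more changes is_admin even though the code never mentions that field, which is the kind of side effect a buffer overflow attack relies on.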

With the correct answers given: To get more into this topic, you might want to listen to the Podcast Security Now. In Episode 39 (a while back) they discussed this in depth. This is a quick way to get a deeper understanding without requiring to digest a whole book.
(At the link you'll find the archive with multiple size versions as well as a transcript, if you're rather visually oriented). Audio is not the perfect medium for this topic but Steve is working wonders to deal with this.

Buffer overflow is the insertion of characters beyond what the allocated memory can hold.

Related

Why is the recursion depth non-deterministic (C++)?

Repeated runs of the following C++ program give a different maximum number of recursion calls (varying by approximately 100 function calls) before a segmentation fault.
#include <iostream>
void recursion(int i)
{
std::cout << "iteration: " << ++i << std::endl;
recursion(i);
}
int main()
{
recursion(0);
}
I compiled the file main.cpp with
g++ -O0 main.cpp -o main
Here and here the same issue as above is discussed for java. In both cases, the answers are based on java related concepts, JIT, garbage collection, HotSpot optimizer, etc.
Why does the maximum number of recursions vary for C++?
Your recursion never logically terminates. It only terminates when your program crashes due to lack of stack space.
A certain amount of stack space is used for every recursive call, but in C++, it's not defined exactly how much stack space is available and how much is used per recursive call.
The stack space used per call may vary by optimization settings, linker options, alignment requirements, how your program is launched, and a ton of other things.
Bottom line: you have coded a bug, and you are running afoul of undefined behavior in your compiler and platform. If you want to figure out exactly how much stack space your program has on its current thread, your platform will have APIs you can call to get that value.
What happens when you blow the stack is not a guaranteed crash. Depending on the system, you could just be trashing memory in a relatively random bit of your memory space.
What is in that memory might depend on what memory allocations occurred, how much contiguous memory the OS handed to you when you asked for some, ASLR, or whatever.
Undefined behaviour in C++ is not predictable.
Beyond the C++ aspect: Following the comments of Eljay and n.'pronouns'.m, I turned off ASLR. This post describes how to do that. In short, ASLR can be disabled via
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
and enabled via
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
After disabling ASLR, the number of recursions before the system segmentation fault is constant for repeated execution of the described program.

C++ Dereference the Non-allocated Memory but Without Segmentation Fault

I have encountered a problem which I don't understand, the following is my code:
#include <iostream>
#include <stdio.h>
#include <string.h>
#include <cstdlib>
using namespace std;
int main(int argc, char **argv)
{
char *format = "The sum of the two numbers is: %d";
char *presult;
int sum = 10;
presult = (char *)calloc(sizeof(format) + 20, 1); //allocate 24 bytes
sprintf(presult, format, sum); // after this operation,
// the length of presult is 33
cout << presult << endl;
presult[40] = 'g'; //still no segfault here...
free(presult); // memory from calloc is released with free, not delete
}
I compiled this code on different machines. On one machine the sizeof(format) is 4 bytes and on another, the sizeof(format) is 8 bytes; (On both machines, the char only takes one byte, which means sizeof(*format) equals 1)
However, no matter on which machine, the result is still confusing to me. Because even for the second machine, the allocated memory for use is just 20 + 8 which is 28 bytes and obviously the string has a length of 33 meaning that at least 33 bytes are needed. But there is NO segmentation fault occurring after I run this program. As you can see, even if I tried to dereference the presult at position 40, the program doesn't crash and show any segfault information.
Could anyone help to explain why? Thank you so much.
Accessing unallocated memory is undefined behavior, meaning you might get a segfault (if you're lucky) or you might not.
Or your program is free to display kittens on the screen.
Speculating on why something happens or doesn't happen in undefined behavior land is usually counter-productive, but I'd imagine what's happening to you is that the OS is actually assigning your application a larger block of memory than it's asking for. Since your application isn't trying to dereference anything outside that larger block, the OS doesn't detect the problem, and therefore doesn't kill your program with a segmentation fault.
Because undefined behavior is undefined. It's not "defined to crash".
There is no seg fault because there is no reason for there to be one. You are very likely still writing into the heap, since you got the memory from the heap, so the memory isn't read-only. Also, the memory there is likely to exist and be allocated to you (or at least to the program), so it's not an access violation. Normally you would get a seg fault because you tried to access memory that was not given to you, or because you tried to write to memory that is read-only. Neither of these appears to be the case here, so nothing goes wrong.
In fact, writing past the end of a buffer is a common security problem, known as the buffer overflow. It was the most common security vulnerability for some time. Nowadays people are using higher level languages which check for out of index bounds, so this is not as big of a problem anymore.
To respond to this: "the result is still confusing to me. Because even for the second machine, the allocated memory for use is just 20 + 8 which is 28 bytes and obviously the string has a length of 33 meaning that at least 33 bytes are needed."
sizeof(some_pointer) is the size of the pointer itself, not of what it points to; on common platforms it equals sizeof(size_t). You were testing on a 32-bit machine (4 bytes) and on a 64-bit machine (8 bytes).
You have to give malloc the number of bytes to allocate; sizeof(ptr_to_char) will not give you the length of the string (the number of chars until '\0').
Btw, strlen does what you want: http://www.cplusplus.com/reference/cstring/strlen/
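Putting the points above together, here is a sketch of how the allocation could be sized correctly. It uses snprintf with a NULL buffer to measure the formatted length first, which is standard since C99. The function name format_sum and its len_out parameter are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns a malloc'd string with the formatted message and stores its
   length in *len_out, or returns NULL on failure. Release with free(). */
char *format_sum(int sum, size_t *len_out)
{
    static const char format[] = "The sum of the two numbers is: %d";
    int needed;
    char *result;

    assert(len_out != NULL);
    needed = snprintf(NULL, 0, format, sum);   /* length without the NUL */
    if (needed < 0)
        return NULL;
    result = malloc((size_t)needed + 1);       /* + 1 for the NUL */
    if (result == NULL)
        return NULL;
    snprintf(result, (size_t)needed + 1, format, sum);
    *len_out = (size_t)needed;
    return result;
}
```

For sum = 10 the measured length is 33, so 34 bytes are allocated, regardless of whether pointers are 4 or 8 bytes wide.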

How can I monitor what's being put into the standard out buffer and break when a specific string is deposited in the pipe?

In Linux, with C/C++ code, using gdb, how can you add a gdb breakpoint to scan the incoming strings in order to break on a particular string?
I don't have access to a specific library's code, but I want to break as soon as that library sends a specific string to standard out so I can go back up the stack and investigate the part of my code that is calling the library. Of course I don't want to wait until a buffer flush occurs. Can this be done? Perhaps a routine in libstdc++ ?
This question might be a good starting point: how can I put a breakpoint on "something is printed to the terminal" in gdb?
So you could at least break whenever something is written to stdout. The method basically involves setting a breakpoint on the write syscall with a condition that the first argument is 1 (i.e. STDOUT). In the comments, there is also a hint as to how you could inspect the string parameter of the write call as well.
x86 32-bit mode
I came up with the following and tested it with gdb 7.0.1-debian. It seems to work quite well. $esp + 8 contains a pointer to the memory location of the string passed to write, so first you cast it to an integral, then to a pointer to char. $esp + 4 contains the file descriptor to write to (1 for STDOUT).
(gdb) break write if 1 == *(int*)($esp + 4) && strcmp((char*)*(int*)($esp + 8), "your string") == 0
x86 64-bit mode
If your process is running in x86-64 mode, then the parameters are passed through the scratch registers %rdi and %rsi:
(gdb) break write if 1 == $rdi && strcmp((char*)($rsi), "your string") == 0
Note that one level of indirection is removed since we're using scratch registers rather than variables on the stack.
Variants
Functions other than strcmp can be used in the above snippets:
strncmp is useful if you want to match the first n characters of the string being written
strstr can be used to find matches within a string, since you can't always be certain that the string you're looking for is at the beginning of the string being written through the write function.
Edit: I enjoyed this question and finding its subsequent answer. I decided to do a blog post about it.
catch + strstr condition
The cool thing about this method is that it does not depend on glibc write being used: it traces the actual system call.
Furthermore, it is more resilient to printf() buffering, as it might even catch strings that are printed across multiple printf() calls.
x86_64 version:
define stdout
catch syscall write
commands
printf "rsi = %s\n", $rsi
bt
end
condition $bpnum $rdi == 1 && strstr((char *)$rsi, "$arg0") != NULL
end
stdout qwer
Test program:
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main() {
write(STDOUT_FILENO, "asdf1", 5);
write(STDOUT_FILENO, "qwer1", 5);
write(STDOUT_FILENO, "zxcv1", 5);
write(STDOUT_FILENO, "qwer2", 5);
printf("as");
printf("df");
printf("qw");
printf("er");
printf("zx");
printf("cv");
fflush(stdout);
return EXIT_SUCCESS;
}
Outcome: breaks at:
qwer1
qwer2
fflush. The previous printf calls didn't actually print anything; their output was buffered! The write syscall only happened on the fflush.
Notes:
$bpnum thanks to Tromey at: https://sourceware.org/bugzilla/show_bug.cgi?id=18727
rdi: register holding the first argument of the system call on x86_64; for write it is the file descriptor, and 1 is stdout
rsi: second argument of the syscall; for write it points to the buffer
strstr: standard C function, searches for a substring, returns NULL if none is found
Tested in Ubuntu 17.10, gdb 8.0.1.
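Regarding the buffering seen in the outcome above: if you control main in the program being debugged, one option is to switch stdout to unbuffered mode so that output reaches write promptly rather than at the next flush. This is a sketch, not part of the original answer; make_unbuffered is a hypothetical wrapper around the standard setvbuf, and it has to run before anything else touches the stream.

```c
#include <assert.h>
#include <stdio.h>

/* Switches a stream to unbuffered mode. Must be called before any other
   operation on the stream. Returns 0 on success, nonzero on failure. */
int make_unbuffered(FILE *stream)
{
    assert(stream != NULL);
    return setvbuf(stream, NULL, _IONBF, 0);
}
```

Calling make_unbuffered(stdout) first thing in main changes the program's timing and performance, so treat it as a debugging aid only.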
strace
Another option if you are feeling interactive:
setarch "$(uname -m)" -R strace -i ./stdout.out |& grep '\] write'
Sample output:
[00007ffff7b00870] write(1, "a\nb\n", 4a
Now copy that address and paste it into:
setarch "$(uname -m)" -R strace -i ./stdout.out |& grep -E '\] write\(1, "a'
The advantage of this method is that you can use the usual UNIX tools to manipulate strace output, and it does not require deep GDB-fu.
Explanation:
-i makes strace output RIP
setarch -R disables ASLR for a process with a personality system call (see: How to debug with strace -i when everytime address is different). GDB already does that by default, so there is no need to do it again.
Anthony's answer is awesome. Following his answer, I tried out another solution on Windows (x86-64). I know this question is about GDB on Linux; however, I think this solution is a useful supplement for this kind of question and might be helpful for others.
Solution on Windows
On Linux, a call to printf results in a call to the write API. Because Linux is an open-source OS, we can debug inside that API. The API is different on Windows, which provides its own API, WriteFile. Because Windows is a commercial, closed-source OS, breakpoints cannot be added inside its APIs.
However, some of the VC source code is shipped with Visual Studio, so we can find where WriteFile is ultimately called and set a breakpoint there. After debugging the sample code, I found that printf ends up calling _write_nolock, which in turn calls WriteFile. The function is located in:
your_VS_folder\VC\crt\src\write.c
The prototype is:
/* now define version that doesn't lock/unlock, validate fh */
int __cdecl _write_nolock (
int fh,
const void *buf,
unsigned cnt
)
Compared to the write API on Linux:
#include <unistd.h>
ssize_t write(int fd, const void *buf, size_t count);
They take exactly the same parameters, so we can set a conditional breakpoint in _write_nolock following the solutions above, with only some differences in detail.
Portable Solution for Both Win32 and x64
Fortunately, Visual Studio lets us use parameter names directly when setting a breakpoint condition, on both Win32 and x64. So the condition becomes very easy to write:
Add a breakpoint in _write_nolock.
NOTICE: There is a small difference between Win32 and x64. On Win32 we can simply use the function name as the breakpoint location. That does not work on x64, because at the function's entry point the parameters are not yet initialized, so the parameter names cannot be used in the condition.
Fortunately there is a workaround: set the breakpoint at a location inside the function rather than on the function name, e.g. the first line of the function, where the parameters are already initialized. (That is, set the breakpoint by filename and line number, or open the file and set the breakpoint on the first line of the function body rather than on its entry.)
Restrict the condition:
fh == 1 && strstr((char *)buf, "Hello World") != 0
NOTICE: there is still a problem here. I tested two ways of writing to stdout: printf and std::cout. printf passes the whole string to _write_nolock at once, whereas std::cout passes it character by character, so the function is called strlen("your string") times. In that case the condition will never be satisfied.
Win32 Solution
Of course we could use the same methods as Anthony provided: set the condition of breakpoints by registers.
For a Win32 program, the solution is almost the same as with GDB on Linux. You might notice the __cdecl specifier in the prototype of _write_nolock. This calling convention means:
Argument-passing order is Right to left.
Calling function pops the arguments from the stack.
Name-decoration convention: Underscore character (_) is prefixed to names.
No case translation performed.
There is a description here. And there is an example which is used to show the registers and stacks on Microsoft's website. The result could be found here.
Then it is very easy to set the condition of breakpoints:
Set a breakpoint in _write_nolock.
Restrict the condition:
*(int *)($esp + 4) == 1 && strstr(*(char **)($esp + 8), "Hello") != 0
It is the same method as on Linux. The first condition makes sure the string is written to stdout. The second one matches the specified string.
x64 Solution
Two important changes from x86 to x64 are the 64-bit addressing capability and a flat set of 16 64-bit general-purpose registers. With more registers available, x64 uses only __fastcall as its calling convention: the first four integer arguments are passed in registers, and the fifth and later arguments are passed on the stack.
You could refer to the Parameter Passing page on Microsoft's website. The four registers (in order left to right) are RCX, RDX, R8 and R9. So it is very easy to restrict the condition:
Set a breakpoint in _write_nolock.
NOTICE: unlike in the portable solution above, we can set the breakpoint on the function itself rather than on its first line, because the registers are already initialized at the entry point.
Restrict condition:
$rcx == 1 && strstr((char *)$rdx, "Hello") != 0
The cast and dereference on esp are needed because $esp accesses the ESP register, which for all intents and purposes is a void*. Here, by contrast, the registers directly hold the parameter values, so the extra level of indirection is no longer needed.
Post
I also enjoyed this question very much, so I translated Anthony's post into Chinese and added my answer to it as a supplement. The post can be found here. Thanks to @anthony-arnold for permission.
Anthony's answer is very interesting and it definitely gives some results.
Yet I think it might miss the buffering of printf.
Indeed on Difference between write() and printf(), you can read that: "printf doesn't necessarily call write every time. Rather, printf buffers its output."
STDIO WRAPPER SOLUTION
Hence I came up with another solution: create a helper library that you can preload to wrap the printf-like functions. You can then set breakpoints in this library's source and backtrace to get information about the program you are debugging.
It works on Linux and targets libc. I do not know about C++ iostreams, and if the program uses write directly, this approach will miss it.
Here is the wrapper to hijack the printf (io_helper.c).
#include<string.h>
#include<stdio.h>
#include<stdarg.h>
#define MAX_SIZE 0xFFFF
int printf(const char *format, ...){
char target_str[MAX_SIZE];
int i=0;
va_list args1, args2;
/* RESOLVE THE STRING FORMATING */
va_start(args1, format);
vsnprintf(target_str, sizeof(target_str), format, args1); /* bounded: cannot overrun target_str */
va_end(args1);
if (strstr(target_str, "Hello World")){ /* SEARCH FOR YOUR STRING */
i++; /* BREAK HERE */
}
/* OUTPUT THE STRING AS THE PROGRAM INTENTED TO */
va_start(args2, format);
vprintf(format, args2);
va_end(args2);
return 0;
}
int puts(const char *s)
{
return printf("%s\n",s);
}
I added puts because gcc tends to replace printf with puts when it can, so I force it back to printf.
Next you just compile it to a shared library.
gcc -shared -fPIC io_helper.c -o libio_helper.so -g
And you load it before running gdb.
LD_PRELOAD=$PWD/libio_helper.so gdb test
Where test is the program you are debugging.
Then you can break with break io_helper.c:19 (the line marked BREAK HERE), because you compiled the library with -g.
EXPLANATIONS
Luckily for us, printf and its relatives (fprintf, sprintf, ...) only resolve the variadic arguments and then call their 'v' equivalents (vprintf in our case). That job is easy, so we can do it ourselves and leave the real work to libc's 'v' function. To get the variadic arguments of printf, we just have to use va_start and va_end.
The main advantage of this method is that when you break, you can be sure you are in the part of the program that outputs your target string, and that it is not a leftover in a buffer. You also make no assumptions about the hardware. The drawback is that you assume the program uses the libc stdio functions for its output.
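The va_start / 'v' function pattern described above can be isolated in a small helper. This is a sketch, not the wrapper from the answer: formatted_contains and its 1024-byte scratch buffer are choices made for this example, and output longer than the buffer is truncated before the search.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the formatted output would contain needle, else 0.
   Only the first 1023 characters of the output are searched. */
int formatted_contains(const char *needle, const char *format, ...)
{
    char scratch[1024];
    va_list args;

    assert(needle != NULL && format != NULL);
    scratch[0] = '\0';                       /* defined even if formatting fails */
    va_start(args, format);
    vsnprintf(scratch, sizeof scratch, format, args);
    va_end(args);
    return strstr(scratch, needle) != NULL;
}
```

A wrapper like the one above could call such a helper and put its breakpoint line inside the branch that handles a match.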

off-by-one error with string functions (C/C++) and security potentials

So this code has the off-by-one error:
void foo (const char * str) {
char buffer[64];
strncpy(buffer, str, sizeof(buffer));
buffer[sizeof(buffer)] = '\0';
printf("whoa: %s", buffer);
}
What can a malicious attacker do once she has figured out how the function foo() works?
Basically, what kinds of potential security problems is this code vulnerable to?
I personally thought that the attacker can't really do anything in this case, but I heard that they can do a lot of things even if they are limited to work with 1 byte.
The only off-by-one error I see here is this line:
buffer[sizeof(buffer)] = '\0';
Is that what you're talking about? I'm not an expert on these things, so maybe I'm overlooking something, but since the only thing that will ever get written to that wrong byte is a zero, I think the possibilities are quite limited. The attacker can't control what's being written there. Most likely it would just cause a crash, but it could also cause tons of other odd behavior, all of it specific to your application. I don't see any code injection vulnerability here unless this error causes your app to expose another such vulnerability that would be used as the vector for the actual attack.
Again, take with a grain of salt...
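For reference, a sketch of the copy without the off-by-one: the terminator goes at index cap - 1, the last valid element, instead of index cap. The helper name copy_terminated is made up for this example.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst, truncating if needed, and always terminates dst
   inside its cap bytes. */
void copy_terminated(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    strncpy(dst, src, cap - 1);   /* fills at most cap - 1 bytes */
    dst[cap - 1] = '\0';          /* last valid index, not cap */
}
```

In foo() this corresponds to writing buffer[sizeof(buffer) - 1] = '\0'.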
Read The Shellcoder's Handbook, 2nd Edition, for lots of information.
Disclaimer: This is inferred knowledge from some research I just did, and should not be taken as gospel.
It's going to overwrite part or all of your saved frame pointer with a null byte - that's the reference point that your calling function will use to offset its memory accesses. So at that point the calling function's memory operations are going to a different location. I don't know what that location will be, but you don't want to be accessing the wrong memory. I won't say you can do anything, but you might be able to do something.
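To get a feel for how far the frame pointer can move, here is a small sketch. It assumes a little-endian 32-bit target where the stray null byte lands on the lowest byte of the saved frame pointer; whether that is what sits right after the buffer depends on the compiler and its options. The function names and the sample address are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a saved frame pointer after its lowest byte is set to zero. */
uint32_t zero_low_byte(uint32_t saved_fp)
{
    return saved_fp & 0xFFFFFF00u;
}

/* How many bytes the frame pointer moves toward lower addresses. */
uint32_t frame_shift(uint32_t saved_fp)
{
    uint32_t moved = saved_fp - zero_low_byte(saved_fp);
    assert(moved <= 0xFFu);   /* a single byte can shift it by at most 255 */
    return moved;
}
```

For a saved value of 0xbffff258 the caller would continue with 0xbffff200, so its local variable accesses end up 0x58 bytes lower than intended.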
How do I know this (really, how did I infer this)? Smashing the Stack for Fun and Profit by Aleph One. It's quite old, and I don't know if Windows or compilers have changed the way the stack behaves to avoid these problems. But it's a starting point.
example1.c:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
To understand what the program does to call function() we compile it with
gcc using the -S switch to generate assembly code output:
$ gcc -S -o example1.s example1.c
By looking at the assembly language output we see that the call to
function() is translated to:
pushl $3
pushl $2
pushl $1
call function
This pushes the 3 arguments to function backwards into the stack, and
calls function(). The instruction 'call' will push the instruction pointer
(IP) onto the stack. We'll call the saved IP the return address (RET). The
first thing done in function is the procedure prolog:
pushl %ebp
movl %esp,%ebp
subl $20,%esp
This pushes EBP, the frame pointer, onto the stack. It then copies the
current SP onto EBP, making it the new FP pointer. We'll call the saved FP
pointer SFP. It then allocates space for the local variables by subtracting
their size from SP.
We must remember that memory can only be addressed in multiples of the
word size. A word in our case is 4 bytes, or 32 bits. So our 5 byte buffer
is really going to take 8 bytes (2 words) of memory, and our 10 byte buffer
is going to take 12 bytes (3 words) of memory. That is why SP is being
subtracted by 20. With that in mind our stack looks like this when
function() is called (each space represents a byte):
bottom of                                                            top of
memory                                                               memory
           buffer2       buffer1   sfp   ret   a     b     c
<------   [            ][        ][    ][    ][    ][    ][    ]
top of                                                            bottom of
stack                                                                 stack
What can a malicious attacker do if she figures out how the function foo() works? Basically, what kind of potential security problems is this code vulnerable to?
This is probably not the best example of a bug that could be easily exploited for security purposes, although it could be exploited to crash the code simply by using a string of 64 characters or longer.
While it certainly is a bug that will corrupt the address immediately after the array (on the stack) with a single zero byte, there is no easy way for a hacker to inject data into the corrupted area. Calling the printf() function will push parameters on the stack and may clear the zero that was written out of array bounds and lead to a potentially unterminated string being passed to printf.
However, without intimate knowledge of what goes on in printf (and needing to exploit printf as well as foo), a hacker would be hard pressed to do anything other than crash your code.
FWIW, this is a good reason to compile with warnings on, or to use a function like strncpy_s, which both respects the buffer size and includes a terminating null even if the copied string is larger than the buffer. With strncpy_s, the line "buffer[sizeof(buffer)] = '\0';" is not even necessary.
The issue is that you don't have permission to write to the item after the array. When you asked for 64 chars for buffer, the system is required to give you at least 64 bytes. It's normal for the system to give you more than that -- in which case the memory belongs to you and there is no problem in practice -- but it is possible that even the first byte after the array belongs to "somebody else."
So what happens if you overwrite it? If the "somebody else" is actually inside your program (maybe in a different structure or thread) the operating system probably won't notice you trampled on that data, but that other structure or thread might. There's no telling what data should be there or how trampling over it will affect things.
In this case you allocated buffer on the stack, which means (1) the somebody else is you, and in fact is your current stack frame, and (2) it's not in another thread (but could affect other local variables in the current stack frame).

Segmentation fault in strcpy

Consider the program below:
char str[5];
strcpy(str,"Hello12345678");
printf("%s",str);
When run, this program gives a segmentation fault.
But when the strcpy call is replaced with the following, the program runs fine:
strcpy(str,"Hello1234567");
I would expect it to crash when copying any string longer than 5 characters into str.
So why does it not crash for "Hello1234567" but only for "Hello12345678", i.e. for strings of length 13 or more?
This program was run on a 32-bit machine.
There are three kinds of behaviour in the standard that you should be interested in.
1/ Defined behaviour. This will work on all complying implementations. Use this freely.
2/ Implementation-defined behaviour. As stated, it depends on the implementation but at least it's still defined. Implementations are required to document what they do in these cases. Use this if you don't care about portability.
3/ Undefined behaviour. Anything can happen. And we mean anything, up to and including your entire computer collapsing into a naked singularity and swallowing itself, you and a large proportion of your workmates. Never use this. Ever! Seriously! Don't make me come over there.
Copying more than 4 characters and a zero-byte to a char[5] is undefined behaviour.
Seriously, it doesn't matter why your program crashes with 14 characters but not 13, you're almost certainly overwriting some non-crashing information on the stack and your program will most likely produce incorrect results anyway. In fact, the crash is better since at least it stops you relying on the possibly bad effects.
Increase the size of the array to something more suitable (char[14] in this case with the available information) or use some other data structure that can cope.
Update:
Since you seem so concerned with finding out why an extra 7 characters doesn't cause problems but 8 characters does, let's envisage the possible stack layout on entering main(). I say "possible" since the actual layout depends on the calling convention that your compiler uses. Since the C start-up code calls main() with argc and argv, the stack at the start of main(), after allocating space for a char[5], could look like this:
+------------------------------------+
| C start-up code return address (4) |
|     argc                       (4) |
|     argv                       (4) |
|     x = char[5]                (5) |
+------------------------------------+
When you write the bytes Hello1234567\0 with:
strcpy (x, "Hello1234567");
to x, it overwrites the argc and argv but, on return from main(), that's okay. Specifically Hello populates x, 1234 populates argv and 567\0 populates argc. Provided you don't actually try to use argc and/or argv after that, you'll be okay:
+------------------------------------+ Overwrites with:
| C start-up code return address (4) |
|     argc                       (4) |   '567<NUL>'
|     argv                       (4) |   '1234'
|     x = char[5]                (5) |   'Hello'
+------------------------------------+
However, if you write Hello12345678\0 (note the extra "8") to x, it overwrites the argc and argv and also one byte of the return address so that, when main() attempts to return to the C start-up code, it goes off into fairy land instead:
+------------------------------------+ Overwrites with:
| C start-up code return address (4) |   '<NUL>'
|     argc                       (4) |   '5678'
|     argv                       (4) |   '1234'
|     x = char[5]                (5) |   'Hello'
+------------------------------------+
Again, this depends entirely on the calling convention of your compiler. It's possible a different compiler would always pad out arrays to a multiple of 4 bytes and the code wouldn't fail there until you wrote another three characters. Even the same compiler may allocate variables on the stack frame differently to ensure alignment is satisfied.
That's what they mean by undefined: you don't know what's going to happen.
You're copying to the stack, so how much extra data it takes to crash your program depends on what the compiler has placed on the stack next to the array.
Some compilers might produce code that will crash with only a single byte over the buffer size - it's undefined what the behaviour is.
I guess size 13 is enough to overwrite the return address, or something similar, which crashes when your function returns. But another compiler or another platform could/will crash with a different length.
Also your program might crash with a different length if it ran for a longer time, if something less important was being overwritten.
For 32-bit Intel platform the explanation is the following. When you declare char[5] on stack the compiler really allocates 8 bytes because of alignment. Then it's typical for functions to have the following prologue:
push ebp
mov ebp, esp
This saves the ebp register value on the stack, then copies the esp register value into ebp so that ebp can be used to access the parameters. The saved ebp value occupies 4 more bytes on the stack.
In the epilogue ebp is restored, but its value is usually only used for accessing stack-allocated function parameters, so overwriting it may not hurt in most cases.
So you have the following layout (stack grows downwards on Intel): 8 bytes for your array, then 4 bytes for ebp, then usually the return address.
This is why you need to overwrite at least 13 bytes to crash your program.
To add to the above answers: you can test for bugs like these with a tool such as Valgrind. If you're on Windows, have a look at this SO thread.
It depends on what's on the stack after the "str" array. You just happen not to be trampling on anything critical until you copy that many characters.
So it's going to depend on what else is in the function, the compiler you use and possibly the compiler options too.
13 is 5 + 8, suggesting there are two non-critical words after the str array, then something critical (maybe the return address)
That's the pure beauty of undefined behavior (UB): it's undefined.
Your code:
char str[5];
strcpy(str,"Hello12345678");
Writes 14 bytes/chars to str which can only hold 5 bytes/chars. This invokes UB.
Q: So why it is not crashing for "Hello1234567" and only crashing for "Hello12345678" ie of string with length 13 or more than 13.
Because the behaviour is undefined. Use strncpy. See this page http://en.wikipedia.org/wiki/Strcpy for more information.
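strncpy is unsafe since it doesn't add a null terminator if the source string has a length >= n, where n is the size of the destination buffer.
char s[5];
strncpy(s,"test12345",5); // copies 5 chars, leaves s without a terminating null
printf("%s",s); // undefined behaviour: reads past the end of s, may crash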
We always use strlcpy to alleviate this.