C++: Dereferencing Non-allocated Memory Without a Segmentation Fault

I have encountered a problem which I don't understand. The following is my code:
#include <iostream>
#include <stdio.h>
#include <string.h>
#include <cstdlib>
using namespace std;

int main(int argc, char **argv)
{
    const char *format = "The sum of the two numbers is: %d";
    char *presult;
    int sum = 10;
    presult = (char *)calloc(sizeof(format) + 20, 1); // allocates sizeof(char *) + 20 bytes (24 or 28 here)
    sprintf(presult, format, sum); // after this operation,
                                   // the length of presult is 33
    cout << presult << endl;
    presult[40] = 'g'; // still no segfault here...
    free(presult);     // memory from calloc should be released with free, not delete
}
I compiled this code on different machines. On one machine sizeof(format) is 4 bytes, and on another it is 8 bytes. (On both machines a char takes one byte, i.e. sizeof(*format) equals 1.)
However, no matter which machine, the result is still confusing to me: even on the second machine, the allocated memory is just 20 + 8 = 28 bytes, and the string obviously has a length of 33, meaning that at least 33 bytes (34 counting the terminating '\0') are needed. But there is NO segmentation fault when I run this program. As you can see, even when I dereference presult at position 40, the program doesn't crash or show any segfault information.
Could anyone help to explain why? Thank you so much.

Accessing unallocated memory is undefined behavior, meaning you might get a segfault (if you're lucky) or you might not.
Or your program is free to display kittens on the screen.
Speculating on why something happens or doesn't happen in undefined behavior land is usually counter-productive, but I'd imagine what's happening to you is that the OS is actually assigning your application a larger block of memory than it's asking for. Since your application isn't trying to dereference anything outside that larger block, the OS doesn't detect the problem, and therefore doesn't kill your program with a segmentation fault.
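On glibc you can even observe the allocator handing out more usable space than was requested. A small sketch (malloc_usable_size is glibc-specific, so this is not portable):

#include <cstdio>
#include <cstdlib>
#include <malloc.h> // glibc-specific, for malloc_usable_size

int main()
{
    // Request the 28 bytes from the question; the allocator typically
    // rounds the usable size up to its internal chunk granularity.
    char *p = (char *)calloc(28, 1);
    if (p != NULL) {
        printf("requested 28, usable %zu\n", malloc_usable_size(p));
        free(p);
    }
    return 0;
}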

Because undefined behavior is undefined. It's not "defined to crash".

There is no segfault because there is no reason for there to be one. You are very likely still writing into the heap since you got your memory from the heap, so the memory isn't read-only. Also, the memory there is likely to exist and be allocated for you (or at least for the program), so it's not an access violation. Normally you would get a segfault because you tried to access memory that was not given to you, or because you tried to write to memory that is read-only. Neither of these appears to be the case here, so nothing goes wrong.
In fact, writing past the end of a buffer is a common security problem, known as a buffer overflow. It was the most common security vulnerability for some time. Nowadays people use higher-level languages which check index bounds, so this is not as big a problem anymore.

To respond to this: "the result is still confusing to me. Because even for the second machine, the allocated memory for use is just 20 + 8 which is 28 bytes and obviously the string has a length of 33 meaning that at least 33 bytes are needed."
sizeof(some_pointer) is the size of the pointer itself, not of what it points to; on common platforms it equals the machine word size. You were testing on a 32-bit machine (4 B) and on a 64-bit machine (8 B).
You have to give malloc (or calloc) the number of bytes to allocate; sizeof(ptr_to_char) will not give you the length of the string (the number of chars up to the '\0').
By the way, strlen does what you want: http://www.cplusplus.com/reference/cstring/strlen/
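For a format string with an interpolated number, strlen of the format alone is still not quite enough. A common pattern (a sketch, not the only way) is to let snprintf compute the required size first:

#include <cstdio>
#include <cstdlib>

int main()
{
    const char *format = "The sum of the two numbers is: %d";
    int sum = 10;
    // snprintf with a null buffer returns the number of characters the
    // formatted string needs, excluding the terminating '\0'.
    int needed = snprintf(NULL, 0, format, sum);
    char *presult = (char *)calloc(needed + 1, 1);
    if (presult != NULL) {
        snprintf(presult, needed + 1, format, sum);
        puts(presult);
        free(presult);
    }
    return 0;
}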

Related

How a long* cast works

So I have this chunk of code:
#include <iostream>

int main()
{
    char buf[2];
    buf[0] = 'a';
    buf[1] = 'b';
    std::cout << *((long *)((void *)buf) + 1) << std::endl;
}
When I saw that I said to myself:
We have memory address 1000 (for example) and that's the address of buf[1].
So I thought that *((long *)((void *)buf) + 1) would print out whatever is in the addresses:
from 1000 until 1000 + sizeof(long)
But that's not the case. This code always prints -858993460.
What is that number, and why isn't it random?
I obviously lack the knowledge to understand what is going on, so I would appreciate it if you could give me a hint or something!
What is that number, and why isn't it random?
It is a random value. Nothing in your program suggests that value should be printed.
It happens to be consistent as far as you've run your program. Maybe you haven't run it enough. Using uninitialized memory produces undefined behavior. Programs with UB might work as intended for years and then suddenly misbehave.
By the way, your expression doesn't have the intended meaning. Try adding more spaces.
* ( (long *) ((void*)buf) + 1 )
First you cast buf to void *, then you cast it again to long *, then you added one (to get the next long, not the next byte), and then fetched a long from memory. The bytes that got printed lie entirely outside the char[2] array.
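To make the scaling concrete, here is a small sketch (illustrative only) that prints where char and long pointer arithmetic actually land:

#include <iostream>

int main()
{
    char buf[16] = {0};
    char *c = buf;
    long *l = (long *)(void *)buf;
    // Pointer arithmetic advances by the size of the pointed-to type:
    std::cout << "buf   at " << (void *)buf << '\n';
    std::cout << "c + 1 at " << (void *)(c + 1) << '\n'; // buf + 1 byte
    std::cout << "l + 1 at " << (void *)(l + 1) << '\n'; // buf + sizeof(long) bytes
}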
This code is reading past the end of a buffer and so is a security risk and completely undefined behaviour.
The contents of memory beyond buf could be anything.
Your compiler, architecture, and/or build settings may be such that the value is currently the same each time you run it, but that's just blind chance and could change at any point. (As a likely concrete explanation: MSVC debug builds fill uninitialized stack memory with the byte 0xCC, and 0xCCCCCCCC read as a signed 32-bit integer is exactly -858993460.)
It will be different again on 64-bit systems where long is 64 bits wide. Alignment rules may also cause this code to fail outright.
Summary: even though this code is returning you the same result for each run right now, this is totally unsafe and undefined behaviour.
Avoid.
At a particular instant, the value at a particular memory address will be constant unless other variables are created which take its place. In your program, if you output the memory address of buf it will be the same on each run, which means that you are referring to the same address every time the program is run, and hence the same garbage value is printed.

This code seems to append characters outside the allocated range

I'm playing with some basic stuff in C++. I'm new to this language, so I should warn that my question may not be correctly formulated. I appreciate any help.
The thing is that after seeing the example at www.cplusplus.com/reference/cstdlib/malloc/ I found myself with this code:
#include <stdio.h>
#include <stdlib.h> // for malloc

int main(void) {
    char *str;
    str = (char *)malloc(2);
    str[0] = '8';
    str[1] = '8';
    str[2] = '6'; // writes past the 2 allocated bytes
    str[3] = '\0';
    printf("%s\n", str);
}
And compiling with:
gcc -O0 -pedantic -Wall test2.cpp
(gcc version 4.7.2)
I get no errors and the output is 886. Why no errors? Have I not passed the boundary of the allocated space?
If that code is OK... then why does the example in the reference allocate the exact size?
In the other (more probable) case... what are the risks?
Thanks!
You don't get any errors because C and C++ don't do bounds checking. You overwrote sections of memory that you weren't using, but you got lucky and it wasn't anything important. Compare it to putting a row of nails into a wall where you know there's a stud. If you miss the stud, most of the time, you just put a hole in the plaster, but it's dangerous to keep doing it because eventually, you're going to hit one of the live wires instead.
You have indeed passed the boundary of the allocated memory.
However, printf does not care what size of memory you have allocated. All it cares about is that it starts at the beginning and continues until it finds a '\0'.
What you have created is undefined behaviour. There can be some other data right after your allocated region (maybe another variable), in which case it will get corrupted. If the next part is unallocated memory, you might escape without a visible problem. And if the memory right after your allocation is not mapped into your process, you will see the nice and tidy segmentation fault. The consequences can be even worse, so better not to rely on this anywhere.
The following can be found in the comments in malloc.c of glibc:

    Minimum overhead per allocated chunk: 4 or 8 bytes
    Each malloced chunk has a hidden word of overhead holding size
    and status information.

    Minimum allocated size: 4-byte ptrs: 16 bytes (including 4 overhead)
                            8-byte ptrs: 24/32 bytes (including 4/8 overhead)

    When a chunk is freed, 12 (for 4-byte ptrs) or 20 (for 8-byte ptrs
    but 4-byte size) or 24 (for 8/8) additional bytes are needed; 4 (8)
    for a trailing size field and 8 (16) bytes for free list pointers.
    Thus, the minimum allocatable size is 16/24/32 bytes.
Since the minimum allocated size is 16/24/32 bytes, which is larger than the 4 bytes your program actually writes, your program ran without errors. This is one possible explanation for why your program appears to execute correctly.
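For comparison, the pattern the cplusplus.com reference example follows is to allocate exactly what the string needs, including its terminator. A minimal sketch (the string "886" is just the value from the question):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    const char *src = "886";
    char *str = (char *)malloc(strlen(src) + 1); // +1 for the '\0'
    if (str == NULL)
        return 1;
    strcpy(str, src); // every byte written now lies inside the allocation
    printf("%s\n", str);
    free(str);
    return 0;
}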

What exactly do the memory operations malloc and free do?

Recently I ran into a memory-release problem. First, below is the C code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *p = (int *)malloc(5 * sizeof(int));
    int i;
    for (i = 0; i < 5; i++)
        p[i] = i;
    p[i] = i; // out-of-bounds write: i is 5 here
    for (i = 0; i < 6; i++)
        printf("[%p]:%d\n", p + i, p[i]);
    free(p);
    printf("The memory has been released.\n");
}
Apparently, there is a memory-out-of-range problem here. When I use the VS2008 compiler, it gives the following output and some errors about the memory release:
[00453E80]:0
[00453E84]:1
[00453E88]:2
[00453E8C]:3
[00453E90]:4
[00453E94]:5
However, when I use the gcc 4.7.3 compiler under Cygwin, I get the following output:
[0x80028258]:0
[0x8002825c]:1
[0x80028260]:2
[0x80028264]:3
[0x80028268]:4
[0x8002826c]:51
The memory has been released.
Apparently, the code runs normally, but 5 is not written to the memory.
So there seem to be some differences between how VS2008 and gcc handle this.
Could you give me a professional explanation of this? Thanks in advance.
This is normal, as you never allocated the memory at p[5]. The program will just print whatever data happened to be stored in that space.
There's no deterministic "explanation" of this. Writing data into the uncharted territory past the allocated memory limit causes undefined behavior. The behavior is unpredictable. That's all there is to it.
It is still strange, though, to see that 51 printed there. Typically GCC will also print 5 but fail with a memory-corruption message at free. How you managed to make this code print 51 is not exactly clear. I strongly suspect that the code you posted is not the code you ran.
It seems that you have multiple questions, so let me try to answer them separately:
As pointed out by others above, you write past the end of the array, so once you have done that you are in "undefined behavior" territory. This means that anything could happen, including printing 5, 6, or 0xdeadbeef, or blowing up your PC.
In the first case (VS2008), free appears to report an error message on standard output. It is not obvious to me what this error message is, so it is hard to explain what is going on, but you ask later in a comment how VS2008 could know the size of the memory you release. Typically, if you allocate memory and store it in pointer p, a lot of memory allocators (the malloc/free implementations) store the size of the allocation at p[-1]. In practice, it is also common to store a special value (say, 0xdeadbeef) at address p[p[-1]]. This "canary" is checked upon free to see if you have written past the end of the array. To summarize, your 5*sizeof(int) array is probably at least 5*sizeof(int) + 2*sizeof(char*) bytes long, and the memory allocator used by code compiled with VS2008 has quite a few built-in checks.
In the case of gcc, I find it surprising that you get 51 printed. If you wanted to investigate exactly why, I would recommend getting an asm dump of the generated code as well as running this under a debugger, to check whether 5 is actually written past the end of the array (gcc could well have decided not to generate that code because it is "undefined"), and if it is, to put a watchpoint on that memory location to see who overrides it, when, and why.
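To make the size-header-plus-canary idea concrete, here is a toy sketch. It is purely illustrative: real allocators, including the one in VS2008's CRT, are far more involved, and every name here is made up.

#include <cstdio>
#include <cstdlib>
#include <cstring>

static const unsigned int CANARY = 0xdeadbeef;

// Layout: [size_t size][user bytes...][canary]
void *toy_malloc(size_t n)
{
    char *raw = (char *)malloc(sizeof(size_t) + n + sizeof(CANARY));
    if (raw == NULL) return NULL;
    memcpy(raw, &n, sizeof(size_t));                            // hidden size header
    memcpy(raw + sizeof(size_t) + n, &CANARY, sizeof(CANARY));  // trailing canary
    return raw + sizeof(size_t);                                // pointer handed to the user
}

void toy_free(void *p)
{
    char *raw = (char *)p - sizeof(size_t);
    size_t n;
    memcpy(&n, raw, sizeof(size_t));
    unsigned int c;
    memcpy(&c, raw + sizeof(size_t) + n, sizeof(c));
    if (c != CANARY)                                   // canary clobbered: the caller
        fprintf(stderr, "heap corruption detected\n"); // wrote past the end of the block
    free(raw);
}

int main()
{
    int *p = (int *)toy_malloc(5 * sizeof(int));
    p[5] = 5;    // out-of-bounds write lands exactly on the canary
    toy_free(p); // prints the corruption message
    return 0;
}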

C++ buffer overflow different on 3 machines

I am testing a simple buffer overflow in C++. The example is a test where, given that checks are not in place, a malicious user could overwrite variables using a buffer overflow.
The example defines a buffer and then a variable, meaning space should be allocated for the buffer and then space for the variable. The example reads from cin into a buffer of length 5 and then checks whether the admin variable is set to something other than 0; if it is, the user has conceptually gained admin access.
#include <iostream>
#include <cstring> // for strcmp
using namespace std;

int main()
{
    char buffer[5];
    int admin = 0;
    cin >> buffer; // unbounded read: more than 4 characters overflows buffer
    if (strcmp(buffer, "in") == 0)
    {
        admin = 1;
        cout << "Correct" << endl;
    }
    if (admin != 0)
        cout << "Access" << endl;
    return 0;
}
I have 3 machines: 1 Windows and 2 Linux systems.
When I test this on Windows (Code::Blocks), it works (logically): entering more than 5 characters overflows and rewrites the admin variable's bytes.
Now, my first Linux system also works, but only when I enter 13 characters. Is this to do with different compilers and how they allocate memory to the program?
My second Linux machine can't overflow the variable at all; it only gives a core-dump error after the 13th character.
Why do they differ that much?
You should examine disassembly. From there you will see what happens precisely.
Generally speaking, there are two things to consider:
Padding done by the compiler to align stack variables.
Relative placement of the stack variables by the compiler.
The first point: your array char buffer[5]; will be padded so that int admin; is properly aligned on the stack. I would expect it to generally be padded to 8 bytes on both x86 and x64, and so 9 symbols to overwrite. But the compiler might do it differently as it sees fit. Nonetheless, it appears that the Windows and Linux machines are x86 (32-bit).
The second point: the compiler is not required to put stack variables on the stack in the order of their declaration. On Windows and on the first Linux machine, the compiler does indeed place char buffer[5]; below int admin;, so you can overflow into it. On the second Linux machine, the compiler chooses to place them in the reverse order, so instead of overflowing into int admin;, you are corrupting the stack frame of the caller of main() when you write beyond the space allocated for char buffer[5];.
Here is a shameless link to my own answer to a similar question - an example of examining such an overflow.
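Short of reading the disassembly, a quick (if crude) way to observe the relative placement on a given machine is to print the variables' addresses. A sketch (the output is entirely compiler- and settings-dependent):

#include <iostream>

int main()
{
    char buffer[5];
    int admin = 0;
    // Comparing the two addresses shows which variable sits lower on the
    // stack, i.e. whether overflowing buffer can reach admin at all.
    std::cout << "buffer at " << (void *)buffer << '\n';
    std::cout << "admin  at " << (void *)&admin << '\n';
    return 0;
}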
Undefined behavior is, as you have discovered, undefined. Trying to explain it is in general not terribly productive.
In this case it's almost certainly due to the arrangement of your stack and the padding bytes inserted between the local variables, both of which vary between compilers and systems.
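For completeness, a hedged fix for the original example: bounding the extraction prevents the overflow in the first place, since std::setw limits how many characters operator>> will store into a char array:

#include <iostream>
#include <iomanip>
#include <cstring>

int main()
{
    char buffer[5];
    int admin = 0;
    // Reads at most 4 characters plus the terminating '\0'.
    std::cin >> std::setw(sizeof(buffer)) >> buffer;
    if (std::strcmp(buffer, "in") == 0)
    {
        admin = 1;
        std::cout << "Correct" << std::endl;
    }
    if (admin != 0)
        std::cout << "Access" << std::endl;
    return 0;
}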

Why am I not getting a segmentation fault with this code? (Bus error)

I had a bug in my code that went like this:
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Ideally this should give a segfault. However, I saw this give a bus error.
Wikipedia says something to the order of: "A bus error is when the program tries to access an unaligned memory location, or when you try to access a physical (not virtual) memory location that does not exist or is not allowed."
The second part of the above statement sounds similar to a segfault. So my question is: when do you get a SIGBUS, and when a SIGSEGV?
EDIT:
Quite a few people have mentioned the context. I'm not sure what context would be needed, but this was a buffer overflow inside a static class function that gets called from a number of other class functions. If there's something more specific I can give that would help, do ask.
Anyway, someone commented that I should simply write better code. I guess the point of asking this question was: "can an application developer infer anything from a SIGBUS versus a SIGSEGV?" (picked from that blog post below)
As you probably realize, the base cause is undefined behavior in your program. In this case, it leads to an error detected by the hardware, which is caught by the OS and mapped to a signal. The exact mapping isn't really specified (and I've seen integral division by zero result in a SIGFPE), but generally: SIGSEGV occurs when you access out of bounds, SIGBUS for other accessing errors, and SIGILL for an illegal instruction. In this case, the most likely explanation is that your bounds error has overwritten the return address on the stack. If the return address isn't correctly aligned, you'll probably get a SIGBUS, and if it is, you'll start executing whatever is there, which could result in a SIGILL. (But the possibility of executing random bytes as code is what the standards committee had in mind when they defined "undefined behavior". Especially on machines with no memory protection, where you could end up jumping directly into the OS.)
A segmentation fault is never guaranteed when you're doing fishy stuff with memory. It all depends on a lot of factors (how the compiler lays out the program in memory, optimizations etc).
What may be illegal for a C++ program may not be illegal for a program in general. For instance, the OS doesn't care if you step outside an array; it doesn't even know what an array is. However, it does care if you touch memory that doesn't belong to you.
A segmentation fault occurs if you try to access a virtual address that is not mapped into your process. On most operating systems, memory is mapped in pages of a few kilobytes; this means that you often won't get a fault if you write off the end of an array, since there is other valid data following it in the memory page.
A bus error indicates a more low-level error; a wrongly-aligned access or a missing physical address are two reasons, as you say. However, the first is not happening here, since you're dealing with bytes, which have no alignment restriction; and I think the second can only happen on data accesses when memory is completely exhausted, which probably isn't happening.
However, I think you might also get a bus error if you try to execute code from an invalid virtual address. This could well be what is happening here - by writing off the end of a local array, you will overwrite important parts of the stack frame, such as the function's return address. This will cause the function to return to an invalid address, which (I think) will give a bus error. That's my best guess at what particular flavour of undefined behaviour you are experiencing here.
In general, you can't rely on segmentation faults to catch buffer overruns; the best tool I know of is valgrind, although that will still fail to catch some kinds of overrun. The best way to avoid overruns when working with strings is to use std::string, rather than pretending that you're writing C.
In this particular case, you don't know what kind of garbage you have in the format string. That garbage could potentially result in treating the remaining arguments as those of an "aligned" data type (e.g. int or double). Treating an unaligned area as an aligned argument definitely causes SIGBUS on some systems.
Given that your string is made up of two other strings, each up to 20 characters long, yet you are putting them into a field that is 25 characters, that is where your first issue lies. You have a good chance of overstepping your bounds.
The variable desc should be at least 41 characters long (20 + 20 + 1 [for the space you insert]).
Use valgrind or gdb to figure out why you are getting a seg fault.
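As a concrete illustration (the file name overflow.cpp is hypothetical), note that stack overruns like this one are more reliably caught by a compiler sanitizer than by valgrind's default Memcheck tool, which is strongest on heap errors:

g++ -g -fsanitize=address overflow.cpp -o overflow   # AddressSanitizer flags the stack overrun
./overflow

g++ -g overflow.cpp -o overflow                      # or run the unsanitized build
valgrind ./overflow                                  # under valgrind to hunt heap errors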
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Just by looking at this code, I can assume that name and address can each be 20 chars long. If that is so, then does it not imply that desc should be at least 20+20+1 chars long? (1 char for the space between name and address, as specified in the sprintf.)
That could be one reason for the segfault. There could be others as well - for example, what if name is longer than 20 chars?
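If you stay with C-style arrays, a bounded call avoids the overflow (a sketch; the buffer size follows the reasoning above, and the example values are made up):

#include <stdio.h>

int main()
{
    char name[20] = "Alice";
    char address[20] = "Main St";
    char desc[41];
    // snprintf writes at most sizeof(desc) bytes, '\0' included,
    // truncating instead of overflowing.
    snprintf(desc, sizeof(desc), "%s %s", name, address);
    printf("%s\n", desc);
    return 0;
}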
Better still, use std::string:
#include <string>

std::string name;
std::string address;
std::string desc = name + " " + address;
const char *char_desc = desc.c_str(); // if at all you need this
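A self-contained version of that sketch (the example values are made up; note that the pointer returned by c_str() is only valid while desc is alive and unmodified):

#include <iostream>
#include <string>

int main()
{
    std::string name = "Alice";
    std::string address = "Main St";
    std::string desc = name + " " + address;
    // No size arithmetic and no overflow: the string grows as needed.
    std::cout << desc << std::endl;
    return 0;
}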