This code seems to append characters outside allocated range - c++

I'm playing with some basic stuff of cpp. I'm new in this language... so I'm warning that my question maybe was not correctly formulated. I appreciate any help.
The thing is that after saw the example in www.cplusplus.com/reference/cstdlib/malloc/ I found my self with this code:
#include <stdio.h>
int main (void) {
char *str;
str = (char*) malloc(2);
str[0] ='8';
str[1] ='8';
str[2] ='6';
str[3] ='\0';
printf ("%s\n",str);
}
And compiling with:
gcc -O0 -pedantic -Wall test2.cpp
(gcc version 4.7.2)
I get no errors and the output 886. Why I get no errors? Have I not passed the boundary of the allocated space?
I didn't get no errors and I got the output 886. Why no errors? Have I not passed the boundary of the allocated space?
In the case that code is ok... Why the example in the reference?
In the other (more probable) case... What are the risks?
Thanks!

You don't get any errors because C and C++ don't do bounds checking. You overwrote sections of memory that you weren't using, but you got lucky and it wasn't anything important. Compare it to putting a row of nails into a wall where you know there's a stud. If you miss the stud, most of the time, you just put a hole in the plaster, but it's dangerous to keep doing it because eventually, you're going to hit one of the live wires instead.

You have passed over the boundary of the allocated memory.
However, printf does not bother what size of a memory you have declared. All it cares is it will start from the start and continue till it finds a 0.
The case you created is an undefined behaviour. There can be some other data right after your allocated region (maybe another variable) in which case it will get corrupted. If the next part is unallocated memory you might escape without a visible problem. And if the memory right after your allocated memory belongs to another process, you will see the nice and tidy Segmentation Fault. The consequences can be even worse, so better not try this anywhere.

the following can be found in comments in malloc.c of glibc:
Minimum overhead per allocated chunk: 4 or 8 bytes Each malloced
chunk has a hidden word of overhead holding size and status
information.
Minimum allocated size: 4-byte ptrs: 16 bytes (including 4
overhead)
8-byte ptrs: 24/32 bytes (including, 4/8 overhead)
When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte
ptrs but 4 byte size) or 24 (for 8/8) additional bytes are needed;
4 (8) for a trailing size field and 8 (16) bytes for free list
pointers. Thus, the minimum allocatable size is 16/24/32 bytes.
Since minimum allocated size would be 16/24/32, since it is greater than 3 bytes your program ran without errors. This is one of the possibility executing your program correctly.

Related

Valgrind error with new array [duplicate]

I am getting an invalid read error when the src string ends with \n, the error disappear when i remove \n:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main (void)
{
char *txt = strdup ("this is a not socket terminated message\n");
printf ("%d: %s\n", strlen (txt), txt);
free (txt);
return 0;
}
valgrind output:
==18929== HEAP SUMMARY:
==18929== in use at exit: 0 bytes in 0 blocks
==18929== total heap usage: 2 allocs, 2 frees, 84 bytes allocated
==18929==
==18929== All heap blocks were freed -- no leaks are possible
==18929==
==18929== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==18929==
==18929== 1 errors in context 1 of 1:
==18929== Invalid read of size 4
==18929== at 0x804847E: main (in /tmp/test)
==18929== Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd
==18929== at 0x402A17C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==18929== by 0x8048415: main (in /tmp/test)
==18929==
==18929== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
How to fix this without sacrificing the new line character?
It's not about the newline character, nor the printf format specifier. You've found what is arguably a bug in strlen(), and I can tell you must be using gcc.
Your program code is perfectly fine. The printf format specifier could be a little better, but it won't cause the valgrind error you are seeing. Let's look at that valgrind error:
==18929== Invalid read of size 4
==18929== at 0x804847E: main (in /tmp/test)
==18929== Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd
==18929== at 0x402A17C: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==18929== by 0x8048415: main (in /tmp/test)
"Invalid read of size 4" is the first message we must understand. It means that the processor ran an instruction which would load 4 consecutive bytes from memory. The next line indicates that the address attempted to be read was "Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd."
With this information, we can figure it out. First, if you replace that '\n' with a '$', or any other character, the same error will be produced. Try it.
Secondly, we can see that your string has 40 characters in it. Adding the \0 termination character brings the total bytes used to represent the string to 41.
Because we have the message "Address 0x4204050 is 40 bytes inside a block of size 41 alloc'd," we now know everything about what is going wrong.
strdup() allocated the correct amount of memory, 41 bytes.
strlen() attempted to read 4 bytes, starting at the 40th, which would extend to a non-existent 43rd byte.
valgrind caught the problem
This is a glib() bug. Once upon a time, a project called Tiny C Compiler (TCC) was starting to take off. Coincidentally, glib was completely changed so that the normal string functions, such as strlen() no longer existed. They were replaced with optimized versions which read memory using various methods such as reading four bytes at a time. gcc was changed at the same time to generate calls to the appropriate implementations, depending on the alignment of the input pointer, the hardware compiled for, etc. The TCC project was abandoned when this change to the GNU environment made it so difficult to produce a new C compiler, by taking away the ability to use glib for the standard library.
If you report the bug, glib maintainers probably won't fix it. The reason is that under practical use, this will likely never cause an actual crash. The strlen function is reading bytes 4 at a time because it sees that the addresses are 4-byte aligned. It's always possible to read 4 bytes from a 4-byte-aligned address without segfaulting, given that reading 1 byte from that address would succeed. Therefore, the warning from valgrind doesn't reveal a potential crash, just a mismatch in assumptions about how to program. I consider valgrind technically correct, but I think there is zero chance that glib maintainers will do anything to squelch the warning.
The error message seems to indicate that it's strlen that read past the malloced buffer allocated by strdup. On a 32-bit platform, an optimal strlen implementation could read 4 bytes at a time into a 32-bit register and do some bit-twiddling to see if there's a null byte in there. If near the end of the string, there are less than 4 bytes left, but 4 bytes are still read to perform the null byte check, then I could see this error getting printed. In that case, presumably the strlen implementer would know if it's "safe" to do this on the particular platform, in which case the valgrind error is a false positive.

Expected behavior for using pointer beyond malloc allocated memory

Just wondering, because I can't figure out a way to test this. Imagine the scenario whereby I have 10 bytes of memory
I malloc varA with 5 bytes
Assign a string with 7 characters (which use up 8 bytes)
I malloc varB with 5 bytes
Will the program run into an error? Or just end up with gibberish memory?
Does the behaviour varies from a c/c++ and a cuda program?
That's not a memory leak, it's a buffer overflow. And those leads to undefined behavior, which will most likely give you weird problems (or even crashes) during run-time.
Unless you mean point 2 literally, like in
char *str = malloc(5);
str = "foobar";
Then you do have a memory leak, and not a buffer overflow.
It is an undefined behavior to write beyond allocated memory.

What do the memory operations malloc and free exactly do?

Recently I met a memory release problem. First, the blow is the C codes:
#include <stdio.h>
#include <stdlib.h>
int main ()
{
int *p =(int*) malloc(5*sizeof (int));
int i ;
for(i =0;i<5; i++)
p[i ]=i;
p[i ]=i;
for(i =0;i<6; i++)
printf("[%p]:%d\n" ,p+ i,p [i]);
free(p );
printf("The memory has been released.\n" );
}
Apparently, there is the memory out of range problem. And when I use the VS2008 compiler, it give the following output and some errors about memory release:
[00453E80]:0
[00453E84]:1
[00453E88]:2
[00453E8C]:3
[00453E90]:4
[00453E94]:5
However when I use the gcc 4.7.3 compiler of cygwin, I get the following output:
[0x80028258]:0
[0x8002825c]:1
[0x80028260]:2
[0x80028264]:3
[0x80028268]:4
[0x8002826c]:51
The memory has been released.
Apparently, the codes run normally, but 5 is not written to the memory.
So there are maybe some differences between VS2008 and gcc on handling these problems.
Could you guys give me some professional explanation on this? Thanks In Advance.
This is normal as you have never allocated any data into the mem space of p[5]. The program will just print what ever data was stored in that space.
There's no deterministic "explanation on this". Writing data into the uncharted territory past the allocated memory limit causes undefined behavior. The behavior is unpredictable. That's all there is to it.
It is still strange though to see that 51 printed there. Typically GCC will also print 5 but fail with memory corruption message at free. How you managed to make this code print 51 is not exactly clear. I strongly suspect that the code you posted is not he code you ran.
It seems that you have multiple questions, so, let me try to answer them separately:
As pointed out by others above, you write past the end of the array so, once you have done that, you are in "undefined behavior" territory and this means that anything could happen, including printing 5, 6 or 0xdeadbeaf, or blow up your PC.
In the first case (VS2008), free appears to report an error message on standard output. It is not obvious to me what this error message is so it is hard to explain what is going on but you ask later in a comment how VS2008 could know the size of the memory you release. Typically, if you allocate memory and store it in pointer p, a lot of memory allocators (the malloc/free implementation) store at p[-1] the size of the memory allocated. In practice, it is common to also store at address p[p[-1]] a special value (say, 0xdeadbeaf). This "canary" is checked upon free to see if you have written past the end of the array. To summarize, your 5*sizeof(int) array is probably at least 5*sizeof(int) + 2*sizeof(char*) bytes long and the memory allocator used by code compiled with VS2008 has quite a few checks builtin.
In the case of gcc, I find it surprising that you get 51 printed. If you wanted to investigate wwhy that is exactly, I would recommend getting an asm dump of the generated code as well as running this under a debugger to check if 5 is actually really written past the end of the array (gcc could well have decided not to generate that code because it is "undefined") and if it is, to put a watchpoint on that memory location to see who overrides it, when, and why.

C++ Dereference the Non-allocated Memory but Without Segmentation Fault

I have encountered a problem which I don't understand, the following is my code:
#include <iostream>
#include <stdio.h>
#include <string.h>
#include <cstdlib>
using namespace std;
int main(int argc, char **argv)
{
char *format = "The sum of the two numbers is: %d";
char *presult;
int sum = 10;
presult = (char *)calloc(sizeof(format) + 20, 1); //allocate 24 bytes
sprintf(presult, format, sum); // after this operation,
// the length of presult is 33
cout << presult << endl;
presult[40] = 'g'; //still no segfault here...
delete(presult);
}
I compiled this code on different machines. On one machine the sizeof(format) is 4 bytes and on another, the sizeof(format) is 8 bytes; (On both machines, the char only takes one byte, which means sizeof(*format) equals 1)
However, no matter on which machine, the result is still confusing to me. Because even for the second machine, the allocated memory for use is just 20 + 8 which is 28 bytes and obviously the string has a length of 33 meaning that at least 33 bytes are needed. But there is NO segmentation fault occurring after I run this program. As you can see, even if I tried to dereference the presult at position 40, the program doesn't crash and show any segfault information.
Could anyone help to explain why? Thank you so much.
Accessing unallocated memory is undefined behavior, meaning you might get a segfault (if you're lucky) or you might not.
Or your program is free to display kittens on the screen.
Speculating on why something happens or doesn't happen in undefined behavior land is usually counter-productive, but I'd imagine what's happening to you is that the OS is actually assigning your application a larger block of memory than it's asking for. Since your application isn't trying to dereference anything outside that larger block, the OS doesn't detect the problem, and therefore doesn't kill your program with a segmentation fault.
Because undefined behavior is undefined. It's not "defined to crash".
There is no seg fault because there is no reason for there to be one. You are very likely stil writing into the heap since you got memory from the heap, so the memory isn't read only. Also, the memory there is likely to exist and be allocated for you(or at least the program), so it's not an access violation. Normally you would get a seg fault because you might try to access memory that is not given to you or you may be trying to write to memory that is read only. Neither of these appears to be the case here, so nothing goes wrong.
In fact, writing past the end of a buffer is a common security problem, known as the buffer overflow. It was the most common security vulnerability for some time. Nowadays people are using higher level languages which check for out of index bounds, so this is not as big of a problem anymore.
To respond to this: "the result is still confusing to me. Because even for the second machine, the allocated memory for use is just 20 + 8 which is 28 bytes and obviously the string has a length of 33 meaning that at least 33 bytes are needed."
sizeof(some_pointer) == sizeof(size_t) on any infrastructure. You were testing on a 32bit machine (4B) and on a 64bit machine (8B).
You have to give malloc the number of bytes to allocate; sizeof(ptr_to_char) will not give you the length of the string (the number of chars until '\0').
Btw, strlen does what you want: http://www.cplusplus.com/reference/cstring/strlen/

off-by-one error with string functions (C/C++) and security potentials

So this code has the off-by-one error:
void foo (const char * str) {
char buffer[64];
strncpy(buffer, str, sizeof(buffer));
buffer[sizeof(buffer)] = '\0';
printf("whoa: %s", buffer);
}
What can malicious attackers do if she figured out how the function foo() works?
Basically, to what kind of security potential problems is this code vulnerable?
I personally thought that the attacker can't really do anything in this case, but I heard that they can do a lot of things even if they are limited to work with 1 byte.
The only off-by-one error I see here is this line:
buffer[sizeof(buffer)] = '\0';
Is that what you're talking about? I'm not an expert on these things, so maybe I've overlooking something, but since the only thing that will ever get written to that wrong byte is a zero, I think the possibilities are quite limited. The attacker can't control what's being written there. Most likely it would just cause a crash, but it could also cause tons of other odd behavior, all of it specific to your application. I don't see any code injection vulnerability here unless this error causes your app to expose another such vulnerability that would be used as the vector for the actual attack.
Again, take with a grain of salt...
Read Shell Coder's Handbook 2nd Edition for lots of information.
Disclaimer: This is inferred knowledge from some research I just did, and should not be taken as gospel.
It's going to overwrite part or all of your saved frame pointer with a null byte - that's the reference point that your calling function will use to offset it's memory accesses. So at that point the calling function's memory operations are going to a different location. I don't know what that location will be, but you don't want to be accessing the wrong memory. I won't say you can do anything, but you might be able to do something.
How do I know this (really, how did I infer this)? Smashing the stack for Fun and Profit by Aleph One. It's quite old, and I don't know if Windows or Compilers have changed the way the stack behaves to avoid these problems. But it's a starting point.
example1.c:
void function(int a, int b, int c) {
char buffer1[5];
char buffer2[10];
}
void main() {
function(1,2,3);
}
To understand what the program does to call function() we compile it with
gcc using the -S switch to generate assembly code output:
$ gcc -S -o example1.s example1.c
By looking at the assembly language output we see that the call to
function() is translated to:
pushl $3
pushl $2
pushl $1
call function
This pushes the 3 arguments to function backwards into the stack, and
calls function(). The instruction 'call' will push the instruction pointer
(IP) onto the stack. We'll call the saved IP the return address (RET). The
first thing done in function is the procedure prolog:
pushl %ebp
movl %esp,%ebp
subl $20,%esp
This pushes EBP, the frame pointer, onto the stack. It then copies the
current SP onto EBP, making it the new FP pointer. We'll call the saved FP
pointer SFP. It then allocates space for the local variables by subtracting
their size from SP.
We must remember that memory can only be addressed in multiples of the
word size. A word in our case is 4 bytes, or 32 bits. So our 5 byte buffer
is really going to take 8 bytes (2 words) of memory, and our 10 byte buffer
is going to take 12 bytes (3 words) of memory. That is why SP is being
subtracted by 20. With that in mind our stack looks like this when
function() is called (each space represents a byte):
bottom of top of
memory memory
buffer2 buffer1 sfp ret a b c
<------ [ ][ ][ ][ ][ ][ ][ ]
top of bottom of
stack stack
What can malicious attackers do if she
figured out how the function foo()
works? Basically, to what kind of
security potential problems is this
code vulnerable?
This is probably not the best example of a bug that could be easily exploited for security purposes although it could exploited to potentially crash the code simply by using a string of 64-characters or longer.
While it certainly is a bug that will corrupt the address immediately after the array (on the stack) with a single zero byte, there is no easy way for a hacker to inject data into the corrupted area. Calling the printf() function will push parameters on the stack and may clear the zero that was written out of array bounds and lead to a potentially unterminated string being passed to printf.
However, without intimate knowledge of what goes on in printf (and needing to exploit printf as well as foo), a hacker would be hard pressed to do anything other than crash your code.
FWIW, this is a good reason to compile with warnings on or to use functions like strncpy_s which both respects buffer size and also includes a terminating null even if the copied string is larger than the buffer. With strncpy_s, the line "buffer[sizeof(buffer)] = '\0';" is not even necessary.
The issue is that you don't have permission to write to the item after the array. When you asked for 64 chars for buffer, the system is required to give you at least 64 bytes. It's normal for the system to give you more than that -- in which case the memory belongs to you and there is no problem in practice -- but it is possible that even the first byte after the array belongs to "somebody else."
So what happens if you overwrite it? If the "somebody else" is actually inside your program (maybe in a different structure or thread) the operating system probably won't notice you trampled on that data, but that other structure or thread might. There's no telling what data should be there or how trampling over it will affect things.
In this case you allocated buffer on the stack, which means (1) the somebody else is you, and in fact is your current stack frame, and (2) it's not in another thread (but could affect other local variables in the current stack frame).