What exactly do the memory operations malloc and free do? - c++

Recently I ran into a memory release problem. First, below is the C code:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int *p = (int *)malloc(5 * sizeof(int));
    int i;
    for (i = 0; i < 5; i++)
        p[i] = i;
    p[i] = i;   /* out-of-bounds write: i is 5 here, but only p[0]..p[4] exist */
    for (i = 0; i < 6; i++)
        printf("[%p]:%d\n", p + i, p[i]);
    free(p);
    printf("The memory has been released.\n");
}
Obviously, there is an out-of-bounds memory access. When I use the VS2008 compiler, it gives the following output and some errors about the memory release:
[00453E80]:0
[00453E84]:1
[00453E88]:2
[00453E8C]:3
[00453E90]:4
[00453E94]:5
However, when I use gcc 4.7.3 under Cygwin, I get the following output:
[0x80028258]:0
[0x8002825c]:1
[0x80028260]:2
[0x80028264]:3
[0x80028268]:4
[0x8002826c]:51
The memory has been released.
Apparently the code runs normally, but 5 is not the value that ends up in that memory location.
So there seem to be some differences between how VS2008 and gcc handle this problem.
Could you give me a professional explanation of this? Thanks in advance.

This is normal, as you never allocated the memory at p[5]. The program just prints whatever data happened to be stored in that location.

There's no deterministic "explanation on this". Writing data into the uncharted territory past the allocated memory limit causes undefined behavior. The behavior is unpredictable. That's all there is to it.
It is still strange, though, to see that 51 printed there. Typically gcc will also print 5, but then fail with a memory-corruption message at free. How you managed to make this code print 51 is not exactly clear. I strongly suspect that the code you posted is not the code you ran.

It seems that you have multiple questions, so let me try to answer them separately:
As pointed out by others above, you write past the end of the array, so once you have done that you are in "undefined behavior" territory, and this means that anything could happen, including printing 5, 6, or 0xdeadbeef, or blowing up your PC.
In the first case (VS2008), free appears to report an error message on standard output. It is not obvious to me what this error message is, so it is hard to explain exactly what is going on, but you ask later in a comment how VS2008 could know the size of the memory you release. Typically, if you allocate memory and store its address in a pointer p, many memory allocators (the malloc/free implementation) store the size of the allocated block just in front of it, at p[-1] so to speak. In practice it is also common to store a special value (say, 0xdeadbeef) just past the end of the block. This "canary" is checked upon free to see whether you have written past the end of the array. To summarize, your 5*sizeof(int) array is probably at least 5*sizeof(int) + 2*sizeof(char*) bytes long, and the memory allocator used by code compiled with VS2008 has quite a few checks built in.
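To make the canary idea concrete, here is a minimal sketch of such a debug allocator. The layout (a size_t header in front, a 0xDEADBEEF canary behind) is an illustrative assumption, not what VS2008's debug heap actually does:

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical block layout: [size header][user data][canary]
static const unsigned CANARY = 0xDEADBEEF;

void *debug_malloc(size_t size)
{
    unsigned char *raw = (unsigned char *)malloc(sizeof(size_t) + size + sizeof(CANARY));
    if (!raw) return NULL;
    memcpy(raw, &size, sizeof(size_t));                            // remember the size
    memcpy(raw + sizeof(size_t) + size, &CANARY, sizeof(CANARY)); // plant the canary
    return raw + sizeof(size_t);                                   // hand out the user pointer
}

void debug_free(void *p)
{
    unsigned char *raw = (unsigned char *)p - sizeof(size_t);
    size_t size;
    unsigned canary;
    memcpy(&size, raw, sizeof(size));
    memcpy(&canary, raw + sizeof(size_t) + size, sizeof(canary));
    if (canary != CANARY)                       // a write past the end clobbered it
        fprintf(stderr, "Heap corruption detected!\n");
    free(raw);
}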
In the case of gcc, I find it surprising that you get 51 printed. If you wanted to investigate why that is, exactly, I would recommend getting an asm dump of the generated code, as well as running this under a debugger to check whether 5 is actually written past the end of the array (gcc could well have decided not to generate that store because it is "undefined"), and if it is, putting a watchpoint on that memory location to see who overwrites it, when, and why.


How to debug segmentation fault?

It works when, in the loop, I set every element to 0 or to entry_count-1.
It works when I set it up so that entry_count is small, and I write it by hand instead of by loop (sorted_order[0] = 0; sorted_order[1] = 1; ... etc).
Please do not tell me what to do to fix my code. I will not be using smart pointers or vectors for very specific reasons. Instead focus on the question:
What sort of conditions can cause this segfault?
Thank you.
---- OLD -----
I am trying to debug code that isn't working on a unix machine. The gist of the code is:
int *sorted_array = (int *)memory;
// I know that this block is large enough
// It is allocated by malloc earlier
for (int i = 0; i < entry_count; ++i) {
    sorted_array[i] = i;
}
There appears to be a segfault somewhere in the loop. Switching to debug mode, unfortunately, makes the segfault stop. Using cout debugging I found that it must be in the loop.
Next I wanted to know how far into the loop the segfault happened, so I added:
std::cout << i << '\n';
It showed the entire range it was supposed to be looping over, and there was no segfault.
With a little more experimentation, I eventually created a string stream before the loop and wrote an empty string into it for each iteration of the loop, and there was no segfault.
I tried some other assorted operations trying to figure out what is going on. I tried setting a variable j = i; and stuff like that, but I haven't found anything that works.
Running valgrind, the only information I got on the segfault was that it was a "General Protection Fault" and something about the default response to signal 11. It also mentions that there's a "Conditional jump or move depends on uninitialized value(s)", but looking at the code I can't figure out how that's possible.
What can this be? I am out of ideas to explore.
This is clearly a symptom of invalid memory use within your program. It will be difficult to find just by looking at your code snippet, as it is most likely the side effect of something else bad that happened earlier.
However, as you mentioned in your question, the problem is reproducible and you are able to run your program under Valgrind, so you may want to attach a debugger to your program (a.out):
$ valgrind --tool=memcheck --db-attach=yes ./a.out
This way Valgrind will drop you into the debugger (GDB) when the first memory error is detected, so that you can do live debugging. This should be the best possible way to understand and resolve your problem.
Once you have figured out your first error, fix it, rerun, and see what other errors you get. Repeat these steps until Valgrind reports no errors.
However, you should avoid raw pointers in modern C++ programs and start using std::vector or std::unique_ptr, as suggested by others as well.
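For instance, the loop from the question could be written with std::vector. This is just a sketch (fill_sorted is a made-up name, and entry_count is assumed known at this point), but the vector owns its storage and at() turns an out-of-bounds index into an exception instead of silent corruption:

#include <cstddef>
#include <vector>

void fill_sorted(std::size_t entry_count)
{
    std::vector<int> sorted_array(entry_count);    // owns its storage, freed automatically
    for (std::size_t i = 0; i < entry_count; ++i)
        sorted_array.at(i) = static_cast<int>(i);  // at() throws instead of corrupting memory
}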
Valgrind and GDB are very useful.
The one I used most recently was GDB; I like it because it showed me the exact line number the segmentation fault was on.
Here are some resources that can guide you on using GDB:
GDB Tutorial 1
GDB Tutorial 2
If you still cannot figure out how to use GDB with these tutorials, there are tons on Google! Just search debugging Segmentation Faults with GDB!
Good luck :)
That is a hard one. I used the valgrind tools to debug segfaults, and they usually pointed right at the violation.
Likely your problem is that you are writing to freed memory, i.e. the memory behind sorted_array goes out of scope or gets freed.
Adding more code hides this problem as data allocation shifts around.
After a few days of experimentation, I figured out what was really going on.
For some reason the machine segfaults on unaligned access. That is, the integers I was writing were not being written to memory boundaries that were multiples of four bytes. Before the loop I computed the offset and shifted the array up that much:
// Round the pointer up to the next 4-byte boundary.
int offset = (4 - (uintptr_t)(memory) % 4) % 4;
memory += offset;
After doing this everything behaved as expected again.
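A generalized sketch of the same computation, assuming C++11 for alignof (the hard-coded 4 above is just alignof(int) on that platform); the caller must of course have allocated enough slack for the shift:

#include <cstdint>

int *align_to_int(void *memory)
{
    // Round the address up to the next multiple of alignof(int).
    std::uintptr_t addr = (std::uintptr_t)memory;
    std::uintptr_t a = alignof(int);              // typically 4
    return (int *)((addr + a - 1) / a * a);
}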

Why am I not getting a segmentation fault with this code? (Bus error)

I had a bug in my code that went like this.
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Ideally this should give a segfault. However, I saw this give a bus error.
Wikipedia says something to the order of 'Bus error is when the program tries to access an unaligned memory location or when you try to access a physical (not virtual) memory location that does not exist or is not allowed. '
The second part of the above statement sounds similar to a seg fault. So my question is, when do you get a SIGBUS and when a SIGSEGV?
EDIT:
Quite a few people have mentioned the context. I'm not sure what context would be needed, but this was a buffer overflow lying inside a static class function that gets called from a number of other class functions. If there's something more specific that I can give which will help, do ask.
Anyway, someone commented that I should simply write better code. I guess the point of asking this question was: "can an application developer infer anything from a SIGBUS versus a SIGSEGV?" (picked from that blog post below)
As you probably realize, the base cause is undefined behavior in your program. In this case, it leads to an error detected by the hardware, which is caught by the OS and mapped to a signal. The exact mapping isn't really specified (and I've seen integral division by zero result in a SIGFPE), but generally: SIGSEGV occurs when you access out of bounds, SIGBUS for other accessing errors, and SIGILL for an illegal instruction. In this case, the most likely explanation is that your bounds error has overwritten the return address on the stack. If the return address isn't correctly aligned, you'll probably get a SIGBUS, and if it is, you'll start executing whatever is there, which could result in a SIGILL. (But the possibility of executing random bytes as code is what the standards committee had in mind when they defined "undefined behavior". Especially on machines with no memory protection, where you could end up jumping directly into the OS.)
A segmentation fault is never guaranteed when you're doing fishy stuff with memory. It all depends on a lot of factors (how the compiler lays out the program in memory, optimizations etc).
What may be illegal for a C++ program may not be illegal for a program in general. For instance the OS doesn't care if you step outside an array. It doesn't even know what an array is. However it does care if you touch memory that doesn't belong to you.
A segmentation fault occurs if you attempt a data access to a virtual address that is not mapped into your process. On most operating systems, memory is mapped in pages of a few kilobytes; this means that you often won't get a fault if you write off the end of an array, since there is other valid data following it in the memory page.
A bus error indicates a more low-level error; a wrongly-aligned access or a missing physical address are two reasons, as you say. However, the first is not happening here, since you're dealing with bytes, which have no alignment restriction; and I think the second can only happen on data accesses when memory is completely exhausted, which probably isn't happening.
However, I think you might also get a bus error if you try to execute code from an invalid virtual address. This could well be what is happening here - by writing off the end of a local array, you will overwrite important parts of the stack frame, such as the function's return address. This will cause the function to return to an invalid address, which (I think) will give a bus error. That's my best guess at what particular flavour of undefined behaviour you are experiencing here.
In general, you can't rely on segmentation faults to catch buffer overruns; the best tool I know of is valgrind, although that will still fail to catch some kinds of overrun. The best way to avoid overruns when working with strings is to use std::string, rather than pretending that you're writing C.
In this particular case, you don't know what kind of garbage you have in the format string. That garbage could potentially result in treating the remaining arguments as those of an "aligned" data type (e.g. int or double). Treating an unaligned area as an aligned argument definitely causes SIGBUS on some systems.
Given that your string is made up of two other strings, each up to 20 characters long, yet you are putting it into a field that is 25 characters, that is where your first issue lies. You have a good chance of overstepping your bounds.
The variable desc should be at least 41 characters long (20 + 20 + 1 [for the space you insert]).
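A sketch of the corrected code, with the buffer sized as described and snprintf as a belt-and-braces guard (the array contents are made up for the example):

#include <stdio.h>

int main(void)
{
    char name[20]    = "John";           /* example contents */
    char address[20] = "Main Street 1";
    char desc[41];                       /* 20 + 20 + 1, as computed above */

    /* snprintf never writes more than sizeof desc bytes, terminator included. */
    snprintf(desc, sizeof desc, "%s %s", name, address);
    puts(desc);
    return 0;
}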
Use valgrind or gdb to figure out why you are getting a seg fault.
char desc[25];
char name[20];
char address[20];
sprintf (desc, "%s %s", name, address);
Just by looking at this code, I can assume that name and address each can be 20 chars long. If that is so, then does it not imply that desc should be minimum 20+20+1 chars long? (1 char for the space between name and address, as specified in the sprintf).
That could be one reason for the segfault. There could be other reasons as well. For example, what if name is longer than 20 chars?
So you are better off using std::string:
std::string name;
std::string address;
std::string desc = name + " " + address;
char const *char_desc = desc.c_str(); // if at all you need this

The value of an out-of-bounds element of an array - should it change on each execution or stay the same?

For the actual question, jump to question part. For an interesting real-world example of undefined behavior, keep reading :)
There was this enumeration:
struct EnumStruct
{
    enum Enum
    {
        val0 = 0,
        val1,
        val2,
        val3,
        val4
    };
};
and in some function we had this:
const int arrayCount = 6;
int arr[] = {
    EnumStruct::val0,
    EnumStruct::val1,
    EnumStruct::val2,
    EnumStruct::val3,
    EnumStruct::val4
    InvalidValue
};
Then there was a loop that put arrayCount elements of arr into a file. This was the Prepare() routine for the unit tests, and the unit test was supposed to check for the presence of InvalidValue in the file. I was assigned a defect stating that the unit test fails. It worked perfectly on my machine, though. After a couple of hours of debugging I noticed that InvalidValue is actually #defined as -1, and that there is a missing comma after val4. You can only imagine the swearwords that came out of my mouth, directed at whoever wrote that code (which had worked perfectly for more than 3 years, actually).
Now, as you can see, the array actually consists of 5 values - 0, 1, 2, 3, 3 - but the loop writes the 6th element to the file as well, which is of course undefined behavior. Now, while technically it is undefined, on Windows with MSVC no crashes happen - it just writes whatever garbage is at that memory location to the file. The thing is that if the garbage happens to be anything but 0, 1, 2, 3, or 4, the unit test will succeed.
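To see why the array ends up as {0, 1, 2, 3, 3}, here is what the compiler effectively sees once the preprocessor expands InvalidValue and the missing comma joins the last two initializers:

// With: #define InvalidValue -1
// and the comma after val4 missing, the initializer list becomes:
int arr[] = {
    EnumStruct::val0,     // 0
    EnumStruct::val1,     // 1
    EnumStruct::val2,     // 2
    EnumStruct::val3,     // 3
    EnumStruct::val4 -1   // parsed as val4 - 1, i.e. 4 - 1 == 3
};
// arr has only 5 elements, so reading arr[5] is undefined behavior.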
Question: It appears that the .vcproj file of the UT's is somehow patched before the UT's are built. I don't know how they do it, but with their build out-of-bounds array elements are always 0. It seems to me that the whole virtual memory is set to 0 before program execution. What project setting is that? Or am I imagining things?
I mean, if it was just luck that there was a 0 sitting out-of-bound of an array, then upon multiple executions my luck would fail, wouldn't it? But it is always 0... I am confused.
By the way, when I build the same project the out-of-bounds element always has different values on each execution. Can you please explain this? Thanks.
To the actual question of whether memory is always 0 at the start: well, it might depend. In general, when the OS provides you with a fresh page of memory, it will be cleared (as a security measure, so that you cannot read the values that any other process had in its memory), so in many cases uninitialized memory can look like 0, until memory is reused within your own process, where you will get whatever you wrote before.
There are also some compiler flags that can affect it. To detect uninitialized-memory issues, debug builds will sometimes write patterns to memory after allocating it from the OS and before handing it to the program, and a different pattern after the program releases the memory and before it is reallocated (to detect access to freed memory), so that it is simpler to identify what happened (if you see in a debugger that the value is 0xDEADBEEF, you know that the memory has already been released by the program). You will have to read the compiler/IDE documentation for more precise information.
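If the program actually needs zeroed memory, the portable fix is to ask for it explicitly rather than rely on the OS happening to hand over a fresh page. A minimal sketch:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* calloc zero-initializes by contract; malloc does not. */
    int *a = (int *)calloc(6, sizeof(int));

    /* Equivalent for a block that was obtained with malloc: */
    int *b = (int *)malloc(6 * sizeof(int));
    if (b)
        memset(b, 0, 6 * sizeof(int));

    free(a);
    free(b);
    return 0;
}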
As you say, it's undefined, so the implementor is free to do whatever they like. I can't speak for Visual C++ at all, but I am aware of other products that do things like zero out memory when running debug builds so that things like invalid pointer dereferences will fail at the site of the fault. It's possible that Microsoft is doing something similar I guess.

Allocate room for null terminating character when copying strings in C?

const char* src = "hello";
Calling strlen(src) returns 5...
Now say I do this:
char* dest = new char[strlen(src)];
strcpy(dest, src);
That doesn't seem like it should work, but when I output everything it looks right. It seems like I'm not allocating space for the null terminator on the end... is this right? Thanks
You are correct that you are not allocating space for the terminator; however, failing to do so will not necessarily cause your program to fail. You may be overwriting information that follows on the heap, or your heap manager may be rounding the allocation size up to a multiple of 16 bytes or so, so you won't necessarily see any visible effect of this bug.
If you run your program under Valgrind or other heap debugger, you may be able to detect this problem sooner.
Yes, you should allocate at least strlen(src)+1 characters.
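A sketch of the corrected allocation (copy_string is a made-up helper name):

#include <cstring>

char *copy_string(const char *src)
{
    char *dest = new char[strlen(src) + 1]; // +1 for the '\0' terminator
    strcpy(dest, src);                      // strcpy copies the terminator too
    return dest;                            // caller must delete[] it
}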
That doesn't seem like it should work, but when I output everything it looks right.
Welcome to the world of Undefined Behavior. When you do this, anything can happen. Your program can crash, your computer can crash, your computer can explode, demons can fly out of your nose.
And worst of all, your program could run just fine, inconspicuously looking like it's working correctly, until one day it starts spitting out garbage. Because somewhere, someone allocated one too few characters for their array, sensitive data gets overwritten: you corrupt the heap and get a segfault at some point a million miles away, or, even worse, your program happily chugs along with a corrupted heap while your functions operate on corrupted credit card numbers, and you get in huge trouble.
Even if it looks like it works, it doesn't. That's Undefined Behavior. Avoid it, because you can never be sure what it will do, and even when what it does when you try it is okay, it may not be okay on another platform.
The best description I have read (it was on Stack Overflow) went like this:
If the speed limit is 50 and you drive at 60, you may get lucky and not get a ticket. But one day, maybe not today, maybe not tomorrow, but one day, that cop will be waiting for you. On that day you will pay, and you will pay dearly.
If somebody can find the original, I would much rather point at that; they were much more eloquent than my explanation.
strcpy will copy the null terminator as well as all of the other chars.
So you are copying strlen("hello") + 1 = 6 chars into a buffer with room for only 5.
You have a buffer overflow here, and overwriting memory that is not your own will have undefined results.
Alternatively, you could also use dest = strdup(src) which will allocate enough memory for the string + 1 for the null terminator (+1 for Juliano's answer).
This is why you should always, always, always run valgrind on any C program that appears to work.
Yeah, everyone has covered the major point; you are not guaranteed to fail. The fact is that the null terminator is the byte 0, and 0 is a pretty common value to be sitting at any particular memory address, so it just happens to work. You could test this by taking a chunk of memory, writing a bunch of garbage to it, and then writing that string there and trying to work with it.
Anyway, the major issue I see here is that you are talking about C but you have this line of code:
char* dest = new char[strlen(src)];
This won't compile in any standard C compiler. There's no new keyword in C; that is C++. In C, you would use one of the memory allocation functions, usually malloc. I know it seems nitpicky, but really, it's not.
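For completeness, a sketch of the same corrected copy in C (the cast is unnecessary in C but harmless, and matches the style of the code in the first question; strdup, where available, does the same in one call):

#include <stdlib.h>
#include <string.h>

char *copy_string(const char *src)
{
    char *dest = (char *)malloc(strlen(src) + 1); /* +1 for the terminator */
    if (dest)
        strcpy(dest, src);
    return dest;                                  /* caller must free() it */
}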

c++ - what happens if you print more characters with sprintf than the buffer has allocated?

I assume this is a common way to use sprintf:
char pText[x];
sprintf(pText, "helloworld %d", Count );
But what exactly happens if the buffer has less memory allocated than will be printed into it?
I.e., what if x is smaller than the length of the formatted output?
I am asking because I get some strange behaviour in the code that follows the sprintf statement.
It's not possible to answer in general "exactly" what will happen. Doing this invokes what is called Undefined behavior, which basically means that anything might happen.
It's a good idea to simply avoid such cases, and use safe functions where available:
char pText[12];
snprintf(pText, sizeof pText, "helloworld %d", count);
Note how snprintf() takes an additional argument that is the buffer size, and won't write more than there is room for.
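snprintf() also returns the number of characters the full output would have needed (excluding the terminator), so truncation can be detected; a small sketch:

#include <cstdio>

int main()
{
    char pText[12];
    int count = 42;

    // "helloworld 42" needs 13 chars plus the terminator, so it won't fit.
    int n = snprintf(pText, sizeof pText, "helloworld %d", count);
    if (n < 0 || (size_t)n >= sizeof pText)
        fprintf(stderr, "output was truncated\n");
    return 0;
}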
This is a common error and leads to memory after the char array being overwritten. So, for example, there could be some ints or another array in the memory after the char array and those would get overwritten with the text.
See a nice detailed description of the whole problem (buffer overflows) here. There's also a comment that some platforms provide a snprintf routine that takes the maximum length as an additional parameter (in your case x). If your compiler doesn't have it, you can also write it yourself to make sure you can't get such errors (or just check that you always have enough space allocated).
Note that the behaviour after such an error is undefined and can lead to very strange bugs. Variables are usually aligned at memory locations divisible by 4, so you often won't notice the error when you have written only one or two bytes too many (e.g. forgetting to make room for the NUL), but you will get strange errors in other cases. These errors are hard to debug because other variables get changed, and the failures will often show up in a completely different part of the code.
This is called a buffer overrun.
sprintf will overwrite the memory that happens to follow pText address-wise. Since pText is on the stack, sprintf can overwrite local variables, function arguments and the return address, leading to all sorts of bugs. Many security vulnerabilities result from this kind of code — e.g. an attacker uses the buffer overrun to write a new return address pointing to his own code.
The behaviour in this situation is undefined. Normally, you will crash, but you might also see no ill effects, strange values appearing in unrelated variables and that kind of thing. Your code might also call into the wrong functions, format your hard-drive and kill other running programs. It is best to resolve this by allocating more memory for your buffer.
I have done this many times, and you will get a memory corruption error. As far as I remember, I did something like this:
vector<char> vecMyObj(10);
vecMyObj.resize(10);
sprintf(&vecMyObj[0],"helloworld %d", count);
But when the destructor of the vector was called, my program got a memory corruption error. If the formatted output is shorter than 10 characters, it works fine.
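The same idea with the size respected, so the formatting can never overrun the vector's storage and clobber its internal bookkeeping (a sketch; buf.data() needs C++11, &buf[0] works before that):

#include <cstdio>
#include <vector>

int main()
{
    int count = 42;
    std::vector<char> buf(32);  // generously sized buffer

    // Passing the vector's size keeps the formatted output in bounds.
    snprintf(buf.data(), buf.size(), "helloworld %d", count);
    std::puts(buf.data());
    return 0;
}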
Can you spell buffer overflow? One possible result is stack corruption, which makes your app vulnerable to stack-based exploitation.