int main()
{
char* p = new char('a');
*reinterpret_cast<int*>(p) = 43523;
return 0;
}
This code runs fine but how safe is it? It should have only allocated one byte of memory but it doesn't seem to be having any problem filling 4 bytes with data. What can happen with the other 3 bytes that are not allocated?
char* p = new char('a');
ASCII code of 'a' is 97, so this is equivalent to:
char* p = new char(97);
Single character (1 byte) space is allocated with *p='a'
Now, you are trying to put more than 1-byte value, that's certainly risky, even if this works, I mean runs without any segmentation fault. You are overwriting some other parts of memory that you don't own or even if own, must be for some other purpose.
The code is not safe. Your program has undefined behaviour. Anything at all could happen.
The code causes heap buffer overflow. You're overriding memory which you don't own and can be in use in other parts of the program.
Related
int main() {
int* i = new int(1);
i++;
*i=1;
delete i;
}
Here is my logic:
I increment I by 1, and then assign a value to it. Then I delete the I, so I free the memory location while leaking the original memory. Where is my problem?
I also tried different versions. Every time, as long as I do the arithmetics and delete the pointer, my program crashes.
What your program shows is several cases of undefined behaviour:
You write to memory that hasn't been allocated (*i = 1)
You free something that you didn't allocate, effectively delete i + 1.
You MUST call delete on exactly the same pointer-value that you got back from new - nothing else. Assuming the rest of your code was valid, it would be fine to do int *j = i; after int *i = new int(1);, and then delete j;. [For example int *i = new int[2]; would then make your i++; *i=1; valid code]
Who allocates is who deallocates. So you should not be able to delete something you did not new by yourself. Furthermore, i++;*i=1; is UB since you may access a restricted memory area or read-only memory...
The code made no sense . I think You have XY problem. If you could post your original problem there will be more chance to help you.
In this case you need to have a short understanding how the heap memory management works. in particular implementation of it, when you allocate an object you receive a pointer to the start of the memory available to you to work with. However, the 'really' allocated memory starts a bit 'earlier'. This means the allocated block is a bit more than you have requested to allocate. The start of the block is the address you have received minus some offset. Thus, when you pass the incremented pointer to the delete it tries to find the internal information at the left side of it. And because your address is now incremented this search fails what results in a crash. That's in short.
The problem lies here:
i++;
This line doesn't increment the value i points to, but the pointer itself by the number of bytes an int has (4 on 32-bit platform).
You meant to do this:
(*i)++;
Let's take it step by step:
int* i = new int(1); // 1. Allocate a memory.
i++; // 2. Increment a pointer. The pointer now points to
// another location.
*i=1; // 3. Dereference a pointer which points to unknown
// memory. This could cause segmentation fault.
delete i; // 4. Delete the unknown memory which is undefined
// behavior.
In short: If you don't own a piece of memory you can't do arithmetic with it neither delete it!
The following code resolves the problem of removing the duplicate characters in a string.
void removeDuplicatesEff(char *str)
{
if (!str)
return;
int len = strlen(str);
if (len < 2)
return;
const int sz = (1<<CHAR_BIT);
bool hit[sz] = {false};
int tail = 0;
for (int i=0; i<len; ++i)
{
if (!hit[str[i]])
{
str[tail] = str[i];
++tail;
hit[str[i]] = true;
}
}
str[tail] = 0;
}
After setting str[tail]=0 in the last step, if char *str does contain duplicate characters, its size will be smaller, i.e. tail. But I am wondering whether there is a memory leak here? It seems to me that, later, we cannot releasing all the spaces that is allocated to original char *str. Is this right? If so, how can we resolve it in such situations?
It seems to me that, later, we cannot releasing all the spaces that is allocated to original char *str. Is this right?
No. The length of a zero-terminated string is completely decoupled from the size of the allocated memory buffer, and the system treats it separately. As long as every allocation is followed by a symmetrical deallocation (e.g. there’s a free for every malloc operation), you’re safe.
But I am wondering whether there is a memory leak here?
Arguably, yes, this is still a leak since it (temporarily) uses more memory than required. However, that is usually not a problem since the memory gets released eventually. Except in very special circumstances, this would therefore not be considered a leak.
That said, the code is quite unconventional and definitely longer than necessary (it also assumes that CHAR_BIT == 8 but that’s another matter). For instance, you can initialise your flag array much easier, saving a loop:
bool hit[256] = {false};
And why is your loop going over the string one-based, and why is the first character handled separately?
No, there is no leak. You only modify the contents of the array by putting in 0 and not its length.
Also you shouldn't initialize your hit array by assignment with the for-loop. A standard initialization
bool hit[256] = { 0 };
would suffice and can be replaced by your compiler by the most efficient form of initialization.
There is no memory leak in your case. Memory leak happens when you allocate memory from head and not freeing after using it. In your case you are not allocating any memory from heap. You are using local variables which are stored in stack and freed when control returns from that function.
What you are doing is just changing the placement of the terminator character. It doesn't actually change the size of the allocated memory. It's actually a very common operation, and there is no risk of memory leak from doing it.
No, you will not have a memory leak. Performing a delete [] or free() on str will deallocate all allocated memory just fine because that information is stored elsewhere and does not depend on the type of data being stored in str.
But I am wondering whether there is a memory leak here? It seems to me that, later, we cannot releasing all the spaces that is allocated to original char *str
There's probably no problem here. the storage for str has been allocated in one of the following ways:
reserved space on the stack
malloc space on the heap
reserved space in the data segment.
In the first case, all of the space disappears when the stack frame unwinds. In the second case, malloc records the number of bytes allocated (usually in the memory location just before the first byte pointed to by the malloc return value. In the third case, the space is allocated only once when the program is first loaded.
No possibility of a leak there.
I am new to programming and I am trying to understand the difference between
A = (char * ) malloc(sizeof(char)*n);
and
A = (char * ) malloc(sizeof(char));
or
A = new char [n];
and
A = new char;
What is the default memory that a compiler is allocating to this pointer, when I do not specify the number of objects of particular data type.
Also when I declare
A = new char [n];
cout << A[n+1];
it does not give me a segmentation fault.
Should It not give segmentation fault because I am trying to access memory beyond what has been allocated for the Array.
Memory is not "allocated to this pointer", it's allocated and then you get a pointer to the memory.
This:
char *a = malloc(sizeof(char) * n);
is the same as
char *a = malloc(n);
since sizeof(char) is always 1. They both allocate space for n characters worth of data, and return a pointer to the location where the first character can be accessed (or NULL on failure).
Also, the casts are not needed in C, you should not have any.
Since sizeof(char) is 1, the second call is equivalent to:
char *a = malloc(1);
which means it allocates a memory block of size 1. This is of course distinct from the pointer to that memory block (the value that gets stored in the pointer variable a). The pointer is most likely larger than 1 char, but that doesn't affect the size of the block.
The argument to malloc() specifies how many chars to allocate space for.
I ignored the new usage, since that is C++ and the question is tagged C.
A = (char * ) malloc(sizeof(char)*n);
This allocates space for n characters.
A = (char * ) malloc(sizeof(char));
This allocates memory for 1 character.
Every call to malloc allocates memory in the heap.
The other code is C++, and it's exactly the same, except that it will use stack memory if A is a local variable. Accessing A[n+1] may or may not yield a segfault. A[n+1] can reference a memory address that you are allowed to use. Segfault happens when you go out of the region of memory you can access, and the way it works is that there is a "red zone" from which it is considered you accessed invalid memory. It may be the case that A[n+1] just isn't "invalid enough" to trigger a segfault.
allocate space for N characters (N should be some positive integer value here)
char *ma = (char * ) malloc(N);
char *na = new char [N];
don't forget to release this memory ...
delete [] na;
free(ma);
allocate space for a single character
char *mc = (char * ) malloc(sizeof(char));
char *nc = new char;
Now, as the others have pointed out, you tagged this C, but half your code is C++. If you were writing C, you couldn't use new/delete, and wouldn't need to cast the result of malloc.
Oh, and the reason you don't get a segmentation fault when you read off the end of your array is that this is undefined behaviour. It certainly could cause a SEGV, but it isn't required to check, so may appear to work, at least some of the time, or fail in a completely different way.
Well, the compiler doesn't allocate memory for the data. Only the pointer which is either 4 or 8 bytes depending on your architecture.
There is no difference between the first two and the last two in terms on functionality. Most C++ libraries I've seen use malloc internally for new.
When you run the code to allocate n characters and you print out the n + 1th character, you aren't getting a segmentation fault most likely because n isn't a multiple of some number, usually 8 or 16. Here's some code on how it might do that:
void* malloc(size_t size) {
if (size & 0x7 != size)
size = size & 0x7 + 1;
return _malloc(size);
}
So, if you requested, say, 5 bytes, malloc would actually allocate, with that code, 8 bytes. So, if you request the 6th byte (n + 1), you would get garbage, but it is still valid memory that your program can access.
I am sorry if I may not have phrased the question correctly, but in the following code:
int main() {
char* a=new char[5];
a="2222";
a[7]='f'; //Error thrown here
cout<<a;
}
If we try to access a[7] in the program, we get an error because we haven't been assigned a[7].
But if I do the same thing in a class :
class str
{
public:
char* a;
str(char *s) {
a=new char[5];
strcpy(a,s);
}
};
int main()
{
str s("ssss");
s.a[4]='f';s.a[5]='f';s.a[6]='f';s.a[7]='f';
cout<<s.a<<endl;
return 0;
}
The code works, printing the characters "abcdfff".
How are we able to access a[7], etc in the code when we have only allocated char[5] to a while we were not able to do so in the first program?
In your first case, you have an error:
int main()
{
char* a=new char[5]; // declare a dynamic char array of size 5
a="2222"; // assign the pointer to a string literal "2222" - MEMORY LEAK HERE
a[7]='f'; // accessing array out of bounds!
// ...
}
You are creating a memory leak and then asking why undefined behavior is undefined.
Your second example is asking, again, why undefined behavior is undefined.
As others have said, it's undefined behavior. When you write to memory out of bounds of the allocated memory for the pointer, several things can happen
You overwrite an allocated, but unused and so far unimportant location
You overwrite a memory location that stores something important for your program, which will lead to errors because you've corrupted your own memory at that point
You overwrite a memory location that you aren't allowed to access (something out of your program's memory space) and the OS freaks out, causing an error like "AccessViolation" or something
For your specific examples, where the memory is allocated is based on how the variable is defined and what other memory has to be allocated for your program to run. This may impact the probability of getting one error or another, or not getting an error at all. BUT, whether or not you see an error, you shouldn't access memory locations out of your allocated memory space because like others have said, it's undefined and you will get non-deterministic behavior mixed with errors.
int main() {
char* a=new char[5];
a="2222";
a[7]='f'; //Error thrown here
cout<<a;
}
If we try to access a[7] in the program, we get an error because we
haven't been assigned a[7].
No, you get a memory error from accessing memory that is write-protected, because a is pointing to the write-only memory of "2222", and by chance two bytes after the end of that string is ALSO write-protected. If you used the same strcpy as you use in the class str, the memory access would overwrite some "random" data after the allocated memory which is quite possibly NOT going to fail in the same way.
It is indeed invalid (undefined behaviour) to access memory outside of the memory you have allocated. The compiler, C and C++ runtime library and OS that your code is produced with and running on top of is not guaranteed to detect all such things (because it can be quite time-consuming to check every single operation that accesses memory). But it's guaranteed to be "wrong" to access memory outside of what has been allocated - it just isn't always detected.
As mentioned in other answers, accessing memory past the end of an array is undefined behavior, i.e. you don't know what will happen. If you are lucky, the program crashes; if not, the program continues as if nothing was wrong.
C and C++ do not perform bounds checks on (simple) arrays for performance reasons.
The syntax a[7] simply means go to memory position X + sizeof(a[0]), where X is the address where a starts to be stored, and read/write. If you try to read/write within the memory that you have reserved, everything is fine; if outside, nobody knows what happens (see the answer from #reblace).
I made a small program that looked like this:
void foo () {
char *str = "+++"; // length of str = 3 bytes
char buffer[1];
strcpy (buffer, str);
cout << buffer;
}
int main () {
foo ();
}
I was expecting that a stack overflow exception would appear because the buffer had smaller size than the str but it printed out +++ successfully... Can someone please explain why would this happened ?
Thank you very much.
Undefined Behavior(UB) happened and you were unlucky it did not crash.
Writing beyond the bounds of allocated memory is Undefined Behavior and UB does not warrant a crash. Anything might happen.
Undefined behavior means that the behavior cannot be defined.
You don't get a stack overflow because it's undefined behaviour, which means anything can happen.
Many compilers today have special flags that tell them to insert code to check some stack problems, but you often need to explicitly tell the compiler to enable that.
Undefined behavior...
In case you actually care about why there's a good chance of getting a "correct" result in this case: there are a couple of contributing factors. Variables with auto storage class (i.e., normal, local variables) will typically be allocated on the stack. In a typical case, all items on the stack will be a multiple of some specific size, most often int -- for example, on a typical 32-bit system, the smallest item you can allocate on the stack will be 32 bits. In other words, on your typical 32-bit system, room for four bytes (of four chars, if you prefer that term).
Now, as it happens, your source string contained only 3 characters, plus the NUL terminator, for a total of 4 characters. By pure bad chance, that just happened to be short enough to fit into the space the compiler was (sort of) forced to allocate for buffer, even though you told it to allocate less.
If, however, you'd copied a longer string to the target (possibly even just a single byte/char longer) chances of major problems would go up substantially (though in 64-bit software, you'd probably need longer still).
There is one other point to consider as well: depending on the system and the direction the stack grows, you might be able to write well the end of the space you allocated, and still have things appear to work. You've allocated buffer in main. The only other thing defined in main is str, but it's just a string literal -- so chances are that no space is actually allocated to store the address of the string literal. You end up with the string literal itself allocated statically (not on the stack) and its address substituted where you've used str. Therefore, if you write past the end of buffer, you may be just writing into whatever space is left at the top of the stack. In a typical case, the stack will be allocated one page at a time. On most systems, a page is 4K or 8K in size, so for a random amount of space used on the stack, you can expect an average of 2K or 4K free respectively.
In reality, since this is in main and nothing else has been called, you can expect the stack to be almost empty, so chances are that there's close to a full page of unused space at the top of the stack, so copying the string into the destination might appear to work until/unless the source string was quite long (e.g., several kilobytes).
As to why it will often fail much sooner than that though: in a typical case, the stack grows downward, but the addresses used by buffer[n] will grow upward. In a typical case, the next item on the stack "above" buffer will be the return address from main to the startup code that called main -- therefore, as soon as you write past the amount of space on the stack for buffer (which, as above, is likely to be larger than you specified) you'll end up overwriting the return address from main. In that case, the code inside main will often appear to work fine, but as soon as execution (tries to) return from main, it'll end up using that data you just wrote as the return address, at which point you're a lot more likely to see visible problems.
Outlining what happens:
Either you are lucky and it crashes at once. Or because it's undefined technically you could end up writing to a memory address used by something else. say that you had two buffers, one buffer[1] and one longbuffer[100] and assume that the memory address at buffer[2] could be the same as longbuffer[0] which would mean that long buffer now terminates at longbuffer[1] (because the null-termination).
char *s = "+++";
char longbuffer[100] = "lorem ipsum dolor sith ameth";
char buffer[1];
strcpy (buffer, str);
/*
buffer[0] = +
buffer[1] = +
buffer[2] = longbuffer[0] = +
buffer[3] = longbuffer[0] = \0 <- since assigning s will null terminate (i.e. add a \0)
*/
std::cout << longbuffer; // will output: +
Hope that helps in clarifying please note it's not very likely that these memory addresses will be the same in the random case, but it could happen, and it doesn't even need to be the same type, anything can be at buffer[2] and buffer[3] addresses before being overwritten by the assignment. Then the next time you try to use your (now destroyed) variable it might well crash, and thats when debugging become a bit tedious since the crash doesn't seem to have much to do with the real problem. (i.e. it crashes when you try to access a variable on your stack while the real problem is that you somewhere else in your code destroyed it).
There is no explicit bounds checking, or exception throwing on strcpy - it's a C function. If you want to use C functions in C++, you're going to have to take on the responsibility of checking for bounds etc. or switch to using std::string.
In this case it did work, but in a critical system, taking this approach might mean that your unit tests pass but in production, your code barfs - not a situation that you want.
Stack corruption is happening, its an undefined behaviour, luckily crash didnt occur. Do the below modifications in your program and run it will crash surely because of stack corruption.
void foo () {
char *str = "+++"; // length of str = 3 bytes
int a = 10;
int *p = NULL;
char buffer[1];
int *q = NULL;
int b = 20;
p = &a;
q = &b;
cout << *p;
cout << *q;
//strcpy (buffer, str);
//Now uncomment the strcpy it will surely crash in any one of the below cout statment.
cout << *p;
cout << *q;
cout << buffer;
}