Heap corruption by strings

Heap corruption by strings - c++

Let's say I have this piece of code:
void someFunction(args..) {
    char array[4];
    array[0] = 'a';
    array[1] = 'b';
    array[2] = 'c';
    array[3] = 'd';
}
Basically, what I'm getting at is that there is no '\0' at the end of the array.
When we leave this function, the array[] is de-allocated - right? Can the fact that there is no '\0' sign at the end cause heap corruption? What if functions like these occur often? Is it the same if I do this:
void someFunction(args..) {
    char* array = new char[4];
    array[0] = 'a';
    array[1] = 'b';
    array[2] = 'c';
    array[3] = 'd';
    // and now I don't call
    // delete[] array;
}
Thanks in advance for the help ! :)

No, neither case will cause heap corruption.
The terminating null character is used to signal the end of a string to library functions. You are not using any library functions here.
In the first case, the array is allocated on the stack and you give an exact size (4). When the function is called, the stack pointer is decremented enough to make room for this variable, and when it returns the stack pointer is incremented by the same amount. The actual contents of the array (including the presence or absence of any terminating null character) have absolutely no effect on this process.
Your second case will cause a memory leak, but still won't cause heap corruption because -- again -- you don't use the pointer with any library functions that expect it.

When we leave this function, the array[] is de-allocated - right?
Yes. Like any automatic variable, it's destroyed when the program leaves its scope.
Can the fact that there is no '\0' sign at the end cause heap corruption?
It won't corrupt anything unless you write an element that's out of range; perhaps by passing it to a function like strcpy which can write an arbitrary number of characters to it. But simply creating it, writing within its range, and destroying it won't do any harm. (In any event, it's unlikely to corrupt the heap, since it's on the stack).
The terminator is only needed by code that interprets the array contents as a C-style string. There's no requirement for arrays to be terminated in general, and it's fairly unusual to use terminated strings in C++, which has the much more convenient std::string type.
What if functions like these occur often?
No problem. Each time, the array is created on the function's stack frame, which is released when the function returns.
Is it the same if I do this: [new with no delete]
That causes a memory leak, since you're allocating a dynamic array but never freeing it. Again, it won't corrupt anything, since you're only writing within the array bounds; but if you keep leaking, then eventually you'll run out of memory.

Related

Memory leak on deallocating char * set by strcpy?

I have a memory leak detector tool which tells me below code is leaking 100 bytes
#include <cstring>
#include <string>
#include <iostream>
void setStr(char ** strToSet)
{
    strcpy(*strToSet, "something!");
}
void str(std::string& s)
{
    char* a = new char[100]();
    setStr(&a);
    s = a;
    delete[] a;
}
int main()
{
    std::string s1;
    str(s1);
    std::cout << s1 << "\n";
    return 0;
}
According to this point number 3 it is leaking the amount I allocated (100) minus length of "something!" (10) and I should be leaking 90 bytes.
Am I missing something here or it is safe to assume the tool is reporting wrong?
EDIT: setStr() is in a library and I cannot see the code, so I guessed it is doing that. It could be that it is allocating "something!" on the heap, what about that scenario? Would we have a 90 bytes leak or 100?

This code does not leak, and it is not the same as point number 3, because you never overwrite the variable that stores the pointer to the allocated memory. The potential problems with this code are that it is vulnerable to a buffer overflow if setStr writes more than 99 characters (plus the terminator), and that it is not exception-safe: if s = a; throws, then delete[] a; won't be called and the memory would leak.
Updated: If setStr allocates a new string and overwrites the initial pointer value, then the pointer to the 100-byte buffer that you've allocated is lost and those 100 bytes leak. You should initialize a with nullptr prior to passing it to setStr and check that it is not null after setStr returns, so the assignment s = a; won't cause a null pointer dereference.

Summing up all the comments, it is clear what the problem is. The library you are using is requesting a char **. This is a common interface pattern for C functions that allocate memory and return a pointer to that memory, or that return a pointer to memory they own.
The memory you are leaking is allocated in the line char* a = new char[100]();. Because setStr is changing the value of a, you can no longer deallocate that memory.
Unfortunately, without the documentation, we cannot deduce what you are supposed to do with the pointer.
If it is from a call to new[] you need to call delete[].
If it is from a call to malloc you need to call std::free.
If it is a pointer to memory owned by the library, you should do nothing.
You need to find the documentation for this. However, if it is not available, you can try using your memory leak detection tool after removing the new statement and see if it detects a leak. I'm not sure if it is going to be reliable with memory allocated from a library function but it is worth a try.
Finally, regarding the question in your edit, if you leak memory you leak the whole amount, unless you do something that is undefined behavior, which is pointless to discuss anyway. If you new 100 chars and then write some data on them, that doesn't change the amount of memory leaked. It will still be 100 * sizeof(char)

Potential memory leak?

The following code resolves the problem of removing the duplicate characters in a string.
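#include <climits>  // CHAR_BIT
#include <cstring>  // strlen

void removeDuplicatesEff(char *str)
{
    if (!str)
        return;
    int len = strlen(str);
    if (len < 2)
        return;
    const int sz = (1<<CHAR_BIT);
    bool hit[sz] = {false};
    int tail = 0;
    for (int i=0; i<len; ++i)
    {
        // Index as unsigned char: plain char may be signed, and a negative
        // index would be out of bounds.
        unsigned char ch = str[i];
        if (!hit[ch])
        {
            str[tail] = str[i];
            ++tail;
            hit[ch] = true;
        }
    }
    str[tail] = 0; // re-terminate after the last kept character
}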
After setting str[tail]=0 in the last step, if char *str does contain duplicate characters, the string will be shorter, i.e. tail characters long. But I am wondering whether there is a memory leak here? It seems to me that, later, we cannot release all the space that was allocated to the original char *str. Is this right? If so, how can we resolve it in such situations?

It seems to me that, later, we cannot release all the space that was allocated to the original char *str. Is this right?
No. The length of a zero-terminated string is completely decoupled from the size of the allocated memory buffer, and the system treats it separately. As long as every allocation is followed by a symmetrical deallocation (e.g. there’s a free for every malloc operation), you’re safe.
But I am wondering whether there is a memory leak here?
Only in a very loose sense: the buffer (temporarily) holds more memory than the shortened string requires. That is usually not a problem, since the whole buffer gets released eventually. Except in very special circumstances, this would therefore not be considered a leak.
That said, the code is quite unconventional and definitely longer than necessary (it also assumes that CHAR_BIT == 8 but that’s another matter). For instance, you can initialise your flag array much more easily, saving a loop:
bool hit[256] = {false};
And why is your loop going over the string one-based, and why is the first character handled separately?

No, there is no leak. You only modify the contents of the array by putting in 0 and not its length.
Also you shouldn't initialize your hit array by assignment with the for-loop. A standard initialization
bool hit[256] = { 0 };
would suffice and can be replaced by your compiler by the most efficient form of initialization.

There is no memory leak in your case. A memory leak happens when you allocate memory from the heap and do not free it after using it. In your function you are not allocating any memory from the heap. You are using local variables, which are stored on the stack and freed when control returns from the function.

What you are doing is just changing the placement of the terminator character. It doesn't actually change the size of the allocated memory. It's actually a very common operation, and there is no risk of memory leak from doing it.

No, you will not have a memory leak. Performing a delete [] or free() on str will deallocate all allocated memory just fine because that information is stored elsewhere and does not depend on the type of data being stored in str.

But I am wondering whether there is a memory leak here? It seems to me that, later, we cannot release all the space that was allocated to the original char *str
There's probably no problem here. The storage for str has been allocated in one of the following ways:
reserved space on the stack
malloc space on the heap
reserved space in the data segment.
In the first case, all of the space disappears when the stack frame unwinds. In the second case, malloc records the number of bytes allocated (usually in the memory location just before the first byte pointed to by the malloc return value). In the third case, the space is allocated only once when the program is first loaded.
No possibility of a leak there.

Why does a large static array give a seg-fault but dynamic doesn't? (C++)

The following code gives me a segmentation fault:
bool primeNums[100000000]; // index corresponds to number, t = prime, f = not prime
for (int i = 0; i < 100000000; ++i)
{
    primeNums[i] = false;
}
However, if I change the array declaration to be dynamic:
bool *primeNums = new bool[100000000];
I don't get a seg-fault. I have a general idea of why this is: in the first example, the memory's being put on the stack while in the dynamic case it's being put on the heap.
Could you explain this in more detail?

bool primeNums[100000000];
uses up all your stack space, so you get a segmentation fault: there is not enough stack space for such a huge automatic (local) array.
A dynamic array is allocated on the heap, so a segmentation fault is much less likely. Dynamic arrays are created using new in C++, which calls operator new to allocate memory and then initializes the allocated objects (calling constructors where there are any).
More information about how operator new works is quoted from the standard below [new.delete.single]:
Required behavior:
Return a nonnull pointer to suitably aligned storage (3.7.3), or else throw a bad_alloc exception. This requirement is binding on a replacement version of this function.
Default behavior:
— Executes a loop: Within the loop, the function first attempts to allocate the requested storage. Whether the attempt involves a call to the Standard C library function malloc is unspecified.
— Returns a pointer to the allocated storage if the attempt is successful. Otherwise, if the last argument to set_new_handler() was a null pointer, throw bad_alloc.
— Otherwise, the function calls the current new_handler (18.4.2.2). If the called function returns, the loop repeats.
— The loop terminates when an attempt to allocate the requested storage is successful or when a called new_handler function does not return.
So when you use a dynamic array with new and there is not enough space, it will throw bad_alloc by default; in that case you will see an exception, not a segmentation fault. When your array is huge, it is better to use a dynamic array or a standard container such as std::vector.

bool primeNums[100000000];
This declaration allocates memory in the stack space. The stack space is a memory block allocated when your application is launched. It is usually in the range of a few kilobytes or megabytes (it depends on the language implementation, compiler, OS, and other factors).
This space is used to store local (automatic) variables, so you have to be gentle and not overuse it. Because this is a stack, all allocations are contiguous (no empty space between allocations).
bool *primeNums = new bool[100000000];
In this case the memory is allocated on the heap. This is free space where large new chunks of memory can be allocated.

Some compilers or operating systems limit the size of the stack. On Windows the default is 1 MB, but it can be changed.

in the first case you allocate memory on stack:
bool primeNums[100000000]; // put 100000000 bools on stack
for (int i = 0; i < 100000000; ++i)
{
    primeNums[i] = false;
}
however this is allocation on heap:
bool *primeNums = new bool[100000000]; // put 100000000 bools in the heap
Since the stack is (very) limited, this is the reason for the segfault.

Stack overflow for string in C++?

I made a small program that looked like this:
#include <cstring>
#include <iostream>
using namespace std;

void foo () {
    const char *str = "+++"; // 3 characters plus the terminating '\0' = 4 bytes
    char buffer[1];
    strcpy (buffer, str); // writes 4 bytes into a 1-byte buffer
    cout << buffer;
}
int main () {
    foo ();
}
I was expecting that a stack overflow exception would appear because the buffer was smaller than str, but it printed out +++ successfully... Can someone please explain why this happened?
Thank you very much.

Undefined Behavior (UB) happened, and you were unlucky that it did not crash.
Writing beyond the bounds of allocated memory is Undefined Behavior, and UB does not guarantee a crash. Anything might happen.
Undefined behavior means that the language standard places no requirements on what the program does.

You don't get a stack overflow because it's undefined behaviour, which means anything can happen.
Many compilers today have special flags that tell them to insert code to check some stack problems, but you often need to explicitly tell the compiler to enable that.

Undefined behavior...
In case you actually care about why there's a good chance of getting a "correct" result in this case: there are a couple of contributing factors. Variables with auto storage class (i.e., normal, local variables) will typically be allocated on the stack. In a typical case, all items on the stack will be a multiple of some specific size, most often int -- for example, on a typical 32-bit system, the smallest item you can allocate on the stack will be 32 bits. In other words, on your typical 32-bit system, there is room for four bytes (or four chars, if you prefer that term).
Now, as it happens, your source string contained only 3 characters, plus the NUL terminator, for a total of 4 characters. By pure bad chance, that just happened to be short enough to fit into the space the compiler was (sort of) forced to allocate for buffer, even though you told it to allocate less.
If, however, you'd copied a longer string to the target (possibly even just a single byte/char longer) chances of major problems would go up substantially (though in 64-bit software, you'd probably need longer still).
There is one other point to consider as well: depending on the system and the direction the stack grows, you might be able to write well past the end of the space you allocated, and still have things appear to work. You've allocated buffer in main. The only other thing defined in main is str, but it's just a string literal -- so chances are that no space is actually allocated to store the address of the string literal. You end up with the string literal itself allocated statically (not on the stack) and its address substituted where you've used str. Therefore, if you write past the end of buffer, you may be just writing into whatever space is left at the top of the stack. In a typical case, the stack will be allocated one page at a time. On most systems, a page is 4K or 8K in size, so for a random amount of space used on the stack, you can expect an average of 2K or 4K free respectively.
In reality, since this is in main and nothing else has been called, you can expect the stack to be almost empty, so chances are that there's close to a full page of unused space at the top of the stack, so copying the string into the destination might appear to work until/unless the source string was quite long (e.g., several kilobytes).
As to why it will often fail much sooner than that though: in a typical case, the stack grows downward, but the addresses used by buffer[n] will grow upward. In a typical case, the next item on the stack "above" buffer will be the return address from main to the startup code that called main -- therefore, as soon as you write past the amount of space on the stack for buffer (which, as above, is likely to be larger than you specified) you'll end up overwriting the return address from main. In that case, the code inside main will often appear to work fine, but as soon as execution (tries to) return from main, it'll end up using that data you just wrote as the return address, at which point you're a lot more likely to see visible problems.

Outlining what happens:
Either you are lucky and it crashes at once, or, because the behavior is undefined, you could end up writing to a memory address used by something else. Say that you had two buffers, buffer[1] and longbuffer[100], and assume that the address of buffer[2] happens to be the same as that of longbuffer[0]. Then longbuffer would now terminate at longbuffer[1] (because of the null terminator).
const char *str = "+++";
char longbuffer[100] = "lorem ipsum dolor sith ameth";
char buffer[1];
strcpy (buffer, str); // undefined behavior: writes 4 bytes into a 1-byte buffer
/*
One possible outcome, assuming &buffer[2] == &longbuffer[0]:
buffer[0] = +
buffer[1] = +
buffer[2] = longbuffer[0] = +
buffer[3] = longbuffer[1] = \0 <- since strcpy also copies the terminating \0
*/
std::cout << longbuffer; // could then output: +
Hope that helps clarify things. Please note that it's not very likely that these memory addresses will coincide in the general case, but it could happen, and the neighbour doesn't even need to be of the same type: anything can live at the addresses of buffer[2] and buffer[3] before being overwritten by the copy. The next time you try to use your (now destroyed) variable, it might well crash, and that's when debugging becomes a bit tedious, since the crash doesn't seem to have much to do with the real problem (i.e. it crashes when you try to access a variable on your stack, while the real problem is that you destroyed it somewhere else in your code).

There is no explicit bounds checking, or exception throwing on strcpy - it's a C function. If you want to use C functions in C++, you're going to have to take on the responsibility of checking for bounds etc. or switch to using std::string.
In this case it did work, but in a critical system, taking this approach might mean that your unit tests pass but in production, your code barfs - not a situation that you want.

Stack corruption is happening; it's undefined behaviour, and by chance a crash didn't occur. Make the modifications below to your program and run it: with the strcpy enabled it is much more likely to crash, because the overflow can overwrite the neighbouring variables, although even then a crash is not guaranteed.
void foo () {
    const char *str = "+++"; // 3 characters plus the terminating '\0'
    int a = 10;
    int *p = NULL;
    char buffer[1];
    int *q = NULL;
    int b = 20;
    p = &a;
    q = &b;
    cout << *p;
    cout << *q;
    //strcpy (buffer, str);
    //Now uncomment the strcpy; it will likely crash in one of the cout statements below.
    cout << *p;
    cout << *q;
    cout << buffer;
}

Does this generate a memory leak?

void aFunction_2()
{
    char* c = new char[10];
    c = "abcefgh";
}
Questions:
1- Will the c = "abcefgh" be stored in the new char[10]?
2- If the c = "abcefgh" is in another memory area, should I deallocate it?
3- If I wanted to save info into the char[10], would I use a function like strcpy to put the info into the char[10]?

Yes that is a memory leak.
Yes, you would use strcpy to put a string into an allocated char array.
Since this is C++ code you would do neither one though. You would use std::string.

void aFunction_2()
{
    char* c = new char[10]; //OK
    c = "abcefgh"; //Error, use strcpy or preferably use std::string
}
1- Will the: c = "abdefgh" be allocated inner the new char[10]?
No, you are changing the pointer from previously pointing to a memory location of 10 bytes to point to the constant string, causing a memory leak of ten bytes.
2- If the c = "abcdefgh" is another memory area should I dealloc it?
No, it hasn't been allocated on the heap; it's in read-only memory.
3- If I wanted to save info inner the char[10] I would use a function like strcpy to put the info inner the char[10]?
Not sure what you mean with 'inner' here. When you allocate using new, the memory is allocated on the heap and in normal circumstances can be accessed from any part of your program if you provide the pointer to the memory block.

Your question has already been answered multiple times, but I think all answers are missing one important bit (that you did not ask for explicitly):
When you allocated memory for ten characters and then overwrote the only pointer you had referencing this area of memory, you created a memory leak that you cannot fix anymore. To do it right, you would std::strcpy() the content from the pre-allocated, pre-initialized constant part of the memory where your string literal is stored into your dynamically allocated 10 characters.
And here comes the important part:
When you are done dealing with these 10 characters, you deallocate them using delete[]. The [] are important here. Everything that you allocate using new x[] has to be deallocated with delete[]. Neither the compiler nor the runtime is required to warn you when you use a normal delete instead, so it's important to memorize this rule.

No, that is only pointer reassignment;
No, deleting something that didn't come from new will often crash; and
Yes, strcpy will do the job… but it's not usually used in C++.
Since nobody has answered with code, std::uninitialized_copy_n (or just std::copy_n, it really doesn't make a difference here) is more C++ than strcpy:
#include <algorithm> // std::copy_n
static char const abcs[] = "abcdefgh"; // define string (static in local scope)
char *c = new char[10]; // allocate
std::copy_n( abcs, sizeof abcs, c ); // initialize (no need for strlen)
// when you're done with c:
delete[] c; // don't forget []
Of course, std::string is what you should use instead:
#include <string>
std::string c( "abcdefgh" ); // does allocate and copy for you
// no need for delete when finished, that is automatic too!

No (pointer reassignment)
No
Yes, you can use strcpy(), see for instance:
