Dynamically created string allocated on heap or stack - C - c++

Context
I was experimenting with getting C strings in C++ without allocating memory on the heap and came across this in testing:
#include <stddef.h>
#include <stdlib.h>
char* get_empty_c_string(size_t length) {
char buffer[length];
char *string = buffer;
for (size_t i = 0; i ^ length; i++) *(string + i) = '\0';
return string;
}
int main(void) {
char *string = get_empty_c_string(20u); // Allocated on heap?
// or stack?
return 0;
}
Question
Is the C string returned allocated on heap or stack?
As far as I know:
Heap allocation occurs with the calloc, malloc & realloc C standard functions or new & new[] C++ keywords.
Stack allocation in most other cases.

The array buffer is a variable length array (VLA), meaning its size is determined at runtime. As a variable local to a function, it resides on the stack. The pointer string then points to that array, and that pointer is returned. Because the returned pointer points to a local stack variable whose lifetime ends when the function returns, attempting to use that pointer invokes undefined behavior.
Also, note that VLAs are a C only feature.

There is no way in standard C++ to obtain runtime-sized memory of automatic storage duration (which usually maps to stack memory).
Therefore a proper string of any length cannot be obtained on the stack. You can only allocate a buffer with a maximal size and use strings up to that length in the program. (Something similar is usually done by std::string as so-called short string optimization.)
Furthermore, you cannot return pointers or references to variables with automatic storage duration from a function. When the function returns the variables are destroyed and the pointer/reference becomes invalid. You can only ever use the stack-allocation until the function returns. You can however return the variable by-value.
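A minimal sketch of the "return by value" approach described above, assuming a compile-time maximum length is acceptable. The names kMaxLength and make_empty_c_string are hypothetical and do not come from the question.

```cpp
#include <array>
#include <cstddef>

// Hypothetical upper bound on the string length, including the terminator.
constexpr std::size_t kMaxLength = 32;

// Returns the buffer itself (by value), not a pointer to a local.
// The caller's copy has automatic storage duration, so no heap
// allocation is involved and nothing dangles after the return.
std::array<char, kMaxLength> make_empty_c_string() {
    std::array<char, kMaxLength> buffer{};  // value-initialized: every char is '\0'
    return buffer;
}
```

A caller can write auto s = make_empty_c_string(); and pass s.data() to C APIs for as long as s is in scope.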

As @PaulMcKenzie points out, your implementation of get_empty_c_string() would fail to compile in standard C++: arrays declared as local variables of a function need a size that is known at compile time. This is because that amount of memory is reserved on the stack when the function is invoked.
I can see that you're trying to do dynamic memory allocation inside the function itself, which is exactly what heap allocators such as malloc or new are for.

Related

What happens when I initialize an array having the size as a variable?

I want to know where my array is stored if it has a variable size, as in the code below. My textbook says that memory allocated at runtime comes from the heap, but it seems this array is actually allocated on the stack. Can someone clarify how stack and heap memory allocation actually work?
#include<iostream>
using namespace std;
int main(){
int Array_size;
cin >> Array_size;
int array[Array_size];
return 0;
}
Your book is wrong, or you are misreading it.
A Variable-Length Array (a non-standard extension implemented by a few C++ compilers) is always allocated in automatic memory (i.e., on the stack), never in dynamic memory (i.e., on the heap). The array's memory is reclaimed automatically when the array goes out of scope, just like any other local variable.
Dynamic memory is allocated only by the new operator, or by the [std::](m|c)alloc() functions.
Stack and heap memory are a bit abstract, so I understand your confusion. In general, any variables inside functions, including main, that are not dynamically allocated (i.e. created using new) go on the stack. If you allocate an object with new, the object lives on the heap while the pointer variable that holds its address remains on the stack. This is why you must always release the object using delete and, preferably, set the pointer to NULL once it has no more use. Variables that are pushed onto the stack are popped automatically (think of push and pop as inserting and removing; it's just the correct terminology). Memory on the heap is allocated and deallocated manually at runtime. Hopefully that clears up some confusion.
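To illustrate the split between the pointer and the object it points to, here is a small sketch (the function name heap_roundtrip is made up for this example): the pointer variable is a local on the stack, the int it points to is on the heap, and whoever calls new is responsible for the matching delete.

```cpp
// Hypothetical helper: allocates an int on the heap, reads it, frees it.
int heap_roundtrip(int value) {
    int local = value;          // automatic storage: lives on the stack
    int *p = new int(local);    // p itself is on the stack; *p is on the heap
    int result = *p + 1;        // use the heap object while it is alive
    delete p;                   // release the heap object manually
    p = nullptr;                // optional: avoid keeping a dangling pointer
    return result;
}
```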

How to create string with predefined value and allocated space?

In C, there is a nice construct to create a c-string with more allocated space:
char str[6] = "Hi"; // [H,i,0,0,0,0]
I thought I could do the same using (4) version of string constructor, but the reference says
The behavior is undefined if s does not point at an array of at least count elements of CharT.
So it is not safe to use
std::string("Hi", 6);
Is there any way to create such std::string without extra copies and reallocations?
Theory:
Legacy c-strings
Consider the following snippet:
int x[10];
void method() {
int y[10];
}
The first declaration, int x[10], uses static storage duration, defined by cppreference as: "The storage for the object is allocated when the program begins and deallocated when the program ends. Only one instance of the object exists. All objects declared at namespace scope (including global namespace) have this storage duration, plus those declared with static or extern."
In this case, the storage is allocated when the program begins and freed when it ends.
Informally, where it lives is implementation-defined. In practice, objects with static storage duration are placed in the data segments of the executable (typically .bss for zero-initialized objects such as x, .data for initialized ones, and a read-only segment for string literals) and are only referenced at run time.
The second one, int y[10], uses automatic storage duration, defined by cppreference as: "The object is allocated at the beginning of the enclosing code block and deallocated at the end. All local objects have this storage duration, except those declared static, extern or thread_local."
In this case, the allocation is very simple: in most cases it is just a matter of moving the stack pointer.
std::string
A std::string on the other hand is a run-time creature, and it has to allocate some run-time memory:
For smaller strings, std::string has an inner buffer with a constant size and is capable of storing small strings (think of it as a char buffer[N] member)
For larger strings, it performs dynamic allocations.
Practice
You could use reserve(). This method makes sure that the underlying buffer can hold at least N charT's.
Option 1: First reserve, then append
std::string str;
str.reserve(6);
str.append("Hi");
Option 2: First construct, then reserve
std::string str("Hi");
str.reserve(6);
To ensure at most one runtime allocation, you could write:
std::string str("Hi\0\0\0", 6);
str.resize(2);
However, in practice many string implementations use the Small String Optimization, which makes no allocations if the string is "short" (up to size 16 is suggested on that thread). So actually you would not suffer a reallocation by starting the string off at size 2 and later increasing to 6.
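As a sketch of the reserve-then-append approach (Option 1 above), relying only on the standard guarantee that reserve(n) leaves capacity() at least n. The exact capacity is implementation-defined, and the helper name make_reserved is hypothetical.

```cpp
#include <string>

std::string make_reserved() {
    std::string str;
    str.reserve(6);     // capacity() is now at least 6; size() is still 0
    str.append("Hi");   // fits in the reserved storage, no reallocation needed
    return str;
}
```

With the Small String Optimization, the reserve call may not allocate at all, because the built-in buffer is already large enough.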

The difference between dynamic memory, stack memory, and static memory in C++?

I want to know the difference between dynamic memory, stack memory and static memory in C++.
Here is some code as an example:
#include<iostream>
using namespace std;
char *GetMemory(void)
{
char p[]="hello world";
char *q="hello world";
return q;
}
int main(void)
{
return 0;
}
Why is p in the stack memory, but the q in dynamic memory?
p and q are both variables. p is of type "array of 12 char" and q is of type "pointer to char". Both p and q have automatic storage duration. That is, they are allocated on the stack.
q is a pointer and it is initialized to point to the initial character of the string "hello world". This string is a string literal, and all string literals have static storage duration.
p is an array, so when you initialize p with a string literal, it causes p to declare an array of characters, and when it is initialized, the contents of the string literal are copied into the array. So, when GetMemory() is called, space is allocated on the stack for the array p, and the contents of the string literal "hello world" are copied into that array.
No dynamic allocation is performed by your code.
Note that because q is a pointer to an array of characters that have static storage duration, it is safe to return q from the function: the array to which it points will exist for the entire duration of the program. It would not be safe to return p, however, because p ceases to exist when the function returns.
Note also that the type of "hello world" is char const[12]. There is an unsafe implicit conversion in older C++ that allows a string literal to be converted to a char* pointing to the initial character of the string literal. This is unsafe because it silently drops the const-qualification. You should always use const char* when handling string literals, because the characters are not modifiable. (As of C++11, this unsafe conversion has been removed from the language.)
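A small sketch of two safe ways to hand the string back to the caller, following the reasoning above. The function names get_literal and get_copy are hypothetical.

```cpp
#include <string>

// Safe: the literal has static storage duration, so the pointer stays valid
// for the whole program. const is kept because the characters are read-only.
const char *get_literal() {
    const char *q = "hello world";
    return q;
}

// Also safe: the local array is copied into the returned std::string before
// the array's lifetime ends, so nothing refers to dead stack memory.
std::string get_copy() {
    char p[] = "hello world";   // array of 12 char on the stack
    static_assert(sizeof(p) == 12, "11 characters plus the null terminator");
    return std::string(p);
}
```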
Why is "p" in the stack memory but "q" in the dynamic memory?
That's not true; both p and q are allocated with automatic storage duration (implemented as a stack structure). The differences between them are:
p is an array whose elements are modifiable (stack allocated).
q is a pointer and points to readonly memory that has been allocated statically. You really should have declared it as:
const char *q = "whatever";
There is no dynamic allocation here. You didn't call new, malloc, or some routine which uses those behind the scenes to allocate memory. As a result, it is incorrect to return p from this function as it will be invalid once the function returns.
For your examples, since you are using a string literal, this is likely written into the DATA segment of the executable. There is no dynamic memory allocated. A better example is something like this:
void foo()
{
//This is a stack variable. Space is allocated
//on the stack to store it. Its lifetime ends
//when the enclosing routine returns.
some_class stack_variable;
//This is a heap-allocated variable. It will
//remain in memory indefinitely unless deleted.
//If a pointer to this isn't returned, and it
//isn't deleted by the end of the routine, this
//will become a "memory leak".
another_class *heap_variable = new another_class();
//This is a (method) static variable. It retains its
//value between method calls
static int method_static = 1;
++method_static;
}
At the closing brace, stack_variable is cleaned up (that is, the stack space it occupied is reclaimed). heap_variable hasn't been deleted, and thus is a memory leak. If we call this method a few times:
for(int i = 0; i < 5; ++i) { foo(); }
Then method_static will have a value of 6 (it starts at 1 and is incremented once per call).
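To make the behavior of a function-local static concrete, here is a reduced sketch. count_calls is a hypothetical stand-in for foo that returns the counter so it can be observed.

```cpp
// Hypothetical reduction of foo(): only the static counter is kept.
int count_calls() {
    static int method_static = 1;  // initialized once, on the first call
    ++method_static;               // incremented on every call
    return method_static;
}
```

Starting from 1, the first call returns 2, and each later call returns one more than the previous one, so the fifth call returns 6.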

The array is static, but the array size isn't known until runtime. How is this possible?

This has been troubling me for a while. It goes to the heart of my (lack of) understanding of the difference between static and dynamic memory allocation. The following array is an ordinary static array, which should mean the memory is allocated during compile time, correct? Yet, I've set it up so that the user enters the array size at runtime.
#include <iostream>
using namespace std;
int main() {
cout << "how many elements should the array hold? ";
int arraySize;
cin >> arraySize;
int arr[arraySize];
for (int i = 0; i < arraySize; ++i)
arr[i] = i * 2;
return 0;
}
Note that there are no new or delete operators in this program. It works fine in Xcode 4.2 (default Clang compiler) as well as my school's UNIX server (GCC 4.4.5). How does the compiler know how much memory to allocate for arr when the array is created at compile time? Is this just a fluke of my compiler, dangerous code that could corrupt other memory, or is this legit?
This is a non-standard extension of your C++ compilers. Note that in C, unlike in C++, this is officially supported (i.e. standard-mandated behaviour) since C99. In C++, it is not supported because there's already a solution to the problem: Use std::vector instead of the array.
Note, however, that the array is not using static memory allocation (nor dynamic memory allocation), but automatic memory allocation. Automatic variables are automatically deallocated at the end of the function (the memory area where they are allocated is known as the stack, because the allocations and deallocations on it have stack semantics). To have the array use static memory allocation you would have to put static in front of the definition (note that variables in global or namespace scope always use static memory allocation, though). However, if you make the variable static, you'll find that the compiler doesn't allow you to use a non-constant array size any more.
Note that std::vector stores its data with dynamic memory allocations instead. For that reason, you can also use a non-constant size even for static std::vectors.
For an array (or any object) declared inside a function, the memory is allocated on entry to the function (typically on the stack) and deallocated when the function returns. The fact that the function happens to be main in this case doesn't affect that.
This:
cin >> arraySize;
int arr[arraySize];
is a "variable-length array" (VLA). The thing is, C++ doesn't support VLAs. C does, starting with the 1999 ISO C standard (C99), but it's not a feature that C++ has adopted.
Your compiler supports VLAs in C++ as an extension. Using them makes your code non-portable.
(One problem with VLAs is that there's no mechanism for detecting an allocation failure; if arraySize is too big, the program's behavior is undefined).
For gcc, compiling with -pedantic will produce a warning:
warning: ISO C++ forbids variable length array ‘arr’
The generated code allocates arraySize * sizeof(int) bytes on the stack at runtime. Once the function returns, the stack unwinds, "giving back" the bytes that were allocated on it for the array.
Using new and delete is for allocating space on the heap. The lifetime of memory allocated on the heap is independent of any function or method scope: if you allocate space on it in a function, and the function returns, the memory is still allocated and valid.
It's a Variable Length Array (standardized in C since C99, but not part of standard C++). It is allocated on the stack at runtime.
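For comparison, a portable sketch of the same program logic using std::vector, as the first answer suggests. The helper name make_doubled is an assumption for illustration; the size can come from any runtime source such as cin.

```cpp
#include <cstddef>
#include <vector>

// Builds a runtime-sized array whose element i holds i * 2.
// The elements live on the heap; if the allocation fails, the vector throws
// (std::bad_alloc or std::length_error) instead of overflowing the stack.
std::vector<int> make_doubled(std::size_t arraySize) {
    std::vector<int> arr(arraySize);
    for (std::size_t i = 0; i < arraySize; ++i)
        arr[i] = static_cast<int>(i) * 2;
    return arr;
}
```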

Differences between dynamic memory and "ordinary" memory

What are some of the technical differences between memory that is allocated with the new operator and memory that is allocated via a simple variable declaration, such as int var? Does c++ have any form of automatic memory management?
In particular, I have a couple questions. First, since with dynamic memory you have to declare a pointer to store the address of the actual memory you work with, doesn't dynamic memory use more memory? I don't see why the pointer is necessary at all unless you're declaring an array.
Secondly, if I were to make a simple function such as this:
int myfunc() { int x = 2; int y = 3; return x+y; }
...And call it, would the memory allocated by the function be freed as soon as its scope of existence has ended? What about with dynamic memory?
To answer your questions, we first need to define two areas of memory called the stack and the heap.
The stack
Imagine the stack as a stack of boxes. Each box represents the execution of a function. At the beginning, when main is called, there is one box sitting on the floor. Any local variables you define are in that box.
A simple example
int main(int argc, char * argv[])
{
int a = 3;
int b = 4;
return a + b;
}
In this case, you have one box on the floor with the variables argc (an integer), argv (a pointer to an array of char pointers), a (an integer), and b (an integer).
More than one box
int do_stuff(int a, int b); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int a = 3;
int b = 4;
return do_stuff(a, b);
}
int do_stuff(int a, int b)
{
int c = a + b;
c++;
return c;
}
Now, you have a box on the floor (for main) with argc, argv, a, and b. On top of that box, you have another box (for do_stuff) with a, b, and c.
This example illustrates two interesting effects.
As you probably know, a and b were passed-by-value. That's why there is a copy of those variables in the box for do_stuff.
Notice that you don't have to free or delete or anything for these variables. When your function returns, the box for that function is destroyed.
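The pass-by-value point can be checked with a variant of do_stuff that modifies its parameters. This variant, named do_stuff_modifying here, is an illustration and not the code above.

```cpp
// Variant of do_stuff that changes its own copies of a and b.
int do_stuff_modifying(int a, int b) {
    a += 10;        // modifies the copy in this function's box only
    b += 20;        // the caller's variables are untouched
    return a + b;
}
```

Calling it with a = 3 and b = 4 returns 37, while the caller's a and b are still 3 and 4 afterwards.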
Box overflow
int do_stuff(int a, int b); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int a = 3;
int b = 4;
return do_stuff(a, b);
}
int do_stuff(int a, int b)
{
return do_stuff(a, b);
}
Here, you have a box on the floor (for main, as before). Then, you have a box (for do_stuff) with a and b. Then, you have another box (for do_stuff calling itself), again with a and b. And then another. And soon, you have a stack overflow.
Summary of the stack
Think of the stack as a stack of boxes. Each box represents a function executing, and that box contains the local variables defined in that function. When the function returns, that box is destroyed.
More technical stuff
Each "box" is officially called a stack frame.
Ever notice how your variables have "random" default values? When an old stack frame is "destroyed", it just stops being relevant. It doesn't get zeroed out or anything like that. The next time a stack frame uses that section of memory, you see bits of old stack frame in your local variables.
The heap
This is where dynamic memory allocation comes into play.
Imagine the heap as an endless green meadow of memory. When you call malloc or new, a block of memory is allocated in the heap. You are given a pointer to access this block of memory.
int main(int argc, char * argv[])
{
int * a = new int;
return *a;
}
Here, a new integer's worth of memory is allocated on the heap. You get a pointer named a that points to that memory.
a is a local variable, and so it is in main's "box"
Rationale for dynamic memory allocation
Sure, using dynamically allocated memory seems to waste a few bytes here and there for pointers. However, there are things that you just can't (easily) do without dynamic memory allocation.
Returning an array
int * create_array(); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int * intarray = create_array();
return intarray[0];
}
int * create_array()
{
int intarray[5];
intarray[0] = 0;
return intarray;
}
What happens here? You "return an array" in create_array. In actuality, you return a pointer, which just points to the part of the create_array "box" that contains the array. What happens when create_array returns? Its box is destroyed, and you can expect your array to become corrupt at any moment.
Instead, use dynamically allocated memory.
int * create_array(); // forward declaration so main can call it
int main(int argc, char * argv[])
{
int * intarray = create_array();
int return_value = intarray[0];
delete[] intarray;
return return_value;
}
int * create_array()
{
int * intarray = new int[5];
intarray[0] = 0;
return intarray;
}
Because function returning does not modify the heap, your precious intarray escapes unscathed. Remember to delete[] it after you're done though.
Dynamic memory lives on the heap as opposed to the stack. The lifetime of dynamic memory is from the time of allocation, to the time of deallocation. With local variables, their lifetime is limited to the function / block they are defined in.
Regarding your question about the memory usage in the function, in your example the memory for your local variables would be freed at the end of the function. However, if the memory was dynamically allocated with new, it would not be automatically disposed, and you would be responsible for explicitly using delete to free the memory.
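Regarding automatic memory management, the C++ Standard Library provides smart pointers for this: auto_ptr at the time this was written, which has since been deprecated in C++11 and removed in C++17 in favor of std::unique_ptr and std::shared_ptr.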
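In current C++ (C++14 or later is assumed here), the create_array example can be sketched with std::unique_ptr, which frees the array automatically when its owner goes out of scope. The name create_owned_array is hypothetical.

```cpp
#include <memory>

// Heap-allocated array of 5 ints owned by a smart pointer.
// make_unique<int[]> value-initializes the elements, so they start at 0.
std::unique_ptr<int[]> create_owned_array() {
    std::unique_ptr<int[]> intarray = std::make_unique<int[]>(5);
    intarray[0] = 0;
    return intarray;   // ownership moves to the caller; no delete[] needed
}
```

The caller uses the result like an array (for example arr[0]) and the memory is released when arr is destroyed.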
Memory allocated by "new" ends up on the heap.
Local variables declared in a function reside in that function's frame, which is placed on the stack.
Read about stack vs heap allocation here: http://www-ee.eng.hawaii.edu/~tep/EE160/Book/chap14/subsection2.1.1.8.html
Memory allocated with the new operator comes from a memory section called the "heap", while ordinary (automatic) variables use a memory section shared with procedure/function calls (the "stack").
You only need to worry about the dynamic memory allocations you made yourself with new; variables declared directly in the source are automatically freed at the end of their scope (end of function/procedure, block, ...).
The big difference between "dynamic" and "ordinary" memory is reflected rather well in the question itself.
Dynamic memory is not supported very well by C++ itself.
When you use dynamic memory, you are entirely responsible for it yourself. You have to allocate it. If you forget to do so and try to access it through your pointer, you will get plenty of unpleasant surprises. You also have to free the memory, and if you forget that, you will get even more surprises. Such errors are among the most difficult to find in C/C++ programs.
You need an extra pointer because you need some way to access your new memory. Memory by itself (dynamic or not) is not something a programming language can handle directly; you need a way to access it, and that is done through variables. But variables in languages like C++ are stored in "ordinary" memory. So you need pointers: a pointer is a form of indirection that says "No, I am not the value you are looking for, but I point to it". Pointers are the only way in C++ to access dynamic memory.
By contrast, "ordinary" memory can be accessed directly; allocation and freeing are done automatically by the language itself.
Dynamic memory and pointers are the biggest source of problems in C++, but they are also a very powerful concept: when you do it right, you can do much more than with ordinary memory.
That is also the reason plenty of libraries have functions or whole modules for dealing with dynamic memory. The auto_ptr example mentioned in a parallel answer is one such attempt to deal with the problem that dynamic memory should be reliably released at the end of a method.
Normally you will use dynamic memory only in cases where you really need it. You will not use it for a single integer variable, but for arrays or for building larger data structures in memory.