Heap Memory for a String in C++ - c++

I have created a project file, and in int main I call a function. In the function there is a character array, corr[40], which stores the user's input letter by letter (it's a hangman game). After the function finishes, the program goes back to main. If the function is called again, the array still holds the input from the previous call; it is not erased, so only the first few characters of the previous input are overwritten by the new ones.
So I want to know how to allocate memory from the heap for the array (using a pointer). Or is there another way to correct this issue?

You've got a char[40] as a local variable in a function. Since that's not a class type, there is no constructor. The initial values will depend on whatever used to be in that memory location before. That might very well be all or some of the previous letters.
If you want the array to be zero each time, you can just use std::fill(std::begin(foo), std::end(foo), 0);
Note that using heap memory is no solution. There's still no constructor to initialize the heap memory, so that too would have any old value. Using std::string, which does have a constructor, is a solution.
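A minimal sketch of the options mentioned above, assuming a C++11 compiler. The function names are made up for illustration and are not from the question; only corr[40] is taken from it.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Option 1: value-initialize the array, so every element starts as '\0'.
int count_nonzero_after_init() {
    char corr[40] = {};
    int nonzero = 0;
    for (char c : corr) {
        if (c != '\0') ++nonzero;
    }
    return nonzero;
}

// Option 2: reset a buffer that already holds old data with std::fill.
int count_nonzero_after_fill() {
    char corr[40];
    std::fill(std::begin(corr), std::end(corr), 'x');   // simulate leftovers
    std::fill(std::begin(corr), std::end(corr), '\0');  // clear before reuse
    return static_cast<int>(std::count_if(std::begin(corr), std::end(corr),
                                          [](char c) { return c != '\0'; }));
}

// Option 3: std::string has a constructor, so it is empty on every call.
std::string fresh_string() {
    std::string corr;
    assert(corr.empty());
    return corr;
}
```

Any of the three gives the function a clean buffer on each call; std::string additionally removes the fixed 40-character limit.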

Related

Appending to vector with non-dynamically allocated stack

I'm working on a coding question, and to solve it I'm creating my own data structure (class), "SetOfStacks", that has as a member a vector of stacks. In one of the member functions of SetOfStacks, I need to expand the vector using the push_back() function. To do this, I declare a stack variable (non-dynamically) in the member function and then pass that variable in to push_back().
The code works fine, but I don't understand why. I would figure that after the member function has finished executing, the stack variable would go out of scope (because it is not dynamically allocated) and as a result the vector would contain garbage. I would think that the solution would be to use dynamically allocated memory. Why does this work? My best hypothesis is that push_back() takes in the new stack by value and not by reference, effectively making a new copy of it. Any help is appreciated!
When you push_back() a stack onto the vector, the vector stores its own copy of it. (Strictly, push_back() takes its argument by reference to const, but it copy-constructs a new element inside the vector's storage.) So even though the local stack is destroyed when the member function returns, the vector already holds the value.
You can compare this with returning a value from a function to its caller. Even if the returned object is local to the function (i.e. it is going to be destroyed once the function finishes), its value is copied to the caller before it is destroyed.
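To make the copy visible, here is a small sketch, assuming C++11. SetOfStacks is the asker's class name, but the members shown here (add_stack_with, top_of, size) are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stack>
#include <vector>

// Hypothetical reduction of the asker's SetOfStacks class.
class SetOfStacks {
public:
    void add_stack_with(int value) {
        std::stack<int> local;      // automatic storage, destroyed at the closing brace
        local.push(value);
        stacks_.push_back(local);   // the vector stores its own copy of `local`
    }
    int top_of(std::size_t index) const { return stacks_[index].top(); }
    std::size_t size() const { return stacks_.size(); }
private:
    std::vector<std::stack<int>> stacks_;
};

// The copies inside the vector are still usable after the locals are gone.
int demo_set_of_stacks() {
    SetOfStacks s;
    s.add_stack_with(7);
    s.add_stack_with(9);
    assert(s.size() == 2);
    return s.top_of(0) + s.top_of(1);
}
```

No dynamic allocation is needed in user code: the vector owns its elements, and each element is a separate object from the local variable it was copied from.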

Returning an object: value, pointer and reference

I know this has probably been asked and I've looked through other answers, but still I cannot get this completely.
I want to understand the difference between the two following codes:
MyClass getClass(){
return MyClass();
}
and
MyClass* returnClass(){
return new MyClass();
}
Now let's say I call such functions in a main:
MyClass what = getClass();
MyClass* who = returnClass();
If I got this straight, in the first case the object created in the
function scope will have automatic storage, i.e. when you exit the
scope of the function its memory block will be freed. Also, before
freeing such memory, the returned object will be copied into the
"what" variable I created. So there will exist only one copy of
the object. Am I correct?
1a. If I'm correct, why is RVO (Return Value Optimization) needed?
In the second case, the object will be allocated with dynamic storage, i.e. it will exist even outside the function scope. So I need to use delete on it. The function returns a pointer to the object, so there's no copy made this time, and performing delete who will free the previously allocated memory. Am I (hopefully) correct?
Also I understand I can do something like this:
MyClass& getClass(){
return MyClass();
}
and then in main:
MyClass who = getClass();
In this way I'm just saying that "who" is the same object as the one created in the function. However, we're now out of the function scope, so that object doesn't necessarily exist anymore. So I think this should be avoided in order to avoid trouble, right? (and the same goes for
MyClass* who = &getClass();
which would create a pointer to the local variable).
Bonus question: I assume that everything said so far is also true when returning vector<T> (say, for example, vector<double>), though I am missing some pieces.
I know that a vector is allocated on the stack while the things it contains are on the heap, but using vector<T>::clear() is enough to clear such memory.
Now I want to follow the first procedure (i.e. return a vector by value): when the vector is copied, the objects it contains are copied as well; but exiting the function scope destroys the first vector. Now I have the original objects that are contained nowhere, since their vector has been destroyed, and I have no way of deleting those objects that are still on the heap. Or maybe a clear() is performed automatically?
I know that I may betray some misunderstandings of these subjects (especially in the vector part), so I hope you can help me clarify them.
Q1. What happens conceptually is the following: you create an object of type MyClass on the stack in the stack frame of getClass.
You then copy that object into the return value of the function, which is a bit of stack that was allocated before the function call to hold this object.
Then the function returns, the temporary gets cleaned up. You copy the return value into the local variable what. So you have one allocation and two copies.
Most (all?) compilers are smart enough to omit the first copy: the temporary is not used except as return value. However, the copy from the return value into the local variable on the caller side cannot be omitted, because the return value lives on a part of the stack that is freed as soon as the function finishes.
Q1a. Return Value Optimization (RVO) is a special feature, that does allow that final copy to be elided. That is, instead of returning the function result on the stack, it will be allocated straight away in the memory allocated for what, avoiding all copying altogether. Note that, contrary to all other compiler optimizations, RVO can change the behaviour of your program! You could give MyClass a non-default copy constructor, that has side effects, like printing a message to the console or liking a post on Facebook. Normally, the compiler is not allowed to remove such function calls unless it can prove that these side effects are absent. However, the C++ specs contain a special exception for RVO, that says that even if the copy constructor does something non-trivial, it is still allowed to omit the return value copy and reduce the whole thing to a single constructor call.
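One way to observe this is to count copy-constructor calls. The sketch below reuses the names from the question; the static counter and count_copies_on_return are added for illustration. The result depends on the language standard and compiler flags: from C++17 on, this particular copy must be elided, so the count is 0, while older standards only permit the elision.

```cpp
#include <cassert>

struct MyClass {
    static int copies;                     // counts copy-constructor calls
    MyClass() {}
    MyClass(const MyClass&) { ++copies; }  // side effect the compiler may skip
};
int MyClass::copies = 0;

MyClass getClass() {
    return MyClass();
}

// Returns how many copies were actually made for `what`.
int count_copies_on_return() {
    MyClass::copies = 0;
    MyClass what = getClass();
    (void)what;
    assert(MyClass::copies <= 2);          // at most the two conceptual copies
    return MyClass::copies;
}
```

If the counter stays at 0 even though the copy constructor has a side effect, that is the special exception described above at work.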
2. In the second case, the MyClass instance is not allocated on the stack, but on the heap. The result of the new expression is a pointer: the address of the object on the heap. This is the only point where you will ever be able to obtain this address (provided you didn't use placement new), so you need to hold onto it: if you lose it, you cannot call delete and you will have created a memory leak.
You assign the result of new to a variable of type MyClass* so that the compiler can do type checking, but in memory it is typically just a number wide enough to hold an address on your system (32 or 64 bits). You can check this for yourself by converting the pointer with reinterpret_cast to std::uintptr_t (an unsigned integer type wide enough to hold a pointer) and printing the result.
This address is returned to the caller by value, just as in example (1). So again, in principle, there is copying going on, but in this case only copying of a single pointer-sized value, which your CPU is very good at (most of the time it will not even go on the stack but get passed in a register), and not of the whole MyClass object (which in general has to go on the stack because it can be much larger than a pointer).
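As a sketch of what "holding onto the address" means in practice (assuming C++11): the first function is the one from the question, while returnOwnedClass, the alive counter, and live_objects_after_use are hypothetical additions. std::unique_ptr is shown as one common alternative, not as something the question used.

```cpp
#include <cassert>
#include <memory>

struct MyClass {
    static int alive;          // number of MyClass objects currently alive
    MyClass() { ++alive; }
    ~MyClass() { --alive; }
};
int MyClass::alive = 0;

// Raw pointer version from the question: the caller must call delete.
MyClass* returnClass() {
    return new MyClass();
}

// Alternative sketch: the smart pointer deletes the object for the caller.
std::unique_ptr<MyClass> returnOwnedClass() {
    return std::unique_ptr<MyClass>(new MyClass());
}

int live_objects_after_use() {
    {
        MyClass* who = returnClass();
        assert(MyClass::alive == 1);
        delete who;            // forgetting this line would leak the object
    }
    {
        std::unique_ptr<MyClass> owner = returnOwnedClass();
        assert(MyClass::alive == 1);
    }                          // owner's destructor deletes the object here
    return MyClass::alive;
}
```

In both cases only a pointer-sized value crosses the function boundary; the difference is who is responsible for the eventual delete.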
3. Yes, you should not do that. Your analysis is correct: as the function finishes, the local object is cleaned up and its address becomes meaningless. The problem is that it sometimes seems to work. Forgetting about optimizations for the time being, the main reason is the way memory works: clearing (zeroing) memory is quite expensive, so that is hardly ever done. Instead, it is just marked as available again, but it's not overwritten until you make another allocation that needs it. Therefore, even though the object is technically dead, its data may still be in memory, so when you dereference the pointer you may still get the right data back. However, since the memory is technically free, it may be overwritten at any time between right now and the end of the universe. You have created what C++ calls Undefined Behaviour (UB): it may seem to work right now on your computer, but there's no telling what may happen somewhere else or at another point in time.
Bonus: When you return a vector by value, as you remarked, it is not just destroyed: it is first copied to the return value or - taking RVO into account - into the target variable. There are two options now: (1) The copy creates its own objects on the heap, and modifies its internal pointers accordingly. You now have two proper (deep) copies co-existing temporarily -- then when the temporary object goes out of scope, you are just left with the one valid vector. Or (2): When copying the vector, the new copy takes ownership of all the pointers that the old one holds. This is possible, if you know that the old vector is about to be destroyed: rather than re-allocating all the contents again on the heap, you can just move them to the new vector and leave the old one in a sort of half-dead state -- as soon as the function is done cleaning that stack the old vector is no longer there anyway.
Which of these two options is used, is really irrelevant or rather, an implementation detail: they have the same result and whether the compiler is smart enough to choose (2) should not usually be your concern (though in practice option (2) will always happen: deep copying an object just to destroy the original is just pointless and easily avoided).
Either way, the thing to realize is that what gets copied is the part on the stack, while ownership of the pointers to the heap gets transferred: no copying happens on the heap and nothing gets cleared.
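A small sketch that checks this, assuming C++11 or later (where returning a local vector either elides the copy or moves). The names make_values, buffer_inside, and buffer_was_reused are invented for the example.

```cpp
#include <cassert>
#include <vector>

// Records where the vector's heap buffer lived inside the function.
const double* buffer_inside = nullptr;

std::vector<double> make_values() {
    std::vector<double> values(1000, 1.5);
    buffer_inside = values.data();
    return values;   // elided or moved; no deep copy is needed here
}

// True if the caller's vector uses the very same heap buffer.
bool buffer_was_reused() {
    std::vector<double> result = make_values();
    assert(result.size() == 1000);
    return result.data() == buffer_inside;
}
```

The elements are never orphaned on the heap: either the same vector object is used throughout, or the new vector takes over the buffer and the old one is destroyed empty.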
Here are my answers to your different questions:
1- You are absolutely correct. If I understand the sequence correctly, your code will allocate memory, create your object, copy it into the what variable, and then destroy the original as it goes out of scope. The same thing happens when you do:
int SomeFunction()
{
return 10;
}
This will create a temporary that holds 10 (so allocate), copy it to the return variable, and then destroy the temporary (so deallocate). (Here I'm not sure of the specifics; maybe the compiler can remove some of this via automatic inlining, constant values, ... but you get the idea.) Which brings me to
1a- You need RVO to limit this allocation, copy, and deallocation. If your class allocates a lot of data upon construction, it is a bad idea to return it directly. You can use a move constructor in that case and reuse the storage space allocated by the temporary, for example. Or return a pointer. Which takes us all the way down to
2- Returning a pointer works exactly like returning an int from a function. But because pointers are only 4 or 8 bytes long, allocation and deallocation cost a lot less than doing so for a class that's 10 MB in size. And instead of copying the object you copy its address on the heap (usually cheaper, but a copy nonetheless). Do not forget that just because a pointer refers to memory does not mean its own size is 0 bytes. Using a pointer also requires fetching the value from some memory address. Returning a reference (to an object that outlives the call) and inlining are also good ideas to optimise your code, as you avoid chasing pointers, function calls, etc.
3- I think you are correct there. I'd have to make sure by testing, but if I follow my logic you are right.
I hope I answered your questions. And I hope my answers are as correct as can be. But maybe someone more clever than me can correct me :-)
Best.

Arduino array memory usage

If I declare an array in the global scope, it uses up memory to store it. However, if I declare an array (I am using two types, one is a char array, while the other is an int array) inside a function (such as setup()) will the memory be freed automatically once the array goes out of scope?
I believe this happens for some variables such as int or byte. I just wanted to know if this applies to arrays as well.
Also, since I read that for programs containing lots of strings, it is best to store them in program space, does a call such as
lcd.print("Hello")
still use up the memory for the "Hello" string after the function ends (assuming that the print function does not store it someplace else)?
To the second question:
The F() macro will store strings in program memory (flash, PROGMEM) instead of using RAM, so you do not have this problem anymore:
lcd.print(F("Hello"));
As to your 1st question:
Yes. All variables declared inside a function are only valid inside until the function returns and are released automatically then. This has some implications:
You must not use a pointer to a locally declared variable after the variable has gone out of scope, for instance after the function has returned. (Don't return a pointer to a local array from your function!) It is, however, perfectly legal to pass that pointer to other functions when calling them from within the declaring block/function.
Local variables are stored on the local stack so that there needs to be enough room left for the stack to grow by the corresponding number of bytes when the function is called.
The amount of memory used by those variables is not accounted for in the calculation of "used" RAM at compile time.
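The pointer rule above can be sketched in plain C++ (not Arduino-specific; it assumes a toolchain that provides <array>, which some Arduino cores do not). All names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fine: the callee uses the pointer only while the caller's array is alive.
int sum(const int* values, std::size_t count) {
    assert(values != nullptr || count == 0);
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) total += values[i];
    return total;
}

int sum_of_local_array() {
    int readings[4] = {1, 2, 3, 4};   // on the stack until this function returns
    return sum(readings, 4);          // pointer stays valid for the whole call
}

// Instead of returning a pointer to a local array, return the array by value.
std::array<int, 4> make_readings() {
    std::array<int, 4> readings = {{1, 2, 3, 4}};
    return readings;
}
```

Passing the local array down the call chain is safe; handing its address back up is not, which is why make_readings returns a copy.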

C pointer array scope and function calls

I have this situation:
{
float foo[10];
for (int i = 0; i < 10; i++) {
foo[i] = 1.0f;
}
object.function1(foo); // stores the float pointer to a const void* member of object
}
object.function2(); // uses the stored void pointer
Are the contents of the float pointer unknown in the second function call? It seems that I get weird results when I run my program. But if I declare the float foo[10] to be const and initialize it in the declaration, I get correct results. Why is this happening?
For the first question, yes using foo once it goes out of scope is incorrect. I'm not sure if it's defined behavior in the spec or not but it's definitely incorrect to do so. Best case scenario is that your program will immediately crash.
As for the second question, why does making it const work? This is an artifact of the implementation. Most likely the data is being written out to the data section of the DLL and hence is valid for the life of the program. The original sample instead puts the data on the stack, where it has a much shorter lifetime. The code is still wrong; it just happens to work.
Yes, foo[] is out of scope when you call function2. It is an automatic variable, stored on the stack. When the code exits the block it was defined in, it is deallocated. You may have stored a reference (pointer) to it elsewhere, but that is meaningless.
In both cases you are getting undefined behaviour. Anything might happen.
You are storing a pointer to the locally declared array, but once the scope containing the array definition is exited the array - and all its members are destroyed.
The pointer that you have stored now no longer points to a float or even a valid memory address that could be used for a float. It might be an address that is reused for something else or it might continue to contain the original data unchanged. Either way, it is still not valid to attempt to dereference the pointer, either for reading or writing a float value.
For any declaration like this:
{
type_1 variable_name_1;
type_2 variable_name_2;
type_3 variable_name_3;
}
the variables are allocated on the stack.
You can print out the address of each variable:
printf("%p\n", (void*)&variable_name_1);
and you'll see that the addresses differ by a small amount, roughly (but not always exactly) equal to the amount of space each variable needs to store its data.
The memory used by stack variables is recycled when the '}' is reached and the variables go out of scope. This is done nicely and efficiently just by adjusting a special pointer called the 'stack pointer', which says where the data for new stack variables will be allocated. By incrementing and decrementing the stack pointer, programs have an extremely fast way of working out where the memory for variables will live. It's such an important concept that every major processor has a dedicated register just for the stack pointer.
The memory for your array is also pushed and popped from the program's data stack and your array pointer is a pointer into the program's stack memory. While the language specification says accessing the data owned by out-of-scope variables has undefined consequences, the result is typically easy to predict. Usually, your array pointer will continue to hold its original data until new stack variables are allocated and assigned data (i.e. the memory is reused for other purposes).
So don't do it. Copy the array.
I'm less clear about what the standard says about constant arrays (probably the same thing: the memory is invalid when the original declaration goes out of scope). However, your different behavior is explainable if your compiler allocated a chunk of memory for constants that is initialized when your program starts, and foo is later made to refer to that data when it comes into scope. At least, if I were writing a compiler, that's probably what I'd do, as it's both very fast and uses the smallest amount of memory. This theory is easily testable in the following way:
#include <cstdio>
void f()
{
    const float foo[2] = {99, 101};
    printf( "-- %f\n", foo[0] );
    // Writing through const_cast to a const object is undefined behaviour;
    // this is only an experiment to see where the compiler put the data.
    const_cast<float*>(foo)[0] = 666;
}
Call f() twice. If the printed value changes between calls (or an invalid memory access exception is thrown), it's a fair bet that the data for foo is allocated in a special area for constants that the above code wrote over.
Allocating the memory in a special area doesn't work for non-const data because recursive functions may cause many separate copies of a variable to exist on the stack at the same time, each of which may hold different data.
It's undefined behavior in both cases. You should consider the stack based variable deallocated when control leaves the block.
What's happening is currently you're probably just setting a pointer (can't see the code, so I can't be sure). This pointer will point to the object foo, which is in scope at that point. But when it goes out of scope, all hell can break loose, and the C standard can make no guarantees about what happens to that data once it goes out of scope. It can be overwritten by anything. It works for a const array because you're lucky. Don't do that.
If you want the code to work correctly as it is, function1() is going to need to copy the data into the object member. Which means you'll also have to know the length of the array, which means you'll have to pass it in or have some nice termination method.
The memory associated with foo goes out of scope and is reclaimed.
Outside the {}, the pointer is invalid.
It is a good idea to make objects manage their own memory rather than refer to an external pointer. In this specific case your object could allocate its own foo internally and copy the data into it. However it really depends on what you are trying to achieve.
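A sketch of that approach, assuming C++11. function1 and function2 are the names from the question, but the Holder class, the extra count parameter, and the summing in function2 are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical version of the asker's object that owns a copy of the data.
class Holder {
public:
    void function1(const float* data, std::size_t count) {
        assert(data != nullptr);
        values_.assign(data, data + count);   // copy while the source is alive
    }
    float function2() const {
        float total = 0.0f;
        for (float v : values_) total += v;
        return total;
    }
private:
    std::vector<float> values_;
};

float demo_holder() {
    Holder object;
    {
        float foo[10];
        for (int i = 0; i < 10; i++) {
            foo[i] = 1.0f;
        }
        object.function1(foo, 10);
    }                                         // foo is gone, the copy is not
    return object.function2();
}
```

Because the object keeps its own std::vector<float>, function2() no longer depends on the lifetime of the caller's array.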
For simple problems like this it is better to give a simple answer, not 3 paragraphs about stacks and memory addresses.
There are 2 pairs of braces {}, one is inside the other. The array was declared after the first left brace { so it stops existing before the last brace }
The end
When answering a question you must answer it at the level of the person asking regardless of how well you yourself comprehend the issue or you may confuse the student.
-experienced ESL teacher

Performance on strings initialization in C++

I have following questions regarding strings in C++:
1>> Which is the better option (considering performance), and why?
1.
string a;
a = "hello!";
OR
2.
string *a;
a = new string("hello!");
...
delete(a);
2>>
string a;
a = "less";
a = "moreeeeeee";
How exactly is memory management handled in C++ when a bigger string is copied into a smaller string? Are C++ strings mutable?
It is almost never necessary or desirable to say
string * s = new string("hello");
After all, you would (almost) never say:
int * i = new int(42);
You should instead say
string s( "hello" );
or
string s = "hello";
And yes, C++ strings are mutable.
All the following is what a naive compiler would do. Of course as long as it doesn't change the behavior of the program, the compiler is free to make any optimization.
string a;
a = "hello!";
First you initialize a to contain the empty string. (set length to 0, and one or two other operations). Then you assign a new value, overwriting the length value that was already set. It may also have to perform a check to see how big the current buffer is, and whether or not more memory should be allocated.
string *a;
a = new string("hello!");
...
delete(a);
Calling new requires the OS and the memory allocator to find a free chunk of memory. That's slow. Then you initialize it immediately, so you don't assign anything twice or require the buffer to be resized, like you do in the first version.
Then something bad happens, and you forget to call delete, and you have a memory leak, in addition to a string that is extremely slow to allocate. So this is bad.
string a;
a = "less";
a = "moreeeeeee";
Like in the first case, you first initialize a to contain the empty string. Then you assign a new string, and then another. Each of these may require a call to new to allocate more memory. Each line also requires length, and possibly other internal variables to be assigned.
Normally, you'd allocate it like this:
string a = "hello";
One line, perform initialization once, rather than first default-initializing, and then assigning the value you want.
It also minimizes errors, because you don't have a nonsensical empty string anywhere in your program. If the string exists, it contains the value you want.
About memory management, google RAII.
In short, string calls new/delete internally to resize its buffer. That means you never need to allocate a string with new. The string object has a fixed size, and is designed to be allocated on the stack, so that the destructor is automatically called when it goes out of scope. The destructor then guarantees that any allocated memory is freed. That way, you don't have to use new/delete in your user code, which means you won't leak memory.
Is there a specific reason why you constantly use assignment instead of initialization? That is, why don't you write
string a = "Hello";
etc.? This avoids a default construction and just makes more sense semantically. Creating a pointer to a string just for the sake of allocating it on the heap is never meaningful, i.e. your case 2 doesn't make sense and is slightly less efficient.
As to your last question, yes, strings in C++ are mutable unless declared const.
string a;
a = "hello!";
2 operations: calls the default constructor std::string() and then calls operator=.
string *a; a = new string("hello!"); ... delete(a);
only one operation: calls the constructor std::string(const char*), but you must not forget to delete the pointer.
What about
string a("hello");
In case 1.1, your string members (which include pointer to the data) are held in stack and the memory occupied by the class instance is freed when a goes out of scope.
In case 1.2, memory for the members is allocated dynamically from heap too.
When you assign a char* constant to a string, memory that will contain the data will be realloc'ed to fit the new data.
You may see how much memory is allocated by calling string::capacity().
When you call string a("hello"), memory gets allocated in the constructor.
Both the constructor and the assignment operator call the same methods internally to allocate memory and copy the new data there.
If you look at the docs for the STL string class (I believe the SGI docs are compliant to the spec), many of the methods list complexity guarantees. I believe many of the complexity guarantees are intentionally left vague to allow different implementations. I think some implementations actually use a copy-on-modify approach such that assigning one string to another is a constant-time operation, but you may incur an unexpected cost when you try to modify one of those instances. Not sure if that's still true in modern STL though.
You should also check out the capacity() function, which will tell you the maximum length string you can put into a given string instance before it will be forced to reallocate memory. You can also use reserve() to cause a reallocation to a specific amount if you know you're going to be storing a large string in the variable at a later time.
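A short sketch of capacity() and reserve() on the strings from the question. The function name is made up, and the exact capacities are implementation-defined, so the code only checks lower bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::size_t capacity_after_reserve() {
    std::string a = "less";
    const std::size_t short_size = a.size();
    a = "moreeeeeee";                   // strings are mutable; the buffer grows if needed
    assert(a.size() > short_size);
    assert(a.capacity() >= a.size());   // capacity never falls below the size
    a.reserve(100);                     // request room for at least 100 characters
    return a.capacity();
}
```

After reserve(100), appending up to that many characters in total should not trigger another reallocation.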
As others have said, as far as your examples go, you should really favor initialization over other approaches to avoid the creation of temporary objects.
Most likely
string a("hello!");
is faster than anything else.
You're coming from Java, right? In C++, objects are treated the same (in most ways) as the basic value types. Objects can live on the stack or in static storage, and be passed by value. When you declare a string in a function, that allocates on the stack however many bytes the string object takes. The string object itself does use dynamic memory to store the actual characters, but that's transparent to you. The other thing to remember is that when the function exits and the string you declared is no longer in scope, all of the memory it used is freed. No need for garbage collection (RAII is your best friend).
In your example:
string a;
a = "less";
a = "moreeeeeee";
This puts a block of memory on the stack and names it a, then the constructor is called and a is initialized to an empty string. The compiler stores the bytes for "less" and "moreeeeeee" in (I think) the .rdata section of your exe. String a will have a few fields, like a length field and a char* (I'm simplifying greatly). When you assign "less" to a, the operator=() method is called. It dynamically allocates memory to store the input value, then copies it in. When you later assign "moreeeeeee" to a, the operator=() method is again called and it reallocates enough memory to hold the new value if necessary, then copies it in to the internal buffer.
When string a's scope exits, the string destructor is called and the memory that was dynamically allocated to hold the actual characters is freed. Then the stack pointer is decremented and the memory that held a is no longer "on" the stack.
Creating a string directly in the heap is usually not a good idea, just like creating base types. It's not worth it since the object can easily stay on the stack and it has all the copy constructors and assignment operator needed for an efficient copy.
The std::string itself has a buffer on the heap that may be shared by several strings, depending on the implementation.
For instance, with Microsoft's STL implementation you could do this:
string a = "Hello!";
string b = a;
And both strings would share the same buffer until you changed one of them:
a = "Something else!";
That's why it was very bad to store the result of c_str() for later use; c_str() only guarantees that the pointer stays valid until the next non-const call on that string object.
This led to very nasty concurrency bugs, which required the sharing functionality to be turned off with a define if you used such strings in a multithreaded application.