I'm working on a coding question, and to solve it I'm creating my own data structure (class), "SetOfStacks", that has as a member a vector of stacks. In one of the member functions of SetOfStacks, I need to expand the vector using the push_back() function. To do this, I declare a stack variable (non-dynamically) in the member function and then pass that variable in to push_back().
The code works fine, but I don't understand why. I would figure that after the member function has finished executing, the stack variable would go out of scope (because it is not dynamically allocated) and as a result the vector would contain garbage. I would think that the solution would be to use dynamically allocated memory. Why does this work? My best hypothesis is that push_back() takes in the new stack by value and not by reference, effectively making a new copy of it. Any help is appreciated!
When you push_back() the stack into the vector, the element is stored by value, not by reference, so even though the local stack is destroyed after the function finishes, the vector already holds its own copy.
You can relate this to returning a value from a function to its caller: even though the return value is local to the function's stack frame (i.e. it will be destroyed once the function finishes executing), it gets copied out to the caller before it is destroyed.
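A minimal sketch of that behaviour (the names here are mine, not from the original SetOfStacks code), showing that the copy stored by push_back() outlives the local variable:
#include <iostream>
#include <stack>
#include <vector>

std::vector<std::stack<int>> stacks;   // stands in for the SetOfStacks member

void addStack() {
    std::stack<int> local;             // automatic (non-dynamic) variable
    local.push(42);
    stacks.push_back(local);           // the vector stores its own copy of 'local'
}                                      // 'local' is destroyed here; the copy lives on

int main() {
    addStack();
    std::cout << stacks.back().top() << '\n';  // prints 42
}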
Related
Let's say I have a struct like this:
struct typeA
{
long first;
string second;
double third;
};
If I declare
typeA myArray[100];
Then myArray is stored in the stack, consuming sizeof(typeA)*100 bytes of garbage data (until I store some actual data, at least).
Whenever I pass this array as a parameter, I'll always be passing a pointer to the first element of the array on the stack. So the pointer goes from stack to stack.
But if I declare
vector<int> myVector (4, 100);
Then the myVector object itself is stored on the stack, and it contains a pointer to the first element of an array of 4*sizeof(int) bytes stored in the heap, where the actual data lives. So the pointer goes from stack to heap.
Whenever I pass this vector as a parameter, if I add it to the parameter list like this:
vector<int> parameterVector
the function gets a copy of the myVector object and stores it in the stack.
But if I do it like this:
vector<int> &parameterVector
the function gets a reference to myVector stored in the stack, so I now have a variable stored in the stack, referencing a myVector object also stored in the stack, that contains a pointer to an array of actual elements stored in the heap.
Is this correct?
I have a few doubts here:
Do the actual elements get stored in a static array (the ones inherited from C, indicated with square brackets) in the heap?
Does the myVector object have just one pointer to the first element, or it has multiple pointers to each one of the elements?
So passing a vector by value doesn't pose much of a problem, since the only thing that gets copied is the vector object, but not the actual elements. Is that so?
If I got the whole thing wrong and the actual elements are copied as well when passing a vector parameter by value, then why does C++ allow this, considering it discourages it with static arrays? (as far as I know, static arrays always get passed as a reference to the first element).
Thanks!
Do the actual elements get stored in a static array (the ones inherited from C, indicated with square brackets) in the heap?
Typically the elements of the vector are stored in the free store using a dynamic array like
some_type* some_name = new some_type[some_size]
Does the myVector object have just one pointer to the first element, or it has multiple pointers to each one of the elements?
Typically a vector will have a pointer to the first element, a size variable and a capacity. It could have more but these are implementation details and are not defined by the standard.
So passing a vector by value doesn't pose much of a problem, since the only thing that gets copied is the vector object, but not the actual elements. Is that so?
No. Copying the vector is an O(N) operation, as it has to copy each element of the vector. If it did not, then you would have two vectors using the same underlying array, and if one got destroyed it would delete the array out from under the other one.
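A small illustration of that cost difference (the function names here are mine, for illustration only):
#include <vector>

// Pass by value: the whole vector is copied, O(N); the caller's elements are untouched.
void byValue(std::vector<int> v)      { v.push_back(1); }

// Pass by reference: nothing is copied; the function works on the caller's vector.
void byReference(std::vector<int>& v) { v.push_back(1); }

int main() {
    std::vector<int> myVector(4, 100); // four ints, each with value 100
    byValue(myVector);                 // myVector still has 4 elements afterwards
    byReference(myVector);             // myVector now has 5 elements
}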
Do the actual elements get stored in a static array (the ones inherited from C, indicated with square brackets) in the heap?
std::vector<> will allocate memory on the heap for all your elements, given that you use the standard allocator. It will manage that memory and reallocate when necessary. So no, there is no static array. It is more like how you would handle a dynamic array in C, but without all the traps.
If you are looking for a modern replacement for C arrays, have a look at std::array<>. Be aware that a std::array<> will copy all of its elements as well when passed by value; pass by reference if that is what you want.
Does the myVector object have just one pointer to the first element, or it has multiple pointers to each one of the elements?
std::vector usually is a pointer to the first element, a size and a few more bits for internal usage. But the details are actually implementation specific.
So passing a vector by value doesn't pose much of a problem, since the only thing that gets copied is the vector object, but not the actual elements. Is that so?
No. Whenever the vector object gets copied to another vector object, all the elements will be copied.
If I got the whole thing wrong and the actual elements are copied as well when passing a vector parameter by value, then why does C++ allow this, considering it discourages it with static arrays? (as far as I know, static arrays always get passed as a reference to the first element).
The "static arrays" are a C-Legacy. You should simply not use them any more in new code. In case you want to pass a vector by reference, do so and nothing will be copied. In case you want the vector to be moved, move it, instead of copying it. Whenever you tell the compiler, you want to copy an object, it will.
OK, why is it that way?
The C behaviour is somewhat inconsistent with the rest of the language: when you pass an int, it is copied; when you pass a struct, it is copied; when you pass a pointer, it is copied; but when you pass an array, the array is not copied, only a pointer to its first element is passed.
So the C++ way is more consistent: pass by value copies everything, pass by reference copies nothing. With C++11 move constructors, objects can also be passed by moving them. That means the original vector is left empty, while the new one has taken over responsibility for the original memory block.
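A short sketch of those three options, assuming C++11 or later:
#include <utility>
#include <vector>

void take(std::vector<int> v) {}        // pass by value: copies (or moves from an rvalue)
void look(const std::vector<int>& v) {} // pass by reference: copies nothing

int main() {
    std::vector<int> data(1000, 7);

    look(data);             // no copy
    take(data);             // full O(N) copy; 'data' is unchanged
    take(std::move(data));  // move: 'data' is typically left empty,
                            // the callee has taken over the heap block
}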
If, inside a function, I store data in an unordered_set, and then return pointers to the objects being stored, will the pointers still be valid outside of the scope of the function?
eg.
const int *myFunc(){
std::unordered_set<int> hashset;
//add some objects
hashset.insert(4);
hashset.insert(5);
hashset.insert(6);
const int *intptr = &(*hashset.insert(4).first); // insert() returns a pair<iterator, bool>; try to insert an object that may already be in the set, and get a pointer to the object in the set
return intptr;
}
will trying to access *intptr in another function cause an error? Or is the data in an unordered_set deallocated when the scope of an unordered_set ends?
Yes, in your example you are returning a pointer to an object that is destroyed when the destructor of the unordered_set runs, that is, when the function exits its scope.
Even though the elements contained in an unordered_set are dynamically allocated (and by elements I mean the objects that hold your actual keys or values), they are also destroyed when the set itself is destroyed.
In practice you might be able to access the data and see no errors, but you should not count on that, because it is simply unsafe. Treat it as plainly wrong.
To obtain what you need, you have to manage the lifetime of the objects contained in the set yourself. A std::unique_ptr<int> could do the trick, since moving the value out of the set before the set is destroyed prevents the pointed-to object from being deallocated.
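One safe alternative (just a sketch, not the only option) is to return the value itself, or to keep the set alive beyond the function so the pointer stays valid:
#include <unordered_set>

// Safe: the int is copied out before the local set is destroyed.
int myFuncByValue() {
    std::unordered_set<int> hashset{4, 5, 6};
    return *hashset.insert(4).first;   // insert() returns a pair<iterator, bool>
}

// Also safe: the set outlives the function, so pointers into it remain valid
// (as long as nothing erases the element; rehashing does not move the elements).
std::unordered_set<int> globalSet;

const int *myFuncIntoGlobal() {
    return &(*globalSet.insert(4).first);
}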
The short answer is yes.
The memory inside the hashset object should be considered invalid after the function returns. Thus returning a pointer to the internals of that object will have undefined behavior.
The longer answer is maybe.
However, the state of that memory may remain unchanged for some time after the function has returned. Thus you may get "correct" results from the code despite the memory being invalid.
The longest answer is it depends.
How the memory is handled will depend greatly on the platform that you are running on. Memory constrained systems may perform memory management very differently from desktops.
I have a class named "Human" and have a vector of humans and I populate it this way:
humans.push_back(Human());
and in another class, I have a vector of Human* pointing to those humans, like this:
cell.humans.push_back(&humans.back());
The push_back function creates an object in heap memory, so the object should not change when stack frames change. But apparently, just by defining a variable like this in an unrelated function:
string foo = "a";
one of the humans' attributes gets overwritten, which is unexpected behavior.
But when I change the code so that the first humans vector stores pointers to Human, like this:
humans.push_back(new Human());
cell.humans.push_back(humans.back());
the issue is solved. To debug the program, I even used gdb and set a watchpoint on the changed object, but gdb got stuck in an infinite loop!
How can I explain this behavior?
push_back function creates an object in heap memory and so the object won't change if the stack frames get changed
Yes, but in this case that object is just a pointer. Its validity depends on the validity of the object it points to. If the reference returned by humans.back() gets invalidated, then the pointers in cell.humans will be left dangling, and dereferencing them leads to undefined behaviour.
The reason was interesting: when you push back a new element, the vector may reallocate its storage and copy the existing elements to a new location. The previously stored pointers still point to the old, now freed, locations and are therefore invalidated.
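A minimal demonstration of that reallocation effect (using a trivial stand-in for Human):
#include <vector>

struct Human { int age = 30; };

int main() {
    std::vector<Human> humans;
    humans.push_back(Human{});
    Human* p = &humans.back();      // pointer into the vector's current buffer

    for (int i = 0; i < 100; ++i)
        humans.push_back(Human{});  // growth can reallocate: the elements are
                                    // moved to a new buffer and the old one is freed

    // p->age = 40;                 // undefined behaviour: 'p' is likely dangling now.
    // Safer options: store indices, reserve() enough capacity up front,
    // or store (smart) pointers to separately allocated Human objects.
    (void)p;
}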
I just waste hours on a simple line causing data loss. I have AnotherClass holding a vector of instances of MyClass. This AnotherClass instantiates objects of MyClass the following way:
AnotherClass::AnotherClass(){
MyClass myObject(...);
myVector.push_back(&myObject);
}
The address of myObject is afterwards pushed into a vector (with other instances of MyClass), like written in the code. When I start using instances of AnotherClass I notice the values of MyClass were lost (completely random). When I change the code to:
AnotherClass::AnotherClass(){
MyClass* myObject = new MyClass(...);
myVector.push_back(myObject);
}
I don't have data loss.
Can somebody be so kind as to explain to me why the second way of creating objects doesn't lead to a loss of data? (Without referring me to books of 1,000 pages.)
Simple. The first version creates a local variable on the stack, which gets destroyed automatically when it goes out of scope (at the end of the function).
Your vector just contains a pointer to where the object used to be.
The second version creates an object on the heap, which will live until you eventually delete it.
The reason is RAII.
In the first snippet you declare an object in the scope of your method/constructor.
As soon as this scope ends, which happens when the method finishes, your object goes out of scope and is cleaned up for you (that means its destructor is called). Your vector still holds pointers to those already destroyed, and thus invalid, objects; that is why you get garbage.
In the second snippet, your objects live on the heap. They will not be cleaned up / destroyed unless you call delete myObj;. That is why they remain valid even after the method has finished.
You can solve this on multiple ways:
Declare your vector as std::vector<MyClass> (notice, not a pointer type)
Keep your second snippet, but make sure to delete all elements of your vector once you're done
Use smart pointers if you don't want to clean up your objects on your own (e.g. std::shared_ptr or std::unique_ptr); see the sketch below
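A sketch of the first and third options (assuming C++14 for std::make_unique, and a copyable MyClass):
#include <memory>
#include <vector>

struct MyClass { int value = 0; };

// Option 1: store the objects themselves; the vector owns the copies.
struct AnotherClassByValue {
    std::vector<MyClass> myVector;
    AnotherClassByValue() { myVector.push_back(MyClass{}); }
};

// Option 3: store smart pointers; no manual delete is needed.
struct AnotherClassByUniquePtr {
    std::vector<std::unique_ptr<MyClass>> myVector;
    AnotherClassByUniquePtr() { myVector.push_back(std::make_unique<MyClass>()); }
};

int main() {
    AnotherClassByValue a;
    AnotherClassByUniquePtr b;
}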
The first way allocates a MyClass object on the stack. That object will be deallocated the moment it goes out of scope, i.e. when the constructor has run its course.
The second way allocates the object in dynamic memory. That object will continue to exist until you call delete on it.
The second way is the way to do it, but you should add a destructor to AnotherClass that iterates through the vector and deletes all objects. Otherwise your program will have a memory leak.
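A sketch of such a destructor, assuming myVector is a std::vector<MyClass*>:
#include <vector>

struct MyClass { /* ... */ };

class AnotherClass {
    std::vector<MyClass*> myVector;
public:
    AnotherClass() {
        myVector.push_back(new MyClass());
    }

    ~AnotherClass() {
        for (MyClass* p : myVector)   // delete every object allocated with new
            delete p;
    }

    // With raw owning pointers you also have to handle (or disable) copying,
    // otherwise two AnotherClass objects would delete the same pointers.
    AnotherClass(const AnotherClass&) = delete;
    AnotherClass& operator=(const AnotherClass&) = delete;
};

int main() { AnotherClass a; }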
I am coming from a C# background to C++. Say I have a method that creates an object on the stack, and then I pass it to another class's method which adds it to a member vector.
void DoStuff()
{
SimpleObj so = SimpleObj("Data", 4);
memobj.Add(so);
}
//In memobj
void Add(SimpleObj& so)
{
memVec.push_back(so); //boost::ptr_vector object
}
Here are my questions:
Once the DoStuff method ends, will so go out of scope and be popped from the stack?
memVec has a pointer to so, but so got popped; what happens here?
What's the correct way to pass stack objects to methods that will store them as pointers?
I realise these are probably obvious to a C++ programmer with some experience.
Mark
Yes.
The pointer remains "alive", but points to a no-longer-existent object. This means that the first time you try to dereference such a pointer you get undefined behavior (likely your program will crash, or, worse, it will continue to run, giving "strange" results).
You simply don't do that if you want to keep the objects around after the function has returned. That's why heap allocation and containers which store copies of objects are used.
The simplest way to achieve what you are trying to do would be to store a copy of the objects in a normal STL container (e.g. std::vector). If such objects are heavyweight and costly to copy around, you may want to allocate them on the heap and store them in a container of adequate smart pointers, e.g. boost::shared_ptr (see the example in #Space_C0wb0y's answer).
Another possibility is to use the boost::ptr_vector in association with boost::ptr_vector_owner; this last class takes care of "owning" the objects stored in the associated ptr_vector, and deleting all the pointers when it goes out of scope. For more information on ptr_vector and ptr_vector_owner, you may want to have a look at this article.
To achieve your goal, you should use a shared_ptr:
void DoStuff()
{
boost::shared_ptr<SimpleObj> so(new SimpleObj("Data", 4));
memobj.Add(so);
}
//In memobj
void Add(boost::shared_ptr<SimpleObj> so)
{
memVec.push_back(so); // std::vector<boost::shared_ptr<SimpleObj> > memVec;
}
Yes, your so object will be popped off the stack once your function leaves scope. You should create a heap object using new and add a pointer to that to your vector.
As said before, the pointer in your vector will point to something undefined once your first function goes out of scope
That code won't compile, because inside the Add function you're trying to push a whole object into a vector that expects a pointer to an object.
If instead you were to take the address of that object and push that onto the vector, then it would be dangerous, as the original object would soon be popped off the stack and the pointer you stored would be pointing to memory that is no longer valid.
If you were using a normal vector instead of a pointer vector then the push_back call would be copying the whole object rather than the pointer and thus it would be safe. However, this is not necessarily efficient, and the 'copy everything' approach is probably not intuitive to someone from the C#, Java, or Python worlds.
In order to store a pointer to an object it must be created with new. Otherwise it will disappear when going out of scope.
The easiest solution to your problem as presented here would be to use std::vector instead of boost::ptr_vector, because this way the push_back would copy the object in the vector
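A sketch of that copying approach with a plain std::vector (SimpleObj here is a guess at the original class, for illustration only):
#include <string>
#include <utility>
#include <vector>

struct SimpleObj {
    std::string name;
    int count;
    SimpleObj(std::string n, int c) : name(std::move(n)), count(c) {}
};

struct MemObj {
    std::vector<SimpleObj> memVec;          // stores copies, not pointers
    void Add(const SimpleObj& so) { memVec.push_back(so); }
};

MemObj memobj;

void DoStuff() {
    SimpleObj so("Data", 4);
    memobj.Add(so);     // a copy of 'so' goes into memVec
}                       // 'so' is destroyed here; the copy in memVec is unaffected

int main() { DoStuff(); }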