C++: vector implementation and dynamic memory allocation - c++

I was looking to understand how vector is implemented in C++. There was a previous question that asked this, and so I took a look at it, and I have a small question. Assuming the implementation in the linked question is correct, let's look at this code:
int main(){
Vector<int> test2 = test_Vector();
cout << test2[0] << endl;
return 0;
}
// below is NOT the STL vector object, but the one in the linked question,
// in which the asker tries to implement STL vector himself/herself
Vector<int> test_Vector(){
Vector<int> test;
test.push_back(5);
return test;
}
As I understand it, the test Vector object is created locally, so when the test_Vector method returns, the local object goes out of scope, thereby calling the destructor and delete-ing the dynamic array. Since the code actually works and 5 is printed, I guess I'm wrong. What's the right explanation?

You are right, but you're missing one important thing.
Because you're returning a Vector<int>, you should think of it as being copied. This would normally invoke the copy constructor, which copies test into a new instance of Vector<int>. The copy constructor is implemented in the linked question as:
template<class T>
Vector<T>::Vector(const Vector<T> & v)
{
my_size = v.my_size;
my_capacity = v.my_capacity;
buffer = new T[my_size];
for (int i = 0; i < my_size; i++)
buffer[i] = v.buffer[i];
}
Note that the copy constructor might not be invoked due to Return Value Optimization (see hair-splitting in comments below). Compilers are allowed to optimize the copy away in many cases, and the C++ standard allows for the fact that this optimization may change program behaviour.
Whether the object is copied, or RVO is applied, you should end up with the same thing. The optimization should not ruin your object provided you follow normal object-oriented practices.
You should always think of function return values being passed by value (ie copied) regardless of type, and then consider that your compiler is probably doing RVO. It's important not to forget the Rule of Three (or Four, or Five).

He provided a public copy constructor (technically, he didn't make the copy constructor private), so the standard C++ logic kicks in and makes a copy of the object to return. The new object is local to main. Inside the copy constructor, new memory is malloced and the data copied over. If he hadn't provided a copy constructor, it would have accessed invalid memory (trying to access the memory from the pointer which has been freed)

When test is returned its copy constructor is invoked (in theory) which allocates a new chunk of memory and copies the contents of test. The original test on test_Vectors stack is destructed (in theory) and test2 gets assigned the copy, which means the assignment operator will be invoked (in theory) which will again allocate memory and copy the data from the temporary object that was created when returning. The temporary object returned from test_Vector then has its destructor called (in theory).
As you can see, without compiler optimizations this is hell :) The "in theory"s can be skipped if you don't do anything too baroque and the compiler is smart and your situation is this simple. See this for some updates to the story as of C++11.

Related

Why are there C++ algorithms defined specially for uninitialized memory?

Why do we need special algorithms to write to uninitialized (but allocated) memory? Won't the normal modifying algorithms do? Or does uninitialized memory mean something different from what the name itself conveys?
Take std::copy and std::uninitialized_copy for a range of std::strings.
Regular copy will assume there already exists a string there. The copy assignment operator of string will try to use any existing space in the string if possible for the copy.
However, if there wasn't already a string there, as in the case of an uninitialized memory, the copy assignment operator will access garbage memory and behavior is undefined.
Uninitialized copy on the other hand will create the string there instead of assigning to it, so it can be used in a memory that does not already have a string in it.
Essentially, the regular versions will have a *it = value; in them, and uninitialized versions will have something like a new (&(*it)) T(value);.
It is essentially about the object lifecycle.
After memory is allocated, it must be initialized by running the class' constructor. When the object is finished with, the class' destructor must be run.
The standard algorithms assume they are always accessing initialized memory and so objects can be created, copied, swapped and moved and deleted etc... based on that assumption.
When dealing with uninitialized memory however, the algorithms have to make sure they do not run a destructor on memory that was never initialized with the constructor. They have to avoid moving and swapping with non-existent objects by initializing the memory first when needed etc...
They have to deal with the extra step in the object lifecycle (initialization) that is unnecessary with already initialized memory.
It's the difference between construction and assignment:
struct foo { /* whatever */ };
foo f;
unsigned char buf[sizeof foo];
foo *foo_ptr = (foo*) buf;
*foo_ptr = f; // undefined behavior; *foo_ptr does not point at a valid object
new (foo_ptr) foo; // okay; initializes raw memory
*foo_ptr = f; // okay; assignment to an existing object

Fundamenta data type assignment operator with temporary objects

Hello so i am trying to understand what is going on under the hood.
int* puter;
puter = std::make_unique<int>(50).get();
if(puter) { std::cout << *puter << std::endl; }
At first i thought the puter should be a dangling ptr because temporary unique_ptr from make_unique would be destroyed along with the allocated resource. But that is not the case.
After a while i could understand that it might not happen if the unique_ptr resource was move assigned to puter.
But the real question is. When should i assume things will get move asigned from temporary? Is there a rule to it? Should i just use release method for assurance in such cases?
As others have mentioned, this actually triggers undefined behaviors.
Your original thought is exactly what happened
At first i thought the puter should be a dangling ptr because temporary unique_ptr from make_unique would be destroyed along with the allocated resource
Thus,
if(puter) { std::cout << *puter << std::endl; }
causes undefined behaviors -- You don't know what is there, different compilers and machines can have different results(e.g. can be 50, can be 0, can be nullptr, or some garbage values...)
But the real question is. When should i assume things will get move asigned from temporary? Is there a rule to it? Should i just use release method for assurance in such cases?
It depends if the type/class support move semantics(i.e. have move constructor/move assignment operator defined). If not, it's the same as copying.
In this case, int* is a primitive type, and primitive types do not have move constructor/assignment, so assign a temporary will not be move assign.
On the other hand, if you assign a temporary std::vector to another std::vector, move assignment will be called because std::vector class has a well-defined move assignment operator(and a move constructor). e.g.
std::vector<int> v;
v = std::vector<int>{1,2,3}; //RHS is a temporary object, so it will call the vector's move assignment operator

bit by bit definition (c++ shallow copy)

so I have an exam soon, and glancing through my notes, the teacher says that a shallow copy is defined as a bit by bit copy. I know all about shallow and deep copies, yet I have no idea what bit by bit copy is supposed to mean. Isn't all computer data stored as bits? Could this definition imply that during a shallow copy, a bitstream is implemented when copying the data? Anybody know stuff about this "bit by bit" terminology? Thanks
Say you have two variables MyObj a, b;. If a = b performs a shallow copy, then the bits in the variable b will now be the same as the bits in the variable a. In particular, if MyObj contains any pointers or references, they are the same in both a and b. The objects which are pointed or referred to are not copied.
Bit by bit copy / Shallow copy
Take for example a pointer pointing to a chunk of data:
int* my_ints = new int[1000];
my_ints point to the start of an area of memory which spans a thousand ints.
When you do
int* his_ints = my_ints;
the value of my_ints is copied to his_ints, i.e. the bits of my_ints is copied to his_ints. This means that his_ints also points to the start of the same area of memory which my_ints also points. Therefore by doing
his_ints[0] = 42;
my_ints[0] will also be 42 because they both point to the same data. That is what your professor is most probably referred to as "bit by bit" copying, which is also commonly called as "shallow copy". This is mostly encountered when copying pointers and references (you can't technically copy references, but you can bind a reference to a variable bound to another reference).
Deep copy
Now, you may not want to have the bit by bit copy behavior. For example, if you want a copy and you want to modify that copy without modifying the source. For this, you do a deep copy.
int* my_ints = new int[1000];
int* his_ints = new int[1000];
std::copy(my_ints, my_ints + 1000, his_ints);
std::copy there copies the ints in the area of memory pointed to by my_ints into the area of memory pointed to by his_ints. Now if you do
my_ints[0] = 42;
his_ints[0] = 90;
my_ints[0] and his_ints[0] will now have different values, as they now point to their respective and different areas of memory.
How does this matter in C++?
When you have your C++ class, you should properly define its constructor. The one constructor that is relevant with the topic is the copy constructor (note: this also applies to the copy assignment operator). If you only let the compiler generate the default copy constructor, it simply does shallow copies of data, including pointers that may point to areas of memory.
class Example {
public:
int* data;
Example() {
data = new int[1000];
}
Example(const Example&) = default; // Using C++11 default specifier
~Example() {
delete[] data;
}
};
// Example usage
{
Example x;
Example y = x; // Shallow copies x into y
assert(x.data == y.data);
// y's destructor will be called, doing delete[] data.
// x's destructor will be called, doing delete[] data;
// This is problematic, as you are doing delete[] twice on the same data.
}
To solve the problem, you must perform a deep copy of the data. For this to be done, you must define the copy constructor yourself.
Example(const Example& rhs) {
data = new int[1000];
std::copy(data, data + 1000, rhs.data);
}
For more info and better explanation, please see What is The Rule of Three?.
I would define a bit-by-bit copy as the transfer of the information allocated to an object as an unstructured block of memory. In the case of simple structs this is easy to imagine.
What are the contents of the source struct? Are they initialized? What are its relationships to other objects? All unimportant.
In some sense, a bit-by-bit copy is like a shallow copy in that like a shallow copy a bit-by-bit copy will not duplicate related objects, but that's because it doesn't even consider object relationships.
For example C++ defines a trivial copy constructor as
A trivial copy constructor is a constructor that creates a bytewise copy of the object representation of the argument, and performs no other action. Objects with trivial copy constructors can be copied by copying their object representations manually, e.g. with std::memmove. All data types compatible with the C language (POD types) are trivially copyable.
In contrast, shallow copy and its counter-part deep copy exist as a concept precisely because of the question of object relationships.

Local variable deletes memory of another variable when going out of scope [duplicate]

This question already has answers here:
What is The Rule of Three?
(8 answers)
Closed 9 years ago.
While designing a class that dynamically allocates memory I ran into the following problem regarding memory allocation. I was hoping that some of you might be able to point me in the right direction as to how I should design my class in a better way. My class dynamically allocates memory and therefore also deletes it in its destructor.
In order to illustrate the problem, consider the following silly class declaration:
class testClass{
int* data;
public:
testClass(){
data = new int;
*data = 5;
}
~testClass(){
delete data;
}
};
So far so good. Now suppose that I create one of these objects in main
int main(){
testClass myObject;
return 0;
}
Still no issues of course. However, suppose that I now write a function that takes a testClass object as an input and call this from main.
void doNoting(testClass copyOfMyObject){
//do nothing
}
int main(){
testClass myObject;
doNothing(myObject);
return 0;
}
This time around, the function creates a local variable, copyOfMyObject, that's simply a copy of myObject. Then when the end of that function is reached, that local object automatically has its destructor called which deletes the memory pointed to by its data pointer. However, since this is the same memory pointed to by myObject's data pointer, myObject inadvertently has its memory deleted in the process. My question is: what is a better way to design my class?
When you call doNothing(), it is making a copy of your testClass object, because it is being passed by value. Unfortunately, when this copy is destroyed, it calls the destructor, which deletes the same data used by the original instance of testClass.
You want to learn about "copy constructors", and "passing by reference". That is, you should define a copy constructor for your class so that when a copy is made of an instance, it allocates its own memory for its data member. Also, rather than passing by value, you could pass a pointer or a reference to doNothing(), so that no copy is made.
You should create a copy constructor, that is a constructor of the form:
testClass::testClass(const testClass &o)
{
// appropriate initialization here
}
In your case, "appropriate initialization" might mean allocate a new chunk of memory and copy the memory from the old chunk into the new chunk. Or it may mean doing reference counting. Or whatever.
You should also read more about the Rule of Three right here on StackOverflow!
Here's a guideline from an authority: A class with any of {destructor, assignment operator, copy constructor} generally needs all 3
You need a copy constructor that will make a new allocated int for your data, that will then destruct that, but not affect the original.
Alternately, you can make a private copy constructor that's blank, which effectively disables it, forcing your users to pass by reference, or another non-copying way of doing things.

Do containers (sets, vectors, etc) need to be created using the new keyword in order persist across functions?

I've been working on a school assignment that makes pretty heavy use of vectors, sets, stacks and queues.
What is the difference between Foo and Bar, especially when passing Foo and Bar between functions? When would it be safe to call delete on Bar, if ever? I'm guess it's never safe to call delete on Bar unless everything in that vector has been moved? If I return Foo, wont it (and its contents) be deleted when the function exits?
vector<Line *> Foo;
and:
vector<Line *> * Bar = new vector<Line *>();
Similary, lets say I have a function:
vector<Line *> BSTIndex::query(string expression)
{
vector<Line *> result; //Holds the lines that match the expression
string query = expression;
queue<string> * output = expressionParser(query);
doSomeStuffWithOutputHere();
return result;
}
And my expressionParser is:
queue<string> * BSTIndex::expressionParser(string query)
{
char * cQuery = new char[100];
strcpy_s(cQuery, 100,query.c_str());
//strcpy(cQuery, query.c_str());
queue<string> * output = new queue<string>(); //Operators go in the queue
stack<string> * s = new stack<string>(); //Operands go on the stack
performSomeMagicOnQueueAndStackHere();
return output;
}
The stack is actually only local to expressionParser so I KNOW I can remove the new keyword from that. The queue however, needs to go back to the query function where it's used but then that's it. Do I HAVE To create a pointer to the queue in this case (I want to say yes because it's going to fall out of scope when expressionParser returns). If I need to create the pointer, then I should call a delete output in my query function to properly get rid of the queue?
My last concern is the vector being returned by query. Should that be left as I have it or should it be a pointer and what's the difference between them? Once I consume that vector (display the information in it) I need it to be removed from memory, however, the pointers contained in the vector are still good and the items being pointed to by them should not be deleted. What's the best approach in this case?
What happens to the contents of containers when you call delete on the container? If the container held pointers, and those pointers are deleted won't that affect other areas of my program?
SO many questions (direct and indirect) crammed into such a small space!
What is the difference between Foo and Bar.
vector<Line*> Foo;
vector<Line*>* Bar = new vector<Line*>();
Foo is an object of automatic (potentially static (but the distinction is not important here)) storage duration. This means it is created at the point of declaration (constructors called) and destroyed when it goes out of scope (destructors called).
Bar on the other hand is a pointer. Pointer have no constructors or destructors and are used to point at other objects (or NULL). Here Bar is initialized to point at an object of dynamic storage duration. This is an object that has to be manually released (otherwise it will leak).
especially when passing Foo and Bar between functions?
When passing Foo (by value) to/from functions the copy constructor is used to make a copy of the original object. Note: The standard explicitly states that copying from a function (via return) can be elided (look up RVO/NRVO) and all modern compilers will remove the extra copy construction and build in place at the return site (but this is an optimization invisible to the user just think of it as a very efficient copy out of a function). The resut of this is that when passed into a function (by value) or returned from a function (by return) you are working on a new object (not the original). This is important because of the side effects of using the object will not affect the original. (see your question below about returning Foo).
When passing Bar (by value) the pointer is copied. But this means that what is pointed at is the same. So modifying an object via the pointer is modifying the original value. This makes passing it as a parameter very cheap (because all you are passing is the address of the object). But it makes returning a value potentially dangerous, this is because you could return the address of an object that has gone out of scope inside the function.
Using pointers in inherently dangerous and modern C++ programs rarely use RAW pointers directly. Pointers are usually wrapped up inside objects that manage the ownership and potentially lifespan of the object (see smart pointers and containers).
When would it be safe to call delete on Bar, if ever?
It is only safe to delete Bar if:
It is NULL
The object it points at has been dynamically allocated via new.
I'm guess it's never safe to call delete on Bar unless everything in that vector has been moved?
It would safe to delete Bar even if it's content had not been moved (though you could leak if the contained pointers were owned by Bar (this is not directly unsafe but can be inconvenient when you run out of space)). This is another reason pointers are rarely used directly, there are no ownership semantics associated with a pointer. This means we can not tell if Bar owns the pointers it holds (it is the responsibility of the owner to call delete on a pointer when it is no longer used).
If I return Foo, wont it (and its contents) be deleted when the function exits?
Yes. But because you return an object it will be copied out of the function. So what you use outside the function is a copy of Foo (Though optimizers may elide the copy because of RVO/NRVO. Not if the copy is elided the destructor is also elided).
Similary, lets say I have a function: query and expressionParser:
vector<Line *> BSTIndex::query(string expression)
{
vector<Line *> result; //Holds the lines that match the expression
string query = expression;
queue<string> * output = expressionParser(query);
doSomeStuffWithOutputHere();
return result;
}
queue<string> * BSTIndex::expressionParser(string query)
{
char* cQuery = new char[100];
strcpy_s(cQuery, 100,query.c_str());
queue<string> * output = new queue<string>(); //Operators go in the queue
stack<string> * s = new stack<string>(); //Operands go on the stack
performSomeMagicOnQueueAndStackHere();
return output;
}
The stack is actually only local to expressionParser so I KNOW I can remove the new keyword from that.
Not only can you but you should. As currently you are leaking this object when it goes out of scope.
The queue however, needs to go back to the query function where it's used but then that's it. Do I HAVE To create a pointer to the queue in this case (I want to say yes because it's going to fall out of scope when expressionParser returns).
No. You can pass the obejct back by value. It will be correctly copied out of the function. If you find that this is expensive (unlikely) then you could create a dynamic object with new and pass it back as a pointer (but you may want to look at smart pointers). Remember that pointers do not indicate that you are the owner so it is unclear who should delete a pointer (or even if it should be deleted). So (if passing a value is too expensive) use a std::auto_ptr<> and pass the pointer back inside it.
If I need to create the pointer, then I should call a delete output in my query function to properly get rid of the queue?
Yes. And No. If you create a dynamic object with new. Then somebody must call delete on it. It is bad C++ style to do this manually. Learn to use smart pointers so that it is done automatically and in an exception safe manor.
My last concern is the vector being returned by query. Should that be left as I have it or should it be a pointer and what's the difference between them?
I would do it like this:
vector<Line> BSTIndex::query(string const& expression)
{
vector<Line> result; // 1 Keep line objects not pointers.
queue<string> output = expressionParser(expression);
doSomeStuffWithOutputHere();
return result;
}
queue<string> BSTIndex::expressionParser(string const& query)
{
char cQuery[100]; // Don't dynamically allocate
// unless somebody takes ownership.
// It would probably be better to use string or vector
// depending on what you want to do.
strcpy_s(cQuery, 100, query.c_str()); // std::string can use the assignment operator.
queue<string> output; // Just build the que and use it.
stack<string> s; // Unless there is a need. Just use a local one.
performSomeMagicOnQueueAndStackHere();
return output;
}
Once I consume that vector (display the information in it) I need it to be removed from memory, however, the pointers contained in the vector are still good and the items being pointed to by them should not be deleted. What's the best approach in this case?
Depends who owns the pointers.
What happens to the contents of containers when you call delete on the container?
If the content of the containers are pointers. Then nothing. The pointers disappear (what they point at is unchanged and unaffected). If the container contains objects (not pointers) then the destructor is called on the objects.
If the container held pointers, and those pointers are deleted won't that affect other areas of my program?
It would. But you would have to call delete manually.
For the most part, what you normally want to do is just create an instance of the object in question, and when you need to return it, return it.
For something like a large container, that can (and often does) sound like it'd be horribly inefficient. In reality, however, it's usually quite efficient. The C++ standard has a couple of clauses to allow what are called Return Value Optimization (RVO) and Named Return Value Optimization (NRVO). Initially, what's in the standard doesn't sound very important -- it just says the compiler doesn't have to call the copy constructor for a returned value, even if the copy constructor has side effects so you can see that it's been omitted.
What this means in reality, however, is that the compiler (which is to say essentially all reasonably current compilers) will actually only create one collection at the highest level in the function call hierarchy where it gets used. When you end up returning the object of that type from some other function, instead of creating a local copy, and then copying it for the return, it'll just generate code in the called function that works directly in the collection in the calling function, so the return basically becomes a nop. As a result, instead of the horribly inefficient copying that it looks like, you end up with a really fast return with no copying at all.
(1) The stack is actually only local to expressionParser so I KNOW I can remove the new keyword
correct
(2) Do I HAVE To create a pointer to the queue in this case. If I need to create the pointer, then I should call a delete output in my query function to properly get rid of the queue?
correct. Here, "creating pointer" doesn't mean again using new and copy all contents. Just assign a pointer to the already created queue<string>. Simply delete output; when you are done with it in your query().
(3) vector being returned by query. Should that be left as I have it or should it be a pointer and what's the difference between them?
I would suggest either you pass the vector<Line*> by non-const reference to your query() and use it or return a vector<Line*>* from your function. Because difference is, if the vector is huge and return by value then it might copy all its content (considering the worst case without optimization)
(4) Once I consume that vector (display the information in it) I need it to be removed from memory, however, the pointers contained in the vector are still good and the items being pointed to by them should not be deleted. What's the best approach in this case?
Pass the vector<Line*> into the function by reference. And retain it in scope until you need it. Once you are done with pointer members 'Line*inside it, justdelete` them
(5) What happens to the contents of containers when you call delete on the container? If the container held pointers?
Nothing happens to the pointer members of the container when it's deleted. You have to explicitly call delete on every member. By the way in your case, you can't delete your vector<Line*> because it's an object not a pointer & when you delete output;, you don't have to worry about contents of queue<> because they are not pointers.
(6) Those pointers are deleted won't that affect other areas of my program?
Deleting a valid pointer (i.e. allocated on heap) doesn't affect your program in any way.
You got a logic problem here - a std::vector (or any container for that matter) of pointers will, at destruction, only kill those pointers, not what they point to. So the following is perfectly fine:
#include <vector>
#include <iostream>
std::vector<int*> GetIntPtrVec(){
std::vector<int*> ret(10); // prepare for 10 pointer;
for(int i=0; i<10; ++i){
ret[i] = new int(i);
}
}
int main(){
std::vector<int*> myvec = GetIntPtrVec(); // copies the return into myvec
// ^^^^^^^^^^^^^^^^^^^^^^
for(int i=0; i<myvec.size(); ++i){
// display and delete again
std::cout << myvec[i] << "\n";
delete myvec[i];
}
std::cin.get();
}
When returning a vector from a function, it gets copied into the receiving variable (marked by the ^ above).
Then for the pointer-to-vector approach, of course you need to delete that pointer again sometime - when you can be sure that nobody uses it anymore. But that same problem also applies for the first approach - when are you gonna delete the pointers inside the returned vector? You have to make sure that there are no dangling pointers, e.g. pointer, which point to your now deleted memory.