C++ Any benefits using pointer to vector of pointers? - c++

Before I start, I have already looked at these questions:
Memory consumption of a pointer to vector of pointers
Pointer to vector vs vector of pointers vs pointer to vector of pointers
Neither of them answers my question.
I am working on a project that involves millions of instances.
The data structure I am using is a bit complicated to look at for the first time, the structure itself isn't really of any importance for this question.
Now, if I had the following:
vector<object*> *myVector;
Everything goes to the heap.
However, what happens if I used this:
vector<object*> myVector;
Does the vector itself get stored on the stack? And its contents are then stored on the heap? For example, if the vector had 10 million instances, would I have 10 million references on the stack for 10 million objects on the heap? Or am I missing something?
Moreover, if I used this:
vector<object> *myVector;
Where do the contents of the vector go? Does everything go on the heap? If yes, then does it work recursively? For example:
vector<vector<vector<int>>> *myVector;
Are all the inner vectors then stored on the heap? If yes, is it enough to delete myVector in the destructor, without worrying about anything else?
Anyway, I am assuming all the answers that apply to vector would also apply to unordered_map. Please tell me if I am wrong.
Thanks in advance.

In general, the elements of a std::vector are always dynamically allocated on the heap. In most implementations, a local std::vector<T> foo (where T may or may not be a pointer type) consists of three pointers on the stack that describe the vector itself, not the elements. For a std::vector<T> *bar, there is one pointer, bar, on the stack. The three vector pointers as well as the vector elements are on the heap.
Similarly, for a std::unordered_map, the elements themselves are always dynamically allocated on the heap. Generally, any std type that has an Allocator template parameter, or any type that contains a variable amount of data, will have its data dynamically allocated on the heap.
Now there are special cases, like custom allocators, weird optimizations, specifics of terminology, but for practical purposes, this is what you should know.
The next things you might want to learn about are smart pointers (e.g. std::unique_ptr and std::shared_ptr), ownership semantics, and RAII. That is actually more important to know in order to write correct code (What do I need to delete and worry about?).
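A minimal sketch of the layout described above, assuming a mainstream standard library implementation. The helper names (points_inside_handle, handle_size_is_fixed, elements_live_outside_handle) are made up for illustration. It checks that the vector object itself has a fixed size no matter how many elements it manages, and that the element storage is a separate allocation rather than part of the vector object.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true if address p lies inside the bytes of the vector object itself
// (the small "handle"), as opposed to the separately allocated element array.
template <typename T>
bool points_inside_handle(const std::vector<T>& v, const void* p) {
    const std::uintptr_t begin = reinterpret_cast<std::uintptr_t>(&v);
    const std::uintptr_t end = begin + sizeof(v);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    return addr >= begin && addr < end;
}

// The handle has the same size whether it manages ten elements or a million.
bool handle_size_is_fixed() {
    std::vector<int> small(10);
    std::vector<int> large(1000000);
    return sizeof(small) == sizeof(large);
}

// The elements are not stored inside the handle: data() points elsewhere.
bool elements_live_outside_handle() {
    std::vector<int> v(1000);
    assert(v.data() != nullptr);  // a non-empty vector always has storage
    return !points_inside_handle(v, v.data());
}
```

So a local vector holding 10 million pointers still only puts a few machine words on the stack; the 10 million pointers sit in one heap block.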

Related

C++ When to use pointer to vector?

I have a problem about pointer and standard library use.
Let's create a new class
class Graph
{
std::vector<Edge> *edge_list;
//another way is
//std::vector<Edge> edge_list;
};
I have already thought of two reasons to use a pointer:
It's easy to manipulate the memory using new and delete
It can be passed by parameters easily.
However, we can pass by reference if we use a vector, so reason 2 doesn't count.
So, is it true that if I am not strict about memory allocation, I don't need to use a pointer to a vector or to other std containers?
A typical implementation of std::vector contains three pointers:
The beginning of the allocated array
One past the last element currently in use
One past the end of the allocated array
Essentially, a default-constructed vector typically has no space allocated on the heap, but this changes as you add elements.
Note that std::vector manages the memory it uses, so there is no need for you to worry about new and delete (unnecessary complexity). As soon as it goes out of scope, its destructor releases the heap memory it owns, and the vector object itself goes away with its enclosing scope or object.
As you said, a vector can be passed very easily by reference, which works the same way as a pointer at the machine-code level, and it is clearer.
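To illustrate the answer, here is a rough sketch of the Graph class with the vector as a plain member and a function that receives it by reference. The Edge fields, the add_edge and edges members, and count_self_loops are hypothetical names chosen for this example; they are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's Edge type.
struct Edge {
    int from;
    int to;
};

class Graph {
public:
    void add_edge(int from, int to) {
        assert(from >= 0 && to >= 0);  // assumption: vertex ids are non-negative
        edge_list_.push_back(Edge{from, to});
    }
    // Read-only access without copying the vector.
    const std::vector<Edge>& edges() const { return edge_list_; }

private:
    std::vector<Edge> edge_list_;  // plain member: no new, no delete, no destructor
};

// Takes the vector by const reference: no copy, and no null case to handle.
std::size_t count_self_loops(const std::vector<Edge>& edges) {
    std::size_t n = 0;
    for (const Edge& e : edges) {
        if (e.from == e.to) ++n;
    }
    return n;
}

// Builds a small graph and passes its edge list by reference.
std::size_t demo_graph() {
    Graph g;
    g.add_edge(0, 1);
    g.add_edge(2, 2);
    g.add_edge(1, 0);
    return count_self_loops(g.edges());
}
```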

Is it better to use a vector containing pointers, or a vector of values (to avoid heap fragmentation)?

I'm aware of the many articles on avoiding heap fragmentation. My question has to do with specifically what happens when we use a vector to store data:
class foo{
public:
std::vector<bar> bars; // Or I can have std::vector<bar*> bars;
};
I have no problem using new or delete (I don't have memory leaks, and it is very clear to me when a bar is not used and I can call delete when necessary). But with regard to heap fragmentation, which is better? Does this make a stack overflow more likely?
EDIT: I don't actually have any new information to add. I just wanted to thank everyone, and say that I've found that any question that is tagged with C++ seems to attract so many knowledgeable, helpful people. It's really quite nice. Thank you.
TL;DR, unless you have a good reason, with sound measurements, always favour using a vector with value types.
I would argue for the std::vector<bar> form. Unless you have other (measured) reasons why, the guaranteed contiguous memory of vector with a value type is a better idea. When the vector allocates a contiguous block of memory, the objects will be laid out in memory next to each other (with some reserve). This allows the host system and runtime a better chance to avoid fragmentation. Allocating the objects individually may result in fragmentation since you do not control where they are allocated.
I note your concerns around the stack; vector has a small stack allocation so this shouldn't be a problem, the "usual" allocator std::allocator uses new to allocate memory for the vector on the heap.
Why should a vector always be favoured? Herb Sutter goes into some detail about this; http://channel9.msdn.com/Events/Build/2014/2-661 with supporting graphs, diagrams, explanations etc. from around the 23:30 mark and he picks up Bjarne's material at around the 46:00 mark.
std::vector guarantees that its contents are stored in contiguous memory. If you store objects in your vector, the objects sit next to each other in one big chunk of memory, but if you put pointers in the vector that point to objects created with new, each object can end up anywhere in memory.
So the first version (storing objects, not pointers) is better for both avoiding heap fragmentation and utilizing the cache.
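The contiguity claim can be checked directly. The sketch below uses a made-up Bar with two doubles and hypothetical helper names; it verifies that elements stored by value sit back to back, and contrasts a traversal over values with one over individually allocated objects. It demonstrates the layout only and makes no timing claims.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical element type for illustration.
struct Bar {
    double x;
    double y;
};

// True if every element sits directly after the previous one in memory.
bool stored_back_to_back(const std::vector<Bar>& v) {
    for (std::size_t i = 0; i + 1 < v.size(); ++i) {
        if (&v[i] + 1 != &v[i + 1]) return false;
    }
    return true;
}

// Sum over values: one linear walk through a single block of memory.
double sum_x(const std::vector<Bar>& v) {
    double total = 0.0;
    for (const Bar& b : v) total += b.x;
    return total;
}

// Same sum over individually allocated objects: each access follows a
// pointer to wherever the allocator happened to place that object.
double sum_x(const std::vector<std::unique_ptr<Bar>>& v) {
    double total = 0.0;
    for (const std::unique_ptr<Bar>& p : v) {
        assert(p != nullptr);  // a null entry would be a bug in this sketch
        total += p->x;
    }
    return total;
}
```

Both overloads compute the same result; the difference is only in how many separate allocations exist and how scattered the data can be.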
OP seems to have a basic misunderstanding about what std::vector does: it allocates a contiguous block of memory on the heap, so you do not need to worry about your stack space when using vector! So whenever you do not really need to store anything but the objects themselves, just put them in your vector and you are good.
So std::vector<bar> appears to be the right choice in your case. Also, don't worry about heap fragmentation prematurely; let the allocator worry about that. (But you should worry about feeding the prefetcher, which is why std::vector<bar> should be your standard choice for storing anything.)
Use a vector of pointers in the following cases:
Bar is a base class and you want to store instances of derived classes in the vector.
Bar doesn't have an appropriate copy/move constructor or it is prohibitively expensive.
You want to store pointers to objects in the vector elsewhere, and you may change the capacity of the vector (for example, due to resize or push_back). If the objects are stored by value, old pointers will become invalid when the capacity changes.
You want some elements in the vector to be "empty", (which can be represented by null pointers), and it's not practical to store dummy, unused instances of Bar in the vector by value instead.
If you do need pointers, I strongly suggest using smart pointers.
In other cases, you should prefer to store objects by value, as it is usually more efficient.
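Two of the cases above (derived classes, and pointers that must survive a capacity change) can be sketched with smart pointers as the answer suggests. This assumes C++11 and std::unique_ptr; Shape, Circle, Square and the function names are invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical class hierarchy for illustration.
struct Shape {
    virtual ~Shape() {}
    virtual std::string name() const = 0;
};
struct Circle : Shape {
    std::string name() const override { return "circle"; }
};
struct Square : Shape {
    std::string name() const override { return "square"; }
};

// An owning, polymorphic collection: storing Shape by value is impossible
// (it is abstract), so the vector stores owning pointers instead.
std::vector<std::unique_ptr<Shape>> make_shapes() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Circle));
    shapes.push_back(std::unique_ptr<Shape>(new Square));
    return shapes;
}

// A pointer to the pointed-to object stays valid while the vector grows,
// because reallocation moves only the unique_ptr handles, not the objects.
bool pointee_survives_growth() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Circle));
    const Shape* first = shapes.front().get();
    assert(first != nullptr);
    for (int i = 0; i < 1000; ++i) {
        shapes.push_back(std::unique_ptr<Shape>(new Square));  // forces reallocations
    }
    return first == shapes.front().get() && first->name() == "circle";
}
```

No delete appears anywhere: every object is destroyed when the vector that owns it is destroyed.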

vector<X*> vec vs vector<X>* vec

What is the difference in memory usage between:
std::vector<X*> vec
where each element is on the heap, but the vector itself isn't
and
std::vector<X>* vec
where the vector is declared on the heap, but each element is (on the stack?).
The second option doesn't make much sense- does it mean the vector pointer is on the heap, but it points back at each element, which are on the stack??
std::vector<X*> vec
is a vector of pointers to objects of class X. This is useful, for example, when making an array of non-copyable classes/objects like std::fstream in C++98. So
std::vector<std::fstream> vec;
is WRONG in C++98 and won't work, because stream objects cannot be copied there. But
std::vector<std::fstream*> vec;
works, while you have to create a new object for each element, so for example if you want 5 fstream elements you have to write something like
vec.resize(5);
for(unsigned long i = 0; i < vec.size(); i++)
{
vec[i] = new std::fstream;
}
Of course, there are many other uses depending on your application.
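For comparison, since C++11 stream objects are movable, so the workaround above is only needed for C++98. A small sketch, assuming a C++11 or later compiler and standard library (count_closed_streams is a made-up name):

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <vector>

// Stores stream objects by value; no new and no delete are involved.
std::size_t count_closed_streams() {
    std::vector<std::fstream> files(5);  // five default-constructed, unopened streams
    files.emplace_back();                // growing the vector only needs moves, not copies
    assert(files.size() == 6);

    std::size_t closed = 0;
    for (const std::fstream& f : files) {
        if (!f.is_open()) ++closed;      // none of them was ever opened
    }
    return closed;
}
```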
Now the second case is a pointer of the vector itself. So:
vector<int>* vec;
is just a pointer! It doesn't carry any information, and you can't use it until you create the vector object itself, like
vec = new vector<int>();
and eventually you may use it as:
vec->resize(5);
Now this is not really useful, since vectors anyway store their data on the heap and manage the memory they carry. So use it only if you have a good reason to do it, and sometimes you would need it. I don't have any example in mind on how it could be useful.
If this is what you really asked:
vector<X>* vec = new vector<X>();
it means that the whole vector with all its elements is on the heap. The elements occupy a contiguous memory block on the heap.
The difference lies in where, and for what, you need to do manual memory management.
Whenever you have a raw C-style pointer in C++, you need to do some manual memory management -- the raw pointer can point at anything, and the compiler won't do any automatic construction or destruction for you. So you need to be aware of where the pointer points and who 'owns' the memory pointed at in the rest of your code.
So when you have
std::vector<X*> vec;
you don't need to worry about memory management for the vector itself (the compiler will do it for you), but you do need to worry about the memory management of the pointed at X objects for the pointers you put in the vector. If you're allocating them with new, you need to make sure to manually delete them at some point.
When you have
std::vector<X> *vec;
You DO need to worry about memory management for the vector itself, but you DON'T need to worry about memory management for the individual elements.
Simplest is if you have:
std::vector<X> vec;
then you don't need to worry about memory management at all -- the compiler will take care of it for you.
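The three cases can be made visible with an instance counter. In this sketch X::live is a counter added purely for illustration, and the function names are hypothetical; each function reports how many X objects are still alive at the point of interest.

```cpp
#include <cassert>
#include <vector>

// X counts its live instances so that leaks become observable.
struct X {
    static int live;
    X() { ++live; }
    X(const X&) { ++live; }
    ~X() { --live; }
};
int X::live = 0;

// std::vector<X*>: the vector frees its own array of pointers,
// but not the objects those pointers refer to.
int alive_after_raw_pointer_vector_dies() {
    const int before = X::live;
    X* p = nullptr;
    {
        std::vector<X*> vec;
        vec.push_back(new X);
        p = vec[0];
    }  // vec is destroyed here; the X it pointed to is not
    const int still_alive = X::live - before;
    delete p;  // the manual cleanup the vector did not do
    return still_alive;
}

// std::vector<X>*: you must delete the vector yourself,
// and doing so destroys every element.
int alive_after_deleting_heap_vector() {
    const int before = X::live;
    std::vector<X>* vec = new std::vector<X>(3);
    assert(X::live - before == 3);
    delete vec;
    return X::live - before;
}

// std::vector<X>: nothing to do by hand.
int alive_after_plain_vector_scope() {
    const int before = X::live;
    {
        std::vector<X> vec(3);
    }
    return X::live - before;
}
```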
In code using good modern C++ style, none of the above is true.
std::vector<X*> is a collection of handles to objects of type X or any of its subclasses, which you do not own. The owner knows how they were allocated and will deallocate them -- you don't know and don't care.
std::vector<X>* would, in practice, only ever be used as a function argument that represents a vector you do not own (the caller does) but which you are going to modify. According to one common approach, the fact that it's a pointer rather than a reference means that it is optional. Much more rarely it might be used as a class member where the lifetime of the attached vector is known to outlive the class pointing to it.
std::vector<std::unique_ptr<X>> is a polymorphic collection of mixed objects of various subclasses of X (and maybe X itself directly). Occasionally you might use it non-polymorphically if X is expensive to move, but modern style makes most types cheap to move.
Prior to C++11, std::vector<some_smart_pointer<X> > (yes, there's a space between the closing brackets) would be used for both the polymorphic case and the non-copyable case. Note that some_smart_pointer isn't std::unique_ptr, which didn't exist yet, nor std::auto_ptr, which wasn't usable in collections. boost::unique_ptr was a good choice. With C++11, the copyability requirement for collection elements is relaxed to moveability, so this reason completely went away. (There remain some types which are neither copyable nor moveable, such as the ScopeGuard pattern, but these should not be stored in a collection anyway.)

Splitting QList into chunks, pointers or references?

I have this application that requires me to have a QList which will contain 1 < x < 10000+ objects. Now I have a few issues.
First of all, should I declare the QList as a pointer or straight on the stack? The objects in the QList are pretty small and are wrappers for QFileInfo. But how should I do this?
A list of objects on the stack?
A list (on stack) of pointers to objects on the heap?
A QList<FileInfoWrapper>*?
Firstly, if I picked solution 2, would my heap be a mess since I just allocate small portions of data all over the place? I don't want that.
Secondly, if I pick the 3rd solution, how would this look in memory when I access the individual objects? And could I create pointers to them (they are after all on the heap)?
Then we come to my other issue. This list will be passed around like a fork at a diner, and at some point I would like to create sublists that don't hold any data, only references/pointers to some of the objects in the list (for example objects 0 to 250). I will then throw these lists into different threads that will need a reference to the objects to be able to edit them (read: not a hard copy).
Also, could someone explain exactly what happens on the heap when you create a list like this:
QList<FileInfoWrapper>* list = new QList<FileInfoWrapper>();
Would it be like in C, where you just create a pointer to the offset where that object will be located?
*(list + sizeof(FileInfoWrapper) * 10)
QList is a container class ... that means that it manages the memory for you so you don't have to worry about it. Its underlying data-structure is a variant of a deque with some special modifications, so your understanding of indexing into the list is not correct. But either way, these are details that are abstracted away by the interface, and you don't need to worry about them. You simply use the given class methods like operator[] or at() to obtain a reference to an object at a given index, and other functions like push_back() or insert() to copy objects into the container. So you can simply make a QList instance on the stack (as long as it doesn't go out-of-scope while it's needed), and copy objects into it. The underlying data-structure will properly allocate the memory needed dynamically to store the objects, and at the time of destruction of the QList object, it will deallocate the memory used to store the objects it "owns".
Think about QList as you would think of an STL container like std::vector or std::list ... again, the underlying data-structure for QList is not the same as these STL containers, but the point is that you can allocate the data-structure on the stack like you would any other class, and it contains all the private data-members and information necessary to manage the memory on the heap. Allocating the QList on the heap through a call to new doesn't gain you anything in that regard ... there are already pointers, etc. inside the data-structure allocating and managing the memory of the contained objects for you.
Finally, don't worry about data-fragmentation. The point of a good container class is to properly allocate memory to avoid memory fragmentation issues from allocating and reallocating memory too often. Additionally, allocating memory takes time, so if a container class were to constantly need to call new, that would really hurt its performance. While allocating memory on every insertion may be a necessity for node-based containers like linked lists and trees, block-type data structures such as hash tables and dynamic arrays are much more efficient at utilizing the memory they allocate, which minimizes these allocation calls.

Pointer to vector vs vector of pointers vs pointer to vector of pointers

Just wondering what you think is the best practice regarding vectors in C++.
If I have a class containing a vector member variable.
When should this vector be declared as a:
"Whole-object" vector member variable containing values, i.e. vector<MyClass> my_vector;
Pointer to a vector, i.e. vector<MyClass>* my_vector;
Vector of pointers, i.e. vector<MyClass*> my_vector;
Pointer to vector of pointers, i.e. vector<MyClass*>* my_vector;
I have a specific example in one of my classes where I have currently declared a vector as case 4, i.e. vector<AnotherClass*>* my_vector;
where AnotherClass is another of the classes I have created.
Then, in the initialization list of my constructor, I create the vector using new:
MyClass::MyClass()
: my_vector(new vector<AnotherClass*>())
{}
In my destructor I do the following:
MyClass::~MyClass()
{
for (int i=my_vector->size(); i>0; i--)
{
delete my_vector->at(i-1);
}
delete my_vector;
}
The elements of the vectors are added in one of the methods of my class.
I cannot know how many objects will be added to my vector in advance. That is decided when the code executes, based on parsing an xml-file.
Is this good practice? Or should the vector instead be declared as one of the other cases 1, 2 or 3 ?
When to use which case?
I know the elements of a vector should be pointers if they are subclasses of another class (polymorphism). But should pointers be used in any other cases ?
Thank you very much!!
Usually solution 1 is what you want since it’s the simplest in C++: you don’t have to take care of managing the memory, C++ does all that for you (for example you wouldn’t need to provide any destructor then).
There are specific cases where this doesn’t work (most notably when working with polymorphous objects) but in general this is the only good way.
Even when working with polymorphous objects or when you need heap allocated objects (for whatever reason), raw pointers are almost never a good idea. Instead, use a smart pointer or a container of smart pointers. Since C++11 the standard library provides std::shared_ptr and std::unique_ptr. If you're using a compiler that doesn't have them yet, you can use the implementation from Boost.
Definitely the first!
You use vector for its automatic memory management. Using a raw pointer to a vector means you don't get automatic memory management anymore, which does not make sense.
As for the value type: all containers basically assume value-like semantics. Again, you'd have to do memory management when using pointers, and it's vector's purpose to do that for you. This is also described in item 79 from the book C++ Coding Standards. If you need to use shared ownership or "weak" links, use the appropriate smart pointer instead.
Deleting all elements in a vector manually is an anti-pattern and violates the RAII idiom in C++. So if you have to store pointers to objects in a vector, better use a 'smart pointer' (for example boost::shared_ptr) to facilitate resource destructions. boost::shared_ptr for example calls delete automatically when the last reference to an object is destroyed.
There is also no need to allocate MyClass::my_vector using new. A simple solution would be:
class MyClass {
std::vector<whatever> m_vector;
};
Assuming whatever is a smart pointer type, there is no extra work to be done. That's it, all resources are automatically destroyed when the lifetime of a MyClass instance ends.
In many cases you can even use a plain std::vector<MyClass> - that's when the objects in the vector are safe to copy.
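As a sketch of that simple solution applied to the question's class, the version below swaps in std::unique_ptr (C++11) where the answer mentions boost::shared_ptr. The live counter, the add and size members, and live_after_scope are additions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Stand-in for the question's AnotherClass, with an instance counter
// added so that destruction can be observed.
struct AnotherClass {
    static int live;
    AnotherClass() { ++live; }
    AnotherClass(const AnotherClass&) = delete;
    AnotherClass& operator=(const AnotherClass&) = delete;
    ~AnotherClass() { --live; }
};
int AnotherClass::live = 0;

class MyClass {
public:
    // Elements are added at run time; the count need not be known in advance.
    void add() {
        my_vector_.push_back(std::unique_ptr<AnotherClass>(new AnotherClass));
    }
    std::size_t size() const { return my_vector_.size(); }

    // No constructor and no destructor: the vector destroys each unique_ptr,
    // and each unique_ptr deletes the object it owns.
private:
    std::vector<std::unique_ptr<AnotherClass>> my_vector_;
};

// Creates a MyClass, fills it with n elements, lets it go out of scope,
// and reports how many AnotherClass objects are still alive afterwards.
int live_after_scope(int n) {
    assert(n >= 0);
    {
        MyClass owner;
        for (int i = 0; i < n; ++i) owner.add();
        assert(owner.size() == static_cast<std::size_t>(n));
    }
    return AnotherClass::live;
}
```

Compared with the original destructor loop, there is nothing left to forget: the cleanup happens even if an exception is thrown while the object is being used.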
In your example, the vector is created when the object is created, and it is destroyed when the object is destroyed. This is exactly the behavior you get when making the vector a normal member of the class.
Also, in your current approach, you will run into problems when making copies of your object. By default, copying the object copies only the pointer (a shallow copy), meaning all copies of the object would share the same vector, and each copy's destructor would try to delete it. This is the reason why, if you manually manage resources, you usually need The Big Three.
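The copying problem can be seen in a few lines. In this sketch SharesVector and OwnsVector are hypothetical types; the pointer version deliberately points at a local vector so that the example itself has nothing to delete.

```cpp
#include <cassert>
#include <vector>

// Pointer member: the compiler-generated copy duplicates only the pointer.
struct SharesVector {
    std::vector<int>* items;
};

// Value member: the compiler-generated copy duplicates the whole vector.
struct OwnsVector {
    std::vector<int> items;
};

// After copying, both objects refer to the very same vector.
bool pointer_member_copy_aliases() {
    std::vector<int> storage(3, 0);
    SharesVector a{&storage};
    SharesVector b = a;
    assert(a.items == b.items);   // same address: nothing was duplicated
    b.items->push_back(42);
    return a.items->size() == 4;  // a sees the change made through b
}

// After copying, each object has its own independent vector.
bool value_member_copy_is_independent() {
    OwnsVector a;
    a.items.assign(3, 0);
    OwnsVector b = a;
    b.items.push_back(42);
    return a.items.size() == 3 && b.items.size() == 4;
}
```

With the value member, the compiler-generated copy constructor, assignment operator, and destructor already do the right thing, so none of the three has to be written by hand.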
A vector of pointers is useful in cases of polymorphic objects, but there are alternatives you should consider:
If the vector owns the objects (that means their lifetime is bounded by that of the vector), you could use a boost::ptr_vector.
If the objects are not owned by the vector, you could either use a vector of boost::shared_ptr, or a vector of boost::reference_wrapper (created with boost::ref).
A pointer to a vector is very rarely useful - a vector is cheap to construct and destruct.
For elements in the vector, there's no correct answer. How often does the vector change? How much does it cost to copy-construct the elements in the vector? Do other containers have references or pointers to the vector elements?
As a rule of thumb, I'd go with no pointers until you see or measure that the copying of your classes is expensive. And of course the case you mentioned, where you store various subclasses of a base class in the vector, will require pointers.
A reference counting smart pointer like boost::shared_ptr will likely be the best choice if your design would otherwise require you to use pointers as vector elements.
Complex answer: it depends.
If your vector is shared or has a lifecycle different from the class that embeds it, it might be better to keep it as a pointer.
If the objects you're referencing have no copy constructors (or expensive ones), then it's better to keep a vector of pointers. Conversely, if your objects use shallow copy, using a vector of objects prevents you from leaking memory.