Using class objects in std::vector - c++

What is better -
std::vector<MyClass*>
or
std::vector<MyClass>
?
I mean, will the second option store objects in heap anyway?
Which one is faster and cleaner?

std::vector<MyClass> is preferable in most cases. Yes, with the default std::allocator it stores its objects on the heap (dynamic storage).
The advantages are that the objects are destroyed automatically when the vector is destroyed, and that they are allocated in a single contiguous memory block, which reduces heap fragmentation. So this way is cleaner.
It is also faster because it minimizes memory allocations. The vector allocates storage before constructing objects, so N objects cost M allocations and N constructor calls, with M < N (the larger N is, the larger the difference). If you create the objects yourself and store pointers to them in the vector, you pay N + X allocations and N constructions, where X is the number of allocations of the vector's own storage. And you can always minimize the vector's allocations with std::vector::reserve() if you know the number of objects in advance.
By contrast, a std::vector of pointers requires you to destroy the objects manually, e.g. by calling delete on dynamically allocated objects. This is not recommended. Such containers should only be used as non-owning; ownership of the objects must then be maintained externally.
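To make the two advantages concrete, here is a minimal sketch. Tracked is a hypothetical element type (not from the question) that counts live instances, and stored_contiguously is an illustrative helper name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how many instances are alive.
struct Tracked {
    static int live;
    int value;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(const Tracked& other) : value(other.value) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Fills a vector by value and reports whether the elements sit back to back.
bool stored_contiguously(std::size_t n) {
    std::vector<Tracked> v;
    v.reserve(n);                                // one storage allocation up front
    for (std::size_t i = 0; i < n; ++i)
        v.emplace_back(static_cast<int>(i));     // constructed in place
    for (std::size_t i = 0; i + 1 < n; ++i)
        if (&v[i] + 1 != &v[i + 1]) return false;
    return true;
}   // v goes out of scope here: every Tracked is destroyed automatically
```

After a call such as stored_contiguously(100), Tracked::live is back at zero without a single delete in user code.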

Yes, the second version will store the objects on the heap too, but in a more compact form, namely an array of MyClass.
In the first form you must allocate and deallocate your objects, while std::vector will do this for you in the second version.
So, as always, it depends on your needs and requirements. If you can choose, take the second form:
std::vector<MyClass>
it's much easier to maintain.

Depends. Storing copies of your class in the container is easier to work with, of course, but a copy constructor has to run every time you save an instance to the container, which may be problematic. Also, your container cannot store anything but that one class; in particular, it cannot store classes that are derived (inherited) from your base class. So, in general, you usually end up having to store pointers to the class.
There are snags with storing pointers. However, to get the best of both worlds, consider using boost::ptr_vector instead, which has the advantages of smart pointers without the overhead.

std::vector<MyClass> will store objects on the heap (probably); std::vector<MyClass*> will store your pointers on the heap (probably) and leave your objects wherever they were created. Use std::vector<MyClass> whenever applicable. If you need to support inheritance, use std::vector<std::unique_ptr<MyClass>>, again when applicable. If your vector isn't supposed to own the objects, a std::vector<MyClass*> can be useful, but that's rarely the case.
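A small sketch of the inheritance case, assuming C++11. Shape, Square and Rect are made-up classes standing in for MyClass and its subclasses.

```cpp
#include <memory>
#include <vector>

struct Shape {                       // hypothetical base class
    virtual ~Shape() = default;      // virtual destructor: deletion through Shape* is safe
    virtual double area() const = 0;
};
struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};
struct Rect : Shape {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

// Stores objects of different derived types in one owning vector.
double total_area() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Square(2.0)));
    shapes.push_back(std::unique_ptr<Shape>(new Rect(1.0, 3.0)));
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();
    return sum;                      // shapes and every owned object are freed here
}
```

The vector owns the objects through unique_ptr, so no manual delete loop is needed.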

Is it safe to "dissolve" c++ arrays on the heap?

I am currently implementing my own vector container and I ran into a pretty interesting issue (at least for me). It may be a stupid question, but I don't know.
My vector uses a heap array of pointers to heap-allocated objects of unknown type (T**).
I did this because I wanted the pointers and references to individual elements to stay same, even after resizing.
This comes at performance cost when constructing and copying, because I need to create the array on the heap and each object of the array on the heap too. (Heap allocation is slower than on the stack, right?)
T** arr = new T*[size]{};  // value-initialized: every slot starts as nullptr
and then for each element
arr[i] = new T{data};
Now I wonder whether it would be safe, beneficial (faster) and possible if, instead of allocating each object individually, I created a second array on the heap and saved a pointer to each of its objects in the first one. Then I would use (and delete) these objects later as if they had been allocated separately.
=> Is allocating arrays on the heap faster than allocating each object individually?
=> Is it safe to allocate objects in an array and forget about the array later? (Sounds pretty dumb, I think.)
Link to my github repo: https://github.com/LinuxGameGeek/personal/tree/main/c%2B%2B/vector
Thanks for your help :)
First a remark: you should not think of the heap/stack comparison in terms of efficiency, but in terms of object lifetime:
automatic arrays (what you call on the stack) end their life at the end of the block where they are defined
dynamic arrays (what you call on the heap) exist until they are explicitly deleted
Now, it is generally more efficient to allocate a bunch of objects in one array than to allocate them separately. You save a number of internal calls and the bookkeeping data structures that maintain the heap. The catch is that you can then only deallocate the array as a whole, not the individual objects.
Finally, except for trivially copyable objects, only the compiler and not the programmer knows the exact allocation details. For example (in common implementations), an automatic string (so on the stack) contains a pointer to a dynamic char array (so on the heap)...
Said differently, unless you plan to use your container only for POD or trivially copyable objects, do not expect to handle all the allocation and deallocation yourself: non-trivial objects have internal allocations.
Heap allocation is slower than on the stack, right?
Yes. Dynamic allocation has a cost.
Is allocating arrays on the heap faster than allocating each object individually?
Yes. Multiple allocations have that cost multiplied.
I wonder if it would be ... possible, if instead of allocating each object individually, I could create a second array on the heap and save the pointer of each object in the first one
It would be possible, but not trivial. Think hard how you would implement element erasure. And then think about how you would implement other features such as random access correctly into the container with arrays that contain indices from which elements have been erased.
... safe
It can be implemented safely.
... beneficial(faster)
Of course, reducing allocations from N to 1 would be beneficial by itself. But it comes at the cost of some scheme to implement the erasure. Whether this cost is greater than the benefit of reduced allocations depends on many things such as how the container is used.
Is it safe to allocate objects in an array and forgetting about the array later?
"Forgetting" about an allocation seems like a way to say "memory leak".
You could achieve similar advantages with a custom "pool" allocator. Implementing support for custom allocators in your container might be more generally useful.
P.S. Boost already has a "ptr_vector" container that supports custom allocators. No need to reinvent the wheel.
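For illustration, here is a rough sketch of the "one block plus a pointer table" idea from the question. SlabTable is a made-up name, the capacity is fixed on purpose, and erasure (the hard part mentioned above) is left out entirely.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity container: one allocation for all objects,
// plus a table of pointers into that block.
template <typename T>
class SlabTable {
public:
    explicit SlabTable(std::size_t capacity) { slab_.reserve(capacity); }
    SlabTable(const SlabTable&) = delete;             // a copy would point into the old slab
    SlabTable& operator=(const SlabTable&) = delete;

    // Returns nullptr when the slab is full; growing would move the objects.
    T* add(const T& value) {
        if (slab_.size() == slab_.capacity()) return nullptr;
        slab_.push_back(value);            // no reallocation: size < capacity
        table_.push_back(&slab_.back());
        return table_.back();
    }
    T* at(std::size_t i) const { return table_[i]; }
    std::size_t size() const { return table_.size(); }

private:
    std::vector<T> slab_;    // owns the objects; freed as a whole
    std::vector<T*> table_;  // non-owning; never pass these pointers to delete
};

// Checks that a pointer handed out early still refers to the same object later.
bool slab_pointers_stay_valid() {
    SlabTable<int> t(4);
    int* first = t.add(10);
    t.add(20);
    t.add(30);
    t.add(40);
    return first == t.at(0) && *first == 10 && *t.at(3) == 40;
}
```

The pointers in the table are only views into the slab: the objects are released together when the slab is destroyed, which is why they cannot be deleted one by one "as if they were allocated separately".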
I did this because I wanted the pointers and references to individual
elements to stay same, even after resizing.
You should just use std::vector::reserve to prevent reallocation of vector data when it is resized.
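A quick sketch of what reserve guarantees (first_element_stays_put is an illustrative name). Note the limit: the addresses are only stable as long as the size does not exceed the reserved capacity, so this helps only when an upper bound is known.

```cpp
#include <cstddef>
#include <vector>

// Returns true if the address of the first element survives n - 1 further
// push_backs made after reserving room for all n elements (n >= 1).
bool first_element_stays_put(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    v.push_back(0);
    const int* first = &v[0];
    for (std::size_t i = 1; i < n; ++i)
        v.push_back(static_cast<int>(i));   // size never exceeds the reserved capacity
    return first == &v[0];
}
```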
Vector is quite primitive, but it is highly optimized. It will be extremely hard for you to beat it with your own code. Just inspect its API and try out all of its functionality. Creating something better requires advanced knowledge of template programming (which apparently you do not have yet).
What you are trying to come up with is a use of placement new for a deque-like container. It's a viable optimization, but usually it's done to reduce allocation calls and memory fragmentation, e.g. on some RT or embedded systems. The array may even be a static array in that case. But if you also require that instances of T occupy adjacent space, that's a contradictory requirement: re-sorting them would kill any performance gains.
... beneficial(faster)
Depends on T. E.g. there is no point to do that to something like strings or shared pointers. Or anything that actually allocates resources elsewhere, unless T allows to change that behaviour too.
I wonder if it would be ... possible, if instead of allocating each
object individually, I could create a second array on the heap and
save the pointer of each object in the first one
Yes it is possible, even with standard ISO containers, thanks to allocators.
Thread safety becomes a concern if this "array" is a resource shared between multiple writer and reader threads. You might want to implement thread-local storage instead of a shared one, and use semaphores for the crossover cases.
The usual application for this is to allocate not on the heap but in a predetermined, statically allocated array, or in an array that was allocated once at the start of the program.
Note that if you use placement new, you must not use delete on the created objects; you have to call the destructor directly. As far as delete is concerned, placement new is not a true new. Calling delete anyway is undefined behavior: it may appear to work, but it will typically crash if the storage is a static array, and it will corrupt the heap when the deleted element happens to share its address with the start of the dynamically allocated array.
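A minimal sketch of that rule, assuming C++11 for alignas. Widget and placement_demo are hypothetical names used only for this illustration.

```cpp
#include <new>   // placement new

struct Widget {                 // hypothetical type that counts destructor calls
    static int destroyed;
    int id;
    explicit Widget(int i) : id(i) {}
    ~Widget() { ++destroyed; }
};
int Widget::destroyed = 0;

// Constructs a Widget inside a static buffer and destroys it by hand.
int placement_demo() {
    alignas(Widget) static unsigned char storage[sizeof(Widget)];
    Widget* w = new (storage) Widget(42);   // constructs in place, allocates nothing
    int id = w->id;
    w->~Widget();                           // explicit destructor call; never `delete w`
    return id;
}
```

The storage itself is never freed here because it is static; with a heap block you would release the raw block separately, after all objects in it have been destroyed.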
This comes at performance cost when constructing and copying, because I need to create the array on the heap and each object of the array on the heap too.
Copying a POD is extremely cheap. If you research perfect forwarding you can achieve the zero cost abstraction for constructors and the emplace_back() function. When copying, use std::copy() as it is very fast.
Is allocating arrays on the heap faster than allocating each object individually?
Each allocation is a request to the memory allocator, which may in turn have to ask the operating system for memory. Unless you are asking for a particularly large amount of memory, you can assume each request takes a roughly constant amount of time. Instead of asking for a parking space 10 times, ask for 10 parking spaces.
Is it safe to allocate objects in an array and forgetting about the array later? (sounds pretty dumb i think)
Depends on what you mean by safe. If you can't answer this question on your own, then you must clean up the memory and not leak under any circumstances.
An example of a time you might skip cleaning up memory is when you know the program is about to end, and cleaning up memory just to exit is kind of pointless. Still, you should clean it up. Read Serge Ballesta's answer for more information about lifetime.

vector<X*> vec vs vector<X>* vec

What is the difference in memory usage between:
std::vector<X*> vec
where each element is on the heap, but the vector itself isn't
and
std::vector<X>* vec
where the vector is declared on the heap, but each element is (on the stack?).
The second option doesn't make much sense. Does it mean the vector pointer is on the heap, but it points back at each element, which is on the stack?
std::vector<X*> vec
is a vector of pointers to objects of class X. This is useful, for example, when making an array of non-copyable classes/objects like std::fstream in C++98. So
std::vector<std::fstream> vec;
is WRONG in C++98 and won't work. But
std::vector<std::fstream*> vec;
works, although you have to create a new object for each element. For example, if you want 5 fstream elements, you have to write something like
vec.resize(5);
for(unsigned long i = 0; i < vec.size(); i++)
{
    vec[i] = new std::fstream; // each stream must be deleted manually later
}
Of course, there are many other uses depending on your application.
Now the second case is a pointer to the vector itself. So:
vector<int>* vec;
is just a pointer! It doesn't point to any vector yet, and you can't use it until you create the vector object itself, like
vec = new vector<int>();
and eventually you may use it as:
vec->resize(5);
Now this is not really useful, since vectors anyway store their data on the heap and manage the memory they carry. So use it only if you have a good reason to do it, and sometimes you would need it. I don't have any example in mind on how it could be useful.
If this is what you really asked:
vector<X>* vec = new vector<X>();
it means that the whole vector with all its elements is on the heap. The elements occupy a contiguous memory block on the heap.
The difference is where (and for what) you need to do manual memory management.
Whenever you have a raw C-style pointer in C++, you need to do some manual memory management -- the raw pointer can point at anything, and the compiler won't do any automatic construction or destruction for you. So you need to be aware of where the pointer points and who 'owns' the memory pointed at in the rest of your code.
So when you have
std::vector<X*> vec;
you don't need to worry about memory management for the vector itself (the compiler will do it for you), but you do need to worry about the memory management of the pointed at X objects for the pointers you put in the vector. If you're allocating them with new, you need to make sure to manually delete them at some point.
When you have
std::vector<X> *vec;
You DO need to worry about memory management for the vector itself, but you DON'T need to worry about memory management for the individual elements.
Simplest is if you have:
std::vector<X> vec;
then you don't need to worry about memory management at all -- the compiler will take care of it for you.
In code using good modern C++ style, none of the above is true.
std::vector<X*> is a collection of handles to objects of type X or any of its subclasses, which you do not own. The owner knows how they were allocated and will deallocate them -- you don't know and don't care.
std::vector<X>* would, in practice, only ever be used as a function argument representing a vector you do not own (the caller does) but which you are going to modify. According to one common approach, the fact that it's a pointer rather than a reference means that it is optional. Much more rarely it might be used as a class member where the attached vector is known to outlive the class pointing to it.
std::vector<std::unique_ptr<X>> is a polymorphic collection of mixed objects of various subclasses of X (and maybe X itself directly). Occasionally you might use it non-polymorphically if X is expensive to move, but modern style makes most types cheap to move.
Prior to C++11, std::vector<some_smart_pointer<X> > (yes, there's a space between the closing brackets) would be used for both the polymorphic case and the non-copyable case. Note that some_smart_pointer isn't std::unique_ptr, which didn't exist yet, nor std::auto_ptr, which wasn't usable in collections. boost::unique_ptr was a good choice. With C++11, the copyability requirement for collection elements is relaxed to moveability, so this reason completely went away. (There remain some types which are neither copyable nor moveable, such as the ScopeGuard pattern, but these should not be stored in a collection anyway.)
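As a sketch of the relaxed requirement, the std::fstream case from the earlier answer can be stored by value under C++11 or later, because streams are movable. make_streams is an illustrative name, and no file is actually opened.

```cpp
#include <cstddef>
#include <fstream>
#include <utility>
#include <vector>

// Builds a vector of move-only stream objects without any raw pointers.
std::size_t make_streams(std::size_t n) {
    std::vector<std::fstream> streams;
    for (std::size_t i = 0; i < n; ++i)
        streams.emplace_back();            // default-constructed in place, not opened
    std::fstream extra;
    streams.push_back(std::move(extra));   // moved in; copying would not compile
    return streams.size();
}
```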

Splitting QList into chunks, pointers or references?

I have this application that requires me to have a QList which will contain 1 < x < 10000+ objects. Now I have a few issues.
First of all, should I declare the QList as a pointer or straight on the stack? The objects in the QList are pretty small and are wrappers for QFileInfo. But how should I do this?
A list of objects on the stack?
A list (on stack) of pointers to objects on the heap?
A..
QList<FileInfoWrapper>*
Firstly, if I picked solution 2, would my heap be a mess since I would just be allocating small portions of data all over the place? I don't want that.
Secondly, if I pick the 3rd solution, how would this look in memory when I access the individual objects? And could I create pointers to them (they are after all on the heap)?
Then we come to my other issue. This list will be passed around like a fork at a diner, and at some point I would like to create sublists that don't hold any data, only references/pointers to some of the objects in the list (for example objects 0 to 250). I will then throw these lists into different threads, which will need a reference to the objects to be able to edit them (read: not a hard copy).
Also, could someone explain exactly what happens on the heap when you create a list like this:
QList<FileInfoWrapper>* list = new QList<FileInfoWrapper>();
Would it be like in C, where you just create a pointer to the offset where that object will be located?
*(list + sizeof(FileInfoWrapper) * 10)
QList is a container class ... that means that it manages the memory for you so you don't have to worry about it. Its underlying data-structure is a variant of a deque with some special modifications, so your understanding of indexing into the list is not correct (besides, arithmetic on a QList pointer steps over whole QList objects, not over elements). But either way, these are details that are abstracted away by the interface, and you don't need to worry about them. You simply use the given class methods like operator[] or at() to obtain a reference to an object at a given index, and other functions like push_back() or insert() to copy objects into the container. So you can simply make a QList instance on the stack (as long as it doesn't go out-of-scope while it's needed), and copy objects into it. The underlying data-structure will dynamically allocate the memory needed to store the objects, and when the QList object is destroyed, it will deallocate the memory used to store the objects it "owns".
Think about QList as you would think of a STL container like std::vector or std::list ... again, the underlying data-structure for QList is not the same as these STL containers, but the point is that you can allocate the data-structure on the stack like you would any other class, and it contains all the private data-members and information necessary to manage the memory on the heap. Allocating the QList on the heap through a call to new doesn't gain you anything in that regard ... there are already pointers, etc. inside the data-structure allocating and managing the memory of the contained objects for you.
Finally, don't worry about data-fragmentation. The point of a good container class is to properly allocate memory to avoid memory fragmentation issues from allocating and reallocating memory too often. Additionally, allocating memory takes time, so if a container class were to constantly need to call new, that would really hurt its performance. While allocating memory on every insertion may be a necessity for node-based containers like linked-lists and trees, hash-tables, dynamic-arrays, and other block-type data-structures are much more efficient at utilizing the memory they allocate to minimize these allocation calls.
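The sublist part of the question could be sketched roughly as follows. This is an assumption-heavy illustration: std::vector stands in for QList (it guarantees contiguous elements), and split_into_chunks and demo_chunks are made-up names. Each chunk is a non-owning view that stays valid only while the source container is not resized.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// A chunk is a non-owning view: pointer to its first element plus a length.
template <typename T>
std::vector<std::pair<T*, std::size_t>> split_into_chunks(std::vector<T>& items,
                                                          std::size_t chunk_size) {
    std::vector<std::pair<T*, std::size_t>> chunks;
    if (chunk_size == 0) return chunks;
    for (std::size_t start = 0; start < items.size(); start += chunk_size) {
        std::size_t len = std::min(chunk_size, items.size() - start);
        chunks.emplace_back(items.data() + start, len);
    }
    return chunks;
}

// Edits made through a chunk land in the original vector (no copies).
int demo_chunks() {
    std::vector<int> items(600, 1);
    auto chunks = split_into_chunks(items, 250);   // 250 + 250 + 100 elements
    for (std::size_t i = 0; i < chunks[0].second; ++i)
        chunks[0].first[i] = 5;                    // edit the first chunk only
    return items[0] + items[249] + items[250] + static_cast<int>(chunks.size());
}
```

Because the chunks cover disjoint elements, each one could be handed to a different thread for editing, provided nobody inserts into or removes from the source container in the meantime.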

Dealing With Dynamically Allocated Memory in Container Classes

I am having some difficulty understanding how containers are implemented in C++. Specifically, how can I deal with data allocated on the stack vs data allocated on the heap. For instance:
vector<int> VectorA;
VectorA.push_back (1);
VectorA.push_back (2);
VectorA.push_back (3);
vector<int*> VectorB;
VectorB.push_back (new int (1));
VectorB.push_back (new int (2));
VectorB.push_back (new int (3));
How does one make sure the integers in VectorB are deleted properly? I remember reading somewhere that std::vector only calls the destructor and doesn't actually delete anything. Also, if I wanted to implement my own LinkedList class, how would I deal with this particular problem?
The ideal way to deal with the problem is by using Smart Pointers as elements of your container instead of raw pointers.
Otherwise you have to manually do the memory management yourself.
The objects stored in the vector are stored on the heap anyway (in space allocated by the vector).
There is very little advantage in allocating the objects separately (option B), unless you intend to manage them somehow outside of the vector. In that case the vector can just destroy the pointers it stores, and trust the real owner to destroy the objects themselves.
The short answer is that most don't deal with such things at all. The standard containers (e.g., std::vector) don't store the original object at all -- they store a copy of the object. They manage the storage for their own copy, and it's up to you to manage the storage for the original.
In the case of storing pointers, you're basically taking responsibility for managing the storage pointed to by the pointer. The container is still making (and managing the storage for) a copy, but it's just a copy of the pointer, not the object to which it refers. That's up to you to deal with.
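The copy semantics described above can be seen in a few lines. Both function names are made up for this sketch.

```cpp
#include <vector>

// The vector stores its own copy, so changing the original has no effect on it.
int stored_copy_is_independent() {
    int original = 1;
    std::vector<int> values;
    values.push_back(original);   // copies the int into the vector's storage
    original = 99;                // does not touch the stored element
    return values[0];
}

// With pointers, only the pointer is copied; both still refer to the same int.
int stored_pointer_aliases() {
    int original = 1;
    std::vector<int*> pointers;
    pointers.push_back(&original);
    original = 99;
    return *pointers[0];
}
```

In the second function the vector manages storage for the pointer only; the int it points to lives and dies independently of the container.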

Pointer to vector vs vector of pointers vs pointer to vector of pointers

Just wondering what you think is the best practice regarding vectors in C++.
If I have a class containing a vector member variable.
When should this vector be declared as a:
"Whole-object" vector member variable containing values, i.e. vector<MyClass> my_vector;
Pointer to a vector, i.e. vector<MyClass>* my_vector;
Vector of pointers, i.e. vector<MyClass*> my_vector;
Pointer to vector of pointers, i.e. vector<MyClass*>* my_vector;
I have a specific example in one of my classes where I have currently declared a vector as case 4, i.e. vector<AnotherClass*>* my_vector;
where AnotherClass is another of the classes I have created.
Then, in the initialization list of my constructor, I create the vector using new:
MyClass::MyClass()
: my_vector(new vector<AnotherClass*>())
{}
In my destructor I do the following:
MyClass::~MyClass()
{
    for (int i=my_vector->size(); i>0; i--)
    {
        delete my_vector->at(i-1);
    }
    delete my_vector;
}
The elements of the vectors are added in one of the methods of my class.
I cannot know how many objects will be added to my vector in advance. That is decided when the code executes, based on parsing an xml-file.
Is this good practice? Or should the vector instead be declared as one of the other cases 1, 2 or 3 ?
When to use which case?
I know the elements of a vector should be pointers if they are subclasses of another class (polymorphism). But should pointers be used in any other cases ?
Thank you very much!!
Usually solution 1 is what you want since it’s the simplest in C++: you don’t have to take care of managing the memory, C++ does all that for you (for example you wouldn’t need to provide any destructor then).
There are specific cases where this doesn’t work (most notably when working with polymorphic objects) but in general this is the only good way.
Even when working with polymorphic objects or when you need heap allocated objects (for whatever reason) raw pointers are almost never a good idea. Instead, use a smart pointer or container of smart pointers. Modern C++ compilers provide shared_ptr as part of the C++11 standard library. If you’re using a compiler that doesn’t have that, you can use the implementation from Boost.
Definitely the first!
You use vector for its automatic memory management. Using a raw pointer to a vector means you don't get automatic memory management anymore, which does not make sense.
As for the value type: all containers basically assume value-like semantics. Again, you'd have to do memory management when using pointers, and it's vector's purpose to do that for you. This is also described in item 79 from the book C++ Coding Standards. If you need to use shared ownership or "weak" links, use the appropriate smart pointer instead.
Deleting all elements in a vector manually is an anti-pattern and violates the RAII idiom in C++. So if you have to store pointers to objects in a vector, better use a 'smart pointer' (for example boost::shared_ptr) to facilitate resource destructions. boost::shared_ptr for example calls delete automatically when the last reference to an object is destroyed.
There is also no need to allocate MyClass::my_vector using new. A simple solution would be:
class MyClass {
std::vector<whatever> m_vector;
};
Assuming whatever is a smart pointer type, there is no extra work to be done. That's it, all resources are automatically destroyed when the lifetime of a MyClass instance ends.
In many cases you can even use a plain std::vector<MyClass> - that's when the objects in the vector are safe to copy.
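Here is a small sketch of the smart-pointer variant, using std::shared_ptr (the C++11 standard counterpart of boost::shared_ptr). Node and shared_demo are hypothetical names.

```cpp
#include <memory>
#include <vector>

struct Node {                 // hypothetical element type that counts destructions
    static int destroyed;
    ~Node() { ++destroyed; }
};
int Node::destroyed = 0;

// Returns the use count seen while both the vector and a local handle exist.
long shared_demo() {
    std::shared_ptr<Node> keep;
    long count_inside = 0;
    {
        std::vector<std::shared_ptr<Node>> nodes;
        nodes.push_back(std::make_shared<Node>());
        nodes.push_back(std::make_shared<Node>());
        keep = nodes[0];                    // second owner of the first node
        count_inside = keep.use_count();    // vector element + keep
    }   // vector destroyed: the second node is deleted, the first lives on via keep
    return count_inside;
}   // keep destroyed: the first node is deleted as well
```

No destructor loop is written anywhere; each Node is deleted when its last owner goes away.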
In your example, the vector is created when the object is created, and it is destroyed when the object is destroyed. This is exactly the behavior you get when making the vector a normal member of the class.
Also, in your current approach, you will run into problems when making copies of your object. By default, copying a pointer member results in a shallow copy, meaning all copies of the object would share the same vector (and each destructor would try to delete it). This is the reason why, if you manually manage resources, you usually need The Big Three (destructor, copy constructor and copy assignment operator).
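By comparison, a class that holds the vector as a plain member gets correct copies for free. A minimal sketch, where Holder is a made-up class:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class holding its vector by value: the compiler-generated
// copy constructor copies the elements, so copies do not share state.
class Holder {
public:
    void add(int v) { items_.push_back(v); }
    std::size_t count() const { return items_.size(); }
private:
    std::vector<int> items_;   // no destructor or copy operations to write by hand
};

bool copies_are_independent() {
    Holder a;
    a.add(1);
    Holder b = a;      // deep copy of the vector
    b.add(2);          // changes b only
    return a.count() == 1 && b.count() == 2;
}
```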
A vector of pointers is useful in cases of polymorphic objects, but there are alternatives you should consider:
If the vector owns the objects (that means their lifetime is bounded by that of the vector), you could use a boost::ptr_vector.
If the objects are not owned by the vector, you could either use a vector of boost::shared_ptr, or a vector of boost::reference_wrapper (the type that boost::ref creates).
A pointer to a vector is very rarely useful - a vector is cheap to construct and destruct.
For elements in the vector, there's no correct answer. How often does the vector change? How much does it cost to copy-construct the elements in the vector? Do other containers have references or pointers to the vector elements?
As a rule of thumb, I'd go with no pointers until you see or measure that the copying of your classes is expensive. And of course the case you mentioned, where you store various subclasses of a base class in the vector, will require pointers.
A reference counting smart pointer like boost::shared_ptr will likely be the best choice if your design would otherwise require you to use pointers as vector elements.
Complex answer: it depends.
If your vector is shared or has a lifecycle different from the class that embeds it, it might be better to keep it as a pointer.
If the objects you're referencing have no (or have expensive) copy constructors, then it's better to keep a vector of pointers. Conversely, if your objects use shallow copy, a vector of objects prevents you from leaking...