I have an application that requires a QList which may contain anywhere from a handful of objects to well over 10,000. Now I have a few issues.
First of all, should I declare the QList as a pointer, or straight on the stack? The objects in the QList are pretty small and are wrappers for QFileInfo. But how should I do this?
1. A list of objects on the stack?
2. A list (on the stack) of pointers to objects on the heap?
3. A heap-allocated list: QList<FileInfoWrapper>*?
Firstly, if I picked solution 2, would my heap become fragmented, since I would just be allocating small portions of data all over the place? I don't want that.
Secondly, if I pick the 3rd solution, how would this look in memory when I access the individual objects? And could I create pointers to them (they are, after all, on the heap)?
Then we come to my other issue. This list will be passed around like a fork at a diner, and at some point I would like to create sublists that don't hold any data, only references/pointers to some of the objects in the list (for example, objects 0 to 250). I will then hand these lists to different threads, which will need a reference to each object in order to edit it (read: not a hard copy).
Also, could someone explain exactly what happens on the heap when you create a list like this:
QList<FileInfoWrapper>* list = new QList<FileInfoWrapper>();
Would it be like in C, where you just create a pointer to the offset where that object will be located?
*(list + sizeof(FileInfoWrapper) * 10)
QList is a container class ... that means that it manages the memory for you, so you don't have to worry about it. Its underlying data structure is a variant of a deque with some special modifications, so your understanding of indexing into the list is not correct. But either way, these are details that are abstracted away by the interface, and you don't need to worry about them. You simply use the given class methods like operator[] or at() to obtain a reference to an object at a given index, and other functions like push_back() or insert() to copy objects into the container. So you can simply make a QList instance on the stack (as long as it doesn't go out of scope while it's needed) and copy objects into it. The underlying data structure will properly allocate the memory needed to store the objects dynamically, and at the time of the QList object's destruction, it will deallocate the memory used to store the objects it "owns".
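For instance, here is a minimal sketch of the stack-allocated approach (FileInfoWrapper stands in for the wrapper type from the question; its exact members are assumptions):

#include <QFileInfo>
#include <QList>

// Hypothetical wrapper type from the question.
struct FileInfoWrapper {
    QFileInfo info;
};

void example() {
    QList<FileInfoWrapper> list;        // the QList object itself lives on the stack
    list.reserve(10000);                // optional: pre-allocate room for many items

    FileInfoWrapper w{QFileInfo("/tmp/example.txt")};
    list.append(w);                     // copies the wrapper into the container

    FileInfoWrapper& ref = list[0];     // reference into the container, no copy
    ref.info.refresh();
}   // list goes out of scope here and automatically frees its heap storage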
Think about QList as you would an STL container like std::vector or std::list ... again, the underlying data structure for QList is not the same as these STL containers, but the point is that you can allocate the data structure on the stack as you would any other class, and it contains all the private data members and information necessary to manage the memory on the heap. Allocating the QList on the heap through a call to new doesn't gain you anything in that regard ... there are already pointers, etc. inside the data structure allocating and managing the heap memory of the contained objects for you.
Finally, don't worry about data fragmentation. The point of a good container class is to allocate memory properly so as to avoid fragmentation from allocating and reallocating memory too often. Additionally, allocating memory takes time, so if a container class were to constantly need to call new, that would really hurt its performance. While allocating memory on every insertion may be a necessity for node-based containers like linked lists and trees, hash tables, dynamic arrays, and other block-type data structures are much more efficient at utilizing the memory they allocate, minimizing these allocation calls.
I am currently implementing my own vector container and I encountered a pretty interesting issue (at least for me). It may be a stupid question, but I don't know.
My vector uses a heap-allocated array of pointers to heap-allocated objects of unknown type (T**).
I did this because I wanted the pointers and references to individual elements to stay the same, even after resizing.
This comes at a performance cost when constructing and copying, because I need to create the array on the heap, and each object of the array on the heap too. (Heap allocation is slower than on the stack, right?)
T** arr = new T*[size]{nullptr};
and then for each element
arr[i] = new T{data};
Now I wonder if it would be safe, beneficial (faster) and possible if, instead of allocating each object individually, I could create a second array on the heap and save the pointer of each object in the first one. Then I could use (and delete) these objects later as if they were allocated separately.
=> Is allocating arrays on the heap faster than allocating each object individually?
=> Is it safe to allocate objects in an array and forget about the array later? (Sounds pretty dumb, I think.)
Link to my github repo: https://github.com/LinuxGameGeek/personal/tree/main/c%2B%2B/vector
Thanks for your help :)
First, a remark: you should not think of the heap/stack comparison in terms of efficiency, but in terms of object lifetime:
automatic arrays (what you call "on the stack") end their life at the end of the block where they are defined
dynamic arrays (what you call "on the heap") exist until they are explicitly deleted
Now, it is always more efficient to allocate a bunch of objects in one array than to allocate them separately: you save a number of internal calls and the various data structures needed to maintain the heap. The trade-off is that you can only deallocate the whole array, not the individual objects.
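As a rough sketch of the difference (assuming T is default-constructible):

#include <cstddef>

template <typename T>
void compare_allocations(std::size_t n) {
    // n separate allocations: one trip to the allocator per object.
    T** separate = new T*[n];
    for (std::size_t i = 0; i < n; ++i)
        separate[i] = new T{};

    // One allocation covering all n objects; you can only delete the
    // whole block, never an individual element.
    T* block = new T[n];

    // Cleanup mirrors the allocation pattern.
    for (std::size_t i = 0; i < n; ++i) delete separate[i];
    delete[] separate;
    delete[] block;
}

The first variant costs n + 1 allocator calls; the second costs exactly one.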
Finally, except for trivially copyable objects, only the compiler, and not the programmer, knows the exact allocation details. For example (and for common implementations), an automatic string (so, on the stack) contains a pointer to a dynamic char array (so, on the heap)...
Said differently, unless you plan to use your container only for PODs or trivially copyable objects, do not expect to handle all the allocation and deallocation yourself: non-trivial objects perform internal allocations.
Heap allocation is slower than on the stack, right?
Yes. Dynamic allocation has a cost.
Is allocating arrays on the heap faster than allocating each object individually?
Yes. Multiple allocations have that cost multiplied.
I wonder if it would be ... possible, if instead of allocating each object individually, I could create a second array on the heap and save the pointer of each object in the first one
It would be possible, but not trivial. Think hard about how you would implement element erasure, and then about how you would correctly implement other features, such as random access, with backing arrays from which elements have been erased.
... safe
It can be implemented safely.
... beneficial (faster)
Of course, reducing allocations from N to 1 would be beneficial by itself. But it comes at the cost of some scheme to implement the erasure. Whether this cost is greater than the benefit of reduced allocations depends on many things such as how the container is used.
Is it safe to allocate objects in an array and forget about the array later?
"Forgetting" about an allocation seems like a way to say "memory leak".
You could achieve similar advantages with a custom "pool" allocator. Implementing support for custom allocators in your container might be more generally useful.
P.S. Boost already has a "ptr_vector" container that supports custom allocators. No need to reinvent the wheel.
I did this because I wanted the pointers and references to individual elements to stay the same, even after resizing.
You should just use std::vector::reserve to prevent reallocation of the vector's data as it grows (up to the reserved capacity).
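A minimal sketch of how reserve() keeps element addresses stable as the vector grows:

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(100);          // one up-front allocation; no reallocation
                             // happens while size() stays <= 100
    v.push_back(1);
    int* first = &v[0];

    for (int i = 2; i <= 100; ++i) v.push_back(i);
    assert(first == &v[0]);  // still valid: the buffer never moved
}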
Vector is quite primitive, but it is highly optimized. It will be extremely hard for you to beat it with your own code. Just inspect its API and try all of its functionality. Creating something better requires advanced knowledge of template programming (which, apparently, you do not have yet).
What you are trying to come up with is a use of placement-new allocation for a deque-like container. It's a viable optimization, but usually it's done to reduce allocation calls and memory fragmentation, e.g. on some RT or embedded systems. The array may even be a static array in that case. But if you also require that instances of T occupy adjacent space, that's a contradictory requirement: re-sorting them would kill any performance gains.
... beneficial (faster)
Depends on T. E.g. there is no point in doing that for something like strings or shared pointers, or anything that actually allocates resources elsewhere, unless T allows changing that behaviour too.
I wonder if it would be ... possible, if instead of allocating each object individually, I could create a second array on the heap and save the pointer of each object in the first one
Yes it is possible, even with standard ISO containers, thanks to allocators.
There is a thread-safety concern if this "array" is a shared resource between multiple writer and reader threads. You might want to use thread-local storage instead of a shared one, and semaphores for the crossover cases.
The usual application for this is to allocate not on the heap but in a predetermined, statically allocated array, or in an array that was allocated once at the start of the program.
Note that if you use placement new, you should not use delete on the created objects; you have to call the destructor directly. The placement overload of new is not a true new as far as delete is concerned. You may or may not get an error, but you will certainly crash if the backing storage was a static array, and you will corrupt the heap if you delete an element that happens to sit at the beginning of a dynamically allocated array.
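A minimal sketch of that rule (Widget is a placeholder type):

#include <new>

struct Widget {
    int value;
    ~Widget() { /* release whatever Widget owns */ }
};

int main() {
    alignas(Widget) unsigned char buf[sizeof(Widget)];

    Widget* w = new (buf) Widget{42};  // construct in place: no allocation

    // Destroy by calling the destructor directly. `delete w` would be
    // undefined behaviour, because buf never came from operator new.
    w->~Widget();
}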
This comes at performance cost when constructing and copying, because I need to create the array on the heap and each object of the array on the heap too.
Copying a POD is extremely cheap. If you research perfect forwarding, you can achieve a zero-cost abstraction for constructors and an emplace_back() function. When copying, use std::copy(), as it is very fast.
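As a hedged illustration of the perfect-forwarding idea, here is a fragment of a hand-rolled fixed-capacity container (all names hypothetical; bounds checking omitted for brevity):

#include <cstddef>
#include <new>
#include <utility>

template <typename T, std::size_t Capacity>
class FixedVector {
public:
    // Construct the element directly in the container's buffer, forwarding
    // the caller's arguments, so no temporary T is created and copied in.
    template <typename... Args>
    T& emplace_back(Args&&... args) {
        T* slot = reinterpret_cast<T*>(storage_) + size_;
        new (slot) T(std::forward<Args>(args)...);  // placement new, in place
        ++size_;
        return *slot;
    }
    ~FixedVector() {
        for (std::size_t i = size_; i-- > 0; )
            (reinterpret_cast<T*>(storage_) + i)->~T();
    }
private:
    alignas(T) unsigned char storage_[Capacity * sizeof(T)];
    std::size_t size_ = 0;
};

With this, FixedVector<std::string, 8> v; v.emplace_back(5, 'x'); builds a string of five 'x' characters directly in the buffer, with no temporary.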
Is allocating arrays on the heap faster than allocating each object individually?
Each allocation requires a call into the memory allocator (and sometimes the operating system). Unless you are asking for a particularly large amount of memory, you can assume each request takes a roughly constant amount of time. So instead of asking for a parking space 10 times, ask for 10 parking spaces at once.
Is it safe to allocate objects in an array and forget about the array later? (Sounds pretty dumb, I think.)
That depends on what you mean by safe. If you can't answer this question on your own, then you must clean up the memory and not leak under any circumstances.
An example of a time you might ignore cleaning up memory is when you know the program is about to end, and cleaning up memory just to exit is kind of pointless. Still, you should clean it up. Read Serge Ballesta's answer for more information about lifetime.
I'm aware of the many articles on avoiding heap fragmentation. My question has to do with specifically what happens when we use a vector to store data:
class foo {
public:
    std::vector<bar> bars; // or I could have std::vector<bar*> bars
};
I have no problem using new or delete (I don't have memory leaks, and it is very clear to me when a bar is not used, so I can call delete when necessary). But with regard to heap fragmentation, which is better? Does this make a stack overflow more likely?
EDIT: I don't actually have any new information to add. I just wanted to thank everyone, and say that I've found that any question tagged with C++ seems to attract so many knowledgeable, helpful people. It's really quite nice. Thank you.
TL;DR: unless you have a good reason, backed by sound measurements, always favour a vector of value types.
I would argue for the std::vector<bar> form. Unless you have other (measured) reasons not to, the guaranteed contiguous memory of a vector with a value type is the better idea. When the vector allocates a contiguous block of memory, the objects are laid out in memory next to each other (with some reserve). This gives the host system and runtime a better chance to avoid fragmentation. Allocating the objects individually may result in fragmentation, since you do not control where they are allocated.
I note your concerns about the stack; the vector object itself occupies only a small, fixed amount of stack space, so this shouldn't be a problem. The "usual" allocator, std::allocator, uses new to allocate memory for the vector's elements on the heap.
Why should a vector always be favoured? Herb Sutter goes into some detail about this: http://channel9.msdn.com/Events/Build/2014/2-661 with supporting graphs, diagrams, explanations, etc. from around the 23:30 mark, and he picks up Bjarne's material at around the 46:00 mark.
std::vector guarantees that its contents will be stored in contiguous memory. If you store objects in your vector, each object will sit next to the others in one big chunk of memory; but if you put pointers into the vector, pointing to objects created with new, each object will be at a random place in memory.
So the first version (storing objects, not pointers) is better for both avoiding heap fragmentation and utilizing the cache.
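A small demonstration one could run (bar here is a stand-in struct):

#include <cstdio>
#include <vector>

struct bar { double payload[4]; };

int main() {
    std::vector<bar> by_value(3);
    std::vector<bar*> by_pointer{new bar{}, new bar{}, new bar{}};

    // Adjacent elements of the value vector are exactly sizeof(bar) apart.
    std::printf("by value:   %p %p %p\n",
                (void*)&by_value[0], (void*)&by_value[1], (void*)&by_value[2]);

    // The pointers themselves are contiguous, but the objects they point
    // to ended up wherever the allocator put them.
    std::printf("by pointer: %p %p %p\n",
                (void*)by_pointer[0], (void*)by_pointer[1], (void*)by_pointer[2]);

    for (bar* p : by_pointer) delete p;
}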
The OP seems to have a basic misunderstanding about what std::vector does: it allocates a contiguous block of memory on the heap, so you do not need to worry about your stack space when using vector! So whenever you do not really need to store anything but the objects themselves, just put them in your vector and you are good.
So std::vector<bar> appears to be the right choice in your case. Also, don't worry about heap fragmentation prematurely; let the allocator worry about that. (But you should worry about feeding the pre-fetcher, which is why std::vector<bar> should be your default choice for storing anything.)
Use a vector of pointers in the following cases:
Bar is a base class and you want to store instances of derived classes in the vector.
Bar doesn't have an appropriate copy/move constructor or it is prohibitively expensive.
You want to store pointers to objects in the vector elsewhere, and you may change the capacity of the vector (for example, due to resize or push_back). If the objects are stored by value, old pointers will become invalid when the capacity changes.
You want some elements in the vector to be "empty" (which can be represented by null pointers), and it's not practical to store dummy, unused instances of Bar in the vector by value instead.
If you do need pointers, I strongly suggest using smart pointers (see the sketch below).
In other cases, you should prefer to store objects by value, as it is usually more efficient.
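A minimal sketch of the base-class case from the list above, using smart pointers (Bar and DerivedBar are placeholder types):

#include <cassert>
#include <memory>
#include <vector>

struct Bar { virtual ~Bar() = default; virtual int id() const { return 0; } };
struct DerivedBar : Bar { int id() const override { return 1; } };

int main() {
    // A vector of Bar by value would "slice" a DerivedBar down to Bar;
    // smart pointers preserve the dynamic type and clean up automatically.
    std::vector<std::unique_ptr<Bar>> v;
    v.push_back(std::make_unique<Bar>());
    v.push_back(std::make_unique<DerivedBar>());

    int sum = 0;
    for (const auto& p : v) sum += p->id();  // virtual dispatch works
    assert(sum == 1);
}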
I've had some experience in C++ from school work. I learned, among other things, that objects should be stored in containers (vector, map, etc.) as pointers. The main reason given was that we need the new operator, along with a copy constructor, in order to create a copy of the object on the heap (otherwise called dynamic memory). This method also necessitates defining a destructor.
However, from what I've read since then, it seems that STL containers already store the values they contain on the heap. Thus, if I were to store my objects as values, a copy (using the copy constructor) would be made on the heap anyway, and there would be no need to define a destructor. All in all, a copy on the heap would be made anyway?
Also, if that's true, then the only other reason I can think of for storing objects as pointers would be to reduce the resource cost of copying the container, since pointers are cheaper to copy than whole objects. However, this would require the use of std::shared_ptr instead of regular pointers, since you don't want elements in the copied container to be deleted when the original container is destroyed. This method would also alleviate the need for defining a destructor, wouldn't it?
Edit : The destructor to be defined would be for the class using the container, not for the class of the objects stored.
Edit 2: I guess a more precise question would be: "Does it make a difference, from a memory and resource-usage standpoint, to store objects as pointers using the new operator, as opposed to plain values?"
The main reason to avoid storing full objects in containers (rather than pointers) is because copying or moving those objects is expensive. In that case, the recommended alternative is to store smart pointers in the container.
So...
vector<something_t> ................. Usually perfectly OK
vector<shared_ptr<something_t>> ..... Preferred if you want pointers
vector<something_t*> ................ Usually best avoided
The problem with raw pointers is that, when a raw pointer disappears, the object it points to hangs around causing memory and resource leaks - unless you've explicitly deleted it. C++ doesn't have garbage collection, and when a pointer is discarded, there's no way to know if other pointers may still be pointing to that object.
Raw pointers are a low-level tool - mostly used to write libraries such as vector and shared_ptr. Smart pointers are a high-level tool.
However, particularly with C++11 move semantics, the cost of moving items around in a vector is normally very small, even for huge objects. For example, a vector<string> is fine even if all the strings are megabytes long. You mostly worry about the cost of moving objects if sizeof(classname) is big - if the object holds lots of data inside itself rather than in separate heap-allocated memory.
Even then, you don't always worry about the cost of moving objects. It doesn't matter that moving an object is expensive if you never move it. For example, a map doesn't need to move items around much. When you insert and delete items, the nodes (and contained items) stay where they are, it's just the pointers that link the nodes that change.
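To make the move-semantics point concrete, a small sketch (sizes chosen arbitrarily):

#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::string> v;
    std::string huge(10'000'000, 'x');  // ~10 MB of character data

    // std::move transfers the string's internal buffer: a few pointer
    // assignments, not a copy of the 10 MB payload.
    v.push_back(std::move(huge));

    // Reallocation during growth also moves the strings (std::string's
    // move constructor is noexcept), so growing the vector stays cheap
    // even though each element "weighs" a megabyte.
    for (int i = 0; i < 100; ++i)
        v.push_back(std::string(1'000'000, 'y'));
}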
What is better -
std::vector<MyClass*>
or
std::vector<MyClass>
?
I mean, will the second option store the objects on the heap anyway?
Which one is faster and cleaner?
std::vector<MyClass> is preferable in most cases. Yes, it will store the objects on the heap (dynamic storage), via the default std::allocator.
The advantages are that the objects are automatically destroyed on vector destruction and are allocated in a single contiguous memory block, reducing heap fragmentation. So this way is cleaner.
This way is also faster, because it minimizes memory allocation operations. The vector preallocates storage before constructing objects, so for N objects there will be M allocations and N constructor calls, with M < N (the bigger N is, the greater the difference). If you create the objects manually and put them into the vector as pointers, you get N + X allocations and N constructions, where X is the number of vector storage allocations. And you can always minimize the vector's memory allocations if you know the number of stored objects in advance, by using std::vector::reserve().
By contrast, using a std::vector of pointers requires you to destroy the objects manually, e.g. calling delete for dynamically allocated objects. This is not recommended. Such containers should be used only as non-owning; object ownership must then be maintained externally.
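To make the allocation counts concrete, here is a hedged sketch that counts calls to the global operator new (exact numbers are implementation-dependent, but with reserve() the value version typically performs a single allocation):

#include <cstdio>
#include <cstdlib>
#include <new>
#include <vector>

static std::size_t g_allocs = 0;

// Replace the global operator new so every dynamic allocation is counted.
void* operator new(std::size_t n) {
    ++g_allocs;
    if (void* p = std::malloc(n)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

struct MyClass { int payload[4]; };

int main() {
    {
        g_allocs = 0;
        std::vector<MyClass> v;
        v.reserve(1000);                       // one allocation up front
        for (int i = 0; i < 1000; ++i) v.push_back(MyClass{});
        std::printf("by value:   %zu allocations\n", g_allocs);  // expect 1
    }
    {
        g_allocs = 0;
        std::vector<MyClass*> v;
        v.reserve(1000);
        for (int i = 0; i < 1000; ++i) v.push_back(new MyClass{});
        std::printf("by pointer: %zu allocations\n", g_allocs);  // expect 1001
        for (MyClass* p : v) delete p;         // manual cleanup required
    }
}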
Yes, the second version will store the objects on the heap too, but in a more compact form, namely an array of MyClass.
In the first form you must allocate and deallocate your objects yourself, while in the second version std::vector does this for you.
So, as always, it depends on your needs and requirements. If you can choose, take the second form:
std::vector<MyClass>
it's much easier to maintain.
Depends. Storing copies of your class in the container is easier to work with, of course, but you need to run the copy constructor every time you save an instance to the container, which may be problematic. Also, your container cannot store anything but that one class; in particular, it cannot store other classes that are specialised (inherited) from your base class. So, in general, you usually end up having to store pointers to the class.
There are snags with storing pointers. However, to get the best of both worlds, consider using boost::ptr_vector instead, which has the advantages of smart pointers without the overhead.
std::vector<MyClass> will store objects on the heap (probably), std::vector<MyClass*> will store your pointers on the heap (probably) and your objects wherever they were created. Use std::vector<MyClass> whenever applicable, if you need to support inheritance std::vector<std::unique_ptr<MyClass>> again when applicable. If your vector isn't supposed to own the objects an std::vector<MyClass*> can be useful, but that's rarely the case.
Here is what I'm trying to do. I have a std::vector with a certain number of elements; it can grow but not shrink. The thing is that it's sort of cell-based, so there may not be anything at a given position. Instead of creating an empty object and wasting memory, I thought of just nulling out that cell in the std::vector. The issue is: how do I get pointers in there without needing to manage my memory? How can I take advantage of not having to call new and keep track of the pointers?
How large are the objects and how sparse do you anticipate the vector will be? If the objects are not large or if there aren't many holes, the cost of having a few "empty" objects may be lower than the cost of having to dynamically allocate your objects and manage pointers to them.
That said, if you do want to store pointers in the vector, you'll want to use a vector of smart pointers (e.g., a vector<shared_ptr<T>>) or a container designed to own pointers (e.g., Boost's ptr_vector<T>).
If you're going to use pointers something will need to manage the memory.
It sounds like the best solution for you would be to use boost::optional. I believe it has exactly the semantics that you are looking for. (http://www.boost.org/doc/libs/1_39_0/libs/optional/doc/html/index.html).
Actually, after I wrote this, I realized that your use case (e.g. an expensive default constructor) is covered in the boost::optional docs: http://www.boost.org/doc/libs/1_39_0/libs/optional/doc/html/boost_optional/examples.html#boost_optional.examples.bypassing_expensive_unnecessary_default_construction
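Since C++17 the standard library offers std::optional with the same semantics. A minimal sketch of the cell-based vector using it (Cell is a placeholder type):

#include <optional>
#include <vector>

struct Cell { int value; };  // imagine an expensive default constructor

int main() {
    // Each slot is either empty or holds a Cell in place: no per-element
    // heap allocation and no pointers to manage.
    std::vector<std::optional<Cell>> grid(100);  // 100 empty slots

    grid[7] = Cell{42};   // fill one cell
    if (grid[7])          // test for "empty" without pointers
        grid[7]->value += 1;

    grid[7].reset();      // make the slot empty again
}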
You can use a deque to hold an ever-increasing number of objects, and use your vector to hold pointers to the objects. The deque won't invalidate pointers to existing objects it holds if you only add new objects to the end of it. This is far less overhead than allocating each object separately. Just ensure that the deque is destroyed after or at the same time as the vector so you don't create dangling pointers.
However, based on the size of the 3-D array you mentioned in another answer's comment, you may have difficulty storing that many pointers. You might want to look into a sparse array implementation so that you mainly use memory for the portions of the array where you have non-null pointers.
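A small sketch of that arrangement (Cell is a placeholder type):

#include <cassert>
#include <deque>
#include <vector>

struct Cell { int value; };

int main() {
    std::deque<Cell> storage;             // owns the objects; adding to the
                                          // end never moves existing elements
    std::vector<Cell*> grid(8, nullptr);  // sparse view: null means "empty"

    storage.push_back(Cell{42});
    grid[3] = &storage.back();

    Cell* before = grid[3];
    for (int i = 0; i < 1000; ++i)
        storage.push_back(Cell{i});       // grow the deque substantially
    assert(grid[3] == before);            // pointer still valid
    assert(grid[3]->value == 42);
}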
You could use a smart pointer. For example boost::shared_ptr.
The issue is that how do I get pointers in there without needing to manage my memory?
You can certainly do this using shared_ptr or the other similar techniques mentioned here. But in the near future you will come across some problem where you will have to manage your own memory, so please get comfortable with the pointer concept.
In big server code, the memory management of objects is often treated as a responsibility in its own right, and a class is created especially for this purpose. This is known as a pool. Whenever you need an object, you ask the pool to give you one, and whenever you are done with it, you tell the pool so. It is then the pool's responsibility to decide what to do with that object.
But the basic idea is that your main program still deals with pointers yet does not care about the memory; some other object takes care of that.
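A very small sketch of the idea (all names hypothetical; a real pool adds locking, block allocation, and so on):

#include <memory>
#include <vector>

template <typename T>
class Pool {
public:
    // Hand out an object, reusing a released one when possible.
    T* acquire() {
        if (free_.empty()) {
            storage_.push_back(std::make_unique<T>());  // grow on demand
            return storage_.back().get();
        }
        T* p = free_.back();
        free_.pop_back();
        return p;
    }
    // Take an object back; it is kept for reuse, not freed.
    void release(T* p) { free_.push_back(p); }
private:
    std::vector<std::unique_ptr<T>> storage_;  // owns every object created
    std::vector<T*> free_;                     // objects currently available
};

The main program calls acquire() and release() and never touches new or delete; everything is reclaimed when the Pool itself is destroyed.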