Inserting items into a vector by value vs. unique_ptr - C++

I wonder what's the difference in performance and memory consumption between these two declarations of vector.
class MyType;
std::vector<MyType> vectorA;
std::vector<std::unique_ptr<MyType>> vectorB;
I have a dilemma: I have to create some observers (non-owning pointers) into the vector, and if I do that while the vector is still being filled, I have to use unique_ptrs, because the growing vector may relocate its elements. I don't want to use shared_ptrs, because they have a big memory overhead.
I'm looking for the best trade-off between performance and memory consumption. Is it better to initialize vectorA first (after initialization it is strictly immutable) and only then create observers into it (its contents will never change, so the pointers stay valid)? In this scenario I would lose some performance.
Or is it better to create vectorB instead of vectorA? Then I could create observers during initialization, since the locations the unique_ptrs point to never change. I suppose that in this case I would lose at least 4/8 bytes per item for the pointer itself.
What do the best practices say? I'm a beginner with C++, and I'm solving such a problem for the first time.

If possible, I would go with std::vector<MyType>.
The advantages of the approach are:
A single memory block contains all the elements, so looping over it is faster, as the prefetcher can do its job
Fewer memory allocations, as new/delete are expensive operations
Less memory overhead (however, does one extra pointer per element really cost that much? You might win the same amount back by reordering your members to reduce padding)
Pointers to elements are stable until the vector reallocates (so reserving the correct size up front gives stable pointers, as long as you never push_back beyond the reserved capacity; see the sketch below)
However, using std::vector<std::unique_ptr<MyType>> addresses a few drawbacks of the by-value approach:
Can be used with MyType being an abstract class (or interface)
The move/copy constructor of the class will never be called (it can be expensive, or may not exist at all)
It allows you to keep raw pointers to MyType that stay valid as the vector grows
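To make the trade-off concrete, here is a minimal sketch of both options (MyType's contents, the element count, and the use of C++14's std::make_unique are illustrative assumptions):

#include <memory>
#include <vector>

struct MyType { int payload; };  // placeholder element type

int main() {
    // Option A: values in one contiguous block. Reserving up front prevents
    // reallocation, so raw pointers into the vector stay valid as long as
    // we never grow past the reserved capacity.
    std::vector<MyType> vectorA;
    vectorA.reserve(100);               // must cover the final element count
    vectorA.push_back(MyType{1});
    MyType* observerA = &vectorA[0];    // stable while size() <= 100

    // Option B: one heap allocation per element plus a pointer in the
    // vector. The vector may reallocate freely; the pointed-to objects
    // never move, so observers stay valid no matter how the vector grows.
    std::vector<std::unique_ptr<MyType>> vectorB;
    vectorB.push_back(std::make_unique<MyType>(MyType{2}));
    MyType* observerB = vectorB[0].get();  // stable across reallocation

    return observerA->payload + observerB->payload;
}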

Related

C++ how to manage iterators of contiguous dynamic arrays

I have been trying to figure out an efficient way of managing dynamic arrays which I may change occasionally but would like to randomly access and iterate over often.
I would like to be able to:
store the array in a contiguous data block (reduce cache misses)
access each element individually and independently of the array handle (pointers > indices)
resize the array (dynamic)
So, in order to achieve this, I have been trying things out using std::vector<T>::iterator, and it worked very well, until recently, when I resized the vector whose iterators I was storing (e.g. by calling push_back()). All the iterators became invalid, because they were pointing into stale memory.
Is there any efficient (possibly STL-provided) way of keeping the stored iterators up to date? Or do I have to update each iterator manually?
Is this whole approach even worthwhile? Should I stick with indices?
EDIT:
I have used indices before and it was OK, but I have changed my approach because it still wasn't good: I would always have to drag the entire array into scope, and an index can easily be used with the wrong array by mistake. Also, there is no perfect way of defining a "NULL" index (none I know of).
What about updating all pointers along with a resize operation? All you would have to do is store the original vector::begin(), resize the vector, and afterwards update each pointer to vector.begin() + (ptr - prevBegin). And resize operations are already something you should try to avoid.
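A sketch of that fixup, phrased in terms of offsets from data() so that no stale pointer is ever dereferenced (the names and element type are illustrative):

#include <cstddef>
#include <vector>

// Illustrative fixup: convert each observer pointer to an offset before the
// resize, then rebuild it from the new buffer afterwards.
void grow_and_fixup(std::vector<int>& v, std::vector<int*>& observers) {
    std::vector<std::size_t> offsets;
    offsets.reserve(observers.size());
    for (int* p : observers)
        offsets.push_back(static_cast<std::size_t>(p - v.data()));  // before realloc

    v.push_back(0);  // may reallocate and invalidate every observer

    for (std::size_t i = 0; i < observers.size(); ++i)
        observers[i] = v.data() + offsets[i];  // rebuild from the new buffer
}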
Fully achieving all 3 of your goals is impossible. If you are fully contiguous, then you have one memory block with a finite size, and the only way to get more memory is to ask for more memory, which will not be contiguous with the memory you already have. So you have to sacrifice at least one requirement, to at least some degree:
If you are willing to partly sacrifice contiguity, you can use a std::deque. This is an array-of-arrays kind of structure. Growing it at either end does not invalidate references to existing elements (demonstrated in the sketch after this list). Its performance depends on the details of your data type, but is generally much closer to a contiguous array than to a linked list. Well-done but old (5-year-old) benchmarks: https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html. Another option is to write a chunking allocator, to use either with deque or with another structure, though this is quite a bit more work.
If you can use indices, then you can just use a vector.
If you don't need to resize, you can still just use a vector and never resize it.
Unless you have a good reason, I would stick with indices. If your main performance bottleneck is iteration over a large number of elements (as your contiguity requirement implies), then this whole indexing thing should really be a non-issue. If you do have a very good reason for avoiding indices (which you haven't stated), then I would profile the deque versus the vector on the main loop operation to see how much worse the deque really does. It might be barely worse, and if neither deque nor vector works well enough for you, the next alternatives are quite a bit more work (probably involving allocators or a custom data structure).
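For what it's worth, the deque reference stability mentioned in the list can be demonstrated directly (the element counts are arbitrary):

#include <cassert>
#include <deque>

int main() {
    std::deque<int> d;
    d.push_back(42);
    int* ref = &d.front();

    // Insertions at either end of a deque invalidate iterators, but not
    // references/pointers to existing elements.
    for (int i = 0; i < 100000; ++i) {
        d.push_back(i);
        d.push_front(i);
    }
    assert(*ref == 42);  // still valid; a vector would likely have moved it
}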
Depending on your needs, if you can use the following data structure:
std::vector<std::unique_ptr<Foo>>
then no matter how the vector is resized, if you access your data via a Foo*, the pointer to the Foo will not be invalidated.
As the number of Foos you need to store in your vector changes, the vector may need to resize its internal contiguous block of memory, which means any iterators you have pointing inside the vector will be invalidated when the vector resizes.
(You can read up on the C++0x iterator invalidation rules for the details.)
However, since the object stored in the vector is a pointer to an object elsewhere on the heap, the pointed-to object (Foo in this example) will not be invalidated.
Note that the vector owns Foo (as alluded to by std::unique_ptr<Foo>), whilst you can store a non-owning pointer to Foo by keeping a Foo* as the means of accessing your Foo data.
So long as the vector outlives your need to access Foo via your Foo*, then you will not have any lifetime issues.
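A minimal sketch of that pattern (Foo's contents and the loop count are placeholders):

#include <cassert>
#include <memory>
#include <vector>

struct Foo { int value; };

int main() {
    std::vector<std::unique_ptr<Foo>> v;
    v.push_back(std::unique_ptr<Foo>(new Foo{7}));
    Foo* observer = v[0].get();  // non-owning handle

    // Force plenty of reallocation: the unique_ptrs move around inside the
    // vector, but the Foo objects they own never change address.
    for (int i = 0; i < 100000; ++i)
        v.push_back(std::unique_ptr<Foo>(new Foo{i}));

    assert(observer->value == 7);  // still valid
}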
So in terms of your requirements:
store the array in a contiguous data block (reduce cache misses)
yes, std::vector achieves this
access each element individually and independently of the array handle (pointers > indices)
yes, store a Foo* as your means of accessing each element individually, and that remains independent of the array handle (vector::iterator)
resize the array (dynamic)
yes, std::vector achieves this automatically, resizing for you when needed.
Bonus:
Using a smart pointer (in this example std::unique_ptr) in the vector means memory management is also handled automatically for you. (Just make sure you don't try to access a Foo* after the vector is destroyed.)
Edit:
It has been pointed out in the comments that storing std::unique_ptr<Foo> in the vector violates the requirement for the objects to be stored in contiguous memory (if that is indeed what you meant: the vector's underlying array of pointers will be contiguous, but accessing the Foo objects incurs an extra indirection).
However, if you use a suitable allocator (e.g. an arena allocator) for both the vector and the Foo objects, your Foo objects will live near the memory used by your vector, and you are likely to suffer fewer cache misses when iterating over it.

Considerations when choosing between storing values vs pointers in std:: containers

When storing objects in standard collections, what considerations should one think of when deciding between storing values vs. pointers? Assume the owner of these collections (ObjectOwner) is itself allocated on the heap. For small structs/objects I've been storing values, while for large objects I've been storing pointers. My reasoning was that when standard containers are resized, their contents are copied (small copy OK, big copy bad). Any other things to keep in mind here?
class ObjectOwner
{
public:
    SmallObject& getSmallObject(int smallObjectId);
    HugeObject* getHugeObject(int hugeObjectId);

private:
    std::map<int, SmallObject> mSmallObjectMap;
    std::map<int, HugeObject *> mHugeObjectMap;
};
Edit:
an example of the above for more context:
Create/delete items stored in std::map relatively infrequently (a few times per second)
Get from std::map frequently (once per 10 milliseconds)
small object: < 32 bytes
huge object: > 1024 bytes
I would store objects by value unless I need them through a pointer. Possible reasons for needing a pointer:
I need to store an object hierarchy in a container (to avoid slicing)
I need shared ownership
There are possibly other reasons, but reasoning by size is only valid for some containers (std::vector, for example), and even there you can make the cost of moving objects minimal (by reserving enough room in advance, for example). Your size-based reasoning does not make sense for std::map, as std::map does not relocate objects when growing.
Note: the return type of a method should not reflect the way you store the object in a container; it should be based on the method's semantics, i.e. what you would do if the object is not found.
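To illustrate the std::map point above: pointers and references into a map stay valid as it grows, because a node-based container never relocates its elements. A small sketch:

#include <cassert>
#include <map>

int main() {
    std::map<int, int> m;
    m[1] = 100;
    int* p = &m[1];  // pointer to the mapped value

    // Inserting other keys never relocates existing nodes,
    // so p remains valid throughout.
    for (int i = 2; i < 10000; ++i)
        m[i] = i;

    assert(*p == 100);
}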
Only your profiler knows the answer for your SPECIFIC case. Using pointers rather than objects is a reliable way of minimising the amount you copy when a copy must be done (be it a resize of a vector or a copy of the whole container), but sometimes you WANT that copy, because it's a snapshot for a thread inside a mutex, and it's faster to copy the container than to hold the mutex while working on the data.
Some objects might not be possible to keep in any way other than pointer because they're not copyable.
Any performance gained by using a container of pointers can be offset by the cost of writing extra copying code, or by the repeated calls to new.
There's no one-size-fits-all answer, and before you worry about performance here, you should establish where the performance problems really are. (Just repeating the point: use a profiler!)

std::vector<A> vs std::vector<A*> difference for CPU

Let's discuss the case where I have a huge std::vector and I need to iterate over all elements and call a print function on each. There are two cases: either I store my objects in the vector itself, so they sit next to each other in memory, or I allocate each object on the heap and store the pointers in the vector, in which case the objects are scattered all over the RAM.
When copies of the objects are stored in std::vector<A>, each time the CPU brings data from RAM into the cache it fetches a chunk of memory containing multiple elements of the vector. So when you iterate over the elements and call a function on each, several elements get processed before the CPU has to go back to RAM for the next chunk. This is good, because the CPU wastes few cycles waiting for memory.
What about the case of std::vector<A*>? When the CPU brings in a chunk of pointers, can it obtain the pointed-to objects cheaply? Or does it have to request each object from RAM, causing cache misses and wasted CPU cycles? Is that worse than the case above in terms of performance?
At least in a typical case, when the CPU fetches a pointer (or a number of pointers) from memory, it will not automatically fetch the data to which those pointers refer.
So, in the case of the vector of pointers, when you load the item that each of those pointers refers to, you'll typically get a cache miss, and access will be substantially slower than if they were stored contiguously. This is particularly true when/if each item is relatively small, so a number of them could fit in a single cache line (for some level of cache--keep in mind that a current processor will often have two or three levels of cache, each of which might have a different line size).
It may, however, be possible to mitigate this to some degree. You can overload operator new for a class to control allocations of objects of that class. Using this, you can at least keep objects of that class together in memory. That doesn't guarantee that the items in a particular vector will be contiguous, but could improve locality enough to make a noticeable improvement in speed.
Also note that the vector allocates its data via an Allocator object (which defaults to std::allocator<T>, which, in turn, uses new). Although the interface is kind of a mess, so it's harder to use than you'd generally like, you can define an allocator that acts differently if you wish. This won't generally have much effect on a single vector, but if (for example) you have a number of vectors (each of fixed size) and want them to use memory next to each other, you could do that via the allocator object.
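As a rough illustration of the class-level operator new idea, here is a deliberately simplified sketch (one fixed-size arena, no thread safety, per-object delete is a no-op; Widget and the arena size are made up):

#include <cstddef>
#include <new>

class Widget {
public:
    // Hand out Widget storage from a single static arena so that instances
    // end up adjacent in memory. Simplified: the arena is never reclaimed.
    static void* operator new(std::size_t size) {
        // Round the request up so every object stays suitably aligned.
        size = (size + alignof(std::max_align_t) - 1)
               & ~(alignof(std::max_align_t) - 1);
        if (used_ + size > sizeof(arena_))
            throw std::bad_alloc();
        void* p = arena_ + used_;
        used_ += size;
        return p;
    }
    static void operator delete(void*) noexcept {}  // no per-object free

private:
    alignas(std::max_align_t) static unsigned char arena_[1 << 20];
    static std::size_t used_;
    int data_[4];
};

alignas(std::max_align_t) unsigned char Widget::arena_[1 << 20];
std::size_t Widget::used_ = 0;

With this in place, successive new Widget allocations land next to each other, so iterating a std::vector<Widget*> touches a compact region of memory even though every element is heap-allocated.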
either I store my objects in the vector itself, so they sit next to each other in memory, or I allocate each object on the heap
Regardless of whether you use std::vector<A> or std::vector<A *>, the inner buffer of the vector will be allocated on the heap. You could, though, use an efficient memory pool to manage allocations and deletions, but you're still going to work with data on the heap.
Is that worse than the case above in terms of performance?
If you use std::vector<A *> without specialized memory management, you may get lucky and have the individual allocations land close together in memory, but it is generally better to rely on the contiguous allocation performed by std::vector<A>. With pointers, it may take longer before the entire vector has to be reallocated (since pointers are usually smaller than the structs they point to), but it will suffer from poor locality of the actual memory accesses.
When the CPU brings in a chunk of pointers, can it obtain the pointed-to objects cheaply?
No, it can't. The CPU doesn't know they are pointers (everything the CPU sees is just a bunch of bits, with no semantics involved) until it fetches a dereferencing instruction.
Or does it have to request each object from RAM, causing cache misses and wasted CPU cycles?
That's right. The CPU will try to load the data behind a cached pointer, but that data is likely to be located far away from recently accessed memory, so it will be a cache miss.
Is that worse than the case above in terms of performance?
If the only thing you care about is accessing elements, then yes, it's bad. Yet in some cases a vector of pointers is preferable. Namely, if your objects don't support moving (C++11 isn't mainstream yet), then copying the vector becomes more expensive. Even if you never copy the vector, you may not know the number of stored elements in advance, so you can't call reserve(n) beforehand; then all your objects will be copied when the vector exhausts its capacity and is forced to reallocate.
But in the end it depends on the concrete type. If your objects are small (tiny structs, ints or floats), then it's obviously better to work with them by value, since the overhead of the pointers would be too big.

push_back objects into vector memory issue C++

Compare these two ways of initializing a vector of objects:
1.
vector<Obj> someVector;
Obj new_obj;
someVector.push_back(new_obj);
2.
vector<Obj*> ptrVector;
Obj* objptr = new Obj();
ptrVector.push_back(objptr);
The first one push_backs the actual object instead of a pointer to the object. Does vector push_back copy the value being pushed? My problem is that I have huge objects and very long vectors, so I need to find the best way to save memory.
Is the second way better?
Are there other ways to have a vector of objects/pointers that I can find each object later and use the least memory at the same time?
Of the two options above, this third one (not included in your list) is the most efficient:
std::vector<Obj> someVector;
someVector.reserve(preCalculatedSize);
for (int i = 0; i < preCalculatedSize; ++i)
    someVector.emplace_back();
emplace_back directly constructs the object into the memory that the vector arranges for it. If you reserve prior to use, you can avoid reallocation and moving.
However, if the objects truly are large, then the advantages of cache coherency are smaller, so a vector of smart pointers makes sense. Thus the fourth option:
std::vector< std::unique_ptr<Obj> > someVector;
std::unique_ptr<Obj> element( new Obj );
someVector.push_back( std::move(element) );
is probably best. Here, we represent the lifetime of the data and how it is accessed in the same structure with nearly zero overhead, preventing it from getting out of sync.
You have to explicitly std::move the std::unique_ptr around when you want to move it. If you need a raw pointer for whatever reason, .get() is how to access it. -> and * and explicit operator bool are all overridden, so you only really need to call .get() when you have an interface that expects an Obj*.
Both of these solutions require C++11. If you lack C++11, and the objects truly are large, then the "vector of pointers to data" is acceptable.
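For example (Obj's contents and legacy_api are stand-ins):

#include <memory>
#include <utility>
#include <vector>

struct Obj { int x; };

void legacy_api(Obj* raw) { raw->x += 1; }  // an interface expecting Obj*

int main() {
    std::vector<std::unique_ptr<Obj>> someVector;

    std::unique_ptr<Obj> element(new Obj());
    someVector.push_back(std::move(element));  // explicit move; element is now empty

    someVector[0]->x = 42;                // operator-> forwards to the Obj
    if (someVector[0])                    // explicit operator bool: non-null?
        legacy_api(someVector[0].get());  // raw, non-owning pointer
    return someVector[0]->x;              // 43
}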
In any case, what you really should do is determine which matches your model best, check performance, and only if there is an actual performance problem do optimizations.
If your Obj class doesn't require polymorphic behavior, then it is better to simply store the Obj types directly in the vector<Obj>.
If you store objects in vector<Obj*>, then you are assuming the responsibility of manually deallocating those objects when they are no longer needed. Better, in this case, to use vector<std::unique_ptr<Obj>> if possible, but again, only if polymorphic behavior is required.
The vector will store the Obj objects on the heap (by default, unless you override the allocator in the vector template). These objects will be stored in contiguous memory, which can also give you better cache locality, depending upon your use case.
The drawback to using vector<Obj> is that frequent insertion/removal from the vector may cause reallocation and copying of your Obj objects. However, that usually will not be the bottleneck in your application, and you should profile it if you feel like it is.
With C++11 move semantics, the implications of copying can be much reduced.
Using a vector<Obj> will take less memory to store if you can reserve the size ahead of time. vector<Obj *> will necessarily use more memory than vector<Obj> if the vector doesn't have to be reallocated, since you have the overhead of the pointers and the overhead of dynamic memory allocation. This overhead may be relatively small though if you only have a few large objects.
However, if you are very close to running out of memory, using vector<Obj> may cause a problem if you can't reserve the correct size ahead of time because you'll temporarily need extra storage when reallocating the vector.
Having a large vector of large objects may also cause an issue with memory fragmentation. If you can create the vector early in the execution of your program and reserve the size, this may not be an issue, but if the vector is created later, you might run into a problem due to memory holes on the heap.
Under the circumstances, I'd consider a third possibility: use std::deque instead of std::vector.
This is kind of a halfway point between the two you've given. A vector<obj> allocates one huge block to hold all the instances of the objects in the vector. A vector<obj *> allocates one block of pointers, and puts each object instance in a heap block of its own, so you get N objects plus N pointers.
A deque will create a block of pointers and a number of blocks of objects -- but (at least normally) it'll put a number of objects (call it M) together into a single block, so you get a block of N/M pointers, and N/M blocks of M objects each.
This avoids many of the shortcomings of either a vector of objects or a vector of pointers. Once you allocate a block of objects, you never have to reallocate or copy them. You do (or may) eventually have to reallocate the block of pointers, but it'll be smaller (by a factor of M) than the vector of pointers if you try to do it by hand.
One caveat: if you're using Microsoft's compiler/standard library, this may not work very well -- they have some strange logic (still present up through VS 2013 RC) that means if your object size is larger than 16, you'll get only one object per block -- i.e., equivalent to your vector<obj *> idea.

C++ layout of an object containing containers

As someone with a lot of assembler language experience and old habits to lose, I recently did a project in C++ using a lot of the features that C++03 and C++11 have to offer (mostly the container classes, including some from Boost). It was surprisingly easy, and I tried wherever I could to favor simplicity over premature optimization. As we move into code review and performance testing, I'm sure some of the old hands will have aneurysms at not seeing exactly how every byte is manipulated, so I want to have some ammunition in advance.
I defined a class whose instance members contain several vectors and maps. Not "pointers to" vectors and maps. And I realized that I haven't got the slightest idea how much contiguous space my objects take up, or what the performance implications might be for frequently clearing and re-populating these containers.
What does such an object look like, once instantiated?
Formally, there aren't any constraints on the implementation other than those specified in the standard, with regards to interface and complexity. Practically, most, if not all, implementations derive from the same code base, and are fairly similar.
The basic implementation of vector is three pointers. The actual memory for the objects in the vector is dynamically allocated. Depending on how the vector was "grown", the dynamic area may contain extra memory; the three pointers point to the start of the memory, the byte after the last byte currently used, and the byte after the last byte allocated. Perhaps the most significant aspect of the implementation is that it separates allocation and initialization: the vector will, in many cases, allocate more memory than is needed, without constructing objects in it, and will only construct the objects when needed. In addition, when you remove objects, or clear the vector, it will not free the memory; it will only destruct the objects, and will change the pointer to the end of the used memory to reflect this. Later, when you insert objects, no allocation will be needed.
When you add objects beyond the amount of allocated space, vector will allocate a new, larger area, copy the objects into it, then destruct the objects in the old space and delete it. Because of the complexity constraints, vector must grow the area exponentially, by multiplying the size by some fixed constant (1.5 and 2 are the most common factors), rather than by incrementing it by some fixed amount. The result is that if you grow the vector from empty using push_back, there will not be too many reallocations and copies; another result is that if you grow the vector from empty, it can end up using almost twice as much memory as necessary. These issues can be avoided if you preallocate using std::vector<>::reserve().
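That exponential growth, and the effect of reserve(), are easy to observe; the exact capacities printed below are implementation-specific:

#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    std::vector<int> grown;
    std::size_t last = 0;
    for (int i = 0; i < 1000; ++i) {
        grown.push_back(i);
        if (grown.capacity() != last) {  // a reallocation just happened
            last = grown.capacity();
            std::printf("size %zu -> capacity %zu\n", grown.size(), last);
        }
    }

    std::vector<int> reserved;
    reserved.reserve(1000);  // one allocation up front
    for (int i = 0; i < 1000; ++i)
        reserved.push_back(i);  // never reallocates
}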
As for map, the complexity constraints and the fact that it must be ordered mean that some sort of balanced tree must be used. In all of the implementations I know, this is a classical red-black tree: each entry is allocated separately, in a node which contains two or three pointers, plus maybe a boolean, in addition to the data.
I might add that the above applies to the optimized versions of the containers. The usual implementations, when not optimized, will add additional pointers to link all iterators to the container, so that they can be marked when the container does something which would invalidate them, and so that they can do bounds checking.
Finally: these classes are templates, so in practice, you have access to the sources, and can look at them. (Issues like exception safety sometimes make the implementations less straightforward than we might like, but the implementations with g++ or VC++ aren't really that hard to understand.)
A map is a binary tree (of some variety, I believe it's customarily a Red-Black tree), so the map itself probably only contains a pointer and some housekeeping data (such as the number of elements).
As with any other binary tree, each node will then contain two or three pointers (two for the left and right children, and perhaps one back to the parent node, to avoid having to traverse the whole tree to find a node's predecessor or successor).
In general, vector shouldn't be noticeably slower than a regular array, and certainly no worse than your own implementation of a variable size array using pointers.
A vector is a wrapper for an array. The vector class contains a pointer to a contiguous block of memory and knows its size somehow. When you clear a vector, it usually retains its old buffer (implementation-dependent), so that the next time you reuse it, there are fewer allocations. If you resize a vector above its current buffer size, it will have to allocate a new buffer. Reusing and clearing the same vectors to store objects is efficient. (std::string is similar.) If you want to find out exactly how much memory a vector has allocated for its buffer, call the capacity function and multiply it by the size of the element type. You can call the reserve function to manually increase the buffer size, in expectation of the vector taking more elements shortly.
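For example (the sizes are arbitrary, and exact capacities are implementation-dependent):

#include <cstdio>
#include <vector>

int main() {
    std::vector<double> v;
    v.reserve(256);
    v.resize(100);

    // Bytes currently held in the vector's heap buffer:
    std::printf("buffer bytes: %zu\n", v.capacity() * sizeof(double));

    v.clear();  // size() drops to 0, but the buffer is typically retained
    std::printf("capacity after clear: %zu\n", v.capacity());
}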
Maps are more complicated so I don't know. But if you need an associative container, you would have to use something complicated in C too, right?
Just wanted to add to the others' answers a few things that I think are important.
Firstly, the default (in the implementations I've seen) sizeof(std::vector<T>) is constant and made up of three pointers. Below is an excerpt from a GCC 4.7.2 STL header, showing the relevant parts:
template<typename _Tp, typename _Alloc>
struct _Vector_base
{
    ...
    struct _Vector_impl : public _Tp_alloc_type
    {
        pointer _M_start;
        pointer _M_finish;
        pointer _M_end_of_storage;
        ...
    };
    ...
    _Vector_impl _M_impl;
    ...
};

template<typename _Tp, typename _Alloc = std::allocator<_Tp> >
class vector : protected _Vector_base<_Tp, _Alloc>
{
    ...
};
That's where the three pointers come from. Their names are self-explanatory, I think. But there is also a base class: the allocator. Which takes me to my second point.
Secondly, std::vector<T, Allocator = std::allocator<T>> takes a second template parameter, a class that handles memory operations. It is through the functions of this class that vector does its memory management. There is a default STL allocator, std::allocator<T>. It has no data members, only functions such as allocate, destroy, etc. It bases its memory handling on new/delete. But you can write your own allocator and supply it to std::vector as the second template parameter. It has to conform to certain rules, such as providing the required functions, but how the memory management is done internally is up to you, as long as it does not violate the logic std::vector relies on. It might introduce some data members that add to sizeof(std::vector) through the inheritance above. It also gives you that "control over each bit".
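As a sketch of what a minimal custom allocator can look like (this one merely traces allocations and defers to operator new/delete; it assumes a C++11 library that drives allocators through std::allocator_traits):

#include <cstddef>
#include <cstdio>
#include <new>
#include <vector>

// Minimal custom allocator: behaves like std::allocator, but prints a line
// for every allocation so you can watch the vector's memory management.
template <typename T>
struct TracingAllocator {
    typedef T value_type;

    TracingAllocator() {}
    template <typename U>
    TracingAllocator(const TracingAllocator<U>&) {}

    T* allocate(std::size_t n) {
        std::printf("allocating space for %zu elements\n", n);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const TracingAllocator<T>&, const TracingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const TracingAllocator<T>&, const TracingAllocator<U>&) { return false; }

int main() {
    std::vector<int, TracingAllocator<int> > v;
    for (int i = 0; i < 100; ++i)
        v.push_back(i);  // prints once per reallocation
}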
Basically, a vector is just a pointer to an array, along with its capacity (total allocated memory) and size (actually used elements):
struct vector {
    Item* elements;
    size_t capacity;
    size_t size;
};
Of course thanks to encapsulation all of this is well hidden and the users never get to handle the gory details (reallocation, calling constructors/destructors when needed, etc) directly.
As to your performance questions regarding clearing, it depends how you clear the vector:
Swapping it with a temporary empty vector (the usual idiom) will delete the old array: std::vector<int>().swap(myVector);
Using clear() or resize(0) will erase all the items and keep the allocated memory and capacity unchanged.
If you are concerned about efficiency, IMHO the main point to consider is to call reserve() in advance (if you can) in order to pre-allocate the array and avoid useless reallocations and copies (or moves with C++11). When adding a lot of items to a vector, this can make a big difference (as we all know, dynamic allocation is very costly so reducing it can give a big performance boost).
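A quick sketch contrasting the two clearing behaviours (the capacity after the swap is implementation-dependent, though 0 is typical):

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v(1000, 42);

    v.clear();                     // destroys the elements, keeps the buffer
    assert(v.capacity() >= 1000);  // memory is still allocated

    std::vector<int>().swap(v);    // swap idiom: v trades buffers with an empty
                                   // temporary, which frees the old buffer when
                                   // it is destroyed
    // v.capacity() is now 0 on common implementations
}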
There is a lot more to say about this, but I believe I covered the essential details. Don't hesitate to ask if you need more information on a particular point.
Concerning maps, they are usually implemented using red-black trees. But the standard doesn't mandate this; it only gives functional and complexity requirements, so any other data structure that fits the bill is good to go. I have to admit, I don't know how RB-trees are implemented, but I guess that, again, a map contains at least a pointer and a size.
And of course, each and every container type is different (e.g. unordered maps are usually hash tables).