Basic - shared_ptr to vector of vectors of values - c++

I have a vector of a vectors of objects containing just a few integers.
The outer vector holds hundreds of vectors, those hold thousands to hundreds of thousands of Data objects.
I am using a library with a lot of shared_ptr's involved, so that's what i'll be using.
How do I store this so that the data is stored to the heap?
std::vector<std::shared_ptr<std::vector<Data>>>
std::vector<std::vector<std::shared_ptr<Data>>>
etc
What is the correct way to handle this?

To store something on the heap you use new in c++ or malloc in c. Although I believe that the vector implementation does use the heap since vector is a dynamically sized container. So in reality if you add an element to a vector that elemenet is already on the heap unless it is a pointer in which case just the pointer is on the heap and not the element that the pointer points to as #Oswald points out.

How do I store this so that the data is stored to the heap?
If you need reference semantics, i.e. if you need the values in the container to be aliases for values which are also referred to from other parts of the code, and modifications made in one part of the code should be visible for other parts that hold an alias to the modified Data object, I would say this is the right container definition:
std::vector<std::vector<std::shared_ptr<Data>>>
For what concerns your question about where the storage comes from, std::vector always allocates its elements dynamically in a continuous region of storage, no matter whether those are shared_ptrs, vectors, or Datas.
However, I would recommend you to think if you really need reference semantics, or if it is not enough to store objects of type Data by value inside the containers:
std::vector<std::vector<Data>>
This would simplify your code and you would also get rid of the shared_ptr memory and run-time overhead.
Whether or not you need reference semantics is something that only you, as the designer of your application, can tell. The information you provided is not enough for me to tell it without uncertainty, but hopefully this answer gave you a hint on the kind questions you should ask yourself, and what would be the answer in each case.

Related

store strings in stable memory in c++

A little bit of background first (skip ahead to the boldface if you're bored by this).
I'm trying to glue two pieces of code together. One is a JSON/YML library that makes heavy use of a custom string view object, the other is a piece of code from the early 2000s.
I've been seeing weird behavior for a long time, until I have traced it down to a memory issue, namely that the string views I construct in the JSON/YML library take a const char* as a constructor, and assume that the memory location of that char array stays constant over the lifetime of the string view. However, some of the std::string objects on which I construct these views are temporary, so that's just not true and the string view ends up pointing at garbage.
Now, I thought I was being smart and constructed a cache in the form of a std::vector that would hold all the temporary strings, I would construct the string views on these and only clear the cache in the end - easy.
However, I was still seeing garbled strings every now and then, until I found the reason: sometimes, when pushing things to the vector beyond the preallocated size, the vector would be moved to a different memory location, invalidating all the string views. For now, I've settled on preallocating a cache size that is large enough to avoid any conceivable moving of the vector, but I can see this causing severe and untracable problems in the future for very large runs. So here's my question:
How can I construct a std::vector<std::string> or any other string container that either avoids being moved in memory alltogether, or at least throws an error message if that happens?
Of course, if you feel that I'm going about this whole issue in the wrong way fundamentally, please also let me know how I should deal with this issue instead.
If you're interested, the two pieces of code in question are RapidYAML and the CERN Statistics Library ROOT.
My answer from a similar question: Any way to update pointer/reference value when vector changes capability?
If you store objects in your vector as std::unique_ptr or std::shared_ptr, you can get an observing pointer to the underlying object with std::unique_ptr::get() (or a reference if you dereference the smart pointer). This way, even though the memory location of the smart pointer changes upon resizing, the observing pointer points to the same object and thus the same memory location.
[...] sometimes, when pushing things to the vector beyond the preallocated size, the vector would be moved to a different memory location, invalidating all the string views.
The reason is that std::vector is required to store its data contiguously in memory. So, if you exceed the maximum capacity of the vector when adding an element, it will allocate a new space in memory (big enough this time) and move all the data here.
What you are subject to is called iterator invalidation.
How can I construct a std::vector or any other string container that either avoids being moved in memory alltogether, or at least throws an error message if that happens?
You have at least 3 easy solutions:
If your cache size is supposed to be fixed and is known at compile-time, I would advise you to use std::array instead.
If your cache size is supposed to be fixed but not necessarily known at compile-time, I would advise you to reserve() the required capacity of your std::vector so that you will have the guarantee that it will big enough to not need to be reallocated.
If your cache size may change, I would advise you to use std::list instead. It is implemented as a (usually doubly) linked list. It will guarantee that the elements will not be relocated in memory.
But since they are not stored contiguously in memory, you'll lose the ability to have direct access to any element (i.e. you'll need to iterate over the list in order to find an element).
Of course there probably are other solutions (I do not claim this answer to be exhaustive) but these solutions will allow you to almost not change your code (only the container) and protect your string views to be invalidated.
Perhaps use an std::list. Its accessing method is slower (at least when iterating) but memory location is constant. Reason for both is that it does not use contiguous memory.
Alternatively create a wrapper that wraps a pointer to a string that has been created with "new". That address will also be constant. EDIT: Somehow I managed to miss that what I've just described is pretty much a smartpointer minus automated deletion ;)
Well sadly it is impossible to be able to grow a vector while being sure the content will stay at the same place on classical OS at least.
There is the function realloc that tries to keep the same place, but as you can read on the documentation, there is no guarantee to that, only the os will decide.
To solution your problem, you need the concept of a pool, a pool of string here, that handle the life time of your strings.
You may get away with a simple std::list of string, but it will lead to bad data aliasing and a lot of independent allocations bad to your performances. These will also be the problems with smart pointers.
So if you care about performances, how you may implement it in your case may be not far from your current implementation in my opinion. Because you cannot resize the vector, you should prefer an std::array of a fixed size that you decide at compile time. Then, whenever you need it, you can create a new one to expand your memory capacity. This may be easily implemented by a std::list<std::array> typically.
I don't know if it applies here, but you must be careful if your application can create any number of string during its execution as it may induce an ever growing memory pool, and maybe finally memory problems. To fix that you may insure that the strings you don't use anymore can be reused or freed. Sadly I cannot help you too much here, as these rules will depend on your application.

C++ how to manage iterators of contiguous dynamic arrays

I have been trying to figure out an efficient way of managing dynamic arrays which I may change occasionally but would like to randomly access and iterate over often.
I would like to be able to:
store the array in a continuous data block (reduce cache misses)
access each element individually and independently of the array handle (pointers > indices)
resize the array (dynamic)
So in order to achieve this I have been trying things out using std::vector<T>::iterator, and it worked very well, until recently, when I resized the vector (e.g. calling push_back()) that I was storing iterators of. All the iterators became invalid, because they were pointing to stale memory.
Is there any efficient (possibly STL-)way of keeping the iterator pointers up to date? Or do I have to update each Iterator manually?
Is this whole approach even worthwhile? Should I stick with indices?
EDIT:
I have used indices before and it was ok, but I have changed my approach because it still wasn´t good. I would always have to drag the entire array into scope and the indices could be easily used for any array. also there is no perfect way of defining a "NULL" index (none I know about).
What about the option to update all pointers along with a resize operation? All you would have to do is to store the original vector::begin, resize the vector and afterwards update all pointers to vector.begin() + (ptr - prevBegin) and resize operations is already something you should try to avoid.
Fully achieving all 3 of your goals is impossible. If you are fully contiguous, then you have one memory block with a finite size, and the only way to get more memory is to ask for more memory, which will not be contiguous with the memory you already have. So you have to sacrifice at least one requirement, to at least some degree:
If you are willing to partly sacrifice contiguity, you can use a std::deque. This is an array-of-arrays kind of structure. It doesn't invalidate references, for I think any operation that increases its size. It depends on the details of your data type but generally its performance is much closed to a contiguous array than a linked list. Well done but old (5 year) benchmarks: https://baptiste-wicht.com/posts/2012/12/cpp-benchmark-vector-list-deque.html. Another option is to write a chunking allocator, to use either with deque or another structure. This is quite a bit more work though.
If you can use indices, then you can just use a vector
If you don't need to resize, you can still just use a vector and never resize it.
Unless you have a good reason, I would stick with indices. If your main performance bottlenecks are iteration related over a large number of elements (as your contiguity requirement implies), then this whole indexing thing should really be a non-issue. If you do have a very good reason for avoiding indices (which you haven't stated), then I would profile the deque versus the vector on the main loop operation to see how much worse the deque really does. It might be barely worse, and if neither deque nor vector work well enough for you, the next alternatives are quite a bit more work (probably involving allocators or a custom data structure).
Depending on your needs, if you can use the following data structure:
std::vector<std::unique_ptr<Foo>>
then no matter how the vector is resized, if you access your data via a Foo*, the pointer to foo will not be invalidated.
As the number of Foos you need to store in your vector changes, the vector may need to resize it's internal contiguous block of memory, which means any iterators you have pointing inside the vector will be invalidated when the vector resizes.
(You can read more here on C++0x iterator invalidation rules)
However, since the object stored in the vector is a pointer to an object elsewhere in the heap, the pointed-to-object (Foo in this example), will not be invalidated.
Note that the vector owns Foo (as alluded to by std::unique_ptr<Foo>), whilst you can store a non-owning pointer to Foo by keeping a Foo* as the means of accessing your Foo data.
So long as the vector outlives your need to access Foo via your Foo*, then you will not have any lifetime issues.
So in terms of your requirements:
store the array in a continuous data block (reduce cache misses)
yes, std::vector achieves this
access each element individually and independently of the array handle (pointers > indices)
yes, store a Foo* as your means of accessing each element individually, and that remains independent of the array handle (vector::iterator)
resize the array (dynamic)
yes, std::vector achieves this, automatically, resizing for you when you need it to.
Bonus:
Using a smart pointer (in this example std::unique_ptr) in the vector means memory management is also handled automatically for you. (Just make sure you don't try to access a Foo* after the vector is destroyed.
Edit:
It has been pointed out in the comments that storing a std::unique_ptr<Foo> in the vector violates your requirement for the objects to be stored in contiguous memory (if indeed that is what you mean by store the array in contiguous memory, as the vector's underlying array will be contiguous, but accessing the Foo objects will incur an indirection).
However, if you use a suitable allocator (eg arena allocator) for both the vector and the Foo objects, then you will have a higher chance of suffering fewer cache misses, as your Foo objects will exist near to the memory used by your vector, thereby having a higher chance of being in cache when iterating over the vector.

Considerations when choosing between storing values vs pointers in std:: containers

When storing objects in standard collections, what considerations should one think of when deciding between storing values vs pointers? Where the owner of these collections (ObjectOwner) is allocated on the heap. For small structs/objects I've been storing values, while for large objects I've been storing pointers. My reasoning for this was that when standard containers are resized, their contents are copied (small copy ok, big copy bad). Any other things to keep in mind here?
class ObjectOwner
{
public:
SmallObject& getSmallObject(int smallObjectId);
HugeObject* getHugeObject(int hugeObjectId);
private:
std::map<int, SmallObject> mSmallObjectMap;
std::map<int, HugeObject *> mHugeObjectMap;
};
Edit:
an example of the above for more context:
Create/Delete items stored in std::map relatively infrequently ( a few times per second)
Get from std::map frequently (once per 10 milliseconds)
small object: < 32 bytes
huge object: > 1024 bytes
I would store object by value unless I need it through pointer. Possible reasons:
I need to store object hierarchy in a container (to avoid slicing)
I need shared ownership
There are possibly other reasons, but reasoning by size is only valid for some containers (std::vector for example) and even there you can make object moving cost minimal (reserve enough room in advance for example). You example for object size with std::map does not make any sense as std::map does not relocate objects when growing.
Note: return type of a method should not reflect the way you store it in a container, but rather it should be based on method semantics, ie what you would do if object is not found.
Only your profiler knows the answer for your SPECIFIC case; trying to use pointers rather than objects is a reliable way of minimising the amount you copy when a copy must be done (be it a resize of a vector or a copy of the whole container); but sometimes you WANT that copy because it's a snapshot for a thread inside a mutex, and it's faster to copy the container than to hold the mutex and deal with the data.
Some objects might not be possible to keep in any way other than pointer because they're not copyable.
Any performance gain by using container of pointer could be offset by costs of having to write more copy code, or repeated calls to new().
There's not a one answer fits all, and before you worry about the performance here you should establish where the performance problems really are. (Just repeating the point - use a profiler!)

What is the best way to store list/vector of dynamically created objects?

I was always wondering about this and today I have finally came to a moment that I need to get the answer right.
Namely: I am confused how should I store a list/vector/? of dynamically created objects ?
For instance: I have this program which will read from file, data about connections and then on this basis create nodes (with connection information inside it) objects.
So the problem is: should I use std::vector<node> and in a for loop create temporary objects and push_back() them or use std::vector<node*> and push_back() pointers of dynamically allocated nodes and then in the end of the program delete them ?
I do have the number of the elements which I would like to store.
Or maybe there is a better way of doing this and I am not aware about it.
Oh, if you have a known size... allocate your vector with that known size (to eliminate the overhead of a resize during load), then the load is as efficient as a memcpy(). It'll just stack your values as expected. Highly efficient, more so than a pointer indirection.
If your elements are polymorphic, you'll need pointers. If you're concerned about cleaning up memory (you should be), consider using: http://www.boost.org/doc/libs/1_50_0/libs/ptr_container/doc/ptr_vector.html
[EDIT]
If everything is constant sized, also consider http://www.boost.org/doc/libs/1_50_0/doc/html/array.html. It's like a C array, but without any resize overhead and STL support... if you really need speed.
If you do not absolutely require dynamic allocation, do not use it.
You would use std::vector<node> in that scenario.
You know the types, sizes, and element counts of what you need in advance. What you describe is perfect for vector -- you're dealing with a contiguous allocation.
In general, use of std::list should be very rare; vector is usually the right choice.
Also note that you can use reserve() to set the allocation size prior to population when you know the size -- this could save a lot of reallocation and node copying.

Dynamic arrays vs STL vectors exact difference?

What is the exact difference between dynamic arrays and vectors. It was an interview question to me.
I said both have sequential memory.
Vectors can be grown in size at any point in the code. He then said even dynamic arrays can be grown in size after creating.
I said vectors are error free since it is in the standard library. He said he will provide as .so file of dynamic arrays which is error free and has all the qualities on par with STL.
I am confused and didn't answer the exact difference. When I searched on Internet, I had seen the above statements only.
Can someone please explain me the exact difference? And what was the interviewer expecting from me?
He said he will provide as .so file of dynamic arrays which is error free and has all the qualities on par with STL.
If his dynamic array class does the same as std::vector (that is: it implements RAII to clean up after itself, can grow and shrink and whatever else std::vector does), then there's only one major advantage std::vector has over his dynamic array class:
std::vector is standardized and everybody knows it. If I see a std::vector in some piece of code, I know exactly what it does and how it is supposed to be used. If, however, I see a my::dynamic_array, I do not know that at all. I would need to have to look at its documentation or even — gasp! — implementation to find out whether my_dynamic_array::resize() does the same as std::vector::resize().
A great deal here depends on what he means by a "dynamic array". Most people mean something where the memory is allocated with array-new and freed with array-delete. If that's the intent here, then having qualities on a par with std::vector simply isn't possible.
The reason is fairly simple: std::vector routinely allocates a chunk of memory larger than necessary to hold the number of elements currently being stored. It then constructs objects in that memory as needed to expand. With array-new, however, you have no choice -- you're allocating an array of objects, so if you allocate space for (say) 100 objects, you end up with 100 objects being created in that space (immediately). It simply has no provision for having a buffer some part of which contains real objects, and another part of which is just plain memory, containing nothing.
I suppose if yo want to stretch a point, it's possible to imitate std::vector and still allocate the space with array-new. To do it, you just have to allocate an array of char, and then use placement new to create objects in that raw memory space. This allows pretty much the same things as std::vector, because it is nearly the same thing as std::vector. We're still missing a (potential) level of indirection though -- std::vector actually allocates memory via an Allocator object so you can change exactly how it allocates its raw memory (by default it uses std::allocator<T>, which uses operator new, but if you wanted to, you could actually write an allocator that would use new char[size], though I can't quite imagine why you would).
You could, of course, write your dynamic array to use an allocator object as well. At that point, for all practical purposes you've just reinvented std::vector under a (presumably) new name. In that case, #sbi is still right: the mere fact that it's not standardized means it's still missing one of the chief qualities of std:::vector -- the quality of being standardized and already known by everybody who knows C++. Even without that, though, we have to stretch the phrase "dynamic array" to (and I'd posit, beyond) the breaking point to get the same qualities as std::vector, even if we ignore standardization.
I expect they wanted you to talk about the traps of forgetting to delete the dynamic array with operator delete[] and then got confused themselves when they tried to help you along; it doesn't make much sense to implement a dynamic array as a plain class since it bakes in the element type.
The array memory allocated for vectors is released when the vector goes out of scope, in case the vector is declared on the stack (the backing array will be on the heap).
void foo() {
vector<int> v;
// ... method body
// backing array will be freed here
}
It says here: "Internally, vectors use a dynamically allocated array to store their elements."
Underlying concept of vectors is dynamically allocated array.
http://www.cplusplus.com/reference/vector/vector/
Maybe it's that dynamic array you would go through the copy process to a new dynamic array whenever you want to resize, but you are able to control when it does that depending on your knowledge of the data going into the array.
Whereas a vector uses the same process, but a vector does not know if it will grow or not later, so it probably allocates extra storage for possible growth in size, therefore it COULD possibly consume more memory space than intended to manage itself compared to dynamic arrays.
So, I'd say the difference is to use a vector when managing it's size is not a big deal, where you would use a dynamic array when you would rather do the resizing yourself.
Arrays have to be deallocated explicitly if defined dynamically whereas vectors are automatically de-allocated from heap memory.
Size of array cannot be determined if dynamically allocated whereas Size of the vector can be determined in O(1) time.
3.When arrays are passed to a function, a separate parameter for size is also passed whereas in case of passing a vector to a function, there is no such need as vector maintains variables which keeps track of size of container at all times.
4.When we allocate array dynamically then after size is initialized we cannot change the size whereasin vector we can do it.