How to know when a vector’s internal array is reallocated?

How to know when a vector’s internal array is reallocated? - c++

I have a std::vector containing a lot of elements. As the vector is huge, I store pointers to some specific elements in another vector, to be able to access them faster. But as the vector grows, sometimes its internal container gets reallocated, causing all my pointers to become invalid.
Is there a way to know when this happens? This way I can update the pointers in the other list.

You shoudldn't store pointers, you should store indexes :
i.e, instead of :
var1 = &vector[0];
var2 = &vector[13];
and access *var1, *var2
you should store :
i1 = 0;
i2 = 13;
and access vector[i1], vector[i2]
Note : you should still be careful if you use modifier methods :
pop_back() (which makes the last position invalid)
erase(i) (which shifts all the indexes greater than i)
etc ...
(your first method had the same caveat)

Maybe you should have a look at boost::container::stable_vector:
http://www.boost.org/doc/libs/1_51_0/doc/html/boost/container/stable_vector.html

Instead of storing the elements in the large vector directly, you could store pointers to the individual elements. So instead of a std::vector<int> you use a std::vector<int *>. Even when the vector reallocates its content, the addresses of the data itself won't change, so other pointers to it will stay valid. This, however, requires you to create each element you enter into the vector with new and then manually delete any data which is removed.

I apologise, I was plain wrong in my comment. N3337 23.3.6.3 vector capacity paragraph 5:
Remarks: Reallocation invalidates all the references, pointers, and
iterators referring to the elements in the sequence. It is guaranteed
that no reallocation takes place during insertions that happen after a
call to reserve() until the time when an insertion would make the
size of the vector greater than the value of capacity().

Related

Updating pointers to elements in array when array content changes

Say I have a simple contiguous array or vector containing some elements of type T
std::vector<T> someVector;
I have several raw pointers to the insides of the vector distributed around the application.
T* pointerOne = &someVector[5];
T* another = &someVector[42];
T* evenMore = &someVector[55];
However, the elements in the vector sometimes move around in the application, which can invalidate pointers (as in: doesn't point to what it's supposed to point at anymore):
std::swap(someVector[4],someVector[5]); //Oops! pointerOne now essentially points to whatever was in someVector[4], and the correct object that was in someVector[5] has moved places
What's an efficient (in terms of performance and memory footprint [although that probably goes hand in hand]) system for keeping these pointers updated when the contents of the array move around?
Some notes:
elements switch their positions very infrequently. num(location changes) << num(accesses to elements). This means that I'd like to keep pointers which are updated instead of introducing some other system that abstracts this problem away, because dereferencing a pointer is as fast as I can get in the application, and performance is very important here.
all of the Ts will always be inside a contiguous array. It won't at some point in development change to become some other container type, like a map.
I do know (and can modify) the code parts where the Ts are moved around inside the array. In fact that happens inside a single function. I.e. the system doesn't need to monitor the memory and somehow automatically detect at runtime if the contents of the array changes.

How about holding a reverse map to the pointers. This could be an array (or vector) in the length of your original array that holds pointers to the pointers you created. For instance at index 5 of this reverse map, you will have pointers to all the pointers that point at element 5 in the original array. Now if element 5 is swapped with say element 6, just go over all the pointers in the reverse map at index 5, set them to point at element 6 in the original array and also move all these pointers to index 6 of the reverse map. You can do this work from the single point in your code that moves stuff around.

An idea similar to user3678664's suggestion optimised for the case when there are many elements in the vector and few pointers:
Create a function that uses a
multimap<int, T**> ptrmap;
to save the addresses of the pointers.
void RegisterPointer(int pos, T** ptr)
{
ptrmap.insert(pair<int, T**>(pos, ptr));
}
Then write a function void UpdateSwappedPtrs(int pos, int newpos) that updates the swapped pointers by iterating through all the pointers in the ptrmap at pos and newpos.

Finding information in a vector?

In C++ I understand that in order to create a dynamic array you need to use vectors. However I have a problem when I need to find information I put in the vector.
For example:
Lets say I have a simple vector that stores the name of a person and a small message the wrote. In the vector how do I find where Bill is located.
I was also trying to understand how to do this in PHP when I posted this question.

Indeed you seam confused. Let me try to help you.
One thing that is maybe confusing you: std::vector is not a geometric vector. It's only a sequence of data of the same type that is contiguous in memory. So it's like an array.
a) Determine the size of a vector based on a variable. For example if
I was using an array it would look something like array [x][y] ( I
know it's not possible to do this). How would I do this with a vector
std::vector is basically a automatically managed dynamic array.
It means that it IS an array inside, but it's managed by code that will make sure that array grows (gets bigger) when you try to add more data than it current capacity can hold.
Actually, std::vector is a class template. It means that it's not a real class, it's code that the compiler will use to generate itself a real class. If I say
std::vector<int> my_ints; // this is a vector of ints
This vector can only hold ints. And then:
std::vector<std::string> name_list;
this one hold std::string objects.
As I was saying, inside, it's only code to manage an array dynamically. You can think the previous examples as if it was like that:
class
{
unsigned long size; // count of elements contained in this container
unsigned long capacity; // count of elements that the memory allocated by the array can hold
int* array; // array containing the values, created using new, destroyed using delete
}
my_ints;
This is an oversimplified view of how it is inside, so don't assume it's exactly like that, but it might be useful.
Now, when you add values, the value is copied in the memory of the array, in an element that is not used yet (through push_back() for example) or writing over an element already existing (using insert() for example).
If you add a value and the capacity of the vector is not enough to hold all values, then the the vector will automatically grow: it will create a much bigger array, copy it's current values inside, copy the additional value too, then delete the array it had before.
It's important to understand this: if a vector grows, then you can't assume that it's data is always at the same adress in memory, so pointers to it's data can't be trusted.
b) second how would I using the push back command to store the value
of a variable inside a specific spot. Again if I was using an array
it'd be like array[x][y] += q. Where x and y are the spot in the array
and q is the value.
You don't use push_back() to add a value between two values, you use insert().
The syntaxe array[x][y] += q Will certainly not do what you describe. It will add q to the value at the position array[x][y].

Arrays are different to std::vector because they are of a fixed size. All elements of the array exist while the array exists. When you create a std::vector with its default constructor, it is empty. It contains no elements, so you cannot index any elements.
However, std::vector does have a constructor that takes the initial size. If you pass a single int argument to the std::vector constructor, it will default initialise that many elements. For example:
std::vector<int> v(10); // Will have 10 ints
If you want the equivalent of a 2D array, then you'll need a std::vector<std::vector<T>>. If you want to construct it with a specific size, you will need specify the size of the outer std::vector as above, and pass it the std::vector that each element should be initialised to. For example, if you want a 10x20 vector:
// This will have 10x20 ints
std::vector<std::vector<int>> v(10, std::vector<int>(20));
Once these elements exist, you can index them just as you would an array:
int value = v[x][y];
It's worth noting that C++11 introduces std::array which has a compile-time fixed size. You could use it like this:
std::array<std::array<int, 20>, 10> arr;
However, you cannot use this if you want your array size to be determined by a variable. The dimensions must be compile-time constants.

Pass nested C++ vector as built-in style multi-dimensional array

If I have a vector in C++, I know I can safely pass it as an array (pointer to the contained type):
void some_function(size_t size, int array[])
{
// impl here...
}
// ...
std::vector<int> test;
some_function(test.size(), &test[0]);
Is it safe to do this with a nested vector?
void some_function(size_t x, size_t y, size_t z, int* multi_dimensional_array)
{
// impl here...
}
// ...
std::vector<std::vector<std::vector<int> > > test;
// initialize with non-jagged dimensions, ensure they're not empty, then...
some_function(test.size(), test[0].size(), test[0][0].size(), &test[0][0][0]);
Edit:
If it is not safe, what are some alternatives, both if I can change the signature of some_function, and if I can't?

Short answer is "no".
Elements here std::vector<std::vector<std::vector<int> > > test; are not replaced in contiguous memory area.

You can only expect multi_dimensional_array to point to a contiguos memory block of size test[0][0].size() * sizeof(int). But that is probably not what you want.

It is erroneous to take the address of any location in a vector and pass it. It might seem to work, but don't count on it.
The reason why is closely tied to why a vector is a vector, and not an array. We want a vector to grow dynamically, unlike an array. We want insertions into a vector be a constant cost and not depend on the size of the vector, like an array until you hit the allocated size of the array.
So how does the magic work? When there is no more internal space to add a next element to the vector, a new space is allocated twice the size of the old. The old space is copied to the new and the old space is no longer needed, or valid, which makes dangling any pointer to the old space. Twice the space is allocated so the average cost of insertion to the vector that is constant.

Is it safe to do this with a nested vector?
Yes, IF you want to access the inner-most vector only, and as long you know the number of elements it contains, and you don't try accessing more than that.
But seeing your function signature, it seems that you want to acess all three dimensions, in that case, no, that isn't valid.
The alternative is that you can call the function some_function(size_t size, int array[]) for each inner-most vector (if that solves your problem); and for that you can do this trick (or something similar):
void some_function(std::vector<int> & v1int)
{
//the final call to some_function(size_t size, int array[])
//which actually process the inner-most vectors
some_function(v1int.size(), &v1int[0]);
}
void some_function(std::vector<std::vector<int> > & v2int)
{
//call some_function(std::vector<int> & v1int) for each element!
std::for_each(v2int.begin(), v2int.end(), some_function);
}
//call some_function(std::vector<std::vector<int> > & v2int) for each element!
std::for_each(test.begin(), test.end(), some_function);

A very simple solution would be to simply copy the contents of the nested vector into one vector and pass it to that function. But this depends on how much overhead you are willing to take.
That being sad: Nested vectorS aren't good practice. A matrix class storing everything in contiguous memory and managing access is really more efficient and less ugly and would possibly allow something like T* matrix::get_raw() but the ordering of the contents would still be an implementation detail.

Simple answer - no, it is not. Did you try compiling this? And why not just pass the whole 3D vector as a reference? If you are trying to access old C code in this manner, then you cannot.

It would be much safer to pass the vector, or a reference to it:
void some_function(std::vector<std::vector<std::vector<int>>> & vector);
You can then get the size and items within the function, leaving less risk for mistakes. You can copy the vector or pass a pointer/reference, depending on expected size and use.
If you need to pass across modules, then it becomes slightly more complicated.

Trying to use &top_level_vector[0] and pass that to a C-style function that expects an int* isn't safe.
To support correct C-style access to a multi-dimensional array, all the bytes of all the hierarchy of arrays would have to be contiguous. In a c++ std::vector, this is true for the items contained by a vector, but not for the vector itself. If you try to take the address of the top-level vector, ala &top_level_vector[0], you're going to get an array of vectors, not an array of int.
The vector structure isn't simply an array of the contained type. It is implemented as a structure containing a pointer, as well as size and capacity book-keeping data. Therefore the question's std::vector<std::vector<std::vector<int> > > is more or less a hierarchical tree of structures, stitched together with pointers. Only the final leaf nodes in that tree are blocks of contiguous int values. And each of those blocks of memory are not necessarily contiguous to any other block.
In order to interface with C, you can only pass the contents of a single vector. So you'll have to create a single std::vector<int> of size x * y * z. Or you could decide to re-structure your C code to handle a single 1-dimensional stripe of data at a time. Then you could keep the hierarchy, and only pass in the contents of leaf vectors.

Pointers to elements of std::vector and std::list

I'm having a std::vector with elements of some class ClassA. Additionally I want to create an index using a std::map<key,ClassA*> which maps some key value to pointers to elements contained in the vector.
Is there any guarantee that these pointers remain valid (and point to the same object) when elements are added at the end of the vector (not inserted). I.e, would the following code be correct:
std::vector<ClassA> storage;
std::map<int, ClassA*> map;
for (int i=0; i<10000; ++i) {
storage.push_back(ClassA());
map.insert(std::make_pair(storage.back().getKey(), &(storage.back()));
}
// map contains only valid pointers to the 'correct' elements of storage
How is the situation, if I use std::list instead of std::vector?

Vectors - No. Because the capacity of vectors never shrinks, it is guaranteed that references, pointers, and iterators remain valid even when elements are deleted or changed, provided they refer to a position before the manipulated elements. However, insertions may invalidate references, pointers, and iterators.
Lists - Yes, inserting and deleting elements does not invalidate pointers, references, and iterators to other elements

As far as I understand, there is no such guarantee. Adding elements to the vector will cause elements re-allocation, thus invalidating all your pointers in the map.

Use std::deque! Pointers to the elements are stable when only push_back() is used.
Note: Iterators to elements may be invalidated! Pointers to elements won't.
Edit: this answer explains the details why: C++ deque's iterator invalidated after push_front()

I'm not sure whether it's guaranteed, but in practice storage.reserve(needed_size) should make sure no reallocations occur.
But why don't you store indexes?
It's easy to convert indexes to iterators by adding them to the begin iterator (storage.begin()+idx) and it's easy to turn any iterator into a pointer by first dereferencing it, and then taking its address (&*(storage.begin()+idx)).

Just make them both store pointers an explicitly delete the objects when you don't need them.
std::vector<ClassA*> storage;
std::map<int, ClassA*> map;
for (int i=0; i<10000; ++i) {
ClassA* a = new ClassA()
  storage.push_back(a)
  map.insert(std::make_pair(a->getKey(), a))
}
// map contains only valid pointers to the 'correct' elements of storage

From one of the comments to another answer, it seems as if all that you want is centralize (ease) memory management. If that is really the case, you should consider using prepackaged solutions like the boost pointer container library and keep your own code as simple as possible.
In particular, take a look at ptr_map

for vectors no.
for lists yes.
how?
iterator works as a pointer to a particular node in the list.
so you can assign values to any struct like:
list mylist;
pair< list::iterator ,int > temp;
temp = make_pair( mylist.begin() , x );

C++: Assigning values to non-continuous indexes in vectors?

If I want to declare a vector of unknown size, then assign values to index 5, index 10, index 1, index 100, in that order. Is it easily doable in a vector?
It seems there's no easy way. Cause if I initialize a vector without a size, then I can't access index 5 without first allocating memory for it by doing resize() or five push_back()'s. But resize clears previously stored values in a vector. I can construct the vector by giving it a size to begin with, but I don't know how big the vector should.
So how can I not have to declare a fixed size, and still access non-continuous indices in a vector?
(I doubt an array would be easier for this task).

Would an std::map between integer keys and values not be an easier solution here? Vectors will require a contiguous allocation of memory, so if you're only using the occasional index, you'll "waste" a lot of memory.

Resize doesn't clear the vector. You can easily do something like:
if (v.size() <= n)
v.resize(n+1);
v[n] = 42;
This will preserve all values in the vector and add just enough default initialized values so that index n becomes accessible.
That said, if you don't need all indexes or contigous memory, you might consider a different data structure.

resize() doesn't clear previously stored values in a vector.
see this documentation
I would also argue that if this is what you need to do then its possible that vector may not be the container for you. Did you consider using map maybe?

Data structures which do not contain a contiguous set of values are known as sparse or compressed data structures. It seems that this is what you are looking for.
If this is case, you want a sparse vector. There is one implemented in boost, see link text
Sparse structures are typically used to conserve memory. It is possible from your problem description that you don't actually care about memory use, but about addressing elements that don't yet exist (you want an auto-resizing container). In this case a simple solution with no external dependencies is as follows:
Create a template class that holds a vector and forwards all vector methods to it. Change your operator[] to resize the vector if the index is out of bounds.
// A vector that resizes on dereference if the index is out of bounds.
template<typename T>
struct resize_vector
{
typedef typename std::vector<T>::size_type size_type;
// ... Repeat for iterator/value_type typedefs etc
size_type size() const { return m_impl.size() }
// ... Repeat for all other vector methods you want
value_type& operator[](size_type i)
{
if (i >= size())
resize(i + 1); // Resize
return m_impl[i];
}
// You may want a const overload of operator[] that throws
// instead of resizing (or make m_impl mutable, but thats ugly).
private:
std::vector<T> m_impl;
};
As noted in other answers, elements aren't cleared when a vector is resized. Instead, when new elements are added by a resize, their default constructor is called. You therefore need to know when using this class that operator[] may return you a default constructed object reference. Your default constructor for <T> should therefore set the object to a sensible value for this purpose. You may use a sentinel value for example, if you need to know whether the element has previously been assigned a value.
The suggestion to use a std::map<size_t, T> also has merit as a solution, provided you don't mind the extra memory use, non-contiguous element storage and O(logN) lookup rather than O(1) for the vector. This all boils down to whether you want a sparse representation or automatic resizing; hopefully this answer covers both.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to know when a vector’s internal array is reallocated? - c++

Maybe you should have a look at boost::container::stable_vector: http://www.boost.org/doc/libs/1_51_0/doc/html/boost/container/stable_vector.html

Related

Updating pointers to elements in array when array content changes

Finding information in a vector?

Pass nested C++ vector as built-in style multi-dimensional array

Pointers to elements of std::vector and std::list

C++: Assigning values to non-continuous indexes in vectors?

Categories

Resources