What's the fastest way to reinitialize a vector? - c++

What is the fastest way to reset all values for a large vector to its default values?
struct foo
{
int id;
float score;
};
std::vector<foo> large_vector(10000000);
The simplest way would be to create a new vector, but I guess it takes more time to reallocate memory than to reinitialize an existing one?
I have to iterate over the vector to collect non-zero scores (could be thousands or millions) before resetting it. Should I reset the structs one by one in this loop?
Edit:
The vector size is fixed and 'default value' means 0 for every struct member (all floats and ints).

What's the fastest way to reinitialize a vector?
Don't.
Just record the fact that the vector has no valid entries by calling clear(). This has the advantage of being both (probably) optimal, and guaranteed correct, and also being perfectly expressive. IMO none of the suggested alternatives should be considered unless profiling shows an actual need.
Your element type is trivial, so the linear upper-bound on complexity should in reality be constant for a decent quality implementation - there's no need to destroy each element in turn.
No memory is deallocated, or needs to be re-allocated later.
You'll just need to push_back or emplace_back when you're writing into the vector after clear()ing, instead of using operator[].
To make this consistent with the first use, don't initialize your vector with 10000000 value-constructed elements, but use reserve(10000000) to pre-allocate without initialization.
e.g.
// precondition: v is empty, so
// don't access v[i] until you've done
// v.push_back({id,score})
// at least i+1 times
void use(std::vector<foo> &v) {
}
int main() {
    std::vector<foo> v;
    v.reserve(10000000);
    while(keep_running) { // keep_running stands for your own loop condition
        use(v);
        v.clear();
    }
}
Since you need to zero your elements in-place, the second fastest general-purpose solution is probably to alter the loop above to
while(keep_running) {
v.resize(10000000);
use(v);
v.clear();
}
or alternatively to remove the clear() and use fill() to overwrite all elements in-place.
If non-zero elements are sparse, as may be the case if you're updating them based on some meaningful index, it might be faster to zero them on the fly as your main loop iterates over the vector.
Again, you really need to profile to find out which is better for your use case.
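The "zero them on the fly" idea can be sketched as below. This is only an illustration, not a measured recommendation: the name collect_and_reset is made up here, and it assumes that an element whose score is zero is already in its default state (so only collected elements need resetting).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct foo {
    int id;
    float score;
};

// Hypothetical helper: copies every element with a non-zero score into `out`
// and resets that element in the same pass. Elements whose score is already
// zero are assumed to be in their default state and are left untouched.
std::size_t collect_and_reset(std::vector<foo>& v, std::vector<foo>& out) {
    assert(&v != &out);  // appending to the vector being iterated would invalidate the loop
    std::size_t collected = 0;
    for (foo& f : v) {
        if (f.score != 0.0f) {
            out.push_back(f);
            f = foo{};  // value-initialization: id == 0, score == 0.0f
            ++collected;
        }
    }
    return collected;
}
```

Whether this beats a separate fill or memset pass depends on how sparse the non-zero entries are, so it is worth timing both on real data.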

In order to determine the fastest way you will need to run some benchmarks.
There are a number of different ways to "reinitialise" a vector:
Call clear(). For trivial types this should be roughly equivalent to just setting the vector's size to 0. The capacity of the vector doesn't change and no memory is deallocated. Destructors will be called on the elements if the type has them. As you push_back, emplace_back or resize the vector the old values will be overwritten.
Call assign(), e.g. large_vector.assign( large_vector.size(), foo() );. This will iterate through the whole vector resetting every element to its default value. Hopefully the compiler will manage to optimise this to a memset or similar.
As your type is trivial, if you want to just reset every element to 0 you should be able to do a memset, e.g.: memset( large_vector.data(), 0, sizeof(foo)*large_vector.size() );.
Call std::fill, e.g. std::fill( large_vector.begin(), large_vector.end(), foo() );. This should be similar to assign or memset.

What is the fastest way to reset all values for a large vector to its default values?
Depends on what vector in its "default values" means.
If you want to remove all elements, most efficient is std::vector::clear.
If you want to keep all elements in the vector but set their state, then you can use std::fill:
std::fill(large_vector.begin(), large_vector.end(), default_value);
If the element type is trivial, and the "default value" is zero†, then std::memset may be optimal:
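// decltype(large_vector[0]) is a reference type, which is never trivially copyable, so check the element type instead
static_assert(std::is_trivially_copyable_v<decltype(large_vector)::value_type>);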
std::memset(large_vector.data(), 0, large_vector.size() * sizeof(large_vector[0]));
To verify that std::memset is worth the trouble, you should measure (or inspect assembly). The optimiser may do the work for you.
† Zero in the sense that all bits are unset. C++ does not guarantee that this is a representation for a zero float. It also doesn't guarantee it to be a null pointer, in case your non-minimal use case uses pointers.
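A minimal sketch of the memset route with the caveats from the footnote turned into compile-time checks. The function name zero_in_place is invented for this example, and the second static_assert assumes you only care about platforms with IEEE 754 floats, where all-bits-zero is 0.0f.

```cpp
#include <cassert>
#include <cstring>
#include <limits>
#include <type_traits>
#include <vector>

struct foo {
    int id;
    float score;
};

// memset is only meaningful for trivially copyable element types.
static_assert(std::is_trivially_copyable<foo>::value,
              "foo must be trivially copyable to be zeroed with memset");
// Assumption: IEEE 754 (IEC 559) floats, where all bits unset means 0.0f.
static_assert(std::numeric_limits<float>::is_iec559,
              "all-bits-zero is not known to represent 0.0f on this platform");

// Hypothetical helper: zeroes every element in place, keeping size and capacity.
void zero_in_place(std::vector<foo>& v) {
    if (v.empty()) return;  // data() may be null for an empty vector
    std::memset(v.data(), 0, v.size() * sizeof(foo));
    // Debug-only sanity check that the zero pattern reads back as zero.
    assert(v.front().id == 0 && v.front().score == 0.0f);
}
```

If either static_assert fires, std::fill with a value-initialized foo is the portable fallback.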

Related

Can a vector of a "fixed" size also have a false size (compared to memory size) in c++

I know that for optimization reasons, the size of a vector might not be the actual size of the object in the memory.
For example, sometimes if I use push_back or resize, the vector in the memory would actually hold size for twice the number of elements.
I read that this is the case if I push or pop elements from the vector, or use resize, and even that shrink_to_fit doesn't always get the vector to the wanted size.
Is it the same way if I don't use all of these? I just initialize a vector of "fixed" number of elements, and never use them.
Can I assume that if I initialize a vector of size 16, it takes out place in memory as 16 * size of the elements?
Thank you.
Edit:
I need the vector for other use. The main problem is that I need to use a specific amount of memory. I can use array, but it would be much less convenient. I'm trying to understand the implementation of a line as:
std::vector<X> myVec(16);
As I said, I read that if I use push and pop, or resize, the OS (I think) can assign more space than I specified.
But I couldn't find anything that said that about a line as I wrote.
Thank you 2 :)
EDIT 2:
interesting findings:
{
vector<int> a(16);
vector<int> b(32);
vector<int> c(32);
cout << a.capacity(); // prints 16
cout << b.capacity(); // prints 32
b = a; // uses vector copy "="
cout << b.capacity(); // prints 32 (!!)
cout << c.capacity(); // prints 32
c = vector<int>(16); // uses vector move "="
cout << c.capacity(); // prints 16 (!!)
}
I tried this after reading about the implementation of the move and copy constructors in vectors.
So it looks like when assigning from an rvalue (i.e. move assignment), the vector ends up with the capacity of the temporary. But when using copy assignment, it doesn't free the memory, and the vector still has a capacity of 32.
The standard does not specify the initial capacity for the constructor you use, nor for any other constructor:
26.3.11.2 vector constructors, copy, and assignment [vector.cons]
explicit vector(size_type n, const Allocator& = Allocator());
Effects: Constructs a vector with n default-inserted elements using
the specified allocator. Requires: T shall be DefaultInsertable into
*this. Complexity: Linear in n.
That said, the result is implementation dependent (so check your compiler).
On the other hand none of the functions you mentioned guarantee the change in capacity as you want/expect.
It looks like you are looking for boost::container::static_vector. It is initialized with a fixed capacity and will never exceed it, while still allowing you to push and pop elements. It is a sort of boost::array that also tracks the number of items stored.
This is what .reserve() is for. It'll set the capacity to at least whatever you ask for, even if you're not using it all yet. You'll still get reallocations if you try to push_back past that.
Alternatively, create exactly 16 elements to begin with and use as many as you want whenever you want. Though there are no standard guarantees (vectors are not designed for people who are worried about 16 bytes of memory), there's no reason for an implementation to allocate more than that space up-front and I'm not aware of any that do.
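Since the standard only promises that capacity() is at least size(), the simplest thing is to check what your own implementation does. The sketch below is one way to do that; capacity_report, sized_construction and reserved_only are names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of what a vector reports about itself.
struct capacity_report {
    std::size_t size;
    std::size_t capacity;
};

// Builds a vector with n value-initialized ints, as in std::vector<int> v(n).
capacity_report sized_construction(std::size_t n) {
    std::vector<int> v(n);
    assert(v.capacity() >= v.size());  // the only relation the standard guarantees
    return {v.size(), v.capacity()};
}

// Builds an empty vector and only reserves room for n ints.
capacity_report reserved_only(std::size_t n) {
    std::vector<int> v;
    v.reserve(n);
    assert(v.capacity() >= n);  // reserve guarantees at least n
    return {v.size(), v.capacity()};
}
```

Printing the capacity field of sized_construction(16) on your compiler shows whether it allocates exactly 16 elements; common implementations are expected to, but that is an observation rather than a guarantee.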

Why is it considered bad style to use the index operator on a vector in C++?

I am working on a program that uses vectors. So the first thing I did was declare my vector.
std::vector<double> x;
x.reserve(10);
(BTW, is this also considered bad practice? Should I just type std::vector<double> x(10)?)
Then I proceeded to assign values to the vector, and ask for its size.
for (int i=0; i<10; i++)
{
x[i]=7.1;
}
std::cout<<x.size()<<std::endl;
I didn't know it would return 0, so after some searching I found out that I needed to use the push_back method instead of the index operator.
for (int i=0; i<10; i++)
{
x.push_back(7.1);
}
std::cout<<x.size()<<std::endl;
And now it returns 10.
So what I want to know is why the index operator lets me access the value "stored" in vector x at a given index, but won't change its size. Also, why is this bad practice?
When you do x.reserve(10) you only set the capacity to ten elements, but the size is still zero.
That means when you use the index operator in your loop you will go out of bounds (since the size is zero) and you will have undefined behavior.
If you want to set the size, then use either resize or simply tell it when constructing the vector:
std::vector<double> x(10);
As for the capacity of the vector, when you set it (using e.g. reserve) then it allocates the memory needed for (in your case) ten elements. That means when you do push_back there will be no reallocations of the vector data.
If you do not change the capacity, or add elements beyond the capacity, then each push_back may cause a reallocation of the vector data.
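The difference between capacity and size can be made visible with the bounds-checked at(), which throws instead of invoking undefined behavior. This is a small sketch; the helper names index_is_valid, make_reserved and make_sized are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if index i refers to an existing element of x.
// Uses at(), which throws std::out_of_range when i >= x.size().
bool index_is_valid(const std::vector<double>& x, std::size_t i) {
    try {
        (void)x.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Capacity for n elements, but size() is still 0.
std::vector<double> make_reserved(std::size_t n) {
    std::vector<double> x;
    x.reserve(n);
    assert(x.size() == 0 && x.capacity() >= n);
    return x;
}

// n value-initialized elements (0.0), so indices 0..n-1 exist.
std::vector<double> make_sized(std::size_t n) {
    std::vector<double> x(n);
    assert(x.size() == n);
    return x;
}
```

With make_reserved(10), index 0 is not valid until the first push_back; with make_sized(10), indices 0 to 9 are valid immediately.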
It sounds like you're asking why things are the way they are. Most of it is down to efficiency.
If x[i] were to create a value when it didn't already exist, there would be two hits to efficiency. First, every indexing operation would have to check whether the index is beyond the current size of the vector. Second, the new element would need to be default constructed even if you're about to assign a new value into it anyway.
The reason for having both reserve and resize is similar. resize requires a default construction of every element. For something like vector<double> that doesn't seem like a big deal, but for vector<ComplicatedClass>, it could be a big deal indeed. Using reserve is an optimization, completely optional, that allows you to anticipate the final size of the vector and prevent reallocations while it grows.
push_back avoids the default construction of an element: since the contents are known, it can use a move or copy constructor.
None of this is the wrong style, use whatever's appropriate for your situation.
std::vector<double> x;
x.reserve(10);
BTW, is this also considered bad practice?
No, creating an empty vector and reserving memory is not a bad practice.
Should I just type std::vector<double> x(10)?
If your intention is to initialize the vector of 10 elements, rather than empty one, then yes you should. (If your intention is to create an empty vector, then no)
Then I proceeded to assign values to the vector, and ask for its size.
for (int i=0; i<10; i++)
{
x[i]=7.1;
This has undefined behaviour. Do not try to access objects that do not exist.
so after some searching I found out that I needed to use the push_back method instead of the index operator.
That is one option. Another is to use the constructor to initialize the elements: std::vector<double> (10). Yet another is to use std::vector::resize.
Why is it considered bad style to use the index operator on a vector in C++?
It is not in general. It is wrong (not just bad style) if there are no elements at the index that you try to access.
Should I just type std::vector<double> x(10)?
Definitely yes!
As mentioned in @Some programmer dude's answer, std::vector::reserve() only affects allocation, not the size of the vector.
std::vector<double> x(10);
is actually equivalent to
std::vector<double> x;
x.resize(10);
The bracket operator of the std::vector lets you access an item at the index i in your vector. If an item i does not exist, it cannot be accessed, neither for writing nor for reading.
So what I want to know is why the index operator lets me access the value "stored" in vector x at a given index, but won't change its size.
Because it wasn't designed to work that way. Probably the designers did not think that this behaviour would be desirable.
Please also note that std::vector::reserve does reserve memory for the vector but does not actually change its size. So after calling x.reserve(10) your vector still has a size of 0, although internally memory for 10 elements has been allocated. If you now want to add an element, you must not use the bracket operator but std::vector::push_back instead. This function appends your item and increases the vector's size by one. The advantage of calling reserve is that the memory for the vector does not need to be reallocated when calling push_back multiple times.
std::vector<double> x;
x.reserve(3);
x.push_back(3);
x.push_back(1);
x.push_back(7);
I think the behaviour you desire could be achieved using std::vector::resize. This function reserves the memory as reserve would and then actually changes the size of the vector.
std::vector<double> x;
x.resize(3);
x[0] = 3;
x[1] = 1;
x[2] = 7;
The previous code is equivalent to:
std::vector<double> x(3);
x[0] = 3;
x[1] = 1;
x[2] = 7;
Here the size is the constructor argument. Creating the vector this way performs the resize operation on creation.

Freeing up memory by deleting a vector in a vector in C++

I have a vector of vectors and I wish to delete myvec[i] from memory entirely, free up the room, and so on. Will .erase or .clear do the job for me? If not, what should I do?
Completely Removing The Vector
If you want to completely remove the vector at index i in your myvec, so that myvec[i] will no longer exist and myvec.size() will be one less than it was before, you should do this:
myvec.erase (myvec.begin() + i); // Note that this only works on vectors
This will completely deallocate all memory owned by myvec[i] and will move all the elements after it (myvec[i + 1], myvec[i + 2], etc.) one place back so that myvec will have one less vector in it.
Emptying But Keeping The Vector
However, if you don't want to remove the ith vector from myvec, and you just want to completely empty it while keeping the empty vector in place, there are several methods you can use.
Basic Method
One technique that is commonly used is to swap the vector you want to empty out with a new and completely empty vector, like this:
// suppose the type of your vectors is vector<int>
vector<int>().swap (myvec[i]);
This is guaranteed to free up all the memory in myvec[i], it's fast and it doesn't allocate any new heap memory or anything.
This is used because the method clear does not offer such a guarantee. If you clear the vector, it always does set its size to zero and destruct all the elements, but it might not (depending on the implementation) actually free the memory.
In C++11, you can do what you want with two function calls, although shrink_to_fit is only a non-binding request to release the memory: (thanks for the helpful comment)
myvec[i].clear();
myvec[i].shrink_to_fit();
Generalization
You can write a small function that would work for most (probably all) STL containers and more:
template <typename T>
void Eviscerate (T & x)
{
T().swap (x);
}
which you use like this:
Eviscerate (myvec[i]);
This is obviously cleaner and more readable, not to mention more general.
In C++11, you can also use decltype to write a generic solution (independent of the type of your container and elements,) but it's very ugly and I only put it here for completeness:
// You should include <type_traits> for std::remove_reference
typename std::remove_reference<decltype(myvec[i])>::type().swap(myvec[i]);
My recommended method is the Eviscerate function above.
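To see the swap idiom applied to one row of a vector of vectors, here is a short sketch. The function name release_row is made up; the check that the row ends up with the same capacity as a freshly default-constructed vector reflects what the swap does, not a promise about the exact number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: empties rows[i] and hands its heap buffer to a
// temporary, which frees it when the temporary is destroyed.
// rows.size() is unchanged; only the inner vector loses its storage.
void release_row(std::vector<std::vector<int> >& rows, std::size_t i) {
    assert(i < rows.size());
    std::vector<int>().swap(rows[i]);
}
```

After the call, rows[i] is an empty vector that can be refilled later, and the other rows are untouched.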
myvec.erase( myvec.begin() + i ) will remove myvec[i]
completely, calling its destructor, and freeing all of its
dynamically allocated memory. It will not reduce the memory
used directly by myvec: myvec.size() will be reduced by one,
but myvec.capacity() will be unchanged. To remove this last
residue, C++11 has myvec.shrink_to_fit(), which might remove
it; otherwise, you'll have to make a complete copy of myvec,
then swap it in:
void
shrink_to_fit( MyVecType& target )
{
MyVecType tmp( target.begin(), target.end() );
target.swap( tmp );
}
(This is basically what shrink_to_fit will do under the hood.)
This is a very expensive operation, for very little real gain,
at least with regards to the removal of single elements; if you
are erasing a large number of elements, it might be worth
considering it after all of the erasures.
Finally, if you want to erase all of the elements,
myvec.clear() is exactly the same as myvec.erase() on each
element, with the same considerations described above. In this
case, creating an empty vector and swapping is a better
solution.

Pass nested C++ vector as built-in style multi-dimensional array

If I have a vector in C++, I know I can safely pass it as an array (pointer to the contained type):
void some_function(size_t size, int array[])
{
// impl here...
}
// ...
std::vector<int> test;
some_function(test.size(), &test[0]);
Is it safe to do this with a nested vector?
void some_function(size_t x, size_t y, size_t z, int* multi_dimensional_array)
{
// impl here...
}
// ...
std::vector<std::vector<std::vector<int> > > test;
// initialize with non-jagged dimensions, ensure they're not empty, then...
some_function(test.size(), test[0].size(), test[0][0].size(), &test[0][0][0]);
Edit:
If it is not safe, what are some alternatives, both if I can change the signature of some_function, and if I can't?
Short answer is "no".
The elements of std::vector<std::vector<std::vector<int> > > test; are not placed in one contiguous memory area.
You can only expect multi_dimensional_array to point to a contiguous memory block of size test[0][0].size() * sizeof(int). But that is probably not what you want.
It is error-prone to take the address of a location in a vector and pass it around: the pointer stays valid only until the vector reallocates. It might seem to work, but don't count on it.
The reason is closely tied to why a vector is a vector, and not an array. We want a vector to grow dynamically, unlike an array. We want insertion at the end of a vector to have a constant (amortized) cost that does not depend on the size of the vector, just like writing into an array until you hit its allocated size.
So how does the magic work? When there is no more internal space to add the next element to the vector, a new, larger block is allocated (typically twice the size of the old one). The old contents are copied to the new block and the old block is released, which leaves any pointer into the old block dangling. Growing by a constant factor is what makes the average cost of insertion constant.
Is it safe to do this with a nested vector?
Yes, IF you want to access the inner-most vector only, and as long as you know the number of elements it contains, and you don't try accessing more than that.
But seeing your function signature, it seems that you want to access all three dimensions; in that case, no, that isn't valid.
The alternative is that you can call the function some_function(size_t size, int array[]) for each inner-most vector (if that solves your problem); and for that you can do this trick (or something similar):
void some_function(std::vector<int> & v1int)
{
//the final call to some_function(size_t size, int array[])
//which actually process the inner-most vectors
some_function(v1int.size(), &v1int[0]);
}
void some_function(std::vector<std::vector<int> > & v2int)
{
//call some_function(std::vector<int> & v1int) for each element!
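// some_function is overloaded, so a cast is needed to select the std::vector<int> & overload
std::for_each(v2int.begin(), v2int.end(), static_cast<void (*)(std::vector<int> &)>(some_function));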
}
//call some_function(std::vector<std::vector<int> > & v2int) for each element!
std::for_each(test.begin(), test.end(), static_cast<void (*)(std::vector<std::vector<int> > &)>(some_function));
A very simple solution would be to simply copy the contents of the nested vector into one vector and pass it to that function. But this depends on how much overhead you are willing to take.
That being said: nested vectors aren't good practice. A matrix class storing everything in contiguous memory and managing access is really more efficient and less ugly, and would possibly allow something like T* matrix::get_raw(), but the ordering of the contents would still be an implementation detail.
Simple answer - no, it is not. Did you try compiling this? And why not just pass the whole 3D vector as a reference? If you are trying to access old C code in this manner, then you cannot.
It would be much safer to pass the vector, or a reference to it:
void some_function(std::vector<std::vector<std::vector<int>>> & vector);
You can then get the size and items within the function, leaving less risk for mistakes. You can copy the vector or pass a pointer/reference, depending on expected size and use.
If you need to pass across modules, then it becomes slightly more complicated.
Trying to use &top_level_vector[0] and pass that to a C-style function that expects an int* isn't safe.
To support correct C-style access to a multi-dimensional array, all the bytes of all the hierarchy of arrays would have to be contiguous. In a c++ std::vector, this is true for the items contained by a vector, but not for the vector itself. If you try to take the address of the top-level vector, ala &top_level_vector[0], you're going to get an array of vectors, not an array of int.
The vector structure isn't simply an array of the contained type. It is implemented as a structure containing a pointer, as well as size and capacity book-keeping data. Therefore the question's std::vector<std::vector<std::vector<int> > > is more or less a hierarchical tree of structures, stitched together with pointers. Only the final leaf nodes in that tree are blocks of contiguous int values. And each of those blocks of memory is not necessarily contiguous with any other block.
In order to interface with C, you can only pass the contents of a single vector. So you'll have to create a single std::vector<int> of size x * y * z. Or you could decide to re-structure your C code to handle a single 1-dimensional stripe of data at a time. Then you could keep the hierarchy, and only pass in the contents of leaf vectors.
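The single-vector layout could look roughly like this. It is a sketch under assumptions: grid3d and sum_cells are hypothetical names (sum_cells stands in for whatever C-style function you need to call), and the row-major index formula is one possible choice of ordering that the C side would have to agree with.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C-style consumer: expects x*y*z contiguous ints.
long sum_cells(std::size_t x, std::size_t y, std::size_t z, const int* cells) {
    long total = 0;
    for (std::size_t n = 0; n < x * y * z; ++n) {
        total += cells[n];
    }
    return total;
}

// Hypothetical wrapper: a 3D grid stored in one contiguous std::vector<int>.
class grid3d {
public:
    grid3d(std::size_t x, std::size_t y, std::size_t z)
        : x_(x), y_(y), z_(z), cells_(x * y * z) {}

    // Row-major layout: the last index (k) varies fastest in memory.
    int& at(std::size_t i, std::size_t j, std::size_t k) {
        assert(i < x_ && j < y_ && k < z_);
        return cells_[(i * y_ + j) * z_ + k];
    }

    const int* data() const { return cells_.data(); }
    std::size_t x() const { return x_; }
    std::size_t y() const { return y_; }
    std::size_t z() const { return z_; }

private:
    std::size_t x_, y_, z_;
    std::vector<int> cells_;
};
```

A call such as sum_cells(g.x(), g.y(), g.z(), g.data()) then hands the C-style function one block that really is contiguous across all three dimensions.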

C++: Assigning values to non-continuous indexes in vectors?

If I want to declare a vector of unknown size, then assign values to index 5, index 10, index 1, index 100, in that order. Is it easily doable in a vector?
It seems there's no easy way, because if I initialize a vector without a size, then I can't access index 5 without first allocating memory for it by doing resize() or six push_back()'s. But resize clears previously stored values in a vector. I can construct the vector by giving it a size to begin with, but I don't know how big the vector should be.
So how can I not have to declare a fixed size, and still access non-continuous indices in a vector?
(I doubt an array would be easier for this task).
Would an std::map between integer keys and values not be an easier solution here? Vectors will require a contiguous allocation of memory, so if you're only using the occasional index, you'll "waste" a lot of memory.
Resize doesn't clear the vector. You can easily do something like:
if (v.size() <= n)
v.resize(n+1);
v[n] = 42;
This will preserve all values in the vector and add just enough default initialized values so that index n becomes accessible.
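Wrapped into a function and run with the indices from the question, the same idea might look like this. The name set_at is made up for the example, and int is just a stand-in element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: grows v just enough for index n to exist, then assigns.
// Existing values are preserved; elements added by resize are value-initialized (0 for int).
void set_at(std::vector<int>& v, std::size_t n, int value) {
    if (v.size() <= n) {
        v.resize(n + 1);
    }
    assert(n < v.size());
    v[n] = value;
}
```

Calling set_at for index 5, then 10, then 1, then 100 leaves a vector of size 101 in which the four assigned values are intact and every other element is 0.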
That said, if you don't need all indexes or contiguous memory, you might consider a different data structure.
resize() doesn't clear previously stored values in a vector; see the documentation for std::vector::resize.
I would also argue that if this is what you need to do, then it's possible that vector may not be the container for you. Did you consider using a map, maybe?
Data structures which do not contain a contiguous set of values are known as sparse or compressed data structures. It seems that this is what you are looking for.
If this is the case, you want a sparse vector. There is one implemented in Boost.
Sparse structures are typically used to conserve memory. It is possible from your problem description that you don't actually care about memory use, but about addressing elements that don't yet exist (you want an auto-resizing container). In this case a simple solution with no external dependencies is as follows:
Create a template class that holds a vector and forwards all vector methods to it. Change your operator[] to resize the vector if the index is out of bounds.
// A vector that resizes on dereference if the index is out of bounds.
#include <vector>
template<typename T>
struct resize_vector
{
    typedef typename std::vector<T>::size_type size_type;
    typedef typename std::vector<T>::value_type value_type;
    // ... Repeat for iterator typedefs etc
    size_type size() const { return m_impl.size(); }
    // ... Repeat for all other vector methods you want
    value_type& operator[](size_type i)
    {
        if (i >= size())
            m_impl.resize(i + 1); // Grow so that index i exists
        return m_impl[i];
    }
    // You may want a const overload of operator[] that throws
    // instead of resizing (or make m_impl mutable, but that's ugly).
private:
    std::vector<T> m_impl;
};
As noted in other answers, elements aren't cleared when a vector is resized. Instead, when new elements are added by a resize, their default constructor is called. You therefore need to know when using this class that operator[] may return you a default constructed object reference. Your default constructor for <T> should therefore set the object to a sensible value for this purpose. You may use a sentinel value for example, if you need to know whether the element has previously been assigned a value.
The suggestion to use a std::map<size_t, T> also has merit as a solution, provided you don't mind the extra memory use, non-contiguous element storage and O(logN) lookup rather than O(1) for the vector. This all boils down to whether you want a sparse representation or automatic resizing; hopefully this answer covers both.
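For comparison, the std::map route needs very little code. The sketch below uses int values and two invented helpers, set_at and get_or; get_or exists because reading through operator[] would silently insert a default element for an index that was never assigned.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

typedef std::map<std::size_t, int> sparse_ints;

// Hypothetical helper: stores value at index i. Indices can arrive in any
// order and need not be contiguous; only assigned indices take up memory.
void set_at(sparse_ints& m, std::size_t i, int value) {
    m[i] = value;
    assert(m.count(i) == 1);
}

// Hypothetical helper: reads index i without inserting it.
// Returns fallback if i was never assigned.
int get_or(const sparse_ints& m, std::size_t i, int fallback) {
    sparse_ints::const_iterator it = m.find(i);
    return it == m.end() ? fallback : it->second;
}
```

Assigning indices 5, 10, 1 and 100 in that order stores exactly four entries, and iterating the map visits them in index order.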