Most of the time I am confused about how allocation/deallocation of STL objects is done. For example, take this loop:
vector<vector<int>> example;
for (/* some conditions */) {
    vector<int> row;
    for (/* some conditions */) {
        row.push_back(k); // k is some int
    }
    example.push_back(row);
}
In this case, what is happening with the object row? I can still see the values if I access them via example, which means that when I do example.push_back(row) a new copy is created. Am I correct? Is there a good way to prevent this (if I am correct)?
Also, can anyone give references where I can read up on how allocation/deallocation is handled in the STL, or what the best practices are to avoid such memory-copying issues (in large applications)?
Any help appreciated.
when I do example.push_back(row) a new copy is created. Am I correct?
Yes.
Is there a good way to prevent this?
Why would you want to prevent it? That behaviour is what makes vector simple and safe to use.
The standard library containers have value semantics, so they take a copy of the values you add to them and they manage the lifetime of those values, so you don't need to worry about it.
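As a quick illustration of that value semantics, here is a minimal sketch (not your exact code):
#include <iostream>
#include <vector>

int main()
{
    std::vector<std::vector<int> > example;

    std::vector<int> row;
    row.push_back(1);
    example.push_back(row); // example stores its own copy of row

    row.push_back(2);       // modifying row afterwards...
    std::cout << example[0].size() << '\n'; // ...prints 1: the stored copy is unaffected
}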
Also can anyone give references where I can read up how is allocation/deallocation handled in stl
Have you never heard of a search engine? Try http://www.sgi.com/tech/stl/Allocators.html for starters.
or what are best practices to avoid such memory copying issue(in case of large applications).
In general: forget about it. You usually don't need to worry about it, unless profiling has shown there's a performance problem.
std::vector does allow more fine-grained control over its memory usage, see the New members section and footnotes at http://www.sgi.com/tech/stl/Vector.html for more information.
For your example, you could add a new row to the example container then add the int values directly to it:
vector<vector<int>> example;
for (/* some conditions */) {
    example.resize(example.size()+1);
    vector<int>& row = example.back();
    for (/* some conditions */) {
        row.push_back(k); // k is some int
    }
}
Even better would be to reserve enough capacity in the vector in advance:
vector<vector<int>> example;
example.reserve( /* maximum expected size of vector */ );
for (/* some conditions */) {
    example.resize(example.size()+1);
    vector<int>& row = example.back();
    for (/* some conditions */) {
        row.push_back(k); // k is some int
    }
}
All an STL implementation has to do is obey the standard.
But std::swap is often used to exchange the contents of one vector with another. This can be used to prevent value copies from being taken and is a good way of achieving efficiency, at least in the pre-C++11 world. (In your case, push back an empty vector and swap it with the one you've created.)
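A rough sketch of that pre-C++11 swap idea applied to the loop from the question (same placeholder conditions and k as above):
vector<vector<int>> example;
for (/* some conditions */) {
    vector<int> row;
    for (/* some conditions */) {
        row.push_back(k); // k is some int
    }
    example.push_back(vector<int>()); // push an empty vector: cheap to copy
    example.back().swap(row);         // then steal row's contents without copying elements
}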
Related
Is there a way to reduce the capacity of a vector?
My code inserts values into a vector (not knowing their number beforehand), and when this finishes, the vectors are used only for read operations.
I guess I could create a new vector, do a .reserve() with the size and copy the items, but I don't really like the extra copy operation.
PS: I don't care for a portable solution, as long as it works for gcc.
std::vector<T>(v).swap(v);
Swapping the contents with another vector swaps the capacity.
std::vector<T>(v).swap(v); is equivalent to:
std::vector<T> tmp(v); // copy elements into a temporary vector
v.swap(tmp); // swap internal vector data
swap() only exchanges the vectors' internal data (pointer, size and capacity), so the temporary walks away with the old, oversized buffer and frees it when it is destroyed, while v keeps the right-sized copy.
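A small self-contained demonstration of the effect (the capacity values are implementation-dependent, so treat the numbers in the comments as indicative only):
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v;
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);
    v.resize(10);
    std::cout << v.size() << ' ' << v.capacity() << '\n'; // e.g. "10 1024"

    std::vector<int>(v).swap(v); // the swap trick
    std::cout << v.size() << ' ' << v.capacity() << '\n'; // typically "10 10"
}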
With C++11, you can call the member function shrink_to_fit(). The draft standard section 23.2.6.2 says:
shrink_to_fit is a non-binding request
to reduce capacity() to size(). [Note: The request is non-binding to
allow latitude for
implementation-specific optimizations.
—end note]
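For example (keeping in mind that an implementation is allowed to ignore the request):
std::vector<int> v;
// ... fill v ...
v.shrink_to_fit(); // request that capacity() be reduced to size(); may be a no-op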
Go look at Scott Meyers' Effective STL, Item 17.
Basically you can't directly reduce the storage size of a std::vector. resize() and reserve() will never reduce the actual memory footprint of a container. The "trick" is to create a new container of the right size, copy the data and swap that with the current container. If we would like to clear a container out, this is simply:
std::vector<T>().swap(v);
If we have to copy the data over then we need to do the copy:
std::vector<T>(v).swap(v);
What this does is create a new vector with the data from the old one, doing the copy that would be required by any operation that has the effect you need. Then calling swap() just swaps the internal buffers between the two objects. At the end of the statement the temporary vector that was created is destroyed, but it has the guts of the old vector, and the old vector has the guts of the new copy that is exactly the size we need.
The idiomatic solution is to swap with a newly constructed vector.
vector<int>().swap(v);
Edit: I misread the question. The code above will clear the vector. OP wants to keep the elements untouched, only shrink capacity() to size().
It is difficult to say if aJ's code will do that. I doubt there's a portable solution. For gcc, you'll have to take a look at their particular implementation of vector.
edit: So I've peeked at libstdc++ implementation. It seems that aJ's solution will indeed work.
vector<int>(v).swap(v);
See the source, line 232.
No, you cannot reduce the capacity of a vector without copying. However, you can control how much the allocation grows by checking capacity() and calling reserve() every time you insert something. The default behavior for std::vector is to grow its capacity by some implementation-defined factor (typically 1.5 or 2) every time new capacity is needed. You can grow it by your own magic ratio:
template <typename T>
void myPushBack(std::vector<T>& vec, const T& val) {
    if (vec.size() == vec.capacity()) {
        vec.reserve((vec.size() + 1) * my_magic_ratio); // my_magic_ratio > 1
    }
    vec.push_back(val);
}
If you're into a bit hacky techniques, you can always pass in your own allocator and do whatever you need to do to reclaim the unused capacity.
I'm not saying that GCC couldn't have some method for doing what you want without a copy, but it would be tricky to implement (I think) because vectors need to use an Allocator object to allocate and deallocate memory, and the interface for an Allocator doesn't include a reallocate() method. I don't think it would be impossible to do, but it might be tricky.
If you're worried about the overhead of your vector then maybe you should be looking at using another type of data structure. You mentioned that once your code is done initializing the vector it becomes a read-only process. I would suggest going with an open-ended array that allows the program to decide its capacity at compile time. Or perhaps a linked list would be more suitable to your needs.
Lemme know if I completely misunderstood what you were getting at.
-UBcse
Old thread, I know, but in case anyone is viewing this in the future: there's shrink_to_fit() in C++11, but since it is a non-binding request, the behaviour will depend on the implementation.
See: http://en.cppreference.com/w/cpp/container/vector/shrink_to_fit
I'm not an expert in C++, but it seems this solution works (at least compiling it with g++ does):
std::vector<int> some_vector(20); // initial size (and capacity) is 20
// first you gotta resize the vector;
some_vector.resize(10);
// then you can shrink to fit;
some_vector.shrink_to_fit();
// new capacity is 10
This also works:
v = std::vector<T>(v); // if we need to keep same data
v = std::vector<T>(); // if we need to clear
The right-hand side is a temporary, so the move-assignment operator (the && overload of operator=) is used and v simply takes over the temporary's right-sized (or empty) storage.
Get the "Effective STL" book by Scott Myers. It has a complete item jus on reducing vector's capacity.
I have a float vector. As I process certain data, I push it back. I always know what the size will be when declaring the vector.
For the largest case, it is 172,490,752 floats. This takes about eleven seconds just to push_back everything.
Is there a faster alternative, like a different data structure or something?
If you know the final size, then reserve() that size after you declare the vector. That way it only has to allocate memory once.
Also, you may experiment with using emplace_back() although I doubt it will make any difference for a vector of float. But try it and benchmark it (with an optimized build of course - you are using an optimized build - right?).
The usual way of speeding up a vector when you know the size beforehand is to call reserve on it before using push_back. This eliminates the overhead of reallocating memory and copying the data every time the previous capacity is filled.
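A minimal sketch of that, assuming the element count from the question and a placeholder compute() for producing each value:
#include <cstddef>
#include <vector>

std::vector<float> values;
values.reserve(172490752); // one allocation up front; size() is still 0

for (std::size_t i = 0; i < 172490752; ++i)
    values.push_back(compute(i)); // compute() stands in for however you produce each float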
Sometimes for very demanding applications this won't be enough. Even though push_back won't reallocate, it still needs to check the capacity every time. There's no way to know how bad this is without benchmarking, since modern processors are amazingly efficient when a branch is always/never taken.
You could try resize instead of reserve and use array indexing, but the resize forces a default initialization of every element; this is a waste if you know you're going to set a new value into every element anyway.
An alternative would be to use std::unique_ptr<float[]> and allocate the storage yourself.
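Something along these lines (again, compute() is just a placeholder):
#include <cstddef>
#include <memory>

const std::size_t n = 172490752;
std::unique_ptr<float[]> data(new float[n]); // uninitialised storage, freed automatically

for (std::size_t i = 0; i < n; ++i)
    data[i] = compute(i);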
Consider ::boost::container::stable_vector. Notice that allocating a contiguous block of about 172 million * 4 bytes (roughly 690 MB) might easily fail and requires quite a lot of page juggling. A stable_vector is essentially a list of smaller vectors or arrays of reasonable size, so no single huge contiguous allocation is needed. You may also want to populate it in parallel.
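If you try that route, a minimal usage sketch might look like this (assuming Boost.Container is available; compute() is again a placeholder):
#include <boost/container/stable_vector.hpp>
#include <cstddef>

boost::container::stable_vector<float> values;
for (std::size_t i = 0; i < 172490752; ++i)
    values.push_back(compute(i)); // elements live in stable nodes, not one huge block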
You could use a custom allocator which avoids default initialisation of all elements, as discussed in this answer, in conjunction with ordinary element access:
const size_t N = 172490752;
std::vector<float, uninitialised_allocator<float> > vec(N);
for(size_t i=0; i!=N; ++i)
    vec[i] = the_value_for(i);
This avoids (i) default initializing all elements, (ii) checking for capacity at every push, and (iii) reallocation, but at the same time preserves all the convenience of using std::vector (rather than std::unique_ptr<float[]>). However, the allocator template parameter is unusual, so you will need to use generic code rather than std::vector-specific code.
I have two answers for you:
As previous answers have pointed out, using reserve to allocate the storage beforehand can be quite helpful, but:
push_back (or emplace_back) has a performance penalty of its own, because every call has to check whether the vector needs to be reallocated. If you already know the number of elements you will insert, you can avoid this penalty by setting the elements directly using the access operator []
So the most efficient way I would recommend is:
Initialize the vector with the 'fill'-constructor:
std::vector<float> values(172490752, 0.0f);
Set the entries directly using the access operator:
values[i] = some_float;
++i;
The reason push_back is slow is that it will need to copy all the data several times as the vector grows, and even when it doesn’t need to copy data it needs to check. Vectors grow quickly enough that this doesn’t happen often, but it still does happen. A rough rule of thumb is that every element will need to be copied on average once or twice; the earlier elements will need to be copied a lot more, but almost half the elements won’t need to be copied at all.
You can avoid the copying, but not the checks, by calling reserve on the vector when you create it, ensuring it has enough space. You can avoid both the copying and the checks by creating it with the right size from the beginning, by giving the number of elements to the vector constructor, and then inserting using indexing as Tobias suggested; unfortunately, this also goes through the vector an extra time initializing everything.
If you know the number of floats at compile time and not just runtime, you could use an std::array, which avoids all these problems. If you only know the number at runtime, I would second Mark’s suggestion to go with std::unique_ptr<float[]>. You would create it with
size_t size = /* Number of floats */;
auto floats = std::unique_ptr<float[]>{new float[size]};
You don’t need to do anything special to delete this; when it goes out of scope it will free the memory. In most respects you can use it like a vector, but it won’t automatically resize.
Is there any big difference in allocation, deallocation and access time between std::vector<> and new[] when both have a fixed and equal length?
Depends on the types and how you call it. std::vector<int> v(1000000); has to zero a million ints, whereas new int[1000000]; doesn't, so I would expect a difference in speed. This is one place in std::vector where you might pay through the nose for something you don't use, if for some reason you don't care about the initial values of the elements.
If you compare std::vector<int> v(1000000); with new int[1000000](); then I doubt you'll see much difference. The significant question is whether one of them somehow has a more optimized loop for setting the zeros than the other one does. If so, then the implementation of the other one has missed a trick (or, more specifically, the optimizer has).
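A rough way to check this for yourself (timings obviously vary with compiler, flags and machine, and a serious benchmark also has to stop the optimizer from discarding unused allocations):
#include <chrono>
#include <iostream>
#include <vector>

int main()
{
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    std::vector<int> v(1000000); // value-initialises: a million zeros are written
    auto t1 = clock::now();
    int* p = new int[1000000];   // default-initialises: no zeroing
    auto t2 = clock::now();
    int* q = new int[1000000](); // value-initialises: zeroing, comparable to the vector
    auto t3 = clock::now();

    std::cout << "vector<int> v(N): " << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "new int[N]:       " << std::chrono::duration<double>(t2 - t1).count() << " s\n"
              << "new int[N]():     " << std::chrono::duration<double>(t3 - t2).count() << " s\n";

    delete[] p;
    delete[] q;
    (void)v;
}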
new is a bad thing, because it violates the single-responsibility principle by assuming two responsibilities: storage allocation and object construction. Complexity is the enemy of sanity, and you fight complexity by separating concerns and isolating responsibilities.
The standard library containers allow you to do just that, and only think about objects. Moreover, std::vector additionally allows you to think about storage, but separately, via the reserve/capacity interfaces.
So for the sake of keeping a clear mind about your program logic, you should always prefer a container such as std::vector:
std::vector<Foo> v;
// make some storage available
v.reserve(100);
// work with objects - no allocation is required
v.push_back(x);
v.push_back(f(1, 2));
v.emplace_back(true, 'a', 10);
I have a need to almost-constantly iterate over a sequence of structs in a read-only fashion but for every 1M+ reads, one of the threads may append an item. I think using a mutex would be overkill here and I also read somewhere that r/w locks have their own drawbacks for the readers.
I was thinking about using reserve() on a std::vector, but this answer ("Iterate over STL container using indices safe way to avoid using locks?") seemed to invalidate that.
Any ideas on what way might be fastest? The most important thing is for the readers to be able to quickly and efficiently iterate with as little contention as possible. The writing operations aren't time-sensitive.
Update: Another one of my use cases is that the "list" could contain pointers rather than structs, i.e. std::vector<MyClass*>. The same requirements apply.
Update 2: Hypothetic example
globally accessible:
typedef std::vector<MyClass*> Vector;
Vector v;
v.reserve(50);
Reader threads 1-10: (these run pretty much run all the time)
// ...
int total = 0;
for (Vector::const_iterator it = v.begin(); it != v.end(); ++it)
{
    MyClass* ptr = *it;
    total += ptr->getTotal();
}
// do something with total
// ...
Writer threads 11-15:
MyClass* ptr = new MyClass();
v.push_back(ptr);
That's basically what happens here. threads 1-15 could all be running concurrently although generally there are only 1-2 reading threads and 1-2 writer threads.
What I think could work here is your own implementation of a vector, something like this:
template <typename T> class Vector
{
    // constructor will be needed of course
public:
    std::shared_ptr<const std::vector<T> > getVector()
    { return mVector; }

    void push_back(const T&);

private:
    std::shared_ptr<std::vector<T> > mVector;
};
Then, whenever readers need to access a specific Vector, they should call getVector() and keep the returned shared_ptr until finished reading.
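A reader might then look roughly like this (assuming a globally accessible Vector<MyClass*> vec built from the class sketched above, and the MyClass/getTotal() from the hypothetical example):
std::shared_ptr<const std::vector<MyClass*> > snapshot = vec.getVector();

int total = 0;
for (std::vector<MyClass*>::const_iterator it = snapshot->begin(); it != snapshot->end(); ++it)
{
    total += (*it)->getTotal();
}
// 'snapshot' keeps the old storage alive even if a writer swaps in new storage in the meantime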
But writers should always use Vector's push_back to add a new value. This push_back should then check whether mVector->size() == mVector->capacity() and, if true, allocate a new vector and assign it to mVector. Something like:
template <typename T>
void Vector<T>::push_back(const T& t)
{
    if (mVector->size() == mVector->capacity())
    {
        // make certain here that new_size > old_size
        std::shared_ptr<std::vector<T> > vec(new std::vector<T>);
        vec->reserve(mVector->size() * SIZE_MULTIPLIER);
        vec->assign(mVector->begin(), mVector->end());
        mVector = vec; // readers still holding the old shared_ptr keep the old storage alive
    }
    // put 't' into 'mVector'; 'mVector' is guaranteed not to reallocate now
    mVector->push_back(t);
}
The idea here is inspired by RCU (read-copy-update) algorithm. If storage space is exhausted, the new storage should not invalidate the old storage as long as there is at least one reader accessing it. But, the new storage should be allocated and any reader coming after allocation, should be able to see it. The old storage should be deallocated as soon as no one is using it anymore (all readers are finished).
Since most HW architectures provide some way to have atomic increments and decrements, I think shared_ptr (and thus Vector) will be able to run completely lock-less.
One disadvantage to this approach though, is that depending on how long readers hold that shared_ptr you might end up with several copies of your data.
PS: hope I haven't made too many embarrassing errors in the code :-)
... using reserve() on a std::vector ...
This can only be useful if you can guarantee the vector will never need to grow. You've stated that the number of items is not bounded above, so you can't give that guarantee.
Notwithstanding the linked question, you could conceivably use std::vector just to manage memory for you, but it would take an extra layer of logic on top to work around the problems identified in the accepted answer.
The actual answer is: the fastest thing to do is minimize the amount of synchronization. What the minimal amount of synchronization is depends on details of your code and usage that you haven't specified.
For example, I sketched a solution using a linked-list of fixed-size chunks (a rough sketch follows the list of questions below). This means your common use case should be as efficient as an array traversal, but you're able to grow dynamically without re-allocating.
However, the implementation turns out to be sensitive to questions like:
- whether you need to remove items (whenever they're read? only from the front, or from other places?)
- whether you want the reader to busy-wait if the container is empty (and whether such waiting should use some kind of backoff)
- what degree of consistency is required
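For what it's worth, here is a very rough sketch of what such a chunked structure could look like in the simplest variant (single writer, grow-only, no removal; the name ChunkedList and the chunk size are made up for illustration):
#include <array>
#include <atomic>
#include <cstddef>

// Sketch only: one writer thread appends, many reader threads iterate concurrently.
// Full chunks are never moved or reallocated, so readers never see invalidated elements.
template <typename T, std::size_t ChunkSize = 1024>
class ChunkedList
{
    struct Chunk
    {
        std::array<T, ChunkSize> items;
        std::atomic<std::size_t> count;  // number of published items in this chunk
        std::atomic<Chunk*> next;
        Chunk() : count(0), next(nullptr) {}
    };

    Chunk head_;                         // first chunk lives inside the container
    std::atomic<Chunk*> tail_;

public:
    ChunkedList() : tail_(&head_) {}

    ~ChunkedList()                       // call only when no readers or writers remain
    {
        Chunk* c = head_.next.load();
        while (c) { Chunk* n = c->next.load(); delete c; c = n; }
    }

    void push_back(const T& value)       // single writer only
    {
        Chunk* tail = tail_.load(std::memory_order_relaxed);
        std::size_t n = tail->count.load(std::memory_order_relaxed);
        if (n == ChunkSize)
        {
            Chunk* fresh = new Chunk;
            fresh->items[0] = value;
            fresh->count.store(1, std::memory_order_release);
            tail->next.store(fresh, std::memory_order_release); // publish the new chunk
            tail_.store(fresh, std::memory_order_relaxed);
            return;
        }
        tail->items[n] = value;
        tail->count.store(n + 1, std::memory_order_release);    // publish the new item
    }

    template <typename F>
    void for_each(F f) const             // readers: visit everything published so far
    {
        for (const Chunk* c = &head_; c != nullptr; c = c->next.load(std::memory_order_acquire))
        {
            std::size_t n = c->count.load(std::memory_order_acquire);
            for (std::size_t i = 0; i < n; ++i)
                f(c->items[i]);
        }
    }
};
Whether something like this actually beats a mutex around a plain std::vector depends entirely on the answers to the questions above, so benchmark before committing to it.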