Faster alternative to push_back (size is known) - C++

I have a float vector. As I process certain data, I push it back. I always know what the size will be when declaring the vector.
For the largest case, it is 172,490,752 floats. This takes about eleven seconds just to push_back everything.
Is there a faster alternative, like a different data structure or something?

If you know the final size, then reserve() that size after you declare the vector. That way it only has to allocate memory once.
Also, you may experiment with emplace_back(), although I doubt it will make any difference for a vector of float. But try it and benchmark it (with an optimized build, of course - you are using an optimized build, right?).
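For concreteness, a minimal sketch of the reserve-then-push_back approach (the_next_float is a hypothetical stand-in for whatever processing produces each value):

#include <cstddef>
#include <vector>

// hypothetical stand-in for the question's data processing
float the_next_float(std::size_t i) { return static_cast<float>(i); }

int main() {
    const std::size_t n = 172490752;     // size from the question
    std::vector<float> v;
    v.reserve(n);                        // one allocation; capacity >= n, size == 0
    for (std::size_t i = 0; i != n; ++i)
        v.push_back(the_next_float(i));  // never reallocates, never copies
}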

The usual way of speeding up a vector when you know the size beforehand is to call reserve on it before using push_back. This eliminates the overhead of reallocating memory and copying the data every time the previous capacity is filled.
Sometimes for very demanding applications this won't be enough. Even though push_back won't reallocate, it still needs to check the capacity every time. There's no way to know how bad this is without benchmarking, since modern processors are amazingly efficient when a branch is always/never taken.
You could try resize instead of reserve and use array indexing, but the resize forces a default initialization of every element; this is a waste if you know you're going to set a new value into every element anyway.
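To make the resize/reserve distinction concrete, a small sketch (sizes chosen arbitrarily):

#include <vector>

int main() {
    std::vector<float> a, b;
    a.reserve(1000); // a.size() == 0, a.capacity() >= 1000; no elements created
    b.resize(1000);  // b.size() == 1000; every element value-initialized to 0.0f
    b[42] = 3.14f;   // fine: the element exists
    // a[42] = 3.14f; would be undefined behaviour: no element exists yet
}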
An alternative would be to use std::unique_ptr<float[]> and allocate the storage yourself.

::boost::container::stable_vector. Notice that allocating a contiguous block of roughly 690 MB (172,490,752 floats at 4 bytes each) might easily fail and requires quite a lot of page juggling. A stable_vector instead keeps its elements in individually allocated nodes behind a contiguous index, so it never needs one huge contiguous block. You may also want to populate it in parallel.
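For what it's worth, a minimal sketch of that suggestion, assuming Boost.Container is available (the values pushed are placeholders):

#include <boost/container/stable_vector.hpp>
#include <cstddef>

int main() {
    const std::size_t n = 172490752;
    boost::container::stable_vector<float> v;
    for (std::size_t i = 0; i != n; ++i)
        v.push_back(static_cast<float>(i)); // placeholder for the real data
}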

You could use a custom allocator which avoids default initialisation of all elements, as discussed in this answer, in conjunction with ordinary element access:
const size_t N = 172490752;
std::vector<float, uninitialised_allocator<float>> vec(N);
for (size_t i = 0; i != N; ++i)
    vec[i] = the_value_for(i);
This avoids (i) default initializing all elements, (ii) checking for capacity at every push, and (iii) reallocation, but at the same time preserves all the convenience of using std::vector (rather than std::unique_ptr<float[]>). However, the allocator template parameter is unusual, so you will need to use generic code rather than std::vector-specific code.
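The linked answer isn't reproduced here, but one possible shape for such an allocator is sketched below (C++11; the default-construct no-op is only safe for trivially default-constructible element types such as float):

#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

template <typename T, typename A = std::allocator<T>>
struct uninitialised_allocator : A {
    using a_t = std::allocator_traits<A>;
    template <typename U>
    struct rebind {
        using other =
            uninitialised_allocator<U, typename a_t::template rebind_alloc<U>>;
    };
    using A::A;
    // default construction becomes a no-op: the element is left uninitialized
    template <typename U>
    void construct(U*) noexcept {}
    // any other construction is forwarded to placement new as usual
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args) {
        ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...);
    }
};

int main() {
    const std::size_t N = 172490752;
    std::vector<float, uninitialised_allocator<float>> vec(N); // no zeroing pass
    for (std::size_t i = 0; i != N; ++i)
        vec[i] = static_cast<float>(i); // the_value_for(i) in the answer
}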

I have two answers for you:
As previous answers have pointed out, using reserve to allocate the storage beforehand can be quite helpful, but:
push_back (or emplace_back) itself has a performance penalty: during every call it has to check whether the vector must be reallocated. If you already know the number of elements you will insert, you can avoid this penalty by setting the elements directly using the access operator []
So the most efficient way I would recommend is:
Initialize the vector with the 'fill'-constructor:
std::vector<float> values(172490752, 0.0f);
Set the entries directly using the access operator:
values[i] = some_float;
++i;
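Assembled into a runnable form (the value written is a placeholder):

#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 172490752;
    std::vector<float> values(n, 0.0f);    // one allocation, one zeroing pass
    for (std::size_t i = 0; i != n; ++i)
        values[i] = static_cast<float>(i); // some_float in the answer
}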

The reason push_back is slow is that it will need to copy all the data several times as the vector grows, and even when it doesn’t need to copy data it needs to check. Vectors grow quickly enough that this doesn’t happen often, but it still does happen. A rough rule of thumb is that every element will need to be copied on average once or twice; the earlier elements will need to be copied a lot more, but almost half the elements won’t need to be copied at all.
You can avoid the copying, but not the checks, by calling reserve on the vector when you create it, ensuring it has enough space. You can avoid both the copying and the checks by creating it with the right size from the beginning, by giving the number of elements to the vector constructor, and then inserting using indexing as Tobias suggested; unfortunately, this also goes through the vector an extra time initializing everything.
If you know the number of floats at compile time and not just runtime, you could use an std::array, which avoids all these problems. If you only know the number at runtime, I would second Mark’s suggestion to go with std::unique_ptr<float[]>. You would create it with
size_t size = /* Number of floats */;
auto floats = std::unique_ptr<float[]>{new float[size]};
You don’t need to do anything special to delete this; when it goes out of scope it will free the memory. In most respects you can use it like a vector, but it won’t automatically resize.
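Completed into a self-contained form (the fill values are placeholders):

#include <cstddef>
#include <memory>

int main() {
    std::size_t size = 172490752;                 // number of floats
    auto floats = std::unique_ptr<float[]>{new float[size]}; // uninitialized storage
    for (std::size_t i = 0; i != size; ++i)
        floats[i] = static_cast<float>(i);        // placeholder values
    // no explicit delete needed: freed when floats goes out of scope
}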

Related

Are C++ vector constructors efficient?

If I make a vector like this:
vector<int>(50000000, 0);
What happens internally? Does it make a default vector and then continually add values, resizing as necessary? Note: 50,000,000 is not known at compile time.
Would it make a difference if I make the vector like this:
gVec = vector<int>();
gVec.reserve(50000000);
// push_back default values
Please tell me the constructor knows to avoid unnecessary reallocations given the two parameters.
Would it make a difference if I make the vector like this:
gVec = vector<int>();
gVec.reserve(50000000);
// push_back default values
Yes, it definitely makes a difference. Using push_back() to fill in the default values may turn out a lot less efficient.
To get the same result as with the constructor vector<int>(50000000, 0);, use std::vector<int>::resize():
vector<int> gVec;
gVec.resize(50000000,0);
You will greatly enhance what you learn from this question by stepping through the two options in the debugger - seeing what the std::vector source code does should be instructive if you can mentally filter out the initially-confusing template and memory allocation abstractions. Demystify this for yourself - the STL is just someone else's code, and most of my work time is spent looking through code like that.
std::vector guarantees contiguous storage, so only one memory block is ever allocated for the elements. The vector control structure itself lives wherever the vector object is declared; it only requires a second allocation if the vector object itself is created on the heap rather than as a stack-based (RAII) local.
vector<int>(N, 0);
creates a vector of capacity >= N and size N, with N values each set to 0.
Step by step:
gVec = vector<int>();
creates an empty vector, typically with zero capacity (some implementations may use a small 'best-guess' capacity).
gVec.reserve(N);
updates the vector's capacity - it ensures the vector has room for at least N elements. On an empty vector this is a single allocation; the default capacity is certainly not large enough for the value of N proposed in this question.
// push_back default values
Each iteration here increases the vector's size by one and sets the new back() element of the vector to 0. The vector's capacity will not change until the number of values pushed exceeds N plus whatever pad the vector implementation might have applied (typically none).
reserve solely allocates storage. No initialization is performed. Applied on an empty vector it should result in one call to the allocate member function of the allocator used.
The constructor shown allocates the storage required and initializes every element to zero: it's semantically equivalent to a reserve followed by a sequence of push_backs.
In both cases no reallocations are done.
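A small sketch that makes the difference observable (capacities are implementation-defined, but are typically exactly the requested amount here):

#include <iostream>
#include <vector>

int main() {
    std::vector<int> a(50000000, 0);  // one allocation; 50,000,000 zeroed ints
    std::cout << a.size() << ' ' << a.capacity() << '\n'; // typically "50000000 50000000"

    std::vector<int> b;
    b.reserve(50000000);              // one allocation; no elements created
    std::cout << b.size() << ' ' << b.capacity() << '\n'; // typically "0 50000000"
}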
I suppose in theory the constructor could start by allocating a small block of memory and expanding several times before returning, at least for types that didn't have side-effects in their copy constructor. This would be allowed only because there were no observable side effects of doing so though, not because the standard does anything to allow it directly.
At least in my opinion, it's not worth spending any time or effort worrying about such a possibility though. Chances of anybody doing it seem remote, to say the least. It's only "allowed" to the degree that it's essentially impossible to truly prohibit it.

Variable-sized char array while minimizing calls to new?

I need a char array that will dynamically change in size. I do not know how big it can get, so preallocating is not an option. One time it might never get bigger than 20 bytes; the next time it may get up to 5 KB...
I want the allocation to behave like a std::vector.
I thought of using a std::vector<char>, but all those push_backs seem like they waste time:
strVec.clear();
for (size_t i = 0; i < varLen; ++i)
{
    strVec.push_back(0);
}
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
Thanks
std::vector doesn't allocate memory every time you call push_back, but only when the size would become bigger than the capacity.
First, don't optimize until you've profiled your code and determined that there is a bottleneck. Consider the costs to readability, accessibility, and maintainability by doing something clever. Make sure any plan you take won't preclude you from working with Unicode in future. Still here? Alright.
As others have mentioned, vectors reserve more memory than they use initially, and push_back usually is very cheap.
There are cases when using push_back reallocates memory more than is necessary, however. For example, one million calls to myvector.push_back() might trigger 10 or 20 reallocations of myvector. On the other hand, a single ranged insert at the end will cause at most one reallocation of myvector (given forward iterators, the vector can compute the required size up front). I generally prefer the insertion idiom to the reserve / push_back idiom for both speed and readability reasons.
myvector.insert(myvector.end(), inputBegin, inputEnd);
If you do not know the size of your string in advance and cannot tolerate the hiccups caused by reallocations, perhaps because of hard real-time constraints, then maybe you should use a linked list. A linked list will have consistent performance at the price of much worse average performance.
If all of this isn't enough for your purposes, consider other data structures such as a rope or post back with more specifics about your case.
From Scott Meyers's Effective STL, IIRC.
You can use the resize member function to add a bunch. However, I would not expect that push_back would be slow, especially if the vector's internal capacity is already non-trivial.
Is this the best I can do or is there a way to add a bunch of items to a vector at once? Or maybe a better way to do this.
push_back isn't very slow, it just compares the size to the current capacity and reallocates if necessary. The comparison may work out to essentially zero time because of branch prediction and superscalar execution on the CPU. The reallocation is performed O(log N) times, so the vector uses up to twice as much memory as needed but time spent on reallocation seldom adds up to anything.
To insert several items at once, use insert. There are a few overloads, the only trick is that you need to explicitly pass end.
my_vec.insert( my_vec.end(), num_to_add, initial_value );
my_vec.insert( my_vec.end(), first, last ); // iterators or pointers
For the second form, you could put the values in an array first and then copy the array to the end of the vector. But this might add as much complexity as it removes. That's how it goes with micro-optimization. Only attempt to optimize if you know there's a measurable gain to be had.
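For the question's concrete loop, a minimal sketch of the bulk alternatives (varLen is whatever size the current iteration needs):

#include <cstddef>
#include <vector>

int main() {
    std::size_t varLen = 5000;
    std::vector<char> strVec;

    // one call, equivalent to the question's clear() + push_back(0) loop
    strVec.assign(varLen, 0);

    // or, to keep existing contents and append varLen zeros instead:
    strVec.insert(strVec.end(), varLen, 0);
}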

Efficient Array Reallocation in C++

How would I efficiently resize an array allocated using some standards-conforming C++ allocator? I know that no facilities for reallocation are provided in the C++ allocator interface, but did the C++11 revision enable us to work with them more easily? Suppose that I have a class foo with a copy-assignment operator foo& operator=(const foo& x) defined. If x.size() > this->size(), I'm forced to:
1. Call allocator.destroy() on all elements in the internal storage of foo.
2. Call allocator.deallocate() on the internal storage of foo.
3. Allocate a new buffer with enough room for x.size() elements.
4. Use std::uninitialized_copy to populate the storage.
Is there some way that I more easily reallocate the internal storage of foo without having to go through all of this? I could provide an actual code sample if you think that it would be useful, but I feel that it would be unnecessary here.
Based on a previous question, the approach that I took for handling large arrays that could grow and shrink with reasonable efficiency was to write a container, similar to a deque, that broke the array down into multiple pages of smaller arrays. So, for example, say we have an array of n elements; we select a page size p and create 1 + n/p arrays (pages) of p elements. When we want to reallocate and grow, we simply leave the existing pages where they are and allocate the new pages. When we want to shrink, we free the totally empty pages.
The downside is that array access is slightly slower: given an index i, you need page = i / p and the offset into the page = i % p to get the element. I find this is still very fast, however, and it provides a good solution. Theoretically, std::deque should do something very similar, but for the cases I tried with large arrays it was very slow. See comments and notes on the linked question for more details.
There is also a memory inefficiency in that, given n elements, we are always holding up to p - n % p unused elements in reserve; i.e., we only ever allocate or deallocate complete pages. This was the best solution I could come up with in the context of large arrays with the requirement for resizing and fast access, and while I don't doubt there are better solutions, I'd love to see them.
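A minimal sketch of that paging scheme, under the stated assumptions (fixed page size, whole-page allocation, no iterators or error handling):

#include <cstddef>
#include <memory>
#include <vector>

template <typename T, std::size_t PageSize = 4096>
class paged_array {
    std::vector<std::unique_ptr<T[]>> pages_;
    std::size_t size_ = 0;
public:
    void resize(std::size_t n) {
        const std::size_t needed = (n + PageSize - 1) / PageSize;
        while (pages_.size() < needed)              // grow: existing pages stay put
            pages_.emplace_back(new T[PageSize]{});
        pages_.resize(needed);                      // shrink: free whole pages only
        size_ = n;
    }
    T& operator[](std::size_t i) {
        return pages_[i / PageSize][i % PageSize];  // page = i / p, offset = i % p
    }
    std::size_t size() const { return size_; }
};

int main() {
    paged_array<float> a;
    a.resize(1000000);
    a[999999] = 1.0f; // access goes through page + offset
}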
A similar problem also arises if x.size() > this->size() in foo& operator=(foo&& x).
No, it doesn't. You just swap.
There is no function that will resize in place or return 0 on failure (to resize). I don't know of any operating system that supports that kind of functionality beyond telling you how big a particular allocation actually is.
All operating systems do, however, have support for implementing realloc, which falls back to a copy when it cannot resize in place.
So, you can't have it because the C++ language would not be implementable on most current operating systems if you had to add a standard function to do it.
There are the C++11 rvalue reference and move constructors.
There's a great video talk on them.
Even if a reallocate facility existed, in a copy constructor you could only avoid step 2 of the operations you mention in your question. In the case of an internal buffer growing, however, reallocate could save all four operations.
Is the internal buffer of your array contiguous? If so, see the answer at your link.
If not, a hashed array tree or an array list may be your choice to avoid reallocation.
Interestingly, the default allocator for g++ is smart enough to use the same address for consecutive deallocations and allocations of larger sizes, as long as there is enough unused space after the end of the initially-allocated buffer. While I haven't tested what I'm about to claim, I doubt that there is much of a time difference between malloc/realloc and allocate/deallocate/allocate.
This leads to a potentially very dangerous, nonstandard shortcut that may work if you know that there is enough room after the current buffer so that a reallocation would not result in a new address: (1) deallocate the current buffer without calling allocator.destroy(); (2) allocate a new, larger buffer and check the returned address; (3) if the new address equals the old address, proceed happily; otherwise, you have lost your data; (4) call allocator.construct() on the elements in the newly allocated space.
I wouldn't advocate using this for anything other than satisfying your own curiosity, but it does work on g++ 4.6.

Creation of a template class creates major bottleneck

I am trying to write a scientific graph library, it works but I have some performance problems. When creating a graph I use a template class for the nodes and do something like
for (unsigned int i = 0; i < l_NodeCount; ++i)
    m_NodeList.push_back(Node<T>(m_NodeCounter++));
Even though almost nothing happens in the constructor of the node class (a few variables are assigned), this part is a major bottleneck of my program (when I use over a million nodes); especially in debug mode it becomes too inefficient to run at all.
Is there a better way to simultaneously create all those objects without having to call the constructor each time, or do I have to rewrite it without templates?
If the constructor does almost nothing, as you say, the bottleneck is most likely the allocation of new memory. The vector grows dynamically, and each time its memory is exhausted, it will reserve new memory and copy all data there. When adding a large number of objects, this can happen very frequently and become very expensive. This can be avoided by calling
m_NodeList.reserve(l_NodeCount);
With this call, the vector will allocate enough memory to hold l_NodeCount objects, and you will not have any expensive reallocations when bulk-adding the elements.
There are two things that happen in your code:
as you add elements to the vector, it occasionally has to resize the internal array, which involves copying all existing elements to the new array
the constructor is called for each element
The constructor call is unavoidable. You create a million elements, you have a million constructor calls. What you can change is what the constructor does.
Adding elements is obviously unavoidable too, but the copying/resizing can be avoided. Call reserve on the vector initially, to reserve enough space for all your nodes.
Depending on your compiler, optimization settings and other flags, the compiler may do a lot of unnecessary bounds checking and debug checks as well.
You can disable this for the compiler (_SECURE_SCL=0 on VS2005/2008, _ITERATOR_DEBUG_LEVEL=0 in VS2010. I believe it's off by default in GCC, and don't know about other compilers).
Alternatively, you can rewrite the loop to minimize the amount of debug checking that needs to be done. Using the standard library algorithms instead of a raw loop allows the library to skip most of the checks (typically, a bounds check will then be performed on the begin and the end iterator, and not on the intervening iterations, whereas on a plain loop, it'll be done every time an iterator is dereferenced)
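As an illustration of the algorithm-based version, a sketch using the question's names (Node, m_NodeList, and m_NodeCounter are stand-ins re-declared here so the snippet compiles on its own):

#include <algorithm>
#include <iterator>
#include <vector>

// minimal stand-in for the question's node class
template <typename T>
struct Node {
    explicit Node(unsigned id) : m_Id(id) {}
    unsigned m_Id;
};

int main() {
    const unsigned l_NodeCount = 1000000;
    unsigned m_NodeCounter = 0;
    std::vector<Node<int>> m_NodeList;

    m_NodeList.reserve(l_NodeCount);  // no reallocation during the fill
    std::generate_n(std::back_inserter(m_NodeList), l_NodeCount,
                    [&] { return Node<int>(m_NodeCounter++); });
}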
I would say your bottleneck is not the template class - templates have nothing to do with run time and are dealt with during compilation - but adding elements to the vector container (you have the tag "vector" in your question). You are performing a lot of allocations using push_back. Try allocating the required total memory right away and then filling in the elements.
You can avoid the templates by keeping a list of void* pointers to the objects and casting them later.
But if you wish to have 1,000,000 instances of the node class, you will have to call the node constructor 1,000,000 times.

Are STL containers' .push_back() naughty?

This might seem daft, for which I'm sorry: I've been writing a bit of code for the PlayStation 2 for uni. I am writing a sort of API for the Graphics Synthesizer, using a syntax similar to that of OpenGL, which is a state machine.
So the input would be something like:
gsBegin(GS_TRIANGLE);
gsColor(...);
gsVertex3f(...);
gsVertex3f(...);
gsVertex3f(...);
gsEnd();
This is great so far for lines/triangles/quads with a predetermined number of vertices; however, things like a LINE_STRIP or TRIANGLE_FAN take an undetermined number of points. I have been warned off using STL containers in this situation several times because of the push_back() method and the time-sensitive nature of the code (is this justified?).
If it's not justified, what would be a better way of dealing with the undetermined-amount situation? Currently I have an array that can hold 30 vertices at a time; is this the best way of dealing with this kind of situation?
Vector's push_back has amortized constant time complexity because it exponentially increases the capacity of the vector. (I'm assuming you're using vector, because it's ideal for this situation.) However, in practice, rendering code is very performance sensitive, so if the push_back causes a vector reallocation, performance may suffer.
You can prevent reallocations by reserving the capacity before you add to it. If you call myvec.reserve(10);, you are guaranteed to be able to add 10 elements before the vector reallocates.
However, this still requires knowing ahead of time how many elements you need. Also, if you create and destroy lots of different vectors, you're still doing a lot of memory allocation. Instead, just use one vector for all vertices, and re-use it. Calling clear() returns it to empty while keeping its allocated capacity. This way you don't actually need to reserve anything - the first few times you use it it'll reallocate and grow, but once it reaches its peak size, it won't need to reallocate any more. The nice thing about this is the vector finds the approximate size it needs to be, and once it's "warmed up" there's no further allocation so it is high performance.
In short:
Use a single persistently stored std::vector
push_back as much as you like
When you're done, clear().
In practice this will perform as well as a C array, but without a hard limit on size.
University, eh? Just tell them push_back has amortized constant time complexity and they'll be happy.
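A sketch of that single-persistent-vector pattern applied to the question's API (Vertex and submit_to_gs are hypothetical stand-ins for the real GS vertex type and draw call):

#include <cstddef>
#include <vector>

struct Vertex { float x, y, z; };
void submit_to_gs(const Vertex*, std::size_t) { /* kick off the GS transfer here */ }

class VertexBatch {
    std::vector<Vertex> verts_; // one persistent vector, reused every gsBegin/gsEnd
public:
    void begin() { verts_.clear(); } // size -> 0, but capacity is kept
    void vertex(float x, float y, float z) { verts_.push_back({x, y, z}); }
    void end() { submit_to_gs(verts_.data(), verts_.size()); }
};

int main() {
    VertexBatch batch;
    batch.begin();
    batch.vertex(0, 0, 0);
    batch.vertex(1, 0, 0);
    batch.vertex(0, 1, 0);
    batch.end(); // after the first few uses, no allocation happens at all
}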
First, avoid using glBegin / glEnd if you can, and instead use something like glDrawArrays or glDrawElements.
push_back() on a std::vector is a quick operation unless the array needs to grow in size when the operation occurs. Set the vector capacity as high as you think you will need it to be and you should see minimal overhead. 'Raw' arrays will usually be slightly faster, but then you have to deal with using 'raw' arrays.
There is always the alternative of using a deque.
A deque is very much like a vector, apart from contiguity. Basically, it's often implemented as a vector of arrays.
This means a lower allocation cost, but member access might be slightly slower (though constant) because of the double dereference, so I am unsure if it's profitable in your case.
There is also the LLVM alternative: SmallVector<T,N>, which preallocates (right in the vector) space for N elements, and will simply get back to using a traditional vector-like implementation once the size has grown too much.
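If the LLVM ADT headers are available, usage would look roughly like this (a sketch; 64 is an arbitrary inline capacity):

#include "llvm/ADT/SmallVector.h"

int main() {
    // the first 64 elements live inside the object itself; the heap is
    // touched only if the size grows beyond that
    llvm::SmallVector<float, 64> verts;
    verts.push_back(1.0f);
}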
The drawback to using std::vector in this kind of situation is making sure you manage your memory allocation properly. On systems like the PS2 (PS3 seems to be a bit better at this), memory allocation is insanely slow and if you don't reserve the right amount of space in the vector to begin with (and it has to resize several times when adding items), you will slow your game to a creeping crawl. If you know what your max size is going to be and reserve it when you create the vector, you won't have a problem.
That said, if this vector is going to be a temporary/local variable, you will still be reallocating memory every time your function is called. So if this function is called every frame, you will still have the performance problem. You can get around this by using a custom allocator and/or making the vector global (or a member variable to a class that will exist during your game loop).
You can always equip the container you want to use with a proper allocator that takes into account the limitations of the platform, the expected grow/shrink scenarios, etc.