c++ std::vector performance [reference required]

I'm writing a parallel implementation of some data structures. I'd like to find out if someone knows about the difference in performance between pure pointers and std::vector. If you know of trustworthy docs about it, please post the URL/book name/whatever. Any hints are welcome!

The difference depends on usage and on the implementation.

You can make std::vector as fast as normal pointers by using the unchecked operator[] and resizing appropriately. The reality is that vector is a compile-time abstraction over a pointer, not a runtime one, unless you choose to use extras. What's much more important is the vastly increased safety vector offers: debugging iterators, automatic and safe resource management, and so on. There is no reason to use a raw pointer.
Edit: My reference is that profiling run you did before you even considered losing the safety of vector.
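To make that concrete, here is a minimal sketch (the function and the workload are made up for illustration): with the vector sized up front, the unchecked operator[] loop is the same pointer arithmetic as the raw version, but only the raw version needs manual cleanup.
#include <cstddef>
#include <vector>

// Hypothetical comparison: same loop over a raw allocation and a pre-sized vector.
void fill_squares(std::size_t n) {
    int* raw = new int[n];    // raw pointer version
    std::vector<int> vec(n);  // vector version, sized once up front
    for (std::size_t i = 0; i < n; ++i) {
        raw[i] = static_cast<int>(i * i);  // plain pointer arithmetic
        vec[i] = static_cast<int>(i * i);  // unchecked operator[], no bounds test
    }
    delete[] raw;  // manual cleanup needed here...
}                  // ...while vec frees its memory automatically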

According to this answer in a similar question, accessing an element in a dynamically-allocated array versus a std::vector will be roughly the same. There's some good analysis in that question and in this one as well.

If you mean to compare a std::vector with some hand-written dynamic array here are some points of reference:
The resizing factor on insertion is important. This factor isn't specified by the standard but is usually between 1.5 and 2, and it must guarantee amortized constant time for insertion operations.
The allocator: A lot of the performance depends on the allocation mechanism that is used, same goes for your pointers.
Bounds checking occurs in std::vector if you call vector::at, which has no counterpart with raw pointers; the unchecked operator[] skips it.
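A minimal sketch of that last point, assuming nothing beyond the standard library: at() pays for a bounds check and throws on a bad index, while operator[] does not.
#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};
    std::cout << v[1] << '\n';  // unchecked: an out-of-range index here would be UB, not an error
    try {
        std::cout << v.at(10) << '\n';  // checked: throws std::out_of_range
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}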

Related

Raw pointer, smart pointer or std::vector for "low-level" container data in C++

Let's say I am making my own Matrix class/container which itself needs to manage some sort of array of doubles and a row/column dimension.
At the risk of sounding hand-wavy, what is the considered "best practice" for how to store this data if speed is of importance? The options I can see are:
Raw pointer to single dynamic C array
Unique pointer to a C array, which is similar/leads us to...
std::vector of doubles
What are the speed implications of these different options? Obviously it does depend on circumstance, but in a general case? Also, the size of an std::vector on MSVC and GCC for me is 24 bytes, indicating 3 pointers: to the begin iterator, the end iterator, and the end of the memory allocation. Since I need to store the size myself to be aware of the Matrix dimensions, the end iterator is somewhat useless to me, except for use with algorithms.
What are the thoughts on best practices of this? Is using a raw pointer acceptable since the container is somewhat "low-level"?
Thanks!
I would use std::vector because it solves memory allocation, deallocation, indexing, copying, etc.. Unless you will be using "millions" of matrices at the same time, the extra member (capacity) is probably not relevant.
In any case, optimizing the library for speed is the last thing you want to do -- wait until you can test the actual speed of your initial implementation. Then you can decide whether it is worth spending time effectively duplicating std::vector functionality with your own implementation.
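For illustration, a hypothetical Matrix sketch along those lines (the names and layout are my own, not a prescription): one flat std::vector<double> plus the two dimensions, with 2D indexing mapped to row * cols + col.
#include <cstddef>
#include <vector>

class Matrix {
public:
    Matrix(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), data_(rows * cols) {}

    // Unchecked element access, same cost as raw-array indexing.
    double& operator()(std::size_t r, std::size_t c) { return data_[r * cols_ + c]; }
    double operator()(std::size_t r, std::size_t c) const { return data_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_, cols_;
    std::vector<double> data_;  // allocation, copying and destruction all handled here
};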

Should I use std::vector instead of array [duplicate]

This question already has answers here:
When to use vectors and when to use arrays in C++?
(2 answers)
Closed 5 years ago.
The way I see it, they both serve the same purpose, except std::vector seems more flexible; so when would I need to use an array, and could I use std::vector only?
This is not a new question; the original questions didn't have the answers I was looking for.
One interesting thing to note is that while iterators are invalidated by many operations on vectors, that is not the case with arrays. Note: after std::swap on a std::array, an iterator will still point to the same spot (only the values change).
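A small sketch of that note, assuming only the standard library: std::swap on std::array swaps the elements, not the storage, so an existing iterator stays valid and keeps pointing at the same position.
#include <array>
#include <iostream>
#include <utility>

int main() {
    std::array<int, 3> a{1, 2, 3};
    std::array<int, 3> b{4, 5, 6};
    auto it = a.begin();       // refers to a[0]
    std::swap(a, b);           // element-wise swap; the storage does not move
    std::cout << *it << '\n';  // still a[0], which now holds 4
}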
See more:
http://en.cppreference.com/w/cpp/container/array
Good summary of advantages of arrays:
https://stackoverflow.com/a/4004027/7537900
This point seemed most interesting:
fixed-size arrays can be embedded directly into a struct or object, which can improve memory locality and reduce the number of heap allocations needed
Not having tested that, I'm not sure it's actually true though.
Here is a discussion in regards to 2D Vectors vs Arrays in regards to the competitive programming in Code Chef:
https://discuss.codechef.com/questions/49278/whether-to-use-arrays-or-vectors-in-c
Apparently memory is contiguous in only one dimension of a 2D vector (each inner vector is a separate allocation), whereas a 2D array is contiguous across both dimensions.
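A sketch of that layout difference (the sizes are arbitrary): each inner vector of a vector<vector<int>> owns its own heap block, so rows are scattered, while a single flat vector keeps the whole grid in one contiguous block.
#include <cstddef>
#include <vector>

int main() {
    const std::size_t rows = 100, cols = 100;
    std::vector<std::vector<int>> grid(rows, std::vector<int>(cols));  // one allocation per row
    std::vector<int> flat(rows * cols);                                // one allocation total
    grid[3][7] = 42;
    flat[3 * cols + 7] = 42;  // same element, manual index arithmetic
}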
As a rule of thumb, you should use:
a std::array if the size is fixed at compile time
a std::vector if the size is not fixed at compile time
a pointer to the first element if you need low-level access
a raw array if you are implementing a (non-standard) container
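A short sketch of the first three rules, with made-up sizes:
#include <array>
#include <cstddef>
#include <vector>

void rules_of_thumb(std::size_t runtime_n) {
    std::array<int, 16> fixed{};          // size fixed at compile time, lives on the stack
    std::vector<int> dynamic(runtime_n);  // size only known at run time
    int* p = dynamic.data();              // pointer to the first element for low-level access
    if (!dynamic.empty())
        p[0] = 1;
}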
Standard containers have the ability to know their size even when you pass them to another function, which raw arrays can't (they decay to pointers), and they have enough goodies that you should never use raw arrays in C++ code without a specific reason. One such reason could be a bottleneck that requires low-level optimization, but only after profiling has identified the bottleneck. And you should benchmark in real conditions whether the standard containers actually add any overhead.
The only good reason I can think of is if you implement a special container. As standard containers are not meant to be derived from, you have only two choices: either have your class contain a standard container and end up with a container containing a container, with delegations everywhere, or mimic a standard container (by copying code from a well-known implementation) and specialize it. In that case, you will find yourself managing raw arrays directly.
When using std::vector, the only performance hit occurs when the capacity is reached, as the memory must be reallocated to accommodate a larger number of objects in contiguous memory space on the heap.
Thus here is a summary of both in regards to flexibility and performance:
std::array: Reallocation is not possible, so no performance hit will occur due to relocation of memory on the heap.
std::vector: Only affects performance if the capacity is exceeded and reallocation occurs. You can use reserve(size) to provide a rough estimate of the maximum number of objects you'll need. This allows greater flexibility compared to std::array but will, of course, have to reallocate memory if the reserved space is exceeded.
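A minimal sketch of that reserve(size) advice, assuming only the standard library: one up-front allocation, then no reallocation while the capacity lasts.
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    v.reserve(1000);  // single allocation up front
    for (int i = 0; i < 1000; ++i)
        v.push_back(i);  // stays within the reserved capacity, so no reallocation
    std::cout << v.size() << ' ' << v.capacity() << '\n';  // 1000 and at least 1000
}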

Why can std::vector be more performant than a native array?

In a comparison of performance between Java and C++, our professor claimed that there's no benefit in choosing a native array (a dynamic array like int * a = new int[NUM]) over std::vector. She also claimed that the optimized vector can be faster, but didn't tell us why. What is the magic behind the vector?
PLEASE just answers on performance, this is not a general poll!
Any super optimized code with lower level stuff like raw arrays can beat or tie the performance of an std::vector. But the benefits of having a vector far outweigh any small performance gains you get from low level code.
vector manages its own memory; with an array you have to remember to manage the memory yourself
vector allows you to make use of the standard library more easily (with dynamic arrays you have to keep track of the end pointer yourself), and all of that is very well-written, well-tested code
sometimes vector can even be faster, as with qsort vs. std::sort (read this article, and see the sketch after this list); keep in mind the reason this can happen is that writing higher-level code often allows you to reason better about the problem at hand
Since the std containers are good to use for everything else that dynamic arrays don't cover, keeping code consistent in style makes it more readable and less prone to errors.
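A sketch of that qsort vs. std::sort point: qsort takes an opaque function pointer that is hard to inline, whereas std::sort is a template whose comparison is visible to the optimizer.
#include <algorithm>
#include <cstdlib>
#include <vector>

int cmp(const void* pa, const void* pb) {
    // qsort calls back through this pointer; the comparison is opaque to the optimizer.
    int a = *static_cast<const int*>(pa);
    int b = *static_cast<const int*>(pb);
    return (a > b) - (a < b);  // avoids the overflow risk of a - b
}

void sort_both(std::vector<int>& v) {
    std::qsort(v.data(), v.size(), sizeof(int), cmp);
    std::sort(v.begin(), v.end());  // comparator (operator<) known at compile time, inlinable
}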
One thing I would keep in mind is that you shouldn't compare a dynamic array to a std::vector; they are two different things. It should be compared to something like std::dynarray, which unfortunately probably won't make it into C++14 (Boost probably has one, and I'm sure there are reference implementations lying around). A std::dynarray implementation is unlikely to have any performance difference from a native array.
There is a benefit to using vector instead of array when the number of elements needs to change.
Nor can an optimized vector be faster than an array.
The only performance optimization a std::vector can offer over a plain array is that it can request more memory than is currently needed. std::vector::reserve does just that. Adding elements up to its capacity() will not involve any more allocations.
This can be implemented with plain arrays as well, though.
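For illustration, a hand-rolled sketch of what reserve and push_back amount to for a plain int array (the struct and its names are hypothetical):
#include <cstddef>
#include <cstring>

struct IntBuffer {
    int* data = nullptr;
    std::size_t size = 0;
    std::size_t capacity = 0;

    void reserve(std::size_t n) {
        if (n <= capacity) return;
        int* bigger = new int[n];  // request more memory than currently needed
        if (size)
            std::memcpy(bigger, data, size * sizeof(int));
        delete[] data;
        data = bigger;
        capacity = n;
    }

    void push_back(int x) {
        if (size == capacity)
            reserve(capacity ? capacity * 2 : 1);  // growth factor 2
        data[size++] = x;
    }

    ~IntBuffer() { delete[] data; }
};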

Overhead to using std::vector?

I know that manual dynamic memory allocation is a bad idea in general, but is it sometimes a better solution than using, say, std::vector?
To give a crude example, if I had to store an array of n integers, where n <= 16, say. I could implement it using
int* data = new int[n]; //assuming n is set beforehand
or using a vector:
std::vector<int> data;
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
It is always better to use std::vector/std::array, at least until you can conclusively prove (through profiling) that the T* a = new T[100]; solution is considerably faster in your specific situation. This is unlikely to happen: vector/array is an extremely thin layer around a plain old array. There is some overhead to bounds checking with vector::at, but you can circumvent that by using operator[].
I can't think of any case where dynamically allocating a C-style array makes sense. (I've been working in C++ for over 25 years, and I've yet to use new[].) Usually, if I know the size up front, I'll use something like:
std::vector<int> data( n );
to get an already-sized vector, rather than using push_back.
Of course, if n is very small and is known at compile time, I'll use std::array (if I have access to C++11), or even a C-style array, and just create the object on the stack, with no dynamic allocation. (Such cases seem to be rare in the code I work on; small fixed-size arrays tend to be members of classes, where I do occasionally use a C-style array.)
If you know the size in advance (especially at compile time), and don't need the dynamic re-sizing abilities of std::vector, then using something simpler is fine.
However, that something should preferably be std::array if you have C++11, or something like boost::scoped_array otherwise.
I doubt there'll be much efficiency gain unless it significantly reduces code size or something, but it's more expressive which is worthwhile anyway.
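A tiny sketch of the std::array route for the n <= 16 case from the question (the function is made up): the whole container lives on the stack, with no allocation at all.
#include <array>

std::array<int, 16> make_small() {
    std::array<int, 16> data{};  // zero-initialized, no heap involved
    data[0] = 42;
    return data;                 // cheap to return: it is just 16 ints
}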
You should try to avoid C-style arrays in C++ whenever possible. The STL provides containers which suffice for nearly every need. Just imagine reallocating an array, or deleting elements from its middle. The container shields you from handling this, while you would have to take care of it yourself, and unless you have done it a hundred times before, it is quite error-prone.
An exception, of course, is if you are addressing low-level issues which STL containers might not be able to cope with.
There has already been some discussion on this topic. See here on SO.
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
Call me a simpleton, but 99.9999...% of the time I would just use a standard container. The default choice should be std::vector, but std::deque<> could also be a reasonable option sometimes. If the size is known at compile time, opt for std::array<>, which is a lightweight, safe wrapper around C-style arrays that introduces zero overhead.
Standard containers expose member functions to specify the initial reserved amount of memory, so you won't have trouble with reallocations, and you won't have to remember to delete[] your array. I honestly do not see why one should use manual memory management.
Efficiency shouldn't be an issue, since you have throwing and non-throwing member functions to access the contained elements, so you can choose whether to favor safety or performance.
std::vector can be constructed with a size_type parameter that instantiates the vector with the specified number of elements and does a single dynamic allocation (same as your array), and you can also use reserve to decrease the number of reallocations over the usage time.
If n is known at compile time, then you should choose std::array, as:
std::array<int, n> data; //n is compile-time constant
and if n is not known at compile-time, OR the array might grow at runtime, then go for std::vector:
std::vector<int> data(n); //n may be known at runtime
Or in some cases, you may also prefer std::deque, which is faster than std::vector in some scenarios (a sketch of one such scenario follows the links). See these:
C++ benchmark – std::vector VS std::list VS std::deque
Using Vector and Deque by Herb Sutter
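One scenario where std::deque can win, sketched with nothing beyond the standard library: repeated insertion at the front.
#include <deque>
#include <vector>

void fill_front(std::deque<int>& d, std::vector<int>& v) {
    for (int i = 0; i < 1000; ++i) {
        d.push_front(i);         // amortized constant time in a deque
        v.insert(v.begin(), i);  // shifts every existing element in a vector
    }
}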
Hope that helps.
From the perspective of someone who often works with low-level code in C++, std::vector is really just helper methods with a safety net around a classic C-style array. The only overheads you'd realistically experience are memory allocations and safety checks on boundaries. If you're writing a program that needs performance and you're going to use vectors as regular arrays, I'd recommend just using C-style arrays instead of vectors. You should realistically be vetting the data that comes into the application and checking the boundaries yourself, to avoid checks on every memory access to the array.
It's good to see that others are checking the differences between the C ways and the C++ ways. More often than not, C++ standard methods have significantly worse performance and uglier syntax than their C counterparts, which is generally the reason people call C++ bloated. I think C++ focuses more on safety and on making the language more like JavaScript/C#, even though the language fundamentally lacks the foundation to be one.

Why is 'unbounded_array' more efficient than 'vector'?

It says here that
The unbounded array is similar to a std::vector in that it can grow in size beyond any fixed bound. However, unbounded_array is aimed at optimal performance. Therefore, unbounded_array does not model a Sequence like std::vector does.
What does this mean?
As a Boost developer myself, I can tell you that it's perfectly fine to question the statements in the documentation ;-)
From reading those docs, and from reading the source code (see storage.hpp), I can say that it's somewhat correct, given some assumptions about the implementation of std::vector at the time that code was written. That code dates initially to 2000, and perhaps as late as 2002. Which means that at the time many std implementations did not do a good job of optimizing destruction and construction of objects in containers. The claim about the non-resizing is easily refuted by using a vector with an initially large capacity. The claim about speed, I think, comes entirely from the fact that unbounded_array has special code for eliding dtors & ctors when the stored objects have trivial implementations of them; hence it can avoid calling them when it has to rearrange things, or when it's copying elements (a sketch of that trick follows). Compared to really recent std implementations it's not going to be faster, as new std implementations tend to take advantage of things like move semantics to do even more optimizations.
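A sketch of that elision trick (this is my own C++17 illustration, not unbounded_array's actual code): when the element type is trivially copyable, a reallocation can move the elements with one memcpy instead of per-element constructor/destructor calls.
#include <cstddef>
#include <cstring>
#include <type_traits>

template <typename T>
T* grow(T* old_data, std::size_t old_size, std::size_t new_cap) {
    T* fresh = new T[new_cap];  // assumes T is default-constructible, for simplicity
    if constexpr (std::is_trivially_copyable_v<T>) {
        std::memcpy(fresh, old_data, old_size * sizeof(T));  // bulk copy, no ctor/dtor calls
    } else {
        for (std::size_t i = 0; i < old_size; ++i)
            fresh[i] = old_data[i];  // element-wise copy for non-trivial types
    }
    delete[] old_data;
    return fresh;
}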
It appears to lack insert and erase methods. As these may be "slow" (i.e., their performance depends on size() in the vector implementation), they were omitted to prevent the programmer from shooting himself in the foot.
insert and erase are required by the standard for a container to be called a Sequence, so unlike vector, unbounded_array is not a sequence.
No efficiency is gained by failing to be a sequence, per se.
However, it is more efficient in its memory allocation scheme, by avoiding a concept of vector::capacity and always having the allocated block exactly the size of the content. This makes the unbounded_array object smaller and makes the block on the heap exactly as big as it needs to be.
As I understood it from the linked documentation, it is all about the allocation strategy. std::vector, AFAIK, postpones allocation until necessary and then might allocate some reasonable chunk of memory, whereas unbounded_array seems to allocate more memory early and therefore might allocate less often. But this is only a guess from the statement in the documentation that it allocates more memory than might be needed and that the allocation is more expensive.