Overhead to using std::vector? - c++

I know that manual dynamic memory allocation is a bad idea in general, but is it sometimes a better solution than using, say, std::vector?
To give a crude example, if I had to store an array of n integers, where n <= 16, say. I could implement it using
int* data = new int[n]; //assuming n is set beforehand
or using a vector:
std::vector<int> data;
Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?

It is always better to use std::vector/std::array, at least until you can conclusively prove (through profiling) that the T* a = new T[100]; solution is considerably faster in your specific situation. This is unlikely to happen: vector/array is an extremely thin layer around a plain old array. There is some overhead to bounds checking with vector::at, but you can circumvent that by using operator[].
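For illustration, a minimal sketch (not from the original answer) of the two access styles just mentioned; at() is bounds-checked and throws, operator[] is not:

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v(16, 0);   // 16 zero-initialized ints

    v[3] = 42;                   // operator[]: no bounds check, comparable to a raw array access
    try {
        v.at(99) = 7;            // at(): bounds-checked, throws std::out_of_range
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
    return 0;
}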

I can't think of any case where dynamically allocating a C style array makes sense. (I've been working in C++ for over 25 years, and I've yet to use new[].) Usually, if I know the size up front, I'll use something like:
std::vector<int> data( n );
to get an already sized vector, rather than using push_back. Of course, if n is very small and is known at compile time, I'll use std::array (if I have access to C++11), or even a C style array, and just create the object on the stack, with no dynamic allocation. (Such cases seem to be rare in the code I work on; small fixed size arrays tend to be members of classes, where I do occasionally use a C style array.)
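A short sketch of the two cases described above, assuming C++11; the class and function names are made up for illustration:

#include <array>
#include <cstddef>
#include <vector>

// Hypothetical class: a small fixed-size array kept as a member, no dynamic allocation.
struct Sample {
    std::array<int, 4> channels{};        // size known at compile time
};

// When only the run-time size is known, size the vector up front instead of using push_back.
void process(std::size_t n) {
    std::vector<int> data(n);             // already sized, one allocation
    for (std::size_t i = 0; i < n; ++i)
        data[i] = static_cast<int>(i);
}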

If you know the size in advance (especially at compile time), and don't need the dynamic re-sizing abilities of std::vector, then using something simpler is fine.
However, that something should preferably be std::array if you have C++11, or something like boost::scoped_array otherwise.
I doubt there'll be much efficiency gain unless it significantly reduces code size or something, but it's more expressive which is worthwhile anyway.

You should try to avoid C-style arrays in C++ whenever possible. The STL provides containers which usually suffice for every need. Just imagine reallocation for an array, or deleting elements from its middle. The container shields you from handling this, while you would have to take care of it yourself, and if you haven't done this a hundred times it is quite error-prone.
An exception is, of course, if you are addressing low-level issues which might not be able to cope with STL containers.
There has already been some discussion about this topic. See here on SO.

Is it absolutely always a better idea to use a std::vector or could there be practical situations where manually allocating the dynamic memory would be a better idea, to increase efficiency?
Call me a simpleton, but 99.9999...% of the time I would just use a standard container. The default choice should be std::vector, but std::deque<> could also be a reasonable option sometimes. If the size is known at compile-time, opt for std::array<>, which is a lightweight, safe wrapper around C-style arrays that introduces zero overhead.
Standard containers expose member functions to specify the initial reserved amount of memory, so you won't have trouble with reallocations, and you won't have to remember to delete[] your array. I honestly do not see why one should use manual memory management.
Efficiency shouldn't be an issue, since you have throwing and non-throwing member functions to access the contained elements, so you have a choice whether to favor safety or performance.

std::vector can be constructed with a size_type parameter that instantiates the vector with the specified number of elements and does a single dynamic allocation (same as your array); you can also use reserve to decrease the number of re-allocations over the usage time.
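A minimal sketch of both approaches mentioned here (the sized constructor and reserve); the counts are arbitrary:

#include <vector>

int main() {
    std::vector<int> a(100);      // sized constructor: one allocation, 100 value-initialized ints

    std::vector<int> b;
    b.reserve(100);               // one allocation; size() stays 0, capacity() >= 100
    for (int i = 0; i < 100; ++i)
        b.push_back(i);           // no reallocation up to the reserved capacity
    return 0;
}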

If n is known at compile-time, then you should choose std::array as:
std::array<int, n> data; //n is compile-time constant
and if n is not known at compile-time, OR the array might grow at runtime, then go for std::vector:
std::vector<int> data(n); //n may be known at runtime
Or in some cases, you may also prefer std::deque, which is faster than std::vector in some scenarios. See these:
C++ benchmark – std::vector VS std::list VS std::deque
Using Vector and Deque by Herb Sutter
Hope that helps.

From the perspective of someone who often works with low level code in C++, std vectors are really just helper methods with a safety net for a classic C style array. The only overheads you'd realistically experience are memory allocations and safety checks for boundaries. If you're writing a program which needs performance and you are going to be using vectors as regular arrays, I'd recommend just using C style arrays instead. You should realistically be vetting the data that comes into the application and checking the boundaries yourself, to avoid checks on every memory access to the array.
It's good to see that others are checking the differences between the C ways and the C++ ways. More often than not, C++ standard methods have significantly worse performance and uglier syntax than their C counterparts, and this is generally the reason people call C++ bloated. I think C++ focuses more on safety and making the language more like JavaScript/C#, even though the language fundamentally lacks the foundation to be one.

Related

What advantages do arrays hold over vectors?

Well, after a full year of programming and only knowing of arrays, I was made aware of the existence of vectors (by some members of StackOverflow on a previous post of mine). I did a load of researching and studying them on my own and rewrote an entire application I had written with arrays and linked lists, with vectors. At this point, I'm not sure if I'll still use arrays, because vectors seem to be more flexible and efficient. With their ability to grow and shrink in size automatically, I don't know if I'll be using arrays as much. At this point, the only advantage I personally see is that arrays are much easier to write and understand. The learning curve for arrays is nothing, whereas there is a small learning curve for vectors. Anyway, I'm sure there's probably a good reason for using arrays in some situations and vectors in others; I was just curious what the community thinks. I'm entirely a novice, so I assume that I'm just not well-informed enough on the strict usages of either.
And in case anyone is even remotely curious, this is the application I'm practicing using vectors with. It's really rough and needs a lot of work: https://github.com/JosephTLyons/Joseph-Lyons-Contact-Book-Application
A std::vector manages a dynamic array. If your program needs an array that changes its size dynamically at run-time, then you would end up writing code to do all the things a std::vector does, but probably much less efficiently.
What the std::vector does is wrap all that code up in a single class so that you don't need to keep writing the same code to do the same stuff over and over.
Accessing the data in a std::vector is no less efficient than accessing the data in a dynamic array because the std::vector functions are all trivial inline functions that the compiler optimizes away.
If, however, you need a fixed size, then you can get slightly more efficient than a std::vector with a raw array. However, you won't lose anything by using a std::array in those cases.
The places I still use raw arrays are like when I need a temporary fixed-size buffer that isn't going to be passed around to other functions:
// some code
{ // new scope for temporary buffer
char buffer[1024]; // buffer
file.read(buffer, sizeof(buffer)); // use buffer
} // buffer is destroyed here
But I find it hard to justify ever using a raw dynamic array over a std::vector.
This is not a full answer, but one thing I can think of is that the "ability to grow and shrink" is not such a good thing if you know what you want. For example: assume you want to store 1000 objects, but the memory will be filled at a rate that causes the vector to grow repeatedly. The overhead you'll get from growing will be costly, whereas you could simply define a fixed-size array.
Generally speaking: if you use an array instead of a vector, you will have more power in your hands, meaning no "background" function calls you don't actually need (resizing), and no extra memory kept for things you don't use (the vector's size bookkeeping...).
Additionally, using memory on the stack (array) is faster than the heap (vector*), as shown here.
*as shown here, it's not entirely precise to say vectors reside on the heap, but they certainly hold more memory on the heap than the array (which holds none on the heap)
One reason is that if you have a lot of really small structures, small fixed length arrays can be memory efficient.
compare
struct point
{
    float coords[4];
};
with
struct point
{
    std::vector<float> coords;
};
Alternatives include std::array for cases like this. Also, std::vector implementations will over-allocate, meaning that if you resize to 4 slots, you might have memory allocated for 16 slots.
Furthermore, the memory locations will be scattered and hard to predict, killing performance; using an exceptionally large number of std::vectors may also lead to memory fragmentation issues, where new starts failing.
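To make the size difference concrete, a small sketch; the exact numbers are implementation-dependent (24 bytes of vector bookkeeping is merely typical on 64-bit platforms):

#include <iostream>
#include <vector>

struct point_array  { float coords[4]; };            // the data sits inside the struct
struct point_vector { std::vector<float> coords; };  // bookkeeping here, data on the heap

int main() {
    std::cout << sizeof(point_array)  << '\n';   // e.g. 16 on typical platforms
    std::cout << sizeof(point_vector) << '\n';   // e.g. 24, plus a separate heap block per point
    return 0;
}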
I think this question is best answered flipped around:
What advantages does std::vector have over raw arrays?
I think this list is more easily enumerable (not to say this list is comprehensive):
Automatic dynamic memory allocation
Proper stack, queue, and sort implementations attached
Integration with C++11-related syntactical features such as iterators
If you aren't using such features there's not any particular benefit to std::vector over a "raw array" (though, similarly, in most cases the downsides are negligible).
Despite me saying this, for typical user applications (i.e. running on windows/unix desktop platforms) std::vector or std::array is (probably) typically the preferred data structure because even if you don't need all these features everywhere, if you're already using std::vector anywhere else you may as well keep your data types consistent so your code is easier to maintain.
However, since at its core std::vector simply adds functionality on top of "raw arrays", I think it's important to understand how arrays work in order to fully take advantage of std::vector or std::array (knowing when to use std::array being one example), so you can reduce the "carbon footprint" of std::vector.
Additionally, be aware that you are going to see raw arrays when working with
Embedded code
Kernel code
Signal processing code
Cache efficient matrix implementations
Code dealing with very large data sets
Any other code where performance really matters
The lesson shouldn't be to freak out and say "must std::vector all the things!" when you encounter this in the real world.
One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see What's the need of array with zero elements?
Embedding arrays can also localise memory access requirements, improving cache hits and therefore performance.

Is there ever a valid reason to use C-style arrays in C++?

Between std::vector and std::array in TR1 and C++11, there are safe alternatives for both dynamic and fixed-size arrays which know their own length and don't exhibit horrible pointer/array duality.
So my question is, are there any circumstances in C++ when C arrays must be used (other than calling C library code), or is it reasonable to "ban" them altogether?
EDIT:
Thanks for the responses everybody, but it turns out this question is a duplicate of
Now that we have std::array what uses are left for C-style arrays?
so I'll direct everybody to look there instead.
I didn't want to answer this at first, but I'm already getting worried that this question is going to be swamped with C programmers, or people who write C++ as object-oriented C.
The real answer is that in idiomatic C++ there is almost never a reason to use a C style array. Even when using a C style code base, I usually use vectors. How is that possible, you say? Well, if you have a vector v and a C style function requires a pointer to be passed in, you can pass &v[0] (or better yet, v.data(), which is the same thing).
Even for performance, it's very rare that you can make a case for a C style array. A std::vector does involve a double indirection, but I believe this is generally optimized away. If you don't trust the compiler (which is almost always a terrible move), then you can always use the same technique as above with v.data() to grab a pointer for your tight loop. For std::array, I believe the wrapper is even thinner.
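A minimal sketch of passing a vector's buffer to a C-style API; the C-style function here is hypothetical:

#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical C-style function that expects a pointer and a length.
void c_style_fill(int* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<int>(i);
}

int main() {
    std::vector<int> v(8);
    c_style_fill(v.data(), v.size());     // &v[0] works too; the buffer is contiguous
    std::printf("%d\n", v[7]);
    return 0;
}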
You should only use one if you are an awesome programmer and you know exactly why you are doing it, or if an awesome programmer looks at your problem and tells you to. If you aren't awesome and you are using C style arrays, the chances are high (but not 100%) that you are making a mistake.
Foo data[] = {
is a pretty common pattern. Elements can be added to it easily, and the size of the data array grows based on the elements added.
With C++11 you can replicate this with a std::array:
template<class T, class... Args>
auto make_array( Args&&... args )
-> std::array< T, sizeof...(Args) >
{
return { std::forward<Args>(args)... };
}
but even this isn't as good as one might like, as it does not support nested brackets like a C array does.
Suppose Foo was struct Foo { int x; double y; };. Then with C style arrays we can:
Foo arr[] = {
{1,2.2},
{3,4.5},
};
meanwhile
auto arr = make_array<Foo>(
    {1,2.2},
    {3,4.5}
);
does not compile. You'd have to repeat Foo for each line:
auto arr = make_array<Foo>(
    Foo{1,2.2},
    Foo{3,4.5}
);
which is copy-paste noise that can get in the way of the code being expressive.
Finally, note that "hello" is a const array of size 6. Code needs to know how to consume C-style arrays.
My typical response to this situation is to convert C-style arrays and C++ std::arrays into array_views, a range that consists of two pointers, and operate on them. This means I do not care if I was fed an array based on C or C++ syntax: I just care I was fed a packed sequence of data elements. These can also consume std::dynarrays and std::vectors with little work.
It did require writing an array_view, or stealing one from boost, or waiting for it to be added to the standard.
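For concreteness, a minimal sketch of what such an array_view might look like (the names are illustrative, not Boost's or the standard's; C++20 eventually standardized a similar idea as std::span):

#include <cstddef>
#include <vector>

// Hypothetical minimal array_view: a non-owning [begin, end) range over contiguous T.
template <class T>
struct array_view {
    T* first;
    T* last;

    std::size_t size() const  { return static_cast<std::size_t>(last - first); }
    T* begin() const          { return first; }
    T* end() const            { return last; }
    T& operator[](std::size_t i) const { return first[i]; }
};

template <class T, std::size_t N>
array_view<T> make_view(T (&arr)[N])       { return { arr, arr + N }; }

template <class T>
array_view<T> make_view(std::vector<T>& v) { return { v.data(), v.data() + v.size() }; }

// Consumers no longer care whether the data came from a C array or a std::vector.
int sum(array_view<int> a) {
    int total = 0;
    for (int x : a) total += x;
    return total;
}

int main() {
    int raw[] = {1, 2, 3};
    std::vector<int> vec = {4, 5, 6};
    return sum(make_view(raw)) + sum(make_view(vec)) == 21 ? 0 : 1;
}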
Sometimes an existing code base can force you to use them.
The last time I needed to use them in new code was when I was doing embedded work and the standard library just didn't have an implementation of std::vector or std::array. In some older code bases you have to use arrays because of design decisions made by the previous developers.
In most cases if you are starting a new project with C++11 the old C style arrays are a fairly poor choice. This is because relative to std::array they are difficult to get correct and this difficulty is a direct expense when developing. This C++ FAQ entry sums up my thoughts on the matter fairly well: http://www.parashift.com/c++-faq/arrays-are-evil.html
Pre-C++14: In some (rare) cases, the missing initialization of types like int can improve execution speed notably, especially if some algorithm needs many short-lived arrays during its execution and the machine does not have enough memory for pre-allocation to make sense, and/or the sizes cannot be known up front.
C-style arrays are very useful in embedded system where memory is constrained (and severely limited).
The arrays allow for programming without dynamic memory allocation. Dynamic memory allocation generates fragmented memory and at some point in run-time, the memory has to be defragmented. In safety critical systems, defragmentation cannot occur during the periods that have critical timing.
The const arrays allow for data to be put into read-only memory or flash memory, out of the precious RAM area. The data can be directly accessed and does not require any additional initialization time, as it would with std::vector or std::array.
The C-style array is a convenient tool to place raw data into a program. For example, bitmap data for images or fonts. In smaller embedded systems with no hard drives or flash drives, the data must directly accessed. C-style arrays allow for this.
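A hedged sketch of the kind of const C-style array this answer has in mind; the glyph data is made up, and whether it actually lands in flash/ROM depends on the toolchain and linker script:

#include <cstdint>

// Hypothetical 8x8 bitmap glyph. A const C-style array like this can typically be placed
// in flash/ROM by the toolchain and needs no run-time initialization code at all.
const std::uint8_t glyph_A[8] = {
    0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00
};

std::uint8_t glyph_row(unsigned i) {
    return glyph_A[i & 7];     // direct access: no heap, no constructor
}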
Edit 1:
Also, std::array cannot be used with compilers that don't support C++11 or later.
Many companies do not want to switch compilers once a project has started. Also, they may need to keep the compiler version around for maintenance fixes, and when Agencies require the company to reproduce an issue with a specified software version of the product.
I found just one reason today: when you want to know precisely the size of the data block and control it for alignment within a giant data block.
This is useful when you are dealing with stream processors or streaming extensions like AVX or SSE.
Controlling the data block allocation so it forms a huge, single, aligned block in memory is useful. Your objects can manipulate the segments they are responsible for and, when they are finished, you can move and/or process the huge block in an aligned way.
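A small sketch of one way to control alignment for SIMD use with a plain array, assuming C++11 alignas; the names and sizes are illustrative, not from the original answer:

#include <cstddef>

// Illustrative sizes: 16 segments of 256 floats each (1 KiB per segment, a multiple of 32 bytes).
constexpr std::size_t kSegmentFloats = 256;
constexpr std::size_t kSegmentCount  = 16;

// One big 32-byte-aligned block, suitable for aligned AVX loads/stores.
alignas(32) static float big_block[kSegmentCount * kSegmentFloats];

// Each object works on its own segment; every segment stays 32-byte aligned.
float* segment(std::size_t i) {
    return big_block + i * kSegmentFloats;
}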

Why can std::vector be more performant than a native array?

In a comparison on performance between Java and C++, our professor claimed that there's no benefit in choosing a native array (a dynamic array like int* a = new int[NUM]) over std::vector. She claimed too that the optimized vector can be faster, but didn't tell us why. What is the magic behind the vector?
PLEASE, just answers on performance; this is not a general poll!
Any super optimized code with lower level stuff like raw arrays can beat or tie the performance of an std::vector. But the benefits of having a vector far outweigh any small performance gains you get from low level code.
vector manages its own memory; with an array you have to remember to manage the memory yourself
vector allows you to make use of the stdlib more easily (with dynamic arrays you have to keep track of the end pointer yourself), which is all very well-written, well-tested code
sometimes vector can even be faster, like qsort vs std::sort; read this article, and keep in mind the reason this can happen is that writing higher-level code can often allow you to reason better about the problem at hand.
Since the std containers are good to use for everything else that dynamic arrays don't cover, keeping code consistent in style makes it more readable and less prone to errors.
One thing I would keep in mind is that you shouldn't compare a dynamic array to a std::vector; they are two different things. It should be compared to something like std::dynarray, which unfortunately probably won't make it into C++14 (Boost probably has one, and I'm sure there are reference implementations lying around). A std::dynarray implementation is unlikely to have any performance difference from a native array.
There is a benefit to using vector instead of an array when the number of elements needs to change.
Nor can an optimized vector be faster than an array.
The only performance optimization a std::vector can offer over a plain array is that it can request more memory than is currently needed. std::vector::reserve does just that. Adding elements up to its capacity() will not involve any more allocations.
This can be implemented with plain arrays as well, though.

c++ std::vector performance [reference required]

I'm writing a parallel implementation of some data structures. I'd like to find out if someone knows about the difference in performance between raw pointers and std::vector. If you know of trustworthy docs about it, please post the URL/book name/whatever. Any hints are welcome!
The difference is usage and implementation relative.
You can make std::vector as fast as normal pointers by using the unchecked operator[] and resizing appropriately. The reality is that vector is a compile-time abstraction over a pointer, not a runtime one, unless you choose to use extras. What's much more important is the vastly increased safety vector offers: debugging iterators, automatic and safe resource management, etc. There is no reason to use a raw pointer.
Edit: My reference is that profiling run you did before you even considered losing the safety of vector.
According to this answer in a similar question, accessing an element in a dynamically-allocated array versus a std::vector will be roughly the same. There's some good analysis in that question and in this one as well.
If you mean to compare a std::vector with some hand-written dynamic array here are some points of reference:
The resizing factor on insertion is important. This factor isn't specified by the standard but is usually between 1.5 and 2, and it must guarantee amortized constant time for insertion operations (see the sketch after this list).
The allocator: A lot of the performance depends on the allocation mechanism that is used, same goes for your pointers.
Boundary checking can occur in std::vector if you call vector::at, which cannot happen with raw pointers.
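A small sketch that prints the capacity growth of whatever implementation you run it on; the factor you observe is implementation-defined, commonly around 1.5 or 2:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t last_capacity = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {      // a reallocation just happened
            last_capacity = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << last_capacity << '\n';
        }
    }
    return 0;
}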

Are there any practical limitations to only using std::string instead of char arrays and std::vector/list instead of arrays in c++?

I use vectors, lists, strings and wstrings obsessively in my code. Are there any catch-22s involved that should make me more interested in using arrays, chars, and wchars from time to time instead?
Basically, if working in an environment which supports the standard template library is there any case using the primitive types is actually better?
For 99% of the time and for 99% of Standard Library implementations, you will find that std::vectors will be fast enough, and the convenience and safety you get from using them will more than outweigh any small performance cost.
For those very rare cases when you really need bare-metal code, you can treat a vector like a C-style array:
vector <int> v( 100 );
int * p = &v[0];
p[3] = 42;
The C++ standard guarantees that vectors are allocated contiguously, so this is guaranteed to work.
Regarding strings, the convenience factor becomes almost overwhelming, and the performance issues tend to go away. If you go back to C-style strings, you are also going back to the use of functions like strlen(), which are inherently very inefficient themselves.
As for lists, you should think twice, and probably thrice, before using them at all, whether your own implementation or the standard one. The vast majority of computing problems are better solved using a vector/array. The reason lists appear so often in the literature is in large part because they are a convenient data structure for textbook and training course writers to use to explain pointers and dynamic allocation in one go. I speak here as an ex training course writer.
I would stick to the STL classes (vectors, strings, etc.). They are safer, easier to use, and more productive, with less chance of memory leaks, and, AFAIK, they perform some additional run-time boundary checking, at least in DEBUG builds (Visual C++).
Then, measure the performance. If you identify that the bottleneck(s) are in the STL classes, then move to C-style strings and arrays.
From my experience, the chances that the bottleneck is in vector or string usage are very low.
One problem is the overhead when accessing elements. Even with vector and string when you access an element by index you need to first retrieve the buffer address, then add the offset (you don't do it manually, but the compiler emits such code). With raw array you already have the buffer address. This extra indirection can lead to significant overhead in certain cases and is subject to profiling when you want to improve performance.
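As a rough illustration of that extra indirection, one common mitigation is to hoist the buffer pointer out of a hot loop; whether this matters at all depends on the optimizer, and the function names here are made up:

#include <cstddef>
#include <vector>

// Indexed access reloads the buffer pointer (before optimization); hoisting fetches it once.
// Modern optimizers usually produce identical code for both, so measure before bothering.
long long sum_indexed(const std::vector<int>& v) {
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

long long sum_hoisted(const std::vector<int>& v) {
    const int* p = v.data();          // fetch the buffer address once
    const std::size_t n = v.size();
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += p[i];
    return total;
}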
If you don't need real time responses, stick with your approach. They are safer than chars.
You can occasionally encounter scenarios where you'll get better performance or memory usage from doing some stuff yourself (for example, std::string typically has about 24 bytes of overhead: 12 bytes for the pointers in the std::string itself, and a header block on its dynamically allocated piece).
I have worked on projects where converting from std::string to const char* saved noticeable memory (10's of MB). I don't believe these projects are what you would call typical.
Oh, using STL will hurt your compile times, and at some point that may be an issue. When your project results in over a GB of object files being passed to the linker, you might want to consider how much of that is template bloat.
I've worked on several projects where the memory overhead for strings has become problematic.
It's worth considering in advance how your application needs to scale. If you need to be storing an unbounded number of strings, using const char*s into a globally managed string table can save you huge amounts of memory.
But generally, definitely use STL types unless there's a very good reason to do otherwise.
I believe the default memory allocation technique for vectors and strings is one that allocates double the amount of memory each time the currently allocated memory gets used up. This can be wasteful. You can provide a custom allocator, of course...
The other thing to consider is stack vs. heap. Statically sized arrays and strings can sit on the stack, or at least the compiler handles the memory management for you. Newer compilers will handle dynamically sized arrays for you too if they provide the relevant C99/C++0x feature. Vectors and strings will always use the heap, and this can introduce performance issues if you have really tight constraints.
As a rule of thumb, use what's already there unless it hurts your project with its speed/memory overhead... you'll probably find that for 99% of stuff the STL-provided classes save you time and effort with little to no impact on your application's performance (i.e. "avoid premature optimisation").