Why prefer std::vector over std::deque? [duplicate] - c++

This question already has answers here:
Why would I prefer using vector to deque
They both have access complexity of O(1) and random insertion/removal complexity of O(n). But vector costs more when expanding because of reallocation and copy, while deque does not have this issue.
It seems deque has better performance, so why do most people use vector instead of deque?

why do most people use vector instead of deque?
Because this is what they have been taught.
vector and deque serve slightly different purposes. They can both be used as a simple container of objects, if that's all you need. When learning to program C++, that is all most people need -- a bucket to drop stuff into, get stuff out of, and walk over.
When StackOverflow is asked a question like "which container should I use by default," the answer is almost invariably vector. The question is generally asked from the context of learning to program in C++, and at the point where a programmer is asking such a question, they don't yet know what they don't know. And there's a lot they don't yet know. So, we (StackOverflow) need a container that fits almost every need for better or worse, can be used in almost any context, and doesn't require that the programmer has asked all the right questions before landing on something approximating the correct answer. Furthermore, the Standard specifically recommends using vector. vector isn't best for all uses, and in fact deque is better than vector for many common uses -- but it's not so much better for a learning programmer that we should vary from the Standard's advice to newbie C++ programmers, so StackOverflow lands on vector.
After having learned the basics of the syntax and, shall we say, the strategies behind programming in C++, programmers split into two camps: those who care to learn more and write better programs, and those who don't. Those who don't will stick with vector forever. I think many programmers fall into this camp.
The rarer programmers who try to move beyond this phase start asking other questions -- questions like you've asked here. They know there is lots they don't yet know, and they want to start discovering what those things are. They will quickly (or less quickly) discover that when choosing between vector and deque, some questions they didn't think to ask before are:
Do I need the memory to be contiguous?
Do I need to avoid lots of reallocations?
Do I need to keep valid iterators after insertions?
Do I need my collection to be compatible with some ancient C-like function?
Then they really start thinking about the code they are writing, discover yet more stuff they don't know, and the beat goes on...

From C++ standard section 23.1.1:
vector is the type of sequence that should be used by default... deque is the data structure of choice when most insertions and deletions take place at the beginning or at the end of the sequence.
However, there are some arguments in the opposite direction.
In theory, vector is at least as efficient as deque, since its interface is essentially a subset of deque's. If your task only needs what vector's interface provides, prefer vector: it cannot be worse than deque.

But vector costs more when expanding because of reallocation and copy
While it's true that vector sometimes has to reallocate its array as it grows, it will grow exponentially so that the amortised complexity is still O(1). Often, you can avoid reallocations by judicious use of reserve().
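For instance, a minimal sketch of the reserve() pattern (the sizes are arbitrary):

    #include <vector>

    int main() {
        std::vector<int> v;
        v.reserve(1000);              // one allocation up front
        for (int i = 0; i < 1000; ++i)
            v.push_back(i);           // stays within capacity: no reallocation
    }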
It seems deque has better performance
There are many aspects to performance; the time taken by push_back is just one. In some applications, a container might be modified rarely, or populated on start-up and then never modified. In cases like that, iteration and access speed might be more important.
vector is the simplest possible container: a contiguous array. That means that iteration and random access can be achieved by simple pointer arithmetic, and accessing an element can be as fast as dereferencing a pointer.
deque has further requirements: it must not move the elements. This means that a typical implementation requires an extra level of indirection - it is generally implemented as something like an array of pointers to arrays. This means that element access requires dereferencing two pointers, which will be slower than one.
Of course, often speed is not a critical issue, and you choose containers according to their behavioural properties rather than their performance. You might choose vector if you need the elements to be contiguous, perhaps to work with an API based on pointers and arrays. You might choose deque or list if you want a guarantee that the elements won't move, so you can store pointers to them.
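A small sketch of that guarantee for deque: insertion at either end invalidates iterators, but pointers and references to existing elements stay valid (the loop size here is arbitrary):

    #include <cassert>
    #include <deque>

    int main() {
        std::deque<int> d;
        d.push_back(42);
        int* p = &d.front();              // remember the address of the first element
        for (int i = 0; i < 100000; ++i)
            d.push_back(i);               // growth at the ends never moves existing elements
        assert(p == &d.front() && *p == 42);  // still valid; vector makes no such promise
    }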

From cplusplus.com:
Therefore they provide a similar functionality as vectors, but with efficient insertion and deletion of elements also at the beginning of the sequence, and not only at its end. But, unlike vectors, deques are not guaranteed to store all its elements in contiguous storage locations, thus not allowing direct access by offsetting pointers to elements.

Personally, I prefer using deque (I always end up spoiling myself and having to use push_front for some reason or other), but vector does have its uses/differences, with the main one being:
vector has contiguous memory, while a deque allocates via pages/chunks.
Note, the pages/chunks are fairly useful: they give constant-time insert/erase at the front of the container. It is also typical that a large amount of memory broken up into a series of smaller blocks is more efficient to allocate than a single large block of memory.
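As a sketch of why that constant-time front insertion is handy, here is a hypothetical "most recent first" history; the function name and the size limit are made up for the example:

    #include <deque>
    #include <string>

    // Hypothetical bounded history, newest entries first.
    void remember(std::deque<std::string>& history, std::string entry) {
        history.push_front(std::move(entry));  // O(1) at the front, unlike vector
        if (history.size() > 100)
            history.pop_back();                // drop the oldest entry, also O(1)
    }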
You could also argue that, because deque is 'missing' size reservation methods (capacity/reserve), you have less to worry about.
I highly suggest you read Sutter's GotW on the topic.

Related

Can most of the data structures be implemented using vectors?

I used C++ vectors to implement stacks, queues, heaps, priority queues and directed weighted graphs. In books and references, I have seen big classes for these data structures, all of which can be implemented concisely using vectors. (Maybe there is more flexibility in using pointers?)
Can we also implement more advanced data structures using vectors?
If yes, why do C++ books still explain these concepts with long pointer-based classes?
Is it to keep the lower-level idea in mind because it is more vivid that way, or to equip students with that kind of pointer usage?
It's true that many data structures can be implemented on top of a vector (an array, for the sake of this answer). Essentially all of them can, since every computational task can be implemented to run on a Turing machine, which has a far more basic data-access capability (or, in the real world, you may say that any program you implement with pointers ultimately runs on a CPU with a simple array-like virtual memory space, so you could just call that a huge array). However, it's not always clever, for two main reasons:
performance / time complexity - a vector simply can't provide all basic operations in O(1). There is a solution for fast initialization, but try to randomly insert values into a large vector and see how badly it performs: you have to shift all the following elements over by one place, over and over. A list can do the same insertion in a single operation. Of course, other structures have their own performance shortcomings, but that's the beauty of designing complicated data structures from these basic building blocks.
structural complexity - you can think of a list along the same lines as a vector, as an ordered container, and perhaps extend this to multidimensional matrices implemented on top of them, since they still retain some basic ordering. But there are more complicated structures. Take a tree, for example: a simple full binary tree can be implemented with a vector very easily, since the parent-child relations convert to simple index arithmetic (see the sketch below). But what if the tree isn't full, with a varying number of children per node? Now, you may say it can still be done (any graph can be implemented with vectors, through an adjacency matrix or an adjacency list, for example), but there's almost no sense in doing so when you can have a much simpler implementation using pointer links. Just think of doing an AVL roll with an array. :shudder:
Mind you that the second argument may very well boil down to performance ("hey, it's an awkward approach, but I still managed to use a vector!"), but it's more than that: it would complicate your code, clutter your data-structure design, and could make it far more prone to bugs.
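For the full-binary-tree case mentioned above, a minimal sketch of the index arithmetic (0-based, the same convention a binary heap uses):

    #include <cstddef>
    #include <utility>
    #include <vector>

    // Parent-child relations of a complete binary tree stored in a vector.
    std::size_t parent(std::size_t i)      { return (i - 1) / 2; }
    std::size_t left_child(std::size_t i)  { return 2 * i + 1; }
    std::size_t right_child(std::size_t i) { return 2 * i + 2; }

    // Sift a newly appended element up towards the root of a max-heap.
    void sift_up(std::vector<int>& heap, std::size_t i) {
        while (i > 0 && heap[parent(i)] < heap[i]) {
            std::swap(heap[parent(i)], heap[i]);
            i = parent(i);
        }
    }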
Now, here comes the "but": even though there's much sense in using all the tools the language provides you, it is very widely accepted to use vector-based structures for performance-critical tasks. See almost all scientific CPU benchmarks; most of them ultimately rely on vectors (uncited, but I can elaborate further if anyone is interested; suffice to say that even the well-known Graph500 does this).
The reason is not that it's best programming practice, but that it's better suited to the CPU's internal structure and gets more "juice" out of the hardware. That's due to spatial locality: CPUs are very fond of it, as it allows the memory unit to parallelize accesses (in an array you always know where the next element is, while in a list you have to wait until the current one is fetched) and to issue stream/stride prefetches that reduce the latency of future requests.
I can't say this is always good practice; when you traverse a graph, the accesses are still pretty irregular even with an array implementation. But it's still a very common practice.
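As a rough illustration of the locality effect, here is a sketch (not a rigorous benchmark; exact numbers depend on the hardware and compiler) that traverses the same values in a vector and in a list:

    #include <chrono>
    #include <cstdio>
    #include <list>
    #include <numeric>
    #include <vector>

    // Sum a container and print how long the traversal took.
    template <typename Container>
    void timed_sum(const char* label, const Container& c) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = std::accumulate(c.begin(), c.end(), 0LL);
        auto t1 = std::chrono::steady_clock::now();
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        std::printf("%s: sum=%lld in %lld us\n", label, sum, (long long)us);
    }

    int main() {
        std::vector<int> v(1000000, 1);
        std::list<int> l(v.begin(), v.end());
        timed_sum("vector", v);  // contiguous: adjacent loads, prefetch-friendly
        timed_sum("list", l);    // each node is a dependent pointer chase
    }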
To summarize, taking the question literally: most of them can, of sorts (for a given definition of "most", ok?), but if the intention was "why teach pointers", I believe you can see that in order to understand your limits and what you can and should use, you need to know a great deal more than just arrays and even pointers. A good programmer should know a bit about everything: OS design, CPU design, and so on. You can't do anything decent unless you really understand the fabric you're running on, and that, unfortunately (or not), includes lots of pointers.
You can implement a kind of allocator using an std::vector as the backing store. If you do that, all the standard data structures from elementary computer science can be implemented on top of vectors. It will hardly free you from using pointers, though: vectors are really just chunks of memory with a few useful additional operations, most notably the ability to expand.
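As a sketch of that idea, here is a toy pool backed by a std::vector, in which "pointers" become indices into the backing store; the class and its interface are made up for illustration and do not satisfy the standard allocator requirements:

    #include <cstddef>
    #include <vector>

    template <typename T>
    class VectorPool {
    public:
        std::size_t allocate(const T& value) {
            if (!free_.empty()) {            // reuse a freed slot if available
                std::size_t i = free_.back();
                free_.pop_back();
                store_[i] = value;
                return i;
            }
            store_.push_back(value);         // otherwise grow the backing vector
            return store_.size() - 1;
        }
        void deallocate(std::size_t i) { free_.push_back(i); }
        T&   operator[](std::size_t i) { return store_[i]; }

    private:
        std::vector<T>           store_;  // the backing memory
        std::vector<std::size_t> free_;   // indices of freed slots
    };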
More to the point: if you don't understand pointers, you won't understand how to use vector for advanced data structures either. vector is a useful abstraction, but it follows the C++ rule that "you don't get what you don't pay for", so it's also a very "thin" abstraction, and you do pay for the cost of abstraction in terms of the amount of code you have to write.
(Jonathan Wakely points out, in the comments, that you won't get the exact guarantees that the C++ standard library requires of allocators when you implement data structures on top of vector. But in principle, vectors are just a way of handling blocks of memory.)
If you are learning C++, you need to be familiar with pointers and how to use them, even if there are higher-level abstractions that do that job for you.
Yes, it is possible to implement most data structures with vectors or lists, and if you have just started learning programming, it's probably a good idea to learn how to write these data structures yourself.
With that being said, production code should always use the standard library unless there is a good reason not to do so.

Why is std::vector so much more popular than std::deque? [duplicate]

Possible Duplicate:
Why would I prefer using vector to deque
I am curious why std::vector is so much more popular than std::deque. A deque is almost as efficient in lookup, more efficient in insertion (without vector::reserve), and allows inserting/deleting at the front.
Herb Sutter once recommended that if you want to use vector, you should prefer deque (I am paraphrasing). However, in a recent talk on writing modern C++ he again strongly recommends thinking of std::vector as the default container. According to the GotW I linked earlier, even the standard has similar wording.
Is there a reason for this disparity? Is it just that vector is simpler and more well known, or is there a technical reason? Or is it that vector is just a cooler name .. ?
I can't speak for anybody else, but I can for myself.
When I first read about std::deque, I thought it was cool enough that for a while, I treated it not only as the default container, but as nearly the only container I used.
Then somebody asked about why, and I expounded at length on its virtues and why it was the best container to use for practically everything, and how it was much more versatile than std::vector.
Fortunately, the guy questioning my decision on it was persuasive enough that I did some testing. Testing showed that in nearly every case, std::deque was slower than std::vector -- often by a substantial factor (e.g., around 2). In fact, of the code I'd written using std::deque, just replacing std::deque with std::vector gave a speedup in all but a handful of cases.
I have used std::deque in a few cases since then, but definitely don't treat it as the default any more. The simple fact is that in the usual implementation it's noticeably slower than std::vector for most purposes.
I should add, however, that I'm reasonably certain that with the right implementation, it could be nearly equivalent to std::vector in virtually all cases. Most use a representation that's undoubtedly great from an asymptotic viewpoint, but doesn't work out quite so wonderfully (for many purposes) in the real world.
std::vector is very well understood, simple and is compatible with C (both in terms of the memory layout, and in terms of using pointers as iterators).
For some operations it is also more efficient than std::deque. Accessing elements by index is one example.
For a given task, it makes sense to use the simplest container that does the job well. In many cases that simplest container is std::vector.
People use std::vector over std::deque for a simple reason. They interface with a lot of C libraries and they have functions with parameters which require a pointer to contiguous storage, which std::deque doesn't (can't) guarantee.
Another reason is that if you build code around std::deque and your requirements later change so that you have to support contiguous access, you will have a bit of refactoring to do.
I am compelled to mention that there are libraries out there whose authors claim to have created more efficient vector implementations in order to overcome inefficiencies when capacity is increased.
The structure of std::deque is a bit more complex, which makes naive iteration quite a bit more expensive than with std::vector. Insertions into std::vector, despite its reallocations, tend not to be a big problem, especially when using reserve() and only appending to the end. std::vector also has easier-to-understand invalidation rules, although it is actually an advantage of std::deque that objects stay put when inserting/removing only at either end (note that std::deque iterators get invalidated upon each insertion, regardless of where the insertion happens). Another plus for std::vector is the guarantee that its values are contiguous in memory; it also tends to make fewer memory allocations.
I guess I would recommend std::deque if algorithms were consistently optimized to use segmented sequences (I'm not aware of any standard C++ library that does this optimization) and if users accessed sequences consistently through algorithms (as far as I can tell, only a tiny fraction of users considers the option to use algorithms). Otherwise, I would suspect that std::deque is the better option with respect to performance only if you take advantage of its specific properties (e.g., that objects stay put and that you can insert/remove at the ends). It is worth profiling the two alternatives, though.
Apart from std::vector being the most commonly known container class, it also has several advantages over std::deque, namely:
A typical std::deque requires an additional indirection to access the elements, unlike std::vector.
Iterators for std::deque must be full-blown classes rather than the plain pointers they can be for std::vector.
Elements of a std::vector are guaranteed to be contiguous, and hence it is compatible with C-style functions that take arrays as parameters (see the sketch below).
std::deque provides no way to control capacity or the moment of reallocation.
Especially, the last point is noteworthy.
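As a sketch of the contiguity point, here is a vector's storage passed to a C-style function; the function is a hypothetical stand-in for a legacy API:

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Stand-in for a legacy C-style function that expects a raw array.
    void process(const int* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            std::printf("%d ", data[i]);
    }

    int main() {
        std::vector<int> v{1, 2, 3};
        process(v.data(), v.size());  // fine: vector storage is contiguous
        // A std::deque could not be passed this way: it has no data()
        // member, because its storage is segmented.
    }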

Should std::list be deprecated?

According to Bjarne Stroustrup's slides from his Going Native 2012 keynote, insertion and deletion in a std::list are terribly inefficient on modern hardware:
Vector beats list massively for insertion and deletion
If this is indeed true, what use cases are left for std::list? Shouldn't it be deprecated then?
Vector and list solve different problems. List provides the guarantee that iterators never become invalidated as you insert and remove other elements. Vector doesn't make that guarantee.
It's not all about performance. So the answer is no: list should not be deprecated.
Edit: Beyond this, C++ isn't designed to work solely on "modern hardware." It is intended to be useful across a much wider range of hardware than that. I'm a programmer in the financial industry and I use C++, but other domains, such as embedded devices, programmable controllers and heart-lung machines, among myriad others, are just as important. The C++ language should not be designed solely with the needs of certain domains and the performance of certain classes of hardware in mind. Just because I might not use a list doesn't mean it should be deprecated from the language.
Whether a vector outperforms a list or not also depends on the type of the elements. For example, for int elements vector is indeed very fast as most of the data fits inside the CPU cache and SIMD instructions can be used for the data copying. So the O(n) complexity of vector doesn't have much impact.
But what about larger data types, where copying doesn't translate to a stream operation, and instead data must be fetched from all over the place? Also, what about hardware that doesn't have large CPU caches and possibly also lacks SIMD instructions? C++ is used on much more than just modern desktop and workstation machines. Deprecating std::list is out of the question.
What Stroustrup is saying in that presentation is that before you pick std::list for your data, you should make sure that it's the right choice for your particular situation. In other words, benchmark and profile. It definitely doesn't say you should always pick std::vector.
No, and especially not based on one particular graph. There are instances where list will perform better than vector. See: http://www.baptiste-wicht.com/2012/12/cpp-benchmark-vector-list-deque/
And that's ignoring the non-performance differences, as others have mentioned.
Bjarne's point in that talk wasn't that you shouldn't use list. It was that people make too many assumptions about list's performance that often turn out to be wrong. He was simply justifying the stance that vector should always be your default go-to container type unless you actually find a need for the performance or other semantic characteristics of lists.
std::list is a deque in the abstract sense: it has push_front() and pop_front(). It still has a niche role as such, though it may not be the best choice for a double-ended queue.
std::list does not reallocate memory, while std::vector may. Sometimes you don't want an item to move in memory (e.g. a stackful coroutine).
Linked lists are related to tree data structures. Both contain links. If we deprecate std::list, then what about tree-based containers?
Of course not. std::list is a different data structure. Comparing different data structures is a good way to see their properties, advantages and disadvantages. But each data structure has its own advantages.

Vector vs list according to Stroustrup [duplicate]

Possible Duplicate:
When do you prefer using std::list<T> instead of std::vector<T>?
I just watched the recording of the GoingNative'12 talk by Bjarne Stroustrup. And I'm a bit confused.
In this talk he discusses, in particular, the vector vs. list question and suggests that in many cases vector is faster, even if you insert and remove intensively in the middle, because compilers can optimize a lot of things and like compact structures. The conclusion (as I understand it) is: first use vector, and later think about whether you need something else. That sounds reasonable, but taking the first observation into account, what criteria should I use? I always thought that if you insert/remove intensively, you should use list. Similar things are suggested in some topics here. See
Relative performance of std::vector vs. std::list vs. std::slist?
and
vector vs. list in STL
And now according to Stroustrup I was wrong.
Of course I can write a couple of tests and try to figure out what to use in each particular situation, but is there a theoretical way?
The most important motivation for preferring std::list over std::vector is the validity of iterators, not performance. If, at the time you're inserting or erasing, you have other iterators into the container, then you probably need std::list, since insertion doesn't invalidate any iterators, and erasure only invalidates iterators to the element being erased. About the only time std::list will win on performance is when copy and assignment are extremely expensive, and in such cases it's often a better choice to modify the contained class to reduce the cost of copy and assignment rather than switching to std::list.
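A minimal sketch of that iterator guarantee:

    #include <cassert>
    #include <iterator>
    #include <list>

    int main() {
        std::list<int> l{1, 2, 3};
        auto it = std::next(l.begin());  // iterator to the element 2
        l.insert(l.begin(), 0);          // insert elsewhere in the list
        l.erase(l.begin());              // erase an element 'it' doesn't refer to
        assert(*it == 2);                // 'it' is still valid; with std::vector
                                         // it could have been invalidated by now
    }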

Fastest way to speed up map<string,int> .find() in c++ . Where the keys are in alphabetical order

I have a map with about 100,000 pairs. Is there any way I can speed up searching when using find(), given that the keys are in alphabetical order? Also, how should I go about doing it? I know that you can specify a new comparator when you create the map, but will that speed up the find() function at all?
Thanks in advance.
[solved] Thanks a bunch, guys. I have decided to go with a vector and use lower_bound and upper_bound to "snip" some of the searching.
A different comparator will only speed up find if it manages to do the comparison faster (which, for strings will usually be pretty difficult).
If you're basically inserting all the data in order, then doing the searching, it may be faster to use a std::vector with std::lower_bound or std::upper_bound.
If you don't really care about ordering, and just want to find the data as quickly as possible, you might find that std::unordered_map works better for you.
Edit: Just for the record: the way you "might find" or "may find" those things is normally by profiling. Depending on the situation, it might be enough faster that it's pretty obvious even in simple testing, so profiling isn't really necessary, but if there's (much) doubt, or you want to quantify the effect, a profiler is probably the right way to do it.
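A minimal sketch of the sorted-vector approach suggested above (the helper name is made up for the example; it assumes the vector is sorted by key):

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    using Entry = std::pair<std::string, int>;

    // Binary-search a vector of pairs sorted by key; returns nullptr if absent.
    const int* find_value(const std::vector<Entry>& sorted, const std::string& key) {
        auto it = std::lower_bound(sorted.begin(), sorted.end(), key,
            [](const Entry& e, const std::string& k) { return e.first < k; });
        if (it != sorted.end() && it->first == key)
            return &it->second;
        return nullptr;
    }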
std::map is already taking advantage of the fact the keys are in alphabetical order - it guarantees that itself. You aren't going to be able to improve it by changing the comparator (one assumes it's already a reasonably efficient string comparison).
Have you considered using unordered_map (aka hash_map in various pre-C++11 implementations)? It should be able to search in O(1) on average, instead of the O(log n) of std::map.
You could also look into something slightly more exotic, like a trie, but that's not part of the standard library so you'd either have to find one elsewhere or roll your own, so I'd suggest unordered_map is a good place to start.
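Switching from map to unordered_map is often a near drop-in change; a minimal sketch (the lookup helper is illustrative):

    #include <string>
    #include <unordered_map>

    std::unordered_map<std::string, int> counts;  // average O(1) find/insert

    int lookup(const std::string& key) {
        auto it = counts.find(key);  // hash lookup instead of a tree walk
        return it != counts.end() ? it->second : -1;
    }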
If you're using std::find to find elements, you should switch to using map::find (you don't really say in your question.) map::find uses the fact that the map is ordered to search much faster.
If that's still not good enough, you might look into a hash container such as unordered_map rather than map.
I've put in a vote for unordered_map but I wanted to also make another point.
One of the things that can hurt performance on modern machines is poor use of the cache. A map is going to have nodes allocated all over the place and there won't be much locality of reference. Also since it has to store a bunch of pointers between nodes it will use up more memory.
At the recent Going Native 2012 conference, Bjarne Stroustrup gave an interesting talk that touched on this topic. He compared vector and list performance on a task involving a lot of random insertions and deletions, where it might seem list ought to have dominated, but because of memory size and layout issues vector was in fact the fastest by far. Take a look at his slides, starting at slide 43.
unordered_map gives you direct access to the element, so it probably means even less hopping around in memory than trying to stick your data in a vector (and thus better performance than vector). So my comment is simply an admonishment: always keep your memory access pattern in mind for performance.