Which stl container should I use when doing few inserts? - c++

I don't know my exact numbers but I'll try my best. I have a 10000-element deque that's populated right at the start. Then I scan through each element, and let's say every 20 elements I'll need to insert a new element. The insert would happen at the current position, or maybe one element back.
I don't exactly need to remember the position, but I don't exactly need random access either. I'd like fast inserts. Do deque and vector pay a heavy price on insert? Should I use list?
My other option is to have a second deque and, as I go through each element, copy it into the other deque, plus the extra insert I'm talking about when needed. This does need to be fast as it's a performance-intensive app. But I am using a lot of pointers (each element is a pointer), which bothers me, but there isn't a way around that, so should I assume the L1 cache will always miss?

I'd start with std::vector in this case, but use a second std::vector for your mass mutations, reserve() appropriately, then swap() the vectors.
Update
It would take this general form:
std::vector<t_object*> source; // << source already holds 10000 elements
std::vector<t_object*> tmp;

// Reserve up front so the whole pass costs at most one allocation and one free.
// (If you never swap, or still have to grow further, over-reserving can work against you.)
tmp.reserve(aMeaningfulReserveValue);

while (performingMassMutation)
{
    // "I scan through each element and, let's say, every 20 elements"
    for (twentyElements)
        tmp.push_back(source[readPos++]);

    // "every 20 elements I'll need to insert a new element"
    tmp.push_back(newElement);
}

// approximately 500 iterations later…
source.swap(tmp);
Borealid brought up a good point, which is measure -- execution varies dramatically depending on your std library implementations, data sizes, complexity to copy, and so on.
For raw pointers of a collection this size with my configuration, the vector mass mutation and push_back above was 7 times faster than std::list insertion. push_back was faster than vector's range insertion.
As Emile points out below, std::vector::swap() does not need to move or reallocate elements -- it can just swap out internals (provided the allocators are the same type).

First off, the answer to all performance questions is "benchmark it". Always. Now...
If you don't care about the memory overhead, and you don't need random access, but you do care about having constant-time insertions, list is probably right for you.
std::vector will have constant-time insertions at the end when it has sufficient capacity. When the capacity is exceeded, it needs a linear-time copy. deque is better because it links discrete allocations, avoiding a complete copy and letting you do constant-time insertions at the front as well. Random insertions (every 20 elements) will always be linear time.
As for cache locality, a vector is as good as you can get (contiguous memory), but you said you cared about insertions rather than lookups; in my experience, when that's the case you don't care much about how hot the cache stays as you scan through, so list's poorer locality doesn't matter much.

Lists are useful when either you frequently want to insert elements in the middle of the collection, or frequently remove them. Lists are, however, slow to read.
Vectors are very fast to read and very fast when you only want to add or remove elements at the end of the collection, but they are very slow when you insert elements in the middle. This is because it has to move all elements after the desired position by one place, to make room for the new element.
Deques are not really linked lists; they are typically implemented as a sequence of fixed-size blocks, so they can be used much like vectors while also supporting fast insertion and removal at both ends.
If you don't need to insert elements in the middle of the collection (you don't care about the order), I suggest you use vector. If you can approximate the number of elements that will be introduced in the vector from the beginning, you should also use std::vector::reserve to allocate memory necessary from the beginning. The value you pass to reserve doesn't need to be exact, just approximate; if it's smaller than needed, the vector will resize automatically, when necessary.

You can go two ways: a list is always an option for insertions at arbitrary positions, but since every element is allocated separately this has performance costs of its own. The other option, inserting in place into the deque, is not good either, because you pay linear time for every insertion. Your idea of inserting into a new deque is probably the best here: you pay twice the memory, but on the other hand you always insert either at the end of the second deque or one element before that, which gives amortized constant time, and you still keep the container's good caching behaviour.

The number of copies done for std::vector/deque ::insert etc. is proportional to the number of elements between the insert position and the end of the container (the number of elements that need to be shifted to make room). The worst case for a std::vector is O(N) -- when you insert at the front of the container. If you're inserting M elements, the worst case is therefore O(M*N), which isn't great.
There could also be a reallocation involved if the container's capacity is exceeded. You can prevent reallocation by ensuring that sufficient space is reserve()'d up front.
Your other suggestion -- copying to a second std::vector/deque container -- could be better in that it can always be organised to achieve O(N) complexity overall, but at the cost of temporarily storing two containers.
Using a std::list would allow you to achieve in-place O(1) inserts, but at the cost of additional memory overhead (storing the list pointers etc.) and reduced memory locality (list nodes are not allocated contiguously). You could improve the memory locality by using a pooled memory allocator (Boost pools, maybe?).
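As an illustration of the pooled-allocator idea, here is a minimal sketch assuming Boost.Pool is available (the element type is just an example):

#include <list>
#include <boost/pool/pool_alloc.hpp>

int main() {
    // Each list node is drawn from a Boost pool instead of an individual heap allocation.
    std::list<int, boost::fast_pool_allocator<int>> pooled_list;
    for (int i = 0; i < 1000; ++i)
        pooled_list.push_back(i);
    // Note: freed nodes stay in the pool for reuse rather than going back to the heap per erase.
}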
Overall you'd have to benchmark to really sort out which is "the fastest" approach.
Hope this helps.

If you need fast inserts in the middle, but don't care about random access, vector and deque are definitely not for you: for those, every time you insert something, all elements between that position and the end have to be moved. Of the built-in containers, list is almost certainly your best bet. However, a better data structure for your scenario would probably be a VList, because it provides better cache locality; that one is not provided by the C++ standard library, though. The Wikipedia page links to a C++ implementation, but from a quick look at the interface it doesn't seem to be completely STL-compatible; I don't know if this is an issue for you.
Of course, in the end the only way to be sure which is the optimal solution is to measure the performance.

Related

Efficient linked list in C++?

This document says std::list is inefficient:
std::list is an extremely inefficient class that is rarely useful. It performs a heap allocation for every element inserted into it, thus having an extremely high constant factor, particularly for small data types.
Comment: that comes as a surprise to me. std::list is a doubly linked list, so despite its inefficiency in element construction, it supports insert/delete in O(1) time complexity, but this feature is completely ignored in the quoted paragraph.
My question: say I need a sequential container for small-sized homogeneous elements, and this container should support element insert/delete in O(1) complexity and does not need random access (random access would be nice, but it is not a must here). I also don't want the high constant factor introduced by heap allocation for each element's construction, at least when the number of elements is small. Lastly, iterators should be invalidated only when the corresponding element is deleted. Apparently I need a custom container class, which might (or might not) be a variant of a doubly linked list. How should I design this container?
If the aforementioned specification cannot be achieved, then perhaps I should have a custom memory allocator, say, bump pointer allocator? I know std::list takes an allocator as its second template argument.
Edit: I know I shouldn't be too concerned with this issue, from an engineering standpoint - fast enough is good enough. It is just a hypothetical question so I don't have a more detailed use case. Feel free to relax some of the requirements!
Edit2: I understand two algorithms of O(1) complexity can have entirely different performance due to the difference in their constant factors.
Your requirements are exactly those of std::list, except that you've decided you don't like the overhead of node-based allocation.
The sane approach is to start at the top and only do as much as you really need:
1. Just use std::list.
   Benchmark it: is the default allocator really too slow for your purposes?
   - No: you're done.
   - Yes: goto 2.
2. Use std::list with an existing custom allocator such as the Boost pool allocator.
   Benchmark it: is the Boost pool allocator really too slow for your purposes?
   - No: you're done.
   - Yes: goto 3.
3. Use std::list with a hand-rolled custom allocator finely tuned to your unique needs, based on all the profiling you did in steps 1 and 2.
   Benchmark as before, etc.
4. Consider doing something more exotic as a last resort.
   If you get to this stage, you should have a really well-specified SO question, with lots of detail about exactly what you need (e.g. "I need to squeeze n nodes into a cache line" rather than "this doc said this thing is slow and that sounds bad").
PS. The above makes two assumptions, but both are worth investigation:
as Baum mit Augen points out, it's not sufficient to do simple end-to-end timing, because you need to be sure where your time is going. It could be the allocator itself, or cache misses due to the memory layout, or something else. If something's slow, you still need to be sure why before you know what ought to change.
your requirements are taken as a given, but finding ways to weaken requirements is often the easiest way to make something faster.
do you really need constant-time insertion and deletion everywhere, or only at the front, or the back, or both but not in the middle?
do you really need those iterator invalidation constraints, or can they be relaxed?
are there access patterns you can exploit? If you're frequently removing an element from the front and then replacing it with a new one, could you just update it in-place?
As an alternative, you can use a growable array and handle the links explicitly, as indexes into the array.
Unused array elements are put in a linked list using one of the links. When an element is deleted, it is returned to the free list. When the free list is exhausted, grow the array and use the next element.
For the new free elements, you have two options:
append them to the free list at once,
append them on demand, based on the number of elements in the free list vs. the array size.
The requirement of not invalidating any iterators except the one pointing at a deleted node rules out every container that doesn't allocate individual nodes, and is very different from, e.g., list or map.
However, I've found that in almost every case when I thought that this was necessary, it turned out with a little discipline I could just as well do without. You might want to verify if you can, you would benefit greatly.
While std::list is indeed the "correct" thing if you need something like a list (for CS class, mostly), the statement that it is almost always the wrong choice is, unluckily, exactly right. While the O(1) claim is entirely true, it's nevertheless abysmal in relation to how actual computer hardware works, which gives it a huge constant factor. Note that not only are the objects that you iterate over randomly placed, but the nodes that you maintain are, too (yes, you can somehow work around that with an allocator, but that is not the point). On average, you have two guaranteed cache misses for anything you do, plus up to two dynamic allocations for mutating operations (one for the object, and another one for the node) -- but see the edit below.
Edit: As pointed out by @ratchetfreak below, implementations of std::list commonly collapse the object and node allocation into one memory block as an optimization (akin to what e.g. make_shared does), which makes the average case somewhat less catastrophic (one allocation per mutation and one guaranteed cache miss instead of two).
A new, different consideration in this case might be that doing so may not be entirely trouble-free either. Postfixing the object with two pointers means reversing direction while dereferencing, which may interfere with auto-prefetch.
Prefixing the object with the pointers, on the other hand, means you push the object back by two pointers' size, which will mean as much as 16 bytes on a 64-bit system (that might split a mid-sized object over cache line boundaries every time). There is also the consideration that std::list cannot afford to break e.g. SSE code solely because it adds a clandestine offset as a special surprise (so for example the xor-trick would likely not be applicable for reducing the two-pointer footprint). There would likely have to be some amount of "safe" padding to make sure objects added to a list still work the way they should.
I am unable to tell whether these are actual performance problems or merely distrust and fear from my side, but I believe it's fair to say that there may be more snakes hiding in the grass than one expects.
It's not for no reason that high-profile C++ experts (Stroustrup, notably) recommend using std::vector unless you have a really good reason not to.
Like many people before, I've tried to be smart about using (or inventing) something better than std::vector for one or the other particular, specialized problem where it seems you can do better, but it turns out that simply using std::vector is still almost always the best, or second best option (if std::vector happens to be not-the-best, std::deque is usually what you need instead).
You have way fewer allocations than with any other approach, way less memory fragmentation, way fewer indirections, and a much more favorable memory access pattern. And guess what, it's readily available and just works.
The fact that every now and then inserts require a copy of all elements is (usually) a total non-issue. You think it is, but it's not. It happens rarely and it is a copy of a linear block of memory, which is exactly what processors are good at (as opposed to many double-indirections and random jumps over memory).
If the requirement not to invalidate iterators is really an absolute must, you could for example pair a std::vector of objects with a dynamic bitset or, for lack of something better, a std::vector<bool>. Then use reserve() appropriately so reallocations do not happen. When deleting an element, do not remove it but only mark it as deleted in the bitmap (call the destructor by hand). At appropriate times, when you know that it's OK to invalidate iterators, call a "vacuum cleaner" function that compacts both the bit vector and the object vector. There, all unforeseeable iterator invalidations gone.
Yes, that requires maintaining one extra "element was deleted" bit, which is annoying. But a std::list must maintain two pointers as well, in addition to the actual object, and it must do allocations. With the vector (or two vectors), access is still very efficient, as it happens in a cache-friendly way. Iterating, even when checking for deleted nodes, still means you move linearly or almost-linearly over memory.
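A minimal sketch of that mark-and-compact idea (simplified: elements stay alive until the vacuum pass instead of being destroyed by hand, and the names are mine, not from the answer):

#include <cstddef>
#include <vector>

template <class T>
class StableVector {
public:
    std::size_t push_back(const T& value) {               // returns a stable index
        items.push_back(value);
        alive.push_back(true);
        return items.size() - 1;
    }
    void mark_erased(std::size_t i) { alive[i] = false; } // O(1), nothing moves
    T& operator[](std::size_t i) { return items[i]; }

    // Call only when it is known to be safe to invalidate indices/iterators.
    void vacuum() {
        std::size_t out = 0;
        for (std::size_t in = 0; in < items.size(); ++in)
            if (alive[in]) { items[out] = items[in]; alive[out] = true; ++out; }
        items.resize(out);
        alive.resize(out);
    }

private:
    std::vector<T>    items;
    std::vector<bool> alive;   // the "element was deleted" bitmap
};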
std::list is a doubly linked list, so despite its inefficiency in element construction, it supports insert/delete in O(1) time complexity, but this feature is completely ignored in this quoted paragraph.
It's ignored because it's a lie.
The problem of algorithmic complexity is that it generally measures one thing. For example, when we say that insertion in a std::map is O(log N), we mean that it performs O(log N) comparisons. The costs of iterating, fetching cache lines from memory, etc... are not taken into account.
This greatly simplifies analysis, of course, but unfortunately does not necessarily map cleanly to real-world implementation complexities. In particular, one egregious assumption is that memory allocation is constant-time. And that, is a bold-faced lie.
General purpose memory allocators (malloc and co), do not have any guarantee on the worst-case complexity of memory allocations. The worst-case is generally OS-dependent, and in the case of Linux it may involve the OOM killer (sift through the ongoing processes and kill one to reclaim its memory).
Special purpose memory allocators could potentially be made constant time... within a particular range of number of allocations (or maximum allocation size). Since Big-O notation is about the limit at infinity, it cannot be called O(1).
And thus, where the rubber meets the road, the implementation of std::list does NOT in general feature O(1) insertion/deletion, because the implementation relies on a real memory allocator, not an ideal one.
This is pretty depressing; however, you need not lose all hope.
Most notably, if you can figure out an upper-bound to the number of elements and can allocate that much memory up-front, then you can craft a memory allocator which will perform constant-time memory allocation, giving you the illusion of O(1).
Use two std::lists: One "free-list" that's preallocated with a large stash of nodes at startup, and the other "active" list into which you splice nodes from the free-list. This is constant time and doesn't require allocating a node.
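A minimal sketch of that splice-based scheme (the element type and sizes are placeholders); splice() moves a node between lists without allocating:

#include <list>

int main() {
    std::list<int> free_nodes(1000);   // preallocated stash of default-constructed nodes
    std::list<int> active;

    // "Allocate": move one node from the free-list into the active list -- O(1), no new.
    active.splice(active.end(), free_nodes, free_nodes.begin());
    active.back() = 42;                // fill in the payload

    // "Free": move the node back onto the free-list instead of deallocating it.
    free_nodes.splice(free_nodes.begin(), active, active.begin());
}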
The new slot_map proposal claims O(1) for insert and delete.
There is also a link to a video with a proposed implementation and some previous work.
If we knew more about the actual structure of the elements there might be some specialized associative containers that are much better.
I would suggest doing exactly what @Yves Daoust says, except instead of using a linked list for the free list, use a vector. Push and pop the free indices on the back of the vector. This is amortized O(1) insert, lookup, and delete, and doesn't involve any pointer chasing. It also doesn't require any annoying allocator business.
The simplest way I see to fulfill all your requirements:
Constant-time insertion/removal (hope amortized constant-time is okay for insertion).
No heap allocation/deallocation per element.
No iterator invalidation on removal.
... would be something like this, just making use of std::vector:
template <class T>
struct Node
{
    // Stores the memory for an instance of 'T'.
    // Use placement new to construct the object and
    // manually invoke its dtor as necessary.
    typename std::aligned_storage<sizeof(T), alignof(T)>::type element;

    // Points to the next element or the next free
    // element if this node has been removed.
    int next;

    // Points to the previous element.
    int prev;
};

template <class T>
class NodeIterator
{
public:
    ...
private:
    std::vector<Node<T>>* nodes;
    int index;
};

template <class T>
class Nodes
{
public:
    ...
private:
    // Stores all the nodes.
    std::vector<Node<T>> nodes;

    // Points to the first free node or -1 if the free list
    // is empty. Initially this starts out as -1.
    int free_head;
};
... and hopefully with a better name than Nodes (I'm slightly tipsy and not so good at coming up with names at the moment). I'll leave the implementation up to you but that's the general idea. When you remove an element, just do a doubly-linked list removal using the indices and push it to the free head. The iterator doesn't invalidate since it stores an index to a vector. When you insert, check if the free head is -1. If not, overwrite the node at that position and pop. Otherwise push_back to the vector.
Illustration
Diagram (nodes are stored contiguously inside std::vector, we simply use index links to allow skipping over elements in a branchless way along with constant-time removals and insertions anywhere):
Let's say we want to remove a node. This is your standard doubly-linked list removal, except we use indices instead of pointers and you also push the node to the free list (which just involves manipulating integers):
Removal adjustment of links:
Pushing removed node to free list:
Now let's say you insert to this list. In that case, you pop off the free head and overwrite the node at that position.
After insertion:
Insertion to the middle in constant-time should likewise be easy to figure out. Basically you just insert to the free head or push_back to the vector if the free stack is empty. Then you do your standard double-linked list insertion. Logic for the free list (though I made this diagram for someone else and it involves an SLL, but you should get the idea):
Make sure you properly construct and destroy the elements using placement new and manual calls to the dtor on insertion/removal. If you really want to generalize it, you'll also need to think about exception-safety and we also need a read-only const iterator.
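Since the implementation is left to the reader, here is a trimmed-down sketch of the same idea (int payload and no placement new to keep it short, only push_back and erase shown, and the names are mine):

#include <cstdio>
#include <vector>

struct Node { int value; int prev; int next; };

struct IndexedList {
    std::vector<Node> nodes;
    int head = -1, tail = -1, free_head = -1;

    int acquire() {                          // reuse a freed slot, or grow the vector
        if (free_head != -1) { int i = free_head; free_head = nodes[i].next; return i; }
        nodes.push_back({});
        return static_cast<int>(nodes.size()) - 1;
    }
    int push_back(int v) {                   // link a node at the tail
        int i = acquire();
        nodes[i] = { v, tail, -1 };
        if (tail != -1) nodes[tail].next = i; else head = i;
        tail = i;
        return i;                            // stable index, usable like an iterator
    }
    void erase(int i) {                      // unlink in O(1), push the slot onto the free list
        int p = nodes[i].prev, n = nodes[i].next;
        if (p != -1) nodes[p].next = n; else head = n;
        if (n != -1) nodes[n].prev = p; else tail = p;
        nodes[i].next = free_head;
        free_head = i;
    }
};

int main() {
    IndexedList l;
    int a = l.push_back(1);
    l.push_back(2);
    l.push_back(3);
    l.erase(a);                                      // constant time, no shifting
    for (int i = l.head; i != -1; i = l.nodes[i].next)
        std::printf("%d ", l.nodes[i].value);        // prints: 2 3
}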
Pros and Cons
The benefit of such a structure is that it allows very rapid insertions/removals from anywhere in the list (even for a gigantic list), insertion order is preserved for traversal, and it never invalidates iterators to elements which aren't directly removed (though it will invalidate pointers to them; use deque if you don't want pointers to be invalidated). Personally I'd find more use for it than std::list (which I practically never use).
For large enough lists (say, larger than your entire L3 cache as a case where you should definitely expect a huge edge), this should vastly outperform std::vector for removals and insertions to/from the middle and front. Removing elements from vector can be quite fast for small ones, but try removing a million elements from a vector starting from the front and working towards the back. There things will start to crawl while this one will finish in the blink of an eye. std::vector is ever-so-slightly overhyped IMO when people start using its erase method to remove elements from the middle of a vector spanning 10k elements or more, though I suppose this is still preferable over people naively using linked lists everywhere in a way where each node is individually allocated against a general-purpose allocator while causing cache misses galore.
The downside is that it only supports sequential access, requires the overhead of two integers per element, and as you can see in the above diagram, its spatial locality degrades if you constantly remove things sporadically.
Spatial Locality Degradation
The loss of spatial locality as you start removing and inserting a lot from/to the middle will lead to zig-zagging memory access patterns, potentially evicting data from a cache line only to go back and reload it during a single sequential loop. This is generally inevitable with any data structure that allows removals from the middle in constant time while likewise allowing that space to be reclaimed while preserving the order of insertion. However, you can restore spatial locality by offering some compaction method, or by copying/swapping the list: the copy constructor can copy the list in a way that iterates through the source list and inserts all the elements, which gives you back a perfectly contiguous, cache-friendly vector with no holes (though doing this will invalidate iterators).
Alternative: Free List Allocator
An alternative that meets your requirements is to implement a free list conforming to std::allocator and use it with std::list. I never liked reaching around data structures and messing with custom allocators, though, and that approach would double the memory used by the links on 64-bit by using pointers instead of 32-bit indices. So I'd personally prefer the solution above, using std::vector as, in effect, your memory allocator and indices instead of pointers (which both reduces size and becomes a requirement if we use std::vector, since pointers would be invalidated whenever the vector reserves a new capacity).
Indexed Linked Lists
I call this kind of thing an "indexed linked list" as the linked list isn't really a container so much as a way of linking together things already stored in an array. And I find these indexed linked lists exponentially more useful since you don't have to get knee-deep in memory pools to avoid heap allocations/deallocations per node and can still maintain reasonable locality of reference (great LOR if you can afford to post-process things here and there to restore spatial locality).
You can also make this singly-linked if you add one more integer to the node iterator to store the previous node index (comes free of memory charge on 64-bit assuming 32-bit alignment requirements for int and 64-bit for pointers). However, you then lose the ability to add a reverse iterator and make all iterators bidirectional.
Benchmark
I whipped up a quick version of the above since you seem interested in 'em: release build, MSVC 2012, no checked iterators or anything like that:
--------------------------------------------
- test_vector_linked
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000015 secs}
Erasing half the list...
time passed for 'erasing': {0.000021 secs}
time passed for 'iterating': {0.000002 secs}
time passed for 'copying': {0.000003 secs}
Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]
finished test_vector_linked: {0.062000 secs}
--------------------------------------------
- test_vector
--------------------------------------------
Inserting 200000 elements...
time passed for 'inserting': {0.000012 secs}
Erasing half the vector...
time passed for 'erasing': {5.320000 secs}
time passed for 'iterating': {0.000000 secs}
time passed for 'copying': {0.000000 secs}
Results (up to 10 elements displayed):
[ 11 13 15 17 19 21 23 25 27 29 ]
finished test_vector: {5.320000 secs}
Was too lazy to use a high-precision timer, but hopefully that gives an idea of why one shouldn't use vector's linear-time erase method in critical paths for non-trivial input sizes, with vector above taking ~86 times longer (and it gets worse the larger the input size -- I tried with 2 million elements originally but gave up after waiting almost 10 minutes), and why I think vector is ever-so-slightly overhyped for this kind of use. That said, we can turn removal from the middle into a very fast constant-time operation without shuffling the order of the elements, without invalidating indices and iterators storing them, and while still using vector... All we have to do is simply make it store a linked node with prev/next indices to allow skipping over removed elements.
For removal I used a randomly shuffled source vector of even-numbered indices to determine what elements to remove and in what order. That somewhat mimics a real-world use case where you're removing from the middle of these containers through indices/iterators you formerly obtained, like removing the elements the user formerly selected with a marquee tool after he hits the delete button (and again, you really shouldn't use scalar vector::erase for this with non-trivial sizes; it'd even be better to build a set of indices to remove and use remove_if -- still better than vector::erase called for one iterator at a time).
Note that iteration does become slightly slower with the linked nodes, and that doesn't have to do with iteration logic so much as the fact that each entry in the vector is larger with the links added (more memory to sequentially process equates to more cache misses and page faults). Nevertheless, if you're doing things like removing elements from very large inputs, the performance skew is so epic for large containers between linear-time and constant-time removal that this tends to be a worthwhile exchange.
I second @Useless' answer, particularly PS item 2 about revising requirements. If you relax the iterator invalidation constraint, then using std::vector<> is Stroustrup's standard suggestion for a small-number-of-items container (for reasons already mentioned in the comments). Related questions on SO.
Starting from C++11 there is also std::forward_list.
Also, if standard heap allocation for elements added to the container is not good enough, then I would say you need to look very carefully at your exact requirements and fine tune for them.
I just wanted to make a small comment about your choice. I'm a huge fan of vector because of its read speeds, and you can directly access any element, and do sorting if need be (a vector of class/struct, for example).
But anyway, I digress; there are two nifty tips I wanted to disclose.
With vector, inserts can be expensive, so a neat trick: don't insert if you can get away with not doing it. Do a normal push_back (put it at the end), then swap the element with the one you want.
Same with deletes. They are expensive. So swap the element with the last one, then delete it.
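A tiny sketch of both tricks (helper names are mine; this is only valid when you don't care about element order):

#include <cstddef>
#include <utility>
#include <vector>

// Erase without shifting: overwrite the victim with the last element, then pop.
template <class T>
void unordered_erase(std::vector<T>& v, std::size_t i) {
    std::swap(v[i], v.back());
    v.pop_back();
}

// "Insert" at position i without shifting: append, then swap into place.
// The element that used to be at position i ends up at the back.
template <class T>
void unordered_insert(std::vector<T>& v, std::size_t i, const T& value) {
    v.push_back(value);
    std::swap(v[i], v.back());
}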
Thanks for all the answers.
This is a simple - though not rigorous - benchmark.
// list.cc
#include <list>
using namespace std;

int main() {
    for (size_t k = 0; k < 1e5; k++) {
        list<size_t> ln;
        for (size_t i = 0; i < 200; i++) {
            ln.insert(ln.begin(), i);
            if (i != 0 && i % 20 == 0) {
                ln.erase(++++++++++ln.begin());
            }
        }
    }
}
and
// vector.cc
#include <vector>
using namespace std;

int main() {
    for (size_t k = 0; k < 1e5; k++) {
        vector<size_t> vn;
        for (size_t i = 0; i < 200; i++) {
            vn.insert(vn.begin(), i);
            if (i != 0 && i % 20 == 0) {
                vn.erase(++++++++++vn.begin());
            }
        }
    }
}
This test aims to test what std::list claims to excel at - O(1) inserting and erasing. And, because of the positions I ask to insert/delete, this race is heavily skewed against std::vector, because it has to shift all the following elements (hence O(n)), while std::list doesn't need to do that.
Now I compile them.
clang++ list.cc -o list
clang++ vector.cc -o vector
And test the runtime. The result is:
time ./list
./list 4.01s user 0.05s system 91% cpu 4.455 total
time ./vector
./vector 1.93s user 0.04s system 78% cpu 2.506 total
std::vector has won.
Compiled with optimization O3, std::vector still wins.
time ./list
./list 2.36s user 0.01s system 91% cpu 2.598 total
time ./vector
./vector 0.58s user 0.00s system 50% cpu 1.168 total
std::list has to call heap allocation for each element, while std::vector can allocate heap memory in batch (though it might be implementation-dependent), hence std::list's insert/delete has a higher constant factor, though it is O(1).
No wonder this document says
std::vector is well loved and respected.
EDIT: std::deque does even better in some cases, at least for this task.
// deque.cc
#include <deque>
using namespace std;

int main() {
    for (size_t k = 0; k < 1e5; k++) {
        deque<size_t> dn;
        for (size_t i = 0; i < 200; i++) {
            dn.insert(dn.begin(), i);
            if (i != 0 && i % 20 == 0) {
                dn.erase(++++++++++dn.begin());
            }
        }
    }
}
Without optimization:
./deque 2.13s user 0.01s system 86% cpu 2.470 total
Optimized with O3:
./deque 0.27s user 0.00s system 50% cpu 0.551 total

What container to choose for fast search/insert with huge amounts of data?

So it's a thought experiment. I want to have a huge collection of structures such as:
struct
{
    KeyType key;
    ValueType value;
}
And I need fast access by a key and fast insertion of new values.
I would not use std::map because it has too big a memory overhead per element, and for huge amounts of data that might be drastic. Right?
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
What if I use deque? As I know it has O(1) for push_back/push_front, but still O(n) for inserting (as it would have to move data anyway, less data though).
The questions are:
1) Is O(n) of inserting data in deque much faster in real situation than O(n) in vector?
2) What happens when you insert a value to Deque and the bucket it should go into is full?
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Thanks!
I would not use std::map because it has too big a memory overhead per element, and for huge amounts of data that might be drastic. Right?
That depends on the size of your structs... the bigger they are the less the overheads are as a proportion of the overall memory use. For example, a std::map implementation might average say 20 bytes of housekeeping data per element (I just made that up - measure on your own system), so if your struct size is in the hundreds of bytes - who cares...? But, if the struct holds 2 ints, it's a big proportion....
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
Totally unsuitable....
1) Is O(n) of inserting data in deque much faster in real situation than O(n) in vector?
As deque is likely implemented as a vector of fixed-sized arrays, insertion implies a shuffling of all elements towards the nearest end of the container. The shuffling's probably a tiny bit less cache efficient, but if inserting nearer the front of the container it would likely still end up faster.
2) What happens when you insert a value to Deque and the bucket it should go into is full?
As above, it'll need to shuffle, overflowing either:
the last element to become the first element of the next "bucket", moving all those elements along and overflowing into the next bucket, etc.
the first element to become the last element of the previous bucket, moving all those elements along and overflowing into the previous bucket, etc.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
unordered_map, which is implemented as a hash map. If you have small objects (e.g. less than 20 or 30 bytes) or a firm cap on the number of elements, you can normally easily outperform unordered_map with custom code, but it's rarely worth the effort unless table access dominates your application's performance, and that performance is critical.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Consider using std::unordered_map, which is an implementation of a hash map. Insertion, lookup, and removal are all O(1) in the average case. This assumes that you will only ever look for an item based on its exact key; if your searches can have different constraints then you either need a different structure, or you need multiple maps to map the various keys you will search for to the corresponding object.
This requires that there is an available hash function for KeyType, either as part of the standard library or provided by you.
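For example, if KeyType is a user-defined struct, you can supply the hash (and equality) yourself; this is just an illustrative sketch, not code from the question:

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct KeyType { int id; std::string name; };     // hypothetical key

struct KeyHash {
    std::size_t operator()(const KeyType& k) const {
        return std::hash<int>()(k.id) ^ (std::hash<std::string>()(k.name) << 1);
    }
};
struct KeyEq {
    bool operator()(const KeyType& a, const KeyType& b) const {
        return a.id == b.id && a.name == b.name;
    }
};

int main() {
    std::unordered_map<KeyType, int, KeyHash, KeyEq> m;
    m[{1, "foo"}] = 42;            // average O(1) insertion
    int v = m.at({1, "foo"});      // average O(1) lookup
    (void)v;
}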
There's no container that gives you the best of all worlds. As you say, you want the best lookup/insertion with the minimum amount of space needed for storing elements.
Below is a list of containers you could consider for your implementation:
VECTOR:
Strengths:
1) Space is allocated only for holding data.
2) Good for random access.
3) Container of choice if insertions/deletions are not in the middle of the container.
Weaknesses:
1) Poor performance if insertions/deletions are in the middle.
2) Reallocations happen if reserve is not used properly.
DEQUE:
Choose deque over vector in case insertions/deletions are at the beginning as well as the end of the container.
MAP:
Disadvantage over vector:
1) More space is allocated for holding pointers.
Advantages over vector:
1) Better insertions/deletions/lookup as compared to vector.
If std::unordered_map is used, then these dictionary operations are amortized O(1).
Firstly, in order to directly answer your questions:
1) Is O(n) of inserting data in deque much faster in real situation
than O(n) in vector?
The number of elements that have to be moved is (on average) only half compared to vector. However, it can actually perform worse, as the data is stored in non-contiguous memory, so copying/moving the same number of elements is much less efficient (it cannot, e.g., be implemented in terms of a single memcpy operation).
2) What happens when you insert a value to Deque and the bucket it
should go into is full?
At least for the GNU libstdc++ implementation, every bucket except the first and last one is always full. I believe that inserting in the middle means that all elements are moved/copied one slot towards the closer end (front or back) and the effect ripples through all buckets until the first or last one is reached.
In summary, the only scenario where std::deque is consistently better than vector is if you use it as (surprise) a queue (only inserting and removing elements at the front or back), and that's what the implementation is optimized for. It is not optimized for insertions in the middle.
3) Is there another preferable type of container in case you need to
store lots of data and need two fast operations: search and insertion?
As already stated by others: A hash table like std::unordered_map is the data structure you are looking for.
From what I've heard, however, std::unordered_map is a slightly suboptimal implementation of it, as it uses buckets in order to resolve hash collisions and those buckets are implemented as linked lists (here is a very interesting talk from Chandler Carruth on the general topic of the performance of different data structures). For random access on big data structures, cache locality should matter a lot less, so this is probably not such a big issue in your case.
Finally I'd like to mention that if your value and key types are small PODs, and depending on how big your huge collection is (are we talking about some millions or rather billions of elements) and how often you actually have to insert/remove elements, there might still be cases where a simple std::vector outperforms any other STL container. So as always: if your thought experiment ever becomes reality, try it out and measure.

Choosing List or Vector for a given scenario in C++

For my application I am using a std::vector. I am inserting into the vector at the end, but erasing from the vector randomly, i.e. an element can be erased from the middle, the front, anywhere. These are the only two requirements: 1) insert at the end, 2) erase from anywhere.
So should I use std::list, since erasing from a vector shifts data? Or should I retain the vector in my code for some reason?
Please comment: if vector is the better option, how is it better than list here?
One key reason to use std::vector over std::list is cache locality. A list is terrible in this regard, because its elements can be (and usually are) fragmented in your memory. This will degrade performance significantly.
Some would recommend using std::vector almost always. In terms of performance, cache locality is often more important than the complexity of insertion or deletion.
Here's a video about Bjarne Stroustrup's opinion regarding the subject.
I would refer you to this cheat sheet, and the conclusion would be the list.
A list supports deletion at an arbitrary but known position in constant time.
Finding that position takes linear time, just like modifying a vector.
The only advantage of the list is if you repeatedly erase (or insert) at (approximately) the same position.
If you're erasing more or less at random, chances are that the better memory locality of the vector could win out in the end.
The only way to be sure is to measure and compare.
List is most probably better in this case. The advantage of a list over a vector is that it supports deletion at an arbitrary position with constant complexity. A vector would only be a better choice if you require constant-time indexed access to the elements of the container. You still have to take into consideration how the element you would like to delete is passed to your deletion function. If you only pass an index, a vector can find the element in constant time, while with a list you will have to iterate. In this case I would benchmark the two solutions, but I would still bet on the list performing better.
It depends on many factors and how are you using your data.
One factor: do you need an erase that maintains the order of the collection, or can you live with a changing order?
Other factor: what kind of data is in the collection? (numbers: ints/floats) || pointers || objects
Not keeping order
You could continue using vector if the data consists of basic numbers or pointers. To delete one element, copy the last element of the vector over the deleted position, then pop_back(). This way you avoid moving all the data.
If using objects, you could still use the same method if the object you need to copy is small.
Keeping order
Maybe list would be your friend here. Still, some tests would be advised; it depends on the size of the data, the size of the list, etc.

Keeping an unordered list of small objects with frequent insertions and removals

Suppose I have a list of small objects that I iterate through (say, in a loop) with frequent insertions and removals. However, the sequential order that I iterate through the list does not matter. Instead of using std::list to store the elements, I was thinking about using std::vector in the following way (for constant time removals):
Insertion: use push_back to insert at the end of the array.
Removal: let's say I want to remove an element at position k from a vector of size n. Then, I copy the content of the nth (or (n-1)st, depending on how you see it) element to the kth element and use pop_back. Given that the elements are small, the copy operation shouldn't be costly.
This is to take advantage of contiguous memory and not having to dynamically allocate memory for every insertion. Is there a downside for this approach? I also noticed that C++11 has unordered_set, but I think this may be overkill for what I'm trying to do.
I apologize if this idea sounds blatantly obvious.
Your idea is the basic approach to keep an array efficient. If the order really doesn't matter for you, I think it's the ideal approach. You might want to encapsulate it in a class (a wrapper around std::vector) so that you can employ it in multiple places without code duplication, test it separately and generally follow the "single responsibility" principle.
If you have access to C++11 features, you won't even have to copy the n-th element - you can move it instead, making this feasible even for heavier objects.
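For instance, the removal described in the question could look like this with a move; the helper name is mine:

#include <cstddef>
#include <utility>
#include <vector>

// Remove element k by moving the last element into its slot, then shrinking.
// Order is not preserved, but nothing else has to shift.
template <class T>
void swap_remove(std::vector<T>& v, std::size_t k) {
    if (k + 1 != v.size())
        v[k] = std::move(v.back());   // move instead of copy (C++11)
    v.pop_back();
}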
I can't see a downside to the approach given your fairly loose requirements.
One other option to consider: if your item is cheaper to swap than to copy, you can swap the last item with the one to delete and then pop your now-swapped item off the end.
It really does sound like unordered_set is too heavyweight for your needs, since it provides constant-time find that your requirements don't call for.

What is better, a STL list or a STL Map for 20 entries, considering order of insertion is as important as the search speed

I have the following scenario.The implementation is required for a real time application.
1) I need to store at most 20 entries in a container (STL map, STL list, etc.).
2) If a new entry comes and 20 entries are already present, I have to overwrite the oldest entry with the new entry.
Considering point 2, I feel that if the container is full (max 20 entries), list is the best bet, as I can always remove the first entry in the list and add the new one at the end (push_back). However, search won't be as efficient.
For only 20 entries, does it really make a big difference in terms of search efficiency if I use a list in place of a map?
Also, considering the cost of insertion in a map, I feel I should go for a list?
Could you please tell me which is the better bet for me?
1) I need to store at most 20 entries in a container (STL map, STL list, etc.). 2) If a new entry comes and 20 entries are already present, I have to overwrite the oldest entry with the new entry.
This seems to me the job for boost::circular_buffer.
In general the term circular buffer refers to an area in memory which is used to store incoming data. When the buffer is filled, new data is written starting at the beginning of the buffer and overwriting the old.
The circular_buffer is a STL compliant container. It is a kind of sequence similar to std::list or std::deque. It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms. The circular_buffer is especially designed to provide fixed capacity storage. When its capacity is exhausted, newly inserted elements will cause elements either at the beginning or end of the buffer (depending on what insert operation is used) to be overwritten.
The circular_buffer only allocates memory when created, when the capacity is adjusted explicitly, or as necessary to accommodate resizing or assign operations. On the other hand, there is also a circular_buffer_space_optimized available. It is an adaptor of the circular_buffer which does not allocate memory at once when created, rather it allocates memory as needed.
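A minimal usage sketch, assuming Boost is available (the capacity of 20 matches the question):

#include <boost/circular_buffer.hpp>
#include <iostream>

int main() {
    boost::circular_buffer<int> entries(20);   // fixed capacity of 20
    for (int i = 0; i < 25; ++i)
        entries.push_back(i);                  // once full, the oldest entry is overwritten
    std::cout << entries.front() << '\n';      // prints 5: entries 0..4 were overwritten
}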
For the fast search, I think that with just 20 elements (if their comparison isn't too complicated) you're ok with a "low-cost" container like this and normal linear search, in my opinion it would be difficult to achieve better performance with other STL containers.
Maintain order of insertion, or allow fast searching: choose one.
std::map is not an option here because it doesn't maintain the order of insertion. Besides, it's an associative container. You should choose between a list, a deque and a vector. In terms of performance your best bet is a list, since you can pop off an element from the back and insert a new one at the front (or vice-versa) without any shifting or performance penalty.
The cost of insertion in a map, just as a side note, isn't expensive at all: it's in the order of O(log n). Practically irrelevant in the case of 20 elements. The same holds for a std::set.
With only 20 elements, I would not worry much about which container you use. If you determine that the container chosen is in fact a detriment to the performance of your application, it should be relatively easy to swap out the container chosen and replace it with a more-efficient container later.
With that being said, for a large number of elements, the std::deque would probably give you the best all-around efficiency for what you are trying to accomplish. Unlike std::vector, std::deque allows for removal from the front without needing to move all of the other elements. Unlike std::list, std::deque allows for random access of its elements.
You just need to implement a priority queue. STL Map doesn't work.
It depends on the size of the elements.
I know from my own experience that for five integers an unordered array of integers searched with linear search is faster than a set, a list or insertion sort and binary search on an ordered array.
The big-O complexity of an unordered array may look much worse than any of the other options, but the normally unseen constant factor hidden by the notation is so much smaller.
A list, set or map (anything that uses dynamic memory and is linked by pointers) will be dominated by cache misses, memory allocations and indirect reference penalties.
You need a Priority Queue implemented on an array.
See the Binary Heap for an implementation.
Do you already know that this is a bottleneck?
My advice would be to first use what is more natural to read while programming and only optimize it when you see that the performance is not what you need.
My suggestion would be to make a circular buffer. But that only works if "old" is determined by when it was inserted, and not some field.
If you need to have a proper LRU, then you should probably go and look at something like http://www.codeproject.com/KB/recipes/LRUCache.aspx?fid=1000025&df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=15
But with 20 entries as your max, it will be very hard for you to find a complex algorithm that is actually faster than a trivial linear check of every element.