What container to choose for fast search/insert with huge amounts of data? - c++

So it's a thought experiment. I want to have a huge collection of structures such as:
struct
{
KeyType key;
ValueType value;
}
And I need fast access by a key and fast insertion of new values.
I would not use std::map cuz it has too big memory overhead for one structure and for huge amounts of data it might be drastical. Right?
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
What if I use deque? As I know it has O(1) for push_back/push_front, but still O(n) for inserting (as it would have to move data anyway, less data though).
The questions are:
1) Is O(n) of inserting data in deque much faster in real situation than O(n) in vector?
2) What happens when you insert a value to Deque and the bucket it should go into is full?
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Thanks!

I would not use std::map cuz it has too big memory overhead for one structure and for huge amounts of data it might be drastical. Right?
That depends on the size of your structs... the bigger they are the less the overheads are as a proportion of the overall memory use. For example, a std::map implementation might average say 20 bytes of housekeeping data per element (I just made that up - measure on your own system), so if your struct size is in the hundreds of bytes - who cares...? But, if the struct holds 2 ints, it's a big proportion....
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
Totally unsuitable....
1) Is O(n) of inserting data in deque much faster in real situation than O(n) in vector?
As deque is likely implemented as a vector of fixed-sized arrays, insertion implies a shuffling of all elements towards the nearest end of the container. The shuffling's probably a tiny bit less cache efficient, but if inserting nearer the front of the container it would likely still end up faster.
2) What happens when you insert a value to Deque and the bucket it should go into is full?
As above, it'll need to shuffle, overflowing either:
the last element to become the first element of the next "bucket", moving all those elements along and overflowing into the next bucket, etc.
the first element to become the last element of the previous bucket, moving all those elements along and overflowing into the next bucket, etc.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
unordered_map, which is implemented as a hash map. If you have small objects (e.g. less than 20 or 30 bytes) or a firm cap on the number of elements, you can normally easily outperform unordered_map with custom code, but it's rarely worth the effort unless the table access dominates you application's performance, and that performance is critical.

3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Consider using std::unordered_map, which is an implementation of a hash map. Insertion, lookup, and removal are all O(1) in the average case. This assumes that you will only ever look for an item based on its exact key; if your searches can have different constraints then you either need a different structure, or you need multiple maps to map the various keys you will search for to the corresponding object.
This requires that there is an available hash function for KeyType, either as part of the standard library or provided by you.

There's no container which would provide the best of all the worlds to you. Like you are saying you want best lookup/insertion with minimum amount of space needed for storing elements.
Below if the list of containers which you could consider for your implementation:-
VECTOR :-
Strengths:-
1) Space is allocated only for holding data.
2) Good for random access.
3) Container of choice if insertions/deletions are not in the middle of the container.
Weakness:-
1) poor performance if insertions/deletions are at the middle.
2) rellocations happen if reserve is not used properly.
DEQUE:-
Choose deque over vector in case insertions/deletions are at the beginning as well as end of the container.
MAP:-
Disadvantage over vector:-
1) more space is allocated for holding pointers.
Advantages over vector:-
1) better insertions/deletions/lookup as compared to vector.
If std::unordered_map is used then these dictionary operations would be amortized O(1).

Firstly, in order to directly answer your questions:
1) Is O(n) of inserting data in deque much faster in real situation
than O(n) in vector?
The number of elements that have to be moved is (on average) only half compared to vector. However, it can actually perform worse as the data is stored in non-contiguous memory, so copying/moving the same number of elements is much less efficient (it cannot e.g. be implemented in terms of a single memcopy operation).
2) What happens when you insert a value to Deque and the bucket it
should go into is full?
At least for the gnu gcc Libstdc++ implementation, every bucket except the first and last one is always full. I believe, that inserting in the middle means that all elements are moved/copied one slot to the closer end (front or back) and the effect ripples through all buckets until the first or last one is reached.
In summary, the only scenario, where std::deque is consistently better than vector is if you use it as (suprise) a queue (only inserting and removing elements from the front or end) and that's what the implementation is optimized for. It is not optimized for insertions in the middle.
3) Is there another preferable type of container in case you need to
store lots of data and need two fast operations: search and insertion?
As already stated by others: A hash table like std::unordered_map is the data structure you are looking for.
From what I've heard however, std::unordered_map is a slightly suboptimal implementation if it, as it uses buckets in order to resolve hash collisions and those buckets are implemented as linked lists (here is a very interesting talk from Chandler Carruth on the general topic of the performance of different data structures). For random access on big data structures, cache locality should matter a lot less, so this is probably not such a big issue in your case.
Finally I'd like to mention that if your value and key types are small PODs and depending on how big your huge collection is (are we talking about some million or rather billions of elements) and how often you actually have to insert/remove elements, there might still be cases, where a simple std::vector outperforms any other STL container. So as always: if your thought experiment ever becomes reality try out and measure.

Related

C++: container replacement for vector/deque for huge sizes

so my applications has containers with 100 million and more elements.
I'm on the hunt for a container which behaves - time-wise - better than std::deque (let alone std::vector) with respect to frequent insertions and deletions all over the container ... including near the middle. Access time to the n-th element does not need to be as fast as vector, but should definetely be better than full traversal like in std::list (which has a huge memory overhead per element anyway).
Elements should be treated ordered by index (like vector, deque, list), so std::set or std::unordered_set also do not work well.
Before I sit down and code such a container myself: has anyone seen such a beast already? I'm pretty sure the STL hasn't anything like this, looking to BOOST I did not find something I could use but I may be wrong.
Any hints?
There's a whole STL replacement for big data, in case your app is centric to such data:
STXXL - http://stxxl.sourceforge.net/
edit: I was actually a bit fast to answer. 100 million is not really a large number. E.g., if each element is one byte, you could save it in a 96MiB array. So whether STXXL is any useful, the size of an element should be significantly bigger.
I think you can get the performance characteristics that you want with a skip list:
https://en.wikipedia.org/wiki/Skip_list#Indexable_skiplist
It's the "indexable" part that you're interested in, of course -- you don't actually want the items to be sorted. So some modification is needed that I leave as an exercise.
You might find that 100 million list nodes begins to strain a 32 bit address space, but probably not an issue in 64 bits.
1) If the data is highly sparse, i.e. has lots of zeroes or can be expressed as such, I would highly recommend a data structure that takes advantage of that:
sparselib++ for matrices
sparsehash for hash maps
2) Hash maps should do O(1) for all the operations you describe and the sparsehash implementation I mentioned earlier is particularly space-efficient; it also includes a sparsetable type which is a bit more low-level and can be used in place of an array.
3) If the strict ordering is not that important (it probably is, because you mentioned elements should be treated ordered by index), you can swap the elements you want to erase to the end of the vector and then resize to do removal in O(1). Insertion would just be push_back.
Try a hash map. The STL has several, all with the unordered naming prefix , such as unorderd_map, etc. It has constant time insertion and look up given a good hashing algorithm. With your 'huge' data set the hash map would most likely cover your needs. Making a slight change to the application to cover the differences in the interfaces is trivial.

Which stl container should I use when doing few inserts?

I don't know my exact numbers but i'll try my best. I have a 10000 element deque thats populated right at the start. Than i scan through each element and lets every 20 elements i'll need to insert an new element. The insert would happen at the current position and maybe one element back.
I don't exactly need to remember the position but i also don't exactly need random access either. I'd like fast inserts. Does deque and vector have a heavy price to pay on insert? Should i use list?
My other option is to have a 2nd deque list and as i go through each element insert it to the other deque list unless i need to do the insert i am talking about. This does need to be fast as its a performance intensive app. But I am using a lot of pointers (each element is a pointer) which is upsetting me but there isn't a way around that so i should assume L1 cache will always miss?
I'd start with std::vector in this case, but use a second std::vector for your mass mutations, reserve() appropriately, then swap() the vectors.
Update
It would take this general form:
std:vector<t_object*> source; // << source already holds 10000 elements
std:vector<t_object*> tmp;
// to minimize reallocations and frees to 1 and 1, if possible.
// if you do not swap or have to grow more, reserving can really work against you.
tmp.reserve(aMeaningfulReserveValue);
while (performingMassMutation) {
// "i scan through each element and lets every 20 elements"
for (twentyElements)
tmp.push_back(source[readPos++]);
// "every 20 elements i'll need to insert an new element"
tmp.push_back(newElement);
}
// approximately 500 iterations later…
source.swap(tmp);
Borealid brought up a good point, which is measure -- execution varies dramatically depending on your std library implementations, data sizes, complexity to copy, and so on.
For raw pointers of a collection this size with my configuration, the vector mass mutation and push_back above was 7 times faster than std::list insertion. push_back was faster than vector's range insertion.
As Emile points out below, std::vector::swap() does not need to move or reallocate elements -- it can just swap out internals (provided the allocators are the same type).
First off, the answer to all performance questions is "benchmark it". Always. Now...
If you don't care about the memory overhead, and you don't need random access, but you do care about having constant-time insertions, list is probably right for you.
std::vector will have constant-time insertions at the end when it has sufficient capacity. When the capacity is exceeded, it needs a linear-time copy. deque is better because it links discrete allocations, avoiding a complete copy and letting you do constant-time insertions at the front as well. Random insertions (every 20 elements) will always be linear time.
As for cache locality, a vector is as good as you can get (contiguous memory), but you said you cared about insertions rather than lookups; in my experience, when that's the case you don't care about how hot the cache gets as you scan through to dump, so list's poor behavior doesn't much matter.
Lists are useful when either you frequently want to insert elements in the middle of the collection, or frequently remove them. Lists are, however, slow to read.
Vectors are very fast to read and very fast when you only want to add or remove elements at the end of the collection, but they are very slow when you insert elements in the middle. This is because it has to move all elements after the desired position by one place, to make room for the new element.
Deques are basically doubly linked lists that can be used as vectors.
If you don't need to insert elements in the middle of the collection (you don't care about the order), I suggest you use vector. If you can approximate the number of elements that will be introduced in the vector from the beginning, you should also use std::vector::reserve to allocate memory necessary from the beginning. The value you pass to reserve doesn't need to be exact, just approximate; if it's smaller than needed, the vector will resize automatically, when necessary.
You can go two ways: list is always an option for random place insertions, however as you allocate every element separately this will cause some performance implications too. The other option of inserting in-place in the deque is not good as well - because you will pay linear time for every insertion. Maybe your idea of inserting in new deque is the best here - you pay twice as much memory, but on the other hand you always do insertion either at the end of the second deque, or one element before that - this all gives constant amortized time, and still you have good caching of the container.
The number of copies done for std::vector/deque ::insert etc is proportional to the number of elements between the insert position and the end of container (the number of elements that need to be shifted to make room). The worst-case for a std::vector is O(N) - when you insert at the front of the container. If you're inserting M elements the worst -case is therefore O(M*N) which isn't great.
There could also be a reallocation involved if the containers capacity is exceeded. You could prevent reallocation by ensuring that sufficient space was ::reserve'd up front.
You're other suggestion - copying to a second std::vector/deque container could be better in that it could always be organised to achieve O(N) complexity, but at the cost of temporarily storing two containers.
Using a std::list would allow you to achieve in-place O(1) inserts, but at the cost of additional memory overhead (storing the list pointers etc) and reduced memory locality (list nodes are not allocated contiguously). You could improve the memory locality by using a pool'd memory allocator (Boost pools maybe?).
Overall you'd have to benchmark to really sort out which is "the fastest" approach.
Hope this helps.
If you need fast inserts in the middle, but don't care about random access, vector and deque are definitely not for you: For those, every time you insert something, all elements between that one and the end have to be moved. Of the built-in containers, list is almost certainly your best bet. However a better data structure for your scenario would probably be a VList because it provides better cache locality, however that's not provided by the C++ standard library. The Wikipedia page links to a C++ implementation, however from a quick view on the interface it doesn't seem to completely STL compatible; I don't know if this is an issue for you.
Of course, in the end the only way to be sure which is the optimal solution is to measure the performance.

How large does a collection have to be for std::map<k,v> to outpace a sorted std::vector<std::pair<k,v> >?

How large does a collection have to be for std::map to outpace a sorted std::vector >?
I've got a system where I need several thousand associative containers, and std::map seems to carry a lot of overhead in terms of CPU cache. I've heard somewhere that for small collections std::vector can be faster -- but I'm wondering where that line is....
EDIT: I'm talking about 5 items or fewer at a time in a given structure. I'm concerned most with execution time, not storage space. I know that questions like this are inherently platform-specific, but I'm looking for a "rule of thumb" to use.
Billy3
It's not really a question of size, but of usage.
A sorted vector works well when the usage pattern is that you read the data, then you do lookups in the data.
A map works well when the usage pattern involves a more or less arbitrary mixture of modifying the data (adding or deleting items) and doing queries on the data.
The reason for this is fairly simple: a map has higher overhead on an individual lookup (thanks to using linked nodes instead of a monolithic block of storage). An insertion or deletion that maintains order, however, has a complexity of only O(lg N). An insertion or deletion that maintains order in a vector has a complexity of O(N) instead.
There are, of course, various hybrid structures that can be helpful to consider as well. For example, even when data is being updated dynamically, you often start with a big bunch of data, and make a relatively small number of changes at a time to it. In this case, you can load your data into memory into a sorted vector, and keep the (small number of) added objects in a separate vector. Since that second vector is normally quite small, you simply don't bother with sorting it. When/if it gets too big, you sort it and merge it with the main data set.
Edit2: (in response to edit in question). If you're talking about 5 items or fewer, you're probably best off ignoring all of the above. Just leave the data unsorted, and do a linear search. For a collection this small, there's effectively almost no difference between a linear search and a binary search. For a linear search you expect to scan half the items on average, giving ~2.5 comparisons. For a binary search you're talking about log2 N, which (if my math is working this time of the morning) works out to ~2.3 -- too small a difference to care about or notice (in fact, a binary search has enough overhead that it could very easily end up slower).
If you say "outspace" you mean consuming more space (aka memory), then it's very likely that vector will always be more efficient (the underlying implementation is an continous memory array with no othe data, where map is a tree, so every data implies using more space). This however depends on how much the vector reserves extra space for future inserts.
When it is about time (and not space), vector will also always be more effective (doing a dichotomic search). But it will be extreamly bad for adding new elements (or removing them).
So : no simple answer ! Look-up the complexities, think about the uses you are going to do. http://www.cplusplus.com/reference/stl/
The main issue with std::map is an issue of cache, as you pointed.
The sorted vector is a well-known approach: Loki::AssocVector.
For very small datasets, the AssocVector should crush the map despite the copy involved during insertion simply because of cache locality. The AssocVector will also outperform the map for read-only usage. Binary search is more efficient there (less pointers to follow).
For all other uses, you'll need to profile...
There is however an hybrid alternative that you might wish to consider: using the Allocator parameter of the map to restrict the memory area where the items are allocated, thus minimizing the locality reference issue (the root of cache misses).
There is also a paradigm shift that you might consider: do you need sorted items, or fast look-up ?
In C++, the only STL-compliant containers for fast-lookup have been implemented in terms of Sorted Associative Containers for years. However the up-coming C++0x features the long awaited unordered_map which could out perform all the above solutions!
EDIT: Seeing as you're talking about 5 items or fewer:
Sorting involves swapping items. When inserting into std::map, that will only involve pointer swaps. Whether a vector or map will be faster depends on how fast it is to swap two elements.
I suggest you profile your application to figure it out.
If you want a simple and general rule, then you're out of luck - you'll need to consider at least the following factors:
Time
How often do you insert new items compared to how often you lookup?
Can you batch inserts of new items?
How expensive is sorting you vector? Vectors of elements that are expensive to swap become very expensive to sort - vectors of pointers take far less.
Memory
How much overhead per allocation does the allocator you're using have? std::map will perform one allocation per item.
How big are your key/value pairs?
How big are your pointers? (32/64 bit)
How fast does you implementation of std::vector grow? (Popular growth factors are 1.5 and 2)
Past a certain size of container and element, the overhead of allocation and tree pointers will become outweighed by the cost of the unused memory at the end of the vector - but by far the easiest way to find out if and when this occurs is by measuring.
It has to be in the millionth items. And even there ...
I am more thinking here to memory usage and memory accesses. Under hundreds of thousands, take whatever you want, there will be no noticeable difference. CPUs are really fast these days, and the bottleneck is memory latency.
But even with millions of items, if your map<> has been build by inserting elements in random order. When you want to traverse your map (in sorted order) you'll end up jumping around randomly in the memory, stalling the CPU for memory to be available, resulting in poor performance.
On the other side, if your millions of items are in a vector, traversing it is really fast, taking advantage of the CPU memory accesses predictions.
As other have written, it depends on your usage.
Edit: I would more question the way to organize your thousands of associative containers than the containers themselves if they contain only 5 items.

What is better, a STL list or a STL Map for 20 entries, considering order of insertion is as important as the search speed

I have the following scenario.The implementation is required for a real time application.
1)I need to store at max 20 entries in a container(STL Map, STL List etc).
2)If a new entry comes and 20 entries are already present i have to overwrite the oldest entry with the new entry.
Considering point 2, i feel if the container is full (Max 20 entries) 'list' is the best bet as i can always remove the first entry in the list and add the new one at last (push_back). However, search won't be as efficient.
For only 20 entries, does it really make a big difference in terms of searching efficiency if i use a list in place of a map?
Also considering the cost of insertion in map i feel i should go for a list?
Could you please tell what is a better bet for me ?
1)I need to store at max 20 entries in a container(STL Map, STL List etc). 2)If a new entry comes and 20 entries are already present i have to overwrite the oldest entry with the new entry.
This seems to me the job for boost::circular_buffer.
In general the term circular buffer refers to an area in memory which is used to store incoming data. When the buffer is filled, new data is written starting at the beginning of the buffer and overwriting the old.
The circular_buffer is a STL compliant container. It is a kind of sequence similar to std::list or std::deque. It supports random access iterators, constant time insert and erase operations at the beginning or the end of the buffer and interoperability with std algorithms. The circular_buffer is especially designed to provide fixed capacity storage. When its capacity is exhausted, newly inserted elements will cause elements either at the beginning or end of the buffer (depending on what insert operation is used) to be overwritten.
The circular_buffer only allocates memory when created, when the capacity is adjusted explicitly, or as necessary to accommodate resizing or assign operations. On the other hand, there is also a circular_buffer_space_optimized available. It is an adaptor of the circular_buffer which does not allocate memory at once when created, rather it allocates memory as needed.
For the fast search, I think that with just 20 elements (if their comparison isn't too complicated) you're ok with a "low-cost" container like this and normal linear search, in my opinion it would be difficult to achieve better performance with other STL containers.
Maintain order of insertion, or allow fast searching: choose one.
std::map is not an option here because it doesn't maintain the order of insertion. Besides, it's an associative container. You should choose between a list, a deque and a vector. In terms of performance your best bet is a list, since you can pop off an element from the back and insert a new one at the front (or vice-versa) without any shifting or performance penalty.
The cost of insertion in a map, just as a sidenote, isn't expensive it all: it's in the order of O(log n). Practically irrelevant in the case of 20 elements. The same holds for a std::set.
With only 20 elements, I would not worry much about which container you use. If you determine that the container chosen is in fact a detriment to the performance of your application, it should be relatively easy to swap out the container chosen and replace it with a more-efficient container later.
With that being said, for a large number of elements, the std::deque would probably give you the best all-around efficiency for what you are trying to accomplish. Unlike std::vector, std::deque allows for removal from the front without needing to move all of the other elements. Unlike std::list, std::deque allows for random access of its elements.
You just need to implement a priority queue. STL Map doesn't work.
It depends on the size of the elements.
I know from my own experience that for five integers an unordered array of integers searched with linear search is faster than a set, a list or insertion sort and binary search on an ordered array.
The O() notation of an unordered array may be much worse than any of the other options but the normally unseen C in O(N+C) + C is so much smaller.
A list, set or map (anything that uses dynamic memory and is linked by pointers) will be dominated by cache misses, memory allocations and indirect reference penalties.
You need a Priority Queue implemented on an array.
See the Binary Heap for an implementation.
Do you already know that this is a bottleneck?
My advice would be to first use what is more natural to read while programming and only optimize it when you see that the performance is not what you need.
My suggestion would be to make a circular buffer. But that only works if "old" is determined by when it was inserted, and not some field.
If you need to have a proper LRU, then you should probably go and look at something like http://www.codeproject.com/KB/recipes/LRUCache.aspx?fid=1000025&df=90&mpp=25&noise=3&sort=Position&view=Quick&fr=15
But with 20 entries as your max, it will be very hard to you to find a complex algorithm that is actually faster than the trivial lineary check of every element.

array vs vector vs list

I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[] and a linked list .
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?
Use STL vector. It provides an equally rich interface as list and removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.
Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.
It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
List contains elements which have the address of a previous and next element stored in them. This means that you can INSERT or DELETE an element anywhere in the list with constant speed O(1) regardless of the list size. You also splice (insert another list) into the existing list anywhere with constant speed as well. The reason is that list only needs to change two pointers (the previous and next) for the element we are inserting into the list.
Lists are not good if you need random access. So if one plans to access nth element in the list - one has to traverse the list one by one - O(n) speed
- Vectors -
Vector contains elements in sequence, just like an array. This is very convenient for random access. Accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle of a vector - all the elements that come after that element will have to be re allocated down to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst case scenario is inserting an element at position 2 in a vector, the best one is appending a new element. Therefore - insert works with speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of a vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil" so first consider what is more convenient and make things work exactly the way you want them, then optimize. For 10 entries that you mention - it really does not matter what you use - you will never be able to see any kind of performance difference whatever method you use.
Prefer an std::vector over and array. Some advantages of vector are:
They allocate memory from the free space when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.
You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but you nor I can know until we profile it.
If you do, you'll find there is no overhead at all, when optimizations are turned on. The compiler inlines the function calls. There is a difference in memory performance. An array is statically allocated, while a vector dynamically allocates. A list allocates per node, which can throttle cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.
First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations which you would essntially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.
I am maintaining a fixed-length table of 10 entries. Each item is a
structure of like 4 fields. There will be insert, update and delete
operations, specified by numeric position. I am wondering which is the
best data structure to use to maintain this table of information:
Based on this description it seems like list might be the better choice since its O(1) when inserting and deleting in the middle of the data structure. Unfortunately you cannot use numeric positions when using lists to do inserts and deletes like you can for arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision of which structure may be best to use. This structure can later be changed if testing clearly shows its the wrong choice.
The questions you need to ask are three fold. The first is how often are you planning on doing deletes/inserts in the middle relative to random reads. The second is how important is using a numeric position compared to an iterator. Finally, is order in your structure important.
If the answer to the first question is random reads will be more prevalent than a vector/array will probably work well. Note iterating through a data structure is not considered a random read even if the operator[] notation is used. For the second question, if you absolutely require numeric position than a vector/array will be required even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally if order is unimportant you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
template <class T>
void erase(vector<T> & vect, int index) //note: vector cannot be const since you are changing vector
{
vect[index]= vect.back();//move the item in the back to the index
vect.pop_back(); //delete the item in the back
}
template <class T>
void insert(vector<T> & vect, int index, T value) //note: vector cannot be const since you are changing vector
{
vect.push_back(vect[index]);//insert the item at index to the back of the vector
vect[index] = value; //replace the item at index with value
}
I Believe it's as per your need if one needs more insert/to delete in starting or middle use list(doubly-linked internally) if one needs to access data randomly and addition to last element use array ( vector have dynamic allocation but if you require more operation as a sort, resize, etc use vector)