I have to loop over a subset of elements in a large array where each element points to another one (the problem comes from detecting connected components in a large graph).
My algorithm goes as follows:
1. consider the 1st element
2. consider the next element as the one pointed to by the previous element
3. loop until no new element is discovered
4. consider the next element not already considered in 1-3, and go back to 1
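In code, the traversal in steps 1-3 might look like this minimal sketch (the next array, holding for each element the index it points to, is my assumption about the data layout):

#include <vector>

// Sketch of steps 1-3: follow pointers from `start` until we reach
// an element that has already been considered.
void visit_chain(const std::vector<int>& next, std::vector<char>& is_set, int start)
{
    int cur = start;
    while (!is_set[cur]) {
        is_set[cur] = 1;   // mark as considered
        cur = next[cur];   // step to the element pointed to by the current one
    }
}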
Note that the number of elements to consider is much smaller than the total number of elements.
From what I see now, I can either:
// create a map of all elements, init all values to 0, set to 1 when considered
map<int,int> is_set; // is_set.size() will be equal to N
or
// create a (too) large array (total size), init the elements to consider to 0
int* is_set = (int*)malloc(total_size * sizeof(int)); // is_set length will be total_size>>N
I know that accessing keys in a map is O(log N) while it's only constant for arrays, but I don't know whether the malloc is more costly at creation, and it also requires more memory?
When in doubt, measure the performance of both alternatives. That's the only way to know for sure which approach will be fastest for your application.
That said, a one-time large malloc is generally not terribly expensive. Also, although the map is O(log N), the big-O conceals a relatively large constant factor, at least for the std::map implementation, in my experience. I would not be surprised to find that the array approach is faster in this case, but again the only way to know for sure is to measure.
Keep in mind too that although the map does not have a large up-front memory allocation, it has many small allocations over the lifetime of the object (every time you insert a new element, you get another allocation, and every time you remove an element, you get another free). If you have very many of these, that can fragment your heap, which may negatively impact performance depending on what else your application might be doing at the same time.
If indexed search suits your needs (like that provided by regular C-style arrays), std::map is probably not the right class for you. Instead, consider std::vector if you need dynamic run-time allocation, or std::array if your collection is fixed-size and you just need the fastest bounds-safe alternative to a C-style pointer.
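As a minimal sketch of that approach (assuming total_size is the total number of elements from the question):

#include <cstddef>
#include <vector>

void example(std::size_t total_size)
{
    // One flag per element: a single up-front allocation, O(1) access,
    // and one char per slot instead of a tree node per considered element.
    std::vector<char> is_set(total_size, 0);

    std::size_t i = 42;    // hypothetical element index
    if (!is_set[i])
        is_set[i] = 1;     // mark element i as considered
}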
You can find more information in this previous post.
I know that accessing keys in a map is O(log N) while it's only constant for arrays, but I don't know whether the malloc is more costly at creation, and it also requires more memory?
Each entry in the map is dynamically allocated, so if dynamic allocation is an issue, it will be a bigger issue in the map. As for the data structure, you can use a bitmap rather than a plain array of ints. That will reduce the size of the array by a factor of 32 on architectures with 32-bit ints, and the extra cost of mapping the index into the array will in most cases be much smaller than the cost of the extra memory, as the structure is more compact and can fit in fewer cache lines.
There are other things to consider, such as whether the density of elements in the set is small or not. If there are very few entries (i.e. the graph is sparse), then either option could be fine. As a final option, you can implement the map manually with a vector of pair<int,int>: sort it, then use binary search. That will reduce the number of allocations, incur some extra cost in sorting, and provide a more compact O(log N) solution than a map. Still, I would try to go for the bitmap.
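A minimal sketch of the bitmap idea (the wrapper and its names are mine; std::vector<bool> packs bits the same way out of the box):

#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per element: total_size/8 bytes instead of total_size*sizeof(int).
struct Bitmap {
    std::vector<std::uint32_t> words;
    explicit Bitmap(std::size_t n) : words((n + 31) / 32, 0) {}

    bool test(std::size_t i) const { return (words[i / 32] >> (i % 32)) & 1u; }
    void set (std::size_t i)       { words[i / 32] |= 1u << (i % 32); }
};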
Related
So it's a thought experiment. I want to have a huge collection of structures such as:
struct
{
    KeyType key;
    ValueType value;
};
And I need fast access by a key and fast insertion of new values.
I would not use std::map cuz it has too big a memory overhead per structure, and for huge amounts of data that might be drastic. Right?
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
What if I use a deque? As far as I know it has O(1) for push_back/push_front, but still O(n) for inserting (as it would have to move data anyway, less data though).
The questions are:
1) Is O(n) insertion into a deque much faster in real situations than O(n) insertion into a vector?
2) What happens when you insert a value into a deque and the bucket it should go into is full?
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Thanks!
I would not use std::map cuz it has too big a memory overhead per structure, and for huge amounts of data that might be drastic. Right?
That depends on the size of your structs... the bigger they are the less the overheads are as a proportion of the overall memory use. For example, a std::map implementation might average say 20 bytes of housekeeping data per element (I just made that up - measure on your own system), so if your struct size is in the hundreds of bytes - who cares...? But, if the struct holds 2 ints, it's a big proportion....
So next I would consider using sorted std::vector and binary_search. It's fine for searching, but adding new values to the vector would be too slow. Imagine you need to add a new value to the beginning of the sorted array, you'd have to move data right aaaaaAAAALOT!
Totally unsuitable....
1) Is O(n) insertion into a deque much faster in real situations than O(n) insertion into a vector?
As deque is likely implemented as a vector of fixed-sized arrays, insertion implies a shuffling of all elements towards the nearest end of the container. The shuffling's probably a tiny bit less cache efficient, but if inserting nearer the front of the container it would likely still end up faster.
2) What happens when you insert a value into a deque and the bucket it should go into is full?
As above, it'll need to shuffle, overflowing either:
the last element to become the first element of the next "bucket", moving all those elements along and overflowing into the next bucket, etc.
the first element to become the last element of the previous bucket, moving all those elements along and overflowing into the bucket before it, etc.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
unordered_map, which is implemented as a hash map. If you have small objects (e.g. less than 20 or 30 bytes) or a firm cap on the number of elements, you can normally easily outperform unordered_map with custom code, but it's rarely worth the effort unless the table access dominates your application's performance, and that performance is critical.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
Consider using std::unordered_map, which is an implementation of a hash map. Insertion, lookup, and removal are all O(1) in the average case. This assumes that you will only ever look for an item based on its exact key; if your searches can have different constraints then you either need a different structure, or you need multiple maps to map the various keys you will search for to the corresponding object.
This requires that there is an available hash function for KeyType, either as part of the standard library or provided by you.
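For illustration, a minimal sketch of such a map with a user-provided hash (KeyType's members, the value type, and the hash combination are my assumptions):

#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct KeyType {
    int id;
    std::string name;
    bool operator==(const KeyType& o) const { return id == o.id && name == o.name; }
};

struct KeyHash {
    std::size_t operator()(const KeyType& k) const {
        // Naive combination of the member hashes; fine for a sketch.
        return std::hash<int>()(k.id) ^ (std::hash<std::string>()(k.name) << 1);
    }
};

// Average O(1) insert/lookup/erase, keyed on the exact KeyType.
std::unordered_map<KeyType, int, KeyHash> table;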
There's no container that provides the best of all worlds. As you say, you want the best lookup/insertion with the minimum amount of space needed to store the elements.
Below is a list of containers you could consider for your implementation:-
VECTOR :-
Strengths:-
1) Space is allocated only for holding data.
2) Good for random access.
3) Container of choice if insertions/deletions are not in the middle of the container.
Weaknesses:-
1) poor performance if insertions/deletions are in the middle.
2) reallocations happen if reserve is not used properly.
DEQUE:-
Choose deque over vector in case insertions/deletions are at the beginning as well as end of the container.
MAP:-
Disadvantage over vector:-
1) more space is allocated for holding pointers.
Advantages over vector:-
1) better insertions/deletions/lookup as compared to vector.
If std::unordered_map is used then these dictionary operations would be amortized O(1).
Firstly, in order to directly answer your questions:
1) Is O(n) insertion into a deque much faster in real situations than O(n) insertion into a vector?
The number of elements that have to be moved is (on average) only half compared to a vector. However, it can actually perform worse, as the data is stored in non-contiguous memory, so copying/moving the same number of elements is much less efficient (it cannot, for example, be implemented in terms of a single memcpy operation).
2) What happens when you insert a value into a deque and the bucket it should go into is full?
At least for the GNU libstdc++ implementation, every bucket except the first and last one is always full. I believe that inserting in the middle means that all elements are moved/copied one slot towards the closer end (front or back), and the effect ripples through all buckets until the first or last one is reached.
In summary, the only scenario where std::deque is consistently better than vector is if you use it as (surprise) a queue (only inserting and removing elements at the front or back), and that's what the implementation is optimized for. It is not optimized for insertions in the middle.
3) Is there another preferable type of container in case you need to store lots of data and need two fast operations: search and insertion?
As already stated by others: A hash table like std::unordered_map is the data structure you are looking for.
From what I've heard, however, std::unordered_map is a slightly suboptimal implementation of it, as it uses buckets to resolve hash collisions and those buckets are implemented as linked lists (here is a very interesting talk by Chandler Carruth on the general topic of the performance of different data structures). For random access on big data structures, cache locality should matter a lot less, so this is probably not such a big issue in your case.
Finally, I'd like to mention that if your value and key types are small PODs, then depending on how big your huge collection is (are we talking about millions or rather billions of elements) and how often you actually have to insert/remove elements, there might still be cases where a simple std::vector outperforms any other STL container. So as always: if your thought experiment ever becomes reality, try it out and measure.
I have about 70-150 different structs X with an unsigned integral ID field. They are read in and initialized on initialization of program and never modified thereafter. What would be the fastest way to access them (which happens a lot) among the following (or some other method?):
Use a std::vector<X> v; where v[X.id] = X; to access by doing X& x = v[id]; (this should do a copy at the beginning, but later on merely does a lookup by id on essentially a flat array).
Same as above but std::vector<X*> v; with X* x = v[id]; I am wary about this one because it has one extra level of indirection.
a std::map - feels like overkill compared to the above?
same as above but unordered_map - again, given 70-150 occurrences, it might not even beat suggestion 3.
Anything more clever? One problem I see with 1 is that the access pattern might be a bit sparse, but I'm not sure how to address that if that's the fastest way.
Using a vector will definitely be the fastest approach:
Vector has complexity O(1). No need to search or hash, you will immediately find your instance.
Map has complexity O(log N). The map needs to compare your index with log N other entries to find your instance.
Unordered_map has complexity O(1), but with quite some overhead for calculating the hash value (although for simple numbers it will not be much). Also, std::unordered_map can put multiple entries behind one hash index, so instead of comparing one index it may have to compare several (the default maximum load factor is 1.0, i.e. one entry per bucket on average, but collisions still chain).
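A sketch of option 1 from the question (assuming the ids are dense in [0, max_id]; the helper names are mine):

#include <vector>

struct X { unsigned id; /* other fields */ };

// Build the table once at initialization, then look up by plain indexing.
std::vector<X> make_table(const std::vector<X>& inputs, unsigned max_id)
{
    std::vector<X> v(max_id + 1);  // one up-front allocation (and copies)
    for (const X& x : inputs)
        v[x.id] = x;               // place each struct at its own id
    return v;
}

// Hot path afterwards: X& x = v[id]; -- a single indexed access,
// no hashing, no tree walk.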
I think that for a small number of items there is no faster way to access them than using an array or vector, because those provide constant-time access. In the case of many objects this approach is also the fastest, but also the most memory-expensive.
If you don't mind pre-allocating the space for all the structures in advance, and the IDs can be ordered sequentially so they are also indexes, just use approach 1.
If the structures are expensive to construct, delay construction either by not actually placing the construction code in the default constructor (and doing it explicitly via a method call later) or by using a raw array and placement new on memory "slots" within that array.
If IDs are not sequential, use std::unordered_map to prevent waste of space on "holes".
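A sketch of the placement-new variant described above (the capacity and names are mine; note that objects constructed this way must be destroyed manually with an explicit destructor call):

#include <cstddef>
#include <new>

struct X {
    unsigned id;
    explicit X(unsigned i) : id(i) { /* expensive initialization here */ }
};

// Raw, suitably aligned storage for up to 150 slots; no X is constructed
// until we explicitly place one.
alignas(X) static unsigned char storage[150 * sizeof(X)];

X* construct_slot(std::size_t slot, unsigned id)
{
    return new (storage + slot * sizeof(X)) X(id);  // placement new
}

// Later: ptr->~X(); to destroy a slot explicitly.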
I have an array of 64 structs that hold a decent amount of data (each struct is around 128 bytes, so that's 8192 bytes that need to be re-arranged). The array needs to be sorted based on a single unsigned byte in each struct. An interesting property of my data is that there will likely be many duplicates of the sorted value - meaning that if you got rid of all duplicates, the array may only be 10 unique elements long, but this is not a given.
Once sorted, I need to create a stack that stores, for each unique byte-run, the value and the run's length:
so if I ended up with sorted values:
4,4,4,9,9,9,9,9,14,14
the stack would be:
(4,3), (9,5), (14,2)
I figured that there would be some nice optimizations I could perform under these conditions. If I do a heapsort, I can create the stack while I'm sorting, but would this be faster than a qsort and then building the stack afterwards? Would any sorting algorithm run slower because of the large structs I'm using? Are there any optimizations I can make because I'm only comparing bytes?
BTW: the language is C++.
Thanks.
I would imagine that STL will do what you want nicely. Re-writing your own sort routines and containers is likely to be error-prone, and slow. So only worry about if you find it's a bottleneck.
In general with large objects it can be faster to sort an array of pointers/indices of the objects rather than the objects. Or sort an array of nodes, where each node contains a pointer/index of the object and the object's sort key (in this case the key is one byte). To do this in C++ you can just supply a suitable comparator to std::sort or std::stable_sort. Then if you need the original objects in order, as opposed to just needing to know the correct order, finally copy the objects into a new array.
Copying 128 bytes is almost certainly much slower than performing a byte comparison, even with an extra indirection. So for optimal performance it's the moves you need to look at, not the comparisons, and dealing in pointers is one way to avoid most of the moving.
You could build your run-length encoding as you perform the copy at the end.
Of course it might be possible to go even faster with some custom sorting algorithm which makes special use of the numbers in your case (64, "around 128", and 1). But even simple questions like "which is fastest - introsort, heap sort or merge sort" are generally impossible to answer without writing and running the code.
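A sketch of the index-sorting approach for the numbers in this question (64 structs, a one-byte key; the struct layout is my assumption), building the run-length stack while walking the sorted order:

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Big { std::uint8_t key; unsigned char payload[127]; };

// Sort 64 small indices instead of 64 ~128-byte structs.
std::vector<std::pair<std::uint8_t, int>> sort_and_encode(const Big (&a)[64])
{
    int idx[64];
    for (int i = 0; i < 64; ++i) idx[i] = i;
    std::sort(idx, idx + 64,
              [&](int l, int r) { return a[l].key < a[r].key; });

    std::vector<std::pair<std::uint8_t, int>> runs;  // e.g. (4,3), (9,5), (14,2)
    for (int i = 0; i < 64; ++i) {
        std::uint8_t k = a[idx[i]].key;
        if (runs.empty() || runs.back().first != k)
            runs.push_back({k, 1});
        else
            ++runs.back().second;
    }
    return runs;
}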
The sort would not be slower, because you will be sorting pointers or references to the structs and not the actual structs in memory.
Given that your keys are integers, and there really aren't a lot of them, odds are that a bucket sort with a bucket size of 1 (i.e. a counting sort) would be very applicable.
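A sketch of that counting sort (one bucket per possible byte value; the struct layout is my assumption). Note that the counts array also yields the (value, run length) stack for free:

#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Big { std::uint8_t key; unsigned char payload[127]; };

// No comparisons at all: count the keys, compute start offsets, place items.
std::vector<Big> counting_sort(const std::vector<Big>& in)
{
    std::array<std::size_t, 256> count{};
    for (const Big& b : in) ++count[b.key];

    std::array<std::size_t, 256> offset{};  // where each key's run begins
    for (std::size_t k = 1; k < 256; ++k)
        offset[k] = offset[k - 1] + count[k - 1];

    std::vector<Big> out(in.size());
    for (const Big& b : in)
        out[offset[b.key]++] = b;           // stable placement into its run
    return out;
}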
How large does a collection have to be for std::map to outpace a sorted std::vector<std::pair<Key, Value>>?
I've got a system where I need several thousand associative containers, and std::map seems to carry a lot of overhead in terms of CPU cache. I've heard somewhere that for small collections std::vector can be faster -- but I'm wondering where that line is....
EDIT: I'm talking about 5 items or fewer at a time in a given structure. I'm concerned most with execution time, not storage space. I know that questions like this are inherently platform-specific, but I'm looking for a "rule of thumb" to use.
Billy3
It's not really a question of size, but of usage.
A sorted vector works well when the usage pattern is that you read the data, then you do lookups in the data.
A map works well when the usage pattern involves a more or less arbitrary mixture of modifying the data (adding or deleting items) and doing queries on the data.
The reason for this is fairly simple: a map has higher overhead on an individual lookup (thanks to using linked nodes instead of a monolithic block of storage). An insertion or deletion that maintains order, however, has a complexity of only O(lg N). An insertion or deletion that maintains order in a vector has a complexity of O(N) instead.
There are, of course, various hybrid structures that can be helpful to consider as well. For example, even when data is being updated dynamically, you often start with a big bunch of data, and make a relatively small number of changes at a time to it. In this case, you can load your data into memory into a sorted vector, and keep the (small number of) added objects in a separate vector. Since that second vector is normally quite small, you simply don't bother with sorting it. When/if it gets too big, you sort it and merge it with the main data set.
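A rough sketch of such a hybrid (the threshold and names are mine; int elements for brevity):

#include <algorithm>
#include <iterator>
#include <vector>

struct Hybrid {
    std::vector<int> sorted;  // bulk of the data, kept sorted
    std::vector<int> recent;  // small unsorted buffer of recent additions

    bool contains(int x) const {
        return std::binary_search(sorted.begin(), sorted.end(), x)
            || std::find(recent.begin(), recent.end(), x) != recent.end();
    }

    void insert(int x) {
        recent.push_back(x);
        if (recent.size() > 64) {  // arbitrary threshold
            std::sort(recent.begin(), recent.end());
            std::vector<int> merged;
            merged.reserve(sorted.size() + recent.size());
            std::merge(sorted.begin(), sorted.end(),
                       recent.begin(), recent.end(),
                       std::back_inserter(merged));
            sorted.swap(merged);
            recent.clear();
        }
    }
};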
Edit2: (in response to edit in question). If you're talking about 5 items or fewer, you're probably best off ignoring all of the above. Just leave the data unsorted, and do a linear search. For a collection this small, there's effectively almost no difference between a linear search and a binary search. For a linear search you expect to scan half the items on average, giving ~2.5 comparisons. For a binary search you're talking about log2 N, which (if my math is working this time of the morning) works out to ~2.3 -- too small a difference to care about or notice (in fact, a binary search has enough overhead that it could very easily end up slower).
If by "outpace" you mean consuming more space (aka memory), then it's very likely that vector will always be more efficient (the underlying implementation is a contiguous memory array with no other data, whereas a map is a tree, so every datum implies using more space). This, however, depends on how much extra space the vector reserves for future inserts.
When it is about time (and not space), vector will also always be more effective (doing a dichotomic search). But it will be extremely bad for adding new elements (or removing them).
So: no simple answer! Look up the complexities and think about the uses you are going to make of the container. http://www.cplusplus.com/reference/stl/
The main issue with std::map is an issue of cache, as you pointed.
The sorted vector is a well-known approach: Loki::AssocVector.
For very small datasets, the AssocVector should crush the map despite the copy involved during insertion simply because of cache locality. The AssocVector will also outperform the map for read-only usage. Binary search is more efficient there (less pointers to follow).
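The core of the sorted-vector approach (what Loki::AssocVector wraps) is just std::lower_bound over contiguous pairs; a minimal sketch:

#include <algorithm>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>;

// Contiguous keys and values, looked up by binary search: fewer pointers
// to follow than a tree, so better cache behaviour for read-mostly data.
int* find_value(std::vector<Pair>& v, int key)
{
    auto it = std::lower_bound(v.begin(), v.end(), key,
        [](const Pair& p, int k) { return p.first < k; });
    return (it != v.end() && it->first == key) ? &it->second : nullptr;
}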
For all other uses, you'll need to profile...
There is, however, a hybrid alternative that you might wish to consider: using the Allocator parameter of the map to restrict the memory area where the items are allocated, thus minimizing the locality-of-reference issue (the root of cache misses).
There is also a paradigm shift that you might consider: do you need sorted items, or fast look-up ?
In C++, the only STL-compliant containers for fast lookup have been implemented in terms of Sorted Associative Containers for years. However, the upcoming C++0x features the long-awaited unordered_map, which could outperform all the above solutions!
EDIT: Seeing as you're talking about 5 items or fewer:
Sorting involves swapping items. When inserting into std::map, that will only involve pointer swaps. Whether a vector or map will be faster depends on how fast it is to swap two elements.
I suggest you profile your application to figure it out.
If you want a simple and general rule, then you're out of luck - you'll need to consider at least the following factors:
Time
How often do you insert new items compared to how often you look up?
Can you batch inserts of new items?
How expensive is sorting your vector? Vectors of elements that are expensive to swap become very expensive to sort - vectors of pointers take far less.
Memory
How much overhead per allocation does the allocator you're using have? std::map will perform one allocation per item.
How big are your key/value pairs?
How big are your pointers? (32/64 bit)
How fast does your implementation of std::vector grow? (Popular growth factors are 1.5 and 2)
Past a certain size of container and element, the overhead of allocation and tree pointers will become outweighed by the cost of the unused memory at the end of the vector - but by far the easiest way to find out if and when this occurs is by measuring.
It has to be in the millions of items. And even there ...
I am thinking here more of memory usage and memory accesses. Below hundreds of thousands of items, take whatever you want; there will be no noticeable difference. CPUs are really fast these days, and the bottleneck is memory latency.
But even with millions of items, if your map<> has been built by inserting elements in random order, then when you want to traverse the map (in sorted order) you'll end up jumping around randomly in memory, stalling the CPU while it waits for memory, resulting in poor performance.
On the other hand, if your millions of items are in a vector, traversing it is really fast, taking advantage of the CPU's memory-access prediction.
As others have written, it depends on your usage.
Edit: I would question the way you organize your thousands of associative containers more than the containers themselves, if they contain only 5 items.
I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
array - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item using [] is faster.
stl vector - insert/delete takes linear time due to shifting; update takes constant time; no space is used for pointers; accessing an item is slower than an array since it is a call to operator[].
stl list - insert and delete takes linear time since you need to iterate to a specific position before applying the insert/delete; additional space is needed for pointers; accessing an item is slower than an array since it is a linked list linear traversal.
Right now, my choice is to use an array. Is it justifiable? Or did I miss something?
Which is faster: traversing a list, then inserting a node or shifting items in an array to produce an empty position then inserting the item in that position?
What is the best way to measure this performance? Can I just display the timestamp before and after the operations?
Use an STL vector. It provides an interface as rich as list's, and it removes the pain of managing memory that arrays require.
You will have to try very hard to expose the performance cost of operator[] - it usually gets inlined.
I do not have any number to give you, but I remember reading performance analysis that described how vector<int> was faster than list<int> even for inserts and deletes (under a certain size of course). The truth of the matter is that these processors we use are very fast - and if your vector fits in L2 cache, then it's going to go really really fast. Lists on the other hand have to manage heap objects that will kill your L2.
Premature optimization is the root of all evil.
Based on your post, I'd say there's no reason to make your choice of data structure here a performance based one. Pick whatever is most convenient and return to change it if and only if performance testing demonstrates it's a problem.
It is really worth investing some time in understanding the fundamental differences between lists and vectors.
The most significant difference between the two is the way they store elements and keep track of them.
- Lists -
A list contains elements which store the addresses of the previous and next elements. This means that you can INSERT or DELETE an element anywhere in the list at constant speed, O(1), regardless of the list size. You can also splice (insert another list) into the existing list anywhere at constant speed. The reason is that the list only needs to change two pointers (the previous and next) for the element being inserted into the list.
Lists are not good if you need random access. If one plans to access the nth element in the list, one has to traverse the list element by element - O(n) speed.
- Vectors -
A vector contains elements in sequence, just like an array. This is very convenient for random access: accessing the "nth" element in a vector is a simple pointer calculation (O(1) speed). Adding elements to a vector is, however, different. If one wants to add an element in the middle, all the elements that come after it have to be shifted over to make room for the new entry. The speed will depend on the vector size and on the position of the new element. The worst-case scenario is inserting an element at the beginning of a vector; the best case is appending a new element at the end. Therefore insert works at speed O(n), where "n" is the number of elements that need to be moved - not necessarily the size of the vector.
There are other differences that involve memory requirements etc., but understanding these basic principles of how lists and vectors actually work is really worth spending some time on.
As always ... "Premature optimization is the root of all evil", so first consider what is more convenient and make things work exactly the way you want them to, then optimize. For the 10 entries you mention, it really does not matter what you use - you will never be able to see any kind of performance difference whichever method you use.
Prefer a std::vector over an array. Some advantages of vector are:
They allocate memory from the free store when increasing in size.
They are NOT a pointer in disguise.
They can increase/decrease in size run-time.
They can do range checking using at().
A vector knows its size, so you don't have to count elements.
The most compelling reason to use a vector is that it frees you from explicit memory management, and it does not leak memory. A vector keeps track of the memory it uses to store its elements. When a vector needs more memory for elements, it allocates more; when a vector goes out of scope, it frees that memory. Therefore, the user need not be concerned with the allocation and deallocation of memory for vector elements.
You're making assumptions you shouldn't be making, such as "accessing an item is slower than an array since it is a call to operator[]." I can understand the logic behind it, but neither you nor I can know until we profile it.
If you do, you'll find there is no overhead at all when optimizations are turned on: the compiler inlines the function calls. There is a difference in memory performance, though. An array is statically allocated, while a vector allocates dynamically. A list allocates per node, which can thrash the cache if you're not careful.
Some solutions are to have the vector allocate from the stack, and have a pool allocator for a list, so that the nodes can fit into cache.
So rather than worry about unsupported claims, you should worry about making your design as clean as possible. So, which makes more sense? An array, vector, or list? I don't know what you're trying to do so I can't answer you.
The "default" container tends to be a vector. Sometimes an array is perfectly acceptable too.
First a couple of notes:
A good rule of thumb about selecting data structures: Generally, if you examined all the possibilities and determined that an array is your best choice, start over. You did something very wrong.
STL lists don't support operator[], and if they did the reason that it would be slower than indexing an array has nothing to do with the overhead of a function call.
Those things being said, vector is the clear winner here. The call to operator[] is essentially negligible since the contents of a vector are guaranteed to be contiguous in memory. It supports insert() and erase() operations, which you would essentially have to write yourself if you used an array. Basically it boils down to the fact that a vector is essentially an upgraded array which already supports all the operations you need.
I am maintaining a fixed-length table of 10 entries. Each item is a structure of like 4 fields. There will be insert, update and delete operations, specified by numeric position. I am wondering which is the best data structure to use to maintain this table of information:
Based on this description, it seems like list might be the better choice, since it's O(1) when inserting and deleting in the middle of the data structure. Unfortunately, you cannot use numeric positions for inserts and deletes on lists like you can with arrays/vectors. This dilemma leads to a slew of questions which can be used to make an initial decision about which structure may be best to use. The structure can later be changed if testing clearly shows it's the wrong choice.
The questions you need to ask are threefold. The first is how often you plan to do deletes/inserts in the middle relative to random reads. The second is how important using a numeric position is compared to an iterator. Finally, is order in your structure important?
If the answer to the first question is that random reads will be more prevalent, then a vector/array will probably work well. Note that iterating through a data structure is not considered a random read, even if the operator[] notation is used. For the second question, if you absolutely require numeric positions, then a vector/array will be required, even though this may lead to a performance hit. Later testing may show this performance hit is negligible relative to the easier coding with numerical positions. Finally, if order is unimportant, you can insert and delete in a vector/array with an O(1) algorithm. A sample algorithm is shown below.
#include <vector>
using std::vector;

template <class T>
void erase(vector<T>& vect, int index) // note: vector cannot be const since we are changing it
{
    vect[index] = vect.back(); // move the item at the back into the erased slot
    vect.pop_back();           // drop the now-duplicated item at the back
}

template <class T>
void insert(vector<T>& vect, int index, T value) // note: vector cannot be const since we are changing it
{
    vect.push_back(vect[index]); // copy the item at index to the back of the vector
    vect[index] = value;         // replace the item at index with the new value
}
I believe it depends on your needs: if you need to insert/delete more at the beginning or in the middle, use a list (doubly-linked internally); if you need to access data randomly and add at the end, use an array or a vector (a vector has dynamic allocation, and if you require more operations such as sort, resize, etc., use a vector).