Best container for ordered elements - c++

I am developing a time critical application and am looking for the best container to handle a collection of elements of the following type:
class Element
{
    int weight;
    Data data;
};
The time-critical steps of my application, performed periodically in a single thread, are the following:
the Element with the lowest weight is extracted from the container, and data is processed;
a number n>=0 of new Element, with random(*) weight, are inserted into the container.
Some Element of the container may have the same weight. The total number of elements in the container at any time is quite high and almost stationary on average (several hundreds of thousands). The time needed for the extract/process/insert sequence described above must be as low as possible. (Note(*): new weight is actually computed from data but is considered as random here to simplify.)
After some searches and tries of different STL containers, I ended up using the std::multiset container, which performed about 5 times faster than an ordered std::vector and 16 times faster than an ordered std::list. But still, I am wondering if I could achieve even better performance, considering that the bottleneck of my application remains in the extract/insert operations.
Notice that, though I only tried ordered containers, I did not mention "ordered container" in my requirements. I do not need the Element to be ordered in the container, I only need to perform the "extract lowest weighted element"/"insert new elements" operations as fast as possible. I am not limited to STL containers and can go for boost, or any other implementation, if suited.
Thanks for help.

I do not need the Element to be ordered in the container, I only need to perform the "extract lowest weighted element"/"insert new elements" operations as fast as possible.
Then you should try priority_queue<T>, or use make_heap/push_heap/pop_heap operations on a vector<T>.
Since you are looking for min heap, not max heap, you would need to supply a custom comparator that orders your Element objects in reverse order.
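As a rough illustration of the min-heap suggestion above, here is a minimal sketch. The Element stand-in (with Data reduced to an int), the comparator name HeavierThan, the alias MinQueue and the helper extract_lowest are all assumptions made for this example, not part of the question.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Stand-in for the question's Element; Data is reduced to an int (assumption).
struct Element {
    int weight;
    int data;
};

// std::priority_queue keeps the "largest" element on top according to its
// comparator, so treating a heavier element as "smaller" puts the lowest
// weight on top.
struct HeavierThan {
    bool operator()(const Element& a, const Element& b) const {
        return a.weight > b.weight;
    }
};

using MinQueue = std::priority_queue<Element, std::vector<Element>, HeavierThan>;

// Removes and returns the element with the lowest weight.
Element extract_lowest(MinQueue& q) {
    assert(!q.empty());
    Element lowest = q.top();
    q.pop();
    return lowest;
}
```

Insertion is then just q.push(Element{weight, data}); elements with equal weights come out in an unspecified relative order.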

I think that within the STL, a lazily sorted std::vector will give the best results.
A suggested pseudocode may look like:
emplace_back new elements at the end of the vector
only when you want the smallest element, sort the array and get the first element
In this way, you get the amortized insertion time of vector, a relatively small number of memory allocations and good cache locality.

It is instructive to consider different candidates and how your assumptions would impact the final selection. When your requirements change, it then becomes easier to switch containers.
Generally, containers of size N have roughly 3 complexity categories for their basic access/modification operations: (amortized) O(1), O(log N) and O(N).
Your first requirement (finding the lowest weight element) gives you roughly three candidates with O(1) complexity, and one candidate with O(N) complexity per element:
O(1) for std::priority_queue<Element, std::vector<Element>, LowestWeightCompare>
O(1) for std::multiset<Element, LowestWeightCompare>
O(1) for boost::flat_multiset<Element, LowestWeightCompare>
O(N) for std::unordered_multiset<Element>
Your second requirement (randomized insertion of new elements) gives you the following complexity per element for each of the above four choices
O(log N) for std::priority_queue
O(log N) for std::multiset
O(N) for boost::flat_multiset
amortized O(1) for std::unordered_multiset
Among the first three choices, boost::flat_multiset should be dominated by the other two for large N. Among the remaining two, the better caching behavior of std::priority_queue over std::multiset might prevail. But: measure, measure, measure.
It is a priori ambiguous whether std::unordered_multiset is competitive with the other three. Depending on the number n of randomly inserted elements, total cost per batch of find(1)-insert(n) would be O(N) search + O(n) insertion for std::unordered_multiset and O(1) search + O(n log N) insertion for std::multiset. Again, measure, measure, measure.
How robust are these considerations with respect to your requirements? The story would change as follows if you would have to find the k lowest weight elements in each batch. Then you'd have to compare the costs of find(k)-insert(n). The search costs would scale roughly as
O(k log N) for std::priority_queue
O(k) for std::multiset
O(k) for boost::flat_multiset
O(k N) for std::unordered_multiset
Note that a priority_queue can only efficiently access the top element, not its k top elements without actually calling pop() on them, which has O(log N) complexity per call. If you expect that your code would likely change from a find(1)-insert(n) batch-mode to a find(k)-insert(n), then it might be a good idea to choose std::multiset, or at least document what kind of interface changes it would require.
Bonus: the best of both worlds?! You might also want to experiment a bit with Boost.MultiIndex and use something like (check the documentation to get the syntax correct)
boost::multi_index_container<
    Element,
    indexed_by<
        ordered_non_unique<member<Element, int, &Element::weight>, std::less<>>,
        hashed_non_unique<>
    >
>
The above code will create a node-based container that implements two pointer structures: one to keep track of the ordering by Element weight, and one to allow quick hashed insertion. This will allow O(1) lookup of the lowest weight Element and also allows O(n) random insertion of n new elements.
For large N, it should scale better than the four previously mentioned containers, but again, for moderate N, cache effects induced by pointer chasing into random memory might spoil its theoretical advantage over std::priority_queue. Did I mention the mantra of measure, measure, measure?

Try either of these:
std::map<int,std::vector<Data>>
or
std::unordered_map<int,std::vector<Data>>
The int above is the weight.
These both have different speeds for find, remove and add depending on many different factors such as if the element is there or not. (If there, unordered_map .find is faster, if not, map .find is faster)

Related

Is there a data structure like a C++ std set which also quickly returns the number of elements in a range?

In a C++ std::set (often implemented using red-black binary search trees), the elements are automatically sorted, and key lookups and deletions in arbitrary positions take time O(log n) [amortised, i.e. ignoring reallocations when the size gets too big for the current capacity].
In a sorted C++ std::vector, lookups are also fast (actually probably a bit faster than std::set), but insertions are slow (since maintaining sortedness takes time O(n)).
However, sorted C++ std::vectors have another property: they can find the number of elements in a range quickly (in time O(log n)).
i.e., a sorted C++ std::vector can quickly answer: how many elements lie between given x,y?
std::set can quickly find iterators to the start and end of the range, but gives no clue how many elements are within.
So, is there a data structure that allows all the speed of a C++ std::set (fast lookups and deletions), but also allows fast computation of the number of elements in a given range?
(By fast, I mean time O(log n), or maybe a polynomial in log n, or maybe even sqrt(n). Just as long as it's faster than O(n), since O(n) is almost the same as the trivial O(n log n) to search through everything).
(If not possible, even an estimate of the number to within a fixed factor would be useful. For integers a trivial upper bound is y-x+1, but how to get a lower bound? For arbitrary objects with an ordering there's no such estimate).
EDIT: I have just seen the related question, which essentially asks whether one can compute the number of preceding elements. (Sorry, my fault for not seeing it before). This is clearly trivially equivalent to this question (to get the number in a range, just compute the start/end elements and subtract, etc.)
However, that question also allows the data to be computed once and then be fixed, unlike here, so that question (and the sorted vector answer) isn't actually a duplicate of this one.
The data structure you're looking for is an Order Statistic Tree.
It's typically implemented as a binary search tree in which each node additionally stores the size of its subtree.
Unfortunately, I'm pretty sure the STL doesn't provide one.
All data structures have their pros and cons, which is why the standard library offers a bunch of containers.
And the rule is that there is often a balance between quickness of modifications and quickness of data extraction. Here you would like to easily access the number of elements in a range. A possibility in a tree-based structure would be to cache in each node the number of elements of its subtree. That would add an average of log(N) operations (the height of the tree) on each insertion or deletion, but would greatly speed up the computation of the number of elements in a range. Unfortunately, few classes from the C++ standard library are tailored for derivation (and AFAIK std::set is not) so you will have to implement your tree from scratch.
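To make the "cache the subtree size in each node" idea concrete, here is a small sketch. It is deliberately an unbalanced binary search tree over int keys (both simplifications are assumptions for brevity), so the O(log n) bounds only hold for reasonably random insertion order; a real order statistic tree would put the same size field on a balanced tree. The names CountingTree, count_less and count_in_range are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Unbalanced BST whose nodes cache the size of their subtree.
// Duplicate keys are allowed and go to the right.
class CountingTree {
    struct Node {
        int key;
        std::size_t size = 1;  // number of nodes in the subtree rooted here
        std::unique_ptr<Node> left, right;
        explicit Node(int k) : key(k) {}
    };
    std::unique_ptr<Node> root_;

    static std::size_t size_of(const std::unique_ptr<Node>& n) {
        return n ? n->size : 0;
    }

    // Number of stored keys <= key.
    std::size_t count_less_equal(int key) const {
        std::size_t count = 0;
        for (const Node* n = root_.get(); n != nullptr;) {
            if (key < n->key) {
                n = n->left.get();
            } else {
                count += 1 + size_of(n->left);  // n and its left subtree are <= key
                n = n->right.get();
            }
        }
        return count;
    }

public:
    void insert(int key) {
        std::unique_ptr<Node>* slot = &root_;
        while (*slot) {
            ++(*slot)->size;  // every node on the path gains one descendant
            slot = key < (*slot)->key ? &(*slot)->left : &(*slot)->right;
        }
        *slot = std::make_unique<Node>(key);
    }

    // Number of stored keys strictly less than key.
    std::size_t count_less(int key) const {
        std::size_t count = 0;
        for (const Node* n = root_.get(); n != nullptr;) {
            if (key <= n->key) {
                n = n->left.get();
            } else {
                count += 1 + size_of(n->left);  // n and its left subtree are < key
                n = n->right.get();
            }
        }
        return count;
    }

    // Number of stored keys in the closed range [lo, hi].
    std::size_t count_in_range(int lo, int hi) const {
        assert(lo <= hi);
        return count_less_equal(hi) - count_less(lo);
    }
};
```

Each query walks a single root-to-leaf path, which is where the speed-up over counting element by element comes from.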
Maybe you are looking for a C++ alternative to Java's LinkedHashSet: https://docs.oracle.com/javase/7/docs/api/java/util/LinkedHashSet.html.

STL priority_queue<pair> vs. map

I need a priority queue that will store a value for every key, not just the key. I think the viable options are std::multimap<K,V> since it iterates in key order, or std::priority_queue<std::pair<K,V>> since it sorts on K before V. Is there any reason I should prefer one over the other, other than personal preference? Are they really the same, or did I miss something?
A priority queue is heapified initially, in O(N) time, and then iterating all the elements in decreasing order takes O(N log N) time. It is stored in a std::vector behind the scenes, so there's only a small coefficient after the big-O behavior. Part of that, though, is moving the elements around inside the vector. If sizeof (K) or sizeof (V) is large, it will be a bit slower.
std::map is a red-black tree (in universal practice), so it takes O(N log N) time to insert the elements, keeping them sorted after each insertion. They are stored as linked nodes, so each item incurs malloc and free overhead. Then it takes O(N) time to iterate over them and destroy the structure.
The priority queue overall should usually have better performance, but it's more constraining on your usage: the data items will move around during iteration, and you can only iterate once.
If you don't need to insert new items while iterating, you can use std::sort with a std::vector, of course. This should outperform the priority_queue by some constant factor.
As with most things in performance, the only way to judge for sure is to try it both ways (with real-world testcases) and measure.
By the way, to maximize performance, you can define a custom comparison function to ignore the V and compare only the K within the pair<K,V>.
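A key-only comparator of the kind mentioned above could look like the following sketch. The concrete types (int keys, std::string values) and the names Entry, KeyLess, KeyedQueue and pop_value are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // K = int, V = std::string (assumed)

// Compares only the key, so the value is never inspected while the heap
// is being reordered.
struct KeyLess {
    bool operator()(const Entry& a, const Entry& b) const {
        return a.first < b.first;
    }
};

using KeyedQueue = std::priority_queue<Entry, std::vector<Entry>, KeyLess>;

// Pops the entry with the largest key and returns its value.
std::string pop_value(KeyedQueue& q) {
    assert(!q.empty());
    std::string value = q.top().second;
    q.pop();
    return value;
}
```

With the default std::less<std::pair<K,V>>, ties on the key would fall through to comparing the values, which is exactly the work this comparator skips.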

std::map get the lowest n elements time

std::map should be implemented with a binary search tree as I read in the documentation and it sorts them too.
I need to insert rapidly and retrieve rapidly elements. I also need to get the first lowest N elements from time to time.
I was thinking about using a std::map, is it a good choice? If it is, what is the time I would need to retrieve the lowest N elements? O(n*logn)?
Given you need both retrieval and the n smallest elements, I would say std::map is a reasonable choice. But depending on the exact access pattern, std::vector with sorting might be a good choice too.
I am not sure what you mean by retrieve. Time to read k elements is O(k) (provided you do it sequentially using iterator), time to remove them is O(k log n) (n is the total amount of elements; even if you do it sequentially using iterators).
You can use iterators to rapidly read through the lowest N elements. Going from begin() to the N-1th element will take O(N) time (getting the next element is amortised constant time for a std::map).
I'd note, however, that it is often actually faster to use a sorted std::vector with a binary chop search method to implement what it sounds like you are doing so depending on your exact requirements this might be worth investigating.
The C++ standard requires that all required iterator operations (including iterator increment) be amortized constant time. Consequently, getting the first N items in a container must take amortized O(N) time.
I would say yes to both questions.
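Reading the lowest N entries by walking from begin(), as the answers above describe, might be sketched like this. The map's key and value types and the helper name lowest_keys are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns the keys of the n smallest entries, or all keys if the map
// holds fewer than n entries. Cost is proportional to the number of
// entries visited, not to the size of the map.
std::vector<int> lowest_keys(const std::map<int, std::string>& m, std::size_t n) {
    std::vector<int> keys;
    for (auto it = m.begin(); it != m.end() && keys.size() < n; ++it) {
        keys.push_back(it->first);  // std::map iterates in ascending key order
    }
    assert(keys.size() <= n);
    return keys;
}
```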

Looking for clarification on Hashing and BST functions and Big O notation

So I am trying to understand the data types and Big O notation of some functions for a BST and Hashing.
So first off, how are BSTs and Hashing stored? Are BSTs usually arrays, or are they linked lists because they have to point to their left and right leaves?
What about Hashing? I've had the most trouble finding clear information regarding Hashing in terms of computation-based searching. I understand that Hashing is best implemented with an array of chains. Is this for faster searching or to decrease overhead on creating the allocated data type?
This following question might be just bad interpretation on my part, but what makes a traversal function different from a search function in BSTs, Hashing, and STL containers?
Is traversal Big O(N) for BSTs because you're actually visiting each node/data member, whereas search() can reduce its time by eliminating half the searching field?
And somewhat related, why is it that in the STL, list.insert() and list.erase() have a Big O(1) whereas the vector and deque counterparts are O(N)?
Lastly, why would a vector.push_back() be O(N)? I thought the function could be done something along the lines of this like O(1), but I've come across text saying it is O(N):
vector<int> vic(2,3);
vector<int>::iterator IT = vic.end();
//wanna insert 4 to the end using push_back
IT++;
(*IT) = 4;
hopefully this works. I'm a bit tired but I would love any explanations why something similar to that wouldn't be efficient or plausible. Thanks
BSTs (ordered binary trees) are a series of nodes where a parent node points to its two children, which in turn point to their max-two children, etc. They're traversed in O(n) time because traversal visits every node. Lookups take O(log n) time in a balanced tree. Once the insertion point has been found (itself a lookup), the insert takes O(1) time because internally it doesn't need to move a bunch of existing nodes; it just allocates some memory and re-aims the pointers. :)
Hashes (unordered_map) use a hashing algorithm to assign elements to buckets. Usually buckets contain a linked list so that hash collisions just result in several elements in the same bucket. Traversal will again be O(n), as expected. Lookups and inserts will be amortized O(1). Amortized means that on average, O(1), though an individual insert might result in a rehashing (redistribution of buckets to minimize collisions). But over time the average complexity is O(1). Note, however, that big-O notation doesn't really deal with the "constant" aspect; only order of growth. The constant overhead in the hashing algorithms can be high enough that for some data-sets the O(log n) binary trees outperform the hashes. Nevertheless, the hash's advantage is that its operations are constant time-complexity.
Search functions take advantage (in the case of binary trees) of the notion of "order"; a search through a BST has the same characteristics as a basic binary search over an ordered array. O(log n) growth. Hashes don't really "search". They compute the bucket, and then quickly run through the collisions to find the target. That's why lookups are constant time.
As for insert and erase; in array-based sequence containers, all elements that come after the target have to be bumped over to the right. Move semantics in C++11 can improve upon the performance, but the operation is still O(n). For linked sequence containers (list, forward_list, trees), insertion and erasing just means fiddling with some pointers internally. It's a constant-time process.
push_back() will be O(1) until you exceed the existing allocated capacity of the vector. Once the capacity is exceeded, a new allocation takes place to produce a container that is large enough to accept more elements. All the elements need to then be moved into the larger memory region, which is an O(n) process. I believe Move Semantics can help here as well, but it's still going to be O(n). Vectors and strings are implemented such that as they allocate space for a growing data set, they allocate more than they need, in anticipation of additional growth. This is an efficiency safeguard; it means that the typical push_back() won't trigger a new allocation and move of the entire data set into a larger container. But eventually after enough push_backs, the limit will be reached, and the vector's elements will be copied into a larger container, which again has some extra headroom left over for more efficient push_backs.
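The growth behaviour described above can be observed directly by watching capacity(). The sketch below counts how many push_back calls actually changed the capacity; the helper name count_reallocations is an assumption, and the exact growth factor is implementation-defined, so the count will differ between standard libraries. Note that appending should go through push_back itself: writing through an iterator at or past end(), as in the question's snippet, is undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends `count` ints with push_back and reports how many of those calls
// changed capacity(), i.e. triggered a reallocation and an O(n) move/copy.
std::size_t count_reallocations(std::size_t count) {
    std::vector<int> v;
    std::size_t reallocations = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t before = v.capacity();
        v.push_back(static_cast<int>(i));
        if (v.capacity() != before) {
            ++reallocations;
        }
    }
    assert(v.size() == count);
    return reallocations;
}
```

Because capacity grows geometrically, only a small number of the calls reallocate, which is why push_back is amortized O(1) even though an individual call can be O(n).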
Traversal refers to visiting every node, whereas search is only to find a particular node, so your intuition is spot on there. O(N) complexity because you need to visit N nodes.
std::vector::insert is for inserting in the middle, and it involves copying all subsequent elements over by one slot in order to make room for the element being inserted, hence O(N). A linked list doesn't have this issue, hence O(1). Similar logic applies to erase. deque properties are similar to vector.
std::vector::push_back is a O(1) operation, for the most part, only deviates if capacity is exceeded and reallocations + copy are needed.

A variation of priority queue

I need some kind of priority queue to store pairs <key, value>. Values are unique, but keys aren't. I will be performing the following operations (most common first):
random insertion;
retrieving (and removing) all elements with the least key;
random removal (by value).
I can't use std::priority_queue because it only supports removing the head.
For now, I'm using an unsorted std::list. Insertion is performed by just pushing new elements to the back (O(1)). Operation 2 sorts the list with list::sort (O(N*logN)), before performing the actual retrieval. Removal, however, is O(n), which is a bit expensive.
Any idea of a better data structure?
When you need order, use an ordered container. There is no point in paying the cost of sorting later on.
Your current solution is:
Insertion O(1)
Retrieval O(N log N)
Removal O(N) (which is as good as you can get without keeping another index there)
Simply using a std::multimap you could have:
Insertion O(log N)
Retrieval O(log N) <-- much better, isn't it? We need to find the end of the range
Removal O(N)
Now, you could do slightly better with a std::map< key, std::vector<value> >:
Insertion O(log M) where M is the number of distinct keys
Retrieval O(1) (begin is guaranteed to be amortized constant time)
Removal O(N)
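A minimal sketch of the std::map< key, std::vector<value> > layout just described, assuming int keys and std::string values; the names Buckets, add and take_least are made up for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// key -> all values that share that key
using Buckets = std::map<int, std::vector<std::string>>;

// O(log M) in the number of distinct keys, plus an amortized O(1) push_back.
void add(Buckets& b, int key, std::string value) {
    b[key].push_back(std::move(value));
}

// Removes and returns every value stored under the smallest key.
std::vector<std::string> take_least(Buckets& b) {
    assert(!b.empty());
    auto first = b.begin();  // smallest key, found in constant time
    std::vector<std::string> values = std::move(first->second);
    b.erase(first);
    return values;
}
```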
You can't really push the random removal... unless you're willing to keep another index there. For example:
typedef std::vector<value_type> data_value_t;
typedef std::map<key_type, data_value_t> data_t;
typedef std::pair<data_t::iterator,size_t> index_value_t;
// where iterator gives you the right vector and size_t is an index in it
typedef std::unordered_map<value_type, index_value_t> index_t;
But keeping this second index up to date is error prone... and will be done at the expense of the other operations! For example, with this structure, you would have:
Insertion O(log M) --> complexity of insertion in hash map is O(1)
Retrieval O(N/M) --> need to de-index all the values in the vector, there are N/M on average
Removal O(N/M) --> finding in hash map O(1), dereferencing O(1), removing from the vector O(N/M) because we need to shift approximately half the content of the vector. Using a list would yield O(1)... but might not be faster (depends on the number of elements because of the memory tradeoff).
Also bear in mind that hash map complexities are amortized ones. Trigger a reallocation because you overgrew the load factor, and this particular insertion will take a very long time.
I'd really go with the std::map<key_type, std::vector<value_type> > in your stead. That's the best bang for the buck.
Can you reverse the order of the collection, i.e. store them in <value, key> order?
Then you could just use std::map, with O(logn) time for insertion, O(n) for removal of the elements with the least key (traversing the whole collection) and O(logn) for random removal of a value (which would be the key of said map).
If you could find a map implementation based on hashes instead of trees (like std::map) the times would be even better: O(1), O(n), O(1).
If you're using Visual Studio they have hash_multimap. I should also add that Boost has an unordered multimap. If you need an ordered container, there is the STL multimap or the STL multiset.
std::multimap seems to be what you are searching for.
It will store your objects ordered by key, allow you to retrieve the lowest/highest key value (begin(), rbegin()) and all the objects with a given key (equal_range, lower_bound, upper_bound).
(EDIT: if you have just a few items, say less than 30, you should also test the performance of just using a deque or a vector)
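For the std::multimap suggestion, retrieving and removing all elements with the least key could be sketched as follows, assuming int keys and std::string values; the helper name pop_least_key is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Removes and returns all values whose key equals the smallest key.
std::vector<std::string> pop_least_key(std::multimap<int, std::string>& m) {
    std::vector<std::string> values;
    if (m.empty()) {
        return values;
    }
    // equal_range yields [first, second) covering every entry with that key.
    auto range = m.equal_range(m.begin()->first);
    assert(range.first == m.begin());
    for (auto it = range.first; it != range.second; ++it) {
        values.push_back(it->second);
    }
    m.erase(range.first, range.second);
    return values;
}
```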
If I understood well, your performance target is to have fast (1) and (3), and (2) is not that important. In this case, and given that values are unique, why not just have a std::set<value>, and do a sequential search for (2)? You'd have O(log n) for (1) and (3), and O(n) for (2). Better yet, if your STL has std::hash_set, you'd have close to O(1) for (1) and (3).
If you need something better than O(n) for (2), one alternative would be to have a set of priority queues.
Ok, so I've tested many options and ended up with something based on the idea of Matthieu M.. I'm currently using a std::map<key_type, std::list<value_type> >, where the value_type contains a std::list<value_type>::iterator to itself, which is useful for removal.
Removal must check if the list is empty, which implies a map query and possibly a call to erase. Worst-case complexity is when keys are distinct, O(logN) for insertion, O(1) for retrieval and O(logN) for removal. I've got very good experimental results compared to other alternatives on my test machine.
Using a std::vector is less efficient both in terms of theoretical complexity (O(N) worst-case for removal when keys are identical) and experimentation I've been doing.
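The map-of-lists structure can be sketched roughly as below. This is a variation, not the poster's actual code: instead of storing a list iterator inside each value, it keeps a separate std::unordered_map from value to its position (the "second index" idea from the earlier answer). Keys and values are plain ints, and the names KeyedBag, Position, insert, remove and take_least are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <map>
#include <unordered_map>
#include <vector>

class KeyedBag {
    using Bucket = std::list<int>;          // values sharing one key
    using Buckets = std::map<int, Bucket>;  // key -> bucket, ordered by key
    struct Position {
        Buckets::iterator bucket;
        Bucket::iterator item;
    };

    Buckets buckets_;
    std::unordered_map<int, Position> index_;  // value -> where it is stored

public:
    void insert(int key, int value) {
        assert(index_.count(value) == 0);  // values are required to be unique
        auto bucket = buckets_.emplace(key, Bucket{}).first;  // existing or new bucket
        bucket->second.push_back(value);
        index_.emplace(value, Position{bucket, std::prev(bucket->second.end())});
    }

    // Removes one value wherever it is stored; returns false if absent.
    bool remove(int value) {
        auto found = index_.find(value);
        if (found == index_.end()) {
            return false;
        }
        Position pos = found->second;
        pos.bucket->second.erase(pos.item);  // list erase keeps other iterators valid
        if (pos.bucket->second.empty()) {
            buckets_.erase(pos.bucket);      // drop the key once its bucket is empty
        }
        index_.erase(found);
        return true;
    }

    // Removes and returns all values stored under the smallest key.
    std::vector<int> take_least() {
        std::vector<int> values;
        if (buckets_.empty()) {
            return values;
        }
        auto first = buckets_.begin();
        for (int v : first->second) {
            index_.erase(v);
            values.push_back(v);
        }
        buckets_.erase(first);
        return values;
    }

    std::size_t size() const { return index_.size(); }
};
```

The sketch relies on std::map and std::list iterators staying valid when other elements are inserted or erased, which is what makes the stored positions safe; the same would not hold for std::vector buckets.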