Heaps and Binary Search Trees - heap

What is the running time of Max-Heapify when it is implemented on a k-ary heap?
Is a k-ary heap asymptotically more efficient than a binary heap?
Is a k-ary heap more efficient than a binary heap in practice?
Can a search tree be implemented as a k-ary tree?

You've asked a lot of questions, so I'll try to answer all of them in turn.
The runtime of the heapify operation (building a heap from n unordered elements) on a k-ary heap is O(n), which is independent of k. This isn't immediately obvious, but most introductory algorithms textbooks have a proof of this result for the case where k = 2.
Let's do the analysis for a k-ary heap in general, which we can then compare against a binary heap by just setting k = 2. In a k-ary heap, the cost of a find-min operation is O(1) (just look at the top of the heap) and the cost of a heapify operation is O(n), as mentioned above. When adding a new element to a k-ary heap, the runtime is proportional to the height of the heap, which is O(log_k n) = O(log n / log k) (that follows from using the change-of-base formula for logarithms). It's not common to include the base of a logarithm inside big-O notation, but in this case because k is a parameter we can't ignore its contribution. In an extract-min operation, we need to work from the top of the tree down to the bottom. At each level, we look at up to k children of the current node to find the smallest, then potentially do a swap down. This means that there is O(k) work per layer and there are O(log n / log k) layers, so the work done is O(k log n / log k). Asymptotically, for any fixed k, the runtimes of these operations (find-min, heapify, insert, extract-min) are O(1), O(n), O(log n), and O(log n), respectively, so there's no asymptotic difference between a k-ary heap and a binary heap.
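To make the O(k) work per level concrete, here is a minimal sketch of a k-ary min-heap stored in an array. The function names (sift_down, build_heap, extract_min) and the use of int keys are assumptions made for illustration, not anything from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restore the min-heap property below index i in a k-ary heap stored in an array.
// The children of node i live at indices k*i + 1 ... k*i + k.
void sift_down(std::vector<int>& heap, std::size_t i, std::size_t k) {
    assert(k >= 2);
    const std::size_t n = heap.size();
    while (true) {
        std::size_t smallest = i;
        const std::size_t first = k * i + 1;
        // O(k) work per level: scan up to k children for the smallest.
        for (std::size_t c = first; c < first + k && c < n; ++c) {
            if (heap[c] < heap[smallest]) smallest = c;
        }
        if (smallest == i) return;  // heap property holds here
        std::swap(heap[i], heap[smallest]);
        i = smallest;               // continue one level down
    }
}

// Build a heap in O(n) by sifting down every internal node, last to first.
void build_heap(std::vector<int>& heap, std::size_t k) {
    if (heap.size() < 2) return;
    // (size - 2) / k is the parent of the last element, i.e. the last internal node.
    for (std::size_t i = (heap.size() - 2) / k + 1; i-- > 0;) {
        sift_down(heap, i, k);
    }
}

// Remove and return the minimum: O(k log n / log k).
int extract_min(std::vector<int>& heap, std::size_t k) {
    assert(!heap.empty());
    const int top = heap.front();
    heap.front() = heap.back();
    heap.pop_back();
    if (!heap.empty()) sift_down(heap, 0, k);
    return top;
}
```

Calling build_heap(v, 3) and then extract_min(v, 3) repeatedly yields the elements in ascending order; changing the last argument is all it takes to try a different k.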
In practice, though, there are differences. One good way to see this is to make k really, really big (say, 10^100). In that case, the cost of a deletion will be quite large because there will be up to 10^100 children per node, which will dwarf the height of the corresponding binary tree. For middling values of k (k = 3 or 4), there's a chance that it may actually be faster to use a 3-ary or 4-ary tree over a binary tree, but really the best way to find out would be to profile it and see what happens. The interactions of factors like locality of reference, caching, and division speed will all be competing with one another to affect the runtime.
Yes! There are such things as multiway search trees. One of the most famous of these is the B-tree, which is actually a pretty fun data structure to read up on.
Hope this helps!

Related

Order-maintenance data structure in C++

I'm looking for a data structure which would efficiently solve the order-maintenance problem. In other words, I need to efficiently
insert (in the middle),
delete (in the middle),
compare positions of elements in the container.
I found good articles which discuss this problem:
Two Algorithms for Maintaining Order in a List,
Two Simplified Algorithms for Maintaining Order in a List.
The algorithms are quite efficient (some are stated to be O(1) for all operations), but they do not seem to be trivial, and I'm wondering if there is an open source C++ implementation of these or similar data structures.
I've seen a related topic where some simpler approaches with time complexity O(log n) for all operations were suggested, but here I'm looking for an existing implementation.
An example in some other popular language would be good too; that way I would at least be able to experiment with it before trying to implement it myself.
Details
I'm going to
maintain a list of pointers to objects,
from time to time I will need to change object's order (delete+insert),
given a subset of objects I need to be able to quickly sort them and process them in correct order.
Note
The standard ordered containers (std::set, std::map) are not what I'm looking for because they maintain the order for me, but I need to order the elements myself. This is similar to what I would do with std::list, but there position comparison would be linear, which is not acceptable.
If you are looking for easy-to-implement and efficient solution at the same time you could build this structure using a balanced binary search tree (AVL or Red-Black tree). You could implement the operations as follows:
insert(X, Y) (inserts Y immediately after X in the total order) - if X doesn't have a right child, set the right child of X to be Y; otherwise let Z be the leftmost node of the subtree rooted at X.right (that is, the lowest Z = X.right.left.left.left... which is not NULL) and set the left child of Z to be Y. Rebalance if you have to. You can see that the total complexity would be O(log n).
delete(X) - just delete the node X as you normally would from the tree. Complexity O(log n).
compare(X,Y) - find the path from X to the root and the path from Y to the root. You can find Z, the lowest common ancestor of X and Y, from those two paths. Now, you can compare X and Y depending on whether they are in the left or in the right subtree of Z (they can't both be in the same subtree, since then Z wouldn't be their lowest common ancestor). If Z is X or Y itself, the side of Z on which the other node lies decides the order. Complexity O(log n).
So you can see that the advantage of this implementation is that all operations would have complexity O(log n) and it's easy to implement.
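The compare step can be sketched as follows, assuming every node stores a parent pointer and both nodes belong to the same tree. The Node layout and the names depth and precedes are hypothetical, and balancing is left out entirely.

```cpp
#include <cassert>

// Minimal node with parent pointers; the payload is omitted.
struct Node {
    Node* parent = nullptr;
    Node* left = nullptr;
    Node* right = nullptr;
};

// Number of edges from n up to the root: O(log n) in a balanced tree.
int depth(const Node* n) {
    int d = 0;
    for (; n->parent != nullptr; n = n->parent) ++d;
    return d;
}

// Returns true if x comes strictly before y in the in-order (total) order.
bool precedes(const Node* x, const Node* y) {
    assert(x != nullptr && y != nullptr);
    if (x == y) return false;
    int dx = depth(x);
    int dy = depth(y);
    // from_x / from_y remember the child through which each path reached
    // its current position; nullptr means "has not moved yet".
    const Node* from_x = nullptr;
    const Node* from_y = nullptr;
    // Lift the deeper node until both are at the same depth.
    while (dx > dy) { from_x = x; x = x->parent; --dx; }
    while (dy > dx) { from_y = y; y = y->parent; --dy; }
    // Climb in lockstep until the paths meet at the lowest common ancestor.
    while (x != y) {
        from_x = x; x = x->parent;
        from_y = y; y = y->parent;
    }
    const Node* z = x;
    assert(z != nullptr);  // would fail if the nodes were in different trees
    // If x never moved, x is the ancestor: y follows x iff y is in x's right subtree.
    if (from_x == nullptr) return from_y == z->right;
    // Otherwise x precedes y iff x sits in the left subtree of the ancestor.
    return from_x == z->left;
}
```

Sorting a subset of objects then reduces to std::sort with precedes as the comparator, at O(log n) per comparison.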
You can use a skip list in much the same way you would use std::list.
Skip lists were first described in 1989 by William Pugh.
To quote the author:
Skip lists are a probabilistic data structure that seem likely to supplant balanced trees as the implementation method of choice for many applications. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space.
http://drum.lib.umd.edu/handle/1903/542
The STL is the solution to your problem.
It provides standard, proven and efficient containers and the algorithms that support them. Almost all of the containers in the STL support the operations you have mentioned.
It seems like std::deque has the best qualities for the tasks you are referring to:
1) Insertion: at the back and at the front in O(1); inserting in the middle is linear in the number of elements that have to be shifted.
2) Deletion: erasing at either end is O(1); std::deque::erase in the middle is linear in the number of elements erased plus the number of elements shifted towards the nearer end.
3) Position comparison: std::deque has random-access iterators, so comparing two indices or iterators is O(1), but finding the position of a given element by value is O(N).
4) Sorting: std::sort runs in O(n log n); it is usually implemented as introsort, a quicksort that falls back to heapsort and uses insertion sort for small ranges.
Do not try to use an open source solution or build your own library before you have tried using the STL thoroughly!

Performance of vector sort/unique/erase vs. copy to unordered_set

I have a function that gets all neighbours of a list of points in a grid out to a certain distance, which involves a lot of duplicates (my neighbour's neighbour == me again).
I've been experimenting with a couple of different solutions, but I have no idea which is the more efficient. Below is some code demonstrating two solutions running side by side, one using std::vector sort-unique-erase, the other using std::copy into a std::unordered_set.
I also tried another solution, which is to pass the vector containing the neighbours so far to the neighbour function, which will use std::find to ensure a neighbour doesn't already exist before adding it.
So three solutions, but I can't quite wrap my head around which is gonna be faster. Any ideas anyone?
Code snippet follows:
// Vector of all neighbours of all modified phi points, which may initially include duplicates.
std::vector<VecDi> aneighs;
// Hash function, mapping points to their norm distance.
// Nothing is captured, so the capture list is empty.
auto hasher = [] (const VecDi& a) {
    return std::hash<UINT>()(a.squaredNorm() >> 2);
};
// Unordered set for storing neighbours without duplication.
// decltype(hasher) names the lambda's type, which has no function-pointer equivalent here.
std::unordered_set<VecDi, decltype(hasher)> sneighs(phi.dims().squaredNorm() >> 2, hasher);
... compute big long list of points including many duplicates ...
// Insert neighbours into unordered_set to remove duplicates.
std::copy(aneighs.begin(), aneighs.end(), std::inserter(sneighs, sneighs.end()));
// De-dupe neighbours list.
// TODO: is this method faster or slower than unordered_set?
std::sort(aneighs.begin(), aneighs.end(), [&] (const VecDi& a, const VecDi& b) {
    const UINT aidx = Grid<VecDi, D>::index(a, phi.dims(), phi.offset());
    const UINT bidx = Grid<VecDi, D>::index(b, phi.dims(), phi.offset());
    return aidx < bidx;
});
aneighs.erase(std::unique(aneighs.begin(), aneighs.end()), aneighs.end());
A great deal here is likely to depend on the size of the output set (which, in turn, will depend on how distant of neighbors you sample).
If it's small (no more than a few dozen items or so), your hand-rolled set implementation using std::vector and std::find will probably remain fairly competitive. Its problem is that it's an O(N^2) algorithm -- each time you insert an item, you have to search all the existing items, so each insertion is linear in the number of items already in the set. Therefore, as the set grows larger, the total time to insert items grows roughly quadratically.
Using std::set, each insertion only has to do approximately log2(N) comparisons instead of N comparisons. That reduces the overall complexity from O(N^2) to O(N log N). The major shortcoming is that it's (at least normally) implemented as a tree built up of individually allocated nodes. That typically reduces its locality of reference -- i.e., each item you insert will consist of the data itself plus some pointers, and traversing the tree means following pointers around. Since they're allocated individually, chances are pretty good that nodes that are (currently) adjacent in the tree won't be adjacent in memory, so you'll see a fair number of cache misses. Bottom line: while its run time grows fairly slowly as the number of items increases, the constants involved are fairly large -- for a small number of items, it'll start out fairly slow (typically quite a bit slower than your hand-rolled version).
Using a vector/sort/unique combines some of the advantages of each of the preceding. Storing the items in a vector (without extra pointers for each) typically leads to better cache usage -- items at adjacent indexes are also at adjacent memory locations, so when you insert a new item, chances are that the location for the new item will already be in the cache. The major disadvantage is that if you're dealing with a really large set, this could use quite a bit more memory. Where a set eliminates duplicates as you insert each item (i.e., an item will only be inserted if it's different from anything already in the set) this will insert all the items, then at the end delete all the duplicates. Given current memory availability and the number of neighbors I'd guess you're probably visiting, I doubt this is a major disadvantage in practice, but under the wrong circumstances, it could lead to a serious problem -- nearly any use of virtual memory would almost certainly make it a net loss.
Looking at the last from a complexity viewpoint, it's going to be O(N log N), sort of like the set. The difference is that with the set it's really more like O(N log M), where N is the total number of neighbors, and M is the number of unique neighbors. With the vector, it's really O(N log N), where N is (again) the total number of neighbors. As such, if the number of duplicates is extremely large, a set could have a significant algorithmic advantage.
It's also possible to implement a set-like structure in purely linear sequences. This retains the set's advantage of only storing unique items, but also the vector's locality of reference advantage. The idea is to keep most of the current set sorted, so you can search it in log(N) complexity. When you insert a new item, however, you just put it in the separate vector (or an unsorted portion of the existing vector). When you do a new insertion you also do a linear search on those unsorted items.
When that unsorted part gets too large (for some definition of "too large") you sort those items and merge them into the main group, then start the same sequence again. If you define "too large" in terms of "log N" (where N is the number of items in the sorted group) you can retain O(N log N) complexity for the data structure as a whole. When I've played with it, I've found that the unsorted portion can be larger than I'd have expected before it starts to cause a problem though.
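A rough sketch of that sorted-plus-unsorted idea, assuming int elements. The class name FlatSet, the flush threshold (about log2 of the sorted size, with a floor of 8) and the member names are all choices made for illustration, not tuned values.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Set-like container: a sorted main vector plus a small unsorted tail.
class FlatSet {
public:
    // Returns true if the value was newly inserted.
    bool insert(int v) {
        if (contains(v)) return false;
        pending_.push_back(v);
        if (pending_.size() > limit()) flush();
        return true;
    }

    // Binary search in the sorted part, linear search in the unsorted part.
    bool contains(int v) const {
        if (std::binary_search(sorted_.begin(), sorted_.end(), v)) return true;
        return std::find(pending_.begin(), pending_.end(), v) != pending_.end();
    }

    std::size_t size() const { return sorted_.size() + pending_.size(); }

private:
    // "Too large" threshold (an assumption): floor(log2(sorted size)), at least 8.
    std::size_t limit() const {
        std::size_t bits = 0;
        for (std::size_t n = sorted_.size(); n > 1; n >>= 1) ++bits;
        return std::max<std::size_t>(bits, 8);
    }

    // Sort the unsorted tail and merge it into the main sorted run.
    void flush() {
        std::sort(pending_.begin(), pending_.end());
        const auto mid = static_cast<std::ptrdiff_t>(sorted_.size());
        sorted_.insert(sorted_.end(), pending_.begin(), pending_.end());
        std::inplace_merge(sorted_.begin(), sorted_.begin() + mid, sorted_.end());
        pending_.clear();
        assert(std::is_sorted(sorted_.begin(), sorted_.end()));
    }

    std::vector<int> sorted_;
    std::vector<int> pending_;
};
```

The threshold is the knob worth experimenting with; as noted above, the unsorted part can often be larger than expected before it starts to hurt.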
An unordered set has constant time complexity, O(1), for insertion (on average), so inserting everything will be O(n), where n is the number of elements before duplicates are removed.
Sorting a list of n elements is O(n log n), and going over the list to remove duplicates is O(n). O(n log n) + O(n) = O(n log n).
The unordered set (which performs like a hash table) is better asymptotically.
data about unsorted set times:
http://en.cppreference.com/w/cpp/container/unordered_set
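Since asymptotics and cache effects pull in different directions here, measuring on your own data is the only reliable answer. Below is a small sketch of the two approaches on plain ints, plus a timing helper; the names dedupe_sort_unique, dedupe_hash and time_ms are made up for this example, and int stands in for the question's VecDi.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <unordered_set>
#include <vector>

// Approach 1: sort, then drop adjacent duplicates. O(N log N), contiguous memory.
std::vector<int> dedupe_sort_unique(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Approach 2: insert into a hash set. O(N) on average, result order unspecified.
std::vector<int> dedupe_hash(const std::vector<int>& v) {
    std::unordered_set<int> seen(v.begin(), v.end());
    assert(seen.size() <= v.size());
    return std::vector<int>(seen.begin(), seen.end());
}

// Wall-clock time of one call to f, in milliseconds.
template <class F>
double time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, time_ms([&] { dedupe_sort_unique(data); }) and time_ms([&] { dedupe_hash(data); }) can be compared on a vector with a realistic duplicate ratio, ideally over several runs.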

Data structure for O(log N) find and update, considering small L1 cache

I'm currently working on an embedded device project where I'm running into performance problems. Profiling has located an O(N) operation that I'd like to eliminate.
I basically have two arrays int A[N] and short B[N]. Entries in A are unique and ordered by external constraints. The most common operation is to check if a particular value a appears in A[]. Less frequently, but still common is a change to an element of A[]. The new value is unrelated to the previous value.
Since the most common operation is the find, that's where B[] comes in. It's a sorted array of indices in A[], such that A[B[i]] < A[B[j]] if and only if i<j. That means that I can find values in A using a binary search.
Of course, when I update A[k], I have to find k in B and move it to a new position, to maintain the search order. Since I know the old and new values of A[k], that's just a memmove() of a subset of B[] between the old and new position of k. This is the O(N) operation that I need to fix; since the old and new values of A[k] are essentially random I'm moving on average about N/3 elements.
I looked into std::make_heap using [](int i, int j) { return A[i] < A[j]; } as the predicate. In that case I can easily make B[0] point to the smallest element of A, and updating B is now a cheap O(log N) rebalancing operation. However, I generally don't need the smallest value of A, I need to find if any given value is present. And that's now a O(N log N) search in B. (Half of my N elements are at heap depth log N, a quarter at (log N)-1, etc), which is no improvement over a dumb O(N) search directly in A.
Considering that std::set has O(log N) insert and find, I'd say that it should be possible to get the same performance here for update and find. But how do I do that? Do I need another order for B? A different type?
B is currently a short [N] because A and B together are about the size of my CPU cache, and my main memory is a lot slower. Going from 6*N to 8*N bytes would not be nice, but still acceptable if my find and update go to O(log N) both.
If the only operations are (1) check if value 'a' belongs to A and (2) update values in A, why don't you use a hash table in place of the sorted array B? Especially if A does not grow or shrink in size and the values only change this would be a much better solution. A hash table does not require significantly more memory than an array. (Alternatively, B should be changed not to a heap but to a binary search tree, that could be self-balancing, e.g. a splay tree or a red-black tree. However, trees require extra memory because of the left- and right-pointers.)
A practical solution that grows memory use from 6N to 8N bytes is to aim for exactly 50% filled hash table, i.e. use a hash table that consists of an array of 2N shorts. I would recommend implementing the Cuckoo Hashing mechanism (see http://en.wikipedia.org/wiki/Cuckoo_hashing). Read the article further and you find that you can get load factors above 50% (i.e. push memory consumption down from 8N towards, say, 7N) by using more hash functions. "Using just three hash functions increases the load to 91%."
From Wikipedia:
A study by Zukowski et al. has shown that cuckoo hashing is much
faster than chained hashing for small, cache-resident hash tables on
modern processors. Kenneth Ross has shown bucketized versions of
cuckoo hashing (variants that use buckets that contain more than one
key) to be faster than conventional methods also for large hash
tables, when space utilization is high. The performance of the
bucketized cuckoo hash table was investigated further by Askitis,
with its performance compared against alternative hashing schemes.
std::set usually provides the O(log(n)) insert and delete by using a binary search tree. This unfortunately uses 3*N space for most pointer based implementations. Assuming word sized data, 1 for data, 2 for pointers to left and right child on each node.
If you have some constant N and can guarantee that ceil(log2(N)) is at most half the word size, you can use a fixed-length array of tree nodes that takes 2*N words in total. Use 1 word for the data and 1 for the indexes of the two child nodes, stored as the upper and lower halves of the word. Whether this would let you use a self-balancing binary search tree of some kind depends on your N and word size. For a 16-bit system you only get N = 256, but for 32 bits it's 65,536.
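As an illustration of that packing for a 32-bit word, here is a sketch of a node pool with two 16-bit child indices per word. PackedNode, kNil and the helper names are hypothetical; reserving 0xFFFF as a "no child" sentinel is an extra assumption that reduces the usable N to 65,535, and balancing is not shown.

```cpp
#include <cassert>
#include <cstdint>

// One tree node in two 32-bit words: the key, and both child indices in one word.
struct PackedNode {
    std::int32_t key;
    std::uint32_t children;  // upper 16 bits: left index, lower 16 bits: right index
};
static_assert(sizeof(PackedNode) == 8, "expected two 32-bit words per node");

// Sentinel index meaning "no child" (an assumption of this sketch).
constexpr std::uint16_t kNil = 0xFFFF;

inline std::uint16_t left_of(const PackedNode& n) {
    return static_cast<std::uint16_t>(n.children >> 16);
}

inline std::uint16_t right_of(const PackedNode& n) {
    return static_cast<std::uint16_t>(n.children & 0xFFFFu);
}

inline void set_children(PackedNode& n, std::uint16_t l, std::uint16_t r) {
    n.children = (static_cast<std::uint32_t>(l) << 16) | r;
}

// Ordinary BST search over the index-linked pool; returns the node index or kNil.
std::uint16_t find(const PackedNode* pool, std::uint16_t root, std::int32_t key) {
    assert(pool != nullptr);
    std::uint16_t i = root;
    while (i != kNil) {
        if (key == pool[i].key) return i;
        i = key < pool[i].key ? left_of(pool[i]) : right_of(pool[i]);
    }
    return kNil;
}
```

Because the nodes live in one fixed array, the whole tree stays in a contiguous block, which matters when the working set has to fit a small cache.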
Since you have limited N, can't you use std::set<short, cmp, pool_allocator> B with Boost's pool_allocator?

Why is std::map implemented as a red-black tree?

There are several balanced binary search trees (BSTs) out there. What were design trade-offs in choosing a red-black tree?
Probably the two most common self balancing tree algorithms are Red-Black trees and AVL trees. To balance the tree after an insertion/update both algorithms use the notion of rotations where the nodes of the tree are rotated to perform the re-balancing.
While in both algorithms the insert/delete operations are O(log n), a Red-Black tree needs only O(1) rotations to re-balance after an insertion or deletion, while an AVL tree may need O(log n) rotations after a deletion. This makes the Red-Black tree more efficient in this aspect of the re-balancing stage, and is one of the possible reasons that it is more commonly used.
Red-Black trees are used in most collection libraries, including the offerings from Java and Microsoft .NET Framework.
It really depends on the usage. An AVL tree usually performs more rotations when rebalancing. So if your application doesn't have too many insertion and deletion operations, but weighs heavily on searching, then an AVL tree is probably a good choice.
std::map uses Red-Black tree as it gets a reasonable trade-off between the speed of node insertion/deletion and searching.
The previous answers only address tree alternatives and red black probably only remains for historical reasons.
Why not a hash table?
A type only requires < operator (comparison) to be used as a key in a tree. However, hash tables require that each key type has a hash function defined. Keeping type requirements to a minimum is very important for generic programming so you can use it with a wide variety of types and algorithms.
Designing a good hash table requires intimate knowledge of the context in which it will be used. Should it use open addressing, or linked chaining? What levels of load should it accept before resizing? Should it use an expensive hash that avoids collisions, or one that is rough and fast?
Since the STL can't anticipate which is the best choice for your application, the default needs to be more flexible. Trees "just work" and scale nicely.
(C++11 did add hash tables with unordered_map. You can see from the documentation it requires setting policies to configure many of these options.)
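The difference in type requirements can be seen with a small user-defined key. In this sketch, Point, PointHash, the aliases and make_hash_map are invented for illustration, and the hash combination is deliberately naive rather than a recommended one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>

struct Point {
    int x;
    int y;
};

// All std::map needs from the key: a strict weak ordering.
bool operator<(const Point& a, const Point& b) {
    return a.x != b.x ? a.x < b.x : a.y < b.y;
}

// std::unordered_map needs equality plus a hash function.
bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

struct PointHash {
    std::size_t operator()(const Point& p) const {
        // Naive combination of the two coordinate hashes (an assumption).
        return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
    }
};

using TreeMap = std::map<Point, std::string>;
using HashMap = std::unordered_map<Point, std::string, PointHash>;

// The hash table also exposes policy knobs that the tree does not have.
HashMap make_hash_map(std::size_t expected_elements) {
    HashMap m;
    m.max_load_factor(0.5f);       // trade memory for fewer collisions
    m.reserve(expected_elements);  // pre-size the buckets to avoid rehashing
    assert(m.bucket_count() >= expected_elements);
    return m;
}
```

The tree version needs one function and no tuning; the hash version needs two functions and invites decisions about load factor and sizing.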
What about other trees?
Red Black trees offer fast lookup and are self balancing, unlike BSTs. Another user pointed out its advantages over the self-balancing AVL tree.
Alexander Stepanov (The creator of STL) said that he would use a B* Tree instead of a Red-Black tree if he wrote std::map again, because it is more friendly for modern memory caches.
One of the biggest changes since then has been the growth of caches.
Cache misses are very costly, so locality of reference is much more
important now. Node-based data structures, which have low locality of
reference, make much less sense. If I were designing STL today, I
would have a different set of containers. For example, an in-memory
B*-tree is a far better choice than a red-black tree for implementing
an associative container. - Alexander Stepanov
Should maps always use trees?
Another possible map implementation would be a sorted vector (insertion sort) and binary search. This would work well for containers which aren't modified often but are queried frequently.
I often do this in C as qsort and bsearch are built in.
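In C++ the same idea can be sketched with std::vector and std::lower_bound. FlatMap and its member names are hypothetical, and the int-to-string mapping is chosen only to keep the example short.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Map kept as a vector of (key, value) pairs sorted by key.
class FlatMap {
public:
    using Item = std::pair<int, std::string>;

    // O(log n) search plus O(n) shifting when a new key is inserted.
    void insert_or_assign(int key, std::string value) {
        auto it = std::lower_bound(items_.begin(), items_.end(), key, key_less);
        if (it != items_.end() && it->first == key) {
            it->second = std::move(value);
        } else {
            items_.emplace(it, key, std::move(value));
        }
        // Debug-only invariant check: the vector stays sorted by key.
        assert(std::is_sorted(items_.begin(), items_.end()));
    }

    // O(log n) binary search; returns nullptr if the key is absent.
    const std::string* find(int key) const {
        auto it = std::lower_bound(items_.begin(), items_.end(), key, key_less);
        return (it != items_.end() && it->first == key) ? &it->second : nullptr;
    }

private:
    static bool key_less(const Item& item, int key) { return item.first < key; }

    std::vector<Item> items_;
};
```

Lookups touch one contiguous array, which is where the cache advantage over a node-based tree comes from; the price is the linear-time insertion.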
Do I even need to use map?
Cache considerations mean it rarely makes sense to use std::list or std::deque over std::vector even for those situations we were taught in school (such as removing an element from the middle of the list).
Applying that same reasoning, using a for loop to linear search a list is often more efficient and cleaner than building a map for a few lookups.
Of course choosing a readable container is usually more important than performance.
AVL trees have a maximum height of 1.44 log n, while RB trees have a maximum of 2 log n. Inserting an element in an AVL tree may imply a rebalance at one point in the tree. The rebalancing finishes the insertion. After insertion of a new leaf, updating the ancestors of that leaf has to be done up to the root, or up to a point where the two subtrees are of equal depth. The probability of having to update k nodes is 1/3^k. Rebalancing is O(1). Removing an element may imply more than one rebalancing (up to half the depth of the tree).
RB-trees are B-trees of order 4 represented as binary search trees. A 4-node in the B-tree results in two levels in the equivalent BST. In the worst case, all the nodes of the tree are 2-nodes, with only one chain of 3-nodes down to a leaf. That leaf will be at a distance of 2 log n from the root.
Going down from the root to the insertion point, one has to change 4-nodes into 2-nodes, to make sure any insertion will not saturate a leaf. Coming back from the insertion, all these nodes have to be analysed to make sure they correctly represent 4-nodes. This can also be done going down in the tree. The global cost will be the same. There is no free lunch! Removing an element from the tree is of the same order.
All these trees require that nodes carry information on height, weight, color, etc. Only splay trees are free from such additional info. But most people are afraid of splay trees, because of the randomness of their structure!
Finally, trees can also carry weight information in the nodes, permitting weight balancing. Various schemes can be applied. One should rebalance when a subtree contains more than 3 times the number of elements of the other subtree. Rebalancing is again done either through a single or double rotation. This means a worst case of 2.4 log n. One can get away with 2 times instead of 3, a much better ratio, but it may mean leaving a little less than 1% of the subtrees unbalanced here and there. Tricky!
Which type of tree is the best? AVL for sure. They are the simplest to code, and have their worst height nearest to log n. For a tree of 1000000 elements, an AVL will be at most of height 29, a RB 40, and a weight balanced 36 or 50 depending on the ratio.
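The AVL and RB figures can be checked against the height bounds quoted earlier in this answer, assuming the logarithms are base 2 and the results are rounded up:

```latex
\begin{align*}
\log_2 10^6 &= 6 \log_2 10 \approx 19.93 \\
h_{\mathrm{AVL}} &\le 1.44 \log_2 10^6 \approx 28.7 \quad\text{(rounded up: 29)} \\
h_{\mathrm{RB}} &\le 2 \log_2 10^6 \approx 39.9 \quad\text{(rounded up: 40)}
\end{align*}
```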
There are a lot of other variables: randomness, ratio of adds, deletes, searches, etc.
It is just the choice of your implementation - they could be implemented as any balanced tree. The various choices are all comparable with minor differences. Therefore any is as good as any.
Update 2017-06-14: webbertiger edited their answer after I commented. I should point out that it is now a lot better in my eyes, but I have kept my answer as additional information...
Because I think the first answer is wrong (correction: not both anymore) and the third makes a wrong assertion, I feel I have to clarify things...
The two most popular trees are AVL and Red-Black (RB). The main difference lies in how they are used:
AVL: Better if the ratio of consultation (reads) to manipulation (modifications) is high. The memory footprint is a little smaller than RB (due to the bit required for coloring).
RB: Better in general cases where there is a balance between consultation (reads) and manipulation (modifications), or more modification than consultation. A slightly bigger memory footprint due to storing the red-black flag.
The main difference comes from the coloring. You have fewer re-balancing actions in an RB tree than in an AVL tree because the coloring enables you to sometimes skip or shorten re-balancing actions, which have a relatively high cost. Because of the coloring, an RB tree can also have more levels, because it can accept red nodes between black ones (allowing up to ~2x more levels), making search (read) a little less efficient... but because it is a constant factor (2x), it stays in O(log n).
If you weigh the performance hit of modifying a tree (significant) against the performance hit of consulting a tree (almost insignificant), it becomes natural to prefer RB over AVL in the general case.

When is a hash table better to use than a search tree?

Depends on what you want to do with the data structure.
Operation        Hash table    Search Tree
Search           O(1)          O(log(N))
Insert           O(1)          O(log(N))
Delete           O(1)          O(log(N))
Traversal        O(N)          O(N)
Min/Max-Key      -hard-        O(log(N))
Find-Next-Key    -hard-        O(1)
Insert and Search on a hash table depend on the load factor of the hash table and its design. Poorly designed hash tables can have O(N) search and insert. The same is true for your search tree.
Deleting in a hash table can be cumbersome depending on your collision resolution strategy.
Traversing the container, finding Min/Max, and Find Next/Prev sorts of operations are better on a search tree because of its ordering.
All estimates of search tree above are for 'balanced' search trees.
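The "-hard-" entries can be illustrated with the standard containers, using std::map as the balanced search tree and std::unordered_map as the hash table. The helper names min_key and next_key are made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Min-Key on a search tree: the leftmost node, available via begin().
int min_key(const std::map<int, std::string>& m) {
    assert(!m.empty());
    return m.begin()->first;
}

// Min-Key on a hash table: no ordering, so every key must be inspected, O(N).
int min_key(const std::unordered_map<int, std::string>& m) {
    assert(!m.empty());
    int best = m.begin()->first;
    for (const auto& kv : m) best = std::min(best, kv.first);
    return best;
}

// Find-Next-Key on a search tree: upper_bound returns the first key > key.
bool next_key(const std::map<int, std::string>& m, int key, int& out) {
    const auto it = m.upper_bound(key);
    if (it == m.end()) return false;
    out = it->first;
    return true;
}

// Find-Next-Key on a hash table: again a full scan, O(N).
bool next_key(const std::unordered_map<int, std::string>& m, int key, int& out) {
    bool found = false;
    for (const auto& kv : m) {
        if (kv.first > key && (!found || kv.first < out)) {
            out = kv.first;
            found = true;
        }
    }
    return found;
}
```

Both containers answer the same questions, but only the tree does so without scanning every element.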
When the average access and insertion time are more important than the worst-case access and insertion time. Practically I think search trees are usually as good a solution as hash tables, because even though in theory big theta of one is better than big theta of log n, log n is very fast, and as you start dealing with large values of n the effect on the practical difference shrinks. Also, big theta of one says nothing about the value of the constant. Granted, this holds for the complexity of trees as well, but the constant factors of trees are much more fixed, usually at a very low number, among implementations than those of hash tables.
Again, I know theorists will disagree with me here, but it's computers we're dealing with here, and for log n to be any significant burden for a computer n must be unrealistically large. If n is a trillion then log of n is 40, and a computer today can perform 40 iterations rather quickly. For log of n to grow to 50 you already have over a quadrillion elements.
Before C++11, the C++ standard didn't provide a hash table among its containers, and I think there's a reason people were fine with that for over a decade.
My take on things:
Operation                   Hash table(1)   SBB Search Tree(2)
.find(obj) -> obj           O(1)            O(1)*
.insert(obj)                O(1)            O(log(N))
.delete(obj)                O(1)            O(log(N))
.traverse / for x in...     O(N)            O(N)
.largerThan(obj) -> {objs}  unsupported     O(log(N))
                                             \
                                              union right O(1) + parent O(1)
.sorted() -> [obj]          unsupported     no need
                                             \
                                              already sorted so no need
                                              to print out, .traverse() is O(N)
.findMin() -> obj           unsupported**   O(log(N)), maybe O(1)
                                             \
                                              descend from root, e.g.:
                                              root.left.left.left...left -> O(log(N))
                                              might be able to cache for O(1)
.findNext(obj) -> obj       unsupported     O(log(N))
                                             \
                                              first perform x=.find(obj) which is O(1)
                                              then descend from that node, e.g.:
                                              x.right.left.left...right -> O(log(N))
(1) http://en.wikipedia.org/wiki/Hash_table
(2) http://en.wikipedia.org/wiki/Self-balancing_binary_search_tree , e.g. http://en.wikipedia.org/wiki/Tango_tree or http://en.wikipedia.org/wiki/Splay_tree
(*) You can use a hash table in conjunction with a search tree to obtain this. There is no asymptotic speed or space penalty. Otherwise, it's O(log(N)).
(**) Unless you never delete, in which case just cache smallest and largest elements and it's O(1).
These costs may be amortized.
Conclusion:
You want to use trees when the ordering matters.
Among many issues, it depends on how expensive the hash function is. In my experience, hashes are generally about twice as fast as balanced trees for a sensible hash function, but it's certainly possible for them to be slower.