It seems to me like an AVL tree is always more efficient than a BST. So why do people still use BSTs? Are there costs incurred in an AVL implementation?
AVL Trees have their advantages and disadvantages over other trees like Red-Black trees or 2-3 Trees or just plain BST.
AVL Trees:
Advantages:
Simple: fairly easy to code and understand.
Extra storage is fairly minimal: 2 bits per node (to store a balance factor of +1, 0, or -1). There is also a trick where you can get by with 1 bit per node by borrowing a single bit from each child. (A node sketch follows this list of pros and cons.)
The constant for lookup (look in your favorite analysis book: Knuth, etc.) is 1.5 (so roughly 1.5*log(n)). Red-Black trees have a constant of 2 (so roughly 2*log(n) for a lookup).
Disadvantages:
Deletions are expensive-ish. Deleting a node is still logarithmic, but you may have to "rotate" all the way up to the root of the tree; in other words, a bigger constant. Red-Black trees only need a constant number of rotations (at most three) per deletion.
Not simple to code. They are probably the "simplest" of the balanced-tree family, but there are still a lot of corner cases.
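To make the bookkeeping above concrete, here is a rough sketch of what an AVL node can look like; the names and layout are illustrative only, and a real implementation may pack the balance information more tightly (the 1- or 2-bit tricks mentioned above).

```cpp
#include <cstdint>

// Illustrative AVL node: the only extra state compared with a plain BST node
// is the balance factor, which is always -1, 0, or +1. Keeping it in an
// int8_t is the simple approach; the bit-packing tricks squeeze it into
// spare bits elsewhere in the node instead.
struct AvlNode {
    int         key;
    std::int8_t balance = 0;   // height(right subtree) - height(left subtree)
    AvlNode*    left    = nullptr;
    AvlNode*    right   = nullptr;
};
```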
If you expect your data to be sorted, a BST devolves into a linked list. BUT if you expect your data to be fairly random, then "on average" all of your operations on a BST (lookup, deletion, insertion) will be about logarithmic. It's VERY easy to code up BSTs; AVL trees, although fairly straightforward to code up, have a lot of corner cases and testing can be tricky.
In summary, plain Binary Search Trees are easy to code and get right, and if your data is fairly random they should perform very well (on average, all operations would be logarithmic). AVL trees are harder to code, but guarantee logarithmic performance, at the price of some extra space and more complex code.
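As a rough illustration of how little code a plain (unbalanced) BST needs, here is a minimal sketch with insert and lookup only; all names are made up and no balancing of any kind is performed.

```cpp
#include <memory>

// Bare-bones unbalanced BST node.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Insert a key, ignoring duplicates (set semantics).
void insert(std::unique_ptr<Node>& root, int key) {
    if (!root)                root = std::make_unique<Node>(key);
    else if (key < root->key) insert(root->left, key);
    else if (key > root->key) insert(root->right, key);
}

// Iterative lookup: walk left or right until the key is found or we fall off.
bool contains(const Node* root, int key) {
    while (root) {
        if (key == root->key) return true;
        root = key < root->key ? root->left.get() : root->right.get();
    }
    return false;
}
```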
Boost provides boost::container::set/map/multiset/multimap where the underlying binary-search-tree (BST) can be configured, and it can be chosen to be an AVL tree.
One reason (maybe the most crucial one) why one would prefer AVL trees over Red-Black trees is the merge and split operations of complexity O(logN). However, surprisingly to me, it seems boost::container doesn't provide these operations. The documentation describes merge as an element-wise operation of O(NlogN) complexity (and this regardless of the underlying BST implementation!?), and the documentation doesn't even mention split!
I can't say about merge, but as for split, I can assume that the lack of it might be justified by the constant-time size() issue: a split of complexity O(logN) cannot know the sizes of the two resulting parts. But this could be fixed by using an intrusive container and storing the subtree node count in each node.
There is also boost::intrusive::avl_set, but I couldn't find the AVL merge and split algorithms in the documentation.
So the questions are:
Is there a fully functional, ready-to-go AVL-based implementation of set/map/multiset/multimap that provides merge and split operations with a complexity of O(logN)?
If not, how can I build one using boost::intrusive::avl_set?
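For reference, and as a possible starting point for the second question, a minimal boost::intrusive::avl_set declaration might look like the sketch below. It follows the documented base-hook pattern; Node and its members are made up, and no merge/split is shown since, as noted above, the documentation doesn't describe them.

```cpp
#include <boost/intrusive/avl_set.hpp>
#include <iostream>

namespace bi = boost::intrusive;

// User-owned node type: the AVL hook lives inside the object itself.
struct Node : bi::avl_set_base_hook<> {
    int key;
    explicit Node(int k) : key(k) {}
    friend bool operator<(const Node& a, const Node& b) { return a.key < b.key; }
};

int main() {
    Node a(3), b(1), c(2);       // the caller owns the nodes, not the container
    bi::avl_set<Node> s;         // AVL-tree-based intrusive set
    s.insert(a);
    s.insert(b);
    s.insert(c);
    for (const Node& n : s) std::cout << n.key << ' ';   // prints: 1 2 3
    s.clear();                   // unlink the hooks before the nodes go away
}
```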
Inspired by this question: Why isn't std::set just called std::binary_tree? I came up with one of my own. Is a red-black tree the only possible data structure fulfilling the requirements of std::set, or are there others? For instance, another self-balancing tree, the AVL tree, seems to be a good alternative with very similar properties. Is it theoretically possible to replace the underlying data structure of std::set, or is there a group of requirements that makes the red-black tree the only viable choice?
AVL trees have worse performance (not to be confused with asymptotic complexity) than RB trees in most real world situations. You can base std::set on AVL trees and be fully standard-compliant, but it will not win you any customers.
I would like to know which balanced BST would be easy to code in C++ and still have a complexity of roughly O(log n) per operation.
I've already tried Red Black trees, but would like an alternative that is less complex to code. I have worked with Treaps in the past, but am interested in exploring options that either perform better or are easier to implement.
What are your suggestions?
AVL trees generally perform better than Treaps in my experience, and they're not any harder to implement.
They work by rotating branches of the tree that become unbalanced after an insertion or deletion. This keeps the tree height-balanced (the heights of any node's two subtrees differ by at most one), so it can't be "tricked" by adversarial data.
Treaps, on the other hand, rely on random priorities, which keeps large data sets close to balanced in expectation, but you don't get a guaranteed worst-case O(logn): with an unlucky set of priorities the tree can end up quite unbalanced and your access time can get close to O(n).
Check out wikipedia's page for more info: en.wikipedia.org/wiki/Avl_tree
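To make the rotation step mentioned above concrete, here is a minimal sketch of a single right rotation. This is illustrative only; a full AVL implementation also needs the symmetric left rotation, balance-factor updates, and the double-rotation cases.

```cpp
// Minimal node for the purpose of the sketch.
struct Node {
    int   key;
    Node* left  = nullptr;
    Node* right = nullptr;
};

// Right rotation: lift the left child `x` above its parent `y`.
// Returns the new root of the rotated subtree.
Node* rotate_right(Node* y) {
    Node* x = y->left;    // x becomes the new subtree root
    y->left  = x->right;  // x's right subtree is re-attached under y
    x->right = y;
    return x;
}
```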
I'm relatively new to C++ programming and was wondering if someone could help clarify a few questions for me.
http://www.cplusplus.com/reference/set/set/
http://www.cplusplus.com/reference/map/map/
I've been reading about how to implement binary search trees with the STL, and I keep noticing that std::set and std::map are constantly mentioned as the containers for accomplishing such a task. What exactly is the difference between the two, however? To me both seem almost identical, and I'm not sure if there's something I'm not noticing that makes one better than the other for specific tasks. Is there any advantage to using std::set over std::map for implementing an STL binary search tree that takes values from an array or vector (such as speed, for example)?
If someone could help me understand this concept I'd greatly appreciate it!
Both std::set and std::map are associative containers. The difference is that a std::set contains only the key,
while a std::map stores an associated value with each key; that is, if A -> B, then map[A] = B. It is used much like a hash table, but lookups are O(log N) rather than O(1).
You can also look at std::unordered_map, which provides these operations in average O(1) time.
Both std::set and std::map keep their data in sorted order.
Both are typically implemented with balanced trees (like AVL or Red-Black trees), giving O(log N) time complexity.
An important point to note is that both store only unique keys. To get around that, also see std::multimap and std::multiset.
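To make the difference concrete, here is a small standard-library-only example (the keys and values are made up):

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>

int main() {
    std::set<std::string> names;          // stores only keys, kept sorted
    names.insert("alice");
    names.insert("bob");

    std::map<std::string, int> ages;      // stores key -> value pairs, sorted by key
    ages["alice"] = 30;                   // insert/lookup in O(log N)
    ages["bob"]   = 25;

    std::cout << names.count("alice") << '\n'   // 1: the key is present in the set
              << ages["alice"] << '\n';         // 30: the value associated with the key
}
```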
Hope this helps!
update: In the case of a Red-Black tree, re-balancing after an update needs only O(1) rotations, while with an AVL tree it can take O(log n) rotations, making the Red-Black tree more efficient in this aspect of the re-balancing stage and one of the possible reasons it is more commonly used.
What are some general tips/pointers on vectorizing tree operations? Memory layout wise, algorithm wise, etc.
Some domain specific stuff:
Each parent node will have quite a few (20 - 200) child nodes.
Each node has a low probability of having child nodes.
Operations on the tree are mostly conditional walks.
The performance of walking over the tree is more important than insertion/deletion/search speeds.
Beware, this is very hard to implement. Last year a team from Intel, Oracle and UCSC presented an amazing solution, "FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs". It won the Best Paper Award at ACM SIGMOD 2010.
Because of the random nature of trees it's not immediately obvious how vectorizing walks would be a big plus to you.
I would lay the tree out as a flat array of (parent id, node data) "node" items, sorted by parent id, so you can at least visit all the children of a node together. Of course this doesn't give you much if your tree isn't "fat" (i.e. has a low average number of children per node).
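As a rough sketch of that layout (the names and types here are made up), the children of any node form one contiguous run, so a conditional walk over them becomes a tight loop over adjacent memory that the compiler has a decent chance of auto-vectorizing:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical flat layout: all nodes in one array, sorted by parent index,
// so the children of any node occupy a contiguous range.
struct FlatNode {
    std::int32_t parent;  // index of the parent node, -1 for the root
    float        value;   // per-node payload used by the walk
};

// Sum the values of all children of `parent` that pass a simple test.
// The first child is found by binary search (the array is sorted by parent);
// the scan itself is a branch-light loop over contiguous data.
float sum_matching_children(const std::vector<FlatNode>& nodes,
                            std::int32_t parent, float threshold)
{
    auto by_parent = [](const FlatNode& n, std::int32_t p) { return n.parent < p; };
    auto it = std::lower_bound(nodes.begin(), nodes.end(), parent, by_parent);

    float sum = 0.0f;
    for (; it != nodes.end() && it->parent == parent; ++it)
        if (it->value > threshold)
            sum += it->value;
    return sum;
}
```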
Your best bet, though, is really just to lean on the brute force of SIMD, because you really can't do fancy random jumps through your list with this API.
Edit: I wouldn't throw out the normal tree class you most likely have, though; implement the SIMD way and see if you really gain anything. I'm not convinced you will...
What about using spectral graph theory algorithms? They should be much easier to vectorize, as they deal with matrices.