Implementation of std::set using different data structures - c++

Inspired by this question: Why isn't std::set just called std::binary_tree? I came up with one of my own. Is a red-black tree the only possible data structure fulfilling the requirements of std::set, or are there others? For instance, another self-balancing tree, the AVL tree, seems to be a good alternative with very similar properties. Is it theoretically possible to replace the underlying data structure of std::set, or is there a group of requirements that makes the red-black tree the only viable choice?

AVL trees have worse performance (not to be confused with asymptotic complexity) than RB trees in most real world situations. You can base std::set on AVL trees and be fully standard-compliant, but it will not win you any customers.

Related

Red-black tree in C++ STL

In the current C++ STL, where are red-black trees used? (I assume map and set use them?) Is the red-black tree the 2-3 tree variant (i.e. only the left or right child can be red) or the 2-3-4 tree variant (i.e. both left and right children can be red)? Is there a red-black tree library in the STL?
std::map, std::multimap, std::set and std::multiset are often implemented in terms of red-black trees, but doing so is not mandated by the standard. Since using a red-black tree is not required, there is also no requirement for any particular flavor of RB tree.
I believe (though am not certain) that SGI's STL (upon which much of the original standard library is based) actually does have a red-black tree available. If it helps, I know boost::intrusive does have a reusable red-black tree implementation.

Missing merge and split for boost AVL trees?

Boost provides boost::container::set/map/multiset/multimap where the underlying binary-search-tree (BST) can be configured, and it can be chosen to be an AVL tree.
One reason (maybe the most crucial one) to prefer AVL trees over Red-Black trees is that the merge and split operations have O(log N) complexity. However, to my surprise, boost::container doesn't seem to provide these operations. The documentation describes merge as an element-wise operation of O(N log N) complexity (regardless of the underlying BST implementation!?), and it doesn't mention split at all!
I can't speak to merge, but as for split, I assume its absence might be justified by the constant-time size issue: an O(log N) split cannot know the sizes of the two resulting parts. But this could be fixed by using an intrusive container and storing the subtree node count in each node.
There is also boost::intrusive::avl_set, but I couldn't find the AVL merge and split algorithms in the documentation.
So the questions are:
Is there a fully functional, ready-to-use AVL-based implementation of set/map/multiset/multimap that provides merge and split operations with O(log N) complexity?
If not, how can I build one using boost::intrusive::avl_set?

What is the downside to using an AVL tree? [duplicate]

It seems to me like an AVL tree is always more efficient than a BST. So why do people still use BSTs? Is there a cost incurred in an AVL implementation?
AVL Trees have their advantages and disadvantages over other trees like Red-Black trees or 2-3 Trees or just plain BST.
AVL Trees:
Advantages:
Simple. Fairly easy to code and understand.
Extra storage: fairly minimal, 2 bits per node (to store +1, 0, -1). There is also a trick where you can use 1 bit per node by using your children's single bit.
The constant for lookup (look in your favorite analysis book: Knuth, etc.) is about 1.44, often rounded to 1.5 (so roughly 1.44*log(n)). Red-Black trees have a constant of 2 (so 2*log(n) for a lookup).
Disadvantages:
Deletions are expensive-ish. Deleting a node is still logarithmic, but you may have to "rotate" all the way up to the root of the tree. In other words, a bigger constant. Red-Black trees need only a constant number of rotations per deletion (at most 3).
Not simple to code. They are probably the "simplest" of the balanced tree family, but there are still a lot of corner cases.
If you expect your data to arrive sorted, a plain BST degenerates into a linked list. BUT if you expect your data to be fairly random, then "on average" all of your BST operations (lookup, deletion, insertion) will be about logarithmic. It's VERY easy to code up BSTs. AVL trees, although fairly straightforward to code up, have a lot of corner cases, and testing can be tricky.
In summary, plain Binary Search Trees are easy to code and get right, and if your data is fairly random, should perform very well (on average, all operations would be logarithmic). AVL Trees are harder to code, but guarantee logarithmic performance, at the price of some extra space and more complex code.

What kind of sorting do we have in std::map or std::set?

If we insert random integers into a std::set and then read the set, we get an ordered sequence. Basically, we have implicit sorting. However, what kind of sorting algorithm do we have here? Is it heapsort?
At least normally, it's a tree sort. That is, the items are inserted into a balanced binary search tree (usually a red-black tree), and that tree is traversed in order.
std::set and std::map are usually implemented using self-balancing binary search trees, usually red-black trees because they tend to be the fastest in practice. For detailed information about these data structures, you might want to consult a textbook such as Introduction to Algorithms by Cormen et al. or Algorithms by Sedgewick.
The C++ standard doesn't enforce any kind of sorting algorithm for std::set or std::map. So their implementations might differ among different platforms.
With that said, they are commonly implemented as a red-black tree, which is a self-balancing binary search tree. They don't sort their contents; they maintain the order of their contents as new items are inserted. Inserting a single item takes O(log n) time.

Does any std::set implementation not use a red-black tree?

Has anyone seen an implementation of the STL where std::set is not implemented as a red-black tree?
The reason I ask is that, in my experiments, B-trees outperform std::set (and other red-black tree implementations) by a factor of 2 to 4 depending on the value of B. I'm curious if there is a compelling reason to use red-black trees when there appear to be faster data structures available.
Some folks over at Google actually built a B-tree based implementation of the C++ standard library containers. They seem to have much better performance than standard binary tree implementations.
There is a catch, though. The C++ standard guarantees that deleting an element from a map or set only invalidates other iterators pointing to the same element in the map or set. With the B-tree based implementation, due to node splits and consolidations, the erase member functions on these new structures may invalidate iterators to other elements in the tree. As a result, these implementations aren't perfect replacements for the standard implementations and couldn't be used in a conformant implementation.
Hope this helps!
There is at least one implementation based on AVL trees instead of red-black trees.
I haven't tried to verify conformance of this implementation, but (unlike a B-tree based implementation) it at least could be written to conform perfectly to the requirements of the standard.