Why is std::map implemented as a red-black tree? - c++

Why is std::map implemented as a red-black tree?
There are several balanced binary search trees (BSTs) out there. What were the design trade-offs in choosing a red-black tree?

Probably the two most common self-balancing tree algorithms are Red-Black trees and AVL trees. To balance the tree after an insertion or update, both algorithms use the notion of rotations, where the nodes of the tree are rotated to perform the re-balancing.
While in both algorithms the insert/delete operations are O(log n), in the case of a Red-Black tree the re-balancing rotation is an O(1) operation, while with AVL it is an O(log n) operation. This makes the Red-Black tree more efficient in this aspect of the re-balancing stage, and it is one of the possible reasons it is more commonly used.
Red-Black trees are used in most collection libraries, including the offerings from Java and Microsoft .NET Framework.

It really depends on the usage. An AVL tree usually performs more rebalancing rotations. So if your application doesn't have too many insertion and deletion operations, but is weighted heavily toward searching, then an AVL tree is probably a good choice.
std::map uses a Red-Black tree as it gives a reasonable trade-off between the speed of node insertion/deletion and searching.

The previous answers only address tree alternatives, and red-black trees probably only remain for historical reasons.
Why not a hash table?
A type only requires < operator (comparison) to be used as a key in a tree. However, hash tables require that each key type has a hash function defined. Keeping type requirements to a minimum is very important for generic programming so you can use it with a wide variety of types and algorithms.
Designing a good hash table requires intimate knowledge of the context in which it will be used. Should it use open addressing, or linked chaining? What levels of load should it accept before resizing? Should it use an expensive hash that avoids collisions, or one that is rough and fast?
Since the STL can't anticipate which is the best choice for your application, the default needs to be more flexible. Trees "just work" and scale nicely.
(C++11 did add hash tables with unordered_map. You can see from the documentation it requires setting policies to configure many of these options.)
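To make the requirement concrete, here is a minimal sketch (the Point key type and its hash are my own illustration, not from the STL): the tree-based map works with nothing but operator<, while the hash-based map needs a hash functor and equality as well.

    #include <cstddef>
    #include <functional>
    #include <map>
    #include <string>
    #include <unordered_map>

    struct Point {
        int x, y;
        bool operator<(const Point& o) const {   // all std::map needs
            return x != o.x ? x < o.x : y < o.y;
        }
        bool operator==(const Point& o) const {  // needed by the hash table
            return x == o.x && y == o.y;
        }
    };

    // Works out of the box: the tree only ever calls operator<.
    std::map<Point, std::string> labels;

    // The hash table additionally needs a hash function for the key type.
    struct PointHash {
        std::size_t operator()(const Point& p) const {
            return std::hash<int>()(p.x) * 31u + std::hash<int>()(p.y);
        }
    };
    std::unordered_map<Point, std::string, PointHash> hashedLabels;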
What about other trees?
Red-Black trees offer fast lookup and are self-balancing, unlike plain BSTs. Another user pointed out their advantages over the self-balancing AVL tree.
Alexander Stepanov (The creator of STL) said that he would use a B* Tree instead of a Red-Black tree if he wrote std::map again, because it is more friendly for modern memory caches.
"One of the biggest changes since then has been the growth of caches. Cache misses are very costly, so locality of reference is much more important now. Node-based data structures, which have low locality of reference, make much less sense. If I were designing STL today, I would have a different set of containers. For example, an in-memory B*-tree is a far better choice than a red-black tree for implementing an associative container." - Alexander Stepanov
Should maps always use trees?
Another possible map implementation would be a sorted vector (insertion sort) and binary search. This works well for containers which aren't modified often but are queried frequently.
I often do this in C as qsort and bsearch are built in.
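A minimal sketch of that approach in C++ (my names; std::lower_bound plays the role of bsearch):

    #include <algorithm>
    #include <string>
    #include <utility>
    #include <vector>

    // The "map" is just a vector of (key, value) pairs kept sorted by key.
    std::vector<std::pair<int, std::string>> table;

    static bool keyLess(const std::pair<int, std::string>& entry, int key) {
        return entry.first < key;
    }

    void put(int key, std::string value) {
        auto it = std::lower_bound(table.begin(), table.end(), key, keyLess);
        if (it != table.end() && it->first == key)
            it->second = std::move(value);              // overwrite existing key
        else
            table.insert(it, {key, std::move(value)});  // O(n) shift: fine if rare
    }

    const std::string* get(int key) {
        auto it = std::lower_bound(table.begin(), table.end(), key, keyLess);
        return (it != table.end() && it->first == key) ? &it->second : nullptr;
    }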
Do I even need to use map?
Cache considerations mean it rarely makes sense to use std::list or std::deque over std::vector, even for those situations we were taught about in school (such as removing an element from the middle of the list).
Applying that same reasoning, using a for loop to linear search a list is often more efficient and cleaner than building a map for a few lookups.
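For example (a sketch; the User type is illustrative):

    #include <algorithm>
    #include <string>
    #include <vector>

    struct User { int id; std::string name; };

    // For a handful of lookups, scanning the vector is often faster than
    // building a map first, and there is less code to get wrong.
    const User* findUser(const std::vector<User>& users, int id) {
        auto it = std::find_if(users.begin(), users.end(),
                               [id](const User& u) { return u.id == id; });
        return it != users.end() ? &*it : nullptr;
    }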
Of course choosing a readable container is usually more important than performance.

AVL trees have a maximum height of 1.44 log n, while RB trees have a maximum of 2 log n. Inserting an element into an AVL may imply a rebalance at one point in the tree; the rebalancing finishes the insertion. After insertion of a new leaf, the ancestors of that leaf have to be updated up to the root, or up to a point where the two subtrees are of equal depth. The probability of having to update k nodes is 1/3^k. Rebalancing is O(1). Removing an element may imply more than one rebalancing (up to half the depth of the tree).
RB-trees are B-trees of order 4 represented as binary search trees. A 4-node in the B-tree results in two levels in the equivalent BST. In the worst case, all the nodes of the tree are 2-nodes, with only one chain of 3-nodes down to a leaf. That leaf will be at a distance of 2logn from the root.
Going down from the root to the insertion point, one has to change 4-nodes into 2-nodes, to make sure any insertion will not saturate a leaf. Coming back from the insertion, all these nodes have to be analysed to make sure they correctly represent 4-nodes. This can also be done going down in the tree. The global cost will be the same. There is no free lunch! Removing an element from the tree is of the same order.
All these trees require that nodes carry information on height, weight, color, etc. Only Splay trees are free from such additional info. But most people are afraid of Splay trees, because of the randomness of their structure!
Finally, trees can also carry weight information in the nodes, permitting weight balancing. Various schemes can be applied. One should rebalance when a subtree contains more than 3 times the number of elements of the other subtree. Rebalancing is again done either through a single or a double rotation. This means a worst case of 2.4 log n. One can get away with 2 times instead of 3, a much better ratio, but it may mean leaving a little less than 1% of the subtrees unbalanced here and there. Tricky!
Which type of tree is the best? AVL for sure. They are the simplest to code, and have their worst height nearest to log n. For a tree of 1,000,000 elements, an AVL will be at most of height 29, an RB 40, and a weight-balanced 36 or 50 depending on the ratio.
There are a lot of other variables: randomness, ratio of adds, deletes, searches, etc.

It is just the choice of your implementation - they could be implemented as any balanced tree. The various choices are all comparable, with minor differences. Therefore, any one is as good as any other.

Update 2017-06-14: webbertiger edited their answer after I commented. I should point out that their answer now looks a lot better to my eyes. But I kept my answer just as additional information...
I felt I had to clarify things because I thought the first answer was wrong (correction: not anymore) and the third made a wrong affirmation...
The two most popular trees are AVL and Red-Black (RB). The main difference lies in the usage:
AVL : Better if the ratio of consultation (read) is bigger than manipulation (modification). The memory footprint is a little smaller than RB's (due to the bit required for coloring).
RB : Better in general cases, where there is a balance between consultation (read) and manipulation (modification), or more modification than consultation. It has a slightly bigger memory footprint due to storing the red-black flag.
The main difference comes from the coloring. You have fewer re-balance actions in an RB tree than in an AVL, because the coloring enables you to sometimes skip or shorten re-balance actions, which have a relatively high cost. Because of the coloring, an RB tree also has more levels of nodes, because it can accept red nodes between black ones (giving roughly 2x more levels), making search (read) a little less efficient... but because it is a constant (2x), it stays in O(log n).
If you consider the performance hit for a modification of a tree (significant) vs. the performance hit for consultation of a tree (almost insignificant), it becomes natural to prefer RB over AVL for the general case.

Related

Order-maintenance data structure in C++

I'm looking for a data structure which would efficiently solve the Order-maintenance problem. In other words, I need to efficiently
insert (in the middle),
delete (in the middle),
compare positions of elements in the container.
I found good articles which discuss this problem:
Two Algorithms for Maintaining Order in a List,
Two Simplified Algorithms for Maintaining Order in a List.
The algorithms are quite efficient (some state to be O(1) for all operations), but they do not seem to be trivial, and I'm wondering if there is an open source C++ implementation of these or similar data structures.
I've seen a related topic, where some simpler approaches with time complexity O(log n) for all operations were suggested, but here I'm looking for an existing implementation.
If there was an example in some other popular languages it would be good too, this way I would be able at least to experiment with it before trying to implement it myself.
Details
I'm going to
maintain a list of pointers to objects,
from time to time I will need to change an object's order (delete+insert),
given a subset of objects I need to be able to quickly sort them and process them in correct order.
Note
The standard ordering containers (std::set, std::map) are not what I'm looking for because they will maintain order for me, but I need to order elements myself. Similar to what I would do with std::list, but there, position comparison would be linear, which is not acceptable.
If you are looking for an easy-to-implement and efficient solution at the same time, you could build this structure using a balanced binary search tree (AVL or Red-Black tree). You could implement the operations as follows:
insert(X, Y) (inserts Y immediately after X in the total order) - if X doesn't have a right child, set the right child of X to be Y; else let Z be the leftmost node of the tree rooted at X.right (that is, the lowest Z = X.right.left.left.left... which is not NULL) and set the left child of Z to be Y. Balance if you have to. You can see that the total complexity would be O(log n).
delete(X) - just delete the node X as you normally would from the tree. Complexity O(log n).
compare(X, Y) - find the path from X to the root and the path from Y to the root. From those two paths you can find Z, the lowest common ancestor of X and Y. Now you can compare X and Y depending on whether they are in the left or in the right subtree of Z (they can't be in the same subtree at the same time, since then Z wouldn't be their lowest common ancestor). Complexity O(log n).
So you can see that the advantage of this implementation is that all operations would have complexity O(log n) and it's easy to implement.
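As a sketch of the compare operation (my illustrative code, assuming each node stores a parent pointer; the balancing machinery is omitted):

    #include <vector>

    struct Node {
        Node* parent = nullptr;
        Node* left = nullptr;
        Node* right = nullptr;
    };

    // Path from x up to the root (x first, root last).
    static std::vector<Node*> pathToRoot(Node* x) {
        std::vector<Node*> path;
        for (Node* n = x; n != nullptr; n = n->parent) path.push_back(n);
        return path;
    }

    // True if x precedes y in the in-order traversal; O(log n) if balanced.
    bool precedes(Node* x, Node* y) {
        std::vector<Node*> px = pathToRoot(x);
        std::vector<Node*> py = pathToRoot(y);
        // Walk down from the root while the two paths agree; the last
        // common node is the lowest common ancestor Z.
        auto ix = px.rbegin(), iy = py.rbegin();
        Node* z = nullptr;
        while (ix != px.rend() && iy != py.rend() && *ix == *iy) {
            z = *ix; ++ix; ++iy;
        }
        if (z == x) return iy != py.rend() && *iy == z->right; // y under x's right
        if (z == y) return ix != px.rend() && *ix == y->left;  // x under y's left
        return *ix == z->left;  // x precedes y iff x is in Z's left subtree
    }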
You can use a skip list similar to how you would use std::list.
Skip lists were first described in 1989 by William Pugh.
To quote the author:
Skip lists are a probabilistic data structure that seem likely to supplant balanced trees as the implementation method of choice for many applications. Skip list algorithms have the same asymptotic expected time bounds as balanced trees and are simpler, faster and use less space.
http://drum.lib.umd.edu/handle/1903/542
STL is the solution to your problem.
It provides standard, proven and efficient containers, and the algorithms that support them. Almost all of the containers in the STL support the actions you have mentioned.
It seems like std::deque has the best qualities for the tasks you are referring to:
1) Insertion : both at the back and at the front in O(1) complexity
2) Deletion : std::deque::erase is linear in the number of items deleted plus the smaller of the distances to the ends of the deque; erasing a single item at either end is O(1)
3) Position comparison : std::deque provides random-access iterators, so std::advance and iterator comparison are O(1)
4) Sorting : using std::sort, which typically uses introsort and runs in O(n log n). In MSVC++ at least, the function tries to pick the best sorting strategy for the given iterators.
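A small illustration of those operations (a sketch with arbitrary values):

    #include <algorithm>
    #include <deque>
    #include <iostream>

    int main() {
        std::deque<int> d;
        d.push_back(3);                 // O(1) insertion at the back
        d.push_front(1);                // O(1) insertion at the front
        d.insert(d.begin() + 1, 2);     // middle insertion is linear, though
        d.erase(d.begin());             // erasing at an end is cheap
        std::sort(d.begin(), d.end());  // random-access iterators allow std::sort
        for (int x : d) std::cout << x << ' ';  // prints: 2 3
    }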
Do not try to use an open-source solution or build your own library before you have tried using the STL thoroughly!

augmenting/index priority_queue in STL

I am using the STL priority_queue as a data structure in my graph application. You can safely assume it is like an advanced version of Prim's spanning tree algorithm.
Within the algorithm I want to find a node in the priority queue (not just the minimum node) efficiently. [This is needed because the cost of a node might get changed and needs to be fixed in the priority_queue.]
All I have to do is augment the priority_queue and index it based on my node keys as well. I don't see any way this can be done in the STL. Does anyone have a better idea how to do it in the STL?
The std::priority_queue<T> doesn't support efficient look-up of nodes: it uses a d-ary heap, typically with d == 2. This representation doesn't keep nodes put. If you really want to use a std::priority_queue<T> with Prim's algorithm, the only way is to just add nodes with their current shortest distance and possibly add each node multiple times. This turns the size of the queue into O(E) instead of O(N), though, i.e., for graphs with many edges it will result in a much higher complexity.
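A minimal sketch of that workaround (C++17; the adjacency-list layout and names are mine, not from the question). Stale duplicates are simply skipped when popped:

    #include <climits>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // adj[v] = list of (weight, neighbour) edges of an undirected graph.
    using Graph = std::vector<std::vector<std::pair<int, int>>>;

    int primTotalWeight(const Graph& adj) {
        int n = static_cast<int>(adj.size());
        std::vector<int> best(n, INT_MAX);
        std::vector<bool> done(n, false);
        // Min-heap of (key, vertex); duplicates are allowed.
        std::priority_queue<std::pair<int, int>,
                            std::vector<std::pair<int, int>>,
                            std::greater<>> pq;
        best[0] = 0;
        pq.push({0, 0});
        int total = 0;
        while (!pq.empty()) {
            auto [k, v] = pq.top();
            pq.pop();
            if (done[v] || k > best[v]) continue;  // stale duplicate: skip it
            done[v] = true;
            total += k;
            for (auto [w, u] : adj[v])
                if (!done[u] && w < best[u]) {
                    best[u] = w;
                    pq.push({w, u});  // re-insert instead of decrease-key
                }
        }
        return total;
    }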
You can use something like std::map<...> but that really suffers from pretty much the same problem: you can either locate the next node to extract efficiently or you can locate the nodes to update efficiently.
The "proper" approach is to use a node-based priority queue, e.g., a Fibanocci-heap: Since the nodes stay put, you can get a handle from the heap when inserting a node and efficiently update the distance of a node through the handle. Access to the closest node is efficient using the few top nodes in the heap's set of trees. The overall performance of basic heap operations (push(), top(), and pop()) are slower for Fibonacci heaps than for d-ary heaps but the efficient update of individual nodes makes their use worthwhile. I seem to recall that Prim's algorithm actually required Fibonacci-heaps anyway to achieve the tight complexity bound.
I know that there is an implementation of Fibonacci heaps in Boost. An efficient implementation of Fibonacci heaps isn't entirely trivial, but they are more efficient than just being of theoretical interest.
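For example, with Boost.Heap (a sketch based on its documented interface; the values are arbitrary):

    #include <boost/heap/fibonacci_heap.hpp>
    #include <functional>
    #include <iostream>

    int main() {
        // Max-heap by default; std::greater gives min-heap behaviour.
        boost::heap::fibonacci_heap<
            int, boost::heap::compare<std::greater<int>>> heap;

        auto h = heap.push(10);   // push returns a handle to the stored node
        heap.push(5);
        heap.update(h, 2);        // change a value efficiently via its handle
        std::cout << heap.top();  // prints 2
    }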

Kd tree: data stored only in leaves vs stored in leaves and nodes

I am trying to implement a k-d tree to perform nearest neighbor and approximate nearest neighbor search in C++. So far I have come across two versions of the most basic k-d tree.
The one, where data is stored in nodes and in leaves, such as here
The one, where data is stored only in leaves, such as here
They seem to be fundamentally the same, having the same asymptotic properties.
My question is: are there some reasons why choose one over another?
I figured two reasons so far:
The tree which stores data in nodes too is shallower by one level.
The tree which stores data only in leaves has an easier-to-implement delete function.
Are there some other reasons I should consider before deciding which one to make?
You can just mark nodes as deleted, and postpone any structural changes to the next tree rebuild. k-d-trees degrade over time, so you'll need to do frequent tree rebuilds. k-d-trees are great for low-dimensional data sets that do not change, or where you can easily afford to rebuild an (approximately) optimal tree.
As for implementing the tree, I recommend using a minimalistic structure. I usually do not use nodes. I use an array of data object references. The axis is defined by the current search depth, no need to store it anywhere. Left and right neighbors are given by the binary search tree of the array. (Otherwise, just add an array of byte, half the size of your dataset, for storing the axes you used). Loading the tree is done by a specialized QuickSort. In theory it's O(n^2) worst-case, but with a good heuristic such as median-of-5 you can get O(n log n) quite reliably and with minimal constant overhead.
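A sketch of that construction, using std::nth_element as the median-partitioning step (assuming 2-D points; the names are mine, not the answerer's):

    #include <algorithm>
    #include <vector>

    struct Point { double coord[2]; };

    // Recursively arrange pts[lo, hi) so the median along the current axis
    // sits in the middle slot; the array itself then encodes the k-d tree.
    void build(std::vector<Point*>& pts, int lo, int hi, int depth) {
        if (hi - lo <= 1) return;
        int mid = lo + (hi - lo) / 2;
        int axis = depth % 2;  // the axis is implied by the depth, not stored
        std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                         [axis](const Point* a, const Point* b) {
                             return a->coord[axis] < b->coord[axis];
                         });
        build(pts, lo, mid, depth + 1);      // left half = left subtree
        build(pts, mid + 1, hi, depth + 1);  // right half = right subtree
    }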
While it doesn't hold as much for C/C++, in many other languages you will pay quite a price for managing a lot of objects. A type*[] is the cheapest data structure you'll find, and in particular it does not require a lot of management effort. To mark an element as deleted, you can null it, and search both sides when you encounter a null. For insertions, I'd first collect them in a buffer. And when the modification counter reaches a threshold, rebuild.
And that's the whole point of it: if your tree is really cheap to rebuild (as cheap as resorting an almost pre-sorted array!) then it does not harm to frequently rebuild the tree.
Linear scanning over a short "insertion list" is very CPU cache friendly. Skipping nulls is very cheap, too.
If you want a more dynamic structure, I recommend looking at R*-trees. They are actually designed to balance on inserts and deletions, and organize the data in a disk-oriented block structure. But even for R-trees, there have been reports that keeping an insertion buffer etc. to postpone structural changes improves performance. And bulk loading in many situations helps a lot, too!

What are binary trees and why are they useful? [duplicate]

I am wondering what the particular applications of binary trees are. Could you give some real examples?
To squabble about the performance of binary-trees is meaningless - they are not a data structure, but a family of data structures, all with different performance characteristics. While it is true that unbalanced binary trees perform much worse than self-balancing binary trees for searching, there are many binary trees (such as binary tries) for which "balancing" has no meaning.
Applications of binary trees
Binary Search Tree - Used in many search applications where data is constantly entering/leaving, such as the map and set objects in many languages' libraries.
Binary Space Partition - Used in almost every 3D video game to determine what objects need to be rendered.
Binary Tries - Used in almost every high-bandwidth router for storing router-tables.
Hash Trees - Used in torrents and specialized image-signatures in which a hash needs to be verified, but the whole file is not available. Also used in blockchains, e.g. Bitcoin.
Heaps - Used in implementing efficient priority-queues, which in turn are used for scheduling processes in many operating systems, Quality-of-Service in routers, and A* (path-finding algorithm used in AI applications, including robotics and video games). Also used in heap-sort.
Huffman Coding Tree (Chip Uni) - Used in compression algorithms, such as those used by the .jpeg and .mp3 file-formats.
GGM Trees - Used in cryptographic applications to generate a tree of pseudo-random numbers.
Syntax Tree - Constructed by compilers and (implicitly) calculators to parse expressions.
Treap - Randomized data structure used in wireless networking and memory allocation.
T-tree - Though most databases use some form of B-tree to store data on the drive, databases which keep all (most) their data in memory often use T-trees to do so.
The reason that binary trees are used more often than n-ary trees for searching is that n-ary trees are more complex, but usually provide no real speed advantage.
In a (balanced) binary tree with m nodes, moving from one level to the next requires one comparison, and there are log_2(m) levels, for a total of log_2(m) comparisons.
In contrast, an n-ary tree will require log_2(n) comparisons (using a binary search) to move to the next level. Since there are log_n(m) total levels, the search will require log_2(n)*log_n(m) = log_2(m) comparisons total. So, though n-ary trees are more complex, they provide no advantage in terms of total comparisons necessary.
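The last equality is just the change-of-base rule for logarithms: log_n(m) = log_2(m) / log_2(n), so log_2(n) * log_n(m) = log_2(n) * (log_2(m) / log_2(n)) = log_2(m).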
(However, n-ary trees are still useful in niche-situations. The examples that come immediately to mind are quad-trees and other space-partitioning trees, where divisioning space using only two nodes per level would make the logic unnecessarily complex; and B-trees used in many databases, where the limiting factor is not how many comparisons are done at each level but how many nodes can be loaded from the hard-drive at once)
When most people talk about binary trees, they're more often than not thinking about binary search trees, so I'll cover that first.
A non-balanced binary search tree is actually useful for little more than educating students about data structures. That's because, unless the data is arriving in a relatively random order, the tree can easily degenerate into its worst-case form, which is a linked list, since simple binary trees are not balanced.
A good case in point: I once had to fix some software which loaded its data into a binary tree for manipulation and searching. It wrote the data out in sorted form:
Alice
Bob
Chloe
David
Edwina
Frank
so that, when reading it back in, it ended up with the following tree:
  Alice
 /     \
=       Bob
       /   \
      =     Chloe
           /     \
          =       David
                 /     \
                =       Edwina
                       /      \
                      =        Frank
                              /     \
                             =       =
which is the degenerate form. If you go looking for Frank in that tree, you'll have to search all six nodes before you find him.
Binary trees become truly useful for searching when you balance them. This involves rotating sub-trees through their root node so that the height difference between any two sub-trees is less than or equal to 1. Adding those names above one at a time into a balanced tree would give you the following sequence:
1.  Alice
   /     \
  =       =

2.  Alice
   /     \
  =       Bob
         /   \
        =     =

3.       Bob
       _/   \_
   Alice     Chloe
  /     \   /     \
 =       = =       =

4.       Bob
       _/   \_
   Alice     Chloe
  /     \   /     \
 =       = =       David
                  /     \
                 =       =

5.           Bob
       ____/     \____
   Alice           David
  /     \         /     \
 =       =   Chloe       Edwina
            /     \     /      \
           =       =   =        =

6.         Chloe
        ___/   \___
     Bob           Edwina
    /   \         /      \
 Alice   =     David      Frank
 /   \        /     \    /     \
=     =      =       =  =       =
You can actually see whole sub-trees rotating to the left (in steps 3 and 6) as the entries are added and this gives you a balanced binary tree in which the worst case lookup is O(log N) rather than the O(N) that the degenerate form gives. At no point does the highest NULL (=) differ from the lowest by more than one level. And, in the final tree above, you can find Frank by only looking at three nodes (Chloe, Edwina and, finally, Frank).
Of course, they can become even more useful when you make them balanced multi-way trees rather than binary trees. That means that each node holds more than one item (technically, they hold N items and N+1 pointers, a binary tree being a special case of a 1-way multi-way tree, with 1 item and 2 pointers).
With a three-way tree, you end up with:
  Alice Bob Chloe
 /     |   |     \
=      =   =      David Edwina Frank
                 /     |      |     \
                =      =      =      =
This is typically used in maintaining keys for an index of items. I've written database software optimised for the hardware where a node is exactly the size of a disk block (say, 512 bytes) and you put as many keys as you can into a single node. The pointers in this case were actually record numbers into a fixed-length-record direct-access file separate from the index file (so record number X could be found by just seeking to X * record_length).
For example, if the pointers are 4 bytes and the key size is 10, the number of keys in a 512-byte node is 36. That's 36 keys (360 bytes) and 37 pointers (148 bytes) for a total of 508 bytes with 4 bytes wasted per node.
The use of multi-way keys introduces the complexity of a two-phase search (multi-way search to find the correct node combined with a small sequential (or linear binary) search to find the correct key in the node) but the advantage in doing less disk I/O more than makes up for this.
I see no reason to do this for an in-memory structure; you'd be better off sticking with a balanced binary tree and keeping your code simple.
Also keep in mind that the advantages of O(log N) over O(N) don't really appear when your data sets are small. If you're using a multi-way tree to store the fifteen people in your address book, it's probably overkill. The advantages come when you're storing something like every order from your hundred thousand customers over the last ten years.
The whole point of big-O notation is to indicate what happens as N approaches infinity. Some people may disagree, but it's even okay to use bubble sort if you're sure the data sets will stay below a certain size, as long as nothing else is readily available :-)
As to other uses for binary trees, there are a great many, such as:
Binary heaps, where higher keys are above or equal to lower ones, rather than to the left of them (or below or equal to them and to the right);
Hash trees, similar to hash tables;
Abstract syntax trees for compilation of computer languages;
Huffman trees for compression of data;
Routing trees for network traffic.
Given how much explanation I generated for the search trees, I'm reticent to go into a lot of detail on the others, but that should be enough to research them, should you desire.
The organization of Morse code is a binary tree.
A binary tree is a tree data structure in which each node has at most two child nodes, usually distinguished as "left" and "right". Nodes with children are parent nodes, and child nodes may contain references to their parents. Outside the tree, there is often a reference to the "root" node (the ancestor of all nodes), if it exists. Any node in the data structure can be reached by starting at the root node and repeatedly following references to either the left or right child. In a binary tree, the degree of every node is at most two.
Binary trees are useful because, in a balanced tree, you only have to look at a small number of nodes to find a value - in the example tree used here, a maximum of 6. If you wanted to search for node 24, for example, you would start at the root.
The root has a value of 31, which is greater than 24, so you go to the left node.
The left node has a value of 15, which is less than 24, so you go to the right node.
The right node has a value of 23, which is less than 24, so you go to the right node.
The right node has a value of 27, which is greater than 24, so you go to the left node.
The left node has a value of 25, which is greater than 24, so you go to the left node.
The node has a value of 24, which is the key we are looking for.
You can see that you can exclude half of the nodes of the entire tree on the first pass, and half of the remaining subtree on the second. This makes for very effective searches. If this were done on 4 billion elements, you would only have to search a maximum of 32 times. Therefore, the more elements contained in the tree, the more efficient your search can be.
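That descent is only a few lines of code (a sketch; the node layout is illustrative):

    struct Node {
        int value;
        Node* left;
        Node* right;
    };

    const Node* search(const Node* root, int key) {
        const Node* n = root;
        while (n != nullptr && n->value != key)
            n = (key < n->value) ? n->left : n->right;  // one comparison per level
        return n;  // nullptr if the key is absent
    }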
Deletions can become complex. If the node has 0 or 1 child, then it's simply a matter of moving some pointers to exclude the one to be deleted. However, you can not easily delete a node with 2 children. So we take a short cut. Let's say we wanted to delete node 19.
Since trying to determine where to move the left and right pointers to is not easy, we find one to substitute it with. We go to the left sub-tree and go as far right as we can. This gives us the in-order predecessor of the node we want to delete: the largest value that is still smaller than it.
Now we copy all of 18's contents, except for the left and right pointers, into the node being deleted, and delete the original 18 node.
To create these images, I implemented an AVL tree, a self-balancing tree, so that at any point in time the tree has at most one level of difference between the leaf nodes (nodes with no children). This keeps the tree from becoming skewed and maintains the maximum O(log n) search time, at the cost of a little more time required for insertions and deletions.
In a sorted array, lookups would still take O(log(n)), just like a tree, but random insertion and removal would take O(n) instead of the tree's O(log(n)). Some STL containers use these performance characteristics to their advantage so insertion and removal times take a maximum of O(log n), which is very fast. Some of these containers are map, multimap, set, and multiset.
Example code for an AVL tree can be found at http://ideone.com/MheW8
The main application is binary search trees. These are a data structure in which searching, insertion, and removal are all very fast (about log(n) operations)
One interesting example of a binary tree that hasn't been mentioned is that of a recursively evaluated mathematical expression. It's basically useless from a practical standpoint, but it is an interesting way to think of such expressions.
Basically each node of the tree has a value that is either inherent to itself or is evaluated recursively by operating on the values of its children.
For example, the expression (1+3)*2 can be expressed as:
    *
   / \
  +   2
 / \
1   3
To evaluate the expression, we ask for the value of the parent. This node in turn gets its values from its children, a plus operator and a node that simply contains '2'. The plus operator in turn gets its values from children with values '1' and '3' and adds them, returning 4 to the multiplication node which returns 8.
This use of a binary tree is akin to reverse polish notation in a sense, in that the order in which operations are performed is identical. Also one thing to note is that it doesn't necessarily have to be a binary tree, it's just that most commonly used operators are binary. At its most basic level, the binary tree here is in fact just a very simple purely functional programming language.
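A minimal sketch of that recursive evaluation (the node layout and names are mine, for illustration):

    #include <iostream>

    struct Expr {
        char op;         // '+', '*', or 0 for a leaf holding a plain value
        double value;    // used when op == 0
        Expr* left = nullptr;
        Expr* right = nullptr;
    };

    double eval(const Expr* e) {
        if (e->op == 0) return e->value;               // leaf: inherent value
        double l = eval(e->left), r = eval(e->right);  // ask the children
        return e->op == '+' ? l + r : l * r;
    }

    int main() {
        // (1 + 3) * 2
        Expr one{0, 1}, three{0, 3}, two{0, 2};
        Expr plus{'+', 0, &one, &three};
        Expr times{'*', 0, &plus, &two};
        std::cout << eval(&times) << '\n';  // prints 8
    }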
Applications of Binary tree:
Implementing routing table in router.
Data compression code
Implementation of Expression parsers and expression solvers
To solve database problem such as indexing.
Expression evaluation
I don't think there is any use for "pure" binary trees (except for educational purposes).
Balanced binary trees, such as Red-Black trees or AVL trees, are much more useful, because they guarantee O(log n) operations. Normal binary trees may end up degenerating into a list (or almost a list) and are not really useful in applications that handle much data.
Balanced trees are often used for implementing maps or sets.
They can also be used for sorting in O(n log n), even though there exist better ways to do it.
Also for searching/inserting/deleting Hash tables can be used, which usually have better performance than binary search trees (balanced or not).
An application where (balanced) binary search trees would be useful would be if searching/inserting/deleting and sorting were all needed. The sort could be in-place (almost, ignoring the stack space needed for the recursion), given a ready-built balanced tree. It would still be O(n log n) but with a smaller constant factor and no extra space needed (except for the new array, assuming the data has to be put into an array). Hash tables, on the other hand, can not be sorted (at least not directly).
Maybe they are also useful in some sophisticated algorithms for doing something, but honestly nothing comes to mind. If I find more, I will edit my post.
Other trees, e.g. B+ trees, are widely used in databases.
Binary trees are used in Huffman coding, which are used as a compression code.
Binary trees are used in Binary search trees, which are useful for maintaining records of data without much extra space.
One of the most common application is to efficiently store data in sorted form in order to access and search stored elements quickly. For instance, std::map or std::set in C++ Standard Library.
A binary tree as a data structure is useful for various implementations of expression parsers and expression solvers.
It may also be used to solve some database problems, for example, indexing.
Generally, a binary tree is a general concept of a particular tree-based data structure, and various specific types of binary trees can be constructed with different properties.
In the C++ STL, and in many other standard libraries in other languages, like Java and C#, binary search trees are used to implement set and map.
One of the most important application of binary trees are balanced binary search trees like:
Red-Black trees
AVL trees
Scapegoat trees
These types of trees have the property that the difference in heights of the left subtree and right subtree is kept small by doing operations like rotations each time a node is inserted or deleted.
Due to this, the overall height of the tree remains of the order of log n and the operations such as search, insertion and deletion of the nodes are performed in O(log n) time. The STL of C++ also implements these trees in the form of sets and maps.
They can be used as a quick way to sort data: insert the data into a binary search tree at O(log n) per element, then traverse the tree in order to read it out sorted.
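For instance, std::multiset (typically a balanced BST underneath) gives a quick tree sort - a sketch:

    #include <iostream>
    #include <set>
    #include <vector>

    int main() {
        std::vector<int> data{5, 1, 4, 1, 3};
        std::multiset<int> tree(data.begin(), data.end());  // n inserts, O(log n) each
        for (int x : tree) std::cout << x << ' ';           // in-order: 1 1 3 4 5
    }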
Implementations of java.util.Set
On modern hardware, a binary tree is nearly always suboptimal due to bad cache and space behaviour. This also goes for the (semi)balanced variants. If you find them, it is where performance doesn't count (or is dominated by the compare function), or more likely for historic or ignorance reasons.
Your program's syntax, or for that matter many other things such as natural languages, can be parsed using a binary tree (though not necessarily).
The BST, a kind of binary tree, is used in Unix kernels for managing a set of virtual memory areas (VMAs).
Nearly all database (and database-like) programs use a binary tree to implement their indexing systems.
A compiler that uses a binary tree to represent an AST can use well-known algorithms for traversing the tree, like postorder and inorder. The programmer does not need to come up with their own algorithm.
Because a binary tree for a source file is taller than the equivalent n-ary tree, building it takes more time.
Take this production:
selstmnt := "if" "(" expr ")" stmnt "ELSE" stmnt
In a binary tree it will have 3 levels of nodes, but the n-ary tree will have 1 level (of children).
That's why Unix-based OSes are slow.

Algorithm for removing multiple elements from a Red-Black tree

Is there an algorithm that allows deleting multiple nodes from an RB tree, or is the only way to delete nodes from an RB tree to:
1. delete one, and
2. if necessary, fix the tree
If more than half the nodes are being deleted, you can throw away the existing tree and build a new one in less time, since insertion and deletion have the same cost.
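A sketch of the rebuild step (my illustrative code): collect the surviving keys with an in-order walk that skips the deleted nodes, then build a perfectly balanced tree from the sorted array in O(n). Such a tree can be given valid red-black colours, e.g. all nodes black except a deepest incomplete level coloured red.

    #include <vector>

    struct Node {
        int key;
        Node* left = nullptr;
        Node* right = nullptr;
    };

    // keys must be sorted; builds a perfectly balanced BST from keys[lo, hi).
    Node* buildBalanced(const std::vector<int>& keys, int lo, int hi) {
        if (lo >= hi) return nullptr;
        int mid = lo + (hi - lo) / 2;
        Node* n = new Node{keys[mid]};
        n->left = buildBalanced(keys, lo, mid);
        n->right = buildBalanced(keys, mid + 1, hi);
        return n;
    }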
If there is no constraint that says the tree must remain balanced while you are doing a multiple node deletion, it seems reasonable to me that you could fix the tree after doing multiple deletes.
The purpose of balancing the tree after each deletion is to make sure the delete operation is consistent in its computational cost. If you do not require deletes to be consistent in this fashion, you could write your delete algorithm differently. The fixup operation will be a lengthier computation than after just one delete, though, and likely a more complicated one, too.
You might be interested in a data structure called TeardownTree. It supports a delete_range operation that works in O(k + log n) time, where n is the initial number of items in the tree and k is the number of items deleted (and returned to the caller). Full disclosure: I am the author.
I have to emphasize that the data structure does not support the insert operation, but is optimized for clone and delete_range. I have written up an informal description of the algorithm. With all the optimizations the code is now significantly different from that document, but it should be enough to grasp the idea.
The way I solved this problem was to create a linked list of nodes to be deleted and to use the standard deletion method on them in succession. I would be interested to know if there is a better algorithm for mass deletion.
I would suggest using a Treap instead of a Red-Black tree, since balancing the tree in various scenarios seems easier with a Treap vs. a Red-Black tree. I have the same problem as you, but with Treaps. https://cstheory.stackexchange.com/questions/20495/algorithm-to-bulk-delete-nodes-from-a-treap
I am unsure whether the expected height bounds remain valid after bulk deletion (with the algorithm mentioned in that question).