Do not understand how C++ std::set works - c++

I am using the std::set class for a LeetCode question. From Googling I learned that std::set keeps its elements in an ordered manner, and I heard that set.begin() returns the smallest element. But I also heard that set uses red-black trees and has O(log n) time complexity. I don't understand how these two can go together: how does set.begin() return the smallest element when a red-black tree doesn't guarantee that the smallest element will be at the root?
Also, the set.begin() function makes it seem like this container uses an array instead of a linked structure to build the red-black tree, which again I don't understand. How can an array be used instead of a tree?

In the underlying tree, the leftmost node is the smallest, and begin() is the leftmost node.
Iterating traverses the tree's nodes in the appropriate order.
For instance, if the tree is (this is a simpler "regular" binary search tree, but the principle is the same with red-black trees)
    4
   / \
  2   6
 / \ /
1  3 5
then iterating will start at 1, then move up to 2, down again to 3, up two steps to 4, down two steps to 5, and finally up to 6.
(This means that the "steps" when iterating over a tree are not constant-time operations.)

The standard does not impose a particular implementation for any of the containers; a red-black tree is one possible implementation of set. The implementation is likewise free to choose how it makes begin() constant complexity.
Assuming the implementation chooses a tree, the most obvious way is by keeping a pointer to the least element, as well as to the root of the tree.
You can implement an ordered set with an array, e.g. boost::flat_set, though that doesn't meet all the same complexity requirements as std::set. It works by inserting each element at its sorted position, and only if there isn't an equivalent element already present.

Related

how is std::set (red/black tree) forward iteration implemented?

If I did an in-order traversal of a balanced BST from smallest to largest value, I'd use a DFS which maintains a stack of size lg(n). But if I needed to find the in-order successor of an arbitrary node, that's a worst-case lg(n) operation, and if I wanted to iterate in order, I'd need to find the in-order successor repeatedly for each node, yielding O(n*lg(n)). Does std::set use some trick for in-order iteration, does it really cost O(n*lg(n)), or is the time cost amortized somehow?
There is no trick in the in-order iteration; all you need is an O(1) mechanism for finding the parent of the current node.
The in-order scan traverses each parent-child edge exactly twice: once from parent-to-child and once from child-to-parent. Since there are the same number of edges as non-root nodes in the tree, it follows that the complete iteration performs Θ(n) transitions to iterate over n nodes, which is amortised constant time per node.
The usual way to find the parent is to store a parent link in the node. The extra link certainly increases the size of a node (four pointers instead of three) but the cost is within reason.
If it were not for the C++ iterator invalidation rules, which require that iterators to elements of ordered associative containers must not be invalidated by insertion or deletion of other elements, it would be possible to maintain a stack of size O(log n) in the iterator. Such an iterator would be bulky but not unmanageably so (since log n is limited in practice to a smallish integer), but it would not be usable after any tree modification.

C++ : Running time of next() and prev() in a multiset iterator?

What is the time complexity of applying the next() and prev() functions to a multiset<int>::iterator, where the corresponding multiset contains N elements?
I understand that in the STL a multiset is implemented as a balanced binary search tree, and hence I expect the time complexity to be O(log N) per operation (in the worst case) if we just traverse the tree until we find the appropriate value, but I have a hunch that this should be O(1) on average.
But what if the tree is implemented as follows: when inserting element x into the balanced binary search tree, we can also retrieve the largest number in the tree smaller than x and the smallest number in the tree larger than x in O(log N). Thus, in theory, we can have each node in the tree maintain pointers to its next and prev elements so that next() and prev() run in constant time per query.
Can anybody shed some light on what's going on?
The standard mandates that all operations on iterators run in amortized constant time: http://www.eel.is/c++draft/iterator.requirements#general-10. The basic idea is that each iterator category only defines operations which can be implemented in amortized constant time.
Iteration is a common thing to do, and if operator++ on an iterator (I guess that's what you mean by next()?) were O(log N), then traversing a container in a loop would be O(N log N). The standard makes this impossible; since operator++ is amortized constant, iterating over any data structure in the standard is always O(N).
However, I dug into the implementation of multiset in gcc 5.4 to at least have one example. Both set and multiset are implemented in terms of the same underlying structure, _Rb_tree. Delving into that structure a bit, its nodes have not only left and right child pointers but also a parent pointer, and an iterator is just a pointer to a node.
Given a node in a binary search tree that includes a pointer to its parent, it's easy to figure out what the next node in the tree is:
1. If it has a right child, descend to the right child, then keep descending to left children as far as you can; that is the next node.
2. If it does not have a right child, ascend to the parent and determine whether the original node was the left or the right child. If it was the left child, the parent is the next node. If it was the right child, the parent has already been visited, so apply the same logic recursively to the parent and its grandparent.
This SO question shows the source code with the core logic: What is the definition of _Rb_tree_increment in bits/stl_tree.h? (it's surprisingly hard to find for some reason).
This is not constant time; in particular, both 1. and 2. contain loops that descend or ascend and could take up to O(log N) time. However, you can easily convince yourself that the amortized time is constant, because as you traverse the tree with this algorithm, each node is touched at most 4 times:
Once on the way down to its left child.
Once when it comes back up from the left child and needs to consider itself.
Once when it descends to its right child.
Once when ascending from the right child.
In retrospect I would say this is the fairly-obvious choice. Iteration over the whole data structure is a common operation, so the performance is very important. Adding a third pointer to the node is not a trivial amount of space, but it's not the end of the world either; at most it will bloat the data structure from 3 to 4 words (2 pointers + data, which alignment makes 3 at the minimum, vs 3 pointers + data). If you work with ranges, as opposed to two iterators, an alternative would be to maintain a stack and then you don't need the parent pointer, but this only works if you iterate from the very beginning to the end; it wouldn't allow iteration from an iterator in the middle to the end (which is also an important operation for BST's).
I think next() and prev() take anywhere between 1 and h steps, where h is the height of the tree, which is approximately O(log N). If you use next() to walk from beginning to end over N nodes, the iterator visits the entire tree, which is about 2N steps (2 because the iterator traverses each link between nodes once downwards and once upwards). The total traversal is therefore not O(N * log N), as some steps are cheaper than others. At the very worst, a single next() might climb from a leaf node to the root, which is about h ≈ O(log N) steps, but that only happens a couple of times during the whole walk (for example, stepping from the rightmost node of the left subtree up to the root). So on average next() and prev() cost about 2 steps each, which is O(1).

Time complexity in a singly linked list

I am studying data structures: the singly linked list.
The website says a singly linked list has an insertion and deletion time complexity of O(1). Am I missing something?
website link
I do this in C++, and I only have a root pointer. If I want to insert at the end, then I have to travel all the way to the back, which means O(n).
The explanation for this is that the big-O notation in the linked table refers to the function implementation itself, not including the list traversal to find the previous reference node in the list.
If you follow the link to the Wikipedia article on the singly linked list implementation, it becomes clearer:
function insertAfter(Node node, Node newNode)
function removeAfter(Node node)
The above function signatures already take the predecessor node as argument (same for the other variants implicitly).
Finding the predecessor is a different operation and may be O(n) or other time complexity.
You missed the interface in two places:
std::list::insert()/std::list::erase() take an iterator to the element where to insert or erase. This means there is no search, only altering two pointers in elements of the list, which is constant complexity.
Inserting at the end of a list can be done via push_back. The standard requires this to also be O(1), which means a std::list stores pointers to both its first and last elements.
EDIT: Sorry, you meant std::forward_list. Point 1 also holds for it, even if the names are insert_after and erase_after. Point 2 does not: you have to iterate to the end of the list.
I do this in C++, and I only have a root pointer. If I want to insert at the end, then I have to travel all the way to the back, which means O(n).
That's two operations: you first search the list for the given position, which is O(n), then insert the element into the list, which is O(1).
In a single linked list, the operation of insertion consists of:
altering the pointer of the previous element
wrapping the object into the node structure and setting its pointer to the next element
Both are invariant to list size.
On the other hand, take for example a heap structure. Insertion of each element requires O(log(n)) operations for it to retain its structure. Tree structures have similar mechanisms that will be run upon insertion and depend on current tree size.
Here it is assumed that you already have the node after which you need to add the new element.
In that case, the insertion time complexity for a singly linked list is O(1).
The fact is that, unlike an array, we don’t need to shift the elements of a singly-linked list while doing an insertion. Therefore, the insertion time complexity of a singly-linked list is O(1).
Imagine that you have a Python list filled with integer numbers...
my_list = [9, 8, 4, 5, 6]
... and you want to insert the number 3 right after the element 8.
my_list.insert(2, 3)
The resulting list will be:
[9, 8, 3, 4, 5, 6]
When you do an insertion to my_list, the other elements after the element 3 are all shifted towards the right, so their indexes are changed. As a result, the time complexity to insert an element at a given index is O(n).
However, in singly-linked lists, there are no array elements, but chained nodes and node values.
[Diagram of inserting a node into a singly linked list; image source: LeetCode]
As the diagram shows, the prev node holds a reference to the next node. As #πάντα ῥεῖ stated, "function signatures already take the predecessor node as argument". Finding the previous node takes O(n) time, but to insert the new node you only need to change the pointers of the adjacent nodes, which is O(1) time complexity.

How is the ordering of std::map achieved?

We can see from several sources that std::map is implemented using a red-black tree. It is my understanding that these types of data structures do not hold their elements in any particular order and just maintain the BST property and the height balancing requirements.
So, how is it that map::begin is constant time, and we are able to iterate over an ordered sequence?
Let's start from the premise that std::map maintains a BST internally (which is not strictly required by the standard, but most libraries probably do that, e.g. with a red-black tree).
In a BST, to find the smallest element, you would just follow the left branches until you reach a leaf, which is O(log(N)). However, if you want to deliver the "begin()" iterator in constant time, it is quite simple to just keep track, internally, of the smallest element. Every time an insertion causes the smallest element to change, you update it, that's all. It's memory overhead, of course, but that's a trade-off.
There are possibly other ways to single out the smallest element (like keeping the root node unbalanced on purpose). Either way, it's not hard to do.
To iterate through the "ordered" sequence, you simply do an in-order traversal of the tree: starting from the leftmost node, you go (up), (right), (up, up), (right), ... and so on. It's a simple set of rules and easy to implement; see a quick implementation of a simple BST in-order iterator that I wrote a while back. As you do the in-order traversal, you visit every node from the smallest to the biggest, in the correct order. In other words, it merely gives you the illusion of a sorted array; in reality, it's the traversal that makes the sequence come out sorted.
The balancing properties of a red-black tree allow you to insert a node, anywhere in the tree, at O(log N) cost. For typical std::map implementations, the container will keep the tree sorted, and whenever you insert a new node, insert it into the correct location to keep the tree sorted, and then rebalance the tree to maintain the red-black property.
So no, red-black trees are not inherently sorted.
RB trees are binary search trees. Binary search trees don't necessarily store their elements in any particular physical order, but you can always get an in-order traversal. I'm not sure how map::begin guarantees constant time; I'd assume it involves always remembering the path to the smallest element (normally finding it would be O(log(n))).

How a multiset works and how one can find the minimum element in multiset

When we insert elements into a multiset, are they stored in sorted order?
How can I find the smallest element of a multiset?
And how can I access the i-th element of a multiset?
Can someone please explain how a multiset works and how it stores its elements?
Thanks in advance.
Here is one solution that always works (regardless of the ordering scheme):
std::multiset<int> m;
//do something with m
std::cout << *std::min_element(m.begin(), m.end()) << std::endl;
That is O(n), so it takes no advantage of the already-sorted nature of a multiset's storage scheme.
Access "i-th" element:
std::cout << *std::next(m.begin(), i - 1) << std::endl;
But again, what is meant by "i-th element" is determined by your ordering scheme.
OK, and when your ordering scheme is given by std::less (the standard case), then indeed
m.begin();
gives you the minimal element. You can read up on it here.
A multiset works by maintaining a red-black balanced binary tree.
Generally, a balanced tree (and a red-black tree specifically) means you can add, search, delete, and get the min or max (and more) in O(log k) operations, where k is the size of the tree (that is, the number of elements in the multiset). Specifically in C++'s multiset the complexity may vary a bit, depending on the operation.
If your set is s then:
You can get the min element in the set with s.begin();
You can get the i-th element in the set with *next(s.begin(), i-1) (as next(it, d) gives you an iterator to the element at position it + d). The complexity of this is linear, as stated here.