How a multiset works and how one can find the minimum element in a multiset - C++

When we insert elements into a multiset, are they stored in sorted order?
How can I find the smallest element of a multiset?
And how can I access the i-th element of a multiset?
Can someone please explain how a multiset works and how it stores its elements?
Thanks in advance.

Here is one solution that always works (regardless of the ordering scheme):
#include <algorithm>
#include <iostream>
#include <set>
std::multiset<int> m;
// do something with m
std::cout << *std::min_element(m.begin(), m.end()) << std::endl;
That is O(n), so it takes no advantage of the already-sorted storage scheme of a multiset.
Access "i-th" element:
std::cout << *std::next(m.begin(), i - 1) << std::endl;
But again, what is meant by "i-th element" is determined by your ordering scheme.
OK, and when your ordering scheme is given by std::less (the standard case), then indeed
m.begin()
points to the minimal element (dereference it to get the value). You can read up on it here.

A multiset works by maintaining a balanced binary search tree, specifically a red-black tree.
Generally, the point of a balanced tree (and a red-black tree specifically) is that you can add/search/delete/get the min/get the max (and more) in O(log k) operations, where k is the size of the tree (that is, the number of elements in the multiset). In C++'s multiset specifically, the complexity may vary a bit depending on the operation.
If your set is s then:
You can get the minimum element in the set with *s.begin().
You can get the i-th element with *std::next(s.begin(), i - 1) (std::next(it, d) gives you an iterator to the element d positions past it). The complexity of this is linear, as stated here.
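A minimal sketch putting both accesses together (the container contents here are just for illustration):
#include <iostream>
#include <iterator>
#include <set>

int main() {
    std::multiset<int> s{5, 1, 3, 1, 9};   // stored in sorted order: 1 1 3 5 9

    // Smallest element: the first element of the sorted sequence, O(1).
    std::cout << "min: " << *s.begin() << '\n';

    // i-th element (here i = 3, 1-based): advance the iterator, O(i),
    // because multiset iterators are bidirectional, not random access.
    std::cout << "3rd: " << *std::next(s.begin(), 3 - 1) << '\n';
}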

Related

How to efficiently insert a range of consecutive integers into a std::set?

In C++, I have a std::set into which I would like to insert a range of consecutive integers. How can I do this efficiently, hopefully in O(n) time, where n is the length of the range?
I'm thinking I'd use the input-iterator version of std::set::insert, but am unclear on how to build the input iterator.
std::set<int> mySet;
// Insert [34 - 75):
mySet.insert(inputIteratorTo34, inputIteratorTo75);
How can I create the input iterator and will this be O(n) on the range size?
The efficient way of inserting already ordered elements into a set is to hint the library as to where the next element will be. For that you want to use the version of insert that takes an iterator:
std::set<int>::iterator it = mySet.end();
for (int x : input) {
    it = mySet.insert(it, x);  // hinted insert; returns an iterator to the inserted element
}
On the other hand, you might want to consider other containers. Whenever possible, use std::vector. If the number of insertions is small compared to the number of lookups, or if all inserts happen up front, you can build a vector, sort it, and use lower_bound for lookups. In this case, since the input is already sorted, you can even skip the sorting.
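For instance, a minimal sketch of that sorted-vector approach, assuming all inserts happen up front (the lookup value is arbitrary):
#include <algorithm>
#include <vector>

int main() {
    // The range [34, 75) is already sorted, so just append it; no sort needed.
    std::vector<int> v;
    for (int x = 34; x < 75; ++x)
        v.push_back(x);

    // Lookups via binary search, O(log n) each.
    bool found = std::binary_search(v.begin(), v.end(), 40);
    auto it    = std::lower_bound(v.begin(), v.end(), 40);  // iterator to 40
    (void)found;
    (void)it;
}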
If insertions (or removals) happen all over the place, you might want to consider using std::unordered_set<int> which has an average O(1) insertion (per element) and lookup cost.
For the particular case of tracking a set of small numbers (34 to 75 are small numbers), you can also consider a bitset or even a plain array of bool in which you set an element to true when it is inserted. Either will have O(n) insertion (over all elements) and O(1) lookup (per lookup), which is better than the set.
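A rough sketch of the bool-array variant for the fixed range [34, 75); the inserted values and the index offsetting are just for illustration:
#include <iostream>

int main() {
    const int lo = 34, hi = 75;     // tracking membership of values in [34, 75)
    bool present[hi - lo] = {};     // all false initially

    present[40 - lo] = true;        // "insert" 40, O(1)
    present[50 - lo] = true;        // "insert" 50, O(1)

    std::cout << std::boolalpha
              << present[40 - lo] << '\n'    // lookup 40: true
              << present[41 - lo] << '\n';   // lookup 41: false
}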
A Boost way could be:
std::set<int> numbers(
    boost::counting_iterator<int>(0),
    boost::counting_iterator<int>(10));
A great link to other answers, especially #Mani's answer.
std::set is a kind of binary search tree, which means an insertion costs O(lg n) on average.
C++98: If N elements are inserted, N log(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
C++11: If N elements are inserted, N log(size+N). Implementations may optimize if the range is already sorted.
I think a C++98 implementation will track the current insertion node and check whether the next value to insert is larger than the current one, in which case there's no need to start from the root again.
In C++11 this is an optional optimization, so an implementation may, for example, use a skip-list structure and still offer this range-insert feature, or you may optimize your program according to your own scenario.
Taking the hint provided by aksham, I see the answer is:
#include <boost/iterator/counting_iterator.hpp>
std::set<int> mySet;
// Insert [34 - 75):
mySet.insert(boost::counting_iterator<int>(34),
             boost::counting_iterator<int>(75));
It's not clear why you specifically want to insert using iterators to specify a range.
However, I believe you can use a simple for-loop to insert with the desired O(n) complexity.
Quoting from cppreference's page on std::set, the complexity is:
If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
So, using a for-loop:
std::set<int> mySet;
for (int i = 34; i < 75; ++i)
    mySet.insert(i);

Difference between std::set and std::priority_queue

Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and allow you to access them in an ordered fashion, and have the same insertion complexity O(log n), what are the advantages of using one over the other (or, what kinds of situations call for one or the other)?
While I know that the underlying structures are different, I am not as much interested in the difference in their implementation as I am in comparing their performance and suitability for various uses.
Note: I know about the no-duplicates rule in a set. That's why I also mentioned std::multiset, since it has exactly the same behavior as std::set but can be used where the stored data is allowed to compare as equal elements. So please, don't comment on the single/multiple keys issue.
A priority queue only gives you access to one element in sorted order -- i.e., you can get the highest priority item, and when you remove that, you can get the next highest priority, and so on. A priority queue also allows duplicate elements, so it's more like a multiset than a set. [Edit: As #Tadeusz Kopec pointed out, building a heap is also linear on the number of items in the heap, where building a set is O(N log N) unless it's being built from a sequence that's already ordered (in which case it is also linear).]
A set allows you full access in sorted order, so you can, for example, find two elements somewhere in the middle of the set, then traverse in order from one to the other.
std::priority_queue allows you to do the following:
Insert an element O(log n)
Get the smallest element O(1)
Erase the smallest element O(log n)
while std::set has more possibilities:
Insert any element O(log n) and the constant is greater than in std::priority_queue
Find any element O(log n)
Find the first element >= the one you are looking for, O(log n) (lower_bound)
Erase any element O(log n)
Erase any element by its iterator O(1)
Move to previous/next element in sorted order O(1)
Get the smallest element O(1)
Get the largest element O(1)
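A small sketch of the min-element access from both containers (std::greater makes the priority_queue a min-heap, to match the list above; the values are arbitrary):
#include <functional>
#include <iostream>
#include <queue>
#include <set>
#include <vector>

int main() {
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq;
    std::set<int> s;

    int values[] = {5, 1, 9, 3};
    for (int x : values) {
        pq.push(x);    // O(log n)
        s.insert(x);   // O(log n), one node allocation per element
    }

    std::cout << pq.top() << '\n';      // smallest element, O(1)
    std::cout << *s.begin() << '\n';    // smallest element, O(1)
    std::cout << *s.rbegin() << '\n';   // largest element, O(1); no pq equivalent
}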
set/multiset are generally backed by a binary tree. http://en.wikipedia.org/wiki/Binary_tree
priority_queue is generally backed by a heap. http://en.wikipedia.org/wiki/Heap_(data_structure)
So the question is really when should you use a binary tree instead of a heap?
Both structures are laid out as a tree; however, the rules about the relationship between ancestors are different.
We will call the positions P for parent, L for left child, and R for right child.
In a binary tree L < P < R.
In a (min-)heap, P < L and P < R.
So binary trees sort "sideways" and heaps sort "upwards".
So if we look at this as a triangle, then in the binary tree L, P, R are completely sorted, whereas in the heap the relationship between L and R is unknown (only their relationship to P is).
This has the following effects:
If you have an unsorted array and want to turn it into a binary tree, it takes O(n log n) time. If you want to turn it into a heap, it only takes O(n) time (as it only does the comparisons needed to place the extreme element).
Heaps are more efficient if you only need the extreme element (lowest or highest by some comparison function). Heaps only do the comparisons (lazily) necessary to determine the extreme element.
Binary trees perform the comparisons necessary to order the entire collection, and keep the entire collection sorted all-the-time.
Heaps have constant-time lookup (peek) of lowest element, binary trees have logarithmic time lookup of lowest element.
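To make the build-cost difference concrete, here is a small comparison of std::make_heap versus constructing a set from the same data (the values are arbitrary):
#include <algorithm>
#include <set>
#include <vector>

int main() {
    std::vector<int> data{7, 2, 9, 4, 1, 8};

    // Heapify in place: O(n); only the front is guaranteed to be the extreme.
    std::vector<int> heap = data;
    std::make_heap(heap.begin(), heap.end());       // heap.front() is the max

    // Building a balanced tree fully orders everything: O(n log n).
    std::set<int> tree(data.begin(), data.end());   // *tree.begin() is the min
}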
Since both std::priority_queue and std::set (and std::multiset) are data containers that store elements and allow you to access them in an ordered fashion, and have the same insertion complexity O(log n), what are the advantages of using one over the other (or, what kinds of situations call for one or the other)?
Even though insert and erase operations for both containers have the same complexity O(log n), these operations for std::set are slower than for std::priority_queue. That's because std::set makes many memory allocations: every element of std::set is stored in its own allocation, while std::priority_queue (with the default underlying std::vector container) uses a single allocation to store all elements. On the other hand, std::priority_queue performs many swap operations on its elements, whereas std::set just swaps pointers. So if swapping is a very slow operation for the element type, using std::set may be more efficient. Moreover, the element type may not be swappable at all.
The memory overhead of std::set is also much bigger, because it has to store many pointers between its nodes.

Insert, find-min(key) and delete(based on value) functionality in logN with STL containers?

Here's an interesting problem:
Let's say we have a set A for which the following are permitted:
Insert x
Find-min x
Delete the n-th inserted element in A
Create a data structure to permit these in logarithmic time.
The most common solution uses a heap. AFAIK, heaps with decrease-key (keyed on a value, generally the index at which an element was added) keep a table Pos[1..N], meaning the i-th added value is now at index Pos[i], so the key to decrease can be found in O(1). Can someone confirm this?
Another question is how we solve the problem with STL containers, i.e. with sets, maps or priority queues. A partial solution I found is to have a priority queue of indexes, ordered by the values at those indexes. I.e. A[1..N] are our added elements in order of insertion, and the priority queue holds 1..N compared by (A[i], A[j]). We then keep a table of deleted indexes and check whether the index of the minimum value has been deleted. Unfortunately, Find-min then becomes proportional to the number of deleted values.
Any alternative ideas?
Now I have thought about how to formulate a more general problem.
Create a data structure similar to a multimap with <key, value> elements. Keys are not unique; values are. Insert, find one (based on key), find (based on value), delete one (based on key) and delete (based on value) must all be permitted in O(log N).
Perhaps a bit oddly, this is possible with a manually implemented Binary Search Tree with a modification: for every node operation a hash-table or a map based on value is updated with the new pointer to the node.
Similar to having a strictly ordered std::set (if equal key order by value) with a hash-table on value giving the iterator to the element containing that value.
Possible with std::set and a (std::map/hash table) as described by Chong Luo.
You can use a combination of two containers to solve your problem: a vector to which you append each inserted element, and a set (a sketch follows the list):
You use the set to execute find_min.
When you insert an element, you push_back into the vector and insert into the set.
When you delete the n-th element, you look up its value in the vector and erase it from the set. Here I assume the numbering of the elements does not change after executing delete n-th element.
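A rough sketch of that combination (the struct and its names are mine; a multiset is used so duplicate values survive, and each insertion index is assumed to be deleted at most once):
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical wrapper around the vector + set combination described above.
struct InsertionTracker {
    std::vector<int> byOrder;    // i-th inserted value, in insertion order
    std::multiset<int> sorted;   // the same values, kept sorted

    void insert(int x) {                   // O(log n)
        byOrder.push_back(x);
        sorted.insert(x);
    }
    int findMin() const {                  // O(1)
        return *sorted.begin();
    }
    void deleteNth(std::size_t n) {        // n is the 1-based insertion index, O(log n)
        int value = byOrder[n - 1];
        sorted.erase(sorted.find(value));  // erase one occurrence only
    }
};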
I think you cannot solve the problem with only one container from the STL. However, there are some data structures that can solve your problem:
Skip list - can find the minimum in constant time and will perform the other two operations with amortized complexity O(log(n)). It is relatively easy to implement.
Tiered vector - easy to implement and will perform find_min in constant time and the other two operations in O(sqrt(n)).
And of course the approach you propose - write your own heap that keeps track of where the n-th element is in it.

Random element in STL set/map in log n

Since C++ STL set/map are implemented as red-black trees, it should be possible to not only do insert, delete, and find in O(log n) time, but also getMin, getMax, getRandom. As I understand the former two have their equivalent in begin() and end() (is that correct?). How about the last one? How can I do that?
The only idea I had so far was to use advance with a random argument, which however takes linear time...
EDIT: 'random' should refer to a uniform distribution
begin() is equivalent to a getMin operation, but end() returns an iterator one past the maximum, so it'd be rbegin().
As for getRandom: assuming you mean getting any item randomly with uniform probability, that might be possible in O(lg n) time in an AVL tree, but I don't see how to do it efficiently in a red-black tree. How will you know how many nodes lie in the left and right subtrees of a given node without counting them, which takes n/2 = O(n) time? And since std::set and std::map don't give direct access to their underlying tree, how are you going to traverse it?
I see three possible solutions:
use an AVL tree instead;
maintain a vector with the elements in the map or set parallel to it;
use a Boost::MultiIndex container with a sorted and a random-access view.
Edit: Boost.Intrusive might also do the trick.
Yes, begin and rbegin (not end!) give the minimum and maximum key, respectively.
If your key is simple, e.g. an integer, you could just create a random integer in the range [min, max) and get the map's lower_bound for it.
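A sketch of that idea (the keys here are arbitrary sample data; note the result is only approximately uniform unless the keys are evenly spread):
#include <iostream>
#include <map>
#include <random>
#include <string>

int main() {
    std::map<int, std::string> m{{10, "a"}, {20, "b"}, {40, "c"}, {70, "d"}};

    std::mt19937 gen{std::random_device{}()};
    // Draw from [min, max), as described above.
    std::uniform_int_distribution<int> dist(m.begin()->first,
                                            m.rbegin()->first - 1);

    // O(log n): first element whose key is >= the random draw.
    auto it = m.lower_bound(dist(gen));
    std::cout << it->first << " -> " << it->second << '\n';
}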
As you suspect, begin() and either std::prev(end()) or rbegin() will get you the min and max values. I can't see any way to uniformly get a random element in such a tree though. However you have a couple of options:
You can do it in linear time using advance.
You can keep a separate vector of map iterators that you keep up to date on all insertions/deletions.
You could revisit the container choice. For example, would a sorted vector, heap, or some other representation be better?
If you have an even distribution of values in the set or map, you could choose a random value between the min and max and use lower_bound to find the closest value to it.
If insertions and deletions are infrequent, you can use a vector instead and sort it as necessary. Populating a vector and sorting it takes approximately the same amount of time as populating a set or map; it might even be faster, you'd need to test it to be sure. Selecting a random element would be trivial at that point.
I think you can actually do that with STL, but it's a bit more complicated.
You need to maintain a map whose keys run from 1..N (N is the number of elements).
So each time you need to take a random element, generate a random number from 1..N, then find the element in the map with the chosen key. This is the element that you pick.
Afterwards, you need to maintain the consistency of the map by finding the biggest element and updating its key with the random number that you just picked.
Since each step is a log(n) operation, the total time is log(n).
With the existing STL there's probably no way, but there is a way to get a random key in O(1) with an additional std::map and std::vector, using reverse indexing.
Maintain a map m and a vector v:
when inserting a new key k, let i = v.size(), then set m[k] = i and push k onto v so that v[i] = k;
when deleting key k, let i = m[k], look up the last element k2 in v, set m[k2] = i and v[i] = k2, pop_back v, and remove k from m;
to get a random key, let r = rand() % v.size(); the random key is k = v[r].
so the basic idea is to have a continuous array of all existing keys.
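A compact sketch of that reverse-indexing scheme (the class and member names are mine):
#include <cstddef>
#include <cstdlib>
#include <map>
#include <vector>

// Hypothetical structure: O(log n) insert/erase, O(1) uniform random key.
struct RandomKeySet {
    std::map<int, std::size_t> pos;   // key -> its index in 'keys'
    std::vector<int> keys;            // dense array of all current keys

    void insert(int k) {
        pos[k] = keys.size();
        keys.push_back(k);
    }
    void erase(int k) {
        std::size_t i = pos[k];
        int last = keys.back();
        keys[i] = last;               // move the last key into the hole
        pos[last] = i;
        keys.pop_back();
        pos.erase(k);
    }
    int randomKey() const {           // O(1), uniform over the current keys
        return keys[std::rand() % keys.size()];
    }
};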

Which STL Container?

I need a container (not necessarily an STL container) which lets me do the following easily:
Insertion and removal of elements at any position
Accessing elements by their index
Iterate over the elements in any order
I used std::list, but it won't let me insert at any position efficiently (it does allow it, but for that I have to iterate over the elements to reach the position I want, which is slow, as the list may be huge). So can you recommend any efficient solution?
It's not completely clear to me what you mean by "Iterate over the elements in any order" - does this mean you don't care about the order, as long as you can iterate, or that you want to be able to iterate using arbitrarily defined criteria? These are very different conditions!
Assuming you meant iteration order doesn't matter, several possible containers come to mind:
std::map [a red-black tree, typically]
Insertion, removal, and access are O(log(n))
Iteration is ordered by index
hash_map or std::tr1::unordered_map [a hash table]
Insertion, removal, and access are all (approx) O(1)
Iteration is 'random'
This diagram will help you a lot, I think.
Either a vector or a deque will suit. vector will provide faster accesses, but deque will provide faster insertions and removals.
Well, you can't have all of those in constant time, unfortunately. Decide if you are going to do more insertions or reads, and base your decision on that.
For example, a vector will let you access any element by index in constant time and iterate over the elements in linear time (all containers should allow this), but insertion and removal take linear time (slower than a list).
You can try std::deque. It will not provide constant-time removal of elements in the middle, but it supports:
random access to elements
constant time insertion and removal of elements at the end of the sequence
linear time insertion and removal of elements in the middle
A vector. When you erase any item, copy the last item over the one to be erased (or swap them, whichever is faster) and pop_back. To insert at a position (but why should you, if the order doesn't matter!?), push_back the item currently at that position and overwrite (or swap) the position with the item to be inserted.
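A sketch of both operations on a vector (the function names are mine; relative order is deliberately not preserved):
#include <cstddef>
#include <vector>

// Erase the element at index i in O(1): overwrite it with the last element
// and shrink the vector.
void unordered_erase(std::vector<int>& v, std::size_t i) {
    v[i] = v.back();
    v.pop_back();
}

// "Insert" x at index i in O(1): move the current occupant to the back,
// then overwrite position i.
void unordered_insert(std::vector<int>& v, std::size_t i, int x) {
    int displaced = v[i];
    v.push_back(displaced);
    v[i] = x;
}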
By "iterating over the elements in any order", do you mean you need support for both forward and backwards by index, or do you mean order doesn't matter?
You want a special tree called an unsorted counted tree. This allows O(log(n)) indexed insertion, O(log(n)) indexed removal, and O(log(n)) indexed lookup. It also allows O(n) iteration in either the forward or reverse direction. One example where these are used is text editors, where each line of text in the editor is a node.
Here are some references:
Counted B-Trees
Rope (computer science)
An order statistic tree might be useful here. It's basically just a normal tree, except that every node in the tree includes a count of the nodes in its left sub-tree. This supports all the basic operations with no worse than logarithmic complexity. During insertion, anytime you insert an item in a left sub-tree, you increment the node's count. During deletion, anytime you delete from the left sub-tree, you decrement the node's count. To index to node N, you start from the root. The root has a count of nodes in its left sub-tree, so you check whether N is less than, equal to, or greater than the count for the root. If it's less, you search in the left subtree in the same way. If it's greater, you descend the right sub-tree, add the root's count to that node's count, and compare that to N. Continue until A) you've found the correct node, or B) you've determined that there are fewer than N items in the tree.
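A minimal sketch of that indexing ("select") step on such a node layout (the struct is hypothetical and the index is zero-based; maintaining leftCount on insert/delete is omitted):
#include <cstddef>

// Hypothetical node of an order statistic tree: each node stores the size
// of its left subtree, which must be kept up to date on insertion/deletion.
struct Node {
    int value;
    std::size_t leftCount;   // number of nodes in the left subtree
    Node* left;
    Node* right;
};

// Return the node holding the k-th smallest value (0-based), or nullptr
// if the tree holds fewer than k + 1 nodes. O(log n) in a balanced tree.
Node* select(Node* root, std::size_t k) {
    while (root) {
        if (k < root->leftCount) {
            root = root->left;             // the k-th is in the left subtree
        } else if (k == root->leftCount) {
            return root;                   // exactly this node
        } else {
            k -= root->leftCount + 1;      // skip the left subtree and the root
            root = root->right;
        }
    }
    return nullptr;
}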
[image omitted; source: adrinael.net]
But it sounds like you're looking for a single container with the following properties:
All the best benefits of various containers
None of their ensuing downsides
And that's impossible. One benefit causes a detriment. Choosing a container is about compromise.
std::vector