Random element in STL set/map in log n - c++

Since C++ STL set/map are implemented as red-black trees, it should be possible to not only do insert, delete, and find in O(log n) time, but also getMin, getMax, and getRandom. As I understand it, the first two have their equivalents in begin() and end() (is that correct?). How about the last one? How can I do that?
The only idea I had so far was to use std::advance with a random argument, which however takes linear time...
EDIT: 'random' should refer to a uniform distribution

begin() is equivalent to a getMin operation, but end() returns an iterator one past the maximum, so it'd be rbegin().
As for getRandom: assuming you mean getting any item randomly with uniform probability, that might be possible in O(lg n) time in an AVL tree, but I don't see how to do it efficiently in a red-black tree. How will you know how many subtrees there are left and right of a given node without counting them in n/2 = O(n) time? And since std::set and std::map don't give direct access to their underlying tree, how are you going to traverse it?
I see three possible solutions:
use an AVL tree instead;
maintain a vector with the elements in the map or set parallel to it;
use a Boost::MultiIndex container with a sorted and a random-access view.
Edit: Boost.Intrusive might also do the trick.

Yes, begin and rbegin (not end!) are the minimum and maximum key value, respectively.
If your key is simple, e.g. an integer, you could just generate a random integer in the range [min, max) (e.g. with std::uniform_int_distribution) and get the map's lower_bound for that.

As you suspect, begin() and either std::prev(end()) or rbegin() will get you the min and max values (set iterators are bidirectional, so end() - 1 won't compile). I can't see any way to uniformly get a random element in such a tree though. However you have a couple of options:
You can do it in linear time using advance.
You can keep a separate vector of map iterators that you keep up to date on all insertions/deletions.
You could revisit the container choice. For example, would a sorted vector, heap, or some other representation be better?

If you have an even distribution of values in the set or map, you could choose a random value between the min and max and use lower_bound to find the closest value to it.
If insertions and deletions are infrequent, you can use a vector instead and sort it as necessary. Populating a vector and sorting it takes approximately the same amount of time as populating a set or map; it might even be faster, you'd need to test it to be sure. Selecting a random element would be trivial at that point.

I think you can actually do this with the STL, but it's a bit more complicated.
You need to maintain a map whose keys run from 1..N (N being the number of elements).
Each time you need a random element, generate a random number from 1..N, then find the element in the map with that key. That is the element you pick.
Afterwards, maintain the consistency of the map by finding the element with the biggest key and updating its key to the random number you just picked.
Since each step is an O(log n) operation, the total time is O(log n).

With the existing STL alone, there's probably no way. But there is a way to get a random key in O(1) by pairing a std::map with a std::vector and using reverse indexing.
Maintain a map m and a vector v:
when inserting a new key k, let i = v.size(), insert (k, i) into m, and push k onto v so that v[i] = k;
when deleting key k, let i = m[k], look up the last element k2 in v, set m[k2] = i and v[i] = k2, pop_back v, and erase k from m;
to get a random key, let r = rand() % v.size(); the random key is k = v[r].
The basic idea is to keep a contiguous array of all existing keys.

Related

Insert a sorted range into std::set with hint

Assume I have a std::set (which is by definition sorted), and I have another range of sorted elements (for the sake of simplicity, in a different std::set object). Also, I have a guarantee that all values in the second set are larger than all the values in the first set.
I know I can efficiently insert one element into std::set - if I pass a correct hint, this will be amortized O(1). I know I can insert any range into std::set, but as no hint is passed, this will be O(k log N) (where k is the number of new elements, and N the number of old elements).
Can I insert a range in a std::set and provide a hint? The only way I can think of is to do k single inserts with a hint, which does push the complexity of the insert operations in my case down to O(k):
std::set <int> bigSet{1,2,5,7,10,15,18};
std::set <int> biggerSet{50,60,70};
for(auto bigElem : biggerSet)
bigSet.insert(bigSet.end(), bigElem);
First of all, to do the merge you're talking about, you probably want to use set's (or map's) merge member function, which will let you merge some existing set into this one. The advantage of doing this (and the reason you might not want to, depending on your usage pattern) is that the items being merged in are actually moved from one set to the other, so you don't have to allocate new nodes (which can save a fair amount of time). The disadvantage is that the nodes then disappear from the source set, so if you need each local histogram to remain intact after being merged into the global histogram, you don't want to do this.
You can typically do better than O(log N) when searching a sorted vector. Assuming reasonably predictable distribution you can use an interpolating search to do a search in (typically) around O(log log N), often called "pseudo-constant" complexity.
Given that you only do insertions relatively infrequently, you might also consider a hybrid structure. This starts with a small chunk of data that you don't keep sorted. When you reach an upper bound on its size, you sort it and insert it into a sorted vector. Then you go back to adding items to your unsorted area. When it reaches the limit, again sort it and merge it with the existing sorted data.
Assuming you limit the unsorted chunk to no larger than log(N), search complexity is still O(log N): one O(log N) binary search (or O(log log N) interpolating search) on the sorted chunk, and one O(log N) linear scan through the unsorted chunk. Once you've verified that an item doesn't exist yet, adding it has constant complexity (just tack it onto the end of the unsorted chunk). The big advantage is that this can still easily use a contiguous structure such as a vector, so it's much more cache friendly than a typical tree structure.
Since your global histogram is (apparently) only ever populated with data coming from the local histograms, it might be worth considering just keeping it in a vector, and when you need to merge in the data from one of the local chunks, just use std::merge to take the existing global histogram and the local histogram, and merge them together into a new global histogram. This has O(N + M) complexity (N = size of global histogram, M = size of local histogram). Depending on the typical size of a local histogram, this could pretty easily work out as a win.
Merging two sorted containers is much quicker than sorting. Its complexity is O(N), so in theory what you say makes sense. It's the reason why merge sort is one of the quickest sorting algorithms. If you follow the link, you will also find pseudo-code; what you are doing is just one pass of the main loop.
You will also find the algorithm implemented in the STL as std::merge. This takes any pair of input iterators, so I would suggest using std::vector as the default container for the new elements. Sorting a vector is a very fast operation. You may even find it better to use a sorted vector instead of a set for the output. You can always use std::lower_bound to get O(log N) lookups from a sorted vector.
Vectors have many advantages compared with set/map. Not least of which is they are very easy to visualise in a debugger :-)
(The code at the bottom of the std::merge shows an example of using vectors)
You can merge the sets more efficiently using special functions for that.
In case you insist, insert returns information about the inserted location.
iterator insert( const_iterator hint, const value_type& value );
Code:
std::set <int> bigSet{1,2,5,7,10,15,18};
std::set <int> biggerSet{50,60,70};
auto hint = bigSet.cend();
for(auto& bigElem : biggerSet)
hint = bigSet.insert(hint, bigElem);
This assumes, of course, that you are inserting new elements that will end up together or close in the final set. Otherwise there is not much to gain, only the fact that since the source is a set (and therefore ordered), about half of the tree will not be looked up.
There is also a member function
template< class InputIt > void insert( InputIt first, InputIt last );.
That might or might not do something like this internally.

How a multiset works and how one can find the minimum element in multiset

When we insert elements into a multiset, are they stored in sorted order?
How can I find the smallest element of a multiset?
And how can I access the i-th element in a multiset?
Can someone please explain how a multiset works and how it stores its elements?
Thanks in advance.
Here is one solution that always works (regardless of the ordering scheme):
std::multiset<int> m;
//do something with m
std::cout<<*std::min_element(m.begin(),m.end())<<std::endl;
That should be O(n), so it takes no advantage of the already-sorted nature of the storage scheme of a multiset.
Access "i-th" element:
std::cout<<*std::next(m.begin(),i-1)<<std::endl;
But again, what is meant by "i-th element" is determined by your ordering scheme.
Ok, and when your ordering scheme is given by std::less -- the standard case -- then indeed
m.begin();
gives you the minimal element. You can read it up here.
A multiset works by maintaining a red-black balanced binary tree.
Generally, the meaning of a balanced tree (and red-black specifically) is that you can add/search/delete/get the min/get the max (and more) in O(log k) operations, where k is the size of the tree (that is, the number of elements in the multiset). Specifically in C++'s multiset the complexity might change a bit, depending on the operation.
If your set is s then:
You can get the min element in the set by using s.begin();
You can get the i'th element in the set by using *next(s.begin(), i-1) (as next(it, d) gives you an iterator to the element d positions after it). The complexity of this is linear, as stated here.

How to efficiently insert a range of consecutive integers into a std::set?

In C++, I have a std::set that I would like to insert a range of consecutive integers. How can I do this efficiently, hopefully in O(n) time where n is the length of the range?
I'm thinking I'd use the input-iterator overload of std::set::insert, but am unclear on how to build the input iterator.
std::set<int> mySet;
// Insert [34 - 75):
mySet.insert(inputIteratorTo34, inputIteratorTo75);
How can I create the input iterator and will this be O(n) on the range size?
The efficient way of inserting already ordered elements into a set is to hint the library as to where the next element will be. For that you want to use the version of insert that takes an iterator:
std::set<int>::iterator it = mySet.end();
for (int x : input) {
it = mySet.insert(it, x);
}
On the other hand, you might want to consider other containers. Whenever possible, use std::vector. If the amount of insertions is small compared to lookups, or if all inserts happen upfront, then you can build a vector, sort it and use lower_bound for lookups. In this case, since the input is already sorted, you can skip the sorting.
If insertions (or removals) happen all over the place, you might want to consider using std::unordered_set<int> which has an average O(1) insertion (per element) and lookup cost.
For the particular case of tracking a set of small numbers (34 to 75 qualify), you can also consider using a bitset or even a plain array of bool in which you set an element to true when it is inserted. Either will have O(n) insertion (for all elements) and O(1) lookup (per lookup), which is better than the set.
A Boost way could be:
std::set<int> numbers(
boost::counting_iterator<int>(0),
boost::counting_iterator<int>(10));
A great reference for the other answers, especially #Mani's answer:
std::set is a kind of binary search tree (typically self-balancing), which means an insertion costs O(log n).
C++98: If N elements are inserted, N log(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
C++11: If N elements are inserted, N log(size+N). Implementations may optimize if the range is already sorted.
I think a C++98 implementation will track the current insertion node and check whether the next value to insert is larger than the current one, in which case there's no need to start from the root again.
In C++11 this is an optional optimization, so you might implement a skip-list structure and use this range-insert feature in your implementation, or you may optimize the program according to your scenario.
Taking the hint provided by aksham, I see the answer is:
#include <boost/iterator/counting_iterator.hpp>
std::set<int> mySet;
// Insert [34 - 75):
mySet.insert(boost::counting_iterator<int>(34),
boost::counting_iterator<int>(75));
It's not clear why you specifically want to insert using iterators to specify a range.
However, I believe you can use a simple for-loop to insert with the desired O(n) complexity.
Quoting from cppreference's page on std::set, the complexity is:
If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
So, using a for-loop:
std::set<int> mySet;
for(int i = 34; i < 75; ++i)
mySet.insert(i);

Implement decreaseKey in STL Priority Queue C++

I'm trying to implement Prim's Algorithm and for that I need to have a decreaseKey method for a priority queue (to update the key value in a priority queue). Can I implement this in the STL Priority Queue?
If it helps, this is the algorithm I'm following:
for each vertex u in graph G
set key of u to INFINITY
set parent of u to NIL
set key of source vertex to 0
en-queue to priority queue Q all vertices in graph with keys as above
while Q is not empty
pop vertex u with lowest key in Q
for each adjacent vertex v of u do
if (v is still in Q) and (key(u) + weight-function(u, v) < key(v)) then
set u to be parent of v
update v's key to equal key(u) + weight-function(u, v) // This part is giving me problems as I don't know how implement decreaseKey in priority queue
I do not think you can implement it with an STL container directly. Remember, you can always write your own heap (priority queue) based on a vector, but there is a workaround:
Keep an array of the distances you have, let's say d. In your priority queue, put pairs of (distance, vertex index). When you need to decrease a value in the queue, do not delete it; just update the value in the d array and push a new pair into the queue.
Every time you pop a value from the queue, check whether the distance in the pair still matches your d array. If not, ignore it.
The time is the same, O(M log M). Memory is O(M), where M is the number of edges.
There is another approach: use an RB-tree, which can insert and delete keys in O(log N) and get the minimum as well. You can find an RB-tree implementation in the STL's std::set container.
But although the time complexity is the same, an RB-tree has a bigger hidden constant, so it might be noticeably slower in practice, perhaps around 5 times. It depends on the data, of course.
For the other approach: better than using a std::set, you may use a btree::btree_set (or btree::safe_btree_set).
This is an implementation with an interface identical to std::set, made by Google, using a B-tree, unlike the STL which uses an RB-tree. It is much better than std::set in practice and also O(log N).
Check the performance comparison:
http://code.google.com/p/cpp-btree/wiki/UsageInstructions
It also has a much lower memory footprint.
I'm no expert, so I hope this is not too dumb, but would a vector combined with lower_bound work well?
If you use lower_bound to find the correct place to insert new values, your vector will always be sorted as you build it, no sorting required. When your vector is sorted, isn't lower_bound a binary search with logarithmic-class performance?
Since it is sorted, finding the min value (or max) is a snap.
To reduce a key, you'd do a lower_bound search, erase, and do lower_bound again to insert the reduced key. The searches are two logarithmic-class operations, though the erase and insert themselves shift elements, which is linear. Still not bad.
Alternatively, you could update the key and re-sort the vector. I would guess that should be in the N log N class, but I don't know exactly what the STL does there.
With a sorted vector, if you know the candidate key is less than the one that's in there, then maybe you could even just sort the part of the vector that holds all the lesser values, for even better performance.
Another consideration is that I think sets/maps have quite a bit more memory overhead than vectors do.
I think most sorting is limited to N log N, so two log N searches for re-inserting rather than sorting might be better for the reduce-key operation.
The other thing is that inserting into a vector is not so hot; however, on the whole, does the idea of a vector with lower_bound seem worth considering?
thanks

std::map get the lowest n elements time

As I read in the documentation, std::map is typically implemented as a binary search tree, and it keeps its elements sorted.
I need to insert rapidly and retrieve rapidly elements. I also need to get the first lowest N elements from time to time.
I was thinking about using a std::map, is it a good choice? If it is, what is the time I would need to retrieve the lowest N elements? O(n*logn)?
Given you need both retrieval and the n smallest, I would say std::map is a reasonable choice. But depending on the exact access pattern, a std::vector with sorting might be a good choice too.
I am not sure what you mean by retrieve. The time to read k elements is O(k) (provided you do it sequentially using an iterator); the time to remove them is O(k log n) (where n is the total number of elements), even if you do it sequentially using iterators.
You can use iterators to rapidly read through the lowest N elements. Going from begin() to the (N-1)-th element will take O(N) time (getting the next element is amortized constant time for a std::map).
I'd note, however, that it is often actually faster to use a sorted std::vector with a binary chop search method to implement what it sounds like you are doing so depending on your exact requirements this might be worth investigating.
The C++ standard requires that all required iterator operations (including iterator increment) be amortized constant time. Consequently, getting the first N items in a container must take amortized O(N) time.
I would say yes to both questions.