Implement decreaseKey in STL Priority Queue C++

I'm trying to implement Prim's Algorithm and for that I need to have a decreaseKey method for a priority queue (to update the key value in a priority queue). Can I implement this in the STL Priority Queue?
If it helps, this is the algorithm I'm following:
for each vertex u in graph G
    set key of u to INFINITY
    set parent of u to NIL
set key of source vertex to 0
en-queue to priority queue Q all vertices in graph with keys as above
while Q is not empty
    pop vertex u with lowest key in Q
    for each adjacent vertex v of u do
        if (v is still in Q) and (key(u) + weight-function(u, v) < key(v)) then
            set u to be parent of v
            update v's key to equal key(u) + weight-function(u, v) // This part is giving me problems, as I don't know how to implement decreaseKey in the priority queue

I do not think you can implement it with an STL container. You can always write your own heap (priority queue) based on a vector, but there is a workaround:
Keep an array of the distances you have, let's say d. In your priority queue you put pairs of (distance, vertex index). When you need to decrease a value in the queue, do not actually delete or modify the old entry; just update the value in your d array and push a new pair into the queue.
Every time you pop a value from the queue, check whether the distance in the pair is still as good as the one in your d array. If not, ignore it.
The time is still O(M log M). Memory is O(M), where M is the number of edges.
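For concreteness, here is a minimal sketch of that workaround applied to the pseudocode in the question (the function name, the adjacency-list type, and seeding the queue with only the source are my own assumptions):
#include <vector>
#include <queue>
#include <functional>
#include <limits>
#include <utility>

// Sketch: decreaseKey via "lazy deletion" in std::priority_queue.
// adj[u] is assumed to hold pairs (v, w) for every edge (u, v) of weight w.
std::vector<int> primParents(const std::vector<std::vector<std::pair<int,int>>>& adj, int source)
{
    const int INF = std::numeric_limits<int>::max();
    int n = (int)adj.size();
    std::vector<int> key(n, INF), parent(n, -1);
    std::vector<bool> done(n, false);

    using Item = std::pair<int,int>;                                     // (key, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> q;  // min-heap

    key[source] = 0;
    q.push({0, source});

    while (!q.empty()) {
        auto [k, u] = q.top();
        q.pop();
        if (done[u] || k != key[u]) continue;             // stale pair: its key was already decreased
        done[u] = true;
        for (auto [v, w] : adj[u]) {
            // relaxation exactly as written in the question's pseudocode
            if (!done[v] && key[u] + w < key[v]) {
                key[v] = key[u] + w;
                parent[v] = u;
                q.push({key[v], v});                      // "decreaseKey": push a fresh pair
            }
        }
    }
    return parent;
}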
There is another approach: use an RB-tree, which can insert and delete keys in O(log N) and also retrieve the minimum. The STL's RB-tree implementation is available through the std::set container.
Although the time complexity is the same, an RB-tree has a bigger hidden constant, so in practice it can be noticeably slower than a binary heap, approximately 5 times. It depends on the data, of course.
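A minimal sketch of that std::set variant, where decreaseKey is simply an erase of the old (key, vertex) pair followed by an insert of the new one (the names here are illustrative):
#include <set>
#include <utility>
#include <vector>

std::set<std::pair<int,int>> pq;  // (key, vertex) pairs ordered by key; the minimum is *pq.begin()
std::vector<int> key;             // key[v] = current key of vertex v

void decreaseKey(int v, int newKey)
{
    pq.erase({key[v], v});        // O(log N): remove the old entry
    key[v] = newKey;
    pq.insert({key[v], v});       // O(log N): insert the updated one
}

// extract-min: int u = pq.begin()->second; pq.erase(pq.begin());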

For the other approach: better than using a std::set, you may use a btree::btree_set (or btree::safe_btree_set).
This is an implementation with an interface identical to std::set, made by Google using a B-tree, unlike the STL which uses an RB-tree. It performs much better than std::set and is also O(log N).
Check the performance comparison:
http://code.google.com/p/cpp-btree/wiki/UsageInstructions
It also has a much lower memory footprint.

I'm no expert, so I hope this is not too dumb, but would a vector combined with lower_bound work well?
If you use lower_bound to find the correct place to insert new values, your vector stays sorted as you build it; no separate sorting is required. And when the vector is sorted, lower_bound is a binary search with logarithmic performance.
Since it is sorted, finding the min value (or max) is a snap.
To reduce a key, you'd do a lower_bound search, delete the element, and do lower_bound again to insert the reduced key, which is 2 logarithmic operations. Still not bad.
Alternatively, you could update the key in place and re-sort the vector. I would guess that with random access that could still be done cheaply, but I don't know exactly what the STL does there.
With a sorted vector, if you know the candidate key is less than the one that's already in there, then maybe you could even sort just the part of the vector that holds all the lesser values, for even better performance.
Another consideration is that I think sets/maps have quite a bit more memory overhead than vectors do?
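A rough sketch of what that reduce-key would look like on a sorted vector (keeping in mind that erasing and inserting in the middle of a vector also shifts elements, which is linear):
#include <vector>
#include <algorithm>

// keys is kept sorted ascending; the minimum is keys.front()
void reduceKey(std::vector<int>& keys, int oldKey, int newKey)
{
    // one binary search + erase to remove the old key
    auto it = std::lower_bound(keys.begin(), keys.end(), oldKey);
    if (it != keys.end() && *it == oldKey)
        keys.erase(it);

    // another binary search + insert to place the smaller key at its sorted position
    keys.insert(std::lower_bound(keys.begin(), keys.end(), newKey), newKey);
}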

I think most sorting is limited to N log N, so 2 log N for re-inserting rather than sorting might be better for the reduce-key operation.
The other thing is that inserting into a vector is not so hot (the elements after the insertion point have to be shifted), but on the whole, does the idea of a vector with lower_bound seem worth considering?
thanks

Related

Insert a sorted range into std::set with hint

Assume I have a std::set (which is by definition sorted), and I have another range of sorted elements (for the sake of simplicity, in a different std::set object). Also, I have a guarantee that all values in the second set are larger than all the values in the first set.
I know I can efficiently insert one element into std::set - if I pass a correct hint, this will be O(1). I know I can insert any range into std::set, but as no hint is passed, this will be O(k logN) (where k is number of new elements, and N number of old elements).
Can I insert a range in a std::set and provide a hint? The only way I can think of is to do k single inserts with a hint, which does push the complexity of the insert operations in my case down to O(k):
std::set <int> bigSet{1,2,5,7,10,15,18};
std::set <int> biggerSet{50,60,70};
for(auto bigElem : biggerSet)
bigSet.insert(bigSet.end(), bigElem);
First of all, to do the merge you're talking about, you probably want to use set's (or map's) merge member function, which will let you merge some existing set into this one. The advantage of doing this (and the reason you might not want to, depending on your usage pattern) is that the items being merged are actually moved from one set to the other, so you don't have to allocate new nodes (which can save a fair amount of time). The disadvantage is that the nodes then disappear from the source set, so if you need each local histogram to remain intact after being merged into the global histogram, you don't want to do this.
You can typically do better than O(log N) when searching a sorted vector. Assuming reasonably predictable distribution you can use an interpolating search to do a search in (typically) around O(log log N), often called "pseudo-constant" complexity.
Given that you only do insertions relatively infrequently, you might also consider a hybrid structure. This starts with a small chunk of data that you don't keep sorted. When you reach an upper bound on its size, you sort it and insert it into a sorted vector. Then you go back to adding items to your unsorted area. When it reaches the limit, again sort it and merge it with the existing sorted data.
Assuming you limit the unsorted chunk to no larger than log(N), search complexity is still O(log N): one log(N) binary search or log log N interpolating search on the sorted chunk, plus one linear search over the unsorted chunk, which is also O(log N) because of the size limit. Once you've verified that an item doesn't exist yet, adding it has constant complexity (just tack it onto the end of the unsorted chunk). The big advantage is that this can still easily use a contiguous structure such as a vector, so it's much more cache friendly than a typical tree structure.
Since your global histogram is (apparently) only ever populated with data coming from the local histograms, it might be worth considering just keeping it in a vector, and when you need to merge in the data from one of the local chunks, just use std::merge to take the existing global histogram and the local histogram, and merge them together into a new global histogram. This has O(N + M) complexity (N = size of global histogram, M = size of local histogram). Depending on the typical size of a local histogram, this could pretty easily work out as a win.
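A sketch of that std::merge idea with plain sorted vectors (the element type and names are placeholders):
#include <vector>
#include <algorithm>
#include <iterator>

// Merge a sorted local histogram into the sorted global one in O(N + M).
std::vector<int> mergeHistograms(const std::vector<int>& global, const std::vector<int>& local)
{
    std::vector<int> result;
    result.reserve(global.size() + local.size());
    std::merge(global.begin(), global.end(),
               local.begin(), local.end(),
               std::back_inserter(result));
    return result;
}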
Merging two sorted containers is much quicker than sorting. Its complexity is O(N), so in theory what you say makes sense. It's the reason why merge-sort is one of the quickest sorting algorithms. If you follow the link, you will also find pseudo-code; what you are doing is just one pass of the main loop.
You will also find the algorithm implemented in the STL as std::merge. It operates on iterator ranges, so it works with any container; I would suggest using std::vector as the default container for the new elements. Sorting a vector is a very fast operation. You may even find it better to use a sorted vector instead of a set for the output. You can always use std::lower_bound to get O(log N) lookups from a sorted vector.
Vectors have many advantages compared with set/map. Not least of which is they are very easy to visualise in a debugger :-)
(The example at the bottom of the std::merge reference shows how to use it with vectors.)
You can merge the sets more efficiently using special functions for that.
In case you insist, insert returns information about the inserted location.
iterator insert( const_iterator hint, const value_type& value );
Code:
std::set <int> bigSet{1,2,5,7,10,15,18};
std::set <int> biggerSet{50,60,70};
auto hint = bigSet.cend();
for(auto& bigElem : biggerSet)
hint = bigSet.insert(hint, bigElem);
This assumes, of course, that you are inserting new elements that will end up together or close in the final set. Otherwise there is not much to gain, only the fact that since the source is a set (i.e. it is ordered), about half of the tree will not be looked up.
There is also a member function
template< class InputIt > void insert( InputIt first, InputIt last );.
That might or might not do something like this internally.

Stream of Integers arriving at specified interval need to look sorted

Interview question: There is a stream of integers that arrives at specified intervals (say every 20 sec). Which STL container would you use to store them so that the integers appear sorted? My reply was map/set when there are no duplicates, or multimap/multiset when there are duplicates. Is there a better answer?
Use a multiset if you want to preserve duplicates. If you don't want to preserve duplicates, use a set.
If it's only being updated every 20 seconds, it probably doesn't matter a whole lot (unless it goes for so long that the set of integers becomes tremendously huge).
If you had data coming in a lot faster, there are alternatives that might be worth considering. One would be to use a couple of vectors. As data arrives, just push it onto one of the vectors. When you need to do an in-order traversal, sort that newly arrived data, and merge with the other vector of existing (already-sorted data). That'll give you results in order, which you can then write out to another vector, and start the same cycle again.
The big advantage here is that you're dealing with contiguous data instead of individually allocated nodes. Even with a possibility of three vectors in use at a time, your total memory usage is likely to be about equal (or possibly even less than) that of using a set or multiset.
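A sketch of that two-vector idea (all the names are mine):
#include <vector>
#include <algorithm>
#include <iterator>

std::vector<int> sortedData;  // already-sorted data
std::vector<int> incoming;    // newly arrived, not yet sorted

void add(int x) { incoming.push_back(x); }

// Called whenever an in-order view is needed.
const std::vector<int>& inOrder()
{
    std::sort(incoming.begin(), incoming.end());
    std::vector<int> merged;
    merged.reserve(sortedData.size() + incoming.size());
    std::merge(sortedData.begin(), sortedData.end(),
               incoming.begin(), incoming.end(),
               std::back_inserter(merged));
    sortedData.swap(merged);
    incoming.clear();
    return sortedData;
}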
Another possibility to consider (that's a bit of a hybrid between the two) would be something like a B+ tree. This is still a tree, so you can do in-order insertions with logarithmic complexity, but you have all the data in the leaf nodes (which are fairly large) so you get at least a reasonable amount of contiguous access as well.
To maintain a sorted view of the streaming integers I would use std::priority_queue with any underlying container (vector or deque, depending on the particular use).
You can keep push()ing into the priority_queue and use top() and pop() to retrieve the elements in sorted order.
The answer should be std::set. std::map<key, value> is for when you have pairs of data as <key, value> that need to be sorted according to the key.
In the same way, if you have to allow duplicates, use std::multiset or std::multimap according to the type of data.

STL priority_queue<pair> vs. map

I need a priority queue that will store a value for every key, not just the key. I think the viable options are std::multimap<K,V>, since it iterates in key order, or std::priority_queue<std::pair<K,V>>, since it sorts on K before V. Is there any reason I should prefer one over the other, other than personal preference? Are they really the same, or did I miss something?
A priority queue is heapified initially, in O(N) time, and then iterating all the elements in decreasing order takes O(N log N) time. It is stored in a std::vector behind the scenes, so there's only a small constant factor on top of the big-O behavior. Part of that, though, is moving the elements around inside the vector. If sizeof(K) or sizeof(V) is large, it will be a bit slower.
std::map is a red-black tree (in universal practice), so it takes O(N log N) time to insert the elements, keeping them sorted after each insertion. They are stored as linked nodes, so each item incurs malloc and free overhead. Then it takes O(N) time to iterate over them and destroy the structure.
The priority queue overall should usually have better performance, but it's more constraining on your usage: the data items will move around during iteration, and you can only iterate once.
If you don't need to insert new items while iterating, you can use std::sort with a std::vector, of course. This should outperform the priority_queue by some constant factor.
As with most things in performance, the only way to judge for sure is to try it both ways (with real-world testcases) and measure.
By the way, to maximize performance, you can define a custom comparison function to ignore the V and compare only the K within the pair<K,V>.
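For example, a comparator like this (a sketch with an int key and a string value) orders the queue by the key only:
#include <queue>
#include <vector>
#include <string>
#include <utility>

struct CompareKeyOnly {
    bool operator()(const std::pair<int, std::string>& a,
                    const std::pair<int, std::string>& b) const
    {
        return a.first < b.first;   // compare keys only, ignore the values
    }
};

std::priority_queue<std::pair<int, std::string>,
                    std::vector<std::pair<int, std::string>>,
                    CompareKeyOnly> pq;   // max-heap on the key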

Random element in STL set/map in log n

Since C++ STL set/map are implemented as red-black trees, it should be possible to not only do insert, delete, and find in O(log n) time, but also getMin, getMax, getRandom. As I understand the former two have their equivalent in begin() and end() (is that correct?). How about the last one? How can I do that?
The only idea I had so far was to use advance with a random argument, which however takes linear time...
EDIT: 'random' should refer to a uniform distribution
begin() is equivalent to a getMin operation, but end() returns an iterator one past the maximum, so it'd be rbegin().
As for getRandom: assuming you mean getting any item randomly with uniform probability, that might be possible in O(lg n) time in an AVL tree, but I don't see how to do it efficiently in a red-black tree. How will you know how many nodes lie in the subtrees left and right of a given node without counting them, which is O(n)? And since std::set and std::map don't give direct access to their underlying tree, how are you going to traverse it?
I see three possible solutions:
use an AVL tree instead;
maintain a vector with the elements in the map or set parallel to it;
use a Boost::MultiIndex container with a sorted and a random-access view.
Edit: Boost.Intrusive might also do the trick.
Yes, begin and rbegin (not end!) are the minimum and maximum key value, respectively.
If your key is simple, e.g. an integer, you could just create a random integer in the range [min, max) and get the map's lower_bound for that.
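A sketch of that idea for a std::set<int> (with the caveat from the question's edit: this is only uniform if the keys themselves are roughly uniformly distributed, and it assumes the set is non-empty):
#include <set>
#include <random>

int approxRandomElement(const std::set<int>& s, std::mt19937& rng)
{
    std::uniform_int_distribution<int> dist(*s.begin(), *s.rbegin());  // draw from [min, max]
    return *s.lower_bound(dist(rng));  // never end(), since the drawn value is <= the maximum
}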
As you suspect, begin() and either std::prev(end()) or rbegin() will get you the min and max values. I can't see any way to uniformly get a random element in such a tree though. However you have a couple of options:
You can do it in linear time using advance.
You can keep a separate vector of map iterators that you keep up to date on all insertions/deletions.
You could revisit the container choice. For example, would a sorted vector, heap, or some other representation be better?
If you have an even distribution of values in the set or map, you could choose a random value between the min and max and use lower_bound to find the closest value to it.
If insertions and deletions are infrequent, you can use a vector instead and sort it as necessary. Populating a vector and sorting it takes approximately the same amount of time as populating a set or map; it might even be faster, you'd need to test it to be sure. Selecting a random element would be trivial at that point.
I think you can actually do that with the STL, but it's a bit more complicated.
You need to maintain a map in which each element has a key from 1..N (N is the number of elements).
So each time you need to take a random element, generate a random number from 1..N, then find the element in the map with the chosen key. This is the element that you pick.
Afterwards, you need to maintain the consistency of the map by finding the biggest element and updating its key with the random number that you just picked.
Since each step is a log(n) operation, the total time is log(n).
With the existing STL, there's probably no way. But there's a way to get a random key in O(1) with an additional std::map and std::vector structure, using reverse indexing.
Maintain a map m and a vector v.
When inserting a new key k, let i = v.size(), then insert (k, i) into m and push k onto v, so that v[i] = k.
When deleting key k, let i = m[k], look up the last element k2 in v, set m[k2] = i and v[i] = k2, pop_back v, and remove k from m.
To get a random key, let r = rand() % v.size(); then the random key is k = v[r].
So the basic idea is to have a contiguous array of all existing keys.
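A sketch of that reverse-indexing scheme, with std::map as the key-to-position index as described:
#include <map>
#include <vector>
#include <cstdlib>

std::map<int, size_t> m;  // key -> position of that key in v
std::vector<int> v;       // contiguous array of all existing keys

void insertKey(int k)
{
    m[k] = v.size();
    v.push_back(k);                    // v[m[k]] == k
}

void deleteKey(int k)
{
    size_t i = m[k];
    int k2 = v.back();                 // move the last key into the hole left by k
    m[k2] = i;
    v[i] = k2;
    v.pop_back();
    m.erase(k);
}

int randomKey()
{
    return v[std::rand() % v.size()];  // O(1) once the array is maintained
}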

Efficiently finding multiple items in a container

I need to find a number of objects from a large container.
The only way I can think of to do that seems to be to just search the container for one item at a time in a loop. However, even with an efficient search with an average case of, say, "log n" (where n is the size of the container), this gives me "m log n" (where m is the number of items I'm looking for) for the entire operation.
That seems highly suboptimal to me, and as it's something that I am likely to need to do on a frequent basis, it's something I'd definitely like to improve if possible.
Neither part has been implemented yet, so I'm open for suggestions on the format of the main container, the "list" of items I'm looking for, etc, as well as the actual search algorithm.
The items are complex objects, however the search key is just a simple integer.
Hash tables have basically O(1) lookup. This gives you O(m) to look up m items; obviously you can't look up m items faster than O(m), because you need to get the results out.
If you're purely doing look-up (you don't require ordered elements) and can give up some memory, try unordered_map (it's TR1, also implemented in Boost), which has constant-time amortized look-up.
In a game engine, we tested std::map and unordered_map, and while map was faster for insertions (if I recall), unordered_map blew it out of the water for retrieval. We had greater than 1000 elements in the map, for scale, which is fairly low compared to some other tasks you may be doing.
If you require elements to be ordered, your next bet is std::map, which has the look-up times you've posted, and keeps the elements ordered. In general, it also uses less memory than an unordered_map.
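A minimal sketch of that kind of lookup, with the integer search key pulled out of the complex object (the struct and names are made up):
#include <unordered_map>
#include <vector>

struct Object { int key; /* ...the rest of the complex object... */ };

// Build the index once: O(n). Pointers stay valid as long as 'all' is not modified.
std::unordered_map<int, const Object*> buildIndex(const std::vector<Object>& all)
{
    std::unordered_map<int, const Object*> index;
    index.reserve(all.size());
    for (const Object& o : all) index[o.key] = &o;
    return index;
}

// Look up m keys: O(m) on average.
std::vector<const Object*> findAll(const std::unordered_map<int, const Object*>& index,
                                   const std::vector<int>& keys)
{
    std::vector<const Object*> found;
    for (int k : keys) {
        auto it = index.find(k);
        if (it != index.end()) found.push_back(it->second);
    }
    return found;
}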
If your container is a vector and the elements are sorted, you can use std::lower_bound to search in O(log n) time. If your search items are also sorted, you can do a small optimization by always using the last found iterator as the start of the search for the next one, e.g.
vector<stuff> container;
vector<stuff>::iterator it = container.begin();
for (size_t i = 0; i < search_items.size() && it != container.end(); ++i)
{
    it = std::lower_bound(it, container.end(), search_items[i]);
    // make sure the found item is a match
    if (it != container.end() && search_items[i] < *it)
        it = container.end(); // not a match; break out early
}
if (it != container.end()) // found it!
boost/tr1 unordered_map and unordered_set are containers backed by a hash table, which gives you search in amortized constant time [ O(1) ].
Boost Unordered documentation.
I suppose if you have a sorted container and a uniform distribution of items, then the most efficient method would be a recursive bisection search with an execution path somewhat like a tree: it calls itself twice whenever the objects being searched for fall into both halves of the bisection (a sketch of this follows below).
However, if you choose a container based on a hash table (boost unordered_set, I think?), or something similar, then lookup is O(1), so searching in a loop really doesn't matter.
EDIT:
Note that std::map and std::set are normally (always?) implemented using RB-trees, so they are only log(n) for lookup.
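A sketch of that recursive bisection over a sorted vector and a sorted, duplicate-free list of search keys (this is my own interpretation of the description above):
#include <vector>
#include <algorithm>

// Reports which of keys[klo, khi) occur in the sorted range data[lo, hi).
// Both ranges are split around the middle key, so each recursive call works on a smaller slice of data.
void bisectFind(const std::vector<int>& data, size_t lo, size_t hi,
                const std::vector<int>& keys, size_t klo, size_t khi,
                std::vector<int>& found)
{
    if (klo >= khi || lo >= hi) return;
    size_t kmid = klo + (khi - klo) / 2;
    size_t mid = std::lower_bound(data.begin() + lo, data.begin() + hi, keys[kmid]) - data.begin();
    if (mid < hi && data[mid] == keys[kmid]) found.push_back(keys[kmid]);
    bisectFind(data, lo, mid, keys, klo, kmid, found);      // keys smaller than keys[kmid]
    bisectFind(data, mid, hi, keys, kmid + 1, khi, found);  // keys larger than keys[kmid]
}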
Are you sure that m log2(n) is actually going to be a problem? If you are using a std::map that is even relatively large, the number of actual comparisons is still pretty small: if you are looking up 10,000 elements in a map of 1,000,000, the number of comparisons should be about 200,000, or about 20 comparisons per target element. This really isn't bad if your key is just a simple integer.
If you were hashing something that didn't already have a nice key, then I would say go with boost::unordered_map. I would implement it with std::map first, profile it, and then decide if you want to make the next jump to Boost.
If you're frequently performing the same projections on your collection, such as extracting elements with a key of "42", you could consider maintaining these subsets in buckets. You'd internally maintain a hashmap from keys to vectors of elements with that key, and add elements to the appropriate bucket as well as your primary collection representing "everything". Extracting a subgroup of elements is constant time (because the respective collections have already been built), and the memory overhead of maintaining the buckets scales primarily with the number of unique keys in your dataset.
This technique is decidedly less effective if you have a large number of unique key values, and makes insertions and removals more expensive, but it's good for some situations- I thought it was at least worth mentioning.
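A small sketch of that bucketing idea (the types and names are illustrative; indices are stored rather than pointers so vector growth doesn't invalidate them):
#include <unordered_map>
#include <vector>

struct Item { int key; /* ... */ };

std::vector<Item> everything;                          // the primary collection
std::unordered_map<int, std::vector<size_t>> buckets;  // key -> indices into 'everything'

void add(const Item& item)
{
    everything.push_back(item);
    buckets[item.key].push_back(everything.size() - 1);  // keep the bucket up to date on insertion
}

// Constant-time extraction of the subgroup with a given key; it was built as elements arrived.
const std::vector<size_t>& withKey(int key)
{
    static const std::vector<size_t> empty;
    auto it = buckets.find(key);
    return it != buckets.end() ? it->second : empty;
}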