Overwrite elements at end of priority queue once specified capacity is met? - c++

I'm trying to search for the nearest N points in a quad tree and using an STL priority queue to store the points as they are found (sorted by distance from the query point).
Points exceeding a max distance from the query point are never added to the queue. However, I also would like to cut off the number of items that can be returned by the search. Currently, I add all points which are closer to the query point than the max distance, and then only read the top N points from the queue.
In testing, this is too slow -- simply adding every point closer than the max distance slows down more and more as points are added. Instead, I would like to only add a point to the queue if either: there are fewer than N points currently in the queue, or the point in question is closer to the query point than the Nth point in the queue, in which case the Nth point is overwritten and the number of elements in the queue does not increase.
Is there a way to do this with the STL priority queue, or is my only option to write my own?

Do you have to know the current N best points while you iterate through them? If not, perhaps you can add all the points to a random-access container (e.g. std::vector) and just call std::partial_sort at the end to get the N best.
If you really need to keep the N best as you go, std::priority_queue won't suffice. You could put something together with the heap functions from <algorithm> (std::make_heap, std::push_heap, std::pop_heap), which operate on a container you control directly.
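A minimal sketch of that idea, assuming only distances need to be kept (in practice you would store a point together with its distance): a std::vector maintained as a max-heap of at most N entries with the <algorithm> heap functions, so the worst point kept so far sits at the front and is the one overwritten when a closer candidate arrives.

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main() {
        const std::size_t N = 3;
        std::vector<double> best;   // distances of the (at most) N nearest points so far

        auto consider = [&](double dist) {
            if (best.size() < N) {                        // queue not full yet: just add
                best.push_back(dist);
                std::push_heap(best.begin(), best.end()); // max-heap: worst kept point at front
            } else if (dist < best.front()) {             // closer than the current Nth point
                std::pop_heap(best.begin(), best.end());  // move the worst point to the back
                best.back() = dist;                       // overwrite it, size stays at N
                std::push_heap(best.begin(), best.end());
            }
        };

        for (double d : {4.0, 1.0, 6.0, 2.5, 0.7, 9.0})
            consider(d);

        std::sort_heap(best.begin(), best.end());         // optional: results in ascending order
        for (double d : best) std::cout << d << ' ';      // 0.7 1 2.5
    }

best.front() is always the distance of the current Nth-nearest point, so the cut-off test the question asks for is just a comparison against it.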

Related

How to reduce the value of C++ priority queue?

Say I have a min priority queue with the smallest value on the top. I want to reduce the value on the top so that the heap property is still maintained, in O(1) time. How can I do that?
I have seen the question here: how to change the value of std::priority_queue top()?
There, taking the value out, modifying it, and pushing it back is basically O(log N). Can I make use of the fact that I am only reducing the value, so that there is no need to push it back again?
The standard priority queue doesn't support changing the keys.
What you are looking for is something similar to another data structure called an Indexed Priority Queue, often used in Dijkstra's algorithm.
The indexed priority queue supports two more methods in its API, increaseKey and decreaseKey, which allow modifying the keys themselves.
The STL doesn't define an indexed priority queue. You'd probably need to implement one yourself or look for a third-party implementation.
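For illustration only, here is a rough sketch of what such a hand-rolled indexed min-priority queue might look like; the id/key types and the decrease_key name are arbitrary choices, not anything from the STL or a particular library. The extra id-to-position map is what makes a key change findable in O(1) and fixable in O(log n).

    #include <cassert>
    #include <cstddef>
    #include <iostream>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    class IndexedMinPQ {
        std::vector<std::pair<int,double>> heap;   // (id, key), array-based binary min-heap
        std::unordered_map<int, std::size_t> pos;  // id -> index in `heap`

        void swap_nodes(std::size_t a, std::size_t b) {
            std::swap(heap[a], heap[b]);
            pos[heap[a].first] = a;
            pos[heap[b].first] = b;
        }
        void sift_up(std::size_t i) {
            while (i > 0) {
                std::size_t parent = (i - 1) / 2;
                if (heap[parent].second <= heap[i].second) break;
                swap_nodes(i, parent);
                i = parent;
            }
        }
        void sift_down(std::size_t i) {
            for (;;) {
                std::size_t smallest = i, l = 2*i + 1, r = 2*i + 2;
                if (l < heap.size() && heap[l].second < heap[smallest].second) smallest = l;
                if (r < heap.size() && heap[r].second < heap[smallest].second) smallest = r;
                if (smallest == i) break;
                swap_nodes(i, smallest);
                i = smallest;
            }
        }
    public:
        void push(int id, double key) {
            heap.emplace_back(id, key);
            pos[id] = heap.size() - 1;
            sift_up(heap.size() - 1);
        }
        void decrease_key(int id, double new_key) {   // the operation std::priority_queue lacks
            std::size_t i = pos.at(id);
            assert(new_key <= heap[i].second);
            heap[i].second = new_key;
            sift_up(i);                               // O(log n); O(1) if i == 0 (the top)
        }
        std::pair<int,double> pop() {
            auto top = heap.front();
            swap_nodes(0, heap.size() - 1);
            heap.pop_back();
            pos.erase(top.first);
            if (!heap.empty()) sift_down(0);
            return top;
        }
        bool empty() const { return heap.empty(); }
    };

    int main() {
        IndexedMinPQ pq;
        pq.push(/*id=*/1, 5.0);
        pq.push(2, 3.0);
        pq.push(3, 7.0);
        pq.decrease_key(3, 1.0);                 // id 3 jumps to the front
        std::cout << pq.pop().first << "\n";     // 3
    }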
I see the point of this question differently from others. Based on this question,
I have a min priority queue with the smallest value on the top, I hope to reduce the value on the top so that the property of the queue would still be maintained and time complexity would be O(1).
With std::priority_queue, you can't. We may try a hack like const_cast<int &>(pq.top()) = lesser_value_than_top. It could work in some implementations, but it is NOT safe.
So I would suggest building your own PQ on top of an array-based heap; then you can change the top's value without the O(log N) work, as long as the new value is less than or equal to the current top.
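A minimal sketch of why this works, using a plain std::vector kept as a min-heap via the standard heap algorithms (this is not std::priority_queue): lowering the value at index 0 can never break the parent <= child invariant, so no sifting is required.

    #include <algorithm>
    #include <cassert>
    #include <functional>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> heap = {5, 8, 7, 9};
        std::make_heap(heap.begin(), heap.end(), std::greater<int>());  // min-heap: heap[0] is smallest

        assert(heap.front() == 5);
        heap.front() = 2;            // 2 <= 5, so every parent <= child relation still holds
        assert(std::is_heap(heap.begin(), heap.end(), std::greater<int>()));

        std::cout << "new top: " << heap.front() << "\n";   // 2
    }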

Implementing Bentley-Ottmann in C++ using the STL

I want to implement the Bentley-Ottmann line segment crossing algorithm based on this description, using STL elements.
Bentley-Ottmann Wikipedia
What I am struggling with is the implementation of the priority queue. The description asks me to erase potential future intersection events from it:
If p is the left endpoint of a line segment s, insert s into T. Find the segments r and t that are immediately below and above s in T (if they exist) and if their crossing forms a potential future event in the event queue, remove it. If s crosses r or t, add those crossing points as potential future events in the event queue.
It doesn't seem to be possible to use an STL priority queue as the event queue, since its search complexity is linear and I would need to find and remove the crossing event of r and t. Should I use a set instead? Or is it possible with a priority queue?
There are priority queue structures that you can quickly delete from, but they will require a lot of additional memory.
It is actually more efficient just to leave the r-t intersection in the queue. Then, when it comes time to process the event, just ignore it if it's invalid (because r and t are not adjacent) or if it's already been done.
In order to detect when r-t has already been done, just make sure that your priority queue is ordered by a total ordering, i.e., don't just compare the x value of the events. When multiple events have the same x value, use the identifiers of the segments involved to break ties. Then, when r-t appears multiple times in the queue, all of the occurrences will be together and you can just pop them all off in sequence.
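As a rough sketch of this lazy-deletion idea (the Event fields and the processed set are illustrative placeholders, not a full Bentley-Ottmann implementation): give the events a total order that ends with the segment ids, allow duplicates in the queue, and discard stale or repeated crossings when they are popped.

    #include <functional>
    #include <queue>
    #include <set>
    #include <tuple>
    #include <vector>

    struct Event {
        double x, y;
        int s1, s2;            // ids of the two segments whose crossing this is
        bool operator>(const Event& o) const {
            return std::tie(x, y, s1, s2) > std::tie(o.x, o.y, o.s1, o.s2);  // total order
        }
    };

    int main() {
        std::priority_queue<Event, std::vector<Event>, std::greater<Event>> events;
        std::set<std::pair<int,int>> processed;          // crossings already handled

        events.push({2.0, 1.0, 3, 4});
        events.push({2.0, 1.0, 3, 4});                   // same crossing queued twice
        events.push({1.0, 0.5, 1, 2});

        while (!events.empty()) {
            Event e = events.top();
            events.pop();
            if (!processed.insert({e.s1, e.s2}).second)  // duplicate: already done, skip it
                continue;
            // ... also skip the event if s1 and s2 are no longer adjacent in the
            //     sweep status T, since the crossing is then stale ...
        }
    }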

construct boost priority queue based on iterators

I have a list of binomial_heaps, and on each iteration of the algorithm I have to update the priority of an element in some of the binomial_heaps. For this I use the update function of the boost binomial_heap. However, one of the binomial_heaps has to be torn down and rebuilt completely (as all of its priorities change). Instead of using push every time (which, if I understand correctly, would have a complexity of n*log(n)), I would like to construct it from iterators of an underlying container (a kind of heapify or make_heap operation, which would be linear time). This seems possible with the standard priority_queue, but not with the boost implementation. On the other hand, the standard one does not provide an update function. Is there a way around this where I can have both, or another library that supports both? Or is my reasoning, that pushing all elements onto an empty priority queue is slower, incorrect?
Some might say there is something seriously wrong with the fact that I need to rebuild an entire priority queue, which would make using a priority queue completely superfluous. The algorithm I want to implement is "Finding community structure in very large networks" by Aaron Clauset, in which the authors do exactly that (unless I misinterpreted it).
(Sorry, I couldn't post the link to the paper as I don't have enough reputation to post more than 2 links.)
The "fast modularity" algorithm by Clauset et al. (paper here, code here) uses a pair of linked data structures. On the one hand, you have a sparse matrix data structure (which is really just an adjacency list in which instead of storing the elements hanging off a particular array element as a linked list, we store them using a balanced binary tree data structure), and a max-heap. All the values in the sparse matrix (which are really the dQ_ij values for the potential merges in the algorithm) are also stored in the max-heap.
So, the max-heap is just an efficient way of finding the edge in the sparse matrix with the most positive value. Once you have the ij pair for that edge, you want to "insert" the elements of column (row) i into the elements of column (row) j, and then you want to delete column (row) i. So, you're not going to rebuild the entire max-heap after each pop from the max-heap. Instead, you want to delete some elements from it (the ones in the row/column that you delete from the sparse matrix) and update the values of others (the ones in the updated row/column for j).
This is where the linked data structure is helpful -- in the original implementation, each element in the sparse matrix stores a pointer to its corresponding entry in the max-heap so that if you update the value in the sparse matrix, you can then find the corresponding element in the max-heap and update its value. Once you do this, you need to re-heapify the updated heap element, by letting it move (recursively) up or down in the heap. Similarly, if you delete an element in the sparse matrix, you can find its entry in the heap and call a delete function on it.
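A hedged sketch of that linkage using Boost.Heap's mutable binomial_heap, whose push returns a handle that can later be passed to update or erase; the key encoding and the handles map here are illustrative stand-ins for the pointers stored in the sparse-matrix cells.

    #include <boost/heap/binomial_heap.hpp>
    #include <cstdint>
    #include <iostream>
    #include <unordered_map>

    using heap_t   = boost::heap::binomial_heap<double>;   // max-heap of dQ values
    using handle_t = heap_t::handle_type;

    int main() {
        heap_t heap;
        std::unordered_map<std::uint64_t, handle_t> handles;   // (i,j) pair -> heap handle

        auto key = [](unsigned i, unsigned j) { return (std::uint64_t(i) << 32) | j; };

        handles[key(0, 1)] = heap.push(0.10);
        handles[key(0, 2)] = heap.push(0.25);
        handles[key(1, 2)] = heap.push(0.05);

        heap.update(handles[key(0, 1)], 0.40);   // dQ_{01} changed: re-heapify just that node
        heap.erase(handles[key(1, 2)]);          // its row/column was merged away: drop the entry

        std::cout << "best dQ = " << heap.top() << "\n";   // 0.40
    }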

Algorithm for merging short lists into a long vector

I have a sparse matrix class whose non-zeros and corresponding column indices are stored, in row-order, in what are basically STL-vector-like containers. They may have unused capacity, like vectors; and to insert/remove elements, existing elements must be moved.
Say I have an operation, insert_erase_replace, or ier for short. ier can do the following, given a position p, a column index j, and a value v:
if v==0, ier removes the entry at p and left-shifts all subsequent entries.
if v!=0, and j is already present at p, ier replaces the cell contents at p with v.
if v!=0, and j is not present at p, ier inserts the entry v and column index j at p after right-shifting all subsequent entries.
So all of that is trivial.
Now let's say I have ier2, which does the same thing, except that it takes a list containing multiple column indices j and corresponding values v. It also has a size n, which indicates how many index/value pairs are present in the list. But because the vector only stores non-zeros, sometimes the actual insertion size is smaller than n.
Still trivial.
But now let's say I have ier3, which takes not just one list like ier2, but multiple lists. This represents editing a slice of the sparse matrix.
At some point, it becomes more efficient to iterate through the vectors, copying them piece by piece and inserting/replacing/erasing the list indices/values ier2-style as we arrive at each insertion point. And if the total insertion size would cause my vector to need a resize anyway, then we do that.
Given that my vector is much, much larger than the total length of the lists, is there an algorithm for efficiently merging the lists into the vector?
So far, here's what I have:
Each list passed to ier3 represents either a net deletion of entries (a left shift), a net replacement (no movement, therefore cheap), or a net insertion of entries (a right shift). There may also be some re-arrangement of elements in there, but the expensive parts are the net deletions and net insertions.
It's not hard to figure out an algorithm for efficiently doing ONLY net insertions or net deletions.
It's harder when either of the two may be happening.
The only thing I can think to do is to handle it in two passes:
Erase/replace
Insert/replace
We erase first because it makes it more likely that any insertions will require fewer copies.
Is this the right approach? Does anyone know of a better one?
Okay, so I'm going to suppose the intervals covered in each list in ier3 are disjoint and given to you in order. If it's meant for editing slices of a matrix, this seems reasonable. I'm also assuming that you don't need to resize the vector, because that case is easily detectable and solvable.
Initialise a read pointer and a write pointer to the start of the vector you're editing. There'll be an instruction pointer into ier3 too, but I'll ignore that here for clarity's sake. You'll also need a queue. At each step, one of several things can happen:
Default: Neither read nor write is at a position detailed by ier3. In this case, add the element under read to the back of the queue and write the element at the front of the queue to the cell under write. Move both pointers forward one.
read is over a cell that needs to be deleted. In this case, simply move read forward one without adding anything to the queue.
read passes from one cell to the next such that an insertion should happen between them. In this case, add the insertion to the back of the queue and then continue with the next relevant case.
read is at a cell that needs to be modified. In this case, insert the modified cell at the back of the queue, write whatever's at the front of the queue to write, and step them both forwards.
read has arrived at the unused capacity of the vector. In which case just write whatever's left in the queue.
That's the basic outline, but a couple of optimizations can be made: first, if the queue is empty, step both pointers forward to the next position detailed by ier3 without doing anything. Second, minimize the buffer by doing extra writing steps whenever read is ahead of write and the queue is nonempty.
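Below is a rough, simplified sketch of this loop under extra assumptions not in the question: the multiple lists of ier3 are flattened into one list of (column, value) edits sorted by column, a value of 0 means erase, Cell is a hypothetical stand-in for the entry type, and the vector is grown to a coarse upper bound up front and trimmed afterwards rather than relying on spare capacity.

    #include <cstddef>
    #include <deque>
    #include <iostream>
    #include <vector>

    struct Cell { unsigned col; double val; };

    void apply_edits(std::vector<Cell>& vec, const std::vector<Cell>& edits) {
        const std::size_t n_old = vec.size();
        vec.resize(n_old + edits.size());          // coarse upper bound on the final size
        std::deque<Cell> buf;                      // cells read/inserted but not yet written
        std::size_t r = 0, w = 0, e = 0;

        while (r < n_old || e < edits.size() || !buf.empty()) {
            // Queue every edit that applies at or before the cell under `r`.
            while (e < edits.size() && (r == n_old || edits[e].col <= vec[r].col)) {
                if (r < n_old && edits[e].col == vec[r].col) {
                    if (edits[e].val != 0) buf.push_back(edits[e]);  // replacement
                    ++r;                                             // original cell consumed
                } else if (edits[e].val != 0) {
                    buf.push_back(edits[e]);                         // pure insertion
                }                                                    // (erase of a missing cell: no-op)
                ++e;
            }
            if (r < n_old) { buf.push_back(vec[r]); ++r; }           // untouched cell passes through
            if (!buf.empty()) { vec[w++] = buf.front(); buf.pop_front(); }
        }
        vec.resize(w);                             // trim: `w` is the true final size
    }

    int main() {
        std::vector<Cell> row = {{1, 1.0}, {4, 4.0}, {7, 7.0}};
        apply_edits(row, {{2, 2.0}, {4, 0.0}, {7, 7.5}});  // insert col 2, erase col 4, replace col 7
        for (const Cell& c : row) std::cout << c.col << ":" << c.val << " ";  // 1:1 2:2 7:7.5
    }

A real implementation would also add the skip-ahead optimisation described above so that long untouched runs are not copied cell by cell.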
I'd go with your plan with a few important points highlighted.
The erase/replace step should start from the left and only move points within the affected range - it can leave a "gap". It should determine the size of the final vector. At the end of this step, use the determined size to shift the "tail" of the vector as needed, leaving the exact amount of space required for insertions free.
The insertions should start from the right and fill up the gap we left in step 1 by copying each point to its final position.
This shifts the tail of the main vector at most once and never copies any point (from the existing slice or the insertion set) more than twice, so it's essentially linear.
Other data structures might be helpful too - reserving space at both the front and end, or building it out of multiple sections so a resize doesn't force a full copy.
One further optimisation would be to allow some insertions during step 1. If you've erased some entries, completing any insertions you come across immediately, until the counts balance, means you won't need to move any points until you reach another erase.
Let n be the size of the list and m be the size of the vector. It sounds like ier does a binary search for j every time, so the searching part is O(n*log(m)).
Assuming the elements in the list are sorted, once you find the first element, it's faster to just navigate up the vector to find the next one. That way searching becomes O(log(m) + n) = O(n).
Also, do a dry pass first to count net deletions/insertions, and a second pass to actually apply the changes. I think these two passes will run faster than the two you describe.
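A small sketch of that dry pass, with the same hypothetical Cell / zero-means-erase convention used earlier: one binary search for the first edited column, then a linear walk that counts net insertions and deletions so the final size is known before anything is moved.

    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct Cell { unsigned col; double val; };

    std::ptrdiff_t net_size_change(const std::vector<Cell>& vec,
                                   const std::vector<Cell>& edits) {
        std::ptrdiff_t delta = 0;
        auto it = vec.begin();
        if (!edits.empty())
            it = std::lower_bound(vec.begin(), vec.end(), edits.front().col,
                                  [](const Cell& c, unsigned col) { return c.col < col; });
        for (const Cell& e : edits) {
            while (it != vec.end() && it->col < e.col) ++it;   // linear walk: O(log m + n) overall
            const bool present = (it != vec.end() && it->col == e.col);
            if (e.val == 0) { if (present) --delta; }          // erasing an existing cell
            else if (!present) ++delta;                        // inserting a brand-new cell
            // replacement of an existing cell: size unchanged
        }
        return delta;
    }

    int main() {
        std::vector<Cell> row = {{1, 1.0}, {4, 4.0}, {7, 7.0}};
        std::cout << net_size_change(row, {{2, 2.0}, {4, 0.0}, {7, 7.5}}) << "\n";  // 0
    }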
I can suggest a different design for a sparse matrix that should help you achieve performance and a low memory footprint for large sparse matrices.
Instead of a vector, why not use a 2D hash table? Something like (no std:: for brevity):
typedef unordered_map< unsigned /* column index */, int /* value */ > col_type;
typedef unordered_map< unsigned /* row index */, col_type* > row_type;
The outer class (sparse_matrix) searches the row hash map in O(1) for the row index. If it is not found, it allocates a new column map.
Then the column map is searched for the column index in O(1), and the entry is deleted/replaced or inserted based on the original logic. If the column map becomes empty, it can be deleted from the outer 'row' hash map.
All basic operations (add/delete/replace) are O(1) on average.
If you need fast ordered iteration over the matrix, you can replace the unordered_map with 'map'. If the matrix is very sparse, the O(log n) cost per operation will be close to the hash map's in practice.
BTW, I used a pointer to col_type on purpose; the outer hash map grows much (much!) faster that way.
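A compact sketch of this design (class and method names are made up for illustration); for simplicity it stores each column map by value, whereas the answer above deliberately stores a pointer so that rehashing the outer table stays cheap.

    #include <iostream>
    #include <unordered_map>

    class sparse_matrix {
        typedef std::unordered_map<unsigned, int> col_type;   // column index -> value
        std::unordered_map<unsigned, col_type> rows_;         // row index -> its columns

    public:
        void set(unsigned i, unsigned j, int v) {
            if (v == 0) {                                      // erase, dropping empty rows
                auto it = rows_.find(i);
                if (it != rows_.end()) {
                    it->second.erase(j);
                    if (it->second.empty()) rows_.erase(it);
                }
            } else {
                rows_[i][j] = v;                               // insert or replace
            }
        }

        int get(unsigned i, unsigned j) const {
            auto it = rows_.find(i);
            if (it == rows_.end()) return 0;
            auto jt = it->second.find(j);
            return jt == it->second.end() ? 0 : jt->second;
        }
    };

    int main() {
        sparse_matrix m;
        m.set(3, 5, 42);
        m.set(3, 5, 7);      // replace
        m.set(3, 5, 0);      // erase; row 3 becomes empty and is removed
        std::cout << m.get(3, 5) << "\n";   // 0
    }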

How to repeatedly insert elements into a sorted list fast

I do not have formal CS training, so bear with me.
I need to do a simulation, which can be abstracted away to the following (omitting the details):
We have a list of real numbers representing the times of events. In each step, we
remove the first event, and
as a result of "processing" it, a few other events may get inserted into the list at a strictly later time,
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n where n is some simulation parameter. While the event times are increasing, the time-difference of the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and lower bound (i.e. all the events will never be strongly clustered around a single point in time)
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant-time insertion, then I have trouble finding the position at which to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets and keep track of how many events there are in each bucket. Then I process the buckets one by one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but only an approximation.
A min-heap might suit your needs. There's an explanation here, and the STL provides std::priority_queue for you.
Insertion time is O(log N), removal is O(log N).
It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
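A minimal sketch of that event loop, with std::greater so the earliest time is at the top; the times and the spawning rule are placeholders for the simulation's real logic.

    #include <functional>
    #include <iostream>
    #include <queue>
    #include <vector>

    int main() {
        // min-priority queue: top() is the earliest event time
        std::priority_queue<double, std::vector<double>, std::greater<double>> events;
        events.push(1.0);
        events.push(2.5);

        while (!events.empty()) {
            double t = events.top();
            events.pop();
            std::cout << "processing event at t = " << t << "\n";
            if (t < 4.0)                // "processing" spawns a follow-up event strictly later
                events.push(t + 1.5);
        }
    }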
I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue
A binary search tree keeps its elements sorted at all times and has faster access times than a linear list. Search, insert and delete times are O(log(n)).
But it depends on whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster. At the end of the process you would then copy the items to an array or a list and sort it.
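If the events do have to stay sorted the whole time, the balanced-tree option can be as simple as a std::multiset, as in this small sketch (the follow-up rule is again a placeholder):

    #include <iostream>
    #include <set>

    int main() {
        std::multiset<double> events = {1.0, 3.0, 2.0};   // kept sorted; O(log n) insert
        while (!events.empty()) {
            double t = *events.begin();
            events.erase(events.begin());                 // remove the earliest event
            std::cout << "t = " << t << "\n";
            if (t < 3.0) events.insert(t + 2.5);          // schedule a strictly later follow-up
        }
    }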