Suppose I have a min priority queue with the smallest value on top. I want to reduce the value at the top in O(1) time while keeping the queue property intact. How can I do that?
I have seen the question "how to change the value of std::priority_queue top()?"
The approach there is to take the value out, modify it, and push it back, which is O(log N). Since I only ever reduce the value, can I use that fact to avoid pushing the value back again?
The standard priority queue doesn't support changing the keys.
What you are looking for is similar to another data structure called an indexed priority queue, often used in Dijkstra's algorithm.
The indexed priority queue supports two more methods in its API, increaseKey and decreaseKey, which allow modifying the keys themselves.
The STL doesn't define an indexed priority queue. You'd probably need to implement one yourself or look for a third-party implementation.
I read this question differently from the other answers. The question says:
I have a min priority queue with the smallest value on the top, I hope to reduce the value on the top so that the property of the queue would still be maintained and time complexity would be O(1).
With std::priority_queue, you can't. We may try a hack like const_cast<int &>(pq.top()) = lesser_value_than_top. It could work in some implementations, but it is NOT safe.
So I would suggest building your own PQ on an array-based heap. Then you can change the top's value without the O(log N) work, as long as the new value is less than or equal to the current top.
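A minimal sketch of that idea, assuming int keys and a std::vector as the backing array. The class name MinHeap and the method decrease_top are made up for illustration; they are not part of any standard API. Lowering the root cannot break the heap property, because the root is already no larger than either of its children.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical array-based min-heap; all names here are illustrative.
class MinHeap {
public:
    bool empty() const { return data_.empty(); }

    int top() const {
        assert(!data_.empty());
        return data_.front();
    }

    // O(log N): append, then sift the new value up towards the root.
    void push(int value) {
        data_.push_back(value);
        std::size_t i = data_.size() - 1;
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (data_[parent] <= data_[i]) break;
            std::swap(data_[parent], data_[i]);
            i = parent;
        }
    }

    // O(log N): move the last value to the root, then sift it down.
    void pop() {
        assert(!data_.empty());
        data_.front() = data_.back();
        data_.pop_back();
        const std::size_t n = data_.size();
        std::size_t i = 0;
        while (true) {
            std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && data_[l] < data_[smallest]) smallest = l;
            if (r < n && data_[r] < data_[smallest]) smallest = r;
            if (smallest == i) break;
            std::swap(data_[i], data_[smallest]);
            i = smallest;
        }
    }

    // O(1): only valid when new_value <= current top, so no re-heapify is needed.
    void decrease_top(int new_value) {
        assert(!data_.empty() && new_value <= data_.front());
        data_.front() = new_value;
    }

private:
    std::vector<int> data_;
};
```

Raising the top value would still need a sift-down, so this shortcut only covers the "reduce the minimum" case from the question.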
I have a worker class, and I can submit jobs to the worker. The worker keeps these jobs and runs them sequentially in order of priority (a priority can be basically any unsigned int). For this case a std::priority_queue or even a std::set/map could be used to store jobs ordered by priority, and the worker would then be able to extract them in order in O(1). Adding jobs would be O(log N).
Now, the requirement I have is to be able to change the priority of any submitted job. With std::set/map I'd need to remove the job and add it back with a different priority. This would be O(log N), and on top of that a set/map would reallocate nodes internally afaik (this might possibly be avoided with C++17 though). What makes it unusual is that in my case I'll update job priorities far more often than I schedule or execute jobs. Basically I might schedule a job once, and before it's executed I may end up updating its priority thousands of times. In fact, the priority of each job will change something like 10-20 times a second.
In my case it's reasonably safe to assume that I won't have more than 10K jobs in the queue. At start of my process I expect it always to grow to 10K or so jobs and as these jobs are removed queue should eventually be almost empty all the time, and occasionally there would be 10-50 new jobs added, but it shouldn't grow more than 1000 jobs. Jobs would be removed at a rate of a few jobs a second. Because of my weird requirement of that frequent priority update std::priority_queue or a set don't seem like a good fit. Plain std::list seems to be a better choice: priority change or update/removal is O(1), and when I need to remove jobs it's O(N) to walk entire list to find highest priority item which should happen less frequently than modifying priorities.
One other observation is that even though job priorities change often, these changes do not necessarily change the ordering. For example, I could possibly just update the key element of my set in place (by casting away constness or making the key mutable?) if the change would still keep the modified element between its left and right neighbours. What would you suggest for such a priority queue? Any boost container or custom data structure design is OK.
In case of set/map I use priority as a key. To make keys unique in my case each key is actually two integers: job sequence number (derived from atomic int that I increment for each new request) and actual priority number. This way if I add multiple jobs with the same priority, they will be executed in order they were scheduled, as sequence numbers would keep them ordered.
A simple priority heap should fit your requirements. Insertion, removal and priority change are all O(log n). But you said a priority change usually would not change the order. So with a priority heap, when you change a priority you check the changed item against its parent and its 2 children, and if none of the heap conditions are violated, no up-heap or down-heap action is required. The full O(log n) time will only rarely be needed. In practice it will be more like O(1).
Now for efficient operation it is crucial that given an item I you can find the position of that item in the heap in O(1) and access the parent and children.
If the heap simply contains the items in an array then that is all just pointer arithmetic. The drawback is that reordering the heap means copying the items.
If you store pointers to items in the heap then you also have to store a back reference to the position in the heap in the items themselves. When you reorder the heap you then only swap the pointers and update the back references.
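Here is a rough sketch of such a heap, under a few assumptions that are not in the question: jobs are identified by small integer ids, a lower number means the job runs first, and ties are not broken by a sequence number. The names IndexedMinHeap, change_priority and kAbsent are hypothetical. The vector pos_ plays the role of the back reference from each job to its slot in the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

const std::size_t kAbsent = static_cast<std::size_t>(-1);  // "not in the heap" marker

// Hypothetical indexed min-heap; heap_ holds job ids, pos_[id] is the back reference.
class IndexedMinHeap {
public:
    explicit IndexedMinHeap(std::size_t max_jobs)
        : prio_(max_jobs, 0), pos_(max_jobs, kAbsent) {}

    bool empty() const { return heap_.empty(); }
    bool contains(std::size_t id) const { return pos_[id] != kAbsent; }
    std::size_t top() const { assert(!heap_.empty()); return heap_.front(); }

    void push(std::size_t id, unsigned priority) {
        assert(id < pos_.size() && !contains(id));
        prio_[id] = priority;
        pos_[id] = heap_.size();
        heap_.push_back(id);
        sift_up(pos_[id]);
    }

    void pop() {
        assert(!heap_.empty());
        swap_slots(0, heap_.size() - 1);
        pos_[heap_.back()] = kAbsent;
        heap_.pop_back();
        if (!heap_.empty()) sift_down(0);
    }

    // Both sift loops stop after one comparison round when the order is unchanged,
    // which is the cheap common case described above.
    void change_priority(std::size_t id, unsigned priority) {
        assert(contains(id));
        prio_[id] = priority;
        sift_up(pos_[id]);
        sift_down(pos_[id]);
    }

private:
    void swap_slots(std::size_t a, std::size_t b) {
        std::swap(heap_[a], heap_[b]);
        pos_[heap_[a]] = a;  // keep the back references in sync
        pos_[heap_[b]] = b;
    }

    void sift_up(std::size_t i) {
        while (i > 0) {
            std::size_t parent = (i - 1) / 2;
            if (prio_[heap_[parent]] <= prio_[heap_[i]]) break;
            swap_slots(parent, i);
            i = parent;
        }
    }

    void sift_down(std::size_t i) {
        const std::size_t n = heap_.size();
        while (true) {
            std::size_t best = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && prio_[heap_[l]] < prio_[heap_[best]]) best = l;
            if (r < n && prio_[heap_[r]] < prio_[heap_[best]]) best = r;
            if (best == i) break;
            swap_slots(i, best);
            i = best;
        }
    }

    std::vector<unsigned> prio_;     // priority of each job id
    std::vector<std::size_t> pos_;   // slot of each job id in heap_, or kAbsent
    std::vector<std::size_t> heap_;  // job ids in heap order
};
```

Only ids move inside the heap, so the jobs themselves are never copied when the heap is reordered.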
Basically you are looking for an IndexPriorityQueue. You can implement your own variant of the index priority queue based on your requirements.
An index priority queue allows you to decrease or increase a key, i.e. you can raise and lower the priority of your jobs.
The following is a Java implementation, IndexMinQueue; I hope it helps.
So, I'm coding something in C++, and I'm trying to implement a priority queue with pairing heaps. I want the priority to increase automatically over time, so that if an element (a class) has been in the heap for, say, 5 minutes, its priority (a variable) is increased. And I have no clue how to make that happen.
I could implement a function that checks the waiting time of each element at a set interval, but the problem is that it's pretty tough to check each and every element within a heap. So I think I need to do something within the elements, but I'm not sure what or how.
Is there any simple solution to that? I feel like I must be missing something, but if that's not the case, then I'd better drop this idea, because I have to finish this thing pretty soon.
UP: This program is meant for the human queue, so the reason for this idea is to not make people wait for too long. The priority is arbitrary, there are priority "levels" set for each element when it's added, so making the time the priority is not a solution for me.
You can add the elements to a linked list:
A new element is added to the end of the list
When the first element has been in the heap for 5 minutes, its priority is increased and it is moved to the end of the list.
This way you can check only the first element. Another advantage is that you can set the timer to the value the first element is to be checked in. That is, no need to do unnecessary periodical checks.
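The list part of this can be sketched as follows. This is only an illustration under assumptions of my own: time is a plain count of seconds passed in by the caller, a larger priority number means more urgent, and the names AgingList, Waiting, age and next_check are invented. The matching increase-key call on the pairing heap is not shown; it would go where the priority is bumped.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical record for one person in the queue.
struct Waiting {
    int id;
    int priority;      // larger means more urgent in this sketch
    long last_change;  // time (seconds) of insertion or of the last bump
};

class AgingList {
public:
    explicit AgingList(long max_wait) : max_wait_(max_wait) {
        assert(max_wait > 0);
    }

    // New elements go to the end, so the list stays ordered by last_change.
    void add(int id, int priority, long now) {
        items_.push_back(Waiting{id, priority, now});
    }

    // Only the front can be overdue. Bump it, restart its clock, and move it
    // to the end; repeat while the new front is overdue too.
    std::size_t age(long now) {
        std::size_t bumped = 0;
        while (!items_.empty() && now - items_.front().last_change >= max_wait_) {
            Waiting& w = items_.front();
            ++w.priority;  // the heap's increase-key call would also go here
            w.last_change = now;
            items_.splice(items_.end(), items_, items_.begin());
            ++bumped;
        }
        return bumped;
    }

    // Time at which the front element becomes overdue; a timer can be set to this.
    long next_check() const {
        assert(!items_.empty());
        return items_.front().last_change + max_wait_;
    }

    const std::list<Waiting>& items() const { return items_; }

private:
    long max_wait_;
    std::list<Waiting> items_;
};
```

When an element leaves the heap it also has to be erased from this list; storing a list iterator in each heap element would make that O(1).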
I'm implementing A* for the first time, and I was using a priority_queue for the open set until I realized that you need to check whether nodes are in the open set too, not just the closed one.
The thing is, you can't iterate over a priority queue. So why does everyone recommend a priority queue for the open set? Is it still the best option? I think the only way to iterate over it is to make a copy so I can pop everything from it (an enormous cost).
What is the best data structure to use for A*?
A priority queue (PQ) is an abstract data structure (ADS). There are many possibilities to implement them. Unfortunately, the priority_queue supplied with the C++ standard library is rather limited, and other implementations are suited a lot better for implementing A*. Spoilers: you can use std::set/multiset instead of std::priority_queue. But read on:
What you need from the priority queue to implement A* is:
1. Get the node with the lowest cost
2. Decrease the costs of arbitrary elements
Any priority queue can do 1., but for 2., you need a "mutable" priority queue. The Standard-Lib one cannot do this. Also, you need an easy way to find entries in the priority queue, to find out where to decrease the keys (For when A* finds a better path to an already opened node). There are two basic ways for this: You store a handle to the priority queue element within your graph node (or use a map to store those handles for each graph node) - or you insert the graph nodes themselves.
For the first case, where you store handles for each node, you can use std::multiset for your priority queue. *std::multiset::begin() will always be your "lowest cost" key, and you can decrease a key by removing it from the set, changing the value, re-inserting it, and updating the handle. Alternatively, you can use the mutable priority queues from Boost.Heap, which directly support "decrease-key".
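A small sketch of the std::multiset variant, assuming graph nodes are numbered 0..N-1 and costs are doubles. The class OpenSet and its methods are hypothetical names for this example; the handles are simply multiset iterators kept in a vector indexed by node id.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical A* open set built on std::multiset, with one handle per node.
class OpenSet {
public:
    using Entry = std::pair<double, std::size_t>;  // (cost, node id)

    explicit OpenSet(std::size_t node_count)
        : handle_(node_count), open_(node_count, false) {}

    bool empty() const { return queue_.empty(); }
    bool contains(std::size_t node) const { return open_[node]; }

    // Inserts the node, or lowers its cost if it is already open at a higher cost.
    void push_or_decrease(std::size_t node, double cost) {
        if (open_[node]) {
            if (handle_[node]->first <= cost) return;  // existing path is no worse
            queue_.erase(handle_[node]);               // remove the old entry
        }
        handle_[node] = queue_.insert(Entry(cost, node));  // re-insert, update handle
        open_[node] = true;
    }

    // Removes and returns the entry with the lowest cost.
    Entry pop_min() {
        assert(!queue_.empty());
        Entry best = *queue_.begin();
        queue_.erase(queue_.begin());
        open_[best.second] = false;
        return best;
    }

private:
    std::multiset<Entry> queue_;
    std::vector<std::multiset<Entry>::iterator> handle_;  // valid only while open_[node]
    std::vector<bool> open_;
};
```

The membership test that std::priority_queue lacks is just contains(), and each decrease costs O(log N) for the erase and re-insert.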
For the second case, you would need some kind of "intrusive" binary tree - since your pathfinding nodes themselves need to be in the priority queue. If you don't want to roll your own, see the ordered associative containers in Boost.Intrusive.
The subject is very large. I suggest reading this page if you want to know the different possibilities and get a good understanding of which data structure suits your situation:
http://theory.stanford.edu/~amitp/GameProgramming/ImplementationNotes.html#set-representation
In my case, the binary heap was a good balance between implementation difficulty and performance, which was exactly what I was looking for. But maybe you are looking for something different?
The rest of the document is a very good reference for A* for game development
http://theory.stanford.edu/~amitp/GameProgramming/index.html
They mean a priority queue in general, not necessarily the std::priority_queue class that comes with the language. If the built-in one doesn't do what you need, write your own or find another.
I do not have formal CS training, so bear with me.
I need to do a simulation, which can be abstracted away to the following (omitting the details):
We have a list of real numbers representing the times of events. In each step, we
1. remove the first event, and
2. as a result of "processing" it, a few other events may get inserted into the list at a strictly later time
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n where n is some simulation parameter. While the event times are increasing, the time-difference of the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and lower bound (i.e. all the events will never be strongly clustered around a single point in time)
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant time insertion, then I have trouble finding the position where to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets and keep track of how many events there are in each bucket. I then process the buckets one by one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but only an approximation.
A min-heap might suit your needs. There's an explanation here and I think STL provides the priority_queue for you.
Insertion time is O(log N), removal is O(log N)
It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
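As a rough illustration, here is std::priority_queue turned into a min-queue with std::greater and used as the event list. The processing rule is invented for the example (every event before a hypothetical horizon spawns one follow-up event delay later); a real simulation would put its own logic there.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// std::greater makes top() return the smallest (earliest) event time.
using EventQueue =
    std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Processes events in time order and returns the order in which they ran.
// `delay` and `horizon` belong to the made-up processing rule, not to the question.
std::vector<double> run_simulation(std::vector<double> initial,
                                   double delay, double horizon) {
    assert(delay > 0);  // new events must be strictly later
    EventQueue events(std::greater<double>(), std::move(initial));
    std::vector<double> processed;
    while (!events.empty()) {
        double t = events.top();  // earliest event, O(1)
        events.pop();             // O(log N)
        processed.push_back(t);
        if (t + delay < horizon) {
            events.push(t + delay);  // O(log N) insertion of a later event
        }
    }
    return processed;
}
```

For example, run_simulation({3.0, 1.0}, 2.5, 5.0) processes the events at times 1.0, 3.0 and 3.5, in that order.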
I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue
A balanced binary search tree keeps its items sorted and has faster access times than a linear list. Search, insert and delete times are O(log(n)).
But it depends whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster. At the end of the process you then would copy the items to an array or a list and sort it.
Why are most priority/heap queues implemented as 0 being the highest priority? I'm assuming I'm missing out some key mathematical principle. As I was implementing my own priority queue recently it seemed easier to write the insert function if priority went up with the integer value, but apparently people smarter than me think it should go the other way.
Any ideas?
Many priority queues are implemented as a min-heap (a binary heap, a Fibonacci heap, or something similar). That data structure lets you find the minimum in constant time, which makes it natural to make 0 the highest priority, and to take elements out of the queue by extracting the minimum.
If it's ever increasing, how could you ever set anything to the highest priority? (+1 for rossfab's answer :)
The reason it's often the other way is that priority queues are used to implement algorithms like Dijkstra's and A*, where the priority is the distance to the node, and you want to process closer nodes first.
I don't think there's any design reason for it. It's probably just because most programmers are used to thinking of 0 as the first element. Another reason might be because enumerators start at 0 so the first defined enum "Highest" integer value will be 0.
As a counterexample, in what's surely one of the most readily available (if not used) priority queue implementations, namely STL's std::priority_queue, the top() element is the one numerically highest according to operator<. Of course everyone is used to the convention of lowest in sort order being front of queue so this catches a lot of people out the first time they use it.
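A short sketch of that behaviour. The two helper functions are hypothetical names written for this example; both assume a non-empty input.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Default comparator is std::less, so top() is the numerically largest element.
int top_with_default_order(const std::vector<int>& values) {
    assert(!values.empty());
    std::priority_queue<int> pq(values.begin(), values.end());
    return pq.top();
}

// With std::greater the queue behaves as a min-queue: top() is the smallest.
int top_with_greater(const std::vector<int>& values) {
    assert(!values.empty());
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq(
        values.begin(), values.end());
    return pq.top();
}
```

For the values 2, 0 and 7, the first function returns 7 and the second returns 0.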
There is nothing inherent about a priority queue that makes 0 a better choice for top priority. For someone writing a reusable implementation however, you will have to pick something, and 0 is well defined no matter what type of integral or floating point value you use for your priority.
Even in writing a private implementation, if you decide you only need 256 priority levels and use an unsigned char as your priority and have 255 be your top priority, then you will need to carefully look over all your code if you decide that you need more levels.