Data structure for a priority queue of jobs with priority that changes often


I have a worker class to which I can submit jobs. The worker keeps these jobs and runs them sequentially in order of priority (a priority can basically be any unsigned int). For this case std::priority_queue or even a std::set/map could be used to store jobs ordered by priority; the worker could then look up the next job in O(1) (popping it from a std::priority_queue costs O(log N)). Adding a job would be O(log N).
Now, the requirement I have is to be able to change the priority of any submitted job. With std::set/map I'd need to remove the job and add it back with a different priority. This would be O(log N), and on top of that set/map would reallocate nodes internally, afaik (this might be avoidable with C++17, though). What makes it unusual is that in my case I'll update job priorities far more often than I schedule or execute jobs. Basically I might schedule a job once, and before it's executed I may end up updating its priority thousands of times. In fact, the priority of each job will change about 10-20 times a second.
In my case it's reasonably safe to assume that I won't have more than 10K jobs in the queue. At the start of my process I expect the queue to grow to 10K or so jobs; as these jobs are removed, the queue should eventually be almost empty most of the time. Occasionally 10-50 new jobs would be added, but it shouldn't grow beyond 1000 jobs. Jobs would be removed at a rate of a few per second. Because of my unusual requirement of very frequent priority updates, std::priority_queue or a set doesn't seem like a good fit. A plain std::list seems to be a better choice: a priority change is O(1) (as is removal, given an iterator to the job), and finding the highest-priority item to run is an O(N) walk over the entire list, which should happen less often than priority changes.
One other observation: even though job priorities change often, these changes do not necessarily change the ordering. For example, I could possibly just update the key of an element in my set (by casting away constness or making the key mutable?) if the modified element would still sort between its left and right neighbours. What would you suggest for such a priority queue? Any Boost container or custom data structure design is OK.
In the set/map case I use the priority as the key. To make keys unique, each key is actually two integers: the job sequence number (derived from an atomic int that I increment for each new request) and the actual priority number. This way, if I add multiple jobs with the same priority, they will be executed in the order they were scheduled, as the sequence numbers keep them ordered.
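To make the set-based variant concrete, here is a minimal sketch of the composite key together with the C++17 node-handle route hinted at above. The names Job, JobOrder and change_priority are made up for illustration, and the sketch assumes that a larger number means a higher priority, which the question does not state.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical job record; the ordering key is (priority, seq).
struct Job {
    unsigned priority;
    std::uint64_t seq;  // submission order, breaks ties between equal priorities
    int payload;        // stand-in for the real job data
};

// Assumption: a larger number means a higher priority.
// Equal priorities run in submission order.
struct JobOrder {
    bool operator()(const Job& a, const Job& b) const {
        if (a.priority != b.priority) return a.priority > b.priority;
        return a.seq < b.seq;
    }
};

using JobSet = std::set<Job, JobOrder>;

// Re-keys a job without freeing and reallocating its node (C++17 node handles).
// Still O(log N) for the re-insertion.
JobSet::iterator change_priority(JobSet& jobs, JobSet::iterator it,
                                 unsigned new_priority) {
    auto node = jobs.extract(it);          // unlinks the node, keeps its memory
    node.value().priority = new_priority;  // safe: the node is outside the set
    return jobs.insert(std::move(node)).position;
}
```

The worker would take its next job from jobs.begin(). This removes the allocation cost of erase-plus-insert but not the O(log N) re-linking, so it only addresses part of the concern.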

A simple priority heap should fit your requirements. Insertion, removal and priority change are all O(log n). But you said a priority change usually does not change the order. So with a heap, when you change a priority you check the changed item against its parent and its 2 children, and if the heap condition is not violated, no up-heap or down-heap step is required. The full O(log n) time is therefore needed only rarely; in practice it will be more like O(1).
For this to be efficient it is crucial that, given an item, you can find its position in the heap in O(1) and access its parent and children.
If the heap simply contains the items in an array, then that is all just pointer arithmetic. The drawback is that reordering the heap means copying the items.
If you store pointers to items in the heap, then you also have to store a back reference to the heap position in the items themselves. When you reorder the heap you then only swap the pointers and update the back references.
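The pointer-plus-back-reference layout can be sketched roughly as follows. This is an illustrative max-heap, not a tuned implementation; Job, JobHeap and heap_pos are hypothetical names, the heap does not own the jobs, and tie-breaking by sequence number is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Job {
    unsigned priority = 0;
    std::size_t heap_pos = 0;  // back reference: index of this job in the heap
};

// Max-heap of non-owning Job pointers; every move updates the back reference.
class JobHeap {
public:
    void push(Job* j) {
        heap_.push_back(j);
        sift_up(heap_.size() - 1);
    }

    Job* top() const { assert(!heap_.empty()); return heap_.front(); }

    Job* pop() {
        assert(!heap_.empty());
        Job* best = heap_.front();
        place(0, heap_.back());
        heap_.pop_back();
        if (!heap_.empty()) sift_down(0);
        return best;
    }

    // Changes a priority in place. When the heap condition still holds, this
    // costs one comparison with the parent and at most two with the children.
    // Returns true if the job actually moved.
    bool update(Job* j, unsigned new_priority) {
        j->priority = new_priority;
        const std::size_t before = j->heap_pos;
        sift_up(before);
        sift_down(j->heap_pos);
        return j->heap_pos != before;
    }

    bool empty() const { return heap_.empty(); }

private:
    void place(std::size_t i, Job* j) { heap_[i] = j; j->heap_pos = i; }

    void sift_up(std::size_t i) {
        Job* j = heap_[i];
        while (i > 0) {
            const std::size_t parent = (i - 1) / 2;
            if (heap_[parent]->priority >= j->priority) break;
            place(i, heap_[parent]);  // pull the parent down one level
            i = parent;
        }
        place(i, j);
    }

    void sift_down(std::size_t i) {
        Job* j = heap_[i];
        for (;;) {
            std::size_t child = 2 * i + 1;
            if (child >= heap_.size()) break;
            if (child + 1 < heap_.size() &&
                heap_[child + 1]->priority > heap_[child]->priority) ++child;
            if (j->priority >= heap_[child]->priority) break;
            place(i, heap_[child]);   // pull the larger child up one level
            i = child;
        }
        place(i, j);
    }

    std::vector<Job*> heap_;
};
```

Only pointers are moved during reordering, so the cost of an update does not depend on the size of a job object.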

Basically you are looking for an IndexPriorityQueue. You can implement your own variant of the index priority queue based on your requirements.
An index priority queue allows you to decrease or increase a key, i.e. you can raise and lower the priority of your jobs.
Here is a Java implementation, IndexMinQueue; I hope it helps.

Related

boost flat_map batch insertion

I have a C++ program that maintains a boost::flat_map. It receives real-time commands in the form of (key, value). If value is 0, then the flat_map[key] should be deleted if it exists. If value is nonzero, then the flat_map[key] should be set to value in the flat_map if the entry already exists, or it should be inserted if the entry does not already exist.
However, the commands do not come one by one. Instead, they come in batches, and the program only needs the flat_map to be sorted after each entire batch of commands is processed. It does not need the flat_map to be sorted while in the middle of processing a batch of commands.
Given this flexibility, is there a way to reduce processing time by avoiding the flat_map overhead of moving many elements on each insertion/deletion, and only incurring that overhead once at the end of each batch? The program is very latency sensitive.
Appreciate any input you may have!

You can use extract_sequence / adopt_sequence to update the underlying vector, and so long as it ends up ordered and uniqued, there's only a pair of vector moves in overhead.
auto underlying = my_map.extract_sequence();
// merge underlying and batch
my_map.adopt_sequence(boost::container::ordered_unique_range, std::move(underlying));
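The "merge underlying and batch" step is left open above. One possible shape for it is sketched below on a plain std::vector of pairs, which stands in for the extracted sequence (an assumption: the flat_map's actual sequence type may be a Boost vector). The int key and value types and the name apply_batch are likewise made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Entry = std::pair<int, int>;   // (key, value); int types are an assumption
using Sequence = std::vector<Entry>;

// Applies a batch of (key, value) commands to a sorted sequence with unique keys.
// value == 0 erases the key; any other value inserts or overwrites it.
// The result is again sorted and unique.
Sequence apply_batch(const Sequence& sorted, Sequence batch) {
    // Order the commands by key, keeping arrival order among equal keys.
    std::stable_sort(batch.begin(), batch.end(),
                     [](const Entry& a, const Entry& b) { return a.first < b.first; });

    Sequence out;
    out.reserve(sorted.size() + batch.size());
    std::size_t i = 0;
    for (std::size_t b = 0; b < batch.size(); ++b) {
        // Only the last command for a given key takes effect.
        if (b + 1 < batch.size() && batch[b + 1].first == batch[b].first) continue;
        const Entry& cmd = batch[b];
        // Copy untouched entries that sort before this key.
        while (i < sorted.size() && sorted[i].first < cmd.first) out.push_back(sorted[i++]);
        // Drop the old entry for this key, if there is one.
        if (i < sorted.size() && sorted[i].first == cmd.first) ++i;
        if (cmd.second != 0) out.push_back(cmd);
    }
    // Copy the tail that no command touched.
    out.insert(out.end(), sorted.begin() + static_cast<std::ptrdiff_t>(i), sorted.end());
    return out;
}
```

With B commands and N existing entries this is O(B log B + N) per batch, instead of shifting elements once per command.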

How to reduce the value of C++ priority queue?

Consider I have a min priority queue with the smallest value on the top, I hope to reduce the value on the top so that the property of the queue would still be maintained and time complexity would be O(1). How to do that?
I have seen the question here: how to change the value of std::priority_queue top()?
There, the approach is to take the value out, modify it and push it back, which is O(log N). I wonder whether I can exploit the fact that I am only reducing the value, so that there is no need to push it back again.

The standard priority queue doesn't support changing the keys.
What you are looking for is something similar to another data structure called an Indexed Priority Queue, often used in Dijkstra's algorithm.
The Indexed Priority Queue supports 2 more methods in its API, increaseKey and decreaseKey, which allow the keys themselves to be modified.
The STL doesn't define an indexed priority queue. You'd probably need to implement one yourself or look for a third-party implementation.

I see the point of this question differently from others. Based on this question,
I have a min priority queue with the smallest value on the top, I hope to reduce the value on the top so that the property of the queue would still be maintained and time complexity would be O(1).
With std::priority_queue, you can't. We may try a hack like const_cast<int &>(pq.top()) = lesser_value_than_top. It could work in some implementations, but it is NOT safe.
So, I would like to suggest building your own PQ with an array-based heap, and you can just change the top's value without log(N) work, as long as it is a value equal or less than the current top.
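A minimal sketch of such an array-based min-heap is shown below, built on std::push_heap and std::pop_heap. MinHeap and decrease_top are illustrative names, not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Min-heap over a vector; the smallest value sits at index 0.
class MinHeap {
public:
    void push(int v) {
        data_.push_back(v);
        std::push_heap(data_.begin(), data_.end(), std::greater<int>());
    }

    int top() const { assert(!data_.empty()); return data_.front(); }

    void pop() {
        assert(!data_.empty());
        std::pop_heap(data_.begin(), data_.end(), std::greater<int>());
        data_.pop_back();
    }

    // O(1): lowering the minimum cannot break the heap property,
    // because the root stays <= both of its children.
    void decrease_top(int v) {
        assert(!data_.empty() && v <= data_.front());
        data_.front() = v;
    }

    bool empty() const { return data_.empty(); }

private:
    std::vector<int> data_;
};
```

The shortcut is valid only for the top element and only for a value that is not larger than the current top; any other change needs a sift step.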

Automatically increasing variable within a class

So, I'm coding something in C++, and I'm trying to implement a priority queue with pairing heaps. I want the priority to increase automatically over time, so that if an element (a class) has been in the heap for, say, 5 minutes, its priority (a variable) is increased. And I have no clue how to make that happen.
I could implement a function that checks the duration for each element at a set interval, but the problem is that it's pretty tough to check each and every element within a heap. So I think I need to do something within the elements, but I'm not sure what and how.
Is there any simple solution to that? I feel like I must be missing something, but if that's not the case, then I'd better drop this idea, because I have to finish this thing pretty soon.
Update: This program is meant for a queue of people, so the reason for this idea is to not make people wait for too long. The priority is arbitrary: there are priority "levels" set for each element when it's added, so making the time the priority is not a solution for me.

You can add the elements to a linked list:
A new element is added to the end of the list
When the first element has been in the heap for 5 mins, its priority is increased and it is moved to the end of the list.
This way you can check only the first element. Another advantage is that you can set the timer to the value the first element is to be checked in. That is, no need to do unnecessary periodical checks.
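A rough sketch of that list is given below. Waiting, AgingList and next_bump are hypothetical names, time is passed in as plain seconds to keep the example deterministic, and the call into the pairing heap's own increase-priority operation is only marked by a comment.

```cpp
#include <cassert>
#include <list>

struct Waiting {
    int id;
    int priority;
    long next_bump;  // time (in seconds) at which the priority should rise
};

// Keeps elements ordered by the time of their next priority bump.
// Assumes `now` never goes backwards between calls.
class AgingList {
public:
    explicit AgingList(long interval) : interval_(interval) {
        assert(interval > 0);
    }

    // A new element is appended to the end of the list.
    void add(int id, int priority, long now) {
        items_.push_back(Waiting{id, priority, now + interval_});
    }

    // Bumps every element whose deadline has passed and moves it to the back.
    // Returns how many elements were bumped.
    int age(long now) {
        int bumped = 0;
        while (!items_.empty() && items_.front().next_bump <= now) {
            Waiting& w = items_.front();
            ++w.priority;  // the heap's increase-priority operation would go here
            w.next_bump = now + interval_;
            items_.splice(items_.end(), items_, items_.begin());  // move to the back
            ++bumped;
        }
        return bumped;
    }

    // Time at which the timer should fire next; the list must not be empty.
    long next_deadline() const {
        assert(!items_.empty());
        return items_.front().next_bump;
    }

    const std::list<Waiting>& items() const { return items_; }

private:
    long interval_;
    std::list<Waiting> items_;
};
```

Because every element waits the same interval, appending keeps the list sorted by deadline, so only the front ever needs checking.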

Multithreaded bruteforce comparisons between elements to create a graph

I have N elements that need to be compared with each other to create a graph. That gives N*(N-1)/2 comparisons in total.
I want to multithread those comparisons, and I also have several constraints:
Each element is quite big, it is a matrix actually, so copying all elements in each thread would take too much memory.
Each comparison should occur, meaning I cannot skip one.
A new element can be added to the list at any time. This is very tricky, because I need to track what has already been done so that only the new comparisons are run.
Since the number of comparisons could be huge, like 20 million, I cannot have a queue that big.
Lastly, one could stop the process at any time, and I must be able to resume where I was, even in a later execution of the app.
So far I have a Master thread which contains all the elements and several workers in a thread pool. The worker threads compare a list of pairs or a range of elements. I have thought of a comparison generator which gives the next X comparisons on demand.
How could I build this generator ?
Should I copy every pair for the workers, or use a ReadWriteLock directly from the worker to read the data from Master?
How could I track the progress on every thread ?
How could I stop and resume the state of the comparisons ?
I am sorry if that's a lot of questions.
Thank you !

Assuming reads are thread-safe (they usually are, as long as no one is writing), a simple solution is to subdivide the tasks among the set of worker threads in some manner, doing so in advance. For instance, for n workers, you could allocate pair (x, y) to worker x mod n. The only communication is letting each worker know its ordinal (0…n-1). Each thread should drop its answers into a private array, which can be collated after everyone else finishes.
A more sophisticated model that accommodates varying worker productivity is to push every value 0…N-1 onto a queue. Each worker thread pulls a number, x, off the queue, evaluates every (x, y) pair, and then goes back for another x.
If you want to take the time, it's more efficient to enqueue pairs so as to minimise cache-thrashing. This is a tricky problem. Essentially, you want to enqueue pairs from small clusters of elements so that every pair within a cluster is evaluated at approximately the same time. As tricky as this is, it could make a huge difference to the efficiency of your algorithm.
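The model in which each worker pulls a row x and evaluates every (x, y) pair can be sketched with an atomic counter standing in for the queue of values 0…N-1. This is a simplification chosen for illustration, not something the answer prescribes; RowDispenser and pairs_in_row are hypothetical names, and the sketch assumes each pair is evaluated once, from its smaller index.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hands out row indices start..n-1 exactly once; claim() is safe to call
// from many worker threads at the same time.
class RowDispenser {
public:
    explicit RowDispenser(std::size_t n, std::size_t start = 0)
        : n_(n), next_(start) {}

    // Stores the next unclaimed row in x; returns false when none are left.
    bool claim(std::size_t& x) {
        x = next_.fetch_add(1, std::memory_order_relaxed);
        return x < n_;
    }

private:
    std::size_t n_;
    std::atomic<std::size_t> next_;
};

// Number of comparisons a worker performs for row x:
// the pairs (x, y) with x < y < n. Requires x < n.
std::size_t pairs_in_row(std::size_t x, std::size_t n) {
    assert(x < n);
    return n - 1 - x;
}
```

Each worker would loop on claim(x) and compare element x with every y > x, reading the shared elements without copying them. No queue of 20 million pairs is ever materialised. To resume after a stop, it is the set of finished rows that has to be persisted, since a claimed row may not have completed.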

How to repeatedly insert elements into a sorted list fast

I do not have formal CS training, so bear with me.
I need to do a simulation, which can be abstracted to the following (omitting the details):
We have a list of real numbers representing the times of events. In each step, we
remove the first event, and
as a result of "processing" it, a few other events may get inserted into the list at a strictly later time
and repeat this many times.
Questions
What data structure / algorithm can I use to implement this as efficiently as possible? I need to increase the number of events/numbers in the list significantly. The priority is to make this as fast as possible for a long list.
Since I'm doing this in C++, what data structures are already available in the STL or boost that will make it simple to implement this?
More details:
The number of events in the list is variable, but it's guaranteed to be between n and 2*n where n is some simulation parameter. While the event times are increasing, the time-difference of the latest and earliest events is also guaranteed to be less than a constant T. Finally, I suspect that the density of events in time, while not constant, also has an upper and lower bound (i.e. all the events will never be strongly clustered around a single point in time)
Efforts so far:
As the title of the question says, I was thinking of using a sorted list of numbers. If I use a linked list for constant time insertion, then I have trouble finding the position where to insert new events in a fast (sublinear) way.
Right now I am using an approximation where I divide time into buckets and keep track of how many events there are in each bucket. I then process the buckets one by one as time "passes", always adding a new bucket at the end when removing one from the front, thus keeping the number of buckets constant. This is fast, but only an approximation.

A min-heap might suit your needs. There's an explanation here and I think STL provides the priority_queue for you.
Insertion time is O(log N), removal is O(log N)

It sounds like you need/want a priority queue. If memory serves, the priority queue adapter in the standard library is written to retrieve the largest items instead of the smallest, so you'll have to specify that it use std::greater for comparison.
Other than that, it provides just about exactly what you've asked for: the ability to quickly access/remove the smallest/largest item, and the ability to insert new items quickly. While it doesn't maintain all the items in order, it does maintain enough order that it can still find/remove the one smallest (or largest) item quickly.
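As a small sketch of that setup, std::priority_queue with std::greater gives a min-queue of event times. The rule that each processed event spawns exactly one later event is an assumption made for illustration, and EventQueue and run are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// std::greater turns the default max-queue into a min-queue: top() is the earliest time.
using EventQueue =
    std::priority_queue<double, std::vector<double>, std::greater<double>>;

// Processes up to max_steps events and returns the time of the last one processed.
// Illustrative rule: each processed event schedules one event `delay` later.
double run(EventQueue& events, int max_steps, double delay) {
    assert(delay > 0.0);
    double now = 0.0;
    for (int step = 0; step < max_steps && !events.empty(); ++step) {
        now = events.top();        // earliest event, O(1)
        events.pop();              // O(log N)
        events.push(now + delay);  // strictly later event, O(log N)
    }
    return now;
}
```

A real simulation would store a struct with the time plus event data and supply a comparator on the time field.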

I would start with a basic priority queue, and see if that's fast enough.
If not, then you can look at writing something custom.
http://en.wikipedia.org/wiki/Priority_queue

A balanced binary search tree is always sorted and has faster access times than a linear list. Search, insert and delete times are O(log(n)).
But it depends whether the items have to be sorted all the time, or only after the process is finished. In the latter case a hash table is probably faster. At the end of the process you then would copy the items to an array or a list and sort it.
