Subtract sample from list, update average - C++

I have a map defined as:
std::map<float, std::map<unsigned short, int>>
The float is the value I keep an average of, e.g. a speed. The short is an identifier, e.g. a date. The int is a count, which I also keep track of: it says how many samples there have been with that specific value. E.g. on this date, I have a 'bin' for the value 220 km/h, and the count for it could be 2132.
When I add a new sample, e.g. a car that drove 220, it falls into the 'bin' for that speed and day, and I update the average and count as follows:
count++;
average = average + ((samplevalue - average) / count);
This seems to work fine, the average is updated.
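That incremental update can be seen in isolation; here is a minimal self-contained sketch of the formula (the RunningAverage wrapper name is illustrative, not from the original code):

```cpp
#include <cassert>
#include <cmath>

// Running average without storing the samples:
//   avg_new = avg + (x - avg) / n
struct RunningAverage {
    double average = 0.0;
    long long count = 0;
    void add(double sample) {
        ++count;
        average += (sample - average) / count;
    }
};
```

With identical samples the average stays exactly at the sample value, which matches the insertion behaviour described above.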
But my problem is when I want to remove entries for a specific identifier, e.g. day.
for (auto& bin : *containerMap_)
{
    auto iter = bin.second.find(currentDay);
    if (iter != bin.second.end())
    {
        average = ((average * count) - iter->first) / (count - (iter->second));
        meassurementCount -= iter->second;
        bin.second.erase(iter);
    }
}
Here iter->first is the value of the bin (speed) and iter->second is the count.
Though when I delete for a specific identifier, the average is not updated as expected.
In my test case I only insert samples with one speed, so I only have one bin. Each sample's value is 3, so I expect the average to stay 3. It does with insert, but when I try to remove entries for one identifier (day), the average becomes:
Avg : 3.00139
Why does that calculation not work? I would really hate to iterate through everything to update the average, since the count could be in the billions.
Best regards :)

Related

Is a heapsort/Priority queue better than a map (ordered map in STL) in C++ in terms of time complexity?

This is a question that has been bugging me for a long time.
I have a huge dataset and I want to sort it using a number that is generated each time a for loop runs over the dataset. So I decided to bind that number to the index of a row in a map; this way, I have indirectly sorted the dataset (which is a 2D vector of strings). Now I have been told there is a better way to do this, preferably with a heap data structure. But I do not know how a heap, like a priority queue, would be helpful in this situation (except for being able to print out the smallest element in the heap). And I am strictly talking about time complexity here. Here is the code that I have:
for (i = 0; i < final_dataset.size(); i += 2) {
    string str1 = final_dataset[i][0];
    string str2 = final_dataset[i][3];
    string str3 = final_dataset[i+1][3];
    istringstream(str2) >> num2;
    istringstream(str3) >> num3;
    total_date_diff = date_calc(str1, numeric_year, numeric_month); // Calculate the difference in input date and the dates in the final_dataset
    num2 = d2 - num2; // Calculate the difference in supplies
    num3 = d3 - num3; // Calculate the difference in exports
    euclid_result = euclid_distance_result(total_date_diff, num2, num3); // Calculate the euclidean distance for every row in the final_dataset
    results.insert(pair<double, int>(euclid_result, i)); // Bind the respective euclidean distances to the row index of every row in the final_dataset
}
int k_counter = 0;
cout << "The " << k << " nearest neighbors of entered data point are: \n\n";
// Iterate through the map results (key: euclidean distance, value: corresponding row index)
// to output the datapoints in increasing order of euclidean distance.
// Since maps are automatically sorted in ascending order, display the datapoints
// (taken from the rows of the final_dataset) corresponding to the first k values in the map.
for (auto itr = results.begin(); itr != results.end(); itr++) {
    cout << "(" << final_dataset[itr->second][0] << "," << final_dataset[itr->second][3] << ","
         << final_dataset[itr->second + 1][3] << ")" << " with the label: "
         << final_dataset[itr->second][1] << "\t" << " and Euclidean distance=" << itr->first << "\n";
    k_counter++;
    if (k_counter == k) break; // once k is reached, stop iterating
}
How could a priority queue possibly make it faster than what I did?
The documentation on C++ maps says:
Maps are typically implemented as binary search trees.
This implies that insertion of one key/value pair has a time complexity of O(log𝑛), where 𝑛 is the size of the map at the time of insertion.
To insert all key/value pairs you will thus have a total time complexity of O(log1 + log2 + log3 + ... + log𝑛) = O(log(𝑛!)) = O(𝑛log𝑛)
This is the same time complexity you get from storing the key/value pairs in a vector and sorting it by key calling std::sort.
Heap
A heap-based algorithm can have an advantage when you don't really need to sort the whole input, but are only interested in the least 𝑘 elements, or maybe even only the 𝑘th least element.
Inserting into a heap has also a time complexity of O(log𝑛), where 𝑛 is the size of the heap at the time of insertion.
However, a well designed heap-based algorithm will not need the heap to have more than 𝑘 elements: once that size is reached, values are exchanged, also with O(log𝑘) time complexity. So the overall time complexity will then be O(𝑛log𝑘). This is definitely an improvement on O(𝑛log𝑛) when 𝑘 ≪ 𝑛.
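A sketch of that heap-based idea, keeping the 𝑘 least elements in a max-heap of size 𝑘 (the function name and use of std::priority_queue are illustrative):

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Keep the k smallest values seen so far in a max-heap of size k.
// Each insertion is O(log k), so one pass over n values is O(n log k).
std::vector<double> least_k(const std::vector<double>& data, std::size_t k) {
    std::priority_queue<double> heap; // max-heap: top() is the largest kept value
    for (double x : data) {
        if (heap.size() < k) {
            heap.push(x);
        } else if (x < heap.top()) {
            heap.pop();   // evict the largest of the kept k
            heap.push(x);
        }
    }
    std::vector<double> out;
    while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
    std::sort(out.begin(), out.end());
    return out;
}
```

For nearest-neighbour distances, the heap would hold (distance, row index) pairs instead of plain doubles.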

Optimizing this 'statistical coincidence' finding algorithm

Goal
The code below is designed to take in a vector<vector<float> > of random numbers from a Gaussian distribution, and perform the following:
Iterate simultaneously through all n columns of the vector until you encounter the first value exceeding some threshold.
Continue iterating until either a) you encounter a second value exceeding that threshold such that it comes from a different column than the first found value, or b) you exceed some maximum number of iterations.
In the case of a), continue iterating until either c) you find a third value exceeding the threshold that comes from a different column than both the first and second found values, or d) you exceed some maximum number of iterations from the first found value. In the case of b), start over, except this time start iterating at one row after the first found value.
In the case of c), add one to a counter and jump forward some x rows. In the case of d), start over, except this time start iterating at one row after the first found value.
How I accomplish this:
In my opinion, the most challenging part is making sure all three values come from distinct columns. To tackle this, I used std::set. I iterate through each row of the vector<vector<float> >, then through each column of that row. I check each column for a value exceeding the threshold and store its column number in an std::set.
I continue iterating. If I reach max_iterations, I jump back to one row after the first-found value, empty the set, and reset the counter. If the std::set has size 3, I add one to the counter.
My issue:
This code will need to run on multidimensional vectors of sizes on the order of tens of columns and hundreds of thousands to millions of rows. As of now, that's excruciatingly slow. I'd like to improve performance significantly, if possible.
My code:
void findRate(float thresholdVolts) {
    set<size_t> cache;
    vector<size_t> index;
    size_t count = 0, found = 0;
    for (auto rowItr = waveform.begin(); rowItr != waveform.end(); ++rowItr) {
        auto& row = *rowItr;
        for (auto colnItr = row.begin(); colnItr != row.end(); ++colnItr) {
            auto& cell = *colnItr;
            if (abs(cell / rmsVoltage) >= (thresholdVolts / rmsVoltage)) {
                cache.insert(std::distance(row.begin(), colnItr));
                index.push_back(std::distance(row.begin(), colnItr));
            }
        }
        if (cache.size() == 0) count == 0;
        if (cache.size() == 3) {
            ++found;
            cache.clear();
            if (std::distance(rowItr, output.end()) > ((4000 - count) + 4E+6)) {
                std::advance(rowItr, ((4000 - count) + 4E+6));
            }
        }
    }
}
One thing you could do right away concerns your inner loop. I understand that rmsVoltage is an external variable that is constant during execution of the function.
for (auto colnItr = row.begin(); colnItr != row.end(); ++colnItr) {
    auto& cell = *colnItr;
    // you can remove 2 divisions here. Divisions are the slowest
    // arithmetic instructions on any cpu
    //
    // this:
    // if (abs(cell / rmsVoltage) >= (thresholdVolts / rmsVoltage)) {
    //
    // becomes this
    if (abs(cell) >= thresholdVolts) {
        cache.insert(std::distance(row.begin(), colnItr));
        index.push_back(std::distance(row.begin(), colnItr));
    }
And a bit below: why are you adding a floating-point constant to a size_t?
This may cause unnecessary conversions of size_t to double and then back to size_t; some compilers may handle this, but definitely not all.
These are relatively costly operations.
// this:
// if(std::distance(rowItr, output.end()) > ((4000 - count) + 4E+6)){
// std::advance(rowItr, ((4000 - count) + 4E+6));
// }
if (std::distance(rowItr, output.end()) > (4'004'000 - count))
std::advance(rowItr, 4'004'000 - count);
Also, after observing the memory needs of your function, you should preallocate some reasonable space for the index container using vector<>::reserve(). (Note that std::set has no reserve(), so cache cannot be preallocated this way.)
Did you give us the entire algorithm? The contents of container index are not used anywhere.
Please let me know how much time you've gained with these changes.

Node Information for Segment Tree

Problem :
Think about cars in a race as points on a line. All the cars start at the same point with an initial speed of zero. All of them move in the same direction. There are N cars in total, numbered from 1 to N.
You will be given two kinds of queries :
1. Change the speed of the car i at any time t
2. Output the current winner of the race at any time t
For query type 1, I will be given the time, Car No. and the New Speed.
For query type 2, I will be given the time at which we need to find the winning car.
Constraints :
N <= 50,000 , Queries <= 10^5
Also time in every query would be >= time in the previous query
What I tried till now :
#include <bits/stdc++.h>
using namespace std;

pair<int, pair<int,int>> arr[50005]; // stores car's last position, speed, and the time at which this speed was assigned

int main()
{
    int n, q;
    cin >> n >> q;
    for (int i = 1; i <= n; ++i) { arr[i] = {0, {0, 0}}; }
    while (q--)
    {
        int type;
        cin >> type;
        if (type == 1)
        {
            int time, car, speed;
            cin >> time >> car >> speed;
            arr[car].first = arr[car].first + 1LL * (time - arr[car].second.second) * arr[car].second.first; // new position
            arr[car].second.first = speed;
            arr[car].second.second = time;
        }
        else
        {
            int ans = -1, time;
            cin >> time;
            for (int i = 1; i <= n; ++i) {
                // position at `time` is the last position plus distance travelled
                int temp = (time - arr[i].second.second) * arr[i].second.first + arr[i].first;
                // farthest car is the current winner
                ans = max(ans, temp);
            }
            cout << ans << endl;
        }
    }
    return 0;
}
Since this approach answers the type 2 query in O(N), it is really inefficient.
Since I only need an update operation and a max range query, I thought of using a segment tree to speed this up to O(log N).
I was actually halfway through my code when I realised that it isn't possible to answer query type 2 with this! What I stored was essentially the same, with an extra variable winning_car. For a type 1 query I would just update the variables the same way as above, but for answering which car is currently winning I couldn't come up with a query function that runs in O(log N). Either it is possible to write such a function (which I am not able to), or I stored insufficient information in my node (I think it is this).
What should I store in the tree node? (Or is it not possible with a segment tree?) I'm not asking for ready-made code here, just the approach (if it is complex, some pseudo-code would be nice though).
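For reference, here is a minimal point-update / range-argmax segment tree of the kind being considered. This alone does not answer the winner query (positions change with time), it only shows the usual node layout; all names are illustrative:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Point-update / range-argmax segment tree: each node stores the max value
// in its range together with the index achieving it.
struct MaxTree {
    int n;
    std::vector<std::pair<long long, int>> t; // {value, index}
    explicit MaxTree(int n_) : n(n_), t(4 * n_, {0, -1}) {}
    void update(int node, int lo, int hi, int pos, long long val) {
        if (lo == hi) { t[node] = {val, pos}; return; }
        int mid = (lo + hi) / 2;
        if (pos <= mid) update(2 * node, lo, mid, pos, val);
        else            update(2 * node + 1, mid + 1, hi, pos, val);
        t[node] = std::max(t[2 * node], t[2 * node + 1]); // compare by value, then index
    }
    void update(int pos, long long val) { update(1, 0, n - 1, pos, val); } // O(log n)
    std::pair<long long, int> best() const { return t[1]; } // global argmax in O(1)
};
```

The open part of the question is exactly what value to store per car so that the root's max is correct at the query time, given that positions are linear functions of time.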

How to figure out "progress" while sorting?

I'm using stable_sort to sort a large vector.
The sorting takes on the order of a few seconds (say, 5-10 seconds), and I would like to display a progress bar to the user showing how much of the sorting is done so far.
But (even if I was to write my own sorting routine) how can I tell how much progress I have made, and how much more there is left to go?
I don't need it to be exact, but I need it to be "reasonable" (i.e. reasonably linear, not faked, and certainly not backtracking).
The standard library sort uses a user-supplied comparison function, so you can insert a comparison counter into it. The total number of comparisons for either quicksort/introsort or mergesort will be very close to N·log₂N (where N is the number of elements in the vector). So that's what I'd export to a progress bar: (number of comparisons) / (N·log₂N).
Since you're using mergesort, the comparison count will be a very precise measure of progress. It might be slightly non-linear if the implementation spends time permuting the vector between comparison runs, but I doubt your users will see the non-linearity (and anyway, we're all used to inaccurate non-linear progress bars :) ).
Quicksort/introsort would show more variance, depending on the nature of the data, but even in that case it's better than nothing, and you could always add a fudge factor on the basis of experience.
A simple counter in your compare class will cost you practically nothing. Personally I wouldn't even bother locking it (the locks would hurt performance); it's unlikely to get into an inconsistent state, and anyway the progress bar won't start radiating lizards just because it gets an inconsistent progress number.
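The counting comparator described above might be sketched like this (the N·log₂N estimate is the stated assumption; the names are illustrative):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Comparator that counts its invocations via a shared pointer; comparators
// are copied by the sort, so the count must live outside the object.
struct CountingLess {
    long long* counter;
    bool operator()(int a, int b) const { ++*counter; return a < b; }
};

// Sorts v and returns comparisons / (N * log2 N), the progress denominator.
double sort_with_progress(std::vector<int>& v) {
    long long comparisons = 0;
    double expected = v.size() * std::log2(static_cast<double>(v.size()));
    std::stable_sort(v.begin(), v.end(), CountingLess{&comparisons});
    // In real code you would poll `comparisons` periodically (e.g. from a UI
    // thread) and report comparisons / expected as the progress fraction.
    return comparisons / expected;
}
```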
Split the vector into several equal sections, the quantity depending upon the granularity of progress reporting you desire. Sort each section separately. Then start merging with std::merge. You can report your progress after sorting each section, and after each merge. You'll need to experiment to determine how much percentage the sorting of the sections should be counted compared to the mergings.
Edit:
I did some experiments of my own and found the merging to be insignificant compared to the sorting, and this is the function I came up with:
#include <algorithm>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

template <typename It, typename Comp, typename Reporter>
void ReportSort(It ibegin, It iend, Comp cmp, Reporter report,
                double range_low = 0.0, double range_high = 1.0)
{
    double range_span = range_high - range_low;
    double range_mid = range_low + range_span / 2.0;
    auto size = iend - ibegin;
    if (size < 32768) {
        std::stable_sort(ibegin, iend, cmp);
    } else {
        ReportSort(ibegin, ibegin + size / 2, cmp, report, range_low, range_mid);
        report(range_mid);
        ReportSort(ibegin + size / 2, iend, cmp, report, range_mid, range_high);
        std::inplace_merge(ibegin, ibegin + size / 2, iend, cmp); // cmp must be passed here too
    }
}

int main()
{
    std::vector<int> v(100000000);
    std::iota(v.begin(), v.end(), 0);
    std::shuffle(v.begin(), v.end(), std::mt19937{42}); // std::random_shuffle was removed in C++17
    std::cout << "starting...\n";
    double percent_done = 0.0;
    auto report = [&](double d) {
        if (d - percent_done >= 0.05) {
            percent_done += 0.05;
            std::cout << static_cast<int>(percent_done * 100) << "%\n";
        }
    };
    ReportSort(v.begin(), v.end(), std::less<int>(), report);
}
int main()
{
std::vector<int> v(100000000);
std::iota(v.begin(), v.end(), 0);
std::random_shuffle(v.begin(), v.end());
std::cout << "starting...\n";
double percent_done = 0.0;
auto report = [&](double d) {
if (d - percent_done >= 0.05) {
percent_done += 0.05;
std::cout << static_cast<int>(percent_done * 100) << "%\n";
}
};
ReportSort(v.begin(), v.end(), std::less<int>(), report);
}
Stable sort is based on merge sort. If you wrote your own version of merge sort then (ignoring some speed-up tricks) you would see that it consists of log N passes. Each pass starts with 2^k sorted lists and produces 2^(k-1) lists, with the sort finished when it merges two lists into one. So you could use the value of k as an indication of progress.
If you were going to run experiments, you might instrument the comparison object to count the number of comparisons made and try and see if the number of comparisons made is some reasonably predictable multiple of n log n. Then you could keep track of progress by counting the number of comparisons done.
(Note that with the C++ stable sort, you have to hope that it finds enough store to hold a copy of the data. Otherwise the cost goes from N log N to perhaps N (log N)^2 and your predictions will be far too optimistic).
Select a small subset of indices and count inversions. You know its maximal value, and you know that when the sort is done the value is zero, so you can use this value as a progress measure. You can think of it as a measure of entropy.
Easiest way to do it: sort a small vector and extrapolate the time assuming O(n log n) complexity.
t(n) = C * n * log(n) ⇒ t(n1) / t(n2) = (n1/n2) * (log(n1)/log(n2))
If sorting 10 elements takes 1 µs, then 100 elements will take 1 µs * 100/10 * log(100)/log(10) = 20 µs.
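That extrapolation as a tiny helper, assuming the C·n·log n model holds for your data:

```cpp
#include <cmath>

// Predict t(n1) from a measured t(n2), assuming t(n) = C * n * log(n).
// The constant C cancels out of the ratio, so it never needs to be known.
double extrapolate(double t_n2, double n2, double n1) {
    return t_n2 * (n1 / n2) * (std::log(n1) / std::log(n2));
}
```

E.g. with a measured 1 µs for 10 elements, extrapolate(1.0, 10, 100) reproduces the 20 µs figure above.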
Quicksort is basically
partition input using a pivot element
sort smallest part recursively
sort largest part using tail recursion
All the work is done in the partition step. You could do the outer partition directly and then report progress as the smallest part is done.
So there would be an additional step between 2 and 3 above.
Update progressor
Here is some code.
#include <algorithm>
#include <cstdlib>
#include <iostream>
#include <iterator>

template <typename RandomAccessIterator>
void sort_wReporting(RandomAccessIterator first, RandomAccessIterator last)
{
    double done = 0;
    double whole = static_cast<double>(std::distance(first, last));
    typedef typename std::iterator_traits<RandomAccessIterator>::value_type value_type;
    while (first != last && first + 1 != last)
    {
        auto d = std::distance(first, last);
        value_type pivot = *(first + std::rand() % d);
        auto iter = std::partition(first, last,
            [pivot](const value_type& x) { return x < pivot; });
        auto lower = std::distance(first, iter);
        auto upper = std::distance(iter, last);
        if (lower < upper)
        {
            std::sort(first, iter);
            done += lower;
            first = iter;
        }
        else
        {
            std::sort(iter, last);
            done += upper;
            last = iter;
        }
        std::cout << done / whole << std::endl;
    }
}
I spent almost one day figuring out how to display progress for shell sort, so I will leave my simple formula here. Given an array of colors, it displays the progress by blending the colors from red to yellow and then to green. When the sort is done, the last position of the array is blue. For shell sort, the iterations of each pass through the array are quite proportional, so the progress becomes pretty accurate.
(Code in Dart/Flutter)
List<Color> colors = [
Color(0xFFFF0000),
Color(0xFFFF5500),
Color(0xFFFFAA00),
Color(0xFFFFFF00),
Color(0xFFAAFF00),
Color(0xFF55FF00),
Color(0xFF00FF00),
Colors.blue,
];
[...]
style: TextStyle(
color: colors[(((pass - 1) * (colors.length - 1)) / (log(a.length) / log(2)).floor()).floor()]),
It is basically a cross-multiplication.
a means the array. (log(a.length) / log(2)).floor() rounds down log2(N), where N is the number of items. I tested this with several combinations of array sizes, array contents, and sizes for the array of colors, so I think it is good to go.

Fastest way to obtain the largest X numbers from a very large unsorted list?

I'm trying to obtain the top, say, 100 scores from a list of scores being generated by my program. Unfortunately the list is huge (on the order of millions to billions), so sorting is a time-intensive portion of the program.
What's the best way of doing the sorting to get the top 100 scores?
The only two methods I can think of so far are either generating all the scores into a massive array and then sorting it and taking the top 100, or generating X scores at a time, sorting them, truncating to the top 100, then continuing to generate more scores, adding them to the truncated list, and sorting again.
Either way I do it, it still takes more time than I would like. Any ideas on how to do it in an even more efficient way? (I've never taken programming courses before; maybe those of you with comp sci degrees know about efficient algorithms to do this, at least that's what I'm hoping.)
Lastly, what's the sorting algorithm used by the standard sort() function in C++?
Thanks,
-Faken
Edit: Just for anyone who is curious...
I did a few time trials on the before and after and here are the results:
Old program (performs sorting after each outer loop iteration):
top 100 scores: 147 seconds
top 10 scores: 147 seconds
top 1 scores: 146 seconds
Sorting disabled: 55 seconds
new program (implementing tracking of only top scores and using default sorting function):
top 100 scores: 350 seconds <-- hmm...worse than before
top 10 scores: 103 seconds
top 1 scores: 69 seconds
Sorting disabled: 51 seconds
new rewrite (optimizations in data stored, hand written sorting algorithm):
top 100 scores: 71 seconds <-- Very nice!
top 10 scores: 52 seconds
top 1 scores: 51 seconds
Sorting disabled: 50 seconds
Done on a Core 2, 1.6 GHz... I can't wait till my Core i7 860 arrives...
There are a lot of other, even more aggressive optimizations for me to work out (mainly in the area of reducing the number of iterations I run), but as it stands right now, the speed is more than good enough; I might not even bother to work out those algorithmic optimizations.
Thanks to everyone for their input!
Take the first 100 scores and sort them in an array.
Take the next score and insertion-sort it into the array (starting at the "small" end).
Drop the 101st value.
Continue with the next value, at step 2, until done.
Over time, the list will resemble the 100 largest values more and more, so more and more often you'll find that the insertion sort aborts immediately, because the new value is smaller than the smallest of the candidates for the top 100.
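The steps above can be sketched as follows (k stands in for the 100; the TopK name is illustrative):

```cpp
#include <algorithm>
#include <vector>

// Maintain the k largest values seen so far in an ascending sorted vector.
// Most candidates fail the first comparison and are rejected in O(1).
struct TopK {
    std::size_t k;
    std::vector<int> best; // sorted ascending; best.front() is the cutoff
    explicit TopK(std::size_t k_) : k(k_) {}
    void offer(int x) {
        if (best.size() == k) {
            if (x <= best.front()) return; // below the cutoff: reject immediately
            best.erase(best.begin());      // drop the (k+1)th value
        }
        // insertion-sort the new value into place
        best.insert(std::upper_bound(best.begin(), best.end(), x), x);
    }
};
```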
You can do this in O(n log k) time (effectively linear for a fixed k such as 100), without sorting the whole input, using a heap:
#!/usr/bin/python
import heapq

def top_n(l, n):
    top_n = []
    smallest = None
    for elem in l:
        if len(top_n) < n:
            top_n.append(elem)
            if len(top_n) == n:
                heapq.heapify(top_n)
                smallest = heapq.nsmallest(1, top_n)[0]
        else:
            if elem > smallest:
                heapq.heapreplace(top_n, elem)
                smallest = heapq.nsmallest(1, top_n)[0]
    return sorted(top_n)

def random_ints(n):
    import random
    for i in range(0, n):
        yield random.randint(0, 10000)

print top_n(random_ints(1000000), 100)
Times on my machine (Core2 Q6600, Linux, Python 2.6, measured with bash time builtin):
100000 elements: .29 seconds
1000000 elements: 2.8 seconds
10000000 elements: 25.2 seconds
Edit/addition: In C++, you can use std::priority_queue in much the same way as Python's heapq module is used here. You'll want to use the std::greater ordering instead of the default std::less, so that the top() member function returns the smallest element instead of the largest one. C++'s priority queue doesn't have the equivalent of heapreplace, which replaces the top element with a new one, so instead you'll want to pop the top (smallest) element and then push the newly seen value. Other than that the algorithm translates quite cleanly from Python to C++.
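The translation described in that paragraph might look roughly like this (a sketch, not the only way to write it):

```cpp
#include <functional>
#include <queue>
#include <vector>

// Keep the n largest elements. With std::greater the priority_queue is a
// min-heap, so top() is the smallest retained value, i.e. the cutoff.
std::vector<int> top_n(const std::vector<int>& data, std::size_t n) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : data) {
        if (heap.size() < n) {
            heap.push(x);
        } else if (x > heap.top()) {
            heap.pop();   // no heapreplace in C++: pop the smallest, then push
            heap.push(x);
        }
    }
    std::vector<int> out;
    while (!heap.empty()) { out.push_back(heap.top()); heap.pop(); }
    return out; // ascending order
}
```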
Here's the 'natural' C++ way to do this:
std::vector<Score> v;
// fill in v
std::partial_sort(v.begin(), v.begin() + 100, v.end(), std::greater<Score>());
std::sort(v.begin(), v.begin() + 100);
This is linear in the number of scores.
The algorithm used by std::sort isn't specified by the standard, but libstdc++ (used by g++) uses an "adaptive introsort", which is essentially a median-of-3 quicksort down to a certain level, followed by an insertion sort.
Declare an array where you can put the 100 best scores. Loop through the huge list and check each item to see whether it qualifies to be inserted into the top 100. Use a simple insertion sort to add an item to the top list.
Something like this (C# code, but you get the idea):
Score[] toplist = new Score[100];
int size = 0;
foreach (Score score in hugeList) {
    int pos = size;
    while (pos > 0 && toplist[pos - 1] < score) {
        pos--;
        if (pos < 99) toplist[pos + 1] = toplist[pos];
    }
    if (size < 100) size++;
    if (pos < size) toplist[pos] = score;
}
I tested it on my computer (Core 2 Duo 2.54 GHz, Win 7 x64) and I can process 100,000,000 items in 369 ms.
Since speed is of the essence here, and 40,000 possible highscore values is easily handled by any of today's computers, I'd resort to bucket sort for simplicity. My guess is that it would outperform any of the algorithms proposed so far. The downside is that you'd have to determine an upper limit for the highscore values.
So, let's assume your max highscore value is 40,000:
Make an array of 40,000 entries. Loop through your highscore values; each time you encounter highscore x, increment array[x] by one. After this, all you have to do is count the top entries in your array until you have reached 100 counted highscores.
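A sketch of this bucket approach, assuming scores are non-negative integers with a known maximum (names and signature are illustrative):

```cpp
#include <vector>

// Bucket/counting approach: one O(n) pass to histogram the scores, then walk
// the buckets from the top until `want` scores have been collected.
// Total cost: O(n + max_score), independent of how many scores tie.
std::vector<int> top_scores(const std::vector<int>& scores, int max_score, int want) {
    std::vector<int> bucket(max_score + 1, 0);
    for (int s : scores) ++bucket[s];
    std::vector<int> out; // filled in descending order
    for (int v = max_score; v >= 0 && (int)out.size() < want; --v)
        for (int c = bucket[v]; c > 0 && (int)out.size() < want; --c)
            out.push_back(v);
    return out;
}
```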
You can do it in Haskell like this:
largest100 xs = take 100 $ sortBy (flip compare) xs
This looks like it sorts all the numbers into descending order (the "flip compare" bit reverses the arguments to the standard comparison function) and then returns the first 100 entries from the list. But Haskell is lazily evaluated, so the sortBy function does just enough sorting to find the first 100 numbers in the list, and then stops.
Purists will note that you could also write the function as
largest100 = take 100 . sortBy (flip compare)
This means just the same thing, but illustrates the Haskell style of composing a new function out of the building blocks of other functions rather than handing variables around the place.
You want the absolute largest X numbers, so I'm guessing you don't want some sort of heuristic. How unsorted is the list? If it's pretty random, your best bet really is just to do a quick sort on the whole list and grab the top X results.
If you can filter scores during the list generation, that's way way better. Only ever store X values, and every time you get a new value, compare it to those X values. If it's less than all of them, throw it out. If it's bigger than one of them, throw out the new smallest value.
If X is small enough, you can even keep your list of X values sorted, so that each new number is compared against a sorted list: an O(1) check against the smallest element tells you whether to throw the new value out, and otherwise a quick binary search finds where it goes, after which you throw away the first value of the array (assuming the first element is the smallest).
Place the data into a balanced tree structure (probably a red-black tree) that does the sorting in place. Insertions should be O(log n). Grabbing the highest x scores should be O(log n) as well.
You can prune the tree every once in a while if you find you need optimizations at some point.
If you only need to report the values of the top 100 scores (and not any associated data), and if you know that the scores will all be in a finite range such as [0,100], then an easy way to do it is with "counting sort"...
Basically, create an array representing all possible values (e.g. an array of size 101 if scores can range from 0 to 100 inclusive), and initialize all the elements of the array to 0. Then iterate through the list of scores, incrementing the corresponding entry in the array. That is, compile the number of times each score in the range has been achieved. Then, working from the end of the array to the beginning, you can pick out the top X scores. Here is some pseudo-code:
let type Score be an integer ranging from 0 to 100, inclusive.
let scores be an array of Score objects
let scorerange be an array of integers of size 101.

for i in [0,100]
    set scorerange[i] = 0
for each score in scores
    set scorerange[score] = scorerange[score] + 1

let top be the number of top scores to report
let idx be an integer initialized to the end of scorerange (i.e. 100)
while (top > 0) and (idx >= 0):
    if scorerange[idx] > 0:
        report "There are " scorerange[idx] " scores with value " idx
        top = top - scorerange[idx]
    idx = idx - 1
I implemented a templatized priority queue in C# in response to an interview question in 2008.
using System;
using System.Collections.Generic;
using System.Text;

namespace CompanyTest
{
    // Based on pre-generics C# implementation at
    // http://www.boyet.com/Articles/WritingapriorityqueueinC.html
    // and wikipedia article
    // http://en.wikipedia.org/wiki/Binary_heap
    class PriorityQueue<T>
    {
        struct Pair
        {
            T val;
            int priority;
            public Pair(T v, int p)
            {
                this.val = v;
                this.priority = p;
            }
            public T Val { get { return this.val; } }
            public int Priority { get { return this.priority; } }
        }
        #region Private members
        private System.Collections.Generic.List<Pair> array = new System.Collections.Generic.List<Pair>();
        #endregion
        #region Constructor
        public PriorityQueue()
        {
        }
        #endregion
        #region Public methods
        public void Enqueue(T val, int priority)
        {
            Pair p = new Pair(val, priority);
            array.Add(p);
            bubbleUp(array.Count - 1);
        }
        public T Dequeue()
        {
            if (array.Count <= 0)
                throw new System.InvalidOperationException("Queue is empty");
            else
            {
                Pair result = array[0];
                array[0] = array[array.Count - 1];
                array.RemoveAt(array.Count - 1);
                if (array.Count > 0)
                    trickleDown(0);
                return result.Val;
            }
        }
        #endregion
        #region Private methods
        private static int ParentOf(int index)
        {
            return (index - 1) / 2;
        }
        private static int LeftChildOf(int index)
        {
            return (index * 2) + 1;
        }
        private static bool ParentIsLowerPriority(Pair parent, Pair item)
        {
            return (parent.Priority < item.Priority);
        }
        // Move high priority items from the bottom up the heap
        private void bubbleUp(int index)
        {
            Pair item = array[index];
            int parent = ParentOf(index);
            while ((index > 0) && ParentIsLowerPriority(array[parent], item))
            {
                // Parent is lower priority -- move it down
                array[index] = array[parent];
                index = parent;
                parent = ParentOf(index);
            }
            // Write the item once in its correct place
            array[index] = item;
        }
        // Push low priority items from the top of the heap down
        private void trickleDown(int index)
        {
            Pair item = array[index];
            int child = LeftChildOf(index);
            while (child < array.Count)
            {
                bool rightChildExists = ((child + 1) < array.Count);
                if (rightChildExists)
                {
                    bool rightChildIsHigherPriority = (array[child].Priority < array[child + 1].Priority);
                    if (rightChildIsHigherPriority)
                        child++;
                }
                // array[child] points at the higher priority sibling -- move it up
                array[index] = array[child];
                index = child;
                child = LeftChildOf(index);
            }
            // Put the former root in its correct place
            array[index] = item;
            bubbleUp(index);
        }
        #endregion
    }
}
Median of medians algorithm: it selects the 100th-largest element in O(n) worst-case time, after which one partitioning pass collects everything above it.