Node Information for Segment Tree - c++

Problem :
Think about cars in a race as points on a line. All the cars start at the same point with an initial speed of zero. All of them move in the same direction. There are N cars in total, numbered from 1 to N.
You will be given two kinds of queries :
1. Change the speed of the car i at any time t
2. Output the current winner of the race at any time t
For query type 1, I will be given the time, Car No. and the New Speed.
For query type 2, I will be given the time at which we need to find the winning car.
Constraints :
N <= 50,000 , Queries <= 10^5
Also, the time in every query will be >= the time in the previous query.
What I tried till now :
#include<bits/stdc++.h>
using namespace std;

// arr[i] stores the car's last known position, its speed, and the time at
// which that speed was assigned: {position, {speed, time}}
pair<long long, pair<long long, long long> > arr[50005];

int main()
{
    int n, q;
    cin >> n >> q;
    for (int i = 1; i <= n; ++i) { arr[i] = {0, {0, 0}}; }
    while (q--)
    {
        int type;
        cin >> type;
        if (type == 1)
        {
            int car;
            long long time, speed;
            cin >> time >> car >> speed;
            // advance the car to its position at "time", then record the new speed
            arr[car].first += (time - arr[car].second.second) * arr[car].second.first;
            arr[car].second.first = speed;
            arr[car].second.second = time;
        }
        else
        {
            long long ans = -1, time;
            cin >> time;
            for (int i = 1; i <= n; ++i)
            {
                // position at "time" is the last position plus the distance travelled since
                long long temp = (time - arr[i].second.second) * arr[i].second.first + arr[i].first;
                // the farthest car is the current winner
                ans = max(ans, temp);
            }
            cout << ans << endl;
        }
    }
    return 0;
}
Since this approach answers each type 2 query in O(N), it is really inefficient.
Since I only need a point update and a range-maximum query, I thought of using a segment tree to speed this up to O(log N).
I was actually halfway through my code when I realised that it isn't possible to answer query type 2 with this! What I stored was essentially the same as above, with an extra variable winning_car. I thought that for a type 1 query I would just update the variables the same way I did above, but for answering which car is winning at the moment I couldn't come up with a query function that runs in O(log N). Either it is possible to write such a function (which I am not able to do), or I stored insufficient information in my node (I think it is the latter)!
What should I store in the tree node? (Or is it not possible with a segment tree?) I am not asking for ready-made code here, just the approach (if it is complex, some pseudo-code would be nice though).

Related

1838. Frequency of the Most Frequent Element leetcode C++

I am trying LeetCode problem 1838. Frequency of the Most Frequent Element:
The frequency of an element is the number of times it occurs in an array.
You are given an integer array nums and an integer k. In one operation, you can choose an index of nums and increment the element at that index by 1.
Return the maximum possible frequency of an element after performing at most k operations.
I am getting a Wrong Answer error for a specific test case.
My code
int checkfreq(vector<int> nums, int k, int i)
{
    //int sz=nums.size();
    int counter = 0;
    //int i=sz-1;
    int el = nums[i];
    while (k != 0 && i > 0)
    {
        --i;
        while (nums[i] != el && k > 0 && i >= 0)
        {
            ++nums[i];
            --k;
        }
    }
    counter = count(nums.begin(), nums.end(), el);
    return counter;
}

class Solution {
public:
    int maxFrequency(vector<int>& nums, int k) {
        sort(nums.begin(), nums.end());
        vector<int> nums2 = nums;
        auto distinct = unique(nums2.begin(), nums2.end());
        nums2.resize(distance(nums2.begin(), distinct));
        int xx = nums.size() - 1;
        int counter = checkfreq(nums, k, xx);
        for (int i = nums2.size() - 2; i >= 0; --i)
        {
            --xx;
            int temp = checkfreq(nums, k, xx);
            if (temp > counter)
                counter = temp;
        }
        return counter;
    }
};
Failing test case
Input
nums = [9968,9934,9996,9928,9934,9906,9971,9980,9931,9970,9928,9973,9930,9992,9930,9920,9927,9951,9939,9915,9963,9955,9955,9955,9933,9926,9987,9912,9942,9961,9988,9966,9906,9992,9938,9941,9987,9917,10000,9919,9945,9953,9994,9913,9983,9967,9996,9962,9982,9946,9924,9982,9910,9930,9990,9903,9987,9977,9927,9922,9970,9978,9925,9950,9988,9980,9991,9997,9920,9910,9957,9938,9928,9944,9995,9905,9937,9946,9953,9909,9979,9961,9986,9979,9996,9912,9906,9968,9926,10000,9922,9943,9982,9917,9920,9952,9908,10000,9914,9979,9932,9918,9996,9923,9929,9997,9901,9955,9976,9959,9995,9948,9994,9996,9939,9977,9977,9901,9939,9953,9902,9926,9993,9926,9906,9914,9911,9901,9912,9990,9922,9911,9907,9901,9998,9941,9950,9985,9935,9928,9909,9929,9963,9997,9977,9997,9938,9933,9925,9907,9976,9921,9957,9931,9925,9979,9935,9990,9910,9938,9947,9969,9989,9976,9900,9910,9967,9951,9984,9979,9916,9978,9961,9986,9945,9976,9980,9921,9975,9999,9922]
k = 1524
Output
Expected: 81
My code returns: 79
I tried to cover as many cases as I could. I realise this is a brute-force approach, but I don't understand why my code gives the wrong answer.
My approach is to work backwards from the last element: I pick an element as the target, convert the numbers before it into that target for as long as k allows, and count how many copies of the target I end up with. This is then repeated with every distinct element down to the smallest as the target. This is basically what I was thinking while writing this code.
The reason for the different output is that your xx index is only decreased by one at each iteration of the i loop. But that loop iterates once per unique element, while xx is an index into the original vector. When there are many duplicates, xx never comes anywhere near the start of the vector, so it misses opportunities there.
You could fix that problem by replacing:
--xx;
...with:
--xx;
while (xx >= 0 && nums[xx] == nums[xx+1]) --xx;
if (xx < 0) break;
That will solve the issue you raise. You can also drop the unique call, and the distinct, nums2 and i variables. The outer loop could just check that xx > 0.
Efficiency is your next problem
Your algorithm is not as efficient as needed, and other tests with huge input data will time out.
Hint 1: checkfreq's inner loop is incrementing nums[i] one unit at a time. Do you see a way to have it increase with a larger amount, so to avoid that inner loop?
Hint 2 (harder): checkfreq is often incrementing the same value in different calls -- even more so when k is large and the section of the vector that can be incremented is large. Can you think of a way to avoid that checkfreq needs to redo that much work in subsequent calls, and can only concentrate on what is different compared to what it had to calculate in the previous call?
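To illustrate where those hints lead (this is my sketch, not part of the original answer), here is a sliding-window version over the sorted array: the window's running sum tells you in O(1) how many increments are needed to raise the whole window to its right edge, so no element is ever incremented one step at a time and no work from the previous position is redone:
#include <bits/stdc++.h>
using namespace std;

class Solution {
public:
    int maxFrequency(vector<int>& nums, int k) {
        sort(nums.begin(), nums.end());
        long long windowSum = 0;   // sum of nums[left..right]
        int left = 0, best = 1;
        for (int right = 0; right < (int)nums.size(); ++right) {
            windowSum += nums[right];
            // Cost of raising every element in the window to nums[right] is
            // nums[right] * windowLength - windowSum; shrink while it exceeds k.
            while ((long long)nums[right] * (right - left + 1) - windowSum > k) {
                windowSum -= nums[left];
                ++left;
            }
            best = max(best, right - left + 1);
        }
        return best;
    }
};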

Finding the sequence so that the event is finished at the earliest

This is a problem from an informatics olympiad that I have been trying to solve for some time. It is important to me because it contains an underlying fundamental problem that I see in a lot of problems.
Given N citizens at an event, each of them has to program on a single computer, then eat chocolates, and then eat doughnuts. The time the ith citizen takes for each task is given as input. Each citizen has to finish the tasks in order, i.e., first program, then eat chocolate, and then eat doughnuts. Any number of people can eat chocolates or doughnuts at the same time, but since there is only one computer, only one person can program at a time; once he is done, he moves on to chocolates and the next person starts programming. The task is to find the order in which the citizens should be sent to program such that the event ends in minimum time, and this time is the output.
I worked this problem using the approach:
If I start with the ith citizen, and t(n-1) is the answer for the remaining n-1 citizens, then t(n) = max(n_i[0] + n_i[1] + n_i[2], n_i[0] + t(n-1)). E.g.:
18 7 6
23 10 27
20 9 14
then the finish times are 18+7+6, 18+23+10+27 and 18+23+20+9+14; the max would be 84. But if you start with 23, the time would be 74, which is less.
I implemented this approach; the code is below. However, its complexity is O(n!). I can see underlying repeated subproblems, so I could use a DP approach, but the problem is that I would need to store the time value for each sublist from i to j for every possible starting citizen k in that range, and so on. This storage process would again be complex and require n! storage. How do I solve this problem, and similar problems?
Here is my program implementing this approach:
#include <iostream>
#include <vector>
#include <climits>

int min_time_sequence(std::vector<std::vector<int> > Info, int N)
{
    if (N == 0) return 0;
    if (N == 1)
    {
        return Info[0][0] + Info[0][1] + Info[0][2];
    }
    int best = INT_MAX;
    for (int i = 0; i < N; ++i)
    {
        // prepare a new list without citizen i
        std::vector<std::vector<int> > tmp = Info;
        tmp.erase(tmp.begin() + i);
        int rest = min_time_sequence(tmp, N - 1);       // time for the remaining citizens
        int v1 = Info[i][0] + rest;                     // they start after i leaves the computer
        int v2 = Info[i][0] + Info[i][1] + Info[i][2];  // time until i himself finishes
        int larger = v1 > v2 ? v1 : v2;
        if (larger < best) best = larger;
    }
    return best;
}

int main()
{
    int N;
    std::cin >> N;
    std::vector<std::vector<int> > Info(N, std::vector<int>(3));
    // input
    for (int i = 0; i < N; ++i)
    {
        std::cin >> Info[i][0];
        std::cin >> Info[i][1];
        std::cin >> Info[i][2];
    }
    int mx = 0;
    if (N > 0)
        mx = min_time_sequence(Info, N);
    std::cout << mx << std::endl;
    return 0;
}
Since you asked for general techniques, you might want to look at greedy algorithms, that is, algorithms that repeatedly optimize the next selection. In this case, that might mean choosing, as the next person to program, the remaining person who will take the longest total time (the sum of the three times), so that he or she finishes eating sooner, and no one who starts later will take more time.
If such an algorithm were optimal, the program could simply sort the list by the sum of times, in decreasing order, which takes O(N log N) time.
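To make that sorting step concrete, here is a rough sketch (mine, not the answerer's) that sends people to the computer in decreasing order of their total time and then simulates the schedule; whether this particular sorting key is the right one is exactly what the proof techniques below are for:
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    int N;
    std::cin >> N;
    std::vector<std::vector<int>> t(N, std::vector<int>(3));
    for (auto& row : t)
        std::cin >> row[0] >> row[1] >> row[2];
    // Greedy order suggested above: largest total time (sum of the three) first.
    std::sort(t.begin(), t.end(), [](const std::vector<int>& a, const std::vector<int>& b) {
        return a[0] + a[1] + a[2] > b[0] + b[1] + b[2];
    });
    // Simulate: programming is sequential, eating overlaps freely.
    long long computerFree = 0, eventEnd = 0;
    for (const auto& row : t) {
        computerFree += row[0];   // this person leaves the computer
        eventEnd = std::max(eventEnd, computerFree + row[1] + row[2]);
    }
    std::cout << eventEnd << std::endl;
    return 0;
}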
You would, however, be expected to prove that your solution is valid. One way to do that is known as “Greedy Stays Ahead.” That is an inductive proof where you show that the solution your greedy algorithm produces is at least as optimal (by some measure equivalent to optimality at the final step) at its first step, then that it is also as good at its second step, the step after that, and so on. Hint: you might try measuring what is the worst-case scenario for how much time the event could need after each person starts programming. At the final step, when the last person gets to start programming, this is equivalent to optimality.
Another method to prove an algorithm is optimal is "Proof by Exchange." This is a form of proof by contradiction in which you hypothesize that some different solution is optimal, then you show that exchanging a part of that solution with a part of your solution would improve the supposedly-optimal solution. That contradicts the premise that it was optimal, which proves that no other solution is better than yours. So: assume the optimal order is different, meaning the last person to finish started after someone else who took less time. What happens if you switch the positions of those two people?
Greedy solutions are not always best, so in cases where they are not, you would want to look at other techniques, such as symmetry-breaking and pruning the search tree early.

Divide and Conquer to find maximum difference in an array

I am trying to solve a problem where given an array I need to calculate the maximum difference such that the larger element appears after the smaller element.
Here is a better problem statement:
Given the stock prices on each day for n days, what is the maximum profit a person can make by doing exactly one transaction. One transaction means that the person can buy exactly one stock on one day and sell it on a later date.
I am trying to solve this problem using a divide and conquer algorithm.
In my recursive function I am trying to split the array into two halves, but I am not sure how to proceed with the logic. Do I just get the max difference in each half and compare?
int calculateMaxDiff(int *src, int start, int end){
    if(end - start == 1) return src[start];
    int middle = (start + end) / 2;
    int half1_diff;
    int half2_diff;
    half1_diff = calculateMaxDiff(src, start, middle);
    half2_diff = calculateMaxDiff(src, middle, end);
    //Do i need to have two loops here that calculate the diffs for each halves
    ....
    return max(half1_diff, half2_diff);
}
Edit: Example output
Given the array {12, 9, 18, 3, 7, 11, 6, 15, 6, 1, 10}, it should return 12, the difference between 15 and 3.
The question in your problem can be translated into a better problem statement:
Given the stock prices on each day for n days, what is the maximum profit a person can make by doing exactly one transaction. One transaction means that the person can buy exactly one stock on one day and sell it on a later date.
The divide-and-conquer solution: Let's see if we can solve this by splitting the input in half, solving the problem in each subarray, then combining the two together. Turns out we actually can do this, and can do so efficiently! The intuition is as follows. If we have a single day, the best option is to buy on that day and then sell it back on the same day for no profit. Otherwise, split the array into two halves. If we think about what the optimal answer might be, it must be in one of three places:
1. The correct buy/sell pair occurs completely within the first half.
2. The correct buy/sell pair occurs completely within the second half.
3. The correct buy/sell pair occurs across both halves - we buy in the first half, then sell in the second half.
We can get the values for (1) and (2) by recursively invoking our algorithm on the first and second halves. For option (3), the way to make the highest profit is to buy at the lowest point in the first half and sell at the highest point in the second half. We can find the minimum and maximum values in the two halves with a simple linear scan over the input. This gives us an algorithm with the following recurrence:
T(n) = 2T(n/2) + O(n)
T(n) = O(n log n)
Here is a simple implementation in Python. It's very simple to understand and it's also easy to convert to C++:
def DivideAndConquerSingleSellProfit(arr):
    # Base case: If the array has zero or one elements in it, the maximum
    # profit is 0.
    if len(arr) <= 1:
        return 0
    # Cut the array into two roughly equal pieces.
    left = arr[ : len(arr) // 2]
    right = arr[len(arr) // 2 : ]
    # Find the values for buying and selling purely in the left or purely in
    # the right.
    leftBest = DivideAndConquerSingleSellProfit(left)
    rightBest = DivideAndConquerSingleSellProfit(right)
    # Compute the best profit for buying in the left and selling in the right.
    crossBest = max(right) - min(left)
    # Return the best of the three
    return max(leftBest, rightBest, crossBest)
Edit: Here is the C++ implementation for the above algorithm
#include <iostream>
#include <algorithm>
using namespace std;

int calculateMin(int a[], int low, int high)
{
    int i, mini;
    mini = a[low];
    for (i = low; i <= high; i++)
    {
        if (a[i] < mini)
        {
            mini = a[i];
        }
    }
    return mini;
}

int calculateMax(int a[], int low, int high)
{
    int i, maxi;
    maxi = a[low];
    for (i = low; i <= high; i++)
    {
        if (a[i] > maxi)
        {
            maxi = a[i];
        }
    }
    return maxi;
}

int calculateMaxDiff(int a[], int low, int high)
{
    if (low >= high)
        return 0;
    int mid = (low + high) / 2;
    int leftPartition = calculateMaxDiff(a, low, mid);
    int rightPartition = calculateMaxDiff(a, mid + 1, high);
    int left = calculateMin(a, low, mid);        // calculate the min value in the left partition
    int right = calculateMax(a, mid + 1, high);  // calculate the max value in the right partition
    return max(max(leftPartition, rightPartition), (right - left));
}

int main() {
    int arr[] = {12, 9, 18, 3, 7, 11, 6, 15, 6, 1, 10};
    int len = sizeof(arr) / sizeof(arr[0]);
    int ans = calculateMaxDiff(arr, 0, len - 1);
    cout << "Maximum Profit: " << ans << endl;
    return 0;
}
Hope it helps!!!
There is no need for a complicated divide-and-conquer algorithm, because a simple loop with a check like
maxdiff = max(current - min_so_far, maxdiff)
update min_so_far
solves the problem.
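For concreteness, here is my sketch of that single-pass idea (function and variable names are my own):
#include <algorithm>
#include <vector>

// One pass: track the smallest value seen so far and the best difference
// obtainable by "selling" at the current element.
int maxDiff(const std::vector<int>& a)
{
    if (a.size() < 2) return 0;
    int minSoFar = a[0];
    int best = 0;   // reports 0 when no positive difference exists (an assumption on my part)
    for (std::size_t i = 1; i < a.size(); ++i) {
        best = std::max(best, a[i] - minSoFar);
        minSoFar = std::min(minSoFar, a[i]);
    }
    return best;
}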
If you really want to apply the divide and conquer method, you may return a triplet {local_min, local_max, local_max_diff} from the recursive function, like:
left = calculateMaxDiff(start, middle)
right = calculateMaxDiff(middle + 1, end)
return {min(left.local_min, right.local_min),
        max(left.local_max, right.local_max),
        max(left.local_max_diff, right.local_max_diff, right.local_max - left.local_min)}
The key for a divide and conquer algorithm is the conquer part.
For this problem the most important condition is:
the larger element appears after the smaller element
For an array src, after dividing src into two halves, half1 and half2, suppose the answer uses positions i and j. There are 3 cases:
1. i and j are both in half1 -> half1_diff
2. i and j are both in half2 -> half2_diff
3. i is in half1 and j is in half2
So the main part is dealing with case 3. Since the larger element comes after, we just need to find the minimum value min_half1 in half1 and the maximum value max_half2 in half2, check whether max_half2 >= min_half1 holds, and update the result as max(half1_diff, half2_diff, max_half2 - min_half1).
To calculate min_half1 and max_half2 efficiently, keep a record of the min and max value of each range as part of the recursion, so this step takes O(1) time.
So T(n) = 2T(n/2) + O(1), T(n) = O(n).
Check the example for more details
http://ideone.com/TbIL2r
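Here is my C++ sketch of the triplet-returning recursion that the last two answers describe (struct and field names are my own, not taken from the linked example):
#include <algorithm>
#include <vector>

struct Result {
    int localMin;   // minimum value in this range
    int localMax;   // maximum value in this range
    int maxDiff;    // best "smaller before larger" difference in this range
};

// Returns the triplet for src[start..end] (inclusive).
Result solve(const std::vector<int>& src, int start, int end)
{
    if (start == end)
        return {src[start], src[start], 0};
    int middle = (start + end) / 2;
    Result left = solve(src, start, middle);
    Result right = solve(src, middle + 1, end);
    return {std::min(left.localMin, right.localMin),
            std::max(left.localMax, right.localMax),
            std::max({left.maxDiff, right.maxDiff, right.localMax - left.localMin})};
}
Because combining the two halves is O(1), this matches the T(n) = 2T(n/2) + O(1) = O(n) recurrence above.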

Understanding Sum of subsets

I've just started learning backtracking algorithms at college. Somehow I've managed to make a program for the Subset-Sum problem. It works fine, but then I discovered that my program doesn't give out all the possible combinations.
For example : There might be a hundred combinations to a target sum but my program gives only 30.
Here is the code. It would be a great help if anyone could point out what my mistake is.
#include <iostream>
using namespace std;

int tot = 0;                            // tot is the total sum of all the numbers in the set
int prob[500], d, s[100], top = -1, n;  // n = number of elements in the set; prob[] holds the set

void show();  // displays the integer array 's' (defined elsewhere in my program)

void subset()
{
    int i = 0, sum = 0;  // sum is updated at every iteration and checked against d
    while (i < n)
    {
        if ((sum + prob[i] <= d) && (prob[i] <= d))
        {
            s[++top] = i;
            sum += prob[i];
        }
        if (sum == d)     // d is the target sum
        {
            show();       // this function just displays the integer array 's'
            top = -1;     // top points to the most recent number added to the int array 's'
            i = s[top + 1];
            sum = 0;
        }
        i++;
        while (i == n && top != -1)
        {
            sum -= prob[s[top]];
            i = s[top--] + 1;
        }
    }
}

int main()
{
    cout << "Enter number of elements : "; cin >> n;
    cout << "Enter required sum : ";       cin >> d;
    cout << "Enter SET :\n";
    for (int i = 0; i < n; i++)
    {
        cin >> prob[i];
        tot += prob[i];
    }
    if (d <= tot)
    {
        subset();
    }
    return 0;
}
When I run the program :
Enter number of elements : 7
Enter the required sum : 12
Enter SET :
4 3 2 6 8 12 21
SOLUTION 1 : 4, 2, 6
SOLUTION 2 : 12
Although 4, 8 is also a solution, my program doesn't show it.
It's even worse when the number of inputs is 100 or more: there will be at least 10000 combinations, but my program shows only 100.
The logic I am trying to follow:
1. Take elements of the main SET into a subset as long as the sum of the subset remains less than or equal to the target sum.
2. If adding a particular number to the subset sum would make it larger than the target, don't take it.
3. Once it reaches the end of the set and an answer has not been found, remove the most recently taken number from the subset and start looking at the numbers in the positions after the position of the number just removed (since what I store in the array 's' is the positions of the selected numbers from the main SET).
The solutions you are going to find depend on the order of the entries in the set due to your "as long as" clause in step 1.
If you take entries as long as they don't get you over the target, once you've taken e.g. '4' and '2', '8' will take you over the target, so as long as '2' is in your set before '8', you'll never get a subset with '4' and '8'.
You should either add a possibility to skip adding an entry (or add it to one subset but not to another) or change the order of your set and re-examine it.
It may be that a stack-free solution is possible, but the usual (and generally easiest!) way to implement backtracking algorithms is through recursion, e.g.:
int i = 0, n;   // i needs to be visible to show()
int s[100];

// Considering only the subset of prob[] values whose indexes are >= start,
// print all subsets that sum to total.
void new_subsets(int start, int total) {
    if (total == 0) show();   // total == 0 means we already have a solution
    // Look for the next number that could fit
    while (start < n && prob[start] > total) {
        ++start;
    }
    if (start < n) {
        // We found a number, prob[start], that can be added without overflow.
        // Try including it by solving the subproblem that results.
        s[i++] = start;
        new_subsets(start + 1, total - prob[start]);
        i--;
        // Now try excluding it by solving the subproblem that results.
        new_subsets(start + 1, total);
    }
}
You would then call this from main() with new_subsets(0, d);. Recursion can be tricky to understand at first, but it's important to get your head around it -- try easier problems (e.g. generating Fibonacci numbers recursively) if the above doesn't make any sense.
Working instead with the solution you have given, one problem I can see is that as soon as you find a solution, you wipe it out and start looking for a new solution from the number to the right of the first number that was included in this solution (top = -1; i = s[top+1]; implies i = s[0], and there is a subsequent i++;). This will miss solutions that begin with the same first number. You should just do if (sum == d) { show(); } instead, to make sure you get them all.
I initially found your inner while loop pretty confusing, but I think it's actually doing the right thing: once i hits the end of the array, it will delete the last number added to the partial solution, and if this number was the last number in the array, it will loop again to delete the second-to-last number from the partial solution. It can never loop more than twice because numbers included in a partial solution are all at distinct positions.
I haven't analysed the algorithm in detail, but what struck me is that your algorithm doesn't account for the possibility that, after having one solution that starts with number X, there could be multiple solutions starting with that number.
A first improvement would be to avoid resetting your stack s and the running sum after you printed the solution.

Fastest way to obtain the largest X numbers from a very large unsorted list?

I'm trying to obtain the top, say, 100 scores from a list of scores being generated by my program. Unfortunately the list is huge (on the order of millions to billions), so sorting is a time-intensive portion of the program.
What's the best way of doing the sorting to get the top 100 scores?
The only two methods I can think of so far are either generating all the scores into a massive array and then sorting it and taking the top 100, or generating X scores at a time, sorting them and truncating to the top 100 scores, then continuing to generate more scores, adding them to the truncated list and then sorting it again.
Either way I do it, it still takes more time than I would like. Any ideas on how to do it in an even more efficient way? (I've never taken programming courses before; maybe those of you with comp sci degrees know about efficient algorithms to do this, at least that's what I'm hoping.)
Lastly, what's the sorting algorithm used by the standard sort() function in C++?
Thanks,
-Faken
Edit: Just for anyone who is curious...
I did a few time trials on the before and after and here are the results:
Old program (performs sorting after each outer loop iteration):
top 100 scores: 147 seconds
top 10 scores: 147 seconds
top 1 scores: 146 seconds
Sorting disabled: 55 seconds
new program (implementing tracking of only top scores and using default sorting function):
top 100 scores: 350 seconds <-- hmm...worse than before
top 10 scores: 103 seconds
top 1 scores: 69 seconds
Sorting disabled: 51 seconds
new rewrite (optimizations in data stored, hand written sorting algorithm):
top 100 scores: 71 seconds <-- Very nice!
top 10 scores: 52 seconds
top 1 scores: 51 seconds
Sorting disabled: 50 seconds
Done on a Core 2, 1.6 GHz... I can't wait till my Core i7 860 arrives...
There are a lot of other, even more aggressive optimizations for me to work out (mainly in the area of reducing the number of iterations I run), but as it stands right now the speed is more than good enough, so I might not even bother to work out those algorithm optimizations.
Thanks to everyone for their input!
1. Take the first 100 scores and sort them in an array.
2. Take the next score and insertion-sort it into the array (starting at the "small" end).
3. Drop the 101st value.
4. Continue with the next value, at step 2, until done.
Over time, the list will resemble the 100 largest values more and more, so more and more often you will find that the insertion sort aborts immediately, because the new value is smaller than the smallest of the candidates for the top 100.
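A small C++ sketch of that bounded insertion idea (my own illustration; the C# snippet in a later answer implements essentially the same thing):
#include <vector>

// Keep `top` sorted in ascending order, never holding more than `keep` scores.
void offer(std::vector<int>& top, std::size_t keep, int score)
{
    if (top.size() == keep) {
        // Most new scores fail this single comparison once the list has warmed up.
        if (score <= top.front())
            return;
        top.erase(top.begin());   // drop what would become the 101st value
    }
    // Insertion-sort the new score in, starting from the "small" end.
    std::size_t pos = 0;
    while (pos < top.size() && top[pos] < score)
        ++pos;
    top.insert(top.begin() + pos, score);
}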
You can do this in O(n) time, without any sorting, using a heap:
#!/usr/bin/python
import heapq

def top_n(l, n):
    top_n = []
    smallest = None
    for elem in l:
        if len(top_n) < n:
            top_n.append(elem)
            if len(top_n) == n:
                heapq.heapify(top_n)
                smallest = heapq.nsmallest(1, top_n)[0]
        else:
            if elem > smallest:
                heapq.heapreplace(top_n, elem)
                smallest = heapq.nsmallest(1, top_n)[0]
    return sorted(top_n)

def random_ints(n):
    import random
    for i in range(0, n):
        yield random.randint(0, 10000)

print top_n(random_ints(1000000), 100)
Times on my machine (Core2 Q6600, Linux, Python 2.6, measured with bash time builtin):
100000 elements: .29 seconds
1000000 elements: 2.8 seconds
10000000 elements: 25.2 seconds
Edit/addition: In C++, you can use std::priority_queue in much the same way as Python's heapq module is used here. You'll want to use the std::greater ordering instead of the default std::less, so that the top() member function returns the smallest element instead of the largest one. C++'s priority queue doesn't have the equivalent of heapreplace, which replaces the top element with a new one, so instead you'll want to pop the top (smallest) element and then push the newly seen value. Other than that the algorithm translates quite cleanly from Python to C++.
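For instance, a sketch of that translation might look like this (my code, not the original answer's):
#include <functional>
#include <iostream>
#include <queue>
#include <vector>

// Keep the 100 largest values seen so far in a min-heap: the smallest of the
// current top 100 sits at top(), ready to be evicted when a bigger value arrives.
int main()
{
    const std::size_t N = 100;
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    int value;
    while (std::cin >> value) {
        if (heap.size() < N) {
            heap.push(value);
        } else if (value > heap.top()) {
            heap.pop();          // no heapreplace in C++: pop the smallest...
            heap.push(value);    // ...then push the newly seen value
        }
    }
    // The heap now holds the largest N values, smallest first.
    while (!heap.empty()) {
        std::cout << heap.top() << '\n';
        heap.pop();
    }
    return 0;
}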
Here's the 'natural' C++ way to do this:
std::vector<Score> v;
// fill in v
std::partial_sort(v.begin(), v.begin() + 100, v.end(), std::greater<Score>());
std::sort(v.begin(), v.begin() + 100);
This is linear in the number of scores.
The algorithm used by std::sort isn't specified by the standard, but libstdc++ (used by g++) uses an "adaptive introsort", which is essentially a median-of-3 quicksort down to a certain level, followed by an insertion sort.
Declare an array where you can put the 100 best scores. Loop through the huge list and check for each item if it qualifies to be inserted in the top 100. Use a simple insert sort to add an item to the top list.
Something like this (C# code, but you get the idea):
Score[] toplist = new Score[100];
int size = 0;
foreach (Score score in hugeList) {
    int pos = size;
    while (pos > 0 && toplist[pos - 1] < score) {
        pos--;
        if (pos < 99) toplist[pos + 1] = toplist[pos];
    }
    if (size < 100) size++;
    if (pos < size) toplist[pos] = score;
}
I tested it on my computer (Core 2 Duo 2.54 GHz, Win 7 x64) and I can process 100,000,000 items in 369 ms.
Since speed is of the essence here, and 40,000 possible highscore values is totally maintainable by any of today's computers, I'd resort to bucket sort for simplicity. My guess is that it would outperform any of the algorithms proposed thus far. The downside is that you'd have to determine some upper limit for the highscore values.
So, let's assume your max highscore value is 40,000:
Make an array of 40,000 entries. Loop through your highscore values. Each time you encounter highscore x, increase your array[x] by one. After this, all you have to do is count the top entries in your array until you have reached 100 counted highscores.
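A rough C++ rendering of that counting idea (mine; MAX_SCORE is an assumed upper bound, and the scores are read from stdin purely for illustration):
#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    const int MAX_SCORE = 40000;   // assumed upper limit on a score
    std::vector<long long> count(MAX_SCORE + 1, 0);
    int score;
    while (std::cin >> score)
        ++count[score];            // tally each highscore value
    // Walk down from the highest possible value until 100 scores have been reported.
    long long remaining = 100;
    for (int v = MAX_SCORE; v >= 0 && remaining > 0; --v) {
        long long take = std::min(count[v], remaining);
        for (long long i = 0; i < take; ++i)
            std::cout << v << '\n';
        remaining -= take;
    }
    return 0;
}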
You can do it in Haskell like this:
largest100 xs = take 100 $ sortBy (flip compare) xs
This looks like it sorts all the numbers into descending order (the "flip compare" bit reverses the arguments to the standard comparison function) and then returns the first 100 entries from the list. But Haskell is lazily evaluated, so the sortBy function does just enough sorting to find the first 100 numbers in the list, and then stops.
Purists will note that you could also write the function as
largest100 = take 100 . sortBy (flip compare)
This means just the same thing, but illustrates the Haskell style of composing a new function out of the building blocks of other functions rather than handing variables around the place.
You want the absolute largest X numbers, so I'm guessing you don't want some sort of heuristic. How unsorted is the list? If it's pretty random, your best bet really is just to do a quick sort on the whole list and grab the top X results.
If you can filter scores during the list generation, that's way way better. Only ever store X values, and every time you get a new value, compare it to those X values. If it's less than all of them, throw it out. If it's bigger than one of them, throw out the new smallest value.
If X is small enough, you can even keep your list of X values sorted. Comparing your new number against a sorted list lets you make an O(1) check to see whether it is smaller than all of them and thus throw it out; otherwise, a quick binary search can find where the new value goes in the list, and then you can throw away the first value of the array (assuming the first element is the smallest).
Place the data into a balanced Tree structure (probably Red-Black tree) that does the sorting in place. Insertions should be O(lg n). Grabbing the highest x scores should be O(lg n) as well.
You can prune the tree every once in a while if you find you need optimizations at some point.
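As an illustration (my sketch, using std::multiset as the balanced tree and pruning it on every insertion so it never holds more than 100 entries):
#include <iostream>
#include <set>

int main()
{
    const std::size_t KEEP = 100;
    std::multiset<int> top;            // typically a red-black tree under the hood
    int score;
    while (std::cin >> score) {
        top.insert(score);             // O(log n) insertion
        if (top.size() > KEEP)
            top.erase(top.begin());    // prune the smallest element
    }
    // Print the kept scores from largest to smallest.
    for (auto it = top.rbegin(); it != top.rend(); ++it)
        std::cout << *it << '\n';
    return 0;
}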
If you only need to report the value of top 100 scores (and not any associated data), and if you know that the scores will all be in a finite range such as [0,100], then an easy way to do it is with "counting sort"...
Basically, create an array representing all possible values (e.g. an array of size 101 if scores can range from 0 to 100 inclusive), and initialize all the elements of the array with a value of 0. Then, iterate through the list of scores, incrementing the corresponding entry in the list of achieved scores. That is, compile the number of times each score in the range has been achieved. Then, working from the end of the array to the beginning of the array, you can pick out the top X scores. Here is some pseudo-code:
let type Score be an integer ranging from 0 to 100, inclusive.
let scores be an array of Score objects
let scorerange be an array of integers of size 101.

for i in [0,100]
    set scorerange[i] = 0
for each score in scores
    set scorerange[score] = scorerange[score] + 1

let top be the number of top scores to report
let idx be an integer initialized to the end of scorerange (i.e. 100)
while (top > 0) and (idx >= 0):
    if scorerange[idx] > 0:
        report "There are " scorerange[idx] " scores with value " idx
        top = top - scorerange[idx]
    idx = idx - 1
I answered this question in response to an interview question in 2008. I implemented a templatized priority queue in C#.
using System;
using System.Collections.Generic;
using System.Text;

namespace CompanyTest
{
    // Based on pre-generics C# implementation at
    // http://www.boyet.com/Articles/WritingapriorityqueueinC.html
    // and wikipedia article
    // http://en.wikipedia.org/wiki/Binary_heap
    class PriorityQueue<T>
    {
        struct Pair
        {
            T val;
            int priority;
            public Pair(T v, int p)
            {
                this.val = v;
                this.priority = p;
            }
            public T Val { get { return this.val; } }
            public int Priority { get { return this.priority; } }
        }
        #region Private members
        private System.Collections.Generic.List<Pair> array = new System.Collections.Generic.List<Pair>();
        #endregion
        #region Constructor
        public PriorityQueue()
        {
        }
        #endregion
        #region Public methods
        public void Enqueue(T val, int priority)
        {
            Pair p = new Pair(val, priority);
            array.Add(p);
            bubbleUp(array.Count - 1);
        }
        public T Dequeue()
        {
            if (array.Count <= 0)
                throw new System.InvalidOperationException("Queue is empty");
            else
            {
                Pair result = array[0];
                array[0] = array[array.Count - 1];
                array.RemoveAt(array.Count - 1);
                if (array.Count > 0)
                    trickleDown(0);
                return result.Val;
            }
        }
        #endregion
        #region Private methods
        private static int ParentOf(int index)
        {
            return (index - 1) / 2;
        }
        private static int LeftChildOf(int index)
        {
            return (index * 2) + 1;
        }
        private static bool ParentIsLowerPriority(Pair parent, Pair item)
        {
            return (parent.Priority < item.Priority);
        }
        // Move high priority items from bottom up the heap
        private void bubbleUp(int index)
        {
            Pair item = array[index];
            int parent = ParentOf(index);
            while ((index > 0) && ParentIsLowerPriority(array[parent], item))
            {
                // Parent is lower priority -- move it down
                array[index] = array[parent];
                index = parent;
                parent = ParentOf(index);
            }
            // Write the item once in its correct place
            array[index] = item;
        }
        // Push low priority items from the top of the heap down
        private void trickleDown(int index)
        {
            Pair item = array[index];
            int child = LeftChildOf(index);
            while (child < array.Count)
            {
                bool rightChildExists = ((child + 1) < array.Count);
                if (rightChildExists)
                {
                    bool rightChildIsHigherPriority = (array[child].Priority < array[child + 1].Priority);
                    if (rightChildIsHigherPriority)
                        child++;
                }
                // array[child] points at higher priority sibling -- move it up
                array[index] = array[child];
                index = child;
                child = LeftChildOf(index);
            }
            // Put the former root in its correct place
            array[index] = item;
            bubbleUp(index);
        }
        #endregion
    }
}
Median of medians algorithm.
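That one-line answer refers to a selection algorithm: pick out the 100th largest score without fully sorting, and everything on its side of the partition is your answer. In C++ the same effect is available through std::nth_element (typically implemented with introselect rather than a strict median-of-medians, but it plays the same role); a sketch of mine, reading scores from stdin for illustration:
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> scores;   // assume this gets filled by the score generator
    int s;
    while (std::cin >> s)
        scores.push_back(s);
    const std::size_t N = 100;
    if (scores.size() > N) {
        // Partition so the N largest values occupy the first N slots
        // (in no particular order), in linear expected time.
        std::nth_element(scores.begin(), scores.begin() + N, scores.end(),
                         std::greater<int>());
        scores.resize(N);
    }
    // Sort just the survivors if the top list itself needs to be ordered.
    std::sort(scores.begin(), scores.end(), std::greater<int>());
    for (int v : scores)
        std::cout << v << '\n';
    return 0;
}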