Better method to search array? - c++

I have an array (nodes[][]) that contains values of effective distances that looks something like this:
| 1    0.4  3        |
| 0.4  1    0        |
| 3    3.2  1   ...  |
| 0.8  4    5        |
| 0    0    1        |
Where the first value, node[0][0], is the distance from node 0 to node 0, which is 1.
So the distance from node 2 to node 1 is 3.2 (node[2][1] = 3.2).
I need, given a node column, to search through the rows to find the farthest distance, while not picking the node itself (e.g. node[1][1]).
The method I was thinking of is something like this:
int n = 0;
currentnode = 0;    // this is the column I am searching now
if (currentnode == n)
    n++;
best = node[n][currentnode];
nextbest = node[n++][currentnode];
if (nextbest > best)
    best = nextbest;
else
    for (int x = n; x < max; x++)   // max is the last column
    {
        if (currentnode == x)
            continue;
        nextbest = node[x][currentnode];
        if (nextbest > best)
            best = nextbest;
    }
I can't think of a better method to do this. I could use functions to make it shorter, but this is GENERALLY what I am thinking about using. After this I have to loop this to go to the next column that the best distance returns and do this routine again.

As always when trying to optimize, you have to make a choice:
Do you want the cost during insertion, or during search?
If you have few insertions and a lot of searches to do in the container, then you need a sorted container. Finding the maximum will be O(1) - i.e. just pick the last element.
If you have a lot of insertions and few searches, then you can stay with an unsorted container, and finding a maximum is O(n) - i.e. you have to check all values at least once to pick the maximum.
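For concreteness, here is a minimal sketch of the two options for a single column of distances (the names are mine, and the "skip the node itself" detail from the question is left out):
#include <set>
#include <vector>

// Option A: keep the column sorted as you insert (std::multiset), so the
//           maximum is always the last element -- O(log n) insert, O(1) max.
// Option B: keep a plain vector and scan for the maximum on demand --
//           O(1) amortized insert, O(n) max.
std::multiset<double> sortedColumn;   // option A
std::vector<double>   plainColumn;    // option B

void insertDistance(double d) {
    sortedColumn.insert(d);           // O(log n)
    plainColumn.push_back(d);         // O(1) amortized
}

double maxSorted() { return *sortedColumn.rbegin(); }   // O(1), assumes non-empty

double maxPlain() {                                     // O(n), assumes non-empty
    double best = plainColumn.front();
    for (double d : plainColumn)
        if (d > best) best = d;
    return best;
}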

You can simplify it quite a bit. A lot of your checks and temporary variables are redundant. Here's a small function that performs your search. I've renamed most of the variables to be a little more precise about what their roles are.
double maxDistance(int fromNode) {
    double max = -1.0;                      // distances are assumed non-negative
    for (int toNode = 0; toNode < nodeCount; ++toNode)
    {
        if (fromNode != toNode && nodes[toNode][fromNode] > max) {
            max = nodes[toNode][fromNode];
        }
    }
    return max;
}

If you are willing to sacrifice some space, you could add additional arrays to keep track of the maximum distance seen so far for a particular column/row and the node that that distance corresponds to.
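A hedged sketch of that bookkeeping (all names and sizes here are my own; note the cached maximum goes stale if you ever lower an existing distance):
const int MAX_NODES = 100;                 // hypothetical size
double nodes[MAX_NODES][MAX_NODES];        // the distance matrix from the question

double bestDist[MAX_NODES];                // farthest distance seen in each column so far
int    bestNode[MAX_NODES];                // the row (node) that distance came from

// Call this every time you store a distance into nodes[row][col].
void setDistance(int row, int col, double d) {
    nodes[row][col] = d;
    if (row != col && d > bestDist[col]) { // never count the node itself
        bestDist[col] = d;
        bestNode[col] = row;
    }
}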

Profile it. Unless this is a major bottleneck, I'd favour clarity (maintainability) over cleverness.
Looping linearly over arrays is something that modern processors do rather well, and the O(N) approach often works just fine.
With thousands of nodes, I'd expect your old Pentium III to be able to do a few gazillion a second! :)

efficiently mask-out exactly 30% of array with 1M entries

My question's header is similar to this link, however that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over the elements and roll a die for each one. Doing it in a non-interrupted manner is good for cache locality.
As soon as I notice that exactly 300 000 of elements were indeed masked, I need to stop. However, I might reach the end of an array and have only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, without being biased towards picking some elements?
Edit:
I need to preserve the order of elements. For instance, I might have:
[12, 14, 1, 24, 5, 8]
Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero.
Just do a Fisher-Yates shuffle but stop at only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
    std::size_t r = std::rand() % size;
    std::swap(array[r], array[size-1]);
    --size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
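For instance, a sketch of the same partial shuffle using <random> instead (the seeding and the container type are my assumptions):
#include <random>
#include <utility>
#include <vector>

void maskSelect(std::vector<int>& array)   // hypothetical wrapper; array.size() == 1000000
{
    std::mt19937 gen(std::random_device{}());
    std::size_t size = array.size();
    for (std::size_t i = 0; i < 300000; ++i)
    {
        std::uniform_int_distribution<std::size_t> dist(0, size - 1);
        std::swap(array[dist(gen)], array[size - 1]);
        --size;
    }
}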
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
    std::size_t r = rand() % 1000000;
    if(array[r] != 0)                // note: this assumes no element is 0 to begin with
    {
        array[r] = 0;
        ++i;
    }
}
This has no bias and does not reorder elements, but it is inferior to Fisher-Yates, especially for high percentages.
When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take a decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );

void mask_random_sublist( int *list, int list_count, int sub_count )
{
    if (list_count > SOME_SMALL_THRESHOLD)
    {
        int list_count_lo = list_count / 2; // arbitrary
        int list_count_hi = list_count - list_count_lo;
        int sub_count_lo = random_split_sub_count_lo( list_count, sub_count, list_count_lo );
        int sub_count_hi = sub_count - sub_count_lo;
        mask_random_sublist( list, list_count_lo, sub_count_lo );
        mask_random_sublist( list + list_count_lo, list_count_hi, sub_count_hi );
    }
    else
    {
        // insert here some simple/obvious/naive implementation that
        // would be ludicrous to use on a massive list due to complexity,
        // but which works great on very small lists. I'm assuming you
        // can do this part yourself.
    }
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.
Here's one suggestion. One million bits is only 128K which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct a collection of all the indices and, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones):
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.
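A rough sketch of how the bit array and the Fisher-Yates index trick could fit together (function and variable names are mine):
#include <numeric>
#include <random>
#include <utility>
#include <vector>

void maskThirty(std::vector<int>& data)           // data.size() == 1000000 in the question
{
    const std::size_t n = data.size();
    const std::size_t k = n * 3 / 10;             // exactly 30%

    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);         // 0, 1, 2, ..., n-1

    std::mt19937 gen(std::random_device{}());
    std::vector<bool> masked(n, false);           // the "bit array"

    // Partial Fisher-Yates: pick k distinct indices, no duplicates possible.
    for (std::size_t i = 0; i < k; ++i) {
        std::uniform_int_distribution<std::size_t> d(i, n - 1);
        std::swap(idx[i], idx[d(gen)]);
        masked[idx[i]] = true;
    }

    // One linear pass in original order to apply the mask.
    for (std::size_t i = 0; i < n; ++i)
        if (masked[i]) data[i] = 0;
}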
You want reservoir sampling. Sample code courtesy of Wikipedia:
(* S has items to sample, R will contain the result *)
ReservoirSample(S[1..n], R[1..k])
    // fill the reservoir array
    for i = 1 to k
        R[i] := S[i]
    // replace elements with gradually decreasing probability
    for i = k+1 to n
        j := random(1, i) // important: inclusive range
        if j <= k
            R[j] := S[i]
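Applied to the masking question, one way to use this (a sketch; names and the choice of <random> are mine) is to reservoir-sample 300000 of the indices and zero those positions, leaving the order of the remaining elements untouched:
#include <random>
#include <vector>

void maskViaReservoir(std::vector<int>& data, std::size_t k)   // k = 300000
{
    const std::size_t n = data.size();
    std::mt19937 gen(std::random_device{}());

    std::vector<std::size_t> reservoir(k);
    for (std::size_t i = 0; i < k; ++i)          // fill the reservoir with the first k indices
        reservoir[i] = i;

    for (std::size_t i = k; i < n; ++i) {        // replace with gradually decreasing probability
        std::uniform_int_distribution<std::size_t> d(0, i);   // inclusive, as in the pseudocode
        std::size_t j = d(gen);
        if (j < k)
            reservoir[j] = i;
    }

    for (std::size_t idx : reservoir)            // zero the sampled positions in place
        data[idx] = 0;
}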

2 player team knowing maximum moves

Given a list of N players who are to play a 2-player game. Each of them is either well versed in making a particular move or not. Find out the maximum number of moves a 2-player team can know.
Also find out how many teams can know that maximum number of moves.
Example: Suppose we have 4 players and 5 moves, where the ith player is versed in the jth move if a[i][j] is 1, and 0 otherwise.
10101
11100
11010
00101
Here the maximum number of moves a 2-player team can know is 5, and there are two teams that can know that maximum number of moves.
Explanation: (1, 3) and (3, 4) know all the 5 moves. So the maximal number of moves a 2-player team knows is 5, and only 2 teams can achieve this.
My approach: For each pair of players, I check for each move whether either player is versed in it, and for each player I maintain the maximum number of moves a pair containing him can know, together with how many pairs reach that local maximum.
vector<int> pairmemo;
vector<int> maxmemo;
for(int i=0;i<n;i++){
    int mymax=INT_MIN;
    int countpairs=0;
    for(int j=i+1;j<n;j++){
        int count=0;
        for(int k=0;k<m;k++){
            if(arr[i][k]==1 || arr[j][k]==1)
            {
                count++;
            }
        }
        if(mymax<count){
            mymax=count;
            countpairs=0;
        }
        if(mymax==count){
            countpairs++;
        }
    }
    pairmemo.push_back(countpairs);
    maxmemo.push_back(mymax);
}
The overall maximum over all N players is the answer, and the count is the corresponding sum of the pairs calculated above.
int maxi=INT_MIN;
for(int i=0;i<n;i++){
    if(maxi<maxmemo[i])
        maxi=maxmemo[i];
}
int countmaxi=0;
for(int i=0;i<n;i++){
    if(maxmemo[i]==maxi){
        countmaxi+=pairmemo[i];
    }
}
cout<<maxi<<"\n";
cout<<countmaxi<<"\n";
Time complexity : O((N^2)*M)
How can I improve it?
Constraints: N <= 3000 and M <= 1000
If you represent each set of moves by a very large integer, the problem boils down to finding the pair of players (I, J) which has the maximum number of bits set in MovesI OR MovesJ.
So, you can use bit-packing and compress all the information on moves into an array of 64-bit unsigned integers. It would take 16 such integers per player according to the constraints. So, for each pair of players you OR the corresponding arrays and count the number of ones. This would take O(N^2 * 16), which would run pretty fast given the constraints.
Example:
Let's say the given matrix is
11010
00011
and you used 4-bit integers for packing it.
It would look like:
1101-0000
0001-1000
that is,
13,0
1,8
After ORing, the moves array for the 2-player team becomes 13,8; now count the bits which are one. You have to optimize the counting of bits as well (for that, read the accepted answer here), otherwise the factor M would appear in the complexity. Just maintain one count variable and one maxNumberOfBitsSet variable as you process the pairs.
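A sketch of how the packing and popcount might look with std::bitset, which handles both for you (the input-reading part is my assumption):
#include <bitset>
#include <iostream>
#include <string>
#include <vector>

const int MAXM = 1000;                            // M <= 1000 from the constraints

int main()
{
    int n, m;
    std::cin >> n >> m;
    std::vector<std::bitset<MAXM>> moves(n);
    for (int i = 0; i < n; ++i) {
        std::string row;                          // e.g. "10101"
        std::cin >> row;
        for (int j = 0; j < m; ++j)
            if (row[j] == '1') moves[i].set(j);
    }

    int best = 0;
    long long teams = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            int known = (int)(moves[i] | moves[j]).count();   // OR + popcount
            if (known > best) { best = known; teams = 1; }
            else if (known == best) ++teams;
        }
    std::cout << best << "\n" << teams << "\n";
}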
What I'll do is:
1. Compute the logical OR between all the possible pairs - O(N^2) - and store its bit SUM in a 2D array, with the symmetric diagonal ignored (that way we save half of the calculation - see the example).
2. Find the max value in the 2D array (can be done while doing task 1) -> O(1).
3. Count how many cells in the 2D array equal the maximum value from task 2 - O(N^2).
Total: 2*O(N^2) + O(1) => O(N^2)
Example (using the data in the question, with letter indexes):
A[10101] B[11100] C[11010] D[00101]
Task 1:
[A|B] = 11101 = SUM(4)
[A|C] = 11111 = SUM(5)
[A|D] = 10101 = SUM(3)
[B|C] = 11110 = SUM(4)
[B|D] = 11101 = SUM(4)
[C|D] = 11111 = SUM(5)
Task 2 (done while doing task 1):
Max = 5
Task 3:
Count = 2
By the way, O(N^2) is the minimum possible since you HAVE to check all the possible pairs.
Since you have to find all solutions, unless you find a way to find a count without actually finding the solutions themselves, you have to actually look at or eliminate all possible solutions. So the worst case will always be O(N^2*M), which I'll call O(n^3) as long as N and M are both big and similar size.
However, you can hope for much better performance on the average case by pruning.
Don't check every case. Find ways to eliminate combinations without checking them.
I would sum and store the total number of moves known to each player, and sort the array rows by that value. That should provide an easy check for exiting the loop early. Sorting at O(n log n) should be basically free in an O(n^3) algorithm.
Use Priyank's basic idea, except with bitsets, since you obviously can't use a fixed integer type with 3000 bits.
You may benefit from making a second array of bitsets for the columns, and use that as a mask for pruning players.
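A sketch of the "sort players by how many moves they know and exit early" idea from this answer (names are mine; the column-mask refinement is not shown, and counting the tying teams can reuse the same bound, since a pair whose individual counts sum to less than the current best can never tie it):
#include <algorithm>
#include <bitset>
#include <vector>

const int MAXM = 1000;

int bestWithPruning(std::vector<std::bitset<MAXM>> moves)
{
    // Sort players by how many moves they know, descending.
    std::sort(moves.begin(), moves.end(),
              [](const std::bitset<MAXM>& a, const std::bitset<MAXM>& b) {
                  return a.count() > b.count();
              });

    int best = 0;
    for (std::size_t i = 0; i < moves.size(); ++i) {
        for (std::size_t j = i + 1; j < moves.size(); ++j) {
            // |A OR B| <= |A| + |B|, and later partners know even fewer moves,
            // so once this bound falls below the current best we can stop.
            if ((int)(moves[i].count() + moves[j].count()) < best)
                break;
            int known = (int)(moves[i] | moves[j]).count();
            if (known > best) best = known;
        }
    }
    return best;
}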

Finding a number in an array

I have an array of 20 numbers (64-bit int), something like 10, 25, 36, 43, ..., 118, 121 (sorted numbers).
Now, I have to give millions of numbers as input (say 17, 30).
What I have to give as output is:
for Input 17:
17 is < 25 and > 10. So, output will be index 0.
for Input 30:
30 is < 36 and > 25. So, output will be index 1.
Now, I can do it using linear search or binary search. Is there any method to do it in a faster way? The input numbers are random (Gaussian).
If you know the distribution, you can direct your search in a smarter way.
Here is the rough idea of this variant of binary search:
Assuming that your data is expected to be distributed uniformly on 0 to 100.
If you observe the value 0, you start at the beginning. If your value is 37, you start at 37% of the array you have. This is the key difference to binary search: you don't always start at 50%, but you try to start in the expected "optimal" position.
This also works for Gaussian distributed data, if you know the parameters (If you don't know them, you can still estimate them easily from the observed data). You would compute the Gaussian CDF, and this yields the place to start your search.
Now for the next step, you need to refine your search. At the position you looked at, there was a different value. You can use this to re-estimate the position to continue searching.
Now even if you don't know the distribution, this can work very well. Say you started with a binary search and have already looked at the objects at 50% and 25%. Instead of going to 37.5% next, you can make a better guess, e.g. if your query value was very close to the 50% entry. Unless your data set is very "clumpy" (and your queries are not correlated to the data), this should still outperform "naive" binary search that always splits in the middle.
http://en.wikipedia.org/wiki/Interpolation_search
The expected average runtime apparently is O(log log n), from Wikipedia.
Update: since someone complained that with just 20 numbers things are different. Yes, they are. With 20 numbers linear search may be best. Because of CPU caching. Linear scanning through a small amount of memory - that fits into the CPU cache - can be really fast. In particular with an unrolled loop. But that case is quite pathetic and uninteresting IMHO.
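For reference, a rough sketch of plain interpolation search on a sorted array (this finds an exact match; the "which bin" variant from the question would return the insertion position instead):
#include <vector>

// Returns the index of `key` in the sorted vector, or -1 if it is absent.
int interpolationSearch(const std::vector<long long>& a, long long key)
{
    int lo = 0, hi = (int)a.size() - 1;
    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        if (a[hi] == a[lo])                        // avoid dividing by zero
            return a[lo] == key ? lo : -1;
        // Guess where the key would sit if the values were spread evenly.
        int pos = lo + (int)((double)(key - a[lo]) / (double)(a[hi] - a[lo]) * (hi - lo));
        if (a[pos] == key)      return pos;
        else if (a[pos] < key)  lo = pos + 1;
        else                    hi = pos - 1;
    }
    return -1;
}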
I believe the best option for you is to use upper_bound - it will find the first value in the array bigger than the one you are searching for.
Still depending on the problem you try to solve maybe lower_bound or binary_search may be the thing you need.
All of these algorithms are with logarithmic complexity.
Nothing will be better than binary search, since your array is sorted.
Linear search is O(n) while binary search is O(log n)
Edit:
Interpolation search makes an extra assumption (the elements have to be uniformly distributed) and does more comparisons per iteration.
You can try both and empirically measure which is better for your case
In fact, this problem is quite interesting because it is a re-cast of an information theoretic framework.
Given 20 numbers, you will end up with 21 bins (including < first one and > last one).
For each incoming number, you are to map to one of these 21 bins. This mapping is done by comparison. Each comparison gives you 1 bit of information (< or >= -- two states).
So suppose the incoming number requires 5 comparisons in order to figure out which bin it belongs to, then it is equivalent to using 5 bits to represent that number.
Our goal is to minimize the number of comparisons! We have 1 million numbers each belonging to 21 ordered code words. How do we do that?
This is exactly an entropy compression problem.
Let a[1],.. a[20], be your 20 numbers.
Let p(n) = pr { incoming number is < n }.
Build the decision tree as follows.
Step 1.
let i = argmin |p(a[i]) - 0.5|
define p0(n) = p(n) / (sum(p(j), j=0...a[i-1])), and p0(n)=0 for n >= a[i].
define p1(n) = p(n) / (sum(p(j), j=a[i]...a[20])), and p1(n)=0 for n < a[i].
Step 2.
let i0 = argmin |p0(a[i0]) - 0.5|
let i1 = argmin |p1(a[i1]) - 0.5|
and so on...
and by the time we're done, we end up with:
i, i0, i1, i00, i01, i10, i11, etc.
each one of these i gives us the comparison position.
so now our algorithm is as follows:
let u = input number.
if (u < a[i]) {
    if (u < a[i0]) {
        if (u < a[i00]) {
        } else {
        }
    } else {
        if (u < a[i01]) {
        } else {
        }
    }
} else {
    similarly...
}
so the i's define a tree, and the if statements are walking the tree. we can just as well put it into a loop, but it's easier to illustrate with a bunch of if.
so for example, if you knew that your data were uniformly distributed between 0 and 2^63, and your 20 numbers were
0,1,2,3,...19
then
i = 20 (notice that there is no i1)
i0 = 10
i00 = 5
i01 = 15
i000 = 3
i001 = 7
i010 = 13
i011 = 17
i0000 = 2
i0001 = 4
i0010 = 6
i0011 = 9
i00110 = 8
i0100 = 12
i01000 = 11
i0110 = 16
i0111 = 19
i01110 = 18
ok so basically, the comparison would be as follows:
if (u < a[20]) {
    if (u < a[10]) {
        if (u < a[5]) {
        } else {
            ...
        }
    } else {
        ...
    }
} else {
    return 21
}
so note here, that I am not doing binary search! I am first checking the end point. why?
there is a 100*((2^63)-20)/(2^63) percent chance that it will be greater than a[20]. this is basically like a 99.999999999999999783159565502899% chance!
so this algorithm as it is has an expected number of comparisons of about 1 for a dataset with the properties specified above! (this is better than log log :p)
notice what I have done here is I am basically using fewer compares to find numbers that are more probable and more compares to find numbers that are less probable. for example, the number 18 requires 6 comparisons (1 more than needed with binary search); however, the numbers 20 to 2^63 require only 1 comparison. this same principle is used for lossless (entropy) data compression -- use fewer bits to encode code words that appear often.
building the tree is a one time process and you can use the tree 1 million times later.
the question is... when does this decision tree become binary search? homework exercise! :p the answer is simple. it's similar to when you can't compress a file any more.
ok, so I didn't pull this out of my behind... the basis is here:
http://en.wikipedia.org/wiki/Arithmetic_coding
You could perform binary search using std::lower_bound and std::upper_bound. These give you back iterators, so you can use std::distance to get an index.
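A small sketch of that (how ties and out-of-range values map to indices is my assumption; with the numbering in the question, 17 -> 0 and 30 -> 1):
#include <algorithm>
#include <cstdint>
#include <iterator>

// bounds holds the 20 sorted numbers; returns -1 for values below bounds[0]
// and 19 for values at or above the last one.
int binIndex(const int64_t (&bounds)[20], int64_t x)
{
    const int64_t* it = std::upper_bound(std::begin(bounds), std::end(bounds), x);
    return (int)std::distance(std::begin(bounds), it) - 1;
}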

Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized

Let's say we have an int array with 5 elements: 1, 2, 3, 4, 5
What I need to do is to find the minimum absolute value of the differences between the array's elements:
We need to check like this:
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find the minimum absolute value of these subtractions. We can find it with 2 for loops. The question is, is there any algorithm for finding the value with one and only one for loop?
Sort the list and subtract the nearest two elements.
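A minimal sketch of that suggestion (assumes at least two elements; std::sort gives O(n log n), which is usually fine):
#include <algorithm>
#include <vector>

int minAdjacentDiff(std::vector<int> v)
{
    std::sort(v.begin(), v.end());
    int best = v[1] - v[0];                    // adjacent differences are non-negative after sorting
    for (std::size_t i = 2; i < v.size(); ++i)
        best = std::min(best, v[i] - v[i - 1]);
    return best;
}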
The provably best-performing solution is asymptotically linear, O(n), up to constant factors.
This means that the time taken is proportional to the number of the elements in the array (which of course is the best we can do as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
int mindiff(vector<int>& v)
{
    IntRadixSort(v.begin(), v.end());
    int best = INT_MAX;
    for (size_t i = 0; i + 1 < v.size(); i++)
    {
        int diff = abs(v[i] - v[i+1]);
        if (diff < best)
            best = diff;
    }
    return best;
}
IntRadixSort is a linear time fixed-width integer sorting algorithm defined here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed-bitwidth nature of ints by partitioning them in a series of fixed passes on the bit positions, i.e. partition them on the high bit (32nd), then on the next highest (31st), then on the next (30th), and so on - which only takes linear time.
The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If the issue is the n(n-1)/2 subtractions required by the simple two-loop case, you can use an O(n log n) algorithm.
No, unless you know the list is sorted, you need two loops.
It's simple: iterate in a single for loop and keep four variables, "minpos", "maxpos", "minneg" and "maxneg". Check the sign of each value you encounter: store the maximum positive number in maxpos and the minimum positive number in minpos, and do the same in a separate if branch for numbers less than zero. Now take the difference maxpos - minpos in one variable and maxneg - minneg in another, and print the larger of the two. You will get the desired result.
I believe you definitely know how to find the max and min in one for loop.
Correction: the above finds the max difference; for the minimum you need to take the max and second max instead of the max and min :)
This might help you:
int subtractmin = INT_MAX;
int m = 0, end = 5;                       // 5 elements in the array
for (int i = 1; m + i < end; ) {
    if (abs(a[m] - a[m + i]) < subtractmin)
        subtractmin = abs(a[m] - a[m + i]);
    i++;
    if (m + i >= end && m < end - 2) {    // finished comparing a[m] with the rest
        m = m + 1;                        // move on to the next base element
        i = 1;                            // and restart the offsets
    }
}

Has anyone seen this improvement to quicksort before?

Handling repeated elements in previous quicksorts
I have found a way to handle repeated elements more efficiently in quicksort and would like to know if anyone has seen this done before.
This method greatly reduces the overhead involved in checking for repeated elements which improves performance both with and without repeated elements. Typically, repeated elements are handled in a few different ways which I will first enumerate.
First, there is the Dutch National Flag method, which sorts the array like [ < pivot | == pivot | unsorted | > pivot].
Second, there is the method of putting the equal elements to the far left: during the sort the layout is [ == pivot | < pivot | unsorted | > pivot], and after the sort the == elements are moved to the center.
Third, the Bentley-McIlroy partitioning puts the == elements to both sides so the sort is [ == pivot | < pivot | unsorted | > pivot | == pivot] and then the == elements are moved to the middle.
The last two methods are done in an attempt to reduce the overhead.
My Method
Now, let me explain how my method improves the quicksort by reducing the number of comparisons.
I use two quicksort functions together rather than just one.
The first function I will call q1 and it sorts an array as [ < pivot | unsorted | >= pivot].
The second function I will call q2 and it sorts the array as [ <= pivot | unsorted | > pivot].
Let's now look at the usage of these in tandem in order to improve the handling of repeated elements.
First of all, we call q1 to sort the whole array. It picks a pivot, which we will refer to as pivot1, and then sorts around pivot1. Thus, our array is sorted to this point as [ < pivot1 | >= pivot1 ].
Then, for the [ < pivot1] partition, we send it to q1 again, and that part is fairly normal so let's sort the other partition first.
For the [ >= pivot1] partition, we send it to q2. q2 chooses a pivot, which we will refer to as pivot2, from within this sub-array and sorts it into [ <= pivot2 | > pivot2].
If we look now at the entire array, our sorting looks like [ < pivot1 | >= pivot1 and <= pivot2 | > pivot2]. This looks very much like a dual-pivot quicksort.
Now, let's return to the subarray inside of q2 ([ <= pivot2 | > pivot2]).
For the [ > pivot2] partition, we just send it back to q1 which is not very interesting.
For the [ <= pivot2] partition, we first check if pivot1 == pivot2. If they are equal, then this partition is already sorted because they are all equal elements! If the pivots aren't equal, then we just send this partition to q2 again, which picks a pivot (call it pivot3), sorts, and if pivot3 == pivot1, then it does not have to sort the [ <= pivot3 ] partition, and so on.
Hopefully, you get the point by now. The improvement with this technique is that equal elements are handled without having to check if each element is also equal to the pivots. In other words, it uses less comparisons.
There is one other possible improvement that I have not tried yet which is to check in qs2 if the size of the [ <= pivot2] partition is rather large (or the [> pivot2] partition is very small) in comparison to the size of its total subarray and then to do a more standard check for repeated elements in that case (one of the methods listed above).
Source Code
Here are two very simplified qs1 and qs2 functions. They use the Sedgewick converging pointers method of sorting. They can obviously be optimized further (they choose pivots extremely poorly, for instance), but this is just to show the idea. My own implementation is longer, faster and much harder to read, so let's start with this:
void qs2(int a[], long left, long right);   // forward declaration: qs1 and qs2 call each other

// qs1 sorts into [ < p | >= p ]
void qs1(int a[], long left, long right){
    // Pick a pivot and set up some indices
    int pivot = a[right], temp;
    long i = left - 1, j = right;
    // do the sort
    for(;;){
        while(a[++i] < pivot);
        while(a[--j] >= pivot) if(i == j) break;
        if(i >= j) break;
        temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
    // Put the pivot in the correct spot
    temp = a[i];
    a[i] = a[right];
    a[right] = temp;
    // send the [ < p ] partition to qs1
    if(left < i - 1)
        qs1(a, left, i - 1);
    // send the [ >= p ] partition to qs2
    if( right > i + 1)
        qs2(a, i + 1, right);
}
// qs2 sorts into [ <= p | > p ]
void qs2(int a[], long left, long right){
    // Pick a pivot and set up some indices
    int pivot = a[left], temp;
    long i = left, j = right + 1;
    // do the sort
    for(;;){
        while(a[--j] > pivot);
        while(a[++i] <= pivot) if(i == j) break;
        if(i >= j) break;
        temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }
    // Put the pivot in the correct spot
    temp = a[j];
    a[j] = a[left];
    a[left] = temp;
    // Send the [ > p ] partition to qs1
    if( right > j + 1)
        qs1(a, j + 1, right);
    // Here is where we check the pivots.
    // a[left-1] is the other pivot we need to compare with.
    // This handles the repeated elements.
    if(pivot != a[left-1])
        // since the pivots don't match, we pass [ <= p ] on to qs2
        if(left < j - 1)
            qs2(a, left, j - 1);
}
I know that this is a rather simple idea, but it gives a pretty significant improvement in runtime when I add in the standard quicksort improvements (median-of-3 pivot choosing, and insertion sort for small arrays, for a start). If you are going to test using this code, only do it on random data because of the poor pivot choosing (or improve the pivot choice). To use this sort you would call:
qs1(array,0,indexofendofarray);
Some Benchmarks
If you want to know just how fast it is, here is a little bit of data for starters. This uses my optimized version, not the one given above. However, the one given above is still much closer in time to the dual-pivot quicksort than the std::sort time.
On highly random data with 2,000,000 elements, I get these times (from sorting several consecutive datasets):
std::sort - 1.609 seconds
dual-pivot quicksort - 1.25 seconds
qs1/qs2 - 1.172 seconds
Where std::sort is the C++ Standard Library sort, the dual-pivot quicksort is one that came out several months ago by Vladimir Yaroslavskiy, and qs1/qs2 is my quicksort implementation.
On much less random data, with 2,000,000 elements generated with rand() % 1000 (which means that each value has roughly 2000 copies), the times are:
std::sort - 0.468 seconds
dual-pivot quicksort - 0.438 seconds
qs1/qs2 - 0.407 seconds
There are some instances where the dual-pivot quicksort wins out and I do realize that the dual-pivot quicksort could be optimized more, but the same could be safely stated for my quicksort.
Has anyone seen this before?
I know this is a long question/explanation, but have any of you seen this improvement before? If so, then why isn't it being used?
Vladimir Yaroslavskiy | 11 Sep 12:35
Replacement of Quicksort in java.util.Arrays with new Dual-Pivot Quicksort
Visit http://permalink.gmane.org/gmane.comp.java.openjdk.core-libs.devel/2628
To answer your question, no I have not seen this approach before. I'm not going to profile your code and do the other hard work, but perhaps the following are next steps/considerations in formally presenting your algorithm. In the real world, sorting algorithms are implemented to have:
Good scalability / complexity and Low overhead
Scaling and overhead are obvious and are easy to measure. When profiling sorting, in addition to time, measure the number of comparisons and swaps. Performance on large files will also be dependent on disk seek time. For example, merge sort works well on large files with a magnetic disk. ( see also Quick Sort Vs Merge Sort )
Wide range of inputs with good performance
There's lots of data that needs sorting. And applications are known to produce data in patterns, so it is important to make sure the sort is resilient against poor performance under certain patterns. Your algorithm optimizes for repeated numbers. What if all numbers are repeated but only once (i.e. seq 1000>file; seq 1000>>file; shuf file)? What if numbers are already sorted? Sorted backwards? What about a pattern of 1,2,3,1,2,3,1,2,3,1,2,3? 1,2,3,4,5,6,7,6,5,4,3,2,1? 7,6,5,4,3,2,1,2,3,4,5,6,7? Poor performance in one of these common scenarios is a deal breaker! Before comparing against a published general-purpose algorithm it is wise to have this analysis prepared.
Low-risk of pathological performance
Of all the permutations of inputs, there is one that performs worse than the others. How much worse does that perform than average? And how many permutations will provide similar poor performance?
Good luck on your next steps!
It's a great improvement, and I'm sure it's been implemented specifically where a lot of equal objects are expected. There are many off-the-wall tweaks of this kind.
If I understand all you wrote correctly, the reason it's not generally "known" is that it doesn't improve the basic O(n^2) worst-case performance. That means: double the number of objects, quadruple the time. Your improvement doesn't change this unless all objects are equal.
std::sort is not exactly fast.
Here are results I get comparing it to randomized parallel nonrecursive quicksort:
pnrqSort (longs):
.:.1 000 000 36ms (items per ms: 27777.8)
.:.5 000 000 140ms (items per ms: 35714.3)
.:.10 000 000 296ms (items per ms: 33783.8)
.:.50 000 000 1s 484ms (items per ms: 33692.7)
.:.100 000 000 2s 936ms (items per ms: 34059.9)
.:.250 000 000 8s 300ms (items per ms: 30120.5)
.:.400 000 000 12s 611ms (items per ms: 31718.3)
.:.500 000 000 16s 428ms (items per ms: 30435.8)
std::sort(longs)
.:.1 000 000 134ms (items per ms: 7462.69)
.:.5 000 000 716ms (items per ms: 6983.24)
std::sort vector of longs
1 000 000 511ms (items per ms: 1956.95)
2 500 000 943ms (items per ms: 2651.11)
Since you have an extra method, it is going to cause more stack use, which will ultimately slow things down. Why median-of-3 is used, I don't know, because it's a poor method, but with random pivot points quicksort never has big issues with uniform or presorted data and there's no danger of intentional median-of-3 killer data.
Nobody seems to like your algorithm, but I do. It seems to me it's a nice way to re-do classic quicksort in a manner that is now safe for use with highly repeated elements.
Your q1 and q2 subalgorithms, it seems to me, are actually the SAME algorithm except that the < and <= operators are interchanged and a few other things, which if you wanted would allow you to write shorter pseudocode for this (though it might be less efficient). I recommend you read
JL Bentley, MD McIlroy: Engineering a Sort Function, SOFTWARE—PRACTICE AND EXPERIENCE 23, 11 (Nov 1993), 1249-1265, e-available here:
http://www.skidmore.edu/~meckmann/2009Spring/cs206/papers/spe862jb.pdf
to see the tests they put their quicksort through. Your idea might be nicer and/or better, but it needs to run the gauntlet of the kinds of tests they tried, using some particular pivot-choosing method. Find one that passes all their tests without ever suffering quadratic runtime. Then if, in addition, your algorithm is both faster and nicer than theirs, you would clearly have a worthwhile contribution.
The "Tukey Ninther" thing they use to generate a pivot seems to me to be usable by you too, and will automatically make it very hard for the quadratic-time worst case to arise in practice. I mean, if you just use median-of-3 and try the middle and two end elements of the array as your three, then an adversary will make the initial array state be increasing then decreasing, and then you'll fall on your face with quadratic runtime on a not-too-implausible input. But with the Tukey Ninther on 9 elements, it's pretty hard for me to construct a plausible input which hurts you with quadratic runtime.
Another view and a suggestion: think of the combination of q1 splitting your array, then q2 splitting the right subarray, as a single q12 algorithm producing a 3-way split of the array. Now you need to recurse on the 3 subarrays (or only 2 if the two pivots happen to be equal). Always recurse on the SMALLEST of the subarrays you were going to recurse on FIRST, and the largest LAST -- and do not implement this largest one as a recursion, but rather just stay in the same routine and loop back up to the top with a shrunk window. That way you have 1 fewer recursive call in q12 than you would have, but the main point of this is that it is now IMPOSSIBLE for the recursion stack to ever get more than O(log N) long. OK? This solves another annoying worst-case problem quicksort can suffer, while also making your code a bit faster anyhow.
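A generic sketch of that pattern (partition() here is a placeholder for whatever scheme you use, e.g. the q1/q2 combination above adapted to return the pivot's final position):
long partition(int a[], long left, long right);    // placeholder: any partition scheme

// Recurse on the smaller part, loop on the larger part: each recursive call
// handles at most half of the current range, so the stack depth stays O(log N).
void quicksortLoop(int a[], long left, long right)
{
    while (left < right)
    {
        long p = partition(a, left, right);
        if (p - left < right - p) {                // left part is the smaller one
            quicksortLoop(a, left, p - 1);         // recurse on the small part first
            left = p + 1;                          // then loop on the large part
        } else {
            quicksortLoop(a, p + 1, right);
            right = p - 1;
        }
    }
}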