Something faster than std::nth_element

Something faster than std::nth_element - c++

I'm working on a kd-tree implementation and I'm currently using std::nth_element for partition a vector of elements by their median. However std::nth_element takes 90% of the time of tree construction. Can anyone suggest a more efficient alternative?
Thanks in advance

Do you really need the nth element, or do you need an element "near" the middle?
There are faster ways to get an element "near" the middle. One example goes roughly like:
function rough_middle(container)
divide container into subsequences of length 5
find median of each subsequence of length 5 ~ O(k) * O(n/5)
return rough_middle( { median of each subsequence} ) ~ O(rough_middle(n/5))
The result should be something that is roughly in the middle. A real nth element algorithm might use something like the above, and then clean it up afterwards to find the actual nth element.
At n=5, you get the middle.
At n=25, you get the middle of the short sequence middles. This is going to be greater than all of the lesser of each short sequence, or at least the 9th element and no more than the 16th element, or 36% away from edge.
At n=125, you get the rough middle of each short sequence middle. This is at least the 9th middle, so there are 8*3+2=26 elements less than your rough middle, or 20.8% away from edge.
At n=625, you get the rough middle of each short sequence middle. This is at least the 26th middle, so there are 77 elements less than your rough middle, or 12% away from the edge.
At n=5^k, you get the rough middle of the 5^(k-1) rough middles. If the rough middle of a 5^k sequence is r(k), then r(k+1) = r(k)*3-1 ~ 3^k.
3^k grows slower than 5^k in O-notation.
3^log_5(n)
= e^( ln(3) ln(n)/ln(5) )
= n^(ln(3)/ln(5))
=~ n^0.68
is a very rough estimate of the lower bound of where the rough_middle of a sequence of n elements ends up.
In theory, it may take as many as approx n^0.33 iterations of reductions to reach a single element, which isn't really that good. (the number of bits in n^0.68 is ~0.68 times the number of bits in n. If we shave that much off each rough middle, we need to repeat it very roughly n^0.33 times number of bits in n to consume all the bits -- more, because as we subtract from the n, the next n gets a slightly smaller value subtracted from it).
The way that the nth element solutions I've seen solve this is by doing a partition and repair at each level: instead of recursing into rough_middle, you recurse into middle. The real middle of the medians is then guaranteed to be pretty close to the actual middle of your sequence, and you can "find the real middle" relatively quickly (in O-notation) from this.
Possibly we can optimize this process by doing a more accurate rough_middle iterations when there are more elements, but never forcing it to be the actual middle? The bigger the end n is, the closer to the middle we need the recursive calls to be to the middle for the end result to be reasonably close to the middle.
But in practice, the probability that your sequence is a really bad one that actually takes n^0.33 steps to partition down to nothing might be really low. Sort of like the quicksort problem: median of 3 elements is usually good enough.
A quick stats analysis.
You pick 5 elements at random, and pick the middle one.
The median index of a set of 2m+1 random sample of a uniform distribution follows the beta distribution with parameters of roughly (m+1, m+1), with maybe some scaling factors for non-[0,1] intervals.
The mean of the median is clearly 1/2. The variance is:
(3*3)^2 / ( (3+3)^2 (3+3+1) )
= 81 / (36 * 7)
=~ 0.32
Figuring out the next step is beyond my stats. I'll cheat.
If we imagine that taking the median index element from a bunch of items with mean 0.5 and variance 0.32 is as good as averaging their index...
Let n now be the number of elements in our original set.
Then the sum of the indexes of medians of the short sequences has an average of n times n/5*0.5 = 0.1 * n^2. The variance of the sum of the indexes of the medians of the short sequences is n times n/5*0.32 = 0.064 * n^2.
If we then divide the value by n/5 we get:
So mean of n/2 and variance of 1.6.
Oh, if that was true, that would be awesome. Variance that doesn't grow with the size of n means that as n gets large, the average index of the medians of the short sequences gets ridiculously tightly distributed. I guess it makes some sense. Sadly, we aren't quite doing that -- we want the distribution of the pseudo-median of the medians of the short sequences. Which is almost certainly worse.
Implementation detail. We can with logarithmic number of memory overhead do an in-place rough median. (we might even be able to do it without the memory overhead!)
We maintain a vector of 5 indexes with a "nothing here" placeholder.
Each is a successive layer.
At each element, we advance the bottom index. If it is full, we grab the median, and insert it on the next level up, and clear the bottom layer.
At the end, we complete.
using target = std::pair<size_t,std::array<size_t, 5>>;
bool push( target& t, size_t i ) {
t.second[t.first]=i;
++t.first;
if (t.first==5)
return true;
}
template<class Container>
size_t extract_median( Container const& c, target& t ) {
Assert(t.first != 0);
std::sort( t.data(), t.data()+t.first, [&c](size_t lhs, size_t rhs){
return c[lhs]<c[rhs];
} );
size_t r = t[(t.first+1)/2];
t.first = 0;
return r;
}
template<class Container>
void advance(Container const& c, std::vector<target>& targets, size_t i) {
size_t height = 0;
while(true) {
if (targets.size() <= height)
targets.push_back({});
if (!push(targets[height], i))
return;
i = extract_median(c, targets[height]);
}
}
template<class Container>
size_t collapse(Container const& c, target* b, target* e) {
if (b==e) return -1;
size_t before = collapse(c, b, e-1);
target& last = (*e-1);
if (before!=-1)
push(before, last);
if (last.first == 0)
return -1;
return extract_median(c, last);
}
template<class Container>
size_t rough_median_index( Container const& c ) {
std::vector<target> targets;
for (auto const& x:c) {
advance(c, targets, &x-c.data());
}
return collapse(c, targets.data(), targets.data()+targets.size());
}
which sketches out how it could work on random access containers.

If you have more lookups than insertions into the vector you could consider using a data structure which sorts on insertion -- such as std::set -- and then use std::advance() to get the n'th element in sorted order.

Related

efficiently mask-out exactly 30% of array with 1M entries

My question's header is similar to this link, however that one wasn't answered to my expectations.
I have an array of integers (1 000 000 entries), and need to mask exactly 30% of elements.
My approach is to loop over elements and roll a dice for each one. Doing it in a non-interrupted manner is good for cache coherency.
As soon as I notice that exactly 300 000 of elements were indeed masked, I need to stop. However, I might reach the end of an array and have only 200 000 elements masked, forcing me to loop a second time, maybe even a third, etc.
What's the most efficient way to ensure I won't have to loop a second time, and not being biased towards picking some elements?
Edit:
//I need to preserve the order of elements.
//For instance, I might have:
[12, 14, 1, 24, 5, 8]
//Masking away 30% might give me:
[0, 14, 1, 24, 0, 8]
The result of masking must be the original array, with some elements set to zero

Just do a fisher-yates shuffle but stop at only 300000 iterations. The last 300000 elements will be the randomly chosen ones.
std::size_t size = 1000000;
for(std::size_t i = 0; i < 300000; ++i)
{
std::size_t r = std::rand() % size;
std::swap(array[r], array[size-1]);
--size;
}
I'm using std::rand for brevity. Obviously you want to use something better.
The other way is this:
for(std::size_t i = 0; i < 300000;)
{
std::size_t r = rand() % 1000000;
if(array[r] != 0)
{
array[r] = 0;
++i;
}
}
Which has no bias and does not reorder elements, but is inferior to fisher yates, especially for high percentages.

When I see a massive list, my mind always goes first to divide-and-conquer.
I won't be writing out a fully-fleshed algorithm here, just a skeleton. You seem like you have enough of a clue to take decent idea and run with it. I think I only need to point you in the right direction. With that said...
We'd need an RNG that can return a suitably-distributed value for how many masked values could potentially be below a given cut point in the list. I'll use the halfway point of the list for said cut. Some statistician can probably set you up with the right RNG function. (Anyone?) I don't want to assume it's just uniformly random [0..mask_count), but it might be.
Given that, you might do something like this:
// the magic RNG your stats homework will provide
int random_split_sub_count_lo( int count, int sub_count, int split_point );
void mask_random_sublist( int *list, int list_count, int sub_count )
{
if (list_count > SOME_SMALL_THRESHOLD)
{
int list_count_lo = list_count / 2; // arbitrary
int list_count_hi = list_count - list_count_lo;
int sub_count_lo = random_split_sub_count_lo( list_count, mask_count, list_count_lo );
int sub_count_hi = list_count - sub_count_lo;
mask( list, list_count_lo, sub_count_lo );
mask( list + sub_count_lo, list_count_hi, sub_count_hi );
}
else
{
// insert here some simple/obvious/naive implementation that
// would be ludicrous to use on a massive list due to complexity,
// but which works great on very small lists. I'm assuming you
// can do this part yourself.
}
}
Assuming you can find someone more informed on statistical distributions than I to provide you with a lead on the randomizer you need to split the sublist count, this should give you O(n) performance, with 'n' being the number of masked entries. Also, since the recursion is set up to traverse the actual physical array in constantly-ascending-index order, cache usage should be as optimal as it's gonna get.
Caveat: There may be minor distribution issues due to the discrete nature of the list versus the 30% fraction as you recurse down and down to smaller list sizes. In practice, I suspect this may not matter much, but whatever person this solution is meant for may not be satisfied that the random distribution is truly uniform when viewed under the microscope. YMMV, I guess.

Here's one suggestion. One million bits is only 128K which is not an onerous amount.
So create a bit array with all items initialised to zero. Then randomly select 300,000 of them (accounting for duplicates, of course) and mark those bits as one.
Then you can run through the bit array and, any that are set to one (or zero, if your idea of masking means you want to process the other 700,000), do whatever action you wish to the corresponding entry in the original array.
If you want to ensure there's no possibility of duplicates when randomly selecting them, just trade off space for time by using a Fisher-Yates shuffle.
Construct an collection of all the indices and, for each of the 700,000 you want removed (or 300,000 if, as mentioned, masking means you want to process the other ones) you want selected:
pick one at random from the remaining set.
copy the final element over the one selected.
reduce the set size.
This will leave you with a random subset of indices that you can use to process the integers in the main array.

You want reservoir sampling. Sample code courtesy of Wikipedia:
(*
S has items to sample, R will contain the result
*)
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]

STL random_sample replace with a decreasing probability

The STL random_sample function use a replace strategy to sample from the given interval. Why we need a decreasing probability, I've seen a similar algorithm without decrease the replacement probability. What is the difference. The code is as:
/*This is an excerpt from STL implementation*/
template <class InputIterator, class RandomAccessIterator, class Distance>
RandomAccessIterator __random_sample(InputIterator first, InputIterator last,
RandomAccessIterator out,
const Distance n)
{
Distance m = 0;
Distance t = n;
for ( ; first != last && m < n; ++m, ++first) //the strategy is also used in mahout
out[m] = *first;//fill it
while (first != last) {
++t;
Distance M = lrand48() % t;
if (M < n)
out[M] = *first;//replace it with a decreasing probability
++first;
}
return out + m;
}

It is obvious that the first element (indeed, the first n elements) has to be chosen with the probability of 1 (the "fill it" stage). In order to remain in the final sample, this first element needs then to survive m-n possible replacements - that's what brings the probability of it being in the sample down to n/m. On the other hand, the last element only participates in one replacement; it should thus be added to the sample with the probability of n/m from the start.
For simplicity, suppose that you only need to pick one element, out of m, using this replacement strategy (note that you don't know up front what m is: you just iterate until you suddenly reach the end). You take the first element, and keep it with a probability of 1 (for all you know, this is the only element). Then you discover the second element, and you throw a coin and either keep it or discard it with the probability of 1/2. At this point, each of the first two elements has 1/2 probability to be the chosen one.
Now you see the third element, and you keep it with probability of 1/3. Each of the first two elements had 1/2 probability of participating in this encounter, and 2/3 of surviving it - for a total of 1/2 * 2/3 == 1/3 probability of still being around. So, again, each of the first three elements has 1/3 probability of being chosen at this point.
The inductive step of proving that, after tth element is inspected, each of the first t elements has 1/t probability of being chosen is left as an exercise for the reader. So is the extension of the proof to a sample of size n > 1.

Find dominant mode of an unsorted array

Note, this is a homework assignment.
I need to find the mode of an array (positive values) and secondarily return that value if the mode is greater that sizeof(array)/2,the dominant value. Some arrays will have neither.
That is simple enough, but there is a constraint that the array must NOT be sorted prior to the determination, additionally, the complexity must be on the order of O(nlogn).
Using this second constraint, and the master theorem we can determine that the time complexity 'T(n) = A*T(n/B) + n^D' where A=B and log_B(A)=D for O(nlogn) to be true. Thus, A=B=D=2. This is also convenient since the dominant value must be dominant in the 1st, 2nd, or both halves of an array.
Using 'T(n) = A*T(n/B) + n^D' we know that the search function will call itself twice at each level (A), divide the problem set by 2 at each level (B). I'm stuck figuring out how to make my algorithm take into account the n^2 at each level.
To make some code of this:
int search(a,b) {
search(a, a+(b-a)/2);
search(a+(b-a)/2+1, b);
}
The "glue" I'm missing here is how to combine these divided functions and I think that will implement the n^2 complexity. There is some trick here where the dominant must be the dominant in the 1st or 2nd half or both, not quite sure how that helps me right now with the complexity constraint.
I've written down some examples of small arrays and I've drawn out ways it would divide. I can't seem to go in the correct direction of finding one, single method that will always return the dominant value.
At level 0, the function needs to call itself to search the first half and second half of the array. That needs to recurse, and call itself. Then at each level, it needs to perform n^2 operations. So in an array [2,0,2,0,2] it would split that into a search on [2,0] and a search on [2,0,2] AND perform 25 operations. A search on [2,0] would call a search on [2] and a search on [0] AND perform 4 operations. I'm assuming these would need to be a search of the array space itself. I was planning to use C++ and use something from STL to iterate and count the values. I could create a large array and just update counts by their index.

if some number occurs more than half, it can be done by O(n) time complexity and O(1) space complexity as follow:
int num = a[0], occ = 1;
for (int i=1; i<n; i++) {
if (a[i] == num) occ++;
else {
occ--;
if (occ < 0) {
num = a[i];
occ = 1;
}
}
}
since u r not sure whether such number occurs, all u need to do is to apply the above algorithm to get a number first, then iterate the whole array 2nd time to get the occurance of the number and check whether it is greater than half.

If you want to find just the dominant mode of an array, and do it recursively, here's the pseudo-code:
def DominantMode(array):
# if there is only one element, that's the dominant mode
if len(array) == 1: return array[0]
# otherwise, find the dominant mode of the left and right halves
left = DominantMode(array[0:len(array)/2])
right = DominantMode(array[len(array)/2:len(array)])
# if both sides have the same dominant mode, the whole array has that mode
if left == right: return left
# otherwise, we have to scan the whole array to determine which one wins
leftCount = sum(element == left for element in array)
rightCount = sum(element == right for element in array)
if leftCount > len(array) / 2: return left
if rightCount > len(array) / 2: return right
# if neither wins, just return None
return None
The above algorithm is O(nlogn) time but only O(logn) space.
If you want to find the mode of an array (not just the dominant mode), first compute the histogram. You can do this in O(n) time (visiting each element of the array exactly once) by storing the historgram in a hash table that maps the element value to its frequency.
Once the histogram has been computed, you can iterate over it (visiting each element at most once) to find the highest frequency. Once you find a frequency larger than half the size of the array, you can return immediately and ignore the rest of the histogram. Since the size of the histogram can be no larger than the size of the original array, this step is also O(n) time (and O(n) space).
Since both steps are O(n) time, the resulting algorithmic complexity is O(n) time.

How to figure out "progress" while sorting?

I'm using stable_sort to sort a large vector.
The sorting takes on the order of a few seconds (say, 5-10 seconds), and I would like to display a progress bar to the user showing how much of the sorting is done so far.
But (even if I was to write my own sorting routine) how can I tell how much progress I have made, and how much more there is left to go?
I don't need it to be exact, but I need it to be "reasonable" (i.e. reasonably linear, not faked, and certainly not backtracking).

Standard library sort uses a user-supplied comparison function, so you can insert a comparison counter into it. The total number of comparisons for either quicksort/introsort or mergesort will be very close to log2N * N (where N is the number of elements in the vector). So that's what I'd export to a progress bar: number of comparisons / N*log2N
Since you're using mergesort, the comparison count will be a very precise measure of progress. It might be slightly non-linear if the implementation spends time permuting the vector between comparison runs, but I doubt your users will see the non-linearity (and anyway, we're all used to inaccurate non-linear progress bars :) ).
Quicksort/introsort would show more variance, depending on the nature of the data, but even in that case it's better than nothing, and you could always add a fudge factor on the basis of experience.
A simple counter in your compare class will cost you practically nothing. Personally I wouldn't even bother locking it (the locks would hurt performance); it's unlikely to get into an inconsistent state, and anyway the progress bar won't go start radiating lizards just because it gets an inconsistent progress number.

Split the vector into several equal sections, the quantity depending upon the granularity of progress reporting you desire. Sort each section seperately. Then start merging with std::merge. You can report your progress after sorting each section, and after each merge. You'll need to experiment to determine how much percentage the sorting of the sections should be counted compared to the mergings.
Edit:
I did some experiments of my own and found the merging to be insignificant compared to the sorting, and this is the function I came up with:
template<typename It, typename Comp, typename Reporter>
void ReportSort(It ibegin, It iend, Comp cmp, Reporter report, double range_low=0.0, double range_high=1.0)
{
double range_span = range_high - range_low;
double range_mid = range_low + range_span/2.0;
using namespace std;
auto size = iend - ibegin;
if (size < 32768) {
stable_sort(ibegin,iend,cmp);
} else {
ReportSort(ibegin,ibegin+size/2,cmp,report,range_low,range_mid);
report(range_mid);
ReportSort(ibegin+size/2,iend,cmp,report,range_mid,range_high);
inplace_merge(ibegin, ibegin + size/2, iend);
}
}
int main()
{
std::vector<int> v(100000000);
std::iota(v.begin(), v.end(), 0);
std::random_shuffle(v.begin(), v.end());
std::cout << "starting...\n";
double percent_done = 0.0;
auto report = [&](double d) {
if (d - percent_done >= 0.05) {
percent_done += 0.05;
std::cout << static_cast<int>(percent_done * 100) << "%\n";
}
};
ReportSort(v.begin(), v.end(), std::less<int>(), report);
}

Stable sort is based on merge sort. If you wrote your own version of merge sort then (ignoring some speed-up tricks) you would see that it consists of log N passes. Each pass starts with 2^k sorted lists and produces 2^(k-1) lists, with the sort finished when it merges two lists into one. So you could use the value of k as an indication of progress.
If you were going to run experiments, you might instrument the comparison object to count the number of comparisons made and try and see if the number of comparisons made is some reasonably predictable multiple of n log n. Then you could keep track of progress by counting the number of comparisons done.
(Note that with the C++ stable sort, you have to hope that it finds enough store to hold a copy of the data. Otherwise the cost goes from N log N to perhaps N (log N)^2 and your predictions will be far too optimistic).

Select a small subset of indices and count inversions. You know its maximal value, and you know when you are done the value is zero. So, you can use this value as a "progressor". You can think of it as a measure of entropy.

Easiest way to do it: sort a small vector and extrapolate the time assuming O(n log n) complexity.
t(n) = C * n * log(n) ⇒ t(n1) / t(n2) = n1/n2 * log(n1)/log(n2)
If sorting 10 elements takes 1 μs, then 100 elements will take 1 μs * 100/10 * log(100)/log(10) = 20 μs.

Quicksort is basically
partition input using a pivot element
sort smallest part recursively
sort largest part using tail recursion
All the work is done in the partition step. You could do the outer partition directly and then report progress as the smallest part is done.
So there would be an additional step between 2 and 3 above.
Update progressor
Here is some code.
template <typename RandomAccessIterator>
void sort_wReporting(RandomAccessIterator first, RandomAccessIterator last)
{
double done = 0;
double whole = static_cast<double>(std::distance(first, last));
typedef typename std::iterator_traits<RandomAccessIterator>::value_type value_type;
while (first != last && first + 1 != last)
{
auto d = std::distance(first, last);
value_type pivot = *(first + std::rand() % d);
auto iter = std::partition(first, last,
[pivot](const value_type& x){ return x < pivot; });
auto lower = distance(first, iter);
auto upper = distance(iter, last);
if (lower < upper)
{
std::sort(first, iter);
done += lower;
first = iter;
}
else
{
std::sort(iter, last);
done += upper;
last = iter;
}
std::cout << done / whole << std::endl;
}
}

I spent almost one day to figure out how to display the progress for shell sort, so I will leave here my simple formula. Given an array of colors, it will display the progress. It is blending the colors from red to yellow and then to green. When it is Sorted, it is the last position of the array that is blue. For shell sort, the iterations each time it passes through the array are quite proportional, so the progress becomes pretty accurate.
(Code in Dart/Flutter)
List<Color> colors = [
Color(0xFFFF0000),
Color(0xFFFF5500),
Color(0xFFFFAA00),
Color(0xFFFFFF00),
Color(0xFFAAFF00),
Color(0xFF55FF00),
Color(0xFF00FF00),
Colors.blue,
];
[...]
style: TextStyle(
color: colors[(((pass - 1) * (colors.length - 1)) / (log(a.length) / log(2)).floor()).floor()]),
It is basically a cross-multiplication.
a means array. (log(a.length) / log(2)).floor() means rounding down the log2(N), where N means the number of items. I tested this with several combinations of array sizes, array numbers, and sizes for the array of colors, so I think it is good to go.

Find pair of elements in integer array such that abs(v[i]-v[j]) is minimized

Lets say we have int array with 5 elements: 1, 2, 3, 4, 5
What I need to do is to find minimum abs value of array's elements' subtraction:
We need to check like that
1-2 2-3 3-4 4-5
1-3 2-4 3-5
1-4 2-5
1-5
And find minimum abs value of these subtractions. We can find it with 2 fors. The question is, is there any algorithm for finding value with one and only for?

sort the list and subtract nearest two elements

The provably best performing solution is assymptotically linear O(n) up until constant factors.
This means that the time taken is proportional to the number of the elements in the array (which of course is the best we can do as we at least have to read every element of the array, which already takes O(n) time).
Here is one such O(n) solution (which also uses O(1) space if the list can be modified in-place):
int mindiff(const vector<int>& v)
{
IntRadixSort(v.begin(), v.end());
int best = MAX_INT;
for (int i = 0; i < v.size()-1; i++)
{
int diff = abs(v[i]-v[i+1]);
if (diff < best)
best = diff;
}
return best;
}
IntRadixSort is a linear time fixed-width integer sorting algorithm defined here:
http://en.wikipedia.org/wiki/Radix_sort
The concept is that you leverage the fixed-bitwidth nature of ints by paritioning them in a series of fixed passes on the bit positions. ie partition them on the hi bit (32nd), then on the next highest (31st), then on the next (30th), and so on - which only takes linear time.

The problem is equivalent to sorting. Any sorting algorithm could be used, and at the end, return the difference between the nearest elements. A final pass over the data could be used to find that difference, or it could be maintained during the sort. Before the data is sorted the min difference between adjacent elements will be an upper bound.
So to do it without two loops, use a sorting algorithm that does not have two loops. In a way it feels like semantics, but recursive sorting algorithms will do it with only one loop. If this issue is the n(n+1)/2 subtractions required by the simple two loop case, you can use an O(n log n) algorithm.

No, unless you know the list is sorted, you need two

Its simple Iterate in a for loop
keep 2 variable "minpos and maxpos " and " minneg" and "maxneg"
check for the sign of the value you encounter and store maximum positive in maxpos
and minimum +ve number in "minpos" do the same by checking in if case for number
less than zero. Now take the difference of maxpos-minpos in one variable and
maxneg and minneg in one variable and print the larger of the two . You will get
desired.
I believe you definitely know how to find max and min in one for loop
correction :- The above one is to find max difference in case of minimum you need to
take max and second max instead of max and min :)

This might be help you:
end=4;
subtractmin;
m=0;
for(i=1;i<end;i++){
if(abs(a[m]-a[i+m])<subtractmin)
subtractmin=abs(a[m]-a[i+m];}
if(m<4){
m=m+1
end=end-1;
i=m+2;
}}

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js