Does anyone know both the expected running times and worst case running times for different implementations of std::nth_element? I use this algorithm nearly every day.
I'm specifically interested in the STL versions shipping with the recent Microsoft Compilers, but any information on this topic is helpful.
Please note that this is not a duplicate of this question. I understand which algorithms exist, but I'm interested in which implementations use which algorithms.
For background, there are well-known algorithms for this. One runs in O(n) average time and O(n log n) worst-case time; another (median of medians) is O(n) in the worst case but slow in practice.
Also note that there is talk of interesting implementation strategies that achieve worst-case O(n) running time while remaining fast in practice. The standard only requires O(n) average time.
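For concreteness, this is the kind of call I mean (the container contents here are just an arbitrary example):

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{9, 1, 8, 2, 7, 3, 6, 4, 5};

    // Place the median in the middle position; elements before it are
    // <= the median and elements after it are >= the median, but neither
    // side is itself sorted.
    auto mid = v.begin() + v.size() / 2;
    std::nth_element(v.begin(), mid, v.end());

    std::cout << "median: " << *mid << '\n';  // prints 5
}
```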
The expected running time is O(N).
The worst-case running time for most implementations is O(N * N), because most implementations use QuickSelect, and QuickSelect can run into bad partitions.
That is true for Microsoft VS2008, VS2010 & VS2012.
Now, with the new ISO C++ 2011 standard, the complexity requirement for std::sort has been tightened up: it is guaranteed to be O(N log N) with no bad worst case, because David Musser's IntroSort is used: start with QuickSort and, if parts of the array experience bad partitioning, switch to HeapSort.
Ideally exactly the same should apply to std::nth_element, but the ISO C++ 2011 standard has not tightened up its complexity requirements, so std::nth_element can still be O(N * N) in the worst case. This may be because in David Musser's original paper (see here) he did not say which algorithm should be switched to if QuickSelect goes bad.
In the worst case, median-of-medians using groups of 5 could be used (I have seen a paper that recommended groups of 7 but cannot find it). So a quality implementation of std::nth_element could use QuickSelect and switch to median-of-medians if partitioning goes bad; this would guarantee O(N) behaviour. QuickSelect can also be improved by pivot sampling, which makes the worst case unlikely but not impossible.
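To make the control flow concrete, here is a minimal sketch of an introselect-style selection (an illustrative sketch, not any vendor's actual code). For brevity the guaranteed fallback here is std::partial_sort, which caps the worst case at O(N log N); swapping in a median-of-medians fallback, as described above, would give O(N).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// QuickSelect with a step budget of roughly 2*log2(N); when the budget runs
// out, fall back to a selection with a guaranteed bound.  After the call,
// a[nth] holds the value it would have if a were fully sorted.
void introselect(std::vector<int>& a, std::size_t nth) {
    std::size_t lo = 0, hi = a.size();
    if (nth >= hi) return;
    int budget = 2 * static_cast<int>(std::log2(static_cast<double>(hi) + 1)) + 1;
    std::mt19937 rng(12345);

    while (hi - lo > 1) {
        if (budget-- == 0) {
            // Partitioning has gone badly too often: use the guaranteed path.
            std::partial_sort(a.begin() + lo, a.begin() + nth + 1, a.begin() + hi);
            return;
        }
        // A random pivot keeps the quadratic worst case unlikely, not impossible.
        std::uniform_int_distribution<std::size_t> pick(lo, hi - 1);
        std::swap(a[lo], a[pick(rng)]);
        int pivot = a[lo];

        // Lomuto-style partition: move elements smaller than the pivot left.
        std::size_t store = lo + 1;
        for (std::size_t i = lo + 1; i < hi; ++i)
            if (a[i] < pivot) std::swap(a[i], a[store++]);
        std::swap(a[lo], a[store - 1]);
        std::size_t pivot_pos = store - 1;

        if (pivot_pos == nth) return;        // the nth element is in place
        if (nth < pivot_pos) hi = pivot_pos;
        else                 lo = pivot_pos + 1;
    }
}
```

Calling introselect(v, v.size() / 2) leaves the median of v in the middle position, much as std::nth_element does.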
The implementation in GCC 4.7 uses introspective selection by David Musser (his paper, here, gives details on introsort and introselect). According to those documents the worst-case execution time is O(n).
cppreference says it first sorts and then finds the nth element, but that way the average should be O(n log n) (for comparison-based sorting algorithms), yet they write that the average is O(n). That seems incorrect unless a sort such as radix sort were used, and since the interface is generic and comparison-based, radix sort or any other non-comparison sort is impossible. Anyway, in practice a fast sorting-based approach is better than a plain selection algorithm (in both memory and average time).
Related
Since C++11, the C++ Standard Library (cf. Section 25.4.1.1 of a draft version of the standard) requires the std::sort algorithm to have worst-case asymptotic running time O(n log n), rather than only average-case as before.
Following the change, plain quicksort, for example, no longer complies with the specification. This was pointed out in a bug report for LLVM's libc++. Instead, algorithms such as introsort or pdqsort, which have O(n log n) worst-case running time, are usually used.
Is there any documentation on the motivation for that change? Is there some anecdote or incident that led to that change?
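As background for the algorithms named above, here is a minimal sketch of the introsort idea (quicksort with a recursion-depth budget that falls back to heapsort). It is an illustrative sketch, not libc++'s or libstdc++'s actual code; the pivot choice and budget are arbitrary.

```cpp
#include <algorithm>
#include <cmath>

// Quicksort with a recursion-depth budget; when the budget is exhausted the
// remaining range is heapsorted, which caps the worst case at O(N log N).
template <typename It>
void introsort_impl(It first, It last, int depth_budget) {
    while (last - first > 1) {
        if (depth_budget-- == 0) {
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        // Use the middle element as the pivot and partition around it.
        std::iter_swap(first + (last - first) / 2, last - 1);
        auto pivot = *(last - 1);
        It store = first;
        for (It it = first; it != last - 1; ++it)
            if (*it < pivot) std::iter_swap(it, store++);
        std::iter_swap(store, last - 1);             // pivot is now in place

        introsort_impl(first, store, depth_budget);  // left part
        first = store + 1;                           // loop on the right part
    }
}

template <typename It>
void introsort(It first, It last) {
    auto n = last - first;
    int budget = n > 1 ? 2 * static_cast<int>(std::log2(static_cast<double>(n))) + 1 : 0;
    introsort_impl(first, last, budget);
}
```

Calling introsort(v.begin(), v.end()) on a std::vector<int> v sorts it; a production version adds insertion sort for small ranges and smarter pivot selection, which is roughly the difference between this sketch and real introsort or pdqsort.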
I recently came across an article that claimed that it can find all primes less than n in O(n) using an efficient Sieve Of Eratosthenes. However I am unable to see how it is O(n).
https://www.geeksforgeeks.org/sieve-eratosthenes-0n-time-complexity/
Could anyone please help with that?
The normal Sieve of Eratosthenes is O(n log log n).
Paul Pritchard has done some work on sieves similar to the Sieve of Eratosthenes that run in O(n) and even in O(n / log log n). They are tricky to implement, and despite improved theoretical time complexity, the bookkeeping involved in running the sieves makes them slower than the normal Sieve of Eratosthenes.
I discuss a simple version of Pritchard's sieve at my blog.
It is a version of the Gries and Misra (1978) sieve, which is an O(n) sieve. A better description can be found here:
(external link) Sieve of Eratosthenes Having Linear Time Complexity.
For a more theoretical look at this type of sieve, from an expert in the field, see Pritchard's paper:
(external link) Linear Prime-Number Sieves: A Family Tree (1987, PDF).
Pritchard is well known for his sub-linear sieve algorithm and paper as well as other early contributions.
The version at GfG uses a lot of extra memory. The version at CP uses a little less. Both are huge compared to typical byte or bit implementations of the SoE. At 10^9, they use over 60x more memory than a simple bit-array monolithic SoE, and are also half the speed, even when using uint32_t types.
So in practice it is slower than a simple 4-line monolithic SoE, which is usually where we start before getting into the interesting optimizations (segmented sieves, wheels, etc.). If you actually want the factor array, then that's useful. It's also useful for learning and experimentation, though the GfG article doesn't actually do much other than give the code. The CP page does go over a bit of the history and some memory/speed analysis.
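For reference, the simple monolithic SoE mentioned above is only a few lines (a sketch; production sieves add segmentation, wheels, and other optimizations):

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Plain Sieve of Eratosthenes over a packed bool array: O(n log log n) time,
// roughly n bits of memory when std::vector<bool> packs bits.
std::vector<bool> sieve(std::uint64_t n) {
    std::vector<bool> is_composite(n + 1, false);
    for (std::uint64_t p = 2; p * p <= n; ++p)
        if (!is_composite[p])
            for (std::uint64_t m = p * p; m <= n; m += p)
                is_composite[m] = true;
    return is_composite;  // is_composite[k] == false means k is prime (k >= 2)
}

int main() {
    auto c = sieve(100);
    for (int k = 2; k <= 100; ++k)
        if (!c[k]) std::cout << k << ' ';
    std::cout << '\n';
}
```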
The algorithm at your link is a variation of Algorithm 3.3 in Paul Pritchard's paper "Linear Prime-Number Sieves: a Family Tree". The reason the algorithm is linear, i.e. O(n), is that each composite is removed exactly once: a composite c has a unique form p*f where p = lpf(c), and it is removed when the outer loop variable is f and the inner loop variable j is such that p[j] = p.
Incidentally, the code is inelegant. There is no need for two arrays; SPF suffices. Also, the first test (on j) in the inner loop is unnecessary.
Many other linear sieves are presented in Pritchard's paper, one of which is due to Gries and Misra, which is an entirely different algorithm. The algorithm at your link is often mis-attributed to Gries and Misra.
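To make the argument concrete, here is a sketch of the SPF-only linear sieve described above: spf[c] ends up holding the least prime factor of c, and every composite p*i is written exactly once (with p = lpf(p*i)), which is what makes the total work O(n).

```cpp
#include <iostream>
#include <vector>

int main() {
    const int n = 100;
    std::vector<int> spf(n + 1, 0), primes;

    for (int i = 2; i <= n; ++i) {
        if (spf[i] == 0) {               // i has no smaller prime factor: it is prime
            spf[i] = i;
            primes.push_back(i);
        }
        for (int p : primes) {
            if (p > spf[i] || 1LL * p * i > n) break;
            spf[p * i] = p;              // p is the least prime factor of p*i
        }
    }

    for (int p : primes) std::cout << p << ' ';
    std::cout << '\n';
}
```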
The Wikipedia article for merge sort.
The Wikipedia article for quick sort.
Both articles have excellent visualizations.
Both have n*log(n) complexity.
So obviously the distribution of the data will affect the speed of the sort. My guess would be that since a comparison compares any two values equally quickly, no matter how far apart they are, the range of the data values does not matter.
More importantly, one should consider the lateral distribution (x direction) with respect to ordering (magnitude removed).
A good test case to consider would be if the test data had some level of sorting...
It typically depends on the data structures involved. Quick sort is typically the fastest, but it doesn't guarantee O(n*log(n)); there are degenerate cases where it becomes O(n^2). Heap sort is the usual alternative; it guarantees O(n*log(n)) regardless of the initial order, but it has a much higher constant factor. It's usually used when you need a hard upper limit on the time taken. Some more recent algorithms use quick sort, but attempt to recognize when it starts to degenerate, and switch to heap sort then.

Merge sort is used when the data structure doesn't support random access, since it works with pure sequential access (forward iterators, rather than random access iterators). It's used in std::list<>::sort, for example. It's also widely used for external sorting, where random access can be very, very expensive compared to sequential access. (When sorting a file which doesn't fit into memory, you might break it into chunks which fit into memory, sort each chunk with quicksort, write each out to a file, then merge sort the generated files.)
Mergesort is quicker when dealing with linked lists. This is because pointers can easily be changed when merging lists. It only requires one pass (O(n)) through the list.
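To illustrate, here is a sketch of the merge step on hand-rolled singly linked list nodes (the Node type is just for illustration): nodes are relinked, never copied, in a single pass.

```cpp
#include <iostream>

struct Node {
    int value;
    Node* next;
};

// Merge two already-sorted lists by relinking their nodes.
Node* merge(Node* a, Node* b) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (a && b) {
        Node*& smaller = (a->value <= b->value) ? a : b;
        tail->next = smaller;        // relink, do not copy
        tail = smaller;
        smaller = smaller->next;
    }
    tail->next = a ? a : b;          // append whichever list remains
    return dummy.next;
}

int main() {
    Node a3{5, nullptr}, a2{3, &a3}, a1{1, &a2};
    Node b3{6, nullptr}, b2{4, &b3}, b1{2, &b2};
    for (Node* n = merge(&a1, &b1); n; n = n->next)
        std::cout << n->value << ' ';   // 1 2 3 4 5 6
    std::cout << '\n';
}
```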
Quicksort's in-place algorithm requires the movement (swapping) of data. While this can be very efficient for in-memory dataset, it can be much more expensive if your dataset doesn't fit in memory. The result would be lots of I/O.
These days, there is a lot of parallelization that occurs. Parallelizing Mergesort is simpler than Quicksort (in-place). If not using the in-place algorithm, then the space complexity for quicksort is O(n), which is the same as mergesort.
So, to generalize, quicksort is probably more effective for datasets that fit in memory. For stuff that's larger, it's better to use mergesort.
The other general time to use mergesort over quicksort is if the data is very similar (that is, not close to being uniform). Quicksort relies on using a pivot. In the case where all the values are similar, quicksort hits a worst case of O(n^2). If the values of the data are very similar, then it's more likely that a poor pivot will be chosen, leading to very unbalanced partitions and an O(n^2) runtime. The most straightforward example is if all the values in the list are the same.
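A quick way to see this is to count comparisons for a plain Lomuto-partition quicksort (a sketch, not any library's code) on all-equal input; the count roughly quadruples each time n doubles.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// On an all-equal input every Lomuto partition is maximally unbalanced,
// so the comparison count grows quadratically.
static std::size_t comparisons = 0;

std::size_t partition(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    int pivot = a[hi];
    std::size_t i = lo;
    for (std::size_t j = lo; j < hi; ++j) {
        ++comparisons;
        if (a[j] <= pivot) std::swap(a[i++], a[j]);
    }
    std::swap(a[i], a[hi]);
    return i;
}

void quicksort(std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return;
    std::size_t p = partition(a, lo, hi);
    if (p > lo) quicksort(a, lo, p - 1);
    quicksort(a, p + 1, hi);
}

int main() {
    for (std::size_t n : {1000, 2000, 4000}) {
        std::vector<int> a(n, 42);           // all keys equal
        comparisons = 0;
        quicksort(a, 0, n - 1);
        std::cout << "n=" << n << " comparisons=" << comparisons << '\n';
    }
}
```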
There is a real-world sorting algorithm -- called Timsort -- that does exploit the idea that data encountered in the wild is often partially sorted.
The algorithm is derived from merge sort and insertion sort, and is used in CPython, Java 7 and Android.
See the Wikipedia article for more details.
While Java 6 and earlier versions use merge sort as the sorting algorithm, C# uses QuickSort as the sorting algorithm.
QuickSort performs better than merge sort even though they are both O(n log n), because QuickSort has a smaller constant factor than merge sort.
Of the two, use merge sort when you need a stable sort. You can use a modified quicksort (such as introsort) when you don't, since it tends to be faster and it uses less memory.
Plain old Quicksort as described by Hoare is quite sensitive to performance-killing special cases that make it Theta(n^2), so you normally do need a modified version. That's where the data-distribution comes in, since merge sort doesn't have bad cases. Once you start modifying quicksort you can go on with all sorts of different tweaks, and introsort is one of the more effective ones. It detects on the fly whether it's in a killer case, and if so switches to heapsort.
In fact, Hoare's most basic Quicksort fails worst for already-sorted data, and so your "good test cases" with some level of sorting will kill it to some level. That fact is for curiosity only, though, since it only takes a very small tweak to avoid that, nothing like as complicated as going all the way to introsort. So it's simplistic to even bother analyzing the version that's killed by sorted data.
In practice, in C++ you'd generally use std::stable_sort and std::sort rather than worrying too much about the exact algorithm.
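A small illustration of the stability difference mentioned above, sorting records by their integer key only:

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::vector<std::pair<int, std::string>> records{
        {2, "first 2"}, {1, "only 1"}, {2, "second 2"}, {2, "third 2"}};

    auto by_key = [](const auto& a, const auto& b) { return a.first < b.first; };

    // std::stable_sort keeps equal-keyed records in their original order;
    // with std::sort the relative order of the 2s would be unspecified.
    std::stable_sort(records.begin(), records.end(), by_key);

    for (const auto& r : records)
        std::cout << r.first << " -> " << r.second << '\n';
    // Guaranteed order of the 2s: "first 2", "second 2", "third 2".
}
```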
Remember in practice, unless you have a very large data set and/or are executing the sort many many times, it probably won't matter at all. That being said, quicksort is generally considered the 'fastest' n*log(n) sorter. See this question already asked: Quick Sort Vs Merge Sort
I believe that the C++ standard for std::sort does not guarantee O(n) performance on a list that's already sorted. But still, I'm wondering whether, to your knowledge, any implementations of the STL (GCC, MSVC, etc.) make the std::is_sorted check before executing the sort algorithm?
Asked another way, what performance can one expect (without guarantees, of course) from running std::sort on a sorted container?
Side note: I posted some benchmarks for GCC 4.5 with C++0x enabled on my blog; the results are there.
Implementations are free to use any efficient sorting algorithm they want, so this is highly implementation dependent.
However, I have seen a performance comparison of libstdc++ as used on Linux against libc++, the new C++ standard library developed by Apple/LLVM. Both libraries are very efficient on sorted or reverse-sorted data (much faster than on a random list), with the new library being considerably faster than the old and recognizing many more patterns.
To be certain you should consider doing your own benchmarks.
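A rough sketch of such a benchmark, timing std::sort on already-sorted versus shuffled input (numbers will depend heavily on the compiler, standard library, and optimization flags, so build with optimizations enabled):

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

// Time one std::sort call on a copy of the input, in microseconds.
long long time_sort(std::vector<int> v) {            // copy taken on purpose
    auto t0 = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}

int main() {
    std::vector<int> sorted(1'000'000);
    std::iota(sorted.begin(), sorted.end(), 0);       // already sorted

    std::vector<int> shuffled = sorted;
    std::mt19937 rng(42);
    std::shuffle(shuffled.begin(), shuffled.end(), rng);

    std::cout << "sorted input:   " << time_sort(sorted)   << " us\n";
    std::cout << "shuffled input: " << time_sort(shuffled) << " us\n";
}
```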
No. Also, it would not make sense for an STL implementation to call is_sorted(): is_sorted() is already available as a stand-alone algorithm, and many users would not want to waste execution cycles on that check when they already know their container is not sorted.
The STL should also follow the C++ philosophy: pay only for what you use.
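In code, the "pay only for what you use" point just means the caller adds the check when it is worth it, for example:

```cpp
#include <algorithm>
#include <vector>

// A caller who suspects the data may already be sorted can add the check
// explicitly, instead of the library doing it for everyone.
void sort_if_needed(std::vector<int>& v) {
    if (!std::is_sorted(v.begin(), v.end()))
        std::sort(v.begin(), v.end());
}
```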
Wow! Did you have optimizations all the way cranked up?
Here are the results of your code on my platform (note the values on the vertical axis).
I suggest you read this comparison of sorting algorithms; it is very well done and informative. It compares a number of sorting algorithms with each other and with GCC's implementation of std::sort. You will notice, in the charts at the given link, that the performance of std::sort for "almost sorted" and "almost reverse" inputs is linear in the number of elements to sort, that is, O(n). So there is no guarantee, but you can easily expect an almost-sorted list to be sorted in almost linear time. The implementation does not do an is_sorted check, though, and even if it sorts a sorted array in linear time, it won't be as fast as doing an is_sorted check and skipping the sorting altogether. It is your decision whether it is better to check before sorting or not.
The standard sanctions only std::sort implementations with complexity O(n log n):
Complexity: Approximately N log N (where N == last - first) comparisons on the average.
See section 25.3.1.1 Sorting [lib.sort] (ISO/IEC 14882:2003(E)).
Thus, the set of allowed sorting functions is limited, and you are right that it does not guarantee linear complexity.
Ideal behavior for a sort is O(n), but this is not possible in the average case.
Of course the average case is not necessarily the exact case you have right now, so for corner cases, there's not much of a guarantee.
And why would any implementation do that check? What would it gain? Nothing, on average. A good design rule is not to clutter the implementation with optimizations for corner cases that make no difference on average. This example is similar to checking for self-assignment. The simple answer: don't do it.
There's no guarantee that it'll check this. Some implementations will do it, others probably won't.
However, if you suspect that your input might already be sorted (or nearly sorted), std::stable_sort might be a better option.