Please, can someone explain this: if the documentation says that finding an element in an STL std::vector has performance O(ln(n)), what does that mean?
O(ln(n)) - what is "O", and where can I read about it?
And where can I read about the performance of the other STL containers?
Thank you very much.
Big O notation is a way to measure how an algorithm will scale as the size of the data on which it is working grows.
Finding an element in a vector is usually O(n); it's only O(lg(n)) when the vector is sorted and you use one of the binary search family of algorithms.
The complexity of each algorithm is specified in the standard and also in any reference (such as the above link to std::lower_bound).
BTW, ln is the logarithm to base e, but all logarithms differ only by a constant factor, so they have the same complexity: even though a binary search only performs lg (log base 2) operations, it's technically correct to say it's O(ln(n)).
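To make the difference concrete, here is a minimal sketch (the container and values are invented for illustration) contrasting the O(n) linear scan with the O(lg(n)) binary search that requires a sorted range:
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {7, 3, 9, 1, 5};
    // O(n): std::find scans elements one by one until it hits a match.
    bool found_linear = std::find(v.begin(), v.end(), 9) != v.end();
    // O(lg(n)): binary search, valid only on a sorted range.
    std::sort(v.begin(), v.end());
    bool found_binary = std::binary_search(v.begin(), v.end(), 9);
    std::cout << found_linear << ' ' << found_binary << '\n';  // prints "1 1"
}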
Big-O notation is about the time complexity of a program's performance.
So O(ln(n)) means that the time to find an element in a std::vector grows in proportion to ln(size_of_vector) as the vector gets larger, but only for a sorted vector searched with binary search. Binary search is faster than linear search because it eliminates half of the remaining candidates at each step, so the ln is really a log base 2.
http://en.wikipedia.org/wiki/Big_O_notation
This is known as Big O notation. It is an expression of the asymptotic complexity of a given algorithm, in relation to some parameters:
- asymptotic means that we are not interested in the first few cases, but in the behavior of the algorithm as the size of the input grows;
- the parameters depend on the algorithm being measured;
- the complexity is counted in terms of an operation in which we are interested.
For example, in the case of a binary search we express a relationship between the number of comparisons performed and the size of the set in which we search.
The running time can usually be inferred from this, but it is not always true, especially if one didn't pick the right "operation" with regard to the implementation or hardware constraints.
There was a nice post a few days ago about sorting using a tape as storage. Because the complexity of searching counts the number of comparisons, while with a tape as storage the running time is mostly dominated by the movement of the tape, it turned out that bubble sort would perform better than quicksort despite being described as slower in general.
The O is big O notation. It describes the runtime complexity of an algorithm. Essentially it's the number of operations required to compute the answer.
All the other answers have cleared up O. To find the typical complexity of a given algorithm, look at a decent reference, such as this; at the bottom of each algorithm's entry, its complexity is documented.
O is an abbreviation for "Order". It's a measure of the running time of the operation.
For example, this code is O(n^2) because it's going to execute n*n times.
int n = 100;
for (int i = 0; i < n; ++i) {
    for (int j = 0; j < n; ++j) {
        // the innermost body runs n * n = 10,000 times
    }
}
Why doesn't the C++ implementation of string::find() use the KMP algorithm (and run in O(N + M)), instead running in O(N * M)? Is that corrected in C++0x?
If the complexity of the current find is not O(N * M), what is it?
So what algorithm is implemented in gcc? Is it KMP? If not, why not?
I've tested it, and the running time shows that it runs in O(N * M).
Why doesn't the C++ implementation of string::substr() use the KMP algorithm (and run in O(N + M)), instead running in O(N * M)?
I assume you mean find(), rather than substr() which doesn't need to search and should run in linear time (and only because it has to copy the result into a new string).
The C++ standard doesn't specify implementation details, and only specifies complexity requirements in some cases. The only complexity requirements on std::string operations are that size(), max_size(), operator[], swap(), c_str() and data() are all constant time. The complexity of anything else depends on the choices made by whoever implemented the library you're using.
The most likely reason for choosing a simple search over something like KMP is to avoid needing extra storage. Unless the string to be found is very long, and the string to search contains a lot of partial matches, the time taken to allocate and free that storage would likely be much more than the time saved by the better asymptotic complexity.
Is that corrected in c++0x?
No, C++11 doesn't add any complexity requirements to std::string, and certainly doesn't add any mandatory implementation details.
If the complexity of current substr is not O(N * M), what is that?
That's the worst-case complexity, when the string to search contains a lot of long partial matches. If the characters have a reasonably uniform distribution, then the average complexity would be closer to O(N). So by choosing an algorithm with better worst-case complexity, you may well make more typical cases much slower.
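To see where that worst case comes from, here is a small illustration (the sizes are arbitrary): a needle made of one long partial match, searched in a haystack of a single repeated character, forces a naive search to do roughly N * M character comparisons.
#include <iostream>
#include <string>

int main() {
    // Haystack of N identical characters; needle of M-1 of the same
    // character followed by a final mismatching character. A naive search
    // compares nearly M characters at each of the ~N shift positions.
    std::string haystack(100000, 'a');
    std::string needle(999, 'a');
    needle += 'b';
    std::cout << haystack.find(needle) << '\n';  // prints std::string::npos (no match)
}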
FYI,
The string::find implementations in both gcc/libstdc++ and llvm/libcxx were very slow. I improved both of them quite significantly (by ~20x in some cases). You might want to check the new implementations:
GCC: PR66414 optimize std::string::find
https://github.com/gcc-mirror/gcc/commit/fc7ebc4b8d9ad7e2891b7f72152e8a2b7543cd65
LLVM: https://reviews.llvm.org/D27068
The new algorithm is simpler and uses the hand-optimized assembly implementations of memchr and memcmp.
Where do you get the impression that std::string::substr() doesn't use a linear algorithm? In fact, I can't even imagine how to implement it in a way that has the complexity you quoted. Also, there isn't much of an algorithm involved: is it possible that you think this function does something other than what it does? std::string::substr() just creates a new string starting at its first argument, using either the number of characters specified by the second parameter or the characters up to the end of the string.
You may be referring to std::string::find(), which doesn't have any complexity requirements, or std::search(), which is indeed allowed to do O(n * m) comparisons. However, this gives implementers the freedom to choose between an algorithm with the best theoretical complexity and one which doesn't need additional memory. Since allocation of arbitrary amounts of memory is generally undesirable unless specifically requested, this seems a reasonable thing to do.
The C++ standard does not dictate the performance characteristics of substr (or many other parts, including the find you're most likely referring to with an M*N complexity).
It mostly dictates functional aspects of the language (with some exceptions, such as the non-legacy sort functions).
Implementations are even free to implement qsort as a bubble sort (but only if they want to be ridiculed and possibly go out of business).
For example, there are only seven (very small) sub-points in section 21.4.7.2 basic_string::find of C++11, and none of them specify performance parameters.
Let's look at the CLRS book. On page 989 of the third edition we have the following exercise:
Suppose that pattern P and text T are randomly chosen strings of length m and n, respectively, from the d-ary alphabet Σ_d = {0, 1, ..., d-1}, where d >= 2. Show that the expected number of character-to-character comparisons made by the implicit loop in line 4 of the naive algorithm is
(n - m + 1) * (1 - d^{-m}) / (1 - d^{-1}) <= 2(n - m + 1)
over all executions of this loop. (Assume that the naive algorithm stops comparing characters for a given shift once it finds a mismatch or matches the entire pattern.) Thus, for randomly chosen strings, the naive algorithm is quite efficient.
NAIVE-STRING-MATCHER(T, P)
1  n = T.length
2  m = P.length
3  for s = 0 to n - m
4      if P[1..m] == T[s+1..s+m]
5          print "Pattern occurs with shift" s
Proof:
For a single shift we expect to perform 1 + 1/d + ... + 1/d^{m-1} comparisons, since each successive character matches with probability 1/d and the comparison stops at the first mismatch. Now use the geometric summation formula and multiply by the number of valid shifts, which is n - m + 1. □
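Spelling out the summation step (plain geometric-series algebra, using only what the proof already states):
1 + 1/d + ... + 1/d^{m-1} = (1 - d^{-m}) / (1 - d^{-1}) <= 1 / (1 - 1/d) = d / (d - 1) <= 2, since d >= 2.
Multiplying by the n - m + 1 valid shifts gives an expected total of at most 2(n - m + 1) comparisons, which matches the bound stated in the exercise.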
Where do you get your information about the C++ library? If you do mean std::string::find and it really doesn't use the KMP algorithm, then I suggest that it is because that algorithm isn't generally faster than a simple linear search, due to having to build a partial match table before the search can proceed.
If you are going to search for the same pattern in multiple texts, the Boyer-Moore algorithm is a good choice, because the pattern tables need only be computed once but are reused when searching each text. If you are only going to search for a pattern once in one text, though, the overhead of computing the tables along with the overhead of allocating memory slows you down too much, and std::string::find(...) will beat you since it does not allocate any memory and has no setup overhead.
Boost has multiple string searching algorithms. I found that Boyer-Moore was an order of magnitude slower than std::string::find() in the case of a single pattern search in one text.
For my cases Boyer-Moore was rarely faster than std::string::find(), even when searching multiple texts with the same pattern.
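If you want to experiment with this yourself without Boost, C++17 added searcher objects to the standard library; a minimal sketch (the text and pattern here are placeholders):
#include <algorithm>
#include <functional>
#include <iostream>
#include <string>

int main() {
    std::string text = "here is some text with a needle buried in it";
    std::string pattern = "needle";
    // Build the Boyer-Moore tables once (this is where the setup cost lives)...
    std::boyer_moore_searcher searcher(pattern.begin(), pattern.end());
    // ...then reuse the searcher for every text you scan (C++17).
    auto it = std::search(text.begin(), text.end(), searcher);
    if (it != text.end())
        std::cout << "found at offset " << (it - text.begin()) << '\n';
}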
I'm doing a project in which I need to insert data into vectors, sort it, and search it. I need the fastest possible algorithms for sorting and searching. I've been looking around and found out that std::sort is basically quicksort, which is one of the fastest sorts, but I can't figure out which search algorithm is the best. Binary search? Can you help me with it? Thanks. So I've got 3 methods:
void addToVector(Obj o)
{
    fvector.push_back(o);
}
void sortVector()
{
    sort(fvector.begin(), fvector.end());
}
Obj* search(string& bla)
{
    // I would write binary search here
    return binarysearch(..);
}
I've been searching and found out that std::sort is basically quicksort.
Answer: Not quite. Most implementations use a hybrid algorithm like introsort, which combines quick-sort, heap-sort and insertion sort.
Quick-sort is one of the fastest sorting methods.
Answer: Not quite. In general it holds (i.e., in the average case quick-sort is of O(n log n) complexity). However, quick-sort has quadratic worst-case performance (i.e., O(n^2)). Furthermore, for a small number of inputs (e.g., a std::vector with a small number of elements), sorting with quick-sort tends to achieve worse performance than other sorting algorithms that are considered "slower".
I can't figure out which searching algorithm is the best. Is it binary-search?
Answer: Binary search has the same average and worst case performance, O(log n). Also keep in mind that binary search requires the container to be arranged in ascending or descending order. However, whether it is better than other searching methods (e.g., linear search, which has O(n) time complexity) depends on a number of factors. Some of them are:
The number of elements/objects.
The type of elements/objects.
Bottom Line:
Usually, looking for the "fastest" algorithm denotes premature optimization; as one of the "great ones" put it, "premature optimization is the root of all evil" (Donald Knuth). The "fastest", as I hope has been clearly shown, depends on quite a number of factors.
Use std::sort to sort your std::vector.
After sorting your std::vector, use std::binary_search to find out whether a certain element exists in it, or use std::lower_bound or std::upper_bound to find and get an element from it.
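A minimal sketch of that workflow (plain strings stand in for the asker's Obj type):
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> v = {"pear", "apple", "orange", "banana"};
    // 1) Sort once after all insertions are done: O(n log n).
    std::sort(v.begin(), v.end());
    // 2) Existence check: O(log n).
    bool has_apple = std::binary_search(v.begin(), v.end(), "apple");
    // 3) Locate the element itself: O(log n).
    auto it = std::lower_bound(v.begin(), v.end(), "orange");
    if (it != v.end() && *it == "orange")
        std::cout << *it << ' ' << has_apple << '\n';  // prints "orange 1"
}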
For amortised O(1) access times, use a std::unordered_map, maybe with a custom hash for best effect.
Sorting seems to be unnecessary extra work.
Searching and Sorting efficiency is highly dependent on the type of data, the ordering of the raw data, and the quantity of the data.
For example, for small sorted data sets, a linear search may be faster than a binary search, or the time difference between the two may be negligible.
Some sort algorithms will perform horribly on inversely ordered data, such as binary tree sort. Data that does not have much variation may cause a high degree of collisions with hash algorithms.
Perhaps you need to answer the bigger question: Is search or sorting the execution bottleneck in my program? Profile and find out.
If you need the fastest or the best sorting algorithm: there is no such thing. At least it hasn't been found yet. There are algorithms that provide better results for particular data, and algorithms that provide good results for most data. You either need to analyze your data and find the best one for your case, or use a generic algorithm like std::sort and expect it to provide good, though not necessarily the best, results.
If your elements are integers you could use a bucket sort algorithm, which runs in O(n) time instead of the O(n log n) average case of quicksort (a rough sketch follows the link below).
http://en.wikipedia.org/wiki/Bucket_sort
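Here is that idea as a counting-sort style pass; the value bound of 100 is invented for the example and must be known in advance:
#include <iostream>
#include <vector>

int main() {
    const int max_value = 100;                       // assumed known upper bound on the keys
    std::vector<int> data = {5, 1, 99, 5, 42, 0};
    std::vector<int> count(max_value + 1, 0);
    for (int x : data)                               // O(n): tally each value
        ++count[x];
    std::vector<int> sorted;
    for (int v = 0; v <= max_value; ++v)             // O(n + k): emit values in order
        sorted.insert(sorted.end(), count[v], v);
    for (int x : sorted)
        std::cout << x << ' ';                       // 0 1 5 5 42 99
    std::cout << '\n';
}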
Sorting
In case you want to know about the fastest sorting technique for integer values in a vector, I would suggest you refer to the following link:
https://github.com/fenilgmehta/Fastest-Integer-Sort
It uses radix sort and counting sort for large arrays and merge sort along with insertion sort for small arrays.
According to the benchmarks in that repository, this sorting algorithm is way faster than C++ std::sort for integral values; it is reported to be 6 times faster than STL std::sort for "int64_t array[10000000]".
Searching
If you want to know whether a particular value is present in the vector, use binary_search(...).
If you want to know the exact location of an element, use lower_bound(...) and upper_bound(...).
Given an array of N integers such that only one integer is repeated, find the repeated integer in O(n) time and constant space. There is no bound on the values of the integers or on N.
For example, given an array of 6 integers: 23 45 67 87 23 47, the answer is 23.
(I hope this covers the ambiguous and vague part.)
I searched on the net but was unable to find any such question in which the range of the integers was not fixed.
Also, here is an example that answers a similar question to mine, but there a hash table is created whose size is the highest integer value; C++ does not allow creating an array with 2^64 elements (on a 64-bit computer).
I am sorry I didn't mention it before: the array is immutable.
Jun Tarui has shown that any duplicate finder using O(log n) space requires at least Ω(log n / log log n) passes, which exceeds linear time. I.e. your question is provably unsolvable even if you allow logarithmic space.
There is an interesting algorithm by Gopalan and Radhakrishnan that finds duplicates in one pass over the input and O((log n)^3) space, which sounds like your best bet a priori.
Radix sort has time complexity O(kn) where k > log_2 n often gets viewed as a constant, albeit a large one. You cannot implement a radix sort in constant space obviously, but you could perhaps reuse your input data's space.
There are numerical tricks if you assume features about the numbers themselves. If almost all numbers between 1 and n are present, then simply add them up and subtract n(n+1)/2. If all the numbers are primes, you could cheat by ignoring the running time of division.
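For that particular trick the assumption is that the values 1..n are all present plus exactly one extra duplicate; under that assumption the arithmetic is simply this (watch out for overflow with large n):
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
    // 1..5 all present, with 3 appearing twice (assumed input shape).
    std::vector<std::int64_t> a = {1, 4, 3, 2, 5, 3};
    std::int64_t n = static_cast<std::int64_t>(a.size()) - 1;  // 5 distinct values
    std::int64_t sum = 0;
    for (std::int64_t x : a)
        sum += x;
    std::cout << sum - n * (n + 1) / 2 << '\n';  // 18 - 15 = 3, the duplicate
}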
As an aside, there is a well-known lower bound of Ω(log_2(n!)) on comparison sorting, which suggests that google might help you find lower bounds on simple problems like finding duplicates as well.
If the array isn't sorted, you can only do it in O(n log n).
Some approaches can be found here.
If the range of the integers is bounded, you can perform a counting sort variant in O(n) time. The space complexity is O(k), where k is the upper bound on the integers(*), but since that's a constant, it's O(1).
If the range of the integers is unbounded, then I don't think there's any way to do this, but I'm not an expert at complexity puzzles.
(*) It's O(k) since there's also a constant upper bound on the number of occurrences of each integer, namely 2.
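A sketch of that counting approach, assuming the values are known to lie in [0, k] (the bound k = 100 is invented for the example):
#include <iostream>
#include <vector>

int main() {
    const int k = 100;                       // assumed upper bound on the values
    std::vector<int> a = {23, 45, 67, 87, 23, 47};
    std::vector<int> seen(k + 1, 0);
    for (int x : a) {
        if (seen[x]++) {                     // already seen once: this is the duplicate
            std::cout << x << '\n';          // prints 23
            break;
        }
    }
}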
In the case where the entries are bounded by the length of the array, then you can check out Find any one of multiple possible repeated integers in a list and the O(N) time and O(1) space solution.
The generalization you mention is discussed in this follow up question: Algorithm to find a repeated number in a list that may contain any number of repeats and the O(n log^2 n) time and O(1) space solution.
The approach that would come closest to O(N) in time is probably a conventional hash table, where the hash entries are simply the numbers, used as keys. You'd walk through the list, inserting each entry in the hash table, after first checking whether it was already in the table.
Not strictly O(N), however, since hash search/insertion gets slower as the table fills up. And in terms of storage it would be expensive for large lists -- at least 3x and possibly 10-20x the size of the array of numbers.
As was already mentioned by others, I don't see any way to do it in O(n).
However, you can try a probabilistic approach by using a Bloom Filter. It will give you O(n) if you are lucky.
Since extra space is not allowed, this can't be done without comparisons. The concept of the lower bound on the time complexity of comparison sorting can be applied here to prove that the problem in its original form can't be solved in O(n) in the worst case.
You can also just sort the array and scan adjacent elements, although note that the sort makes this O(n log n), not O(n):
import java.util.Arrays;

public class DuplicateInOnePass {

    public static void duplicate() {
        int[] ar = {6, 7, 8, 8, 7, 9, 9, 10};
        Arrays.sort(ar);
        // after sorting, duplicate values sit next to each other
        for (int i = 0; i < ar.length - 1; i++) {
            if (ar[i] == ar[i + 1])
                System.out.println("Duplicate element: " + ar[i]);
        }
    }

    public static void main(String[] args) {
        duplicate();
    }
}
Could anyone explain how an unordered_set works? I am also not sure how a set works. My main question is: what is the efficiency of its find function?
For example, what is the total big O running time of this?
vector<int> theFirst;
vector<int> theSecond;
vector<int> theMatch;
theFirst.push_back( -2147483648 );
theFirst.push_back(2);
theFirst.push_back(44);
theSecond.push_back(2);
theSecond.push_back( -2147483648 );
theSecond.push_back( 33 );
//1) Place the contents into a unordered set that is O(m).
//2) O(n) look up so thats O(m + n).
//3) Add them to third structure so that's O(t)
//4) All together it becomes O(m + n + t)
unordered_set<int> theUnorderedSet(theFirst.begin(), theFirst.end());
for (int i = 0; i < theSecond.size(); i++)
{
    if (theUnorderedSet.find(theSecond[i]) != theUnorderedSet.end())
    {
        theMatch.push_back( theSecond[i] );
        cout << theSecond[i];
    }
}
unordered_set and all the other unordered_ data structures use hashing, as mentioned by @Sean. Hashing involves amortized constant time for insertion, and close to constant time for lookup. A hash function essentially takes some information and produces a number from it. It is a function in the sense that the same input has to produce the same output. However, different inputs can result in the same output, resulting in what is termed a collision. Lookup would be guaranteed to be constant time for a "perfect hash function", that is, one with no collisions.
In practice, the input number comes from the element you store in the structure (say its value, if it is a primitive type), and the hash function maps it to a location in the data structure. Hence, for a given key, the function takes you to the place where the element is stored without the need for any traversal or search (ignoring collisions here for simplicity), hence constant time. There are different implementations of these structures (open addressing, chaining, etc.). See hash table, hash function. I also recommend section 3.7 of The Algorithm Design Manual by Skiena.
Now, concerning big-O complexity, you are right that you have O(m) + O(n) + O(size of overlap). Since the overlap cannot be bigger than the smaller of m and n, the overall complexity can be expressed as O(kN), where N is the larger of m and n, i.e. O(N). Again, this is the best case, without collisions, with perfect hashing.
set and multiset, on the other hand, use binary trees, so insertions and look-ups are typically O(log N). The actual performance of a hashed structure vs. a binary tree one will depend on N, so it is best to try the two approaches and profile them in a realistic running scenario.
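A small sketch contrasting the two lookups (the values are arbitrary):
#include <iostream>
#include <set>
#include <unordered_set>

int main() {
    std::set<int> tree = {2, 33, 44, -5};                     // balanced binary tree: O(log N) find
    std::unordered_set<int> hashed(tree.begin(), tree.end()); // hash table: O(1) average find
    std::cout << (tree.find(44) != tree.end()) << ' '
              << (hashed.find(44) != hashed.end()) << '\n';   // prints "1 1"
}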
All of the std::unordered_* data types use a hash to perform lookups. Look at Boost's documentation on the subject and I think you'll gain an understanding very quickly.
http://www.boost.org/doc/libs/1_46_1/doc/html/unordered.html
What's faster: inserting into a priority queue, or sorting retrospectively?
I am generating some items that I need to be sorted at the end. I was wondering, what is faster in terms of complexity: inserting them directly in a priority_queue or a similar data structure, or using a sort algorithm at end?
Testing is the best way to answer this question for your specific computer architecture, compiler, and implementation. Beyond that, there are generalizations.
First off, priority queues are not necessarily O(n log n).
If you have integer data, there are priority queues which work in O(1) time. Beucher and Meyer's 1992 publication "The morphological approach to segmentation: the watershed transformation" describes hierarchical queues, which work quite quickly for integer values with a limited range. Brown's 1988 publication "Calendar queues: a fast O(1) priority queue implementation for the simulation event set problem" offers another solution which deals well with larger ranges of integers; two decades of work following Brown's publication have produced some nice results for making integer priority queues fast. But the machinery of these queues can become complicated: bucket sorts and radix sorts may still provide O(1) operation. In some cases, you may even be able to quantize floating-point data to take advantage of an O(1) priority queue.
Even in the general case of floating-point data, that O(n log n) is a little misleading. Edelkamp's book "Heuristic Search: Theory and Applications" has a handy table showing the time complexities of various priority queue algorithms (remember, priority queues are equivalent to sorting and heap management).
As that table shows, many priority queues have O(log n) costs not just for insertion, but also for extraction, and even for queue management! While the coefficients are generally dropped when measuring the time complexity of an algorithm, these costs are still worth knowing.
But all these queues still have time complexities which are comparable. Which is best? A 2010 paper by Cris L. Luengo Hendriks entitled "Revisiting priority queues for image analysis" addresses this question.
In Hendriks' hold test, a priority queue was seeded with N random numbers in the range [0,50]. The top-most element of the queue was then dequeued, incremented by a random value in the range [0,2], and then queued. This operation was repeated 10^7 times. The overhead of generating the random numbers was subtracted from the measured times. Ladder queues and hierarchical heaps performed quite well by this test.
The per-element times to initialize and empty the queues were also measured; these tests are very relevant to your question.
The different queues often had very different responses to enqueueing and dequeueing. These figures imply that while there may be priority queue algorithms which are superior for continuous operation, there is no best choice of algorithm for simply filling and then emptying a priority queue (the operation you're doing).
Let's look back at your questions:
What's faster: inserting into a priority queue, or sorting retrospectively?
As shown above, priority queues can be made efficient, but there are still costs for insertion, removal, and management. Insertion into a vector is fast. It's O(1) in amortized time, and there are no management costs, plus the vector is O(n) to be read.
Sorting the vector will cost you O(n log n) assuming that you have floating-point data, but this time complexity's not hiding things like the priority queues were. (You have to be a little careful, though. Quicksort runs very well on some data, but it has a worst-case time complexity of O(n^2). For some implementations, this is a serious security risk.)
I'm afraid I don't have data for the costs of sorting, but I'd say that retroactive sorting captures the essence of what you're trying to do better and is therefore the better choice. Based on the relative complexity of priority queue management versus post-sorting, I'd say that post-sorting should be faster. But again, you should test this.
I am generating some items that I need to be sorted at the end. I was wondering, what is faster in terms of complexity: inserting them directly in a priority-queue or a similar data structure, or using a sort algorithm at end?
We've probably covered this above.
There's another question you didn't ask, though. And perhaps you already know the answer. It's a question of stability. The C++ STL says that the priority queue must maintain a "strict weak" order. This means that elements of equal priority are incomparable and may be placed in any order, as opposed to a "total order" where every element is comparable. (There's a nice description of ordering here.) In sorting, "strict weak" is analogous to an unstable sort and "total order" is analogous to a stable sort.
The upshot is that if elements of the same priority should stay in the same order you pushed them into your data structure, then you need a stable sort or a total order. If you plan to use the C++ STL, then you have only one option. Priority queues use a strict weak ordering, so they're useless here, but the "stable_sort" algorithm in the STL Algorithm library will get the job done.
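A small sketch of that stability point (pairing each value with its insertion index is just one way to make the ordering visible):
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

int main() {
    // first = priority, second = the order in which the item was generated
    std::vector<std::pair<int, int>> items = {{2, 0}, {1, 1}, {2, 2}, {1, 3}};
    // stable_sort compares priorities only and keeps equal-priority items in
    // their original order; a priority_queue's strict weak ordering makes no
    // such promise.
    std::stable_sort(items.begin(), items.end(),
                     [](const std::pair<int, int>& a, const std::pair<int, int>& b) {
                         return a.first < b.first;
                     });
    for (const auto& p : items)
        std::cout << '(' << p.first << ',' << p.second << ") ";
    std::cout << '\n';  // (1,1) (1,3) (2,0) (2,2)
}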
Let me know if you'd like a copy of any of the papers mentioned or would like clarification. :-)
Inserting n items into a priority queue will have asymptotic complexity O(n log n) so in terms of complexity, it’s not more efficient than using sort once, at the end.
Whether it’s more efficient in practice really depends. You need to test. In fact, in practice, even continued insertion into a linear array (as in insertion sort, without building a heap) may be the most efficient, even though asymptotically it has worse runtime.
Depends on the data, but I generally find insertion sort to be faster.
I had a related question, and in the end I found the bottleneck was just that I was doing a deferred sort (only when I ended up needing it) on a large number of items, and I usually hit the worst-case scenario for my quicksort (the data was already in order), so I used an insertion sort instead.
Sorting 1000-2000 elements with many cache misses
So analyze your data!
To your first question (which is faster): it depends. Just test it. Assuming you want the final result in a vector, the alternatives might look something like this:
#include <iostream>
#include <vector>
#include <queue>
#include <cstdlib>
#include <functional>
#include <algorithm>
#include <iterator>
#ifndef NUM
#define NUM 10
#endif
int main() {
    std::srand(1038749);
    std::vector<int> res;
#ifdef USE_VECTOR
    for (int i = 0; i < NUM; ++i) {
        res.push_back(std::rand());
    }
    std::sort(res.begin(), res.end(), std::greater<int>());
#else
    std::priority_queue<int> q;
    for (int i = 0; i < NUM; ++i) {
        q.push(std::rand());
    }
    res.resize(q.size());
    for (int i = 0; i < NUM; ++i) {
        res[i] = q.top();
        q.pop();
    }
#endif
#if NUM <= 10
    std::copy(res.begin(), res.end(), std::ostream_iterator<int>(std::cout, "\n"));
#endif
}
$ g++ sortspeed.cpp -o sortspeed -DNUM=10000000 && time ./sortspeed
real 0m20.719s
user 0m20.561s
sys 0m0.077s
$ g++ sortspeed.cpp -o sortspeed -DUSE_VECTOR -DNUM=10000000 && time ./sortspeed
real 0m5.828s
user 0m5.733s
sys 0m0.108s
So, std::sort beats std::priority_queue, in this case. But maybe you have a better or worse std::sort, and maybe you have a better or worse implementation of a heap. Or if not better or worse, just more or less suited to your exact usage, which is different from my invented usage: "create a sorted vector containing the values".
I can say with a lot of confidence that random data won't hit the worst case of std::sort, so in a sense this test might flatter it. But for a good implementation of std::sort, its worst case will be very difficult to construct, and might not actually be all that bad anyway.
Edit: I added use of a multiset, since some people have suggested a tree:
#elif defined(USE_SET)
    std::multiset<int, std::greater<int> > s;
    for (int i = 0; i < NUM; ++i) {
        s.insert(std::rand());
    }
    res.resize(s.size());
    int j = 0;
    for (std::multiset<int, std::greater<int> >::iterator i = s.begin(); i != s.end(); ++i, ++j) {
        res[j] = *i;
    }
#else
$ g++ sortspeed.cpp -o sortspeed -DUSE_SET -DNUM=10000000 && time ./sortspeed
real 0m26.656s
user 0m26.530s
sys 0m0.062s
To your second question (complexity): they're all O(n log n), ignoring fiddly implementation details like whether memory allocation is O(1) or not (vector::push_back and other forms of insert at the end are amortized O(1)) and assuming that by "sort" you mean a comparison sort. Other kinds of sort can have lower complexity.
As far as I understand, your problem does not require a priority queue, since your task sounds like "make many insertions, and after that sort everything". That's like shooting birds with a laser: not the appropriate tool. Use standard sorting techniques for that.
You would need a priority queue if your task was to simulate a sequence of operations, where each operation can be either "add an element to the set" or "remove the smallest/greatest element from the set". This is used, for example, in the problem of finding a shortest path on a graph. Here you cannot just use standard sorting techniques.
A priority queue is usually implemented as a heap. Sorting using a heap is on average slower than quicksort, except that quicksort has a worse worst case performance. Also heaps are relatively heavy data structures, so there's more overhead.
I'd recommend sorting at the end.
Why not use a binary search tree? Then the elements are sorted at all times and the insertion costs are equal to the priority queue.
Read about red-black balanced trees here.
I think that the insertion is more efficient in almost all cases where you are generating the data (i.e. don't already have it in a list).
A priority queue is not your only option for insertion as you go. As mentioned in other answers a binary tree (or related RB-tree) is equally efficient.
I would also check how the priority queue is implemented - many are based on b-trees already but a few implementations are not very good at extracting the elements (they essentially go through the entire queue and look for the highest priority).
On a max-heap priority queue, insert operations are O(lg n).
There are a lot of great answers to this question. A reasonable "rule of thumb" is
If you have all your elements "up front" then choose sorting.
If you will be adding elements / removing minimal elements "on the fly" then use a priority queue (e.g., heap).
For the first case, the best "worst-case" sort is heap sort anyway and you'll often get better cache performance by just focusing on sorting (i.e. instead of interleaving with other operations).
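If you want that worst-case guarantee spelled out, the standard heap algorithms give you a heap sort directly; a minimal sketch (in practice a good std::sort, being an introsort, already falls back to heap sort when quicksort degenerates):
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {5, 1, 4, 2, 3};
    std::make_heap(v.begin(), v.end());   // O(n): arrange the elements as a max-heap
    std::sort_heap(v.begin(), v.end());   // O(n log n) worst case: repeatedly move the max to the back
    for (int x : v)
        std::cout << x << ' ';            // 1 2 3 4 5
    std::cout << '\n';
}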