sort() function in C++

I found two forms of sort() in C++:
1) sort(begin, end);
2) XXX.sort();
One can be used directly without an object, and the other is called on an object.
Is that all? What are the differences between these two sort()s? Do they come from the same library or not? Is the second a method of XXX?
Can I use it like this:
vector<int> myvector;
myvector.sort();
or
list<int> mylist;
mylist.sort();

std::sort is a function template that works with any pair of random-access iterators. Consequently, the algorithm implemented by std::sort is tailored (optimized) for random access. Most of the time it is going to be some flavor of quick-sort.
std::list::sort is a dedicated list-oriented version of sort. std::list is a container that does not support [efficient] random access, which means that you can't use std::sort with it. This creates the need for a dedicated sorting algorithm. Most of the time it will be implemented as some flavor of merge-sort.

std::list includes a sort() member, because there are ways to do sorting efficiently that work for lists, but almost nothing else. If you tried to use the same sorting implementation for a list as everything else, one of them wouldn't work very well, so they provide a special one for list.
Most of the time, you should use std::sort, and forget that std::list even exists. It's useful in a few very specific circumstances, but they're sufficiently rare that using it is a mistake at least 90% of the time that people think they should.
Edit: The situation is not that using std::list::sort is a mistake, but that using std::list is usually a mistake. There are a couple of reasons for this. First of all, std::list requires a couple of pointers in addition to space for each item you store, so unless your data is fairly large, it can impose a substantial space overhead. Second, since each node is allocated individually, traversing nodes tends to be fairly slow (locality of reference is low, so cache utilization tends to be poor). Finally, although you can insert or delete an element from the middle of a list with constant complexity, you (normally) need to use a linear-complexity traversal of the list to find the right point at which to insert or delete that item. Couple that with the poor cache utilization, and you get insertions and deletions in the middle of a linked list usually being slower than with something like a vector or deque.
The other 10% of the time (or maybe somewhat less) is when you can reasonably store a position in the list long-term, and (at least typically) plan on making a number of insertions and deletions at (or close to) that same point, without traversing the list each time to find the right place. If you make enough modifications at (or close to) the same place, you can amortize the traversal across them and come out ahead.
That can work out well in a few cases, but forces you to store more "state" data for a long period of time, mostly to make up for the deficiencies in the data structure itself. To the extent possible, I prefer to make individual operations close to pure and atomic, so they minimize dependence (and effect) on long-term data. In the single-threaded case, this mostly helped modularity, but with multi-core, multithreaded (etc.) programming, it can also have a substantial effect on efficiency and difficulty of development. Access to that long-term data generally requires thread synchronization.
That's not to say that avoiding list will automatically make your programs lots faster, or anything like that -- just that I find situations where it's really useful to be quite rare, and a lot of situations that seem to fit what people have been told about when a linked list will be useful, really aren't.

std::sort only works with random-access iterators. That limits it to arrays, vector and deque (and user-defined types which provide random access).
std::list is a linked list which doesn't support random access. The generic sort function doesn't work for it. And even if it did, it would be suboptimal. A linked list is not sorted by copying data from node to node - it is sorted by relinking the nodes.
std::sort also wouldn't work for other containers (sets and maps), but those are always sorted anyway.

Related

std::vector::insert vs std::list::operator[]

I know that std::list::operator[] is not implemented because it would perform badly. But std::vector::insert is just as inefficient as std::list::operator[] would be. What is the explanation behind this?
std::vector::insert is implemented because std::vector has to meet the requirements of the SequenceContainer concept, while operator[] is not required by any concept (that I know of); it may be added in a ContiguousContainer concept in C++17. So operator[] is added to containers that can be used like arrays, while insert is required by the interface specification, so that containers meeting a given concept can be used in generic algorithms.
I think Slava's answer does the best job, but I'd like to offer a supplementary explanation. Generally with data structures, it's far more common to have more accesses than insertions, rather than the other way around. Many data structures optimize access at the cost of insertion; the inverse is much rarer, because it tends to be less useful in real life.
operator[], if implemented for a linked list, would access an existing value presumably (similar to how it does for vector). insert adds new values. It's much more likely you will be willing to take a big performance hit to insert some new elements into a vector, provided that the subsequent accesses are extremely fast. In many cases, element insertion may be outside of the critical path entirely whereas the critical path consists of a single traversal, or random access of already-present data. So it's simply convenient to have insert take care of the details for you in that case (it's actually a bit annoying to write efficiently and correctly). This is actually a not-uncommon use of a vector.
On the other hand, using operator[] on a linked list would almost always be a sign that you are using the wrong data structure.
std::list::operator[] would require an O(N) traversal and is not really in accordance with what a list is designed to do. If you need operator[], then use a different container type. When C++ folk see a [], they assume an O(1) (or, at worst, an O(log N)) operation. Supplying [] for a list would break that expectation.
But although std::vector::insert is also O(N), it can be optimised: an at-end insertion can be readily optimised by having the vector's capacity grow in large chunks. An insertion in the middle requires an element-by-element move, but that again can be performed very quickly on modern chipsets.
The [] operator is inherited from plain arrays. It has always been understood as a fast (sub-linear-time) accessor of the underlying container. Since std::list does not support sub-linear-time access, it does not make sense for it to implement the operator.
auto a = Container [10]; // Ideally I can assume this is quick
The equivalent of your std::list<>::operator[] is std::next(it, n), documented at cppreference.
auto a = *std::next (Container.begin (), 10); // This may take a little while
This is the truly generic way to index a container. If the container supports random access, it will be constant time; otherwise it will be linear.

Fast data structure that supports finding the minimum element and accessing, inserting, removing and updating data at any index

I'm looking for ideas to implement a templatized sequence container data structure which can beat the performance of std::vector in as many features as possible and potentially perform much faster. It should support the following:
Finding the minimum element (and returning its index)
Insertion at any index
Removal at any index
Accessing and updating any element by index (via operator[])
What would be some good ways to implement such a structure in C++?
You can generally be pretty sure that the STL implementations of all containers are very good at the range of tasks they were designed for. That is to say, you're unlikely to be able to build a container that is as robust as std::vector and quicker for all applications. However, generally speaking, it is almost always possible to beat a generic tool when optimizing for a specific application.
First, let's think about what a vector actually is. You can think of it as a pointer to a C-style array, except that its elements are stored on the heap. Unlike a C array, it also provides a bunch of methods that make it a little more convenient to manipulate. But like a C array, all of its data is stored contiguously in memory, so lookups are extremely cheap, but changing its size may require the entire array to be moved elsewhere in memory to make room for the new elements.
Here are some ideas for how you could do each of the things you're asking for better than a vanilla std::vector:
Finding the minimum element: Search is typically O(N) for many containers, and certainly for a vector (because you need to iterate through all elements to find the lowest). You can make it O(1), or very close to free, by simply keeping the smallest element at all times, and only updating it when the container is changed.
Insertion at any index: If your elements are small and there are not many, I wouldn't bother tinkering here; just do what the vector does and keep elements contiguous to keep lookups quick. If you have large elements, store pointers to the elements instead of the elements themselves (boost's stable vector will do this for you). Keep in mind that this makes lookup more expensive, because you now need to dereference the pointer, so whether you want to do this will depend on your application. If you know the number of elements you are going to insert, std::vector provides the reserve method, which preallocates some memory for you; what it doesn't do is let you decide how the allocated memory grows. So if your application warrants lots of push_back operations without enough information to intelligently call reserve, you might be able to beat the standard std::vector implementation by tailoring the growth function of your container to your particular needs. Another option is a linked list (e.g. std::list), which will beat a std::vector at insertion for larger containers. However, the cost is that lookup (see 4.) becomes vastly slower (O(N) instead of O(1) for vectors), so you're unlikely to want to go down this path unless you plan to do more insertions/erasures than lookups.
Removal at any index: Similar considerations as for 2.
Accessing and updating any element by index (via operator[]): The only way you can beat std::vector in this regard is by making sure your data is in the cache when you try to access it. This is because lookup for a vector is essentially an array lookup, which is really just some pointer arithmetic and a pointer dereference. If you don't access your vector often you might be able to squeeze out a few clock cycles by using a custom allocator (see boost pools) and placing your pool close to the stack pointer.
I stopped writing mainly because there are dozens of ways in which you could approach this problem.
At the end of the day, this is probably more of an exercise in teaching you that the implementation of std::vector is likely to be extremely efficient for most compilers. All of these suggestions are essentially micro-optimizations (which are the root of all evil), so please don't blindly apply these in important code, as they're highly likely to end up costing you a lot of time and headache.
However, that's not to say you shouldn't tinker and learn for yourself, so by all means go ahead and try to beat it for your application and let us know how you go! Good luck :)

A Real World Example of Vector vs List showing scenario where each is more efficient than the other

I was actually asked this question at a job interview and I froze. The question gave me food for thought so I thought I'd also ask if any of you can help with examples. Can you please focus on efficiency in your real world examples showing how one is more efficient than the other and vice versa?
Thanks much
Edit:
Thanks everybody for your input. Just to clarify I was asking for real world examples, don't worry about if they are trivial examples or not, any will do.
The common perception, before hearing for the first time that you should prefer std::vector as the default, is that the Big-O results learned in university apply directly. See the references for the Big-O complexities of the two containers' methods.
On paper, std::list (a doubly linked list) should perform better for insertion and deletion (O(1)) in the middle of the container, which is where the big performance gain should manifest.
The problem with this reasoning is that hardware loves contiguous memory: many features have been invented to take advantage of it (prefetching, caches, etc.), and compilers have been optimized to recognize patterns (e.g. memcpy, memmove) and generate the best-performing code (sometimes directly in assembly).
As a consequence, even at large container sizes (e.g. half a million elements, which I believe was the largest size Bjarne tested), std::vector outperforms std::list (and uses far less memory).
In the new standard (C++11) the gap grows even wider, because move semantics make growth in push_back cheaper in some cases.
My recommendation: always use std::vector until you have a reason not to.
More info:
http://isocpp.org/blog/2014/06/stroustrup-lists
http://blog.davidecoppola.com/2014/05/20/cpp-benchmarks-vector-vs-list-vs-deque/
A video for more information: Why you should avoid Linked Lists
He is asking for real examples, so I assume you mean real applications of these data structures.
Vector better than list: any application that requires a grid representation, for example (any computer vision algorithm, for instance), in which the size of the grid is not known in advance (otherwise you would use an array). In this case you need random access to container elements, and vector is O(1) (constant time). Another example is when you have to build a tree (for a tree-based path-planning algorithm, for instance). You do not know the number of nodes in advance, so you just push_back nodes all the time and store in each node the indices of its parents.
List better than vector: when you are going to do a lot of insertions/deletions in the middle of the container (by "middle" I mean anything other than the first/last elements). Imagine you are implementing a video game in which many enemies are appearing and you have to kill them. Every time you kill one, the proper thing to do is to delete that enemy so that it does not consume memory. However, it is very unlikely that you kill enemy 1 first, so you need to remove that enemy from the middle of a list of enemies. If you are implementing insertion-sort-like algorithms, a list can also be very helpful, since you are moving elements within the container all the time.
If you want to access items randomly and efficiently and don't care about the efficiency of insertion and deletion, use vector.
Otherwise, if you expect many insertions and deletions and care little about accessing items, use list.
+1 to NetVipeC's answer. One advantage list has over vector is in terms of iterator invalidation. If you have an iterator to an element in a vector and you modify the vector, the iterator is no longer guaranteed to be valid. For a list, the iterator remains valid (unless you remove the element it was pointing to). From this it should be easy to create examples (artificial, maybe) where list is more efficient than vector.

What is a good data structure with fast insertions+deletions and partial reversing?

I am writing an algorithm in C++, and for that I need a data structure to store a sequence of n data points. That is, I want to store elements v[0],v[1],...,v[n-1]. I do care about the order.
The operations I would like to be fast are:
Sequential access (i.e. access to v[0], then v[1] and so on with the ability to edit the values);
Point relocation, i.e.
{v[0],v[1],...,v[i],v[i+1],...,v[j-1],v[j],v[j+1],...,v[n-1]} -> {v[0],v[1],...,v[i],v[j],v[i+1],...,v[j-1],v[j+1],...,v[n-1]};
Partial reversion, i.e.
{v[0],...,v[i],v[i+1],...,v[j-1],v[j],v[j+1],...,v[n-1]} -> {v[0],...,v[i],v[j],v[j-1],...,v[i+1],v[j+1],...,v[n-1]}.
It seems that I can implement my algorithm using a XOR linked list, which gives the smallest complexity (the operations above become O(1), giving O(n^2) for my algorithm). But I know that the XOR linked list is considered to be "an ugly data structure" ([1], [2]).
What is a good data structure for this, then? More precisely, is there any other commonly used data structure that implements these operations in O(1) time?
It depends on many factors that you have not mentioned.
First, it depends on the size of your elements and the cost of copying them. If your elements are small (about 64 bytes or less) and cheap to copy or move (or even trivial, in the POD-type sense), then std::vector is likely to be a very good choice, even with the "worse" time complexities (by the way, as a tip for the future, don't get too hung up on the pursuit of minimal time complexity; it's only one part of the whole story).
This is especially true if your elements are trivial, because the rotate operation in a vector is going to be very fast (although still O(n)), while the other operations are the same in terms of time complexity compared to a linked-list.
Second, it depends on how often you perform the different operations you mentioned. Sequential access through a linked list is really inefficient, and linked lists are generally not recommended if traversal is something you do often.
If your elements are so small that you consider using a XOR-linked-list (to spare one pointer per element), then it's likely that you shouldn't be using a linked-list at all. And to be honest, I'm not sure the XOR-linked-list is ever really worth it (but I don't want to get into that).
Overall, I would have to say that you should test these three options: linked-list (either std::forward_list or std::list), std::vector, or a vector of pointers (e.g. std::vector< std::unique_ptr<T> >). Another choice might be an unrolled linked-list, but that will only be good within a specific regime of the factors that I mentioned.
And I reiterate, I said test them, because that is really the only way you will know what is best. "Analysis" is only for rules of thumb, no more, as per one of my favorite quotes:
As far as the laws of mathematics refer to reality, they are not certain; as far as they are certain, they do not refer to reality. (Albert Einstein)
I would just use a std::vector<T>, then only look for a potentially more efficient data structure if you find the operations are too expensive in practice.
Sequential access of std::vector<T> is extremely efficient.
The std::rotate algorithm will work for operation 2.
The std::reverse algorithm will work for operation 3.
The mentioned algorithms are in the <algorithm> header.

Efficiency of iterators in unordered_map (C++)

I can't seem to find any information on this, so I turn to stackoverflow. How efficient are the iterators of std::tr1::unordered_map in C++? Especially compared to, say, list iterators. Would it make sense to make a wrapper class that also holds all the keys in a list to allow for efficient iteration (my code does use a lot of iteration over the keys in an unordered_map). For those who will recommend boost, I can't use it (for whatever reasons).
I haven't checked TR1, but N3035 (C++0x draft) says this:
All the categories of iterators require only those functions that are realizable for a given category in constant time (amortized). Therefore, requirement tables for the iterators do not have a complexity column.
The standard isn't going to give an efficiency guarantee other than in terms of complexity, so you have no guaranteed comparison of list and unordered_map other than that they're both amortized constant time (i.e., linear time for a complete iteration over the container).
In practice, I'd expect an unordered_map iterator to be at least in the vicinity of list, unless your hashmap is very sparsely populated. There could be an O(number of buckets) term in the complexity of the complete iteration. But I've never looked at even one implementation specifically of unordered_map for C++, so I don't know what adornments to expect on a simplistic "array of linked lists" hashtable implementation. If you have a "typical" platform test it, if you're trying to write code that will definitely be the fastest possible on all C++ implementations then tough luck, you can't ;-)
The unordered_map iterator basically just has to walk over the internal bucket structure of the hash table. This just means following some pointers, and so should be pretty efficient. Of course, if you are walking an unordered_map a lot, you may be using the wrong data structure in the first place. In any case, the answer to this, as to all performance questions, is to time your specific code and see if it is fast enough.
Unfortunately, you can't say for sure whether something is efficient enough unless you've tried it and measured the results. I can tell you that the standard library, TR1, and Boost classes have had tons of eyes on them. They're probably as fast as they're going to get for most common use cases. Walking a container is certainly a common use case.
With all that said, you need to ask yourself a few questions:
What's the clearest way to say what I want? It may be that writing a wrapper class adds unneeded complexity to your code. Make it correct first, then make it fast.
Can I afford the extra memory and time to maintain a list in parallel with the unordered_map?
Is unordered_map really the right data structure? If your most common use case is traversal from beginning to end, you might be better off with vector because the memory is guaranteed to be contiguous.
Answered by the benchmarks here https://stackoverflow.com/a/25027750/1085128
unordered_map is partway between vector and map for iteration. It is significantly faster than map.