std::vector::insert vs std::list::operator[]

std::vector::insert vs std::list::operator[] - c++

I know that std::list::operator[] is not implemented as it has bad performance. But what about std::vector::insert it is as much inefficient as much std::list::operator[] is. What is the explanation behind?

std::vector::insert is implemented because std::vector has to meet requirements of SequenceContainer concept, while operator[] is not required by any concepts (that I know of), possible that will be added in ContiguousContainer concept in c++17. So operator[] added to containers that can be used like arrays, while insert is required by interface specification, so containers that meet certain concept can be used in generic algorithms.

I think Slava's answer does the best job, but I'd like to offer a supplementary explanation. Generally with data structures, it's far more common to have more accesses than insertions, than vice versa. There are many data structures that may try to optimize access at the cost of insertion, but the inverse is much more rare, because it tends to be less useful in real life.
operator[], if implemented for a linked list, would access an existing value presumably (similar to how it does for vector). insert adds new values. It's much more likely you will be willing to take a big performance hit to insert some new elements into a vector, provided that the subsequent accesses are extremely fast. In many cases, element insertion may be outside of the critical path entirely whereas the critical path consists of a single traversal, or random access of already-present data. So it's simply convenient to have insert take care of the details for you in that case (it's actually a bit annoying to write efficiently and correctly). This is actually a not-uncommon use of a vector.
On the other hand, using operator[] on a linked list would almost always be a sign that you are using the wrong data structure.

std::list::operator[] would require an O(N) traversal and is not really in accordance with what a list is designed to do. If you need operator[] then use a different container type. When C++ folk see a [] they assume an O(1) (or, at worse, an O(Log N)) operation. Supplying [] for a list would break that.
But although std::vector::insert is also O(N), it can be optimised: an at-end insertion can be readily optimised by having the vector's capacity grow in large chunks. An insertion in the middle requires an element-by-element move, but that again can be performed very quickly on modern chipsets.

The [] operator is inherited from plain arrays. It has always been understood as a fast (sub linear time) accessor of the underlying container. Since list is does not support sub linear time access, it does not make sense for it to implement the operator.
auto a = Container [10]; // Ideally I can assume this is quick
The equivalent of your std::list <>::operator [] is std::next <std::list<>::iterator>. Documented at cpp-reference.
auto a = *std::next (Container.begin (), 10); // This may take a little while
This is the truly generic way index a container. If the container supports random access, it will be constant time, other wise it will be linear.

Related

Iterating through a sequence while modifying it. Use vector or List ? C++/ STL

Suppose I have a long sequence of unordered elements S s1, s2, s3,.... of a arbitrary but fixed data type through which I wish to iterate and delete certain elements according to some boolean criterion.
Now if after iterating through the sequence if I am not interested in the final ordering of the sequence then I can store my sequence in 2 ways
Use a plain ol' std::list to represent the sequence. Perform removal with the std::list methods.
Use a std::vector to represent the sequence. If a certain element fails the criterion and has to be deleted swap it with the last vector element and perform a pop_back.
My questions are
1.Which would be a better/efficient way timewise and/or memorywise to store my sequence?
2.If I had to venture a guess, then I would say list, because if si 's data-type memory size is large, swapping would be expensive. Would this reasoning be correct?

In practice, std::vector has a great performance advantage over other containers due to its tight memory locality. If your elements are moreover movable (i.e. inexpensive to swap), then your second option should be your first try. Implement it with the standard remove/erase idiom:
v.erase(std::remove_if(v.begin(), v.end(), predicate), v.end());
You should also set up a second version with a std::list and compare the performance:
l.remove_if(predicate);
The list avoids moving any elements around, so in theory it could be efficient, however the practical effects of memory locality cannot be captured by the language standard and you cannot get around measuring and comparing the actual performance.
(Supposedly, if your element type is huge, like sizeof(T) > 10000, the list will probably start being faster than the vector. Test and compare, and keep your code modular such that changing this later is easy.)

If you have a C++11 compiler, or atleast an rvalue reference aware one, using swap will cost you nothing if your data type isn't flat (i.e., contains pointers to external resources or in other words, is expensive to copy) since it will just move your structs around. So if you have such a compiler, create a move constructor (read up on that) for the data type, and you're set. Just use a std::vector from there on.
Now if your structs are flat (no external resources), and are large, you might really want to use a std::list, since the memory overhead would be reasonably small in comparision to your data type's size. Since you only seem to be interested in bidirectional/sequential access to the elements, this might be just the right place to use a list.
A last point, and an important factor, measure. The default container to reach for should always be std::vector. Measure how both perform before blindly deciding on one. Another important factor is if you actually need to do anything else with the containers, like random-access or such stuff.
Edit: Before I forget, you might also just want to create a view over the container holding your data, which might be very cheap.

We can only guess. I'd say that if the objects are easily copiable (e.g. basic types) then std::vector will be more efficient, as removing elements will not alloc/realloc/free any memory. But if the cost of copying elements is significant, then the std::list will be better.
But note that with C++11, the copy will be converted into a move, so you should consider the moving cost, that will be presumably quite less than the copy.

In almost all practical cases, use std::vector initially. As always, write your code first then optimise later, if and when it is needed. If your profiler indicates that vector's inefficiencies are the cause, then try a list. I've almost never seen a performance benefit from it though.

why Vector doesn't provide the remove() member function while list provides?

If I want to delete all the elements with a value from vector,I call remove algorithm and then call vector's erase member function to physically delete it.
But in the case of list , simple call remove member function and it will delete all elements with that value.
I am not sure why vector does't provide the remove MF while list does it.
For Exp: I want to delete value '4' from vector v.
vector<int> v;
vector<int> ::iterator Itr;
for (int i=0; i< 6; i++)
v.push_back(i*2);
v.push_back(4);
v.push_back(8);
v.push_back(4);
v.erase(remove(v.begin(),v.end(),4), v.end());
and for list:
list.remove(4); // will delete all the element which has value 4

The question is not why std::vector does not offer the operation, but rather why does std::list offer it. The design of the STL is focused on the separation of the containers and the algorithms by means of iterators, and in all cases where an algorithm can be implemented efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can be implemented much more efficiently with knowledge of the container. That is the case of removing elements from a container. The cost of using the remove-erase idiom is linear in the size of the container (which cannot be reduced much), but that hides the fact that in the worst case all but one of those operations are copies of the objects (the only element that matches is the first), and those copies can represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity of the operation will still be linear, but the associated cost for each one of the elements removed is very low, a couple of pointer copies and releasing of a node in memory. At the same time, the implementation as part of the list can offer stronger guarantees: pointers, references and iterators to elements that are not erased do not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific container is std::list::sort, that uses mergesort that is less efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with iterators unless there is a strong reason to provide a particular implementation in a concrete container.

I believe the rationale is that std::list offers a very efficient remove method (if implemented as a doubly linked listed it just adjusts the pointers to the element and deallocates its storage), while element removal for std::vector is comparably slow.
The remove+erase trick is the standard way which works for all container types that offer the required iterator type. Presumably, the designers of the STL wanted to point out this difference. They could have opted to give all containers a remove method, which would be remove+erase for all containers except those who knew a better way, but they didn't.
This seems to violate the idea of generic code at a first glance, but I don't think it really does. Having a simple, generic remove that is easily accessible makes it easy to write generic code that compiles fine with all container types, but at the end generic code that would run extremely slow in 9 out of 10 cases and blazingly fast in the tenth is not truly generic, it just looks so.
The same pattern can be found at std::map::find vs. the generic std::find.

Removing an element from a vector is much slower than doing so for a list: it is (on average) proportional to the size of the vector, whereas the operation on a list is executed in constant time.
The designers of the standard library decided not to include this feature under the principle of "things that look easy should BE (computationally) easy".
I'm not sure whether I agree with this philosophy, but it's considered a design goal for C++.

Because dropping item from a list is cheap, while doing so on a vector is expensive - all following elements have to be shifted, i.e. copied/moved.

I imagine its due to efficiency, its slower to remove random elements from a vector than it is from a list. It might mean a little more typing for situations like this, but at least its obvious in the interface that the std::vector isn't the best data structure if you need to do this often.

Efficiency of iterators in unordered_map (C++)

I can't seem to find any information on this, so I turn to stackoverflow. How efficient are the iterators of std::tr1::unordered_map in C++? Especially compared to, say, list iterators. Would it make sense to make a wrapper class that also holds all the keys in a list to allow for efficient iteration (my code does use a lot of iteration over the keys in an unordered_map). For those who will recommend boost, I can't use it (for whatever reasons).

I haven't checked TR1, but N3035 (C++0x draft) says this:
All the categories of iterators
require only those functions that are
realizable for a given category in
constant time (amortized). Therefore,
requirement tables for the iterators
do not have a complexity column.
The standard isn't going to give an efficiency guarantee other than in terms of complexity, so you have no guaranteed comparison of list and unordered_map other than that they're both amortized constant time (i.e., linear time for a complete iteration over the container).
In practice, I'd expect an unordered_map iterator to be at least in the vicinity of list, unless your hashmap is very sparsely populated. There could be an O(number of buckets) term in the complexity of the complete iteration. But I've never looked at even one implementation specifically of unordered_map for C++, so I don't know what adornments to expect on a simplistic "array of linked lists" hashtable implementation. If you have a "typical" platform test it, if you're trying to write code that will definitely be the fastest possible on all C++ implementations then tough luck, you can't ;-)

The unordered_map iterator basically just has to walk over the internal tree structure of the hashtable. This just means doing some pointer following, and so should be pretty efficient. Of course, if you are walking an unordered_map a lot, you may be using the wrong data structure in the first place. In any case, the answer to this, as it is for all performance questions, is for you to time your specific code to see if it is fast enough.

Unfortunately, you can't say for sure if something is efficient enough unless you've tried it and measured the results. I can tell you that the standard library, TR1, and Boost classes have had tons of eyes on them. They're probably as fast as they're going to get for most common use cases. Walking an container is certainly a common use case.
With all that said, you need to ask yourself a few questions:
What's the clearest way to say what I want? It may be that writing a wrapper class adds unneeded complexity to your code. Make it correct first, then make it fast.
Can I afford the extra memory and time to maintain a list in parallel with the unordered_map?
Is unordered_map really the right data structure? If your most common use case is traversal from beginning to end, you might be better off with vector because the memory is guaranteed to be contiguous.

Answered by the benchmarks here https://stackoverflow.com/a/25027750/1085128
unordered_map is partway between vector and map for iteration. It is significantly faster than map.

sort() function in C++

I found two forms of sort() in C++:
1) sort(begin, end);
2) XXX.sort();
One can be used directly without an object, and one is working with an object.
Is that all? What's the differences between these two sort()? They are from the same library or not? Is the second a method of XXX?
Can I use it like this
vector<int> myvector
myvector.sort();
or
list<int> mylist;
mylist.sort();

std::sort is a function template that works with any pair of random-access iterators. Consequently, the algorithm implemented by std::sort is tailored (optimized) for random access. Most of the time it is going to be some flavor of quick-sort.
std::list::sort is a dedicated list-oriented version of sort. std::list is a container that does not support [efficient] random access, which means that you can't use std::sort with it. This creates the need for a dedicated sorting algorithm. Most of the time it will be implemented as some flavor of merge-sort.

std::list includes a sort() member, because there are ways to do sorting efficiently that work for lists, but almost nothing else. If you tried to use the same sorting implementation for a list as everything else, one of them wouldn't work very well, so they provide a special one for list.
Most of the time, you should use std::sort, and forget that std::list even exists. It's useful in a few very specific circumstances, but they're sufficiently rare that using it is a mistake at least 90% of the time that people think they should.
Edit: The situation is not that using std::list::sort is a mistake, but that using std::list is usually a mistake. There are a couple of reasons for this. First of all, std::list requires a couple of pointers in addition to space for each item you store, so unless your data is fairly large, it can impose a substantial space overhead. Second, since each node is allocated individually, traversing nodes tends to be fairly slow (locality of reference is low, so cache utilization tends to be poor). Finally, although you can insert or delete an element from the middle of a list with constant complexity, you (normally) need to use a linear-complexity traversal of the list to find the right point at which to insert or delete that item. Couple that with the poor cache utilization, and you get insertions and deletions in the middle of a linked list usually being slower than with something like a vector or deque.
The other 10% of the time (or maybe somewhat less than that) is when you can reasonably store a position in the list in the long term, and (at least typically) plan on making a number of insertions and deletions at (close to) that same point, without traversing the linked list each time to find the right place. If you make enough modifications to the list at (or close to) the same place, you can amortize the list traversal over a number of insertions and deletions. If you make enough modifications at (close to) the same place, you can come out ahead.
That can work out well in a few cases, but forces you to store more "state" data for a long period of time, mostly to make up for the deficiencies in the data structure itself. To the extent possible, I prefer to make individual operations close to pure and atomic, so they minimize dependence (and effect) on long-term data. In the single-threaded case, this mostly helped modularity, but with multi-core, multithreaded (etc.) programming, it can also have a substantial effect on efficiency and difficulty of development. Access to that long-term data generally requires thread synchronization.
That's not to say that avoiding list will automatically make your programs lots faster, or anything like that -- just that I find situations where it's really useful to be quite rare, and a lot of situations that seem to fit what people have been told about when a linked list will be useful, really aren't.

std::sort only works with random-access iterators. That limits it to arrays, vector and deque (and user-defined types which provide random access).
std::list is a linked list which doesn't support random access. The generic sort function doesn't work for it. And even if it did, it would be suboptimal. A linked list is not sorted by copying data from node to node - it is sorted by relinking the nodes.
std::sort also wouldn't work for other containers (sets and maps), but those are always sorted anyway.

Benefit of slist over vector?

What I need is just a dynamically growing array. I don't need random access, and I always insert to the end and read it from the beginning to the end.
slist seems to be the first choice, because it provides just enough of what I need. However, I can't tell what benefit I get by using slist instead of vector. Besides, several materials I read about STL say, "vectors are generally the most efficient in time for accessing elements and to add or remove elements from the end of the sequence". Therefore, my question is: for my needs, is slist really a better choice than vector? Thanks in advance.

For starters, slist is non-standard.
For your choice, a linked list will be slower than a vector, count on it. There are two reasons contributing to this:
First and foremost, cache locality; vectors store their elements linearly in the RAM which facilitates caching and pre-fetching.
Secondly, appending to a linked list involves dynamic allocations which add a large overhead. By contrast, vectors most of the time do not need to allocate memory.
However, a std::deque will probably be even faster. In-depth performance analysis has shown that, despite bias to the contrary, std::deque is almost always superior to std::vector in performance (if no random access is needed), due to its improved (chunked) memory allocation strategy.

Yes, if you are always reading beginning to end, slist (a linked list) sounds like the way to go. The possible exception is if you will be inserting a large quantity of elements at the end at the same time. Then, the vector could be better if you use reserve appropriately.
Of course, profile to be sure it's better for your application.

Matt Austern (author of "Generic Programming and the STL" and general C++ guru) is a strong advocate of singly-linked lists for inclusion in the forthcoming C++ standard; see his presentation at http://www.accu-usa.org/Slides/SinglyLinkedLists.ppt and his long article at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2543.htm for more details, including a discussion of the trade-offs involved that may guide you in possibly choosing this data structure. (Note that the currently proposed name is forward_list, though slist is how it was traditionally named in SGI's STL & other popular libraries).

I'll second (or maybe third...) the opinion that std::vector or std::deque will do the job. The only thing that I will add is a few additional factors that should guide the decision between std::vector<T> and std::list<T>. These have a lot to do with the characteristics of T and what algorithms you plan on using.
The first is memory overhead. Std::list is a node-based container so if T is a primitive type or relatively small user-defined type, then the memory overhead of the node-based linkage might be non-negligible - consider that std::list<int> is likely to use at least 3 * sizeof(int) storage for each element whereas std::vector will only use sizeof(int) storage with a small header overhead. Std::deque is similar to std::vector but has a small overhead that is linear to N.
The next issue is the cost of copy construction. If T(T const&) is at all expensive, then steer clear of std::vector<T> since it cause a bunch of copies to occur as the size of the vector grows. This is where std::deque<T> is a clear winner and std::list<T> is also a contender.
The final issue that usually guides the decision on container type is whether your algorithms can work with the iterator invalidation constraints of std::vector and std::deque. If you will be manipulating the container elements a lot (e.g., sorting, inserting in the middle, or shuffling), then you might want to lean towards std::list since manipulating the order requires little more than resetting a few linkage pointers.

I'm guessing you mean std::list by "slist". Vectors are good when you need fast, random-access to a sequence of elements, with guaranteed contiguous memory, and fast sequential reading (IOW, from the beginning to the end). Lists are good when you need fast (constant-time) insertion or deletion of items at the beginning or end of the sequence, but don't care about the performance of random-access or sequential reading.
The reason for the difference is the way the 2 are implemented. Vectors are implemented internally as an array of items, which needs to be reallocated when its size/capacity is reached on adding an item. Lists are implemented as a doubly-linked list, which can cause cache-misses for sequential reading. Random-access for lists also requires scanning from the first (or last) item in the list, until it locates the item you're requesting.

Sounds like a good job for std::deque to me. It has the memory benefits of a vector like contiguous memory allocation for each "slab" (good for CPU caches), no overhead for each element like with std::list and it does not need to reallocate the whole set as std::vector does. Read more about std::deque here

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js