Is inserting in the end equivalent to std::copy()? - c++

Consider the following two ways to append elements to a vector:
std::vector<int> vi1(10,42), vi2;
vi2.insert(vi2.end(),vi1.begin(),vi1.end());
<OR>
std::copy(vi1.begin(),vi1.end(),std::back_inserter(vi2));
std::copy version looks cleaner and I don't have to type vi2 twice. But since it is a generic algorithm while insert is a member function, could insert perform better than std::copy or does it do the same thing?
I can benchmark it myself, but I would have to do it for every vector and every template type. Has anyone already done this?

There are some subtle differences. In the first case (std::vector<>::insert) you are giving a range to the container, so it can calculate the distance and perform a single allocation to grow to the final required size. In the second case (std::copy) that information is not directly present in the interface, and it could potentially cause multiple reallocations of the buffer.
Note that even if multiple reallocations are needed, the amortized cost of insertion must still be constant, so this does not imply an asymptotic cost change, but might matter. Also note that a particularly smart implementation of the library has all the required information to make the second version as efficient by specializing the behavior of std::copy to handle back insert iterators specially (although I have not checked whether any implementation actually does this).
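If you prefer the std::copy form, one way to avoid the repeated reallocations is to reserve the final size yourself before copying. This is only a minimal sketch; the function names append_with_insert and append_with_copy are made up for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Append via the range overload of insert: the vector sees the whole
// range and can size its buffer once.
void append_with_insert(std::vector<int>& dst, const std::vector<int>& src) {
    dst.insert(dst.end(), src.begin(), src.end());
}

// Append via std::copy and back_inserter. Each element goes through
// push_back, so reserve the final size first to avoid repeated growth.
void append_with_copy(std::vector<int>& dst, const std::vector<int>& src) {
    dst.reserve(dst.size() + src.size());
    std::copy(src.begin(), src.end(), std::back_inserter(dst));
}
```

Both functions leave dst with the same contents. Be careful with calling reserve for an exact size repeatedly in a loop, since that can defeat the vector's geometric growth.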

You would think that vector::insert might be able to optimize the case where it's inserting multiple items at once, but it's harder than it looks. What if the iterators are input iterators, for example? Then there's no way of knowing ahead of time how many insertions you'll do. In that case the code for insert likely just does multiple push_backs, the same as back_inserter.

vector::insert will probably perform better in most cases on most mainstream implementations of the C++ standard library. The reason is that the vector object has internal knowledge of the currently allocated memory buffer, and can pre-allocate enough memory to perform the entire insertion since the number of elements can be computed in advance with random-access iterators. However, std::copy along with std::back_inserter will keep calling vector::push_back, which may trigger multiple allocations.
The GNU implementation of std::vector::insert in libstdc++, for example, pre-allocates a buffer in advance if the iterators are at least forward iterators (which includes random-access iterators). With input iterators, vector::insert may be equivalent to std::copy, because you can't determine the number of elements in advance.
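If you want to see the difference on your own implementation without a full benchmark, you can count how often capacity() changes. A rough sketch (the helper names are hypothetical, and the exact counts depend on your library's growth policy):

```cpp
#include <cstddef>
#include <vector>

// Number of times capacity() changes while appending n ints one at a
// time, which is what std::copy with std::back_inserter effectively does.
std::size_t growth_steps_push_back(std::size_t n) {
    std::vector<int> v;
    std::size_t steps = 0;
    std::size_t cap = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(42);
        if (v.capacity() != cap) {
            ++steps;
            cap = v.capacity();
        }
    }
    return steps;
}

// Number of times capacity() changes when appending n ints to an empty
// vector with a single range insert.
std::size_t growth_steps_range_insert(std::size_t n) {
    const std::vector<int> src(n, 42);
    std::vector<int> v;
    const std::size_t before = v.capacity();
    v.insert(v.end(), src.begin(), src.end());
    return v.capacity() != before ? 1 : 0;
}
```

With a geometric growth policy the push_back count typically grows with the logarithm of n, while the range insert into an empty vector changes the capacity once.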

It is not equivalent to std::copy. It is equivalent to push_back (in some sense).
Yes, std::back_inserter does the same thing when you use it with std::copy, which can also insert at the front if you use std::front_inserter (though you cannot use that with std::vector). It can also insert at a specified iterator if you use std::inserter instead. So you see, what std::copy does depends on what you pass as the third argument.
Now coming back to the essence of the question. I think you should use insert, as it can perform better, because it may discover the number of elements it is going to insert, so it may allocate that much memory at once (if needed). In your case, it is likely to perform better, because vi1 is a std::vector, which means it is easy to compute the number of elements in O(1) time.
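To illustrate how the third argument changes what std::copy does, here is a small sketch with std::front_inserter and std::inserter. The function names and the starting contents {0, 99} are just for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <vector>

// front_inserter calls push_front, so it needs a container such as
// std::list or std::deque. The copied elements end up in reverse order.
std::list<int> copy_to_front(const std::vector<int>& src) {
    std::list<int> out;
    std::copy(src.begin(), src.end(), std::front_inserter(out));
    return out;
}

// inserter calls insert at the given position and then steps past the
// new element, so the copied elements keep their original order.
std::vector<int> copy_into_middle(const std::vector<int>& src) {
    std::vector<int> out{0, 99};
    std::copy(src.begin(), src.end(), std::inserter(out, out.begin() + 1));
    return out;
}
```

For src = {1, 2, 3}, copy_to_front gives 3, 2, 1 and copy_into_middle gives 0, 1, 2, 3, 99.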

Related

std::vector::insert vs std::list::operator[]

I know that std::list::operator[] is not implemented because it would perform badly. But what about std::vector::insert? It is just as inefficient as std::list::operator[] would be. What is the explanation for the difference?
std::vector::insert is implemented because std::vector has to meet the requirements of the SequenceContainer concept, while operator[] is not required by any concept (that I know of); possibly it will be added in a ContiguousContainer concept in C++17. So operator[] is added to containers that can be used like arrays, while insert is required by the interface specification, so that containers meeting a certain concept can be used in generic algorithms.
I think Slava's answer does the best job, but I'd like to offer a supplementary explanation. Generally with data structures, it's far more common to have more accesses than insertions, than vice versa. There are many data structures that may try to optimize access at the cost of insertion, but the inverse is much more rare, because it tends to be less useful in real life.
operator[], if implemented for a linked list, would access an existing value presumably (similar to how it does for vector). insert adds new values. It's much more likely you will be willing to take a big performance hit to insert some new elements into a vector, provided that the subsequent accesses are extremely fast. In many cases, element insertion may be outside of the critical path entirely whereas the critical path consists of a single traversal, or random access of already-present data. So it's simply convenient to have insert take care of the details for you in that case (it's actually a bit annoying to write efficiently and correctly). This is actually a not-uncommon use of a vector.
On the other hand, using operator[] on a linked list would almost always be a sign that you are using the wrong data structure.
std::list::operator[] would require an O(N) traversal and is not really in accordance with what a list is designed to do. If you need operator[] then use a different container type. When C++ folk see a [] they assume an O(1) (or, at worst, an O(log N)) operation. Supplying [] for a list would break that.
But although std::vector::insert is also O(N), it can be optimised: an at-end insertion can be readily optimised by having the vector's capacity grow in large chunks. An insertion in the middle requires an element-by-element move, but that again can be performed very quickly on modern chipsets.
The [] operator is inherited from plain arrays. It has always been understood as a fast (sub-linear time) accessor of the underlying container. Since list does not support sub-linear time access, it does not make sense for it to implement the operator.
auto a = Container [10]; // Ideally I can assume this is quick
The equivalent of your std::list <>::operator [] is std::next <std::list<>::iterator>. Documented at cpp-reference.
auto a = *std::next (Container.begin (), 10); // This may take a little while
This is the truly generic way to index a container. If the container supports random access, it will be constant time; otherwise it will be linear.
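Wrapped up as a small helper, the std::next approach might look like the sketch below. The name nth is hypothetical, and the caller is assumed to pass an index that is in range.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Generic positional access. std::next is constant time for
// random-access iterators and linear for anything weaker.
// Precondition (assumed, not checked): n < c.size().
template <typename Container>
const typename Container::value_type& nth(const Container& c, std::size_t n) {
    using diff_t = typename std::iterator_traits<
        typename Container::const_iterator>::difference_type;
    return *std::next(c.begin(), static_cast<diff_t>(n));
}
```

The same call works for a std::vector<int> and a std::list<int>; only the cost differs.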

vector vs. list from stl - remove method

std::list has a remove method, while the std::vector doesn't. What is the reason for that?
std::list<>::remove is a physical removal method, which can be implemented by physically destroying list elements that satisfy certain criteria (by physical destruction I mean the end of element's storage duration). Physical removal is only applicable to lists. It cannot be applied to arrays, like std::vector<>. It simply is not possible to physically end storage duration of an individual element of an array. Arrays can only be created and destroyed as a whole. This is why std::vector<> does not have anything similar to std::list<>::remove.
The universal removal method applicable to all modifiable sequences is what one might call logical removal: the target elements are "removed" from the sequence by overwriting their values with values of elements located further down in the sequence. I.e. the sequence is shifted and compacted by copying the persistent data "to the left". Such logical removal is implemented by freestanding functions, like std::remove. Such functions are applicable in equal degree to both std::vector<> and std::list<>.
In cases where the approach based on immediate physical removal of specific elements applies, it will work more efficiently than the generic approach I referred above as logical removal. That is why it was worth providing it specifically for std::list<>.
std::list::remove removes all items in a list that match the provided value.
std::list<int> myList;
// fill it with numbers
myList.remove(10); // physically removes all instances of 10 from the list
It has a similar function, std::list::remove_if, which allows you to specify some other predicate.
std::list::remove (which physically removes the elements) is required to be a member function as it needs to know about the memory structure (that is, it must update the previous and next pointers for each item that needs to be updated, and remove the items), and the entire function is done in linear time (a single iteration of the list can remove all of the requested elements without invalidating any of the iterators pointing to items that remain).
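A tiny sketch of that iterator guarantee, with a made-up function name and made-up values: an iterator taken before the call still refers to the same element afterwards, as long as that element was not one of the removed ones.

```cpp
#include <iterator>
#include <list>

// Reads a surviving element through an iterator obtained before
// std::list::remove was called.
int value_after_list_remove() {
    std::list<int> l{10, 7, 10, 3};
    auto it = std::next(l.begin());  // refers to the 7
    l.remove(10);                    // unlinks and destroys both 10 nodes
    return *it;                      // the 7 node was never touched
}
```

Doing the same with a std::vector and the erase/remove combination would leave the iterator invalid, since the surviving elements are shifted.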
You cannot physically remove a single element from a std::vector. You either reallocate the entire vector, or you move every element after the removed items and adjust the size member. The "cleanest" implementation of that set of operations would be to do
// within some instance of vector
void vector::remove(const T& t)
{
erase(std::remove(begin(), end(), t), end());
}
This would require std::vector to depend on <algorithm> (something that is currently not required), because the compaction step (the "sorting") is what lets the items be removed without multiple allocations and copies. (You do not need to reorder a list to physically remove elements.)
Contrary to what others are saying, it has nothing to do with speed. It has to do with the algorithm needing to know how the data is stored in memory.
As a side note: This is also a similar reason why std::remove (and friends) do not actually remove the items from the container they operate on; they just move all the ones that are not going to be removed to the "front" of the container. Without the knowledge of how to actually remove an object from a container, the generic algorithm cannot actually do the removing.
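To make the two steps visible, here is a sketch that splits the usual one-liner apart. The function name erase_value is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Removes every element equal to value and returns how many were removed.
std::size_t erase_value(std::vector<int>& v, int value) {
    // Step 1: std::remove compacts the kept elements towards the front and
    // returns the new logical end. v.size() is still unchanged at this point.
    auto new_end = std::remove(v.begin(), v.end(), value);
    const std::size_t removed = static_cast<std::size_t>(v.end() - new_end);

    // Step 2: only the container itself can shrink, so erase drops the tail.
    v.erase(new_end, v.end());
    return removed;
}
```

The elements between new_end and end() after step 1 have unspecified values, which is why the erase call is needed.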
Consider the implementation details of both containers. A vector has to provide a contiguous memory block for storage. In order to remove an element at index n != N (with N being the vector's length), all elements from n+1 to N-1 need to be moved. The various functions in the <algorithm> header implement that behavior, like std::remove or std::remove_if. The advantage of these being free-standing functions is that they can work for any type that offers the needed iterators.
A list on the other hand, is implemented as a linked list structure, so:
It's fast to remove an element from anywhere
It's impossible to do it as efficiently using iterators (since the internal structure has to be known and manipulated).
In general in STL the logic is "if it can be done efficiently - then it's a class member. If it's inefficient - then it's an outside function"
This way they make the distinction between "correct" (i.e. "efficient") use of classes vs. "incorrect" (inefficient) use.
For example, random access iterators have a += operator, while other iterators use the std::advance function.
And in this case - removing elements from an std::list is very efficient as you don't need to move the remaining values like you do in std::vector
It's all about efficiency AND reference/pointer/iterator validity. List items can be removed without disturbing any other pointers and iterators. This is not true for a vector and other containers in all but the most trivial cases. Nothing prevents you from using the external strategy, but you have a superior option. That said, this fellow said it better than I could on a duplicate question.
From another poster on a duplicate question:
The question is not why std::vector does not offer the operation, but
rather why does std::list offer it. The design of the STL is focused
on the separation of the containers and the algorithms by means of
iterators, and in all cases where an algorithm can be implemented
efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can
be implemented much more efficiently with knowledge of the container.
That is the case of removing elements from a container. The cost of
using the remove-erase idiom is linear in the size of the container
(which cannot be reduced much), but that hides the fact that in the
worst case all but one of those operations are copies of the objects
(the only element that matches is the first), and those copies can
represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity
of the operation will still be linear, but the associated cost for
each one of the elements removed is very low, a couple of pointer
copies and releasing of a node in memory. At the same time, the
implementation as part of the list can offer stronger guarantees:
pointers, references and iterators to elements that are not erased do
not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific
container is std::list::sort, that uses mergesort that is less
efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with
iterators unless there is a strong reason to provide a particular
implementation in a concrete container.
std::list is designed to work like a linked list. That is, it is designed (you might say optimized) for constant time insertion and removal ... but access is relatively slow (as it typically requires traversing the list).
std::vector is designed for constant-time access, like an array. So it is optimized for random access ... but insertion and removal are really only supposed to be done at the "tail" or "end", elsewhere they're typically going to be much slower.
Different data structures with different purposes ... therefore different operations.
To remove an element from a container you have to find it first. There's a big difference between sorted and unsorted vectors, so in general, it's not possible to implement an efficient remove method for the vector.

Iterating through a sequence while modifying it. Use vector or List ? C++/ STL

Suppose I have a long sequence S of unordered elements s1, s2, s3, ... of an arbitrary but fixed data type, through which I wish to iterate and delete certain elements according to some boolean criterion.
Now if after iterating through the sequence if I am not interested in the final ordering of the sequence then I can store my sequence in 2 ways
Use a plain ol' std::list to represent the sequence. Perform removal with the std::list methods.
Use a std::vector to represent the sequence. If a certain element fails the criterion and has to be deleted swap it with the last vector element and perform a pop_back.
My questions are
1. Which would be the better/more efficient way, timewise and/or memorywise, to store my sequence?
2. If I had to venture a guess, then I would say list, because if si's data type is large in memory, swapping would be expensive. Would this reasoning be correct?
In practice, std::vector has a great performance advantage over other containers due to its tight memory locality. If your elements are moreover movable (i.e. inexpensive to swap), then your second option should be your first try. Implement it with the standard remove/erase idiom:
v.erase(std::remove_if(v.begin(), v.end(), predicate), v.end());
You should also set up a second version with a std::list and compare the performance:
l.remove_if(predicate);
The list avoids moving any elements around, so in theory it could be efficient, however the practical effects of memory locality cannot be captured by the language standard and you cannot get around measuring and comparing the actual performance.
(Supposedly, if your element type is huge, like sizeof(T) > 10000, the list will probably start being faster than the vector. Test and compare, and keep your code modular such that changing this later is easy.)
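For comparison, the swap-with-last idea from the question could be sketched like this. The name unordered_erase_if is made up, and the sketch assumes the element type is move-assignable and that the final order does not matter.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Removes every element satisfying pred without preserving order: the
// last element is moved into the hole, then the vector shrinks by one.
template <typename T, typename Pred>
void unordered_erase_if(std::vector<T>& v, Pred pred) {
    std::size_t i = 0;
    while (i < v.size()) {
        if (pred(v[i])) {
            if (i != v.size() - 1) {
                v[i] = std::move(v.back());  // fill the hole from the back
            }
            v.pop_back();
            // Do not advance: slot i now holds an unchecked element.
        } else {
            ++i;
        }
    }
}
```

Unlike remove/erase, this moves at most one element per removal, but it scrambles the order, so measure both against your real data.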
If you have a C++11 compiler, or at least an rvalue-reference-aware one, using swap will cost you nothing if your data type isn't flat (i.e., contains pointers to external resources or in other words, is expensive to copy) since it will just move your structs around. So if you have such a compiler, create a move constructor (read up on that) for the data type, and you're set. Just use a std::vector from there on.
Now if your structs are flat (no external resources), and are large, you might really want to use a std::list, since the memory overhead would be reasonably small in comparison to your data type's size. Since you only seem to be interested in bidirectional/sequential access to the elements, this might be just the right place to use a list.
A last point, and an important factor, measure. The default container to reach for should always be std::vector. Measure how both perform before blindly deciding on one. Another important factor is if you actually need to do anything else with the containers, like random-access or such stuff.
Edit: Before I forget, you might also just want to create a view over the container holding your data, which might be very cheap.
We can only guess. I'd say that if the objects are easily copiable (e.g. basic types) then std::vector will be more efficient, as removing elements will not alloc/realloc/free any memory. But if the cost of copying elements is significant, then the std::list will be better.
But note that with C++11, the copy will be converted into a move, so you should consider the moving cost, that will be presumably quite less than the copy.
In almost all practical cases, use std::vector initially. As always, write your code first then optimise later, if and when it is needed. If your profiler indicates that vector's inefficiencies are the cause, then try a list. I've almost never seen a performance benefit from it though.

Fastest way to convert from vector of pairs to two independent vectors in C++

Let's say I have a vector of pair<int,int>. Now I want to extract the pair.first and pair.second values as independent vectors. I can iterate over the vector and do that, but is there a better/faster way?
In C++11, if you don't need the old vector anymore, you could perhaps get a tiny bit of extra efficiency from move semantics:
for (auto it = std::make_move_iterator(v.begin()),
end = std::make_move_iterator(v.end()); it != end; ++it)
{
v1.push_back(std::move(it->first));
v2.push_back(std::move(it->second));
}
Other than that, you certainly cannot do better than one loop. You will have to touch every element at least once, so this is as efficient as it gets.
Note that moving can only make a difference if the element types themselves have move semantics that are better than copying. This is not the case for ints, or any PODs. But it can't hurt to write your code generically so that you can take advantage of this in future situations.
If copying/moving is a problem, though, you should consider whether some view adapter for your original vector might be a better approach.
No there is not. The one thing to take care of is to use reserve on the two resulting vectors to prevent unnecessary reallocations.
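A minimal sketch of the loop with reserve, assuming the pair<int,int> element type from the question (the function name split_pairs is made up):

```cpp
#include <utility>
#include <vector>

// Splits a vector of pairs into two vectors in a single pass.
void split_pairs(const std::vector<std::pair<int, int>>& v,
                 std::vector<int>& firsts,
                 std::vector<int>& seconds) {
    firsts.clear();
    seconds.clear();
    // Reserve up front so the push_back calls below never reallocate.
    firsts.reserve(v.size());
    seconds.reserve(v.size());
    for (const auto& p : v) {
        firsts.push_back(p.first);
        seconds.push_back(p.second);
    }
}
```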
You won't be able to avoid the iteration. As to the fastest solution,
it depends on what is in the pair, and on the actual implementation.
Depending on these, it may be better to create the target vectors with
the correct size, and assign to them; or to create them empty, and use
reserve and then push_back. You might also want to compare indexing
with using iterators; if you're using pre-sized vectors, using only one
control variable instead of three might be an improvement. (With g++,
the last time I measured, creating vectors of the correct size and
assigning was faster than using reserve and push_back, at least for
double. Despite the fact that it meant looping twice internally, and
initializing the values to 0.0.)
You might also want to try creating functional objects to extract the
first and second elements of the pair (supposing you don't have them
already), and use two calls to transform. Again, either with a
pre-sized vector or using a back inserter as target. Offhand, I
wouldn't expect this to provide better performance, but you never know.
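The two-calls-to-transform variant with pre-sized targets might look roughly like this. The function names are hypothetical, and lambdas stand in for the functional objects; whether it beats the plain loop is something you would have to measure.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Extracts all .first members into a vector created at its final size.
std::vector<int> extract_first(const std::vector<std::pair<int, int>>& v) {
    std::vector<int> out(v.size());  // value-initialized, then overwritten
    std::transform(v.begin(), v.end(), out.begin(),
                   [](const std::pair<int, int>& p) { return p.first; });
    return out;
}

// Extracts all .second members the same way.
std::vector<int> extract_second(const std::vector<std::pair<int, int>>& v) {
    std::vector<int> out(v.size());
    std::transform(v.begin(), v.end(), out.begin(),
                   [](const std::pair<int, int>& p) { return p.second; });
    return out;
}
```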
You have to iterate over the vector anyway, so in terms of Complexity, this is as good as you can get.

why Vector doesn't provide the remove() member function while list provides?

If I want to delete all the elements with a given value from a vector, I call the remove algorithm and then call the vector's erase member function to physically delete them.
But in the case of a list, I simply call the remove member function and it deletes all elements with that value.
I am not sure why vector doesn't provide the remove member function while list does.
For example, I want to delete the value 4 from vector v:
vector<int> v;
vector<int> ::iterator Itr;
for (int i=0; i< 6; i++)
v.push_back(i*2);
v.push_back(4);
v.push_back(8);
v.push_back(4);
v.erase(remove(v.begin(),v.end(),4), v.end());
and for list:
list.remove(4); // deletes all elements with the value 4
The question is not why std::vector does not offer the operation, but rather why does std::list offer it. The design of the STL is focused on the separation of the containers and the algorithms by means of iterators, and in all cases where an algorithm can be implemented efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can be implemented much more efficiently with knowledge of the container. That is the case of removing elements from a container. The cost of using the remove-erase idiom is linear in the size of the container (which cannot be reduced much), but that hides the fact that in the worst case all but one of those operations are copies of the objects (the only element that matches is the first), and those copies can represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity of the operation will still be linear, but the associated cost for each one of the elements removed is very low, a couple of pointer copies and releasing of a node in memory. At the same time, the implementation as part of the list can offer stronger guarantees: pointers, references and iterators to elements that are not erased do not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific container is std::list::sort, that uses mergesort that is less efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with iterators unless there is a strong reason to provide a particular implementation in a concrete container.
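The sort example can be sketched as two overloads (the name sorted_copy is made up). Passing list iterators to std::sort does not compile, because std::sort requires random-access iterators.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// std::vector provides random-access iterators, so the free algorithm works.
std::vector<int> sorted_copy(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// std::list only provides bidirectional iterators, so it offers its own
// member sort, which relinks nodes instead of moving values.
std::list<int> sorted_copy(std::list<int> l) {
    l.sort();
    return l;
}
```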
I believe the rationale is that std::list offers a very efficient remove method (if implemented as a doubly linked list, it just adjusts the pointers to the element and deallocates its storage), while element removal for std::vector is comparably slow.
The remove+erase trick is the standard way which works for all container types that offer the required iterator type. Presumably, the designers of the STL wanted to point out this difference. They could have opted to give all containers a remove method, which would be remove+erase for all containers except those who knew a better way, but they didn't.
This seems to violate the idea of generic code at a first glance, but I don't think it really does. Having a simple, generic remove that is easily accessible makes it easy to write generic code that compiles fine with all container types, but at the end generic code that would run extremely slow in 9 out of 10 cases and blazingly fast in the tenth is not truly generic, it just looks so.
The same pattern can be found at std::map::find vs. the generic std::find.
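A sketch of that pattern for a std::map<int, std::string> (the key and value types and the function names are just assumptions for the example). The generic version uses std::find_if because std::find on a map would compare whole key/value pairs.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>

// Generic algorithm: walks the elements one by one, linear time.
bool has_key_generic(const std::map<int, std::string>& m, int key) {
    return std::find_if(m.begin(), m.end(),
                        [key](const std::pair<const int, std::string>& kv) {
                            return kv.first == key;
                        }) != m.end();
}

// Member function: uses the map's ordering, logarithmic time.
bool has_key_member(const std::map<int, std::string>& m, int key) {
    return m.find(key) != m.end();
}
```

Both return the same answer; only the member version can exploit how the container stores its data.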
Removing an element from a vector is much slower than doing so for a list: it is (on average) proportional to the size of the vector, whereas the operation on a list is executed in constant time.
The designers of the standard library decided not to include this feature under the principle of "things that look easy should BE (computationally) easy".
I'm not sure whether I agree with this philosophy, but it's considered a design goal for C++.
Because dropping item from a list is cheap, while doing so on a vector is expensive - all following elements have to be shifted, i.e. copied/moved.
I imagine it's due to efficiency: it's slower to remove random elements from a vector than it is from a list. It might mean a little more typing for situations like this, but at least it's obvious in the interface that std::vector isn't the best data structure if you need to do this often.