What is the efficient way to split a vector - c++

Is there any other constant time way to split a vector other than using the following.
std::vector<int> v_SplitVector(start , end);
This would take a complexity of O(N). In this case O(end - start). Is there a constant time operation to do this.
OR am I using the wrong container for the task?..

The act of "splitting" a container, for container like vectors, where elements sits on contiguous memory, require necessarily a copy / move of everything needs to go on the other side.
Container like list, that have elements each on its own memory block can be easily rearranged (see std::list::splice)
But having elements in non contiguous memory may result in lower memory access performance due to more frequent cache missing.
In other words, the complexity of the algorithm may be not the only factor influencing performance: an infrequent linear copy may damage you less than a frequent linear walk on dispersed elements.
The trade-off mostly depends on how the hardware manage caches and how the std implementation you are using takes care of that (and how the compiler can eventually optimize)

This is a copy rather than a split, hence the complexity. You can probably write a split for list which might perform better.

std::vector doesn't support the following, but if an efficient "split" operation is very important to you then you could perhaps write your own container. This would be quite a lot of work.
You could define "split" as follows:
removes an initial segment of the container, and returns a new container containing those elements. References to those elements continue to refer to the same elements in the new container. The old container contains the remaining elements. The capacity of the new container is equal to its size, and the capacity of the old container is reduced by the number of elements removed.
Then the old container and the new container would share a block of underlying storage (presumably with ref-counting). The new container would have to reallocate if you append to it (since the memory immediately at the end of its elements is in use), but so long as that happens rarely or never it could be a win.
Your example code takes a copy, though, it doesn't modify the original container. If a logical copy is a requirement then to do it without actually copying the elements you need either COW or immutable objects.
std::list has a splice() function that can move a range of elements from one list to another. This avoids copying the elements, but as of C++11 it is in effect guaranteed not to be O(1), because it needs to count how many elements it has moved. In C++03 implementations could choose whether they wanted this op to be O(1) or list::size() to be O(1), but in C++11 size() is required to be constant time for all containers.
Comparing the performance of std::vector with std::list is usually about more than just one operation, though. You have to consider that list doesn't have random-access iterators, and so on.

Creating a new std::vector necessarily requires copying, since
vectors aren't allowed to share parts of their implementation.
A modification in the container from which you obtained start
and end shouldn't affect the values in splitVector.
What you can do, fairly easily, is create a View container,
which simply holds the two iterators, and maps all accesses
through them. Something along the lines of:
template <typename Iterator>
class View
{
Iterator myBegin;
Iterator myEnd;
public:
typedef typename std::iterator_traits<Iterator>::value_type value_type;
// And the other usual typedefs, including...
typedef Iterator iterator;
View( Iterator begin, Iterator end )
: myBegin( begin )
, myEnd( end )
{
}
iterator begin() { return myBegin; }
iterator end() { return myEnd; }
value_type operator[]( ptrdiff_t index ) { return *(myBegin + index ); }
// ...
};
This requires a fair amount of boilerplate, because the
interface to something like vector is rather complete, but it's
all very straight forward and simple. The one thing you cannot
do with this, however, is modify the topology of either the
underlying container or of any View—anything which might
invalidate any iterators will of course, wreck havoc.

When adding or removing elements to/from a place different than start/end, the vector must have complexity of at least o(n) due to internal shifts required. The sme follows when you want to not only remove, but move the elements out: for a vector, they must be copied, hence, at least 1 op per element moved. That means that moving elements out of a vector is at least O(N) where N is the amount of elements moved.
If you need near-constant time add/remove operations (be it adding/inserting one, or many elements) you should look at list/linkedlist containers, where all elements and sublists are easily 'detachable', especially if you know the pointer/iterator. Or trees, or any other dynamic structure.
completely by the way, I sense what v_SplitVector does, but where did it came from? I do not remember such function/method in stdlib or boost?

Related

vector vs. list from stl - remove method

std::list has a remove method, while the std::vector doesn't. What is the reason for that?
std::list<>::remove is a physical removal method, which can be implemented by physically destroying list elements that satisfy certain criteria (by physical destruction I mean the end of element's storage duration). Physical removal is only applicable to lists. It cannot be applied to arrays, like std::vector<>. It simply is not possible to physically end storage duration of an individual element of an array. Arrays can only be created and destroyed as a whole. This is why std::vector<> does not have anything similar to std::list<>::remove.
The universal removal method applicable to all modifiable sequences is what one might call logical removal: the target elements are "removed" from the sequence by overwriting their values with values of elements located further down in the sequence. I.e. the sequence is shifted and compacted by copying the persistent data "to the left". Such logical removal is implemented by freestanding functions, like std::remove. Such functions are applicable in equal degree to both std::vector<> and std::list<>.
In cases where the approach based on immediate physical removal of specific elements applies, it will work more efficiently than the generic approach I referred above as logical removal. That is why it was worth providing it specifically for std::list<>.
std::list::remove removes all items in a list that match the provided value.
std::list<int> myList;
// fill it with numbers
myList.remove(10); // physically removes all instances of 10 from the list
It has a similar function, std::list::remove_if, which allows you to specify some other predicate.
std::list::remove (which physically removes the elements) is required to be a member function as it needs to know about the memory structure (that is, it must update the previous and next pointers for each item that needs to be updated, and remove the items), and the entire function is done in linear time (a single iteration of the list can remove all of the requested elements without invalidating any of the iterators pointing to items that remain).
You cannot physically remove a single element from a std::vector. You either reallocate the entire vector, or you move every element after the removed items and adjust the size member. The "cleanest" implementation of that set of operations would be to do
// within some instance of vector
void vector::remove(const T& t)
{
erase(std::remove(t), end());
}
Which would require std::vector to depend on <algorithm> (something that is currently not required).
As the "sorting" is needed to remove the items without multiple allocations and copies being required. (You do not need to sort a list to physically remove elements).
Contrary to what others are saying, it has nothing to do with speed. It has to do with the algorithm needing to know how the data is stored in memory.
As a side note: This is also a similar reason why std::remove (and friends) do not actually remove the items from the container they operate on; they just move all the ones that are not going to be removed to the "front" of the container. Without the knowledge of how to actually remove an object from a container, the generic algorithm cannot actually do the removing.
Consider the implementation details of both containers. A vector has to provide a continuous memory block for storage. In order to remove an element at index n != N (with N being the vector's length), all elements from n+1 to N-1 need to be moved. The various functions in the <algorithm> header implement that behavior, like std::remove or std::remove_if. The advantage of these being free-standing functions is that they can work for any type that offers the needed iterators.
A list on the other hand, is implemented as a linked list structure, so:
It's fast to remove an element from anywhere
It's impossible to do it as efficiently using iterators (since the internal structure has to be known and manipulated).
In general in STL the logic is "if it can be done efficiently - then it's a class member. If it's inefficient - then it's an outside function"
This way they make the distinction between "correct" (i.e. "efficient") use of classes vs. "incorrect" (inefficient) use.
For example, random access iterators have a += operator, while other iterators use the std::advance function.
And in this case - removing elements from an std::list is very efficient as you don't need to move the remaining values like you do in std::vector
It's all about efficiency AND reference/pointer/iterator validity. list items can be removed without disturbing any other pointers and iterators. This is not true for a vector and other containers in all but the most trivial cases. Nothing prevents use the external strategy, but you have a superior options.. That said this fellow said it better than I could on a duplicate question
From another poster on a duplicate question:
The question is not why std::vector does not offer the operation, but
rather why does std::list offer it. The design of the STL is focused
on the separation of the containers and the algorithms by means of
iterators, and in all cases where an algorithm can be implemented
efficiently in terms of iterators, that is the option.
There are, however, cases where there are specific operations that can
be implemented much more efficiently with knowledge of the container.
That is the case of removing elements from a container. The cost of
using the remove-erase idiom is linear in the size of the container
(which cannot be reduced much), but that hides the fact that in the
worst case all but one of those operations are copies of the objects
(the only element that matches is the first), and those copies can
represent quite a big hidden cost.
By implementing the operation as a method in std::list the complexity
of the operation will still be linear, but the associated cost for
each one of the elements removed is very low, a couple of pointer
copies and releasing of a node in memory. At the same time, the
implementation as part of the list can offer stronger guarantees:
pointers, references and iterators to elements that are not erased do
not become invalidated in the operation.
Another example of an algorithm that is implemented in the specific
container is std::list::sort, that uses mergesort that is less
efficient than std::sort but does not require random-access iterators.
So basically, algorithms are implemented as free functions with
iterators unless there is a strong reason to provide a particular
implementation in a concrete container.
std::list is designed to work like a linked list. That is, it is designed (you might say optimized) for constant time insertion and removal ... but access is relatively slow (as it typically requires traversing the list).
std::vector is designed for constant-time access, like an array. So it is optimized for random access ... but insertion and removal are really only supposed to be done at the "tail" or "end", elsewhere they're typically going to be much slower.
Different data structures with different purposes ... therefore different operations.
To remove an element from a container you have to find it first. There's a big difference between sorted and unsorted vectors, so in general, it's not possible to implement an efficient remove method for the vector.

Difference between std::remove and erase for vector?

I have a doubt that I would like to clarify in my head. I am aware of the different behavior for std::vector between erase and std::remove where the first physically removes an element from the vector, reducing size, and the other just moves an element leaving the capacity the same.
Is this just for efficiency reasons? By using erase, all elements in a std::vector will be shifted by 1, causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy, that difference might matter, right?
Is this just for efficiency reason? By using erase all elements in a std::vector will be shifted by 1 causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy that difference mihgt matter, right?
The reason for using this idiom is exactly that. There is a benefit in performance, but not in the case of a single erasure. Where it does matter is if you need to remove multiple elements from the vector. In this case, the std::remove will copy each not removed element only once to its final location, while the vector::erase approach would move all of the elements from the position to the end multiple times. Consider:
std::vector<int> v{ 1, 2, 3, 4, 5 };
// remove all elements < 5
If you went over the vector removing elements one by one, you would remove the 1, causing copies of the remainder elements that get shifted (4). Then you would remove 2 and shift all remainding elements by one (3)... if you see the pattern this is a O(N^2) algorithm.
In the case of std::remove the algorithm maintains a read and write heads, and iterates over the container. For the first 4 elements the read head will be moved and the element tested, but no element is copied. Only for the fifth element the object would be copied from the last to the first position, and the algorithm will complete with a single copy and returning an iterator to the second position. This is a O(N) algorithm. The later std::vector::erase with the range will cause destruction of all the remainder elements and resizing the container.
As others have mentioned, in the standard library algorithms are applied to iterators, and lack knowledge of the sequence being iterated. This design is more flexible than other approaches on which algorithms are aware of the containers in that a single implementation of the algorithm can be used with any sequence that complies with the iterator requirements. Consider for example, std::remove_copy_if, it can be used even without containers, by using iterators that generate/accept sequences:
std::remove_copy_if(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, " "),
[](int x) { return !(x%2); } // is even
);
That single line of code will filter out all even numbers from standard input and dump that to standard output, without requiring the loading of all numbers into memory in a container. This is the advantage of the split, the disadvantage is that the algorithms cannot modify the container itself, only the values referred to by the iterators.
std::remove is an algorithm from the STL which is quite container agnostic. It requires some concept, true, but it has been designed to also work with C arrays, which are static in sizes.
std::remove simply returns a new end() iterator to point to one past the last non-removed element (the number of items from the returned value to end() will match the number of items to be removed, but there is no guarantee their values are the same as those you were removing - they are in a valid but unspecified state). This is done so that it can work for multiple container types (basically any container type that a ForwardIterator can iterate through).
std::vector::erase actually sets the new end() iterator after adjusting the size. This is because the vector's method actually knows how to handle adjusting it's iterators (the same can be done with std::list::erase, std::deque::erase, etc.).
remove organizes a given container to remove unwanted objects. The container's erase function actually handles the "removing" the way that container needs it to be done. That is why they are separate.
I think it has to do with needing direct access to the vector itself to be able to resize it. std::remove only has access to the iterators, so it has no way of telling the vector "Hey, you now have fewer elements".
See yves Baumes answer as to why std::remove is designed this way.
Yes, that's the gist of it. Note that erase is also supported by the other standard containers where its performance characteristics are different (e.g. list::erase is O(1)), while std::remove is container-agnostic and works with any type of forward iterator (so it works for e.g. bare arrays as well).
Kind of. Algorithms such as remove work on iterators (which are an abstraction to represent an element in a collection) which do not necessarily know which type of collection they are operating on - and therefore cannot call members on the collection to do the actual removal.
This is good because it allows algorithms to work generically on any container and also on ranges that are subsets of the entire collection.
Also, as you say, for performance - it may not be necessary to actually remove (and destroy) the elements if all you need is access to the logical end position to pass on to another algorithm.
Standard library algorithms operate on sequences. A sequence is defined by a pair of iterators; the first points at the first element in the sequence, and the second points one-past-the-end of the sequence. That's all; algorithms don't care where the sequence comes from.
Standard library containers hold data values, and provide a pair of iterators that specify a sequence for use by algorithms. They also provide member functions that may be able to do the same operations as an algorithm more efficiently by taking advantage of the internal data structure of the container.

Vector vs Deque insertion in middle

I know that deque is more efficient than vector when insertions are at front or end and vector is better if we have to do pointer arithmetic. But which one to use when we have to perform insertions in middle.? and Why.?
You might think that a deque would have the advantage, because it stores the data broken up into blocks. However to implement operator[] in constant time requires all those blocks to be the same size. Inserting or deleting an element in the middle still requires shifting all the values on one side or the other, same as a vector. Since the vector is simpler and has better locality for caching, it should come out ahead.
Selection criteria with Standard library containers is, You select a container depending upon:
Type of data you want to store &
The type of operations you want to perform on the data.
If you want to perform large number of insertions in the middle you are much better off using a std::list.
If the choice is just between a std::deque and std::vector then there are a number of factors to consider:
Typically, there is one more indirection in case of deque to access the elements, so element
access and iterator movement of deques are usually a bit slower.
In systems that have size limitations for blocks of memory, a deque might contain more elements because it uses more than one block of memory. Thus, max_size() might be larger for deques.
Deques provide no support to control the capacity and the moment of reallocation. In
particular, any insertion or deletion of elements other than at the beginning or end
invalidates all pointers, references, and iterators that refer to elements of the deque.
However, reallocation may perform better than for vectors, because according to their
typical internal structure, deques don't have to copy all elements on reallocation.
Blocks of memory might get freed when they are no longer used, so the memory size of a
deque might shrink (this is not a condition imposed by standard but most implementations do)
std::deque could perform better for large containers because it is typically implemented as a linked sequence of contiguous data blocks, as opposed to the single block used in an std::vector. So an insertion in the middle would result in less data being copied from one place to another, and potentially less reallocations.
Of course, whether that matters or not depends on the size of the containers and the cost of copying the elements stored. With C++11 move semantics, the cost of the latter is less important. But in the end, the only way of knowing is profiling with a realistic application.
Deque would still be more efficient, as it doesn't have to move half of the array every time you insert an element.
Of course, this will only really matter if you consider large numbers of elements, and even than it is advisable to run a benchmark and see which one works better in your particular case. Remember that premature optimization is the root of all evil.

STL Containers - difference between vector, list and deque

Should I use deque instead of vector if i'd like to push elements also in the beginning of the container? When should I use list and what's the point of it?
Use deque if you need efficient insertion/removal at the beginning and end of the sequence and random access; use list if you need efficient insertion anywhere, at the sacrifice of random access. Iterators and references to list elements are very stable under almost any mutation of the container, while deque has very peculiar iterator and reference invalidation rules (so check them out carefully).
Also, list is a node-based container, while a deque uses chunks of contiguous memory, so memory locality may have performance effects that cannot be captured by asymptotic complexity estimates.
deque can serve as a replacement for vector almost everywhere and should probably have been considered the "default" container in C++ (on account of its more flexible memory requirements); the only reason to prefer vector is when you must have a guaranteed contiguous memory layout of your sequence.
deque and vector provide random access, list provides only linear accesses. So if you need to be able to do container[i], that rules out list. On the other hand, you can insert and remove items anywhere in a list efficiently, and operations in the middle of vector and deque are slow.
deque and vector are very similar, and are basically interchangeable for most purposes. There are only two differences worth mentioning. First, vector can only efficiently add new items at the end, while deque can add items at either end efficiently. So why would you ever use a vector then? Unlike deque, vector guarantee that all items will be stored in contiguous memory locations, which makes iterating through them faster in some situations.

What's the difference between deque and list STL containers?

What is the difference between the two? I mean the methods are all the same. So, for a user, they work identically.
Is that correct??
Let me list down the differences:
Deque manages its elements with a
dynamic array, provides random
access, and has almost the same
interface as a vector.
List manages its elements as a
doubly linked list and does not
provide random access.
Deque provides Fast insertions and deletions at
both the end and the beginning. Inserting and deleting elements in
the middle is relatively slow because
all elements up to either of both
ends may be moved to make room or to
fill a gap.
In List, inserting and removing elements is fast at each position,
including both ends.
Deque: Any insertion or deletion of elements
other than at the beginning or end
invalidates all pointers, references,
and iterators that refer to elements
of the deque.
List: Inserting and deleting elements does
not invalidate pointers, references,
and iterators to other elements.
Complexity
Insert/erase at the beginning in middle at the end
Deque: Amortized constant Linear Amortized constant
List: Constant Constant Constant
From the (dated but still very useful) SGI STL summary of deque:
A deque is very much like a vector: like vector, it is a sequence that supports random access to elements, constant time insertion and removal of elements at the end of the sequence, and linear time insertion and removal of elements in the middle.
The main way in which deque differs from vector is that deque also supports constant time insertion and removal of elements at the beginning of the sequence. Additionally, deque does not have any member functions analogous to vector's capacity() and reserve(), and does not provide any of the guarantees on iterator validity that are associated with those member functions.
Here's the summary on list from the same site:
A list is a doubly linked list. That is, it is a Sequence that supports both forward and backward traversal, and (amortized) constant time insertion and removal of elements at the beginning or the end, or in the middle. Lists have the important property that insertion and splicing do not invalidate iterators to list elements, and that even removal invalidates only the iterators that point to the elements that are removed. The ordering of iterators may be changed (that is, list::iterator might have a different predecessor or successor after a list operation than it did before), but the iterators themselves will not be invalidated or made to point to different elements unless that invalidation or mutation is explicit.
In summary the containers may have shared routines but the time guarantees for those routines differ from container to container. This is very important when considering which of these containers to use for a task: taking into account how the container will be most frequently used (e.g., more for searching than for insertion/deletion) goes a long way in directing you to the right container.
std::list is basically a doubly linked list.
std::deque, on the other hand, is implemented more like std::vector. It has constant access time by index, as well as insertion and removal at the beginning and end, which provides dramatically different performance characteristics than a list.
I've made illustrations for the students in my C++ class.
This is based (loosely) on (my understanding of) the implementation in the GCC STL implementation (
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_deque.h and https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/bits/stl_list.h)
Double-ended queue
Elements in the collection are stored in memory blocks. The number of elements per block depends on the size of the element: the bigger the elements, the fewer per block. The underlying hope is that if the blocks are of a similar size no matter the type of the elements, that should help the allocator most of the time.
You have an array (called a map in the GCC implementation) listing the memory blocks. All memory blocks are full except the first one which may have room at the beginning and the last which may have room at the end. The map itself is filled from the center outwards. This is how, contrarily to a std::vector, insertion at both ends can be done in constant time. Similarly to a std:::vector, random access is possible in constant time, but requires two indirections instead of one. Similarly to std::vector and contrarily to std::list, removing or inserting elements in the middle is costly because you have to reorganize a large part of the datastructure.
Doubly-linked list
Doubly-linked list are perhaps more usual. Every element is stored in its own memory block, allocated independently from the other elements. In each block, you have the element's value and two pointers: one to the previous element, one to the next element. It makes it very easy to insert element at any position in the list, or even to move a subchain of elements from one list to another (an operation called splicing): you just have to update the pointers at the beginning and end of the insertion point. The downside is that to find one element by its index, you have to walk the chain of pointers, so random access has a linear cost in the numbers of elements in the list.
Another important guarantee is the way each different container stores its data in memory:
A vector is a single contiguous memory block.
A deque is a set of linked memory blocks, where more than one element is stored in each memory block.
A list is a set of elements dispersed in memory, i.e.: only one element is stored per memory "block".
Note that the deque was designed to try to balance the advantages of both vector and list without their respective drawbacks. It is a specially interesting container in memory limited platforms, for example, microcontrollers.
The memory storage strategy is often overlooked, however, it is frequently one of the most important reasons to select the most suitable container for a certain application.
No. A deque only supports O(1) insertion and deletion at the front and back. It may, for example, be implemented in a vector with wrap-around. Since it also guarantees O(1) random access, you can be sure it's not using (just) a doubly linked list.
Among eminent differences between deque and list
For deque :
Items stored side by side;
Optimized for adding datas from two sides (front, back);
Elements indexed by numbers (integers).
Can be browsed by iterators and even by element's index.
Time access to data is faster.
For list
Items stored "randomly" in the memory;
Can be browsed only by iterators;
Optimized for insertion and removal in the middle.
Time access to data is slower, slow to iterate, due to its very poor spatial locality.
Handles very well large elements
You can check also the following Link, which compares the performance between the two STL containers (with std::vector)
Hope i shared some useful informations.
The performance differences have been explained well by others. I just wanted to add that similar or even identical interfaces are common in object-oriented programming -- part of the general methodology of writing object-oriented software. You should IN NO WAY assume that two classes work the same way simply because they implement the same interface, any more than you should assume that a horse works like a dog because they both implement attack() and make_noise().
Here's a proof-of-concept code use of list, unorded map that gives O(1) lookup and O(1) exact LRU maintenance. Needs the (non-erased) iterators to survive erase operations. Plan to use in a O(1) arbitrarily large software managed cache for CPU pointers on GPU memory. Nods to the Linux O(1) scheduler (LRU <-> run queue per processor). The unordered_map has constant time access via hash table.
#include <iostream>
#include <list>
#include <unordered_map>
using namespace std;
struct MapEntry {
list<uint64_t>::iterator LRU_entry;
uint64_t CpuPtr;
};
typedef unordered_map<uint64_t,MapEntry> Table;
typedef list<uint64_t> FIFO;
FIFO LRU; // LRU list at a given priority
Table DeviceBuffer; // Table of device buffers
void Print(void){
for (FIFO::iterator l = LRU.begin(); l != LRU.end(); l++) {
std::cout<< "LRU entry "<< *l << " : " ;
std::cout<< "Buffer entry "<< DeviceBuffer[*l].CpuPtr <<endl;
}
}
int main()
{
LRU.push_back(0);
LRU.push_back(1);
LRU.push_back(2);
LRU.push_back(3);
LRU.push_back(4);
for (FIFO::iterator i = LRU.begin(); i != LRU.end(); i++) {
MapEntry ME = { i, *i};
DeviceBuffer[*i] = ME;
}
std::cout<< "************ Initial set of CpuPtrs" <<endl;
Print();
{
// Suppose evict an entry - find it via "key - memory address uin64_t" and remove from
// cache "tag" table AND LRU list with O(1) operations
uint64_t key=2;
LRU.erase(DeviceBuffer[2].LRU_entry);
DeviceBuffer.erase(2);
}
std::cout<< "************ Remove item 2 " <<endl;
Print();
{
// Insert a new allocation in both tag table, and LRU ordering wiith O(1) operations
uint64_t key=9;
LRU.push_front(key);
MapEntry ME = { LRU.begin(), key };
DeviceBuffer[key]=ME;
}
std::cout<< "************ Add item 9 " <<endl;
Print();
std::cout << "Victim "<<LRU.back()<<endl;
}