Why does std::partition not have an out-of-place variant? - c++

std::partition is nifty, but it is in-place; and std::partition_copy is also nice, but it takes two output iterators, i.e. you have to at least count in advance the number of elements satisfying the predicate if you want to use the same output array. Why isn't there an out-of-place std::partition, or a single-output-iterator std::partition_copy, in <algorithm>?

Presumably because the functionality can already be attained by either:
Copying the original container and using in-place partition.
Sizing the single destination container to the correct size and using the begin() and rbegin() iterators to fill it from the front and back with partition_copy.

Related

find() vs binary_search() in STL

Which function is more efficient in searching an element in vector find() or binary_search() ?
The simple answer is: std::find for unsorted data and std::binary_search for sorted data. But I think there's much more to this:
Both methods take a range [start, end) with n elements and and a value x that is to be found as input. But note the important difference that std::binary_search only returns a bool that tells you wether the range contained the element, or not. std::find() returns an iterator. So both have different, but overlapping use cases.
std::find() is pretty straight forward. O(n) iterator increments and O(n) comparisons. Also it doesn't matter wether the input data is sorted or not.
For std::binary_search() you need to consider multiple factors:
It only works on sorted data. You need to take the cost of sorting into account.
The number of comparisons is always O(log n).
If the iterator does not satisfy LegacyRandomAccessIterator the number of iterator increments is O(n), it will be logarithmic when they do satisfy this requirement.
Conclusion (a bit opinionated):
when you operate on un-sorted data or need the location of the item you searched for you must use std::find()
when your data is already sorted or needs to be sorted anyway and you simply want to check if an element is present or not, use std::binary_search()
If you want to search containers like std::set, std::map or their unordered counterparts also consider their builtin methods like std::set::find
When you are not sure if the data is sorted or not, You have to use find() and If the data will be sorted you should use binary_search().
For more information, You can refer find() and binary_search()
If your input is sorted then you can use binary_search as it will take O(lg n) time. if your input is unsorted you can use find, which will take O(n) time.

Difference between std::remove and erase for vector?

I have a doubt that I would like to clarify in my head. I am aware of the different behavior for std::vector between erase and std::remove where the first physically removes an element from the vector, reducing size, and the other just moves an element leaving the capacity the same.
Is this just for efficiency reasons? By using erase, all elements in a std::vector will be shifted by 1, causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy, that difference might matter, right?
Is this just for efficiency reason? By using erase all elements in a std::vector will be shifted by 1 causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy that difference mihgt matter, right?
The reason for using this idiom is exactly that. There is a benefit in performance, but not in the case of a single erasure. Where it does matter is if you need to remove multiple elements from the vector. In this case, the std::remove will copy each not removed element only once to its final location, while the vector::erase approach would move all of the elements from the position to the end multiple times. Consider:
std::vector<int> v{ 1, 2, 3, 4, 5 };
// remove all elements < 5
If you went over the vector removing elements one by one, you would remove the 1, causing copies of the remainder elements that get shifted (4). Then you would remove 2 and shift all remainding elements by one (3)... if you see the pattern this is a O(N^2) algorithm.
In the case of std::remove the algorithm maintains a read and write heads, and iterates over the container. For the first 4 elements the read head will be moved and the element tested, but no element is copied. Only for the fifth element the object would be copied from the last to the first position, and the algorithm will complete with a single copy and returning an iterator to the second position. This is a O(N) algorithm. The later std::vector::erase with the range will cause destruction of all the remainder elements and resizing the container.
As others have mentioned, in the standard library algorithms are applied to iterators, and lack knowledge of the sequence being iterated. This design is more flexible than other approaches on which algorithms are aware of the containers in that a single implementation of the algorithm can be used with any sequence that complies with the iterator requirements. Consider for example, std::remove_copy_if, it can be used even without containers, by using iterators that generate/accept sequences:
std::remove_copy_if(std::istream_iterator<int>(std::cin),
std::istream_iterator<int>(),
std::ostream_iterator<int>(std::cout, " "),
[](int x) { return !(x%2); } // is even
);
That single line of code will filter out all even numbers from standard input and dump that to standard output, without requiring the loading of all numbers into memory in a container. This is the advantage of the split, the disadvantage is that the algorithms cannot modify the container itself, only the values referred to by the iterators.
std::remove is an algorithm from the STL which is quite container agnostic. It requires some concept, true, but it has been designed to also work with C arrays, which are static in sizes.
std::remove simply returns a new end() iterator to point to one past the last non-removed element (the number of items from the returned value to end() will match the number of items to be removed, but there is no guarantee their values are the same as those you were removing - they are in a valid but unspecified state). This is done so that it can work for multiple container types (basically any container type that a ForwardIterator can iterate through).
std::vector::erase actually sets the new end() iterator after adjusting the size. This is because the vector's method actually knows how to handle adjusting it's iterators (the same can be done with std::list::erase, std::deque::erase, etc.).
remove organizes a given container to remove unwanted objects. The container's erase function actually handles the "removing" the way that container needs it to be done. That is why they are separate.
I think it has to do with needing direct access to the vector itself to be able to resize it. std::remove only has access to the iterators, so it has no way of telling the vector "Hey, you now have fewer elements".
See yves Baumes answer as to why std::remove is designed this way.
Yes, that's the gist of it. Note that erase is also supported by the other standard containers where its performance characteristics are different (e.g. list::erase is O(1)), while std::remove is container-agnostic and works with any type of forward iterator (so it works for e.g. bare arrays as well).
Kind of. Algorithms such as remove work on iterators (which are an abstraction to represent an element in a collection) which do not necessarily know which type of collection they are operating on - and therefore cannot call members on the collection to do the actual removal.
This is good because it allows algorithms to work generically on any container and also on ranges that are subsets of the entire collection.
Also, as you say, for performance - it may not be necessary to actually remove (and destroy) the elements if all you need is access to the logical end position to pass on to another algorithm.
Standard library algorithms operate on sequences. A sequence is defined by a pair of iterators; the first points at the first element in the sequence, and the second points one-past-the-end of the sequence. That's all; algorithms don't care where the sequence comes from.
Standard library containers hold data values, and provide a pair of iterators that specify a sequence for use by algorithms. They also provide member functions that may be able to do the same operations as an algorithm more efficiently by taking advantage of the internal data structure of the container.

Inserting an element into vector in the middle

Is there an way of inserting/deleting an element from the vectors other than the following..
The formal method of using 'push_back'
Using 'find()' in this way... find(v.begin(), v.end(), int)
I have read some where that inserting in the middle can be achieved by inclusive insertion/deletion.
So, is it really possible?
You can use std::vector::insert; however, note that this operation is O(.size()). If your code needs to perform insertions in the middle frequently, you may want to switch to a linked-list structure.
Is there an way of inserting/deleting an element from the vectors other than the following
Yes, you can use std::vector::insert() to insert element at a specified position.
Because vectors use an array as their underlying storage, inserting elements in positions other than the vector end causes the container to move all the elements that were after position to their new positions. This is generally an inefficient operation compared to the one performed for the same operation by other kinds of sequence containers (such as std::list).
std::vector is standard container, you could apply standard STL algorithms on it.
vector::insert seems to be what you want.

Need the position/index of a std::set element on find

I use a std::set to sort a vector of unordered duplicate values. Every time I find an element in my set, I need to know the position (index) of the element as well. There are a lot of elements (hundreds of thousands) in my set, and using std::distance() gives me abysmal performance.
Is std::distance the only way to go?
You can sort the elements in place using the std::sort() algorithm. Then when you find an element in the vector using binary_search() just subtract the result of a call to begin() from the iterator pointing to the element.
Another alternative is to use std::partial_sort_copy() if you don't want to overwrite your original vector. Just sort into another vector and you can do the same thing I described above.

removing elements by value in C++ - does the preferred idiom really consist of a double negative?

I came across this answer to the question of removing elements by value in C++:
C++ Erase vector element by value rather than by position?
Basically:
vec.erase(std::remove(vec.begin(), vec.end(), valueToRemove), vec.end());
The answer makes sense, but isn't this bad style? The logic is consists of a double negative... is there a cleaner way to do this?
Deleting an element from a collection consists of two steps:
Moving down all subsequent elements to fill in the holes created by matches
Marking the new end
With the C++ standard library, these are two separate functions, remove and erase, respectively.
One could certainly imagine an erase_if type of function which would be easier to use, but evidently the current code is considered good enough. Of course you can write your own remove_if.
This isn't bad and in fact an efficient way of removing elements from a vector based on a condition in linear time. Watch this video from 35th minute. STL explanation for the Erase and Remove Idiom
Remember that there are different types of containers: Contiguous vs node-based, and sequential vs associative.
Node-based containers allow efficient erase/insert. Sequential containers organize elements by insertion order (i.e. position), while associative containers arrange them by (key) value.
All current associative containers (map/set/unordered) are node-based, and with them you can erase elements directly, and you should use the element-wise member erase function directly. Lists are node-based sequence containers, so you can erase individual elements efficiently, but finding an element by value takes linear time, which is why lists offer a member remove function. Only sequence containers (vector and deque) have no easy way to erase elements by value, and that's where the free remove algorithm comes in, which first rearranges the sequence to then allow the container's member erase to perform an efficient erasure at the end of the container.
Unlike the many generic aspects of the standard library which work without any knowledge of the underlying container, the copy/erase idiom is one of those things which require a bit of detail knowledge about the differences between the containers.