algorithm to remove elements in the intersection of two sets - c++

I have a Visual Studio 2008 C++03 application where I have two standard containers. I would like to remove from one container all of the items that are present in the other container (the intersection of the sets).
Something like this:
std::vector< int > items = /* 1, 2, 3, 4, 5, 6, 7 */;
std::set< int > items_to_remove = /* 2, 4, 5*/;
std::some_algorithm( items.begin(), items.end(), items_to_remove.begin(), items_to_remove.end() );
assert( items == /* 1, 3, 6, 7 */ );
Is there an existing algorithm or pattern that will do this or do I need to roll my own?
Thanks

Try with:
items.erase(
std::remove_if(
items.begin(), items.end()
, std::bind1st(
std::mem_fun( &std::set< int >::count )
, &items_to_remove // std::mem_fun adaptors take the object by pointer
)
)
, items.end()
);
std::remove(_if) doesn't actually remove anything, since it works with iterators and not containers. What it does is move the elements to be kept to the front of the range (preserving their order) and return an iterator to the new logical end; the elements past that point are left in an unspecified state. You then call erase to actually remove from the container everything past the new end.
Update: If I recall correctly, binding to a member function of a component of the standard library is not standard C++, as implementations are allowed to add default parameters to the function. You'd be safer by creating your own function or function-object predicate that checks whether the element is contained in the set of items to remove.
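A minimal sketch of such a function-object predicate, written to compile as C++03. The names IsInSet and remove_all_in are hypothetical and only serve the illustration:

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Hypothetical predicate: true when the value is present in the given set.
template <typename T>
class IsInSet {
public:
    explicit IsInSet(const std::set<T>& s) : set_(&s) {}
    bool operator()(const T& value) const { return set_->count(value) != 0; }
private:
    const std::set<T>* set_;  // a pointer keeps the predicate copyable
};

// Hypothetical wrapper around the erase-remove idiom.
template <typename T>
void remove_all_in(std::vector<T>& items, const std::set<T>& to_remove) {
    items.erase(
        std::remove_if(items.begin(), items.end(), IsInSet<T>(to_remove)),
        items.end());
}
```

Each lookup in the set is logarithmic, so the whole removal should cost roughly O(n log m) for n items and m values to remove.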

Personally, I prefer to create small helpers for this (that I reuse heavily).
template <typename Container>
class InPredicate {
public:
InPredicate(Container const& c): _c(c) {}
template <typename U>
bool operator()(U const& u) const {
return std::find(_c.begin(), _c.end(), u) != _c.end();
}
private:
Container const& _c;
};
// Typical builder for automatic type deduction
template <typename Container>
InPredicate<Container> in(Container const& c) {
return InPredicate<Container>(c);
}
It also helps to have a true erase_if algorithm:
template <typename Container, typename Predicate>
void erase_if(Container& c, Predicate p) {
c.erase(std::remove_if(c.begin(), c.end(), p), c.end());
}
And then:
erase_if(items, in(items_to_remove));
which is pretty readable :)

One more solution:
The standard library provides the algorithm set_difference, which can be used for this, provided both input ranges are sorted.
But it requires an extra container to hold the result. I personally prefer to do it in-place.
std::vector< int > items;
//say items = [1,2,3,4,5,6,7,8,9]
std::set<int>items_to_remove;
//say items_to_remove = <2,4,5>
std::vector<int>result(items.size()); //pre-sized, because result.begin() below
//overwrites existing elements instead of inserting new ones.
std::vector<int>::iterator new_end = std::set_difference(items.begin(),
items.end(),items_to_remove.begin(),items_to_remove.end(),result.begin());
result.erase(new_end,result.end()); // to erase unwanted elements at the
// end.
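As a variation on the same idea, the result vector does not have to be pre-sized if the output goes through std::back_inserter. The sketch below assumes the items are already sorted in ascending order (std::set always is); difference_sorted is a hypothetical name:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical helper: returns the items that are not in items_to_remove.
// Assumption: `items` is sorted ascending, as std::set_difference requires.
inline std::vector<int> difference_sorted(const std::vector<int>& items,
                                          const std::set<int>& items_to_remove) {
    // C++03-compatible sortedness check: no element is greater than its successor.
    assert(std::adjacent_find(items.begin(), items.end(),
                              std::greater<int>()) == items.end());
    std::vector<int> result;
    std::set_difference(items.begin(), items.end(),
                        items_to_remove.begin(), items_to_remove.end(),
                        std::back_inserter(result));
    return result;
}
```

If the original vector has to be updated, the result can be swapped into it afterwards with items.swap(result).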

You can use the container's erase member function in combination with std::remove_if for this. There is a C++ idiom called the erase-remove idiom, which is going to help you accomplish this.

Assuming you have two sets, A and B, and you want to remove from B the intersection I of A and B, where I = A ∩ B, your final results will be:
A (left intact)
B' = B-I
Full theory:
http://math.comsci.us/sets/difference.html
This is quite simple.
Create and populate A and B
Create a third intermediate vector, I
Copy the contents of B into I
For each element a of A, search I for a; if the element is found in I, remove it. When the loop finishes, I holds B'.
Finally, the code to remove an individual element can be found here:
How do I remove an item from a stl vector with a certain value?
And the code to search for an item is here:
How to find if an item is present in a std::vector?
Good luck!

Here's a more "hands-on" in-place method that doesn't require fancy functions nor do the vectors need to be sorted:
#include <vector>
template <class TYPE>
void remove_intersection(std::vector<TYPE> &items, const std::vector<TYPE> &items_to_remove)
{
for (int i = 0; i < (int)items_to_remove.size(); i++) {
for (int j = 0; j < (int)items.size(); j++) {
if (items_to_remove[i] == items[j]) {
items.erase(items.begin() + j);
j--;//Roll back the iterator to prevent skipping over
}
}
}
}
If you know that the multiplicity in each set is 1 (not a multiset), then you can actually replace the j--; line with a break; for better performance.

Related

Implementing stable_partition for forward_list

I want to implement something similar to std::stable_partition but for forward_list of c++11.
The STL version requires bidirectional iterators; however, by utilizing container-specific methods I believe I can get the same outcome efficiently.
Example declaration :
template <typename T, typename UnaryPredicate>
void stable_partition(std::forward_list<T>& list, UnaryPredicate p);
(while possible to add begin and end iterators, I omitted them for brevity. The same for returning the partition point )
I already worked out the algorithm to accomplish this on my own list type, but I have troubles implementing it in stl.
The key method appears to be splice_after. Other methods require memory allocations and copying elements.
Algorithm sketch :
create a new empty list. It will hold all elements p returns true on.
loop over the target list, add items to the true list in accordance to invoking p.
concat the true list to the beginning of the target list.
With proper coding this should be linear time (all operations inside the loop can be done in constant time) and without extra memory allocation or copying.
I am trying to implement the second step using splice_after, but I end up either concatenating the wrong element or invalidating my iterators.
The question:
What is the correct use of splice_after, so that I avoid
mixing iterators between lists and insert the correct elements?
First Attempt (how I hoped it works):
template <typename T, typename UnaryPredicate>
void stable_partition(std::forward_list<T>& list, UnaryPredicate p)
{
std::forward_list<T> positives;
auto positives_iter = positives.before_begin();
for (auto iter = list.begin(); iter != list.end(); ++iter)
{
if (p(*iter))
positives.splice_after(positives_iter, list, iter);
}
list.splice_after(list.before_begin(), positives);
}
Unfortunately this has at least one major flaw: splice_after moves the element after iter rather than iter itself, so the wrong element is moved.
Also, when the element is moved to the other list, incrementing iter now traverses the wrong list.
Having to maintain the preceding iterators for std::forward_list::splice_after makes it a bit trickier, but still pretty short:
template<class T, class UnaryPredicate>
std::array<std::forward_list<T>, 2>
stable_partition(std::forward_list<T>& list, UnaryPredicate p) {
std::array<std::forward_list<T>, 2> r;
decltype(r[0].before_begin()) pos[2] = {r[0].before_begin(), r[1].before_begin()};
for(auto i = list.before_begin(), ni = i, e = list.end(); ++ni != e; ni = i) {
bool idx = p(*ni);
auto& tail = pos[idx]; // named tail so it does not shadow the predicate p
r[idx].splice_after(tail, list, i);
++tail;
}
return r;
}
Usage example:
template<class T>
void print(std::forward_list<T> const& list) {
for(auto const& e : list)
std::cout << e << ' ';
std::cout << '\n';
}
int main() {
std::forward_list<int> l{0,1,2,3,4,5,6};
print(l);
// Partition into even and odd elements.
auto p = stable_partition(l, [](int e) { return e % 2 != 0; });
print(p[0]); // Even elements.
print(p[1]); // Odd elements.
}
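For the in-place signature from the question, the same "keep the preceding iterator" trick can be sketched as follows. This is only an illustration under the assumption that C++11 is available; stable_partition_in_place is a hypothetical name:

```cpp
#include <forward_list>
#include <iterator>

// Hypothetical in-place variant: elements satisfying pred end up first,
// and both groups keep their relative order. No element is copied.
template <typename T, typename UnaryPredicate>
void stable_partition_in_place(std::forward_list<T>& list, UnaryPredicate pred) {
    std::forward_list<T> positives;
    auto tail = positives.before_begin();  // last node of positives
    auto prev = list.before_begin();       // node before the one being examined
    for (auto cur = std::next(prev); cur != list.end(); cur = std::next(prev)) {
        if (pred(*cur)) {
            // splice_after(pos, other, it) moves the node *after* it,
            // which is cur here; prev stays valid and stays in `list`.
            positives.splice_after(tail, list, prev);
            ++tail;
        } else {
            ++prev;  // keep cur in place and step over it
        }
    }
    // Put all matching elements in front of the remaining ones.
    list.splice_after(list.before_begin(), positives);
}
```

The loop never increments an iterator that was moved into the other list, which is the problem the first attempt ran into.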

Function on multiple vector

I have a sorting algorithm on a vector, and I want to apply it to several vectors, without knowing how many. The only thing I'm sure of is that there will be at least one vector (always the same) on which I will run my algorithm. The others will just follow.
Here's an example :
void sort(std::vector<int>& sortVector, std::vector<double>& follow1, std::vector<char>& follow2, ... ){
for (int i = 1; i<sortVector.size(); ++i){
if ( sortVector[i-1] > sortVector[i] ) { //I know it's not sorting here, it's only for the example
std::swap(sortVector[i-1], sortVector[i]);
std::swap(follow1[i-1], follow1[i]);
std::swap(follow2[i-1], follow2[i]);
....
}
}
}
I was thinking about using a variadic function, but since it's a recursive function, I was wondering whether it would take too much time to create my va_arg list every time (I'm working on vectors of 500 million to 1 billion elements). So does something else exist?
As I'm writing this question, I realize that maybe I'm fooling myself, and there is no other way to achieve what I want, and a variadic function is maybe not that slow. (I really don't know, in fact.)
EDIT :
In fact, I'm doing an octree sort of data so that it is usable in OpenGL.
Since my data is not always the same (e.g. OBJ files will give me normals, PTS files will give me intensity and colors, ...), I want to be able to reorder all my vectors (which contain my data) so that they have the same order as the position vector (the vector that contains the positions of my points; it will always be there).
All my vectors will have the same length, and I want all my follower vectors to be reorganised like the first one.
If I have 3 vectors and I swap the first and third values in my first vector, I want to swap the first and third values in my 2 other vectors.
But my vectors are not all of the same type. Some will be std::vector<char>, others std::vector<Vec3>, std::vector<unsigned>, and so on.
With range-v3, you may use zip, something like:
template <typename T, typename ... Ranges>
void sort(std::vector<T>& refVector, Ranges&& ... ranges){
ranges::sort(ranges::view::zip(refVector, std::forward<Ranges>(ranges)...));
}
Or if you don't want to use ranges to compare (for ties in refVector), you can project to use only refVector:
template <typename T, typename ... Ranges>
void sort(std::vector<T>& refVector, Ranges&& ... ranges){
ranges::sort(ranges::view::zip(refVector, std::forward<Ranges>(ranges)...),
std::less<>{},
[](auto& tup) -> T& { return std::get<0>(tup); });
}
Although I totally agree with the comment of n.m., I suggest using a vector of vectors which contains the follow vectors and then looping over all follow vectors.
void sort(std::vector<int>& vector, std::vector<std::vector<double>>& followers){
for (int i = 1; i<vector.size(); ++i){
if ( vector[i-1] > vector[i] ) {
std::swap(vector[i-1], vector[i]);
for (auto & follow : followers)
std::swap(follow[i-1], follow[i]);
}
}
}
Nevertheless, as n.m. pointed out, perhaps think about putting all the data you would like to sort into a class-like structure. Then you can have a vector of your class and apply std::sort, see here.
struct MyStruct
{
int key; //content of your int vector named "vector"
double follow1;
std::string follow2;
// all the information from your follow vectors goes here.
MyStruct(int k, double f1, const std::string& f2) : key(k), follow1(f1), follow2(f2) {}
};
struct less_than_key
{
inline bool operator() (const MyStruct& struct1, const MyStruct& struct2)
{
return (struct1.key < struct2.key);
}
};
std::vector < MyStruct > vec;
vec.push_back(MyStruct(4, 1.2, "test"));
vec.push_back(MyStruct(3, 2.8, "a"));
vec.push_back(MyStruct(2, 0.0, "is"));
vec.push_back(MyStruct(1, -10.5, "this"));
std::sort(vec.begin(), vec.end(), less_than_key());
The main problem here is that the std::sort algorithm cannot operate on multiple vectors at the same time.
For the purpose of demonstration, let's assume you have a std::vector<int> v1 and a std::vector<char> v2 (of the same size of course) and you want to sort both depending on the values in v1. To solve this, I basically see three possible solutions, all of which generalize to an arbitrary number of vectors:
1) Put all your data into a single vector.
Define a struct, say Data, that keeps an entry of every data vector.
struct Data
{
int d1;
char d2;
// extend here for more vectors
};
Now construct a new std::vector<Data> and fill it from your original vectors:
std::vector<Data> d(v1.size());
for(std::size_t i = 0; i < d.size(); ++i)
{
d[i].d1 = v1[i];
d[i].d2 = v2[i];
// extend here for more vectors
}
Since everything is stored inside a single vector now, you can use std::sort to bring it into order. Since we want it to be sorted based on the first entry (d1), which stores the values of the first vector, we use a custom predicate:
std::sort(d.begin(), d.end(),
[](const Data& l, const Data& r) { return l.d1 < r.d1; });
Afterwards, all data is sorted in d based on the first vector's values. You can now either continue working with the combined vector d or split the data back into the original vectors:
std::transform(d.begin(), d.end(), v1.begin(),
[](const Data& e) { return e.d1; });
std::transform(d.begin(), d.end(), v2.begin(),
[](const Data& e) { return e.d2; });
// extend here for more vectors
2) Use the first vector to compute the indices of the sorted range and use these indices to bring all vectors into order:
First, you attach to all elements in your first vector their current position. Then you sort it using std::sort and a predicate that only compares for the value (ignoring the position).
template<typename T>
std::vector<std::size_t> computeSortIndices(const std::vector<T>& v)
{
std::vector<std::pair<T, std::size_t>> d(v.size());
for(std::size_t i = 0; i < v.size(); ++i)
d[i] = std::make_pair(v[i], i);
std::sort(d.begin(), d.end(),
[](const std::pair<T, std::size_t>& l,
const std::pair<T, std::size_t>& r)
{
return l.first < r.first;
});
std::vector<std::size_t> indices(v.size());
std::transform(d.begin(), d.end(), indices.begin(),
[](const std::pair<T, std::size_t>& p) { return p.second; });
return indices;
}
Say in the resulting index vector the entry at position 0 is 8, then this tells you that the vector entries that have to go to the first position in the sorted vectors are those at position 8 in the original ranges.
You then use this information to sort all of your vectors:
template<typename T>
void sortByIndices(std::vector<T>& v,
const std::vector<std::size_t>& indices)
{
assert(v.size() == indices.size());
std::vector<T> result(v.size());
for(std::size_t i = 0; i < indices.size(); ++i)
result[i] = v[indices[i]];
v = std::move(result);
}
Any number of vectors may then be sorted like this:
const auto indices = computeSortIndices(v1);
sortByIndices(v1, indices);
sortByIndices(v2, indices);
// extend here for more vectors
This can be improved a bit by extracting the sorted v1 out of computeSortIndices directly, so that you do not need to sort it again using sortByIndices.
3) Implement your own sort function that is able to operate on multiple vectors. I have sketched an implementation of an in-place merge sort that is able to sort any number of vectors depending on the values in the first one.
The core of the merge sort algorithm is implemented by the multiMergeSortRec function, which takes an arbitrary number (> 0) of vectors of arbitrary types.
The function splits all vectors into first and second half, sorts both halves recursively and merges the results back together. Search the web for a full explanation of merge sort if you need more details.
template<typename T, typename... Ts>
void multiMergeSortRec(
std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
const std::size_t dist = e - b;
if(dist <= 1)
return;
std::size_t m = b + (dist / static_cast<std::size_t>(2));
// split in half and recursively sort both parts
multiMergeSortRec(b, m, v, vs...);
multiMergeSortRec(m, e, v, vs...);
// merge both sorted parts
while(b < m)
{
if(v[b] <= v[m])
++b;
else
{
++m;
rotateAll(b, m, v, vs...);
if(m == e)
break;
}
}
}
template<typename T, typename... Ts>
void multiMergeSort(std::vector<T>& v, std::vector<Ts>&... vs)
{
// TODO: check that all vectors have same length
if(v.size() < 2)
return ;
multiMergeSortRec<T, Ts...>(0, v.size(), v, vs...);
}
In order to operate in-place, parts of the vectors have to be rotated. This is done by the rotateAll function, which again works on an arbitrary number of vectors by recursively processing the variadic parameter pack. In a real source file, rotateAll has to be declared before multiMergeSortRec, which calls it.
void rotateAll(std::size_t, std::size_t)
{
}
template<typename T, typename... Ts>
void rotateAll(std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
std::rotate(v.begin() + b, v.begin() + e - 1, v.begin() + e);
rotateAll(b, e, vs...);
}
Note that the recursive calls of rotateAll are very likely to be inlined by any optimizing compiler, so that the function merely applies std::rotate to all vectors. You can avoid rotating parts of the vectors if you give up sorting in-place and merge into an additional vector. I would like to emphasize that this is neither an optimized nor a fully tested implementation of merge sort. It should serve as a sketch, since you really do not want to use bubble sort when you work on large vectors.
Let's quickly compare the above alternatives:
1) is easier to implement, since it relies on an existing (highly optimized and tested) std::sort implementation.
1) needs all data to be copied into the new vector and possibly (depending on your use case) all of it to be copied back.
In 1) multiple places have to be extended if you need to attach additional vectors to be sorted.
The implementation effort for 2) is moderate (more than 1, but less and easier than 3), and it relies on the optimized and tested std::sort.
2) cannot sort in-place (using the indices) and thus has to make a copy of every vector. Maybe there is an in-place alternative, but I cannot think of one right now (at least an easy one).
2) is easy to extend for additional vectors.
For 3) you need to implement sorting yourself, which makes it more difficult to get right.
3) does not need to copy all data. The implementation can be further optimized and can be tweaked for improved performance (out-of-place) or reduced memory consumption (in-place).
3) can work on additional vectors without any change. Just invoke multiMergeSort with one or more additional arguments.
All three work for heterogeneous sets of vectors, in contrast to the std::vector<std::vector<>> approach.
Which of the alternatives performs better in your case, is hard to say and should greatly depend on the number of vectors and their size, so if you really need optimal performance (and/or memory usage) you need to measure.
Find an implementation of the above here.
By far the easiest solution is to create a helper vector std::vector<size_t> initialized with std::iota(helper.begin(), helper.end(), size_t{});.
Next, sort this array, obviously not by the array index (iota already did that), but by sortVector[i]. In other words, the predicate is [&sortVector](size_t i, size_t j) { return sortVector[i] < sortVector[j]; }.
You now have the proper order of array indices. I.e. if helper[0]==17, then it means that the new front of all vectors should be the original 18th element. Usually the easiest way to produce the sorted result is to copy over elements, and then swap the original vector and the copy, repeated for all vectors. But if copying all elements is too expensive, it can be done in-place. (Note that if O(N) element copies are too expensive, a straightforward std::sort tends to perform badly as well, as it needs pivots.)
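The helper-vector idea can be sketched like this, using the copy-and-swap variant rather than the in-place one. The names sortedOrder and applyOrder are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of `keys` in sorted order,
// i.e. result[0] is the position of the smallest key.
template <typename T>
std::vector<std::size_t> sortedOrder(const std::vector<T>& keys) {
    std::vector<std::size_t> helper(keys.size());
    std::iota(helper.begin(), helper.end(), std::size_t{0});
    std::sort(helper.begin(), helper.end(),
              [&keys](std::size_t i, std::size_t j) { return keys[i] < keys[j]; });
    return helper;
}

// Hypothetical helper: reorders `v` so that new v[k] == old v[order[k]].
// Works for any element type, so every follower vector can reuse `order`.
template <typename T>
void applyOrder(std::vector<T>& v, const std::vector<std::size_t>& order) {
    assert(v.size() == order.size());
    std::vector<T> sorted;
    sorted.reserve(v.size());
    for (std::size_t idx : order)
        sorted.push_back(v[idx]);
    v.swap(sorted);
}
```

The order is computed once from the key vector and then applied to the key vector and to each follower vector in turn, whatever their element types are.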

How does an STL algorithm identify the container?

How does an STL algorithm identify what kind of container it is operating on?
For example, sort accepts iterators as arguments.
How does it know what kind of container it has to sort?
It doesn't :-) That's the whole point of iterators -- the algorithms that work on them don't need to know anything about the underlying container, and vice versa.
How does it work then? Well, the iterators themselves have a certain set of well-known properties. For example, 'random-access' iterators allow any algorithm to access an element offset from iterator by a constant:
std::vector<int> vec = { 1, 2, 3, 4 };
assert(*(vec.begin() + 2) == 3);
For a sort, the iterators need to support random access (in order to access all the elements between the first and end iterators in an arbitrary order), and they need to be writable (in order to assign or otherwise swap values around); writable iterators are said to also model 'output' iterators.
Example of an output iterator vs. an input (read-only) one:
std::vector<int> vec = { 1, 2, 3, 4 };
*vec.begin() = 9;
assert(vec[0] == 9);
*vec.cbegin() = 10; // Does not compile -- cbegin() yields a const iterator
// that is 'random-access', but not 'output'
It doesn't need to know the type of the container, it just needs to know the type of iterator.
As mentioned earlier, STL uses iterators, not containers. It uses the technique known as "tag dispatch" to deduce proper algorithm flavor.
For example, STL has a function "advance" which moves given iterator it by given n positions
template<class IteratorType,
class IntegerDiffType> inline
void advance(IteratorType& it, IntegerDiffType n)
For bidirectional iterators it has to apply ++ or -- many times; for random access iterators it can jump at once. This function is used in std::binary_search, std::lower_bound and some other algorithms.
Internally, it uses iterator type traits to select the strategy:
template<class IteratorType,
class IntegerDiffType>
void advance(IteratorType& it, IntegerDiffType n)
{
typedef typename iterator_traits<IteratorType>::iterator_category category;
advance_impl(it, n, category());
}
Of course, STL has to implement the overloaded "impl" functions:
template<class IteratorType,
class IntegerDiffType>
void advance_impl(IteratorType& it, IntegerDiffType n, bidirectional_iterator_tag)
{ // increment iterator by offset, bidirectional iterators
for (; 0 < n; --n)
++it;
for (; n < 0; ++n)
--it;
}
template<class IteratorType,
class IntegerDiffType>
void advance_impl(IteratorType& it, IntegerDiffType n, random_access_iterator_tag)
{ // increment iterator by offset, random-access iterators
it += n;
}
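Put together, tag dispatch can be sketched as a small self-contained example. The namespace sketch and the name my_advance are hypothetical, chosen so that nothing collides with std::advance; real library implementations differ in detail:

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

namespace sketch {

// Input (and forward) iterators: can only step forward, one element at a time.
template <class It, class Diff>
void advance_impl(It& it, Diff n, std::input_iterator_tag) {
    assert(n >= 0);
    for (; n > 0; --n) ++it;
}

// Bidirectional iterators: step forward or backward, one element at a time.
template <class It, class Diff>
void advance_impl(It& it, Diff n, std::bidirectional_iterator_tag) {
    for (; n > 0; --n) ++it;
    for (; n < 0; ++n) --it;
}

// Random-access iterators: jump in constant time.
template <class It, class Diff>
void advance_impl(It& it, Diff n, std::random_access_iterator_tag) {
    it += n;
}

// The public entry point picks an overload from the iterator's category tag.
template <class It, class Diff>
void my_advance(It& it, Diff n) {
    typedef typename std::iterator_traits<It>::iterator_category category;
    advance_impl(it, n, category());
}

} // namespace sketch
```

The overload is selected at compile time: a std::list iterator is routed to the bidirectional version and a std::vector iterator to the random-access one, without the algorithm ever seeing the container type.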

Insert multiple values into vector

I have a std::vector<T> variable. I also have two variables of type T, the first of which represents the value in the vector after which I am to insert, while the second represents the value to insert.
So let's say I have this container: 1,2,1,1,2,2
And the two values are 2 and 3 with respect to their definitions above. Then I wish to write a function which will update the container to instead contain:
1,2,3,1,1,2,3,2,3
I am using c++98 and boost. What std or boost functions might I use to implement this function?
Iterating over the vector and using std::vector::insert is one way, but it gets messy when one realizes that you need to remember to hop over the value you just inserted.
This is what I would probably do:
vector<T> copy;
for (vector<T>::iterator i=original.begin(); i!=original.end(); ++i)
{
copy.push_back(*i);
if (*i == first)
copy.push_back(second);
}
original.swap(copy);
Put a call to reserve in there if you want. You know you need room for at least original.size() elements. You could also do an initial iteration over the vector (or use std::count) to determine the exact number of elements to reserve, but without testing, I don't know whether that would improve performance.
I propose a solution that works in place, using O(n) memory and O(2n) time, instead of the O(n^2) time of the solution proposed by Laethnes and the O(2n) memory of the solution proposed by Benjamin.
// First pass, count elements equal to first.
std::size_t elems = std::count(data.begin(), data.end(), first);
// Resize so we'll add without reallocating the elements.
data.resize(data.size() + elems);
vector<T>::reverse_iterator end = data.rbegin() + elems;
// Iterate from the end. Move elements from the end to the new end (and so elements to insert will have some place).
for(vector<T>::reverse_iterator new_end = data.rbegin(); end != data.rend() && elems > 0; ++new_end,++end)
{
// If the current element is the one we search, insert second first. (We iterate from the end).
if(*end == first)
{
*new_end = second;
++new_end;
--elems;
}
// Copy the data to the end.
*new_end = *end;
}
This algorithm may be buggy, but the idea is to copy each element only once by:
First, counting how many elements we'll need to insert.
Second, going through the data from the end and moving each element to the new end.
This is what I probably would do:
typedef ::std::vector<int> MyList;
typedef MyList::iterator MyListIter;
MyList data;
// ... fill data ...
const int searchValue = 2;
const int addValue = 3;
// Find first occurrence of searched value
MyListIter iter = ::std::find(data.begin(), data.end(), searchValue);
while(iter != data.end())
{
// We want to add our value after searched one
++iter;
// Insert value and return iterator pointing to the inserted position
// (original iterator is invalid now).
iter = data.insert(iter, addValue);
// This is needed only if we want to be sure that our value won't be used
// - for example if searchValue == addValue is true, code would create
// infinite loop.
++iter;
// Search for next value.
iter = ::std::find(iter, data.end(), searchValue);
}
But as you can see, I couldn't avoid the increment you mentioned. I don't think that is a bad thing, though: I would put this code into a separate function (probably in some kind of "core/utils" module) and, of course, implement it as a template, so I would write it only once. Worrying about the increment only once is IMHO acceptable. Very acceptable.
template <class ValueType>
void insertAfter(::std::vector<ValueType> &io_data,
const ValueType &i_searchValue,
const ValueType &i_insertAfterValue);
or even better (IMHO)
template <class ListType, class ValueType>
void insertAfter(ListType &io_data,
const ValueType &i_searchValue,
const ValueType &i_insertAfterValue);
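One possible body for the std::vector overload declared above, written as a C++98-compatible sketch. It swaps in the copy-into-a-second-vector technique instead of repeated insert calls, so that is an assumption about the implementation, not something this answer prescribes:

```cpp
#include <algorithm>
#include <vector>

// Sketch only: builds the result in a second vector and swaps it in.
template <class ValueType>
void insertAfter(std::vector<ValueType>& io_data,
                 const ValueType& i_searchValue,
                 const ValueType& i_insertAfterValue)
{
    std::vector<ValueType> result;
    // One allocation: original size plus one extra slot per match.
    result.reserve(io_data.size() +
                   std::count(io_data.begin(), io_data.end(), i_searchValue));
    for (typename std::vector<ValueType>::const_iterator it = io_data.begin();
         it != io_data.end(); ++it) {
        result.push_back(*it);
        if (*it == i_searchValue)
            result.push_back(i_insertAfterValue);
    }
    io_data.swap(result);
}
```

Because the source vector is only read until the final swap, inserted values are never re-examined, so the case searchValue == addValue cannot loop forever.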
EDIT:
Well, I would solve the problem in a slightly different way: first count the number of occurrences of the searched value (preferably stored in some kind of cache which can be kept and reused) so I could prepare the array beforehand (only one allocation), and then use memcpy to move the original values (for types like int only, of course) or memmove (if the vector's allocated size is already sufficient).
In place, O(1) additional memory and O(n) time (Live at Coliru):
template <typename T, typename A>
void do_thing(std::vector<T, A>& vec, T target, T inserted) {
using std::swap;
typedef typename std::vector<T, A>::size_type size_t;
const size_t occurrences = std::count(vec.begin(), vec.end(), target);
if (occurrences == 0) return;
const size_t original_size = vec.size();
vec.resize(original_size + occurrences, inserted);
for(size_t i = original_size - 1, end = i + occurrences; i > 0; --i, --end) {
if (vec[i] == target) {
--end;
}
swap(vec[i], vec[end]);
}
}

min n elements with expensive or deleted default constructor

Given an array v (some STL container, e.g. std::vector< double >) of generally unsorted data (say assert(std::is_same< decltype(v), V >::value);), with a comparison operator such as std::less defined over its elements, you need to create an array holding the n minimal elements (copies from v). The elements are not default constructible (or default construction is an expensive operation). How can this be done by means of the STL? A non-modifying sequence algorithm is required.
Originally seen as a way to solve using std::back_insert_iterator, but there is some confusion as explained further:
assert(!std::is_default_constructible< typename V::value_type >::value); // assume
template< class V >
V min_n_elements(typename V::const_iterator begin, typename V::const_iterator end, typename V::size_type const n)
{
assert(!(std::distance(begin, end) < n));
V result; // V result(n); not allowed
result.reserve(n);
std::partial_sort_copy(begin, end, std::back_inserter(result), /*What should be here? mb something X(result.capacity())?*/, std::less< typename V::value_type >());
return result;
}
I want to find a solution that is optimal in terms of time and memory (O(1) additional memory and time no worse than std::partial_sort_copy). In total, the algorithm should touch only the following memory: the v.size() elements of the non-modifiable source v as input, and n newly created elements, all copies of the n smallest elements of v, as output. That's all. I think these are realistic limits.
EDIT: reimplemented with heap:
template< class V >
V min_n_elements(typename V::const_iterator b, typename V::const_iterator e, typename V::size_type const n) {
assert(std::distance(b, e) >= n);
V res(b, b+n);
make_heap(res.begin(), res.end());
for (auto i=b+n; i<e; ++i) {
if (*i < res.front()) {
pop_heap(res.begin(), res.end());
res.back() = *i;
push_heap(res.begin(), res.end());
}
}
return res; // a plain return lets the compiler move or elide the copy
}
Unless you also need those elements sorted, it's probably easiest and fastest to use std::nth_element, then std::copy.
template <class InIter, class OutIter>
void min_n_elements(InIter b, InIter e, OutIter o,
typename std::iterator_traits<InIter>::difference_type n) {
InIter pos = b+n;
std::nth_element(b, pos, e);
std::copy(b, pos, o);
}
std::nth_element not only finds the given element, but guarantees that the elements less than it are to its "left", and those greater are to its "right".
This does side-step the real problem a bit though -- instead of actually creating the container for the results, it simply expects the user to create a container of the correct type, and then provide an iterator (e.g., a back_insert_iterator) to put the data in the right place. At the same time, I think this is really the correct thing to do -- the algorithm to find N minimum elements and the choice of container for the destination are separate.
If you really want to put the result in a specific container type anyway, that shouldn't be terribly difficult though:
template <class V>
V n_min_element(typename V::iterator b, typename V::iterator e, typename V::size_type n) {
typename V::iterator pos = b+n;
std::nth_element(b, pos, e);
V ret(b, pos);
return ret;
}
As they stand, these do modify the (order of elements in) the input, but given that you've said the input isn't sorted, I'm assuming their order doesn't matter, so that should be permissible. If you can't do that, the next possibility is probably to create a collection of pointers, and use a comparison function that compares based on the pointees, then do your nth_element on that, and finally copy the pointees to the new collection.
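The pointer-based fallback could look roughly like the sketch below. It leaves the source untouched and never default-constructs an element, at the price of O(N) extra pointers, so it does not meet the O(1) additional memory goal. min_n_copy is a hypothetical name, and the sketch is restricted to std::vector for simplicity:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the n smallest elements of src (in no
// particular order) without modifying src.
template <class T>
std::vector<T> min_n_copy(const std::vector<T>& src, std::size_t n) {
    assert(n <= src.size());
    // Collect pointers to the source elements.
    std::vector<const T*> ptrs;
    ptrs.reserve(src.size());
    for (const T& x : src)
        ptrs.push_back(&x);
    // Partition the pointers so that the first n point at the n smallest values.
    std::nth_element(ptrs.begin(), ptrs.begin() + n, ptrs.end(),
                     [](const T* a, const T* b) { return *a < *b; });
    // Copy-construct the results; no default construction is needed.
    std::vector<T> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        result.push_back(*ptrs[i]);
    return result;
}
```

Only the pointers are rearranged, so the order of the source container is preserved.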