I have a sorting algorithm on a vector, and I want to apply it to several vectors, without knowing how much. The only thing I'm sure is that there will be at least 1 vector (always the same) on which I will perform my algorithm. Other will just follow.
Here's an example :
void sort(std::vector<int>& sortVector, std::vector<double>& follow1, std::vector<char>& follow2, ... ){
for (int i = 1; i<vector.size(); ++i){
if ( vector[i-1] > vector[i] ) { //I know it's not sorting here, it's only for the example
std::swap(vector[i-1], vector[i]);
std::swap(follow1[i-1], follow1[i]);
std::swap(follow2[i-1], follow2[i]);
....
}
}
}
I was thinking about using variadic function, but since it's a recursive function, I was wondering if it won't take too much time to everytime create my va_arg list (I'm working on vector sized 500millions/1billions ...). So does something else exists?
As I'm writing this question, I'm understanding that maybe i'm fooling myself, and there is no other way to achieve what I want and variadic function is maybe not that long. (I really don't know, in fact).
EDIT :
In fact, I'm doing an Octree-sorting of datas in order to be usable in opengl.
Since my datas are not always the same (e.g OBJ files will gives me normals, PTS files will gives me Intensity and Colors, ...), I want to be able to reorder all my vectors (in which are contained my datas) so that they have the same order as the position vectors (The vector that contains the positions of my points, it'll be always here).
But all my vectors will have same length, and I want all my followervector to be reorganised as the first one.
If i have 3 Vectors, if I swap first and third values in my first vector, I want to swap first and thrid values in my 2 others vectors.
But my vectors are not all the same. Some will be std::vector<char>, other std::vector<Vec3>, std::vector<unsigned>, and so on.
With range-v3, you may use zip, something like:
template <typename T, typename ... Ranges>
void sort(std::vector<T>& refVector, Ranges&& ... ranges){
ranges::sort(ranges::view::zip(refVector, std::forward<Ranges>(ranges)...));
}
Demo
Or if you don't want to use ranges to compare (for ties in refVector), you can project to use only refVector:
template <typename T, typename ... Ranges>
void sort(std::vector<T>& refVector, Ranges&& ... ranges){
ranges::sort(ranges::view::zip(refVector, std::forward<Ranges>(ranges)...),
std::less<>{},
[](auto& tup) -> T& { return std::get<0>(tup); });
}
Although, I totally agree with the comment of n.m. I suggest to use a vector of vectors which contain the follow vectors and than do a loop over all follow vectors.
void sort(std::vector<int>& vector, std::vector<std::vector<double>>& followers){
for (int i = 1; i<vector.size(); ++i){
if ( vector[i-1] > vector[i] ) {
std::swap(vector[i-1], vector[i]);
for (auto & follow : followers)
std::swap(follow[i-1], follow[i]);
}
}
}
Nevertheless, as n.m. pointed out, perhaps think about putting all your data you like to sort in a class like structure. Than you can have a vector of your class and apply std::sort, see here.
struct MyStruct
{
int key; //content of your int vector named "vector"
double follow1;
std::string follow2;
// all your inforrmation of the follow vectors go here.
MyStruct(int k, const std::string& s) : key(k), stringValue(s) {}
};
struct less_than_key
{
inline bool operator() (const MyStruct& struct1, const MyStruct& struct2)
{
return (struct1.key < struct2.key);
}
};
std::vector < MyStruct > vec;
vec.push_back(MyStruct(4, 1.2, "test"));
vec.push_back(MyStruct(3, 2.8, "a"));
vec.push_back(MyStruct(2, 0.0, "is"));
vec.push_back(MyStruct(1, -10.5, "this"));
std::sort(vec.begin(), vec.end(), less_than_key());
The main problem here is that the std::sort algorithm cannot operate on multiple vectors at the same time.
For the purpose of demonstration, let's assume you have a std::vector<int> v1 and a std::vector<char> v2 (of the same size of course) and you want to sort both depending on the values in v1. To solve this, I basically see three possible solutions, all of which generalize to an arbitrary number of vectors:
1) Put all your data into a single vector.
Define a struct, say Data, that keeps an entry of every data vector.
struct Data
{
int d1;
char d2;
// extend here for more vectors
};
Now construct a new std::vector<Data> and fill it from your original vectors:
std::vector<Data> d(v1.size());
for(std::size_t i = 0; i < d.size(); ++i)
{
d[i].d1 = v1[i];
d[i].d2 = v2[i];
// extend here for more vectors
}
Since everything is stored inside a single vector now, you can use std::sort to bring it into order. Since we want it to be sorted based on the first entry (d1), which stores the values of the first vector, we use a custom predicate:
std::sort(d.begin(), d.end(),
[](const Data& l, const Data& r) { return l.d1 < r.d1; });
Afterwards, all data is sorted in d based on the first vector's values. You can now either work on with the combined vector d or you split the data into the original vectors:
std::transform(d.begin(), d.end(), v1.begin(),
[](const Data& e) { return e.d1; });
std::transform(d.begin(), d.end(), v2.begin(),
[](const Data& e) { return e.d2; });
// extend here for more vectors
2) Use the first vector to compute the indices of the sorted range and use these indices to bring all vectors into order:
First, you attach to all elements in your first vector their current position. Then you sort it using std::sort and a predicate that only compares for the value (ignoring the position).
template<typename T>
std::vector<std::size_t> computeSortIndices(const std::vector<T>& v)
{
std::vector<std::pair<T, std::size_t>> d(v.size());
for(std::size_t i = 0; i < v.size(); ++i)
d[i] = std::make_pair(v[i], i);
std::sort(d.begin(), d.end(),
[](const std::pair<T, std::size_t>& l,
const std::pair<T, std::size_t>& r)
{
return l.first < r.first;
});
std::vector<std::size_t> indices(v.size());
std::transform(d.begin(), d.end(), indices.begin(),
[](const std::pair<T, std::size_t>& p) { return p.second; });
return indices;
}
Say in the resulting index vector the entry at position 0 is 8, then this tells you that the vector entries that have to go to the first position in the sorted vectors are those at position 8 in the original ranges.
You then use this information to sort all of your vectors:
template<typename T>
void sortByIndices(std::vector<T>& v,
const std::vector<std::size_t>& indices)
{
assert(v.size() == indices.size());
std::vector<T> result(v.size());
for(std::size_t i = 0; i < indices.size(); ++i)
result[i] = v[indices[i]];
v = std::move(result);
}
Any number of vectors may then be sorted like this:
const auto indices = computeSortIndices(v1);
sortByIndices(v1, indices);
sortByIndices(v2, indices);
// extend here for more vectors
This can be improved a bit by extracting the sorted v1 out of computeSortIndices directly, so that you do not need to sort it again using sortByIndices.
3) Implement your own sort function that is able to operate on multiple vectors. I have sketched an implementation of an in-place merge sort that is able to sort any number of vectors depending on the values in the first one.
The core of the merge sort algorithm is implemented by the multiMergeSortRec function, which takes an arbitrary number (> 0) of vectors of arbitrary types.
The function splits all vectors into first and second half, sorts both halves recursively and merges the the results back together. Search the web for a full explanation of merge sort if you need more details.
template<typename T, typename... Ts>
void multiMergeSortRec(
std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
const std::size_t dist = e - b;
if(dist <= 1)
return;
std::size_t m = b + (dist / static_cast<std::size_t>(2));
// split in half and recursively sort both parts
multiMergeSortRec(b, m, v, vs...);
multiMergeSortRec(m, e, v, vs...);
// merge both sorted parts
while(b < m)
{
if(v[b] <= v[m])
++b;
else
{
++m;
rotateAll(b, m, v, vs...);
if(m == e)
break;
}
}
}
template<typename T, typename... Ts>
void multiMergeSort(std::vector<T>& v, std::vector<Ts>&... vs)
{
// TODO: check that all vectors have same length
if(v.size() < 2)
return ;
multiMergeSortRec<T, Ts...>(0, v.size(), v, vs...);
}
In order to operate in-place, parts of the vectors have to be rotated. This is done by the rotateAll function, which again works on an arbitrary number of vectors by recursively processing the variadic parameter pack.
void rotateAll(std::size_t, std::size_t)
{
}
template<typename T, typename... Ts>
void rotateAll(std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
std::rotate(v.begin() + b, v.begin() + e - 1, v.begin() + e);
rotateAll(b, e, vs...);
}
Note, that the recursive calls of rotateAll are very likely to be inlined by every optimizing compiler, such that the function merely applies std::rotate to all vectors. You can circumvent the need to rotate parts of the vector, if you leave in-place and merge into an additional vector. I like to emphasize that this is neither an optimized nor a fully tested implementation of merge sort. It should serve as a sketch, since you really do not want to use bubble sort whenever you work on large vectors.
Let's quickly compare the above alternatives:
1) is easier to implement, since it relies on an existing (highly optimized and tested) std::sort implementation.
1) needs all data to be copied into the new vector and possibly (depending on your use case) all of it to be copied back.
In 1) multiple places have to be extended if you need to attach additional vectors to be sorted.
The implementation effort for 2) is mediocre (more than 1, but less and easier than 3), but it relies on optimized and tested std::sort.
2) cannot sort in-place (using the indices) and thus has to make a copy of every vector. Maybe there is an in-place alternative, but I cannot think of one right now (at least an easy one).
2) is easy to extend for additional vectors.
For 3) you need to implement sorting yourself, which makes it more difficult to get right.
3) does not need to copy all data. The implementation can be further optimized and can be tweaked for improved performance (out-of-place) or reduced memory consumption (in-place).
3) can work on additional vectors without any change. Just invoke multiMergeSort with one or more additional arguments.
All three work for heterogeneous sets of vectors, in contrast to the std::vector<std::vector<>> approach.
Which of the alternatives performs better in your case, is hard to say and should greatly depend on the number of vectors and their size, so if you really need optimal performance (and/or memory usage) you need to measure.
Find an implementation of the above here.
By far the easiest solution is to create a helper vector std::vector<size_t> initialized with std::iota(helper.begin(), helper.end(), size_t{});.
Next, sort this array,. obviously not by the array index (iota already did that), but by sortvector[i]. IOW, the predicate is [sortvector&](size_t i, size_t j) { sortVector[i] < sortVector[j]; }.
You now have the proper order of array indices. I.e. if helper[0]==17, then it means that the new front of all vectors should be the original 18th element. Usually the easiest way to produce the sorted result is to copy over elements, and then swap the original vector and the copy, repeated for all vectors. But if copying all elements is too expensive, it can be done in-place. (Note that if O(N) element copes are too expensive, a straightforward std::sort tends to perform badly as well as it needs pivots)
Related
I have a std::vector<int> and I want to throw away the x first and y last elements. Just copying the elements is not an option, since this is O(n).
Is there something like vector.begin()+=x to let the vector just start later and end earlier?
I also tried
items = std::vector<int> (&items[x+1],&items[0]+items.size()-y);
where items is my vector, but this gave me bad_alloc
C++ standard algorithms work on ranges, not on actual containers, so you don't need to extract anything: you just need to adjust the iterator range you're working with.
void foo(const std::vector<T>& vec, const size_t start, const size_t end)
{
assert(vec.size() >= end-start);
auto it1 = vec.begin() + start;
auto it2 = vec.begin() + end;
std::whatever(it1, it2);
}
I don't see why it needs to be any more complicated than that.
(trivial live demo)
If you only need a range of values, you can represent that as a pair of iterators from first to last element of the range. These can be acquired in constant time.
Edit: According to the description in the comments, this seems like the most sensible solution. If your functions expect a vector reference, then you'll need to refactor a bit.
Other solutions:
If you don't need the original vector, and therefore can modify it, and the order of elements is not relevant, you can swap the first x elements with the n-x-y...n-y elements and then remove the last x+y elements. This can be done in O(x+y) time.
If appropriate, you could choose to use std::list for which what you're asking can be done in constant time if you have iterators to the first and last node of the sublist. This also requires that you can modify the original list but the order of elements won't change.
If those are not options, then you need to copy and are stuck with O(n).
The other answers are correct: usually iterators will do.
Nevertheless, you can also write a vector view. Here is a sketch:
template<typename T>
struct vector_view
{
vector_view(std::vector<T> const& v, size_t ind_begin, size_t ind_end)
: _v(v)
, _size(/* size of range */)
, _ind_begin(ind_begin) {}
auto size() const { return _size; }
auto const& operator[](size_t i) const
{
//possibly check for input outside range
return _v[ i + _ind_begin ];
}
//conversion of view to std::vector
operator std::vector<T>() const
{
std::vector<T> ret(_size);
//fill it
return ret;
}
private:
std::vector<T> const& _v;
size_t _size;
size_t _ind_begin;
}
Expose further methods as required (some iterator stuff might be appropriate when you want to use that with the standard library algorithms).
Further, take care on the validity of the const reference std::vector<T> const& v; -- if that could be an issue, one should better work with shared-pointers.
One can also think of more general approaches here, for example, use strides or similar things.
Suppose we have a vector of pairs:
std::vector<std::pair<A,B>> v;
where for type A only equality is defined:
bool operator==(A const & lhs, A const & rhs) { ... }
How would you sort it that all pairs with the same first element will end up close? To be clear, the output I hope to achieve should be the same as does something like this:
std::unordered_multimap<A,B> m(v.begin(),v.end());
std::copy(m.begin(),m.end(),v.begin());
However I would like, if possible, to:
Do the sorting in place.
Avoid the need to define a hash function for equality.
Edit: additional concrete information.
In my case the number of elements isn't particularly big (I expect N = 10~1000), though I have to repeat this sorting many times ( ~400) as part of a bigger algorithm, and the datatype known as A is pretty big (it contains among other things an unordered_map with ~20 std::pair<uint32_t,uint32_t> in it, which is the structure preventing me to invent an ordering, and making it hard to build a hash function)
First option: cluster() and sort_within()
The handwritten double loop by #MadScienceDreams can be written as a cluster() algorithm of O(N * K) complexity with N elements and K clusters. It repeatedly calls std::partition (using C++14 style with generic lambdas, easily adaptable to C++1, or even C++98 style by writing your own function objects):
template<class FwdIt, class Equal = std::equal_to<>>
void cluster(FwdIt first, FwdIt last, Equal eq = Equal{})
{
for (auto it = first; it != last; /* increment inside loop */)
it = std::partition(it, last, [=](auto const& elem){
return eq(elem, *it);
});
}
which you call on your input vector<std::pair> as
cluster(begin(v), end(v), [](auto const& L, auto const& R){
return L.first == R.first;
});
The next algorithm to write is sort_within which takes two predicates: an equality and a comparison function object, and repeatedly calls std::find_if_not to find the end of the current range, followed by std::sort to sort within that range:
template<class RndIt, class Equal = std::equal_to<>, class Compare = std::less<>>
void sort_within(RndIt first, RndIt last, Equal eq = Equal{}, Compare cmp = Compare{})
{
for (auto it = first; it != last; /* increment inside loop */) {
auto next = std::find_if_not(it, last, [=](auto const& elem){
return eq(elem, *it);
});
std::sort(it, next, cmp);
it = next;
}
}
On an already clustered input, you can call it as:
sort_within(begin(v), end(v),
[](auto const& L, auto const& R){ return L.first == R.first; },
[](auto const& L, auto const& R){ return L.second < R.second; }
);
Live Example that shows it for some real data using std::pair<int, int>.
Second option: user-defined comparison
Even if there is no operator< defined on A, you might define it yourself. Here, there are two broad options. First, if A is hashable, you can define
bool operator<(A const& L, A const& R)
{
return std::hash<A>()(L) < std::hash<A>()(R);
}
and write std::sort(begin(v), end(v)) directly. You will have O(N log N) calls to std::hash if you don't want to cache all the unique hash values in a separate storage.
Second, if A is not hashable, but does have data member getters x(), y() and z(), that uniquely determine equality on A: you can do
bool operator<(A const& L, A const& R)
{
return std::tie(L.x(), L.y(), L.z()) < std::tie(R.x(), R.y(), R.z());
}
Again you can write std::sort(begin(v), end(v)) directly.
if you can come up with a function that assigns to each unique element a unique number, then you can build secondary array with this unique numbers and then sort secondary array and with it primary for example by merge sort.
But in this case you need function that assigns to each unique element a unique number i.e. hash-function without collisions. I think this should not be a problem.
And asymptotic of this solution if hash-function have O(1), then building secondary array is O(N) and sorting it with primary is O(NlogN). And summary O(N + NlogN) = O(N logN).
And the bad side of this solution is that it requires double memory.
In conclusion the main sense of this solution is quickly translate your elements to elements which you can quickly compare.
An in place algorithm is
for (int i = 0; i < n-2; i++)
{
for (int j = i+2; j < n; j++)
{
if (v[j].first == v[i].first)
{
std::swap(v[j],v[i+1]);
i++;
}
}
There is probably a more elegant way to write the loop, but this is O(n*m), where n is the number of elements and m is the number of keys. So if m is much smaller than n (with a best case being that all the keys are the same), this can be approximated by O(n). Worst case, the number of key ~= n, so this is O(n^2). I have no idea what you expect for the number of keys, so I can't really do the average case, but it is most likely O(n^2) for the average case as well.
For a small number of keys, this may work faster than unordered multimap, but you'll have to measure to find out.
Note: the order of clusters is completely random.
Edit: (much more efficient in the partially-clustered case, doesn't change complexity)
for (int i = 0; i < n-2; i++)
{
for(;i<n-2 && v[i+1].first==v[i].first; i++){}
for (int j = i+2; j < n; j++)
{
if (v[j].first == v[i].first)
{
std::swap(v[j],v[i+1]);
i++;
}
}
Edit 2: At /u/MrPisarik's comment, removed redundant i check in inner loop.
I'm surprised no one has suggested the use of std::partition yet. It makes the solution nice, elegant, and generic:
template<typename BidirIt, typename BinaryPredicate>
void equivalence_partition(BidirIt first, BidirIt last, BinaryPredicate p) {
using element_type = typename std::decay<decltype(*first)>::type;
if(first == last) {
return;
}
auto new_first = std::partition
(first, last, [=](element_type const &rhs) { return p(*first, rhs); });
equivalence_partition(new_first, last, p);
}
template<typename BidirIt>
void equivalence_partition(BidirIt first, BidirIt last) {
using element_type = typename std::decay<decltype(*first)>::type;
equivalence_partition(first, last, std::equal_to<element_type>());
}
Example here.
I have code that creates several object instances (each instance having a fitness value, among other things) from which I want to sample N unique objects using weighted selection based on their fitness values. All objects not sampled are then discarded (but they need to be initially created to determine their fitness value).
my current code looks something like this:
vector<Item> getItems(..) {
std::vector<Item> items
.. // generate N values for items
int N = items.size();
std::vector<double> fitnessVals;
for(auto it = items.begin(); it != items.end(); ++it)
fitnessVals.push_back(it->getFitness());
std::mt19937& rng = getRng();
for(int i = 0, i < N, ++i) {
std::discrete_distribution<int> dist(fitnessVals.begin() + i, fitnessVals.end());
unsigned int pick = dist(rng);
std::swap(fitnessVals.at(i), fitnessVals.at(pick));
std::swap(items.at(i), items.at(pick));
}
items.erase(items.begin() + N, items.end());
return items;
}
Typically ~10,000 instances are initially created, with N being ~200. The fitness value is non-negative, usually valued at ~70. It could go as high as ~3000, but higher values are increasingly more unlikely.
Is there an elegant way to get rid of the fitnessVals vector? Or perhaps a better way to do this in general? Efficiency is important, but I'm also wondering about good C++ coding practices.
If you're asking whether you can do this just with the items in your items vector, the answer is yes. The following is a rather hideous but none-the-less effective way to do that: I apologize in advance for the density.
This wraps the unsuspecting container iterator in another iterator of our own devices; one that pairs it with a member function of your choice. You may have to dance with const in this to get it to work correctly with your member function choice. That task i leave to you.
template<typename Iter, typename R>
struct memfn_iterator_s :
public std::iterator<std::input_iterator_tag, R>
{
using value_type = typename std::iterator_traits<Iter>::value_type;
memfn_iterator_s(Iter it, R(value_type::*fn)())
: m_it(it), mem_fn(fn) {}
R operator*()
{
return ((*m_it).*mem_fn)();
}
bool operator ==(const memfn_iterator_s& arg) const
{
return m_it == arg.m_it;
}
bool operator !=(const memfn_iterator_s& arg) const
{
return m_it != arg.m_it;
}
memfn_iterator_s& operator ++() { ++m_it; return *this; }
private:
R (value_type::*mem_fn)();
Iter m_it;
};
A generator function follows to create the above monstrosity:
template<typename Iter, typename R>
memfn_iterator_s<Iter,R> memfn_iterator(
Iter it,
R (std::iterator_traits<Iter>::value_type::*fn)())
{
return memfn_iterator_s<Iter,R>(it, fn);
}
What this buys you is the ability to do this:
auto it_end = memfn_iterator(items.end(), &Item::getFitness);
for(unsigned int i = 0; i < N; ++i)
{
auto it_begin = memfn_iterator(items.begin()+i, &Item::getFitness);
std::discrete_distribution<unsigned int> dist(it_begin, it_end);
std::swap(items.at(i), items.at(i+dist(rng)));
}
items.erase(items.begin() + N, items.end());
No temporary array is required. The member function is called for the respective item when required by the discrete distribution (which usually keeps it own vector of weights, and as such replicating that effort would be redundant).
Dunno if you'll get anything helpful or useful out of that, but it was fun to think about.
It's pretty nice that they have a discrete distribution in STL. As far as I know, the most efficient algorithm for sampling from a set of weighted objects (i.e., with probability proportional to weights) is the alias method. There's a Java implementation here: http://www.keithschwarz.com/interesting/code/?dir=alias-method
I suspect that's what the STL discrete_distribution uses anyway. If you're going to be calling your getItems function frequently, you might want to create a "FitnessSet" class or something so that you don't have to build your distribution every time you want to sample from the same set.
EDIT: Another suggestion... If you want to be able to delete items, you could instead store your objects in a binary tree. Each node would contain the sum of the weights in the subtree beneath it, and the objects themselves could be in the leaves. You could select an object through a series of log(N) coin tosses: at a given node, choose a random number between 0 and node.subtreeweight. If it's less than node.left.subtreeweight, go left; otherwise go right. Continue recursively until you reach a leaf.
I would try something like the following (see code comments):
#include <algorithm> // For std::swap and std::transform
#include <functional> // For std::mem_fun_ref
#include <random> // For std::discrete_distribution
#include <vector> // For std::vector
size_t
get_items(std::vector<Item>& results, const std::vector<Item>& items)
{
// Copy the items to the results vector. All operations should be
// done on it, rather than the original items vector.
results.assign(items.begin(), items.end());
// Create the fitness values vector, immediately allocating
// the number of doubles required to match the size of the
// input item vector.
std::vector<double> fitness_vals(results.size());
// Use some STL "magic" ...
// This will iterate over the items vector, calling the
// getFitness() method on each item, and storing the result
// in the fitness_vals vector.
std::transform(results.begin(), results.end(),
fitness_vals.begin(),
std::mem_fun_ref(&Item::getFitness));
//
std::mt19937& rng = getRng();
for (size_t i=0; i < results.size(); ++i) {
std::discrete_distribution<int> dist(fitness_vals.begin() + i, fitness_vals.end());
unsigned int pick = dist(rng);
std::swap(fitness_vals[ii], fitness_vals[pick]);
std::swap(results[i], results[pick]);
}
return (results.size());
}
Instead of returning the results vector, the caller provides a vector into which the results should be added. Also, the original vector (passed as the second parameter) remains unchanged. If this is not something that concerns you, you can always pass just the one vector and work with it directly.
I don't see a way to not have the fitness values vector; the discrete_distribution constructor needs to have the begin and end iterators, so from what I can tell, you will need to have this vector.
The rest of it is basically the same, with the return value being the number of items in the result vector, rather than the vector itself.
This example makes use of a number of STL features (algorithms, containers, functors) which I have found to be useful and part of my day-to-day development.
Edit: the call to items.erase() is superfluous; items.begin() + N where N == items.size() is equivalent to items.end(). The call to items.erase() would equate to a no-op.
Given an array v (some STL container, e.g. std::vector< double >) of generally unsorted data (say assert(std::is_same< typeof(v), V >::value);). Over the elements of the array is defined comparison operator, say std::less. You need to create an array with n minimal elements (copies form v), but the elements are not default constructible (or is expensive operation). How to do it by means of STL? Non-modifying sequence algorithm is required.
Originally seen as a way to solve using std::back_insert_iterator, but there is some confusion as explained further:
assert(!std::is_default_constructible< typename V::value_type >::value); // assume
template< class V >
V min_n_elements(typename V::const_iterator begin, typename V::const_iterator end, typename V::size_type const n)
{
assert(!(std::distance(begin, end) < n));
V result; // V result(n); not allowed
result.reserve(n);
std::partial_sort_copy(begin, end, std::back_inserter(result), /*What should be here? mb something X(result.capacity())?*/, std::less< typename V::value_type >());
return result;
}
I want to find solution that is optimal in terms of time and memory (O(1) additional memory and <= O(std::partial_sort_copy) time consumption). Totally algorithm should operate on the following number of memory: v.size() elements of non-modifiable source v as input and n of newly created elements, all of which are copies of the n smallest elements of source array v, as output. That's all. I think this is a realistic limits.
EDIT: reimplemented with heap:
template< class V >
V min_n_elements(typename V::const_iterator b, typename V::const_iterator e, typename V::size_type const n) {
assert(std::distance(b, e) >= n);
V res(b, b+n);
make_heap(res.begin(), res.end());
for (auto i=b+n; i<e; ++i) {
if (*i < res.front()) {
pop_heap(res.begin(), res.end());
res.back() = *i;
push_heap(res.begin(), res.end());
}
}
return std::move(res);
}
Unless you also need those elements sorted, it's probably easiest and fastest to use std::nth_element, then std::copy.
template <class InIter, class OutIter>
min_n_elements(InIter b, InIter e, OutIter o, InIter::difference_type n) {
InIter pos = b+n;
std::nth_element(b, pos, e);
std:copy(b, pos, o);
}
std::nth_element not only finds the given element, but guarantees that those elements less than that are two it's "left", and those greater are to its "right".
This does side-step the real problem a bit though -- instead of actually creating the container for the results, it simply expects the user to create a container of the correct type, and then provide an iterator (e.g., a back_insert_iterator) to put the data in the right place. At the same time, I think this is really the correct thing to do -- the algorithm to find N minimum elements and the choice of container for the destination are separate.
If you really want to put the result in a specific container type anyway, that shouldn't be terribly difficult though:
template <class V>
V n_min_element(V::iterator b, V::iterator e) {
V::const_iterator pos = b+n;
nth_element(b, pos, e);
V ret(b, pos);
return V;
}
As they stand, these do modify the (order of elements in) the input, but given that you've said the input isn't sorted, I'm assuming their order doesn't matter, so that should be permissible. If you can't do that, the next possibility is probably to create a collection of pointers, and use a comparison function that compares based on the pointees, then do your nth_element on that, and finally copy the pointees to the new collection.
I have a Visual Studio 2008 C++03 application where I have two standard containers. I would like to remove from one container all of the items that are present in the other container (the intersection of the sets).
something like this:
std::vector< int > items = /* 1, 2, 3, 4, 5, 6, 7 */;
std::set< int > items_to_remove = /* 2, 4, 5*/;
std::some_algorithm( items.begin, items.end(), items_to_remove.begin(), items_to_remove.end() );
assert( items == /* 1, 3, 6, 7 */ )
Is there an existing algorithm or pattern that will do this or do I need to roll my own?
Thanks
Try with:
items.erase(
std::remove_if(
items.begin(), items.end()
, std::bind1st(
std::mem_fun( &std::set< int >::count )
, items_to_remove
)
)
, items.end()
);
std::remove(_if) doesn't actually remove anything, since it works with iterators and not containers. What it does is reorder the elements to be removed at the end of the range, and returns an iterator to the new end of the container. You then call erase to actually remove from the container all of the elements past the new end.
Update: If I recall correctly, binding to a member function of a component of the standard library is not standard C++, as implementations are allowed to add default parameters to the function. You'd be safer by creating your own function or function-object predicate that checks whether the element is contained in the set of items to remove.
Personally, I prefer to create small helpers for this (that I reuse heavily).
template <typename Container>
class InPredicate {
public:
InPredicate(Container const& c): _c(c) {}
template <typename U>
bool operator()(U const& u) {
return std::find(_c.begin(), _c.end(), u) != _c.end();
}
private:
Container const& _c;
};
// Typical builder for automatic type deduction
template <typename Container>
InPredicate<Container> in(Container const& c) {
return InPredicate<Container>(c);
}
This also helps to have a true erase_if algorithm
template <typename Container, typename Predicate>
void erase_if(Container& c, Predicate p) {
c.erase(std::remove_if(c.begin(), c.end(), p), c.end());
}
And then:
erase_if(items, in(items_to_remove));
which is pretty readable :)
One more solution:
There is standard provided algorithm set_difference which can be used for this.
But it requires extra container to hold the result. I personally prefer to do it in-place.
std::vector< int > items;
//say items = [1,2,3,4,5,6,7,8,9]
std::set<int>items_to_remove;
//say items_to_remove = <2,4,5>
std::vector<int>result(items.size()); //as this algorithm uses output
//iterator not inserter iterator for result.
std::vector<int>::iterator new_end = std::set_difference(items.begin(),
items.end(),items_to_remove.begin(),items_to_remove.end(),result.begin());
result.erase(new_end,result.end()); // to erase unwanted elements at the
// end.
You can use std::erase in combination with std::remove for this. There is a C++ idiom called the Erase - Remove idiom, which is going to help you accomplish this.
Assuming you have two sets, A and B, and you want to remove from B, the intersection, I, of (A,B) such that I = A^B, your final results will be:
A (left intact)
B' = B-I
Full theory:
http://math.comsci.us/sets/difference.html
This is quite simple.
Create and populate A and B
Create a third intermediate vector, I
Copy the contents of B into I
For each element a_j of A, which contains j elements, search I for the element a_j; If the element is found in I, remove it
Finally, the code to remove an individual element can be found here:
How do I remove an item from a stl vector with a certain value?
And the code to search for an item is here:
How to find if an item is present in a std::vector?
Good luck!
Here's a more "hands-on" in-place method that doesn't require fancy functions nor do the vectors need to be sorted:
#include <vector>
template <class TYPE>
void remove_intersection(std::vector<TYPE> &items, const std::vector<TYPE> &items_to_remove)
{
for (int i = 0; i < (int)items_to_remove.size(); i++) {
for (int j = 0; j < (int)items.size(); j++) {
if (items_to_remove[i] == items[j]) {
items.erase(items.begin() + j);
j--;//Roll back the iterator to prevent skipping over
}
}
}
}
If you know that the multiplicity in each set is 1 (not a multiset), then you can actually replace the j--; line with a break; for better performance.