Suppose we have a vector of pairs:
std::vector<std::pair<A,B>> v;
where for type A only equality is defined:
bool operator==(A const & lhs, A const & rhs) { ... }
How would you sort it so that all pairs with the same first element end up next to each other? To be clear, the output I hope to achieve should be the same as what something like this produces:
std::unordered_multimap<A,B> m(v.begin(),v.end());
std::copy(m.begin(),m.end(),v.begin());
However I would like, if possible, to:
Do the sorting in place.
Avoid the need to define a hash function on top of equality.
Edit: additional concrete information.
In my case the number of elements isn't particularly big (I expect N = 10~1000), but I have to repeat this sorting many times (~400) as part of a bigger algorithm, and the datatype known as A is pretty big (among other things, it contains an unordered_map with ~20 std::pair<uint32_t,uint32_t> in it, which is the structure preventing me from defining an ordering, and which makes it hard to build a hash function).
First option: cluster() and sort_within()
The handwritten double loop by @MadScienceDreams can be written as a cluster() algorithm of O(N * K) complexity with N elements and K clusters. It repeatedly calls std::partition (shown here in C++14 style with generic lambdas; easily adaptable to C++11, or even C++98, by writing your own function objects):
template<class FwdIt, class Equal = std::equal_to<>>
void cluster(FwdIt first, FwdIt last, Equal eq = Equal{})
{
for (auto it = first; it != last; /* increment inside loop */)
it = std::partition(it, last, [=](auto const& elem){
return eq(elem, *it);
});
}
which you call on your input vector<std::pair> as
cluster(begin(v), end(v), [](auto const& L, auto const& R){
return L.first == R.first;
});
The next algorithm to write is sort_within, which takes two predicates: an equality and a comparison function object. It repeatedly calls std::find_if_not to find the end of the current range, followed by std::sort to sort within that range:
template<class RndIt, class Equal = std::equal_to<>, class Compare = std::less<>>
void sort_within(RndIt first, RndIt last, Equal eq = Equal{}, Compare cmp = Compare{})
{
for (auto it = first; it != last; /* increment inside loop */) {
auto next = std::find_if_not(it, last, [=](auto const& elem){
return eq(elem, *it);
});
std::sort(it, next, cmp);
it = next;
}
}
On an already clustered input, you can call it as:
sort_within(begin(v), end(v),
[](auto const& L, auto const& R){ return L.first == R.first; },
[](auto const& L, auto const& R){ return L.second < R.second; }
);
Live Example that shows it for some real data using std::pair<int, int>.
Second option: user-defined comparison
Even if there is no operator< defined on A, you might define it yourself. Here, there are two broad options. First, if A is hashable, you can define
bool operator<(A const& L, A const& R)
{
return std::hash<A>()(L) < std::hash<A>()(R);
}
and write std::sort(begin(v), end(v)) directly. You will have O(N log N) calls to std::hash if you don't want to cache all the unique hash values in separate storage. (Beware also that two unequal As whose hashes collide would have their clusters interleaved.)
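If the hash calls dominate, one way to cache them is the decorate-sort-undecorate pattern. A minimal sketch (not in-place, and the helper name is mine; it assumes std::hash<A> exists):
#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hash each A once, sort (hash, index) pairs, then gather the pairs in that order.
template<class A, class B>
void sort_by_cached_hash(std::vector<std::pair<A, B>>& v)
{
    std::vector<std::pair<std::size_t, std::size_t>> keys(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        keys[i] = { std::hash<A>()(v[i].first), i };  // N hash calls total
    std::sort(keys.begin(), keys.end());              // N log N size_t comparisons
    std::vector<std::pair<A, B>> sorted;
    sorted.reserve(v.size());
    for (auto const& k : keys)
        sorted.push_back(std::move(v[k.second]));
    v = std::move(sorted);
}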
Second, if A is not hashable but does have data member getters x(), y() and z() that uniquely determine equality on A, you can do (use std::make_tuple instead of std::tie if the getters return by value rather than by reference):
bool operator<(A const& L, A const& R)
{
return std::tie(L.x(), L.y(), L.z()) < std::tie(R.x(), R.y(), R.z());
}
Again you can write std::sort(begin(v), end(v)) directly.
If you can come up with a function that assigns each unique element a unique number, then you can build a secondary array with these unique numbers and then sort the secondary array, carrying the primary array along with it, for example with a merge sort.
In that case you need a function that assigns each unique element a unique number, i.e. a hash function without collisions. I think this should not be a problem.
As for the asymptotics: if the hash function is O(1), building the secondary array is O(N) and sorting it together with the primary is O(N log N). In total O(N + N log N) = O(N log N).
The downside of this solution is that it requires double the memory.
In conclusion, the main idea of this solution is to quickly translate your elements into elements that you can quickly compare.
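A minimal sketch of that idea, assuming you can supply a collision-free key function (the name key is hypothetical). It sorts an index array by the precomputed keys and then permutes the primary array, at the cost of the extra memory mentioned above:
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

template<class T, class KeyFn> // KeyFn: T -> std::size_t, assumed collision-free
void sort_by_key(std::vector<T>& v, KeyFn key)
{
    std::vector<std::size_t> keys(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        keys[i] = key(v[i]);                       // O(N) key evaluations
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),              // O(N log N) integer comparisons
              [&](std::size_t a, std::size_t b) { return keys[a] < keys[b]; });
    std::vector<T> out;                            // this is the "double memory"
    out.reserve(v.size());
    for (std::size_t i : idx)
        out.push_back(std::move(v[i]));
    v = std::move(out);
}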
An in-place algorithm is:
for (int i = 0; i < n-2; i++)
{
    for (int j = i+2; j < n; j++)
    {
        if (v[j].first == v[i].first)
        {
            std::swap(v[j], v[i+1]);
            i++;
        }
    }
}
There is probably a more elegant way to write the loop, but this is O(n*m), where n is the number of elements and m is the number of keys. So if m is much smaller than n (with the best case being that all the keys are the same), this can be approximated by O(n). Worst case, the number of keys ~= n, so this is O(n^2). I have no idea what you expect for the number of keys, so I can't really give the average case, but it is most likely O(n^2) for the average case as well.
For a small number of keys, this may work faster than unordered multimap, but you'll have to measure to find out.
Note: the order of clusters is completely random.
Edit: (much more efficient in the partially-clustered case, doesn't change complexity)
for (int i = 0; i < n-2; i++)
{
    // skip ahead over elements that are already clustered
    for (; i < n-2 && v[i+1].first == v[i].first; i++) {}
    for (int j = i+2; j < n; j++)
    {
        if (v[j].first == v[i].first)
        {
            std::swap(v[j], v[i+1]);
            i++;
        }
    }
}
Edit 2: Per @MrPisarik's comment, removed the redundant i check in the inner loop.
I'm surprised no one has suggested the use of std::partition yet. It makes the solution nice, elegant, and generic:
template<typename BidirIt, typename BinaryPredicate>
void equivalence_partition(BidirIt first, BidirIt last, BinaryPredicate p) {
using element_type = typename std::decay<decltype(*first)>::type;
if(first == last) {
return;
}
auto new_first = std::partition
(first, last, [=](element_type const &rhs) { return p(*first, rhs); });
equivalence_partition(new_first, last, p);
}
template<typename BidirIt>
void equivalence_partition(BidirIt first, BidirIt last) {
using element_type = typename std::decay<decltype(*first)>::type;
equivalence_partition(first, last, std::equal_to<element_type>());
}
Example here.
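For the question's vector of pairs, a call might look like this (a hypothetical usage, grouping on .first with a generic lambda):
equivalence_partition(v.begin(), v.end(),
    [](auto const& l, auto const& r) { return l.first == r.first; });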
Related
I have a vector of objects and want to delete by value. However the value only occurs once if at all, and I don't care about sorting.
Obviously, if such delete-by-values were extremely common, and/or the data set quite big, a vector wouldn't be the best data structure. But let's say I've determined that not to be the case.
To be clear, if my code were C, I'd be happy with the following:
void delete_by_value( int* const piArray, int& n, int iValue ) {
for ( int i = 0; i < n; i++ ) {
if ( piArray[ i ] == iValue ) {
piArray[ i ] = piArray[ --n ];
return;
}
}
}
It seems that the "modern idiom" approach using std::algos and container methods would be:
v.erase(std::remove(v.begin(), v.end(), iValue), v.end());
But that should be far slower, since for a random existing element it's n/2 moves and n compares; my version is 1 move and n/2 compares.
Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom? And if not why not?
Use std::find to replace the loop. Take the replacement value from the predecessor of the end iterator, and also use that iterator to erase that element. As this iterator points to the last element, the erase is cheap. Bonus: a bool return for success checking, and templating instead of hard-coding int.
template<typename T>
bool delete_by_value(std::vector<T> &v, T const &del) {
auto final = v.end();
auto found = std::find(v.begin(), final, del);
if(found == final) return false;
*found = *--final;
v.erase(final);
return true;
}
Surely there's a better way to do this in "the modern idiom" than erase-remove-idiom?
There isn't a ready-made function for every niche use case in the standard library. Unstable remove is one of the functions that it does not provide. It was proposed a while back, though (P0041R0). Likewise, there are no special versions of algorithms for the special case of vectors that do not contain duplicates.
So, you'll need to implement the algorithm yourself if you wish to use an optimal algorithm. There is std::find for linear search. After that, you only need to assign from last element and finally pop it off.
Most implementations of std::vector::resize will not reallocate if you make the size of the vector smaller. So, the following will probably have similar performance to the C example.
void find_and_delete(std::vector<int>& v, int value) {
auto it = std::find(v.begin(), v.end(), value);
if (it != v.end()) {
*it = v.back();
v.resize(v.size() - 1);
}
}
C++ way would be mostly identical with std::vector:
template <typename T>
void delete_by_value(std::vector<T>& v, const T& value) {
auto it = std::find(v.begin(), v.end(), value);
if (it != v.end()) {
*it = std::move(v.back());
v.pop_back();
}
}
I have a sorting algorithm for a vector, and I want to apply it to several vectors at once, without knowing in advance how many. The only thing I'm sure of is that there will be at least one vector (always the same one) on which I will perform my algorithm. The others just follow it.
Here's an example :
void sort(std::vector<int>& sortVector, std::vector<double>& follow1, std::vector<char>& follow2, ... ){
    for (std::size_t i = 1; i < sortVector.size(); ++i){
        if ( sortVector[i-1] > sortVector[i] ) { //I know it's not sorting here, it's only for the example
            std::swap(sortVector[i-1], sortVector[i]);
            std::swap(follow1[i-1], follow1[i]);
            std::swap(follow2[i-1], follow2[i]);
            ....
        }
    }
}
I was thinking about using a variadic function, but since the algorithm is recursive, I was wondering whether building the va_arg list every time wouldn't take too long (I'm working on vectors sized 500 million to 1 billion elements...). So does something else exist?
As I'm writing this question, I realize that maybe I'm fooling myself, that there is no other way to achieve what I want, and that a variadic function is maybe not that slow. (I really don't know, in fact.)
EDIT :
In fact, I'm doing an octree sort of my data so that it is usable in OpenGL.
Since my data is not always the same (e.g. OBJ files will give me normals, PTS files will give me intensity and colors, ...), I want to be able to reorder all my vectors (which contain my data) so that they have the same order as the position vector (the vector that contains the positions of my points; it will always be there).
All my vectors have the same length, and I want every follower vector to be reorganised the same way as the first one.
If I have 3 vectors and I swap the first and third values in my first vector, I want to swap the first and third values in my 2 other vectors.
But my vectors are not all of the same type. Some will be std::vector<char>, others std::vector<Vec3>, std::vector<unsigned>, and so on.
With range-v3, you may use zip, something like:
template <typename T, typename ... Ranges>
void sort(std::vector<T>& refVector, Ranges&& ... ranges){
ranges::sort(ranges::view::zip(refVector, std::forward<Ranges>(ranges)...));
}
Demo
Or, if you don't want the other ranges to take part in the comparison (for ties in refVector), you can use a projection so that only refVector is compared:
template <typename T, typename ... Ranges>
void sort(std::vector<T>& refVector, Ranges&& ... ranges){
ranges::sort(ranges::view::zip(refVector, std::forward<Ranges>(ranges)...),
std::less<>{},
[](auto& tup) -> T& { return std::get<0>(tup); });
}
Although I totally agree with the comment of n.m., I suggest using a vector of vectors to hold the follow vectors and then looping over all follow vectors:
void sort(std::vector<int>& vector, std::vector<std::vector<double>>& followers){
for (int i = 1; i<vector.size(); ++i){
if ( vector[i-1] > vector[i] ) {
std::swap(vector[i-1], vector[i]);
for (auto & follow : followers)
std::swap(follow[i-1], follow[i]);
}
}
}
Nevertheless, as n.m. pointed out, perhaps think about putting all the data you like to sort into a class-like structure. Then you can have a vector of your class and apply std::sort, see here.
struct MyStruct
{
    int key; //content of your int vector named "vector"
    double follow1;
    std::string follow2;
    // all your information of the follow vectors goes here.
    MyStruct(int k, double f1, const std::string& f2)
        : key(k), follow1(f1), follow2(f2) {}
};
struct less_than_key
{
inline bool operator() (const MyStruct& struct1, const MyStruct& struct2)
{
return (struct1.key < struct2.key);
}
};
std::vector < MyStruct > vec;
vec.push_back(MyStruct(4, 1.2, "test"));
vec.push_back(MyStruct(3, 2.8, "a"));
vec.push_back(MyStruct(2, 0.0, "is"));
vec.push_back(MyStruct(1, -10.5, "this"));
std::sort(vec.begin(), vec.end(), less_than_key());
The main problem here is that the std::sort algorithm cannot operate on multiple vectors at the same time.
For the purpose of demonstration, let's assume you have a std::vector<int> v1 and a std::vector<char> v2 (of the same size of course) and you want to sort both depending on the values in v1. To solve this, I basically see three possible solutions, all of which generalize to an arbitrary number of vectors:
1) Put all your data into a single vector.
Define a struct, say Data, that keeps an entry of every data vector.
struct Data
{
int d1;
char d2;
// extend here for more vectors
};
Now construct a new std::vector<Data> and fill it from your original vectors:
std::vector<Data> d(v1.size());
for(std::size_t i = 0; i < d.size(); ++i)
{
d[i].d1 = v1[i];
d[i].d2 = v2[i];
// extend here for more vectors
}
Since everything is stored inside a single vector now, you can use std::sort to bring it into order. Since we want it to be sorted based on the first entry (d1), which stores the values of the first vector, we use a custom predicate:
std::sort(d.begin(), d.end(),
[](const Data& l, const Data& r) { return l.d1 < r.d1; });
Afterwards, all data is sorted in d based on the first vector's values. You can now either work on with the combined vector d or you split the data into the original vectors:
std::transform(d.begin(), d.end(), v1.begin(),
[](const Data& e) { return e.d1; });
std::transform(d.begin(), d.end(), v2.begin(),
[](const Data& e) { return e.d2; });
// extend here for more vectors
2) Use the first vector to compute the indices of the sorted range and use these indices to bring all vectors into order:
First, you attach to all elements in your first vector their current position. Then you sort it using std::sort and a predicate that only compares for the value (ignoring the position).
template<typename T>
std::vector<std::size_t> computeSortIndices(const std::vector<T>& v)
{
std::vector<std::pair<T, std::size_t>> d(v.size());
for(std::size_t i = 0; i < v.size(); ++i)
d[i] = std::make_pair(v[i], i);
std::sort(d.begin(), d.end(),
[](const std::pair<T, std::size_t>& l,
const std::pair<T, std::size_t>& r)
{
return l.first < r.first;
});
std::vector<std::size_t> indices(v.size());
std::transform(d.begin(), d.end(), indices.begin(),
[](const std::pair<T, std::size_t>& p) { return p.second; });
return indices;
}
Say in the resulting index vector the entry at position 0 is 8, then this tells you that the vector entries that have to go to the first position in the sorted vectors are those at position 8 in the original ranges.
You then use this information to sort all of your vectors:
template<typename T>
void sortByIndices(std::vector<T>& v,
const std::vector<std::size_t>& indices)
{
assert(v.size() == indices.size());
std::vector<T> result(v.size());
for(std::size_t i = 0; i < indices.size(); ++i)
result[i] = v[indices[i]];
v = std::move(result);
}
Any number of vectors may then be sorted like this:
const auto indices = computeSortIndices(v1);
sortByIndices(v1, indices);
sortByIndices(v2, indices);
// extend here for more vectors
This can be improved a bit by extracting the sorted v1 out of computeSortIndices directly, so that you do not need to sort it again using sortByIndices.
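That improvement might look like this (a sketch; the name sortAndComputeIndices is mine): sort the (value, index) pairs once, then split them back into the now-sorted v1 and the index vector, so v1 no longer needs a sortByIndices pass.
template<typename T>
std::vector<std::size_t> sortAndComputeIndices(std::vector<T>& v)
{
    std::vector<std::pair<T, std::size_t>> d(v.size());
    for(std::size_t i = 0; i < v.size(); ++i)
        d[i] = std::make_pair(v[i], i);
    std::sort(d.begin(), d.end(),
        [](const std::pair<T, std::size_t>& l,
           const std::pair<T, std::size_t>& r)
        {
            return l.first < r.first;
        });
    std::vector<std::size_t> indices(v.size());
    for(std::size_t i = 0; i < v.size(); ++i)
    {
        v[i] = d[i].first;        // v ends up sorted as a side effect
        indices[i] = d[i].second;
    }
    return indices;
}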
3) Implement your own sort function that is able to operate on multiple vectors. I have sketched an implementation of an in-place merge sort that is able to sort any number of vectors depending on the values in the first one.
The core of the merge sort algorithm is implemented by the multiMergeSortRec function, which takes an arbitrary number (> 0) of vectors of arbitrary types.
The function splits all vectors into a first and a second half, sorts both halves recursively, and merges the results back together. Search the web for a full explanation of merge sort if you need more details.
template<typename T, typename... Ts>
void multiMergeSortRec(
std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
const std::size_t dist = e - b;
if(dist <= 1)
return;
std::size_t m = b + (dist / static_cast<std::size_t>(2));
// split in half and recursively sort both parts
multiMergeSortRec(b, m, v, vs...);
multiMergeSortRec(m, e, v, vs...);
// merge both sorted parts
while(b < m)
{
if(v[b] <= v[m])
++b;
else
{
++m;
rotateAll(b, m, v, vs...);
if(m == e)
break;
}
}
}
template<typename T, typename... Ts>
void multiMergeSort(std::vector<T>& v, std::vector<Ts>&... vs)
{
// TODO: check that all vectors have same length
if(v.size() < 2)
return ;
multiMergeSortRec<T, Ts...>(0, v.size(), v, vs...);
}
In order to operate in-place, parts of the vectors have to be rotated. This is done by the rotateAll function, which again works on an arbitrary number of vectors by recursively processing the variadic parameter pack.
void rotateAll(std::size_t, std::size_t)
{
}
template<typename T, typename... Ts>
void rotateAll(std::size_t b, std::size_t e,
std::vector<T>& v, std::vector<Ts>&... vs)
{
std::rotate(v.begin() + b, v.begin() + e - 1, v.begin() + e);
rotateAll(b, e, vs...);
}
Note that the recursive calls of rotateAll are very likely to be inlined by every optimizing compiler, such that the function merely applies std::rotate to all vectors. You can circumvent the need to rotate parts of the vectors if you give up on in-place operation and merge into an additional vector. I'd like to emphasize that this is neither an optimized nor a fully tested implementation of merge sort. It should serve as a sketch, since you really do not want to use bubble sort whenever you work on large vectors.
Let's quickly compare the above alternatives:
1) is easier to implement, since it relies on an existing (highly optimized and tested) std::sort implementation.
1) needs all data to be copied into the new vector and possibly (depending on your use case) all of it to be copied back.
In 1) multiple places have to be extended if you need to attach additional vectors to be sorted.
The implementation effort for 2) is moderate (more than for 1, but less and easier than for 3), and it relies on the optimized and tested std::sort.
2) cannot sort in-place (using the indices) and thus has to make a copy of every vector. There is an in-place alternative, though not an entirely easy one (a cycle-walking sketch follows after this comparison).
2) is easy to extend for additional vectors.
For 3) you need to implement sorting yourself, which makes it more difficult to get right.
3) does not need to copy all data. The implementation can be further optimized and can be tweaked for improved performance (out-of-place) or reduced memory consumption (in-place).
3) can work on additional vectors without any change. Just invoke multiMergeSort with one or more additional arguments.
All three work for heterogeneous sets of vectors, in contrast to the std::vector<std::vector<>> approach.
Which of the alternatives performs better in your case, is hard to say and should greatly depend on the number of vectors and their size, so if you really need optimal performance (and/or memory usage) you need to measure.
Find an implementation of the above here.
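As for the in-place alternative to sortByIndices mentioned in the comparison: you can apply the index permutation in place by walking its cycles, consuming a copy of the index vector as a "visited" marker. A sketch (the function name is mine; indices[i] is the source position of the element that belongs at i, exactly as produced by computeSortIndices):
#include <cstddef>
#include <utility>
#include <vector>

template<typename T>
void sortByIndicesInPlace(std::vector<T>& v, std::vector<std::size_t> indices)
{
    for(std::size_t i = 0; i < indices.size(); ++i)
    {
        if(indices[i] == i)
            continue;
        T tmp = std::move(v[i]);
        std::size_t j = i;
        while(indices[j] != i)          // follow the cycle that starts at i
        {
            v[j] = std::move(v[indices[j]]);
            std::size_t next = indices[j];
            indices[j] = j;             // mark position j as done
            j = next;
        }
        v[j] = std::move(tmp);
        indices[j] = j;
    }
}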
By far the easiest solution is to create a helper vector std::vector<size_t> initialized with std::iota(helper.begin(), helper.end(), size_t{});.
Next, sort this array, obviously not by the array index (iota already did that), but by sortVector[i]. IOW, the predicate is [&sortVector](size_t i, size_t j) { return sortVector[i] < sortVector[j]; }.
You now have the proper order of array indices. I.e. if helper[0]==17, it means that the new front of all vectors should be the original 18th element. Usually the easiest way to produce the sorted result is to copy over elements, and then swap the original vector and the copy, repeated for all vectors. But if copying all elements is too expensive, it can be done in-place. (Note that if O(N) element copies are too expensive, a straightforward std::sort tends to perform badly as well, since it also moves elements around.)
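Spelled out, the recipe might look like this (a sketch; it assumes all vectors have the same length, and shows one follower vector):
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Reorder follow1 by the values in sortVector.
void sortFollowers(std::vector<int>& sortVector, std::vector<double>& follow1)
{
    std::vector<std::size_t> helper(sortVector.size());
    std::iota(helper.begin(), helper.end(), std::size_t{0});
    std::sort(helper.begin(), helper.end(),
              [&sortVector](std::size_t i, std::size_t j)
              { return sortVector[i] < sortVector[j]; });
    // Gather step: copy elements over in the new order, then swap.
    std::vector<double> copy(follow1.size());
    for (std::size_t k = 0; k < helper.size(); ++k)
        copy[k] = follow1[helper[k]];
    follow1.swap(copy);
    // Repeat the gather for every other follower vector, and for sortVector itself.
}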
I need advice on micro-optimization in C++ for a vector comparison function.
It compares two vectors for equality, where the order of elements does not matter.
template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
int n = a.size();
std::vector<bool> free(n, true);
for (int i = 0; i < n; i++) {
bool matchFound = false;
for (int j = 0; j < n; j++) {
if (free[j] && a[i] == b[j]) {
matchFound = true;
free[j] = false;
break;
}
}
if (!matchFound) return false;
}
return true;
}
This function is used heavily, and I am thinking of possible ways to optimize it.
Can you please give me some suggestions? By the way I use C++11.
Thanks
I just realized that this code only does a kind of "set equivalence" check (and now I see that you actually did say that, what a lousy reader I am!). This can be achieved much more simply:
template <class T>
static bool compareVectors(vector<T> a, vector<T> b)
{
std::sort(a.begin(), a.end());
std::sort(b.begin(), b.end());
return (a == b);
}
You'll need to include the header <algorithm>.
If your vectors are always of same size, you may want to add an assertion at the beginning of the method:
assert(a.size() == b.size());
This will be handy in debugging your program if you once perform this operation for unequal lengths by mistake.
Otherwise, the vectors can't be the same if they have unequal length, so just add
if ( a.size() != b.size() )
{
return false;
}
before the sort instructions. This will save you lots of time.
The complexity of this is technically O(n*log(n)), because it's mainly dependent on the sorting, which (usually) is of that complexity. This is better than your O(n^2) approach, but might be worse due to the needed copies. This is irrelevant if your original vectors may be sorted in place.
If you want to stick with your approach, but tweak it, here are my thoughts on this:
You can use std::find for this:
template <class T>
static bool compareVectors(const vector<T> &a, const vector<T> &b)
{
const size_t n = a.size(); // make it const and unsigned!
std::vector<bool> free(n, true);
for ( size_t i = 0; i < n; ++i )
{
bool matchFound = false;
auto start = b.cbegin();
while ( true )
{
const auto position = std::find(start, b.cend(), a[i]);
if ( position == b.cend() )
{
break; // nothing found
}
const auto index = position - b.cbegin();
if ( free[index] )
{
// free pair found
free[index] = false;
matchFound = true;
break;
}
else
{
start = position + 1; // search in the rest
}
}
if ( !matchFound )
{
return false;
}
}
return true;
}
Another possibility is replacing the structure used to store free positions. You may try a std::bitset, or just store the used indices in a vector and check whether a match isn't already in that index vector (see the sketch below). If the outcome of this function is very often the same (so either mostly true or mostly false), you can optimize your data structures to reflect that. E.g. I'd use the list of used indices if the outcome is usually false, since only a handful of indices might need to be stored.
This method has the same complexity as your approach. Using std::find to search for things is sometimes better than a manual loop, since library implementations are often hand-tuned (loop unrolling and the like).
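The index-vector variant mentioned above might look like this (a sketch; it trades the O(n) bool array for a short list of claimed positions, which pays off mainly when mismatches are detected early):
#include <algorithm>
#include <cstddef>
#include <vector>

template <class T>
static bool compareVectorsIndexList(const std::vector<T>& a, const std::vector<T>& b)
{
    if (a.size() != b.size())
        return false;
    std::vector<std::size_t> used; // positions of b already matched
    for (const T& x : a)
    {
        bool matchFound = false;
        for (std::size_t j = 0; j < b.size(); ++j)
        {
            if (b[j] == x
                && std::find(used.begin(), used.end(), j) == used.end())
            {
                used.push_back(j);
                matchFound = true;
                break;
            }
        }
        if (!matchFound)
            return false;
    }
    return true;
}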
You can probabilistically compare two unsorted vectors (u, v) in O(n):
Calculate:
U= xor(h(u[0]), h(u[1]), ..., h(u[n-1]))
V= xor(h(v[0]), h(v[1]), ..., h(v[n-1]))
If U==V then the vectors are probably equal.
h(x) is any non-cryptographic hash function - such as MurmurHash. (Cryptographic functions would work as well but would usually be slower).
(This would work even without hashing, but it would be much less robust when the values have a relatively small range).
A 128-bit hash function would be good enough for many practical applications.
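A minimal sketch, with std::hash standing in for the MurmurHash-style h(x) (an assumption; any decent non-cryptographic hash works):
#include <cstddef>
#include <functional>
#include <vector>

template <class T>
std::size_t xorFingerprint(const std::vector<T>& v)
{
    std::size_t acc = 0;
    for (const T& x : v)
        acc ^= std::hash<T>()(x); // XOR makes the combine order-independent
    return acc;
}

// Unequal fingerprints prove the vectors differ as multisets; equal
// fingerprints mean "probably equal" and may warrant an exact check.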
I notice that most proposed solutions involve sorting both of the input vectors. I think sorting computes more than is strictly necessary for evaluating the equality of the two vectors (and if the input vectors are constant, a copy needs to be made).
Another way would be to build an associative container to count the elements in each vector... It's also possible to do the reduction of the two vectors in parallel. In the case of very large vectors that could give a nice speed-up.
template <typename T>
bool compareVector(const std::vector<T>& vec1, const std::vector<T>& vec2) {
    if (vec1.size() != vec2.size())
        return false;
    // Here we assume that T is hashable...
    auto count_set = std::unordered_map<T, int>();
    // We count the elements in each vector...
    for (unsigned int count = 0; count < vec1.size(); ++count)
    {
        count_set[vec1[count]]++;
        count_set[vec2[count]]--;
    }
    // If everything balances out, we should have zero everywhere
    return std::all_of(count_set.begin(), count_set.end(),
                       [](const std::pair<const T, int>& p) { return p.second == 0; });
}
Depending on the performance of your hashing function, this way we might get linear complexity in the length of both vectors (vs. n*logn with the sorting).
NB: the code might have some bug; I didn't have time to check it...
Benchmarking this way of comparing two vectors against the sort-based comparison, I get, on Ubuntu 13.10, VMware, Core i7 gen 3:
Comparing 200 vectors of 500 elements by counting takes 0.184113 seconds
Comparing 200 vectors of 500 elements by sorting takes 0.276409 seconds
Comparing 200 vectors of 1000 elements by counting takes 0.359848 seconds
Comparing 200 vectors of 1000 elements by sorting takes 0.559436 seconds
Comparing 200 vectors of 5000 elements by counting takes 1.78584 seconds
Comparing 200 vectors of 5000 elements by sorting takes 2.97983 seconds
As others suggested, sorting your vectors beforehand will improve performance.
As an additional optimization you can make heaps out of the vectors to compare (complexity O(n), instead of sorting with O(n*log(n))).
Afterwards you can pop elements from both heaps (complexity O(log(n))) until you get a mismatch.
This has the advantage that, if the vectors turn out not to be equal, you have only heapified them instead of fully sorting them.
Below is a code sample. To know what is really fastest, you will have to measure with some sample data for your use case.
#include <algorithm>
typedef std::vector<int> myvector;
bool compare(myvector& l, myvector& r)
{
bool possibly_equal=l.size()==r.size();
if(possibly_equal)
{
std::make_heap(l.begin(),l.end());
std::make_heap(r.begin(),r.end());
for(int i=l.size();i!=0;--i)
{
possibly_equal=l.front()==r.front();
if(!possibly_equal)
break;
std::pop_heap(l.begin(),l.begin()+i);
std::pop_heap(r.begin(),r.begin()+i);
}
}
return possibly_equal;
}
If you use this function a lot on the same vectors, it might be better to keep sorted copies for comparison.
In theory it might even be better to sort the vectors and compare the sorted vectors if each one is compared just once (sorting is O(n*log(n)), comparing sorted vectors O(n), while your function is O(n^2)).
But I suppose the time spent allocating memory for the sorted vectors will dwarf any theoretical gains if you don't compare the same vectors often.
As with all optimisations, profiling is the only way to make sure; I'd try some std::sort / std::equal combo.
Like stefan says, you need to sort to get better complexity.
Then you can use
the == operator (thanks for the correction in the comments - std::equal would also work, but it is more appropriate for comparing ranges, not entire containers)
If that is not fast enough, only then bother with micro-optimization.
Also, are the vectors guaranteed to be of the same size?
If not, put that check at the beginning.
Another possible solution (viable only if all elements are unique), which should improve somewhat on the solution of @stefan (although the complexity would remain O(NlogN)), is this:
template <class T>
static bool compareVectors(vector<T> a, const vector<T> & b)
{
// You should probably do this check outside, as it
// can spare you the copy of a
if (a.size() != b.size()) return false;
std::sort(a.begin(), a.end());
for (const auto & v : b)
if ( !std::binary_search(a.begin(), a.end(), v) ) return false;
return true;
}
This should be faster since it skips sorting b: the N binary searches cost O(NlogN) in total, instead of also sorting b (O(NlogN)) and then comparing both vectors (O(N)).
When using this code to remove duplicates, I get "invalid operands to binary expression" errors. I think this is down to using a vector of a struct, but I am not sure. I have Googled my question and I get this code over and over again, which suggests that the code is right, but it isn't working for me.
std::sort(vec.begin(), vec.end());
vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
Any help will be appreciated.
EDIT:
fileSize = textFile.size();
vector<wordFrequency> words (fileSize);
int index = 0;
for(int i = 0; i <= fileSize - 1; i++)
{
for(int j = 0; j < fileSize - 1; j++)
{
if(string::npos != textFile[i].find(textFile[j]))
{
words[i].Word = textFile[i];
words[i].Times = index++;
}
}
index = 0;
}
sort(words.begin(), words.end());
words.erase(unique(words.begin(), words.end(), words.end()));
First problem.
unique used wrongly
unique(words.begin(), words.end(), words.end()));
You are calling the three operand form of unique, which takes a start, an end, and a predicate. The compiler will pass words.end() as the predicate, and the function expects that to be your comparison functor. Obviously, it isn't one, and you enter the happy world of C++ error messages.
Second problem.
either use the predicate form or define an ordering
See the definitions of sort and unique.
You can either provide a
bool operator< (wordFrequency const &lhs, wordFrequency const &rhs)
{
return lhs.val_ < rhs.val_;
}
, but only do this if a less-than operation makes sense for that type, i.e. if there is a natural ordering, and if it's not just arbitrary (maybe you want other sort orders in the future?).
In the general case, use the predicate forms for sorting:
auto pred = [](wordFrequency const &lhs, wordFrequency const &rhs)
{
return lhs.foo < rhs.foo;
};
sort (words.begin(), words.end(), pred);
words.erase (unique (words.begin(), words.end(), pred), words.end());
If you can't use C++11, write a functor:
struct FreqAscending { // could derive from std::binary_function to make it adaptable
bool operator() (wordFrequency const &lhs, wordFrequency const &rhs) const
{ ... };
};
I guess in your case ("frequency of words"), operator< makes sense.
Also note vector::erase: the single-iterator form removes only the one element the passed iterator points to. std::unique, on the other hand, returns an iterator to the new end of the range, and what you want to erase is everything from that new end to the real end:
words.erase (unique (words.begin(), words.end(), pred), words.end());
Third problem.
If you only need top ten, don't sort
C++ comes with different sorting algorithms (based on this). For top 10, you can use:
nth_element: gives you the top elements without sorting them
partial_sort: gives you the top elements, sorted
This wastes fewer watts on your CPU, contributes to overall desktop performance, and your laptop battery lasts longer, so you can do even more sorts.
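For instance, keeping the ten most frequent entries in front might look like this (a sketch; I'm assuming the wordFrequency members from the question's edit):
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct wordFrequency { std::string Word; int Times; }; // assumed layout

void keepTopTenInFront(std::vector<wordFrequency>& words)
{
    auto byCountDesc = [](wordFrequency const& l, wordFrequency const& r)
                       { return l.Times > r.Times; };
    auto k = std::min<std::size_t>(10, words.size());
    // Only the first k positions come out sorted; the tail is unspecified.
    std::partial_sort(words.begin(), words.begin() + k, words.end(), byCountDesc);
}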
The most probable cause is that operator< is not declared for the type of object vec contains. Have you overloaded it? It should look something like this:
bool operator<(const YourType& _a, const YourType& _b)
{
//... comparison check here
}
That code should work, as std::unique returns an iterator pointing to the new end of the unique range (i.e. the beginning of the leftover elements to erase). What type is your vector containing? Perhaps you need to implement the equality operator.
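For the question's struct, that might be (assuming the two members shown in the edit):
bool operator==(const wordFrequency& lhs, const wordFrequency& rhs)
{
    return lhs.Word == rhs.Word && lhs.Times == rhs.Times;
}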
I'm a C programmer and trying to get better at C++. I want to implement a permutation function (without using the STL algorithms). I came up with the following algorithm (out of my C way of thinking), but
a) it crashes for k > 2 (I suppose because the element that the iterator points to gets deleted, is inserted back, and then the iterator is incremented).
b) the erase/insert operations seem unnecessary.
How would the C++ experts amongst you implement it?
template <class T>
class Ordering {
public:
Ordering(int n);
int combination(int k);
int permutation(int k);
private:
set<T> elements;
vector<T> order;
void printOrder(); // declared here since permutation() calls it
};
template <class T>
int Ordering<T>::permutation (int k) {
if (k > elements.size()) {
return 0;
}
if (k == 0) {
printOrder();
return 1;
}
int count = 0;
for (typename set<T>::iterator it = elements.begin();
it != elements.end();
it++
)
{
order[k-1] = *it;
elements.erase(*it);
count += permutation(k-1);
elements.insert(*it);
}
return count;
}
The problem is in your iteration over the elements set. You try to increment an iterator whose element you have erased. That cannot work.
If you insist on using this approach, you must store the successor of it before calling set::erase. That means you have to move the increment part of your for loop into the loop body.
Like this:
for (typename set<T>::iterator it = elements.begin();
it != elements.end();
/* nothing here */
)
{
order[k-1] = *it;
typename set<T>::iterator next = it;
++next;
elements.erase(*it);
count += permutation(k-1);
elements.insert(order[k-1]);
it = next;
}
Edit: One possible way of temporarily "removing" objects would be to have a std::map<T,bool> (note that a std::set<std::pair<T,bool>> would not work, since set elements cannot be modified in place) and simply write it->second = false and afterwards it->second = true. Then, while iterating, you can skip entries whose second value is false. This adds a bit of an overhead since you have to do a lot more work while descending. But inserting+removing elements adds a logarithmic overhead every time, which is probably worse.
If you used a (custom) linked list (perhaps you can even get std::list to do that) you could very inexpensively remove and re-insert objects.
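A sketch of what the std::list variant could look like (my adaptation, not a drop-in replacement for the asker's class): std::list::erase returns the successor, and re-inserting before that successor restores the original position, both in O(1):
#include <list>
#include <vector>

template <class T>
int permutation(std::list<T>& elements, std::vector<T>& order, int k)
{
    if (k > static_cast<int>(elements.size()))
        return 0;
    if (k == 0)
    {
        // printOrder(order); // whatever "emit one permutation" means for you
        return 1;
    }
    int count = 0;
    for (auto it = elements.begin(); it != elements.end(); )
    {
        order[k - 1] = *it;
        auto pos = elements.erase(it);           // O(1); pos is the successor
        count += permutation(elements, order, k - 1);
        it = elements.insert(pos, order[k - 1]); // O(1) re-insert at old position
        ++it;                                    // move on to the successor
    }
    return count;
}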