Hashing std::vector independent of items order - c++

I am looking for a hash function for std::vector that is independent of the order of the vector's items.
In other words I am looking for a hash implementation,
that would give me same result for
std::vector<int> v1{1,2,3};
std::vector<int> v2{2,3,1};
std::vector<int> v3{1,3,2};
Any ideas on how I might accomplish this?

template<template<class...>class element_hash=std::hash>
struct symmetric_range_hash {
template<class T>
std::size_t operator()( T const& t ) const {
std::size_t r = element_hash<int>{}(0); // seed with the hash of 0.
for (auto&& x:t) {
using element_type = std::decay_t<decltype(x)>;
auto next = element_hash<element_type>{}(x);
r = r + next;
}
return r;
}
};
That should do it. We gather the hashes via + which is symmetric.
+ is better than ^ because it takes longer to get a cycle. With ^, {1,1} and {2,2} would hash the same (and in general even numbers of anything "disappear"). With + they instead get multiplied.
So the end result is the sum, for each distinct value in the array, of the hash of that value times its count, mod "max(size_t)+1".
Note that an unordered_map requires both a hash and an equality. If you want vectors holding the same elements in a different order to be treated as the same key, you'll also need to write an order-independent ==.
struct unordered_equal {
template<class C>
bool operator()(C const& lhs, C const& rhs)const {
using std::begin;
using K = std::decay_t<decltype(*begin(lhs))>;
std::unordered_map< K, std::size_t > counts;
for (auto&& k : lhs) {
counts[k]++;
}
for (auto&& k : rhs) {
counts[k]--;
}
for (auto&& kv : counts)
if (kv.second != 0) return false;
return true;
}
};
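As a rough usage sketch, the two pieces can be plugged into a std::unordered_set. The names sum_hash, same_multiset, vector_set and count_distinct_keys are made up for this illustration, the element type is fixed to int, and std::is_permutation (C++14 four-iterator overload) is swapped in for the counting map above; it is simpler but quadratic in the worst case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical name: order-independent hash that sums the element hashes.
struct sum_hash {
    std::size_t operator()(std::vector<int> const& v) const {
        std::size_t r = 0;
        for (int x : v) r += std::hash<int>{}(x);  // + commutes; unsigned overflow wraps
        return r;
    }
};

// Hypothetical name: equality that ignores order but respects multiplicity.
struct same_multiset {
    bool operator()(std::vector<int> const& a, std::vector<int> const& b) const {
        return std::is_permutation(a.begin(), a.end(), b.begin(), b.end());
    }
};

using vector_set = std::unordered_set<std::vector<int>, sum_hash, same_multiset>;

// Inserts three permutations of {1,2,3} plus one different vector and
// returns how many distinct keys the set holds afterwards.
std::size_t count_distinct_keys() {
    vector_set s;
    s.insert(std::vector<int>{1, 2, 3});
    s.insert(std::vector<int>{2, 3, 1});
    s.insert(std::vector<int>{1, 3, 2});
    s.insert(std::vector<int>{1, 1, 2});
    assert(s.count(std::vector<int>{3, 2, 1}) == 1);  // found under any ordering
    return s.size();
}
```

The three permutations collapse into one key because they hash identically and compare equal, while {1,1,2} stays separate.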

Related

Using std algorithm library for unique equivalence with respect to binary relation

I have binary relation on some type T induced by a function equivalent:
bool equivalent(T const& a, T const& b); // returns true if a and b are equivalent
It has the properties that
equivalent(a, a) == true
and
equivalent(a, b) == equivalent(b, a)
for all a, b.
For a given collection of elements of type T, I want to remove all but the first occurrence of each equivalence class. I have come up with the following code but was wondering:
Is there a solution without an explicit loop?
std::vector<T> filter_all_but_one_for_each_set_of_equivalent_T(std::vector<T> const& ts) {
std::vector<T> result;
for (auto iter = ts.begin(); iter != ts.end(); ++iter) {
auto const& elem = *iter;
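bool has_equivalent_element_at_earlier_position = std::any_of(
ts.begin(),
iter,
// any_of needs a unary predicate, so bind the current element
[&elem](T const& earlier) { return equivalent(earlier, elem); }
);
if (not has_equivalent_element_at_earlier_position) {
result.push_back(elem);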
}
}
return result;
}
Update
As far as I understand, std::unique won't do because my type T is not sortable. I only have C++11 available in my case, but I would be interested in other options too, for education.
Here's a way that only has one very simple loop:
First define our class, which I'll call A instead of T because T is typically used for templates:
class A{
public:
explicit A(int _i) : i(_i){};
int get() const{return i;}
private:
int i;
};
And then our equivalent function just compares the integers for equality:
bool equivalent(A const& a, A const& b){return a.get() == b.get();}
Next I'll define the filtering function.
The idea here is to take advantage of std::remove_if to do the looping and erasing efficiently for us (it moves the retained elements toward the front in a single pass, so that you are not shifting the vector for each removal).
We start by removing everything that matches the first element, then afterwards remove everything that matches the second element (which is guaranteed != to the first element now), and so on.
std::vector<A> filter_all_but_one_for_each_set_of_equivalent_A(std::vector<A> as) {
for(size_t i = 1; i < as.size(); ++i){
as.erase(std::remove_if(as.begin() + i, as.end(), [&as, i](const A& next){return equivalent(as[i-1], next);}), as.end());
}
return as;
}
Edit: As Richard Hodges mentioned, it is possible to delay any erasing until the very end. I couldn't make it look as beautiful though:
std::vector<A> filter_all_but_one_for_each_set_of_equivalent_A(std::vector<A> as) {
auto end = as.end();
for(size_t i = 1; i < std::distance(as.begin(), end); ++i){
end = std::remove_if(as.begin() + i, end, [&as, i](const A& next){return equivalent(as[i-1], next);});
}
as.erase(end, as.end());
return as;
}
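To make the idea concrete with something other than plain equality, here is a small sketch of the same remove_if loop, written generically. The name keep_first_of_each_class and the "same remainder mod 3" relation are assumptions made for this example only.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: keeps the first element of each equivalence class,
// preserving order, using only an equivalence predicate (no sorting, no hashing).
template <class T, class Equivalent>
std::vector<T> keep_first_of_each_class(std::vector<T> items, Equivalent equivalent) {
    auto last = items.end();
    for (auto current = items.begin(); current != last; ++current) {
        // Drop every later element equivalent to *current. remove_if keeps the
        // relative order of the survivors and returns the new logical end.
        last = std::remove_if(std::next(current), last,
                              [&](T const& x) { return equivalent(*current, x); });
    }
    items.erase(last, items.end());  // single erase at the very end
    return items;
}

// Usage: integers count as equivalent when they share a remainder mod 3.
inline void keep_first_example() {
    std::vector<int> in{1, 4, 2, 7, 5, 3, 6};
    auto out = keep_first_of_each_class(in, [](int a, int b) { return a % 3 == b % 3; });
    assert((out == std::vector<int>{1, 2, 3}));
}
```

The cost is still quadratic in the worst case, since every retained element triggers one pass over the rest of the range.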
Expanding on my comment in AndyG's answer:
template<class T, class A, class Equivalent>
auto deduplicated2(std::vector<T, A> vec, Equivalent&& equivalent) -> std::vector<T, A>
{
auto current = std::begin(vec);
// current 'last of retained sequence'
auto last = std::end(vec);
while (current != last)
{
// define a predicate which checks for equivalence to current
auto same = [&](T const& x) -> bool
{
return equivalent(*current, x);
};
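// drop the later items that are equivalent to *current;
// remove_if returns the new 'end of valid sequence'
last = std::remove_if(std::next(current), last, same);
// advance to the next retained item (without this the loop never ends)
++current;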
}
// erase all items beyond the 'end of valid sequence'
vec.erase(last, std::end(vec));
return vec;
}
Credit to AndyG please.
For very large vectors where T is hashable, we can aim for an O(n) solution:
template<class T, class A, class Equivalent>
auto deduplicated(std::vector<T, A> const& vec, Equivalent&& equivalent) -> std::vector<T, A>
{
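// note: std::hash<T> must give equal hashes to equivalent elements
using Equal = typename std::decay<Equivalent>::type;
auto seen = std::unordered_set<T, std::hash<T>, Equal>(vec.size(), std::hash<T>(), std::forward<Equivalent>(equivalent));
auto result = std::vector<T, A>();
// reserve, not resize: resize would put vec.size() default-constructed items in front
result.reserve(vec.size());
auto current = std::begin(vec);
while (current != std::end(vec))
{
if (seen.insert(*current).second)
{
result.push_back(*current);
}
++current;
}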
return result;
}
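The hash-based version only works if the hash agrees with the equivalence, i.e. equivalent elements must hash to the same value. A minimal sketch of such a pairing, assuming case-insensitive string comparison as the equivalence (lowered, ci_hash, ci_equal and dedupe_case_insensitive are names invented for this example):

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: lower-cased copy, used as the canonical form of a string.
inline std::string lowered(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hash and equality are both defined on the canonical form, so they agree.
struct ci_hash {
    std::size_t operator()(std::string const& s) const {
        return std::hash<std::string>{}(lowered(s));
    }
};
struct ci_equal {
    bool operator()(std::string const& a, std::string const& b) const {
        return lowered(a) == lowered(b);
    }
};

// Keeps the first spelling of each word, in input order, in expected O(n) insertions.
inline std::vector<std::string> dedupe_case_insensitive(std::vector<std::string> const& in) {
    std::unordered_set<std::string, ci_hash, ci_equal> seen;
    std::vector<std::string> out;
    out.reserve(in.size());
    for (auto const& s : in)
        if (seen.insert(s).second) out.push_back(s);  // true only for a new class
    return out;
}

inline void dedupe_example() {
    auto out = dedupe_case_insensitive({"Dog", "cat", "DOG", "Cat", "mouse"});
    assert((out == std::vector<std::string>{"Dog", "cat", "mouse"}));
}
```

If the hash were computed on the raw string instead, "Dog" and "DOG" would usually land in different buckets and the duplicate would slip through.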
Finally, revisiting the first solution and refactoring into sub-concerns (I can't help myself):
// in-place de-duplication of sequence, similar interface to remove_if
template<class Iter, class Equivalent>
Iter inplace_deduplicate_sequence(Iter first, Iter last, Equivalent&& equivalent)
{
while (first != last)
{
// define a predicate which checks for equivalence to current
using value_type = typename std::iterator_traits<Iter>::value_type;
auto same = [&](value_type const& x) -> bool
{
return equivalent(*first, x);
};
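// drop the later items that are equivalent to *first;
// remove_if returns the new 'end of valid sequence'
last = std::remove_if(std::next(first), last, same);
// advance to the next retained item (without this the loop never ends)
++first;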
}
return last;
}
// in-place de-duplication on whole vector, including container truncation
template<class T, class A, class Equivalent>
void inplace_deduplicate(std::vector<T, A>& vec, Equivalent&& equivalent)
{
vec.erase(inplace_deduplicate_sequence(vec.begin(),
vec.end(),
std::forward<Equivalent>(equivalent)),
vec.end());
}
// non-destructive version
template<class T, class A, class Equivalent>
auto deduplicated2(std::vector<T, A> vec, Equivalent&& equivalent) -> std::vector<T, A>
{
inplace_deduplicate(vec, std::forward<Equivalent>(equivalent));
return vec;
}
You can try this one. The trick here is to obtain the index while inside predicate.
std::vector<T> output;
std::copy_if(
input.begin(), input.end(),
std::back_inserter(output),
[&](const T& x) {
size_t index = &x - &input[0];
return std::find_if(
input.begin(), input.begin() + index,
[&x](const T& y) {
return equivalent(x, y);
}) == input.begin() + index;
});
Since performance is not an issue, you can use std::accumulate to scan through the elements and add them to an accumulator vector xs if there is not already an equivalent element in xs.
With this you don't need any hand-written raw loops at all.
std::vector<A> filter_all_but_one_for_each_set_of_equivalent_A(std::vector<A> as) {
return std::accumulate(as.begin(), as.end(),
std::vector<A>{}, [](std::vector<A> xs, A const& x) {
if ( std::find_if(xs.begin(), xs.end(), [x](A const& y) {return equivalent(x,y);}) == xs.end() ) {
xs.push_back(x);
}
return xs;
});
}
With two helper functions this becomes actually readable:
bool contains_equivalent(std::vector<A> const& xs, A const& x) {
return std::find_if(xs.begin(), xs.end(),
[x](A const& y) {return equivalent(x,y);}) != xs.end();
};
std::vector<A> push_back_if(std::vector<A> xs, A const& x) {
if ( !contains_equivalent(xs, x) ) {
xs.push_back(x);
}
return xs;
};
The function itself is just a call to std::accumulate:
std::vector<A> filter_all_but_one_for_each_set_of_equivalent_A(std::vector<A> as) {
return std::accumulate(as.begin(), as.end(), std::vector<A>{}, push_back_if);
}
I've modified AndyG's example code with my proposed function.
As defined above, std::accumulate calls push_back_if with a copy of the accumulator variable, and the return value is move-assigned to the accumulator again. This is very inefficient, but can be optimized by changing push_back_if to take a reference so that the vector is modified in-place. The initial value needs to be passed as a reference wrapper with std::ref to eliminate remaining copies.
std::vector<A>& push_back_if(std::vector<A>& xs, A const& x) {
if ( !contains_equivalent(xs, x) ) {
xs.push_back(x);
}
return xs;
};
std::vector<A> filter_all_but_one_for_each_set_of_equivalent_A(std::vector<A> const& as) {
std::vector<A> acc;
return std::accumulate(as.begin(), as.end(), std::ref(acc), push_back_if);
}
You can see in the example that the copy-constructor calls are almost completely eliminated.
First, here is another loop version. In contrast to your own, it deduplicates in place; you might find it interesting:
std::vector<int> v({1, 7, 1, 8, 9, 8, 9, 1, 1, 7});
auto retained = v.begin();
for(auto i = v.begin(); i != v.end(); ++i)
{
bool isFirst = true;
for(auto j = v.begin(); j != retained; ++j)
{
if(*i == *j)
{
isFirst = false;
break;
}
}
if(isFirst)
{
*retained++ = *i;
}
}
v.erase(retained, v.end());
This was the base for a version using std::remove_if and std::find_if:
auto retained = v.begin();
auto c = [&v, &retained](int n)
{
if(std::find_if(v.begin(), retained, [n](int m) { return m == n; }) != retained)
return true;
// element remains, so we need to increase!!!
++retained;
return false;
};
v.erase(std::remove_if(v.begin(), v.end(), c), v.end());
You need the lambda in this case, as we need a unary predicate, whereas equivalent (in my int example represented by operator==) is a binary one...
struct S {
int eq;
int value;
bool operator==(const S& other) const { return eq == other.eq; }
};
namespace std {
template <> struct hash<S>
{
size_t operator()(const S &s) const
{
return hash<int>()(s.eq);
}
};
}
array<S, 6> as{ { {1,0},{2,0},{3,0},{ 1,1 },{ 2,1 },{ 3,1 } } };
unordered_set<S> us(as.cbegin(), as.cend());

Making a comparator from an ordered container

Given a list of objects, what is the cleanest way to create a functor object to act as a comparator, such that the comparator respects the ordering of the objects in the list. It is guaranteed that the objects in the list are unique, and the list contains the entire space of possible objects.
For example, suppose we have:
const std::vector<std::string> ordering {"dog", "cat", "mouse", "elephant"};
Now we want a function to act as a comparator, say for a map:
using Comparator = std::function<bool(const std::string&, const std::string&)>;
using MyMap = std::map<std::string, int, Comparator>;
I have a solution, but it's not what I'd call pretty:
const auto cmp = [&ordering] (const auto& lhs, const auto& rhs)
{
const std::array<std::reference_wrapper<const std::decay_t<decltype(lhs)>>, 2> values {lhs, rhs};
return *std::find_first_of(std::cbegin(ordering), std::cend(ordering),
std::cbegin(values), std::cend(values),
[] (const auto& lhs, const auto& rhs) {
return lhs == rhs.get();
}) == lhs;
};
Is there something a little less verbose?
You can use:
const std::vector<std::string> ordering {"dog", "cat", "mouse", "elephant"};
struct cmp
{
bool operator()(const std::string& lhs, const std::string& rhs) const
{
return (std::find(ordering.begin(), ordering.end(), lhs) <
std::find(ordering.begin(), ordering.end(), rhs));
}
};
using MyMap = std::map<std::string, int, cmp>;
See it working at http://ideone.com/JzTNwt.
Just skip the algorithms and write a for-loop:
auto cmp = [&ordering](auto const& lhs, auto const& rhs) {
for (auto const& elem : ordering) {
// check rhs first in case the two are equal
if (elem == rhs) {
return false;
}
else if (elem == lhs) {
return true;
}
}
return false;
};
It might technically be longer than your solution, but I find it way easier to read.
Alternatively, depending on the size of the ordering, you could throw it into a map from element to index:
std::unordered_map<std::string, int> as_map(std::vector<std::string> const& ordering)
{
std::unordered_map<std::string, int> m;
for (auto const& elem : ordering) {
m.emplace(elem, m.size());
}
return m;
}
auto cmp = [ordering_map = as_map(ordering)](auto const& lhs, auto const& rhs){
auto left = ordering_map.find(lhs);
auto right = ordering_map.find(rhs);
return left != ordering_map.end() && right != ordering_map.end() &&
left->second < right->second;
};
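For illustration, the same rank-lookup idea can be used outside a std::map as well, for example to sort a vector by the custom ordering. This is only a sketch: sorted_by_ordering is a hypothetical helper, and it assumes every item appears in the ordering (as the question guarantees), so it uses at() rather than handling misses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: sorts `items` by their position in `ordering`.
// Assumes every item appears in `ordering`; at() throws std::out_of_range otherwise.
inline std::vector<std::string> sorted_by_ordering(std::vector<std::string> items,
                                                   std::vector<std::string> const& ordering) {
    // Build the rank table once, so each comparison costs two hash lookups
    // instead of two linear scans.
    std::unordered_map<std::string, std::size_t> rank;
    for (std::size_t i = 0; i < ordering.size(); ++i) rank.emplace(ordering[i], i);

    std::sort(items.begin(), items.end(),
              [&rank](std::string const& a, std::string const& b) {
                  return rank.at(a) < rank.at(b);
              });
    return items;
}

inline void sorted_by_ordering_example() {
    const std::vector<std::string> ordering{"dog", "cat", "mouse", "elephant"};
    auto out = sorted_by_ordering({"mouse", "dog", "elephant", "cat"}, ordering);
    assert(out == ordering);
}
```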
KISS. Build up your solution from reusable primitives with clear semantics.
order_by takes a projection A->B and returns an ordering on A using the ordering on B. It optionally takes an ordering on B (not used here):
template<class F, class Next=std::less<>>
auto order_by(F&& f, Next&& next = {}) {
return [f=std::forward<F>(f), next=std::forward<Next>(next)]
(auto&& lhs, auto&& rhs)->bool
{
return next(f(lhs), f(rhs));
};
}
index_in takes a container c and returns a function that takes an element, and determines its index in c:
template<class C>
auto index_in( C&& c ) {
return [c=std::forward<C>(c)](auto&& x){
using std::begin; using std::end;
return std::find( begin(c), end(c), x ) - begin(c);
};
}
template<class T>
auto index_in( std::initializer_list<T> il ) {
// copy the elements into a vector: the array behind an initializer_list
// does not outlive the full expression that created it
return index_in( std::vector<T>(il) );
}
We then compose them:
auto cmp = order_by( index_in(std::move(ordering) ) );
Each component can be independently tested and validated, and the composition "obviously" works. We can also rewrite index_in to use a faster lookup than linear, say a map from key to index, and we'd have two implementations that can unit test against each other.
I find order_by very often useful. index_in I've never had to use before.
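To show why a projection-based comparator is handy beyond this question, here is a simplified sketch that orders strings by length. order_by_projection is a cut-down, hypothetical variant of order_by above (it always compares projected values with <), and sorted_by_length is an invented example function; generic lambdas require C++14.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified variant of order_by: compares two values by
// comparing their projections with operator<.
template <class F>
auto order_by_projection(F f) {
    return [f](auto const& lhs, auto const& rhs) { return f(lhs) < f(rhs); };
}

// Sorts words by length; stable_sort keeps equal-length words in input order.
inline std::vector<std::string> sorted_by_length(std::vector<std::string> words) {
    std::stable_sort(words.begin(), words.end(),
                     order_by_projection([](std::string const& s) { return s.size(); }));
    return words;
}

inline void order_by_projection_example() {
    auto out = sorted_by_length({"mouse", "cat", "elephant", "dog"});
    assert((out == std::vector<std::string>{"cat", "dog", "mouse", "elephant"}));
}
```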
This trick makes constructing the resulting map on one line, instead of storing cmp, practical, as the final description is short and clear.
template<class T>
using Comparator = std::function<bool(T const&, T const&)>;
using MyMap = std::map<std::string, int, Comparator<std::string>>;
// direct-initialization, because map's comparator constructor is explicit
MyMap m( order_by( index_in({"dog", "cat", "mouse", "elephant"}) ) );
is also really pretty looking.
Here is a second approach.
We can move some of the work outside of the lambda.
template<class T0, class...Ts>
std::array< std::reference_wrapper<T0>, sizeof...(Ts)+1 >
make_ref_array( T0& t0, Ts&... ts ) {
return {std::ref(t0), std::ref(ts)...};
}
Then we can write the lambda from first principles instead of algorithms:
Comparator cmp = [&ordering] (const auto& lhs, const auto& rhs)
{
if (lhs==rhs) return false; // x is never less than x.
for (auto&& e:ordering)
for (auto&& z:make_ref_array(lhs, rhs))
if (e==z.get())
return std::addressof(z.get())==std::addressof(lhs);
return false;
};
The resulting lambda is a bit less verbose.
If you had range-based algorithms it might also help.
In both my solutions, all elements not in the list are greater than any element in the list, and are equal to each other.
I suggest you make the key a structure type containing the key (std::string in this case) and the index in the array.
Something like
struct Key
{
std::string str;
size_t index;
};
struct Comparator
{
bool operator()(const Key &a, const Key &b) const
{
return a.index < b.index;
}
};
// Comparator must be declared before it is used in the alias
using MyMap = std::map<Key, int, Comparator>;
This is basically the same as R. Sahu's answer. But since I'd already typed it up... The primary thing I'm advocating here is keeping ordering internal to cmp, presuming that you don't need it externally.
const auto cmp = [ordering = vector<string>{ "dog", "cat", "mouse", "elephant" }](const auto& lhs, const auto& rhs) { return find(cbegin(ordering), cend(ordering), lhs) < find(cbegin(ordering), cend(ordering), rhs); };
Encapsulating the ordering within cmp makes the scope that a reader has to look at when changing ordering much smaller. (Personally I wouldn't store cmp in a variable at all; I'd just dump it directly into the map's constructor for the same reason, though it is becoming a bit of a ridiculous one-liner.)
map<string, int, function<bool(const string&, const string&)>> myMap([ordering = vector<string>{ "dog", "cat", "mouse", "elephant" }](const auto& lhs, const auto& rhs) { return find(cbegin(ordering), cend(ordering), lhs) < find(cbegin(ordering), cend(ordering), rhs); });

How to use an unordered_map of fixed array and int?

My code works just fine when I use map< array<int,FIXEDSIZE>, int> but not when I use unordered_map< array<int,FIXEDSIZE>, int>.
It creates this massive list of errors so I don't really know what's wrong. Things like "value" is not a member, or "no match for operator[]", etc.
This is how I am using my map (which I name cache):
if (cache.find(key) != cache.end()) return cache[key];
and
cache[key] = valueToMemoize;
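std::map only needs operator<, which std::array provides, but std::unordered_map needs a hash for its key type, and the standard library has no std::hash specialization for std::array. You have to supply one yourself. This is basically what boost::hash_combine boils down to: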
void hash_combine(std::size_t& seed, std::size_t value) {
seed ^= value + 0x9e3779b9 + (seed<<6) + (seed>>2);
}
A simple hasher for containers - hash all of their elements using std::hash, and combine them.
struct container_hasher {
template<class T>
std::size_t operator()(const T& c) const {
std::size_t seed = 0;
for(const auto& elem : c) {
hash_combine(seed, std::hash<typename T::value_type>()(elem));
}
return seed;
}
};
Use:
std::unordered_map<std::array<int, 10>, int, container_hasher> my_map;
For cheaper lookup, do
auto r = cache.find(key);
if(r != cache.end()) return r->second;
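Putting the hasher and the single-lookup pattern together, a memoization cache keyed on a fixed-size array might look roughly like this. kSize, array_hash, cache_type and memoized_sum are names assumed for the sketch, and the "expensive" computation is just a sum standing in for the real one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

constexpr std::size_t kSize = 3;  // hypothetical stand-in for FIXEDSIZE

// Order-sensitive hash: folds the element hashes together hash_combine-style.
struct array_hash {
    std::size_t operator()(std::array<int, kSize> const& a) const {
        std::size_t seed = 0;
        for (int x : a)
            seed ^= std::hash<int>{}(x) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        return seed;
    }
};

using cache_type = std::unordered_map<std::array<int, kSize>, int, array_hash>;

// Returns the cached value for `key`, computing and storing it on a miss.
inline int memoized_sum(cache_type& cache, std::array<int, kSize> const& key) {
    auto found = cache.find(key);
    if (found != cache.end()) return found->second;  // one lookup on a hit
    int value = 0;
    for (int x : key) value += x;                    // placeholder computation
    cache.emplace(key, value);
    return value;
}

inline void memoized_sum_example() {
    cache_type cache;
    assert(memoized_sum(cache, std::array<int, kSize>{{1, 2, 3}}) == 6);
    assert(memoized_sum(cache, std::array<int, kSize>{{1, 2, 3}}) == 6);  // served from cache
    assert(cache.size() == 1);
}
```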
For std::map, you might want to use lower_bound instead, to help with the later insertion:
auto lb = cache.lower_bound(key);
if(lb != cache.end() && lb->first == key) return lb->second;
cache.emplace_hint(lb, key, valueToMemoize);
You need to define your custom hash object like below:
template<typename T, std::size_t N>
class arrayHash {
public:
std::size_t operator()(std::array<T, N> const &arr) const {
std::size_t sum(0);
for(auto &&i : arr) sum += std::hash<T>()(i);
return sum;
}
};
And then define your unordered_map as:
std::unordered_map<std::array<int, FIXEDSIZE>, int, arrayHash<int, FIXEDSIZE>> umap;

Efficient way to find frequencies of each unique value in the std::vector

Given a vector std::vector<double> v, we can find unique elements efficiently by:
std::vector<double> uv(v.begin(), v.end());
std::sort(uv.begin(), uv.end());
uv.erase(std::unique(uv.begin(), uv.end()), uv.end());
What would be the nicest way (without loops, with STL or lambdas) to create a vector:
std::vector<double> freq_uv(uv.size());
which would contain frequencies of each distinct element appearing in v (order the same as sorted unique values)?
Note: type can be anything, not just double
After you sort, before you erase:
std::vector<int> freq_uv;
freq_uv.push_back(0);
auto prev = uv[0]; // you should ensure !uv.empty() if previous code did not already ensure it.
for (auto const & x : uv)
{
if (prev != x)
{
freq_uv.push_back(0);
prev = x;
}
++freq_uv.back();
}
Note that, while I generally like to count occurrences with a map, as Yakk is doing, in this case I think it is doing a lot of unnecessary work as we already know the vector is sorted.
Another possibility is to use a std::map (not unordered), instead of sorting. This will get your frequencies first. Then, since the map is ordered, you can just create the sorted, unique vector, and the frequency vector directly from the map.
// uv not yet created
std::map<T, int> freq_map;
for (auto const & x : v)
++freq_map[x];
std::vector<T> uv;
std::vector<int> freq_uv;
for (auto const & p : freq_map)
{
uv.push_back(p.first);
freq_uv.push_back(p.second);
}
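Wrapped up as a function, the map-based approach could look like the following sketch. The name unique_with_counts and the pair-of-vectors return type are choices made for this example, not something from the question.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: returns the sorted unique values of `v` together with
// how often each one occurs; out.second[i] is the count of out.first[i].
template <class T>
std::pair<std::vector<T>, std::vector<int>> unique_with_counts(std::vector<T> const& v) {
    std::map<T, int> freq;               // ordered by key, so iteration is sorted
    for (auto const& x : v) ++freq[x];   // operator[] value-initializes counts to 0

    std::pair<std::vector<T>, std::vector<int>> out;
    out.first.reserve(freq.size());
    out.second.reserve(freq.size());
    for (auto const& p : freq) {
        out.first.push_back(p.first);
        out.second.push_back(p.second);
    }
    return out;
}

inline void unique_with_counts_example() {
    auto r = unique_with_counts(std::vector<double>{1.0, 2.0, 1.0, 2.0, 1.0, 3.0});
    assert((r.first == std::vector<double>{1.0, 2.0, 3.0}));
    assert((r.second == std::vector<int>{3, 2, 1}));
}
```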
First, note that == and to a lesser extent < on double is often a poor idea: often you'll have values that logically "should" be equal if the double was infinite precision, but are slightly different.
However, collecting the frequencies is easy:
template<typename T, typename Allocator>
std::unordered_map< T, std::size_t > frequencies( std::vector<T, Allocator> const& src ) {
std::unordered_map< T, std::size_t > retval;
for (auto&& x:src)
++retval[x];
return retval;
}
assuming std::hash<T> is defined (which it is for double). If not, there is more boilerplate, so I'll skip it. Note that this does not care if the vector is sorted.
If you want it in the form of std::vector<std::size_t> in sync with your sorted vector, you can just do this:
template<typename T, typename Hash, typename Equality, typename Allocator>
std::vector<std::size_t> collate_frequencies(
std::vector<T, Allocator> const& order,
std::unordered_map<T, std::size_t, Hash, Equality> const& frequencies
) {
std::vector<std::size_t> retval;
retval.reserve(order.size());
for( auto&& x : order )
retval.push_back( frequencies.at(x) ); // operator[] is not available on a const map
return retval;
}
I took the liberty of making these functions overly generic, so they support more than just doubles.
Using equal_range (v must already be sorted):
std::vector<int> results;
for(auto i = begin(v); i != end(v);)
{
auto r = std::equal_range(i, end(v), *i);
results.emplace_back( std::distance(r.first, r.second) );
i = r.second;
}
SSCCE:
#include <vector>
#include <algorithm>
#include <iostream>
#include <iterator>
int main()
{
std::vector<double> v{1.0, 2.0, 1.0, 2.0, 1.0, 3.0};
std::sort(begin(v), end(v));
std::vector<int> results;
for(auto i = begin(v); i != end(v);)
{
auto r = std::equal_range(i, end(v), *i);
results.emplace_back( std::distance(r.first, r.second) );
i = r.second;
}
for(auto const& e : results) std::cout << e << "; ";
}
An O(n) solution when the range of values is limited, for example chars. Using less than the CPU level 1 cache for the counter leaves room for other values.
(untested code)
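constexpr int ProblemSize = 256;
using CountArray = std::array<int, ProblemSize>;
CountArray CountUnique(const std::vector<char>& vec) {
CountArray count{}; // value-initialize so every counter starts at 0
for(const auto ch : vec)
count[static_cast<unsigned char>(ch)]++; // char may be signed; avoid negative indices
return count;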
}

Determining the unique rows of a 2D array (vector<vector<T> >)

I am using a datatype of std::vector<std::vector<T> > to store a 2D matrix/array. I would like to determine the unique rows of this matrix. I am looking for any advice or pointers on how to go about doing this operation.
I have tried two methods.
Method 1: slightly convoluted. I keep an index for each row with 0/1 indicating whether the row is a duplicate value, and work through the matrix, storing the index of each unique row in a deque. I want to store the results in a vector<vector<T> >, and so from this deque of indices, I pre-allocate and then assign the rows from the matrix into the return value.
Method 2: Is easier to read, and in many cases faster than method 1. I keep a deque of the unique rows that have been found, and just loop through the rows and compare each row to all the entries in this deque.
I am comparing both of these methods to matlab, and these C++ routines are orders of magnitude slower. Does anyone have any clever ideas on how I might speed this operation up? I am looking to do this operation on matrices that potentially have millions of rows.
I am storing the unique rows in a deque during the loop to avoid the cost of resizing a vector, and then copying the deque to the vector<vector<T> > for the results. I've benchmarked this operation closely, and it is nowhere near being the bottleneck: it accounts for less than 0.5% of the runtime on a matrix with 100,000 rows, for example.
Thanks,
Bob
Here is the code. If anyone is interested in a more complete example showing the usage, drop me a comment and I can put something together.
Method 1:
template <typename T>
void uniqueRows( const std::vector<std::vector<T> > &A,
std::vector<std::vector<T> > &ret) {
// Go through a vector<vector<T> > and find the unique rows
// have a value ind for each row that is 1/0 indicating if a value
// has been previously searched.
// cur : current item being compared to every item
// num : number of values searched for. Once all the values in the
// matrix have been searched, terminate.
size_t N = A.size();
size_t num=1,cur=0,it=1;
std::vector<unsigned char> ind(N,0);
std::deque<size_t> ulist; // create a deque to store the unique inds
ind[cur] = 1;
ulist.push_back(0); // ret.push_back(A[0]);
while(num < N ) {
if(it >= N ) {
++cur; // find next non-duplicate value, push back
while(ind[cur])
++cur;
ulist.push_back(cur); //ret.push_back(A[cur]);
++num;
it = cur+1; // start search for duplicates at the next row
if(it >= N && num == N)
break;
}
if(!ind[it] && A[cur]==A[it]) {
ind[it] = 1; // mark as duplicate
++num;
}
++it;
} // ~while num
// loop over the deque and .push_back the unique vectors
std::deque<size_t>::iterator iter;
const std::deque<size_t>::iterator end = ulist.end();
ret.reserve(ulist.size());
for(iter= ulist.begin(); iter != end; ++iter) {
ret.push_back(A[*iter]);
}
}
Here is the code for method 2:
template <typename T>
inline bool isInList(const std::deque< std::vector<T> > &A,
const std::vector<T> &b) {
typename std::deque<std::vector<T> >::const_iterator it;
const typename std::deque<std::vector<T> >::const_iterator end = A.end();
for(it = A.begin(); it != end; ++it) {
if(*it == b)
return true;
}
return false;
}
template <typename T>
void uniqueRows1(const std::vector<std::vector<T> > &A,
std::vector<std::vector<T> > &ret) {
typename std::deque<std::vector<T> > ulist;
typename std::vector<std::vector<T> >::const_iterator it = A.begin();
const typename std::vector<std::vector<T> >::const_iterator end = A.end();
ulist.push_back(*it);
for(++it; it != end; ++it) {
if(!isInList(ulist,*it)) {
ulist.push_back(*it);
}
}
ret.reserve(ulist.size());
for(size_t i = 0; i != ulist.size(); ++i) {
ret.push_back(ulist[i]);
}
}
You should also consider using hashing, it preserves row ordering and could be faster (amortized O(m*n) if alteration of the original is permitted, O(2*m*n) if a copy is required) than sort/unique -- especially noticeable for large matrices (on small matrices you are probably better off with Billy's solution since his requires no additional memory allocation to keep track of the hashes.)
Anyway, taking advantage of Boost.Unordered, here's what you can do:
#include <vector>
#include <boost/foreach.hpp>
#include <boost/ref.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/unordered_set.hpp>
namespace boost {
template< typename T >
size_t hash_value(const boost::reference_wrapper< T >& v) {
return boost::hash_value(v.get());
}
template< typename T >
bool operator==(const boost::reference_wrapper< T >& lhs, const boost::reference_wrapper< T >& rhs) {
return lhs.get() == rhs.get();
}
}
// destructive, but fast if the original copy is no longer required
template <typename T>
void uniqueRows_inplace(std::vector<std::vector<T> >& A)
{
boost::unordered_set< boost::reference_wrapper< std::vector< T > const > > unique(A.size());
for (BOOST_AUTO(it, A.begin()); it != A.end(); ) {
if (unique.insert(boost::cref(*it)).second) {
++it;
} else {
it = A.erase(it); // erase invalidates it; continue from the returned iterator
}
}
}
// returning a copy (extra copying cost)
template <typename T>
void uniqueRows_copy(const std::vector<std::vector<T> > &A,
std::vector< std::vector< T > > &ret)
{
ret.reserve(A.size());
boost::unordered_set< boost::reference_wrapper< std::vector< T > const > > unique;
BOOST_FOREACH(const std::vector< T >& row, A) {
if (unique.insert(boost::cref(row)).second) {
ret.push_back(row);
}
}
}
EDIT: I forgot std::vector already defines operator< and operator== so you need not even use that:
template <typename t>
std::vector<std::vector<t> > GetUniqueRows(std::vector<std::vector<t> > input)
{
std::sort(input.begin(), input.end());
input.erase(std::unique(input.begin(), input.end()), input.end());
return input;
}
Use std::unique in concert with a custom functor which calls std::equal on the two vectors.
std::unique only removes adjacent duplicates, so the input must be sorted first. Use a custom functor calling std::lexicographical_compare on the two input vectors. If you need to recover the original (unsorted) order of the output, you'll need to store the existing order somehow. This will achieve m*n log n complexity for the sort operation (where m is the length of the inner vectors and n is the number of inner vectors), while the std::unique call will take m*n time.
For comparison, both your existing approaches are m*n^2 time.
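One possible way to "store the existing order", sketched here under the assumption that rows are comparable with operator< and operator==, is to sort row indices instead of the rows and then restore index order at the end. unique_rows_keep_order is a name invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: unique rows in order of first appearance, found by
// sorting row indices rather than the rows themselves.
template <class T>
std::vector<std::vector<T>> unique_rows_keep_order(std::vector<std::vector<T>> const& rows) {
    std::vector<std::size_t> idx(rows.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});

    // Stable sort groups equal rows together, earliest index first in each group.
    std::stable_sort(idx.begin(), idx.end(),
                     [&](std::size_t a, std::size_t b) { return rows[a] < rows[b]; });

    // Keep only the first (earliest) index of each group of equal rows.
    idx.erase(std::unique(idx.begin(), idx.end(),
                          [&](std::size_t a, std::size_t b) { return rows[a] == rows[b]; }),
              idx.end());

    // Sorting the surviving indices restores the original row order.
    std::sort(idx.begin(), idx.end());

    std::vector<std::vector<T>> out;
    out.reserve(idx.size());
    for (std::size_t i : idx) out.push_back(rows[i]);
    return out;
}

inline void unique_rows_example() {
    std::vector<std::vector<int>> rows{{3, 4}, {1, 2}, {3, 4}, {0, 0}, {1, 2}};
    auto out = unique_rows_keep_order(rows);
    assert((out == std::vector<std::vector<int>>{{3, 4}, {1, 2}, {0, 0}}));
}
```

Only indices are moved during the sort, so the rows themselves are copied once, when the result is assembled.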
EDIT: Example:
template <typename t>
struct VectorEqual
{
bool operator()(const std::vector<t>& lhs, const std::vector<t>& rhs) const
{
if (lhs.size() != rhs.size()) return false;
return std::equal(lhs.begin(), lhs.end(), rhs.begin());
}
};
template <typename t>
struct VectorLess
{
bool operator()(const std::vector<t>& lhs, const std::vector<t>& rhs) const
{
return std::lexicographical_compare(lhs.begin(), lhs.end(), rhs.begin(), rhs.end());
}
};
template <typename t>
std::vector<std::vector<t> > GetUniqueRows(std::vector<std::vector<t> > input)
{
std::sort(input.begin(), input.end(), VectorLess<t>());
input.erase(std::unique(input.begin(), input.end(), VectorEqual<t>()), input.end());
return input;
}