Making a comparator from an ordered container - c++

Given a list of objects, what is the cleanest way to create a functor object to act as a comparator, such that the comparator respects the ordering of the objects in the list? It is guaranteed that the objects in the list are unique, and the list contains the entire space of possible objects.
For example, suppose we have:
const std::vector<std::string> ordering {"dog", "cat", "mouse", "elephant"};
Now we want a function to act as a comparator, say for a map:
using Comparator = std::function<bool(const std::string&, const std::string&)>;
using MyMap = std::map<std::string, int, Comparator>;
I have a solution, but it's not what I'd call pretty:
const auto cmp = [&ordering] (const auto& lhs, const auto& rhs)
{
const std::array<std::reference_wrapper<const std::decay_t<decltype(lhs)>>, 2> values {lhs, rhs};
return *std::find_first_of(std::cbegin(ordering), std::cend(ordering),
std::cbegin(values), std::cend(values),
[] (const auto& lhs, const auto& rhs) {
return lhs == rhs.get();
}) == lhs;
};
Is there something a little less verbose?

You can use:
const std::vector<std::string> ordering {"dog", "cat", "mouse", "elephant"};
struct cmp
{
bool operator()(const std::string& lhs, const std::string& rhs) const
{
return (std::find(ordering.begin(), ordering.end(), lhs) <
std::find(ordering.begin(), ordering.end(), rhs));
}
};
using MyMap = std::map<std::string, int, cmp>;
See it working at http://ideone.com/JzTNwt.

Just skip the algorithms and write a for-loop:
auto cmp = [&ordering](auto const& lhs, auto const& rhs) {
for (auto const& elem : ordering) {
// check rhs first in case the two are equal
if (elem == rhs) {
return false;
}
else if (elem == lhs) {
return true;
}
}
return false;
};
It might technically be longer than your solution, but I find it way easier to read.
Alternatively, depending on the size of the ordering, you could throw the elements and their indices into a map:
std::unordered_map<std::string, int> as_map(std::vector<std::string> const& ordering)
{
std::unordered_map<std::string, int> m;
for (auto const& elem : ordering) {
m.emplace(elem, m.size());
}
return m;
}
auto cmp = [ordering_map = as_map(ordering)](auto const& lhs, auto const& rhs){
auto left = ordering_map.find(lhs);
auto right = ordering_map.find(rhs);
return left != ordering_map.end() && right != ordering_map.end() &&
left->second < right->second;
};

KISS. Build up your solution from reusable primitives with clear semantics.
order_by takes a projection A->B and returns an ordering on A using the ordering on B. It optionally takes an ordering on B (not used here):
template<class F, class Next=std::less<>>
auto order_by(F&& f, Next&& next = {}) {
return [f=std::forward<F>(f), next=std::forward<Next>(next)]
(auto&& lhs, auto&& rhs)->bool
{
return next(f(lhs), f(rhs));
};
}
index_in takes a container c and returns a function that takes an element, and determines its index in c:
template<class C>
auto index_in( C&& c ) {
return [c=std::forward<C>(c)](auto&& x){
using std::begin; using std::end;
return std::find( begin(c), end(c), x ) - begin(c);
};
}
template<class T>
auto index_in( std::initializer_list<T> il ) {
return [il](auto&& x){
using std::begin; using std::end;
return std::find( begin(il), end(il), x ) - begin(il);
};
}
We then compose them:
auto cmp = order_by( index_in(std::move(ordering) ) );
Each component can be independently tested and validated, and the composition "obviously" works. We can also rewrite index_in to use a faster lookup than linear, say a map from key to index, and we'd have two implementations that can unit test against each other.
I find order_by very often useful. index_in I've never had to use before.
This trick makes constructing the resulting map on one line, instead of storing cmp, practical, as the final description is short and clear.
template<class T>
using Comparator = std::function<bool(T const&, T const&)>;
using MyMap = std::map<std::string, int, Comparator<std::string>>;
MyMap m( order_by( index_in({"dog", "cat", "mouse", "elephant"}) ) ); // direct-init: map's comparator constructor is explicit
is also really pretty looking.
Here is a second approach.
We can move some of the work outside of the lambda.
template<class T0, class...Ts>
std::array< std::reference_wrapper<T0>, sizeof...(Ts)+1 >
make_ref_array( T0& t0, Ts&... ts ) {
return {std::ref(t0), std::ref(ts)...};
}
Then we can write the lambda from first principles instead of algorithms:
Comparator cmp = [&ordering] (const auto& lhs, const auto& rhs)
{
if (lhs==rhs) return false; // x is never less than x.
for (auto&& e:ordering)
for (auto&& z:make_ref_array(lhs, rhs))
if (e==z.get())
return std::addressof(z.get())==std::addressof(lhs);
return false;
};
The resulting lambda is a bit less verbose.
If you had range-based algorithms it might also help.
In both my solutions, all elements not in the list are greater than any element in the list, and are equal to each other.
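The faster-lookup variant of index_in mentioned above could look roughly like the sketch below. The names index_in_map and order_by_index are made up for illustration and are not part of the answer; the sketch assumes the container is a std::vector of hashable values. It keeps the property just described: anything not in the list maps to the list size, so it sorts after every listed element.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper (not from the answer): precomputes value -> position
// once, so each lookup is a hash probe instead of a linear scan.
template<class T>
auto index_in_map(const std::vector<T>& c) {
    std::unordered_map<T, std::size_t> positions;
    for (std::size_t i = 0; i < c.size(); ++i)
        positions.emplace(c[i], i);   // emplace keeps the first occurrence
    const std::size_t missing = c.size();
    return [positions = std::move(positions), missing](const T& x) {
        auto it = positions.find(x);
        // Unknown values map to c.size(), the same index the std::find-based
        // index_in yields when it reaches end().
        return it == positions.end() ? missing : it->second;
    };
}

// Simplified restatement of order_by so this sketch compiles on its own:
// compares two values by the index the projection f assigns to them.
template<class F>
auto order_by_index(F f) {
    return [f = std::move(f)](const auto& lhs, const auto& rhs) {
        return f(lhs) < f(rhs);
    };
}
```

With the ordering vector from the question, auto cmp = order_by_index(index_in_map(ordering)); gives a comparator that can be checked against the linear-search version in a unit test.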

I suggest you make the key a structure type containing the key (std::string in this case) and the index in the array.
Something like
struct Key
{
std::string str;
size_t index;
};
struct Comparator
{
bool operator()(const Key &a, const Key &b) const
{
return a.index < b.index;
}
};
// Comparator must be declared before it is used in the map type.
using MyMap = std::map<Key, int, Comparator>;
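One way the index might get filled in is sketched below. make_key, KeyLess and KeyMap are hypothetical names introduced for this example only; the lookup is done once when the key is created, so the comparator itself stays a plain integer comparison.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Key {
    std::string str;
    std::size_t index;
};

struct KeyLess {   // plays the role of Comparator in the answer
    bool operator()(const Key& a, const Key& b) const { return a.index < b.index; }
};

// Hypothetical helper: looks up the position once, at key creation time.
// A string that is not in `ordering` gets index ordering.size().
inline Key make_key(const std::vector<std::string>& ordering, std::string s) {
    const auto it = std::find(ordering.begin(), ordering.end(), s);
    const auto index = static_cast<std::size_t>(it - ordering.begin());
    return Key{std::move(s), index};
}

using KeyMap = std::map<Key, int, KeyLess>;
```

Note that with this comparator two different strings that are both missing from the ordering would count as the same key, which is fine only under the question's guarantee that the list covers every possible object.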

This is basically the same as R. Sahu's answer. But since I'd already typed it up... The primary thing I'm advocating here is keeping ordering internal to cmp, presuming that you don't need it externally.
const auto cmp = [ordering = vector<string>{ "dog", "cat", "mouse", "elephant" }](const auto& lhs, const auto& rhs) { return find(cbegin(ordering), cend(ordering), lhs) < find(cbegin(ordering), cend(ordering), rhs); };
Encapsulating the ordering within cmp makes the scope that a reader has to look at when changing ordering much smaller. (Personally I wouldn't construct cmp separately; I'd just dump it directly into the constructor for the same reason, though it is becoming a bit of a redonculous one-liner.)
map<string, int, function<bool(const string&, const string&)>> myMap([ordering = vector<string>{ "dog", "cat", "mouse", "elephant" }](const auto& lhs, const auto& rhs) { return find(cbegin(ordering), cend(ordering), lhs) < find(cbegin(ordering), cend(ordering), rhs); });

Related

Can I extend std::map::lower_bound to search on non-key_type arguments?

Here is an illustration of my situation. I have a std::map and I want to find the first pair<key,value> where the key is any member of an equivalence class of keys.
#include <map>
struct Category
{
int foo;
int bar;
bool operator < (const Category & rhs) const;
bool operator > (const Category & rhs) const;
};
struct Key
{
Category category;
float quality;
bool operator < (const Key & rhs) const
{
if (category < rhs.category)
return true;
else if (category > rhs.category)
return false;
else
return quality < rhs.quality;
}
};
struct Value {};
typedef std::map <Key, Value> Container;
Container::iterator find_low_quality
(
Container & container,
const Category & category
)
{
return container.lower_bound (category);
}
Container::iterator find_high_quality
(
Container & container,
const Category & category
)
{
// some checks need to be done, here omitted for brevity
return --container.upper_bound (category);
}
This doesn't work because map::lower_bound and map::upper_bound only take a key_type (i.e. Key) argument. I couldn't get std::lower_bound to compile; I see it expects a LegacyForwardIterator, but I'm having a hard time interpreting the spec for this.
Insofar as the Key for my map is ordered, the Key has a compatible ordering with Category, viz: k<c if and only if k.category<c, so my requirements seem to make logical sense.
In the real situation, the Key class is more complex, and separating the quality/category components (in order to use a map<category,map<quality,value>> solution) isn't really going to work, in case that's what you're thinking of.
How can I find the lower (and upper) bounds of the range of elements in my map whose keys are equivalent to some non-key value?
C++14 introduced the concept of a transparent comparator, where it is possible to use find, lower_bound, upper_bound, ... with anything that can be compared to the key type, as long as the comparator explicitly opts in to this behavior.
In your case you'd need to add a custom comparator
struct KeyComparator {
// opt into being transparent comparator
using is_transparent = void;
bool operator()(Key const& lhs, Key const& rhs) const {
return lhs < rhs;
}
bool operator()(Key const& lhs, Category const& rhs) const {
return lhs.category < rhs;
}
bool operator()(Category const& lhs, Key const& rhs) const {
return lhs < rhs.category;
}
};
and then you need to use that in your Container
typedef std::map <Key, Value, KeyComparator> Container;
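Put together, the lookup from the question might then be written as in the sketch below. Category's operator< is only declared in the question, so the lexicographic definition here is an assumption, and category_range is a hypothetical helper name.

```cpp
#include <iterator>
#include <map>
#include <tuple>
#include <utility>

// Assumed definition: the question only declares Category's operators.
struct Category {
    int foo;
    int bar;
    bool operator<(const Category& rhs) const {
        return std::tie(foo, bar) < std::tie(rhs.foo, rhs.bar);
    }
};

struct Key {
    Category category;
    float quality;
    bool operator<(const Key& rhs) const {
        if (category < rhs.category) return true;
        if (rhs.category < category) return false;
        return quality < rhs.quality;
    }
};

struct Value {};

struct KeyComparator {
    using is_transparent = void;   // opt in to heterogeneous lookup
    bool operator()(const Key& lhs, const Key& rhs) const { return lhs < rhs; }
    bool operator()(const Key& lhs, const Category& rhs) const { return lhs.category < rhs; }
    bool operator()(const Category& lhs, const Key& rhs) const { return lhs < rhs.category; }
};

using Container = std::map<Key, Value, KeyComparator>;

// Hypothetical helper: returns [first, last) covering every entry whose key
// belongs to `category`. Both calls take a Category, not a Key.
inline std::pair<Container::iterator, Container::iterator>
category_range(Container& container, const Category& category) {
    return { container.lower_bound(category), container.upper_bound(category) };
}
```

The lowest-quality entry of a category is then range.first, and the highest-quality one is std::prev(range.second), provided the range is not empty.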

Hashing std::vector independent of items order

I am looking for a hash function for std::vector, which would be independent from vector's item's ordering.
In other words I am looking for a hash implementation,
that would give me same result for
std::vector<int> v1{1,2,3};
std::vector<int> v2{2,3,1};
std::vector<int> v3{1,3,2};
Any ideas on how I might accomplish this?
template<template<class...>class element_hash=std::hash>
struct symmetric_range_hash {
template<class T>
std::size_t operator()( T const& t ) const {
std::size_t r = element_hash<int>{}(0); // seed with the hash of 0.
for (auto&& x:t) {
using element_type = std::decay_t<decltype(x)>;
auto next = element_hash<element_type>{}(x);
r = r + next;
}
return r;
}
};
That should do it. We gather the hashes via + which is symmetric.
+ is better than ^ because it takes longer to get a cycle. With ^, {1,1} and {2,2} would hash the same (and in general even numbers of anything "disappear"). With + they instead get multiplied.
So the end result is the sum, for each distinct value in the array, of the hash of that value times its count, mod "max(size_t)+1".
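The difference between the two combiners can be seen in a small sketch. xor_hash and sum_hash are names invented for this illustration; both are order-independent, but only the XOR version makes every pair of equal elements cancel out.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Combine element hashes with XOR: order-independent, but h ^ h == 0,
// so any value that appears an even number of times disappears.
inline std::size_t xor_hash(const std::vector<int>& v) {
    std::size_t r = 0;
    for (int x : v) r ^= std::hash<int>{}(x);
    return r;
}

// Combine element hashes with +: order-independent, and a value that
// appears n times contributes n * hash(value), wrapping modulo 2^N.
inline std::size_t sum_hash(const std::vector<int>& v) {
    std::size_t r = 0;
    for (int x : v) r += std::hash<int>{}(x);
    return r;
}
```

Here xor_hash({1,1}) and xor_hash({2,2}) are both 0 whatever std::hash returns, while sum_hash({1,1}) is twice the hash of 1.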
Note that an unordered_map requires both a hash and an equality predicate. If you want vectors that are permutations of each other to be treated as the same key, you also need to write an order-independent ==.
struct unordered_equal {
template<class C>
bool operator()(C const& lhs, C const& rhs)const {
using std::begin;
using K = std::decay_t< decltype(*begin(lhs)) >;
std::unordered_map< K, std::size_t > counts;
for (auto&& k : lhs) {
counts[k]++;
}
for (auto&& k : rhs) {
counts[k]--;
}
for (auto&& kv : counts)
if (kv.second != 0) return false;
return true;
}
};
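As a usage sketch, the two pieces plug into an unordered container as its hash and equality parameters. The version below is a simplified, non-generic restatement for std::vector<int> so that it compiles on its own; multiset_hash, multiset_equal and PermutationSet are hypothetical names, and the early size check is an addition not present in the answer.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct multiset_hash {
    std::size_t operator()(const std::vector<int>& v) const {
        std::size_t r = 0;
        for (int x : v) r += std::hash<int>{}(x);   // + is symmetric
        return r;
    }
};

struct multiset_equal {
    bool operator()(const std::vector<int>& lhs, const std::vector<int>& rhs) const {
        if (lhs.size() != rhs.size()) return false;  // cheap early exit
        std::unordered_map<int, std::ptrdiff_t> counts;
        for (int x : lhs) ++counts[x];
        for (int x : rhs) --counts[x];
        for (const auto& kv : counts)
            if (kv.second != 0) return false;        // some value is unbalanced
        return true;
    }
};

// A set in which vectors that are permutations of each other are one element.
using PermutationSet =
    std::unordered_set<std::vector<int>, multiset_hash, multiset_equal>;
```

Inserting {1,2,3}, {2,3,1} and {1,3,2} into a PermutationSet leaves a single element.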

Elegant way to provide flattening iterator for vector of vectors

I have an adapter whose goal is to provide a forward iterator over pair values pair<FeatureVector, Label>. However, in my internal representation I store the data as vector<pair<vector<string>, Label>>.
So during iteration I need to flatten it and convert every single string, which is a short sentence like "oil drops massively today", to a FeatureVector.
In raw variant I have something like:
{
{"Oil drops massively","OPEC surge oil produciton","Brent price goes up" -> "OIL_LABEL"},
{"France consume more vine", "vine production in Italy drops" -> "VINE_LABEL"}
}
and I need to convert it to:
{
vectorize("Oil drops massively") -> "OIL_LABEL",
vectorize("OPEC surge oil produciton") -> "OIL_LABEL", ... ,
vectorize("vine production in Italy drops") -> "VINE_LABEL"
}
vectorize() converts a sentence to a sparse vector, e.g. "Oil drops on NYSE" -> {0,1,0..0,1,0..0,1}
The simplest way would be to create a new vector, initialize it with all the data, and then use its iterators, but that is a pretty resource-heavy operation, so ideally I want the conversion to be done lazily on each iteration step. What is the most elegant way to do this kind of conversion?
This is a simplified version of the data structure for storing a text corpus. The iterators are later used to initialize a classifier, which requires two iterators, begin and end, logically similar to those of a vector.
A simple range type:
template<class It>
struct range_t {
It b{},e{};
It begin() const {return b;}
It end() const {return e;}
bool empty() const {return begin()==end();}
friend bool operator==(range_t lhs, range_t rhs){
if (lhs.empty() && rhs.empty()) return true;
return lhs.begin() == rhs.begin() && lhs.end() == rhs.end();
}
friend bool operator!=(range_t lhs, range_t rhs){
return !(lhs==rhs);
}
range_t without_front( std::size_t N = 1 ) const {
return { std::next(begin(), N), end() };
}
range_t without_back( std::size_t N = 1 ) const {
return { begin(), std::prev(end(),N) };
}
decltype(auto) front() const {
return *begin();
}
decltype(auto) back() const {
return *std::prev(end());
}
};
template<class It>
range_t<It> range( It b, It e ) {
return {b,e};
}
Here is a non-compliant pseudo-iterator that does the cross product of two ranges:
template<class ItA, class ItB>
struct cross_iterator_t {
range_t<ItA> cur_a;
range_t<ItB> orig_b;
range_t<ItB> cur_b;
cross_iterator_t() = default; // lets a value-initialized iterator serve as the end marker
cross_iterator_t( range_t<ItA> a, range_t<ItB> b ):
cur_a(a), orig_b(b), cur_b(b)
{}
bool empty() const { return cur_a.empty() || cur_b.empty(); }
void operator++(){
cur_b = cur_b.without_front();
if (cur_b.empty()) {
cur_a = cur_a.without_front();
if (cur_a.empty()) return;
cur_b = orig_b;
}
}
auto operator*()const {
return std::make_pair( cur_a.front(), cur_b.front() );
}
friend bool operator==( cross_iterator_t lhs, cross_iterator_t rhs ) {
if (lhs.empty() && rhs.empty()) return true;
auto mytie=[](auto&& self){
return std::tie(self.cur_a, self.cur_b);
};
return mytie(lhs)==mytie(rhs);
}
friend bool operator!=( cross_iterator_t lhs, cross_iterator_t rhs ) {
return !(lhs==rhs);
}
};
template<class Lhs, class Rhs>
auto cross_iterator( range_t<Lhs> a, range_t<Rhs> b )
-> cross_iterator_t<Lhs, Rhs>
{
return {a,b};
}
From this you can take std::vector<A>, B and do:
template<class A, class B>
auto cross_one_element( A& range_a, B& b_element ) {
auto a = range( std::begin(range_a), std::end(range_a) );
auto b = range( &b_element, (&b_element) +1 );
auto s = cross_iterator(a, b);
decltype(s) f{};
return range(s, f); // a range of cross iterators: [start, end marker)
}
So that solves one of your problems. The above needs to be fixed to support true input iterator features, not just the above pseudo-iterator that works with for(:).
Then you have to write code that takes a vector of X and transforms it into a range of f(X) for some function f.
Then you have to write code that takes a range of ranges, and flattens it into a range.
Each of these steps is no harder than above.
There are libraries that do this for you. Boost has some, Range-v3 has some, as do a pile of other range-manipulation libraries.
Boost even lets you write an iterator by specifying what to do on * and on next and on ==. Getting what to do when one of your sub-vectors is empty remains tricky, so using more generic algorithms in this case is probably wise.
The code above is not tested, and is C++14. C++11 versions are merely more verbose.
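For the specific layout in the question, a hand-written flattening iterator is also feasible. The following is only a sketch under several assumptions: FeatureVector and Label are stand-in types, vectorize is a placeholder that returns the sentence length, and flat_iterator, flat_begin and flat_end are hypothetical names. It uses indices instead of nested iterators, which keeps the empty sub-vector case simple.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

using Label = std::string;                 // stand-in type
using FeatureVector = std::vector<int>;    // stand-in type
using Corpus = std::vector<std::pair<std::vector<std::string>, Label>>;

// Placeholder for the real vectorize(): here just the sentence length.
inline FeatureVector vectorize(const std::string& sentence) {
    return { static_cast<int>(sentence.size()) };
}

class flat_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::pair<FeatureVector, Label>;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = value_type;          // values are computed on the fly

    flat_iterator(const Corpus& corpus, std::size_t outer)
        : corpus_(&corpus), outer_(outer) { skip_empty(); }

    // Converts the current sentence only when it is dereferenced.
    value_type operator*() const {
        const auto& group = (*corpus_)[outer_];
        return { vectorize(group.first[inner_]), group.second };
    }
    flat_iterator& operator++() {
        ++inner_;
        skip_empty();
        return *this;
    }
    flat_iterator operator++(int) { auto old = *this; ++*this; return old; }

    friend bool operator==(const flat_iterator& a, const flat_iterator& b) {
        return a.outer_ == b.outer_ && a.inner_ == b.inner_;
    }
    friend bool operator!=(const flat_iterator& a, const flat_iterator& b) {
        return !(a == b);
    }

private:
    // Move past exhausted or empty groups, so the iterator always rests on a
    // real sentence or on the end position (outer_ == size, inner_ == 0).
    void skip_empty() {
        while (outer_ < corpus_->size() &&
               inner_ >= (*corpus_)[outer_].first.size()) {
            ++outer_;
            inner_ = 0;
        }
    }
    const Corpus* corpus_;
    std::size_t outer_;
    std::size_t inner_ = 0;
};

inline flat_iterator flat_begin(const Corpus& c) { return { c, 0 }; }
inline flat_iterator flat_end(const Corpus& c) { return { c, c.size() }; }
```

A classifier that takes a begin/end pair could then be handed flat_begin(corpus) and flat_end(corpus); each pair<FeatureVector, Label> is built only when the iterator is dereferenced.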

Constructing a vector from the concatenation of 2 vectors

Is there a way to construct a vector as the concatenation of 2 vectors (Other than creating a helper function?)
For example:
const vector<int> first = {13};
const vector<int> second = {42};
const vector<int> concatenation = first + second;
I know that vector doesn't have an addition operator like string, but that's the behavior that I want. Such that concatenation would contain: 13 and 42.
I know that I can initialize concatenation like this, but it prevents me from making concatenation const:
vector<int> concatenation = first;
concatenation.insert(concatenation.end(), second.cbegin(), second.cend());
No, it's not possible if you require that (1) no helper function is defined, and (2) the resulting vector can be declared const.
template<typename T>
std::vector<T> operator+(const std::vector<T>& v1, const std::vector<T>& v2){
std::vector<T> vr(std::begin(v1), std::end(v1));
vr.insert(std::end(vr), std::begin(v2), std::end(v2));
return vr;
}
This does require a helper "function", but at least it allows you to use it as
const vector<int> concatenation = first + second;
I think you have to write a helper function. I'd write it as:
std::vector<int> concatenate(const std::vector<int>& lhs, const std::vector<int>& rhs)
{
auto result = lhs;
std::copy( rhs.begin(), rhs.end(), std::back_inserter(result) );
return result;
}
Then call it as:
const auto concatenation = concatenate(first, second);
If the vectors are likely to be very large (or contain elements that are expensive to copy), then you might need to do a reserve first to save reallocations:
std::vector<int> concatenate(const std::vector<int>& lhs, const std::vector<int>& rhs)
{
std::vector<int> result;
result.reserve( lhs.size() + rhs.size() );
std::copy( lhs.begin(), lhs.end(), std::back_inserter(result) );
std::copy( rhs.begin(), rhs.end(), std::back_inserter(result) );
return result;
}
(Personally, I would only bother if there was evidence it was a bottleneck).
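If the objection is only to a named helper, an immediately invoked lambda is one way to keep the result const. Whether that still counts as "a helper function" is debatable; the sketch below just shows the idiom, wrapped in a hypothetical demo_concatenation function so it can be called.

```cpp
#include <vector>

// Hypothetical wrapper, present only so the sketch can be exercised.
inline std::vector<int> demo_concatenation() {
    const std::vector<int> first = {13};
    const std::vector<int> second = {42};

    // The lambda is invoked on the spot; its result initializes the const.
    const std::vector<int> concatenation = [&] {
        std::vector<int> result;
        result.reserve(first.size() + second.size());
        result.insert(result.end(), first.begin(), first.end());
        result.insert(result.end(), second.begin(), second.end());
        return result;
    }();

    return concatenation;
}
```

The mutable vector lives only inside the lambda body, so nothing outside it can modify concatenation afterwards.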
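class Vector : public vector<int>
{
public:
using vector<int>::vector; // inherit the constructors
Vector operator+(const Vector& vec) const;
};
Vector Vector::operator+(const Vector& vec) const
{
Vector result(*this); // copy, so the left operand is not modified
result.insert(result.end(), vec.begin(), vec.end());
return result;
}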
Let me preface this by saying this is a hack, and will not give an answer to how to do this using a vector. Instead we'll depend on sizeof(int) == sizeof(char32_t) and use a u32string to contain our data.
This answer makes it exceedingly clear that only primitives can be used in a basic_string, and that any primitive larger than 32-bits would require writing a custom char_traits, but for an int we can just use u32string.
The qualification for this can be validated by doing:
static_assert(sizeof(int) == sizeof(char32_t));
Once size equality has been established, and with the knowledge that things like non-const data, and emplace or emplace_back cannot be used, u32string can be used like a vector<int>, with the notable inclusion of an addition operator:
const vector<int> first = {13};
const vector<int> second = {42};
const u32string concatenation = u32string(first.cbegin(), first.cend()) + u32string(second.cbegin(), second.cend());
I came across this question looking for the same thing, and hoping there was an easier way than the one I came up with... seems like there isn't.
So, some iterator trickery should do it if you don't mind a helper template class:
#include <vector>
#include <iostream>
template<class T>
class concat
{
public:
using value_type = typename std::vector<T>::const_iterator::value_type;
using difference_type = typename std::vector<T>::const_iterator::difference_type;
using reference = typename std::vector<T>::const_iterator::reference;
using pointer = typename std::vector<T>::const_iterator::pointer;
using iterator_category = std::forward_iterator_tag;
concat(
const std::vector<T>& first,
const std::vector<T>& last,
const typename std::vector<T>::const_iterator& iterator) :
mFirst{first},
mLast{last},
mIterator{iterator}{}
bool operator!= ( const concat& i ) const
{
return mIterator != i.mIterator;
}
concat& operator++()
{
++mIterator;
if(mIterator==mFirst.end())
{
mIterator = mLast.begin();
}
return *this;
}
reference operator*() const
{
return *mIterator;
}
private:
const std::vector<T>& mFirst;
const std::vector<T>& mLast;
typename std::vector<T>::const_iterator mIterator;
};
int main()
{
const std::vector<int> first{0,1,2,3,4};
const std::vector<int> last{5,6,7,8,9};
const std::vector<int> concatenated(
concat<int>(first,last,first.begin()),
concat<int>(first,last,last.end()));
for(auto i: concatenated)
{
std::cout << i << std::endl;
}
return 0;
}
You may have to implement operator++(int) or operator== depending on how your STL implements the InputIterator constructor; this is the minimal iterator code I could come up with for MinGW GCC.
Have Fun! :)

Chaining of ordering predicates (e.g. for std::sort)

You can pass a function pointer, function object (or boost lambda) to std::sort to define a strict weak ordering of the elements of the container you want sorted.
However, sometimes (enough that I've hit this several times), you want to be able to chain "primitive" comparisons.
A trivial example would be if you were sorting a collection of objects that represent contact data. Sometimes you will want to sort by last name, first name, area code. Other times first name, last name - yet other times age, first name, area code... etc
Now, you can certainly write an additional function object for each case, but that violates the DRY principle - especially if each comparison is less trivial.
It seems like you should be able to write a hierarchy of comparison functions - the low level ones do the single, primitive, comparisons (e.g. first name < first name), then higher level ones call the lower level ones in succession (probably chaining with && to make use of short circuit evaluation) to generate the composite functions.
The trouble with this approach is that std::sort takes a binary predicate - the predicate can only return a bool. So if you're composing them you can't tell if a "false" indicates equality or greater than. You can make your lower level predicates return an int, with three states - but then you would have to wrap those in higher level predicates before they could be used with std::sort on their own.
In all, these are not insurmountable problems. It just seems harder than it should be - and certainly invites a helper library implementation.
Therefore, does anyone know of any pre-existing library (esp. if it's a std or boost library) that can help here - or have any other thoughts on the matter?
[Update]
As mentioned in some of the comments - I've gone ahead and written my own implementation of a class to manage this. It's fairly minimal, and probably has some issues with it in general. But on that basis, for anyone interested, the class is here:
http://pastebin.com/f52a85e4f
And some helper functions (to avoid the need to specify template args) are here:
http://pastebin.com/fa03d66e
You could build a little chaining system like so:
struct Type {
string first, last;
int age;
};
struct CmpFirst {
bool operator () (const Type& lhs, const Type& rhs) { return lhs.first < rhs.first; }
};
struct CmpLast {
bool operator () (const Type& lhs, const Type& rhs) { return lhs.last < rhs.last; }
};
struct CmpAge {
bool operator () (const Type& lhs, const Type& rhs) { return lhs.age < rhs.age; }
};
template <typename First, typename Second>
struct Chain {
Chain(const First& f_, const Second& s_): f(f_), s(s_) {}
bool operator () (const Type& lhs, const Type& rhs) {
if(f(lhs, rhs))
return true;
if(f(rhs, lhs))
return false;
return s(lhs, rhs);
}
template <typename Next>
Chain <Chain, Next> chain(const Next& next) const {
return Chain <Chain, Next> (*this, next);
}
First f;
Second s;
};
struct False { bool operator() (const Type& lhs, const Type& rhs) { return false; } };
template <typename Op>
Chain <False, Op> make_chain(const Op& op) { return Chain <False, Op> (False(), op); }
Then to use it:
vector <Type> v; // fill this baby up
sort(v.begin(), v.end(), make_chain(CmpLast()).chain(CmpFirst()).chain(CmpAge()));
The last line is a little verbose, but I think it's clear what's intended.
One conventional way to handle this is to sort in multiple passes and use a stable sort. Notice that std::sort is generally not stable. However, there’s std::stable_sort.
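The multi-pass idea can be sketched as follows, assuming a hypothetical Contact record. The passes run from the least significant key to the most significant one, and stability is what preserves the earlier passes' order among ties.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Contact {            // hypothetical record type
    std::string first, last;
    int age;
};

// Sort by last name, then first name, then age: apply the passes from the
// least significant key to the most significant one.
inline void sort_by_last_first_age(std::vector<Contact>& v) {
    std::stable_sort(v.begin(), v.end(),
        [](const Contact& a, const Contact& b) { return a.age < b.age; });
    std::stable_sort(v.begin(), v.end(),
        [](const Contact& a, const Contact& b) { return a.first < b.first; });
    std::stable_sort(v.begin(), v.end(),
        [](const Contact& a, const Contact& b) { return a.last < b.last; });
}
```

Each primitive comparison stays trivial, at the cost of sorting the whole container once per key.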
That said, I would write a wrapper around functors that return a tristate (representing less, equals, greater).
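A minimal sketch of such a wrapper, assuming a hypothetical Person type: Tristate, by and ChainedLess are illustrative names, and std::function is used only to keep the example short.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Person { std::string first, last; int age; };   // hypothetical

// A tristate comparison returns <0, 0 or >0, like strcmp.
using Tristate = std::function<int(const Person&, const Person&)>;

// Hypothetical helper: builds a tristate comparison from a key projection.
template<class Proj>
Tristate by(Proj proj) {
    return [proj](const Person& a, const Person& b) {
        if (proj(a) < proj(b)) return -1;
        if (proj(b) < proj(a)) return 1;
        return 0;
    };
}

// Wraps a sequence of tristate comparisons as a bool predicate for std::sort.
class ChainedLess {
public:
    explicit ChainedLess(std::vector<Tristate> steps) : steps_(std::move(steps)) {}
    bool operator()(const Person& a, const Person& b) const {
        for (const auto& step : steps_) {
            const int r = step(a, b);
            if (r != 0) return r < 0;   // the first non-tie decides
        }
        return false;                    // all keys tie: a is not less than b
    }
private:
    std::vector<Tristate> steps_;
};
```

Because each step reports a tie explicitly, the wrapper never has to call a comparison twice to tell "equal" from "greater".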
You can try this:
Usage:
struct Citizen {
std::wstring iFirstName;
std::wstring iLastName;
};
ChainComparer<Citizen> cmp;
cmp.Chain<std::less>( boost::bind( &Citizen::iLastName, _1 ) );
cmp.Chain<std::less>( boost::bind( &Citizen::iFirstName, _1 ) );
std::vector<Citizen> vec;
std::sort( vec.begin(), vec.end(), cmp );
Implementation:
template <typename T>
class ChainComparer {
public:
typedef boost::function<bool(const T&, const T&)> TComparator;
typedef TComparator EqualComparator;
typedef TComparator CustomComparator;
template <template <typename> class TComparer, typename TValueGetter>
void Chain( const TValueGetter& getter ) {
iComparers.push_back( std::make_pair(
boost::bind( getter, _1 ) == boost::bind( getter, _2 ),
boost::bind( TComparer<typename TValueGetter::result_type>(), boost::bind( getter, _1 ), boost::bind( getter, _2 ) )
) );
}
bool operator()( const T& lhs, const T& rhs ) {
BOOST_FOREACH( const auto& comparer, iComparers ) {
if( !comparer.first( lhs, rhs ) ) {
return comparer.second( lhs, rhs );
}
}
return false;
}
private:
std::vector<std::pair<EqualComparator, CustomComparator>> iComparers;
};
std::sort is not guaranteed to be stable because stable sorts are usually slower than non-stable ones ... so using a stable sort multiple times looks like a recipe for performance trouble...
And yes, it's really a shame that sort asks for a predicate.
I see no other way than to create a functor accepting a vector of tristate functions ...
The chaining solution is verbose. You could also use boost::bind in conjunction with std::logical_and to build your sorting predicate. See the linked article for more information: How the boost bind library can improve your C++ programs
Variadic templates in C++11 give a shorter option:
#include <iostream>
using namespace std;
struct vec { int x,y,z; };
struct CmpX {
bool operator() (const vec& lhs, const vec& rhs) const
{ return lhs.x < rhs.x; }
};
struct CmpY {
bool operator() (const vec& lhs, const vec& rhs) const
{ return lhs.y < rhs.y; }
};
struct CmpZ {
bool operator() (const vec& lhs, const vec& rhs) const
{ return lhs.z < rhs.z; }
};
template <typename T>
bool chained(const T &, const T &) {
return false;
}
template <typename CMP, typename T, typename ...P>
bool chained(const T &t1, const T &t2, const CMP &c, P...p) {
if (c(t1,t2)) { return true; }
if (c(t2,t1)) { return false; }
else { return chained(t1, t2, p...); }
}
int main(int argc, char **argv) {
vec x = { 1,2,3 }, y = { 2,2,3 }, z = { 1,3,3 };
cout << chained(x,x,CmpX(),CmpY(),CmpZ()) << endl;
return 0;
}
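To feed this into std::sort, the variadic function can be wrapped in a lambda. The sketch below restates chained so it compiles on its own; the type is renamed to vec3 and sort_xyz is a hypothetical helper, neither of which appears in the answer.

```cpp
#include <algorithm>
#include <vector>

struct vec3 { int x, y, z; };   // same fields as `vec` above

struct CmpX { bool operator()(const vec3& a, const vec3& b) const { return a.x < b.x; } };
struct CmpY { bool operator()(const vec3& a, const vec3& b) const { return a.y < b.y; } };
struct CmpZ { bool operator()(const vec3& a, const vec3& b) const { return a.z < b.z; } };

// Base case: no comparators left, so the two values tie.
template <typename T>
bool chained(const T&, const T&) { return false; }

// Try the first comparator in both directions; fall through on a tie.
template <typename T, typename Cmp, typename... Rest>
bool chained(const T& a, const T& b, const Cmp& c, const Rest&... rest) {
    if (c(a, b)) return true;
    if (c(b, a)) return false;
    return chained(a, b, rest...);
}

// Hypothetical helper: sorts by x, then y, then z.
inline void sort_xyz(std::vector<vec3>& v) {
    std::sort(v.begin(), v.end(), [](const vec3& a, const vec3& b) {
        return chained(a, b, CmpX(), CmpY(), CmpZ());
    });
}
```

Reordering or dropping comparators in the lambda changes the sort key without writing a new functor.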