Searching std::map in O(n) for a partial key - c++

I have a (C++ 14) map
using MyTuple = tuple<A, B, C>;
map<MyTuple, MyData>
where MyTuple has the obvious operator<() that compares first A, then B, then C.
I want to do an O(ln N) search for keys that match some constant (a, b). Obviously if any are present they will be consecutive. So basically I want
map<MyTuple, MyData> my_map = GetMyData();
tuple<A, B> my_key = make_tuple(a, b);
auto iter = my_map.lower_bound_if([my_key](const MyTuple& key) {
if (get<0>(my_key) == get<0>(key) &&
get<1>(my_key) == get<1>(key)) {
return true;
}
return false;
});
while( /* iter.first is still an (a,b) */) {
// Do something with iter.
// Increment iter.
}
But there's no function map::lower_bound_if and yet map::find and map::lower_bound take a full MyTuple. I could (hackishly) find a value of C that is lower than anything in my data, though this is fragile over time. I could write the function myself though it would likely be dependent on my current local implementation of std::map.
Have I missed an obvious solution?
Update
The solution, which I've accepted, was to use partial key matching in the compare function (transparent operator functors), which are new in C++14 and took me longer than I care to admit to understand. This article was a good mental ice breaker and this SO question was good once I understood.
The basic insight is to consider a set of objects sorted on some key that is part of the object. For example, a set of employees with the sort key being employee.id. And we'd like to be able to search on employee or on the integer id. So we make a struct with several bool operator()() overloads that cover the various ways we might want to compare. And then overload resolution does the rest.
In my case, this meant that I could provide the strict total ordering I needed to make my code work but also provide a comparator that only imposes a partial ordering which I only use for lower_bound() lookups. Because the extra comparator doesn't provide a strict total ordering, it would not be suitable for, say, find() unless we meant "find one of".
As an aside, I realised after posing my question that it was a bit daft in a way. I wanted an O(ln n) lookup but wanted to use a different sort function. That can't be guaranteed to work: it depends on me providing a sort function that really does provide an ordering that is a sub-ordering of the strict total ordering. If I did otherwise, it would clearly fail. And so this is why there's no O(ln n) function find_if(), because that can only be linear.
Indeed, the technique of transparent operator functors is clever but does depend on the programmer providing no worse than a subordering.
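The employee example from the update can be sketched roughly as follows. This is only an illustration: the Employee fields, the ById comparator and the name_for_id helper are names made up here, not taken from the linked article.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical record type; the field names are assumptions for illustration.
struct Employee {
    int id;
    std::string name;
};

// One comparator, three overloads: overload resolution picks the right one.
struct ById {
    using is_transparent = void;  // opts in to heterogeneous lookup (C++14)
    bool operator()(const Employee& l, const Employee& r) const { return l.id < r.id; }
    bool operator()(const Employee& l, int r) const { return l.id < r; }
    bool operator()(int l, const Employee& r) const { return l < r.id; }
};

// Returns the name stored under `id`, or an empty string if it is absent.
std::string name_for_id(const std::set<Employee, ById>& staff, int id) {
    auto it = staff.find(id);  // searches by int; no temporary Employee is built
    return it == staff.end() ? std::string() : it->name;
}
```

Because the mixed overloads order an int against an Employee consistently with the full ordering, the O(log n) lookup remains valid.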

In C++14 you can overload the comparator's operator() to search on a partial key:
struct CompareFirstTwo {
using is_transparent = void;
bool operator()(const tuple<A, B, C>& lhs, const tuple<A, B, C>& rhs) const ...
bool operator()(const tuple<A, B>& lhs, const tuple<A, B, C>& rhs) const ...
bool operator()(const tuple<A, B, C>& lhs, const tuple<A, B>& rhs) const ...
};
Use the comparator above in a call to equal_range to ignore the third field in the tuple.
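A fuller sketch of this idea, assuming int for A, B and C and std::string for MyData (the question does not say what these types are). The names CompareWithPrefix and values_with_prefix are invented for the example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

using Key = std::tuple<int, int, int>;  // stand-in for tuple<A, B, C>
using Prefix = std::tuple<int, int>;    // stand-in for tuple<A, B>

struct CompareWithPrefix {
    using is_transparent = void;  // enables the templated lookup overloads
    // Full keys: ordinary lexicographic order, used to arrange the map.
    bool operator()(const Key& l, const Key& r) const { return l < r; }
    // Mixed comparisons look only at the first two fields.
    bool operator()(const Prefix& l, const Key& r) const {
        return l < std::tie(std::get<0>(r), std::get<1>(r));
    }
    bool operator()(const Key& l, const Prefix& r) const {
        return std::tie(std::get<0>(l), std::get<1>(l)) < r;
    }
};

using MyMap = std::map<Key, std::string, CompareWithPrefix>;

// Collects the values of every entry whose key starts with (a, b).
std::vector<std::string> values_with_prefix(const MyMap& m, int a, int b) {
    std::vector<std::string> out;
    auto range = m.equal_range(Prefix(a, b));  // O(log n) to locate the run
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}
```

The matching entries come back in key order, so within one (a, b) prefix they are sorted by C.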

One approach is to put "any value you like" for C, use lower_bound, and be aware that the values you are looking for might be before the lower_bound as well as after it, so you might need some operator-- calls to find the first one, just as you would use operator++ to find the last one. The number of operations needed to find the range in advance does not change, but there is an up-front overhead if you wanted to iterate over them and test for the end of <A,B> on the fly.
Obviously, it would be convenient, but not necessary, if that C was a "hacky" value that compared as the lowest possible C, as that would make the backward search faster, but this way we have mitigated your risk of fragility.
Another option would be to make your own container that is a map of maps. The outer map would be indexed by <A,B> and the inner map indexed by < C >.
You could then add your own methods to implement a whole-map indexing and iteration using <A,B,C>; and use the existing <A,B> methods to either return the inner map when indexed, or return your iterators as results for lower_bound and upper_bound, that can then be used as a range for all <A,B,allC>.
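A minimal sketch of the map-of-maps layout, again assuming int for A, B and C and std::string for MyData. The aliases and the count_under helper are hypothetical, and the whole-map iteration described above is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>

using Inner = std::map<int, std::string>;             // C -> MyData
using Outer = std::map<std::tuple<int, int>, Inner>;  // (A, B) -> inner map

// Number of entries stored under the prefix (a, b): one O(log n) outer lookup.
std::size_t count_under(const Outer& m, int a, int b) {
    auto it = m.find(std::make_tuple(a, b));
    return it == m.end() ? 0 : it->second.size();
}
```

The inner map found this way is already the complete range for that (a, b), so no comparator tricks are needed.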

If you have an order such that x {xa, xb, xc} is always < y {ya, yb, yc} when f(xa, xb) < f(ya, yb) (and conversely for >), it becomes easy. Consider it as "sort by A and B first".
Then you just need to know your maximal and minimal C value.
Search between lower_bound(a, b, MIN_C) and upper_bound(a, b, MAX_C). Alternatively, as suggested in the comments
A map of tuple<A, B> to map of C to MyData? – Vlad Feinstein
would work.
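The MIN_C / MAX_C bracketing could look like the sketch below. It assumes C is an int, so that std::numeric_limits supplies the extreme values; for other types suitable sentinels would have to be chosen by hand. count_with_prefix is a made-up helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <tuple>

using Key = std::tuple<int, int, int>;  // stand-in for tuple<A, B, C>
using MyMap = std::map<Key, std::string>;

// Counts the entries whose key starts with (a, b) by bracketing C.
std::size_t count_with_prefix(const MyMap& m, int a, int b) {
    auto first = m.lower_bound(Key(a, b, std::numeric_limits<int>::min()));
    auto last = m.upper_bound(Key(a, b, std::numeric_limits<int>::max()));
    std::size_t n = 0;
    for (auto it = first; it != last; ++it) {
        ++n;
    }
    return n;
}
```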

Related

Different types for `std::sort` comparator in C++

When we provide a comparator function for std::sort, we use the following overload:
template< class RandomIt, class Compare >
void sort( RandomIt first, RandomIt last, Compare comp );
in which the comparator function for std::sort should have the following syntax:
bool cmp(const Type1 &a, const Type2 &b);
But as you can see a and b may have different types. cppreference says:
The types Type1 and Type2 must be such that an object of type RandomIt
can be dereferenced and then implicitly converted to both of them.
But I still cannot understand exactly how we can have 2 different types in a single array when we try to sort it.
Is it possible for someone to provide a small example with different types for std::sort's comparator function?
It's not about what is stored in the array; only one type can ever be stored. It is about what the comparator function is. Take for example this:
struct Animal {};
struct Cat : Animal {};
struct Dog : Animal {};
struct Hound : Dog {};
bool cmp(const Animal &a, const Animal &b);
Even if you have a list of Dogs, Cats or Hounds you can still sort them with the function cmp because they are all implicitly convertible. ie.
std::vector<Hound> hounds;
... // fill hounds
std::sort(hounds.begin(), hounds.end(), cmp);
And you can even imagine cases where Type1 and Type2 are not the same, eg.:
bool cmp(const Animal &a, const Dog &b);
etc ...
Although this would be exceedingly rare.
The types Type1 (Animal) and Type2 (Dog) must be such that an object of type RandomIt (Hound) can be dereferenced and then implicitly converted to both of them. Which is true.
The point is that restricting the types a cmp function can take to be the same precludes generality. In some cases this is a good idea, but in this case it would be unreasonably strict and may cause problems for edge-case implementations. Furthermore, the cmp function used in std::sort is bound by the requirements set out for Compare (probably for simplicity). Compare requirements are used for all sorts of other things, like std::max.
But I still cannot get it exactly how we can have 2 different types in a single array when we try to sort it.
You can't have two different types in an array. The comparator doesn't suggest it's possible. It's specified like that simply because:
The code can be well formed when the types are not the same.
Demanding the same type is a restriction that serves little to no purpose.
So the specification offers a looser contract than is "obvious", in order to help our code be more flexible if needed. As a toy example, say we have this comparator lying around:
auto cmp(int a, long b) -> bool { return a < b; }
Why prevent us from using this perfectly legal (albeit silly) function to sort an array of integers?
But I still cannot get it exactly how we can have 2 different types in a single array when we try to sort it.
You can't.
But the requirements of Compare are not just for sorting arrays, or just for sorting at all!
They're for any time you want to compare one thing to another thing.
Is minutes(42) less than hours(1)? Yes! You may find useful a comparator for such occasions.
Compare is a more general concept that finds uses throughout the language.
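The minutes-versus-hours case can be written as a comparator whose two parameters really do have different types. The sketch below uses std::chrono; ShorterThan and first_not_shorter are names invented here. It relies on std::lower_bound only ever calling comp(element, value).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// A comparator whose two parameter types differ.
struct ShorterThan {
    bool operator()(std::chrono::minutes a, std::chrono::hours b) const {
        return a < b;  // chrono compares the two durations in a common unit
    }
};

// Index of the first entry of an ascending list that is not shorter than `limit`.
std::size_t first_not_shorter(const std::vector<std::chrono::minutes>& sorted,
                              std::chrono::hours limit) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), limit, ShorterThan{});
    return static_cast<std::size_t>(it - sorted.begin());
}
```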
Is it possible that someone provide a small example with different types for std::sort's comparator function
Others have shown examples that indicate how silly you have to get to find a "useful" example to use against std::sort specifically.
But it's not "std::sort's comparator function". It's a comparator function, which you just so happen to be using with std::sort.
It's true that, when doing so, you probably want the particular comparator that you pick to accept operands of the same type.
But I still cannot get it exactly how we can have 2 different types in a single array
You cannot have two different types in a single array.
An array can have objects of only a single type. But that single type must be implicitly convertible to both argument types of cmp.
Is it possible that someone provide a small example with different types for std::sort's comparator function?
Here you go:
int arr[] = {1, 2, 3, 0};
auto cmp = [](const int &a, const long &b) {
return a < b;
};
std::sort(std::begin(arr), std::end(arr), cmp);
Note the two different arguments of cmp. This is just a minimal example, which is technically correct, but admittedly nonsensical. Frankly, I've never encountered a case where it would be useful to have different types for the arguments of a comparison function.
The requirements for a comparator are far looser than you think:
It must accept two dereferenced iterators into the sequence as arguments.
Using an implicit conversion-sequence is fine.
The return-value must be contextually-convertible to bool.
An explicit conversion-operator works just fine.
It must be a copyable and nothrow-destructible complete type.
It must not modify the arguments, so it doesn't interfere with the calling algorithm.
That does not in any way imply the use of constant references if references are used at all.
It must induce a strict weak ordering (among other things, cmp(a, b) implies !cmp(b, a), and cmp(a, b) && cmp(b, c) implies cmp(a, c)).
So, a valid but fairly useless comparator would be:
template <class... X>
auto useless(X&&...) { return nullptr; }
The type requirements on Compare are not saying much about the elements of the sequence you are sorting, but instead they are allowing all comps for which
if (comp(*first, *other))
is valid.
Most of the time, Type1 will be equal to Type2, but they are not required to be equal.

Basic std set logic

This may be a dull question, but I want to be sure.
Let's say I have a struct:
struct A
{
int number;
bool flag;
bool operator<(const A& other) const
{
return number < other.number;
}
};
Somewhere in code:
A a1, a2, a3;
std::set<A> set;
a1.flag = true;
a1.number = 0;
a2.flag = false;
a2.number = 10;
a3 = a1;
set.insert(a1);
set.insert(a2);
if(set.find(a3) == set.end())
{
printf("NOT FOUND");
}
else
{
printf("FOUND");
}
The output I get is "FOUND". I understand that, since I am passing values, elements in set are compared by value. But how can objects A be compared by their values, since the equality operator is not overloaded? I don't understand how overloading operator '<' can be enough for the set's find function.
The ordered containers (set, multiset, map, multimap) use one single predicate to establish the element order and find values, the less-than predicate.
Two elements are considered equal if neither one is less-than the other.
This notion of "equality" may not be the same as some other notion of equality you may have. Sometimes the term "equivalent" is preferred to distinguish this notion that's induced by the less-than ordering from other, ad-hoc notions of equality that may exist simultaneously (e.g. an overloaded operator==).
For "sane" value types (also called regular types), ad-hoc equality and less-than-induced equivalence are required to be the same; many naturally occurring types are regular (e.g. arithmetic types (if NaNs are removed)). In other cases, especially if the less-than predicate is provided externally and not by the type itself, it's entirely possible that the less-than equivalence classes contain many non-"equal" values.
The flag member is entirely irrelevant here. The set has found an element that is equivalent to the searched-for value, with respect to <.
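The equivalence rule can be spelled out in a few lines. The struct below repeats the one from the question; the equivalent helper is added here purely for illustration.

```cpp
#include <cassert>
#include <set>

struct A {
    int number;
    bool flag;
    bool operator<(const A& other) const { return number < other.number; }
};

// Equivalence as the ordered containers see it: neither is less than the other.
bool equivalent(const A& x, const A& y) {
    return !(x < y) && !(y < x);
}
```

Two A objects with the same number but different flags are equivalent under this rule, so a std::set<A> treats them as the same key: find locates one when given the other, and inserting the second is rejected.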
That is, if a is not less than b, and b is not less than a, then a and b must be equal. This is how it works with normal integers. That is how it is decided 2 values are equivalent in a std::set.
std::set doesn't use == at all. (unordered_set, which is a hash set, does use it, because it's the only way to distinguish hash collisions).
You can also provide a function to do the work of <, but it must behave as a strict weak ordering. Which is a bit heavy on the maths, but basically you could use > instead, via std::greater, or define your own named function rather than defining operator<.
So there is nothing technically to stop you defining an operator== that behaves differently from the notion of equivalence that comes from your operator<, but std::set won't use it, and it would probably confuse people.
From the documentation of set
two objects a and b are considered equivalent (not unique) if neither
compares less than the other: !comp(a, b) && !comp(b, a)
http://en.cppreference.com/w/cpp/container/set
In the template you can see
template<
class Key,
class Compare = std::less<Key>,
class Allocator = std::allocator<Key>
> class set;
std::less
will call operator< and that is why it works.

Why C++ STL containers use "less than" operator< and not "equal equal" operator== as comparator?

While implementing a comparison operator inside a custom class for std::map, I came across this question and couldn't find it asked anywhere.
Apart from the above question, I am also interested to know, in brief, how operator< would work for std::map.
Origin of the question:
struct Address {
long m_IPv4Address;
bool isTCP;
bool operator< (const Address&) const; // trouble
};
std::map<K,D> needs to be able to sort. By default it uses std::less<K>, which for non-pointers uses < [1].
Using the rule that you demand the least you can from your users, it synthesizes "equivalence" from < when it needs it (!(a<b) && !(b<a) means a and b are equivalent, ie, neither is less than the other).
This makes it easier to write classes to use as key components for a map, which seems like a good idea.
There are std containers that use == such as std::unordered_map, which uses std::hash and ==. Again, they are designed so that they require the least from their users -- you don't need full ordering for unordered_ containers, just equivalence and a good hash.
As it happens, it is really easy to write a < if you have access to <tuple>.
struct Address {
long m_IPv4Address;
bool isTCP;
bool operator< (const Address& o) const {
return
std::tie( m_IPv4Address, isTCP )
< std::tie( o.m_IPv4Address, o.isTCP );
}
};
which uses std::tie defined in <tuple> to generate a proper < for you. std::tie takes a bunch of data, and generates a tuple of references, which has a good < already defined.
[1] For pointers, it uses some comparison that is compatible with < where < behaviour is specified, and behaves well when < does not. This only really matters on segmented memory models and other obscure architectures.
Because std::map is a sorted associative container, its keys need ordering.
An == operator would not allow you to order multiple keys.
You might be looking for std::unordered_map, which works as a hash table. You can specify your own hash and equality operator functions:
explicit unordered_map( size_type bucket_count,
const Hash& hash = Hash(),
const KeyEqual& equal = KeyEqual(),
const Allocator& alloc = Allocator() );
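For the Address struct from the question, the hash and equality functors might look like the sketch below. AddressHash, AddressEqual and AddressTable are hypothetical names, and the way the two field hashes are combined is just one simple choice, not a recommendation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

struct Address {
    long m_IPv4Address;
    bool isTCP;
};

// Equality over both fields; no ordering is needed for a hash table.
struct AddressEqual {
    bool operator()(const Address& a, const Address& b) const {
        return a.m_IPv4Address == b.m_IPv4Address && a.isTCP == b.isTCP;
    }
};

// Combines the two field hashes; an illustrative mixing scheme only.
struct AddressHash {
    std::size_t operator()(const Address& a) const {
        return std::hash<long>()(a.m_IPv4Address) * 31u + std::hash<bool>()(a.isTCP);
    }
};

using AddressTable =
    std::unordered_map<Address, std::string, AddressHash, AddressEqual>;
```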
With < you can order elements. If a < b then a should be placed before b in the collection.
You can also determine if two items are equivalent: if !(a < b) && !(b < a) (if neither object is smaller than the other), then they're equivalent.
Those two capabilities are all std::map requires. So it just expects its element type to provide an operator <.
With == you could determine equality, but you wouldn't be able to order elements. So it wouldn't satisfy the requirements of std::map.

How to std::find using a Compare object?

I am confused about the interface of std::find. Why doesn't it take a Compare object that tells it how to compare two objects?
If I could pass a Compare object I could make the following code work, where I would like to compare by value, instead of just comparing the pointer values directly:
typedef std::vector<std::string*> Vec;
Vec vec;
std::string* s1 = new std::string("foo");
std::string* s2 = new std::string("foo");
vec.push_back(s1);
Vec::const_iterator found = std::find(vec.begin(), vec.end(), s2);
// not found, obviously, because I can't tell it to compare by value
delete s1;
delete s2;
Is the following the recommended way to do it?
template<class T>
struct MyEqualsByVal {
const T& x_;
MyEqualsByVal(const T& x) : x_(x) {}
bool operator()(const T& y) const {
return *x_ == *y;
}
};
// ...
vec.push_back(s1);
Vec::const_iterator found =
std::find_if(vec.begin(), vec.end(),
MyEqualsByVal<std::string*>(s2)); // OK, will find "foo"
find can't be overloaded to take a unary predicate instead of a value, because it's an unconstrained template parameter. So if you called find(first, last, my_predicate), there would be a potential ambiguity whether you want the predicate to be evaluated on each member of the range, or whether you want to find a member of the range that's equal to the predicate itself (it could be a range of predicates, for all the designers of the standard libraries know or care, or the value_type of the iterator could be convertible both to the predicate type, and to its argument_type). Hence the need for find_if to go under a separate name.
find could have been overloaded to take an optional binary predicate, in addition to the value searched for. But capturing values in functors, as you've done, is such a standard technique that I don't think it would be a massive gain: it's certainly never necessary since you can always achieve the same result with find_if.
If you got the find you wanted, you'd still have to write a functor (or use boost), since <functional> doesn't contain anything to dereference a pointer. Your functor would be a little simpler as a binary predicate, though, or you could use a function pointer, so it'd be a modest gain. So I don't know why this isn't provided. Given the copy_if fiasco I'm not sure there's much value in assuming there are always good reasons for algorithms that aren't available :-)
Since your T is a pointer, you may as well store a copy of the pointer in the function object.
Other than that, that is how it is done and there's not a whole lot more to it.
As an aside, it's not a good idea to store bare pointers in a container, unless you are extremely careful with ensuring exception safety, which is almost always more hassle than it's worth.
That's exactly what find_if is for - it takes a predicate that is called to compare elements.
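Where C++11 lambdas are available, the hand-written functor from the question can be replaced by a capture. The sketch below assumes that; contains_value is a name made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns true if any pointer in `vec` points at a string equal to `wanted`.
bool contains_value(const std::vector<std::string*>& vec, const std::string& wanted) {
    auto it = std::find_if(vec.begin(), vec.end(),
                           [&wanted](const std::string* p) { return *p == wanted; });
    return it != vec.end();
}
```

The lambda compares the pointed-to strings, so two distinct pointers to "foo" count as a match.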

binary_search, find_if and <functional>

std::find_if takes a predicate in one of its overloaded functions. Binders make it possible to write EqualityComparators for user-defined types and use them either for dynamic comparison or static comparison.
In contrast the binary search functions of the standard library take a comparator and a const T& to the value that should be used for comparison. This feels inconsistent to me and could possibly be less efficient, as the comparator has to be called with both arguments every time instead of having the constant argument bound to it. While it could be possible to implement std::binary_search in a way to use std::bind, this would require all comparators to inherit from std::binary_function. Most code I've seen doesn't do that.
Is there a possible benefit from letting comparators inherit from std::binary_function when using it with algorithms that take a const T& as a value instead of letting me use the binders? Is there a reason for not providing predicate overloads in those functions?
A single-argument predicate version of std::binary_search wouldn't be able to complete in O(log n) time.
Consider the old game "guess the letter I'm thinking of". You could ask: "Is it A?" "Is it B?".. and so on until you reached the letter. That's a linear, or O(n), algorithm. But smarter would be to ask "Is it before M?" "Is it before G?" "Is it before I?" and so on until you get to the letter in question. That's a logarithmic, or O(log n), algorithm.
This is what std::binary_search does, and to do this it needs to be able to distinguish three conditions:
Candidate C is the searched-for item X
Candidate C is greater than X
Candidate C is less than X
A one-argument predicate P(x) says only "x has property P" or "x doesn't have property P". You can't get three results from this boolean function.
A comparator (say, <) lets you get three results by calculating C < X and also X < C. Then you have three possibilities:
!(C < X) && !(X < C): C is equal to X
C < X && !(X < C): C is less than X
!(C < X) && X < C: C is greater than X
Note that both X and C get bound to both parameters of < at different times, which is why you can't just bind X to one argument of < and use that.
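The three-way decision built from a single less-than comparator can be seen in a hand-rolled search. This is only a sketch to illustrate the point, not how any particular standard library implements std::binary_search; find_sorted is an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns the index of an element equivalent to x in an ascending range, or -1.
template <class T, class Less>
long find_sorted(const std::vector<T>& v, const T& x, Less less) {
    std::size_t lo = 0;
    std::size_t hi = v.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (less(v[mid], x)) {
            lo = mid + 1;                    // candidate is less than x
        } else if (less(x, v[mid])) {
            hi = mid;                        // candidate is greater than x
        } else {
            return static_cast<long>(mid);   // neither: equivalent to x
        }
    }
    return -1;
}
```

Note how x appears as the right-hand argument in one call and as the left-hand argument in the other.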
Edit: thanks to jpalecek for reminding me binary_search uses <, not <=.
Edit edit: thanks to Rob Kennedy for clarification.
They are completely different algorithms: find_if looks linearly for the first item for which the predicate is true, while binary_search takes advantage of the fact that the range is sorted to test in logarithmic time whether a given value is in it.
The predicate for binary_search specifies the function according to which the range is ordered (you'd most likely want to use the same predicate you used for sorting it).
You can't take advantage of the sortedness to search for a value satisfying some completely unrelated predicate (you'd have to use find_if anyway). Note however, that with a sorted range you can do more than just test for existence with lower_bound, upper_bound and equal_range.
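As a small illustration of going beyond an existence test, std::equal_range returns both ends of the run of equivalent elements. The helper name count_sorted is an assumption made for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Counts occurrences of `value` in an ascending range using O(log n) comparisons.
std::size_t count_sorted(const std::vector<int>& sorted, int value) {
    auto range = std::equal_range(sorted.begin(), sorted.end(), value);
    return static_cast<std::size_t>(range.second - range.first);
}
```

range.first is what lower_bound would return and range.second is what upper_bound would return.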
The question of what the purpose of std::binary_function is, is an interesting one.
All it does is provide typedefs for result_type, first_argument_type and second_argument_type. These would allow the users, given a functor as a template argument, to find out and use these types, e.g
template <class T, class BinaryFunction>
void foo(const T& a, const T& b, BinaryFunction f)
{
//declare a variable to store the result of the function call
typename BinaryFunction::result_type result = f(a, b);
//...
}
However, I think the only place where they are used in the standard library is creating other functor wrappers like bind1st, bind2nd, not1, not2. (If they were used for other purposes, people would yell at you any time you used a function as a functor since it would be an unportable thing to do.)
For example, binary_negate might be implemented as (GCC):
template<typename _Predicate>
class binary_negate
: public binary_function<typename _Predicate::first_argument_type,
typename _Predicate::second_argument_type, bool>
{
protected:
_Predicate _M_pred;
public:
explicit
binary_negate(const _Predicate& __x) : _M_pred(__x) { }
bool
operator()(const typename _Predicate::first_argument_type& __x,
const typename _Predicate::second_argument_type& __y) const
{ return !_M_pred(__x, __y); }
};
Of course, operator() could perhaps just be a template, in which case those typedefs would be unnecessary (any downsides?). There are probably also metaprogramming techniques to find out what the argument types are without requiring the user to typedef them explicitly. I suppose it would somewhat get into the way with the power that C++0x gives - e.g when I'd like to implement a negator for a function of any arity with variadic templates...
(IMO the C++98 functors are a bit too inflexible and primitive compared for example to std::tr1::bind and std::tr1::mem_fn, but probably at the time compiler support for metaprogramming techniques required to make those work was not that good, and perhaps the techniques were still being discovered.)
This is a misunderstanding of the Functor concept in C++.
It has nothing to do with inheritance. The property that makes an object a functor (eligible for passing to any of the algorithms) is validity of the expression object(x) or object(x, y), respectively, regardless whether it is a function pointer or an object with overloaded function call operator. Definitely not inheritance from anything. The same applies for std::bind.
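To make the point concrete, the sketch below passes a plain function and a class with operator() to std::sort; neither inherits from anything. All names here are made up for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// A plain function used as a comparator.
bool descending(int a, int b) { return a > b; }

// A function object with no base class at all.
struct Descending {
    bool operator()(int a, int b) const { return a > b; }
};

// Sorts a copy using the plain function.
std::vector<int> sorted_with_function(std::vector<int> v) {
    std::sort(v.begin(), v.end(), descending);
    return v;
}

// Sorts a copy using the function object.
std::vector<int> sorted_with_object(std::vector<int> v) {
    std::sort(v.begin(), v.end(), Descending());
    return v;
}
```

Both calls compile because the algorithm only requires that comp(x, y) be a valid expression.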
The use of binary functors as comparators comes from the fact that comparators (eg. std::less) are binary functors and it's good to be able to use them directly.
IMHO there would be no gain in providing or using the predicate version you propose (after all, it takes just passing one reference). There would be no (performance) gain in using binders, because it does the same thing as the algorithm (bind would pass the extra argument in lieu of the algorithm).