How does std::set comparator function work? - c++

Currently working on an algorithm problems using set.
set<string> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print mySet:
(())()
()()()
Ok great, as expected.
However if I put a comp function that sorts the set by its length, I only get 1 result back.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
return a.size()>b.size();
}
};
set<string, size_comp> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print myset
(())()
Can someone explain to me why?
I tried using a multi set, but its appending duplicates.
multiset<string,size_comp> mSet;
mSet.insert("(())()");
mSet.insert("()()()");
mSet.insert("()()()");
//print mset
"(())()","()()()","()()()"

std::set stores unique values only. Two values a,b are considered equivalent if and only if
!comp(a,b) && !comp(b,a)
or in everyday language, if a is not smaller than b and b is not smaller than a. In particular, only this criterion is used to check for equality, the normal operator== is not considered at all.
So with your comparator, the set can only contain one string of length n for every n.
If you want to allow multiple values that are equivalent under your comparison, use std::multiset. This will of course also allow exact duplicates, again, under your comparator, "asdf" is just as equivalent to "aaaa" as it is to "asdf".
If that does not make sense for your problem, you need to come up with either a different comparator that induces a proper notion of equality or use another data structure.
A quick fix to get the behavior you probably want (correct me if I'm wrong) would be introducing a secondary comparison criterion like the normal operator>. That way, we sort by length first, but are still able to distinguish between different strings of the same length.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
if (a.size() != b.size())
return a.size() > b.size();
return a > b;
}
};

The comparator template argument, which defaults to std::less<T>, must represent a strict weak ordering relation between values in its domain.
This kind of relation has some requirements:
it's not reflexive (x < x yields false)
it's asymmetric (x < y implies that y < x is false)
it's transitive (x < y && y < z implies x < z)
Taking this further we can define equivalence between values in term of this relation, because if !(x < y) && !(y < x) then it must hold that x == y.
In your situation you have that ∀ x, y such that x.size() == y.size(), then both comp(x,y) == false && comp(y,x) == false, so since no x or y is lesser than the other, then they must be equal.
This equivalence is used to determine if two items correspond to the same, thus ignoring second insertion in your example.
To fix this you must make sure that your comparator never returns false for both comp(x,y) and comp(y,x) if you don't want to consider x equal to y, for example by doing
auto cmp = [](const string& a, const string& b) {
if (a.size() != b.size())
return a.size() > b.size();
else
return std::less()(a, b);
}
So that for input of same length you fallback to normal lexicographic order.

This is because equality of elements is defined by the comparator. An element is considered equal to another if and only if !comp(a, b) && !comp(b, a).
Since the length of "(())()" is not greater, nor lesser than the length of "()()()", they are considered equal by your comparator. There can be only unique elements in a std::set, and an equivalent object will overwrite the existing one.
The default comparator uses operator<, which in the case of strings, performs lexicographical ordering.
I tried using a multi set, but its appending duplicates.
Multiset indeed does allow duplicates. Therefore both strings will be contained despite having the same length.

size_comp considers only the length of the strings. The default comparison operator uses lexicographic comparison, which distinguishes based on the content of the string as well as the length.

Related

Comparator for matching point in a range

I need to create a std::set of ranges for finding matching points in these ranges. Each range is defined as follows:
struct Range {
uint32_t start;
uint32_t end;
uint32_t pr;
};
In this structure start/end pair identify each range. pr identifies the priority of that range. It means if a single point falls into 2 different ranges, I like to return range with smaller pr. I like to create a std::set with a transparent comparator to match points like this:
struct RangeComparator {
bool operator()(const Range& l, const Range& r) const {
if (l.end < r.start)
return true;
if (l.end < r.end && l.pr >= r.pr)
return true;
return false;
}
bool operator()(const Range& l, uint32_t p) const {
if (p < l.start)
return true;
return false;
}
bool operator()(uint32_t p, const Range& r) const {
if (p < r.start)
return true;
return false;
}
using is_transparent = int;
};
std::set<Range, RangeComparator> ranges;
ranges.emplace(100,250,1);
ranges.emplace(200,350,2);
auto v1 = ranges.find(110); // <-- return range 1
auto v2 = ranges.find(210); // <-- return range 1 because pr range 1 is less
auto v3 = ranges.find(260); // <-- return range 2
I know my comparators are wrong. I wonder how I can write these 3 comparators to answer these queries correctly? Is it possible at all?
find returns an element that compares equivalent to the argument. Equivalent means that it compares neither larger nor smaller in the strict weak ordering provided to the std::set.
Therefore, to make your use case work, you want all points in a range to compare equivalent to the range.
If two ranges overlap, then the points shared by the two ranges need to compare equivalent to both ranges. The priority doesn't matter for this, since the equivalence should presumably hold if only one of the ranges is present.
However, one of the defining properties of a strict weak ordering is that the property of comparing equivalent is transitive. Therefore in this ordering the two ranges must then also compare equal in order to satisfy the requirements of std::set.
Therefore, as long as the possible ranges are not completely separated, the only valid strict weak ordering is the one that compares all ranges and points equivalent.
This is however not an order that would give you what you want.
This analysis holds for all standard library associative containers, since they have the same requirements on the ordering.

Sorting a vector that contains pair<x,y>

I have seen a code somewhere in which a guy did something like this
#define pp pair<int,int>
int main()
{
int n,i;
scanf("%d",&n);
vector<pp> G;
for(i=0;i<n;i++)
{
int x,y;
scanf("%d%d",&x,&y);
G.push_back(pp(x+y,x-y));
}
sort(G.begin(),G.end());
I want to know how sorting is done here. I mean to say on what parameter sorting is performed in a vector that contains pairs in it.
The sort function sort according to an comparaison function (which defines the order). One could want to specify which omparaison to use, however if the comparaison function is not specified (like in your case) then sort uses the defaut order being the < operator on the considered object
So in your case, ordering is possible because the comparaison operator is overloaded for std::pair.
The behaviour of those operator is described here : http://en.cppreference.com/w/cpp/utility/pair/operator_cmp
Therefor, when sorting the < operator on the std::pair class will be called, and your pairs are going to be ordered lexicographically
(0,1) < (0,2) < (1,0) < (1,2) < (2,7)
Check this link: http://www.cplusplus.com/reference/utility/pair/operators/
Similarly, operators <, >, <= and >= perform a lexicographical comparison on the sequence formed by members first and second.
So, a < b means (a.first < b.first) || (a.first == b.first && a.second < b.second)

using a custom comparator with std::set

I'm trying to create a list of words read from a file arranged by their length. For that, I'm trying to use std::set with a custom comparator.
class Longer {
public:
bool operator() (const string& a, const string& b)
{ return a.size() > b.size();}
};
set<string, Longer> make_dictionary (const string& ifile){
// produces a map of words in 'ifile' sorted by their length
ifstream ifs {ifile};
if (!ifs) throw runtime_error ("couldn't open file for reading");
string word;
set<string, Longer> words;
while (ifs >> word){
strip(word);
tolower(word);
words.insert(word);
}
remove_plurals(words);
if (ifs.eof()){
return words;
}
else
throw runtime_error ("input failed");
}
From this, I expect a list of all words in a file arranged by their length. Instead, I get a very short list, with exactly one word for each length occurring in the input:
polynomially-decidable
complexity-theoretic
linearly-decidable
lexicographically
alternating-time
finite-variable
newenvironment
documentclass
binoppenalty
investigate
usepackage
corollary
latexsym
article
remark
logic
12pt
box
on
a
Any idea of what's going on here?
With your comparator, equal-length words are equivalent, and you can't have duplicate equivalent entries in a set.
To maintain multiple words, you should modify your comparator so that it also performs, say, a lexicographic comparison if the lengths are the same.
Your comparator only compares by length, that means that equally-sized but different strings are treated as being equivalent by std::set. (std::set treats them equally if neither a < b nor b < a are true, with < being your custom comparator function.)
That means your comparator should also consider the string contents to avoid this situation. The keyword here is lexicographic comparison, meaning you take multiple comparison criteria in account. The first criterion would be your string length, and the second would be the string itself. An easy way to write lexicographic comparison is to make use of std::tuple which provides a comparison operator performing lexicographic comparison on the components by overloading the operator<.
To make your "reverse" ordering of length, which you wrote with operator>, compatible with the usually used operator<, simply take the negative size of the strings, i.e. first rewrite a.size() > b.size() as -a.size() < -b.size(), and then compose it with the string itself into tuples, finally compare the tuples with <:
class Longer {
public:
bool operator() (const string& a, const string& b)
{
return std::make_tuple(-a.size(), a )
< std::make_tuple(-b.size(), b );
// ^^^^^^^^^ ^^^
// first second
// criterion criterion
}
};

multimap with custom keys - comparison function

bool operator<(const Binding& b1, const Binding& b2)
{
if(b1.r != b2.r && b1.t1 != b2.t1)
{
if(b1.r != b2.r)
return b1.r < b2.r;
return b1.t1 < b2.t1;
}
return false;
}
I have a comparison function like above. Basically, I need to deem the objects equal if one of their attribute matches. I am using this comparison function for my multimap whose key is 'Binding' object.
The problem I face is that lower_bound and upper_bound functions return the same iterator which points to a valid object. For example (t1 = 1, r = 2) is already in the map and when I try to search it in the map with (t1 = 1, r = 2), I get a same iterator as return value of upper_bound and lower_bound functions.
Is anything wrong with the comparison function? Is there a way to figure a function where I can still ensure that the objects are equivalent even if just one of their field matches?
Shouldn't the upper_bound iterator return the object past the
The comparator for a map or multimap is expected to express a strict weak ordering relation between the set of keys. Your requirement "two objects are equivalent if just one of their fields matches" cannot be such a relation. Take these three keys:
1: r=1, t1=10
2: r=1, t1=42
3: r=2, t1=42
clearly, keys 1 and 2 are equivalent, because they have the same r. Likewise, 2 and 3 are equivalent because of the same t1. That means, that 1 and 3 have to be equivalent as well, although they have no matching fields.
As a corollary, all possible keys have to be equivalent under these circumstances, which means you dont have any ordering at all and a multimap is not the right way to go.
For your case, Boost.MultiIndex comes to mind. You could then have two separate indices for r and t1 and do your lower_bound, upper_bound and equal_range searches over both indices separately.
Your comparision function after removing redundant code can be re-written as
bool operator<(const Binding& b1, const Binding& b2)
{
if(b1.r != b2.r && b1.t1 != b2.t1)
{
//if(b1.r != b2.r) // always true
return b1.r < b2.r;
//return b1.t1 < b2.t1; // Never reached
}
return false;
}
Or by de-morgan's law
bool operator<(const Binding& b1, const Binding& b2)
{
if(b1.r == b2.r || b1.t1 == b2.t1) return false;
else return b1.r < b2.r;
}
This does not guarantee a < c if a < b and b < c
Ex: Binding(r, t): a(3, 5), b(4, 6), c(5, 5)
If your comparision function doesn't follow above crieteria, you may get strange results. (including infinite loops in some cases if library is not robust)
Your comparison function will return false if either the rs or the ts match because of the && in the if() clause. Did you mean || ? Replacing the && with || would give you a valid comparison function which compare first by the r field then the t field.
Note that std::pair already has a comparison function that does exactly that.
Your text below your code though states:
Basically, I need to deem the objects equal if one of their attribute matches
You cannot do that as it wouldn't be transitive (thus you wouldn't have strict ordering).
The inside of your if block has another if that is certain to be true, as the && clause means both sides are true.

How does STL map know, that map contains a given element?

In question Using char as key in stdmap there is advised to use custom compare function/functor:
struct cmp_str
{
bool operator()(char const *a, char const *b)
{
return std::strcmp(a, b) < 0;
}
};
map<char *, int, cmp_str> BlahBlah;
This allows map to detect if key A is less than key B. But for example map<>::find() returns end if element is not found, and iterator to it if it is found. So map knows about equivalence, not only less-than. How?
The equality condition for two keys a and b are that a<b and b<a are both false. The map itself is commonly implemented as a balanced binary tree*, so the less-than comparison is used to traverse the map from the root node until the matching element is found. When searching for a key k, less-than comparison is used until the first element for which the comparison is false is found. If the inverse comparison is also false, k has been found. Otherwise, k is not in the map. The map only uses the less-than comparison to this purpose.
Note also that std::set uses exactly the same mechanism, the only difference being that each element is it's own key.
* strictly speaking, the C++ standard does not specify that std::map be a balanced binary tree, but the complexity constraints it places on operations such as insertion and look-up mean that implementations chose structures such as red-black tree.
Equivalence/operator== can be expressed as a function of operator<:
bool operator==(T left, T right) {
return !(left < right) && !(right < left);
}
This is because the comparator of the map must implement a strict weak ordering, such as <.
One of the mathematical properties of such a relation is Antisymmetry, which states that for any x and y, then not (x < y) and not (y < x) implies x == y.
Therefore, after finding the first element not to compare smaller than the key you are searching for, the implementation simply checks if that element compares greater, and it's neither smaller nor greater, then it must be equal.