C++: How does set know when two items are equal?

I have created a set of C-strings, supplying my own comparator function because I wanted it to take only the first three characters into account. Here's its definition:
struct set_object {
    bool operator()(const char* first, const char* second) const {
        return strncmp(first, second, 3) > 0;
    }
};
std::set<const char*, set_object> c_string_set;
It works as I wanted it to, sorting the strings as I add them the way I outlined in the set_object class. But the interesting part begins when I try to add a string that compares equal to one already added. For example, if I try to add "aaab" when there is already "aaa" in the set, it doesn't add it into the set. If I add "aaab" first, then try to add "aaa", it only lists "aaab". But how does it know when they are equal if I only provided a function that returns true when one of the strings is greater? It should return false when it's either equal or smaller!
To clarify, it's not a problem, just trying to figure out how C++ works.

You're right that set_object(x, y) returning false doesn't say whether x is less than y or they are equal. So set then calls set_object(y, x) to find out.

if (!less(first,second) && !less(second,first)) // equivalent!
If neither one is less than the other, they are equivalent (not equal, there's a very subtle difference).
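As a minimal sketch of that rule, the following reuses the question's comparator under a different name and spells out the equivalence test by hand. The names FirstThreeGreater, equivalent, insert_both and demo_equivalence are made up for illustration; the real std::set performs the same check internally while searching for the insertion point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <set>

// Orders C-strings by their first three characters only, as in the question.
struct FirstThreeGreater {
    bool operator()(const char* first, const char* second) const {
        return std::strncmp(first, second, 3) > 0;
    }
};

// Hypothetical helper: the equivalence relation derived from the comparator.
bool equivalent(const char* x, const char* y) {
    FirstThreeGreater comp;
    return !comp(x, y) && !comp(y, x);
}

// Hypothetical helper: inserts both strings and reports how many the set kept.
std::size_t insert_both(const char* x, const char* y) {
    std::set<const char*, FirstThreeGreater> s;
    s.insert(x);
    s.insert(y);  // rejected when y is equivalent to an element already stored
    return s.size();
}

// Usage sketch.
void demo_equivalence() {
    assert(equivalent("aaa", "aaab"));        // same first three characters
    assert(!equivalent("aaa", "bbb"));
    assert(insert_both("aaa", "aaab") == 1);  // second insert is a no-op
    assert(insert_both("aaab", "aaa") == 1);
    assert(insert_both("aaa", "bbb") == 2);
}
```

Note that "aaa" and "aaab" are equivalent under this ordering even though they are not equal as strings, which is the subtle difference mentioned above.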

If an item x is neither greater nor smaller than another item y, it means that x and y are treated as the same (equivalent) item.

The items are deemed equivalent if a<b and b<a are both false.
See http://www.sgi.com/tech/stl/StrictWeakOrdering.html.

Related

STL pair comparison - first elements

Can someone explain the meaning of this paragraph:
The great advantage of pairs is that they have built-in operations to compare themselves. Pairs are compared first-to-second element. If the first elements are not equal, the result will be based on the comparison of the first elements only; the second elements will be compared only if the first ones are equal. The array (or vector) of pairs can easily be sorted by STL internal functions.
and hence this
For example, if you want to sort the array of integer points so that they form a polygon, it’s a good idea to put them to the vector< pair<double, pair<int,int> > >, where each element of vector is { polar angle, { x, y } }. One call to the STL sorting function will give you the desired order of points.
I have been struggling for an hour to understand this.
Consider looking at operator< for pair<A,B>, which is a class that looks something like:
struct pairAB {
    A a;
    B b;
};
You could translate that paragraph directly into code:
bool operator<(const pairAB& lhs, const pairAB& rhs) {
    if (lhs.a != rhs.a) {      // If the first elements are not equal
        return lhs.a < rhs.a;  // the result will be based on
    }                          // the comparison of the first elements only
    return lhs.b < rhs.b;      // the second elements will be compared
                               // only if the first ones are equal.
}
Or, thinking more abstractly, this is how lexicographic sort works. Think of how you would order two words. You'd compare their first letters - if they're different, you can stop and see which one is less. If they're the same, then you go onto the second letter.
The first paragraph says that pairs have an ordering as follows: if you have (x, y) and (z, w), and you compare them, then it will first check if x is smaller (or larger) than z: if yes, then the first pair is smaller (or larger) than the second. If x = z, however, then it will compare y and w. This makes it very convenient to do things like sorting a vector of pairs when the first elements of the pairs are more important to the order than the second elements.
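A small sketch of that ordering in action, using std::pair<int, int> and the built-in operator<. The function names sorted_pairs and demo_sorted_pairs are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sorts pairs with the built-in operator<: by .first, then by .second on ties.
std::vector<std::pair<int, int>> sorted_pairs(std::vector<std::pair<int, int>> v) {
    std::sort(v.begin(), v.end());
    return v;
}

// Usage sketch.
void demo_sorted_pairs() {
    const std::vector<std::pair<int, int>> result =
        sorted_pairs({{2, 1}, {1, 9}, {2, 0}, {1, 3}});
    const std::vector<std::pair<int, int>> expected =
        {{1, 3}, {1, 9}, {2, 0}, {2, 1}};
    assert(result == expected);

    // The first elements decide whenever they differ...
    assert(std::make_pair(1, 9) < std::make_pair(2, 0));
    // ...and the second elements only break ties.
    assert(std::make_pair(2, 0) < std::make_pair(2, 1));
}
```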
The second paragraph gives an interesting application. Suppose you stand at some point on a plane, and there's a polygon enclosing you. Then each point will have an angle and a distance. But given the points, how do you know in what order should they be to form a polygon (without crisscrossing themselves)? If you store the points in this format (angle, distance), then you'll get the circling direction for free. That's actually rather neat.
The STL pair is a container to hold two objects together. Consider this for example,
pair<int, int> a, b; // element types chosen for illustration
The first element can be accessed via a.first and the second via a.second.
The first paragraph is telling us that the STL provides built-in operations to compare two pairs. For example, if you need to compare 'a' and 'b', the comparison is first done using a.first and b.first. If both values are the same, then the comparison is done using a.second and b.second. Since this is built-in functionality, you can easily use it with STL algorithms such as sort, binary_search, etc.
The second paragraph is an example of how this might be used. Consider a situation where you would want to sort the points in a polygon. You would first want to sort them based on their polar angle, then the x co-ordinate and then the y co-ordinate. Thus we make use of the pair {angle, {x,y}}. So any comparison would be first done on the angle, then advanced to the x value and then the y value.
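The polygon idea could be sketched roughly as follows. This assumes the angle is measured around the origin with std::atan2 and that the origin lies inside the polygon; the quoted tutorial does not say how the angle is computed, so that part, along with the names Point, AngledPoint, sort_by_polar_angle and demo_polar_sort, is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

typedef std::pair<int, int> Point;              // { x, y }
typedef std::pair<double, Point> AngledPoint;   // { polar angle, { x, y } }

// Returns the points ordered by polar angle around the origin (assumed
// reference point). Ties on the angle fall back to x, then y, because the
// nested pair is compared with the same built-in operator<.
std::vector<Point> sort_by_polar_angle(const std::vector<Point>& points) {
    std::vector<AngledPoint> keyed;
    for (const Point& p : points) {
        const double angle = std::atan2(static_cast<double>(p.second),
                                        static_cast<double>(p.first));
        keyed.push_back(AngledPoint(angle, p));
    }
    std::sort(keyed.begin(), keyed.end());  // one call, no custom comparator

    std::vector<Point> ordered;
    for (const AngledPoint& k : keyed) {
        ordered.push_back(k.second);
    }
    return ordered;
}

// Usage sketch: four points on the axes, given in scrambled order.
void demo_polar_sort() {
    const std::vector<Point> input = {{-1, 0}, {0, 1}, {0, -1}, {1, 0}};
    // atan2 yields -pi/2, 0, pi/2 and pi for these points, in that order.
    const std::vector<Point> expected = {{0, -1}, {1, 0}, {0, 1}, {-1, 0}};
    assert(sort_by_polar_angle(input) == expected);
}
```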
It is easier to understand with a simple example: pairs of last names and first names.
For example if you have pairs
{ Tomson, Ann }
{ Smith, Tony }
{ Smith, John }
and want to sort them in the ascending order you have to compare the pairs with each other.
If you compare the first two pairs
{ Tomson, Ann }
{ Smith, Tony }
then the last name of the first pair is greater than the last name of the second pair. So there is no need to compare also the first names. It is already clear that pair
{ Smith, Tony }
has to precede pair
{ Tomson, Ann }
On the other hand if you compare pairs
{ Smith, Tony }
{ Smith, John }
then the last names of the pairs are equal. So you need to compare the first names of the pairs. Since John is less than Tony, it is clear that pair
{ Smith, John }
will precede pair
{ Smith, Tony }
though the last names (the first elements of the pairs) are equal.
As for the pair { polar angle, { x, y } }: if the polar angles of two pairs are equal, then the { x, y } parts are compared, and these are pairs in turn. So if their first elements (x) are equal, then the y values are compared.
When you have a vector or array of pairs, you don't have to write any comparison logic yourself: just call sort(v.begin(), v.end()). The pairs are sorted by their first element, and pairs whose first elements are equal are ordered by their second element. See the code and output at https://ideone.com/Ad2yVG.

Is it possible to process equality in a std::set comparator?

I am sorry if the title isn't very descriptive, I was having a hard time figuring out how to name this question. This is pretty much the first time I need to use a set, though I've been using maps forever.
I don't think it is possible, but I need to ask. I would like to perform a specific action on a struct when I add it to my std::set, but only if equality is true.
For example, I can use a list and then sort() and unique() the list. In my predicate, I can do as I wish, since I will get the result if 2 values are equal.
Here is a quick example of what my list predicate looks like:
bool markovWeightOrdering (unique_ptr<Word>& w1, unique_ptr<Word>& w2) {
    if (w1->word_ == w2->word_) {
        w1->weight_++;
        return true;
    }
    return false;
}
Does anyone have an idea how to achieve a similar result, while using a std::set for the obvious gain in performance (and simplicity), since my container needs to be unique anyways? Thank you for any help or guidance, it is much appreciated.
Elements in a set are immutable, so you cannot modify them.
If you use a set of pointers (or similar), the pointed-to object may be modified (but take care not to change anything that affects the ordering). std::set::insert returns a pair of an iterator and a boolean that tells whether the element was inserted, so you may do something like:
auto p = s.insert(make_unique<Word>("test"));
if (p.second == false) {
    (*p.first)->weight += 1;
}
Manipulating a compare operator is likely a bad idea.
You might use a std::set with a predicate, instead:
struct LessWord
{
    bool operator () (const std::unique_ptr<Word>& w1, const std::unique_ptr<Word>& w2) const {
        return w1->key < w2->key;
    }
};
typedef std::set<std::unique_ptr<Word>, LessWord> word_set;
Then you test at insert whether the word already exists and, if so, increment the weight:
word_set words;
std::unique_ptr<Word> word_ptr;
// unique_ptr cannot be copied, so it has to be moved into the set
auto insert = words.insert(std::move(word_ptr));
if( ! insert.second)
    ++(insert.first->get()->weight_);
Note: Doing this is breaking const correctness, logically. A set element is immutable, but the unique_ptr enables modifications (even a fatal modification of key values).
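Putting the pieces together, a self-contained sketch might look like this. The Word layout is hypothetical (the question never shows it); the field names word_ and weight_ follow the question's predicate, and add_word and demo_word_set are made-up helpers. Making the key member const is one possible way to guard against the fatal key modification mentioned in the note.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>

// Hypothetical Word type; only word_ takes part in the ordering.
struct Word {
    explicit Word(std::string w) : word_(std::move(w)), weight_(1) {}
    const std::string word_;  // key: const, so the ordering cannot be broken
    int weight_;              // payload: safe to change while in the set
};

struct LessWord {
    bool operator()(const std::unique_ptr<Word>& a,
                    const std::unique_ptr<Word>& b) const {
        return a->word_ < b->word_;
    }
};

typedef std::set<std::unique_ptr<Word>, LessWord> WordSet;

// Inserts the word, or bumps the weight of the equivalent element that is
// already stored. Returns the weight after the call.
int add_word(WordSet& words, const std::string& text) {
    auto result = words.insert(std::unique_ptr<Word>(new Word(text)));
    if (!result.second) {
        ++(*result.first)->weight_;  // element existed: update it in place
    }
    return (*result.first)->weight_;
}

// Usage sketch.
void demo_word_set() {
    WordSet words;
    assert(add_word(words, "test") == 1);   // newly inserted
    assert(add_word(words, "test") == 2);   // duplicate: weight incremented
    assert(add_word(words, "other") == 1);
    assert(words.size() == 2);
}
```

The comparator stays a pure ordering function here; the "on equality" action lives at the insertion site, which keeps the set's invariants intact.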

Test if all elements of a vector are equal

I want to test if a non-empty vector contains identical elements. Is this the best way?
count(vecSamples.begin()+1, vecSamples.end(), vecSamples.front()) == vecSamples.size()-1;
In c++11 (or Boost Algorithm)
std::all_of(vecSamples.begin()+1,vecSamples.end(),
[&](const T & r) {return r==vecSamples.front();})
As @john correctly points out, your solution iterates over the entire container even if the first two elements are different, which is quite a waste.
How about a purely no-boost no c++11 required solution?
bool allAreEqual =
find_if(vecSamples.begin() + 1,
vecSamples.end(),
bind1st(not_equal_to<int>(), vecSamples.front())) == vecSamples.end();
Stops on first non-equal element found.
Just make sure your vecSamples is non-empty before running this.
Probably not, because it always examines all the elements of the vector even if the first two elements are different. Personally I'd just write a for loop.
If your vector contains at least one element:
std::equal(vecSamples.begin() + 1, vecSamples.end(), vecSamples.begin())
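Another variant, sketched here as an alternative rather than taken from the answers above, uses std::adjacent_find. It stops at the first pair of neighbours that differ and needs no non-empty precondition. The names all_equal and demo_all_equal are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// True when no two neighbouring elements differ. An empty or single-element
// vector counts as "all equal".
template <typename T>
bool all_equal(const std::vector<T>& v) {
    // adjacent_find returns the first position where the predicate holds for
    // two consecutive elements, or end() when there is no such position.
    return std::adjacent_find(v.begin(), v.end(), std::not_equal_to<T>()) == v.end();
}

// Usage sketch.
void demo_all_equal() {
    assert(all_equal(std::vector<int>{7, 7, 7}));
    assert(!all_equal(std::vector<int>{7, 8, 7}));
    assert(all_equal(std::vector<int>{5}));
    assert(all_equal(std::vector<int>()));
}
```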

String comparison return value (is it used in applications that sort characters?)

When we use strcmp(str1, str2); or str1.compare(str2); the return values are like -1, 0 and 1, for str1 < str2, str1 == str2 or str1 > str2 respectively.
The question is, is it defined like this for a specific reason?
For instance, in a binary tree sorting algorithm, we push smaller values to the left child and larger values to the right child. The strcmp and string::compare functions seem to be perfect for that. However, does anyone use string matching in order to sort a tree (integer indexes are easier to use)?
So, what is the actual purpose of the three return values (-1, 0, 1)? Why can't it just return 1 for true, and 0 for false?
Thanks
The purpose of having three return values is exactly what it seems like: to answer all questions about string comparisons at once.
Everyone has different needs. Some people sometimes need a simple less-than test; strcmp provides this. Some people need equality testing; strcmp provides this. Some people really do need to know the full relationship between two strings; strcmp provides this.
What you absolutely don't want is someone writing this:
if(strless(lhs, rhs))
{
}
else if(strequal(lhs, rhs))
{
}
That's doing two potentially expensive comparison operations. strless also knows if they were equal, because it had to get to the end of both strings to return that it was not less.
Oh, and FYI: the return value isn't necessarily -1 or +1; it's less than zero, greater than zero, or zero if the strings are equal.
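A brief sketch of using a single strcmp call to answer all three questions, relying only on the sign of the result as described above. The Ordering enum and the compare_once and demo_compare_once names are made up for illustration.

```cpp
#include <cassert>
#include <cstring>

enum Ordering { Less, Equal, Greater };

// Classifies two C-strings with one strcmp call. Only the sign is inspected,
// since the magnitude of the return value is not specified.
Ordering compare_once(const char* lhs, const char* rhs) {
    const int r = std::strcmp(lhs, rhs);
    if (r < 0) return Less;
    if (r > 0) return Greater;
    return Equal;
}

// Usage sketch.
void demo_compare_once() {
    assert(compare_once("apple", "banana") == Less);
    assert(compare_once("banana", "apple") == Greater);
    assert(compare_once("apple", "apple") == Equal);
    assert(compare_once("app", "apple") == Less);  // a proper prefix sorts first
}
```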
It's useful for certain cases where knowing all three cases is important. Use operator< for string when you just care about a boolean comparison.
It could, but then you would need multiple functions for sorting and comparison. With strcmp() returning smaller, equal or bigger, you can use them easily for comparison and for sorting.
Remember that BSTs are not the only place where you would like to compare strings. You might want to sort a name list or similar. Also, it is not uncommon to have a string as key in a tree too.
As others have stated, there are real purposes for comparison of strings with < > == implications. For example, fixed-length numbers stored as strings will compare correctly, e.g. "312235423" > "312235422". On some occasions this is useful.
However, the true/false behaviour you're asking for still works with the given return values: any non-zero result, negative or positive, converts to true, and zero converts to false. Note that this means a true result signals that the strings differ.
if (-1)
{
// resolves true
}
else if (1)
{
// also resolves true
}
else if (0)
{
// resolves false
}

Is it possible to construct an "infinite" string?

Is there any real sequence of characters that always compares greater than any other string?
My first thought was that a string constructed like so:
std::basic_string<T>(std::string::max_size(), std::numeric_limits<T>::max())
Would do the trick, provided that the fact that it would almost definitely fail to work isn't such a big issue. So I presume this kind of hackery could only be accomplished in Unicode, if it can be accomplished at all. I've never heard of anything that would indicate that it really is possible, but neither have I heard tell that it isn't, and I'm curious.
Any thoughts on how to achieve this without a possibly_infinite<basic_string<T>>?
I assume that you compare strings using their character value. I.e. one character acts like a digit, a longer string is greater than shorter string, etc.
Is there any real sequence of characters that always compares greater than any other string?
No, because:
Let's assume there is a string s that is always greater than any other string.
If you make a copy of s, the copy will be equal to s. Equal means "not greater". Therefore there can be a string that is not greater than s.
If you make a copy of s and append one character at the end, it will be greater than original s. Therefore there can be a string that is greater than s.
Which means, it is not possible to make s.
I.e.
A string s that is always greater than any other string cannot exist. A copy of s (copy == other string) will be equal to s, and "equal" means "not greater".
A string s that is always greater or equal to any other string, can exist if a maximum string size has a reasonable limit. Without a size limit, it will be possible to take a copy of s, append one character at the end, and get a string that is greater than s.
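The append-one-character step of the argument can be checked directly with std::string, whose operator> compares lexicographically. This is only a sketch; make_greater and demo_make_greater are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Builds a string that compares greater than s by appending one character.
// Any character works, because s is then a proper prefix of the result.
std::string make_greater(const std::string& s) {
    return s + 'a';
}

// Usage sketch: even a string made of "large" characters can be beaten.
void demo_make_greater() {
    const std::string big(8, '\x7f');
    assert(make_greater(big) > big);
    assert(make_greater(std::string()) > std::string());
    assert(!(big > big));  // a copy is equal, hence not greater
}
```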
In my opinion, the proper solution would be to introduce some kind of special string object that represents an infinitely "large" string, and write a comparison operator for that object and the standard string. In this case you may also need a custom string class.
It is possible to make a string that is always less than or equal to any other string. A zero-length string is exactly that: always smaller than anything else, and equal to other zero-length strings.
Or you could write a counter-intuitive comparison routine where a shorter string is greater than a longer string, but in that case the next code maintainer will hate you, so it is not a good idea.
Not sure why would you ever need something like that, though.
You probably need a custom comparator, for which you define a magic "infinite string" value and which will always treat that value as greater than any other.
Unicode solves a lot of problems, but not that one. Unicode is just a different encoding for a character, 1, 2 or 4 bytes, they are still stored in a plain array. You can use infinite strings when you find a machine with infinite memory.
Yes. How you do it, I have no idea :)
You should try to state what you intend to achieve and what your requirements are. In particular, does it have to be a string? is there any limitation on the domain? do they need to be compared with <?
You can use a non-string type:
struct infinite_string {};
bool operator<( std::string const & , infinite_string const & ) {
return true;
}
bool operator<( infinite_string const &, std::string const & ) {
return false;
}
If you can use std::lexicographical_compare and you don't need to store it as a string, then you can write an infinite iterator:
template <typename CharT>
struct infinite_iterator
{
    CharT operator*() const { return std::numeric_limits<CharT>::max(); }
    infinite_iterator& operator++() { return *this; }
    // never equal to the "end" iterator, so the sequence never terminates
    bool operator!=( const infinite_iterator& ) const { return true; }
    bool operator==( const infinite_iterator& ) const { return false; }
    // all other stuff to make it proper
};
assert( std::lexicographical_compare( str.begin(), str.end(),
    infinite_iterator<char>(), infinite_iterator<char>() ) );
If you can use any other comparison functor and your domain has some invalid value, you can use that to your advantage:
namespace detail {
// assume that "\0\0\0\0\0" is not valid in your domain
std::string const infinite( 5, 0 );
}
bool compare( std::string const & lhs, std::string const & rhs ) {
if ( lhs == detail::infinite ) return false;
if ( rhs == detail::infinite ) return true;
return lhs < rhs;
}
If you need an artificial bound within a space of objects that isn't bounded, the standard trick is to add an extra element and define a new comparison operator that enforces your property.
Or implement lazy strings.
Well if you were to dynamically construct a string of equal length as the one that you are comparing to and fill it with the highest ASCII code available (7F for normal ASCII or FF for extended) you would be guaranteed that this string would compare equal to or greater than the one you compare it to.
What's your comparator?
Based on that, you can construct something that is the 'top' of your lattice.
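As a rough sketch of the "extra element" approach suggested in several answers above, one could wrap the string in a small type that carries a flag for the artificial top element. Everything here (BoundedString, ordinary, top_element, demo_bounded_string) is hypothetical and assumes plain lexicographic comparison for ordinary strings.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper: either an ordinary string or the artificial "top".
struct BoundedString {
    bool is_top;
    std::string value;  // ignored when is_top is true
};

BoundedString ordinary(const std::string& s) { return BoundedString{false, s}; }
BoundedString top_element() { return BoundedString{true, std::string()}; }

// Strict weak ordering: top is greater than every ordinary string and
// equivalent only to itself.
bool operator<(const BoundedString& lhs, const BoundedString& rhs) {
    if (lhs.is_top) return false;   // top is never less than anything
    if (rhs.is_top) return true;    // every ordinary string is below top
    return lhs.value < rhs.value;   // ordinary lexicographic comparison
}

// Usage sketch.
void demo_bounded_string() {
    assert(ordinary("zzz") < top_element());
    assert(!(top_element() < ordinary("zzz")));
    assert(!(top_element() < top_element()));  // irreflexive, as required
    assert(ordinary("a") < ordinary("b"));
}
```

Because the top element is a distinct value rather than a very long string, it needs no memory proportional to its "size" and works with std::set or std::sort through the same operator<.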