How are strings compared in IF statement arguments - c++

In the if statement's condition, how are the strings s and t compared? Why is the condition (s > t) true?
string s = "to be";
string t = "not " + s; // t = “not to be”
string u = s + " or " + t; // u = “to be or not to be”
if (s > t) // true: “to be” > “not to be”
cout << u; // outputs “to be or not to be”

From the documentation of std::string's comparison operators:
All comparisons are done via the compare() member function (which itself is defined in terms of Traits::compare()):
Two strings are equal if both the size of lhs and rhs are equal and each character in lhs has equivalent character in rhs at the same position.
The ordering comparisons are done lexicographically -- the comparison is performed by a function equivalent to std::lexicographical_compare or std::lexicographical_compare_three_way (since C++20).
So, in short, it does a lexicographical comparison.
I.e. "to be"s > "not to be"s is true because, at the first position, 't' > 'n'.

The comparison of std::string was designed to be not surprising, or at least minimally surprising. If you stick to lowercase letters and spaces, as in your example, operator< and operator> follow alphabetical ordering.
not to be
to be
to be or not to be
Since you are sticking to the simple case, string{"to be"} > string{"not to be"} because they are in reverse alphabetical order. That is, 't' > 'n' (as characters).
When you expand into other characters, there might be some surprises. For example, 'Z' < 'a', since ASCII puts capital letters before lowercase letters. Still, the principle holds: the ordering of std::string is based on the ordering of the underlying character set. Look for the first character position where the strings differ; the strings are ordered the same as the characters in that position. If one (and only one) string runs out of characters before a difference is found, then the shorter string comes before the longer one.

Related

How does std::set comparator function work?

I'm currently working on an algorithm problem using set.
set<string> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print mySet:
(())()
()()()
Ok great, as expected.
However if I put a comp function that sorts the set by its length, I only get 1 result back.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
return a.size()>b.size();
}
};
set<string, size_comp> mySet;
mySet.insert("(())()");
mySet.insert("()()()");
//print myset
(())()
Can someone explain to me why?
I tried using a multiset, but it's appending duplicates.
multiset<string,size_comp> mSet;
mSet.insert("(())()");
mSet.insert("()()()");
mSet.insert("()()()");
//print mset
"(())()","()()()","()()()"
std::set stores unique values only. Two values a,b are considered equivalent if and only if
!comp(a,b) && !comp(b,a)
or in everyday language, if a is not smaller than b and b is not smaller than a. In particular, only this criterion is used to check for equality, the normal operator== is not considered at all.
So with your comparator, the set can only contain one string of length n for every n.
If you want to allow multiple values that are equivalent under your comparison, use std::multiset. This will of course also allow exact duplicates: again, under your comparator, "asdf" is just as equivalent to "aaaa" as it is to "asdf".
If that does not make sense for your problem, you need to either come up with a different comparator that induces a proper notion of equality or use another data structure.
A quick fix to get the behavior you probably want (correct me if I'm wrong) would be introducing a secondary comparison criterion like the normal operator>. That way, we sort by length first, but are still able to distinguish between different strings of the same length.
struct size_comp
{
bool operator()(const string& a, const string& b) const{
if (a.size() != b.size())
return a.size() > b.size();
return a > b;
}
};
The comparator template argument, which defaults to std::less<T>, must represent a strict weak ordering relation between values in its domain.
This kind of relation has some requirements:
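it's irreflexive (x < x yields false)
it's asymmetric (x < y implies that y < x is false)
it's transitive (x < y && y < z implies x < z)
equivalence is transitive too (if x and y are equivalent, and y and z are equivalent, then x and z are equivalent)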
Taking this further, we can define equivalence between values in terms of this relation: if !(x < y) && !(y < x), then x and y are considered equivalent, and the container treats equivalent values as the same value, whatever x == y would say.
In your situation, for all x, y such that x.size() == y.size(), both comp(x, y) and comp(y, x) are false, so since neither is less than the other, they are equivalent.
This equivalence is used to determine whether two items are the same key, which is why the second insertion in your example is ignored.
To fix this you must make sure that your comparator never returns false for both comp(x,y) and comp(y,x) if you don't want to consider x equal to y, for example by doing
auto cmp = [](const string& a, const string& b) {
    if (a.size() != b.size())
        return a.size() > b.size();
    else
        return std::less<>()(a, b);
};
// usage: std::set<string, decltype(cmp)> s(cmp);
That way, strings of the same length fall back to normal lexicographic order.
This is because equality of elements is defined by the comparator. An element is considered equal to another if and only if !comp(a, b) && !comp(b, a).
Since the length of "(())()" is neither greater nor less than the length of "()()()", they are considered equivalent by your comparator. A std::set can contain only unique elements, so inserting an equivalent object does not replace the existing one; the new object is simply not inserted.
The default comparator uses operator<, which in the case of strings, performs lexicographical ordering.
I tried using a multiset, but it's appending duplicates.
Multiset indeed does allow duplicates. Therefore both strings will be contained despite having the same length.
size_comp considers only the length of the strings. The default comparison operator uses lexicographic comparison, which distinguishes based on the content of the string as well as the length.

can not understand how pairs in c++ stl work?

While reading a tutorial on Topcoder I came across a statement
Pairs are compared first-to-second element. If the first elements are not equal, the result will be based on the comparison of the first elements only; the second elements will be compared only if the first ones are equal.
I cannot understand what this statement is trying to say.
Consider pairs of std::pair<int, int>
std::pair<int, int> a = {1,1};
std::pair<int, int> b = {1,3};
std::pair<int, int> c = {3,2};
To determine a < c we can look at the first item and see that 1 < 3. We don't even need to consider the second element at this point.
But to determine a < b, both first items are 1, so we must then look at the second item to see that 1 < 3.
If you compare b < c, you will find that b has a smaller first element, but c has a smaller second element. Since the first element takes precedence, b will be considered smaller.
So if you were to sort these pairs, they would be arranged
a < b < c
It means the following expression
( p1.first == p2.first ) && ( p1.second == p2.second )
If subexpression
( p1.first == p2.first )
is equal to false, then subexpression
( p1.second == p2.second )
will not be evaluated because it is already clear that the whole expression will be equal to false.
That is, the comparison of pairs corresponds to the evaluation of the logical AND operator, which according to the C++ Standard is evaluated as follows:
The && operator groups left-to-right. The operands are both contextually converted to bool (Clause 4). The result is true if both operands are true and false otherwise. Unlike &, && guarantees left-to-right evaluation: the second operand is not evaluated if the first operand is false.
Let's say you have a list of names of persons which you want to sort alphabetically. Assume also that every name has only a first name and a last name.
How would you sort?
You just compare the first names. Only when the first names are the same do you check the last names.

using a custom comparator with std::set

I'm trying to create a list of words read from a file arranged by their length. For that, I'm trying to use std::set with a custom comparator.
class Longer {
public:
bool operator() (const string& a, const string& b)
{ return a.size() > b.size();}
};
set<string, Longer> make_dictionary (const string& ifile){
// produces a map of words in 'ifile' sorted by their length
ifstream ifs {ifile};
if (!ifs) throw runtime_error ("couldn't open file for reading");
string word;
set<string, Longer> words;
while (ifs >> word){
strip(word);
tolower(word);
words.insert(word);
}
remove_plurals(words);
if (ifs.eof()){
return words;
}
else
throw runtime_error ("input failed");
}
From this, I expect a list of all words in a file arranged by their length. Instead, I get a very short list, with exactly one word for each length occurring in the input:
polynomially-decidable
complexity-theoretic
linearly-decidable
lexicographically
alternating-time
finite-variable
newenvironment
documentclass
binoppenalty
investigate
usepackage
corollary
latexsym
article
remark
logic
12pt
box
on
a
Any idea of what's going on here?
With your comparator, equal-length words are equivalent, and you can't have duplicate equivalent entries in a set.
To maintain multiple words, you should modify your comparator so that it also performs, say, a lexicographic comparison if the lengths are the same.
Your comparator only compares by length, which means that strings of equal size but different content are treated as equivalent by std::set. (std::set treats a and b as equivalent if neither a < b nor b < a is true, with < being your custom comparator function.)
That means your comparator should also consider the string contents to avoid this situation. The keyword here is lexicographic comparison, meaning you take multiple comparison criteria into account, in order. The first criterion would be your string length, and the second would be the string itself. An easy way to write a lexicographic comparison is to use std::tuple, whose operator< compares the components lexicographically.
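To make your "reverse" ordering of length, which you wrote with operator>, compatible with the usual operator<, take the negative size of the strings: first rewrite a.size() > b.size() as -a.size() < -b.size(), then combine it with the string itself into tuples, and finally compare the tuples with <. Because size() returns an unsigned type, cast it to a signed type before negating; otherwise the negation wraps around and an empty string would sort first instead of last:
class Longer {
public:
    bool operator() (const string& a, const string& b) const
    {
        // std::cref (from <functional>) avoids copying the strings into the tuples
        return std::make_tuple(-static_cast<long long>(a.size()), std::cref(a))
             < std::make_tuple(-static_cast<long long>(b.size()), std::cref(b));
        //                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^
        //                     first criterion                   second criterion
    }
};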

Sorting a vector of ints that have been converted into strings

So I need to sort a vector of strings in numerical order. I am using the sort function and it almost works. Say I have the numbers 10, 20, 5, 200, 50, 75 that have been converted to strings. The sort function sorts them like so: 10, 20, 200, 5, 50, 75. So it is only sorting by the first character, I suppose? Is there an easy way to get it to sort by more than the first character? And yes, they must be converted to strings for my particular use.
Thanks!
Look at the following piece of code:
#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
int main()
{
std::vector<std::string> v {"123", "453", "78", "333"};
std::sort(std::begin(v), std::end(v), [] (std::string const &A, std::string const &B) { return std::stoi(A) < std::stoi(B);});
for(auto i : v) std::cout << i << std::endl;
}
The question is really why you want to sort this after it became a vector of strings and not before that.
The simplest way to sort a vector of strings holding ints might be to convert them to ints, sort those, and then convert them back to strings in the original vector. In your case it would be even more efficient not to convert to strings in the first place.
As for the suggestion to convert to int on the fly inside the comparator, that is going to be expensive: comparing two ints is trivial compared with converting a string to an int. Sorting performs O(N log N) comparisons (expected), so converting on the fly means O(N log N) conversions, whereas converting once means O(N) conversions plus O(N log N) trivial int comparisons.
You can also handcraft an algorithm to do the comparison. If you can assume that all values are positive and there are no leading zeros, a number represented as a string is larger than any other number represented as a shorter string. You could use that to build a comparison function:
struct Compare {
bool operator()(std::string const & lhs, std::string const & rhs) const {
return lhs.size() < rhs.size()
|| (lhs.size() == rhs.size() && lhs < rhs);
}
};
If there can be leading zeros, it is simple to count them and adjust the size accordingly inside the comparator. If the numbers can be negative, you can further extend the comparator to detect the sign and then apply something similar to the comparison above.
Can you use a standard map instead?
// Since a map is already sorted by its keys, you can look up the integer to get the corresponding string.
std::map<int, std::string> integersAndStrings;
integersAndStrings[1] = "one";
integersAndStrings[2] = "two";
integersAndStrings[3] = "three";
You could also write a variant of 40two's example. Instead of stoi, you can write your own predicate to compare the characters. If lhs has fewer digits than rhs, lhs must be a smaller number (assuming no floating point); if they have the same number of digits, then compare the strings (i.e., what David Rodriguez showed you in his answer). I didn't notice that he had already suggested that when I wrote my answer. The only thing I am really adding is the suggestion of using another container (i.e., std::map).

std::map with a char[5] key that may contain null bytes

The keys are binary garbage and I only defined them as chars because I need a 1-byte array.
They may contain null bytes.
Now problem is, when I have a two keys: ab(0)a and ab(0)b ((0) being a null byte), the map treats them as strings, considers them equal and I don't get two unique map entries.
What's the best way to solve this?
Why not use std::string as key:
//must use this as:
std::string key1("ab\0a",4);
std::string key2("ab\0b",4);
std::string key3("a\0b\0b",5);
std::string key4("a\0\0b\0b",6);
The second argument gives the number of characters to copy, so embedded null bytes are kept. All of the above use this constructor:
string ( const char * s, size_t n );
description of which is this:
Content is initialized to a copy of the string formed by the first n characters in the array of characters pointed by s.
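Use std::array<char,5>, or maybe even better (if you really want to handle keys as binary values) std::bitset. Note that std::bitset has no operator<, so as a std::map key it needs a custom comparator; it does have a std::hash specialization, so std::unordered_map accepts it directly.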
If you really want to use char[5] as your key, consider writing your own comparison class to compare between keys correctly. The map class requires one of these in order to organize its contents. By default, it is using a version that doesn't work with your key.
Here's a page on the map class that shows the parameters for map. You'd want to write your own Compare class to replace less<Key> which is the third template parameter to map.
If you only need to distinguish the keys and don't rely on a lexicographical ordering, you could treat each key as a uint64_t (5 bytes fit easily). This has the advantage that you could easily replace std::map with a hash map implementation, and that you don't have to write any comparison by hand.
Otherwise you can also write your own comparator, something like this:
class MyKeyComp
{
public:
    // Compares the 5 bytes one by one; a '\0' byte is compared like any other value.
    bool operator()(const char* lhs, const char* rhs) const
    {
        return lhs[0] != rhs[0] ? lhs[0] < rhs[0]
             : lhs[1] != rhs[1] ? lhs[1] < rhs[1]
             : lhs[2] != rhs[2] ? lhs[2] < rhs[2]
             : lhs[3] != rhs[3] ? lhs[3] < rhs[3]
             : lhs[4] < rhs[4];
    }
};