How do the ">" and "<" operators work for string comparisons? - c++

In the C++ textbook I'm reading (Programming, Principles and Practice using C++ by Bjarne Stroustrup), there are numerous instances of code snippets in which strings are compared as follows:
if (str1 > str2) and then some code.
Could someone please explain to me how "bigger than" and "less than" operators work in conjunction with strings declared as follows:
#include <string>
.
.
.
string str1 = "foo";
string str2 = "bar";
I've tried searching on Stack Overflow and on others for this answer, but to no avail.

The strings are compared lexicographically. The two strings are compared character by character, and the string whose first mismatching character is greater than the corresponding character in the other string is considered the greater string. This is known as lexicographical comparison.
The lexicographic or lexicographical order (also known
as lexical order, dictionary order, alphabetical order or
lexicographic(al) product) is a generalization of the way words are
alphabetically ordered based on the alphabetical order of their
component letters.
You can also use the .compare() member function to compare two strings. It returns one of the following values:
0 : They compare equal
Less than 0 : Either the value of the first character that does not
match is lower in the compared string, or all compared characters
match but the compared string is shorter.
More than 0 : Either the value of the first character that does not
match is greater in the compared string, or all compared characters
match but the compared string is longer.
In contrast, the relational operators (such as >, < and ==) return only the boolean values true or false. In the expression str1 < str2, str1 is smaller than str2 if the first mismatched character in str1 is smaller than the corresponding character in str2. If all compared characters match, str1 is less than str2 only if it is shorter.
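As a rough illustration of how the two interfaces relate, the sketch below reduces the result of compare() to a sign and checks it against the operators, using the strings from the question. The helper names compare_sign and check_compare_vs_operators are made up for this example; only the sign of compare() is specified, not its magnitude.

```cpp
#include <cassert>
#include <string>

// Collapse the int returned by std::string::compare() to -1, 0 or +1.
// (Hypothetical helper: only the sign of compare() is meaningful.)
int compare_sign(const std::string& a, const std::string& b) {
    const int r = a.compare(b);
    return (r > 0) - (r < 0);
}

// Spot checks using the strings from the question.
void check_compare_vs_operators() {
    const std::string str1 = "foo";
    const std::string str2 = "bar";
    assert(str1 > str2);                          // 'f' > 'b' at index 0
    assert(compare_sign(str1, str2) == 1);        // same answer, as a sign
    assert(compare_sign("foo", "foo") == 0);      // equal strings
    assert(compare_sign("foo", "foobar") == -1);  // a proper prefix sorts first
}
```

Calling check_compare_vs_operators() from your own main should pass silently if the description above holds.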

Relational operators on strings establish the so-called lexicographical order. It's the same ordering that dictionaries use.
Here's the algorithm: to determine whether a is less than b, find the smallest i such that a[i] != b[i]. If (unsigned char)a[i] < (unsigned char)b[i], then a < b; otherwise a > b.
If there is no such i, then a == b.
If a is a prefix of b, then a < b (this follows from the algorithm above, if you consider the extra \0 at the end of a string to be a part of the string).
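The algorithm above can be written out by hand. This is only a sketch for illustration (the function name lexicographic_less is invented here); in real code you would simply use operator<. Instead of relying on the trailing \0, it handles the prefix case by comparing lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-written version of the algorithm described above.
// (Hypothetical function name; std::string::operator< already does this.)
bool lexicographic_less(const std::string& a, const std::string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned char ca = static_cast<unsigned char>(a[i]);
        const unsigned char cb = static_cast<unsigned char>(b[i]);
        if (ca != cb) return ca < cb;  // the first mismatch decides
    }
    // No mismatch in the common part: a proper prefix is the smaller string.
    return a.size() < b.size();
}
```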

Related

What is C++ comparing when comparing two different strings?

In the code below, when I check whether k or y is greater, what method is used to compare the two different strings? The number of bits?
#include <iostream>
#include <string>
using namespace std;
int main() {
string y = "can't";
string k = "solve";
if(k > y){
cout << "k is bigger";
}else {
cout << "y is bigger";
}
return 0;
}
k is bigger
String comparison is a lexicographical comparison:
All comparisons are done via the compare() member function (which
itself is defined in terms of Traits::compare()):
Two strings are equal if both the size of lhs and rhs are equal and
each character in lhs has equivalent character in rhs at the same
position.
The ordering comparisons are done lexicographically -- the comparison
is performed by a function equivalent to std::lexicographical_compare.
And this is how lexicographical comparison works:
A lexicographical comparison is the kind of comparison generally
used to sort words alphabetically in dictionaries; It involves
comparing sequentially the elements that have the same position in
both ranges against each other until one element is not equivalent to
the other.
These relational operators are overloaded in the <string> header.
All the relational operators used for string operations can be found in the link below.
http://www.cplusplus.com/reference/string/string/operators/
Hope this clears your doubt.
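For the strings in the question, the same result can be reproduced with std::lexicographical_compare directly. This is a sketch with an invented function name (less_via_algorithm). One caveat: the algorithm compares plain char values with <, whereas the std::string operators go through std::char_traits<char>, which compares characters as unsigned char, so the two are only guaranteed to agree for ASCII text.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Spells out the ordering with the standard algorithm.
// (Hypothetical helper; agrees with operator< for ASCII text.)
bool less_via_algorithm(const std::string& a, const std::string& b) {
    return std::lexicographical_compare(a.begin(), a.end(),
                                        b.begin(), b.end());
}
```

Here "can't" sorts before "solve" because 'c' (0x63) is less than 's' (0x73) at index 0, which is why the program prints "k is bigger".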

How does s.compare member function behave in the C++ code below?

I have the C++ code below to solve for homework. After I ran it with Code::Blocks, it tells me that i=0, which means the expression s.compare(t)<0 is false. But, the way I see it, it's the other way around: (s<t, because AbcA < AAbcA). Can someone please explain it to me?
#include <iostream>
#include <string>
using namespace std;
int main(void) {
string s = "Abc", t = "A";
s=s+t;
t=t+s;
int i = s.compare(t)<0;
int j = s.length()<t.length();
cout<<i+j<<endl;
return 0;
}
According to the reference std::string::compare returns:
negative value if *this appears before the character sequence specified by the arguments, in lexicographical order
zero if both character sequences compare equivalent
positive value if *this appears after the character sequence specified by the arguments, in lexicographical order
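To see these return values on the strings from the question, here is a minimal sketch that recomputes what the program prints. The function name homework_result is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Recomputes the value printed by the program in the question.
int homework_result() {
    std::string s = "Abc", t = "A";
    s = s + t;  // s == "AbcA"
    t = t + s;  // t == "AAbcA"
    const int i = s.compare(t) < 0;         // 0: at index 1, 'b' > 'A'
    const int j = s.length() < t.length();  // 1: 4 < 5
    return i + j;
}
```

The length of the strings only matters when one is a prefix of the other, which is not the case here, so the shorter string s still compares greater.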
Lexicographical comparison being defined as:
Lexicographical comparison is an operation with the following properties:
Two ranges are compared element by element.
The first mismatching element defines which range is lexicographically less or greater than the other.
If one range is a prefix of another, the shorter range is lexicographically less than the other.
If two ranges have equivalent elements and are of the same length, then the ranges are lexicographically equal.
An empty range is lexicographically less than any non-empty range.
Two empty ranges are lexicographically equal.
"AbcA" comes lexicographically after "AAbcA", because the first nonequal character 'b' (ASCII 0x62) comes after 'A' (ASCII 0x41)

How can I compare if a char is higher or lower in alphabetical order than another?

Pretty much as the title. I'm writing a linked list and I need a function to sort the list alphabetically, and I'm pretty stumped. Not sure how this has never come up before, but I have no idea how to do it other than to create my own function listing the entire alphabet and comparing positions of letters from scratch.
Is there any easy way to do this?
Edit for clarity:
I've got a linear linked list of class objects, each class object has a char name, and I'm writing a function to compare the name of each object in the list, to find the highest object alphabetically, and then find the next object down alphabetically, etc, linking them together as I go. I already have a function that does this for an int field, so I just need to rewrite it to compare inequalities between alphabetical characters where a is largest and z is smallest.
In hindsight that was probably a lot more relevant than I thought.
I think a couple of the answers I've gotten already should work so I'll pop back and select a best answer once I've gotten it working.
I'm also working with g++ and unity.
I think that in the general case the best approach will be to use std::char_traits:
char a, b;
std::cin >> a >> b;
std::locale loc;
a = std::tolower(a, loc);
b = std::tolower(b, loc);
std::cout << std::char_traits<char>::compare(&a, &b, 1u);
But in many common situations you can simply compare chars as you do it with other integer types.
My guess is that your list contains char* as data (it would be better for it to contain std::string). If the list is composed of the latter, you can simply sort using std::string's overloaded operator<, like
return str1 < str2; // true if `str1` is lexicographically before `str2`
If your list is made of C-like null-terminated strings, then you can sort them using std::strcmp like
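return std::strcmp(s1, s2) < 0; // true if s1 sorts before s2
or use std::char_traits<char>::compare (as mentioned by @Anton) like
return std::char_traits<char>::compare(s1, s2, std::min(std::strlen(s1), std::strlen(s2)) + 1) < 0; // + 1 includes the shorter string's '\0', so a prefix sorts first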
or sort them via temporary std::strings (most expensive), like
return std::string(s1) < std::string(s2); // here s1 and s2 are C-strings
If your list simply contains characters, then, as mentioned in the comments,
return c1 < c2; // true whenever c1 has the smaller character code (alphabetical order within one case)
If you don't care about uppercase/lowercase, then you can use std::toupper to transform the character into uppercase, then always compare the uppercase.
#include <stdio.h>
#include <ctype.h>
int main(void) {
char a = 'X', b = 'M';
printf("%i\n", a < b);
printf("%i\n", b < a);
printf("%i\n", 'a' < 'B');
printf("%i\n", tolower('a') < tolower('B'));
}
prints out:
0
1
0
1
chars are still numbers, and can be compared as such. The upper case letters and lower case letters are all in order, with the upper case letters before the lower. (Such that 'Z' < 'a'.) See an ASCII table.
As an ASCII table shows, all of the alphanumeric characters appear in alphabetical order when sorted by their numeric values.
"Is there any easy way to do this?"
So yes, comparing the character values sorts them in alphabetical order, as long as they are all the same case.
Would something like the below suffice? Convert everything to upper case first.
class compareLessThanChar{
public:
// const so the comparator can be called through a const container
bool operator()(const char a, const char b) const
{ return toupper(a) < toupper(b); }
};
std::multiset<char, compareLessThanChar> sortedContainer;
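Pulling the answers above together, here is a minimal sketch of a case-insensitive sort. It assumes a std::list<char> stands in for the hand-written linked list from the question, and the names char_less_ignore_case and sorted_ignore_case are invented for this example. The cast to unsigned char is there because passing a negative char value to std::toupper is undefined behaviour.

```cpp
#include <cassert>
#include <cctype>
#include <list>

// Case-insensitive "comes before" for two chars (hypothetical helper).
// The cast avoids undefined behaviour in std::toupper for negative chars.
bool char_less_ignore_case(char a, char b) {
    return std::toupper(static_cast<unsigned char>(a)) <
           std::toupper(static_cast<unsigned char>(b));
}

// Returns a copy of the list sorted alphabetically, ignoring case.
// std::list::sort is stable, so 'a' and 'A' keep their relative order.
std::list<char> sorted_ignore_case(std::list<char> names) {
    names.sort(char_less_ignore_case);
    return names;
}
```

For a hand-rolled list, the same comparison function can replace the integer comparison in an existing sort routine.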

C++ string sort like a human being?

I would like to sort alphanumeric strings the way a human being would sort them. I.e., "A2" comes before "A10", and "a" certainly comes before "Z"! Is there any way to do this without writing a mini-parser? Ideally it would also put "A1B1" before "A1B10". I see the question "Natural (human alpha-numeric) sort in Microsoft SQL 2005" with a possible answer, but it uses various library functions, as does "Sorting Strings for Humans with IComparer".
Below is a test case that currently fails:
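#include <algorithm> // std::copy
#include <cctype>    // std::tolower
#include <set>
#include <sstream>   // std::ostringstream
#include <string>
#include <iterator>
#include <iostream>
#include <vector>
#include <cassert>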
template <typename T>
struct LexicographicSort {
inline bool operator() (const T& lhs, const T& rhs) const{
std::ostringstream s1,s2;
s1 << toLower(lhs); s2 << toLower(rhs);
bool less = s1.str() < s2.str();
//Answer: bool less = doj::alphanum_less<std::string>()(s1.str(), s2.str());
std::cout<<s1.str()<<" "<<s2.str()<<" "<<less<<"\n";
return less;
}
inline std::string toLower(const std::string& str) const {
std::string newString("");
for (std::string::const_iterator charIt = str.begin();
charIt!=str.end();++charIt) {
newString.push_back(std::tolower(*charIt));
}
return newString;
}
};
int main(void) {
const std::string reference[5] = {"ab","B","c1","c2","c10"};
std::vector<std::string> referenceStrings(&(reference[0]), &(reference[5]));
//Insert in reverse order so we know they get sorted
std::set<std::string,LexicographicSort<std::string> > strings(referenceStrings.rbegin(), referenceStrings.rend());
std::cout<<"Items:\n";
std::copy(strings.begin(), strings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
std::vector<std::string> sortedStrings(strings.begin(), strings.end());
assert(sortedStrings == referenceStrings);
}
Is there any way to do this without writing a mini-parser?
Let someone else do that?
I'm using this implementation: http://www.davekoelle.com/alphanum.html, I've modified it to support wchar_t, too.
It really depends what you mean by "parser." If you want to avoid writing a parser, I would think you should avail yourself of library functions.
Treat the string as a sequence of subsequences which are uniformly alphabetic, numeric, or "other."
Get the next alphanumeric sequence of each string using isalnum and backtrack-checking for + or - if it is a number. Use strtold in-place to find the end of a numeric subsequence.
If one is numeric and one is alphabetic, the string with the numeric subsequence comes first.
If one string has run out of characters, it comes first.
Use strcoll to compare alphabetic subsequences within the current locale.
Use strtold to compare numeric subsequences within the current locale.
Repeat until finished with one or both strings.
Break ties with strcmp.
This algorithm has something of a weakness in comparing numeric strings which exceed the precision of long double.
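A simplified sketch of the steps above follows. It is not the algorithm exactly as described: it handles only unsigned integer runs (no + or - signs), folds case with std::tolower instead of using strcoll, and compares digit runs by length and then digit by digit instead of calling strtold, which sidesteps the precision weakness. Digits sort before letters only because of their ASCII values. The name natural_less and the helper names are invented for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace {
bool is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
char lower(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
}  // namespace

// Returns true if a should sort before b in "natural" order.
bool natural_less(const std::string& a, const std::string& b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (is_digit(a[i]) && is_digit(b[j])) {
            // Skip leading zeros, then find the end of each digit run.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            std::size_t ie = i, je = j;
            while (ie < a.size() && is_digit(a[ie])) ++ie;
            while (je < b.size() && is_digit(b[je])) ++je;
            // More significant digits means a larger number.
            if (ie - i != je - j) return (ie - i) < (je - j);
            // Same number of digits: compare them left to right.
            const int c = a.compare(i, ie - i, b, j, je - j);
            if (c != 0) return c < 0;
            i = ie;
            j = je;
        } else {
            // At least one side is not a digit: compare case-folded codes.
            const char ca = lower(a[i]), cb = lower(b[j]);
            if (ca != cb) {
                return static_cast<unsigned char>(ca) <
                       static_cast<unsigned char>(cb);
            }
            ++i;
            ++j;
        }
    }
    // One string ran out of characters: it comes first.
    if (a.size() - i != b.size() - j) return (a.size() - i) < (b.size() - j);
    return a < b;  // break ties (e.g. "a" vs "A", "01" vs "1") bytewise
}
```

It can be passed as the comparator to std::sort or std::set in place of the plain string comparison used in the failing test case.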
Is there any way to do it without writing a mini parser? I would think the answer is no. But writing a parser isn't that tough. I had to do this a while ago to sort our company's stock numbers.
Basically, just scan the number and turn it into an array. Check the "type" of every character: alpha, number, or any others that you need to treat specially. For example, I had to treat hyphens specially because we wanted A-B-C to sort before AB-A. Then start peeling off characters. As long as they are the same type as the first character, they go into the same bucket. Once the type changes, you start putting them in a different bucket.
Then you also need a compare function that compares bucket by bucket. When both buckets are alpha, you just do a normal alpha compare. When both are digits, convert both to integers and do an integer compare, or pad the shorter to the length of the longer, or something equivalent. When they're different types, you'll need a rule for how those compare: for example, does A-A come before or after A-1?
It's not a trivial job and you have to come up with rules for all the odd cases that may arise, but I would think you could get it together in a few hours of work.
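The bucket-splitting step described above might look roughly like this. It is a sketch only: the three character classes and the names CharKind, kind_of and split_into_buckets are assumptions made for this example, and the bucket-by-bucket compare function is left out because its rules depend on the data being sorted.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Character classes that decide where one bucket ends and the next begins.
enum class CharKind { Alpha, Digit, Other };

CharKind kind_of(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    if (std::isdigit(u)) return CharKind::Digit;
    if (std::isalpha(u)) return CharKind::Alpha;
    return CharKind::Other;
}

// Splits s into maximal runs of same-kind characters,
// e.g. "AB-12" becomes {"AB", "-", "12"}.
std::vector<std::string> split_into_buckets(const std::string& s) {
    std::vector<std::string> buckets;
    for (char c : s) {
        // Start a new bucket at the beginning or when the kind changes.
        if (buckets.empty() ||
            kind_of(buckets.back().front()) != kind_of(c)) {
            buckets.push_back(std::string());
        }
        buckets.back().push_back(c);
    }
    return buckets;
}
```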
Without any parsing, there's no way to compare human written numbers (high values first with leading zeroes stripped) and normal characters as part of the same string.
The parsing doesn't need to be terribly complex though. A simple hash table to deal with things like case sensitivity and stripping special characters ('A'='a'=1,'B'='b'=2,... or 'A'=1,'a'=2,'B'=3,..., '-'=0(strip)), remap your string to an array of the hashed values, then truncate number cases (if a number is encountered and the last character was a number, multiply the last number by ten and add the current value to it).
From there, sort as normal.

How to get the next prefix in C++?

Given a sequence (for example a string "Xa"), I want to get the next prefix in lexicographic order (i.e. "Xb"). The next of "aZ" should be "b".
A motivating use case where this function is useful is described here.
As I don't want to reinvent the wheel, I'm wondering if there is any function in C++ STL or boost that can help to define this generic function easily?
If not, do you think that this function can be useful?
Notes
Even if the examples are strings, the function should work for any Sequence.
The lexicographic order should be a template parameter of the function.
From the answers I conclude that there is nothing in C++/Boost that can help to define this generic function easily, and also that this function is too specific to be proposed for free. I will implement a generic next_prefix, and after that I will ask whether you find it useful.
I have accepted the single answer that gives some hints on how to do that even if the proposed implementation is not generic.
I'm not sure I understand the semantics by which you wish the string to transform, but maybe something like the following can be a starting point for you. The code will increment the sequence, as if it was a sequence of digits representing a number.
template<typename Bi, typename I>
bool increment(Bi first, Bi last, I minval, I maxval)
{
if( last == first ) return false;
while( --last != first && *last == maxval ) *last = minval;
if( last == first && *last == maxval ) {
*last = minval;
return false;
}
++*last;
return true;
}
Maybe you wish to add an overload with a function object, or an overload or specialization for primitives. A couple of examples:
string s1("aaz");
increment(s1.begin(), s1.end(), 'a', 'z');
cout << s1 << endl; // aba
string s2("95");
do {
cout << s2 << ' '; // 95 96 97 98 99
} while( increment(s2.begin(), s2.end(), '0', '9') );
cout << endl;
That seems so specific that I can't see how it would get into the STL or Boost.
When you say the order is a template parameter, what are you envisaging will be passed? A comparator that takes two characters and returns bool?
If so, then that's a bit of a nightmare, because the only way to find "the least char greater than my current char" is to sort all the chars, find your current char in the result, and step forward one (or actually, if some chars might compare equal, use upper_bound with your current char to find the first greater char).
In practice, for any sane string collation you can define a "get the next char, or warn me if I gave you the last char" function more efficiently, and build your "get the next prefix" function on top of that. Hopefully, permitting an arbitrary order is more flexibility than you need.
Orderings are typically specified as a comparator, not as a sequence generator.
Lexicographical orderings in particular tend to be only partial, for example when they are case or diacritic insensitive. Therefore your final product will be nondeterministic, or at best arbitrary. ("Always choose lowest numerical encoding"?)
In any case, if you accept a comparator as input, the only way to translate that to an increment operation would be to compare the current value against every other in the character space. Which could work, 127 values being so few (a comparator-sorted table would make short work of the problem), or could be impossibly slow, if you use any other kind of character.
The best way is likely to define the character ordering somehow, then define the rules from going from one character to two characters to three characters.
Use whatever sort function you wish to use over the complete list of characters that you want to include, then just use that as the ordering. Find the index of the current character, and you can easily find the previous and next characters. Only advance the right-most character, unless it's going to roll over, then advance the next character to the left.
In other words, reinventing the wheel is like 10 lines of Python. Probably less than 500 lines of C++. :)
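For what it's worth, the approach from the last answer (take the characters in sorted order, find the index of the current character, advance the right-most character unless it would roll over) can be sketched in C++ fairly briefly. This version works on std::string only, not a generic Sequence, and it takes the ordering as an already sorted alphabet string rather than as a comparator; both are simplifying assumptions, and the name next_prefix simply follows the question.

```cpp
#include <cassert>
#include <string>

// Returns the next prefix of s under the character order given by alphabet
// (earlier in alphabet means smaller). Returns an empty string if there is
// no next prefix, i.e. every character is already the last in the alphabet.
// Assumes alphabet has no duplicates; a character of s that is not found
// in alphabet is treated like the last character and dropped.
std::string next_prefix(std::string s, const std::string& alphabet) {
    while (!s.empty()) {
        const std::string::size_type pos = alphabet.find(s.back());
        if (pos != std::string::npos && pos + 1 < alphabet.size()) {
            s.back() = alphabet[pos + 1];  // advance the right-most character
            return s;
        }
        s.pop_back();  // would roll over: drop it, advance the one to its left
    }
    return s;  // empty: no next prefix exists
}
```

With an alphabet of "a" to "z" followed by "A" to "Z", next_prefix("Xa", alphabet) yields "Xb" and next_prefix("aZ", alphabet) yields "b", matching the examples in the question.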