Is there any real sequence of characters that always compares greater than any other string?
My first thought was that a string constructed like so:
std::basic_string<T>(std::string::max_size(), std::numeric_limits<T>::max())
Would do the trick, provided that the fact that it would almost definitely fail to work isn't such a big issue. So I presume this kind of hackery could only be accomplished in Unicode, if it can be accomplished at all. I've never heard of anything that would indicate that it really is possible, but neither have I heard tell that it isn't, and I'm curious.
Any thoughts on how to achieve this without a possibly_infinite<basic_string<T>>?
I assume that you compare strings using their character values, i.e. one character acts like a digit, a longer string is greater than a shorter string, etc.
Is there any real sequence of characters that always compares greater than any other string?
No, because:
Let's assume there is a string s that is always greater than any other string.
If you make a copy of s, the copy will be equal to s. Equal means "not greater". Therefore there can be a string that is not greater than s.
If you make a copy of s and append one character at the end, it will be greater than original s. Therefore there can be a string that is greater than s.
Which means, it is not possible to make s.
I.e.
A string s that is always greater than any other string cannot exist. A copy of s (copy == other string) will be equal to s, and "equal" means "not greater".
A string s that is always greater than or equal to any other string can exist if there is a reasonable limit on the maximum string size. Without a size limit, you can always take a copy of s, append one character at the end, and get a string that is greater than s.
In my opinion, the proper solution would be to introduce some kind of special string object that represents an infinitely "large" string, and write a comparison operator for that object and the standard string. In this case you may also need a custom string class.
It is possible to make a string that is always less than or equal to any other string. A zero-length string is exactly that: smaller than any non-empty string, and equal to other zero-length strings.
Or you could write a counter-intuitive comparison routine where a shorter string is greater than a longer string, but then the next code maintainer will hate you, so it is not a good idea.
Not sure why you would ever need something like that, though.
You probably need a custom comparator, for which you define a magic "infinite string" value and which will always treat that value as greater than any other.
Unicode solves a lot of problems, but not that one. Unicode is just a different encoding for a character, 1, 2 or 4 bytes, they are still stored in a plain array. You can use infinite strings when you find a machine with infinite memory.
Yes. How you do it, I have no idea :)
You should try to state what you intend to achieve and what your requirements are. In particular, does it have to be a string? Is there any limitation on the domain? Do they need to be compared with <?
You can use a non-string type:
struct infinite_string {};
bool operator<( std::string const & , infinite_string const & ) {
return true;
}
bool operator<( infinite_string const &, std::string const & ) {
return false;
}
If you can use std::lexicographical_compare and you don't need to store it as a string, then you can write an infinite iterator:
template <typename CharT>
struct infinite_iterator
{
CharT operator*() { return std::numeric_limits<CharT>::max(); }
infinite_iterator& operator++() { return *this; }
bool operator<( const infinite_iterator& ) { return true; }
// all other stuff to make it proper
};
assert( std::lexicographical_compare( str.begin(), str.end(),
infinite_iterator<char>(), infinite_iterator<char>() ) );
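The "all other stuff to make it proper" part could look roughly like the sketch below. It is only an illustration: the member typedefs, the always-unequal comparison operators, and the helper name `less_than_infinite` are assumptions, and an iterator that never compares equal to itself bends the formal iterator requirements, even though it works with typical standard library implementations.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <limits>
#include <string>

// Hypothetical completion of the iterator idea: every dereference yields the
// largest CharT, and the iterator never compares equal to anything, so the
// "range" it describes never ends.
template <typename CharT>
struct infinite_iterator {
    using iterator_category = std::input_iterator_tag;
    using value_type        = CharT;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const CharT*;
    using reference         = CharT;

    CharT operator*() const { return std::numeric_limits<CharT>::max(); }
    infinite_iterator& operator++() { return *this; }
    infinite_iterator operator++(int) { return *this; }

    friend bool operator==(const infinite_iterator&, const infinite_iterator&) { return false; }
    friend bool operator!=(const infinite_iterator&, const infinite_iterator&) { return true; }
};

// True for every std::string, because the right-hand range never runs out.
inline bool less_than_infinite(const std::string& str) {
    return std::lexicographical_compare(str.begin(), str.end(),
                                        infinite_iterator<char>(),
                                        infinite_iterator<char>());
}
```

Note that std::lexicographical_compare uses operator< on plain char here, which is why std::numeric_limits<char>::max() is the right fill value for this comparison.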
If you can use any other comparison functor and your domain has some invalid value, you can use that to your advantage:
namespace detail {
// assume that "\0\0\0\0\0" is not valid in your domain
std::string const infinite( 5, 0 );
}
bool compare( std::string const & lhs, std::string const & rhs ) {
if ( lhs == detail::infinite ) return false;
if ( rhs == detail::infinite ) return true;
return lhs < rhs;
}
If you need an artificial bound within a space of objects that isn't bounded, the standard trick is to add an extra element and define a new comparison operator that enforces your property.
Or implement lazy strings.
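A minimal sketch of the extra-element trick, assuming a hypothetical wrapper type `bounded_string` (the name, the `is_top` flag, and the factory functions are made up for illustration):

```cpp
#include <string>
#include <utility>

// Hypothetical wrapper: either an ordinary string or the artificial "top" element.
struct bounded_string {
    std::string value;
    bool is_top = false;

    static bounded_string top() {
        bounded_string b;
        b.is_top = true;
        return b;
    }
    static bounded_string of(std::string s) {
        bounded_string b;
        b.value = std::move(s);
        return b;
    }
};

// top is greater than every ordinary string and equivalent only to itself;
// ordinary strings keep their usual order.
inline bool operator<(const bounded_string& lhs, const bounded_string& rhs) {
    if (lhs.is_top) return false;
    if (rhs.is_top) return true;
    return lhs.value < rhs.value;
}
```

Because the top element is a separate state rather than a special string value, no ordinary string can collide with it.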
Well if you were to dynamically construct a string of equal length as the one that you are comparing to and fill it with the highest ASCII code available (7F for normal ASCII or FF for extended) you would be guaranteed that this string would compare equal to or greater than the one you compare it to.
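As a rough sketch of that idea (the helper name `upper_bound_for` is hypothetical), the bound can be built per comparison. It uses the byte 0xFF because std::string compares characters as unsigned char:

```cpp
#include <string>

// Hypothetical helper: a string of the same length as `s`, filled with the
// byte 0xFF. std::string compares characters as unsigned char, so 0xFF is the
// largest possible element regardless of whether plain char is signed.
inline std::string upper_bound_for(const std::string& s) {
    return std::string(s.size(), '\xFF');
}
```

The result is only an upper bound for strings that are no longer than `s`; a longer string that starts with the same 0xFF bytes still compares greater.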
What's your comparator?
Based on that, you can construct something that is the 'top' of your lattice.
Related
Is there a way to find and replace subset of a char*/string in a set?
Example:
std::set<char*> myset;
myset.insert("catt");
myset.insert("world");
myset.insert("hello");
it = myset.subsetfind("tt");
myset.replace(it, "t");
There are at least three reasons why this won't work.
std::set provides only the means to search the set for a value that compares equally to the value being searched for, and not to a value that matches some arbitrary portion of the value.
The shown program is undefined behavior. A string literal, such as "hello" is a const char *, and not a char *. No self-respecting C++ compiler will allow you to insert a const char * into a container of char *s. And you can't modify const values, by definition, anyway.
Values in std::set cannot be modified. To effect the modification of an existing value in a set, it must be erase()d, then the new value insert()ed.
std::set is simply not the right container for the goals you're trying to accomplish.
No, you can't (or at least shouldn't) modify the key while it's in the set. Doing so could change the relative order of the elements, in which case the modification would render the set invalid.
You need to start with a set of things you can modify. Then you need to search for the item, remove it from the set, modify it, then re-insert the result back into the set.
std::set<std::string> myset {"catt", "world", "hello"};
// find the first element that contains "tt"
auto pos = std::find_if(myset.begin(), myset.end(), [](auto const &s) { return s.find("tt") != std::string::npos; });
if (pos != myset.end()) {
    auto temp = *pos;   // copy, because elements of a set are const
    myset.erase(pos);
    auto p = temp.find("tt");
    temp.replace(p, 2, "t");
    myset.insert(temp);
}
You cannot modify elements within a set.
You can find strings that contain the substring using std::find_if. Once you find matching elements, you can remove each from the set and add a modified copy of the string, with the substring replaced with something else.
PS. Remember that you cannot modify string literals. You will need to allocate some memory for the strings.
PPS. Implicit conversion of string literal to char* has been deprecated since C++ was standardized, and since C++11 such conversion is ill-formed.
PPPS. The default comparator will not be correct when you use pointers as the element type. I recommend you to use std::string instead. (A strcmp based comparator approach would also be possible, although much more prone to memory bugs).
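The find, erase, modify, reinsert sequence described above can be applied to every matching element in one pass. The following is a sketch under the assumption that the set holds std::string; the helper name `replace_in_set` is made up for illustration.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: replaces the first occurrence of `from` with `to` in
// every element of `s` that contains it.
inline void replace_in_set(std::set<std::string>& s,
                           const std::string& from, const std::string& to) {
    std::vector<std::string> changed;
    for (auto it = s.begin(); it != s.end();) {
        const std::string::size_type p = it->find(from);
        if (p == std::string::npos) {
            ++it;
            continue;
        }
        std::string temp = *it;   // elements are const, so work on a copy
        it = s.erase(it);         // erase returns the next valid iterator
        temp.replace(p, from.size(), to);
        changed.push_back(temp);
    }
    // Reinsert after the scan so modified strings are not visited again.
    s.insert(changed.begin(), changed.end());
}
```

If two modified strings end up identical, the set keeps only one of them, so the element count can shrink.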
You could use std::find_if with a predicate function/functor/lambda that searches for the substring you want.
I am trying the lower_bound function in C++.
Used it multiple times for 1 d datatypes.
Now, I am trying it on a sorted array dict[5000][20] to find strings of size <=20.
The string to be matched is in str.
bool recurseSerialNum(char *name,int s,int l,char (*keypad)[3],string str,char (*dict)[20],int dictlen)
{
char (*idx)[20]= lower_bound(&dict[0],&dict[0]+dictlen,str.c_str());
int tmp=idx-dict;
if(tmp!=dictlen)
printf("%s\n",*idx);
}
As per http://www.cplusplus.com/reference/algorithm/lower_bound/?kw=lower_bound , this function is supposed to return the index of 'last'(beyond end) in case no match is found i.e. tmp should be equal dictlen.
In my case, it always returns the beginning index i.e. I get tmp equal to 0 both 1. When passed a string that is there in the dict and 2. When passed a string that is not there in the dict.
I think the issue is in handling and passing of the pointer. The default comparator should be available for this case as is available in case of vector. I also tried passing an explicit one, to no avail.
I tried this comparator -
bool compStr(const char *a, const char *b){
return strcmp(a,b)<0;
}
I know the alternative is to use a vector, etc., but I would like to know what the issue is with this approach.
Searched on this over google as well as SO, but did not find anything similar.
There are two misunderstandings here, I believe.
std::lower_bound does not check if an element is part of a sorted range. Instead it finds the leftmost place where an element could be inserted without breaking the ordering.
You're not comparing the contents of the strings but their memory addresses.
It is true that dict in your case is a sorted range in the sense that the memory addresses of the inner arrays are ascending. Where str.c_str() lies in relation to this is, of course, undefined. In practice, if dict is a stack object, you will often find that the memory range for the heap (where the buffer behind str.c_str() typically lies, unless the implementation stores short strings inside the string object itself) is below that of the stack. In that case lower_bound quite correctly tells you that if you wanted to insert this address into the sorted range of addresses, which is how you are interpreting dict, you'd have to do so at the beginning.
For a solution, since there is an operator<(char const *, std::string const &), you could simply write
char (*idx)[20] = lower_bound(&dict[0], &dict[0] + dictlen, str);
...but are you perhaps really looking for std::find?
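Both points can be combined in a small sketch: compare string contents with an explicit strcmp-based comparator, then check whether the insertion point actually holds the key. The helper name `find_in_dict` and its return convention are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the index of `key` in the sorted `dict`,
// or `dictlen` if it is not present.
inline std::size_t find_in_dict(const char (*dict)[20], std::size_t dictlen,
                                const char* key) {
    const char (*pos)[20] = std::lower_bound(
        dict, dict + dictlen, key,
        // Compare contents, not addresses; each row decays to const char*.
        [](const char* entry, const char* k) { return std::strcmp(entry, k) < 0; });
    // lower_bound only yields an insertion point, so confirm the match.
    if (pos != dict + dictlen && std::strcmp(*pos, key) == 0)
        return static_cast<std::size_t>(pos - dict);
    return dictlen;
}
```

This assumes every row of `dict` is null-terminated and that the rows are sorted in strcmp order.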
When we use strcmp(str1, str2); or str1.compare(str2); the return values are like -1, 0 and 1, for str1 < str2, str1 == str2 or str1 > str2 respectively.
The question is, is it defined like this for a specific reason?
For instance, in a binary tree sorting algorithm, we push smaller values to the left child and larger values to the right child. The strcmp and string::compare functions seem to be perfect for that. However, does anyone use string comparison to sort a tree (integer indexes are easier to use)?
So, what is the actual purpose of the three return values (-1, 0, 1)? Why can't it just return 1 for true and 0 for false?
Thanks
The purpose of having three return values is exactly what it seems like: to answer all questions about string comparisons at once.
Everyone has different needs. Some people sometimes need a simple less-than test; strcmp provides this. Some people need equality testing; strcmp provides this. Some people really do need to know the full relationship between two strings; strcmp provides this.
What you absolutely don't want is someone writing this:
if(strless(lhs, rhs))
{
}
else if(strequal(lhs, rhs))
{
}
That's doing two potentially expensive comparison operations. strless also knows if they were equal, because it had to get to the end of both strings to return that it was not less.
Oh, and FYI: the return value isn't necessarily -1 or +1; it's greater than zero or less than zero. Or zero if they're equal.
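For illustration, a single strcmp call is enough to distinguish all three cases, as long as the result is tested by sign rather than against -1 and +1. The function name `relation` is hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: one comparison answers all three questions.
inline const char* relation(const char* lhs, const char* rhs) {
    const int c = std::strcmp(lhs, rhs);   // only the sign of c is meaningful
    if (c < 0) return "less";
    if (c > 0) return "greater";
    return "equal";
}
```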
It's useful for certain cases where knowing all three cases is important. Use operator< for string when you just care about a boolean comparison.
It could, but then you would need multiple functions for sorting and comparison. With strcmp() returning smaller, equal or bigger, you can use them easily for comparison and for sorting.
Remember that BSTs are not the only place where you would like to compare strings. You might want to sort a name list or similar. Also, it is not uncommon to have a string as key in a tree too.
As others have stated, there are real purposes for comparison of strings with < > == implications. For example; fixed length numbers assigned to strings will resolve correctly; ie: "312235423" > "312235422". On some occasions this is useful.
However, the true/false result you're asking for still works with the given return values, because any non-zero value converts to true and zero converts to false:
if (-1)
{
// resolves true
}
else if (1)
{
// also resolves true
}
else if (0)
{
// resolves false
}
I would like to sort alphanumeric strings the way a human being would sort them. I.e., "A2" comes before "A10", and "a" certainly comes before "Z"! Is there any way to do this without writing a mini-parser? Ideally it would also put "A1B1" before "A1B10". I see the question "Natural (human alpha-numeric) sort in Microsoft SQL 2005" with a possible answer, but it uses various library functions, as does "Sorting Strings for Humans with IComparer".
Below is a test case that currently fails:
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iostream>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>
template <typename T>
struct LexicographicSort {
inline bool operator() (const T& lhs, const T& rhs) const{
std::ostringstream s1,s2;
s1 << toLower(lhs); s2 << toLower(rhs);
bool less = s1.str() < s2.str();
//Answer: bool less = doj::alphanum_less<std::string>()(s1.str(), s2.str());
std::cout<<s1.str()<<" "<<s2.str()<<" "<<less<<"\n";
return less;
}
inline std::string toLower(const std::string& str) const {
std::string newString("");
for (std::string::const_iterator charIt = str.begin();
charIt!=str.end();++charIt) {
newString.push_back(std::tolower(*charIt));
}
return newString;
}
};
int main(void) {
const std::string reference[5] = {"ab","B","c1","c2","c10"};
std::vector<std::string> referenceStrings(&(reference[0]), &(reference[5]));
//Insert in reverse order so we know they get sorted
std::set<std::string,LexicographicSort<std::string> > strings(referenceStrings.rbegin(), referenceStrings.rend());
std::cout<<"Items:\n";
std::copy(strings.begin(), strings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
std::vector<std::string> sortedStrings(strings.begin(), strings.end());
assert(sortedStrings == referenceStrings);
}
Is there any way to do this without writing a mini-parser?
Let someone else do that?
I'm using this implementation: http://www.davekoelle.com/alphanum.html, I've modified it to support wchar_t, too.
It really depends what you mean by "parser." If you want to avoid writing a parser, I would think you should avail yourself of library functions.
Treat the string as a sequence of subsequences which are uniformly alphabetic, numeric, or "other."
Get the next alphanumeric sequence of each string using isalnum and backtrack-checking for + or - if it is a number. Use strtold in-place to find the end of a numeric subsequence.
If one is numeric and one is alphabetic, the string with the numeric subsequence comes first.
If one string has run out of characters, it comes first.
Use strcoll to compare alphabetic subsequences within the current locale.
Use strtold to compare numeric subsequences within the current locale.
Repeat until finished with one or both strings.
Break ties with strcmp.
This algorithm has something of a weakness in comparing numeric strings which exceed the precision of long double.
Is there any way to do it without writing a mini parser? I would think the answer is no. But writing a parser isn't that tough. I had to do this a while ago to sort our company's stock numbers. Basically just scan the number and turn it into an array. Check the "type" of every character: alpha, number, maybe you have others you need to deal with special. Like I had to treat hyphens special because we wanted A-B-C to sort before AB-A. Then start peeling off characters. As long as they are the same type as the first character, they go into the same bucket. Once the type changes, you start putting them in a different bucket. Then you also need a compare function that compares bucket-by-bucket. When both buckets are alpha, you just do a normal alpha compare. When both are digits, convert both to integer and do an integer compare, or pad the shorter to the length of the longer or something equivalent. When they're different types, you'll need a rule for how those compare, like does A-A come before or after A-1 ?
It's not a trivial job and you have to come up with rules for all the odd cases that may arise, but I would think you could get it together in a few hours of work.
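A compact sketch of the bucket-by-bucket comparison, under some assumptions of my own: only ASCII digits count as numbers, letters compare case-insensitively, no special rule for hyphens, and ties are broken with a plain string comparison. Digit runs are compared by length and then character by character instead of being converted to integers, which is the "pad the shorter" idea and avoids overflow. The names `is_digit_at` and `natural_less` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

inline bool is_digit_at(const std::string& s, std::size_t i) {
    return std::isdigit(static_cast<unsigned char>(s[i])) != 0;
}

// Hypothetical comparator: true if `a` sorts before `b` in "natural" order.
inline bool natural_less(const std::string& a, const std::string& b) {
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (is_digit_at(a, i) && is_digit_at(b, j)) {
            // Skip leading zeros, then find the end of each digit run.
            while (i < a.size() && a[i] == '0') ++i;
            while (j < b.size() && b[j] == '0') ++j;
            std::size_t ie = i, je = j;
            while (ie < a.size() && is_digit_at(a, ie)) ++ie;
            while (je < b.size() && is_digit_at(b, je)) ++je;
            // More significant digits means a larger number.
            if (ie - i != je - j) return ie - i < je - j;
            const int c = a.compare(i, ie - i, b, j, je - j);
            if (c != 0) return c < 0;
            i = ie;
            j = je;
        } else {
            // Compare single characters, ignoring case.
            const int ca = std::tolower(static_cast<unsigned char>(a[i]));
            const int cb = std::tolower(static_cast<unsigned char>(b[j]));
            if (ca != cb) return ca < cb;
            ++i;
            ++j;
        }
    }
    // Equal under the natural rules: fall back to a plain comparison.
    if (i == a.size() && j == b.size()) return a < b;
    // Otherwise the string that ran out first sorts first.
    return i == a.size();
}
```

It can be passed directly to std::sort, or wrapped in a functor for use as the comparator of a std::set.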
Without any parsing, there's no way to compare human written numbers (high values first with leading zeroes stripped) and normal characters as part of the same string.
The parsing doesn't need to be terribly complex though. A simple hash table to deal with things like case sensitivity and stripping special characters ('A'='a'=1, 'B'='b'=2, ... or 'A'=1, 'a'=2, 'B'=3, ..., '-'=0 (strip)), remap your string to an array of the hashed values, then truncate number cases (if a number is encountered and the last character was a number, multiply the last number by ten and add the current value to it).
From there, sort as normal.
Given a sequence (for example a string "Xa"), I want to get the next prefix in lexicographic order (i.e. "Xb"). The next prefix after "aZ" should be "b".
A motivating use case where this function is useful is described here.
As I don't want to reinvent the wheel, I'm wondering if there is any function in C++ STL or boost that can help to define this generic function easily?
If not, do you think that this function can be useful?
Notes
Even if the examples are strings, the function should work for any Sequence.
The lexicographic order should be a template parameter of the function.
From the answers I conclude that there is nothing in C++/Boost that can help to define this generic function easily, and also that this function is too specific to be proposed for free. I will implement a generic next_prefix and after that I will ask whether you find it useful.
I have accepted the single answer that gives some hints on how to do that even if the proposed implementation is not generic.
I'm not sure I understand the semantics by which you wish the string to transform, but maybe something like the following can be a starting point for you. The code will increment the sequence, as if it was a sequence of digits representing a number.
template<typename Bi, typename I>
bool increment(Bi first, Bi last, I minval, I maxval)
{
if( last == first ) return false;
while( --last != first && *last == maxval ) *last = minval;
if( last == first && *last == maxval ) {
*last = minval;
return false;
}
++*last;
return true;
}
Maybe you wish to add an overload with a function object, or an overload or specialization for primitives. A couple of examples:
string s1("aaz");
increment(s1.begin(), s1.end(), 'a', 'z');
cout << s1 << endl; // aba
string s2("95");
do {
cout << s2 << ' '; // 95 96 97 98 99
} while( increment(s2.begin(), s2.end(), '0', '9') );
cout << endl;
That seems so specific that I can't see how it would get into the STL or Boost.
When you say the order is a template parameter, what are you envisaging will be passed? A comparator that takes two characters and returns bool?
If so, then that's a bit of a nightmare, because the only way to find "the least char greater than my current char" is to sort all the chars, find your current char in the result, and step forward one (or actually, if some chars might compare equal, use upper_bound with your current char to find the first greater char).
In practice, for any sane string collation you can define a "get the next char, or warn me if I gave you the last char" function more efficiently, and build your "get the next prefix" function on top of that. Hopefully, permitting an arbitrary order is more flexibility than you need.
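The sort-then-upper_bound approach for an arbitrary comparator might look like the sketch below. The function name `next_char`, the explicit `alphabet` parameter, and the bool-plus-out-parameter convention are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: stores in `out` the least character of `alphabet`
// that is strictly greater than `current` under `comp`. Returns false if
// no character of `alphabet` is greater than `current`.
template <typename Compare>
bool next_char(std::string alphabet, char current, Compare comp, char& out) {
    std::sort(alphabet.begin(), alphabet.end(), comp);
    const std::string::iterator it =
        std::upper_bound(alphabet.begin(), alphabet.end(), current, comp);
    if (it == alphabet.end()) return false;
    out = *it;
    return true;
}
```

When several characters compare equivalent (for example under a case-insensitive order), which of them is returned depends on how the sort happens to arrange them, which is the arbitrariness this approach cannot avoid.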
Orderings are typically specified as a comparator, not as a sequence generator.
Lexicographical orderings in particular tend to be only partial, for example when the comparison is case-insensitive or diacritic-insensitive. Therefore your final product will be nondeterministic, or at best arbitrary. ("Always choose lowest numerical encoding"?)
In any case, if you accept a comparator as input, the only way to translate that to an increment operation would be to compare the current value against every other in the character space. Which could work, 127 values being so few (a comparator-sorted table would make short work of the problem), or could be impossibly slow, if you use any other kind of character.
The best way is likely to define the character ordering somehow, then define the rules from going from one character to two characters to three characters.
Use whatever sort function you wish to use over the complete list of characters that you want to include, then just use that as the ordering. Find the index of the current character, and you can easily find the previous and next characters. Only advance the right-most character, unless it's going to roll over, then advance the next character to the left.
In other words, reinventing the wheel is like 10 lines of Python. Probably less than 500 lines of C++. :)
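For what it's worth, a C++ version can also be short if the character order is passed as an explicit list. This sketch follows the question's examples ("Xa" becomes "Xb", "aZ" becomes "b"), so a character that would roll over is dropped rather than reset. The name `next_prefix`, the `alphabet` parameter, and the in-place interface are assumptions, and it handles std::string only rather than a generic sequence.

```cpp
#include <string>

// Hypothetical helper: advances `s` to the next prefix under the character
// order given by `alphabet` (listed from lowest to highest). Returns false
// and leaves `s` unchanged if there is no greater prefix or if `s` contains
// a character that is not in `alphabet`.
inline bool next_prefix(std::string& s, const std::string& alphabet) {
    std::string t = s;
    while (!t.empty()) {
        const std::string::size_type pos = alphabet.find(t.back());
        if (pos == std::string::npos) return false;   // unknown character
        if (pos + 1 < alphabet.size()) {
            t.back() = alphabet[pos + 1];             // advance the right-most character
            s = t;
            return true;
        }
        t.pop_back();                                 // last in the order: drop it, carry left
    }
    return false;                                     // every character was the last one
}
```

With an alphabet of lowercase letters followed by uppercase letters, this reproduces both examples from the question.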