C++ sorting - special characters after normal letters - c++

I have a vector of strings which is holding strings like
BB
aA
12
b
AA
&
[
**
1
Using the default sort() with C++ I am given this sorted list
&
**
1
12
AA
BB
[
aA
b
Instead of that, I need it to be ordered with the normal letters A, a, B, b.... coming first, those followed by "special" characters like 0-9, *, [, , ~, !.... in their normal ASCII order.
I really have no idea how to go about changing the way that the vector is sorted in order to make sure that it is all in that order. Thanks.

Another untested solution.
If I missed a case, I'm sure someone will point it out, but here goes:
#include <iostream>
#include <algorithm>
#include <string>
#include <cctype>
#include <vector>
#include <iterator>
using namespace std;
int main() {
std::vector <std::string> StringVect = { "BB", "aA", "12", "b", "AA", "&", "[", "**", "1" };
std::sort(StringVect.begin(), StringVect.end(), []
(const std::string& s1, const std::string& s2)
{
if (s1.empty() || s2.empty())
return s1 < s2;
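// a convenience array; the != 0 makes each int-to-bool conversion explicit,
// since a bare int inside the braces would be a narrowing conversion
bool ac[] = { isalpha(s1[0]) != 0, isalpha(s2[0]) != 0,
isdigit(s1[0]) != 0, isdigit(s2[0]) != 0,
!isalnum(s1[0]), !isalnum(s2[0]) };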
// If both strings start with the same type, then return
// s1 < s2
if ((ac[0] && ac[1]) || // if both alpha strings
(ac[2] && ac[3]) || // if both digit strings
(ac[4] && ac[5])) // if both non-alphanumeric strings
return s1 < s2;
// if first string is alpha, or second string is not alphanumeric
// the strings are in order, else they are not
return (ac[0] || ac[5]);
});
copy(StringVect.begin(), StringVect.end(), ostream_iterator<string>(cout, "\n"));
}
Basically, the condition says this:
1) If one of the strings is empty, then return s1 < s2
2) If both strings start with the same character type, just return s1 < s2
3) If the first string starts with an alpha, or if the second string is not an alphanumeric, then the strings are in order and return true, else return false. The trick here is to realize that step 2) eliminated all combinations of both strings being of the same type, so our check at the step 3) stage becomes simplified.
Live example: http://ideone.com/jxxhIY
Edit:
If you are checking for case-insensitive strings, then you need to change the code, and add the case-insensitive check. I won't add the code, since there are multiple ways, both with their respective advantages and disadvantages, of doing a case-insensitive compare.
#include <iostream>
#include <algorithm>
#include <string>
#include <cctype>
#include <vector>
#include <iterator>
using namespace std;
int main() {
std::vector <std::string> StringVect = { "BB", "aA", "12", "b", "AA", "&", "[", "**", "1" };
std::sort(StringVect.begin(), StringVect.end(), []
(const std::string& s1, const std::string& s2)
{
if (s1.empty() || s2.empty())
return s1 < s2;
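// a convenience array; the != 0 makes each int-to-bool conversion explicit,
// since a bare int inside the braces would be a narrowing conversion
bool ac[] = { isalpha(s1[0]) != 0, isalpha(s2[0]) != 0,
isdigit(s1[0]) != 0, isdigit(s2[0]) != 0,
!isalnum(s1[0]), !isalnum(s2[0]) };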
// If both strings start with the same type, then return
// s1 < s2
if ((ac[2] && ac[3]) || (ac[4] && ac[5]))
return s1 < s2;
// case insensitive
if (ac[0] && ac[1]) // both strings are alpha
return myCaseInsensitiveComp(s1, s2); //returns true if s1 < s2, false otherwise
// if first string is alpha, or second string is not alphanumeric
// the strings are in order, else they are not
return (ac[0] || ac[5]);
});
copy(StringVect.begin(), StringVect.end(), ostream_iterator<string>(cout, "\n"));
}
Again, the myCaseInsensitiveComp is a stub that you should fill in with a function that accomplishes this goal. For a link, see this:
Case insensitive string comparison in C++
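One way the myCaseInsensitiveComp stub could be filled in is sketched below. It is only an illustration, assuming plain ASCII text in the default "C" locale; it compares the two strings character by character after lowering each character, and it is not necessarily the approach recommended in the linked question.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true if s1 sorts before s2 when letter case is ignored.
// Assumption: ASCII input; std::tolower is applied per character.
bool myCaseInsensitiveComp(const std::string& s1, const std::string& s2)
{
    return std::lexicographical_compare(
        s1.begin(), s1.end(), s2.begin(), s2.end(),
        [](char a, char b) {
            // cast first: passing a negative char to tolower is undefined
            return std::tolower(static_cast<unsigned char>(a)) <
                   std::tolower(static_cast<unsigned char>(b));
        });
}
```

Strings that differ only in case, such as "AA" and "aa", compare as equivalent here, so their relative order after std::sort is unspecified.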

You can specify your own function that compares values for the sort to use.
bool myfunction(std::string a, std::string b){ ... }
std::sort(data.begin(), data.end(), myfunction);
Using myfunction you can specify the order you want. Here is more information about sort:
http://www.cplusplus.com/reference/algorithm/sort/

Either way, you need to implement your own comparison logic and use it with sort.
(1) You can give a comparison function object to sort, and implement your own comparison(less than) logic in it, such as:
struct my_comparison {
bool operator()(const string &a, const string &b) const {
// implement the less than logic here
}
};
and then:
sort(v.begin(), v.end(), my_comparison());
Reference: http://en.cppreference.com/w/cpp/algorithm/sort
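As a rough sketch of what such a function object might look like for the ordering asked about in the question (A, a, B, b, ... first, then everything else in ASCII order): the names letters_first_key and letters_first_less are made up for this illustration, and the code assumes ASCII text in the default "C" locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: maps a character to a sort key.
// Letters get (0, 2 * index + isLower), so A < a < B < b < ...;
// every other character gets (1, code) and sorts after all letters.
inline std::pair<int, int> letters_first_key(char ch)
{
    const unsigned char c = static_cast<unsigned char>(ch);
    if (std::isalpha(c)) {
        const int lower = std::tolower(c);
        return { 0, 2 * (lower - 'a') + (std::islower(c) ? 1 : 0) };
    }
    return { 1, c };
}

// Comparison function object: compares two strings character by
// character using the keys above.
struct letters_first_less {
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](char x, char y) {
                return letters_first_key(x) < letters_first_key(y);
            });
    }
};
```

It would be used as sort(v.begin(), v.end(), letters_first_less()); with the question's data this should give AA, aA, BB, b, &, **, 1, 12, [.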
(2) You can implement your own char_traits<char> to make a special string which uses the special comparison logic. Such as:
struct my_char_traits : public char_traits<char>
// just inherit all the other functions
// that we don't need to replace
{
static bool eq(char c1, char c2) {
// implement the comparison logic here
}
static bool lt(char c1, char c2) {
// implement the comparison logic here
}
static int compare(const char* s1, const char* s2, size_t n) {
// implement the comparison logic here
}
};
And then:
typedef basic_string<char, my_char_traits> my_string;
vector<my_string> v;
// ...
sort(v.begin(), v.end());
Reference: http://en.cppreference.com/w/cpp/string/char_traits
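A minimal sketch of the char_traits approach, with the three functions filled in for the "letters first" ordering from the question. The names letters_first_traits, letters_first_string, rank and sort_letters_first are invented for this example, and the ranking assumes ASCII text in the default "C" locale.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct letters_first_traits : std::char_traits<char> {
    // Hypothetical ranking: A, a, B, b, ..., Z, z take 0..51,
    // every other character follows in ASCII order (52 + code).
    static int rank(char ch)
    {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalpha(c))
            return 2 * (std::tolower(c) - 'a') + (std::islower(c) ? 1 : 0);
        return 52 + c;
    }
    static bool eq(char a, char b) { return a == b; }
    static bool lt(char a, char b) { return rank(a) < rank(b); }
    static int compare(const char* s1, const char* s2, std::size_t n)
    {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
};

using letters_first_string = std::basic_string<char, letters_first_traits>;

// operator< of basic_string goes through the traits, so no
// comparator has to be passed to std::sort.
inline void sort_letters_first(std::vector<letters_first_string>& v)
{
    std::sort(v.begin(), v.end());
}
```

One trade-off of this design is that letters_first_string is a different type from std::string, so values have to be converted (for example via c_str()) when they cross into code that expects std::string.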

Disclaimer: Untested code.
Here's an implementation of a compare function that should work. The crux of the solution is to re-encode into an array the values corresponding to the characters whose order you want to be changed in the compare function.
void initializeEncoding(char encoding[])
{
for (int i = 0; i < 256; ++i )
{
encoding[i] = i;
}
// Now re-encode for letters, numbers, and other special characters.
// The control characters end at 31. 32 == ' ', the space character.
int nextIndex = 32;
// Re-encode the uppercase letters.
for ( int c = 'A'; c <= 'Z'; ++c, ++nextIndex )
{
encoding[c] = nextIndex;
}
// Re-encode the lowercase letters.
for ( int c = 'a'; c <= 'z'; ++c, ++nextIndex )
{
encoding[c] = nextIndex;
}
// Re-encode the numbers.
for ( int c = '0'; c <= '9'; ++c, ++nextIndex )
{
encoding[c] = nextIndex;
}
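// Re-encode the special characters ('@' is included so that it does not
// keep its old value, which would collide with a re-encoded letter).
char const* specialChars = " !\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";
for ( char const* cp = specialChars; *cp != '\0'; ++cp, ++nextIndex )
{
encoding[static_cast<unsigned char>(*cp)] = nextIndex;
}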
}
bool mycompare(char const* s1, char const* s2)
{
static char encoding[256];
static bool initialized = false;
if ( !initialized )
{
initializeEncoding(encoding);
initialized = true;
}
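// Index through unsigned char: a plain char may be negative.
for ( ; *s1 != '\0' && *s2 != '\0'; ++s1, ++s2 )
{
if ( encoding[static_cast<unsigned char>(*s1)] != encoding[static_cast<unsigned char>(*s2)] )
{
break;
}
}
return ((encoding[static_cast<unsigned char>(*s1)] - encoding[static_cast<unsigned char>(*s2)]) < 0);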
}
bool mycompare(std::string const& s1, std::string const& s2)
{
return mycompare(s1.c_str(), s2.c_str());
}

Related

What is the effective way to replace all occurrences of a character with every character in the alphabet?

What is the effective way to replace all occurrences of a character with every character in the alphabet in std::string?
#include <algorithm>
#include <string>
using namespace std;
void some_func() {
string s = "example *trin*";
string letters = "abcdefghijklmnopqrstuvwxyz";
// replace all '*' to 'letter of alphabet'
for (int i = 0; i < 25; i++)
{
//replace letter * with a letter in string which is moved +1 each loop
replace(s.begin(), s.end(), '*', letters.at(i));
i++;
cout << s;
}
How can I get this to work?
You can write a function that receives the string you want to operate on and the character you want to replace, and returns a list with the new strings once the replacement has been done.
For every letter in the alphabet, check whether the character to replace occurs in the input string and, if it does, create a copy of the input string, do the replacement using std::replace, and add the copy to the return list.
#include <algorithm> // replace
#include <fmt/ranges.h>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> replace(const std::string& s, const char c) {
std::string_view alphabet{"abcdefghijklmnopqrstuvwxyz"};
std::vector<std::string> ret{};
for (const char l : alphabet) {
if (s.find(c) != std::string::npos) {
std::string t{s};
std::ranges::replace(t, c, l);
ret.emplace_back(std::move(t));
}
}
return ret;
}
int main() {
std::string s{"occurrences"};
fmt::print("Replace '{}': {}\n", 'c', replace(s, 'c'));
fmt::print("Replace '{}': {}\n", 'z', replace(s, 'z'));
}
// Outputs:
//
// Replace 'c': ["oaaurrenaes", "obburrenbes", "occurrences"...]
// Replace 'z': []
Edit: update on your comment below.
however if I wanted to replace 1 character at a time for example in
occurrences there are multiple "C" if i only wanted to replace 1 of
them then run all outcomes of that then move onto the next "C" and
replace all of them and so on, how could that be done?
In that case, you'd need to iterate over your input string, doing the replacement to one char at a time, and adding each of those new strings to the return list.
for (const char l : alphabet) {
if (s.find(c) != std::string::npos) {
for (size_t i{0}; i < s.size(); ++i) {
if (s[i] == c) {
std::string t{s};
t[i] = l;
ret.emplace_back(std::move(t));
}
}
}
}
// Outputs:
//
// Replace 'c': ["oacurrences", "ocaurrences", "occurrenaes"...]
// Replace 'z': []
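For reference, the loop above could be wrapped into a complete function along the following lines. This is only a sketch: the name replace_one_at_a_time is made up here, and std::string is used for the alphabet so that the example does not depend on C++17 or on the fmt library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// For every letter of the alphabet and every position where c occurs,
// produce one copy of s with only that single occurrence replaced.
std::vector<std::string> replace_one_at_a_time(const std::string& s, char c)
{
    const std::string alphabet{"abcdefghijklmnopqrstuvwxyz"};
    std::vector<std::string> ret;
    for (const char l : alphabet) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            if (s[i] == c) {
                std::string t{s};
                t[i] = l;  // replace just this occurrence
                ret.push_back(std::move(t));
            }
        }
    }
    return ret;
}
```

The separate s.find(c) check is dropped in this version because the inner loop simply produces nothing when c does not occur in s.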

How do I read a string char by char in C++?

I need to read a string char by char in order to perform some controls on it. Is it possible to do that? Have I necessarily got to convert it to a char array?
I tried to point at single chars with string_to_control[i] and then increase i to move, but this doesn't seem to work.
As an example, I post a piece of the code for the control of parenthesis.
bool Class::func(const string& cont){
const string *p = &cont;
int k = 0;
//control for parenthesis
while (p[k].compare('\0') != 0) {
if (p[k].compare("(") == 0) { ap++; };
if (p[k].compare(")") == 0) { ch++; };
k++;
};
//...
};
The string is copied alright, but as soon as I try the first comparison an exception is thrown.
EDIT: I add that I would like to have different copies of the initial string cont (and move on them, rather than on cont directly) in order to manipulate them (later on in the code, I need to verify that certain words are in the right place).
The simplest way to iterate through a string character by character is a range-for:
bool Class::func(const string& cont){
for (char c : cont) {
if (c == '(') { ap++; }
if (c == ')') { ch++; }
}
//...
};
The range-for syntax was added in C++11. If, for some reason, you're using an old compiler that doesn't have C++11 support, you can iterate by index perfectly well without any casts or copies:
bool Class::func(const string& cont){
for (size_t i = 0; i < cont.size(); ++i) {
if (cont[i] == '(') { ap++; }
if (cont[i] == ')') { ch++; }
}
//...
};
If you just want to count the opening and closing parentheses take a look at this:
bool Class::func(const string& cont) {
for (const auto c : cont) {
switch (c) {
case '(': ++ap; break;
case ')': ++ch; break;
}
}
// ...
}
const string *p = &cont;
int k = 0;
while (p[k].compare('\0') != 0)
This treats p as if it were an array. Since p points to only a single std::string, your code has undefined behaviour as soon as k is non-zero. I assume what you actually wanted to write was:
bool Class::func(const string& cont){
size_t k = 0;
while (cont[k] != '\0') {
if (cont[k] == '(') { ap++; };
if (cont[k] == ')') { ch++; };
k++;
};
//...
};
A simpler way would be to iterate over std::string using begin() and end() or even more simply just use a range for loop:
bool Class::func(const string& cont){
for (char c : cont) { // named c so that it does not shadow the counter ch
if (c == '(') { ap++; };
if (c == ')') { ch++; };
};
//...
};
If you want to copy your string simply declare a new string:
std::string copy = cont;
The std::string::operator[] overload allows expressions such as cont[k]. Your code treats p as an array of std::string rather than as a sequence of characters, as you intended. That could be corrected by:
const string &p = cont;
but is unnecessary since you can already access cont directly.
cont[k] has type char so calling std::string::compare() is not valid. You can compare chars in the normal manner:
cont[k] == '('
You should also be aware that before C++11 the end of a std::string is not delimited by a \0 like a C string (there may happen to be a NUL after the string data, but that is trusting to luck). C++11 does guarantee that, but probably only to "fix" older code that made the assumption that it was.
If you use std::string::at rather than std::string::operator[], an exception will be thrown if you exceed the bounds. But you should use either a range-based for, a std::string::iterator or std::string::length() to iterate a string to the end.
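To illustrate the difference, here is a small sketch showing that std::string::at reports an out-of-range index by throwing std::out_of_range. The helper name index_is_valid is hypothetical and exists only for this demonstration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Returns true if s.at(i) succeeds, false if it throws.
// at() checks the index; operator[] does not.
bool index_is_valid(const std::string& s, std::size_t i)
{
    try {
        (void)s.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

In real code you would normally compare the index against size() rather than catch the exception; the try/catch is used here only to make the behaviour of at() visible.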
If you don't want to use iterators std::string also overloads operator[], so you can access the chars like you would do with a char[].
cont[i] will return the character at index i for example, then you can use == to compare it to another char:
bool Class::func(const string& cont){
int k = 0;
while (k < cont.length()) {
if (cont[k] == '(') { ap++; };
if (cont[k] == ')') { ch++; };
k++;
};
};
To count parentheses, you can use std::count algorithm from the standard library:
/* const */ auto ap = std::count(cont.begin(), cont.end(), '(');
/* const */ auto ch = std::count(cont.begin(), cont.end(), ')');
The string will be traversed twice.
For single traversal you can implement a generic function (requires C++17):
template<class C, typename... Ts>
auto count(const C& c, const Ts&... values) {
std::array<typename C::difference_type, sizeof...(Ts)> counts{};
for (auto& value : c) {
auto it = counts.begin();
((*it++ += (value == values)), ...);
}
return counts;
}
and then write
/* const */ auto [ap, ch] = count(cont, '(', ')');
First convert the string to a char array like this:
bool Class::func(const string& cont){
// copy the characters, including the terminating '\0' (needs <vector>)
std::vector<char> p(cont.c_str(), cont.c_str() + cont.size() + 1);
int k = 0;
//control for parenthesis
while (p[k] != '\0') {
if (p[k] == '(') { ap++; }
if (p[k] == ')') { ch++; }
k++;
}
//...
}
You could do what you want with an algorithm, which means you can avoid the array conversion:
#include <iostream>
#include <string>
#include <cstring>
#include <algorithm> // std::count
int main()
{
std::string s = "hi(there),(now))";
int ap = std::count (s.c_str(), s.c_str()+s.size(), '(');
int ch = std::count (s.c_str(), s.c_str()+s.size(), ')');
std::cout << ap << "," << ch << '\n'; // prints 2,3
return 0;
}

C++ writing a function that extracts words from a paragraph

The program I am writing reads a text file, breaks the paragraph into individual words, and compares them to a list of "sensitive words"; if a word from the text file matches a word from the sensitive word list, it is censored. I have written functions that find the beginning of each word, and a function that will censor or replace words on the sensitive word list with "#####" (which I left out of this post). A word in this case is any string that contains alphanumeric characters.
The function I am having trouble with is the function that will "extract" or return the individual words to compare to the sensitive word list (extractWord). At the moment it just returns the first letter of the last word in the sentence. So right now all the function does is return "w". I need all the individual words.
Here is what I have so far ...
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
bool wordBeginsAt (const std::string& message, int pos);
bool isAlphanumeric (char c); //
std::string extractWord (const std::string& fromMessage, int beginningAt);
int main()
{
string word = "I need to break these words up individually. 12345 count as words";
string newWord;
for (int i = 0; i < word.length(); ++i)
{
if (wordBeginsAt(word, i))
{
newWord = extractWord(word, i);
}
}
//cout << newWord; // testing output
return 0;
}
bool wordBeginsAt (const std::string& message, int pos)
{
if(pos==0)
{return true;}
else
if (isAlphanumeric(message[pos])==true && isAlphanumeric(message[pos- 1])==false)
{
return true;
}
else
return false;
}
bool isAlphanumeric (char c)
{
return (c >= 'A' && c <= 'Z')
|| (c >= 'a' && c <= 'z')
|| (c >= '0' && c <= '9');
}
std::string extractWord (const std::string& fromMessage, int beginningAt)
{
string targetWord= "";
targetWord = targetWord + fromMessage[beginningAt];
return targetWord;
}
edit: after trying to use targetWord as an array (which I couldn't define the size) and using several different for and while loops within extractWord I found a solution:
std::string extractWord (const std::string& fromMessage, int beginningAt)
{
string targetWord= "";
while (isAlphanumeric(fromMessage[beginningAt++]))
{
targetWord = targetWord + fromMessage[beginningAt-1];
}
return targetWord;
}
Since this is a C++ question, how about using modern C++, instead of using dressed-up C code? The modern C++ library has all the algorithms and functions needed to implement all of this work for you:
#include <algorithm>
#include <cctype>
std::string paragraph;
// Somehow, figure out how to get your paragraph into this std::string, then:
auto b=paragraph.begin(), e=paragraph.end();
while (b != e)
{
// Find first alphanumeric character, using isalnum() from <cctype>
auto p=std::find_if(b, e, [](unsigned char c) { return isalnum(c); });
// Find the next non-alphanumeric character
b=std::find_if(p, e, [](unsigned char c) { return !isalnum(c); });
if (isbadword(std::string(p, b)))
std::fill(p, b, '#');
}
This does pretty much what you asked, in a fraction of the size of all that manual code that manually searches this stuff. All you have to do is to figure out what...
bool isbadword(const std::string &s)
...needs to do.
Your homework assignment is how to slightly tweak this code to avoid, in certain specific situations, calling isbadword() with an empty string.
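For readers who want to check their answer, one possible shape of the complete routine is sketched below. The function name censor and the use of a std::set as the sensitive word list are assumptions made for this example (the set lookup stands in for isbadword()), and the early break is just one way to avoid testing an empty string.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Returns a copy of the paragraph with every word found in bad_words
// overwritten by '#' characters. A word is a run of alphanumerics.
std::string censor(std::string paragraph, const std::set<std::string>& bad_words)
{
    auto is_alnum = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0;
    };
    auto b = paragraph.begin();
    const auto e = paragraph.end();
    while (b != e) {
        // start of the next word
        auto p = std::find_if(b, e, is_alnum);
        if (p == e)
            break;  // only separators left: no empty word is tested
        // one past the end of that word
        b = std::find_if_not(p, e, is_alnum);
        if (bad_words.count(std::string(p, b)) != 0)
            std::fill(p, b, '#');
    }
    return paragraph;
}
```

The match is case-sensitive in this sketch; a case-insensitive word list would need the words normalised before the lookup.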

Sorting strings with numerical digits in it

I have strings like 7X1234 XY1236 NM1235. I want to sort these strings using only the last 4 numerical digits, ignoring the initial two characters. Also, I want to compare those numerical digits to see if they are sequential.
One way I can think of to achieve this is to split each string into its alphabetic and numeric parts (7X and 1234), lexical-cast the numeric string to int, and work on that. But how can I associate the alphabetic part with the numeric part again, that is, how do I prefix 7X to 1234 again once the numeric strings have been sorted and compared in C++?
In short if I have 7X1234 XY1236 NM1235 BV1238 I need to get 7X1234 NM1235 XY1236 BV1238
I did not elaborate that I wanted to find out if the numerical part of strings are sequential. Right now when I have just ints like 1234 1236 1235 1238 I do something like below
std::vector<int> sortedDigits{1234, 1235, 1236, 1238};
int count = 1;
int pos = 0;
std::vector<std::pair<int, int> > myVec;
myVec.push_back(std::make_pair(sortedDigits[pos], count));
for(size_t i = 1; i < sortedDigits.size(); ++i)
{
if(sortedDigits[i] != (sortedDigits[i-1] + 1))
{
count = 1;
myVec.push_back(std::make_pair(sortedDigits[i], count) );
++pos;
}
else
{
myVec[pos].second = ++count;
}
}
So at the end I get (1234, 3) and (1238, 1)
I don't know how can I get something like this when strings are there?
Since the character encoded values of numerals are ordered in the same order as the numbers they represent, you can do string comparison on the last four digits:
#include <cstring>
#include <string>
// Requires: a.size() >= 2, b.size() >= 2
bool two_less(std::string const & a, std::string const & b)
{
return std::strcmp(a.data() + 2, b.data() + 2) < 0;
}
Now use sort with predicate:
#include <algorithm>
#include <vector>
std::vector<std::string> data { "7X1234", "YX1236" };
std::sort(data.begin(), data.end(), two_less);
In C++11, and in particular if you have no repeated use for this, you can also use a lambda directly in the sort call:
std::sort(data.begin(), data.end(),
[](std::string const & a, std::string const & b)
{ return std::strcmp(a.data() + 2, b.data() + 2) < 0; });
Then you can even make the number "2" a captured variable if you need to vary it.
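A sketch of that last idea, with the offset captured by the lambda. The function name sort_from_offset is made up for this example, and std::string::compare is swapped in for std::strcmp so that only the std::string interface is used.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Sorts the strings by everything from position `offset` onwards.
// Requires: every string has at least `offset` characters, otherwise
// std::string::compare throws std::out_of_range.
void sort_from_offset(std::vector<std::string>& data, std::size_t offset)
{
    std::sort(data.begin(), data.end(),
        [offset](const std::string& a, const std::string& b) {
            return a.compare(offset, std::string::npos,
                             b, offset, std::string::npos) < 0;
        });
}
```

Called as sort_from_offset(data, 2), this orders 7X1234 XY1236 NM1235 BV1238 as 7X1234 NM1235 XY1236 BV1238.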
Use qsort and provide a comparator function that indexes into the start of the string plus an offset of two, rather than directly from the beginning of the string.
For example your comparator function could look like this:
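int compare (const void * a, const void * b)
{
// assumes each array element is itself a char buffer, e.g. char names[N][7]
const char * a_cmp = ((const char *)a)+2;
const char * b_cmp = ((const char *)b)+2;
return strcmp(a_cmp, b_cmp);
}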
You can, for example, make a struct like this:
struct combined{
string alph;
int numeral;
};
Put these in a C++ standard container and use the sort algorithm with a user-defined compare object.
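A rough sketch of how this struct could also cover the "are they sequential" part of the question. The helper names split_code and sequential_runs are invented here, and the code assumes the XX9999 layout from the question with no leading zeros in the numeric part (std::to_string would drop them).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct combined {
    std::string alph;
    int numeral;
};

// Splits "7X1234" into {"7X", 1234}. Assumes a two-character prefix.
combined split_code(const std::string& s)
{
    return { s.substr(0, 2), std::stoi(s.substr(2)) };
}

// Sorts by the numeric part, then reports each run of consecutive
// numbers as (first string of the run, length of the run).
std::vector<std::pair<std::string, int>>
sequential_runs(const std::vector<std::string>& codes)
{
    std::vector<combined> items;
    for (const auto& s : codes)
        items.push_back(split_code(s));
    std::sort(items.begin(), items.end(),
              [](const combined& a, const combined& b) {
                  return a.numeral < b.numeral;
              });

    std::vector<std::pair<std::string, int>> runs;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0 && items[i].numeral == items[i - 1].numeral + 1)
            ++runs.back().second;  // extends the current run
        else
            runs.emplace_back(
                items[i].alph + std::to_string(items[i].numeral), 1);
    }
    return runs;
}
```

For 7X1234 XY1236 NM1235 BV1238 this yields (7X1234, 3) and (BV1238, 1), mirroring the (1234, 3) and (1238, 1) result described in the question.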
You should create a class that encapsulates your string and has an int field and a string field. This class can overload the comparison operators.
class NumberedString
{
private:
int number;
string originalString;
public:
NumberedString(string original) { ... }
friend bool operator> (const NumberedString &left, const NumberedString &right);
friend bool operator<=(const NumberedString &left, const NumberedString &right);
friend bool operator< (const NumberedString &left, const NumberedString &right);
friend bool operator>=(const NumberedString &left, const NumberedString &right);
};
You can just define your comparator
bool mycomparator(const std::string& a, const std::string& b) {
return a.substr(2) < b.substr(2);
}
then you can sort your std::vector<std::string> passing mycomparator as third parameter.
In C++11 this is also a case in which an anonymous lambda is a good fit...
#include <vector>
#include <algorithm>
#include <string>
#include <iostream>
int main(int argc, const char *argv[])
{
std::vector<std::string> data = {"7X1234", "XY1236", "NM1235", "BV1238"};
std::sort(data.begin(), data.end(),
[](const std::string& a, const std::string& b) {
return a.substr(2) < b.substr(2);
});
for (auto x : data) {
std::cout << x << std::endl;
}
return 0;
}
If you're 100% sure that the strings in the array are in XX9999 format you can use instead
return strncmp(a.data()+2, b.data()+2, 4) < 0;
which is more efficient because it doesn't require any memory allocation to do the comparison.
Use a std::map<int, std::string>, using the int value as key and the respective string as value. You can then simply iterate over the map and retrieve the strings; they will already be in sorted order.
How about something like this:
std::string str[] = { "7X1234", "XY1236", "NM1235" };
std::map<int, std::string> m;
for(const auto& s : str)
{
std::istringstream ss(s.substr(2)); // needs <sstream>; >> requires an input stream
int num;
ss >> num;
m[num] = s;
}
for(const auto& i : m)
{
std::cout << i.second << " ";
}
std::cout << std::endl;
I just typed this in, so minor typos/bugs may be there, but principle should work.

How to sort file names with numbers and alphabets in order in C?

I have used the following code to sort files in alphabetical order and it sorts the files as shown in the figure:
for(int i = 0;i < maxcnt;i++)
{
for(int j = i+1;j < maxcnt;j++)
{
if(strcmp(Array[i],Array[j]) > 0)
{
strcpy(temp,Array[i]);
strcpy(Array[i],Array[j]);
strcpy(Array[j],temp);
}
}
}
But I need to sort it in the order seen in Windows Explorer.
How can I sort it this way? Please help.
For a C answer, the following is a replacement for strcasecmp(). This function recurses to handle strings that contain alternating numeric and non-numeric substrings. You can use it with qsort():
int strcasecmp_withNumbers(const void *void_a, const void *void_b) {
const char *a = void_a;
const char *b = void_b;
if (!a || !b) { // if one doesn't exist, other wins by default
return a ? 1 : b ? -1 : 0;
}
if (isdigit(*a) && isdigit(*b)) { // if both start with numbers
char *remainderA;
char *remainderB;
long valA = strtol(a, &remainderA, 10);
long valB = strtol(b, &remainderB, 10);
if (valA != valB)
return valA - valB;
// if you wish 7 == 007, comment out the next two lines
else if (remainderB - b != remainderA - a) // equal with diff lengths
return (remainderB - b) - (remainderA - a); // set 007 before 7
else // if numerical parts equal, recurse
return strcasecmp_withNumbers(remainderA, remainderB);
}
if (isdigit(*a) || isdigit(*b)) { // if just one is a number
return isdigit(*a) ? -1 : 1; // numbers always come first
}
while (*a && *b) { // non-numeric characters
if (isdigit(*a) || isdigit(*b))
return strcasecmp_withNumbers(a, b); // recurse
if (tolower(*a) != tolower(*b))
return tolower(*a) - tolower(*b);
a++;
b++;
}
return *a ? 1 : *b ? -1 : 0;
}
Notes:
Windows needs stricmp() rather than the Unix equivalent strcasecmp().
The above code will (obviously) give incorrect results if the numbers are really big.
Leading zeros are ignored here. In my area, this is a feature, not a bug: we usually want UAL0123 to match UAL123. But this may or may not be what you require.
See also Sort on a string that may contain a number and How to implement a natural sort algorithm in c++?, although the answers there, or in their links, are certainly long and rambling compared with the above code, by about a factor of at least four.
Natural sorting is the approach you need here. I have working code for my scenario; you can probably adapt it to your needs:
#ifndef JSW_NATURAL_COMPARE
#define JSW_NATURAL_COMPARE
#include <string>
int natural_compare(const char *a, const char *b);
int natural_compare(const std::string& a, const std::string& b);
#endif
#include <cctype>
namespace {
// Note: This is a convenience for the natural_compare
// function, it is *not* designed for general use
class int_span {
int _ws;
int _zeros;
const char *_value;
const char *_end;
public:
int_span(const char *src)
{
const char *start = src;
// Save and skip leading whitespace
while (std::isspace(*(unsigned char*)src)) ++src;
_ws = src - start;
// Save and skip leading zeros
start = src;
while (*src == '0') ++src;
_zeros = src - start;
// Save the edges of the value
_value = src;
while (std::isdigit(*(unsigned char*)src)) ++src;
_end = src;
}
bool is_int() const { return _value != _end; }
const char *value() const { return _value; }
int whitespace() const { return _ws; }
int zeros() const { return _zeros; }
int digits() const { return _end - _value; }
int non_value() const { return whitespace() + zeros(); }
};
inline int safe_compare(int a, int b)
{
return a < b ? -1 : a > b;
}
}
int natural_compare(const char *a, const char *b)
{
int cmp = 0;
while (cmp == 0 && *a != '\0' && *b != '\0') {
int_span lhs(a), rhs(b);
if (lhs.is_int() && rhs.is_int()) {
if (lhs.digits() != rhs.digits()) {
// For differing widths (excluding leading characters),
// the value with fewer digits takes priority
cmp = safe_compare(lhs.digits(), rhs.digits());
}
else {
int digits = lhs.digits();
a = lhs.value();
b = rhs.value();
// For matching widths (excluding leading characters),
// search from MSD to LSD for the larger value
while (--digits >= 0 && cmp == 0)
cmp = safe_compare(*a++, *b++);
}
if (cmp == 0) {
// If the values are equal, we need a tie
// breaker using leading whitespace and zeros
if (lhs.non_value() != rhs.non_value()) {
// For differing widths of combined whitespace and
// leading zeros, the smaller width takes priority
cmp = safe_compare(lhs.non_value(), rhs.non_value());
}
else {
// For matching widths of combined whitespace
// and leading zeros, more whitespace takes priority
cmp = safe_compare(rhs.whitespace(), lhs.whitespace());
}
}
}
else {
// No special logic unless both spans are integers
cmp = safe_compare(*a++, *b++);
}
}
// All else being equal so far, the shorter string takes priority
return cmp == 0 ? safe_compare(*a, *b) : cmp;
}
#include <string>
int natural_compare(const std::string& a, const std::string& b)
{
return natural_compare(a.c_str(), b.c_str());
}
What you want to do is perform "Natural Sort". Here is a blog post about it, explaining implementation in python I believe. Here is a perl module that accomplishes it. There also seems to be a similar question at How to implement a natural sort algorithm in c++?
Taking into account that this has a C++ tag, you could build on @Joseph Quinsey's answer and create a natural_less function to be passed to the standard library.
using namespace std;
bool natural_less(const string& lhs, const string& rhs)
{
return strcasecmp_withNumbers(lhs.c_str(), rhs.c_str()) < 0;
}
void example(vector<string>& data)
{
std::sort(data.begin(), data.end(), natural_less);
}
I took the time to write some working code as an exercise
https://github.com/kennethlaskoski/natural_less
Modifying this answer:
bool compareNat(const std::string& a, const std::string& b){
if (a.empty())
return !b.empty(); // an empty string is less than any non-empty one, but not less than itself
if (b.empty())
return false;
if (std::isdigit(a[0]) && !std::isdigit(b[0]))
return true;
if (!std::isdigit(a[0]) && std::isdigit(b[0]))
return false;
if (!std::isdigit(a[0]) && !std::isdigit(b[0]))
{
if (a[0] == b[0])
return compareNat(a.substr(1), b.substr(1));
return (toUpper(a) < toUpper(b));
//toUpper() is a function to convert a std::string to uppercase.
}
// Both strings begin with digit --> parse both numbers
std::istringstream issa(a);
std::istringstream issb(b);
int ia, ib;
issa >> ia;
issb >> ib;
if (ia != ib)
return ia < ib;
// Numbers are the same --> remove numbers and recurse
std::string anew, bnew;
std::getline(issa, anew);
std::getline(issb, bnew);
return (compareNat(anew, bnew));
}
toUpper() function:
std::string toUpper(std::string s){
for(int i=0;i<(int)s.length();i++){s[i]=toupper(s[i]);}
return s;
}
Usage:
#include <iostream> // std::cout
#include <string>
#include <algorithm> // std::sort, std::copy
#include <iterator> // std::ostream_iterator
#include <sstream> // std::istringstream
#include <vector>
#include <cctype> // std::isdigit
int main()
{
std::vector<std::string> str;
str.push_back("20.txt");
str.push_back("10.txt");
str.push_back("1.txt");
str.push_back("z2.txt");
str.push_back("z10.txt");
str.push_back("z100.txt");
str.push_back("1_t.txt");
str.push_back("abc.txt");
str.push_back("Abc.txt");
str.push_back("bcd.txt");
std::sort(str.begin(), str.end(), compareNat);
std::copy(str.begin(), str.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Your problem is that you have an interpretation behind parts of the file name.
In lexicographical order, Slide1 is before Slide10 which is before Slide5.
You expect Slide5 before Slide10 as you have an interpretation of the substrings 5 and 10 (as integers).
You will run into more problems if you have the name of the month in the filename and expect the files to be ordered by date (i.e. January comes before August). You will need to adjust your sorting to this interpretation (the "natural" order depends on your interpretation; there is no generic solution).
Another approach is to format the filenames in a way that your sorting and the lexicographical order agree. In your case, you would use leading zeroes and a fixed length for the number. So Slide1 becomes Slide01, and then you will see that sorting them lexicographically will yield the result you would like to have.
However, often you cannot influence the output of an application, and thus cannot enforce your format directly.
What I do in those cases: write a little script/function that renames the file to a proper format, and then use standard sorting algorithms to sort them. The advantage of this is that you do not need to adapt your sorting, and can use existing software for the sorting.
On the downside, there are situations where this is not feasible (as filenames need to be fixed).
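As an illustration of the renaming idea, the sketch below pads every run of digits in a name with leading zeros up to a fixed width, so that plain lexicographical sorting then agrees with the expected order. The function name pad_numbers and the choice of a single width for all numbers are assumptions of this example; the actual renaming of files on disk is left out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns a copy of name in which every run of digits shorter than
// `width` is left-padded with '0'. Longer runs are kept unchanged.
std::string pad_numbers(const std::string& name, std::size_t width)
{
    std::string out;
    std::size_t i = 0;
    while (i < name.size()) {
        if (std::isdigit(static_cast<unsigned char>(name[i]))) {
            // find the end of this run of digits
            std::size_t j = i;
            while (j < name.size() &&
                   std::isdigit(static_cast<unsigned char>(name[j])))
                ++j;
            const std::size_t len = j - i;
            if (len < width)
                out.append(width - len, '0');
            out.append(name, i, len);
            i = j;
        } else {
            out.push_back(name[i++]);
        }
    }
    return out;
}
```

With a width of 2, Slide1 becomes Slide01 and Slide5 becomes Slide05, which sorts before Slide10 under the default string comparison.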