How do I read a string char by char in C++? - c++

I need to read a string character by character in order to run some checks on it. Is that possible? Do I have to convert it to a char array first?
I tried to access single chars with string_to_control[i] and then increment i to move along, but this doesn't seem to work.
As an example, here is the part of the code that checks the parentheses.
bool Class::func(const string& cont){
const string *p = &cont;
int k = 0;
//control for parenthesis
while (p[k].compare('\0') != 0) {
if (p[k].compare("(") == 0) { ap++; };
if (p[k].compare(")") == 0) { ch++; };
k++;
};
//...
};
The string is copied alright, but as soon as I try the first comparison an exception is thrown.
EDIT: I add that I would like to have different copies of the initial string cont (and move on them, rather than on cont directly) in order to manipulate them (later on in the code, I need to verify that certain words are in the right place).

The simplest way to iterate through a string character by character is a range-for:
bool Class::func(const string& cont){
for (char c : cont) {
if (c == '(') { ap++; }
if (c == ')') { ch++; }
}
//...
};
The range-for syntax was added in C++11. If, for some reason, you're using an old compiler that doesn't have C++11 support, you can iterate by index perfectly well without any casts or copies:
bool Class::func(const string& cont){
for (size_t i = 0; i < cont.size(); ++i) {
if (cont[i] == '(') { ap++; }
if (cont[i] == ')') { ch++; }
}
//...
};

If you just want to count the opening and closing parentheses take a look at this:
bool Class::func(const string& cont) {
for (const auto c : cont) {
switch (c) {
case '(': ++ap; break;
case ')': ++ch; break;
}
}
// ...
}

const string *p = &cont;
int k = 0;
while (p[k].compare('\0') != 0)
This treats p as if it were an array. As p only points to a single value, your code has undefined behaviour when k is non-zero. I assume what you actually wanted to write was:
bool Class::func(const string& cont){
int k = 0;
while (cont[k] != '\0') {
if (cont[k] == '(') { ap++; };
if (cont[k] == ')') { ch++; };
k++;
};
};
A simpler way would be to iterate over std::string using begin() and end() or even more simply just use a range for loop:
bool Class::func(const string& cont){
for (char c : cont) { // named c so it does not shadow the counter ch
if (c == '(') { ap++; };
if (c == ')') { ch++; };
};
};
If you want to copy your string simply declare a new string:
std::string copy = cont;

The std::string::operator[] overload allows expressions such as cont[k]. Your code treats p as an array of std::string rather than an array of characters as you intended. That could be corrected by:
const string &p = cont;
but is unnecessary since you can already access cont directly.
cont[k] has type char so calling std::string::compare() is not valid. You can compare chars in the normal manner:
cont[k] == '('
You should also be aware that before C++11 the end of a std::string is not delimited by a \0 like a C string (there may happen to be a NUL after the string data, but that is trusting to luck). C++11 does guarantee that, but probably only to "fix" older code that made the assumption that it was.
If you use std::string::at rather than std::string::operator[], an exception will be thrown if you exceed the bounds. But you should use either a range-based for, a std::string::iterator or std::string::length() to iterate a string to the end.
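As a small sketch of the two points above (bounds-checked access with at(), and iterating without relying on a terminating '\0'), something like the following could be used. The helper names count_char and at_throws are made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: counts occurrences of `target` using iterators,
// so the loop never depends on a terminating '\0'.
std::size_t count_char(const std::string& s, char target) {
    std::size_t n = 0;
    for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
        if (*it == target) { ++n; }
    }
    return n;
}

// Hypothetical helper: returns true if reading index `i` with at() throws.
bool at_throws(const std::string& s, std::size_t i) {
    try {
        (void)s.at(i);   // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;     // i was not a valid index (i >= s.size())
    }
}
```

With operator[] the out-of-range read would simply be undefined behaviour instead of a catchable exception.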

If you don't want to use iterators std::string also overloads operator[], so you can access the chars like you would do with a char[].
cont[i] will return the character at index i for example, then you can use == to compare it to another char:
bool Class::func(const string& cont){
int k = 0;
while (k < cont.length()) {
if (cont[k] == '(') { ap++; };
if (cont[k] == ')') { ch++; };
k++;
};
};

To count parentheses, you can use std::count algorithm from the standard library:
/* const */ auto ap = std::count(cont.begin(), cont.end(), '(');
/* const */ auto ch = std::count(cont.begin(), cont.end(), ')');
The string will be traversed twice.
For single traversal you can implement a generic function (requires C++17):
template<class C, typename... Ts>
auto count(const C& c, const Ts&... values) {
std::array<typename C::difference_type, sizeof...(Ts)> counts{};
for (auto& value : c) {
auto it = counts.begin();
((*it++ += (value == values)), ...);
}
return counts;
}
and then write
/* const */ auto [ap, ch] = count(cont, '(', ')');

First convert the string to a char array like this:
bool Class::func(const string& cont){
// copy the characters, including the terminating '\0' (needs <vector>)
std::vector<char> p(cont.c_str(), cont.c_str() + cont.size() + 1);
int k = 0;
//control for parenthesis
while (p[k] != '\0') {
if (p[k] == '(') { ap++; }
if (p[k] == ')') { ch++; }
k++;
}
//...
}
You could do what you want with an algorithm, which means you can avoid the array conversion:
#include <iostream>
#include <string>
#include <cstring>
#include <algorithm> // std::count
int main()
{
std::string s = "hi(there),(now))";
int ap = std::count (s.c_str(), s.c_str()+s.size(), '(');
int ch = std::count (s.c_str(), s.c_str()+s.size(), ')');
std::cout << ap << "," << ch << '\n'; // prints 2,3
return 0;
}

Related

Check if two given strings are isomorphic to each other c++, not sure why it's wrong

class Solution {
public:
bool isIsomorphic(string s, string t) {
vector <int> sfreq (26,0);
vector <int> tfreq (26,0);
for (int i=0; i<s.size(); i++) {
sfreq[s[i]-'a']++;
tfreq[t[i]-'a']++;
}
if (sfreq != tfreq) {
return false;
}
return true;
}
};
Hi, this is my code in c++, I saw something similar from https://www.geeksforgeeks.org/check-if-two-given-strings-are-isomorphic-to-each-other/ but my answer shows it's wrong. Can anyone please tell me why it's wrong?
You have misunderstood the problem description.
Your code assumes that permuting the characters of the input does not change the answer, and that it is enough for the two histograms to be equal.
The position of each character matters: the characters at the same index in the two strings form a pair, and every such pair must be consistent with a single mapping.
Here is my code, which passed:
class Solution {
public:
static bool canMapInOneDirection(std::string_view s, std::string_view t)
{
const auto n = s.size();
std::array<char, 128> mapping{};
for(size_t i = 0; i < n; ++i) {
if (mapping[s[i]] == 0) mapping[s[i]] = t[i];
else if (mapping[s[i]] != t[i]) return false;
}
return true;
}
bool isIsomorphic(string s, string t)
{
return s.size() == t.size() && canMapInOneDirection(s, t) && canMapInOneDirection(t, s);
}
};
And test cases you can use to test your code:
| s | t | answer |
|---|---|---|
| "a" | "b" | true |
| "aa" | "bb" | true |
| "ab" | "aa" | false |
| "aabbcc" | "aabcbc" | false |
https://godbolt.org/z/61EcTK5fq
This is not a question about anagrams or directly about character frequencies. It is about pattern: having a character-by-character mapping that turns one string into the other. AABC is isomorphic to XXYZ but not isomorphic to BCAA.
When we talk about Isomorphism (same form) it's often a good idea to look for a signature representation.
So instead of determining if two strings are isomorphic I've decided to define a unique signature representation and determine isomorphism if two strings map to the same signature.
I've used std::vector<char> for the signature representation such that the first character (if any) is assigned 0 the second (previously unseen) character 1 and so on.
So a string like MOON has signature {0,1,1,2} because the middle characters are the only repeats. MOON is isomorphic to BOOK but not NOON.
The advantage of such a strategy is that if many strings are to be compared to find groups of mutually isomorphic strings each string need only be converted to its signature once.
#include <iostream>
#include <string>
#include <vector>
#include <unordered_map>
std::vector<char> get_signature(const std::string& str){
std::vector<char> result;
std::unordered_map<char,char> map;
char curr{1};
for(auto cchar : str){
char& c{map[cchar]};
if(c==0){
c=curr++;
}
result.emplace_back(c-1);
}
return result;
}
int check_signature(const std::string& str, const std::vector<char>& expect ){
const auto result{get_signature(str)};
return result==expect?0:1;
}
int main() {
int errors{0};
{
const std::string str{"ABCDE"};
const std::vector<char> signat{0,1,2,3,4};
errors+=check_signature(str,signat);
}
{
const std::string str{"BABY"};
const std::vector<char> signat{0,1,0,2};
errors+=check_signature(str,signat);
}
{
const std::string str{"XXYZX"};
const std::vector<char> signat{0,0,1,2,0};
errors+=check_signature(str,signat);
}
{
const std::string str{"AABCA"};
const std::vector<char> signat{0,0,1,2,0};
errors+=check_signature(str,signat);
}
{
const std::string str{""};
const std::vector<char> signat{};
errors+=check_signature(str,signat);
}
{
const std::string str{"Z"};
const std::vector<char> signat{0};
errors+=check_signature(str,signat);
}
if(get_signature("XXYZX")!=get_signature("AABCA")){
++errors;
}
if(get_signature("MOON")==get_signature("AABCA")){
++errors;
}
if(get_signature("MOON")!=get_signature("BOOK")){
++errors;
}
if(get_signature("MOON")==get_signature("NOON")){
++errors;
}
if(errors!=0){
std::cout << "ERRORS\n";
}else{
std::cout << "SUCCESS\n";
}
return 0;
}
Expected Output: SUCCESS
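The grouping use case mentioned above (finding groups of mutually isomorphic strings while converting each string only once) could look roughly like this. It is only a sketch: signature_of and group_isomorphic are names invented here, and the signature uses int rather than char, which is a choice made for this example.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Signature as described above: each previously unseen character gets the
// next number (0, 1, 2, ...), repeats reuse the number assigned earlier.
std::vector<int> signature_of(const std::string& str) {
    std::vector<int> result;
    std::unordered_map<char, int> first_seen;
    for (char c : str) {
        // emplace leaves an existing entry untouched and returns it
        auto inserted = first_seen.emplace(c, static_cast<int>(first_seen.size()));
        result.push_back(inserted.first->second);
    }
    return result;
}

// Groups words so that all words in one group are mutually isomorphic.
// Each word is converted to its signature exactly once.
std::map<std::vector<int>, std::vector<std::string>>
group_isomorphic(const std::vector<std::string>& words) {
    std::map<std::vector<int>, std::vector<std::string>> groups;
    for (const std::string& w : words) {
        groups[signature_of(w)].push_back(w);
    }
    return groups;
}
```

Using the signature as a map key means a new word is matched against all existing groups with one lookup rather than one pairwise comparison per group.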
It is wrong because you are missing a second loop.
Note that even with that loop added (first snippet below), more corner-case checking is needed to make it fully work. The second approach further down handles all cases properly.
class Solution {
public:
bool isIsomorphic(string s, string t) {
vector <int> sfreq (26,0);
vector <int> tfreq (26,0);
for (int i=0; i < s.size(); i++) {
sfreq[s[i]-'a']++;
tfreq[t[i]-'a']++;
}
// character at the same index (can be different character) should have the same count.
for(int i= 0; i < s.size(); i++)
if (sfreq[s[i]-'a'] != tfreq[t[i]-'a']) return false;
return true;
}
};
But the above solution only works if there is a direct index mapping between characters, as with AAABBCA and XXXYYZX. It fails for bbbaaaba and aaabbbba. It also does not handle uppercase letters. The link you shared contains the wrong implementation, which is mentioned in the comments.
The solution below works, as far as I have tested.
class Solution {
public:
bool isIsomorphic(string s, string t) {
vector<int> scount(128, -1), tcount(128, -1);
for (int i = 0; i < s.size(); ++i) {
auto schar = s[i], tchar = t[i];
if (scount[schar] == -1) {
scount[schar] = tchar;
if (tcount[tchar] != -1) return false;
else tcount[tchar] = schar;
} else if (scount[schar] != tchar) return false;
}
return true;
}
};
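To see the difference on the counterexample mentioned above, the two checks can be put side by side. This is a sketch under assumptions: the function names are made up, both strings are assumed to have equal length, and frequency_check assumes lowercase a-z only.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Per-index frequency comparison, same idea as the first snippet above.
bool frequency_check(const std::string& s, const std::string& t) {
    std::array<int, 26> sfreq{}, tfreq{};
    for (std::size_t i = 0; i < s.size(); ++i) {
        ++sfreq[s[i] - 'a'];
        ++tfreq[t[i] - 'a'];
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (sfreq[s[i] - 'a'] != tfreq[t[i] - 'a']) return false;
    }
    return true;
}

// Two-way mapping check, same idea as the second snippet above.
bool mapping_check(const std::string& s, const std::string& t) {
    std::array<int, 256> s_to_t, t_to_s;
    s_to_t.fill(-1);
    t_to_s.fill(-1);
    for (std::size_t i = 0; i < s.size(); ++i) {
        const int sc = static_cast<unsigned char>(s[i]);
        const int tc = static_cast<unsigned char>(t[i]);
        if (s_to_t[sc] == -1 && t_to_s[tc] == -1) {
            s_to_t[sc] = tc;   // first time this pair is seen: record it
            t_to_s[tc] = sc;
        } else if (s_to_t[sc] != tc || t_to_s[tc] != sc) {
            return false;      // contradicts a pair recorded earlier
        }
    }
    return true;
}
```

For "bbbaaaba" and "aaabbbba" every character occurs four times, so frequency_check accepts the pair, while mapping_check rejects it at index 6, where b would have to map to both a and b.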

Creating a recursion in c++

I'm learning how to write recursions, and I am confused as to how to simplify the body of a function into a recursion.
For my current assignment, I have "to Mesh two strings by alternating characters from them. If one string runs out before the other, just pick from the longer one. For example, mesh("Fred", "Wilma") is "FWrieldma". Use recursion. Do not use loops."
Well... I created the loop....
string result;
for (int i = 0; i < a.size(); ++i)
{
result += a.at(i) + b.at(i)
}
But making that into a recursion is stumping me.
This is what I have so far (We are not allowed to change anything above or below where it is marked):
#include <string>
using namespace std;
/**
Combines alternate characters from each string.
@param a the first string.
@param b the second string
*/
string mesh(string a, string b)
{
// INSERT CODE BENEATH HERE
string result;
if (a.size() < 1) result = b;
if (b.size() < 1) result = a;
else
{
result = a.at(0) + b.at(1);
}
return result;
// ENTER CODE ABOVE HERE
}
But I know it's not right, because there is no recursion and it flat out doesn't work.
I think this does what you've asked and keeps the function prototype intact. Also it looks similar to your suggested code.
#include <cstdio>  // printf
#include <string>
using namespace std;
string mesh(string a, string b) {
if (!a.size()) return b;
if (!b.size()) return a;
return a[0] + (b[0] + mesh(a.substr(1), b.substr(1)));
}
int main(int argc, char const *argv[])
{
printf("%s\n", mesh("Fred", "Wilma").c_str());
return 0;
}
First try to find out what is a single step of the recursion. There is more than one way to do it, one possibility is traverse the strings by using some index pos and in a single step add the characters from the respective positions of the strings:
std::string mesh(const std::string& a, const std::string& b,size_t pos) {
/*...*/
std::string result;
if (pos < a.size()) result += a[pos];
if (pos < b.size()) result += b[pos];
/*...*/
}
To recurse we call the method again for the next index and append to result:
std::string mesh(const std::string& a, const std::string& b,size_t pos = 0) {
/*...*/
std::string result;
if (pos < a.size()) result += a[pos];
if (pos < b.size()) result += b[pos];
return result + mesh(a,b,pos+1);
}
Finally we need a stop condition. The recursion should stop when both strings have no more characters at index pos:
std::string mesh(const std::string& a, const std::string& b,size_t pos = 0) {
if (pos >= a.size() && pos >= b.size()) return "";
std::string result;
if (pos < a.size()) result += a[pos];
if (pos < b.size()) result += b[pos];
return result + mesh(a,b,pos+1);
}
For example:
int main() {
std::cout << mesh("Fred","Wilma");
}
will result in the desired FWrieldma output.
Disclaimer: As pointed out by SergeyA, I didn't pay too much attention to performance when writing this answer. I suppose this is an exercise to practice recursion; in reality I don't see a reason to implement this via recursion.
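For comparison only (the assignment itself asks for recursion), a loop-based version of the same step could be sketched like this. The name mesh_iterative is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Loop-based counterpart of the recursive mesh above: at each position take
// one character from a, then one from b, skipping whichever has run out.
std::string mesh_iterative(const std::string& a, const std::string& b) {
    std::string result;
    result.reserve(a.size() + b.size());
    const std::size_t longest = a.size() > b.size() ? a.size() : b.size();
    for (std::size_t pos = 0; pos < longest; ++pos) {
        if (pos < a.size()) result += a[pos];
        if (pos < b.size()) result += b[pos];
    }
    return result;
}
```

The loop body is the same single step as in the recursive version; only the way of advancing pos differs, and no temporary strings are concatenated on the way back.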
Just adding onto largest_prime_is_463035's answer.
If you have to keep the signature of mesh the same, you can create another function that holds the actual implementation, so that mesh can still be called with only the two string arguments.
#include <string>
#include <iostream>
using namespace std;
/**
Combines alternate characters from each string.
@param a the first string.
@param b the second string
*/
void meshInternal(const string a, const string b, string& result, unsigned int index=0){
if(index >= a.size()){
result += b.substr(index);
return;
}
if(index >= b.size()){
result += a.substr(index);
return;
}
result.push_back(a[index]);
result.push_back(b[index]);
meshInternal(a, b, result, ++index);
}
string mesh(const string a, const string b)
{
string result = "";
meshInternal(a, b, result); // use the arguments, not hard-coded strings
return result;
}
int main() {
string result = mesh("Fred", "Wilma");
std::cout << result << std::endl;
return 0;
}
Since we cannot pass another parameter to mesh, each recursive call needs some other way to know which characters of a and b to append next. One simple solution is to remove the first character from both a and b and append those characters to the result. Because a and b are passed by reference, repeatedly removing the first character eventually leaves both strings empty, so "both a and b are empty" can serve as the base case of the recursion. Note that this changes the parameters to non-const references and empties the caller's strings.
This code solves the problem:
std::string mesh(string& a, string& b) {
if (a.size() == 0 && b.size() == 0) return "";
std::string result;
if (a.size()) {
result += a[0];
a.erase(0, 1);
}
if (b.size()) {
result += b[0];
b.erase(0, 1);
}
return result + mesh(a,b);
}
int main()
{
string a = "Fred";
string b = "Wilma";
std::cout << mesh(a,b);
return 0;
}
#include <string>
#include <iostream>
#include <string_view>
// recursive mesh function.
// passing the result object for efficiency.
void mesh(std::string& result, std::string_view l, std::string_view r)
{
// check the exit condition.
// If either the left or right is empty, add the other to the result.
if (std::begin(l) == std::end(l)) {
result += r;
return;
}
if (std::begin(r) == std::end(r)) {
result += l;
return;
}
// Add letter from the left and right to the result.
result += *std::begin(l);
result += *std::begin(r);
// Adjust the size of the view
l.remove_prefix(1);
r.remove_prefix(1);
// recursively call to get the next letter.
mesh(result, l, r);
}
// Utility wrapper to get view of strings and create
// the result object to be passed to the recursive function.
std::string mesh(std::string const& l, std::string const& r)
{
std::string result;
mesh(result, std::string_view(l), std::string_view(r));
return result;
}
int main()
{
std::cout << mesh("Fred", "Wilma");
}

C++ Brute Force attack function does not return results

so I'm currently working on a brute force attacker project in C++. I've managed to get it working, but one problem that I'm facing is that if the program actually managed to get a correct guess, the function still goes on. I think the problem is that the program fails to return a guess. Take a look at my code:
(Sorry for the mess, by the way, I'm not that experienced in C++ - I used to code in Python/JS.)
#include <iostream>
#include <cstdlib>
#include <string>
std::string chars = "abcdefghijklmnopqrstuvwxyz";
std::string iterateStr(std::string s, std::string guess, int pos);
std::string crack(std::string s);
std::string iterateChar(std::string s, std::string guess, int pos);
int main() {
crack("bb");
return EXIT_SUCCESS;
}
// this function iterates through the letters of the alphabet
std::string iterateChar(std::string s, std::string guess, int pos) {
for(int i = 0; i < chars.length(); i++) {
// sets the char to a certain letter from the chars variable
guess[pos] = chars[i];
// if the position reaches the end of the string
if(pos == s.length()) {
if(guess.compare(s) == 0) {
break;
}
} else {
// else, recursively call the function
std::cout << guess << " : " << s << std::endl;
iterateChar(s, guess, pos+1);
}
}
return guess;
}
// this function iterates through the characters in the string
std::string iterateStr(std::string s, std::string guess, int pos) {
for(int i = 0; i < s.length(); i++) {
guess = iterateChar(s, guess, i);
if(s.compare(guess) == 0) {
return guess;
}
}
return guess;
}
std::string crack(std::string s) {
int len = s.length();
std::string newS(len, 'a');
std::string newGuess;
newGuess = iterateStr(s, newS, 0);
return newGuess;
}
Edit : Updated code.
The main flaw in the posted code is that the recursive function returns a string (the guessed password) without a clear indication for the caller that the password was found.
Passing all the strings around by value is also a potential efficiency problem, but the OP should be more worried by snippets like this:
guess[pos] = chars[i]; // 'chars' contains the alphabet
if(pos == s.length()) {
if(guess.compare(s) == 0) {
break;
}
}
Where guess and s are strings of the same length. If that length is 2 (OP's last example), guess[2] is outside the bounds, but the successive call to guess.compare(s) will compare only the two chars "inside".
The loop inside iterateStr does nothing useful either, and the pos parameter is unused.
Rather than fixing this attempt, it may be better to rewrite it from scratch:
#include <iostream>
#include <string>
#include <utility>
// Sets up the variables and starts the brute force search
template <class Predicate>
auto crack(std::string const &src, size_t length, Predicate is_correct)
-> std::pair<bool, std::string>;
// Implements the brute force search in a single recursive function. It uses a
// lambda to check the password, instead of passing it directly
template <class Predicate>
bool recursive_search(std::string const &src, std::string &guess, size_t pos,
Predicate is_correct);
// Helper function, for testing purposes
void test_cracker(std::string const &alphabet, std::string const &password);
int main()
{
test_cracker("abcdefghijklmnopqrstuvwxyz", "dance");
test_cracker("abcdefghijklmnopqrstuvwxyz ", "go on");
test_cracker("0123456789", "42");
test_cracker("0123456789", "one"); // <- 'Password not found.'
}
void test_cracker(std::string const &alphabet, std::string const &password)
{
auto [found, pwd] = crack(alphabet, password.length(),
[&password] (std::string const &guess) { return guess == password; });
std::cout << (found ? pwd : "Password not found.") << '\n';
}
// Brute force recursive search
template <class Predicate>
bool recursive_search(std::string const &src, std::string &guess, size_t pos,
Predicate is_correct)
{
if ( pos + 1 == guess.size() )
{
for (auto const ch : src)
{
guess[pos] = ch;
if ( is_correct(guess) )
return true;
}
}
else
{
for (auto const ch : src)
{
guess[pos] = ch;
if ( recursive_search(src, guess, pos + 1, is_correct) )
return true;
}
}
return false;
}
template <class Predicate>
auto crack(std::string const &src, size_t length, Predicate is_correct)
-> std::pair<bool, std::string>
{
if ( src.empty() )
return { length == 0 && is_correct(src), src };
std::string guess(length, src[0]);
return { recursive_search(src, guess, 0, is_correct), guess };
}
I've tried your code even with the modified version of your iterateStr() function. I used the word abduct as it is quicker to search for. When stepping through the debugger I noticed that your iterateChar() function was not returning when a match was found. Also I noticed that the length of string s being passed in was 6 however the guess string that is being updated on each iteration had a length of 7. You might want to step through your code and check this out.
For example, at one specific iteration the s string contains abduct but the guess string contains aaaabjz; on the next iteration the guess string contains aaaabkz. This might be the reason why the loop or function continues even when you think a match is found.
The difference in lengths here could be your culprit.
Also when stepping through your modified code:
for ( size_t i = 0; i < s.length(); i++ ) {
guess = iterCh( s, guess, i );
std::cout << "in the iterStr loop\n";
if ( guess.compare( s ) == 0 ) {
return guess;
}
}
return guess;
in your iterateStr() function, the loop always calls guess = iterCh( s, guess, i ); and the code never prints "in the iterStr loop". Your iterateChar function runs through the entire sequence of characters without ever finding and returning a match. I even tried the word abs, as it is easier and quicker to step through in the debugger, and I'm getting the same kind of results.

How to sort file names with numbers and alphabets in order in C?

I have used the following code to sort files in alphabetical order and it sorts the files as shown in the figure:
for(int i = 0;i < maxcnt;i++)
{
for(int j = i+1;j < maxcnt;j++)
{
if(strcmp(Array[i],Array[j]) > 0)
{
strcpy(temp,Array[i]);
strcpy(Array[i],Array[j]);
strcpy(Array[j],temp);
}
}
}
But I need to sort it as order seen in Windows explorer
How to sort like this way? Please help
For a C answer, the following is a replacement for strcasecmp(). This function recurses to handle strings that contain alternating numeric and non-numeric substrings. You can use it with qsort():
int strcasecmp_withNumbers(const void *void_a, const void *void_b) {
const char *a = void_a;
const char *b = void_b;
if (!a || !b) { // if one doesn't exist, other wins by default
return a ? 1 : b ? -1 : 0;
}
if (isdigit(*a) && isdigit(*b)) { // if both start with numbers
char *remainderA;
char *remainderB;
long valA = strtol(a, &remainderA, 10);
long valB = strtol(b, &remainderB, 10);
if (valA != valB)
return (valA > valB) - (valA < valB); // -1 or 1; avoids overflow when narrowing long to int
// if you wish 7 == 007, comment out the next two lines
else if (remainderB - b != remainderA - a) // equal with diff lengths
return (remainderB - b) - (remainderA - a); // set 007 before 7
else // if numerical parts equal, recurse
return strcasecmp_withNumbers(remainderA, remainderB);
}
if (isdigit(*a) || isdigit(*b)) { // if just one is a number
return isdigit(*a) ? -1 : 1; // numbers always come first
}
while (*a && *b) { // non-numeric characters
if (isdigit(*a) || isdigit(*b))
return strcasecmp_withNumbers(a, b); // recurse
if (tolower(*a) != tolower(*b))
return tolower(*a) - tolower(*b);
a++;
b++;
}
return *a ? 1 : *b ? -1 : 0;
}
Notes:
Windows needs stricmp() rather than the Unix equivalent strcasecmp().
The above code will (obviously) give incorrect results if the numbers are really big.
Leading zeros are ignored here. In my area, this is a feature, not a bug: we usually want UAL0123 to match UAL123. But this may or may not be what you require.
See also Sort on a string that may contain a number and How to implement a natural sort algorithm in c++?, although the answers there, or in their links, are certainly long and rambling compared with the above code, by about a factor of at least four.
Natural sorting is the approach to take here. I have working code for my scenario; you can probably adapt it to your needs:
#ifndef JSW_NATURAL_COMPARE
#define JSW_NATURAL_COMPARE
#include <string>
int natural_compare(const char *a, const char *b);
int natural_compare(const std::string& a, const std::string& b);
#endif
#include <cctype>
namespace {
// Note: This is a convenience for the natural_compare
// function, it is *not* designed for general use
class int_span {
int _ws;
int _zeros;
const char *_value;
const char *_end;
public:
int_span(const char *src)
{
const char *start = src;
// Save and skip leading whitespace
while (std::isspace(*(unsigned char*)src)) ++src;
_ws = src - start;
// Save and skip leading zeros
start = src;
while (*src == '0') ++src;
_zeros = src - start;
// Save the edges of the value
_value = src;
while (std::isdigit(*(unsigned char*)src)) ++src;
_end = src;
}
bool is_int() const { return _value != _end; }
const char *value() const { return _value; }
int whitespace() const { return _ws; }
int zeros() const { return _zeros; }
int digits() const { return _end - _value; }
int non_value() const { return whitespace() + zeros(); }
};
inline int safe_compare(int a, int b)
{
return a < b ? -1 : a > b;
}
}
int natural_compare(const char *a, const char *b)
{
int cmp = 0;
while (cmp == 0 && *a != '\0' && *b != '\0') {
int_span lhs(a), rhs(b);
if (lhs.is_int() && rhs.is_int()) {
if (lhs.digits() != rhs.digits()) {
// For differing widths (excluding leading characters),
// the value with fewer digits takes priority
cmp = safe_compare(lhs.digits(), rhs.digits());
}
else {
int digits = lhs.digits();
a = lhs.value();
b = rhs.value();
// For matching widths (excluding leading characters),
// search from MSD to LSD for the larger value
while (--digits >= 0 && cmp == 0)
cmp = safe_compare(*a++, *b++);
}
if (cmp == 0) {
// If the values are equal, we need a tie
// breaker using leading whitespace and zeros
if (lhs.non_value() != rhs.non_value()) {
// For differing widths of combined whitespace and
// leading zeros, the smaller width takes priority
cmp = safe_compare(lhs.non_value(), rhs.non_value());
}
else {
// For matching widths of combined whitespace
// and leading zeros, more whitespace takes priority
cmp = safe_compare(rhs.whitespace(), lhs.whitespace());
}
}
}
else {
// No special logic unless both spans are integers
cmp = safe_compare(*a++, *b++);
}
}
// All else being equal so far, the shorter string takes priority
return cmp == 0 ? safe_compare(*a, *b) : cmp;
}
#include <string>
int natural_compare(const std::string& a, const std::string& b)
{
return natural_compare(a.c_str(), b.c_str());
}
What you want to do is perform "Natural Sort". Here is a blog post about it, explaining implementation in python I believe. Here is a perl module that accomplishes it. There also seems to be a similar question at How to implement a natural sort algorithm in c++?
Taking into account that this has a c++ tag, you could elaborate on @Joseph Quinsey's answer and create a natural_less function to be passed to the standard library.
using namespace std;
bool natural_less(const string& lhs, const string& rhs)
{
return strcasecmp_withNumbers(lhs.c_str(), rhs.c_str()) < 0;
}
void example(vector<string>& data)
{
std::sort(data.begin(), data.end(), natural_less);
}
I took the time to write some working code as an exercise
https://github.com/kennethlaskoski/natural_less
Modifying this answer:
bool compareNat(const std::string& a, const std::string& b){
if (a.empty())
return !b.empty(); // an empty string sorts first, but is not less than another empty string
if (b.empty())
return false;
if (std::isdigit(a[0]) && !std::isdigit(b[0]))
return true;
if (!std::isdigit(a[0]) && std::isdigit(b[0]))
return false;
if (!std::isdigit(a[0]) && !std::isdigit(b[0]))
{
if (a[0] == b[0])
return compareNat(a.substr(1), b.substr(1));
return (toUpper(a) < toUpper(b));
//toUpper() is a function to convert a std::string to uppercase.
}
// Both strings begin with digit --> parse both numbers
std::istringstream issa(a);
std::istringstream issb(b);
int ia, ib;
issa >> ia;
issb >> ib;
if (ia != ib)
return ia < ib;
// Numbers are the same --> remove numbers and recurse
std::string anew, bnew;
std::getline(issa, anew);
std::getline(issb, bnew);
return (compareNat(anew, bnew));
}
toUpper() function:
std::string toUpper(std::string s){
for(int i=0;i<(int)s.length();i++){s[i]=toupper(s[i]);}
return s;
}
Usage:
#include <iostream> // std::cout
#include <string>
#include <algorithm> // std::sort, std::copy
#include <iterator> // std::ostream_iterator
#include <sstream> // std::istringstream
#include <vector>
#include <cctype> // std::isdigit
int main()
{
std::vector<std::string> str;
str.push_back("20.txt");
str.push_back("10.txt");
str.push_back("1.txt");
str.push_back("z2.txt");
str.push_back("z10.txt");
str.push_back("z100.txt");
str.push_back("1_t.txt");
str.push_back("abc.txt");
str.push_back("Abc.txt");
str.push_back("bcd.txt");
std::sort(str.begin(), str.end(), compareNat);
std::copy(str.begin(), str.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Your problem is that you have an interpretation behind parts of the file name.
In lexicographical order, Slide1 is before Slide10 which is before Slide5.
You expect Slide5 before Slide10 as you have an interpretation of the substrings 5 and 10 (as integers).
You will run into more problems if you have the name of the month in the filename and expect the files to be ordered by date (i.e. January comes before August). You will need to adjust your sorting to this interpretation (and the "natural" order will depend on your interpretation; there is no generic solution).
Another approach is to format the filenames in a way that your sorting and the lexicographical order agree. In your case, you would use leading zeroes and a fixed length for the number. So Slide1 becomes Slide01, and then you will see that sorting them lexicographically will yield the result you would like to have.
However, often you cannot influence the output of an application, and thus cannot enforce your format directly.
What I do in those cases: write a little script/function that renames the file to a proper format, and then use standard sorting algorithms to sort them. The advantage of this is that you do not need to adapt your sorting, and can use existing software for the sorting.
On the downside, there are situations where this is not feasible (as filenames need to be fixed).
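The zero-padding idea above can also be applied as a sort key instead of an actual rename, which sidesteps the case where the filenames must stay fixed. The following is only a sketch: pad_numbers and sort_by_padded_name are hypothetical helpers, and the fixed width is an assumption the caller has to choose large enough.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns a copy of `name` in which every run of digits
// is left-padded with zeros to at least `width` characters
// ("Slide5" becomes "Slide05" for width 2).
std::string pad_numbers(const std::string& name, std::size_t width) {
    std::string out;
    std::size_t i = 0;
    while (i < name.size()) {
        if (std::isdigit(static_cast<unsigned char>(name[i]))) {
            std::size_t j = i;
            while (j < name.size() &&
                   std::isdigit(static_cast<unsigned char>(name[j]))) ++j;
            const std::size_t len = j - i;
            if (len < width) out.append(width - len, '0');
            out.append(name, i, len);   // the digits themselves
            i = j;
        } else {
            out += name[i++];
        }
    }
    return out;
}

// Sorts by the padded form; the names themselves are left unchanged.
void sort_by_padded_name(std::vector<std::string>& names, std::size_t width) {
    std::sort(names.begin(), names.end(),
              [width](const std::string& a, const std::string& b) {
                  return pad_numbers(a, width) < pad_numbers(b, width);
              });
}
```

Numbers with more digits than `width` are left as they are, so the ordering is only guaranteed for numbers that fit the chosen width.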

How to assign '-' is equal to ' ' for string class compare, is this possible?

I am comparing two strings, str1 and str2, using the string.compare function in #string. Is there a way to force the class to think that '-' is the equivalent of ' '. Looking at the member functions of char_traits I thought that the .assign would allow me to accomplish this but it is acting as if I am saying, str1='-'; or str1=' ';. I would prefer not to rewrite my own string handling class.
What about copying and replacing all occurrences of "-" with " " before comparing the two strings?
There is nothing in the library for such a specific use case, but it's easy to do it yourself:
Make a copy of both strings. In each, replace all '-' with ' '. Then perform the comparison on those strings;
Alternatively, make your own function that iterates through each character and performs a lexicographic comparison with the additional semantics you've described. This has the benefit of not requiring string copies, but is going to be more code and possibly more error-prone.
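The first option (copy, replace, compare) might be sketched as follows. The names fold_dashes and compare_folded are made up for this example.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns a copy of `s` with every '-' replaced by ' '.
std::string fold_dashes(std::string s) {
    std::replace(s.begin(), s.end(), '-', ' ');
    return s;
}

// Same return convention as std::string::compare (<0, 0, >0),
// applied to the folded copies.
int compare_folded(const std::string& a, const std::string& b) {
    return fold_dashes(a).compare(fold_dashes(b));
}
```

Taking the parameter of fold_dashes by value makes the copy explicit, so the caller's strings are never modified.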
There are several possibilities, depending whether you want this behavior:
once
encoded in the class
If you want the behavior once: simply use your own (custom) algorithm:
bool isSpace(char i) { return i == '-' or i == ' '; }
int compare(std::string const& left, std::string const& right) {
typedef std::string::const_iterator ConstIterator;
typedef std::pair<ConstIterator, ConstIterator> Result;
size_t const size = std::min(left.size(), right.size());
Result const r = std::mismatch(left.begin(),
left.begin() + size,
right.begin(),
[](char a, char b) {
return a == b or (isSpace(a) and isSpace(b));
});
if (r.first == left.begin() + size) { // equal up til the end, shorter wins
return left.size() < right.size() ? -1 :
(left.size() == right.size() ? 0 : 1);
}
// not equal until the end
return *r.first < *r.second ? -1 : 1;
}
If this behavior needs to be encoded within the class itself, you need to use basic_string and provide a custom traits class.
The traits class provides a static int compare ( const char_type* s1, const char_type* s2, size_t n); function that is used by std::string::compare under the hood.
So for example:
struct MyTraits: std::char_traits<char> // too lazy to reimplement everything
{
static int compare(const char_type* s1, const char_type* s2, size_t n);
// definition can be trivially derived from the above version
};
typedef std::basic_string<char, MyTraits> MyString;
Of course, MyString is then completely incompatible with other std::string.
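One way the traits class outlined above could be filled in is shown below. This is a sketch, not the answer's own definition: DashTraits, DashString and fold are names made up here, and eq, lt and find are overridden alongside compare so that all comparisons agree.

```cpp
#include <cstddef>
#include <string>

// Sketch of a traits class that treats '-' as ' ' in every comparison.
struct DashTraits : std::char_traits<char> {
    static char fold(char c) { return c == '-' ? ' ' : c; }

    static bool eq(char a, char b) { return fold(a) == fold(b); }
    static bool lt(char a, char b) {
        return static_cast<unsigned char>(fold(a)) <
               static_cast<unsigned char>(fold(b));
    }

    // Used by basic_string::compare and the relational operators.
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }

    static const char* find(const char* s, std::size_t n, const char& c) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], c)) return s + i;
        }
        return nullptr;
    }
};

using DashString = std::basic_string<char, DashTraits>;
```

With this, DashString("well-known") == DashString("well known") holds, while the stored characters stay exactly as they were written.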
Frankly, if you can, simply "normalize" your string and decide whether you'll use '-' or ' '. It will make your life easier.
A simple way is to fold the strings, i.e. simply replace all "-" with " " before comparing them.
The standard C++ library provide powerful algorithms for this. It seems you want to use std::mismatch() together with a custom predicate considering '-' and ' ' to be identical. This would look something like this:
bool pred(char c0, char c1) {
return c0 == c1
|| (c0 == '-' && c1 == ' ')
|| (c0 == ' ' && c1 == '-');
}
std::string const& s(s0.size() < s1.size()? s0: s1);
std::string const& l(s0.size() < s1.size()? s1: s0);
auto p = std::mismatch(s.begin(), s.end(), l.begin(), pred);
Following this, p is a pair of iterators pointing at the first chars that differ (or at the end iterators). To determine which string sorts before the other, you'd just evaluate the result.
The interface is a bit annoying in that the shorter sequence needs to come first: there should be constraints for both ends.
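Evaluating the result could look roughly like the sketch below. It assumes C++14 or later, where std::mismatch has an overload taking both end iterators, so neither string has to be the shorter one. The function names are made up for this example.

```cpp
#include <algorithm>
#include <string>

// Predicate as above: '-' and ' ' count as the same character.
bool same_char(char c0, char c1) {
    return c0 == c1
        || (c0 == '-' && c1 == ' ')
        || (c0 == ' ' && c1 == '-');
}

// Returns <0, 0 or >0, like std::string::compare.
int compare_dash_as_space(const std::string& s0, const std::string& s1) {
    auto p = std::mismatch(s0.begin(), s0.end(), s1.begin(), s1.end(), same_char);
    const bool end0 = (p.first == s0.end());
    const bool end1 = (p.second == s1.end());
    if (end0 && end1) return 0;   // same length, no difference found
    if (end0) return -1;          // s0 is a proper prefix of s1
    if (end1) return 1;           // s1 is a proper prefix of s0
    // Both iterators point at the first differing pair; fold '-' to ' ' so
    // the ordering is consistent with the equality used above.
    const char a = (*p.first == '-') ? ' ' : *p.first;
    const char b = (*p.second == '-') ? ' ' : *p.second;
    return static_cast<unsigned char>(a) < static_cast<unsigned char>(b) ? -1 : 1;
}
```

Folding before the final comparison matters: without it, "a-c" and "a b" would be ordered by '-' versus ' ' in one place and treated as equal in another.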
(community-wiki, feel free to contribute.)
You can define a new type of string, based on a new type of char. The following compiles and seems to run, but maybe there are some things missing or wrong. The goal is to define a string class where hyphens are automatically converted to spaces.
#include<iostream>
using namespace std;
struct newchar {
char c;
bool operator <(const newchar &other) const {
return this->c < other.c;
}
newchar(char c_): c(c_) {
fixme();
}
newchar(): c('\0') {}
newchar & operator = (const newchar &in) {
this->c = in.c;
fixme();
return *this;
}
void fixme() {
if(c=='-')
c = ' ';
}
};
struct newstring : basic_string<newchar> {
string toCharString() const {
std :: string s;
for(const_iterator i = this->begin(); i != this->end(); i++) {
char c = i->c;
s += c;
}
return s;
}
};
ostream& operator<< (ostream & os, const newstring &ns) {
os << ns.toCharString();
return os;
}
int main() {
newstring s;
s.compare(s);
s += 'k';
cout << '<' << string() << '>' << endl;
cout << '<' << s << '>' << endl;
}