C++ string diff (a la Python's difflib)


I'm trying to diff two strings to determine whether or not they differ only in a single numeric field of the string; for example,
varies_in_single_number_field('foo7bar', 'foo123bar')
# Returns True, because 7 != 123, and there's only one varying
# number region between the two strings.
In Python I can use difflib to accomplish this:
import difflib, doctest

def varies_in_single_number_field(str1, str2):
    """
    A typical use case is as follows:

    >>> varies_in_single_number_field('foo7bar00', 'foo123bar00')
    True

    Numerical variation in two dimensions is no good:

    >>> varies_in_single_number_field('foo7bar00', 'foo123bar01')
    False

    Varying in a nonexistent field is okay:

    >>> varies_in_single_number_field('foobar00', 'foo123bar00')
    True

    Identical strings don't *vary* in any number field:

    >>> varies_in_single_number_field('foobar00', 'foobar00')
    False
    """
    in_differing_substring = False
    passed_differing_substring = False # There should be only one.
    differ = difflib.Differ()
    for letter_diff in differ.compare(str1, str2):
        letter = letter_diff[2:]
        if letter_diff.startswith(('-', '+')):
            if passed_differing_substring: # Already saw a varying field.
                return False
            in_differing_substring = True
            if not letter.isdigit(): return False # Non-digit diff character.
        elif in_differing_substring: # Diff character not found - end of diff.
            in_differing_substring = False
            passed_differing_substring = True
    return passed_differing_substring # No variation if no diff was passed.

if __name__ == '__main__': doctest.testmod()
But I have no idea how to find something like difflib for C++. Alternative approaches welcome. :)

This might work; it at least passes your demonstration tests:
EDIT: I've made some modifications to deal with some string indexing issues. I believe it should be good now.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cctype>
bool starts_with(const std::string &s1, const std::string &s2) {
return (s1.length() <= s2.length()) && (s2.substr(0, s1.length()) == s1);
}
bool ends_with(const std::string &s1, const std::string &s2) {
return (s1.length() <= s2.length()) && (s2.substr(s2.length() - s1.length()) == s1);
}
bool is_numeric(const std::string &s) {
for(std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
if(!std::isdigit(static_cast<unsigned char>(*it))) { // cast: isdigit is undefined for negative char values
return false;
}
}
return true;
}
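bool varies_in_single_number_field(std::string s1, std::string s2) {
if(s1 == s2) {
return false;
}
if((s1.empty() && is_numeric(s2)) || (s2.empty() && is_numeric(s1))) {
return true;
}
// Make s1 the longer string before computing index2; computing it first can
// leave index2 past the end of s1 (e.g. when s1 was empty before the swap).
if(s1.length() < s2.length()) {
s1.swap(s2);
}
size_t index1 = 0;
size_t index2 = s1.length() - 1;
// index1 grows past the common prefix, index2 shrinks past the common suffix.
while(index1 < s1.length() && starts_with(s1.substr(0, index1), s2)) { index1++; }
while(ends_with(s1.substr(index2), s2)) { index2--; }
// What lies between the prefix and the suffix must be digits only.
return is_numeric(s1.substr(index1 - 1, (index2 + 1) - (index1 - 1)));
}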
int main() {
std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar00") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar01") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foo123bar00") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foobar00") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("7aaa", "aaa") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("aaa7", "aaa") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("aaa", "7aaa") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("aaa", "aaa7") << std::endl;
}
Basically, it splits the longer string into three parts: part1 is a prefix of the shorter string, part3 is a suffix of the shorter string, and part2 (whatever remains in the middle) must consist only of digits.

It's probably a bit of overkill, but you could use Boost.Python to interface with Python. At worst, difflib is implemented in pure Python and is not very long, so it should be possible to port it from Python to C++.

You could take an ad hoc approach: you're looking to match strings s and s', where s = abc and s' = ab'c, and b and b' should be two distinct numbers (possibly empty). So:
Compare the strings from the left, character by character, until you hit different characters, and then stop.
Similarly, compare the strings from the right until you hit different characters OR reach that left marker.
Then check the remainders in the middle to see if they're both numbers.
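The three steps above could be sketched in C++ roughly as follows. This is a minimal sketch, assuming (as in the Python doctests) that an empty middle counts as a missing number field; the helper name all_digits is hypothetical. Because the prefix and suffix are shared, two different strings always leave two different middles, so "distinct" needs no separate check.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if every character in s[first, last) is a digit.
// An empty range counts as "all digits" (the field may be missing).
static bool all_digits(const std::string& s, std::size_t first, std::size_t last)
{
    for (std::size_t i = first; i < last; ++i)
        if (!std::isdigit(static_cast<unsigned char>(s[i])))
            return false;
    return true;
}

// s1 = a b c and s2 = a b' c, where b and b' must be digit runs (possibly empty).
bool varies_in_single_number_field(const std::string& s1, const std::string& s2)
{
    if (s1 == s2)
        return false;  // identical strings do not vary in any field

    // Step 1: longest common prefix, scanning from the left.
    std::size_t prefix = 0;
    while (prefix < s1.size() && prefix < s2.size() && s1[prefix] == s2[prefix])
        ++prefix;

    // Step 2: longest common suffix, scanning from the right, never
    // running back into the prefix found in step 1.
    std::size_t suffix = 0;
    while (suffix < s1.size() - prefix && suffix < s2.size() - prefix &&
           s1[s1.size() - 1 - suffix] == s2[s2.size() - 1 - suffix])
        ++suffix;

    // Step 3: both remaining middles must consist only of digits.
    return all_digits(s1, prefix, s1.size() - suffix) &&
           all_digits(s2, prefix, s2.size() - suffix);
}
```

This runs in O(n) and does no allocation; unlike a real diff, it only looks at one contiguous differing region.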

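How about using something like boost::regex?
#include <string>
#include <boost/regex.hpp>
// Only handles strings of the fixed shape "foo<digits>bar".
bool match_except_numbers(const std::string& s1, const std::string& s2)
{
static const boost::regex fooNumberBar("foo\\d+bar");
// regex_match succeeds only if the entire string matches the pattern.
return boost::regex_match(s1, fooNumberBar) && boost::regex_match(s2, fooNumberBar);
}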
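If Boost is not available, the same idea can be expressed with the standard <regex> header (C++11). This sketch keeps the hypothetical fixed pattern "foo<digits>bar", so it only checks that both strings have that shape; it does not check that the two numbers differ.

```cpp
#include <regex>
#include <string>

// Same check as the Boost version, using std::regex instead.
bool match_except_numbers(const std::string& s1, const std::string& s2)
{
    // \d+ requires at least one digit between "foo" and "bar".
    static const std::regex fooNumberBar("foo\\d+bar");
    // std::regex_match only succeeds if the whole string matches.
    return std::regex_match(s1, fooNumberBar) && std::regex_match(s2, fooNumberBar);
}
```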

@Evan Teran: looks like we did this in parallel. I have a markedly less readable O(n) implementation:
#include <cassert>
#include <cctype>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
ostringstream debug;
const bool DEBUG = true;
bool varies_in_single_number_field(const string &str1, const string &str2) {
bool in_difference = false;
bool passed_difference = false;
string str1_digits, str2_digits;
size_t str1_iter = 0, str2_iter = 0;
while (str1_iter < str1.size() && str2_iter < str2.size()) {
const char &str1_char = str1.at(str1_iter);
const char &str2_char = str2.at(str2_iter);
debug << "str1: " << str1_char << "; str2: " << str2_char << endl;
if (str1_char == str2_char) {
if (in_difference) {
in_difference = false;
passed_difference = true;
}
++str1_iter, ++str2_iter;
continue;
}
in_difference = true;
if (passed_difference) { /* Already passed a difference. */
debug << "Already passed a difference." << endl;
return false;
}
bool str1_char_is_digit = isdigit(static_cast<unsigned char>(str1_char));
bool str2_char_is_digit = isdigit(static_cast<unsigned char>(str2_char));
if (str1_char_is_digit && !str2_char_is_digit) {
++str1_iter;
str1_digits.push_back(str1_char);
} else if (!str1_char_is_digit && str2_char_is_digit) {
++str2_iter;
str2_digits.push_back(str2_char);
} else if (str1_char_is_digit && str2_char_is_digit) {
++str1_iter, ++str2_iter;
str1_digits.push_back(str1_char);
str2_digits.push_back(str2_char);
} else { /* Both are non-digits and they're different. */
return false;
}
}
if (in_difference) {
in_difference = false;
passed_difference = true;
}
string str1_remainder = str1.substr(str1_iter);
string str2_remainder = str2.substr(str2_iter);
debug << "Got to exit point; passed difference: " << passed_difference
<< "; str1 digits: " << str1_digits
<< "; str2 digits: " << str2_digits
<< "; str1 remainder: " << str1_remainder
<< "; str2 remainder: " << str2_remainder
<< endl;
return passed_difference
&& (str1_digits != str2_digits)
&& (str1_remainder == str2_remainder);
}
int main() {
assert(varies_in_single_number_field("foo7bar00", "foo123bar00") == true);
assert(varies_in_single_number_field("foo7bar00", "foo123bar01") == false);
assert(varies_in_single_number_field("foobar00", "foo123bar00") == true);
assert(varies_in_single_number_field("foobar00", "foobar00") == false);
assert(varies_in_single_number_field("foobar00", "foobaz00") == false);
assert(varies_in_single_number_field("foo00bar", "foo01barz") == false);
assert(varies_in_single_number_field("foo01barz", "foo00bar") == false);
if (DEBUG) {
cout << debug.str();
}
return 0;
}

Related

C++ Input to only accept operator symbols or only numbers

I've been trying to code this as sort of a beginner program for C++ to help me understand the language more but I don't really know how to tackle this.
Basically the program needs to only accept '+', '*', '/', '%' or numbers, otherwise it will say the input is invalid.
Here's what I've done so far:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string x;
cout<<"Enter expression: ";
getline(cin,x);
if(x.find('*') != string::npos){
cout<<"Multiplication";
}else if(x.find('/') != string::npos){
cout<<"Division";
}else if(x.find('%') != string::npos){
cout<<"Modulo";
}else if(x.find('+') != string::npos){
cout<<"Addition";
}else{
cout<<"Invalid!";
}
return 0;
}

Definition of the Valid Input
Here I assume that valid input is defined by the following natural rules. First of all, as you mentioned,
Each input must be constructed from *, /, %, + and integers.
At most one operation is allowed (so 1+1 is valid).
For an integer input, I also assume:
Whitespace characters are allowed in the left and right side of the input string.
Whitespace characters between non-whitespace characters are not allowed.
The first non-whitespace character must be 0, 1, ..., 9 or - (for negative integers).
The second and the subsequent non-whitespace characters must be 0, 1, ..., 8 or 9.
Note that in my assumption the positive sign character +, decimal-point character . are not allowed for integer inputs.
For instance, in this definition,
"123", " 123", "123 " and " -123 " are all valid integer inputs.
"abc", "123a", " 1 23", "+123" and "1.0" are all invalid ones.
Validity Check Function for An Integer
First, to check the validity of an integer input, we trim it, removing whitespace on the left and right, with the following trimming function. (If you can use C++17, std::string_view would be preferable from a performance point of view.)
#include <string>
std::string trimLR(const std::string& str)
{
const auto strBegin = str.find_first_not_of(" \f\n\r\t\v");
if (strBegin == std::string::npos){
return "";
}
const auto strEnd = str.find_last_not_of(" \f\n\r\t\v");
const auto strRange = strEnd - strBegin + 1;
return str.substr(strBegin, strRange);
}
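The C++17 std::string_view variant mentioned above might look like the following sketch. The name trimLRView is hypothetical; the returned view points into the caller's string, so it is only valid while that string is alive and unmodified.

```cpp
#include <string_view>

// Returns a view of str without leading/trailing whitespace; no allocation.
std::string_view trimLRView(std::string_view str)
{
    constexpr std::string_view ws = " \f\n\r\t\v";
    const auto strBegin = str.find_first_not_of(ws);
    if (strBegin == std::string_view::npos) {
        return {};  // empty or whitespace-only input
    }
    const auto strEnd = str.find_last_not_of(ws);
    return str.substr(strBegin, strEnd - strBegin + 1);
}
```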
Next, we define the following simple validity check function isInteger, which checks whether the passed string is an integer. Here std::isdigit is useful for checking whether each character is a digit.
Please note that various other interesting methods have been proposed in past
posts.
#include <string>
#include <algorithm>
#include <cctype>
bool isInteger(const std::string& s)
{
const auto ts = trimLR(s);
if(ts.empty()){
return false;
}
const std::size_t offset = (ts[0] == '-') ? 1 : 0;
const auto begin = ts.cbegin() + offset;
return (begin != ts.cend()) // false if s is just a negative sign "-"
&& std::all_of(begin, ts.cend(), [](unsigned char c){
return std::isdigit(c);
});
}
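A few tests built from the valid and invalid examples listed in the definition above could look like this. trimLR and isInteger are repeated so the block compiles on its own; the helper isIntegerMatchesExamples is hypothetical, and the extra inputs "-" and "" come from the two early-return cases in isInteger.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

std::string trimLR(const std::string& str)
{
    const auto strBegin = str.find_first_not_of(" \f\n\r\t\v");
    if (strBegin == std::string::npos) {
        return "";
    }
    const auto strEnd = str.find_last_not_of(" \f\n\r\t\v");
    return str.substr(strBegin, strEnd - strBegin + 1);
}

bool isInteger(const std::string& s)
{
    const auto ts = trimLR(s);
    if (ts.empty()) {
        return false;
    }
    const std::size_t offset = (ts[0] == '-') ? 1 : 0;
    const auto begin = ts.cbegin() + offset;
    return (begin != ts.cend())
        && std::all_of(begin, ts.cend(), [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Returns true if isInteger agrees with every example from the definition.
bool isIntegerMatchesExamples()
{
    const std::vector<std::string> valid = {"123", " 123", "123 ", " -123 "};
    const std::vector<std::string> invalid = {"abc", "123a", " 1 23", "+123", "1.0", "-", ""};
    for (const auto& s : valid)
        if (!isInteger(s)) return false;
    for (const auto& s : invalid)
        if (isInteger(s)) return false;
    return true;
}
```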
Main Function
Now it is easy and straightforward to implement the main function.
The following code checks the input according to the definition above.
The next steps would be writing tests and tuning performance.
#include <iostream>
int main()
{
std::string x;
std::cout << "Enter expression: ";
std::getline(std::cin, x);
const auto optPos = x.find_first_of("*/%+");
if (optPos == std::string::npos)
{
if(isInteger(x)){
std::cout << "Valid input, " << x;
}
else{
std::cout << "Invalid input, " << x;
}
return 0;
}
const auto left = x.substr(0, optPos);
const auto opt = x.substr(optPos, 1);
const auto right = x.substr(std::min(optPos+1, x.length()-1));
if (!isInteger(left) || !isInteger(right))
{
std::cout
<< "Either `" << left << "`, `" << right
<< "` or both are invalid inputs." << std::endl;
return 0;
}
const auto leftVal = std::stod(left);
const auto rightVal = std::stod(right);
if(opt == "*")
{
std::cout
<< "Multiplication: "
<< x << " = " << (leftVal * rightVal);
}
else if(opt == "/")
{
std::cout
<< "Division: "
<< x << " = " << (leftVal / rightVal);
}
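else if(opt == "%")
{
// Integer modulo by zero is undefined behavior, so reject it explicitly.
if(std::stoi(right) == 0)
{
std::cout << "Modulo by zero is undefined.";
}
else
{
std::cout
<< "Modulo: "
<< x << " = " << (std::stoi(left) % std::stoi(right));
}
}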
else if(opt == "+")
{
std::cout
<< "Addition: "
<< x << " = " << (leftVal + rightVal);
}
return 0;
}

how to use + as a character in c++ string

I am creating a bool function to test the validity of an input string. One of the specs requires testing the placement of a + symbol. However, when I try to search for '+' within the string, nothing is found. I am thinking this is because + is an operator? I have also tried using '+' and creating a substring at this location, with no success.
Simplified version of the code:
bool isValidString(string s)
{
size_t found1 = s.find_first_not_of("123456789"); //string should only contain numbers, B, and +
if ( s[found1] == 'B' ) {
found1++;
if (s[found1] == '+')
return true;
else
return false; }
return false;
}

If I may guess at your intent ...
Here are 3 ways to detect 'B+' in your strings, but the 3rd does not meet your requirements.
#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
// returns true when "B+" found in s
bool isValidString1 (std::string s)
{
bool retVal = false;
size_t found1 = s.find_first_not_of("123456789"); //string should only contain numbers, B, and +
if ( found1 != std::string::npos && s[found1] == 'B' ) // npos: digits only, nothing to index
{
found1++;
retVal = (s[found1] == '+'); // s[s.size()] is '\0' since C++11, so this is safe
}
return retVal;
}
bool isValidString2 (std::string s)
{
size_t found1 = s.find_first_not_of("123456789"); //string should only contain numbers, B, and +
if (found1 == std::string::npos) return false; // digits only: no "B+" present
bool retVal = false;
switch (s[found1])
{
case 'B': retVal = ('+' == s[found1+1]); break;
case '+': /* tbd - what to do if out of order */ break;
default : /* tbd - what to do if not allowed */ break;
}
return (retVal);
}
// simple, but does not reject the rest of non-digits
bool isValidString3 (std::string s)
{
size_t indx = s.find("B+");
return (indx != std::string::npos);
}
void test(std::string s)
{
std::cout << "\n 1 s: '" << s << "' "
<< (isValidString1(s) ? "valid" : "invalid");
std::cout << "\n 2 s: '" << s << "' "
<< (isValidString2(s) ? "valid" : "invalid");
std::cout << "\n 3 s: '" << s << "' "
<< (isValidString3(s) ? "valid" : "invalid") << std::endl;
}
int main(int , char** )
{
std::string s10 = "1234B+56789";
test(s10);
std::string s11 = "1234+B+5678";
test(s11);
std::string s12 = "B+12345678";
test(s12);
std::string s13 = "12345678B+";
test(s13);
std::string s14 = "12345678+B";
test(s14);
}
Output looks like:
1 s: '1234B+56789' valid
2 s: '1234B+56789' valid
3 s: '1234B+56789' valid
1 s: '1234+B+5678' invalid
2 s: '1234+B+5678' invalid
3 s: '1234+B+5678' valid
1 s: 'B+12345678' valid
2 s: 'B+12345678' valid
3 s: 'B+12345678' valid
1 s: '12345678B+' valid
2 s: '12345678B+' valid
3 s: '12345678B+' valid
1 s: '12345678+B' invalid
2 s: '12345678+B' invalid
3 s: '12345678+B' invalid
It appears that the following (output from isValidString3()) is actually not what you want:
3 s: '1234+B+5678' valid
I find no issue with '+'; it is simply another char in a std::string, and no escaping is needed. In this context, it is not an operator.
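If the intended rule is stricter than "contains B+", a sketch could compare '+' like any other char while rejecting everything else. The rule here is an assumption, since the question does not fully state the spec: the string may contain only digits and the pair "B+", so every 'B' must be immediately followed by '+', every '+' must immediately follow a 'B', and at least one "B+" must appear. The name isValidStringStrict is hypothetical.

```cpp
#include <cstddef>
#include <string>

// ASSUMED rule: only digits and "B+" pairs, with at least one "B+".
bool isValidStringStrict(const std::string& s)
{
    if (s.find_first_not_of("0123456789B+") != std::string::npos)
        return false;                    // some other character is present
    bool sawPair = false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == 'B') {
            if (i + 1 >= s.size() || s[i + 1] != '+')
                return false;            // 'B' not followed by '+'
            sawPair = true;
            ++i;                         // skip the '+' just checked
        } else if (s[i] == '+') {
            return false;                // '+' without a preceding 'B'
        }
    }
    return sawPair;
}
```

With the five test strings above, this agrees with isValidString1, including rejecting '1234+B+5678'.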

This seems to work:
#include <iostream>
#include <string>
using namespace std;
int main()
{
string str{ "Smith, where Jones + + \"+ +\", \"+ +\" +."
" \"+ +\" + + the examiners' approval." };
string substr{ '+' };
cout << "The string to be searched is:" << endl << str << endl;
size_t offset{};
size_t count{};
size_t increment{ substr.length() };
while (true)
{
offset = str.find(substr, offset);
if (string::npos == offset)
break;
offset += increment;
++count;
}
cout << " The string \"" << substr
<< "\" was found " << count << " times in the string above."
<< endl;
return 0;
}

C++: check if string is a valid integer using "strtol"

I have heard that I should use strtol instead of atoi due to its better error handling. I wanted to test out strtol by seeing if I could use this code to check if a string is an integer:
#include <iostream>
#include <cerrno>
#include <cstdlib>
#include <string>
using namespace std;
int main()
{
string testString = "ANYTHING";
cout << "testString = " << testString << endl;
int testInt = strtol(testString.c_str(),NULL,0);
cout << "errno = " << errno << endl;
if (errno > 0)
{
cout << "There was an error." << endl;
cout << "testInt = " << testInt << endl;
}
else
{
cout << "Success." << endl;
cout << "testInt = " << testInt << endl;
}
return 0;
}
I replaced ANYTHING with 5 and it worked perfectly:
testString = 5
errno = 0
Success.
testInt = 5
And when I try 2147483648 (the largest possible int + 1), it returns this:
testString = 2147483648
errno = 34
There was an error.
testInt = 2147483647
Fair enough. But when I try it with Hello world!, it incorrectly treats it as a valid int and returns 0:
testString = Hello world!
errno = 0
Success.
testInt = 0
Notes:
I am using Code::Blocks with GNU GCC Compiler on Windows
"Have g++ follow the C++11 ISO C++ language standard [-std=c++11]" is checked in "Compiler Flags".

According to the man page of strtol, you should define your function like this:
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>
bool isNumeric(const std::string& str) {
char *end;
errno = 0; // strtol only sets errno on error, so clear any stale value first
long val = std::strtol(str.c_str(), &end, 10);
if ((errno == ERANGE && (val == LONG_MAX || val == LONG_MIN)) || (errno != 0 && val == 0)) {
// if the converted value would fall out of the range of the result type.
return false;
}
if (end == str.c_str()) {
// No digits were found.
return false;
}
// check if the string was fully processed.
return *end == '\0';
}
In C++11, I prefer to use std::stol instead of std::strtol, like this:
#include <stdexcept>
#include <string>
bool isNumeric(const std::string& str) {
try {
size_t sz;
std::stol(str, &sz);
return sz == str.size();
} catch (const std::invalid_argument&) {
// if no conversion could be performed.
return false;
} catch (const std::out_of_range&) {
// if the converted value would fall out of the range of the result type.
return false;
}
}
std::stol calls std::strtol, but you work directly with std::string and the code is simpler.

strtol stops at the first non-digit character,
but if you read the man page http://man7.org/linux/man-pages/man3/strtol.3.html you can see:
If endptr is not NULL, strtol() stores the address of the first
invalid character in *endptr. If there were no digits at all,
strtol() stores the original value of nptr in *endptr (and returns
0). In particular, if *nptr is not '\0' but **endptr is '\0' on
return, the entire string is valid.
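That is:
string testString = "ANYTHING";
cout << "testString = " << testString << endl;
char *endptr;
int testInt = strtol(testString.c_str(),&endptr,0);
// endptr is a char*, so *endptr (not **endptr) is the first unparsed character;
// endptr == start of the string means no digits were found at all.
if(endptr == testString.c_str() || *endptr != '\0')
cout << "bad input";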

Avoid the C++11 solution with exceptions, because it is slower. Here is a fast C++11 version:
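#include <algorithm>
#include <cctype>
#include <string>
bool is_decimal(const std::string& s)
{
// unsigned char parameter: std::isdigit is undefined for negative char values
return !s.empty() && std::find_if(s.begin(), s.end(), [](unsigned char c){ return !std::isdigit(c); }) == s.end();
}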
If you are sure that your strings are never empty, you can delete !s.empty(). Otherwise keep it: without it an empty string would be reported as decimal, and !s.empty() (!(s.length()==0)) is cheaper than calling find_if (reference) on an empty string anyway.
Edit:
If you have to handle overflow, use the exception version above. Only if you cannot use exceptions, use this:
#include <string>
#include <sstream>
#include <limits>
template <class T>
bool is_decimal_and_fit(const std::string& s)
{
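// Range check only: operator>> also accepts e.g. "1.5" or "12abc", so combine
// this with is_decimal() if the string must be a plain decimal.
long double decimal = 0;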
return (!(std::istringstream(s) >> decimal).fail() && (decimal >= std::numeric_limits<T>::lowest()) && (decimal <= std::numeric_limits<T>::max()));
}
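If C++17 is available, std::from_chars from <charconv> is another exception-free option: it does not depend on the locale, reports overflow through std::errc, and tells you where parsing stopped. This is a sketch with a hypothetical function name; note that, unlike strtol, from_chars does not skip leading whitespace and does not accept a leading '+'.

```cpp
#include <charconv>
#include <string>
#include <system_error>

// True if the whole of s is a base-10 integer that fits in a long.
bool isLong(const std::string& s, long& out)
{
    const char* first = s.data();
    const char* last = s.data() + s.size();
    const auto [ptr, ec] = std::from_chars(first, last, out);
    // ec == std::errc() means a value was parsed without overflow;
    // ptr == last means no characters were left over.
    return ec == std::errc() && ptr == last;
}
```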

How to return a certain boolean value in a recursive function?

I want to make a recursive function that determines whether all of a string's characters are alphabetic. I just can't figure it out. Here's what I've done so far, but it doesn't work properly:
bool isAlphabetic(string s){
const char *c = s.c_str();
if ((!isalpha(c[0]))||(!isalpha(c[s.size()])))
{
return false;
}
else if (isalpha(c[0]))
{
isAlphabetic(c+1);
return true;
}
}
Can anyone suggest a correct way?

Leaving aside the many partial strings you'll create (consider passing in just the string and a starting index instead), the isalpha(c[s.size()]) check will always fail, since that's the \0 at the end of the string. You're also ignoring the result of the recursive calls.
bool isAlphabetic(string s){
if (s.size() < 1)
return true; // empty string contains no non-alphas
const char *c = s.c_str();
if (!isalpha(c[0]))
{
return false; // found a non-alpha, we're done.
}
else
{
return isAlphabetic(c+1); // good so far, try the rest of the string
}
}

Building on Paul's answer, here is a fixed implementation that won't copy any portion of the string. It accomplishes this by passing a reference to the string object and an index to the character to check; recursion simply adds 1 to this index to check the next character, and so on until the end of the string is found.
I have removed your call to c_str() since it isn't needed. string can be directly indexed.
bool isAlphabetic(string const & s, std::size_t startIndex = 0) {
// Terminating case: End of string reached. This means success.
if (startIndex == s.size()) {
return true;
}
// Failure case: Found a non-alphabetic character.
if (!isalpha(static_cast<unsigned char>(s[startIndex]))) {
return false;
}
// Recursive case: This character is alphabetic, so check the rest of the string.
return isAlphabetic(s, startIndex + 1);
}
Note that the empty string is considered alphabetic by this function. You can change this by changing return true to return !s.empty().

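Here is a working example. It returns false as soon as it finds a non-alphabetic character, and returns true only once every character has been checked:
#include <cctype>
#include <iostream>
#include <string>
using namespace std;
bool isAlphabetic(const string& s)
{
if( s.empty() )
{
return true; // every character has been checked
}
cout << "checking: " << s[0] << endl;
if( !isalpha(static_cast<unsigned char>(s[0])) )
{
return false; // found a non-alphabetic character
}
return isAlphabetic(s.substr(1)); // check the rest of the string
}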
int main()
{
string word0 = "test";
if( isAlphabetic(word0) )
{
cout << word0 << " is alphabetic" << endl;
}
else
{
cout << word0 << " is NOT alphabetic" << endl;
}
string word1 = "1234";
if( isAlphabetic(word1) )
{
cout << word1 << " is alphabetic" << endl;
}
else
{
cout << word1 << " is NOT alphabetic" << endl;
}
string word2 = "1234w";
if( isAlphabetic(word2) )
{
cout << word2 << " is alphabetic" << endl;
}
else
{
cout << word2 << " is NOT alphabetic" << endl;
}
return 0;
}

Getting the text of the last directory of a given string with the delimiters of /

Given a URL (which is a string) such as this:
www.testsite.com/pictures/banners/whatever/
I want to get the characters of the last directory in the URL (in this case "whatever", without the forward slashes). What would be the most efficient way to do this?
Thanks for any help

#include <iostream>
#include <string>
std::string getlastcomponent(std::string s) {
if (s.size() > 0 && s[s.size()-1] == '/')
s.resize(s.size() - 1);
size_t i = s.find_last_of('/');
return (i != s.npos) ? s.substr(i+1) : s;
}
int main() {
std::string s1 = "www.testsite.com/pictures/banners/whatever/";
std::string s2 = "www.testsite.com/pictures/banners/whatever";
std::string s3 = "whatever/";
std::string s4 = "whatever";
std::cout << getlastcomponent(s1) << '\n';
std::cout << getlastcomponent(s2) << '\n';
std::cout << getlastcomponent(s3) << '\n';
std::cout << getlastcomponent(s4) << '\n';
return 0;
}
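The function above strips only one trailing '/'. A variant that tolerates several trailing slashes (e.g. ".../whatever//") could first locate the last non-slash character. This is a sketch; the name lastPathComponent is hypothetical.

```cpp
#include <string>

// Returns the last '/'-separated component, ignoring any trailing slashes.
std::string lastPathComponent(const std::string& s)
{
    const auto end = s.find_last_not_of('/');
    if (end == std::string::npos)
        return "";                                  // empty or only slashes
    const auto slash = s.find_last_of('/', end);    // separator before it, if any
    const auto begin = (slash == std::string::npos) ? 0 : slash + 1;
    return s.substr(begin, end - begin + 1);
}
```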

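Get the length and push every letter starting from the end, for example:
#include <string>
// Collect the characters of the last path component, from the end backwards.
std::string lastComponentReversed(const std::string& s)
{
std::string reversed;
std::size_t x = s.length();
if (x > 0 && s[x - 1] == '/') --x; // skip a trailing '/'
while (x != 0 && s.at(x - 1) != '/') // stop at the previous '/'
{
reversed.push_back(s.at(x - 1));
--x;
}
return reversed;
}
Then you get "revetahw" instead of "whatever".
Then just reverse it with this function:
std::string ReverseString( const std::string& word )
{
std::string l_bla;
l_bla.reserve(word.size());
for ( std::string::size_type x = word.length ( ); x > 0; x-- )
{
l_bla += word.at ( x -1 );
}
return l_bla;
}
so you get "whatever".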

We Keep Coding
