LeetCode Regular Expression Matching Problem - C++

https://leetcode.com/problems/regular-expression-matching
I was doing this practice problem (cpp) and while faster solutions are in the comments, I would like to understand why my code isn't working. This fails with s = "mississippi" and p = "mis*is*p*.". Tracing through the code, I figured it would correctly remove the first two letters, then when seeing the s* it would go through the s in the string (two of them), then remove the i in both, remove all the s (again 2) then remove all the p's (which is none, because it's compared against the i in the first string, so it should not modify that string). Finally, the '.' would match with the first p and remove both. So the final string should be "pi" and return false when the length is compared to zero.
class Solution {
public:
bool isMatch(string s, string p) {
while (s.length() > 0){
if (p.length() == 0){
return false;
}else if (p.length() == 1){
return p.compare(s) == 0 || p.at(0) == '.';
}else{
if (p.at(1) == '*'){
char c = p.at(0);
p = p.substr(2);
if (c == '.'){
return true;
}
int spot = 0;
while(spot < s.length() && s.at(spot) == c){
spot++;
}
if (spot != 0){
s = s.substr(spot);
}
}else{
if (s.at(0) != p.at(0) && p.at(0) != '.'){
return false;
}
s = s.substr(1);
p = p.substr(1);
}
}
}
return s.length() == 0;
}
};

Your logic is faulty here
return p.compare(s) == 0 || p.at(0) == '.';
That should be
return p.compare(s) == 0 || (s.length() == 1 && p.at(0) == '.');
That took me five minutes to find, two minutes looking at the code without seeing the problem, and then three minutes using a debugger to track down the logic error. You really should learn to use a debugger, much more efficient than asking on SO.
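The one-line fix addresses the failing test, but the posted loop still returns true as soon as it sees ".*" (for example, "ab" against ".*c" is accepted). For comparison, here is a minimal sketch of a backtracking matcher, one common way to handle '*'. It is an illustration under my own assumptions, not the asker's or the answerer's code; matchFrom and isMatchSketch are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: does s[i..] match p[j..]?
static bool matchFrom(const std::string& s, std::size_t i,
                      const std::string& p, std::size_t j) {
    // Pattern exhausted: it is a match only if the text is exhausted too.
    if (j == p.size()) return i == s.size();

    // Does the current text character match the current pattern character?
    const bool first = i < s.size() && (p[j] == '.' || p[j] == s[i]);

    if (j + 1 < p.size() && p[j + 1] == '*') {
        // Either skip "x*" entirely, or consume one character and stay on "x*".
        return matchFrom(s, i, p, j + 2) ||
               (first && matchFrom(s, i + 1, p, j));
    }
    // Plain character or '.': both sides advance by one.
    return first && matchFrom(s, i + 1, p, j + 1);
}

// Hypothetical entry point with the same meaning as isMatch in the question.
bool isMatchSketch(const std::string& s, const std::string& p) {
    return matchFrom(s, 0, p, 0);
}
```

Because every "x*" tries both branches, the worst case is exponential; caching results per (i, j) pair is the usual remedy if that matters.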

String and character comparison

I am new to programming and I need to search any string to see if it includes only the letters a,b,c,d,e or f. The minute the program finds a letter that is not one of those the program should return false. Here is my function
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
return true;
}
else {
return false;
}
index++;
}
}
Thank you very much for any help! :)
The moment the return statement is encountered, the function is exited. This means that the moment any of the characters 'a', 'b', 'c', 'd', 'e', 'f' is encountered while iterating, due to the return statement the function will be exited immediately.
You can use std::string::find_first_not_of as shown below:
std::string input = "somearbitrarystring";
std::string validChars = "abcdef";
std::size_t found = input.find_first_not_of(validChars);
if(found != std::string::npos)
std::cout << "Found nonfavorite character " <<input[found]<<" at position "<<found<< std::endl;
else
{
std::cout<<"Only favorite characters found"<<std::endl;
}
If you unroll the loop by hand, you will spot the problem immediately:
if ((word[0] == 'a') || (word[0] == 'b') || (word[0] == 'c')||
(word[0] == 'd')|| (word[0] == 'e')|| (word[0] == 'f')) {
return true;
}
else {
return false;
}
if ((word[1] == 'a') || (word[1] == 'b') || (word[1] == 'c')||
(word[1] == 'd')|| (word[1] == 'e')|| (word[1] == 'f')) {
return true;
}
else {
return false;
}
//...
That is, the return value depends only on the first element.
"The minute the program finds a letter that is not one of those the program should return false" means
if ((word[0] != 'a') && (word[0] != 'b') && (word[0] != 'c') &&
(word[0] != 'd') && (word[0] != 'e') && (word[0] != 'f')) {
return false;
}
if ((word[1] != 'a') && (word[1] != 'b') && (word[1] != 'c') &&
(word[1] != 'd') && (word[1] != 'e') && (word[1] != 'f')) {
return false;
}
// ...
// After checking all the characters, you know that all of them were in
// your desired set, so you can return unconditionally.
return true;
or, with a loop:
while (index < length) {
if ((word[index] != 'a') && (word[index] != 'b') && (word[index] != 'c') &&
(word[index] != 'd') && (word[index] != 'e') && (word[index] != 'f')) {
return false;
}
index++;
}
return true;
bool is_favorite(string word){
return ( word.find_first_not_of( "abcdef" ) == std::string::npos );
}
It returns true if, and only if, there are only the characters 'a' through 'f' in the string. Any other character ends the search immediately.
And if you exchange string word with const string & word, your function will not have to create a copy of each word you pass to it, but work on a read-only reference to it, improving efficiency.
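A minimal sketch of that suggestion, combining the one-line find_first_not_of check with a const reference parameter. The name is_favorite_cref is hypothetical, chosen only to keep it apart from the versions above.

```cpp
#include <string>

// Takes the word by const reference, so no copy is made on each call.
bool is_favorite_cref(const std::string& word) {
    // npos means no character outside "abcdef" was found.
    return word.find_first_not_of("abcdef") == std::string::npos;
}
```

Note that an empty string also yields true here, since it contains no disallowed character.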
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if (word[index] > 'f' || word[index] < 'a')
return false;
index++;
}
return true;
}
The return true is logically in the wrong place in your code.
Your version returns true as soon as it finds one letter that is a through f. It's premature to conclude that the whole string is valid at that point, because there may yet be an invalid character later in the string.
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
return true; // This is premature.
}
else {
return false;
}
index++;
}
}
Minimal change that illustrates where the return true should be: after the loop. The return true is reached if and only if we did not detect any invalid characters in the loop.
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if ((word[index] == 'a') || (word[index] == 'b') || (word[index] == 'c')||
(word[index] == 'd')|| (word[index] == 'e')|| (word[index] == 'f')) {
// Do nothing here
}
else {
return false;
}
index++;
}
return true;
}
Obviously now that the affirmative block of the if is empty, you could refactor a little and only check for the negative condition. The logic of it should read closely to the way you described the problem in words:
"The minute the program finds a letter that is not one of those the program should return false."
bool is_favorite(string word){
int length = word.length(); // "word" is the string.
int index = 0;
while (index < length) {
if (!is_letter_a_through_f(word[index]))
return false;
index++;
}
return true;
}
I replaced your large logical check against many characters with a function in the above code to make it more readable. I trust you can write that function without difficulty. My own preference is to keep statements short so that they are readable, and so that when you read the code, you can hold the control-flow logic in your short-term memory without being overloaded by the mechanics of the letter comparison.
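The helper is_letter_a_through_f is not shown in the answer. One possible definition, as a sketch, is below; it assumes that the letters 'a' through 'f' are contiguous in the execution character set, which holds for ASCII.

```cpp
#include <string>

// Assumed definition of the helper named in the answer above.
bool is_letter_a_through_f(char ch) {
    return ch >= 'a' && ch <= 'f';
}

// Same control flow as the answer: fail on the first bad letter,
// succeed only after every letter has been checked.
bool is_favorite(const std::string& word) {
    for (char ch : word) {
        if (!is_letter_a_through_f(ch))
            return false;
    }
    return true;
}
```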

C++, How to escape special characters from argv without the user manually adding escape characters

I'm writing a program in C++ that can take several input arguments like so:
Edit: based on suggestions from comments
int main(int argc, char **argv) {
constants c;
for (int i=0; i<argc; i++) {
if ( (argv[i])[0] == '-') {
if ((argv[i])[1] == 'h'){
bHelp = true;
//spit out some help text here
}
else if ((argv[i])[1] == 'c' && (argv[i+1]) != nullptr){
c.host = argv[i+1];
}
else if ((argv[i])[1] == 'd' && (argv[i+1]) != nullptr){
c.databasename = argv[i+1];
}
else if ((argv[i])[1] == 'w' && (argv[i+1]) != nullptr){
c.password = argv[i+1];
}
else if ((argv[i])[1] == 'u' && (argv[i+1]) != nullptr){
c.username = argv[i+1];
}
else if ((argv[i])[1] == 'p' && (argv[i+1]) != nullptr){
c.port = argv[i+1];
}
}
}
if (bHelp) {exit(1);}
When run, the program seems to work properly; so far so good, I thought.
However, if any of the input following the flags contains special characters, for instance a '#', the program segfaults on start.
You can still make it work by manually escaping such characters on start, for example with "./app -u testuser -w \#fakepass".
I would rather not bother my end user with such things and would prefer to solve it in the code.
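A likely cause, assuming a shell such as bash: an unquoted '#' at the start of a word begins a comment, so "-w #fakepass" reaches the program as "-w" with no value after it. The program cannot recover text the shell has already discarded, so quoting ('#fakepass') or escaping on the command line remains necessary. What the code can do is report a missing value instead of crashing. The sketch below shows one way to do that; Options and parseOptions are hypothetical names, and the fields only mirror those used in the question.

```cpp
#include <string>

// Hypothetical holder; field names mirror the question's `constants` object.
struct Options {
    std::string host, databasename, password, username, port;
    bool help = false;
};

// Returns false and fills `error` when a flag has no value after it.
bool parseOptions(int argc, const char* const argv[],
                  Options& out, std::string& error) {
    for (int i = 1; i < argc; ++i) {  // argv[0] is the program name
        const std::string arg = argv[i];
        if (arg.size() != 2 || arg[0] != '-') continue;  // not a flag
        if (arg[1] == 'h') { out.help = true; continue; }

        std::string* target = nullptr;
        switch (arg[1]) {
            case 'c': target = &out.host; break;
            case 'd': target = &out.databasename; break;
            case 'w': target = &out.password; break;
            case 'u': target = &out.username; break;
            case 'p': target = &out.port; break;
            default: break;  // unknown flag: ignored in this sketch
        }
        if (target == nullptr) continue;

        if (i + 1 >= argc) {  // e.g. the shell dropped "#fakepass"
            error = "missing value after " + arg;
            return false;
        }
        *target = argv[++i];  // consume the value as well
    }
    return true;
}
```

From main, this could be called as parseOptions(argc, argv, opts, error), printing the error and exiting when it returns false.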

Validating email address without regex

This must have a canonical answer but I cannot find it... Using a regular expression to validate an email address has answers which show regex is really not the best way to validate emails. Searching online keeps turning up lots and lots of regex-based answers.
That question is about PHP and an answer references a handy class MailAddress. C# has something very similar but what about plain old C++? Is there a boost/C++11 utility to take all the pain away? Or something in WinAPI/MFC, even?
I had to write my own solution because my installed g++ version doesn't support std::regex (the application crashes), and I don't want to upgrade the compiler for a single e-mail validation. As this application will probably never need any further regex, I wrote a function that does the job. You can easily adjust the allowed characters for each part of the e-mail address (before the '@', after the '@', and after the '.') depending on your needs. It took 20 minutes to write and was far easier than messing with compiler and environment settings just for one function call.
Here you go, have fun:
bool emailAddressIsValid(std::string _email)
{
bool retVal = false;
//Tolower cast
std::transform(_email.begin(), _email.end(), _email.begin(), ::tolower);
//Edit these to change valid characters you want to be supported to be valid. You can edit it for each section. Remember to edit the array size in the for-loops below.
const char* validCharsName = "abcdefghijklmnopqrstuvwxyz0123456789.%+_-"; //length = 41, change in loop
const char* validCharsDomain = "abcdefghijklmnopqrstuvwxyz0123456789.-"; //length = 38, change in loop
const char* validCharsTld = "abcdefghijklmnopqrstuvwxyz"; //length = 26, change in loop
bool invalidCharacterFound = false;
bool atFound = false;
bool dotAfterAtFound = false;
uint16_t letterCountBeforeAt = 0;
uint16_t letterCountAfterAt = 0;
uint16_t letterCountAfterDot = 0;
for (uint16_t i = 0; i < _email.length(); i++) {
char currentLetter = _email[i];
//Found first @? Let's mark that and continue
if (atFound == false && dotAfterAtFound == false && currentLetter == '@') {
atFound = true;
continue;
}
//Found '.' after @? Let's mark that and continue
if (atFound == true && dotAfterAtFound == false && currentLetter == '.') {
dotAfterAtFound = true;
continue;
}
//Count characters before @ (must be > 0)
if (atFound == false && dotAfterAtFound == false) {
letterCountBeforeAt++;
}
//Count characters after @ (must be > 0)
if (atFound == true && dotAfterAtFound == false) {
letterCountAfterAt++;
}
//Count characters after '.' (dot) after @ (must be between 2 and 6 characters, the .tld)
if (atFound == true && dotAfterAtFound == true) {
letterCountAfterDot++;
}
//Validate characters, before '@'
if (atFound == false && dotAfterAtFound == false) {
bool isValidCharacter = false;
for (uint16_t j = 0; j < 41; j++) {
if (validCharsName[j] == currentLetter) {
isValidCharacter = true;
break;
}
}
if (isValidCharacter == false) {
invalidCharacterFound = true;
break;
}
}
//Validate characters, after '@', before '.' (dot)
if (atFound == true && dotAfterAtFound == false) {
bool isValidCharacter = false;
for (uint16_t k = 0; k < 38; k++) {
if (validCharsDomain[k] == currentLetter) {
isValidCharacter = true;
break;
}
}
if (isValidCharacter == false) {
invalidCharacterFound = true;
break;
}
}
//After '.' (dot), and after '@' (.tld)
if (atFound == true && dotAfterAtFound == true) {
bool isValidCharacter = false;
for (uint16_t m = 0; m < 26; m++) {
if (validCharsTld[m] == currentLetter) {
isValidCharacter = true;
break;
}
}
if (isValidCharacter == false) {
invalidCharacterFound = true;
break;
}
}
//Break the loop to speed up things if one character was invalid
if (invalidCharacterFound == true) {
break;
}
}
//Compare collected information and finalize validation. If all matches: retVal -> true!
if (atFound == true && dotAfterAtFound == true && invalidCharacterFound == false && letterCountBeforeAt >= 1 && letterCountAfterAt >= 1 && letterCountAfterDot >= 2 && letterCountAfterDot <= 6) {
retVal = true;
}
return retVal;
}
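The same simplified rules can also be written with std::string's own search functions, which removes the hard-coded array lengths. This is a sketch under that assumption, not a replacement endorsed by the answer; emailLooksValid is a hypothetical name. It shares the limitation of the function above: everything after the first '.' following the '@' is treated as the top-level domain, so an address like user@mail.example.com is rejected.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Sketch only: same simplified rules as the function above, not RFC 5322.
bool emailLooksValid(std::string email) {
    // Lower-case the input so the sets below only need lower-case letters.
    std::transform(email.begin(), email.end(), email.begin(),
                   [](unsigned char ch) {
                       return static_cast<char>(std::tolower(ch));
                   });

    const std::string nameChars   = "abcdefghijklmnopqrstuvwxyz0123456789.%+_-";
    const std::string domainChars = "abcdefghijklmnopqrstuvwxyz0123456789.-";
    const std::string tldChars    = "abcdefghijklmnopqrstuvwxyz";

    // First '@', then the first '.' after it.
    const std::string::size_type at = email.find('@');
    if (at == std::string::npos) return false;
    const std::string::size_type dot = email.find('.', at + 1);
    if (dot == std::string::npos) return false;

    const std::string name   = email.substr(0, at);
    const std::string domain = email.substr(at + 1, dot - at - 1);
    const std::string tld    = email.substr(dot + 1);

    // Each part must be non-empty (tld: 2 to 6 letters) and use only its set.
    return !name.empty() && !domain.empty()
        && tld.size() >= 2 && tld.size() <= 6
        && name.find_first_not_of(nameChars) == std::string::npos
        && domain.find_first_not_of(domainChars) == std::string::npos
        && tld.find_first_not_of(tldChars) == std::string::npos;
}
```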

String Subscript Out Of Range C++: Debug

So I am debugging a runtime error I am getting. "string subscript out of range".
I know where the problem is and what is causing it, yet I am looking for a possible solution that will perform in a similar or identical manner without giving me the error.
Here is the code snippet to where the error occurs. Correct me if I am wrong, the problem is occurring because I am declaring a 0 length string then trying to manipulate an nth element.
std::string VsuShapeLine::GetRunwayNumber()
{
std::string name, nbstr, newnbstr, convnbstr;
int idx,idx2, num, count, pos;
char buf[3];
int idx3=-1;
name = this->GetName();
idx = name.find("ALight");
if (idx == -1)
{
idx = name.find("Lights");
idx3 = name.find_last_of("Lights");
}
idx2 = name.find('_');
idx2 +=3;
nbstr = name.substr(idx2, idx-idx2);
if (idx3 != -1)
idx3++;
else
idx3 = idx+6;
if (name.at(idx3) == 'N')
{
pos = nbstr.length();
if (isalpha(nbstr[idx-1]))
nbstr[pos-1] = _toupper(nbstr[pos-1]);
return (nbstr);
}
else if (name.at(idx3) == 'F')
{
convnbstr = nbstr.substr(0,2);
num = atoi(convnbstr.data());
num +=18;
_itoa(num, buf, 10);
newnbstr = buf;
count = nbstr.size();
if (count > 2)
{
if (nbstr.at(2) == 'l' || nbstr.at(2) == 'L')
newnbstr += 'r';
else if (nbstr.at(2) == 'r'|| nbstr.at(2) == 'R')
newnbstr += 'l';
else if (nbstr.at(2) == 'c' || nbstr.at(2) == 'C')
newnbstr += 'c';
}
pos = newnbstr.length();
if (isalpha(newnbstr[pos-1]))
newnbstr[pos-1] = _toupper(newnbstr[pos-1]);
return (newnbstr);
}
return ("");
}
By the way, for whoever is interested, the problem was at this line:
if (isalpha(nbstr[idx-1]))
At this point nbstr is a string of length 3, and the value of idx, the way my program works, is always either 9 or 10, so idx-1 is out of range for nbstr.
Also, as Retired Ninja mentioned, the result of string::find should be checked before it is used.
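A minimal sketch of both points: compare the result of find against std::string::npos before using it, and index a string only through positions derived from that same string. The helper names textBefore and upperLastLetter are hypothetical, and the sample name in the usage note is invented for illustration. Separately, note that find_last_of("Lights") looks for the last occurrence of any one of those characters, not of the substring; rfind is the substring search.

```cpp
#include <cctype>
#include <string>

// Returns the part of `name` before `marker`, or "" when `marker` is absent.
std::string textBefore(const std::string& name, const std::string& marker) {
    const std::string::size_type pos = name.find(marker);
    if (pos == std::string::npos)  // find() failed: do not use pos as an index
        return "";
    return name.substr(0, pos);
}

// Upper-cases the last character if it is a letter; safe on empty strings.
std::string upperLastLetter(std::string text) {
    if (!text.empty()) {
        const unsigned char last = static_cast<unsigned char>(text.back());
        if (std::isalpha(last))
            text.back() = static_cast<char>(std::toupper(last));
    }
    return text;
}
```

For example, upperLastLetter("09l") gives "09L", and textBefore("RW_09lALightN", "ALight") gives "RW_09l".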

recursive call overflows

On a test data set the following code works, but when I change to a second test set with a similar size it overflows.
To change a string of tokens into an associated new string of tokens I use this vector lookup function
//looks for input string in vector and returns output, 'c' is check row, 'r' is return row
string vectorSearch(string &check, int &direction, int n, int c, int r, int level)
{
if ((direction == 1 && check.length() <= 1) || n == list.size()-1 ||(direction == 0 && check.length() > 1)) { //if reading and string is 1 char then pass over
if (direction == 1){ //convert '???' into '?'
string temp = "";
bool wildToken = false;
for (unsigned int i = 0; i < check.length(); i++) {
temp+='?';
if (check.compare(temp) == 0) { check = '?'; wildToken = false; } //done,'???" case, return '?' token
else if (check[i] == '?') wildToken = true; //not done searching
}
}
return check;
} else {
if (list[n][c] == check || list[n][c] == ('0'+check)) //add dummy '0'
return list[n][r];
else
return vectorSearch (check, direction, n+1, c, r, level);
}
}
After working fine for a dozen conversions, the stack overflows.
vectorSearch is called from this function:
//this function takes an ontology and direction==1 (default) changes from string
//to single char or if direction==0 takes single char and converts to string representation
string Lexicon::convertOntology(string input, int level, int direction, string out, string temp)
{
if (input == "" && temp == "")
return out; //check for completed conversion
else {
if (direction == 0 || input[0] == '.' || input[0] == '-' || input == "" ) { //found delimiter or end
if (temp == "") temp = input[0]; //condition for reverse w/o delimiters
if (input != "") return convertOntology(input.substr(1), level+1, direction,
out+=vectorSearch(temp, direction, 0, direction, 1-direction, level));
else {
string empty = "";
return convertOntology(empty, level+1, direction, out+=vectorSearch(temp, direction, 0, direction, 1-direction, level));
}
} else
return convertOntology(input.substr(1), level, direction, out, temp+=input[0]); //increment and check
}
}
The call stack is a finite resource and can be exhausted like any other. The more local variables and parameters a function has, the more stack space each call uses. This is unavoidable with recursion unless you can restrict the number of recursive calls in some way.
You can only go so deep with recursion before running out of stack space. Luckily, any recursive function can be re-written to be iterative. I believe the below is a correct iterative implementation of your vectorSearch, I'll leave the latter one to you.
string vectorSearch(string &check, int &direction, int n, int c, int r, int level)
{
while(true)
{
if ((direction == 1 && check.length() <= 1) || n == list.size()-1 ||(direction == 0 && check.length() > 1)) { //if reading and string is 1 char then pass over
if (direction == 1){ //convert '???' into '?'
string temp = "";
bool wildToken = false;
for (unsigned int i = 0; i < check.length(); i++) {
temp+='?';
if (check.compare(temp) == 0) { check = '?'; wildToken = false; } //done,'???" case, return '?' token
else if (check[i] == '?') wildToken = true; //not done searching
}
}
return check;
} else if (list[n][c] == check || list[n][c] == ('0'+check)) {//add dummy '0'
return list[n][r];
}
n++;
}
}
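Stripped of the '?' handling, the core of that search is a plain loop over the rows. The sketch below is a simplified, self-contained illustration of that loop only: the table is passed in explicitly instead of using the global list, the names Table and lookup are hypothetical, and every row is assumed to contain both columns.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Table = std::vector<std::vector<std::string>>;

// Scans the rows in order; uses constant stack space however long the table is.
std::string lookup(const Table& table, const std::string& key,
                   std::size_t keyCol, std::size_t valueCol) {
    for (const auto& row : table) {
        // Accept the key as-is or with the dummy leading '0'.
        if (row[keyCol] == key || row[keyCol] == "0" + key)
            return row[valueCol];
    }
    return key;  // not found: hand the token back unchanged
}
```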
Thank you for the reviews and comments.
The functions are fine. This recursive function bundle requires that the string exists in the database it acts on, and the string checks that run before these functions incorrectly recognized a special condition and inserted a dummy char. There is a recursive function that precedes these two (I did not correctly see that I had written a bundle of three recursive functions), and that one was searching within parameters for a string longer than what exists in the database; apparently the parameters were wider than the stack. I checked the parameters, and one was not updated and was not controlling.
I fixed the special condition; the strings are now the same length and the search parameters are fixed.
The functions posted are not too complex.