Time complexity for Regular Expression Matching - c++

I was solving the problem Regular Expression Matching on leetcode. I solved it using recursion as below:
bool isMatch(string s, string p) {
    if (p.empty()) return s.empty();
    if ('*' == p[1])
        // x* matches empty string or at least one character: x* -> xx*
        // !s.empty() ensures s is non-empty before s[0] is read
        return (isMatch(s, p.substr(2)) || (!s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p)));
    else
        return !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p.substr(1));
}
How can I find the time complexity of this code?
Problem link: https://leetcode.com/problems/regular-expression-matching/
PS: The official solution explains the time complexity, but I could not understand the explanation.

Assume T(t,p) is the time complexity of function isMatch(text, pattern) where t is text.length() and p is pattern.length() / 2
First T(x,0) = 1 for all x
Then if pattern[1] == '*', T(t,p) = T(t,p-1) + T(t-1,p) + O(t + p)
Otherwise T(t,p) = T(t-1, p-0.5) + O(t + p)
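Obviously the first case is worse, so consider only that recurrence.
Think about the combinatorial meaning of T.
Initially there is a ball at coordinate (t,p). In one step you can move it to (t-1,p) or (t,p-1), at a cost of t+p.
The ball stops when it reaches an axis.
Then T(t,p) equals the total cost over all valid ways to move the ball from (t,p) to an axis.
Every such path has at most t+p steps, so the recursion tree has O(2^(t+p)) nodes, each costing at most O(t+p).
So the total time complexity is O((t+p)2^(t+p)). Since p here is half the pattern length, that is O((T+P)2^(T+P/2)) in terms of the text length T and the pattern length P.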
BTW your code will run faster if you use something like std::string_view instead of .substr(), which prevents copying the whole string.
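As a rough sketch of that suggestion (assuming C++17 is available; the name isMatchView is made up here and is not part of the LeetCode interface), the same recursion can be written on std::string_view, where dropping a prefix is O(1) and copies nothing:

```cpp
#include <cassert>
#include <string_view>

// Same recursion as in the question, but on std::string_view (C++17).
// substr() on a view only adjusts a pointer and a length, so each call
// does O(1) work instead of copying the rest of the string.
bool isMatchView(std::string_view s, std::string_view p) {
    if (p.empty()) return s.empty();
    const bool firstMatches = !s.empty() && (s[0] == p[0] || p[0] == '.');
    // Unlike std::string, a view has no terminating '\0' to read at
    // p[p.size()], so the length is checked before looking at p[1].
    if (p.size() >= 2 && p[1] == '*') {
        // x* matches nothing, or one character followed by x* again
        return isMatchView(s, p.substr(2)) ||
               (firstMatches && isMatchView(s.substr(1), p));
    }
    return firstMatches && isMatchView(s.substr(1), p.substr(1));
}
```

This removes the O(t + p) copying term from every call, but the number of recursive calls is unchanged, so the running time stays exponential without memoization.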


Increment operator not working in while condition

I've written a while loop to increment a pointer until the content is a null byte or the difference between adjacent elements is greater than 1, and this has worked fine:
while (i[1] && *i + 1 == i[1]) i++;
Then I tried to rewrite it as:
while (i[1] && *(i++) + 1 == *i);
But in this way, it got stuck in an infinite loop, as if i was not being incremented. Why is this so?
Edit:
I must apologize for being misleading: I have now discovered that it does not get stuck inside the while loop I showed you. Rather, it simply exits that while loop and gets stuck in its parent loop. Let me share the whole code:
char accepted[strlen(literal)+1];
strcpy(accepted, literal);
std::sort(accepted, accepted + strlen(accepted));
char *i = accepted-1;
while (*++i){
    uint8_t rmin = *i;
    //while (i[1] && *i + 1 == i[1]) i++;
    while (i[1] && *(i++) + 1 == *i);
    uint8_t rmax = *i;
    ranges.push_back(Range{rmin, rmax});
    if (!i[1]) break;//is this necessary?
}
My question is no longer valid.
And yes, "clever" unreadable code is a bad idea.
There are two problems in your code:
while (i[1] && *(i++) + 1 == *i);
1. The && operator uses short-circuit evaluation: if the left part (i[1]) is 0, the right part (*(i++) + 1 == *i) is never evaluated. Conversely, whenever i[1] is nonzero, i++ is executed even if the comparison then fails, so i ends up one element further than in your original loop. That can explain why the surrounding loop misbehaves.
2. The expression *(i++) + 1 == *i yields undefined behaviour: i is modified by i++ on the left of == and read by *i on the right, and the two operands of == are unsequenced relative to each other.
It's usually not advised to write "clever" code. Write readable code and let the compiler take care of optimizations.
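For comparison, here is one way the range-building loop could be written without pointer tricks. This is only a sketch: the Range layout (fields lo and hi) and the function name buildRanges are assumptions, since the question does not show them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the Range type used in the question.
struct Range { std::uint8_t lo, hi; };

// Sorts the characters and groups runs of consecutive values into ranges.
std::vector<Range> buildRanges(std::string chars) {
    std::sort(chars.begin(), chars.end());
    std::vector<Range> ranges;
    std::size_t i = 0;
    while (i < chars.size()) {
        std::size_t j = i;
        // Extend the run while the next character is exactly one greater.
        while (j + 1 < chars.size() &&
               static_cast<unsigned char>(chars[j]) + 1 ==
                   static_cast<unsigned char>(chars[j + 1])) {
            ++j;
        }
        ranges.push_back(Range{static_cast<std::uint8_t>(chars[i]),
                               static_cast<std::uint8_t>(chars[j])});
        i = j + 1;  // continue after the run that was just emitted
    }
    return ranges;
}
```

Each index is advanced in its own statement, so there is no question about evaluation order, and the bounds checks replace the reliance on the terminating null byte.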

Checking if two patterns match one another?

This Leetcode problem is about how to match a pattern string against a text string as efficiently as possible. The pattern string can consist of letters, dots, and stars, where a letter only matches itself, a dot matches any individual character, and a star matches any number of copies of the preceding character. For example, the pattern
ab*c.
would match ace and abbbbcc. I know that it's possible to solve this original problem using dynamic programming.
My question is whether it's possible to see whether two patterns match one another. For example, the pattern
bdbaa.*
can match
bdb.*daa
Is there a nice algorithm for solving this pattern-on-pattern matching problem?
Here's one approach that works in polynomial time. It's slightly heavyweight and there may be a more efficient solution, though.
The first observation that I think helps here is to reframe the problem. Rather than asking whether these patterns match each other, let's ask this equivalent question:
Given patterns P1 and P2, is there a string w where P1 and P2 each match w?
In other words, rather than trying to get the two patterns to match one another, we'll search for a string that each pattern matches.
You may have noticed that the sorts of patterns you're allowed to work with are a subset of the regular expressions. This is helpful, since there's a pretty elaborate theory of what you can do with regular expressions and their properties. So rather than taking aim at your original problem, let's solve this even more general one:
Given two regular expressions R1 and R2, is there a string w that both R1 and R2 match?
The reason for solving this more general problem is that it enables us to use the theory that's been developed around regular expressions. For example, in formal language theory we can talk about the language of a regular expression, which is the set of all strings that the regex matches. We can denote this L(R). If there's a string that's matched by two regexes R1 and R2, then that string belongs to both L(R1) and L(R2), so our question is equivalent to
Given two regexes R1 and R2, is there a string w in L(R1) ∩ L(R2)?
So far all we've done is reframe the problem we want to solve. Now let's go solve it.
The key step here is that it's possible to convert any regular expression into an NFA (a nondeterministic finite automaton) so that every string matched by the regex is accepted by the NFA and vice-versa. Even better, the resulting NFA can be constructed in polynomial time. So let's begin by constructing NFAs for each input regex.
Now that we have those NFAs, we want to answer this question: is there a string that both NFAs accept? And fortunately, there's a quick way to answer this. There's a common construction on NFAs called the product construction that, given two NFAs N1 and N2, constructs a new NFA N' that accepts all the strings accepted by both N1 and N2 and no other strings. Again, this construction runs in polynomial time.
Once we have N', we're basically done! All we have to do is run a breadth-first or depth-first search through the states of N' to see if we find an accepting state. If so, great! That means there's a string accepted by N', which means that there's a string accepted by both N1 and N2, which means that there's a string matched by both R1 and R2, so the answer to the original question is "yes!" And conversely, if we can't reach an accepting state, then the answer is "no, it's not possible."
I'm certain that there's a way to do all of this implicitly by doing some sort of implicit BFS over the automaton N' without actually constructing it, and it should be possible to do this in something like time O(n^2). If I have some more time, I'll revisit this answer and expand on how to do that.
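For the restricted pattern language in the question (letters, dots, and stars), such an implicit search might look roughly like the C++ sketch below. It is an illustration under assumptions rather than a definitive implementation: both patterns are assumed to be well formed (no leading * and no two adjacent *), and the names Token, tokenize, and patternsIntersect are invented for this example. A state is a pair (i, j) of token positions, one in each pattern, which plays the role of a state of the product automaton N'.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Token { char ch; bool star; };

// Split a well-formed pattern into (character, starred?) tokens.
static std::vector<Token> tokenize(const std::string& p) {
    std::vector<Token> out;
    for (std::size_t i = 0; i < p.size(); ++i) {
        assert(p[i] != '*');  // assumption: no leading or doubled star
        const bool star = i + 1 < p.size() && p[i + 1] == '*';
        out.push_back({p[i], star});
        if (star) ++i;  // skip over the '*' itself
    }
    return out;
}

// True if some string is matched by both patterns.
bool patternsIntersect(const std::string& p1, const std::string& p2) {
    const std::vector<Token> a = tokenize(p1), b = tokenize(p2);
    const std::size_t n = a.size(), m = b.size();
    std::vector<std::vector<bool>> seen(n + 1, std::vector<bool>(m + 1, false));
    std::queue<std::pair<std::size_t, std::size_t>> q;
    auto visit = [&](std::size_t i, std::size_t j) {
        if (!seen[i][j]) { seen[i][j] = true; q.push({i, j}); }
    };
    visit(0, 0);
    while (!q.empty()) {
        const std::size_t i = q.front().first, j = q.front().second;
        q.pop();
        if (i == n && j == m) return true;  // both patterns fully consumed
        // Epsilon moves: a starred token may match zero characters.
        if (i < n && a[i].star) visit(i + 1, j);
        if (j < m && b[j].star) visit(i, j + 1);
        // Both sides consume one common character.
        if (i < n && j < m &&
            (a[i].ch == b[j].ch || a[i].ch == '.' || b[j].ch == '.')) {
            visit(a[i].star ? i : i + 1, b[j].star ? j : j + 1);
        }
    }
    return false;
}
```

Each of the (n+1)(m+1) states is visited at most once and has at most three outgoing moves, so the search runs in O(nm) time and space. On the question's example, patternsIntersect("bdbaa.*", "bdb.*daa") returns true.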
I have worked on my DP idea and came up with the implementation below. Please feel free to edit the code if you find a failing test case. I tried a few test cases myself and all of them passed; I list them below as well.
Please note that I have extended the idea that is used to solve regex pattern matching against a string with DP. For that idea, see the discussion section of the LeetCode link provided in the OP, which explains matching a regex against a string.
The idea is to create a dynamic memoization table, entries of which will follow the below rules:
If pattern1[i] == pattern2[j], dp[i][j] = dp[i-1][j-1]
If pattern1[i] == '.' or pattern2[j] == '.', then dp[i][j] = dp[i-1][j-1]
The trick lies here: If pattern1[i] = '*', then if dp[i-2][j] exists, then
dp[i][j] = dp[i-2][j] || dp[i][j-1] else dp[i][j] = dp[i][j-1].
If pattern2[j] == '*', then if pattern1[i] == pattern2[j-1], then
dp[i][j] = dp[i][j-2] || dp[i-1][j]
else dp[i][j] = dp[i][j-2]
pattern1 goes row-wise and pattern2 goes column-wise. Also, please note that this code also works for normal regex pattern matching against a plain string, provided the plain string is passed as pattern1 and the regex as pattern2. I have verified that by running it on LeetCode, where it passed all the available test cases!
Below is the complete working implementation of the above logic:
boolean matchRegex(String pattern1, String pattern2){
boolean dp[][] = new boolean[pattern1.length()+1][pattern2.length()+1];
dp[0][0] = true;
//fill up for the starting row
for(int j=1;j<=pattern2.length();j++){
if(pattern2.charAt(j-1) == '*')
dp[0][j] = dp[0][j-2];
}
//fill up for the starting column
for(int j=1;j<=pattern1.length();j++){
if(pattern1.charAt(j-1) == '*')
dp[j][0] = dp[j-2][0];
}
//fill for rest table
for(int i=1;i<=pattern1.length();i++){
for(int j=1;j<=pattern2.length();j++){
//if second character of pattern1 is *, it will be equal to
//value in top row of current cell
if(pattern1.charAt(i-1) == '*'){
dp[i][j] = dp[i-2][j] || dp[i][j-1];
}
else if(pattern1.charAt(i-1)!='*' && pattern2.charAt(j-1)!='*'
&& (pattern1.charAt(i-1) == pattern2.charAt(j-1)
|| pattern1.charAt(i-1)=='.' || pattern2.charAt(j-1)=='.'))
dp[i][j] = dp[i-1][j-1];
else if(pattern2.charAt(j-1) == '*'){
boolean temp = false;
if(pattern2.charAt(j-2) == pattern1.charAt(i-1)
|| pattern1.charAt(i-1)=='.'
|| pattern1.charAt(i-1)=='*'
|| pattern2.charAt(j-2)=='.')
temp = dp[i-1][j];
dp[i][j] = dp[i][j-2] || temp;
}
}
}
//comment this portion if you don't want to see entire dp table
for(int i=0;i<=pattern1.length();i++){
for(int j=0;j<=pattern2.length();j++)
System.out.print(dp[i][j]+" ");
System.out.println("");
}
return dp[pattern1.length()][pattern2.length()];
}
Driver method:
System.out.println(e.matchRegex("bdbaa.*", "bdb.*daa"));
Input1: bdbaa.* and bdb.*daa
Output1: true
Input2: .*acd and .*bce
Output2: false
Input3: acd.* and .*bce
Output3: true
Time complexity: O(mn) where m and n are lengths of two regex patterns given. Same will be the space complexity.
You can use a dynamic approach tailored to this subset of a Thompson NFA style regex implementing only . and *:
You can do that either with dynamic programming (here in Ruby):
def is_match(s, p)
return true if s==p
len_s, len_p=s.length, p.length
dp=Array.new(len_s+1) { |row| [false] * (len_p+1) }
dp[0][0]=true
(2..len_p).each { |j| dp[0][j]=dp[0][j-2] && p[j-1]=='*' }
(1..len_s).each do |i|
(1..len_p).each do |j|
if p[j-1]=='*'
a=dp[i][j - 2]
b=[s[i - 1], '.'].include?(p[j-2])
c=dp[i - 1][j]
dp[i][j]= a || (b && c)
else
a=dp[i - 1][j - 1]
b=['.', s[i - 1]].include?(p[j - 1])
dp[i][j]=a && b
end
end
end
dp[len_s][len_p]
end
# 139 ms on Leetcode
Or recursively:
def is_match(s,p,memo={["",""]=>true})
if p=="" && s!="" then return false end
if s=="" && p!="" then return p.scan(/.(.)/).uniq==[['*']] && p.length.even? end
if memo[[s,p]]!=nil then return memo[[s,p]] end
ch, exp, prev=s[-1],p[-1], p.length<2 ? 0 : p[-2]
a=(exp=='*' && (
([ch,'.'].include?(prev) && is_match(s[0...-1], p, memo) ||
is_match(s, p[0...-2], memo))))
b=([ch,'.'].include?(exp) && is_match(s[0...-1], p[0...-1], memo))
memo[[s,p]]=(a || b)
end
# 92 ms on Leetcode
In each case:
Matching effectively works from the second character of the pattern: when that character is *, the character before it is matched against s for as long as s keeps matching it.
The meta character . is being used as a fill in for the actual character. This allows any character in s to match . in p
You can solve this with backtracking too, though not very efficiently, because the match of the same substrings may be recalculated many times. That could be improved by introducing a lookup table that stores all non-matching pairs of strings, so that the calculation only happens when a pair cannot be found in the table. It seems to work, though (JS; the algorithm assumes the simple regexes are valid, which means they do not begin with * and contain no two adjacent *):
function canBeEmpty(s) {
if (s.length % 2 == 1)
return false;
for (let i = 1; i < s.length; i += 2)
if (s[i] != "*")
return false;
return true;
}
function match(a, b) {
if (a.length == 0 || b.length == 0)
return canBeEmpty(a) && canBeEmpty(b);
let x = 0, y = 0;
// process characters up to the next star
while ((x + 1 == a.length || a[x + 1] != "*") &&
(y + 1 == b.length || b[y + 1] != "*")) {
if (a[x] != b[y] && a[x] != "." && b[y] != ".")
return false;
x++; y++;
if (x == a.length || y == b.length)
return canBeEmpty(a.substr(x)) && canBeEmpty(b.substr(y));
}
if (x + 1 < a.length && y + 1 < b.length && a[x + 1] == "*" && b[y + 1] == "*")
// star coming in both strings
return match(a.substr(x + 2), b.substr(y)) || // try skip in a
match(a.substr(x), b.substr(y + 2)); // try skip in b
else if (x + 1 < a.length && a[x + 1] == "*") // star coming in a, but not in b
return match(a.substr(x + 2), b.substr(y)) || // try skip * in a
((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching
match(a.substr(x), b.substr(y + 1))); // try skip char in b
else // star coming in b, but not in a
return match(a.substr(x), b.substr(y + 2)) || // try skip * in b
((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching
match(a.substr(x + 1), b.substr(y))); // try skip char in a
}
For a little optimization you could normalize the strings first:
function normalize(s) {
while (/([^*])\*\1([^*]|$)/.test(s) || /([^*])\*\1\*/.test(s)) {
s = s.replace(/([^*])\*\1([^*]|$)/, "$1$1*$2"); // move stars right
s = s.replace(/([^*])\*\1\*/, "$1*"); // reduce
}
return s;
}
// example: normalize("aa*aa*aa*bb*b*cc*cd*dd") => "aaaa*bb*ccc*ddd*"
There is a further reduction of the input possible: x*.* and .*x* can both be replaced by .*, so to get the maximal reduction you would have to try to move as many stars as possible next to .* (so moving some stars to the left can be better than moving all to the right).
IIUC, you are asking: "Can a regex pattern match another regex pattern?"
Yes, it can. Specifically, . matches "any character" which of course includes . and *. So if you have a string like this:
bdbaa.*
How could you match it? Well, you could match it like this:
bdbaa..
Or like this:
b.*
Or like:
.*ba*.*

Need Help Understanding Recursive Prefix Evaluator

This is a piece of code I found in my textbook for using recursion to evaluate prefix expressions. I'm having trouble understanding this code and the process in which it goes through.
char *a; int i;
int eval()
{ int x = 0;
  while (a[i] == ' ') i++;
  if (a[i] == '+')
    { i++; return eval() + eval(); }
  if (a[i] == '*')
    { i++; return eval() * eval(); }
  while ((a[i] >= '0') && (a[i] <= '9'))
    x = 10*x + (a[i++] - '0');
  return x;
}
I guess I'm confused primarily with the return statements and how it eventually leads to solving a prefix expression. Thanks in advance!
The best way to understand recursive examples is to work through an example :
char* a = "+11 4"
First off, i is initialized to 0: it is a global with no explicit initializer, so it is zero-initialized. Because i is global, updates to it affect all calls of eval().
i = 0, a[i] = '+'
there are no leading spaces, so the first while loop condition fails. The first if statement succeeds, i is incremented to 1 and eval() + eval() is executed. We'll evaluate these one at a time, and then come back after we have our results.
i = 1, a[1] = '1'
Again, no leading spaces, so the first while loop fails. The first and second if statements fail. In the last while loop, '1' is between '0' and '9' (based on ASCII value), so x becomes 10 * 0 + (a[1] - '0'), or 0 + 1 = 1. Importantly, i is incremented after a[i] is read. The next iteration of the while loop adds to x: x = 10 * 1 + (a[2] - '0'), or 10 + 1 = 11. The following character, a[3], is a space, so the loop ends. With the correct value of x, we can exit eval() and return the result of the first operand, here 11.
i = 3, a[3] = ' '
This time the first while loop skips the space, leaving i = 4 with a[4] = '4'. After that, as in the previous step, the only statement executed in this call of eval() is the last while loop. x = 0 + a[4] - '0', or 0 + 4 = 4. So we return 4.
At this point the control flow returns back to the original call to eval(), and now we have both values for the operands. We simply perform the addition to get 11 + 4 = 15, then return the result.
Every time eval() is called, it computes the value of the immediate next expression starting at position i, and returns that value.
Within eval:
The first while loop is just to ignore all the spaces.
Then there are 3 cases:
(a) Evaluate expressions starting with a + (i.e. an expression of the form A+B, which is "+ A B" in prefix)
(b) Evaluate expressions starting with a * (i.e. A*B = "* A B")
(c) Evaluate integer values (i.e. Any consecutive sequence of digits)
The while loop at the end takes care of case (c).
The code for case (a) is similar to that for case (b). Think about case (a):
If we encounter a + sign, it means we need to add the next two "things" we find in the sequence. The "things" might be numbers, or may themselves be expressions to be evaluated (such as X+Y or X*Y).
In order to get what these "things" are, the function eval() is called with an updated value of i. Each call to eval() will fetch the value of the immediate next expression, and update position i.
Thus, 2 successive calls to eval() obtain the values of the 2 following expressions.
We then apply the + operator to the 2 values, and return the result.
It will help to work through an example such as "+ * 2 3 * 4 5", which is prefix notation for (2*3)+(4*5).
So this piece of code can only eat +, *, spaces and numbers. It is supposed to eat one command which can be one of:
- + <op1> <op2>
- * <op1> <op2>
- <number>
It gets a pointer to a string, and a reading position which is incremented as the program goes along that string.
char *a; int i;
int eval()
{ int x = 0;
while (a[i] == ' ') i++; // it eats all spaces
if (a[i] == '+')
/* if the program encounters '+', two operands are expected next.
The reading position i already points just before the place
from which you have to start reading the next operand
(which is what first eval() call will do).
After the first eval() is finished,
the reading position is moved to the begin of the second operand,
which will be read during the second eval() call. */
{ i++; return eval() + eval(); }
if (a[i] == '*') // exactly the same, but for '*' operation.
{ i++; return eval() * eval(); }
while ((a[i] >= '0') && (a[i] <= '9')) // here it eats all digit until something else is encountered.
x = 10*x + (a[i++] - '0'); // every time the new digit is read, it multiplies the previously obtained number by 10 and adds the new digit.
return x;
// base case: returning the number. Note that the reading position already moved past it.
}
The example you are given uses a couple of global variables. They persist outside of the function's scope and must be initialized before calling the function.
i should be initialized to 0 so that you start at the beginning of the string, and the prefix expression is the string in a.
The operator is your prefix and so should be your first non-blank character. If you start with a number (a string of digits), you are done: that number is the result.
example: a = " + 15 450"
eval() finds '+' at i = 1
calls eval()
which finds '1' at i = 3 and then '5'
calculates x = 1 x 10 + 5
returns 15
calls eval()
which finds '4' at i = 6 and then '5' and then '0'
calculates x = (((4 x 10) + 5) x 10) + 0
returns 450
calculates the '+' operator of 15 and 450
returns 465
The returns are either a value found or the result of an operator and the succeeding results found. So recursively, the function successively looks through the input string and performs the operations until either the string ends or an invalid character is found.
Rather than breaking the code up into chunks, I'll just try to explain the concept as simply as possible.
The eval function always skips spaces so that it points to either a number character ('0'->'9'), an addition ('+') or a multiply ('*') at the current place in the expression string.
If it encounters a number, it proceeds to continue to eat the number digits, until it reaches a non-number digit returning the total result in integer format.
If it encounters operator ('+' and '*') it requires two integers, so eval calls itself twice to get the next two numbers from the expression string and returns that result as an integer.
One fly in the ointment may be evaluation order, cf. https://www.securecoding.cert.org/confluence/display/seccode/EXP10-C.+Do+not+depend+on+the+order+of+evaluation+of+subexpressions+or+the+order+in+which+side+effects+take+place.
It is not specified which eval() in "eval() + eval()" is evaluated first. That's OK for commutative operators but will fail for - or /: as a side effect, eval() advances the global position counter, so whichever call runs first in time reads the operand that comes first in the string. The call that runs first may well be the one written on the right, in which case the operands are swapped.
I think the fix is easy; assign to a temp and compute with that:
if (a[i] == '-')
    { i++; int tmp = eval(); return tmp - eval(); }
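Putting the pieces together, a version of the evaluator that avoids both the global state and the evaluation-order question could look like the sketch below. The names evalAt and evalPrefix and the added '-' operator are illustrative additions, not part of the textbook code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Evaluates the prefix expression that starts at s[pos] and leaves pos
// just past it. pos replaces the global index i of the textbook version.
int evalAt(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && s[pos] == ' ') ++pos;  // skip spaces
    if (pos < s.size() && (s[pos] == '+' || s[pos] == '-' || s[pos] == '*')) {
        const char op = s[pos++];
        // Separate statements force left-to-right reading of the operands.
        const int lhs = evalAt(s, pos);
        const int rhs = evalAt(s, pos);
        switch (op) {
            case '+': return lhs + rhs;
            case '-': return lhs - rhs;
            default:  return lhs * rhs;
        }
    }
    int x = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
        x = 10 * x + (s[pos++] - '0');  // accumulate the decimal digits
    return x;
}

// Convenience wrapper that starts reading at the beginning of the string.
int evalPrefix(const std::string& s) {
    std::size_t pos = 0;
    return evalAt(s, pos);
}
```

With this, evalPrefix("+ * 2 3 * 4 5") evaluates (2*3)+(4*5), and evalPrefix("- 10 4") subtracts in the intended order because lhs is always read before rhs.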

C++ random numbers logical operator weird outcome

I am trying to make a program that generates random numbers until it finds a predefined set of numbers (e.g. if I had a set of my 5 favourite numbers, how many times would I need to play for the computer to randomly find the same numbers). I have written a simple program but don't understand the outcome, which seems unrelated to what I expected: the outcome does not necessarily contain all of the predefined numbers, and sometimes it does contain them without stopping the loop. I think that the problem lies in the logical operator '&&' but am not sure. Here is the code:
#include <cstdlib>
#include <ctime>
#include <iostream>
const int one = 1;
const int two = 2;
const int three = 3;
using namespace std;
int main()
{
    int first, second, third;
    int i = 0;
    time_t seconds;
    time(&seconds);
    srand ((unsigned int) seconds);
    do
    {
        first = rand() % 10 + 1;
        second = rand() % 10 + 1;
        third = rand() % 10 + 1;
        i++;
        cout << first<<","<<second<<","<<third<< endl;
        cout <<i<<endl;
    } while (first != one && second != two && third != three);
    return 0;
}
and here is one of the possible outcomes:
3,10,4
1 // iteration variable
7,10,4
2
4,4,6
3
3,5,6
4
7,1,8
5
5,4,2
6
2,5,7
7
2,4,7
8
8,4,9
9
7,4,4
10
8,6,5
11
3,2,7
12
I have also noticed that if I use the || operator instead of &&, the loop will execute until it finds the exact numbers in the order in which the variables were set (here: 1,2,3). This is better, but what should I do to make the loop stop even if the order is not the same, as long as the numbers are? Thanks for your answers and help.
The issue is here in your condition:
} while (first != one && second != two && third != three);
You continue while none of them is equal. But once at least one of them is equal, you stop/leave the loop.
To fix this, use logical or (||) rather than a logical and (&&) to link the tests:
} while (first != one || second != two || third != three);
Now it will continue as long as any of them doesn't match.
Edit - for a more advanced comparison:
I'll be using a simple macro to make it easier to read:
#define isoneof(x,a,b,c) ((x) == (a) || (x) == (b) || (x) == (c))
Note that there are different approaches you could use.
} while (!isoneof(first, one, two, three) || !isoneof(second, one, two, three) || !isoneof(third, one, two, three));
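Note that the macro-based condition also stops on draws with repeated values such as 1,1,1, because each value is individually one of the targets. If the draw has to contain exactly the target numbers in any order, one alternative is to sort both sets and compare them. The sketch below assumes three numbers per draw, and the helper name sameNumbers is made up for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// True when `draw` holds exactly the numbers in `target`, in any order.
// Both arrays are taken by value so that sorting does not affect the caller.
bool sameNumbers(std::array<int, 3> draw, std::array<int, 3> target) {
    std::sort(draw.begin(), draw.end());
    std::sort(target.begin(), target.end());
    return draw == target;  // element-wise comparison of the sorted arrays
}
```

The loop condition would then read `while (!sameNumbers({first, second, third}, {one, two, three}));`.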
You have a mistake in your logical condition: it means "while all numbers are not equal". To break this condition, it is enough for one pair to become equal.
You needed to construct a different condition - either put "not" in front of it
!(first==one && second==two && third==three)
or convert using De Morgan's law:
first!=one || second!=two || third!=three

C++ Recognizing double digits using strings

Sorry, I realized that I put in all of my code in this question. All of my code equals most of the answer for this particular problem for other students, which was idiotic.
Here's the basic gist of the problem I put:
I needed to recognize single digit numbers in a regular mathematical expression (such as 5 + 6) as well as double digit (such as 56 + 78). The mathematical expressions could also be displayed as 56+78 (no spaces) or 56 +78 and so on.
The actual problem was that I was reading in the expression as 5 6 + 7 8 no matter what the input was.
Thanks and sorry that I pretty much deleted this question, but my goal is not to give answers out for homework problems.
Jesse Smothermon
The problem really consists of two parts: lexing the input (turning the sequence of characters into a sequence of "tokens") and evaluating the expression. If you do these two tasks separately, it should be much easier.
First, read in the input and convert it into a sequence of tokens, where each token is an operator (+, -, etc.) or an operand (42, etc.).
Then, perform the infix-to-postfix conversion on this sequence of tokens. A "Token" type doesn't have to be anything fancy, it can be as simple as:
struct Token {
    enum Type { Operand, Operator };
    enum OperatorType { Plus, Minus };
    Type type_;
    OperatorType operatorType_; // only valid if type_ == Operator
    int operand_; // only valid if type_ == Operand
};
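A minimal sketch of the lexing step might look like this. It uses a simplified token type (Tok, with the operator kept as a plain char) instead of the struct above, and the function name lexExpression is an assumption made for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Simplified token: either an operand with a value, or an operator character.
struct Tok { bool isOperand; int value; char op; };

// Turns "56 +78" into the tokens 56, '+', 78. Whitespace only separates
// tokens, so "56+78", "56 +78" and "56 + 78" all lex the same way.
std::vector<Tok> lexExpression(const std::string& input) {
    std::vector<Tok> tokens;
    std::size_t i = 0;
    while (i < input.size()) {
        const unsigned char c = static_cast<unsigned char>(input[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isdigit(c)) {
            int value = 0;
            // Consume the whole run of digits as one multi-digit operand.
            while (i < input.size() &&
                   std::isdigit(static_cast<unsigned char>(input[i])))
                value = 10 * value + (input[i++] - '0');
            tokens.push_back({true, value, '\0'});
        } else {
            tokens.push_back({false, 0, input[i++]});  // operator or other symbol
        }
    }
    return tokens;
}
```

Because a number is consumed as one run of digits, the later infix-to-postfix step never sees 5 and 6 as separate operands.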
First, it helps to move long conditions like this
userInput[i] != '+' || userInput[i] != '-' || userInput[i] != '*' || userInput[i] != '/' || userInput[i] != '^' || userInput[i] != ' ' && i < userInput.length()
into their own function, just for clarity.
bool isOperator(char c){
    return c == '+' || c == '-' || c == '*' || c == '/' || c == '^';
}
Also, there is no need to check that the character is not an operator; just check that it is a digit:
bool isNum(char c){
    return '0' <= c && c <= '9';
}
Another thing: with the long chain above, you will enter the tempNumber += ... block for every input character, because a character can never be equal to all of the operators at once, so at least one of the != tests is always true. You would have to combine the tests with &&, or better, use the function above:
if (isNum(userInput[iterator])){
    tempNumber += userInput[iterator];
}
This will also rule out any invalid input like b, X and the like.
Then, for your problem with double digit numbers:
The problem is that you always insert a space after writing the tempNumber. You only need to do that if the digit sequence is finished. To fix that, just modify the end of your long if-else if chain:
// ... operator stuff
} else {
    postfixExpression << tempNumber;
    // peek if the next character is also a digit, if not insert a space
    // also, if the current character is the last in the sequence, there can be no next digit
    if (iterator == userInput.length()-1 || !isNum(userInput[iterator+1])){
        postfixExpression << ' ';
    }
}
This should do the job of giving the correct representation from 56 + 78 --> 56 78 +. Please tell me if there's anything wrong. :)