Checking if two patterns match one another? - regex

This Leetcode problem is about how to match a pattern string against a text string as efficiently as possible. The pattern string can consists of letters, dots, and stars, where a letter only matches itself, a dot matches any individual character, and a star matches any number of copies of the preceding character. For example, the pattern
ab*c.
would match ace and abbbbcc. I know that it's possible to solve this original problem using dynamic programming.
My question is whether it's possible to see whether two patterns match one another. For example, the pattern
bdbaa.*
can match
bdb.*daa
Is there a nice algorithm for solving this pattern-on-pattern matching problem?

Here's one approach that works in polynomial time. It's slightly heavyweight and there may be a more efficient solution, though.
The first observation that I think helps here is to reframe the problem. Rather than asking whether these patterns match each other, let's ask this equivalent question:
Given patterns P1 and P2, is there a string w where P1 and P2 each match w?
In other words, rather than trying to get the two patterns to match one another, we'll search for a string that each pattern matches.
You may have noticed that the sorts of patterns you're allowed to work with are a subset of the regular expressions. This is helpful, since there's a pretty elaborate theory of what you can do with regular expressions and their properties. So rather than taking aim at your original problem, let's solve this even more general one:
Given two regular expressions R1 and R2, is there a string w that both R1 and R2 match?
The reason for solving this more general problem is that it enables us to use the theory that's been developed around regular expressions. For example, in formal language theory we can talk about the language of a regular expression, which is the set of all strings that the regex matches. We can denote this L(R). If there's a string that's matched by two regexes R1 and R2, then that string belongs to both L(R1) and L(R2), so our question is equivalent to
Given two regexes R1 and R2, is there a string w in L(R1) ∩ L(R2)?
So far all we've done is reframe the problem we want to solve. Now let's go solve it.
The key step here is that it's possible to convert any regular expression into an NFA (a nondeterministic finite automaton) so that every string matched by the regex is accepted by the NFA and vice-versa. Even better, the resulting NFA can be constructed in polynomial time. So let's begin by constructing NFAs for each input regex.
Now that we have those NFAs, we want to answer this question: is there a string that both NFAs accept? And fortunately, there's a quick way to answer this. There's a common construction on NFAs called the product construction that, given two NFAs N1 and N2, constructs a new NFA N' that accepts all the strings accepted by both N1 and N2 and no other strings. Again, this construction runs in polynomial time.
Once we have N', we're basically done! All we have to do is run a breadth-first or depth-first search through the states of N' to see if we find an accepting state. If so, great! That means there's a string accepted by N', which means that there's a string accepted by both N1 and N2, which means that there's a string matched by both R1 and R2, so the answer to the original question is "yes!" And conversely, if we can't reach an accepting state, then the answer is "no, it's not possible."
I'm certain that there's a way to do all of this implicitly by doing some sort of implicit BFS over the automaton N' without actually constructing it, and it should be possible to do this in something like time O(n2). If I have some more time, I'll revisit this answer and expand on how to do that.

I have worked on my idea of DP and came out with the below implementation of the above problem. Please feel free to edit the code in case someone finds any test cases failed. From my side, I tried few test cases and passed all of them, which I will be mentioning below as well.
Please note that I have extended the idea which is used to solve the regex pattern matching with a string using DP. To refer to that idea, please refer to the LeetCode link provided in the OP and look out for discussion part. They have given the explanation for regex matching and the string.
The idea is to create a dynamic memoization table, entries of which will follow the below rules:
If pattern1[i] == pattern2[j], dp[i][j] = dp[i-1][j-1]
If pattern1[i] == '.' or pattern2[j] == '.', then dp[i][j] = dp[i-1][j-1]
The trick lies here: If pattern1[i] = '*', then if dp[i-2][j] exists, then
dp[i][j] = dp[i-2][j] || dp[i][j-1] else dp[i][j] = dp[i][j-1].
If pattern2[j] == '*', then if pattern1[i] == pattern2[j-1], then
dp[i][j] = dp[i][j-2] || dp[i-1][j]
else dp[i][j] = dp[i][j-2]
pattern1 goes row-wise and pattern2 goes column-wise. Also, please note that this code should also work for normal regex pattern matching with any given string. I have verified it by running it on LeetCode and it passed all the available test cases there!
Below is the complete working implementation of the above logic:
boolean matchRegex(String pattern1, String pattern2){
boolean dp[][] = new boolean[pattern1.length()+1][pattern2.length()+1];
dp[0][0] = true;
//fill up for the starting row
for(int j=1;j<=pattern2.length();j++){
if(pattern2.charAt(j-1) == '*')
dp[0][j] = dp[0][j-2];
}
//fill up for the starting column
for(int j=1;j<=pattern1.length();j++){
if(pattern1.charAt(j-1) == '*')
dp[j][0] = dp[j-2][0];
}
//fill for rest table
for(int i=1;i<=pattern1.length();i++){
for(int j=1;j<=pattern2.length();j++){
//if second character of pattern1 is *, it will be equal to
//value in top row of current cell
if(pattern1.charAt(i-1) == '*'){
dp[i][j] = dp[i-2][j] || dp[i][j-1];
}
else if(pattern1.charAt(i-1)!='*' && pattern2.charAt(j-1)!='*'
&& (pattern1.charAt(i-1) == pattern2.charAt(j-1)
|| pattern1.charAt(i-1)=='.' || pattern2.charAt(j-1)=='.'))
dp[i][j] = dp[i-1][j-1];
else if(pattern2.charAt(j-1) == '*'){
boolean temp = false;
if(pattern2.charAt(j-2) == pattern1.charAt(i-1)
|| pattern1.charAt(i-1)=='.'
|| pattern1.charAt(i-1)=='*'
|| pattern2.charAt(j-2)=='.')
temp = dp[i-1][j];
dp[i][j] = dp[i][j-2] || temp;
}
}
}
//comment this portion if you don't want to see entire dp table
for(int i=0;i<=pattern1.length();i++){
for(int j=0;j<=pattern2.length();j++)
System.out.print(dp[i][j]+" ");
System.out.println("");
}
return dp[pattern1.length()][pattern2.length()];
}
Driver method:
System.out.println(e.matchRegex("bdbaa.*", "bdb.*daa"));
Input1: bdbaa.* and bdb.*daa
Output1: true
Input2: .*acd and .*bce
Output2: false
Input3: acd.* and .*bce
Output3: true
Time complexity: O(mn) where m and n are lengths of two regex patterns given. Same will be the space complexity.

You can use a dynamic approach tailored to this subset of a Thompson NFA style regex implementing only . and *:
You can do that either with dynamic programming (here in Ruby):
def is_match(s, p)
return true if s==p
len_s, len_p=s.length, p.length
dp=Array.new(len_s+1) { |row| [false] * (len_p+1) }
dp[0][0]=true
(2..len_p).each { |j| dp[0][j]=dp[0][j-2] && p[j-1]=='*' }
(1..len_s).each do |i|
(1..len_p).each do |j|
if p[j-1]=='*'
a=dp[i][j - 2]
b=[s[i - 1], '.'].include?(p[j-2])
c=dp[i - 1][j]
dp[i][j]= a || (b && c)
else
a=dp[i - 1][j - 1]
b=['.', s[i - 1]].include?(p[j - 1])
dp[i][j]=a && b
end
end
end
dp[len_s][len_p]
end
# 139 ms on Leetcode
Or recursively:
def is_match(s,p,memo={["",""]=>true})
if p=="" && s!="" then return false end
if s=="" && p!="" then return p.scan(/.(.)/).uniq==[['*']] && p.length.even? end
if memo[[s,p]]!=nil then return memo[[s,p]] end
ch, exp, prev=s[-1],p[-1], p.length<2 ? 0 : p[-2]
a=(exp=='*' && (
([ch,'.'].include?(prev) && is_match(s[0...-1], p, memo) ||
is_match(s, p[0...-2], memo))))
b=([ch,'.'].include?(exp) && is_match(s[0...-1], p[0...-1], memo))
memo[[s,p]]=(a || b)
end
# 92 ms on Leetcode
In each case:
The operative starting point in the string and pattern is at the second character looking for * and matches one character back for as long as s matches the character in p prior to the *
The meta character . is being used as a fill in for the actual character. This allows any character in s to match . in p

You can solve this with backtracking too, not very efficiently (because the match of the same substrings may be recalculated many times, which could be improved by introducing a lookup table where all non-matching pairs of strings are saved and the calculation only happens when they cannot be found in the lookup table), but seems to work (js, the algorithm assumes the simple regex are valid, which means not beginning with * and no two adjacent * [try it yourself]):
function canBeEmpty(s) {
if (s.length % 2 == 1)
return false;
for (let i = 1; i < s.length; i += 2)
if (s[i] != "*")
return false;
return true;
}
function match(a, b) {
if (a.length == 0 || b.length == 0)
return canBeEmpty(a) && canBeEmpty(b);
let x = 0, y = 0;
// process characters up to the next star
while ((x + 1 == a.length || a[x + 1] != "*") &&
(y + 1 == b.length || b[y + 1] != "*")) {
if (a[x] != b[y] && a[x] != "." && b[y] != ".")
return false;
x++; y++;
if (x == a.length || y == b.length)
return canBeEmpty(a.substr(x)) && canBeEmpty(b.substr(y));
}
if (x + 1 < a.length && y + 1 < b.length && a[x + 1] == "*" && b[y + 1] == "*")
// star coming in both strings
return match(a.substr(x + 2), b.substr(y)) || // try skip in a
match(a.substr(x), b.substr(y + 2)); // try skip in b
else if (x + 1 < a.length && a[x + 1] == "*") // star coming in a, but not in b
return match(a.substr(x + 2), b.substr(y)) || // try skip * in a
((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching
match(a.substr(x), b.substr(y + 1))); // try skip char in b
else // star coming in b, but not in a
return match(a.substr(x), b.substr(y + 2)) || // try skip * in b
((a[x] == "." || b[y] == "." || a[x] == b[y]) && // if chars matching
match(a.substr(x + 1), b.substr(y))); // try skip char in a
}
For a little optimization you could normalize the strings first:
function normalize(s) {
while (/([^*])\*\1([^*]|$)/.test(s) || /([^*])\*\1\*/.test(s)) {
s = s.replace(/([^*])\*\1([^*]|$)/, "$1$1*$2"); // move stars right
s = s.replace(/([^*])\*\1\*/, "$1*"); // reduce
}
return s;
}
// example: normalize("aa*aa*aa*bb*b*cc*cd*dd") => "aaaa*bb*ccc*ddd*"
There is a further reduction of the input possible: x*.* and .*x* can both be replaced by .*, so to get the maximal reduction you would have to try to move as many stars as possible next to .* (so moving some stars to the left can be better than moving all to the right).

IIUC, you are asking: "Can a regex pattern match another regex pattern?"
Yes, it can. Specifically, . matches "any character" which of course includes . and *. So if you have a string like this:
bdbaa.*
How could you match it? Well, you could match it like this:
bdbaa..
Or like this:
b.*
Or like:
.*ba*.*

Related

Time complexity for Regular Expression Matching

I was solving the problem Regular Expression Matching on leetcode. I solved it using recursion as below:
if (p.empty()) return s.empty();
if ('*' == p[1])
// x* matches empty string or at least one character: x* -> xx*
// *s is to ensure s is non-empty
return (isMatch(s, p.substr(2)) || !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p));
else
return !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p.substr(1));
But in this how can I find the time complexity of the code?
Problem link: https://leetcode.com/problems/regular-expression-matching/
PS: In solution they have explained the time complexity but I could not understand that.
Assume T(t,p) is the time complexity of function isMatch(text, pattern) where t is text.length() and p is pattern.length() / 2
First T(x,0) = 1 for all x
Then if pattern[1] == '*', T(t,p) = T(t,p-1) + T(t-1,p) + O(t + p)
Otherwise T(t,p) = T(t-1, p-0.5) + O(t + p)
Obviously the first case is worse
Think about the Combination Meaning of T.
Originally the is a ball you on coordinate (t,p), in one step you can move it to (t-1,p) or (t,p-1), costing t+p.
The ball stop on axis.
Then T(t,p) equals to the total cost of each valid way to move the ball to axis starting from (t,p).
Then we know
So the total time complexity is O((t+p)2^(t + p/2))
BTW your code will run faster if you use something like std::string_view instead of .substr(), which prevents copying the whole string.

Recursively insert "22" between two repeated characters

This is the homework spec
// Returns a string where the same characters next each other in
// string n are separated by "22"
//
// Pseudocode Example:
// doubleDouble("goodbye") => "go22odbye"
// doubleDouble("yyuu") => "y22yu22u"
// doubleDouble("aaaa") => "a22a22a22a"
//
string doubleDouble(string n)
{
return ""; // This is not always correct.
}
This is part of a homework set that deals with recursion. I know how to use recursion with an int function but I'm not entirely sure how to approach this problem when a string is passed through. Is it as simple as n.length == n.length() +1 ? and then simply insert "22" ? Alongside this, how would one traverse the string? Thanks!
I would say that the base case be if n turns out to be just a blank space, or if it has a size 0, then simply return the string back, no changes made.
This seems OK to me (untested code)
string doubleDouble(string n)
{
if (n.size() < 2)
return n;
else if (n[0] == n[1])
return n[0] + ("22" + doubleDouble(n.substr(1)));
else
return n[0] + doubleDouble(n.substr(1));
}
As you can see you recurse on substrings of the original string.
Undoubtedly there are more efficient ways to do this (that minimise the string copying) but I'll leave that for you.
PS you need to enable C++11 to get all the required versions of + for strings.
string doubleDouble(string n)
{
if(n.length() == 2){
if(n[0]==n[1])
return n.insert(1,"22");
else
return n;
}
else if(n.length() == 1)
return n;
else
if(n.length() % 2 == 1 || n[n.length()/2]!=n[n.length()/2+1])
return doubleDouble(n.substr(0,n.length() / 2)) + doubleDouble(n.substr((n.length() / 2),n.length()));
else if(n[n.length()/2]==n[n.length()/2+1])
return doubleDouble(n.substr(0,n.length() / 2)) +"22"+ doubleDouble(n.substr((n.length() / 2),n.length()));
}
My solution uses 'Divide-and-conquer' paradigm to separate the string in 2 halfs again and again until it finds 1 or 2 characters only. The solution above is simpler. I tried an academic style.

How can I use a variable in another input statement?

I am asking the user to input an expression which will be evaluated in postfix notation. The beginning of the expression is the variable name where the answer of the evaluated expression will be stored. Ex: A 4 5 * 6 + 2 * 1 – 6 / 4 2 + 3 * * = where A is the variable name and the equal sign means the answer to the expression will be stored in the variable A. The OUT A statement means that the number stored in the variable A will be printed out.
What I need help with is that when I input the second expression, I do not get the right answer. For example, my first expression A 4 5 * 6 + 2 * 1 – 6 / 4 2 + 3 * * = will evaluate to 153 and then when I input my second expression B A 10 * 35.50 + =, it has to evaluate to 1565.5, but it doesn't. It evaluates to 35.5. I cannot figure out why I am getting the wrong answer. Also, I need help with the OUT statement.
else if (isalpha(expr1[i]))
{
stackIt.push(mapVars1[expr1[i]]);
}
Will place the variable, or zero if the variable has not been set, onto the stack.
else if (isalpha(expr1[i]))
{
map<char, double>::iterator found = mapVars1.find(expr1[i]);
if (found != mapVars1.end())
{
stackIt.push(found->second);
}
else
{
// error message and exit loop
}
}
Is probably better.
Other suggestions:
Compilers are pretty sharp these days, but you may get a bit out of char cur = expr1[i]; and then using cur (or suitably descriptive variable name) in place of the remaining expr1[i]s in the loop.
Consider using isdigit instead of expr1[i] >= '0' && expr1[i] <= '9'
Test your code for expressions with multiple spaces in a row or a space after an operator. It looks like you will re-add the last number you parsed.
Test for input like 123a456. You might not like the result.
If spaces after each token in the expression are specified in the expression protocol, placing your input string into a stringstream will allow you to remove a great deal of your parsing code.
stringstream in(expr1);
string token;
while (in >> token)
{
if (token == "+" || token == "-'" || ...)
{
// operator code
}
else if (token == "=")
{
// equals code
}
else if (mapVars1.find(token) != mapVars1.end())
{
// push variable
}
else if (token.length() > 0)
{
char * endp;
double val = strtod(token.c_str(), &endp);
if (*endp == '\0')
{
// push val
}
}
}
To use previous symbol names in subsequent expressions add this to the if statements in your parsing loop:
else if (expr1[i] >= 'A' && expr1[i] <= 'Z')
{
stackIt.push(mapVars1[expr[i]]);
}
Also you need to pass mapVars by reference to accumulate its contents across Eval calls:
void Eval(string expr1, map<char, double> & mapVars1)
For the output (or any) other command I would recommend parsing the command token that's at the front of the string first. Then call different evaluators based on the command string. You are trying to check for OUT right now after you have already tried to evaluate the string as an arithmetic assignment command. You need to make that choice first.

Need Help Understanding Recursive Prefix Evaluator

This is a piece of code I found in my textbook for using recursion to evaluate prefix expressions. I'm having trouble understanding this code and the process in which it goes through.
char *a; int i;
int eval()
{ int x = 0;
while (a[i] == ' ') i++;
if (a[i] == '+')
{ i++; return eval() + eval(); }
if (a[i] == '*')
{ i++; return eval() * eval(); }
while ((a[i] >= '0') && (a[i] <= '9'))
x = 10*x + (a[i++] - '0');
return x;
}
I guess I'm confused primarily with the return statements and how it eventually leads to solving a prefix expression. Thanks in advance!
The best way to understand recursive examples is to work through an example :
char* a = "+11 4"
first off, i is initialized to 0 because there is no default initializer. i is also global, so updates to it will affect all calls of eval().
i = 0, a[i] = '+'
there are no leading spaces, so the first while loop condition fails. The first if statement succeeds, i is incremented to 1 and eval() + eval() is executed. We'll evaluate these one at a time, and then come back after we have our results.
i = 1, a[1] = '1'
Again, no leading spaces, so the first while loop fails. The first and second if statements fail. In the last while loop, '1' is between 0 and 9(based on ascii value), so x becomes 0 + a[1] - '0', or 0 + 1 = 1. Important here is that i is incremented after a[i] is read, then i is incremented. The next iteration of the while loop adds to x. Here x = 10 * 1 + a[2] - '0', or 10 + 1 = 11. With the correct value of x, we can exit eval() and return the result of the first operand, again here 11.
i = 2, a[2] = '4'
As in the previous step, the only statement executed in this call of eval() is the last while loop. x = 0 + a[2] - '0', or 0 + 4 = 4. So we return 4.
At this point the control flow returns back to the original call to eval(), and now we have both values for the operands. We simply perform the addition to get 11 + 4 = 15, then return the result.
Every time eval() is called, it computes the value of the immediate next expression starting at position i, and returns that value.
Within eval:
The first while loop is just to ignore all the spaces.
Then there are 3 cases:
(a) Evaluate expressions starting with a + (i.e. An expression of the form A+B which is "+ A B" in prefix
(b) Evaluate expressions starting with a * (i.e. A*B = "* A B")
(c) Evaluate integer values (i.e. Any consecutive sequence of digits)
The while loop at the end takes care of case (c).
The code for case (a) is similar to that for case (b). Think about case (a):
If we encounter a + sign, it means we need to add the next two "things" we find in the sequence. The "things" might be numbers, or may themselves be expressions to be evaluated (such as X+Y or X*Y).
In order to get what these "things" are, the function eval() is called with an updated value of i. Each call to eval() will fetch the value of the immediate next expression, and update position i.
Thus, 2 successive calls to eval() obtain the values of the 2 following expressions.
We then apply the + operator to the 2 values, and return the result.
It will help to work through an example such as "+ * 2 3 * 4 5", which is prefix notation for (2*3)+(4*5).
So this piece of code can only eat +, *, spaces and numbers. It is supposed to eat one command which can be one of:
- + <op1> <op2>
- * <op1> <op2>
<number>
It gets a pointer to a string, and a reading position which is incremented as the program goes along that string.
char *a; int i;
int eval()
{ int x = 0;
while (a[i] == ' ') i++; // it eats all spaces
if (a[i] == '+')
/* if the program encounters '+', two operands are expected next.
The reading position i already points just before the place
from which you have to start reading the next operand
(which is what first eval() call will do).
After the first eval() is finished,
the reading position is moved to the begin of the second operand,
which will be read during the second eval() call. */
{ i++; return eval() + eval(); }
if (a[i] == '*') // exactly the same, but for '*' operation.
{ i++; return eval() * eval(); }
while ((a[i] >= '0') && (a[i] <= '9')) // here it eats all digit until something else is encountered.
x = 10*x + (a[i++] - '0'); // every time the new digit is read, it multiplies the previously obtained number by 10 and adds the new digit.
return x;
// base case: returning the number. Note that the reading position already moved past it.
}
The example you are given uses a couple of global variables. They persist outside of the function's scope and must be initialized before calling the function.
i should be initialized to 0 so that you start at the beginning of the string, and the prefix expression is the string in a.
the operator is your prefix and so should be your first non-blank character, if you start with a number (string of numbers) you are done, that is the result.
example: a = " + 15 450"
eval() finds '+' at i = 1
calls eval()
which finds '1' at i = 3 and then '5'
calculates x = 1 x 10 + 5
returns 15
calls eval()
which finds '4' at i = 6 and then '5' and then '0'
calclulates x = ((4 x 10) + 5) x 10) + 0
returns 450
calculates the '+' operator of 15 and 450
returns 465
The returns are either a value found or the result of an operator and the succeeding results found. So recursively, the function successively looks through the input string and performs the operations until either the string ends or an invalid character is found.
Rather than breaking up code into chunks and so on, i'll try and just explain the concept it as simple as possible.
The eval function always skips spaces so that it points to either a number character ('0'->'9'), an addition ('+') or a multiply ('*') at the current place in the expression string.
If it encounters a number, it proceeds to continue to eat the number digits, until it reaches a non-number digit returning the total result in integer format.
If it encounters operator ('+' and '*') it requires two integers, so eval calls itself twice to get the next two numbers from the expression string and returns that result as an integer.
One hair in the soup may be evaluation order, cf. https://www.securecoding.cert.org/confluence/display/seccode/EXP10-C.+Do+not+depend+on+the+order+of+evaluation+of+subexpressions+or+the+order+in+which+side+effects+take+place.
It is not specified which eval in "eval() + eval()" is, well, evaluated first. That's ok for commutative operators but will fail for - or /, because eval() as a side effect advances the global position counter so that the (in time) second eval gets the (in space) second expression. But that may well be the (in space) first eval.
I think the fix is easy; assign to a temp and compute with that:
if (a[i] == '-')
{ i++; int tmp = eval(); return tmp - eval(); }

Implementing Longest Common Substring using Suffix Array

I am using this program for computing the suffix array and the Longest Common Prefix.
I am required to calculate the longest common substring between two strings.
For that, I concatenate strings, A#B and then use this algorithm.
I have Suffix Array sa[] and the LCP[] array.
The the longest common substring is the max value of LCP[] array.
In order to find the substring, the only condition is that among substrings of common lengths, the one occurring the first time in string B should be the answer.
For that, I maintain max of the LCP[]. If LCP[curr_index] == max, then I make sure that the left_index of the substring B is smaller than the previous value of left_index.
However, this approach is not giving a right answer. Where is the fault?
max=-1;
for(int i=1;i<strlen(S)-1;++i)
{
//checking that sa[i+1] occurs after s[i] or not
if(lcp[i] >= max && sa[i] < l1 && sa[i+1] >= l1+1 )
{
if( max == lcp[i] && sa[i+1] < left_index ) left_index=sa[i+1];
else if (lcp[i] > ma )
{
left_index=sa[i+1];
max=lcp[i];
}
}
//checking that sa[i+1] occurs after s[i] or not
else if (lcp[i] >= max && sa[i] >= l1+1 && sa[i+1] < l1 )
{
if( max == lcp[i] && sa[i] < left_index) left_index=sa[i];
else if (lcp[i]>ma)
{
left_index=sa[i];
max=lcp[i];
}
}
}
AFAIK, This problem is from a programming contest and discussing about programming problems of ongoing contest before editorials have been released shouldn't be .... Although I am giving you some insights as I got Wrong Answer with suffix array. Then I used suffix Automaton which gives me Accepted.
Suffix array works in O(nlog^2 n) whereas Suffix Automaton works in O(n). So my advice is go with suffix Automaton and you will surely get Accepted.
And if you can code solution for that problem, you will surely code this.
Also found in codchef forum that:
Try this case
babaazzzzyy
badyybac
The suffix array will contain baa... (From 1st string ) , baba.. ( from first string ) , bac ( from second string ) , bad from second string .
So if you are examining consecutive entries of SA then you will find a match at "baba" and "bac" and find the index of "ba" as 7 in second string , even though its actually at index 1 also .
Its likely that you may output "yy" instead of "ba"
And also handling the constraint ...the first longest common substring to be found on the second string, should be written to output... would be very easy in case of suffix automaton. Best of luck!