^\s*[)]*\s*$ and ^\s*[(]*\s*$ match the parentheses ( and ) marked below. That is, what I am trying to do is match only the parentheses that stand alone on a line, and ignore those that wrap a condition, such as (condition1):
while
( #matches here
( #matches here
(condition1) && (condition2) &&
condition3
) ||
(#matches here
(condition4) ||
condition5 &&
(condition6)
) #matches here
) #matches here
But if the code looks like this, it does not match:
while
(( #does not match here
(condition1) && (condition2) &&
condition3
) ||
(
(condition4) ||
condition5 &&
(condition6)
) ) #does not match here
or
while
((( #does not match here
(condition1) && (condition2) &&
condition3
)) ||
(( #does not match here
(condition4) ||
condition5 &&
(condition6)
) ) ) #does not match here
How can I match all the parentheses that are single?
I'd personally recommend that you use a simple stack to figure out open and closing brackets rather than trip over regular expressions.
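A minimal sketch of the stack idea, assuming the input contains no parentheses inside comments or string literals. The helper name multiline_paren_pairs is hypothetical, not something from the question. It pushes the line number of every "(" and pops on every ")", and keeps only the pairs whose two halves sit on different lines, which are the "single" parentheses the question is after, however many of them share a line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [open_line, close_line] pairs (1-based)
# for every balanced pair of parentheses that spans more than one line.
# Assumption: no parentheses appear inside comments or strings.
sub multiline_paren_pairs {
    my ($text) = @_;
    my @stack;    # line numbers of '(' still waiting for their ')'
    my @pairs;
    my $line = 1;
    for my $ch (split //, $text) {
        if ($ch eq "\n") {
            $line++;
        }
        elsif ($ch eq '(') {
            push @stack, $line;
        }
        elsif ($ch eq ')') {
            die "Unbalanced ')' on line $line\n" unless @stack;
            my $open = pop @stack;
            # Pairs such as (condition1) open and close on one line: skip them.
            push @pairs, [$open, $line] if $open != $line;
        }
    }
    die "Unclosed '(' opened on line $stack[-1]\n" if @stack;
    return @pairs;
}

my $code = <<'END_CODE';
while
((
(condition1) && (condition2) &&
condition3
) ||
(
(condition4) ||
condition5 &&
(condition6)
) )
END_CODE

printf "opens on line %d, closes on line %d\n", @$_
    for multiline_paren_pairs($code);
```

For the sample above this reports the pairs (2, 5), (6, 10) and (2, 10), even though lines 2 and 10 each hold two parentheses.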
I have one query. I have to match 2 strings in one if condition:
$release = 5.x (Here x should be greater than or equal to 3)
$version = Rx (this x should be greater than or equal to 5 if $release is 5.3, otherwise anything is acceptable)
e.g. 5.1R11 is not acceptable, 5.3R4 is not, 5.3R5 is acceptable, and 5.4 R1 is acceptable.
I have written code like this:
$release = "5.2";
$version = "R4";
if ( $release =~ /5.(?>=3)(\d)/ && $version =~ m/R(?>=5)(\d)/ )
{
print "OK";
}
How can I write this?
This is really a three-level version string, and I suggest that you use Perl's version facility:
use strict;
use warnings 'all';
use feature 'say';
use version;
my $release = '5.2';
my $version = 'R4';
if ( $version =~ /R(\d+)/ && version->parse("$release.$1") ge v5.3.5 ) {
say 'OK';
}
In a regex, (?>...) is an atomic group, not a numeric comparison.
Capture the digit so that it is stored in $1, then compare $1 with the number. So it should be:
if (( ($release =~ /5\.(\d)/) && ($1 > 3) ) && (($version =~ m/R(\d)/) && ($1 >= 3) ) )
{
print "OK\n";
}
I got the correct one after modifying mkhun's solution:
if ((($release =~ /5\.3/)) && (($version =~ m/R(\d+)(\.\d+)?/) && ($1 >= 5))
|| ((($release =~ /5\.(\d)/) && ($1 > 3)) && ($version =~ m/R(\d+)(\.\d+)?/)) )
{
print "OK\n";
}
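As a cross-check, the rule from the question can also be written as a small function with plain numeric comparisons. This is only a sketch: the name release_ok is hypothetical, and it assumes the release is always of the form 5.x and the version of the form Rx with nothing after the digits.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the rule as stated in the question:
# release 5.x needs x >= 3, and release 5.3 additionally needs R >= 5.
sub release_ok {
    my ($release, $version) = @_;
    my ($minor) = $release =~ /^5\.(\d+)$/ or return 0;
    my ($build) = $version =~ /^R(\d+)$/   or return 0;
    return 0 if $minor < 3;
    return 0 if $minor == 3 && $build < 5;
    return 1;
}

# The four examples given in the question.
for my $case (['5.1', 'R11'], ['5.3', 'R4'], ['5.3', 'R5'], ['5.4', 'R1']) {
    my ($release, $version) = @$case;
    printf "%s%s: %s\n", $release, $version,
        release_ok($release, $version) ? 'OK' : 'not acceptable';
}
```

Anchoring both patterns with ^ and $ and escaping the dot keeps a release such as 5.31 from being mistaken for 5.3.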
I want to validate a string that contains an expression like
isFun && ( isHelpful || isUseful )
These expressions can contain operands, binary operators and unary operators:
my $unary_operator = qr/(?:\+|\-|!|not)/;
my $binary_operator = qr/(?:<|>|<=|>=|==|\!\=|<=>|\~\~|\&|\||\^|\&\&|\|\||lt|gt|le|ge|and|or|xor|eq|ne|cmp)/i;
my $operand = qr/[a-z0-9_]+/i;
They are similar to what you know from Perl's conditions (for example, inside if statements). The brackets have to be balanced, and unary operators can only be used once in a row.
I would like to find a Perl-compatible regular expression that makes sure the string is a valid logical/mathematical expression in which only the given operators are used and the operands match the regex given by $operand. With recursion, this should be possible in Perl. The expression is in infix notation.
My current solution is to parse a tree and perform some iterations, but I want to compress this algorithm into a single regular expression.
For my first try (I still excluded all verbose operands), I used
my $re =
qr{
( # 1
( # 2 operands ...
$operand
)
|
( # 3 unary operators with recursion for operand
(?:$unary_operator(?!\s*$unary_operator)\s*(?1))
)
|
( # 4 balance brackets
\(
\s*(?1)\s*
\)
)
|
( # 5 binary operators with recursion for each operand
(?1)\s*$binary_operator\s*(?1)
)
)
}x;
...which ends up in infinite recursion. I think the recursion is caused by the first (?1) in group 5, which recurses into group 1 before consuming any input (left recursion).
Is somebody out there with a working solution?
The following should work for validating your expressions:
use strict;
use warnings;
my $unary_operator = qr/(?:\+|\-|!|not)/;
my $binary_operator = qr/(?:<|>|<=|>=|==|\!\=|<=>|\~\~|\&|\||\^|\&\&|\|\||lt|gt|le|ge|and|or|xor|eq|ne|cmp)/i;
my $operand = qr/[a-z0-9_]+/i;
my $re =
qr{^
( # 1
\s*
(?> (?:$unary_operator \s*)* )
(?:
\b$operand\b
|
\( (?1) \)
)
(?:
\s*
$binary_operator
\s*
(?1)
)*
\s*
)
$}x;
while (<DATA>) {
chomp;
my ($expr, $status) = split "#", $_, 2;
if ($expr =~ $re) {
print "good $_\n";
} else {
print "bad $_\n";
}
}
__DATA__
isFun # Good
isFun && isHelpful # Good
isFun && isHelpful || isUseful # Good
isFun && ( isHelpful || isUseful ) # Good
isFun && ( isHelpful || (isUseful) ) # Good
isFun && ( isHelpful || (isUseful ) # Fail - Missing )
not (isFun && (isHelpful || (isBad && isDangerous))) # Good
isGenuine!isReal # Fail
!isGenuine isReal # Fail
Outputs:
good isFun # Good
good isFun && isHelpful # Good
good isFun && isHelpful || isUseful # Good
good isFun && ( isHelpful || isUseful ) # Good
good isFun && ( isHelpful || (isUseful) ) # Good
bad isFun && ( isHelpful || (isUseful ) # Fail - Missing )
good not (isFun && (isHelpful || (isBad && isDangerous))) # Good
bad isGenuine!isReal # Fail
bad !isGenuine isReal # Fail
What's the regex to get all matches of:
IF(.....);
I need to get the start and the end of the string above. The content can itself contain ( and ), and it can contain further nested IFs (... IF (...) ....).
I need ONLY the content inside IF.
Any ideas?
The reason is that I need to take an Excel formula (an IF condition) and transform it into another language (JavaScript).
EDIT:
I tried
`/IF\s*(\(\s*.+?\s*\))/i or /IF(\(.+?\))/`
This doesn't work, because it only matches if there is no ) or ( inside 'IF(...)'.
I suspect you have a problem that is not suitable for regex matching. You want to do unbounded counting (so you can match opening and closing parentheses), and this is more than a classical regular expression can handle. Hand-rolling a parser to do the matching you want shouldn't be hard, though.
Essentially (pseudo-code):
Find "IF"
Ensure next character is "("
Initialise counter parendepth to 1
While parendepth > 0:
    place next character in ch
    if ch == "(":
        parendepth += 1
    if ch == ")":
        parendepth -= 1
Add in small amounts of "remember start" and "remember end" and you should be all set.
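The pseudo-code above could be sketched in Perl roughly as follows, with the "remember start" and "remember end" parts filled in. The function name if_contents is hypothetical, and the sketch assumes that no parentheses occur inside string literals in the formula. It returns the content of each outermost IF(...). To reach nested IFs, call it again on the returned content.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between the parentheses of every
# outermost IF(...) in $formula, found by counting parenthesis depth.
# Assumption: no '(' or ')' inside quoted strings.
sub if_contents {
    my ($formula) = @_;
    my @found;
    while ($formula =~ /\bIF\s*\(/gi) {
        my $start = pos($formula);    # index just after "IF("
        my $depth = 1;
        my $i     = $start;
        while ($depth > 0 && $i < length($formula)) {
            my $ch = substr($formula, $i++, 1);
            $depth++ if $ch eq '(';
            $depth-- if $ch eq ')';
        }
        last if $depth > 0;           # ran out of input: unbalanced
        # $i now points just past the matching ')'.
        push @found, substr($formula, $start, $i - $start - 1);
        pos($formula) = $i;           # resume the search after this IF(...)
    }
    return @found;
}

print "$_\n" for if_contents('=IF(A1>0,SUM(B1:B3),IF(C1="x",1,2))+1');
```

The \b in front of IF keeps function names such as COUNTIF from being treated as an IF.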
This is one way to do it in Perl. Any regex flavor that supports recursion
should have this capability.
In this example, the fact that the correct parentheses are annotated
(see the output) and balanced means it is possible to store the data
in a structured way.
This in no way validates anything; it's just a quick solution.
use strict;
use warnings;
##
$/ = undef;
my $str = <DATA>;
my ($lvl, $keyword) = ( 0, '(?:IF|ELSIF)' ); # One or more keywords
# (using 2 in this example)
my $kwrx = qr/
(\b $keyword \s*) #1 - keyword capture group
( #2 - recursion group
\( # literal '('
( #3 - content capture group
(?:
(?> [^()]+ ) # any non parenth char
| (?2) # or, recurse group 2
)*
)
\) # literal ')'
)
| ( (?:(?!\b $keyword \s*).)+ ) #4
| ($keyword) #5
/sx;
##
print "\n$str\n- - -\n";
findKeywords ( $str );
exit 0;
##
sub findKeywords
{
my ($str) = @_;
while ($str =~ /$kwrx/g)
{
# Process keyword(s), recurse its contents
if (defined $2) {
print "${1}[";
$lvl++;
findKeywords ( $3 );
}
# Process non-keyword text
elsif (defined $4) {
print "$4";
}
elsif (defined $5) {
print "$5";
}
}
if ($lvl > 0) {
print ']';
$lvl--;
}
}
__DATA__
IF( some junk IF (inner meter(s)) )
THEN {
IF ( its in
here
( IF (a=5)
ELSIF
( b=5
and IF( a=4 or
IF(its Monday) and there are
IF( ('lots') IF( ('of') IF( ('these') ) ) )
)
)
)
then its ok
)
ELSIF ( or here() )
ELSE (or nothing)
}
Output:
IF( some junk IF (inner meter(s)) )
THEN {
IF ( its in
here
( IF (a=5)
ELSIF
( b=5
and IF( a=4 or
IF(its Monday) and there are
IF( ('lots') IF( ('of') IF( ('these') ) ) )
)
)
)
then its ok
)
ELSIF ( or here() )
ELSE (or nothing)
}
- - -
IF[ some junk IF [inner meter(s)] ]
THEN {
IF [ its in
here
( IF [a=5]
ELSIF
[ b=5
and IF[ a=4 or
IF[its Monday] and there are
IF[ ('lots') IF[ ('of') IF[ ('these') ] ] ]
]
]
)
then its ok
]
ELSIF [ or here() ]
ELSE (or nothing)
}
Expanding on Paolo's answer, you might also need to worry about spaces and case:
/IF\s*(\(\s*.+?\s*\))/i
This should work and capture all the text between parentheses, including both parentheses, as the first match:
/IF(\(.+?\))/
Please note that it won't match IF() (empty parentheses): if you want to match empty parentheses too, you can replace the + (match one or more) with an * (match zero or more):
/IF(\(.*?\))/
--- EDIT
If you need to match formulas with parentheses (besides the outermost ones) you can use
/IF(\(.*\))/
which makes the quantifier greedy by removing the ?. This way it will match the longest string possible. Sorry, I wrongly assumed that you did not have any sub-parentheses.
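To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns on a made-up formula (the formula itself is an assumption, chosen so that it contains a nested call and a second IF). For comparison it also runs a recursive pattern, which needs Perl 5.10 or later and is not part of the answer above.

```perl
use strict;
use warnings;

# Made-up sample: one IF with a nested SUM(...), followed by a second IF.
my $formula = 'IF(SUM(A1:A3)>0,"yes","no") + IF(B1,1,2)';

# Lazy quantifier: stops at the first ')' it can, inside the nested call.
my ($lazy) = $formula =~ /IF(\(.+?\))/;

# Greedy quantifier: runs on to the last ')' in the string.
my ($greedy) = $formula =~ /IF(\(.*\))/;

# Recursive pattern: (?1) re-enters group 1 for each nested "(...)".
my ($balanced) = $formula =~ /\bIF(\((?:[^()]++|(?1))*\))/;

print "lazy:     $lazy\n";        # (SUM(A1:A3)
print "greedy:   $greedy\n";      # (SUM(A1:A3)>0,"yes","no") + IF(B1,1,2)
print "balanced: $balanced\n";    # (SUM(A1:A3)>0,"yes","no")
```

The lazy form cuts the match short and the greedy form swallows the second IF, so neither is reliable once the formula has nested parentheses or more than one IF.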
It's not possible using only regular expressions. If you are using, or can use, .NET, you should look into Balanced Matching.
I want to do something like this
if(($Fifo[5]=~/T0int(\S+)/)&&($Fifo[6]=~/T0int(\S+)/)&&($1 ne $2))
{
<Do something>
}
How can I reference the matches from the two regexps?
By $1 I meant match evaluated in the first regexp and $2 in the next.
my($first) = $Fifo[5] =~ /T0int(\S+)/;
my($second) = $Fifo[6] =~ /T0int(\S+)/;
if (defined($first) && defined($second) && $first ne $second) { ⋯ }
or more cavalierly:
if (($Fifo[5] =~ /T0int(\S+)/)[0] ne ($Fifo[6] =~ /T0int(\S+)/)[0]) { ⋯ }
or even more cavalierly still:
if ( (my($first, $second) = "@Fifo[5,6]" =~ /T0int(\S+)/g )
&& $first && $second
&& $first ne $second)
{
⋯
}
Try something like this:
if( ($Fifo[5] =~ (/T0int(\S+)/)) && ($Fifo[6] =~ (/T0int(\S+)/)) && ($1 ne $2) )
Basically, put parentheses around each regex to group them as $1 and $2.
Perhaps regex is not the best way to parse this; tell me if it is not. Anyway, here are some examples of what the syntax tree looks like:
(S (CC and))
(SBARTMP (IN once) (NP otherstuff))
(S (S (NP blah (VP blah)) (CC then) (NP blah (VP blah (PP blah))) ))
What I am trying to do is pull out the connective (and, then, once, etc.) and its corresponding head (CC, IN, CC), which I already know for each syntax tree, so it can act as an anchor. I also need to retrieve its parent (S in the first tree, SBARTMP in the second, and S in the third) and its siblings, if there are any (none in the first, a left-hand-side sibling in the second, and a left-hand-side and a right-hand-side sibling in the third). Anything higher than the parent is not included.
my $pos = "(\\\w|-)*";
my $sibling = qr{\s*(\\((?:(?>[^()]+)|(?1))*\\))\s*};
my $connective = "once";
my $re = qr{(\(\w*\s*$sibling*\s*\\(IN\s$connective\\)\s*$sibling*\s*\))};
This code works for things like:
my $test1 = "(X (SBAR-TMP (IN once) (S sdf) (S sdf)))";
my $test2 = "(X (SBAR-TMP (IN once))";
my $test3 = "(X (SBAR-TMP (IN once) (X as))";
my $test4 = "(X (SBAR-TMP (X adsf) (IN once))";
It will throw away the X on top and keep everything else. However, once the siblings have stuff embedded in them, it does not match, because the regex does not go deeper:
my $test = "(X (SBAR-TMP (IN once) (MORE stuff (MORE stuff))))";
I am not sure how to account for this. I am kind of new to the extended patterns for Perl, just started learning it. To clarify a bit about what the regex is doing: it looks for the connective within two parentheses and the capital-letter/- combo, looks for a complete parent of the same format closing with two parentheses and then should look for any number of siblings that have all their parentheses paired off.
To only get the nearest 'parent' to your anchor connective you can
do it as a recursive parent with a FAIL or do it directly.
(for some reason I can't edit my other posts, must be cookies being deleted).
use strict;
use warnings;
my $connective = qr/ \((?:IN|CC)\s(?:once|and|then)\)/x;
my $sibling = qr/
\s*
(
(?! $connective )
\(
(?:
(?> (?: [^()]+ ) )
| (?-1)
)*
\)
)
\s*
/x;
my $regex1 = qr/
\( ( [\w-]+ \s* $sibling* \s* $connective \s* $sibling* ) \) #1
/x;
my $regex2 = qr/
( #1
\( \s*
( #2
[\w-]+ \s*
(?> $sibling* \s* $connective (?(R)(*FAIL)) \s* $sibling*
| (?1)
)
)
\s*
\)
)
/x;
my $sample = qq/
(X (SBAR-TMP (IN once) (S sdf) (S sdf)))
(X (SBAR-TMP (IN once))
(X (SBAR-TMP (IN once) (X as))
(X (SBAR-TMP (X adsf) (IN once))
(X (SBAR-TMP (IN once) (MORE stuff (MORE stuff))))
(S (CC and))
(SBARTMP (IN once) (NP otherstuff))
(S (S (NP blah (VP blah)) (CC then) (NP blah (VP blah (PP blah))) ))
/;
while ($sample =~ /$regex1/xg) {
print "Found: $1\n";
}
print '-' x 20, "\n";
while ($sample =~ /$regex2/xg) {
print "Found: $2\n";
}
__END__
Why did you give up on this? You almost had it. Try this:
use strict;
use warnings;
my $connective = qr/(?: \((?:IN|CC)\s(?:once|and|then)\) )/x;
my $sibling = qr/
\s*
(
(?!$connective)
\(
(?:
(?> (?: [^()]+ ) )
| (?-1)
)*
\)
)
\s*
/x;
my $regex = qr/
( #1
\(
\s* [\w-]+ \s*
(?> $sibling* \s* $connective \s* $sibling*
| (?1)
)
\s*
\)
)
/x;
my @tests = (
'(X (SBAR-TMP (IN once) (S sdf) (S sdf)))',
'(X (SBAR-TMP (IN once))',
'(X (SBAR-TMP (IN once) (X as))',
'(X (SBAR-TMP (X adsf) (IN once))',
);
for my $sample (@tests)
{
while ($sample =~ /$regex/xg) {
print "Found: $1\n";
}
}
my $another =<<EOS;
(S (CC and))
(SBARTMP (IN once) (NP otherstuff))
(S
(S
(NP blah
(VP blah)
)
(CC then)
(NP blah
(VP blah
(PP blah)
)
)
)
)
EOS
print "\n---------\n";
while ($another =~ /$regex/xg) {
print "\nFound:\n$1\n";
}
__END__
This should work as well
use strict;
use warnings;
my $connective = qr/(?: \((?:IN|CC)\s(?:once|and|then)\) )/x;
my $sibling = qr/
(?: \s*
(
(?!$connective)
\(
(?:
(?> (?: [^()]+ ) )
| (?-1)
)*
\)
)
\s* )
/x;
my $regex = qr/
( #1
\( \s*
( #2
[\w-]+ \s*
(?> $sibling* \s* $connective (?(R)(*FAIL)) \s* $sibling*
| (?1)
)
)
\s*
\)
)
/x;
my @tests = (
'(X (SBAR-TMP (IN once) (S sdf) (S sdf)))',
'(X (SBAR-TMP (IN once))',
'(X (SBAR-TMP (IN once) (X as))',
'(X (SBAR-TMP (X adsf) (IN once))',
'(X (SBAR-TMP (IN once) (MORE stuff (MORE stuff))))',
);
for my $sample (@tests)
{
while ($sample =~ /$regex/xg) {
print "Found: $2\n";
}
}
my $another = "
(S (CC and))
(SBARTMP (IN once) (NP otherstuff))
(S (S (NP blah (VP blah)) (CC then) (NP blah (VP blah (PP blah))) ))
";
print "\n---------\n";
while ($another =~ /$regex/xg) {
print "\nFound:\n$2\n";
}
__END__