How to find the next unbalanced brace?

The regex below captures everything up to the last balanced }.
Now, what regex would capture everything up to the next unbalanced }? In other words, how can I get ... {three {four}} five} from $str instead of just ... {three {four}}?
my $str = "one two {three {four}} five} six";
if ( $str =~ /
(
.*?
{
(?> [^{}] | (?-1) )+
}
)
/sx
)
{
print "$1\n";
}

So you want to match
[noncurlies [block noncurlies [...]]] "}"
where a block is
"{" [noncurlies [block noncurlies [...]]] "}"
As a grammar:
start : text "}"
text : noncurly* ( block noncurly* )*
block : "{" text "}"
noncurly : /[^{}]/
As a regex (5.10+):
/
^
(
(
[^{}]*
(?:
\{ (?-1) \}
[^{}]*
)*
)
\}
)
/x
The same regex written with named subpatterns (5.10+):
/
^ ( (?&TEXT) \} )
(?(DEFINE)
(?<TEXT> [^{}]* (?: (?&BLOCK) [^{}]* )* )
(?<BLOCK> \{ (?&TEXT) \} )
)
/x
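As a quick check, the named-subpattern version can be run against the string from the question. This is a minimal sketch: the helper name up_to_unbalanced_brace is made up for illustration and is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns everything up to and including the first
# unbalanced "}", or undef if the string contains none.
sub up_to_unbalanced_brace {
    my ($str) = @_;
    return $str =~ /
        ^ ( (?&TEXT) \} )
        (?(DEFINE)
            (?<TEXT>  [^{}]* (?: (?&BLOCK) [^{}]* )* )
            (?<BLOCK> \{ (?&TEXT) \} )
        )
    /x ? $1 : undef;
}

my $str = "one two {three {four}} five} six";
print up_to_unbalanced_brace($str), "\n";   # one two {three {four}} five}
```

A string whose braces are all balanced, such as "a {b} c", produces no match, so the helper returns undef in that case.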

Related

need return value for captured group from last captured string in perl

I have XML files from which I want to capture the init value (the V tag) for each parameter. I am copying part of the XML for reference.
I have the port name and the parameter name (from the PARAMETER-REF tag, e.g. MNO) available.
E.g. the port name is XYZ and the parameter name is MNO,
or the port name is PQR and the parameter names are ABC and GHI.
There can be multiple tags under one container.
<R-PORT-PROTOTYPE UUID="Oac11eff016c6bb667f357a89xOac11f0ad174240e817fa858f00">
<SHORT-NAME>XYZ</SHORT-NAME>
<REQUIRED-COM-SPECS>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Init_Val</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>0.071</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/MNO</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
</REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>
<R-PORT-PROTOTYPE UUID="Oac11eff016c6bb667f357a89xOac11f0ad174240e817f8f55900">
<SHORT-NAME>PQR</SHORT-NAME>
<REQUIRED-COM-SPECS>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Init_0</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>80</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/ABC</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Int_ghi</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>-80</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/GHI</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
</REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>
regex :
if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>Port_name<\/short-name>.*?<V>(.*?)<\/.*?<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">.*?Parameter_name<\/PARAMETER-REF>/gis) {
print $2;
}
I need output 80 if parameter is ABC and -80 if parameter is GHI

I suggest using XML::LibXML.
Here I've combined two XPath queries to find the V nodes where:
SHORT-NAME is XYZ and PARAMETER-REF (with DEST == PARAMETER-DATA-PROTOTYPE) contains MNO.
SHORT-NAME is PQR and PARAMETER-REF (with DEST == PARAMETER-DATA-PROTOTYPE) contains ABC or GHI.
Example:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(location => 'doc.xml');
my $query = q{
//R-PORT-PROTOTYPE/SHORT-NAME[text()="XYZ"]/..
//PARAMETER-REF[@DEST="PARAMETER-DATA-PROTOTYPE"][
contains(text(),'MNO')
]/..//V
|
//R-PORT-PROTOTYPE/SHORT-NAME[text()="PQR"]/..
//PARAMETER-REF[@DEST="PARAMETER-DATA-PROTOTYPE"][
contains(text(),'ABC') or contains(text(),'GHI')
]/..//V
};
foreach my $vnode ($dom->findnodes($query)) {
print $vnode->to_literal() . "\n";
}
Output:
0.071
80
-80

The two ways to get either or both are:
1 - Linear https://regex101.com/r/NYbvI8/1
# https://regex101.com/r/NYbvI8/1
# if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>PQR<\/short-name>(?:.*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO1<\/PARAMETER-REF>)?(?:.*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO2<\/PARAMETER-REF>)?(?(1)|(?(2)|(?!)))/gis)
<R-PORT-PROTOTYPE .*? <short-name>PQR</short-name>
(?:
.*?
<V>
( .*? ) # (1)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO1</PARAMETER-REF>
)?
(?:
.*?
<V>
( .*? ) # (2)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO2</PARAMETER-REF>
)?
(?(1)
| (?(2)
| (?!)
)
)
2 - Out of order https://regex101.com/r/gQJ3cO/1
# https://regex101.com/r/t4M9UB/1
# if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>PQR<\/short-name>(?:(?:(?(1)(?!)).*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO1<\/PARAMETER-REF>)|(?:(?(2)(?!)).*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO2<\/PARAMETER-REF>)){1,2}/gis)
<R-PORT-PROTOTYPE .*? <short-name>PQR</short-name>
(?:
(?:
(?(1) (?!) )
.*?
<V>
( .*? ) # (1)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO1</PARAMETER-REF>
)
|
(?:
(?(2) (?!) )
.*?
<V>
( .*? ) # (2)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO2</PARAMETER-REF>
)
){1,2}

Rewriting a recursive regex for older Perl version

The following piece of code works just fine with Perl v5.16.2. However, when I run it using Perl v5.8.9, it complains about the following regex. How can I rewrite this regex so that it works with Perl v5.8.9? (I can't update the version.)
REGEX:
use strict;
use warnings;
our %formula_per_k;
INIT {
# List all functions that you want to allow in formulas. All other words will be interpreted as variables.
my @FORMULA_FUNCS = qw(sqrt exp log);
# Load the data via a file.
my $data = do {local $/; <DATA>};
# Parse K blocks
while ($data =~ m{
^K \s+ (\w+) \s* \{
( (?: [^{}]+ | \{(?2)\} )* ) # Matched braces only.
\}
}mgx) {
my ($name, $params) = ($1, $2);
# Parse LOL block
next if $params !~ m{
LOL \s* \{
( (?: [^{}]+ | \{(?1)\} )*? ) # Matched braces only.
\}
}mx;
my $lol = $1;
# Start building anonymous subroutine
my $conditions = '';
# Parse Conditions and Formulas
while ($lol =~ m{
COND \s* \{ (.*?) \} \s*
FORMULA \s* \{ (.*?) \}
}gx) {
my ($cond, $formula) = ($1, $2);
# Remove Excess spacing and translate variable into perl scalar.
for ($cond, $formula) {
s/^\s+|\s+$//g;
s{([a-zA-Z]+)}{
my $var = $1;
$var = "\$hashref->{$var}" if ! grep {$var eq $_} @FORMULA_FUNCS;
$var
}eg;
}
$conditions .= "return $formula if $cond; ";
}
my $code = "sub {my \$hashref = shift; ${conditions} return; }";
my $sub = eval $code;
if ($@) {
die "Invalid formulas in $name: $@";
}
$formula_per_k{$name} = $sub;
}
}
sub formula_per_k {
my ($k, $vars) = @_;
die "Unrecognized K value '$k'" if ! exists $formula_per_k{$k};
return $formula_per_k{$k}($vars);
}
print "'K1', {d => .1} = " . formula_per_k('K1', {d => .1}) . "\n";
print "'K1', {d => .05} = " . formula_per_k('K1', {d => .05}) . "\n";
print "'K3', {d => .02} = " . formula_per_k('K3', {d => .02}) . "\n";
print "'K3', {d => .021} = " . formula_per_k('K3', {d => .021}) . "\n";
__DATA__
... #OTHER STUFFS
K K1 {
LOL {
COND { d < 0.01 }
FORMULA { -0.2 + 3.3*sqrt(d) }
COND { d >= 0.01 }
FORMULA { -0.2 + 3.3*sqrt(d+0.4) }
}
}
... #OTHER STUFFS
K K2 {
LOL {
COND { d < 0.03 }
FORMULA { -2.2 + 1.3*sqrt(d) }
COND { d >= 0.03 }
FORMULA { -2.2 + 1.3*sqrt(d+0.8) }
}
}
... #OTHER STUFFS
K K3 {
LOL {
COND { d < 0.02 }
FORMULA { -4.3 + 0.3*sqrt(d) }
COND { d >= 0.02 }
FORMULA { -4.3 + 0.3*sqrt(d+0.3) }
}
}
... #OTHER STUFF
Outputs:
'K1', {d => .1} = 2.13345237791561
'K1', {d => .05} = 2.01370729772479
'K3', {d => .02} = -4.13029437251523
'K3', {d => .021} = -4.13002941430942
ERROR:
Sequence (?1...) not recognized in regex; marked by <-- HERE in m/
^K \s+ M3 \s* {
( (?: [^{}]+ | {(?2 <-- HERE )} )* ) # Matched braces only.
}
/ at ./code.pl line 215, <RFILE> line 12.
UPDATE:
Code is updated.
This was originally suggested by https://stackoverflow.com/users/1733163/miller

Before the introduction of (?PARNO), we had to use (??{ code }) to create recursive regular expressions. An example can be found in perlre - Extended Patterns.
The following is tested on v5.16.2, v5.20.0, and locally on a v5.8.9 perlbrew:
our $braces_re;
$braces_re = qr{
\{
(?:
(?> [^{}]+ )
|
(??{ $braces_re })
)*
\}
}sx;
# parse FOO block
while (
$data =~ m{
^FOO \s+ (\w+) \s* \{
( (?: [^{}]+ | (??{ $braces_re }) )* ) # Matched braces only.
\}
}mgx
)
{
my $params = $1;
# parse BAR block
next if $params !~ m{
BAR \s* \{
( (?: [^{}]+ | (??{ $braces_re }) )*? ) # Matched braces only.
\}
}mx;
# SOME CODE
}
Note, I intentionally separated out the declaration of the _re variable and its initialization. There are some versions of perl that will let you declare a recursive regular expression in the same statement as the initialization, but v5.8.9 is not one of them.
Also, if you're comfortable altering your original regex more than just dropping in a replacement for (?PARNO) notation, then the above can be reduced to the following. Also confirmed on v5.16.2:
my $braces_re;
$braces_re = qr{
(?:
(?> [^{}]+ )
| # The following is a "postponed" regular subexpression.
\{ (??{ $braces_re }) \} # Deferred execution enables recursive regex
)*
}sx;
# parse FOO block
while ( $data =~ m{^FOO \s+ (\w+) \s* \{ ( $braces_re ) \} }mgx ) {
my $params = $1;
# parse BAR block
next if $params !~ m{BAR \s* \{ ( $braces_re ) \}}mx;
# SOME CODE
}
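To see the postponed subexpression at work in isolation, here is a small self-contained sketch built on the first $braces_re definition above. The sample input and the helper name first_braced_block are assumptions made for illustration; the pattern avoids (?PARNO), so it does not rely on 5.10+ syntax.

```perl
use strict;
use warnings;

# Package variable, declared before it is assigned, as in the answer above.
our $braces_re;
$braces_re = qr{
    \{
    (?:
        (?> [^{}]+ )          # a run of non-brace characters
      |
        (??{ $braces_re })    # or a nested, balanced {...} group
    )*
    \}
}x;

# Hypothetical helper: returns the first balanced {...} group in the text,
# or undef if there is none.
sub first_braced_block {
    my ($text) = @_;
    return $text =~ /($braces_re)/ ? $1 : undef;
}

# Made-up sample input in the style of the question's data.
my $data = "K K1 { LOL { COND { d < 0.01 } } } trailing";
print first_braced_block($data), "\n";   # { LOL { COND { d < 0.01 } } }
```

Because (??{ ... }) is evaluated at match time, $braces_re is already fully compiled by the time the inner reference is reached, which is what makes the self-reference possible.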

perl stream file for regex token including scanned tokens

I am trying to stream a file in perl and tokenize the lines and include the tokens.
I have:
while( $line =~ /([\/][\d]*[%].*?[%][\d]*[\/]|[^\s]+|[\s]+)/g ) {
my $word = $1;
#...
}
But it doesn't work when there are no spaces between tokens.
For example, if my line is:
$line = '/15%one (1)(2)%15/ is a /%good (1)%/ +/%number(2)%/.'
I would like to split that line into:
$output =
[
'/15%one (1)(2)%15/',
' ',
'is',
' ',
'a',
'/%good (1)%/',
' ',
'+',
'/%number(2)%/',
'.'
]
What is the best way to do this?

(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR, so
my @tokens;
push @tokens, $1
while $line =~ m{
\G
( \s+
| ([\/])([0-9]*)%
(?: (?! %\3\2 ). )*
%\3\2
| (?: (?! [\/][0-9]*% )\S )+
)
}sxg;
but that doesn't validate. If you want to validate, you could use
my @tokens;
push @tokens, $1
while $line =~ m{
\G
( \s+
| ([\/])([0-9]*)%
(?: (?! %\3\2 ). )*
%\3\2
| (?: (?! [\/][0-9]*% )\S )+
| \z (*COMMIT) (*FAIL)
| (?{ die "Syntax error" })
)
}sxg;
The following also validates, but it's a bit more readable and makes it easy to differentiate the token types:
my @tokens;
for ($line) {
m{\G ( \s+ ) }sxgc
&& do { push @tokens, $1; redo };
m{\G ( ([\/])([0-9]*)% (?: (?! %\3\2 ). )* %\3\2 ) }sxgc
&& do { push @tokens, $1; redo };
m{\G ( (?: (?! [\/][0-9]*% )\S )+ ) }sxgc
&& do { push @tokens, $1; redo };
m{\G \z }sxgc
&& last;
die "Syntax error";
}
pos will get you information about where the error occurred.
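As an illustration of telling the token types apart, the loop above can be wrapped in a function that tags each token. This is only a sketch: the function name tokenize and the labels space, delimited, and word are made up here, and the error message simply reports pos.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the for/redo tokenizer shown above.
# Returns a list of [type, text] pairs; dies on input it cannot tokenize.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    for ($line) {
        m{\G ( \s+ ) }sxgc
            && do { push @tokens, [ space => $1 ]; redo };
        m{\G ( ([\/])([0-9]*)% (?: (?! %\3\2 ). )* %\3\2 ) }sxgc
            && do { push @tokens, [ delimited => $1 ]; redo };
        m{\G ( (?: (?! [\/][0-9]*% )\S )+ ) }sxgc
            && do { push @tokens, [ word => $1 ]; redo };
        m{\G \z }sxgc
            && last;
        # pos() is undef if nothing has matched yet, hence the fallback to 0.
        die "Syntax error at offset " . (pos() // 0) . "\n";
    }
    return @tokens;
}

my $line = '/15%one (1)(2)%15/ is a /%good (1)%/ +/%number(2)%/.';
printf "%-9s [%s]\n", @$_ for tokenize($line);
```

On the sample line from the question this yields 11 tokens, including the single space between 'a' and '/%good (1)%/' that the question's expected list leaves out.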

Regex with recursive expression to match nested braces?

I'm trying to match text like sp { ...{...}... }, where the curly braces are allowed to nest. This is what I have so far:
my $regex = qr/
( #save $1
sp\s+ #start Soar production
( #save $2
\{ #opening brace
[^{}]* #anything but braces
\} #closing brace
| (?1) #or nested braces
)+ #0 or more
)
/x;
I just cannot get it to match the following text: sp { { word } }. Can anyone see what is wrong with my regex?

There are numerous problems. The recursive bit should be:
(
(?: \{ (?-1) \}
| [^{}]+
)*
)
All together:
my $regex = qr/
sp\s+
\{
(
(?: \{ (?-1) \}
| [^{}]++
)*
)
\}
/x;
print "$1\n" if 'sp { { word } }' =~ /($regex)/;

This is a case for the underused Text::Balanced, a very handy core module for this kind of thing. It relies on the pos of the start of the delimited sequence being found or set first, so I typically invoke it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced 'extract_bracketed';
sub get_bracketed {
my $str = shift;
# seek to beginning of bracket
return undef unless $str =~ /(sp\s+)(?={)/gc;
# store the prefix
my $prefix = $1;
# get everything from the start brace to the matching end brace
my ($bracketed) = extract_bracketed( $str, '{}');
# no closing brace found
return undef unless $bracketed;
# return the whole match
return $prefix . $bracketed;
}
my $str = 'sp { { word } }';
print get_bracketed $str;
The regex with the gc modifier tells the string to remember where the end point of the match is, and extract_bracketed uses that information to know where to start.
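The interaction between pos and extract_bracketed can be made visible with a short sketch. The helper name brace_block_after_sp and the sample string are assumptions for illustration; the only Text::Balanced call used is extract_bracketed, as in the code above.

```perl
use strict;
use warnings;
use Text::Balanced 'extract_bracketed';

# Hypothetical helper: returns the offset of the opening brace that follows
# "sp", plus the balanced block starting there (undef if it never closes).
sub brace_block_after_sp {
    my ($str) = @_;
    # Scalar-context match with /gc leaves pos($str) just before the "{".
    $str =~ /sp\s+(?=\{)/gc or return;
    my $start = pos($str);
    # extract_bracketed starts scanning at pos($str), not at offset 0.
    my ($block) = extract_bracketed($str, '{}');
    return ($start, $block);
}

my ($start, $block) = brace_block_after_sp('junk sp { { word } } tail');
print "opening brace at offset $start, block: $block\n";
```

Text before the "sp" prefix is ignored because the extraction begins at the position the regex left behind.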

Parse IF condition using Regular Expression

I need to create a regex that matches the following IF statement:
string InputValue=" If (X.Value==” X”) then X.Value = “X”;
Elseif (X.Value==” X”) then X.Value = “X”;
Elseif (X.Value==” Y ") then X.Value = “Y”;
Elseif (X.Value== ” Z ") then X.Value = “Z”;
Else X.Value = “M”;";
As you can see, there is exactly one If, zero or more ElseIf branches, and at most one Else. I also want to allow for spaces and line breaks.
I tried the following regex, but it failed:
string pattern="If\([a-z]*\.Value==""[a-z]*""\) Then [a-z]*\.Value=""[a-z]*""\;
(ElseIf\([a-z]*\.Value==""[a-z]*""\) Then [a-z]*\.Value=""[a-z]*""\;)*
(Else [a-z]*\.Value=""[a-z]*""\;)?";
bool result = Regex.IsMatch(InputValue, pattern, RegexOptions.IgnoreCase);
All ideas are welcomed

As in http://ideone.com/onS2e:
string condition = @"[_a-z]\w* \.VALUE \s* == \s* "" [^""]* """;
string assignment = @"[_a-z]\w* \.VALUE \s* = \s* "" [^""]* "" \s* ;";
string pattern = string.Format(
@"\b IF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s*
( \b ELSEIF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s* )* # repeat ELSEIF any number of times
( \b ELSE \s+ {1} )? # at most one ELSE",
condition, assignment);
Regex myRegex = new Regex( pattern, RegexOptions.IgnorePatternWhitespace |
RegexOptions.IgnoreCase | RegexOptions.Singleline );
Updated:
As in http://ideone.com/1coOp:
string pattern = string.Format(
@"^ \s* IF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s*
( \b ELSEIF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s* )* # repeat ELSEIF any number of times
( \b ELSE \s+ {1} )? # at most one ELSE
\s* $",
condition, assignment);
