Parse IF condition using Regular Expression

I need to write a regular expression that matches the following IF statement:
string InputValue = @" If (X.Value=="" X"") then X.Value = ""X"";
Elseif (X.Value=="" X"") then X.Value = ""X"";
Elseif (X.Value=="" Y "") then X.Value = ""Y"";
Elseif (X.Value== "" Z "") then X.Value = ""Z"";
Else X.Value = ""M"";";
There is exactly one If, followed by zero or more ElseIf clauses and at most one Else. The pattern also has to tolerate spaces and line breaks.
I tried the following regex, but it fails:
string pattern=@"If\([a-z]*\.Value==""[a-z]*""\) Then [a-z]*\.Value=""[a-z]*""\;
(ElseIf\([a-z]*\.Value==""[a-z]*""\) Then [a-z]*\.Value=""[a-z]*""\;)*
(Else [a-z]*\.Value=""[a-z]*""\;)?";
bool result = Regex.IsMatch(InputValue, pattern, RegexOptions.IgnoreCase);
All ideas are welcome.

As in http://ideone.com/onS2e:
string condition = @"[_a-z]\w* \.VALUE \s* == \s* "" [^""]* """;
string assignment = @"[_a-z]\w* \.VALUE \s* = \s* "" [^""]* "" \s* ;";
string pattern = string.Format(
@"\b IF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s*
( \b ELSEIF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s* )* # repeat ELSEIF any number of times
( \b ELSE \s+ {1} )? # at most one ELSE",
condition, assignment);
Regex myRegex = new Regex( pattern, RegexOptions.IgnorePatternWhitespace |
RegexOptions.IgnoreCase | RegexOptions.Singleline );
Updated:
As in http://ideone.com/1coOp:
string pattern = string.Format(
@"^ \s* IF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s*
( \b ELSEIF \s* \( \s* {0} \s* \) \s* THEN \s+ {1} \s* )* # repeat ELSEIF any number of times
( \b ELSE \s+ {1} )? # at most one ELSE
\s* $",
condition, assignment);
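For readers who want to experiment outside .NET, here is a minimal Python sketch of the same idea: build the pattern from a reusable condition piece and assignment piece, and ignore whitespace inside the pattern. The names `CONDITION`, `ASSIGNMENT`, `IF_CHAIN` and `is_valid_if_chain` are made up for this illustration, and Python's `re.VERBOSE` is assumed to play the role of `RegexOptions.IgnorePatternWhitespace`.

```python
import re

# Hypothetical building blocks mirroring the condition/assignment strings above.
CONDITION = r'[_a-z]\w* \.VALUE \s* == \s* " [^"]* "'
ASSIGNMENT = r'[_a-z]\w* \.VALUE \s* = \s* " [^"]* " \s* ;'

# re.VERBOSE ignores whitespace in the pattern and allows # comments.
IF_CHAIN = re.compile(
    rf"""
    ^ \s* IF \s* \( \s* {CONDITION} \s* \) \s* THEN \s+ {ASSIGNMENT} \s*
    (?: \b ELSEIF \s* \( \s* {CONDITION} \s* \) \s* THEN \s+ {ASSIGNMENT} \s* )*  # any number of ELSEIF
    (?: \b ELSE \s+ {ASSIGNMENT} )?                                               # at most one ELSE
    \s* $
    """,
    re.VERBOSE | re.IGNORECASE,
)


def is_valid_if_chain(text):
    """Return True if text is one IF, zero or more ELSEIF, and at most one ELSE."""
    return IF_CHAIN.match(text) is not None


sample = '''If (X.Value == "X") then X.Value = "X";
ElseIf (X.Value == " Y ") then X.Value = "Y";
Else X.Value = "M";'''
print(is_valid_if_chain(sample))
```

Because the pattern is anchored at both ends, a second Else clause or any trailing text makes the whole match fail.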

Related

need return value for captured group from last captured string in perl

I have XML files from which I want to capture the init value (tag) for each parameter. I am copying part of the XML here for reference.
I have the port name and the parameter name (tag, e.g. MNO) available.
E.g. the port name is XYZ and the parameter name is MNO;
the port name is PQR and the parameter names are ABC and GHI.
There can be multiple tags under one container.
<R-PORT-PROTOTYPE UUID="Oac11eff016c6bb667f357a89xOac11f0ad174240e817fa858f00">
<SHORT-NAME>XYZ</SHORT-NAME>
<REQUIRED-COM-SPECS>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Init_Val</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>0.071</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/MNO</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
</REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>
<R-PORT-PROTOTYPE UUID="Oac11eff016c6bb667f357a89xOac11f0ad174240e817f8f55900">
<SHORT-NAME>PQR</SHORT-NAME>
<REQUIRED-COM-SPECS>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Init_0</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>80</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/ABC</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
<PARAMETER-REQUIRE-COM-SPEC>
<INIT-VALUE>
<APPLICATION-VALUE-SPECIFICATION>
<SHORT-LABEL>Int_ghi</SHORT-LABEL>
<CATEGORY>VALUE</CATEGORY>
<SW-VALUE-CONT>
<SW-VALUES-PHYS>
<V>-80</V>
</SW-VALUES-PHYS>
</SW-VALUE-CONT>
</APPLICATION-VALUE-SPECIFICATION>
</INIT-VALUE>
<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/GHI</PARAMETER-REF>
</PARAMETER-REQUIRE-COM-SPEC>
</REQUIRED-COM-SPECS>
</R-PORT-PROTOTYPE>
regex :
if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>Port_name<\/short-name>.*?<V>(.*?)<\/.*?<PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">.*?Parameter_name<\/PARAMETER-REF>/gis) {
print $2;
}
I need the output to be 80 if the parameter is ABC and -80 if the parameter is GHI.

I suggest using XML::LibXML.
Here I've combined two XPath queries to find the V nodes where:
SHORT-NAME is XYZ and PARAMETER-REF (with DEST == PARAMETER-DATA-PROTOTYPE) contains MNO.
SHORT-NAME is PQR and PARAMETER-REF (with DEST == PARAMETER-DATA-PROTOTYPE) contains ABC or GHI.
Example:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $dom = XML::LibXML->load_xml(location => 'doc.xml');
my $query = q{
//R-PORT-PROTOTYPE/SHORT-NAME[text()="XYZ"]/..
//PARAMETER-REF[@DEST="PARAMETER-DATA-PROTOTYPE"][
contains(text(),'MNO')
]/..//V
|
//R-PORT-PROTOTYPE/SHORT-NAME[text()="PQR"]/..
//PARAMETER-REF[@DEST="PARAMETER-DATA-PROTOTYPE"][
contains(text(),'ABC') or contains(text(),'GHI')
]/..//V
};
foreach my $vnode ($dom->findnodes($query)) {
print $vnode->to_literal() . "\n";
}
Output:
0.071
80
-80
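The same parser-based lookup can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is only an illustration under a few assumptions: the `<PORTS>` wrapper element, the trimmed `SAMPLE` document and the helper name `init_values` are invented here, and the parameter name is taken to be the last path segment of `PARAMETER-REF`.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample; the <PORTS> root element is an assumption so that the
# fragment is a well-formed document.
SAMPLE = """
<PORTS>
  <R-PORT-PROTOTYPE>
    <SHORT-NAME>XYZ</SHORT-NAME>
    <REQUIRED-COM-SPECS>
      <PARAMETER-REQUIRE-COM-SPEC>
        <INIT-VALUE><V>0.071</V></INIT-VALUE>
        <PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/MNO</PARAMETER-REF>
      </PARAMETER-REQUIRE-COM-SPEC>
    </REQUIRED-COM-SPECS>
  </R-PORT-PROTOTYPE>
  <R-PORT-PROTOTYPE>
    <SHORT-NAME>PQR</SHORT-NAME>
    <REQUIRED-COM-SPECS>
      <PARAMETER-REQUIRE-COM-SPEC>
        <INIT-VALUE><V>80</V></INIT-VALUE>
        <PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/ABC</PARAMETER-REF>
      </PARAMETER-REQUIRE-COM-SPEC>
      <PARAMETER-REQUIRE-COM-SPEC>
        <INIT-VALUE><V>-80</V></INIT-VALUE>
        <PARAMETER-REF DEST="PARAMETER-DATA-PROTOTYPE">/SoftwareTypes/Interfaces/GHI</PARAMETER-REF>
      </PARAMETER-REQUIRE-COM-SPEC>
    </REQUIRED-COM-SPECS>
  </R-PORT-PROTOTYPE>
</PORTS>
"""


def init_values(xml_text, port_name, param_names):
    """Map each requested parameter of the given port to the text of its <V>."""
    root = ET.fromstring(xml_text.strip())
    found = {}
    for port in root.iter("R-PORT-PROTOTYPE"):
        if port.findtext("SHORT-NAME") != port_name:
            continue
        for spec in port.iter("PARAMETER-REQUIRE-COM-SPEC"):
            ref = spec.findtext("PARAMETER-REF", default="")
            name = ref.rsplit("/", 1)[-1]  # last path segment, e.g. "ABC"
            if name in param_names:
                found[name] = spec.findtext(".//V")
    return found


print(init_values(SAMPLE, "PQR", {"ABC", "GHI"}))
```

Pairing each `<V>` with the `PARAMETER-REF` inside the same `PARAMETER-REQUIRE-COM-SPEC` avoids the problem the regex has, where a lazy `.*?` can pair a value with the wrong parameter.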

There are two ways to get either or both values:
1 - Linear https://regex101.com/r/NYbvI8/1
# https://regex101.com/r/NYbvI8/1
# if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>PQR<\/short-name>(?:.*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO1<\/PARAMETER-REF>)?(?:.*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO2<\/PARAMETER-REF>)?(?(1)|(?(2)|(?!)))/gis)
<R-PORT-PROTOTYPE .*? <short-name>PQR</short-name>
(?:
.*?
<V>
( .*? ) # (1)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO1</PARAMETER-REF>
)?
(?:
.*?
<V>
( .*? ) # (2)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO2</PARAMETER-REF>
)?
(?(1)
| (?(2)
| (?!)
)
)
2 - Out of order https://regex101.com/r/gQJ3cO/1
# https://regex101.com/r/t4M9UB/1
# if($test_string=~ /<R-PORT-PROTOTYPE.*?<short-name>PQR<\/short-name>(?:(?:(?(1)(?!)).*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO1<\/PARAMETER-REF>)|(?:(?(2)(?!)).*?<V>(.*?)<\/V>.*?<PARAMETER-REF[ ]DEST="PARAMETER-DATA-PROTOTYPE">.*?MNO2<\/PARAMETER-REF>)){1,2}/gis)
<R-PORT-PROTOTYPE .*? <short-name>PQR</short-name>
(?:
(?:
(?(1) (?!) )
.*?
<V>
( .*? ) # (1)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO1</PARAMETER-REF>
)
|
(?:
(?(2) (?!) )
.*?
<V>
( .*? ) # (2)
</V>
.*?
<PARAMETER-REF [ ] DEST="PARAMETER-DATA-PROTOTYPE"> .*? MNO2</PARAMETER-REF>
)
){1,2}

How to find the next unbalanced brace?

The regex below captures everything up to the last balanced }.
Now, what regex would capture everything up to the next unbalanced }? In other words, how can I get ... {three {four}} five} from $str instead of just ... {three {four}}?
my $str = "one two {three {four}} five} six";
if ( $str =~ /
(
.*?
{
(?> [^{}] | (?-1) )+
}
)
/sx
)
{
print "$1\n";
}

So you want to match
[noncurlies [block noncurlies [...]]] "}"
where a block is
"{" [noncurlies [block noncurlies [...]]] "}"
As a grammar:
start : text "}"
text : noncurly* ( block noncurly* )*
block : "{" text "}"
noncurly : /[^{}]/
As a regex (5.10+):
/
^
(
(
[^{}]*
(?:
\{ (?-1) \}
[^{}]*
)*
)
\}
)
/x
The same regex written with named subpatterns (5.10+):
/
^ ( (?&TEXT) \} )
(?(DEFINE)
(?<TEXT> [^{}]* (?: (?&BLOCK) [^{}]* )* )
(?<BLOCK> \{ (?&TEXT) \} )
)
/x
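Python's built-in `re` module has no recursive subpatterns, so the grammar above cannot be ported directly. As a swapped-in technique, a plain depth counter finds the same prefix: scan left to right, track nesting, and stop at the first } seen at depth zero. The function name `up_to_unbalanced_close` is invented for this sketch.

```python
def up_to_unbalanced_close(s):
    """Return the prefix of s ending at the first unbalanced '}', or None."""
    depth = 0
    for i, ch in enumerate(s):
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth == 0:
                # This brace closes nothing opened so far: it is unbalanced.
                return s[: i + 1]
            depth -= 1
    return None  # every '}' was matched by an earlier '{'


print(up_to_unbalanced_close("one two {three {four}} five} six"))
```

Like the anchored regexes, this returns nothing when the string contains no unbalanced closing brace.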

Perl regular expression {} quantifier multiple matches

I'm trying to parse a file in which each line has 3 floats (1, +1.0 and -1.0 being valid values). The regular expression in the snippet matches a single float value, but I'm not sure how to use the Perl quantifier {n} to match multiple floats within a single line.
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, "<", "floatNumbers.txt") or die "Cannot open < floatNumbers.txt";
while(<$fh>)
{
if ($_=~m/([-+]?\d*[\.[0-9]*]?\s*)/)
{
print $1."\n";
}
}
In the snippet below I tried to match 3 floats within a line. Could readers help me with the correct usage of the {} quantifier?
if ($_=~m/([-+]?\d*[\.[0-9]*]?\s*){3}/)

You're trying to do extraction and validation at the same time. I'd go with:
sub is_float {
return $_[0] =~ /
^
[-+]?
(?: \d+(?:\.[0-9]*)? # 9, 9., 9.9
| \.[0-9]+ # .9
)
\z
/x;
}
while (<$fh>) {
my @fields = split;
if (@fields != 3 || grep { !is_float($_) } @fields) {
warn("Syntax error at line $.\n");
next;
}
print("@fields\n");
}
Note that your validation considered ., [ and ...0...0... to be numbers. I fixed that.
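The split-then-validate approach carries over to other languages. Below is a hedged Python sketch of the same logic; `FLOAT_RE` and `parse_line` are names chosen for this example, and `[0-9]` is used throughout because Python's `\d` also matches non-ASCII digits.

```python
import re

# Same shape as is_float above: optional sign, then 9 / 9. / 9.9 / .9
FLOAT_RE = re.compile(
    r"""
    [-+]?
    (?: [0-9]+ (?: \.[0-9]* )?   # 9, 9., 9.9
      | \.[0-9]+                 # .9
    )
    """,
    re.VERBOSE,
)


def parse_line(line):
    """Return the three float fields of a line, or None on a syntax error."""
    fields = line.split()
    if len(fields) != 3 or not all(FLOAT_RE.fullmatch(f) for f in fields):
        return None
    return fields


print(parse_line("1 +1.0 -1.0\n"))
```

Validating one field at a time keeps the pattern small and means no {3} quantifier is needed at all.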

Quantifiers let you specify how many times something must match in a regex.
For example, /(ba){3}/ matches ba repeated exactly 3 times in a row:
bababanfnfd = matches bababa
baba = no match
You can also use (taken from: http://perldoc.perl.org/perlrequick.html):
a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times.
a{n,} = match at least n times
a{n} = match exactly n times
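One point worth seeing in action is that a quantified capture group keeps only its last repetition, which is why {3} alone does not hand back three separate values. The short Python sketch below illustrates this; Python's `re` is assumed to behave like Perl for these basic quantifiers, and the sample strings are made up.

```python
import re

# {3} requires exactly three consecutive repetitions of the group.
triple_ba = re.compile(r"(?:ba){3}")
print(triple_ba.search("bababanfnfd").group())  # the matched text
print(triple_ba.search("baba"))                 # no match

# A repeated capture group only remembers its final repetition.
m = re.fullmatch(r"(?:([0-9]+)\s*){3}", "10 20 30")
last_only = m.group(1)

# To extract every item, match them individually instead.
all_items = re.findall(r"[0-9]+", "10 20 30")
print(last_only, all_items)
```

So the quantifier is useful for checking that a line has the right shape, while extraction is easier with a per-item match or a split.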

This is a generalized pattern that I think does what you are talking about:
# ^\s*(?:[-+]?(?=[^\s\d]*\d)\d*\.?\d*(?:\s+|$)){3}$
^ # BOL
\s* # optional whitespaces
(?: # Grouping start
[-+]? # optional -+
(?= [^\s\d]* \d ) # lookahead for \d
\d* \.? \d* # match this form (everything optional but guaranteed a \d)
(?: \s+ | $ ) # whitespaces or EOL
){3} # Grouping end, do 3 times
$ # EOL

perl stream file for regex token including scanned tokens

I am trying to stream a file in Perl, tokenize its lines, and keep the tokens (including the whitespace between them).
I have:
while( $line =~ /([\/][\d]*[%].*?[%][\d]*[\/]|[^\s]+|[\s]+)/g ) {
my $word = $1;
#...
}
But it doesn't work when there are no spaces between tokens.
For example, if my line is:
$line = '/15%one (1)(2)%15/ is a /%good (1)%/ +/%number(2)%/.'
I would like to split that line into:
$output =
[
'/15%one (1)(2)%15/',
' ',
'is',
' ',
'a',
'/%good (1)%/',
' ',
'+',
'/%number(2)%/',
'.'
]
What is the best way to do this?

(?:(?!STRING).)* is to STRING as [^CHAR]* is to CHAR, so
my @tokens;
push @tokens, $1
while $line =~ m{
\G
( \s+
| ([\/])([0-9]*)%
(?: (?! %\3\2 ). )*
%\3\2
| (?: (?! [\/][0-9]*% )\S )+
)
}sxg;
but that doesn't validate. If you want to validate, you could use
my @tokens;
push @tokens, $1
while $line =~ m{
\G
( \s+
| ([\/])([0-9]*)%
(?: (?! %\3\2 ). )*
%\3\2
| (?: (?! [\/][0-9]*% )\S )+
| \z (*COMMIT) (*FAIL)
| (?{ die "Syntax error" })
)
}sxg;
The following also validates, but it's a bit more readable and makes it easy to differentiate the token types:
my @tokens;
for ($line) {
m{\G ( \s+ ) }sxgc
&& do { push @tokens, $1; redo };
m{\G ( ([\/])([0-9]*)% (?: (?! %\3\2 ). )* %\3\2 ) }sxgc
&& do { push @tokens, $1; redo };
m{\G ( (?: (?! [\/][0-9]*% )\S )+ ) }sxgc
&& do { push @tokens, $1; redo };
m{\G \z }sxgc
&& last;
die "Syntax error";
}
pos will get you information about where the error occurred.
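For comparison, here is a rough Python sketch of the same validating tokenizer. Python's `re` has no `\G`, so the sketch substitutes `pattern.match(line, pos)`, which only matches at the given offset. The group names and the `tokenize` function are invented for this example.

```python
import re

TOKEN_RE = re.compile(
    r"""
      (?P<space> \s+ )
    | (?P<block> (?P<slash>/) (?P<num>[0-9]*) %
                 (?: (?! % (?P=num) (?P=slash) ) . )*   # anything up to the closer
                 % (?P=num) (?P=slash) )
    | (?P<word>  (?: (?! /[0-9]*% ) \S )+ )            # stop before a block opener
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(line):
    """Split line into (kind, text) pairs; raise ValueError on a syntax error."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN_RE.match(line, pos)  # anchored at pos, like \G
        if m is None:
            raise ValueError(f"Syntax error at offset {pos}")
        kind = next(k for k in ("space", "block", "word") if m.group(k) is not None)
        tokens.append((kind, m.group()))
        pos = m.end()
    return tokens


line = "/15%one (1)(2)%15/ is a /%good (1)%/ +/%number(2)%/."
print([text for _, text in tokenize(line)])
```

Every alternative consumes at least one character, so the loop always advances, and the offset in the error message plays the role that pos plays in the Perl version.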

How to match words separated with single space vs words separated with multiple spaces

I need to separate the key and values from the text that looks like below
Student ID: 0
Department ID = 18432
Name XYZ
Subjects:
Computer Architecture
Advanced Network Security 2
In the above example, Student ID, Department ID and Name are the keys, and 0, 18432 and XYZ are the values. The keys are separated from the values by :, = or multiple spaces. I tried a regex such as
$line =~ /(([\w\(\)]*\s)*)([=:\s?]?)\s*(\S.*)?$/;
$key = $2;
$colon=$3;
$value = $4;
The problem I am facing is telling words separated by a single space apart from words separated by more than one.
The output I get is
line is Student ID: 0
key is Student , value is ID: 0
whereas I want the key to be Student ID and the value to be 0. For lines like Subjects: and Computer Architecture, the key should be Subjects and Computer Architecture respectively. Later logic handles lines with no value or colon by appending them to the previous key, so the result looks like Subjects=Computer Architecture;Advanced Network Security 2
Update: Thanks to Ikegami for suggesting the lookbehind operator, but I still have trouble solving it.
$line=~/^(?: ( [^:=]+ ) (?<!\s\s)\s* [:=]\s*|\s*)(.*)$/x;
By (?<!\s\s)\s* [:=]\s*|\s* I mean: when there are two or more spaces, consume all the spaces; when there are no two consecutive spaces, look for : or = and consume the surrounding spaces. So if I pass the line below to the expression, shouldn't I get $1=Name and $2=ABC XYZ?
Name ABC XYZ
What I get instead is an empty key and the value Name ABC XYZ.

If
Name Eric Brine
Computer Architecture x86
means
key: Name Eric value: Brine
key: Computer Architecture value: x86
then you want
# Requires 5.10
if (/
^
(?: (?<key> [^:=]+ (?<!\s) ) \s* [:=] \s* (?<val> .* )
| (?<key> .+ (?<!\s) ) \s+ (?<val> \S+ )
)
\s* $
/x) {
my $key = $+{key};
my $val = $+{val};
...
}
or
if (/
^
(?: ( [^:=]+ (?<!\s) ) \s* [:=] \s* ( .* )
| ( .+ (?<!\s) ) \s+ ( \S+ )
)
\s* $
/x) {
my ($key,$val) = defined($1) ? ($1,$2) : ($3,$4);
...
}
If
Name Eric Brine
Computer Architecture x86
means
key: Name value: Eric Brine
key: Computer value: Architecture x86
then you want
# Requires 5.10
if (/
^
(?: (?<key> [^:=]+ (?<!\s) ) \s* [:=]
| (?<key> \S+ ) \s
)
\s*
(?<val> .* )
/x) {
my $key = $+{key};
my $val = $+{val};
...
}
or
if (/
^
(?: ( [^:=]+ (?<!\s) ) \s* [:=]
| ( \S+ ) \s
)
\s*
( .* )
/x) {
my $key = defined($1) ? $1 : $2;
my $val = $3;
...
}
Note that you can remove all the spaces and line breaks. For example, the last snippet can be written as:
if (/^(?:([^:=]+(?<!\s))\s*[:=]|(\S+)\s)\s*(.*)/) {
my $key = defined($1) ? $1 : $2;
my $val = $3;
...
}
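The compact pattern is plain enough to try in other regex engines. Here is a small Python sketch using the same pattern unchanged; `KV_RE` and `split_key_value` are names invented for the example, and it follows the second interpretation above, where a line without : or = splits after its first word.

```python
import re

# Same pattern as the one-line Perl version above.
KV_RE = re.compile(r"^(?:([^:=]+(?<!\s))\s*[:=]|(\S+)\s)\s*(.*)")


def split_key_value(line):
    """Return (key, value), or None if the line has no separator at all."""
    m = KV_RE.match(line)
    if m is None:
        return None
    # Group 1 is set for "key : value" / "key = value", group 2 otherwise.
    key = m.group(1) if m.group(1) is not None else m.group(2)
    return key, m.group(3)


for text in ["Student ID: 0", "Department ID = 18432", "Name XYZ", "Subjects:"]:
    print(split_key_value(text))
```

The `(?<!\s)` lookbehind is what keeps trailing spaces out of the key when the separator is written as ` = `.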

Try specifying the key part as two bits of text with an optional space in between:
$line =~ /([\w\(\)]*\s?[\w\(\)]*)\s*([=:]?)\s*(\S.*)?$/;
That should capture both one-word and two-word keys.
