Perl regex negation

How do I negate this regular expression (without using !~)?
my $Line='pqr_abc_def_ghi_xyz';
if ($Line=~/(?:abc|def|ghi)/)
{
printf("abc|def|ghi is not present\n");
}
else
{
printf("abc|def|ghi is present\n");
}
Note: abc, def or ghi could be preceded or followed by other text.

if ( $Line =~ /^(?!.*(?:abc|def|ghi))/s ) {
I.e., it is not possible to match that pattern anywhere after the start of the string.
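A minimal sketch of that lookahead wrapped in a helper. The sub name lacks_any and the sample strings are made up for illustration; only the pattern comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when none of abc, def, ghi occurs in the string.
sub lacks_any {
    my ($line) = @_;
    # Anchored at the start, assert that no position ahead begins one of
    # the alternatives. /s lets . cross embedded newlines.
    return $line =~ /^(?!.*(?:abc|def|ghi))/s ? 1 : 0;
}

for my $line ('pqr_abc_def_ghi_xyz', 'pqr_xyz', "pqr\nghi") {
    (my $shown = $line) =~ s/\n/\\n/g;   # make the newline visible
    printf "%-22s %s\n", $shown,
        lacks_any($line) ? 'not present' : 'present';
}
```

Without the ^ anchor the engine would retry the lookahead at every offset and could succeed at a later position, so the anchor is what makes this a whole-string test.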

Another way, which might give you more control over the individual component substrings:
# (?s)^(?:(?:(?!abc|def|ghi).)+|)$
(?s)
^
(?:
(?:
(?!
abc
| def
| ghi
)
.
)+
|
)
$

Another option could be to use unless instead of if:
unless ($Line=~/(?:abc|def|ghi)/){printf("abc|def|ghi is not present\n");}
else {printf("abc|def|ghi is present\n");}

Related

Perl regular expression {} quantifier multiple matches

I'm trying to parse a file in which each line has 3 floats (1, +1.0 and -1.0 are all valid values). The regular expression in the snippet matches a float value, but I'm not sure how to use the Perl quantifier {n} to match multiple floats within a single line.
#!/usr/bin/perl
use strict;
use warnings;
open(my $fh, "<", "floatNumbers.txt") or die "Cannot open < floatNumbers.txt";
while(<$fh>)
{
if ($_=~m/([-+]?\d*[\.[0-9]*]?\s*)/)
{
print $1."\n";
}
}
This is the snippet where I tried to match 3 floats within a line. Could readers help me with the correct usage of the {} quantifier?
if ($_=~m/([-+]?\d*[\.[0-9]*]?\s*){3}/)
You're trying to do extraction and validation at the same time. I'd go with:
sub is_float {
return $_[0] =~ /
^
[-+]?
(?: \d+(?:\.[0-9]*)? # 9, 9., 9.9
| \.[0-9]+ # .9
)
\z
/x;
}
while (<$fh>) {
my @fields = split;
if (@fields != 3 || grep { !is_float($_) } @fields) {
warn("Syntax error at line $.\n");
next;
}
print("@fields\n");
}
Note that your validation considered ., [ and ...0...0... to be numbers. I fixed that.
Quantifiers just let you specify how many times you want to match something in a regex.
For example, /(ba){3}/ matches ba exactly 3 times in a row:
bababanfnfd = matches bababa, but
baba = no match.
You can also use (taken from: http://perldoc.perl.org/perlrequick.html):
a? = match 'a' 1 or 0 times
a* = match 'a' 0 or more times, i.e., any number of times
a+ = match 'a' 1 or more times, i.e., at least once
a{n,m} = match at least n times, but not more than m times.
a{n,} = match at least n times
a{n} = match exactly n times
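A short sketch of these quantifiers in use. The helper name three_ba is hypothetical and the sample strings are the ones from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text matched by three consecutive "ba",
# or undef when the string does not contain that run.
sub three_ba {
    my ($s) = @_;
    return $s =~ /((?:ba){3})/ ? $1 : undef;
}

print defined three_ba('bababanfnfd') ? "match\n" : "no match\n";
print defined three_ba('baba')        ? "match\n" : "no match\n";

# {n,m} is greedy: it takes as many repetitions as it can, up to m.
my ($run) = 'aaaaa' =~ /(a{2,4})/;
print "$run\n";
```

Note that the quantifier applies to the group as a whole, which is why (?:ba){3} repeats the two-character unit rather than just the final a.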
This is a generalized pattern that I think does what you are talking about:
# ^\s*(?:[-+]?(?=[^\s\d]*\d)\d*\.?\d*(?:\s+|$)){3}$
^ # BOL
\s* # optional whitespaces
(?: # Grouping start
[-+]? # optional -+
(?= [^\s\d]* \d ) # lookahead for \d
\d* \.? \d* # match this form (everything optional but guaranteed a \d)
(?: \s+ | $ ) # whitespaces or EOL
){3} # Grouping end, do 3 times
$ # EOL
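As a rough sketch, the pattern above can be wrapped in a small validation helper. The sub name has_three_floats and the sample lines are invented for illustration; the regex is the one shown above, written with /x.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the line holds exactly three float tokens.
sub has_three_floats {
    my ($line) = @_;
    return $line =~ /
        ^ \s*
        (?:
            [-+]?
            (?= [^\s\d]* \d )   # the token must contain at least one digit
            \d* \.? \d*
            (?: \s+ | $ )       # separator, or end of line after the last one
        ){3}
        $
    /x ? 1 : 0;
}

for my $line ('1 +1.0 -1.0', '.5 1. -2', '1 2', '1 . 3', '1 2 3 4') {
    printf "%-12s %s\n", $line, has_three_floats($line) ? 'ok' : 'rejected';
}
```

This only validates the line. To get at the three values afterwards, splitting the line on whitespace is simpler than trying to capture inside the quantified group, since a repeated capture group keeps only its last match.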

Regex with recursive expression to match nested braces?

I'm trying to match text like sp { ...{...}... }, where the curly braces are allowed to nest. This is what I have so far:
my $regex = qr/
( #save $1
sp\s+ #start Soar production
( #save $2
\{ #opening brace
[^{}]* #anything but braces
\} #closing brace
| (?1) #or nested braces
)+ #0 or more
)
/x;
I just cannot get it to match the following text: sp { { word } }. Can anyone see what is wrong with my regex?
There are numerous problems. The recursive bit should be:
(
(?: \{ (?-1) \}
| [^{}]+
)*
)
All together:
my $regex = qr/
sp\s+
\{
(
(?: \{ (?-1) \}
| [^{}]++
)*
)
\}
/x;
print "$1\n" if 'sp { { word } }' =~ /($regex)/;
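A sketch that exercises the same pattern on balanced and unbalanced input, assuming Perl 5.10 or later (needed for (?-1) and the possessive ++). The helper name find_production and the test strings are hypothetical.

```perl
use strict;
use warnings;

my $production = qr/
    sp \s+
    \{
    (                       # body between the outermost braces
        (?: \{ (?-1) \}     # a nested pair: recurse into this same group
          | [^{}]++         # or a run of non-brace characters
        )*
    )
    \}
/x;

# Hypothetical helper: return the whole production, or undef if the braces
# after "sp" never balance.
sub find_production {
    my ($text) = @_;
    return $text =~ /($production)/ ? $1 : undef;
}

for my $text ('sp { { word } }', 'x sp {a{b}{c}} y', 'sp { { word }') {
    my $found = find_production($text);
    print defined $found ? "found: $found\n" : "no match in: $text\n";
}
```

The relative reference (?-1) is what keeps the recursion pointing at the right group when the qr// object is embedded inside another capturing group, as it is in find_production.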
This is a case for the underused Text::Balanced, a very handy core module for this kind of thing. It does rely on the pos of the start of the delimited sequence being found/set first, so I typically invoke it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Balanced 'extract_bracketed';
sub get_bracketed {
my $str = shift;
# seek to beginning of bracket
return undef unless $str =~ /(sp\s+)(?=\{)/gc;
# store the prefix
my $prefix = $1;
# get everything from the start brace to the matching end brace
my ($bracketed) = extract_bracketed( $str, '{}');
# no closing brace found
return undef unless $bracketed;
# return the whole match
return $prefix . $bracketed;
}
my $str = 'sp { { word } }';
print get_bracketed $str;
The regex with the gc modifier tells the string to remember where the end point of the match is, and extract_bracketed uses that information to know where to start.

Regular expressions to match protected separated values

I'd like to have a regular expression that matches separated values, where some protected values can contain the separator character.
For instance:
"A,B,{C,D,E},F"
would give:
"A"
"B"
"{C,D,E}"
"F"
Please note the protected values can be nested, as follows:
"A,B,{C,D,{E,F}},G"
would give:
"A"
"B"
"{C,D,{E,F}}"
"G"
I already coded that feature with a character iteration as follows:
sub Parse
{
my @item;
my $curly;
my $string;
foreach(split //)
{
$_ eq "{" and ++$curly;
$_ eq "}" and --$curly;
if(!$curly && /[,:]/)
{
push @item, $string;
undef $string;
next;
}
$string .= $_;
}
push @item, $string;
return @item;
}
But it would definitely be so much nicer with a regexp.
A regex that supports nesting would look as follows:
my @items;
push @items, $1 while
/
(?: ^ | \G , )
(
(?: [^,{}]+
| (
\{
(?: [^{}]
| (?2)
)*
\}
)
| # Empty
)
)
/xg;
$ perl -E'$_ = shift; ... say for @items;' 'A,B,{C,D,{E,F}},G'
A
B
{C,D,{E,F}}
G
Assumes valid input since it can't extract and validate at the same time. (Well, not without making things really messy.)
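A sketch of the same loop packaged as a reusable sub, assuming Perl 5.10 or later for the (?2) recursion. The name split_protected is hypothetical, and like the original it assumes the braces in the input are balanced.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas, keeping {...} groups intact.
sub split_protected {
    my ($input) = @_;
    my @items;
    push @items, $1 while $input =~ /
        (?: ^ | \G , )          # start of string, or a comma right after
                                # the previous item
        (
            (?: [^,{}]+         # a plain item
              | (               # group 2: a protected item
                    \{
                    (?: [^{}] | (?2) )*
                    \}
                )
              |                 # or an empty item
            )
        )
    /xg;
    return @items;
}

print "$_\n" for split_protected('A,B,{C,D,{E,F}},G');
```

The \G anchor forces each match to begin exactly where the previous one ended, so the loop stops at the first position that is not a comma instead of skipping ahead over unexpected text.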
Improved from nhahtdh's answer.
$_ = "A,B,{C,D,E},F";
while ( m/(\{.*?\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "[$&]\n";
}
Improved it again. Please look at this one!
$_ = "A,B,{C,D,{E,F}},G";
while ( m/(\{.*\}|((?<=^)|(?<=,)).(?=,|$))/g ) {
print "$&\n";
}
It will get:
A
B
{C,D,{E,F}}
G
$a = "A,B,{C,D,E},F";
while ($a =~ s/(\{[\{\}\w,]+\}|\w)//) {
push (@res, $1);
}
print "\@res: @res\n"
Result:
@res: A B {C,D,E} F
Explanation: in a loop, we repeatedly try to match either a protected block \{[\{\}\w,]+\} or a single word character \w, deleting the match from the original string each time. Every match (that is, $1) is stored in the array, et voilà!
Here is a regex in bash:
chronos@localhost / $ echo "A,B,{C,D,E},F" | grep -oE "(\{[^\}]*\}|[A-Z])"
A
B
{C,D,E}
F
Try this regex. Use the regex to match and extract the token.
/(\{.*?\}|(?<=,|^).*?(?=,|$))/
I have not tested this code in Perl.
This relies on an assumption about how the regex engine works: that it will try to match the first alternative \{.*?\} before the second one. I also assume that there are no nested curly brackets and no badly paired curly brackets.
$s = "A,B,{C,D,E},F";
@t = split /,(?=.*{)|,(?!.*})/, $s;

Matching balanced parenthesis in Perl regex

I have an expression which I need to split and store in an array:
aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"
It should look like this once split and stored in the array:
aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }
aaa="bbb{}" { aa="b}b" }
aaa="bbb,ccc"
I use Perl version 5.8. Could someone help me solve this?
Use the perl module "Regexp::Common". It has a nice balanced parenthesis Regex that works well.
# ASN.1
use Regexp::Common;
$bp = $RE{balanced}{-parens=>'{}'};
@genes = $l =~ /($bp)/g;
There's an example in perlre, using the recursive regex features introduced in v5.10. Although you are limited to v5.8, other people coming to this question should get the right solution :)
$re = qr{
( # paren group 1 (full function)
foo
( # paren group 2 (parens)
\(
( # paren group 3 (contents of parens)
(?:
(?> [^()]+ ) # Non-parens without backtracking
|
(?2) # Recurse to start of paren group 2
)*
)
\)
)
)
}x;
I agree with Scott Rippey, more or less, about writing your own parser. Here's a simple one:
my $in = 'aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, ' .
'aaa="bbb{}" { aa="b}b" }, ' .
'aaa="bbb,ccc"'
;
my @out = ('');
my $nesting = 0;
while($in !~ m/\G$/cg)
{
if($nesting == 0 && $in =~ m/\G,\s*/cg)
{
push @out, '';
next;
}
if($in =~ m/\G(\{+)/cg)
{ $nesting += length $1; }
elsif($in =~ m/\G(\}+)/cg)
{
$nesting -= length $1;
die if $nesting < 0;
}
elsif($in =~ m/\G((?:[^{}"]|"[^"]*")+)/cg)
{ }
else
{ die; }
$out[-1] .= $1;
}
(Tested in Perl 5.10; sorry, I don't have Perl 5.8 handy, but so far as I know there aren't any relevant differences.) Needless to say, you'll want to replace the dies with something application-specific. And you'll likely have to tweak the above to handle cases not included in your example. (For example, can quoted strings contain \"? Can ' be used instead of "? This code doesn't handle either of those possibilities.)
To match balanced parentheses or curly brackets while also taking backslashed (escaped) ones into account, the proposed solutions would not work. Instead, you would write something like this (building on the suggested solution in perlre):
$re = qr/
( # paren group 1 (full function)
foo
(?<paren_group> # paren group 2 (parens)
\(
( # paren group 3 (contents of parens)
(?:
(?> (?:\\[()]|(?![()]).)+ ) # escaped parens or no parens
|
(?&paren_group) # Recurse to named capture group
)*
)
\)
)
)
/x;
Try something like this:
use strict;
use warnings;
use Data::Dumper;
my $exp=<<END;
aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } } , aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"
END
chomp $exp;
my @arr = map { $_ =~ s/^\s*//; $_ =~ s/\s* $//; "$_}"} split('}\s*,',$exp);
print Dumper(\@arr);
Although Recursive Regular Expressions can usually be used to capture "balanced braces" {}, they won't work for you, because you ALSO have the requirement to match "balanced quotes" ".
This would be a very tricky task for a Perl Regular Expression, and I'm fairly certain it's not possible. (In contrast, it could probably be done with Microsoft's "balancing groups" Regex feature).
I would suggest creating your own parser. As you process each character, you count each " and {}, and only split on , if they are "balanced".
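A minimal sketch of such a character-by-character parser. The sub name split_top_level is hypothetical, and it assumes quoted strings never contain an escaped double quote. It uses no 5.10-only features, so it should also run on Perl 5.8.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are outside both {...} and "...".
sub split_top_level {
    my ($text) = @_;
    my @parts    = ('');
    my $depth    = 0;   # current brace nesting
    my $in_quote = 0;   # true while inside a double-quoted string

    for my $ch (split //, $text) {
        if ($ch eq '"') {
            $in_quote = !$in_quote;
        }
        elsif (!$in_quote) {
            if    ($ch eq '{') { $depth++ }
            elsif ($ch eq '}') { $depth-- }
            elsif ($ch eq ',' && $depth == 0) {
                push @parts, '';    # top-level comma: start a new part
                next;
            }
        }
        $parts[-1] .= $ch;
    }
    s/^\s+|\s+$//g for @parts;      # trim surrounding whitespace
    return @parts;
}

my $in = 'aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, '
       . 'aaa="bbb{}" { aa="b}b" }, '
       . 'aaa="bbb,ccc"';
print "$_\n" for split_top_level($in);
```

Braces are only counted outside quotes, which is what lets "b}b" and "bbb{}" pass through without disturbing the nesting depth.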

How can I match a pipe character followed by whitespace and another pipe?

I am trying to find all matches in a string that begins with | |.
I have tried: if ($line =~ m/^\\\|\s\\\|/) which didn't work.
Any ideas?
You are escaping the pipe one time too many, effectively escaping the backslash instead.
print "YES!" if ($line =~ m/^\|\s\|/);
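A small sketch contrasting the two patterns. The helper name starts_with_pipes and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper using the corrected pattern.
sub starts_with_pipes {
    my ($line) = @_;
    return $line =~ /^\|\s\|/ ? 1 : 0;
}

my $line = '| | data';
print "single backslash: ",
    (starts_with_pipes($line) ? "match" : "no match"), "\n";

# \\ is a literal backslash, so \\\| means "a backslash, then a pipe".
# That pattern only matches text that really contains backslashes.
print "triple backslash: ",
    ($line =~ /^\\\|\s\\\|/ ? "match" : "no match"), "\n";
```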
The pipe character should be escaped with a single backslash in a Perl regex. (Perl regexes are a bit different from POSIX regexes. If you're using this in, say, grep, things would be a bit different.) If you're specifically looking for a space between the pipes, use an unescaped space, which is perfectly acceptable in a Perl regex. Here's a brief test program:
my @lines = <DATA>;
for (@lines) {
print if /^\| \|/;
}
__DATA__
| | Good - space
|| Bad - no space
|	| Bad - tab
 | | Bad - beginning space
Bad - no bars
If it's a literal string you're searching for, you don't need a regular expression.
my $search_for = '| |';
my $search_in = whatever();
if ( substr( $search_in, 0, length $search_for ) eq $search_for ) {
print "found '$search_for' at start of string.\n";
}
Or it might be clearer to do this:
my $search_for = '| |';
my $search_in = whatever();
if ( 0 == index( $search_in, $search_for ) ) {
print "found '$search_for' at start of string.\n";
}
You might also want to look at quotemeta when you want to use a literal in a regexp.
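A brief sketch of quotemeta in this situation. The helper name starts_with_literal is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: does $text start with the literal string $literal?
sub starts_with_literal {
    my ($text, $literal) = @_;
    my $escaped = quotemeta $literal;   # backslashes every non-word character
    return $text =~ /^$escaped/ ? 1 : 0;
}

print starts_with_literal('| | Good - space', '| |')  ? "yes\n" : "no\n";
print starts_with_literal('|| Bad - no space', '| |') ? "yes\n" : "no\n";
```

Writing /^\Q$literal\E/ inside the pattern has the same effect as calling quotemeta first.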
Remove the ^ and the double back-slashes. The ^ forces the string to be at the beginning of the string. Since you're looking for all matches in one string, that's probably not what you want.
m/\|\s\|/
What about:
m/^\|\s*\|/