Substitute: replacement evaluation - replace

Is the first and the second substitution equivalent if the replacement is passed in a variable?
#!/usr/bin/env perl6
use v6;
my $foo = 'switch';
my $t1 = my $t2 = my $t3 = my $t4 = 'this has a $foo in it';
my $replace = prompt( ':' ); # $0
$t1.=subst( / ( \$ \w+ ) /, $replace );
$t2.=subst( / ( \$ \w+ ) /, { $replace } );
$t3.=subst( / ( \$ \w+ ) /, { $replace.EVAL } );
$t4.=subst( / ( \$ \w+ ) /, { ( $replace.EVAL ).EVAL } );
say "T1 : $t1";
say "T2 : $t2";
say "T3 : $t3";
say "T4 : $t4";
# T1 : this has a $0 in it
# T2 : this has a $0 in it
# T3 : this has a $foo in it
# T4 : this has a switch in it

The only difference between $replace and {$replace} is that the second is a block that returns the value of the variable. It's only adding a level of indirection, but the result is the same.
Update: Edited according to #raiph's comments.

Related

How to match string that contain exact 3 time occurrence of special character in perl

I have try few method to match a word that contain exact 3 times slash but cannot work. Below are the example
#array = qw( abc/ab1/abc/abc a2/b1/c3/d4/ee w/5/a s/t )
foreach my $string (#array){
if ( $string =~ /^\/{3}/ ){
print " yes, word with 3 / found !\n";
print "$string\n";
}
else {
print " no word contain 3 / found\n";
}
Few macthing i try but none of them work
$string =~ /^\/{3}/;
$string =~ /^(\w+\/\w+\/\w+\/\w+)/;
$string =~ /^(.*\/.*\/.*\/.*)/;
Any other way i can match this type of string and print the string?
Match a / globally and compare the number of matches with 3
if ( ( () = m{/}g ) == 3 ) { say "Matched 3 times" }
where the =()= operator is a play on context, forcing list context on its right side but returning the number of elements of that list when scalar context is provided on its left side.
If you are uncomfortable with such a syntax stretch then assign to an array
if ( ( my #m = m{/}g ) == 3 ) { say "Matched 3 times" }
where the subsequent comparison evaluates it in the scalar context.
You are trying to match three consecutive / and your string doesn't have that.
The pattern you need (with whitespace added) is
^ [^/]* / [^/]* / [^/]* / [^/]* \z
or
^ [^/]* (?: / [^/]* ){3} \z
Your second attempt was close, but using ^ without \z made it so you checked for string starting with your pattern.
Solutions:
say for grep { m{^ [^/]* (?: / [^/]* ){3} \z}x } #array;
or
say for grep { ( () = m{/}g ) == 3 } #array;
or
say for grep { tr{/}{} == 3 } #array;
You need to match
a slash
surrounded by some non-slashes (^(?:[^\/]*)
repeating the match exactly three times
and enclosing the whole triple in start of line and and of line anchors:
$string =~ /^(?:[^\/]*\/[^\/]*){3}$/;
if ( $string =~ /\/.*\/.*\// and $string !~ /\/.*\/.*\/.*\// )

Regex parsing of DNS answer

I would like to parse the following lines
8.8.19.12.53 > 125.15.15.9.40583: [udp sum ok] 62639 q: A? mp.microsoft.com. 6/5/9 mp.microsoft.com. CNAME .mp.microsoft.com.c.footprint.net., mp.microsoft.com.c.footprint.net. A 8.250.143.254, mp.microsoft.com.c.footprint.net. A 8.250.157.254 ns: c.footprint.net. NS d.ns.c.footprint.net. ar: d.ns.c.footprint.net. A 4.26.235.155 (439)
8.8.19.12.53 > 125.15.15.9.42091: [udp sum ok] 46555 q: A? www.toto.net. 1/0/0 www.toto.net. A 120.33.1.11 (47)
and get the following output
125.15.15.9 mp.microsoft.com A 8.250.143.254 A 8.250.157.254
125.15.15.9 www.toto.net A 120.33.1.11
I succeeded in parsing the first two fields with command
sed -Eun 's/[^>]+> ([0-9.]+)\.[0-9]+:.+q: A\? ([a-z0-9.-]+)\.([^:]+).*/\1:\2:\3/pg
`
But I cannot get the resolved IPs (A xx.xx.xx.xx). In fact there may be several.
Would it be possible to get such output using sed or Perl ?
EDIT:
As I added in comments, parsing of a larger input sample, I also require several lines to be discarded in the output. This lines can be characterized by :
the number of A records ("A xx.xx.xx.xx") is non null
or the line must not contains NXDomain\*?-
I succeed in meeting the new first need, but not for the second.
Following the #ikegami reply, here is my attempt:
perl -nle '
my $field_value_re = qr/(?![^\s:]++:(?!\S)) \S++ (?: (?! \s++ [^\s:]++:(?!\S) ) \s++ \S++ )*+/x;
my ($id, $rest) = /^ \s+ ( [^:]++ ) : \s++ $field_value_re ( .* ) /sx
or next;
my ($ip) = $id =~ /^ \S++ \s++ \S++ \s++ ( [^\s\.]++\.[^\s\.]++\.[^\s\.]++\.[^\s\.]++ )\.[^\s\.]++ \z /x
or next;
my %fields = $rest =~ /\G \s++ ( [^\s:]++ ) :(?!\S) \s++ ( $field_value_re ) /gsx;
my ($query, $answers) = $fields{q} =~ /^ A\? \s++ ( \S++ ) \s++ \S++ \s++ ( .* ) /sx
or next;
$query =~ s/\.\z//;
my #answers = split(/\s*+,\s*+/, $answers);
my ($afield) = join " ", map { /^\S++\s++A\s++(\S++)/ } #answers;
if ( length($afield) != 0)
{
print join " ", $ip, $query, $afield;
}
' dns.sample
This does as you ask with the sample data
I first build a regex pattern $url_re that matches numeric URLs to make the following code more concise. Then I search for the first URL immediately after >, the named URL right after A?, and all of the following URLs which are preceded by A
They are all stored in array #urls and printed
use strict;
use warnings 'all';
use 5.010;
my $url_re = qr/(?:\d+\.){3}\d+/;
while ( <DATA> ) {
my #urls = ( />\s+($url_re)/, /A\?\s+([-\w.]+\w)/, /(A\s+$url_re)/g );
say "#urls";
}
__DATA__
8.8.19.12.53 > 125.15.15.9.40583: [udp sum ok] 62639 q: A? mp.microsoft.com. 6/5/9 mp.microsoft.com. CNAME .mp.microsoft.com.c.footprint.net., mp.microsoft.com.c.footprint.net. A 8.250.143.254, mp.microsoft.com.c.footprint.net. A 8.250.157.254 ns: c.footprint.net. NS d.ns.c.footprint.net. ar: d.ns.c.footprint.net. A 4.26.235.155 (439)
8.8.19.12.53 > 125.15.15.9.42091: [udp sum ok] 46555 q: A? www.toto.net. 1/0/0 www.toto.net. A 120.33.1.11 (47)
output
125.15.15.9 mp.microsoft.com A 8.250.143.254 A 8.250.157.254 A 4.26.235.155
125.15.15.9 www.toto.net A 120.33.1.11
Each line appears to be of the form
{"id" with spaces}: {stuff} [ {key}: {stuff} ]*
You appear to be interested in information inside the "id", and inside the field named q. The value of the q field appears to be of the form
A? {word} {word} {ns_return} [, {ns_return} ]*
Here's a robust solution that handles the format described above.
perl -nle'
my $field_value_re = qr/(?![^\s:]++:(?!\S)) \S++ (?: (?! \s++ [^\s:]++:(?!\S) ) \s++ \S++ )*+/x;
my ($id, $id_val, $rest) = /^ ( [^:]++ ) : \s++ ( $field_value_re ) ( .* ) /sx
or next;
next if $id_val =~ /\bNXDomain\b/;
my ($ip) = $id =~ /^ \S++ \s++ \S++ \s++ ( [^\s\.]++\.[^\s\.]++\.[^\s\.]++\.[^\s\.]++ )\.[^\s\.]++ \z /x
or next;
my %fields = $rest =~ /\G \s++ ( [^\s:]++ ) :(?!\S) \s++ ( $field_value_re ) /gsx;
my ($query, $answers) = $fields{q} =~ /^ A\? \s++ ( \S++ ) \s++ \S++ \s++ ( .* ) /sx
or next;
$query =~ s/\.\z//;
my #answers =
map { /^\S++\s++A\s++(\S++)/ }
split(/\s*+,\s*+/, $answers);
next if !#answers;
print join " ", $ip, $query, map { "A $_" } #answers;
' log
125.15.15.9 mp.microsoft.com A 8.250.143.254 A 8.250.157.254
125.15.15.9 www.toto.net A 120.33.1.11
This prints the desired output by using the map function in a somewhat unorthodox way to ignore any fields after q:
perl -lne 'print join qq/\t/, m/> ([\d\.]+)\./, map {/A\? ([^\s]+)\./, /(A [\d\.]+)/g} / q:([^:]+)/' log.txt

Rewriting a recursive regex for older Perl version

The following piece of code works just fine with Perl (v5.16.2). However, when I run it using Perl v5.8.9, it complains about the following regex. How can I rewrite this regex in a way that works with Perl v5.8.9. (I can't update the version).
REGEX:
use strict;
use warnings;
our %formula_per_k;
INIT {
# List all functions that you want to allow in formulas. All other words will be interpretted as variables.
my #FORMULA_FUNCS = qw(sqrt exp log);
# Load the data via a file.
my $data = do {local $/; <DATA>};
# Parse K blocks
while ($data =~ m{
^K \s+ (\w+) \s* \{
( (?: [^{}]+ | \{(?2)\} )* ) # Matched braces only.
\}
}mgx) {
my ($name, $params) = ($1, $2);
# Parse LOL block
next if $params !~ m{
LOL \s* \{
( (?: [^{}]+ | \{(?1)\} )*? ) # Matched braces only.
\}
}mx;
my $lol = $1;
# Start building anonymous subroutine
my $conditions = '';
# Parse Conditions and Formulas
while ($lol =~ m{
COND \s* \{ (.*?) \} \s*
FORMULA \s* \{ (.*?) \}
}gx) {
my ($cond, $formula) = ($1, $2);
# Remove Excess spacing and translate variable into perl scalar.
for ($cond, $formula) {
s/^\s+|\s+$//g;
s{([a-zA-Z]+)}{
my $var = $1;
$var = "\$hashref->{$var}" if ! grep {$var eq $_} #FORMULA_FUNCS;
$var
}eg;
}
$conditions .= "return $formula if $cond; ";
}
my $code = "sub {my \$hashref = shift; ${conditions} return; }";
my $sub = eval $code;
if ($#) {
die "Invalid formulas in $name: $#";
}
$formula_per_k{$name} = $sub;
}
}
sub formula_per_k {
my ($k, $vars) = #_;
die "Unrecognized K value '$k'" if ! exists $formula_per_k{$k};
return $formula_per_k{$k}($vars);
}
print "'K1', {d => .1} = " . formula_per_k('K1', {d => .1}) . "\n";
print "'K1', {d => .05} = " . formula_per_k('K1', {d => .05}) . "\n";
print "'K3', {d => .02} = " . formula_per_k('K3', {d => .02}) . "\n";
print "'K3', {d => .021} = " . formula_per_k('K3', {d => .021}) . "\n";
__DATA__
... #OTHER STUFFS
K K1 {
LOL {
COND { d < 0.01 }
FORMULA { -0.2 + 3.3*sqrt(d) }
COND { d >= 0.01 }
FORMULA { -0.2 + 3.3*sqrt(d+0.4) }
}
}
... #OTHER STUFFS
K K2 {
LOL {
COND { d < 0.03 }
FORMULA { -2.2 + 1.3*sqrt(d) }
COND { d >= 0.03 }
FORMULA { -2.2 + 1.3*sqrt(d+0.8) }
}
}
... #OTHER STUFFS
K K3 {
LOL {
COND { d < 0.02 }
FORMULA { -4.3 + 0.3*sqrt(d) }
COND { d >= 0.02 }
FORMULA { -4.3 + 0.3*sqrt(d+0.3) }
}
}
... #OTHER STUFF
Outputs:
'K1', {d => .1} = 2.13345237791561
'K1', {d => .05} = 2.01370729772479
'K3', {d => .02} = -4.13029437251523
'K3', {d => .021} = -4.13002941430942
ERROR:
Sequence (?1...) not recognized in regex; marked by <-- HERE in m/
^K \s+ M3 \s* {
( (?: [^{}]+ | {(?2 <-- HERE )} )* ) # Matched braces only.
}
/ at ./code.pl line 215, <RFILE> line 12.
UPDATE:
Code is updated.
This was originally suggested by https://stackoverflow.com/users/1733163/miller
Before the introduction of (?PARNO), we had to use (??{ code }) to create recursive regular expressions. An example can be found in perlre - Extended Patterns.
The following is tested on v5.16.2, v5.20.0, and locally on a v5.8.9 perlbrew:
our $braces_re;
$braces_re = qr{
\{
(?:
(?> [^{}]+ )
|
(??{ $braces_re })
)*
\}
}sx;
# parse FOO block
while (
$data =~ m{
^FOO \s+ (\w+) \s* \{
( (?: [^{}]+ | (??{ $braces_re }) )* ) # Matched braces only.
\}
}mgx
)
{
my $params = $1;
# parse BAR block
next if $params !~ m{
BAR \s* \{
( (?: [^{}]+ | (??{ $braces_re }) )*? ) # Matched braces only.
\}
}mx;
# SOME CODE
}
Note, I intentionally separated out the declaration of the _re variable and its initialization. There are some versions of perl that will let you declare a recursive regular expression in the same statement as the initialization, but v5.8.9 is not one of them.
Also, if you're comfortable altering your original regex more than just dropping in a replacement for (?PARNO) notation, then the above can be reduced to the following. Also confirmed on v5.16.2:
my $braces_re;
$braces_re = qr{
(?:
(?> [^{}]+ )
| # The following is a "postponed" regular subexpression.
\{ (??{ $braces_re }) \} # Deferred execution enables recursive regex
)*
}sx;
# parse FOO block
while ( $data =~ m{^FOO \s+ (\w+) \s* \{ ( $braces_re ) \} }mgx ) {
my $params = $1;
# parse BAR block
next if $params !~ m{BAR \s* \{ ( $braces_re ) \}}mx;
# SOME CODE
}

Perl Parsing CSV file with embedded commas

I'm parsing a CSV file with embedded commas, and obviously, using split() has a few limitations due to this.
One thing I should note is that the values with embedded commas are surrounded by parentheses, double quotes, or both...
for example:
(Date, Notional),
"Date, Notional",
"(Date, Notional)"
Also, I'm trying to do this without using any modules for certain reasons I don't want to go into right now...
Can anyone help me out with this?
This should do what you need. It works in a very similar way to the code in Text::CSV_PP, but doesn't allow for escaped characters within the field as you say you have none
use strict;
use warnings;
use 5.010;
my $re = qr/(?| "\( ( [^()""]* ) \)" | \( ( [^()]* ) \) | " ( [^"]* ) " | ( [^,]* ) ) , \s* /x;
my $line = '(Date, Notional 1), "Date, Notional 2", "(Date, Notional 3)"';
my #fields = "$line," =~ /$re/g;
say "<$_>" for #fields;
output
<Date, Notional 1>
<Date, Notional 2>
<Date, Notional 3>
Update
Here's a version for older Perls (prior to version 10) that don't have the regex branch reset construct. It produces identical output to the above
use strict;
use warnings;
use 5.010;
my $re = qr/(?: "\( ( [^()""]* ) \)" | \( ( [^()]* ) \) | " ( [^"]* ) " | ( [^,]* ) ) , \s* /x;
my $line = '(Date, Notional 1), "Date, Notional 2", "(Date, Notional 3)"';
my #fields = grep defined, "$line," =~ /$re/g;
say "<$_>" for #fields;
I know you already have a working solution with Borodin's answer, but for the record there is also a simple solution with split (see the results at the bottom of the online demo). This situation sounds very similar to regex match a pattern unless....
#!/usr/bin/perl
$regex = '(?:\([^\)]*\)|"[^"]*")(*SKIP)(*F)|\s*,\s*';
$subject = '(Date, Notional), "Date, Notional", "(Date, Notional)"';
#splits = split($regex, $subject);
print "\n*** Splits ***\n";
foreach(#splits) { print "$_\n"; }
How it Works
The left side of the alternation | matches complete (parentheses) and (quotes), then deliberately fails. The right side matches commas, and we know they are the right commas because they were not matched by the expression on the left.
Possible Refinements
If desired, the parenthess-matching portion could be made recursive to match (nested(parens))
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
I know that this is quite old question, but for completeness I would like to add solution from great book "Mastering Regular Expressions" by Jeffrey Friedl (page 271):
sub parse_csv {
my $text = shift; # record containing comma-separated values
my #fields = ( );
my $field;
chomp($text);
while ($text =~ m{\G(?:^|,)(?:"((?>[^"]*)(?:""[^"]*)*)"|([^",]*))}gx) {
if (defined $2) {
$field = $2;
} else {
$field = $1;
$field =~ s/""/"/g;
}
# print "[$field]";
push #fields, $field;
}
return #fields;
}
Try it against test row:
my $line = q(Ten Thousand,10000, 2710 ,,"10,000",,"It's ""10 Grand"", baby",10K);
my #fields = parse_csv($line);
my $i;
for ($i = 0; $i < #fields; $i++) {
print "$fields[$i],";
}
print "\n";

How can I parse runmqsc command output using Perl?

I am trying devise Perl regex to parse command output from IBM's runmqsc utility.
Each line of output of interest contains one or more attribute/value pairs with format: "ATTRIBUTE(VALUE)". The value for an attribute can be empty, or can contain parenthesis itself. Typically, a maximum of two attribute/value pairs appear on a given line, so the regex is written under this assumption.
Example input to Perl RE:
CHANNEL(TO.IPTWX01) CHLTYPE(CLUSRCVR)
DISCINT(6000) SHORTRTY(10)
TRPTYPE(TCP) DESCR( )
LONGTMR(1200) SCYEXIT( )
CONNAME(NODE(1414)) MREXIT( )
MREXIT( ) CONNAME2(SOME(1416))
TPNAME( ) BATCHSZ(50)
MCANAME( ) MODENAME( )
ALTTIME(00.41.56) SSLPEER()
CONTRIVED() ATTR (00-41-56)
CONTRIVED() DOCTORED()
MSGEXIT( )
I have the following Perl code to capture each attribute/value pair.
Perl Code
my $resplit = qr/\s+([^\s]+(?:\([^)]*\))?)\s?/;
while ( <IN2> )
{ s/[\s\r\n]+$//;
if ( m/^\s(?:$resplit)(?:$resplit)?$/ )
{ my ($one,$two) = ($1,$2);
print "one: $one, two: $two\n";
}
}
Here's the output when the above code is applied to sample input:
one: CHANNEL(TO.IPTWX01), two: CHLTYPE(CLUSRCVR)
one: DISCINT(6000), two: SHORTRTY(10)
one: TRPTYPE(TCP), two: DESCR( )
one: LONGTMR(1200), two: SCYEXIT( )
one: CONNAME(NODE(1414)), two: MREXIT( )
one: MREXIT( ), two: CONNAME2(SOME(1416))
one: TPNAME( ), two: BATCHSZ(50)
one: MCANAME( ), two: MODENAME( )
one: ALTTIME(00.41.56), two: SSLPEER()
one: CONTRIVED(), two: ATTR(00-41-56)
one: CONTRIVED(), two: DOCTORED()
one: MSGEXIT(, two: )
This works great with the exception of the last line in the output
above. I'm really struggling to figure out how
to modify the above expression $resplit to capture the last case.
Can anyone offer any ideas/suggestions on how to make this work or
another approach?
The Text::Balanced module is designed to handle this sort of problem. This approach will handle any number of columns as well.
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);
my ($extracted, $remainder, $prefix);
while ( defined($remainder = <DATA>) ){
while ( Get_paren_text() ){
$prefix =~ s/ //g;
print $prefix, $extracted, "\n";
}
}
sub Get_paren_text {
($extracted, $remainder, $prefix)
= extract_bracketed($remainder, '()', '[\w ]+');
return defined $extracted;
}
__DATA__
CHANNEL(TO.IPTWX01) CHLTYPE(CLUSRCVR) FOO( ( BAR) )
DISCINT(6000) SHORTRTY(10) BIZZ((((BUZZ) ) ) ) )
TRPTYPE(TCP) DESCR( )
LONGTMR(1200) SCYEXIT( )
CONNAME(NODE(1414)) MREXIT( )
MREXIT( ) CONNAME2(SOME(1416))
TPNAME( ) BATCHSZ(50)
MCANAME( ) MODENAME( )
ALTTIME(00.41.56) SSLPEER()
CONTRIVED() ATTR (00-41-56)
CONTRIVED() DOCTORED()
MSGEXIT( )
I wanted to try to use Regexp::Grammars.
So here it is:
#! /opt/perl/bin/perl
use strict;
#use warnings;
use 5.10.1;
use Regexp::Grammars;
my $grammar = qr{
<line>
<token: line>
(?: <[pair]> \s* )+
(?{
my $arr = $MATCH{pair};
local $MATCH = {};
for my $pair( #$arr ){
my($key) = keys %$pair;
my($value) = values %$pair;
$MATCH->{$key} = $value;
}
})
<token: pair>
<attrib> \s* \( \s* <value> \s* \)
(?{
$MATCH = {
$MATCH{attrib} => $MATCH{value}
};
})
<token: attrib>
[^()]*?
<token: value>
(?:
<MATCH=pair> |
[^()]*?
)
}x;
use warnings;
my %attr;
while( my $line = <> ){
$line =~ /$grammar/;
for my $key ( keys %{ $/{line} } ){
$attr{$key} = $/{line}{$key};
}
}
use YAML;
say Dump \%attr;
---
ALTTIME: 00.41.56
ATTR: 00-41-56
BATCHSZ: 50
CHANNEL: TO.IPTWX01
CHLTYPE: CLUSRCVR
CONNAME:
NODE: 1414
CONNAME2:
SOME: 1416
CONTRIVED: ''
DESCR: ''
DISCINT: 6000
DOCTORED: ''
LONGTMR: 1200
MCANAME: ''
MODENAME: ''
MREXIT: ''
MSGEXIT: ''
SCYEXIT: ''
SHORTRTY: 10
SSLPEER: ''
TPNAME: ''
TRPTYPE: TCP
while ( <IN2> ) {
while ( /([A-Z]+)\s*(\((?:[^()]*+|(?2))*\))/g ) {
print "$1$2\n";
}
}
This works for nested parens e.g.
CONNAME(NODE(1414, SOME(1416) ) ) ATTR (00-41-56)
The (?2) part is recursive, the *+ means "don't backtrack" - only works in Perl 5.10 or later; I got this from http://faq.perl.org/perlfaq6.html#Can_I_use_Perl_regul
#!/usr/bin/perl
use strict;
use warnings;
my #parsed;
while ( my $line = <DATA> ) {
while ( $line =~ / ([A-Z0-9]+) \s* \( (.*?) \) \s /gx ) {
push #parsed, { $1 => $2 }
}
}
use Data::Dumper;
print Dumper \#parsed;
__DATA__
CHANNEL(TO.IPTWX01) CHLTYPE(CLUSRCVR)
DISCINT(6000) SHORTRTY(10)
TRPTYPE(TCP) DESCR( )
LONGTMR(1200) SCYEXIT( )
CONNAME(NODE(1414)) MREXIT( )
MREXIT( ) CONNAME2(SOME(1416))
TPNAME( ) BATCHSZ(50)
MCANAME( ) MODENAME( )
ALTTIME(00.41.56) SSLPEER()
CONTRIVED() ATTR (00-41-56)
CONTRIVED() DOCTORED()
MSGEXIT( )