Perl regex strange behaviour

Perl regex strange behaviour - regex

Method 1:
$C_HOME = "$ENV{EO_HOME}\\common\\";
print $C_HOME;
gives C:\work\System11R1\common\
ie The environment variable is getting expanded.
Method 2:
Parse properties file having
C_HOME = $ENV{EO_HOME}\common\
while(<IN>) {
if(m/(.*)\s+=\s+(.*)/)
{
$o{$1}=$2;
}
}
$C_HOME = $o{"C_HOME"};
print $C_HOME;
This gives a output of $ENV{EO_HOME}\common\
ie The environment variable is not getting expanded.
How do I make sure that the environment variable gets expanded in the second case also.

The problem is in the line:
$o{$1}=$2;
Of course perl will not evaluate $2 automatically as it read it.
If you want, you can evaluate it manually:
$o{$1}=eval($2);
But you must be sure that it is ok from security point of view.

the value of $o{C_HOME} contains the literal string $ENV{C_HOME}\common\. To get the $ENV-value eval-ed, use eval...
$C_HOME = eval $o{"C_HOME"};
I leave it to you to find out why that will fail, however...

Expression must be evaluated:
$C_HOME = eval($o{"C_HOME"});

Perl expands variables in double-quote-like code strings, not in data.
You have to eval a string to explicity interpolate variables inside it, but doing so without checking what you are passing to eval is dangerous.
Instead, look for everything you may want to interpolate inside the string and eval those using a regex substitution with the /ee modifier.
This program looks for all references to elements of the %ENV hash in the config value and replaces them. You may want to add support for whitespace wherever Perl allows it ($ ENV { EO_HOME } compiles just fine). It also assigns test values for %ENV which you will need to remove.
use strict;
use warnings;
my %data;
%ENV = ( EO_HOME => 'C:\work\System11R1' );
while (<DATA>) {
if ( my ($key, $val) = m/ (.*) \s+ = \s* (.*) /x ) {
$val =~ s/ ( \$ENV \{ \w+ \} ) / $1 /gxee;
$data{$key} = $val;
}
}
print $data{C_HOME};
__DATA__
C_HOME = $ENV{EO_HOME}\common\
output
C:\work\System11R1\common\

Related

Perl how do you assign a varanble to a regex match result

How do you create a $scalar from the result of a regex match?
Is there any way that once the script has matched the regex that it can be assigned to a variable so it can be used later on, outside of the block.
IE. If $regex_result = blah blah then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my #Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push #Qmail, $msg->{Body};
}
}
for(#Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.

One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";

Use Below Perl variables to meet your requirements -
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = Contains the string matched by the last pattern match
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blockes that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi

The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $MATCH (or $& for short) variable. This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called #matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
my #matches = ();
foreach (#Qmail) {
next unless /$regex|^\s*description/i;
push #matches_in_qmail $MATCH
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regexto the string ^\s*owner #.
Assign $sentence to value of running a match within $regex with the regular expression /^s*owner $/ (which won't match, if it did $sentence will be 1 but since it didn't it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.

I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge

You can do the following:
my $str ="hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for($hello,$world);
As you see $hello contains "hello".

If you have older perl on your system like me, perl 5.18 or earlier, and you use $ $& $' like codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.

perl replacing serialized strings from sql dump

I'm having to replace fqdn's inside a SQL dump for website migration purposes. I've written a perl filter that's supposed to take STDIN, replace the serialized strings containing the domain name that's supposed to be replaced, replace it with whatever argument is passed into the script, and output to STDOUT.
This is what I have so far:
my $search = $ARGV[0];
my $replace = $ARGV[1];
my $offset_s = length($search);
my $offset_r = length($replace);
my $regex = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };
while (<STDIN>) {
my #fs = split(';', $_);
foreach (#fs) {
chomp;
if (m#$regex#g) {
my ( $len, $extra, $str ) = ( $1, $2, $3 );
my $new_len = $len - $offset_s + $offset_r;
$str =~ eval { s/$search/$replace/ };
print 's:' . $new_len . ':' . $extra . $str . '\"'."\n";
}
}
}
The filter gets passed data that may look like this (this is taken from a wordpress dump, but we're also supposed to accommodate drupal dumps:
INSERT INTO `wp_2_options` VALUES (1,'siteurl','http://to.be.replaced.com/wordpress/','yes'),(125,'dashboard_widget_options','
a:2:{
s:25:\"dashboard_recent_comments\";a:1:{
s:5:\"items\";i:5;
}
s:24:\"dashboard_incoming_links\";a:2:{
s:4:\"home\";s:31:\"http://to.be.replaced.com/wordpress\";
s:4:\"link\";s:107:\"http://blogsearch.google.com/blogsearch?scoring=d&partner=wordpress&q=link:http://to.be.replaced.com/wordpress/\";
}
}
','yes'),(148,'theme_175','
a:1:{
s:13:\"courses_image\";s:37:\"http://to.be.replaced.com/files/image.png\";
}
','yes')
The regex works if I don't have any periods in my $search. I've tried escaping the periods, i.e. domain\.to\.be\.replaced, but that didn't work. I'm probably doing this either in a very roundabout way or missing something obvious. Any help would be greatly appreciated.

There is no need to evaluate (eval) your regular expression because of including variables in them. Also, to avoid the special meaning of metacharacters of those variables like $search, escape them using quotemeta() function or including the variable between \Q and \E inside the regexp. So instead of:
my $regex = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };
Use:
my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)(\Q$search\E.*)\\\"};
or
my $quoted_search = quotemeta $search;
my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)($quoted_search.*)\\\"};
And the same advice for this line:
$str =~ eval { s/$search/$replace/ };

you have to double the escape char \ in your $search variable for the interpolated string to contain the escaped periods.
i.e. domain\.to\.be\.replaced -> domain.to.be.replaced (not wanted)
while domain\\.to\\.be\\.replaced -> domain\.to\.be\.replaced (correct).

I'm not sure your perl regex would replace the DNS in string matching several times the old DNS (in the same serialized string).
I made a gist with a script using bash, sed and one big perl regex for this same problem. You may give it a try.
The regex I use is something like that (exploded for lisibility, and having -7 as the known difference between domain names lengths):
perl -n -p -i -e '1 while s#
([;|{]s:)
([0-9]+)
:\\"
(((?!\\";).)*?)
(domain\.to\.be\.replaced)
(.*?)
\\";#"$1".($2-7).":\\\"$3new.domain.tld$6\\\";"#ge;' file
Which is maybe not the best one but at least it seems to de the job. The g option manages lines containing several serialized strings to cleanup and the while loop redo the whole job until no replacement occurs in serilized strings (for strings containing several occurences of the DNS). I'm not fan enough of regex to try a recursive one.

Uninitialized Backreference in Substitution

Assuming that I must do this substitution using a single substitution, what is the preferred method to avoid this error:
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
With this Perl code:
perl -e 'use strict;use warnings;my $str="a";$str=~s/(a)|(b)/$1foo$2/gsmo;'
The goal here is to either print "afoo" or "foob" depending on what $str contains.
I can use no warnings; but then I am worried I will miss other "real" warnings. I also know that using one pattern makes this convoluted but my actual pattern is much more complicated.
If you care the actual replacements are closer to:
#!perl
my $search = q~(document\.domain.*?</script>)|(</head>)~;
my $search_re = qr/$search/smo;
my $replace = q("$1
<script src=\"/library.js\"></script>
$2");
while (<*.tmpl>) {
my $str = fead_file($_);
$str =~ s/$search_re/$replace/gee;
}
But even more complicated, basically the above code just reads from a DB to get the search & replace and then does them to the template. Having to run this script twice with every commit would introduce too much overhead, apparently... so says them...

You could:
my $replace = q("#{[$1||'']}
<script src=\"/library.js\"></script>
#{[$2||'']}");
(using // instead of || on 5.10+)

Still works with /g:
s/(a)|(b)/ ($1 // '') . 'foo' . ($2 // '') /ge

Well, you can't find both "a" and "b" when you specifically say OR (|). Also, you cannot concatenate the strings by placing the variable name next to the text, e.g. $1foo.
I'm not quite sure what you are saying about overhead, but you do need to check the match in order to do a correct replacement.
s/(a)/$1 . "foo"/ge || s/(b)/"foo" . $1/ge;
This might work. If the first one works, the second won't be executed (short circuit OR).

Similar to ikegami's solution, if you want to hold the replacement in a variable you can call a code reference in s///e passing it the captures.
#!perl
my $search = q~(document\.domain.*?</script>)|(</head>)~;
my $search_re = qr/$search/smo;
my $replace = sub {
my $one = shift || '';
my $two = shift || '';
return qq($one\n<script src="/library.js"></script>\n$two);
}
while (<*.tmpl>) {
my $str = fead_file($_);
$str =~ s/$search_re/$replace->($1, $2)/ge;
}

Matching n parentheses in perl regex

I've got some data that I'm parsing in Perl, and will be adding more and more differently formatted data in the near future. What I would like to do is write an easy-to-use function, that I could pass a string and a regex to, and it would return anything in parentheses. It would work something like this (pseudocode):
sub parse {
$data = shift;
$regex = shift;
$data =~ eval ("m/$regex/")
foreach $x ($1...$n)
{
push (#ra, $x);
}
return \#ra;
}
Then, I could call it like this:
#subs = parse ($data, '^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)');
As you can see, there's a couple of issues with this code. I don't know if the eval would work, the 'foreach' definitely wouldn't work, and without knowing how many parentheses there are, I don't know how many times to loop.
This is too complicated for split, so if there's another function or possibility that I'm overlooking, let me know.
Thanks for your help!

In list context, a regular expression will return a list of all the parenthesized matches.
So all you have to do is:
my #matches = $string =~ /regex (with) (parens)/;
And assuming that it matched, #matches will be an array of the two capturing groups.
So using your regex:
my #subs = $data =~ /^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/;
Also, when you have long regexes, Perl has the x modifier, which goes after the closing regex delimiter. The x modifier allows you to put white-space and newlines inside the regex for increased readability.
If you are worried about the capturing groups that might be zero length, you can pass the matches through #subs = grep {length} #subs to filter them out.

Then, I could call it like this:
#subs = parse($data,
'^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)');
Instead, call it like:
parse($data,
qr/^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/);
Further, your task would be made simpler if you can use named captures (i.e. Perl 5.10 and later). Here is an example:
#!/usr/bin/perl
use strict; use warnings;
my %re = (
id => '(?<id> [0-9]+ )',
name => '(?<name> \w+ )',
value => '(?<value> [0-9]+ )',
);
my #this = (
'123,one:12',
'456,two:21',
);
my #that = (
'one:[12],123',
'two:[21],456',
);
my $this_re = qr/$re{id} , $re{name} : $re{value}/x;
my $that_re = qr/$re{name} : \[$re{value}\] , $re{id} /x;
use YAML;
for my $d ( #this ) {
print Dump [ parse($d, $this_re) ];
}
for my $d ( #that ) {
print Dump [ parse($d, $that_re) ];
}
sub parse {
my ($d, $re) = #_;
return unless $d =~ $re;
return my #result = #+{qw(id name value)};
}
Output:
---
- 123
- one
- 12
---
- 456
- two
- 21
---
- 123
- one
- 12
---
- 456
- two
- 21

You are trying to parse a complex expression with a regex - which is an insufficient tool for the job. Recall that regular expressions cannot parse higher grammars. For intuition, any expression which might be nested cannot be parsed with regex.

When you want to find text inside of pairs of parenthesis, you want to use Text::Balanced.
But, that is not what you want to do, so it will not help you.

How can I find out what was replaced in a Perl substitution?

Is there any way to find out what was substituted for (the "old" text) after applying the s/// operator? I tried doing:
if (s/(\w+)/new/) {
my $oldTxt = $1;
# ...
}
But that doesn't work. $1 is undefined.

Your code works for me. Copied and pasted from a real terminal window:
$ perl -le '$_ = "*X*"; if (s/(\w+)/new/) { print $1 }'
X
Your problem must be something else.

If you're using 5.10 or later, you don't have to use the potentially-perfomance-killing $&. The ${^MATCH} variable from the /p flag does the same thing but only for the specified regex:
use 5.010;
if( s/abc(\w+)123/new/p ) {
say "I replaced ${^MATCH}"
}

$& does what you want but see the health warning in perlvar
The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches.
If you can find a way to do this without using $&, try that. You could run the regex twice:
my ($match) = /(\w+)/;
if (s/(\w+)/new/) {
my $oldTxt = $match;
# ...
}

You could make the replacement an eval expression:
if (s/(\w+)/$var=$1; "new"/e) { .. do something with $var .. }

You should be able to use the Perl match variables:
$& Contains the string matched by the last pattern match

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Perl regex strange behaviour - regex

The problem is in the line: $o{$1}=$2; Of course perl will not evaluate $2 automatically as it read it. If you want, you can evaluate it manually: $o{$1}=eval($2); But you must be sure that it is ok from security point of view.

the value of $o{C_HOME} contains the literal string $ENV{C_HOME}\common\. To get the $ENV-value eval-ed, use eval... $C_HOME = eval $o{"C_HOME"}; I leave it to you to find out why that will fail, however...

Expression must be evaluated: $C_HOME = eval($o{"C_HOME"});

Related

Perl how do you assign a varanble to a regex match result

perl replacing serialized strings from sql dump

Uninitialized Backreference in Substitution

Matching n parentheses in perl regex

How can I find out what was replaced in a Perl substitution?

Categories

Resources