How can I find out what was replaced in a Perl substitution?

Is there any way to find out what was substituted for (the "old" text) after applying the s/// operator? I tried doing:
if (s/(\w+)/new/) {
my $oldTxt = $1;
# ...
}
But that doesn't work. $1 is undefined.

Your code works for me. Copied and pasted from a real terminal window:
$ perl -le '$_ = "*X*"; if (s/(\w+)/new/) { print $1 }'
X
Your problem must be something else.

If you're using 5.10 or later, you don't have to use the potentially performance-killing $&. The ${^MATCH} variable from the /p flag does the same thing but only for the specified regex:
use 5.010;
if( s/abc(\w+)123/new/p ) {
say "I replaced ${^MATCH}"
}

$& does what you want, but see the health warning in perlvar:
The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches.
If you can find a way to do this without using $&, try that. You could run the regex twice:
my ($match) = /(\w+)/;
if (s/(\w+)/new/) {
my $oldTxt = $match;
# ...
}

You could make the replacement an eval expression:
if (s/(\w+)/$var=$1; "new"/e) { .. do something with $var .. }
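A minimal sketch of the /e approach, extended with /g so that every replaced piece is collected. The input string and the names $text, @old and $new are made up for illustration; they are not from the question.

```perl
use strict;
use warnings;

my $text = '*X* and *Y*';   # hypothetical input
my @old;                    # collects each piece of text that gets replaced

# /e runs the replacement as Perl code; its last value ("new") becomes
# the replacement text. /g repeats this for every match.
(my $new = $text) =~ s/(\w+)/push @old, $1; "new"/ge;

print "$new\n";
print "replaced: @old\n";
```

Copying into $new first leaves $text untouched, which is handy when the original is still needed.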

You should be able to use the Perl match variables:
$& Contains the string matched by the last pattern match

Related

Perl: how do you assign a variable to a regex match result?

How do you create a $scalar from the result of a regex match?
Is there any way, once the script has matched the regex, to assign the match to a variable so it can be used later on, outside of the block?
I.e., if $regex_result = blah blah, then do something.
I understand that I should make the regex as non-greedy as possible.
#!/usr/bin/perl
use strict;
use warnings;
# use diagnostics;
use Win32::OLE;
use Win32::OLE::Const 'Microsoft Outlook';
my @Qmail;
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\\s\*owner \#/";
my $outlook = Win32::OLE->new('Outlook.Application')
or warn "Failed Opening Outlook.";
my $namespace = $outlook->GetNamespace("MAPI");
my $folder = $namespace->Folders("test")->Folders("Inbox");
my $items = $folder->Items;
foreach my $msg ( $items->in ) {
if ( $msg->{Subject} =~ m/^(.*test alert) / ) {
my $name = $1;
print " processing Email for $name \n";
push @Qmail, $msg->{Body};
}
}
for (@Qmail) {
next unless /$regex|^\s*description/i;
print; # prints what i want ie lines that start with owner and description
}
print $sentence; # prints ^\\s\*offense \ # not lines that start with owner.
One way is to verify a match occurred.
use strict;
use warnings;
my $str = "hello what world";
my $match = 'no match found';
my $what = 'no what found';
if ( $str =~ /hello (what) world/ )
{
$match = $&;
$what = $1;
}
print '$match = ', $match, "\n";
print '$what = ', $what, "\n";
Use the Perl variables below to meet your requirements:
$` = The string preceding whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already.
$& = The string matched by the last pattern match.
$' = The string following whatever was matched by the last pattern match, not counting patterns matched in nested blocks that have been exited already. For example:
$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi
The match of a regex is stored in special variables (as well as some more readable variables if you specify the regex to do so and use the /p flag).
For the whole last match you're looking at the $& variable (or $MATCH, its long name, if you load the English module). This is covered in the manual page perlvar.
So say you wanted to store your last for loop's matches in an array called @matches, you could write the loop (and for some reason I think you meant it to be a foreach loop) as:
use English;
my @matches = ();
foreach (@Qmail) {
next unless /$regex|^\s*description/i;
push @matches, $MATCH;
print;
}
I think you have a problem in your code. I'm not sure of the original intention but looking at these lines:
my $regex = "^\\s\*owner \#";
my $sentence = $regex =~ "/^\s*owner #/";
I'll step through that as:
Assign $regex to the string ^\s*owner #.
Assign $sentence the result of matching $regex against the regular expression /^s*owner #/ (which won't match; if it did, $sentence would be 1, but since it didn't, it's false).
I think. I'm actually not exactly certain what that line will do or was meant to do.
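If the intent was to keep the pattern in a variable and save what it matched for later use, one possible sketch is below. It assumes the pattern is stored with qr// and that the interesting text follows "owner #". The pattern, the sample lines and the names @lines and @owners are invented for illustration.

```perl
use strict;
use warnings;

my $regex = qr/^\s*owner\s+#\s*(\S+)/;   # assumed pattern; group 1 is the owner
my @lines = ('  owner # alice', 'description: disk full', 'unrelated');   # made-up data

my @owners;
for my $line (@lines) {
    # In list context a successful match returns its captures,
    # so the result can be assigned straight to a variable.
    if (my ($owner) = $line =~ $regex) {
        push @owners, $owner;
    }
}
print "owners: @owners\n";
```

Because @owners is declared outside the loop, the matched text stays available after the block ends.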
I'm not quite sure what part of the match you want: the captures, or something else. I've written Regexp::Result which you can use to grab all the captures etc. on a successful match, and Regexp::Flow to grab multiple results (including success statuses). If you just want numbered captures, you can also use Data::Munge
You can do the following:
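use feature 'say';
my $str = "hello world";
my ($hello, $world) = $str =~ /(hello)|(what)/;
say "[$_]" for ($hello, $world);
As you see, $hello contains "hello"; $world is undefined because the second group did not take part in the match.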
If you have an older perl on your system like me (perl 5.18 or earlier) and you use $`, $& or $' as in codequestor's answer above, it will slow down your program.
Instead, you can use your regex pattern with the modifier /p, and then check these 3 variables: ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} for your matching results.
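As a small illustration of the /p variables, here is the same abc:def:ghi example written with them. The variable names $pre, $match and $post are only for this sketch.

```perl
use strict;
use warnings;

my $str = 'abcdefghi';
my ($pre, $match, $post);
if ($str =~ /def/p) {
    # Copy the values right away; the next successful match overwrites them.
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
}
print "$pre:$match:$post\n";   # prints abc:def:ghi
```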

Perl, Assign regex match to scalar

There's an example snippet in Mail::POP3Client containing a piece of syntax that I don't understand; I can't see why or how it works:
foreach ( $pop->Head( $i ) ) {
/^(From|Subject):\s+/i and print $_, "\n";
}
The regex bit in particular. $_ remains the same after that line but only the match is printed.
An additional question: how could I assign the match of that regex to a scalar of my own, so I can use that instead of just printing it?
This is actually pretty tricky. It makes use of Perl's short-circuiting feature to form a conditional statement. It is the same as saying this:
if (/^(From|Subject):\s+/i) {
print $_, "\n";
}
It works because Perl stops evaluating an "and" expression as soon as one operand evaluates to false. Unless otherwise specified, a regex in the form /regex/ (instead of $somevar =~ /regex/) is applied to the default variable, $_.
You can store it like this:
my $var;
if (/^(From|Subject):\s+/i) {
$var = $_;
}
Or you could use a capture group:
/^((?:From|Subject):\s+)/i
which stores the whole match (the header name, the colon and the whitespace after it) in $1.
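To keep the header name and its value in your own variables, a sketch along these lines may help. It assumes the header lines arrive as a plain list; @head stands in for $pop->Head($i), and %header is a name made up for this example.

```perl
use strict;
use warnings;

# Made-up header lines standing in for $pop->Head($i).
my @head = ('From: alice@example.com', 'Subject: Hello', 'X-Mailer: whatever');

my %header;
for (@head) {
    # A list assignment used as a condition is true only if the match succeeded.
    if (my ($name, $value) = /^(From|Subject):\s+(.*)/i) {
        $header{lc $name} = $value;
    }
}
print "$_ => $header{$_}\n" for sort keys %header;
```

The second capture group is an addition here; the original pattern stops after the whitespace and does not capture the value.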

Perl regex strange behaviour

Method 1:
$C_HOME = "$ENV{EO_HOME}\\common\\";
print $C_HOME;
gives C:\work\System11R1\common\
i.e. the environment variable is getting expanded.
Method 2:
Parse properties file having
C_HOME = $ENV{EO_HOME}\common\
while(<IN>) {
if(m/(.*)\s+=\s+(.*)/)
{
$o{$1}=$2;
}
}
$C_HOME = $o{"C_HOME"};
print $C_HOME;
This gives an output of $ENV{EO_HOME}\common\
i.e. the environment variable is not getting expanded.
How do I make sure that the environment variable gets expanded in the second case as well?
The problem is in the line:
$o{$1}=$2;
Of course, Perl will not evaluate $2 automatically as it reads it.
If you want, you can evaluate it manually:
$o{$1}=eval($2);
But you must be sure that this is OK from a security point of view.
The value of $o{C_HOME} contains the literal string $ENV{EO_HOME}\common\. To get the $ENV value evaluated, use eval:
$C_HOME = eval $o{"C_HOME"};
I leave it to you to find out why that will fail, however...
Expression must be evaluated:
$C_HOME = eval($o{"C_HOME"});
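A hedged illustration of why a bare string eval does not work on this particular value: the text from the properties file is not a valid Perl expression (a variable followed by a backslash and a bareword), so eval returns undef and sets $@. The EO_HOME value below is a test value, and $raw, $result and $err are names chosen for this sketch.

```perl
use strict;
use warnings;

local $ENV{EO_HOME} = 'C:\work\System11R1';   # test value only
my $raw = '$ENV{EO_HOME}\common\\';            # the text as read from the file

my $result = eval $raw;    # string eval: compiles $raw as Perl source
my $err    = $@;           # save the error before anything can reset it

if (defined $result) {
    print "expanded: $result\n";
} else {
    print "eval failed: $err";
}
```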
Perl expands variables in double-quote-like code strings, not in data.
You have to eval a string to explicitly interpolate variables inside it, but doing so without checking what you are passing to eval is dangerous.
Instead, look for everything you may want to interpolate inside the string and eval those using a regex substitution with the /ee modifier.
This program looks for all references to elements of the %ENV hash in the config value and replaces them. You may want to add support for whitespace wherever Perl allows it ($ ENV { EO_HOME } compiles just fine). It also assigns test values for %ENV which you will need to remove.
use strict;
use warnings;
my %data;
%ENV = ( EO_HOME => 'C:\work\System11R1' );
while (<DATA>) {
if ( my ($key, $val) = m/ (.*) \s+ = \s* (.*) /x ) {
$val =~ s/ ( \$ENV \{ \w+ \} ) / $1 /gxee;
$data{$key} = $val;
}
}
print $data{C_HOME};
__DATA__
C_HOME = $ENV{EO_HOME}\common\
Output:
C:\work\System11R1\common\
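An alternative sketch that avoids string eval altogether: capture only the variable name and look it up in %ENV with a single /e. This is a variation on the program above, not the answerer's code. The EO_HOME value is a test value, and $val and $expanded are names picked for illustration.

```perl
use strict;
use warnings;

local $ENV{EO_HOME} = 'C:\work\System11R1';   # test value only
my $val = '$ENV{EO_HOME}\common\\';            # config value as read from the file

# $1 is the whole reference, $2 just the name. Nothing from the file is
# compiled as code; unknown names are left in place unchanged.
(my $expanded = $val) =~ s/(\$ENV\{(\w+)\})/exists $ENV{$2} ? $ENV{$2} : $1/ge;

print "$expanded\n";
```

The trade-off is that only the exact $ENV{NAME} form is recognised, which is usually what you want for a config file.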

Opposite of (foo|bar|baz)

I'd like a regex to match everything but a few specific options within a broader expression.
The following example will match test_foo.pl or test_bar.pl or test_baz.pl:
/test_(foo|bar|baz)\.pl/
But I'd like just the opposite:
match test_.*\.pl except for where .* = (foo|bar|baz)
I'm kind of limited in my options for this because this is not directly into a perl program, but an argument to cloc, a program that counts lines of code (that happens to be written in perl). So I'm looking for an answer that can be done in one regex, not multiple chained together.
You should be able to accomplish this by using a negative lookahead:
/test_(?!foo|bar|baz).*\.pl/
This will fail if foo, bar, or baz immediately follows test_.
Note that this could still match something like test_notfoo.pl, and would fail on test_fool.pl. If you do not want this behavior, please clarify by adding some examples of what exactly should and should not match.
If you want to accept something like test_fool.pl or test_bart.pl, then you could change it to the following:
/test_(?!(foo|bar|baz)\.pl).*\.pl/
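To see the difference between the two lookahead patterns, a quick check like the following can be run. The file names are made up, and $loose, $strict and %result are names used only in this sketch.

```perl
use strict;
use warnings;

# Rejects any name where foo, bar or baz directly follows test_.
my $loose  = qr/test_(?!foo|bar|baz).*\.pl/;
# Rejects only when foo, bar or baz is directly followed by .pl.
my $strict = qr/test_(?!(?:foo|bar|baz)\.pl).*\.pl/;

my %result;
for my $name (qw( test_foo.pl test_fool.pl test_notfoo.pl test_me.pl )) {
    $result{$name} = [ ($name =~ $loose ? 1 : 0), ($name =~ $strict ? 1 : 0) ];
    printf "%-15s loose=%d strict=%d\n", $name, @{ $result{$name} };
}
```

Only test_fool.pl is treated differently by the two patterns.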
#!/usr/bin/env perl
use strict; use warnings;
my $pat = qr/\Atest_.+(?<!foo|bar|baz)[.]pl\z/;
while (my $line = <DATA>) {
chomp $line;
printf "%s %s\n", $line, $line =~ $pat ? 'matches' : "doesn't match";
}
__DATA__
test_bar.pl
test_foo.pl
test_baz.pl
test baz.pl
0test_bar.pl
test_me.pl
test_me_too.txt
Output:
test_bar.pl doesn't match
test_foo.pl doesn't match
test_baz.pl doesn't match
test baz.pl doesn't match
0test_bar.pl doesn't match
test_me.pl matches
test_me_too.txt doesn't match
(?:(?!STR).)*
is to
STR
as
[^CHAR]
is to
CHAR
So you want
if (/^test_(?:(?!foo|bar|baz).)*\.pl\z/s)
More readable:
my %bad = map { $_ => 1 } qw( foo bar baz );
if (/^test_(.*)\.pl\z/s && !$bad{$1})
Hmm, I might have misunderstood your question. Anyway, maybe this is helpful ...
You would negate the match operator. For example:
perl -lwe "print for grep ! m/(lwp|archive).*\.pl/, glob q(*.pl)"
# Note you'd use single-quotes on Linux but double-quotes on Windows.
# Nothing to do with Perl, just different shells (bash vs cmd.exe).
The ! negates the match. The above is shorthand for:
perl -lwe "print for grep ! ($_ =~ m/(lwp|archive).*\.pl/), glob q(*.pl)"
Which can also be written using the negated match operator !~, as follows:
perl -lwe "print for grep $_ !~ m/(lwp|archive).*\.pl/, glob q(*.pl)"
In case you're wondering, the glob is simply used to get an input list of filenames as per your example. I just substituted another match pattern suitable for the files I had handy in a directory.

How do I assign the result of a regex match to a new variable, in a single line?

I want to match and assign to a variable in just one line:
my $abspath='/var/ftp/path/to/file.txt';
$abspath =~ m#/var/ftp/(.*)$#;
my $relpath=$1;
I'm sure it must be easy.
my ($relpath) = $abspath =~ m#/var/ftp/(.*)$#;
In list context the match returns the values of the groups.
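A short sketch of why the parentheses around $relpath matter. The second path and the names $matched and $none are invented for the comparison.

```perl
use strict;
use warnings;

my $abspath = '/var/ftp/path/to/file.txt';

my ($relpath) = $abspath =~ m#/var/ftp/(.*)$#;   # list context: the first capture
my $matched   = $abspath =~ m#/var/ftp/(.*)$#;   # scalar context: only true or false
my ($none)    = '/home/me/file.txt' =~ m#/var/ftp/(.*)$#;   # failed match: undef

print "relpath: $relpath\n";
print "matched: $matched\n";
print "none is ", (defined $none ? 'defined' : 'undef'), "\n";
```

Without the parentheses the variable only records whether the match succeeded, so it is worth checking for undef after a list-context assignment.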
Obligatory Clippy: "Hi! I see you are doing path manipulation in Perl. Do you want to use Path::Class instead?"
use Path::Class qw(file);
my $abspath = file '/var/ftp/path/to/file.txt';
my $relpath = $abspath->relative('/var/ftp');
# returns "path/to/file.txt" in string context
You can accomplish it with the match and replace operator:
(my $relpath = $abspath ) =~ s#/var/ftp/(.*)#$1# ;
This code assigns $abspath to $relpath and then applies the regex on it.
Edit: Qtax's answer is more elegant if you just need simple matches. If you ever need complex substitutions (as I usually do), just use my expression.
With Perl 5.14 or later you can also use the /r (non-destructive substitution) modifier:
perl -E 'my $abspath = "/var/ftp/path/to/file.txt";
my $relpath = $abspath =~ s{/var/ftp/}{}r;
say "abspath: $abspath - relpath: $relpath"'
See "New Features of Perl 5.14: Non-destructive Substitution" for more examples.
As you just want to remove the beginning of the string you could optimize the expression:
(my $relpath = $abspath) =~ s#^/var/ftp/##;
Or even:
my $relpath = substr($abspath, 9);
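The hard-coded 9 is the length of "/var/ftp/". A slightly more defensive sketch, assuming the prefix is kept in a variable ($prefix is a name chosen for this example), derives the offset and checks that the path really starts with it:

```perl
use strict;
use warnings;

my $abspath = '/var/ftp/path/to/file.txt';
my $prefix  = '/var/ftp/';    # assumed base directory

# Strip the prefix only when the path really starts with it;
# otherwise leave the path as it is.
my $relpath = index($abspath, $prefix) == 0
    ? substr($abspath, length $prefix)
    : $abspath;

print "$relpath\n";
```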