Inline regex replacement in perl - regex

Is there a way to replace text with a regex inline, rather than taking the text from a variable and storing it in a variable?
I'm a perl beginner. I often find myself writing
my $foo = $bar;
$foo =~ s/regex/replacement/;
doStuff($foo)
where I'd really like to write
doStuff($bar->replace(s/regex/replacement/));
or the like, rather than using a temporary variable and three lines.
Is there a way to do this? Obviously when the regex is sufficiently complicated it makes sense to split it out so it can be better explained, but when it's just s/\s//g it feels wrong to clutter the code with additional variables.

You really can't do what you want because the substitution operator returns the number of substitutions it made (1 without /g) if it worked, or an empty string if it didn't. That means if you did this:
doStuff($foo =~ s/regex/replacement/);
The doStuff function would be using either 1 or an empty string as a parameter. There is no reason why the substitution function couldn't return the resultant string instead of just a 1 if it worked. However, it was a design decision from the earliest days of Perl. Otherwise, what would happen with this?
$foo = "widget";
if ($foo =~ s/red/blue/) {
print "We only sell blue stuff and not red stuff!\n";
}
The resulting string is still widget, but the substitution actually failed. However, if the substitution returned the resulting string and not an empty string, the if would still be true.
Then, consider this case:
$bar = "FOO!";
if ($bar =~ s/FOO!//) {
print "Fixed up \'\$bar\'!\n";
}
$bar is now an empty string. If the substitution returned the result, it would return an empty string. Yet the substitution actually succeeded, and I want my if to be true.
In most languages, the substitution function returns the resulting string, and you'd have to do something like this:
if ($bar ne replace("$bar", "/FOO!//")) {
print "Fixed up \'\$bar\'!\n";
}
So, because of a Perl design decision (basically to better mimic awk syntax), there's no easy way to do what you want. However you could have done this:
($foo = $bar) =~ s/regex/replacement/;
doStuff($foo);
That assigns the value of $bar to $foo and then applies the substitution to $foo, all in one statement. $bar would remain unchanged.

Starting with Perl 5.14, you can use non-destructive substitution to get the desired behavior.
Use the /r modifier to do so:
doStuff($bar=~s/regex/replacement/r);
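A minimal sketch of how /r behaves, assuming Perl 5.14 or later. Here do_stuff() is a hypothetical stand-in for the question's doStuff(), and the sample string is made up for illustration:

```perl
use strict;
use warnings;
use 5.014;    # the /r modifier requires Perl 5.14 or later

# do_stuff() is a hypothetical stand-in for the question's doStuff().
sub do_stuff {
    my ($text) = @_;
    return "got <$text>";
}

my $bar = "a b  c";

# /r runs the substitution on a copy and returns the modified copy.
my $result = do_stuff( $bar =~ s/\s//gr );

say $result;    # got <abc>
say $bar;       # a b  c (the original is unchanged)
```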

use Algorithm::Loops "Filter";
# leaves $foo unchanged
doStuff( Filter { s/this/that/ } $foo );

You can use a do { } block to avoid creating a temporary variable in the current scope:
doStuff( do {(my $foo = $bar) =~ s/regex/replacement/; $foo} );

Is this what you want?:
my $foo = 'Replace this with that';
(my $bar = $foo) =~ s/this/that/;
print "Foo: $foo\nBar: $bar\n";
Prints:
Foo: Replace this with that
Bar: Replace that with that

There is yet another way: Write your own function:
sub replace {
my $variable = shift;
my $substring = shift;
# String eval: only pass trusted substitution text to this function.
eval "\$variable =~ s${substring};";
return $variable;
}
doStuff(replace($foo, "/regex/replace/"));
This wouldn't be worth it for a single call, and it would probably just make your code more confusing in that case. However, if you're doing this a dozen or so times, it might make more sense to write your own function to do this.
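If the string eval is a concern, one possible variation is a helper that takes a code block instead of substitution text. This is only a sketch: apply_to_copy() is a hypothetical name, not an existing library function, and it relies on a (&$) prototype so the block can be passed without the sub keyword:

```perl
use strict;
use warnings;

# apply_to_copy() is a hypothetical helper: it runs a code block against a
# copy of the string (held in a localized $_) and returns the modified copy.
sub apply_to_copy(&$) {
    my ($code, $string) = @_;
    local $_ = $string;    # work on a copy, not the caller's variable
    $code->();
    return $_;
}

my $foo      = "this and this";
my $replaced = apply_to_copy { s/this/that/g } $foo;

print "$foo\n";         # this and this
print "$replaced\n";    # that and that
```

The call site ends up looking much like the Algorithm::Loops Filter example, without the extra dependency.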

Related

$& not resolved as part of perl substitution

I have a perl script which searches and replaces data in multiple files. Since more than one word can be replaced in a file, I wrote a function that accepts the search and replace patterns as arrays. I then loop over the arrays in this function and perform the substitution. It works well, but for one particular file I need to prepend something (the character #) to the matched string. Hence, I pass "#\$&" as my replace pattern. It's received properly, but somehow the $& is never resolved. Instead, the operation replaces the matched string with the literal value '#$&'. The same thing works if I directly use #$& in my substitution command in the readFile function. I know we may be able to achieve the result in other ways, but I really want to know why the same replacement pattern works when passed directly while it doesn't work when read as an array element.
I have commented the substitution command that works well for reference. Can anyone please help me spot the problem here ?
my @search= ("host\\s*(replication|all)");
my @replace= ("#\$&");
my $sLine = scalar @search;
my $rLine = scalar @replace;
my $data = ???;
for ( my $i=0; $i < $sLine; $i++)
{
print("\n search = $search[$i] replace = $replace[$i] \n");
#$data =~ s/$search[$i]/#$&/g; ==> this works
$data =~ s/$search[$i]/$replace[$i]/g; #==> this doesn't
}
print($data);
The difference between the working solution and the non-working solution is the same as the difference between
print "#$&"; # Prints `#` and the value of `$&`.
and
print "$replace[$i]"; # Prints the value of `$replace[$i]`.
You can use the following:
use String::Substitution qw( gsub_modify );
for my $i (0..$#search) {
gsub_modify($data, $search[$i], $replace[$i]);
}
This is a more in-depth explanation.
s/$search[$i]/#$&/g
is short for
s/$search[$i]/ "#$&" /eg
which is equivalent to
s/$search[$i]/ "#" . $& /eg # Replaces with `#` and the value of `$&`.
/e causes the replacement expression to be evaluated as Perl code, using its result as the replacement string.
On the other hand,
s/$search[$i]/$replace[$i]/g
is short for
s/$search[$i]/ "$replace[$i]" /eg
which is equivalent to
s/$search[$i]/ $replace[$i] /eg # Replaces with the value of `$replace[$i]`.
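One way to keep the replacements in an array while still getting at the matched text is to store code references and call them under /e. This is a sketch under assumptions, not the answerer's method: the sample $data is invented, and the pattern is wrapped in an extra group so that $1 holds the whole match instead of relying on $&:

```perl
use strict;
use warnings;

my @search = ( "host\\s*(replication|all)" );

# Each replacement is a code reference that receives the matched text.
my @replace = ( sub { my ($matched) = @_; return "#" . $matched; } );

# Sample input, assumed for illustration only.
my $data = "host all all\nhost replication all\n";

for my $i ( 0 .. $#search ) {
    # The outer parentheses capture the whole match in $1; /e evaluates
    # the replacement as code, so the sub is called for every match.
    $data =~ s/($search[$i])/$replace[$i]->($1)/ge;
}

print $data;
```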

Perl regular expressions troubles

I have a variable $rowref->[5] which contains the string:
" 1.72.1.13.3.5 (ISU)"
I am using XML::Twig to modify an XML file, and this variable contains the version number of something. So I want to get rid of the whitespace and the (ISU). I tried to use a substitution and XML::Twig to set the attribute:
$artifact->set_att(version=> $rowref->[5] =~ s/([^0-9\.])//g)
Interestingly what I got in my output was
<artifact [...] version="9"/>
I don't understand what I am doing wrong. I checked with a regular expression tester and it seems fine. Can somebody spot my error?
The return value of s/// is the number of substitutions it made, which in your case is 9. If you are using at least perl 5.14, add the r flag to the substitution:
If the "/r" (non-destructive) option is used then it runs the
substitution on a copy of the string and instead of returning the
number of substitutions, it returns the copy whether or not a
substitution occurred. The original string is never changed when
"/r" is used. The copy will always be a plain string, even if the
input is an object or a tied variable.
Otherwise, go through a temporary variable like this:
my $version = $rowref->[5];
$version =~ s/([^0-9\.])//g;
$artifact->set_att(version => $version);
The regex substitution changes the variable in place but returns the number of substitutions it made (1 without the /g modifier, if it was successful).
use feature 'say';
my $str = 'words 123';
my $ret = $str =~ s/\d/numbers/g;
say "Got $ret. String is now: $str";
You can do the substitution first, $rowref->[5] =~ s/...//;, and then use the changed variable.
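The two behaviours can be compared side by side. This is a small sketch assuming Perl 5.14 or later; the sample string is taken from the question, though its exact leading whitespace is a guess:

```perl
use strict;
use warnings;
use 5.014;    # needed for /r

my $raw = " 1.72.1.13.3.5 (ISU)";

# Without /r: the string is modified in place and the count is returned.
my $copy  = $raw;
my $count = ( $copy =~ s/[^0-9.]//g );

# With /r: the original is untouched and the cleaned copy is returned.
my $version = $raw =~ s/[^0-9.]//gr;

say "count:   $count";
say "in place: $copy";
say "version: $version";
```

Passing the /r form directly to set_att avoids the temporary variable.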

Perl, Assign regex match to scalar

There's an example snippet in Mail::POP3Client with a piece of syntax that I don't understand, neither why nor how it works:
foreach ( $pop->Head( $i ) ) {
/^(From|Subject):\s+/i and print $_, "\n";
}
The regex bit in particular. $_ remains the same after that line but only the match is printed.
An additional question; How could I assign the match of that regex to a scalar of my own so I can use that instead of just print it?
This is actually pretty tricky. It uses Perl's short-circuit evaluation to build a conditional statement. It is the same as writing this:
if (/^(From|Subject):\s+/i) {
print $_, "\n";
}
It works because Perl stops evaluating an and expression as soon as its left-hand side is false. Also, unless otherwise specified, a regex written as /regex/ rather than $somevar =~ /regex/ is applied to the default variable, $_.
You can store it like this:
my $var;
if (/^(From|Subject):\s+/i) {
$var = $_;
}
Or you could use a capture group:
/^((?:From|Subject):\s+)/i
which stores the matched text (the header name, the colon and the following whitespace, not the whole line) in $1.
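Another option is to evaluate the match in list context, where it returns its capture groups directly. A sketch, with a made-up @head array standing in for $pop->Head($i):

```perl
use strict;
use warnings;

# Sample header lines standing in for $pop->Head($i) (assumed data).
my @head = (
    'From: alice@example.com',
    'Subject: Hello',
    'Date: today',
);

my %header;
for my $line (@head) {
    # In list context a successful match returns its capture groups;
    # a failed match returns an empty list, so the condition is false.
    if ( my ($name, $value) = $line =~ /^(From|Subject):\s+(.*)/i ) {
        $header{ lc $name } = $value;
    }
}

print "$header{from}\n";
print "$header{subject}\n";
```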

Uninitialized Backreference in Substitution

Assuming that I must do this substitution using a single substitution, what is the preferred method to avoid this error:
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
With this Perl code:
perl -e 'use strict;use warnings;my $str="a";$str=~s/(a)|(b)/$1foo$2/gsmo;'
The goal here is to either print "afoo" or "foob" depending on what $str contains.
I can use no warnings; but then I am worried I will miss other "real" warnings. I also know that using one pattern makes this convoluted but my actual pattern is much more complicated.
If you care the actual replacements are closer to:
#!perl
my $search = q~(document\.domain.*?</script>)|(</head>)~;
my $search_re = qr/$search/smo;
my $replace = q("$1
<script src=\"/library.js\"></script>
$2");
while (<*.tmpl>) {
my $str = read_file($_);
$str =~ s/$search_re/$replace/gee;
}
But even more complicated, basically the above code just reads from a DB to get the search & replace and then does them to the template. Having to run this script twice with every commit would introduce too much overhead, apparently... so says them...
You could:
my $replace = q("@{[$1||'']}
<script src=\"/library.js\"></script>
@{[$2||'']}");
(using // instead of || on 5.10+)
Still works with /g:
s{(a)|(b)}{ ($1 // '') . 'foo' . ($2 // '') }ge
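A runnable sketch of the defined-or approach, assuming Perl 5.10 or later. It uses brace delimiters because a // operator inside a slash-delimited s/// would be read as the end of the replacement:

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator and say

my @results;
for my $input ( "a", "b" ) {
    my $str = $input;

    # Only one of the two groups takes part in any given match; the //
    # operator supplies '' for the other one, so no "uninitialized value"
    # warning is raised.
    $str =~ s{(a)|(b)}{ ($1 // '') . 'foo' . ($2 // '') }ge;

    push @results, $str;
}

say for @results;    # afoo, then foob
```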
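Well, you can't match both "a" and "b" at once when you specifically say OR (|), so one of the two capture groups is always undefined. Note that $1foo does interpolate as $1 followed by foo (numbered variable names contain only digits), although writing ${1}foo makes that explicit.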
I'm not quite sure what you are saying about overhead, but you do need to check the match in order to do a correct replacement.
s/(a)/$1 . "foo"/ge || s/(b)/"foo" . $1/ge;
This might work. If the first one works, the second won't be executed (short circuit OR).
Similar to ikegami's solution, if you want to hold the replacement in a variable you can call a code reference in s///e passing it the captures.
#!perl
my $search = q~(document\.domain.*?</script>)|(</head>)~;
my $search_re = qr/$search/smo;
my $replace = sub {
my $one = shift || '';
my $two = shift || '';
return qq($one\n<script src="/library.js"></script>\n$two);
};
while (<*.tmpl>) {
my $str = read_file($_);
$str =~ s/$search_re/$replace->($1, $2)/ge;
}

How can I find out what was replaced in a Perl substitution?

Is there any way to find out what was substituted for (the "old" text) after applying the s/// operator? I tried doing:
if (s/(\w+)/new/) {
my $oldTxt = $1;
# ...
}
But that doesn't work. $1 is undefined.
Your code works for me. Copied and pasted from a real terminal window:
$ perl -le '$_ = "*X*"; if (s/(\w+)/new/) { print $1 }'
X
Your problem must be something else.
If you're using 5.10 or later, you don't have to use the potentially performance-killing $&. The ${^MATCH} variable from the /p flag does the same thing but only for the specified regex:
use 5.010;
if( s/abc(\w+)123/new/p ) {
say "I replaced ${^MATCH}"
}
$& does what you want but see the health warning in perlvar
The use of this variable anywhere in a program imposes a considerable performance penalty on all regular expression matches.
If you can find a way to do this without using $&, try that. You could run the regex twice:
my ($match) = /(\w+)/;
if (s/(\w+)/new/) {
my $oldTxt = $match;
# ...
}
You could make the replacement an eval expression:
if (s/(\w+)/$var=$1; "new"/e) { .. do something with $var .. }
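The same /e idea extends to /g when every replaced piece needs to be recorded. A minimal sketch, with an invented sample string and an @old array that is not part of the original answer:

```perl
use strict;
use warnings;

my $text = "alpha beta gamma";
my @old;

# /e evaluates the replacement as code; the code records $1 and then
# yields the replacement string as its last expression.
$text =~ s/(\w+)/push @old, $1; "new"/ge;

print "$text\n";    # new new new
print "@old\n";     # alpha beta gamma
```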
You should be able to use the Perl match variables:
$& Contains the string matched by the last pattern match