Perl: using variable in substitution [duplicate] - regex

I would like to do the following:
$find = "start (.*) end";
$replace = "foo \1 bar";
$var = "start middle end";
$var =~ s/$find/$replace/;
I would expect $var to contain "foo middle bar", but it does not work. Neither does:
$replace = 'foo \1 bar';
Somehow I am missing something regarding the escaping.

On the replacement side, you must use $1, not \1.
And you can only do what you want by making replace an evalable expression that gives the result you want and telling s/// to eval it with the /ee modifier like so:
$find="start (.*) end";
$replace='"foo $1 bar"';
$var = "start middle end";
$var =~ s/$find/$replace/ee;
print "var: $var\n";
To see why the "" and double /e are needed, see the effect of the double eval here:
$ perl
$foo = "middle";
$replace='"foo $foo bar"';
print eval('$replace'), "\n";
print eval(eval('$replace')), "\n";
__END__
"foo $foo bar"
foo middle bar
(Though as ikegami notes, a single /e or the first /e of a double e isn't really an eval(); rather, it tells the compiler that the substitution is code to compile, not a string. Nonetheless, eval(eval(...)) still demonstrates why you need to do what you need to do to get /ee to work as desired.)

Deparse tells us this is what is being executed:
$find = 'start (.*) end';
$replace = "foo \cA bar";
$var = 'start middle end';
$var =~ s/$find/$replace/;
However,
/$find/foo \1 bar/
Is interpreted as :
$var =~ s/$find/foo $1 bar/;
Unfortunately it appears there is no easy way to do this.
You can do it with a string eval, but thats dangerous.
The most sane solution that works for me was this:
$find = "start (.*) end";
$replace = 'foo \1 bar';
$var = "start middle end";
sub repl {
my $find = shift;
my $replace = shift;
my $var = shift;
# Capture first
my #items = ( $var =~ $find );
$var =~ s/$find/$replace/;
for( reverse 0 .. $#items ){
my $n = $_ + 1;
# Many More Rules can go here, ie: \g matchers and \{ }
$var =~ s/\\$n/${items[$_]}/g ;
$var =~ s/\$$n/${items[$_]}/g ;
}
return $var;
}
print repl $find, $replace, $var;
A rebuttal against the ee technique:
As I said in my answer, I avoid evals for a reason.
$find="start (.*) end";
$replace='do{ print "I am a dirty little hacker" while 1; "foo $1 bar" }';
$var = "start middle end";
$var =~ s/$find/$replace/ee;
print "var: $var\n";
this code does exactly what you think it does.
If your substitution string is in a web application, you just opened the door to arbitrary code execution.
Good Job.
Also, it WON'T work with taints turned on for this very reason.
$find="start (.*) end";
$replace='"' . $ARGV[0] . '"';
$var = "start middle end";
$var =~ s/$find/$replace/ee;
print "var: $var\n"
$ perl /tmp/re.pl 'foo $1 bar'
var: foo middle bar
$ perl -T /tmp/re.pl 'foo $1 bar'
Insecure dependency in eval while running with -T switch at /tmp/re.pl line 10.
However, the more careful technique is sane, safe, secure, and doesn't fail taint. ( Be assured tho, the string it emits is still tainted, so you don't lose any security. )

As others have suggested, you could use the following:
my $find = 'start (.*) end';
my $replace = 'foo $1 bar'; # 'foo \1 bar' is an error.
my $var = "start middle end";
$var =~ s/$find/$replace/ee;
The above is short for the following:
my $find = 'start (.*) end';
my $replace = 'foo $1 bar';
my $var = "start middle end";
$var =~ s/$find/ eval($replace) /e;
I prefer the second to the first since it doesn't hide the fact that eval(EXPR) is used. However, both of the above silence errors, so the following would be better:
my $find = 'start (.*) end';
my $replace = 'foo $1 bar';
my $var = "start middle end";
$var =~ s/$find/ my $r = eval($replace); die $# if $#; $r /e;
But as you can see, all of the above allow for the execution of arbitrary Perl code. The following would be far safer:
use String::Substitution qw( sub_modify );
my $find = 'start (.*) end';
my $replace = 'foo $1 bar';
my $var = "start middle end";
sub_modify($var, $find, $replace);

# perl -de 0
$match="hi(.*)"
$sub='$1'
$res="hi1234"
$res =~ s/$match/$sub/gee
p $res
1234
Be careful, though. This causes two layers of eval to occur, one for each e at the end of the regex:
$sub --> $1
$1 --> final value, in the example, 1234

I would suggest something like:
$text =~ m{(.*)$find(.*)};
$text = $1 . $replace . $2;
It is quite readable and seems to be safe. If multiple replace is needed, it is easy:
while ($text =~ m{(.*)$find(.*)}){
$text = $1 . $replace . $2;
}

#!/usr/bin/perl
$sub = "\\1";
$str = "hi1234";
$res = $str;
$match = "hi(.*)";
$res =~ s/$match/$1/g;
print $res
This got me the '1234'.

See THIS previous SO post on using a variable on the replacement side of s///in Perl. Look both at the accepted answer and the rebuttal answer.
What you are trying to do is possible with the s///ee form that performs a double eval on the right hand string. See perlop quote like operators for more examples.
Be warned that there are security impilcations of evaland this will not work in taint mode.

I did not manage to make the most popular answers work.
The ee method complained when my replacement string contained several consecutive backreferences.
Kent Fredric's answer only replaced the first match, and I need my search and replace to be global. I did not figure out a way to make it replace all matches that didn't cause other issues. For example, I tried running the method recursively until it no longer caused the string to change, but that causes an infinite loop if the replacement string contains the search string, whereas a regular global replacement does not do that.
I attempted to come up with a solution of my own using plain old eval:
eval '$var =~ s/' . $find . '/' . $replace . '/gsu;';
Of course, this allows for code injection. But as far as I know, the only way to escape the regex query and inject code is to insert two forward slashes in $find or one in $replace, followed by a semi-colon, after which you can add add code. For example, if I set the variables this way:
my $find = 'foo';
my $replace = 'bar/; print "You\'ve just been hacked!\n"; #';
The evaluated code is this:
$var =~ s/foo/bar/; print "You've just been hacked!\n"; #/gsu;';
So what I do is make sure the strings don't contain any unescaped forward slashes.
First, I copy the strings into dummy strings.
my $findTest = $find;
my $replaceTest = $replace;
Then, I remove all escaped backslashes (backslash pairs) from the dummy strings. This allows me to find forward slashes that are not escaped, without falling into the trap of considering a forward slash escaped if it's preceded by an escaped backslash. For example: \/ contains an escaped forward slash, but \\/ contains a literal forward slash, because the backslash is escaped.
$findTest =~ s/\\\\//gmu;
$replaceTest =~ s/\\\\//gmu;
Now if any forward slash that is not preceded by a backslash remains in the strings, I throw a fatal error, as that would allow the user to insert arbitrary code.
if ($findTest =~ /(?<!\\)\// || $replaceTest =~ /(?<!\\)\//)
{
print "String must not contain unescaped slashes.\n";
exit 1;
}
Then I eval.
eval '$var =~ s/' . $find . '/' . $replace . '/gsu;';
I'm not an expert at preventing code injection, but I'm the only one using my script, so I'm content using this solution without fully knowing if it's vulnerable. But as far as I know, it may be, so if anyone knows if there is or isn't any way to inject code into this, please provide your insight in a comment.

I'm not certain on what it is you're trying to achieve. But maybe you can use this:
$var =~ s/^start/foo/;
$var =~ s/end$/bar/;
I.e. just leave the middle alone and replace the start and end.

Related

How to capture every match in a global regex substitution?

I realize it is possible to achieve this with a slight workaround, but I am hoping there is a simpler way (since I often make use of this type of expression).
Given the example string:
my $str = "An example: sentence!*"
A regex can be used to match each punctuation mark and capture them in an array.
Thereafter, I can simply repeat the regex and replace the matches as in the following code:
push (#matches, $1), while ($str =~ /([\*\!:;])/);
$str =~ s/([\*\!:;])//g;
Would it be possible to combine this into a single step in Perl where substitution occurs globally while also keeping tabs on the replaced matches?
You can embed code to run in your regular expression:
my #matches;
my $str = 'An example: sentence!*';
$str =~ s/([\*\!:;])(?{push #matches, $1})//g;
But with a match this simple, I'd just do the captures and substitution separately.
Yes, it's possible.
my #matches;
$str =~ s/[*!:;]/ push #matches, $&; "" /eg;
However, I'm not convinced that the above is faster or clearer than the following:
my #matches = $str =~ /[*!:;]/g;
$str =~ tr/*!:;//d;
Use:
my $str = "An example: sentence!*";
my #matches = $str =~ /([\*\!:;])/g;
say Dumper \#matches;
$str =~ tr/*!:;//d;
Output:
$VAR1 = [
':',
'!',
'*'
];
Is that what you're looking for ?
my ($str, #matches) = ("An example: sentence!*");
#first method :
($str =~ s/([\*\!:;])//g) && push(#matches, $1);
#second method :
push(#matches, $1) while ($str =~ s/([\*\!:;])//g);
Try:
my $str = "An example: sentence!*";
push(#mys, ($str=~m/([^\w\s])/g));
print join "\n", #mys;
Thanks.

Add html to perl Regex

Am trying to replace all `` with a HTML code tag
replace:
$string = "Foo `FooBar` Bar";
with:
$string = "Foo <code>FooBar</code> Bar";
i tried these
$pattern = '`(.*?)`';
my $replace = "<code/>$&</code>";
$subject =~ s/$pattern/$replace/im;
#And
$subject =~ s/$pattern/<code/>$&</code>/im;
but none of them works.
Assuming you meant $string instead of $subject...
use strict;
use warnings;
use v5.10;
my $string = "Foo `FooBar` Bar";
my $pattern = '`(.*?)`';
my $replace = "<code/>$&</code>";
$string =~ s{$pattern}{$replace}im;
say $string;
This results in...
$ perl ~/tmp/test.plx
Use of uninitialized value $& in concatenation (.) or string at /Users/schwern/tmp/test.plx line 9.
Foo <code/></code> Bar
There's some problems here. First, $& means the string matched by the last match. That would be all of `FooBar`. You just want FooBar which is inside capturing parens. You get that with $1. See Extracting Matches in the Perl Regex Tutorial.
Second is $& and $1 are variables. If you put them in double quotes like $replace = "<code/>$&</code>" then Perl will immediately interpolate them. That means $replace is <code/></code>. This is where the warning comes from. If you want to use $1 it has to go directly into the replace.
Finally, when quoting regexes it's best to use qr{}. That does special regex quoting. It avoids all sorts of quoting issues.
Put it all together...
use strict;
use warnings;
use v5.10;
my $string = "Foo `FooBar` Bar";
my $pattern = qr{`(.*?)`};
$string =~ s{$pattern}{<code/>$1</code>}im;
say $string;

How to replace stuff in a Perl regex

I have a string $text and want to modify it with a regex. The string contains multiple sections like <NAME>John</NAME>.
I want to search for those sections, which I would normally do with something like
$text =~ m/<NAME>(.*?)<\/NAME>/g
but then make sure that there are no leading and trailing blanks and no leading non-word characters, which I would normally ensure with something like
$temp =~ s/^\s+|\s+$//g; # trim leading and trailing whitespaces
$temp = s/^\W*//g; # remove all leading non-word chars
Now my question is: How do I actually make this happen? Is it possible to use a s/// regex instead of the m//?
This is possible in a single substitution, but it's unnecessarily complex. I suggest you do a two-tier substitution using a executable replacement.
my $text = '<NAME> %^John^%
</NAME>';
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{
(my $new = $1) =~ s/\A\s+|\s+\z//g;
$new =~ s/\A\W+//;
$new;
}eg;
print $text;
output
<NAME>John^%</NAME>
This is even simpler if you have version 14 or later of Perl 5, and want to use the non-destructive ( /r modifier) substitution mode.
$text =~ s{ (?<=<NAME>) ([^<>]*) (?=</NAME>) }{ $1 =~ s/\A\s+|\s+\z//gr =~ s/\A\W+//r }exg;
If I understand correctly, what you want to do is merely "clean up" the text inside the tag (insofar as it's possible to "parse" XML using regular expressions). This should do the trick:
$text =~ s/(<NAME>)\s*\W*(.*?)\s*(<\/NAME>)/$1$2$3/sgi;

How to save a matching regex's value to a variable in one line of perl?

I'm sure there is a very simple way to do this, but whenever I search for examples, I get the two step method. Here is what I typically do:
$data =~ m/(my_query)/;
$result = $1;
I want to set $result in the same line as the regex and never use $1. Thanks!
my($result) = ($data =~ m/(my_query)/);
As noted in a comment, the my($result) needs the parentheses to provide an array context for the result of the match. In an array context, you get the $1 etc allocated to the array. You could use #result = ($data =~ m/(my_query)/);; you could omit the my but you would need to keep the parentheses; you could subscript the array using $result = ($data =~ m/(my_query)/)[0]; (thanks ysth). The key words here are 'array context'.
Examples:
$ perl -e '$data="abcdef";my($result)=($data =~ m/(cde)/); print "$result\n"'
cde
$ perl -e '$data="abcdef"; ($result)=($data =~ m/(cde)/); print "$result\n"'
cde
$ perl -e '$data="abcdef"; #result =($data =~ m/(cde)/); print "$result[0]\n"'
cde
$ perl -e '$data="abcdef"; $result =($data =~ m/(cde)/)[0]; print "$result\n"'
cde
$
You didn't specify what problem you want to avoid, but there is definitely one to avoid. The following code assigns something unknown to $result when the pattern doesn't match:
$data =~ /(my_query)/;
my $result = $1;
You could use a conditional to assign something useful to $result when the pattern doesn't match
my $result = $data =~ /(my_query)/ ? $1 : undef;
Or you could take advantage of the fact that m// in list context returns what it captured.
my ($result) = $data =~ /(my_query)/;
$data="abcde";
$data =~ s/(cde)/$result=$1/e;

How can I use a variable in the replacement side of the Perl substitution operator?

I would like to do the following:
$find = "start (.*) end";
$replace = "foo \1 bar";
$var = "start middle end";
$var =~ s/$find/$replace/;
I would expect $var to contain "foo middle bar", but it does not work. Neither does:
$replace = 'foo \1 bar';
Somehow I am missing something regarding the escaping.
On the replacement side, you must use $1, not \1.
And you can only do what you want by making replace an evalable expression that gives the result you want and telling s/// to eval it with the /ee modifier like so:
$find="start (.*) end";
$replace='"foo $1 bar"';
$var = "start middle end";
$var =~ s/$find/$replace/ee;
print "var: $var\n";
To see why the "" and double /e are needed, see the effect of the double eval here:
$ perl
$foo = "middle";
$replace='"foo $foo bar"';
print eval('$replace'), "\n";
print eval(eval('$replace')), "\n";
__END__
"foo $foo bar"
foo middle bar
(Though as ikegami notes, a single /e or the first /e of a double e isn't really an eval(); rather, it tells the compiler that the substitution is code to compile, not a string. Nonetheless, eval(eval(...)) still demonstrates why you need to do what you need to do to get /ee to work as desired.)
Deparse tells us this is what is being executed:
$find = 'start (.*) end';
$replace = "foo \cA bar";
$var = 'start middle end';
$var =~ s/$find/$replace/;
However,
/$find/foo \1 bar/
Is interpreted as :
$var =~ s/$find/foo $1 bar/;
Unfortunately it appears there is no easy way to do this.
You can do it with a string eval, but thats dangerous.
The most sane solution that works for me was this:
$find = "start (.*) end";
$replace = 'foo \1 bar';
$var = "start middle end";
sub repl {
my $find = shift;
my $replace = shift;
my $var = shift;
# Capture first
my #items = ( $var =~ $find );
$var =~ s/$find/$replace/;
for( reverse 0 .. $#items ){
my $n = $_ + 1;
# Many More Rules can go here, ie: \g matchers and \{ }
$var =~ s/\\$n/${items[$_]}/g ;
$var =~ s/\$$n/${items[$_]}/g ;
}
return $var;
}
print repl $find, $replace, $var;
A rebuttal against the ee technique:
As I said in my answer, I avoid evals for a reason.
$find="start (.*) end";
$replace='do{ print "I am a dirty little hacker" while 1; "foo $1 bar" }';
$var = "start middle end";
$var =~ s/$find/$replace/ee;
print "var: $var\n";
this code does exactly what you think it does.
If your substitution string is in a web application, you just opened the door to arbitrary code execution.
Good Job.
Also, it WON'T work with taints turned on for this very reason.
$find="start (.*) end";
$replace='"' . $ARGV[0] . '"';
$var = "start middle end";
$var =~ s/$find/$replace/ee;
print "var: $var\n"
$ perl /tmp/re.pl 'foo $1 bar'
var: foo middle bar
$ perl -T /tmp/re.pl 'foo $1 bar'
Insecure dependency in eval while running with -T switch at /tmp/re.pl line 10.
However, the more careful technique is sane, safe, secure, and doesn't fail taint. ( Be assured tho, the string it emits is still tainted, so you don't lose any security. )
As others have suggested, you could use the following:
my $find = 'start (.*) end';
my $replace = 'foo $1 bar'; # 'foo \1 bar' is an error.
my $var = "start middle end";
$var =~ s/$find/$replace/ee;
The above is short for the following:
my $find = 'start (.*) end';
my $replace = 'foo $1 bar';
my $var = "start middle end";
$var =~ s/$find/ eval($replace) /e;
I prefer the second to the first since it doesn't hide the fact that eval(EXPR) is used. However, both of the above silence errors, so the following would be better:
my $find = 'start (.*) end';
my $replace = 'foo $1 bar';
my $var = "start middle end";
$var =~ s/$find/ my $r = eval($replace); die $# if $#; $r /e;
But as you can see, all of the above allow for the execution of arbitrary Perl code. The following would be far safer:
use String::Substitution qw( sub_modify );
my $find = 'start (.*) end';
my $replace = 'foo $1 bar';
my $var = "start middle end";
sub_modify($var, $find, $replace);
# perl -de 0
$match="hi(.*)"
$sub='$1'
$res="hi1234"
$res =~ s/$match/$sub/gee
p $res
1234
Be careful, though. This causes two layers of eval to occur, one for each e at the end of the regex:
$sub --> $1
$1 --> final value, in the example, 1234
I would suggest something like:
$text =~ m{(.*)$find(.*)};
$text = $1 . $replace . $2;
It is quite readable and seems to be safe. If multiple replace is needed, it is easy:
while ($text =~ m{(.*)$find(.*)}){
$text = $1 . $replace . $2;
}
#!/usr/bin/perl
$sub = "\\1";
$str = "hi1234";
$res = $str;
$match = "hi(.*)";
$res =~ s/$match/$1/g;
print $res
This got me the '1234'.
See THIS previous SO post on using a variable on the replacement side of s///in Perl. Look both at the accepted answer and the rebuttal answer.
What you are trying to do is possible with the s///ee form that performs a double eval on the right hand string. See perlop quote like operators for more examples.
Be warned that there are security impilcations of evaland this will not work in taint mode.
I did not manage to make the most popular answers work.
The ee method complained when my replacement string contained several consecutive backreferences.
Kent Fredric's answer only replaced the first match, and I need my search and replace to be global. I did not figure out a way to make it replace all matches that didn't cause other issues. For example, I tried running the method recursively until it no longer caused the string to change, but that causes an infinite loop if the replacement string contains the search string, whereas a regular global replacement does not do that.
I attempted to come up with a solution of my own using plain old eval:
eval '$var =~ s/' . $find . '/' . $replace . '/gsu;';
Of course, this allows for code injection. But as far as I know, the only way to escape the regex query and inject code is to insert two forward slashes in $find or one in $replace, followed by a semi-colon, after which you can add add code. For example, if I set the variables this way:
my $find = 'foo';
my $replace = 'bar/; print "You\'ve just been hacked!\n"; #';
The evaluated code is this:
$var =~ s/foo/bar/; print "You've just been hacked!\n"; #/gsu;';
So what I do is make sure the strings don't contain any unescaped forward slashes.
First, I copy the strings into dummy strings.
my $findTest = $find;
my $replaceTest = $replace;
Then, I remove all escaped backslashes (backslash pairs) from the dummy strings. This allows me to find forward slashes that are not escaped, without falling into the trap of considering a forward slash escaped if it's preceded by an escaped backslash. For example: \/ contains an escaped forward slash, but \\/ contains a literal forward slash, because the backslash is escaped.
$findTest =~ s/\\\\//gmu;
$replaceTest =~ s/\\\\//gmu;
Now if any forward slash that is not preceded by a backslash remains in the strings, I throw a fatal error, as that would allow the user to insert arbitrary code.
if ($findTest =~ /(?<!\\)\// || $replaceTest =~ /(?<!\\)\//)
{
print "String must not contain unescaped slashes.\n";
exit 1;
}
Then I eval.
eval '$var =~ s/' . $find . '/' . $replace . '/gsu;';
I'm not an expert at preventing code injection, but I'm the only one using my script, so I'm content using this solution without fully knowing if it's vulnerable. But as far as I know, it may be, so if anyone knows if there is or isn't any way to inject code into this, please provide your insight in a comment.
I'm not certain on what it is you're trying to achieve. But maybe you can use this:
$var =~ s/^start/foo/;
$var =~ s/end$/bar/;
I.e. just leave the middle alone and replace the start and end.