Uninitialized Backreference in Substitution - regex

Assuming that I must do this substitution using a single substitution, what is the preferred method to avoid this error:
Use of uninitialized value $2 in concatenation (.) or string at -e line 1.
With this Perl code:
perl -e 'use strict;use warnings;my $str="a";$str=~s/(a)|(b)/$1foo$2/gsmo;'
The goal here is to either print "afoo" or "foob" depending on what $str contains.
I can use no warnings; but then I am worried I will miss other "real" warnings. I also know that using one pattern makes this convoluted but my actual pattern is much more complicated.
If you care, the actual replacements are closer to:
#!perl
my $search = q~(document\.domain.*?</script>)|(</head>)~;
my $search_re = qr/$search/smo;
my $replace = q("$1
<script src=\"/library.js\"></script>
$2");
while (<*.tmpl>) {
my $str = fead_file($_);
$str =~ s/$search_re/$replace/gee;
}
But even more complicated, basically the above code just reads from a DB to get the search & replace and then does them to the template. Having to run this script twice with every commit would introduce too much overhead, apparently... so says them...

You could:
my $replace = q("@{[$1||'']}
<script src=\"/library.js\"></script>
@{[$2||'']}");
(using // instead of || on 5.10+)

Still works with /g:
s{(a)|(b)}{ ($1 // '') . 'foo' . ($2 // '') }ge
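For what it's worth, here is a minimal runnable sketch of that defined-or approach. The helper name add_foo is made up for illustration, and it assumes Perl 5.10+ for the // operator. Note the braces as delimiters: with / as the delimiter, the // inside the replacement would end the expression early.

```perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): applies the single
# substitution from the question without triggering "uninitialized" warnings.
sub add_foo {
    my ($str) = @_;
    # Whichever group did not take part in the match is undef,
    # so default it to the empty string before concatenating.
    $str =~ s{(a)|(b)}{ ($1 // '') . 'foo' . ($2 // '') }ge;
    return $str;
}

print add_foo('a'), "\n";   # afoo
print add_foo('b'), "\n";   # foob
```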

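Well, a single match can't capture both "a" and "b" when you specifically say OR (|), so one of $1 and $2 is always undefined. Also, placing the variable name right next to the text, e.g. $1foo, is easy to misread; ${1}foo or explicit concatenation makes the boundary clear.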
I'm not quite sure what you are saying about overhead, but you do need to check the match in order to do a correct replacement.
s/(a)/$1 . "foo"/ge || s/(b)/"foo" . $1/ge;
This might work. If the first one works, the second won't be executed (short circuit OR).

Similar to ikegami's solution, if you want to hold the replacement in a variable you can call a code reference in s///e passing it the captures.
#!perl
my $search = q~(document\.domain.*?</script>)|(</head>)~;
my $search_re = qr/$search/smo;
my $replace = sub {
my $one = shift || '';
my $two = shift || '';
return qq($one\n<script src="/library.js"></script>\n$two);
};
while (<*.tmpl>) {
my $str = fead_file($_);
$str =~ s/$search_re/$replace->($1, $2)/ge;
}

Related

Perl do substitution in substitution itself

I was doing some regex substitution operation with the html snippet using Perl.
This is how I match the wanted part: (class="p_hw"><a href=")(http://[^<>"]*?xxxx\.com\/[^<>"]*[=/])([^<>"]*)(">(?:<b>)?)(.*?)(?=<)
I need to replace the http:// with entry:// followed by certain parameter value of the http url($3 for that matter) if that value exists in a hash(%hw_f), or else the first word(or phrase) from $5 will be used when it exists in %hw_f. If all conditions are not matched, the snippet will stay unchanged.
I have tried the following:
s#(class="p_hw"><a href=")(http://[^<>"]*?xxxx\.com\/[^<>"]*[=/])([^<>"]*)(">(?:<b>)?)(.*?)(?=<)#
my @n = split(/\,|;/, $5);
my @m = map {s,^\s+|\s+$,,mgr} @n;
my $new = $3 =~ s/^\s+|\s+$//mgr;
my $new2 = $new =~ s/\+/ /mgr;
exists $hw_f{$new2} ? "$1entry://$new2$4$5" : (exists $hw_f{$m[0]} ? "$1entry://$m[0]$4$5" : "$1$2$3$4$5") #eg;
%hw_f is where all conditions will be matched against.
It gives the following error:
Use of uninitialized value $1 in concatenation (.) or string
I need to obtain a new value based on $3 within the substitution, continue with that new value. How could I do that?
I'm not going to try to really fix the logic of what you're trying to accomplish because it's rather ill advised. What I will do is offer some semantic and coding advice.
1: Use Regexp::Common and URI to deal with URLs. It is almost never worth it to write your own regexes. Parsing HTML with regex requires that you seriously know what you're doing. https://metacpan.org/search?q=regexp%3A%3Acommon
2: Always only use {} and // to wrap regex. (A 99% rule)
3: Always immediately copy the numbered variables into meaningfully named my() variables unless the expression is trivial.
4: Modify arrays inplace with postfix foreach.
5: Spread out the code formatting to make it visually appealing.
6: Use sprintf for complicated variable recombinations. It makes it a lot easier to see what variable is used where and for what.
HTH
# 1 2 3 4 5
s{(class="p_hw"><a href=\")(http://[^<>"]*?xxxx\.com/[^<>"]*[=/])([^<>\"]*)(\">(?:<b>)?)(.*?)(?=<)}{
my ($m1, $m2, $m3, $m4, $m5) = ($1, $2, $3, $4, $5);
my @n = split /[,|;]/, $m5;
s/^\s+|\s+$//mg foreach @n;
(my $new = $m3) =~ s/^\s+|\s+$//mg;
(my $new2 = $new) =~ s/\+/ /g;
exists $hw_f{$new2} ?
sprintf "%sentry://%s%s%s", $m1, $new2, $m4, $m5 :
exists $hw_f{$n[0]} ?
sprintf "%sentry://%s%s%s", $m1, $n[0], $m4, $m5 :
"$m1$m2$m3$m4$m5";
}ige;
Update:
while (<DICT>) {
s#(class="p_hw"><a href=")(http://[^<>"]*?wordinfo\.info\/[^<>"]*[=/])([^<>"]*)(">(?:<b>)?)(.*?)(?=<)#
my $one = $1;
my $two = $2;
my $three = $3;
my $four = $4;
my $five = $5;
my @n = split(/\,|;/, $5);
my @m = map {s,^\s+|\s+$,,mgr} @n;
my $new = $3 =~ s/^\s+|\s+$//mgr;
my $new2 = $new =~ s/\+/ /mgr;
exists $hw_f{$new2} ? $one."entry://$new2$four$five" : (exists $hw_f{$m[0]} ? $one."entry://$m[0]$four$five" : "$one$two$three$four$five") #eg;
print $FH $_;
}
After assigning all the capture variables to lexicals before any further regex engine invocation, as @DavidO mentioned in the comments, it finally works. Thanks.
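To illustrate why the copy matters, here is a small sketch under made-up assumptions (the normalize helper and the "key = value" input format are hypothetical, not from the question). Any successful regex inside the replacement block resets $1, $2 and friends, so they are copied to lexicals first.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-case the key and trim the value of a
# "key = value" line, all inside a single s///e.
sub normalize {
    my ($line) = @_;
    $line =~ s{(\w+)\s*=\s*(.*)}{
        # Copy the captures before running any other regex.
        my ($key, $value) = ($1, $2);
        # This inner substitution would otherwise clobber the outer captures.
        $value =~ s/^\s+|\s+$//g;
        uc($key) . '=' . $value;
    }e;
    return $line;
}

print normalize('name =  Bob  '), "\n";   # NAME=Bob
```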
From your post it is not obvious what you are trying to achieve. If you described the problem in the following format it would be easier to understand:
--- Example -----------------------
I extract from web page a snippet with <a href="http:\\....... which I would like to convert/transform into following format <a href="http:\\........
At least in this way we know what is INPUT and what OUTPUT expected.
--- End of the example ------------
When you apply regex with memory it is easier to store remembered values in an array or better hash
use strict;
use warnings;
use Data::Dumper;
my %href;
my $data = shift;
if( $data =~ /<a href="(\w+):\\\\([\w\d\.]+)\\([\w\d\.]+)\\(.+)">([^<]+)</ ) {
@href{qw(protocol dns dir rest desc)} = ($1,$2,$3,$4,$5);
print Dumper(\%href);
} else {
print "No match found\n";
}

How to pass a replacing regex as a command line argument to a perl script

I am trying to write a simple perl script to apply a given regex to a filename among other things, and I am having trouble passing a regex into the script as an argument.
What I would like to be able to do is something like this:
> myscript 's/hi/bye/i' hi.h
bye.h
>
I have produced this code
#!/utils/bin/perl -w
use strict;
use warnings;
my $n_args = $#ARGV + 1;
my $regex = $ARGV[0];
for(my $i=1; $i<$n_args; $i++) {
my $file = $ARGV[$i];
$file =~ $regex;
print "OUTPUT: $file\n";
}
I cannot use qr because apparently it cannot be used on replacing regexes (although my source for this is a forum post so I'm happy to be proved wrong).
I would rather avoid passing the two parts in as separate strings and manually doing the regex in the perl script.
Is it possible to pass the regex as an argument like this, and if so what is the best way to do it?
There's more than one way to do it, I think.
The Evil Way:
As you basically send in a regex expression, it can be evaluated to get the result. Like this:
my @args = ('s/hi/bye/', 'hi.h');
my ($regex, @filenames) = @args;
for my $file (@filenames) {
eval("\$file =~ $regex");
print "OUTPUT: $file\n";
}
Of course, following this way will open you to some very nasty surprises. For example, consider passing this set of arguments:
...
my @args = ('s/hi/bye/; print qq{MINE IS AN EVIL LAUGH!\n}', 'hi.h');
...
Yes, it will laugh at you most evilly.
The Safe Way:
my ($regex_expr, @filenames) = @args;
my ($substr, $replace) = $regex_expr =~ m#^s/((?:[^/]|\\/)+)/((?:[^/]|\\/)+)/#;
for my $file (@filenames) {
$file =~ s/$substr/$replace/;
print "OUTPUT: $file\n";
}
As you can see, we parse the expression given to us into two parts, then use these parts to build a full operator. Obviously, this approach is less flexible, but, of course, it's much more safe.
The Easiest Way:
my ($search, $replace, @filenames) = @args;
for my $file (@filenames) {
$file =~ s/$search/$replace/;
print "OUTPUT: $file\n";
}
Yes, that's right - no regex parsing at all! What happens here is we decided to take two arguments - 'search pattern' and 'replacement string' - instead of a single one. Will it make our script less flexible than the previous one? No, as we still had to parse the regex expression more-or-less regularly. But now the user clearly understands all the data that is given to the command, which is usually quite an improvement.
@args in both examples corresponds to the @ARGV array.
The s/a/b/i is an operator, not simply a regular expression, so you need to use eval if you want it to be interpreted properly.
#!/usr/bin/env perl
use warnings;
use strict;
my $regex = shift;
my $sub = eval "sub { \$_[0] =~ $regex; }";
foreach my $file (@ARGV) {
&$sub($file);
print "OUTPUT: $file\n";
}
The trick here is that I'm substituting this "bit of code" into a string to produce Perl code that defines an anonymous subroutine $_[0] =~ s/a/b/i; (or whatever code you pass it), then using eval to compile that code and give me a code reference I can call from within the loop.
$ test.pl 's/foo/bar/' foo nicefood
OUTPUT: bar
OUTPUT: nicebard
$ test.pl 'tr/o/e/' foo nicefood
OUTPUT: fee
OUTPUT: nicefeed
This is more efficient than putting an eval "\$file =~ $regex;" inside the loop as then it'll get compiled and eval-ed at every iteration rather than just once up-front.
A word of warning about eval - as raina77ow's answer explains, you should avoid eval unless you're 100% sure you are always getting your input from a trusted source...
s/a/b/i is not a regex. It is a regex plus substitution. Unless you use the string eval, making this work might be pretty tough (consider s{a}<b>e and so on).
The trouble is that you are trying to pass a perl operator when all you really need to pass is the arguments:
myscript hi bye hi.h
In the script:
my ($find, $replace, @files) = @ARGV;
...
$file =~ s/$find/$replace/i;
Your code is a bit clunky. This is all you need:
use strict;
use warnings;
my ($find, $replace, @files) = @ARGV;
for my $file (@files) {
$file =~ s/$find/$replace/i;
print "$file\n";
}
Note that this way allows you to use meta characters in the regex, such as \w{2}foo?. This can be both a good thing and a bad thing. To make all characters interpreted literally (disable meta characters), you can use \Q ... \E like so:
... s/\Q$find\E/$replace/i;
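As a rough sketch of the difference \Q ... \E makes, assuming a hypothetical rename_literal helper (the name and sample strings are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first occurrence of $find in $name,
# treating $find as literal text rather than as a pattern.
sub rename_literal {
    my ($find, $replace, $name) = @_;
    # \Q...\E escapes regex meta characters in $find (same effect as quotemeta).
    $name =~ s/\Q$find\E/$replace/i;
    return $name;
}

# Without \Q the dot would match any character and hit "aXb" first.
print rename_literal('a.b', 'x', 'aXb a.b'), "\n";   # aXb x
print rename_literal('hi', 'bye', 'hi.h'), "\n";     # bye.h
```

The replacement side is still interpolated as a plain string here, so $1 and similar inside $replace are not expanded.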

Perl regex strange behaviour

Method 1:
$C_HOME = "$ENV{EO_HOME}\\common\\";
print $C_HOME;
gives C:\work\System11R1\common\
ie The environment variable is getting expanded.
Method 2:
Parse properties file having
C_HOME = $ENV{EO_HOME}\common\
while(<IN>) {
if(m/(.*)\s+=\s+(.*)/)
{
$o{$1}=$2;
}
}
$C_HOME = $o{"C_HOME"};
print $C_HOME;
This gives an output of $ENV{EO_HOME}\common\
ie The environment variable is not getting expanded.
How do I make sure that the environment variable gets expanded in the second case also.
The problem is in the line:
$o{$1}=$2;
Of course perl will not evaluate $2 automatically as it reads it.
If you want, you can evaluate it manually:
$o{$1}=eval($2);
But you must be sure that it is ok from security point of view.
The value of $o{C_HOME} contains the literal string $ENV{EO_HOME}\common\. To get the $ENV-value eval-ed, use eval...
$C_HOME = eval $o{"C_HOME"};
I leave it to you to find out why that will fail, however...
Expression must be evaluated:
$C_HOME = eval($o{"C_HOME"});
Perl expands variables in double-quote-like code strings, not in data.
You have to eval a string to explicitly interpolate variables inside it, but doing so without checking what you are passing to eval is dangerous.
Instead, look for everything you may want to interpolate inside the string and eval those using a regex substitution with the /ee modifier.
This program looks for all references to elements of the %ENV hash in the config value and replaces them. You may want to add support for whitespace wherever Perl allows it ($ ENV { EO_HOME } compiles just fine). It also assigns test values for %ENV which you will need to remove.
use strict;
use warnings;
my %data;
%ENV = ( EO_HOME => 'C:\work\System11R1' );
while (<DATA>) {
if ( my ($key, $val) = m/ (.*) \s+ = \s* (.*) /x ) {
$val =~ s/ ( \$ENV \{ \w+ \} ) / $1 /gxee;
$data{$key} = $val;
}
}
print $data{C_HOME};
__DATA__
C_HOME = $ENV{EO_HOME}\common\
output
C:\work\System11R1\common\
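If you would rather avoid eval altogether, a possible variation is to capture just the variable name and look it up in a hash yourself. This is only a sketch: expand_env is a hypothetical helper, and leaving unknown names untouched is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: expand $ENV{NAME} references via a plain hash lookup,
# so nothing from the config file is ever executed as code.
sub expand_env {
    my ($val, $env) = @_;
    # $1 is the whole reference, $2 the variable name; unknown names stay as-is.
    $val =~ s/(\$ENV\{(\w+)\})/exists $env->{$2} ? $env->{$2} : $1/ge;
    return $val;
}

my %test_env = ( EO_HOME => 'C:\work\System11R1' );   # test value only
print expand_env('$ENV{EO_HOME}\common\\', \%test_env), "\n";
```

In real use you would pass \%ENV instead of the test hash.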

Match regex and assign results in single line of code

I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it?
I want to essentially combine lines 2 and 3 in a single line of code:
$variable = "some string";
$variable =~ /(find something).*/;
$variable = $1;
Is there a shorter/simpler way to do this? Am I missing something?
my($variable) = "some string" =~ /(e\s*str)/;
This works because
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 …).
and because my($variable) = ... (note the parentheses around the scalar) supplies list context to the match.
If the pattern fails to match, $variable gets the undefined value.
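A small sketch of that list-context behaviour, with a fallback for the no-match case. The first_capture helper and its arguments are made up for illustration, and it assumes the pattern contains at least one capture group.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group of $re in $text,
# or $default when the pattern does not match.
sub first_capture {
    my ($text, $re, $default) = @_;
    # The parentheses around $found give the match list context,
    # so it returns the captures instead of a true/false value.
    my ($found) = $text =~ $re;
    return defined $found ? $found : $default;
}

print first_capture('some string', qr/(e\s*str)/, 'none'), "\n";   # e str
print first_capture('some string', qr/(x+)/, 'none'), "\n";        # none
```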
Why do you want it to be shorter? Does it really matter?
$variable = $1 if $variable =~ /(find something).*/;
If you are worried about the variable name or doing this repeatedly, wrap the thing in a subroutine and forget about it:
some_sub( $variable, qr/pattern/ );
sub some_sub { $_[0] = $1 if eval { $_[0] =~ m/$_[1]/ }; $1 };
However you implement it, the point of the subroutine is to make it reuseable so you give a particular set of lines a short name that stands in their place.
Several other answers mention a destructive substitution:
( my $new = $variable ) =~ s/pattern/replacement/;
I tend to keep the original data around, and Perl v5.14 has an /r flag that leaves the original alone and returns a new string with the replacement (instead of the count of replacements):
my $match = $variable =~ s/pattern/replacement/r;
Well, you could say
my $variable;
($variable) = ($variable = "find something soon") =~ /(find something).*/;
or
(my $variable = "find something soon") =~ s/^.*?(find something).*/$1/;
You can do substitution as:
$a = 'stackoverflow';
$a =~ s/(\w+)overflow/$1/;
$a is now "stack"
From Perl Cookbook 2nd ed
6.1 Copying and Substituting Simultaneously
$dst = $src;
$dst =~ s/this/that/;
becomes
($dst = $src) =~ s/this/that/;
I just assumed everyone did it this way, amazed that no one gave this answer.
Almost ....
You can combine the match and retrieve the matched value with a substitution.
$variable =~ s/.*(find something).*/$1/;
AFAIK, You will always have to copy the value though, unless you do not care to clobber the original.
$variable2 = "stackoverflow";
(my $variable1) = ($variable2 =~ /stack(\w+)/);
$variable1 now equals "overflow".
I do this:
#!/usr/bin/perl
$target = "n: 123";
my ($target) = $target =~ /n:\s*(\d+)/g;
print $target; # the var $target now is "123"
Also, to amplify the accepted answer using the ternary operator to allow you to specify a default if there is no match:
my $match = $variable =~ /(*pattern*).*/ ? $1 : *defaultValue*;

Inline regex replacement in perl

Is there a way to replace text with a regex inline, rather than taking the text from a variable and storing it in a variable?
I'm a perl beginner. I often find myself writing
my $foo = $bar;
$foo =~ s/regex/replacement/;
doStuff($foo)
where I'd really like to write
doStuff($bar->replace(s/regex/replacement/));
or the like, rather than using a temporary variable and three lines.
Is there a way to do this? Obviously when the regex is sufficiently complicated it makes sense to split it out so it can be better explained, but when it's just s/\s//g it feels wrong to clutter the code with additional variables.
You really can't do what you want because the substitution function returns either a 1 if it worked or an empty string if it didn't work. That means if you did this:
doStuff($foo =~ s/regex/replacement/);
The doStuff function would be using either 1 or an empty string as a parameter. There is no reason why the substitution function couldn't return the resultant string instead of just a 1 if it worked. However, it was a design decision from the earliest days of Perl. Otherwise, what would happen with this?
$foo = "widget";
if ($foo =~ s/red/blue/) {
print "We only sell blue stuff and not red stuff!\n";
}
The resulting string is still widget, but the substitution actually failed. However, if the substitution returned the resulting string and not an empty string, the if would still be true.
Then, consider this case:
$bar = "FOO!";
if ($bar =~ s/FOO!//) {
print "Fixed up \'\$bar\'!\n";
}
$bar is now an empty string. If the substitution returned the result, it would return an empty string. Yet, the substitution actually succeeded and I want my if to be true.
In most languages, the substitution function returns the resulting string, and you'd have to do something like this:
if ($bar != replace("$bar", "/FOO!//")) {
print "Fixed up \'\$bar''!\n";
}
So, because of a Perl design decision (basically to better mimic awk syntax), there's no easy way to do what you want. However you could have done this:
($foo = $bar) =~ s/regex/replacement/;
doStuff($foo);
That copies $bar into $foo and then applies the substitution to $foo, all in one statement. $bar would remain unchanged.
Starting from perl 5.14 you can use Non-destructive substitution to achieve desired behavior.
Use /r modifier to do so:
doStuff($bar=~s/regex/replacement/r);
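A quick sketch showing that /r hands back the modified copy and leaves the original alone. It assumes Perl 5.14 or later, and strip_spaces is a made-up name for illustration.

```perl
use strict;
use warnings;
use 5.014;   # the /r modifier needs 5.14 or later

# Hypothetical helper: return a copy of the string with all whitespace removed.
sub strip_spaces {
    my ($s) = @_;
    return $s =~ s/\s//gr;   # /r returns the new string instead of a count
}

my $bar = 'a b c';
print strip_spaces($bar), "\n";   # abc
print $bar, "\n";                 # a b c (unchanged)
```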
use Algorithm::Loops "Filter";
# leaves $foo unchanged
doStuff( Filter { s/this/that/ } $foo );
You can use a do { } block to avoid creating a temporary variable in the current scope:
doStuff( do {(my $foo = $bar) =~ s/regex/replacement/; $foo} );
Is this what you want?:
my $foo = 'Replace this with that';
(my $bar = $foo) =~ s/this/that/;
print "Foo: $foo\nBar: $bar\n";
Prints:
Foo: Replace this with that
Bar: Replace that with that
There is yet another way: Write your own function:
sub replace {
my $variable = shift;
my $substring = shift;
eval "\$variable =~ s${substring};";
return $variable
}
doStuff(replace($foo, "/regex/replace/"));
This wouldn't be worth it for a single call, and it would probably just make your code more confusing in that case. However, if you're doing this a dozen or so times, it might make more sense to write your own function to do this.