Perl regular expressions troubles - regex

I have a variable $rowref->[5] which contains the string:
" 1.72.1.13.3.5 (ISU)"
I am using XML::Twig to modify an XML file, and this variable contains the version number of something. So I want to get rid of the whitespace and the (ISU). I tried to use a substitution and XML::Twig to set the attribute:
$artifact->set_att(version=> $rowref->[5] =~ s/([^0-9\.])//g)
Interestingly what I got in my output was
<artifact [...] version="9"/>
I don't understand what I am doing wrong. I checked with a regular expression tester and it seems fine. Can somebody spot my error?

The return value of s/// is the number of substitutions it made, which in your case is 9. If you are using at least perl 5.14, add the r flag to the substitution:
If the "/r" (non-destructive) option is used then it runs the
substitution on a copy of the string and instead of returning the
number of substitutions, it returns the copy whether or not a
substitution occurred. The original string is never changed when
"/r" is used. The copy will always be a plain string, even if the
input is an object or a tied variable.
Otherwise, go through a temporary variable like this:
my $version = $rowref->[5];
$version =~ s/([^0-9\.])//g;
$artifact->set_att(version => $version);
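For the /r route quoted above, a minimal sketch may help. It assumes Perl 5.14 or later and uses the sample string from the question; the variable names $raw and $version are only illustrative.

```perl
use 5.014;
use strict;
use warnings;

my $raw = " 1.72.1.13.3.5 (ISU)";

# With /r the substitution returns the modified copy instead of a count,
# and $raw itself is left untouched.
my $version = $raw =~ s/[^0-9.]//gr;

print "version=$version\n";
print "raw=$raw\n";
```

The same expression could be passed straight to set_att, since its value is the cleaned string rather than the number of substitutions.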

The regex substitution changes the variable in place but returns the number of substitutions it made (1 without the /g modifier, if it was successful).
use feature 'say';
my $str = 'words 123';
my $ret = $str =~ s/\d/numbers/g;
say "Got $ret. String is now: $str";
You can do the substitution first, $rowref->[5] =~ s/...//;, and then use the changed variable.

Related

Replacing with Named Captures and Precompiled Regular Expressions in Perl

I'm trying to compile a set of substitution regexes but I can't figure out how to delay interpolation of the capture variables in the replacement scalar I'm setting aside; here's a simple contrived example:
use strict;
use warnings;
my $from = "quick";
my $to = "zippy";
my $find = qr/${from} (?<a>(fox|dog))/;
my $repl = "$to $+{a}"; # Use of uninitialized value in concatenation (.) or string
my $s0 = "The quick fox...\n";
$s0 =~ s/${find}/${repl}/;
print($s0);
This doesn't work because repl is interpolated immediately and elicits "Use of uninitialized value in concatenation (.) or string"
If I use non-interpolating '' quotes it doesn't interpolate in the actual substitution so I get "The zippy $+{a}..."
Is there a trick to setting aside a replacement scalar that contains capture references?
You are getting the warning because you are using $+{a} before performing the match. qr// doesn't perform any matching; it simply compiles the pattern. It's s/// that performs the match.
You presumably meant to use
my $repl = "$to \$+{a}";
But that simply outputs
The zippy $+{a}...
You could use the following:
my $find = qr/quick (?<a>fox|dog)/;
my $s0 = "The quick fox...\n";
$s0 =~ s/$find/zippy $+{a}/;
print($s0);
But that hard codes the replacement expression. If you want this code to be dynamic, then what you are building is a template system.
I don't know of any template system with your specific desired syntax.
If you're ok with using the positional variables ($1) instead of named ones ($+{a}), you can use String::Substitution.
use String::Substitution qw( sub_modify );
my $find = qr/quick (?<a>fox|dog)/; # Or simply qr/\Q$from\E (fox|dog)/
my $repl = "zippy \$1";
my $s0 = "The quick fox...\n";
sub_modify($s0, $find, $repl);
print($s0);
The qr// only compiles a pattern. It does not perform a match, so it does not set anything in %+. Hence, the uninitialized warnings.
However, you can do that in the substitution so you don't need to prepare the replacement ahead of time:
s/$find/$to $+{a}/;
However, if you don't know what you want your replacement to be, you can eval code in the replacement side of the substitution that will then be the replacement. Here's a simple addition:
s/$find/ 2 + 2 /e;
You'd get the sum as the replacement:
The 4 jumped over the lazy dog
But here's the rub: That's code and it can do whatever code can do. How you construct that is very important and should never use unsanitized user input.
If you didn't know the string you wanted to put in there, you can construct it beforehand and store it in the variable you use in the replacement side. However, you are making Perl code to eval, so it needs to be a valid Perl string. The double quotes are part of the eval that you will eval later:
my $replacement = '"$to $+{a}"';
s/$find/$replacement/;
Like that, you get the literal string value from $replacement:
The "$to $+{a}" jumped over the lazy dog
Adding the /e means that we evaluate the replacement side as code:
s/$find/$replacement/e;
But, that code is $replacement, and ends up giving us the same result because it's just its string value:
The "$to $+{a}" jumped over the lazy dog
Now here's the fun part. We can eval again! Add another /e and the substitution will eval the first time, then take that result and eval it again:
$s0 =~ s/${find}/$replacement/ee;
The first round of the eval gets the literal text value of $replacement, which is "$to $+{a}" (including the double quotes). The second round takes "$to $+{a}" and evals that, filling in the variables with the values in the current lexical scope. The %+ is populated by the substitution already. Now you have your result:
The zippy fox jumped over the lazy dog
However, this isn't a trick you should pull out lightly. There might be a better way to attack your problem. You reach for this sort of thing only when nothing else can be bent to your will.
You also have to be very careful that you do what you intend in the string that you construct. You are creating new Perl code. If you are using any sort of outside data that you didn't supply, someone can trick your program into running code that you didn't intend.
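The /e versus /ee behaviour described above can be sketched in one short script. The input sentence is assumed from the sample outputs shown in this answer, and $once and $twice are illustrative names; everything here is trusted, hard-coded data.

```perl
use strict;
use warnings;

my $to          = "zippy";
my $find        = qr/quick (?<a>fox|dog)/;
my $replacement = '"$to $+{a}"';   # single quotes: nothing is interpolated yet
my $s0          = "The quick fox jumped over the lazy dog";

# One /e: the code is just `$replacement`, so its string value is inserted.
(my $once = $s0) =~ s/$find/$replacement/e;

# Two /e: the string value is evaluated again as Perl code, which fills in
# $to and %+ from the current scope and the current match.
(my $twice = $s0) =~ s/$find/$replacement/ee;

print "$once\n";
print "$twice\n";
```

Because the second evaluation runs whatever is in $replacement as code, this only makes sense when that string is fully under your control.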
There are three good ways to do dynamic regex substitution at runtime:
String interpolation of variables s///
Callback for code execution s///e
Embedded code constructs in the regex.
See the examples below.
Normally a callback form, either via a function or Embedded regex code is used when logic is required to construct a replacement.
Otherwise, use a simple string interpolation on the replacement side.
use strict;
use warnings;
my $s0 = "";
my ($from, $to) = ("quick", "zippy") ;
sub getRepl {
    my ($grp1, $grp2) = @_;
    if ( $grp1 eq $from ) {
        return "<$to $grp2>";
    }
    else {
        return "< $grp2>";
    }
}
my $find = qr/(\Q${from}\E) (fox|dog)/;
# ======================================
# Substitution via string interpolation
$s0 = "The quick dog...\n";
$s0 =~ s/$find/[$to $2]/;
print $s0;
# ======================================
# Substitution via callback (eval)
$s0 = "The quick dog...\n";
$s0 =~ s/$find/ getRepl($1,$2) /e;
print $s0;
# ==================================================
# Substitution via regex embedded code constructs
my $repl = "";
my $RxCodeEmbed = qr/(\Q${from}\E)(?{$repl = '(' . $to}) (fox|dog)(?{$repl .= ' ' . $^N . ')'})/;
$s0 = "The quick dog...\n";
$s0 =~ s/$RxCodeEmbed/$repl/;
print $s0;
Outputs
The [zippy dog]...
The <zippy dog>...
The (zippy dog)...
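If the goal is to set the replacement aside ahead of time, as in the original question, the callback form can also be stored as a code reference. This is only a sketch: the name $repl and the choice to pass the capture in as an argument are assumptions, not something taken from the answers above.

```perl
use strict;
use warnings;

my $to   = "zippy";
my $find = qr/quick (?<a>fox|dog)/;

# Stored in advance; nothing is interpolated until the sub is called.
my $repl = sub {
    my ($animal) = @_;
    return "$to $animal";
};

my $s0 = "The quick fox...\n";

# /e evaluates the replacement as code, so the sub runs once per match
# and receives the named capture from that match.
$s0 =~ s/$find/ $repl->($+{a}) /e;
print $s0;
```

Unlike /ee, this never evaluates a string as code, so it stays safe even if the surrounding data is not trusted.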

$& not resolved as part of perl substitution

I have a perl script which searches and replaces data in multiple files. Since more than one word can be replaced in a file, I wrote a function that accepts the search and replace patterns as arrays. I then loop over the arrays in this function and perform the substitution. It works well, but for one particular file I need to prepend something (the character #) to the matched string. Hence, I pass "#\$&" as my replace pattern. It's received properly, but somehow the $& is never resolved. Instead the operation replaces the matched string with the literal value '#$&'. The same thing works if I use #$& directly in my substitution command in the readFile function. I know we may be able to achieve the result in other ways, but I really want to know why the same replacement pattern works when written directly while it doesn't work when read from an array element.
I have commented the substitution command that works well for reference. Can anyone please help me spot the problem here ?
my @search= ("host\\s*(replication|all)");
my @replace= ("#\$&");
my $sLine = scalar @search;
my $rLine = scalar @replace;
my $data = ???;
for ( my $i=0; $i < $sLine; $i++)
{
print("\n search = $search[$i] replace = $replace[$i] \n");
#$data =~ s/$search[$i]/#$&/g; ==> this works
$data =~ s/$search[$i]/$replace[$i]/g; #==> this doesn't
}
print($data);
The difference between the working solution and the non-working solution is the same as the difference between
print "#$&"; # Prints `#` and the value of `$&`.
and
print "$replace[$i]"; # Prints the value of `$replace[$i]`.
You can use the following:
use String::Substitution qw( gsub_modify );
for my $i (0..$#search) {
gsub_modify($data, $search[$i], $replace[$i]);
}
This is a more in-depth explanation.
s/$search[$i]/#$&/g
is short for
s/$search[$i]/ "#$&" /eg
which is equivalent to
s/$search[$i]/ "#" . $& /eg # Replaces with `#` and the value of `$&`.
/e causes the replacement expression to be evaluated as Perl code, using its result as the replacement string.
On the other hand,
s/$search[$i]/$replace[$i]/g
is short for
s/$search[$i]/ "$replace[$i]" /eg
which is equivalent to
s/$search[$i]/ $replace[$i] /eg # Replaces with the value of `$replace[$i]`.
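The difference can be reproduced in a few lines. This is a sketch with made-up sample data (the original $data is not shown in the question), and the $prefix variant at the end is one possible workaround rather than something proposed above.

```perl
use strict;
use warnings;

my $pattern = qr/host\s*(?:replication|all)/;
my $data    = "host all\nhost replication\n";   # assumed sample input

# Replacement text held in a variable: interpolated once, so the
# characters '#$&' are inserted literally.
my $stored = '#$&';
(my $from_variable = $data) =~ s/$pattern/$stored/g;

# Replacement written in the source: $& is interpolated for each match.
(my $from_literal = $data) =~ s/$pattern/#$&/g;

# Passing only the prefix keeps $& in the source where it is interpolated.
my $prefix = '#';
(my $from_prefix = $data) =~ s/$pattern/$prefix$&/g;

print $from_variable, $from_literal, $from_prefix;
```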

Powershell - Replacing a string with a variable ending with a dollar sign

I'm a bit lost with this one. For whatever reason the replace function in powershell doesn't play well with variables ending with a $ sign.
Command:
$var='A#$A#$'
$line=('$var='+"'"+"'")
$line -replace '^.+$',('$line='+"'"+$var+"'")
Expected output:
$line='A#$A#$'
Actual output:
$line='A#$A#
It looks like you're getting hit with a regex substitution that you don't want. The regex special variable $' represents everything after your match. Since your regex matches the entire string, $' is effectively empty. During the replace operation, the .Net regex engine sees $' in your expected output and substitutes in that empty string.
One way to avoid this is to replace all instances of $ in your $var string with $$:
$line -replace '^.+$',('$line='+"'"+($var.Replace('$','$$'))+"'")
You can see more information about regex substitution in .Net here:
Substitutions
I was able to find a band-aid of sorts by replacing $ with a special character and then reverting it back after the change. Preferably you would choose a character that doesn't have a key on your keyboard. For me I chose "¤".
$var='A#$A#$'
$var=$var -replace '\$','¤'
$line=("`$var=''")
$line -replace '^.+$',("`$line='$var'") -replace '¤','$'
I don't really understand the purpose of your posted lines, it seems to me that it would just make more sense to do $line='$line='''+$var+"'", BUT if you insist on your way, just do two replace calls, like this:
$line -replace '^.+$',('$line=''LOL''') -replace 'LOL',$var

Inline regex replacement in perl

Is there a way to replace text with a regex inline, rather than taking the text from a variable and storing it in a variable?
I'm a perl beginner. I often find myself writing
my $foo = $bar;
$foo =~ s/regex/replacement/;
doStuff($foo)
where I'd really like to write
doStuff($bar->replace(s/regex/replacement/));
or the like, rather than using a temporary variable and three lines.
Is there a way to do this? Obviously when the regex is sufficiently complicated it makes sense to split it out so it can be better explained, but when it's just s/\s//g it feels wrong to clutter the code with additional variables.
You really can't do what you want because the substitution operator returns the number of substitutions it made (1 without /g, if it worked) or an empty string if it didn't work. That means if you did this:
doStuff($foo =~ s/regex/replacement/);
The doStuff function would be using either 1 or an empty string as a parameter. There is no reason why the substitution function couldn't return the resultant string instead of just a 1 if it worked. However, it was a design decision from the earliest days of Perl. Otherwise, what would happen with this?
$foo = "widget";
if ($foo =~ s/red/blue/) {
print "We only sell blue stuff and not red stuff!\n";
}
The resulting string is still widget, but the substitution actually failed. However, if the substitution returned the resulting string and not an empty string, the if would still be true.
Then, consider this case:
$bar = "FOO!";
if ($bar =~ s/FOO!//) {
print "Fixed up \'\$bar\'!\n";
}
$bar is now an empty string. If the substitution returned the result, it would return an empty string. Yet the substitution actually succeeded, and I want my if to be true.
In most languages, the substitution function returns the resulting string, and you'd have to do something like this:
if ($bar != replace("$bar", "/FOO!//")) {
print "Fixed up \'\$bar\'!\n";
}
So, because of a Perl design decision (basically to better mimic awk syntax), there's no easy way to do what you want. However you could have done this:
($foo = $bar) =~ s/regex/replacement/;
doStuff($foo);
That copies $bar into $foo and applies the substitution to $foo in a single statement. $bar would remain unchanged.
Starting with Perl 5.14, you can use non-destructive substitution to achieve the desired behavior.
Use the /r modifier to do so:
doStuff($bar=~s/regex/replacement/r);
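As a slightly fuller sketch of the /r form (Perl 5.14 or later assumed): do_stuff below is a hypothetical stand-in for the question's doStuff, and the list example shows the same modifier inside map.

```perl
use 5.014;
use strict;
use warnings;

# Hypothetical stand-in for doStuff: wraps its argument in brackets.
sub do_stuff {
    my ($text) = @_;
    return "[$text]";
}

my $bar = "a b  c";

# The substitution result is passed directly; $bar is not modified.
my $out = do_stuff($bar =~ s/\s//gr);

# /r also works per element inside map, leaving the source list intact.
my @raw     = ("  one ", "two  ");
my @trimmed = map { s/^\s+|\s+$//gr } @raw;

print "$out\n";
print join(",", @trimmed), "\n";
```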
use Algorithm::Loops "Filter";
# leaves $foo unchanged
doStuff( Filter { s/this/that/ } $foo );
You can use a do { } block to avoid creating a temporary variable in the current scope:
doStuff( do {(my $foo = $bar) =~ s/regex/replacement/; $foo} );
Is this what you want?:
my $foo = 'Replace this with that';
(my $bar = $foo) =~ s/this/that/;
print "Foo: $foo\nBar: $bar\n";
Prints:
Foo: Replace this with that
Bar: Replace that with that
There is yet another way: Write your own function:
sub replace {
    my $variable = shift;
    my $substring = shift;
    # $substring is evaluated as code, so it must come from a trusted source
    eval "\$variable =~ s${substring};";
    return $variable;
}
doStuff(replace($foo, "/regex/replace/"));
This wouldn't be worth it for a single call, and it would probably just make your code more confusing in that case. However, if you're doing this a dozen or so times, it might make more sense to write your own function to do this.

Double interpolation of regular expressions in Perl

I have a Perl program that stores regular expressions in configuration files. They are in the form:
regex = ^\d+$
Elsewhere, the regex gets parsed from the file and stored in a variable - $regex.
I then use the variable when checking the regex, e.g.
$lValid = ($valuetocheck =~ /$regex/);
I want to be able to include perl variables in the config file, e.g.
regex = ^\d+$stored_regex$
But I can't work out how to do it.
When regular expressions are parsed by Perl they get interpreted twice.
First the variables are expanded, and then the regular expression itself is parsed.
What I need is a three stage process:
First interpolate $regex, then interpolate the variables it contains and then parse the resulting regular expression.
Both of the first two interpolations need to be "regular expression aware", e.g. they should know that the string contains $ as an anchor etc...
Any ideas?
You can define the regexp in your configuration file like this:
regex = ^\d+(??{$stored_regex})$
But you will need to disable a security check in the block where you're using the regexp by doing this in your Perl program:
use re 'eval';
Using eval can help you here. Take a look at the following code; it precompiles a regexp that's ready to be used later:
my $compiled_regexp;
my $regexp = '^\d+$stored_regexp$';
my $stored_regexp = 'a';
eval "\$compiled_regexp = qr/$regexp/;";
print "$compiled_regexp\n";
The operator qr// can be used to precompile a regexp. It lets you build it but doesn't execute it yet. You can first build your regexps with it and then use them later.
Your Perl variables are not in scope within your configuration file, and I think that's a good thing. eval is scary.
You would be better off implementing your own templating.
So in the config file:
regex = ^\d+__TEMPLATE_FIELD__$
In the config file reader:
# something like this for every template field you need
$regex =~ s/__TEMPLATE_FIELD__/$stored_regex/g;
When using:
$lValid = ($valuetocheck =~ m/$regex/)
Move these around depending on at what point you want the template substitution to apply.
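A minimal sketch of that templating idea follows. The field name STORED_REGEX, the %fields hash, and the __NAME__ placeholder pattern are all illustrative assumptions; the config line is hard-coded here instead of being read from a file.

```perl
use strict;
use warnings;

# Values the config file is allowed to reference (hypothetical names).
my %fields = ( STORED_REGEX => '[a-z]+' );

# What the config reader would have produced for: regex = ^\d+__STORED_REGEX__$
my $from_config = '^\d+__STORED_REGEX__$';

# Expand each __NAME__ placeholder; unknown names are treated as fatal.
(my $expanded = $from_config) =~ s{__([A-Z_]+?)__}{
    exists $fields{$1} ? $fields{$1} : die "Unknown template field: $1\n"
}ge;

# Compile once, after template expansion, with no string eval involved.
my $regex = qr/$expanded/;

for my $value ("123abc", "123") {
    printf "%-6s %s\n", $value, ($value =~ $regex ? "valid" : "invalid");
}
```

Only names listed in %fields can be expanded, so the config file cannot reach arbitrary program variables.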
A tangentially related gotcha: If you do double interpolation inline, and you also have substitution strings in variables, consider:
# The double quotes concatenated around the replacement string become
# PART OF THE STRING, not the string delimiters, so that the second
# evaluation sees a double-quoted string:
# eval eval $replace -> eval $1 hello world -> syntax error
# eval eval $replace -> eval "$1 hello world" -> works ok
# see: http://www.perlmonks.org?node_id=687031
if($line =~ s/$search/'"' . $replace . '"'/ee) {
# STUFF...
}