Regexp search and replace as variables in Perl - regex

I can't find a solution to this and it's driving me crazy!
my $foo = qr/(\S+) (\X+)/;
my $bar = qr/$2/;
$line =~ s/$foo/$bar/g;
My problem is that $bar uses a previously defined value of $2 rather than the (\X+).

Please note that the second part of s/// is not a regex but the replacement string for whatever the regex matched. You can achieve what you want with this (note the ee double-eval option at the end):
my $foo = qr/(\S+) (\X+)/;
my $bar = '$2'; # no interpolation
$line =~ s/$foo/$bar/gee; # first eval turns $bar into the string '$2', second eval turns that into the captured text
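To see the double evaluation end to end, here is a minimal sketch. The sample value of $line and the name $result are made up for illustration. Keep in mind that /ee runs the contents of $bar as Perl code, so this only makes sense when $bar comes from a trusted source.

```perl
use strict;
use warnings;

my $foo  = qr/(\S+) (\X+)/;
my $bar  = '$2';               # single quotes: the text "$2", not its value
my $line = 'hello world';      # hypothetical sample input

# Work on a copy so the original line stays available.
(my $result = $line) =~ s/$foo/$bar/ee;

print "$result\n";             # prints "world"
```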

I guess the value of $bar should just be a string and not a regex. The qr// doesn't look right there.

Similar to bvr's suggestion you can use a sub ref for the replacement side of s///. This has the advantage of being precompiled (both the sub ref, and the substitution) as opposed to being recompiled for each match. In most cases this will be faster and more likely to catch any errors at compile time.
my $foo = qr/(\S+) (\X+)/;
my $bar = sub { $2 }; # or my $bar = \&some_replace_function;
$line =~ s/$foo/$bar->()/ge;
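A variation on the same idea, as a sketch only: instead of reading $2 inside the sub, pass the captures in as arguments, which keeps the replacement routine independent of the match variables. The names $swap, $swapped and the sample input are hypothetical.

```perl
use strict;
use warnings;

my $foo = qr/(\S+) (\X+)/;

# Hypothetical replacement routine: receives the captures as arguments.
my $swap = sub {
    my ($first, $rest) = @_;
    return "$rest $first";
};

my $line = 'hello world';      # made-up sample input
(my $swapped = $line) =~ s/$foo/$swap->($1, $2)/e;

print "$swapped\n";            # prints "world hello"
```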

Related

Limit the translation to just one word in a phrase?

I'm new to the Perl world, coming from Python, and I wonder if there is a simple way to limit a translation or replacement to just one word in a phrase.
In the example, the 2nd word kind also got changed to lind. Is there a simple way to do the translation without diving into some looping? Thanks.
The first word has been correctly translated to gazelle, but 2nd word has been changed too as you can see.
my $string = 'gazekke is one kind of antelope';
my $count = ($string =~ tr/k/l/);
print "There are $count changes \n";
print $string; # gazelle is one lind of antelope <-- kind becomes lind too!
I don't know of an option for tr to stop translation after the first word.
But you can use a regex with backreferences for this.
use strict;
my $string = 'gazekke is one kind of antelope';
# Match first word in $1 and rest of sentence in $2.
$string =~ m/(\w+)(.*)/;
# Translate all k's to l's in the first word.
(my $translated = $1) =~ tr/k/l/;
# Concatenate the translated first word with the rest
$string = "$translated$2";
print $string;
Outputs: gazelle is one kind of antelope
Pick the first match (a word in this case), which is precisely what a regex does without /g, and replace all the wanted characters in that word by running code in the replacement side with /e:
$string =~ s{(\w+)}{ $1 =~ s/k/l/gr }e;
In the substitution on the replacement side, the /r modifier makes it return the changed string instead of modifying the original. That is also what allows a substitution to run on $1, which is read-only and can't be modified in place.
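Since the original question used tr, the same trick should work with tr///r as well (Perl 5.14 or later). A small sketch, with $fixed as a made-up name for the result:

```perl
use strict;
use warnings;

my $string = 'gazekke is one kind of antelope';

# Without /g only the first word matches; tr///r returns a translated
# copy of $1 instead of trying to modify the read-only $1.
(my $fixed = $string) =~ s{(\w+)}{ $1 =~ tr/k/l/r }e;

print "$fixed\n";              # prints "gazelle is one kind of antelope"
```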
tr is a character class transliterator. For anything else you would use regex.
$string =~ s/gazekke/gazelle/;
You can put a code block as the second half of s/// to do more complicated replacements or transmogrifications.
$string =~ s{([A-Za-z]+)}{ $should_be_mangled{$1} ? mangler($1) : $1 }ge; # keep the word unchanged when it should not be mangled
Edit:
Here's how you would first locate a phrase and then work on it.
$phrase_regex = qr/(?|(gazekke) is one kind of antelope|(etc))/;
$string =~ s{($phrase_regex)}{
my $match = $1;
my $word = $2;
$match =~ s{$word}{
my $new = $new_word_map{$word};
&additional_mangling($new);
$new;
}e;
$match;
}ge;
Here's the Perl regex documentation.
https://perldoc.perl.org/perlre

Why can't I store a regexp in a variable?

Given the following code,
my $string = "foo";
my $regex = s/foo/bar/;
$string =~ $regex;
print $string, "\n";
I would have expected the output to be bar, however it is foo. Why is that the case, and how can I solve that problem?
Note that in my actual case, the regex is more complicated, and I actually want to store several of them in a hash (so I can write something like $string =~ $rules{$key}).
You're looking for a substitution, not only the regex part, so I guess a compiled regex (qr//) is not what you're looking for. You can wrap the substitution in a subroutine instead:
use strict;
use warnings;
my $string = "foo";
my $regex = sub { $_[0] =~ s/foo/bar/ };
$regex->($string);
print $string, "\n";
Your statement
my $regex = s/foo/bar/
is equivalent to
my $regex = $_ =~ s/foo/bar/
s/// returns the number of substitutions made, or it returns false (specifically, the empty string). So $regex is now '' or 1 (it could be more if the /g modifier was in effect) and
$string =~ $regex
is doing 'foo' =~ // or 'foo' =~ /1/ depending on what $_ contained originally.
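A quick way to convince yourself of what s/// returns is to capture its result. This is only an illustration; the variable names and the sample text are made up.

```perl
use strict;
use warnings;

my $text = 'foo foo';

# With /g the return value is the number of substitutions made.
my $count = ($text =~ s/foo/bar/g);

# When nothing matches, s/// returns false (the empty string).
my $none = ($text =~ s/xyz/bar/);

print "count=$count none='$none' text=$text\n";   # count=2 none='' text=bar bar
```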
You can store a regex pattern in a variable but, in your example, the regex is just foo, and there is a lot more going on than just that pattern.
The statement s/foo/bar/ is more complex than it seems -- it is a fully-fledged statement that applies a regex pattern to a target string and substitutes a replacement string if the pattern is found. In this case the target string is the default variable $_ and the replacement string is bar. You could think of it as a call to a subroutine
substitute($_, 'foo', 'bar')
and the regex pattern is only the second parameter.
What you can do is store a regex pattern. The regex part of that substitution is foo, and you can say
my $pattern = qr/foo/;
s/$pattern/bar/;
But you really should explain the problem that you're trying to solve so that we can help you better.
In the assignment, you need to tell Perl not to evaluate the regular expression but just to keep it. This is what qr is for.
But you can't do this with whole substitutions, which is why Сухой27 suggests using a subroutine.
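For the hash of rules mentioned in the question, one possible sketch along those lines stores a subroutine per key. The rule names and the sample string are invented for illustration; each sub modifies its argument in place through the $_[0] alias.

```perl
use strict;
use warnings;

# Hypothetical rule table: each value is a sub wrapping one substitution.
my %rules = (
    foo_to_bar => sub { $_[0] =~ s/foo/bar/g },
    trim       => sub { $_[0] =~ s/^\s+|\s+$//g },
);

my $string = '  foo and foo  ';

# Apply the chosen rules in a fixed order.
$rules{$_}->($string) for qw(foo_to_bar trim);

print "[$string]\n";           # prints "[bar and bar]"
```

The call is $rules{$key}->($string) rather than $string =~ $rules{$key}, which is the price of storing the whole substitution and not just the pattern.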

Perl regexp with $1 and /e

I'm making a regexp constructor.
But when running:
my $text = 'a a a';
my $replace = '$1/$2-$3';
$text =~ s/(\w) (\w+) (\w+)/$replace/gmi;
After this, $text is '$1/$2-$3'.
So $1, $2 and $3 are not expanded but inserted literally as they appear in $replace. How would I make it treat the content of $replace as if I had typed it directly into the replacement pattern?
$replace is just a string. If you want it to be evaluated as code, you need the /e modifier in your substitution. But you also need to prepare your string for the evaluation to interpolate your variables:
my $replace = 'qq($1/$2-$3)';
$text =~ s/(\w) (\w+) (\w+)/$replace/gmiee;
We use double evaluation to first turn the variable into a string, then to evaluate that string.
However, whenever you find yourself relying on eval, you're probably doing something unnecessary. Eval can be rather evil, as OmnipotentEntity rightly points out, so be very careful about using it.
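If the template only ever contains placeholders like $1, $2 and $3, one way to avoid eval entirely is to expand them yourself. This is a sketch under that assumption, not a general replacement for /ee; @caps and $out are made-up names, and the input is changed to 'a b c' so the captures are distinguishable.

```perl
use strict;
use warnings;

my $text    = 'a b c';
my $replace = '$1/$2-$3';      # template with numbered placeholders

$text =~ s{(\w) (\w+) (\w+)}{
    # Save the captures first; the inner substitution sets its own $1.
    my @caps = (undef, $1, $2, $3);
    (my $out = $replace) =~ s/\$(\d)/$caps[$1]/g;
    $out;
}ge;

print "$text\n";               # prints "a/b-c"
```

Nothing in $replace is ever executed as code here, so a template from an untrusted source cannot do more than insert captured text.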

Match regex and assign results in single line of code

I want to be able to do a regex match on a variable and assign the results to the variable itself. What is the best way to do it?
I want to essentially combine lines 2 and 3 in a single line of code:
$variable = "some string";
$variable =~ /(find something).*/;
$variable = $1;
Is there a shorter/simpler way to do this? Am I missing something?
my($variable) = "some string" =~ /(e\s*str)/;
This works because
If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e., ($1, $2, $3 …).
and because my($variable) = ... (note the parentheses around the scalar) supplies list context to the match.
If the pattern fails to match, $variable gets the undefined value.
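A short sketch of the list-context behaviour described above, including the failure case. The variable names and the key=value sample are made up for illustration.

```perl
use strict;
use warnings;

# The parentheses on the left supply list context to the match.
my ($match) = 'some string' =~ /(e\s*str)/;

# A failed match returns an empty list, so the variable stays undef.
my ($none) = 'some string' =~ /(xyz)/;

# Several captures can be assigned in one go.
my ($key, $value) = 'name=perl' =~ /(\w+)=(\w+)/;

print "match='$match' key=$key value=$value\n";
print "no match\n" unless defined $none;
```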
Why do you want it to be shorter? Does it really matter?
$variable = $1 if $variable =~ /(find something).*/;
If you are worried about the variable name or doing this repeatedly, wrap the thing in a subroutine and forget about it:
some_sub( $variable, qr/pattern/ );
sub some_sub { $_[0] = $1 if eval { $_[0] =~ m/$_[1]/ }; $1 };
However you implement it, the point of the subroutine is to make it reusable, so you give a particular set of lines a short name that stands in their place.
Several other answers mention a destructive substitution:
( my $new = $variable ) =~ s/pattern/replacement/;
I tend to keep the original data around, and Perl v5.14 has an /r flag that leaves the original alone and returns a new string with the replacement (instead of the count of replacements):
my $match = $variable =~ s/pattern/replacement/r;
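Applied to the example from the question, the /r form might look like this (requires Perl 5.14 or later; $extracted is a made-up name):

```perl
use strict;
use warnings;

my $variable = 'find something soon';

# /r returns the modified copy and leaves $variable untouched.
my $extracted = $variable =~ s/^.*?(find something).*/$1/r;

print "$extracted\n";          # prints "find something"
print "$variable\n";           # prints "find something soon"
```

Note that if the pattern does not match, /r returns the original string unchanged, so this form cannot tell you whether a match happened.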
Well, you could say
my $variable;
($variable) = ($variable = "find something soon") =~ /(find something).*/;
or
(my $variable = "find something soon") =~ s/^.*?(find something).*/$1/;
You can do substitution as:
$a = 'stackoverflow';
$a =~ s/(\w+)overflow/$1/;
$a is now "stack"
From Perl Cookbook 2nd ed
6.1 Copying and Substituting Simultaneously
$dst = $src;
$dst =~ s/this/that/;
becomes
($dst = $src) =~ s/this/that/;
I just assumed everyone did it this way, amazed that no one gave this answer.
Almost ....
You can combine the match and retrieve the matched value with a substitution.
$variable =~ s/.*(find something).*/$1/;
AFAIK, you will always have to copy the value, though, unless you do not care about clobbering the original.
$variable2 = "stackoverflow";
(my $variable1) = ($variable2 =~ /stack(\w+)/);
$variable1 now equals "overflow".
I do this:
#!/usr/bin/perl
$target = "n: 123";
my ($target) = $target =~ /n:\s*(\d+)/g;
print $target; # the var $target now is "123"
Also, to amplify the accepted answer using the ternary operator to allow you to specify a default if there is no match:
my $match = $variable =~ /(pattern).*/ ? $1 : $default_value;

How do I remove all hyphens with a Perl regex?

I thought this would have done it...
$rowfetch = $DBS->{Row}->GetCharValue("meetdays");
$rowfetch = /[-]/gi;
printline($rowfetch);
But it seems that I'm missing a small yet critical piece of the regex syntax.
$rowfetch is always something along the lines of:
------S
-M-W---
--T-TF-
etc... to represent the days of the week a meeting happens
$rowfetch =~ s/-//gi
That's what you need for your second line there. You're just finding stuff, not actually changing it without the "s" prefix.
You also need to use the regex operator "=~" for this.
Here is what your code presently does:
# Assign 'rowfetch' to the value fetched from:
# The function 'GetCharValue' which is a method of:
# A value in a hash identified by the key "Row" in:
# Either a Hash-Ref or a Blessed Hash-Ref
# Where 'GetCharValue' is given the parameter "meetdays"
$rowfetch = $DBS->{Row}->GetCharValue("meetdays");
# Assign to $rowfetch the result of matching the default
# variable ( $_ ) against /[-]/: true (1) or false (''), not a count
$rowfetch = /[-]/gi;
# Print that true/false value.
printline($rowfetch);
Which is equivalent to having written the following code:
$rowfetch = ( $_ =~ /[-]/ );
printline( $rowfetch );
The magic you are looking for is the
=~
Token instead of
=
The former is a Regex operator, and the latter is an assignment operator.
There are many different regex operators too:
if( $subject =~ m/expression/ ){
}
Will make the given codeblock execute only if $subject matches the given expression, and
$subject =~ s/foo/bar/gi
Replaces (s/) all instances of "foo" with "bar", case-insensitively (/i), repeating the replacement for every match rather than just the first (/g), on the variable $subject.
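Putting the two operators side by side in a small runnable sketch (the sample text and variable names are made up). The last line uses the common "= () =" idiom to count matches, since a plain scalar assignment of m//g only gives true or false.

```perl
use strict;
use warnings;

my $subject = 'Foo foo FOO';

# m// with =~ tests the variable; it does not change it.
my $matched = ($subject =~ m/foo/i) ? 'yes' : 'no';

# s/// with =~ rewrites the variable (here, a copy of it).
(my $replaced = $subject) =~ s/foo/bar/gi;

# Forcing list context on m//g and counting the results.
my $count = () = $subject =~ /foo/gi;

print "$matched $replaced $count\n";   # prints "yes bar bar bar 3"
```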
Using the tr operator is faster than using a s/// regex substitution.
$rowfetch =~ tr/-//d;
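For reference, tr can both count and delete, depending on whether /d is given. A small sketch using one of the sample values from the question; $hyphens and $clean are made-up names.

```perl
use strict;
use warnings;

my $days = '-M-W---';

# With an empty replacement list and no /d, tr only counts.
my $hyphens = ($days =~ tr/-//);

# With /d the matched characters are deleted (from a copy here).
(my $clean = $days) =~ tr/-//d;

print "$hyphens hyphens, cleaned: $clean\n";   # prints "5 hyphens, cleaned: MW"
```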
Benchmark:
use Benchmark qw(cmpthese);
my $s = 'foo-bar-baz-blee-goo-glab-blech';
cmpthese(-5, {
trd => sub { (my $a = $s) =~ tr/-//d },
sub => sub { (my $a = $s) =~ s/-//g },
});
Results on my system:
         Rate   sub   trd
sub  300754/s    --  -79%
trd 1429005/s  375%    --
Off-topic, but without the hyphens, how will you know whether a "T" is Tuesday or Thursday?