Search/replacement in string - regex

I have a Perl program where the user enters some text, a search pattern and a replacement string.
I am using the the s/// operator to replace the search pattern by the replacement string, but in this scenario, if the user enters capture variables (like $1) or backslash escapes (like \u or \L) in the replacement string, the replacement pattern should process these metacharacters but rather it is treating them as string characters.
I have the following code:
#!/usr/bin/perl -w
use strict;
chomp(my $text = <STDIN>); #read the text
chomp(my $regex = <STDIN>); #read the search pattern
chomp(my $replace = <STDIN>); #read the replacement pattern
$text=~s/$regex/$replace/g; # do replacement
print $text,"\n";
The sample input for this code is
fred flintstone and wilma flintstone are good couples
(fred|wilma) flintstone
\u\L$1\E Flintstone
The output is for above code is
\u\L$1\E Flintstone and \u\L$1\E Flintstone are good couples
I have found a way to make this correct in the following code:
#!/usr/bin/perl -w
use strict;
chomp(my $text = <STDIN>);
chomp(my $regex = <STDIN>);
chomp(my $replace = <STDIN>);
$replace = '"' . $replace . '"'; # creating the string as '"\u\L$1\E Flintstone"'
$text = ~s/$regex/$replace/gee; # applying double evaluation
print $text,"\n";
Now this code gives the correct output as
Fred Flintstone and Wilma Flintstone are good couples
I want to know if there is a better approach for this problem?

A double-quoted string cannot be interpolated without compiling it
The /ee modifier carries the same warnings as eval: it is always dangerous to execute code of any sort entered by the user, and even a simple string may contain a block consisting of any Perl code the user likes to enter, like "my dangerous string #{ unlink glob '*.*'}"
I suggest that you make use of the excellent String::Interpolate module, which exports a safe_interpolate function that will use the Safe module to perform the interpolation in a "safe" compartment
#!/usr/bin/perl
use strict;
use warnings 'all';
use String::Interpolate 'safe_interpolate';
my $text = 'fred flintstone and wilma flintstone are good couples';
my $regex = '(fred|wilma) flintstone';
my $replace = '\u\L$1\E Flintstone';
$text =~ s/$regex/ safe_interpolate($replace) /eg; # do replacement
print $text,"\n";

(First, you shouldn't use -w anymore. It's been supplanted by the (lexically scoped, more predictable) use warnings pragma in 2000.)
For your problem, you could use replace from the Data::Munge module, which is "a clone of javascript's String.prototype.replace".
use Data::Munge qw(replace);
$text = replace($text, $regex, $replace, 'g');
This will expand things like $& or $1, but not backslash sequences such as \u. For that, you could specify your own expansion function, but then you'd have to parse and replace special sequences in your replacement string manually. For $1 and friends, this is easy, but something like \Ufoo\Q$1\Ebar\Ebaz is hard to handle correctly, especially if $1 contains '\E' (even perl itself has had problems in this area historically). But if you get that part working, it's easy to plug into replace.

Related

Extract only pattern matched text

I have written a basic program using regular expression.
However the entire line is being returned instead of the matched part.
I want to extract the number only.
use strict;
use warnings;
my $line = "ABMA 1234";
$line =~ /(\s)(\d){4}/;
print $line; #prints *ABMA 1234*
Is my regular expression incorrect?
If you want to print 1234, you need to change your regex and print the 2nd match:
use strict;
use warnings;
my $line = "ABMA 1234";
$line =~ /(\s)(\d{4})/;
print $2;
You can replace the exact value with the corresponding values. And your are not removing the text \w;
use strict;
use warnings;
my $line = "ABMA 1234";
$line=~s/([A-z]*)\s+(\d+)/$2/;
print $line; #prints only 1234
If you want to store the value in the new string then
(my $newstring = $line)=~s/([A-z]*)\s+(\d+)/$2/;
print $newstring; #prints only 1234
Just try this:
I don't know how you output the match in perl but you can use below regex for output the full match in your regex, you might getting space appended with your result in your current regex.
\b[\d]{4}
DEMO

Simple perl regex replacement

Here is my perl code:
my $var="[url=/jobs/]click here[/url]";
$var =~ /\[url=(.+?)\](.+?)\[\/url\]/\2/g
I'm very new to perl so i am aware that its incorrect but how do i perform this regex replacement correctly.
The end result would be a transformation of $var to click here
So, with all the answers you know the substitute form is s///
However, with something this big you should break it up into parts
to make it easier to maintain. And also helps to get out of the
quagmire of delimiter hell.
This uses a pre-compiled regex and a callback function invoked with s///e
use strict;
use warnings;
# Pre-compiled regex
my $rx = qr{\[url=(.+?)\](.+?)\[/url\]};
# Callback
sub MakeAnchor {
my ($href,$text) = #_;
return '' . $text . '';
}
my $input = '[url=/jobs/]click here[/url]';
$input =~ s/$rx/MakeAnchor($1,$2)/eg;
print $input;
Outout
click here

Perl regex return matches from substitution

I am trying to simultaneously remove and store (into an array) all matches of some regex in a string.
To return matches from a string into an array, you could use
my #matches = $string=~/$pattern/g;
I would like to use a similar pattern for a substitution regex. Of course, one option is:
my #matches = $string=~/$pattern/g;
$string =~ s/$pattern//g;
But is there really no way to do this without running the regex engine over the full string twice? Something like
my #matches = $string=~s/$pattern//g
Except that this will only return the number of subs, regardless of list context. I would also take, as a consolation prize, a method to use qr// where I could simply modify the quoted regex to to a sub regex, but I don't know if that's possible either (and that wouldn't preclude searching the same string twice).
Perhaps the following will be helpful:
use warnings;
use strict;
my $string = 'I thistle thing am thinking this Thistle a changed thirsty string.';
my $pattern = '\b[Tt]hi\S+\b';
my #matches;
$string =~ s/($pattern)/push #matches, $1; ''/ge;
print "New string: $string; Removed: #matches\n";
Output:
New string: I am a changed string.; Removed: thistle thing thinking this Thistle thirsty
Here is another way to do it without executing Perl code inside the substitution. The trick is that the s///g will return one capture at a time and undef if it does not match, thus quitting the while loop.
use strict;
use warnings;
use Data::Dump;
my $string = "The example Kenosis came up with was way better than mine.";
my #matches;
push #matches, $1 while $string =~ s/(\b\w{4}\b)\s//;
dd #matches, $string;
__END__
(
"came",
"with",
"than",
"The example Kenosis up was way better mine.",
)

How to pass a replacing regex as a command line argument to a perl script

I am trying to write a simple perl script to apply a given regex to a filename among other things, and I am having trouble passing a regex into the script as an argument.
What I would like to be able to do is somthing like this:
> myscript 's/hi/bye/i' hi.h
bye.h
>
I have produced this code
#!/utils/bin/perl -w
use strict;
use warnings;
my $n_args = $#ARGV + 1;
my $regex = $ARGV[0];
for(my $i=1; $i<$n_args; $i++) {
my $file = $ARGV[$i];
$file =~ $regex;
print "OUTPUT: $file\n";
}
I cannot use qr because apparently it cannot be used on replacing regexes (although my source for this is a forum post so I'm happy to be proved wrong).
I would rather avoid passing the two parts in as seperate strings and manually doing the regex in the perl script.
Is it possible to pass the regex as an argument like this, and if so what is the best way to do it?
There's more than one way to do it, I think.
The Evial Way:
As you basically send in a regex expression, it can be evaluated to get the result. Like this:
my #args = ('s/hi/bye/', 'hi.h');
my ($regex, #filenames) = #args;
for my $file (#filenames) {
eval("\$file =~ $regex");
print "OUTPUT: $file\n";
}
Of course, following this way will open you to some very nasty surprises. For example, consider passing this set of arguments:
...
my #args = ('s/hi/bye/; print qq{MINE IS AN EVIL LAUGH!\n}', 'hi.h');
...
Yes, it will laugh at you most evailly.
The Safe Way:
my ($regex_expr, #filenames) = #args;
my ($substr, $replace) = $regex_expr =~ m#^s/((?:[^/]|\\/)+)/((?:[^/]|\\/)+)/#;
for my $file (#filenames) {
$file =~ s/$substr/$replace/;
print "OUTPUT: $file\n";
}
As you can see, we parse the expression given to us into two parts, then use these parts to build a full operator. Obviously, this approach is less flexible, but, of course, it's much more safe.
The Easiest Way:
my ($search, $replace, #filenames) = #args;
for my $file (#filenames) {
$file =~ s/$search/$replace/;
print "OUTPUT: $file\n";
}
Yes, that's right - no regex parsing at all! What happens here is we decided to take two arguments - 'search pattern' and 'replacement string' - instead of a single one. Will it make our script less flexible than the previous one? No, as we still had to parse the regex expression more-or-less regularly. But now user clearly understand all the data that is given to a command, which is usually quite an improvement. )
#args in both examples corresponds to #ARGV array.
The s/a/b/i is an operator, not simply a regular expression, so you need to use eval if you want it to be interpreted properly.
#!/usr/bin/env perl
use warnings;
use strict;
my $regex = shift;
my $sub = eval "sub { \$_[0] =~ $regex; }";
foreach my $file (#ARGV) {
&$sub($file);
print "OUTPUT: $file\n";
}
The trick here is that I'm substituting this "bit of code" into a string to produce Perl code that defines an anonymous subroutine $_[0] =~ s/a/b/i; (or whatever code you pass it), then using eval to compile that code and give me a code reference I can call from within the loop.
$ test.pl 's/foo/bar/' foo nicefood
OUTPUT: bar
OUTPUT: nicebard
$ test.pl 'tr/o/e/' foo nicefood
OUTPUT: fee
OUTPUT: nicefeed
This is more efficient than putting an eval "\$file =~ $regex;" inside the loop as then it'll get compiled and eval-ed at every iteration rather than just once up-front.
A word of warning about eval - as raina77ow's answer explains, you should avoid eval unless you're 100% sure you are always getting your input from a trusted source...
s/a/b/i is not a regex. It is a regex plus substitution. Unless you use the string eval, make this work might be pretty tough (consider s{a}<b>e and so on).
The trouble is that you are trying to pass a perl operator when all you really need to pass is the arguments:
myscript hi bye hi.h
In the script:
my ($find, $replace, #files) = #ARGV;
...
$file =~ s/$find/$replace/i;
Your code is a bit clunky. This is all you need:
use strict;
use warnings;
my ($find, $replace, #files) = #ARGV;
for my $file (#files) {
$file =~ s/$find/$replace/i;
print "$file\n";
}
Note that this way allows you to use meta characters in the regex, such as \w{2}foo?. This can be both a good thing and a bad thing. To make all characters intepreted literally (disable meta characters), you can use \Q ... \E like so:
... s/\Q$find\E/$replace/i;

How do I substitute with an evaluated expression in Perl?

There's a file dummy.txt
The contents are:
9/0/2010
9/2/2010
10/11/2010
I have to change the month portion (0,2,11) to +1, ie, (1,3,12)
I wrote the substitution regex as follows
$line =~ s/\/(\d+)\//\/\1+1\//;
It's is printing
9/0+1/2010
9/2+1/2010
10/11+1/2010
How to make it add - 3 numerically than perform string concat? 2+1??
Three changes:
You'll have to use the e modifier
to allow an expression in the
replacement part.
To make the replacement globally
you should use the g modifier. This is not needed if you've one date per line.
You use $1 on the replacement side, not a backreference
This should work:
$line =~ s{/(\d+)/}{'/'.($1+1).'/'}eg;
Also if your regex contains the delimiter you're using(/ in your case), it's better to choose a different delimiter ({} above), this way you don't have to escape the delimiter in the regex making your regex clean.
this works: (e is to evaluate the replacement string: see the perlrequick documentation).
$line = '8/10/2010';
$line =~ s!/(\d+)/!('/'.($1+1).'/')!e;
print $line;
It helps to use ! or some other character as the delimiter if your regular expression has / itself.
You can also use, from this question in Can Perl string interpolation perform any expression evaluation?
$line = '8/10/2010';
$line =~ s!/(\d+)/!("/#{[$1+1]}/")!e;
print $line;
but if this is a homework question, be ready to explain when the teacher asks you how you reach this solution.
How about this?
$ cat date.txt
9/0/2010
9/2/2010
10/11/2010
$ perl chdate.pl
9/1/2010
9/3/2010
10/12/2010
$ cat chdate.pl
use strict;
use warnings;
open my $fp, '<', "date.txt" or die $!;
while (<$fp>) {
chomp;
my #arr = split (/\//, $_);
my $temp = $arr[1]+1;
print "$arr[0]/$temp/$arr[2]\n";
}
close $fp;
$